U.S.S.R. Comput. Maths. Math. Phys. Vol. 21, No. 5, pp. 199-215, 1981. Printed in Great Britain.
0041-5553/81/050199-17$07.50/0 © 1982 Pergamon Press Ltd.
CORRECT ALGEBRAS OF BOUNDED CAPACITY ON THE SET OF ALGORITHMS OF THE CALCULATION OF ESTIMATES*

V. L. MATROSOV

Moscow

(Received 8 January 1981)

THE PROBLEM of satisfying the sufficient conditions of uniform convergence of the frequency of occurrence of the errors of the recognition algorithms on the teaching sample to their probabilities is considered for sufficiently wide subclasses $U(L, A)$ of the algebraic closure of the algorithms of the calculation of estimates. Upper bounds of the capacity of the classes $U(L, A)$ and of the rate of convergence of the algorithms minimizing a functional of the empirical risk are obtained.
Introduction
There are a large number of papers devoted to the study of models of recognition algorithms with piecewise-linear separating surfaces and of the calculation of estimates, for which the completeness problem, first formulated by Yu. I. Zhuravlev, has been investigated. A model is said to be complete if, for every finite sample $S^1, \ldots, S^q$, it contains a (correct) algorithm classifying all the objects $S^1, \ldots, S^q$ without errors. Models of algorithms satisfying the completeness condition have been called correct. In [1, 2] the correctness of the linear $\mathcal{L}(A)$ or algebraic $U(A)$ closures of this kind of algorithm was proved. Moreover, as follows from [3], an algorithm correct for an arbitrary sample (problem) can be defined in the form $A = R \circ r(\tilde c)$, where $R$ is an operator of the form (1), $r(\tilde c)$ is a decision rule, $R^k(i, j)$ is the $k$-th power of an operator of the calculation of estimates, and $\alpha_{ij} \in \{0, 1\}$. Therefore, the search for a correct algorithm can now be carried out in the family of parametric algorithms $\{A\} = \{A(\tilde\varepsilon, \tilde p, \tilde\gamma, \tilde x, \tilde c, \ldots)\}$, for which $A = R \circ r(\tilde c)$ and $R$ has the form (1).

*Zh. vychisl. Mat. mat. Fiz., 21, 5, 1276-1291, 1981.

The problem of finding an algorithm recognizing a given sample, in a general formulation, was considered in [4]. In this approach it is necessary to find, in the class of recognition algorithms (rules) $F(x, \alpha)$ ($\alpha$ is the collection of parameters), an algorithm minimizing the functional

$$R(\alpha) = \int [\omega - F(x, \alpha)]^2 \, dP(x, \omega),$$

if the function $P(x, \omega)$ is unknown. The function $R(\alpha)$ characterizes the probability of incorrect classification by the algorithm $F(x, \alpha)$, where for each pair $(x, \omega)$ in the space $(X, \Omega_0)$ the description of the object is specified by $x$, and the class to which the object really belongs is indicated by $\omega$. It is assumed that the number of classes is small, that is, $\Omega_0 = \{1, 2, \ldots, k\}$, and that a one-to-one correspondence is given between the set of parameters $\{\alpha\}$ of some space $A$ and the class of recognition algorithms $\{F(x, \alpha)\}$, where $F(x, \alpha)$ takes the same discrete values as $\omega$. Let the minimum of the functional $R(\alpha)$ be attained at some point $\alpha = \alpha_0$ of the parameter space $A$. Then as an approximation we take the value $\alpha^*$ which is the minimum of the empirical risk functional
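$$R_e(\alpha) = \frac{1}{\delta} \sum_{i=1}^{\delta} [\omega_i - F(x_i, \alpha)]^2,$$

where $(x_1, \omega_1), \ldots, (x_\delta, \omega_\delta)$ is a sample obtained in a series of independent trials with the unchanged distribution $P(x, \omega)$. Here $R_e(\alpha)$ equals the frequency $\nu(T_\alpha)$ of occurrence of the events $T_\alpha = \{(x, \omega) \mid F(x, \alpha) \ne \omega\}$. If $S = \{T_\alpha \mid \alpha \in A\}$, then the formula

$$R(\alpha^*) - R(\alpha_0) \le 2 \sup_{T_\alpha \in S} |\nu(T_\alpha) - P(T_\alpha)| \qquad (2)$$

holds. Therefore, closeness of the values of $\alpha_0$ and $\alpha^*$ will be guaranteed if the function $R_e(\alpha)$ approximates $R(\alpha)$ uniformly with respect to the parameter $\alpha$, that is, for any $\varepsilon > 0$,

$$P\{\sup_{\alpha} |R_e(\alpha) - R(\alpha)| > \varepsilon\} \to 0, \qquad \delta \to \infty.$$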
Sufficient conditions for the uniform convergence of the frequencies of occurrence of the events to their probabilities were obtained in [4]. It turned out that the conditions obtained are independent of the distribution $P(x, \omega)$, which is usually unknown, and are determined only by the class of events $S$.
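The inequality (2) can be checked numerically on a toy problem. The Python sketch below is only an illustration: the family of threshold rules $F(x, a) = 1$ for $x \ge a$, the uniform distribution of $x$, the 10% label noise and all function names are assumptions made for the example, not part of the paper.

```python
import random

def true_error(a):
    # Assumed toy distribution: x ~ Uniform[0, 1], clean class is 1 iff x >= 0.5,
    # and the observed class is flipped with probability 0.1.
    # The rule F(x, a) = [x >= a] disagrees with the clean class on measure |a - 0.5|.
    d = abs(a - 0.5)
    return 0.9 * d + 0.1 * (1.0 - d)

def draw_sample(size, rng):
    sample = []
    for _ in range(size):
        x = rng.random()
        w = 1 if x >= 0.5 else 0
        if rng.random() < 0.1:
            w = 1 - w
        sample.append((x, w))
    return sample

def empirical_error(a, sample):
    # Frequency nu(T_a) of the event T_a = {(x, w) : F(x, a) != w} in the sample.
    return sum((1 if x >= a else 0) != w for x, w in sample) / len(sample)

def risk_gap_and_sup_deviation(thresholds, sample):
    a_star = min(thresholds, key=lambda a: empirical_error(a, sample))  # empirical minimizer
    a_zero = min(thresholds, key=true_error)                            # true minimizer
    gap = true_error(a_star) - true_error(a_zero)
    sup_dev = max(abs(empirical_error(a, sample) - true_error(a)) for a in thresholds)
    return gap, sup_dev

rng = random.Random(0)
thresholds = [i / 20 for i in range(21)]
gap, sup_dev = risk_gap_and_sup_deviation(thresholds, draw_sample(200, rng))
print(gap, sup_dev)
```

For any sample the printed gap should not exceed twice the printed supremum, which is exactly what (2) asserts for this finite family.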
The derivation of the sufficient conditions is connected with the concept of the growth function $m^S(\delta)$, defined as the maximum number of ways of partitioning $\delta$ points into two classes by the recognition algorithms $F(x, \alpha)$. If $m^S(\delta)$ is majorized by some power function, then uniform convergence of the frequencies of occurrence of the events to their probabilities holds. In the case of the parametric family $\{A\}$ these conditions imply the following assertion: for every $\varepsilon > 0$ there exists an (effectively specified) $\delta_0$ sufficient for the procedure minimizing the empirical risk functional to choose, with probability close to unity, a recognition algorithm whose error frequency on the sample $\tilde S^{\delta_0} = (S^1, \ldots, S^{\delta_0})$ differs from the quality $R(\alpha)$ of the best algorithm in $\{A\}$ recognizing $\tilde S^{\delta_0}$ by not more than $\varepsilon$.
1. Notation, definitions

1. Let us consider the set $M = M_1 \times \ldots \times M_n$ of admissible objects, where each set $M_i$ is provided with a semimetric $\rho_i$, for which all the axioms of a metric hold except for the triangle axiom. Therefore, $M = \{(a_1, \ldots, a_n)\} = \{S\}$, where $a_i \in M_i$, $i = 1, 2, \ldots, n$, and the following partition of the set $M$ into classes exists: $M = \bigcup_{j=1}^{l} K_j$.
2. Let the finite sequence of pairs

$$(S_1, \tilde\alpha(S_1)), \ldots, (S_m, \tilde\alpha(S_m)) \qquad (1.1)$$

be specified, where $(S_1, \ldots, S_m)$ is an arbitrary aggregate (sample) of objects from $M$, $S_i = (a_{i1}, \ldots, a_{in})$, $a_{ij} \in M_j$, $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n$, and $\tilde\alpha(S_i)$ is the information vector $(\alpha_{i1}, \ldots, \alpha_{il})$ composed of zeros and ones, where $\alpha_{ij} = 1$ if and only if the predicate $P_j(S_i) = (S_i \in K_j)$ is true, $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, l$. We will call $(S_1, \ldots, S_m)$ a teaching sample and the sequence (1.1) the initial information, and write $I_0(S_1, \ldots, S_m) = I_0$.

3. Let $\tilde K_j = \{S_1, \ldots, S_m\} \cap K_j$ and $C\tilde K_j = \{S_1, \ldots, S_m\} \setminus K_j$. Without loss of generality we will assume that $\tilde K_j = \{S_1, \ldots, S_{m_1}\}$ and $C\tilde K_j = \{S_{m_1+1}, \ldots, S_m\}$.
4. Let there be given for recognition the finite sample of objects $(S^1, \ldots, S^q) = \tilde S^q$; we will call it the working sample. Every algorithm $A$ of the calculation of estimates is defined (see [2]) in the form $A = A(\tilde\varepsilon, \tilde p, \tilde\gamma, \tilde x, \tilde c)$, where $\tilde\varepsilon, \tilde p, \tilde\gamma, \tilde x, \tilde c$ is the collection of parameters, $A = R \circ r(\tilde c)$, $R = R(\tilde\varepsilon, \tilde p, \tilde\gamma, \tilde x)$ is an operator, and $\tilde c = (c_1, c_2)$. The operator is applied to the initial information $I_0$ and to the working sample $\tilde S^q$, and its result is the matrix of estimates of the membership of the objects $S^i$ in the class $K_j$:

$$R(I_0, \tilde S^q) = \|\Gamma_{ij}\|_{q \times l} = \|\Gamma_j(S^i)\|_{q \times l};$$

$r(\tilde c)$ is a fixed decision rule.
We will consider the specification of the operators of the calculation of estimates with small changes in comparison with [2].

The system of supporting sets $\{\Omega_A\}$.

Case a. The aggregate $\{\Omega_A\} = \{i_1, \ldots, i_k\} \subset \{1, 2, \ldots, n\}$.

Case b. The aggregate $\{\Omega_A\}$ consists of all the one-element subsets of the set $\{1, 2, \ldots, n\}$.
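The proximity function $B$ ($B^x$). We establish a correspondence between every operator $R$ and a proximity function $B$ ($B^x$) defined as follows. Let $\tilde\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)$, the $\varepsilon_j$ be non-negative real numbers, and $S = (a_1, \ldots, a_n)$, $S' = (b_1, \ldots, b_n)$. Then in case a

$$B(S, S', \tilde\varepsilon) = \prod_{i \in \Omega_A} \theta[\varepsilon_i - \rho_i(a_i, b_i)],$$

in case b

$$B(S, S', \tilde\varepsilon, i) = \theta[\varepsilon_i - \rho_i(a_i, b_i)], \qquad \theta(z) = \begin{cases} 1, & z \ge 0, \\ 0, & z < 0. \end{cases}$$

Calculation of an estimate for the class $K_j$ with the given system of supporting sets. Let $\tilde\gamma = (\gamma_1, \ldots, \gamma_m)$, where $\gamma_j = \gamma(S_j) > 0$ is the weight of the object $S_j$, $j = 1, 2, \ldots, m$, and $\tilde p = (p_1, \ldots, p_n)$, where $p_j > 0$ is the weight of the $j$-th feature. We put $p(\Omega_A) = p_{i_1} + \ldots + p_{i_k}$ if $\Omega_A = \{i_1, \ldots, i_k\}$.

In case a

$$\Gamma_j^x(S) = \sum_{P_j(S_i) = x} \gamma(S_i)\, p(\Omega_A)\, B^x(S_i, S, \tilde\varepsilon).$$

In case b

$$\Gamma_j^x(S) = \sum_{P_j(S_i) = x} \sum_{\nu = 1}^{n} \gamma(S_i)\, p_\nu\, B^x(S_i, S, \tilde\varepsilon, \nu),$$

where $x = 0, 1$, $B^1 = B$, $B^0 = 1 - B$; then $\Gamma_j(S) = x_1 \Gamma_j^1 + x_0 \Gamma_j^0$, $x_i = 0, 1$. We will denote the class of operators of the calculation of estimates by $\{R\}$.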
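The decision rule $r(\tilde c)$. Here $\tilde c = ((c_{11}, c_{21}), \ldots, (c_{1l}, c_{2l}))$ and

$$r(\tilde c)(\|\Gamma_{ij}\|_{q \times l}) = \|\beta_{ij}\|_{q \times l}, \qquad i = 1, 2, \ldots, q, \quad j = 1, 2, \ldots, l,$$

where $\beta_{ij} = 1$ if $\Gamma_{ij} > c_{2j}$, $\beta_{ij} = 0$ if $\Gamma_{ij} < c_{1j}$, and $\beta_{ij} = \Delta$ otherwise.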
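The construction above (proximity function, estimate, decision rule) can be sketched in Python for case a. This is a minimal illustration under stated assumptions: the value of the step function at zero, the reading of the sum as running over teaching objects with $P_j(S_i) = x$, and all names (`theta`, `proximity`, `estimate`, `decision`) are hypothetical choices for the example rather than the paper's definitions.

```python
def theta(z):
    # Step function; taking theta(0) = 1 is an assumption.
    return 1 if z >= 0 else 0

def proximity(train_obj, obj, eps, support, metrics):
    # Case a: the objects are close if every supporting feature is within its threshold.
    return int(all(theta(eps[i] - metrics[i](train_obj[i], obj[i])) for i in support))

def estimate(obj, train, labels, x, gamma, p, eps, support, metrics):
    # Gamma_j^x(S): sum over teaching objects S_i whose predicate value P_j(S_i) equals x.
    p_omega = sum(p[i] for i in support)
    total = 0.0
    for s_i, label, weight in zip(train, labels, gamma):
        if label == x:
            b = proximity(s_i, obj, eps, support, metrics)
            total += weight * p_omega * (b if x == 1 else 1 - b)  # B^1 = B, B^0 = 1 - B
    return total

def decision(value, c1, c2):
    # Returns 1, 0, or None; None stands for the refusal symbol Delta.
    if value > c2:
        return 1
    if value < c1:
        return 0
    return None

metrics = [lambda a, b: abs(a - b)] * 2
train = [(0.0, 0.0), (1.0, 1.0)]
labels = [1, 0]  # values of the predicate P_j on the teaching objects
gamma_1 = estimate((0.1, 0.1), train, labels, 1, [1, 1], [1, 1], (0.3, 0.3), (0, 1), metrics)
print(gamma_1, decision(gamma_1, 0.5, 1.5))
```

With these toy values the object $(0.1, 0.1)$ is close to the single teaching object of the class, so the estimate exceeds the upper threshold and the rule answers 1.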
{a} = {R (T, P, 7, F j } the operations
of addition,
multiplication
by a scalar are defined.
Let RI, R2 E (R} and R, (lo, sq) =IIQ*IIq,l, R,(I,, the operators b . R 1,.R 1 t R2, RI * R2 as follows:
L?)
=IInzj’I/qv!;
(/:-RI) (lo, Sq)=lib*G1ll*x~: \R,-l-R:) (lo, sq)
(K,* Rz)
=Ila~j’+a,,2ilqxr,
(In, S*) =Ijaij’
.ail’IIqxl*
b is a scalar. We define
Let $N$ be the natural series. We construct the following set of operator polynomials: for every $k, L \in N$

$$U(L, k) = \Bigl\{ R \Bigm| R = \Bigl( \sum_{i=1}^{L} c_i R_i \Bigr)^{k};\ R_i \in \{R\},\ c_i \text{ are constants} \Bigr\},$$

and for every natural $t$

$$U^t(L, k) = \Bigl\{ R \Bigm| (\exists L_1) \ldots (\exists L_t) \Bigl[ R = \sum_{i=1}^{t} c_i R_i,\ \sum_{i=1}^{t} L_i = L \Bigr] \Bigr\},$$

where $R_i \in U(L_i, k)$, the $c_i$ are constants, $i = 1, 2, \ldots, t$. In particular, $U^1(L, k) = U(L, k)$. Then we put

$$\tilde U(R, L) = \bigcup_{t,\,k} U^t(L, k).$$

It is obvious that $\tilde U(R, L) \subset U(R)$, where $U(R)$ is the algebraic closure of the operators $\{R\}$.
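The three operations and the polynomials $(\sum c_i R_i)^k$ act elementwise on the matrices of estimates. The following Python sketch illustrates this under the assumption that an operator is modelled as a function of the initial information and the working sample returning a list-of-lists matrix; the helper names are hypothetical.

```python
def scale(b, R):
    # (b . R)(I0, S) = || b * a_ij ||
    return lambda info, sample: [[b * v for v in row] for row in R(info, sample)]

def add(R1, R2):
    # (R1 + R2)(I0, S) = || a1_ij + a2_ij ||
    return lambda info, sample: [
        [u + v for u, v in zip(r1, r2)]
        for r1, r2 in zip(R1(info, sample), R2(info, sample))
    ]

def mul(R1, R2):
    # (R1 . R2)(I0, S) = || a1_ij * a2_ij ||  (elementwise product, not a matrix product)
    return lambda info, sample: [
        [u * v for u, v in zip(r1, r2)]
        for r1, r2 in zip(R1(info, sample), R2(info, sample))
    ]

def power_of_combination(coeffs, operators, k):
    # Builds (sum_i c_i R_i)^k, which uses L = len(operators) base operators.
    combo = scale(coeffs[0], operators[0])
    for c, R in zip(coeffs[1:], operators[1:]):
        combo = add(combo, scale(c, R))
    result = combo
    for _ in range(k - 1):
        result = mul(result, combo)
    return result

# Two constant toy operators returning 1 x 2 matrices of estimates.
R1 = lambda info, sample: [[1.0, 2.0]]
R2 = lambda info, sample: [[3.0, 0.5]]
poly = power_of_combination([2.0, 1.0], [R1, R2], 2)
print(poly(None, None))
```

Because all operations are elementwise, the value of a polynomial on an object depends only on the estimates that the base operators assign to that same object.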
2. Formulation of the problem

Let the working sample $\tilde S^q = (S^1, \ldots, S^q)$ and the initial information $I_0 = I_0(S_1, \ldots, S_m)$ be specified. The problem $Z(I_0, \tilde S^q) = Z(I_0, S^1, \ldots, S^q)$ consists of the construction of an algorithm calculating the information matrix $\|\alpha_{ij}\|_{q \times l}$, where $\alpha_{ij} = P_j(S^i)$. The result of applying the algorithm $A$ of the calculation of estimates to the problem $Z(I_0, \tilde S^q)$ is $A(I_0, S^1, \ldots, S^q) = \|\beta_{ij}\|_{q \times l}$, a matrix which can be interpreted as follows: if $\beta_{ij} \in \{0, 1\}$, then $\beta_{ij}$ is the value of the predicate $P_j(S^i)$ calculated by the algorithm $A$, and if $\beta_{ij} = \Delta$, then the algorithm has not calculated the value of $P_j(S^i)$.

Definition 1. The algorithm $A$ is said to be correct for the problem $Z$ if $A(I_0, S^1, \ldots, S^q) = \|\alpha_{ij}\|_{q \times l}$, where $\|\alpha_{ij}\|_{q \times l}$ is the true information matrix for the sample $\tilde S^q$.

Definition 2. The class of algorithms $\{A\}$ is said to be correct for the set of problems $\{Z\}$ if for any problem $Z \in \{Z\}$ an algorithm $A \in \{A\}$ exists, correct for $Z$.
Let $U(A)$ (or $\tilde U(A, L)$) be a set of algorithms such that $A = R \circ r(\tilde c)$, where $R \in U(R)$ (or $R \in \tilde U(R, L)$). In [2] the class of all the problems $\{Z\}$ defined above, with negligible constraints, was considered, and the following theorem was proved.

Theorem 1. The algebraic closure $U(A)$ of algorithms of the calculation of estimates with the set of recognition algorithms $\{A\} = \{R \circ r(\tilde c)\}$ is correct on the set of problems $\{Z\}$.
In [3] it is proved that for every problem $Z = (I_0, \tilde S^q)$ the correct algorithm $A$ sought can be specified in the form (1), where

$$R = (c_1 + c_2) \sum_{i=1}^{q} \sum_{j=1}^{l} R^k(i, j),$$

and the $R(i, j)$ are operators of the linear closure $\mathcal{L}(R)$ of the class $\{R\}$.
We prove the following lemma.

Lemma 1. An algorithm of the form (1) belongs to the class $\tilde U(A, L)$ if

$$L = ql(2q + l - 3). \qquad (2.1)$$

Proof. In [2] the operator $R(i, j)$ is constructed by introducing the auxiliary operators

$$R_j = \sum_{t \ne j} R_{jt}, \quad R_{jt} \in \{R\}, \qquad R'_{ij} = \sum_{\nu \ne i} (R^1_{i\nu j} - R^2_{i\nu j}), \quad R^1_{i\nu j}, R^2_{i\nu j} \in \{R\}.$$

Then $R(i, j)$ was obtained by multiplying by a constant the operator $R_{ij} = R_j + R'_{ij}$. Here $R_j \in U(l - 1, 1)$, $R'_{ij} \in U(2(q - 1), 1)$, $R_{ij} \in U(2q + l - 3, 1)$. From this, $R^k(i, j) \in U(2q + l - 3, k)$. Consequently, $R \in U^{ql}(ql(2q + l - 3), k)$, or $R \in \tilde U(R, ql(2q + l - 3))$.
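The counting in the proof of Lemma 1 is simple arithmetic and can be restated in a few lines of Python. The function names are hypothetical; the counts themselves ($l - 1$, $2(q - 1)$ and $ql$ terms) are those quoted in the proof.

```python
def operators_per_term(q, l):
    # R_j is a sum of l - 1 operators, R'_ij involves 2(q - 1) operators,
    # so R_ij = R_j + R'_ij is a combination of 2q + l - 3 operators.
    return (l - 1) + 2 * (q - 1)

def lemma1_L(q, l):
    # The operator R is a sum of q * l powers R^k(i, j).
    return q * l * operators_per_term(q, l)

print(operators_per_term(3, 2), lemma1_L(3, 2))
```

For instance, a working sample of $q = 3$ objects and $l = 2$ classes gives $2q + l - 3 = 5$ operators per term and $L = 30$, which shows how $L$ grows with the length of the working sample.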
We will now search for a correct algorithm for the arbitrary problem $Z$ in the class of algorithms $\tilde U(A, L)$. We now consider the algebraic closure of the algorithms for calculating estimates $U(A)$. Every algorithm $A \in U(A)$ has the form $A = R \circ r(\tilde c)$, where

$$R = \sum_{(i_1, \ldots, i_p)} c(i_1, \ldots, i_p)\, R_{i_1}^{a_1} \cdots R_{i_p}^{a_p}, \qquad (2.2)$$

the $R_{i_\nu} \in \{R\}$, $a_\nu \in N$, $\nu = 1, 2, \ldots, p$, and the $c(i_1, \ldots, i_p)$ are constants.

Here $R_{i_\nu}$ for every $\nu = 1, 2, \ldots, p$ is defined by the collection of parameters $\tilde\varepsilon_\nu, \tilde p_\nu, \tilde\gamma_\nu, \tilde x_\nu$, and therefore in accordance with (2.2) each algorithm $A \in U(A)$ can be associated with a collection of parameters $\alpha$.

As a result, to each algorithm $A \in U(A)$ there corresponds a function $A(S, \alpha)$, which it calculates. If $S \in \{S\}$ is an admissible object, then $A(S, \alpha) = 1$ if and only if the value of the predicate $P_j(S)$ calculated by this algorithm equals unity. Below we will identify the algorithm $A$ and the function $A(S, \alpha)$.
In [4] the following was considered as the quality functional of the operation of the algorithm $A(S, \alpha)$:

$$R(\alpha) = \int [\omega - A(S, \alpha)]^2 \, dP(S, \omega), \qquad (2.3)$$

where $P(S, \omega) = P(S) P(\omega \mid S)$, $P(S)$ is the probability of the occurrence of the situation (object) $S$, and $P(\omega \mid S)$ is the probability that the object $S$ will be assigned to the class numbered $\omega \in \{1, 2, \ldots, l\}$. Then the functional $R(\alpha)$ defines the probability of incorrect classification by the algorithm, and the problem arises of finding the algorithm $A(S, \alpha)$ reducing the functional $R(\alpha)$ to a minimum. The function $P(S, \omega)$ is regarded as unknown, but the random and independent (teaching) sample (1.1) is given. Also given is the working sample $\tilde S^q = (S^1, \ldots, S^q)$, found from the distribution $P(S, \omega)$ in random and independent trials:

$$(S^1, \tilde\alpha(S^1)), \ldots, (S^q, \tilde\alpha(S^q)).$$

The information vectors $\tilde\alpha(S^i)$, $i = 1, 2, \ldots, q$, are regarded as unknown.
By Lemma 1, a correct algorithm recognizing the sample $\tilde S^q$ and using the teaching sequence of pairs (1.1) belongs to the class $\tilde U(A, L) \subset U(A)$ for an $L$ satisfying (2.1). However, this class $\tilde U(A, L)$ depends on the length $q$ of the sample; therefore there is no guarantee that we will find ourselves in the conditions of the deterministic formulation [4] of the teaching problem of pattern recognition, although it is obvious that the class $U(A)$ includes an algorithm classifying without error all the objects of $\tilde S^q$. In this connection we will consider the stochastic version [4] of the pattern recognition problem. In this case the problem reduces to finding an algorithm $A(S, \alpha)$ minimizing the functional

$$R_e(\alpha) = \frac{1}{m} \sum_{i=1}^{m} [\alpha_{ij} - A(S_i, \alpha)]^2, \qquad (2.4)$$

where $\tilde\alpha(S_i) = (\alpha_{i1}, \ldots, \alpha_{il})$, $i = 1, 2, \ldots, m$.

Let the minima of the functionals (2.3) and (2.4) be attained at the points $\alpha^*$ and $\alpha_0$ respectively. Then a measure of the proximity of $\alpha^*$ and $\alpha_0$ is $\rho(\alpha^*, \alpha_0) = R(\alpha_0) - R(\alpha^*)$, and it follows from (2) that closeness of the values can be guaranteed if $R_e(\alpha)$ approximates $R(\alpha)$ uniformly with respect to the parameter $\alpha$. We write

$$T_\alpha = \{(S, \omega) \mid A(S, \alpha) \ne \omega\}. \qquad (2.5)$$

Then the functional $R(\alpha)$ is, for every value of $\alpha$, the probability of the event (2.5):

$$R(\alpha) = P\{A(S, \alpha) \ne \omega\} = P(T_\alpha).$$

The empirical estimate $R_e(\alpha)$ equals the frequency of occurrence $\nu(T_\alpha)$ of this event in the given sample. We denote by $\mathcal{F}$ the class of events $T_\alpha$ for any $\alpha$. Then the uniform proximity of the functions $R(\alpha)$ and $R_e(\alpha)$ indicates the uniform proximity of the frequencies and probabilities of the events $T_\alpha$ in the class $\mathcal{F}$.
In [4] sufficient conditions were obtained for the uniform convergence of the frequencies of occurrence of events to their probabilities, independent of the distribution function. Before formulating them we give the necessary definitions.
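Definition 3. We define a classification of the sample $\tilde S^\delta$ as every representation of it as two disjoint sets $\tilde S_1$ and $\tilde S_2$, that is, $\tilde S_1 = \{S^{t_1}, \ldots, S^{t_a}\}$, $\tilde S_2 = \{S^{t_{a+1}}, \ldots, S^{t_\delta}\}$, where $\tilde S_1 \cup \tilde S_2 = \tilde S^\delta$, $\tilde S_1 \cap \tilde S_2 = \emptyset$.

We will denote a classification of the sample $\tilde S^\delta$ by $(\tilde S_1, \tilde S_2)$; $r(\tilde c)$ is a fixed decision rule defined by the constants $c_1$ and $c_2$ ($c_1 > c_2$).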
Definition 4. We will say that the operator $R$ (the algorithm $A(S, \alpha)$) realizes the classification $(\tilde S_1, \tilde S_2)$ of the sample $\tilde S^\delta$ in the class $K_j$ if, for any objects $S^t \in \tilde S_1$ and $S^{t'} \in \tilde S_2$, the estimates of the operator (the values of the algorithm $A(S, \alpha)$) satisfy the conditions

$$\Gamma_{tj} > c_1, \quad \Gamma_{t'j} < c_2 \qquad (A(S^t, \alpha) = 1, \quad A(S^{t'}, \alpha) = 0).$$

Let the class of algorithms $\{A(S, \alpha)\}$ and the class of events $\mathcal{F}(S, \alpha) = \{T_\alpha\}$ corresponding to it, and also the fixed sample $\tilde S^q = (S^1, \ldots, S^q)$, be given.
Definition 5. We will define the index of the system $\mathcal{F}(S, \alpha)$ with respect to the sample $\tilde S^q$ as the number of classifications of the sample $\tilde S^q$ realizable by algorithms of $\{A(S, \alpha)\}$, and denote it by

$$\Delta^{\mathcal{F}(S, \alpha)}(S^1, \ldots, S^q) = \Delta^{\mathcal{F}}(\tilde S^q).$$

Obviously, $\Delta^{\mathcal{F}}(\tilde S^q) \le 2^q$.
Definition 6. The growth function of the class $\{A(S, \alpha)\}$ is defined as

$$m^{\mathcal{F}}(q) = \max_{(S^1, \ldots, S^q)} \Delta^{\mathcal{F}(S, \alpha)}(S^1, \ldots, S^q).$$
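The index of a sample can be computed by brute force for a small family of rules. The Python sketch below is an illustration only: it uses one-dimensional threshold rules $F(x, a) = 1$ for $x \ge a$ as an assumed stand-in for the class of algorithms, and the function names are hypothetical.

```python
def index_of_sample(points, rules):
    # Number of distinct classifications of the points induced by the rules.
    return len({tuple(rule(x) for x in points) for rule in rules})

def threshold_rules(points):
    # Assumed family: F(x, a) = 1 if x >= a else 0.
    # One threshold at each point plus one above all points yields every realizable labelling.
    cuts = sorted(points) + [float("inf")]
    return [lambda x, a=a: 1 if x >= a else 0 for a in cuts]

points = [0.2, 0.7, 0.4]
print(index_of_sample(points, threshold_rules(points)))  # 4 of the 2**3 = 8 classifications
```

For $q$ distinct points this family realizes $q + 1$ classifications, so its growth function is $q + 1$, far below $2^q$.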
A theorem obtained in [4] implies an important characteristic of a class of algorithms. We will formulate it.
Theorem 2. The growth function $m^{\mathcal{F}}(q)$ is either identically equal to $2^q$, or it is majorized by the power function $1.5\, q^{n-1}/(n - 1)!$, where $n$ is the least number $q$ for which $m^{\mathcal{F}}(q) \ne 2^q$.
It follows from this theorem that the number $n - 1$ is a measure of the diversity of the class $\{A(S, \alpha)\}$.

Definition 7. We will call this number the capacity of the class $\{A(S, \alpha)\}$ and denote it by $\Delta_A(S, \alpha)$.

If $m^{\mathcal{F}}(q) = 2^q$ for every $q$, then the capacity is regarded as infinite; otherwise it is finite. The finite capacity of the class $\{A(S, \alpha)\}$ is then a sufficient condition of the uniform convergence of the frequencies of occurrence of the events of the class $\mathcal{F}(S, \alpha)$ to their probabilities.
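The dichotomy of Theorem 2 and the definition of capacity can be illustrated numerically. The sketch below is hedged: it assumes a toy growth function $m(q) = q + 1$ (the value obtained for one-dimensional threshold rules), and the function names are hypothetical.

```python
from math import factorial

def least_deviation_point(growth, q_max):
    # The least q for which the growth function differs from 2**q (None if not found).
    for q in range(1, q_max + 1):
        if growth(q) != 2 ** q:
            return q
    return None

def power_bound(q, n):
    # The majorant 1.5 * q**(n - 1) / (n - 1)! from Theorem 2.
    return 1.5 * q ** (n - 1) / factorial(n - 1)

growth = lambda q: q + 1            # assumed toy growth function
n = least_deviation_point(growth, 20)
capacity = n - 1
print(n, capacity, [growth(q) <= power_bound(q, n) for q in range(n, 8)])
```

Here $n = 2$, the capacity is 1, and the power function $1.5q$ dominates $q + 1$ from $q = n$ onwards.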
As the class $\{A(S, \alpha)\}$ we will consider the class of algorithms $\tilde U(A, L)$. Below we will prove the finite capacity of the class $\tilde U(A, L)$ for every $L \in N$.
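3. Auxiliary constructions

Let $\tilde S^\delta = (S^1, \ldots, S^\delta)$, $\tilde S_m = (S_1, \ldots, S_m)$, where $S_i = (a_{i1}, \ldots, a_{in})$, $i = 1, 2, \ldots, m$, and $S^j = (b_{j1}, \ldots, b_{jn})$, $j = 1, 2, \ldots, \delta$. Following [3], we adopt the notation $\rho_r(a_{ir}, b_{jr}) = \rho(r, i, j)$, and also $\tilde\delta = (1, 2, \ldots, \delta)$, $m_2 = m - m_1$.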
We consider the mappings aJ,A(I,,
X) = iL4, . . .
a),p’(Zo, 6 j = (tM,
h”) :
)
...,
k=l,
w’),
2, _. . , 17r,,
r=l,
2,. . .,
(3.1)
2,. . . , nz,,
p=l,
such that p (r,
k,
t,.,“) G . . . Gp(r,
k,
frbk!, (3.2)
As a result the mappings @rk and Qrp’ generate the permutation
iiulll
...
I/ad’
I
. . * &lb’,
,,
IIa~211,1xb=
)...,
!
We will call the sample r’s a sample of general type if all the inequalities mappings (3.1) are single-valued.
r:
(3.3)
u:bi
. . . . ./
ilU,ilIlnxb=l/
matrices of the collection
(3.2) are strict. Then the
Below we consider only samples of general type.
Let us consider the r-th rows of the matrices of (3.3): i (r, k) = (t,,‘,
. . . , ttbk),
(3.4)
We define the quantities
~~F,~~(~)=IfIei&-(,(r,k,t,i*)l, j=l
(3.5)
ei:,klC~)=rj[ B[E-p(r,k,a,j*)].
e(z)=l-0(Z).
j=l
We write z (r, k) = min[ft[::kj I
l(r,
p) = min[s,,‘,:,,
(z-l-1) =OVJ+~I? (3.6) (x+1)
x It is obvious that z (r, k), I (r, p) E (0, 1, . . . , S}.
=OVr=61.
Definition 8. We will call an $\tilde\varepsilon$-section of the sample $\tilde S^\delta$ (with respect to the initial information $I_0$) the pair of matrices of the form

$$F_1(I_0, \tilde\varepsilon, \tilde S^\delta) = \begin{Vmatrix} z(1, 1) & \ldots & z(1, m_1) \\ \vdots & & \vdots \\ z(n, 1) & \ldots & z(n, m_1) \end{Vmatrix}, \qquad F_2(I_0, \tilde\varepsilon, \tilde S^\delta) = \begin{Vmatrix} l(1, 1) & \ldots & l(1, m_2) \\ \vdots & & \vdots \\ l(n, 1) & \ldots & l(n, m_2) \end{Vmatrix}.$$

Let us fix the initial information $I_0$ and the sample $\tilde S^\delta$. We introduce a binary relation on the set of all the operators $\{R\} = \{R(\tilde\varepsilon, \tilde x)\}$.

Let a system of one-element supporting sets $\{\Omega\} = \{i\}$ be given.
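Definition 9. We will say that the operators $R_1(\tilde\varepsilon_1, \tilde x_1)$, $R_2(\tilde\varepsilon_2, \tilde x_2)$ with the system $\Omega$ are in the relation $R_1 \overset{1}{\sim} R_2$ if $\tilde x_1 = \tilde x_2$ and the $\tilde\varepsilon_1$- and $\tilde\varepsilon_2$-sections of the sample $\tilde S^\delta$ are identical, that is, $F_l(I_0, \tilde\varepsilon_1, \tilde S^\delta) = F_l(I_0, \tilde\varepsilon_2, \tilde S^\delta)$, $l = 1, 2$.

We note that this definition is independent of the values of $\tilde\gamma$, $\tilde p$. It is easy to see that this relation is an equivalence relation. We will call it a 1-equivalence and will denote by $G_1(R, \Omega)$ the class of 1-equivalence generated by the operator $R \in \{R\}$.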
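Let the supporting set $\Omega' = \{i_1, \ldots, i_k\}$ be given. We consider the submatrices of the matrices (3.3) obtained by deleting from (3.3) the rows with the numbers $r \notin \{i_1, \ldots, i_k\}$.

Definition 10. We will define an $\tilde\varepsilon$-section of the sample $\tilde S^\delta$ consistent with $\Omega'$ (an $\Omega'\tilde\varepsilon$-section) as a pair of matrices

$$F_1^{\Omega'}(I_0, \tilde\varepsilon, \tilde S^\delta) = \begin{Vmatrix} z(i_1, 1) & \ldots & z(i_1, m_1) \\ \vdots & & \vdots \\ z(i_k, 1) & \ldots & z(i_k, m_1) \end{Vmatrix}, \qquad F_2^{\Omega'}(I_0, \tilde\varepsilon, \tilde S^\delta) = \begin{Vmatrix} l(i_1, 1) & \ldots & l(i_1, m_2) \\ \vdots & & \vdots \\ l(i_k, 1) & \ldots & l(i_k, m_2) \end{Vmatrix}$$

and denote it by $(F_1^{\Omega'}(\tilde\varepsilon), F_2^{\Omega'}(\tilde\varepsilon))$. Then we will say that the operators $R_1 = R_1(\tilde\varepsilon_1, \tilde x_1)$, $R_2 = R_2(\tilde\varepsilon_2, \tilde x_2) \in \{R\}$ with the supporting set $\Omega'$ are 1-equivalent if $\tilde x_1 = \tilde x_2$ and the $\Omega'\tilde\varepsilon_1$-section and the $\Omega'\tilde\varepsilon_2$-section are identical:

$$F_l^{\Omega'}(\tilde\varepsilon_1) = F_l^{\Omega'}(\tilde\varepsilon_2), \qquad l = 1, 2.$$

We denote by $G_1(R, \Omega')$ the class of operators $R \in \{R\}$ that are 1-equivalent with respect to $\Omega'$.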
Therefore, the set of operators $\{R\}$ partitions into equivalence classes: let

$$\{R^1\} = \bigcup_{R} G_1(R, \Omega), \qquad \{R^2\} = \bigcup_{\Omega'} \bigcup_{R} G_1(R, \Omega');$$

then $\{R\} = \{R^1\} \cup \{R^2\}$. Let us regard $\chi[G_1(R, \Omega)]$ and $\chi[G_1(R, \Omega')]$ as the number of equivalence classes of the sets $\{R^1\}$ and $\{R^2\}$ respectively.

Theorem 3. The following inequalities hold:

$$\chi[G_1(R, \Omega)] \le 3(m\delta + 1)^n + 1, \qquad (3.7)$$

$$\chi[G_1(R, \Omega')] \le \sum_{k=1}^{n} C_n^k [3(m\delta + 1)^k + 1]. \qquad (3.8)$$
We consider the r-th row of (3.4) for k = 1,2, . . . , ml, p = 1,2, . . . , m2, r = 1, We form from them a sequence of all the elements of such rows. We write tyil’” =ali* and form for the sequence Proof:
2 >* *., n.
t(r, I), . . . ?f(r, m,),
a(r, I), . . . , a(r,
m,)
(3 -9)
the following substitution:
!
t:l
...
&).
t:.s . . .
t;;
. . . tz
...
Ap(fZn$). . .q&P..
. .(r (&L.
t::
. . . tz
* q(t::).
. . q&)
(3.10) ’
where cpis the injective mapping of the sequence (3.9), defined as follows. Let us write cp(t?:) =f~~~i~,\,, and let
i), (F(Ji,i)),
(2, i)=(T(li, (Z’, j’) = =
i-1)Fq2[,k?i+l)), (cF(h7
1
(@(k-J- 1: 11.i(k-I_ -
i=l,2 1, l)),
,...,
i=6,
6-1,
k=l,2,...,?77,
k=ly2,...,m-1.
Then PO=
P P, @Cl, I), vC.1, 1))=
p(r, Z’, j’)-p(r,
1, j)=
min
min
i=l,Z,...,rn 3=1,:, 6
{plp(r,
p(r, i, j),
Z”, j”)-p(r,
I, j>>O).
I”E(i 02 ...Im)\(l) j’*E(* II2 96)\(j)
We define the predicate: PT(e, 1, j, Z’, j’)=(e[p(r,
1,i), P(r, l’, i’)l),
(3.11)
(3.12)
Then, by (3.10), (3.11), (3.5), for every pair $\varepsilon', \varepsilon'' > 0$ on which the truth values of the predicate $P_r$ are identical for any fixed $l, j, l', j'$, the quantities (3.5) computed for $\varepsilon'$ and for $\varepsilon''$ coincide for any natural $x$. Consequently, by (3.6), the $\varepsilon'$- and $\varepsilon''$-sections of the sample $\tilde S^\delta$ are identical.

Let us consider

$$\rho_1 = \max_{i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, \delta} \rho(r, i, j)$$

and the predicate

$$\Psi_r(\varepsilon) = \bigvee_{l, j, l', j'} P_r(\varepsilon, l, j, l', j');$$

then $\Psi_r(\varepsilon) = (\varepsilon \in [\rho_0, \rho_1])$.

Let us arrange the set of elements $M_r = \{\rho(r, i, j) \mid i = 1, 2, \ldots, m,\ j = 1, 2, \ldots, \delta\}$ in increasing order. Then the elements of this increasing sequence form a partition of the segment $[\rho_0, \rho_1]$, each segment of the partition being identical with some segment of the form $[\rho(r, l, j), \rho(r, l', j')]$. Since there are three non-zero vectors $\tilde x$ and $|M_r| = m\delta$, $|\{\Omega\}| = n$, from Definitions 8 and 9 we obtain the estimate (3.7).

Since the number of different supporting sets of power $k$ is $C_n^k$, using the proof of (3.7) and Definition 10 we obtain (3.8).
Corollary. The total number $\chi$ of equivalence classes with respect to the systems of supporting sets $\{\Omega\}$ and $\{\Omega'\}$ is bounded above:

$$\chi \le 2^n [3(m\delta + 1)^n + 1]. \qquad (3.13)$$

Indeed, it follows from (3.7), (3.8) that

$$\chi = \chi[G_1(R, \Omega)] + \chi[G_1(R, \Omega')] \le 3(m\delta + 1)^n + 1 + (2^n - 1)[3(m\delta + 1)^n + 1] = 2^n [3(m\delta + 1)^n + 1].$$
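The bounds (3.7), (3.8) and (3.13) are easy to evaluate for concrete values of $m$, $\delta$ and $n$. The Python sketch below is only a numerical check; reading the exponent in (3.8) as $k$, the size of the supporting set, is an assumption forced by the damaged display, and the function names are hypothetical.

```python
from math import comb

def bound_3_7(m, delta, n):
    # 3 (m*delta + 1)^n + 1
    return 3 * (m * delta + 1) ** n + 1

def bound_3_8(m, delta, n):
    # sum over k = 1..n of C(n, k) * [3 (m*delta + 1)^k + 1]; the exponent k is assumed.
    return sum(comb(n, k) * (3 * (m * delta + 1) ** k + 1) for k in range(1, n + 1))

def bound_3_13(m, delta, n):
    # 2^n [3 (m*delta + 1)^n + 1]
    return 2 ** n * bound_3_7(m, delta, n)

m, delta, n = 2, 3, 2
print(bound_3_7(m, delta, n), bound_3_8(m, delta, n), bound_3_13(m, delta, n))
```

With these toy values the sum of the first two bounds stays below the third, in agreement with the chain of inequalities in the Corollary.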
4. The fundamental theorem

We introduce the notation $G_1(R)$ for the class of 1-equivalence with the arbitrary system of supporting sets $\{\Omega\}$ or $\{\Omega'\}$.
We associate with each object $S^t$ of the working sample $\tilde S^\delta$ and with each class $G_1(R)$ the Boolean vector

$$\Theta_t[G_1(R)] = (\Theta_t^1, \ldots, \Theta_t^m) \qquad (4.1)$$
as follows. Let R E Cl (R) and R = R (3. C&se 1. Cl (R) = Gr (R, 52). Then Oti=(@ir’, S’, s, r),
i=l,
2,. . . , m,, E,,‘=B(S,,
Case 2. G, (R) =G, (R,
. . . , @in’), i=l,
S’, e, r),
i=m,+l,.
Q’) , Q’= {ii, . . . , ik}, IThen for
2, . . . , m, . . , m,
r E a2
@~r’=~(s~
r=l,2,.
. . ,n.
e&‘=B(&,
S’, e, r),
i=m,+l,
&'=B(&, S':E,r)?
It
. . . , 02,
2:. . . , m,,
i=l,
o,,-‘.
2,...,mj,
i=l,
i.C!!’
G:‘=
’
sign (EG,:
)
i=m,+l,.
. , , m.
r=r?’
The definition of (4.1) is complete. We note that $\Theta_t^i \in \{0, 1\}$ in case 2, while $\Theta_t^i$ is an $n$-dimensional vector in case 1.

It follows from definition (4.1) that the number of different vectors does not exceed $2^{mn}$ in case 1 and $2^m$ in case 2.
Definition 11. We will say that the vectors $\Theta_t[G_1(R, \Omega)]$ and $\Theta_t[G_1(R, \Omega')]$ are characteristic vectors for the object $S^t$ with respect to the equivalence classes $G_1(R, \Omega)$ and $G_1(R, \Omega')$.

Lemma 2. For all objects $S^{t_1}$ and $S^{t_2}$ of the sample $\tilde S^\delta$ with identical characteristic vectors ($\Theta_{t_1}[G_1(R, \Omega)] = \Theta_{t_2}[G_1(R, \Omega)]$), for any operator $R_0 \in G_1(R)$ the estimates attained by the operator $R_0$ are identical: $\Gamma_{t_1 j} = \Gamma_{t_2 j}$.

Proof. For case 1, for $R_0 = R_0(\tilde\varepsilon, \tilde x)$, since
@,,[G,(R, Q)]=@,,[G,(R, G)]?thevectors a.eTBa(Sj, A"', e,n)),
\;B"(Sj, S':, E,I),....B"(Sj, S'z, ~3n)) by definition (3.4) are equal fori = 1, 2, . . . , m, a = 0, 1. Consequently, mt
n
yn S'I,T,l.)" r:,,=cl!2p,.*::B(S,. YC
For case 2, by definition (3.4), OLli=Ol~, i=l,
p&l (S,, I.!:-, T,1.) =r!,,'.
2, . . . , m, that is,
then
Similarly,
The lemma is proved.

We will now prove the fundamental theorem guaranteeing the finite capacity of the class of algorithms $\tilde U(A, L)$.

Theorem 4. For every $L \in N$ there exists a $\delta$ such that any sample $\tilde S^\delta$ has a classification $(\tilde S_1, \tilde S_2)$ not realizable by any operator $R \in \tilde U(R, L)$.

Proof. Let us assume the contrary: let there exist an $L$ such that for any $\delta$ every classification $(\tilde S_1, \tilde S_2)$ of the sample $\tilde S^\delta$ is realized by some operator $R \in \tilde U(R, L)$. By the definition of the class $\tilde U(R, L)$, every operator of this class is an operator polynomial of the form

(4.2)
We introduce the following equivalence relation ($L$-equivalence) on the set of all operators of $\tilde U(R, L)$. Also let an operator $R'$ of the class $\tilde U(R, L)$ be given. We will say that the operators $R$ and $R'$ are $L$-equivalent ($R \overset{L}{\sim} R'$) if $R_i \overset{1}{\sim} R_i'$ for every $i = 1, 2, \ldots, L$. As a result the class $\tilde U(R, L)$ is partitioned into equivalence classes, which we denote by $G_L(R)$.

This definition implies that every class is generated by an operator

$$R_0 = \sum_{i=1}^{L} R_i. \qquad (4.3)$$

By (3.13), the number of classes of $L$-equivalence does not exceed

$$\{2^n [3(m\delta + 1)^n + 1]\}^L < 2^{L(n+2)} (m\delta + 1)^{Ln}. \qquad (4.4)$$

Let us consider the arbitrary
class $G_L(R)$ generated by (4.3), where every operator $R_i \in G_1(R_i)$. Here for every $t = 1, 2, \ldots, \delta$ to the object $S^t$ there corresponds the Boolean vector

$$\Theta_t[G_L(R)] = (\Theta_t[G_1(R_1)], \ldots, \Theta_t[G_1(R_L)]),$$

which we call the generalized characteristic vector of the object with respect to the class $G_L(R)$. It is easy to see that the number of different generalized vectors for the class $G_L(R)$ does not exceed

$$2^{mnL}. \qquad (4.5)$$

We note that if $\Theta_{t_1}[G_L(R)] = \Theta_{t_2}[G_L(R)]$ for some $t_1 \ne t_2$, then $\Theta_{t_1}[G_1(R_i)] = \Theta_{t_2}[G_1(R_i)]$ for every $i = 1, 2, \ldots, L$, and accordingly the estimates attainable by the operators $R_i$ with respect to the class $K_j$ for the objects $S^{t_1}$ and $S^{t_2}$ are identical, that is, $\Gamma^i_{t_1 j} = \Gamma^i_{t_2 j}$. Therefore the estimates attainable by each operator of the form (4.2) for the given objects are equal, respectively,

(4.6)
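Let us consider the sequence of generalized characteristic vectors for the objects of the sample $\tilde S^\delta$. Then, by condition (4.5), this sequence contains not less than $[\delta / 2^{mnL}]$ identical vectors, that is, natural numbers $\tau_1, \ldots, \tau_{c_R} \in [1, \delta]$ exist such that $c_R \ge [\delta / 2^{mnL}]$ and $\Theta_{\tau_1}[G_L(R)] = \ldots = \Theta_{\tau_{c_R}}[G_L(R)]$. We introduce the following set of classifications of the sample $\tilde S^\delta$:

$$\mathfrak{M}[R]_L = \{(\tilde S_1, \tilde S_2) \mid \exists r\, \exists p\ (r, p = 1, 2, \ldots, c_R)\ [(S^{\tau_r} \in \tilde S_1)\ \&\ (S^{\tau_p} \in \tilde S_2)]\}.$$

We calculate the number of elements of this set:

$$|\mathfrak{M}[R]_L| = (2^{c_R} - 2)\, 2^{\delta - c_R}.$$

This equation follows from the fact that the number of all the subsets of the set $\mathfrak{M}_\tau = \{S^{\tau_1}, \ldots, S^{\tau_{c_R}}\}$ which are not empty and are not the whole set $\mathfrak{M}_\tau$ equals $2^{c_R} - 2$, and the number of subsets of the set $\tilde S^\delta \setminus \mathfrak{M}_\tau$ is $2^{\delta - c_R}$.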
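The count $(2^{c_R} - 2)\,2^{\delta - c_R}$ can be confirmed by enumeration for small values. The Python sketch below is an illustration with hypothetical names: it treats the first `c` objects as those with identical generalized characteristic vectors and counts the classifications that place at least one of them in each part.

```python
from itertools import product

def count_separating(delta, c):
    # Classifications (S_1, S_2) of delta objects in which the first c objects
    # are not all placed in the same part.
    count = 0
    for assignment in product((1, 2), repeat=delta):
        if len(set(assignment[:c])) == 2:
            count += 1
    return count

delta, c = 5, 3
print(count_separating(delta, c), (2 ** c - 2) * 2 ** (delta - c))
```

Both printed numbers agree, since a non-empty proper subset of the `c` objects is chosen for the first part and the remaining `delta - c` objects are distributed freely.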
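It follows from the assumption made at the beginning of the proof that each of the $2^\delta$ classifications of the sample is realized by some operator of an appropriate class of $L$-equivalence. Then it follows from (4.4) that there exist not less than $[2^\delta / (2^{L(n+2)} (m\delta + 1)^{Ln})]$ classifications realizable by operators $R \in \tilde U(R, L)$ belonging to the same class $G_L(R_0)$. Let $\mathfrak{N}$ be the set of all those classifications. We will prove that $\mathfrak{N} \cap \mathfrak{M}[R_0]_L \ne \emptyset$. Indeed, the number of classifications of the sample $\tilde S^\delta$ not belonging to the set $\mathfrak{M}[R_0]_L$ does not exceed

$$2^\delta - (2^{c_{R_0}} - 2)\, 2^{\delta - c_{R_0}} = 2^{\delta - c_{R_0} + 1}.$$

On the other hand,

$$|\mathfrak{N}| \ge \frac{2^\delta}{2^{L(n+2)} (m\delta + 1)^{Ln}} > 2^{\delta - c_{R_0} + 1}.$$

This inequality is satisfied for fairly large $\delta$, more precisely, when

$$2^{c_{R_0} - 1} \ge 2^{[\delta / 2^{mnL}] - 1} > 2^{L(n+2)} (m\delta + 1)^{Ln}.$$

The latter is true if $[\delta / 2^{mnL}] - 1 > L(n + 2) + Ln \log_2 (m\delta + 1)$, or

$$\delta > 2^{mnL} [Ln \log_2 (m\delta + 1) + L(n + 2) + 1]. \qquad (4.7)$$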
But this implies that a classification $(\tilde S_1, \tilde S_2) \in \mathfrak{M}[R_0]_L$ exists belonging to $\mathfrak{N}$. By the definition of $\mathfrak{M}[R_0]_L$, at least two objects $S^{\tau_r} \in \tilde S_1$ and $S^{\tau_p} \in \tilde S_2$ exist for which $\Theta_{\tau_r}[G_L(R_0)] = \Theta_{\tau_p}[G_L(R_0)]$, but by (4.6) the estimates $\Gamma_{\tau_r j}$ and $\Gamma_{\tau_p j}$ attainable by any operator of the class $G_L(R_0)$ are identical, that is, $\Gamma_{\tau_r j} = \Gamma_{\tau_p j}$. But this is impossible, since $\tilde S_1 \cap \tilde S_2 = \emptyset$. The theorem is proved.

Remark. Let us consider the set of operator polynomials $\tilde U'(R, L)$ of the form (4.2), where for all $i = 1, 2, \ldots, L$ the operator $R_i$ has the system of supporting sets $\{\Omega\} = \{i\}$. Then in the formulation of Theorem 4 instead of the class $\tilde U(R, L)$ we can consider the class $\tilde U'(R, L)$, and the estimate (4.7) is replaced by the following:

$$\delta > 2^{mnL} [Ln \log_2 (m\delta + 1) + 2L + 1]. \qquad (4.8)$$

Indeed, by (3.7) the number of classes of $L$-equivalence does not exceed $[3(m\delta + 1)^n + 1]^L < 2^{2L} (m\delta + 1)^{Ln}$. Moreover, the estimate (4.5) remains unchanged. Then, solving the inequality $2^{(\delta / 2^{mnL}) - 1} > 2^{2L} (m\delta + 1)^{Ln}$ for $\delta$, we obtain (4.8). Let $\mathcal{F}(S, \alpha)$ be the class of events (2.5) corresponding to the class of algorithms $\tilde U(A, L)$; then Theorem 4 implies the estimate of the capacity $\Delta_A(S, \alpha)$ of the class $\tilde U(A, L)$.

Corollary. The following inequality holds:

(4.9)
We will now obtain an upper bound for $\Delta_A(S, \alpha)$ in a more convenient form.
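Conditions (4.7) and (4.8) are implicit in $\delta$, since $\delta$ also appears under the logarithm. The Python sketch below finds the smallest integer $\delta$ satisfying either condition by doubling followed by bisection; it is an illustration only, and the function names are hypothetical.

```python
from math import log2

def rhs_4_7(delta, m, n, L):
    # Right-hand side of (4.7).
    return 2 ** (m * n * L) * (L * n * log2(m * delta + 1) + L * (n + 2) + 1)

def rhs_4_8(delta, m, n, L):
    # Right-hand side of (4.8).
    return 2 ** (m * n * L) * (L * n * log2(m * delta + 1) + 2 * L + 1)

def smallest_delta(rhs, m, n, L):
    # delta - rhs(delta) is convex and negative at delta = 1, so the condition
    # delta > rhs(delta) switches from false to true exactly once.
    hi = 1
    while hi <= rhs(hi, m, n, L):
        hi *= 2
    lo = hi // 2  # the condition fails at lo (or lo == 0)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mid > rhs(mid, m, n, L):
            hi = mid
        else:
            lo = mid
    return hi

print(smallest_delta(rhs_4_7, 1, 1, 1), smallest_delta(rhs_4_8, 1, 1, 1))
```

Even for $m = n = L = 1$ the required length is in the teens, and the factor $2^{mnL}$ makes it grow exponentially with the problem dimensions, which motivates looking for a more convenient closed-form bound.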
Theorem 5 The capacity of the class of algorithms
Let a working sample ?? be given such that 6 = 4mnL+*. In particular,
hoof:
and also 2L+1~2mnL+1, everym,nandL=1,2
then Ln [log,
LnS1<2mnL;
(2 log? 6-I-l) +2mnL+1~4(mnL)‘+ 2mnL’4
3 (A, L) has an upper bound:
,...
rns t 1 < 62,
(n&i-1)+1]+2L+l
Il~nL+1~3”‘“L+i.
The last inequality
. Using also the estimate (4.9) and the equation Imd+*
holds for =
$\times\, 2^{mnL}$, we obtain (4.8). We will apply this result to estimate the length of the sample sufficient for teaching in the recognition problem. Following [4], we will assume that the lengths of the working and teaching samples are the same, that is, $\tilde S^\delta = (S^1, \ldots, S^\delta)$ and $\tilde S_\delta = (S_1, \ldots, S_\delta)$. Also let $\tilde S^\delta$ and $\tilde S_\delta$ have been obtained in a series of independent trials, and let a class of algorithms $\{A(S, \alpha)\}$ be given with the capacity $\Delta_A(S, \alpha)$.
Theorem 6 (see [4]). For the error frequencies of the recognition algorithm $A \in \{A(S, \alpha)\}$ on the teaching and working samples to differ, with probability greater than $1 - \eta$, by not more than $\varepsilon$, it is sufficient that the length of the samples satisfy the condition
6= 2&d&
1_ ln(r)/5)
a)
e”
[
- (C/2)
,
--In+ I
6, CT. a)
Theorems 5 and 6 imply Theorem 7.

Theorem 7. For every $0 < \eta < 1$ and $\varepsilon > 0$, in order that, with probability greater than $1 - \eta$, the error frequencies of the recognition algorithm $A \in \tilde U(A, L)$ on the teaching and working samples should differ by less than $\varepsilon$, it is sufficient that the length of the teaching sample $\tilde S_\delta$ satisfy the condition
In conclusion the author acknowledges his indebtedness to Yu. I. Zhuravlev for suggesting the problem and for his interest.

Translated by J. Berry
REFERENCES

1. ZHURAVLEV, Yu. I. Correct algebras on sets of incorrect (heuristic) algorithms. I. Kibernetika, No. 4, 14-21, 1977.
2. ZHURAVLEV, Yu. I. Correct algebras on sets of incorrect (heuristic) algorithms. II. Kibernetika, No. 6, 21-27, 1977.
3. ZHURAVLEV, Yu. I. Correct algebras on sets of incorrect (heuristic) algorithms. III. Kibernetika, No. 2, 35-43, 1978.
4. VAPNIK, V. N. and CHERVONENKIS, A. Ya. Theory of pattern recognition (Teoriya raspoznavaniya obrazov), Nauka, Moscow, 1974.