Correct algebras of bounded capacity on the set of algorithms of the calculation of estimates


U.S.S.R. Comput. Maths. Math. Phys. Vol. 21, No. 5, pp. 199-215, 1981. Printed in Great Britain. 0041-5553/81/050199-17$07.50/0 © 1982. Pergamon Press Ltd.

CORRECT ALGEBRAS OF BOUNDED CAPACITY ON THE SET OF ALGORITHMS OF THE CALCULATION OF ESTIMATES*

V. L. MATROSOV

Moscow

(Received 8 January 1981)

THE PROBLEM of satisfying the sufficient conditions of uniform convergence of the frequency of occurrence of the errors of the recognition algorithms on the teaching sample to their probabilities is considered for sufficiently wide subclasses $U(L, A)$ of the algebraic closure of the algorithms of the calculation of estimates. Upper bounds of the capacity of the classes $U(L, A)$ and of the rate of convergence of the algorithms minimizing a functional of the empirical risk are obtained.

Introduction

There are a large number of papers devoted to the study of models of recognition algorithms with piecewise-linear separating surfaces and of the calculation of estimates, for which the completeness problem, first formulated by Yu. I. Zhuravlev, has been investigated. A model is said to be complete if, for every finite sample $S^1, \ldots, S^q$, it contains a (correct) algorithm classifying all the objects $S^1, \ldots, S^q$ without errors. Models of algorithms satisfying the completeness condition have been called correct. In [1, 2] the correctness of the linear closure $L(A)$ or the algebraic closure $U(A)$ of this kind of algorithm was proved. Moreover, as follows from [3], an algorithm correct for an arbitrary sample (problem) can be defined in the form $A = R \circ r(\tilde c)$, where

$$R = (c_1 + c_2) \sum_{i=1}^{q} \sum_{j=1}^{l} \beta_{ij} R^k(i, j) \qquad (1)$$

is an operator, $r(\tilde c)$ is a decision rule, $R^k(i, j)$ is the $k$-th power of the operator $R(i, j)$ of the calculation of estimates, and $\beta_{ij} \in \{0, 1\}$.

*Zh. vychisl. Mat. mat. Fiz., 21, 5, 1276-1291, 1981.

Therefore, the search for a correct algorithm can now be carried out in the family of parametric algorithms $\{A\} = \{A(\tilde x, \tilde p, \tilde\gamma, \tilde\varepsilon, \tilde c, k, \tilde\beta)\}$, for which $A = R \circ r(\tilde c)$ and $R$ has the form (1).

The problem of finding an algorithm recognizing a given sample, in a general formulation, was considered in [4]. In this approach, in the class of recognition algorithms (rules) $F(x, \alpha)$ ($\alpha$ is the collection of parameters), it is necessary to find an algorithm minimizing the functional

$$R(\alpha) = \int [\omega - F(x, \alpha)]^2 \, dP(x, \omega), \qquad \omega \in \Omega_0,$$

if the function $P(x, \omega)$ is unknown. The functional $R(\alpha)$ characterizes the probability of incorrect classification by the algorithm $F(x, \alpha)$, where for each pair $(x, \omega)$ in the space $(X, \Omega_0)$, $x$ specifies the description of the object and $\omega$ indicates the class to which the object $x$ really belongs. It is assumed that the number of classes is small, that is, $\Omega_0 = \{1, 2, \ldots, k\}$, and that a unique correspondence is given between the set of parameters $\{\alpha\}$ of some space $A$ and the class of recognition algorithms $\{F(x, \alpha)\}$, where $F(x, \alpha)$ takes the same discrete values as $\omega$.

Let the minimum of the functional $R(\alpha)$ be attained at some point $\alpha = \alpha_0$ of the parameter space $A$. Then as an approximation we take the value $\alpha^*$ which minimizes the empirical risk functional

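$$R_{\mathrm{emp}}(\alpha) = \frac{1}{\delta} \sum_{i=1}^{\delta} [\omega_i - F(x_i, \alpha)]^2,$$

where $(x_1, \omega_1), \ldots, (x_\delta, \omega_\delta)$ is a sample obtained in a series of independent trials with the unchanged distribution $P(x, \omega)$. Here $R_{\mathrm{emp}}(\alpha)$ equals the frequency $\nu(T_\alpha)$ of occurrence of the events $T_\alpha = \{(x, \omega) \mid F(x, \alpha) \ne \omega\}$. If $S = \{T_\alpha \mid \alpha \in A\}$, then the formula

$$R(\alpha^*) - R(\alpha_0) \le 2 \sup_{T_\alpha \in S} |\nu(T_\alpha) - P(T_\alpha)| \qquad (2)$$

holds. Therefore, closeness of the values of $\alpha_0$ and $\alpha^*$ will be guaranteed if the function $R_{\mathrm{emp}}(\alpha)$ approximates $R(\alpha)$ uniformly with respect to the parameter $\alpha$, that is, for any $\varepsilon > 0$

$$P\{\sup_{\alpha} |R_{\mathrm{emp}}(\alpha) - R(\alpha)| > \varepsilon\} \to 0, \qquad \delta \to \infty.$$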

Sufficient conditions for the uniform convergence of the frequencies of occurrence of the events to their probabilities were obtained in [4]. It turned out that the conditions obtained are independent of the distribution $P(x, \omega)$, which is usually unknown, and are determined only by the class of events $S$.
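As a minimal sketch of empirical risk minimization, assume a hypothetical one-parameter family of threshold rules $F(x, \alpha)$ on the real line with two classes. This family, the sample and the parameter grid are illustrative assumptions and are not taken from the paper. The code computes the empirical risk as the frequency $\nu(T_\alpha)$ of the error events and selects a minimizing parameter from a finite grid.

```python
def threshold_rule(x, alpha):
    """Hypothetical rule F(x, alpha): class 1 if x exceeds the threshold, else class 0."""
    return 1 if x > alpha else 0


def empirical_risk(sample, alpha):
    """Frequency of the error event T_alpha = {(x, omega): F(x, alpha) != omega}."""
    errors = sum(1 for x, omega in sample if threshold_rule(x, alpha) != omega)
    return errors / len(sample)


def minimize_empirical_risk(sample, grid):
    """Return the first grid value with the smallest empirical risk."""
    return min(grid, key=lambda alpha: empirical_risk(sample, alpha))


# Illustrative sample of pairs (x, omega); the values are made up.
sample = [(0.1, 0), (0.4, 0), (0.5, 1), (0.6, 0), (0.8, 1), (0.9, 1)]
grid = [0.0, 0.25, 0.45, 0.55, 0.7, 1.0]

alpha_star = minimize_empirical_risk(sample, grid)
risk_star = empirical_risk(sample, alpha_star)
```

The quantity returned by `empirical_risk` is the frequency that the uniform convergence conditions compare with the unknown probability $P(T_\alpha)$.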

The derivation of sufficient conditions is connected with the concept of the growth function $m^S(q)$, defined as the maximum number of ways of partitioning $q$ points into two classes by the recognition algorithms $F(x, \alpha)$. If $m^S(q)$ is majorized by some power function, then uniform convergence of the frequencies of occurrence of the events to their probabilities takes place. In the case of the family $\{A\} = \{A(\tilde x, \tilde p, \tilde\gamma, \tilde\varepsilon, \tilde c, k, \tilde\beta)\}$ these conditions imply the following assertion: for every $\varepsilon > 0$ there exists an (effectively specified) $\delta_0$ sufficient for the procedure minimizing the empirical risk functional to choose, with probability close to unity, a recognition algorithm for which the error frequency on the sample $\tilde S^{\delta_0} = (S^1, \ldots, S^{\delta_0})$ differs from the quality $R(\alpha)$ of the best algorithm in $\{A\}$ recognizing $\tilde S^{\delta_0}$ by not more than $\varepsilon$.

1. Notation, definitions

1. Let us consider the set $M = M_1 \times \ldots \times M_n$ of admissible objects, where each set $M_i$ is provided with a semimetric $\rho_i$, for which all the axioms of a metric hold except for the triangle axiom. Therefore, $M = \{(a_1, \ldots, a_n)\} = \{S\}$, where $a_i \in M_i$, $i = 1, 2, \ldots, n$, and the following partition of the set $M$ into classes exists: $M = \bigcup_{j=1}^{l} K_j$.

2. Let the finite sequence of pairs

$$(S_1, \tilde\alpha(S_1)), \ldots, (S_m, \tilde\alpha(S_m)) \qquad (1.1)$$

be specified, where $(S_1, \ldots, S_m)$ is an arbitrary aggregate (sample) of objects from $M$ and $S_i = (a_{i1}, \ldots, a_{in})$, $a_{ij} \in M_j$, $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n$, and $\tilde\alpha(S_i)$ is the information vector $(\alpha_{i1}, \ldots, \alpha_{il})$ composed of zeros and units, where $\alpha_{ij} = 1$ if and only if the predicate $P_j(S_i) = (S_i \in K_j)$ is true, $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, l$. We will call $(S_1, \ldots, S_m)$ a teaching sample, and the sequence (1.1) the initial information, and write $I_0(S_1, \ldots, S_m) = I_0$.

3. Let $\tilde K_j = \{S_1, \ldots, S_m\} \cap K_j$ and $C\tilde K_j = \{S_1, \ldots, S_m\} \setminus K_j$. Without loss of generality we will assume that $\tilde K_j = \{S_1, \ldots, S_{m_1}\}$ and $C\tilde K_j = \{S_{m_1+1}, \ldots, S_m\}$.

4. Let there be given for recognition the finite sample of objects $(S^1, \ldots, S^q) = \tilde S^q$; we will call it the working sample. Every algorithm $A$ of the calculation of estimates is defined (see [2]) in the form $A = A(\tilde x, \tilde p, \tilde\gamma, \tilde\varepsilon, \tilde c)$, where $\tilde x, \tilde p, \tilde\gamma, \tilde\varepsilon$ is the collection of parameters, $A = R \circ r(\tilde c)$, $R = R(\tilde x, \tilde p, \tilde\gamma, \tilde\varepsilon)$ is an operator, $\tilde c = (c_1, c_2)$; the operator is applied to the initial information $I_0$ and to the working sample $\tilde S^q$, and its result is the matrix of estimates of the membership of the objects $S^i$ in the classes $K_j$:

$$R(I_0, \tilde S^q) = \|\Gamma_{ij}\|_{q \times l} = \|\Gamma_j(S^i)\|_{q \times l};$$

$r(\tilde c)$ is a fixed decision rule.

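We will consider the specification of the operators of the calculation of estimates with small changes in comparison with [2].

The system of supporting sets $\{\Omega\}$. Case a. The aggregate $\{\Omega\} = \{i_1, \ldots, i_k\} \subseteq \{1, 2, \ldots, n\}$. Case b. The aggregate $\{\Omega\}$ consists of all the one-element subsets of the set $\{1, 2, \ldots, n\}$.

The proximity function $B$. We establish a correspondence between every operator $R$ and a proximity function $B$ defined as follows. Let $\tilde\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n)$, the $\varepsilon_j$ be non-negative real numbers and $S = (a_1, \ldots, a_n)$, $S' = (b_1, \ldots, b_n)$. Then in case a

$$B(S, S', \tilde\varepsilon) = \prod_{i \in \Omega} \theta[\varepsilon_i - \rho_i(a_i, b_i)],$$

in case b

$$B(S, S', \tilde\varepsilon, i) = \theta[\varepsilon_i - \rho_i(a_i, b_i)],$$

where $\theta(z)$ is the unit step (threshold) function.

Calculation of an estimate for the class $K_j$ with the given system of supporting sets. Let $\tilde\gamma = (\gamma_1, \ldots, \gamma_m)$, where $\gamma_i = \gamma(S_i) > 0$ is the weight of the object $S_i$, $i = 1, 2, \ldots, m$, and $\tilde p = (p_1, \ldots, p_n)$, where $p_i > 0$ is the weight of the $i$-th feature. We put $p(\Omega) = p_{i_1} + \ldots + p_{i_k}$ if $\Omega = \{i_1, \ldots, i_k\}$.

In case a

$$\Gamma_j^{\alpha}(S) = \sum_{P_j(S_i) = \alpha} \gamma(S_i)\, p(\Omega)\, B^{\alpha}(S_i, S, \tilde\varepsilon).$$

In case b

$$\Gamma_j^{\alpha}(S) = \sum_{P_j(S_i) = \alpha} \sum_{\nu = 1}^{n} \gamma(S_i)\, p_\nu\, B^{\alpha}(S_i, S, \tilde\varepsilon, \nu),$$

$\alpha = 0, 1$, $B^1 = B$, $B^0 = 1 - B$; then $\Gamma_j(S) = x_1 \Gamma_j^1 + x_0 \Gamma_j^0$, $x_i = 0, 1$. We will denote the class of operators of the calculation of estimates by $\{R\}$.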
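The calculation of an estimate in case b (one-element supporting sets) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the use of a single semimetric `rho` for all features, the toy data, and the convention that two values are "close" when their distance does not exceed the threshold (that is, $\theta(0) = 1$) are choices made here and are not fixed by the text.

```python
def proximity(a, b, eps, rho):
    """B = theta(eps - rho(a, b)); assumes theta(0) = 1, so 'close' means rho <= eps."""
    return 1 if rho(a, b) <= eps else 0


def estimate(obj, teaching, in_class, gamma, p, eps, rho, x1=1, x0=0):
    """Estimate Gamma_j(S) = x1 * Gamma^1 + x0 * Gamma^0 for one class K_j (case b).

    teaching: list of teaching objects (tuples of feature values)
    in_class: in_class[i] is True when teaching[i] belongs to K_j
    gamma:    object weights, p: feature weights, eps: feature thresholds
    """
    total = 0.0
    for s_i, member, g in zip(teaching, in_class, gamma):
        for v in range(len(obj)):
            b = proximity(s_i[v], obj[v], eps[v], rho)
            if member:
                total += x1 * g * p[v] * b        # term of Gamma^1, B^1 = B
            else:
                total += x0 * g * p[v] * (1 - b)  # term of Gamma^0, B^0 = 1 - B
    return total


# Toy data (hypothetical): two features, absolute difference as the semimetric.
teaching = [(0.0, 1.0), (0.2, 0.9), (1.0, 0.0)]
in_class = [True, True, False]
gamma = [1, 1, 1]
p = [1, 2]
eps = [0.3, 0.3]
rho = lambda a, b: abs(a - b)

obj = (0.1, 1.0)
gamma_1_only = estimate(obj, teaching, in_class, gamma, p, eps, rho, x1=1, x0=0)
gamma_both = estimate(obj, teaching, in_class, gamma, p, eps, rho, x1=1, x0=1)
```

Setting `x0=0` keeps only the contribution of teaching objects that belong to the class; setting `x0=1` also rewards dissimilarity to objects outside the class.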

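The decision rule $r(\tilde c)$. Here $\tilde c = ((c_{11}, c_{21}), \ldots, (c_{1l}, c_{2l}))$, $0 < c_{1j} < c_{2j}$, and

$$r(\tilde c)(\|\Gamma_{ij}\|_{q \times l}) = \|\beta_{ij}\|_{q \times l}, \qquad i = 1, 2, \ldots, q, \quad j = 1, 2, \ldots, l,$$

where $\beta_{ij} = 1$ if $\Gamma_{ij} > c_{2j}$, $\beta_{ij} = 0$ if $\Gamma_{ij} < c_{1j}$, and $\beta_{ij} = \Delta$ otherwise.

In the set of operators $\{R\} = \{R(\tilde x, \tilde p, \tilde\gamma, \tilde\varepsilon)\}$ the operations of addition, multiplication and multiplication by a scalar are defined. Let $R_1, R_2 \in \{R\}$ and $R_1(I_0, \tilde S^q) = \|a^1_{ij}\|_{q \times l}$, $R_2(I_0, \tilde S^q) = \|a^2_{ij}\|_{q \times l}$; $b$ is a scalar. We define the operators $b \cdot R_1$, $R_1 + R_2$, $R_1 \cdot R_2$ as follows:

$$(b \cdot R_1)(I_0, \tilde S^q) = \|b \cdot a^1_{ij}\|_{q \times l},$$

$$(R_1 + R_2)(I_0, \tilde S^q) = \|a^1_{ij} + a^2_{ij}\|_{q \times l},$$

$$(R_1 \cdot R_2)(I_0, \tilde S^q) = \|a^1_{ij} \cdot a^2_{ij}\|_{q \times l}.$$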
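The three operations act entrywise on the matrices of estimates, and the decision rule is then applied entry by entry. A minimal sketch, assuming that matrices are stored as lists of rows, that a single pair of thresholds is used for all classes, and that `None` stands for the refusal symbol $\Delta$ (the function names and the numbers are illustrative):

```python
UNDECIDED = None  # plays the role of the refusal symbol in the decision rule


def scale(b, r):
    """Matrix of estimates of the operator b * R."""
    return [[b * a for a in row] for row in r]


def add(r1, r2):
    """Matrix of estimates of the operator R1 + R2 (entrywise sum)."""
    return [[a1 + a2 for a1, a2 in zip(row1, row2)] for row1, row2 in zip(r1, r2)]


def multiply(r1, r2):
    """Matrix of estimates of the operator R1 * R2 (entrywise product)."""
    return [[a1 * a2 for a1, a2 in zip(row1, row2)] for row1, row2 in zip(r1, r2)]


def decide(estimates, c1, c2):
    """Threshold decision rule: 1 above c2, 0 below c1, UNDECIDED in between."""
    return [[1 if g > c2 else (0 if g < c1 else UNDECIDED) for g in row]
            for row in estimates]


# Hypothetical matrices of estimates for q = 2 objects and l = 2 classes.
r1 = [[1, 4], [2, 0]]
r2 = [[3, 1], [2, 5]]

# The operator polynomial 2 * R1 + R1 * R2, evaluated entrywise.
combined = add(scale(2, r1), multiply(r1, r2))
answers = decide(combined, c1=4, c2=6)
```

Because every operation is entrywise, any polynomial in the operators can be evaluated directly on the matrices of estimates of its arguments.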

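Let $N$ be the natural series. We construct the following set of operator polynomials: for every $k, L \in N$

$$\tilde U(L, k) = \Big\{ R \;\Big|\; R = \Big( \sum_{i=1}^{L} c_i R_i \Big)^k;\ R_i \in \{R\},\ c_i \text{ are constants} \Big\},$$

and for every natural $t$

$$\tilde U_t(L, k) = \Big\{ R \;\Big|\; (\exists l_1) \ldots (\exists l_t) \Big[ R = \sum_{i=1}^{t} \tilde c_i R_i,\ \sum_{i=1}^{t} l_i = L \Big] \Big\},$$

where $R_i \in \tilde U(l_i, k)$, the $\tilde c_i$ are constants, $i = 1, 2, \ldots, t$. In particular, $\tilde U_1(L, k) = \tilde U(L, k)$. Then we put

$$\tilde U(R, L) = \bigcup_{t, k \in N} \tilde U_t(L, k).$$

It is obvious that $\tilde U(R, L) \subset U(R)$, where $U(R)$ is the algebraic closure of the operators $\{R\}$.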

2. Formulation of the problem

Let the working sample $\tilde S^q = (S^1, \ldots, S^q)$ and the initial information $I_0 = I_0(S_1, \ldots, S_m)$ be specified. The problem $Z(I_0, \tilde S^q) = Z(I_0, S^1, \ldots, S^q)$ consists of the construction of an algorithm calculating the information matrix $\|\alpha_{ij}\|_{q \times l}$, where $\alpha_{ij} = P_j(S^i)$. The result of applying the algorithm $A$ of the calculation of estimates to the problem $Z(I_0, \tilde S^q)$ is $A(I_0, S^1, \ldots, S^q) = \|\beta_{ij}\|_{q \times l}$, a matrix which can be interpreted as follows: if $\beta_{ij} \in \{0, 1\}$, then $\beta_{ij}$ is the value of the predicate $P_j(S^i)$ calculated by the algorithm $A$, and if $\beta_{ij} = \Delta$, then the algorithm has not calculated the value of $P_j(S^i)$.

Definition 1. The algorithm $A$ is said to be correct for the problem $Z$, if $A(I_0, S^1, \ldots, S^q) = \|\alpha_{ij}\|_{q \times l}$, where $\|\alpha_{ij}\|_{q \times l}$ is the true information matrix for the sample $\tilde S^q$.

Definition 2. The class of algorithms $\{A\}$ is said to be correct for the set of problems $\{Z\}$, if for any problem $Z \in \{Z\}$ an algorithm $A \in \{A\}$ exists, correct for $Z$.

Let $U(A)$ (or $\tilde U(A, L)$) be a set of algorithms such that $A = R \circ r(\tilde c)$, where $R \in U(R)$ (or $R \in \tilde U(R, L)$). In [2] the class of all the problems $\{Z\}$ defined above, with negligible constraints, was considered, and the following theorem was proved.

Theorem 1

The algebraic closure $U(A)$ of algorithms of the calculation of estimates with the set of recognition algorithms $\{A\} = \{R \circ r(\tilde c)\}$ is correct on the set of problems $\{Z\}$.

In [3] it is proved that for every problem $Z = (I_0, \tilde S^q)$ the correct algorithm $A$ sought can be specified in the form (1), where

$$R = (c_1 + c_2) \sum_{i=1}^{q} \sum_{j=1}^{l} \beta_{ij} R^k(i, j),$$

and the $R(i, j)$ are operators of the linear closure $L(R)$ of the class $\{R\}$.

We prove the following lemma.

Lemma 1

An algorithm of the form (1) belongs to the class $\tilde U(A, L)$ if

$$L = ql(2q + l - 3). \qquad (2.1)$$

Proof. In [2] the operator $R(i, j)$ is constructed by introducing the auxiliary operators

$$R_j = \sum_{t \ne j} R_{jt}, \qquad R_{jt} \in \{R\},$$

$$R_{ij} = \sum_{\nu \ne i} (R^1_{\nu j} - R^2_{\nu j}), \qquad R^1_{\nu j},\ R^2_{\nu j} \in \{R\}.$$

Then $R(i, j)$ was obtained by multiplying by a constant the operator $R^{ij} = R_j + R_{ij}$. Consequently, $R_j \in \tilde U(l - 1, 1)$, $R_{ij} \in \tilde U(2(q - 1), 1)$, $R^{ij} \in \tilde U(2q + l - 3, 1)$. From this $R^k(i, j) \in \tilde U(2q + l - 3, k)$ and $R \in \tilde U_{ql}(ql(2q + l - 3), k)$, or $R \in \tilde U(R, ql(2q + l - 3))$.
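The count in Lemma 1 can be checked with a few lines of arithmetic. The sketch below assumes the reading $L = ql(2q + l - 3)$ of (2.1): each of the $ql$ operators $R(i, j)$ is built from $(l - 1) + 2(q - 1)$ elementary operators. The function names are illustrative.

```python
def operators_per_term(q, l):
    """Elementary operators in one R(i, j): (l - 1) from R_j plus 2(q - 1) from R_ij."""
    return (l - 1) + 2 * (q - 1)


def sufficient_length(q, l):
    """Value of L in (2.1): q * l terms, each built from 2q + l - 3 operators."""
    return q * l * operators_per_term(q, l)


# L grows roughly quadratically with the length q of the working sample.
lengths = {q: sufficient_length(q, l=2) for q in (10, 100)}
```

For two classes this gives $L = 380$ for $q = 10$ and $L = 39800$ for $q = 100$, which shows how quickly the class needed for a guaranteed correct algorithm grows with the sample length.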

We will now search for a correct algorithm for the arbitrary problem $Z$ in the class of algorithms $\tilde U(A, L)$. We now consider the algebraic closure $U(A)$ of the algorithms for calculating estimates. Every algorithm $A \in U(A)$ has the form $A = R \circ r(\tilde c)$, where

$$R = \sum_{(i_1, \ldots, i_p)} c(i_1, \ldots, i_p)\, R_{i_1}^{a_1} \cdots R_{i_p}^{a_p}, \qquad (2.2)$$

the $R_{i_\nu} \in \{R\}$, $a_\nu \in N$, $\nu = 1, 2, \ldots, p$, and the $c(i_1, \ldots, i_p)$ are constants.

Here $R_{i_\nu}$ for every $\nu = 1, 2, \ldots, p$ is defined by the collection of parameters $\tilde x_{i_\nu}, \tilde p_{i_\nu}, \tilde\gamma_{i_\nu}, \tilde\varepsilon_{i_\nu}$, and therefore in accordance with (2.2) each algorithm $A \in U(A)$ can be associated with a collection of parameters $\alpha$.

As a result to each algorithm $A \in U(A)$ there corresponds a function $A(S, \alpha)$, which it calculates. If $S \in \{S\}$ is an admissible object, then $A(S, \alpha) = j$ if and only if the value of the predicate $P_j(S)$ calculated by this algorithm equals unity. Below we will identify the algorithm $A$ and the function $A(S, \alpha)$.

In [4] the following was considered as the quality functional of the operation of the algorithm $A(S, \alpha)$:

$$R(\alpha) = \int \theta\{[\omega - A(S, \alpha)]^2\}\, dP(S, \omega), \qquad (2.3)$$

where $P(S, \omega) = P(S) P(\omega \mid S)$, $P(S)$ is the probability of the occurrence of the situation (object) $S$, and $P(\omega \mid S)$ is the probability that the object $S$ will be assigned to the class numbered $\omega \in \{1, 2, \ldots, l\}$. Then the functional $R(\alpha)$ defines the probability of incorrect classification by the algorithm, and the problem arises of finding the algorithm $A(S, \alpha)$ reducing the functional $R(\alpha)$ to a minimum.

The function $P(S, \omega)$ is regarded as unknown, but the random and independent (teaching) sample (1.1) is given. Also given is the working sample $\tilde S^q = (S^1, \ldots, S^q)$, found from the distribution $P(S, \omega)$ in random and independent trials:

$$(S^1, \tilde\alpha(S^1)), \ldots, (S^q, \tilde\alpha(S^q)).$$

The information vectors $\tilde\alpha(S^i)$, $i = 1, 2, \ldots, q$, are regarded as unknown.

By Lemma 1, a correct algorithm recognizing the sample $\tilde S^q$ and using the teaching sequence of pairs (1.1) belongs to the class $\tilde U(A, L) \subset U(A)$ for an $L$ satisfying (2.1). However, this class $\tilde U(A, L)$ depends on the length $q$ of the sample; therefore there is no guarantee that we will find ourselves in the conditions of the deterministic formulation [4] of the teaching problem of pattern recognition, although it is obvious that the class $U(A)$ includes an algorithm classifying without error all the objects of $\tilde S^q$. In this connection we will consider the stochastic version [4] of the pattern recognition problem. In this case the problem reduces to finding an algorithm $A(S, \alpha)$ minimizing the functional

$$R_{\mathrm{emp}}(\alpha) = \frac{1}{\delta} \sum_{i=1}^{\delta} [\omega_i - A(S^i, \alpha)]^2, \qquad (2.4)$$

where $\tilde\alpha(S^i) = (\alpha_{i1}, \ldots, \alpha_{il})$, $\alpha_{i\omega_i} = 1$.

Let the minima of the functionals (2.3) and (2.4) be attained at the points $\alpha^*$ and $\alpha_0$ respectively. Then a measure of the proximity of $\alpha^*$ and $\alpha_0$ is $\rho(\alpha^*, \alpha_0) = R(\alpha_0) - R(\alpha^*)$, and it follows from (2) that closeness of the values can be guaranteed if $R_{\mathrm{emp}}(\alpha)$ approximates $R(\alpha)$ uniformly with respect to the parameter $\alpha$. We write

$$T_\alpha = \{(S, \omega) \mid A(S, \alpha) \ne \omega\}. \qquad (2.5)$$

Then the functional $R(\alpha)$ is for every value of $\alpha$ the probability of the event (2.5):

$$R(\alpha) = P\{A(S, \alpha) \ne \omega\} = P(T_\alpha).$$

The empirical estimate $R_{\mathrm{emp}}(\alpha)$ equals the frequency of occurrence $\nu(T_\alpha)$ of this event in the given sample. We denote by $\mathcal{T}$ the class of events $T_\alpha$ for any $\alpha$. Then the uniform proximity of the functions $R(\alpha)$ and $R_{\mathrm{emp}}(\alpha)$ indicates the uniform proximity of the frequencies and probabilities of the events $T_\alpha$ in the class $\mathcal{T}$.

In [4] sufficient conditions were obtained for the uniform convergence of the frequencies of occurrence of events to their probabilities, independent of the distribution function. Before formulating them we give the necessary definitions.

Definition 3. We define a classification of the sample $\tilde S^\delta$ as every representation of it as two disjoint sets $\tilde S_1$ and $\tilde S_2$, that is, $\tilde S_1 = \{S^{t_1}, \ldots, S^{t_a}\}$, $\tilde S_2 = \{S^{t_{a+1}}, \ldots, S^{t_\delta}\}$, where $\tilde S_1 \cup \tilde S_2 = \tilde S^\delta$, $\tilde S_1 \cap \tilde S_2 = \emptyset$.

We will denote a classification of the sample $\tilde S^\delta$ by $(\tilde S_1, \tilde S_2)$; $r(\tilde c)$ is a fixed decision rule defined by the constants $c_1$ and $c_2$ ($c_1 > c_2$).

Definition 4. We will say that the operator $R$ (the algorithm $A(S, \alpha)$) realizes the classification $(\tilde S_1, \tilde S_2)$ of the sample $\tilde S^\delta$ in the class $K_j$, if for any objects $S^t \in \tilde S_1$ and $S^{t'} \in \tilde S_2$ the estimates of the operator (the values of the algorithm $A(S, \alpha)$) satisfy the conditions

$$\Gamma_{tj} > c_1, \quad \Gamma_{t'j} \le c_2 \qquad (A(S^t, \alpha) = j, \quad A(S^{t'}, \alpha) \ne j).$$

Let the class of algorithms $\{A(S, \alpha)\}$ and the class of events $\mathcal{T}(S, \alpha) = \{T_\alpha\}$ corresponding to it, and also the fixed sample $\tilde S^q = (S^1, \ldots, S^q)$, be given.

Definition 5. We will define the index of the system $\mathcal{T}(S, \alpha)$ with respect to the sample $\tilde S^q$ as the number of classifications of the sample $\tilde S^q$ realizable by algorithms of $\{A(S, \alpha)\}$, and denote it by the number

$$\Delta^{\mathcal{T}(S, \alpha)}(S^1, \ldots, S^q) = \Delta^{\mathcal{T}}(\tilde S^q).$$

Obviously, $\Delta^{\mathcal{T}}(\tilde S^q) \le 2^q$.

Definition 6. The growth function of the class $\{A(S, \alpha)\}$ is defined as

$$m^{\mathcal{T}}(q) = \max_{(S^1, \ldots, S^q)} \Delta^{\mathcal{T}(S, \alpha)}(S^1, \ldots, S^q).$$

A theorem obtained in [4] implies an important characteristic of a class of algorithms. We will formulate it.

Theorem 2

The growth function $m^{\mathcal{T}}(q)$ is either identically equal to $2^q$, or it is majorized by the power function $1.5\, q^{n-1}/(n-1)!$, where $n$ is the least number $q$ for which $m^{\mathcal{T}}(q) \ne 2^q$.

It follows from this theorem that the number $n - 1$ is a measure of the diversity of the class $\{A(S, \alpha)\}$.

Definition 7. We will call this number the capacity of the class $\{A(S, \alpha)\}$ and denote it by $\Delta_A(S, \alpha)$.

If $m^{\mathcal{T}}(q) = 2^q$ for every $q$, then the capacity is regarded as infinite, otherwise it is finite. Then the finite capacity of the class $\{A(S, \alpha)\}$ is a sufficient condition of the uniform convergence of the frequencies of occurrence of the events of the class $\mathcal{T}(S, \alpha)$ to their probabilities.
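The notions of growth function and capacity can be illustrated on a very small family. The sketch below assumes a hypothetical class of one-sided threshold rules on the real line (it is not one of the classes studied in the paper), counts the realizable classifications by brute force, and compares the count with the power bound $1.5\, q^{n-1}/(n-1)!$ of Theorem 2.

```python
from math import factorial


def realizable_classifications(points):
    """All labelings of `points` produced by the rules 'x > t' for a real threshold t."""
    xs = sorted(points)
    # One threshold below all points, one between each pair of neighbours, one above all.
    thresholds = [xs[0] - 1.0]
    thresholds += [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    thresholds += [xs[-1] + 1.0]
    return {tuple(1 if x > t else 0 for x in points) for t in thresholds}


def growth_function(q):
    """Growth function of the threshold family.

    For this family every set of q distinct points yields the same number of
    labelings, so one configuration is enough to obtain the maximum.
    """
    return len(realizable_classifications([float(i) for i in range(q)]))


def power_bound(q, n):
    """The majorant 1.5 * q**(n - 1) / (n - 1)! from Theorem 2."""
    return 1.5 * q ** (n - 1) / factorial(n - 1)


# Least q for which the growth function differs from 2**q.
n = next(q for q in range(1, 20) if growth_function(q) != 2 ** q)
capacity = n - 1
```

For this family the growth function equals $q + 1$, so $n = 2$, the capacity is 1, and the bound $1.5\, q$ holds for every $q \ge 2$.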

As the class $\{A(S, \alpha)\}$ we will consider the class of algorithms $\tilde U(A, L)$. Below we will prove the finite capacity of the class $\tilde U(A, L)$ for every $L \in N$.

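3. Auxiliary constructions

Let $\tilde S^\delta = (S^1, \ldots, S^\delta)$ and $\tilde S_m = (S_1, \ldots, S_m)$, where $S_i = (a_{i1}, \ldots, a_{in})$, $i = 1, 2, \ldots, m$, and $S^j = (b_{j1}, \ldots, b_{jn})$, $j = 1, 2, \ldots, \delta$. Following [3], we adopt the notation $\rho_r(a_{ir}, b_{jr}) = \rho(r, i, j)$, and also $\tilde\delta = (1, 2, \ldots, \delta)$, $m_2 = m - m_1$.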

We consider the mappings

$$\Phi_{rk}(I_0, \tilde\delta) = (t^k_{r1}, \ldots, t^k_{r\delta}), \qquad \Phi'_{rp}(I_0, \tilde\delta) = (a^p_{r1}, \ldots, a^p_{r\delta}), \qquad (3.1)$$

$k = 1, 2, \ldots, m_1$, $p = 1, 2, \ldots, m_2$, $r = 1, 2, \ldots, n$, such that

$$\rho(r, k, t^k_{r1}) \le \ldots \le \rho(r, k, t^k_{r\delta}). \qquad (3.2)$$

As a result the mappings $\Phi_{rk}$ and $\Phi'_{rp}$ generate the collection of permutation matrices

$$\|t^k_{rj}\|_{n \times \delta}, \quad k = 1, \ldots, m_1; \qquad \|a^p_{rj}\|_{n \times \delta}, \quad p = 1, \ldots, m_2. \qquad (3.3)$$

We will call the sample $\tilde S^\delta$ a sample of general type if all the inequalities (3.2) are strict. Then the mappings (3.1) are single-valued.

Below we consider only samples of general type.

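Let us consider the $r$-th rows of the matrices of (3.3):

$$\tilde t(r, k) = (t^k_{r1}, \ldots, t^k_{r\delta}), \qquad \tilde a(r, p) = (a^p_{r1}, \ldots, a^p_{r\delta}). \qquad (3.4)$$

We define the quantities

$$\theta_{r,k}(x) = \prod_{j=1}^{x} \theta[\varepsilon_r - \rho(r, k, t^k_{rj})], \qquad \bar\theta_{r,p}(x) = \prod_{j=1}^{x} \bar\theta[\varepsilon_r - \rho(r, p, a^p_{rj})], \qquad \bar\theta(z) = 1 - \theta(z). \qquad (3.5)$$

We write

$$z(r, k) = \min\{x \mid \theta_{r,k}(x + 1) = 0 \ \vee\ x = \delta\}, \qquad l(r, p) = \min\{x \mid \bar\theta_{r,p}(x + 1) = 0 \ \vee\ x = \delta\}. \qquad (3.6)$$

It is obvious that $z(r, k),\ l(r, p) \in \{0, 1, \ldots, \delta\}$.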

Definition 8. We will call an $\tilde\varepsilon$-section of the sample $\tilde S^\delta$ (with respect to the initial information $I_0$) the pair of matrices of the form

$$F_1(I_0, \tilde\varepsilon, \tilde S^\delta) = \begin{pmatrix} z(1, 1) & \ldots & z(1, m_1) \\ \vdots & & \vdots \\ z(n, 1) & \ldots & z(n, m_1) \end{pmatrix}, \qquad F_2(I_0, \tilde\varepsilon, \tilde S^\delta) = \begin{pmatrix} l(1, 1) & \ldots & l(1, m_2) \\ \vdots & & \vdots \\ l(n, 1) & \ldots & l(n, m_2) \end{pmatrix}.$$

Let us fix the initial information $I_0$ and the sample $\tilde S^\delta$. We introduce a binary relation on the set of all the operators $\{R\} = \{R(\tilde\varepsilon, \tilde x)\}$.

Let a system of one-element supporting sets $\{\Omega\} = \{i\}$ be given.

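Definition 9. We will say that the operators $R_1(\tilde\varepsilon_1, \tilde x_1)$, $R_2(\tilde\varepsilon_2, \tilde x_2)$ with the system $\Omega$ are in the relation $R_1 \overset{1}{\sim} R_2$, if $\tilde x_1 = \tilde x_2$ and the $\tilde\varepsilon_1$- and $\tilde\varepsilon_2$-sections of the sample $\tilde S^\delta$ are identical, that is, $F_l(I_0, \tilde\varepsilon_1, \tilde S^\delta) = F_l(I_0, \tilde\varepsilon_2, \tilde S^\delta)$, $l = 1, 2$.

We note that this definition is independent of the values of $\tilde\gamma$, $\tilde p$. It is easy to see that this relation is an equivalence relation. We will call it a 1-equivalence and will denote by $G_1(R, \Omega)$ the class of 1-equivalence generated by the operator $R \in \{R\}$.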

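Let the supporting set $\Omega' = \{i_1, \ldots, i_k\}$ be given. We consider the submatrices of the matrices (3.3) obtained by deleting from (3.3) the rows with the numbers $r \notin \{i_1, \ldots, i_k\}$.

Definition 10. We will define an $\tilde\varepsilon$-section of the sample $\tilde S^\delta$ consistent with $\Omega'$ (an $\Omega'\tilde\varepsilon$-section) as a pair of matrices

$$F_1^{\Omega'}(I_0, \tilde\varepsilon, \tilde S^\delta) = \begin{pmatrix} z(i_1, 1) & \ldots & z(i_1, m_1) \\ \vdots & & \vdots \\ z(i_k, 1) & \ldots & z(i_k, m_1) \end{pmatrix}, \qquad F_2^{\Omega'}(I_0, \tilde\varepsilon, \tilde S^\delta) = \begin{pmatrix} l(i_1, 1) & \ldots & l(i_1, m_2) \\ \vdots & & \vdots \\ l(i_k, 1) & \ldots & l(i_k, m_2) \end{pmatrix}$$

and denote it by $(F_1^{\Omega'}(\tilde\varepsilon), F_2^{\Omega'}(\tilde\varepsilon))$. Then we will say that the operators $R_1 = R_1(\tilde\varepsilon_1, \tilde x_1)$, $R_2 = R_2(\tilde\varepsilon_2, \tilde x_2) \in \{R\}$ with the supporting set $\Omega'$ are 1-equivalent, if $\tilde x_1 = \tilde x_2$ and the $\Omega'\tilde\varepsilon_1$-section and the $\Omega'\tilde\varepsilon_2$-section are identical:

$$F_l^{\Omega'}(\tilde\varepsilon_1) = F_l^{\Omega'}(\tilde\varepsilon_2), \qquad l = 1, 2.$$

We denote by $G_1(R, \Omega')$ the class of operators $R \in \{R\}$, 1-equivalent with respect to $\Omega'$.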

Therefore, the set of operators $\{R\}$ partitions into equivalence classes: let

$$\{R^1\} = \bigcup_{R} G_1(R, \Omega), \qquad \{R^2\} = \bigcup_{\Omega'} \bigcup_{R} G_1(R, \Omega');$$

then $\{R\} = \{R^1\} \cup \{R^2\}$. Let us regard $\chi[G_1(R, \Omega)]$ and $\chi[G_1(R, \Omega')]$ as the number of equivalence classes of the sets $\{R^1\}$ and $\{R^2\}$ respectively.

Theorem 3

The following inequalities hold:

$$\chi[G_1(R, \Omega)] \le 3(m\delta + 1)^n + 1, \qquad (3.7)$$

$$\chi[G_1(R, \Omega')] \le \sum_{k=1}^{n} C_n^k\, [3(m\delta + 1) + 1]. \qquad (3.8)$$

We consider the r-th row of (3.4) for k = 1,2, . . . , ml, p = 1,2, . . . , m2, r = 1, We form from them a sequence of all the elements of such rows. We write tyil’” =ali* and form for the sequence Proof:

2 >* *., n.

t(r, I), . . . ?f(r, m,),

a(r, I), . . . , a(r,

m,)

(3 -9)

the following substitution:

!

t:l

...

&).

t:.s . . .

t;;

. . . tz

...

Ap(fZn$). . .q&P..

. .(r (&L.

t::

. . . tz

* q(t::).

. . q&)

(3.10) ’

where cpis the injective mapping of the sequence (3.9), defined as follows. Let us write cp(t?:) =f~~~i~,\,, and let

i), (F(Ji,i)),

(2, i)=(T(li, (Z’, j’) = =

i-1)Fq2[,k?i+l)), (cF(h7

1

(@(k-J- 1: 11.i(k-I_ -

i=l,2 1, l)),

,...,

i=6,

6-1,

k=l,2,...,?77,

k=ly2,...,m-1.

Then PO=

P P, @Cl, I), vC.1, 1))=

p(r, Z’, j’)-p(r,

1, j)=

min

min

i=l,Z,...,rn 3=1,:, 6

{plp(r,

p(r, i, j),

Z”, j”)-p(r,

I, j>>O).

I”E(i 02 ...Im)\(l) j’*E(* II2 96)\(j)

We define the predicate: PT(e, 1, j, Z’, j’)=(e[p(r,

1,i), P(r, l’, i’)l),

(3.11)

(3.12)


Then, by (3.10)

(3.1 l), (3.5), for every pair e’, E” > 0, on which the values of the truth of the

predicate 9, for any fixed 1,j, I, 1’ are identical, the equations

are satisfied for any natural x. Consequently, identical.

by (3.6), the E’ and E” sections of the sample ssb are

Let us consider

$$\rho_1 = \max_{i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, \delta} \rho(r, i, j)$$

and the predicate

$$\Psi_r(\varepsilon) = \bigvee_{l, j, l', j'} P_r(\varepsilon, l, j, l', j');$$

then $\Psi_r(\varepsilon) = (\varepsilon \in [\rho_0, \rho_1])$.

Let us arrange the set of elements $M_r = \{\rho(r, i, j) \mid i = 1, 2, \ldots, m,\ j = 1, 2, \ldots, \delta\}$ in increasing order. Then the elements of this increasing sequence form a partition of the segment $[\rho_0, \rho_1]$, each segment of the partition being identical with some segment of the form $[\rho(r, l, j), \rho(r, l', j')]$. Since there are three non-zero vectors $\tilde x$ and $|M_r| = m\delta$, $|\{\Omega\}| = n$, then from definitions 8, 9 we obtain the estimate (3.7).

Since the number of different supporting sets of power $k$ is $C_n^k$, then, using the proof of (3.7) and definition 10, we obtain (3.8).
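The role of the partition in this argument can be illustrated numerically: as a threshold $\varepsilon$ varies, the pattern of proximity values changes only when $\varepsilon$ crosses one of the distances, so a set of $m\delta$ distances admits at most $m\delta + 1$ distinct patterns. The sketch below is an illustration under assumptions made here (positive distances, the convention that "close" means distance not exceeding $\varepsilon$, made-up numbers); it is not the construction used in the proof.

```python
def threshold_pattern(distances, eps):
    """Proximity values theta(eps - d) for every distance d, assuming theta(0) = 1."""
    return tuple(1 if d <= eps else 0 for d in distances)


def distinct_patterns(distances):
    """All patterns obtainable as eps ranges over the non-negative reals.

    It is enough to try one value of eps below the smallest distance and
    then each distinct distance itself (distances are assumed positive).
    """
    values = sorted(set(distances))
    candidates = [values[0] / 2] + values
    return {threshold_pattern(distances, eps) for eps in candidates}


# Hypothetical distances rho(r, i, j) for one feature r.
distances = [0.3, 0.7, 0.3, 1.2]
patterns = distinct_patterns(distances)
```

Any intermediate threshold, for example `eps = 0.5`, reproduces one of the patterns already found, which is one way to read the factor $(m\delta + 1)^n$ in (3.7) when $n$ thresholds are chosen independently.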

Corollary. The total number $\chi$ of equivalence classes with respect to the systems of supporting sets $\{\Omega\}$ and $\{\Omega'\}$ is bounded above:

$$\chi \le 2^n [3(m\delta + 1)^n + 1]. \qquad (3.13)$$

Indeed, it follows from (3.7), (3.8) that

$$\chi = \chi[G_1(R, \Omega)] + \chi[G_1(R, \Omega')] \le 3(m\delta + 1)^n + 1 + (2^n - 1)[3(m\delta + 1) + 1] \le 2^n [3(m\delta + 1)^n + 1].$$

4. The fundamental theorem

We introduce the notation $G_1(R)$ for the class of 1-equivalence with the arbitrary system of supporting sets $\{\Omega\}$ or $\{\Omega'\}$.

We associate with each object $S^t$ of the working sample $\tilde S^\delta$ and with each class $G_1(R)$ the Boolean vector

$$\Theta_t[G_1(R)] = (\Theta_{t1}, \ldots, \Theta_{tm}) \qquad (4.1)$$

as follows. Let R E Cl (R) and R = R (3. C&se 1. Cl (R) = Gr (R, 52). Then Oti=(@ir’, S’, s, r),

i=l,

2,. . . , m,, E,,‘=B(S,,

Case 2. G, (R) =G, (R,

. . . , @in’), i=l,

S’, e, r),

i=m,+l,.

Q’) , Q’= {ii, . . . , ik}, IThen for

2, . . . , m, . . , m,

r E a2

@~r’=~(s~

r=l,2,.

. . ,n.


e&‘=B(&,

S’, e, r),

i=m,+l,

&'=B(&, S':E,r)?

It

. . . , 02,

2:. . . , m,,

i=l,

o,,-‘.

2,...,mj,

i=l,

i.C!!’

G:‘=



sign (EG,:

)

i=m,+l,.

. , , m.

r=r?’

The definition of (4.1) is complete. We note that $\Theta_{ti} \in \{0, 1\}$ in case 2, while $\Theta_{ti}$ is an $n$-dimensional vector in case 1.

It follows from definition (4.1) that the number of different vectors does not exceed $2^{mn}$ in case 1 and $2^m$ in case 2.

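Definition 11. We will say that the vectors $\Theta_t[G_1(R, \Omega)]$ and $\Theta_t[G_1(R, \Omega')]$ are characteristic vectors for the object $S^t$ with respect to the equivalence classes $G_1(R, \Omega)$ and $G_1(R, \Omega')$.

Lemma 2. For all objects $S^{t_1}$ and $S^{t_2}$ of the sample $\tilde S^\delta$ with identical characteristic vectors ($\Theta_{t_1}[G_1(R)] = \Theta_{t_2}[G_1(R)]$), for any operator $R_0 \in G_1(R)$ the estimates attained by the operator $R_0$ are identical: $\Gamma_{t_1 j} = \Gamma_{t_2 j}$.

Proof. For case 1, for $R_0 = R_0(\tilde\varepsilon)$, since $\Theta_{t_1}[G_1(R, \Omega)] = \Theta_{t_2}[G_1(R, \Omega)]$, the vectors

$$(B^\alpha(S_i, S^{t_1}, \tilde\varepsilon, 1), \ldots, B^\alpha(S_i, S^{t_1}, \tilde\varepsilon, n)), \qquad (B^\alpha(S_i, S^{t_2}, \tilde\varepsilon, 1), \ldots, B^\alpha(S_i, S^{t_2}, \tilde\varepsilon, n))$$

by definition (3.4) are equal for $i = 1, 2, \ldots, m$, $\alpha = 0, 1$. Consequently,

$$\Gamma^1_{t_1 j} = \sum_{i=1}^{m_1} \sum_{\nu=1}^{n} \gamma_i p_\nu B(S_i, S^{t_1}, \tilde\varepsilon, \nu) = \sum_{i=1}^{m_1} \sum_{\nu=1}^{n} \gamma_i p_\nu B(S_i, S^{t_2}, \tilde\varepsilon, \nu) = \Gamma^1_{t_2 j}.$$

For case 2, by definition (3.4), $\Theta_{t_1 i} = \Theta_{t_2 i}$, $i = 1, 2, \ldots, m$, and the equality $\Gamma_{t_1 j} = \Gamma_{t_2 j}$ follows similarly.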

The lemma is proved.

We will now prove the fundamental theorem guaranteeing the finite capacity of the class of algorithms $\tilde U(A, L)$.

Theorem 4

For every $L \in N$ there exists a $\delta$ such that any sample $\tilde S^\delta$ has a classification $(\tilde S_1, \tilde S_2)$ not realizable by any operator $R \in \tilde U(R, L)$.

Proof. Let us assume the contrary: let there exist an $L$ such that for any $\delta$ every classification $(\tilde S_1, \tilde S_2)$ of the sample $\tilde S^\delta$ is realized by some operator $R \in \tilde U(R, L)$. By the definition of the class $\tilde U(R, L)$, every operator of this class is an operator polynomial in operators $R_1, \ldots, R_L \in \{R\}$. (4.2)

We introduce the following equivalence relation ($L$-equivalence) on the set of all operators of $\tilde U(R, L)$. Also let an operator $R'$ of the class $\tilde U(R, L)$, built in the same way from operators $R'_1, \ldots, R'_L$, be given. We will say that the operators $R$ and $R'$ are $L$-equivalent ($R \overset{L}{\sim} R'$), if $R_i \overset{1}{\sim} R'_i$ for every $i = 1, 2, \ldots, L$. As a result the class $\tilde U(R, L)$ is partitioned into equivalence classes, which we denote by $G_L(R)$.

This definition implies that every class is generated by an operator

$$R_0 = \sum_{i=1}^{L} R_i. \qquad (4.3)$$

By (3.13), the number of classes of $L$-equivalence does not exceed

$$\{2^n [3(m\delta + 1)^n + 1]\}^L < 2^{L(n+2)} (m\delta + 1)^{Ln}. \qquad (4.4)$$

Let us consider the arbitrary class $G_L(R)$ generated by (4.3), where every operator $R_i \in G_1(R_i)$. Here for every $t = 1, 2, \ldots, \delta$ to the object $S^t$ there corresponds the Boolean vector

$$\Theta_t[G_L(R)] = (\Theta_t[G_1(R_1)], \ldots, \Theta_t[G_1(R_L)]),$$

which we call the generalized characteristic vector of the object with respect to the class $G_L(R)$. It is easy to see that the number of different generalized vectors for the class $G_L(R)$ does not exceed

$$2^{mnL}. \qquad (4.5)$$

We note that if $\Theta_{t_1}[G_L(R)] = \Theta_{t_2}[G_L(R)]$ for some $t_1 \ne t_2$, then $\Theta_{t_1}[G_1(R_i)] = \Theta_{t_2}[G_1(R_i)]$ for every $i = 1, 2, \ldots, L$, and accordingly the estimates attainable by the operators $R_i$ with respect to the class $K_j$ for the objects $S^{t_1}$ and $S^{t_2}$ are identical, that is, $\Gamma^i_{t_1 j} = \Gamma^i_{t_2 j}$. Therefore the estimates attainable by each operator of the form (4.2) for the given objects are the same polynomial in the identical values $\Gamma^i_{t_1 j} = \Gamma^i_{t_2 j}$, $i = 1, \ldots, L$, and hence coincide. (4.6)

Let us consider the sequence of generalized characteristic vectors for the objects of the sample $\tilde S^\delta$. Then, by condition (4.5), this sequence contains not less than $[\delta / 2^{mnL}]$ identical vectors, that is, natural numbers $\tau_1, \ldots, \tau_{c_R} \in [1, \delta]$ exist such that $c_R \ge [\delta / 2^{mnL}]$ and $\Theta_{\tau_1}[G_L(R)] = \ldots = \Theta_{\tau_{c_R}}[G_L(R)]$. We introduce the following set of classifications of the sample $\tilde S^\delta$:

$$\mathfrak{M}[R]_L = \{(\tilde S_1, \tilde S_2) \mid \exists \tau_p\, \exists \tau_r\, [(S^{\tau_p} \in \tilde S_1)\ \&\ (S^{\tau_r} \in \tilde S_2)]\}, \qquad p, r = 1, 2, \ldots, c_R.$$

We calculate the number of elements of this set:

$$|\mathfrak{M}[R]_L| = (2^{c_R} - 2)\, 2^{\delta - c_R}.$$

This equation follows from the fact that the number of all the subsets of the set $\mathfrak{M}_1 = \{S^{\tau_1}, \ldots, S^{\tau_{c_R}}\}$ which are not empty and are not the whole set $\mathfrak{M}_1$ equals $2^{c_R} - 2$, and the number of subsets of the set $\tilde S^\delta \setminus \mathfrak{M}_1$ is $2^{\delta - c_R}$.

It follows from the assumption made at the beginning of the proof that each of the $2^\delta$ classifications of the sample is realized by some operator of an appropriate class of $L$-equivalence. Then it follows from (4.4) that there exist not less than $[2^\delta / (2^{L(n+2)} (m\delta + 1)^{Ln})]$ classifications realizable by the operators $R \in \tilde U(R, L)$ belonging to the same class $G_L(R_0)$. Let $\mathfrak{N}$ be the set of all those classifications. We will prove that $\mathfrak{N} \cap \mathfrak{M}[R_0]_L \ne \emptyset$. Indeed, the number of classifications of the sample $\tilde S^\delta$ not belonging to the set $\mathfrak{M}[R_0]_L$ does not exceed

$$2^\delta - (2^{c_{R_0}} - 2)\, 2^{\delta - c_{R_0}} = 2^{\delta - c_{R_0} + 1}.$$

On the other hand

$$|\mathfrak{N}| \ge \frac{2^\delta}{2^{L(n+2)} (m\delta + 1)^{Ln}} > 2^{\delta - c_{R_0} + 1}.$$

This inequality is satisfied for fairly large $\delta$, more precisely:

$$2^{c_{R_0} - 1} \ge 2^{[\delta / 2^{mnL}] - 1} > 2^{L(n+2)} (m\delta + 1)^{Ln}.$$

The latter is true if

$$[\delta / 2^{mnL}] - 1 > L(n + 2) + Ln \log_2 (m\delta + 1),$$

or

$$\delta > 2^{mnL} [Ln \log_2 (m\delta + 1) + L(n + 2) + 1]. \qquad (4.7)$$

But this implies that a classification $(\tilde S_1, \tilde S_2) \in \mathfrak{M}[R_0]_L$ exists belonging to $\mathfrak{N}$. By the definition of $\mathfrak{M}[R_0]_L$, at least two objects $S^{\tau_r} \in \tilde S_1$ and $S^{\tau_p} \in \tilde S_2$ exist for which $\Theta_{\tau_r}[G_L(R_0)] = \Theta_{\tau_p}[G_L(R_0)]$, but by (4.6) the estimates $\Gamma_{\tau_r j}$ and $\Gamma_{\tau_p j}$ attainable by any operator of the class $G_L(R_0)$ are identical, that is, $\Gamma_{\tau_r j} = \Gamma_{\tau_p j}$. But this is impossible, since $\tilde S_1 \cap \tilde S_2 = \emptyset$. The theorem is proved.

Remark. Let us consider the set of operator polynomials $\tilde U'(R, L)$ of the form (4.2), where for all $i = 1, 2, \ldots, L$ the operator $R_i$ has the system of supporting sets $\{\Omega\} = \{i\}$. Then in the formulation of Theorem 4 instead of the class $\tilde U(R, L)$ we can consider the class $\tilde U'(R, L)$, and the estimate (4.7) is replaced by the following:

$$\delta > 2^{mnL} [Ln \log_2 (m\delta + 1) + 2L + 1]. \qquad (4.8)$$

Indeed, by (3.7) the number of classes of $L$-equivalence does not exceed $[3(m\delta + 1)^n + 1]^L < 2^{2L} (m\delta + 1)^{Ln}$. Moreover the estimate (4.5) remains unchanged. Then, solving the inequality $2^{(\delta / 2^{mnL}) - 1} > 2^{2L} (m\delta + 1)^{Ln}$ for $\delta$, we obtain (4.8).

Let $\mathcal{T}(S, \alpha)$ be the class of events (2.5) corresponding to the class of algorithms $\tilde U(A, L)$; then Theorem 4 implies the estimate of the capacity $\Delta_A(S, \alpha)$ of the class $\tilde U(A, L)$.

Corollary. The following inequality holds: (4.9)
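Conditions of the type (4.7) define $\delta$ only implicitly, because $\delta$ also appears under the logarithm. A minimal numerical sketch, assuming that (4.7) reads $\delta > 2^{mnL}[Ln \log_2(m\delta + 1) + L(n + 2) + 1]$ and using illustrative function names and a simple linear scan:

```python
from math import log2


def satisfies_condition(delta, m, n, L):
    """Check delta > 2**(m*n*L) * (L*n*log2(m*delta + 1) + L*(n + 2) + 1)."""
    right = 2 ** (m * n * L) * (L * n * log2(m * delta + 1) + L * (n + 2) + 1)
    return delta > right


def least_delta(m, n, L, limit=10 ** 7):
    """Smallest delta below `limit` satisfying the condition, or None if there is none."""
    for delta in range(1, limit):
        if satisfies_condition(delta, m, n, L):
            return delta
    return None


# Smallest admissible sample length for the toy case m = n = L = 1.
delta_min = least_delta(1, 1, 1)
```

The factor $2^{mnL}$ dominates the result, so the scan is practical only for very small $m$, $n$ and $L$; for larger values a closed-form upper bound is more useful than a search.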

We will now obtain an upper bound for $\Delta_A(S, \alpha)$ in a more convenient form.

Theorem 5 The capacity of the class of algorithms

Let a working sample ?? be given such that 6 = 4mnL+*. In particular,

hoof:

and also 2L+1~2mnL+1, everym,nandL=1,2

then Ln [log,

LnS1<2mnL;

(2 log? 6-I-l) +2mnL+1~4(mnL)‘+ 2mnL’4

3 (A, L) has an upper bound:

,...

rns t 1 < 62,

(n&i-1)+1]+2L+l
Il~nL+1~3”‘“L+i.

The last inequality

. Using also the estimate (4.9) and the equation Imd+*

holds for =

X 2mnL, we obtain (4.8).

We will apply this result to estimate the length of the sample $\{S_1, \ldots, S_\delta\}$ sufficient for teaching in the recognition problem. Following [4], we will assume that the lengths of the working and teaching samples are the same, that is, $\tilde S^\delta = (S^1, \ldots, S^\delta)$ and $\tilde S_\delta = (S_1, \ldots, S_\delta)$. Also let $\tilde S^\delta$ and $\tilde S_\delta$ have been obtained in a series of independent trials, and let a class of algorithms $\{A(S, \alpha)\}$ be given with the capacity $\Delta_A(S, \alpha)$.

Theorem 6 (see [4]). For the error frequencies of the recognition algorithm $A \in \{A(S, \alpha)\}$ on the teaching and working samples to differ, with probability greater than $1 - \eta$, by not more than $\varepsilon$, it is sufficient that the length of the samples satisfy the condition

6= 2&d&

1_ ln(r)/5)

a)

e”

[

- (C/2)

,

--In+ I

6, CT. a)

Theorems 5 and 6 imply Theorem 7.

Theorem 7

For every $0 < \eta < 1$ and $\varepsilon > 0$, in order that, with probability greater than $1 - \eta$, the error frequencies of the recognition algorithm $A \in \tilde U(A, L)$ on the teaching and working samples should differ by less than $\varepsilon$, it is sufficient that the length of the teaching sample $\tilde S_\delta$ satisfy the condition

In conclusion the author acknowledges his indebtedness to Yu. I. Zhuravlev for suggesting the problem and for his interest.

Translated by J. Berry

REFERENCES

1. ZHURAVLEV, Yu. I. Correct algebras on sets of incorrect (heuristic) algorithms. I. Kibernetika, No. 4, 14-21, 1977.

2. ZHURAVLEV, Yu. I. Correct algebras on sets of incorrect (heuristic) algorithms. II. Kibernetika, No. 6, 21-27, 1977.

3. ZHURAVLEV, Yu. I. Correct algebras on sets of incorrect (heuristic) algorithms. III. Kibernetika, No. 2, 35-43, 1978.

4. VAPNIK, V. N. and CHERVONENKIS, A. Ya. Theory of pattern recognition (Teoriya raspoznavaniya obrazov), Nauka, Moscow, 1974.