Optimization of a class of recognition algorithms


OPTIMIZATION OF A CLASS OF RECOGNITION ALGORITHMS*

E. G. KUL'YANOV

Moscow

(Received 16 October 1972; revised version 16 October 1973)

ONE OF the classes of recognition algorithms, algorithms using the calculation of estimates, is considered. An algorithm for finding the optimal algorithm of this class is constructed (the optimization is carried out by the number of correctly recognized rows of a control matrix).

Introduction

Recognition theory is a new, rapidly developing branch of applied mathematics. Algorithms, and even classes of algorithms, intended for the solution of specific recognition problems are continually appearing. A multitude of algorithms faces the investigator who for the first time encounters the necessity to use recognition theory in his work, and he must select from them the optimal one which best corresponds to his specific problem.

This paper is devoted to the construction of the optimal algorithm of the class known as "voting algorithms", recognition algorithms based on the calculation of estimates [1].

The class of algorithms considered is intended for the recognition of objects described by n-dimensional vectors, whose coordinates correspond to particular features of the object described.

In this paper the case is considered where the coordinates assume the values 1 or 0, which corresponds to the presence or absence in the object of a given feature; in the general case the coordinates may assume any values describing the degree of manifestation of the feature.

The class of recognition algorithms based on the calculation of estimates occurs in an extensive class of "recognition by training" algorithms, which are characterized by the presence of a training sequence (training matrix) containing representatives of each of the classes, to one of which the object to be recognized must be referred. The essence of the recognition consists of a comparison of the vector S to be recognized with each of the vectors of the training matrix belonging to some class K_i; its proximity to these vectors is described by the estimate Γ_{K_i}(S) of the vector S with respect to the class K_i. Possessing a set of estimates of the vector S to be recognized with respect to all the classes, the recognition algorithm makes a decision, based on the decision rule F, about the membership of the vector S in one of the classes, or refuses to recognize it.

The search for the optimal algorithm is carried out as follows. The class of algorithms is regarded as a parametric family. In the domain of permissible values of the parameters a functional is specified which describes the quality of the operation of the algorithm corresponding to a specific set of values of the parameters. The point of extremum of this functional gives us the

*Zh. vychisl. Mat. mat. Fiz., 14, 3, 756-767, 1974.


values of the parameters to which the optimal algorithm corresponds. In this paper we regard as parameters the weights γ(S_j) of the vectors of the training matrix, describing the significance of these vectors as representatives of the corresponding class, and the threshold of resolution r of the decision rule.

On the basis of a description, by a system of linear inequalities, of the necessary and sufficient conditions for the valid recognition of the vectors of the control sample (control matrix), it is shown in this paper that the optimal values of the weights γ(S_j) of the rows S_j of the training matrix give a solution of this system for some optimal value r of the threshold of the decision rule.

1. Fundamental concepts and formulation of the problem

The training matrix for the recognition algorithm using the calculation of estimates is denoted by T_{n,m,l} and contains m n-dimensional vectors belonging to l classes. Here the first m_1 vectors belong to the class K_1, the next m_2 − m_1 vectors belong to the class K_2, and so on, until the last m − m_{l−1} = m_l − m_{l−1} vectors belong to the class K_l. The stages of the calculation of the estimates Γ_{K_i}(S) are described in detail in [1]; here it is important that in the case considered the formula obtained there for calculating the estimate of the vector S over the class K_i is applicable:

    Γ_{K_i}(S) = Σ_{j=m_{i−1}+1}^{m_i} γ(S_j) [2^{r(S_j, S)} − 1],        (1)

where γ(S_j) is the weight of the vector S_j ∈ T_{n,m,l}, and r(S_j, S) is the number of identical coordinates of the vectors S_j and S (we recall that the coordinates of the vectors are zero or unity).
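As a concrete illustration, Eq. (1) can be computed in a few lines of Python (a sketch only; the function name, the toy matrix and the unit weights are ours, not the paper's):

```python
def estimate(S, T, gamma, rows):
    """Estimate of Eq. (1): sum over the training rows S_j of one class of
    gamma(S_j) * (2**r(S_j, S) - 1), where r(S_j, S) is the number of
    coordinates (each 0 or 1) on which S_j and S coincide."""
    total = 0.0
    for j in rows:                                   # indices of the rows of the class
        r_match = sum(a == b for a, b in zip(T[j], S))
        total += gamma[j] * (2 ** r_match - 1)
    return total

# Toy data (ours): a training matrix with m = 3 rows of dimension n = 3.
T = [[1, 0, 1], [1, 1, 1], [0, 0, 1]]
gamma = [1.0, 1.0, 1.0]                              # unit weights gamma(S_j)
print(estimate([1, 0, 1], T, gamma, rows=[0, 1]))    # rows 0, 1 form class K_1 -> 10.0
```

Here row 0 coincides with S in all three coordinates (contributing 2^3 − 1 = 7) and row 1 in two (contributing 3), giving the estimate 10.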

Possessing the set of estimates Γ_{K_1}(S), …, Γ_{K_l}(S), the algorithm makes a decision about the membership of the vector S in one of the classes, namely: the vector belongs to that class whose estimate, relative to the sum of all the estimates, exceeds some number r. Mathematically the decision rule is written in the following form:

    F(Γ_{K_1}(S), …, Γ_{K_l}(S)) = i, if there exists exactly one class K_i such that Γ_{K_i}(S) / Σ_{j=1}^{l} Γ_{K_j}(S) > r;
    F(Γ_{K_1}(S), …, Γ_{K_l}(S)) = 0 otherwise.
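In code the rule F reads as follows (again a hypothetical helper with our own names; the estimates are passed in as a plain list):

```python
def decide(estimates, r):
    """Decision rule F: return the 1-based index of the unique class whose
    estimate exceeds the fraction r of the sum of all estimates; return 0
    (refusal to recognize) if no class, or more than one class, does so."""
    total = sum(estimates)
    winners = [i for i, e in enumerate(estimates, start=1) if e > r * total]
    return winners[0] if len(winners) == 1 else 0

print(decide([10.0, 3.0, 2.0], r=0.5))   # 10 > 0.5 * 15, uniquely -> class 1
print(decide([6.0, 5.0, 4.0], r=0.5))    # no estimate exceeds 7.5 -> refusal, 0
```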

A zero value of the decision rule testifies that the algorithm fails to make a recognition.

It is easy to see that the operation of the recognition algorithm is largely determined by the values of the weights γ(S_j) and the threshold r of the decision rule. Our problem is to find the optimal values of these parameters, that is, those values for which the algorithm will give the least number of errors and refusals.

The matrix T′_{n,m′,l}, known as the control matrix, is used to control the quality of operation of the algorithm. Its construction is quite similar to that of the training matrix; the only difference is that the control matrix may contain other vectors and in a different quantity: the first m′_1 vectors belong to the class K_1, the next m′_2 − m′_1 to the class K_2, and so on. The optimization criterion of the class of algorithms considered was selected to be the number of correctly recognized vectors of the control matrix T′_{n,m′,l}: k = φ(γ(S_1), …, γ(S_m), r), where γ(S_j) is the weight of the j-th vector of the training matrix T_{n,m,l}, and r is the threshold of the decision rule.


It is obvious that 0 ≤ k ≤ m′.
For the correct recognition of the row (vector) S_j′ ∈ K_i it is necessary and sufficient that its estimates satisfy the system

    Γ_{K_1}(S_j′) < r Σ_{t=1}^{l} Γ_{K_t}(S_j′),
    . . . . . . . . . . . . . . . .
    Γ_{K_{i−1}}(S_j′) < r Σ_{t=1}^{l} Γ_{K_t}(S_j′),
    Γ_{K_i}(S_j′) ≥ r Σ_{t=1}^{l} Γ_{K_t}(S_j′),        (2)
    Γ_{K_{i+1}}(S_j′) < r Σ_{t=1}^{l} Γ_{K_t}(S_j′),
    . . . . . . . . . . . . . . . .
    Γ_{K_l}(S_j′) < r Σ_{t=1}^{l} Γ_{K_t}(S_j′).
For the whole control matrix T′_{n,m′,l} we obtain the system (3), consisting of m′ subsystems of the form (2), describing the necessary and sufficient condition for the correct recognition of every row of the control matrix T′_{n,m′,l}:

    Γ_{K_1}(S_1′) ≥ r Σ_{t=1}^{l} Γ_{K_t}(S_1′),
    Γ_{K_2}(S_1′) < r Σ_{t=1}^{l} Γ_{K_t}(S_1′),
    . . . . . . . . . . . . . . . .
    Γ_{K_l}(S_1′) < r Σ_{t=1}^{l} Γ_{K_t}(S_1′),        (3)
    . . . . . . . . . . . . . . . .
    Γ_{K_1}(S_{m′}′) < r Σ_{t=1}^{l} Γ_{K_t}(S_{m′}′),
    . . . . . . . . . . . . . . . .
    Γ_{K_l}(S_{m′}′) ≥ r Σ_{t=1}^{l} Γ_{K_t}(S_{m′}′).
We notice that in the system (3) there are no zero inequalities. This is due to the fact that an arbitrary row S_j′ ∈ T′_{n,m′,l} can be completely non-identical with only one row, namely that one each of whose coordinates is opposite to the corresponding coordinate of the row S_j′. This implies that there cannot be more than one zero coefficient in each inequality of the system (3). In the general case the system (3) is inconsistent, and our problem is to construct an algorithm with such values of the parameters γ(S_j), 1 ≤ j ≤ m, and of the threshold r that the greatest possible number of subsystems of the form (2) is satisfied, that is, to find a maximal consistent subsystem of the system (3).
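Correct recognition of a single control row, in the sense of the system (2), can be tested directly. A sketch (the helper and its arguments are ours, not the paper's):

```python
def correctly_recognized(est, true_class, r):
    """System (2) for one control row S' of class K_i: the estimate of the
    true class must satisfy the non-strict inequality est >= r * total,
    and every other class the strict inequality est < r * total."""
    total = sum(est)
    for t, e in enumerate(est, start=1):
        if t == true_class:
            if e < r * total:          # the non-strict inequality fails
                return False
        elif e >= r * total:           # a strict inequality fails
            return False
    return True

print(correctly_recognized([10.0, 3.0, 2.0], true_class=1, r=0.5))  # True
print(correctly_recognized([6.0, 5.0, 4.0], true_class=1, r=0.5))   # False
```

Counting the rows of the control matrix for which this predicate holds gives the optimization criterion k.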
2. Fundamental result

The construction of the maximum consistent subsystem of the system (3) begins with the analysis of the subsystem of the non-strict inequalities of the system (3):

    Γ_{K_i}(S_j′) ≥ r Σ_{t=1}^{l} Γ_{K_t}(S_j′),   S_j′ ∈ K_i,   j = 1, 2, …, m′.        (3a)

In section 3 a proof will be given that a value r_max of the threshold of the decision rule exists such that for any r ≤ r_max the system (3a) is consistent, and for any r > r_max it is inconsistent. Moreover, investigating the remaining subsystem of strict inequalities (3b), we obtain that it decomposes into m′ subsystems of the following form: if S_j′ ∈ K_i, then


r~,(8,‘)<

r

2

rKi

(sj’),

i=l

2

rKi-l tsj‘I<

r

c

rKi

(sj'),

i=l

(39

1

%+I tsj’) < r x rKi i=l ........ . . . rK,(sj’>

<

(sj'), . . .

rt

rKi(sj’>-

C=l

Investigating each of these subsystems, we obtain that for each j, 1 ≤ j ≤ m′, there exists a value r_j of the threshold such that for any r > r_j the subsystem (3j) is consistent.

The proposed algorithm for the optimization of the class of recognition algorithms consists of the following stages.

1. For the original subsystem of non-strict inequalities (3a⁰) of the system (3) we find the value r⁰_max of the decision rule threshold.

2. For all the subsystems of the form (3j) we find the corresponding r_j and arrange them in increasing order: r_{j_1} ≤ … ≤ r_{j_{m′}}.

3. If r⁰_max ≤ r_{j_{m′}}, we remove from the system (3a⁰) the inequality corresponding to the system of strict inequalities possessing the maximum value of the threshold, r_{j_{m′}}. From the set r_{j_1}, …, r_{j_{m′}} we remove the last term r_{j_{m′}}.

4. For the new system (3a′) we find r′_max. In the general case r′_max > r⁰_max. If in the new set the last term r_{j_{m′−1}} is less than r′_max, we search for the general solution of the system (3a′), taking as the decision rule threshold a value r with r_{j_{m′−1}} < r ≤ r′_max; otherwise stage 3 is repeated.
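The stages above can be put into a short skeleton. Here `r_max_of` (returning the threshold of the current non-strict subsystem) and `r_j_of` (the thresholds r_j of the strict subsystems) are hypothetical callables that the caller must supply, since the paper defines them only mathematically:

```python
def optimize_threshold(r_max_of, r_j_of, rows):
    """Stages 1-4 in outline: repeatedly drop the control row whose strict
    subsystem demands the largest threshold r_j, until the threshold r_max
    of the remaining non-strict system exceeds every remaining r_j."""
    active = list(rows)                                # rows still kept in (3a)
    order = sorted(active, key=lambda j: r_j_of[j])    # r_{j1} <= ... <= r_{jm'}
    while order:
        r_max = r_max_of(active)                       # stages 1 and 4
        worst = order[-1]
        if r_j_of[worst] < r_max:                      # a common threshold exists
            return active, r_max
        active.remove(worst)                           # stage 3: drop that row
        order.pop()
    return active, None                                # no row could be kept

# Toy run (all numbers ours): two control rows, constant r_max = 0.5;
# row 2 demands r_j = 0.9 > 0.5 and is dropped, row 1 (r_j = 0.2) is kept.
print(optimize_threshold(lambda a: 0.5, {1: 0.2, 2: 0.9}, [1, 2]))  # ([1], 0.5)
```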
3. Justification of the algorithm

1. Necessary condition of correct recognition.


In accordance with the scheme given in section 2 for constructing the optimal algorithm, we first investigate the subsystem of non-strict inequalities (3a). Writing out Γ_{K_i}(S_j′) by Eq. (1), we obtain

    Σ_{i=1}^{m_1} γ(S_i) [2^{r(S_i, S_1′)} − 1] ≥ r Σ_{i=1}^{m} γ(S_i) [2^{r(S_i, S_1′)} − 1],   m_0 = 0,
    . . . . . . . . . . . . . . . .
    Σ_{i=m_{l−1}+1}^{m} γ(S_i) [2^{r(S_i, S_{m′}′)} − 1] ≥ r Σ_{i=1}^{m} γ(S_i) [2^{r(S_i, S_{m′}′)} − 1],   m_l = m.

We notice that on the left side of the inequalities of the system the summation is over the rows of the training matrix occurring in one class, and on the right side it is over all the m rows of the matrix T_{n,m,l}. Transferring all the terms to one side and collecting like terms, we obtain the system

    (r − 1) Σ_{i=1}^{m_1} γ(S_i) [2^{r(S_i, S_1′)} − 1] + r Σ_{i=m_1+1}^{m} γ(S_i) [2^{r(S_i, S_1′)} − 1] ≤ 0,
    . . . . . . . . . . . . . . . .        (4)
    (r − 1) Σ_{i=m_{l−1}+1}^{m} γ(S_i) [2^{r(S_i, S_{m′}′)} − 1] + r Σ_{i=1}^{m_{l−1}} γ(S_i) [2^{r(S_i, S_{m′}′)} − 1] ≤ 0.

We obtain a system of iI4’ linear inequalities either

.

--I+(‘--

i=l

coefficients

iI< 0,

-

Y(si)]a

r[2~s~~s~‘)-l], or

(F-

with m unknowns

T (S,),

1) [27’13 ‘I’)-!],

i=l, 2, . . . , m, j=l, 2, . , . , m’.

. . . , y (S,)

and with

In this paper we consider the case where the training matrix is not smaller than the control matrix, that is, m ≥ m′.

In the study of systems of linear inequalities the principle of limiting solutions is widely used [2, 3]: this asserts that from every consistent system of non-strict linear inequalities f_i(x) ≤ α_i, 1 ≤ i ≤ m′, of rank p we can choose p linearly independent functionals f_{ν_1}(x), …, f_{ν_p}(x), such that every solution of the system of equations f_{ν_i}(x) = α_{ν_i}, 1 ≤ i ≤ p (the limiting subsystem), satisfies the original system of inequalities. In our case all the α_i are zero; the limiting subsystem of the system (4) is f_{ν_i}(γ) = 0, 1 ≤ i ≤ p, and the set of solutions of this system of equations has dimension m − p. By the principle of limiting solutions it occurs in the general solution of the system (4). Accordingly, the dimension of the set of solutions of the system (4) is q ≥ m − p. For p ≤ m′ < m we have q > 0, that is, a non-trivial solution exists. It remains to consider the case p = m′ = m.

If p = m′ = m, all the m′ linear functions on the left sides of the inequalities of the system (4) are linearly independent, that is, no non-trivial linear combination of them is identically equal to zero. It is shown in [3] that for the system f_i(x) ≥ 0, 1 ≤ i ≤ m, this condition, Σ_{i=1}^{m} λ_i f_i(x) ≢ 0, is satisfied for any λ_i ≥ 0 not all zero, and consequently also for all λ_i > 0. From this, because of sufficiency, there exists a non-zero solution x = x_0, x_0 ≠ 0, since otherwise it would satisfy all the equations f_i(x) = 0.

Therefore, the system (4) is always consistent, even if we exclude the trivial solution γ(S_j) ≡ 0. Now, taking into account the requirement γ(S_j) ≥ 0 for every j = 1, 2, …, m, we obtain that the consistency of the system is determined by the decision rule threshold r: for r ≥ 1 all the coefficients are non-negative, so that a sum of non-negative values must be less than or equal to zero, and consequently the system is inconsistent (some γ(S_j) must be positive); on the other hand, for r ≤ 0 we have a sum of non-positive values less than or equal to zero, and consequently the system is consistent.

We obtain a one-dimensional extremal problem: the maximization of the decision rule threshold r, situated within the limits 0 < r < 1, subject to the system (4) remaining consistent. Any of the elimination methods of one-dimensional search (dichotomy, the golden section, and so on) may be used here:

1) before the search begins we calculate the quantities [2^{r(S_i, S_j′)} − 1], i = 1, 2, …, m, j = 1, 2, …, m′;

2) at each step (for every fixed r) we calculate the coefficients of the system (4): r[2^{r(S_i, S_j′)} − 1] or (r − 1)[2^{r(S_i, S_j′)} − 1];

3) we establish the consistency (or inconsistency) of the system, and depending on this we pass to a new r: r_{k+1} = g(r_k);

4) the process is continued until, for a given ε > 0, the inequality |r̄_k − r_k| < ε is satisfied, where

    r_k = max {r_n | the system (4) is consistent for r = r_n},
    r̄_k = min {r_n | the system (4) is inconsistent for r = r_n};

5) taking r_max = r_k, we obtain the required value of the decision rule threshold.

The continuity of the consistency property of the system (4) with respect to r is necessary for the validity of these methods of searching for r_max.
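The dichotomy variant of steps 1)-5) can be sketched as follows. The predicate `consistent(r)`, which decides the consistency of the system (4) for a fixed r (for instance by a linear-programming feasibility test), is an assumed input, not something the paper provides; the monotonicity in r is what makes the bisection valid:

```python
def find_r_max(consistent, eps=1e-6):
    """Bisection on [0, 1] for r_max: `consistent(r)` must be True below
    r_max and False above it.  Stops when the bracket [r_k, r_bar_k] is
    shorter than eps, as in step 4."""
    lo, hi = 0.0, 1.0          # (4) is consistent for r <= 0, inconsistent for r >= 1
    while hi - lo > eps:
        mid = 0.5 * (lo + hi)
        if consistent(mid):
            lo = mid           # r_k: largest r found consistent so far
        else:
            hi = mid           # r_bar_k: smallest r found inconsistent so far
    return lo                  # approximately r_max

# Toy predicate (ours) whose exact boundary is 0.3:
print(abs(find_r_max(lambda r: r < 0.3) - 0.3) < 1e-5)  # True
```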


By Dedekind's well-known axiom, if the division of the segment AB into two classes satisfies the following conditions:

1) A ∈ I, B ∈ II;

2) each point C ∈ AB belongs to one and only one class;

3) each class is not exhausted by the ends of the segment;

4) for any point x ∈ I and any point y ∈ II the following requirement is satisfied: x is situated between A and y;

then a point D ∈ AB exists such that any point x situated between A and D belongs to class I, and any point y situated between D and B belongs to class II.

We refer to class I those points of the segment [0, 1] for which the system (4) is consistent, and to class II those for which it is inconsistent. The validity of conditions 1) and 2) is obvious. We demonstrate the validity of the remaining conditions.

Lemma 1

If for r = r_1 the system (4) is consistent, the system is also consistent for any r_1′ < r_1.

Proof. Since the system (4) is consistent for r = r_1, a solution γ_1(S_1), …, γ_1(S_m) of it exists, that is, for these values all the inequalities of the system (4) are satisfied. Since r_1′ < r_1, the negative terms increase in absolute value (|r_1′ − 1| > |r_1 − 1|), and the positive terms decrease. Therefore, the solution γ_1(S_1), …, γ_1(S_m) is also a solution for r = r_1′. Consequently, the system (4) is consistent for r = r_1′.

Lemma 2

If the system (4) is inconsistent for r = r_2, the system (4) is also inconsistent for any r_2′ > r_2.

Proof. We suppose that the system (4) is consistent for r = r_2′. Then, by Lemma 1, the system (4) is also consistent for r = r_2, which is not so. Consequently, the system (4) is inconsistent for r = r_2′.

Theorem 1

In the case considered the fourth condition of the continuity axiom is satisfied.

Proof. Let x ∈ I, y ∈ II. Since A = 0 and B = 1, two cases are possible:

1) x is situated between A and y;

2) y is situated between A and x.

If the first case is satisfied, the theorem is proved. We show that the second case is impossible: since x ∈ I, the system (4) is consistent for r = x; and since y is situated between A and x, we have y < x, so that by Lemma 1 the system (4) is also consistent for r = y, which contradicts y ∈ II.

It remains to verify the validity of the third condition of the axiom: we show that there exists an r_1 > 0 such that the system (4) is consistent for r = r_1.

Proof. We rewrite the system (4), separating out the inequalities corresponding to the rows of the control matrix belonging to the first class. We notice that γ(S_1), …, γ(S_{m_1}) make a contribution to the negative term of the inequalities from the first to the m_1′-th. Their coefficients in these inequalities are [2^{r(S_i, S_j′)} − 1], where i = 1, 2, …, m_1, j = 1, 2, …, m_1′. We select from them the number differing least from zero:

    μ_1 = min {[2^{r(S_i, S_j′)} − 1] ≠ 0},   i = 1, 2, …, m_1,   j = 1, 2, …, m_1′.

Similarly, for the weights of the rows S_{m_{i−1}+1}, …, S_{m_i}, belonging to the i-th class, the coefficients in the negative term will be [2^{r(S_i, S_j′)} − 1] for i = m_{i−1}+1, …, m_i, j = m′_{i−1}+1, …, m_i′; we denote the corresponding minimum by μ_i. Now if in the negative terms of the system (4) we replace the [2^{r(S_i, S_j′)} − 1] by the corresponding μ_i, leaving any [2^{r(S_i, S_j′)} − 1] = 0 unchanged, the inequalities are weakened; but by varying r it is possible to attain consistency of the new system. It is important that any solution of the new system will be a solution of the system (4).

We consider as a solution γ(S_1) = … = γ(S_{m_1}) = 1/(m_1 μ_1), …, γ(S_{m_{l−1}+1}) = … = γ(S_m) = 1/((m_l − m_{l−1}) μ_l) (naturally in the training matrix there are no empty classes, that is, the denominators are non-zero). Substituting these values in the new system and collecting the terms with the factor (r − 1), we obtain a system (7) in which the numbers of terms entering the sums are in the general case less than the corresponding m_i and m_i − m_{i−1}, since the summation may contain terms with zero coefficients which we did not replace by the values μ_i, so as not to permit a strengthening of the inequalities by an increase in the absolute value of the negative part. We thus obtain a system of m′ linear inequalities with the single unknown r.

Considering that all the coefficients of the system (7) are fixed numbers, we can write it more briefly as

    r β_j − 1 ≤ 0,   j = 1, 2, …, m′.

We notice that if β_j = 0, the corresponding inequality is satisfied for any r, and consequently it may be removed from the system. Since all the inequalities are the same way round, the solution is of the form r < 1/δ, where δ = max {β_1, …, β_{m′}}. Since r > 0, there exists an r_1, 0 < r_1 < 1/δ, for which the system (4) is consistent.
for which the system (4) is consistent.

It would be possible to investigate whether in the second class points exist which are different from the end of the segment, but this is not important in this case: even if there are no such points, for a specified ε > 0 we can take as r_max any point r ∈ [1 − ε, 1). Therefore, on the segment [0, 1] there exists a point D such that for any r ∈ [0, D) the system (4) is consistent, and for any r′ ∈ (D, 1] it is inconsistent.

3. The necessary and sufficient condition

Returning once more to the subsystem of strict inequalities (3b), we consider one of its m′ subsystems, the subsystem (3j), corresponding to the row S_j′ ∈ K_i of the control matrix.


We notice that for r ≤ 0 and the requirement γ(S_j) > 0 the system (3j) is inconsistent, since, by the remark of section 1, for a fixed set γ(S_1), …, γ(S_m) there is a positive number on the left side of each inequality of the system and a non-positive number on the right side. For r ≥ 1 and γ(S_j) ≥ 0 each inequality is obviously valid, and consequently the system (3j) is consistent. Therefore, the problem of determining the consistency of a system of the form (3j) reduces to the problem of minimizing the decision rule threshold r for which the system (3j) remains valid. For the solution of this problem, as for the solution of the similar problem for a system of non-strict inequalities, continuity of the consistency property of the system with respect to r is essential. The proof of continuity for a subsystem of strict inequalities is quite similar to the proof given in part 2 of this section.

Considering for each row S_j′ of the control matrix the corresponding system of strict inequalities (3j), we obtain a set of values r_1, …, r_{m′} of the decision rule threshold for which the systems (3j) are consistent. The properties of the subsystem of strict inequalities (3b) of the system (3) considered here, together with the properties of the subsystem of non-strict inequalities (3a) of the system (3) explained in parts 1 and 2, give the necessary and sufficient condition for correct recognition, and completely justify the algorithm, presented in section 2, for constructing the optimal recognition algorithm from the class of recognition algorithms based on the calculation of estimates.

Translated by J. Berry

REFERENCES

1. ZHURAVLEV, Yu. I. and NIKIFOROV, V. V., Recognition algorithms based on the calculation of estimates. Kibernetika, No. 3, 1-11, 1971.

2. CHERNIKOV, S. N., Linear inequalities (Lineinye neravenstva), Fizmatgiz, Moscow, 1968.

3. FAN TSZY, On systems of linear inequalities, in: Linear inequalities and related questions (Lineinye neravenstva i smezhnye voprosy), 214-262, Izd-vo inostr. lit., Moscow, 1959.