OPTIMIZATION OF A CLASS OF RECOGNITION ALGORITHMS*

E. G. KUL'YANOV

Moscow

(Received 16 October 1972) (Revised version 16 October 1973)
ONE OF the classes of recognition algorithms, algorithms using the calculation of estimates, is considered. An algorithm for finding the optimal algorithm of this class is constructed (the optimization is carried out by the number of correctly recognized rows of a control matrix).
Introduction

Recognition theory is a new, rapidly developing branch of applied mathematics. Algorithms and even classes of algorithms intended for the solution of some specific recognition problems are continually appearing. A multitude of algorithms faces the investigator who for the first time encounters the necessity to use recognition theory in his work, and to select from them the optimal one which best corresponds to the specific problem.

This paper is devoted to the construction of the optimal algorithm of the class known as "voting algorithms", recognition algorithms based on the calculation of estimates [1].

The class of algorithms considered is intended for the recognition of objects described by $n$-dimensional vectors, whose coordinates correspond to particular features of the object described.
In this paper the case is considered where the coordinates assume the values 1 or 0, which corresponds to the presence or absence in the object of a given feature, but in the general case the coordinates may assume any values describing the degree of manifestation of the feature.

The class of recognition algorithms based on the calculation of estimates occurs in an extensive class of "recognition by training" algorithms, which are characterized by the presence of a training sequence (training matrix), containing representatives of each class, to one of which the object to be recognized must be referred. The essence of the recognition consists of a comparison of the vector $S$ to be recognized with each of the vectors of the training matrix belonging to some class $K_i$, and its proximity to
these vectors is described by the estimate $\Gamma_{K_i}(S)$ of the vector $S$ with respect to the class $K_i$. Possessing a set of estimates of the vector $S$ to be recognized with respect to all the classes, the recognition algorithm makes a decision based on the decision rule $F$ about the membership of the vector $S$ in one of the classes, or refuses to recognize it.

The search for the optimal algorithm is carried out as follows. The class of algorithms is regarded as a parametric family. In the domain of permissible values of the parameters a functional is specified which describes the quality of the operation of the algorithm corresponding to a specific set of values of the parameters. The point of extremum of this functional gives us the values of the parameters to which the optimal algorithm corresponds. In this paper we regard as parameters the weights $\gamma(S_j)$ of the vectors of the training matrix, describing the significance of these vectors as the representatives of the corresponding class, and the threshold of resolution $r$ of the decision rule.

*Zh. vychisl. Mat. mat. Fiz., 14, 3, 156-167, 1974.
On the basis of a description of the necessary and sufficient conditions for the valid recognition of the vectors of the control sample (control matrix) by a system of linear inequalities, it is shown in this paper that the optimal values of the weights $\gamma(S_j)$ of the rows $S_j$ of the training matrix give a solution of this system for some optimal value $r$ of the threshold of the decision rule.
1. Fundamental concepts and formulation of the problem

The training matrix for the recognition algorithm using the calculation of estimates is denoted by $T_{n,m,l}$ and contains $m$ $n$-dimensional vectors belonging to $l$ classes. Here the first $m_1$ vectors belong to the class $K_1$, the next $m_2 - m_1$ vectors belong to the class $K_2$, and so on until the last $m - m_{l-1} = m_l - m_{l-1}$ vectors, which belong to the class $K_l$. The stages of the calculation of the estimates $\Gamma_{K_i}(S)$ are described in detail in [1]; here it is important that in the case considered the formula obtained there for calculating the estimate of the vector $S$ over the class $K_i$ is applicable:

$$\Gamma_{K_i}(S) = \sum_{j=m_{i-1}+1}^{m_i} \gamma(S_j)\left[2^{\tilde r(S_j, S)} - 1\right], \qquad (1)$$

where $\gamma(S_j)$ is the weight of the vector $S_j \in T_{n,m,l}$, $m_0 = 0$, and $\tilde r(S_j, S)$ is the number of identical coordinates of the vectors $S_j$ and $S$ (we recall that the coordinates of the vectors are zero or unity).
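As an illustration of formula (1), the following minimal Python sketch computes the estimate of a 0/1 vector over one class. The function names `matches` and `estimate`, the 0-based row indexing, and the half-open range `start:stop` standing for the rows $m_{i-1}+1, \dots, m_i$ are assumptions made for the example and do not come from the paper.

```python
from typing import Sequence


def matches(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of identical coordinates of two 0/1 vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(1 for x, y in zip(a, b) if x == y)


def estimate(s, training, weights, start, stop):
    """Estimate of s over the class stored in training[start:stop], as in (1).

    Each training row j contributes weights[j] * (2 ** matches - 1).
    """
    return sum(
        weights[j] * (2 ** matches(training[j], s) - 1)
        for j in range(start, stop)
    )


# Hypothetical data: rows 0-1 form class K1, row 2 forms class K2.
training = [[1, 0, 1], [1, 1, 1], [0, 0, 0]]
weights = [0.5, 1.0, 2.0]
s = [1, 0, 1]

gamma_k1 = estimate(s, training, weights, 0, 2)  # 0.5 * 7 + 1.0 * 3
gamma_k2 = estimate(s, training, weights, 2, 3)  # 2.0 * 1
```

A training row that agrees with $S$ in every coordinate contributes $2^n - 1$ times its weight, while a row that differs in every coordinate contributes nothing.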
Possessing the set of estimates $\Gamma_{K_1}(S), \dots, \Gamma_{K_l}(S)$, the algorithm makes a decision about the membership of the vector $S$ in one of the classes, namely: the vector belongs to that class whose estimate, relative to the sum of all the estimates, is not less than some number $r$. Mathematically the decision rule is written in the following form:

$$F(\Gamma_{K_1}(S), \dots, \Gamma_{K_l}(S)) = \begin{cases} i, & \text{if } \exists!\,(\text{a class } K_i):\ \dfrac{\Gamma_{K_i}(S)}{\sum_{j=1}^{l}\Gamma_{K_j}(S)} \ge r,\\[2ex] 0, & \text{otherwise.} \end{cases}$$
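The decision rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `decide` is hypothetical, classes are numbered from 1 so that 0 can signal a refusal, the comparison is taken as non-strict to match the non-strict inequalities of subsystem (3a), and a zero total of the estimates is treated as a refusal.

```python
def decide(estimates, r):
    """Return the 1-based index of the unique class whose share of the total
    estimate reaches the threshold r, or 0 if recognition is refused."""
    total = sum(estimates)
    if total <= 0:
        return 0  # assumption: no usable estimates means a refusal
    # Compare g with r * total instead of dividing, to avoid a division.
    winners = [i for i, g in enumerate(estimates, start=1) if g >= r * total]
    return winners[0] if len(winners) == 1 else 0


print(decide([10, 1], 0.6))  # class 1 holds 10/11 of the total
print(decide([5, 5], 0.5))   # two classes reach the threshold: refusal
print(decide([5, 5], 0.6))   # no class reaches the threshold: refusal
```

For thresholds above 1/2 at most one class can reach the threshold, so the uniqueness requirement only matters for small $r$.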
A zero value of the decision rule testifies that the algorithm fails to make a recognition. It is easy to see that the operation of the recognition algorithm is largely determined by the values of the weights $\gamma(S_j)$ and the threshold $r$ of the decision rule. Our problem is to find the optimal values of these parameters, that is, those values for which the algorithm will give the least number of errors and refusals.

The matrix $T'_{n,m',l}$, known as the control matrix, is used to control the quality of operation of the algorithm. Its construction is quite similar to that of the training matrix; the only difference is that the control matrix may contain other vectors and in a different quantity: the first $m_1'$ vectors belong to the class $K_1$, the next $m_2' - m_1'$ to the class $K_2$, and so on.

The optimization criterion of the class of algorithms considered was selected to be the number of correctly recognized vectors of the control matrix $T'_{n,m',l}$: $k = \varphi(\gamma(S_1), \dots, \gamma(S_m), r)$, where $\gamma(S_j)$ is the weight of the $j$-th vector of the training matrix $T_{n,m,l}$, and $r$ is the threshold of the decision rule.
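To make the criterion $k$ concrete, the sketch below counts the correctly recognized rows of a small control matrix for given weights and threshold. Everything named here is an assumption for the example: the helper names, the list `bounds` holding $0, m_1, \dots, m_l$ as 0-based cut points, the list `labels` of 1-based class numbers of the control rows, and the toy matrices themselves. The decision rule is applied with a non-strict comparison, which is also an assumption.

```python
def matches(a, b):
    """Number of identical coordinates of two 0/1 vectors."""
    return sum(1 for x, y in zip(a, b) if x == y)


def class_estimates(s, training, bounds, weights):
    """Estimates of s over every class; class i owns rows bounds[i-1]:bounds[i]."""
    return [
        sum(weights[j] * (2 ** matches(training[j], s) - 1) for j in range(lo, hi))
        for lo, hi in zip(bounds, bounds[1:])
    ]


def decide(estimates, r):
    """1-based index of the unique class reaching the threshold, else 0."""
    total = sum(estimates)
    if total <= 0:
        return 0
    winners = [i for i, g in enumerate(estimates, start=1) if g >= r * total]
    return winners[0] if len(winners) == 1 else 0


def count_correct(training, bounds, weights, control, labels, r):
    """Criterion k: number of control rows assigned to their own class."""
    return sum(
        1
        for s, label in zip(control, labels)
        if decide(class_estimates(s, training, bounds, weights), r) == label
    )


# Hypothetical data: two classes, three training rows, three control rows.
training = [[1, 0, 1], [1, 1, 1], [0, 0, 0]]
bounds = [0, 2, 3]
weights = [1.0, 1.0, 1.0]
control = [[1, 0, 1], [0, 0, 0], [0, 1, 0]]
labels = [1, 2, 2]

k_low = count_correct(training, bounds, weights, control, labels, 0.6)
k_high = count_correct(training, bounds, weights, control, labels, 0.9)
```

With these toy data a threshold of 0.6 recognizes all three control rows, while a threshold of 0.9 leaves only the first one recognized and the other two refused, which shows how $k$ depends on $r$ for fixed weights.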
It is obvious that $0 \le k \le m'$.
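For the row (vector) $S_j' \in K_i$ to be recognized correctly by the decision rule it is necessary and sufficient that its estimates satisfy the system

$$\begin{aligned}
\Gamma_{K_1}(S_j') &< r\sum_{t=1}^{l}\Gamma_{K_t}(S_j'),\\
&\ \,\vdots\\
\Gamma_{K_{i-1}}(S_j') &< r\sum_{t=1}^{l}\Gamma_{K_t}(S_j'),\\
\Gamma_{K_i}(S_j') &\ge r\sum_{t=1}^{l}\Gamma_{K_t}(S_j'),\\
\Gamma_{K_{i+1}}(S_j') &< r\sum_{t=1}^{l}\Gamma_{K_t}(S_j'),\\
&\ \,\vdots\\
\Gamma_{K_l}(S_j') &< r\sum_{t=1}^{l}\Gamma_{K_t}(S_j').
\end{aligned} \qquad (2)$$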
For the whole control matrix $T'_{n,m',l}$ we obtain the system (3), consisting of $m'$ subsystems of the form (2) and describing the necessary and sufficient condition for the correct recognition of each row of the control matrix $T'_{n,m',l}$: for every $j = 1, \dots, m'$, with $K_i$ the class containing $S_j'$,

$$\Gamma_{K_i}(S_j') \ge r\sum_{t=1}^{l}\Gamma_{K_t}(S_j'), \qquad \Gamma_{K_s}(S_j') < r\sum_{t=1}^{l}\Gamma_{K_t}(S_j'), \quad s \ne i. \qquad (3)$$
We notice that in the system (3) there are no zero inequalities. This is due to the fact that an arbitrary row $S_j' \in T'_{n,m',l}$ can be completely non-identical with only one row, namely that one each of whose coordinates is opposite to the corresponding coordinate of the row $S_j'$. This implies that there cannot be more than one zero coefficient in each inequality of the system (3).

In the general case the system (3) is inconsistent, and our problem is to construct an algorithm with such values of the parameters $\gamma(S_j)$, $1 \le j$ [...]
2. Fundamental result

The construction of the maximum consistent subsystem of the system (3) begins with the analysis of the subsystem of the non-strict inequalities of the system (3):

$$\Gamma_{K_i}(S_j') \ge r\sum_{t=1}^{l}\Gamma_{K_t}(S_j'), \qquad S_j' \in K_i, \quad j = 1, \dots, m'. \qquad (3a)$$
In section 3, a proof will be given that a value $r_{\max}$ of the threshold of the decision rule exists which is such that for any $r < r_{\max}$ the subsystem (3a) is consistent, and for any $r > r_{\max}$ it is inconsistent. Moreover, investigating the remaining subsystem of strict inequalities (3b), we obtain that it decomposes into $m'$ subsystems of the following form: if $S_j' \in K_i$, then

$$\begin{aligned}
\Gamma_{K_1}(S_j') &< r\sum_{t=1}^{l}\Gamma_{K_t}(S_j'),\\
&\ \,\vdots\\
\Gamma_{K_{i-1}}(S_j') &< r\sum_{t=1}^{l}\Gamma_{K_t}(S_j'),\\
\Gamma_{K_{i+1}}(S_j') &< r\sum_{t=1}^{l}\Gamma_{K_t}(S_j'),\\
&\ \,\vdots\\
\Gamma_{K_l}(S_j') &< r\sum_{t=1}^{l}\Gamma_{K_t}(S_j').
\end{aligned} \qquad (3j)$$
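Investigating each of these subsystems, we obtain that for each $j$, $1 \le j \le m'$, a value $r_j$ of the threshold exists, the smallest for which the subsystem (3j) remains consistent.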
The proposed algorithm for the optimization of the class of recognition algorithms consists of the following stages.

1. For the original subsystem of non-strict inequalities (3a$^0$) of the system (3) we find the value $r^0_{\max}$ of the decision rule threshold.

2. For all the subsystems of the form (3j) we find the corresponding $r_j$ and arrange them in increasing order: $r_{j_1} \le \dots \le r_{j_{m'}}$.

3. [...] we remove from the system (3a$^0$) the inequality corresponding to the system of strict inequalities possessing the maximum value of the threshold $r_{j_{m'}}$. From the set $r_{j_1}, \dots, r_{j_{m'}}$ we remove the last term $r_{j_{m'}}$.

4. For the new system (3a$^1$) we find $r^1_{\max}$. In the general case $r^1_{\max} \ge r^0_{\max}$. If in the new [...] we search for the general solution of the system (3a$^1$) and the system [...] the last $r_{j_{m'-1}}$ [...]
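One possible reading of stages 1 to 4 is the removal loop sketched below in Python. It is not a transcription of the paper's procedure. The stopping test used here (stop as soon as the largest remaining $r_j$ lies below the current $r_{\max}$) is an assumption of this sketch, as are the names `maximal_subsystem`, `r_lower` and `r_max_of`. The thresholds $r_j$ and the routine that finds $r_{\max}$ for a given set of control rows are supplied by the caller, and the values in the usage example are invented.

```python
def maximal_subsystem(r_lower, r_max_of):
    """Drop control rows, largest r_j first, until the remaining rows fit.

    r_lower  : dict mapping a control-row index j to its threshold r_j
    r_max_of : callable taking a frozenset of row indices and returning
               r_max for the non-strict subsystem restricted to those rows
    Returns (kept_rows, r_max), or ([], None) if every row was removed.
    """
    kept = sorted(r_lower, key=r_lower.get)  # increasing order of r_j
    while kept:
        r_max = r_max_of(frozenset(kept))
        # Assumed stopping test: the largest remaining r_j is below r_max.
        if r_lower[kept[-1]] < r_max:
            return kept, r_max
        kept.pop()  # remove the row with the largest r_j
    return [], None


# Invented toy values: r_max grows as rows are removed.
r_lower = {0: 0.2, 1: 0.5, 2: 0.8}
toy_r_max = {3: 0.4, 2: 0.6, 1: 0.9}
kept, r_max = maximal_subsystem(r_lower, lambda rows: toy_r_max[len(rows)])
```

In this toy run row 2 is removed first, because its threshold 0.8 is not below the initial $r_{\max} = 0.4$; the remaining rows 0 and 1 are kept with $r_{\max} = 0.6$.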
3. Justification of the algorithm

1. Necessary condition of correct recognition

In accordance with the scheme given in section 2 for constructing the optimal algorithm, we first investigate the subsystem of non-strict inequalities (3a). Writing out $\Gamma_{K_i}(S_j')$ by Eq. (1) we obtain

$$\begin{aligned}
\sum_{i=1}^{m_1}\gamma(S_i)\left[2^{\tilde r(S_i,S_1')}-1\right] &\ge r\sum_{j=1}^{l}\ \sum_{i=m_{j-1}+1}^{m_j}\gamma(S_i)\left[2^{\tilde r(S_i,S_1')}-1\right], \qquad m_0 = 0,\\
&\ \,\vdots\\
\sum_{i=m_{l-1}+1}^{m_l}\gamma(S_i)\left[2^{\tilde r(S_i,S_{m'}')}-1\right] &\ge r\sum_{j=1}^{l}\ \sum_{i=m_{j-1}+1}^{m_j}\gamma(S_i)\left[2^{\tilde r(S_i,S_{m'}')}-1\right], \qquad m_l = m.
\end{aligned}$$
We notice that on the left side of the inequalities of the system the summation is over the rows of the training matrix occurring in one class, and on the right side it is over all the $m$ rows of the matrix $T_{n,m,l}$. Transferring all the terms to the left side and collecting like terms, we obtain the system

$$\begin{aligned}
(r-1)\sum_{i=1}^{m_1}\gamma(S_i)\left[2^{\tilde r(S_i,S_1')}-1\right] + r\sum_{i=m_1+1}^{m}\gamma(S_i)\left[2^{\tilde r(S_i,S_1')}-1\right] &\le 0,\\
&\ \,\vdots\\
r\sum_{i=1}^{m_{l-1}}\gamma(S_i)\left[2^{\tilde r(S_i,S_{m'}')}-1\right] + (r-1)\sum_{i=m_{l-1}+1}^{m_l}\gamma(S_i)\left[2^{\tilde r(S_i,S_{m'}')}-1\right] &\le 0.
\end{aligned} \qquad (4)$$

We obtain a system of $m'$ linear inequalities with $m$ unknowns $\gamma(S_1), \dots, \gamma(S_m)$ and with coefficients either $r\left[2^{\tilde r(S_i,S_j')}-1\right]$ or $(r-1)\left[2^{\tilde r(S_i,S_j')}-1\right]$, $i = 1, 2, \dots, m$, $j = 1, 2, \dots, m'$.
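The coefficient pattern of system (4) can be checked numerically with a short Python sketch. The names `system4_coefficients` and `satisfies`, the `bounds` list of 0-based class cut points, the 1-based `labels` of the control rows, and the toy data are all assumptions of this example. The sketch only builds the coefficient rows for a fixed $r$ and tests one given weight vector; it does not decide whether the system is consistent in general.

```python
def matches(a, b):
    """Number of identical coordinates of two 0/1 vectors."""
    return sum(1 for x, y in zip(a, b) if x == y)


def system4_coefficients(training, bounds, control, labels, r):
    """One coefficient row per control vector, as in system (4).

    A training row in the control vector's own class gets (r - 1) * c,
    any other training row gets r * c, where c = 2 ** matches - 1.
    """
    rows = []
    for s, label in zip(control, labels):
        lo, hi = bounds[label - 1], bounds[label]
        row = []
        for t, v in enumerate(training):
            c = 2 ** matches(v, s) - 1
            row.append((r - 1) * c if lo <= t < hi else r * c)
        rows.append(row)
    return rows


def satisfies(rows, weights, tol=1e-12):
    """True if every inequality sum(coefficient * weight) <= 0 holds."""
    return all(sum(a * g for a, g in zip(row, weights)) <= tol for row in rows)


# Hypothetical data: two classes, three training rows, three control rows.
training = [[1, 0, 1], [1, 1, 1], [0, 0, 0]]
bounds = [0, 2, 3]
control = [[1, 0, 1], [0, 0, 0], [0, 1, 0]]
labels = [1, 2, 2]

rows_half = system4_coefficients(training, bounds, control, labels, 0.5)
ok_half = satisfies(rows_half, [1.0, 1.0, 1.0])
ok_high = satisfies(
    system4_coefficients(training, bounds, control, labels, 0.9),
    [1.0, 1.0, 1.0],
)
```

For these data the unit weights satisfy every inequality at $r = 0.5$ but violate the second one at $r = 0.9$, in line with the observation that for $0 < r < 1$ the own-class coefficients are non-positive and the others non-negative.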
In this paper we consider the case where the training matrix is not smaller than the control matrix, that is, $m \ge m'$.

In the study of systems of linear inequalities the principle of limiting solutions is widely used [2, 3]: this asserts that in every consistent system of non-strict linear inequalities $f_i(x) \le a_i$, $1 \le i$ [...], of rank $p$ we can choose $p$ linearly independent functionals $f_{\nu_1}(x), \dots, f_{\nu_p}(x)$ such that every solution of the system $f_{\nu_i}(x) = a_{\nu_i}$, $1 \le i \le p$, satisfies the original system [...] (a zero solution [...] the limiting subsystem of the system (4) and its [...] $f_{\nu_i}(\gamma) = 0$, $1 \le i$ [...] solution of this system of equations has dimension $m - p$. By the principle of limiting solutions it occurs in the general solution of the system (4). Accordingly, the dimension of the set of solutions of the system (4) is $q \ge m - p$. For $p$ [...] $m'$. It remains to consider the case $p = m' = m$.
If $p = m' = m$, all the $m'$ linear functions on the left sides of the inequalities of the system (4) are linearly independent, that is, any non-trivial linear combination of them is not equal to zero. It is shown in [3] that for the system $f_i(x) \ge 0$, $1 \le i$ [...] is satisfied for any $\lambda_i \ne 0$, and consequently also for $\lambda_i > 0$. From this, because of sufficiency, there exists an $x = x_0 \ne 0$ [...] since otherwise it would satisfy all the equations.
Therefore, the system (4) is always consistent, even if we neglect the trivial solution $\gamma(S_j) = 0$. Now taking into account the requirement $\gamma(S_j) \ge 0$ for any $j = 1, 2, \dots, m$, we obtain that the consistency of the system will be determined by the decision rule threshold $r$, since for $r \ge 1$ we obtain that the sum of the non-negative values is less than or equal to zero, and consequently, the system is inconsistent (there exists a $\gamma(S_j) > 0$). On the other hand for $r \le 0$ we have the sum of the non-positive values less than or equal to zero, and consequently the system is consistent. We obtain a one-dimensional extremal problem on the maximization of the decision rule threshold $r$, situated within the limits $0 < r < 1$ [...]

1) [...] $i = 1, 2, \dots, m$, $j = 1, 2, \dots, m'$;

2) at each step (for every fixed $r$) we calculate the coefficients of the system (4): $r\left[2^{\tilde r(S_i,S_j')}-1\right]$ or $(r-1)\left[2^{\tilde r(S_i,S_j')}-1\right]$;

3) we establish the consistency (or inconsistency) of the system, and depending on this we pass to a new $r$: $r_{k+1} = g(r_k)$;

4) the process is continued until for a given $\varepsilon > 0$ the inequality $|r_k - \bar r_k| < \varepsilon$ is satisfied, where $r_k = \max\{r_n \mid \text{system (4) consistent}\}$ and $\bar r_k = \min\{r_n \mid \text{system (4) inconsistent}\}$;

5) taking $r_{\max} = r_k$, we obtain the required value of the decision rule threshold.
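The search in steps 2) to 5) can be illustrated by a bisection in Python. The paper does not specify the update rule $g$, so bisection is a choice made for this sketch; the name `find_r_max` and the consistency oracle passed to it are also assumptions. The oracle stands for "system (4) is consistent at this $r$" and is replaced here by a toy predicate with a known switching point.

```python
def find_r_max(is_consistent, eps=1e-6, lo=0.0, hi=1.0):
    """Bisection for the largest threshold at which the system is consistent.

    Assumes is_consistent(lo) is True and that consistency is monotone:
    consistent at r implies consistent at every smaller r.
    lo tracks the largest r found consistent, hi the smallest found
    inconsistent; the loop stops once they are closer than eps.
    """
    if is_consistent(hi):
        return hi
    while hi - lo >= eps:
        mid = (lo + hi) / 2
        if is_consistent(mid):
            lo = mid
        else:
            hi = mid
    return lo


# Toy oracle: pretend the system is consistent exactly for r <= 0.37.
r_max = find_r_max(lambda r: r <= 0.37, eps=1e-9)
```

Here `lo` and `hi` play the roles of $r_k$ and $\bar r_k$ in step 4). The monotonicity that the bisection relies on is exactly the continuity property of consistency with respect to $r$ that the section goes on to establish.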
The continuity of the consistency property of the system (4) is necessary for the realization of the methods of searching for $r_{\max}$.
By Dedekind's well-known axiom, if the division of the segment $AB$ into two classes satisfies the following conditions:

1) $A \in \mathrm{I}$, $B \in \mathrm{II}$;

2) each point $C \in AB$ belongs to one and only one class;

3) each class is not exhausted by the ends of the segment;

4) for any point $x \in \mathrm{I}$ and any point $y \in \mathrm{II}$ the following requirement is satisfied: $x$ is situated between $A$ and $y$,

then a point $D \in AB$ exists such that any point $x$ situated between $A$ and $D$ belongs to class I, and any point $y$ situated between $D$ and $B$ belongs to class II.
We refer to class I those points of the segment $[0, 1]$ for which the system (4) is consistent, and to class II those for which it is inconsistent. The validity of conditions 1) and 2) is obvious. We demonstrate the validity of the remaining conditions.

Lemma 1

If for $r = r_1$ the system (4) is consistent, the system is also consistent for any $r_1' < r_1$.

Proof. Since the system (4) is consistent for $r = r_1$, a solution of it $\gamma_1(S_1), \dots, \gamma_1(S_m)$ exists, that is, the inequalities of the system (4) are satisfied with $r = r_1$ and $\gamma(S_i) = \gamma_1(S_i)$. Since $r_1' < r_1$, the negative terms increase in absolute value ($|r_1' - 1| > |r_1 - 1|$), and the positive terms decrease. Therefore, the solution $\gamma_1(S_1), \dots, \gamma_1(S_m)$ is also a solution for $r = r_1'$. Consequently, the system (4) is consistent for $r = r_1'$.

Lemma 2

If the system (4) is inconsistent for $r = r_2$, the system (4) is also inconsistent for any $r_2' > r_2$.

Proof. We suppose that the system (4) is consistent for $r = r_2'$. Then, by Lemma 1, the system (4) is also consistent for $r = r_2$, which is not so. Consequently, the system (4) is inconsistent for $r = r_2'$.

Theorem 1

In the case considered the fourth condition of the continuity axiom is satisfied.
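Proof. Let $x \in \mathrm{I}$, $y \in \mathrm{II}$. Since $A = 0$ [...] two cases are possible: 1) $x$ is situated between $A$ and $y$; 2) $y$ is situated between $A$ and $x$.

If the first case is satisfied, the theorem is proved. We show that the second case is impossible: since $x \in \mathrm{I}$, the system (4) is consistent for $r = x$, and since $y$ is situated between $A$ and $x$, we have $y < x$; by Lemma 1 the system (4) is then consistent for $r = y$, which contradicts $y \in \mathrm{II}$.

[...] an $r_1 > 0$ exists such that the system (4) is consistent for $r = r_1$.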
Proof. We rewrite the system (4) in a form separating the inequalities corresponding to the rows of the control matrix belonging to the first class. We notice that $\gamma(S_1), \dots, \gamma(S_{m_1})$ make a contribution to the negative term of the inequalities from the first to the $m_1'$-th. Their coefficients in these inequalities are $\left[2^{\tilde r(S_i,S_j')}-1\right]$, where $i = 1, 2, \dots, m_1$, $j = 1, 2, \dots, m_1'$. We select from them the number differing least from zero:

$$\mu_1 = \min_{i=1,\dots,m_1;\ j=1,\dots,m_1'}\left\{2^{\tilde r(S_i,S_j')}-1 \ne 0\right\}.$$

Similarly, for the weights of the rows $S_{m_{i-1}+1}, \dots, S_{m_i}$, belonging to the $i$-th class, the coefficients in the negative term will be $\left[2^{\tilde r(S_t,S_j')}-1\right]$ for $t = m_{i-1}+1, \dots, m_i$, $j = m_{i-1}'+1, \dots, m_i'$. We denote the minimum by $\mu_i$. Now if in the system (4) in the negative terms we replace the $\left[2^{\tilde r(S_t,S_j')}-1\right]$ by the corresponding $\mu_i$, leaving the $\left[2^{\tilde r(S_t,S_j')}-1\right] = 0$ unchanged (if there are any), the negative terms of the inequalities are weakened. But by varying $r$ it is possible to attain consistency of the new system. It is important that any solution of the new system will be a solution of the system (4).

We consider as a solution $\gamma(S_1) = \dots = \gamma(S_{m_1}) = 1/(m_1\mu_1)$, ..., $\gamma(S_{m_{l-1}+1}) = \dots = \gamma(S_m) = 1/((m_l - m_{l-1})\mu_l)$ (naturally in the training matrix there are no empty classes, that is, the denominator is non-zero). Substituting these values in the new system and summing the content of the brackets for $(r - 1)$, we obtain, for every row $S_j' \in K_i$, $j = 1, \dots, m'$, the inequality

$$(r-1)\,\frac{\bar m_i - \bar m_{i-1}}{m_i - m_{i-1}} + r\sum_{s \ne i}\frac{1}{(m_s - m_{s-1})\,\mu_s}\sum_{t=m_{s-1}+1}^{m_s}\left[2^{\tilde r(S_t,S_j')}-1\right] \le 0, \qquad (7)$$

where $\bar m_1$, $\bar m_i - \bar m_{i-1}$ in the general case is less than the corresponding $m_1$, $m_i - m_{i-1}$, since in the summation there may be terms with zero coefficients which we have not replaced by the values $\mu_i$, so as not to permit the strengthening of the negative part of the inequalities by an increase in its absolute value. Therefore, we obtain a system of $m'$ linear inequalities with one unknown $r$.
Considering that all the coefficients in the system (7) are fixed numbers, we can write it more briefly: [...]

We notice that if $p_j = 0$, this inequality is satisfied for any $r$, and consequently it may be removed from the system. Since all the inequalities are the same way round, the solution is of the form $r \le$ [...], where [...] $\delta = \max\{p_1, \dots, p_{m'}\}$. Since $r > 0$, [...] there exists an $r_1$, $0 <$ [...], for which the system (4) is consistent.
It would be possible to explain whether in the second class points exist which are different from the end of the segment, but this is not important in this case. Even if there are no such points, for a specified $\varepsilon > 0$ we can take as $r_{\max}$ any point $r \in [1 - \varepsilon, 1)$. Therefore, on the segment $[0, 1]$ there exists a point $D$ such that for any $r \in [0, D)$ the system (4) is consistent, and for any $r \in (D, 1]$ it is inconsistent.
3. The necessary and sufficient condition

Returning once more to the subsystem of strict inequalities (3b), we consider one of its $m'$ subsystems, the subsystem (3j), corresponding to the row $S_j' \in K_i$ of the control matrix.
We notice that for $r \le 0$ and the requirement $\gamma(S_i) > 0$ the system (3j) is inconsistent, since by a remark of section 1, for a fixed set $\gamma(S_1), \dots, \gamma(S_m)$ there is a positive number on the left side of each inequality of the system, and a non-positive number on the right side. For $r > 1$ and $\gamma(S_i) \ge 0$ each inequality is obviously valid, and consequently, the system (3j) is consistent. Therefore, the problem of determining the consistency of a system of the form (3j) reduces to the problem of minimizing the decision rule threshold $r$ for which the system (3j) remains valid. For the solution of this problem, as also for the solution of the similar problem for a system of non-strict inequalities, continuity of the consistency property of the system with respect to $r$ is essential. The proof of continuity for a subsystem of strict inequalities is quite similar to the proof given in part 2 of this section.

Considering for each row $S_j'$ of the control matrix the corresponding system of strict inequalities (3j), we obtain a set of values of the decision rule threshold $r_1, \dots, r_{m'}$ for which the system (3j) is consistent.

The property of the subsystem of strict inequalities (3b) of the system (3) considered, together with the properties of the subsystem of non-strict inequalities (3a) of the system (3) explained in parts 1, 2, give the necessary and sufficient condition for correct recognition, and completely justify the algorithm presented in section 2 for finding the optimal recognition algorithm from the class of recognition algorithms based on the calculation of estimates.

Translated by J. Berry

REFERENCES

1. ZHURAVLEV, Yu. I. and NIKIFOROV, V. V., Recognition algorithms based on the calculation of estimates. Kibernetika, No. 3, 1-11, 1971.

2. CHERNIKOV, S. N., Linear inequalities (Lineinye neravenstva), Fizmatgiz, Moscow, 1968.

3. FAN TSZY, On systems of linear inequalities, in: Linear inequalities and related questions (Lineinye neravenstva i smezhnye voprosy), 214-262, Izd-vo in. lit., Moscow, 1959.