Classification by cascaded threshold elements

Pattern Recognition

Pergamon Press 1971. Vol. 3, pp. 243-251.

Printed in Great Britain

Classification by Cascaded Threshold Elements* F. HOLDERMANN Research group at the Institute for Information Processing, University of Karlsruhe, W. Germany

(Received 4 August 1969 and in revised form 10 August 1970)

Abstract—A method is presented to synthesize threshold element networks by iterative computation of the parameters. The modification of the parameters, which is under the control of a gain function, is carried out sequentially. The training procedure of the whole cascade is subdivided into separate training periods during which only the parameters of one threshold element are modified. The algorithm starts with one threshold element. If a given pattern set is not linearly separable the structure of the classifier is extended automatically to a general cascade. Computer simulations indicate good results.

1. INTRODUCTION

This paper is concerned with the synthesis of networks of threshold elements to separate nonlinear pattern sets. It has been demonstrated(1,2) that it is always possible to solve any decision problem by cascading a number of threshold elements. If the synthesis of such a network is accomplished in such a way that the parameters of those threshold elements which have already been determined remain unchanged during the further evaluation, the number of required threshold elements in the final network is generally quite large. This paper presents a method to synthesize a network of threshold elements by iterative computation of the parameters requiring relatively few threshold elements. Computer simulations indicate satisfactory results.

2. LINEAR SEPARATION

2.1 Problem specification

The binary patterns X_i (1 ≤ i ≤ m) of a pattern set shall be separated into two classes A and B with one threshold element. To achieve this the components w_j (j = 1 … n) of a weight vector W and the threshold T of a threshold element (Fig. 1) must be determined in such a way that the following system of inequalities holds, equation (1):

  Σ_{j=1}^{n} x_ij·w_j − T − ε = y_i,   f(x_i1, x_i2, …, x_in) = 1  if X_i ∈ B

  −Σ_{j=1}^{n} x_ij·w_j + T − ε = y_i,   f(x_i1, x_i2, …, x_in) = 0  if X_i ∈ A        (1)

  y_i > 0,   1 ≤ i ≤ m

* This work was sponsored by the German Ministry of Defense, Contract No. T880-1-203.
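The decision rule of equation (1) can be sketched in Python (a modern illustration, not the paper's Algol program; the weights, threshold and patterns below are hypothetical toy values):

```python
def threshold_output(x, w, T):
    """Output f of the threshold element: 1 if the weighted sum
    reaches the threshold T, otherwise 0."""
    s = sum(xj * wj for xj, wj in zip(x, w))
    return 1 if s - T >= 0 else 0

def distance(x, w, T, label, eps=1):
    """y_i of equation (1); a negative value means pattern X_i is
    wrongly classified (label 1: X_i in B, label 0: X_i in A)."""
    s = sum(xj * wj for xj, wj in zip(x, w))
    return (s - T - eps) if label == 1 else (-s + T - eps)

# Hypothetical toy example: integer weights, eps = 1 as in section 2.5.
w, T = [2, 2], 2
print(threshold_output([1, 1], w, T))      # class B pattern -> 1
print(distance([1, 1], w, T, label=1))     # y_i = 1 > 0: correctly classified
print(distance([0, 0], w, T, label=0))     # y_i = 1 > 0: correctly classified
```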


FIG. 1. Structure of a threshold element.

x_ij are the components of a pattern X_i, ε is a positive constant and f is the output function of the threshold element. y_i is the distance from the threshold; if it is negative, the corresponding pattern X_i is wrongly classified. With regard to a hardware realization of the threshold element a solution of equation (1) must additionally fulfil the following requirements: w_j (j = 1 … n) and T must be integer numbers between certain limits which may be specified.

2.2 Parameter modification

Starting with an arbitrarily chosen weight vector it is the task of a training algorithm to modify the components w_j (j = 1 … n) and T sequentially. Since the number of possible parameter modifications is rather large, it is necessary to restrict them to some basic modifications.(3) In the procedure to be described only the following 4 modifications are allowed:

(1) The value of a component w_j may be increased or decreased by a fixed amount Δ:

  type 1: w'_j = w_j + Δ
  type 2: w'_j = w_j − Δ,   Δ > 0, j = 1 … n        (2)

(2) Together with the modification of w_j the value of the threshold T is modified by the same amount:

  type 3: w'_j = w_j + Δ  and  T' = T + Δ
  type 4: w'_j = w_j − Δ  and  T' = T − Δ,   Δ > 0, j = 1 … n        (3)

2.3 Gain function

In order to be able to estimate the improvement which can be obtained from the different parameter modifications, the following gain function is defined which must be maximized during the training process: the normalized sum of all negative distances y_i (1 ≤ i ≤ m) of the wrongly classified patterns shall be maximized, equation (4):

  Z = Σ_{y_i ≤ 0, 1 ≤ i ≤ m} y_i / |W|   shall be a maximum.        (4)

The value of Z is negative and converges to the value zero only in the case of the solution vector.
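A sketch of the gain function of equation (4); the Euclidean norm for |W| is an assumption (the text does not spell the norm out), and all names are hypothetical:

```python
import math

def gain(patterns, labels, w, T, eps=1):
    """Z of equation (4): sum of the negative distances y_i of the
    wrongly classified patterns, normalized by |W| (assumed Euclidean).
    Z <= 0, and Z == 0 exactly when every pattern has margin eps."""
    norm = math.sqrt(sum(wj * wj for wj in w))
    total = 0.0
    for x, label in zip(patterns, labels):
        s = sum(xj * wj for xj, wj in zip(x, w))
        y = (s - T - eps) if label == 1 else (-s + T - eps)
        if y <= 0:                      # only wrongly classified patterns count
            total += y
    return total / norm

# Hypothetical two-pattern set: the first weight vector solves it (Z = 0),
# the second misclassifies one pattern (Z < 0).
patterns, labels = [[1, 1], [0, 0]], [1, 0]
print(gain(patterns, labels, [2, 2], 2))   # 0.0
print(gain(patterns, labels, [1, 1], 2))   # negative
```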

2.4 The training algorithm

The parameters of a threshold element are modified sequentially, one after the other. At the beginning of each iteration step, for all possible parameter modifications (4·n possibilities) the corresponding values of the gain function are evaluated tentatively. Only that modification is actually executed which guarantees the best value of the gain function. Let us denote the value of the gain function resulting from a parameter modification of the type k at component w_j by Z'_jk; then equation (5) gives the best value Z' which can be obtained at that iteration stage by executing one of the possible parameter modifications:

  Z' = max_{j=1…n} [ max_{k=1…4} (Z'_jk) ]        (5)
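One iteration step of section 2.4 can be sketched as follows: evaluate all 4·n basic modifications of equations (2) and (3) tentatively and keep the one with the best gain, as in equation (5). `gain_fn`, `delta` and `w_limit` are hypothetical names; the limit check mirrors the integer-range requirement of section 2.1:

```python
def best_modification(w, T, gain_fn, delta=1, w_limit=20):
    """Try the 4 modification types at every component w_j and return
    (best gain, weights, threshold); the current parameters are kept
    when no tentative modification improves the gain."""
    best = (gain_fn(w, T), list(w), T)
    for j in range(len(w)):
        # types 1-4: (dw, dT) pairs per equations (2) and (3)
        for dw, dT in ((delta, 0), (-delta, 0), (delta, delta), (-delta, -delta)):
            w2 = list(w)
            w2[j] += dw
            if abs(w2[j]) > w_limit:    # respect the given integer limits
                continue
            z = gain_fn(w2, T + dT)
            if z > best[0]:
                best = (z, w2, T + dT)
    return best
```

Repeating this step while watching the gain value gives the training loop; the stopping rule is discussed in the text.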

The simulation of this method shows that, by selecting the modifications in the described way, after a certain number of training steps the gain function converges to zero in the linearly separable case, or to a maximum value otherwise. This fact may be recognized by observing the gain function over the sequential training steps, and the training process has to be terminated.(4) An exact stopping condition cannot be formulated and must be determined by experience. In the simulation of this procedure the following stopping condition turned out to be sufficient: the training process stops if after 2·n training steps no further improvement of the gain function can be reached.

2.5 Simulation and results

The described procedure was tested with binary pattern sets which were generated by a pseudo random number generator. The class-membership of the generated patterns of a set was determined by a threshold element generated at random. By this means the pattern sets were known to be linearly separable. The weights of the randomly generated threshold elements were restricted to integer numbers in the range of ±10. The range of the evaluated weights was limited to ±20. The constant ε, equation (1), and the modification increment Δ were chosen to ε = Δ = 1. With these limitations all the generated pattern sets were separated linearly by the described procedure. For comparison the same sets were also separated by the well known "Increment Method".(5) The results are shown in Table 1. The programs were written in Algol and run on a CDC 3300.

TABLE 1. RESULTS WITH LINEARLY SEPARABLE EXAMPLES

                        Computer time (min)
Set no.    m      n     Increment   Described     w_max
                        method      procedure
  1       1000    20       8           6            38
  2       1000    20      24           6.5          45
  3       1000    20       9           6            39
  4       1000    10       0.7         0.6          20
  5       1000    10       0.9         1.0          22

m = number of patterns. n = number of criteria. w_max = maximum weight generated by the Increment method.


3. NONLINEAR SEPARATION

3.1 Problem specification

If a given problem is not linearly separable, in a next step the classifier structure has to be extended. One way to do this is to use a cascaded structure according to Fig. 2.(4)

FIG. 2. General structure of a cascade with 2, 3 and 4 threshold elements.

Using such a loop-free network with e threshold elements, the separation problem becomes that of finding a set of weight vectors W_v (v = 1 … e) such that

  Σ_{j=1}^{n} x_ij·w_ej + Σ_{v=1}^{e−1} f_vi·w_e,n+v − T_e − ε = y_ei > 0,   if X_i ∈ B

  −Σ_{j=1}^{n} x_ij·w_ej − Σ_{v=1}^{e−1} f_vi·w_e,n+v + T_e − ε = y_ei > 0,   if X_i ∈ A        (6)

with

  f_vi = 1  if  Σ_{j=1}^{n} x_ij·w_vj + Σ_{r=1}^{v−1} f_ri·w_v,n+r − T_v ≥ 0
  f_vi = 0  if  Σ_{j=1}^{n} x_ij·w_vj + Σ_{r=1}^{v−1} f_ri·w_v,n+r − T_v ≤ −ε
      (v = 2 … e−1)

and

  f_1i = 1  if  Σ_{j=1}^{n} x_ij·w_1j − T_1 ≥ 0
  f_1i = 0  if  Σ_{j=1}^{n} x_ij·w_1j − T_1 ≤ −ε
      i = 1 … m

w_vj and T_v (v = 1 … e, j = 1 … n) must be integer numbers within given limits. Since the decision boundary which is realized by such a network is composed of pieces of hyperplanes which correspond to the single threshold elements, this way of classifying is called piecewise linear separation.

3.2 Training procedure for piecewise linear separation

Similarly to the evaluation of the parameters of a single threshold element, the determination of the parameters of a cascade may be accomplished by changing the parameter values sequentially. Since there is a large number of possibilities to modify the parameters of a cascade, it is reasonable to make some restrictions which reduce the number of possibilities.
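The cascade of Fig. 2 and equation (6) can be sketched as a forward pass in which element S_v receives the n pattern components plus the outputs of all lower-indexed elements. The 2-element XOR cascade below is a hypothetical illustration with hand-chosen weights, not output of the training algorithm:

```python
def cascade_outputs(x, W, Ts):
    """Outputs [f_1i, ..., f_ei] for pattern x; W[v-1] holds the
    n + v - 1 weights of element S_v, Ts[v-1] its threshold T_v."""
    outputs = []
    for w, T in zip(W, Ts):
        inputs = list(x) + outputs          # pattern plus previous outputs
        s = sum(a * b for a, b in zip(inputs, w))
        outputs.append(1 if s - T >= 0 else 0)
    return outputs

def classify(x, W, Ts):
    """The class decision is the output f_ei of the last element S_e."""
    return cascade_outputs(x, W, Ts)[-1]

# S_1 computes AND; S_2 subtracts its output twice from the input sum,
# realizing XOR - a problem no single threshold element can separate.
W = [[1, 1], [1, 1, -2]]
Ts = [2, 1]
print([classify([a, b], W, Ts) for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 0]
```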


The training procedure for the whole cascade is subdivided into several training periods during which only the parameters of a single threshold element are modified in the same way as described under 2.4. The parameters of all other elements remain unchanged. In Fig. 3 the corresponding threshold elements which are modified during the different training periods are marked by a diagonal stroke.

FIG. 3. Training algorithm. [The figure shows the alternation of training periods and structure extensions: training period of S1; extension of the structure; training periods of S2 and S1; extension of the structure; training periods of S3, S2 and S1; extension of the structure; training period of Se.]

The algorithm starts with a training period of the threshold element with the highest index number e. Then the training periods of the threshold elements with decreasing index number v (v = e−1 … 1) follow sequentially. After the training period of the threshold element S1 the algorithm continues with a training phase of the highest ordered threshold element Se. The sequence of training periods of Se, Se−1, … S2, S1 is called a training cycle.

A new gain function is defined which supervises the entire cascade structure. It is defined as the normalized sum of all negative distances y_ei to the threshold T_e. This gain function, which has to be maximized, is called the global gain function Zg [equation (7)]:

  Zg = Σ_{y_ei ≤ 0, 1 ≤ i ≤ m} y_ei / |W_e|   has to be maximized.        (7)
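The overall control flow of section 3.2 — training cycles from S_e down to S_1, comparison of Zg between successive cycles, and extension of the structure — can be sketched at skeleton level. `train_period` and `global_gain` are assumed callables standing in for the procedure of section 2.4 and equation (7); a sketch, not the paper's program:

```python
def train_cascade(patterns, labels, train_period, global_gain, max_elements=10):
    """Alternate training cycles (S_e down to S_1) with structure
    extensions until Zg reaches zero or max_elements is hit."""
    cascade = [None]                      # start with a single element
    prev_zg = float("-inf")
    while True:
        for v in range(len(cascade), 0, -1):      # one training cycle
            cascade[v - 1] = train_period(cascade, v, patterns, labels)
        zg = global_gain(cascade, patterns, labels)
        if zg == 0:                       # all patterns separated
            return cascade
        if zg == prev_zg:                 # no improvement: extend structure
            if len(cascade) >= max_elements:
                return cascade
            cascade.append(None)          # new element gets index e + 1
        prev_zg = zg
```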
The algorithm always compares the values of the global gain function of two successive training cycles. If both values are equal, the existing cascade structure has to be extended by an additional threshold element. During the training period of threshold element Sv the modifications of all parameters are regarded, including the weights of the additional inputs. The output values f_vi (i = 1 … m) of Sv are given by:

  if  Σ_{j=1}^{n} x_ij·w_vj + Σ_{r=1}^{v−1} f_ri·w_v,n+r − T_v  ≥ 0   then f_vi = 1
                                                               ≤ −ε  then f_vi = 0.
Since the weights w_v1,n+r of the additional inputs of higher ordered threshold elements S_v1 (1 < v1 ≤ e, r = 1 … v1−1) may be positive or negative, the following correspondence does not exist:

  f_vi = 1  if X_i ∈ B
  f_vi = 0  if X_i ∈ A.

Before starting the training of Sv it is therefore necessary to decide which value (0 or 1) of f_vi (i = 1 … m) would produce a better value of the global gain function Zg. To achieve this, the value of f_vi (i = 1 … m) which is obtained by presenting the pattern X_i to the network is negated tentatively. If the result of this test negation is such that

  y'_ei < y_ei  and  y_ei ≤ 0  →  ΔZg_i = y'_ei − y_ei
  or
  y'_ei < 0  and  y_ei > 0  →  ΔZg_i = y'_ei

then during the training period of Sv the value of f_vi should not be changed, since the value of the global gain function can not be improved by this modification (ΔZg_i < 0). y'_ei denotes the value of the distance of X_i to the threshold T_e after test negation and ΔZg_i is the corresponding difference of the values of Zg which results from this modification. If however the test results in

  0 ≥ y'_ei > y_ei  →  ΔZg_i = y'_ei − y_ei > 0
  or
  y'_ei > 0  and  y_ei < 0  →  ΔZg_i = −y_ei > 0,   1 ≤ i ≤ m
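Per pattern, this case analysis reduces to the change in that pattern's contribution to the (here unnormalized) global gain: a pattern contributes y_ei only while y_ei ≤ 0. A sketch with hypothetical names:

```python
def delta_zg(y_before, y_after):
    """dZg_i for one pattern, given its distance y_ei before and the
    distance y'_ei after tentatively negating f_vi. Positive result:
    the negation is worth pursuing; negative: it would hurt."""
    contrib = lambda y: y if y <= 0 else 0   # wrongly classified terms only
    return contrib(y_after) - contrib(y_before)

print(delta_zg(-2, -1))   # 0 >= y' > y      ->  1, improvement
print(delta_zg(-2, 3))    # y' > 0, y < 0    ->  2, improvement
print(delta_zg(1, -3))    # y' < 0, y > 0    -> -3, would hurt
print(delta_zg(2, 5))     # both correct     ->  0, ignore pattern
```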


then the training algorithm should try to achieve this negation of f_vi by modification of the weights w_vj (j = 1 … n+v−1), since it leads to an improvement of the value of Zg. In all the other cases,

  y'_ei ≥ 0  and  y_ei ≥ 0  →  ΔZg_i = 0
  or
  y'_ei = y_ei  →  ΔZg_i = 0,   1 ≤ i ≤ m

a modification of f_vi does not affect the value of the global gain function. During the following training period of Sv these patterns must not be taken into consideration. The subset of training patterns therefore only contains patterns X_i with ΔZg_i ≠ 0. To achieve the best value of the global gain function during a training period of Sv it is possible to define a local gain function Zl_v which has to be minimized:

  Zl_v = Σ_{1 ≤ i ≤ m, ΔZg_i > 0} ΔZg_i   has to be minimized.        (8)

The allowed parameter modifications during a training period of Sv are the same as given by equations (2) and (3). Denoting the value of the local gain function after a modification of type k (k = 1 … 4) at component w_j (j = 1 … n+v−1) by Zl'_vjk, equation (9) gives the best value Zl'_v of the local gain function which can be obtained by one of the allowed modifications:

  Zl'_v = min_{j=1…n+v−1} [ min_{k=1…4} (Zl'_vjk) ]        (9)

That modification is actually executed which results in the best value of the local gain function Zl'_v. By observing the values of the local gain function over the sequential training steps it can be decided when a training period of Sv has to be terminated.(4) In the simulation of the procedure a stopping condition similar to that under 2.4 proved to be sufficient: the training period of Sv can be terminated if after 2·(n+v−1) training steps no further improvement of the local gain function can be reached.

If during a training cycle there was no Zl_v (v = 1 … e) which could be improved, the value of the global gain function Zg remains unchanged. In this situation the existing cascade structure must be extended. This can be done in such a way that the general cascade with e elements is extended to a general cascade with e+1 elements. In the simulation of this procedure the introduced element was the element with index number e+1 in the extended structure. After the extension of the structure, the training process continues as described before, until all patterns of the set are separated correctly.

3.3 Simulation and results

To test the effectiveness of this procedure several computer simulation experiments have been carried out. Problems were used where the maximal structure of a separating cascade was known. To achieve this, binary pattern sets and general cascades were generated randomly. The cascades were only used to determine the class-membership of the patterns, which served as training samples to the algorithm. The weights of the randomly generated cascades were limited to integer numbers in the range of ±5. The results shown in Table 2 demonstrate that the number of computed elements in a cascade roughly corresponds to the number of elements in the generated cascade. The variation of the weight values of the computed cascades was restricted to integer numbers in the range ±10 with Δ = 1 and ε = 1.

TABLE 2. RESULTS WITH CASCADE STRUCTURE

Set no.    m     n    e_g    e_c
  1       200    8     3      3
  2       200    8     3      2
  3       300    8     3      4
  4       400    9     4      7
  5       300   10     4      4
  6       300   12     5      6
  7       300   12     4      4
  8       300   12     4      5
  9       400   15     4      4
 10       400   15     5      6

m = number of patterns. n = number of criteria. e_g = number of elements in the generated cascade. e_c = number of elements in the computed cascade.

As a second group of pattern sets, handwritten numerals were classified. The binary measurements were given by the elements of a 9 × 5 raster which was produced by a preprocessing procedure from a 64 × 64 raster. The classification of the 10-class problem was subdivided into 10 two-class problems by separating one class from all remaining classes. The pattern set contained 450 patterns, each with 45 criteria. The weight values were restricted to integer numbers in the range of ±5. The algorithm computed not more than 3 threshold elements in a cascade for one separation problem.
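The decomposition of the 10-class numeral problem into 10 two-class problems can be sketched as one-against-all relabeling; `train_classifier` stands for any two-class training procedure and is an assumed name:

```python
def one_vs_rest(patterns, digit_labels, train_classifier):
    """Train one two-class classifier per digit d, separating class d
    (relabeled 1) from the nine remaining classes (relabeled 0)."""
    classifiers = {}
    for d in range(10):
        binary = [1 if lab == d else 0 for lab in digit_labels]
        classifiers[d] = train_classifier(patterns, binary)
    return classifiers

# Hypothetical stand-in trainer that just counts the positive patterns:
clfs = one_vs_rest([[0], [1], [2]], [7, 7, 3], lambda p, b: sum(b))
print(clfs[7], clfs[3], clfs[0])  # 2 1 0
```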

4. CONCLUSION

These examples demonstrate that the described procedure in all cases comes to a solution with a reasonable number of threshold elements in a general cascade. A further advantage of the algorithm is that the computed values of the weights and thresholds are integer numbers within given limits, so that the threshold elements may be implemented without difficulties.

SUMMARY

Nonlinear separation of pattern sets may be achieved by cascaded threshold elements. A method is presented to synthesize such networks by iterative computation of the parameters of the threshold elements. The numerous possibilities of parameter modification of a threshold element are strongly restricted to 4 basic types. An algorithm which works under the control of a gain function carries out the modifications sequentially. The algorithm starts with the training of one threshold element. If the given problem is not linearly separable, the training is terminated when a stopping condition is reached, which is derived by observation of the gain function. In this case the structure of the classifier is extended to a cascade with 2 threshold elements. If the given problem cannot be solved with two threshold elements, after some training the structure is again extended, and so on. The training of a general cascade, which is supervised by a global gain function, is subdivided into separate training cycles and training periods. During a training period only the parameters of one threshold element are modified. During a training cycle all threshold elements of the cascade are modified sequentially following some ordering scheme. The values of the global gain function of successive training cycles are compared and determine when the structure has to be extended. The procedure has been tested with different pattern sets which were generated at random. The results are satisfactory.

REFERENCES

1. J. A. Cadzow, Synthesis of nonlinear decision boundaries by cascaded threshold gates, IEEE Trans. Computers C-17, 1165-1172 (1968).
2. J. E. Hopcroft, Synthesis of threshold logic networks, T.R. 6764-1, Stanford Electronics Labs., Stanford, Cal. (April 1964).
3. F. Holdermann, H. Kazmierczak and W. Zorn, An algorithm for adapting threshold elements by a digital computer, Proc. Int. Conf. Pattern Recognition, Grenoble, 11-13 Sept., pp. 149-159 (1968).
4. F. Holdermann, Separierung durch Kaskadenschaltung von Schwellwertelementen, Dissertation, Universität Karlsruhe (1969).
5. C. H. Mays, Adaptive threshold logic, TR 1557-1, Stanford Electronics Lab., Stanford, Cal. (1963).