INFORMATION SCIENCES 70, 109-118 (1993)

Supervised Learning Based on “Neurons” Sensitive to Similarities and Dissimilarities in the Stimulus Features

JOSEF SKRZYPEK and VALTER RODRIGUES

ABSTRACT

An abstract model of a neuron is introduced that can compare stimuli by detecting similarities (S) and dissimilarities (D) in synaptic inputs. Using SD neurons in a traditional error backpropagation (BP) neural network improves categorization and learning capability. The nonlinear combination of similar and dissimilar input features captures more extensive information about input stimuli. This guarantees more effective convergence properties when tested with the XOR problem.

1. INTRODUCTION

Recently a number of computational models have been introduced that attempt to extend the Hebb theory of learning [2, 4, 6]. The most popular paradigm, error backpropagation-based learning [5], results from comparing the network output (activated by a currently viewed instance of an object) with the unique form of the output category representative specified by the supervisor. The square of the differences found in this comparison is a measure of distance between the category representative and the input; successful learning is indicated by the square of this error value approaching zero. In principle, learning could involve comparisons between the currently viewed instance of an object and previous inputs that have already been categorized. Furthermore, such a comparison should examine similar and dissimilar characteristics between object instances. In general, when comparing inputs from the same category, similarities seem to be most important [7, 8]. On the other hand, for stimuli belonging to different categories, the comparison of dissimilarities is more relevant than feature similarities.

In this paper we introduce similarity-dissimilarity (SD) “neurons” that can explicitly capture information relevant to the distance between similarities and dissimilarities of object features. We consider here only the first-order comparison between the current exemplar and the most recently learned category representative. In other words, during sequential presentation of the input data instances, two consecutive stimuli are compared to each other. The time delay between consecutive inputs can easily be adjusted to allow comparisons between nonconsecutive inputs. The distance measure between two objects compared analytically depends on a multitude of factors [8]. Our formulation of the SD neuron relates the similarity or dissimilarity of two objects as a function of both common and distinctive features of the two objects. Thus an increase in the number of common features implies growing similarity, and conversely a decrease in common features increases dissimilarity. This is a monotonic relationship in which the relative effect of any two common (distinctive) features is independent of a third common (distinctive) feature.

2. SD NEURON FORMALISM

Our SD “neuron” is sensitive to a non-Euclidean distance measure that indicates similarity or dissimilarity between two objects. Its mathematical formalism can be expressed as a neuronal input-output relationship specified by a nonlinear combining function modified from [8].

The sensitization parameters $A_i$ and $B_i$ are given by

$$
A_i = \begin{cases} 1 & \text{if } F_{i,t} \neq F_{i,t-\Delta_i} \pm T_i,\\ 0 & \text{otherwise,} \end{cases}
$$

and

$$
B_i = \begin{cases} 1 & \text{if } F_{i,t} = F_{i,t-\Delta_i} \pm T_i,\\ 0 & \text{otherwise.} \end{cases}
$$


The output is given by the nonlinear combining function, Equation (1).
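A plausible form of Equation (1), consistent with the parameter definitions below and with the $(1-\lambda)$ and $-\lambda$ factors described later in this section (the exact expression is an assumption reconstructed from those definitions, not a verbatim reproduction of the original), is

$$
a(t) \;=\; \sum_{i=1}^{N}\bigl[(1-\lambda)\,B_i \;-\; \lambda\,A_i\bigr]\,w_{F_i}\,F_{i,t}\;-\;\Theta_i ,
$$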

where

$a(t)$ is the activation function of the SD neuron; $\lambda \in [0,1]$ defines whether the SD neuron is more sensitive to similarities or to dissimilarities; $N$ is the total number of synapses in the neuron; $F_{i,t}$ is the value of feature $i$ at instant $t$; $w_{F_i}$ is the associated synaptic weight; $A_i$ is the mismatching parameter that fires when a feature $F_{i,t-\Delta_i}$ that occurred in the recent past $(t-\Delta_i)$ is absent from the current input, and vice versa; $B_i$ is the matching parameter that indicates whether a feature $F_{i,t-\Delta_i}$ is now also present in the input; $\Delta_i$ indicates the temporal interval within which comparisons are effective; $\Theta_i$ is the threshold of cell $i$, which can be learned; and $T_i$ is the amount of noise that can be tolerated during the comparison.

The parameters $A_i$ and $B_i$ indicate whether the associated feature $F_{i,t}$ stimulates the SD neuron in a sensitization or desensitization state, respectively. In other words, $A_i$ and $B_i$ indicate whether the feature value $F_{i,t}$ should be multiplied by a desensitization factor $1-\lambda$ (if $B_i = 1$) or by a sensitization factor $-\lambda$ (if $A_i = 1$). A desensitization state prevails when excitatory and inhibitory synapses remain unchanged in their respective functions. A sensitization state occurs when changes in the neuron input cause an inversion in the synaptic function; for example, an excitatory synapse becomes inhibitory, and vice versa.

2.1. ENCODING SIMILARITIES AND DISSIMILARITIES AS CHANGES IN SYNAPTIC WEIGHTS

To understand how a network of SD neurons could learn an input pattern encoded as different degrees of similarity between temporally distinct exemplars, we consider categorization as a form of conditioned response to a distributed pattern of activity. In other words, the learning phase is characterized by a sequence of patterns that should be categorized one at a time, regardless of how many comparisons are made. The BP learning rule [5] is applied every time an exemplar is presented to activate the network, and the weights are updated after each presentation cycle according to

$$
\Delta w_{ji} \;=\; \eta\,\delta_j\,O_i \times
\begin{cases}
1-\lambda & \text{for similarity in the synapse activity,}\\
-\lambda & \text{for dissimilarity in the synapse activity,}
\end{cases}
\tag{2}
$$

where the error term $\delta_j$, given by Equation (3), carries the same similarity/dissimilarity branching: $\delta_j$ is the error for an arbitrary cell $j$, $\eta$ is the learning rate, $O_i$ is the SD neuron output, and $f'(o_j)$ is the derivative of the output. In essence, the synaptic weight change can be generated in two different and opposite ways. In the case of a feature similarity, $\Delta w_{ji}$ is a function of positive values of $\delta_j$, $O_i$, and $\eta$; in the case of a dissimilarity, the synaptic weight change depends on negative and proportionally different values of $\delta_j$, $O_i$, and $\eta$. Thus similarity and dissimilarity information can be propagated through the net by two semantically different values encoded as changes in the synaptic weights.
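As a concrete illustration of Equation (2), the following Python sketch computes the weight change for a few synapses. The function name, the boolean mask used to supply the similarity/dissimilarity branch, and all numeric values are illustrative assumptions, not the authors' implementation.

```python
def sd_delta_w(eta, delta_j, o_i, similar, lam):
    """Equation (2): eta * delta_j * O_i, scaled by (1 - lambda) for
    similarity in the synapse activity and by -lambda for dissimilarity."""
    factor = (1.0 - lam) if similar else -lam
    return eta * delta_j * o_i * factor

# Example: one cell j fed by three SD neurons i = 0, 1, 2 (values are made up).
eta, lam = 0.7, 0.7                 # learning rate and sensitivity parameter
delta_j = 0.1                       # backpropagated error of cell j
o = [0.8, -0.3, 0.5]                # SD neuron outputs O_i
similar = [True, False, True]       # B_i = 1 (match) vs. A_i = 1 (mismatch)

dw = [sd_delta_w(eta, delta_j, o[i], similar[i], lam) for i in range(3)]
print(dw)  # similar synapses get (1-lambda)-scaled updates; dissimilar ones flip sign
```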

3. RESULTS

To examine the temporal behavior of the neuron model, we designed the simulation experiments shown in Figure 1. The inputs are simple binary waveforms in which the frequency of $I_1$ is twice the frequency of $I_2$. The output of the neuron is normalized by a sigmoid function and has values ranging between $-1$ and $+1$.

3.1. TEMPORAL BEHAVIOR OF THE SD NEURON

Fig. 1. SD neuron with two binary inputs $F_1$ and $F_2$, where the frequency of $F_1$ is twice the frequency of $F_2$. $W_1$ and $W_2$ are the synaptic weights, and $a(t)$ is the neuron activity described by the combining function given in Equation (1).

According to Equation (1), when the excitatory synapses of the SD neuron are activated with similar inputs over some period of comparison $\Delta_i = 1$, the neuron generates positive output. If the synaptic activities are inhibitory, the output becomes negative. When the synaptic inputs are dissimilar, the neuron behaves as though the polarity of each synapse were reversed; excitatory synapses become inhibitory and vice versa. In other words, temporal dissimilarity in the inputs ($A_i = 1$ and $B_i = 0$) to the SD “neuron” is equivalent to having its activity $a(t)$ dominated by the second term in (1). The resulting activation level is proportional to a combination of these “inversions” in synaptic activities. In general, the activation level is a graded function of synaptic polarity changes; increasing similarities are signaled by continuity in the level of the SD neuron output, while dissimilarities are represented by output transitions between high and low levels. All synaptic activities remain unchanged in value and polarity until a learning rule modifies them.

Fig. 2. The temporal behavior of the neuron for two binary inputs as shown in Figure 1. (A) Neuron response for $\lambda = 0.7$, $W_1 = 0.5$, $W_2 = 0.7$, $\Delta_1 = 1$, $\Delta_2 = 1$ with two excitatory synapses; (B) neuron response for $\lambda = 0.7$, $W_1 = 0.7$, $W_2 = -0.5$, $\Delta_1 = 1$, $\Delta_2 = 2$ with one excitatory synapse and one inhibitory synapse.

Figure 2 shows two different responses of the SD neuron stimulated with the two binary inputs of Figure 1. Using Equation (1), which specifies the combining rule for SD activation, and the weight-update equations (2) and (3), we can explain the temporal behavior of the SD neuron output in the following way. In Figure 2(A), when both synapses are excitatory ($w_1$ and $w_2$ are positive), the dissimilarity of the inputs at the first (positive) transition of $I_1$ results in a negative transition in the output trace, an effect comparable to both input synapses becoming inhibitory. The corresponding parameters of Equation (1) have the values $A_1 = 1$, $\lambda = 1$, $w_1 = 0.7$, $w_2 = 0.5$, $B_1 = 0$ because $F_1$ changes in amplitude from 0.0 to 1.0, $B_2 = 1$

because $F_2$ remains the same at 0.0, $F_1 = 1.0$, $F_2 = 0.0$, $A_1 = 1$, and $A_2 = 0$. The final result of this first transition is the output value of $-0.49$. During the second (negative) transition in $I_1$, the second input $I_2$ goes “high.” The corresponding values of the parameters in the combining function are $A_1 = 1$, $\lambda = 1$, $w_1 = 0.7$, $w_2 = 0.5$, $B_1 = 0$ because $F_1$ changes in amplitude from 1.0 to 0.0, $B_2 = 0$ because $F_2$ changes from 0.0 to 1.0, $F_1 = 0.0$, $F_2 = 1.0$, $A_1 = 1$, and $A_2 = 1$. The final result of this second transition is the output value $-0.35$. During the third and final transition, the inputs $I_1$ and $I_2$ become similar to each other, resulting in an additional positive step at the output, signifying increasing similarity. Thus the cumulative effects of both inputs cause the following sequence of transitions in the output level: $-0.49$, $-0.35$, $-0.34$ [Figure 2(A)].

The same analysis can be applied to Figure 2(B), which represents a more complex situation. Here the comparisons are between two consecutive inputs at the first synapse ($\Delta_1 = 1$), which is excitatory, with $w_1 = 0.7$. At the second synapse, which is inhibitory, with $w_2 = -0.5$, the comparison is made between every third input ($\Delta_2 = 2$).
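The temporal behavior just described can be imitated with a short simulation. The sketch below steps an SD neuron over the two binary inputs of Figure 1, using the assumed form of the combining function from Section 2 and a tanh output normalization; because that form and the parameter values are assumptions, the printed output values ($-0.49$, $-0.35$, $-0.34$) will not be reproduced exactly, only the qualitative pattern of negative and positive output transitions.

```python
import numpy as np

def sd_activation(F_t, F_prev, w, lam, theta=0.0, tol=0.0):
    """Assumed combining function: a matching feature (B_i = 1) contributes
    (1 - lam) * w_i * F_i; a mismatching feature (A_i = 1) contributes
    -lam * w_i * F_i, i.e., the synaptic polarity is effectively inverted."""
    match = np.abs(F_t - F_prev) <= tol           # B_i (within noise tolerance T_i)
    factor = np.where(match, 1.0 - lam, -lam)     # (1 - lambda) vs. -lambda
    return np.sum(factor * w * F_t) - theta

# Binary inputs as in Figure 1: F1 toggles twice as fast as F2.
F1 = np.array([0, 1, 0, 1, 0, 1, 0, 1], dtype=float)
F2 = np.array([0, 0, 1, 1, 0, 0, 1, 1], dtype=float)
w = np.array([0.7, 0.5])                          # two excitatory synapses
lam = 0.7

for t in range(1, len(F1)):
    F_t = np.array([F1[t], F2[t]])
    F_prev = np.array([F1[t - 1], F2[t - 1]])
    a = sd_activation(F_t, F_prev, w, lam)
    print(t, round(float(np.tanh(a)), 3))         # sigmoid-normalized output in (-1, +1)
```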

3.2. REGULARITY OF CONVERGENCE

To examine the learning capabilities of SD neurons, we compared the performance of a conventional BP algorithm with BP based on SD neurons, using the exclusive-or (XOR) problem as a test. The XOR problem is a parity problem in which an output response of 1 is required if the input pattern of size 2 (00, 01, 10, 11) contains an odd number of 1's, and 0 otherwise. A three-layer network with two input nodes, five hidden nodes, and one output node was used with three different values of the learning-rate parameter ($\eta$ = 0.7, 1.0, 1.4). The behavior of this network can be analyzed as was done for the single SD “neuron” in Figure 2. We consider the learning phase to be completed when the error falls below a small cutoff threshold; alternatively, the learning phase is terminated after an upper limit on the number of epochs is exceeded. The SD BP network regularly converged to a stable solution in all experiments (Figure 3). Although the SD BP was often slower in reaching the cutoff convergence point than the best results for conventional BP, its upper and lower limits on the number of epochs were finite and proportional to the learning rate. The dark shaded area represents the scatter region of convergence for the SD-based BP. Results for the conventional BP (lightly shaded area) indicate the absence of an upper limit for convergence. The XOR learning experiments show that SD networks display a more consistent regularity in convergence behavior than conventional BP networks. In Figure 4, the convergence curve for a traditional BP (upper part of the figure) displays many large variations in error value from one epoch to another. In comparison, the error function of the SD network ($\lambda$ = 0.001) shows a relatively smooth descent, implying that the network adaptation is in the right direction during almost every epoch.
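For reference, the sketch below sets up the baseline experiment in Python: a 2-5-1 network trained with conventional BP on XOR at the three learning rates. The SD variant would additionally scale each update by $(1-\lambda)$ or $-\lambda$ as in Equation (2), at the point marked by the comment. The error cutoff, epoch limit, batch-style updates, and weight initialization here are placeholders, since the corresponding details in the original are either not legible or not specified.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)    # XOR targets (odd parity -> 1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_xor(eta, max_epochs=20000, cutoff=1e-3):
    """Conventional 2-5-1 BP baseline; returns (epochs used, final squared error)."""
    W1 = rng.uniform(-0.5, 0.5, (2, 5)); b1 = np.zeros(5)
    W2 = rng.uniform(-0.5, 0.5, (5, 1)); b2 = np.zeros(1)
    err = np.inf
    for epoch in range(1, max_epochs + 1):
        H = sigmoid(X @ W1 + b1)                   # hidden layer
        O = sigmoid(H @ W2 + b2)                   # output layer
        err = 0.5 * np.sum((T - O) ** 2)           # squared-error distance measure
        if err < cutoff:
            return epoch, err
        dO = (O - T) * O * (1 - O)                 # output-layer delta
        dH = (dO @ W2.T) * H * (1 - H)             # hidden-layer delta
        # An SD-based network would scale these updates per synapse by
        # (1 - lambda) or -lambda, according to Equation (2).
        W2 -= eta * H.T @ dO; b2 -= eta * dO.sum(axis=0)
        W1 -= eta * X.T @ dH; b1 -= eta * dH.sum(axis=0)
    return max_epochs, err

for eta in (0.7, 1.0, 1.4):
    print(eta, train_xor(eta))
```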

4. DISCUSSION

Fig. 3. Comparison of the convergence limits between the SD “neuron”-based BP (horizontal or vertical shading) and the conventional BP network (gray shading); the horizontal axis shows the number of epochs. The conventional BP nets have no upper limit for convergence.

Examination of a current exemplar with respect to previously categorized representatives allows the SD-based BP to learn information about past activities of the hidden units as compared with the present level of activation. This information, propagated throughout the network, gives a better reproduction of the proper category representative. Differences in this reproduction are considered errors that are quantitatively backpropagated, thus modifying the weights and forcing the network to learn the currently presented exemplar. In other words, the similarities and dissimilarities detected between the stimulus input and the category representatives give the SD-based BP neural network an ability to find more informative local minima of the error function. This property is comparable to the modified algorithms of the random optimization method of Matyas [1], which ensure convergence to the global minimum of the objective function. Our studies are restricted to “first-order comparisons” for the sake of simplicity, but this should not preclude more intensive comparative processing in which higher-order comparisons could be made in parallel as a sort of local aligning technique [3]. More representatives can be included in the comparisons as the time delay increases; an extensive analysis could be done between all exemplars. Thus the singularity of the current aspect of the viewed exemplar in relation to all other representatives can be detected. Our preliminary experimental results suggest that a supervised learning strategy like BP can be made more efficient when learning includes comparisons.


Fig. 4. Comparison of convergence behavior between the SD-based BP (bottom) and the conventional BP (top) when tested with the XOR problem. The learning curve of the conventional BP displays many sharp jumps (“noise”), indicating large errors between consecutive epochs. The SD-based BP network, with $\lambda = 0.001$, displays a much smoother learning curve.

Valter Rodrigues was supported by an INPE/CNPq-Brazil fellowship. We sincerely acknowledge partial support to JS by ARCO-UCLA grant no. 1, ONR grant N00014-86-K-0395, and ARO grant DAAL03-88-K-0052.

REFERENCES

1. N. Baba, A new approach for finding the global minimum of error function of neural networks, Neural Networks 2:367-373 (1989).
2. M. A. Cohen and S. Grossberg, Absolute stability of global pattern formation and parallel memory storage by competitive neural networks, IEEE Trans. Syst. Man Cybern. 13:815-826 (1983).
3. C. Koch and S. Ullman, Selecting one among the many: a simple network implementing shifts in selective visual attention, MIT AI Memo 770, 1984.
4. A. H. Klopf, A neuronal model of classical conditioning, Psychobiology 16:85-125 (1988).
5. J. L. McClelland, D. E. Rumelhart, and the PDP Research Group, Parallel Distributed Processing, MIT Press, Cambridge, Mass., 1986.
6. R. S. Sutton and A. G. Barto, Toward a modern theory of adaptive networks: expectation and prediction, Psychol. Rev. 88:135-170 (1981).
7. A. Treisman, Features and objects: the fourteenth Bartlett memorial lecture, Q. J. Exp. Psychol. 40A(2):201-237 (May 1988).
8. A. Tversky, Features of similarity, Psychol. Rev. 84:327-352 (1977).

Received 28 October 1991; revised 30 April 1992