Neural Networks, Vol. 2, pp. 495-516, 1989
0893-6080/89 $3.00 + .00 Copyright © 1989 Pergamon Press plc
Printed in the USA. All rights reserved.
ORIGINAL CONTRIBUTION
A Learning Mechanism for Invariant Pattern Recognition in Neural Networks

A. C. C. COOLEN AND F. W. KUIJK

University of Utrecht
(Received 24 October 1988; revised and accepted 26 May 1989)

Abstract--We show that a neural network with Hebbian learning and transmission delays will automatically perform invariant pattern recognition for a one-parameter transformation group. To do this it has to experience a learning phase in which static objects are presented as well as objects that are continuously undergoing small transformations. Our network is fully connected and starts with zero initial synapses, so it does not require any a priori knowledge of the transformation group. From the information contained in the "moving" input, the network creates its internal representation of the transformation connecting the moving states. If the network cannot perform this transformation exactly, we show that in general the network representation will be a sensible approximation in terms of state overlaps. The limitation of our model is that we can implement only one-parameter transformation groups.
Keywords--Neural network, Invariant pattern recognition, Self-organization, Transformations.

Acknowledgements--We would like to thank Dr. J. P. Nadal for his very useful suggestions. Miss S. M. McNab gave valuable stylistic advice. Requests for reprints should be sent to A. C. C. Coolen, Department of Medical and Physiological Physics, University of Utrecht, Princetonplein 5, 3584 CC Utrecht, The Netherlands.

1. INTRODUCTION

Nowadays the functioning of neural networks as associative memories is fairly well understood. The basic idea is that information is stored in the values of the synaptic connections between pairs of binary neurons (Hebb, 1949; McCulloch & Pitts, 1943). In particular, an item (word, image, etc.) is represented by a system state which is an attractor in state space under the system dynamics. Following Little (1974) and Hopfield (1982) each neuron can be in two states only: it is either at rest or it fires with maximum frequency. The state of a network of N neurons is therefore represented by an N-bit vector. An item is stored in the network by forcing the network into the corresponding state; synapses are then modified according to Hebb's (1949) rule. Because of the formal equivalence of this type of neural network model and frustrated Ising spin systems, one can describe the behaviour of such networks analytically by applying statistical mechanical tools. For instance, it has been shown rigorously that stored items are indeed attractors in state space; introduction of noise may even be fruitful, since it can eliminate so-called spurious stable states (Amit, Gutfreund, & Sompolinsky, 1985a, 1985b, 1987).

As already remarked by Hopfield, sequences of states can be stored in a neural network if synapses are updated in a suitable nonsymmetric way during a learning stage in which patterns are presented in succession. The learning rule used is in fact again Hebb's rule; however, it is assumed to be valid locally, that is, at the position of the synapse, so the existence of transmission delays will lead to the storage of associations between bits of subsequent patterns (Coolen & Gielen, 1988; Dehaene, Changeux, & Nadal, 1987; Gutfreund & Mezard, 1988; Herz, Sulzer, Kuehn, & van Hemmen, 1988; Riedel, Kuehn, & van Hemmen, 1988; Sompolinsky & Kanter, 1986).

The existence of transmission delays can be exploited in another way. Instead of presenting a finite number of states in succession we will present many pairs of states. Synapses will then be modified in such a way that in the resulting network all first states of the pairs that are learned will be mapped onto the corresponding successors (so we store sequences consisting of two patterns). If all pairs consist of states related by the same transformation, and if the number of pairs is sufficiently large, the network will simply perform some sort of representation of the transformation involved. If the network has also learned static objects in the standard way it will be able to perform invariant pattern recognition.

There are two tendencies in the system's evolution: firstly, the network will try to perform pure transformations (as a result of learning the "moving" objects); secondly, the network will try to reconstruct the stored patterns. The latter reconstruction, however, can only occur if the actual network state is correlated with one or more of the stored patterns. Therefore, as long as the actual system state is not correlated with any stored pattern it will be mapped according to the network representation of the transformation that has been learned. We will show that a proper balance between the two tendencies exists, such that a non-zero correlation of the actual system state with some stored pattern will cause the system to reproduce the pattern in question, whereas if all correlations are zero the actual state will simply be transformed. Clearly, an initial state will cause the network to reconstruct a stored pattern if this state can be mapped onto this pattern by repeating the transformation that has been learned; that is, if this transformation is the generator of a continuous group, we have invariant pattern recognition under this specific group. This invariant pattern recognition is purely the result of the presence of transmission delays and emerges only for transformations with which the network is confronted.

Our paper is organised as follows: in section 2 we show how a network can learn a transformation from implicit information if confronted with examples of states which are mutually related by this transformation. In section 3 we consider what happens when the learning of static objects is combined with the learning of a transformation from "moving" input. We show that invariant pattern recognition can be achieved for a special class of transformations (topological maps). Section 4 contains a summary of our results.
2. LEARNING A TRANSFORMATION BY EXAMPLE
2.1. Architecture, Dynamics and Learning

We consider a fully connected neural network of the Little-Hopfield type (Hopfield, 1982; Little, 1974) where the neurons s_i (i = 1 ... N) are represented by Ising spins; that is, s_i = 1 if neuron i fires with maximum frequency and s_i = -1 if neuron i is at rest. The evolution of states in the network follows as usual from local inputs h_i:

h_i \equiv \sum_j J_{ij} s_j - h_{0,i}    (1)

where J_ij is the strength of the synaptic connection from neuron j to neuron i and h_{0,i} is the neuronal threshold of neuron i. If J_ij > 0 neuron j has an excitatory effect on neuron i; if J_ij < 0 the effect is inhibitory. At first we will assume the system dynamics to be synchronous and deterministic:

s_i(t+1) = \mathrm{sgn}[h_i(t)]    (2)

If the local field h_i(t) is zero we draw s_i(t+1) at random according to p(s) = ½δ_{s,1} + ½δ_{s,-1} (we will investigate the implications of introducing noise and of sequential dynamics in sections 2.5 and 2.6 respectively). During a learning phase in which a large number (M) of pairs of states is presented, synapses are modified according to Hebb's (1949) rule. We write the μth pair as (ξ(μ), η(μ)), so ξ(μ) = (ξ_1(μ), ..., ξ_N(μ)) and η(μ) = (η_1(μ), ..., η_N(μ)) (all components are ±1). This learning rule is assumed to be a local one, that is, valid at the position of the synapse. By virtue of the existence of transmission delays, connections will now be formed according to the correlations between bits of subsequent network states. If the delays are of the order of the time between exposure of the network to the state ξ(μ) and to the state η(μ), then presenting pair μ will give rise to the following modification of the synapses:

\Delta J_{ij}(\mu) = \alpha \cdot \eta_i(\mu)\,\xi_j(\mu)

(α is some positive constant). After presenting M pairs we have J_ij = α Σ_μ η_i(μ)ξ_j(μ). We assume that all pairs are related by some transformation T: {-1, 1}^N → {-1, 1}^N (not necessarily invertible) in the following way:

\eta(\mu) = T\,\xi(\mu)

and that all ξ(μ) are drawn at random from some distribution Ω. One finds that after presenting M examples of such a "moving" input, and with appropriate scaling (α = 1/M), the connections are

J_{ij}(M) = \langle (T\xi)_i\, \xi_j \rangle_\Omega + O(M^{-1/2}).    (3)

In the same way we allow for adaptation of the neuronal thresholds, Δh_{0,i}(μ) = αΓ · η_i(μ), resulting in

h_{0,i}(M) = \Gamma\, \langle (T\xi)_i \rangle_\Omega + O(M^{-1/2})    (4)

(where Γ is some constant and α = 1/M). From now on we will only consider the situation where the network is fully trained:

J_{ij} = \frac{1}{N}\, \langle (T\xi)_i\, \xi_j \rangle_\Omega    (5)

h_{0,i} = \Gamma\, \langle (T\xi)_i \rangle_\Omega    (6)
(connections are scaled by 1/N to ensure that local fields are bounded). In section 2.2 we will show that, if the transformation T can be learned in this way, \sum_j \langle (T\xi)_i \xi_j \rangle_\Omega^2 will be of order 1. From this it follows that M ≫ N² will generally suffice as the number of transformation examples, since the relative error in the local fields is then O(N/√M). There is no contradiction between having M ≫ N² learning examples and the finite storage capacity of the usual associative memory models, which is 2N at the most: storing patterns in a network can be regarded as learning a specific transformation T, but the number of patterns stored affects only the definition of T and is not related to the number of training examples.
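As a concrete illustration of the learning stage, the following minimal sketch (not the authors' code; the cyclic-shift transformation, system size, number of pairs and seed are arbitrary choices) builds the fully trained couplings (5) and thresholds (6) from pairs (ξ, Tξ) and then applies the synchronous dynamics (2) to a fresh state.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, Gamma = 100, 20000, 0.0

def T(xi):
    """Example transformation: a cyclic shift by one site (a topological map)."""
    return np.roll(xi, 1)

# Delay-Hebbian learning from M "moving" pairs (xi, T xi), eqs. (3)-(6):
J = np.zeros((N, N))
h0 = np.zeros(N)
for _ in range(M):
    xi = rng.choice([-1.0, 1.0], size=N)       # state drawn from Omega (unbiased)
    eta = T(xi)                                # its successor under T
    J += np.outer(eta, xi) / (M * N)           # Delta J_ij = alpha * eta_i * xi_j, with alpha = 1/(M N)
    h0 += Gamma * eta / M                      # threshold adaptation, eq. (4)

def step(s):
    """Synchronous deterministic dynamics, eq. (2); zero fields broken at random."""
    out = np.sign(J @ s - h0)
    zero = out == 0
    out[zero] = rng.choice([-1.0, 1.0], size=zero.sum())
    return out

# After training, a fresh state s should be mapped onto (approximately) T s:
s = rng.choice([-1.0, 1.0], size=N)
overlap = np.mean(T(s) * step(s))              # the overlap used as quality measure in eq. (7)
print(f"overlap between network image and T s: {overlap:.2f}")
```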
2.2. Quality of the Network Representation

As a measure of the quality Q of the network representation of T we use the average overlap between the image of a state under the neurodynamics following from (2), (5), and (6) and the image of this state under T:

Q \equiv \Big\langle \frac{1}{N} \sum_i (T\bar{s})_i\, \mathrm{sgn}\Big[ \sum_j J_{ij} s_j - h_{0,i} \Big] \Big\rangle_\Omega = \frac{1}{N} \sum_i \int dX\; P_i(X)\, \mathrm{sgn}(X)    (7)

where

P_i(X) \equiv \Big\langle \delta\Big[ X - (T\bar{s})_i \Big( \frac{1}{N} \sum_j \langle (T\bar{s})_i s_j \rangle_\Omega\, s_j - \Gamma \langle (T\bar{s})_i \rangle_\Omega \Big) \Big] \Big\rangle_\Omega.    (8)

We approximate (7) and obtain an estimate \tilde{Q} by replacing P_i(X) by a Gaussian distribution with the correct mean and variance:

\tilde{Q} \equiv \frac{1}{N} \sum_i \frac{2}{\sqrt{\pi}} \int_0^{\mu_i/\sigma_i\sqrt{2}} dx\; e^{-x^2}    (9)

where

\mu_i \equiv \int dX \cdot X \cdot P_i(X) = \frac{1}{N} \sum_j \langle s_j (T\bar{s})_i \rangle_\Omega^2 - \Gamma \langle (T\bar{s})_i \rangle_\Omega^2

\sigma_i^2 \equiv \int dX \cdot X^2 \cdot P_i(X) - \mu_i^2 = \frac{1}{N^2} \sum_{jk} \langle s_j (T\bar{s})_i \rangle_\Omega \langle s_j s_k \rangle_\Omega \langle s_k (T\bar{s})_i \rangle_\Omega + \Gamma^2 \langle (T\bar{s})_i \rangle_\Omega^2 - 2\Gamma\, \frac{1}{N} \sum_j \langle (T\bar{s})_i \rangle_\Omega \langle s_j \rangle_\Omega \langle s_j (T\bar{s})_i \rangle_\Omega - \mu_i^2.

We will (for simplicity) consider only states drawn at random from the distribution p(s_j) = ½[δ_{s_j,1} + δ_{s_j,-1}] and accordingly we neglect threshold adaptation; that is, Γ = 0. One finds

\mu_i = \frac{1}{N}\tau_i, \qquad \sigma_i^2 = \frac{1}{N^2}\tau_i(1-\tau_i)    (10)

\tau_i \equiv \sum_j \langle (T\bar{s})_i\, s_j \rangle_\Omega^2    (11)

so that

\tilde{Q} = \frac{1}{N} \sum_i \mathrm{erf}\Big[ \Big( \frac{\tau_i}{2(1-\tau_i)} \Big)^{1/2} \Big].

From σ_i² ≥ 0 we can derive τ_i ≤ 1. From (9), (10), and (11) it follows that the network representation of T will be an approximation of T in terms of overlaps as soon as an extensive number of the τ_i are of order 1. Since all τ_i lie in the interval [0, 1] this condition can be written as

\frac{1}{N} \sum_{ij} \langle (T\bar{s})_i\, s_j \rangle_\Omega^2 = O(1)    (12)

(not all transformations can be represented in this simple network, of course; for example, a transformation such as (Ts̄)_i = s_i s_{i+1}, for which ⟨(Ts̄)_i s_j⟩_Ω = 0, cannot be represented).

2.3. Realisable Transformations

We define realisable transformations as transformations which can be realised with a synchronous Little-Hopfield network with appropriate connections. For such transformations T there exists a connection matrix G_ij such that

(T\bar{s})_i = \mathrm{sgn}\Big[ \sum_j G_{ij}\, s_j \Big].

We assume that all G_ij are of the same order in N and consider only transformations where no threshold is necessary. We define the number of projections to neuron i as |I(i)|, where I(i) ≡ {j | G_ij ≠ 0}. We can scale the rows of the matrix G_ij with positive constants without altering the definition of T. We will eliminate this ambiguity by choosing a particular scale:

\sum_j G_{ij}^2 = 2/\pi.    (13)

Again we consider states drawn at random from p(s_j) = ½[δ_{s_j,1} + δ_{s_j,-1}]. In this case one can easily derive from (5) and (6) that J_ij = 0 and h_{0,i} = 0 if j ∉ I(i), while if j ∈ I(i)

J_{ij} = \frac{1}{N}\, \mathrm{sgn}(G_{ij})\, \Big\langle \mathrm{sgn}\Big[ 1 + \mathrm{sgn}(G_{ij})\, s_j \sum_{k \in I(i),\, k \neq j} \frac{G_{ik}}{|G_{ij}|}\, s_k \Big] \Big\rangle_\Omega, \qquad h_{0,i} = 0.

For this kind of transformation the connections J_ij can be computed if we assume |I(i)| → ∞ as N → ∞. As a result of the choice made in (13), one then finds

J_{ij} = G_{ij}/N    (14)

h_{0,i} = 0.    (15)

Apparently such transformations will be learned exactly.
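Equation (14) is easy to check numerically. The sketch below (an illustration, not code from the paper; the dense Gaussian matrix G, the system size and the number of samples are arbitrary choices) builds a realisable transformation whose rows are scaled as in (13), estimates the fully trained couplings (5) by Monte Carlo, and compares N·J with G.

```python
import numpy as np

rng = np.random.default_rng(1)
N, samples = 100, 50000

# A realisable transformation (T s)_i = sgn(sum_j G_ij s_j) with dense rows,
# each row scaled so that sum_j G_ij^2 = 2/pi, as in eq. (13).
G = rng.normal(size=(N, N))
G *= np.sqrt(2.0 / np.pi) / np.linalg.norm(G, axis=1, keepdims=True)

# Monte Carlo estimate of the fully trained couplings, eq. (5):
# J_ij = (1/N) < (T s)_i s_j >  over unbiased random states.
J = np.zeros((N, N))
for _ in range(samples):
    s = rng.choice([-1.0, 1.0], size=N)
    Ts = np.sign(G @ s)
    J += np.outer(Ts, s) / (samples * N)

# Eq. (14) predicts J_ij ~ G_ij / N for high connectivity.
print("mean |N*J - G| :", np.abs(N * J - G).mean())
print("typical |G_ij| :", np.abs(G).mean())
```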
For transformations where the number of connections to each neuron is small, computing the connections J_ij is somewhat more complicated. We will consider a class of transformations with low connectivity for which exact evaluation of (5), (6) is again possible. We consider transformations T for which only the sign of the matrix elements G_ij contains information:
(T\bar{s})_i = \mathrm{sgn}\Big[ \sum_{j \in I(i)} \mathrm{sgn}(G_{ij})\, s_j \Big]    (16)

where the connectivity is uniform: |I(i)| = n for all i. If j ∉ I(i) one finds J_ij = 0 (as it should). If j ∈ I(i) the connection follows by averaging over the 2^{n-1} equally probable configurations of the remaining n - 1 inputs; for n = 1 this simply gives

J_{ij} = \frac{1}{N}\, \mathrm{sgn}(G_{ij}).

Therefore we find, if n > 1 and j ∈ I(i) (using the scale (13), which here gives |G_ij| = √(2/πn)),

N J_{ij}/G_{ij} = \sqrt{\pi n/2}\; 2^{\,1-n} \binom{n-1}{n/2} \qquad \text{if } n \text{ is even}

N J_{ij}/G_{ij} = \sqrt{\pi n/2}\; 2^{\,1-n} \binom{n-1}{(n-1)/2} \qquad \text{if } n \text{ is odd.}    (17)

We show this relation (17) in Figure 1. Again the network will create an exact representation of T.

FIGURE 1. Relative weight of the synaptic connections, NJ_ij/G_ij, as a function of the connectivity n for the example transformations defined in (16).

If we are dealing with the so-called "biased" state populations (where the average activity is non-zero) or with realisable transformations T where thresholds are involved, the network representation will not always be exact; furthermore, the result will depend rather critically on the threshold adaptation Γ. (If the support of T is extensively large we have the exact result (14), (15).) We will now study in more detail the realisable transformations where |I(i)| = 1 for all i.
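The closed form in (17), as reconstructed here, can be checked by brute force: up to the factor sgn(G_ij)/N and the scale factor from (13), the coupling to one of n equally weighted inputs is the average of sgn(1 + X) over all 2^{n-1} configurations of the remaining inputs X. The sketch below (an illustrative check, not code from the paper) compares this enumeration with the binomial expression.

```python
import math
from itertools import product

def coupling_enumerated(n, tie=0.0):
    """< sgn(1 + sum of the other n-1 inputs) >; a zero field counts as `tie`."""
    if n == 1:
        return 1.0
    total = 0.0
    for others in product((-1, 1), repeat=n - 1):
        field = 1 + sum(others)
        total += (field > 0) - (field < 0) if field != 0 else tie
    return total / 2 ** (n - 1)

def coupling_binomial(n):
    """Closed binomial form appearing in eq. (17): 2^(1-n) * C(n-1, n/2 or (n-1)/2)."""
    k = n // 2 if n % 2 == 0 else (n - 1) // 2
    return 2.0 ** (1 - n) * math.comb(n - 1, k)

for n in range(1, 11):
    print(n, round(coupling_enumerated(n), 6), round(coupling_binomial(n), 6))
```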
2.4. Topological Transformations

These transformations are defined as

(T\bar{s})_i = s_{\pi(i)}    (18)

for some index map π: {1, ..., N} → {1, ..., N}. If neuron sites have a one-to-one correspondence with position vectors on some topological structure, then topological transformations are transformations from this structure onto itself (these transformations need not be continuous or invertible ones). If neurons represent pixels on a screen, then topological transformations as defined by (18) are simply transformations from the screen onto itself. For this class of transformations everything can be computed exactly if for the state population Ω we take all states drawn at random from

p(s_i) = \tfrac{1}{2}\,[\,(1+a)\,\delta_{s_i,1} + (1-a)\,\delta_{s_i,-1}\,]    (19)

where a is the average activity:

\langle s_i \rangle_\Omega = a, \qquad \langle s_i s_j \rangle_\Omega = a^2 + (1-a^2)\,\delta_{ij}.

After the learning stage we have

J_{ij} = \frac{1}{N}\,[\,a^2 + (1-a^2)\,\delta_{j,\pi(i)}\,]    (20)

h_{0,i} = \Gamma a.    (21)

The network's representation T* of T will be

(T^*\bar{s})_i = \mathrm{sgn}\Big[ \frac{1}{N}(1-a^2)\, s_{\pi(i)} + a^2 m(\bar{s}) - \Gamma a \Big]

where m(s̄) is the average activity of state s̄: m(s̄) ≡ (1/N)Σ_j s_j. The quality function Q for this case is

Q(N) = \frac{1}{2} \int dm\; P(m)\,(1+m)\; \mathrm{sgn}\Big[ 1 + \frac{N a (a m - \Gamma)}{1-a^2} \Big] + \frac{1}{2} \int dm\; P(m)\,(1-m)\; \mathrm{sgn}\Big[ 1 - \frac{N a (a m - \Gamma)}{1-a^2} \Big]    (22)

where P(m) is the probability that a state drawn from (19) has average activity m. One finds for a = 0

Q(N \to \infty) = 1 \qquad \text{for all } \Gamma

and for a ≠ 0

Q(N \to \infty) = |a| \quad \text{if } \Gamma < a^2, \qquad Q(N \to \infty) = 1 \quad \text{if } \Gamma = a^2, \qquad Q(N \to \infty) = -|a| \quad \text{if } \Gamma > a^2.

Therefore, topological transformations will be learned exactly for "biased" system states as well (i.e., states with non-zero average activity) if thresholds are properly adapted (for a = 0 of course no threshold adaptation is necessary). However, it should be noted that from (22) it also follows that the choice of the adaptation factor Γ is rather critical if a ≠ 0.

2.5. Noise

Following Little (1974) we modify the evolution rule (2) so as to include possible noise as a Markov process:

W(\bar{s} \to \bar{s}') = \prod_i \frac{e^{\beta s_i' h_i(\bar{s})}}{2\cosh(\beta h_i(\bar{s}))}    (23)

where β = 1/T is the inverse "temperature" and the local fields h_i(s̄) are defined by (1). The definition of the quality of the network realisation must be replaced by a new definition, namely the average quality over all final states of the Markov process:

Q \equiv \frac{1}{N} \sum_i \sum_{\bar{s}'} \big\langle W(\bar{s} \to \bar{s}')\,(T\bar{s})_i\, s_i' \big\rangle_\Omega = \frac{1}{N} \sum_i \big\langle (T\bar{s})_i \tanh(\beta h_i(\bar{s})) \big\rangle_\Omega.    (24)

Introducing noise in the way described above apparently amounts merely to replacing sgn[...] by tanh β[...] in the expressions derived for deterministic evolution. Clearly the "inverse temperature" β must then be scaled as β → Nβ. This scaling might have been expected, since local fields in our construction tend to be of order 1/N; scaling the temperature can easily be avoided if one does not scale the connections as in (5). After scaling the temperature as indicated we find, for those realisable transformations considered in 2.3 for which |I(i)| → ∞ as N → ∞,

Q(N \to \infty) = \int_0^\infty dX\; e^{-\pi X^2/4}\, \tanh(\beta X).    (25)

The result is shown in Figure 2: noise does not qualitatively affect the network representation, it just gradually decreases the quality function Q. For the topological transformations considered in section 2.4 we find

Q(N \to \infty) = \Big\langle \tfrac{1}{2}(1+a) \tanh\Big[ \beta(1-a^2)\Big( 1 + \frac{N a (a\, m(\bar{s}) - \Gamma)}{1-a^2} \Big) \Big] + \tfrac{1}{2}(1-a) \tanh\Big[ \beta(1-a^2)\Big( 1 - \frac{N a (a\, m(\bar{s}) - \Gamma)}{1-a^2} \Big) \Big] \Big\rangle_\Omega.    (26)

Appropriate threshold adaptation Γ = a² will give Q(N → ∞) = tanh[β(1 - a²)].

FIGURE 2. Quality function Q as a function of the scaled temperature for realisable transformations with high connectivity.

2.6. Sequential Dynamics

To obtain an idea of how the network will perform a transformation if the neurons are sequentially updated we have done simulations with an N = 160000 system. The neurons represent pixels on a 400 × 400 screen and the network has learned a rotation of the screen image over +0.05 rad. Dynamics was as follows: at each step a neuron i was chosen at random and updated according to the sign of the local field
h_i (1). In Figure 3 we have shown the initial network state (left picture) together with the network state after 4.2 · 10^6 iteration steps (right picture). In terms of correlations ("overlaps") the final state corresponds quite well to a rotation of the initial state; introducing sequential evolution instead of synchronous evolution causes some kind of diffusion of the image (as might be expected). Again, as with noise, the deterioration is not dramatic. The duration of the iteration will be limited according to the amount of diffusion that one is willing to accept.

If the topological transformation is invertible it is possible to derive evolution equations describing the sequential process for continuous-time dynamics. We assume that each spin j has a probability per unit time w that it will flip:

w(s_j \to -s_j) = \tfrac{1}{2}\,[\,1 - \tanh(\beta s_j h_j)\,]    (27)

where h_j is the local field (1). For topological transformations with mean activity a = 0 we have (after rescaling the temperature as in 2.5) h_j = s_{π(j)}. The probability of finding the microscopic state s̄ at time t will be denoted by p_t(s̄), the evolution of which is governed by the master equation

\frac{d}{dt}\, p_t(\bar{s}) = \sum_j p_t(F_j\bar{s})\, w(-s_j \to s_j) - p_t(\bar{s}) \sum_j w(s_j \to -s_j).    (28)

F_j is the spin-flip operator: F_j Φ(s_1, ..., s_j, ..., s_N) ≡ Φ(s_1, ..., -s_j, ..., s_N). We also define the correlations

q_n(\bar{s}) \equiv \frac{1}{N} \sum_i s_{\pi^n(i)}(t=0)\; s_i.    (29)

These q_n measure to what extent a state s̄ is correlated with the state that would arise if the transformation T were performed n times on the initial state s̄(t = 0). We define expectation values of the q_n as usual,

\langle q_n \rangle \equiv \sum_{\bar{s}} p_t(\bar{s})\, q_n(\bar{s})

(fluctuations around these average values will be of order 1/√N). From the master equation it simply follows for N → ∞ that

\frac{d}{dt} \langle q_n \rangle = \tanh(\beta)\,[\, \langle q_{n-1} \rangle - \langle q_n \rangle \,].    (30)

Since these (in principle infinitely many) differential equations are linear they can be solved. This is most easily done by first assuming T to be a periodic transformation; in this case we would have q_{n+K}(s̄) = q_n(s̄) (K large) for all n. Now the eigenvectors, and thus the solution, of (30) can be computed. After taking the limit K → ∞ one finds

\langle q_n \rangle(t) = \sum_m q_m(0)\; \frac{1}{2\pi} \int_0^{2\pi} d\lambda\; \cos[\,(n-m)\lambda - \tanh(\beta)\, t \sin(\lambda)\,]\; e^{\tanh(\beta)\, t\, [\cos(\lambda) - 1]}.    (31)

If after one step of an exact transformation the network state is not correlated with the initial state, we have q_m(0) = δ_{m,0} and the sum in (31) is replaced by its first term. In the latter case it follows for every n that ⟨q_n⟩ reaches its maximum at t = n/tanh(β), so the system state is indeed being transformed by T (there is also diffusion). For n ≤ 5 we show the correlations (31) in Figure 4 for q_m(0) = δ_{m,0}.
FIGURE 4. The result of describing with a master equation the diffusion caused by sequential updating: correlations q_n of the actual network state with the states that are found by applying the transformation T n times to the initial network state, as a function of time (in steps per spin), for n = 0, ..., 5.
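A quick way to see the behaviour described by (30) and (31) is to integrate the linear equations numerically. The sketch below is only an illustration of the reconstructed form d⟨q_n⟩/dt = tanh(β)[⟨q_{n-1}⟩ - ⟨q_n⟩] with initial condition q_m(0) = δ_{m,0}; the values of β, the time step and the cutoff in n are arbitrary. It reproduces the travelling, spreading peak of Figure 4, with ⟨q_n⟩ maximal near t = n/tanh(β).

```python
import numpy as np

beta, n_max, dt, t_end = 2.0, 40, 0.001, 8.0
rate = np.tanh(beta)

q = np.zeros(n_max + 1)
q[0] = 1.0                                   # q_m(0) = delta_{m,0}
peak_time = np.zeros(n_max + 1)
peak_val = q.copy()

t = 0.0
while t < t_end:
    dq = np.empty_like(q)
    dq[0] = -rate * q[0]                     # no q_{-1} term
    dq[1:] = rate * (q[:-1] - q[1:])         # d<q_n>/dt = tanh(beta) (<q_{n-1}> - <q_n>)
    q += dt * dq                             # simple Euler step
    t += dt
    better = q > peak_val
    peak_val[better] = q[better]
    peak_time[better] = t

for n in range(6):
    print(f"n={n}: peak of <q_n> at t={peak_time[n]:.2f} "
          f"(predicted {n / rate:.2f}), value {peak_val[n]:.3f}")
```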
3. INVARIANT PATTERN RECOGNITION FOR TOPOLOGICAL TRANSFORMATIONS

3.1. General

We will investigate the behavior of a network which has learned a transformation as well as static objects (the latter patterns are assumed to have been learned according to Hebb's rule, as formulated in Hopfield (1982) and Amit et al. (1985a, 1985b)). For simplicity we will deal only with topological transformations, since there is only a quantitative difference between the network behaviour in this case and in the more general case. For the same reason we take the state population Ω during the transformation learning phase to be

p(s_i) = \tfrac{1}{2}\,[\,\delta_{s_i,1} + \delta_{s_i,-1}\,].

The p static patterns, to be denoted by ξ^(μ) = (ξ_1^(μ), ..., ξ_N^(μ)) where μ = 1 ... p, are drawn at random from the same distribution Ω. We then find the final connection matrix

J_{ij} = \frac{\varepsilon}{N} \sum_\mu \xi_i^{(\mu)}\, \xi_j^{(\mu)} + \delta_{j,\pi(i)}    (32)

(the topological transformation being (Ts̄)_i = s_{π(i)}, where π is an index mapping). The relative weight of the two contributions to the connection matrix is reflected in the value of ε. Starting from some initial state and assuming synchronous deterministic dynamics, the network will (a) perform pure transformations of the T type if the initial state is only weakly correlated with the stored patterns, (b) evolve towards one or more of the stored patterns if the initial state resembles these patterns in terms of correlation, or (c) behave according to some combination of these tendencies. If the initial state is not a familiar one in terms of correlations, the network will start performing successive iterations of its representation of the transformation T until a state is reached that is correlated with the stored patterns. Each initial state that is related to, say, pattern μ, in that it can be constructed by successively applying T^{-1} to pattern μ, will lead to the reconstruction of pattern μ. In this respect patterns can be recognized, invariant under the transformation group generated by T.

A priori restrictions on the weight factor ε (if invariant recognition is to be effected in the way described above) can easily be derived. We want s̄ → ξ^(λ) as soon as q_μ(s̄) = q δ_{μλ} for some λ with q > q_+, where the overlap functions q_μ(s̄) are defined as usual (Amit et al., 1985a):

q_\mu(\bar{s}) \equiv \frac{1}{N} \sum_i \xi_i^{(\mu)}\, s_i    (33)

(they measure to what extent the present state s̄ resembles the stored patterns as labelled by μ). Also, we would like the network to perform pure transformations if all correlations are sufficiently small, so s̄ → Ts̄ if Σ_μ |q_μ(s̄)| < q_-. Both of our demands are met if

\varepsilon \in (\,q_+^{-1},\; q_-^{-1}\,)    (34)

(so we always have ε > 1). From (34) one can expect the optimum value of ε to be 2.
3.2. Translations, Rotations and Scale Transformations
A priori it is not at all clear whether there is a proper balance between the two contributions to the connection matrix such that the invariant pattern retrieval as described in section 3.1 is indeed achieved. For instance, it is possible that the instantaneous image will be completely destroyed at the moment when the network switches from transformation behaviour to relaxation behaviour. Therefore we present simulation results from an N = 576 network which has learned both transformations and static patterns. We have considered three kinds of topological transformations: rotations, translations and scale transformations. For three values of ε (which represents the balance between transformation term and attractor term in the connection matrix) we show the successive system states following an initial state which was obtained by applying T^{-1} twice to one of the 10 patterns that are stored (the other 9 patterns were drawn at random). Dynamics is synchronous and deterministic (as in (2)).

FIGURE 5. Simulation results of an N = 576 network that is required to perform invariant pattern recognition for translations. The network state is shown as a function of the number of steps per spin for several values (ε = 0.5, 2.0, 3.0) of the relative weight ε of the attraction term in the connection matrix.

In Figure 5 we show the results of simulations with an N = 576 network having connections as in (32). A network state is represented by a 24 × 24 grid in which each of the 576 blocks is either black or white (if the corresponding spin is 1 or -1, respectively). The transformation learned is translation to the right over one block (with periodic boundary conditions). This transformation can be represented exactly as a topological transformation (independent
of the system size). It can be seen in Figure 5 that for ε = 2 we find invariant pattern retrieval. In successive time steps the system state is "translated" until the stable state (the stored pattern) is reached. If the attractor term is too small (ε = 0.5) we see that the attractor is destroyed; the system state is being mapped continuously. If the attractor term is too large the network will simply evolve to the nearest stored pattern (according to Hamming distance), without being transformed. The stable state reached is not necessarily the one to which the initial state was related by a translation. The values of ε which allow for invariant pattern recognition turned out to be ε ∈ (1.1, 3.9). These values, however, will depend highly on the system size; larger systems will obviously perform better.

In Figure 6 we see the same experiment for rotations. The system has learned a rotation over 0.26 rad around the origin, which was located in the centre of the screen (note, however, that for a relatively small system of 576 neurons one cannot define rotations with high accuracy). For ε = 2 we find invariant pattern recognition for rotations. Too strong (ε = 3) or too weak (ε = 0.5) attractor terms in the connection matrix will again destroy this effect. The ε region for which we found invariant retrieval as shown for ε = 2 was found to be ε ∈ (1.3, 2.9).

FIGURE 6. Simulation results of an N = 576 network that is required to perform invariant pattern recognition for rotations. The network state is shown as a function of the number of steps per spin for several values (ε = 0.5, 2.0, 3.0) of the relative weight ε of the attraction term in the connection matrix.

As a third example we considered scale transformations (Figure 7). The N = 576 network learned to perform a scaling transformation with the origin located in the centre of the screen, the scaling factor being 1.2 (again we considered synchronous and deterministic dynamics). For ε = 2 we find the retrieval we wanted; for the other ε choices the conclusions are the same as for rotations. The ε range for correct retrieval was found to be ε ∈ (1.3, 2.5). Clearly our network can perform invariant pattern recognition, although its performance will depend on how well the transformation considered can be defined with the (finite) number of neurons in the network.

FIGURE 7. Simulation results of an N = 576 network that is required to perform invariant pattern recognition for scale transformations. The network state is shown as a function of the number of steps per spin for several values (ε = 0.5, 2.0, 3.0) of the relative weight ε of the attraction term in the connection matrix.
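The simulations of this section are easy to reproduce in outline. The sketch below is a minimal re-implementation under the stated assumptions, not the original simulation code: the grid size, the number of stored patterns, ε = 2 and the one-block translation are taken from the text, while the random seed and the number of synchronous updates are arbitrary choices. It builds the combined connection matrix (32), starts from a stored pattern translated two blocks back, and prints the overlap (33) with that pattern; for ε ≈ 2 one expects the overlap to reach 1, as reported above.

```python
import numpy as np

rng = np.random.default_rng(3)
L, p, eps, steps = 24, 10, 2.0, 6
N = L * L

def translate(s, shift=1):
    """Topological transformation T: shift the 24 x 24 image one block to the right."""
    return np.roll(s.reshape(L, L), shift, axis=1).ravel()

# p random stored patterns and the combined couplings of eq. (32):
patterns = rng.choice([-1.0, 1.0], size=(p, N))
perm = translate(np.arange(N))              # j = pi(i) for every neuron i
J = (eps / N) * patterns.T @ patterns       # attractor (Hebbian) term
J[np.arange(N), perm] += 1.0                # transformation term  delta_{j, pi(i)}

# Initial state: pattern 0 translated two blocks to the left (T^{-1} applied twice).
s = translate(patterns[0], shift=-2)

for t in range(steps + 1):
    overlaps = patterns @ s / N             # q_mu(s), eq. (33)
    print(f"step {t}: overlap with pattern 0 = {overlaps[0]:+.2f}")
    s = np.sign(J @ s)                      # synchronous deterministic dynamics, eq. (2)
    s[s == 0] = 1.0                         # break the (rare) zero-field ties arbitrarily
```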
3.3. Possible Invariant Pattern Recognition in Two Parameter Directions

In all the examples considered in section 3.2, one type of transformation was learned, leading to invariant pattern recognition in only one parameter direction. The network will, for example, recognise objects if they are rotated to the left, but not recognise them if they are rotated to the right; this is not quite satisfactory. We will now consider the situation where the learning stage for transformations is generalised, such that not only a transformation T is learned by example, but also its inverse T^{-1} (provided this inverse exists). We now have (neglecting scaling for the moment)

J_{ij} = \langle (T\xi)_i\, \xi_j \rangle_\Omega + \langle (T^{-1}\xi)_i\, \xi_j \rangle_\Omega    (35)
and for topological transformations, if p(s_i) = ½[δ_{s_i,1} + δ_{s_i,-1}], this gives

J_{ij} = \delta_{j,\pi(i)} + \delta_{j,\pi^{-1}(i)}    (36)

(if the mean activity a is nonzero one must simply include threshold adaptation, as indicated in 2.4). Instead of ending up with a new state s̄' which is the transformation T applied to the present state s̄, we now find that the new state s̄' consists of a combination of two noisy images of the present state s̄: one being Ts̄ and the other being T^{-1}s̄. This effect is shown in Figure 8 for an N = 160000 network which has learned rotations over both 2π/15 rad and -2π/15 rad as described in (35) (with sequential deterministic dynamics as in (2)). As time passes, this "doubling" of the image in both directions will simply continue until all structure has disappeared (unless the transformation learned is periodic). Therefore, since the descendants of the initial state still have the structure of rotated versions of the initial state in terms of overlaps (these overlaps are clearly larger than the random overlaps, which are of order 1/√N), we might also have invariant pattern recognition in two parameter directions.

If, however, the size of the object that is to be transformed is large compared to the transformation parameter, we will not have this doubling effect. In this latter case there is only some kind of diffusion, as shown in Figure 9 (where the network has learned rotations over 0.05 rad as well as over -0.05 rad, and dynamics was again sequential and deterministic). For this process one can generalise the derivation of the evolution equations (30) in order to compute the overlaps between the descendants of the initial state and the images of this initial state under the transformation T.

FIGURE 8. Simulation results of an N = 160000 network with sequential deterministic dynamics which has learned rotations over both +2π/15 rad and -2π/15 rad. Left picture: the initial state. Right picture: a later network state, showing the "doubling" of the image in the two rotation directions.

FIGURE 9. Simulation results of an N = 160000 network with sequential deterministic dynamics which has learned both rotations to the left and rotations to the right over 0.05 rad. Left picture: the initial state. Right picture: the network state after 2 · 10^… iteration steps.

In conclusion one might say that it may still be possible to have invariant pattern recognition in two parameter directions; if one were to do simulations in order to answer this question one would need a large number of neurons, since finite-size effects are very important in the case of two parameter directions. Also, the retrieval performance will depend on topological features of the object to be recognised.
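The "doubling" described above can already be seen in a one-dimensional toy version of (36). In the sketch below (an illustrative example with an arbitrary cyclic shift, not the authors' simulation) the network has learned a shift and its inverse; after one synchronous update the new state has an overlap of roughly 0.5 with each of the two shifted images of the old state, i.e., it is a mixture of the two.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 10000

s = rng.choice([-1.0, 1.0], size=N)
fwd, bwd = np.roll(s, 1), np.roll(s, -1)          # the two images T s and T^{-1} s for a cyclic shift

# Eq. (36): the local field is s_{pi(i)} + s_{pi^{-1}(i)}; zero fields broken at random.
field = fwd + bwd
new = np.sign(field)
ties = new == 0
new[ties] = rng.choice([-1.0, 1.0], size=ties.sum())

print("overlap with forward-shifted image :", np.mean(new * fwd).round(2))   # ~0.5
print("overlap with backward-shifted image:", np.mean(new * bwd).round(2))   # ~0.5
print("overlap with the old state itself  :", np.mean(new * s).round(2))     # ~0.0
```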
4. DISCUSSION

We have shown that a neural network of the Little-Hopfield type will automatically perform invariant pattern recognition for a one-parameter transformation group if there are transmission delays and if the network has seen not only static objects during a learning phase but also (many) objects that were "moving"; that is, being transformed by the generator of that particular transformation group. The initial synaptic strengths are zero; during the learning stage synapses are modified according to the Hebbian (1949) rule as commonly used in Hopfield networks (Amit et al., 1985a; Hopfield, 1982). If during the learning stage only static objects are presented, or objects that are "moving" in a random manner (without any temporal relationship), the network will simply store and retrieve the static objects as in the standard Hopfield (1982) network. If, however, there is some temporal relationship between the "moving" states (successive states are related by some fixed transformation T), the network will also create a representation of this transformation. In the latter case the static objects can be retrieved, invariant under the group generated by the network's representation of T. The network representation of the transformation
T is exact if its internal dynamics is deterministic and synchronous and if the transformation is, at least in principle, realisable. By realisable we mean that there is a connection matrix G_ij such that the network's evolution in time amounts to applying T to its actual state. If the transformation is not realisable with the network architecture, or if the dynamics is sequential or "noisy," the network will in general find an approximation of T in terms of state overlaps.

It seems to us that the nice aspect of our approach is that it is a very simple and natural one. Only if there is a need to perform invariant pattern recognition, that is, if the objects the network handles are not always static but are often evolving in time in a structural way, will the network develop invariant pattern recognition. The network does not require any initial information concerning the transformation group. All structure is evoked by the input, the presence of delays, and a Hebbian learning rule. We think this is a sensible strategy for a biological system.

The problem with our proposal is the following: it seems only possible to implement one-parameter transformation groups, since time plays the role of group parameter. Although we have shown that the principle works quite nicely in one parameter direction, it is not at all clear how well the network will perform if two parameter directions are involved (for example: rotations to the left as well as rotations to the right); we find that the network's performance depends on the size of the object that is to be recognised.

If we compare our model with other work on invariant pattern recognition in neural networks, such as that by von der Malsburg and Bienenstock (1987), Bienenstock and von der Malsburg (1987), Kree and Zippelius (1988), or Dotsenko (1988), it seems to us that our model is clearly inferior as far as performance is concerned, but it may have the advantage of being adaptive and natural. No special architecture is needed (we only assume a Hebbian learning rule and the existence of transmission delays) and invariant pattern recognition develops only if the input is subject to transformations, that is, if invariant pattern recognition is needed.
REFERENCES

Amit, D. J., Gutfreund, H., & Sompolinsky, H. (1985a). Spin-glass models of neural networks. Physical Review A, 32, 1007-1018.
Amit, D. J., Gutfreund, H., & Sompolinsky, H. (1985b). Storing infinite numbers of patterns in a spin-glass model of neural networks. Physical Review Letters, 55, 1530-1533.
Amit, D. J., Gutfreund, H., & Sompolinsky, H. (1987). Information storage in neural networks with low levels of activity. Physical Review A, 35, 2293-2303.
Bienenstock, E., & von der Malsburg, C. (1987). A neural network for invariant pattern recognition. Europhysics Letters, 4, 121-126.
Coolen, A. C. C., & Gielen, C. C. A. M. (1988). Delays in neural networks. Europhysics Letters, 7, 281-285.
Dehaene, S., Changeux, J. P., & Nadal, J. P. (1987). Neural networks that learn temporal sequences by selection. Proceedings of the National Academy of Sciences USA, 84, 2727-2731.
Dotsenko, V. S. (1988). Neural networks: translation-, rotation- and scale-invariant pattern recognition. Journal of Physics A, 21, L783-L787.
Gutfreund, H., & Mezard, M. (1988). Processing temporal sequences in neural networks. Preprint.
Hebb, D. O. (1949). The organization of behavior. New York: Wiley.
Herz, A., Sulzer, B., Kuehn, R., & van Hemmen, J. L. (1988). The Hebb rule: Storing static and dynamic objects in an associative neural network. Preprint.
Hopfield, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences USA, 79, 2554-2558.
Kree, R., & Zippelius, A. (1988). Recognition of topological features of graphs and images in neural networks. Journal of Physics A, 21, L813-L818.
Little, W. A. (1974). The existence of persistent states in the brain. Mathematical Biosciences, 19, 101-119.
von der Malsburg, C., & Bienenstock, E. (1987). A neural network for the retrieval of superimposed connection patterns. Europhysics Letters, 3, 1243-1249.
McCulloch, W. S., & Pitts, W. H. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115-133.
Riedel, U., Kuehn, R., & van Hemmen, J. L. (1988). Temporal sequences and chaos in neural nets. Physical Review A, 38, 1105-1108.
Sompolinsky, H., & Kanter, I. (1986). Temporal association in asymmetric neural networks. Physical Review Letters, 57, 2861-2864.