Volume 114A, number 6
PHYSICS LETTERS
3 March 1986
NEURON GROUPING SUGGESTED BY A MULTIPLICATIVE LEARNING MODEL

T. GESZTI
Department of Atomic Physics, Eötvös University, Puskin u. 5-7, H-1088 Budapest, Hungary
Received 20 September 1985; accepted for publication 12 December 1985
An approximate error analysis due to Hopfield, adapted to a multiplicative model of learning by synaptic plasticity, suggests that for preliminary information processing it is advantageous to divide a neural network into ganglia of a few neurons each, which is actually observed around sensory organs.
Mathematical modeling and computer simulation is a promising way to gain some insight into the connection between the organization and functions of neural networks. Recent progress has been reviewed by Clark et al. [1]. An ulterior but so far remote-looking aim [1] of such studies would be to obtain results that could be connected to specific anatomic information about the brain [2-4]. A central subject of this research is the process of learning by synaptic plasticity, i.e. the modification of the strengths of the interneuronal connections called synapses [5]. In most of the models used, the neurons are regarded as logical decision elements [6] characterized by a "quasi-spin" variable, say, V_i = ±1 (i = 1, ..., N for N neurons): +1 if the ith neuron fires at the given time and -1 if it does not fire (although the two possible states can be given a different interpretation [7]). The N numbers V_1, ..., V_N form a "vector" {V_i} that specifies the state of the system of neurons at a given time. The synaptic strengths are characterized by numbers T_ij ("spin-spin coupling constants"): positive for an excitatory synapse and negative for an inhibitory one. Guided by some law of dynamics to be specified in each model [1,7], neurons take decisions from time to time to flip V_i towards +1 or -1, depending on the value of the summarized input ("effective magnetic field")
$$h_i = \sum_{j \neq i} T_{ij} V_j \,. \qquad (1)$$
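In code, the quasi-spin state {V_i}, a set of symmetric couplings T_ij and the single-neuron threshold decision just described can be sketched as follows (a minimal illustration; the array names, the network size and the random ±1 initial couplings are our choices for the sketch, not part of the paper's specification):

```python
import numpy as np

rng = np.random.default_rng(0)

N = 100                                # number of neurons
V = rng.choice([-1, 1], size=N)        # quasi-spin state vector {V_i}

# symmetric couplings T_ij = T_ji with zero diagonal (here random +-1,
# in the spirit of the "native synaptic strengths" introduced below)
T = np.triu(rng.choice([-1.0, 1.0], size=(N, N)), k=1)
T = T + T.T

def flip_step(V, T, rng):
    """One time step: a randomly chosen neuron aligns with its
    summarized input h_i of eq. (1) (sharp threshold condition)."""
    i = rng.integers(len(V))
    h_i = T[i] @ V                     # h_i = sum_{j != i} T_ij V_j
    V[i] = 1 if h_i >= 0 else -1
    return V

for _ in range(5 * N):                 # relax towards an attractor
    V = flip_step(V, T, rng)
```

Repeatedly applying `flip_step` realizes the asynchronous dynamics under which the metastable states discussed next are the only attractors.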
Learned items are encoded into various attractors
of the dynamics: single states, limit cycles or more complicated combinations of states, towards any of which the neuron-flip dynamics may converge. Recalling one of the learned items is a fast [8] change of the state of the system of neurons, converging towards the corresponding attractor; learning a new item is a slow [8] change of the synaptic strengths [5] so as to form a new attractor. The latter changes, however, add quasi-random noise [7] to the recalling of the previously learned items, which sets a limit to this kind of superposed storage [9] of various pieces of information on the same network.

The aim of this note is to present a result that may offer a rational explanation for a simple and specific piece of anatomic information about the brain: the existence of small groups of neurons (ganglia) around the sensory organs [10], and their absence elsewhere. To obtain this result Hopfield's error analysis [7] is adapted to a new plasticity algorithm (eq. (3) below). Apart from that difference, we accept Hopfield's options [7]: a symmetric matrix of synaptic strengths, T_ij = T_ji, and a dynamical law allowing the flip of a single, randomly chosen neuron at each of equal time steps, with a sharp threshold condition: the V_i chosen takes the value +1 or -1 according to the sign of h_i at the same time. This means that actually V_i does not flip if

$$B_i \equiv V_i h_i \geq 0 \,, \qquad (2)$$

whereas in the opposite case the V_i chosen changes into its negative.

In this simple model the only attractors are metastable states: a number n of vectors {V_i^s} (s = 1, ..., n) for any of which condition (2) is satisfied for all i. A learning procedure is supposed to modify the original T_ij's in a way that makes n prescribed vectors (n "words" to be learned) the attractors of the flipping neuron system. As a tractable model we use the "plasticity algorithm"
$$T_{ij} = T_{ij}^{0} \exp\!\left( \alpha\, T_{ij}^{0} \sum_{s=1}^{n} V_i^s V_j^s \right), \qquad (3)$$
where the T_ij^0 = T_ji^0 are random numbers of value ±1 ("native synaptic strengths"), whereas the parameter α characterizes the depth of changes in a single learning event: T_ij is multiplied or divided by e^α according to whether the (ij) synapse is satisfied or frustrated in state {V^s}. This multiplicative algorithm is symmetry-preserving (T_ij = T_ji), commutative (T_ij does not depend on the order in which the words are memorized) and sign-preserving. The first two properties are mathematically convenient; they are shared by the additive algorithm used in various studies [5,7,11]. The third property assures that a synapse never changes its excitatory or inhibitory character [1], which seems to be the real case [12].

As mentioned above, the different memorized items make each other's recalling noisy. Another source of noise is the random signs of the native T_ij^0's. Consequently, nominally learned words {V_i^r} become attractors but with some error: if the neuron system is in a state {V_i^r} corresponding to one of them, then for some neurons condition (2) is violated: B_i^r < 0, where

$$B_i^r = V_i^r \sum_{j \neq i} T_{ij} V_j^r \,. \qquad (4)$$

Then these neurons are locally unstable and may flip out of that state. If such neurons are too numerous, the corresponding word cannot be recalled. Following Hopfield [7] we formalize this by the criterion that for reliable recall of a memorized word, the probability that more than 5% of the neurons are unstable in the corresponding state should not exceed ½. One can then ask for the number n*(N, α) of words that can be stored on a network of N neurons by the multiplicative learning algorithm (3) of parameter α without violating this criterion. The evaluation follows Hopfield's error analysis [7] and uses the same approximations. It has two steps:

(i) One calculates P := Prob(B_i^r < 0), the "probability of error in a single bit". The calculation regards the memorized vector components V_i^s as random numbers assuming +1 or -1 with equal probabilities, and uses the central limit theorem twice: both the sum in eq. (3) for a large number n of learned words, and that in eq. (4) for a large number N of neurons, are regarded as random variables of gaussian distribution. The result is

$$P = 1 - \Phi(Z) \,, \qquad (5)$$

where Φ(x) is the probability integral and

$$Z = \left( \frac{N \exp[-\alpha^2 (n-1)]}{1 + \coth^2 \alpha} \right)^{1/2} . \qquad (6)$$

(ii) If i ≠ j then B_i^r and B_j^r are regarded as independent random variables. This approximation, also implicit in Hopfield's paper [7], neglects the correlation caused by the same term V_i^r T_ij V_j^r contributing to both B_i^r and B_j^r (eq. (4)). With this approximation the positive and negative values of the B_i^r form a sequence of binomial distribution, and Hopfield's criterion can be attached a definite value of P for each N, approaching P* = 0.052 if N is large.

Both approximations (i) and (ii) are valid for N ≫ 1 ((i) needs also n ≫ 1). In particular, (ii) is what in statistical physics one calls a "mean-field approximation"; with a long-range interaction (each couple of neurons interconnected) it becomes exact for N → ∞. Inverting eq. (6) one obtains for large N
$$n^*/N = \alpha^{-2} N^{-1} \left( \ln N + \alpha^2 - \ln\!\left[ Z^{*2} (1 + \coth^2 \alpha) \right] \right), \qquad (7)$$

with Z* = 1.6, the solution of eq. (5) for P = P*.

The functional relation (7) for the illustrative value α = 0.3 is displayed in fig. 1, along with some numerical simulation results. In the simulation, different numbers n of pseudo-randomly generated words are repeatedly taught, using eq. (3), to a network of fixed N, and the (noninteger) n* corresponding to Hopfield's criterion is determined by interpolation. For the approximations used in deriving eq. (7), the agreement seems satisfactory. To do simulations for the more realistic parameter value α = 0.1 (see below) would have required dealing with much larger samples and using computing time beyond the author's present possibilities.

The key result, clearly exhibited by the simulated values as well, is the existence of a maximum of n*/N at an "optimal" size of the network, N_opt.
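The procedure can be sketched compactly in code. The fragment below is our illustrative reconstruction, not the author's original simulation: it teaches n random words to a network by the multiplicative rule (3), measures the fraction of neurons violating condition (2) through eq. (4), and evaluates the optimum of eq. (7), which for α = 0.3 gives N_opt ≈ 81 and n*_opt = α⁻² ≈ 11 (the random seed and array layout are our choices):

```python
import numpy as np

alpha, N, n = 0.3, 81, 11
rng = np.random.default_rng(1)

# n pseudo-random words of N quasi-spins each
words = rng.choice([-1, 1], size=(n, N))

# multiplicative learning, eq. (3): T_ij = T0_ij exp(alpha T0_ij sum_s V_i^s V_j^s)
T0 = np.triu(rng.choice([-1.0, 1.0], size=(N, N)), k=1)
T0 = T0 + T0.T                          # symmetric native strengths, zero diagonal
T = T0 * np.exp(alpha * T0 * (words.T @ words))
np.fill_diagonal(T, 0.0)

def unstable_fraction(word, T):
    """Fraction of neurons with B_i^r < 0, eq. (4), in a stored word."""
    B = word * (T @ word)
    return float(np.mean(B < 0))

mean_err = np.mean([unstable_fraction(w, T) for w in words])

# optimum of eq. (7): N_opt and n*_opt, with Z* = 1.6 from eq. (5) at P = P*
Zstar = 1.6
coth2 = 1.0 / np.tanh(alpha) ** 2
N_opt = Zstar**2 * (1 + coth2) * np.exp(1 - alpha**2)
n_opt = alpha**-2
print(round(N_opt), round(n_opt))       # -> 81 11
```

With these parameters the network sits right at Hopfield's criterion, so `mean_err` comes out near the 5% instability level.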
Fig. 1. The number of memorizable words per neuron, n*/N, versus the number of neurons in the network, N (logarithmic scale). Full line: eq. (7); circles: numerical simulations; N_opt: eq. (8).

For α = 0.3 this optimum is N_opt ≈ 81, on which n* = n*_opt ≈ 11 words can be stored. The general formulas are

$$N_{\rm opt} = Z^{*2} (1 + \coth^2 \alpha) \exp(1 - \alpha^2) \qquad (8)$$

and n*_opt = α^{-2}. For small α the ratio (n*/N)_opt approaches the limit (Z*²e)^{-1} ≈ 0.14. For large α the optimal ratio is drastically reduced: too strong a synaptic modification causes too much noise. Our model gives no advantage to reducing α much below 0.1, since there the above limiting value is already closely approached. On the contrary, too low an α makes the stored information susceptible to physical noises not included in the present model (and often modelled by a non-sharp threshold [1]). Therefore α ≈ 0.1 seems an optimal choice, if nature could select between chemically different mechanisms of synaptic plasticity. Taking that as a biologically reasonable estimate of α, the corresponding optimal size is N_opt ≈ 700.

How to store, then, the maximum number of words if there are N ≫ N_opt neurons available? There is an obvious trick to do that: splitting the network into smaller groups of neurons (what anatomists call ganglia), each containing N_opt neurons; then the number of storable words is (n*/N)_opt N, irrespective of the actual size of the network. It should be emphasized that this trick does not enlarge the information storage capacity, since now the stored words are shorter. It depends on the actual purpose whether many short words or few long words
are the more advantageous form in which to store roughly the same amount of information. This is where the organisation of the brain seems to take a purpose-guided option:

(a) Many short words have to be stored in the preliminary processing of raw input signals, producing a reduced amount of more ordered information [13]. This is the case where sensory organs join the brain, and indeed, the ganglia are there [10].

(b) On a higher level of processing, the storage of a smaller number of longer words is needed [13]. That happens in internal parts of the brain, which appear much less structured [10].

It would deserve further study to find the class of models for which the n*/N versus N curve has a maximum and the interpretation in terms of ganglia may hold. In particular, this is not the case for the additive, non-sign-preserving algorithm [5,7,11].

Let us note finally that between extremities (a) and (b) there is the cortex, with its famous columnar structure [2,4]. At least at first sight, the present work seems to have no relevance to that feature.

I thank F.C. Crick for helpful criticism on a preliminary version, T. Tél for carefully reading this manuscript, and J. Kertész, A. Csákány, R. Németh, G. Meszéna and J. Kürti for discussions.
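As a numerical footnote to the splitting argument above, the following back-of-envelope computation (our illustration; the total network size N_total = 10 000 is an arbitrary choice) compares the number of short words storable by N_total/N_opt optimal ganglia with the number n* of long words the undivided network stores according to eq. (7), for the biologically motivated α = 0.1:

```python
import numpy as np

alpha, Zstar = 0.1, 1.6
coth2 = 1.0 / np.tanh(alpha) ** 2

# optimal group size (eq. (8)) and words per group
N_opt = Zstar**2 * (1 + coth2) * np.exp(1 - alpha**2)   # about 700
n_opt = alpha**-2                                        # 100 words per ganglion

N_total = 10_000                       # assumed large network, N_total >> N_opt
groups = int(N_total // N_opt)         # number of ganglia
short_words = groups * n_opt           # many short words (length about N_opt)

# n* for the same neurons kept as one undivided network, from eq. (7)
long_words = (np.log(N_total) + alpha**2
              - np.log(Zstar**2 * (1 + coth2))) / alpha**2

print(round(N_opt), groups, round(short_words), round(long_words))  # -> 700 14 1400 366
```

In this example the divided network stores roughly four times as many, but much shorter, words, in line with the trade-off discussed above.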
References

[1] J.W. Clark, J. Rafelski and J.V. Winston, Phys. Rep. 123 (1985) 215.
[2] G.M. Shepherd, The synaptic organization of the brain, 2nd Ed. (Oxford Univ. Press, Oxford, 1979).
[3] J. Szentágothai, in: Visual centers in the brain, ed. R. Jung (Springer, Berlin, 1973).
[4] J. Szentágothai, in: Architectonics of the cerebral cortex, eds. M.A.B. Brazier and H. Petsche (Raven, New York, 1978).
[5] D.O. Hebb, The organization of behavior (Wiley, New York, 1949).
[6] W.S. McCulloch and W. Pitts, Bull. Math. Biophys. 5 (1943) 115.
[7] J.J. Hopfield, Proc. Natl. Acad. Sci. USA 79 (1982) 2554.
[8] E.R. Caianiello, J. Theor. Biol. 2 (1961) 204.
[9] F.C. Crick and G. Mitchison, Nature 304 (1983) 111.
[10] W.J.H. Nauta and M. Feirtag, Sci. Am. 241 (1979) 78.
[11] L.N. Cooper, F. Liberman and E. Oja, Biol. Cybern. 33 (1979) 9.
[12] E.R. Kandel, Sci. Am. 241 (1979) 60.
[13] R.O. Duda and P.E. Hart, Pattern classification and scene analysis (Wiley, New York, 1973).