JOURNAL OF MATHEMATICAL PSYCHOLOGY 35, 532-544 (1991)

Book Review

The World of Attractor Neural Networks - Modeling Brain Function?

DANIEL J. AMIT. Modeling Brain Function - The World of Attractor Neural Networks. Cambridge, MA: Cambridge Univ. Press, 1989. Pp. XVII + 505. $39.50.

Reviewed by ANDREAS V. M. HERZ AND REIMER KÜHN

Daniel J. Amit received his M.Sc. from Hebrew University of Jerusalem in 1961, and a Ph.D. in Physics from Brandeis University in 1966. Since then he has held several research positions and visiting and associate professorships, e.g., at Brandeis University, at Cornell University, at Hebrew University, at the Laue Langevin Institute in Munich, at Stanford University, at CEN Saclay, and at the Freie Universität Berlin. He is currently Professor of Physics at the Racah Institute of Physics, Hebrew University, Jerusalem. Since 1983 he has made neural networks his prime research interest, and he has made several important contributions to the field. Amit is author of the physics textbook, Field Theory, the Renormalization Group, and Critical Phenomena (New York: McGraw-Hill, 1978), and Honorary Editor of the interdisciplinary journal NETWORK.

The reviewers, Andreas V. M. Herz and Reimer Kühn, are currently affiliated with the Division of Chemistry at the California Institute of Technology (AVMH), and the Sonderforschungsbereich 123 at the University of Heidelberg (RK), respectively. They share neural network theory as their main field of research. A. V. M. Herz studied physics at the Georgia Institute of Technology in Atlanta (M.Sc. in 1984), at the Ludwig-Maximilians-Universität, Munich (Diploma in 1987), and the Ruprecht-Karls-Universität, Heidelberg, where he graduated in 1990 with a Ph.D. thesis on Temporal Association in Neural Networks. At present, he is working on a global analysis of dynamical systems with delayed feedback. R. Kühn studied physics at Kiel University, and at Sussex University in Brighton, UK. He received his Physics Diploma in 1984 and his Ph.D. in 1987, both from Kiel University. His research interests include the theory of disordered systems and phase transitions. He has coauthored (with J. L. van Hemmen) two recent review articles, "Collective Phenomena in Neural Networks" and "Temporal Association in Neural Networks," both in Models of Neural Networks, edited by E. Domany, J. L. van Hemmen, and K. Schulten (Heidelberg: Springer, 1990).

Requests for reprints should be sent to A. V. M. Herz, Division of Chemistry, 139-74, California Institute of Technology, Pasadena, CA 91125.

In one of his lucid forays into "experimental metaphysics," J. L. Borges (1981) relates an episode in which the literary world is obsessed by the discovery of one single volume of an imaginary multi-volume encyclopedia describing an imaginary

world called Tlön. Despite a relentless universal search for the remaining volumes, so the story goes, they fail to show up, so that eventually a man of letters - tired of all the subordinate detective work that is put into the search - half earnestly suggests a project of reconstructing the whole set from the content of the one and only available volume: ex ungue leonem!

In a similar spirit, though with a playfully provocative undertone, we have at times suggested that the recent achievements of the theory of attractor neural networks (ANNs) might be reconstructed from three key papers. First and foremost, there is Hopfield's "Neural Networks and Physical Systems with Emergent Collective Computational Abilities" (1982), which laid the foundations of the field. The next, by Amit, Gutfreund, and Sompolinsky (1987), entitled "Statistical Mechanics of Neural Networks near Saturation," has not only taught us how a systematic statistical mechanical analysis of neural networks (with symmetric couplings) could be performed, it has also revealed the beauty and richness of the seemingly so simple model proposed by Hopfield. The third and last item on our list, "The Space of Interactions in Neural Network Models" by Elizabeth Gardner (1988), has added an entirely new dimension to the field - a method predicting properties of whole classes of neural networks instead of properties of specific realizations.

The book to be discussed here has been written by D. J. Amit, coauthor of the second contribution to our collection of key papers, and undoubtedly one of the leading experts in neural network research. It provides a thorough introduction to the fascinating world of attractor neural networks and was conceived to make this world accessible to disciplines other than physics, where substantial parts of it have been created over the last few years.

In order to facilitate our review, let us briefly sketch what ANNs are all about - an attempt at describing, in a highly schematized manner, the dynamics of large networks of mutually interacting neurons. In its simplest form, an ANN consists of a collection of, say N, so-called formal neurons S_i, 1 ≤ i ≤ N (McCulloch & Pitts, 1943), simple threshold automata that take only two states: firing (S_i = 1) or quiescent (S_i = -1). The neurons are coupled by a set of synapses {J_ij}, which may be excitatory (J_ij > 0), inhibitory (J_ij < 0), or absent (J_ij = 0). In particular, one usually assumes that there are no self-interactions (J_ii = 0). The dynamics is such that each neuron evaluates its post-synaptic potential (PSP), a weighted sum of incoming signals arriving at its dendritic tree,

h_i(t) = \sum_{j=1}^{N} J_{ij} S_j(t),                                  (1.a)

and determines its subsequent state such that it fires only if the PSP exceeds a certain threshold T_i. Formally,

S_i(t + \delta t) = \operatorname{sign}[h_i(t) - T_i].                  (1.b)
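To make these definitions concrete, here is a minimal sketch of the threshold dynamics (1) in Python (with numpy); the variable names, the random couplings, and the tiny network size are purely illustrative choices of ours, not taken from the book.

import numpy as np

def step(J, T, S, i):
    # One update of neuron i following Eqs. (1.a)-(1.b).
    h_i = J[i] @ S                      # post-synaptic potential, Eq. (1.a)
    return 1 if h_i > T[i] else -1      # threshold rule, Eq. (1.b)

rng = np.random.default_rng(0)
N = 5                                   # a tiny illustrative network
J = rng.normal(size=(N, N))             # random couplings, for illustration only
np.fill_diagonal(J, 0.0)                # no self-interactions, J_ii = 0
T = np.zeros(N)                         # firing thresholds
S = rng.choice([-1, 1], size=N)         # firing (+1) or quiescent (-1)

for i in rng.permutation(N):            # one sweep of sequential updates, random order
    S[i] = step(J, T, S, i)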


Various dynamic prescriptions based on (1) may still be contemplated. For instance, neurons may change their state one at a time in a random or a regular order (sequential dynamics) or in complete synchrony (parallel dynamics).

The reader may have noted that each single neuron is nothing but a single-layer perceptron (Rosenblatt, 1961). In an ANN, however, these perceptrons are interacting. The output of each of them is fed back into the system, to be used as input by many others. This feature makes ANNs fundamentally different from feed-forward systems, of which the perceptron constitutes a special case. Whereas the latter produce the same type of output for every input - for the perceptron it may be a firing or a quiescent state - the former know two qualitatively different sorts of dynamical states, those that are fixed points of the dynamics, and those that change in time. To which category each state belongs is determined by the couplings J_ij and the thresholds T_i.

ANNs can be made potentially useful devices, once one knows how to construct fixed points at will (and in large quantities) and make them stable attractors of the dynamics - hence the term ANN. Such devices can function as associative or content addressable memories. An initial state - a "stimulus" - lying in the basin of attraction of a stable fixed point of the dynamics - a "memory" - will spontaneously evolve towards this attractor. Arrival at the fixed point may be interpreted as a cognitive event - the "recognition of the stimulus."

All this has a manifestly biological flavour on at least two levels. At a lower level, the fundamental building blocks of ANNs are neuron-like elements, coupled by synapses which are the location of memory and the agents for tailoring attractors through "learning." At a higher level such a system gives rise to a "recognition mechanism" which, being guided by attributes of the input percept itself, can do without extensive searches through memory. Moreover, in certain cases, the algorithms to determine the values of the synaptic efficacies J_ij such as to establish a collection of desired attractors can be as straightforward as a correlation measurement of pre- and post-synaptic activity during some learning period when the states to be stored are presented to the net. This procedure follows closely a neurophysiological postulate for learning formulated by the psychologist Hebb (1949) and may be simple and plausible enough not to be ruled out as unrealistic from a biological point of view.

The sign function in (1.b) makes the dynamics of formal neural networks nonlinear. As a consequence, the design or identification of attractors is in general a highly non-trivial task. A major breakthrough was accomplished by Hopfield's (1982) discovery that the threshold dynamics (1) is governed by the Lyapunov (or energy) function

H = -\tfrac{1}{2} \sum_{i \neq j} J_{ij} S_i S_j + \sum_{i} T_i S_i     (2)

if the synaptic couplings are symmetric, J_ij = J_ji, and if updating in (1.b) is asynchronous. That is, the dynamics (1) corresponds to a downhill motion in the energy landscape created by H.
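As an illustration of this downhill property, the following sketch (ours, in numpy, using the energy function as written in (2)) checks numerically that asynchronous threshold updates never increase H when the couplings are symmetric.

import numpy as np

def energy(J, T, S):
    # Energy function (2) for symmetric couplings with zero diagonal.
    return -0.5 * S @ J @ S + T @ S

rng = np.random.default_rng(1)
N = 50
J = rng.normal(size=(N, N)) / np.sqrt(N)
J = 0.5 * (J + J.T)                      # symmetric couplings, J_ij = J_ji
np.fill_diagonal(J, 0.0)                 # no self-interactions
T = np.zeros(N)
S = rng.choice([-1, 1], size=N)

for sweep in range(20):                  # asynchronous updates: H never increases
    for i in rng.permutation(N):
        e_before = energy(J, T, S)
        S[i] = 1 if J[i] @ S > T[i] else -1
        assert energy(J, T, S) <= e_before + 1e-12

Dropping the symmetry of the couplings, or updating all neurons in parallel, breaks this guarantee for H, which is exactly why the two restrictions appear in the statement above.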


Remarkably, certain (disordered) magnetic materials can be characterized by energy functions formally equivalent with H, the S_i denoting microscopic magnetic moments called spins. The search for minima - local or global - of the energy function H, and thus for attractors of the threshold dynamics (1), has been a domain of physics for many decades. Over the years, a broad range of analytical tools has been compiled for studying the collective properties of spin systems described by (2). By analogy, these tools may be put to work in studying the collective properties of networks of formal neurons, a circumstance that has contributed extensively to making the theory of ANNs a very lively and productive branch of physics, as can be verified by a quick browse through any of the standard physics journals. That is not to say, though, that other disciplines, such as electrical engineering, computer science, and many more, have not joined the game.

Amit's book describes in some detail the results of this recent encounter between neurobiology and physics. But there is more than that. The author makes a major effort to communicate these results to directly concerned disciplines - biology and cognitive science above all. This attempt has resulted in his writing the first three chapters as well as the expository non-technical descriptions of the remaining part of the book.

Chapter 1 introduces the conceptual framework for the theoretical analysis of neural networks and states some of the main desiderata any such theory should meet if it is to be considered a serious candidate for "Modeling Brain Function." At the microscopic level there is, above all, the requirement that the fundamental building blocks of (formal) neural networks should capture essential properties of their biological counterparts. While probably everybody would agree on this, there is an important epistemological snag here, related to the question "What is essential?" At the macroscopic level, neural networks should give rise to phenomena that correspond to our intuitions about cognitive processes: associativity, parallel processing, and the potential for abstraction, to name only a few. A central issue here is the demand for a mechanism that spontaneously generates the meaning of some input to the neural system.

Next, the neurophysiological background is described from which formal neural networks are abstracted. At the single neuron level, the outcome of this modeling is the two-state threshold automaton described earlier. The basic tenet of the present book, however, is that cognitively significant events would emerge only at the network level: "Beyond a certain level complex function must be a result of the interaction of large numbers of simple elements, each chosen from a small variety" (Amit, p. 9). Here, the decisiveness of the "must" may perhaps be debated. In any case, a biologist might object that the combination of "must" and "simple" is the statement of a metaphysics that keeps permeating neural network theory - at times (dangerously) unnoticed.

At this point it is perhaps worth emphasizing that a faithful description of the interaction of a collection of formal two-state neurons is a delicate task. To mention only one major complication, there are various transmission delays due to different axonal and dendritic distances that signals must travel. This will cause neurons to


respond to firing states of other neurons, some of which are already outdated, some not. Details of the resulting haphazard anarchy are difficult to capture in general, or by formal rules; they will depend on specific characteristics of the network architecture. The favourite choice of physicists has been to replace this anarchy by another one: asynchronous dynamics with a random order of updatings. While it is easy to state what has been gained by this desperate trick - analytic tractability - it appears very difficult to assess what has been lost. This is perhaps one of the weakest points of the whole enterprise. For the time being, the only excuse neural network theory has to offer is in the results: The emergent collective behaviour of large nets of formal neurons is found to be surprisingly insensitive to alteration of details of the modeling - at the level of single neurons as well as at the level of dynamical prescriptions.

Chapter 2 begins with the introduction of an alternative description of neurodynamics in which the dynamical state of each neuron is characterized by a coarse-grained variable, the spike rate, rather than by the presence or absence of single spikes. The pros and cons of the two approaches are in a way complementary. Provided that spike rates do contain all the information that is essential for the dynamical evolution of the system, a faithful description of the dynamics is much easier to obtain in this scheme than for a system operating on a spike-or-none basis: A description of the spike rate dynamics in terms of a set of differential (RC-charging) equations will be a good approximation, at least in situations where changes in spike rates are slow on the time scales set by the transmission delays. On the other hand, single spikes are manifestly there, and one might have lost important phase information (Gray, König, Engel, & Singer, 1989) by switching to the concept of spike rates. Throughout the book this alternative description is, however, not taken up again, except briefly in the last chapter on hardware realizations of ANNs.

Back to two-state neurons, the idea of synaptic noise and its origin - fluctuations in the amount of neurotransmitter released per spike - is introduced, followed by a derivation of the matrix of state-to-state transition probabilities of the resulting stochastic or "Glauber" dynamics. In a set of illuminating examples, Amit demonstrates the relation between prescriptions for the dynamics (synchronous, or asynchronous with serial or random order of updatings) and the nature of attractors (fixed points and limit cycles). What is relevant here is that fixed point attractors turn out to be insensitive to the specific choice of the updating rule. Unfortunately, the author misses the chance to compare these results with those of the alternative approach in terms of spike rates.

The visualization of the noiseless dynamical process as a downhill motion in an energy landscape is introduced as a possible explanation for the appearance of fixed point attractors, a description that is demonstrated to carry images of analytic and predictive power with it. The central hypothesis of the book is that the arrival of a trajectory at a fixed point may serve as an intrinsic mechanism of the network for the assignment of meaning to a "stimulus" which may have initialized this path.
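For readers who want to see what such a stochastic update looks like, here is a minimal Glauber-type single-neuron update in Python; the sigmoidal form and the parameter beta (an inverse noise level) are the standard textbook choice, and the function name is ours rather than Amit's.

import numpy as np

def glauber_step(J, T, S, i, beta, rng):
    # Stochastic ("Glauber") update of neuron i: fire with a probability
    # that grows with the post-synaptic potential; beta is an inverse noise level.
    h_i = J[i] @ S
    p_fire = 1.0 / (1.0 + np.exp(-2.0 * beta * (h_i - T[i])))
    return 1 if rng.random() < p_fire else -1

In the limit beta -> infinity this reduces to the deterministic threshold rule (1.b); for small beta every update is close to a coin flip, which is the ergodic regime taken up in Chapter 3.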


The characterization of the process as a downhill march in a landscape would, for instance, immediately reveal its "associative" nature, would probably elicit questions as to the basic times involved, and could create images of perception errors, occurring when the system gets trapped in spurious side valleys. Finally the beneficial role of noise is discussed: It allows for uphill motions with a certain probability and thus can help the system escape from spurious states.

Noisy, i.e., stochastic, dynamical systems are, however, in general ergodic: That is, after some time, every dynamical trajectory will forget about where it started and will visit every state in phase space (with a frequency determined by the transition probabilities), rendering such a system useless for classification or association. It is for Chapter 3 - one of the finest throughout the book - to explain how ergodicity in finite noisy systems comes about, and to point out ways in which it might be broken so that information processing becomes possible again: the true, noiseless way, and the way of effective ergodicity breaking on any desired time scale in sufficiently large systems. The latter borrows a good deal from intuitions physicists have developed about cooperative behaviour associated with phase transitions. These are nicely explained in terms of ferromagnetic systems, from relevant facts about the role of connectivity and dimension up to a quantitative description of ergodicity breaking in fully connected ferromagnets. This example is particularly illuminating for two reasons. Its quantitative aspect remains transparent even in the analytic details, and it is equipped with transition probabilities which are of the same structure as those of the stochastic dynamics of networks of two-state neurons introduced in Chapter 2. So it contains, in essence, much of what is needed to understand noisy ANNs.

The remainder of the chapter is devoted to establishing free energy functionals as appropriate substitutes for energy landscapes in the case of noisy dynamics,¹ and synaptic symmetry is identified as the sufficient condition for the existence of Lyapunov functions for the noiseless and the noisy dynamics - asynchronous and parallel. This finally paves the way to understanding what statistical mechanics can contribute to elucidating the nature of noisy attractors in situations where the full dynamical problem turns out to be too complicated to unravel, which is by far the majority of cases. And a "teaspoon of statistical mechanics" (Amit, p. 133), again for the fully connected ferromagnet, is provided to drive the message home. The biologist may discover a grain of salt here, namely that progress in understanding neural network models is promised (and made) at the cost of an assumption that is neurophysiologically untenable: synaptic symmetry. We will have occasion to return to this point later on.

¹ Incidentally, we are not aware of a proof - as general as that presented in the appendix - elsewhere in the literature.

After the first three chapters, the reader is well prepared for the neural network theory proper that is covered in the remainder of the book. In addition, each of the technical chapters to come has its own non-technical introduction to the material presented, putting the subject matter into perspective, reiterating underlying


assumptions, identifying simplifications involved, and listing key results. These expository parts of each chapter constitute perhaps one of the best traits of the book. They are aimed at the interdisciplinary reader, but are beneficial reading for the expert as well. Beyond their obvious function as introductions to each chapter, we like to interpret them as putting the reader into a state of mind in which he or she feels that his or her own comments, alterations, or amendments to the lists of underlying assumptions or simplifications are invited - precisely the state of mind from which, in the long run, the field will benefit most.

Chapter 4 introduces the "standard model," first proposed by Hopfield (1982) and designed to store a set of unbiased binary random patterns. That is, each bit of each pattern is randomly chosen to be +1 or -1 with equal probability. To the non-physicist, the choice of random patterns may seem odd at first sight. It should, however, be read as an attempt at generality, since any model storing an ensemble of data with given statistical properties can justly claim a wider scope than a model that is only demonstrated to work for special examples. Statistical properties can, after all, be varied in well defined ways.

As in Hebb's (1949) neurophysiological postulate for learning, according to which efficacies of excitatory synapses increase in response to repeated or persistent concurrence of pre- and post-synaptic spiking, synaptic changes during the learning phase of the Hopfield network are locally determined on the basis of pre- and post-synaptic activity. The similarity is quite striking, but it ends here: As a result of a symmetry between "firing" and "non-firing" neural states, Hopfield's prescription leads to an increase of excitatory efficacy not only if both the pre- and post-synaptic neuron are firing, but also if both are "quiescent." Amit's derivation of the Hopfield synaptic couplings from the Hebb rule by appeal to neurophysiological evidence therefore appears somewhat odd, and in fact, one of the decisive steps in the line of reasoning is dictated by nothing but necessities of the modeling, namely the need for synaptic symmetry. This being so, it might have been wise to omit the argument altogether, and to leave it at stating similarities and differences between Hebb's and Hopfield's learning rules.

The properties of the Hopfield network at low storage levels, where the number of stored patterns is much smaller than the number of neurons, are, however, exceedingly well explained. It is demonstrated that the stored patterns are stable attractors of the noiseless threshold dynamics (1), and that they have very large basins of attraction. Other, so-called spurious attractors are also shown to exist as stable stationary states of the system, and their nature is elucidated. A set of simulation results is provided to illustrate these properties. They show in particular that one can destabilize the spurious attractors through moderate levels of noise while keeping those states stable which are strongly correlated with one of the stored patterns. The statistical mechanical analysis of the noisy network dynamics is presented in several levels of increasing sophistication, from a heuristic derivation of a set of self-consistency equations describing attractors in the presence of synaptic noise to a full blown computation of the noisy Lyapunov function. Its minima are determined by precisely the self-consistency equations just mentioned.
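To give a feel for the standard model at low loading, here is a small sketch in Python (ours, not taken from the book) that stores a few unbiased random patterns with the usual Hopfield prescription, J_ij = (1/N) \sum_\mu \xi_i^\mu \xi_j^\mu, and then retrieves one of them from a corrupted cue under the noiseless dynamics (1).

import numpy as np

rng = np.random.default_rng(2)
N, p = 200, 10                                 # neurons, stored patterns
xi = rng.choice([-1, 1], size=(p, N))          # unbiased binary random patterns

J = (xi.T @ xi) / N                            # Hebb-like Hopfield couplings
np.fill_diagonal(J, 0.0)                       # symmetric, no self-interactions

S = xi[0].copy()                               # pattern 0 with 15% of its bits flipped
flip = rng.choice(N, size=int(0.15 * N), replace=False)
S[flip] *= -1

for sweep in range(10):                        # noiseless asynchronous dynamics (1)
    for i in rng.permutation(N):
        S[i] = 1 if J[i] @ S > 0 else -1

print("overlap with stored pattern:", (S @ xi[0]) / N)

At this loading (p/N = 0.05) the corrupted cue falls well inside the basin of attraction, so the printed overlap comes out at, or very close to, 1; what happens as p/N is pushed upward is the subject of Chapter 6.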
All this material is nicely tuned to the prospective interdisciplinary readership. Should a


reader, for instance, have problems following the development of the theory in all depth, he or she can safely skip the rest of the chapter and will nevertheless have had a fair chance of seeing at least a sketch of the whole picture.

Chapter 5 constitutes a break in the main line of the book. Functional synaptic asymmetry is introduced as a mechanism to produce attractors more complicated than stationary states - well defined temporal sequences of firing patterns in particular. These temporal associations could, for example, be of use in the control of rhythmic motion, in reciting songs or poems, and so on. A network that is able to count the cardinal number of a sequence of identical external stimuli (chimes) is described and discussed as a first step towards implementing structured operations in models of neural networks. In the expository part of the chapter, we find passages from the early Wittgenstein (of the Tractatus, 1921) cited as part of a philosophical motivation for the introduction of models for sequence generation. Our feeling here is that this is perhaps a touch too glorious for the purpose. Apart from this, the cited sentences of the Tractatus (Nos. 1.13, 2.01, 2.0271, 2.03, and 2.051) refer to logical and semantic rather than to temporal relations between objects or facts. In any case, the late Wittgenstein (of the Philosophical Investigations (1971), for instance) would probably have considered things to be more complicated than linear sequences anyway. The level of sophistication required on the part of the reader is much lower here than in the other technical chapters. From a pedagogical point of view, the present chapter is therefore nicely located between two of the harder ones of the book.

Back to the standard model, the question of its storage capacity is addressed in Chapter 6. This is one of the central system theoretical questions of the field, since for an information processing device operating in a parallel distributed fashion (such as the brain), high storage capacities for individual subunits are essential to reduce communication load. As a first step, statistical estimates for the maximal number of patterns that can be stored without errors are obtained by a simple so-called signal-to-noise-ratio analysis, a method that was already used in Chapter 4 to establish the stability of the stored pattern states in the low loading limit. If more patterns are encoded in the Hopfield couplings, the stable attractors begin to deviate slightly from the desired states, but are still strongly correlated with them. Eventually, a critical storage level is reached beyond which all associative capabilities of the network abruptly break down - a rather drastic change in network performance that has been termed a "black-out catastrophe." The associated physical phenomenology - spin-glass ordering - is nicely explained. The statistical mechanical analysis that has unraveled these (and many other) phenomena has a strong aesthetic appeal. Unfortunately, the appendix containing the technical details of the computation hardly provides any information beyond what can be found in the original research literature (the second contribution to our list of key papers) - not exactly what is expected or desired from a monograph directed to an interdisciplinary readership - so it will probably be accessible only to the most avid of readers.
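For the curious reader, the core of the signal-to-noise argument can be stated in a few lines; the sketch below uses our notation (patterns \xi^\mu, couplings J_{ij} = N^{-1} \sum_\mu \xi_i^\mu \xi_j^\mu) and standard Gaussian estimates, and is a summary of the usual textbook reasoning rather than a quotation of Amit's derivation.

% With the network sitting in pattern \xi^1, the post-synaptic potential at
% neuron i splits into a signal and a crosstalk term:
\[
  h_i \;=\; \sum_{j \neq i} J_{ij}\,\xi_j^{1}
      \;=\; \Bigl(1 - \tfrac{1}{N}\Bigr)\xi_i^{1}
      \;+\; \frac{1}{N} \sum_{\mu \neq 1} \sum_{j \neq i}
            \xi_i^{\mu}\,\xi_j^{\mu}\,\xi_j^{1} .
\]
% For unbiased random patterns the crosstalk is a sum of roughly pN independent
% terms of size 1/N: approximately Gaussian with zero mean and variance p/N.
% A bit is flipped when the crosstalk overrides the unit signal, so, per bit,
\[
  P_{\mathrm{error}} \;\approx\; \tfrac{1}{2}\,
  \operatorname{erfc}\!\left(\sqrt{N/(2p)}\right),
\]
% which is negligible for p \ll N and keeps all Np bits intact only while p
% stays below roughly N/(2 \ln N).

The abrupt black-out itself is not visible at this level of argument; it emerges only from the full statistical mechanical treatment of Amit, Gutfreund, and Sompolinsky (1987), at a loading of roughly 0.14 patterns per neuron.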


In the Hopfield model, all stored patterns are imprinted in the couplings with equal intensity. This is eventually responsible for the occurrence of the black-out catastrophe mentioned above. Second, as more and more patterns are added to the system, synaptic efficacies can grow without bounds, which is biologically untenable. Storage prescriptions which lead to synapses of bounded strength are therefore introduced and are shown to give rise to the phenomenon of forgetfulness: New patterns can constantly be added to the memory and are kept, at the expense of old ones which are forgotten.

Finally, a radically different approach to the question of storage capacity, the work of Gardner (1988) mentioned in our introduction, is also briefly discussed. Rather than computing the number of patterns that can be memorized and retrieved with a given storage prescription, one scans the space of couplings for sets of J_ij's which will ensure the stability conditions for a given number of patterns. It is argued that if no such set exists, then the absolute storage capacity of any conceivable formal neural network is exceeded. Unfortunately, there is only a sketch of the analysis, and most of the discussion remains at a qualitative level.
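As a pointer for readers who want slightly more than the qualitative account, Gardner's starting point can be written down compactly; the notation below (patterns \xi^\mu, stability margin \kappa) is the conventional one and is our summary rather than a quotation from the book.

% For each neuron i one asks whether couplings (J_{i1}, ..., J_{iN}) exist such
% that every one of the p stored patterns is reproduced with stability margin
% \kappa \ge 0:
\[
  \frac{\xi_i^{\mu} \sum_{j \neq i} J_{ij}\,\xi_j^{\mu}}
       {\bigl(\sum_{j \neq i} J_{ij}^{2}\bigr)^{1/2}} \;\ge\; \kappa ,
  \qquad \mu = 1, \ldots, p .
\]
% Gardner's calculation follows the fractional volume of coupling space that
% satisfies these inequalities; it shrinks as p grows and vanishes at
% p = \alpha_c(\kappa) N. For unbiased random patterns and \kappa = 0 this
% gives \alpha_c = 2, the absolute capacity alluded to in the text.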
It is quite remarkable that the question of storage capacities of neural networks has been one of the central issues in physicists' approaches to the field, while its role has been much less prominent among cognitive psychologists and even marginal among biologists. The pronounced interest of physicists in this question is certainly partly due to the fact that the black-out catastrophe of an ANN is associated with a phase transition, a phenomenon which has aroused the curiosity of physicists for a long time. On the other side, it appears difficult to endow the concept of storage capacity of, say, the brain with a precise operational meaning. Measures for formal neural networks, such as the number of stored patterns per neuron, appear to be hardly accessible to the arsenal of experimental techniques of biologists or cognitive scientists. The possibility that results of the nature derived in the present chapter may be put to direct experimental test appears, therefore, rather remote.

These problems are nicely reflected in the two examples Amit quotes about the suggested memory limits of human brains. One is an estimate of von Neumann (1958) about the number of information bits that may penetrate the CNS through the sensory systems in a human lifetime. It seems to us that this number (about 10^20 bits) has very little to say about the brain's storage capacity. Thus Amit's attempt to relate it to the number of bits that can be stored in formal networks of the same size and connectivity (about 10^15 bits, for uncorrelated patterns), and his arguments for the possibility that correlations between the stored patterns might account for a mechanism to close the gap between these two estimates, really appear moot. The other example is related to experiments of Standing (1973), where subjects were asked to identify which member of a pair of pictures belonged to a set they had been presented with two days earlier. The percentage of correct classifications was roughly two-thirds and remained nearly constant over a wide range of sample sizes that were as large as 10,000 pictures. While these numbers indicate that the storage capacity of the brain or parts of it (but which parts?) might be impressive indeed, it appears well-nigh impossible to translate them into measures such as stored patterns per formal


neuron or stored bits per synapse. Moreover, the constancy of the 66% figure of retained pictures over a wide range of sample sizes indicates that a quantity other than storage capacity had been measured to begin with.

The Hopfield model embodies a number of simplifications and idealizations which stand in contrast to biological facts but which were crucial in discovering some of the simpler system theoretical questions that lay hidden in the gamut of neurophysiological data (such as for mechanisms of associative information processing), and in getting analytical investigations off the ground. While Amit is aware that "...no amount of lifting simplifications will closely approximate the full glory of an assembly of real live neurons..." (Amit, p. 345), he embarks in Chapter 7 on a program of demonstrating that the basic functional features of the standard ANN survive the removal of a broad range of simplifications and idealizations essentially unattenuated. The idea is that this kind of robustness against changing details of the modeling might indicate that something essential about mechanisms for associative information processing in nerve nets has been captured already in the standard model. This then would reinforce the claim that ANNs may contribute to understanding brain functions - if perhaps in ways the details of which remain to be discovered.

Among the issues covered in Chapter 7 are random deviations of synaptic efficacies from their ideal values, sparse rather than full connectivity, models with synapses that are non-linear functions of the Hopfield couplings, in particular synapses with limited analog depth, random asymmetric "dilution," i.e., severing of synapses, the introduction of synaptic specificity (Dale's law), "memory" at the level of individual neurons to account for relative refractory periods, and more. Models with non-linear or symmetrically diluted synapses can be mapped onto a Hopfield model which has a random contribution added to the synaptic strengths. Details as to how this mapping is effected are, however, not presented, so the reader has to accept it as a fait accompli. In this respect it appears somewhat odd that a related, more straightforward approach of van Hemmen (1987), in which the free energies of such models are evaluated directly, is not even mentioned.

In the standard model, the stored patterns are random and unbiased. That is, in an attractor on average half of the neurons will fire - a level of activity much higher than that observed in the mammalian cortex. Chapter 8 is devoted to modifications in storage prescriptions and theory which are necessary to accommodate so-called low-activity patterns in neural networks. Generalizations of these ideas are then put to work for the storage of hierarchically organized data. Here, reference to the work of Cortes, Krogh, and Hertz (1987), which relates storage prescriptions of this type to a general learning rule based on the correlation matrix between patterns, would probably have helped to make the treatment more systematic and coherent.

The problem of learning in neural networks is analyzed in two complementary ways in Chapter 9. One is learning in modes, where the network is explicitly instructed to accommodate a set of patterns in its synaptic code - a scenario appropriate to formal networks and artificial devices, but presumably of little relevance to biology. The theory of this type of learning is presented, including


convergence proofs for some of the algorithms, much of this being inspired by the learning theory for the perceptron. The other approach to learning involves a description of a natural double dynamics of both neurons and synapses. This line of research is, however, only beginning to become quantitative, and the setups proposed so far appear to have little potential for generalization towards a coherent description of the process of the acquisition of persistent knowledge in neuronal networks within a constantly evolving environment.

Finally, Chapter 10 deals with hardware realizations of ANNs. Three examples are presented: an electronic resistor-amplifier network, an electro-optical network, and a shift-register implementation. The motivations for describing these networks are that they can provide testing grounds for some of the theoretical concepts, and that they can be used to check the resilience of the main functional features of ANNs against various uncontrollable influences that are always present in real physical systems. Reports about the performance of these three architectures which would provide this sort of feedback are regrettably missing.

Vis-à-vis de tout, this is a fine book on attractor neural networks that will also be enjoyable reading for those taking delight in well written scientific prose. Order and logic of presentation are well tuned to the subject matter - all the way from the overall outline of the book down to its layout. An extensive table of contents and a glossary will help the novice find his way through the material, and will allow the expert to dive straight into the depths of a specific topic. The book could provide a good basis for a comprehensive two-semester course in neural network theory. Alternatively, the interested graduate student might well use it for independent studies. Finally, active researchers in biology, neuroscience, and the cognitive and computing sciences are among the audiences explicitly addressed by Amit. In large parts, we think, the contents of the book should be accessible to them, though perhaps not always without effort.

Yet not all is nice. Parts of the book give the impression of having been produced in haste. There are, for instance, quite a few printing errors (regrettably, mostly in formulae), misplaced figures, and errors in some of the illustrative examples within the introductory chapters.

Coming to the choice of material, we are led back to the list of key papers at the beginning of our review. Given the importance of the Gardner approach to neural networks and the wealth of interesting results it has produced - among them Virasoro's (1988) beautiful model-independent work on predicting the effects of brain lesions - it would have deserved an expanded treatment in a chapter of its own. In addition, readers interested in modeling brain functions might be curious to know whether the work on feed-forward systems done, for instance, at the Weizmann Institute, Israel, and in Giessen, FRG, has something useful to tell about afferent systems - even though, admittedly, feed-forward systems are outside the realm of ANNs.

At the level of abstraction from the neurophysiological substrate there is much that is left open or barely touched upon. There is, for example, the proper identification of the neurodynamics; there is - staring in our face - structure on different scales in


biological networks - but a world of attractor neural networks predominantly made up of unstructured, amorphous models; there is the problem of delayed mutual influence between structural subunits of the brain; and so on. The picture of the human brain as an assembly of 10^7 elementary networks with 10^4 neurons each, repeatedly used in the book, is certainly too naive, especially since the question of how to achieve and stabilize the delicate balance between interaction and insulation (Minsky, 1986) of subunits working in parallel is not even raised.

According to a metaphor coined by the Russian physicist Y. I. Frenkel, a good theoretical model of a complex system should be like a good caricature: it should emphasize those features that are most important while down-playing inessential details.² Once again, the epistemological hitch with this advice is that one really does not know what the inessential details are until one has understood the system under study. With respect to the above mentioned questions concerning neurophysiology, it is not clear a priori where they stand in this dilemma. One of the obstacles for the model builder here is that it appears difficult to put predictions of ANN theory to a direct experimental test, so that the biological relevance of the whole enterprise must still be regarded as an open question. This is in no small part a problem of scales: The results of ANN theory can claim significance (and robustness with respect to details of the modelling) only on a scale where collective phenomena of large unstructured assemblies of neurons are responsible for the emergent behaviour, i.e., for homogeneous networks of 10^3-10^4 (or more) neurons. At present, the firing patterns of networks of this size are way beyond what can be monitored by experimental techniques of neurophysiology. On the other hand, it would seem that experimental psychology rarely comes up with results pertinent to the nature of attractors of specific structural subunits of the brain that are in the indicated size range.

We do, however, agree with the author that ANN theory has produced a number of new and interesting concepts that may help to guide future theoretical and experimental work - in the sense of Kuhn (1970). Thus, while for the time being we would hesitate to tag ANN theory with the label "Modeling Brain Function," we hope that this review will convince many readers that something useful can be learnt from ANNs, which may help to point out new avenues towards understanding old problems. With Amit, we would therefore like to regard this book as an opening of a discussion - undoubtedly a very qualified one.

² We came across this formulation in a set of summer-school lectures by M. E. Fisher (1983).

REFERENCES

AMIT, D. J., GUTFREUND, H., & SOMPOLINSKY, H. (1987). Statistical mechanics of neural networks near saturation. Annals of Physics, 173, 30-67.
BORGES, J. L. (1981). Tlön, Uqbar, Orbis Tertius. In Gesammelte Werke 3/I. München: Carl Hanser. Original in BORGES, J. L. (1974). Obras Completas. Buenos Aires: Emecé Editores.
CORTES, C., KROGH, A., & HERTZ, J. (1987). Hierarchical associative networks. Journal of Physics A, 20, 4449-4455.
FISHER, M. E. (1983). Scaling, universality and renormalization group theory. In F. J. Hahne (Ed.), Springer Lecture Notes in Physics (Vol. 186). Berlin/Heidelberg/New York: Springer.
GARDNER, E. (1988). The space of interactions in neural network models. Journal of Physics A, 21, 257-270.
GRAY, C. M., KÖNIG, P., ENGEL, A. K., & SINGER, W. (1989). Oscillatory responses in cat visual cortex exhibit inter-columnar synchronization which reflects global stimulus properties. Nature, 338, 334-337.
HEBB, D. O. (1949). The organization of behaviour. New York: Wiley.
VAN HEMMEN, J. L. (1987). Nonlinear neural networks near saturation. Physical Review A, 36, 1959-1962.
HOPFIELD, J. J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences, USA, 79, 2554-2558.
KUHN, T. S. (1970). The structure of scientific revolutions (2nd ed.). Chicago: Univ. of Chicago Press.
MCCULLOCH, W. S., & PITTS, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115-133.
MINSKY, M. (1986). The society of mind. New York: Simon & Schuster.
VON NEUMANN, J. (1958). The computer and the brain. New Haven: Yale Univ. Press.
ROSENBLATT, F. (1961). Principles of neurodynamics. Washington, DC: Spartan.
STANDING, L. (1973). Learning 10,000 pictures. Quarterly Journal of Experimental Psychology, 25, 207.
VIRASORO, M. A. (1988). The effect of synapses destruction on categorization in neural networks. Europhysics Letters, 7, 293.
WITTGENSTEIN, L. (1921). Tractatus Logico-Philosophicus, Annalen der Naturphilosophie (Bilingual ed.). London: Routledge & Kegan Paul.
WITTGENSTEIN, L. (1971). Philosophische Untersuchungen. Frankfurt: Suhrkamp.