Neural Networks, Vol. 2, pp. 405-409, 1989. Printed in the USA. All rights reserved.
0893-6080/89 $3.00 + .00. Copyright © 1989 Pergamon Press plc.

BOOK REVIEW
Neurocomputing: Foundations of Research. Edited by James A. Anderson and Edward Rosenfeld. Cambridge, MA: MIT Press, 1988, $55.00, 729 pp. ISBN 0-262-01097-6.

Neurocomputing: Foundations of Research provides a representative sample of research contributions that have played an important role in defining the modern field of neurocomputing. This collection is also an informal chronicle of how neurocomputing ideas have gradually accumulated, mutated, and combined over the last century. Anyone who masters the contents of this book will have an excellent idea of the central tendency of, and a reasonable idea of the variation within, the neurocomputing paradigm as it had developed through 1987. Because the book chronicles a gradual accumulation of ideas from William James (chapter 1) through the modern era inaugurated by theorists such as Anderson (chapters 15, 22, 36), Grossberg (chapters 19, 24), and Kohonen (chapters 14, 30), it shatters any illusions that the rich complex of ideas that now enliven the field sprang fully formed from anyone’s brow in the present decade.

This point is reinforced both within and between the different lineages of neurocomputing research covered by the selections. For example, within the self-organized (internally regulated) learning tradition of research, we find von der Malsburg (chapter 17) borrowing Grossberg’s (1972) additive short term memory (STM) equations to serve as one basis, along with a normalized Hebbian synaptic modification rule, for constructing a striking model of the development of orientational selectivity among cortical “feature detecting” cells. This treatment in turn inspired Grossberg (chapter 19) to combine a shunting competitive STM model with a synaptic modification rule that allowed both passive decay and active growth. The outcome was a competitive learning model of self-organized pattern classification, and perhaps the first application to an important learning problem of the now well-known “winner take all” property (called the “choice” property in Grossberg, 1973) of recurrent shunting competitive networks. Equally important, chapter 19 motivates the basic ideas of Grossberg’s adaptive resonance theory (chapter 24), for example, of self-stabilizing category formation, by proving that the classification scheme initially induced by a feedforward adaptive filter would be unstable for many training sets. A still later treatment of the development of selectivity, based on a new variant of competitive learning and a more complex view of available post-synaptic information, is found in chapter 26 by Bienenstock, Cooper, and Munro. Kohonen, in chapter 30, employs a variant of competitive learning within networks having a neighborhood relationship among nodes to illustrate how topographical representations of feature spaces can be selectively organized. For a recent review of issues in self-organized learning, see Grossberg (1987).
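
To make the flavor of these self-organizing schemes concrete, here is a minimal sketch, in Python, of competitive learning with a Kohonen-style neighborhood function on a one-dimensional map. It is our own toy illustration, not code from any chapter of the book, and the map size, learning rate, neighborhood width, and random training data are arbitrary assumptions.

# Minimal sketch of Kohonen-style competitive learning on a 1-D map.
# All parameters (map size, learning rate, neighborhood width, data) are
# illustrative assumptions, not values taken from the reviewed chapters.
import numpy as np

rng = np.random.default_rng(0)
n_units, n_inputs = 20, 2            # 1-D map of 20 units, 2-D input features
weights = rng.uniform(0.0, 1.0, size=(n_units, n_inputs))

def train_som(data, weights, epochs=50, lr=0.2, sigma=3.0):
    positions = np.arange(len(weights))
    for _ in range(epochs):
        for x in data:
            # "Winner take all": the unit whose weight vector best matches x
            winner = np.argmin(np.linalg.norm(weights - x, axis=1))
            # Neighborhood relationship among nodes: nearby units also learn,
            # which is what organizes a topographic map of the feature space
            h = np.exp(-((positions - winner) ** 2) / (2 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
    return weights

data = rng.uniform(0.0, 1.0, size=(200, n_inputs))   # toy feature vectors
weights = train_som(data, weights)
print(weights[:5])   # neighboring units end up tuned to neighboring features

The winner-take-all step corresponds to the “choice” property mentioned above, while the neighborhood term is what allows a topographic ordering of the feature space to emerge across the map.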

Within the non-self-organized (externally supervised) learning lineage of neurocomputing research, a similar accumulation of ideas is apparent, beginning with Rosenblatt’s (chapter 8) description of the Perceptron. This chapter gives an intriguing glimpse into the enthusiasm generated by the “first wave” of neural network research in the 1950s. Minsky and Papert’s critique (excerpted in chapter 13) is generally credited (blamed?) both for pointing out limitations of Rosenblatt’s architecture and for deflating much of the burgeoning enthusiasm for neurocomputing by pointing out intrinsic limitations of perceptron-like neurocomputers. In Minsky and Papert’s formulation of perceptrons, essentially that which appears in many engineering and machine learning texts, a perceptron is a two-layer network that can, by definition, compute only linearly separable predicates.

Meanwhile, under “cover” of doing research in adaptive signal processing, Widrow and Hoff (chapter 10) were developing the “delta rule” for weight-update procedures in supervised learning. Their formulation provided both a computationally cheap alternative to certain more “optimal” techniques requiring expensive computations of matrix inverses or pseudoinverses (see Kohonen’s chapter 14 and Anderson’s chapter 15) and a least-squares approximation of solutions for overlapping clusters in data sets that were not linearly separable. The Widrow-Hoff procedure was thus appropriate only for data sets which, if not actually linearly separable, admitted a best approximate separability criterion defined to have a single minimum in its performance space. No problems with local minima could arise because, within the allowable problem domains, no local minima existed; this restriction made the delta rule of limited generality.

The modern period in supervised learning research began quietly with Paul Werbos’ (1974, 1988) doctoral dissertation, and continued with the subsequent reformulation of backpropagation by Parker (1985) and by Rumelhart, Hinton, and Williams (chapters 41 and 42). At last an analytic procedure for sensibly updating weights within middle (“hidden”) layers of multilayer networks had been specified, and one plausible solution to the “credit assignment problem” derived! Given suitable choices of nonlinear signal functions, problems involving nonlinearly separable predicates are now routinely solved by simulated neurocomputers in popular student exercises (e.g., McClelland & Rumelhart, 1988). The deeper questions concerning when the gain in performance justifies the increase in complexity, both structurally and dynamically, of backpropagation networks relative to conventional statistical pattern recognition and classification architectures remain an important research topic for the field.
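
As a concrete illustration of the supervised lineage just described, the following toy Python program (again ours, not the book’s) trains a one-hidden-layer network on the nonlinearly separable XOR predicate by backpropagation; if the hidden layer were removed, the output-layer update would reduce to a Widrow-Hoff-style delta rule. The network size, learning rate, and iteration count are arbitrary assumptions.

# Toy backpropagation on XOR, a predicate that is not linearly separable.
# Architecture and hyperparameters below are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(1)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[0.], [1.], [1.], [0.]])                 # XOR targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros(4)         # input -> hidden
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)         # hidden -> output
lr = 0.5

for _ in range(20000):
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    # Output-layer error; used alone (no hidden layer), this update is
    # essentially the Widrow-Hoff delta rule.
    d_out = (T - y) * y * (1 - y)
    # Credit assignment for the hidden layer: error is propagated back
    # through W2 and scaled by each hidden unit's local slope.
    d_hid = (d_out @ W2.T) * h * (1 - h)
    W2 += lr * h.T @ d_out
    b2 += lr * d_out.sum(axis=0)
    W1 += lr * X.T @ d_hid
    b1 += lr * d_hid.sum(axis=0)

print(np.round(y.ravel(), 2))    # typically close to [0, 1, 1, 0]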


Backpropagation provides a powerful tool for discovering complex structures in data with clearly statable analytic criteria for a weight change procedure. It may not be surprising, therefore, that judging by submissions to current conferences, backpropagation and its variants appear to account for half of all the ongoing research in neurocomputing. The development of satisfactory solutions to the problem of credit assignment in multilayer nets has taken decades and is indeed a continuing research topic, but it is nevertheless fascinating to discover the extent to which theoretical insight must lead the way in defining problems before solutions can be found.

It may be moot to debate to what extent Rosenblatt himself was aware of the limitations of the powers of perceptrons, but Rosenblatt apparently intended the term “perceptron” to have a broader meaning than that of Minsky and Papert, including but not limited to feedforward calculation of linearly separable predicate classes. Chapter 8 gives only a partial indication of the range of Rosenblatt’s intuitions. For example, Principles of Neurodynamics (1962) includes a discussion of what he calls “back-propagating error correction procedures.” He writes, “The procedure to be described here is called the ‘back-propagating error correction procedure’ since it takes its cue from the error of the R-units [response units], propagating corrections back towards the sensory end of the network if it fails to make a satisfactory correction quickly at the response end” (p. 292). While his use of the term “back-propagating” is not formally equivalent to what is now called backpropagation, the core notion of transforming error signals at a later level into rules for modification of weights at an earlier level is unmistakable, as is the intent of his ensuing discussion of the issue of credit assignment raised by such procedures. Back-propagating error signals, in this wider sense, have also been important in the historical development of cerebellar models for side-loop calibration of feedforward motor control signals (Grossberg & Kuperstein, 1989; Ito, 1984).

Another approach to the credit assignment problem is found in what is called reinforcement learning. Based in large part on evidence from the long tradition of studies of animal learning, reinforcement learning occupies in some sense a “middle ground” between self-organized and non-self-organized learning. In reinforcement learning the environment provides low-dimensional feedback (such as “reward” or “punishment”) concerning the correctness of a behavioral performance chosen amidst variations in complex, high-dimensional sensory stimulation. Unfortunately, neural network modeling of reinforcement phenomena is a rather small part of the present field of neurocomputing, and is represented only by the article of Barto, Sutton, and Anderson (chapter 32) in this volume. In their introduction to this chapter the editors mention that the included work is closely related “... to the Rescorla-Wagner model of classical conditioning, probably the most successful current model of classical conditioning.” Readers may be left unaware that the Rescorla-Wagner model is a phenomenological model at the level of behavior, not a neurally based model, or that important gaps in its explanatory power remain. For a review of some recent issues in reinforcement learning, see Grossberg and Levine (1987).
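
For readers unfamiliar with the flavor of such models, the following schematic Python sketch, which is deliberately far simpler than the Barto, Sutton, and Anderson architecture of chapter 32, shows how a scalar reward signal, rather than an explicit target vector, can gate the weight changes of a single stochastic unit. The toy task, the update rule, and all constants are our own illustrative assumptions.

# Schematic reward-modulated learning with a single stochastic unit.
# The toy task, update rule, and constants are illustrative assumptions,
# not the model of chapter 32.
import numpy as np

rng = np.random.default_rng(2)
patterns = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
correct  = np.array([0, 1, 1, 1])        # hidden rule (logical OR), known
                                         # only to the "environment"
w, b = np.zeros(2), 0.0
lr, baseline = 0.3, 0.5                  # crude reinforcement baseline

for _ in range(5000):
    i = rng.integers(len(patterns))
    x = patterns[i]
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))   # probability of emitting "1"
    a = float(rng.random() < p)              # stochastic behavioral choice
    r = 1.0 if a == correct[i] else 0.0      # low-dimensional feedback only
    # Reward above baseline strengthens the tendency toward the emitted
    # action; reward below baseline weakens it (a reward-modulated rule).
    w += lr * (r - baseline) * (a - p) * x
    b += lr * (r - baseline) * (a - p)

for x, t in zip(patterns, correct):
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))
    print(x, t, round(p, 2))    # choice probabilities move toward the targets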


Another tradition in neurocomputing research can be called the “physical system approach,” in which the end result of a computation is represented by a configuration of network activity at some minimum or equilibrium value of some parameter with a plausible physical interpretation, such as temperature, energy, or noise. Articles within this tradition include, in addition to those of Anderson and Kohonen already mentioned, chapters by Amari (21), Anderson and colleagues (22, 36), Geman and Geman (37), Hopfield (35), Kirkpatrick, Gelatt, and Vecchi (33), and Ackley, Hinton, and Sejnowski (38). Apparently, the urge to find interpretations of intelligent functions in physical terms has had long and broad appeal. The chapters just mentioned include some of the most intriguing and influential ideas in the field of neurocomputing, and we feel that any deep theory of intelligence must simultaneously respect physical constraints while elucidating the peculiar form of physical processes characteristic of intelligent behavior by living forms. The models in the cited chapters share a characteristic degree of symmetry or homogeneity in their structures, which are in some ways evocative of Lashley’s biologically inspired notions of mass action and equipotentiality (described later in this review). Nevertheless, the degree of homogeneity in these models is simply not characteristic of much of what is known about specializations of brain structures, many of which are devoted to the kinds of “preprocessing” whose importance and difficulty was for so long underestimated in such fields as computer vision. We believe that disproportionate emphasis is presently placed within the field of neurocomputing upon such homogeneous “physical system” models, at the expense of furthering our understanding of unique and powerful properties of certain more specialized and locally structured networks.
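
As a concrete, if drastically simplified, illustration of this style of computation, the sketch below (ours, not drawn from the chapters cited) builds a small binary network with symmetric weights; asynchronous updates never increase a global energy function, so the computation terminates at a local energy minimum. The stored patterns, network size, and Hebbian-style weight construction are illustrative assumptions.

# Sketch of energy-minimizing ("physical system") computation in a small
# binary network with symmetric weights; patterns and sizes are assumptions.
import numpy as np

rng = np.random.default_rng(3)
patterns = np.array([[1, -1, 1, -1, 1, -1, 1, -1],
                     [1, 1, 1, 1, -1, -1, -1, -1]], dtype=float)

n = patterns.shape[1]
W = sum(np.outer(p, p) for p in patterns) / n     # Hebbian-style storage
np.fill_diagonal(W, 0.0)                          # symmetric, zero diagonal

def energy(s):
    # The global quantity that settling never increases
    return -0.5 * s @ W @ s

state = patterns[0].copy()
state[:2] *= -1                                   # corrupt the cue
print("start E =", round(energy(state), 3))

for _ in range(5):                                # asynchronous updates
    for i in rng.permutation(n):
        state[i] = 1.0 if W[i] @ state >= 0 else -1.0

print("final E =", round(energy(state), 3))
print("recovered pattern 0:", np.array_equal(state, patterns[0]))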

The chapters of the book could be partitioned into many categorical schemes besides the one we have just outlined. Neurocomputing contains an interesting blend of truly classic contributions, which have already demonstrated their decades-old importance, and recent contributions whose impact in the long term is less certain. The authors have done an admirable (and courageous!) job of selecting examples in the latter category. Unlike political history texts, which so often end their narrative just before the birthdates of their readers in order to avoid taking stands on currently or recently controversial events, the present book contains considerable material on issues still hotly debated in research meetings and reports. The early selections must command near universal approval, however, and provide a welcome overview of roots that are certainly “foundations of research.”

The first five chapters constitute a delightful survey of precursors to some of the paradigmatic ideas in neural network research. Chapter 1 by William James gives as clear a statement as may exist in the English language of the notion that complex ideas are formed by “propagation of excitement” among brain processes. Chapters 2 and 3 by Warren McCulloch and Walter Pitts introduce many ideas still of great importance, especially that many simple elements working collectively can compute complex functions and that spatially distributed maps of activities in neural layers can code abstract properties, such as form.

Hebb’s (chapter 4) classic statement of a hypothesis of neural adaptation is a remarkable illustration of the power afforded by giving an abstract idea a concrete physical interpretation: the Humean, even Aristotelian, postulate of learning based on repeated conjunction becoming the postulate of connection strengths between neurons growing with use. Without fancy computers or even formal mathematical statements, Hebb’s hypothesis, together with his cognitivist formulation of short-term memory traces enabled by reverberatory circuits, laid the foundations of much later research. To this day, phrases such as “Hebbian learning” and “non-Hebbian learning” pervade the literature.

Karl Lashley’s contribution (chapter 5) concerns the centuries-old debate about the extent of localization of function in the brain, arguing that memory traces are spatially distributed rather than localized. Much of the evidence available to Lashley came from studies in which animals trained to perform some motor task (e.g., maze running) were then subjected to lesions of various brain tissues. Lashley found that capacity to perform the task often survived, at least in partial form, whichever of several areas of the brain was removed. Thus the memory could not be spatially localized, and he formulated principles of “mass action” (performance roughly proportional to brain mass) and “equipotentiality” (substitutability of differing brain tissue to perform a given task) to account for his results. While much of Lashley’s methodology was crude by current standards, and certain of his conclusions are in need of modification in light of our more recent capacity to observe localized brain functions on finer scales, the core idea of redundancy and distribution of information in the brain has survived, and is a key tenet of much recent research in neural modeling.

Besides the service they have done us all simply by collecting all the papers into a single volume, the editors have also provided informative linking commentary in their introductions to each chapter. Their introduction to chapter 2, in which the McCulloch-Pitts model of a summing, thresholding, binary output neuron is introduced, is an intriguing example of both the utility and limitations of such short commentary. They describe how the notion of a binary output code was a natural conclusion from the “all or none” appearance of the spiking action potential. Later intracellular recordings made apparent that presynaptic cell potentials are graded, and that neurons appear to convert voltage to frequency of action potential output. So far so good; the editors have, within severe space limitations, pointed out some of the fundamental physiological justifications for using continuous or analog output neurons. Still, a few more paragraphs might have indicated to the reader that the present vogue for a certain form of continuous output element should not lead neural modelers to replace one incorrect dogma with another. For example, recent work (Richmond, Optican, Podell, & Spitzer, 1987) indicates that neurons may be transmitting information in the temporal frequency modulation of their firing rates, rather than merely in their mean rate, as many of us so often assume. Moreover, neither the brief introduction to chapter 2 nor the General Introduction to the volume examines the deeper issue of whether the firing rate of individual neurons is always the appropriate computational unit for modelers to employ. What of retinal horizontal cells, which do not spike at all? What of gap junctions, dendrodendritic interactions, axo-axonal coupling? These phenomena may appear as mere biological extravagance to those whose model neurons must be of the form “compute the dot product of the inputs and the weights and run the result through a sigmoid.” Nature, the opportunist, has invented many such mechanisms, however, and while some may represent evolutionary happenstance, others might hold important keys to understanding fundamental mechanisms of intelligence.
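
To fix ideas, the two idealizations discussed above, the all-or-none unit of chapter 2 and the now-standard graded unit of the form just quoted, can be written in a few lines of Python; the weights, inputs, and threshold are arbitrary illustrative values of our own choosing.

# Two idealizations of the neuron discussed above; weights and inputs are
# arbitrary illustrative values.
import numpy as np

x = np.array([0.2, 0.9, 0.4])        # presynaptic signals
w = np.array([1.5, -0.8, 0.6])       # synaptic weights
theta = 0.1                          # firing threshold

def mcculloch_pitts(x, w, theta):
    # All-or-none output, in the spirit of the chapter 2 model
    return 1 if w @ x >= theta else 0

def graded_unit(x, w):
    # Dot product of inputs and weights run through a sigmoid: a graded
    # output commonly read as a mean firing rate
    return 1.0 / (1.0 + np.exp(-(w @ x)))

print(mcculloch_pitts(x, w, theta), round(graded_unit(x, w), 3))

Neither formulation, of course, says anything about non-spiking cells, gap junctions, or temporal codes of the kind just mentioned.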

An example of the relevance of subcellular mechanisms can be found in the recent invocation of the notion of diffusion of signals within a cortical syncytium (Cohen & Grossberg, 1984). At a certain point within the development of a vision theory it becomes necessary to postulate a mechanism for “filling in” of color or brightness signals that rapidly and essentially homogeneously replicates a contour-generated value throughout a region. The temporal requirements for such a process and the desired profile of spatial interactions are such that propagation of this signal through action potentials is an unlikely candidate mechanism. Thus, theory demands that a mechanism be postulated for signal transport whose interactions are both more rapid and more flexibly modulated (inhibited) than a single cell’s action potential cycle. Right or wrong, the prediction of cortical syncytia involved in color and brightness processing is an important example of the kind of dialogue possible between neural network modelers and neurophysiologists. If confirmed, it would constitute a theoretical success on a par with confirmations of the existence of previously unobserved particles in physics. Many such predictions are being made and tested by individuals and groups at many laboratories. Concepts such as a cell syncytium will seem less mysterious as more neural network modelers acquire a greater interdisciplinary understanding of neural function.
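
A schematic one-dimensional illustration of such boundary-gated filling-in appears below. It is our own sketch, not the Cohen-Grossberg model itself: a feature value injected at a contour spreads by diffusive coupling until it is nearly uniform within its region, but is blocked where a boundary signal cuts the coupling. The grid size, rates, and boundary placement are assumptions.

# Schematic 1-D "filling-in" by boundary-gated diffusion; grid size, rates,
# and boundary placement are illustrative assumptions, not parameters of the
# Cohen-Grossberg (1984) model.
import numpy as np

n = 40
activity = np.zeros(n)
permeability = np.ones(n - 1)      # diffusive coupling between neighbors
permeability[19] = 0.0             # a boundary signal blocks coupling here
rate = 0.4

for _ in range(4000):
    activity[5] = 1.0              # contour-generated value, continually injected
    flow = permeability * (activity[1:] - activity[:-1])
    activity[:-1] += rate * flow   # each cell drifts toward its neighbors,
    activity[1:]  -= rate * flow   # except across the blocked boundary link

print(np.round(activity[[0, 10, 19, 20, 30]], 2))
# Nearly uniform up to the boundary (cell 19), exactly zero beyond it.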

We believe that an understanding of specialized “local circuit” neural networks will prove of increasing importance in the coming decades. Many such networks, ubiquitous in both peripheral sensory and motor systems in animals, show rather limited forms of adaptation (in the usual sense of “weight update” over many presentations of inputs) in their mature function. Indeed, the demands of, for example, preattentive vision or muscle coordination hardly afford the luxury of slow weight update procedures. Instead, the operative biological networks are so ideally structured (i.e., adapted by evolution or development) to their task that their dynamics reconfigure to appropriate task-specific modes in real time.

Anderson and Rosenfeld ought not be faulted, however, for failing to include more material on issues such as we have just raised. Indeed, Neurocomputing contains proportionately more behaviorally and physiologically inspired material than do many current conferences in the field. In any event, we are informed that the authors are presently compiling another volume of collected papers that will in fact contain a greater emphasis on biologically inspired work. Such a volume would be a welcome contribution.

The latter chapters of Neurocomputing contain contributions from current researchers, including papers published as recently as 1987. In choosing which papers by which authors to include, the editors manifested a tolerance for diversity of opinion and a disposition to accent the positive whenever possible.

In places, however, these tendencies result in somewhat uncritical acceptance of authors’ claims. For example, the introduction to the chapter by Marr discusses his suggestion for edge detection based on zero-crossings of Laplacians of Gaussians of image intensity and cites its claimed advantages without mentioning either its limitations for machine vision applications or its explanatory inadequacies as a foundation for a theory of how mammalian visual systems actually respond to edges. In a related vein, the inclusion of Marr and Poggio’s chapter 20 helps perpetuate a historically inequitable interpretation of the contributions of Sperling (1970) and Dev (1975) in devising cooperative-competitive disparity computing algorithms. For a review of early neural modeling proposals in vision, see Grossberg (1983).

The introduction to Sejnowski and Rosenberg’s chapter on NETtalk is another example of the editors’ enthusiasm outrunning critical evaluation. To be sure, NETtalk is a demonstration that rule-like behavior can be acquired from experience and continuous modification of parameters, and provides some support for the anti-Chomskian position that children might be able to learn to talk without a great deal of dependence upon specialized and innate linguistic knowledge. These are critical first steps, but they may prove to have been easy relative to the still uncompleted steps. For example, the data input to the model is presegmented, thereby finessing one of the most difficult aspects of speech perception, but this point is obscured by the editors’ remark that “Sejnowski and Rosenberg taught the system both isolated words and continuous speech” [emphasis added] (p. 662). Also, the model only generated the parameters for articulatory specification corresponding to each pronounced phoneme; the knowledge of how to generate the trajectories of the model’s “tongue,” “lips,” and so forth, necessary for producing the compelling audio tape used to demonstrate the model, was pre-wired. Thus the model was in effect learning only a “symbol to symbol” associative mapping, rather than the more difficult “analog to symbol” or “symbol to analog” mappings involved in speech perception and production. We feel that the most successful models in the speech domain will respect the necessity of parsing competencies and networks alike into specialized subfunctions and subcircuits.

The use of the term “foundations” in the book’s title implies a strong claim to longevity of the ideas presented. As we have indicated, the earlier chapters are of proven worth, and will doubtless continue to be read for decades. It will be fascinating to see which articles from the 1980s will be included in a collection on foundations of neurocomputing compiled some time in the 21st century. The authors have made some excellent choices. Virtually all of the plenary speakers at recent major neural network conferences are represented in the book, and no major theoretical themes have been overlooked. Still, everyone reading the book will have one or two favorite omissions. Ours include Hartline and Ratliff’s (1957) exploration of lateral inhibition in the Limulus retina (though chapter 23 compensates in part); Lettvin, Maturana, McCulloch, and Pitts’s (1959) pioneering “What the frog’s eye tells the frog’s brain”; and Lashley’s insightful essay on serial order in behavior (1951). Also, Sutton and Barto’s (1981) paper, while presenting a less developed model than that contained in the Barto, Sutton, and Anderson paper (chapter 32), contains a great deal more explanatory material on the importance of reinforcement learning and on links between the conditioning and adaptive systems literatures.


Our suggested articles are primarily concerned with physiology or animal behavior, because we believe that an understanding of mind can only come about through a process of guided reinvention, based on observation and discovery, rather than of pure invention. While debates about artificial intelligence and neural networks are often phrased in terms of symbolic versus analog processing, or in terms of the importance of parallel processing and distributed representations, we believe that a more fundamental difference in attitude concerns the importance of adducing information from behavioral and biological data. We believe that attempts to devise models capable of intelligent perception, cognition, and control of action in ignorance of nature’s own examples, whether done in the name of artificial intelligence, neural networks, connectionism, or any other paradigm, can lead at best only to a few fortuitous partial solutions. We salute Anderson and Rosenfeld’s efforts to encourage researchers interested in designing new computing technologies to become better acquainted with results from the life sciences, and look forward to another collection of papers that will help further this goal.

We understand that the publisher has plans to release a paperback edition of Neurocomputing. This is wonderful news, as in our introductory courses on neural modeling we have assigned many readings that are reprinted in this book, whose hardcover price is somewhat high for most student budgets. The hardcover edition nevertheless represents somewhat of a bargain in terms of the value of the included material relative to average prices of books these days, and the muscle needed to heft the hardcover edition carries an appropriate symbolic message: as the included articles demonstrate, neurocomputing has much, much more muscle to flex today than during its infant struggle with now-traditional computing. We see little chance that the paradigms of the field could once again be displaced from the mainstream of research in natural and synthetic intelligence.

REFERENCES

Cohen, M. A., & Grossberg, S. (1984). Neural dynamics of brightness perception: Features, boundaries, diffusion, and resonance. Perception & Psychophysics, 36, 428-456.
Dev, P. (1975). Perception of depth surfaces in random-dot stereograms: A neural model. International Journal of Man-Machine Studies, 7, 511-528.
Grossberg, S. (1972). Neural expectation: Cerebellar and retinal analogs of cells fired by learnable or unlearned pattern classes. Kybernetik, 10, 49-57.
Grossberg, S. (1973). Contour enhancement, short-term memory, and constancies in reverberating networks. Studies in Applied Mathematics, 52, 213-257.
Grossberg, S. (1983). The quantized geometry of visual space: The coherent computation of depth, form, and lightness. Behavioral & Brain Sciences, 6, 625-692.
Grossberg, S. (1987). Competitive learning: From interactive activation to adaptive resonance. Cognitive Science, 11, 23-43.
Grossberg, S., & Kuperstein, M. (1989). Neural dynamics of adaptive sensory-motor control: Expanded edition. New York: Pergamon Press.
Grossberg, S., & Levine, D. (1987). Neural dynamics of attentionally modulated Pavlovian conditioning: Blocking, interstimulus interval, and secondary reinforcement. Applied Optics, 26, 5015.
Hartline, H. K., & Ratliff, F. (1957). Inhibitory interaction of receptor units in the eye of Limulus. Journal of General Physiology, 1, 227-295.
Ito, M. (1984). The cerebellum and neural control. New York: Raven Press.
Lashley, K. S. (1951). The problem of serial order in behavior. In L. A. Jeffress (Ed.), Cerebral mechanisms in behavior. New York: Wiley.
Lettvin, J. Y., Maturana, H. R., McCulloch, W. S., & Pitts, W. H. (1959). What the frog's eye tells the frog's brain. Proceedings of the Institute of Radio Engineers, 47, 1940-1951.
McClelland, J. L., & Rumelhart, D. E. (Eds.). (1988). Explorations in parallel distributed processing. Cambridge, MA: MIT Press.
Parker, D. B. (1985). Learning-logic (Tech. Rep. No. TR-47). Cambridge, MA: MIT Center for Research in Computational Economics and Management Science.
Richmond, B. J., Optican, L. M., Podell, M., & Spitzer, H. (1987). Temporal encoding of two-dimensional patterns by single units in primate inferior temporal cortex, I: Response characteristics. Journal of Neurophysiology, 57(1), 132-178.
Rosenblatt, F. (1962). Principles of neurodynamics. Washington, DC: Spartan Press.
Sperling, G. (1970). Binocular vision: A physical and neural theory. American Journal of Psychology, 83, 461-534.
Sutton, R. S., & Barto, A. G. (1981). Toward a modern theory of adaptive networks: Expectation and prediction. Psychological Review, 88, 135-171.
Werbos, P. (1974). Beyond regression: New tools for prediction and analysis in the behavioral sciences. Ph.D. thesis, Harvard University. (Also reprinted as a report of the Harvard/MIT Cambridge Project, 1975, under Dept. of Defense contract.)
Werbos, P. (1988). Generalization of backpropagation with application to a recurrent gas market model. Neural Networks, 1, 339-356.

Ennio Mingolla*
Daniel Bullock†
Center for Adaptive Systems
Boston University
111 Cummington Street
Boston, MA 02215

The authors wish to thank John W. L. Merrill for helpful discussions.

*Supported in part by AFOSR F49620-87-C-0018.
†Supported in part by NSF IRI-87-lhY60.