An introduction to neural networks


Neurocomputing 14 (1997) 101-104

Book review

Material to be included in this section can be submitted to: Dr. K.J. Cios, University of Toledo, Department of Electrical Engineering and Computer Science, Toledo, OH 43606-3390, USA. Email: [email protected]

An Introduction to Neural Networks by J.A. Anderson. MIT Press, ISBN 0-262-01144-1, 650 pp.

"(...) Typically, a simulation reconstructs the correct answers about 70% of the time. That is, with considerable effort and a remarkable amount of computer time we created a student who earned a C in arithmetic. (This led to an interesting conversation with one of the administrators of a supercomputer center a few years ago, trying to explain why we believed several hours of supercomputer time were usefully spent getting the wrong answers in elementary arithmetic.) (...)" [1, p. 604]

To initiate the reader into the area of artificial neural networks with a strong biological motivation: such is the main objective of An Introduction to Neural Networks, by J.A. Anderson [1]. The book is organized in 17 chapters plus an afterword, where the author explains how the reader can obtain the source code of the programs used throughout the book. The Introduction consists of a general foreword presenting the main adopted directives, which include an emphasis on the biology and psychology underlying the considered models and assumptions, rather than a more comprehensive algorithmic analysis. An important tendency felt throughout the book, also highlighted in its foreword, is that while the techniques are frequently discussed from a biological (neurophysiological or cognitive) point of view, their implementation is not straightforward, demanding references to additional technical texts.

In addition to the emphasis on the biological motivation that has underlined research in artificial neural nets since the early days, another nice feature of Anderson's book is its concern with presenting important mathematical concepts in a clear and introductory way, with special attention concentrated on matrices and linear algebra. Other positive features include the presentation of a concise abstract at the beginning of each chapter and the clear way in which the book is written.

Chapter 1 starts by introducing the reader to neural networks by means of a brief discussion of simple data processing issues, followed by a presentation of the basic physiological aspects of single neurons. Although many aspects of the biological neural cell are introduced with the aid of helpful images and schemes (incidentally, few of them are as self-explanatory as the cover of Hubel's book [2]), this chapter will possibly not be very accessible to readers uninitiated in electro- or biochemistry.

Having introduced the basic facts about single neurons in Chapter 1, Anderson proceeds by discussing the interaction between neurons, as well as theoretical models, in Chapter 2. Three models are discussed in this chapter: the McCulloch-Pitts, the integrate-and-fire, and the generic connectionist models, all explained in a simple and didactic way. The importance of the non-linear stage and its relation to the dynamic range of real neurons is also emphasized in this chapter.
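To give a flavor of the two classical models named above, here is a minimal sketch in Python (our own illustration, not the book's Pascal code; all parameter values below are assumptions chosen only to produce visible behavior):

    import numpy as np

    def mcculloch_pitts(inputs, weights, threshold):
        # Binary threshold unit: fires (1) iff the weighted input sum reaches threshold.
        return 1 if np.dot(inputs, weights) >= threshold else 0

    def leaky_integrate_and_fire(current, dt=1.0, tau=10.0, v_thresh=1.0):
        # Membrane potential v leaks toward 0 and integrates the input current;
        # a spike is recorded and v reset whenever v crosses v_thresh.
        v, spikes = 0.0, []
        for t, i_t in enumerate(current):
            v += dt * (-v / tau + i_t)
            if v >= v_thresh:
                spikes.append(t)
                v = 0.0
        return spikes

    # An AND gate as a McCulloch-Pitts unit; a constant current drives regular spiking.
    print(mcculloch_pitts([1, 1], [1, 1], threshold=2))   # -> 1
    print(leaky_integrate_and_fire(np.full(100, 0.15)))   # regular spike times

With a constant input, the integrate-and-fire unit produces a regular spike train whose rate grows with the input but eventually saturates, echoing the dynamic-range point made above.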


Although Anderson presents some remarks about the history of the development of artificial neural networks, this is done in a rather sparse way throughout the book, so that readers who are particularly interested in this topic will have some difficulty bringing such information together.

Chapter 3 starts with a concise and clear discussion of data representation, more specifically of the concentrated versus distributed paradigms; that is, whether a piece of information is represented by a specialized neuron (dedicated to the recognition of that information) or by a pattern of unit activities (i.e. a set of units that are activated in order to signal that information). This is a topic whose importance is reflected throughout the book. Subsequently, this chapter proceeds by introducing some linear algebra concepts, as well as the first Pascal program fragments. Indeed, one of the tendencies underlying Anderson's approach is the presentation of the theory and models, followed by examples using programs written in Pascal. Although such a practical approach is highly praiseworthy, it should be noted that the implementation of the basic examples, whose main goal is to gain insight into the behavior of the models, could alternatively be done in a much more straightforward way by using the script languages provided by mathematical software such as Matlab or Mathematica. In fact, it has become quite common practice in many research laboratories to preliminarily assess new ideas or concepts using such software, before the final algorithm is more efficiently coded in a programming language such as C or Pascal. Such mathematical software has many advantages over the approach adopted by Anderson, including ready-to-use tools for manipulating matrices and many other mathematical functions (e.g. FFT, statistical analysis, signal processing, dynamical systems, and so on), as well as nice and comprehensive graphical I/O interfaces. It should also be noted that some of the bibliographical items that could be quoted in this chapter are missing. Nevertheless, these shortcomings are by far compensated by the intertwined presentation of some introductory linear algebra conjointly with the associated fragments of Pascal programs.

The process of sensory transduction, i.e. how external stimuli are converted into neuronal activation, is presented in Chapter 4, which also addresses lateral inhibition and the winner-take-all model. The reader is introduced to neuron discharge recordings and the importance of such experiments in understanding neural systems. The do-it-yourself experiment on recording neuronal discharge in a cockroach (not recommended to the fainthearted) and the short introduction to the behavior of Limulus polyphemus are particularly interesting. Lateral inhibition is introduced in a comprehensible way during the detailed explanation of the lateral eye of the Limulus. This chapter also presents the first experiment on algorithmic modeling, which includes some appropriate advice on good programming practice. Notwithstanding the right approach to programming presented in this chapter, it does not indicate a suitable basic bibliographical reference, e.g. [3], which could complement this issue.
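The lateral inhibition mechanism that Anderson illustrates with the Limulus eye is easy to prototype in precisely the scripting style recommended above. The following minimal sketch is our own (the inhibition strength and neighborhood size are assumed values, not taken from the book):

    import numpy as np

    def lateral_inhibition(stimulus, strength=0.2, radius=1):
        # Each unit's response is its own stimulus minus a fraction of its
        # neighbors' stimuli; contrast at edges is enhanced (Mach bands).
        response = stimulus.astype(float).copy()
        n = len(stimulus)
        for i in range(n):
            for j in range(max(0, i - radius), min(n, i + radius + 1)):
                if j != i:
                    response[i] -= strength * stimulus[j]
        return response

    # A step edge: the units just before and after the step under- and overshoot.
    print(lateral_inhibition(np.array([1.0, 1.0, 1.0, 5.0, 5.0, 5.0])))

A winner-take-all network can be seen as the limiting case in which the mutual inhibition is strong enough that only the most activated unit remains active.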
Chapter 5, which is basically a continuation of Chapter 3, deals with matrices, linear algebra and linear systems theory, and would also have deserved more comprehensive basic references [4]. In particular, this chapter introduces the useful concept of the outer or Cartesian product matrix, whose importance is explored in different parts of the book.

The linear associator model, introduced and discussed in Chapters 6 and 7 in terms of outer products, plays a special role in the book, mainly with respect to the derivation of the BSB neural network model. In Anderson's book, nearly all other neural network models are explained through, compared with, or developed from the linear associator, which is interesting from the perspective of a more unified and comparative treatment of the various neural models. Short- and long-term memories are briefly introduced. Chapter 6 includes a good introduction to the Hebbian learning rule in terms of matrix operations, pattern association and autoassociative systems, while Chapter 7 discusses supervised learning and associative memory. The developments presented in these chapters provide a nice example of an additional virtue of Anderson's book, namely its straightforward way of translating abstract ideas into real examples (mainly for beginners). The simulations on string processing (further explored in other chapters) adopt a distributed data representation of characters analogous to the ASCII character code. A character is therefore represented as an activity pattern of +1 and -1; e.g. the character "T" is represented as the sequence +1 +1 -1 +1 -1 +1 -1 -1. Strings are composed by appending the corresponding sequences. The experiments are carried out by creating problems from this basic string representation, such as teaching a neural network the associations between states and their respective capitals. The network is afterwards asked to retrieve the state for a given capital, and so on. Suddenly, the dialectics between the local and distributed paradigms becomes clearer...
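A minimal sketch of this scheme, assuming the +1/-1 character encoding just described (the helper names, the padding width and the normalization are our choices; Anderson's implementation is in Pascal):

    import numpy as np

    def encode(s, width=16):
        # Represent a string as a +1/-1 activity pattern, 8 bits per character,
        # analogous to the ASCII-based scheme described above.
        bits = []
        for ch in s.ljust(width):      # pad so all patterns share one length
            bits += [1 if (ord(ch) >> k) & 1 else -1 for k in range(8)]
        return np.array(bits, dtype=float)

    def decode(v):
        # Invert encode() by thresholding each component back to a bit.
        rows = (v > 0).astype(int).reshape(-1, 8)
        return ''.join(chr(sum(b << k for k, b in enumerate(r))) for r in rows).rstrip()

    # Hebbian outer-product learning: W accumulates g f^T for each pair (f, g).
    pairs = [('sacramento', 'california'), ('austin', 'texas')]
    W = np.zeros((128, 128))
    for capital, state in pairs:
        f, g = encode(capital), encode(state)
        W += np.outer(g, f) / len(f)   # normalization keeps recall bounded

    # Recall: present a capital, threshold the output back into characters.
    print(decode(W @ encode('austin')))   # -> 'texas'

As more pairs are stored, crosstalk between non-orthogonal patterns grows and recall degrades, which is the standard limitation of the linear associator.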


Chapters 8 and 9 address the perceptron and the ADALINE models, respectively, including an extensive discussion of Minsky and Papert's paradigm. This discussion is conducted according to an important perspective that is recurrent in Anderson's book, namely the assessment and comparison of the main failures and limitations of the studied models, as well as some possible directions toward solutions of the more fundamental problems. Backpropagation is explained in Chapter 9. Examples of applications in image compression and digit recognition, as well as the NETtalk network, are also presented.

Anderson has also paid great attention to an issue that is becoming very relevant in scientific dissemination: the revolution implied by the concretization of the Internet. Firstly, the many Pascal programs and data referred to throughout the book can be obtained by ftp from the MIT Press server or directly from the "CG102 WWW" home page, the server for the course on Cognitive and Neuroscience taught by the author at Brown University [5]. This kind of interaction, which has become more and more common nowadays, may range from inviting the reader to contact the author through his email or offering continuously updated errata [6], to the publication of entire on-line electronic books [7]. These aspects deserve some consideration. First, computer networks have become a fundamental repository where copies of papers (or even complete books) and bibliographical search engines can easily be found. It is now possible to complement technical papers by incorporating video sequences, sound and simulations. More and more, the ideas incorporated into papers will come from the net, which raises the problem of how such material should be properly referred to. On the other hand, information in the net is extremely volatile: a file "published" in the net can evolve or change as time goes by. Even an electronic address may change (as they often do; who hasn't seen the "Document Not Found" WWW error message yet?). So, how should one refer to information that is as volatile as this? Clearly, this represents a dilemma: on one hand, the material that can be found in the net cannot be ignored and must be referred to; on the other hand, this material may be substantially altered or even non-existent by the time the work where it is mentioned is published. Anderson has chosen to make some references to information in the net, by including an ftp address and referring to trends in email discussion lists.

Chapter 10, which is larger and more complete than the others, deals with information representation issues such as distribution versus concentration of information in data structures, as well as some related biological examples. This chapter provides a typical example of the cross-fertilization principle in cybernetics, i.e. how advances can be made by bringing together computational and biological issues. Topographical maps are explained in a comprehensive and careful way in this chapter. In particular, the topographical structure of the primate visual system is described, including some indications on the important topic of orientation specificity. In spite of presenting a satisfactory introduction to many aspects of biological vision from the information processing point of view, important topics such as the information compression implied by the (number of retinal receptors)/(number of ganglion cells) ratio (the visual front-end "bottleneck") and the organization of the receptive fields in the retina [8,2] have been overlooked. Related important mathematical concepts such as the Gabor and wavelet transforms (e.g. [8,9]), which have proven particularly relevant for the modeling of retinal and cortical receptive fields, have not been presented either.
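To make the last remark concrete, a Gabor receptive field (a sinusoidal carrier under a Gaussian envelope) takes only a few lines to build. The sketch below is our own illustration of the concept discussed in [8,9], not material from Anderson's book, and all parameter values are arbitrary assumptions:

    import numpy as np

    def gabor(size=15, wavelength=5.0, theta=0.0, sigma=3.0, phase=0.0):
        # 2D Gabor receptive field: a plane-wave carrier modulated by a Gaussian
        # envelope, a common model of cortical simple-cell receptive fields.
        half = size // 2
        y, x = np.mgrid[-half:half + 1, -half:half + 1]
        x_rot = x * np.cos(theta) + y * np.sin(theta)   # preferred-orientation axis
        envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
        carrier = np.cos(2 * np.pi * x_rot / wavelength + phase)
        return envelope * carrier

    # The filter responds most strongly (via a dot product with an image patch)
    # to gratings at orientation theta and the chosen spatial frequency.
    kernel = gabor(theta=np.pi / 4)
    print(kernel.shape)   # (15, 15)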
Chapter 11 describes possible solutions to cognitive problems (more specifically, categorization and motion perception), including the interesting description of an experiment in psychophysics regarding the representation of dot patterns, while Chapter 12 reviews Hopfield networks and Boltzmann machines, including the analysis of some basic aspects of dynamical systems and the use of simulated annealing. The relationship between neural networks and pattern recognition and the interpolation (or approximation) of functions, including implementation by radial basis functions, is addressed in Chapter 13. Some issues on the representation of information, introduced in Chapter 10, are revisited in Chapter 13 under a more pragmatic perspective. Kohonen maps are explained in Chapter 14. As frequently happens, the important issue of how to label the classes in the self-organized maps, which is usually accomplished by supervised learning, is not specifically discussed.

Chapters 15, 16 and 17 conclude the book with an extensive discussion of the brain-state-in-a-box (BSB) model, which is introduced in Chapter 15 in terms of matrix feedback-based models. This discussion is carried out in a clear and comprehensive way in terms of eigenvalue and eigenvector processing. Chapter 16 investigates the issue of pattern association from the cognitive point of view, and some examples related to the BSB are presented. The book is concluded in Chapter 17, where different aspects related to the creation of a neural network that learns arithmetic using the aforementioned BSB model, including the formation of number concepts represented by hybrid data structures, are discussed. The discussion on topographical maps is resumed in this chapter.
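The BSB dynamics that Chapter 15 analyses via eigenvalues and eigenvectors can be summarized very compactly. The following sketch is our own schematic rendering of the usual BSB update (state fed back through the weight matrix, then clipped to the unit hypercube, the "box"); the parameter values and the outer-product construction of W are assumptions for illustration, not Anderson's code:

    import numpy as np

    def bsb(W, x, alpha=0.2, steps=50):
        # Brain-state-in-a-box: repeatedly amplify the state through W and clip
        # each component to [-1, 1]; activity is driven into a corner of the box
        # along the dominant eigenvector directions of W.
        for _ in range(steps):
            x = np.clip(x + alpha * (W @ x), -1.0, 1.0)
        return x

    # Store one corner pattern in W via an outer product, then recover it
    # from a noisy starting state.
    rng = np.random.default_rng(0)
    pattern = np.array([1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0, -1.0])
    W = np.outer(pattern, pattern) / len(pattern)
    noisy = np.clip(pattern + 0.4 * rng.standard_normal(len(pattern)), -1, 1)
    print(bsb(W, noisy))   # converges to the stored corner

Because the stored pattern is here the eigenvector of W with the largest eigenvalue, feedback amplifies it until the clipping stops the growth, which is the eigenvector analysis the book develops.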


Overall, Anderson's book provides a skilful introduction to artificial neural networks, the main virtues of the book lying in its biological approach, in which theoretical models and neurophysiological and cognitive data are often discussed in an integrated way. The book is written in an uncomplicated and relatively informal way (including many well-humored remarks), resulting in a text that is particularly pleasing to read. Furthermore, the technical discussion is carried out in an almost unified way through the use of the linear associator model and linear algebra. The topics are presented in a way that alternates theoretical and practical issues, presenting fragments of Pascal programs and simulation results. The complete set of Pascal programs can be freely obtained through the Internet. The overall structure of the book is well organized, though it presents some inconsistencies (e.g. Chapter 3 introduces some mathematical concepts, Chapter 4 discusses biological issues such as lateral inhibition, and Chapter 5 resumes the mathematical discussion, now on matrix processing). The main drawback of the book is acknowledged by the author himself in the Foreword: algorithmic technical aspects are neglected in favor of biological issues. Consequently, actual implementations of the models require further references to more technical texts. However, this is a shortcoming that is largely compensated by the many nice aspects of Anderson's book, a work that is destined to become a classical reference in its area.

References

[1] J.A. Anderson, An Introduction to Neural Networks (MIT Press, Cambridge, MA, 1995).
[2] D.H. Hubel, Eye, Brain and Vision (Scientific American Library, New York, 1988).
[3] J.P. Tremblay and R.B. Bunt, An Introduction to Computer Science: An Algorithmic Approach (McGraw-Hill, New York, 1979).
[4] A. Papoulis, The Fourier Integral and its Applications (McGraw-Hill, New York, 1962).
[5] http://maigtet.cog.brown.edu/102/102.html
[6] R. Jain, R. Kasturi and B.G. Schunck, Machine Vision (McGraw-Hill, New York, 1995); on-line information can be found at http://machine_vision.cse.psu.edu/
[7] J. Sirosh, R. Miikkulainen and Y. Choe, eds., Lateral Interactions in the Cortex: Structure and Function (The UTCS Neural Networks Research Group, Austin, TX, 1996). Electronic book, ISBN 0-9647060-0-8, http://www.cs.utexas.edu/users/nn/web-pubs/htmlbook96.
[8] R.L. De Valois and K.K. De Valois, Spatial Vision (Oxford Science Publications, 1990).
[9] L. da F. Costa, Opinions on wavelet receptive field model, http://vision.arc.nasa.gov/VisionScience/mail/cvnet/1995/0173.html, 1995.

Roberto Marcondes Cesar Jr.
Cybernetic Vision Research Group
IFSC, University of São Paulo
Caixa Postal 369, SP, 13560-970, Brazil

Luciano da Fontoura Costa
Institut de Physique Théorique, Université Catholique de Louvain
2, chemin du Cyclotron
B-1348 Louvain-la-Neuve, Belgium
email: [email protected]