An undergraduate course in artificial neural networks


Computers Educ. Vol. 16, No. 2, pp. 153-156, 1991. Printed in Great Britain. All rights reserved.
0360-1315/91 $3.00 + 0.00. Copyright © 1991 Pergamon Press plc

AN UNDERGRADUATE COURSE IN ARTIFICIAL NEURAL NETWORKS

F. LAYNE WALLACE and SUSAN R. WALLACE

University of North Florida, College of CIS, 4567 St. Johns Bluff Rd, S., Jacksonville, FL 32216, U.S.A.

(Received 15 July 1990; accepted 15 September 1990)

INTRODUCTION

Recent interest in artificial neural networks has reached a new high. While theory concerning these networks has been under study since the early 1940s, practical application of a widespread nature has only recently come into its own. A number of new companies, touting software and hardware for artificial neural networks, have carved their own niche in the computer marketplace. This widespread popular interest has stimulated an interest at the higher educational level. Most courses dealing with artificial neural networks are offered at the graduate level. This paper attempts to show how an introduction to artificial neural networks can be implemented at the undergraduate level and gives the results of implementing such an undergraduate course. This type of course would apply as an elective for Computer Science majors under the ACM's Curriculum '78 and as a supplemental course to IS7 under the 1982 guidelines from the ACM Curriculum Committee on Information Systems. Information Systems majors could use this course as an elective under DPMA CIS/86, and it could be adapted to CIS/86-12.

The field of artificial neural networks has enjoyed both enthusiastic popularity and an almost total lack of interest over the last 50 years. Part of the reason for this dichotomy is that the field of artificial neural networks takes its substance from a variety of disciplines: mathematics, biology, psychology and computer science. Each of these separate disciplines tends to resist attempts to use information from the other disciplines. To cover such diverse material in the undergraduate setting, the course can be divided into three main topic areas: (A) Background tools and history; (B) Overview of training algorithms and architectures; and (C) Applied and commercial artificial neural networks.

A course of this type allows advanced undergraduate students the unique opportunity to gain firsthand knowledge of a field of active research. Many of the questions that arise have no set answers, and the student projects may not have been attempted before. Recent journal articles must be used as supplemental material to a basic text to present an accurate and current overview of the subject matter.

BACKGROUND TOOLS AND HISTORY

There are two basic tools needed as background for an introduction to artificial neural networks. A grounding in basic mathematics, with an emphasis on algebra and matrix theory, is necessary: most of the training mechanisms associated with artificial neural networks can be viewed from a matrix point of view. The other background tool is information about the physiology of typical brain structures, from the neuronal level to the cortico-functional level. Not only does this provide the student with a basis for understanding past research, but it also allows the student to develop new uses and modifications based on brain physiology research.

Developments in both of these fields of study have provided the bulk of the history of artificial neural networks. The mathematical neuronal model developed by McCulloch and Pitts [1] is still the basis for most artificial neural net implementations. At the functional level, much of the work in modeling vision and, more recently, the hippocampus was directed by findings in neurophysiology. Hebb's theory of learning [2] has provided the basis for altering the strength between artificial neurons and, therefore, the ability to store information in the net as a whole. Rosenblatt [3] developed the most popular implementation of artificial neural nets, the perceptron, during the late 1950s and early 1960s. This work was almost buried by the scathing, and somewhat misdirected, writings of Minsky and Papert [4]. In addition, time should be taken to acquaint students with basic terminology, such as whether a "two-layer network" refers to two neurode layers (input and output) or to the number of layers of connections. The concept of supervised and unsupervised training is also necessary.
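As a concrete illustration of the matrix point of view mentioned above, a whole layer of McCulloch-Pitts threshold neurons can be computed as a single matrix-vector product followed by an all-or-none threshold. The listing below is our own illustrative sketch, not material from the original course; the weights, thresholds and logic-gate interpretation are arbitrary choices.

    import numpy as np

    def mp_layer(weights, thresholds, inputs):
        """One layer of McCulloch-Pitts neurons expressed as a matrix operation.
        weights    : (n_neurons, n_inputs) connection matrix
        thresholds : length n_neurons vector of firing thresholds
        inputs     : binary input vector
        Returns a binary output vector (all-or-none firing)."""
        net = weights @ inputs                    # weighted sums for every neuron at once
        return (net >= thresholds).astype(int)    # fire only at or above threshold

    # Example: one neuron computing AND, one computing OR, of two binary inputs
    W = np.array([[1.0, 1.0],
                  [1.0, 1.0]])
    theta = np.array([2.0, 1.0])                  # the AND neuron needs both inputs on
    for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
        print(x, mp_layer(W, theta, np.array(x)))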

Overview of training algorithms and architectures

The training algorithms and associated architectures presented in an undergraduate class can easily be presented in a historical context. A description of the initial network should introduce students to the typical computer implementation of the McCulloch-Pitts neuron [1] and to a simple Hebbian learning simulation. This network works well as a classifier when the input data are not correlated. The students can be shown that, when the data become less distinct, the network fails to differentially classify the input data sets. This also gives an opportunity to show how data representation plays a crucial part in computing.

The next training algorithm would logically be the perceptron [3]. From a historical perspective, the perceptron was the most widely used of the early neural network simulations. From an educational perspective, the perceptron is the direct ancestor of the most widely used artificial neural networks commercially available today. The perceptron should be presented with the Delta rule [5] to determine how much to alter the connections between neurons. Two-layer perceptrons can be shown to have no more functional power than a one-layer perceptron [4].

The next artificial neural network to be examined should be a two-layer network trained by backpropagation [6]. The backpropagation net is excellent for showing how the concerns of Minsky and Papert could be answered, particularly in the case of linear separability. Multilayer networks should be discussed in terms of training techniques. Applications of backpropagation that can be discussed in the undergraduate classroom include pattern associators, prediction systems, data compression systems and classifiers. Other topics dealing with backpropagation include local minima, momentum and the tradeoff between rapid training and the ability to generalize.

There are two basic ways of representing artificial neurons: in matrix form or as individual neurons in a record format. Class discussion and detailed examples are necessary to properly convey the subtle differences between the two methods. Traditional statistical techniques, such as linear regression and discriminant function analysis, are good alternative techniques to artificial neural network implementations for the same problem. Advantages and disadvantages of each should be demonstrated by using classroom examples.

A technique by Kohonen [7] provides an excellent example of unsupervised training. This technique may be joined with a supervised training algorithm to produce a network known as counterpropagation [8]. The advantages of counterpropagation over backpropagation can easily be seen by use of examples. Kohonen modules give the additional benefit of extracting features from the input data which are not biased by predispositions of the programmer. These features can be studied by examining the output from the Kohonen layer. A good example for this is an expert system using medical symptoms to classify specific diseases. Other unsupervised training methods which are easy to demonstrate to undergraduate students include Boltzmann machines [6] and Cauchy training [9].

Recurrent networks and continuous, real-time nets can be represented with Hopfield nets [10]. Associative memories [11] are excellent for classroom exercises concerning models of human long-term memory. A discussion of Fukushima's Cognitron [12] and Neocognitron [13] is desirable, but time constraints will make any attempt at implementation difficult, at best.
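As an indication of how compact a classroom associative-memory exercise can be, the following sketch stores two bipolar patterns in a Hopfield-style net with Hebbian (outer-product) weights and recalls one of them from a corrupted cue. It is a minimal illustration of the general technique, not code from the course; the pattern contents and number of update sweeps are arbitrary.

    import numpy as np

    def train_hopfield(patterns):
        """Hebbian (outer-product) storage of +1/-1 patterns; zero diagonal."""
        n = patterns.shape[1]
        W = np.zeros((n, n))
        for p in patterns:
            W += np.outer(p, p)
        np.fill_diagonal(W, 0)
        return W / patterns.shape[0]

    def recall(W, cue, sweeps=5):
        """Asynchronous threshold updates until the state (usually) settles."""
        state = cue.copy()
        for _ in range(sweeps):
            for i in np.random.permutation(len(state)):
                state[i] = 1 if W[i] @ state >= 0 else -1
        return state

    # Two stored patterns of length 8, then a cue made from the first with one bit flipped
    patterns = np.array([[ 1,  1,  1, -1, -1, -1,  1, -1],
                         [-1,  1, -1,  1, -1,  1, -1,  1]])
    W = train_hopfield(patterns)
    noisy = patterns[0].copy()
    noisy[0] = -noisy[0]
    print(recall(W, noisy))        # settles back to patterns[0]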
The most flexible network to be presented at the undergraduate level is an implementation of Carpenter and Grossberg's Adaptive Resonance Theory [14]. The ART1 provides students with a mechanism to study unsupervised models of human behavior. It has the ability to learn new patterns and adapt old ones as execution continues. The ART1 acts as a classification system, which leads to an abundance of possible applications for commercial settings and neural modeling research. A discussion of the ART2 is beneficial to show how a powerful artificial neural network can be adapted to account for new findings in the literature concerning brain function [15].
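The flavor of ART1's classification behavior can be previewed with a heavily simplified sketch that keeps only the choice, vigilance and fast-learning template-update steps of the full algorithm. This is our own reduced illustration rather than Carpenter and Grossberg's ART1 in full detail; the vigilance value, choice parameter and toy patterns are placeholders.

    import numpy as np

    def art1_cluster(inputs, rho=0.7, L=2.0):
        """Much-simplified ART1-style fast-learning clustering of binary vectors.
        rho is the vigilance parameter; L is the choice parameter."""
        prototypes = []                       # top-down templates (binary vectors)
        labels = []
        for I in inputs:
            I = np.asarray(I)
            # Rank existing categories by their bottom-up choice value
            order = sorted(range(len(prototypes)),
                           key=lambda j: -np.sum(I * prototypes[j]) * L
                                          / (L - 1.0 + prototypes[j].sum()))
            chosen = None
            for j in order:
                match = np.sum(I * prototypes[j]) / max(I.sum(), 1)
                if match >= rho:              # vigilance test passed: resonance
                    prototypes[j] = I * prototypes[j]   # template shrinks toward I
                    chosen = j
                    break                     # otherwise reset and try the next category
            if chosen is None:                # no category matched: recruit a new one
                prototypes.append(I.copy())
                chosen = len(prototypes) - 1
            labels.append(chosen)
        return labels, prototypes

    # Tiny demonstration with three 8-bit patterns
    data = [[1, 1, 1, 0, 0, 0, 0, 0],
            [1, 1, 0, 0, 0, 0, 0, 0],    # close to the first pattern
            [0, 0, 0, 0, 1, 1, 1, 1]]    # very different: gets its own category
    print(art1_cluster(data, rho=0.6)[0])    # prints [0, 0, 1]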


Applied and commercial artificial neural networks

Commercially available neural networks can be evaluated regarding applicability, ease of use and implementation. These networks can be divided into two groups: software which runs on general-purpose computers, and hardware which uses specialized software or languages specifically designed for artificial neural networks. While not a true commercial system, the package accompanying the Parallel Distributed Processing volumes [16] is useful to show the flexibility of artificial neural networks. A commercial pattern associator, such as that produced by Nestor [17], gives students a good idea of the ability provided for practical applications. In addition, specialized hardware and artificial neural network languages, such as that provided by ANZA [18], give the students an alternative view to that of implementation by a general programming language on a general-purpose computer.

Course implementation

The University of North Florida is a small (7500 students) institution with a College of Computer and Information Sciences housing both Computer Science and Information Systems/Science. An artificial neural networks course was offered as a special topics course for three credit hours, with 18 junior and senior undergraduates and 5 graduate students enrolled. The background of the students varied from theoretical computer science (CS majors) to applications development (IS majors), but all students were expected to have successfully completed a data structures course. The text for the course was Wasserman's Neural Computing [19]. The semester was 14 weeks long with a final exam after the 14 weeks. The class met two days a week for 1 h and 15 min. Three tests were given, including the noncomprehensive final exam.

The students were required to complete three programming assignments using the programming language and computer system of their choice. The students chose C, Pascal, BASIC and COBOL on microcomputers, minicomputers and a mainframe. The three programming assignments used a pattern recognition task where the programs were required to recognize the characters 0 through 9 in a 5 by 7 grid. All programming projects were required to be able to recognize the data used in training the networks and to try several sets of "dirty" data to see if the networks could generalize. The first assignment used a simple perceptron, which demonstrated the need for uncorrelated data but still showed the power that even a simple artificial neural network possessed. Programming assignment 2 used the same data set as programming assignment 1, but the students were required to develop a backpropagation network. The major points of this exercise were to show that two-layer networks with propagated error correction could handle the earlier data set and could generalize to a "fuzzier" one. As a side note, several students used their backpropagation networks on other problems outside of class, such as predicting the scores of sporting events and as data classifiers. The third programming assignment used the same data set as assignments 1 and 2 but used a counterpropagation network. This demonstrated the usefulness of a Kohonen feature extractor and also showed a rudimentary hybrid network composed of two established networks.
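To suggest what the first programming assignment involves, the sketch below trains one Delta-rule perceptron output per digit on 5 by 7 bitmaps and then tests recognition on a "dirty" copy. It is a reconstruction in the spirit of the assignment rather than the students' or instructors' actual code, and it defines only two of the ten characters as placeholders.

    import numpy as np

    rng = np.random.default_rng(0)

    # 5x7 bitmaps for two of the ten digits (placeholders; a full assignment defines 0-9)
    DIGITS = {
        "0": ["01110", "10001", "10001", "10001", "10001", "10001", "01110"],
        "1": ["00100", "01100", "00100", "00100", "00100", "00100", "01110"],
    }

    def to_vector(rows):
        # Flatten a 7-row, 5-column character grid into a 35-element 0/1 vector
        return np.array([int(c) for row in rows for c in row], dtype=float)

    X = np.stack([to_vector(r) for r in DIGITS.values()])   # shape (n_digits, 35)
    T = np.eye(len(DIGITS))                                  # one-hot targets

    # One perceptron output per digit, trained with the Delta rule: dw = eta * (t - y) * x
    W = np.zeros((len(DIGITS), 36))                          # 35 pixel weights + 1 bias weight
    eta = 0.2
    for _ in range(50):
        for x, t in zip(X, T):
            xb = np.append(x, 1.0)                           # append the bias input
            y = (W @ xb > 0).astype(float)                   # threshold outputs
            W += eta * np.outer(t - y, xb)                   # Delta-rule weight change

    def classify(x):
        return list(DIGITS)[int(np.argmax(W @ np.append(x, 1.0)))]

    # The nets must recognize the training data and are then tried on "dirty" data
    noisy = X[0].copy()
    flip = rng.choice(35, size=3, replace=False)             # corrupt three pixels
    noisy[flip] = 1 - noisy[flip]
    print(classify(X[0]), classify(X[1]), classify(noisy))   # expected: 0 1 0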
The graduate students were required to complete an additional graduate project and were given separate tests from the undergraduates. Graduate students were also required to make a class presentation of their projects.

The course content was designed around the topics listed above. A suggested course syllabus is provided in Appendix A. Sample programs for each of the networks detailed in class were prepared for the students by the instructors. These programs differed from those available from journal articles in that they were designed to be simple to understand. Therefore, many of the sample programs lack the power of the ones from journal articles but allowed the students to grasp the basic concepts. Additional class lecture material was taken from the series of articles by Maureen Caudill in AI Expert from December 1987 to May 1989 (much of this material has been gathered into a book [20]). The students were also directed to articles in popular publications such as Byte and Dr Dobb's Journal.

The authors of this paper taught the course as a team. Susan Wallace has an extensive mathematical educational background, while Layne Wallace was trained in physiological psychology. Both have been teaching computer and information sciences at the university level for over eight years.


REFERENCES

1. McCulloch W. S. and Pitts W., Bull. math. Biophys. 5 (1943).
2. Hebb D. O., Organization of Behavior. Wiley, New York (1949).
3. Rosenblatt F., Principles of Neurodynamics. Spartan Books, New York (1959).
4. Minsky M. L. and Papert S., Perceptrons. MIT Press, Cambridge, MA (1969).
5. Widrow B. and Hoff M. E., In 1960 IRE WESCON Convention Record. Institute of Radio Engineers, New York (1960).
6. Rumelhart D. E. and McClelland J. L., Parallel Distributed Processing: Explorations in the Microstructure of Cognition. MIT Press, Cambridge, MA (1986).
7. Kohonen T., Self-Organization and Associative Memory. Springer-Verlag, Berlin (1984).
8. Hecht-Nielsen R., Applied Optics 26, 23 (1987).
9. Szu H. and Hartley R., Phys. Lett. 122, 3-4 (1987).
10. Hopfield J. J., Proc. natn Acad. Sci. U.S.A. 79 (1982).
11. Kosko B., IEEE Trans. Syst. Man Cybern. 18, 1 (1987).
12. Fukushima K., Biol. Cybern. 20 (1975).
13. Fukushima K., Pattern Recognition 15, 6 (1982).
14. Carpenter G. A. and Grossberg S., Computer Vision Graphics Image Process. 37 (1987).
15. Carpenter G. A. and Grossberg S., Appl. Optics 26 (1987).
16. McClelland J. L. and Rumelhart D. E., Explorations in Parallel Distributed Processing: A Handbook of Models, Programs, and Exercises. MIT Press, Cambridge, MA (1988).
17. Nestor Inc., Nestor Development System. Nestor, Providence (1990).
18. HNC Inc., ANZA Plus. HNC, San Diego (1990).
19. Wasserman P. D., Neural Computing: Theory and Practice. Van Nostrand Reinhold, New York (1989).
20. Caudill M., Naturally Intelligent Systems. MIT Press, Cambridge, MA (1990).

APPENDIX A

Introduction to artificial neural networks syllabus

This course is designed to give the student an understanding of the history and current trends in the field of artificial neural networks. Specific topics to be covered include a brief overview of neurophysiology, early neural network models, current architectures and training strategies, applied and commercial systems, and current research trends. Programming examples will be presented in class. Students are expected to have completed an introductory programming course and a course in computer data structures. Programming languages and computer systems to be used for programming assignments are left to the student.

Text: Wasserman P., Neural Computing: Theory and Practice. Van Nostrand-Reinhold, New York (1989).

Grading:

Tests (2 plus final)      15%
Homework & programs       25%

Tentative schedule

Week 1: Introduction, overview, history
Week 2: Review of vector and matrix manipulations
Week 3: The physiology of the cerebral cortex
Week 4: Single layer networks, grandmother neurons
Week 5: Pattern associators, the perceptron
Week 6: The perceptron and linear separability
**** TEST I ****
Week 7: Two layer networks, backpropagation
Week 8: Backpropagation
Week 9: Counterpropagation
Week 10: Comparing statistics to neural networks
Week 11: Bi-directional Associative Memories
**** TEST II ****
Week 12: Boltzmann machines
Week 13: Hopfield nets, cognitron, neocognitron
Week 14: Adaptive Resonance Theory
Week 15: Commercial applications
**** FINAL ****