BioSystems, 23 (1989) 171--173 Elsevier Scientific Publishers Ireland Ltd.
Sixty million connections per second

Harold M. Hastings*

Department of Mathematics, Hofstra University, Hempstead, NY 11550 (U.S.A.)

The purpose of this note is to report processing in a feedforward neural network at a peak speed of 59.9 million connections per second. The network algorithm was coded in Fortran and executed on a CRAY XMP-1, a high-speed general-purpose vector machine.
Keywords: Feedforward network; Back propagation; Trained networks.

*Consultant to Corporate Research Center, Grumman Aerospace, Bethpage, NY 11714.
Introduction

Feedforward neural networks (cf. Rumelhart et al., 1986) are a natural outgrowth of perceptrons (Rosenblatt, 1960) and are one of the standard artificial neural network models. The artificial neurons are soft-limiting threshold devices with a sigmoidal activation function, usually of the form

activation = 1/(1 + exp(-net)).

The activation function transforms the combined input (net) entering a neuron at a particular time into a single output (activation). There are several layers of these neurons: an input layer, one or more hidden layers, and an output layer. Signals flow forward from the input layer, through the hidden layers, to the output layer. The information in the network is stored in the weights of connections between layers. The input, net, to a neuron in the hidden or output layers is the weighted sum of activations of neurons in earlier layers, possibly with an additional bias. Since Marr (1982) argues that the response time for visual recognition precludes significant feedback between brain regions, fully trained feedforward networks may provide a reasonable model for early visual processing.

Back propagation (Rumelhart et al., 1986) is a standard and effective training scheme for feedforward neural networks. Back propagation assigns a credit to each neuron for its role in the output and uses this assignment to modify weights in response to errors. Back propagation is artificial, in that we know of no relevant biological mechanisms.
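For concreteness, here is a minimal sketch of the forward pass just described (illustrative Python with NumPy rather than the paper's Fortran; the layer sizes and random weights are our assumptions, not those of the network reported below):

    import numpy as np

    def sigmoid(net):
        # Standard soft-limiting activation: maps any net input into (0, 1).
        return 1.0 / (1.0 + np.exp(-net))

    def forward(activation, layers):
        # Propagate an input vector through a list of (weights, bias) layers.
        for weights, bias in layers:
            # net for each neuron: weighted sum of earlier-layer activations,
            # plus an additional bias.
            net = weights @ activation + bias
            activation = sigmoid(net)
        return activation

    # Illustrative three-layer net: 4 inputs, 5 hidden neurons, 2 outputs.
    rng = np.random.default_rng(0)
    layers = [(rng.standard_normal((5, 4)), rng.standard_normal(5)),
              (rng.standard_normal((2, 5)), rng.standard_normal(2))]
    print(forward(np.array([1.0, 0.0, 0.5, -0.5]), layers))

All of the stored information sits in the weight and bias arrays; training schemes such as back propagation only adjust these arrays.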
Methodology and results

The speed results in part from replacing the standard sigmoidal activation function described above by an effective but simplified sigmoidal function of the form

activation = 0.5 + net/(2 + 2*abs(net)),

and in part from efficient parallelization. In addition to the above-cited performance of a fully trained net, a three-layer network achieved 13.2 million connections per second with a back propagation learning algorithm to be described more fully by Eilbert et al. (in preparation). This algorithm minimizes an energy redefined in terms of a combination of the L4 and L∞ metrics, rather than the standard L2 metric, to significantly enhance learning. It reduces the number of cycles needed to solve a given problem and increases the probability of finding a global energy minimum, at a slight computational cost.
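The gain from the simplified function is easy to see in code: it needs only an absolute value, two adds, and a divide per neuron, with no exponential, while remaining sigmoidal (monotone, bounded in (0, 1), and equal to 0.5 at net = 0). A minimal sketch (illustrative Python; the reported implementation was vectorized Fortran on the CRAY):

    import numpy as np

    def fast_sigmoid(net):
        # Simplified sigmoidal activation: 0.5 + net/(2 + 2|net|).
        # No exp call, so it vectorizes cheaply on a vector machine.
        return 0.5 + net / (2.0 + 2.0 * np.abs(net))

    net = np.linspace(-5.0, 5.0, 5)
    print(fast_sigmoid(net))            # 0.083, 0.143, 0.5, 0.857, 0.917
    print(1.0 / (1.0 + np.exp(-net)))   # standard sigmoid, for comparison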
The fully trained net, with learning turned off, ran at 59.9 million connections/s. This processing speed represents a computation rate of 149 megaflops, or approximately 65% of the peak speed of the CRAY. Significantly faster results should be easily obtainable on general purpose array or vector machines such as the NASA MPP, the DAP, or the Intel Hypercube.
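As a back-of-envelope check, the two reported rates together imply roughly 2.5 floating point operations per connection, consistent with one multiply and one add per connection plus activation overhead (our inference, not a figure from the measurements):

    megaflops = 149.0e6             # reported computation rate
    connections_per_s = 59.9e6      # reported peak processing rate
    print(megaflops / connections_per_s)   # ~2.49 flops per connection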
Discussion

We first compare the above speeds with those achieved by Pomerleau et al. (1988) using 10 cascaded custom WARP processors. Pomerleau et al. loaded several parallel copies of the net onto the WARP so that they could simultaneously calculate outputs for several inputs. One learning cycle was defined as parallel feedforward and feedback passes on as many inputs as fit onto the WARP simultaneously. When inputs and weights were preloaded, the rate achieved on this WARP configuration was 17 million connections per second. Our comparable rate of 13.2 million was achieved over an entire sequence of training trials. Their rate without learning was 42.5 million, over a single learning trial, compared to our 59.9 million connections per second.

For an informal comparison, a fly has about 60,000 neurons (cf. Yates, 1988). If there is 1% connectivity, a rate of 10 cycles per second for active connections, and 10% of the connections are active at any time, the realized processing rate is 36 million connections per second; the arithmetic is sketched below. However, it must be emphasized that actual biological neurons and processing are much more complex than the threshold devices and processing described above (Yates, 1988).
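The arithmetic behind the fly estimate (a sketch; the connectivity and activity figures are the informal assumptions stated above):

    neurons = 60_000
    connectivity = 0.01       # each neuron connects to ~1% of the others
    active_fraction = 0.10    # ~10% of connections active at any time
    cycles_per_s = 10         # rate for active connections

    connections = neurons * neurons * connectivity        # 3.6e7 connections
    rate = connections * active_fraction * cycles_per_s   # 3.6e7 per second
    print(f"{rate:.2e} connections per second")           # ~36 million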
The above results should readily extend to a variety of artificial neural networks with Hebbian learning schemes (Hebb, 1949; cf. McClelland et al., 1986), which have similar complexity to back propagation; the sketch below illustrates the per-connection cost.
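To see why the complexity is similar, note that a Hebbian update, like a back propagation update, does a constant amount of work per connection per training step. A minimal sketch of a plain Hebbian rule (our illustration only; the algorithm of Eilbert et al. is not shown here):

    import numpy as np

    def hebbian_update(weights, pre, post, rate=0.01):
        # Plain Hebbian rule: strengthen each connection in proportion to
        # the correlation of its pre- and post-synaptic activations.
        # One multiply-add per connection, the same order of work as a
        # back propagation weight update.
        return weights + rate * np.outer(post, pre)

    pre = np.array([1.0, 0.0, 0.5])     # activations of the earlier layer
    post = np.array([0.8, 0.2])         # activations of the later layer
    print(hebbian_update(np.zeros((2, 3)), pre, post))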
In summary, fully trained artificial neural networks on fast general purpose vector or parallel computers may now achieve potentially useful processing speeds. It remains to find appropriate architectures and learning schemes.

Acknowledgements

We thank David Englund and Robert Waffenschmidt for helpful conversations. This research was performed at the Grumman Corporate Research Center, whose support we gratefully acknowledge.
References

Hebb, D.O., 1949, The Organization of Behavior (John Wiley, New York).

Pomerleau, D.A., Gusciora, G.L., Touretzky, D.S. and Kung, H.T., 1988, Neural net simulation at Warp speed, in: Proc. IEEE International Conference on Neural Networks, San Diego, CA, pp. II-143--II-150.

Rumelhart, D.E., Hinton, G.E. and McClelland, J.L., 1986, The appeal of parallel distributed processing, in: Parallel Distributed Processing, D.E. Rumelhart and J.L. McClelland (eds.) (MIT Press, Cambridge, MA) pp. 45--76.

Rumelhart, D.E., Hinton, G.E. and Williams, R.J., 1986, Learning internal representations by error propagation, in: Parallel Distributed Processing, D.E. Rumelhart and J.L. McClelland (eds.) (MIT Press, Cambridge, MA) pp. 318--362.

Yates, F.E., 1988, Evolutionary computing by dynamics in living organisms, in: Advances in Cognitive Science, AAAS Selected Symposia, Vol. 104, M.F. Kochen and H.M. Hastings (eds.) (Westview, Boulder, CO) pp. 26--49.