Hardware implementation of neural networks in Japan


Yuzo Hirai
Institute of Information Sciences and Electronics, University of Tsukuba, Ibaraki 305, Japan

Received 29 November 1991
Revised 21 August 1992

Abstract

Hirai, Y., Hardware implementation of neural networks in Japan, Neurocomputing 5 (1993) 3-16.

In this paper, research activities on hardware implementation of neural networks in Japan are reviewed. In Japan, digital, analog, and optoelectronic technologies have been applied to neural network hardware. Among them, the digital approach is the most prevalent. Several types of on-chip BP learning have been implemented in digital hardware, and a learning speed of 2.3 GCUPS (Giga Connection Updates Per Second) has already been attained. Most of the largest Japanese electronics companies have developed this kind of system and have run various neural networks on it. Although the digital approach is the most widespread, intensive research on analog and optoelectronic approaches is also carried out. For the analog approach, a 28 GCUPS on-chip learning speed and a 1 TCPS (Tera Connections Per Second) processing speed for a Boltzmann machine with 1 bit digital output have been obtained. For the optoelectronic approach, although the network size is small, a 640 MCUPS BP learning speed has been attained.

Keywords: Analog VLSI; digital VLSI; neural network hardware; optoelectronic technology.

1. Introduction

One of the most important differences between the Perceptron era and the recent revival of neural networks is the remarkable progress in semiconductor device technologies. It provides a powerful means to implement large neural networks in electronic or optoelectronic hardware. Since the pioneering work on analog chips for autoassociative memory and retinal circuits by the Caltech group and for associative memory by the AT&T group, a large number of experiments on hardware implementation have been carried out in the US, in Europe and in Japan. Since then, especially in the US, the analog approach has been prevailing. This comes from the fact that, as shown in Table 1, analog circuits are compact and many neurons and synapses can be packed into a single chip. In Europe and in Japan, on the contrary, digital implementation precedes the analog and optoelectronic approaches, because the advantages shown in the table can be enjoyed.


Table 1. Advantages and disadvantages of analog, optoelectronic and digital approaches.

Analog
  Advantages:    compact; high speed, but depends on precision
  Disadvantages: susceptible to process-parameter variation; susceptible to noise; lacks scalability; the precision of synaptic weights is limited; difficult to make modifiable synapses

Optoelectronics
  Advantages:    large fan-in and fan-out; modifiable synapses by SLM
  Disadvantages: opto-electronic and electro-optic transformation; susceptible to noise and parameter variation

Digital
  Advantages:    high precision; high scalability; modifiable synapses
  Disadvantages: large circuit size

The digital approach is currently the best solution for making general purpose neural networks with high precision. Most Japanese electronics companies have developed digital neural network systems, which range from one end using conventional multipliers and accumulators such as DSPs to the other end using PDM (Pulse Density Modulating) digital circuits. Recently, several chips capable of on-chip learning have been developed. One disadvantage of the digital approach is that the circuit size becomes larger than for the other approaches. Although the number of neurons that a single chip can contain depends on the precision required, the order will remain less than 10^3 digital neurons in the near future. As shown in the table, however, since digital hardware is highly scalable, many chips can be connected to make a large neural network. In this case, synchronization may limit the scalability and the speed. By using PDM digital hardware, however, the entire network can operate asynchronously and chips can be connected without limit and without supplying a common clock. Another way to circumvent this small-number problem is to use WSI (Wafer Scale Integration), and a working WSI digital neural network has actually been developed. Digital neural network systems in Japan have come close to the commercial level.

Intensive research on the analog approach has also been carried out in Japan. As shown in the table, there are many difficulties in the analog approach. Therefore, some kind of restriction must be imposed on the implementation. To attain scalability, for example, many analog chips employ a 1 bit digital output to avoid mismatching between the chips to be connected.

Although the optoelectronic approach is still at the laboratory level, active research has been carried out and several optoelectronic neural chips have been developed in Japan.

In the following sections, after introducing the basic hardware technologies used to represent neural network functions, examples of hardware implementation in Japan will be reviewed.

2. How to map neural functions to hardware

A real neuron consists of excitatory and inhibitory synapses, dendrites, a cell body and an axon. The information carried by the axon is an analog value encoded by impulse density. When an impulse comes to a synapse, it produces a positive or negative analog potential, called an excitatory postsynaptic potential (epsp) or an inhibitory postsynaptic potential (ipsp), in accordance with the type of the synapse. The magnitude of the potential depends on the transmission efficiency, or weight, of the synapse, and it decays according to its membrane


time constant. So, if another impulse arrives before the potential decays to zero, temporal summation occurs and the potential becomes larger than the previous one. In addition, if more than one impulse arrives at different synapses at the same time, spatial summation of the postsynaptic potentials occurs. When the temporally and spatially summed potential exceeds the threshold, an output impulse stream is transmitted to other neurons through the axon. These functions of a single neuron can be modeled by the following nonlinear differential equation:

\mu \cdot \frac{d y_j^*(t)}{dt} = -y_j^*(t) + \sum_i w_{ji} \cdot y_i(t) ,   (1)

where \mu is the time constant of the decay of the postsynaptic potential, y_j^*(t) is the internal potential of the jth neuron, w_{ji} is the synaptic weight from the ith neuron to the jth neuron, and y_i(t) is the output of the ith neuron. Although in a real neuron the time constant differs from synapse to synapse, to simplify the model they are assumed to be the same in this equation. The output of the jth neuron is defined by

y_j(t) = f(y_j^*(t)) ,   (2)

where f represents an output function. From the above discussion, the key components for the hardware implementation can be summarized as follows:

1. Representation of input and output.
2. Representation, storage and multiplication of synaptic weight.
3. Spatial summation.
4. Temporal summation.
5. Representation of output function.
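As a concrete illustration of the model of Eqs. (1) and (2), the following sketch integrates a small fully interconnected network by simple Euler steps. The weight matrix, time constant, ramp output function and the external drive term are arbitrary choices made for the example; they are not taken from any of the chips reviewed below.

    import numpy as np

    def simulate(W, mu=1.0, dt=0.01, steps=1000, ext=None):
        """Euler integration of  mu * dy*/dt = -y* + W y (+ ext)  with y = f(y*)."""
        n = W.shape[0]
        y_star = np.zeros(n)                 # internal potentials y_j*(t)
        ext = np.zeros(n) if ext is None else ext   # external drive (added for the demo)
        f = lambda u: np.maximum(u, 0.0)     # ramp output function (one possible f)
        trace = []
        for _ in range(steps):
            y = f(y_star)                            # Eq. (2)
            dy = (-y_star + W @ y + ext) / mu        # Eq. (1), rearranged
            y_star = y_star + dt * dy
            trace.append(y.copy())
        return np.array(trace)

    # Example: two mutually inhibiting neurons with unequal external drive
    W = np.array([[0.0, -0.8],
                  [-0.8, 0.0]])
    out = simulate(W, ext=np.array([1.0, 0.6]))
    print(out[-1])   # the more strongly driven neuron suppresses the other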

The methods for the implementation of these functions in electronic and optoelectronic circuits are summarized in Tables 2-6.

Table 2. Representation of input and output (representation - circuit technology).

Analog:
  analog voltage - buffer amp.
  pulse encoding - VCO
  1 bit digital output - comparator
Optoelectronics:
  analog voltage - buffer amp.
  light intensity - LED and photodiode
Digital:
  binary digital (bit serial transmission) - bit parallel bus, serial/parallel conversion
  pulse density - rate multiplier

3. Examples of hardware implementation

Examples of hardware implementation of neural networks in the world are summarized in Table 7, where entries marked with an asterisk (*) show those developed in Japan. It is not a complete list and many important examples may be missing. From this table we can read the following tendencies:


Table 3. Representation, storage and multiplication of synaptic weights.

Analog
  Representation: resistor array; MOS channel resistance; amplifier gain; storage capacitor; D/A converter; switched capacitor
  Storage:        fixed or digital switch array; storage device; memory for bias voltage; floating gate; refreshing capacitor with digital memory and D/A converter; digital memory; refreshing capacitor
  Multiplication: Kirchhoff's law; V-I characteristics; transconductance gain; Gilbert multiplier; multiplying D/A converter; switching frequency

Optoelectronics
  Representation: SLM; photodiode array
  Storage:        electric charge to modulate polarizer; density of transmission media; light source (CRT or LED array); memory for bias voltage
  Multiplication: difference in polarizing angles; transmissivity; gain of photodiode

Binary digital
  Storage:        digital memory
  Multiplication: digital multiplier

Pulse density
  Representation: pulse density
  Storage:        digital memory
  Multiplication: logical AND; rate multiplier

Table 4. Representation of spatial summation.

Analog:          current summation by a differential amplifier with wired-OR inputs
Optoelectronics: convergence of light by lens; current summation
Binary digital:  accumulator
Pulse density:   nonlinear summation by OR gates

Table 5. Representation of temporal summation.

Analog:          RC integrator
Optoelectronics: no example
Binary digital:  numerical solution
Pulse density:   up/down counter and rate multiplier (solution of the differential equation in integral form)

Table 6. Representation of output function (output functions - realizing methods).

Analog:          sigmoid, tanh, unit - transfer characteristics of amplifier; comparator
Optoelectronics: sigmoid, tanh - transfer characteristics of amplifier
Binary digital:  any function - table look-up
Pulse density:   tanh, ramp - nonlinear spatial summation; thresholding gate to suppress negative output

- Recently the digital approach has become very active.
- The numbers of neurons and synapses containable in analog neural chips have grown enough to apply them to specific tasks such as preprocessing for pattern recognition [7].
- On-chip learning has become available in the analog [10], binary digital [18] and PDM [24] approaches.
- The combination of analog and digital approaches attained the state-of-the-art processing speed, 1 TCPS [10].
- The digital approach attained the state-of-the-art BP learning speed, 2.3 GCUPS [18].

The above survey indicates that for special purpose use which does not require high precision, analog internal circuits with a digital output are the best way. For general purpose use which requires high precision, the digital approach is the best.

Table 7. Examples of hardware implementation of neural networks. Entries marked with an asterisk (*) were developed in Japan.

Analog
  Caltech (1986) [1]        autoassociative network; 22 neurons fully interconnected; weights with three strengths (1, 0, -1)
  AT&T (1986) [2]           associative memory; 256 neurons with fixed connections
  Caltech (1988) [3]        silicon retina by resistive networks; 48x48 neurons
  UCLA (1988) [4]           BP chip; 48 inputs and 10 neurons
  INTEL (1989) [5]          ETANN; 10,240 floating gate synapses
  * Matsushita (1990) [6]   dynamic refresh for synaptic weight capacitors; 64 neuron 3-layered network; 100 MCPS
  AT&T (1990) [7]           32,000 1 bit synapses; 256 neurons; 1 bit digital input and output; 320 GCPS
  Caltech (1990) [8]        CCD synapses; 256 neurons fully interconnected
  * Fujitsu (1990) [9]      16 bit modifiable synapse; 1 neuron per chip
  * Mitsubishi (1991) [10]  on-chip learning by digital control (Boltzmann machine); 336 neurons; 28,000 synapses; 1 bit output; 28 GCUPS
  AT&T (1991) [11]          4,096 6 bit synapses; 3 bit states; synapse refresh; 5 GCPS

Optoelectronics
  AT&T (1989) [12]          optically programmable synapses; 120 photoconductive array
  * Mitsubishi (1991) [13]  electrically programmable synapses; variable-sensitivity photodiode array; 8 neurons and 64 synapses

Digital - Binary digital
  * Hitachi (1989) [14]     WSI (Wafer Scale Integration) neural net; broadcast bus; 576 neurons/wafer; 64 synapses/neuron
  * Fujitsu (1990) [15]     256 DSPs connected by a ring bus; 587 MCUPS
  * Toshiba (1990) [16]     2 neurons per chip; ring bus
  Siemens (1990) [17]       systolic array
  * Hitachi (1991) [18]     WSI neural net with BP learning; 2.3 GCUPS

Digital - Pulse density
  Univ. of Edinburgh (1988) [19]     64 synapses; analog summation
  Tampere Univ. of Tech. (1989) [20] simulation
  * Univ. of Tsukuba (1989) [21]     84 6 bit synapses and 6 neurons per chip; 12 bit state; a working system of 54 neurons fully interconnected; asynchronous operation and high scalability
  Neural Semiconductor (1990) [22]   commercialized chip set
  Univ. Düsseldorf (1990) [23]       24 neurons and 64 synapses constitute a system
  * Ricoh (1991) [24]                on-chip BP learning; 1 neuron per chip


4. The analog approach in Japan

In this and the subsequent three sections, examples of hardware implementation in Japan are reviewed. This section begins with the analog approach.

4.1 On-chip 64 neuron three-layered network [6]

Matsushita Electronics Research Laboratory fabricated a BiCMOS analog neural chip which contained 64 neurons and three 16 × 16 synapse arrays for a fixed three-layered network. The input layer consists of 32 neurons, the hidden layer of 16 neurons and the output layer of 16 neurons. Two synaptic arrays are used for the connections between the input and the hidden layer, and one array for the connections between the hidden and the output layer. The processing speed is 100 MCPS. A synaptic weight is stored on a storage capacitor. To maintain the weight charge, a dynamic refresh technique is used in the same way as for DRAM. An off-chip refresh controller which contains 8 bit weight memory is used for that purpose. They used a bipolar-MOS analog circuit to reduce multiplication error, since mismatch is small in bipolar transistor pairs.
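The refresh scheme can be pictured with a toy behavioural model: the stored charge leaks slowly, and a controller periodically rewrites each capacitor from an 8 bit digital copy of the weight through an idealized D/A converter. The leak rate, refresh period and full-scale value below are invented for illustration and are not parameters of the Matsushita design.

    # Toy model of a dynamically refreshed analog weight (all values illustrative).
    FULL_SCALE = 1.0          # capacitor voltage corresponding to digital code 255
    LEAK_PER_STEP = 0.0005    # fractional charge lost per time step (assumed)
    REFRESH_PERIOD = 100      # steps between refreshes (assumed)

    def dac(code):            # idealized 8 bit D/A converter
        return FULL_SCALE * code / 255

    stored_code = 200                     # weight kept in off-chip digital memory
    v_cap = dac(stored_code)              # analog weight held on the storage capacitor
    for step in range(1, 1001):
        v_cap *= (1.0 - LEAK_PER_STEP)    # charge slowly leaks away
        if step % REFRESH_PERIOD == 0:
            v_cap = dac(stored_code)      # refresh controller rewrites the capacitor
    print(round(v_cap, 4))                # stays close to dac(200) despite the leak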

4.2 General purpose analog chip [9]

Most of the analog chips fabricated in the world aim at special purpose use. This reflects the current research trend of exploiting the compactness of analog circuits at the expense of precision. An alternative approach is to aim at general purpose use, and such an analog chip was fabricated by Fujitsu. It has 16 bit synaptic weights and high precision analog circuits at the expense of compactness. Although a chip contains only one neuron, a large neural network can be made by connecting chips with a broadcasting bus. A chip broadcasts its analog output to all the other chips, which multiply the input by their synaptic weights in parallel. Then the next chip is addressed and broadcasts its output, and so on. Synaptic weights are stored in external 16 bit RAM. By using a broadcasting bus, the wiring problem can be avoided.
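The broadcast-bus organization can be mimicked in a few lines: one chip per neuron, each holding its own weight row in (here simulated) external RAM, and the chips are addressed in turn, each broadcasting its output while all chips multiply and accumulate in parallel; after one full sweep every chip applies its output function. The network size and the tanh output function below are arbitrary choices for the sketch.

    import numpy as np

    class NeuronChip:
        """One neuron per chip; weights live in (simulated) external 16 bit RAM."""
        def __init__(self, weights):
            self.weights = weights          # weight from every source chip to this one
            self.acc = 0.0                  # analog accumulator
            self.output = 0.0

        def receive(self, source_id, value):
            self.acc += self.weights[source_id] * value   # multiply-accumulate

        def fire(self):
            self.output = np.tanh(self.acc)               # some output function (assumed)
            self.acc = 0.0
            return self.output

    n = 4
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.5, size=(n, n))
    chips = [NeuronChip(W[j]) for j in range(n)]
    for chip in chips:                       # arbitrary initial outputs
        chip.output = 1.0
    for j, src in enumerate(chips):          # each chip is addressed in turn and broadcasts
        for dst in chips:
            dst.receive(j, src.output)
    print([round(c.fire(), 3) for c in chips])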

4.3 28 GCUPS and 1 TCPS Boltzmann machine chip [10]

Mitsubishi Electric Corporation fabricated an analog chip for 336-neuron 28k-synapse self-learning neural networks using 1.0 μm CMOS technology. This chip is dedicated to fully connected symmetrical neural networks, especially the Boltzmann machine. There are two types of chips: one for diagonal components and the other for non-diagonal components, and they can be connected by a branch-neuron-unit architecture. They say 200 chips can be connected, provided that less than 30% of the neurons are active at the same time and the fluctuation of each neuron is less than 1%. This scalability becomes possible because of the 1 bit digital neuron output. A synaptic weight is stored on a storage capacitor with 5 bit precision. The synaptic weight is modified according to the Boltzmann machine learning rule, but in an approximate form: they used a deterministic learning rule instead of the original stochastic one. A digital circuit constituting a part of each synapse controls learning. They say that the learning speed is 28 GCUPS and the processing speed is 1 TCPS.
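The rule being approximated is the standard Boltzmann machine rule, which moves each weight by the difference between clamped and free co-activation statistics. The sketch below computes only that textbook reference rule from recorded 1 bit states; the deterministic approximation actually wired into the Mitsubishi chip is not spelled out here, so it is not shown.

    import numpy as np

    def boltzmann_update(clamped_states, free_states, eta=0.1):
        """Reference Boltzmann rule: dw_ij = eta * (<s_i s_j>_clamped - <s_i s_j>_free).

        Each argument is an array of recorded binary (0/1) state vectors,
        shape (num_samples, num_neurons)."""
        corr_clamped = clamped_states.T @ clamped_states / len(clamped_states)
        corr_free = free_states.T @ free_states / len(free_states)
        return eta * (corr_clamped - corr_free)

    rng = np.random.default_rng(1)
    clamped = rng.integers(0, 2, size=(500, 6))   # stand-in samples, illustration only
    free = rng.integers(0, 2, size=(500, 6))
    dW = boltzmann_update(clamped, free)
    print(dW.shape)   # (6, 6) matrix of weight increments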


5. The optoelectronic approach in Japan

5.1 Optical learning chip [13]

Mitsubishi Electric Corporation fabricated an optical learning chip using a variable-sensitivity photodiode (VSPD) with a metal-semiconductor-metal structure. This diode changes its photodetection sensitivity depending on the bias voltage. The polarity of the photocurrent reverses when the polarity of the bias voltage is reversed, so that modifiable excitatory and inhibitory synaptic weights can be represented by adjusting the bias voltage of a single synapse. An external memory is necessary to store the bias voltage of each synapse. A line-shaped LED array containing 8 LEDs is used for 8 input lines, and is stacked on a two-dimensional 8 × 8 VSPD array. The performance of the chip was measured by making use of a BP learning system consisting of an 8 × 8 × 3 layered network. The learning speed was 640 MCUPS.
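A very rough behavioural picture of such a synapse: the photocurrent is the product of the incident light intensity and a sensitivity set by the bias voltage, and reversing the bias reverses the current, which is what makes both excitatory and inhibitory weights possible. The linear sensitivity law and the numbers below are assumptions for illustration, not the measured device characteristic.

    def vspd_current(light_intensity, bias_voltage, gain=1.0):
        """Signed synaptic contribution of one variable-sensitivity photodiode.

        Sensitivity is modelled as proportional to the bias voltage (assumed linear),
        so the sign of the bias sets excitatory (+) or inhibitory (-) action."""
        return gain * bias_voltage * light_intensity

    # One output neuron summing an 8-input optical column (illustrative numbers)
    inputs = [0.2, 0.0, 1.0, 0.5, 0.0, 0.9, 0.1, 0.3]     # LED intensities
    biases = [0.4, -0.2, 0.8, -0.5, 0.1, 0.6, -0.9, 0.0]  # bias voltages from external memory
    total = sum(vspd_current(x, v) for x, v in zip(inputs, biases))
    print(round(total, 3))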

6. The binary digital approach in Japan

There are two approaches to digital implementation. One uses a conventional digital multiplier and accumulator with binary digital code. The other uses pulse density modulation. In this section, examples of the binary digital approach are described.

6.1 WSI digital neural network [14, 18]

One disadvantage of the digital approach is the large circuit size, so that only a small number of neurons and synapses can be contained in a single chip. One of the solutions to circumvent this problem is to use wafer scale integration. Hitachi integrated 576 neurons on a wafer [14]. They are connected with a broadcasting bus. When an output from one neuron is broadcast, it is multiplied with the synaptic weights stored in the other neurons in parallel. Each neuron contains 64 pairs of an input-neuron address and the synaptic weight to be multiplied with that input. When the address part matches the address of a neuron providing the input, the synaptic weight is multiplied with the input. Although the number of synapses contained in a single neuron is small, a neuron can potentially be connected to any other neuron by specifying its address.

Recently Hitachi extended the WSI approach to an on-wafer BP learning network [18]. They divided a network into two symmetrical subnetworks, a feed-forward net and a backprop net. The output calculated in the feed-forward net was copied to the corresponding neuron in the backprop net at the same time as it was broadcast to the other neurons in the feed-forward net. The error calculated in the backprop net was likewise copied to the feed-forward net at the same time as it was broadcast to the neurons in the hidden layers of the backprop net. Weight updating was carried out in both networks in parallel, so that the two networks have the same weights but the directions of the connections are opposite. In this way, copying weights between the two networks, which takes a relatively long time, can be overlapped with the calculation of outputs and errors. They integrated 144 neurons on a wafer. The output of a neuron has 9 bit resolution. A synaptic weight has 16 bit precision and each neuron has 64 pairs of neuron addresses and weights.


By connecting eight wafers, they have developed a 1,152 neuron BP learning system. Its performance is 2.3 GCUPS, which is the state-of-the-art BP learning speed. Several applications of neural networks are actually running on this system.
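The broadcast-with-address-matching scheme described above can be sketched as follows: each neuron holds a small table of (source address, weight) pairs, and when an output is broadcast on the bus every neuron checks whether that source address is in its table and, if so, accumulates the product. In hardware the match is done in parallel; the table sizes and numbers below are placeholders chosen only to illustrate the mechanism.

    class WsiNeuron:
        """Neuron with a small table of (source address, weight) pairs,
        as in the broadcast-bus WSI scheme described above (sizes illustrative)."""
        def __init__(self, synapse_table):
            self.table = dict(synapse_table)   # source address -> weight (up to 64 entries)
            self.acc = 0.0

        def on_broadcast(self, source_address, value):
            w = self.table.get(source_address)   # in hardware this lookup is a parallel match
            if w is not None:
                self.acc += w * value

    # Two neurons listening on the same bus, each connected to a subset of sources
    n0 = WsiNeuron({3: 0.5, 7: -1.0})
    n1 = WsiNeuron({3: 0.25, 12: 2.0})
    for addr, value in [(3, 1.0), (7, 0.5), (12, 0.8)]:   # one broadcast per source neuron
        for n in (n0, n1):
            n.on_broadcast(addr, value)
    print(n0.acc, n1.acc)    # 0.0 and 1.85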

6.2 Ring bus neural network system [15, 16]

A ring bus architecture is an alternative way to avoid wiring problems. Neural network systems of this type were developed by Fujitsu and Toshiba. Fujitsu used a DSP as the neural processing element, and a system composed of 256 processing elements has been developed. The performance for BP learning is 587 MCUPS [15]. Toshiba has been developing a custom VLSI neural processing element [16]. A chip contains 2 neurons, and they are planning to develop a system of 50 chips connected with a ring bus. They estimate that the processing speed will be 2 GCPS.
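A ring bus can be pictured as processing elements arranged in a circle, each holding the weight row of one neuron; activations are shifted one hop per cycle, and after a full revolution every element has accumulated its complete weighted sum without any global wiring. The sketch below assumes one neuron per element purely for brevity and is not a model of either the Fujitsu or the Toshiba machine.

    import numpy as np

    def ring_bus_step(W, activations):
        """One network update on a ring of processing elements.

        W[j] is the weight row held locally by element j; activation values
        circulate around the ring, one hop per cycle."""
        n = len(activations)
        acc = np.zeros(n)
        circulating = activations.copy()
        owner = np.arange(n)            # which neuron's activation each element holds now
        for _ in range(n):
            acc += W[np.arange(n), owner] * circulating   # each element uses the value it holds
            circulating = np.roll(circulating, 1)         # shift values one hop around the ring
            owner = np.roll(owner, 1)
        return np.tanh(acc)                               # some output function (assumed)

    rng = np.random.default_rng(2)
    W = rng.normal(scale=0.3, size=(5, 5))
    y = ring_bus_step(W, np.ones(5))
    print(np.allclose(np.arctanh(y), W @ np.ones(5)))     # True: same result as a direct product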

7. Pulse density modulation

One of the advantages of using pulse density modulation is that essentially a single wire is sufficient to send the output of a neuron. Another advantage is that the entire network can operate asynchronously. This asynchronous operation provides a powerful means for high scalability. Two examples of this type are described in this section. One is from our laboratory and operates asynchronously. The other is from Ricoh Corporation and operates synchronously.

7.1 PDM digital neural network system [21]

We have fabricated a PDM digital neural chip in cooperation with Hitachi Central Research Laboratory. By simply connecting 72 chips without supplying a common clock, an asynchronous neural network system of 54 neurons fully interconnected by both excitatory and inhibitory synapses has been developed. The author believes that this system was the first actually running digital neural network system in the world.

The functions of a single neuron are mapped to PDM digital circuits as follows. A single neuron consists of synaptic, dendrite and cell body circuits. A synaptic weight is represented by the transformation of the input pulse density into a density proportional to the weight. This transformation is carried out by a 6 bit rate multiplier. The output density transformed by the rate multiplier becomes

f_{\mathrm{out}} = \frac{\text{weight value}}{2^6} \cdot f_{\mathrm{input}} .   (3)

Since the weight value, which is programmable, is always smaller than 64, the synaptic weight expressible by the rate multiplier is smaller than one. This imposes a severe restriction on neural networks. We circumvented this problem by raising the input density by a factor greater than one. The factor can be set to either 1 or 2 by the program.

The output pulses from synapses are spatially summed by OR gates in a dendrite circuit, so that pulses arriving at the same time are counted as one pulse. Although linear summation cannot take place, the spatial summation characteristic shows gradual saturation similar to the positive part of the hyperbolic tangent function, as shown in Fig. 1. Figure 1(a) shows theoretical curves for the expected sum frequencies and Fig. 1(b) shows the actual data obtained by the system we have developed.
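The saturation of the OR-gate summation can be derived under the assumption that the incoming pulse streams are independent: if each of n inputs carries a pulse in a given clock slot with probability p, the OR output carries a pulse with probability 1 - (1 - p)^n, which grows almost linearly for small p and saturates for large p. The short sketch below computes such theoretical curves and checks one point by simulation; it is an idealized model, not the measured characteristic of the chip.

    import numpy as np

    def or_sum_frequency(p, n):
        """Expected OR-gate output density for n independent inputs of density p."""
        return 1.0 - (1.0 - p) ** n

    # Theoretical saturation curves for a few input counts (normalized frequencies)
    for n in (2, 4, 7):
        row = [round(or_sum_frequency(p, n), 2) for p in (0.1, 0.3, 0.5, 0.8)]
        print(n, row)

    # Monte-Carlo check of one point: 7 inputs at density 0.3
    rng = np.random.default_rng(3)
    pulses = rng.random((7, 100_000)) < 0.3        # independent pulse streams
    print(pulses.any(axis=0).mean())               # close to or_sum_frequency(0.3, 7) = 0.918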

Fig. 1. Spatial summation characteristics obtained by theoretical analysis (a) and experimental data (b). Abscissa and ordinates represent input and expected sum frequencies, respectively. They are normalized by the maximum frequency. The parameter is the number of inputs.

To integrate the spatially summed output pulses from the synaptic circuits, an up/down counter is used. Since pulse density has no polarity, excitatory pulses and inhibitory pulses are summed separately. Excitatory pulses are fed to the up input and inhibitory pulses are fed to the down input of the counter. The temporal summation, which is modeled by the first order differential equation, is carried out by solving the following equivalent integral form:

y_j^*(t) = \frac{1}{\mu} \int_0^t \left[ -y_j^*(t') + \sum_i w_{ji} \cdot y_i(t') \right] dt' + y_j^*(0) .   (4)

This equation is realized by the digital circuit shown in Fig. 2. The content of the up/down counter represents the internal potential of a neuron. The absolute value of the potential is transformed into a pulse density by a 12 bit rate multiplier. The negative feedback term appearing in the integrand of the above equation is realized by feeding the output of the rate multiplier to the up input of the counter when the internal potential is negative, and to the down input when it is positive. The output is transmitted to other neurons when the internal potential is positive, so that the output function is a ramp function with a nonlinearity imposed by the spatial summation characteristic. The time constant of the circuit is determined by

\mu = \frac{2^{12}}{f_{\mathrm{clock}}} ,   (5)

where f_{\mathrm{clock}} is the clock frequency driving the rate multiplier. Therefore, the number of states of the rate multiplier corresponds to the capacitance of an RC integrator and the clock frequency corresponds to the conductance.
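A discrete-time sketch of the cell body circuit of Fig. 2 is given below: a 12 bit up/down counter holds the internal potential, the rate multiplier turns its absolute value into a feedback pulse stream of the opposite sign, and the counter therefore relaxes with the time constant of Eq. (5). The probabilistic rate multiplier and the constant excitatory input density used here are simplifications made for illustration.

    import numpy as np

    def cell_body_step_response(input_density=0.25, clock_steps=60_000, seed=4):
        """Simulate the up/down counter with rate-multiplier negative feedback."""
        rng = np.random.default_rng(seed)
        count = 0                         # content of the 12 bit up/down counter
        trace = []
        for _ in range(clock_steps):
            if rng.random() < input_density:      # excitatory input pulse -> count up
                count += 1
            # rate multiplier: emits a feedback pulse with probability |count| / 2**12
            if rng.random() < abs(count) / 4096:
                count += 1 if count < 0 else -1   # negative feedback pulse
            count = max(-2048, min(2047, count))  # 12 bit saturation
            trace.append(count)
        return np.array(trace)

    trace = cell_body_step_response()
    # Steady state: feedback rate equals input rate, i.e. count / 4096 ~ input_density
    print(trace[-20_000:].mean(), 0.25 * 4096)    # both close to 1024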



Fig. 2. Structure of a cell body circuit.

The circuits described above have been fabricated using a 1.3 μm CMOS gate array. The circuit structure of a single chip is shown in Fig. 3. A chip contains six neurons and 84 synapses. Each neuron has seven excitatory and seven inhibitory synapses. The number of synapses can be increased by simply cascading chips. The spatially summed pulses from the dendrite circuits can be supplied directly at the output terminals by bypassing the cell body circuits. By connecting these output terminals to the input terminals of the dendrite circuits of another chip, which are shown in the upper part of the figure, the number of synapses can be increased without limit.


Fig. 3. Circuit structure of a single chip.


By connecting 72 chips as shown in Fig. 4(a), a system of 54 neurons fully interconnected by both excitatory and inhibitory synapses has been developed. There are 5,832 6 bit synapses in the system. Every neuron is driven by a different crystal oscillator, so that the system operates asynchronously. A photo of the system is shown in Fig. 4(b). It is connected to the VME bus of a workstation. The synaptic weight registers, up/down counters and control registers have unique addresses and are mapped into the memory space of the workstation.

Fig. 4. Structure of the PDM digital neural network system (a) and a photo of the system (b). The chips in the bottom row are used for synapses and cell bodies; the other chips are used only for synapses.

Step responses of a neuron are shown in Fig. 5. Since the clock frequency is 4 MHz, the time constant is 1 msec. As can be seen in the figure, the behaviour of the neurons completely follows the differential equation with a 1 msec time constant. Various types of neural networks,


including winner-takes-all, TSP, planarization, pattern matching and semantic networks, have already run on the system successfully. As an example, in Fig. 6 the performance of the system carrying out a winner-takes-all network composed of 50 neurons is shown.

Fig. 5. Step response of a single neuron (curves for f = 512, 1024 and 2047; abscissa: time in ms).

Fig. 6. The performance of the system carrying out a winner-takes-all network composed of 50 neurons (abscissa: time in msec).

7.2 PDM on-chip BP learning [24]

Recently Ricoh Corporation has developed a PDM digital neural chip capable of on-chip BP learning. A synaptic weight is also represented by a pulse density, and the multiplication of an input by a weight is carried out by a logical AND between their pulse streams. Spatial summation is carried out by OR gates in the same way as in our chip. The chip is dedicated to feed-forward networks. A chip contains one neuron.
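Multiplying two quantities encoded as pulse densities by a logical AND works because, for independent streams, the probability that both lines carry a pulse in the same clock slot is the product of the two densities. The following sketch demonstrates this and reuses the OR-gate summation of Section 7.1; the stream length and densities are arbitrary, and the synchronous random streams are only an idealization of the chip's pulse trains.

    import numpy as np

    rng = np.random.default_rng(5)
    slots = 100_000

    def pulse_stream(density):
        """Random pulse stream with the given density (probability of a pulse per slot)."""
        return rng.random(slots) < density

    x = pulse_stream(0.6)      # input encoded as pulse density 0.6
    w = pulse_stream(0.5)      # weight encoded as pulse density 0.5
    product = x & w            # multiplication by logical AND, density ~ 0.6 * 0.5
    print(product.mean())      # close to 0.30

    # Spatial summation of several synapse outputs by OR (saturating, as in Section 7.1)
    s1, s2 = pulse_stream(0.2), pulse_stream(0.3)
    summed = product | s1 | s2
    print(summed.mean())       # close to 1 - (1 - 0.3)(1 - 0.2)(1 - 0.3) = 0.608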


They trained an inverted pendulum controller using 12 chips. At the top of the pendulum, four propellers with small motors are mounted in four orthogonal directions. By adjusting the driving forces of these propellers, the pendulum can be kept standing.

8. Conclusion

Recent research activities on hardware implementation of neural networks in Japan were reviewed. Although the digital approach is prevailing in Japan, research on the analog and optoelectronic approaches is also active. Many research groups have begun to address on-chip learning.

Although it has become possible to integrate many neurons in a single chip, it is still difficult to integrate a sufficient number of neurons for a broad range of practical applications. Therefore, scalability is a necessary condition for any neural network hardware, especially hardware aiming at real time operation. However, the sizes of many neural networks exceed the potential of current technologies. Therefore, virtual technology which enables small hardware to handle larger neural networks is necessary.

For early sensory processing such as spatial filtering operations, the virtual idea may be realized relatively easily, since the kernel is fixed and its size is small. The architecture employed by the AT&T group [7] is an example. By storing the weights corresponding to the kernel and by feeding input data through shift registers, the convolution between the kernel and the data can be carried out at a speed of 100 GCPS for binary data. However, since the precision is limited, the application area is limited.

A systolic array is an inherently virtual system and pipelined computation is the key to its speed. Each processing element may contain a large amount of local memory for weights. However, the problem is that the larger the memory, the longer the loading time. For example, 6.4 × 8 × 10^8 bits of memory will take about 160 seconds to load new weights by means of an 8-bit bus with a 400 KByte per second transfer rate [17]. To make a system virtual, a high-speed bus is mandatory.

In this half decade, remarkable progress in hardware implementation of neural networks has been achieved. We hope this progress will continue and that neural networks will grow into an indispensable information processing technology.

Acknowledgement

I would like to thank the many Japanese researchers who were willing to offer the results of their research. Without their help I could not have finished this paper.

References

[1] M.A. Sivilotti, M.R. Emerling and C.A. Mead, VLSI architecture for implementation of neural networks, Proc. AIP Conf. on Neural Networks for Computing (1986) 408-413.
[2] H.P. Graf, L.D. Jackel, R.E. Howard, B. Straughn, J.S. Denker, W. Hubbard, D.M. Tennant and D. Schwartz, VLSI implementation of a neural network memory with several hundreds of neurons, Neural Networks for Computing (AIP Press, 1986).
[3] C.A. Mead and M.A. Mahowald, A silicon model of early visual processing, Neural Networks 1 (1988) 91-97.


[4] B. Furman and A.A. Abidi, An analog CMOS backward error-propagation LSI, Proc. First Annual INNS Symp. (1988).
[5] M. Holler, S. Tam, H. Castro and R. Benson, An electrically trainable artificial neural network (ETANN) with 10240 floating gate synapses, Proc. IJCNN'89, Vol. II, Washington D.C. (1989) 191-196.
[6] T. Morishita, Y. Tamura and T. Otsuki, A BiCMOS analog neural network with dynamically updated weights, Proc. IEEE ISSCC'90, TPM 9.1 (1990) 142-143.
[7] H.P. Graf and D. Henderson, A reconfigurable CMOS neural network, Proc. IEEE ISSCC'90, TPM 9.2 (1990) 144-145.
[8] A.J. Agranat, C.F. Neugebauer and A. Yariv, A CCD based neural network integrated circuit with 64K analog programmable synapses, Proc. IJCNN'90, Vol. II, San Diego (1990) 551-555.
[9] H. Kato, Y. Sugiura and S. Tsuchiya, On implementing neurocomputer systems using analog neuroprocessor chips, Technical Report of Information Processing Society of Japan, 89-ARC-78-2, 1989 (in Japanese).
[10] Y. Arima, K. Mashiko, K. Okada and T. Yamada, A 336-neuron 28k-synapse self-learning neural network chip with branch-neuron-unit architecture, Proc. IEEE ISSCC'91, TPM 11.2 (1991) 182-184.
[11] B.E. Boser and E. Säckinger, An analog neural network processor with programmable network topology, Proc. IEEE ISSCC'91, TPM 11.3 (1991) 184-185.
[12] R.C. Frye, E.A. Reitman, C.C. Wong and B.L. Chin, An investigation of adaptive learning implemented in an optically controlled neural network, Proc. IJCNN'89, Vol. II, Washington D.C. (1989) 457-463.
[13] K. Kyuma, Y. Nitta, J. Ohta, S. Tai and M. Takahashi, The first demonstration of an optical learning chip, Optical Comput. (1991) 291-294.
[14] M. Yasunaga, N. Masuda, M. Asai, M. Yamada, A. Masaki and Y. Hirai, A wafer scale integration neural network utilizing completely digital circuits, Proc. IJCNN'89, Vol. II, Washington D.C. (1989) 213-217.
[15] H. Kato, H. Yoshizawa, H. Ichiki and K. Asakawa, A parallel neurocomputer architecture towards billion connection updates per second, Proc. IJCNN'90, Vol. II, Washington D.C. (1990) 47-50.
[16] K. Shimokawa, News in Nikkei Electronics (March 5, 1990) (in Japanese).
[17] U. Ramacher and J. Beichter, Architecture of a systolic neuro-emulator, Proc. IJCNN'90, Vol. II, Washington D.C. (1990) 59-63.
[18] M. Asai, M. Yamada, N. Masuda, M. Yasunaga, M. Yagyu and K. Shibata, High-speed learning neuro-WSI, Technical Report of IEICE Japan, NC90-12, 1990, 87-92 (in Japanese).
[19] A.F. Murray and A.V.W. Smith, Asynchronous VLSI neural networks using pulse-stream arithmetics, IEEE J. Solid-State Circuits 23 (3) (1988) 688-697.
[20] J. Tomberg, T. Ritoniemi, K. Kaski and H. Tenhunen, Fully digital neural network implementation based on pulse density modulation, Proc. IEEE 1989 CICC (1989) 12.7.1-12.7.4.
[21] Y. Hirai, K. Kamada, M. Yamada and M. Ooyama, A digital neuro-chip with unlimited connectability for large scale neural networks, Proc. IJCNN'89, Vol. II, Washington D.C. (1989) 163-169.
[22] M. Stanford Tomlinson Jr., D.J. Walker and M.A. Sivilotti, A digital neural network architecture for VLSI, Proc. IJCNN'90, Vol. II, San Diego (1990) 545-550.
[23] J.R. Beerhold, M. Jansen and R. Eckmiller, Pulse-processing neural net hardware with selectable topology and adaptive weights and delays, Proc. IJCNN'90, Vol. II, San Diego (1990) 569-574.
[24] H. Eguchi, T. Furuta, H. Horiguchi, S. Oteki and T. Kitaguchi, Neural network LSI chip with on-chip learning, Proc. IJCNN'91, Seattle (1991).

Yuzo Hirai was born in Tokyo, Japan, in 1948.
He received the B.M., M.S., and Dr. degrees in electrical engineering from Keio University, Yokohama, Japan, in 1970, 1972 and 1975, respectively. He was with Fujitsu Co., Kawasaki, Japan, from 1975 to 1978. From April 1978 to 1981, he was a Research Assistant at the Institute of Information Sciences and Electronics, University of Tsukuba, Ibaraki, Japan. From 1981 to 1985, he was an Assistant Professor, and from 1985 to 1992, an Associate Professor. He is currently a Professor at the same institute. His research interests are in neural networks, including hardware implementation, pattern recognition, visual information processing, associative memory and the modeling of cognitive functions. Dr. Hirai is a member of the IEEE, the International Neural Network Society, the Institute of Electronics, Information and Communication Engineers of Japan, and the Information Processing Society of Japan.