Technical note
A multilayered feed forward neural network suitable for VLSI implementation

Himanshu S Mazumdar
Physical Research Laboratory, Navrangpura, Ahmedabad-380 009, India

Paper received: 1 July 1994
A potentially simplified training strategy for feed forward type neural networks is developed in view of VLSI implementation. The gradient descent back propagation technique is simplified to train stochastic type neural hardware. The proposed learning algorithm uses ADD, SUBTRACT and LOGICAL operations only. This reduces circuit complexity while increasing speed. The forward and reverse characteristics of the perceptrons are generated using random threshold logic. The proposed hardware consists of 31 perceptrons per layer working in parallel, with a programmable number of layers working in sequential mode.

Keywords: stochastic, multilayer neural network, VLSI
Multilayered feed forward neural networks [1, 2] have created enormous interest among AI researchers for their feature extraction capability [3] from input patterns. Their potential as problem solving tools has also been demonstrated with various learning algorithms [4-6]. However, for any serious real world application, such networks should have a large number of interconnections operating in parallel, obeying predictable learning rules uniformly throughout the network. The VLSI implementation of such a network needs a large number
of interface lines (pin connections). Learning algorithms like back propagation need a large number of floating point operations, which increases rapidly with the number of perceptrons per layer. It is thus necessary to give adequate stress to the simplification of the learning algorithm, so as to reduce the computation needed for massive parallelism. This paper describes a simplified technique to implement multilayered networks using random number generators. The gradient descent back propagation learning technique is modified to implement stochastic logic [7]. The prototype network built by us consists of a maximum of 31 perceptrons in each layer. The learning algorithm uses only ADD, SUBTRACT and LOGICAL operations. Interlayer communication needs only one bit per output and hence minimizes the data path considerably.
BACK PROPAGATION ALGORITHM

We first review briefly the back propagation technique which is normally used for training feed forward type networks. The input-output characteristic function of the perceptrons [1, 8] is chosen such that it is continuous and its derivative is bounded between definite limits for a large dynamic range of input. Consider a three-layer network of units Yi, Yj and Yk interconnected through weights wji and wkj such that:

Xj = Σi wji * Yi;   Yj = f(Xj)    (second layer)

Xk = Σj wkj * Yj;   Yk = f(Xk)    (third layer)

where Y = f(X) represents the input-output characteristic function of the perceptrons, which is usually chosen to be of the form:

f(X) = 1/(1 + e^(-X))
The weights wkj and wji are modified using the BP algorithm as follows:

wkj = wkj - n * Yj * [(Yk - dk) * {Yk * (1 - Yk)}]                                    (1)

wji = wji - n * Yi * {Yj * (1 - Yj)} * Σk [(Yk - dk) * {Yk * (1 - Yk)} * wkj]         (2)
where n is a constant less than 1 and dk is the desired output. Most of the operators in Equations (1) and (2) operate on real variables. Hence it is difficult to implement a large number of perceptrons working in parallel in VLSI. It has been shown that the choice of the perceptron's characteristic [1] is not very critical for convergence of the error during training. This paper suggests one such choice which reduces the computational requirement drastically without affecting the error convergence property during training.
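For comparison with the simplified scheme proposed below, a minimal sketch of updates (1) and (2) in conventional real arithmetic is given here; the learning rate eta and real-valued counterparts of the arrays used in the Appendix listing (yi, yj, yk, wji, wkj and the desired outputs out) are assumptions for illustration and are not part of the proposed hardware.

(* Sketch only: conventional real-arithmetic BP update of Equations (1)
   and (2).  eta, imax, jmax, kmax and the real arrays are assumed globals. *)
PROCEDURE bp_update;
VAR
  i, j, k : integer;
  err, grad, sum : real;
BEGIN
  (* output layer, Equation (1) *)
  FOR k := 0 TO kmax DO
  BEGIN
    err := yk[k] - out[k];                       (* (Yk - dk)              *)
    grad := err * yk[k] * (1.0 - yk[k]);         (* (Yk - dk)*Yk*(1 - Yk)  *)
    FOR j := 0 TO jmax DO
      wkj[k, j] := wkj[k, j] - eta * yj[j] * grad;
  END;
  (* hidden layer, Equation (2) *)
  FOR j := 0 TO jmax DO
  BEGIN
    sum := 0.0;
    FOR k := 0 TO kmax DO
      sum := sum + (yk[k] - out[k]) * yk[k] * (1.0 - yk[k]) * wkj[k, j];
    FOR i := 0 TO imax DO
      wji[j, i] := wji[j, i] - eta * yi[i] * yj[j] * (1.0 - yj[j]) * sum;
  END;
END;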
PROPOSED ALGORITHM
We will analyse the significance of each factor in expressions (1) and (2) in order to replace them by suitable probability functions.

[Yk - dk] is the departure of the output Yk from the desired value dk.

[Yk * (1 - Yk)] is the measure of the willingness of the kth output perceptron to learn. This is measured by its nearness to its average value (0.5). This factor plays an important role in training.

[wkj] is the back propagation signal path's conductance.

[Yj * (1 - Yj)] is the term similar to the above term (Yk * (1 - Yk)) for a hidden layer perceptron.

[Yi] is the excitation potential for the network.

In this architecture we have considered all the perceptrons Yi, Yj and Yk as having binary outputs. In the proposed system the perceptron's forward pass transfer characteristics are generated by a threshold detector having a random bias value, as shown in Figure 1. This is equivalent to a choice of input-output characteristic function Y = f(X) of the form:

Y = 1   for (X + r) > 0
  = 0   for (X + r) <= 0                                 (3)

where r is a random integer inside the dynamic range of X. Yj and Yk are calculated using the above expression in the forward pass. This has two advantages: the value of Yj is binary but is statistically continuous over the dynamic range of learning, and the computation of Xk (for the next layer) does not need any multiplications.

In the backward pass a term of the form Yk * (1 - Yk) is desirable. This term increases the sensitivity of wkj near the threshold value of Xk. Equivalently, the network's willingness to learn or forget is inversely proportional to the distance of X from the threshold. Considering this, a simplified reverse characteristic function R(X), as shown in Figure 1, is used to replace the terms of the form Y * (1 - Y) in Equations (1) and (2):

R(X) = 0   for abs(X) > r
     = 1   for abs(X) <= r                               (4)

where r is a random integer in the range 0 <= r < N and the dynamic range of X for learning is -N to +N. The value of N is dependent on the network configuration. To improve the dynamic range of inputs and the rate of convergence the following options are tested:

• Use of N as a random integer itself;
• Use of r as a weighted random number;
• Adjusting the value of N gradually as the error converges.

Figure 1  The forward and reverse characteristics of the proposed stochastic function (dotted lines, obtained by averaging z over many trials, where z = 1 if N - random(2*N + 1) + x > 0 for the forward curve and z = 1 if random(N) - abs(x) > 0 for the reverse curve) along with the exponential sigmoid y = 1/(1 + e^(-x)) and its derivative y = e^(-x)/(1 + e^(-x))^2 (solid lines)
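A minimal software sketch of the two stochastic characteristics is given below, in the style of the Appendix listing; the constant N, the function names and the use of Turbo Pascal's random (random(m) returns a uniform integer in 0..m-1) are assumptions for illustration, not part of the hardware.

(* Sketch of the stochastic characteristics (3) and (4).  N and the
   function names are illustrative assumptions. *)
CONST
  N = 1024;                            (* assumed dynamic range of X: -N..+N *)

FUNCTION forward_y(x : integer) : integer;
BEGIN
  (* Equation (3): Y = 1 if (X + r) > 0, with r uniform in -N..+N *)
  IF x + (random(2 * N + 1) - N) > 0 THEN forward_y := 1
  ELSE forward_y := 0;
END;

FUNCTION reverse_r(x : integer) : integer;
BEGIN
  (* Equation (4): R(X) = 1 if abs(X) <= r, with r uniform in 0..N-1;
     this replaces the Y*(1 - Y) terms of Equations (1) and (2) *)
  IF abs(x) <= random(N) THEN reverse_r := 1
  ELSE reverse_r := 0;
END;

Averaged over many calls, forward_y plays the role of the sigmoid of Figure 1 and reverse_r the role of its derivative, while each individual call needs only an addition, a comparison and a random number.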
HARDWARE DESCRIPTION
Figure 2 shows a simplified hardware diagram of the multilayered neuro-engine.
Figure 2  Block diagram of the proposed multilayered feed forward neural network (blocks identifiable in the diagram: 32 bit shift registers SH1, SH2 and SH3, 16 bit latch LCH, buffer BF3, 16 bit adder, 16 bit XOR, random number generator, controller, 32K x 16 bit external RAM and a 16 bit external data bus)
The system consists of three 32 bit registers, one 16 bit adder and a random number generator, which are connected to memory via a 16 bit bus. A PROM based sequencer controller generates all the necessary control signals for the different passes and is also used to store the configuration of the network. Register SH1 is loaded with the inputs Yi, which are rotated 32 times using 1024 clock cycles. In every clock cycle the corresponding weight wji is fetched from external RAM and, depending on the MSB of SH1, wji is added to or subtracted from the contents of the latch. After one complete cycle of SH1, Xj is computed in LCH. In training mode a random number is added to Xj through BF3. The sign bit of Xj represents Yj, which is stored in SH3. All the Yj are computed from wji and Yi in 1024 clock cycles. The learning mode needs five passes for a two-layer network, as follows:

1. Compute Yj
2. Compute Yk
3. Correct wkj
4. Compute dj
5. Correct wji

The complete two-layer learn cycle needs about 7000 clock cycles, which is less than one millisecond.
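As a rough software model of the datapath just described (not the actual PROM microcode), the accumulation of one Xj over a 32 bit rotation of SH1 might look as follows; the bit-serial add/subtract follows the text above (the Appendix simulation instead skips the weight when the input bit is zero), while the random bias range N and the boolean flag training are assumptions.

(* Illustrative model of one SH1 rotation.  The weight wji[j, i] fetched
   from external RAM is added to or subtracted from the latch LCH
   according to the current MSB of SH1 (modelled here by the binary
   input array yi).  N and training are assumed for illustration. *)
PROCEDURE compute_yj(j : integer);
VAR
  i, lch : integer;
BEGIN
  lch := 0;                                (* models the 16 bit latch LCH *)
  FOR i := 0 TO 31 DO                      (* one weight per clock cycle  *)
    IF yi[i] > 0 THEN
      lch := lch + wji[j, i]               (* MSB of SH1 set: add         *)
    ELSE
      lch := lch - wji[j, i];              (* MSB of SH1 clear: subtract  *)
  IF training THEN
    lch := lch + (random(2 * N + 1) - N);  (* random bias added via BF3   *)
  xj[j] := lch;
  IF lch > 0 THEN yj[j] := 1 ELSE yj[j] := 0;   (* sign of Xj gives Yj, stored in SH3 *)
END;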
RESULTS
The above algorithm is first tested in a simulated environment using 16 bit integer arithmetic. The results are compared with those from the original back propagation algorithm using real number arithmetic.
Table 1  Sample convergence results for the real arithmetic BP algorithm and the proposed stochastic algorithm

Exp no.    Real B.P.             Stochastic
           Time (s)   Loops      Time (s)   Loops
1            740       283         440       115
2           1133       433         321       113
3            699       267         522       128
4            311       119         144        51
5            740       283        1056       367
6           1232       471         369       129
7            288       110         317       116
8            392       150        1074       374
9            900       344         261        93
10           196        75         232        83
11           411       157         243        87
12           251        96         165        59
13           178        68         282       101
Mean         574       219         417       139
An IBM-386 computer with co-processor was used for testing both methods (see the Appendix for a listing of the program used). Both networks were configured with eight input units, eight hidden units and one output unit. All parameters in both types of network were optimized for minimum convergence time. The output function was chosen as mirror symmetry detection of the input data. The convergence success rate was found to be higher and the convergence time shorter for the proposed algorithm than for the BP algorithm. Some sample results are shown in Table 1. TTL based hardware has also been tested for successful convergence for various output functions.
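The mirror symmetry target itself is not listed in the paper; a plausible generator for the eight-input case, assuming the inputs are the bits of an 8 bit pattern and the single output is 1 when the pattern reads the same from both ends, might be:

(* Hypothetical generator of the mirror symmetry training target for an
   8 bit input pattern; the bit ordering and encoding are assumptions. *)
FUNCTION mirror_symmetric(pattern : byte) : integer;
VAR
  i : integer;
BEGIN
  mirror_symmetric := 1;
  FOR i := 0 TO 3 DO
    IF ((pattern shr i) and 1) <> ((pattern shr (7 - i)) and 1) THEN
      mirror_symmetric := 0;          (* bit i and its mirror image differ *)
END;

Each training example would then set yi[0..7] from the bits of pattern and out[0] to mirror_symmetric(pattern).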
REFERENCES

1  Rumelhart, D E, Hinton, G E and Williams, R J 'Learning representations by back-propagating errors' Nature Vol 323 (1986) pp 533-536
2  Lippmann, R P 'An introduction to computing with neural nets' IEEE ASSP Mag. (April 1987) pp 4-22
3  Touretzky, D S and Pomerleau, D A 'What's hidden in the hidden layers?' BYTE (August 1989) pp 227-233
4  Widrow, B, Winter, R G and Baxter, R A 'Layered neural nets for pattern recognition' IEEE Trans. Acoust. Speech Signal Process. Vol 36 No 7 (July 1988)
5  Lang, K J, Waibel, A H and Hinton, G E 'A time-delay neural network architecture for isolated word recognition' Neural Netw. Vol 3 (1990) pp 23-43
6  Mirchandani, G and Cao, W 'On hidden nodes for neural nets' IEEE Trans. Circ. Syst. Vol 36 No 5 (May 1989)
7  Alspector, J, Allen, R B, Hu, V and Satyanarayana, S 'Stochastic learning networks and their electronic implementation' in Anderson, D Z (Ed) Proceedings of the IEEE Conference on Neural Information Processing Systems - Natural and Synthetic, American Institute of Physics, New York (1988) pp 9-21
8  Jones, W P and Hoskins, J 'Back-propagation' BYTE (October 1987) pp 155-162
APPENDIX

A sample program for a three-layer network is shown. N0, N1, N2 and N3 are constants adjusted for the dynamic range and the rate of convergence.
PROCEDURE forward_pass;
BEGIN
  (* hidden layer: accumulate the weights of active inputs and threshold
     against a random bias, as in Equation (3) *)
  FOR j := 0 TO jmax DO
  BEGIN
    xj[j] := 0;
    FOR i := 0 TO imax + 1 DO
      IF yi[i] > 0 THEN xj[j] := xj[j] + wji[j, i];
    IF xj[j] > random(N0) THEN yj[j] := 1 ELSE yj[j] := 0;
  END;
  (* output layer *)
  FOR k := 0 TO kmax DO
  BEGIN
    xk[k] := 0;
    FOR j := 0 TO jmax + 1 DO
      IF yj[j] > 0 THEN xk[k] := xk[k] + wkj[k, j];
    IF xk[k] > random(N0) THEN yk[k] := 1 ELSE yk[k] := 0;
  END;
END;

(* .................... *)

PROCEDURE training_pass;
BEGIN
  FOR j := 0 TO jmax + 1 DO outj[j] := 0;
  (* output layer: correct wkj and back propagate the error into outj *)
  FOR k := 0 TO kmax DO
  BEGIN
    IF abs(xk[k]) > random(N1) THEN p := 0 ELSE p := 1;    (* reverse characteristic R(Xk) *)
    d := out[k] - yk[k];                                   (* output error *)
    IF d <> 0 THEN
      FOR j := 0 TO jmax + 1 DO
      BEGIN
        IF yj[j] > 0 THEN wkj[k, j] := wkj[k, j] + d
        ELSE wkj[k, j] := wkj[k, j] - d;
        IF p > 0 THEN
          IF d > 0 THEN outj[j] := outj[j] + wkj[k, j]
          ELSE outj[j] := outj[j] - wkj[k, j];
      END;
  END;
  (* hidden layer: correct wji using the back propagated signal outj *)
  FOR j := 0 TO jmax DO
  BEGIN
    IF abs(xj[j]) > random(N2) THEN a := 0 ELSE a := 1;    (* reverse characteristic R(Xj) *)
    IF abs(outj[j]) > random(N3) THEN b := 1 ELSE b := 0;  (* significance of back propagated signal *)
    d := a and b;
    IF d <> 0 THEN
      IF outj[j] > 0 THEN
        FOR i := 0 TO imax + 1 DO
          IF yi[i] > 0 THEN wji[j, i] := wji[j, i] + d
          ELSE wji[j, i] := wji[j, i] - d
      ELSE
        FOR i := 0 TO imax + 1 DO
          IF yi[i] > 0 THEN wji[j, i] := wji[j, i] - d
          ELSE wji[j, i] := wji[j, i] + d;
  END;
END;
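For completeness, a driver of roughly the following shape (not part of the original listing; load_pattern, pmax and maxepochs are hypothetical) would exercise the two procedures until the network converges on the training set:

(* Hypothetical training driver around the listed procedures. *)
PROCEDURE train_epochs;
VAR
  epoch, p, k, errors : integer;
BEGIN
  FOR epoch := 1 TO maxepochs DO
  BEGIN
    errors := 0;
    FOR p := 0 TO pmax DO
    BEGIN
      load_pattern(p);               (* hypothetical: fills yi[] and out[] *)
      forward_pass;
      FOR k := 0 TO kmax DO
        IF yk[k] <> out[k] THEN errors := errors + 1;
      training_pass;
    END;
    IF errors = 0 THEN exit;         (* converged on the training set *)
  END;
END;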
Himanshu S. Mazumdar was born in Bihar, India in 1947. He received a Bachelor of Engineering degree in electronics and telecommunication from Jadavpur University, Calcutta, India in 1968. Mr Mazumdar was chief architect of the payload instrumentation of 'ANURADHA', the Indian cosmic ray experiment onboard NASA's Space Shuttle Challenger. His research interest lies in the areas of neural network hardware and software design. He is currently the head of the Electronics Division of the Physical Research Laboratory, Ahmedabad, India.