Copyright © IFAC Real Time Programming, Shantou, Guangdong Province, P.R. China, 1998
IMPRECISE NEURAL COMPUTATION IN REAL-TIME NEURAL SYSTEM DESIGN

John Sum, Gilbert H. Young and Wing-kay Kan

High Performance Computing Laboratory, Department of Computer Science and Engineering, Chinese University of Hong Kong, Shatin, N.T., Hong Kong
Abstract. In order to solve real-time control problems, a good software design together with an appropriate control scheme and a system identification method are extremely important. To facilitate software designs that can cope with such time-critical systems, the concept of imprecise and approximate computation has been proposed and applied to real-time scheduling problems for more than a decade. Applying neural networks to solve real-time problems, however, remains a challenge for neural network practitioners. In this paper, a principle bridging neural computation and real-time systems, called imprecise neural computation, is presented. This principle extends the idea of imprecise computation in real-time systems by introducing concepts such as the mandatory neural structure and imprecise pruning. Using these concepts, it is possible to design and analyze real-time neural systems for different real-time applications. Copyright © 1998 IFAC
Key Words. Imprecise computation, Model complexity, Neural computation, Pruning, Real-time systems, Training
1. INTRODUCTION

To control a dynamic system in real time, a real-time system identifier is usually required. Figure 1 shows the block diagram of a self-tuning regulator. Assuming a parametric model for the system, the identifier estimates the model parameters from the sampled input-output data pairs. The model parameters are then passed to the design block, which generates the regulator parameters that specify a suitable regulation scheme for the regulator. Once the regulator has received these parameters, it generates a sequence of control signals according to the control scheme.

Fig. 1. Block diagram of a real-time self-tuning regulator (STR). Adapted from Astrom K. and B. Wittenmark, Computer-Controlled Systems, Prentice Hall, 1990.

This design methodology has been adopted by neural network researchers, who employ neural networks as the identifier and the regulator. As the training of a neural network is time consuming, such a real-time controller is feasible only for soft real-time control problems such as process control (Soucek, 1989).

To deal with this problem, a conceptual framework called imprecise neural computation, which extends the idea of imprecise computation by introducing the concept of a mandatory structure, has been proposed recently (Sum et al., 1998c). This paper describes how imprecise neural computation can be applied to design new training and pruning algorithms that facilitate the design of a real-time neural system identifier.

In the next section, the principle of imprecise neural computation is elucidated. Design examples of imprecise training and imprecise pruning are given in sections three and four respectively. The paper is concluded in the last section.
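To make the data flow of Figure 1 concrete, the following is a minimal sketch of the identifier, design block and regulator loop. It is illustrative only: the LMS-style identifier, the certainty-equivalence design rule and the simulated plant are assumptions introduced here, not part of the cited self-tuning regulator literature.

```python
import numpy as np

def estimate_parameters(theta_hat, u, y, mu=0.1):
    """Identifier block: one LMS-style update of a linear model y ~ theta * u
    (a stand-in for the neural network identifier discussed later)."""
    return theta_hat + mu * (y - theta_hat * u) * u

def design_regulator(theta_hat):
    """Design block: map the estimated model parameter to a regulator gain
    (placeholder certainty-equivalence rule)."""
    return 1.0 / theta_hat if abs(theta_hat) > 1e-3 else 0.0

def compute_control(gain, y_ref):
    """Regulator block: generate the control signal from the regulator gain."""
    return gain * y_ref

theta_hat, u = 0.5, 0.0
for t in range(100):                               # one pass per sampling instant
    y = 2.0 * u + 0.01 * np.random.randn()         # stand-in for the sampled plant output
    theta_hat = estimate_parameters(theta_hat, u, y)
    u = compute_control(design_regulator(theta_hat), y_ref=1.0)
```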
2. IMPRECISE NEURAL COMPUTATION

Imprecise neural computation is a generic principle that aims to facilitate the software and hardware design of a neural network based system dealing with time-critical problems. Traditionally, a neural network is simply treated as a black box. Once a batch of data has been collected from a dynamic system, a neural network identifier is trained to learn the system behavior and then a neural network controller is trained to control the system. The cost of building such a controller/identifier system is judged solely by the quality of the system output with respect to the desired output. The heavy computational cost and the error incurred due to incomplete computation are usually ignored.
In real-time problems, one always needs to face the following important problem:

Problem 1. If it is not possible to finish all computation tasks within the given time, what should we do?

Similarly, in real-time neural computing, we face the same problem:

Problem 2. If it is not possible to train a fully connected neural network with the whole data set within the given allowable time span, what could we do?

These problems can be answered in the light of imprecise computation (Natarajan, 1995; van Tilborg & Koob, 1991; Stankovic & Ramamritham, 1988).

Principle 1 (Imprecise Computation). If the computation of a task can be partitioned into a mandatory part and an optional part (where the mandatory part is the necessary portion which must be executed in order to achieve an acceptable result according to a performance measure, while the optional part is the portion for refining the result), the optional part may be left unfinished when completing all tasks is not feasible.

In a time-critical situation, the system can decide how much of the optional part should be executed. Therefore, in designing such an algorithm, the execution time of the mandatory part should be small enough to meet the critical time constraint, trading off the quality of the results (Leung, 1995; Lin et al., 1987; Lin et al., 1991; Liu et al., 1991).

Borrowing this idea from imprecise computation in real-time computing, imprecise neural computation has been proposed recently (Sum et al., 1998c). It provides a conceptual framework for the design and analysis of real-time neural systems. Its basic principle can be stated as follows.

Principle 2 (Imprecise Neural Computation). If the learning for neural computation can be partitioned into a mandatory part and an optional part (where the mandatory part is the necessary portion which must be executed in order to achieve an acceptable result according to a performance measure, while the optional part is the portion for refining the result), the optional part may be left unfinished when completing all tasks is not feasible.

In a time-critical situation, the system can decide how much of the optional part should be executed.

In general, the mandatory part cannot be rigidly defined. It can be defined in whatever way is appropriate, as long as completing the execution of the mandatory part produces a satisfactory result. Two examples illustrate the idea; a code sketch of the first follows them.

Example 1. Let the output of a feedforward neural network be denoted by f(x, θ), where x is the input to the network and θ is the weight vector. Training is accomplished by gradient descent. In the t-th iteration,

• input x_t to the network;
• calculate the output, f(x_t, θ);
• calculate the error between the network output and the target output, i.e. (y_t - f(x_t, θ));
• update the weight vector according to Δθ = μ (y_t - f(x_t, θ)) ∂f(x_t, θ)/∂θ, where μ is the step size.

Suppose θ ∈ R^m; the computation of Δθ would require more than m multiplications. In case the allowable time for one iteration is not enough for m multiplications, one can update only those weights whose |∂f(x_t, θ)/∂θ_i| value is larger than a threshold. The computation of the network output together with this partial weight update constitutes the mandatory part of the t-th iteration.

Example 2. Once training is finished, only two steps are executed:

• input x_t to the network;
• calculate the output, f(x_t, θ).

Suppose the allowable time for one iteration is not enough; one then has to cut the size of the neural network in order to minimize the number of multiplications required for calculating the network output. In such a case, the minimal structure that produces a satisfactory result is the mandatory part.

Note that the neural network discussed in Example 1 is in the learning mode, while the one discussed in Example 2 is in the usage mode. Figure 2 shows their relationship.
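The following is a minimal sketch of Example 1, assuming a squared-error objective and a hypothetical per-iteration deadline. The threshold, step size and toy loss are illustrative assumptions; only the partition into a mandatory update (large-gradient weights) and an optional update (the remaining weights) reflects the example above.

```python
import time
import numpy as np

def imprecise_gradient_step(theta, grad, mu, threshold, deadline):
    """One imprecise iteration: updating the weights with large gradient magnitude
    is treated as the mandatory part; the remaining weights form the optional
    part and are updated only if the deadline has not yet passed."""
    important = np.abs(grad) > threshold
    theta[important] -= mu * grad[important]          # mandatory update
    if time.monotonic() < deadline:                   # optional refinement
        theta[~important] -= mu * grad[~important]
    return theta

# Illustrative usage with a toy quadratic loss 0.5 * ||theta - target||^2,
# standing in for the network error of Example 1.
target = np.array([1.0, -2.0, 0.5, 3.0])
theta = np.zeros_like(target)
for t in range(50):
    deadline = time.monotonic() + 1e-3                # allowable time for this iteration
    grad = theta - target                             # gradient of the toy loss
    theta = imprecise_gradient_step(theta, grad, mu=0.1, threshold=0.5, deadline=deadline)
```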
Fig. 2. Flow diagram for the learning and usage modes. E1: input data; E2: compute output; E3: parameter update; E4: pruning. Event E2 can include the subtask of generating the control signal.

• Learning mode. During the learning mode, four tasks need to be executed. Once a datum is input to the neural network (E1), the output of the neural network is computed (E2) and the output error is evaluated. Then the weight vector is updated, i.e. training (E3), and the network is pruned (E4).
• Usage mode. Once learning is finished, only two tasks have to be executed: receiving data (E1) and computing the network output (E2).

As E3 and E4 are costly in the learning mode while E2 is costly in the usage mode, the design of imprecise algorithms for the learning mode and for the usage mode should be treated separately. In the rest of this paper, we focus on the learning mode (training and pruning) only.

3. IMPRECISE TRAINING

In the design of imprecise training, there are two possible approaches. One is to restructure an existing training algorithm in such a way that it contains a computationally mandatory part and a computationally optional part. The other is to restructure the algorithm in such a way that the mandatory part handles the training of a mandatory structure while the optional part handles an optional structure, similar to the ideas in (Lo, 1994) and (Sum et al., 1998a). This section introduces only how the principle of imprecise neural computation can facilitate the former approach.

Conventional approach

Without loss of generality, it is assumed that a neural network consists of n hidden units, one output unit and m input units. The function of the neural network is written as

y(x) = f(x, θ),   (1)

where y ∈ R is the output of the network, x ∈ R^m is the input and θ ∈ R^{n(m+2)} is the parametric vector. Let θ(0) be the initial parametric vector and P(0) = δ^{-1} I_{n(m+2)×n(m+2)}. The training of a feedforward neural network can then be accomplished by the following recursive equations (Kollias & Anastassiou, 1989; Leung et al., 1996; Leung et al., 1997; Shah et al., 1992):

P(t) = (I - L(t) H^T(t)) P(t-1),   (2)
θ(t) = θ(t-1) + L(t) [y(x_t) - ŷ(x_t)],   (3)

where ŷ(x_t) = f(x_t, θ(t-1)) is the network prediction and

L(t) = P(t-1) H(t) / (H^T(t) P(t-1) H(t) + 1),
H(t) = ∂f/∂θ |_{θ = θ(t-1)}.

Besides, let w_1, w_2, ..., w_n be the weight vectors associated with the 1st to the n-th hidden units respectively. Suppose a neural network consists of two hidden units, one output unit and one input unit; then

f(x, θ) = θ_10 σ(θ_11 x + θ_12) + θ_20 σ(θ_21 x + θ_22),

where σ is the nonlinear sigmoidal function. In this model, w_1 = (θ_10, θ_11, θ_12)^T and w_2 = (θ_20, θ_21, θ_22)^T. A sketch of this conventional training loop is given below.
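A sketch of the conventional training of equations (1)-(3) for the two-hidden-unit example above. The sigmoidal unit, the finite-difference computation of H(t), the initial value δ and the simulated data stream are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def f(x, theta):
    """Two-hidden-unit network of the example: theta = (w1, w2) with
    w_i = (theta_i0, theta_i1, theta_i2)."""
    w1, w2 = theta[:3], theta[3:]
    return w1[0] * sigmoid(w1[1] * x + w1[2]) + w2[0] * sigmoid(w2[1] * x + w2[2])

def gradient(x, theta, eps=1e-6):
    """H(t) = df/dtheta, computed here by finite differences for brevity."""
    g = np.zeros_like(theta)
    for k in range(theta.size):
        d = np.zeros_like(theta)
        d[k] = eps
        g[k] = (f(x, theta + d) - f(x, theta - d)) / (2.0 * eps)
    return g

delta = 0.01                          # assumed initial value: P(0) = delta^{-1} I
theta = 0.1 * np.random.randn(6)      # theta(0), n(m+2) = 6 parameters
P = np.eye(6) / delta
for t in range(200):                  # assumed data stream sampled from the plant
    x = np.random.uniform(-1.0, 1.0)
    y = np.sin(np.pi * x)             # stand-in for the target output y(x_t)
    H = gradient(x, theta)                            # H(t)
    L = P @ H / (H @ P @ H + 1.0)                     # L(t)
    theta = theta + L * (y - f(x, theta))             # eq. (3)
    P = P - np.outer(L, H) @ P                        # eq. (2): P(t) = (I - L H^T) P(t-1)
```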
Imprecisify the conventional approach

Using the principle of imprecise neural computation, the calculation of P(t) can be reformulated in the following block matrix form:

P = [P_ij], i, j = 1, ..., n,   L = (L_1^T, ..., L_n^T)^T,   H = (H_1^T, ..., H_n^T)^T,

where P_ij ∈ R^{(m+2)×(m+2)} and L_i, H_i ∈ R^{m+2}; the time index t is dropped for simplicity. The (i,j)-th block of L H^T is simply L_i H_j^T, and the (i,j)-th block of L H^T P is then given by

Σ_{k=1}^{n} L_i H_k^T P_kj.

Since L = P H / (H^T P H + 1) and

H^T P H = Σ_{i=1}^{n} Σ_{j=1}^{n} H_i^T P_ij H_j,

the (i,j)-th block of L H^T P can also be written as

[ Σ_{l=1}^{n} Σ_{k=1}^{n} P_il H_l H_k^T P_kj ] / [ Σ_{p=1}^{n} Σ_{q=1}^{n} H_p^T P_pq H_q + 1 ],

and, by equation (2), the RLS update can be stated blockwise as follows: for i = 1, ..., n and j = 1, ..., n, P_ij(t) is obtained by subtracting this block (evaluated at time t-1) from P_ij(t-1).

Suppose P is a diagonally dominant matrix; the computation of the diagonal blocks is then more important than that of the off-diagonal ones. Further assuming that H_i H_j^T is small and that the denominator is replaced by (H_i^T P_ii H_i + 1) for the updating of P_ii, a two-step update for P can be defined as follows:

• for all i = 1, ..., n, update the block matrix P_ii;
• for all i = 1, ..., n and for all j ≠ i, update the block matrix P_ij.

The first step of the update can be viewed as the mandatory part of the training. Once the second step is ignored, we obtain the so-called decoupled RLS algorithm. For simplicity, P_i is used instead of P_ii to denote the i-th diagonal block of P. The decoupled RLS algorithm can thus be expressed as n decoupled filter equations (Puskorius & Feldkamp, 1991; Shah et al., 1992). For i = 1, ..., n,

P_i(t) = (I - L_i(t) H_i^T(t)) P_i(t-1),   (4)
w_i(t) = w_i(t-1) + L_i(t) [y(x_t) - ŷ(x_t)],   (5)

where

L_i(t) = P_i(t-1) H_i(t) / (H_i^T(t) P_i(t-1) H_i(t) + 1),   (6)
H_i(t) = ∂f/∂w_i |_{w_i = w_i(t-1)}.   (7)

One major advantage of such a decoupled algorithm is that, instead of updating n^2 block matrices, only n block matrices are updated, so the computational cost is relatively small. For each block, the cost of a one-step update is (1/2)(m+2)(3m+15), as P_i ∈ R^{(m+2)×(m+2)}. Compared with the original algorithm, the speed-up is of order O(n). If the above algorithm is implemented on a parallel machine (Sum et al., 1998d), say an SIMD machine, it can be further sped up by another factor of O(n). The ideas behind imprecise and parallel training are shown in Figure 3; a sketch of the decoupled update follows below.

Fig. 3. The idea of imprecise training and parallel training, in terms of computational time: (i) accurate computation (mandatory and optional parts); (ii) mandatory part with an imprecise optional part; (iii) mandatory part only, optional part discarded; (iv) optional part discarded, with further speed-up by SIMD.
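A sketch of the mandatory (decoupled) update of equations (4)-(7), repeated here in self-contained form for the same two-hidden-unit example; the data stream and the initial P_i(0) are again assumptions. The optional off-diagonal update is indicated only by a comment, since it would be executed only when time permits.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def f(x, theta):
    """Same two-hidden-unit network as before: theta = (w1, w2)."""
    w1, w2 = theta[:3], theta[3:]
    return w1[0] * sigmoid(w1[1] * x + w1[2]) + w2[0] * sigmoid(w2[1] * x + w2[2])

def gradient(x, theta, eps=1e-6):
    g = np.zeros_like(theta)
    for k in range(theta.size):
        d = np.zeros_like(theta)
        d[k] = eps
        g[k] = (f(x, theta + d) - f(x, theta - d)) / (2.0 * eps)
    return g

n, block = 2, 3                                        # n hidden units, (m + 2) weights each
theta = 0.1 * np.random.randn(n * block)               # (w1, w2)
P_blocks = [np.eye(block) / 0.01 for _ in range(n)]    # diagonal blocks P_i(0)

for t in range(200):
    x = np.random.uniform(-1.0, 1.0)
    y = np.sin(np.pi * x)                              # stand-in target
    err = y - f(x, theta)                              # y(x_t) - yhat(x_t)
    H = gradient(x, theta)
    for i in range(n):                                 # mandatory part: diagonal blocks only
        sl = slice(i * block, (i + 1) * block)
        Hi, Pi = H[sl], P_blocks[i]
        Li = Pi @ Hi / (Hi @ Pi @ Hi + 1.0)            # eq. (6)
        P_blocks[i] = Pi - np.outer(Li, Hi) @ Pi       # eq. (4)
        theta[sl] = theta[sl] + Li * err               # eq. (5)
    # Optional part: the off-diagonal blocks P_ij (j != i) would be updated here,
    # but only when the allowable time permits.
```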
4. IMPRECISE PRUNING

Basically, pruning is a model reduction technique which attempts to remove excessive model parameters. One class of pruning methods is heuristic and based on the idea of error sensitivity, such as optimal brain damage (LeCun et al., 1990), optimal brain surgeon (Hassibi & Stork, 1993) and (Leung et al., 1996); another is based on Bayesian ideas (Sum et al., 1997a; Sum et al., 1997b).

In the error sensitivity approach, weights are ranked according to their corresponding error sensitivities. If its error sensitivity is small, a weight is assumed to be less important. That is to say, θ_i can be set to zero if ∇²_{θ_i θ_i} E(θ) is small.
After N iterations, the matrix P(N) and the vectors H(k), k = 1, ..., N, are related by the following equation:

P^{-1}(N) ≈ P^{-1}(0) + Σ_{k=1}^{N} H(k) H^T(k).   (8)

Multiplying the k-th diagonal element of P^{-1}(N) by the square of the magnitude of the k-th parameter therefore approximates the second order derivative of E(θ) with respect to that parameter. If P^{-1}(0) equals λI, the saliency measure of the k-th weight can thus be computed as

θ_k^2 ((P^{-1}(N))_kk - λ).

The pruning algorithm can be summarized as follows.

1. Evaluate P^{-1}(N) and θ_k^2 ((P^{-1}(N))_kk - λ) for all k from 1 to n_θ.
2. Rearrange the indices {π_k} according to the ascending order of θ_k^2 ((P^{-1}(N))_kk - λ).
3. Set E(θ_[1,0]) = 0 and k = 1.
4. While E(θ_[1,k-1]) < E_0,
   (a) compute the validation error E(θ_[1,k]);
   (b) set k = k + 1.

Here θ_[1,k] is the parameter vector whose π_1-th to π_k-th elements are zero and whose π_{k+1}-th to π_{n_θ}-th elements are identical to θ_{π_{k+1}} up to θ_{π_{n_θ}}.

Figure 4 shows a typical pruning curve for heuristic pruning (Sum et al., 1998b). As sorting is required for ranking the importance of the weights, the sorting is a mandatory part; without executing it, there is no way to prune the network. During this initial period, no weight can be removed. After the sorted list is generated, the validate-then-remove process (the optional part) starts. Owing to the nature of the sorted list, and under the assumption that at most one weight can be removed at a time, the total number of weights removed increases linearly. Once further removal of a weight would cause a large validation error, the pruning process terminates.

Fig. 4. A typical pruning curve for heuristic pruning: number of weights pruned against computational time, with a sorting phase followed by a weight-removal phase.

Figure 5 shows a typical pruning curve for non-heuristic pruning (Sum et al., 1998b). Since the weights are not ranked, it can be viewed that no mandatory part exists; only the optional part exists. The weights are checked one after the other. Supposing that the chance of removing each weight is equal, the total number of weights removed increases (roughly) linearly.

Fig. 5. A typical pruning curve for non-heuristic pruning: number of weights pruned against computational time.

Comparing these two curves, the following conclusion can be drawn. In case the allowable time is long enough for completing the whole pruning procedure, heuristic pruning and non-heuristic pruning perform similarly. In case the allowable execution time is not enough, non-heuristic pruning should be employed, since it ensures that partial pruning results can be obtained at any time. A sketch of the heuristic procedure is given below.
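A sketch of the heuristic pruning procedure described above, with the saliency ranking as the mandatory part and the validate-then-remove loop as the optional part. The validation function, the error threshold E0, λ and the optional deadline are assumptions for illustration.

```python
import time
import numpy as np

def heuristic_prune(theta, P_inv_N, lam, validate, E0, deadline=None):
    """Rank the weights by the saliency theta_k^2 * ((P^{-1}(N))_kk - lam)
    (mandatory part: the sort), then zero them one at a time in ascending order
    of saliency while the validation error stays below E0 (optional part)."""
    saliency = theta ** 2 * (np.diag(P_inv_N) - lam)
    order = np.argsort(saliency)                  # mandatory: build the sorted list
    pruned = theta.copy()
    for k in order:                               # optional: validate-then-remove loop
        if deadline is not None and time.monotonic() > deadline:
            break                                 # keep the partial pruning result
        trial = pruned.copy()
        trial[k] = 0.0
        if validate(trial) < E0:
            pruned = trial                        # removal accepted
        else:
            break                                 # further removal degrades validation error
    return pruned

# Hypothetical usage, where validate() returns the validation error of a weight vector:
# theta_pruned = heuristic_prune(theta, P_inv_N, lam=0.01, validate=my_validation_error, E0=0.05)
```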
5. CONCLUSION

In this paper, a new concept for the design of neural networks for real-time applications, imprecise neural computation, has been presented. The essential idea comes from imprecise computation, which has been studied intensively for more than two decades in dealing with real-time problem solving. Two design examples using this new concept have been presented, one for learning and the other for pruning.

Ideas similar to those of imprecise computation, such as pruning and fast training, have long been studied and applied in the area of neural networks, but that work is not catered for hard real-time applications. From the implementation point of view, imprecise neural computation provides a new conceptual framework for studying methodologies of employing the neural network approach in real-time applications. With the advancement of hardware and parallel computing technology, true real-time neural systems for solving real-time problems will possibly be achievable in the future.
6. REFERENCES

Astrom K. and B. Wittenmark (1990). Computer-Controlled Systems, Prentice Hall.
Hassibi B. and D.G. Stork (1993). Second order derivatives for network pruning: Optimal brain surgeon. In Hanson et al. (eds), Advances in Neural Information Processing Systems, 164-171.
Johansson R. System Modeling and Identification, Prentice Hall Inc.
Kollias S. and D. Anastassiou (1989). An adaptive least squares algorithm for the efficient training of artificial neural networks. IEEE Transactions on Circuits and Systems, Vol. 36(8), 1092-1101.
LeCun Y., J.S. Denker and S.A. Solla (1990). Optimal brain damage. Advances in Neural Information Processing Systems 2 (D.S. Touretzky, ed.), 396-404.
Leung C.S., K.W. Wong, J. Sum and L.W. Chan (1996). On-line training and pruning for RLS algorithms. Electronics Letters, 32, 2152-2153.
Leung C.S., P.F. Sum, A.C. Tsoi and L. Chan (1997). Several aspects of pruning methods in recursive least square algorithms for neural networks. In Theoretical Aspects of Neural Computation: A Multidisciplinary Perspective, K. Wong et al. (eds), Springer-Verlag, 71-80.
Leung J.Y.T. (1995). A survey of scheduling results for imprecise computation tasks. In S. Natarajan (ed.), Imprecise and Approximate Computation, Kluwer Academic Publishers.
Lin K.-J., S. Natarajan and J. W.-S. Liu (1987). Imprecise results: Utilizing partial computations in real-time systems. Proc. of the 8th Real-Time Systems Symposium, San Francisco, CA.
Lin K., J. Liu, K. Kenny and S. Natarajan (1991). FLEX: A language for programming flexible real-time systems. In A. van Tilborg and G. Koob (eds), Foundations of Real-Time Computing: Formal Specifications and Methods, Kluwer Academic Publishers.
Liu J., K. Lin, A.C. Yu, J. Chung and W. Zhao (1991). Algorithms for scheduling imprecise computations. IEEE Computer.
Lo J. (1994). Synthetic approach to optimal filtering. IEEE Transactions on Neural Networks, Vol. 5(5), 803-811.
Natarajan S. (1995). Imprecise and Approximate Computation, Kluwer Academic Publishers.
Puskorius G.V. and L.A. Feldkamp (1991). Decoupled extended Kalman filter training of feedforward layered networks. In Proceedings of IJCNN'91, Vol. I, 771-777.
Shah S., F. Palmieri and M. Datum (1992). Optimal filtering algorithms for fast learning in feedforward neural networks. Neural Networks, Vol. 5, 779-787.
Soucek B. (1989). Neural and Concurrent Real-Time Systems, John Wiley & Sons, Inc.
Stankovic J. and K. Ramamritham (1988). Hard Real-Time Systems, IEEE Computer Society Press.
Sum J., C. Leung, L. Chan, W. Kan and G.H. Young (1997a). On the Kalman filtering method in neural network training and pruning. To appear in IEEE Transactions on Neural Networks.
Sum J., C. Leung, L. Chan, W. Kan and G.H. Young (1997b). An adaptive Bayesian pruning for neural network in non-stationary environment. To appear in Neural Computation.
Sum J., W. Kan and G.H. Young (1998a). Hypercube recurrent neural network. Submitted.
Sum J., W. Kan and G.H. Young (1998b). Note on some pruning algorithms for recurrent neural network. Unpublished manuscript.
Sum J., G.H. Young and W. Kan (1998c). Imprecise neural computation. To be presented at the International Conference in Theoretical Computer Science, Hong Kong.
Sum J., G.H. Young and W. Kan (1998d). Parallel algorithm for the realization of recursive least square based training and pruning using SIMD machine. Submitted to PDPTA'98, Las Vegas.
van Tilborg A. and G. Koob (1991). Foundations of Real-Time Computing: Scheduling and Resource Management, Kluwer Academic Publishers.