Computers Elect. Engng Vol. 19, No. 4, pp. 255-264, 1993. Printed in Great Britain. All rights reserved.
0045-7906/93 $6.00 + 0.00. Copyright © 1993 Pergamon Press Ltd

A MULTILAYERED NEURAL NET CONTROLLER USING DIRECT LEARNING ALGORITHM

J. YUH

Autonomous Systems Laboratory, Department of Mechanical Engineering, University of Hawaii, Honolulu, HI 96822, U.S.A.
(Received in final revised form 12 March 1992)

Abstract--This paper describes a recent study on the application of neural networks to dynamic system control. We consider two learning algorithms: the error backpropagation algorithm and the parallel recursive prediction error algorithm. The proposed method is an on-line approach for a multilayered neural network controller which does not require any information about the system dynamics. Lengthy training of the controller might be eliminated using the proposed approach. The implementation aspects of the proposed approaches to dynamic system control are discussed through case studies.
Keywords: Neural network, Learning control, Dynamic systems.
1. INTRODUCTION

Applications of neural networks to various areas such as signal processing and pattern recognition have been discussed in the literature. Neural network control systems have received increasing attention during the last few years. The main problem in applying neural nets to control system design is that the training signal to the network is not known in advance. The training signal is the desired control input signal to the plant, which drives the system output to the desired value. Therefore, we need either to estimate the desired control input or to obtain the (inverse) dynamic model of the plant off-line. Neural network approaches to control system design shown in the literature can be classified into two groups with respect to the implementation of their learning algorithms: the on-line approach and the off-line approach. In the on-line approach, the neural net controller is trained while the plant is controlled. This approach may not provide the desired performance during the learning period, but it may be suitable for time-varying plant systems. In the off-line approach, the neural net model of the plant system is obtained by training, and then that model is used to design the controller. The neural net of the inverse of the plant dynamics is often trained and then directly used as a feedforward controller. This approach may provide the desired performance for time-invariant plant systems but not for time-varying systems: any change in the plant dynamics requires training the neural net of the inverse dynamics again. Since learning or training periods of neural nets usually take a long time, the off-line approach may then not be preferred. Psaltis et al. [1] discussed a few learning architectures for multilayered neural network controllers: a general learning architecture, a specialized learning architecture and the combination of these two architectures.
The general architecture is an off-line approach. It learns the inverse of the plant, but this approach may require training the neural net over a larger operational range than is actually necessary. The specialized architecture can be used for on-line implementation but requires knowledge of the Jacobian of the plant system. Nguyen and Widrow [2] proposed a self-learning control system using two-layer neural networks. The neural net plant emulator is first trained and then the controller is trained with that emulator. The effectiveness of their approach was illustrated by computer simulation of a trailer truck backing into a loading dock. Sanner and Akin [3] proposed a neuromorphic pitch attitude regulator of an underwater telerobot using a three-layer neural network. The neural network controller is trained by the backpropagation synaptic update mechanism with a modified error signal based on the payoff function, and then this is used to regulate the pitch attitude of an underwater telerobot. A trainable adaptive controller using neural networks was also discussed by Guez and Selinsky [4]. The network was trained by a linear control law, a nonlinear control law and human operators as
its teachers, and then the trained network was used as a controller. A cart-pole system was used to illustrate their approach. Miller [5] discussed the real-time application of neural networks for sensor-based control of robots with vision. He utilized the cerebellar model arithmetic computer (CMAC) neural network model proposed by Albus [6]. Kawato et al. [7] proposed a two-layer neural net controller for robotic manipulators. The neural net controller is trained while the plant is controlled by a PD-type linear controller. Results of computer simulation with a 3 DOF robot showed that, as the neural net is trained, the neural net controller takes over the control action from the linear controller. Therefore, this approach can be used without an off-line training period. However, they conveniently chose the subsystems (nonlinear functions) based on information about the robot system dynamics. Therefore, their approach can be interpreted as one similar to nonlinear model-based adaptive control [8]. Yuh [9] proposed a three-layer neural net controller for underwater robotic vehicles. Instead of estimating the desired control signal (the training signal), the error signal between the desired control signal and the current control signal is estimated from the output velocity error. Results of computer simulation of planar motion of the vehicle showed that his approach can be used on-line and can adapt to changes in the vehicle dynamics and its environment. Jamshidi et al. [10] discussed the effects of the number of layers and the number of nodes per layer in the neural network application to robot control. Okuma et al. [11] proposed neural networks that do not learn inverse dynamic models but compensate for nonlinearities of robotic manipulators with the computed torque method. Future trends in control applications of neural networks are also discussed by Werbos [12] and McCusker [13].
In this paper, we consider a multilayered neural network controller with error estimation by the critic equation. Error backpropagation (EBP) is used by many researchers to train multilayered neural networks, and its detailed algorithm was discussed by Rumelhart et al. [14]. The parallel recursive prediction error (PRPE) algorithm is based on the conventional recursive prediction error algorithm [15], and its use in training the neural network was proposed by Chen et al. [16]. We will describe the PRPE algorithm and its application to dynamic system control. The presented control system will be implemented for two case problems: two-link robot control and yaw motion control of an underwater robotic vehicle. Results of computer simulation for each case will be discussed to evaluate the performance of the control system. This paper is organized as follows. In Section 2, the neural net control system and PRPE are described. In Section 3, two dynamic systems are introduced and the neural net controller is implemented to control these systems. Results of the simulation are discussed before the conclusions.
2. NEURAL NET CONTROLLER
In this study, we consider a multilayered neural network (Fig. 1) whose neurons in the input and output layers have a linear activation function, f(x) = x, while neurons in the hidden layers have a sigmoid activation function, f(x) = (e^x - e^-x)/(e^x + e^-x). Each layer except the output layer has
Fig. 1. A neural net control system.
a neuron receiving a constant input. Inputs to neurons in the input layer are the state error signals and a constant, while outputs of neurons in the output layer are the control signals to the dynamic system. The feedforward process computing the network outputs and the learning process adjusting the weights of the network are described in the following.
2.1. Feedforward process

The input signals propagate during the feedforward process from the input layer to the output layer, being modified at each neuron. The output of the jth neuron in the mth layer is computed by:

q_j^m(t) = f[W_j^m(t)^T q^{m-1}(t)],    (1)
where W_j^m is a weight matrix between the mth and (m - 1)th layers and q^{m-1} is an output vector of neurons in the (m - 1)th layer. The output vector of the input layer is:

q_input(t) = [e^T(t), d]^T,    (2)
where the vector e = y_d - y with y being the state vector, y_d its desired value, d is an arbitrary constant and the superscript T implies the transpose of the matrix. The output vector of the output layer is the control input U:

q_output = U.    (3)
2.2. Learning algorithm

We first discuss an off-line prediction error algorithm, followed by the description of its recursive form. Assume that a set of data (t = 1, ..., N) is available. Then, the weights (W) of the network that give the desired input-output mapping can be determined by minimizing the loss function:

J(W) = (1/2N) Σ_{t=1}^{N} {ε(t, W)^T ε(t, W)},    (4)

where N represents the number of measurements and ε is the network output error vector. The minimization of the loss function J in equation (4) can be achieved iteratively according to:

W_j^k = W_j^{k-1} + ρ S_j[W^{k-1}],    (5)
where k denotes the iteration step, W_j is a vector of weights associated with the jth neuron in the network, S_j[W] is a search direction based on the information about J acquired from the previous iteration, and ρ is a positive constant known as a learning rate. The search direction can be approximated by the modified Gauss-Newton algorithm:

S_j[W] = H_j^{-1}(W)[-∇J(W)],    (6)

where

H_j(W) = (1/N) Σ_{t=1}^{N} {ψ_j(t, W) ψ_j(t, W)^T},

-∇J(W) = (1/N) Σ_{t=1}^{N} {ψ_j(t, W) ε(t, W)}

and ψ_j(t, W) is the gradient of U(t, W) with respect to W_j. H_j(W) is the diagonal element of the approximate Hessian H(W) of the loss function. It is noted that equation (6) becomes the steepest-descent algorithm when H(W) is replaced by the identity matrix. The recursive approximation of this off-line algorithm [equations (5) and (6)] can be obtained as follows:

W_j(t) = W_j(t - 1) + ρ Γ_j(t) ψ_j(t) ε(t).    (7)

In equation (7), Γ_j is a recursive approximation of the inverse of the diagonal element of the Hessian:

Γ_j(t + 1) = (1/λ){Γ_j(t) - Γ_j(t) ψ_j(t + 1)[λI + ψ_j^T(t + 1) Γ_j(t) ψ_j(t + 1)]^{-1} ψ_j^T(t + 1) Γ_j(t)},    (8)
where λ is the forgetting factor. To set an upper bound on the eigenvalues of Γ_j, a constant trace method is used in this study. Therefore, the Γ_j matrix is updated at each time step as follows:

Γ̄_j(t + 1) = Γ_j(t) - Γ_j(t) ψ_j(t + 1)[λI + ψ_j^T(t + 1) Γ_j(t) ψ_j(t + 1)]^{-1} ψ_j^T(t + 1) Γ_j(t),    (9)

Γ_j(t + 1) = α Γ̄_j(t + 1)/trace[Γ̄_j(t + 1)],    (10)

where α is the positive constant trace of Γ_j and 0 < λ ≤ 1. As mentioned earlier, the above algorithm with a fixed Γ in equation (7) becomes the error backpropagation algorithm.
2.3. Critic equation

To implement the above learning algorithm, the desired control input U_d is needed to compute the error ε and train the weights in the network. However, unlike in the classic backpropagation setting, the desired mapping from network inputs to its outputs, the control signals, is unknown a priori (i.e. the training signal U_d is not known a priori). We use the concept of learning with a critic to train the network [17]. The error signal ε of the learning algorithm is estimated by the following critic equation:

e(t) = C*[e_v(t) + β e_x(t)] - C_u Ū,    (11)

where e_v = V_d(t) - V(t), e_x = X_d(t) - X(t), C* is a positive constant matrix with the appropriate dimensions, C_u is a semi-positive constant diagonal matrix, each element of the normalized control input vector Ū is U_i/U_{i,max}, V and X are the actual velocity and position vectors, V_d and X_d are the desired velocity and position vectors, and the constant β is chosen for a reasonable compensation of the position error. Instead of training the network with a teacher (a predetermined control law or a human operator), the network is trained by the critic signal based on the system performance and develops its own control strategy.
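A minimal sketch of the critic computation in equation (11); the argument names are illustrative, and the constant matrices are taken as given:

```python
import numpy as np

def critic_error(V_d, V, X_d, X, U, U_max, C_star, C_u, beta):
    """Estimate the training error via the critic equation (11):
        e(t) = C*[e_v(t) + beta * e_x(t)] - C_u * U_bar
    where U_bar is the control input normalized element-wise by its maximum.
    """
    e_v = V_d - V            # velocity error
    e_x = X_d - X            # position error
    U_bar = U / U_max        # normalized control input
    return C_star @ (e_v + beta * e_x) - C_u @ U_bar
```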
3. CASE STUDIES
We implement the neural net controller for two dynamic systems: the yaw motion of an underwater robotic vehicle and a two-link manipulator system. Results of computer simulation for each case will be discussed.

3.1. Case 1: yaw motion control of underwater robotic vehicle
Underwater robotic vehicles have difficult control problems, due not only to their nonlinear behavior but also to the high-density environment. The complete equations of 6 DOF vehicle motion can be found in Yuh [18]. The yaw motion control of the vehicle can be described by the following single-input-single-output (SISO) system:

I_z φ̈ = -N_rr |φ̇| φ̇ + u(t),    (12)
where I_z is the inertial term including the added mass, u is the input torque generated by the thrusters, φ and φ̇ are the yaw angle and its velocity, respectively, and N_rr is a drag coefficient. We used the network (3-3-3-1), meaning three neurons in the input layer, three neurons in each of the two hidden layers and one neuron in the output layer. The critic constant C* is estimated by u_max/φ̇_max. The maximum input torque is determined by the physical thruster system and the maximum velocity is usually known a priori from the operating conditions. In the simulation, the following numerical values were used: β = 0.1, α = 3.0, Γ(0) = diag[1], T = 10^-2 s, ρ = 0.9, λ = 0.9, C_u = 0.1, initial values of the weights were randomly chosen, and the repeated motion was generated by the desired yaw angle for each cycle shown in Fig. 2.

Fig. 2. Desired yaw angle for each cycle in the repeated motion.

Results of the simulation are shown in Figs 3-6:

1. We first investigated the performance of the presented control system and the results were compared with those of the error backpropagation algorithm. I_z and N_rr were fixed to be 150 and 375, respectively, u_max = 35 N m, and φ̇_max = 0.3 rad/s. For each cycle of the repeated motion, the mean square error (MSE) of the yaw angle was measured, as shown in Fig. 3. It is observed that the neural network is well trained. The output error is reduced and remains very small as time goes by. The presented learning algorithm shows a faster convergence rate than the error backpropagation algorithm.

2. We also investigated the effect of system parameter changes. After the first two cycles, where (I_z, N_rr) were fixed to be (150, 375), (I_z, N_rr) were changed every cycle between (300, 750) and (150, 375). Figures 4 and 5 show the yaw angle error and its velocity error, respectively, and Fig. 6 shows the input torque signals. The results show that, after adjusting the weights of the network during the first cycle, the presented control system provides good performance in spite of the system parameter changes.
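For reference, the yaw plant of equation (12) can be stepped forward with a simple Euler scheme. This is only a sketch under an assumed integration method (the paper does not state its scheme); the defaults are the parameter values of the first simulation:

```python
def yaw_step(phi, phi_dot, u, dt=1e-2, I_z=150.0, N_rr=375.0):
    """One Euler step of the yaw dynamics of equation (12):
        I_z * phi_ddot = -N_rr * |phi_dot| * phi_dot + u(t)
    Defaults: I_z = 150, N_rr = 375, T = 10^-2 s as in the first case study.
    """
    phi_ddot = (-N_rr * abs(phi_dot) * phi_dot + u) / I_z
    return phi + dt * phi_dot, phi_dot + dt * phi_ddot
```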
3.2. Case 2: two-link manipulator control

Consider a two-link manipulator system described by:

u_1 = m_2 l_2^2 (ẍ_1 + ẍ_2) + m_2 l_1 l_2 (2ẍ_1 + ẍ_2) c_2 + (m_1 + m_2) l_1^2 ẍ_1 - m_2 l_1 l_2 ẋ_2^2 s_2 - 2 m_2 l_1 l_2 ẋ_1 ẋ_2 s_2 + m_2 l_2 g s_12 + (m_1 + m_2) l_1 g s_1 + v_1 ẋ_1,    (13)

u_2 = m_2 l_1 l_2 c_2 ẍ_1 + m_2 l_1 l_2 s_2 ẋ_1^2 + m_2 l_2 g s_12 + m_2 l_2^2 (ẍ_1 + ẍ_2) + v_2 ẋ_2,    (14)
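For illustration, equations (13) and (14) can be evaluated directly for a given joint state. The function below is a sketch (the names are mine), with defaults set to the simulation values quoted later in this section:

```python
import numpy as np

def two_link_torques(x, xd, xdd, l1=1.0, l2=1.0, m1=6.0, m2=1.5,
                     v1=5.5, v2=2.6, g=9.8):
    """Joint torques u1, u2 of the two-link arm, equations (13)-(14).

    x, xd, xdd : joint angles, velocities and accelerations (length 2 each).
    """
    s1, s2 = np.sin(x[0]), np.sin(x[1])
    s12, c2 = np.sin(x[0] + x[1]), np.cos(x[1])
    u1 = (m2 * l2**2 * (xdd[0] + xdd[1])
          + m2 * l1 * l2 * (2 * xdd[0] + xdd[1]) * c2
          + (m1 + m2) * l1**2 * xdd[0]
          - m2 * l1 * l2 * xd[1]**2 * s2
          - 2 * m2 * l1 * l2 * xd[0] * xd[1] * s2
          + m2 * l2 * g * s12
          + (m1 + m2) * l1 * g * s1
          + v1 * xd[0])
    u2 = (m2 * l1 * l2 * c2 * xdd[0]
          + m2 * l1 * l2 * s2 * xd[0]**2
          + m2 * l2 * g * s12
          + m2 * l2**2 * (xdd[0] + xdd[1])
          + v2 * xd[1])
    return u1, u2
```

At rest with both links hanging at zero angle, every term vanishes; with link 1 horizontal, only the gravity terms contribute.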
Fig. 3. Mean square error of yaw angle for each cycle with PRPE and EBP.
Fig. 4. Yaw angle error with parameter changes.
Fig. 5. Yaw angular velocity error with parameter changes.

Fig. 6. Input torque with parameter changes.
Fig. 7. Desired joint angle of each joint for each cycle in the repeated motion.
where x_i (i = 1, 2) is the angular displacement of each link, ẋ_i and ẍ_i are its velocity and acceleration, respectively, l_i is the length of each link, m_i is the point mass at the end of each link, v_i is the viscous friction coefficient of each link, s_1 = sin(x_1), s_2 = sin(x_2), c_2 = cos(x_2) and s_12 = sin(x_1 + x_2). In the simulation, we used the network (5-5-5-2) and the following numerical values: l_1 = l_2 = 1 m, m_1 = 6 kg, v_1 = 5.5 N m s, v_2 = 2.6 N m s, g = 9.8 m/s^2, β = 0.1, α = 5.0, Γ(0) = diag[1], T = 10^-3 s, λ = 0.9, ρ = 0.9, C_u = 0.1, initial values of the weights were randomly chosen, and the same desired joint angle for each joint was used as shown in Fig. 7. Results of the simulation are shown in Figs 8-12:

1. We first investigated the performance of the presented control system and the results were compared with those of the error backpropagation algorithm. m_2 was fixed to be 1.5 kg and the diagonal terms of C* were estimated by u_1,max = 2 kN m, u_2,max = 1 kN m and ẋ_max = 2 rad/s. For each cycle of the repeated motion, the mean square error (MSE) of each joint angle was measured. Results by PRPE and EBP are shown in Figs 8 and 9, respectively. It is observed that the neural network is well trained. The output error by PRPE is reduced as time goes by. The output error by EBP is smaller overall than the one by PRPE, and PRPE shows a faster convergence rate than EBP.
Fig. 8. Mean square error of joint angle 1 for each cycle with PRPE.
Fig. 9. Mean square error of joint angle 1 for each cycle with EBP.

Fig. 10. Joint angle error with parameter changes.

Fig. 11. Joint angular velocity error with parameter changes.

Fig. 12. Joint torque inputs with parameter changes (solid line: joint 1; dotted line: joint 2).
2. We also investigated the effect of system parameter changes. Every two cycles, m_2 was changed between 1.5 and 3 kg. Figures 10 and 11 show the joint angle error and its velocity error, respectively. We present the results for joint 1 since both joints show nearly the same error profiles. Figure 12 shows the input torque signals, where the dotted line is for joint 2 and the solid line is for joint 1. The results show that the presented control system provides good performance in spite of the system parameter changes.

4. CONCLUSIONS

In this study, we considered a direct neural network approach to control applications using the concept of learning with a critic. We described the PRPE learning algorithm, which has a faster convergence rate than the EBP algorithm. The presented control system does not require any information about the system. Results of case studies with two dynamic systems show that the presented control system has the capability of learning and adapting to system parameter changes without a lengthy off-line training period. Future study on this subject includes theoretical investigation of the global stability of the control system.

Acknowledgements--This work was partially supported by (R/OE-13) the University of Hawaii Sea Grant College Program under Institutional Grant No. NA89AA-D-SG063 from the NOAA Office of Sea Grant, Department of Commerce, and partially by the National Science Foundation under Contract No. BCS91-57896. This is Sea Grant Publication UNIHI-SEAGRANT-JC-92-16.
REFERENCES

1. D. Psaltis, A. Sideris and A. A. Yamamura, A multilayered neural network controller. IEEE Control Syst. Mag. Apr., 17-20 (1988).
2. D. H. Nguyen and B. Widrow, Neural networks for self-learning control systems. IEEE Control Syst. Mag. Apr., 18-23 (1990).
3. R. M. Sanner and D. L. Akin, Neuromorphic pitch attitude regulation of an underwater telerobot. IEEE Control Syst. Mag. Apr., 62-68 (1990).
4. A. Guez and J. Selinsky, A trainable neuromorphic controller. J. Robot. Syst. 5, 363-388 (1988).
5. W. T. Miller, Real-time application of neural networks for sensor-based control of robots with vision. IEEE Trans. Syst. Man Cybernet. 19, 825-831 (1989).
6. J. S. Albus, A new approach to manipulator control: the cerebellar model articulation controller (CMAC). J. Dynam. Syst. Measmt Control 97, 220-227 (1975).
7. M. Kawato, Y. Uno, M. Isobe and R. Suzuki, Hierarchical neural network model for voluntary movement with application to robotics. IEEE Control Syst. Mag. 8-15 (1988).
8. J. J. Craig, P. Hsu and S. S. Sastry, Adaptive control of mechanical manipulators. Int. J. Robot. Res. 6, 16-28 (1987).
9. J. Yuh, A neural net controller for underwater robotic vehicles. IEEE J. Ocean. Engng 15, 161-166 (1990).
10. M. Jamshidi, B. Horne and N. Vadiee, A neural network-based controller for a two-link robot. Proc. 29th CDC, pp. 3256-3257 (1990).
11. S. Okuma, A. Ishiguro, T. Furuhashi and Y. Uchikawa, A neural network compensator for uncertainties of robotic manipulators. Proc. 29th CDC, pp. 3303-3307 (1990).
12. P. J. Werbos, Neural networks for control and system identification. Proc. 28th Conf. Decision and Control, pp. 260-265 (1989).
13. T. McCusker, Neural networks and fuzzy logic, tools of promise for controls. Control Engng May, 84-85 (1990).
14. D. E. Rumelhart, J. L. McClelland et al., Parallel Distributed Processing. MIT Press, Cambridge, MA (1986).
15. L. Ljung and T. Soderstrom, Theory and Practice of Recursive Identification. MIT Press, Cambridge, MA (1983).
16. S. Chen, C. F. N. Cowan, S. A. Billings and P. M. Grant, Parallel recursive prediction error algorithm for training layered neural networks. Int. J. Control 51, 1215-1228 (1990).
17. B. Widrow, N. K. Gupta and S. Maitra, Punish/reward: learning with a critic in adaptive threshold systems. IEEE Trans. Syst. Man Cybernet. SMC-3, 455-465 (1973).
18. J. Yuh, Modeling and control of underwater robotic vehicles. IEEE Trans. Syst. Man Cybernet. 20, 1475-1482 (1990).
19. J. Yuh, R. Lakshmi, S. K. Lee and J. Oh, Adaptive neural net controller for robotic manipulators. Robotics and Manufacturing. ASME Press (1990).
AUTHOR'S BIOGRAPHY

Junku Yuh received the B.S. degree in mechanics and design from Seoul National University, Korea in 1981 and the M.S. and Ph.D. degrees in mechanical engineering from Oregon State University in 1982 and 1986, respectively. He is currently an Associate Professor of Mechanical Engineering at the University of Hawaii, Honolulu. His current research interests include intelligent control, neural network control, fault-tolerant redundant control, robotics, underwater robotic vehicles and flexible space robots. Dr Yuh received the 1989 DOW Outstanding Young Faculty Award, the 1991 Boeing Faculty Award, the 1991 Fujio Matsuda Fellow Award and the 1991 NSF Presidential Young Investigator Award. He served as a Guest Editor of the Special Issue of the Journal of Robotic Systems on Underwater Robotics, June 1991. He has also been involved in organizing various international meetings. He is a member of Phi Kappa Phi, ASME, IEEE, ASEE, SME/RI, KSEA and ISOPE, and is listed in Who's Who in the World.