Alternative Designs of Neural Network Based Autopilots: A Comparative Study


Copyright © IFAC Manoeuvring and Control of Marine Craft, Brijuni, Croatia, 1997

Grant E. Hearn, Y. Zhang and P. Sen
Department of Marine Technology, University of Newcastle upon Tyne, Newcastle upon Tyne NE1 7RU, UK

Abstract: This paper extends earlier work on the development of neural network (NN) based ship control systems (Zhang, et al., 1997a). A recent overview of various NN controllers (Sen, et al., 1997) reveals that the "intensive training" strategy developed by Zhang, et al. (1997a) differs from the commonly used "iterative training" strategy of Psaltis, et al. (1988) and Saerens and Soquet (1991). Based on the observed advantages and disadvantages of both training strategies, this paper investigates a new training approach, designated "moderate training", which selectively incorporates both the iterative and intensive training methodologies. The underlying idea is to combine the good features of both approaches by selecting, in an adaptive manner, an appropriate number of training iterations for each sampling interval. A series of course-keeping/course-changing simulations is undertaken for each of the three training strategies, and comparisons are made to demonstrate the improved properties of the moderate training strategy.

Keywords: Neural networks, autopilots, direct neural control, moderate training.



1. INTRODUCTION: WHICH CONTROL SCHEME?

Neural networks have been seen for some years now as providing considerable promise for applications in process control. A state-of-the-art overview of the various neural control schemes reported in the literature has been provided by Sen, et al. (1997). Here the distinctive characteristics of these schemes and some marine applications are briefly outlined.



Supervised control (Fig. 1) is intended to duplicate the skills of a human trainer, or to mimic the behaviour of an existing controller. Clearly off-line training is required before the neural network controller (NNC) can replace the trainer. In marine applications supervised control has been used to copy some existing controller (Endo, et al., 1989; Witt and Miller, 1993; Mort, et al., 1993; Burns, 1995), or to mimic the behaviours of a human operator (Enab, 1996).

Direct inverse control (Fig. 2) involves inverse modelling (the dotted line) and open-loop control (the solid line). Once the plant inverse is obtained, it is directly cascaded with the plant as a controller. This architecture is based on the assumption that a one-to-one mapping from the input to the output exists. An example of using direct inverse control for ship steering can be found in the work of Simensen and Murray-Smith (1995).

Fig. 1. Supervised control

Fig. 2. Direct inverse control

Indirect control (Fig. 3) uses two neural networks to form the whole system. At the outset a neural network emulator (NNE) is trained off-line to represent the plant response (also called forward modelling). Then the NNE can be used as a "channel" through which the parameters of the NNC are adjusted. Although widely used, the accuracy of indirect control depends heavily on the quality of the NNE. In addition, the NNE must be re-trained to cope with new situations.

Fig. 3. Indirect control

Direct control (Fig. 4) was originally named specialised learning by Psaltis, et al. (1988). Given some modest amount of knowledge about the plant sensitivity derivative (i.e., $\partial y/\partial u$), the NNC learns continually from the direct evaluation of the control accuracy, hence avoiding the need for the off-line training required by the other control schemes discussed above.

Fig. 4. Direct control

Other NN based control schemes are internal model control, model predictive control, model-reference adaptive control and feedforward-feedback control; see Sen, et al. (1997). Among these various NN based control schemes, direct control (Fig. 4) has been selected in ship control research undertaken at Newcastle. The following good features of direct control may justify this choice:

• The design of the NNC is independent of the mathematical model of the plant.
• There is no specific off-line training phase; the NNC is immediately operational. No trainer is required; the NNC adjusts its parameters by directly evaluating its performance accuracy.
• Direct control can cope with new situations (such as different ships and different environmental conditions) by dynamically learning the interactions between the plant and the environment.

2. WHICH TRAINING STRATEGY?

As stated earlier the NN control scheme used for autopilot design represents an extension of direct control, as shown in Fig. 5. Here $\psi$, $r$ and $\delta$ represent the ship heading, yaw rate and the rudder angle respectively. The time step is denoted by subscripts $k$, $k-1$, $k-2$, etc. The superscript $d$ means the desired value. A rudder limiter in the form of a ramp function is used between the rudder command ($\delta_k^c$) generated by the NNC and the actual rudder ($\delta_k$) acting on the ship. Consistent with our earlier reported studies (Zhang, et al., 1997a) the cost function $E_k$ is written in the form

$$E_k = \frac{1}{2}\left[\rho\,(\psi_k^d - \psi_k)^2 + \lambda\,\delta_k^2 + \sigma\,r_k^2\right], \tag{1}$$

where $\rho$, $\lambda$ and $\sigma$ are weighting constants. This form is meant to penalise the excessive use of rudder and to reduce the added drag due to frequent changes in yaw rate.

Fig. 5. Neural network autopilot

Given the structure of the direct neural controller, there are in fact three different training strategies. From our reading of the literature it is apparent that the most often used approach is based on the following procedure. At time $k$ the measured ship heading is used to formulate the cost function $E_k$. The connective weights of the network, $w_{ij}$, are then adjusted using the backpropagation algorithm (Rumelhart, et al., 1986):

$$w_{ij}(n+1) = w_{ij}(n) - \eta\,\frac{\partial E_k}{\partial w_{ij}}, \tag{2}$$

where $n$ denotes the number of training iterations, and $\eta$ is a learning-rate parameter. Note that within each sampling interval weight-adjustment is carried out only once; the updated weight is then kept for the next sampling interval. We define this training procedure, as shown in Fig. 6(a), as "iterative training". Here the whole training process is exclusively based on the weights updated from the very beginning, hence the relatively long term behaviour of the plant can be "memorised" in the form of network weights. As training converges, the plant can gradually follow a static or a moving target. However, fairly large tracking errors can be expected for a considerably long period of time before convergence is achieved.

To overcome this problem we have proposed and successfully applied "intensive training", as shown in Fig. 6(b). Instead of using one training iteration, multiple training iterations are now involved within each sampling interval. Here the network weights obtained from the previous interval are not maintained; the updating procedure starts from completely new random values, that is, the weights are refreshed at the beginning of each sampling interval. The advantage of such intensive training is the fast achievement of relatively higher control accuracy, making the control system immediately operational from the start. However, the learning-rate parameter ($\eta$) and the maximum number of training iterations ($n_{\max}$) allowed in each sampling interval must be selected with care. In other words they need to be properly chosen to balance sufficient (but not excessive) network training against the on-line control requirement. To achieve this, some pre-testing is required.

The obvious next level of development is to provide a training strategy which can combine the good features of the previous two strategies and remove their disadvantages. Such a new strategy, shown in Fig. 6(c), is called "moderate training". Here the number of training iterations specified for each sampling interval is not fixed but selected dynamically as a function of the cost function: if there is little improvement in the reduction of $E_k$, network training is terminated. In the new interval the previously obtained weights are not discarded but used as starting values for the new updating process. By doing so, it is expected that the NNC can maintain a relatively high control accuracy at the initial stage, exhibit a degree of long-term learning, and improve the control performance over time.

Fig. 6. Different training strategies: (a) iterative training; (b) intensive training; (c) moderate training

3. USE OF BACKPROPAGATION ALGORITHM

Based on the gradient descent method, the backpropagation algorithm of Eq. (2) adjusts the network weights in order to minimise the cost function specified by Eq. (1). Using the chain rule, it follows that the required gradient of $E_k$ with respect to $w_{ij}$ is determined by

$$\frac{dE_k}{dw_{ij}} = \frac{dE_k}{d\delta_k}\,\frac{\partial \delta_k}{\partial \delta_k^c}\,\frac{\partial \delta_k^c}{\partial w_{ij}} = \left(\frac{\partial E_k}{\partial \psi_k}\frac{\partial \psi_k}{\partial \delta_k} + \frac{\partial E_k}{\partial \delta_k} + \frac{\partial E_k}{\partial r_k}\frac{\partial r_k}{\partial \delta_k}\right)\frac{\partial \delta_k}{\partial \delta_k^c}\,\frac{\partial \delta_k^c}{\partial w_{ij}}. \tag{3}$$

Note that the quantity $\partial \delta_k^c/\partial w_{ij}$ in Eq. (3) can be readily obtained from the backpropagation algorithm, but the calculations of $\partial \psi_k/\partial \delta_k$ and $\partial r_k/\partial \delta_k$ will need in general some mathematical expression of the ship dynamics. This is unlikely to be available for an unknown system. To circumvent this requirement it is suggested that $\partial \psi_k/\partial \delta_k$ be replaced by $\mathrm{sign}(\partial \psi_k/\partial \delta_k)$ and $\partial r_k/\partial \delta_k$ by $\mathrm{sign}(\partial r_k/\partial \delta_k)$. This is consistent with the work of Saerens and Soquet (1991). In a similar manner, $\partial \delta_k/\partial \delta_k^c$ for the rudder limiter is also replaced by the sign of the appropriate ratio, so that Eq. (3) is modified to have the following form

$$\frac{\partial E_k}{\partial w_{ij}} \approx \left[-\rho\,(\psi_k^d - \psi_k)\,\mathrm{sign}\!\left(\frac{\partial \psi_k}{\partial \delta_k}\right) + \lambda\,\delta_k + \sigma\,r_k\,\mathrm{sign}\!\left(\frac{\partial r_k}{\partial \delta_k}\right)\right]\mathrm{sign}\!\left(\frac{\partial \delta_k}{\partial \delta_k^c}\right)\frac{\partial \delta_k^c}{\partial w_{ij}}. \tag{4}$$

Here, $\mathrm{sign}(\partial \psi_k/\partial \delta_k)$ and $\mathrm{sign}(\partial r_k/\partial \delta_k)$ are readily known from a qualitative understanding of the ship behaviour. On the basis of the sign conventions consistent with Zhang, et al. (1997a), a positive increase in rudder $\delta_k$ will reduce the ship heading $\psi_k$ and suppress the yaw rate $r_k$. Therefore, it is clear that

$$\mathrm{sign}\!\left(\frac{\partial \psi_k}{\partial \delta_k}\right) = -1 \tag{5}$$

and

$$\mathrm{sign}\!\left(\frac{\partial r_k}{\partial \delta_k}\right) = -1. \tag{6}$$

Similarly, since $\delta_k$ will increase or decrease as $\delta_k^c$ increases or decreases,

$$\mathrm{sign}\!\left(\frac{\partial \delta_k}{\partial \delta_k^c}\right) = +1. \tag{7}$$

Note that the three training strategies discussed in Section 2 do not really affect the way in which Eq. (4) is written. However, the control performances do rely on the specific training strategy used. This will be demonstrated in the next Section.
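As an illustration of how Eq. (2) and Eq. (4) fit together, the following Python sketch performs a single weight update using only measured quantities and the signs stated above (a positive rudder reduces heading and yaw rate; the actual rudder follows the rudder command). It is a sketch under stated assumptions, not the network used in this study: the one-neuron `controller`, its two inputs and the use of radians are hypothetical choices made for brevity, whereas in the autopilot the last factor of Eq. (4) is obtained by backpropagation through the full NNC.

```python
import math

RHO, LAM, SIGMA = 0.5, 0.1, 0.1  # weighting constants of Eq. (1), values from Section 4
ETA = 0.5                         # learning-rate parameter of Eq. (2)
DELTA_MAX = math.radians(30.0)    # rudder limit, expressed in radians (assumption)


def cost(psi_d, psi, delta, r):
    """Cost function E_k of Eq. (1)."""
    return 0.5 * (RHO * (psi_d - psi) ** 2 + LAM * delta ** 2 + SIGMA * r ** 2)


def outer_gradient(psi_d, psi, delta, r, s_psi=-1.0, s_r=-1.0, s_lim=1.0):
    """Sign-approximated dE_k/d(delta_c): the bracket of Eq. (4) times the limiter sign.

    s_psi and s_r are -1 because a positive rudder reduces heading and yaw rate;
    s_lim is +1 because the actual rudder follows the rudder command.
    """
    bracket = -RHO * (psi_d - psi) * s_psi + LAM * delta + SIGMA * r * s_r
    return bracket * s_lim


def controller(weights, inputs):
    """Hypothetical one-neuron NNC: delta_c = DELTA_MAX * tanh(w . x)."""
    z = sum(w * x for w, x in zip(weights, inputs))
    return DELTA_MAX * math.tanh(z)


def update_weights(weights, inputs, psi_d, psi, delta, r):
    """One training iteration of Eq. (2) using the gradient of Eq. (4)."""
    z = sum(w * x for w, x in zip(weights, inputs))
    d_delta_c_dz = DELTA_MAX * (1.0 - math.tanh(z) ** 2)  # derivative of the neuron output
    g = outer_gradient(psi_d, psi, delta, r)
    return [w - ETA * g * d_delta_c_dz * x for w, x in zip(weights, inputs)]


# Usage: heading error of 0.1 rad, ship initially at rest on heading 0.
inputs = [0.1, 0.0]  # assumed inputs: heading error and yaw rate
new_weights = update_weights([0.0, 0.0], inputs, psi_d=0.1, psi=0.0, delta=0.0, r=0.0)
rudder_command = controller(new_weights, inputs)
```

With a positive heading error the update drives the rudder command negative, which under the sign convention above turns the ship towards the desired heading.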


4. SIMULATIONS

The task of the autopilot is to find appropriate rudder angles to perform course-keeping/course-changing under different environmental conditions. In this study the reference course is specified by

$$\psi^d = \begin{cases} \ldots, & t \in [0\,\mathrm{s},\,100\,\mathrm{s}), \\ \ldots, & t \in [100\,\mathrm{s},\,200\,\mathrm{s}), \\ \ldots, & t \in [200\,\mathrm{s},\,250\,\mathrm{s}), \\ \ldots, & t \in [250\,\mathrm{s},\,400\,\mathrm{s}]. \end{cases} \tag{8}$$

The reason for selecting such a course sequence is to test the NNC's response with different step sizes and different time periods allowed to complete (or otherwise) the course-changing manoeuvres. In all the simulation results provided, the weighting constants in Eq. (1) are: $\rho = 0.5$, $\lambda = 0.1$ and $\sigma = 0.1$. The learning-rate parameter $\eta$ in Eq. (2) takes the value 0.5. The rudder angle and its rate are constrained to $\delta \in [-30^\circ, 30^\circ]$ and $\dot{\delta} \in [-2.5^\circ/\mathrm{s}, 2.5^\circ/\mathrm{s}]$ respectively. The initial speed of the ship is 15 knots. For each training strategy the NN autopilot is tested under the following two scenarios: (a) under the ideal condition of no wind and no measurement noise, and (b) subject to random wind disturbances and measurement noise.

4.1 Results from Iterative Training

As stated in Section 2, iterative training is on a "one training iteration within one sampling interval" basis. Since the network weights will only converge to their optimal values after a sufficient number of training iterations (from some hundreds to even thousands, depending on the specific control problem and network structure), the same number of sampling intervals will be required to achieve stable control performance. Obviously if a sampling interval is of the order of seconds (in common with most marine control studies), the NNC using iterative training can hardly converge for the reference course designated in Eq. (8). In order to observe how "iterative training" works, we deliberately set one sampling interval as 0.1 s. Simulation results are shown in Fig. 7. It is clear that the rudder angle tends to oscillate during the transient periods (i.e., when there is a sudden change in the reference course), especially when random wind and measurement noise are added. As network training continues the weights gradually converge and hence the output oscillation eventually disappears. Obviously iterative training is suitable for situations where the reference course does not change very often. Further, one would expect that the NNC's performance will converge much faster if the sampling interval is assigned a smaller value (say 0.01 s). However, a shorter sampling interval means a heavier load for data acquisition, and this might not be feasible in marine control practice.

Fig. 7. NNC performance using "iterative training": (a) no wind, no noise; (b) under wind plus noise

4.2 Results from Intensive Training

The next simulation is to test intensive training for the same course-keeping/changing task. Here a maximum number of training iterations, $n_{\max}$, is specified within each sampling interval. The choice of $n_{\max}$ should meet the following two requirements. Firstly, $n_{\max}$ should be large enough to allow sufficient (but not excessive) training so that the NNC can immediately generate appropriate control signals. Secondly, $n_{\max}$ should be small enough to allow on-line training, that is, the network training should be completed within one sampling interval. The value of $n_{\max}$ should also be chosen together with the value of $\eta$. Some pre-testing is therefore needed to find a suitable value of $n_{\max}$. A detailed explanation of the selection of $n_{\max}$ can be found in an earlier reported study (Zhang, et al., 1997b). In the simulation results provided in this paper (see Fig. 8), $n_{\max}$ is selected as 400. In Fig. 8 some high frequency oscillations in rudder are observed. This, as has been discussed by Zhang, et al. (1997b), is caused by excessive training. If $n_{\max} = 300$, such oscillations can be effectively reduced. Compared to Fig. 7, it is clearly seen that the control performance using intensive training is significantly improved. Because of the consistent control actions during the transient period, the response of the ship heading is quicker and the overshoot is smaller than in the iterative training case. The remaining problem of intensive training appears to be the occurrence of high frequency oscillations in rudder, which can only be reduced by carefully selecting the $n_{\max}$ (and $\eta$) values.
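The three training strategies differ only in how many weight updates are applied within one sampling interval and in whether the weights are carried over from the previous interval. A minimal Python sketch of that difference follows. All names are hypothetical: `step` stands for one application of Eq. (2) with the gradient of Eq. (4), `evaluate` stands for the cost $E_k$ of Eq. (1), and the random initialisation range, the tolerance `tol` and the safety cap `n_cap` are assumptions that the text does not specify. A toy quadratic cost is used so that the sketch runs on its own.

```python
import random


def iterative_training(weights, step):
    """Iterative training: a single update per sampling interval, weights carried over."""
    return step(weights), 1


def intensive_training(n_weights, step, n_max, rng):
    """Intensive training: restart from random weights, then apply n_max updates."""
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n_weights)]  # range is an assumption
    for _ in range(n_max):
        weights = step(weights)
    return weights, n_max


def moderate_training(weights, step, evaluate, n_cap=1000, tol=1e-4):
    """Moderate training: keep the previous weights and stop once E_k barely improves.

    tol and the safety cap n_cap are assumptions; the text only states that
    training terminates when there is little improvement in the reduction of E_k.
    """
    e_prev = evaluate(weights)
    for n in range(1, n_cap + 1):
        candidate = step(weights)
        e_new = evaluate(candidate)
        improvement = e_prev - e_new
        if improvement > 0.0:   # accept only updates that reduce the cost
            weights, e_prev = candidate, e_new
        if improvement < tol:   # little improvement: stop for this interval
            return weights, n
    return weights, n_cap


# Toy stand-ins so the sketch runs: a quadratic cost and its gradient-descent step.
def toy_cost(weights):
    return 0.5 * sum(w * w for w in weights)


def toy_step(weights, eta=0.5):
    return [w - eta * w for w in weights]


w_it, n_it = iterative_training([1.0], toy_step)
w_in, n_in = intensive_training(1, toy_step, n_max=400, rng=random.Random(0))
w_mo, n_mo = moderate_training([1.0], toy_step, toy_cost)
```

On this toy problem the moderate strategy stops after a handful of updates, whereas the intensive strategy always spends its full budget of 400; the iteration count is chosen by the stopping rule rather than by pre-testing.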

Fig. 8. NNC performance using "intensive training": (a) no wind, no noise; (b) under wind plus noise

4.3 Results from Moderate Training

The basic idea of moderate training is to dynamically select the number of training iterations in each sampling interval. In particular, if there is no significant reduction in $E_k$, network training is then terminated. Simulation results are provided in Fig. 9. Here no pre-testing is required to identify the appropriate $n_{\max}$; it is selected automatically. Comparing with Fig. 8 it is found that although there is no obvious improvement in the ship heading accuracy, the high frequency oscillations in rudder have been successfully removed.

Fig. 9. NNC performance using "moderate training": (a) no wind, no noise; (b) under wind plus noise

5. DISCUSSIONS AND CONCLUDING REMARKS

The direct neural control scheme has been investigated for some time at Newcastle for various ship control problems. The most attractive features of the proposed NNCs are their independence of a mathematical model of the ship and their adaptive ability to cope with new situations. In this paper some further insights have been provided to compare different training strategies in the framework of the direct neural control scheme. The simulation-based study shows that the proposed "moderate training" method can effectively identify the appropriate training effort required in each sampling interval, so that reasonable control actions can be generated from the start of the NNC's operations. Because of the time and page limitations, this study has concentrated only on the single-input single-output (SISO) autopilot. Future work will investigate the feasibility of moderate training for SIMO and MIMO ship control applications including track-keeping/changing, automatic berthing and integrated automatic navigation systems.

REFERENCES

Burns, R. S. (1995), "The use of artificial neural networks for the intelligent optimal control of surface ships," IEEE Journal of Oceanic Engineering, Vol. 20, No. 1, pp. 65-72.

Enab, Y. M. (1996), "Intelligent controller design for the ship steering problem," IEE Proceedings-D: Control Theory and Applications, Vol. 143, No. 1, pp. 17-24.

Endo, M., Amerongen, J. van and Bakkers, A. W. P. (1989), "Applicability of neural networks to ship steering," Proc. of IFAC Workshop on Expert Systems and Signal Processing in Marine Automation, Lyngby, Denmark, pp. 221-232.

Mort, N., Derradji, D. A., Tiano, A. and Ranzi, A. (1993), "Application of neural networks to marine vehicle control," Proc. 10th Ship Control Systems Symposium (10th SCSS), Vol. 2, pp. 287-305.

Psaltis, D., Sideris, A. and Yamamura, A. A. (1988), "A multi-layered neural network controller," IEEE Control Systems Magazine, Vol. 8, No. 2, pp. 17-21.

Rumelhart, D. E., Hinton, G. E. and Williams, R. J. (1986), "Learning internal representations by error propagation," in Parallel Distributed Processing, Vol. 1, (D. E. Rumelhart and J. L. McClelland, eds.), Cambridge, MA: MIT Press, Chapter 8, pp. 319-362.

Saerens, M. and Soquet, A. (1991), "Neural controllers based on back-propagation algorithm," IEE Proceedings-F: Radar and Signal Processing, Vol. 138, No. 1, pp. 56-62.

Sen, P., Hearn, G. E. and Zhang, Y. (1997), "Adaptive neural controllers," a chapter to appear in Neural Network Systems Techniques and Applications, (Cornelius T. Leondes, Ed.), Academic Press, San Diego, California, USA, 74 pp.

Simensen, R. and Murray-Smith, D. J. (1995), "Simulation of artificial neural networks for ship steering control," Proc. 2nd Conference of UK Simulation Society (UKSS'95), Scotland, pp. 65-72.

Witt, N. A. J. and Miller, K. M. (1993), "A neural network autopilot for ship control," Proc. 3rd International Conference on Maritime Communications and Control, London, pp. 47-59.

Zhang, Y., Hearn, G. E. and Sen, P. (1997a), "Neural network approaches to a class of ship control problems (Part I. Theoretical design: Part II. Simulation studies)," Proc. 11th Ship Control Systems Symposium (11th SCSS), Southampton, UK, Vol. 1, pp. 115-133, and pp. 135-150.

Zhang, Y., Hearn, G. E. and Sen, P. (1997b), "A multivariable neural controller for automatic ship berthing," IEEE Control Systems Magazine, Vol. 17, No. 4, 15 pp.
