Computers and Chemical Engineering Supplement (1999) S297-S300 © 1999 Elsevier Science Ltd. All rights reserved.
RECEDING HORIZON CONTROL USING MODIFIED ITERATIVE DYNAMIC PROGRAMMING AND NEURAL NETWORK MODELS

A. RUSNAK, M. FIKAR AND A. MESZAROS
Slovak University of Technology, Faculty of Chemical Technology, Radlinskeho 9, 812 37 Bratislava, Slovakia
Phone: (+42) 7 395 269, Fax: (+42) 7 393 198, E-mail: [email protected]
Abstract

The basic idea of the contribution is to replace a mathematical model of the process by an equivalent neural network (NN) that mimics the phenomenological model and is used as a process predictor in the modified iterative dynamic programming (IDP) control algorithm. IDP is a very useful technique for solving unconstrained and constrained dynamic optimisation problems. The original IDP method was developed for continuous systems within the state-space formulation. The modified algorithm uses a learned NN as a process predictor. The algorithm modifications resulting from this type of model include several important issues that arise from the use of discrete-time and input-output model formulations. Moreover, there are also some significant problems, not to be overlooked, stemming from the receding horizon implementation of the method. The contribution discusses all these issues. The benefits of the proposed approach are the small number of iterations required to converge to the global optimum, the ability to handle multivariable constrained systems, and a significant time reduction compared to the original IDP method.
Keywords: iterative dynamic programming, neural networks, optimal control
Introduction

It is well known that linear control theory can be inadequate for non-linear chemical processes. This has led to increased activity in the area of non-linear process control, especially in the past decade. Non-linear process control can be divided into two main areas: optimisation and transformation methods. Optimisation-based methods have excellent performance but lack theoretical analysis. On the other side, transformation-based methods have a good theoretical background but are limited only to certain types of non-linear systems. Most control methods require a model of the system to be controlled. Therefore, the performance of a control method is mainly affected by the quality of the model. Mathematical models of systems may be very complex and have high computational time and memory requirements. Among many optimisation methods, iterative dynamic programming (IDP), developed by Luus (1989), has some good properties that have attracted the attention of many researchers. The method is easy to implement, is quite robust, and does not involve the solution of a non-linear programming (NLP) problem to obtain an optimum. The main drawback of the original IDP is the enormous computational load caused by the integration of first-principle models, which makes it unsuitable for on-line implementation. The second disadvantage lies in the determination of off-line optimal policies only. In the presence of modelling errors and disturbances this can lead to sustained offset, resulting in off-spec products.
To obtain improved performance of the IDP algorithm and to avoid the problems with the complexities of the system model, we implemented an ANN as a system predictor in the IDP algorithm instead of the mathematical model of the controlled system.

Artificial neural networks

The use of ANN in control theory has recently attracted a great deal of attention, primarily because ANN appear to provide a convenient means for modelling complicated non-linear processes at low cost. ANN models are non-linear black-box models (Hunt et al., 1992). They have good general approximation capabilities for modelling complex non-linear processes because they are able to match the input/output behaviour of any continuous non-linear system.

System identification by ANN
Consider a system whose input/output relationship can be written as:

y = f(u)    (1)

where y is the plant output, u is the plant input, and f(.) is an unknown, possibly non-linear function. Our aim is to design an ANN which generates some function g(.) that closely approximates the unknown function f(.), using input/output data from the process. The identification can be described as follows:

ŷ = g(U, W)    (2)
where ŷ is the ANN prediction of y from (1), U is the ANN input vector and W is the weight vector to be identified. Each layer in the ANN structure consists of an appropriate number of neurons, which are the fundamental building blocks. All neurons in a given layer are usually fully connected to the neurons in the next layer. From the large number of ANN architectures, the most popular is the feedforward neural network, often called the backpropagation network. It consists of one input layer, one hidden layer, and one output layer (Hunt et al., 1992). Every neuron consists of synaptic input connections and a single output. The output of a neuron is calculated as follows:

out = f(inp) = f( Σ_{i=1}^{n-1} x_i w_i + w_n )    (3)
where x_i are the neuron inputs, w_i are the weights, and w_n is an additional input called the bias or threshold coefficient. The function f(.) is the activation function of the neuron; the sigmoid function is commonly used.

The proposed approach is based on the idea of using the ANN as a model of the system to be controlled. In comparison with "classical" approaches that use a mathematical model of the process (ordinary differential equations), the proposed improvement decreases the time consumption caused by integrating the mathematical model of the controlled system. Use of the feedforward ANN as a multi-step predictor can generally give bad predictions over a trajectory (prediction horizon), because errors are amplified when inaccurate network outputs are recycled to the input layer. Therefore, the better choice is to use a recurrent ANN as the multi-step predictor. Two classes of recurrent ANN appear in the literature, the Grossberg/Hopfield recurrent ANN and the time-lag recurrent ANN (Su and McAvoy, 1992; Werbos, 1990). As control actions are based on the prediction of the plant behaviour, an offset can occur due to disturbances and model mismatch when the ANN is used as the dynamic model of the controlled process. Therefore, the plant output is predicted at each sampling time as follows:
y(t) = f(U, W) + d(t)    (4)
where d is a disturbance computed by the following equation:

d(t+i) = d(t) = y(t) - ŷ(t)    (5)
where y is the current measured value of the plant output and ŷ is the prediction of y generated by the ANN predictor. This disturbance is assumed to be constant over the prediction horizon. At each sampling period the following signals are fed into the ANN predictor: past and present plant outputs together with past and proposed future inputs. The ANN predictor calculates predictions of the plant outputs over the relevant horizon, which are corrected with the deviation calculated by eq. (5) at time t. These corrected model predictions are used in the IDP algorithm instead of the predictions obtained from a mathematical model.
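For concreteness, the following minimal Python sketch shows how such a disturbance-corrected multi-step prediction could be organised. The one-step predictor g, its argument layout, and the lag bookkeeping are assumptions made for illustration, not the authors' implementation.

```python
def predict_horizon(g, y_hist, u_hist, u_future, y_meas):
    """Disturbance-corrected multi-step prediction (eqs. 4-5), sketch only.

    g        : hypothetical trained one-step ANN predictor,
               y_hat(t) = g(output lags, input lags), cf. eqs. (2) and (6)
    y_hist   : past outputs  [y(t-1), y(t-2), ...], newest first
    u_hist   : past inputs   [u(t-1), u(t-2), ...], newest first
    u_future : proposed future inputs [u(t), u(t+1), ...] over the horizon
    y_meas   : current measured plant output y(t)
    """
    # eq. (5): d(t) = y(t) - y_hat(t), kept constant over the whole horizon
    d = y_meas - g(y_hist, u_hist)

    y_lags = [y_meas] + y_hist[:-1]      # output lags now end at y(t)
    u_lags = list(u_hist)
    y_pred = []
    for u_k in u_future:
        u_lags = [u_k] + u_lags[:-1]     # proposed input enters the lag window
        y_hat = g(y_lags, u_lags) + d    # eq. (4): corrected prediction
        y_lags = [y_hat] + y_lags[:-1]   # recycle the prediction (recurrent use)
        y_pred.append(y_hat)
    return y_pred
```

Note that only the single correction d computed at time t is reused at every step of the horizon, mirroring the constant-disturbance assumption of eq. (5).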
Iterative Dynamic Programming

The original IDP method has been developed for continuous systems within the state-space formulation. As ANN models are more naturally derived as discrete input/output systems, the original method has been modified and is given below. IDP is based on Bellman's principle of optimality. The method does not require detailed knowledge of the functional relationship between the parameters being optimised and the objective function being minimised, as is required in gradient-based methods. The basic principle of IDP is to optimise P single stages in turn, starting at the last stage, instead of optimising all P stages simultaneously. Thus, whenever the final k stages of the optimal control have been established, the preceding stage may be obtained by simply considering one new stage and then continuing with the already established control policy in the remaining stages. A detailed theoretical analysis of IDP can be found in Bojkov and Luus (1993, 1995) and Dadebo and McAuley (1995).

Problem formulation
Consider the discrete-time system:

y(t) = f(y(t-1), y(t-2), ..., u(t-1), u(t-2), ...)    (6)

and suppose that the input u(t) has to be within limits:

u_min ≤ u(t) ≤ u_max    (7)

The associated performance index to be optimised is:

I = F(y(t), u(t), t, t_f)    (8)

where the final time t_f is given. The optimal control problem is to find the control policy u(t) in the time interval 0 ≤ t ≤ t_f. This interval is divided into P stages, each of length L:

t_f = P L    (9)

The problem is to find a piecewise constant control policy u_1, ..., u_P as an approximation to the continuous policy, which minimises the performance index given by (8).
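As a small illustration of this parameterisation, the sketch below expands stage controls u_1, ..., u_P into a sample-by-sample input sequence and clips each value to the admissible limits of eq. (7). The helper name and the number of samples held per stage are assumptions of the example.

```python
def expand_policy(stage_controls, samples_per_stage, u_min, u_max):
    """Expand the piecewise-constant policy u_1 ... u_P (eq. 9) into a control
    value for every sampling instant; each stage value is held for
    samples_per_stage sampling periods and clipped to the region of eq. (7)."""
    u_seq = []
    for u_k in stage_controls:
        u_clipped = min(max(u_k, u_min), u_max)
        u_seq.extend([u_clipped] * samples_per_stage)
    return u_seq

# e.g. P = 4 stages, each held for 3 sampling periods
u_seq = expand_policy([0.10, 0.25, 0.20, 0.15], 3, u_min=0.0, u_max=1.0)
```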
Modified IDP algorithm

The proposed algorithm can be described by the following steps:

1) Choose the parameters: P - number of stages, each of length L; M - number of generated control actions; N - number of y-grid points; r^0 - initial size of the control region; γ - contraction factor, γ ∈ [0.7, 0.9]; N_t - number of iterations;
n - number of steps for which the output trajectories are compared.

2) Choose N control trajectories by perturbing the initial control trajectory u^0 (or the optimal control trajectory u^(i-1) obtained in the previous iteration) uniformly inside the admissible region.

3) Use the N control trajectories to obtain N system output trajectories for the interval [0, t_f] using the neural model of the system, and store these output trajectories.

4) Start at the last stage P, corresponding to the time interval [t_f - L, t_f]. For each y-grid point generate M admissible values of the control by:

u_j^i = u_j^(i-1) + D r^(i-1)    (10)
where i is the iteration index, D is a diagonal random matrix of dimensions [M, M] with elements within the range [-1, 1], and u_j^(i-1) is the best value of the control action for the particular y-grid point obtained in the previous iteration. Apply these control actions to the neural model and obtain the set of corresponding output trajectories at this stage. Compute the M values of the cost function (performance index) and choose the control action that minimises the cost function given by (8). Store all N best control actions for the next step.

5) Step back to stage P-1, corresponding to the time interval [t_f - 2L, t_f - L], and again generate the M control actions given by (10) and simulate the system over the intervals P-1 and P. In the interval P the best control policy is chosen as follows. The current output trajectory is compared with the N stored output trajectories on the interval [P - n, P - 1]. The trajectory that is closest to the current one is chosen and yields the best control action on stage P. Then compare the M different values of the objective function and store the control action that gives the minimum value.

6) Repeat the previous step until the initial time t = 0 is reached. Store the control policy that minimises the performance index (8).

7) Reduce the region for the admissible control values by the contraction factor γ:

r^i = γ r^(i-1)    (11)
The iteration index is increased and the procedure is repeated for a specified number of iterations. After N_t iterations the obtained control trajectory is applied to the controlled system. The general schematic diagram of the proposed control algorithm is depicted in Fig. 1.
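The control-generation and region-contraction steps (eqs. 10 and 11) can be sketched as follows. Here evaluate_cost is a hypothetical callback that simulates the ANN predictor from the current stage to t_f and returns the performance index (8); the random-number handling is likewise an assumption of this sketch, not the authors' code.

```python
import random

def idp_stage_update(u_prev_best, r, M, u_min, u_max, evaluate_cost):
    """Generate M candidate controls around the previous best (eq. 10),
    clip them to the admissible region (eq. 7) and keep the candidate with
    the smallest cost; evaluate_cost stands in for a simulation of the ANN
    model from this stage onward, returning the performance index (8)."""
    best_u = u_prev_best
    best_J = evaluate_cost(best_u)
    for _ in range(M):
        u_cand = u_prev_best + random.uniform(-1.0, 1.0) * r   # eq. (10)
        u_cand = min(max(u_cand, u_min), u_max)                 # eq. (7)
        J = evaluate_cost(u_cand)
        if J < best_J:
            best_u, best_J = u_cand, J
    return best_u, best_J

def contract_region(r, gamma=0.8):
    """eq. (11): shrink the admissible control region after each pass."""
    return gamma * r
```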
Fig. 1  The controller structure based on IDP and ANN model
The original IDP method solves only the problem of dynamic optimisation, i.e. only an open-loop control strategy is calculated. When the method is restated within the receding horizon framework, only the first control move obtained is actually implemented. If some control trajectory has already been calculated, at the next sampling time the initialisation of IDP is based on this trajectory and on the assumption that the last control increment is zero. Therefore, the new optimal control trajectory is probably close to the initial one and it is not necessary to perform N_t iterations. Other convergence criteria may be defined, such as the average decrease of the cost over the last few iterations. Also, the initial size of the control region r^0 may be reduced. However, this should only be done with some precaution and together with testing whether some of the closed-loop inputs (reference, disturbance) have not changed substantially.

Simulation results

To demonstrate the performance and the feasibility of this approach we have applied it to the control of a continuous-flow, stirred biochemical reactor. The model of the fermenter describes the growth of Saccharomyces cerevisiae on glucose with continuous feeding (Meszaros et al., 1995). In the simulations the described algorithm was applied to the control of the dissolved oxygen concentration c_o(t) using the gas dilution rate D_g(t) as the manipulated variable, with a sampling time of 0.5 h. The structure of the ANN predictor used was [6, 5, 3, 1]: six neurons in the input layer with inputs y(t), y(t-1), y(t-2), u(t), u(t-1), u(t-2); five and three neurons in the first and second hidden layer, respectively; and one neuron in the output layer. The IDP parameters were set as follows: P = 8, M = 5, N = 10, L = 3, r = 0.8, N_t = 50, n = 3. The associated performance index to be optimised was:
J = Σ_{j=N1}^{N2} (y*(t+j) - ŷ(t+j))^2 + λ Σ_{j=1}^{N2} Δu^2(t+j-1)

where the parameters were set as follows: N1 = 1, N2 = 8, λ = 0.
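A compact sketch of this cost is given below; the indexing convention (predictions stored for t+1 ... t+N2) and the argument names are assumptions of the example rather than the authors' code.

```python
def performance_index(y_ref, y_pred, u_seq, u_prev, N1=1, N2=8, lam=0.0):
    """Receding-horizon cost from the equation above: squared tracking
    errors for j = N1..N2 plus a lambda-weighted penalty on squared control
    increments for j = 1..N2.

    y_ref, y_pred : reference and predicted outputs, index 0 = time t+1
    u_seq         : proposed inputs u(t), u(t+1), ..., u(t+N2-1)
    u_prev        : input applied at the previous sampling time, u(t-1)
    """
    J = sum((y_ref[j] - y_pred[j]) ** 2 for j in range(N1 - 1, N2))
    du = [u_seq[0] - u_prev]
    du += [u_seq[j] - u_seq[j - 1] for j in range(1, N2)]
    return J + lam * sum(dk ** 2 for dk in du)
```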
Fig. 2  Receding horizon ANN-IDP control (top: system output c_o and desired output c_o* versus simulation time [h]; bottom: computational time [s])
Figure 2 shows the results. It can be seen that both the steady-state and transient behaviours are satisfactory. Only when the desired dissolved oxygen concentration c_o* is 0.2·10^-5 mol/l are some problems encountered. At this operating point the static gain of the process changes dramatically and a discrepancy between the process and its ANN model results. However, the conditions in this simulation are very drastic, as λ = 0 and thus no control penalisation was applied. When non-zero values of λ were tested, the control actions were fully satisfactory. The bottom graph shows the actual computational time when the program runs on a PC with a Pentium 100 MHz processor. One outer iteration of IDP (calculation of the whole optimal control trajectory) takes only a few seconds, which is far less than in the usual IDP references (cf. Fikar et al., 1998, with about 3 min for one inner iteration on a 200 MHz Pentium PRO CPU).

Conclusions

This contribution has dealt with a modification of the IDP method that can be used with discrete-time models and within a receding horizon formulation. The process model has been approximated by a recurrent ANN that serves as a multi-step predictor. The reason for this type of model is a dramatic decrease of the computational load needed for a single IDP iteration and the subsequent possibility of using IDP on-line. Because the method deals with an input-output system formulation rather than a state-space one, some parts of the original IDP have been modified. Also, the receding horizon formulation leads to different strategies for the termination of the method. The results have shown that on-line implementation of the modified method is possible. The replacement of the mathematical model by an equivalent ANN in the optimisation step takes advantage of high-speed processing, since simulation with the ANN involves only a few non-iterative algebraic calculations. One disadvantage of this approach can be the time consumption of the training phase of the NN model.
Acknowledgements

The authors are very pleased to acknowledge the financial support of the Scientific Grant Agency of the Ministry of Education of the Slovak Republic and the Slovak Academy of Sciences under grants No. 95/5195/198 and 1/5220/98.
References

Bojkov, B., Luus, R., 1993, Evaluation of the parameters used in iterative dynamic programming. Can. J. Chem. Engng. 71, 451-459.

Bojkov, B., Luus, R., 1995, Time optimal control of high dimensional systems by iterative dynamic programming. Can. J. Chem. Engng. 73, 380-390.

Chmurny, D., Prokop, R., and Bakosova, M., 1988, Automatic Control of Technological Processes. Alfa, Bratislava (in Slovak).

Dadebo, S. A., McAuley, K. B., 1995, Dynamic optimization of constrained chemical engineering problems using dynamic programming. Comp. Chem. Engng. 19, 513-525.

Fikar, M., Latifi, M. A., Fournier, F., and Creff, Y., 1998, Application of IDP to optimal control of a distillation column. Can. J. Chem. Engng. (accepted).

Hunt, K. J., Sbarbaro, D., Zbikowski, R., and Gawthrop, P. J., 1992, Neural networks for control systems - a survey. Automatica 28, 1083-1112.

Luus, R., 1989, Optimal control by dynamic programming using accessible points and region reduction. Hung. J. Ind. Chem. 17, 523-543.

Meszaros, A., Brdys, M., Tatjewski, P., and Lednicky, P., 1995, Multilayer adaptive control of continuous bioprocesses using optimising control technique. Case study: Baker's yeast culture. Bioprocess Engineering 12, 1-9.

Najim, K., Rusnak, A., Meszaros, A., and Fikar, M., 1997, Constrained long-range predictive control based on artificial neural networks. Int. J. Sys. Sci. 28, 1211-1226.

Su, H. T., McAvoy, T. J., 1992, Long-term predictions of chemical processes using recurrent neural networks: a parallel training approach. Ind. Eng. Chem. Res. 31, 1338-1352.

Werbos, P., 1990, Backpropagation through time: what it does and how to do it. Proc. of the IEEE 78, 1550-1560.