Training of neural models for predictive control




Neurocomputing 73 (2010) 1332–1343



Maciej Ławryńczuk

Institute of Control and Computation Engineering, Faculty of Electronics and Information Technology, Warsaw University of Technology, ul. Nowowiejska 15/19, 00-665 Warsaw, Poland



Article history: Received 28 January 2009; received in revised form 3 September 2009; accepted 14 December 2009; available online 11 January 2010. Communicated by R. Tadeusiewicz.

This paper emphasises the link between the training of neural models and their role in model predictive control (MPC) algorithms. This role is of fundamental importance since in MPC at each sampling instant a model is used on-line to calculate predictions of the future behaviour of the process and an optimal future control policy. Taking into account this particular function of models in MPC, a training algorithm for neural dynamic models is derived. An example identification problem of a methanol–water distillation process is discussed. The prediction accuracy of models obtained using the described algorithm is compared with that of models obtained using the classical backpropagation scheme, which yields one-step ahead predictors.

Keywords: Model predictive control; Neural networks; Identification; Optimisation

1. Introduction

Model predictive control (MPC) is recognised as the only advanced control technique (i.e. more advanced than the classical PID approach) which has been exceptionally successful in practical applications [31]. MPC has influenced not only the directions of development of industrial control systems but also research in this field [21,31,33,38]. MPC algorithms have numerous advantages. In particular, unlike any other control technique, they make it possible to take into account constraints imposed on process inputs (manipulated variables) and outputs (controlled variables). Constraints usually determine the quality, economic efficiency and safety of production. Secondly, MPC techniques are very efficient in multivariable process control, i.e. for processes with many inputs and outputs. Moreover, MPC algorithms can be efficiently used for processes with difficult dynamic properties, e.g. with significant time-delays or the inverse response.

The role of a model in MPC algorithms is crucial. In MPC at each sampling instant a model of the process is used on-line to calculate predictions of the future behaviour of the system over some time horizon. These predictions are next used for optimisation of a future control policy. As a result, MPC algorithms are strongly model-based: they give good control only provided that the predictions calculated from the model are accurate enough. An inaccurate model gives erroneous predictions.



If the mismatch between the predicted and the real behaviour of the process is significant, the whole optimisation process aimed at finding the optimal future control policy is useless. In such a case it is reasonable to use less advanced control algorithms. MPC techniques based on easy-to-find linear models are frequently used in practice [31]. In many cases the obtained control accuracy is sufficient and much better than that of the classical PID approach. Because the properties of many technological processes are actually nonlinear, different nonlinear MPC techniques have been developed [10,24,31,38]. In comparison with MPC algorithms based on linear models, they make it possible to significantly improve the control quality.

The main measures of model utility are: approximation accuracy, suitability for control, ease of development and, in some cases, physical interpretation [27]. Fundamental (first-principle) models [16,22], although potentially very precise, are usually not suitable for on-line control. Such models comprise systems of nonlinear differential and algebraic equations which have to be solved on-line in MPC at each sampling instant. Such an approach is usually computationally demanding, as fundamental models can be very complex and may lead to numerical problems in MPC (e.g. stiffness, ill-conditioning). Moreover, in many cases the development of fundamental models is difficult.

The incorporation of neural networks [9] into MPC has a few advantages. First of all, neural networks are universal approximators [11], which means that a multilayer perceptron with at least one hidden layer can approximate any smooth function to an arbitrary degree of accuracy. Hence, they can be efficiently used for modelling of various technological processes [12].


Moreover, neural networks have a relatively small number of parameters in comparison with other model types (e.g. polynomials, fuzzy models) and their structure is simple. This is particularly important for multivariable processes with many input and output variables. In general, neural models can be used in MPC algorithms for nonlinear processes when the fundamental model is not available or the fundamental model is too complex to be used on-line in MPC. Neural models directly describe the input–output relations of process variables. In consequence, numerical problems typical of MPC algorithms based on comprehensive fundamental models are not encountered: it is not necessary to solve on-line in MPC complicated systems of nonlinear differential and algebraic equations to calculate the future control policy.

Numerous variants of MPC algorithms which use neural models have been developed. If the neural model is used directly in MPC, at each sampling instant the control policy must be calculated by a nonlinear optimisation routine [1,23,26,38,40,41]. As low computational complexity determines the reliability of the whole control system, suboptimal MPC approaches are recommended, in which the neural model is locally linearised on-line and the control policy is calculated from an easy-to-solve quadratic programming problem; the necessity of nonlinear optimisation is avoided [4,20,26,38]. Since neural models are able to learn from examples, they can also be used in adaptive MPC algorithms in which the model is adjusted on-line when the parameters of the process change [15,40]. Stable [18,28] and robust [29] versions of MPC algorithms with neural models have also been developed.

Undoubtedly, the model identification phase [14] has to be related to its further application in MPC. Surprisingly, this issue is paid little attention in the literature. As emphasised in [34], the norm in identification is usually to focus on identification as an end in itself; the purpose of finding the model is rather secondary. On the other hand, in control engineering obtaining the model is only an intermediate step towards developing the control algorithm. The future role of the model in control cannot be ignored during identification. The model used in MPC has to be able to make good predictions of future behaviour of the process not only one step ahead, but over the whole prediction horizon. Because in MPC algorithms predictions over a long-range horizon are usually calculated, it is natural to take this fact into account as early as at the identification stage. Conceptually, this should lead to specific models whose long-range prediction accuracy is better than that of models whose identification ignores their further use in MPC.

Neural network models are usually trained using the rudimentary backpropagation algorithm [9], which yields classical one-step ahead predictors. Such predictors are not suited to be used recursively in MPC for long-range prediction, since predictions depend recursively on predictions calculated for previous sampling instants within the prediction horizon. Unfortunately, this fact is ignored during training. To solve the problem resulting from the inaccuracy of one-step ahead predictors in nonlinear MPC two approaches can be used. First of all, neural models can be trained by means of general recurrent training algorithms [25,26,32,37], but these are not devised with MPC in mind. An alternative, straightforward solution is to choose the structure of the model in such a way that its role in MPC is not ignored.
For example, one can use a multi-model in which for each sampling instant within the prediction horizon one independent submodel is used [5,7,13,34]. For nonlinear processes, neural multi-models can be developed [19]; consecutive submodels are trained easily as one-step ahead predictors. Alternatively, only one structured model can be used [17], which, analogously to multi-models, has the unique ability to predict future values of the output without taking into account previous predictions calculated within the prediction horizon.


Unfortunately, when the prediction horizon is long, multi-models and structured models have many parameters, which may turn out to be a disadvantage.

This paper describes an algorithm for training (identification) of neural models which directly takes into account the fact that these models are next used recursively in MPC for long-range prediction. The general case of multiple-input single-output (MISO) dynamic processes is discussed. Unlike multi-models, the neural model considered in this paper has the classical multilayer perceptron (MLP) structure [9]; good prediction abilities result from using a specific training approach. During training an efficient quasi-Newton BFGS (Broyden–Fletcher–Goldfarb–Shanno) optimisation algorithm [2,6] with analytical first-order derivatives and an approximated inverse matrix of second-order derivatives (the Hessian matrix) is used.

The outline of the paper is as follows. First, Section 2 shortly introduces the principle of MPC and Section 3 defines the structure of the dynamic neural model. The main part of the article, given in Section 4, details the training algorithm. In order to demonstrate the usefulness of the discussed algorithm a methanol–water distillation process is considered. The accuracy of neural models obtained by means of the standard backpropagation training algorithm, which yields one-step ahead predictors, and by the discussed approach is compared in Section 5, and the paper is concluded in Section 6.

2. Model predictive control problem formulation

In MPC algorithms [21,33,38] at each sampling instant k a set of future control increments is calculated:

\Delta u(k) = [\Delta u(k|k) \; \Delta u(k+1|k) \; \ldots \; \Delta u(k+N_u-1|k)]^T    (1)

It is assumed that \Delta u(k+p|k) = 0 for p \ge N_u, where N_u is the control horizon. The notation \Delta u(k+p|k) denotes a prediction of the increment of control signals for the future sampling instant k+p calculated at the current sampling instant k. The objective of MPC is to minimise differences between the reference trajectory y^{ref}(k+p|k) and the predicted output values \hat{y}(k+p|k) over the prediction horizon N \ge N_u and to penalise excessive control increments. Future control increments (1) are calculated in such a way that the following quadratic cost function is minimised on-line:

J(k) = \sum_{p=1}^{N} \| y^{ref}(k+p|k) - \hat{y}(k+p|k) \|^2_{M_p} + \sum_{p=0}^{N_u-1} \| \Delta u(k+p|k) \|^2_{\Lambda_p}    (2)

In this paper multiple-input multiple-output (MIMO) processes with n_u inputs (manipulated variables) and n_y outputs (controlled variables) are considered, i.e. u(k) \in R^{n_u}, y(k) \in R^{n_y}. M_p \ge 0 and \Lambda_p > 0 are weighting matrices of dimensionality n_y x n_y and n_u x n_u, respectively. Only the first n_u elements of the determined future control sequence (1) are actually applied to the process (i.e. the control moves for the current sampling instant k), u(k) = \Delta u(k|k) + u(k-1). At the next sampling instant, k+1, the output measurements are updated, the prediction is shifted one step forward and the whole procedure is repeated. Fig. 1 illustrates the general idea of MPC and shows the notation used; for simplicity of presentation a single-input single-output (SISO) process is assumed. Although the parameters of MPC algorithms (N, N_u, M_p and \Lambda_p) are usually adjusted experimentally, different tuning criteria can be found, e.g. [21,33,38].
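To make the cost function (2) concrete, the following minimal Python sketch evaluates J(k) for given reference, predicted output and control-increment sequences. The array shapes, constant weighting matrices and all names (mpc_cost, M, Lambda) are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def mpc_cost(y_ref, y_hat, du, M, Lambda):
    """Evaluate the MPC cost function (2).

    y_ref, y_hat : arrays of shape (N, ny)  -- reference and predicted outputs for p = 1..N
    du           : array of shape (Nu, nu)  -- future control increments for p = 0..Nu-1
    M, Lambda    : weighting matrices (ny x ny) and (nu x nu), here taken constant over the horizon
    """
    J = 0.0
    for p in range(y_ref.shape[0]):        # output tracking term, p = 1..N
        e = y_ref[p] - y_hat[p]
        J += e @ M @ e
    for p in range(du.shape[0]):           # control increment penalty, p = 0..Nu-1
        J += du[p] @ Lambda @ du[p]
    return J

# Example: N = 10, Nu = 3, SISO process
N, Nu = 10, 3
J = mpc_cost(np.ones((N, 1)), np.zeros((N, 1)), 0.1 * np.ones((Nu, 1)),
             M=np.eye(1), Lambda=0.5 * np.eye(1))
```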



Fig. 1. The principle of predictive control, k—the current sampling instant.

MPC algorithms directly use an explicit dynamic model in order to predict the future behaviour of the process, i.e. to calculate the predicted values of the output variable, \hat{y}(k+p|k), over the prediction horizon (p = 1,\ldots,N), which are next used to determine the optimal control policy (1). That is why the role of the model in MPC is crucial. MPC techniques are very model-based: the accuracy of the model significantly affects the prediction accuracy and, in consequence, the quality of control. Predicted values of output variables are calculated using the model recurrently. The general prediction equation is

\hat{y}(k+p|k) = y(k+p|k) + d(k)    (3)

where the quantities y(k+p|k) are outputs of the model for the future sampling instant k+p calculated at the current sampling instant k, p = 1,\ldots,N. Unmeasured disturbances d(k) are usually assumed to be constant over the prediction horizon [38]. They are estimated from

d(k) = y(k) - y(k|k)    (4)

where the quantities y(k) are measured while y(k|k) are calculated from the model at the sampling instant k using available measurements up to the sampling instant k-1.

Usually, the MPC cost function (2) is optimised subject to constraints imposed on the magnitude and increments of input variables as well as on the magnitude of output variables. Constraints imposed on input variables result from physical limits of actuators, whereas constraints imposed on output variables usually result from technological and economic requirements. Unlike in other control techniques, constraints are easily taken into account in MPC: they are simply a part of the optimisation problem which is solved on-line at each sampling instant. As emphasised in the introduction, the ability to satisfy constraints is a great advantage of MPC and has determined its industrial success.
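The following Python sketch shows how one receding-horizon step described above could be set up with an off-the-shelf solver for a SISO process: the future increments are optimised subject to magnitude and increment constraints and only the first move is applied. The predictor interface, the penalty weight and all names (mpc_step, predict) are assumptions made for illustration; the disturbance correction (3)–(4) is assumed to be handled inside the predictor.

```python
import numpy as np
from scipy.optimize import minimize

def mpc_step(u_prev, y_meas, predict, y_ref, N, Nu, du_max, u_min, u_max, lam=0.5):
    """One receding-horizon step: optimise future increments, return the first move.

    predict(u_seq, y_meas) is assumed to return model predictions y(k+p|k), p = 1..N,
    for a given sequence of future input values.
    """
    def cost(du):
        u_seq = u_prev + np.cumsum(du)                                  # future input values from increments
        u_seq = np.concatenate([u_seq, np.repeat(u_seq[-1], N - Nu)])   # du = 0 for p >= Nu
        y_hat = predict(u_seq, y_meas)
        return np.sum((y_ref - y_hat) ** 2) + lam * np.sum(du ** 2)

    # increment and magnitude constraints on the manipulated variable
    bnds = [(-du_max, du_max)] * Nu
    cons = ([{'type': 'ineq', 'fun': lambda du, i=i: u_max - (u_prev + np.sum(du[:i+1]))} for i in range(Nu)]
            + [{'type': 'ineq', 'fun': lambda du, i=i: (u_prev + np.sum(du[:i+1])) - u_min} for i in range(Nu)])
    res = minimize(cost, np.zeros(Nu), bounds=bnds, constraints=cons)
    return u_prev + res.x[0]                                            # apply only the first computed move
```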

3. Structure of the neural model

Fig. 2. The structure of the MISO neural model.

It is assumed that the model of the MIMO process under consideration, with n_u inputs and n_y outputs, comprises n_y multiple-input single-output (MISO) models. These models are trained independently. Hence, in the following part of the article the problem of training a MISO model is considered. Let the MISO model be described by the following nonlinear discrete-time equation:

y(k) = f(u_1(k-\tau^1),\ldots,u_1(k-n_B^1),\ldots,u_{n_u}(k-\tau^{n_u}),\ldots,u_{n_u}(k-n_B^{n_u}), y(k-1),\ldots,y(k-n_A))    (5)

where the positive integers n_A and n_B^n for n = 1,\ldots,n_u define the order of model dynamics, the positive integers \tau^n denote discrete-time delays of the model (\tau^n \le n_B^n), and f: R^{n_A + \sum_{n=1}^{n_u}(n_B^n - \tau^n + 1)} \to R is the nonlinear function realised by the neural model. As the model of the whole MIMO process, n_y feedforward neural networks of the multilayer perceptron (MLP) type with one nonlinear hidden layer and a linear output [9] are used. The structure of the MISO neural model is depicted in Fig. 2. The output of the model is expressed as

y(k) = \sum_{i=0}^{K} w^2(i) v_i(k) = w^2(0) + \sum_{i=1}^{K} w^2(i) \varphi(z_i(k))    (6)

where z_i(k) and v_i(k) are the sum of inputs and the output of the i-th hidden node, respectively, v_0(k) = 1 for all k, \varphi: R \to R is the nonlinear transfer function (e.g. the hyperbolic tangent) of the hidden layer, and K is the number of nonlinear hidden nodes. Recalling the input arguments of the general MISO nonlinear model (5), one has

z_i(k) = w^1(i,0) + \sum_{n=1}^{n_u} \sum_{k'=1}^{I_u(n)} w^1(i, R^n + k') u_n(k - \tau^n + 1 - k') + \sum_{k'=1}^{n_A} w^1(i, I_u + k') y(k - k')    (7)

Weights of the network are denoted by w^1(i,j), i = 1,\ldots,K, j = 0,\ldots,n_A + \sum_{n=1}^{n_u}(n_B^n - \tau^n + 1), and w^2(i), i = 0,\ldots,K, for the first and the second layer, respectively. The number of the network's input nodes depending on the input signal u_n is I_u(n) = n_B^n - \tau^n + 1, n = 1,\ldots,n_u. The total number of weights is (n_A + \sum_{n=1}^{n_u}(n_B^n - \tau^n + 1) + 1)K + K + 1. Auxiliary coefficients are

R^n = 0 for n = 1, \quad R^n = \sum_{i=1}^{n-1} I_u(i) for n = 2,\ldots,n_u    (8)

and

I_u = \sum_{i=1}^{n_u} I_u(i)    (9)

The total number of input nodes of the neural model is I = I_u + n_A.
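A minimal Python sketch of the one-step model output (6)–(7) for a single-input case is given below; the weight-array layout, the tanh activation and all names are illustrative assumptions consistent with the notation above.

```python
import numpy as np

def mlp_output(w1, w2, u_past, y_past):
    """One-step model output y(k), Eqs. (6)-(7), for a SISO case (nu = 1, tau = 1).

    w1 : array of shape (K, 1 + nB + nA)  -- first-layer weights w1(i, 0..I)
    w2 : array of shape (K + 1,)          -- second-layer weights w2(0..K)
    u_past : array [u(k-1), ..., u(k-nB)]
    y_past : array [y(k-1), ..., y(k-nA)]
    """
    x = np.concatenate(([1.0], u_past, y_past))   # bias input plus the network inputs
    z = w1 @ x                                    # sums of inputs of hidden nodes, z_i(k)
    v = np.tanh(z)                                # hidden-node outputs, v_i(k)
    return w2[0] + w2[1:] @ v                     # linear output layer, Eq. (6)

# Example: K = 7 hidden nodes, nB = nA = 2
K, nB, nA = 7, 2, 2
rng = np.random.default_rng(0)
w1 = 0.1 * rng.standard_normal((K, 1 + nB + nA))
w2 = 0.1 * rng.standard_normal(K + 1)
y_k = mlp_output(w1, w2, u_past=np.array([0.2, 0.1]), y_past=np.array([0.0, -0.1]))
```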

In MPC the neural model is used to calculate consecutive predictions over the whole prediction horizon. Using model (6) at the current sampling instant k to find the prediction for the future sampling instant k+p, one has

y(k+p|k) = \sum_{i=0}^{K} w^2(i) v_i(k+p|k) = w^2(0) + \sum_{i=1}^{K} w^2(i) \varphi(z_i(k+p|k))    (10)

The quantities z_i(k+p|k), and consequently y(k+p|k), depend on the control signal values applied to the plant at previous sampling instants (i.e. u_n(k-1), u_n(k-2),\ldots), on future control signals (i.e. the decision variables of the MPC algorithm u_n(k|k), u_n(k+1|k),\ldots), on values of the plant output signal measured at previous sampling instants (i.e. y(k-1), y(k-2),\ldots) and on future output predictions (i.e. \hat{y}(k+1|k), \hat{y}(k+2|k),\ldots). From (7) one has

z_i(k+p|k) = w^1(i,0) + \sum_{n=1}^{n_u} \sum_{k'=1}^{I_{uf}(n,p)} w^1(i, R^n + k') u_n(k - \tau^n + 1 - k' + p|k)
           + \sum_{n=1}^{n_u} \sum_{k'=I_{uf}(n,p)+1}^{I_u(n)} w^1(i, R^n + k') u_n(k - \tau^n + 1 - k' + p)
           + \sum_{k'=1}^{I_{yp}(p)} w^1(i, I_u + k') \hat{y}(k - k' + p|k)
           + \sum_{k'=I_{yp}(p)+1}^{n_A} w^1(i, I_u + k') y(k - k' + p)    (11)

where I_{uf}(n,p) = \max\{\min\{p - \tau^n + 1, I_u(n)\}, 0\} is the number of the network's input nodes depending on future control signals of the n-th input and I_{yp}(p) = \min\{p-1, n_A\} is the number of the network's input nodes depending on output predictions.
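The sketch below illustrates the recursion (10)–(11) for a SISO second-order model: the one-step model is applied repeatedly over the horizon and its own predictions are fed back as output regressors. The regressor conventions, the fixed order n_A = n_B = 2 and the names (predict_horizon, model) are assumptions made for illustration.

```python
import numpy as np

def predict_horizon(model, u_future, u_recent, y_recent, N):
    """Long-range predictions y(k+p|k), p = 1..N, by recursive use of a one-step model
    of second order (nA = nB = 2, tau = 1), cf. Eqs. (10)-(11).

    model(u_reg, y_reg) -> next output, with u_reg = [u(j-1), u(j-2)], y_reg = [y(j-1), y(j-2)]
    u_future : [u(k|k), u(k+1|k), ...]   (at least N values)
    u_recent : [u(k-1), u(k-2)]          (recorded past inputs)
    y_recent : [y(k),   y(k-1)]          (measured outputs up to the current instant k)
    """
    u_all = np.concatenate([u_recent[::-1], u_future])   # chronological: u(k-2), u(k-1), u(k|k), ...
    y_all = [y_recent[1], y_recent[0]]                    # chronological: y(k-1), y(k)
    preds = []
    for p in range(1, N + 1):
        u_reg = np.array([u_all[p + 1], u_all[p]])        # u(k+p-1), u(k+p-2)
        y_reg = np.array([y_all[-1], y_all[-2]])          # y(k+p-1|k), y(k+p-2|k): predictions fed back
        y_p = model(u_reg, y_reg)
        y_all.append(y_p)
        preds.append(y_p)
    return np.array(preds)
```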

4. Neural models training for MPC

4.1. Problem statement

It is assumed that for model training a set of training data is given (i.e. input sequences u_n(1),\ldots,u_n(S), n = 1,\ldots,n_u, and the output sequence y(1),\ldots,y(S), where S is the number of samples) collected from measurements of process variables. The objective of identification is to find the topology and the parameters of the neural model in such a way that a predefined performance index (a cost function) which describes the accuracy of the model is minimised. Yet another set (the test set) is used in order to assess the generalisation abilities of the model. The right selection of training and test data sets is an important issue: these data sequences must be rich enough, and they should cover the whole operation domain of the process.

Identification consists of three phases: model structure selection, training and model assessment. Typically, for a given process various models with different topology (determined by the number of hidden nodes K) and different order of dynamics (determined by the integers n_A, n_B^n, \tau^n) should be trained and evaluated. The order of dynamics can also be determined analytically, e.g. [3]. The model finally chosen for MPC should have a relatively small number of parameters and be precise. In this paper the choice of the order of dynamics and the selection of the neural network topology are not considered.

Typically, to assess model accuracy the following mean squared error (MSE) performance index is minimised during training:

J_1 = \frac{1}{k_{max} - k_{min} + 1} \sum_{k=k_{min}}^{k_{max}} (y(k|k-1) - y(k))^2    (12)

where y(k|k-1) is the prediction for the sampling instant k calculated from the neural model at the sampling instant k-1 using measurements up to the sampling instant k-1, y(k) is the real value of the process output variable collected during the identification experiment, and the integers k_{min} and k_{max} indicate the range of the data set used for training.

If the performance index J_1 is used for training, the obtained models are of good quality when only one-step ahead prediction is necessary. This does not correspond to the idea of MPC, in which long-range predictions over the prediction horizon N are used. From (11) it is evident that predictions depend recursively on predictions calculated for previous sampling instants within the prediction horizon. Intuitively, one-step ahead predictors are not suited to be used recursively in MPC for long-range prediction.

Bearing in mind the role of the model in MPC, it is straightforward that the following performance function should be minimised during neural network training:

J_N = \frac{1}{N(k_{max} - k_{min} + 1)} \sum_{k=k_{min}}^{k_{max}} \sum_{p=1}^{N} (y(k+p|k) - y(k+p))^2    (13)

The above performance function is used in [35,36] for identification of linear models. Unlike the classical performance function J_1, the function J_N is closely related to the role of the model in MPC: all predictions over the prediction horizon N are considered.
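A minimal sketch of evaluating the horizon-wide performance index (13) on a recorded data set is given below, assuming a SISO second-order one-step model and the recursive feedback of predictions discussed above; all names and the regressor layout are illustrative assumptions.

```python
import numpy as np

def performance_index_JN(model, u, y, N):
    """Performance index (13): mean squared error of all predictions y(k+p|k), p = 1..N,
    over a recorded data set, for a SISO second-order model (nA = nB = 2, tau = 1).

    model(u_reg, y_reg) -> one-step output, regressors as in Eq. (5)
    u, y : recorded input and output sequences (1-D arrays of the same length S)
    """
    S = len(y)
    total, count = 0.0, 0
    for k in range(2, S - N):                   # k plays the role of k_min..k_max
        y_chain = [y[k - 1], y[k]]              # measured outputs available at instant k
        for p in range(1, N + 1):
            u_reg = np.array([u[k + p - 1], u[k + p - 2]])   # recorded inputs are known during training
            y_reg = np.array([y_chain[-1], y_chain[-2]])     # predictions are fed back, cf. Eq. (16)
            y_pred = model(u_reg, y_reg)
            total += (y_pred - y[k + p]) ** 2
            y_chain.append(y_pred)
            count += 1
    return total / count
```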

4.2. Training algorithm

Neural network training is in fact an unconstrained mathematical optimisation problem in which the objective function (12) or (13) is minimised. In general, different optimisation algorithms can be used for this purpose and have been tested: the rudimentary steepest descent, conjugate gradient methods (Polak–Ribiere, Fletcher–Reeves) and quasi-Newton algorithms (DFP, BFGS) [2,6]. During the experiments whose results are presented in Section 5, an efficient quasi-Newton BFGS (Broyden–Fletcher–Goldfarb–Shanno) optimisation algorithm is used. To calculate the search direction the algorithm uses gradients (first-order derivatives) of the minimised objective function and an inverse matrix of second-order derivatives (the Hessian matrix). At each iteration of the algorithm the gradients are calculated analytically. Second-order derivatives are approximated numerically, because it is usually too computationally expensive to calculate the full Hessian matrix analytically; its dimensionality is n_w x n_w, where n_w is the number of model weights. The BFGS algorithm is regarded as the most effective in the majority of applications, especially for medium-scale optimisation problems. In the following part of this section analytical derivatives of the minimised objective function (13) are derived. Next, the whole optimisation procedure is detailed.

4.2.1. Gradients derivation

Differentiating the cost function J_N with respect to the weights of the neural model one has

\frac{\partial J_N}{\partial w^1(i,j)} = d \sum_{k=k_{min}}^{k_{max}} \sum_{p=1}^{N} (y(k+p|k) - y(k+p)) \frac{\partial y(k+p|k)}{\partial w^1(i,j)}    (14)

for the first layer (i = 1,\ldots,K, j = 0,\ldots,I), where d = 2/(N(k_{max} - k_{min} + 1)), and

\frac{\partial J_N}{\partial w^2(i)} = d \sum_{k=k_{min}}^{k_{max}} \sum_{p=1}^{N} (y(k+p|k) - y(k+p)) \frac{\partial y(k+p|k)}{\partial w^2(i)}    (15)

for the second layer (i = 0,\ldots,K). Although for prediction the output of the MISO model from (10) and (11) depends on the control signals applied to the plant at previous sampling instants and on future control signals, during training all input samples are known (collected during the identification experiment). It is only necessary to calculate predictions. It is convenient to rewrite (11) as

z_i(k+p|k) = w^1(i,0) + \sum_{n=1}^{n_u} \sum_{k'=1}^{I_u(n)} w^1(i, R^n + k') u_n(k - \tau^n + 1 - k' + p)
           + \sum_{k'=1}^{\min(p-1, n_A)} w^1(i, I_u + k') y(k - k' + p|k)
           + \sum_{k'=\min(p-1, n_A)+1}^{n_A} w^1(i, I_u + k') y(k - k' + p)    (16)

For the sake of convenience all inputs of the neural model are renamed. For all possible combinations of k and p, i.e. for k = k_{min},\ldots,k_{max}, p = 1,\ldots,N, the following definitions are introduced:

x_0(k+p|k) = 1
x_1(k+p|k) = u_1(k - \tau^1 + p)
\vdots
x_{n_B^1 - \tau^1 + 1}(k+p|k) = u_1(k - n_B^1 + p)
\vdots
x_{\sum_{n=1}^{n_u - 1}(n_B^n - \tau^n + 1) + 1}(k+p|k) = u_{n_u}(k - \tau^{n_u} + p)
\vdots
x_{I_u}(k+p|k) = u_{n_u}(k - n_B^{n_u} + p)
x_{I_u + 1}(k+p|k) = y(k - 1 + p) if p = 1, \quad x_{I_u + 1}(k+p|k) = y(k - 1 + p|k) if p \ge 2
\vdots
x_{I_u + n_A}(k+p|k) = y(k - n_A + p) if p < n_A + 1, \quad x_{I_u + n_A}(k+p|k) = y(k - n_A + p|k) if p \ge n_A + 1    (17)

Thanks to introducing the variables x_i, the derivation of the training algorithm is easier. Eq. (10) can be rewritten as

y(k+p|k) = \sum_{i=0}^{K} w^2(i) \varphi\left[\sum_{j=0}^{I} w^1(i,j) x_j(k+p|k)\right]    (18)

Gradients of the model output with respect to the weights of the first layer are

\frac{\partial y(k+p|k)}{\partial w^1(i,j)} = \sum_{l=1}^{K} w^2(l) \frac{\partial v_l(k+p|k)}{\partial w^1(i,j)}    (19)

where p = 1,\ldots,N, i = 1,\ldots,K, j = 0,\ldots,I. Gradients of the outputs of hidden nodes are

\frac{\partial v_l(k+p|k)}{\partial w^1(i,j)} = \frac{d\varphi(z_l(k+p|k))}{dz_l(k+p|k)} \frac{\partial z_l(k+p|k)}{\partial w^1(i,j)}    (20)

where l = 1,\ldots,K. If the hyperbolic tangent is used as the nonlinear transfer function \varphi in the hidden layer of the neural model, one has

\frac{d\varphi(z_l(k+p|k))}{dz_l(k+p|k)} = 1 - \tanh^2(z_l(k+p|k))    (21)

Gradients of the sums of inputs of hidden nodes are

\frac{\partial z_l(k+p|k)}{\partial w^1(i,j)} =
  x_j(k+p|k) + \sum_{k'=1}^{\min(p-1,n_A)} w^1(l, I_u + k') \frac{\partial y(k-k'+p|k)}{\partial w^1(i,j)} \quad if i = l
  \sum_{k'=1}^{\min(p-1,n_A)} w^1(l, I_u + k') \frac{\partial y(k-k'+p|k)}{\partial w^1(i,j)} \quad if i \ne l    (22)

The last equation can be rewritten in a more compact way as

\frac{\partial z_l(k+p|k)}{\partial w^1(i,j)} = \delta_{i,l} x_j(k+p|k) + \sum_{k'=1}^{\min(p-1,n_A)} w^1(l, I_u + k') \frac{\partial y(k-k'+p|k)}{\partial w^1(i,j)}    (23)

where

\delta_{i,l} = 1 if i = l, \quad \delta_{i,l} = 0 if i \ne l    (24)

Taking into account (14), (19), (20) and (23), the derivatives of the minimised performance function J_N with respect to the weights of the first layer are

\frac{\partial J_N}{\partial w^1(i,j)} = d \sum_{k=k_{min}}^{k_{max}} \sum_{p=1}^{N} \left\{ (y(k+p|k) - y(k+p)) \sum_{l=1}^{K} w^2(l) \frac{d\varphi(z_l(k+p|k))}{dz_l(k+p|k)} \left[ \delta_{i,l} x_j(k+p|k) + \sum_{k'=1}^{\min(p-1,n_A)} w^1(l, I_u + k') \frac{\partial y(k-k'+p|k)}{\partial w^1(i,j)} \right] \right\}    (25)

for i = 1,\ldots,K, j = 0,\ldots,I. For initialisation one has to set

\frac{\partial y(k+p|k)}{\partial w^1(i,j)} = 0 for p \le 0    (26)

for all k = k_{min},\ldots,k_{max}, i = 1,\ldots,K, j = 0,\ldots,I.

Gradients of the model output with respect to the weights of the second layer are

\frac{\partial y(k+p|k)}{\partial w^2(i)} = v_i(k+p|k) + \sum_{l=1}^{K} w^2(l) \frac{\partial v_l(k+p|k)}{\partial w^2(i)}    (27)

where p = 1,\ldots,N, i = 0,\ldots,K. Gradients of the outputs of hidden nodes are

\frac{\partial v_l(k+p|k)}{\partial w^2(i)} = \frac{d\varphi(z_l(k+p|k))}{dz_l(k+p|k)} \frac{\partial z_l(k+p|k)}{\partial w^2(i)}    (28)

Gradients of the sums of inputs of hidden nodes are

\frac{\partial z_l(k+p|k)}{\partial w^2(i)} = \sum_{k'=1}^{\min(p-1,n_A)} w^1(l, I_u + k') \frac{\partial y(k-k'+p|k)}{\partial w^2(i)}    (29)

where l = 1,\ldots,K. Taking into account (15), (27), (28) and (29), the derivatives of the minimised performance function J_N with respect to the weights of the second layer are

\frac{\partial J_N}{\partial w^2(i)} = d \sum_{k=k_{min}}^{k_{max}} \sum_{p=1}^{N} \left\{ (y(k+p|k) - y(k+p)) \left[ \sum_{l=1}^{K} w^2(l) \frac{d\varphi(z_l(k+p|k))}{dz_l(k+p|k)} \sum_{k'=1}^{\min(p-1,n_A)} w^1(l, I_u + k') \frac{\partial y(k-k'+p|k)}{\partial w^2(i)} + v_i(k+p|k) \right] \right\}    (30)

for i = 0,\ldots,K. For initialisation one has to set

\frac{\partial y(k+p|k)}{\partial w^2(i)} = 0 for p \le 0    (31)

for all k = k_{min},\ldots,k_{max}, i = 0,\ldots,K.
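When implementing the analytical gradients (25) and (30) it is common practice to verify them numerically. The sketch below is a generic central finite-difference check of dJ_N/dw; the callable jn evaluating (13) for a flat weight vector is an assumption, as are all names.

```python
import numpy as np

def finite_diff_gradient(jn, w, eps=1e-6):
    """Central finite-difference approximation of dJN/dw, useful for checking
    an implementation of the analytical gradients (25) and (30).

    jn : callable evaluating the performance index (13) for a flat weight vector w
    """
    g = np.zeros_like(w)
    for m in range(len(w)):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[m] += eps
        w_minus[m] -= eps
        g[m] = (jn(w_plus) - jn(w_minus)) / (2 * eps)
    return g

# typical check: relative error between analytical and numerical gradients
# g_analytical = analytical_gradient(w)            # Eqs. (25) and (30), assumed implemented elsewhere
# g_numerical  = finite_diff_gradient(jn, w)
# assert np.linalg.norm(g_analytical - g_numerical) / np.linalg.norm(g_numerical) < 1e-5
```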

4.2.2. Optimisation procedure

Let W be a vector comprising all weights of the neural model and g(W) a vector containing all gradients of the performance index J_N with respect to the weights:

W = [w^1(1,0) \; \ldots \; w^1(K,I) \; \ldots \; w^2(0) \; \ldots \; w^2(K)]^T    (32)

g(W) = \left[\frac{\partial J_N}{\partial w^1(1,0)} \; \ldots \; \frac{\partial J_N}{\partial w^1(K,I)} \; \ldots \; \frac{\partial J_N}{\partial w^2(0)} \; \ldots \; \frac{\partial J_N}{\partial w^2(K)}\right]^T    (33)

For the n-th iteration of the training algorithm which minimises the performance function (13), the parameters of the model are updated according to the formula

W_{n+1} = W_n + \eta_n d_n    (34)

The vectors W_n and W_{n+1} consist of the weights at the current and the next iteration of the optimisation algorithm, respectively, d_n is the direction of optimisation and \eta_n is the step-size in this direction (the learning rate). The value of \eta_n is found by a directional minimisation subroutine; for this purpose the golden section method or second- or third-order polynomial approximation [2,6] can be used. The direction d_n depends on the optimisation method. In the case of the simplest but also the most inefficient steepest descent method, d_n = -g(W_n). Having carried out a large number of experiments, it was found that the BFGS method outperforms all its competitors (i.e. the steepest descent method, conjugate gradient methods and other quasi-Newton methods) in terms of convergence and computation time. In the BFGS algorithm the optimisation direction is calculated from the gradient g(W_n) and the inverse Hessian matrix (H(W_n))^{-1} as

d_n = -(H(W_n))^{-1} g(W_n)    (35)

The approximation of second-order derivatives is updated recurrently. The inverse of the Hessian is calculated directly from the formula

(H(W_n))^{-1} = V_n = V_{n-1} + \left(1 + \frac{r_n^T V_{n-1} r_n}{s_n^T r_n}\right) \frac{s_n s_n^T}{s_n^T r_n} - \frac{s_n r_n^T V_{n-1} + V_{n-1} r_n s_n^T}{s_n^T r_n}    (36)

where s_n = W_n - W_{n-1}, r_n = g(W_n) - g(W_{n-1}). The approximation of the inverse of the Hessian is initialised with an identity matrix, i.e. V_0 = I.

All things considered, the whole optimisation algorithm in which neural models for MPC are trained is as follows:

1. Initialisation: initialise the weights of the neural model (e.g. set them to random values), set V_0 = I, set the iteration number n = 0.
2. For all considered training patterns (k = k_{min},\ldots,k_{max}) and over the prediction horizon (p = 1,\ldots,N) calculate the partial derivatives of the minimised performance function J_N with respect to the weights of the first and second layers from (25) and (30), which comprise the gradient vector (33).
3. Calculate the approximate inverse Hessian matrix from (36).
4. Calculate the optimisation direction d_n from (35).
5. The line search: calculate the step-size \eta_n in the optimisation direction.
6. Update the weights of the neural model using (34).
7. Calculate the value of the minimised performance index J_N from (13).
8. Stop if \|g(W_n)\|_2 < \varepsilon (\varepsilon > 0 is set by the user) or if the maximum number of iterations is exceeded. Otherwise, increase the iteration number (n = n + 1) and go to step 2.

Because in the BFGS algorithm the inverse Hessian matrix is approximated numerically from (36), the accuracy of this approximation may deteriorate in consecutive iterations. That is why after every 20 iterations of the algorithm the matrix V_n is reinitialised in step 3, i.e. V_n = I. The actual frequency of this reinitialisation was found by a trial and error approach.
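A direct Python transcription of the inverse-Hessian update (36) and the search direction (35) is sketched below; the function names and the column-vector reshaping are assumptions made for illustration.

```python
import numpy as np

def bfgs_inverse_hessian_update(V_prev, s, r):
    """One update of the inverse-Hessian approximation, Eq. (36).

    V_prev : previous approximation V_{n-1}
    s      : W_n - W_{n-1}
    r      : g(W_n) - g(W_{n-1})
    """
    s = s.reshape(-1, 1)
    r = r.reshape(-1, 1)
    sr = float(s.T @ r)                       # curvature s_n^T r_n (should be positive)
    V = (V_prev
         + (1.0 + float(r.T @ V_prev @ r) / sr) * (s @ s.T) / sr
         - (s @ r.T @ V_prev + V_prev @ r @ s.T) / sr)
    return V

def bfgs_direction(V, g):
    """Search direction d_n = -(H(W_n))^{-1} g(W_n), Eq. (35)."""
    return -V @ g
```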

5. Experiments

5.1. Process description

The process under consideration is a methanol–water distillation column shown in Fig. 3. From the perspective of the supervisory MPC algorithm, the plant has two manipulated variables: R—the flow rate of the reflux stream and V—the flow rate of the vapour stream, and two controlled variables: x_d—the composition of the top product (the distillate) and x_b—the composition of the bottom product. The flow rate of the top product is denoted by D and the flow rate of the bottom product is denoted by B.

Fig. 3. The distillation column control system structure.



Two fast single-loop PID controllers (denoted as LC) are used to stabilise the levels in the reflux tank and in the bottom product tank. Two additional PID controllers (denoted as FC) are used to control the actual streams R and V. These four PID controllers constitute the basic control layer. Design parameters of the distillation column and the nominal operating conditions are given in Table 1.

At first, the fundamental model of the process is built. It consists of a set of 14 ordinary differential equations and nonlinear algebraic relations [16,22].

Table 1
Parameters of the distillation column and nominal operating conditions (compositions are given in molar fractions of methanol).

Parameter | Value
Number of trays | NT = 12
Feed tray | Nf = 4
Feed stream flow rate | F = 100 kmol/h
Composition of feed stream | xf = 0.5
Tray holdup | Mn = 0.5 kmol, n = 1,...,NT
Condenser holdup | Md = 5 kmol
Boiler holdup | Mb = 5 kmol
Reflux stream flow rate | R = 33.3634 kmol/h
Vapour stream flow rate | V = 83.3634 kmol/h
Top product flow rate | D = 50 kmol/h
Bottom product flow rate | B = 50 kmol/h
Top product composition | xd = 0.95
Bottom product composition | xb = 0.05

The fundamental model and the calculation of the initial steady-state conditions are detailed in the appendix. Open-loop step-responses of the process (the top product composition) caused by increasing and decreasing the input variables R, V by 10 kmol/h at the sampling instant k = 0 are depicted in Fig. 4. The sampling time is 1 min. The considered distillation column exhibits significantly nonlinear behaviour: both steady-state and dynamic properties are nonlinear. Hence, it is natural to use nonlinear MPC for this process. Simulation results of MPC algorithms based on neural models applied to the distillation column are presented in [20]. Two control algorithms are compared there, namely computationally demanding MPC with nonlinear optimisation, in which the neural model is used on-line without any simplifications, and suboptimal MPC with nonlinear prediction and linearisation, which needs solving on-line only a computationally efficient quadratic programming problem. The control accuracy obtained with the suboptimal algorithm with quadratic programming is practically the same as with the computationally prohibitive MPC approach in which at each sampling instant a nonlinear optimisation problem has to be solved on-line.

5.2. Neural models training

In this work the accuracy of the top product composition model is studied. The identification experiment is as follows. At first, the fundamental model (appendix) is used as the real process; it is simulated open-loop in order to obtain two sets of data, namely the training and test sets. Both sets contain 2500 samples.


Fig. 4. Step-responses of the distillation column (the top product composition) caused by increasing (solid line) and decreasing (dashed line) the reflux stream R (left) and the vapour stream V (right) by 10 kmol/h at k = 0.


Fig. 5. The training data set.



Fig. 6. The test data set.

The sampling time is 1 min. The output signal contains small measurement noise. The training and test data sets are depicted in Figs. 5 and 6, respectively. In order to obtain these data sets, random series of steps (constant for 60 min) from the ranges [R_0 - 25, R_0 + 25] kmol/h and [V_0 - 25, V_0 + 25] kmol/h are used as the manipulated variables; the quantities R_0, V_0 correspond to the nominal operating point given in Table 1. For the generation of the training and test data sets the random number generator is initialised differently. Thanks to this, the input and output sequences in the training data set are different from the sequences in the test data set. Both data sets cover the whole operation domain of the process. During the calculations the system of differential equations comprising the fundamental model is solved using the Runge–Kutta RK45 method.

During previous research [20] it was found that the dynamics of the empirical top product model should be of the second order (i.e. \tau^n = 1, n_A = n_B^n = 2, n = 1, 2):

y(k) = f(u_1(k-1), u_1(k-2), u_2(k-1), u_2(k-2), y(k-1), y(k-2))    (37)

This is because the first-order structure is very inaccurate, while the third-order structure does not improve the accuracy significantly; hence the second-order dynamics is a good choice. The proper scaling of input and output process variables is very important for efficient training. It is assumed that

u_1(k) = 0.1(R(k) - R_0)    (38)

u_2(k) = 0.1(V(k) - V_0)    (39)

y(k) = 10(x_d(k) - x_{d0})    (40)
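The sketch below illustrates how the excitation signals described above and the scaling (38)–(40) could be generated; the helper names, the seed values and the call to a fundamental-model simulator (simulate_column) are hypothetical.

```python
import numpy as np

def random_step_sequence(nominal, amplitude, n_samples, hold, rng):
    """Random series of steps held constant for `hold` samples within nominal +/- amplitude."""
    n_steps = int(np.ceil(n_samples / hold))
    levels = nominal + rng.uniform(-amplitude, amplitude, size=n_steps)
    return np.repeat(levels, hold)[:n_samples]

# 2500 samples, steps held for 60 min (sampling time 1 min), +/- 25 kmol/h around the nominal point;
# different seeds give different training and test excitations
R0, V0, XD0, S = 33.3634, 83.3634, 0.95, 2500
rng = np.random.default_rng(1)
R_seq = random_step_sequence(R0, 25.0, S, 60, rng)
V_seq = random_step_sequence(V0, 25.0, S, 60, rng)
# xd_seq = simulate_column(R_seq, V_seq)        # hypothetical call to the fundamental-model simulator

# scaling of the model variables, Eqs. (38)-(40)
u1 = 0.1 * (R_seq - R0)
u2 = 0.1 * (V_seq - V0)
# y = 10.0 * (xd_seq - XD0)
```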

At the nominal operating point x_{d0} = 0.95 (Table 1). It was also found that a neural model containing K = 7 neurons in the hidden layer is sufficient to approximate the nonlinear nature of the top product model; increasing the complexity of the model does not result in any significant improvement of model accuracy. For the chosen structure of the top product model given by (37), containing seven hidden nodes, with the input and output variables scaled using (38), (39) and (40), five different prediction horizon cases are considered: N = 1, 5, 10, 15, 20. In other words, the structure of the model is fixed, but during training the classical one-step ahead performance index J_1 or the indices J_5, J_10, J_15, J_20 are used. The results of the experiments, i.e. performance index values for different prediction horizons for the training and test data sets, are given in Tables 2 and 3, respectively. Because the weights of the neural models are initialised randomly, for each horizon case the training procedure is repeated 10 times and the results presented are the best obtained (the multi-start approach). The maximum number of training epochs (i.e. the maximum number of iterations of the BFGS algorithm) is 500.
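A minimal sketch of the multi-start procedure mentioned above is given below; train_for_horizon, its return values and the weight-vector size n_w are hypothetical placeholders for a routine implementing the algorithm of Section 4.2.

```python
import numpy as np

def multistart_training(train_for_horizon, n_w, N, n_starts=10, max_epochs=500, seed=0):
    """Multi-start training: repeat BFGS training from random initial weights and keep the
    model with the smallest performance index J_N."""
    rng = np.random.default_rng(seed)
    best_w, best_j = None, np.inf
    for _ in range(n_starts):
        w0 = 0.1 * rng.standard_normal(n_w)                  # random weight initialisation
        w, j_final = train_for_horizon(w0, N, max_epochs)    # assumed to return (weights, J_N)
        if j_final < best_j:
            best_w, best_j = w, j_final
    return best_w, best_j
```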

Table 2
Performance index values of models trained for different prediction horizons (N = 1, 5, 10, 15, 20): the training data set.

Model | J1 | J5 | J10 | J15 | J20
Model for N = 1 | 1.00 | 22.75 | 113.25 | 341.27 | 927.73
Model for N = 5 | 0.97 | 10.00 | 37.64 | 77.32 | 111.13
Model for N = 10 | 1.44 | 11.83 | 29.45 | 47.22 | 63.58
Model for N = 15 | 1.60 | 13.49 | 30.82 | 42.97 | 52.16
Model for N = 20 | 1.64 | 14.56 | 32.37 | 43.54 | 50.78

Table 3
Performance index values of models trained for different prediction horizons (N = 1, 5, 10, 15, 20): the test data set.

Model | J1 | J5 | J10 | J15 | J20
Model for N = 1 | 1.51 | 45.89 | 236.32 | 580.56 | 1438.35
Model for N = 5 | 1.28 | 27.43 | 120.62 | 268.15 | 459.31
Model for N = 10 | 1.12 | 19.99 | 74.05 | 138.58 | 206.82
Model for N = 15 | 1.22 | 21.78 | 73.14 | 126.09 | 179.39
Model for N = 20 | 1.35 | 23.58 | 74.70 | 123.17 | 168.55

The results of the experiments given in Tables 2 and 3 are scaled in such a way that the performance index of the model trained for the prediction horizon N = 1 is equal to 1 (for the training data set); before scaling J_1 = 3.997520 x 10^{-5}. The values of the performance index actually minimised during training are given in bold in the original tables. Analysing the properties of the model trained for the shortest prediction horizon (N = 1), i.e. the classical one-step ahead predictor, one can see that such a model is very poor when used recurrently for long-range prediction, because this use is ignored during the training stage. The longer the horizon, the worse the prediction properties of the classical model. Conversely, when the model is trained using the prediction horizon next used in the MPC algorithm, it has good prediction abilities. Hence, it is better to take the specific length of the prediction horizon into account at the training stage than to use for control a model trained by means of the classical algorithm.

Because the performance function J_N used during training minimises prediction errors over the whole data set and over a long prediction horizon, it is an averaging criterion; as a result, average models are obtained. Naturally, for such models there has to be a trade-off between the accuracy of short-range and long-range predictions. This is clearly seen when analysing the first column of Tables 2 and 3. In the case of the training data set, the one-step ahead prediction accuracy (J_1) deteriorates when long horizons are used for model training.



The only exception is when the horizon N = 5 is taken into account during training; in such a case J_1 = 0.97, but this value is very close to 1. Thanks to the good generalisation abilities of the obtained models, for the test data set the value of J_1 does not increase when the horizon used during training is lengthened.

In order to further show the potential of using the presented algorithm for training models which are next used in MPC, the following ratio is calculated for the available data sets over the prediction horizon (p = 1,\ldots,N):

r_p = \frac{\sum_{k=k_{min}}^{k_{max}} (y(k+p|k) - y(k+p))^2}{\sum_{k=k_{min}}^{k_{max}} (y_1(k+p|k) - y(k+p))^2}    (41)

The coefficients r_p compare the long-range prediction accuracy of neural models trained using the discussed algorithm (the numerator) and of models trained by means of the classical algorithm (the denominator), which yields one-step ahead predictors. For the sampling instant k+p, the output of the model obtained by the discussed algorithm is denoted by y(k+p|k), the output of the classical one-step ahead model used for long-range prediction is denoted by y_1(k+p|k), and y(k+p) is the real data sample used for model training and testing. If r_p < 1, there is a potential benefit in training the model by means of the presented algorithm rather than the classical one.

Fig. 7 shows the ratio r_p over the prediction horizon for the training and test data sets for models trained for different prediction horizons. Considering Fig. 7, one can conclude that, in general, the prediction accuracy of the model trained using the classical algorithm deteriorates fast as consecutive sampling instants of the horizon are considered. At the end of the horizon the prediction accuracy is very low, or, in other words, the discussed training algorithm reveals its advantages. Furthermore, unlike the one-step ahead predictor, the obtained models are characterised by good long-range prediction accuracy even when the horizon used for training is shorter than the horizon used for model validation and next in MPC.

Fig. 7. The ratio r_p over the prediction horizon for training (solid line) and test (dashed line) data sets for models trained for different prediction horizons (N = 5, 10, 15, 20).

Taking into account the whole prediction horizon, the following average accuracy ratio can be calculated:

R_N = \frac{1}{N} \sum_{p=1}^{N} \frac{\sum_{k=k_{min}}^{k_{max}} (y(k+p|k) - y(k+p))^2}{\sum_{k=k_{min}}^{k_{max}} (y_1(k+p|k) - y(k+p))^2}    (42)

The value of R_N shows the average accuracy of predictions calculated from the model in which the prediction horizon is taken into account during training in comparison with the model trained by the classical algorithm. Similarly to the coefficients r_p, if R_N < 1 it is clear that models trained using the presented algorithm have better prediction abilities than models obtained using the classical one. Tables 4 and 5 present values of the coefficient R_N for models trained for different prediction horizons.

Table 4
Average accuracy ratios of models trained for different prediction horizons (N = 1, 5, 10, 15, 20): the training data set.

Model | R5 | R10 | R15 | R20
Model for N = 5 vs. model for N = 1 | 0.59 | 0.47 | 0.38 | 0.30
Model for N = 10 vs. model for N = 1 | 0.79 | 0.52 | 0.39 | 0.30
Model for N = 15 vs. model for N = 1 | 0.89 | 0.58 | 0.42 | 0.32
Model for N = 20 vs. model for N = 1 | 0.94 | 0.61 | 0.44 | 0.34

Table 5
Average accuracy ratios of models trained for different prediction horizons (N = 1, 5, 10, 15, 20): the test data set.

Model | R5 | R10 | R15 | R20
Model for N = 5 vs. model for N = 1 | 0.68 | 0.59 | 0.55 | 0.48
Model for N = 10 vs. model for N = 1 | 0.54 | 0.42 | 0.36 | 0.29
Model for N = 15 vs. model for N = 1 | 0.59 | 0.45 | 0.36 | 0.29
Model for N = 20 vs. model for N = 1 | 0.65 | 0.48 | 0.38 | 0.31
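A short sketch of computing the comparison measures (41) and (42) from stored predictions is given below; the array layout and the function name are assumptions made for illustration.

```python
import numpy as np

def accuracy_ratios(y_true, y_pred_new, y_pred_classical):
    """Ratios r_p (41) and R_N (42) comparing long-range prediction errors of two models.

    y_true, y_pred_new, y_pred_classical : arrays of shape (n_samples, N), where column p-1
    holds y(k+p), y(k+p|k) of the model trained over the horizon, and y_1(k+p|k) of the
    classical one-step ahead model, respectively.
    """
    num = np.sum((y_pred_new - y_true) ** 2, axis=0)        # numerator of (41) for each p
    den = np.sum((y_pred_classical - y_true) ** 2, axis=0)  # denominator of (41) for each p
    r_p = num / den
    R_N = np.mean(r_p)                                      # Eq. (42)
    return r_p, R_N
```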



Fig. 8. Step-responses (long-range predictions) caused by increasing and decreasing the reflux stream R by 10 kmol/h at k = 0: the model trained for the prediction horizon N = 1 (dashed line) and N = 20 (dotted line) vs. the real process (solid line).

Accuracy ratios calculated for the same prediction horizon as actually used during training are given in bold in the original tables. The smaller the value of R_N, the worse the long-range prediction abilities of the models trained by the classical algorithm and the more appropriate it is to use the presented algorithm. Ideally, the value of the prediction horizon used for training should be the same as in the MPC algorithm. On the other hand, as the results of the experiments (Fig. 7 and Tables 4 and 5) clearly show, it is also reasonable to use shorter horizons for training. For example, when the model is trained for N = 10, it has quite good prediction abilities for N = 15 and even for N = 20. Hence, the model trained for N = 10 can be viewed as "good enough for MPC in which N <= 20". Such an observation is important in view of the time and effort spent on training. Although training is performed off-line, increasing the horizon used during training increases the computational complexity of the training algorithm. Because during training all predictions over the whole prediction horizon are considered, the discussed algorithm is more computationally demanding than the classical backpropagation scheme which yields one-step ahead predictors. On average, for the prediction horizon N = 5 the training time is approximately 2.8 times longer than for classical backpropagation training. This factor increases to 5.7, 7.8 and 10.6 for N = 10, N = 15 and N = 20, respectively.

In addition to finding the coefficients r_p and R_N, one may compare the step responses of the obtained models. Fig. 8 shows step-responses of the process and predictions calculated by the models trained for the prediction horizons N = 1 and N = 20. The reflux stream R is increased and decreased by 10 kmol/h at k = 0. When the model is trained for the shortest prediction horizon N = 1, the obtained step responses are calculated correctly only for the first sampling instants k = 1, 2, 3; for the following sampling instants the prediction error becomes bigger and bigger and consecutive predictions significantly differ from the real process. When the model is trained for the prediction horizon N = 20, it is able to correctly predict the behaviour of the process over the whole prediction horizon: differences between the process and the predictions calculated from the model are very small.

6. Conclusions

This paper emphasises the fact that the model identification stage should be closely related to the model's further application in MPC algorithms, which directly use process models on-line for prediction and optimisation of the future control policy.


On the one hand, thanks to this, MPC techniques can be successfully applied to processes with many inputs and outputs, and constraints can be taken into account in a straightforward way as a part of the MPC optimisation task. On the other hand, MPC algorithms are strongly model-based: they are likely to offer good control performance only provided that the predictions calculated from the model are accurate enough. Conversely, when discrepancies between the process and the predictions calculated from the model are significant, MPC is likely to generate a wrong control policy. In such a case there is no point in applying advanced nonlinear MPC solutions based on nonlinear models. It is better to use an MPC algorithm based on an "average" linear model or even the classical control system structure (e.g. PID with anti-windup). Such control algorithms are not likely to give good control accuracy in the case of nonlinear processes, but their sensitivity to model inaccuracies is lower than that of nonlinear MPC. Unfortunately, in such cases constraints important for technological and safety reasons may not be satisfied.

Because distillation processes have nonlinear properties, different nonlinear control techniques give better control accuracy than the classical PID controller [8,39]. Moreover, for such processes the accuracy of MPC algorithms based on nonlinear models is better than that of MPC based on linear models [20,38].

In this paper neural models of processes are considered. They can be efficiently used in various MPC schemes, especially in computationally efficient suboptimal MPC algorithms with quadratic programming. As far as the model training phase is concerned, the widely used backpropagation algorithm results in one-step ahead predictors, which are inherently not suited to be used recursively in MPC for long-range prediction. Unlike general recurrent training algorithms, the algorithm presented in this paper takes into account the specific way these models are next used in MPC for long-range prediction. It is also possible to use for prediction a specialised model whose structure is designed taking into account its specific role in MPC; for example, one can use a multi-model in which for each sampling instant within the prediction horizon one independent submodel is used [5,7,13,19,34]. Unfortunately, when the prediction horizon is long, multi-models are likely to have many parameters. Unlike the multi-model, the neural model considered in this paper has the classical structure; good prediction abilities result from using the specialised training approach. For training, the efficient quasi-Newton BFGS optimisation algorithm with analytical first-order derivatives and an approximated inverse matrix of second-order derivatives is adopted.

Acknowledgement

The work presented in this paper was supported by Polish national budget funds for science for the years 2009–2011 in the framework of a research project.

Appendix A. Fundamental model of the methanol–water distillation column

A.1. Dynamic model

For a thorough discussion concerning the modelling of distillation columns the reader is referred to [16,22]. Here, the model is given in a compact form. The basic units of the considered column are: the trays (stages), the condenser and the boiler. The feed stream is a liquid mixture of methanol and water. Model assumptions are as follows:

1. Ideal trays.
2. Perfect mixing and equilibrium on all trays, in the condenser and in the boiler.

3. Total condenser.
4. Constant pressure (1 bar).
5. No vapour holdup on trays.
6. Tray dynamics for liquid is negligible, i.e. the tray liquid holdup is constant.
7. Condenser and boiler dynamics are negligible, i.e. condenser and boiler holdups are constant.
8. Energy losses and the resulting temperature changes are negligible.
9. Normal operating conditions, i.e. no flooding, weeping etc.

Because equilibrium is assumed, it is possible to calculate the composition of the vapour, denoted as y, for a given composition of the liquid, denoted as x. For the considered mixture of methanol and water the vapour liquid equilibrium (VLE) relation is shown in Fig. 9. It can be numerically approximated by the following empirical relation:

y(x) = f_{xy}(x) = \frac{p_1 x}{1 + p_2 x + p_3 x^2}    (43)

where p_1 = 8.1249, p_2 = 9.6540, p_3 = -2.5291.

Fig. 9. The vapour liquid equilibrium relation for the mixture of methanol and water.

Let N_T denote the total number of trays (numbered from the bottom to the top) and N_f the feed tray. Flow rates are expressed in kmol/h, compositions are given in mole fractions of the light component (methanol), and tray, condenser and boiler holdups are given in kmol. The tray holdup is denoted by M_n, n = 1,\ldots,N_T, M_d is the condenser holdup and M_b is the boiler holdup. Because no vapour holdup is assumed, the vapour flow rate V is constant for all trays. The composition of the vapour stream leaving and entering the n-th tray is denoted by y_n and y_{n-1}, respectively. The composition of the liquid stream leaving and entering the n-th tray is denoted by x_n and x_{n+1}, respectively.

To formulate the mathematical model of the distillation column it is necessary to use the component material balance for each stage, which describes the composition changes in time. For the boiler one has

M_b \frac{dx_b(t)}{dt} = (R(t) + F(t)) x_1(t) - V(t) y_b(t) - B(t) x_b(t)    (44)

for the first (bottom) tray

M_1 \frac{dx_1(t)}{dt} = V(t)(y_b(t) - y_1(t)) + (R(t) + F(t))(x_2(t) - x_1(t))    (45)

for the trays below the feed, n = 2,\ldots,N_f - 1,

M_n \frac{dx_n(t)}{dt} = V(t)(y_{n-1}(t) - y_n(t)) + (R(t) + F(t))(x_{n+1}(t) - x_n(t))    (46)

for the feed tray

M_{N_f} \frac{dx_{N_f}(t)}{dt} = V(t)(y_{N_f-1}(t) - y_{N_f}(t)) + R(t) x_{N_f+1}(t) - (R(t) + F(t)) x_{N_f}(t) + F(t) x_f(t)    (47)

for the trays above the feed, n = N_f + 1,\ldots,N_T - 1,

M_n \frac{dx_n(t)}{dt} = V(t)(y_{n-1}(t) - y_n(t)) + R(t)(x_{n+1}(t) - x_n(t))    (48)

for the last (top) tray

M_{N_T} \frac{dx_{N_T}(t)}{dt} = V(t)(y_{N_T-1}(t) - y_{N_T}(t)) + R(t)(x_d(t) - x_{N_T}(t))    (49)

and for the condenser

M_d \frac{dx_d(t)}{dt} = V(t) y_{N_T}(t) - (R(t) + D(t)) x_d(t)    (50)

In addition to the above nonlinear differential equations, the model uses the vapour liquid equilibrium equation (43). For the n-th tray (n = 1,\ldots,N_T), the relation between the composition x_n of the liquid and the composition y_n of the vapour is

y_n(t) = f_{xy}(x_n(t)), \quad n = 1,\ldots,N_T    (51)

The vapour liquid equilibrium equation is also formulated for the condenser and the boiler:

y_b(t) = f_{xy}(x_b(t)), \quad y_d(t) = f_{xy}(x_d(t))    (52)

The PID controllers stabilising the levels in the reflux and bottom product tanks are much faster than the supervisory MPC algorithm, because the levels change fast whereas changes in the compositions of the top and bottom products are significantly slower. Hence, condenser and boiler holdups are assumed to be constant, and the top and bottom product flow rates are calculated from

D(t) = V(t) - R(t), \quad B(t) = R(t) + F(t) - V(t)    (53)

Design parameters of the distillation column are given in Table 1. The differential equations which constitute the fundamental model are solved using the Runge–Kutta RK45 method.

A.2. Steady-state model

For proper simulation of the dynamic model of the distillation column it is necessary to find steady-state initial conditions. The step responses depicted in Fig. 4 are in fact real step responses provided that the simulation is commenced from appropriate initial conditions; otherwise, initial conditions different from the steady state influence the responses of the model. For the same reason it is necessary to use precise initial conditions when the process is simulated open-loop to obtain the training and test data sets shown in Figs. 5 and 6, respectively.

In order to calculate initial steady-state conditions a steady-state model of the process is necessary. Such a model can be derived easily from the dynamic one. Because steady-state properties are considered, the differential equations (44)–(50) are transformed into algebraic equations by equating all time-derivatives to 0, and all time indices t are eliminated. The vapour liquid equilibrium is formulated to describe the steady-state relation between the compositions of liquid and vapour for the trays (51) and for the condenser and the boiler (52). Moreover, the steady-state relations between the streams B, D, F, R and V (53) are also used. As a result, one obtains the following steady-state model:

0 = (R + F) x_1 - V f_{xy}(x_b) - (R + F - V) x_b    (54)

0 = V(f_{xy}(x_b) - f_{xy}(x_1)) + (R + F)(x_2 - x_1)    (55)

0 = V(f_{xy}(x_{n-1}) - f_{xy}(x_n)) + (R + F)(x_{n+1} - x_n), \quad n = 2,\ldots,N_f - 1    (56)

0 = V(f_{xy}(x_{N_f-1}) - f_{xy}(x_{N_f})) + R x_{N_f+1} - (R + F) x_{N_f} + F x_f    (57)

0 = V(f_{xy}(x_{n-1}) - f_{xy}(x_n)) + R(x_{n+1} - x_n), \quad n = N_f + 1,\ldots,N_T - 1    (58)

0 = V(f_{xy}(x_{N_T-1}) - f_{xy}(x_{N_T})) + R(x_d - x_{N_T})    (59)

0 = V f_{xy}(x_{N_T}) - V x_d    (60)
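The sketch below shows how the dynamic model (44)–(50) with the VLE relation (43) and the parameters of Table 1 could be simulated with an off-the-shelf RK45 integrator; the function names, the state ordering and the crude initial composition profile are assumptions made for illustration.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Empirical VLE relation (43); p3 is negative so that f_xy(1) = 1
P1, P2, P3 = 8.1249, 9.6540, -2.5291
def f_xy(x):
    return P1 * x / (1.0 + P2 * x + P3 * x ** 2)

# Column data from Table 1
NT, NF, F, XF = 12, 4, 100.0, 0.5
M, MD, MB = 0.5, 5.0, 5.0

def column_rhs(t, state, R, V):
    """Right-hand side of the component balances (44)-(50).
    state = [x_b, x_1, ..., x_NT, x_d] (NT + 2 liquid compositions)."""
    xb, x, xd = state[0], state[1:-1], state[-1]
    yb, y = f_xy(xb), f_xy(x)
    L_below, L_above = R + F, R        # liquid flows below/above the feed tray
    B, D = R + F - V, V - R            # Eq. (53)
    dx = np.empty_like(state)
    dx[0] = (L_below * x[0] - V * yb - B * xb) / MB                       # boiler (44)
    dx[1] = (V * (yb - y[0]) + L_below * (x[1] - x[0])) / M               # bottom tray (45)
    for n in range(1, NF - 1):                                            # trays below the feed (46)
        dx[1 + n] = (V * (y[n - 1] - y[n]) + L_below * (x[n + 1] - x[n])) / M
    n = NF - 1                                                            # feed tray (47)
    dx[1 + n] = (V * (y[n - 1] - y[n]) + R * x[n + 1] - L_below * x[n] + F * XF) / M
    for n in range(NF, NT - 1):                                           # trays above the feed (48)
        dx[1 + n] = (V * (y[n - 1] - y[n]) + L_above * (x[n + 1] - x[n])) / M
    n = NT - 1                                                            # top tray (49)
    dx[1 + n] = (V * (y[n - 1] - y[n]) + R * (xd - x[n])) / M
    dx[-1] = (V * y[-1] - (R + D) * xd) / MD                              # condenser (50)
    return dx

# short open-loop simulation from an approximate initial profile, RK45 as in the text
x0 = np.linspace(0.05, 0.95, NT + 2)
sol = solve_ivp(column_rhs, (0.0, 60.0), x0, args=(33.3634, 83.3634), method='RK45')
```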

The steady-state model (54)–(60) comprises N_T nonlinear algebraic equations; f_{xy}(x_i) denotes the nonlinear vapour liquid equilibrium relation (43). The model equations are solved using the Newton–Raphson method [30]. It may be interesting to mention that two configurations are possible:

1. For given values of the reflux stream flow rate R, the vapour stream flow rate V, the feed flow rate F and the composition x_f of the feed stream, all compositions are calculated, i.e. the composition x_d of the top product, the composition x_b of the bottom product and the compositions x_n for consecutive trays (n = 1,\ldots,N_T).
2. For given values of the feed flow rate F, the composition x_f of the feed stream, the composition x_d of the top product and the composition x_b of the bottom product, one calculates the compositions x_n for consecutive trays, the reflux stream flow rate R and the vapour stream flow rate V.

When specific values of the top and bottom product compositions are required, the second configuration must be used. For the design parameters given in Table 1, i.e. the number of trays N_T = 12, the feed tray N_f = 4, the feed flow rate F = 100 kmol/h, the composition of the feed stream x_f = 0.5, and top and bottom product compositions x_d = 0.95, x_b = 0.05, one obtains the following tray compositions: x_1 = 0.19074557, x_2 = 0.37108875, x_3 = 0.46384841, x_4 = 0.49621599, x_5 = 0.52564878, x_6 = 0.56134620, x_7 = 0.60321736, x_8 = 0.65076279, x_9 = 0.70322201, x_10 = 0.75979755, x_11 = 0.81987775, x_12 = 0.88320672. The reflux stream flow rate is R = 33.3634 kmol/h and the vapour stream flow rate is V = 83.3634 kmol/h, as given in Table 1. The top product flow rate is D = 50 kmol/h and the bottom product flow rate is B = 50 kmol/h.

References

[1] R.K. Al Seyab, Y. Cao, Nonlinear system identification for predictive control using continuous time recurrent neural networks and automatic differentiation, J. Process Control 18 (2008) 568–581.
[2] M.S. Bazaraa, J. Sherali, K. Shetty, Nonlinear Programming: Theory and Algorithms, Prentice-Hall, Englewood Cliffs, 1999.
[3] J.D. Bomberger, D.E. Seborg, Determination of model order for NARX models directly from input–output data, J. Process Control 8 (1998) 459–468.
[4] M.Y. El Ghoumari, H.J. Tantau, Non-linear constrained MPC: real-time implementation of greenhouse air temperature control, Comput. Electron. Agric. 49 (2005) 345–356.
[5] W. Favoreel, B. De Moor, M. Gevers, Multiple prediction models for long range predictive control, in: Proceedings of the IFAC World Congress, CD-ROM, Beijing, China, 1999.
[6] P. Bishop, W. Murray, M. Wright, Practical Optimization, Academic Press, New York, 1981.
[7] C. Greco, G. Menga, E. Mosca, G. Zappa, Performance improvement of self tuning controllers by multistep horizons: the MUSMAR approach, Automatica 20 (1984) 681–700.
[8] S. Grüner, K.D. Mohl, A. Kienle, E.D. Gilles, G. Fernholz, M. Friedrich, Nonlinear control of a reactive distillation column, Control Eng. Pract. 11 (2003) 915–925.
[9] S. Haykin, Neural Networks—A Comprehensive Foundation, Prentice-Hall, Englewood Cliffs, 1999.
[10] M.A. Henson, Nonlinear model predictive control: current status and future directions, Comput. Chem. Eng. 23 (1998) 187–202.
[11] K. Hornik, M. Stinchcombe, H. White, Multilayer feedforward networks are universal approximators, Neural Networks 2 (1989) 359–366.
[12] M.A. Hussain, Review of the applications of neural networks in chemical process control—simulation and online implementation, Artif. Intell. Eng. 13 (1999) 55–68.


[13] D. Liu, S.L. Shah, D.G. Fisher, Multiple prediction models for long range predictive control, in: Proceedings of the IFAC World Congress, CD-ROM, Beijing, China, 1999.
[14] L. Ljung, System Identification: Theory for the User, Prentice-Hall, Upper Saddle River, 1987.
[15] C.H. Lu, C.C. Tsai, Adaptive predictive control with recurrent neural network for industrial processes: an application to temperature control of a variable-frequency oil-cooling machine, IEEE Trans. Ind. Electron. 55 (2008) 1366–1375.
[16] W.L. Luyben, Process Modelling, Simulation and Control for Chemical Engineers, McGraw-Hill, New York, 1990.
[17] M. Ławryńczuk, Efficient nonlinear predictive control based on structured neural models, Int. J. Appl. Math. Comput. Sci. 19 (2009) 233–246.
[18] M. Ławryńczuk, W. Tadej, A computationally efficient stable dual-mode type nonlinear predictive control algorithm, Control Cybernet. 37 (2008) 99–132.
[19] M. Ławryńczuk, Suboptimal nonlinear predictive control with neural multi-models, in: L. Rutkowski, R. Tadeusiewicz, L.A. Zadeh, J. Zurada (Eds.), Proceedings of the 9th International Conference on Artificial Intelligence and Soft Computing, ICAISC 2008, Zakopane, Poland, Challenging Problems of Science, Computer Science: Computational Intelligence: Methods and Applications, Exit, Warsaw, 2008, pp. 45–56.
[20] M. Ławryńczuk, A family of model predictive control algorithms with artificial neural networks, Int. J. Appl. Math. Comput. Sci. 17 (2007) 217–232.
[21] J.M. Maciejowski, Predictive Control with Constraints, Prentice-Hall, Harlow, 2002.
[22] T.E. Marlin, Process Control, McGraw-Hill, New York, 1995.
[23] L.A. da Cruz Meleiro, F. José V. Zuben, R.M. Filho, Constructive learning neural network applied to identification and control of a fuel–ethanol fermentation process, Eng. Appl. Artif. Intell. 22 (2009) 201–215.
[24] M. Morari, J. Lee, Model predictive control: past, present and future, Comput. Chem. Eng. 23 (1999) 667–682.
[25] K.S. Narendra, K. Parthasarathy, Identification and control of dynamical systems using neural networks, IEEE Trans. Neural Networks 1 (1990) 4–26.
[26] M. Nørgaard, O. Ravn, N.K. Poulsen, L.K. Hansen, Neural Networks for Modelling and Control of Dynamic Systems, Springer, London, 2000.
[27] R.K. Pearson, Selecting nonlinear model structures for computer control, J. Process Control 13 (2003) 1–26.
[28] T. Parisini, M. Sanguineti, R. Zoppoli, Nonlinear stabilization by receding-horizon neural regulators, Int. J. Control 70 (1998) 341–362.
[29] H. Peng, Z.J. Yang, W. Gui, M. Wu, H. Shioya, K. Nakano, Nonlinear system modeling and robust predictive control based on RBF-ARX model, Eng. Appl. Artif. Intell. 20 (2007) 1–9.
[30] W.H. Press, S.A. Teukolsky, W.T. Vettering, B.P. Flannery, Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press, Cambridge, 1992.
[31] S.J. Qin, T. Badgwell, A survey of industrial model predictive control technology, Control Eng. Pract. 11 (2003) 733–764.
[32] S.Z. Qin, H.T. Su, T.J. McAvoy, Comparison of four neural net learning methods for dynamic system identification, IEEE Trans. Neural Networks 3 (1992) 122–130.
[33] J.A. Rossiter, Model-Based Predictive Control, CRC Press, Boca Raton, 2003.
[34] J.A. Rossiter, B. Kouvaritakis, Modelling and implicit modelling for predictive control, Int. J. Control 74 (2001) 1085–1095.
[35] D.S. Shook, C. Mohtahdi, S.L. Shah, A control relevant identification strategy for GPC, IEEE Trans. Automatic Control 37 (1992) 975–980.
[36] D.S. Shook, C. Mohtahdi, S.L. Shah, Identification for long range predictive control, IEE Proc. Part D 138 (1991) 75–84.
[37] H.T. Su, T.J. McAvoy, Long-term predictions of chemical processes using recurrent neural networks: a parallel training approach, Ind. Eng. Chem. Res. 31 (1992) 1338–1352.
[38] P. Tatjewski, Advanced Control of Industrial Processes, Structures and Algorithms, Springer, London, 2007.
[39] A. Trotta, M. Barolo, Nonlinear model-based control of a binary distillation column, Comput. Chem. Eng. 19 (Suppl.) (1995) S519–S524.
[40] D.L. Yu, D.W. Yu, J.B. Gomm, Neural model adaptation and predictive control of a chemical process rig, IEEE Trans. Control Syst. Technol. 14 (2006) 828–840.
[41] J.M. Zamarreño, P. Vega, L.D. Garcia, M. Francisco, State-space neural network for modelling prediction and control, Control Eng. Pract. 8 (2000) 1063–1075.

Maciej Ławryńczuk obtained his M.S. and Ph.D. degrees in automatic control from the Warsaw University of Technology, Poland, in 1998 and 2003, respectively. Currently, he is an assistant professor at the Institute of Control and Computation Engineering of the Warsaw University of Technology. His research interests include the application of neural networks in process modelling, control (in particular model predictive control) and optimisation.