Data-Driven Modeling of Batch Processes

Data-Driven Modeling of Batch Processes

Copyright © IFAC Advanced Control of Chemical Processes Hong Kong, China, 2003 ELSEVIER IFAC PUBLICATIONS www.clscvicr.com/locatc/ifac DATA-DRIVEN ...

8MB Sizes 0 Downloads 62 Views

Copyright © IFAC Advanced Control of Chemical Processes Hong Kong, China, 2003

ELSEVIER

IFAC PUBLICATIONS www.clscvicr.com/locatc/ifac

DATA-DRIVEN MODELING OF BATCH PROCESSES Dennis Bonne * Sten Bay Jl1lrgensen *

* DepQ1'tment of Chemical Engineel'ing, Technical UniveTsity of DenmaTk, DK-2800 Lyngby, Denmmk, Email: (db 01' [email protected], Fal;: +4545982906

Abst.ract: A one dimensional grid of interdependent linear models obtained 1'1'0111 operation data is proposed for l1l0deling repeated finite horizon, non linear and nonstationary process operations. Such finite horizon process operations include startups, grade transitions, shut.-downs, and of course batch, semi-batch and periodic processes. The 11l0del grid is identified from data using a novel interpretation of generalized ridge regression that penalizes weighted discrepancies between one linear model and the models in its neighborhood. It. is furthennore outlined how different. representations of such a model grid may be used off-line as well as on-line. 1'01' prf'diction. monitoring, control, and optimization. AllIong t.hese representations is a lineal' time-varying state space model which llIay be used for design in established linear monitoring and control metllOdologies. Copyright © 2003 IFA C Keywords: Batch, Non-lineal' systems, Time-varying systems, Parameter est.imation

I. INTRODUCTION

tion of these moclel grids is addressed in section 3 and it is proposed to llse lIlodel property based regularization to overcome excessive variance. The methods proposed ill sections 2 and ;3 are applied to an industrial case st.udy in section 4 and finall.v conclusions are given.

Batch processes are expf'riencing a renaissance as pl'Odllct.s-on-dem
2. TII\JE-VARYING l\IODELS Most. often the complex alld nonlinear dvnalnics of continuously operated processes can he approximated with it moderate set of local Linear TinleInvariant (LTI) models. each of which clescribes a characteristic region in the operation window. These regions descri bed by local models will often be characterized by a set of active constraints. For batch and semi-batch processes (frolll here on, batch will cover both batch and semi-batch processes) however, the set of active constraints will change as the batch progresses. In fact, t.o

In section 2 it is thus proposed to model batch processes with a tillle-varying grid of linear models and it is demonstrated how such model grids may b(~ applied to both off-line and on-line monitoring, prediction. control, and optimization. Identifica-

589

operate a batch process in an optimal fashion, a specific sequence of constraints is tracked during operation. This means that local approximations of characteristic regions are not sufficient to describe batch operation. The transitions between these locally approximated characteristic regions are also needed to provide a complete descriptioll of batch operation. Furthermore. even if specific sets of constraints were active for longer periods; local LTI models can not be expected to describe the time variation due to changing holdups and/or compositions.

ence Ut E jRn,,(t), output variable Y, E jRn,,(t) with reference Yt E jRn,,(t), and disturbance variable tut E jRn,,(t). Using an ARX model parameterization, the output deviation Yt - Yt at. time t may be given as a weighted sum of nA (t) past output deviations and ?taU) past input deviations f}, -

Y, = -

0-'. .1-1

(Yt-I - Yt-I) - ...

Q.t.I-nA(t) (f}t-nA(t) -

+ bl./_ I

+ ...

CUt _ [ - Ut _ I )

+ bl./-no(t) (D't-no(t)

The periodic nature and the finite horizon of batch processes however, make it possible to model the evolution from each sample point to the next in a batch with one g1'id-point LTI model. In this fashion, both the time variation within the characteristic regions and the transitions between these may be approximated with a grid of gridpoint models. Thus. such a model set. gives it complete description of a hatch. The finite horizon of batch processes means t.hat the model set will be finite. The periodic way in which the same recipe is repeated hatch after batch means that several measurements from the individual si-Impie points are available for ident.ification. That is, the time evolution of a process variable is measured/sampled at specific sample points during the hatch operation and as the batch operation is repeated, severnl mea':mrements are collected from every sample point. With multi pit' data points/measurements from one specific sample point a grid-point model can be identified for this sample point. Explicitly, in additioll to tlw time dirnensioll, data frolll hatch processes also evolve in a batch index dilnension.

Yt-nA(/))

-

(1)

Ut-HOt/))

+Wt

where ?tA(t), ?taU) E [1.. .. ,1] are the grid-point ARX model orders and Q.i •.! E jRn"(i).n,,U) and bi,j E jRH"(i).n,,U) are the grid-point ARX model parameter matrices. Note, as the grid point.s are modeled with individual grid-point. models, the sample points t do not have to be equidist.antly spaced in time. Let N be the batch length(/number of samples) and define the input u. output y, shifted output yO, and disturbance w profiles as u= [ u.~ y=

[V;

I

I uN-l

1/.[

YN

Y2

yO = [ y~) Yi I

w=

]'

,]'

I

(2)

I YN-I

[w; w2 I

]'

,]' ... wN

Note, not. all initial conditions Yo are measurable and/or physically meaningful - e.g. ofl~gas measurements. Thus the ARX model set may be expressed in mat.rix form

fJ - y

= -A(fJ°

- yO)

+ B(u

- u)

+w

(3)

where A. B are structured lower block triangular matrices. The profile w is a sequence of disturbance terms caused by bias in the reference input profile u, the effect of process upsets, and the modeling errors from linear approximations. This means that the disturbance w contains contribu tions from both batch wise persistent distlll'bances, such as recipe/input bias, model bia<;, and erroneous sensor readings, as well a'i from random disturbances, which occur with no batch wise correlation. It thus seelllS resoanble to model t.he disturbance profile w with a random walk model with respect to the batch index k

2.1 Model Pammef.erizaf.ion

Given t.he discussioll ahove. hatch processes are II10dded with sets of dynamic grid-point LTI models. Such a set of grid-point. LTI lnodels could nlso he referred to as a Linear Time Varying (LTV) hatch Inode!' These grid-point. LTI models call be parameterized ill a mllllher of ways - e.g. ilS Output Error (OE) models, Aut.oRegressive models with eXogenous input.s (ARX). State Spnt"(' (SS) models. dc. In the present contrilJllt.ioll tlw ARX model parameterizatioll was chosen. This choice of parameterization offers it relatively good multi variable syst.em description with It moderat.e number of model parameters.

(4 ) where Vh- represents a sequence of batch wise non-persistent disturbances that are assumed to be zero-mean, independent. and identically distributed. The assumption of Vh- being white is rather crtlde, but necessary if one whishes to keep the parameter estimation problem linear. Considering the difference hetween t.wo successive batches

As operation of a batch progresses, nifferent inputs and outputs may be used depending on the cnrrent pha<;e of the batch ann hence in order t.o model batch operation it is convenient to define the following variables and references for each time step t: Input. variable "/1.1 E jRn,,(I) with refer-

590

6.Yk = YI.: - Yk-l =

A6.y~

- B6. u k

+ Vk

zero with respect to batch index. In literature it is said that the controller learns to reject the batch wise persistent disturbances - i.e. the resulting controller is an Iterative Learning Control (ILC) scheme. A more accurate formulation would be that both output and input errors are modeled using integrators with respect to batch index. The trajectory tracking model representation (8) is similar to that of Lee et al. (2000), but the representations differ significantly since (8) includes the effect of initial conditions (H 6.Yk.O) and disturbance propagation (EVk). Another important difference is that (8) does not have double dependence on the batch wise persistent distnrbances i.e., the trajectory tracking model representation (8) only include the batch wise persistent dist.urbances as represented by ek-l and not as both the part of ek-l caused by the batch wise persistent. disturbances and the batch wise persistent disturbances themselves.

(5)

A batch ARX model (5) that is independent of the reference profiles (1), iLl and batch wise persistent disturbances has been obtained. With such a batch ARX model the path is prepared for multivariable. mooel-based monitoring, control, optimization, and of course simulation. During the model derivation above it was assumed that the outputs are known. This is however not the case in practice, where only a sequence Zk of noisy observations of the outputs is available Zk

=

' Yk, ]' + Ek [ Yk.O

(6)

where El.: is a sequence of measurement noise terms that are assumed to be zero-mean, independent and identically distributed.

2.2 Appli.cation Specific Models

The two forms (7) and (8) of the batch ARX model ahove are applicahle to off-line or interbatch type applications. For on-line estimation, monitoring, feedback controL and optimization however, it is convenient to use a state space realization of the hat.ch ARX model. To achieve such a realizat.ion it is necessary to simplify the batch ARX model struct.ure with the assumption that the nnmber of outputs is constant 11)1 (t) = H y for t = 1.... , N. III an observer canonical form the state space realizat.ion is given as

Depending on the task the batch ARX model (5) is to be applied to, it is convenient to convert the batch ARX model into different representations. If the task at hand is to predict (or simulate) the behavior of a batch before it is started the following form is convenient

Note t.hat the dist.urbance matrix E models the propagation of biltch wise non-persistent disturbaJl(~es including batch wise non-persistent. model-plant mismatch.

Xk,t = A t .l:I.·/-1

+ B t 6. U I.-.I-l + [Vk,1

6.YI.:,t = eXk.t

The form (7) above is also convenient for the task of classification/monitoring (e.g. normal or not) of a batch after it has been completed. Furt.hermore, the form (7) can be used to determine openloop optimal recipes in t.he sense of opt.imizing an objective for t.he batch. If such an object.ive is to minimize the deviations e from a desired trajectory fJ, then (7) can be modified into

(9)

with the initial condit.ion .1:1.7,0 = [6.Y~:,o, 0', ... ,0']'. Just as (7), the SS model forlll (9) is convenient for prediction, monitoring. and optimization type applications, but. also facilitates on-line implementations of these. Furt.hermore, the SS model form (9) is particnlarly well suiteo for closed-loop or feedback control applications. For tracking control applications t.he SS nlodel form (9) can be modified into

There are two important points to be made about the trajectory tracking model form (8). First. of all. as t.lw error profile ek in batch k depends on the error profile el.·-l from batch k - 1. the effects of the batch wise persistent disturbances are integrat.ed with respect to batch index. This means that. a properly designed controller can reject the effects of the batch wise persistent disturbances asymptotically with respect to batch index - e.g. removing the effects of recipe and model bias. Secondly, given the above mentioned asymptotic behavior and as the control moves/actions generated by such a controller are deviations from the control/input profile realized in the previous batch, the control actions due to batch wise persistent disturbances will converge asymptotically to

:I:u = A/.I:I..t-J e'·.t

+ Bt6.uk.t-l + [l1l.:,t

= Ck-l,t - CCI.·.t

(10)

Following the discnssion above, it lIlultivariable feedback controller properly designed using the trajectory tracking SS lIlodel form (10), will reject the effects of the batch wise persistent disturbances asymptotically with respect to batch index. That is, due to the output and input error integration in the model framework, a controller designecl to reject disturbances with respect to time in one batch at it time will also asymptotically reject. the effects of batch wise persistent. disturbances with respect to batch index.

591

is given as the difference between their respective estimates. The expected output and input difference profiles which are all contained in ~Zk, are thus given as

3. MODEL IDENTIFICATION With the batch ARX model (5) derived above, the parameterization of the batch model is in place, however the model orders and the model parameters still need to be determined from process data. One major drawback of the proposed parameterization is the immense dimensionality of the resulting set of models - in practice this immense dimensionality will render any standard Least Squares (LS) identification problem ill-conditioned. It turns out however, that as the grid-point models are progressively constrained by the smoothness of the model grid, the conditioning of the identification problem improves.

E{~yd = ~fh·

Several suggestions to how (sets of) LTI or (periodic) LTV models should be identified from data can be found in literature. All these authors employ some or other coefficient shrinkage or subspace method to improve the conditioning of the identification problem and hence lower the variance of the model parameter estimates. Simoglou et al. (2002) suggested estimating a set of independent, overlapping local LTI SS models using Canonical Variant Analysis (CVA). Instead the present contribution proposes estimating a grid/set of int.erdependent grid-point LTI ARX models using a novel int.erpretation of generalized ridge regression.

In industry, the process variables zk.i(P) E IR. are most often logged individually at times T(kJ p), giving Nz(k, p) + 1 observations of variable p in batch k. What is needed however, is up to N + 1 noise free observations of the variables at timcs TU) in the NB batches available for identificat.ion. These noise free or expected observations can be estimat.ed using local polynomial regrf'ssion and If the profile of process variable p in b'1t.ch k is defined as Zlqn then the estimation problem can Iw given explicitly (Hastie et al., 20(H) H.<;

The batch ARX model (5) can be formulated as linear regression ~yJ,

where SI,:.p.t is a smoothing vector. If it. is f1ll't.her assumed that process variable ]J will be used throughout the batch, then the estilnated profile of variable p in batch k is given as zi,(p)

t.iJl, = E{~xde

ZI,.

E IR.(n,,(t)+n,,(t)

Zk=Zk+Wk

(17)

+ E{vd

= ~Xke

(18)

with ~XI, = t.xi,(t.iJ~~, t.it.i,-). This means that, if the process variable estimation error model (14) is a valid approximat.ion. then estimation of model parameters from t.he pretreated data will give unbiased model parameter estimates. Although unbiased, model parameter estimates based on data from a single batch would have excessive variance. Thus to lower the variance of model parameter estimates, all available data should be nsed for the model parameter est.imation

(12)

= SI,:,pzi,(lJ)

Let the true observation given as

= t.XI"e + VI,

where t. X 1,' = ~xdt.y~., t.ui,-) is a structured regressor m<,trix with past outputs and inputs and e = e(A, B) is a column paramet.er vector with the model parameters from t.he batch ARX model. Taking the expectation of the linear regression (17) and recalling (Hi) we find that

(11 )

... Sk.p,N]'

(16)

3.2 Pammeter Estimation

.'1.1 Data Pret1'eatment

ZI,.([I) = [s~.p.o S~'Pl

E{~y?} = ~iJZ

E{~ud = ~it.k

be

(13)

where Zk is thc estimat.ed observation and Wj,- is a spquence of estimation errors. The estimat.ion error WI, will consist. of both systematic errors s11ch as t.lH' height of a characteristic peek being 11nderestinJated due t.o excessive smoot.hing a.nd/or t'/'imming the hills and filling th c valleys d ne to too low loca I regression order, and random errors. Thus the estimat.ion error Wj,- is modeled with a random walk wit.h respect to the batch index k

y = [t.fj; t.fj; = [t.x; t.x;

t.iJ~ D

J'

t.X~13 ]' e

(19) = xe

The linear system (HI) would however, most likely still be rank-deficient and solving it ill a Lea.st Squares (LS) sense would still produce estimates with low bias, but excessive variance. Such excessive model parameter variance would despite the low bias, yield poor model predictions (Larimore, 1996). Hence. to improve the predictive capabilities of an estimated model the variance of the estimated model parameters must be further reduced.

(14)

where VI,. represents a sequence of batch wise nonpersistent estimation errors that are assumed to he zero-mean. Consider the expected difference betweeu two successive hatches, then

592

~

A possible approach to reducing the variance of model parameter estimates is to enforce that the estimated model possesses some desired model properties. One such model property could be that neighboring grid-point models are analogous in the sense that they exhibit similar behavior. In fact, without this property, the model would be a set of independent models and not a gTid of interdependent models. Enforcing model properties however, inevitably introduce bias into the model parameter estimates. There will thus be a tradeoff between the bias and variance of the model parameter estimates and this trade-off will determine the predictive capabilities of estimated models. A parameter estimation method that could incorporate model properties into LS estimates is generalized ridge regression, which also is referred to as Tikhonov regularization. We thus propose to estimate the model parameters by solving the extended LS problenl

O(A)

=

arg minl(Y -

e

xe)' (Y

-

A ""')in [,(A) s.t.

E

(22)

01\

(X' X

+ L' A 2 L)

nonsingular

where N'Ba1 is the number of batch difference profiles available for cross-validation. In this fashion, the computational burden of solving (22) is determined by the number of elements in the finite set 01\. Thus far only estimation of a specific batch ARX parameterization, i.e., a batch ARX model with model orders nA(t) and nB(t) for t = 1, .... N has been considered. These model orders are however unknown and will also have to be identified from data. This means that in addition to the regularization weighting matrix A also the model orders can he used to tune the predictive capabilities of the model est.imate. Traditionally, ARX model orders are selected based on minimizat.ion of measures such as Final Prediction Error (FPE) or Akaike's Information Criterion (AIC) both of which are proport.ional to the optimal value of the LS objective being minimized as part of the identification, to prevent modeling noise/disturbance characteristics, i.e., overfit. Overfit is however also prevented if the ARX model orders ,u'p selected hased on minimization of the mean squared prediction errors from cross-validation of the estimated models. This means that the ARX model order,s can be selected based on minimization of

xe)

+ (ALe)' (ALe))

A

~ ~ (,(A),(,(Al]

(20)

(X'X + L'A"L)-I X'Y where the penalty ALe is a column vector of weighted differences hetween parameters in neighboring grid-point models. In this fashion, the structured penalty matrix L maps t.he parameter vector e int.o the desired parameter differences and the diagonal regularization matrix A weights the parameter differences. The estimated parameter vector OrA) is a function of the regularization matrix A, which det.ermines the shrinkage and hence t.he trade-off between bias and variance. This means that the regularization matrix A can be used t.o tune the predictive capabilities of the model estimate. Through the particular choice of penalty mat.rix L. the regularization matrix A also determines the interdependency between the grid-point models in the model grid.

{nA(t)}~I]= rniny b(A)) [ {ns(t)L=l {"A(t)j;~l {"a(t)j;~l

A given by (22)

8.1.

If the ARX model orders are assumed constant t.hroughout the hatch l/A = nA(t) and 1I.B = n8(t) for t = 1, ... , N, theu (23) simplifies to

(nA, nB) = lllin h(A)] 1/04 ·71.0

8.1.

(24)

A given by (22)

S..1 Model OTdel's Ilnd Regulw'ization Weigh.ts Several methods for choosing (optimal) regularization weights can he found in literatnre (Hansen, 1996), but all of these consider ei ther scalar regularization weights 01' diagonal penalties. In the present work it is proposed to silnply select it regularization mat.rix from a finite set A E 01\. that yield near mininlum rnean squared prediction error, when the estinlatedmodel is cross-validated through "pnre-sinlulation". Thi-\t is, given the '·pure-simulat.ion" prediction error profile (x-{A) from cross-validat.ion hatch k

4. APPLICATION To demonstrate the capability of the proposed data-driven models, an industrial, production scale Bacillus protease fermentation has heen lllodeled from bistorical data (NovozYllles A/S). The modeling objective was prediction of the online measured variables used to supervise the fermentation as well as the product activity which is measured sparsely off-line. That is. the objective wa.s to obtain a model that. can predict the course and outcome of a batch given the batch recipe and its initial conditions. For this Bacillus protease fermentation the hatch recipe consists (essentially) of reference profiles for two

the regularizat.ion matrix A is t.he solut.ion to the discrete optimization problem

593

T,me (-I

Fig. 1. Example of cross-validation of an industrial Bacillus pmtease fermentation model. The five outputs. Dissolved Oxygen (DO), Carbon dioxide Evolution Rate (CER), Oxygen Uptake Rate (OUR), pH. and product activity, are predicted given information about their initial conditions and the hatch recipe (assuming perfect control) - i.e., Pv.7·p-si71l'll.la.I.ion prediction. The thin solid lines (or '*' for the product activit.y) are the historici1lml'i\surenwnts as logged. t.he bold solid lines are the pretreated data (a.'; the data from which the model was identifie(L out not. usl'd in the identification), and the dotted lines are the model predict.ions. suostn'lte feeds. an alkaline feed, pressurr, and temperature -~ i.e .. these reference profiles are t.he inputs of t.he nlOde!. For most of these inputs t!w cnrrent pnlctice is however, that t.he reference profile is eit.her not loggf'd or not manipnlated from hatch to hatch. As a temporary workaround, perfect control WilS assumed and the reference profilrs wpre replaced by the realized profiles. As is COllllnOlI pract.in' in supervision of fermenters. Dissolved Oxygell (DO), Carbon dioxide Evollltioll Rate (CER). Oxygen Uptake Rate (OUR). and pH were chosen ilS process indicat.ors. Along wit.h the prodnct activity these process indicators IlIakeup thl' ont.pllt.s of the IIl0de!.

ARX models. Snch model grids can be used for both off- and on-lil1l~ monit.oring, prediction, control, and opt.imizat.ion applications. It is further proposl'd that these model grids are ident.ified from historic;)l process data using ridge rl'gression. By identifving all the ARX models in a Illodel grid simultaneously. t.he int.erdependency of the ARX models can he used to reduce the variance of their est.ilnat.es and thereby improve the predictiw capahilit ies of the est.imated model grid. The proposed dat.i'l-driven l\Iod~~ling scheme has been (h~nl()nstrat.ed through nlOdeling of an industrial f(~nnentation process.

TIlP historical data Wi1S smoothened and resillnplwl to 3D mill\ltes intervals. The product acti\'it\~ was J'('-Sitlllplf'd using linear int.erpolnhon. while t.he remaining inpllts and output.s were smoothpnpd Ilsing local constant regression and handwidt hs ranging fro1n 6 t 0 7:~ nearest neighbors. Before idelltificM.ioll t.he hat.ch difference profiles were nornwlizl'd. The model was identified nsing data frClln 2') batches and cross-validated using data from 9 IliIt.ches. one of which is shown in figure 1. The identifird Inodel orders ranged fronl o to 20 and t.Ile fol.alllnlnlwr of model pararneters l'st.ilnat.ed was l2.l3S. The mean cross-validi1tioll prediction error \\'a~ O. U.

REFERENCES Hanspn. Per Ch ristian (19%). Ra.n/,;- Deficient. a.nd Discrde lll- Posnl P7'Oblc1f/.s. Polyt.eknisk Forlag. Lynghy. I!astie, Travor. Robert. Tibshirani and ./erome Frie(!lnan (2001). The EIC1f/./:'71.t.8 of St.al.ist:ical Lca:mi7lY. Springer. Nl'w York. Larilllore. Wallace E. (19!J6). Stat.ist.ical opt.illlality and canonical variate analysis syst.em identification. Signal P'I'Ocessing 52(2). 131144. Lee. Jay H.. Kwang S. Lee and Won C. Kim (2000). i'dodel-based iterative leal'lling control with a quadratic criterion for t.ime-varying linear ~.vstellls. Automatico. 36(5), fi41-fi57. Simoglou. A., E. B. rVlartin and A. .I. Morris (2002). Statistical performance monitoring of dynamic n\\lltivariate processes usillg state space llIodell ing. Comp'Ute7'S a.nd Chemical Enyinc(,7'ing 26(6). 909 - 920.

5. CONCLUSION In the present paper it is proposed to model finite horizon, time-varying and nonlinear process opera tions with I-di mensional grins of interdependent

594