Copyright © IFAC Dvnamics and Control of Process Systems. Corfu, Greece, 1998
A CASE STUDY IN NONLINEAR DYNAMIC MODEL IDENTIFICATION R.K. Pearson 1, B.A. Ogunnaike 2
2
/ Institut fur Automatik EI1f, Zurich CH-8092 Zurich, Switzerland (on leave from the DuPont Company) E.I. du Pont de Nemours and Company Experimental Station El/102, Wilmington, DE 19880-0101, USA
1 Introduction
y
Empirical modeling, either from measured process
ranh[hn(k-l) / r = --"---'-'--_..0...
data or from the responses of complex simulation
(,(k-l)
models, represents a practical means of obtaining approximate process models of moderate complexity. In particular, empirical modeling represents one way of obtaining process models compatible with many popular model-based control system design methodologies. This paper presents a brief case study that illustrates a number of important issues in the development of such models: the selection of input sequences, the selection of model structures, the estimation of model parameters, and the relationships between these different issues ..
.
In the specific example considered here, h = 1.5 liters per mole-hour, V = 10.5 liters, and d = 3.5 moles per liter. The sampling rate was T =0.1 hour, and the input range considered was 0.5 to 5.0 liters per hour; while this range is wider than that considered by Eaton and Rawlings, this change was made to enhance the nonlinearity of this model without losing its simplicity. Two points are worth emphasizing here. First, the exact discretization considered here is not equivalent to the Euler obtained by replacing derivatives with first differences, a procedure which does preserve the basic structure of the continuous-time model but only approximates the response to piecewise constant inputs. The second point is that the question of how different discretizations relate to the original continuous-time model is somewhat complex even in the linear case, where it is known that if the continuous-time model has m zeros and n > m poles, the discretized model will exhibit non-minimum phase behavior for sufficiently small sampling intervals T, even if the original system is minimum phase [2]. Analogous results have been developed for the discretization of nonlinear continuous-time systems, with the relative degree of the original nonlinear system playing the role of the pole excess n - m for linear systems [6]. Here, this particular difficulty does not arise because the Eaton-Rawlings CSTR model exhibits first-order dynamics (i.e., it is of relative degree 1).
2 Process Dynamics The basis of this case study is the simple CSTR model considered by Eaton and Rawlings [3]: dy 1 - = -h[ y +2Jly-2dJl] de
(k) = [J- 'Ql(k-J) ]y(k-J+2d'r}J-r k-J) J+-r[y(k-J)+Jl(k-J)]
(1)
Here, yet) represents the reactant concentration in the reactor at time t, and jJ.(t) = u(t)l2hV is the input flow rate (u(t), re-scaled to simplify subsequent results. One of the principal motivations for considering this model is that an exact discretization is possible, if we assume that the manipulated variable jJ.(t) changes only at discrete sampling instants tk = kT, for some fixed sampling interval T. Details are given in [7], but the following recursion relation can be derived relating the reactor concentration y(k) at time it to y(k-i) and jJ.(k-l), the constant input variable during the interval tk_/st !ilk:
329
U010
No.
0 1
2 3 4 5 6 7 8 9 10
v(k - 1)
Type
0 J1.2(k - 1) J1.2(k-2) J1.(k - 1)J1.(k J1.(k - 1)y(k J1.(k - 1)y(k J1.(k - 2)y(k J1.(k - 2)y(k y2(k - 1) y2(k - 2) y(k - 1)y(k -
Affine (reference case) Hammerstein Modified Hammerstein AR-Volterra bilinear, diagonal bilinear, superdiagonal bilinear, subdiagonal bilinear, diagonal controlled logistic eq. additive NARX non-additive NARX
2) 1)
2) 1) 2)
2)
so
Yo + ety(k - 1) ,v(k - 1),
+ (3u(k -
so
Figure 1: The four input sequence classes, p
= 0.10
a natural choice for a "moderate complexity" nonlinear model that could be identified and simplified by stepwise regression techniques [1J.
+ (3)
were considered as approximations to the exact discrete-time dynamics defined in Eq. (2). Here, Yo, Q, (3, and, are model parameters to be estimated from input-output data, while the term v(k -1) is either identically zero, giving an affine reference model, or one of the ten possible model structures listed in Table 1. For a more detailed discussion of the different model types listed there, see [7J; the main point here is that both the qualitative behavior of these models and their suitability to different model-based control system design procedures varies greatly. The primary motivation for choosing this particular set of models is that v(k - 1) then represent all possible nonlinear terms in a NARMAX model of the form: y(k)
0
,so
4
1)
""
"ala
o
Eleven different special cases of the following general model form:
=
,so
;I~I lVuJ'fULl
Empirical models
y(k)
so
GOl0
Table 1: The nonlinear model term v(k - 1)
3
8010
F(y(k -1),y(k - 2),J1.(k -1),J1.(k - 2)),
if F : R4 -+ R is restricted to a second-order polynomial in its arguments. This particular model form is
Input sequences
One of the primary points of this paper is that the results we obtain depend on the input sequences we use to excite the process in generating our identification datasets. Here, we consider input sequences defined as follows. First, consider an independent, identically distributed (i.i.d.) sequence {z(k)} defined for k = 1,2, ... , N, and let p be any real number in the range 0 ::; p ::; 1. Define the input sequence {u( k)} by u(1) = z(1) and, for k = 2, ... , N : u(k)
z(k) { u(k - 1)
with probability p with probability 1 - p.
Here, p represents the switching probability for this sequence, since u(k) =I- u(k - 1) with probability p. Note that as p -+ 1, the sequence {u(k)} approaches the underlying LLd. sequencf) {z(k)}, while as p -+ 0, the sequence {u( k)} approaches the constant limit u(k) = z(1) for all k. This class of input sequences has been considered previously for uniformly distributed i.i.d . sequences {z(k)} [4J; here, we generalize this input class by considering sequences {z (k)} drawn from the beta distribution with parameters a and b. On the interval
330
~
~
§
II !
0
0
!
, ,, !
§
i
...c.
'T
I' I
i
i ..:...
T
i
I
i
I
T
'T'
i
I
•
!
..L
I I
J...
~
•
I'
I
Rank
VOlO
BOlO
GOlO
AOlO
1
8
8
8
8
2 3
4 5 1
4 1
4 1 5
3 9
4 5 10 9 7
7
7
7 10
6 3
2 6
2 6 0
2 0
6 9 0 5 10
4 5
! J..
6
~
7
8 9
la
Figure 2: Prediction error vs. model number
11
[0, 1], the probability density function for this distribution is given by:
1
3 2
3
9
0 10
Table 2: Model rankings vs . input distribution
The wide horizontal line in the center of each box represents the median prediction error value, while the narrow horizontal lines connected to the box with dashed lines represent the minimum and maximum values obtained for the relative RMS prediction error. Similarly, the wide horizontal lines defining the top and bottom of the box represent the upper and lower quartiles, respectively; thus, 50% of the observed RMS prediction errors lie within these limits. Table 2 gives the relative model rankings obtained when the eleven models are ordered by median RMS prediction error. These results illustrate the influence of input sequence distribution on model structure selection. Note that the "best" two models obtained from these identifications were consistently Model 8 (best) and Model 4 (second best), independent of the distribution of the input sequence, a point that will be revisited in Section 7. The relative rankings of the other nine model structures do vary with the choice of input sequence, dramatically so in some cases. For example, Model 5 is ranked "third best" on the basis of the VOI0 input sequence results , but is "second 5 Structure selection worst" on the basis of the GOlO input sequence reFigure 2 shows box plots [5] of the relative RMS pre- sults. Also, note that while Model 0 (the affine referdiction errors obtained from 100 identifications based ence model) is consistently among the poorest of the on statistically independent VOlO input sequences. models considered here, it is not uniformly the poor-
where C is a normalization constant that depends on a and b. For a = b = 1, we recover the uniform distribution, while for a = b -t 0 the beta distribution approaches a degenerate limit concentrated at the two points x = 0 and x = 1. Similarly, for a = b -t 00, the beta distribution approaches a Gaussian limit, while if a "/; b, the distribution is asymmetric. Here, four specific members of this family are considered, designated as follows: "Dxyz" indicates a sequence defined by distribution D and switching probablity p = x .yz (e.g., VOlO is uniformly distributed with a switching probability p = 0.10). The four distributions considered here are the uniform distribution V (corresponding to a = b = 1), a "nearly binary" distribution B (corresponding to a = b = 0.1), a "nearly Gaussian" distribution G (corresponding to a = b = 4) , and an asymmetric distribution A (corresponding to a = 2, b = 4) . Examples of these four sequences for p = 0.10 are shown in Fig. 1.
331
est. If we were identifying these models by directly minimizing the RMS prediction error, this observation would represent a contradiction, since the affine model may be viewed as a constrained special case of all of the other models considered here. The difference is that the models considered here are identified by the standard practice of minimizing the one stepahead prediction error variance, and not the prediction error variance associated with the entire input sequence {u( k) }. The switching probability p also significantly influences the results of model structure selection. For example, in contrast to the DOlO results just presented, the median relative RMS prediction error is better for Model 4 than for Model 8 with the DIOO input sequence. Further, the five best models for the D030, DOlO, and D005 input sequences are the same (i.e., Models 8,4,5,1, and 3, in decreasing order), as is the worst (Model 0), while none of these orderings hold for the DlOO input sequence. In fact, if we examine the DlOO results closely, we find that all eleven models exhibit small enough prediction errors (e.g., median values less than 0.25% in all cases) that they may be regarded as essentially identical in performance; this result is a consequence of the fact that the DlOO input sequences change too rapidly for the process to follow, so all responses are small and well approximated by the affine model.
6
Parameter estimates
To illustrate the influence of input sequences on the estimated model parameters, Figs. 3 through 5 present box plots of the parameter estimates obtained for 100 identification experiments, plotted against the input sequence class. More specifically, the left-most four box plots in each figure show the range of parameters estimated from the D, B, G, and A sequences (left to right) for p = 1.00, while the central four box plots present the corresponding results for p = 0.10. The four box plots to the right of the triple vertical line show the influence of switching probability, summarizing the results obtained for the DlOO, D030, DOlO, and D005 sequences (left to right). Fig. 3 summarizes the results obtained for the estimated Q
,I I
-
'T
"T
T
= ..;..
:- T i
!
I!
.....
~
.L
i I
i
~
i
I
i
I I
i i
.....
Figure 3: Estimated
Q
~
! ,ii .....
T
I! ~ -
I -
i
:
II !
,
-
vs. input sequence, Model 0
parameter for Model 0, and several general trends are apparent. The leftmost four plots clearly demonstrate the influence of the input sequence distribution on the estimated Q parameter: the UlOO and BlOO sequences yield comparable results, both in terms of the median parameter estimate (Q ~ 0.80) and variability around this median value; the G 100 parameter estimates have about the same median value but lower variability, while the median parameter estimate for the AlOO input is noticably larger. The four box plots for p = 0.10 also show a clear dependence of the median parameter estimate on the input sequence distribution, but the most obvious feature of these box plots is their much greater variability. This point is seen even more clearly in the rightmost four box plots, which shows how the variability in the parameter estimates increases consistently as the switching probability decreases from p = 1.00 to P = 0.05. Qualitatively similar results are seen in the Q, {3, and Yo parameter estimates obtained for all cases except Model 8: a clear distribution dependence is seen, principally in the median parameter estimates, and a consistent large increase is seen in variability of the estimated parameters as p -+ o. The same general observations just made for the Q, {3, and Yo parameters also hold for the 'Y parameter for all models except Model 8, but here an additional observation is also noteworthy. As a specific example, Fig. 4 shows the box plots obtained for the 'Y parameter estimated for Model 1: note that for
332
~
§
~
•, i i T
~
i
!
" 7
Ii
T
-:
i ~
.,..
=-.L
,
.L
••
• •
i
I
1,
!
III :
..i..
i
~
''''
.,.
T
-:
I
T ell == == 1. .l.. .L
.,. ...
=
T i
T
!
T ! -=1. ~:
030
010
~
A
100
0015
Figure 5: Estimated, vs. input sequence, Model 8
Figure 4: Estimated, vs. input sequence, Model 1
p
1
~
'"'-
i
~
I
7
= 1.00 (the leftmost four box plots), the, parame-
ter is approximately zero, relative to the much larger negative values obtained for the median parameter estimate when p = 0.10 (the central four box plots). This observation is significant since, may be viewed as a "non linearity measure" for this specific problem (i.e., , = 0 corresponds to the affine reference model). In fact, this observation reinforces the point made above, that for p = 1.00, the input sequence changes too rapidly for the process to follow , so the responses are small in magnitude and well approximated by a linear model. As p -t 0, the process has time to respond to the input sequence changes and the nonlinearity becomes more apparent, reflected in the larger median values of the parameter estimate
,.
As noted above, the situation is dramatically different for Model 8, consistently the best of the eleven approximations considered here. In this case, the variability in the estimated Yo, Q , and, parameters decreases in going from p = 1.00 to P = 0.10, in marked contrast to all of the other models; this behavior is seen clearly in Fig. 5, which summarizes the estimation results for the parameter f. Also, note that the shift toward significantly more negative, parameters with decreasing p observed for Model 1 is also observed here. Interestingly, the variability in the {3 parameter increases consistently as p -t 0, just as in the other models.
Summary
The case study summarized in this paper was presented to illustrate the nature and extent of the influence of input sequences on the results of model structure selection and parameter estimation. The particular example chosen - the Eaton-Rawlings reactor model of Eq. (1) - is simple enough to permit explicit discretization and to avoid certain fundamental difficulties that arise in discretizing continuous-time models of relative degree greater than one [6]. Eleven different empirical models - one affine reference model and ten different non linear models - were identified and their relative performance as approximations of the true process dynamics was compared in terms of relative RMS prediction error. The input sequences on which these identifications and evaluations were based were random step sequences, generalizing those considered elsewhere (e.g., [4]) from a uniform distribution of values on a specified interval [Umin, U max ] to a beta distribution on this interval. The results presented here illustrate clearly that both the relative rankings of these models and the estimated model parameters obtained for a given model structure can depend strongly on the choice of input sequence. As a specific example, Model 5 is a bilinear model based on the non linearity vs(k-l) = J1.(k-l)y(k- 2) . For the UOlO and BOlO input sequences, this model ranks third in terms of median RMS prediction error,
333
~
significant disagreement between the estimated (J parameter ('" 0.76) with the coefficient of Jl(k - 1) in Eq. (4), Model 8 out-performs the Euler discretization . In particular, the peak residuals seen in Fig. 6 are approximately three times smaller, and the standard deviation is approximately half of that for the Euler discretization.
,-------------------------------,
References ,.
[1] S.A. Billings and W.S.F . Voon, "A prediction-
'OD
error and stepwise regression algorithm for nonlinear systems," Int. J. Control, V. 44, 1986, pp. 235 - 244.
Figure 6: UOlO residuals , Model 8 vs. Euler
only slightly worse than Models 8 and 4 and clearly better than the fourth place model. In contrast, for the GOlO input sequence, Model 5 offers poorer performance than the affine reference model, surpassing only Model 10. In fact, Model 10 is also an interesting case, varying strongly with both distribution and switching probability; in particular, the relative ranking of this model varies from one of the best (i.e., rank 4 for the BOlO input sequence) to the absolute worst (for both the GOlO and UlOO input sequences) . In view of this strong dependence on input sequence generally seen for the relative model ranks , the uniformly good performance of Models 4 and 8 is particularly noteworthy. It was observed earlier that both of the non linear terms on which these models are based are included in the Euler discretization of the original continuous-time model (1) . Specifically, this discretization may be written as: y(k)
y(k - 1)
+ 1.05Jl(k -
1)
(4)
-0.15y2(k - 1) - 0.30y(k - l)Jl(k - 1).
The model residuals y(k) - y(k) for a typical UOlO input sequence are shown in Fig. 6 for this model (solid line), along with those for Model 8 (open circles) . Comparing the estimated model parameters for Model 8, it is clear that the observed values Yo '" 0.01 and Q ' " 1.05 are consistent with this model, and 'Y '" -0.2 is comparable to the coefficient of y2 (k - 1) in Eq. (4) . Despite the lack of the bilinear term and
334
[2] K .J . Astrom, P. Hagander, and J. Sternby, "Zeros of Sampled Systems," Automatica, 1984, ppl 31 - 38.
V.
20,
[3] J .W . Eaton and J.B . Rawlings, "Feedback control of chemical processes using on-line optimization techniques," Comput. Chem. Eng., v. 14, 1990, pp. 469 - 479.
[4] E . Hernandez and Y. Arkun, "Control of Nonlinear Systems Using Polynomial ARMA Models," AIChE J. , V . 39, 1993, pp. 446 - 460.
[5] D.C. Hoaglin, F . Mosteller, and J.W. Thkey, Fundamentals of Exploratory Analysis of Variance, John Wiley and Sons, New York, 1991.
[6] S. Monaco and D. Normand-Cyrot, "Zero dynamics of sampled nonlinear systems ," Systems Control Lett. , V . 11 , 1988, pp . 229 - 234. [7] R.K. Pearson, Discrete-time Dynamic Models , in preparation.