Copyright © IFAC Dynamics and Control of Process Systems, Corfu, Greece, 1998
NONLINEAR TERM SELECTION AND PARAMETER ESTIMATION IN THE IDENTIFICATION OF NONLINEAR REDUCED ORDER STATE SPACE MODELS

William Docter and Christos Georgakis 1

Chemical Process Modeling and Control Research Center and Department of Chemical Engineering, Lehigh University, Bethlehem, Pennsylvania, USA
Abstract: This paper presents a general methodology for the development of Nonlinear Low Order Models (NLLOM) from data collected from detailed nonlinear simulation models. This methodology is divided into two tasks: development of an Average Linear Low Order Model (ALLOM) and augmentation of the ALLOM to form an NLLOM, the latter task being the focus of this paper. The tools examined for the nonlinear augmentation of the ALLOM include stepwise regression and nonlinear optimization. Results will be presented for the application of these techniques towards the development of an NLLOM from a detailed high purity distillation simulation. Copyright © 1998 IFAC
1. INTRODUCTION
Detailed nonlinear models are frequently available for staged separation processes, especially when the equilibrium thermodynamics are well known and an estimate of the tray efficiencies can be made. These detailed models have been useful in off-line studies; however, their large size has precluded their use in on-line applications such as Model Predictive Control (MPC) and state estimation. An NLLOM is desired that will retain the dominant characteristics of the detailed simulation, yet be fast enough to be used on-line. This NLLOM will be obtained by first identifying an ALLOM and later augmenting it with the appropriate nonlinear structures. The latter task requires that the linear identified model be properly conditioned and accurate over a wide range of conditions. One such robust linear identification method is proposed by Docter and Georgakis [1997] and is outlined as follows:

(a) Specification of primary inputs and outputs of interest
(b) Selection of input excitation signal and collection of identification data
(c) Removal of separable nonlinear data components
(d) Identification of initial Input-Output ALLOM
(e) Selection of secondary outputs strongly related to the states of the initial ALLOM
(f) Re-identification of ALLOM
The ALLOM that the above methodology arrives at will serve as the starting point for the development of an NLLOM, which is the focus of this paper. The necessity for this extension is demonstrated in Figure 1, where it can be observed that the trajectory of the ALLOM differs significantly from that of the nonlinear simulation data used to identify it. The methodology for extending the ALLOM to form an NLLOM is broken down as follows:
(1) Development of ALLOM
(2) Augmentation of ALLOM to form NLLOM
(a) Calculate the unmodelled nonlinear residuals (each ALLOM state corresponds directly to an output [Docter and Georgakis, 1997]):

w_k = y_{k+1} - A y_k - B u_k    (1)

(b) Specify candidate nonlinear terms, φ_i(u_k, y_k), for the modeling of each output's residuals.
(c) Select the necessary terms for inclusion in the NLLOM using stepwise regression between the candidate terms, φ_i, and the residuals, w_k.
(d) Augment the ALLOM with the selected nonlinear terms and initial values of their respective coefficients, θ_i:

y_{k+1} = A y_k + B u_k + Σ_i θ_i φ_i(u_k, y_k)    (2)

(e) Find optimal values of the nonlinear term coefficients using nonlinear parameter estimation.
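Step (2a) is a one-line computation once the ALLOM matrices are in hand. A minimal sketch of Eq. (1), with illustrative names (none of these identifiers come from the paper):

```python
import numpy as np

def allom_residuals(A, B, y, u):
    # w_k = y_{k+1} - A y_k - B u_k (Eq. 1): the part of the recorded output
    # trajectory that the linear ALLOM cannot explain one step ahead.
    # y: N x n recorded outputs, u: N x m recorded inputs.
    y, u = np.asarray(y, dtype=float), np.asarray(u, dtype=float)
    return y[1:] - (y[:-1] @ A.T + u[:-1] @ B.T)
```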
1 Corresponding author. Address: D311 Iacocca Hall, 111 Research Drive, Bethlehem, PA 18015, USA; phone: 610-758-5432; fax: 610-758-5297; email: cg00@lehigh.edu
Each of the above steps will be discussed in detail in the following sections. It will be demonstrated that while stepwise regression can extract a manageably small subset of nonlinear terms that minimize the residuals, the parameters that it arrives at might lead to an unstable NLLOM. For this reason, a second step is required to recalculate the parameter values. In this paper, nonlinear parameter estimation is considered for this task.

The major points and contributions of this paper are:

• Stepwise regression effectively reduces the number of nonlinear terms that are included in the NLLOM.
• Stepwise regression gives poor parameter values because it considers only the one step ahead prediction residuals.
• Nonlinear parameter estimation, defined as a multi-step prediction approach, is able to effectively find more appropriate values.
Fig. 1. ALLOM vs. Detailed Nonlinear Simulation (compositions vs. time in minutes; solid: detailed simulation, dotted: ALLOM).
2. SELECTION OF NONLINEAR TERMS
The selection of the nonlinear terms to be included in the NLLOM is divided into two tasks. The first task is a rough screening to determine which classes of nonlinear terms should serve as candidate predictors for the nonlinear residuals. The second task is to prune the set of candidate functions down to a small set of functions that explains the majority of the variation in the calculated residuals. Both of these tasks will be performed using variants of stepwise regression. Stepwise regression builds a model between predictor and response variables in the following manner [Draper and Smith, 1981]:
(1) An initial regression model is generated that contains either all of the candidate predictors or none of them. In many studies, both of these initial conditions are explored in separate regression models in order to reduce the probability of settling in a local minimum.
(2) Each candidate predictor that is not in the current model is tested for inclusion by appending the predictor to the current model and then measuring the improvement of the model's fit to the response.
(3) Each candidate predictor that is in the current model is tested for exclusion by removing the predictor from the current model and then measuring the degradation of the model's fit to the response.
(4) One of three actions is taken:
(a) The predictor that causes the most improvement in step 2 is appended to the other regressors in the current model. This appended model becomes the current model, and steps 2 and subsequent ones are repeated.
(b) The predictor that causes the least degradation in step 3 is removed from the other regressors in the current model. This reduced model becomes the current model, and steps 2 and subsequent ones are repeated.
(c) If neither the inclusion nor the removal criterion has been met for any of the predictors, the stepwise regression process is terminated.
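The loop above is short once made concrete. The sketch below is a minimal Python rendition of steps (1)-(4), assuming forward selection from an empty model and a relative-improvement acceptance threshold; the criterion callable, the threshold, and all names are our assumptions rather than the paper's specification (the paper's preferred criterion, FPE, is discussed next):

```python
import numpy as np

def fpe_score(y, X):
    # Akaike's FPE of a least-squares fit of y on the columns of X.
    N, p = X.shape
    r = y if p == 0 else y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return (N + p) / (N - p) * float(r @ r) / N

def stepwise(y, candidates, score=fpe_score, rel_tol=0.01):
    # `candidates` is a list of N-vectors (the evaluated predictor functions).
    def design(idx):
        return np.column_stack([candidates[k] for k in idx]) if idx else np.empty((len(y), 0))

    included = []                                   # step (1): start from the empty model
    current = score(y, design(included))
    while True:
        moves = [(score(y, design(included + [j])), included + [j])
                 for j in range(len(candidates)) if j not in included]      # step (2)
        moves += [(score(y, design([k for k in included if k != j])),
                   [k for k in included if k != j]) for j in included]      # step (3)
        best, chosen = min(moves, key=lambda m: m[0])
        if best >= current * (1 - rel_tol):         # step (4c): no move meets the criterion
            return included, current
        included, current = chosen, best            # steps (4a)/(4b): accept the best move
```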
There is a great deal of ambiguity in the literature as to what criterion should be used to determine inclusion and exclusion. Common choices
are R², statistical F-tests, Root Mean Square Error (RMSE), and Sum of Squared Errors (SSE) [Draper and Smith, 1981; Ryan, 1997; Jones, 1997]. Another measure, used in this paper, is Akaike's Final Prediction Error (FPE) [Akaike, 1969], chosen because of its ease of calculation and because it is a more appropriate measure than RMSE or SSE when comparing models with greatly varying numbers of parameters. Frequently, the user examines more than one of the above measures to reconcile "ties" that may occur if any one measure is used.
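For a model with p estimated parameters fit to N residual samples e_1, ..., e_N, the FPE takes the standard textbook form (the paper does not restate the formula, so the expression below is quoted from the system identification literature, e.g., Ljung [1987], rather than from the paper itself):

$$\mathrm{FPE} = \frac{N+p}{N-p}\cdot\frac{1}{N}\sum_{k=1}^{N} e_k^2$$

The (N+p)/(N-p) factor inflates the average squared error of heavily parameterized models, which is what makes FPE better suited than raw SSE or RMSE for comparing models with very different numbers of terms.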
Trial   Included terms                              Total Terms   FPE
0       No nonlinear terms                          0             109
1       All z_i z_j, z = [u, y]                     28            22
2       All z_i z_j z_k, z = [u, y]                 84            30
3       All z_i z_j and z_i z_j z_k, z = [u, y]     112           20
4       All y_i y_j and u_i^2                       13            22

Table 1. Group Screening of Nonlinear Terms for Modeling of Residuals of Bottoms Composition
2.1 Initial Screening of Nonlinear Terms

The first task in selecting the nonlinear terms that will be included in the NLLOM is to determine which classes of nonlinear functions should be considered. To this end, the following methodology is proposed:

(1) The FPE is calculated for the nonlinear residuals between the ALLOM and the identification data.
(2) Groups of candidate predictor functions are generated. For example, Group 1 contains all quadratics, Group 2 contains all cubics, etc.
(3) A regression model between each group and the residuals is calculated.
(4) The group that yields the best regression in the FPE sense is selected for inclusion in the candidate set.
(5) Regression models are calculated using the group chosen in the last step appended with each of the remaining unselected groups. If any of the resulting models show significant improvement, then the corresponding groups are also selected for inclusion in the candidate set.
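A sketch of this group-screening loop, under the same caveats as before (names and the significance threshold are ours; `fpe_score` is the helper from the stepwise sketch above):

```python
import numpy as np

def screen_groups(w, groups, score, rel_tol=0.05):
    # w: residual vector for one output; groups: dict mapping a group name
    # (e.g. "quadratics") to its N x p evaluated predictor matrix.
    base = score(w, np.empty((len(w), 0)))                 # step (1): FPE of raw residuals
    fit = {g: score(w, X) for g, X in groups.items()}      # steps (2)-(3): one fit per group
    best = min(fit, key=fit.get)                           # step (4): best single group
    chosen = [best]
    for g, X in groups.items():                            # step (5): try appending the rest
        if g != best and score(w, np.hstack([groups[best], X])) < fit[best] * (1 - rel_tol):
            chosen.append(g)
    return chosen, base, fit

# e.g. chosen, _, _ = screen_groups(w, {"quadratics": Q, "cubics": C}, fpe_score)
```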
This procedure will be referred to as "block-stepwise regression" in the remainder of this paper. While block-stepwise regression is not stepwise regression in the normal sense, since it includes and excludes groups of functions rather than individual functions, it essentially performs the same task and saves a considerable amount of time versus normal stepwise regression, because the number of function groups is normally very small compared to the number of individual regressors being considered.
2.2 Final Selection of Nonlinear Terms

Once the initial screening is complete, there is still the matter of pruning the set down to the few nonlinear functions that will be included in the NLLOM. This task can be done efficiently using normal stepwise regression on the candidate set generated in Section 2.1. The initial model that is used should contain either no predictors (i.e., forward selection) or all of them (i.e., backward elimination). If none are initially included, then the number of steps that are required is minimized. For example, if there are 100 candidate predictors from which only 3 predictors will ultimately be chosen, forward selection starting from an empty regression model may take as few as three steps to reach the final solution, while backward elimination from a model using all regressors would require at least ninety-seven steps. While the forward selection mode minimizes the number of steps, it tends to ignore terms that are important as a group but not important individually; thus, while backward elimination is more time consuming, it is also less likely to miss important terms. Typically the selection problem is done using both sets of initial conditions.
2.3 Term Selection Example

The nonlinear function selection process is demonstrated with respect to one of the four outputs of an ALLOM of a high purity distillation column with nominal product purities of 99.9%. The output under examination in this example is the bottoms composition; however, the selection process itself is applicable to the other three outputs of the ALLOM, which have been left out due to space limitations for the present paper.

The first task is to select the classes of nonlinear functions that will be included in the final stepwise regression. The classes considered here are quadratics and cubics. The criterion used to determine inclusion of a particular class is the FPE. A summary of the FPEs for output 1 is given in Table 1. Trial 0 is the base case, where no attempt is made to model the nonlinear residuals, and thus serves as the reference against which the effectiveness of the other trials is gauged. The inclusion of quadratics only in trial 1 effects a significant decrease of nearly 80% in the FPE versus the case with no model. The inclusion of cubics only in trial 2 does nearly as well; however, the inclusion of both quadratic and cubic groups in trial 3 does not significantly improve on either trial 1 or 2, while at the same time greatly increasing the number of potential regressors.
Fig. 2. Initialization of Stepwise Regression using trial 1 candidate terms as predictor set
Thus, either the quadratic or the cubic group will be included in the final regression, but not both groups. The quadratic group is chosen in this step since trial 1 yields both a better fit and significantly fewer functions than trial 2. In general, lower order terms will take precedence over higher order terms, as is usually the expected case with Taylor series expansions of nonlinear functions.
It is possible that the user may delineate a group of functions that may be more efficiently dealt with as several smaller subgroups. This is indeed the case in this example, in that the group of all quadratics can be subdivided into a number of subgroups, such as quadratics in the outputs, quadratics in the inputs, and so forth. Trial 4 in Table 1 illustrates analysis of a particular subgroup that was detected in the initialization of the final stepwise regression shown in Figure 2, where the solid lines delineate the upper and lower bounds of the FPE of models using predictors that are a subset of the set of all quadratics, and the "x"s show the FPE for models using each single predictor. From examination of Figure 2, it is clear that predictors 14 through 35 do not yield models that are significantly better than no model at all. Predictors 14-35 also have in common that they are all of the cross terms involving u's (u_i u_j and u_i y_j) and thus delineate a subgroup of the quadratics. As shown in Table 1, trial 4, removal of these cross terms does not cause a change in the lower bound of the FPE as compared to that in trial 1. Stepwise regression will arrive at the same predictor set whether these cross terms are removed beforehand or not; however, the time required to perform the stepwise regression is proportional to the number of candidate predictors, so this set should be pruned as much as possible during the rough screening.
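The subgroup bookkeeping above is easy to check combinatorially: with the seven variables z = [u, y] (three inputs, four outputs), the monomial counts reproduce the "Total Terms" column of Table 1. A small sketch with illustrative names:

```python
from itertools import combinations_with_replacement

names = [f"u{i}" for i in (1, 2, 3)] + [f"y{i}" for i in (1, 2, 3, 4)]  # z = [u, y]

quadratics = list(combinations_with_replacement(names, 2))   # trial 1: 28 terms
cubics = list(combinations_with_replacement(names, 3))       # trial 2: 84 terms

# Trial 4: drop the u-cross terms (u_i u_j, u_i y_j), keeping y_i y_j and u_i^2.
trial4 = [t for t in quadratics
          if all(n.startswith("y") for n in t)
          or (t[0] == t[1] and t[0].startswith("u"))]

print(len(quadratics), len(cubics), len(trial4))             # 28 84 13
```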
Fig. 3. Progress of Stepwise Regression on Modeling of Distillation Bottoms Composition Residuals (FPE vs. predictor index at each step).
Figure 3 shows the stepwise regression procedure using the trial 4 predictors in Table 1 to model the bottoms composition residuals. Again the solid lines, at 20 and 110, delineate the upper and lower bounds on the FPE for the set of allowed predictors, and the dashed line delineates the FPE of the current regression model. The o's delineate which predictors are included in the current model and what the FPE will become if that predictor is removed from the model, and the x's delineate which predictors are not included in the current model and what the FPE will become if they were to be included. In the initial step, there are no predictors in the model, and it can be seen that predictor number one will achieve the greatest reduction in FPE if included. In step one, predictor one is in the current model and predictor nine achieves the greatest improvement in FPE, and so forth. This process is repeated until none of the predictors in the current model can be removed without a significant increase in FPE and none of the predictors outside the model can significantly reduce the FPE. This is the case in step five, where predictors 1, 4 and 7 are in the final model. These predictors correspond to y_1^2, y_2^2, and y_1 y_2. This selected set is in agreement with insights gained by examining the locations of the available inputs and outputs in the distillation column shown in Figure 4. It is the y_1 residual that is being modeled in this example, and y_2 is its nearest neighbor. The combination of y_1 y_2 and y_2^2 may represent an attempt in the regression to reconstruct the composition on a tray that is intermediate to locations y_1 and y_2, but that is not directly available because of the limited size of the ALLOM.
Fig. 4. Distillation Column (u1 = feed flow, u2 = reflux rate, u3 = reboiler heat duty; y1 = bottoms composition, y2 = tray 17 composition, y3 = tray 1 composition, y4 = distillate composition; LC = PI level controllers).

3. SOLUTION OF NONLINEAR FUNCTION PARAMETERS

The nonlinear term selection process discussed in Section 2 is performed for each output in the ALLOM, and it provides two pieces of information about the NLLOM: which nonlinear terms are needed and values for their parameters. However, for the distillation example under consideration, the parameters arrived at by stepwise regression yield an NLLOM that is dynamically unstable over the range of the identification data. This NLLOM almost invariably shoots off to infinity given the same input sequences as the detailed simulation. The instability is likely caused by the least squares method that stepwise regression uses to find the parameters, which, in effect, treats the overall modeling problem in a Nonlinear Autoregressive with Exogenous Input (NARX) framework [Ljung, 1987]. It is not surprising that NARX does not lead to a satisfactory NLLOM, since the unsuitability of the Autoregressive with Exogenous Input (ARX) framework for identifying the ALLOM necessitated the use of subspace methods for that task [Docter and Georgakis, 1997]. Of particular interest is the fact that stepwise regression also identifies unstable linear models. If both linear and nonlinear terms are included in the candidate predictor set, then stepwise regression picks the linear terms for inclusion in the model before all others. This inclusion is in agreement with conventional wisdom that justifies the use of linear models as an initial approximation of nonlinear processes; however, even these ARX models yielded by stepwise regression are unstable. The following hypothesis is proposed:

• Stepwise regression is able to select the correct functions, both linear and nonlinear, for inclusion in the model; however, the parameter values that stepwise regression calculates for those terms are not suitable.
The remaining task then becomes one of first demonstrating that suitable parameter values exist for the terms selected and then finding an efficient method for calculating those parameters. While subspace methods have shown considerable promise in the area of linear identification [van Overschee, 1995], their extension to nonlinear identification has been limited thus far to bilinear problems [Favoreel et al., 1997]. This is of little help in the distillation example considered here, since stepwise regression chose nonlinear terms that were not bilinear. A bilinear subspace model was identified anyway using the method of Favoreel et al. [1997] and was found to have very poor performance, lending some evidence that stepwise regression's choice to exclude the bilinear terms was correct.
One notable difference between the ARX/NARX and subspace methods is that ARX/NARX examines only the one step ahead prediction capabilities of the model to be identified, while linear subspace methods identify a model with an Output Error (OE) weighting when the input signal has a white noise spectrum [van Overschee, 1995], as is the case in the ALLOM developed by Docter and Georgakis [1997]. The OE weighting means that the ALLOM has multiple step ahead prediction capability, with the number of steps considered being tied to the number of block rows that are included in the Hankel matrices. Since many on-line tasks, such as MPC, attempt to make predictions multiple steps into the future, it is vital that the multiple step ahead prediction capability be considered in the identification process. To this end, a method is proposed that takes the multiple step ahead prediction into account by minimizing the error of the NLLOM trajectory rather than minimizing the one-step ahead error as is done by ARX/NARX. The methodology is broken down as follows:
(1) The ALLOM is frozen (i.e., its parameters will not be permitted to vary).
(2) The nonlinear functions chosen by stepwise regression are appended to the ALLOM to form an NLLOM. All nonlinear function parameters are initialized to zero, because the values supplied by stepwise regression tend to lead to an unstable system.
If in a particular case the values from stepwise regression yield stable but poor performance, then the nonlinear terms may instead be initialized with those values.
(3) The identification input sequence, u, is presented to the NLLOM just as it would be presented to the detailed simulation. The resulting output of the NLLOM will be referred to as the NLLOM trajectory, ŷ(t|1), where ŷ(a|b) denotes the (a-b) step ahead prediction of y at time a based on state information known at time b [Ljung, 1987].
(4) The trajectory error, err = ŷ(t|1) - y(t), and its Euclidean norm are calculated.
(5) The parameters of the nonlinear functions in the NLLOM are varied by some optimization technique, and steps 2 through 4 are repeated with the goal of minimizing the norm of the trajectory error.
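In code, these five steps reduce to wrapping a full simulation of the NLLOM inside an objective function and handing it to a general-purpose optimizer. The sketch below uses synthetic data and Nelder-Mead purely as placeholders; the paper does not name a specific optimizer, and all function and variable names are ours:

```python
import numpy as np
from scipy.optimize import minimize

def simulate_nllom(theta, A, B, phi, u, y0):
    # Step (3): simulate y_{k+1} = A y_k + B u_k + Phi(u_k, y_k) theta with
    # the ALLOM matrices A, B frozen (step 1) and coefficients theta free.
    y = [np.asarray(y0, dtype=float)]
    for k in range(len(u) - 1):
        y.append(A @ y[-1] + B @ u[k] + phi(u[k], y[-1]) @ theta)
    return np.array(y)

def trajectory_cost(theta, A, B, phi, u, y_data):
    # Step (4): Euclidean norm of the whole trajectory error, not the one-step error.
    return np.linalg.norm(simulate_nllom(theta, A, B, phi, u, y_data[0]) - y_data)

# Tiny synthetic illustration: 2 outputs, 1 input, 2 selected quadratic terms.
rng = np.random.default_rng(0)
A = np.array([[0.8, 0.1], [0.0, 0.7]])                  # stand-in frozen ALLOM
B = np.array([[0.5], [0.2]])
phi = lambda uk, yk: np.array([[yk[0] ** 2, yk[0] * yk[1]],
                               [yk[0] * yk[1], yk[1] ** 2]])
u = 0.1 * rng.standard_normal((200, 1))
y_data = simulate_nllom(np.array([0.05, -0.03]), A, B, phi, u, [0.0, 0.0])

# Steps (2) and (5): start the coefficients at zero, then let the optimizer
# vary them to minimize the trajectory-error norm.
res = minimize(trajectory_cost, np.zeros(2), args=(A, B, phi, u, y_data),
               method="Nelder-Mead")
print(res.x)  # should move toward the coefficients used to generate y_data
```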
Fig. 5. NLLOM vs. Detailed Nonlinear Simulation (compositions vs. time in minutes; solid: detailed simulation, dotted: NLLOM).
In many regards, this method is similar to the shooting method used to solve boundary value problems, except in this case, the intermediate points along the trajectory are just as important as the endpoint. Since the NLLOM itself comes into play in this optimization, the optimization problem is nonlinear with the possibility of local minima and the like.
Since many of the difficulties associated with nonlinear optimization increase greatly as the number of parameters is increased, it is not possible to arrive at an acceptable solution if all of the possible quadratic and cubic functions are included in the NLLOM. On the other hand, the small number of functions chosen via stepwise regression leads to a manageable NLLOM with a significantly decreased risk of numerical/initialization difficulties. The distillation NLLOM considered here has a total of twelve nonlinear functions spread out among its 4 outputs. The performance of this NLLOM after the optimization is shown in Figure 5 and shows dramatic improvement over the ALLOM shown in Figure 1.
4. CONCLUSIONS

Although the parameter values obtained via stepwise regression yield an unstable and unsuitable NLLOM, it has been demonstrated that the functions that stepwise regression chooses can lead to a suitable NLLOM if the parameters themselves are solved for by another method. This failure of parameter solution by stepwise regression is due to the fact that stepwise regression treats the parameter estimation problem within a one-step ahead context, while MPC and state estimation require the low order model to have good multi-step ahead performance. It has been shown in this paper that nonlinear optimization can find values for the function parameters that consider multi-step ahead performance.

References
H. Akaike. Fitting autoregressive models for prediction. Ann. Inst. Statist. Math., 21:243-247, 1969.
William Docter and Christos Georgakis. Identification of reduced order average linear models from nonlinear dynamic simulations. In Proceedings of the 1997 American Control Conference, paper FA16-3. AACC, Albuquerque, NM, June 4-6, 1997.
N. R. Draper and H. Smith. Applied Regression Analysis. John Wiley & Sons, Inc., second edition, 1981.
W. Favoreel, B. de Moor, and P. van Overschee. Subspace identification of bilinear systems subject to white inputs. In Proceedings of the 1997 American Control Conference, paper WM073. AACC, Albuquerque, NM, June 4-6, 1997.
Bradley Jones. Matlab Statistics Toolbox User's Guide. The MathWorks, Natick, MA, January 1997.
Lennart Ljung. System Identification: Theory for the User. PTR Prentice Hall, Englewood Cliffs, NJ, 1987.
Thomas P. Ryan. Modern Regression Methods. Wiley, New York, 1997.
Peter van Overschee. Subspace Identification: Theory-Implementation-Application. PhD thesis, Katholieke Universiteit Leuven, 1995.