Copyright © IFAC Advanced Control of Chemical Processes, Kyoto, Japan, 1994
Nonlinear Input/Output Modeling
Ronald K. Pearson
E.I. DuPont de Nemours and Company, Wilmington, Delaware, 19880-0101
Abstract. This paper gives a highly abbreviated overview of some of the key issues in empirical nonlinear modeling for chemical process applications. This task is complicated by the inherent nature of nonlinearity: the term describes a class of systems by the one feature they lack - linearity. In fact, this division - linear vs. nonlinear - suggests a "unity" or "homogeneity" of the class of nonlinear systems that does not exist. Consequently, this review will focus on specific sub-classes of nonlinear models that have analytically useful structural characteristics, and comparisons will be made both between these classes and with the more familiar linear models. Length limitations restrict these discussions somewhat, but it is hoped that the range of examples will be great enough to demonstrate how nonlinear model identification is both similar to and different from linear model identification. The general conclusion of this paper is that nonlinear input/output modeling is a vitally important practical art with many unresolved issues; the principal objective of this paper is to elucidate some of these issues.

Keywords. Nonlinear System Identification, Higher-Order Statistics, Model Structure Selection, Nonparametric Modeling, Input Sequence Design
1. INTRODUCTION
The title of this paper immediately raises two questions:
1. why nonlinear models?
2. why input/output models?

The motivation to explore nonlinear models comes from the unavoidable nonlinearity of the dynamics of many chemical processes. Indeed, several of the references cited in this paper deal with the nonlinearity of two of the most important chemical process unit operations: reactions and separations. Turning to the second question, while it is true that fundamental (i.e., "first principles") models generally give us more complete process understanding than empirical models do, they are also generally much more complex and require correspondingly longer to develop. Thus, while models like that developed by Congalidis, Richards, and Ray (1989) for the co-polymerization of methyl methacrylate and vinyl acetate could, in principle, be used for applications like model-predictive control, in practice, they are not. Also, note that besides providing simpler models, analysis of input/output data can also give us useful process insights that can be used in subsequently developing or refining fundamental models. In particular, all fundamental models are based on assumptions (i.e., these effects are important, those are negligible) and these assumptions may be wrong: empirical models can help us uncover such "surprises." As many can attest, even linear empirical modeling can be quite challenging. The objective of this paper
is to give some general insights into the additional complications that can arise in nonlinear empirical modeling. Consequently, Section 2 defines both the class of linear ARMAX models and several classes of nonlinear dynamic models. Section 3 illustrates the general nature of nonlinearity with two specific examples of nonlinear behavior - asymmetric responses to symmetric input changes and sub-harmonic generation in response to periodic input sequences. Building on this background, Section 4 discusses some of the practical considerations involved in selecting a nonlinear model structure and Section 5 discusses some of the important connections between nonlinear models and higher-order statistics. Section 6 uses these results to discuss the notions of identifiability and input sequence design and Section 7 illustrates many of the ideas discussed earlier with a process example that is simple enough to be comprehensible, but complex enough to illustrate the nature of the problems involved. Finally, Section 8 gives a brief summary of the main ideas presented here. For a more detailed discussion of some of the ideas presented here, refer to the forthcoming survey by Pearson and Ogunnaike (1994).
2. MODEL STRUCTURES
The following paragraphs describe a few of the more common nonlinear dynamic model structures appearing in the literature. Because many of these model structures are generalizations of linear models, the discussions will begin with the linear case. For brevity, this discussion will be restricted to the discrete-time formulation. In this formulation, one of the most popular linear representations is the ARMAX (AutoRegressive Moving Average with eXogenous inputs) model discussed by Ljung (1987):
y(k) = \sum_{j=1}^{p} a_j y(k-j) + \sum_{j=0}^{q} b_j u(k-j) + \sum_{j=0}^{r} c_j e(k-j).   (1)
In a typical process application, this model would relate the process response sequence {y(k)} to the input sequence {u(k)}, called an exogenous input in the statistical time-series literature. In the applications considered here, this input can be either deterministic or stochastic and represents the time-varying value of a manipulated process variable. The "disturbance sequence" or "noise sequence" {e(k)} represents the combined effects of measurement noise, modeling errors (e.g., neglected nonlinearities), unmeasured disturbances, etc. Probably the most common approximation is to take {e(k)} as an independent, identically distributed (i.i.d.) sequence, setting r = 0. For r > 0, the coefficients {c_j} permit the description of correlated disturbance sequences. In either case, note that since {e(k)} is a stochastic process, {y(k)} is also a stochastic process, regardless of the nature of the input sequence {u(k)}; if the input sequence is also taken as a stochastic process, it is generally assumed to be independent of the error sequence {e(k)}. Note that this model extends directly to multiple-input ARMAX models by adding terms to the second sum, corresponding to the lagged values of additional exogenous inputs. This observation is generic and also applies to all of the nonlinear models introduced below.
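As a concrete illustration of Eq. (1), the sketch below simulates a single-input ARMAX model with r = 0 (i.i.d. disturbances); the particular coefficient values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def simulate_armax(u, a=(0.8,), b=(1.0,), sigma_e=0.0, seed=0):
    """Simulate y(k) = sum_j a_j*y(k-j) + sum_j b_j*u(k-j) + e(k), Eq. (1) with r = 0."""
    rng = np.random.default_rng(seed)
    e = sigma_e * rng.standard_normal(len(u))
    y = np.zeros(len(u))
    for k in range(len(u)):
        y[k] = e[k]
        y[k] += sum(aj * y[k - 1 - j] for j, aj in enumerate(a) if k - 1 - j >= 0)
        y[k] += sum(bj * u[k - j] for j, bj in enumerate(b) if k - j >= 0)
    return y

# Noise-free unit step response: the steady-state gain is b_0/(1 - a_1) = 1.0/0.2 = 5.0
y = simulate_armax(np.ones(200))
```

For this noise-free step response, y(k) converges geometrically to the linear steady-state gain b_0/(1 - a_1), independent of the step amplitude; by contrast, Section 3 shows that the "gain" of a nonlinear model can depend on the step amplitude itself.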
Two special cases of this linear model are particularly significant. The case p = 0, q = ∞ corresponds to the convolution model, with {b_j} representing the impulse response of the system. By Wold's decomposition theorem, any stationary Gaussian stochastic process {y(k)} may be represented in this form with {u(k)} an i.i.d. Gaussian sequence and {e(k)} identically zero (c.f. Kay (1988), p. 112). Conversely, the case p = ∞, q = 0 represents an infinite-order autoregressive model. As in the previous case, it follows from a theorem of Kolmogorov that any stationary Gaussian process may also be represented in this form (c.f. Kay (1988), p. 112). The practical implication of this result is that some linear systems are best approximated by "low-order autoregressive models," while other systems are best approximated by "low-order moving average models." Conversely, it will be demonstrated in Section 3 that this result does not extend to nonlinear systems: some nonlinear phenomena are "inherently autoregressive" in nature and do not possess a "moving average" representation.

Probably the best known class of nonlinear systems that do possess "moving average" representations is the Volterra model:

y(k) = y_0 + \sum_{j=0}^{\infty} a_j u(k-j) + \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} b_{i,j} u(k-i) u(k-j) + \sum_{l=0}^{\infty} \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} c_{l,i,j} u(k-l) u(k-i) u(k-j) + ...   (2)

As a simple example of this model class, the following second-order Volterra model will be considered repeatedly in the following sections (c.f. Nikias and Petropulu (1993)):

y(k) = u(k) + b u(k-1) u(k-2).   (3)

This class of systems is very broad (c.f. Tong (1990), sec. 2.7), but to be practically useful, the sums must be truncated to some finite upper limit M and the number of sums included must also be made finite. Even when M is relatively small and only a few sums are retained, objection is frequently raised to the number of parameters required to specify Volterra models. A particularly useful subset of the class of Volterra models is the class of "block-oriented" nonlinear models. Probably the most popular member of this class is the Hammerstein model, discussed in Billings (1980), Haber and Unbehauen (1990), Greblicki and Pawlak (1989, 1991), Eskinat, Johnson and Luyben (1991), and Su and McAvoy (1993). The structure of this model is shown in Fig. 1 and consists of a static nonlinearity f(·) followed by a linear dynamical system H(z). If the function f(·) is analytic, it is not difficult to show that this model can be represented by the Volterra series (2) with "diagonal" model coefficients (i.e., b_{i,j} = 0 unless i = j, c_{i,j,k} = 0 unless i = j = k, etc.).

Fig. 1. Hammerstein Model Structure (a static nonlinearity followed by linear dynamics)

If the order of the blocks is reversed - i.e., if the static nonlinearity follows the linear dynamics - the resulting system is called a Wiener model, discussed in Billings (1980), Haber and Unbehauen (1990), Greblicki and Pawlak (1991), and Greblicki (1992). This model is shown in Fig. 2 and is not equivalent to the Hammerstein model, as subsequent examples will illustrate. Both of these nonlinear models are special cases of the more general "sandwich model" considered by Brillinger (1977) and Greblicki and Pawlak (1991), in which the static nonlinearity is "sandwiched" between two linear dynamic models. More general "block-oriented" nonlinear models have been investigated involving both series and parallel connections of static nonlinearities and linear dynamics (c.f. Billings (1980), Haber and Unbehauen (1990), and Greblicki and Pawlak (1991)). As a specific example, the Uryson model consists of several Hammerstein models connected in parallel with their outputs summed, but driven by a common input (c.f. Billings (1980)).

Fig. 2. Wiener Model Structure (linear dynamics followed by a static nonlinearity)

The "autoregressive" family of nonlinear dynamic models is represented by the NARMAX (Nonlinear ARMAX) models studied by Billings and Voon (1986):

y(k) = F( y(k-1), y(k-2), ..., y(k-p), u(k), u(k-1), ..., u(k-q), e(k-1), e(k-2), ..., e(k-r) ) + e(k),   (4)

where F(·) is a nonlinear function of the p + q + r + 1 variables indicated. Usually, this nonlinear function is taken to be a low-order polynomial, but if we extend this class to more general functions, the range of nonlinear behavior we can represent expands correspondingly. In particular, note that most of the nonlinear autoregressive models discussed by Tong (1990) are special cases of the NARMAX model with non-polynomial functions F(·). The advantage of restricting F(·) to multivariable polynomials is that it permits the use of linear regression techniques like stepwise regression for model identification (c.f. Billings and Voon (1986)). Alternatively, Chen and Tsay (1993) consider a different structural restriction that permits the use of nonparametric regression methods (see Section 4 for a brief discussion). Specifically, the NAARX (Nonlinear Additive AutoRegressive models with eXogenous inputs) are defined as:

y(k) = f_1(y(k-1)) + f_2(y(k-2)) + ... + f_p(y(k-p)) + g_0(u(k)) + g_1(u(k-1)) + ... + g_q(u(k-q)) + e(k).   (5)

Note that the linear ARMAX model with i.i.d. modeling errors discussed above (r = 0 in Eq. (1)) represents an important special case of the NAARX family. Similarly, Hammerstein models may be obtained by taking f_i(x) = a_i x and g_i(x) = b_i g_0(x) where g_0(x) is the static nonlinearity defining the Hammerstein model. Further, note that the logistic model y(k) = a y(k-1)(1 - y(k-1)) - perhaps the standard chaotic discrete-time model (c.f. Tong (1990)) - is also a special case of the NAARX model.

Artificial neural networks are mathematical models motivated by and loosely modeled on biological neural models proposed originally by McCulloch (1943). Probably the most popular artificial neural network is the feedforward network (c.f. Rumelhart et al. (1986)), a multi-layer structure that implements a static nonlinear map T : R^m → R^n between an m-dimensional input space and an n-dimensional output space. In particular, if u ∈ R^m and y ∈ R^n, a two-layer feedforward network implements the transformation:

y_i = φ( \sum_{j=1}^{m} w_{ij} u_j - s_i ),   (6)

where u_j and y_i are the j-th and i-th components of the input and output vectors, respectively, {w_{ij}} is an n x m matrix of "synaptic weights" and {s_i} is an n-vector of "bias weights." The function φ(·) is a saturation nonlinearity (typically, φ(x) = 1/(1 + e^{-x})) called the "squashing function." Multi-layer networks are constructed by connecting two-layer networks in cascade and it has been shown that a three-layer network is flexible enough to approximate any continuous transformation T with arbitrary accuracy on compact subsets of R^n and R^m (c.f. Cybenko (1989)), although Kramer (1991) and others have argued that more than three layers are desirable for certain applications.

This flexibility may be used in developing "neural versions" of either block-oriented nonlinear models or NARMAX models. For example, application of neural network-based Hammerstein models to chemical processes is discussed by Su and McAvoy (1993). Similarly, "Model IV" discussed by Narendra and Parthasarathy (1990) is basically the NARMAX model described above with F(·) implemented as a feedforward network. These networks feed delayed responses from the output layer back to the input layer and are called recurrent feedforward networks. In multi-layer networks, it is possible to feed delayed "hidden-layer" responses back to the hidden layer itself, introducing dynamics that are entirely internal to the network. Networks of this form were proposed by Elman (1990) and have been considered for chemical process model applications because of their similarity to nonlinear state-space models. In particular, these models have the general structure:

x(k) = F(x(k-1), u(k)),
y(k) = G(x(k)),   (7)

where F(·) and G(·) represent vector transformations like those discussed above. One advantage of this formulation is that the general structure parallels that of typical first-principles models, raising the possibility of both simplified model identification and physical model interpretation (c.f. Scott and Ray (1993)).

Closely related to the feedforward networks just described are networks based on radial basis functions (RBF) (c.f. Pottmann and Seborg (1992)). These networks also implement multivariable nonlinear static
mappings F(·), defined by:

F(x) = \sum_{i=1}^{N} α_i φ( ||x - y_i|| ),   (8)

where ||x|| is the Euclidean norm of the vector x, φ(·) is a scalar function, and {α_i} is a set of unknown model coefficients to be determined from the available data. The function φ(z) typically exhibits a local minimum or maximum at z = 0, so the vectors {y_i} are called the centers of the basis functions. As a specific example, Pottmann and Seborg (1992) consider the function φ(z) = (z^2 + β)^{-1/2}, which exhibits a maximum at z = 0. The advantage of radial basis functions is that, once the width parameter β and the centers {y_i} are specified, identification of the model coefficients {α_i} reduces to a linear regression problem; the harder part of the identification problem is the initial specification of β and {y_i}. The use of RBF networks in dynamic modeling is similar to the use of feedforward neural networks: the static mapping F(·) can implement either the static nonlinearity appearing in the NARMAX model or the static nonlinearities appearing in block-oriented models.

The nonlinear models just described represent a very incomplete list, but they are representative of the types that have been considered for chemical process applications. The following sections will attempt to give some useful insights into the general nature of these models, their important differences, and the practical issues involved in using them to model real-world processes.

3. MODEL BEHAVIOR

To illustrate some of the important differences between linear model behavior and nonlinear model behavior, the following paragraphs briefly consider two inherently nonlinear phenomena: asymmetric responses to symmetric input changes and subharmonic generation in response to periodic input sequences. This second example is not included because subharmonic generation is a problem of great practical significance in chemical process applications, but rather because it clearly illustrates one profound difference between two broad classes of nonlinear systems: "nonlinear moving average models" and "nonlinear autoregressive models." Other inherently nonlinear phenomena include harmonic generation, jump phenomena, synchronization, chaos, and a wide variety of amplitude-dependent responses, as discussed in Tong (1990) and Nayfeh and Mook (1979).

First, consider the 2nd-order Volterra model defined in Eq. (3) with u(k) and y(k) representing deviations from a nominal (e.g., "steady-state") operating condition so that both positive and negative values make physical sense for both variables. For a linear system, note that if y(k) is the response to an input change u(k), then -y(k) is the response to an input change -u(k). In contrast, the response of the second-order Volterra model to a step input of amplitude A is:

y(k) = 0 for k < 0;  y(k) = A for k = 0, 1;  y(k) = A(1 + Ab) for k ≥ 2.   (9)

Specifically, assume b > 0 and note that the "steady-state gain" inferred from this step response is K = 1 + Ab, which is greater than 1 for steps of positive amplitude and less than 1 for steps of negative amplitude.

Chien and Ogunnaike (1993) illustrate similar asymmetric step responses in their analysis of high-purity distillation columns. In particular, changes in reflux ratio or steam flow to the reboiler that move toward lower purity have greater effects on composition than changes that move toward higher purity. Responses of this general type - moves away from a fundamental constraint (e.g., the 100% purity limit) being easier than moves toward it - can be expected in many systems if they are operated close enough to the constraint. An even more pronounced example of asymmetric response is considered in Section 7.

The phenomenon of subharmonic generation illustrates the difference between "inherently autoregressive" nonlinear behavior and "inherently moving average" behavior. Specifically, note that if an input sequence {u(k)} is periodic with period L, it satisfies the condition u(k + L) = u(k) for all k. If y(k) = g(u(k)) for any static nonlinear function g(·), it follows immediately that y(k + L) = g(u(k + L)) = g(u(k)) = y(k) for all k; that is, if the input sequence is periodic with period L, the output sequence is also periodic with period L. Since the generation of the "1/n-th subharmonic" corresponds to lengthening the fundamental period of oscillation from L to nL, it follows that static nonlinearities cannot generate subharmonics. Since linear systems also preserve periodicity, it also follows that block-oriented models - Hammerstein, Wiener, "sandwich," Uryson, etc. - cannot generate sub-harmonics. Similar arguments show that Volterra and other "moving average" nonlinear models cannot generate sub-harmonics, either.

In contrast, Tong (1990) shows explicitly that the following threshold autoregressive model can generate sub-harmonics:

y(k) = 2 y(k-1) + u(k) if |y(k-1)| ≤ 2;  y(k) = u(k) if |y(k-1)| > 2.   (10)

Specifically, in response to the input sequence u(k) = (-1)^k with period 2, the output sequence has period 6, representing a "1/3rd-subharmonic" of the input. This observation suggests that the key feature required in nonlinear models capable of generating sub-harmonics is recursion: the present output must depend on previous outputs and not just on the past history of the inputs. Applying this conclusion to neural networks, it follows that recurrent networks are necessary to generate sub-harmonics. In particular, it follows that either the "neural NARMAX" models or the Elman networks discussed above can generate sub-harmonics (note that Eq. (10) is a special case of Eq. (7)), but the "neural Hammerstein" and other block-oriented neural models cannot. The same conclusions apply to nonlinear models based on radial basis function networks.
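Both phenomena discussed in this section are easy to reproduce numerically. The sketch below (all parameter values are illustrative assumptions) computes the step response of the second-order Volterra model (3) and the response of the threshold autoregressive model (10) to the period-2 input u(k) = (-1)^k.

```python
def volterra_step(A, b=0.5, n=10):
    """Step response of y(k) = u(k) + b*u(k-1)*u(k-2), Eq. (3), for u(k) = A, k >= 0."""
    u = [A] * n
    return [u[k] + (b * u[k - 1] * u[k - 2] if k >= 2 else 0.0) for k in range(n)]

def threshold_ar_response(n=30):
    """Response of the threshold AR model, Eq. (10), to u(k) = (-1)**k, from y(-1) = 0."""
    y, y_prev = [], 0.0
    for k in range(n):
        u = (-1) ** k
        y_prev = 2 * y_prev + u if abs(y_prev) <= 2 else float(u)
        y.append(y_prev)
    return y

gain_up = volterra_step(1.0)[-1] / 1.0      # 1 + Ab = 1.5 for A = +1
gain_down = volterra_step(-1.0)[-1] / -1.0  # 1 + Ab = 0.5 for A = -1
y = threshold_ar_response()                 # 1, 1, 3, -1, -1, -3, 1, 1, 3, ...
```

The asymmetric gains (1.5 for a positive unit step, 0.5 for a negative one) illustrate Eq. (9), and the period-6 output confirms that the recursive model (10) generates a 1/3rd-subharmonic that no static, linear, or moving-average nonlinear model can produce.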
4. STRUCTURE SELECTION

The first step in constructing any empirical process model is the selection of an appropriate model structure (e.g., Volterra, Hammerstein, Wiener, NARMAX, etc.). This selection problem may be approached from either of two basic philosophies - exploratory or confirmatory. The exploratory philosophy seeks to extract as much structural information from the available data as possible, while the confirmatory philosophy selects a candidate model structure and asks the data for a confirmation or rejection of its appropriateness. Closely related to these philosophies are the notions of parametric, nonparametric, and semi-parametric modeling. In parametric modeling, the structure of the model is fixed, reducing the model identification problem to one of determining a finite set of model parameters for which the model predictions best match the available data. In nonparametric modeling, a particular model structure is sought from the available data. For example, in nonparametric regression analysis, a model y(k) = f(x(k)) is sought without specifying the form of f(·) (c.f. Hardle (1990)). In semi-parametric modeling, part of the model structure is completely specified while part is only loosely specified. In process applications, we are generally interested in obtaining a parametric model, but nonparametric or semi-parametric procedures can be extremely useful in suggesting model forms for subsequent parametric identification.

These ideas are illustrated in Fig. 3 for the NARMAX modeling problem. In its most general form, this problem involves selection of a functional form for the multivariable nonlinearity F(·), selection of the input variable or variables {u(k)}, and specification of the order parameters p, q, r in Eq. (4). Once we have specified the model inputs and order parameters, it would be possible - in theory - to use nonparametric regression techniques to determine the form of F(·), as indicated at the top of Fig. 3. In practice, however, the data requirements for nonparametric regression in many variables are extreme, as discussed by Hardle (1990). Three more practical alternatives are shown in the second tier of Fig. 3 - neural NARMAX modeling, polynomial NARMAX modeling, and nonparametric NAARX modeling. Other possibilities exist (e.g., RBF networks instead of neural networks), but the three given here are representative. First, the neural NARMAX alternative discussed in Section 2 replaces the nonparametric NARMAX problem with a parametric one, implementing the multivariable static nonlinearity F(·) as a neural network. Next, probably the most popular procedure in practice is the use of polynomial NARMAX models: the function F(·) is taken to be a low-order multivariable polynomial in all of the lagged output, input, and model error variables. The number of terms in this polynomial (hence the number of model parameters to be identified) grows rapidly with increasing polynomial model order. The use of stepwise regression techniques has been discussed by various authors (e.g., Draper and Smith (1981), Billings and Voon (1986), Hernandez and Arkun (1993)) and it provides a systematic approach to reducing this complexity by retaining only those terms in the polynomial that significantly improve the goodness of fit. Finally, the general procedure for nonparametric regression in many variables advocated by Hardle (1990) is the use of additive models, replacing a single N-dimensional smoothing problem with N one-dimensional smoothing problems. This idea was Chen and Tsay's (1993) motivation for investigating nonparametric NAARX models.

Fig. 3. NARMAX Identification Procedures (a hierarchy: nonparametric NARMAX at the top; neural NARMAX, polynomial NARMAX, and nonparametric NAARX in the second tier; polynomial NAARX, semi-parametric Hammerstein, and parametric Hammerstein in the third tier)

The model identification problem may be simplified further by imposing additional structural restrictions on the NARMAX model, as illustrated in the third tier of Fig. 3. For example, if the cross-terms in the polynomial NARMAX model are eliminated entirely, the result is a polynomial NAARX model. Alternatively, recall that Hammerstein models are special cases of the NAARX model with f_i(x) = a_i x and g_j(x) = b_j g_0(x) where g_0(x) is the static Hammerstein nonlinearity. Thus, we can adopt a semi-parametric model identification procedure by using nonparametric smoothing to estimate the scalar function g_0(·) and estimating the linear model parameters {a_i} and {b_j}. Greblicki and Pawlak (1989) discuss nonparametric identification of this nonlinearity in detail. This model identification problem can be further reduced to a fully parametric procedure like that employed by Eskinat, Johnson, and Luyben (1991). There, the authors developed Hammerstein models for distillation columns, representing the linear dynamics by a low-order ARMAX model and the static nonlinearity by a low-order polynomial. The authors conclude that the identified Hammerstein models represent the column dynamics better than linear models do, but that model performance degrades unacceptably for high-purity columns. It remains an open question, however, whether this conclusion represents an inherent limitation of the class of Hammerstein models or a limitation of the parametric sub-class considered by the authors. Note that the semi-parametric Hammerstein modeling procedure described above could give useful insight into this question. In particular, note that nonparametric estimation of the static nonlinearity could be used to identify functions that are not well approximated by low-order polynomials (e.g., saturation nonlinearities). As a specific example of this difficulty, Pottmann, Unbehauen and Seborg (1993) consider a pH control problem in which the pH curve must be approximated by a piecewise-polynomial function to achieve an adequate fit; this approach is closely related to the use of spline functions for nonparametric regression (c.f. Hardle (1990)).

In taking a regression approach to model structure selection, the primary focus is on goodness of fit. In practice, it is important to note that other criteria may be of equal or greater importance. For example, an extremely important related issue is the sensitivity of the model prediction errors to changes in the problem formulation. This issue has two aspects - the sensitivity of the algorithms used for model identification to errors in the data, and the region of validity of the model M as an approximation of some real-world process P. The issue of sensitivity to errors in the data is the domain of robust statistics (c.f. Poljak and Tsypkin (1980), Huber (1981)) and will be discussed further in Section 5. Note also that this sensitivity to data quality will generally depend on both the model structure chosen and the input variables selected for inclusion in the model. In particular, note that the stepwise regression procedure discussed above for polynomial NARMAX model development considers each possible model term for inclusion or exclusion on the basis of goodness of fit alone; Leger and Altman (1993) give a detailed discussion of the influence of such variable selections on the robustness of the resulting model fit. Specifically, they note that the terms included in a multivariable model can profoundly affect the sensitivity of the model predictions to outliers in the data.

The region of validity of the model M is an extremely important topic, to be discussed further in Sections 6, 7, and 8. One way of describing this region of validity is in terms of a set U of input sequences {u(k)} for which the model predictions {ŷ(k)} are "sufficiently close" to their "true" values {y(k)}. An important extension of this concept is to consider the qualitative behavior of the model M over the set U - does the model exhibit multiple steady states, chaotic regimes, etc.? Note that these considerations are both more difficult and more important for nonlinear models than for linear models because the qualitative behavior of nonlinear models can depend very strongly on the specific input sequence driving them. The example discussed in Section 7 illustrates this point in detail. Finally, note that other practical considerations may dictate that certain model structures are preferred over others, even at the expense of goodness of fit or some of the other adequacy criteria just discussed. In particular, differences in overall qualitative behavior between different models with comparable goodness of fit can be profound, as Section 7 will demonstrate. For example, it has been argued (c.f. Morari and Zafiriou (1989)) that non-minimum phase dynamics in linear models limit achievable control system performance because they do not exhibit well-defined (e.g., stable and causal) inverses. Analogous difficulties can arise in nonlinear models, where the corresponding notion is that of unstable zero dynamics (c.f. Monaco and Normand-Cyrot (1988)). A different, though similar, difficulty can arise in block-oriented models if the static nonlinearity f(·) fails to have an inverse. Specifically, note that a function f(·) defined on a compact support set is invertible if and only if it is strictly monotonic (c.f. Klambauer (1975), p. 181). In Section 7, several models are considered for the same system, including both a non-invertible Wiener model and a NARMAX model (Model 1) for which the inverse is given explicitly. The point is, even though both models may yield comparable goodness of fit to the available input/output data, control strategies developed from these models are apt to be quite different in both structure and performance.

5. HIGHER-ORDER STATISTICS

When a linear system is excited by a Gaussian input sequence {u(k)}, the resulting output sequence {y(k)} is also Gaussian. When a nonlinear system is excited by a Gaussian input sequence, the resulting output is generally non-Gaussian, establishing a strong connection between non-Gaussian statistics and nonlinear systems. As a practical example of this connection, Brillinger (1977) considered the problem of identifying the "sandwich model" discussed in Section 2. He showed that the static nonlinearity and both linear dynamic subsystems can be identified from the power spectrum of the input sequence, the cross-spectrum of the input and output sequences, and a cross-bispectrum between these sequences. Note that the bispectrum and other higher-order statistics are identically zero for Gaussian sequences (c.f. Nikias and Petropulu (1993)) and are therefore quantitative measures of "non-Gaussianity." Brillinger's results assume a Gaussian input sequence {u(k)}, are nonparametric, and require the static nonlinearity G(·) to be asymmetric (i.e., neither even nor odd). Qualitatively similar results for low-order Volterra model identification have been derived by Koh and Powers (1985) and Pearson, Ogunnaike, and Doyle (1994). In particular, the matrix B of second-order Volterra coefficients {b_{i,j}} is given by:
Outlier
Average
Variance
Skewness Skewneu
Kurtoei
No
-0.006 +0.002
1.000 1.065
-0.002 +0.454
0.553 3.894
Yes
_ 11 R-1T. D-1"., R- 1 B -= 2' ....... .£,u tAU .......... ..1
(11)
Effect. oC of +8u TABLE 1 1 Effects +8(7' Outlier on Estimated IABLE Moment. Moments
where R Rv.u .... is the autocorrelation matrix Cor for the Gaussian input sequence {u( k)} and T,u {u(k)} T.... is a crosscrouinput and output bicorrelation matrix between the inpnt sequences.
As in the Wiener model example just considered, to see the nonlinear nature of this system, it is necessary to consider higher-order statistics. Here, however, the effects are extremely subtle because the skewness γ = E{y³}/σ³ is also zero. A complete dynamic third-moment characterization reveals that the third cumulant c3(1,2) = E{y(k)y(k-1)y(k-2)} is nonzero, reflecting the "dynamically asymmetric," non-Gaussian nature of the sequence {y(k)}. In fact, the nonlinear model structure correctly follows from Eq. (11) since this system is a second-order Volterra model driven by a Gaussian input sequence. Equivalently, this model may be identified by nonparametric frequency-domain techniques based on cross-bispectra like those considered by Brillinger (1977).
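This third-moment characterization is easy to verify numerically. The sketch below assumes Eq. (3) has the form y(k) = u(k) + b·u(k-1)·u(k-2); since Eq. (3) is not reproduced in this section, that form and the coefficient b are illustrative assumptions.

```python
import numpy as np

# assumed form of the second-order Volterra model of Eq. (3):
#   y(k) = u(k) + b*u(k-1)*u(k-2)
# (illustrative; the exact form and the value of b are assumptions)
rng = np.random.default_rng(1)
N, b = 200_000, 0.5
u = rng.standard_normal(N)          # i.i.d. Gaussian input, sigma = 1

y = u.copy()
y[2:] += b * u[1:-1] * u[:-2]
y -= y.mean()

def acov(x, lag):
    # sample autocovariance at the given lag
    return np.mean(x[lag:] * x[:len(x) - lag])

# second-order statistics reveal nothing: R_yy(k) ~ 0 for k != 0
print([round(acov(y, k), 3) for k in range(4)])

# but the third cumulant c3(1,2) = E{y(k)y(k-1)y(k-2)} = b*sigma^4 != 0
c3 = np.mean(y[2:] * y[1:-1] * y[:-2])
print(round(c3, 3))
```

The nonzero c3(1,2) estimate exposes the nonlinearity that the autocovariance sequence cannot.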
Brillinger's (1977) "sandwich model" identification results are based on cumulant expressions involving (generally multivariable) static nonlinearities and multivariate Gaussian random variables. Applying these results to the Wiener model shown in Fig. 2 yields the following cross-correlation expression:

    R_yu(j) = [E{x f(x)} / E{x²}] R_xu(j)        (12)

where {x(k)} is the output sequence generated by the linear subsystem H(z) in response to the input sequence {u(k)}. In the special case where H(z) = 1, x(k) = u(k) and Eq. (12) reduces to Bussgang's theorem (c.f. Bendat (1990)). Taking the Fourier transform of Eq. (12) yields the nonparametric "equivalent linear system identification" expression:

    Ĥ(ω) = S_yu(ω) / S_uu(ω) = [E{x f(x)} / E{x²}] H(ω)        (13)

Here, use has been made of the fact that S_ux(ω) = H(ω) S_uu(ω). Note that if f(x) = ax, this expression reduces to the correct linear result Ĥ(ω) = aH(ω). However, if f(x) = ax + φ(x) where φ(x) is any even function (i.e., φ(-x) = φ(x)), it follows that E{x φ(x)} = 0, so again, Ĥ(ω) = aH(ω). Thus, nonparametric system identification with Gaussian inputs is blind to arbitrary even perturbations φ(x). To detect the presence of this nonlinearity, it is necessary to consider higher-order statistics. Conversely, if f(x) = ax + bx³, the "equivalent linear model" identified is Ĥ(ω) = (a + 3bσ²)H(ω), permitting the ratio b/a to be estimated from the dependence of Ĥ(ω) on the input signal intensity σ².

Finally, before leaving the topic of higher-order statistics, a cautionary example is offered, motivating the need for robust statistical procedures like those proposed by Huber (1981). In particular, note that the higher-order statistics required for nonlinear system identification are dynamic extensions of the classical statistical notions of skewness and kurtosis (c.f. Box and Tiao (1973)). Consider the effects of a single "outlier" of magnitude +8σ contaminating a sequence of 1024 i.i.d. Gaussian random variables with mean zero and variance σ². Estimated values for the mean, variance, skewness, and kurtosis with and without this outlier are given in Table 1. The presence of this one "bad" data point (i.e., a large error in 0.1% of the data) has negligible effect on the estimated mean, increases the variance slightly, but inflates both the estimated skewness and kurtosis enormously: note that both of these quantities should be zero. Analogous outlier sensitivities can be expected for estimated bispectra, trispectra, and other higher-order statistics. Indeed, the sensitivity is worse for these dynamic statistics in the sense that the effects of the outlier can be distributed unpredictably over the entire frequency space on which these higher-order spectra are defined.
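The Table 1 experiment is easy to reproduce in outline. The sample size and outlier magnitude follow the text; the random seed and the outlier's position in the sequence are arbitrary choices.

```python
import numpy as np

def skewness(x):
    # third standardized sample moment (should be near zero for Gaussian data)
    z = x - x.mean()
    return np.mean(z**3) / np.mean(z**2)**1.5

def excess_kurtosis(x):
    # fourth standardized sample moment minus 3 (near zero for Gaussian data)
    z = x - x.mean()
    return np.mean(z**4) / np.mean(z**2)**2 - 3.0

rng = np.random.default_rng(0)
clean = rng.standard_normal(1024)   # 1024 i.i.d. N(0,1) samples
dirty = clean.copy()
dirty[100] = 8.0                    # a single +8*sigma outlier (position arbitrary)

print(skewness(clean), excess_kurtosis(clean))   # both near zero
print(skewness(dirty), excess_kurtosis(dirty))   # both inflated, as in Table 1
```

A single bad point barely moves the mean and variance but shifts the skewness estimate by roughly 8³/1024 ≈ 0.5 and the kurtosis estimate by roughly 8⁴/1024 ≈ 4, matching the pattern in Table 1.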
The second-order Volterra model considered earlier gives another illustration of the need for higher-order statistics in nonlinear system identification. Specifically, if the model in Eq. (3) is excited by a zero-mean, i.i.d. Gaussian sequence with variance σ², the model response will also be a zero-mean sequence that is uncorrelated, i.e.:

    R_yu(k) = R_uu(k) = { σ²,  k = 0
                          0,   k ≠ 0        (14)

For Gaussian processes, lack of correlation implies statistical independence, but this result does not hold for non-Gaussian processes.

6 IDENTIFIABILITY AND INPUT SEQUENCE DESIGN

In theory, linear systems may be identified from their responses to a wide variety of input sequences: impulses, steps, pseudo-random binary sequences (PRBS), i.i.d. Gaussian sequences (i.e., Gaussian "white noise"), or correlated Gaussian sequences (e.g., Gaussian autoregressive processes). In practice, some input sequences are more effective than others, but input sequence design procedures for linear system identification typically only specify the power spectrum of the sequence (c.f. Ljung (1987)). In particular, note that the random telegraph wave (a particular PRBS discussed in Papoulis (1965)) and the first-order Gaussian autoregressive process have the same correlation structure (and thus the same power spectrum). Consequently, these sequences would be equally effective in linear system identification. The choice between these (or other) sequences will ultimately depend on additional practical considerations, as discussed in Ljung (1987).
In contrast, specification of the power spectrum of the input sequence is not sufficient for nonlinear system identification. Specifically, it was noted in Pearson, Ogunnaike and Doyle (1994) that no PRBS input can be used to identify the diagonal elements of the B matrix in second-order Volterra models. However, the B matrix can be identified using the Gaussian sequence with the same autocorrelation matrix; this result is given explicitly by Eq. (11). In general, PRBS inputs are a poor choice for nonlinear system identification because they do not exercise the process over a wide enough range of input values. Note, however, that simply expanding the range of input values is not sufficient, either. For example, note that the response of the second-order Volterra model defined in Eq. (3) to the impulse u(k) = aδ(k) is simply y(k) = aδ(k), for arbitrary a. Thus, the impulse response of this system gives no indication of its nonlinearity, regardless of the input amplitude a or the strength of the nonlinearity b. More generally, note that the impulse response of a Volterra model depends only on the diagonal model coefficients: y(k) = y_0 + a_k + b_{k,k} + c_{k,k,k} + ...

If the nonlinear "squashing functions" are removed from a feedforward network, the permutation invariance noted below expands to an invariance under a much broader class of transformations. In particular, the (linear) map implemented by such a network is invariant under post-multiplication of the weights in layer i by an arbitrary nonsingular matrix W and pre-multiplication of the weights in layer i+1 by its inverse W^{-1}. This invariance is analogous to the invariance of state-space models under similarity transformations of the matrices defining the model. Specifically, consider the linear state-space model:

    x_{k+1} = A x_k + B u_k
    y_k = C x_k        (15)
Note that the input/output response of this model is invariant under the transformation A -> W^{-1} A W, B -> W^{-1} B, and C -> C W. In ARMAX models, this invariance corresponds to non-uniqueness under pole-zero cancellation. For example, consider the standard linear "first order plus deadtime" model:

    y(k) = a y(k-1) + b u(k-d)        (16)

Using this equation to represent y(k-1) in terms of y(k-2) and u(k-d-1) and substituting the result back into Eq. (16) then yields the fully equivalent second-order model:

    y(k) = a² y(k-2) + b u(k-d) + ab u(k-d-1)        (17)
From a theoretical perspective, i.i.d. input sequences are often very effective for nonlinear system identification (c.f. Cho and Powers (1991), Tseng and Powers (1992), Pearson, Ogunnaike and Doyle (1994)). Because so many results are available for Gaussian stochastic processes, i.i.d. Gaussian sequences are particularly popular for both developing and investigating nonlinear system identification algorithms. In some cases, however, better performance may be achieved by using non-Gaussian sequences tailored to the model structure under consideration. For example, it was shown in Pearson, Ogunnaike, and Doyle (1994) that leptokurtic distributions (c.f. Box and Tiao (1973)) are desirable for identifying the diagonal Volterra model coefficients.
The same non-uniqueness arises in NARMAX models: simply use the original model to derive expressions for the variables y(k-j) required in the model and substitute them back into the original equation. Since this result will always lead to a higher-order model, Ockham's razor can be invoked to select the simpler, original model. In practice, however, since these models are only approximations of a more complicated "truth," we will generally not identify either of these equivalent models, but rather a model that is "near" one or the other, depending on our initial choice of model order parameters, the available data, the details of the identification algorithm used, etc. Further, as the following example will illustrate, the same general qualitative behavior can often be obtained from very different model structures; in such cases, it may not be obvious which model is "simpler," nor whether there are any "near equivalences" between these models.
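The Eq. (16)-(17) style of equivalence is easy to check numerically; the parameter values below are arbitrary illustrative choices.

```python
import numpy as np

# first-order model y(k) = a*y(k-1) + b*u(k-d) and its substituted
# second-order form y(k) = a^2*y(k-2) + b*u(k-d) + a*b*u(k-d-1)
# define the same input/output map (zero initial conditions assumed)
a, b, d = 0.8, 1.0, 1

rng = np.random.default_rng(3)
u = rng.standard_normal(300)

def first_order(u):
    y = np.zeros(len(u))
    for k in range(len(u)):
        y[k] = a * y[k-1] + b * (u[k-d] if k >= d else 0.0)
    return y

def second_order(u):
    y = np.zeros(len(u))
    for k in range(len(u)):
        y[k] = (a**2 * y[k-2]
                + b * (u[k-d] if k >= d else 0.0)
                + a * b * (u[k-d-1] if k >= d+1 else 0.0))
    return y

print(np.allclose(first_order(u), second_order(u)))   # True
```

Identification from noisy data would return a model "near" one of these two exactly equivalent structures, depending on the order parameters chosen.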
As in the case of linear system identification, identifiability also depends on the structure of the model sought. For example, note that the nonlinear static map implemented by a feedforward neural network is invariant under permutation of the processing units in its hidden layers. Thus, nonlinear models based on feedforward networks are only identifiable within the limits of this permutation invariance.
7 AN EXAMPLE

Many of the nonlinear modeling issues discussed in the preceding sections are illustrated by the following example. A single irreversible, exothermic reaction, A -> B, is assumed to occur in a CSTR. Detailed descriptions of this problem are given in Nahas, Henson, and Seborg (1992) and Pottmann and Seborg (1992); the dynamics are described by two coupled
nonlinear ordinary differential equations relating the concentration C_A of species A in the reactor effluent to the reactor temperature T and the flow rate q_c of coolant to the reactor cooling jacket. Like the high-purity distillation column example discussed earlier, this system exhibits a pronounced asymmetry in its response to changes in the manipulated variable q_c. Specifically, the change in C_A for a 10% step decrease in q_c is a monotonic decrease that looks very much like a first-order linear system response. In contrast, the change in C_A for a 10% step increase in q_c is a decaying oscillation, very much like the response of an underdamped second-order linear system. The following paragraphs will not attempt to develop a specific empirical model for this CSTR example, but they will explore the model structure selection problem in some detail. In particular, this section will focus on the question: which classes of models are rich enough to exhibit the qualitative behavior just described for the CSTR? The input sequence considered here will be:

    u(k) = { A,  k ≥ 0
             0,  k < 0        (18)

where A may be either positive or negative.

Fig. 4. Wiener Model Step Responses
First, consider the class of truncated Volterra models. If each infinite sum is truncated to M terms, the model response reaches steady state after M time steps, i.e.,

    y(k) = y_0 + A Σ_{i=0}^{M} a_i + A² Σ_{i=0}^{M} Σ_{j=0}^{M} b_{i,j} + ...        (19)

for k > M. Thus, if we are to approximate the CSTR response with a truncated Volterra model, we must take M large enough to allow the oscillatory positive step response to approach its steady-state value. Generally, this value of M will be large, resulting in an unacceptably large number of model parameters to identify. For example, if it takes M = 30 samples for the oscillation to decay, even restricting consideration to second-order Volterra models will require identification of approximately 450 quadratic parameters b_{i,j}.

Fig. 5. Hammerstein Model Step Responses
As noted in Section 2, block-oriented models are Volterra models with restricted structures that can be characterized by many fewer parameters. For example, it was noted that the Hammerstein model is a Volterra model in which only the "diagonal" model coefficients are non-zero. Applied to the example just discussed, this restriction reduces the number of quadratic model parameters from about 450 to 30; in practice, Hammerstein models may be approximated by many fewer parameters than this by exploiting the block structure of the model rather than using the Volterra representation. Unfortunately, the Hammerstein model is incapable of describing the qualitative behavior exhibited by the CSTR example considered here. In particular, note that the static nonlinearity f(·) preceding the linear dynamics in the Hammerstein model changes a step of amplitude A into a step of amplitude f(A). Thus, if the qualitative nature of the linear subsystem's step response is inherently monotone, the overall Hammerstein model's step response will also be monotone. Conversely, if the linear system exhibits an oscillatory step response, the Hammerstein model will also exhibit an oscillatory step response.

In contrast, the Wiener model is sufficiently flexible to exhibit step responses like the CSTR example. In particular, consider the Wiener model with first-order linear dynamics followed by the static nonlinearity f(x) = g(h(x)) where:

    h(x) = -sgn(x) ln(1 - |x|)
    g(x) = { 1 - e^{-x} sin(2πfx + φ),  x ≥ 0
             e^{x} - 1,                 x < 0        (20)

Here, sgn(x) is the algebraic sign function, defined as +1 if x > 0, -1 if x < 0, and 0 if x = 0. This nonlinearity was chosen so that y(t) = 1 - e^{-t/τ} sin(2πft + φ) for a step input of amplitude A = +1 and y(t) = e^{-t/τ} - 1 for a step input of amplitude A = -1. The response for intermediate amplitude steps is shown in Fig. 4. For comparison, the step responses for the corresponding Hammerstein model - i.e., the same static nonlinearity followed by first-order linear dynamics - are shown in Fig. 5.
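The asymmetry can be made concrete with a minimal simulation sketch of this Wiener model. The time constant and oscillation frequency below are illustrative choices, and the phase is set to π/2 so that both step responses start at zero.

```python
import numpy as np

# first-order linear dynamics followed by the static nonlinearity of Eq. (20);
# tau and freq are illustrative, phase = pi/2 makes the response start at zero
tau, freq, phase = 0.1, 2.0, np.pi / 2

def h(x):
    return -np.sign(x) * np.log(1.0 - np.abs(x))

def g(z):
    return np.where(z >= 0.0,
                    1.0 - np.exp(-z) * np.sin(2*np.pi*freq*z + phase),
                    np.exp(z) - 1.0)

def wiener_step(A, t):
    x = A * (1.0 - np.exp(-t / tau))        # first-order step response
    return g(h(np.clip(x, -0.999, 0.999)))  # clip keeps the log finite

t = np.linspace(0.0, 1.0, 201)
y_pos = wiener_step(+1.0, t)   # decaying oscillation, settles near +1
y_neg = wiener_step(-1.0, t)   # monotonic decrease, settles near -1
```

The positive step overshoots and rings before settling, while the negative step decays monotonically, mimicking the asymmetric CSTR behavior described above.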
Such a function would be quite difficult to identify from input/output data using either parametric or nonparametric procedures.

In general, NARMAX models appear to be a better choice for representing behavior like the CSTR step response, as the following five examples illustrate:

    1. y(k) = a y(k-1) + b u(k-1) + c u(k-1) y(k-1)
    2. y(k) = a |y(k-1)| + b u(k-1)
    3. y(k) = a y(k-1) + b u(k-1) + c S[y(k-2)]
    4. y(k) = d y²(k-1) + b u(k-1)
    5. y(k) = a y(k-1) + b u(k-1) + c u(k-1) y(k-1) + d y²(k-1)

Fig. 6. Static Nonlinearity for Block-Oriented Models
where S(z) in Model 3 is the half-wave rectification function:

    S(z) = { z,  z ≥ 0
             0,  z < 0        (21)

As noted above, the step response of the Hammerstein model exhibits the qualitative character of its first-order linear subsystem, with appropriately transformed input amplitudes. Also, note that the steady-state behaviors of the Wiener and Hammerstein models are identical, because both models reduce to the static nonlinearity f(·) at steady state. Thus, the distinction between the Wiener and Hammerstein models lies entirely in their transient behavior.
Conversely, these examples also demonstrate that the general qualitative behavior of nonlinear models with "similar" step responses can be radically different. This observation emphasizes yet another point made in Section 4: goodness of fit alone is not a sufficient criterion for "model adequacy."
The static nonlinearity f(·) defined in Eq. (20) is plotted in Fig. 6. Several points are worth noting. First, this function is highly asymmetric, reflecting the asymmetry of the Wiener system response. Further, note that this nonlinearity is not monotonic. As noted in Section 4, it therefore follows that f(·) is not invertible, complicating the task of inverse-based control system design, and probably limiting achievable control system performance. In addition, note that the behavior of f(·) implies a non-monotonic relationship between the input amplitude A and the steady-state response of the model. Specifically, note that the steady-state value of the step response for A = +0.80 is greater than the steady-state value for A = +1.00. While this behavior is somewhat counter-intuitive, it does emphasize another point made in Section 4: it is important to consider the behavior of any nonlinear model over the whole set of possible inputs U, and not just for a few selected sequences (e.g., ±10% flow rate changes in this example). Both extrapolations beyond the original input sequence range and interpolations between selected inputs can behave quite unexpectedly.
Model 1 is a bilinear generalization of the standard first-order linear model (see Tong (1990) for a discussion of bilinear models). For step inputs, this nonlinear model reduces to the first-order linear model with autoregressive coefficient a + cA. The qualitative behavior of this step response can thus be determined from linear system theory. In particular, this model will exhibit a stable, monotonic response if 0 < a + cA < 1, a stable, damped oscillatory response if -1 < a + cA < 0, and unstable step responses outside these ranges. Step responses for A in the range -1 ≤ A ≤ +1 are shown in Fig. 7 for a = -0.35, b = 1.0, c = -0.55. While the exact details of these step responses are not identical to those of the Wiener model, the qualitative behavior is the same: a step of amplitude A = +1 yields a decaying oscillation, while a step of amplitude A = -1 yields a monotonic response. Unlike the Wiener model, however, the steady-state response varies monotonically with A, a more intuitively pleasing behavior than the non-monotonic variation exhibited by the Wiener model. Also unlike the Wiener model, note that Model 1 is invertible. In particular, the control input u(k-1) necessary to drive the output of Model 1 from y(k-1) to y(k) is:
One final point to note regarding the Wiener model is that the oscillatory character of positive step responses is described by the behavior of f(x) as x -> +1. In particular, note that f(x) oscillates with increasing frequency as x -> +1, analogous to the behavior of the "topologist's sine curve" sin(1/x) as x -> 0 (c.f. Munkres (1975)). Alternatively, we may regard this function as fractal in nature, since the qualitative behavior remains the same if we look at the function with successively finer resolution near x = +1.
    u(k-1) = [y(k) - a y(k-1)] / [b + c y(k-1)]        (22)

provided b + c y(k-1) ≠ 0. Finally, it is important to note that the identification of this model is much easier than the Wiener model identification problem just
Fig. 7. Model 1 Step Responses
discussed. Specifically, note that Model 1 is a parametric model, linear in the three model parameters a, b, and c.

Fig. 8. Model 2 Step Responses

Models 2 and 3 are both NAARX models and also threshold autoregressive models (c.f. Tong (1990)), similar in structure to Eq. (10). Unlike Eq. (10), however, both of these models exhibit an interesting "nearly linear" scaling property. Specifically, if {y(k)} is the response to an input sequence {u(k)}, then the response to an input sequence {λu(k)} is {λy(k)} for any λ > 0. This result follows from the fact that both the absolute value function f(x) = |x| in Model 2 and the "half-wave rectification function" S(z) in Model 3 are homogeneous (c.f. Davis (1962)) with respect to positive scaling constants λ: f(λx) = λf(x). Thus, it follows that the quantitative behavior of the step response of these models depends only on the model parameters a, b, and c and the algebraic sign of the input step. In particular, unlike Models 4 and 5 discussed below, the qualitative behavior of the step response is not a function of the input step magnitude |A|, only its sign. This behavior is evident in the step responses shown in Figs. 8 and 9 for these two models. Here, the parameters are a = -0.70, b = 1 for Model 2 and a = 0.60, b = 1, c = -0.80 for Model 3.
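The scaling property and the Model 2 steady state are easy to check directly; the step length and scale factor below are arbitrary illustrative choices.

```python
import numpy as np

# Model 2, y(k) = a*|y(k-1)| + b*u(k-1), with the parameters quoted
# in the text (a = -0.70, b = 1)
a, b = -0.70, 1.0

def model2(u):
    y = np.zeros(len(u))
    for k in range(1, len(u)):
        y[k] = a * abs(y[k-1]) + b * u[k-1]
    return y

u = np.ones(60)                  # unit step (length arbitrary)
y = model2(u)
y_scaled = model2(2.5 * u)       # same step, scaled by lambda = 2.5

# "nearly linear" scaling: the response to lambda*u is exactly lambda*y
print(np.allclose(y_scaled, 2.5 * y))

# steady state for bA > 0 is bA/(1 - a), as quoted later in Eq. (23)
print(round(y[-1], 4), round(b / (1 - a), 4))
```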
Fig. 9. Model 3 Step Responses
Much more can be said about the qualitative behavior of these models because of their simple structure. In particular, note that if the response of Model 2 to an input sequence {u(k)} is uniformly positive, the absolute value function has no effect and the model reduces to Eq. (16) with delay d = 1. In contrast, if the response to an input sequence {u(k)} is uniformly negative, the absolute value function may be replaced by a negative sign, reversing the sign of the autoregressive coefficient a. These observations describe the step responses shown in Fig. 8: for A > 0, the model is effectively Eq. (16) with a = -0.70, while for A < 0, it is effectively the same model with a = +0.70. If the response to {u(k)} can be both positive and negative, more complicated behavior is possible, as illustrated in Fig. 10. This plot shows the response of Model 2 with a = -1.5, b = 1 to a positive step input. This response is chaotic and, as noted above, scales
Fig. 10. Chaotic Step Response for Model 2
linearly with the step amplitude A for any A > 0. In contrast, for A < 0, the step response is always negative, so the model behaves like a first-order linear model with autoregressive coefficient a = +1.5; this response is unstable, growing exponentially with time. Interestingly, chaotic step responses have not yet been observed in Model 3, although sustained oscillations have, as shown in Fig. 11. There, the positive step response is shown for Model 3 with parameters a = 0.35, b = 1, c = -1.2. Finally, note that both of these models exhibit unique steady states. For Model 2, if -1 < a < 1, this steady-state value is:

    y* = { bA/(1 - a),  bA > 0
           bA/(1 + a),  bA < 0        (23)

while for Model 3, the steady-state response is:

    y* = { bA/(1 - a - c),  bA > 0, a + c < 1
           bA/(1 - a),      bA < 0        (24)

Fig. 11. Sustained Oscillation for Model 3

Fig. 12. Model 4 Step Responses

Model 4 replaces the nonsmooth absolute value nonlinearity in Model 2 with a quadratic term, yielding a polynomial NAARX model. The step responses for this model are shown in Fig. 12 for b = 1, d = -0.24 and values of A between -1 and +1. These responses are qualitatively similar to those shown in Fig. 8 for Model 2. Profound differences exist between these models, however, despite the apparent simplicity of substituting a quadratic nonlinearity for an absolute value. First, note that Model 4 does not share the "nearly linear" scaling property of Model 2: the qualitative dependence of the step response on the step amplitude A is significant, as shown in Figs. 13 and 14. Specifically, note that the step response for A = +1.00 shown in Fig. 12 is only slightly oscillatory, while the step response for A = +3.00 shown in Fig. 13 is highly oscillatory. Increasing the amplitude further to A = +7.00 yields the chaotic step response shown in Fig. 14. Finally, note that if the step amplitude is made too large - either positive or negative - Model 4 becomes unstable. This general behavior further illustrates the point made in Section 4: nonlinear models need to be characterized over the entire set U of probable input sequences and not simply for a few "representative" elements of U.

Fig. 13. Model 4 Step Response, A = +3.00

Unlike Models 1, 2, and 3, Model 4 exhibits multiple steady states. In particular, substituting y(k) = y(k-1) = y* into Model 4 yields a quadratic equation for y*, with the two roots:

    y* = [1 ± √(1 - 4dbA)] / 2d        (25)
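The two regimes quoted in the text for Model 4 (b = 1, d = -0.24) can be reproduced in a few lines; the run lengths below are arbitrary.

```python
import numpy as np

# Model 4, y(k) = d*y(k-1)**2 + b*u(k-1), with the text's b = 1, d = -0.24
b, d = 1.0, -0.24

def model4_step(A, n=400):
    y = np.zeros(n)
    for k in range(1, n):
        y[k] = d * y[k-1]**2 + b * A    # step input u(k) = A
    return y

y1 = model4_step(1.0)   # slightly oscillatory, settles (cf. Fig. 12)
y7 = model4_step(7.0)   # bounded but never settles (cf. the chaotic Fig. 14)

# the small step converges to the Eq. (25) root (1 - sqrt(1 - 4dbA))/(2d)
ystar = (1.0 - np.sqrt(1.0 - 4.0*d*b*1.0)) / (2.0*d)
print(abs(y1[-1] - ystar) < 1e-9)
print(np.std(y7[-100:]))   # large residual variation: no settling
```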
Further, note that if 1 - 4dbA < 0, this model exhibits no real steady states, suggesting oscillatory, chaotic, or unstable behavior. Model 5 is a simple polynomial NARMAX model that includes both Models 1 and 4 as special cases. For arbitrary parameters a, b, c, and d, this model will generally resemble Model 4 in its overall qualitative behavior. For example, unless d = 0, this model will exhibit two steady states like Model 4. Further, the qualitative behavior of this model will depend on the amplitude of the input sequence, like Models 1 and 4 and unlike Models 2 and 3. Typical behavior for the positive step response would be a monotonically increasing response for small amplitude steps, damped oscillations at larger amplitudes, sustained oscillations and chaotic behavior at still larger amplitudes, and instability at sufficiently large amplitudes.

Fig. 14. Model 4 Step Response, A = +7.00

As noted in Section 4 and illustrated in Sections 6 and 7, specification of a reasonably "complete" set U of input sequences {u(k)} is important in nonlinear system identification. In particular, the qualitative nature of a nonlinear model's behavior can depend very strongly on its input sequence, in contrast to the behavior of linear models. Specifically, note that if {u1(k)} and {u2(k)} are two input sequences to a linear model and {y1(k)} and {y2(k)} are the corresponding output sequences, the response to any "intermediate" sequence ū(k) = t u1(k) + (1-t) u2(k) for 0 < t < 1 is simply t y1(k) + (1-t) y2(k). Such "interpolatory" behavior may or may not hold for nonlinear models, as the NARMAX models considered in Section 7 demonstrate. This observation has two practical implications for nonlinear model development. First, because the real-world process P being modeled is itself nonlinear, it is important to characterize its behavior over the set U of practically important input sequences as completely as possible. Second, once a candidate structure is selected for the empirical model M of this process, it is important to characterize its qualitative behavior over the set U as well. It is only after both of these characterizations have been made that we can hope to estimate "model adequacy."
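The interpolation property, and its failure for a nonlinear model, can be sketched as follows. The linear model's pole and the random input sequences are arbitrary illustrative choices; Model 4's b and d are the text's values.

```python
import numpy as np

b, d, t = 1.0, -0.24, 0.5
rng = np.random.default_rng(4)
u1 = rng.uniform(-0.5, 0.5, 200)
u2 = rng.uniform(-0.5, 0.5, 200)
u_mix = t * u1 + (1.0 - t) * u2      # "intermediate" input sequence

def linear(u, a=0.8):
    # first-order linear model y(k) = a*y(k-1) + b*u(k-1)
    y = np.zeros(len(u))
    for k in range(1, len(u)):
        y[k] = a * y[k-1] + b * u[k-1]
    return y

def model4(u):
    # Model 4: y(k) = d*y(k-1)**2 + b*u(k-1)
    y = np.zeros(len(u))
    for k in range(1, len(u)):
        y[k] = d * y[k-1]**2 + b * u[k-1]
    return y

# linear model: the response to the mixture is the mixture of the responses
print(np.allclose(linear(u_mix), t*linear(u1) + (1-t)*linear(u2)))   # True

# Model 4: the interpolation property fails
print(np.allclose(model4(u_mix), t*model4(u1) + (1-t)*model4(u2)))   # False
```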
8 SUMMARY
This paper has attempted to summarize a number of the key issues in nonlinear input/output modeling, comparing and contrasting this problem with its linear counterpart where possible. Consequently, the objective of the paper has been more to raise questions than to answer them and, indeed, some of these questions appear to be very difficult to answer, even in general terms. Certain general conclusions may be drawn, however. First, the example discussed in Section 7 emphasizes the point made in Section 4 that goodness of fit alone is not a sufficient criterion for "model adequacy." In particular, other considerations like the sensitivity of the identified model to errors in the data may be equally important and can be profoundly influenced by general choices of model structure, order parameters, and terms (e.g., input variables) included in or excluded from the model. In addition, the results of Section 7 clearly demonstrate that the overall qualitative behavior of different models with comparable goodness of fit for a few selected inputs can be radically different: some will exhibit chaotic responses or multiple steady states while others will not, the qualitative behavior of the responses of some models will change dramatically with the magnitude of the input sequence while others will not, etc.
The subharmonic generation example discussed in Section 3 emphasized the difference between "nonlinear autoregressive" and "nonlinear moving average" behavior. In particular, it was shown that subharmonic generation is an "inherently autoregressive" phenomenon that has no "moving average representation." The CSTR step response example discussed in Section 7 is less definite, but the results suggest that while this response is "barely representable" by moving average models (i.e., the Wiener model), it is much more easily represented by autoregressive models. Conversely, this example also demonstrated that nonlinear autoregressive models can exhibit potentially undesirable behavior (e.g., multiple steady states, chaotic step responses, amplitude-dependent instabilities, etc.). If such qualitative behavior is inherent in the process P, its presence in the empirical model M is reasonable, but this is not always the case. In contrast, block-oriented and other "nonlinear moving average" models tend to exhibit less of this type of behavior. Consequently, intermediate cases like the high-purity distillation column pose particularly intriguing questions. Specifically, it was noted in Section 4 that while the Hammerstein models investigated by Eskinat, Johnson, and Luyben (1991) do not adequately describe high-purity column dynamics, the
Unfortunately, length considerations did not permit a detailed discussion of robustness here, but this topic is extremely important in practice, as the last example in Section 5 illustrated. A useful approach to linear system identification is to first "clean" the data using a procedure based on robust statistics like that outlined by Martin and Thomson (1982). This approach effectively identifies dynamic outliers in the data and replaces them with estimates based on neighboring data points. An inherent working assumption in this approach is that the underlying "clean" data is well ap-
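As an illustration of this general data-cleaning idea (a Hampel-type filter, not the Martin and Thomson procedure itself; the window length and threshold below are arbitrary choices), any point lying too far from a local median can be replaced by that median:

```python
import statistics

def clean_series(y, half_window=3, threshold=3.0):
    """Replace outliers with a local median (Hampel-type filter).

    A simplified stand-in for robust pre-cleaning procedures like that of
    Martin and Thomson (1982); parameters here are illustrative.
    """
    cleaned = list(y)
    n = len(y)
    for k in range(n):
        lo, hi = max(0, k - half_window), min(n, k + half_window + 1)
        window = y[lo:hi]
        med = statistics.median(window)
        # MAD-based scale estimate; 1.4826 makes it consistent for Gaussian data
        mad = 1.4826 * statistics.median([abs(v - med) for v in window])
        if mad > 0 and abs(y[k] - med) > threshold * mad:
            cleaned[k] = med      # replace the outlier with a neighboring estimate
    return cleaned

data = [0.1, 0.2, 0.1, 8.0, 0.2, 0.1, 0.2]   # one obvious spike
print(clean_series(data))                     # spike replaced, rest untouched
```

The median/MAD pair plays the role of the robust statistics: unlike the sample mean and standard deviation, neither is dragged toward the outlier it is being used to detect.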
S.A. Billings (1980), "Identification of nonlinear systems - a survey," IEE Proc., v. 127, pp. 272 - 285.
S.A. Billings and W.S.F. Voon (1986), "A prediction-error and stepwise regression algorithm for nonlinear systems," Int. J. Control, v. 44, pp. 803 - 822.
G.E.P. Box and G.C. Tiao (1973), Bayesian Inference in Statistical Analysis, Addison-Wesley, Reading, MA.
D.R. Brillinger (1977), "The identification of a particular nonlinear time series system," Biometrika, v. 64, pp. 509 - 515.
Fig. 15. Implicit Wiener model for high-purity columns: the manipulated variable drives a linear transfer function H(z) whose output is the logarithm of the product concentration; a static exp(·) nonlinearity then recovers the concentration itself.
R. Chen and R.S. Tsay (1993), "Nonlinear Additive ARX Models," J. Am. Stat. Assoc., v. 88, pp. 955 - 967.
more general question remains open: can these dynamics be adequately described by any Hammerstein model, or is the class of Hammerstein models simply too restrictive? A popular alternative to dealing with high-purity column dynamics is to model the logarithm of the product concentration instead of the concentration itself. Note that this alternative implicitly constructs a Wiener model of the process, as shown in Fig. 15. That is, since the relationship between the manipulated variable {u(k)} and the logarithm of product concentration is assumed linear, the relationship between {u(k)} and product concentration is a Wiener model with an exponential compensating nonlinearity, as shown. Nonparametric identification of the nonlinearity in either Hammerstein or Wiener models (c.f. Greblicki and Pawlak (1989, 1991), Greblicki (1992)) would permit the search for other, more effective nonlinearities in either of these approaches to modeling high-purity distillation columns.
I.L. Chien and B.A. Ogunnaike (1993), "Modeling and Control of High-Purity Distillation Columns," presented at AIChE National Meeting, Miami, November, 1993.
Y.S. Cho and E.J. Powers (1991), "Estimation of Quadratically Nonlinear Systems with an I.I.D. Input," proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Toronto, May, 1991, vol. 5, pp. 3117 - 3120.
J.P. Congalidis, J.R. Richards, and W.H. Ray (1989), "Feedforward and Feedback Control of a Solution Copolymerization Reactor," AIChE Journal, vol. 35, pp. 891 - 907.
G. Cybenko (1989), "Approximation by superpositions of a sigmoidal function," Math. Control, Signals, and Systems, v. 2, pp. 303 - 314.
Finally, it is important to note the role that certain classes of nonlinear models can play in the development of algorithms and methodologies, even when they are not the best choice of model structure for practical applications. In particular, it was noted in Section 7 that the Volterra model was a poor choice for describing the CSTR dynamics considered there because of the extreme number of parameters required for its specification, even in the restrictive case of second-order Volterra models. Conversely, the investigation of this general class of models described in Pearson, Ogunnaike, and Doyle (1994) has yielded some very useful insights into the challenging problem of input sequence design for nonlinear system identification. Specifically, these results make it clear that PRBS inputs are singularly inappropriate for Hammerstein model identification, a conclusion that is reasonable in retrospect and important in view of the popularity of PRBS for linear modeling, but not obvious beforehand.
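The PRBS conclusion can be seen directly: a two-level input samples the static nonlinearity of a Hammerstein model at only two points, so any two nonlinearities agreeing at those points produce identical input/output data. The sketch below is illustrative only — the first-order filter, its pole, and the two nonlinearities are arbitrary choices, not taken from Pearson, Ogunnaike, and Doyle (1994):

```python
import random

def hammerstein_output(u, nonlinearity, pole=0.8):
    """Hammerstein model: static nonlinearity N(u(k)) followed by
    illustrative first-order linear dynamics."""
    x, y = 0.0, []
    for uk in u:
        x = pole * x + (1.0 - pole) * nonlinearity(uk)
        y.append(x)
    return y

random.seed(0)
prbs = [random.choice([-1.0, 1.0]) for _ in range(50)]  # two-level pseudo-random input

n1 = lambda u: u          # linear "nonlinearity"
n2 = lambda u: u ** 3     # cubic: agrees with n1 at u = -1 and u = +1

# Identical outputs: the PRBS data cannot distinguish n1 from n2
print(hammerstein_output(prbs, n1) == hammerstein_output(prbs, n2))  # True
```

An input taking three or more distinct levels breaks this degeneracy (e.g., u = 0.5 gives n1 = 0.5 but n2 = 0.125), which is consistent with the conclusion that multi-level inputs, not PRBS, are needed for Hammerstein identification.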
H.T. Davis (1962), Introduction to Nonlinear Differential and Integral Equations, Dover, New York.
N. Draper and H. Smith (1981), Applied Regression Analysis, 2nd ed., John Wiley and Sons, New York.
J.L. Elman (1990), "Finding structure in time," Cognitive Sci., v. 14, pp. 179 - 211.
E. Eskinat, S.H. Johnson, and W.L. Luyben (1991), "Use of Hammerstein Models in Identification of Nonlinear Systems," AIChE Journal, v. 37, pp. 255 - 268.
W. Greblicki and M. Pawlak (1989), "Nonparametric Identification of Hammerstein Systems," IEEE Trans. Inform. Theory, v. 35, pp. 409 - 418.
W. Greblicki and M. Pawlak (1991), "Nonparametric Identification of Nonlinear Block-Oriented Systems," Preprints, 9th IFAC/IFORS Symposium, Budapest, Hungary, pp. 720 - 724.
W. Greblicki (1992), "Nonparametric Identification of Wiener Systems," IEEE Trans. Inform. Theory, v. 38, pp. 1487 - 1493.
REFERENCES
J.S. Bendat (1990), Nonlinear System Analysis and Identification from Random Data, John Wiley and Sons, New York.
R. Haber and H. Unbehauen (1990), "Structure Identification of Nonlinear Dynamic Systems - A Survey on Input/Output Approaches," Automatica, v. 26, pp. 651 - 677.
C.L. Nikias and A. Petropulu (1993), Higher-Order Spectra Analysis, Prentice-Hall, New York.
A. Papoulis (1965), Probability, Random Variables, and Stochastic Processes, McGraw-Hill, New York.
W. Hardle (1990), Applied Nonparametric Regression, Cambridge University Press, New York.
R.K. Pearson and B.A. Ogunnaike (1994), "Nonlinear Process Identification," ch. 4 in Nonlinear Process Control, M.A. Henson and D.E. Seborg, eds., Prentice-Hall, Englewood Cliffs, NJ.
E. Hernandez and Y. Arkun (1993), "Control of Nonlinear Systems Using Polynomial ARMA Models," AIChE Journal, v. 39, pp. 446 - 460.
P.J. Huber (1981), Robust Statistics, John Wiley and Sons, New York.
R.K. Pearson, B.A. Ogunnaike, and F.J. Doyle (1994), "Identification of 2nd-order Volterra models for chemical processes," in preparation.
S.M. Kay (1988), Modern Spectral Estimation, Prentice-Hall, New York.
B.T. Poljak and Ja.Z. Tsypkin (1980), "Robust Identification," Automatica, v. 16, pp. 53 - 63.
G. Klambauer (1975), Mathematical Analysis, Marcel Dekker, New York.
M. Pottmann and D.E. Seborg (1992), "Identification of non-linear processes using reciprocal multiquadric functions," J. Proc. Control, v. 2, pp. 189 - 203.
T. Koh and E.J. Powers (1985), "Second-Order Volterra Filtering and Its Applications to Nonlinear System Identification," IEEE Trans. Acoustics, Speech, and Signal Processing, v. 33, pp. 1445 - 1455.
M. Pottmann, H. Unbehauen, and D.E. Seborg (1993), "Application of a general multi-model approach for identification of highly nonlinear processes - a case study," Int. J. Control, v. 57, pp. 97 - 120.
M.A. Kramer (1991), "Nonlinear Principal Component Analysis Using Autoassociative Neural Networks," AIChE Journal, vol. 37, pp. 233 - 243.
Di.tributed D.E. Rumelhart, et al. (1986), Parallel Di,tributed Proce"ing, vol. 1, MIT Press, Preas, Cambridge, Ca.mbridge, MA.
C. Leger and N. Altman (1993), "Assessing Influence in Variable Selection Problems," J. Am. Statist. Assoc., v. 88, pp. 547 - 556.
G.M. Scott and W.H. Ray (1993), "Creating Efficient Nonlinear Neural Network Process Models That Allow Model Interpretation," J. Process Control, v. 3, pp. 163 - 178.
L. Ljung (1987), System Identification: Theory for the User, Prentice-Hall, New York.
R.D. Martin and D.J. Thomson (1982), "Robust-resistant spectrum estimation," Proc. IEEE, vol. 70, pp. 1097 - 1114.
H.T. Su and T.J. McAvoy (1993), "Integration of Multilayer Perceptron Networks and Linear Dynamic Models: A Hammerstein Modeling Approach," Ind. Eng. Chem. Res., v. 32, pp. 1927 - 1936.
W.S. McCulloch and W. Pitts (1943), "A logical calculus of the ideas immanent in nervous activity," Bull. Math. Biophys., v. 5, pp. 115 - 133.
H. Tong (1990), Non-linear Time Series, Oxford University Press, New York.
C.H. Tseng and E.J. Powers (1992), "Time Domain Cubic System Identification Using Higher-Order Correlations of I.I.D. Signals," proceedings of the Third International Symposium on Signal Processing and Its Applications, Queensland, Australia, August, 1992, pp. 212 - 215.
S. Monaco and D. Normand-Cyrot (1988), "Zero dynamics of sampled nonlinear systems," Systems and Control Letters, vol. 11, pp. 229 - 234.
M. Morari and E. Zafiriou (1989), Robust Process Control, Prentice-Hall, Englewood Cliffs, NJ.
J.R. Munkres (1975), Topology: A First Course, Prentice-Hall, Englewood Cliffs, NJ.
E.P. Nahas, M.A. Henson, and D.E. Seborg (1992), "Nonlinear Internal Model Control Strategy for Neural Network Models," Computers Chem. Eng., v. 16, pp. 1039 - 1057.
K.S. Narendra and K. Parthasarathy (1990), "Identification and Control of Dynamical Systems Using Neural Networks," IEEE Trans. Neural Networks, v. 1, pp. 4 - 26.
A.H. Nayfeh and D.T. Mook (1979), Nonlinear Oscillations, John Wiley and Sons, New York.