Proceedings of the 15th IFAC Symposium on System Identification, Saint-Malo, France, July 6-8, 2009

Identification of Nonlinear Static Processes with Local Polynomial Regression and Subset Selection

Heiko Sequenz, Alexander Schreiber, Rolf Isermann ∗

∗ Institute of Automatic Control, Laboratory of Control Engineering and Process Automation, Technical University Darmstadt, Germany (Tel: +49-6151-16 3914; e-mail: [email protected], [email protected], [email protected])

Abstract: The presented method for nonlinear system identification is based on the LOLIMOT algorithm introduced by Nelles and Isermann [1996]. The LOLIMOT algorithm divides the input space by a tree-construction algorithm and interpolates the local linear models by local membership functions. Instead of assuming local linear models, the presented algorithm utilizes general local nonlinear functions, which make the algorithm more flexible. These are approximated by a multidimensional Taylor series. Since the number of regressors grows fast with the number of inputs and the expansion order, a subset selection procedure is introduced. It reveals the significant regressors and gives information about the local functional behavior. The local subset selection is implemented as a stepwise regression with replacement of regressors. Mallows' Cp-statistic is used for the subset selection algorithm and is also employed for the final model selection. The benefit of the extended algorithm lies in the higher flexibility of the local models, which results in fewer partitions of the input space at a similar approximation quality.

Keywords: Local Regression; Subset Selection; LOLIMOT; Identification Algorithm; Multivariate Polynomials; Least-Squares Estimation; Automotive Emissions

1. INTRODUCTION

The goal of this paper is to provide a method for model building of a mostly unknown process. It is only necessary to specify a set of admissible input variables which model the output and to provide a dataset which excites the process sufficiently. Thus, this approach can be seen as a MISO black-box method with the general function to be approximated

y = f(u_1, u_2, \ldots, u_p) + \epsilon,    (1)

where y is the observed output, described by the p input variables u and the i.i.d. Gaussian noise \epsilon. It will be seen that it is also possible to include prior knowledge, such that the model can also be viewed as a gray-box approach. This may improve model quality but is not necessary to obtain reliable results. So far only static processes are considered, but the method can be transferred to dynamic processes as well.

The algorithm introduced here is based on the LOLIMOT algorithm [Nelles and Isermann, 1996], which has the ability to adapt the model structure to the nonlinear behavior of the process. LOLIMOT is an abbreviation for "local linear model tree." It consists of an outer loop which partitions the input space into hypercubes and an inner loop for parameter estimation. The local linear, or more precisely local affine, models are weighted by Gaussian membership functions and added to a model over the whole input space. The tree-building structure of the outer loop enables the algorithm to adapt to the nonlinear structure and supplies the algorithm with its flexibility.


The local models are, however, restricted to be affine. Therefore, the assumption of a general local function is made here. It is approximated by a Taylor series whose center is taken as the middle of the considered input space. The Taylor series expansion is evaluated up to a predefined order o. Since the number of terms grows fast with the order o and the number of inputs p, a selection algorithm for the significant terms is introduced. The selection procedure is repeated in every partition step (outer loop). The increase in computational effort results in an additional accuracy of the local models (estimated in the inner loop). It may also reduce the overall computation time by demanding fewer partitions of the input space at identical accuracy, which also improves the interpretability.

A selection procedure cannot guarantee to pick the best subset of regressors without checking every admissible set of regressors. Since testing every admissible set would, even with modern computers, imply an unacceptable computational effort, a stepwise regression with replacement is utilized here. Models are rated in terms of Mallows' Cp-statistic [Mallows, 1973], but other criteria like the Akaike information criterion (AIC) or the Bayesian information criterion (BIC) [Stoica and Selen, 2004] are also possible. The presented algorithm contains ideas for subset selection by Miller [2002] and for local regression by Loader [1999].

1.1 Structure of the Paper

The paper is organized as follows. In section 2 the LOLIMOT algorithm is briefly introduced. Section 3 presents the building of polynomial models with regressor selection.



The selection procedure by stepwise regression with replacement is discussed, as well as alternative algorithms w.r.t. their feasibility. It is further shown in section 4 how the idea of the tree-building LOLIMOT algorithm can be merged with local polynomial models. An example for system identification with a comparison of the different strategies is given in section 5. The conclusions are summarized in section 6.


2. THE LOLIMOT ALGORITHM


Some of the commonly used neural networks like the multilayer perceptron (MLP) are not transparent, which means that they are not physically interpretable, converge slowly, and need a relatively long calculation time [Isermann et al., 1997]. For these reasons, a special net structure with fast converging parameter estimation methods was developed. It allows an automated adaptation of the model structure and needs a short calculation time. LOLIMOT (LOcal LInear MOdel Tree) neural networks are local linear neuro-fuzzy models, which are adapted with a special construction algorithm [Nelles, 2001]. They approximate nonlinear (static and dynamic) systems with a number of local linear models (see Fig. 1).

Fig. 1. Network structure of a static local (polynomial) neuro-fuzzy model with M neurons for p inputs

The structure construction algorithm of the LOLIMOT approach for several inputs is based on an iterative axis-orthogonal partitioning of the input space, which has the advantage that transparent and efficient parameter estimation algorithms can be used. For example, a two-dimensional input space of an unknown function might be divided by the algorithm as shown in Fig. 2.

Fig. 2. Operation of the LOLIMOT structure search algorithm in the first three iterations for a two-dimensional input space (p = 2)

In every partition a local model is realized, which is assumed to be affine for the LOLIMOT algorithm. The transition between the local models is carried out by weighting them with appropriate validity functions. These validity functions are calculated by normalizing the basic membership functions \mu_j of each local linear model. In LOLIMOT, Gaussians are used as basic membership functions

\mu_j(u) = \exp\left( -\frac{1}{2}\,\frac{(u - u_{0,j})^2}{\sigma_j^2} \right).    (2)

Their centers are identical with the middle of the considered input space (u_{0,j}), and their smoothness can be tuned by the variance parameter \sigma_j^2. These basic membership functions lead to normalized validity functions \Phi_j weighting the portions of the local models in the global model

\Phi_j(u) = \frac{\mu_j(u)}{\sum_{i=1}^{M} \mu_i(u)},    (3)

where M denotes the number of local models. Thus, the global output of a multi-input single-output LOLIMOT model is defined by the weighted overlap of all local models

\hat{y} = \sum_{j=1}^{M} \Phi_j(u) \cdot \left( w_{0,j} + w_{1,j} u_1 + \ldots + w_{p,j} u_p \right)    (4)

with parameters w_{0,j}, w_{1,j}, \ldots, w_{p,j} obtained by least-squares parameter estimation. To allow better identification capabilities and to reduce the number of local models needed, other nonlinear local models (e.g. polynomials) can be used instead of restricting to the affine ones. Because this paper deals primarily with local polynomial functions, these local functions are abbreviated as LPM (Local Polynomial Model), as shown in Fig. 1.
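To make the interpolation of Eqs. (2)-(4) concrete, the following minimal Python/NumPy sketch evaluates a toy model: Gaussian memberships are normalized to validity functions and used to blend local affine models. All names and numbers are illustrative (they are not taken from the paper), and the Gaussians are evaluated with one width per input dimension, a slight generalization of Eq. (2).

```python
import numpy as np

def validity_functions(u, centers, sigmas):
    """Normalized Gaussian validity functions Phi_j(u), cf. Eqs. (2)-(3).
    u: (p,) input point; centers, sigmas: (M, p) centers and widths of the M models."""
    mu = np.exp(-0.5 * np.sum(((u - centers) / sigmas) ** 2, axis=1))   # memberships mu_j(u)
    return mu / np.sum(mu)                                              # normalization, Eq. (3)

def model_output(u, centers, sigmas, weights):
    """Global output y_hat as the weighted overlap of local affine models, Eq. (4)."""
    phi = validity_functions(u, centers, sigmas)
    x = np.concatenate(([1.0], u))        # local regressor vector [1, u_1, ..., u_p]
    return float(phi @ (weights @ x))     # weights row j: w_{0,j}, w_{1,j}, ..., w_{p,j}

# toy example: p = 2 inputs, M = 2 local models
centers = np.array([[0.25, 0.5], [0.75, 0.5]])
sigmas = 0.2 * np.ones_like(centers)
weights = np.array([[0.0, 1.0, 0.5],
                    [0.5, 0.2, 0.1]])
print(model_output(np.array([0.3, 0.6]), centers, sigmas, weights))
```

Replacing the affine regressor vector x by a polynomial one gives the LPM case considered in the following sections.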

3. POLYNOMIAL MODELING BY SUBSET SELECTION

Before introducing how to merge the local polynomial models with the LOLIMOT algorithm, the algorithm for building the local model itself will be explained. The idea is as follows: basically all sufficiently smooth functions appearing in real applications can be approximated arbitrarily well by a Taylor series in a neighborhood of its center. Let this center point be denoted by u_0; then a multidimensional Taylor series for the general nonlinear function can be written as (e.g. Bronstein et al. [2000])

f(u_0 + \Delta u) = f(u_0) + \sum_{i=1}^{o} \frac{1}{i!} \left( \Delta u_1 \frac{\partial}{\partial u_1} + \ldots + \Delta u_p \frac{\partial}{\partial u_p} \right)^i f(u_0) + R_o,    (5)


where o is the order of the Taylor series expansion and R_o the remainder. The remainder decreases with the order, but the number of terms increases, such that a proper selection becomes more expensive with increasing order. Therefore, Loader [1999] suggests using orders of o = 2, while others like Hastie et al. [2001] suggest orders of o = 3 or even higher. A universal choice of the order for local polynomial regression does not exist, as it always depends on the data. Here an order of o = 3 is recommended as a good trade-off between accuracy and computational effort. The number of admissible regressors k in a polynomial model can be specified subject to the number of inputs p and the order o



k = \sum_{m=1}^{\min(p,o)} k_m    (6)

with

k_m = \sum_{i=0}^{o-m} \binom{i+m-1}{i} \cdot \binom{p}{m},

where k_m gives the number of terms with m different input variables. The formula can be derived by applying basic combinatorics. The number of admissible regressors grows fast with p and o (e.g. for p = 5 and o = 3 there are k = 56 admissible regressors). Since the number of data samples needs to be considerably greater than the number of regressors to get a reliable estimation of the parameters, it is required to have

N \gg n,    (7)

where N is the size of the dataset and n the number of estimated parameters. To ensure Eq. (7), a selection of significant regressors is necessary. This can also be motivated by the well-known bias-variance trade-off (e.g. Ljung [1999]). To compare the model qualities, some criteria of fit are introduced and discussed in section 3.1. In section 3.2 the algorithm for subset selection is described, using the introduced Cp-statistic. Finally, the selection of the best model is discussed in section 3.3.
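To illustrate how quickly the set of admissible regressors grows, the following small Python sketch enumerates all monomial regressors up to order o by brute force. The function name and the counting convention are choices made for this illustration only: the constant term is counted as one of the candidates, which reproduces the k = 56 of the example above via the closed form C(p+o, o).

```python
from itertools import combinations_with_replacement
from math import comb

def candidate_regressors(p, o):
    """Enumerate all monomial regressors in p inputs up to total order o
    (represented by exponent tuples; the all-zero tuple is the constant term)."""
    terms = set()
    for degree in range(o + 1):
        for combo in combinations_with_replacement(range(p), degree):
            exponents = [0] * p
            for var in combo:
                exponents[var] += 1
            terms.add(tuple(exponents))
    return sorted(terms)

p, o = 5, 3
print(len(candidate_regressors(p, o)), comb(p + o, o))   # 56 56
```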

3.1 Criteria of fit

A criterion of fit is needed to compare different models in the selection algorithm. A popular choice is Mallows' Cp-statistic, which is derived and discussed by Mallows [1973]:

C_p = \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\hat{\sigma}^2} - N + 2n,    (8)

where \hat{\sigma}^2 is an estimate of the residual variance and \hat{y} the output of the considered model with n fitted parameters. The Cp-statistic is an unbiased estimate of the expected value of the scaled mean square error of prediction, which can be written as

J = \frac{1}{\sigma^2} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2.    (9)

Minimizing the scaled squared error is equivalent to minimizing the more common mean square error, which is defined similarly (substituting the denominator by the number of observations N). The error can be evaluated as prediction or simulation error [Ljung, 1999]; both are identical in the present static case. For dynamic systems it is recommended to use the simulation error [Piroddi and Spinelli, 2003], which leads to a nonlinear least-squares problem.

It might be tempting to conclude that the model minimizing the Cp-statistic is the best, but differences in the obtained Cp-values can also result from random noise. A typical Cp-plot, which shows the obtained Cp-value over the number of selected regressors, is given in Fig. 3.

Fig. 3. Cp-plot of "best" models from the example in section 5, selected by the algorithm presented in section 3.2

The quality of the Cp-statistic is also highly dependent on a good variance estimation. A common estimate is the one given by the full model (all k admissible regressors included in the model):

\hat{\sigma}^2 = \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{N - k}.    (10)

This estimate is unbiased as long as the regressors are able to describe the unknown function properly, i.e. as long as there is no bias in the error. The remaining variance error divided by the degrees of freedom then results in an unbiased estimate of \sigma^2. To guarantee unbiased residuals, a sufficiently high order of the Taylor series is necessary. It is also required that every input affecting the output is incorporated. Besides these minor drawbacks of Mallows' Cp, there are some advantages which make it a widespread selection criterion. In contrast to criteria like the F-to-enter, it has the ability to compare more than two models against each other [Seber, 1977]. Furthermore, it can be interpreted as an unbiased estimate of the illustrative scaled squared error given in Eq. (9) and is simple to calculate. It can also be altered to force more or fewer regressors into the selection by changing the factor of the penalty term (2n) in Eq. (8). A factor greater than 2 penalizes the number of regressors more and thus selects fewer regressors, and analogously for a smaller value. However, it would then no longer be an unbiased estimate of Eq. (9). Other criteria like the Akaike information criterion (AIC) or the Bayesian information criterion (BIC) have similar properties and may as well be used as criteria of fit. To mitigate the risk of overfitting with the AIC, the corrected AICc can be used for finite datasets,


AIC_c = N \ln \sigma_n^2 + \frac{2 N n}{N - n - 1}.    (11)

It is asymptotically equivalent to the AIC (as N → ∞). A good review of information criteria is given by Stoica and Selen [2004]. These criteria are subject to the same minor drawbacks as the Cp-statistic and are furthermore harder to interpret. Therefore, the Cp-statistic is taken as the criterion of fit for the selection algorithm in the following.

3.2 The selection algorithm

The reason for applying a selection algorithm lies in the bias-variance trade-off (e.g. Ljung [1999]). If there is not enough data available to get reliable estimates for all k regressors (see Eq. (7)), a selection of the most significant regressors needs to be made. A bias error is thereby introduced in the estimation, but the variance error decreases. The splitting of the mean squared error into a variance and a bias term is shown, amongst others, in Nelles [2001]. Since minimizing the Cp-statistic is in expectation equivalent to minimizing the mean squared error, the Cp-value (Eq. (8)) is taken as the decision rule in the algorithm.

To guarantee that truly the best model w.r.t. the Cp-statistic is chosen, all subsets of the admissible set of regressors would need to be checked. This implies testing 2^k − 1 different models, since each of the k regressors can be in or out of the model. This is in most cases not feasible, as the number of regressors grows fast with the number of inputs p and the order o (see Eq. (6)). Therefore, heuristic algorithms like forward, backward, or stepwise regression need to be applied [Miller, 2002]. Since computational effort is becoming less critical, the introduced stepwise selection algorithm is additionally equipped with a replacement step for regressors. The higher computational effort is paid off by a much higher flexibility. The detailed structure of the algorithm is given in Fig. 4.

Fig. 4. Algorithm for subset selection

In part 1, a stepwise selection with replacement is performed until no more improvement is achieved by either adding, deleting, or replacing a regressor. To prevent the algorithm from stopping in a local minimum, it is continued in part 2 with a forward selection independent of model improvement. It is possible that a regressor becomes significant only in connection with another regressor. Such connections can be found by continuing to add regressors. The subsequent replacement and deletion steps then optimize the regressor set. For computational efficiency, the second part of the algorithm is terminated if the Cp-value is permanently increasing.

Part 1 of the algorithm always converges in finitely many steps, since the Cp-value of the regarded model is decreasing and bounded below (furthermore, the admissible set of regressors is finite). More attention needs to be paid to part 2 to guarantee convergence. There the Cp-value is not monotonically decreasing, so the same argument as in part 1 cannot be used. However, checking whether a model has been evaluated in a previous step prevents converging to the same minimum twice. As the number of minima is finite and the number of regressors is limited, the second part converges too. It shall be mentioned that it still cannot be guaranteed that the global minimum w.r.t. the Cp-value will be reached without checking every possible subset of regressors, which would imply an infeasible number of model evaluations, as mentioned before.
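A minimal sketch of part 1 of the selection loop (stepwise regression with add, delete, and replace steps rated by Mallows' Cp) is given below in Python/NumPy. It is not the authors' implementation: the starting subset, the iteration cap, and the toy data at the end are assumptions made for illustration, and the forced forward steps of part 2 are omitted.

```python
import numpy as np

def cp_statistic(y, X, subset, sigma2):
    """Mallows' Cp (Eq. (8)) for a least-squares fit on the given regressor subset."""
    Xs = X[:, subset]
    theta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    rss = np.sum((y - Xs @ theta) ** 2)
    return rss / sigma2 - len(y) + 2 * len(subset)

def stepwise_with_replacement(y, X, max_iter=100):
    """Part 1 of the selection loop: in each step try to add, delete, or replace
    one regressor and keep the change only if it lowers the Cp value."""
    N, k = X.shape
    theta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = np.sum((y - X @ theta_full) ** 2) / (N - k)   # Eq. (10), full model
    subset = [0]                                           # assumed start: constant regressor
    best_cp = cp_statistic(y, X, subset, sigma2)
    for _ in range(max_iter):
        out = [j for j in range(k) if j not in subset]
        candidates = [subset + [j] for j in out]                                       # add
        if len(subset) > 1:
            candidates += [[j for j in subset if j != r] for r in subset]              # delete
        candidates += [[j for j in subset if j != r] + [a] for r in subset for a in out]  # replace
        cps = [cp_statistic(y, X, c, sigma2) for c in candidates]
        if min(cps) >= best_cp:
            break                                          # no step improves Cp: converged
        best_cp = min(cps)
        subset = candidates[int(np.argmin(cps))]
    return sorted(subset), best_cp

# toy data: only columns 0 (constant), 1 and 4 actually generate the output
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 9))])
y = 2.0 + 3.0 * X[:, 1] - 1.5 * X[:, 4] + 0.1 * rng.normal(size=200)
print(stepwise_with_replacement(y, X))   # typically selects [0, 1, 4]
```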

3.3 Choice of the "best" model

The easiest way to choose a "best" model is by the calculated Cp-value. This is a good choice if the variance estimate is sufficiently good, but there are cases where a proper variance estimation is not possible, for example if the variance depends on the input and there is not enough data available to estimate the variance in certain regions. These cases can be compensated by applying a cross-validation subsequent to the selection algorithm. Cross-validation is a powerful tool for examining the simulation ability without strictly dividing the data into a training and a validation set [Nelles, 2001]. Furthermore, no additional assumptions are necessary, like the estimation of the variance in the Cp-statistic. A major drawback is its computational effort, which is why it is not feasible as the criterion of fit for the whole selection algorithm. In this last step, however, only a few competitive models need to be compared, namely those chosen as the "best" models by the Cp-value for the different numbers of regressors. If there is no reason to assume that the variance estimate is insufficient, the "best" model is simply determined by the Cp-value.
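If cross-validation is used for this final comparison, a sketch along the following lines could be applied to the few Cp-best candidate subsets (Python/NumPy). The fold count, the random seed, and the candidate_subsets dictionary in the commented usage are illustrative assumptions, not values from the paper.

```python
import numpy as np

def cv_mse(y, X, subset, folds=5, seed=0):
    """k-fold cross-validation error of a least-squares fit on one regressor subset."""
    idx = np.arange(len(y))
    np.random.default_rng(seed).shuffle(idx)
    errors = []
    for part in np.array_split(idx, folds):
        train = np.setdiff1d(idx, part)
        theta, *_ = np.linalg.lstsq(X[np.ix_(train, subset)], y[train], rcond=None)
        errors.append(np.mean((y[part] - X[np.ix_(part, subset)] @ theta) ** 2))
    return float(np.mean(errors))

# hypothetical usage: compare the Cp-best subset of each model size
# candidate_subsets = {2: [0, 1], 3: [0, 1, 4], 4: [0, 1, 4, 7]}
# best = min(candidate_subsets.values(), key=lambda s: cv_mse(y, X, s))
```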

4. MERGING THE POLYNOMIAL MODELING WITH THE LOLIMOT ALGORITHM

The advantages of the introduced LOLIMOT algorithm and of the polynomial modeling can be combined to obtain a flexible identification algorithm. The outer loop (see Fig. 2) splits the input space such that different functional characteristics can be adapted by the several local models. The flexible local models, realized by a Taylor series with selection of significant regressors, enable the algorithm to adapt to complex local structures and avoid unnecessary partitions of the input space. This results in more information about the true system and a similar accuracy with fewer input space partitions.

The building of the local models works in an analogous manner as described in section 3. Only a weighting matrix, motivated by the Gaussian membership functions, needs to be introduced. Hence a weighted least-squares estimation is performed, which is realized by applying the transformation

\tilde{X} = \sqrt{W_j}\, X \quad \text{and} \quad \tilde{y} = \sqrt{W_j}\, y,    (12)

where W_j = \text{diag}(\Phi_j(u)) is the diagonal weighting matrix of the j-th local model. With this transformation, the algorithm for local polynomial modeling works basically in the same way as described in section 3. However, some further attention needs to be paid to the local criterion of fit. A local criterion for Mallows' Cp is given by Loader [1999],

C_{p,local,j} = \frac{(y - \hat{y}_j)^T W_j (y - \hat{y}_j)}{\hat{\sigma}^2} - \text{tr}(W_j) + 2 n_{eff,j},    (13)



where n_{eff,j} describes the number of effective parameters in the j-th local model. These can be calculated by

n_{eff} = \text{tr}\left( X \left( X^T W X \right)^{-1} X^T W^2 \right) = \text{tr}(H).    (14)

The residual variance is denoted \hat{\sigma}^2 in Eq. (13). It can be calculated as in Eq. (10), with the estimated output given by the weighted sum of the local models,

\hat{\sigma}^2 = \frac{\sum_{j=1}^{M} (y - \hat{y}_j)^T W_j (y - \hat{y}_j)}{N - k}.    (15)

It is easy to extend this approach to a local variance estimation by substituting the global variance estimator with \hat{\sigma}_j^2. This can be calculated by

\hat{\sigma}_j^2 = \frac{(y - \hat{y}_j)^T W_j (y - \hat{y}_j)}{\text{tr}(W_j) - n_{eff,j}}.    (16)

The revealed regressor selection then needs to be checked by cross-validation according to section 3.3, as the smaller dataset might not allow a reliable local variance estimation. This estimation allows to form a variance function dependent on the input,

\hat{\sigma}^2(u) = \frac{\sum_{j=1}^{M} \Phi_j(u)\, \hat{\sigma}_j^2}{\sum_{j=1}^{M} \Phi_j(u)}.    (17)

This gives some information about the dependence of the variance on the inputs but will not be applied further in the algorithm. In most cases the dependence of the variance on the inputs can be neglected, such that Eq. (15) is taken as the variance estimator. In all cases, however, all admissible regressors should be used for the estimation in the local models to avoid bias; this is why the effective number of parameters is identical to the number of admissible regressors in Eq. (15).

To select the global model (outer loop) with the best complexity/quality trade-off, a criterion for the global model needs to be given. This criterion is the sum of the local Cp-values,

C_{p,global} = \sum_{j=1}^{M} C_{p,local,j},    (18)

which in the case of a global variance can also be written as

C_{p,global} = \frac{\sum_{j=1}^{M} (y - \hat{y}_j)^T W_j (y - \hat{y}_j)}{\hat{\sigma}^2} - N + 2 \sum_{j=1}^{M} n_{eff,j}.    (19)

With this criterion given, the partition algorithm can be terminated if no more improvement is achieved by further partitions of the input space. It is also possible to divide the input space up to a predefined number of local polynomial models or until a predefined accuracy is reached. The best model can then be selected by Eq. (19). The algorithm can easily be extended for modeling dynamic processes, as is described for the LOLIMOT algorithm in Isermann et al. [1997]. The algorithm for subset selection can be applied in the same way; the error should then be measured as the simulation error [Piroddi and Spinelli, 2003].
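As a minimal sketch of the local estimation, the following Python/NumPy fragment applies the weighted least-squares transformation of Eq. (12), computes the effective parameters of Eq. (14) and the local Cp of Eq. (13), and sums the local values to the global criterion of Eq. (18). The matrix Phi (N × M validity values) and all names are assumptions for illustration, not the original implementation.

```python
import numpy as np

def local_fit(y, X, phi_j, sigma2):
    """One local model: weighted least squares via the transformation of Eq. (12),
    effective number of parameters (Eq. (14)) and local Cp value (Eq. (13)).
    phi_j holds the validity values Phi_j(u_i) of this local model for all N samples."""
    w = np.sqrt(phi_j)
    theta, *_ = np.linalg.lstsq(w[:, None] * X, w * y, rcond=None)        # Eq. (12)
    resid = y - X @ theta
    wrss = float(phi_j @ resid ** 2)             # (y - y_hat_j)^T W_j (y - y_hat_j)
    WX = phi_j[:, None] * X                      # W_j X with W_j = diag(Phi_j)
    n_eff = float(np.trace(np.linalg.solve(X.T @ WX, WX.T @ WX)))         # Eq. (14)
    cp_local = wrss / sigma2 - phi_j.sum() + 2.0 * n_eff                  # Eq. (13)
    return theta, cp_local, n_eff

def cp_global(y, X, Phi, sigma2):
    """Global criterion as the sum of the local Cp values, Eq. (18)."""
    return sum(local_fit(y, X, Phi[:, j], sigma2)[1] for j in range(Phi.shape[1]))
```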

5. MODELING THE BOOST PRESSURE OF A DIESEL ENGINE

In the following example, the modeling of the boost pressure (p2) of a common-rail Diesel engine with exhaust gas recirculation and variable turbine geometry turbocharger is examined. The modeling is done for a three-input system: angle of main injection (ϕMI), position of the exhaust gas valve (sAGR), and position of the variable turbine geometry actuator (sVTG). The input values are shown in Fig. 5. The stationary output is measured at 74 points at a constant engine speed (neng = 2000 rpm) and a constant injection quantity (qMI = 15 mm³/cyc).

Fig. 5. Input values for identification of the boost pressure

Three different models were built to compare the different algorithms. First, a standard LOLIMOT model was built, abbreviated as LLM (local linear model). Secondly, a LOLIMOT model with transformed inputs was constructed; besides the linear terms, these were the quadratic, cubic, and interaction terms. It is therefore abbreviated as LPM all (local polynomial model with all regressors). Finally, a model was built using the introduced identification algorithm, which is named LPM selected (local polynomial model with selected regressors). All algorithms were able to reflect the training data but needed different numbers of local models. The results are summarized in Table 1. In order to compare the results of these three algorithms, all models are trained to achieve a minimum coefficient of determination (R²) of 0.99. The number of local models and the number of parameters can then be used to judge the results.


Table 1. Comparison of selected best models w.r.t. complexity/quality trade-off

                    LLM       LPM all    LPM selected
R²                  0.9923    0.9945     0.9924
No. of LLM/LPM      6         2          2
p_eff               13.22     25.61      10.31
p_total             24        38         14

Fig. 6. Measured vs. predicted plot of the output p2 [bar], R² = 0.992

The standard LOLIMOT model needed the most partitions for a comparable accuracy. Its effective number of parameters is similar to that of the introduced identification method, but the total number of parameters is much higher and would thus demand more memory. The second model, with the additional transformed inputs, is able to describe the data better with a lower number of local models, but with a lot more effective parameters, which might have been attained by overfitting to the training data. The required memory is even higher for the second model than for the first. The introduced algorithm used for the third model clearly shows the best performance/memory trade-off. The accuracy is comparable to the other strategies but is obtained with fewer local models (compared to the LLM algorithm) and a lower number of effective parameters (compared to the LPM all model). The total number of parameters is also the smallest for the third model. The results of the presented algorithm are furthermore physically interpretable. The selected regressors are mainly based on the positions of the variable turbine geometry (sVTG) and the exhaust gas valve (sAGR). Only one regressor containing the injection angle is selected, which describes the (smaller) effect of a higher exhaust gas temperature and exhaust back pressure on the resulting boost pressure. Fig. 6 shows the predicted values for the third model in a measured vs. predicted plot.

6. CONCLUSION

The combination of the construction algorithm LOLIMOT and local polynomial modeling with subset selection reveals some advantages. The identification algorithm possesses a high flexibility to adapt to complex nonlinear structures but avoids a higher number of divisions by assuming local polynomial models. The construction algorithm in LOLIMOT considers, for the sake of simplicity, only orthogonal divisions of the input space instead of allowing arbitrary partitions like Hinging Hyperplanes [Breiman, 1993, Ernst (Töpfer), 1998]. This constraint is eased by allowing arbitrary local models. Including interaction terms together with linear and higher-order terms enables local models of the same accuracy while restricting to orthogonal partitions of the input space. Fewer partitions of the input space imply less interpolation of local models. This improves the interpretability and avoids the critical transition regions between the local models. The local selection of significant regressors prevents the algorithm from local overfitting while keeping a high flexibility. Furthermore, regressors can be incorporated merely on suspicion, since the selection algorithm prevents nonsignificant regressors from being estimated. The introduced expanded LOLIMOT algorithm shows its improved performance especially for small datasets, when it is important to extract as much information out of the dataset as possible. The bias-variance trade-off of the model is optimized w.r.t. the Cp-statistic.

REFERENCES

L. Breiman. Hinging Hyperplanes for Regression, Classification, and Function Approximation. IEEE Transactions on Information Theory, 39(3):999–1013, 1993.
I.N. Bronstein, K.A. Semendjajew, G. Musiol, and H. Mühlig. Handbook of Mathematics. Verlag Harri Deutsch, 2000.
S. Ernst (Töpfer). Hinging hyperplane trees for approximation and identification. In Proceedings of the 37th IEEE Conference on Decision and Control, 1998.
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2001.
R. Isermann, S. Ernst (Töpfer), and O. Nelles. Identification with Dynamic Neural Networks: Architectures, Comparisons, Applications. In Proceedings of the 11th IFAC Symposium on System Identification, volume 3, pages 997–1022, 1997.
L. Ljung. System Identification: Theory for the User. Prentice-Hall, Inc., 1999.
C. Loader. Local Regression and Likelihood. Springer, 1999.
C. Mallows. Some comments on Cp. Technometrics, 15(4):661–675, 1973.
A.J. Miller. Subset Selection in Regression. CRC Press, 2002.
O. Nelles. Nonlinear System Identification: From Classical Approaches to Neural Networks and Fuzzy Models. Springer Verlag, 2001.
O. Nelles and R. Isermann. Basis Function Networks for Interpolation of Local Linear Models. In Proceedings of the 35th IEEE Conference on Decision and Control, volume 1, 1996.
L. Piroddi and W. Spinelli. Structure Selection for Polynomial NARX Models Based on Simulation Error Minimization. In Proceedings of the 13th IFAC Symposium on System Identification, pages 371–376, 2003.
G. A. F. Seber. Linear Regression Analysis. John Wiley & Sons Inc, 1977.
P. Stoica and Y. Selen. Model-Order Selection: A Review of Information Criterion Rules. IEEE Signal Processing Magazine, 21(4):36–47, 2004.
