Proceedings of the 15th IFAC Symposium on System Identification Saint-Malo, France, July 6-8, 2009
Benchmark Nonlinear System Identification using Wavelet based SDP Models Nguyen-Vu Truong Liuping Wang School of Electrical and Computer Engineering, RMIT University, Melbourne, Australia e-mail:
[email protected],
[email protected] Abstract: This paper addresses the identification of a Wiener-Hammerstein benchmark nonlinear system using wavelet based State Dependent Parameter (WSDP) models. The major attractive feature of the proposed technique lies on the systematic approach to nonlinear model structure determination using the Predicted Residual Sums of Squares (PRESS) statistic and forward regression. This procedure detects parametrically efficient structures for the wavelet based parameterization of SDP models, thus further enhances the parsimony of the model and helps to avoid over-parameterization. The obtained simulation results demonstrate the merit of the approach in identifying this benchmark nonlinear system. Keywords: Wavelet; State Dependent Parameter; Nonlinear System Identification; PRESS Statistic. 1. INTRODUCTION
cation of the WSDP models in identifying the benchmark nonlinear system. Finally, Section 4 concludes the paper.
Our recent publications (Truong et al. 2006, 2007a, 2007b, 2008a, 2008b) have presented approaches to the identification of nonlinear systems using wavelet based State Dependent Parameter (SDP) models. This model structure expresses the nonlinear system as a set of the linear regressive output/input terms (states) multiplied by associated State Dependent Parameters to characterize the nonlinearities. These state dependencies, in the first step, are non-parametrically estimated using a SDP algorithm based on recursive fixed interval smoothing as in Young’s work (see Young 1993, 2001, Young et al. 2001). The shapes of the SDP relationships (as defined by the plots of the parameters against the state variables) indicate and visualize the nature of the most significant nonlinearities within the dynamic SDP model. They are then, in the second step, parameterized in a compact manner via wavelet series expansion by employing appropriate types of wavelet basis functions that are selected corresponding to the features of the SDP relationships. This formulates the wavelet based SDP model (WSDP). The Predicted Residual Sums of Squares (PRESS) statistics and forward regression in conjunction with Orthogonal Decomposition are used to systematically select the optimized nonlinear model structure (Truong et al. 2006, 2007a, 2007b, 2008a, 2008b), leading to a more parsimonious nonlinear system representation. In this paper, the WSDP model is used to identify a Wiener-Hammerstein benchmark nonlinear system from a real-life data set. This benchmark is to compare and get better understanding about the capabilities of different identification approaches in modelling nonlinear systems. The paper is organized as follows. Section 2 provides background information about the WSDP models. Simulation results are reported in Section 3 to demonstrate the appli-
978-3-902661-47-0/09/$20.00 © 2009 IFAC
844
2. BACKGROUND We assume that a nonlinear system can be represented by following State Dependent Parameter (SDP) model:
y(k) = +
ny X
q=1 nu X
fq {y(k − q)}y(k − q) gq {u(k − q)}u(k − q) + e(k)
(1)
q=0
where, u(k) and y(k) are, respectively, the sampled inputoutput sequences; and nu ,ny refer to the maximum number of lagged inputs and outputs. The error term e(k) represents the noise variable, assumed initially to be a zero mean, white noise process that is uncorrelated with the input u(k) and its past values. Here, the parameters fq {.} and gq (.} are dependent on the non-minimal state variables defined by the input and output variables and their past sampled values. In the open literature (Young 1993, 2001,Young et al. 2001) they are regarded as State Dependent Parameters (SDP) to carry the system’s nonlinearities. At this point, the nonlinear system identification problem is best investigated by a two-stage identification procedure: (1) SDPs’ non-parametric estimation, (2) Nonlinear system model structure selection and parameter estimation. 2.1 SDP Nonparametric Estimation
10.3182/20090706-3-FR-2004.0118
15th IFAC SYSID (SYSID 2009) Saint-Malo, France, July 6-8, 2009 The estimate of the state dependency is based on considering the changes of the SDPs in a transformed space, where the data are re-ordered in a non-temporal manner (normally the simple ascending-order). In this transformed space, the SDPs are assumed to vary in a stochastic manner, according to a specified member of the Generalised Random Walk (GRW) family, such as the random walk or the integrated random walk. They are then estimated in a recursive approach that exploits the Fixed Interval Smoothing (FIS) algorithm, where the hyper-parameters associated with the stochastic models for the parameter variations are estimated using Prediction Error Decomposition (PED), as well as associated Maximum likelihood (ML) algorithm (see Young 1993, 2001,Young et al. 2001, Young 2005 for more details). The final results of this process are in the form of non-parametric relationships (graphs) between the SDP estimates and the states on which they are dependent. The features of these nonparametric relationships serve as the basis for the selection of wavelet functions as well as the associated scaling parameters for the subsequent nonlinear model’s structure selection process. 2.2 Model Structure Selection and Parameter Estimation Using wavelet series expansion, a SDP relationship f can be parameterized with respect to the state variable x as below:
f (x) =
iX max
X
ai,k Ψ(2−i x − k) + ξ(x)
By doing so, the State Dependent Parameters fq (x) and gq (x) can be compactly parameterized in the following forms n
fq (x) =
fq X
n
af q,j lf q,j (x) and gq (x) =
gq X
agq,j lgq,j (x) (5)
j=0
j=0
o n where, Lfq = {lf q,0 , ..., lf q,nfq }, Lgq = lgq,0 , ..., lgq,ngq are, respectively, the optimized sets of wavelet functions (which are the scaled and translated versions of the mother wavelet Ψ(x)) used for parameterization of fq (x) and gq (x). Substituting (5) into (1) yields n
y(k) =
fq ny X X
[af q,j lf q,j {y(k − q)}]y(k − q)
q=1 j=0 nu ngq
+
XX
[agq,j lgq,j {u(k − q)}]u(k − q) + e(k)
(6)
q=0 j=0
Equation (6) is regarded as a wavelet based SDP model (WSDP). By letting θf i = [af i,0 , ..., af i,nfi ] θgi = [agi,0 , ..., agi,ngi ] Lk,f i = y(k − i)[lf i,0 {y(k − i)}, ..., lf i,nfi {y(k − i)}] Lk,gi = u(k − i)[lgi,0 {u(k − i)}, ..., lgi,ngi {u(k − i)}] ¤T £ θ = θf 1 , ..., θf ny , θg0 , ..., θgnu Lk = [Lkf 1 , ..., Lkf ny , Lkg0 , ..., Lkgnu ]T
(2)
L = [L0 , ..., LN −1 ]T
i=imin k∈Li
E = [e(0), ..., e(N − 1)]T where Ψ(x) is any compactly supported mother wavelet Y = [y(0), ..., y(N − 1)]T (7) function (e.g. a Mexican hat wavelet, Morlet wavelet), whose supporting range falls within (s1 , s2 ). Here, imin , imax , respectively, refer to the minimum and maximum (finest Equation (6) is now written in the following matrix form: and coarsest) scales (resolutions) to be employed for approximation. Li (determined as in (3) and (4) ) is the Y = Lθ + E (8) translation library at scale i, defined as which is a standard least squares formulation. Li = {k ∈ (2−i xmin − s2 , 2−i xmax − s1 ), k ∈ Z} (3) Let us define the cost function as below (4) Limax ⊂ Limax −1 ⊂ ... ⊂ Limin . The realization of f (x) as in (2) is often overparameterized since the wavelet function library as derived in (3) and (4) consists all the possible combination of the parameters. In order to obtain a compact parameterization for the SDP relationship under study, a PRESS statistic based term selection algorithm is implemented (Truong et al., 2006, 2007a, 2007b). This procedure uses the incremental value of PRESS 1 (∆P RESS) as criterion to detect the significance of each terms within the model in which the maximum ∆P RESS signifies the most significant term, while its minimum reflects the least significant term. Based on this, the algorithm initializes with the initial subset being the most significant term. It then starts to grow to include the subsequent significant terms in a forward regression manner, until a specified performance is achieved (see Appendix for details). 1
The difference between the overparameterized (original) model’s PRESS value and the one calculated by excluding a term from the original model.
845
T
J = [Y − Lθ] [Y − Lθ] (9) ˆ and as usual, the estimate θ of the parameter vector θ to minimize the cost function J is obtained in the usual least squares manner, i.e. £ ¤−1 £ T ¤ θˆ = LT L L Y
(10)
2.3 Identification Procedure The overall nonlinear system identification using the proposed approach can be summarized into the following steps: (1) Determining the SDP model order. This includes the selection of the initial values 2 of ny and nu . (2) Non-parametrically estimating the associated SDP parameters. 2
Which normally start with lower values.
15th IFAC SYSID (SYSID 2009) Saint-Malo, France, July 6-8, 2009 (3) Optimized SDP parameterization structure selection. This involves the following steps: (a) Based on the features of the respective SDP nonparametric estimate, determine an appropriate wavelet function and the associated scaling parameters (i.e. finest and coarsest scaling parameters) to be used for the parameterization. (b) Using the PRESS based selection algorithm, determine an optimized functional structure used for the respective SDP parameterization. (4) Final parametric optimization. (a) Substitute the optimized SDP functional structures into the original SDP model. (b) Using the measured data, estimate the associated parameters via an Orthogonal Least Squares algorithm. (5) Model validation. • If the identified values of ny and nu as selected in step 1 provide a satisfactory performance over the considered data, terminate the procedure. • Otherwise, increase the model’s order, i.e. ny = ny + 1 and/or nu = nu + 1, and repeat steps 2 through 5. 3. SIMULATION RESULTS
(Vandersteen 1997). The measurement was performed at a sampling rate of 51.2 kHz. The measured input and output signals are as shown in Figure 2. Input 4 Estimation
Validation
2
0
−2
−4
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
2 5
x 10 Output 1.5 1 0.5 0 −0.5 −1 −1.5
3.1 System Description for the Benchmark Example The benchmark system is an electronic nonlinear system design by G. Vandersteen (Vandersteen 1997). This system is in the form of a Wiener-Hammerstein structure which consists of a static nonlinear block sandwiched between 2 dynamic linear systems (as described in Figure 1). In this system, the first linear block is a third order Chebyshev filter which was designed to have pass-band ripple of 0.5 dB and cut-off frequency of 4.4 kHz. The second one is a third order inverse Chebyshev filter (stop-band attenuation of 40 dB starting at 5 kHz) 3 , and the static nonlinear block is built using a diode circuit.
0
0.2
0.4
0.6
0.8
1 1.2 Sampling Index
1.4
1.6
1.8
2
Fig. 2. Measured input and output signals.
3.2 Identification Results For identification and validation purposes, this data set was divided into: (1) estimation set which consists of the first 100000 data points used for model estimation and (2) test set from the remaining data (88000 data points) for model validation and testing. The test data is not used for any purpose during the tuning of the model (parameter estimation, model selection, early stopping of the identification algorithm, etc.). Using a 6th order WSDP model (ny = 6, nu = 6), the benchmark nonlinear system can be represented as in the following general form: y(k) = f1 [y(k − 1)]y(k − 1) + f2 [y(k − 2)]y(k − 2) + f3 [y(k − 3)]y(k − 3) + f4 [y(k − 4)]y(k − 4) + f5 [y(k − 5)]y(k − 5) + f6 [y(k − 6)]y(k − 6) + g[u(k)]u(k) + g1 [u(k − 1)]u(k − 1) + g2 [u(k − 2)]u(k − 2) + g3 [u(k − 3)]u(k − 3) + g4 [u(k − 4)]u(k − 4) + g5 [u(k − 5)]u(k − 5) + g6 [u(k − 6)]u(k − 6) (11)
Fig. 1. Wiener-Hammerstein benchmark nonlinear system schematic. The experimental data were generated using a Gaussian filtered excitation signal to excite this benchmark system 3
This system has a transmission zero in the frequency band of interest. This can complicate the identification significantly due to difficulties from the inversion of such a characteristic.
846
5
x 10
In this paper, a black-box strategy is however adopted, omitting the nonparametric estimation stage of the proposed approach. A fixed mother wavelet basis function (Mexican hat) is chosen as in (15). With the knowledge that most of the system’s output is contained in the lower frequency bands, the finest and coarsest scaling parameters used for the identification of this system are chosen
15th IFAC SYSID (SYSID 2009) Saint-Malo, France, July 6-8, 2009 to be 2 and 4 4 . This results in a total number of 247 candidate terms in the over-parameterized model. Using the PRESS based selection algorithm to select the significant model terms, the final identified model is found to be: y(k) = fˆ1 [y(k − 1)]y(k − 1) + fˆ2 [y(k − 2)]y(k − 2) + fˆ4 [y(k − 4)]y(k − 4) + fˆ6 [y(k − 6)]y(k − 6) + gˆ[u(k)]u(k) + gˆ1 [u(k − 1)]u(k − 1) + gˆ2 [u(k − 2)]u(k − 2) + gˆ3 [u(k − 3)]u(k − 3) + gˆ4 [u(k − 4)]u(k − 4) + gˆ5 [u(k − 5)]u(k − 5) + gˆ6 [u(k − 6)]u(k − 6) (12) in which, fˆ1 (x) = −0.018994Ψ3,0 (x) − 0.0055129Ψ3,1 (x) + 2.621 fˆ2 (x) = −1.9985 fˆ4 (x) = −0.072028Ψ3,2 (x) − 0.027809Ψ3,1 (x) + 0.4251 fˆ6 (x) = 0.030889Ψ3,−2 (x) − 0.0504
index 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
Model’s term [Ψ3,0 (y(k − 1))]y(k − 1) [Ψ3,1 (y(k − 1))]y(k − 1) y(k − 1) y(k − 2) [Ψ3,2 (y(k − 4))]y(k − 4) [Ψ3,1 (y(k − 4))]y(k − 4) y(k − 4) [Ψ3,−2 (y(k − 6))]y(k − 6) y(k − 6) [Ψ3,2 (u(k))]u(k) [Ψ3,−1 (u(k))]u(k) [Ψ3,1 (u(k))]u(k) [Ψ3,0 (u(k − 1))]u(k − 1) [Ψ3,1 (u(k − 1))]u(k − 1) [Ψ3,0 (u(k − 2))]u(k − 2) [Ψ3,−1 (u(k − 3))]u(k − 3) [Ψ3,0 (u(k − 4))]u(k − 4) [Ψ3,−1 (u(k − 4))]u(k − 4) [Ψ3,1 (u(k − 4))]u(k − 4) [Ψ3,2 (u(k − 5))]u(k − 5) [Ψ3,1 (u(k − 5))]u(k − 5) [Ψ3,0 (u(k − 6))]u(k − 6) [Ψ3,−1 (u(k − 6))]u(k − 6)
∆P RESSk 0.01 0.03 1.3259 3.7441 0.01 0.02 0.201 0.03 0.43 0.46 0.07 0.18 0.94 0.23 1.09 0.14 0.475 0.03 0.174 0.595 0.516 1.126 0.102
Rank 23 20 2 1 22 21 12 18 10 9 17 13 5 11 4 15 8 19 14 6 7 3 16
Table 1. ∆P RESS table
gˆ(x) = 0.0029Ψ3,2 (x) + 0.002Ψ3,−1 (x) + 0.0034Ψ3,1 (x) gˆ1 (x) = 0.0047Ψ3,0 (x) − 0.001Ψ3,1 (x) gˆ2 (x) = −0.0053Ψ3,0 (x) gˆ3 (x) = −0.0012Ψ3,−1 (x) gˆ4 (x) = 0.012Ψ3,0 (x) − 0.0014Ψ3,−1 (x) − 0.0073Ψ3,1 (x) gˆ5 (x) = 0.0342Ψ3,2 (x) + 0.0120Ψ3,1 (x) gˆ6 (x) = 0.0078Ψ3,0 (x) + 0.0024Ψ3,−1 (x) (13) where, Ψj,k (x) = Ψ(2−j x − k) 2
−0.5x2
Ψ(x) = (1 − x )e
(14) (15)
Table 1 shows the incremental values of ∆P RESSk that result from excluding the associated terms from the model. As discussed earlier, this value reflects the significance of each term toward the model parameterization. The most significant term corresponds to the maximum ∆P RESSk (3.741), which is ranked 1 as in Table 1. The least significant term is reflected by the minimum ∆P RESSk (0.01), which is ranked 23 in Table 1. For validation, the identified model (12) is simulated by using only the measured input data to obtain the simulation output 5 ysim (k) over the test set. The simulation output ysim (k) and simulation error esim (k) are shown in both time (Figure 3) and frequency domains (Figure 4). They are very well overlapped. The mean value of the simulation error (time domain) µt is calculated as in (16) to be 7.28 × 10−5 , and its standard deviation st calculated as in (17) is 0.0145. As well, the root mean square (RMS) value of the error over the test set eRM St and the estimation set eRM Se calculated as in (18) and (19) are 0.0145 and 0.0143 respectively. These 4 Upper scaled wavelets contain lower frequency components while lower scaled wavelets correspond to higher frequency components. 5 Simulation is defined as calculating the output, using only the measured input data. No past values of the measured output should be used.
847
Fig. 3. (a) Model simulation output (black) and error (grey) over the test (validation) set, (b) Zoom-in view from sampling index 2000 to 2500: measured output (solid) versus simulated output (dot-dot). results indicate that the system dynamics has been well captured by the identified WSDP model (23 terms).
µt = v u u st = t
1 87000
1 87000
188000 X
esim (k)
(16)
k=101001
188000 X
k=101001
(esim (k) − µt )2
(17)
15th IFAC SYSID (SYSID 2009) Saint-Malo, France, July 6-8, 2009 ACKNOWLEDGEMENT The authors wish to thank Professor Peter C. Young (University of Lancaster, U.K.) who visited RMIT in 2005 for explaining various concept regarding the SDP models. APPENDIX PRESS based term selection algorithm Consider a linear-in-the-parameter regression equation as described below Y = Pθ + Ξ
(20)
in which, Y = [y (0), ..., y(n − 1)]T Ξ = [e(0), ..., e(n − 1)]T θ = [θ0 , ..., θm−1 ]T P = [p0 , ..., pm−1 ], size(pk ) = n × 1 Let us define the prediction error as, ξ−k (k) = y(k) − P (k, :)θˆ−k = y(k) − yˆ−k (k)
Fig. 4. (a) DFT of measured output (black) versus simulated output (grey), (b) DFT of simulation error (grey) superimposed on measured output’s DFT (black).
eRM St
v u u =t
eRM Se
1 87000
v u u =t
188000 X
esim (k)2
(21) where ξ−k (k), k = 0, ...n − 1 is regarded as the PRESS residual, which represents the true prediction error because y(k) and yˆ−k (k) = P (k, :)θˆ−k are independent. Now, orthogonally decompose P into 2 components: a m × m upper triangular matrix T , and n × m matrix W with orthogonal columns of ωi that W T W = T diag[ω0T ω0 , ..., ωiT ωi , ..., ωm−1 ωm−1 ]: i.e.,
(18)
P = WT
k=101001
100000 X 1 esim (k)2 87000
(19)
k=1001
Note that an improved model could be obtained if a larger number of wavelet terms had been utilized. However, the PRESS procedure and the model performance suggest that it is an acceptable, parsimonious approximation of the benchmark nonlinear system under the present study, and that the explanation of the data in the cross-validation (testing) should be sufficient for most purposes.
(22)
Then, Y = P θ + Ξ = (P T −1 )(T θ) + Ξ = W G + Ξ (23) where, G = [g0 , ..., gm−1 ]T is the auxiliary model parameter matrix, while W represents the transformed data matrix. Taking advantage of the orthogonality of W , G can be estimated in a least squares manner, as shown below: £ ¤ £ ¤ ˆ = W T W −1 W T Y G
£ ¤ T = inv(diag[ω0T ω0 , ..., ωiT ωi , ..., ωm−1 ωm−1 ]) W T Y £ ¤ 1 1 1 ] WTY (24) = diag[ T , ..., T , ..., T ω0 ω0 ωi ωi ωm−1 ωm−1
Therefore, 4. CONCLUSION
ωT Y ωiT Y = i 2 (25) T ωi ωi kωi k Using these, the PRESS residual can be calculated (Meyer 1990, Wang and Cluett 2000, Hong et al. 2003) as gˆi =
In this paper, we have presented results on the identification of a Wiener-Hammerstein benchmark nonlinear system using WSDP models which represent the system nonlinearities using SDP parameters in the form of wavelets. The proposed approach incorporates a PRESS based algorithm for optimized model structure selection. This, as a result, enables the identified model to represent nonlinear system in a compact manner. The obtained simulation results demonstrate the merits of the proposed approach in identifying this benchmark system.
848
Pm1 −1 y(k) − i=0 ωi (k)gi ξ−k (k) = Pm1 −1 ωi (k)2 1 − i=0 kω k2
(26)
i
Now, in order to simplify this representation of ξ−k (k), it is denoted by ξ−k (k|m), corresponding to the PRESS residual ξ−k (k) determined from a m-term model.
15th IFAC SYSID (SYSID 2009) Saint-Malo, France, July 6-8, 2009 REFERENCES
In terms of ξ−k (k|m), the PRESS statistic is defined as n−1 X
P RESS(m) =
2 ξ−k (k|m)
(27)
k=0
which measures the predictive capabilities of the estimated model. If the addition of a term into the model yields a positive increment of the PRESS value, it means that term is decreasingly significant for the parameterization, and vice versa. Consequently, the PRESS can be used as a criterion to detect the significance of each term in the model. It is, in fact, the incremental value of PRESS resulting from excluding a term from the model that reflects its contribution in the model. If we let −i (m − 1) be the value calculated from a (m − 1)P RESSm term model by excluding the ith term from the original −i (m − 1) − m-term model, then ∆P RESSi = P RESSm P RESS(m) can be used as a criterion for the new term selection algorithm. The PRESS Term Selection Algorithm For the ease of representation, let us denote φi be the (i + 1)th column of Φ: φi = Φ(:, i + 1), and P (−i) denotes the matrix which is resulted from excluding the ith column from the original matrix P . (1) Initialize Φ = P, [N, m] = size(P ) (2) Orthogonal Decomposition (a) [N, m1 ] = size(Φ). Initialize ω0 = φ0 , g0 = (b) For 1 ≤ i ≤ m1 − 1, compute αj,i =
ω0T Y . ω0T ω0
ωjT φi , j = 0, 1, ..., i − 1 ωjT ωj
ωi = φi −
i−1 X
αj,i ωj
j=0
ωiT Y ωiT ωi (3) PRESS computation gi =
Pm1 −1 y(k) − i=0 ωi (k)gi ξ−k (k) = Pm1 −1 ωi (k)2 1 − i=0 kω k2 i
P RESS =
N −1 X
2 ξ−k (k)
k=0
(4) P RESS(m) = P RESS. For 1 ≤ i1 ≤ m, (a) Set Φ = P (−i1 ) . Repeat steps 2 and 3. −i1 (m − 1) = P RESS. Calculate (b) P RESSm −i1 (m − 1) − P RESS(m) ∆P RESSi1 = P RESSm (5) Based on the largest ∆P RESSi1 value, select the most significant term to be added to the regressor matrix. (6) Solve for the intermediate parameter estimate in a least squares manner. (7) Calculate the approximation accuracy, and compare it to the desired value: • If a satisfactory performance is achieved, stop the algorithm; • Otherwise, add extra terms into the regressor matrix based on the next largest ∆P RESSi1 values, and repeat from step 6 to 7.
849
N.V. Truong, L. Wang and P.C. Young. Nonlinear system modeling based on nonparametric identification and linear wavelet estimation of SDP models. Proc. 45 th IEEE Conf. on Decision and Control, San Diego, USA, pp. 2523-2528, 2006. N.V. Truong, L. Wang and P.C. Young. Nonlinear system modeling based on nonparametric identification and linear wavelet estimation of SDP models. International Journal of Control, 80:774-788, 2007a. N.V. Truong, L. Wang and J.M. Huang. Nonlinear modeling of a magnetic bearing using SDP model and linear wavelet parameterization. Proc. 2007 American Control Conference, New York, pp. 2254-2259, 2007b. N.V. Truong and L. Wang. Nonlinear System Identification in a Noisy Environment using Wavelet based SDP Models. in Proc. 17 th IFAC World Congress, Seoul, S. Korea, pp. 7439-7444, 2008a. N.V. Truong, L. Wang and P.K.C. Wong. Modelling and Short-term Forecasting of Daily Peak Power Demand in Victoria using 2-Dimensional Wavelet based SDP models. International Journal of Electrical Power and Energy Systems,30:511-518, 2008b. P.C. Young. The identification and estimation of nonlinear stochastic systems. In A.I. Mees (ed), Nonlinear dynamics and statistics, pages 127–166,Boston:Birkhauser, 2001. P.C. Young, P. Mckenna, and J. Bruun. Identification of non-linear sochastic systems by state dependent parameter estimation. International Journal of Control, 74: 1837-1857, 2001. G. Vandersteen. Identification of linear and nonlinear systems in an errors-in-variables least squares and total least squares framework. PhD Thesis, Vrije Universiteit Brussel,1997. L. Ljung. System Identification: Theory for the User. Second edition, Prentice Hall, New Jersey, 1999. R.H. Meyer. Classical and Modern Regression with Applications. Second edition, PWS-Ken: Boston, MA, 1990. L. Wang and W. Cluett. From Plant Data to Process Control: Ideas for Process Identification and PID Design. Systems and Control Book Series, Taylor and Francis: London, 2000. X. Hong, P.M. Sharkey, and K. Warwick. Automatic nonlinear predictive model construction algorithm using forward regression and the PRESS statisti. IEE Proceedings: Control Theory and Applications, 150:245-254, 2003.