Signal Processing 46 (1995) 179-202

Adaptive model selection and estimation for nonlinear systems using a sliding data window

W. Luo, S.A. Billings*

Department of Automatic Control and Systems Engineering, University of Sheffield, Sheffield S1 4DU, UK

Received 22 April 1994; revised 20 October 1994 and 30 May 1995
Abstract
A new algorithm which provides adaptive model selection and estimation on-line is derived, based on the polynomial nonlinear ARMAX (NARMAX) model. The algorithm uses rectangular windowing regression procedures where the forgetting factor is unity within a sliding data window. Variations in the model structure, that is, which terms are in the model, and the parameter estimates are tracked using a sliding rectangular window based on Givens rotations. The algorithm, which minimizes the loss function at every step by selecting significant regression variables and computing the corresponding parameter estimates, provides an efficient adaptive procedure which can be applied in nonlinear signal processing applications. Simulated examples are included to demonstrate the performance of the new algorithm.
* Corresponding author.
Keywords: Adaptive model selection; Nonlinear system
1. Introduction
If the structure of a system model is known a priori, on-line estimation reduces to determining estimates of the unknown parameters. In this situation, either the standard recursive least squares (RLS) algorithm or variations of it with improved numerical properties can be applied, including sliding windowing algorithms [11, 19]. Orthogonal least squares parameter estimation algorithms based on orthogonal-triangular decomposition (QR algorithms) are known to be superior to all other RLS algorithms and are therefore often applied in real-time signal processing applications.

When the structure of a system model is unknown, detecting the system structure becomes a critical part of the identification procedure. Techniques used for the on-line modification of the order of linear models are not easily extended to the nonlinear case because the simple relationships between additional model terms that exist in the linear case are no longer valid for nonlinear models. New algorithms are therefore required for the on-line structure detection of nonlinear systems. An on-line orthogonal algorithm based on an exponential windowing approach was presented in [18]. That algorithm minimizes the loss function at each step by selecting significant regression variables while maintaining the orthogonality of the vector space as each new observation is processed. Because the algorithm uses orthogonal QR decomposition with column permutations and the effects of old data are exponentially weighted out, stability is ensured in continuous computation. A better alternative in some applications would be to use a rectangular window with a unity forgetting factor but to slide the window over the data set to provide improved adaptation. If the forgetting factor $\lambda$ is unity, however, the exponential window approach will overflow in continuous computations during on-line processing (see
Appendix A for further details) and an alternative solution must be sought. The objective of the present study is therefore to introduce an adaptive orthogonal algorithm with on-line structure detection which is applicable to rectangular windowing regression. The algorithm determines the contribution that each candidate variable makes to the output, to decide on-line which variables should be included in the model, and then updates the parameter estimates for the selected model structure. Because recursive QR decomposition algorithms provide numerically stable and accurate computations with an efficient use of memory, these form an ideal basis for the development of the new rectangular window method. The new algorithm can adaptively change the model structure on-line but does not require storage of the entire regression matrix and associated orthogonal vectors. The new algorithm differs from the exponential windowing method in the operation of the sliding window: the oldest measurement contained in the last observation window is removed before a new measurement is added. Since the oldest measurement is eliminated and the newest measurement is added, the new algorithm constructs a sliding window and can produce a series of sub-models based on the data, which are updated successively in each window. The new algorithm performs adaptive on-line model structure detection using a forward selection method and will be referred to as the GFSL algorithm (Givens rotation with Forward selection and SLiding windowing algorithm).

The paper is organized as follows. First the NARMAX model is defined and the recursive orthogonal transformation with a sliding window is derived. The principle of structure detection is described in Section 4, and the procedures and initialization of the algorithm are summarized in Section 5. A discussion in Section 6 considers the computational costs and alternative algorithms. The final two sections contain numerical results and conclusions.
2. NARMAX model

A single-input single-output NARMAX model [16] can be defined as

$$ y(t) = F[\,y(t-1), \ldots, y(t-n_y),\; u(t-d), \ldots, u(t-d-n_u+1),\; \varepsilon(t-1), \ldots, \varepsilon(t-n_\varepsilon)\,] + \varepsilon(t), \qquad (2.1) $$

where $t$ denotes discrete time; $y(t)$, $u(t)$ and $\varepsilon(t)$ represent the output, input and prediction errors, respectively; $n_y$, $n_u$ and $n_\varepsilon$ are the corresponding orders; $F[\,\cdot\,]$ is some nonlinear function and $d$ is the minimum time delay of the input. The prediction errors are defined by

$$ \varepsilon(t) = y(t) - \hat y(t), \qquad (2.2) $$

where $\hat{}$ denotes predicted values and $\hat y(t)$ is the one step ahead prediction output. The degree of the power terms in $y(\cdot)$, $u(\cdot)$ and $\varepsilon(\cdot)$ is denoted by $n_\ell$. Defining

$$ A_1 = y(t-1),\quad A_2 = y(t-2),\quad \ldots,\quad A_{n_y} = y(t-n_y), $$
$$ A_{n_y+1} = u(t-d),\quad A_{n_y+2} = u(t-1-d),\quad \ldots,\quad A_{n_y+n_u} = u(t-d-n_u+1), $$
$$ A_{n_y+n_u+1} = \varepsilon(t-1),\quad A_{n_y+n_u+2} = \varepsilon(t-2),\quad \ldots,\quad A_{n_y+n_u+n_\varepsilon} = \varepsilon(t-n_\varepsilon), $$
$$ s = n_y + n_u + n_\varepsilon, $$

then a polynomial NARMAX model with degree $n_\ell$ can be expressed as

$$ y(t) = k_0(t) + \sum_{i_1=1}^{s} k_{i_1}(t)A_{i_1} + \sum_{i_1=1}^{s}\sum_{i_2=i_1}^{s} k_{i_1 i_2}(t)A_{i_1}A_{i_2} + \cdots + \sum_{i_1=1}^{s}\sum_{i_2=i_1}^{s}\cdots\sum_{i_{n_\ell}=i_{n_\ell-1}}^{s} k_{i_1 i_2 \cdots i_{n_\ell}}(t)A_{i_1}A_{i_2}\cdots A_{i_{n_\ell}}, \qquad (2.3) $$

where $k$ denotes the unknown parameters. Transforming the lagged sequences in Eq. (2.3) into regressors forms a pseudo-linear regression model given by

$$ y(t) = \sum_{i=1}^{m} \phi_i(t)\theta_i(t) + \varepsilon(t), \qquad (2.4) $$
where $\phi_i(t)$ is the $i$th regression variable (regressor), a monomial of lagged $u(t)$, $y(t)$ and/or $\varepsilon(t)$; $m$ is the number of regressors and $\theta_i(t)$ is the unknown parameter corresponding to $\phi_i(t)$. The regression equations at the time points $1, 2, \ldots, t$ are given by

$$ y(t) = \Phi(t)\Theta(t) + \varepsilon(t), \qquad (2.5) $$
where $\Phi(t)$ is the $t \times m$ regression matrix, $\Theta(t)$ is the $m \times 1$ parameter vector, and $y(t)$ and $\varepsilon(t)$ are $t \times 1$ vectors. If the functions $\phi_i(\cdot)$ are chosen properly, the polynomial model (2.4) can approximate any continuous $F[\,\cdot\,]$ in (2.1) arbitrarily well [8]. Various possibilities for parameterizing $F[\,\cdot\,]$ exist, including the extended model set NARMAX models, which may involve functions such as absolute value, exponential, logarithmic, sgn($\cdot$), etc. [5]. Since the linear-in-the-parameters property is preserved, several efficient identification algorithms have been developed for this class of NARMAX model (e.g. [7, 9, 17]). A NARMAX model with an adaptive or changing structure can be defined by extending the definition of $m$ in Eq. (2.4) to be $m(t)$. Such a model will be time-dependent, not signal-dependent [4].

To match the dynamics of the nonlinear system under test, identification based on the NARMAX model often involves power and cross-product terms of the inputs, outputs and noise. In the initial stage of processing many candidate regressors may emerge, but most of these are often insignificant or linearly dependent. On-line structure detection can therefore be applied to determine which regressors should be included in the model at a particular time point. Like most nonlinear models, the structure of the NARMAX model does not satisfy the shift-invariant property which allows successive expansion of candidate variables, and therefore techniques like lattice algorithms which utilize these properties for on-line modification of the model
structure are not used easily. This excludes most of the well known algorithms derived for linear models. It becomes necessary therefore to derive new algorithms for adaptive on-line structure detection and parameter estimation for nonlinear systems which can only be described by nonlinear models.
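To make the construction of the candidate set concrete, the sketch below assembles one row $\phi(t)$ of the regression matrix for a polynomial NARMAX model from the lagged variables $A_1, \ldots, A_s$ of Eqs. (2.3)-(2.4). It is a minimal illustration only: the function name and the numpy/itertools usage are assumptions, not code from the paper.

```python
import numpy as np
from itertools import combinations_with_replacement

def narmax_regressor_row(y, u, e, t, ny=2, nu=2, ne=2, d=1, nl=2):
    """One row phi(t) of the regression matrix Phi in Eq. (2.4).

    y, u, e are arrays indexed by absolute time; the row contains a
    constant plus every monomial A_{i1}...A_{ik} of degree 1..nl with
    i1 <= i2 <= ... as in Eq. (2.3)."""
    # Lagged variables A_1..A_s  (s = ny + nu + ne)
    A = [y[t - k] for k in range(1, ny + 1)]
    A += [u[t - d - k] for k in range(nu)]
    A += [e[t - k] for k in range(1, ne + 1)]
    row = [1.0]  # constant term k_0
    for deg in range(1, nl + 1):
        for idx in combinations_with_replacement(range(len(A)), deg):
            row.append(float(np.prod([A[i] for i in idx])))
    return np.asarray(row)
```

With $n_y = n_u = n_\varepsilon = 2$ and $n_\ell = 2$ this yields $1 + 6 + 21 = 28$ candidate terms, which matches the candidate count quoted for Experiment 2 in Section 7.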
3. Recursive QR decomposition with a sliding window

In the GFEX (Givens rotation with Forward selection and EXponential windowing) algorithm [18] the estimates of the system structure and parameters can be considered as the result produced from a new measurement and a priori information. The latter contains previously processed data, expressed concisely in an augmented matrix. At the time instant $t$ this matrix is given by $[R(t-1)\;\; v_m(t-1)]$, where $R(t-1)$ is an $m \times m$ upper triangular matrix and $v_m(t-1)$ is an $m \times 1$ column vector. This matrix and the new measurement at time $t$ contain all the information in an exponential window up to time $t$. If the data contained in the window were unweighted, this window would become rectangular. To obtain accurate solutions in this latter situation, and to avoid overflow in the orthogonal decomposition, the oldest measurement contained in the matrix $[R(t-1)\;\; v_m(t-1)]$ needs to be removed before the new measurement is added. This leads to the implementation of a sliding window method and forms the basis of the new algorithm presented in this study. At a specific time instant, an augmented matrix produced by deleting the oldest data will be referred to as the backward-system, and an augmented matrix produced by adding the new measurement will be referred to as the forward-system at this time instant. The quantities corresponding to the two systems are marked with the subscripts b and f, respectively.

First consider the forward-operation of the recursive decomposition. Suppose that there are $t-1$ observations of the system $y(t-1) = \Phi(t-1)\Theta(t-1)$ which contains $m$ regression variables. Augmenting the $t-1$ regression equations with a zero element, denoted $O_2$, and a $1 \times m$ zero row vector $O_1$, and premultiplying by a $t \times t$ orthonormal matrix $Q^{\mathrm T}(t-1)$ (i.e. $Q^{\mathrm T}(t-1)Q(t-1) = I$) yields the QR orthogonal decomposition

$$ Q^{\mathrm T}(t-1)\begin{bmatrix} y(t-1) \\ 0 \end{bmatrix} = \begin{bmatrix} v_m(t-1) \\ v_{t-1-m}(t-1) \\ O_2 \end{bmatrix} = \begin{bmatrix} R(t-1) \\ O_{t-1-m} \\ O_1 \end{bmatrix}\hat\Theta(t-1) + Q^{\mathrm T}(t-1)\varepsilon(t-1), \qquad (3.1) $$
where $v_m(t-1)$ and $v_{t-1-m}(t-1)$ have dimensions $m \times 1$ and $(t-1-m) \times 1$, respectively, $O_{t-1-m}$ is a $(t-1-m) \times m$ zero matrix, and $\varepsilon(t-1)$ is the residual vector. When the new measurement data at time $t$, $\phi(t)$ and $y(t)$ (where $\phi(t)$ is a $1 \times m$ regression row vector and $y(t)$ is the output at time $t$), are added to improve the previous estimates, the regression equations can be written as

$$ \begin{bmatrix} v_m(t-1) \\ v_{t-1-m}(t-1) \\ y(t) \end{bmatrix} = \begin{bmatrix} R(t-1) \\ O_{t-1-m} \\ \phi(t) \end{bmatrix}\hat\Theta(t-1) + \begin{bmatrix} Q^{\mathrm T}(t-1)\varepsilon(t-1) \\ 0 \end{bmatrix} + \begin{bmatrix} O_{e1} \\ O_{e2} \\ \hat\varepsilon(t) \end{bmatrix}, \qquad (3.2) $$

where $O_{e1}$ and $O_{e2}$ are $m \times 1$ and $(t-1-m) \times 1$ zero vectors, respectively, and $\hat\varepsilon(t)$ is the a priori prediction error defined by $\hat\varepsilon(t) = y(t) - \phi(t)\hat\Theta(t-1)$. Choose a $t \times t$ orthonormal matrix $Q_t(t)$ to update the QR decomposition so that

$$ Q_t^{\mathrm T}(t)\begin{bmatrix} v_m(t-1) \\ v_{t-1-m}(t-1) \\ y(t) \end{bmatrix} = \begin{bmatrix} v_m(t) \\ v_{t-1-m}(t-1) \\ v_t(t) \end{bmatrix} = \begin{bmatrix} R(t) \\ O_{t-1-m} \\ O_1 \end{bmatrix}\hat\Theta(t) + \varepsilon(t), \qquad (3.3) $$
where $v_t(t)$ is the $t$th element of $v(t)$, and the $m \times m$ upper triangular matrix is defined as

$$ R(t) = \begin{bmatrix} r_{11}(t) & r_{12}(t) & \cdots & r_{1m}(t) \\ 0 & r_{22}(t) & \cdots & r_{2m}(t) \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & r_{mm}(t) \end{bmatrix}. \qquad (3.4) $$
The orthogonal matrix $Q(t) = Q(t-1)Q_t(t)$ is given by

$$ Q(t) = [\,q_1(t)\;\; q_2(t)\; \cdots\; q_m(t)\;\; q_{m+1}(t)\; \cdots\; q_t(t)\,], \qquad (3.5) $$

and the residual vector becomes

$$ \varepsilon(t) = \begin{bmatrix} \varepsilon_m(t) \\ \varepsilon_{t-1-m}(t) \\ \varepsilon(t) \end{bmatrix}, $$

where the $t$th element of $\varepsilon(t)$, $\varepsilon(t)$, is the residual (the a posteriori prediction error) defined as $\varepsilon(t) = y(t) - \phi(t)\hat\Theta(t)$. The estimate $\hat\Theta(t)$ is easily obtained by solving the triangular system $R(t)\hat\Theta(t) = v_m(t)$. This shows that removing the equations between the $(t-1-m)$th and $(t-1)$th rows of (3.3) does not affect the solution. The residual sum of squares (RSS) can easily be obtained from

$$ \mathrm{RSS}(t) = \|\varepsilon(t)\|^2 = \|y(t)\|^2 - \|v_m(t)\|^2 = y^{\mathrm T}(t)y(t) - v_m^{\mathrm T}(t)v_m(t) \qquad (3.6) $$
and can therefore be used for the on-line structure detection described later. If the orthogonal decomposition is performed row by row in the transformation from Eq. (3.2) to (3.3), the equations between the $(t-1-m)$th and $(t-1)$th rows of (3.2) do not need to change when the new upper triangular matrix $R(t)$ and the new column vector $v_m(t)$ are obtained. The operation which eliminates $\phi(t)$ to a zero row and correspondingly transforms $y(t)$ to $v_t(t)$ can be executed in augmented matrix form:

$$ \begin{bmatrix} R(t-1) & v_m(t-1) \\ \phi(t) & y(t) \end{bmatrix} \Longrightarrow \begin{bmatrix} R(t) & v_m(t) \\ O_1 & v_t(t) \end{bmatrix}. $$

Because the orthogonal transformation is performed row by row, the orthogonal matrix $Q_t(t)$ does not need to be stored, and therefore neither does $Q(t)$ (where $Q(t) = \prod_{i=1}^{t} Q_i(i)$), so that the recursive orthogonal decomposition can still be performed. When the augmented matrix system is used to express the above orthogonal transformation, neither $Q_t(t)$ nor $Q(t)$ needs to be expressed explicitly, but the changes in $R$ and $v_m$ can clearly be observed and the continuous computation is easily executed. If new measurement data need to be added to improve the estimates, this augmented matrix system is easily used by putting each data set in the $(m+1)$th row of the augmented matrix and then transforming this row of data. After the new data are processed, the augmented matrix is updated successively and a recursive transformation is performed. At the beginning of every step $v_t(t)$ can be ignored and replaced by the $1 \times 1$ zero matrix $O_2$. Consequently the result of Eq. (3.3) can be rewritten in the form of an augmented matrix for continuous computation, given by

$$ \begin{bmatrix} R(t) & v_m(t) \\ O_1 & O_2 \end{bmatrix}. \qquad (3.7) $$
When the observation window consists of the $w_l$ data up to time $t$, the matrix (3.7) should be written as

$$ \begin{bmatrix} R_{wf}(t) & v_{wf,m}(t) \\ O_1 & O_2 \end{bmatrix}, \qquad (3.8) $$

where the subscript wf denotes the rectangular window and the forward-system. In the back-operation the oldest measurements, i.e. $\phi(t-w_l+1)$ and $y(t-w_l+1)$, are put at the positions of $O_1$ and $O_2$, and then eliminated from the matrix (3.8) using an orthogonal rotation the same as that in the forward-operation but with a different coefficient (see (A.10)). This results in the back-system at time $t$, given by

$$ \begin{bmatrix} R_{wb}(t) & v_{wb,m}(t) \\ O_1 & O_2 \end{bmatrix}, \qquad (3.9) $$
where b indicates the back-system, which includes the information of the $w_l - 1$ data up to time $t$. In every computational interval, say at time $t$, the GFSL algorithm deletes the oldest measurement data using (A.10) to produce a backward-system up to time $t-1$, and then adds the newest measurement by means of the same formula to produce the forward-system up to time $t$, given in (3.8). In continuous computation the observation window with length $w_l$ slides over all the data and the orthogonal decomposition is recursively applied to the data in this sliding window. Finally, the system structure and parameters at time $t$ are determined according to the forward-system. The full procedure is given in Appendix A.
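Once the augmented matrix $[R(t)\;\; v_m(t)]$ is available, the parameter estimate and the RSS of Eq. (3.6) follow from a triangular solve. The sketch below shows this step under assumed numpy conventions; the function names are illustrative, not from the paper.

```python
import numpy as np

def back_substitute(R, v):
    """Solve the triangular system R(t) theta(t) = v_m(t) of Section 3."""
    m = len(v)
    theta = np.zeros(m)
    for i in range(m - 1, -1, -1):
        theta[i] = (v[i] - R[i, i + 1:] @ theta[i + 1:]) / R[i, i]
    return theta

def rss(yTy, v):
    """RSS(t) = y^T(t) y(t) - v_m^T(t) v_m(t), Eq. (3.6)."""
    return yTy - v @ v
```

Only $y^{\mathrm T}(t)y(t)$ over the current window is needed in addition to $[R\;\; v_m]$, which is why the algorithm never has to store the full regression matrix.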
4. On-line structure detection

Since the system structure at time $t$ is unknown, the computation has to involve all candidate variables, and some of these variables may be linearly dependent, so that the regression matrix $\Phi_{wf}(t)$ corresponding to the $w_l$ data up to time $t$ may not be of full rank. This means that $\Phi_{wf}^{\mathrm T}(t)\Phi_{wf}(t)$ may be singular and there may no longer be a unique solution. In this situation $\Phi_{wf}(t)$ can still be decomposed, but $R_{wf}(t)$ is then a real upper triangular matrix with some zero diagonal elements and $m_1$ positive diagonal elements, and the row vectors of $R_{wf}(t)$ corresponding to the zero diagonal elements become zero vectors.

The method for on-line structure detection derived in [18] can be readily applied to the forward-system (3.8) to determine the sub-model available at time $t$. To match the condition of the finite window (the size of the window is $w_l$), rewriting the RSS given in (3.6) yields

$$ \mathrm{RSS}_{wf,m_s(t)}(t) = y_{wf}^{\mathrm T}(t)y_{wf}(t) - v_{wf,m_s(t)}^{\mathrm T}(t)\,v_{wf,m_s(t)}(t), \qquad (4.1) $$

where the subscript wf denotes quantities relative to a forward-system which includes the information of $w_l$ data up to time $t$, and $m_s(t)$ is the number of selected regressors in the sub-model at time $t$; $m_s(t)$ is less than the number of candidate variables, $m$. Quantities with the subscript $m_s(t)$ are associated with the value of $m_s(t)$. Dividing (4.1) by $y_{wf}^{\mathrm T}(t)y_{wf}(t)$ gives the normalized RSS (NRSS),

$$ \mathrm{NRSS}_{wf,m_s(t)}(t) = 1 - \sum_{i=1}^{m_s(t)} \mathrm{ERR}_{wf,i}(t), \qquad (4.2) $$

where $\mathrm{ERR}_{wf,i}(t) = v_{wf,i}^2(t)/\bigl(y_{wf}^{\mathrm T}(t)y_{wf}(t)\bigr)$. This is the error reduction ratio (ERR) [5, 17] of the orthogonal vector $q_{wf,i}$, which is obtained by transforming the first $i$ columns of $\Phi_{wf}(t)$; $v_{wf,i}$ represents the projection of the output in the direction of the $i$th orthogonal vector $q_{wf,i}$. The value of $v_{wf,i}$ can easily be used to select significant regressors from all the candidate regression variables by a forward search procedure, so that the variable selected at the $j$th selection step minimizes $\mathrm{NRSS}_{wf,j}(t)$. The selection is continued for $m_s(t)$ steps until

$$ \mathrm{NRSS}_{wf,m_s(t)}(t) \le \xi_s, \qquad (4.3) $$

where $\xi_s$ is a pre-set tolerance. If $m_s(t)$ orthogonal vectors are selected, this means that the first $m_s(t)$ columns of $\Phi_{wf}(t)$ are determined as significant regressors.

To ensure clarity, the notation '$(t)$' and the subscript wf will be omitted in the remainder of this section. At the $j$th selection step all candidate variables $\phi_p$, $p = j, \ldots, m$, are considered initially. Choose the candidate variable with the maximum of $v_{j(p)}^2$, $p = j, \ldots, m$, as the $j$th optimal regressor, so that $\mathrm{RSS}_j$ is minimized. The computation of $v_{j(p)}^2$ uses only the columns of $R$,

$$ r_p = [\,r_{1p}\;\; r_{2p}\; \cdots\; r_{jp}\; \cdots\; r_{pp}\;\; 0\; \cdots\; 0\,]^{\mathrm T}, \quad p = j, \ldots, m, $$

and $v_p$, $p = j, \ldots, m$. First set $v_{j(j)} = v_j$ and the two auxiliary variables $r_{j(p)}^{(j)} = r_{jp}$ and $v_{j(p)}^{(j)} = v_j$. Calculate $v_{j(p)}$ using the following procedure. For $i = j+1, \ldots, p$,

$$ r_{j(p)}^{(i)} = \sqrt{\bigl(r_{j(p)}^{(i-1)}\bigr)^2 + r_{ip}^2}, \qquad (4.4\mathrm{a}) $$

$$ v_{j(p)}^{(i)} = v_{j(p)}^{(i-1)}\,\frac{r_{j(p)}^{(i-1)}}{r_{j(p)}^{(i)}} + v_i\,\frac{r_{ip}}{r_{j(p)}^{(i)}}. \qquad (4.4\mathrm{b}) $$
Finally $v_{j(p)}^2 = \bigl(v_{j(p)}^{(p)}\bigr)^2$. When $v_{j(k)}^2$ is the maximum of all the $v_{j(p)}^2$, $\phi_k$ is selected as the $j$th optimal regressor. Then exchange the positions of $r_k^{(j-1)}$ and $r_j^{(j-1)}$ in $R^{(j-1)}$ to form $R^{(j-1)*}$, where matrices with the superscript $(j-1)$ indicate the forms at the $(j-1)$th selection step. Since the permuted $R^{(j-1)*}$ is not triangular, a re-orthogonalization procedure must be applied to retriangularize the augmented matrix

$$ \begin{bmatrix} R^{(j-1)*} & v_m^{(j-1)} \\ O_1 & O_2 \end{bmatrix}, \qquad (4.5) $$

where $v_m^{(j-1)}$ is the orthogonalized output vector produced at the $(j-1)$th selection step. The new augmented matrix

$$ \begin{bmatrix} R^{(j)} & v_m^{(j)} \\ O_1 & O_2 \end{bmatrix} \qquad (4.6) $$

can easily be formed by applying a Givens transformation (A.10) with $w_c = 1$ from the $j$th row to the $m$th row of (4.5). The selection process is continued until (4.3) is satisfied. At every selection step the procedures include the selection of the optimal regressor, the permutation of the columns of $R$ and the retriangularization of the augmented matrix. Although the operation is performed only on $R$ and $v_m$, the effect is to select the optimal regressors in the regression matrix $\Phi$.
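The projection $v_{j(p)}^2$ of Eqs. (4.4a)-(4.4b) can be evaluated for every candidate column without permuting $R$. The sketch below is one possible rendering, assuming 0-based indexing and numpy; the function names are hypothetical.

```python
import numpy as np

def squared_projection(R, v, j, p):
    """v_{j(p)}^2 of Eq. (4.4): the contribution candidate column p would
    make if it were moved into position j of the triangular factor R."""
    r = R[j, p]   # r_{j(p)}^{(j)} = r_{jp}
    w = v[j]      # v_{j(p)}^{(j)} = v_j
    for i in range(j + 1, p + 1):
        r_new = np.hypot(r, R[i, p])                  # Eq. (4.4a)
        w = w * r / r_new + v[i] * R[i, p] / r_new    # Eq. (4.4b)
        r = r_new
    return w * w

def pick_regressor(R, v, j):
    """Forward-selection step: candidate index maximizing v_{j(p)}^2."""
    m = R.shape[0]
    scores = [squared_projection(R, v, j, p) for p in range(j, m)]
    return j + int(np.argmax(scores))
```

Maximizing $v_{j(p)}^2$ is equivalent to maximizing $\mathrm{ERR}_{wf,j}$, since the two differ only by the fixed factor $y_{wf}^{\mathrm T}y_{wf}$.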
5. Computational procedures and initialization

To describe the implementation of the new GFSL algorithm completely, a summary of the procedures is given in this section; the initialization of the algorithm is also discussed. Suppose that at time $t$ the forward-system at time $t-1$ is given by

$$ \begin{bmatrix} R_{wf}(t-1) & v_{wf,m}(t-1) \\ O_1 & O_2 \end{bmatrix}; \qquad (5.1) $$

the operations of the GFSL algorithm during the computational interval at time $t$ can then be described as follows.

GFSL Algorithm.

(i) If $t < w_l$ (i.e. in the initialization of the whole computation), go to Step (ii); otherwise perform the back-operation. To delete the data at time $t - w_l$ from (5.1), first put $[\phi(t-w_l)\;\; y(t-w_l)]$ in the $(m+1)$th row of (5.1) to form

$$ \begin{bmatrix} R_{wf}(t-1) & v_{wf,m}(t-1) \\ \phi(t-w_l) & y(t-w_l) \end{bmatrix}. \qquad (5.2) $$

Then, with $w_c = -1$, apply the orthogonal transformation procedure (A.10) to obtain the backward-system

$$ \begin{bmatrix} R_{wb}(t-1) & v_{wb,m}(t-1) \\ O_1 & O_2 \end{bmatrix}. \qquad (5.3) $$

(ii) Add the data at time $t$ to (5.3) to form

$$ \begin{bmatrix} R_{wb}(t-1) & v_{wb,m}(t-1) \\ \phi(t) & y(t) \end{bmatrix}. \qquad (5.4) $$

Setting $w_c = 1$, use (A.10) to retriangularize (5.4) and form the forward-system

$$ \begin{bmatrix} R_{wf}^{(0)}(t) & v_{wf,m}^{(0)}(t) \\ O_1 & O_2 \end{bmatrix}. \qquad (5.5) $$

(iii) Choose the $j$th optimal regressor ($j \ge 1$) with the maximum $v_{wf,j(p)}^2(t)$, $p = j, \ldots, m$, using (4.4).

(iv) Based on the result of Step (iii), perform the permutation of the columns of $R_{wf}^{(j-1)}(t)$ and then the corresponding retriangularization.

(v) Compute $\mathrm{NRSS}_{wf,j}(t)$, defined as

$$ \mathrm{NRSS}_{wf,j}(t) = 1 - \sum_{i=1}^{j} \mathrm{ERR}_{wf,i}(t), \qquad (5.6) $$

where

$$ \mathrm{ERR}_{wf,i}(t) = \frac{v_{wf,i}^2(t)}{y_{wf}^{\mathrm T}(t)y_{wf}(t)}. \qquad (5.7) $$
If $\mathrm{NRSS}_{wf,j}(t)$ satisfies the pre-set tolerance, namely $\mathrm{NRSS}_{wf,j}(t) \le \xi_s$, the selection is terminated; otherwise utilize some statistical tests to check the selection and then decide whether to return to Step (iii) to involve more regressors.

(vi) Suppose that $m_s(t)$ regressors have been selected; the new forward-system, rearranged according to the contribution of the regressors to the output, is

$$ \begin{bmatrix} R_{wf}(t) & v_{wf,m}(t) \\ O_1 & O_2 \end{bmatrix}, $$

where $R_{wf}(t) = R_{wf}^{(m_s(t))}(t)$ and $v_{wf,m}(t) = v_{wf,m}^{(m_s(t))}(t)$. Using back-substitution, calculate the parameters $\hat\theta_i(t)$, $i = 1, \ldots, m_s(t)$, from $R_{wf,m_s(t)}(t)$, the top-left triangular portion of the above matrix, and $v_{wf,m_s(t)}(t)$, which contains the first $m_s(t)$ elements.

(vii) Compute the residual at the time instant $t$,

$$ \varepsilon(t) = y(t) - \sum_{i=1}^{m_s(t)} \phi_{wf,i}(t)\,\hat\theta_{wf,i}(t). $$
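Steps (i)-(vii) combine into the per-interval routine sketched below. It is a control-flow outline only: givens_update is the (A.10) kernel shown in Appendix A, while pick_regressor, permute_and_retriangularize and back_substitute are the hypothetical helpers sketched earlier; none of these names come from the paper, and the column bookkeeping and the residual of step (vii) are omitted.

```python
import numpy as np

def gfsl_interval(C, phi_old, y_old, phi_new, y_new, yTy, xi_s):
    """One computational interval of the GFSL algorithm (steps (i)-(vii)).

    C is the (m+1) x (m+1) augmented matrix [R v_m; O_1 O_2]; yTy is
    y_wf^T y_wf maintained recursively over the current window."""
    m = C.shape[0] - 1
    if phi_old is not None:                        # (i) back-operation
        C = givens_update(C, phi_old, y_old, wc=-1)
    C = givens_update(C, phi_new, y_new, wc=+1)    # (ii) forward-operation
    R, v = C[:m, :m].copy(), C[:m, m].copy()
    nrss, ms = 1.0, 0
    for j in range(m):                             # (iii)-(v) selection
        p = pick_regressor(R, v, j)
        R, v = permute_and_retriangularize(R, v, j, p)
        nrss -= v[j] ** 2 / yTy                    # Eqs. (5.6)-(5.7)
        ms = j + 1
        if nrss <= xi_s:
            break
    theta = back_substitute(R[:ms, :ms], v[:ms])   # (vi)
    return C, theta, ms
```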
In the initialization of the GFSL algorithm, let all the elements in $R_{wf}(0)$ and $v_{wf,m}(0)$ equal a small positive number to prevent division by zero. The sub-models generated at different time points may be different; therefore the candidate variables should be sufficient to describe the dynamics of the system under test over a wide range of operation. Since the GFSL algorithm can select significant regressors on-line, it allows new candidate variables to be added or useless candidate variables to be removed on-line. These operations can be realized by adding some 'empty' variables at the beginning of the computation, assigning very small numbers to all the elements of the associated columns of $R_{wf}(0)$. Variables which were not involved in the initial regression model can then replace these 'empty' variables during the computational process. Removing useless candidate variables can likewise be realized by substituting very small numbers for all the elements of the associated columns of $R_{wf}(t)$. An alternative method of adjusting the candidate variables is to extend or contract on-line the dimension of the augmented matrix (3.8). All the new elements are initialized to very small numbers at time $t$ and the data associated with these new variables are then added successively to the computation, so that the data take effect on the estimates from time $t$ onwards.

The range of the size of the window, $w_l$, can be wide and depends on the knowledge about the system under investigation, e.g. the system dynamics, the selection of the sampling interval and the computational facilities. Roughly speaking, the faster the system structure varies, the smaller $w_l$ should be. If the forgetting factor $\lambda$ is chosen between 0.98 and 0.995 in exponential windowing algorithms for a system, the asymptotic memory length $N_\infty$ is 50-200, based on the concept $N_\infty = 1/(1-\lambda)$ [10]. As a reference, $w_l$ should be selected larger than 50 for the same system. If the computations become unstable, $w_l$ should typically be increased to provide more data in the window; this is discussed in more detail later. Conversely, a large $w_l$ leads to smooth estimates but may produce large estimation deviations at sudden changes. If $w_l$ is very large and the values of the candidate variables are also large, a scaling procedure should be adopted to reduce the amplitude of the signals to be processed and hence to avoid overflow in the computations, because this situation is similar to the computations of the GFEX algorithm with $\lambda = 1$.
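As a quick check on the rule of thumb above, the correspondence between an exponential forgetting factor and a comparable window size can be computed directly; a trivial sketch, with the function name being an assumption:

```python
def asymptotic_memory_length(lam):
    """N_inf = 1/(1 - lambda) for an exponential window [10]; a sliding
    window of comparable memory should satisfy wl >= N_inf."""
    return 1.0 / (1.0 - lam)

# lambda = 0.98  -> N_inf = 50;  lambda = 0.995 -> N_inf = 200,
# matching the 50-200 range quoted in the text.
```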
6. Properties of the GFSL algorithm

The sub-models produced by the GFSL algorithm are local models based on the data window at specific time points. As old data are removed and new measurements are added, the sub-models change and the local windowing effect is relatively obvious. When the previous exponential windowing algorithm (GFEX) [18] is used, however, the data in an observation window decay exponentially. The sub-models from the GFEX algorithm therefore weight out the effect of old data and may better reflect the present system state.

The computation of adding a row using the Givens rotation method is numerically stable, but the deletion procedures described in (A.4) and (A.9) are potentially destabilising, even when there are no roundoff errors in the deletion process. For example, in (A.9) $c^*_{b_n,kk}$ is computed from the square root of the difference between two numbers and is then used as a denominator to produce $b_s$ and $b_c$. Since $c_{b_n,kk}$ and $c^{(k-1)}_{b_n,m+1,k}$ are always stored in limited precision, they cannot be recovered as in infinite precision and hence the difference between them will not be exact. If the elements in the row to be deleted are large, so that $c^*_{b_n,kk}$ approaches zero, the deletion operation may cause numerical instabilities. Such problems usually arise when the size of the window, $w_l$, is small but the amplitude of the signals to be processed changes rapidly. Fortunately, most applications of rectangular windowing methods use relatively large $w_l$ values. The above deletion procedures involve $m$ subtractions in computing the diagonal elements of $R_{wb}$ at every computational interval, and the instability of the algorithm may increase as the number of candidate variables increases. Golub and Styan [15] suggested a stable method for deleting a row which involves only one subtraction in the computation of the diagonal elements of $R_{wb}$, but requires $9m^2/2$ multiplications and $2m-1$ square roots. Considering the computational cost and complexity, it is better to use this method only in cases where roundoff errors are likely to be serious. Although this method is efficient, the sliding window methods should use relatively large windows and are best suited to systems where the differences in the amplitudes of the signals are not very large, because the stability of the operation is still limited by the condition number of $R_{wb}(t)$. It is advisable to monitor the occurrence of instabilities by measuring some important variables, e.g. $c^*_{b_n,kk}$ in (A.9).

With respect to the computational cost, it is difficult to give an exact comparison between the GFSL algorithm and the off-line orthogonal algorithms when they are applied in the rectangular windowing case, because the computational cost is always associated with the structure detection. Roughly speaking, the larger $w_l$ is, the greater the computational savings of the GFSL algorithm. For example, when 10 regressors are selected from 20 candidate variables in a computational interval, the computational cost of the GFSL algorithm will be lower than that of the off-line modified Gram-Schmidt algorithm [9] if $w_l > 90$. This is because, when the off-line algorithm is applied to a data set of this length, the whole data set must be transformed into orthogonal vectors at every selection step before the regressors are selected, whereas the GFSL algorithm works in a $(20+1) \times (20+1)$ matrix. In each computational interval only one row of old data is deleted and one row of new data is added. The cost of computing the contribution of the candidate regressors to the output is also low. The cost of retriangularization is high, but the operation is confined to a $20 \times 21$ matrix (i.e. $[R\;\; v_m]$). It is worth noting that there are many zero elements in $R$, so the computational cost can be reduced further. Compared with the exponential windowing algorithm (GFEX), which has only the adding operation, the GFSL algorithm requires more computation because of the deletion operation, but the extra cost is small relative to that of detecting the system structure.
7. Numerical results

The GFSL algorithm is suited to slowly time-varying systems and works best with large windows. The results from two typical examples are presented below to illustrate the algorithm.

Experiment 1: A simulation system. This is a slowly time-varying system described by

$$ z(t) = \begin{cases} 0.5z(t-1) + u(t-1), & \text{1-250th point}, \\ 0.4z(t-1) + u(t-1) + 0.05u^2(t-1), & \text{251-500th point}, \\ 0.3z(t-1) + 0.9u(t-1) + 0.075u^2(t-1), & \text{501-750th point}, \\ 0.2z(t-1) + 0.8u(t-1) + 0.1u^2(t-1), & \text{751-1000th point}, \end{cases} $$

$$ y(t) = z(t) + e(t). $$
In the test, the input signal $u(t)$ was an independent uniformly distributed sequence with zero mean and variance 1.03, and the noise signal $e(t)$ was Gaussian white with zero mean and variance 0.01. The input and output signals are illustrated in Fig. 1. 1000 data generated by the above system were processed using the GFSL algorithm with the initial specification $n_y = n_u = n_\varepsilon = 2$, $n_\ell = 2$, $\xi_s = 0.01$, $w_l = 200$. The sub-models at the 250, 500, 750 and 1000th points are shown in Table 1. The predicted output and residuals plotted in Fig. 2 also show that the fit is quite good. Since $\xi_s$ is usually unknown a priori, it is possible to miss some regressors which make a very small contribution when $\xi_s$ is larger than the optimal cutoff; see for example the case at the 500th point, where $u^2(t-1)$ is missed. In contrast, a small $\xi_s$ may lead to the appearance of insignificant terms in a selected sub-model, e.g. $y^2(t-1)$ at the 900th point (Fig. 3(d)). But such terms usually make a very small contribution to the output and emerge randomly.
[Fig. 1. (a) Real input (simulation system). (b) Real output (simulation system).]

Table 1
Experiment 1 (simulation system, time-varying model)

Point  | Terms      | theta        | theta-hat    | ERR
250th  | u(t-1)     | 1.000000E+00 | 1.006801E+00 | 7.560013E-01
       | y(t-1)     | 5.000000E-01 | 4.894617E-01 | 2.394044E-01
       | u^2(t-1)   | None         | None         | None
500th  | u(t-1)     | 1.000001E+00 | 1.008868E+00 | 8.194872E-01
       | y(t-1)     | 4.000001E-01 | 4.153543E-01 | 1.714760E-01
       | u^2(t-1)   | 5.000000E-02 | None         | None
750th  | u(t-1)     | 9.000001E-01 | 9.061748E-01 | 8.907980E-01
       | y(t-1)     | 3.000001E-01 | 3.001955E-01 | 9.290199E-02
       | u^2(t-1)   | 7.500000E-02 | 7.536118E-02 | 1.055724E-02
1000th | u(t-1)     | 8.000001E-01 | 7.926466E-01 | 9.280992E-01
       | y(t-1)     | 2.000001E-01 | 2.029505E-01 | 4.196139E-02
       | u^2(t-1)   | 1.000000E-01 | 9.237280E-02 | 2.203432E-02

$t = 1000$, $w_l = 200$, $\xi_s = 0.01$; $\sum_{t=1}^{1000}\varepsilon^2(t) = 9.549729\mathrm{E}{+}00$; $\varepsilon^{\mathrm T}\varepsilon/y^{\mathrm T}y = 8.867051\mathrm{E}{-}03$; $\sigma_\varepsilon^2 = 9.267196\mathrm{E}{-}03$.
[Fig. 2. (a) Predicted output (simulation system). (b) Residuals (simulation system).]

[Fig. 3. (a) Parameter of $y(t-1)$ (simulation system). (b) Parameter of $u(t-1)$ (simulation system). (c) Parameter of $u^2(t-1)$ (simulation system). (d) Parameter of $y^2(t-1)$ (simulation system).]
Inspection of the residuals (Fig. 2(b)) shows that the errors always rise at the beginning of every variation in the system structure and then reduce gradually; from this the effect of the movement of the sliding window is evident. The curves of the parameters look smooth and exhibit the change in the system structure and parameters. When the contribution of $u^2(t-1)$ at the 500th point increases, this term is selected in the sub-model.
[Fig. 4. On-line validity tests (simulation system).]
The on-line validity tests with a sliding window (Appendix B), shown in Fig. 4, confirm the validity of the sub-models. The tests use the smoothed values of the means with an average factor 1.25 times the length $w_l$ and $\tau = 15$. It is observed that $\bar\rho_{u^{2\prime}\varepsilon}(t)$ rises in the interval between the 250th and 500th points due to the missing $u^2(t-1)$ term. There are several large jumps in the plots of $\bar\rho_{u^{2\prime}\varepsilon}$ and $\bar\rho_{u^{2\prime}\varepsilon^{2\prime}}$. Most of these appear when the data in the window belong to more than one system structure; in such cases the basic assumption made in deriving the tests cannot be satisfied, which causes the results of the tests to fluctuate locally. Nevertheless, most portions of the plots are within the confidence intervals.

Experiment 2: A large pilot scale liquid level system. A description of this system is given by Billings and Voon [3]. 1000 data are used in the present experiment, and the input and output are illustrated in Fig. 5. A time-varying NARMAX model was fitted using the GFSL algorithm with $n_y = n_u = n_\varepsilon = 2$, $n_\ell = 2$, $\xi_s = 0.0001$ and $w_l = 150$. The number of candidate variables is 28. The results are shown in Figs. 6 and 7 and Table 2, and indicate that the estimator can track the changes in the system. At the 1000th point $\sum_{t=1}^{1000}\varepsilon^2(t)$ is 2.468 and the sub-model involves only six terms. As a comparison, a global model is shown in Figs. 9 and 10 and Table 3; this was produced by the off-line classical Gram-Schmidt algorithm with the initial parameters $n_y = n_u = 4$, $n_\varepsilon = 0$, $n_\ell = 3$, AIC = 4.0 (Akaike's Information Criterion). The off-line estimator selects 15 terms from 165 candidate variables and results in RSS = 2.571 ($= \sum_{t=1}^{1000}\varepsilon^2(t)$). Although there is no large difference between the two values at the last point, the structure of the global model is more
[Fig. 5. (a) Input (liquid level system). (b) Output (liquid level system).]
[Fig. 6. (a) Predicted output (liquid level system). (b) Residuals (liquid level system).]
complex than that of the time-varying model. The result for the global model shows that the first two terms, $y(t-1)$ and $u(t-1)$, make large contributions to the output (totalling 98.65%). The result at the 1000th point for the time-varying model shows that the same terms make a 97.65% contribution in the last window. The curves of the estimated parameters (Fig. 7) illustrate that these two terms are always selected in the sub-models over all time. The variance of the residuals, $\sigma_\varepsilon^2$ ($2.404 \times 10^{-3}$), can be compared with the result ($2.571 \times 10^{-3}$) produced by the off-line estimator. It also approaches the result ($2.018 \times 10^{-3}$) produced by a recursive prediction error parameter estimator in which the system structure has been determined [7].

Ignoring the instability at the initial stage, the on-line validity tests shown in Fig. 8 suggest that the time-varying model is acceptable. It is also observed that the curves of $\bar\rho_{u^{2\prime}\varepsilon}$ and $\bar\rho_{u^{2\prime}\varepsilon^{2\prime}}$ have several peaks. The fixed sub-models at these points may not be reasonable due to changes in the system. Another reason could be related to the basic assumption of the tests; here the analysis is similar to Experiment 1.
[Fig. 7. (a) Parameter of $y(t-1)$ (liquid level system). (b) Parameter of $u(t-1)$ (liquid level system). (c) Parameter of the constant term (liquid level system). (d) Parameter of $u(t-2)$ (liquid level system). (e) Parameter of $y(t-2)u(t-1)$ (liquid level system). (f) Parameter of $y(t-2)$ (liquid level system). (g) Parameter of $y^2(t-2)$ (liquid level system).]

Table 2
Experiment 2 (liquid level system, time-varying model)

Point | Term          | theta-hat     | ERR
1000  | y(t-1)        | 7.187055E-01  | 9.479078E-01
      | u(t-1)        | 3.907416E-01  | 2.857187E-02
      | constant      | -4.657357E-02 | 3.893125E-03
      | u(t-2)        | -1.299308E-01 | 1.126658E-03
      | y(t-1)u(t-1)  | -2.027587E-01 | 6.353696E-04
      | y(t-2)        | 1.590202E-01  | 3.707633E-04

$t = 1000$, $w_l = 150$, $\xi_s = 0.0001$; $\varepsilon^{\mathrm T}\varepsilon/y^{\mathrm T}y = 6.026616\mathrm{E}{-}03$; $\sum_{t=1}^{1000}\varepsilon^2(t) = 2.467572\mathrm{E}{+}00$; $\sigma_\varepsilon^2 = 2.404027\mathrm{E}{-}03$.

Table 3
Experiment 2 (liquid level system, global model, CGS)

Term                 | theta-hat    | ERR          | Standard deviation
y(t-1)               | 7.085734E-01 | 9.679613E-01 | 3.556876E-02
u(t-1)               | 3.884508E-01 | 1.854159E-02 | 1.247766E-02
u(t-2)               | 8.500008E-02 | 3.367434E-03 | 1.359262E-02
y(t-1)y(t-2)y(t-3)   | 2.881762E-02 | 1.084866E-03 | 5.630327E-03
y(t-1)u(t-1)         | 3.518114E-01 | 7.035443E-04 | 3.157156E-02
y(t-1)y(t-4)u(t-2)   | 2.742458E-01 | 1.069390E-03 | 4.316852E-02
y(t-2)               | 2.593803E-01 | 1.882639E-04 | 3.379767E-02
y(t-2)u(t-2)         | 3.137017E-01 | 2.081952E-04 | 5.555111E-02
y(t-2)u(t-3)         | 1.969853E-01 | 1.245256E-04 | 5.406829E-02
y(t-1)u(t-1)u(t-2)   | 1.887223E-01 | 1.159556E-04 | 3.118323E-02
y(t-2)u(t-4)         | 1.421650E-01 | 1.026571E-04 | 2.761783E-02
y^2(t-1)             | 3.645250E-02 | 8.059901E-05 | 1.085798E-02
u^2(t-4)             | 4.763843E-02 | 4.487067E-05 | 1.332959E-02
y(t-1)y(t-4)u(t-3)   | 1.335944E-01 | 3.938991E-05 | 3.223927E-02
y^2(t-1)u(t-1)       | 1.093884E-01 | 8.891241E-05 | 2.907003E-02

$n = 1000$; Criterion: AIC = 4.0; No. of candidate terms = 165; RSS = 2.570706E+00; $\varepsilon^{\mathrm T}\varepsilon/y^{\mathrm T}y = 6.278502\mathrm{E}{-}03$; $\sigma_\varepsilon^2 = 2.571030\mathrm{E}{-}03$.

[Fig. 8. On-line validity tests (liquid level system).]

[Fig. 9. (a) Predicted output (liquid level system, global model). (b) Residuals (liquid level system, global model).]

[Fig. 10. Off-line validity tests (liquid level system, global model).]

8. Conclusions

A recursive orthogonal QR algorithm based on a rectangular sliding window, referred to as the GFSL algorithm, has been derived. The algorithm uses a unity weighting factor over a window of data and adaptively adjusts the model structure on-line. In continuous computation, a series of sub-models over a sliding window is obtained and only a small amount of memory space is required. Compared with the previous exponential windowing algorithm, the new GFSL algorithm requires data deletion at each step, and this incurs additional computation. This small additional cost is offset by the good numerical performance provided in continuous computation with both rectangular windowing and orthogonal decomposition. When the size of the sliding window is relatively large, the computational cost for updating the estimates in a window is lower than that of off-line orthogonal QR algorithms applied to the same data set in a computational interval, because the operation with high computational cost is performed only in a matrix of small dimension.

The algorithm determines the contribution that the candidate variables make to the output to decide which variables should be included in the final model before the corresponding parameter estimates are updated. Since the information in an observation window is represented concisely in the augmented matrix, the completeness of the recorded system information can ensure the correct determination of the system structure. If the system structure does not change over time, the resulting parameter estimates will converge to the true values in the corresponding rectangular window. Because the GFSL algorithm uses stable Givens rotations, its numerical performance is similar to that of the exponential windowing algorithm in most applications. But when the size of the window is relatively small and the signals change rapidly, the data deletion procedures can sometimes cause numerical problems. The GFSL algorithm is therefore more suited to slowly time-varying systems where the input-output signals are relatively smooth and larger windows can be employed.
Acknowledgements

SAB gratefully acknowledges that part of this work was supported by SERC under grant GR/H/35286.

Appendix A. GFSL algorithm

A.1. Forward-operation

To illustrate this consider the following matrix:

$$ \underline{C}_f = \begin{bmatrix} R_{wb}(t-1) & v_{wb,m}(t-1) \\ \phi(t) & y(t) \end{bmatrix}, \qquad (\mathrm{A.1}) $$

where $\underline{C}_f$ is an $(m+1) \times (m+1)$ matrix which includes the new measurements $\phi(t)$ and $y(t)$; $R_{wb}(t-1)$ and $v_{wb,m}(t-1)$ contain information about the past $w_l - 1$ data up to time $t-1$, and $w_l$ is defined as the size of the observation window (namely, the length of data to be processed at every computational step). Using the Givens rotation [12, 13] the triangularization of (A.1) can be implemented with the following procedure. For $k = 1, 2, \ldots, m$,

$$ c^*_{f,kk} = \sqrt{c_{f,kk}^2 + \bigl(c^{(k-1)}_{f,m+1,k}\bigr)^2}, \qquad (\mathrm{A.2a}) $$

$$ f_c = \frac{c_{f,kk}}{c^*_{f,kk}}, \qquad f_s = \frac{c^{(k-1)}_{f,m+1,k}}{c^*_{f,kk}}, \qquad (\mathrm{A.2b}) $$

For $p = k, k+1, \ldots, m+1$,

$$ c^*_{f,kp} = c_{f,kp}f_c + c^{(k-1)}_{f,m+1,p}f_s, \qquad (\mathrm{A.2c}) $$

$$ c^{(k)}_{f,m+1,p} = -c_{f,kp}f_s + c^{(k-1)}_{f,m+1,p}f_c, \qquad (\mathrm{A.2d}) $$

where $c^{(0)}_{f,m+1} = [\phi(t)\;\; y(t)]$ is the last row of $\underline{C}_f$. This orthogonal QR transformation is performed row by row. For example, the transformation of the $i$th and $j$th rows is

row $i$: $[\,0, \ldots, 0, c_{f,i,k}, c_{f,i,k+1}, \ldots, c_{f,i,m+1}\,] \Rightarrow [\,0, \ldots, 0, c^*_{f,i,k}, c^*_{f,i,k+1}, \ldots, c^*_{f,i,m+1}\,]$,

row $j$: $[\,0, \ldots, 0, c_{f,j,k}, c_{f,j,k+1}, \ldots, c_{f,j,m+1}\,] \Rightarrow [\,0, \ldots, 0, 0, c^*_{f,j,k+1}, \ldots, c^*_{f,j,m+1}\,]$,

where * denotes the new quantities after transformation. Finally, the matrix

$$ \underline{C}^*_f = \begin{bmatrix} R_{wf}(t) & v_{wf,m}(t) \\ O_1 & O_2 \end{bmatrix} \qquad (\mathrm{A.3}) $$

is obtained, where $R_{wf}(t)$ and $v_{wf,m}(t)$ contain the information about the $w_l$ data window up to time $t$. The right-hand side of (A.3) represents the forward-system at time $t$.

The transformation operation of the GFEX algorithm [18] is a forward operation similar to the transformation of (A.1). The above method differs only in using $R_{wb}(t-1)$ and $v_{wb,m}(t-1)$ instead of the two corresponding quantities in an exponential window (e.g. $R_m(t-1)$ and $v_m(t-1)$), and in using a unity weighting factor in an observation window. As mentioned above, if the forgetting factor $\lambda$ is unity (a rectangular window), a forward operation will cause overflow in continuous computation (the size of a rectangular window becomes infinite). This follows from Eq. (A.2a). For example, in every computational step the first diagonal element of the new triangular matrix $R(t)$ is always computed by squaring the first element of the new measurement $[\phi(t)\;\; y(t)]$ and adding this to the square of the first diagonal element of the previously formed $R(t-1)$. When $\lambda = 1$, the quantities in the triangular matrix do not decay at any computational step, so the first diagonal element $r_{11}$ will tend to infinity. By using a finite rectangular window, this problem can be overcome by removing the oldest data from the matrix $[R_{wf}(t-1)\;\; v_{wf,m}(t-1)]$ which was formed in the forward-operation at time $t-1$ using $w_l$ data up to time $t-1$. Since the complete regression matrix cannot be stored, but only $w_l$ records of the latest inputs, outputs and residuals, methods to backdate the window must be developed.
A.2. Backward-operation

To delete data from the augmented matrix, Chambers [6] used the reverse of the operation which adds a row of data, and Golub [14] gave a method in which the row of data to be deleted is multiplied by the factor $\mathrm{i}$ ($= \sqrt{-1}$) and then orthogonally transformed to form a backdated matrix. Inspection of (A.2) shows that the procedure for deleting a row of data can be considered as the reverse implementation of adding a row of data. Suppose $[\phi(t)\;\; y(t)]$ is required to be eliminated from the resulting matrix $\underline{C}^*_f$; the problem is to find $c_{f,kp}$ from $c^*_{f,kp}$ and $c^{(k-1)}_{f,m+1,p}$, $p = k, \ldots, m+1$. First $[\phi(t)\;\; y(t)]$ is put into the $(m+1)$th row of $\underline{C}^*_f$. Rearranging (A.2) gives the following operations. For $k = 1, \ldots, m$,

$$ c_{f,kk} = \sqrt{\bigl(c^*_{f,kk}\bigr)^2 - \bigl(c^{(k-1)}_{f,m+1,k}\bigr)^2}, \qquad (\mathrm{A.4a}) $$

$$ f_s = \frac{c^{(k-1)}_{f,m+1,k}}{c^*_{f,kk}}, \qquad f_c = \frac{c_{f,kk}}{c^*_{f,kk}}, \qquad (\mathrm{A.4b}) $$

For $p = k, k+1, \ldots, m+1$,

$$ c_{f,kp} = \frac{c^*_{f,kp} - c^{(k-1)}_{f,m+1,p}f_s}{f_c}, \qquad (\mathrm{A.4c}) $$

$$ c^{(k)}_{f,m+1,p} = -c_{f,kp}f_s + c^{(k-1)}_{f,m+1,p}f_c. \qquad (\mathrm{A.4d}) $$

In sliding window algorithms, the oldest measurement $[\phi(t-w_l+1)\;\; y(t-w_l+1)]$, which needs to be deleted, is put in the $(m+1)$th row of $\underline{C}^*_f$, and then using (A.4) the matrix

$$ \underline{C}^*_b = \begin{bmatrix} R_{wb}(t) & v_{wb,m}(t) \\ O_1 & O_2 \end{bmatrix} \qquad (\mathrm{A.5}) $$

can be obtained, where $R_{wb}(t)$ and $v_{wb,m}(t)$ correspond to the $w_l - 1$ data up to time $t$ (matrices or variables with the subscript b are associated with deleting the oldest measurement). The right-hand side of (A.5) is the backward-system at $t$. This method is simple, but the operations of adding and deleting are not uniform, and it is therefore not easy to implement. Another deleting procedure will therefore be derived as follows.

After deleting $[\phi(t-w_l+1)\;\; y(t-w_l+1)]$, the sum of squares of the outputs can be expressed as

$$ y^{\mathrm T}_{wb}(t)y_{wb}(t) = y^{\mathrm T}_{wf}(t)y_{wf}(t) - y^2(t-w_l+1). \qquad (\mathrm{A.6}) $$

Writing the above in matrix form yields

$$ \begin{bmatrix} y_{wf}(t) \\ a\,y(t-w_l+1) \end{bmatrix} = \begin{bmatrix} \Phi_{wf}(t) \\ a\,\phi(t-w_l+1) \end{bmatrix}\Theta(t) + e(t), $$

where $a$ ($a^2 = 1$) denotes an operator which is not used in the computational subroutines (A.9) but serves as a marker. Orthogonally transforming $y_{wf}(t)$ and $\Phi_{wf}(t)$ yields

$$ Q^{\mathrm T}_b(t)\begin{bmatrix} y_{wf}(t) \\ a\,y(t-w_l+1) \end{bmatrix} = \begin{bmatrix} v_{wb,m}(t) \\ v_{wb,w_l-m}(t) \\ a\,\hat\varepsilon_b(t) \end{bmatrix} = \begin{bmatrix} R_{wb}(t) \\ O_{w_l-m} \\ O_1 \end{bmatrix}\hat\Theta(t) + Q^{\mathrm T}_b(t)e_{wb}(t) + \begin{bmatrix} O_{e1} \\ O_{e2} \\ a\,\hat\varepsilon_b(t) \end{bmatrix}, \qquad (\mathrm{A.7}) $$

where $Q^{\mathrm T}_b(t)$ is the orthogonal transformation matrix, $O_{w_l-m}$ is a $(w_l-m) \times m$ zero matrix, $v_{wb,m}(t)$ and $v_{wb,w_l-m}(t)$ are of dimension $m \times 1$ and $(w_l-m) \times 1$, $e_{wb}(t)$ is the residual vector, $O_{e1}$ and $O_{e2}$ are zero vectors of dimension $m \times 1$ and $(w_l-m) \times 1$, and $\hat\varepsilon_b(t)$ is a prediction error defined by

$$ \hat\varepsilon_b(t) = y(t-w_l+1) - \phi(t-w_l+1)\hat\Theta(t). $$

The deleting procedure can now be performed on a concise form of (A.7),

$$ \underline{C}_{b_n} = \begin{bmatrix} R_{wf}(t) & v_{wf,m}(t) \\ a\,\phi(t-w_l+1) & a\,y(t-w_l+1) \end{bmatrix}. \qquad (\mathrm{A.8}) $$
Thus the arithmetic for retriangularizing $\underline{C}_{b_n}$ is to choose $b_c$ and $b_s$ in (A.9) so that the elements of the $(m+1)$th row become zero. Such a computation, which transforms $\underline{C}_{b_n}$ into the backward-system $\underline{C}^*_b$ defined in (A.5), can be performed as follows. For $k = 1, 2, \ldots, m$,

$$ c^*_{b_n,kk} = \sqrt{c_{b_n,kk}^2 - \bigl(c^{(k-1)}_{b_n,m+1,k}\bigr)^2} \quad \Bigl(\text{because } \bigl(c^*_{b_n,kk}\bigr)^2 = c_{b_n,kk}^2 - \bigl(a\,c^{(k-1)}_{b_n,m+1,k}\bigr)^2,\; a^2 = 1\Bigr), \qquad (\mathrm{A.9a}) $$

$$ b_c = \frac{c^*_{b_n,kk}}{c_{b_n,kk}}, \qquad (\mathrm{A.9b}) $$

$$ b_s = \frac{a\,c^{(k-1)}_{b_n,m+1,k}}{c_{b_n,kk}}, \qquad (\mathrm{A.9c}) $$

For $p = k, k+1, \ldots, m+1$,

$$ c^*_{b_n,kp} = \frac{c_{b_n,kp} - a\,c^{(k-1)}_{b_n,m+1,p}\,b_s}{b_c}, \qquad (\mathrm{A.9d}) $$

$$ a\,c^{(k)}_{b_n,m+1,p} = \frac{a\,c^{(k-1)}_{b_n,m+1,p} - c_{b_n,kp}\,b_s}{b_c}. \qquad (\mathrm{A.9e}) $$

Finally $\underline{C}^*_b = \underline{C}^{(m)}_{b_n}$.

A.3. A uniform formula for the two operations

Combining (A.2) with (A.9) provides a uniform formula for adding and deleting a row of data. Define the matrix

$$ \underline{C} = \begin{cases} \underline{C}_f & \text{for adding a row}, \\ \underline{C}_{b_n} & \text{for deleting a row}, \end{cases} \qquad (\mathrm{A.10a}) $$

and the arithmetic coefficient

$$ w_c = \begin{cases} 1 & \text{for adding a row}, \\ -1 & \text{for deleting a row}. \end{cases} $$

Then for $k = 1, \ldots, m$,

$$ c^*_{kk} = \sqrt{c_{kk}^2 + w_c\bigl(c^{(k-1)}_{m+1,k}\bigr)^2}, \qquad (\mathrm{A.10b}) $$

$$ d_c = \frac{c_{kk}}{c^*_{kk}}, \qquad (\mathrm{A.10c}) $$

$$ d_s = \frac{c^{(k-1)}_{m+1,k}}{c^*_{kk}}, \qquad (\mathrm{A.10d}) $$

and for $p = k, \ldots, m+1$,

$$ c^*_{kp} = c_{kp}\,d_c + w_c\,c^{(k-1)}_{m+1,p}\,d_s, \qquad (\mathrm{A.10e}) $$

$$ c^{(k)}_{m+1,p} = -c_{kp}\,d_s + c^{(k-1)}_{m+1,p}\,d_c. \qquad (\mathrm{A.10f}) $$

Since the adding and deleting operations using (A.10) are simple and uniform, this method is selected as the computational procedure for the new GFSL algorithm.
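The uniform formula (A.10) translates directly into a small kernel. The sketch below is a minimal numpy rendering of the reconstruction above: the function name and matrix layout are assumptions, and for $w_c = -1$ it inherits the conditioning caveats discussed in Section 6.

```python
import numpy as np

def givens_update(C, phi, y_val, wc):
    """Rotate the row [phi y] into (wc=+1) or out of (wc=-1) the
    augmented matrix C = [R v_m; O_1 O_2] using Eq. (A.10).

    C is (m+1) x (m+1); the data row is placed in the (m+1)th row and
    annihilated column by column, updating R and v_m."""
    m = C.shape[0] - 1
    C = C.copy()
    C[m, :m] = phi
    C[m, m] = y_val
    for k in range(m):
        ckk, dk = C[k, k], C[m, k]
        ckk_star = np.sqrt(ckk * ckk + wc * dk * dk)   # (A.10b)
        dc = ckk / ckk_star                            # (A.10c)
        ds = dk / ckk_star                             # (A.10d)
        for p in range(k, m + 1):
            ckp, dp = C[k, p], C[m, p]
            C[k, p] = ckp * dc + wc * dp * ds          # (A.10e)
            C[m, p] = -ckp * ds + dp * dc              # (A.10f)
    return C
```

A quick consistency check is that deleting a row immediately after adding it restores the original matrix up to rounding, which is also a cheap way to monitor the onset of the instabilities mentioned in Section 6.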
Appendix B. On-line model validity tests with sliding windows
Because the weighting factors are unity in the sliding window estimation, the choice of on-line model validity tests depends on the size of the window $w_l$, the amount of memory space and the limit on computational time in every interval. Assume that the maximum number of lags to be used is $\tau_{\max}$. When the latest $w_l + \tau_{\max}$ records of inputs and residuals are available and the computational time available is sufficient, the off-line validity tests [1, 2] can be applied. If $w_l$ is relatively large and the memory requirements and computational time for the off-line tests are excessive, the on-line validity tests for exponential windowing algorithms [18] may be considered. In such an implementation, although the weighting factor $\lambda$ is unity in an observation window, the calculation of the forgetting factor $\eta$ of the validity tests cannot use $\eta = 1/(2-\lambda)$, because substituting $\lambda = 1$ into this equation yields $\eta = 1$ and hence the asymptotic memory length $N_\infty$ ($N_\infty = \eta/(1-\eta)$), which is equal to the window length, becomes infinite. This is clearly not correct for the sliding window case because the number of data sets in a sliding window is always finite. Therefore, if the size of the window $w_l$ is designed as the asymptotic memory length $N_\infty$, the forgetting factor $\eta$ can be determined using $\eta = w_l/(w_l+1)$. For example, when $w_l = 100$, $\eta \approx 0.99$.

If the computational time is limited but the memory space is sufficient to store the latest $w_l + \tau_{\max}$ records of the input and residual, a set of on-line model validity tests with sliding windowing can be developed. This represents a compromise between completeness of the validity tests and complexity of the computation. The basic assumptions are that over a wide interval the means of the samples are close to zero and the means of the squares of the samples do not vary greatly. Such conditions often hold when a large window is applied to systems which vary slowly and smoothly. Similarly to the on-line validity tests for exponential windowing methods, a new set of functions is defined by taking the average of the off-line correlation functions [1, 2], associated with finite lags of the $w_l$ data samples up to time $t$:
$$ \bar\rho_{ab}(\tau, w_l, t) = \frac{1}{\tau+1}\sum_{k=0}^{\tau}\bar\gamma_{ab}(k, w_l, t), \qquad (\mathrm{B.1}) $$

where $\tau$ is the number of lags and

$$ \bar\gamma_{ab}(k, w_l, t) = \sum_{i=t-w_l+1}^{t} a(i-k)\,b(i). $$

When the observation window slides over the data, the newest measurements can be successively used in the tests and the oldest samples can be deleted from the previously formed windows. From the results of these tests it is possible to observe whether the fitted model is adequate in on-line identification. The normalized functions of the five validity tests are given by (the subscript s denotes normalization)

$$ \bar\rho_{\varepsilon\varepsilon s}(\tau, w_l, t) = \frac{\bar\rho_{\varepsilon\varepsilon}(\tau, w_l, t)}{\bar\gamma_{\varepsilon\varepsilon}(0, w_l, t)}, \qquad \bar\rho_{u\varepsilon s}(\tau, w_l, t) = \frac{\bar\rho_{u\varepsilon}(\tau, w_l, t)}{\sqrt{\bar\gamma_{uu}(0, w_l, t)\,\bar\gamma_{\varepsilon\varepsilon}(0, w_l, t)}}, $$

$$ \bar\rho_{\varepsilon(\varepsilon u)s}(\tau, w_l, t) = \frac{\bar\rho_{\varepsilon(\varepsilon u)}(\tau, w_l, t)}{\sqrt{\bar\gamma_{\varepsilon\varepsilon}(0, w_l, t)\,\bar\gamma_{(\varepsilon u)(\varepsilon u)}(0, w_l, t)}}, \qquad (\mathrm{B.2}) $$

$$ \bar\rho_{u^{2\prime}\varepsilon s}(\tau, w_l, t) = \frac{\bar\rho_{u^{2\prime}\varepsilon}(\tau, w_l, t)}{\sqrt{\bar\gamma_{u^{2\prime}u^{2\prime}}(0, w_l, t)\,\bar\gamma_{\varepsilon\varepsilon}(0, w_l, t)}}, \qquad \bar\rho_{u^{2\prime}\varepsilon^{2\prime}s}(\tau, w_l, t) = \frac{\bar\rho_{u^{2\prime}\varepsilon^{2\prime}}(\tau, w_l, t)}{\sqrt{\bar\gamma_{u^{2\prime}u^{2\prime}}(0, w_l, t)\,\bar\gamma_{\varepsilon^{2\prime}\varepsilon^{2\prime}}(0, w_l, t)}}. \qquad (\mathrm{B.3}) $$

The numerators are updated recursively as the window slides, the oldest product being removed and the newest added:

$$ \bar\rho_{\varepsilon\varepsilon}(\tau, w_l, t) = \bar\rho_{\varepsilon\varepsilon}(\tau, w_l, t-1) - \bar\varepsilon_\tau(t-w_l)\varepsilon(t-w_l) + \bar\varepsilon_\tau(t)\varepsilon(t), $$

$$ \bar\rho_{u\varepsilon}(\tau, w_l, t) = \bar\rho_{u\varepsilon}(\tau, w_l, t-1) - \bar u_\tau(t-w_l)\varepsilon(t-w_l) + \bar u_\tau(t)\varepsilon(t), $$

$$ \bar\rho_{\varepsilon(\varepsilon u)}(\tau, w_l, t) = \bar\rho_{\varepsilon(\varepsilon u)}(\tau, w_l, t-1) - D_\tau(t-w_l)\varepsilon(t-w_l) + D_\tau(t)\varepsilon(t), \qquad (\mathrm{B.4}) $$

$$ \bar\rho_{u^{2\prime}\varepsilon}(\tau, w_l, t) = \bar\rho_{u^{2\prime}\varepsilon}(\tau, w_l, t-1) - C_\tau(t-w_l)\varepsilon(t-w_l) + C_\tau(t)\varepsilon(t), $$

$$ \bar\rho_{u^{2\prime}\varepsilon^{2\prime}}(\tau, w_l, t) = \bar\rho_{u^{2\prime}\varepsilon^{2\prime}}(\tau, w_l, t-1) - C_\tau(t-w_l)\bigl(\varepsilon^2(t-w_l) - \bar\varepsilon^2_w(t-w_l)\bigr) + C_\tau(t)\bigl(\varepsilon^2(t) - \bar\varepsilon^2_w(t)\bigr). \qquad (\mathrm{B.5}) $$

The normalizing quantities are updated in the same way, for example

$$ \bar\gamma_{\varepsilon\varepsilon}(0, w_l, t) = \Bigl(\sum_{i=t-w_l}^{t-1}\varepsilon^2(i)\Bigr) - \varepsilon^2(t-w_l) + \varepsilon^2(t), \qquad \bar\gamma_{uu}(0, w_l, t) = \Bigl(\sum_{i=t-w_l}^{t-1}u^2(i)\Bigr) - u^2(t-w_l) + u^2(t), $$

$$ \bar\gamma_{u^{2\prime}u^{2\prime}}(0, w_l, t) = \Bigl(\sum_{i=t-w_l}^{t-1}\bigl(u^2(i) - \bar u^2_w(i)\bigr)^2\Bigr) - \bigl(u^2(t-w_l) - \bar u^2_w(t-w_l)\bigr)^2 + \bigl(u^2(t) - \bar u^2_w(t)\bigr)^2, $$

$$ \bar\gamma_{\varepsilon^{2\prime}\varepsilon^{2\prime}}(0, w_l, t) = \Bigl(\sum_{i=t-w_l}^{t-1}\bigl(\varepsilon^2(i) - \bar\varepsilon^2_w(i)\bigr)^2\Bigr) - \bigl(\varepsilon^2(t-w_l) - \bar\varepsilon^2_w(t-w_l)\bigr)^2 + \bigl(\varepsilon^2(t) - \bar\varepsilon^2_w(t)\bigr)^2, $$

where

$$ \bar u_\tau(i) = \frac{1}{\tau+1}\sum_{k=0}^{\tau} u(i-k), \qquad \bar\varepsilon_\tau(i) = \frac{1}{\tau+1}\sum_{k=0}^{\tau} \varepsilon(i-k), $$

$$ D_\tau(i) = \frac{1}{\tau+1}\sum_{k=0}^{\tau} \varepsilon(i-k-1)\,u(i-k-1), \qquad C_\tau(i) = \frac{1}{\tau+1}\sum_{k=0}^{\tau} \bigl(u^2(i-k) - \bar u^2_w(i)\bigr), $$

$$ \bar u^2_w(i) = \frac{1}{w_l}\sum_{t_1=i-w_l+1}^{i} u^2(t_1), \qquad \bar\varepsilon^2_w(i) = \frac{1}{w_l}\sum_{t_1=i-w_l+1}^{i} \varepsilon^2(t_1). $$

The overbar denotes the time average and the prime denotes that the values have had the mean removed. The nine instruments with the subscript $\tau$ in Eqs. (B.1)-(B.5) provide a measure of the model validity for the recursive computation with a sliding window. The five correlation functions involve nine recursive iterations and twelve calculations for means ($\bar\varepsilon_\tau(i)$, $\bar u_\tau(i)$, $D_\tau(i)$, $C_\tau(i)$, $\bar u^2_w(i)$ and $\bar\varepsilon^2_w(i)$, $i = t-w_l, t$). These average values can also be computed recursively, so the computational cost is reduced. For example, when $w_l$ is relatively large, $\bar\varepsilon^2_w(t)$ can be obtained using

$$ \bar\varepsilon^2_w(t) = \bar\varepsilon^2_w(t-1) - \frac{\varepsilon^2(t-w_l)}{w_l} + \frac{\varepsilon^2(t)}{w_l}. $$

For cases where $w_l$ is relatively small, the means should be replaced by the smoothed values calculated over a length of 2-4 times $w_l$. Typically $\tau$ is 5-20. Therefore, the implementation of these tests not only has low computational cost but also requires little memory space. Based on the results of the off-line model validity tests [1, 2], the values of the functions $\bar\rho_{\varepsilon\varepsilon s}(\tau, w_l, t)$, $\bar\rho_{u\varepsilon s}(\tau, w_l, t)$, $\bar\rho_{\varepsilon(\varepsilon u)s}(\tau, w_l, t)$, $\bar\rho_{u^{2\prime}\varepsilon s}(\tau, w_l, t)$ and $\bar\rho_{u^{2\prime}\varepsilon^{2\prime}s}(\tau, w_l, t)$ all lie within the range $-1$ to $1$. If the fitted model is adequate at time $t$, $\bar\rho_{\varepsilon\varepsilon s}(\tau, w_l, t)$ should be near zero and the other functions should be zero. If the size of the window is large, the standard deviation of the correlation is $1/\sqrt{w_l}$ and the 95% confidence intervals are approximately $\pm 1.96/\sqrt{w_l}$.
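The sliding update pattern shared by Eqs. (B.4)-(B.5) and the normalizing quantities is a one-subtraction, one-addition correction per step. A minimal sketch, assuming arrays indexed by absolute time; the function name is illustrative:

```python
def slide_correlation(gamma_prev, a, b, k, t, wl):
    """gamma_ab(k, wl, t) from gamma_ab(k, wl, t-1): drop the oldest
    product and add the newest, as in the recursions of Appendix B."""
    return (gamma_prev
            - a[t - wl - k] * b[t - wl]    # leaves the window
            + a[t - k] * b[t])             # enters the window
```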
References

[1] S.A. Billings and W.S.F. Voon, "Structure detection and model validity tests in the identification of nonlinear systems", IEE Proc., Pt. D, Vol. 130, 1983, pp. 193-199.
[2] S.A. Billings and W.S.F. Voon, "Correlation based model validity tests for non-linear models", Internat. J. Systems Sci., Vol. 15, 1986, pp. 601-615.
[3] S.A. Billings and W.S.F. Voon, "A prediction error and stepwise regression estimation algorithm for nonlinear systems", Internat. J. Control, Vol. 44, 1986, pp. 803-822.
[4] S.A. Billings and W.S.F. Voon, "Piecewise linear identification of nonlinear systems", Internat. J. Control, Vol. 46, No. 1, 1987, pp. 215-235.
[5] S.A. Billings and S. Chen, "Extended model set, global data and threshold model identification of severely nonlinear systems", Internat. J. Control, Vol. 50, No. 5, 1989, pp. 1897-1923.
[6] J.M. Chambers, "Regression updating", J. Amer. Statist. Assoc., Vol. 66, No. 336, 1971, pp. 744-748.
[7] S. Chen and S.A. Billings, "A recursive prediction error parameter estimator for nonlinear models", Internat. J. Control, Vol. 49, 1989, pp. 569-594.
[8] S. Chen and S.A. Billings, "Representations of nonlinear systems: The NARMAX model", Internat. J. Control, Vol. 49, 1989, pp. 1013-1032.
[9] S. Chen, S.A. Billings and W. Luo, "Orthogonal least squares methods and their application to non-linear system identification", Internat. J. Control, Vol. 50, 1989, pp. 1873-1896.
[10] D. Clarke and P.J. Gawthrop, "Self-tuning controller", Proc. Inst. Elec. Eng., Vol. 122, 1975, pp. 929-934.
[11] G. Cook and Beale, "Application of least square parameter identification with fixed length data window...", IEEE Trans. Industrial Electronics, Vol. 30, No. 4, 1983, pp. 334-339.
[12] W.M. Gentleman, "Least squares computation by Givens transformation without square roots", J. Inst. Math. Appl., Vol. 12, 1973, pp. 329-336.
[13] W.M. Gentleman, "Regression problems and the QR decomposition", Bull. Inst. Math. Appl., Vol. 10, 1974, pp. 195-197.
[14] G.H. Golub, "Matrix decompositions and statistical calculations", in: R.C. Milton and J.A. Nelder, eds., Statistical Computation, Academic Press, New York, 1969, pp. 365-395.
[15] G.H. Golub and G.P.H. Styan, "Numerical computations for univariate linear models", J. Statist. Comput. Simul., Vol. 2, 1973, pp. 253-274.
[16] I.J. Leontaritis and S.A. Billings, "Input-output parametric models for nonlinear systems, Part I: Deterministic nonlinear systems; Part II: Stochastic nonlinear systems", Internat. J. Control, Vol. 41, 1985, pp. 303-344.
[17] Y.P. Liu, M.J. Korenberg, S.A. Billings and M.B. Fedzil, "The nonlinear identification of a heat exchanger", 26th IEEE Conf. on Decision and Control, Los Angeles, USA, 9-11 December 1987.
[18] W. Luo, S.A. Billings and K.M. Tsang, "On-line structure detection and parameter estimation with exponential windowing for nonlinear systems", Research Report No. 503, Department of Automatic Control and Systems Engineering, University of Sheffield, UK.
[19] P.C. Young, Recursive Estimation and Time-series Analysis, Springer, Berlin, 1984, pp. 57-61.