Proceedings of the 15th IFAC Symposium on System Identification, Saint-Malo, France, July 6-8, 2009
Time Variable Parameter Estimation

Peter C. Young ∗
∗ Systems and Control Group, Lancaster Environment Centre, Lancaster University, U.K.; Fenner School of Environment and Society, Australian National University, Canberra; and School of Electrical Engineering and Telecommunications, University of New South Wales, Sydney, Australia; e-mail: p.young@lancaster.ac.uk

Abstract: The paper outlines the development of time variable parameter (TVP) estimation as an approach to modelling time varying dynamic systems. It then describes one of the latest methods for estimating time variable parameters in transfer function models and shows how it overcomes problems associated with earlier methods based on the least squares estimation of time variable parameters in the more restricted auto-regressive, exogenous variables (ARX) model. This ‘dynamic transfer function’ (DTF) estimation methodology is a combination of recursive-iterative instrumental variable filtering and fixed interval smoothing algorithms.

Keywords: time-variable, non-stationary, recursive estimation, filtering, smoothing.

1. INTRODUCTION

Until comparatively recently, the main emphasis in research and development on Time Variable Parameter (TVP) estimation has been concerned with ‘on-line’ or ‘real-time’ methods: see e.g. [Chen and Guo, 1991, Wellstead and Zarrop, 1991, Bittanti and Campi, 1994]. As a result, most algorithms have been of the ‘filtering’ type, where the estimate $\hat{\mathbf{p}}_{k|k}$ of the TVP vector $\mathbf{p}_k$, at any sampling instant $k$, is a function of all the data up to and including this $k$th instant. Surprisingly, given its ultimate power and utility, the extension of these on-line methods to the ‘off-line’ analysis situation was not considered very much at first, despite the fact that a mechanism for such ‘smoothing’ estimation, in the form of various Fixed Interval Smoothing (FIS) algorithms and associated publications on this subject [Bryson and Ho, 1969, Norton, 1975, Jakeman and Young, 1981], has been available for many years. In these FIS estimation algorithms, the smoothed estimate $\hat{\mathbf{p}}_{k|N}$ of $\mathbf{p}_k$ is based on all of the data available over a ‘fixed interval’ of $N$ samples, usually the full sample length of the time series data. Note that, although these estimates are smoother and have a lower variance than the filtered estimates, their tracking ability is not impaired: indeed, because they are ‘lag-free’, they track the true changes much better (as one would expect with ‘off-line’ estimates).

The present paper discusses both on-line and off-line estimation of standard TVP (‘dynamic’) linear regression models, as well as the Dynamic Transfer Function (DTF) model. In these estimation algorithms, the TVPs are characterized stochastically as a class of Generalized Random Walk (GRW) models, the hyper-parameters of which are automatically optimized by maximum likelihood, so removing the need for ‘tuning’ them or their equivalents (such as forgetting factors in more traditional TVP estimation algorithms). The main purpose
of this paper is to demonstrate the advantages of the novel DTF model estimation method, as well as providing a tutorial introduction to the various TVP algorithms and their use. The algorithms outlined in the paper are available in the CAPTAIN Toolbox for Matlab [Taylor et al., 2007], which can be downloaded at http://www.es.lancs.ac.uk/cres/captain/.

2. THE TVP TRANSFER FUNCTION MODEL

Although the extension to multi-input systems is straightforward, it is simpler to consider a single input, single output system. In the case of a DTF representation, the model takes the following form:
$$y_k = \frac{B(z^{-1},k)}{A(z^{-1},k)}\, u_{k-\delta} + \xi_k, \qquad k = 1, \dots, N \qquad (1)$$
where $u_k$ and $y_k$ are, respectively, the input and output variables measured at the $k$th sampling instant; $z^{-1}$ is the backward shift operator, i.e., $z^{-r} y_k = y_{k-r}$; and $A(z^{-1},k)$, $B(z^{-1},k)$ are time variable coefficient polynomials in $z^{-1}$ of the following form:
$$\begin{aligned} A(z^{-1},k) &= 1 + a_{1,k} z^{-1} + a_{2,k} z^{-2} + \cdots + a_{n,k} z^{-n} \\ B(z^{-1},k) &= b_{0,k} + b_{1,k} z^{-1} + \cdots + b_{m,k} z^{-m}. \end{aligned} \qquad (2)$$
The term $\delta$ is a pure time delay, measured in sampling intervals, which is introduced to allow for any temporal delay that may occur between the incidence of a change in $u_k$ and its first effect on $y_k$. Finally, $\xi_k$ represents uncertainty in the relationship arising from a combination of measurement noise, the effects of other unmeasured inputs and modelling error. Normally, $\xi_k$ is assumed to be independent of the assumed noise-free input $u_k$ and is modelled as an AutoRegressive (AR) or AutoRegressive Moving Average (ARMA) stochastic process, although this restriction can be avoided by the use of instrumental variable methods, as discussed below.
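To make the model class concrete, the following Python sketch simulates a first order instance of (1)-(2) with time variable parameters. The function name, the particular parameter trajectories and the white-noise choice for $\xi_k$ are illustrative assumptions, not part of the paper (which allows coloured noise):

```python
import numpy as np

def simulate_dtf(u, a1, b0, delta, noise_std=0.0, rng=None):
    """Simulate y_k = [b0_k / (1 + a1_k z^-1)] u_{k-delta} + xi_k for a
    first order DTF model with TVP arrays a1_k and b0_k; xi_k is taken
    as white noise here purely for simplicity."""
    if rng is None:
        rng = np.random.default_rng(0)
    N = len(u)
    x = np.zeros(N)                          # noise-free model output
    for k in range(N):
        u_del = u[k - delta] if k >= delta else 0.0
        x_prev = x[k - 1] if k >= 1 else 0.0
        x[k] = -a1[k] * x_prev + b0[k] * u_del
    return x + noise_std * rng.standard_normal(N)

# Example: step change in the numerator TVP, constant denominator TVP
N = 500
u = np.sign(np.sin(2 * np.pi * np.arange(N) / 50))   # square-wave input
a1 = np.full(N, -0.8)                                # constant a1_k
b0 = np.where(np.arange(N) < 250, 1.0, 2.0)          # b0_k: 1 -> 2 at k = 250
y = simulate_dtf(u, a1, b0, delta=1, noise_std=0.2)
```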
In the more restricted case of the Dynamic AutoRegressive, eXogenous variables (DARX) model, $\xi_k$ is defined as
$$\xi_k = \frac{1}{A(z^{-1},k)}\, e_k$$
where $e_k$ is assumed to be zero mean white noise. This has the advantage that equation (1) can be written in the following alternative vector equation or ‘linear regression’ form:
$$y_k = \mathbf{z}_k^T \mathbf{p}_k + e_k \qquad (3)$$
where now,
$$\begin{aligned} \mathbf{z}_k^T &= [-y_{k-1}\;\; -y_{k-2}\;\; \dots\;\; -y_{k-n}\;\;\; u_{k-\delta}\;\; \dots\;\; u_{k-\delta-m}] \\ \mathbf{p}_k &= [a_{1,k}\;\; a_{2,k}\;\; \dots\;\; a_{n,k}\;\;\; b_{0,k}\;\; b_{1,k}\;\; \dots\;\; b_{m,k}]^T = [p_{1,k}\;\; p_{2,k}\;\; \dots\;\; p_{n+m+1,k}]^T \end{aligned} \qquad (4)$$
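In code, the regression vector $\mathbf{z}_k$ of (4) is just an array of lagged outputs and delayed inputs; a minimal sketch, with the function name and the zero-padding convention for samples before the start of the record as illustrative assumptions:

```python
import numpy as np

def darx_regressor(y, u, k, n, m, delta):
    """Assemble z_k = [-y_{k-1} ... -y_{k-n}, u_{k-delta} ... u_{k-delta-m}]^T
    for the DARX regression (3)-(4); samples before the start of the
    record are taken as zero (an illustrative convention)."""
    lagged_y = [-y[k - i] if k - i >= 0 else 0.0 for i in range(1, n + 1)]
    lagged_u = [u[k - delta - j] if k - delta - j >= 0 else 0.0
                for j in range(m + 1)]
    return np.array(lagged_y + lagged_u)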
3. MODELLING THE PARAMETER VARIATIONS: THE DYNAMIC TRANSFER FUNCTION (DTF) MODEL

It is the contention of the present paper that TVP estimation is best considered as a unified operation that involves both recursive filtering and smoothing, based on modelling the parameter variations in a stochastic, state space manner. Here, the time series data are processed sequentially, first by the ‘forward-pass’ filtering algorithm that provides the on-line parametric estimates $\hat{\mathbf{p}}_{k|k}$, $k = 1, 2, \dots, N$, as well as any predictions (forecasts) $\hat{\mathbf{p}}_{k+f|k}$ into the future, where $f$ is the forecasting horizon. Following this, the ‘backward-pass’ FIS algorithm updates these filtered estimates to yield the smoothed estimates $\hat{\mathbf{p}}_{k|N}$, $k = N, N-1, \dots, 1$, as well as any interpolates over gaps in the data, or backcasts $\hat{\mathbf{p}}_{k-b|k}$, where $b$ is the backcasting horizon (normally $k = 1$ at the beginning of the data set, to yield a backcast into the past). Although TVP estimation of this type can be applied most easily to ARX models, to yield the DARX algorithm, it will be described here in the more difficult context of the general DTF model (1). We will then see that this DTF algorithm can be simplified to handle the DARX and the various linear regression models.

Reflecting the statistical setting of the analysis and referring to previous research on this topic, it is desirable that the temporal variability of the model parameter vector $\mathbf{p}_k$ is characterized in some stochastic manner. Normally, when little is known about the nature of the time variability, this model needs to be both simple and flexible. One of the simplest and most generally useful classes of stochastic, state space models involves the assumption that the $i$th parameter, $p_{i,k}$, $i = 1, 2, \dots, n+m+1$, in $\mathbf{p}_k$ is defined by a two dimensional stochastic state vector $\mathbf{x}_{i,k} = [l_{i,k}\;\; d_{i,k}]^T$, where $l_{i,k}$ and $d_{i,k}$ are, respectively, the changing level and slope of the associated TVP. This selection of a two dimensional state representation of the TVPs is based on practical experience over a number of years. Initial research tended to use a simple scalar random walk model for the parameter variations, but subsequent research showed the value of modelling not only the level changes in the TVPs but also their rates of change. The stochastic evolution of each $\mathbf{x}_{i,k}$ (and, therefore, of each of the $n+m+1$ parameters in $\mathbf{p}_k$) is assumed to be described by one of the GRW family of random walks [Jakeman and Young, 1981], defined in the following state space terms:
$$\mathbf{x}_{i,k} = \mathbf{F}_i \mathbf{x}_{i,k-1} + \mathbf{G}_i \boldsymbol{\eta}_{i,k}, \qquad i = 1, 2, \dots, n+m+1 \qquad (5a)$$
where
$$\mathbf{F}_i = \begin{bmatrix} \alpha_i & \beta_i \\ 0 & \gamma_i \end{bmatrix}; \qquad \mathbf{G}_i = \begin{bmatrix} \delta_i & 0 \\ 0 & \epsilon_i \end{bmatrix}$$
and $\boldsymbol{\eta}_{i,k} = [\eta_{1i,k}\;\; \eta_{2i,k}]^T$ is a $2 \times 1$, zero mean, white noise vector that allows for stochastic variability in the parameters and is assumed to be characterized by a (normally diagonal) covariance matrix $\mathbf{Q}_{\eta_i}$. Of course, equation (5a) is a generic model formulated in this manner only to unify various random walk-type models: it is never used in its entirety since it is clearly over-parameterized. Omitting the $i$ subscript for convenience, this general model comprises as special cases: the integrated random walk (IRW: $\alpha = \beta = \gamma = \epsilon = 1$; $\delta = 0$); the scalar random walk (RW: scalar but equivalent to (5a) if $\beta = \gamma = \epsilon = 0$; $\alpha = \delta = 1$), as well as the first order autoregressive, AR(1), model ($\beta = \gamma = \epsilon = 0$; $0 < \alpha < 1$; $\delta = 1$), both relating only to the first equation in (5a) (see below); the intermediate case of the smoothed random walk (SRW: $0 < \alpha < 1$; $\beta = \gamma = \epsilon = 1$; $\delta = 0$); and, finally, both the local linear trend (LLT: $\alpha = \beta = \gamma = \epsilon = 1$; $\delta = 1$) and the damped trend (DT: $\alpha = \beta = \delta = \epsilon = 1$; $0 < \gamma < 1$). Note that the LLT model can be considered simply as the combination of the simpler RW and IRW models. It was originally suggested by Harvey (see e.g. Harvey [1989]), who has used it extensively in the context of ‘structural time series’ forecasting and, with others, exploited it in the Structural Time Series Analyser, Modeller and Predictor (STAMP) computer program.

As can be seen from these examples of the GRW model, the only unknown parameters are normally the diagonal elements of the covariance matrix $\mathbf{Q}_{\eta_i}$, the SRW damping coefficient $\alpha$ and the DT damping coefficient $\gamma$. These are combined into a ‘hyper-parameter’ vector, so called to differentiate these hyper-parameters from the TVPs that are the main object of the estimation analysis. However, the hyper-parameters are also assumed to be unknown a priori and need to be estimated from the data, normally under the assumption that they are time invariant (see later). Note that, in the case of the RW model, i.e., $l_{i,k} = l_{i,k-1} + \eta_{1i,k}$; $l_{i,k} = p_{i,k}$, each parameter can be assumed to be time-invariant if the variance of the white noise input $\eta_{1i,k}$ is set to zero. Then the stochastic TVP setting reverts conveniently to the more normal, constant parameter TF model situation.
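Each named member of the GRW family simply fixes the entries of $\mathbf{F}_i$ and $\mathbf{G}_i$ in (5a), so the sub-system matrices, and their aggregation into the block diagonal $\mathbf{F}$ and $\mathbf{G}$ used below, are straightforward to construct. A minimal Python sketch, in which the function name and the default damping values are illustrative assumptions:

```python
import numpy as np
from scipy.linalg import block_diag

def grw_matrices(kind="IRW", alpha=0.95, gamma=0.95):
    """Return (F_i, G_i) for one member of the GRW family (5a), with
    F_i = [[alpha, beta], [0, gamma]] and G_i = [[delta, 0], [0, eps]]."""
    #             alpha,  beta, gamma, delta, eps
    params = {"RW":  (1.0,   0.0, 0.0,   1.0, 0.0),
              "IRW": (1.0,   1.0, 1.0,   0.0, 1.0),
              "SRW": (alpha, 1.0, 1.0,   0.0, 1.0),
              "LLT": (1.0,   1.0, 1.0,   1.0, 1.0),
              "DT":  (1.0,   1.0, gamma, 1.0, 1.0)}
    a, b, g, d, e = params[kind]
    F_i = np.array([[a, b], [0.0, g]])
    G_i = np.array([[d, 0.0], [0.0, e]])
    return F_i, G_i

# Aggregate, e.g., three TVPs each modelled as an IRW, as in (5b) below:
subsystems = [grw_matrices("IRW") for _ in range(3)]
F = block_diag(*[Fi for Fi, _ in subsystems])
G = block_diag(*[Gi for _, Gi in subsystems])
```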
Having introduced the GRW models for the parameter variations, an overall state space model can be constructed straightforwardly by the aggregation of the subsystem matrices defined in (5a), with the ‘observation’ equation defined by the model equation (3): i.e.,
$$\begin{aligned} \text{Observation Eqn.:} &\quad y_k = \mathbf{H}_k \mathbf{x}_k + \mu_k & \text{(i)} \\ \text{State Eqn.:} &\quad \mathbf{x}_k = \mathbf{F} \mathbf{x}_{k-1} + \mathbf{G} \boldsymbol{\eta}_k & \text{(ii)} \end{aligned} \qquad (5b)$$
If $p = 2(n+m+1)$, then $\mathbf{F}$ is a $p \times p$ block diagonal matrix with blocks defined by the $\mathbf{F}_i$ matrices in (5a); $\mathbf{G}$ is a $p \times p$ block diagonal matrix with blocks defined by the corresponding sub-system matrices $\mathbf{G}_i$ in (5a); and $\boldsymbol{\eta}_k$ is a $p$ dimensional vector containing, in appropriate locations, the white noise input vectors $\boldsymbol{\eta}_{i,k}$ (‘system disturbances’ in normal state space terminology) to each of the GRW models in (5a). These white noise inputs, which provide the stochastic degrees of freedom to allow for parametric change in the model, are assumed to be independent of the observation noise $e_k$ and have a covariance matrix $\mathbf{Q}$ formed from the combination of the individual covariance matrices $\mathbf{Q}_{\eta_i}$. Finally, $\mathbf{H}_k$ is a $1 \times p$ vector of the following form,
$$\mathbf{H}_k = [-y_{k-1}\;\; 0\;\; \dots\;\; 0\;\; -y_{k-n}\;\; 0\;\;\; u_{k-\delta}\;\; 0\;\; \dots\;\; 0\;\; u_{k-\delta-m}\;\; 0] \qquad (5c)$$
that relates the scalar observation $y_k$ to the state variables defined by (5b)(ii), so that it represents the DTF model (1), with each parameter defined as a GRW process and $\mu_k = A(z^{-1},k)\,\xi_k$. In the case of the scalar RW and AR(1) models for parameter variation, the alternate zeros are simply omitted.

If the noise variable $\mu_k$ in (5b)(i) happens to be zero mean, white noise with variance $\sigma^2$, then the above TVP model reduces to the simpler DARX model. It is well known that this DARX model can be treated as a ‘linear regression’ relationship and that the standard forms of the KF and FIS algorithms can be used, very successfully, to estimate the TVPs. These standard solutions are discussed in detail in Young [1999], which describes and evaluates Dynamic Linear Regression (DLR), Dynamic AutoRegression (DAR) and Dynamic Harmonic Regression (DHR) algorithms, in addition to the DARX estimation (see also Young et al. [1999]). All of these algorithms are available in the CAPTAIN Toolbox, where they provide a valuable resource for various aspects of adaptive extrapolation, interpolation and smoothing of nonstationary time series with missing samples.

The problem is, of course, that $\mu_k$ is not white and Gaussian, even if $\xi_k$ has these desirable properties. This difference is very important in the DTF context since it can be shown that the TVP estimates obtained from the standard recursive filtering/smoothing algorithm will not converge properly to the region around the true variable parameters (equivalent to asymptotic bias in constant parameter model estimation: see later, Figure 4). The level of this effect is dependent on the magnitude of the measurement noise and it is necessary to modify the standard algorithms to avoid such problems. This can be achieved by attempting to model the noise $\mu_k$ in some moving average manner [Norton, 1975, 1986]. However, since $\mu_k$ is a complex, nonstationary noise process, its complete estimation is not straightforward. An alternative approach, which does not require modelling $\mu_k$, provided it is independent of the input $u_k$, is the following recursive-iterative Instrumental Variable (IV) algorithm.

1. Forward-Pass Symmetric IV Equations (iterative)

Iterate the following recursive equations (6a)–(6e) for $j = 1, 2, \dots, IT$, with standard RLS estimation used at the first iteration, i.e. $\hat{\mathbf{H}}_k = \mathbf{H}_k$ for $j = 1$:

Prediction:
$$\begin{aligned} \hat{\mathbf{x}}_{k|k-1} &= \mathbf{F} \hat{\mathbf{x}}_{k-1} \\ \hat{\mathbf{P}}_{k|k-1} &= \mathbf{F} \hat{\mathbf{P}}_{k-1} \mathbf{F}^T + \mathbf{G} \mathbf{Q}_r \mathbf{G}^T. \end{aligned} \qquad (6a)$$
Correction:
$$\begin{aligned} \hat{\mathbf{x}}_k &= \hat{\mathbf{x}}_{k|k-1} + \mathbf{G}_k \left\{ y_k - \mathbf{H}_k \hat{\mathbf{x}}_{k|k-1} \right\} \\ \hat{\mathbf{P}}_k &= \hat{\mathbf{P}}_{k|k-1} - \mathbf{G}_k \hat{\mathbf{H}}_k \hat{\mathbf{P}}_{k|k-1} \\ \mathbf{G}_k &= \hat{\mathbf{P}}_{k|k-1} \hat{\mathbf{H}}_k^T \left[ 1 + \hat{\mathbf{H}}_k \hat{\mathbf{P}}_{k|k-1} \hat{\mathbf{H}}_k^T \right]^{-1} \end{aligned} \qquad (6b)$$
where
$$\hat{\mathbf{H}}_k = [-\hat{x}_{k-1}\;\; 0\;\; \dots\;\; -\hat{x}_{k-n}\;\; 0\;\;\; u_{k-\delta}\;\; 0\;\; \dots\;\; u_{k-\delta-m}\;\; 0]$$
and $\hat{x}_k$ is generated by the auxiliary model:
$$\hat{x}_k = \frac{\hat{B}_{j-1}(z^{-1},k)}{\hat{A}_{j-1}(z^{-1},k)}\, u_{k-\delta}. \qquad (6c)$$
The FIS algorithm is then in the form of a backward recursion operating from the end of the sample set to the beginning.

2. Backward-Pass Fixed Interval Smoothing IV (FISIV) Equations (single pass)
$$\begin{aligned} \hat{\mathbf{x}}_{k|N} &= \mathbf{F}^{-1} \left[ \hat{\mathbf{x}}_{k+1|N} + \mathbf{G} \mathbf{Q}_r \mathbf{G}^T \mathbf{L}_k \right] \\ \mathbf{L}_k &= \left[ \mathbf{I} - \hat{\mathbf{P}}_{k+1} \hat{\mathbf{H}}_{k+1}^T \hat{\mathbf{H}}_{k+1} \right]^T \left\{ \mathbf{F}^T \mathbf{L}_{k+1} - \hat{\mathbf{H}}_{k+1}^T \left[ y_{k+1} - \hat{\mathbf{H}}_{k+1} \hat{\mathbf{x}}_{k+1} \right] \right\} \\ \hat{\mathbf{P}}_{k|N} &= \hat{\mathbf{P}}_k + \hat{\mathbf{P}}_k \mathbf{F}^T \hat{\mathbf{P}}_{k+1|k}^{-1} \left[ \hat{\mathbf{P}}_{k+1|N} - \hat{\mathbf{P}}_{k+1|k} \right] \hat{\mathbf{P}}_{k+1|k}^{-1} \mathbf{F} \hat{\mathbf{P}}_k \end{aligned} \qquad (6d)$$
with $\mathbf{L}_N = \mathbf{0}$. In these algorithms, the $p \times p$ Noise Variance Ratio (NVR) matrix $\mathbf{Q}_r$ and the $p \times p$ matrix $\hat{\mathbf{P}}_k$ are defined as follows:
$$\mathbf{Q}_r = \frac{\mathbf{Q}}{\sigma^2}; \qquad \hat{\mathbf{P}}_k = \frac{\mathbf{P}_k}{\sigma^2}. \qquad (6e)$$
As we will see in the later examples, the advantage of the smoothing recursions is that they provide ‘lag-free’, lower variance estimates of the TVPs.

The main difference between the above algorithm (6a)–(6e) and the standard recursive filtering/smoothing algorithms is the introduction of ‘hats’ on the $\hat{\mathbf{H}}_k$ vector and the $\hat{\mathbf{P}}_k$ matrix, and the use of an iterative IV solution in the forward-pass algorithm. In the standard algorithm, which applies for the simpler linear regression and DARX models, $\hat{\mathbf{H}}_k$ is replaced by $\mathbf{H}_k$ in (6b) and there is no need for iteration in the forward pass (see the simulation example in the next section). In (6b), $\hat{\mathbf{H}}_k$ is the IV vector, which is used by the algorithm in the generation of all the $\hat{\mathbf{P}}_k$ terms and is the main vehicle for correcting the deficiencies of the recursive least squares solution. The subscript $j-1$ on $\hat{A}_{j-1}(z^{-1},k)$ and $\hat{B}_{j-1}(z^{-1},k)$ indicates that the estimated DTF polynomials in the ‘auxiliary model’ (6c), which generates the instrumental variables $\hat{x}_k$ that appear in the definition of $\hat{\mathbf{H}}_k$, are updated in an iterative manner, starting with the least squares estimates of these polynomials for $j = 1$. Iteration is continued for $IT$ iterations, until the forward pass (filtered) IV estimates of the TVPs are no longer changing significantly: often only 3 or 4 iterations are required.

This recursive-iterative approach to time variable parameter estimation is based on the standard IV algorithm for constant parameter TF models, except that the ‘symmetric gain’ version of the IV algorithm [Young, 1970, 1984] is used, rather than the more usual asymmetric version. This is necessary in order that the standard recursive FIS algorithm in (6d) can be used to generate the smoothed estimates of the TVPs.
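For the simpler linear regression/DARX case, i.e., (6a)–(6d) with $\hat{\mathbf{H}}_k$ replaced by the data-based $\mathbf{H}_k$ and no IV iteration, the whole filtering/smoothing operation reduces to a few lines of code. The following Python sketch is illustrative only: the function name and the diffuse prior initialization are assumptions, and the backward pass is written in the algebraically equivalent Rauch-Tung-Striebel form rather than the $\mathbf{L}_k$ recursion of (6d):

```python
import numpy as np

def tvp_filter_smoother(y, H, F, G, Qr, P0_scale=1e6):
    """Forward KF pass (6a)-(6b) with H_k in place of the IV vector,
    followed by a backward FIS pass equivalent to (6d).
    y: (N,) observations; H: (N, p) regression vectors."""
    N, p = H.shape
    GQG = G @ Qr @ G.T
    x, P = np.zeros(p), P0_scale * np.eye(p)   # diffuse prior on the states
    xf, Pf, xp, Pp = [], [], [], []            # filtered / one-step-ahead stores
    for k in range(N):
        x, P = F @ x, F @ P @ F.T + GQG        # prediction (6a)
        xp.append(x.copy()); Pp.append(P.copy())
        h = H[k]
        g = P @ h / (1.0 + h @ P @ h)          # gain (6b), scalar observation
        x = x + g * (y[k] - h @ x)             # correction (6b)
        P = P - np.outer(g, h @ P)
        xf.append(x.copy()); Pf.append(P.copy())
    # Backward fixed interval smoothing, RTS-equivalent form of (6d):
    xs, Ps = [None] * N, [None] * N
    xs[-1], Ps[-1] = xf[-1], Pf[-1]
    for k in range(N - 2, -1, -1):
        A = Pf[k] @ F.T @ np.linalg.inv(Pp[k + 1])
        xs[k] = xf[k] + A @ (xs[k + 1] - xp[k + 1])
        Ps[k] = Pf[k] + A @ (Ps[k + 1] - Pp[k + 1]) @ A.T
    return np.array(xf), np.array(xs)
```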
Note finally that optimal Refined IV algorithms [Young, 2008] for constant parameter models exploit adaptive pre-filtering to induce optimality in a Maximum Likelihood (ML) sense. It would be possible to extend the above algorithm to include such adaptive pre-filtering, but this would require TVP pre-filters, or some alternative average pre-filter, so considerably increasing the complexity of the algorithm.

3.1 Optimization of the Hyperparameters
In relation to the algorithm (6a) to (6e), the hyper-parameter vector $\mathbf{h}$ is composed of the diagonal elements of the NVR matrix $\mathbf{Q}_r$ in (6e) and, if they are specified, the SRW/DT damping coefficients $\alpha_i$ and $\gamma_i$. The approach to ML optimization used here is based on Prediction Error Decomposition, which derives originally from the work of Schweppe [1965], who showed how to generate likelihood functions for Gaussian signals using the Kalman filter (see also Bryson and Ho [1969], page 389). Its importance in the present context was probably first recognized by Harvey [1989] and Kitagawa and Gersch [1996] in their development of ‘unobserved component’ or ‘structural’ forecasting models. Since then, it has become one of the two standard approaches to the maximum likelihood optimization problem (the other being the Expectation-Maximization (EM) algorithm [Dempster et al., 1977]).

With given initial values for the hyper-parameters, the Kalman filter algorithm will yield the one-step-ahead prediction errors (‘innovations’ or ‘recursive residuals’) $\varepsilon_k$, where
$$\varepsilon_k = y_k - \mathbf{H}_k \hat{\mathbf{x}}_{k|k-1}, \qquad k = 1, 2, \dots, N. \qquad (10)$$
If the first $p$ observations are regarded as fixed, the log-likelihood function of $y_{p+1}, \dots, y_N$ can be defined in terms of the standard ‘linear regression’ form of prediction error decomposition. It is straightforward to show that, in the case of the ARX model, the ML estimate of $\sigma^2$, conditional on the hyper-parameters, is given by
$$\hat{\sigma}^2 = \frac{1}{N-p} \sum_{k=p+1}^{N} \frac{\varepsilon_k^2}{1 + \mathbf{H}_k \mathbf{P}_{k|k-1} \mathbf{H}_k^T}, \qquad (11)$$
so that it can be estimated in this manner and ‘concentrated out’ to yield the following expression for the ‘concentrated likelihood’:
$$\log(L_c) = -\frac{N-p}{2} \left( \log 2\pi + 1 \right) - \frac{1}{2} \sum_{k=p+1}^{N} \log\left( 1 + \mathbf{H}_k \mathbf{P}_{k|k-1} \mathbf{H}_k^T \right) - \frac{N-p}{2} \log(\hat{\sigma}^2) \qquad (12)$$
which needs to be maximized with respect to the unknown hyper-parameters in order to obtain their ML estimates. Since (12) is nonlinear in the hyper-parameters, the likelihood maximization needs to be carried out numerically. Consequently, it is more convenient to remove the constant term (since it will play no part in the optimization) and consider the optimization of the hyper-parameter vector $\mathbf{h}$ in the following minimization manner:
$$\hat{\mathbf{h}} = \arg\min_{\mathbf{h}} \frac{1}{2} \left\{ \sum_{k=p+1}^{N} \log\left( 1 + \mathbf{H}_k \mathbf{P}_{k|k-1} \mathbf{H}_k^T \right) + (N-p) \log(\hat{\sigma}^2) \right\}. \qquad (13)$$
The recursive TVP estimation algorithm is used repeatedly to generate the one-step-ahead prediction errors $\varepsilon_k$ and, thence, the value of the cost function in (13) associated with the latest selection of hyper-parameter estimates made by the optimization algorithm. The optimization algorithm then adjusts its selection of hyper-parameter estimates in order to converge on those estimates that satisfy (13) to some specified accuracy.
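In outline, the hyper-parameter search simply wraps the forward filter pass: for each trial vector $\mathbf{h}$, run the filter, accumulate the innovations $\varepsilon_k$ and their normalized variances, evaluate the cost in (13), and hand the result to a numerical optimizer. A minimal sketch for the RW case (so that the state dimension equals the number of TVPs), with log-transformed NVRs to enforce positivity; the function name and the Nelder-Mead choice are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def neg_concentrated_loglik(log_nvr, y, H, F, G, p_fixed):
    """Evaluate the minimization cost in (13) for NVRs exp(log_nvr)."""
    N, p = H.shape
    Qr = np.diag(np.exp(log_nvr))
    GQG = G @ Qr @ G.T
    x, P = np.zeros(p), 1e6 * np.eye(p)
    log_terms, eps2 = [], []
    for k in range(N):
        x, P = F @ x, F @ P @ F.T + GQG
        h = H[k]
        s = 1.0 + h @ P @ h                # 1 + H_k P_{k|k-1} H_k^T
        e = y[k] - h @ x                   # innovation eps_k, as in (10)
        if k >= p_fixed:                   # first p observations regarded as fixed
            log_terms.append(np.log(s))
            eps2.append(e * e / s)
        g = P @ h / s
        x, P = x + g * e, P - np.outer(g, h @ P)
    sigma2 = np.sum(eps2) / (N - p_fixed)  # concentrated sigma^2, as in (11)
    return 0.5 * (np.sum(log_terms) + (N - p_fixed) * np.log(sigma2))

# Hypothetical usage, given data y and regressors H of width p:
# res = minimize(neg_concentrated_loglik, x0=np.log([1e-3] * p),
#                args=(y, H, F, G, p), method='Nelder-Mead')
```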
4. SIMULATION EXAMPLES

4.1 Example 1: DARX Estimation

This example is based on an example originally described in Wellstead and Zarrop [1991] and is intended to compare the performance of the limited memory ‘forgetting’ algorithms [Kulhavý, 1987, Wellstead and Zarrop, 1991] with that of the statistical algorithms described in the previous section. Here, the model takes the following, simpler DARX form:
$$y_k = \frac{b_{0,k}}{1 + a_1 z^{-1} + a_2 z^{-2}}\, u_{k-1} + \frac{1}{1 + a_1 z^{-1} + a_2 z^{-2}}\, e_k$$
or, in discrete-time equation terms:
$$y_k = -a_1 y_{k-1} - a_2 y_{k-2} + b_{0,k} u_{k-1} + e_k$$
where $e_k$ is zero mean, white noise with variance 0.16 (yielding a noise/signal ratio $\mathrm{std}(\xi_k)/\mathrm{std}(y_k) = 0.1$). The $b_0$ parameter changes from 1.0 to 2.0 at $k = 200$, and then back to 1.0 at $k = 900$; while $a_1 = -1.0$ and $a_2 = 0.25$ are time invariant. The simulated input-output data are shown in Figure 1: the input $u_k$ changes from a square wave between plus and minus one to a very small amplitude square wave of plus and minus 0.002 at $k = 400$, reverting to the original large square wave at $k = 850$.

This choice of input signal induces ‘estimator wind-up’ in the case of the standard Exponentially-Weighted-Past (EWP) algorithm because the information content in the data during the period of low input activity is not sufficient to ensure good performance from this rather crude TVP estimation algorithm. This is illustrated in the lower panel of Figure 2, where the design parameter $\lambda = 0.85$ was obtained by manual tuning to yield the best overall performance: the wind-up behaviour is clearly apparent after $k = 780$. The upper panel of Figure 2 shows the results obtained with the Directional Forgetting (DF) algorithm, which is designed specifically to limit estimator wind-up. Here, the design parameter is set to 0.85. We see that this produces a distinct improvement over the standard EWP algorithm, with the worst excesses of the wind-up no longer occurring. However, the response to the parametric change is relatively slow and there is considerable interaction between the estimates over the period of input inactivity.

By far the best results are those shown in Figure 3, as obtained using the ML optimized DTF algorithm amended for the simpler DARX model form. Since least squares estimation of the DARX model is satisfactory in this case, the IV iterations are not required and the algorithm reverts to the standard algorithm for GRW modelled parameter variations. In the case of the simplest RW model, the forward-pass, filtering part of this DARX algorithm, as used to obtain the results in the top panel of Figure 3, takes the simple form:
$$\begin{aligned} \mathbf{P}_{k|k-1} &= \mathbf{P}_{k-1} + \mathbf{Q}_r \\ \hat{\mathbf{p}}_k &= \hat{\mathbf{p}}_{k-1} + \mathbf{G}_k \left[ y_k - \mathbf{z}_k^T \hat{\mathbf{p}}_{k-1} \right] \\ \mathbf{G}_k &= \mathbf{P}_{k|k-1} \mathbf{z}_k \left[ 1 + \mathbf{z}_k^T \mathbf{P}_{k|k-1} \mathbf{z}_k \right]^{-1} \\ \mathbf{P}_k &= \mathbf{P}_{k|k-1} - \mathbf{G}_k \mathbf{z}_k^T \mathbf{P}_{k|k-1} \end{aligned} \qquad (14)$$
which is of similar complexity to the EWP and DF algorithms. Here, however, the elements of the diagonal NVR matrix $\mathbf{Q}_r$ are the NVR hyper-parameters for the three TF model parameters, which are ML-optimized at the following values:
$$\mathrm{NVR}_{a_1} = 1.5 \times 10^{-16}; \quad \mathrm{NVR}_{a_2} = 4.6 \times 10^{-20}; \quad \mathrm{NVR}_{b_0} = 0.0186,$$
where we see that $\mathrm{NVR}_{a_1}$ and $\mathrm{NVR}_{a_2}$ are both insignificantly different from zero, illustrating how the ML optimization has, quite objectively, inferred from the data that the associated $a_1$ and $a_2$ parameters are time-invariant. However, $\mathrm{NVR}_{b_0}$ is significant and has been optimized at a value that gives good tracking of the step changes in the $b_0$ parameter.
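The data-generating setup for this example is easy to reproduce. In the sketch below, the sample length N = 1200 is inferred from the time axes of Figures 1-3 and the square-wave period is an arbitrary assumption; everything else follows the text:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1200                                  # inferred from the Figs 1-3 time axes
k = np.arange(N)

# Square-wave input: unit amplitude, dropping to +/-0.002 for 400 <= k < 850
period = 50                               # assumed switching period
u = np.sign(np.sin(2 * np.pi * k / period))
u[(k >= 400) & (k < 850)] *= 0.002

# DARX system: y_k = -a1 y_{k-1} - a2 y_{k-2} + b0_k u_{k-1} + e_k
a1, a2 = -1.0, 0.25
b0 = np.where((k >= 200) & (k < 900), 2.0, 1.0)   # b0: 1 -> 2 at 200, back at 900
e = np.sqrt(0.16) * rng.standard_normal(N)        # white noise, variance 0.16
y = np.zeros(N)
for t in range(2, N):
    y[t] = -a1 * y[t-1] - a2 * y[t-2] + b0[t] * u[t-1] + e[t]
```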
Fig. 1. Input $u_k$ (upper panel) and noisy output $y_k$ (lower panel) for the first simulation example.
Fig. 2. Estimation results for the simulation example using the standard Exponentially Weighted Past (EWP: lower panel) and Directional Forgetting (DF: upper panel) estimation algorithms.

Fig. 3. Estimation results for the simulation example using the ML optimized DARX estimation algorithm: filtering results (top panel) and smoothing results (bottom panel). Estimated standard error bounds are shown dotted in both cases.
The lower panel in Figure 3 shows the backward-pass FIS estimation results, as obtained from the simplest RW version of the FIS algorithm (6d). In comparison with the filtering results in the upper panel, it will be noted that the smoothed estimates are more accurate; the step changes are anticipated because the estimates are based on the whole data set; and the estimates of the constant parameters are now themselves constant. Note also that, since both the filtering and smoothing algorithms are statistical in nature, they provide information on the statistical properties of the estimates, as shown here by the standard error (se) bounds plotted as the dotted lines either side of the estimates. As expected, the se bounds on the smoothed estimates are visibly smaller than those of the filtered estimates, particularly over the period of input inactivity.

The superiority of the DARX approach to TVP estimation is clear from the results presented in Figures 1 to 3. Not only are the results visibly better, despite the fact that the filtering DARX algorithm is of similar complexity to the EWP and DF algorithms, but also the data-based ML optimization of the algorithm is fully automatic, so removing the need for subjective and tiresome manual tuning.

4.2 Example 2: DTF Estimation

As a simple example of DTF modelling, consider the estimation of the parameters in the following first order [1 1 2] TVP model:
$$\begin{aligned} x_k &= \frac{b_0}{1 + a_{1,k} z^{-1}}\, u_{k-2}, & u_k &= N(0, 6.25) \\ y_k &= x_k + \xi_k, & \xi_k &= N(0, 2.56), \end{aligned} \qquad (15)$$
where $b_0 = 0.5$ is constant and $a_{1,k}$ varies sinusoidally as $0.9 \sin(0.02k)$. Estimation is based on the measurements of $y_k$ and $u_k$, $k = 1, 2, \dots, 2000$. The noise/signal ratio on $y_k$ is quite high, with $\mathrm{std}(\xi_k)/\mathrm{std}(y_k) = 0.84$.
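For reference, data for this example can be generated as follows; a minimal sketch in which the random seed and the zero initial conditions are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
N, delta, b0 = 2000, 2, 0.5
u = np.sqrt(6.25) * rng.standard_normal(N)        # u_k = N(0, 6.25)
a1 = 0.9 * np.sin(0.02 * np.arange(N))            # sinusoidally varying a1_k
x = np.zeros(N)                                   # noise-free output x_k
for t in range(N):
    x[t] = (-a1[t] * x[t - 1] if t else 0.0) + \
           (b0 * u[t - delta] if t >= delta else 0.0)
y = x + np.sqrt(2.56) * rng.standard_normal(N)    # y_k = x_k + xi_k, as in (15)
```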
Fig. 4. Comparison of the good DTF estimate $\hat{a}_{1,k|N}$ of $a_{1,k}$ (upper panel, full line) and the poor least squares DARX estimate of the TVP (lower panel, full line). True variation: dashed lines; se bounds: dotted lines.

The superiority of the DTF estimates is clear in Figure 4, where the DTF estimates are much better than the equivalent DARX estimates. Also, the DTF model with these estimated parameters explains the data well: the coefficient of determination based on the simulated DTF model output, compared with the noise-free simulated output of (15), is $R_T^2 = 0.93$ (93% of the output series variance explained by the TVP model); whilst for the DARX model, this is reduced to $R_T^2 = 0.85$. The model residuals (innovations) for the DTF model are also superior: they have an approximately normal amplitude distribution; and, as required, both the autocorrelation function (acf) and the cross-correlation function (ccf) between the residuals and the input $u_k$ are insignificant at all lags.

Note finally that the $b_0$ parameter is identified by the NVR hyper-parameter optimization as being constant. However, if both parameters are allowed to vary in a similar manner (e.g. $a_{1,k}$ sinusoidal, as here, and $b_{0,k}$ with a similar frequency, cosine variation between 0.9 and $-0.9$), then the associated NVRs are optimized as $\mathbf{Q}_r = \mathrm{diag}\,[0.00101\;\; 0.00232]$, showing that the ML optimization has found strong evidence of temporal changes in both parameters, the variations in which are estimated well.

5. CONCLUSIONS

This paper has described the DARX and DTF algorithms for TVP estimation, which are available as the darx and dtfm routines in the CAPTAIN Toolbox for Matlab. These algorithms provide both on-line and off-line identification and estimation of time variable parameters in transfer function models, with the hyper-parameters that define the statistical rate of change of the TVPs optimized by maximum likelihood (an option for optimization in terms of the sum of squares of the single- or multiple-step-ahead prediction errors is also available). Finally, it is important to note that the approach to TVP estimation described in this paper can be extended to State-Dependent Parameter (SDP) estimation and the data-based modelling of stochastic nonlinear systems [Young et al., 2001] (the sdp routine in CAPTAIN).

REFERENCES
S. Bittanti and M. Campi. Bounded error identification of time-varying parameters by RLS techniques. IEEE Transactions on Automatic Control, 39:1106–1110, 1994.
A. E. Bryson and Y. C. Ho. Applied Optimal Control. Blaisdell, Waltham, Mass., 1969.
H. F. Chen and L. Guo. Identification and Stochastic Adaptive Control. Birkhäuser, Boston, 1991.
A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39:1–38, 1977.
A. C. Harvey. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press, Cambridge, 1989.
A. J. Jakeman and P. C. Young. Recursive filtering and the inversion of ill-posed causal problems. Utilitas Mathematica, 25:351–376, 1981.
G. Kitagawa and W. Gersch. Smoothness Priors Analysis of Time Series. Springer-Verlag, New York, 1996.
R. Kulhavý. Restricted exponential forgetting in real-time identification. Automatica, 23:589–600, 1987.
J. P. Norton. Optimal smoothing in the identification of linear time-varying systems. Proceedings of the Institution of Electrical Engineers, 122:663–668, 1975.
J. P. Norton. An Introduction to Identification. Academic Press, New York, 1986.
F. Schweppe. Evaluation of likelihood functions for Gaussian signals. IEEE Transactions on Information Theory, 11:61–70, 1965.
C. J. Taylor, D. J. Pedregal, P. C. Young, and W. Tych. Environmental time series analysis and forecasting with the CAPTAIN toolbox. Environmental Modelling & Software, 22:797–814, 2007.
P. E. Wellstead and M. B. Zarrop. Self-Tuning Systems: Control and Signal Processing. John Wiley, New York, 1991.
P. C. Young. Differential Equation Error Method of Real-Time Process Identification. PhD thesis, University of Cambridge, Cambridge, U.K., 1970.
P. C. Young. Recursive Estimation and Time-Series Analysis. Springer-Verlag, Berlin, 1984.
P. C. Young. Nonstationary time series analysis and forecasting. Progress in Environmental Science, 1:3–48, 1999.
P. C. Young. The refined instrumental variable method: unified estimation of discrete and continuous-time transfer function models. Journal Européen des Systèmes Automatisés, 42:149–179, 2008.
P. C. Young, D. J. Pedregal, and W. Tych. Dynamic harmonic regression. Journal of Forecasting, 18:369–394, 1999.
P. C. Young, P. McKenna, and J. Bruun. Identification of nonlinear stochastic systems by state dependent parameter estimation. International Journal of Control, 74:1837–1857, 2001.