Mechanical Systems and Signal Processing (1995) 9(3), 287–297
ON THE OVER-SAMPLING OF DATA FOR SYSTEM IDENTIFICATION

K. WORDEN

Dynamics and Control Research Group, Department of Engineering, University of Manchester, Manchester M13 9PL, U.K.

(Received 28 March 1994, accepted 5 June 1994)

It is well known that the accuracy of system identification parameter estimates deteriorates when the sampling frequency of the data is increased beyond a certain limit. A simple interpretation of this phenomenon in terms of data extrapolation is presented here. An upper bound for the over-sampling frequency is estimated; however, this can only be considered as an existence result as it depends on the signal-to-noise ratio of the data, which is not known a priori.
1. INTRODUCTION
Ljung [5] summarises his discussion of over-sampling as follows:

‘‘Very fast sampling leads to numerical problems, model fits in high-frequency bands, and poor returns for hard work.’’

‘‘As the sampling interval increases over the natural time constants of the system, the variance (of parameter estimates) increases drastically.’’ (In fact, he shows analytically for a simple example that the parameter variance tends to infinity as the sampling interval Δt tends to zero [5, p. 378].)

‘‘Optimal choices of Δt for a fixed number of samples will lie in the range of the time constants of the system. These are, however, not known, and overestimating them may lead to very bad results.’’
It is thus a textbook result that over-sampling of data is sub-optimal, and the current paper makes no claim to novelty in pointing this out. What is hopefully new is a simple explanation of the poor results in terms of extrapolation. It is shown here that the problem occurs for any system identification method which proceeds by minimising some increasing function of the one-step-ahead prediction error. The discussion of over-sampling essentially began with Åström [1]. Comprehensive treatments can be found in [5], [3] and [8]. A useful recent reference is [4].

The layout of the paper is as follows. Section 2 summarises the important points about ARX (Auto-Regressive with eXogenous inputs) modelling. Section 3 demonstrates the problems in fitting an ARX model to over-sampled data using numerically simulated data. Section 4 explains these results in terms of extrapolation rules. Finally, Section 5 provides a simple theoretical estimate for an upper bound to the over-sampling frequency.
2. ARX MODELLING OF OVER-SAMPLED DATA
To keep things as simple as possible, the system investigated will be a linear single degree-of-freedom (SDOF) oscillator,

$$m\ddot{y} + c\dot{y} + ky = x(t) \tag{1}$$

where m, c and k are the mass, damping and stiffness, and overdots denote differentiation with respect to time. There is no loss of generality. The discussion which follows is restricted to discrete-time models of continuous-time systems; the parameter estimation problem for continuous-time models does not suffer from over-sampling. Obtaining the discrete-time representation of equation (1) is a textbook exercise; if the input and output signals are sampled at regular intervals of time Δt, one obtains records of data $x_i = x(i\Delta t)$ and $y_i = y(i\Delta t)$ for i = 1, ..., N related by

$$m\ddot{y}_i + c\dot{y}_i + ky_i = x_i. \tag{2}$$
The velocity and acceleration can be approximated by the difference formulae

$$\dot{y}_i = \frac{y_i - y_{i-1}}{\Delta t} \tag{3}$$

$$\ddot{y}_i = \frac{y_{i+1} - 2y_i + y_{i-1}}{\Delta t^2}. \tag{4}$$

Substituting these expressions into (2) gives

$$y_i = \left(2 - \frac{c\,\Delta t}{m} - \frac{k\,\Delta t^2}{m}\right) y_{i-1} + \left(\frac{c\,\Delta t}{m} - 1\right) y_{i-2} + \frac{\Delta t^2}{m}\, x_{i-1} \tag{5}$$

or, in a standard notation,

$$y_i = a_1 y_{i-1} + a_2 y_{i-2} + b_1 x_{i-1} \tag{6}$$
which is termed an ARX (Auto-Regressive with eXogenous inputs [5, 2]) model. In fact, the following arguments will apply equally well to the considerably more general NARMAX (Nonlinear Auto-Regressive Moving-Average with eXogenous inputs) models [2]. The main advantage of using such a discrete-time form over continuous-time models like (1) is clear: to estimate each parameter, a measurement of the corresponding state is needed [6]. In the case of (1), sampled inputs, displacements, velocities and accelerations are required; for (6) only samples of the input and displacement are needed. The parameters can be estimated using many different algorithms; the universally applied least-squared-error approaches proceed by minimising a squared-error function

$$J(\{\theta_i;\ i = 1, \dots, N_p\}) = \sum_{i=1}^{M} \zeta_i^2 \tag{7}$$
with respect to the model parameters θ_i. To give a concrete example, in the case of equation (6) the error is given by

$$\zeta_i = y_i - (\hat{a}_1 y_{i-1} + \hat{a}_2 y_{i-2} + \hat{b}_1 x_{i-1}) \tag{8}$$
where the carets denote estimated quantities or fixed realisations of the parameters. The software used to produce the parameter estimates in the current study is an ARX modelling code which transforms the estimation to an orthogonal form in which the parameters can be identified independently of each other [2].
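To make the estimation step concrete, a minimal sketch is given below; it solves equations (7) and (8) for the model (6) directly by ordinary least squares. The function name and the use of numpy's `lstsq` routine are illustrative choices only; this is not the orthogonal estimator of [2] used in the present study.

```python
# Minimal sketch: direct least-squares estimation of the ARX model (6),
#   y_i = a1*y_{i-1} + a2*y_{i-2} + b1*x_{i-1},
# by minimising the sum of squared one-step-ahead errors (7)-(8).
import numpy as np

def fit_arx(x, y):
    """Return least-squares estimates [a1, a2, b1] for model (6)."""
    # One regression row per predictable point i = 2, ..., N-1 (0-based);
    # the columns hold y_{i-1}, y_{i-2} and x_{i-1}.
    Phi = np.column_stack([y[1:-1], y[:-2], x[1:-1]])
    theta, *_ = np.linalg.lstsq(Phi, y[2:], rcond=None)
    return theta
```

The returned vector then plays the role of the caret quantities in equation (8).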
Before proceeding to explain how the algorithms can break down if the data is over-sampled, it is appropriate to discuss some commonly used metrics of model validity. The question of parameter bias as a result of correlated measurement noise is ignored here, as it is assumed that it cannot directly affect the eventual conclusions; as a result, there is no discussion of noise models.

2.1. Model validity
Having obtained an ARX model for a system, the next stage in the identification procedure is to determine if the structure is correct and the parameter estimates are unbiased. It is important to know if the model has successfully captured the system dynamics, so that it will provide good predictions of the system output for different input excitations, or if it has simply been fitted to the data, in which case it will be of little use since it will only be applicable to one data set. Two basic tests are described below in increasing order of stringency. In the following, y_i denotes a measured output while ŷ_i denotes an output value predicted by the model.

2.1.1. One-step-ahead predictions
Given the ARX representation of a system,

$$y_i = \sum_{j=1}^{n_y} a_j y_{i-j} + \sum_{j=1}^{n_x} b_j x_{i-j} + \zeta_i \tag{9}$$
where output (resp. input) lags of order n_y (resp. n_x) are included, the one-step-ahead prediction of y_i is made using measured values for all past inputs and outputs. Estimates of the residuals or errors are obtained from the expression ζ_i = y_i − ŷ_i, i.e.

$$\hat{y}_i = \sum_{j=1}^{n_y} a_j y_{i-j} + \sum_{j=1}^{n_x} b_j x_{i-j}. \tag{10}$$
The one-step-ahead series can then be compared to the measured outputs. Good agreement is clearly a necessary condition for model validity. It is important to note that the least-squares methods described above work by minimising the one-step-ahead errors over the whole data set (equation (8)).

2.1.2. Model predicted output
In this case, the inputs are the only measured quantities used to generate the model output, i.e.

$$\hat{y}_i = \sum_{j=1}^{n_y} a_j \hat{y}_{i-j} + \sum_{j=1}^{n_x} b_j x_{i-j}. \tag{11}$$
In order to avoid a misleading transient at the start of the record for ŷ, the first n_y values of the measured output are used to start the recursion. As above, the estimated outputs must be compared with the measured outputs, with good agreement a necessary condition for accepting the model. This test is clearly stronger than the previous one; in fact, the one-step-ahead predictions can be excellent in cases where the model predicted output shows complete disagreement with the measured data. It is useful to have an objective measure of the closeness of fit of the model. The one adopted here is the normalised mean-square error or MSE, defined by the ratio of signal variances

$$\mathrm{MSE} = \frac{\sigma_\zeta^2}{\sigma_y^2}. \tag{12}$$
This is normalised so that using the mean level of the response y as a model results in a unit MSE.

2.1.3. Significance testing
In practice, it is unusual to know which terms should be in the model. In order to reduce the computational load on the parameter estimation procedure, it is clearly desirable to determine which terms should be included. The measure of significance used in the orthogonal estimation procedure referred to above is called the error-reduction ratio or ERR [2]. A general significance factor can be defined as follows. Each model term u, e.g. u = a_2 y_{i−2}, can be used on its own to generate a time series which will have variance σ_u². The significance factor s_u is then defined by

$$s_u = \frac{\sigma_u^2}{\sigma_y^2} \tag{13}$$

where σ_y² is the variance of the estimated output, i.e. of the sum of all the model terms. Roughly speaking, s_u is the proportion contributed to the model variance by the term u. Alternatively, it is the expected reduction in the model MSE if the term is included in the model, hence the name ERR, although this interpretation is strictly correct only when used with an orthogonal estimator, because all terms in the orthogonal basis are uncorrelated.
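These diagnostics are straightforward to compute. The following is a minimal sketch, assuming an ARX model with output-lag coefficients `a` and input-lag coefficients `b` (the function names are illustrative); it implements the one-step-ahead prediction (10), the model predicted output (11) with measured start-up values, and the normalised MSE (12).

```python
import numpy as np

def one_step_ahead(a, b, x, y):
    """Equation (10): predict y_i from measured past outputs and inputs."""
    n = max(len(a), len(b))
    preds = [sum(a[j] * y[i - j - 1] for j in range(len(a))) +
             sum(b[j] * x[i - j - 1] for j in range(len(b)))
             for i in range(n, len(y))]
    return np.concatenate([y[:n], preds])   # measured start-up values

def model_predicted_output(a, b, x, y):
    """Equation (11): recurse on the model's own past predictions."""
    n = max(len(a), len(b))
    yhat = list(y[:n])                      # start-up from measured data
    for i in range(n, len(y)):
        yhat.append(sum(a[j] * yhat[i - j - 1] for j in range(len(a))) +
                    sum(b[j] * x[i - j - 1] for j in range(len(b))))
    return np.array(yhat)

def normalised_mse(y, yhat):
    """Equation (12): residual variance over output variance."""
    return np.var(y - yhat) / np.var(y)
```

Note that `normalised_mse(y, np.full_like(y, y.mean()))` returns unity, in line with the normalisation above.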
3. NUMERICAL SIMULATION

The aim of the present section is to illustrate the problem of over-sampling using data from a numerical simulation of an SDOF linear system, namely

$$\ddot{y} + 20\dot{y} + 10^4 y = x(t). \tag{14}$$
The equation was stepped forward in time using a fixed-step fourth-order Runge–Kutta procedure [7]. The excitation x(t) was a zero-mean white Gaussian sequence band-limited onto the interval 0 to 30 Hz. A step size of 0.1 ms was used, corresponding to a sampling frequency of 10 000 Hz. The natural frequency of the system is 15.9 Hz, so the sampling is at about 630 times this value and 333 times the highest frequency in the data. The data is noise-free except for the inaccuracy of the integration scheme and truncation error, of which the first is expected to be the more significant. Figure 1 shows 1000 points of input (x) and output (y) data which were used to fit an ARX model. A structure detection approach was used to fit the ARX model: at each iteration of the procedure, the modelling program adds to the model the term which produces the greatest ERR, i.e. the most significant term. This is termed the forward regression algorithm [2].

Figure 1. (a) Input and (b) output time-series for over-sampled system.

Figure 2. Measured (a) output and (b) one-step-ahead predictions for over-sampled system.
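The simulation just described can be reproduced in outline as follows. This is a sketch under stated assumptions: the paper does not specify how the excitation was band-limited, so a zero-phase Butterworth low-pass filter is used here as a stand-in.

```python
# Sketch: fixed-step fourth-order Runge-Kutta integration of equation (14),
# driven by white Gaussian noise low-pass filtered to 0-30 Hz (assumed scheme).
import numpy as np
from scipy.signal import butter, sosfiltfilt

fs, T = 10000.0, 1.0                    # sampling frequency (Hz), duration (s)
dt = 1.0 / fs
N = int(T * fs)

rng = np.random.default_rng(0)
sos = butter(4, 30.0, fs=fs, output="sos")
x = sosfiltfilt(sos, rng.standard_normal(N))   # band-limited excitation

def f(state, xi):
    """First-order form of y'' + 20 y' + 1e4 y = x(t)."""
    y, v = state
    return np.array([v, xi - 20.0 * v - 1.0e4 * y])

y = np.zeros(N)
state = np.zeros(2)
for i in range(N - 1):
    xm = 0.5 * (x[i] + x[i + 1])        # input at the half-step (interpolated)
    k1 = f(state, x[i])
    k2 = f(state + 0.5 * dt * k1, xm)
    k3 = f(state + 0.5 * dt * k2, xm)
    k4 = f(state + dt * k3, x[i + 1])
    state = state + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    y[i + 1] = state[0]
```

The records x and y then correspond to the 10 000 Hz data of Fig. 1.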
Figure 3. Measured ( · · · ) and model predicted (——) output for over-sampled system.
Figure 4. Prediction by linear extrapolation.
After one iteration, the method produced the model y_i = 1.0063y_{i−1} with an MSE of 1.08 × 10^−4, a value which signifies great accuracy. Clearly this cannot be a serious attempt at a model, as it simply represents an exponential growth. However, the level of sampling is so high that the variation in the data from one point to the next is very small, and keeping the value constant produces an excellent one-step-ahead prediction. This is true for each point of data, so the least-squares procedure arrives at the y_i ≈ y_{i−1} prescription. This is reflected in the ERR value of 0.99989 for the single term. If the algorithm is forced into another iteration, the model y_i = 2.0050y_{i−1} − 1.0052y_{i−2} is produced, with an MSE of 6.01 × 10^−9. This signifies almost perfect agreement. The combined ERR values for the terms, 0.999998, show that they account for almost all the variation in the data. Figure 2 shows the measured response of the system compared with the one-step-ahead prediction from the model; the agreement is excellent, as expected. However, the comparison between measured data and model-predicted output shows serious discrepancies (Fig. 3). As expected from the AR structure (no forcing terms), the model predicts an exponential growth (the only other possibility being an exponential decay). Further iterations of the algorithm would eventually introduce a forcing x term; however, any reasonable person would terminate the procedure after two steps given the tiny MSE. This is the danger of over-sampled data. Like the single-term model, the two-term ARX model also has a simple interpretation: the one-step-ahead prediction is a simple linear extrapolation from the previous two values (Fig. 4). The increase over the interval Δt is approximately constant, i.e.

$$y_i - y_{i-1} \approx y_{i-1} - y_{i-2} \;\Rightarrow\; y_i \approx 2y_{i-1} - y_{i-2}. \tag{15}$$
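The greedy selection loop used above can be sketched as follows. This version scores each candidate term by the reduction in the normalised sum of squared errors when the term is refitted alongside those already chosen; it is a plain least-squares stand-in for the orthogonal forward-regression estimator of [2], and the names and structure are illustrative assumptions.

```python
import numpy as np

def forward_regression(candidates, y, n_terms):
    """Greedily add the candidate term giving the largest error reduction.

    candidates: dict mapping a term name to its regressor column.
    y: target vector (the measured output to be predicted).
    """
    chosen, history = [], []
    sse_prev = float(np.sum(y ** 2))
    for _ in range(n_terms):
        best = None
        for name, col in candidates.items():
            if name in chosen:
                continue
            Phi = np.column_stack([candidates[c] for c in chosen] + [col])
            theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
            sse = float(np.sum((y - Phi @ theta) ** 2))
            err = (sse_prev - sse) / float(np.sum(y ** 2))  # ERR-like score
            if best is None or err > best[1]:
                best = (name, err, sse)
        chosen.append(best[0])
        history.append((best[0], best[1]))
        sse_prev = best[2]
    return chosen, history

# Illustrative use with the lagged data of Section 2 (terms up to lag 2):
#   candidates = {'y[i-1]': y[1:-1], 'y[i-2]': y[:-2], 'x[i-1]': x[1:-1]}
#   terms, errs = forward_regression(candidates, y[2:], n_terms=2)
```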
The next simulation was carried out at 5000 Hz, i.e. at half the original sampling frequency. After one iteration of the identification algorithm, the model was y_i = 0.9997y_{i−1}, with a one-step-ahead MSE of 3.8 × 10^−4.
Table 1. ARX model data for various sampling frequencies

| Sampling frequency (Hz) | a1     | a2      | a3      | b1             | ERR a1  | ERR a2         | ERR b1         | MSE            |
| 10000                   | 1.0063 | —       | —       | —              | 0.99989 | —              | —              | 1.0824 × 10^−4 |
| 10000                   | 2.0050 | −1.0052 | —       | —              | 0.99989 | 1.0841 × 10^−4 | —              | 6.0103 × 10^−9 |
| 5000                    | 0.9997 | —       | —       | —              | 0.99962 | —              | —              | 3.7959 × 10^−4 |
| 5000                    | 1.4997 | —       | −0.5003 | —              | 0.99962 | 3.7999 × 10^−4 | —              | 4.2635 × 10^−8 |
| 2500                    | 2.0000 | −1.0011 | —       | —              | 0.99879 | 1.2088 × 10^−3 | —              | 1.0383 × 10^−6 |
| 1000                    | 1.9904 | −0.9997 | —       | —              | 0.99067 | 9.3132 × 10^−3 | —              | 1.4221 × 10^−5 |
| 500                     | 1.9605 | −0.9961 | —       | —              | 0.96501 | 0.03458        | —              | 3.9965 × 10^−4 |
| 500                     | 1.9217 | −0.9607 | —       | 3.8796 × 10^−6 | 0.96501 | 0.03458        | 3.9974 × 10^−4 | 1.7933 × 10^−8 |
| 300                     | 1.8293 | −0.9354 | —       | 1.0425 × 10^−5 | 0.90377 | 0.09406        | 2.1672 × 10^−3 | 8.4623 × 10^−7 |

(For the two-term 5000 Hz model the second term involves y_{i−3}; its coefficient appears under a3 and its ERR under ERR a2.)
Figure 5. (a) Input and (b) output time-series for correctly sampled system.
The ERR of 0.99962 is a clear signal that over-sampling has occurred. A further iteration of the algorithm yields the model y_i = 1.4997y_{i−1} − 0.50031y_{i−3}, with an MSE of 4.26 × 10^−7. The combined ERR for the two terms is 0.99999999! This result can also be explained in terms of a linear extrapolation, now based on

$$2(y_i - y_{i-1}) \approx y_{i-1} - y_{i-3} \;\Rightarrow\; y_i \approx \tfrac{3}{2} y_{i-1} - \tfrac{1}{2} y_{i-3}. \tag{16}$$
The effects of the sampling frequency are shown clearly by the results in Table 1.
Figure 6. Measured output ( · · · ) and one-step-ahead predictions (——) for correctly sampled system.
The first indications that a valid model requires the inclusion of a forcing term occur when the sampling frequency is 500 Hz. 1000 points of the input and output (Fig. 5) were used to fit a model. A two-term extrapolation model, y_i = 1.9605y_{i−1} − 0.99606y_{i−2}, produces a one-step-ahead MSE of only 3.997 × 10^−4. Another iteration introduces a forcing term, y_i = 1.9217y_{i−1} − 0.96072y_{i−2} + 3.8796 × 10^−6 x_{i−1}, as x_{i−1} is the term with the next highest ERR, i.e. the term which produces the greatest decrease in error. This reduces the MSE to 1.79 × 10^−8. [Note the size, O(Δt²), of the coefficient of x_{i−1}.] The effect on the model-predicted output is dramatic; Fig. 6 shows a comparison with the true output.

An alternative method of arriving at the extrapolation formula in (15) is to take the limit as Δt → 0 in (5). The fact that a different extrapolant is obtained in (16) is not significant; this could also be obtained from a limiting procedure if a different discretisation of the derivatives in (1) had been chosen.

4. PREDICTION USING LINEAR EXTRAPOLATION
The purpose of this section is simply to formalise the observations of the last. Assume a band-limited response p(t), so that the spectrum P(ω) = 0 for ω ≥ ω_r. The noise n(t) is assumed to be band-limited also, with N(ω) = 0 for ω ≥ ω_n. Without loss of generality it can be assumed that ω_n ≤ ω_r; this is simply so that any 'smoothness' criterion which applies to p(t) also applies to the noise. The total signal y(t) is a superposition of process and noise,

$$y(t) = p(t) + n(t) \;\Rightarrow\; y_i = p_i + n_i. \tag{17}$$

The least-squares criterion for selecting a model is that

$$J(\zeta) = \sum_i \zeta_i^2 \tag{18}$$
be minimised, where the ζ_i are the model residuals, i.e. the errors from the one-step-ahead predictions. Now, the error function J can be defined for the extrapolation model in terms of the extrapolation errors ζ_i^I, and also for the ARX model in terms of the residuals ζ_i^A. Clearly, as Δt varies, the actual numbers ζ_i^A will change; but even if a perfect unbiased process model is obtained, the sum of squared errors is bounded below, i.e.

$$J(\zeta^A) \ge N\alpha \tag{19}$$

where α is the mean-square of the noise signal n(t) and N is the number of points in the estimation set. On the other hand, because both noise and process become smoother as Δt → 0, it is clear that

$$J(\zeta^I(\Delta t)) \to 0 \ \text{ as } \ \Delta t \to 0. \tag{20}$$

So there must exist Δt₀ such that for all Δt < Δt₀,

$$J(\zeta^I(\Delta t)) \le J(\zeta^A) \tag{21}$$
and the least-squares criterion will always select the extrapolation formula as the valid model.
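The inequality (21) is easy to observe numerically. The sketch below uses illustrative assumptions (a 30 Hz sine as the band-limited process, and noise synthesised from random sinusoids below the same frequency): as the sampling frequency rises for a fixed record length, J(ζ^I) collapses towards zero while the noise floor Nα of (19) grows in proportion to N.

```python
import numpy as np

def bandlimited_noise(t, f_cut, rms, rng, n_comp=50):
    """Noise as a finite sum of random sinusoids below f_cut (band-limited)."""
    freqs = rng.uniform(0.0, f_cut, n_comp)
    phases = rng.uniform(0.0, 2.0 * np.pi, n_comp)
    n = np.sin(2.0 * np.pi * freqs[:, None] * t + phases[:, None]).sum(axis=0)
    return rms * n / np.std(n)

f_max, T, noise_rms = 30.0, 1.0, 0.05
rng = np.random.default_rng(1)
for fs in (100.0, 1000.0, 10000.0):
    t = np.arange(0.0, T, 1.0 / fs)
    y = np.sin(2.0 * np.pi * f_max * t) + bandlimited_noise(t, f_max, noise_rms, rng)
    # Extrapolation residuals from (15): zeta_i = y_i - (2 y_{i-1} - y_{i-2})
    J_I = np.sum((y[2:] - 2.0 * y[1:-1] + y[:-2]) ** 2)
    print(f"fs = {fs:7.0f} Hz:  J(zeta_I) = {J_I:9.3e},  N*alpha = {t.size * noise_rms**2:.2f}")
```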
This result is very general; it holds for any identification method which seeks to minimise any increasing function of the error in the one-step-ahead predictions. The effects would be avoided if the least-squared-error criterion could be based on the model predicted output.

5. AN UPPER BOUND FOR THE OVER-SAMPLING FREQUENCY
To simplify the analysis, the signal will be replaced by a sine wave at the highest frequency of interest. So

$$y_i = y(t_i) = A \sin\left(\frac{2\pi}{\tau} t_i\right) \tag{22}$$
where τ is the shortest time-scale in the data. If the sampling interval is Δt, the extrapolated prediction for y(t) one time-step ahead is given by equation (15), so

$$\hat{y}_{i+1} = 2A \sin\left(\frac{2\pi}{\tau} t_i\right) - A \sin\left(\frac{2\pi}{\tau}(t_i - \Delta t)\right) \tag{23}$$

while the exact result is

$$y_{i+1} = A \sin\left(\frac{2\pi}{\tau}(t_i + \Delta t)\right). \tag{24}$$
The error from extrapolation, ζ, is thus

$$\zeta_{i+1} = 2A \sin(a t_i)(\cos(a\,\Delta t) - 1) \tag{25}$$

taking a = 2π/τ for notational convenience. This is clearly a maximum when sin(at_i) is a maximum, i.e. when it is unity. An upper bound for ζ_i is therefore

$$\zeta_{\max} = -2A(\cos(a\,\Delta t) - 1). \tag{26}$$

(It is interesting to note that a situation with ζ_max = 0 can arise if Δt is a multiple of τ.) Over-sampling will certainly occur if the RMS measurement noise ζ_rms is greater than this upper bound, as the least-squared-error criterion will then select the extrapolation model in preference to the true process model. This gives an implicit relation for the over-sampling frequency f_s in terms of the maximum frequency of interest f_max (using A = √2 y_rms for a sine wave),

$$\zeta_{\mathrm{rms}} = \zeta_{\max} = 2\sqrt{2}\, y_{\mathrm{rms}} \left(1 - \cos\left(\frac{2\pi f_{\max}}{f_s}\right)\right) \tag{27}$$
or, after a little algebra,

$$f_s = \frac{2\pi f_{\max}}{\cos^{-1}\left(1 - \dfrac{1}{2\sqrt{2}\,\gamma}\right)} \tag{28}$$

where γ = y_rms/ζ_rms is the signal-to-noise ratio. This is an 'exact' expression for f_s. By the use of a judicious approximation, the inverse cosine can be removed; it is assumed that 1/(2√2 γ) is small, i.e. the level of measurement noise is small compared to the signal. In order to simplify (28), the expansion of cos⁻¹(1 − x) for small x is required. Differentiating the function cos⁻¹(1 − x), expanding as a Taylor series and integrating term by term yields the estimate

$$\cos^{-1}(1 - x) = \sqrt{2x} + O(x^{3/2}) \tag{29}$$
where x ≪ 1. Applying this approximation to equation (28) gives

$$f_s = \frac{2\pi f_{\max}}{\sqrt{1/(\sqrt{2}\,\gamma)}} \tag{30}$$

or, finally,

$$f_s = 2^{5/4} \pi\, \gamma^{1/2} f_{\max} \tag{31}$$
for high signal-to-noise ratio γ. This result [and the exact expression (28)] can only be regarded as an existence result, due to the fact that the signal-to-noise ratio would not be known in practice. However, it is amusing to note that the over-sampling frequency estimated in Section 3 can now be used to give bounds on the accuracy of the simulated data. The data at 500 Hz were definitely over-sampled, and given that the maximum frequency of interest was 30 Hz, this gives a signal-to-noise ratio of about 5.0. Data sampled at 250 Hz gave sensible results, and this corresponds to a signal-to-noise ratio of 20.0.
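For concreteness, the bound is simple to evaluate; the following is a minimal sketch assuming only equations (28) and (31).

```python
import numpy as np

def f_s_exact(gamma, f_max):
    """Over-sampling bound from the 'exact' expression (28)."""
    return 2.0 * np.pi * f_max / np.arccos(1.0 - 1.0 / (2.0 * np.sqrt(2.0) * gamma))

def f_s_approx(gamma, f_max):
    """High signal-to-noise approximation (31): 2^(5/4) * pi * sqrt(gamma) * f_max."""
    return 2.0 ** 1.25 * np.pi * np.sqrt(gamma) * f_max

# With f_max = 30 Hz, gamma = 5 returns approximately 500 Hz from both
# expressions, consistent with the figure quoted above.
print(f_s_exact(5.0, 30.0), f_s_approx(5.0, 30.0))
```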
ACKNOWLEDGEMENTS

Thanks are due to Professor S. A. Billings for many illuminating discussions.

REFERENCES

1. K. J. Åström 1969 Information Sciences 1, 273–287. On the choice of sampling rates in parameter identification of time series.
2. S. Chen, S. A. Billings and Y. P. Luo 1989 International Journal of Control 50, 1873–1896. Orthogonal least-squares methods and their application to nonlinear system identification.
3. G. C. Goodwin and R. L. Payne 1977 Dynamic System Identification: Experiment Design and Data Analysis. New York: Academic Press.
4. P. H. Kirkegaard 1992 Preprint, Department of Building Technology and Structural Engineering, Aalborg University, Denmark. Optimal selection of the sampling interval for estimation of modal parameters by an ARMA-model.
5. L. Ljung 1987 System Identification: Theory for the User. Englewood Cliffs, NJ: Prentice-Hall.
6. K. S. Mohammad, K. Worden and G. R. Tomlinson 1992 Journal of Sound and Vibration 152, 471–499. Direct parameter estimation for linear and nonlinear structures.
7. W. H. Press, B. P. Flannery, S. A. Teukolsky and W. T. Vetterling 1986 Numerical Recipes: The Art of Scientific Computing. Cambridge: Cambridge University Press.
8. M. B. Zarrop 1979 Optimal Experiment Design for Dynamic System Identification (Lecture Notes in Control and Information Sciences 21). Berlin: Springer-Verlag.