On adaptive smoothing of empirical transfer function estimates

On adaptive smoothing of empirical transfer function estimates

ON ADAPTIVE SMOOTHING OF EMPIRICAL TRANSFER FUNCTI. .. 14th World Congress of IFAC H-3a-12-5 Copyright © 1999 IFAC 14th Triennial World Congress~ B...

3MB Sizes 6 Downloads 15 Views

ON ADAPTIVE SMOOTHING OF EMPIRICAL TRANSFER FUNCTI. ..

14th World Congress of IFAC

H-3a-12-5

Copyright © 1999 IFAC 14th Triennial World Congress~ Beijing, P.R. China

ON ADAPTIVE SMOOTHING OF EMPIRICAL TRANSFER FUNCTION ESTIMATES Anders Steulllan * Fredrik Gustafsson * Daniel E. Rivera ** Lennart Ljung ~ TODlas McKelvey *

* Departrnent of Electrical Engineering, Linkoping Univers2:ty, SE-581 83 Linkoping, SUJeden. Fax: +46 13 28 26 22 ** Department of Chemical, Bio and Materials Engineering and

Control Systems Engineering Laboratory, Arizona State University, Tempe, Arizona. US~4

Abstract: The determination of the right resolution parameter when estimating frequency functions for linear systems is a trade-off between bias and variance. Traditional approaches, like "window-closing" ern ploy a. global resolution parameter - the "\\rindo\v ,,'idth - that is tuned by ad hoc methods, usually visual inspection of the results. Here \ve suggest an adaptive method that tunes such parameters by an automatic procedure. A further benefit is that the tuning can be done locally, i.e., different resolutions can be used in different frequency bands. The ideas are based on local polynolnial regression and the "just-in-time -model concept. The advantages of the method are illustrated in nurnerical exatnples. Copyright© 1999IFAC H

Keywords: frequency-response methods, non-parametric regression

1. INTR,ODUCTION EstilIlating the frequency function G (e iw ) of a linear system is a classical problem, see, e.g., Brillingcr (1981), Jcnkins and vVatts (1968), Bcndat and Piersol (1980), and Ljung (1987). The methods can be divided into parametric ones, \vhich estimate the parameters of an ARX, ARJvlAX or similar model, and compute the model's frequency function, and non-parametric ones, that essentially S11100th the ratio of the output's and input's FOllrier functions - one ,.vay or another. The procedures all involve some parameters that govern the resolution of the estimate. For paralnetric methods, the chosen model orders serve as sueh parameters. For non-parametric methods, the ,\'7idth of some kind of smooihing v.rindo\v has to be chosen. The selection of such parameters of course reflects a bias/variance trade-off. \'lith better resolution, (i.e., smaller bias), fewer data points can be involved in the estimate ~vhich lead

to higher variance. There are many "rays to strike this balance, but most methods rely on some subjective method, like visual inspection of the estimate, for the final choice. Here we shall describe a method that has tV{O potential advantages in the choice of resolution paralneters: \~le shall use local resolution parameters, so that the resolution is allowed to be frequencydependent. • We suggest a procedure, based on crossvalidation, that gives an automatic choice of the optimal value.



The ideas are based on local polynomial regression (Stone~ 1977; Cleveland, 1979; Katkovnik, 1979). \\re have developed such methods for regression and rlynalnical systems, using terlns like JUSTIN-TI~1E models and fv[ODELS-O::\'"-DEivlAND; see Stenulan (1997) ~ Stenman et al. (1996) and Sten-

4183

Copyright 1999 IFAC

ISBN: 0 08 043248 4

ON ADAPTIVE SMOOTHING OF EMPIRICAL TRANSFER FUNCTI. ..

14th World Congress of IFAC

man et al. (1997). In this paper we apply the techniques to freqnency function estimation.

the frequency metric fashion

The paper is organized as follo,vs: Section 2 describes the general set-up, while Section 3 gives the general background of local regression. The application of adaptive frequency function estimation is treated in Section 4. Section 5 illustrates the proposed method with a numerical example, and Section 6 finally~ provides some concluding remarks.

G(e iW )

t.L'

can be estimated in a nonpara-

= ~L u k Wk

Wk

A traditional application of nonparametric methods in system identification occurs in the frequency d01nain when estimating the frequency response function of a system. If the system considered is linear, Le., it can be modeled by the input-output relation

y(t) :::::: G(q)u(t)

+

t

e(t),

~

1, ... ,N,

(1)

an estimate of the frequency function G(eif.l,;) can

be formed as the ratio between the (discrete) :Fourier transforms of the output and input signals, i.e.,

.: : r( iw) ~ }~l\r(w) G f'.I C UN(W)·

(2)

This estimate is often called the empirical transfer function estimate, ETFE, since it is formed with no other assumptions than linearity of the system {Ljung, 1987). It is \,tell-knoVo-Tn that the ETFE is a very crude estima.te of the true transfer function. This is

due to both the fact that the observed outputs are corrupted by measurement noise e(t) which propagates to the ETFE through the FOllrier transform, and leakage effects. In particular, for sufficiently large 1'1 (see Ljung (1987, Lemma 6.1), it can be shown that the ETFE satisfies

EGN(e iW ) == G(e iw )

-,; T C; (iW) var

N



-

+ R~),

1

[[TN-(w)

1

2

(3)

[rTo. ( ,) ~e u.:

+ R N(2)]

'

(4)

(5)

where the weights 1.L1k are selected such as a good trade-off between bias alldvariancc is achieved. In traditional t.reatments of the problern~ see for eX3Inple Brillinger (1981) and Ljung (1987), the weights have been selected according to a frequency window function

1

2. FREQUENCY RESPONSE· ESTlIvlATION

wkGN(e iwk ),

k

== W",(Wk - w),

\vhere I is a parameter that controls the (inverse) width of the windo~". _A. commonly used window function in this context is so-called Hamming window.. A problem with these methods however, is that the resolution parameter I in practical situations has to be chosen manually by the user, and that it in general is fixed over the whole frequency axis. The reason for this is that the methods typically are implemented in the time domain using the Blackman-Tukey procedure (Blackman and Tukey, 1958), or as averaging of periodograms (\Velch, 1967). This automatically involves a global choice of resolu tiOll paralneter. Better performance can be expected using a frequency \-vindov;r with variable "vidth, \vhich adapts to the local properties of the E,TFE. In the following sections, we will see how the concept of local polynomial regression can be used for this purpose.

REGRESSIO~

3. LOCAL POLYNOIVIIAL

Local polynomial regression is a special class of nonparametric methods \vhich has its origins in kernel methods for density estimation (Rosenblatt~ 1956; Parzen~ 1962) and kernel regression (Nadaraya, 1964; Watson~ 1964). It enjoyed a reincarnation in a more general setting in the late 1970's '~lith the Vv"ork of Stone (1977), Cleveland (1979) and Katkovnik (1979). Its current popularity is largely due to soft\vare packages such as LOWESS (Cleveland.~ 1979) and LOESS (Cleveland and Devlin, 1988).

where

RY/ -+ 0 as N

-+

00,

i::=; 1,2,

and Pe(w) is the spectrum of the noise. That is, the ETFE is an asymptotically unbiased estimate of G(e'i,W). However, the variance does not decay as N increases. Instead it approa.ches the noiseto-input-signal ratio at the frequency in question. One way to improve the poor variance properties of the ETFE is to a.ssume that the values of the true frequency function at neighboring frequencies are related. Hence the frequency function value at

3.1 Basic Concepts and Theory

The underlying model for local regression is k

==

1~ ...

,.1\l,

(6)

\vhere m(·) is an unkno\vn function, a.nd ek is an error term modclcd as i.i.d. randoID variables wit.h zero means and variances CfZ' The function m(·) is assumed to be smooth and is estimated by fitting a pth degree polynomial to the data within a sliding

windu\'v. That is, for each point x, an estimate

4184

Copyright 1999 IFAC

ISBN: 0 08 043248 4

ON ADAPTIVE SMOOTHING OF EMPIRICAL TRANSFER FUNCTI. ..

14th World Congress of IFAC

p

is obtained by the weighted regression problem;

m(Xk ) ~ L:$j(Xk -x)j

minimize

~l(Y k- ~Pj(Xk - X)j) W( X\- x) :~,

denote the estinlatcd loeal polynoIllial \-vithin the windo'~l. To assess the quality of this fitted polynomial, we use the risk function

(7)

R(

where f(·) is a scalar-valued, positive norm function, h is a bandwidth paraIneter controlling the size of the local neighborhood, and lV" (.) is a windo\v function (usually referred to as the kernel) assigning "veights to each remote data point according to its distance from x. If fjj, i = 0, ... ,p, denote the minimizers of (7), an estimate of m(v) (x) is given by

= L:=l Wk(X) Ef(m(Xk) -

h)

tr(~)

x,

~ ~ where Wk(X) =: VI/(h l(Xk - x)) and W =:;; diag(wl(x), ... ,WN(X). An estinlator for this quantity, which measures the estimation error, is given by (see Cleveland and Loader (1996);

fl,(x, h) == _ ( 1) tr W

~ote

that each local regression problem produces rh(v)(x) for a single point x; to obtain estimates at other locations, the local \vcights change and new regression problems have to be solved.

It is clear that the va.lue /30 obtained in (8) can be used by itself as an estirrlate of rn(x). However, it is also possible to enhance the estimate by plugging in the derivative estimates into an additional optimization step. See Stenman (1997) for more details around this.

m(Xk»!o-i (10)

(8)

The norm l (.) is usually taken as a quadratic function, which is convenient both for computation and analysis since silnple and powerful least squares methods can be used. However, demands for robustness against onttiers may sOInetinles warrant other choices.

(9)

j=O

- tr(W)

[t w",,~x) k=l

fCY"" - m(X k »)

O"k

+ 2tr((X T WVX)-1(X T W

2

VX»]

(11)

where V ~ diag(llar, ... , l/a'j..;) and X is a \landermonde type nlatrix of size ]'[ x (p + 1) v.rith rows 1, (Xi

~

~

x), ... ,(Xi

x)P.

This has the interpretation as a localized version of f\.lallov..r s Gp statistics (:t\.lallo"vs, 1973; Cleveland and Loader 1996). By also allowing an arbritrary penalty on the variance part in (11) in ord€r to prevent the criterion to find spurious features at small bandwidths, we obtain the local generalized C p with variance penalty Q (Loader, 1997); 1

C(x, h)

8 1 [~ 1Dk(X) = tr(W) ~ ~.e(Yk -

_ m(Xk »)

+ atr«X T WVX)-1(X T W

3

3.2 Adaptive Bandwidth Selection The bandwidth h has a critical impact on the resulting estimate since it governs a trade-off between bias and variance. Methods that use the a.vailable data to produce a bandwidth are usually referred to as bandwidth selectors, and have been extensively studied within the statistical literature, see) for instance, Fan and Gijbels (1996). They can roughly be divided into '~classical') methods which are based on crossvalidation ideas, and "plug-in" methods which rely on asymptotic l\,.1SE expressions (Loader, 1995). The majority of the bandwidth selectors Pl'oposed so far, though) have been of global type, i.e.~ they produce a single global bandwidth. However, adaptive methods, which select local bandwidths for each fitting point, have gained a significant interest in recent years and the development of theln still seems to be an open and active research area.

1

- tr(W)

VX))

J. (12)

Thus, for each fitting point x, a. local bandwidth h ;::::: hex) can be obtained by nlinimizing C(x, h) subject to h.

4. ADAPTI\TE SlVI00THING OF THE ETFE The adaptive smoothing technique described in Section 3 can easily be extended to the ETFE smoothing case. Fronl (3) and (4) it follows that the ETFE, at least asymptotically, can be modeled as in (6) '\vith Yk

= G(e

iWk

),

X k ==

Wk,

k

=:

0, ...

N

'2 - 1~

1

and \vith a COlnplex zero mean noise terlll variance given by

In this contribution we consider an adaptive, cross-validation type approa.ch. Let

....,..2 ~ 't/k -

q)e(Wk) t[ll\r (Wk) 12



eh

with

(13)

4185

Copyright 1999 IFAC

ISBN: 0 08 043248 4

ON ADAPTIVE SMOOTHING OF EMPIRICAL TRANSFER FUNCTI. ..

14th World Congress ofIFAC

For simplicity Vole assume a quadratic norm~ I!(E) :=: jcl 2 , ~'hich yields that the local regression problems can be solved explicitly using ordinary weighted least squares.

where

In order to use the localized Gp criterion and to enhance the quality of the estimate, it is crucial to have a good estimate of the variance (13). This can be achieved in the following way. An estimate of the residual spectrum
A data set ,vas generated according to

rjJ:::: 1.31J"/4, k == 0.5, C

Yet)

==

~e(Wk) IU1V(u..'k)1 2

5.1 Using the Blackman- T1.i.key Procedure

In Figure 2 (a)-(c), the ETFE has been windowed \vith a Ha-mIning windo\v of different widths. In Figure 2 (a) a wide v.rindov.r with ~/ == 32 is used. The plot is smooth, but the resolution at the peak is quite poor. In Figure 2 (b) a narrOVler \vindo\\r with, == 256 is used~ The resolution at the peak is now better, but a.t other frequencies the plot is noisier. Figure 2 (c) shovts a cornpromise ,vi th ~riIldo,v "vidth 'Y == 128.

4.1 C01nputational Aspects The adaptive smoothing procedure described in Section 3 scenlS very attractive when computing pointwise estimates. Ho\vever, estimating the ETFE on a large grid of frequencies may be a very time consuming task} since we at each grid point have to solve a number of regression problems for different bandwidths in order to minimize the localized C p criterion. Considerable speedups can be obtained by considering recursive splitting ideas similar to those of LOCFIT (I.Joader, 1997); Start initially by computing estimates at the boundary points Wi and Wj of the grid, which yields the corresponding bandwidths hi and hj~ If Wjj

~ ( •

min(h i1 h j

(15)

function are shown in Figure 1 (a). The system has a damped peak at u) =:: 1. The corresponding ETFE plots are shown in Figure 1 (b).

It is of course also possible to obtain local variance estimates directly from the rav.," ETFE data, using a non-adaptive smooth with a small bandwidth.

IWi -

t = 1, ... ? N?

v,rhere N ==- 2 , u(t) is a unit PRBS signal, and {e(t)} is a Gaussian random sequence ,vith zero mean and standard deviation (J,-~ === 0.03. The aUlplitude and phase curves of the true transfer

where Go (e iwre ) denotes an initial smooth of the ETFE using a global ba.ndwidth. Plugging in this in (13) results in the variance estimate

---2

A
12

IYN(Wk) - Go(e iWk ) [IN (Wk )j2,

17k

= G(q)u(t) + e(t),

=:

i_~l~~~ D

CHi

,

1.!i

:2

2.1>

:1

Frequency [radfs}

):

recursively split the interval into two equally sized pieces, and apply the same procedure on the t\VO halves. This greatly reduces the number of computations. Estimates at intermediate frequency points can be obtained using cubic spline interpolation bet"",·een the fitted points. Empirical studies have shown that ( ~ O. 7 seems to be a reasonable ehoice.

(a) The trne frequency re~ponse, G( e iw ).

If - 1ll

5. AN EXAIvIPLE

a

0,5

1

1.5

2.

~,5

:;J

Fr-equency {ra,dls}

The follo\ving example 1 taken from Bodin (1995), illustrates the impact different window widths have on the resulting estimate. Consider the linear, discrete time system G(q)

= C (q2

- 2r cos(4)

+ f).
(b) T'he ETFE, GN(e iw ).

q-k x

(q2 ~ 2r

+r + 1"2)2

cos(cP - AcjJ)q

(q2 _ 2r cos (
2

)

(14)

Fig. 1. Frequency functions of the systern in (14).

4186

Copyright 1999 IFAC

ISBN: 0 08 043248 4

ON ADAPTIVE SMOOTHING OF EMPIRICAL TRANSFER FUNCTI. ..

14th World Congress ofIFAC

5.2 Using an Adaptive Smoother

i~l -tOo

J:~

{)

0.5

1

:I!:

\,5

']

2,5


Frequency [rad/sJ

r.e-

f

a.5

FrPq~Jency

i1

The problem \vith t.he fixed \vindo\v approach in the preceeding section is that an acceptable noise reduction in one part of the frequency interval is achieved to the price of a too low resolution in another part of the interval. Considerable improvements could therefore be obtained by allowing a Sllloothing window with variable ~ridth, which adapts to the local pl'operties of the ETFE. We VIIrill therefore apply the methods described in Section 3 and Section 4.

:i;!,~

~

{rl1dlsj

Figure 3 (a) shows the result after applying an adaptive, local quadratic smoother (p ::= 2) using

(a) I :::: 32

the tricube \\rindow J~I ~ 1 otherwise.

(16)

The smoothing has been carried out using the

recursive, interval splitting approach described in Section 4.1. The selected fitting points and

f::~l Cl

1

0,$

~

1,$

Fh~qUtmr.y

(b)

:!.5

associated bandwidths are shown in Figure 3 (b). As shown the split.ting approach results in Inore fitting points where smaller bandwidths are needed, Le., around the peak. The actual fits are COlllputcd at only 30 frequencies (compared to the original 2048 frequency points) J "V hich drastically decreases the compntation titne.

.,

(r;ul/s.l

,:=: 256

6. CONCLUSIONS

1

~ -~

J_~I Q

J: ~

a

0.5

1

1b

'2

2.5

3

Frequency [radls}

] 'J,5

f

1.5

:2.

'l,G

,

Aequeno::v {rad/s}

(c) '1 = 128

Fig. 2, \Vindo\ved frequency responses using Hamming windows of different Vv"idths 'Y. Solid lines: vVindowed estimate. Dashed lines: True frequency response.

This clearly sho\vs the trade-off between resolution (narrow window) and variance reduction (wide windo",") which has to be taken into account when using a frequency window with fixed resolution.

Based on local modeling, we have described a method that estimates frequency functions of linear systems using an autonlat.ic, adaptive, and frequency-dependent choice of frequency resolution. This gives several advantages over traditional spectral analysis techniques. I\1any frequency functions exhibit fine details to different degrees in different frequency bands. Our approach thus gives a useful alternative to multiresolution techniques~ based, e.g.) on wavelets. \Ve have also demonstrated how the automated procedure based on a cross-validation type approach leads to good choice of band\vidths.

7. REFER,ENCES Bendat, J.S. a.nd A.G. Piersol (1980). Engineering Applications of Correlation and Spectral A'lluly.Yi.'l. \\tiley, New York. Blackma.n~ R.B. and J.Vl. Thkey (1958). The Measurement of PO'll'er Spectra. Dover, ~ev..r y"ork. Bodin, P. (1995). all wavelets and orthonornlal bases in system identification. Licentiate thesis TRITA-REG-9502. Automatic Control, Dept.. of Signals, Sensors and Systems, Royal Institute of Technology, Sweden.

4187

Copyright 1999 IFAC

ISBN: 0 08 043248 4

ON ADAPTIVE SMOOTHING OF EMPIRICAL TRANSFER FUNCTI. ..

14th World Congress ofIFAC

.Jenkins~

(a)

I

1.10

F}equen{.~v

.)

2

(nul/sj

Cb) Fig. 3. Result obtained using an adaptive local polynoluial smoother . (a) .A.daptively smoothed ETFE. Solid line: Estimate. Dashed line: True frequency response. (b) Fitting locations and corresponding bandVw''"idths (circles).

Brillinger~

D.R. (1981). Time Series: Data Analysis and Theory. Holden-Day, San Fransisco. Cleveland, W.S. (1979). Robust locally weighted regression and Sllloothing scatterplots. Journal of the American Statistical Association 74, 829-836. Cleveland, "i.S. and C.R. Loader (1996). Smoothing by loca.l regression: Principles and methods. In: Statistical theor1J and ClJmputational aspects of smoothing (v,r. Hardle and I\1.G. SchiIuek, Eds.). pp. 10-49. Physica-Verlag, Heidelberg. Cleveland W.S. and S.J. Devlin (1988). Locally weighted regression: an approa.ch to regression analysis by local fitting. Journal oj the American Statistical Association 83, 596610. Fan, .J. and I. Gij bels (1996). Local P olyno mi al Modelling and Its Applications. Chapman & Hall.

G.hlJ.. and D.G. \\latts (1968). Spectral Analysis and its Applications. Holden-Day, San Fransisco, California. Katkovnik~ \T,Y. (1979). Linear and nonlinear methods of nonparametric regression analysis. Soviet Automatic Control 5, 25-34. Ljung, L. (1987). System Identification - Theory for the 1j,,'JeT. Inforluation and Systerll Sciences Series. Prentice Hall, Engle\voorl Cliffs, Ne\v Jersey. Loader~ C.R. (1995). Old faithful erupts: Bandwidth selection reviewed. Technical report . .A.T&T Bell Laboratories. Loader, C.R. (1997). Locfit: An Introduction. £o\T&T Bell Laboratories. ~Iallows, C.L. (1973). Some comments on Cp. Technometrics 15, 661~675. Nadaraya, E. (1964). On estimating regression. Theory of Probability and Its Applications 10, 186-190. Parzen, E. (1962). On estimation of a probability density function and mode. The Annals of Mathematical StatisticH 33, 1065-1076. Rosenblatt, Iv1. (1956). Remarks on some nonparametric estimates of a density function. The Annals of Mathematical Statistics 27, 832837. Stcnman, A. (1997). Just-in-Time Models with Applications to Dynamical Systems. Licentiate thesis LIU-TE,K-LIC-1997:02. Department of Electrical Engineering, Link()ping university. 8-581 83 Linkoping, S,veden. Stenmau) A., A.v.. . Nazin and F. Gustafsson (1997). Asymptotic properties of Just.-inTime models. In: Preprints of the 11th IFAC Symposium on System Identification, Kitakyushu Japan (yr". Sa"vaTagi and S. Sagara, Eds.). pp. 1249-1254. Stenman, A., F. Gustafsson and L. Ljung (1996). Jlist in time Illodels for dynamical systelns. In: Proceedin.gs of the :15th IEEE Conference on Decision and Control; Kobe, Japan. pp. 1115-1120. Stone, C.J. (1977). Consistent nonpararrletrie regression. The Annals of Statistics 5,595-£20. \;V"'atsoIl, G. (1964). Srnooth regression analysis. Sankhyii A(26)~ 359-372. \""'clch, P.D. (1967). The use of the fast Fourier transfol'rll for the cstinlation of pov.rcr spectra: j\ method based on tinle avera.ging over short modified periodograms. IEEE Trans. ..4 udio Electroacoust. AV-I5, 70~73. f

J

4188

Copyright 1999 IFAC

ISBN: 0 08 043248 4