Asymptotic properties of transfer function estimates using non-parametric noise models under relaxed assumptions


Proceedings of the 15th IFAC Symposium on System Identification Saint-Malo, France, July 6-8, 2009

Kurt Barbé, Rik Pintelon and Johan Schoukens

Dept. ELEC, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium; e-mail: [email protected]

Abstract: It is well known that, under very general assumptions and based on a generalized central limit theorem (CLT), the discrete Fourier coefficients of filtered noise are asymptotically independent circular complex Gaussian distributed. The standard results on the consistency and the asymptotic uncertainty of the frequency domain errors-in-variables (EIV) estimator are derived under the assumption that the Fourier coefficients are circular complex Gaussian distributed and independent over the different frequency bins. In this paper, we study the influence of this assumption on the consistency and the efficiency of the frequency domain EIV estimator. We show that a slightly stronger form of the CLT is needed to preserve the classically obtained uncertainty bounds if independent complex Gaussian Fourier coefficients are not assumed. Our analysis reveals that the classically derived asymptotic uncertainty bounds are valid for a very wide class of distributions.

1. INTRODUCTION

Frequency domain system identification offers a tool to identify the transfer function G₀(jω_k) of a linear time-invariant (LTI) system, as in Fig. 1, measured at angular frequencies ω_k, k = 0, ⋯, L − 1, where j = √(−1).

Fig. 1  The simulation set-up, where u₀ and y₀ are the respective input/output signals, u and y the measured input/output and n_u and n_y the (filtered) input/output noise.

The identification of such systems is generally formulated as a weighted least squares problem, Pintelon and Schoukens (2001). Besides the information in the input and output signals, a frequency dependent weighting matrix (depending on the noise characteristics of the input/output errors) is used to enhance the statistical properties of the analysis and the identification of the transfer function (systematic errors are removed, the uncertainty is reduced). To estimate the transfer function G₀(jω_k), a parametric estimate G(jω_k, θ) is considered. The errors-in-variables (EIV) estimator θ̂ of the parameters θ is found by minimizing the following quadratic cost function with respect to the parameters θ,

K_L(θ|Ū, Ȳ) = Σ_{k=1}^{L/2−1} |Ȳ(k) − G(jω_k, θ) Ū(k)|² / σ²_{ξ,θ}(k)    (1)

with Ū(k), Ȳ(k) the sample means of the discrete Fourier transform (DFT) spectra of P periods of the steady state response to a periodic input, Ā the complex conjugate of A, and where

σ²_{ξ,θ}(k) = σ²_Ȳ(k) + |G(jω_k, θ)|² σ²_Ū(k) − 2 Re(σ²_ȲŪ(k) Ḡ(jω_k, θ))

with σ²_Ū(k), σ²_Ȳ(k), σ²_ȲŪ(k) the (co)variances at frequencies ω_k. Letting U^[i](k) and Y^[i](k) denote the Fourier coefficients at frequency line k for the i-th measured period, the sample means of the Fourier coefficients at frequency line k are defined as

Ū(k) = (1/P) Σ_{i=1}^{P} U^[i](k),   Ȳ(k) = (1/P) Σ_{i=1}^{P} Y^[i](k)    (2)

978-3-902661-47-0/09/$20.00 © 2009 IFAC
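The cost function (1) and the sample means (2) are straightforward to evaluate numerically. The following sketch is our own illustration, not from the paper: the first-order model G0, the flat true input spectrum, and the noise levels are all assumptions chosen only to make the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)
L, P = 256, 8                    # points per period, number of periods
k = np.arange(1, L // 2)         # frequency bins 1 .. L/2 - 1
z = np.exp(2j * np.pi * k / L)

# Hypothetical true system and flat true input spectrum (illustration only)
G0 = 0.1 / (1 - 0.9 / z)
U0 = np.ones_like(z)

def noise(shape):
    # circular complex white noise standing in for the filtered disturbances
    return 0.05 * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))

Ui = U0 + noise((P, k.size))     # P periods of noisy input DFT coefficients
Yi = G0 * U0 + noise((P, k.size))

# Sample means over the P periods, eq. (2)
U_bar, Y_bar = Ui.mean(axis=0), Yi.mean(axis=0)

# (Co)variances of the sample means, estimated from the P periods
varU = Ui.var(axis=0, ddof=1) / P
varY = Yi.var(axis=0, ddof=1) / P
covYU = ((Yi - Y_bar) * (Ui - U_bar).conj()).sum(axis=0) / ((P - 1) * P)

def cost(G):
    """Quadratic EIV cost K_L of eq. (1) for a candidate frequency response G."""
    var_xi = varY + np.abs(G) ** 2 * varU - 2 * np.real(covYU * G.conj())
    return np.sum(np.abs(Y_bar - G * U_bar) ** 2 / var_xi)
```

At the true frequency response the cost stays of the order of the number of frequency bins, while a biased candidate model inflates it sharply, which is what the minimization in (1) exploits.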

Many properties of the cost function (1) have been studied in detail. One can show, for filtered independent and identically distributed (iid) noise n_u, n_y, that if the cost function (1) is smooth in the parameters θ and the parameter space is compact, the EIV estimator is strongly consistent, Pintelon and Schoukens (2001). Asymptotically (L → ∞), in the case of no modeling errors, the EIV estimator can be written as

θ̂ = θ₀ + δθ + b_θ    (3)

with δθ = −[K″_L(θ₀|Ū, Ȳ)]⁻¹ K′_L(θ₀|Ū, Ȳ)ᵀ, where b_θ contains the bias contribution, which vanishes proportionally to L⁻¹. The asymptotic covariance matrix of θ̂ is exactly given by, Pintelon and Schoukens (2001),

Cov(δθ) = (2/(L−2)) [K″_L(θ₀|Ū, Ȳ)]⁻¹ Q_L(θ₀) [K″_L(θ₀|Ū, Ȳ)]⁻¹    (4)

Q_L(θ₀) = (2/(L−2)) 𝔼[K′_L(θ₀|Ū, Ȳ)ᵀ K′_L(θ₀|Ū, Ȳ)]

The computation of Q_L(θ₀) in equation (4) requires knowledge of the third and fourth moments of the disturbing noise, which are in general unknown. To overcome this problem, it is usually postulated that the Fourier coefficients,


10.3182/20090706-3-FR-2004.0122

N_u(k), N_y(k), of the disturbing noise n_u, n_y are circular complex Gaussian distributed, and that N_x(k), N_x(l), for k ≠ l and x = u, y, are independent. It is well known that this assumption is only valid asymptotically (L → ∞), Brillinger (1981). For finite record lengths L one can show, Pintelon and Schoukens (2001), that if the disturbing noise n_x is a filtered iid sequence, the DFT coefficients of n_x satisfy

N_x(k) = H_x(k) E_x(k) + T_x(k)

where H_x(k), E_x(k) and T_x(k) denote, respectively, the DFT coefficients of the noise filter, the underlying iid noise source, and the leakage contribution due to the non-periodicity of the noise. Furthermore, E_x(k) is uncorrelated over k, the leakage contribution vanishes with probability one proportionally to L^{−1/2}, and 𝔼[E_x(k) T̄_x(k)] = O(1/L). Therefore, one can conclude that the leakage contribution T_x(k) has no influence on the asymptotic covariance matrix (4). Let us denote

M₁ = (4/(L−2)) Re(J₁ᴴ J₁)
M₂ = (4/(L−2)) Re(J₂ᴴ J₂)    (5)

where the (k, i)-th elements of the matrices J₁, J₂ are respectively given by

[J₁]_{k,i} = (U₀(k) / σ_{ξ,θ}(k)) · ∂G(ω_k, θ)/∂θ_i |_{θ=θ₀}

[J₂]_{k,i} = (√(σ²_Ȳ(k) σ²_Ū(k) − |σ²_ȲŪ(k)|²) / σ²_{ξ,θ}(k)) · ∂G(ω_k, θ)/∂θ_i |_{θ=θ₀}

with U₀(k) the discrete Fourier coefficients of the true periodic input. Assuming N_u(k), N_y(k) to be circular complex Gaussian distributed and independent over the frequency bins k, together with (5), allows simplifying expression (4) to, Pintelon and Hong (2007),

Cov(δθ) = (2/(L−2)) M₁⁻¹ (M₁ + M₂) M₁⁻¹    (6)

The assumption that N_u(k), N_y(k) are circular complex Gaussian distributed and independent over the frequency lines is valid asymptotically (L → ∞) by virtue of the central limit theorem. Unfortunately, this does not imply convergence of the uncertainty bounds, Lukacs (1975). It can be expected that this influences the efficiency bounds. This is potentially dangerous, since the user receives no warning: the distribution of the Fourier coefficients looks Gaussian. Indeed, we show that to preserve the asymptotic uncertainty of the EIV estimator θ̂, a stronger version of the central limit theorem is needed. However, one of the conclusions of the analysis is that the uncertainty bounds are guaranteed for a very wide class of distributions.

This paper is organized as follows: section 2 formalizes the excitation signal and defines the Gaussian likelihood function for the EIV set-up, Fig. 1; in section 3 we study a stronger form of the central limit theorem (CLT) for discrete Fourier coefficients under which the classical uncertainty bounds, Pintelon and Hong (2007), hold; section 4 provides numerical examples; and conclusions are formulated in section 5.

2. ASSUMPTIONS AND SET-UP

2.1. The excitation signal

In Fig. 1 the set-up is defined. We consider a normalized periodic excitation u₀, such that the rms value does not depend on the number of excited frequency lines L/2 − 1. Increasing the record length L implies that the number of excited frequency lines increases accordingly with L. The steady state response to a periodic input u₀ is a periodic output signal y₀. In order to avoid transient effects, the measurements can only start when the transients become small with respect to the noise level. Leakage errors are avoided by measuring an integer number of periods P of the periodic signal. We consider L measured points per period, resulting in a total record of N = PL data points. It is assumed that both the input and output signals u₀, y₀ are disturbed by filtered iid (independent and identically distributed) noise n_u, n_y, where n_u, n_y are allowed to be mutually correlated. Formally:

Assumption 1  For the measured signals u, y, with period length L, the following relation holds: u = u₀ + n_u and y = y₀ + n_y, where n_u, n_y denote the (possibly correlated) input/output noise respectively, satisfying n_u = h_u ∗ e_u and n_y = h_y ∗ e_y. Here ∗ denotes convolution, e_u, e_y are two (mutually correlated) iid sequences with zero mean and unit variance, and h_u, h_y are the impulse responses of the input/output noise filters respectively. Furthermore, it is assumed that h_u, h_y satisfy

Σ_{n=0}^{∞} n² |h_x(n)| < ∞  with x = u, y

We define the discrete Fourier transform (DFT) of the i-th period at frequency bin k as

U^[i](k) = (1/√L) Σ_{n=0}^{L−1} u((i−1)L + n) e^{−2πjkn/L}    (7)

We denote by U₀(k), H_u(k) the true Fourier coefficient and the transfer function at frequency bin k of the signal u₀ and the noise filter h_u respectively. Further, we let E_u^[i](k) be the discrete Fourier coefficient at frequency bin k of the sequence e_u((i−1)L + n), n = 0, ⋯, L − 1. Then the following holds, Pintelon and Schoukens (2001):

Theorem 1  Under Assumption 1 the following holds for every period i,

U^[i](k) = U₀(k) + H_u(k) E_u^[i](k) + O_MS(1/√L)    (8)

where the convergence in (8) is in mean squared sense, Billingsley (1995). A similar expression holds for the output Fourier coefficients Y^[i](k).
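The decomposition (8) is easy to check numerically. The sketch below is our own illustration: the geometric noise filter h(n) = 0.8ⁿ is an assumption, chosen because it satisfies the summability condition of Assumption 1, and the white noise source stands in for e_u. It compares the DFT of a filtered noise record with H_u(k)E_u(k) and verifies that the residual leakage/transient term is small per bin.

```python
import numpy as np

rng = np.random.default_rng(1)
L = 4096
e = rng.standard_normal(L)               # iid noise source e_u

# Hypothetical noise filter h(n) = 0.8**n (satisfies sum n^2 |h(n)| < inf)
h = 0.8 ** np.arange(L)
n_u = np.convolve(e, h)[:L]              # filtered noise, zero initial conditions

# DFTs normalized by 1/sqrt(L), matching eq. (7)
N_u = np.fft.fft(n_u) / np.sqrt(L)
E_u = np.fft.fft(e) / np.sqrt(L)
H_u = 1.0 / (1.0 - 0.8 * np.exp(-2j * np.pi * np.arange(L) / L))  # filter response

# Leakage/transient term T(k) = N_u(k) - H_u(k) E_u(k); its mean power per bin
# is O(1/L), i.e. small compared to the mean power of N_u(k) itself.
ratio = np.mean(np.abs(N_u - H_u * E_u) ** 2) / np.mean(np.abs(N_u) ** 2)
```

For this record length the power ratio of the residual to N_u(k) is well below a percent, consistent with a transient term that vanishes proportionally to L^{−1/2} per bin.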


It is shown in Pintelon et al. (1997) that the term O_MS(1/√L) can be interpreted as transient effects of the noise filter due to initial and final conditions. In the rest of this paper, we shall assume that the record length L is large enough that the noise transients are small with respect to the Fourier coefficients of the signal.

Assumption 2  The discrete Fourier transform at frequency bin k of period i of the signal u results in

U^[i](k) = U₀(k) + H_u(k) E_u^[i](k)    (9)

A similar expression holds for the signal y.

These are the classical assumptions of frequency domain system identification, Pintelon and Schoukens (2001), without assuming that the Fourier coefficients U^[i](k) are independent (for k ≠ l) complex circular Gaussian distributed with mean U₀(k) and variance |H_u(k)|². The influence of removing the transient term in (8) was already discussed in Schoukens and Pintelon (1999).

2.2. The Gaussian likelihood function for LTI systems

We estimate the transfer function G₀(jω_k) in a parametric way by introducing a parametric model G(jω_k, θ), a rational form in the parameters θ,

G(jω_k, θ) = B(ω_k, θ)/A(ω_k, θ) = Σ_{l=0}^{n_B} b_l (jω_k)^l / Σ_{l=0}^{n_A} a_l (jω_k)^l    (10)

where A and B are polynomials. Furthermore, it is assumed that there are no model errors:

Assumption 3  There exists a compact set S ⊆ ℝ^{n_A + n_B} and θ₀ ∈ S such that Y₀ = G(jω_k, θ₀) U₀.

Besides eliminating modeling errors, Assumption 3 is a technical assumption needed to ensure consistency. Under the assumption that the sample means of the Fourier coefficients in (2), Ū(k), Ȳ(k), are complex circular Gaussian distributed, the following likelihood function g(Z|Z₀, θ) holds, where Z(k) = [Ȳ(k), Ū(k)] and Z₀(k) = [Y₀(k), U₀(k)]:

g(Z|Z₀, θ) = (1 / (π^{L−2} Π_{k=1}^{L/2−1} det C_Z(k))) exp(− Σ_{k=1}^{L/2−1} (Z(k) − Z₀(k))ᴴ C_Z^†(k) (Z(k) − Z₀(k)))    (11)

with

C_Z(k) = [ σ²_Ȳ(k)  σ²_ȲŪ(k) ; σ̄²_ȲŪ(k)  σ²_Ū(k) ]

and A† the pseudo-inverse of A. Application of Assumption 3 and further manipulations, Pintelon and Schoukens (2001), lead to the cost function (1). Due to the CLT, the likelihood function (11) is only valid asymptotically. The influence of this asymptotic result on the uncertainty bounds is investigated in the next section.
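As a numerical sanity check on the circular complex Gaussian density appearing in (11), one can verify by Monte Carlo that the expected negative log-density equals the Gaussian entropy log((πe)ⁿ det C), the quantity that reappears in Theorem 2. This is a sketch of our own for a single hypothetical 2×2 covariance C rather than the full record; the numbers in C are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical per-bin covariance C_Z(k) = [[s2_Y, s2_YU], [conj(s2_YU), s2_U]]
C = np.array([[2.0, 0.3 + 0.2j],
              [0.3 - 0.2j, 1.0]])
n = C.shape[0]
detC = np.linalg.det(C).real

# Draw circular complex Gaussian samples with covariance C
A = np.linalg.cholesky(C)
w = (rng.standard_normal((200_000, n)) + 1j * rng.standard_normal((200_000, n))) / np.sqrt(2)
Z = w @ A.T                                    # rows have covariance A A^H = C

# Negative log of the single-bin density in (11), with Z0 = 0
Ci = np.linalg.inv(C)
quad = np.einsum('ij,jk,ik->i', Z.conj(), Ci, Z).real
neg_log_g = quad + np.log(np.pi ** n * detC)

H_mc = neg_log_g.mean()                        # Monte Carlo entropy estimate
H_closed = np.log((np.pi * np.e) ** n * detC)  # closed-form Gaussian entropy
```

The agreement follows because 𝔼[zᴴC⁻¹z] = n for circular complex Gaussian z, so −𝔼 log g = n + log(πⁿ det C) = log((πe)ⁿ det C).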

3. ROBUSTNESS OF THE ASYMPTOTIC UNCERTAINTY OF THE ML ESTIMATOR FOR LTI SYSTEMS

3.1. Convergence of the log-likelihood in probability

Two sequences of random variables X_n, Y_n, n ≥ 0, are called asymptotically equivalent in probability, Billingsley (1995), if plim_{n→∞}(X_n − Y_n) = 0. If we want the uncertainty of the Gaussian ML estimator, as defined in section 2.2, to be robust against departures from the Gaussian assumption, we need to verify that the actual log-likelihood of the Fourier coefficients Ū(k), Ȳ(k) and the Gaussian log-likelihood are asymptotically equivalent. Let ℒ_G(Z_L|Z₀, θ) denote the Gaussian log-likelihood, i.e. exactly the logarithm of expression (11), and let ℒ(Z_L|Z₀, θ) be the actual log-likelihood of the Fourier coefficients. The notation Z_L indicates the discrete Fourier coefficients of a record of length L. Next, we formulate the definition of the entropy H(X), Dembo, et al. (1991), of a complex random vector X ∈ ℂ^{L−2} with probability density function (pdf) f(x):

H(X) = − ∫_{ℂ^{L−2}} f(x) log f(x) dx    (12)
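Definition (12) can be made concrete with the two noise distributions used in the numerical examples of section 4. The closed forms below are standard; the standard deviation 0.025 matches Example 1, and treating a single real component rather than a complex vector is our simplification. Among all distributions with a given variance the Gaussian attains the maximum entropy, which is why Gaussian-based uncertainty bounds are at worst conservative.

```python
import numpy as np

s = 0.025                       # standard deviation used in Example 1
a = s * np.sqrt(3.0)            # half-width of a zero-mean uniform with std s

h_uniform = np.log(2 * a)                            # entropy of U(-a, a)
h_gauss = 0.5 * np.log(2 * np.pi * np.e * s ** 2)    # entropy of N(0, s^2)
```

The gap h_gauss − h_uniform = 0.5·log(2πe/12) ≈ 0.18 nats is independent of the common variance s².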

The following is a technical assumption under which the log-likelihood functions ℒ(Z_L|Z₀, θ) and ℒ_G(Z_L|Z₀, θ) are asymptotically (L → ∞) equivalent.

Assumption 4  There exists an L₀ > 0 such that, for L ≥ L₀, the variable Z_L = [U_L, Y_L], with joint pdf f_L(z₁, z₂) with continuous partial derivatives, has a bounded entropy H(Z_L) < ∞. Furthermore, it is assumed that the variable Z_L has bounded first and second moments.

It is clear that Assumption 4 allows a very wide class of probability distributions for the time domain noise sequences n_u, n_y: any distribution having a uniformly bounded pdf has a finite entropy.

Theorem 2  Under Assumptions 1-4 the following holds: if H(Z_L) = log((eπ)^{L−2} det C_L) + o(L), then

plim_{L→∞} [ℒ(Z_L|Z₀, θ) − ℒ_G(Z_L|Z₀, θ)] = 0

where det C_L = Π_{k=1}^{L/2−1} det C_Z(k).

In Theorem 2, log((eπ)^{L−2} det C_L) is the entropy of the Gaussian discrete Fourier coefficients as described in section 2.2. In Barron (1986) and Csiszár (1975) it is shown, in L¹ (absolute mean) and in probability respectively, that Theorem 2 is an immediate consequence of a stronger version of the CLT. This stronger version of the CLT is investigated in the next subsection. We want to show Theorem 2 for the model parameters θ; therefore we need an additional assumption.

Assumption 5  There exists an L₀ > 0 such that, for L ≥ L₀, the log-likelihoods ℒ(Z_L|Z₀, θ), ℒ_G(Z_L|Z₀, θ) are continuous functions on the parameter space S.

Under Assumption 5 the log-likelihood ℒ(Z_L|Z₀, θ) is continuous on the parameter space S. Due to Assumption 3 the parameter space S is compact, and therefore the log-likelihoods ℒ(Z_L|Z₀, θ), ℒ_G(Z_L|Z₀, θ) are uniformly continuous on S. Convergence in probability of uniformly continuous log-likelihoods implies convergence in probability of the respective maxima, Söderström (1974). Let us define θ̂_L = argmax_{θ∈S} ℒ(Z_L|Z₀, θ) and θ̂_L^G = argmax_{θ∈S} ℒ_G(Z_L|Z₀, θ). Then


Theorem 3  Under Assumptions 1-5, the following holds:

plim_{L→∞} (θ̂_L − θ̂_L^G) = 0

Theorem 3 implies that the estimator θ̂_L^G is equivalent to the estimator θ̂_L; hence both estimators have the same asymptotic distribution. This implies that the asymptotic uncertainty of the estimator θ̂_L equals the asymptotic uncertainty of θ̂_L^G, which proves the robustness. To show Theorem 2, a stronger version of the CLT is needed for the Fourier coefficients. The next subsection shows that Assumption 4 implies that the discrete Fourier coefficients Z_L approach a complex circular Gaussian distribution in the sense of relative entropy, Kullback and Leibler (1951), such that H(Z_L) = log((eπ)^{L−2} det C_L) + o(L), and hence the robustness holds.

3.2. The CLT for discrete Fourier coefficients in Kullback-Leibler sense

Classically, the CLT is formulated in the sense of weak convergence of probability measures, Billingsley (1999). A slightly stronger notion of stochastic convergence, Kullback and Leibler (1951), is convergence in relative entropy. Let X_n, X be random variables on a probability space Ω. Further, denote the probability density function (pdf) of X_n as f_n and the pdf of X as f. The random variables X_n converge to X in relative entropy if

D(X_n, X) = ∫ f_n(x) log( f_n(x) / f(x) ) dx → 0  as n → ∞    (13)

Convergence in relative entropy, (13), is slightly stronger than weak convergence of probability measures, Pinsker (1964). Indeed, in Barron (1986) it is argued that convergence in the sense of (13) is the uniform version of weak convergence. In Csiszár (1975), convergence in relative entropy is shown to imply uniform set-wise convergence of probability distributions, whereas weak convergence is the point-wise convergence of probability distributions, Billingsley (1999).

Theorem 4  Let Z̄ = [Ū(1), Ȳ(1), …, Ū(L/2 − 1), Ȳ(L/2 − 1)]ᵀ be the complex random vector of the sample means of the discrete Fourier coefficients Z satisfying Assumptions 1-2 and 4. Then

D(Z̄, Φ) = o(L)

where Φ is a complex circular Gaussian random vector with the same first and second moments as Z̄.

The proof is found in Appendix A. It is easy to see that Theorem 4 immediately satisfies the conditions of Theorem 2.

4. NUMERICAL EXAMPLES

We shall distinguish two numerical examples: the first example illustrates the use of uniform noise; the second example uses a noise distribution which has an unbounded pdf and an infinite entropy. As anticipated by the theory, the first example has approximately the same uncertainty bounds on the transfer function estimate. The second example violates the assumptions of Theorem 4; however, no significant deviations in the uncertainty bounds could be detected within the Monte Carlo uncertainty. This means that the conditions of Theorem 4 are sufficient but not necessary. In both examples the Gaussian maximum likelihood estimator, as described in subsection 2.2, was used. The uncertainty of the transfer function estimates was computed via 1000 Monte Carlo simulations in example 1, and 10000 simulations were performed in example 2.

We use the simulation set-up of Fig. 1. For the true system G₀ we use a type-1 digital Chebyshev filter of order 2, with a stopband ripple of 10 dB and a cutoff frequency at 0.15 × f_s. We used a multisine excitation as the true input signal u₀,

u₀(t) = (1/√L) Σ_{k=1}^{L/2−1} cos(2π(k/L)t + φ_k)

where the phases φ_k are drawn at random from a uniform [0, 2π[ distribution, and rms(u₀) = −3 dB. In both examples the input and output signals are disturbed by iid noise such that the signal to noise ratio rms(u₀)/σ_noise ≈ 30 dB.

4.1. Example 1

For the noise at the input and output a uniform distribution was used with a standard deviation of s = 0.025, centred such that the mean equals 0. It is easy to check that a uniform distribution satisfies Assumption 4. In Fig. 2 the root mean squared error (RMSE) of the transfer function estimates is shown as a function of the frequency. The full black curve is the maximum likelihood estimate with Gaussian noise, as described in subsection 2.2; the gray curve is the maximum likelihood estimate where the noise followed the uniform distribution described above. As the theory suggests, the uncertainty on the transfer function is independent of the noise distribution within the Monte Carlo uncertainty.

Fig. 2  The root mean squared error of the transfer function estimates as a function of the frequency. The full black curve is the RMSE for the ML estimator with Gaussian noise; the gray curve is the RMSE for the ML estimator with uniform noise.
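A random-phase multisine of this form is simple to generate; the sketch below is our own. Note that with the 1/√L scaling written here the rms comes out near 0.5 (about −6 dB), so the −3 dB level quoted above evidently uses a different normalization constant, which we do not try to reproduce. The key property illustrated is that the rms does not depend on the number of excited lines.

```python
import numpy as np

rng = np.random.default_rng(2)

def multisine(L):
    """Random-phase multisine with L/2 - 1 excited bins, scaled by 1/sqrt(L)."""
    t = np.arange(L)
    k = np.arange(1, L // 2)
    phi = rng.uniform(0.0, 2.0 * np.pi, k.size)
    return np.cos(2 * np.pi * np.outer(k, t) / L + phi[:, None]).sum(axis=0) / np.sqrt(L)

# rms is (nearly) independent of L: each excited cosine contributes 1/(2L)
# to the mean square, and there are L/2 - 1 of them.
rms = {L: float(np.sqrt(np.mean(multisine(L) ** 2))) for L in (128, 512, 2048)}
```

By orthogonality of the excited cosines over a full period, the mean square equals (L/2 − 1)/(2L) exactly, regardless of the drawn phases, so the rms tends to 0.5 as L grows.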


4.2. Example 2

For the last example we modified a distribution from Barron (1986) into a very sharp version. The distribution of the input and output noise has the following pdf:

f₂(x) = (3/8) · [ |x| · log(1/|x|) · log log(1/|x|) · (log log log(1/|x|))^{7/4} ]⁻¹  for |x| ≤ e^{−e}

This distribution has the property that its entropy equals −∞, and if X_n is a sequence of iid random variables with pdf f₂(x), the entropy of the sum Σ_{i=1}^{N} X_i is unbounded. This distribution clearly violates Assumption 4. In Fig. 3 the RMSE of the transfer function estimates is shown as a function of the frequency. The full black curve is the maximum likelihood estimate, as described in subsection 2.2, where the noise followed a Gaussian distribution with zero mean and standard deviation 0.025; the gray curve is the ML estimate where the noise followed the distribution with pdf f₂(x), with the standard deviation standardized at 0.025.

Fig. 3  The root mean squared error of the transfer function estimates as a function of the frequency. The full black curve is the RMSE for the ML estimator with Gaussian noise; the gray curve is the RMSE for the ML estimator where the noise follows the pdf f₂(x).

Within the 95% confidence of the Monte Carlo simulation, the RMSE of the Gaussian ML and of the ML where the noise follows the pdf f₂(x) do not differ significantly. However, information theory suggests that if there were a significant difference, the predicted uncertainty bounds would become conservative, since the Gaussian distribution has the highest entropy among all distributions satisfying Assumption 4, Barron (1986), implying the largest confidence regions.

5. CONCLUSIONS

In this paper, we have shown that if the DFT coefficients of the disturbing noise follow a distribution with finite entropy, the actual likelihood can be consistently approximated by the Gaussian likelihood in which the DFT coefficients of the noise are independent over the frequencies. This implies that the classical uncertainty bounds for the Gaussian ML estimator of transfer function estimates are robust against deviations from the Gaussian assumption for a very wide class of distributions. We also gave an example of a distribution having infinite entropy but satisfying the classical CLT. Fortunately, no significant deviations from the uncertainty bounds predicted by the Gaussian ML could be detected, showing that the conditions for the robustness are not restrictive.

ACKNOWLEDGEMENT

This research was funded in part by the Methusalem grant of the Flemish Government (METH-1), the Belgian Program on Interuniversity Poles of Attraction initiated by the Belgian State, Prime Minister's Office, Science Policy programming (IUAP VI/4), and the research council of the VUB (OZR).

APPENDIX A. SKETCH OF THE PROOF OF THEOREM 4

We begin by introducing some convenient notation. Following Assumption 2, the discrete Fourier coefficients at frequency bin k of the input and output white noise sources are given by E_u(k), E_y(k). We define the random vector

E_L = [E_u(1), E_y(1), ⋯, E_u(L/2 − 1), E_y(L/2 − 1)]ᵀ,

where ᵀ denotes the transpose operator and the DC and Nyquist bins are excluded. We denote the joint probability density function (pdf) of E_L as f_L(z), with z ∈ ℂ^{L−2}, since E_L is a complex random vector. Further, Assumption 2 implies that 𝔼[E_L] = 0, and Assumption 1 implies that E_u(k), E_y(k) are mutually correlated but that E_x(k), E_x(l) are uncorrelated for k ≠ l, x = u, y. We denote

C(k) = Cov([E_u(k), E_y(k)]ᵀ) = [ σ²_u(k)  σ²_uy(k) ; σ̄²_uy(k)  σ²_y(k) ]

The covariance matrix of the random vector E_L equals C_L = diag(C(1), ⋯, C(L/2 − 1)), where diag(A₁, A₂) is a block diagonal matrix with A₁ and A₂ as matrix blocks. Due to the classical CLT, we define the asymptotic density of E_L as

φ_L(z) = (1 / (π^{L−2} det C_L)) exp(−zᴴ C_L^† z)

Let us define a random variable Φ_L on the same probability space as E_L with joint pdf φ_L, such that the random variables E_L, Φ_L are independent. Due to Assumption 2, it is sufficient to prove that D(E_L, Φ_L) → 0 for L → ∞. This will not be proven directly for the Kullback-Leibler distance D(E_L, Φ_L), but by means of the Fisher information. The Fisher information I(X) of a complex random vector X with pdf f(z) is defined as, Dembo, et al. (1991),

I(X) = ∫_{ℂ^{L−2}} (∇_z f · (∇_z f)ᴴ) / f(z) dz    (14)

Remark: note that ∇_z f in (14) indicates the complex vector of partial derivatives with respect to every complex component z_i. The notion of differentiability used is differentiability in ℝ^{2L−4}, where ℂ is identified with the vector space ℝ². Note that Assumption 4 implies that the Fisher information I(E_L(k)) is finite. De Bruijn's identity, Dembo, et al. (1991), formulates the relationship between the Fisher information, (14), and the Kullback-Leibler distance, (13). In Dembo, et al. (1991) the identity is formulated for real random vectors


where every component is identically distributed and uncorrelated. We extend the identity of De Bruijn to complex Gaussian random vectors with covariance matrix C_L. Before formulating De Bruijn's identity for complex random vectors, we give a useful property of the Fisher information of complex random vectors.

Lemma 1  Let X ∈ ℂ^p be circular complex distributed for p ≥ 1. Then the following property holds, for a ∈ ℂ:

I(aX) = I(X) / |a|²

Proof. See Barbé (2008). ∎

We define Ẽ_L, Φ̃_L as the standardized versions (covariance matrix equal to the identity matrix) of the random vectors E_L, Φ_L. The following lemma is an extension of the known identity of De Bruijn.

Lemma 2  Under the conditions of Lemma 1, we obtain

D(E_L, Φ_L) = ∫_0^1 [ I(√t Ẽ_L + √(1−t) Φ̃_L) − 4(L−2) ] / (4t) dt

Proof. See Barbé (2008). ∎

The proof of Lemma 2, see Barbé (2008), further reveals that

D(E_L, Φ_L) ≤ I(Ẽ_L)/4 − (L−2)    (15)

Next, we show that I(Ẽ_L) converges to 4(L−2) as L tends to infinity. As explained at the beginning of the proof, the random variables E_x(k), E_x(l) are uncorrelated for k ≠ l, x = u, y, but the random variables E_u(k), E_y(k) are mutually correlated through the covariance matrix C(k). Hence, Assumption 1 implies that there exist iid sequences e_x = [e_x(1), e_x(2), ⋯, e_x(L−1)], for x = u, y, such that the DFT at frequency bin k of e_x equals E_x(k). First, we show that I(Ẽ_L) has a limit for L → ∞. To prove this result, we need two lemmas. The next lemma shows that the information of a random vector is bounded by the sum of the information in every component.

Lemma 3  Let E_L be a complex random vector as defined above. Then the following inequality holds:

I(E_L) ≤ Σ_{n=1}^{L/2−1} I(E_u(n)) + Σ_{n=1}^{L/2−1} I(E_y(n))

Proof. See Dembo, et al. (1991). ∎

The next lemma is an inequality closely related to the Cramér-Rao bound.

Lemma 4  Let E_L be a complex random vector as defined above. Then the following inequality holds:

I(Ẽ_L) ≥ 4(L−2)

The proof is a straightforward consequence of Theorem 20 in Dembo, et al. (1991).

It remains to show that I(Ẽ_x(k)) tends to 4 as L tends to infinity, to obtain that I(Ẽ_L) − 4(L−2) → 0, completing the proof of the theorem. This is shown in Barbé (2008) by proving that the sequence I(Ẽ_L) is sub-additive in L. A real sequence x_n is called sub-additive if, for any m, n ≥ 0, x_{n+m} ≤ x_n + x_m. Fekete's lemma, Fekete (1923), then implies that lim_{n→∞} x_n/n = inf_n x_n/n. This completes the proof of Theorem 4.

REFERENCES

Barbé, K. (2008). The Fisher information for circular complex Gaussian random vectors. ELEC Internal Note 291008.
Barron, A. (1986). Entropy and the central limit theorem. The Annals of Probability 14(1), 336-342.
Billingsley, P. (1999). Convergence of Probability Measures. Wiley, New York.
Billingsley, P. (1995). Probability and Measure. Wiley, New York.
Brillinger, D. (1981). Time Series: Data Analysis and Theory. McGraw-Hill, New York.
Csiszár, I. (1975). I-divergence geometry of probability distributions and minimization problems. Annals of Probability 3, 146-158.
Dembo, A., Cover, T. and Thomas, J. (1991). Information theoretic inequalities. IEEE Transactions on Information Theory 37(6), 1501-1518.
Eriksson, J. and Koivunen, V. (2006). Complex random vectors and ICA models: identifiability, uniqueness and separability. IEEE Transactions on Information Theory 52(3).
Fekete, M. (1923). Über die Verteilung der Wurzeln bei gewissen algebraischen Gleichungen mit ganzzahligen Koeffizienten. Mathematische Zeitschrift 17, 228-249.
Lukacs, E. (1975). Stochastic Convergence, 2nd edition. Academic Press, New York.
Kullback, S. and Leibler, R. (1951). On information and sufficiency. Annals of Mathematical Statistics 22, 79-86.
Pintelon, R. and Hong, M. (2007). Asymptotic uncertainty of transfer function estimates using non-parametric noise models. IEEE Transactions on Instrumentation and Measurement 56(6), 2599-2605.
Pintelon, R. and Schoukens, J. (2001). System Identification: A Frequency Domain Approach. IEEE Press, New York.
Pintelon, R., Schoukens, J. and Vandersteen, G. (1997). Frequency domain system identification using arbitrary signals. IEEE Transactions on Automatic Control 42(12), 1717-1720.
Pinsker, M. (1964). Information and Information Stability of Random Variables. Holden Day, San Francisco.
Schoukens, J. and Pintelon, R. (1999). Study of conditional ML estimators in time and frequency domain system identification. Automatica 35(1), 91-100.
Schoukens, J., Pintelon, R., Vandersteen, G. and Guillaume, P. (1997). Frequency domain system identification using non-parametric noise models estimated from a small number of data sets. Automatica 33(6), 1073-1086.
Söderström, T. (1974). Convergence properties of the generalized least squares identification method. Automatica 10(6), 617-626.
