Proceedings of the 15th IFAC Symposium on System Identification Saint-Malo, France, July 6-8, 2009
Asymptotic properties of transfer function estimates using non-parametric noise models under relaxed assumptions

Kurt Barbé, Rik Pintelon and Johan Schoukens
Dept. ELEC, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium; e-mail: [email protected]

Abstract: It is well known that, under very general assumptions, the discrete Fourier coefficients of filtered noise are asymptotically independent circular complex Gaussian distributed, based on a generalized central limit theorem (CLT). The standard results on the consistency and the asymptotic uncertainty of the frequency domain Errors-in-Variables (EIV) estimator are derived under the assumption that the Fourier coefficients are circular complex Gaussian distributed and independent over the different frequency bins. In this paper, we study the influence of this assumption on the consistency and the efficiency of the frequency domain EIV estimator. We show that a slightly stronger form of the CLT is needed to preserve the classically obtained uncertainty bounds if independent complex Gaussian Fourier coefficients are not assumed. Our analysis reveals that the classically derived asymptotic uncertainty bounds are valid for a very wide class of distributions.
1. INTRODUCTION
Frequency domain system identification offers a tool to identify the transfer function G_0(jω_k) of a linear time-invariant (LTI) dynamic system, as in Fig. 1, measured at angular frequencies ω_k, k = 0, ⋯, L − 1, where j = √−1.

Fig. 1 The simulation set-up, where u_0 and y_0 are the respective input/output signals, u and y the measured input/output and n_u and n_y the (filtered) input/output noise.

The identification of such systems is generally formulated as a weighted least squares problem, Pintelon and Schoukens (2001). Besides the information in the input and output signals, a frequency dependent weighting matrix (depending on the noise characteristics of the input/output errors) is used to enhance the statistical properties of the analysis and the identification of the transfer function (systematic errors are removed, the uncertainty is reduced). To estimate the transfer function G_0(jω_k), a parametric estimate G(jω_k, θ) is considered. The Errors-In-Variables (EIV) estimator θ̂ of the parameters θ is found by minimizing the following quadratic cost function with respect to the parameters θ:

K_L(θ, Z) = Σ_{k=1}^{L/2−1} |Y(k) − G(jω_k, θ) U(k)|² / σ_ε²(k)    (1)

with U(k), Y(k) the sample means of the discrete Fourier transform (DFT) spectra of M periods of the steady state response to a periodic input, A* the complex conjugate of A, and where

σ_ε²(k) = σ_Y²(k) + |G(jω_k, θ)|² σ_U²(k) − 2 Re(σ_YU²(k) G*(jω_k, θ))

with σ_U²(k), σ_Y²(k), σ_YU²(k) the (co)variances at frequencies ω_k. Letting U^[i](k) and Y^[i](k) denote the Fourier coefficients at frequency line k of the i-th measured period, the sample means of the Fourier coefficients at frequency line k are defined as

U(k) = (1/M) Σ_{i=1}^{M} U^[i](k),   Y(k) = (1/M) Σ_{i=1}^{M} Y^[i](k)    (2)

978-3-902661-47-0/09/$20.00 © 2009 IFAC
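As an illustration, the sample means (2) and the cost (1) can be sketched in a few lines of NumPy; the helper names, the 1/√L DFT scaling, and the toy data below are assumptions of this sketch, not the paper's code.

```python
import numpy as np

def sample_mean_dft(x, M, L):
    """Average the per-period DFT spectra of M periods of L samples each, eq. (2)."""
    X = np.fft.fft(x.reshape(M, L), axis=1) / np.sqrt(L)  # per-period DFT coefficients
    return X.mean(axis=0)                                 # sample mean over the M periods

def eiv_cost(U, Y, G, var_U, var_Y, cov_YU):
    """Quadratic EIV cost (1) over the excited lines k = 1 .. L/2 - 1.
    U, Y: sample-mean spectra; G: model G(j*omega_k, theta) on the same grid."""
    var_eps = var_Y + np.abs(G)**2 * var_U - 2*np.real(cov_YU * np.conj(G))
    k = np.arange(1, len(U)//2)
    return np.sum(np.abs(Y[k] - G[k]*U[k])**2 / var_eps[k])
```

For a perfect model (Y(k) = G(jω_k, θ)U(k)) the cost is exactly zero, which is a quick sanity check on an implementation.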
Many properties of the cost function (1) have been studied in detail. One can show for filtered independently and identically distributed (iid) noise n_u, n_y that if the cost function (1) is smooth in the parameters θ and the parameter space is compact, the EIV estimator is strongly consistent, Pintelon and Schoukens (2001). Asymptotically (L → ∞), in the case of no modeling errors, the EIV estimator can be written as

θ̂ = θ_0 + δθ + b_θ    (3)

with δθ = −K_L''(θ_0|Z)^{−1} K_L'(θ_0|Z)^T, where b_θ contains the bias contribution, which vanishes proportionally to L^{−1}. The asymptotic covariance matrix of θ̂ is given exactly by, Pintelon and Schoukens (2001),

Cov(δθ) = (2/(L−2)) K_L''(θ_0|Z)^{−1} q_L(θ_0) K_L''(θ_0|Z)^{−1}    (4)

q_L(θ_0) = ((L−2)/2) E[K_L'(θ_0|Z)^T K_L'(θ_0|Z)]

The computation of q_L(θ_0) in equation (4) requires knowledge of the third and fourth moments of the disturbing noise, which are in general unknown. To overcome this problem, it is usually postulated that the Fourier coefficients,
1139
10.3182/20090706-3-FR-2004.0122
N_u(k), N_y(k), of the disturbing noise n_u, n_y are circular complex Gaussian distributed and that N_x(k), N_x(l), for k ≠ l and x = u, y, are independent. It is well known that this assumption is only valid asymptotically (L → ∞), Brillinger (1981). For finite record lengths L, one can show, Pintelon and Schoukens (2001), that if the disturbing noise n_x is a filtered iid sequence, the DFT coefficients of n_x satisfy

N_x(k) = H_x(k) E_x(k) + T_x(k)

where H_x(k), E_x(k) and T_x(k) denote, respectively, the DFT coefficients of the noise filter, the underlying iid noise source, and the leakage contribution due to the non-periodicity of the noise. Furthermore, E_x(k) is uncorrelated over k, the leakage contribution vanishes with probability one proportionally to L^{−1/2}, and E[E_x(k) T_x*(k)] = O(1/L). Therefore, one can conclude that the leakage contribution T_x(k) has no influence on the asymptotic covariance matrix (4). Let us denote

M_1 = (4/(L−2)) Re(J_1^H J_1)
M_2 = (4/(L−2)) Re(J_2^H J_2)    (5)

where the (k, l)-th elements of the matrices J_1, J_2 are given respectively by

[J_1]_{k,l} = (U_0(k)/σ_ε(k)) ∂G(ω_k, θ)/∂θ_l |_{θ=θ_0}

[J_2]_{k,l} = (√(σ_U²(k) σ_Y²(k) − |σ_YU²(k)|²) / σ_ε²(k)) ∂G(ω_k, θ)/∂θ_l |_{θ=θ_0}

with U_0(k) the discrete Fourier coefficients of the true periodic input. Assuming N_u(k), N_y(k) to be circular complex Gaussian distributed and independent over the frequency bins k, together with (5), allows simplifying expression (4) to, Pintelon and Hong (2007),

Cov(δθ) = (2/(L−2)) M_1^{−1} (M_1 + M_2) M_1^{−1}    (6)

The assumption that N_u(k), N_y(k) are circular complex Gaussian distributed and independent over the frequency lines is valid asymptotically (L → ∞) by virtue of the central limit theorem. Unfortunately, this does not imply convergence of the uncertainty bounds, Lukacs (1975). It can be expected that this influences the efficiency bounds. This is potentially dangerous, since the user receives no warning: the distribution of the Fourier coefficients looks Gaussian. Indeed, we show that to preserve the asymptotic uncertainty of the EIV estimator θ̂, a stronger version of the central limit theorem is needed. However, one of the conclusions of the analysis is that the uncertainty bounds are guaranteed for a very wide class of distributions. This paper is organized as follows: section 2 formalizes the excitation signal and defines the Gaussian likelihood function for the EIV set-up, Fig. 1; section 3 studies a stronger form of the central limit theorem (CLT) for discrete Fourier coefficients under which the classical uncertainty bounds, Pintelon and Hong (2007), hold; section 4 provides numerical examples; and conclusions are formulated in section 5.

2. ASSUMPTIONS AND SET-UP
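Before the assumptions are formalized, the simplified covariance expression (6) built from M_1 and M_2 in (5) can be sketched numerically; the function name, the Jacobian input `dG`, and the placeholder variances are assumptions of this sketch.

```python
import numpy as np

def eiv_cov(U0, dG, var_U, var_Y, cov_YU, G, L):
    """Cov(delta_theta) = 2/(L-2) * M1^{-1} (M1 + M2) M1^{-1}, eqs. (5)-(6).
    U0, G, var_U, var_Y, cov_YU: arrays over the K = L/2 - 1 excited lines;
    dG: (K, n_theta) array of partial derivatives dG/dtheta_l at theta_0."""
    var_eps = var_Y + np.abs(G)**2 * var_U - 2*np.real(cov_YU * np.conj(G))
    J1 = (U0 / np.sqrt(var_eps))[:, None] * dG
    J2 = (np.sqrt(var_U*var_Y - np.abs(cov_YU)**2) / var_eps)[:, None] * dG
    M1 = 4/(L - 2) * np.real(J1.conj().T @ J1)
    M2 = 4/(L - 2) * np.real(J2.conj().T @ J2)
    M1inv = np.linalg.inv(M1)
    return 2/(L - 2) * M1inv @ (M1 + M2) @ M1inv
```

With unit variances, no input/output correlation, and a constant unit Jacobian, every quantity in (5)-(6) can be checked by hand, which makes this a convenient unit test.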
2.1. The excitation signal

In Fig. 1 the set-up is defined. We consider a normalized periodic excitation u_0, such that the rms value does not depend on the number of excited frequency lines L/2 − 1. Increasing the record length L implies that the number of excited frequency lines increases accordingly with L. The steady state response to a periodic input u_0 is a periodic output signal y_0. In order to avoid transient effects, the measurements can only start when the transients become small with respect to the noise level. Leakage errors are avoided by measuring an integer number of periods M of the periodic signal. We consider L measured points per period, resulting in a total record of N = ML data points. It is assumed that both the input and output signals u_0, y_0 are disturbed by filtered iid (independent identically distributed) noise n_u, n_y, where n_u, n_y are allowed to be mutually correlated. Formally,

Assumption 1 For the measured signals u, y, with period length L, the following relation holds: u = u_0 + n_u and y = y_0 + n_y, where n_u, n_y denote the (possibly correlated) input/output noise respectively, satisfying n_u = h_u ∗ e_u and n_y = h_y ∗ e_y. Here ∗ denotes convolution, e_u, e_y are two (mutually correlated) iid sequences with zero mean and unit variance, and h_u, h_y are the impulse responses of the input/output noise filters respectively. Furthermore, it is assumed that h_u, h_y satisfy

Σ_{t=0}^{∞} t² h_x²(t) < ∞   with x = u, y
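A minimal sketch of generating noise that satisfies Assumption 1 (mutually correlated iid sources passed through summable filters); the correlation ρ = 0.6 and the exponentially decaying impulse responses are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 4096
rho = 0.6                                        # assumed correlation between e_u and e_y
w = rng.standard_normal((2, N))
e_u = w[0]                                       # iid, zero mean, unit variance
e_y = rho * w[0] + np.sqrt(1 - rho**2) * w[1]    # correlated with e_u, still unit variance

h_u = 0.8 ** np.arange(50)                       # assumed summable impulse responses
h_y = 0.5 ** np.arange(50)
n_u = np.convolve(e_u, h_u)[:N]                  # n_u = h_u * e_u, as in Assumption 1
n_y = np.convolve(e_y, h_y)[:N]
```

Any zero-mean unit-variance iid source works here; the Gaussian draw is only a convenience.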
We define the discrete Fourier transform (DFT) of the i-th period at frequency bin k as

U^[i](k) = (1/√L) Σ_{t=0}^{L−1} u((i − 1)L + t) e^{−2πjkt/L}    (7)

We denote by U_0(k), H_u(k) the true Fourier coefficient and the transfer function at frequency bin k of the signal u_0 and the noise filter h_u respectively. Further, we let E_u^[i](k) be the discrete Fourier coefficient at frequency bin k of the sequence e_u((i − 1)L + t), t = 0, ⋯, L − 1. Then the following holds, Pintelon and Schoukens (2001):

Theorem 1 Under Assumption 1 the following holds for every period i:

U^[i](k) = U_0(k) + H_u(k) E_u^[i](k) + O_MS(1/√L)    (8)

where the convergence in (8) is in mean squared sense, Billingsley (1995). A similar expression holds for the output Fourier coefficients Y^[i](k).
It is shown in Pintelon et al. (1997) that the term O_MS(1/√L) can be interpreted as transient effects of the noise filter due to initial and final conditions. In the rest of this paper, we assume that the record length L is large enough that the noise transients are small with respect to the Fourier coefficients of the signal.

Assumption 2 The discrete Fourier transform at frequency bin k of period i of the signal u results in

U^[i](k) = U_0(k) + H_u(k) E_u^[i](k)    (9)

A similar expression holds for the signal y.

These assumptions are the classical assumptions of frequency domain system identification, Pintelon and Schoukens (2001), without assuming that the Fourier coefficients U^[i](k) are independent (for k ≠ l) complex circular Gaussian distributed with mean U_0(k) and variance |H_u(k)|². The influence of removing the transient term in (8) was already discussed in Schoukens and Pintelon (1999).

2.2. The Gaussian likelihood function for LTI systems

We estimate the transfer function G_0(jω_k) in a parametric way by introducing a parametric model G(jω_k, θ), a rational form in the parameters θ:

G(jω_k, θ) = B(Ω_k, θ)/A(Ω_k, θ) = (Σ_{l=0}^{n_B} b_l Ω_k^l) / (Σ_{l=0}^{n_A} a_l Ω_k^l)    (10)

where A and B are polynomials. Furthermore, it is assumed that there are no model errors:

Assumption 3 There exists a compact set Θ ⊂ ℝ^{n_A + n_B} and θ_0 ∈ Θ such that Y_0(k) = G(jω_k, θ_0) U_0(k).

Besides eliminating modeling errors, Assumption 3 is a technical assumption needed to ensure consistency. Under the assumption that the sample-mean Fourier coefficients U(k), Y(k) in (1) are complex circular Gaussian distributed, the following likelihood function f(Z|Z_0, θ) holds, where Z(k) = [U(k), Y(k)] and Z_0(k) = [U_0(k), Y_0(k)]:

f(Z|Z_0, θ) = (1/π^{L−2}) (1/Π_{k=1}^{L/2−1} det C_k) exp(−Σ_{k=1}^{L/2−1} (Z(k) − Z_0(k))^H C_k^+ (Z(k) − Z_0(k)))    (11)

with

C_k = [ σ_U²(k)    σ_UY²(k) ;  σ_UY²*(k)   σ_Y²(k) ]

and A^+ indicating the pseudo-inverse of A. Application of Assumption 3 and further manipulations, Pintelon and Schoukens (2001), lead to the cost function (1). Due to the CLT, the likelihood function (11) is only valid asymptotically. The influence of this asymptotic result on the uncertainty bounds is investigated in the next section.
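The rational model (10) can be evaluated on a frequency grid in a few lines; the discrete-time choice Ω_k = e^{jω_k} and the low-to-high coefficient ordering are assumptions of this sketch.

```python
import numpy as np

def G_model(a, b, omega):
    """Rational parametric model (10), G = B(Omega)/A(Omega), on a frequency grid.
    a, b hold the coefficients a_l, b_l from low to high order; Omega_k = exp(j*omega_k)
    is assumed (a discrete-time generalized frequency)."""
    Om = np.exp(1j * omega)
    # np.polyval expects highest-order coefficient first, hence the reversal
    return np.polyval(b[::-1], Om) / np.polyval(a[::-1], Om)
```

For example, `G_model(np.array([1.0, -0.5]), np.array([0.5]), omega)` evaluates the first-order model 0.5/(1 − 0.5Ω), which equals 1 at ω = 0.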
3. ROBUSTNESS OF THE ASYMPTOTIC UNCERTAINTY OF THE ML ESTIMATOR FOR LTI SYSTEMS

3.1. Convergence of the log-likelihood in probability
Two sequences of random variables X_n, Y_n for n ≥ 0 are called asymptotically equivalent in probability, Billingsley (1995), if plim_{n→∞} (X_n − Y_n) = 0. If we want the uncertainty of the Gaussian ML estimator, as defined in section 2.2, to be robust against departures from the Gaussian assumption, we need to verify that the actual log-likelihood of the Fourier coefficients U(k), Y(k) and the Gaussian log-likelihood are asymptotically equivalent. Let ℓ_G(Z_L|Z_0, θ) denote the Gaussian log-likelihood, which is exactly the logarithm of expression (11), and let ℓ(Z_L|Z_0, θ) be the actual log-likelihood of the Fourier coefficients. The notation Z_L indicates the discrete Fourier coefficients of a record of length L. Next, we formulate the definition of the entropy, Dembo et al. (1991), H(Z) of a complex random vector Z ∈ ℂ^{L−2} with probability density function (pdf) f(x):

H(Z) = −∫_{ℂ^{L−2}} f(x) log f(x) dx    (12)

The following is a technical assumption under which the log-likelihood functions ℓ(Z_L|Z_0, θ) and ℓ_G(Z_L|Z_0, θ) are asymptotically (L → ∞) equivalent.

Assumption 4 There exists an L_0 > 0 such that for L ≥ L_0 the variable Z_L = (U_L, Y_L), with joint pdf f_L(z_1, z_2) having continuous partial derivatives, has a bounded entropy H(Z_L) < ∞. Furthermore, it is assumed that the variable Z_L has bounded first and second moments.

It is clear that Assumption 4 allows a very wide class of probability distributions for the time domain noise sequences n_u, n_y: any distribution having a uniformly bounded pdf has a finite entropy.

Theorem 2 Under Assumptions 1-4 the following holds: if

H(Z_L) = log((eπ)^{L−2} det C_L) + o(L)

then

plim_{L→∞} (ℓ(Z_L|Z_0, θ) − ℓ_G(Z_L|Z_0, θ)) = 0

where det C_L = Π_{k=1}^{L/2−1} det C_k.

In Theorem 2, log((eπ)^{L−2} det C_L) is the entropy of the Gaussian discrete Fourier coefficients as described in section 2.2. In Barron (1986) and Csiszár (1975) it is shown, in L_1 (absolute mean) and in probability respectively, that Theorem 2 is an immediate consequence of a stronger version of the CLT. This stronger version of the CLT is investigated in the next subsection. We want to show Theorem 2 for the model parameters θ; therefore we need an additional assumption.

Assumption 5 There exists an L_0 > 0 such that, for L ≥ L_0, the log-likelihoods ℓ(Z_L|Z_0, θ), ℓ_G(Z_L|Z_0, θ) are continuous functions on the parameter space Θ.

Under Assumption 5 the log-likelihood ℓ(Z_L|Z_0, θ) is continuous on the parameter space Θ. Due to Assumption 3 the parameter space Θ is compact, and therefore the log-likelihoods ℓ(Z_L|Z_0, θ), ℓ_G(Z_L|Z_0, θ) are uniformly continuous on Θ. Convergence in probability of uniformly continuous log-likelihoods implies convergence in probability of the respective maxima, Söderström (1974). Let us define θ̂_L = argmax_{θ∈Θ} ℓ(Z_L|Z_0, θ) and θ̂_L^G = argmax_{θ∈Θ} ℓ_G(Z_L|Z_0, θ); then
Theorem 3 Under Assumptions 1-5, the following holds:

plim_{L→∞} (θ̂_L − θ̂_L^G) = 0

Theorem 3 implies that the estimator θ̂_L^G is equivalent to the estimator θ̂_L; hence both estimators have the same asymptotic distribution. This implies that the asymptotic uncertainty of the estimator θ̂_L equals the asymptotic uncertainty of θ̂_L^G, which proves the robustness. To show Theorem 2, a stronger version of the CLT is needed for the Fourier coefficients. The next subsection shows that Assumption 4 implies that the discrete Fourier coefficients Z_L approach a complex circular Gaussian distribution in the sense of relative entropy, Kullback and Leibler (1951), such that H(Z_L) = log((eπ)^{L−2} det C_L) + o(L), and hence the robustness holds.

3.2. The CLT for discrete Fourier coefficients in Kullback-Leibler sense
Classically, the CLT is formulated in the sense of weak convergence of probability measures, Billingsley (1999). A slightly stronger notion of stochastic convergence, Kullback and Leibler (1951), is convergence in relative entropy. Let X_n, X be random variables on a probability space Ω. Further, denote the probability density function (pdf) of X_n as f_n and the pdf of X as f. The random variables X_n converge to X in relative entropy if

D(X_n, X) = ∫ f_n(x) log(f_n(x)/f(x)) dx → 0   if n → ∞    (13)

Convergence in relative entropy, (13), is slightly stronger than weak convergence of probability measures, Pinsker (1964). Indeed, in Barron (1986) it is argued that convergence in the sense of (13) is the uniform version of weak convergence. In Csiszár (1975), convergence in relative entropy is shown to imply uniform set-wise convergence of probability distributions, whereas weak convergence is the point-wise convergence of probability distributions, Billingsley (1999).

Theorem 4 Let Z = [U(1), Y(1), …, U(L/2 − 1), Y(L/2 − 1)]^T be the complex random vector of the sample means of the discrete Fourier coefficients, satisfying Assumptions 1-2 and 4. Then

D(Z, N) = o(L)

where N is a complex circular Gaussian random vector with the same first and second moments as Z.

The proof is found in Appendix A. It is easy to see that Theorem 4 immediately satisfies the conditions of Theorem 2.

4. NUMERICAL EXAMPLES

We distinguish two numerical examples: the first illustrates the use of uniform noise; the second uses a noise distribution with an unbounded pdf and unbounded entropy, which violates Assumption 4. As anticipated by the theory, the first example has approximately the same uncertainty bounds on the transfer function estimate as the Gaussian case. The second example violates the assumptions of Theorem 4; however, no significant deviations in the uncertainty bounds could be detected within the Monte Carlo uncertainty. This means that the conditions of Theorem 4 are sufficient but not necessary. In both examples the Gaussian maximum likelihood estimator, as described in subsection 2.2, was used. The uncertainty of the transfer function estimates was computed via 1000 Monte Carlo simulations in example 1 and 10000 simulations in example 2. We use the simulation set-up of Fig. 1. For the true system G_0, we use a type-1 digital Chebyshev filter of order 2, a stopband ripple of 10 dB, and a cutoff frequency at 0.15 × f_s. We used a multisine excitation as the true input signal u_0:

u_0(t) = (1/√L) Σ_{k=1}^{L/2} cos(2πkt/L + φ_k)

where the phases φ_k are drawn at random from a uniform [0, 2π[ distribution, and rms(u_0) = −3 dB. In both examples the input and output signals are disturbed by iid noise such that the signal-to-noise ratio rms(u_0)/σ_noise ≈ 30 dB.

4.1. Example 1

For the noise at the input and the output, a uniform distribution was used with a standard deviation of σ = 0.025, centered such that the mean equals 0. It is easy to check that a uniform distribution satisfies Assumption 4. In Fig. 2 the root mean squared error (RMSE) of the transfer function estimates is shown as a function of the frequency. The full black curve is the maximum likelihood estimate, as described in subsection 2.2, with Gaussian noise; the gray curve is the maximum likelihood estimate where the noise followed the uniform distribution described above. As the theory suggests, the uncertainty on the transfer function is independent of the noise distribution within the Monte Carlo uncertainty.

Fig. 2 The root mean squared error of the transfer function estimates as a function of the frequency. The full black curve is the RMSE for the ML estimator with Gaussian noise, the gray curve is the RMSE for the ML estimator with uniform noise.
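A condensed sketch of the Example 1 set-up: a random-phase multisine through an LTI system, uniform measurement noise, and sample means over M periods. A first-order system stands in for the paper's order-2 Chebyshev filter, and a simple nonparametric FRF estimate replaces the full ML fit; both are simplifying assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
L, M, s = 256, 16, 0.025                      # period length, periods, noise std
t = np.arange(L)
phi = rng.uniform(0, 2*np.pi, L//2)           # random phases, uniform on [0, 2*pi[
k = np.arange(1, L//2 + 1)
u0 = np.cos(2*np.pi*np.outer(t, k)/L + phi).sum(axis=1) / np.sqrt(L)  # multisine

w = 2*np.pi*np.arange(L)/L
G0 = 0.5 / (1 - 0.5*np.exp(-1j*w))            # stand-in true system (assumption)
U0 = np.fft.fft(u0) / np.sqrt(L)
Y0 = G0 * U0                                  # exact periodic steady-state response

def noisy_mean(X0):
    """Sample mean over M periods with iid uniform noise of std s on each period."""
    E = rng.uniform(-np.sqrt(3), np.sqrt(3), (M, L)) * s
    return X0 + (np.fft.fft(E, axis=1) / np.sqrt(L)).mean(axis=0)

Ghat = noisy_mean(Y0) / noisy_mean(U0)        # nonparametric FRF at excited lines
```

Averaging over M periods shrinks the noise on the spectra by 1/√M, which is why the FRF estimate at any excited line lands close to G_0.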
4.2. Example 2

For the last example we modified a distribution in Barron (1986) to a very sharp version. The distribution of the input and output noise has the following pdf:

g_2(x) = 1/(x log(1/x) (log log(1/x))²)   for 0 < x ≤ e^{−e}

This distribution has the property that its entropy equals −∞, and if X_i is a sequence of iid random variables with pdf g_2(x), the entropy of the sum Σ_{i=1}^{n} X_i is unbounded. This distribution clearly violates Assumption 4. In Fig. 3 the RMSE of the transfer function estimates is shown as a function of the frequency. The full black curve is the maximum likelihood estimate, as described in subsection 2.2, where the noise followed a Gaussian distribution with zero mean and standard deviation 0.025; the gray curve is the ML estimate where the noise followed a distribution with pdf g_2(x), standardized to a standard deviation of 0.025.

Fig. 3 The root mean squared error of the transfer function estimates as a function of the frequency. The full black curve is the RMSE for the ML estimator with Gaussian noise, the gray curve is the RMSE for the ML estimator where the noise follows the pdf g_2(x).
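If the pdf is taken to be of the iterated-logarithm form g_2(x) = 1/(x log(1/x) (log log(1/x))²) on (0, e^{−e}] (the exact constants of the paper's pdf are a reconstruction, hence an assumption), its CDF has the closed form F(x) = 1/log log(1/x), so inverse-transform sampling is straightforward; `g2_cdf` and `g2_sample` are hypothetical helper names.

```python
import numpy as np

def g2_cdf(x):
    """CDF of the assumed pdf g2(x) = 1/(x*log(1/x)*(loglog(1/x))^2), 0 < x <= e^{-e}."""
    return 1.0 / np.log(np.log(1.0/x))

def g2_sample(p):
    """Inverse-CDF sampling; p in (0, 1]. For very small p the result
    underflows to 0.0 in double precision, reflecting the extreme left tail."""
    return np.exp(-np.exp(1.0/p))
```

Feeding uniform draws on (0, 1] through `g2_sample` yields samples from this heavy-left-tailed, infinite-entropy distribution, which can then be centered and rescaled to the desired standard deviation.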
Within the 95% confidence of the Monte Carlo simulation, the RMSE of the Gaussian ML and of the ML where the noise follows the pdf g_2(x) do not differ significantly. However, information theory suggests that if there were a significant difference, the predicted uncertainty bounds would become conservative: the Gaussian distribution has the highest entropy among all distributions satisfying Assumption 4, Barron (1986), implying the largest confidence regions.

5. CONCLUSIONS

In this paper, we have shown that if the DFT coefficients of the disturbing noise follow a distribution with finite entropy, the actual likelihood can be consistently approximated by the Gaussian likelihood in which the DFT coefficients of the noise are independent over the frequencies. This implies that the classical uncertainty bounds for the Gaussian ML estimator of transfer functions are robust against deviations from the Gaussian assumption for a very wide class of distributions.

We also gave an example of a distribution having infinite entropy but satisfying the classical CLT. Fortunately, no significant deviations from the uncertainty bounds predicted by the Gaussian ML could be detected, showing that the conditions for the robustness are not restrictive.

ACKNOWLEDGEMENT

This research was funded in part by the Methusalem grant of the Flemish Government (METH-1), the Belgian Program on Interuniversity Poles of Attraction initiated by the Belgian State, Prime Minister's Office, Science Policy programming (IUAP VI/4), and the research council of the VUB (OZR).

APPENDIX A. SKETCH OF THE PROOF OF THEOREM 4

We begin by introducing some convenient notation. Following Assumption 2, the discrete Fourier coefficients at frequency bin k of the input and output white noise sources are given by E_u(k), E_y(k). We define the random vector

E_L = [E_u(1), E_y(1), ⋯, E_u(L/2 − 1), E_y(L/2 − 1)]^T

where T denotes the transpose operator and DC and Nyquist are excluded. We denote the joint probability density function (pdf) of E_L as f_L(Z), where Z ∈ ℂ^{L−2}, since E_L is a complex random vector. Further, Assumption 2 implies that E[E_L] = 0, and Assumption 1 implies that E_u(k), E_y(k) are mutually correlated but E_x(k), E_x(l) are uncorrelated for k ≠ l, x = u, y. We denote

C(k) = Cov(E_u(k), E_y(k)) = [ σ_u²(k)   σ_uy²(k) ;  σ_uy²*(k)   σ_y²(k) ]

The covariance matrix of the random vector E_L equals C_L = diag(C(1), ⋯, C(L/2 − 1)), where diag(A_1, A_2, ⋯) is a block diagonal matrix with A_1, A_2, ⋯ as matrix blocks. Due to the classical CLT, we define the asymptotic density of E_L as

φ_L(Z) = (1/(π^{L−2} det(C_L))) exp(−Z^H C_L^+ Z)

Let us define a random variable Φ_L on the same probability space as E_L with joint pdf φ_L such that the random variables E_L, Φ_L are independent. Due to Assumption 2, it is sufficient to prove that

D(E_L, Φ_L) → 0 for L → ∞

This will not be proven directly for the Kullback-Leibler distance D(E_L, Φ_L), but by means of the Fisher information. The Fisher information I(Z) of a complex random vector Z with pdf f(Z) is defined as, Dembo et al. (1991),

I(Z) = ∫_{ℂ^{L−2}} (∇f(Z) ⋅ ∇f(Z)^H / f(Z)) dZ    (14)

Remark: Note that ∇f(Z) in (14) indicates the complex vector of partial derivatives with respect to every complex component z_k. The notion of differentiability used is differentiability in ℝ^{2L−4}, where ℂ is identified with the vector space ℝ². Note that Assumption 4 implies that the Fisher information I(E_L(k)) is finite. De Bruijn's identity, Dembo et al. (1991), formulates the relationship between the Fisher information, (14), and the Kullback-Leibler distance, (13). In Dembo et al. (1991) the identity is formulated for real random vectors
where every component is identically distributed and uncorrelated. We extend the identity of De Bruijn to complex Gaussian random vectors with covariance matrix C_L. Before we formulate the identity of De Bruijn for complex random vectors, we give an interesting property of complex random vectors and their Fisher information.

Lemma 1 Let Z ∈ ℂ^n be circular complex distributed for n ≥ 1. Then the following property holds for a ∈ ℂ:

I(aZ) = I(Z)/|a|²

Proof. See Barbé (2008). ∎

We define Ẽ_L, Φ̃_L as the standardized versions (covariance matrix equal to the identity matrix) of the random vectors E_L, Φ_L. The following lemma is an extension of the known identity of De Bruijn.

Lemma 2 Under the conditions of Lemma 1, we obtain

D(Ẽ_L, Φ̃_L) = ∫_0^1 (I(√t Ẽ_L + √(1 − t) Φ̃_L) − 4(L − 2)) / (4t) dt

Proof. See Barbé (2008). ∎

The proof of Lemma 2, see Barbé (2008), further reveals that

D(Ẽ_L, Φ̃_L) ≤ I(Ẽ_L)/4 − (L − 2)    (15)

Next, we show that I(Ẽ_L) converges to 4(L − 2) when L tends to infinity. As explained at the beginning of the proof, the random variables E_x(k), E_x(l) are uncorrelated for k ≠ l, x = u, y, but the random variables E_u(k), E_y(k) are mutually correlated with covariance matrix C(k). Hence, Assumption 1 implies that there exist iid sequences e_x = [e_x(1), e_x(2), ⋯, e_x(L − 1)], for x = u, y, such that the DFT at frequency bin k of e_x equals E_x(k). First, we show that I(Ẽ_L) has a limit for L → ∞. To prove this result, we need two lemmas. The next lemma shows that the information of a random vector is bounded by the sum of the information in every component.

Lemma 3 Let Ẽ_L be a complex random vector as defined above. Then the following inequality holds:

I(Ẽ_L) ≤ Σ_{k=1}^{L/2−1} I(Ẽ_u(k)) + Σ_{k=1}^{L/2−1} I(Ẽ_y(k))

Proof. See Dembo et al. (1991). ∎

The next lemma is an inequality closely related to the Cramér-Rao bound.

Lemma 4 Let Ẽ_L be a complex random vector as defined above. Then the following inequality holds:

I(Ẽ_L) ≥ 4(L − 2)

The proof is a straightforward consequence of Theorem 20 in Dembo et al. (1991). It remains to show that I(Ẽ_x(k)) tends to 4 as L tends to infinity, to obtain that I(Ẽ_L) − 4(L − 2) → 0, which completes the proof of the theorem. This is shown in Barbé (2008) by showing that the sequence I(Ẽ_L), indexed by L, is sub-additive. A real sequence x_n is called sub-additive if for any n, m ≥ 0, x_{n+m} ≤ x_n + x_m. Fekete's lemma, Fekete (1923), then implies that lim_{n→∞} x_n/n = inf_n x_n/n. This completes the proof of Theorem 4.

REFERENCES

Barbé, K. (2008). The Fisher information for circular complex Gaussian random vectors. ELEC Internal Note 291008.
Barron, A. (1986). Entropy and the central limit theorem. The Annals of Probability 14(1), 336-342.
Billingsley, P. (1999). Convergence of Probability Measures. Wiley, New York.
Billingsley, P. (1995). Probability and Measure. Wiley, New York.
Brillinger, D. (1981). Time Series: Data Analysis and Theory. McGraw-Hill, New York.
Csiszár, I. (1975). I-divergence geometry of probability distributions and minimization problems. Annals of Probability 3, 146-158.
Dembo, A., Cover, T. and Thomas, J. (1991). Information theoretic inequalities. IEEE Transactions on Information Theory 37(6), 1501-1518.
Eriksson, J. and Koivunen, V. (2006). Complex random vectors and ICA models: identifiability, uniqueness and separability. IEEE Transactions on Information Theory 52(3).
Fekete, M. (1923). Über die Verteilung der Wurzeln bei gewissen algebraischen Gleichungen mit ganzzahligen Koeffizienten. Mathematische Zeitschrift 17, 228-249.
Lukacs, E. (1975). Stochastic Convergence, 2nd edition. Academic Press, New York.
Kullback, S. and Leibler, R. (1951). On information and sufficiency. Annals of Mathematical Statistics 22, 79-86.
Pintelon, R. and Hong, M. (2007). Asymptotic uncertainty of transfer function estimates using non-parametric noise models. IEEE Transactions on Instrumentation and Measurement 56(6), 2599-2605.
Pintelon, R. and Schoukens, J. (2001). System Identification: A Frequency Domain Approach. IEEE Press, New York.
Pintelon, R., Schoukens, J. and Vandersteen, G. (1997). Frequency domain system identification using arbitrary signals. IEEE Transactions on Automatic Control 42(12), 1717-1720.
Pinsker, M. (1964). Information and Information Stability of Random Variables. Holden-Day, San Francisco.
Schoukens, J. and Pintelon, R. (1999). Study of conditional ML estimators in time and frequency domain system identification. Automatica 35(1), 91-100.
Schoukens, J., Pintelon, R., Vandersteen, G. and Guillaume, P. (1997). Frequency domain system identification using non-parametric noise models estimated from a small number of data sets. Automatica 33(6), 1073-1086.
Söderström, T. (1974). Convergence properties of the generalized least squares identification method. Automatica 10(6), 617-626.