On the use of the Cramér-Rao Bound for Asymptotically Unbiased Estimators


Copyright © IFAC System Identification, Copenhagen, Denmark, 1994

ON THE USE OF THE CRAMÉR-RAO BOUND FOR ASYMPTOTICALLY UNBIASED ESTIMATORS

Y. ROLAIN and J. SCHOUKENS, Vrije Universiteit Brussel, Department ELEC, Pleinlaan 2, B-1050 Brussel, Belgium, email: [email protected]

Abstract. The Cramér-Rao bound for unbiased estimators is widely used as a measure of the efficiency of unbiased estimators. There are many examples in the literature where this bound is also used to assess the efficiency of asymptotically unbiased estimators. A simple example will show that this can lead to a vastly underestimated inefficiency of the estimator used. Conditions for the validity of the unbiased Cramér-Rao bound in this setting will be derived. Unfortunately, checking these conditions requires knowledge of the derivatives of the bias with respect to the parameters.
1. INTRODUCTION

To allow a correct usage of an estimator, an extensive knowledge of its properties is required. Most of the time, however, it is impossible to determine the moments of the estimator for a finite sample length. Hence, the asymptotic properties of the estimator are studied for the number of data points tending to infinity. If the analysed estimator turns out to be asymptotically unbiased, it is very tempting to use the same Cramér-Rao bound to determine the efficiency of this estimator. The rest of the paper shows that this can lead to unreliable results about the efficiency of asymptotically unbiased estimators.

In this paper, a simple example of an asymptotically unbiased estimator will be introduced for which the Cramér-Rao bound for unbiased estimators predicts an infinite variance. The same bound will then be recalculated under the assumption that the estimator is a biased estimator. Both bounds will then be compared against a simulation of the estimator.

2. THE CONSIDERED DEGENERATED PROBLEM

The following simple system model is considered:

Y = b^2 X    (1)

Herein, the factor b is a real constant, while X and Y are measurable real quantities. The "measurements" are assumed to fulfil the system equation without any model error. The "output" Y is assumed to be contaminated by zero-mean Gaussian noise, while the nonzero "input" X is assumed to be measured without any noise corruption and equal to 1.

Fig. 1  Noise framework of the example.

In order to keep the expressions as simple as possible, a least squares estimator is constructed for the estimation of b. The cost function of the estimator for b is then:

K(Y_m, b) = (1/2) \sum_{i=1}^{F} (Y_{mi} - b^2 X_e)^2    (2)

where the index m denotes a measured quantity and F denotes the number of measured data points. Remark that this cost function is equal to the Gaussian log-likelihood cost function up to a constant term.
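To make the setting concrete, the following minimal sketch (our own illustration, not part of the paper) generates measurements according to the noise framework of fig. 1 and evaluates the cost (2); the values chosen for b, sigma_y and F are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

b_true, X_e, sigma_y, F = 0.5, 1.0, 1.0, 100              # assumed values
Y_m = b_true**2 * X_e + sigma_y * rng.standard_normal(F)  # Y = b^2 X + noise, eq. (1)

def cost(b, Y_m, X_e=1.0):
    """Least squares cost K(Y_m, b) of eq. (2)."""
    return 0.5 * np.sum((Y_m - b**2 * X_e) ** 2)

print(cost(b_true, Y_m))   # cost evaluated at the true parameter value
```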

This work is supported by the Belgian National Fund for Scientific Research; the Belgian Government as a part of the Belgian Programme on Interuniversity Poles of Attraction (IUAP 50) initiated by the Belgian State, Prime Minister's Office, Science Policy Programming; and the Flemish Community (GOA-IMMI).


Determination of the stationary points is straightforward for X_e = 1:

\partial K / \partial b = -2 b (\sum_{i=1}^{F} Y_{mi} - F b^2) = 0    (3)

The possible solutions for b can easily be determined analytically:

b_F = \pm \sqrt{(1/F) \sum_{i=1}^{F} Y_{mi}}   when   \sum_{i=1}^{F} Y_{mi} \geq 0;   b_F = 0   otherwise.    (4)

The last solution should be chosen as soon as the sum of the measured quantities has a negative value, since the parameter b is assumed to be real.

Replacing b^2 by the parameter \beta and developing a least squares estimator for \beta analogous to the b estimator, one finds

\beta_F = (1/F) \sum_{i=1}^{F} Y_{mi}    (5)

For positive values of \beta_F, the estimator for b_F is exactly equal to the square root of the \beta_F estimator. For negative values of \beta_F, the b_F estimator yields zero estimates. This property will be used extensively in the following paragraphs to characterise the behaviour of the former estimator.
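As a sketch of these two estimators (our own illustration; the function names are ours), eqs. (4) and (5) translate directly into:

```python
import numpy as np

def beta_estimate(Y_m):
    """Least squares estimate of beta = b^2, eq. (5): the sample mean."""
    return np.mean(Y_m)

def b_estimate(Y_m):
    """Least squares estimate of b, eq. (4): the square root of beta_F when
    the sum of the measurements is non-negative, zero otherwise (b is real)."""
    beta_F = beta_estimate(Y_m)
    return np.sqrt(beta_F) if beta_F >= 0.0 else 0.0
```

Applied to the measurements Y_m of the previous sketch, b_estimate(Y_m) should return a value close to b_true.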

2.1. Catastrophe analysis

In this simple example, the catastrophe analysis of the different stationary points of the cost function can easily be performed. The general form of the cost function (2) can be rewritten under the following form:

K(Y_m, b) = (1/2) (F b^4 - 2 b^2 \sum_{i=1}^{F} Y_{mi} + \sum_{i=1}^{F} Y_{mi}^2)    (6)

By replacing the normalised sum (1/F) \sum_{i=1}^{F} Y_{mi} in this expression by a general constant \kappa, and discarding the overall factor F and the constant terms, which do not affect the stationary points, this can be reduced to:

K(Y_m, b) = (1/2) (b^4 - 2 \kappa b^2 + \kappa^2)    (7)

The stationary points of this function are readily found to be:

b = 0,   b = \sqrt{\kappa},   b = -\sqrt{\kappa}    (8)

Depending on the value of \kappa, three different situations can occur, as shown in fig. 2. According to the classification of Thom, this is a cusp catastrophe (Poston and Stewart (1978)).

  \kappa < 0   1 real root
  \kappa = 0   triple real root
  \kappa > 0   3 distinct real roots

Fig. 2  Representation of the considered degeneration.

On the other hand, if \beta = b^2 is taken as the parameter in the cost function, the cost becomes:

K(Y_m, \beta) = (1/2) (\beta^2 - 2 \kappa \beta + \kappa^2)    (9)

whose minimum is clearly a Morse point, hence a regular point.

For the latter case, the behaviour of the cost in the neighbourhood of the minimum can hence be approximated by a Taylor series of the cost truncated at the second order term for all parameter values. For the former parametrisation in b, this is only true for parameter values yielding a non-zero second order derivative; in the other cases, higher derivatives are required to describe the behaviour of the function adequately. This very fact will imply the degeneration problems (a singular information matrix) encountered subsequently.
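The three situations of fig. 2 can be visualised with a short sketch (ours, not from the paper): evaluate the reduced cost (7) for a negative, zero and positive \kappa and locate its stationary points.

```python
import numpy as np

def K(b, kappa):
    """Reduced cost (7): K = 0.5*(b^4 - 2*kappa*b^2 + kappa^2) = 0.5*(b^2 - kappa)^2."""
    return 0.5 * (b**2 - kappa) ** 2

for kappa in (-1.0, 0.0, 1.0):
    # stationary points of (7): real roots of dK/db = 2*b*(b^2 - kappa), eq. (8)
    roots = [0.0] + ([np.sqrt(kappa), -np.sqrt(kappa)] if kappa > 0 else [])
    print(f"kappa = {kappa:+.1f}   stationary points: {sorted(set(roots))}")
```

For \kappa > 0 the cost has two symmetric minima and a local maximum at b = 0; for \kappa \leq 0 the point b = 0 is the only real stationary point, and at \kappa = 0 it is degenerate because the second derivative vanishes there.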

2.2. The bias of the estimator

For an estimator, it is important to find the "correct"¹ value of the parameter as the number of experiments increases. To determine whether this happens or not, the first-order statistics of the estimator are analysed. If these expressions can be determined, or approximated, the difference between the real and the estimated values can be calculated. This value, if different from zero, is called the bias of the estimator.

For \beta_F, the expectation value becomes:

E[\beta_F] = (1/F) \sum_{i=1}^{F} E[Y_e + n_{Yi}] = Y_e = b_e^2 = \beta_e

where Y_e denotes the exact output value and n_{Yi} the zero-mean output noise. The estimator \beta_F is hence unbiased for all values of F, since its expectation value is exactly equal to the true value of \beta.

Since the noise on the measurements Y_{mi} is assumed to be Gaussian, the sum of the measurements also has a normal distribution, and so has the estimator \beta_F. The variance of the estimator depends on the number of measurements through:

var(\beta_F) = \sigma_y^2 / F    (10)

1. It is not sure that this value exists. This is only the case if the model perfectly matches the reality.
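A quick Monte Carlo check of these first and second order properties (our own sketch; the chosen b_e, sigma_y, F and run count are assumptions) could look as follows:

```python
import numpy as np

rng = np.random.default_rng(1)
b_e, sigma_y, F, runs = 0.5, 1.0, 10, 200_000    # assumed experiment settings

Y = b_e**2 + sigma_y * rng.standard_normal((runs, F))
beta_F = Y.mean(axis=1)                          # eq. (5)
b_F = np.sqrt(np.clip(beta_F, 0.0, None))        # eq. (4), zero when the mean is negative

print(beta_F.mean() - b_e**2)         # ~ 0: beta_F is unbiased for every F
print(beta_F.var(), sigma_y**2 / F)   # sample variance matches eq. (10)
print(b_F.mean() - b_e)               # clearly nonzero: b_F is biased for finite F
```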

Starting with this result, the bias calculation can be performed for b_F. The estimator b_F can be rewritten as depending on \beta_F through the following relation:

b_F = \sqrt{\beta_F}   for   \beta_F \geq 0;   b_F = 0   otherwise.    (11)

To calculate the bias of this estimator is somewhat more involved. Since the pdf of \beta_F and the transformation relation between b and \beta are both known, the pdf of b_F can be determined through a transformation of the former one:

f_b(b) = f_\beta(\beta) (\partial\beta / \partial b) + P(\beta_F < 0) \delta(b)   for   \beta \geq 0;   f_b(b) = 0   for   \beta < 0,    (12)

with \delta(b) a Dirac impulse at zero. Eq. (12) describes a probability density function since the transformation of (11) maps the total probability mass, and the inverse transform exists and is differentiable and strictly increasing for \beta_F > 0 (Papoulis (1981), pp. 125-127). The pdf of the transformed stochastic variable is thus:

f_b(b) = (2b / \sqrt{2\pi\sigma_\beta^2}) exp(-(b^2 - \beta_e)^2 / (2\sigma_\beta^2)) + P(\beta_F < 0) \delta(b),   b \geq 0.    (13)

Notice that this pdf has a very strange behaviour for real values of b in the close neighbourhood of zero. The probability density tends to zero in the exact value of the parameter, since in the limit for b tending to zero, the linear b term will dominate the behaviour of the pdf. For an estimator, it is at least unusual that the probability to obtain a value in the immediate neighbourhood of the exact value tends to zero! For b = 0 the situation is somewhat different due to the Dirac impulse at zero: the probability to estimate an exactly zero value is far from zero.

The Dirac impulse term is located at zero and hence does not contribute to the expectation value. Define:

f(\beta) = (1/\sqrt{2\pi\sigma_\beta^2}) exp(-(\beta - \beta_e)^2 / (2\sigma_\beta^2))    (14)

The expectation value of the estimator then becomes:

E[b_F] = \int_0^\infty \sqrt{\beta} f(\beta) d\beta = (1/\sqrt{2\pi\sigma_\beta^2}) \int_0^\infty \sqrt{\beta} exp(-(\beta - \beta_e)^2 / (2\sigma_\beta^2)) d\beta    (15)

The last integral is known to be a parabolic cylinder function D (Gradshteyn and Ryzhik (1991), 3.462.1). Notice that in the resulting bias expression (16), \beta_e only appears scaled with its standard deviation.

Two limit cases are interesting in our discussion: the case when the number of measurement points tends to infinity, and the case when the exact parameter value becomes much larger than its standard deviation.

The behaviour of the bias when the exact parameter value is much larger than its standard deviation is shown in the next graphs. The bias is shown as a function of the normalised exact value \beta_e/\sigma_\beta.

Fig. 3  Absolute bias E[b_F] - b_e as a function of the normalised exact value.

Fig. 4  Absolute value of the bias in dB.

As can be seen, the bias is significant as long as the 3\sigma confidence interval of the exact parameter value contains negative values. This behaviour is as expected, since a major part of the probability mass then lies in this interval.

The value of the bias depends on the value of the parameter relative to its standard deviation. Therefore, as the number of measurements increases, the absolute range of parameter values suffering from a significant bias decreases, since the standard deviation decreases.
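The bias curves of figs. 3 and 4 can be reproduced numerically without parabolic cylinder functions, by integrating (15) directly. The following sketch (our own, using scipy for the quadrature) evaluates the bias as a function of the normalised exact value \beta_e/\sigma_\beta:

```python
import numpy as np
from scipy import integrate, stats

def bias_bF(b_e, sigma_beta):
    """Bias E[b_F] - b_e, with beta_F ~ N(b_e**2, sigma_beta**2), eq. (15)."""
    beta_e = b_e**2
    integrand = lambda beta: np.sqrt(beta) * stats.norm.pdf(beta, beta_e, sigma_beta)
    mean_bF, _ = integrate.quad(integrand, 0.0, np.inf)  # Dirac term adds nothing
    return mean_bF - b_e

for ratio in (0.5, 1.0, 2.0, 5.0, 10.0):       # beta_e / sigma_beta, cf. figs 3 and 4
    b_e = np.sqrt(ratio * 1.0)                 # sigma_beta = 1 assumed
    print(f"beta_e/sigma_beta = {ratio:5.1f}   bias = {bias_bF(b_e, 1.0):+.4f}")
```

As in fig. 3, the bias is large for small normalised values and dies out once the 3\sigma interval of \beta_F no longer contains negative values.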

This remark is important in the analysis of the asymptotic behaviour of the estimator. When the number of measurements tends to infinity, the bias behaves as¹

E[b_F] - b_e \approx -\sigma_{\beta 0}^2 / (8 b_e^3 F)    (17)

where \sigma_{\beta 0} is the standard deviation of the \beta estimator for a single measurement. This expression approaches zero as the number of measurements tends to infinity, and hence the estimator is asymptotically unbiased.

1. This expression has been found using the series expansion ability of Mathematica.

This seems in opposition with the probability density function, which tends to zero in the exact parameter value for small values of b_e. This is not so, however, since the estimator is only unbiased for an infinite number of measurements. For that particular case, the variance of the estimates is zero, and as shown in (13) the pdf is a function of the normalised b-value.

As a conclusion, it has been shown that the b_F estimator is asymptotically unbiased, while the \beta_F estimator is unbiased for any number of measurements.
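The asymptotic behaviour (17) is easy to check numerically. The sketch below (ours; it repeats the bias routine of the previous listing) compares the exact bias with the approximation for a growing number of measurements:

```python
import numpy as np
from scipy import integrate, stats

def bias_bF(b_e, sigma_beta):
    """Bias E[b_F] - b_e for beta_F ~ N(b_e**2, sigma_beta**2), eq. (15)."""
    integrand = lambda beta: np.sqrt(beta) * stats.norm.pdf(beta, b_e**2, sigma_beta)
    return integrate.quad(integrand, 0.0, np.inf)[0] - b_e

b_e, sigma_beta0 = 1.0, 1.0                            # assumed values
for F in (10, 100, 1000, 10000):
    exact = bias_bF(b_e, sigma_beta0 / np.sqrt(F))     # sigma_beta shrinks as 1/sqrt(F)
    approx = -sigma_beta0**2 / (8 * b_e**3 * F)        # eq. (17)
    print(f"F = {F:6d}   bias = {exact:+.2e}   approximation (17) = {approx:+.2e}")
```

Both columns agree ever better as F grows, and both tend to zero, confirming the asymptotic unbiasedness.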

2.3. The Cramér-Rao bound on the estimator

The first order moment analysis performed in the previous paragraph determined the mean value, and hence the bias, of the estimates. Knowing the mean value, however, does not tell much about the "compactness" of the distribution of the probability mass, and hence about the spread of the parameter values between different estimates. To do so, a second order statistic is needed.

For unbiased estimators, it is common knowledge that the covariance of the parameters is bounded below by the Cramér-Rao bound:

Cov(p) \geq Fi_{pp}^{-1}    (18)

where p denotes the parameter vector and Fi_{pp} the Fisher information matrix of the model parameters.

At this moment, it is worthwhile to emphasise the definition of an unbiased estimator: an estimator is called unbiased if the expectation value of the estimates equals the true parameter value for all elements of the parameter space (Kendall and Stuart (1979), Vol. 2, p. 4).

Stricto sensu, the estimator b_F is thus not unbiased, since this definition is only fulfilled for the estimation based on an infinite number of measurements. It is therefore not allowed to apply the "unbiased" version of the Cramér-Rao bound to it. According to Kendall and Stuart (1979), the version of the Cramér-Rao bound for biased estimators is:

Cov(p) \geq (ones_p + \partial\Delta_p/\partial p) Fi_{pp}^{-1} (ones_p + \partial\Delta_p/\partial p)^T    (19)

evaluated at the exact parameter value. Here, \Delta_p is the bias on the parameters and ones_p is a vector with the same size as p but containing only ones. Remark that in this expression, the derivative of the bias with respect to the parameters influences the value of the bound!

For estimators with a small bias, or estimators that are asymptotically unbiased, it is very tempting to use the bound for unbiased estimators, since the bias has been proven to be small. However, this intuitive reasoning proves wrong, since a limited value of the bias does not at all imply a limited value of its derivative! This will be illustrated by the example treated in the previous paragraph.

First, the derivative of the expectation value with respect to the parameter is evaluated for the b estimator:

\partial E[b_F] / \partial b = (\partial/\partial b) (1/\sqrt{2\pi\sigma_\beta^2}) \int_0^\infty \sqrt{\beta} exp(-(\beta - b^2)^2 / (2\sigma_\beta^2)) d\beta    (20)

The last integrals again are known to be parabolic cylinder functions D (see Gradshteyn and Ryzhik (1991), 3.462.1). Since the expected value of an estimator equals the real parameter value summed with the bias (no model errors are considered), this derivative equals the factor (ones_p + \partial\Delta_p/\partial p) appearing in (19).

The Fisher information matrix of the estimator is derived from the log-likelihood function of the problem:

L(Y_m, b) = (1/(2\sigma_y^2)) \sum_{i=1}^{F} (Y_{mi} - b^2 X_e)^2    (21)

Then, the Fisher information matrix is found to be:

Fi_{bb} = E[(\partial L/\partial b)(\partial L/\partial b)] = 4 b_e^2 X_e^2 F / \sigma_y^2    (22)
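Since the analytic bias derivative involves parabolic cylinder functions, a numerical shortcut is convenient. The sketch below (our own construction, not the paper's derivation) evaluates the scalar version of (19) with the factor (1 + \partial\Delta/\partial b) obtained by a finite difference on the expectation (15):

```python
import numpy as np
from scipy import integrate, stats

def mean_bF(b_e, sigma_beta):
    """Theoretical E[b_F] from eq. (15), with beta_F ~ N(b_e**2, sigma_beta**2)."""
    f = lambda beta: np.sqrt(beta) * stats.norm.pdf(beta, b_e**2, sigma_beta)
    return integrate.quad(f, 0.0, np.inf)[0]

def fisher(b_e, sigma_y, F, X_e=1.0):
    """Fisher information of eq. (22)."""
    return 4.0 * b_e**2 * X_e**2 * F / sigma_y**2

def crb_biased(b_e, sigma_y, F, h=1e-5):
    """Scalar eq. (19): (1 + dDelta/db)^2 / Fi, where dE[b_F]/db = 1 + dDelta/db
    is approximated by a central finite difference on eq. (15)."""
    sigma_beta = sigma_y / np.sqrt(F)                    # eq. (10)
    dEdb = (mean_bF(b_e + h, sigma_beta) - mean_bF(b_e - h, sigma_beta)) / (2.0 * h)
    return dEdb**2 / fisher(b_e, sigma_y, F)

print(crb_biased(0.5, sigma_y=1.0, F=1))   # finite, even though b_e < sigma_beta here
```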

All the parts needed to calculate the Cramér-Rao bound are now known. First, let us take the naive reasoning consisting in the assimilation of an asymptotically unbiased estimator to an unbiased estimator. The bound then becomes:

CRB_{unbiased} = Fi_{bb}^{-1} = \sigma_y^2 / (4 b_e^2 X_e^2 F)    (23)

Taking the limit situation for the value of the parameter b_e tending to zero, this bound tends to infinity:

lim_{b_e \to 0} CRB_{unbiased} = \infty    (24)

This reasoning leads to the result that the estimator has an infinite variance! Unless the distribution of the estimator is some pathological case, this would mean that the parameter cannot be identified with this estimator!

Taking the correct approach and considering the estimator as what it really is, a biased estimator, the bound is obtained by inserting the bias derivative (20) and the Fisher information (22) into (19). The result is an expression in which b_e, \sigma_\beta and the parabolic cylinder functions D of the normalised parameter \beta_e/\sigma_\beta appear.
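Comparing this biased bound with the unbiased bound (23) over a range of parameter values reproduces the qualitative behaviour of figs. 5 and 6. The following self-contained sketch (ours, with the assumed settings \sigma_y = 1 and F = 1) prints both bounds:

```python
import numpy as np
from scipy import integrate, stats

def mean_bF(b_e, s):
    """Theoretical E[b_F] from eq. (15)."""
    f = lambda beta: np.sqrt(beta) * stats.norm.pdf(beta, b_e**2, s)
    return integrate.quad(f, 0.0, np.inf)[0]

sigma_y, F, h = 1.0, 1, 1e-5            # unity variance, one measurement, as in fig. 5
for b_e in (0.01, 0.1, 0.5, 1.0, 2.0, 5.0):
    Fi = 4.0 * b_e**2 * F / sigma_y**2                       # eq. (22) with X_e = 1
    crb_u = 1.0 / Fi                                         # unbiased bound, eq. (23)
    s_b = sigma_y / np.sqrt(F)                               # eq. (10)
    dEdb = (mean_bF(b_e + h, s_b) - mean_bF(b_e - h, s_b)) / (2.0 * h)
    crb_b = dEdb**2 / Fi                                     # biased bound, eq. (19)
    print(f"b_e = {b_e:4.2f}   CRB_unbiased = {crb_u:9.3f}   CRB_biased = {crb_b:8.4f}")
```

The unbiased bound explodes as b_e tends to zero, while the biased bound remains finite, in line with fig. 6.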

The behaviour of this form has been plotted below for different values of b, assuming unity variance (\sigma_y^2 = 1) and one measurement point (F = 1).

Fig. 5  Cramér-Rao bound for the biased estimator.

Remark that the variance remains finite at b = 0. The variance of the estimates b_F is thus finite, and not infinite as predicted by the bound for unbiased estimators.

The variance of the estimated parameter is larger for parameter values close to zero. However, one should be careful when interpreting this graph: since in the drawn case the variance of the measurements is set to unity and the number of measurements to one, values of b smaller than unity correspond to a signal-to-noise ratio below 0 dB.

In the next graph, the biased and unbiased bounds are compared against each other.

Fig. 6  Cramér-Rao bound for the biased (full line) and unbiased (dotted line) estimator.

Note that both bounds are equal when the bias is negligible. This proves that both calculations effectively represent the same phenomenon. When the value of b gets smaller than unity, there is a significant difference between the bounds, even though the estimator is asymptotically unbiased, as is the case here.

Remark that there is a difference between both bounds not only in the point b = 0, where the unbiased bound is degenerate, but also in an interval around this point. The variance predicted by the bound for unbiased estimators is there far too large, which means that this bound is not a lower bound for the attainable covariance. It is therefore dangerous to claim that an asymptotically unbiased or biased estimator is optimal in the sense that its covariance reaches the Cramér-Rao bound for unbiased estimators.

In the literature, the performance of an asymptotically unbiased estimator is often measured using the bound for unbiased estimators (see e.g. Pintelon and Schoukens (1990)). The current example shows that this is a very dangerous criterion, unless the derivative of the bias with respect to the parameter(s) is evaluated and proven to tend to zero fast enough.

2.4. Simulation of the estimator

To check the validity of the performed calculations, the estimator will be simulated under circumstances analogous to the ones used to create the graphs shown above. The number of measurements in one estimation will thus be set to one, while the variance of these measurements is set to unity. The number of runs is fixed to 10. The difference between the theoretically predicted and the experimental bias is shown below, together with its 95% confidence interval.

Fig. 7  Difference between experimental and theoretical bias with 95% confidence levels.
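A sketch of this simulation experiment (our own; the run count of 10^5 used here is an assumption) estimates the bias and its 95% confidence interval over a grid of normalised parameter values, next to the theoretical prediction:

```python
import numpy as np
from scipy import integrate, stats

def mean_bF(b_e, s):
    """Theoretical E[b_F] from eq. (15)."""
    f = lambda beta: np.sqrt(beta) * stats.norm.pdf(beta, b_e**2, s)
    return integrate.quad(f, 0.0, np.inf)[0]

rng = np.random.default_rng(2)
sigma_y, F, runs = 1.0, 1, 100_000       # unity variance, one measurement, assumed runs

for b_e in (0.5, 1.0, 2.0, 5.0):         # normalised values, as in fig. 7
    beta_F = b_e**2 + sigma_y * rng.standard_normal(runs)   # eq. (5) with F = 1
    b_F = np.sqrt(np.clip(beta_F, 0.0, None))               # eq. (4)
    exp_bias = b_F.mean() - b_e
    theo_bias = mean_bF(b_e, sigma_y) - b_e
    ci = 1.96 * b_F.std(ddof=1) / np.sqrt(runs)             # 95% interval on the mean
    print(f"b_e = {b_e:3.1f}   experimental = {exp_bias:+.4f} (+/- {ci:.4f})"
          f"   theoretical = {theo_bias:+.4f}")
```

The experimental variance of b_F can be compared in the same loop against the biased bound of the previous sketch, mirroring figs. 8 and 9.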

In fig. 7 the simulated value matches the exact calculated one even better than the theory predicts.

Next, the experimental variance of the estimator is compared to the Cramér-Rao bound calculated above. In fig. 8, the experimental variance (dots) is shown to be well above the Cramér-Rao bound (solid line) for values of the normalised parameter that include the zero parameter value inside the 3\sigma interval. The inefficiency, calculated as the difference between the experimental and the theoretically obtainable variance, is shown in fig. 9.¹

Fig. 8  Experimental variance (dots) and biased Cramér-Rao bound (solid line).

Fig. 9  Simulated inefficiency of the estimator.

As can be seen, the inefficiency increases with increasing bias. Nevertheless, the experimental variance of the estimates remains finite in practice, and this is the most important result here.

3. CONCLUSION

It has been shown in this example that the Cramér-Rao bound for unbiased estimators cannot always be used directly to predict the behaviour of asymptotically unbiased estimators. In some situations, the bound for unbiased estimators is proven not to be a lower bound for the attainable covariance at all. The solution to this problem is to determine the Cramér-Rao bound for biased estimators, which is always a lower bound for the covariance. However, this implies the evaluation of the derivatives of the bias with respect to the parameters, and even if these are approximated, this may lead to cumbersome and complex calculations. In one word, the efficiency results presented for asymptotically unbiased estimators have to be taken with care: even if the bias vanishes when the number of measurements tends to infinity, it is not clear at all that its derivatives will behave likewise.

4. REFERENCES

Gradshteyn, I.S. and I.M. Ryzhik (1991). Table of Integrals, Series and Products. Corrected and Enlarged Edition. Academic Press, San Diego.

Kendall, M. and A. Stuart (1979). The Advanced Theory of Statistics. Charles Griffin & Company, London.

Norton, J.P. (1986). An Introduction to Identification. Academic Press, London.

Papoulis, A. (1981). Probability, Random Variables, and Stochastic Processes. McGraw-Hill, Auckland.

Pintelon, R. and J. Schoukens (1990). Robust Identification of Transfer Functions in the s- and z-Domains. IEEE Transactions on Instrumentation and Measurement, Vol. IM-39, No. 4, pp. 565-573.

Poston, T. and I. Stewart (1978). Catastrophe Theory and its Applications. Pitman, London.

1. The Cramér-Rao bound has become dependent on the estimator here. Hence, it may be possible to determine the biased (or asymptotically unbiased) estimator that minimises the Cramér-Rao bound. This may be a very powerful estimator, if it is possible to find it.