Bayesian inference on a squared quantity


Measurement 48 (2014) 13–20


Carlo Carobbi ⇑, Università degli Studi di Firenze, Department of Information Engineering (DINFO), Via S. Marta 3, 50139 Firenze, Italy

Article history: Received 23 August 2013; Received in revised form 3 October 2013; Accepted 24 October 2013; Available online 31 October 2013.

Keywords: Bayesian inference; Frequentist inference; Method of moments estimators; Measurement uncertainty; Noncentral chi-squared distribution.

Abstract

The Bayesian estimator of the noncentrality parameter of the noncentral chi-squared distribution is derived here. The corresponding frequentist estimator, based on the method of moments, is also derived, and its performance is compared with the Bayesian one. The Bayesian estimator is obtained through an analytical derivation which provides insight into the way the estimator works. Reference is also made to a previously published work on a similar subject by Attivissimo et al. (2012) [1], in order to resolve the paradox presented there. Some defects of the analysis performed in the referenced work are identified and carefully examined. The superiority of the Bayesian estimator is demonstrated, although it is achieved at the price of a greater complexity.

© 2013 Elsevier Ltd. All rights reserved.

1. Introduction

In the recent paper [1] arguments are presented in favour of frequentist inference and against Bayesian inference, on the basis of what the authors of [1] judge to be an apparent paradox. The scope here is to investigate how Bayesian inference works when predicting the square of an unknown quantity. It is also shown that the arguments in [1] against Bayesian inference and in favour of the frequentist one are questionable.

⇑ Tel.: +39 055 4796268; fax: +39 055 4796497. E-mail address: carlo.carobbi@unifi.it. 0263-2241/$ - see front matter © 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.measurement.2013.10.034

The paper is structured as follows. In Section 2 the basic assumptions about the involved quantities are made, and the statistical inference problem is defined and solved through a frequentist analysis based on the method of moments (MoM). The paradox described in [1] is discussed in detail and resolved in Section 3. Section 4 is devoted to the solution of the same inference problem by using a Bayesian instead of a frequentist analysis. The derivation of the Bayesian estimators is carried out in such a way as to ease the interpretation of how the estimators process the available information, and in particular the observed values. The inference problem essentially consists in finding the estimator of the

noncentrality parameter of the noncentral chi-squared distribution with one degree of freedom. As far as we know this problem has not previously been dealt with by using a Bayesian approach, either in the scientific or in the technical literature. In Section 5 the Bayesian estimator is compared with the frequentist one through numerical computation. Section 6 is devoted to the concluding remarks.

Only a few essential references appear. Attivissimo et al. [1] is the paper that triggered this work; in [2,3] Bayesian inference is dealt with in the light of the most authoritative standards devoted to the evaluation of measurement uncertainty, namely [4] (the so-called GUM) and [5] (Supplement 1 to the GUM). The reader interested in a detailed discussion of the relation between the GUM, its Supplement 1 and Bayesian analysis is therefore directed to [2,3] (rather than to [1], where this relation is also dealt with). References [6-9] provide specific theoretical support and interpretation for the mathematical analysis developed in the following sections.

The following conventions are adopted here. The name of a quantity and the corresponding random variable are both represented by an italic capital letter, e.g. Q. A realization or observation of the quantity Q, as obtained from measurement, is represented by the corresponding italic lower-case letter, i.e. q.


2. Frequentist inference on μ²

Let $X_1, X_2, \ldots, X_n$ be n independent, real, normal quantities with expected value μ and variance σ². It is assumed that μ is unknown while σ² is known, as in [1], and we want to predict μ². In doing this we consider two different situations, A and B, in which two distinct sets of observations are obtained from measurements, namely

A. $x_1, x_2, \ldots, x_n$, or
B. $x_1^2, x_2^2, \ldots, x_n^2$.

Both situations are of interest, in particular in electrical and electronic measurements, where μ may represent the value of a constant signal and σ the root-mean-square value of an additive and thermal equivalent (zero-mean, stationary, normal) noise. In situation A the measuring instrument (e.g. an oscilloscope or a digital multimeter) can detect the magnitude and sign of X. In situation B a square-law detector is implemented in the measuring instrument (e.g. in field and power meters), which therefore detects X².

In this section we deal with the problem of estimating μ² by using frequentist inference based on the method of moments (MoM), see [6, p. 301]. Let $E_X[g(X)]$ and $V_X[g(X)]$ be respectively the expectation and variance of g(X), where g(X) is a general function of X. Expectation and variance are calculated by using the probability density function (pdf) of X (normal in this case).

In situation A the true power μ² is estimated through $(\bar{x})^2$, where $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$. $\bar{X}$ is normal with expected value μ and variance σ²/n. Therefore $[\bar{X}/(\sigma/\sqrt{n})]^2$ is a noncentral chi-squared random variable with ν = 1 degree of freedom and noncentrality parameter $\lambda = [\mu/(\sigma/\sqrt{n})]^2$. Then [7, p. 943]

$E_X[(\bar{X})^2] = \frac{\sigma^2}{n}(\nu + \lambda) = \mu^2 + \frac{\sigma^2}{n}$,   (1)

and

$V_X[(\bar{X})^2] = \frac{\sigma^4}{n^2}\,2(\nu + 2\lambda) = 2\frac{\sigma^2}{n}\left(2\mu^2 + \frac{\sigma^2}{n}\right)$.   (2)

The MoM estimator in situation A (MoM estimator A) is obtained from (1) and (2), in terms of the estimate $\hat{\mu}^2_A$ and its variance $u^2(\hat{\mu}^2_A)$, as

$\hat{\mu}^2_A = (\bar{x})^2 - \frac{\sigma^2}{n}$,   (3)

and

$u^2(\hat{\mu}^2_A) = 2\frac{\sigma^2}{n}\left[2(\bar{x})^2 + \frac{\sigma^2}{n}\right]$,   (4)

where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. Eqs. (3) and (4) call for some comments. For the validity of (3) it is required that $E_X[(\bar{X})^2] \approx (\bar{x})^2$ and that $(\bar{x})^2 - \frac{\sigma^2}{n} > 0$. Both conditions are verified when n is so large that σ²/n is small compared with μ². For small sample size and small signal-to-noise ratio the MoM estimate is therefore expected to fail, in the sense that $\hat{\mu}^2_A$ may deviate largely from μ² or may even be negative (which is evidently absurd). Note that (4) has been derived by substituting $(\bar{x})^2$ for μ² in (2). Actually, the logical estimate of the variance of $\hat{\mu}^2_A$ should be derived by substituting $\hat{\mu}^2_A$ for μ² in (2), thus obtaining $u^2(\hat{\mu}^2_A) = 2\frac{\sigma^2}{n}\left[2(\bar{x})^2 - \frac{\sigma^2}{n}\right]$. However, this estimate of $u^2(\hat{\mu}^2_A)$ is not strictly positive and is therefore unacceptable (a variance is positive by definition). This lack of transparency of the MoM estimator is not incidental but intrinsic to frequentist inference, contrary to Bayesian inference. Estimator (3) is consistent, in that it converges (in probability) to the true value μ² as n increases. Correspondingly its variance (4) tends to zero.

In situation B the estimator of μ² (MoM estimator B) is based on $\overline{(X^2)} = \frac{1}{n}\sum_{i=1}^{n} X_i^2$. $\overline{(X^2)}/(\sigma^2/n)$ is a noncentral chi-squared random variable with ν = n degrees of freedom and noncentrality parameter $\lambda = [\mu/(\sigma/\sqrt{n})]^2$. We then have

$E_X[\overline{(X^2)}] = \frac{\sigma^2}{n}(\nu + \lambda) = \mu^2 + \sigma^2$,   (5)

and

$V_X[\overline{(X^2)}] = \frac{\sigma^4}{n^2}\,2(\nu + 2\lambda) = 2\frac{\sigma^2}{n}\left(2\mu^2 + \sigma^2\right)$.   (6)

The MoM estimator for situation B is derived similarly to the one for situation A as

$\hat{\mu}^2_B = \overline{(x^2)} - \sigma^2$,   (7)

and

$u^2(\hat{\mu}^2_B) = 2\frac{\sigma^2}{n}\left[2\overline{(x^2)} + \sigma^2\right]$,   (8)

where $\overline{(x^2)} = \frac{1}{n}\sum_{i=1}^{n} x_i^2$. Considerations analogous to those made above for $\hat{\mu}^2_A$ apply to $\hat{\mu}^2_B$, concerning the validity of the estimator, its variance and its consistency.

When n = 1 we have $(\bar{x})^2 = \overline{(x^2)} = x_1^2$ and

$\hat{\mu}^2_A = \hat{\mu}^2_B = x_1^2 - \sigma^2$,   (9)

and

$u^2(\hat{\mu}^2_A) = u^2(\hat{\mu}^2_B) = 2\sigma^2\left(2x_1^2 + \sigma^2\right)$.   (10)
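The MoM estimators (3)-(4) and (7)-(8) are straightforward to implement; the sketch below (plain Python, with assumed example numbers) also shows the n = 1 coincidence of (9)-(10) and how the estimate turns negative whenever $x_1^2 < \sigma^2$:

```python
def mom_A(x, sigma2):
    """MoM estimator A, Eqs. (3)-(4): uses the signed observations x_i."""
    n = len(x)
    xbar = sum(x) / n
    est = xbar ** 2 - sigma2 / n                            # Eq. (3)
    var = 2 * (sigma2 / n) * (2 * xbar ** 2 + sigma2 / n)   # Eq. (4)
    return est, var

def mom_B(x2, sigma2):
    """MoM estimator B, Eqs. (7)-(8): uses the squared observations x_i^2."""
    n = len(x2)
    m2 = sum(x2) / n
    est = m2 - sigma2                                       # Eq. (7)
    var = 2 * (sigma2 / n) * (2 * m2 + sigma2)              # Eq. (8)
    return est, var

# n = 1: both estimators coincide, Eqs. (9)-(10), and the power estimate
# is negative because x1^2 < sigma^2 (the defect discussed above).
x1, sigma2 = 0.5, 1.0
print(mom_A([x1], sigma2))        # (-0.75, 3.0)
print(mom_B([x1 ** 2], sigma2))   # (-0.75, 3.0)
```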

Finally, note that when n > 1 MoM estimator A has less bias and variance than MoM estimator B.

3. Resolution of the paradox [1]

The analysis in [1] is done in the case n = 1 where, as will be shown in Section 4, Bayesian inference leads to the same results in both situations A and B, namely

$E_\mu[\mu^2] = x_1^2 + \sigma^2$,   (11)

and

$V_\mu[\mu^2] = 2\sigma^2\left(2x_1^2 + \sigma^2\right)$.   (12)
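Eqs. (11) and (12) follow from μ being a posteriori normal with mean x₁ and variance σ²; a quick Monte Carlo check (assumed values x₁ = 2 and σ = 1, for which (11) gives 5 and (12) gives 18):

```python
import random

random.seed(0)
x1, sigma = 2.0, 1.0

# Draw mu from its posterior N(x1, sigma^2) and look at the moments of mu^2.
draws = [random.gauss(x1, sigma) ** 2 for _ in range(400_000)]
mean = sum(draws) / len(draws)
var = sum((d - mean) ** 2 for d in draws) / len(draws)

print(mean, var)  # close to x1^2 + sigma^2 = 5 and 2*sigma^2*(2*x1^2 + sigma^2) = 18
```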

It is important to observe that in the Bayesian approach μ is viewed as a random variable whose known probability density function is a mathematical description of the state of knowledge about its unknown and constant true value.


Comparing (9) with (11), we see that the noise power σ² is subtracted from the single observation $x_1^2$ by the MoM estimator, while it is added by the Bayesian one (further note that the estimators have the same variance; compare (10) with (12)). This different behaviour of the two estimators is labelled paradoxical by the authors of [1] because both estimators are "optimal", and: "The optimality of an estimator ... and of the other ... is an obvious contradiction" ([1], Section 5, p. 2199). The optimality criterion referred to in [1] is the minimum mean squared error (MSE). In particular the authors of [1] state that (symbols are aligned with those adopted in this work): "Probability theory ... assures that if the measurement ... is repeated many times, for many values of the input voltage μ ..., the overall MSE $e_B^2 = (E_\mu[\mu^2] - \mu^2)^2$ of the Bayesian estimator is the lowest possible. But probability theory also assures that if the measurement ... is simulated many times with a fixed input voltage μ, the MSE $e_F^2 = (\hat{\mu}^2_A - \mu^2)^2$ of the frequentist estimator ... is the lowest possible."

These two statements need to be analyzed in detail because they form the whole conceptual basis on which the motivation for the supposed paradox relies. Let us first consider the second statement, the one referring to the frequentist estimator. If the experiment is repeated many times, thus collecting J values $x_j^2$, $j = 1, 2, \ldots, J$, it is actually verified that

$e_F^2 = \left[\frac{1}{J}\sum_{j=1}^{J}\left(x_j^2 - \sigma^2\right) - \mu^2\right]^2$   (13)

tends to zero when J tends to infinity, because $E_X[X^2] = \mu^2 + \sigma^2$. Therefore the second statement is clear and true. We now consider the first statement. Here the situation is much more involved. In order to evaluate the MSE of the Bayesian estimator, according to the authors of [1], a random sequence of values $\mu_l$, $l = 1, 2, \ldots, L$, should be generated. However, it is not clear at all how such a sequence should be used to evaluate the above mentioned MSE of the Bayesian estimator. Clarification comes in a later section (see [1], Section 5.1, right column, third paragraph, p. 2199), where it is stated that, in order to evaluate the MSE of the Bayesian estimator, a sequence of measured values $X_{l,m}$, $m = 1, 2, \ldots, M$, should also be generated for each $\mu_l$. Now, for each $\mu_l$, the MSE of the Bayesian estimator can be evaluated as

$e_B^2 = \left[\frac{1}{M}\sum_{m=1}^{M}\left(X_{l,m}^2 + \sigma^2\right) - \mu_l^2\right]^2$.   (14)

In fact, the MSE of the Bayesian estimator is evaluated in the same way as that of the frequentist one. Since $e_B^2$ tends to $4\sigma^4$ when M tends to infinity, then

$e_B^2 = e_F^2 + 4\sigma^4$.   (15)
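The $4\sigma^4$ gap in (15) does not depend on whether the squared errors are averaged or the averaged error is squared; a small Monte Carlo sketch under the first reading (assumed setup: single observation per trial, μ = 3, σ = 1):

```python
import random

random.seed(1)
mu, sigma = 3.0, 1.0
M = 200_000

# Frequency evaluation with mu fixed: average the squared error of each
# single-observation estimator over repeated simulated measurements.
e_f = e_b = 0.0
for _ in range(M):
    x2 = random.gauss(mu, sigma) ** 2
    e_f += (x2 - sigma ** 2 - mu ** 2) ** 2   # frequentist estimate x^2 - sigma^2
    e_b += (x2 + sigma ** 2 - mu ** 2) ** 2   # Bayesian estimate    x^2 + sigma^2
e_f /= M
e_b /= M

print(e_b - e_f)  # close to 4 * sigma**4 = 4
```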

The authors of [1] evaluate the optimality of the two estimators in the same way, i.e. by using the minimum MSE as the optimality criterion, the MSE being evaluated on the basis of a sequence of data generated according to the relevant parent distribution (for given values of its parameters μ and σ²). Since, see (15), the MSE of the Bayesian estimator is greater than that of the frequentist estimator, the conclusion arrived at in [1] is that the Bayesian estimator is "inherently inferior" to the frequentist one (see [1], Section 5.2, item (2), p. 2200). This conclusion is evidently wrong because the criterion adopted by the authors of [1] in order to evaluate the performance of the Bayesian estimator is not the correct one (and it is actually not mentioned in the papers cited by the authors themselves, see references [19] and [20] in [1]; the citation appears in the first paragraph of clause 5.1, p. 2199). The crucial point, disregarded in [1], is that if a sequence of observations is available, instead of a single one, then the Bayesian estimator is updated by this larger amount of information. Therefore (11), which is valid in the case of a single observation, no longer applies if more than one observation of the quantity X is available. In the case of n > 1 observations the Bayesian estimators to be compared with the frequentist ones are those given by (18) or (28) in situations A or B, respectively. The frequency evaluation (i.e. the evaluation based on a long sequence of hypothesized or numerically generated observations) of Bayesian inference is a well known technique (e.g., [8], chapter 4) which is useful to demonstrate important and general results,¹ if properly applied. In Section 5 of this work it is used to evaluate the performance of a specific estimator. The fact that the Bayesian and frequentist estimators are different is not a paradox in itself, because they are obtained through two completely different derivations, and the conclusion that they are both asymptotically (n → ∞) unbiased and of minimum variance does not imply that they have to be the same for every value of n.

On the contrary, the possibility that for the same value of n one estimator may be less biased and have less variance than the other is the motivation for adopting the Bayesian rather than the frequentist inference scheme, or vice versa. We further observe that the authors of [1] consider the Bayesian estimator incorrect because they argue that the noise power σ² should add up to the signal power μ² to produce the measured power $x_1^2$ (see [1], Section 5.2, p. 2201). The appropriate way to predict μ² would therefore be through (9), "a customary operation in electrical engineering" (again [1], Section 5.2, p. 2201), instead of (11). This claim is also questionable because, if only one value $x_1$ is measured, then $x_1^2$ does not represent the average power but the instantaneous power of the signal-plus-noise, and in electrical engineering it is not a "customary operation" to subtract the average power of the noise, σ², from the instantaneous power $x_1^2$ of the signal-plus-noise in order to estimate the (instantaneous or average) power of the signal, μ². This practice is wrong not only in electrical engineering but also in MoM based estimation, since a large sample size n is required for the validity of the MoM estimator. Note in particular that $\hat{\mu}^2_A$, as given by (9), may easily turn out to be negative, an embarrassing result for the estimate of a power. The probability of $X_1^2 - \sigma^2$ being negative, in symbols $\Pr[X_1^2 - \sigma^2 < 0]$, is plotted in Fig. 1

¹ Such as the asymptotic normality of the posterior pdf and the consistency of the point estimate. These results are valid under mild regularity conditions (see [8], Appendix B).
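The probability plotted in Fig. 1 has the closed form $\Pr[X_1^2 - \sigma^2 < 0] = \Pr[-\sigma < X_1 < \sigma] = \Phi((\sigma-\mu)/\sigma) - \Phi((-\sigma-\mu)/\sigma)$, with Φ the standard normal cdf; a short sketch:

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_negative_power(mu, sigma):
    """Pr[X1^2 - sigma^2 < 0] = Pr[-sigma < X1 < sigma] for X1 ~ N(mu, sigma^2)."""
    return phi((sigma - mu) / sigma) - phi((-sigma - mu) / sigma)

# Signal power twice the noise power (mu^2/sigma^2 = 2): probability above 30%,
# consistent with the behaviour described for Fig. 1.
print(prob_negative_power(math.sqrt(2.0), 1.0))  # about 0.33
```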


as a function of the signal-to-noise ratio μ²/σ². It is evident from Fig. 1 that this probability is rather large when μ² is not large with respect to σ² (e.g., greater than 30% when the signal power is less than two times the noise power).

Fig. 1. Probability that $X_1^2 - \sigma^2 < 0$ as a function of μ²/σ².

The concern expressed by the authors of [1] against the Bayesian estimator can be favourably interpreted considering that when n = 1, and more generally when n is small, the estimator is mainly determined by the information conveyed by the prior pdf. In the absence of prior information, as assumed here and in [1], the prior pdf is non-informative and, roughly speaking, the quality of the Bayesian estimator is poor, just as poor as that of the frequentist one. In the next section we deal with the problem of estimating μ² by using Bayesian inference, in both situations A and B and for any number of observations n ≥ 1. It will be shown that, especially when n is appreciably larger than one, the Bayesian estimator B gets ahead of the frequentist one, both in terms of bias and variance. Further, it does not incur the unpleasant circumstance of being negative, as unfortunately may happen to the frequentist estimator.

A final remark about the meaning of the Bayesian estimator and the comparison between Bayesian and frequentist estimators follows. The result of Bayesian inference is a pdf of the inferred parameter. Such a pdf is obtained from prior inter-subjective (or commonly perceived objectivity, [3]) knowledge about the parameter and from experimental evidence. A coverage interval can be derived from the pdf, having an agreed amount of probability (degree of belief) of containing the true value of the parameter. Once the information conveyed by the pdf is condensed into a single value by using the statistical expectation operator, it is by no means assured that such a value will match the physical expectation of the user of that information. This is likely to occur when the pdf is asymmetric (see [9] for an example in the case of the log-normal pdf). Through frequentist inference an estimate of the unknown parameter is found, but no pdf can be assigned to the parameter, since it is interpreted as a deterministic value. Thus reducing the comparison between Bayesian and frequentist inference to the comparison between the respective estimators is an oversimplification (for further discussion see [8], p. 115), sometimes at the expense of Bayesian inference (when the pdf is asymmetric).

4. Bayesian inference on μ²

As anticipated, we distinguish between situations A and B.

4.1. Situation A: $x_1, x_2, \ldots, x_n$ are observed

The likelihood of the observed values is

$l(\mathbf{x}\,|\,\mu, \sigma^2) \propto \exp\left[-\frac{(\mu - \bar{x})^2}{2\sigma^2/n}\right]$,   (16)

where $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ and $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. Assuming total ignorance about μ (i.e. a constant uninformative prior for μ), the posterior pdf of μ is simply

$f(\mu\,|\,\mathbf{x}, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma/\sqrt{n}}\exp\left[-\frac{(\mu - \bar{x})^2}{2\sigma^2/n}\right]$.   (17)

From (17) we conclude that μ is normal with expectation $\bar{x}$ and variance σ²/n, and therefore

$E_\mu[\mu^2] = (\bar{x})^2 + \frac{\sigma^2}{n}$,   (18)

and

$V_\mu[\mu^2] = 2\frac{\sigma^2}{n}\left[2(\bar{x})^2 + \frac{\sigma^2}{n}\right]$,   (19)

because $[\mu/(\sigma/\sqrt{n})]^2$ is noncentral chi-squared with ν = 1 degree of freedom and noncentrality parameter $\lambda = [\bar{x}/(\sigma/\sqrt{n})]^2$. In situation A, the MoM (see (3) and (4)) and Bayesian (see (18) and (19)) estimators are the same except for a change of sign of the term σ²/n in (18) with respect to (3). Note however that in this Bayesian analysis no approximations are needed, the derivation is straightforward and transparent, and the pdf of μ² is also obtained. From the physical point of view the most satisfactory estimate of the signal power would be $(\bar{x})^2$ rather than $(\bar{x})^2 + \frac{\sigma^2}{n}$; however, the pdf of μ² is asymmetric (the more so the larger σ²/n is), thus the considerations made at the end of Section 3 are relevant to this case.

4.2. Situation B: $x_1^2, x_2^2, \ldots, x_n^2$ are observed

Since X is normal with expectation μ and variance σ², X²/σ² is noncentral chi-squared with ν = 1 degree of freedom and noncentrality parameter λ = μ²/σ². It is evident that, since σ² is known, the inference on μ² from $\mathbf{x}^2 = (x_1^2, x_2^2, \ldots, x_n^2)$ is equivalent to the inference on λ from $\mathbf{x}^2/\sigma^2$. The likelihood of $\mathbf{x}^2/\sigma^2$ is therefore

$l\left(\lambda \,\middle|\, \frac{\mathbf{x}^2}{\sigma^2}, \sigma^2\right) \propto \exp\left(-\frac{1}{2}\lambda n\right)\lambda^{n/4}\prod_{i=1}^{n} I_{-1/2}\left(\sqrt{\lambda}\,\frac{|x_i|}{\sigma}\right)$,   (20)


where $I_{-1/2}(z)$ is the modified Bessel function of the first kind and of order −1/2. In Appendix A it is demonstrated that

$l\left(\lambda \,\middle|\, \frac{\mathbf{x}^2}{\sigma^2}, \sigma^2\right) \propto \sum_{k=1}^{2^{n-1}} \exp\left[\frac{n}{2}\left(\frac{\Sigma|x|_k}{n\sigma}\right)^2\right] \left\{\exp\left[-\frac{n}{2}\left(\sqrt{\lambda} - \frac{\Sigma|x|_k}{n\sigma}\right)^2\right] + \exp\left[-\frac{n}{2}\left(\sqrt{\lambda} + \frac{\Sigma|x|_k}{n\sigma}\right)^2\right]\right\}$,   (21)

where the term $\Sigma|x|_k$ represents one (the k-th) of the possible $2^n$ linear combinations of $|x_1|, |x_2|, \ldots, |x_n|$ with coefficients +1 or −1. Since we assumed, in case A, a constant prior for μ, the corresponding prior for λ, in case B, is proportional to $1/\sqrt{\lambda}$ (see Appendix B). Hence the posterior pdf of λ is

$f\left(\lambda \,\middle|\, \frac{\mathbf{x}^2}{\sigma^2}, \sigma^2\right) = \sqrt{\frac{n}{2\pi\lambda}}\; \frac{\sum_{k=1}^{2^{n-1}} \exp\left[\frac{n}{2}\left(\frac{\Sigma|x|_k}{n\sigma}\right)^2\right] \left\{\exp\left[-\frac{n}{2}\left(\sqrt{\lambda} - \frac{\Sigma|x|_k}{n\sigma}\right)^2\right] + \exp\left[-\frac{n}{2}\left(\sqrt{\lambda} + \frac{\Sigma|x|_k}{n\sigma}\right)^2\right]\right\}}{2\sum_{k=1}^{2^{n-1}} \exp\left[\frac{n}{2}\left(\frac{\Sigma|x|_k}{n\sigma}\right)^2\right]}$.   (22)

For the purpose of calculating the normalization constant of the posterior pdf, and its expectation and variance, it is convenient to change variable from λ to $s = \sqrt{\lambda}$. Indeed, when expressed in terms of s, the pdf in (22) transforms into the following weighted combination of normal pdfs:

$f\left(s \,\middle|\, \frac{\mathbf{x}^2}{\sigma^2}, \sigma^2\right) = \sqrt{\frac{n}{2\pi}}\; \frac{\sum_{k=1}^{2^{n-1}} w_k \left\{\exp\left[-\frac{n}{2}\left(s - \frac{\Sigma|x|_k}{n\sigma}\right)^2\right] + \exp\left[-\frac{n}{2}\left(s + \frac{\Sigma|x|_k}{n\sigma}\right)^2\right]\right\}}{\sum_{k=1}^{2^{n-1}} w_k}$.   (23)

The normal pdfs are centred at

$\pm\frac{\overline{|x|}_k}{\sigma} = \pm\frac{\Sigma|x|_k}{n\sigma}$,   (24)

and the weights are

$w_k = \exp\left[\frac{n}{2}\left(\frac{\overline{|x|}_k}{\sigma}\right)^2\right]$.   (25)

With this in mind the expectation and variance of λ are easily obtained as

$E_\lambda[\lambda] = E_s[s^2] = \frac{\sum_{k=1}^{2^{n-1}} \left[\left(\frac{\overline{|x|}_k}{\sigma}\right)^2 + \frac{1}{n}\right] w_k}{\sum_{k=1}^{2^{n-1}} w_k}$,   (26)

and

$V_\lambda[\lambda] = V_s[s^2] = \frac{\sum_{k=1}^{2^{n-1}} \left[\left(\frac{\overline{|x|}_k}{\sigma}\right)^4 + \frac{6}{n}\left(\frac{\overline{|x|}_k}{\sigma}\right)^2 + \frac{3}{n^2}\right] w_k}{\sum_{k=1}^{2^{n-1}} w_k} - E_s^2[s^2]$.   (27)

In terms of μ² we have, from (26) and (27),

$E_\mu[\mu^2] = \frac{\sum_{k=1}^{2^{n-1}} \left[\overline{|x|}_k^2 + \frac{\sigma^2}{n}\right] w_k}{\sum_{k=1}^{2^{n-1}} w_k}$,   (28)

and

$V_\mu[\mu^2] = \frac{\sum_{k=1}^{2^{n-1}} \left[\overline{|x|}_k^4 + \frac{6\,\overline{|x|}_k^2\,\sigma^2}{n} + \frac{3\sigma^4}{n^2}\right] w_k}{\sum_{k=1}^{2^{n-1}} w_k} - E_\mu^2[\mu^2]$.   (29)

The comparison between (28) and (18) reveals the way the Bayesian estimator works. Since sign information is lost by squaring, the Bayesian estimator evaluates μ through the weighted average of all the possible $2^n$ linear combinations of the $|x_i|$'s with coefficients +1 or −1, i.e. through $\overline{|x|}_k$ in place of $\bar{x}$. The weight is maximum when the same sign, positive or negative, is attached to all the $|x_i|$'s. The more the value of $\overline{|x|}_k$ deviates from the one obtained when all the $|x_i|$'s have the same sign, the smaller the weight.

Obviously the importance of (28) and (29) lies in the insight that they provide into the way the Bayesian estimator works, not in their numerical efficiency, which is quite poor. Indeed their numerical evaluation requires the computation of the weighted average of an exponentially large ($2^{n-1}$) number of terms (e.g. about 1.3 × 10³⁰ exponential terms if n = 100). It is therefore necessary to evaluate the moments of the pdf (23) once it has been recast for numerical calculation as (see Appendix C)

$f\left(s \,\middle|\, \frac{\mathbf{x}^2}{\sigma^2}, \sigma^2\right) \propto \exp\left(\sum_{i=1}^{n}\left\{-\frac{s^2}{2} + s\,\frac{|x_i|}{\sigma} + \ln\left[1 + \exp\left(-2s\,\frac{|x_i|}{\sigma}\right)\right]\right\}\right)$,   (30)

where the summation involves only n terms instead of $2^{n-1}$. The same results as those obtained through direct application of (28) and (29) are found by using (30). This is the method followed to produce the results reported in Section 5. Since the pdf of μ² is available, coverage intervals might also be numerically calculated, but this has not been done because the focus of this work is on point rather than interval estimation.

Fig. 2. Pdf (30) plotted as a function of s = |μ|. Each pdf corresponds to a sequence of length n, with n equal to 1, 150 and 300. The true value of μ is 3 and σ = 1.

Fig. 3. Comparison between the MoM estimator (7) (red line with dots) and the Bayesian one (28) (black line with dots) as a function of the length n of the sequence. The true value of μ is 3 and σ = 1. Note that the difference between the estimators is 2σ² = 2 when n = 1, as expected.

Fig. 4. Comparison between the standard deviation, square root of (8), of the MoM estimator (red line with dots) and the standard deviation, square root of (29), of the Bayesian one (black line with dots) as a function of the length n of the sequence. The true value of μ is 3 and σ = 1. Note that when n = 1 both estimators have the same variance, as expected.
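The numerical recipe just described can be illustrated with a short Python sketch (an illustrative re-implementation under assumed grid settings, not the author's Matlab code). `bayes_mu2_grid` integrates the stable form (30) on a grid, while `bayes_mu2_enum` evaluates (28) and (29) by explicit enumeration of the signed combinations, which is feasible only for small n; the agreement between the two is a useful sanity check:

```python
import math
from itertools import product

def _log_post(s, xs, sigma):
    # log of pdf (30), up to an additive constant (s = sqrt(lambda) >= 0)
    return sum(-0.5 * s * s + s * abs(x) / sigma
               + math.log1p(math.exp(-2.0 * s * abs(x) / sigma))
               for x in xs)

def bayes_mu2_grid(xs, sigma, s_max=12.0, num=5000):
    """E[mu^2] and V[mu^2] by midpoint integration of the pdf (30)."""
    ds = s_max / num
    grid = [(k + 0.5) * ds for k in range(num)]
    logw = [_log_post(s, xs, sigma) for s in grid]
    top = max(logw)
    w = [math.exp(lw - top) for lw in logw]        # stable, unnormalized pdf
    z = sum(w)
    es2 = sum(wi * s * s for wi, s in zip(w, grid)) / z
    es4 = sum(wi * s ** 4 for wi, s in zip(w, grid)) / z
    return sigma ** 2 * es2, sigma ** 4 * (es4 - es2 ** 2)

def bayes_mu2_enum(xs, sigma):
    """E[mu^2] and V[mu^2] from Eqs. (28)-(29): weighted average over the
    2^(n-1) signed combinations of the |x_i| (small n only)."""
    n = len(xs)
    a = [abs(x) for x in xs]
    z = e2 = e4 = 0.0
    for signs in product((1.0, -1.0), repeat=n - 1):
        # fixing the sign of |x_1| picks one member of each +/- pair
        m = (a[0] + sum(s * ai for s, ai in zip(signs, a[1:]))) / n
        w = math.exp(0.5 * n * (m / sigma) ** 2)   # Eq. (25)
        z += w
        e2 += w * (m ** 2 + sigma ** 2 / n)        # Eq. (28) numerator
        e4 += w * (m ** 4 + 6 * m ** 2 * sigma ** 2 / n + 3 * sigma ** 4 / n ** 2)
    return e2 / z, e4 / z - (e2 / z) ** 2          # Eq. (29)
```

For n = 1 both routines reproduce (11) and (12) exactly, e.g. `bayes_mu2_enum([2.0], 1.0)` gives (5, 18).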

5. Numerical results

It is interesting to compare the MoM estimates (7) and (8) with the Bayesian ones (28) and (29), respectively. We consider two cases (σ = 1):

1. μ = 3;
2. μ = 0.1.

Case 2 is extreme (noise power one hundred times larger than signal power); however, the analysis developed here is general, and the comparison between the two approaches, frequentist and Bayesian, is particularly interesting when the noise is comparable to or large with respect to the signal. For the purpose of this comparison, 300 sequences of n random terms, with n ranging from 1 to 300, are constructed as follows: the 1st sequence is $x_1^2$, the 2nd sequence is $x_1^2, x_2^2$, the 3rd sequence is $x_1^2, x_2^2, x_3^2$, and so on until the 300th and last sequence $x_1^2, x_2^2, x_3^2, \ldots, x_{300}^2$. It is important to compare both the asymptotic (i.e. large n) and small-sample (i.e. small n) behaviour of the MoM and Bayesian estimators.

For the generation of the plots in Figs. 2-7 an ordinary personal computer (dual processor, 2.13 GHz clock frequency, 2 GB random access memory) running Matlab was used. Numerical integration of the pdf (30), necessary to evaluate the first and second moments, is based on 5000 discrete points equally spaced over the horizontal-axis interval represented in Figs. 2 and 5. The computation time is of the order of a few tens of seconds.

Let us first consider case 1 (i.e. the true value of μ equals 3). In the plots red is for the MoM and black for the Bayesian estimate. In Fig. 2 the pdf (30) is plotted for the sequences whose lengths are 1, 150 and 300. The larger n is, the more concentrated about s = |μ| = 3 is the corresponding pdf. The expected values are shown in Fig. 3 and their standard deviations in Fig. 4. From Figs. 3 and 4 we conclude that, in this case, the MoM and Bayesian estimators are equivalent.

We now consider case 2 (the true value of μ equals 0.1). The abscissa of the peak of the pdf, see Fig. 5, gets closer to zero as n increases. The expected values of both the Bayesian and MoM estimators, see Fig. 6, tend to zero but, notably, the MoM one approaches zero from negative values. This implies that, at least for the particular random sequence generated for the present comparison, the MoM estimator predicts a negative signal power, not only for small sample sizes but also for large ones. A negative power obviously corresponds to a meaningless prediction. On the contrary, the Bayesian estimator cannot be negative, while still being asymptotically unbiased. Finally, see Fig. 7, the Bayesian estimator has an appreciably smaller standard deviation than the MoM one.

Fig. 5. The same as Fig. 2 but in the case where the true value of μ is 0.1 (and σ = 1).

Fig. 6. The same as Fig. 3 but in the case where the true value of μ is 0.1 (and σ = 1).

Fig. 7. The same as Fig. 4 but in the case where the true value of μ is 0.1 (and σ = 1).

6. Conclusion

The fact that the Bayesian and frequentist approaches to inference arrive at different estimators cannot be labelled a paradox, as the authors of [1] claim. Further, differently from what is stated in [1], the Bayesian estimator of a squared quantity is not inferior to the corresponding frequentist one based on the method of moments. Depending on the experimental situation at hand, the Bayesian estimator performs the same as (large signal-to-noise ratio) or better than (small signal-to-noise ratio) the frequentist one, both in terms of bias and variance. In addition, the frequentist estimator suffers from the tendency to be negative (and therefore invalid) for small sample size and small signal-to-noise ratio, while the Bayesian one is strictly positive. The major advantage of the frequentist estimator over its Bayesian counterpart is its simplicity: a pocket calculator is sufficient to evaluate the frequentist estimator. On the contrary, in order to evaluate the Bayesian estimator one needs to resort to numerical computation, due to the complexity of the posterior probability density function (in particular in the experimental situation named B in this work).

Appendix A

Since

$I_{-1/2}(z) = \sqrt{\frac{2}{\pi z}}\cosh(z)$,   (A.1)

then, substituting (A.1) into (20), we obtain

$l\left(\lambda \,\middle|\, \frac{\mathbf{x}^2}{\sigma^2}, \sigma^2\right) \propto \exp\left(-\frac{1}{2}\lambda n\right)\prod_{i=1}^{n}\cosh\left(\sqrt{\lambda}\,\frac{|x_i|}{\sigma}\right)$.   (A.2)

Now, since $2\cosh(z) = \exp(z) + \exp(-z)$, we have

$l\left(\lambda \,\middle|\, \frac{\mathbf{x}^2}{\sigma^2}, \sigma^2\right) \propto \exp\left(-\frac{1}{2}\lambda n\right)\prod_{i=1}^{n}\left[\exp\left(\sqrt{\lambda}\,\frac{|x_i|}{\sigma}\right) + \exp\left(-\sqrt{\lambda}\,\frac{|x_i|}{\sigma}\right)\right]$.   (A.3)

The product

$\prod_{i=1}^{n}\left[\exp\left(\sqrt{\lambda}\,\frac{|x_i|}{\sigma}\right) + \exp\left(-\sqrt{\lambda}\,\frac{|x_i|}{\sigma}\right)\right]$   (A.4)

is easily rearranged as

$\sum_{k=1}^{2^n}\exp\left(\sqrt{\lambda}\,\frac{\Sigma|x|_k}{\sigma}\right)$.   (A.5)

The term $\Sigma|x|_k$ represents one (the k-th) of the possible $2^n$ linear combinations of $|x_1|, |x_2|, \ldots, |x_n|$ with coefficients +1 or −1. For example, in the case n = 3 we have 2³ = 8 possible linear combinations, namely: $|x_1|+|x_2|+|x_3|$, $-|x_1|+|x_2|+|x_3|$, $|x_1|-|x_2|+|x_3|$, $|x_1|+|x_2|-|x_3|$, $-(|x_1|+|x_2|+|x_3|)$, $-(-|x_1|+|x_2|+|x_3|)$, $-(|x_1|-|x_2|+|x_3|)$ and $-(|x_1|+|x_2|-|x_3|)$. Note that $2^{n-1}$ combinations are obtained from the other $2^{n-1}$ through a change of sign. Therefore

$\sum_{k=1}^{2^n}\exp\left(\sqrt{\lambda}\,\frac{\Sigma|x|_k}{\sigma}\right) = \sum_{k=1}^{2^{n-1}}\left[\exp\left(\sqrt{\lambda}\,\frac{\Sigma|x|_k}{\sigma}\right) + \exp\left(-\sqrt{\lambda}\,\frac{\Sigma|x|_k}{\sigma}\right)\right]$.   (A.6)

Substituting the right side of (A.6) for the product in (A.3) we have

$l\left(\lambda \,\middle|\, \frac{\mathbf{x}^2}{\sigma^2}, \sigma^2\right) \propto \exp\left(-\frac{1}{2}\lambda n\right)\sum_{k=1}^{2^{n-1}}\left[\exp\left(\sqrt{\lambda}\,\frac{\Sigma|x|_k}{\sigma}\right) + \exp\left(-\sqrt{\lambda}\,\frac{\Sigma|x|_k}{\sigma}\right)\right]$,   (A.7)

and, completing the square in the exponents,

$l\left(\lambda \,\middle|\, \frac{\mathbf{x}^2}{\sigma^2}, \sigma^2\right) \propto \sum_{k=1}^{2^{n-1}}\exp\left[\frac{n}{2}\left(\frac{\Sigma|x|_k}{n\sigma}\right)^2\right]\left\{\exp\left[-\frac{n}{2}\left(\sqrt{\lambda} - \frac{\Sigma|x|_k}{n\sigma}\right)^2\right] + \exp\left[-\frac{n}{2}\left(\sqrt{\lambda} + \frac{\Sigma|x|_k}{n\sigma}\right)^2\right]\right\}$.   (A.8)

Appendix B

Let $f_W(w)$ be the pdf of the real random variable W. Then the pdf $f_Z(z)$ of the random variable Z = W² is obtained from $f_W(w)$ as (see [6], p. 119, Eq. (4-84))

$f_Z(z) = \frac{1}{2\sqrt{z}}f_W(\sqrt{z}) + \frac{1}{2\sqrt{z}}f_W(-\sqrt{z})$   (A.9)

if z > 0, and $f_Z(z) = 0$ if z < 0. Therefore, if $f_W(w)$ is the improper constant prior, then $f_Z(z) \propto \frac{1}{\sqrt{z}}$.

Appendix C

The numerator of (23) is easily rearranged as follows:

$\sum_{k=1}^{2^{n-1}}\exp\left[\frac{n}{2}\left(\frac{\Sigma|x|_k}{n\sigma}\right)^2\right]\left\{\exp\left[-\frac{n}{2}\left(s - \frac{\Sigma|x|_k}{n\sigma}\right)^2\right] + \exp\left[-\frac{n}{2}\left(s + \frac{\Sigma|x|_k}{n\sigma}\right)^2\right]\right\}$
$= \exp\left(-\frac{n}{2}s^2\right)\sum_{k=1}^{2^{n-1}}\left[\exp\left(s\,\frac{\Sigma|x|_k}{\sigma}\right) + \exp\left(-s\,\frac{\Sigma|x|_k}{\sigma}\right)\right]$
$= \exp\left(-\frac{n}{2}s^2\right)\prod_{i=1}^{n}\left[\exp\left(s\,\frac{|x_i|}{\sigma}\right) + \exp\left(-s\,\frac{|x_i|}{\sigma}\right)\right]$
$= \exp\left(-\frac{n}{2}s^2\right)\exp\left(\sum_{i=1}^{n}\ln\left[\exp\left(s\,\frac{|x_i|}{\sigma}\right) + \exp\left(-s\,\frac{|x_i|}{\sigma}\right)\right]\right)$
$= \exp\left(-\frac{n}{2}s^2\right)\exp\left(\sum_{i=1}^{n}\left\{s\,\frac{|x_i|}{\sigma} + \ln\left[1 + \exp\left(-2s\,\frac{|x_i|}{\sigma}\right)\right]\right\}\right)$
$= \exp\left(\sum_{i=1}^{n}\left\{-\frac{s^2}{2} + s\,\frac{|x_i|}{\sigma} + \ln\left[1 + \exp\left(-2s\,\frac{|x_i|}{\sigma}\right)\right]\right\}\right)$,   (A.10)

and the validity of (30) is demonstrated.

References

[1] Filippo Attivissimo, Nicola Giaquinto, Mario Savino, A Bayesian paradox and its impact on the GUM approach to uncertainty, Measurement 45 (2012) 2194-2202.
[2] Ignacio Lira, Evaluating the Measurement Uncertainty: Fundamentals and Practical Guidance, Institute of Physics, Series in Measurement and Technology, 2002.
[3] Giulio D'Agostini, Bayesian Reasoning in Data Analysis: A Critical Introduction, World Scientific Publishing, 2003.
[4] Joint Committee for Guides in Metrology (JCGM), Evaluation of measurement data - Guide to the expression of uncertainty in measurement, JCGM 100:2008, first ed., 2008.
[5] Joint Committee for Guides in Metrology (JCGM), Evaluation of measurement data - Supplement 1 to the "Guide to the expression of uncertainty in measurement" - Propagation of distributions using a Monte Carlo method, JCGM 101:2008, first ed., 2008.
[6] Athanasios Papoulis, Probability & Statistics, Prentice-Hall International, 1990.
[7] Milton Abramowitz, Irene A. Stegun, Handbook of Mathematical Functions, Dover Publications, 1972.
[8] Andrew Gelman, John B. Carlin, Hal S. Stern, Donald B. Rubin, Bayesian Data Analysis, second ed., Chapman & Hall/CRC, 2004.
[9] Carlo F.M. Carobbi, Marco Cati, Luigi M. Millanta, Using the log-normal distribution in the statistical treatment of experimental data affected by large dispersion, in: Proc. IEEE Intern. Symp. on EMC, Boston, Massachusetts, August 18-22, 2003, pp. 812-816.