The use of amplitude distributions in practical data assessment problems


Journal of Atmospheric and Terrestrial Physics, Vol. 41, pp. 1201-1204. Pergamon Press Ltd. 1979. Printed in Northern Ireland


H. A. VON BIEL

Department of Physics, University of Canterbury, Christchurch, New Zealand

(Received 10 October 1978; in revised form 11 May 1979)

Abstract: A method is presented for fitting experimental amplitude distribution data to the functional relationship $C = 1 - \exp(-aR^b)$, where C is the probability that the amplitude is less than R. The constants of fit a and b are shown to be sufficient for estimating a probability density function and, hence, for approximating any of the moments of the data distribution.

INTRODUCTION

In many experimental situations it is not possible to obtain a large number of truly independent samples, and some of the data values obtained may be contaminated by noise or by dynamic range limitations in the measuring apparatus. An example might be radio wave propagation data from a particular scattering circuit. In this example the signal amplitude fluctuations are large, typically of the order of 10-20 dB. Consequently, large signals can saturate the receiving equipment, while weak signals are masked by noise. Under such conditions the determination of average and r.m.s. values from the data is difficult. Theoretically this problem can be solved quite simply: it is well known that the average and mean-squared values of a fluctuating quantity can be computed when the probability density function for the process is known. From a practical point of view the problem is to determine, from experimental data, an expression for the appropriate probability density function. A simple and effective method for estimating the required probability density function is presented in the next section of this paper.

ESTIMATING THE PROBABILITY DENSITY FUNCTION

Assume that a certain experiment has yielded N amplitude data values. By definition of the amplitude, the values are assumed to be greater than, or equal to, zero. The data are next arranged in ascending order so that the first value will represent the lowest amplitude, and the Nth value will represent the largest amplitude. Define the quantity $C_n$ so that

[...]    (1)

Correspondingly, the nth amplitude will be designated as $R_n$. The combinations $(R_n, C_n)$ now constitute individual points on the cumulative distribution curve for the set of experimental data. The properties of this function are well known: C must increase monotonically for increasing values of R, and the limiting values of C are 0 and 1.0. The next question is in regard to the relationship between C and R. Assume that this relationship is given by

$$ C = 1 - \exp[-aR^b], \tag{2a} $$

or

$$ \ln(R) = -\frac{1}{b}\ln(a) + \frac{1}{b}\ln[-\ln(1 - C)], \tag{2b} $$

where a and b are real, positive constants. Equation (2) is totally empirical. The justification for its use is simply this: if values for a and b can be determined so as to fit the experimental data $(R_n, C_n)$, then we have an analytical estimate for the cumulative distribution function. A second reason for choosing the particular relationship in equation (2) is that it is easily manipulated mathematically. Thus, in terms of the probability density function P(R), we have

$$ C = \int_0^R P(R)\,dR, \tag{3} $$

from which

$$ P(R)\,dR = abR^{(b-1)}\exp[-aR^b]\,dR. \tag{4} $$

Equation (4) is an estimate of the required probability density function. The mth moment of the distribution is given by:

$$ E(R^m) = \int_0^\infty R^m P(R)\,dR, \tag{5a} $$

and, employing equation (4) with the substitution $u = aR^b$,

$$ E(R^m) = a^{-m/b}\,\Gamma\!\left(\frac{m}{b} + 1\right). \tag{5b} $$

Hence the mean, and mean squared values of R are, respectively, given by

$$ E(R) = a^{-1/b}\,\Gamma\!\left(\frac{1}{b} + 1\right), \tag{6a} $$

$$ E(R^2) = a^{-2/b}\,\Gamma\!\left(\frac{2}{b} + 1\right). \tag{6b} $$
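As a numerical check on the moment expressions, the following minimal sketch evaluates the closed form $E(R^m) = a^{-m/b}\,\Gamma(m/b + 1)$, which follows from substituting the density $abR^{b-1}\exp[-aR^b]$ into the moment integral. The function names are illustrative and do not come from the paper; the test values a = 0.5, b = 2 are those of the z = 0 row of Table 1.

```python
import math

def moment(a, b, m):
    """m-th moment of the density a*b*R**(b-1)*exp(-a*R**b)."""
    return a ** (-m / b) * math.gamma(m / b + 1.0)

def rms(a, b):
    """Root-mean-square amplitude, i.e. the square root of the second moment."""
    return math.sqrt(moment(a, b, 2))

a, b = 0.5, 2.0
print(f"mean   = {moment(a, b, 1):.4f}")   # sqrt(pi/2), about 1.2533
print(f"r.m.s. = {rms(a, b):.4f}")         # sqrt(2), about 1.4142
```

For a = 0.5 and b = 2 the density reduces to the Rayleigh form with unit noise variance, so the r.m.s. value equals the square root of 2.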

The determination of the constants a and b proceeds in a straightforward manner. The data values $(R_n, C_n)$ are first converted to the form $(Q_n, F_n)$ so that equation (2b) assumes the form

$$ Q_n = \left[-\frac{1}{b}\ln(a)\right] + \frac{1}{b}F_n, \tag{7a} $$

where

$$ Q_n = \ln(R_n), \tag{7b} $$

$$ F_n = \ln[-\ln(1 - C_n)]. \tag{7c} $$

The slope (1/b) and the intercept [-(1/b) ln(a)] are next obtained by fitting $(Q_n, F_n)$ to equation (7a) by the method of least squares. The fact that equation (7a) represents a straight line implies that any two, or more, sets of data $(Q_n, F_n)$ are sufficient to determine these constants of fit. The confidence in their values increases, of course, with the number of data points used. We have used this analysis method exclusively for evaluating amplitude data from D-region partial reflections. A normal experiment will yield from 200 to 300 data points, and our experience has been that the r.m.s. difference between the left-hand side and the right-hand side of equation (2a) rarely exceeds 5% for the interval 0.1 ≤ C ≤ 0.9. The most common cause for a 'bad fit' is radio frequency interference, which will cause the amplitude distribution to be 'double humped'.

EVALUATION

The greatest problem encountered in evaluating the analysis method presented here was to find a standard against which to compare results, and a certain amount of artificiality had to be used in the test cases presented in this section. The first question to be answered was how well the empirical relationship in (2a) would fit ideal amplitude data from some statistical distribution. For this purpose the well-known process given by RICE (1944, 1945) was examined. In essence RICE developed the probability density function P(R, z) for the modulus of a field vector which results when a sinusoidal signal of amplitude A coexists with random noise of zero mean and standard deviation σ. For the general case we define the quantity $z^2$ as the signal-to-noise power ratio

$$ z^2 = \frac{A^2}{2\sigma^2}, \tag{8} $$

and

$$ P(R, z)\,dR = \frac{R\,dR}{\sigma^2}\exp[-z^2]\exp\!\left[\frac{-R^2}{2\sigma^2}\right] I_0\!\left[\frac{\sqrt{2}\,zR}{\sigma}\right], \tag{9} $$

where $I_0$ is the Bessel function of imaginary argument. Numerical integration of equation (9) for given values of z and for σ = 1.0 yielded 'data sets' $(R_n, C_n)$ in the range 0.05 ≤ C ≤ 0.95. Analysis of these data, using equations (7), gave the results as shown in Table 1. The errors appearing in the table refer to the difference between the two sides of equation (2a). In the last two columns of the table the 'true' r.m.s. value is compared with the r.m.s. value as computed from equation (6b). The agreement is seen to be excellent. It should be noted, however, that equations (9) and (4) are not identical, and consequently numerical values for z and σ can only be approximated from the coefficients of fit. Within these limitations it can be shown that

[...]    (10)

Table 1

z     a            b       Error (r.m.s.)   Error (max)     $\sqrt{2(1+z^2)}$   $a^{-1/b}\sqrt{\Gamma(2/b+1)}$
0.0   0.5000       2.000   8.7×10^-[...]    1.4×10^-[...]   1.414               1.414
0.5   0.3935       2.018   9.0×10^-4        0.0016          1.581               1.584
1.0   0.2020       2.222   0.0055           0.0084          2.000               2.015
2.0   0.0141       3.530   0.0031           0.0054          3.162               3.155
3.0   3.0×10^-4    5.224   0.0118           0.0183          4.472               4.453
4.0   3.35×10^-6   6.958   0.0169           0.0263          5.831               5.807
5.0   2.36×10^-8   8.701   0.0200           0.0314          7.211               7.184
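The test against the Rice distribution can be sketched in a few lines. The block below is an illustrative reconstruction, not the author's program: it integrates the Rice density numerically for σ = 1, keeps the points with 0.05 ≤ C ≤ 0.95, fits a and b by a least-squares line through (F, Q) = (ln[-ln(1 - C)], ln R), and compares the fitted r.m.s. value with the true one. The integration step, the upper integration limit, the even spacing of the points in R, and all function names are assumptions, so the fitted values need not reproduce Table 1 digit for digit.

```python
import math

def bessel_i0(x):
    """Modified Bessel function I0(x), summed from its power series."""
    q = (x / 2.0) ** 2
    term, total, k = 1.0, 1.0, 0
    while term > 1e-17 * total:
        k += 1
        term *= q / (k * k)
        total += term
    return total

def rice_pdf(r, z, sigma=1.0):
    """Rice density of the amplitude r for signal-to-noise parameter z."""
    return ((r / sigma ** 2) * math.exp(-z * z)
            * math.exp(-r * r / (2.0 * sigma ** 2))
            * bessel_i0(math.sqrt(2.0) * z * r / sigma))

def rice_cdf_points(z, sigma=1.0, step=0.002, c_min=0.05, c_max=0.95):
    """Trapezoidal integration of rice_pdf; returns (R, C) pairs inside [c_min, c_max]."""
    points = []
    r, c, p_prev = 0.0, 0.0, 0.0            # the density vanishes at R = 0
    r_max = sigma * (math.sqrt(2.0) * z + 6.0)
    while r < r_max:
        r += step
        p = rice_pdf(r, z, sigma)
        c += 0.5 * (p_prev + p) * step
        p_prev = p
        if c_min <= c <= c_max:
            points.append((r, c))
    return points

def fit_a_b(points):
    """Least-squares line ln(R) = -(1/b) ln(a) + (1/b) ln[-ln(1 - C)]."""
    Q = [math.log(r) for r, _ in points]
    F = [math.log(-math.log(1.0 - c)) for _, c in points]
    n = len(points)
    mean_q, mean_f = sum(Q) / n, sum(F) / n
    slope = (sum((f - mean_f) * (q - mean_q) for f, q in zip(F, Q))
             / sum((f - mean_f) ** 2 for f in F))
    intercept = mean_q - slope * mean_f
    b = 1.0 / slope
    a = math.exp(-intercept * b)
    return a, b

def rms_from_fit(a, b):
    """Square root of the second moment of the fitted density."""
    return a ** (-1.0 / b) * math.sqrt(math.gamma(2.0 / b + 1.0))

for z in (0.0, 2.0):
    a, b = fit_a_b(rice_cdf_points(z))
    true_rms = math.sqrt(2.0 * (1.0 + z * z))
    print(f"z={z}: a={a:.4f} b={b:.3f} "
          f"fitted r.m.s.={rms_from_fit(a, b):.3f} true r.m.s.={true_rms:.3f}")
```

For z = 0 the Rice density reduces to the Rayleigh form, which equation (2a) represents exactly with a = 0.5 and b = 2, so that case serves as a check on the integration and the fit.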

When appropriate values of z and b are substituted from Table 1 into equation (10) it will be found that the two sides of the equation differ by about 10%. The results in Table 1 are graphically represented in Fig. 1 for four values of z. The variable $V^{1/2}$ results when the values of R are normalised to their r.m.s. value. When we define

[...]    (11a)

and

[...]    (11b)

then we obtain expressions for $C(V^{1/2})$ and $P(V^{1/2})$ from equations (2a) and (4) respectively, by substituting $V^{1/2}$ for R and a' for a. These functions are shown as solid-line curves in the illustrations. Also shown are two horizontal broken lines which indicate the range in $C(V^{1/2})$ considered. The asterisks represent 'data' points which result from numerical integration of equation (9) and change of variable from R to $V^{1/2}$. The distribution of 100 such $V^{1/2}$ data values is shown as bargraphs in Fig. 1.

We have selected experimental data from a D-region polarimeter experiment (VON BIEL, 1977) in an effort to demonstrate that the new analysis method can give good results even for severe data truncation. In this experiment two complex fields $V_x$ and $V_y$ are vectorially combined after one of these fields has been shifted in phase relative to the other by a known amount (α). This procedure results in a new complex field V(α). From experimental data the r.m.s. value for V(α) is computed for eight values of α in steps of 45 degrees. In addition, data exist to compute the r.m.s. values for $V_x$ and $V_y$. In our experiment a 10 bit analogue-to-digital voltmeter was used for which the saturation voltage was 10 V. This saturation value was exceeded for 7% of the time for the experimental data listed as α = 90° in Table 2. For α = 45° and α = 135°, saturation occurred for 4% and 3% of the time respectively. For each phase, 260 raw data values were collected, and r.m.s. values for V(α), $V_x$ and $V_y$ were computed by the traditional method.


Fig. 1. Composite of results (see text for details).

The same data were next analysed, using equations (7) and in the range 0.1 ≤ C ≤ 0.9. As expected, the r.m.s. values computed from equation (5b) were practically equal to the 'true' values. This expectation is demonstrated in Table 1. We then repeated the analysis, again using a range of 0.1 ≤ C ≤ 0.9, but this time under the assumption that voltmeter saturation occurred at 2.4 V rather than the true 10 V. The results from this analysis are shown in Table 2 along with a column which indicates the fraction of the data used to compute the coefficients a and b. It should be clear that there is no implication intended that a data set comprising less than 20% of the total points is adequate for obtaining good results. However, our experience with computer simulated data has shown that satisfactory estimates result when as little as 40% of the total data are used to compute a and b. This fact also emerges from the results shown in Table 2.

Table 2

α         r.m.s. (traditional method)   $a^{-1/b}\sqrt{\Gamma(2/b+1)}$   Fraction of data used
0°        3.65                          3.54                             0.16
45°       4.75                          -                                0.02
90°       5.13                          -                                0.00
135°      4.75                          -                                0.02
180°      3.65                          3.76                             0.14
225°      2.02                          1.91                             0.57
270°      0.51                          0.52                             0.75
315°      2.02                          1.92                             0.57
$V_x$     2.82                          2.48                             0.33
$V_y$     2.31                          2.19                             0.41

CONCLUSION

The data analysis method presented in this paper is obviously more complex than the traditional averaging technique, and under normal conditions it is not intended as a competitor to the time-honoured procedures. Under conditions of apparatus saturation or low signal-to-noise ratios, it is advantageous to eliminate the suspect data from the analysis procedure and still obtain acceptable results. The data analysis method suggested here is capable of filling such a requirement. In addition, it is frequently desirable to store data for future consideration, and particularly in this respect the new technique offers great advantages, because it requires that only the two parameters of fit, a and b, be stored. From these two parameters the cumulative distribution, the probability density function, and also the moments for the original data can be approximated.
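The effect of discarding saturated data can be illustrated with simulated amplitudes. The sketch below is a hypothetical example, not the experiment reported in Table 2: it draws Rayleigh-distributed amplitudes (true r.m.s. equal to the square root of 2), clips them at an assumed saturation level of 1.5, and compares the traditional r.m.s. of the clipped record with the value obtained by fitting a and b to the unsaturated points only. The plotting position $C_n = n/(N+1)$ is an assumption made for this sketch, as are the sample size, the seed, and the function names.

```python
import math
import random

def fit_a_b(R, C):
    """Least-squares line ln(R) = -(1/b) ln(a) + (1/b) ln[-ln(1 - C)]."""
    Q = [math.log(r) for r in R]
    F = [math.log(-math.log(1.0 - c)) for c in C]
    n = len(Q)
    mean_q, mean_f = sum(Q) / n, sum(F) / n
    slope = (sum((f - mean_f) * (q - mean_q) for f, q in zip(F, Q))
             / sum((f - mean_f) ** 2 for f in F))
    b = 1.0 / slope
    a = math.exp(-(mean_q - slope * mean_f) * b)
    return a, b

rng = random.Random(0)        # fixed seed so that the run is repeatable
N = 2000                      # hypothetical number of samples
saturation = 1.5              # hypothetical clipping level of the apparatus

# Rayleigh amplitudes with unit noise variance, sorted into ascending order.
amplitudes = sorted(math.sqrt(-2.0 * math.log(1.0 - rng.random())) for _ in range(N))
clipped = [min(r, saturation) for r in amplitudes]

# Traditional estimate: biased low because the large values were clipped.
rms_traditional = math.sqrt(sum(r * r for r in clipped) / N)

# Assumed plotting position C_n = n/(N+1); keep unsaturated points in 0.1 <= C <= 0.9.
pairs = [(r, (n + 1) / (N + 1)) for n, r in enumerate(clipped)]
kept = [(r, c) for r, c in pairs if 0.1 <= c <= 0.9 and r < saturation]
a, b = fit_a_b([r for r, _ in kept], [c for _, c in kept])

rms_fit = a ** (-1.0 / b) * math.sqrt(math.gamma(2.0 / b + 1.0))
fraction_used = len(kept) / N

print(f"true r.m.s.        = {math.sqrt(2.0):.3f}")
print(f"traditional r.m.s. = {rms_traditional:.3f}")
print(f"fitted r.m.s.      = {rms_fit:.3f} (a={a:.3f}, b={b:.3f})")
print(f"fraction of data used = {fraction_used:.2f}")
```

Only a and b need to be retained afterwards; the cumulative distribution, the density, and the moments can be regenerated from them.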

REFERENCES

RICE S. O.        1944    Bell Syst. Tech. J. 23, 282.
RICE S. O.        1945    Bell Syst. Tech. J. 24, 46.
VON BIEL H. A.    1977    J. atmos. terr. Phys. 39, 769.