The shape of SAR histograms

The shape of SAR histograms

COMPUTER VISION, GRAPHICS, AND IMAGE PROCESSING 4,281-293 (1988) The Shape of SAR Histograms RICHARD Department of Physics, University R. Z...

661KB Sizes 12 Downloads 57 Views

COMPUTER

VISION,

GRAPHICS,

AND

IMAGE

PROCESSING

4,281-293

(1988)

The Shape of SAR Histograms RICHARD Department

of Physics,

University

R. ZITO of Arizona,

Tucson,

Arizona

85721

Received September 25,1985; accepted March 14,1988 It is customary to model the shape of amplitude histograms from synthetic aperture radar imagery by a Rayleigh distribution with a tied mode. Frequently histograms emerge whose shape shows non-Rayleigb behavior, especially in the “tail.” A more acceptable model for histograms of this type is to allow the mode of the Rayleigh distribution to possess a distribution of values all its own. Q 1988Academic press, IX. 1. INTRODUCTION

During the processing of synthetic aperture radar (SAR) imagery it is frequently useful to construct a histogram of the fractional number of pixels per amplitude increment versus the amplitude. The shape of the resulting bar graph is usually approximated by a Rayleigh probability density function (Rayleigh distribution), although this approximation may occasionally be poor. The departure of histograms from the Rayleigh shape can be quite useful. Certain histogram shapes are associated with certain types of terrain and may be used to facilitate automatic terrain recognition. Furthermore, certain portions of the histogram are associated with certain types of objects. For example, man-made objects usually turn up in the “long tail” of a histogram which contains “bright” pixels. Therefore, it is important to understand the cause or causes of departures of SAR histograms from the Rayleigh model if these histograms are to be used in an unambiguous way. For example, can two different types of scenes, one urban, the other pastoral, give similar histograms with long tails? These are the types of questions that must be answered. The justification for utilizing a Rayleigh distribution to model the amplitude histogram of an image has its origin in the fact that the Rayleigh distribution describes the magnitude of the vector sum (resultant) of a large number of small, independent, random phasors with equiprobable phases and equal amplitudes. In radar imagery the resultant is simply the amplitude (A) of the received electromagnetic field scattered randomly by a larger number of small scatterers. Over a small subregion of a scene the scattering properties of the terrain may be sufficiently uniform that the Rayleigh amplitude distribution p,(A) applies and is given by [l].

Here a, is the mode of the histogram of the returns from the nth subregion of some larger radar map. However, when a scene involves a variety of objects or terrains, then the conditions required for a Rayleigh distribution over the whole scene will fail. If the whole scene is sufkiently large and complex, then it would be reasonable to approximately describe the modes of the subregions with a distribution function 281 0734-189X/88

$3.00

CopytQht 0 1988 by Academic Press, Inc. All rights of reproduction in any fom reserved.

282

RICHARD

R. ZITO

of their own. The Rayleigh distribution used to describe the histogram of the nth subregion is, then, a conditional probability density function, conditioned over the outcome of the random variable (J with index n for the nth subregion choice. It is the intent of this report to characterize the overall (global) probability density function of amplitude for the whole scene when various choices are made for the probability distribution of the random variable (I. Modified Gaussian, Rayleigh, and exponential distributions have been considered and tested for fit to real radar data. 2. CALCULATIONS

The global probability density function of amplitude Pf(A) is given by the integral of the Rayleigh distribution weighted by a probability density function of modes f(a). That is,

(2) The integration proceeds from 0 to co since (I is always positive. Equation (2) may be integrated exactly in a few special cases. However, the most interesting probability density functions f will require approximate methods. The technique described by Papoulis [2] is most suited to integrals where one of the two functions in the integrand is a probability density function. This method also gives an answer which is theoretically useful since it will provide a measure of the deviation from Rayleigh-like behavior. It can be shown [2] that given the integral I such that

where f(x)

is a probability

density function,

I = g(q) + gq,,g + . . . g’“)(q)$.

(4)

where q = j-” xf(x) --m

(5)

dx

and

Pn= I _mm(x -

v)“f(x)

dx.

These formulas apply provided that g is “smooth.” In this context “smooth” means that the function g does not vary much over the region where f takes on signiticant values. In the next two sections both exact and approximate evaluation of Eq. (2) will be considered.

283

THE SHAPE OF SAR HISTOGRAMS 3. EXACT METHODS

The very simplest distribution of modes is the delta function distribution. In that case only a single global value (Us) for the mode is allowed and Eq. (2) becomes

P,(A)

=

.

($e-

(1/21(07)‘6

(

u

-

um)

do

(8)

.

as we would expect. That is, P,(A) has a Rayleigh distribution with u = a,. All other P,(A) will be compared to P,(A) with a,,, = 1. Next, consider the more interesting possibility of a Rayleigh distribution of modes with mode M. This has intuitive appeal since the mode can never have negative values. In that case the global probability density function of the amplitude, P,(A), is given by

Setting

yields

(11) Since d(&)2

= 2&d&,

Eq. (11) becomes

(12) Define

(13) and

284

RICHARD

R. ZITO

such that

Then

The integral of Eq. (15) is given by [3, 41

P,(A) = 2kKo(2@),

(16)

where K, is a function whose properties are well known since it is the solution of the second-order differential equation [5] 1 11” + -u’ Z

V2

i

1 + -g

u =

0,

i

where u is a function of z. The series solution of Eq. (17) yields [6]

where $J is the derivative of the natural logarithm of the gamma function. elementary clearing of factors

After

Therefore,

It is convenient to expand J/(j + 1) for integers j as follows [7, 81: +(j+1)=

-c+

i

;,

(21)

?l=l

where c is the Euler constant whose value is approximately given by [9, lo] 0.577.. . . The comparison between P,(A) and P,(A) may be seen in Fig. 1, where peaks have been aligned. Generally speaking, P,(A) is too gently sloping in the tail to match the histograms of most SAR images.

285

THE SHAPE OF SAR HISTOGRAMS 1.00 -

- - - _ - - -PRAY

0.90 i 0.80 - \.

PDELTA

\ 0.70 -

‘\ -.-.v-.-.-.,3,.,GA”S

AMPLITUDE

FIG. 1. A comparison of histogram behavior is shown above when the modes of various subregions are distributed according to a Rayleigh law (PRAY) and a half Gaussian (PHGAUS). A delta function distribution of modes is shown for comparison (PDEL.T). This latter curve is just purely Rayleigh in nature.

Finally, thereis one other interestingchoicefor f(a) which is exactlyintegrable but is perhaps a little articial. The half Gaussianprobability density function, definedby f(u)

-w2)(~/s)2

= &e

f(o) = 0

u 2 0,

(22)

u < 0,

(23)

where s is the standarddeviation,may be useful whenmodelingimagerywhich is largely dark and hasonly a very few bright subregions[ll]. The factor of 2 in Eq. (22)is thereto ensurethat f(u) posesa unit integraloverA. The globalprobability density function of the amplitudeP,,,(A) is givenby

Defming a

s

‘A2 2

(25)

and (26)

we get (27)

286

RICHARD

R. ZITO

which integrates to P,,(A)

= ;tcA?

Setting s = 1 yields the curve in Fig. 1. In general, this curve is usually not a good model of SAR histograms since it does not go to zero at the origin. However, such histograms do occur in optical imagery [12]. 4. APPROXIMATE

METHODS

Perhaps the single most important choice for f(o) is when u is distributed by a narrow Gaussian about some non-zero value. There is a certain amount of inconsistency with this assumption since a Gaussian will have a tail which will extend into the negative region of the u-axis on the f(a) vs u graph. However, if the Gaussian is narrow and possesses a mean which is far from the origin, little probability will be trapped under the tail in the region of negative u. In that case the probability of achieving a negative u will be very close to zero. It also makes sense to extend the integration of Eq. (2) to the interval (- co, cc), since f(u) has only very small values in the region of negative values of u. Now, the global probability density function PG,,,W is given by

where, as in the case of the &function, a, represents the displacement of the mode of the distribution f from the origin of the u axis in the f vs u graph. Since

and

it is easy to show that the moments q and cd2for the Gaussian distribution pLzg) are calculated from Eqs. (5) and (6) to be 9, = %I

(q, and

(32)

and /LQ

Therefore, to second order, Eq. (4) yields

= s2.

(33)

287

THE SHAPE OF SAR HISTOGRAMS 0.63

PDELTA c

/ ’ A.

._ . -

_-

_ ..pGA"S

AMPLITUDE

FIG. 2. A comparison of histogram behavior is shown above when the modes of various subregions are distributed according to a gaussian law (PGAUS) and an exponential law (PEXP). A delta function distribution of modes is shown for comparison.

In this form it is clear that P,,,(A) p assesses a Rayleigh part and a part in curly brackets which represents the departure from Rayleigh behavior. P,,,,(A) (Eq. (34)) is graphed in Fig. 2 with P,(A) for comparison. P,,,,(A) shows Rayleigh-like behavior with a more slowly decreasing tail. This behavior is very typical of SAR histograms. One last distribution for f will be examined by approximate methods. When f(a) has the shape of the displaced exponential below f(u)

= X,-w-%)

0’

f(Q) = 0

a*,

(35)

0 < a,,

(36)

the moments r],r and pZexp are given by 1 rl=P

=a,+

x 1

112exp

=

$7’

Therefore, to second order, the global probability P,,,(A), is given by

(37)

(38) density function of amplitude,

This expression for P,,(A) is graphed in Fig. 2. It has the same basic character as P,,,,,(A). It is difficult to distinguish between the two models by an examination of data. However, the Gaussian distribution of modes is more intuitively appealing. It

288

RICHARD

R. ZITO

is interesting to note that Rq. (39) again has the form of a Rayleigh term times a term which represents the deviation of P,,,(A) from Rayleigh behavior. 5. PARAh4ETRIC

METHODS

In this section, fitting SAR histograms with two and three parameter functions is considered. Examples are the log normal [13-151 and the Weibull distribution [16]. The log normal distribution takes the form 0

(A I c),

P log-nor@) =

[log(A - c) - m] ’ i (A -:).fiexp

2u2

where c, m, and u are adjustable constants. The Weibull distribution

(A >c),

(‘@

takes the form

P,,,,,(A) = (d+@?

(41)

where a and /3 are adjustable. In general, good fits may be obtained between these parametric distributions and the experimental data, with the Weibull distribution being the better of the two. However, little insight is gained about the nature of images from such techniques, since the parameters are not easily connected to physically meaningful quantities. 6. COMPARISON

OF

THEORY

WITH

IMAGERY

DATA

Figure 3 shows a typical fit of Eq. (34) to some SAR imagery. Twelve scenes containing forests, villages, farm land, lakes, and roads were analyzed. Each of the scenes was composed of approximately 4 X lo6 pixels of information so that the total analysis involved e 5 X lo7 pixels. Several tests were used to measure the

E.4[

FIG. 3. A typical Rayleigh fit (PDELTA)

&;

---------PDAT

fit of Eq. (34) (PGAUS) to an eqkmntd is also graphed for compakm

SAR

histogram

(PDAT).

The

best

THE

SHAPE

OF

SAR

289

HISTOGRAMS

goodness of fit between theoretical (model) and experimental histograms. Each of these will be discussed below. The simplest figure of merit is to evaluate the percent error in the experimental and theoretical probability functions (called DATA(i) and THEORY(i), respectively). More precisely the values of the probability functions DATA(i) and THEORY(i) at point i may be computed from probability density functions (pdfs) PDAT(i) and PGAUS(i), respectively (see Fig. 3) DATA(i) THEORY(i)

= PDAT(i) = PGAUS(i)

x

A x

(42)

A,

(43)

where A is a small interval on the amplitude axis containing point i. Therefore, the percent error is given by

%ERROR(Probab.)

i JTHEORY(i) - DATA(i)] = i-l 1

x 100,

W)

ivhere in our case n = 52 (i.e., our graphs are plotted from 52 data points corresponding to amplitude changes from 0 to 520). Note that the denominator in the equation above is unity since the total probability of any of the i outcomes is unity. Another more standard and very popular test is the chi square test where chi square, x2, is given by [17] x2 = f

[THEORY(~)

i=l

- DATA(i)]’ 9

0;

(45)

where a,. is the size of the error bar associated with the experimental points DATA(i). For the radar unit in this report ui - 0.0039. The value of x2 calculated above may then be compared with the tabulated values of x2 for 50 degrees of freedom (the number of data points minus two degrees of freedom associated with the parameters in Eq. (34)). The probability of achieving a larger x2 (worse tit) may then be obtained from the same table [18]. The Kohnogorov-Smirnov test was also applied to test the fit between the experimental data and Eq. (34) [19]. The Kolmogorov-Smirnov sample distribution function S,(i) f19) was constructed from the data distribution function generated by integration of PDAT(i) over i subintervals of size A. Next, the distribution function Foauss(i) was constructed by integrating THEORY(i) over i submtervals of size A. The norm 0, given by D,, = Sup&=,,,(i)

- S,(i)]

for all i

(W

may now be evaluated. If D, < D,“, where 0,” is the critical value associated with confidence coefficient 1 - a, then the hypothesis is accepted. In this report LY= 0.05 has been used and is associated with the 95% confidence band.

290

RICHARD

R. ZITO

The “skewness”and “kurtosis” (or excess)are a measureof the asymmetryand flatnessof a densityfunction, respectively[20]. Error in thesequantitiesgivessome estimateof the “shape” errorsencounteredin fitting the two-parametertheory of Eq. (34) to experimentaldata. Skewnessand kurtosis are definedin terms of the second,third, and fourth centralmoments(pz, pL3,and pq, respectively)about the means MO and MT of the experimentaland theoreticaldistributions,respectively. Therefore, ERROR (SKEWNESS) = ~,(THE~RY) - ~,(DATA) ERROR (KURTOSIS) = Y*(THEORY) - Y~(DATA),

(47) (48)

where SKEWNESS(DATA) = ~~(DATA) =

idDATA>

[P,@ATA)~~‘* ~A~(THEORY) SKEWNESS (THEORY) = Y,(THE~RY) = [p 2(THEORY)] 3/2 P@ATA) bz(DATA)]* - 3 p4(THEORY) KURTOSIS (THEORY) = y,(THEORY) = - 3 [/-dTHEORY)]* KURTOSIS (DATA) = y,(DATA) =

(49) (50) (51) (52)

p2(DATA) = ;cl ([ ih - M,]* * PDAT( i) - A)

(53)

,u*(THEORY) = icl ([ ih - M,]* . PGAUS(i) - A)

(54)

ps(DATA) = igt ([iA - M,]~ * PDAT(i) * A)

(55)

p3(THEORY) = i$i ([iA - +I3 * PGAUS(i) - A)

(56)

c(,(DATA) = i$ ([ih - M,]~ - PDAT(i) - A)

(57)

p.,(THEORY) = igi ([iA - M,]~ * PGAUS(i) * A) MO = e iA * PDAT(i) * A

(58) (5%

i=l

MT= iih-PGAUS(i).A. i-l

(60)

“A positive

1 2 3 4 5 6 7 8 9 10 11 12

Id. No.

outcome

10.39 10.55 9.98 6.81 9.52 6.92 7.33 10.45 9.98 7.03 10.11 23.92

% Error in probab. (DEF. 44)

means

22.94 38.63 31.96 11.67 17.41 11.97 11.76 27.49 31.62 7.81 22.60 294.83

x2

acceptance

2 99 87.33 97.73 2 99 2 99 2 99 2 99 > 99 97.92 2 99 2 99 2 0.010 of Eq. (34).

0.58 1.10 1.12 0.92 0.92 0.92 0.86 1.07 1.11 0.75 1.02 1.17

Skewness WW

1

x

Kurtosis Ww)

lo-*

Kurtosis (data) 0.55 - 0.45 - 0.31 0.07 0.18 0.08 0.14 0.05 -0.39 0.23 0.11 - 0.42

Skewness difference

to Eqs. (34) and (44))

- 0.97 3.54 2.77 0.31 -7.88 x lo-* 0.31 7.08 x lo-* 0.76 3.24 -0.47 0.39 3.60

Best Fit (According

-0.54 1.14 1.13 0.48 0.36 0.48 0.21 0.87 1.14 -7.98 0.69 1.14

TABLE and Theoretical

2.48 x lo-* 1.55 1.42 0.85 0.74 0.85 0.72 1.02 1.50 0.52 0.91 1.58

for Data

Skewness (theory)

Parameters

Probab. of Achieving greater x2 (in percent)

Statistical

-

-

-

0.43 2.41 1.64 0.17 0.44 0.17 0.28 0.11 2.11 0.40 0.30 2.46

Kurtosis difference

+ + + + + + + + + + + +

KolmogorovSmimov testa

E ul

g

z X ;3

$

i z-4

2

292

RICHARD

R. ZITO

-----

NORMALIZED

NORMALIZED

SMOOTHED

DATA

BEST flT

WEISULL

ERROR = 008

PIXEL AMPLITUDE

FIG. 4. Comparison of best fit Weibull curve to experimental SAR data curve.

In Table 1 the skewness yi and kurtosis y2 of the experimental and theoretical data are presented as well as the difference between these quantities. Skewness ranges from zero for a Gaussian to - 0.6 for a Rayleigh distribution to a value of 2 for an exponential density function [20]. The kurtosis (or excess) ranges from zero for a Gaussian to - 3.2 for a Rayleigh curve to 6 for an exponential [20]. All the theoretical and experimental histograms analyzed in this report had a Rayleigh-like behavior, but the longer right-hand tail results in a larger skewness. 7. CONCLUSION

Table 1 summa&es the results of this investigation. “Best fits” were determined by minimizing the percent error in the probability function (i.e., minMzing % ERROR[Probab.]) given by Eq. (44). This procedure generally led to selection of parameters s and a, in Eq. (34), which were close to, but not exactly the same as, those selected by mhimizhg x2 or the error in yi and y2. Generally speaking, fits are good. The minimum percent ERROR encountered with Eq. (34) is usually about 10% or less. This is comparable to, but slightly greater than, the percent ERROR associated with physically empty parametric fits. Weibull fits are about 2% better than those associated with Eq. (34) and log-normal fits offer a small 1% improvement. However, it should be noted that parametric fits are worse in the tail (cf. Figs. 3 and 4). One notable exception to the good fits which have been achieved is scene 12. For that scene the probability of achieving x2 2 the computed x2 is very small. This indicated rejection of the hypothesis (Eq. (34)). The inability to achieve a good fit

THE SHAPE OF SAR HISTOGRAMS

293

for this data seems to arise from the requirement that the function g(x) in Eq. (3) change too rapidly with respect to f(x) to allow the sequence Eq. (4) to be terminated after second order. In short, the “smoothness” condition on g(x) has been violated. The Kohnogorov-Smirnov test indicated acceptance of the hypothesis (Eq. (34)) in all 12 cases. Even data set 12 passed the test, though with the smallest margin. Taking into account the computational limitations of Eq. (34), it seems that the global probability density function of amplitude can be computed from the assumption that the various subregions of a scene each have a Rayleigh probability density function of amplitude with a fixed mode. There is, therefore, little chance of confusing the histogram of a lake, for example, with that of an agricultural area consisting of a checker board of small fields. The former scene will have a Rayleigh pdf and the latter a global pdf with a long “tail” due to the variation of modes as we move from one cultivated square to the next. However, some confusion arises when one compares the histogram of farm lands with that of a small village in a forest. The histograms may be quite similar even though the terrain types are quite different. This is because very different distributions of modes may yield very similar global pdfs as can be seen by comparing Eqs. (34) and (39). Such effects cause “false alarms” during automatic terrain recognition. REFERENCES 1. H. P. Westman (Ed.), Reference Dora for Radio Engineers, 4th ed., p. 991, Intl. Tel. and Tel., New York, 1956. 2. A. Papoulis, Probability, Random Variables and Stochastic Processes, pp. 151-152, McGraw-Hill, New York, 1965. 3. I. S. Gradshteyn and I. M. Ryzhik, Table ojlntegrals, Series, and Products, 4th ed., p. 340, Academic Press, New York, 1%5. 4. D. B. DeHaan, Nouvelles Tables D’Integrales DSjinies, p. 144, Hafner, New York, 1867. 5. Ref. [3, p. 9721. 6. Ref. 13, p. 9611. 7. E. Jahnke and F. Emde, Tables of Functions, 4th ed., p. 19, Dover, New York, 1945. 8. M. Abramowitz and I. A. Stegun, Handbook oj Mathematical Function, p. 258, NBS, Washington, 1972. 9. Ref. [7, p. 21. 10. Ref. [8, pp. 3, 2553. 11. R. C. Gonzalez and P. Win@ Digital Image Processing, p. 120, Addison-Wesley, Reading, MA, 1977. 12. Ref. [II, p. 1341. 13. E. Jakeman and P. N. Pusey, A model for non-Rayleigh sea echo, IEEE Trans. Antennas and Propagation AP-24, No. 6, 1976, 806-814. 14. G. V. Trunk and S. F. George, Detection of targets in non-Gaussian sea clutter, JEEE Trans. Aerospace Electron Systems AB-6, 1970, 620-628. 15. M. Fisz, P&ability Theory and Mathematical Statistics, p. 171, Wiley, New York, 1%3. 16. P. L. Meyer, Introducto~ P&ability and Statistical Applications, 2nd ed., pp. 234-235, Addison-Wesley, Reading, Mass., 1970. 17. A. C. Melissinos, Experiments in Modern Physics, pp. 464-465, Academic Press, New York, 1966. 18. S. M. Selby (Ed.), Standard Mathematical Tables, pp. 584-585, CRC, Cleveland, Ohio, 1968. 19. P. G. Hoe& Introduction to Mathematical Statistics, 4th ed., pp. 324-329, 401, Wiley, New York, 1971. 20. A. J. Duncan, Quality Control and Industrial Statistics, pp. 49-51, 64, 585, Irwin, Homewood, IL, 1974.