
Pattern Recognition, Vol. 25, No. 11, pp. 1371-1380, 1992. Printed in Great Britain.

0031-3203/92 $5.00+.00. Pergamon Press Ltd. © 1992 Pattern Recognition Society.

"QUALITATIVE" BAYESIAN ESTIMATION OF DIGITAL SIGNALS A N D IMAGES MOHAMED ABDEL-MOTTALEB a n d AZRIEL ROSENFELD Computer Vision Laboratory, Center for Automation Research, University of Maryland, College Park, M D 20742-3411, U.S.A.

(Received 23 September 1991; in revised form 23 September 1991; received for publication 16 March 1992)

Abstract--Bayesian estimation of digital signals is ordinarily concerned with the problem of estimating an ideal signal, given a noisy signal. This paper considers the problem of partial or "qualitative" Bayesian description, rather than complete estimation, of the ideal signal. Only the case of a piecewise constant signal is considered; instead of estimating the value of the ideal signal, only a piecewise symbolic description of the signal is sought--e.g. is the value high or low, where these descriptors are defined by probability densities on the possible signal values. This task is computationally less costly than complete Bayesian estimation of the signal; moreover, it is found that the descriptions can be estimated robustly. This approach is illustrated both for digital signals and for a simple class of digital images.

Bayesian estimation    Image description    Signal description

1. INTRODUCTION

In Bayesian signal estimation,(1,2) we are given a noisy signal and a set of possible ideal signals, with associated prior probabilities, and our problem is to determine the ideal signal that most probably gave rise to the observed noisy signal. This is the problem of maximum a posteriori (MAP) signal estimation. Often we are not interested in estimating the ideal signal completely, but only in deciding whether it has some given property or belongs to some given class. For example, if the ideal signal is linear, we may not need to know its slope and intercept, but only whether it is increasing, decreasing, or constant; or if the signal is quadratic, we may only need to know whether it is convex, concave, or straight (i.e. linear). Similarly, if the ideal signal is piecewise linear or quadratic, we may only need to find the (maximal) pieces of the signal that are increasing/decreasing/constant or convex/concave/straight. When we are only interested in a partial description of the signal or of the pieces of the signal, we can regard our task as one of partial (or "qualitative") signal estimation. Evidently, partial estimation of a signal should be less computationally costly than complete estimation.

The cost of computing partial MAP estimates of piecewise constant signals is discussed in Section 2. Examples of the results of such partial MAP estimation, and a comparison with ordinary ("complete") MAP estimation, are presented in Section 3. In Section 4 we apply the idea of partial estimation to digital images. For simplicity we consider only piecewise constant images in which the edges between constant regions are all horizontal or vertical. Experimental results for such images are presented in Section 5. Section 6 discusses various possible extensions of our work.

2. PARTIAL SIGNAL ESTIMATION

We assume that both the ideal signal and the noisy signal can only take discrete values in the finite range {1, ..., m}. We also assume that the noise affects each signal value independently and that the noise probabilities are the same for each signal value; such noise is called i.i.d. (independent, identically distributed). Since there are only m possible signal values, the noise model is thus completely described by an m × m matrix (p_N(j|i)), where p_N(j|i) is the probability that signal value i gives rise to noisy signal value j, 1 ≤ i, j ≤ m. This noise model is quite general and is capable of approximating all types of i.i.d. noise. Thus if we can empirically determine the noise that is present in a given class of images, we can incorporate that noise model in our estimation scheme. In the experiments described in this paper we used a quantized approximation to additive Gaussian noise.
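For concreteness, here is a minimal sketch (our illustration, not code from the paper) of how such a quantized Gaussian noise matrix might be built: each column i is the law of i plus zero-mean Gaussian noise, binned at the integers and clipped to {1, ..., m}.

```python
import numpy as np
from math import erf, sqrt

def gaussian_noise_matrix(m, sigma):
    """m x m noise model: entry [j-1, i-1] is p_N(j | i), a quantized,
    clipped approximation to adding zero-mean Gaussian noise to value i."""
    def phi(x):  # standard normal cumulative distribution function
        return 0.5 * (1.0 + erf(x / sqrt(2.0)))
    pn = np.zeros((m, m))
    for i in range(1, m + 1):
        for j in range(1, m + 1):
            # mass of N(i, sigma^2) in the unit bin around j; the end bins
            # absorb the clipped tails, so each column sums to 1
            lo = -np.inf if j == 1 else (j - 0.5 - i) / sigma
            hi = np.inf if j == m else (j + 0.5 - i) / sigma
            pn[j - 1, i - 1] = phi(hi) - phi(lo)
    return pn
```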


In complete MAP estimation of a piecewise constant signal we assume prior probability distributions for the piece sizes and their associated signal values. Using these distributions we can find the most likely ideal piecewise constant signal f given the n-tuple of (discretely sampled) noisy signal values g(1), ..., g(n). This general maximization problem was shown in reference (3) to be very expensive computationally, since the number of possible values of f is exponential in n. In reference (4) the simpler problem of edge detection was treated; it was assumed that the constant pieces of the signal have a minimum size, and that we choose n so that the n-tuple cannot contain more than one edge, i.e. it arises from either part of a single constant piece or parts of two constant pieces of the ideal signal. The n-tuple can thus be thought of as an edge detection operator, for which we make a decision as to whether (or not) an edge is present and in what position, and we also estimate the values on the two sides of the edge (or the single value, if there is no edge).

In partial MAP estimation we assume a set A whose members represent possible descriptions of a constant piece of signal. Each member A_i of the set A is described by a pdf p(z|A_i), the probability of ideal value z given description A_i. The problem of partial estimation is to divide the n-tuple of signal values into "pieces" and assign to each piece a member of A. It is clear that if the members of A correspond to individual values, this amounts to complete estimation of the signal. As the set A becomes smaller, the assignment of members of A to pieces yields a simpler, i.e. more partial, description of the signal.

As in references (3, 4) we define the prior probabilities for the ideal signals by assuming them to be generated by a run length process and a description assignment process. The run length process has an associated run length probability density p_R; it generates a partition of the (integer) line into runs. We assume that the run lengths have a lower bound which is at least n − 1; this implies that the n-tuple of samples g(1), ..., g(n) cannot contain more than one edge. It is straightforward to compute the probabilities q_1 and q_2 = 1 − q_1 that the n-tuple is contained in a single run or in a union of two consecutive runs (both of which it intersects), respectively. A reasonable choice for p_R is a Rayleigh-like distribution in which short runs (close to the lower bound) are improbable, and very long runs are also increasingly improbable. If p_R is sharply unimodal it can be crudely approximated by a spike at the position of the mode, i.e. we can assume for simplicity a constant run length, say r. In this case it is easy to show that q_1 = (r − n + 1)/r and q_2 = (n − 1)/r. Note that if r = a(n − 1) we have q_1 = (a − 1)/a and q_2 = 1/a. In the experiments described in the next section we used r = 3(n − 1); in this case we have q_1 = 2/3, q_2 = 1/3.

The process of assigning descriptions to the runs has an associated pdf p_A. If the descriptions are independent, p_A is simply a probability vector of length k; but if we want to allow neighbor dependencies, we must specify a two-dimensional array p_A of size k × k, in which p_A(u, v) is the probability that the descriptions assigned to a pair of consecutive runs are u and v (in that order).

In terms of q_1, q_2 and p_A, we can express the prior probability that the given n-tuple of noisy signal samples g(1), ..., g(n) arises from a given configuration of the ideal signal--specifically, from a constant-description ideal signal, say with description A_i, or from an ideal signal consisting of two parts, each with constant description, say A_i and A_j. It is easily seen that in the latter case the edge separating the two parts is equally likely to be in any given position s, 1 ≤ s ≤ n − 1, where "position s" means "between samples s and s + 1". Note also that in the latter case we can have i = j; thus there are two ways for the ideal signal to have a constant description. The probability of the constant description A_i is p_ii = q_1 p_A(A_i) + q_2 p_A(A_i, A_i), and the probability of the ordered pair of descriptions

A_i, A_j, where i ≠ j, is p_ij = q_2 p_A(A_i, A_j). For example, if the descriptions are equally probable and independent, then p_A(A_i) = 1/k and p_A(A_i, A_j) = 1/k² for all i, j, so that for constant run length r we have p_ii = (r − n + 1)/rk + (n − 1)/rk² for all i, and p_ij = (n − 1)/rk² for all i ≠ j. In particular, if r = a(n − 1) and k = 2, then the probability of a constant description is p_11 + p_22 = (2a − 1)/2a, and the probability of an edge (which is allowed to be in any position) is p_12 + p_21 = 1/2a. For example, if a = 1 (the smallest possible value), these probabilities are both 1/2; if a = 2, they are 3/4 and 1/4; if a = 3, they are 5/6 and 1/6; and so on. Evidently, we can control the relative prior probabilities of the constant and edge descriptions by varying the (average) run length, and evidently these probabilities have the intuitively expected properties: in a signal composed of long constant runs, Constant has higher prior probability than Edge, and this begins to happen as soon as a > 1.

For MAP estimation, we need to find the maximum of

$$p_{ii} \prod_{l=1}^{n} p(g(l) \mid A_i), \qquad 1 \le i \le k,$$

and

$$p_{ij} \prod_{l=1}^{s} p(g(l) \mid A_i) \prod_{l=s+1}^{n} p(g(l) \mid A_j), \qquad 1 \le i \ne j \le k,\; 1 \le s \le n-1,$$

where g(l) is the l-th noisy signal sample. Evidently, this is the same as maximizing

$$P_{ij,s} \equiv p_{ij} \prod_{l=1}^{s} p(g(l) \mid A_i) \prod_{l=s+1}^{n} p(g(l) \mid A_j), \qquad 1 \le i, j \le k,\; 1 \le s \le n-1$$

(note that we now allow i = j), since this last set of expressions simply includes n − 1 copies of each p_ii ∏_{l=1}^{n} p(g(l)|A_i), one for each s. In these expressions, we have

$$p(g(l) \mid A_i) = \sum_{z} p_N(g(l) \mid z)\, p(z \mid A_i)$$

where z is the ideal value that gave rise to the noisy value g(l), and p_N is the noise probability matrix. If the maximum occurs for i = j, it is more likely that the n-tuple g(1), ..., g(n) has a constant description. If it occurs for i ≠ j and some s, it is more likely that an edge is present, and its most likely position is s.

The computational cost of this process is O(k²n), where k is the number of As. (Evidently, the cost of computing P_{ij,1} for all i, j is O(k²n). To compute them for other values of s, we note that P_{ij,s+1}/P_{ij,s} = p(g(s+1)|A_i)/p(g(s+1)|A_j). Using this observation we can calculate P_{ij,s} for all s in O(k²n).) Thus the smaller the k, the lower the computational cost. For example, if the As correspond to individual gray levels, the cost is O(m²n); but if there are only two As the cost is only O(n), and since n − 1 is a lower bound on the run lengths, n is likely to be small. Moreover, if k is small, the descriptions are relatively "coarse", and as we shall see in Section 3, they can therefore be estimated more robustly. In our experiments, we used n = 16 and k = 2.
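To make the procedure concrete, the following hypothetical sketch implements this O(k²n) search; instead of the multiplicative ratio update above it uses cumulative sums of log-likelihoods, which gives the same O(1)-per-position updating. The array names (pn, desc_pdfs, prior) are our own.

```python
import numpy as np

def partial_map_1d(g, pn, desc_pdfs, prior, s_min=3, s_max=13):
    """Partial MAP estimate of an n-tuple g of noisy values in {1,...,m}.

    pn:        m x m noise matrix, pn[j-1, z-1] = p_N(j | z)
    desc_pdfs: k x m array, desc_pdfs[i, z-1] = p(z | A_i)
    prior:     k x k array of the priors p_ij (with p_ii on the diagonal)
    Returns (i, j, s): descriptions left/right of position s (i == j: constant).
    """
    g = np.asarray(g)
    n, k = len(g), desc_pdfs.shape[0]
    # p(g(l) | A_i) = sum_z p_N(g(l) | z) p(z | A_i), for every l and i
    loglike = np.log(desc_pdfs @ pn[g - 1, :].T)          # shape k x n
    # cum[i, l] = log-likelihood of samples 1..l under description A_i
    cum = np.concatenate([np.zeros((k, 1)), np.cumsum(loglike, axis=1)], axis=1)
    best, best_val = None, -np.inf
    for i in range(k):
        for j in range(k):
            for s in range(s_min, s_max + 1):             # O(1) per position
                val = np.log(prior[i, j]) + cum[i, s] + cum[j, n] - cum[j, s]
                if val > best_val:
                    best_val, best = val, (i, j, s)
    return best
```

For the setting used in this paper (n = 16, k = 2, r = 3(n − 1) = 45, independent and equally probable descriptions), the prior matrix would be [[5/12, 1/12], [1/12, 5/12]], following the closed-form expressions above.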

"Qualitative" Bayesian estimation of digital signals and images

Note that some important classes of piecewise constant signals are "quasi-binary", i.e. they have values which are either "high" or "low" (e.g. Morse code); hence two As (k = 2) are quite sufficient.

3. EXPERIMENTS I

Our experiments involved discrete-valued signals in which the number of possible (ideal or noisy) signal values was m = 32. The signals were corrupted with additive i.i.d. quantized Gaussian noise with zero mean and standard deviation σ = 1, 2, 3. Our two descriptions, A_1 = H and A_2 = L, related to whether the signal value is high or low. These descriptions were defined by pdfs, p(z|L) and p(z|H). Figure 1 shows three possible ways of defining these pdfs. The definitions in Figs 1(a) and (b) are "general purpose": the higher (or lower) the value, the greater its probability given H (or L). The definitions in Fig. 1(c) are more "special" and correspond to the situation where H and L have specific "ideal" values, with the probability of a value decreasing as its difference from the ideal values increases. (In Fig. 1(c), p(z|L) and p(z|H) are truncated normal distributions with means at 14 and 18, respectively, and standard deviations of 2.)
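For instance, the Fig. 1(c) definitions could be realized as follows (a hypothetical sketch; the function name is ours):

```python
import numpy as np

def truncated_normal_pdf(mean, sd=2.0, m=32):
    """Discrete description pdf on {1,...,m}: a normal density sampled at the
    integers, truncated to the valid range and renormalized."""
    z = np.arange(1, m + 1)
    w = np.exp(-0.5 * ((z - mean) / sd) ** 2)
    return w / w.sum()

p_z_given_L = truncated_normal_pdf(mean=14)  # description A_2 = L, "low"
p_z_given_H = truncated_normal_pdf(mean=18)  # description A_1 = H, "high"
```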

[Fig. 1. Three ways of defining the description pdfs p(z|H) ("high") and p(z|L) ("low") over the signal values 0-30: (a) and (b) "general purpose" definitions; (c) truncated normal definitions with means 14 and 18 and standard deviation 2.]

The ordinary MAP estimate of a noisy signal can be either a constant (with any of the 32 possible values) or an edge located in any of the 15 possible positions and with any pair of distinct values. Similarly, the partial MAP estimate can either be the constant description H or L, or an edge located in any position and with values H and L (or L and H). In reference (3) it was pointed out that edge positions near the ends of the signal should not be allowed; if we allowed them, a few outlier values at one end of a noisy constant signal could lead to a MAP estimate of an edge near that end. We therefore allowed only 3 ≤ s ≤ 13 (= n − 3) in choosing the maximum.

In our first set of experiments the ideal signal was a constant of value 12. Using each of the three definitions of H and L and each value of σ we performed both ordinary and partial MAP estimation for 50 noisy versions of the ideal signal. The partial MAP estimate was always the constant description L; no false edges were detected. For ordinary MAP estimation, on the other hand, as σ increases the number and contrast (= absolute difference between values) of detected false edges increases, as we see from Table 1. When no edge was detected (a "contrast" of 0), the constant value was always correctly estimated as 12.

Table 1. Results of ordinary MAP estimation of a noisy constant signal; the partial MAP estimates were always correct

σ    Number of cases    Average (absolute) contrast
1          35                      0
1          15                      1.33
2          29                      0
2          21                      2.57
3          24                      0
3          26                      3.80

In our second set of experiments the ideal signal was an edge located in the middle of the signal (s = 8) with constant values 14 and 18. Using each of the definitions of H and L and each value of σ we performed both ordinary and partial MAP estimation for 50 noisy versions of the ideal signal. The results are shown in Table 2. The first column shows the position where an edge was detected; the second column shows the number of noisy signals for which an edge was detected in that position using ordinary MAP estimation; and the remaining columns show corresponding numbers for partial MAP estimation using the three definitions of L and H shown in Figs 1(a)-(c). The numbers in the last row are cases in which the MAP estimate was a constant. Note that when we used the first definition of L and H ("Partial a"), there were many such false dismissals; this is because Constant has a much higher prior probability than Edge (5/6 vs. 1/6). For the second and third definitions of L and H, partial MAP estimation performed about as well as ordinary MAP estimation; thus its lower computational cost did not lead to poorer performance.

"Qualitative" Bayesian estimation of digital signals and images Table 2. Results of ordinary and partial MAP estimation of a noisy step edge Position

Ordinary

Partial a

Partial b Partial c

3 4 5 6 7 8 9 10 Constant

0 0 0 0 1 47 1

Position

Ordinary

Partial 1

3 4 5 6 7 8 9 10 11 12 13 Constant

0 1 0 1 4 30 10 2 0 0 2 0

3 4 5 6 7 8 9 10

1 2 0 2 5 17 7 5

11

1

0

1

1

12 13 Constant

1 2 7

0 1 26

1 l 7

0 2 9

(a) ~r = 1 0 0 0 0 0 26 0

0 0 0 1 2 47 0

0 0 0 1 2 47 0

1

0

0

0

0

24

0

0

Partial 2 Partial 3

(b) cr = 2 0 0 0 0 0 18 6 0 0 0 0 26 (c) a = 3 0 0 0 1 2 15 4 l

0 1 1 3 9 25 9 0 0 0 0 2

0 1 1 2 7 27 II 0 0 0 0 1

0 3 0 5 3 20 7 2

0 1 0 3 4 19 6 4

4.1. Priors

We recall that in the one-dimensional case we defined the prior probability distribution on the ideal signal by assuming a run length process and a description assignment process. In the two-dimensional (2D) case we can extend this definition by using two one-dimensional (1D) run length processes, say based on run length probability densities p_R and p'_R. If we assume that these processes are independent, we can construct a partition of the (integer) plane into upright rectangles by using each of the two processes to generate a partition of the (integer) line into runs, and then taking the direct product of the two partitions. As in the 1D case, we assume a lower bound of (at least) n − 1 on the run lengths. This implies that an n × n upright square can contain at most one horizontal and one vertical edge.
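Read generatively, this prior says an ideal image can be sampled as follows (a hypothetical sketch; a constant run length r corresponds to lengths=[r], p_r=[1.0], and the description assignment is taken to be independent and uniform):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_partition(size, lengths, p_r):
    """Cut [0, size) into runs whose lengths are drawn from (lengths, p_r)."""
    cuts, pos = [0], 0
    while pos < size:
        pos += int(rng.choice(lengths, p=p_r))
        cuts.append(min(pos, size))
    return cuts

def sample_isothetic_image(size, lengths, p_r, desc_pdfs, m=32):
    """Ideal image: direct product of two independent run-length partitions;
    each rectangle gets a description and then one ideal value z ~ p(z | A_i)."""
    k = desc_pdfs.shape[0]
    rows = sample_partition(size, lengths, p_r)
    cols = sample_partition(size, lengths, p_r)
    img = np.zeros((size, size), dtype=int)
    for r0, r1 in zip(rows, rows[1:]):
        for c0, c1 in zip(cols, cols[1:]):
            i = rng.integers(k)  # description assignment process
            img[r0:r1, c0:c1] = rng.choice(np.arange(1, m + 1), p=desc_pdfs[i])
    return img
```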


Let q_1 and q_2 be the probabilities, computed for the run length probability density p_R, that an n-tuple of consecutive samples is contained in a single run and in the union of two consecutive runs (both of which it intersects), respectively. Let q'_1 and q'_2 be the analogous probabilities for p'_R. Then the probability that an n × n array of samples is contained in a single rectangle is q_1 q'_1; the probability that it is contained in a union of two horizontally (vertically) adjacent rectangles and intersects both of them is q_2 q'_1 (q_1 q'_2); and the probability that it is contained in a 2 × 2 block of rectangles and intersects all four of them is q_2 q'_2. If p_R is a spike, so that the run length is constant, say r, we can easily compute q_1 and q_2 exactly as in Section 2, and similarly for q'_1 and q'_2.

The process of assigning descriptions to the rectangles has an associated pdf p_A. If the descriptions are independent, p_A is simply a probability vector of length k; but if we want to allow neighbor dependencies, we must use a four-dimensional array p_A of size k × k × k × k, in which p_A(t, u, v, w) is the probability that the descriptions assigned to a 2 × 2 block of rectangles are t, u, v, w, arranged as shown in Fig. 2(a).

The number of possible ideal configurations that can give rise to an n × n array of noisy samples is quadratic in n (the horizontal and vertical edges (if they are present) can each be in any of n − 1 positions) and quartic in k. For example, if k = 2, there are 16 ways of assigning descriptions to the quadrants of the block defined by the horizontal and vertical edges. As in the 1D case, we are interested in the types of ideal configurations that could have given rise to the array of samples. In the 2D case there are five types of configurations, and some of them can arise in several different ways (the case analysis is spelled out in the sketch following this list):

(1) The ideal array has a constant description if it is contained in a single rectangle, or it is contained in a union of two adjacent rectangles which both have the same description, or it is contained in a 2 × 2 block of rectangles which all have the same description.

(2) If the array is contained in a 2 × 2 block of rectangles, and three of them have the same description but the fourth has a different description, the array contains a corner (or "L-junction"); see Fig. 2(b).

(3) If the array is contained in a 2 × 2 block of rectangles, and two horizontally or vertically adjacent rectangles have the same description but the other two have two different descriptions, the array contains a T-junction (Fig. 2(c)). Note that this case cannot arise if there are only two possible descriptions (k = 2).

(4) If the array is contained in two vertically (or horizontally) adjacent rectangles which have different descriptions, or if it is contained in a 2 × 2 block of rectangles such that the horizontally (or vertically) adjacent rectangles have the same descriptions (but not all four are the same), then the array contains a horizontal (or vertical) edge (Figs 2(d) and (e)).

(5) If the array is contained in a 2 × 2 block of rectangles, and no two horizontally or vertically adjacent rectangles have the same description, the array contains an X-junction (Fig. 2(f)). Note that this case can arise even if k = 2.
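A minimal sketch of the 2 × 2-block case analysis (hypothetical code; the quadrant layout t, u / v, w follows Fig. 2(a), and the two-adjacent-rectangle cases from (1) and (4) are handled separately by the caller):

```python
def classify_block(t, u, v, w):
    """Classify the configuration of a 2 x 2 block of rectangles whose
    descriptions are arranged   t u
                                v w   as in Fig. 2(a)."""
    vals = [t, u, v, w]
    if len(set(vals)) == 1:
        return "constant"
    if any(vals.count(x) == 3 for x in vals):
        return "corner"          # L-junction: three quadrants agree
    if t == u and v == w:
        return "horizontal edge"
    if t == v and u == w:
        return "vertical edge"
    if t == u or v == w or t == v or u == w:
        return "T-junction"      # one adjacent pair agrees; needs k > 2
    return "X-junction"          # no two adjacent quadrants agree
```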


[Fig. 2. Configurations of a 2 × 2 block of rectangles: (a) arrangement t, u / v, w of the descriptions; (b) corner (L-junction); (c) T-junction; (d), (e) horizontal and vertical edges; (f) X-junction.]

By considering these cases it is straightforward to compute the prior probabilities of the five types of ideal configurations in terms of the values of q and p_A, just as in Section 2. For example, the probability of the ideal array having constant description A_i is

$$p_{iiii} = q_1 q'_1 p_A(A_i) + (q_2 q'_1 + q_1 q'_2)\, p_A(A_i, A_i) + q_2 q'_2\, p_A(A_i, A_i, A_i, A_i).$$

Similarly, the probability of the array containing a corner in which the first three quadrants have description A_i and the fourth has description A_j is

$$p_{iiij} = q_2 q'_2\, p_A(A_i, A_i, A_i, A_j).$$

The probabilities of other types of configurations are computed analogously. As in Section 2, we can compute the prior probabilities of the various types of configurations in closed form when the run lengths are constants, say r = a(n − 1) and r' = a'(n − 1), and these expressions are not complicated when the descriptions are independent. Table 3 gives the values of the probabilities for a = a' = 1, 2, 3 and k = 2 (recall that T-junctions cannot exist when k = 2). (Each of these probabilities is summed over all the possible positions and orientations, as in Section 2.)

Table 3. Prior probabilities of various types of ideal block configurations, for various constant horizontal and vertical run lengths and for two independent descriptions

Type                Constant    Edge    Corner      X
r = r' = n − 1        0.125     0.25     0.50     0.125
r = r' = 2(n − 1)     0.53      0.31     0.13     0.03
r = r' = 3(n − 1)     0.68      0.25     0.06     0.01
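The Table 3 entries can be reproduced from the closed-form priors; the sketch below (hypothetical code, reusing classify_block from the earlier sketch) sums the single-rectangle, two-rectangle, and 2 × 2-block cases for independent, equally probable descriptions:

```python
from itertools import product

def config_priors(a, k=2):
    """Priors of the configuration types for constant run lengths
    r = r' = a(n-1) and k independent, equally probable descriptions."""
    q1, q2 = (a - 1) / a, 1 / a   # single-run and two-run probabilities
    totals = {"constant": 0.0, "edge": 0.0, "corner": 0.0,
              "T-junction": 0.0, "X-junction": 0.0}
    totals["constant"] += q1 * q1                  # single rectangle
    for i, j in product(range(k), repeat=2):       # two adjacent rectangles
        key = "constant" if i == j else "edge"
        totals[key] += (q2 * q1 + q1 * q2) / k**2  # horizontal + vertical adjacency
    for t, u, v, w in product(range(k), repeat=4): # full 2 x 2 block
        typ = classify_block(t, u, v, w)           # from the sketch above
        if typ.endswith("edge"):
            typ = "edge"
        totals[typ] += q2 * q2 / k**4
    return totals

for a in (1, 2, 3):
    print(a, {t: round(p, 3) for t, p in config_priors(a).items()})
# a = 1 gives exactly 0.125, 0.25, 0.50, 0.125; a = 3 gives 0.68, 0.25,
# 0.06, 0.01 after rounding to two decimals, matching Table 3.
```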

"Qualitative" Bayesian estimation of digital signals and images constant run lengths (or different run length probability densities) in the horizontal and vertical directions. The probabilities shown in Table 3 seem to capture the intuitively expected properties of the priors. In an image composed of large constant regions, Constant should have the highest prior probability, Edge the next highest, and Corner and X should have low probabilities, with X lowest of all. The case r = r' = n - 1 in the table is too "busy" to meet these standards, but for the other two cases the probabilities are intuitively satisfying. In the experiments described in Section 5 we used the priors shown in Table 3 for the case r = r' = 3 ( n - 1). Obviously, under our simple assumptions of independent, equally probable descriptions we have only a limited degree of control over the relative prior probabilities of constants, edges, corners and Xs (and Ts, for k > 2), since we are only able to control the (average) run length(s). However, we are evidently able to control the relative probabilities of Constant and Edge, which are the two most probable cases (when the run lengths are sufficiently large, which should typically be the case). In any case, in M A P estimation it is not always important to specify the priors precisely.
4.2. MAP estimation

As in Section 2, to find the MAP estimate we need to maximize

$$P_{tuvw,rs} = p_{tuvw} \prod_{x=1}^{r}\prod_{y=1}^{s} p(g(x,y) \mid A_t) \prod_{x=1}^{r}\prod_{y=s+1}^{n} p(g(x,y) \mid A_u) \prod_{x=r+1}^{n}\prod_{y=1}^{s} p(g(x,y) \mid A_v) \prod_{x=r+1}^{n}\prod_{y=s+1}^{n} p(g(x,y) \mid A_w)$$

for all 1 ≤ t, u, v, w ≤ k; 1 ≤ r, s ≤ n − 1, where g(x, y) is the noisy image gray level at position (x, y), and where (as in Section 2)

$$p(g(x,y) \mid A_i) = \sum_{z} p_N(g(x,y) \mid z)\, p(z \mid A_i).$$

If the maximum occurs for t = u = v = w, then it is most likely that the array has a constant description. If it occurs when exactly three of t, u, v, w are equal, for some r and s, then it is most likely that the array contains a corner in position (r, s); and similarly for the other types of descriptions.

The computational cost of this process is O(k⁴n³), where k is the number of As. (Evidently, the cost of computing P_{tuvw,11} for all t, u, v, w is O(k⁴n²). If we know P_{tuvw,rs}, then we can calculate P_{tuvw,(r+1)s} for additional cost O(n), since

$$P_{tuvw,(r+1)s} = P_{tuvw,rs} \cdot \frac{\prod_{y=1}^{s} p(g(r+1,y) \mid A_t) \prod_{y=s+1}^{n} p(g(r+1,y) \mid A_u)}{\prod_{y=1}^{s} p(g(r+1,y) \mid A_v) \prod_{y=s+1}^{n} p(g(r+1,y) \mid A_w)}.$$


Similarly we can calculate P_{tuvw,r(s+1)} for additional cost O(n). Since there are (n − 1)² positions for r and s, we need additional cost O(n³) to compute P_{tuvw,rs} for all r, s. Therefore, the total cost will be O(k⁴n³).) Evidently, in the 2D case it is very advantageous to keep both k and n small. Since n − 1 is a lower bound on the run lengths, n is in fact likely to be small. By requiring only a coarse description, we can also keep k small (e.g. for quasi-binary images k = 2); as pointed out in Section 2, this should also make the estimates more robust. In our experiments, we used n = 8 and k = 2. (The 8 × 8 array can be thought of as a feature detection operator for which we make a decision as to which type of feature--constant, edge, corner, or X-junction (or T-junction, when k > 2)--is present, and (if non-constant) in what position.)
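A sketch of the resulting procedure (hypothetical code): it tabulates log-likelihoods, turns them into 2D prefix sums so that each quadrant term costs O(1), and then scans all description assignments and edge positions; this realizes the same cost savings as the row/column ratio updates above.

```python
import numpy as np

def partial_map_2d(g, pn, desc_pdfs, prior, lo=3, hi=6):
    """Partial MAP estimate for an n x n array g of noisy values in {1,...,m}.

    pn:        m x m noise matrix, pn[j-1, z-1] = p_N(j | z)
    desc_pdfs: k x m array, desc_pdfs[i, z-1] = p(z | A_i)
    prior:     k x k x k x k array of priors p_tuvw for the layout t,u / v,w
    Returns the maximizing (t, u, v, w, r, s), with r, s restricted to [lo, hi].
    """
    n = g.shape[0]
    k = desc_pdfs.shape[0]
    # loglike[i, x, y] = log p(g(x, y) | A_i), marginalizing over ideal value z
    L = np.log(np.einsum('iz,xyz->ixy', desc_pdfs, pn[g - 1, :]))
    # 2D prefix sums: S[i, a, b] = sum of L[i] over rows < a and columns < b
    S = np.zeros((k, n + 1, n + 1))
    S[:, 1:, 1:] = np.cumsum(np.cumsum(L, axis=1), axis=2)

    def block(i, x0, x1, y0, y1):  # O(1) quadrant log-likelihood
        return S[i, x1, y1] - S[i, x0, y1] - S[i, x1, y0] + S[i, x0, y0]

    best, best_val = None, -np.inf
    for t, u, v, w in np.ndindex(k, k, k, k):
        for r in range(lo, hi + 1):        # horizontal edge position
            for s in range(lo, hi + 1):    # vertical edge position
                val = (np.log(prior[t, u, v, w])
                       + block(t, 0, r, 0, s) + block(u, 0, r, s, n)
                       + block(v, r, n, 0, s) + block(w, r, n, s, n))
                if val > best_val:
                    best_val, best = val, (t, u, v, w, r, s)
    return best
```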


5. EXPERIMENTS II

As in Section 3, in our image experiments we used m = 32 and the same i.i.d., approximately Gaussian noise model. Examples of the images used are shown in Fig. 3. Since in Section 3 the first definition of H and L ("bright" and "dark") gave poor results and the second and third definitions gave very similar results, here we used only the third definition. In the MAP estimation process we allowed only 3 ≤ r, s ≤ 6 (= n − 2), in order to rule out edges near the borders of the image, as in Section 3.

In our first set of experiments the ideal image was a constant of value 12. We performed partial MAP estimation for 50 noisy versions of this image with σ = 1, 2, and 3 (e.g. Fig. 3(a)). Even for σ = 3, the estimate was always correct (the constant description L).

In our second set of experiments the ideal image contained a horizontal edge in position 4 with values 14 (upper part) and 18 (lower part). Partial MAP estimation was performed for 50 noisy versions of this image with each σ (e.g. Fig. 3(b)). For σ = 1 and 2 the estimate was always correct (an edge in position 4 with descriptions L and H for the upper and lower parts). For σ = 3, 49 of the estimates were (L, H) edges in the correct position; the remaining one was estimated as a corner because one upper quadrant was incorrectly described.

In our third set of experiments the ideal image contained a corner in which the upper left quadrant had value 14 and the other quadrants had 18; the edge positions were 4 and 4. Figure 3(c) shows examples of the noisy images for each σ. Here again, the partial MAP estimates for σ = 1 were always correct. For σ = 2, 47 estimates were correct, and the other three were also corners of the correct type in shifted positions.



[Fig. 3. Examples of noisy images with σ = 1 (left column), 2 (middle column), and 3 (right column). In rows (a)-(d), the ideal image is a constant, a horizontal edge, a corner, and an X-junction, respectively.]

For σ = 3, 46 of the estimates were corners of the correct type (36 of them in the correct position and the other 10 shifted); in the other four cases one quadrant was incorrectly described, yielding constant (twice), edge, and X-junction descriptions.

Our fourth set of experiments used an X-junction ideal image with values 14 (upper left and lower right) and 18 (upper right and lower left) and the same edge positions; noisy examples for each σ are shown in Fig. 3(d). For σ = 1 and 2 all of the partial MAP estimates were X-junctions of the correct type in the correct position. For σ = 3 all the estimates were still of the correct type, and 47 were in the correct position, but the remaining three were shifted.

Conventionally, features are detected in images using local operators which compute differences of average gray levels.(6) In our MAP estimation experiments we detected features in an 8 × 8 array; this corresponds to using an operator of size 8 × 8. A more conventional way of detecting these features would be to consider all possible partitions of the 8 × 8 array defined by a horizontal line, a vertical line, or both (e.g. see Fig. 2). If we forbid partitions by lines that are only one or two pixels away from the borders of the array, there are four possible partitions by a horizontal line, four by a vertical line, and 16 by both. For each partition by a horizontal or vertical line, the absolute difference of average gray levels on the two sides of the line will be high if a horizontal or vertical edge is present in that position. For each partition into quadrants by a horizontal and a vertical line, the absolute difference between the average gray level in one quadrant and the average gray level in the other three quadrants will be high if the single quadrant coincides with a corner; note that there are 64 possible corners (16 positions of the two dividing lines and four choices of the single quadrant). X-junctions can be detected similarly by comparing the average gray levels in the two diagonally adjacent pairs of quadrants; here there are only 16 possibilities. Thus we can test for the presence of all the possible features by comparing 4 + 4 + 64 + 16 = 88 absolute differences of averages. The highest difference should arise from the "best" feature.
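A sketch of this baseline (hypothetical code; partition positions restricted to 3-6 as in the experiments):

```python
import numpy as np

def best_feature(g, lo=3, hi=6):
    """Conventional baseline for an 8 x 8 array g: among the 4 + 4 + 64 + 16 = 88
    candidate partitions, return the one with the largest absolute difference
    of average gray levels."""
    candidates = []
    for r in range(lo, hi + 1):  # horizontal edge at row position r
        candidates.append((abs(g[:r].mean() - g[r:].mean()), ('h-edge', r)))
    for s in range(lo, hi + 1):  # vertical edge at column position s
        candidates.append((abs(g[:, :s].mean() - g[:, s:].mean()), ('v-edge', s)))
    for r in range(lo, hi + 1):
        for s in range(lo, hi + 1):
            q = [g[:r, :s], g[:r, s:], g[r:, :s], g[r:, s:]]  # the four quadrants
            for c in range(4):  # corner: one quadrant vs. the other three
                rest = np.concatenate([q[d].ravel() for d in range(4) if d != c])
                candidates.append((abs(q[c].mean() - rest.mean()),
                                   ('corner', r, s, c)))
            # X-junction: one diagonal pair of quadrants vs. the other pair
            d1 = np.concatenate([q[0].ravel(), q[3].ravel()])
            d2 = np.concatenate([q[1].ravel(), q[2].ravel()])
            candidates.append((abs(d1.mean() - d2.mean()), ('x-junction', r, s)))
    return max(candidates, key=lambda c: c[0])  # (score, feature description)
```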

"Qualitative" Bayesian estimation of digital signals and images differences of averages, using the same set of noisy images in which the ideal image was an edge. For a = 1, the best feature was always the correct edge. For a = 2, 45 of the best features were correct; in the other five cases, the best features were corners (differing from the correct feature in one quadrant). For a = 3, 31 were correct and the other 19 were corners of various types. As we saw above, the MAP results were much better. 6. DISCUSSION In this paper we have used the Bayesian approach to find maximum a posteriori (MAP) estimates. However, these are not the only estimates that can be computed from the posterior probabilities. Any cost function Cij (cost of making decision i when decision j is the correct one) can be used to define an optimal estimate, tT~Given Ci~, the optimal estimate is the one that minimizes the expected cost. The advantage of using cost functions is that we can tailor C~j to our particular application.Is) We have considered in this paper only piecewise constant signals and isothetic images. An important extension would be to piecewise linear (or higherdegree polynomial) signals or images; this would allow us to introduce other types of features such as "roof edges". In Sections 4 and 5 we used a very simple class of image models, based on a direct product of two piecewise constant signals, each generated by a run length process and a value (description) assignment process. This is not a realistic way of modeling 2D images, and it also constrains the prior probabilities of the various types of features. It would be desirable to develop a more realistic class of geometric signal models, perhaps based on "bombing" processes/9~ It would also be desirable to consider a larger class of isothetic images, e.g. allowing edges in four principal directions (multiples of 45°); this would allow us to handle a large class of integrated-circuit images. Our approach should still be practical, provided the images satisfy the restrictions that the edges appearing in an n*n array are oriented in at most two of the four directions, and that parallel edges cannot appear in the same n* n window. The partial MAP estimation results that we obtained for noisy edge images were much better than the "best-feature" results based on absolute difference of averages. This was true even though we used zeromean (approximately) Gaussian noise, for which difference-of-average edge detection operators are optimal,t1°~ As pointed out in Section 2 and demonstrated in references (3, 4), our approach can be used for arbitrary i.i.d, noise models; thus for non-Gaussian noise our approach should perform even better in comparison with conventional feature detection methods. Piecewise description of a signal in terms of k classes of values is evidently less accurate than full estimation

We have considered in this paper only piecewise constant signals and isothetic images. An important extension would be to piecewise linear (or higher-degree polynomial) signals or images; this would allow us to introduce other types of features such as "roof edges".

In Sections 4 and 5 we used a very simple class of image models, based on a direct product of two piecewise constant signals, each generated by a run length process and a value (description) assignment process. This is not a realistic way of modeling 2D images, and it also constrains the prior probabilities of the various types of features. It would be desirable to develop a more realistic class of geometric signal models, perhaps based on "bombing" processes.(9) It would also be desirable to consider a larger class of isothetic images, e.g. allowing edges in four principal directions (multiples of 45°); this would allow us to handle a large class of integrated-circuit images. Our approach should still be practical, provided the images satisfy the restrictions that the edges appearing in an n × n array are oriented in at most two of the four directions, and that parallel edges cannot appear in the same n × n window.

The partial MAP estimation results that we obtained for noisy edge images were much better than the "best-feature" results based on absolute differences of averages. This was true even though we used zero-mean (approximately) Gaussian noise, for which difference-of-average edge detection operators are optimal.(10) As pointed out in Section 2 and demonstrated in references (3, 4), our approach can be used for arbitrary i.i.d. noise models; thus for non-Gaussian noise our approach should perform even better in comparison with conventional feature detection methods.

Piecewise description of a signal in terms of k classes of values is evidently less accurate than full estimation

of the values. In our examples, the computational cost when there are k classes is proportional to k². If the range of possible values is R, and the values are uniformly distributed, then the average estimation error when there are k classes is proportional to R/k. Thus a quadratic reduction in computational cost yields only a linear decrease in estimation accuracy. In many situations, moreover, we are not interested in quantitatively estimating the signal value, but only in assigning it to a class (e.g. "high" or "low"); for example, many types of images are "quasi-binary" (e.g. ink vs. paper, or etched vs. non-etched surface), and we want to correctly identify the two classes, but we do not need to estimate the gray levels, which in any case depend on the illumination. The approach described in this paper is especially appropriate for such situations. It should be pointed out that many images can be approximated acceptably using only small numbers of gray levels (typically less than 10);(11) for such images too, our approach is much less expensive than ordinary MAP estimation.

This paper has dealt only with edge detection in small segments of a signal (of length 16) and with feature detection in individual small (8 × 8) subimages, not with the description of the entire signal or image. MAP description of an entire image or signal would be very expensive.(4) However, reasonable (though suboptimal) descriptions should be obtainable by a "relaxation" approach(12, 13) in which we compute the MAP descriptions of segments or subimages and then modify these descriptions so as to improve the consistency of neighboring descriptions.

Acknowledgements--The support of the Defense Advanced Research Projects Agency (ARPA Order No. 6989) and the U.S. Army Engineer Topographic Laboratories under Contract DACA76-89-C-0019 is gratefully acknowledged, as is the help of Sandy German in preparing this paper.

REFERENCES

1. H. L. Van Trees, Detection, Estimation and Modulation Theory, Vol. I. Wiley, New York (1969).
2. N. C. Mohanty, Signal Processing. Van Nostrand Reinhold, New York (1987).
3. A. Rosenfeld and S. Banerjee, Maximum-likelihood edge detection in digital signals, Technical Report CAR-TR-492, Center for Automation Research, University of Maryland, College Park, March (1990).
4. S. Banerjee and A. Rosenfeld, MAP estimation of piecewise constant digital signals, Technical Report CAR-TR-510, Center for Automation Research, University of Maryland, College Park, July (1990).
5. M. Abdel-Mottaleb and A. Rosenfeld, Inexact Bayesian estimation, Technical Report CAR-TR-542, Center for Automation Research, University of Maryland, College Park, March (1990).
6. A. Rosenfeld and A. C. Kak, Digital Picture Processing. Academic Press, New York (1976).
7. R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. Wiley, New York (1973).
8. R. Szeliski, Bayesian Modeling of Uncertainty in Low-level Vision. Kluwer, Boston (1989).
9. N. Ahuja, Mosaic models for images--II. Geometric properties of components in coverage mosaics, Inf. Sci. 23, 159-200 (1981).


10. J. Canny, A computational approach to edge detection, IEEE Trans. Pattern Analysis Mach. Intell. 8, 679-698 (1986).
11. S. Peleg, Iterative histogram modification, IEEE Trans. SMC 8, 555-556 (1978).
12. J. Kittler and J. Illingworth, Relaxation labeling algorithms--a review, IVC, 206-216 (1985).
13. T. C. Henderson, Discrete Relaxation Techniques. Oxford University Press, New York (1990).

About the Author--MOHAMED ABDEL-MOTTALEB was born in Alexandria, Egypt, in 1960. He received the B.Sc. degree in computer science and automatic control and the M.Sc. degree in remote sensing from Alexandria University, in 1983 and 1987, respectively. He is currently working toward the Ph.D. degree in computer science at the University of Maryland. From 1983 to 1985 he worked as a Teaching Assistant at Alexandria University. From 1985 to 1987 he worked as a Scientist Assistant at the IBM Research Center, Cairo, Egypt. From 1987 to 1988 he worked as a Research Assistant at Michigan State University. Since 1988 he has been a Research Assistant in the Computer Vision Laboratory at the University of Maryland. His current research interests include computer vision, image processing, and pattern recognition.

About the Author--AZRIEL ROSENFELD is a tenured Research Professor and Director of the Center for Automation Research, a department-level unit of the University of Maryland in College Park. He also holds Affiliate Professorships in the Departments of Computer Science and Psychology and in the College of Engineering. He holds a Ph.D. in mathematics from Columbia University (1957), rabbinic ordination (1952) and a Doctor of Hebrew Literature degree (1955) from Yeshiva University, and an honorary Doctor of Technology degree from Linköping University, Sweden (1980), and is a certified Manufacturing Engineer (1988). He is a widely known researcher in the field of computer image analysis. He wrote the first textbook in the field (1969); was a founding editor of its first journal (1972); and was co-chairman of its first international conference (1987). He has published over 20 books and over 400 book chapters and journal articles, and has directed over 40 Ph.D. dissertations. He is a Fellow of the Institute of Electrical and Electronics Engineers (1971), and won its Emanuel Piore Award in 1985; he was a founding Director of the Machine Vision Association of the Society of Manufacturing Engineers (1985-1988), and won its President's Award in 1987; he was a founding member of the IEEE Computer Society's Technical Committee on Pattern Analysis and Machine Intelligence (1965), served as its Chairman (1985-1987), and received the Society's Meritorious Service Award in 1986; he was a founding member of the Governing Board of the International Association for Pattern Recognition (1978-1985), served as its President (1980-1982), and won its first K. S. Fu Award in 1988; he is a Fellow of the Washington Academy of Sciences (1988), and won its Mathematics and Computer Science Award in 1988; he is a founding Fellow of the American Association for Artificial Intelligence (1990); he is a Corresponding Member of the National Academy of Engineering of Mexico (1982) and a Foreign Member of the Academy of Science of the German Democratic Republic (1988).