ADVANCES IN ELECTRONICS AND ELECTRON PHYSICS, VOL. 88

The Differentiating Filter Approach to Edge Detection

Maria Petrou
Department of Electronic and Electrical Engineering, University of Surrey, Guildford, United Kingdom

I. Introduction
II. Putting Things in Perspective
III. Theory
   A. The Good Signal-to-Noise Ratio Requirement
   B. The Good Locality Requirement
   C. The Suppression of False Maxima
   D. The Composite Performance Measure
   E. The Optimal Smoothing Filter
   F. Some Example Filters
IV. Theory Extensions
   A. Extension to Two Dimensions
   B. The Gaussian Approximation
   C. The Infinite Impulse-Response Filters
   D. Multiple Edges
   E. A Note on the Zero-Crossing Approach
V. Postprocessing
VI. Conclusions
References

I. INTRODUCTION

The purpose of computer vision is to identify objects in images. The images are obtained by various image-capture devices, like CCD cameras and analogue film cameras. In general, an image has to be represented in a way that computers can understand, and since computers understand numbers, numbers have to be used. An image, therefore, is a two-dimensional array of elements, each of which carries a number that indicates how bright the corresponding analogue picture is at that location. The elements of the image array are called pixels, and the values they carry are usually restricted by convention to vary between 0 (for black) and 255 (for white). To be able to represent a scene or an analogue picture in adequate detail, we need to use many such picture elements, i.e., our image arrays must be pretty large. For example, to imitate the resolution of the human vision system, we probably need arrays of size 4000 x 4000, and to imitate the resolution of an ordinary television set, we must use arrays of size 1000 x 1000. To store

Copyright © 1994 by Academic Press, Inc. All rights of reproduction in any form reserved. ISBN 0-12-014730-0


a television-size image, therefore, we need about eight million bits (one Mbyte) of memory. And this is only for a black-and-white image, usually called a grey image to indicate that not only black and white tones are used but also all possible shades in between. If we want to represent a coloured picture, we need three times as many bits, because it has been shown that any colour can be reproduced by blending appropriate amounts of only three basic colours. This is known as the trichromatic theory of colour vision. So, a coloured image can be represented by a three-dimensional array of numbers, two of the dimensions being the spatial dimensions which span the image, and the third dimension indexing the three numbers that correspond to each pixel, each giving the intensity of the image in one of the three basic colours used. In this chapter we are going to talk only about grey images, so this is the last time we make any reference to colour. It is clear from the above discussion that an image contains an enormous amount of information, not all of which is useful, necessary, or wanted. For example, we can all recognize that the person depicted in Fig. 1b is the same as the person in Fig. 1a, although Fig. 1b is only a sketch. That image is a binary image, and thus each pixel requires only two bits to be represented. This is a factor of 4 reduction in the number of bits needed for the representation of the grey image and a factor of 12 reduction in the

FIGURE 1. (a) An original image. (b) Edges detected by hand.


number of bits needed for the representation of the corresponding colour image. And yet, for the purpose of recognition, such a representation is adequate. If we could make the computer produce sketches like this, it would be very useful: first because, in order to identify the shape of the object, much less number crunching would have to take place, and second because, having found the outline of the object, its properties can be computed more easily. A lot of vision problems would even stop at the point of the shape description, as many objects can be easily identified from their shape only. The task of making the computer produce a sketch like Fig. 1b is called edge detection, and the algorithms that can do that are called edge detectors. Is edge detection a difficult task for a computer? Well, it has proven to be very difficult indeed, in spite of all the ingenuity and effort that has gone into it. Let us try to follow the steps I took when I drew the sketch of Fig. 1b, starting from the image shown in Fig. 1a. I first looked at places where there was some change in brightness, and I followed them around. I did not bother with the changes in brightness that occur inside the boy's shirt, because I know that they do not matter in the recognition process. I did not bother with the shades that appear in the face, as they may be due to image reproduction problems or play no role in the representation of the basic characteristics of the face. I did bother with changes in brightness around the nose area, even though they were faint and gradual, and I did reproduce very faint outlines if they were straight, meaningful, and seemed to complete the shapes represented. If we read the previous statement carefully again, we will notice that a lot of thinking went into the process without our even realising it. In particular, a lot of knowledge and experience was incorporated into it, knowledge that has been acquired over a lifetime!
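Incidentally, the bit-counting above is easy to verify in a few lines. The sketch below uses the figures assumed in the text: 8 bits per pixel for a grey image, 2 bits per pixel for the binary sketch, and three grey planes for colour.

```python
# Storage arithmetic for the image sizes discussed in the text.
# Assumptions (from the text): 8 bits/pixel grey, 2 bits/pixel binary,
# and three times the grey figure for a colour image.

def bits(rows, cols, bits_per_pixel):
    """Total number of bits needed to store an image."""
    return rows * cols * bits_per_pixel

grey   = bits(1000, 1000, 8)    # a "television-size" grey image
binary = bits(1000, 1000, 2)    # the hand-drawn sketch of Fig. 1b
colour = 3 * grey               # trichromatic representation

print(grey // binary, colour // binary)   # the factors of 4 and 12
```

Dividing out the pixel count, the reductions depend only on the bits-per-pixel figures, which is why the factors of 4 and 12 hold for any image size.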
Well, most edge-detection effort so far has gone into reproducing only the first small part of that description, i.e., into making computers recognize the places where there is some change in brightness. And in spite of the hundreds of methods developed and the hundreds of papers published, a good edge detector today will not produce anything as good as what is shown in Fig. 1b; instead, something like what is shown in Fig. 2 will be the result. The task of identifying places where the brightness changes seems relatively easy, but even it is difficult enough to have been the motivation of hundreds of publications. The rest of the description given above is in fact extremely difficult to reproduce. It is all about knowledge acquisition, representation, and incorporation, and is part of the much wider field of research that includes pattern recognition and artificial intelligence. This chapter will deal only with the first part of the problem. In the section on


FIGURE 2. The output of a good edge detector when applied to the image of Fig. 1a.

postprocessing we shall come nearest to the incorporation of knowledge, but even that is going to be very elementary and nothing in comparison to the knowledge a human utilises when producing something like Fig. 1b. It is my personal belief that the quest for the best edge detector has reached saturation point from the point of view of the image-processing approach, and that any breakthrough or significant improvement in the future will have to come from the integration of the edge-detection process


into a vision system, where knowledge is used at, and information is transferred back and forth between, all the levels of understanding and image analysis. As I said earlier, at first glance the identification of points where the intensity changes seems to be very easy. In fact, it seems that we can achieve it by just scanning along the signal and noting any difference in the grey-level value we see. Every time this local difference is a local maximum, we note an edge. Let us do this first for a one-dimensional signal, namely one row of the image. In Fig. 3a we plot the grey values in the image along a certain row and in the vicinity of an edge. To identify places where the grey value changes, I scan the signal and find the difference in grey-level values between a pixel and its next neighbour. Formally, this process is called "convolution by the mask [−1 | 1]." Ideally, this difference represents the local derivative of the intensity function calculated at the point halfway between the two successive pixels. For the sake of simplicity, however, we may assign the difference to the pixel under consideration. This small discrepancy can be avoided if we use the next and the previous neighbour to estimate the local difference. Since these neighbours are two interpixel distances away from each other, we may say that "we convolve the signal with the mask [−0.5 | 0 | 0.5]." If I_i is the grey value at pixel i, we may say that the difference ΔI_i at the same pixel is given by:

ΔI_i = (I_{i+1} − I_{i−1}) / 2    (1)

Figure 3b shows the result of this operation. An edge is clearly the point where this difference is a local maximum. The most noticeable thing about Fig. 3b is that if we identify all the local maxima in the output signal, we shall have to mark an edge in several places along the signal, most of which are spurious. This is shown in Fig. 3c. As we can see, the edge points detected are so many that they hardly contain any useful information. The obvious cause of the problem is that when we humans do the edge detection, we ignore small and insignificant changes in the intensity value; the computer, however, does not know to do that. Therefore, we have to tell it! The proper terminology for this is thresholding. Effectively, we tell the computer to ignore any local maximum in the value of the derivative which is less than a certain number, the threshold. How we choose this number is another topic of research. It can be done automatically by an algorithm we give the computer, or it can be done manually, after we look at the values of the local maxima, or, even more grossly, by trial and error, until the result looks good. Alternatively, one may try to stop all these spurious local maxima from arising in the first place. If we look carefully at the image in Fig. 1a, we shall see that although the wall in the background is expected to be of


FIGURE 3. Top panels: (a) a raw signal and (d) its smoothed version. Middle panels: (b) and (e), the first difference of the signals in the top panels. Bottom panels: (c) and (f), the locations of the local maxima in the values of the first difference.


uniform brightness, it seems to contain quite a variation in grey tones in the image. These variations are those that create all the spurious edges (see, for example, Torre and Poggio, 1986). The major reason for this lack of uniformity, even for regions that in reality are very uniform, is the thermal noise of the imaging device. The best way to get rid of it is to smooth the signal before we apply any edge detection. This can be done, for example, by replacing the grey value at each pixel position by the average value over three successive pixels. The resultant signal then will look like Fig. 3d. Formally, we can say that the smoothed value S_i at pixel i is given by:

S_i = (I_{i−1} + I_i + I_{i+1}) / 3    (2)

We then apply the difference operation

ΔS_i = (S_{i+1} − S_{i−1}) / 2    (3)

to the smoothed signal and obtain the signal in Fig. 3e. If we keep only the local maxima, we obtain the signal in Fig. 3f. It is clear that some thresholding will still be necessary, although fewer spurious edges are present in this signal than in the signal of Fig. 3c. There are a number of things to be noticed from the above operation: After the smoothing operation, the edge itself became very flat and shallow, so its exact location became rather ambiguous. In fact, the more smoothing is incorporated, i.e., the more pixels are involved in the calculation of S_i by Eq. (2), the more blurred the edge becomes and the fewer the spurious edges that appear. This observation is known as the uncertainty principle in edge detection. In the next section we shall see how we can cope with it. We can substitute from Eq. (2) into Eq. (3) to obtain:

ΔS_i = (−I_{i−2} − I_{i−1} + I_{i+1} + I_{i+2}) / 6    (4)

That is, we can perform the operations of smoothing and differencing in one go, by convolving the original signal with an appropriate mask, in this case the mask [−1/6 | −1/6 | 0 | 1/6 | 1/6]. This is possible because both operations, namely smoothing and differencing, are linear. It is not always desirable for the two operations to be combined in that way, but sometimes it is convenient.
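The equivalence just described, smoothing followed by differencing versus a single convolution with the combined mask, can be checked numerically. A minimal sketch; the signal values are arbitrary:

```python
# Check that smoothing with [1/3, 1/3, 1/3] followed by differencing with
# [-1/2, 0, 1/2] equals one convolution with the combined five-element mask.
import numpy as np

signal = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5], dtype=float)

smooth = np.array([1.0, 1.0, 1.0]) / 3.0
diff = np.array([0.5, 0.0, -0.5])            # np.convolve flips its kernel
combined = np.convolve(smooth, diff)         # the combined five-element mask

two_pass = np.convolve(np.convolve(signal, smooth, mode="valid"), diff, mode="valid")
one_pass = np.convolve(signal, combined, mode="valid")

print(np.allclose(two_pass, one_pass))       # True
```

Because convolution is linear and associative, the two-pass and one-pass results coincide exactly, which is the content of the combined-mask argument above.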

We shall see in the next section how the two major observations above will be used in the process of designing edge-detection filters. However, first


we shall see how the simple ideas above can be extended to the detection of edges in two-dimensional signals, i.e., images. There are two major differences between the location of discontinuities in a one-dimensional and in a two-dimensional signal: First, sharp changes in the value of a two-dimensional function coincide with the local maxima of the magnitude of the gradient of the function. Second, for a two-dimensional signal the smoothing does not have to take place along the same direction as the local differencing. The gradient of a two-dimensional function I(x, y) is a vector given by:

g ≡ (∂I/∂x) i + (∂I/∂y) j    (5)

where i and j are the unit vectors along the x and y directions respectively. Two things are obvious from the above expression. First, we must estimate the derivative of the intensity function in two directions instead of one; second, an edge in a two-dimensional image is made up of elements, called edgels, each of which is characterized by two quantities: the magnitude of the gradient and its orientation. The orientation of an edgel is useful for some applications, but it is not always required. Clearly, an edge must coincide with places where |g| is a local maximum along the direction in which g points. In the rest of this section we shall combine all the above ideas to create our own first edge detector, which, in spite of all its simplicity, seems to work quite well for a large number of images and has served the vision community for several years as a quick "dirty" solution, before, and even after, much more sophisticated algorithms became available. It is called the Sobel edge detector after Sobel, who first proposed it (see, for example, Duda and Hart, 1973). First we want to estimate the partial derivative of the brightness function along the x axis of the image. To reduce the effect of noise, we decide to smooth the image first by convolving it in the y direction with some smoothing mask. Such a mask is [1 | 2 | 1]. We then convolve the smoothed image along the x axis with the mask [−1 | 0 | 1] and estimate the local partial derivative ∂I/∂x, which we call ΔI_x. We follow a similar process in order to estimate the partial derivative of the brightness function along the y axis, ΔI_y, i.e., we smooth along the x axis by convolving with the smoothing mask [1 | 2 | 1], and we difference along the y axis. We can then estimate the value of the magnitude of the gradient at each position by computing:

G = (ΔI_x)^2 + (ΔI_y)^2    (6)


Notice that G is not the magnitude of the gradient but rather its square. Since only relative values matter, there is no point in adding to the computational burden by taking square roots. We thus create a new output that at each pixel position contains an estimate of the squared magnitude of the gradient at that particular position. We can also estimate the approximate orientation of the gradient at a given position by comparing the outputs of the differences along the horizontal and the vertical directions at each position. If the horizontal difference is the greater of the two, then a mainly vertical edge is indicated, and to check for that we check whether the magnitude of the gradient is a local maximum when compared with the values of the gradient at the two horizontal neighbours of the pixel. If the vertical difference is the larger one, a horizontal edge is indicated, and to confirm that we check whether the gradient is a local maximum in the vertical direction. If either of the two hypotheses is confirmed, we mark an edge at the pixel under consideration. Figure 4a shows the result of applying this algorithm to the image of Fig. 1a. It is clear that lots of spurious edges have been detected, and some postprocessing is necessary. After some trial and error concerning the value of a suitable threshold, Fig. 4b was obtained. We summarize the basic steps of this algorithm in Box 1.

Box 1. A simple edge-detection algorithm.

1. Convolve the input image vertically with the mask [1 | 2 | 1], and the result horizontally with the mask [−1 | 0 | 1], to obtain ΔI_x.
2. Convolve the input image horizontally with the mask [1 | 2 | 1], and the result vertically with the mask [−1 | 0 | 1], to obtain ΔI_y.
3. At each pixel, compute G = (ΔI_x)^2 + (ΔI_y)^2.
4. If |ΔI_x| ≥ |ΔI_y|, mark an edge if G is a local maximum along the horizontal direction; otherwise, mark an edge if G is a local maximum along the vertical direction.
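A sketch of the Box 1 algorithm in code may make the steps concrete. This is an illustrative implementation, not the author's own code; the masks follow the text, while the test image and the threshold value are made up:

```python
# An illustrative implementation of the Box 1 algorithm.
import numpy as np

def box1_edges(img, threshold):
    img = img.astype(float)
    smooth = np.array([1.0, 2.0, 1.0])
    diff = np.array([1.0, 0.0, -1.0])        # np.convolve flips the kernel

    def rows(a, k):  # convolve every row with mask k
        return np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, a)

    def cols(a, k):  # convolve every column with mask k
        return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, a)

    dIx = rows(cols(img, smooth), diff)      # smooth vertically, difference horizontally
    dIy = cols(rows(img, smooth), diff)      # smooth horizontally, difference vertically
    G = dIx**2 + dIy**2                      # squared gradient magnitude

    edges = np.zeros(G.shape, dtype=bool)
    for i in range(1, G.shape[0] - 1):
        for j in range(1, G.shape[1] - 1):
            if G[i, j] <= threshold:
                continue
            if abs(dIx[i, j]) >= abs(dIy[i, j]):      # mainly vertical edge
                edges[i, j] = G[i, j] >= max(G[i, j - 1], G[i, j + 1])
            else:                                     # mainly horizontal edge
                edges[i, j] = G[i, j] >= max(G[i - 1, j], G[i + 1, j])
    return edges

# a vertical step edge: dark on the left, bright on the right
img = np.zeros((8, 8))
img[:, 4:] = 100.0
print(np.argwhere(box1_edges(img, threshold=1000.0))[:4])
```

On this synthetic step the detector marks the two columns flanking the discontinuity; ties in G are kept, since Box 1 attempts no thinning beyond the simple local-maximum test.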


FIGURE 4. (a) The output of the algorithm presented in Box 1 when applied to the image of Fig. 1a. (b) The same output after thresholding.

The results shown in Fig. 4 are very encouraging, and if all images exhibited the same level of noise as the image in Fig. 1a, there would not be much point in further refinement. It is worth, however, experimenting with some noisier images, notably an image like the one in Fig. 5a. Figure 5b shows the output of the above algorithm. This output

FIGURE 5. (a) A synthetic image with 100% additive Gaussian noise. (b) The result of applying the algorithm of Box 1 plus thresholding to the previous image.


is after a suitable threshold was chosen by trial and error! Clearly, such a result is very unsatisfactory, and it indicates the need for a more sophisticated approach to the problem.

II. PUTTING THINGS IN PERSPECTIVE

The approach we shall discuss in this chapter is only one way of dealing with the problem of edge detection. The reason it has been chosen is that it has prevailed over all other approaches and has become very popular in recent years. In this section we shall review briefly the other approaches, so that things are in perspective. Edge detection has attracted the attention of researchers since the early days of computer vision. Quite often, people interested in other aspects of vision bypassed the problem, assuming that "a perfect line drawing of the scene is available." As we mentioned in the introduction, a perfect line drawing has eluded us for a long time, and it has become increasingly obvious that it cannot be obtained in isolation from the other aspects of vision research. In spite of that, hundreds of papers have been published on the subject, and although it is impossible to review them all, we can at least record the basic trends in the field. We can divide the approaches into three very gross categories:

1. The region approach.
2. The template-matching approach.
3. The filtering approach.

The region approaches try to exploit the differences (often statistical) between regions which are separated by an edge. Examples of such approaches are the work of de Souza (1983), Bovic and Munson (1986), Pitas and Venetsanopoulos (1986), Kundu and Mitra (1987), and Kundu (1990), and they are often referred to as "nonlinear filtering approaches." Such edge detectors are particularly successful when there is a prior hypothesis concerning the exact location and orientation of the edge, i.e., when the approach is model based and relies on hypothesis generation and testing (e.g., Graham and Taylor, 1988). An alternative type of approach is based on region segmentation that exploits the statistical dependence of pixel attributes on those of their neighbours.
This statistical dependence of the attributes of pixels which make up a region may be discontinued when a certain quantity concerning two neighbouring pixels exceeds some threshold. Such an approach is usually incorporated into a more general process of image segmentation or image restoration using Markov random fields, for example, and the proper term for it is "incorporating a line


process in the system." The "line process" is in fact the implicit acceptance of an edge between pixels which are sufficiently dissimilar. An example of such work is that of Geman and Geman (1984). In general, these methods tend to be slow. They also rely on estimates of the Markov parameters used, i.e., on image or at least region models, which are not usually available and are not easy to estimate. In the template-matching approaches, one can include the approach of Haralick (1980, 1984) and Nalwa and Binford (1986), who model either the flat parts of the image function (facet model) or the edge itself. In the same category one should include the robust approach of Petrou and Kittler (1992), who tried to identify edges by fitting an edge template at each location; the fit, however, did not minimize the sum of the squares of the residuals, but relied instead on an elaborately derived kernel which weighed each grey value according to its difference from the corresponding value of the template. The process was very slow, and the results did not seem convincingly better than the results of the linear approaches. The problem with all model-based approaches (region-based and template-based included) is that one may tune the process very well according to the assumptions made, but the assumptions, i.e., the models adopted, do not apply at all edges in an image, so beautifully built theories fail because reality stubbornly prefers exceptions to the general rules! However, the last word has yet to be said about these lines of approach, and it is possible that in the future they may produce better results. Under the third category of edge detectors, we include all those which rely on some sort of filtering. Filters are often designed to identify locations of maximal image energy, like those of Shanmugam et al. (1979) and Granlund (1978), or to respond in a predetermined way when the first or the second derivative of the signal becomes maximal.
In general, one understands filtering as a convolution process; this, however, is not always true, and nonlinear filters which effectively adapt to the local edge orientation with the purpose of maximally enhancing it have been developed (see, for example, van Vliet et al., 1989). In the same category of nonlinear filtering one should include the morphological operator of Lee et al. (1987). A special type of filter was proposed by Morrone and Owens (1987). These filters were in quadrature with each other, designed to locate positions of energy maxima and to classify the features detected by examining the phase of the filter outputs. The filters are chosen to form a Hilbert transform pair, and the sum of the squared outputs of the two convolutions is supposed to be the energy of the signal. Detailed experimentation on this claim has shown that it is not exactly true, unless one of the filters matches the signal, something that is very difficult when the signal may be of varying profile. However, such filters have become reasonably popular recently, and research


in that direction is still under development (see, for example, Perona and Malik, 1992). The attraction of the approach lies in the simultaneous identification of step-type and line-type edges. In this chapter we shall concentrate on the filters that are designed to identify maxima of the first derivative of the signal. The reader is referred to the above-mentioned references for details of other approaches, and to the brief survey of recent trends in edge detection by Boyer and Sarkar (1992).

III. THEORY

In Section I we saw some of the fundamental problems of edge detection, we constructed our first edge detector, and we saw its inadequacy in coping with very noisy images. To be able to do better than that, we must examine carefully what exactly we are trying to do, express the problem in a way that can be tackled by the tools an engineer and designer has at his or her disposal, and finally solve it. That is what we shall attempt to do in this section. It is not difficult to convince ourselves, by looking at Fig. 5, that the problem we really try to solve is the detection of a signal in a very noisy input. We saw that the intuitive filters we used in Section I did not really work. To choose another filter, we need to know something more about the nature of the signal we try to detect and the noise we are dealing with. So, we must start by modelling both signal and noise. Since the noise most of the time is caused by the thermal noise of the imaging device, the most plausible way to model it is to assume that it is additive, Gaussian, homogeneous white noise with zero mean and standard deviation σ. The word "additive" means that the input signal I(x, y) can be written as:

I(x, y) = u(x, y) + n(x, y)    (7)

where u(x, y) is the signal we try to isolate and n(x, y) is the noise. The word "Gaussian" means that at every location (x, y), the noisy component n, say, of the grey value is chosen at random from a Gaussian distribution of the form

p(n) = [1 / (√(2π) σ(x, y))] exp{−[n − μ(x, y)]² / [2σ(x, y)²]}    (8)

where μ(x, y) is the mean and σ(x, y) is the standard deviation of the noise. This expression implies that at each location the noise is of different level and standard deviation. This would make the noise inhomogeneous over the image, something which is both unlikely to occur and difficult to handle.


That is why we assume that the noise is "homogeneous" and that the quantities μ(x, y) and σ(x, y) are not really functions of position. Further, if μ were different from zero, there would be a biased component to the noise which could easily be detected and removed at a preprocessing stage. The word "white" means that if we consider an image which consists of noise only, its Fourier spectral density is flat, i.e., all frequencies contribute to it with the same amplitude. Another way of saying the same thing is to state that the noise is uncorrelated. This means that the noisy grey value added to the signal grey value at each location is not affected by, and does not affect, any other noisy grey value added anywhere else in the image. That is, if I consider any two pairs of grey noise values at a certain relative position r, and I average the product of all possible such pairs at the same relative position over the image, the result will tend to zero as the size of the image I consider gets larger and larger. When, however, I compute the average square grey value of the noise field, the result will tend to become equal to the variance of the noise, as the size of the image we consider gets larger. We say then that the autocorrelation function R_nn(r) of the noise field is a delta function:

R_nn(r) = σ² δ(r)    (9)
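These statistical claims about white noise, zero mean, mean square value equal to σ², and vanishing correlation between neighbouring samples, can be checked empirically with synthetic noise. A sketch; σ and the sample size are arbitrary choices for the experiment:

```python
# Empirical check of the white-noise model: zero mean, mean square value
# equal to sigma**2, and near-zero correlation at non-zero shifts.
import numpy as np

rng = np.random.default_rng(seed=0)
sigma = 2.0
n = rng.normal(0.0, sigma, size=1_000_000)

mean = n.mean()                    # tends to 0 (no biased component)
msq = (n * n).mean()               # R_nn(0), tends to sigma**2 = 4
r1 = (n[:-1] * n[1:]).mean()       # R_nn(1), tends to 0 (uncorrelated)

print(abs(mean) < 0.02, abs(msq - sigma**2) < 0.1, abs(r1) < 0.02)
```

As the sample size grows, the three estimates converge to 0, σ², and 0 respectively, which is the discrete counterpart of the delta-function autocorrelation described above.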

It is known that the Fourier transform of the autocorrelation function of a random field is the spectral density of the field. Knowing that the Fourier transform of a delta function is a constant, we deduce that the spectral density of uncorrelated noise is white, i.e., constant. Having understood the noise we are dealing with, or at least the noise we assume we are dealing with, we turn next to the method we are prepared to use in order to identify edges. To keep matters simple and fast, we prefer to use linear filters. There are various reasons for that:

1. The implementation of linear filters is easy. In fact, one can use the general framework for edge detection given in Box 1 and only replace the simple masks by some more sophisticated ones.
2. Various attempts have been made to replace the linear process of edge detection with some nonlinear one, but they did not show convincingly enough that they could produce any better results than the linear approach.
3. We understand exactly how the linear approach works, and thus we feel more in control when we use it.
4. Edge detection is only a preprocessing stage of a vision system, and we need a method that works fast and efficiently, while nonlinear methods tend to be rather slow.


For these reasons, we shall restrict ourselves to the design of convolution filters. Just as we did in Section I, we shall start by considering one-dimensional signals only. Let us say, therefore, that the noisy signal we have can be expressed as:

I(x) = u(x) + n(x)    (10)

We are seeking to define a convolution filter f(x) which, when convolved with the above signal, will produce an output with a well-defined maximum at the location of the edge (feature) we wish to detect. We can try to systematize the desirable properties of the filter we want to develop, as follows:

1. We want to be able to detect the edge even at very high levels of noise; in other words, we want our filter to have a high signal-to-noise ratio.
2. We want the maximum of the output of the filter to be as close as possible to the true location of the edge/feature we want to identify.
3. We want to have as few spurious maxima in the output as possible.

These basic requirements for a good edge filter were first identified by Canny (1986), who set the foundations of the edge-filter theory. Although the above requirements as stated seem vague and general, one can translate them into quantitative expressions that can be used in the filter design. Before we do that, we must first discuss the properties of the filter function itself.

Since the filter is assumed to be a convolution filter, we do not want to have to convolve with a filter of infinite size. Neither do we want to use a filter which goes abruptly to zero at some finite value, because sharp changes in a function can only be created by the superposition of strong high-order harmonics, as Fourier analysis shows. Since convolution of two functions corresponds to the multiplication of their spectra, the presence of significant high-frequency components in the spectrum of the filter implies that the high-frequency components of the input signal will be accentuated. However, the noise is assumed white, while the signal is the product of an image/signal capturing device which naturally has a band-limited frequency of operation. Thus, the high frequencies in the input signal will be those that are dominated by the noise, while the low frequencies will be dominated by the spectrum of the true uncorrupted signal. Accentuation of the high frequencies, therefore, is equivalent to accentuation of noise, contrary to what we try to achieve. For this reason, we want the filter to go smoothly to zero at its end points.
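The argument about smooth end points can be illustrated with a small spectral experiment: compare a ramp chopped off abruptly at the ends of its support with the same ramp tapered smoothly to zero. Both filter shapes below are my own illustrations, not ones derived in the text:

```python
# Compare the high-frequency content of an abruptly truncated filter with
# that of a smoothly tapered one. The shapes are illustrative only.
import numpy as np

x = np.linspace(-1.0, 1.0, 256, endpoint=False)
truncated = x.copy()                      # f(x) = x, chopped off at |x| = 1
tapered = x * np.exp(-4.0 * x**2)         # decays smoothly towards the ends

def high_freq_fraction(f):
    """Fraction of spectral energy in the top half of the frequency range."""
    spec = np.abs(np.fft.rfft(f))**2
    return spec[len(spec) // 2:].sum() / spec.sum()

# the truncated filter carries far more energy at high frequencies,
# which is exactly where white noise dominates the signal
print(high_freq_fraction(truncated) > high_freq_fraction(tapered))
```

The abrupt cutoff behaves like a discontinuity, so its spectrum decays slowly and keeps appreciable energy in the noise-dominated band, while the tapered version concentrates its energy at low frequencies.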


Another desirable property the filter should possess is that its output should be zero if the input signal does not contain any features, i.e., if the input signal is absolutely flat. This can be achieved if the filter has zero direct component. The above-mentioned requirements can be summarized as follows:

f(±w) = 0,    f′(±w) = 0,    f(x) = 0 for |x| > w,    ∫_{−w}^{w} f(x) dx = 0    (11)

where f′(x) is the first derivative of the filter, and w is its finite half-width.

A. The Good Signal-to-Noise Ratio Requirement

To be able to tell whether a filter has good signal-to-noise ratio or not, without trying it in practice, we must calculate expressions of the filter response to the signal and to the noise separately. Since the filter is assumed to be a convolution filter, its response to the signal can be written as:

s(x̃) = ∫_{-∞}^{∞} u(x) f(x̃ − x) dx    (12)

or equivalently,

s(x̃) = ∫_{-∞}^{∞} u(x̃ − x) f(x) dx    (13)

given that the order in which two functions are convolved does not matter. Similarly, the response of the filter to the noise component is:

v(x̃) = ∫_{-∞}^{∞} n(x) f(x̃ − x) dx = ∫_{-∞}^{∞} n(x̃ − x) f(x) dx    (14)

The noise is a random variable, and thus v(x̃) will be a random variable too. The only way we can characterise it, then, is through its statistical properties. One way to estimate its magnitude is to compute its mean square value, denoted by E{[v(x̃)]²}. If we multiply both sides of Eq. (14) by v(x̃) and take the expectation value, we have:

E{[v(x̃)]²} = ∫_{-w}^{w} f(x) E{v(x̃) n(x̃ − x)} dx    (15)

where we have made use of the following facts:

1. The quantity v(x̃) does not depend on the variable of integration, so it can be placed inside the integral sign on the right-hand side of the equation.


2. We can exchange the order of integration and taking of the expectation value on the right-hand side of the equation, because the expectation value is taken over all possible outcomes of the random process that gives rise to the noise at the specific location, i.e., by definition:

E{N(n(x̃))} = ∫_{-∞}^{∞} N(n) p(n) dn    (16)

where N(n(x̃)) is any function of the noise and p(n) is the probability density function of the noise.

3. The expectation value integration affects only the random variables, i.e., quantities that are functions of the noise, and not the deterministic filter function f(x).

The autocorrelation function of a random variable and the cross-correlation function between two random variables are respectively defined as:

R_nn(τ) = E{n(x) n(x + τ)},    R_vn(τ) = E{v(x) n(x + τ)}    (17)

Making use of this definition, Eq. (15) can be rewritten as:

E{[v(x̃)]²} = ∫_{-w}^{w} f(x) R_vn(−x) dx    (18)

It is clear from the above expression that we need an expression for R_vn(x). We start from Eq. (14) as before, only that now we multiply both sides by n(x̂). Following the same steps we obtain:

E{v(x̃) n(x̂)} = ∫_{-w}^{w} f(x) E{n(x̃ − x) n(x̂)} dx    (19)

Expressed in terms of the autocorrelation and cross-correlation functions, the above result can be restated as:

R_vn(x̂ − x̃) = ∫_{-w}^{w} f(x) R_nn(x̂ − x̃ + x) dx    (20)

However, the autocorrelation function of the noise is supposed to be given by Eq. (9). If we make use of that expression, we find that:

R_vn(x̂ − x̃) = σ² f(x̃ − x̂)    (21)

The above equation can equivalently be written as:

R_vn(x) = σ² f(−x)    (22)

Finally, substituting into Eq. (18), we obtain:

E{[v(x̃)]²} = σ² ∫_{-w}^{w} [f(x)]² dx    (23)
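Equation (23) lends itself to a simple simulation check. In the discrete sketch below (the filter values and noise level are arbitrary illustrations), the sample mean square of the filter output over many white-noise realizations should approach σ² Σ fₖ², the discrete analogue of σ² ∫ [f(x)]² dx.

```python
import random

random.seed(0)
f = [-0.1, -0.5, 0.0, 0.5, 0.1]   # a small antisymmetric filter (illustrative)
sigma = 2.0                        # noise standard deviation

def filter_output_at_origin(trials=20000):
    """Sample v(0) = sum_k f[k]*n[k] for many white-noise realizations."""
    out = []
    for _ in range(trials):
        n = [random.gauss(0.0, sigma) for _ in f]
        out.append(sum(fk * nk for fk, nk in zip(f, n)))
    return out

v = filter_output_at_origin()
mean_sq = sum(x * x for x in v) / len(v)
predicted = sigma ** 2 * sum(fk * fk for fk in f)   # discrete analogue of Eq. (23)
print(mean_sq, predicted)
```

With 20000 realizations the two numbers agree to within a few percent.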


Having computed the response of the filter to the signal and estimated the magnitude of its response to noise, we are ready now to define the signal-to-noise ratio of the filter output:

SNR = s(x̃) / √(E{[v(x̃)]²}) = [∫_{-∞}^{∞} u(x) f(x̃ − x) dx] / [σ √(∫_{-w}^{w} [f(x)]² dx)]    (24)

We can simplify this expression by redefining the origin of the x (or x̃) axis to be the location of the edge we wish to detect. Then, at the location of the edge, the signal-to-noise ratio will be given by the above expression calculated at x̃ = 0. Further, we do not need to carry around constants that do not affect the choice of function f(x). Such a constant is the standard deviation of the noise. We can define, therefore, a measure of the signal-to-noise ratio, as follows:

S = [∫_{-w}^{w} u(−x) f(x) dx] / √(∫_{-w}^{w} [f(x)]² dx)    (25)

The filter function f(x) should be chosen in such a way that this expression is as large as possible. There are some interesting observations we can make by just looking at expressions (24) and (25): It is known that any function can be written as the sum of a symmetric and an antisymmetric part. Let us say that our filter function f(x) can be written as:

f(x) = f_s(x) + f_a(x)    (26)

where f_s(x) is its symmetric part and f_a(x) is its antisymmetric part. On substitution in Eq. (25) we obtain:

S = [∫_{-w}^{w} u(−x) f_s(x) dx + ∫_{-w}^{w} u(−x) f_a(x) dx] / √(∫_{-w}^{w} f_s²(x) dx + ∫_{-w}^{w} f_a²(x) dx + 2 ∫_{-w}^{w} f_s(x) f_a(x) dx)    (27)

So far, we have not made any assumption concerning function u(x) with which we model the feature we wish to detect. Since our purpose is to detect sharp changes in the signal, centered at x = 0, the signal must be modelled by an appropriate function, like a sigmoid, or a step function. Further, since the filter is made to give zero response to a constant background, such a function should only model the signal without its direct component. Therefore, any function which models an edge reasonably will be an antisymmetric function. Given that the product of a symmetric and an antisymmetric function is antisymmetric, and given that we integrate over a symmetric interval, the


implication is that ∫_{-w}^{w} u(−x) f_s(x) dx = 0 and ∫_{-w}^{w} f_s(x) f_a(x) dx = 0. The symmetric part of the filter function, therefore, does not contribute at all to the magnitude of the signal. On the contrary, it contributes to the magnitude of the filter's response to noise, as can be seen from the extra integral that remains in the denominator of the above expression. We conclude, therefore, that the filter for the detection of edges should be an antisymmetric function.

If we decide to model the edge we want to detect by a step function, the "strength" of the signal will be the amplitude of the step at x = 0; call it A. Then it is clear from expression (24) that this amplitude can come out of the integral in the numerator, and thus the signal-to-noise ratio we measure will be proportional to the true signal-to-noise ratio A/σ. If instead of using filter f(x) we use filter af(x), the signal-to-noise ratio of the filter response is not going to change, i.e., it is independent of the filter amplitude. If, on the other hand, we scale the size of the filter and make it go to zero at x = ±βw, say (with β > 1), instead of ±w, the signal-to-noise ratio will be scaled up accordingly by √β. We can see that as follows: The scaled filter would be f(x/β) and obviously goes to zero when x = ±βw. If we substitute this filter expression in (25) and adjust the limits of integration appropriately, we shall have a measure of the signal-to-noise ratio of this particular filter. To relate it to the signal-to-noise ratio of the original filter, we must change the variable of integration to y = x/β. Then it is trivial to see that the signal-to-noise ratio of the new filter is √β times the signal-to-noise ratio of the old filter. Thus, using larger filters we improve the signal-to-noise performance.
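The √β scaling can be confirmed numerically for a unit step edge. In this sketch (illustrative only; f1 is an arbitrary smooth antisymmetric filter, not an optimal one), the measure of Eq. (25) is discretized with a simple rectangle rule:

```python
import math

def S_measure(f, w, npts=4000):
    """Discrete version of Eq. (25) for a unit step edge:
    numerator = integral of f over [-w, 0] (since u(-x) = 1 for x < 0),
    denominator = sqrt of the energy of f over [-w, w]."""
    hstep = 2 * w / npts
    xs = [-w + i * hstep for i in range(npts + 1)]
    num = sum(f(x) * hstep for x in xs if x < 0)
    return num / math.sqrt(sum(f(x) ** 2 * hstep for x in xs))

def f1(x):                      # illustrative antisymmetric filter
    return -x * math.exp(-x * x)

beta = 2.0
s1 = S_measure(f1, 4.0)
s2 = S_measure(lambda x: f1(x / beta), beta * 4.0)   # scaled filter f(x/beta)
print(s2 / s1, math.sqrt(beta))                       # the ratio is ~ sqrt(beta)
```

Doubling the filter size improves the signal-to-noise measure by the predicted factor √2.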

B. The Good Locality Requirement

We can turn our attention now to the problem of good locality. The edge we wish to detect will be marked at the location of an extremum of the output, i.e., at the point where

∂s(x̃)/∂x̃ + ∂v(x̃)/∂x̃ = 0    (28)

Using Eq. (12) we can compute ∂s(x̃)/∂x̃ as:

∂s(x̃)/∂x̃ = ∫_{-∞}^{∞} u(x) f′(x̃ − x) dx    (29)


Similarly, from Eq. (14) we obtain:

∂v(x̃)/∂x̃ = ∫_{-∞}^{∞} n(x) f′(x̃ − x) dx    (30)

It will be convenient later if we exchange the order of convolution in the above expression, and without any loss of accuracy we rewrite it as:

∂v(x̃)/∂x̃ = ∫_{-∞}^{∞} n(x̃ − x) f′(x) dx    (31)

In the absence of noise, the extremum in the filter output would coincide with the true location of the edge, which is assumed to be at x = 0. This is very easy to see. Indeed, in the absence of noise, Eq. (28) becomes:

∫_{-∞}^{∞} u(x) f′(x̃ − x) dx = 0    (32)

At the point x̃ = 0 this expression is:

∫_{-∞}^{∞} u(x) f′(−x) dx    (33)

We have argued earlier that the filter should be an antisymmetric function, just like the function u(x) with which we model the signal. The first derivative of an antisymmetric function is a symmetric function, and the product of a symmetric and an antisymmetric function vanishes when integrated over a symmetric interval. The implication is that in the absence of noise

∫_{-∞}^{∞} u(x) f′(−x) dx = 0    (34)

which means that the output is an extremum at the exact location of the edge. Because of the noise, however, the location of the extremum of the output will be misplaced, as computed from Eq. (28). The amount x̃ by which it will be misplaced is a random variable, and we can compute its mean-square value. Indeed, the misplacement is not expected to be a very large number, so we may expand the function f′(x̃ − x), which appears in Eq. (29), as a Taylor series about the point x̃ = 0:

f′(x̃ − x) = f′(−x) + x̃ f″(−x) + ···    (35)

On keeping only the first two terms of the expansion, substituting in Eq. (29), and remembering that f′(−x) is a symmetric function, we obtain:

∂s(x̃)/∂x̃ = x̃ ∫_{-∞}^{∞} u(x) f″(−x) dx    (36)


We could use the above result to substitute directly in Eq. (28); however, it will give us more insight to put it in a different form. It is obvious from the properties of convolution that the two expressions below are identical:

∫_{-∞}^{∞} u′(x̃ − x) f′(x) dx ≡ ∫_{-∞}^{∞} u(x) f″(x̃ − x) dx    (37)

If we compute both sides at x̃ = 0 we obtain:

∫_{-∞}^{∞} u′(−x) f′(x) dx = ∫_{-∞}^{∞} u(x) f″(−x) dx    (38)

On the grounds of this result, Eq. (36) can be written as:

∂s(x̃)/∂x̃ = x̃ ∫_{-w}^{w} u′(−x) f′(x) dx    (39)

If we use this result and that of Eq. (31) in (28), we obtain:

x̃ ∫_{-w}^{w} u′(−x) f′(x) dx = −∫_{-w}^{w} n(x̃ − x) f′(x) dx    (40)

Both sides of the above expression contain random variables, and we can compute their mean-square expectation values as follows:

E{x̃²} (∫_{-w}^{w} u′(−x) f′(x) dx)² = E{(∫_{-w}^{w} n(x̃ − x) f′(x) dx)²}    (41)

Notice that the expectation integral operates only on the random variables and not on the deterministic factors. The expectation value on the right-hand side of this equation is effectively the expectation value of the square output of the convolution of filter f′(x) with pure noise. Equation (23) above tells us that this is equal to σ² ∫_{-w}^{w} [f′(x)]² dx. Thus, the expectation value of the square misplacement of the location of the maximum in the output away from the true edge location is:

E{x̃²} = σ² ∫_{-w}^{w} [f′(x)]² dx / (∫_{-w}^{w} u′(−x) f′(x) dx)²    (42)

Clearly, the smaller this expectation value is, the closer the output maximum is to the true edge location. Thus, we define a good locality measure by an expression proportional to the inverse of the right-hand side of the above equation and without any unnecessary constants involved:

L = [∫_{-w}^{w} u′(−x) f′(x) dx] / √(∫_{-w}^{w} [f′(x)]² dx)    (43)


We can make some interesting observations by looking at this expression: The good locality measure is independent of the filter amplitude. If we scale the filter as we did in the case of the signal-to-noise ratio, the good locality measure of the scaled filter will turn out to be 1/√β times the good locality measure of the unscaled filter. Thus, the larger the filter is, the more ambiguity is introduced into the exact location of the detected feature. This is exactly the inverse of what we concluded about the signal-to-noise ratio, and the two conclusions together are known as the "uncertainty principle in edge detection."

For any two functions f₁(x) and f₂(x), Schwarz's inequality states that

(∫ f₁(x) f₂(x) dx)² ≤ ∫ f₁²(x) dx ∫ f₂²(x) dx    (44)

with the equality holding when one function is the complex conjugate of the other. If we apply it to the expressions for S and L as given by Eqs. (25) and (43) respectively, we shall find that the filter that maximizes the signal-to-noise ratio is given by f(x) = u(−x), and that the filter that maximizes the good locality measure must satisfy f′(x) = u′(−x). This means that both measures can be maximized by the same function, i.e., the "matched filter" for the particular signal. The last observation led Boie et al. (1986) to dispute the validity of the uncertainty principle and advocate the use of matched filters for edge detection. The uncertainty principle, however, refers to the size of the filter and not its functional form. The question Canny (1986) and other people who followed this line of research tried to answer was: If I fix the size of the filter, how can I choose its shape so that I compromise between maximizing its signal-to-noise ratio and its good locality performance?

For an isolated edge modeled by a step function, the matched filter is a truncated step of the opposite sign. This is the well-known difference-of-boxes operator (see, for example, Rosenfeld and Thurston, 1971), which due to its sharp ends creates an output with multiple extrema, something we wish to avoid. Boie et al. (1986) avoided this problem by not making the assumption of white Gaussian noise. Instead they analysed the physical causes of noise in the imaging device and came up with a nonflat noise spectrum. It is not clear from their work whether this by itself was adequate to make their filters go smoothly to zero or not. Their matched filters do go to zero smoothly, but some of them seem to be more than 100 pixels long! Further, instead of modelling the edge itself, they modelled its first derivative by a Gaussian function. If an edge were an ideal step edge, its derivative would have been a delta function. Clearly, the band-limited


range of operation of the imaging devices converts any such derivative to something that is better approximated by a Gaussian. In the case of white noise, the expression f′(x) = u′(−x) would have implied a filter made up of the integral of a Gaussian, which has sharp ends, unless one forces it in some way to go to zero, perhaps making some extra assumptions concerning the proximity of neighboring edges. In general, the matched filters of Boie et al. (1986) have not gained much popularity, perhaps because they do not seem very practical.
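The 1/√β behaviour of the locality measure under filter scaling can also be checked numerically. In the sketch below (illustrative only; f1 is an arbitrary smooth antisymmetric filter, and for a step edge the numerator of Eq. (43) reduces to |f′(0)| because u′(−x) is a delta function), doubling the filter size reduces L by √2; combined with the √β growth of S, this is the uncertainty principle in action.

```python
import math

def deriv(f, x, h=1e-5):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def L_measure(f, w, npts=4000):
    """Discrete version of Eq. (43) for a step edge: u'(-x) is a delta
    function, so the numerator reduces to |f'(0)|."""
    hstep = 2 * w / npts
    xs = [-w + i * hstep for i in range(npts + 1)]
    den = math.sqrt(sum(deriv(f, x) ** 2 * hstep for x in xs))
    return abs(deriv(f, 0.0)) / den

def f1(x):                         # illustrative antisymmetric filter
    return -x * math.exp(-x * x)

beta = 2.0
l1 = L_measure(f1, 4.0)
l2 = L_measure(lambda x: f1(x / beta), beta * 4.0)   # scaled filter f(x/beta)
print(l2 / l1)   # ~ 1/sqrt(beta): the larger filter localizes the edge less precisely
```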

C. The Suppression of False Maxima

Since we consider filters that go smoothly to zero at their end points, the only source of false maxima in the output is the response of the filters to noise. Rice (1945) has shown that if we convolve a function with Gaussian noise, the output will oscillate about zero with average distance between zero crossings given by:

x_av = π √(R_g(0) / (−R″_g(0)))    (45)

where R_g(τ) is the spatial autocorrelation function of the function g(x), defined by:

R_g(τ) = ∫_{-∞}^{∞} g(x) g(x + τ) dx    (46)

Upon differentiation, we obtain:

R′_g(τ) = ∫_{-∞}^{∞} g(x) g′(x + τ) dx    (47)

We can define a new variable of integration in the integral on the right-hand side, x̂ ≡ x + τ. Then:

R′_g(τ) = ∫_{-∞}^{∞} g(x̂ − τ) g′(x̂) dx̂    (48)

Upon one more differentiation, we obtain:

R″_g(τ) = −∫_{-∞}^{∞} g′(x̂ − τ) g′(x̂) dx̂    (49)

Thus, the expressions that appear in Eq. (45) can be written in terms of the convolving function and its derivative as:

R_g(0) = ∫_{-∞}^{∞} [g(x)]² dx,    R″_g(0) = −∫_{-∞}^{∞} [g′(x)]² dx    (50)
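Rice's result can be illustrated by simulation (a sketch with arbitrary bell-shaped functions standing in for g(x)): the wider and smoother the convolving function, the larger the average distance between zero crossings of its response to Gaussian noise.

```python
import random, math

random.seed(2)

def convolve(signal, kernel):
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]

def mean_zero_crossing_distance(y):
    crossings = [i for i in range(1, len(y)) if y[i - 1] * y[i] < 0]
    gaps = [b - a for a, b in zip(crossings, crossings[1:])]
    return sum(gaps) / len(gaps)

def bell(half, s):
    """A sampled bell-shaped function standing in for g(x) (illustrative)."""
    return [math.exp(-(x / s) ** 2) for x in range(-half, half + 1)]

noise = [random.gauss(0, 1) for _ in range(20000)]
narrow = mean_zero_crossing_distance(convolve(noise, bell(4, 2.0)))
wide = mean_zero_crossing_distance(convolve(noise, bell(16, 8.0)))
print(narrow, wide)   # the wider bell yields much sparser zero crossings
```

Quadrupling the width of the bell increases the average spacing by roughly the same factor, as Eqs. (45) and (50) predict.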


It is clear that the false alarms in our case are the extrema in the output of the convolution of the filter with the noise, which coincide with the zeros in the first derivative of the output. These are the same as the zero crossings that will arise if we convolve the noise with the first derivative of the filter. That is, the false alarms in the output arise from the notional convolution ∫_{-w}^{w} f′(x) n(x̃ − x) dx. The role of the function g(x), therefore, is in our case played by f′(x), and thus a measure of the average distance between the extrema in the output of our filter when we convolve it with noise can be defined as:

C = (1/w) √(∫_{-w}^{w} [f′(x)]² dx / ∫_{-w}^{w} [f″(x)]² dx)    (51)

where we divided by w to make the expression scale-independent. We can use this expression as a measure of the reduced number of spurious edges in the output. Clearly, the larger the average distance between the extrema in the output due to noise, the smoother the output will look, and thus the easier it will be to isolate the true edges from the spurious ones.

D. The Composite Performance Measure

We have derived in the previous three subsections quantitative expressions for the qualities we would like our filter to possess. The way these expressions have been defined implies that a good filter should maximize the values of all three of them. It is clear, however, just by looking at Eqs. (25) and (43) on one hand and (51) on the other, that it is impossible to maximize all three quantities simultaneously, since the integral ∫_{-w}^{w} [f′(x)]² dx appears in the denominator in (43) and in the numerator in (51). There is a need, therefore, for some sort of compromise, where we try to satisfy all three criteria as well as possible. This can be done by forming a composite criterion, call it P, by combining the three criteria above. We then have to choose the function f(x) in such a way that this composite criterion is maximal. Such a function will probably depend on certain parameters that will have to be chosen so that the boundary constraints are satisfied and the composite criterion does take a maximal value. The way various researchers proceeded from this point onwards diverges and has led to a variety of filters, admittedly not very different from each other. The exact details of the optimization process used are not of particular interest and can be found in the respective references given. We shall outline here only the basic assumptions of each approach. Canny's composite criterion was formed by multiplying the first two quantities only, S and L: P₁ ≡ SL. He then chose the filter function by maximizing P₁, subject to the extra condition that C is constant. Canny's


model for an edge was a pure step function defined by:

u(x) = 1 for x ≥ 0,    u(x) = 0 for x < 0    (52)

Such a function is entirely self-similar at all scales, i.e., it does not introduce to the problem an intrinsic length scale, and thus the filter derived can be scaled up and down to fit the user’s size requirements. This way Canny derived the following filter:

f(x) = e^{αx} [K₁ sin(Ωx) + K₂ cos(Ωx)] + e^{−αx} [K₃ sin(Ωx) + K₄ cos(Ωx)] + K₅    for −w ≤ x ≤ 0
f(x) = −f(−x)    for 0 ≤ x ≤ w    (53)

This filter depends on seven parameters, K₁, …, K₅, α, and Ω, which can be chosen so that the boundary conditions expressed by the first two of Eqs. (11), and the antisymmetry implication that f(0) = 0, are satisfied. Further, as we mentioned earlier, the scaling of the filter does not affect its performance. Thus, one of the filter coefficients can arbitrarily be set to one, so that the number of unknown parameters reduces to six. The problem is still underconstrained, as the three boundary conditions are not enough to specify all six parameters, which have to be chosen so that C and P₁ take maximal values. Canny argued that it was not the exact value of C that mattered, but the error created in the output due to the presence of false maxima, in relation to the error introduced by thresholding at the end. Thus, he tried to choose the values of the remaining three parameters (after the boundary conditions had been used) to maximize P₁ and at the same time minimize the error caused by false maxima, expressed as a fraction of the error caused by thresholding. He used stochastic optimization to scan the 3D parameter space, since the function he had to optimize was too complicated for analytic or deterministic approaches. Spacek (1986), in order to reduce the ambiguity, created a composite performance measure by multiplying all three criteria. Spacek's composite criterion, therefore, is P₂ ≡ (SLC)². He also modelled the edge by a step function. The best filter then turns out to be one given by the same equation as Canny's filter (53), but with Ω = α. Thus, the number of independent parameters on which the filter depends was reduced to five. After using the boundary conditions, Spacek fixed the parameter α to one, as the filter seemed to be insensitive to it, and chose the remaining parameters so that the composite performance measure took a maximal value.
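The functional form of Eq. (53) is easy to experiment with numerically. In the sketch below, the coefficient values are hypothetical placeholders (the real ones come out of the boundary conditions and the optimization described above); only the condition f(0) = 0, which follows from antisymmetry, is enforced explicitly by setting K₅ = −(K₂ + K₄).

```python
import math

# Hypothetical coefficient values, for illustration only; the real values
# come from the boundary conditions and the numerical optimization.
K1, K2, K3, K4 = 1.0, 0.4, -0.2, 0.1
alpha, omega, w = 1.0, 1.2, 4.0
K5 = -(K2 + K4)          # enforces the antisymmetry condition f(0) = 0

def f(x):
    """A filter of the functional form of Eq. (53)."""
    if x > 0:
        return -f(-x)    # antisymmetric extension for 0 <= x <= w
    return (math.exp(alpha * x) * (K1 * math.sin(omega * x) + K2 * math.cos(omega * x))
            + math.exp(-alpha * x) * (K3 * math.sin(omega * x) + K4 * math.cos(omega * x))
            + K5)

print(f(0.0))              # 0 by construction
print(f(1.3) + f(-1.3))    # 0: the filter is antisymmetric
```

The remaining conditions f(−w) = 0 and f′(−w) = 0 are what the other coefficients must be solved for; with the arbitrary values above they are not satisfied.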
Petrou and Kittler (1991) followed an approach similar to Spacek's, but argued that the best model for an edge is a ramp, since any image processing


device will smooth out all sharp changes in an image due to its finite bandwidth of operation. The edge model they assumed was:

u(x) = 1 − 0.5 e^{−sx} for x ≥ 0,    u(x) = 0.5 e^{sx} for x ≤ 0    (54)

where s is some positive constant, possibly in the range 0.5 to 3, which is intrinsic to the imaging device and thus identical for all scene step edges (and thus image ramp edges) in images that were captured by the same device. The filter they derived is given by:

f(x) = e^{αx} [K₁ sin(αx) + K₂ cos(αx)] + e^{−αx} [K₃ sin(αx) + K₄ cos(αx)] + K₅ + K₆ e^{sx}    for −w ≤ x ≤ 0
f(x) = −f(−x)    for 0 ≤ x ≤ w    (55)

By a semiexhaustive search of the 2D parameter space they were dealing with (after the boundary conditions were used), they determined the values of the parameters which appear in the above expression so that the combined performance measure was maximized. They tabulated the parameter values for s = 1 and various filter sizes and explained how they should be scaled for different values of s. Finally, they derived the filter for step edges as a limiting case of the filter for ramps.

E. The Optimal Smoothing Filter

The filters we discussed in the previous sections were meant to be filters that estimate the first derivative of the signal when it is immersed in white Gaussian noise. It is clear, however, from the definition of the convolution integral, that the first derivative of the output of a convolution is the same as the convolution of the initial signal with the derivative of the filter. We can turn this statement the other way around and state that the result of convolving a signal with a differentiating filter can be obtained by convolving the signal with the integral of the filter first and then differentiating the output. The integral of the differentiating filter, however, is going to be a symmetric bell-shaped function that will act as a smoothing filter. Thus, we can separate the processes of smoothing and differentiation, so that we can perform one at a time along the directions we choose in a two-dimensional image, just as we did at the end of Section I. The integral of filter (55) is given by:

h(x) = e^{αx} [L₁ sin(αx) + L₂ cos(αx)] + e^{−αx} [L₃ sin(αx) + L₄ cos(αx)] + L₅ x + L₆ e^{sx} + L₇    for −w ≤ x ≤ 0
h(x) = h(−x)    for 0 ≤ x ≤ w    (56)



The parameters L₁, …, L₇ can be expressed as functions of the parameters K₁, …, K₆. Clearly, the extra constant of integration, L₇, has to be chosen in such a way that h(±w) = 0. Further, the filter should be scaled so that when it acts upon a signal with no features, i.e., a constant signal, it will not alter it. In other words, the direct component of the filter should be 1. This is equivalent to saying that the sum of its weights must be 1. Petrou and Kittler (1991) have tabulated the values of the parameters of the above filter for s = 1 and for various filter sizes and explained how these parameters should be scaled for filters of different values of s and different sizes.
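The normalization requirement is straightforward to apply to a sampled filter. In this sketch the weights are an arbitrary symmetric bell, not the tabulated values: once the weights are scaled to sum to 1, a featureless (constant) signal passes through unaltered.

```python
def normalize(weights):
    """Scale smoothing-filter weights so they sum to 1 (unit direct component)."""
    s = sum(weights)
    return [wi / s for wi in weights]

def convolve_valid(signal, kernel):
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# An illustrative symmetric bell-shaped set of weights (not the tabulated ones).
h = normalize([1.0, 4.0, 6.0, 4.0, 1.0])
flat = [7.0] * 20
print(convolve_valid(flat, h)[:3])   # every output sample equals 7.0
```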

F. Some Example Filters

To demonstrate the results of the theory we have developed so far, we can use some filters in the scheme proposed in Box 1. For a start, we have to choose an appropriate value of the parameter s. This can be done by modelling a couple of representative edges in one of the images we plan to process, but in general the value s = 1 is quite representative. So, we shall use, for simplicity, s = 1 in the filters that we shall implement. Making use of the information given in Petrou and Kittler (1991), we can derive the differencing and smoothing filters of sizes 5 to 13, given in Box 2. Filters smaller than that are not worth considering, because they tend to be badly subsampled and therefore lose any optimality property. Filters larger than that may sometimes be useful in particularly noisy images, but we shall not consider them here.

[Box 2 tables of differentiation and smoothing filter coefficients not reproduced.]

Box 2. Differentiation and smoothing filters of various sizes for ramp edges, computed for slope parameter s = 1. Incomplete filters are supposed to be completed using the antisymmetry and the symmetry properties of the differentiation and smoothing filters respectively.
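The Box 1 scheme (smooth along one image direction, difference along the orthogonal one, and combine the two directional outputs into a gradient magnitude) can be sketched as follows, with small illustrative kernels standing in for the tabulated optimal ones:

```python
import math

def conv1d(v, k):
    n = len(k)
    return [sum(k[j] * v[i + j] for j in range(n)) for i in range(len(v) - n + 1)]

def transpose(m):
    return [list(r) for r in zip(*m)]

def directional_gradient(img, smooth, diff):
    """Smooth every column with the smoothing kernel, then difference every
    row with the differencing kernel (the Box 1 scheme along the row direction)."""
    smoothed = transpose([conv1d(col, smooth) for col in transpose(img)])
    return [conv1d(row, diff) for row in smoothed]

def gradient_magnitude(img, smooth, diff):
    gx = directional_gradient(img, smooth, diff)
    gy = transpose(directional_gradient(transpose(img), smooth, diff))
    return [[math.hypot(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(gx, gy)]

# Small illustrative kernels only; the optimal tabulated ones are not reproduced.
smooth = [0.25, 0.5, 0.25]
diff = [-1.0, 0.0, 1.0]

# A vertical step edge: left half 0, right half 10.
img = [[0.0] * 5 + [10.0] * 5 for _ in range(8)]
g = gradient_magnitude(img, smooth, diff)
print(max(g[0]))   # the strongest response straddles the step
```

Edge pixels would then be marked where this magnitude is a local maximum and exceeds a threshold.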


FIGURE 6. The output of applying the algorithm of Box 1, with the filters of size 13 of Box 2 plus thresholding, to the image of Fig. 5a.

We used the ramp filters of size 13 from the above table in the scheme defined in Box 1 to derive the results shown in Fig. 6. The improvement over the simple filters used in Section I is very noticeable, and it certainly justifies the effort.

IV. THEORY EXTENSIONS

The work we discussed in the previous section forms only the bare bones of the line of approach reviewed in this article and has sparked off several papers concerned with improvements and variations of the basic theory. For example, the three criteria derived can be modified to apply to the design of filters appropriate for the detection of features with symmetric profiles, like roads and hedges in an aerial photograph (see for example, Petrou and Kittler, 1989; Petrou and Kolomvas, 1992; Petrou, 1993; and Ziou, 1991). However, linear feature detection is not the subject of this article, and we are not going to discuss it here. The major issues which merit discussion are the extension of the theory to two dimensions, the approximation of filters by simplified versions, their modification for more


efficient implementation, as well as considerations of interference from other features in the signal. We shall discuss all these points one at a time, but first we must see why the need for extensions and modifications arose. There are three drawbacks to the theory we have discussed so far:

- The whole theory was developed in the context of one-dimensional signals. Images, however, are two-dimensional signals, and edges in them appear in all sorts of orientations.
- The filters seem to be cumbersome and not very efficient in their implementation.
- Edges were considered as isolated features, and no thought was given to the influence of one edge on the detection of a neighbouring one.

In the subsections that follow we shall discuss how various researchers dealt with the above-mentioned problems.

A. Extension to Two Dimensions

The optimal filters derived in the previous section concern edges in one-dimensional signals. To determine the gradient of the image function we only need to convolve the image in two orthogonal directions with one-dimensional masks, and that is what the filters are supposed to be doing. In a two-dimensional signal, however, an edge can have any orientation, not necessarily orthogonal to the direction of convolution. If we assume pure step edges, the differentiation filter should not be affected by that: a step edge remains a step edge even when it is viewed at an angle by the convolution filter. The problem arises when one wants to make the filters more robust to noise and thus proposes to smooth first along the direction orthogonal to the direction of differentiation, just as we did in Box 1. Then the true orientation of the edge matters, since any smoothing in a direction that does not coincide with the direction of the edge will result in blurring it, and a blurred edge no longer has an ideal step-function profile irrespective of orientation; in fact, the more slanted the edge is to the direction of convolution, the more blurred it will become. Canny (1986) solved the problem of edge orientation by convolving the image in more than two directions and combining the results. He used the Gaussian as a smoothing filter, to smooth the image first in the direction orthogonal to that of convolution. Spacek's (1986) approach to the problem was different. On the grounds that the differentiating filter is antisymmetric and cannot possibly have a two-dimensional counterpart, he concentrated on the smoothing filter,


which is symmetric and thus does have a two-dimensional version. To produce this two-dimensional version, Spacek (and Petrou and Kittler, 1991, as well) simply replaced the x variable in the definition of h(x) by the polar radius r. This two-dimensional filter was then used to smooth the image first, before differentiating it with a very simple differencing mask, for example like the one we used in Section I. There are various drawbacks to this approach:

- The spectral properties of a one-dimensional filter are different from the spectral properties of its circularly symmetric version. For example, the Fourier transform of a finite-width pulse is a sinc function, while the Fourier transform of its circularly symmetric version is expressed in terms of a Bessel function. Both transforms "look" similar, but the exact locations of their zeros, maxima, and the like are different. Having said that, the method used by the above researchers is often used in practice for the extension of filters to two dimensions, because in general a good one-dimensional filter, when circularized, gives rise to a pretty good two-dimensional one.
- The circularly symmetric smoothing filter is not separable, so a full two-dimensional convolution has to be performed before differencing takes place. This tends to be slow and cumbersome.

There have been some unsuccessful attempts to modify the theory so that optimal two-dimensional filters can be developed directly. The attempts concentrated mainly on the development of filters that detect edges as zero-crossing points, i.e., filters that estimate the locations where the second derivative of the image function becomes zero, which are obviously the locations where the first derivative attains a maximum. Such filters correspond to the Laplacian-of-a-Gaussian filters of Marr and Hildreth (1980). However, attempts to derive such filters did not go very far, mainly due to the lack of a simple two-dimensional formula that corresponds to Rice's one-dimensional result concerning the density of zeros of the filter response to noise. Such a formula would give the average distance between zeros in the response of a two-dimensional filter to a noise field. Apart from the calculational difficulty in deriving such a formula, it is not even clear how to define what we mean by the density of false zero crossings in two dimensions.

B. The Gaussian Approximation

We saw that the extension of the optimal smoothing filter to two dimensions led to filters that involve cumbersome two-dimensional convolutions. This is because the circularized filter h(r) is not separable. A two-dimensional


Gaussian, however, is the product of two one-dimensional ones, and a two-dimensional convolution with it can be done by two cascaded one-dimensional convolutions. For a filter of size 7 × 7, say, this implies 2 × 7 multiplications per pixel as opposed to 7 × 7 multiplications. This is one of the main reasons that the optimal filters have hardly been used in practice, and Gaussian filters have been preferred instead. The other reason is that Canny himself, when he derived his differentiating filters, proposed that they can be approximated well by the derivative of a Gaussian. In fact, this statement was taken so literally that most people, when they say they use the "Canny filter," actually mean the derivative of a Gaussian! A Gaussian filter, however, can be made to look as similar or as dissimilar to the optimal filter as one wishes, according to the choice of the standard deviation used! Figure 7 shows two Gaussian filters that have been chosen to correspond to the optimal filter. The first one was chosen so that the maxima of the two filters match. If we look at the tails of these filters, we shall see that the Gaussian filter has a significantly sharp cutoff, which implies that the noise characteristics of this filter will be different from the noise characteristics of the optimal filter. Canny

FIG. 7. Two Gaussian "approximations" to the optimal filter of size 13 given in Box 2.


(and several other researchers as well) computed the performance measure of a Gaussian filter by simply substituting the Gaussian function into the formula of the composite performance measure and allowing the limits to go to infinity. By doing such a calculation, Canny concluded that the performance measure of the Gaussian approximation is 80% of the performance measure of the optimal filter. I consider this calculation meaningless. The Gaussian filter is infinite in extent, and, when used in practice, it is bound to be truncated. Truncation will cause noise accentuation and false responses. These false responses, however, are not accounted for by the performance measure, which considers only the false responses caused by the random noise field within the finite boundaries of the filter. Thus, composite performance measures computed for Gaussian filters using the formulae derived in the previous section are meaningless; one should either use infinite filter limits for their computation, or truncated ones. It seems more reasonable to fix the noise characteristics of the filters one tries to associate, in order to make any meaningful comparison. We can do that as follows: Suppose that we digitize the optimal filter as we did in Section III.F. The continuous filter function then is represented by seven or thirteen, say, numbers. Thus, some of its properties are lost. In effect we band-limit it that way, and, by doing so, we make it infinite in extent in the image domain. We can use this fact to compute the discontinuity we introduce by truncating the filter now to its finite size, and choose the standard deviation of the Gaussian filter so that it has the same discontinuity at the point of truncation. Further, we scale the Gaussian filter so that the sum of the squares of the weights of the two filters is the same, since this is the quantity that enters into computing the response of the filter to noise by Eq. (23).
Then we can claim that we have defined the Gaussian filter that best corresponds to the optimal filter, since we chose it by fixing the two filters’ responses to noise. This Gaussian approximation to the optimal filter is also shown in Fig. 7. We can see that it is very different from the other Gaussian approximation, and, as expected, it produces different results. In general, anything that looks like the derivative of a bell-shaped function can be approximated by the derivative of a Gaussian, but what matters is what parameter values we choose for the approximating Gaussian, and this is something for which there is no easy guidance. In conclusion, Gaussian filters are clearly easier to compute and more efficient to implement, but one should bear in mind that they are not the product of any optimality theory; since they can be made to look and behave as dissimilarly as one likes from the filters that resulted from the theory developed in the previous section, they should not be associated with them.
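The scaling step of this matching procedure — fixing the Gaussian’s response to noise by equating the sums of squared weights — can be sketched as follows. This is an illustration only: the function names are hypothetical, the choice of σ (the discontinuity-matching step) is left to the user, and `optimal13` is a purely illustrative stand-in for the size-13 weights of Box 2, which are not reproduced here.

```python
import numpy as np

def derivative_of_gaussian(half_width, sigma):
    """Sampled derivative-of-Gaussian differencing filter of size 2*half_width+1."""
    x = np.arange(-half_width, half_width + 1, dtype=float)
    return -x / sigma**2 * np.exp(-x**2 / (2.0 * sigma**2))

def match_noise_response(target, sigma):
    """Scale a sampled derivative-of-Gaussian so that the sum of squares of
    its weights equals that of the target filter -- the quantity that fixes
    the response to white noise (cf. Eq. (23))."""
    target = np.asarray(target, dtype=float)
    half_width = (len(target) - 1) // 2
    g = derivative_of_gaussian(half_width, sigma)
    scale = np.sqrt(np.sum(target**2) / np.sum(g**2))
    return scale * g

# illustrative stand-in for the size-13 optimal filter of Box 2 (NOT the real weights)
optimal13 = np.sin(np.linspace(-np.pi, np.pi, 13)) * np.exp(-np.linspace(-2, 2, 13)**2)
matched = match_noise_response(optimal13, sigma=2.0)
```

By construction the two filters then inject the same amount of noise energy into their outputs, which is the precondition for a fair comparison.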


C. The Infinite Impulse-Response Filters

Concerned with the efficient implementation of the optimal filters, Deriche (1987) had the idea of allowing them to be of infinite extent and implementing them recursively. The idea of edge detection by infinite impulse-response filters has been around for some time (see, for example, Modestino and Fries, 1977). Recursive implementation implies that the same number of operations is performed per pixel, irrespective of the actual filter effective size and shape. Deriche allowed the limits in Canny’s performance criteria to go to infinity and thus derived a filter of the form:

f(x) = −c e^(−α|x|) sin(Ωx),   (57)

where c is a scaling constant, and α and Ω are the filter parameters, to be chosen by the user by experimentation. Deriche’s filter can be derived from Canny’s filter if we take the limit w → ∞. Indeed, by just looking at Formula (53), which holds for x ≤ 0, it becomes obvious that for large w the term multiplied by e^(−αx) will explode unless K3 = K4 = 0. Further, the only way to make the filter go smoothly to zero at infinity is to choose also K2 = K5 = 0. Thus, filter (57) arises. Taking this limit, however, is wrong: although one can do it if one considers the function in isolation, the theory that gave rise to the derivation of this function does not apply in the limiting case. Indeed, the criterion C that Canny derived measures the average distance between zero crossings as a fraction of the filter width. When the filter width becomes infinite, the C criterion is undefined or becomes zero. Deriche claims in his paper that he used this criterion in his optimization process (what he and Canny call k), but a careful examination of his equations shows that he actually used the same measure as Canny, i.e., the percentage of error caused by the presence of false maxima, as a fraction of the error due to thresholding, a quantity Canny calls r. Apart from the fact that maximization of this quantity r is not the same as maximization of k (or C in our terminology), the derivation of r is based on the definition of k, and that is beset by the fact that k is badly defined for infinite filters. Sarkar and Boyer (1991a) spotted the inadequacy of the theory behind the above filters and reworked the performance criteria so that they are appropriate for infinite impulse-response filters. In particular they redefined the criterion concerning the average density of false maxima. To do that, they defined an effective filter width (Eq. (58)).


Thus, the measure of the average distance of false responses is redefined in terms of this effective width (Eq. (59)).

Sarkar and Boyer subsequently optimized the composite performance measure Canny had defined, P = SL, subject to the condition that r is constant. The equations they obtained, however, were too complicated and no analytic form of the filters could be found. They derived their filters numerically and approximated them by a function of the form:

f(x) = A e^(−αx) (cos(φ) − cos(βαx + φ))   for x > 0,   (60)

where β = 1.201, φ = 0.771, and A and α > 0 are scaling constants that do not affect the shape of the filter. The recursive implementation of this filter can be achieved by scanning the image one line at a time from left to right, to form the input signal sequence x(n). Its reverse version, x_r(n), is formed when we scan the line from right to left. If there are N pixels in a line, the input sequence and its reverse are related by x_r(n) = x(N − n + 1). The double scanning of each line is necessary because the filter is defined for both positive and negative values of x, and thus consists of a causal and an anticausal part. These two sequences are used to form the corresponding output sequences given by:

y+(n) = b1 y+(n − 1) + b2 y+(n − 2) + b3 y+(n − 3) + a1 x(n − 1) + a2 x(n − 2),   (61)

y−(n) = b1 y−(n − 1) + b2 y−(n − 2) + b3 y−(n − 3) + a1 x_r(n − 1) + a2 x_r(n − 2).   (62)

The total filter output sequence will be:

y(n) = y+(n) − y−r(n),   for all n,   (63)

where {y−r(n)} is the reversed version of {y−(n)}. The parameters that appear in the above expressions are given by the following equations in terms of the filter parameters:

b1 = e^(−α)(1 + 2 cos(βα)),
b2 = −b1 e^(−α),
b3 = e^(−3α),
a1 = A e^(−α)(cos(φ) − cos(βα + φ)),
a2 = A e^(−2α)(cos(φ) − cos(2βα + φ)) − b1 a1.


Sarkar and Boyer also derived the integral version of the above filter, to be used for smoothing in the direction orthogonal to that of differentiation. The recursive implementation of these filters means that only 40 multiplications per pixel are performed, irrespective of the values of the filter parameters. This number should be compared with the number of multiplications needed when the convolution filters derived in the previous section are used, which is 4 × filter size. So, for filters of size 5, 7, or 9, convolution is more economical, while for filters of size larger than 10 the recursive implementation may give considerable computational gains. There are two drawbacks in the infinite impulse-response filters:

• As can be seen from Eqs. (61) and (62) above, the filter output is given by recursive relations that need initial values. Whatever initial values we use, their effect will propagate to all subsequent values of the sequence; that is, the recursive implementation of the filter introduces an infinite boundary effect!

• The infinite size of the filters in effect allows the interference of neighbouring edges. Indeed, the whole filter theory is based on the assumption that we wish to identify an isolated edge in the signal. The effect of several edges near each other on the output of the filter was not considered. How the theory can be extended to cope with this effect will be discussed in the next section.

D. Multiple Edges

Shen and Castan (1986) also worked on the idea of infinite impulse-response edge detection filters. In addition, they were concerned with the multiple-edge problem and discussed how to avoid it. They used criteria similar to the criteria we developed in Section III, but they appropriately modified them so that the filters could be discontinuous at the centre. That way they derived an optimal smoothing filter, the first derivative of which can be used for the detection of edges as extrema of the first derivative of the signal, and its second derivative for the detection of edges as zero crossings of the second derivative of the signal. Their filter has the form:

f(x) = c e^(−p|x|),   (65)

where c is a scaling constant and p is a positive filter parameter. The parameters of the filter should be chosen in such a way that the interference from neighbouring edges is minimized. The interference effect was studied by considering a series of crenellated edges, so that the signal jumps from −A to +A at some irregular intervals. Their analysis was based on the


following assumptions:

• In any arbitrary space interval (x0, x0 + Δx), the probability of having one step edge is independent of x0.

• The number of step edges in an interval (x1, x2) is independent of that in another interval (x3, x4) if the two intervals do not overlap.

• If Pk(Δx) is the probability of having k edges in an interval Δx, then lim(Δx→0) P2(Δx)/Δx = 0.

The above assumptions can be used to show that the number of edges in an interval obeys the Poisson law:

Pk(Δx) = ((λΔx)^k / k!) e^(−λΔx),   (66)

where λ > 0 is the average density of edge points in the signal. If σ is the standard deviation of the noise, these researchers showed that the filter parameters should be chosen as p = 2a, with c and a fixed in terms of A, λ, and σ (Eq. (67)).

It is obvious from the above expressions that when the average distance between edges decreases, i.e., λ increases, p increases too, so that the filter becomes sharper and the danger of interference from nearby edges decreases. These filters were shown by Castan et al. (1990) to satisfy the criterion of maximal signal-to-noise ratio and modified versions of the other two criteria of optimality: The good locality criterion was replaced by a modified version that allows for the filter discontinuity at the centre, thus permitting zero error in the edge locality (something which Canny’s criteria do not allow). The multiple-responses criterion was replaced by the requirement that there should be one extremum only in the vicinity of the edge, while small dense extrema away from the edge are allowed. According to these criteria the filter given by Eq. (65) is optimal in its own right. The same authors proceeded to implement this filter recursively, as well as its first and second derivatives, to be used for the actual differentiation of the signal.
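The exponential kernel of Eq. (65) lends itself to a particularly simple recursive implementation: one causal and one anticausal first-order pass. The sketch below is an illustration of the idea (with the output normalized to unit gain), not the authors’ exact published algorithm:

```python
import numpy as np

def isef_smooth(x, p):
    """Recursive smoothing of a 1-D signal with the symmetric exponential
    kernel proportional to exp(-p|x|): a causal and an anticausal first-order
    pass, combined so the overall impulse response is symmetric with unit sum."""
    b = np.exp(-p)                       # pole of each first-order recursion
    N = len(x)
    yc = np.zeros(N)                     # causal pass (left to right)
    ya = np.zeros(N)                     # anticausal pass (right to left)
    for n in range(N):
        yc[n] = x[n] + b * (yc[n - 1] if n >= 1 else 0.0)
    for n in range(N - 1, -1, -1):
        ya[n] = x[n] + b * (ya[n + 1] if n <= N - 2 else 0.0)
    y = yc + ya - x                      # subtract x so x[n] is not counted twice
    return y * (1.0 - b) / (1.0 + b)     # sum of exp(-p|k|) over integers = (1+b)/(1-b)
```

The cost is two multiplications per pixel regardless of p, which illustrates why such filters become sharper (larger p) at no extra expense when edges are densely packed.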

E. A Note on the Zero-Crossing Approach

Marr and Hildreth (1980) proposed to detect edges as the zero crossings of the second derivative of the signal. Combining this with the Gaussian filter for smoothing led to the Laplacian-of-Gaussian filters, which were quite popular in the early 1980s. The theory we are reviewing in this article was


appropriately modified by Sarkar and Boyer (1991b), so that optimal filters that detect edges as zero crossings can be defined and implemented recursively. As the philosophy of such filters is slightly different from the philosophy of our approach so far, no more details will be given concerning them. One point, however, is worth mentioning, and that concerns the derivation of the optimal differentiating filter and its subsequent integration to form the “optimal” smoothing filter, as we did in Section III.E, or the derivation of the optimal smoothing filter and its differentiation to form the “optimal” differencing filter, as Castan et al. (1990) did, as we discussed in Section IV.D. Sarkar and Boyer showed that the optimal filter for detecting edges as zero crossings (i.e., effectively locating the zero crossings of the second derivative of the signal) is not the derivative of the optimal filter for detecting edges as extrema of the first derivative of the signal. The implication of this is that the derivative of the optimal smoothing filter is not necessarily the optimal differencing filter, and vice versa. In other words, if one wants to derive the optimal smoothing filter, one should start from the beginning rather than integrate the optimal differentiating filter, and so on. So, one should have this in mind, and probably put the word “optimal” in quotation marks when the filter referred to was not directly derived by optimizing a set of criteria but was rather the result of integration/differentiation of an optimal filter.

V. POSTPROCESSING

All the theory we developed and discussed so far concerns the design of convolution filters that will effectively enhance the locations of the edges. Figure 8a shows the output of filtering the image of Fig. 1a with the filters of size 9 given at the end of Section III. The outputs of the directional convolutions have been combined to form the gradient magnitude output. For display purposes, the output has been linearly scaled to range between 0 and 255. We see that the edges of the image stick out quite nicely, and, therefore, we may think that if we simply threshold this output we may identify them, provided that the threshold has been chosen carefully. However, before we do that, we must isolate the local maxima of the gradient, because that is where edges are located. Figure 9 shows schematically the shape of the output surface near the location of an edge. The curves are the contours of constant gradient magnitude, and the thicker the line, the higher the value. The direction of the gradient is orthogonal to the contours. Clearly, we would like the edge to be marked along the thickest line. So, we must look for maxima of the gradient magnitude in a direction orthogonal to the edge direction, i.e., in a direction along the gradient



FIGURE 8. (a) This image shows the magnitude of the gradient value at each location, computed using the differencing filter of size 13 of Box 2. The values have been scaled to vary between 0 and 255. (b) The local maxima of the gradient image shown in (a). (c) The histogram of the values of the gradient image shown in (a). The arrow indicates the threshold used for the edge map shown in (d). (d) The edge map of (b) after thresholding with threshold 56.
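The combination of the two directional convolution outputs into the gradient magnitude, and the linear scaling to 0–255 used for display in Fig. 8a, can be sketched generically as follows (the directional filters themselves are assumed given; the function name is hypothetical):

```python
import numpy as np

def gradient_magnitude_display(gx, gy):
    """Combine the two directional convolution outputs into the gradient
    magnitude and linearly scale the result to the range 0-255 for display."""
    mag = np.hypot(gx, gy)                # sqrt(gx^2 + gy^2) at every pixel
    lo, hi = mag.min(), mag.max()
    if hi == lo:                          # flat response: nothing to scale
        return np.zeros_like(mag)
    return (mag - lo) * 255.0 / (hi - lo)
```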


vector. In our elementary edge detector, the direction along which we examined whether the output is a local maximum or not is determined only grossly: it is allowed to be either vertical or horizontal. In more sophisticated versions of edge detectors, the direction of the edge is detected by taking the inverse tangent of the ratio of the output in the y direction over the output in the x direction. The angle determined that way will, in general, define a direction pointing in between the neighbouring pixels, since it can take continuous values. The values of the gradient along this direction can be calculated by linear interpolation using the values of the neighbouring pixels. The value of the gradient at the pixel under consideration is then compared to the estimated values of the gradient on either side along the gradient direction, and, if it is found to be a local maximum, the presence of a possible edge is marked at that location. In even more sophisticated versions of the algorithm, the gradient values are fitted locally by a second-order surface, and the exact location of the local maximum is computed from this analytic fitting (see, for example, Huertas and Medioni, 1986). Such an approach results in subpixel accuracy in the location of the edges. Having isolated the local maxima, one might think that the task is over. Figure 8b actually shows the edges we find from the output in Fig. 8a if we keep the local maxima of the gradient. We see that there are lots of unwanted edges which somehow have to be weeded out. One would expect that all edges which are due to texture or noise will probably have very low magnitude, while edges which are genuine will have much higher values.
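The interpolation-based nonmaxima suppression just described can be rendered as the following sketch. It is one possible implementation, not any particular published one; the function names are hypothetical, and bilinear interpolation of the magnitude one pixel away on either side along the gradient direction stands in for the linear interpolation between neighbours described in the text:

```python
import numpy as np

def _interp(mag, y, x):
    """Bilinear interpolation of the magnitude image at real coordinates (y, x)."""
    i0 = max(min(int(np.floor(y)), mag.shape[0] - 2), 0)
    j0 = max(min(int(np.floor(x)), mag.shape[1] - 2), 0)
    dy, dx = y - i0, x - j0
    return ((1 - dy) * (1 - dx) * mag[i0, j0] + (1 - dy) * dx * mag[i0, j0 + 1]
            + dy * (1 - dx) * mag[i0 + 1, j0] + dy * dx * mag[i0 + 1, j0 + 1])

def nonmaxima_suppression(gx, gy):
    """Keep a pixel only if its gradient magnitude is not smaller than the
    magnitudes interpolated one pixel away on either side along the gradient
    direction; everything else is set to zero."""
    mag = np.hypot(gx, gy)
    out = np.zeros_like(mag)
    rows, cols = mag.shape
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            m = mag[i, j]
            if m == 0.0:
                continue
            uy, ux = gy[i, j] / m, gx[i, j] / m   # unit vector along the gradient
            ahead = _interp(mag, i + uy, j + ux)
            behind = _interp(mag, i - uy, j - ux)
            if m >= ahead and m >= behind:        # local maximum along the gradient
                out[i, j] = m
    return out
```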


FIGURE 9. The image brightness in the vicinity of an edge. The thick lines correspond to locations of constant gradient magnitude (the thicker the line, the higher the value of the gradient magnitude). The thin lines correspond to locations of constant gradient direction. Ideally, we would like the edge to be marked along the thickest line.

If we plot, therefore, the number of pixels versus the gradient magnitude value, we expect to find two peaks, one representing the unwanted edgels and one the genuine ones. Unfortunately, this is not the case, as can be seen from Fig. 8c. The histogram of the edge magnitudes is monomodal; no clear differentiation can be made as to which pixels are edges and which are background on the basis of magnitude only. Even so, people often experiment by thresholding the gradient values, choosing a threshold more or less at random and adjusting it until the result looks acceptable. That is how Fig. 8d was produced. The arrow on the histogram in Fig. 8c shows the exact location of the threshold used. It is obvious that some correct edges have been missed simply because the contrast across them is rather low, while other edges with no physical significance were kept. Simple thresholding according to gradient magnitude entirely ignores the actual location of the edgels. We must, therefore, take into consideration the spatial arrangement of the edgels before we discard or accept them. This is called hysteresis thresholding. Canny incorporated hysteresis thresholding in his edge-detection algorithm, and, as experience has shown, it turned out


to be an even more significant factor in the quality of the output than the good filtering itself. It is often said that Sobel filtering followed by hysteresis thresholding is as good an edge detector as one can get. This is not exactly true, but it shows how important this stage of processing is in comparison to the filtering stage. Hysteresis thresholding consists of the following steps:

• Define two thresholds, one low and one high.

• Remove all edgels with gradient magnitude below the low threshold from the edge map.

• Identify junctions, and remove them from the edge map. A junction is any pixel which has more than two neighbouring edge pixels.

• Of the remaining edgels in the map, create strings of connected edgels.

• If at least one of the edgels of a string has magnitude above the high threshold, accept all the edgels in the string as genuine. If none of the edgels in the string has magnitude above the high threshold, remove the whole string from the edge map.

• You may or may not wish to put back the junction points removed from the edge map at the beginning. If the removed junction points are to be put back, we accept only those that are attached to retained strings of edgels.

Usually there are very few junction points in the filtered output, due to the way the edgels are picked. In fact, a serious criticism of this approach is that the filters respond badly to corners, and the process of nonmaxima suppression eliminates corners, or junctions in general. The identification of junctions in an image is another big topic of research (for a review, see Eryurtlu and Kittler, 1992). People have attempted it either as an extra stage in the image-processing chain (see, for example, Rangarajan et al., 1989; and Mehrotra and Nichani, 1990), or as part of the edge-detection process (see, for example, Harris and Stephens, 1988). We shall not go into the details here, as it is beyond the scope of this article.
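The steps above, minus the junction bookkeeping (which is omitted here for brevity), can be sketched as a region-growing pass from the strong edgels through the weak ones:

```python
import numpy as np
from collections import deque

def hysteresis_threshold(mag, low, high):
    """Keep strings of connected edgels (8-connectivity) that contain at
    least one edgel at or above the high threshold; edgels below the low
    threshold are discarded outright.  Junction handling is not included."""
    strong = mag >= high
    candidate = mag >= low
    keep = np.zeros_like(candidate)
    keep[strong] = True
    queue = deque(zip(*np.nonzero(strong)))
    while queue:                          # grow from strong edgels through candidates
        i, j = queue.popleft()
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if (0 <= ni < mag.shape[0] and 0 <= nj < mag.shape[1]
                        and candidate[ni, nj] and not keep[ni, nj]):
                    keep[ni, nj] = True
                    queue.append((ni, nj))
    return keep
```

A string of weak edgels is thus accepted or rejected as a whole, which is exactly the property that makes hysteresis so much better than a single threshold.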
We must note, however, that the described approach is not designed to respond correctly to junctions, and that in the output edge map most of the junctions will probably be missing. One issue that is of paramount importance is the choice of the two thresholds. Various researchers have carefully analysed the sources of the unwanted edgels and have come up with various formulae concerning the choice of the thresholds (e.g., Voorhees and Poggio, 1987; and Hancock and Kittler, 1991). Unfortunately, these formulae depend on the exact filters used for smoothing and differentiation, and they certainly require an estimate of the level of noise in the image. On the other hand, most of the


people who have used edge detectors have formed their own opinions as to what is a good set of thresholds. Although it is impossible to give a recipe that will work for all filters and all images, we can summarize the general consensus here, which is based on the collective experience of a large number of people and is backed by theoretical results derived by the abovementioned workers:

• The high threshold should be a number in the range between the mean and twice the mean of the gradient values calculated before any nonmaxima suppression takes place. The low threshold should be between one-half and two-thirds of that. The values used for the production of the output shown in Fig. 10a were the mean and two-thirds of that (i.e., 30 and 20, respectively).

• Another rule of thumb, not based on any theoretical work, is as follows: If one computes the statistics of the gradient magnitude after the nonmaxima suppression, a good set of thresholds is the mean and a tenth of the mean of the distribution of the gradient magnitudes.

FIGURE 10. (a) The result of applying the algorithm of Box 1, with filters of size 9 from Box 2 and hysteresis thresholding with maximum and minimum thresholds the mean and two-thirds of the mean gradient value, respectively, computed before nonmaxima suppression, to the image of Fig. 1a. (b) The result of applying the algorithm of Box 1, with filters of size 9 from Box 2 and hysteresis thresholding with maximum and minimum thresholds the mean and a tenth of the mean gradient value, respectively, computed after nonmaxima suppression, to the image of Fig. 1a.


Figure 10b shows the result when using this rule (the thresholds used were 33 and 3.3). There is not much difference between the two results, so the two rules seem to be reasonably equivalent. We may wish to compare these results with Fig. 2, which was produced using thresholds 65 and 40, i.e., twice the mean for the high threshold and about two-thirds of the high threshold for the low one. The single-threshold result of Fig. 8d was produced with an in-between value of the last two, namely 56. Which of these results is preferable is very much a matter of application. The two rules of thumb mentioned above allow the preservation of much detail, while the thresholds used in Figs. 2 and 8d were chosen by trial and error to produce a “clean” picture, personally judged as “good” for presentation purposes.

VI. CONCLUSIONS

The work we presented in the previous sections focused on a small but significant part of the research effort in edge detection, namely that of convolution filters which respond with an extremum when the first derivative of a signal function is an extremum. Very elaborate filters were developed and shown to perform quite well when applied to difficult images. These filters are optimal within the restrictions of the approach adopted and the criteria used. However, there were some disquieting results. Deriche’s filters were developed using inconsistent criteria, i.e., they were allowed to be of infinite extent while the criteria used to justify them were ill-defined for infinite boundaries. And yet, those filters have become reasonably popular, and most of their users will tell you that they perform well enough. One has to wonder, then, how much the optimality criteria matter and how much the restrictions we impose define the rules of the game. Spacek (1986) had the idea to ignore any optimality and simply define a filter as a cubic spline that fits the boundary conditions and nothing else. This filter is given by the following equation:

f(x) = A[(x/w)³ + 2(x/w)² + (x/w)]   for −w ≤ x ≤ 0,   (68)

where A is an amplitude parameter. Spacek calculated the value of the composite performance measure of this filter for step edges and found it less than the value of the performance measure of the optimal filter. Petrou and Kittler (1991) showed that the difference becomes more significant when ramp edges are assumed, and increases as the slope of the ramp edges decreases, i.e., as the ramps become shallower. However, these calculations are theoretical assessments of the filters, and we do not know how they translate to practical filter performance. To test correctly the performance


of an edge detector, we must have an image and its perfect edge map as drawn by hand, say, and compare the output of the edge detector against the desirable edge map. The fraction of the edge pixels that were not detected will form the underdetection error, and the fraction of spurious edge pixels will form the overdetection error. Then we could say that we have a measure of the true performance of the edge detector. It would be useful to know what the correspondence is between the value of a theoretical performance criterion and the underdetection and overdetection errors of the filter when some standard images are used. This, however, does not seem to be easy or even possible. The problem starts from the fact that it is very difficult to find or even develop such standard images. The reason is the example in hand, the one given in the introduction: A lot of knowledge creeps in when we produce the hand segmentation of an image, and any comparison against it is heavily biased in favour of the hand-produced edge map. Even so, one may argue that we are prepared to allow for this factor and that we do not even hope to establish filters with zero overdetection and zero underdetection error. What matters really is the relative performance of the various filters when they are applied to the same image and their outputs are compared with the same hand-produced edge map. However, notice that I used the words “edge detector” and not “edge filter” when I talked about comparisons with the hand-drawn edge map. This is because edge filters simply enhance the edges; they do not identify them. It is the nonlinear postprocessing that does the identification, and that relies on thresholds that could be chosen almost arbitrarily and that clearly should be different for different filters, as detailed analysis has shown (see, for example, Hancock and Kittler, 1991).
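The underdetection and overdetection errors just defined are straightforward to compute once a hand-drawn reference map is available. This sketch compares two binary edge maps pixel by pixel, ignoring any localization tolerance that a fairer comparison might allow:

```python
import numpy as np

def detection_errors(detected, reference):
    """Underdetection: fraction of reference edge pixels that were missed.
    Overdetection: fraction of detected edge pixels that are spurious."""
    detected = np.asarray(detected).astype(bool)
    reference = np.asarray(reference).astype(bool)
    n_ref = reference.sum()
    n_det = detected.sum()
    under = (reference & ~detected).sum() / n_ref if n_ref else 0.0
    over = (detected & ~reference).sum() / n_det if n_det else 0.0
    return under, over
```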
Further, the best edge detector should be one that requires the least adjustment from one image to the other, or for which the adjustment happens automatically. To assess the performance of an edge detector taking this into consideration, one certainly needs a set of images with known edge maps. And then the question arises as to what is a representative set of images! For the above reasons, it is very difficult to have absolutely objective criteria about the performance of edge detectors. This is also the reason why everybody who has published anything on edge detection was able to show that his or her edge detector performs better than other edge detectors! In view of the above discussion, it seems reasonable to compare filter outputs by applying the filters to the same image and, for each filter, playing with the parameters until a “good” result is achieved. Hardly a satisfactory process, but probably the fairest one under the circumstances. The cubic spline filter given by Eq. (68) was used to produce the result in Fig. 11. Visually, it is difficult to see much difference between this output and that of Fig. 2! Compare, however, Figs. 12a and 12b. Figure 12a was produced


FIGURE 11. The result of applying the algorithm of Box 1, with the spline filter of size 9 and hysteresis thresholding, to the image of Fig. 1a.

by the spline filter and 12b by the optimal filter of the same size. Both results were obtained using hysteresis thresholding with thresholds in the ratio 2:3, the high threshold chosen to be twice the mean of the gradient computed before the nonmaxima suppression. It is clear that the result of the optimal filter is superior, as the circle and the straight lines were better detected. Both filters did a rather bad job at the perforations, partly because of their proximity and partly because of the junctions involved.
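The cubic spline filter of Eq. (68), extended antisymmetrically to x > 0, can be sampled into a discrete differencing mask as follows. This is a sketch with a hypothetical function name; the amplitude A is arbitrary, and only the shape matters:

```python
import numpy as np

def spacek_spline_filter(half_width, A=1.0):
    """Sample f(x) = A[(x/w)^3 + 2(x/w)^2 + (x/w)] for -w <= x <= 0,
    extended as an odd function for x > 0, at the integers
    -half_width .. half_width."""
    w = float(half_width)

    def f(x):
        t = x / w
        return A * (t**3 + 2.0 * t**2 + t)

    xs = np.arange(-half_width, half_width + 1, dtype=float)
    return np.where(xs <= 0, f(xs), -f(-xs))   # odd extension for x > 0
```

Note that the spline satisfies f(−w) = 0 and f′(−w) = 0, i.e., it goes smoothly to zero at the truncation point, which is all it was designed to do.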


FIGURE 12. (a) The result of applying the algorithm of Box 1, with the spline filter of size 13 and hysteresis thresholding, to the image of Fig. 5a. (b) The result of applying the algorithm of Box 1, with the filter of size 13 from Box 2 and hysteresis thresholding, to the image of Fig. 5a.

From this example and from my experience with edge detectors, I would like to summarize the conclusions of this chapter in the form of the message to take home:

• There is no filter or edge detector which is appropriate for every image.

• The most important parameter of a filter is its size. The noisier the image, the larger the filter should be.

• The sharper the filter at the centre, i.e., the more it resembles the difference-of-boxes operator, the more accurately it will locate the edges and the more sensitive to noise it will be.

• The general shape of the filter should be something like the filters presented in Fig. 7. The filter should go smoothly to zero if it is of finite size, and its value should drop to insignificant values within a small distance from its centre if it is of infinite size, to avoid interference from other features.

• The postprocessing stage is of paramount importance. Contextual postprocessing like probabilistic relaxation (e.g., Hancock and Kittler, 1990), salient feature selection (e.g., Sha’ashua and Ullman, 1988), or at least hysteresis thresholding is recommended.

• For images with low levels of noise the Sobel or even simpler masks should be used. (The optimal filters we discussed are designed to cope with high levels of noise, and they will work badly, due to the overblurring of the true edges and the rounding of the corners, if applied to noise-free images like those created by software graphics packages.) The noisier the image, the more is to be gained by using an optimal filter.

• Know thy edge detector. Avoid default values for the thresholds of the postprocessing stage or the filter size; instead, check the role of each parameter, particularly for filters whose shape changes with the value of the parameter, and adjust them accordingly.

Given that the exact filter shape seems to make little difference to the final outcome for images of low to intermediate levels of noise, is one to conclude then that all the elaborate theory we developed was useless? I would say no. For a start, such a conclusion would be a hindsight view: we would never have known unless lots of people had toiled developing the theory and the filters in the first place. Besides, the optimal filters do make a difference for images with high levels of noise. In particular, the filters presented in Box 2 require only the specification of size to guarantee a good result, as opposed to the Gaussian-type filters, for which the user has to play with two parameters, namely size and standard deviation, to achieve an acceptable result. Finally, even for images of low to intermediate levels of noise, if one is to use a filter, one might as well use something that is the result of careful consideration, even though the difference it makes might be disproportionate to the effort put into developing it!

REFERENCES

Boie, R. A., Cox, I. J., and Rehak, P. (1986). Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, 100-108.
Bovic, A. C., and Munson, D. C., Jr. (1986). Computer Vision, Graphics and Image Processing 33, 377-389.
Boyer, K. L., and Sarkar, S. (1992). Applications of Art. Intelligence X: Machine Vision and Robotics SPIE-1708, 353-362.
Canny, J. (1986). IEEE Trans. Pattern Analysis and Machine Intelligence PAMI-8, 679-698.
Castan, S., Zhao, J., and Shen, J. (1990). Proc. 1st European Conf. on Computer Vision (O. Faugeras, ed.) ECCV-90, 13-17.
Deriche, R. (1987). International Journal of Computer Vision 1, 167-187.
Duda, R. O., and Hart, P. E. (1973). Pattern Classification and Scene Analysis. John Wiley, New York.
Eryurtlu, F., and Kittler, J. (1992). In: Signal Processing VI, Theories and Applications (J. Vandewalle, R. Boite, M. Moonen, A. Oosterlinck, eds.). Elsevier, 1, 591-594.
Geman, S., and Geman, D. (1984). IEEE Trans. Pattern Analysis and Machine Intelligence PAMI-6, 721-741.
Graham, J., and Taylor, C. J. (1988). Proc. of the 4th Alvey Vision Conf., Manchester, UK, 59-64.
Granlund, G. H. (1978). Computer Graphics and Image Processing 8, 155-178.
Hancock, E. R., and Kittler, J. (1990). IEEE Trans. Pattern Analysis and Machine Intelligence PAMI-12, 165-181.
Hancock, E. R., and Kittler, J. (1991). Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, 196-201.
Haralick, R. M. (1980). Computer Graphics Image Processing 12, 60-73.
Haralick, R. M. (1984). IEEE Trans. Pattern Analysis and Machine Intelligence PAMI-6, 58-68.
Harris, C., and Stephens, M. (1988). Proc. 4th Alvey Vision Conf., Manchester, UK, 189-192.
Huertas, A., and Medioni, G. (1986). IEEE Trans. Pattern Analysis and Machine Intelligence PAMI-8, 651-664.
Kundu, A., and Mitra, S. K. (1987). IEEE Trans. Pattern Analysis and Machine Intelligence PAMI-9, 569-575.
Kundu, A. (1990). Pattern Recognition 23, 423-440.
Lee, J. S. J., Haralick, R. M., and Shapiro, L. G. (1987). IEEE J. of Robotics and Automation RA-3, 142-156.
Marr, D., and Hildreth, E. (1980). Proc. R. Soc. Lond. B-207, 187-217.
Mehrotra, R., and Nichani, S. (1990). Pattern Recognition 23, 1223-1233.
Modestino, J. W., and Fries, R. W. (1977). Computer Graphics Image Processing 6, 409-433.
Morrone, M. C., and Owens, R. (1987). Pattern Recognition Letters 6, 303-313.
Nalwa, V. S., and Binford, T. O. (1986). IEEE Trans. Pattern Analysis and Machine Intelligence PAMI-8, 699-714.
Perona, P., and Malik, J. (1992). Applications of Art. Intelligence X: Machine Vision and Robotics SPIE-1708, 326-340.
Petrou, M. (1993). IEE Proceedings-I Communications, Speech and Vision 140, 331-339.
Petrou, M., and Kittler, J. (1989). Proc. 6th Scandinavian Conference on Image Analysis, SCIA '89, Oulu, Finland (M. Pietikainen and J. Roning, eds.), 816-819.
Petrou, M., and Kittler, J. (1991). IEEE Trans. Pattern Analysis and Machine Intelligence PAMI-13, 483-491.
Petrou, M., and Kittler, J. (1992). Applications of Art. Intelligence X: Machine Vision and Robotics SPIE-1708, 267-281.
Petrou, M., and Kolomvas, A. (1992). In: Signal Processing VI, Theories and Applications (J. Vandewalle, R. Boite, M. Moonen, A. Oosterlinck, eds.). Elsevier, 3, 1489-1492.
Pitas, I., and Venetsanopoulos, A. N. (1986). IEEE Trans. Pattern Analysis and Machine Intelligence PAMI-8, 538-550.
Rangarajan, K., Shah, M., and van Brackle, D. (1989). Computer Vision Graphics and Image Processing 48, 230-245.
Rice, S. O. (1945). Bell Syst. Tech. J. 24, 46-156.
Rosenfeld, A., and Thurston, M. (1971). IEEE Trans. Comput. C-20, 562-569.
Sarkar, S., and Boyer, K. L. (1991a). IEEE Trans. Pattern Analysis and Machine Intelligence PAMI-13, 1154-1171.
Sarkar, S., and Boyer, K. L. (1991b). CVGIP: Image Understanding 54, 224-243.
Sha'ashua, A., and Ullman, S. (1988). 2nd Intern. Conf. Comp. Vision ICCV-88, 321-326.
Shanmugam, K. S., Dickey, F. M., and Green, J. A. (1979). IEEE Trans. Pattern Analysis and Machine Intelligence PAMI-1, 37-49.
Shen, J., and Castan, S. (1986). Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, 109-114.
de Souza, P. (1983). Computer Vision, Graphics and Image Processing 23, 1-14.
Spacek, L. A. (1986). Image and Vision Comput. 4, 43-56.
Torre, V., and Poggio, T. A. (1986). IEEE Trans. Pattern Analysis and Machine Intelligence PAMI-8, 147-163.
van Vliet, L., Young, I. T., and Beckers, G. L. (1989). Computer Vision, Graphics and Image Processing 45, 167-195.
Voorhees, H., and Poggio, T. A. (1987). Proc. 1st Intern. Conf. Computer Vision, London, 250-258.

Ziou, D. (1991). Pattern Recognition 24, 465-478.