The Regularization of Early Vision


5 The Regularization of Early Vision

5.1 Introduction
5.2 Basic concepts
    5.2.1 Preliminaries
    5.2.2 Early vision as inverse optics
5.3 Ill-posed problems and regularization theory
    5.3.1 Definitions
    5.3.2 Generalized inverse
    5.3.3 Condition number
    5.3.4 Regularized solutions
5.4 Early vision revisited
    5.4.1 Edge detection
    5.4.2 Computing optical flow
    5.4.3 Shape from shading
5.5 Discussion
    5.5.1 A unified approach to early vision
    5.5.2 Is early vision really ill posed?
5.6 Conclusion

5.1 INTRODUCTION

A basic tenet of many computational studies of vision is the distinction between early vision and higher-level processing. The term early vision denotes those problems (like edge detection, motion estimation, depth recovery, and shape from X) that appear to be addressed at the first few stages of the processing of visual information. In both natural and machine vision systems the understanding of early vision is considered a necessary step for the accomplishment of higher-level visual tasks like manipulation, object recognition and scene description. Unlike higher-level vision, early vision seems to be constituted by a number of roughly independent visual modules. Each module, at least to a first approximation, can be regarded as a bottom-up process characterized by a strong geometric flavor.

In this chapter a framework for the analysis and solution of early vision problems is reviewed. The framework, originally proposed by Bertero et al. (1988), is based on the theory of regularization of ill-posed problems (Tikhonov and Arsenin, 1977). An ill-posed problem (Hadamard, 1902; Courant and Hilbert, 1962) is a problem for which at least one of the conditions of uniqueness, existence or continuous dependence of the solution on the data is not ensured. The theory of regularization provides mathematical techniques able to restore these conditions for the solution to an ill-posed problem. The relevance of regularization theory to early vision, as pointed out by Poggio et al. (1985), is due to the fact that many early vision problems appear to be ill-posed.

The rest of this chapter is organized as follows. Section 5.2 describes the basic geometry and photometry of a visual system and formulates a few early vision problems as inverse problems. In Section 5.3 the mathematics of ill-posed problems and regularization theory is reviewed. In Section 5.4, the regularization of the early vision problems of Section 5.2 is outlined. The merits and limitations of this approach to computer vision are discussed in Section 5.5. Finally, Section 5.6 summarizes the presented results.

A. Verri. ARTIFICIAL VISION, ISBN 0-12-444816-X. Copyright © 1997 Academic Press Ltd. All rights of reproduction in any form reserved.

5.2 BASIC CONCEPTS

In this preliminary section the inverse (and ill-posed) nature of early vision problems is illustrated. To this purpose some basic notation of geometry and photometry is first established.

5.2.1 Preliminaries

In what follows a simple camera model is described and the fundamental equation of image formation is recalled. For a more detailed discussion of the geometry of vision the interested reader should refer to the appendix of the book by Mundy and Zisserman (1992), while for a comprehensive description of the physics of image formation the classic book by Born and Wolf (1959) is recommended.

5.2.1.1 Geometry

In the pinhole model of Figure 5.1 the viewing camera consists of a plane $\pi$, the image plane, and a point $O$, the focus of projection. Let $f > 0$ be the focal distance (i.e., the distance between $\pi$ and $O$). In the 3D Cartesian system of coordinates $(X_1, X_2, X_3)$ of Figure 5.1, the point $x = (x_1, x_2, x_3) \in \pi$, image of the point $X = (X_1, X_2, X_3)$ in 3D space, is obtained through the perspective projection of $X$ on the image plane $\pi$ with respect to the focus $O$. As shown in Figure 5.1, $x$ is the point in which the straight line through $X$ and $O$ intersects the image plane $\pi$. If the image plane $\pi$ is the set of points $x = (x_1, x_2, x_3)$ with $x_3 = f$, the fundamental equation of perspective projection can be written

$$x_i = f\,\frac{X_i}{X_3}, \qquad i = 1, 2, 3. \tag{5.1}$$

Figure 5.1 Pinhole camera model.

If the point $X$ in 3D space is moving with velocity $V = (V_1, V_2, V_3)$, then $v = (v_1, v_2, v_3)$, the velocity of the corresponding image point $x$, can be obtained by differentiating eqn (5.1) with respect to time, that is,

$$v_i = f\,\frac{X_3 V_i - X_i V_3}{X_3^2}, \qquad i = 1, 2, 3. \tag{5.2}$$

Since $v_3 = 0$ for all the points of $\pi$, the vector $v$, also called the motion field, can be regarded as a 2D vector field $v = (v_1, v_2)$. Let us now recall some basic definitions of photometry.
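As an illustration of eqns (5.1) and (5.2), the following minimal Python sketch projects a 3D point on the image plane and evaluates its image velocity. The result is compared with a finite-difference estimate obtained by moving the point for a short time and projecting it again. The function names and the numerical values of f, X and V are hypothetical and chosen only for illustration.

```python
def project(X, f):
    """Perspective projection of eqn (5.1): x_i = f * X_i / X_3."""
    X3 = X[2]
    if X3 == 0:
        raise ValueError("X3 = 0: the point cannot be projected")
    # The third component f * X3 / X3 is always equal to f.
    return (f * X[0] / X3, f * X[1] / X3, f)


def motion_field(X, V, f):
    """Image velocity of eqn (5.2): v_i = f * (X_3 V_i - X_i V_3) / X_3**2."""
    X3, V3 = X[2], V[2]
    return tuple(f * (X3 * Vi - Xi * V3) / X3 ** 2 for Xi, Vi in zip(X, V))


# Hypothetical values, chosen only for illustration.
f = 2.0
X = (1.0, 2.0, 4.0)
V = (0.5, -1.0, 2.0)

x = project(X, f)
v = motion_field(X, V, f)

# Finite-difference check: move X along V for a short time dt and re-project.
dt = 1e-6
X_later = tuple(Xi + dt * Vi for Xi, Vi in zip(X, V))
v_numeric = tuple((b - a) / dt for a, b in zip(x, project(X_later, f)))
```

In this example the third component of both velocities vanishes, in agreement with the fact that the image point never leaves the plane $x_3 = f$.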

5.2.1.2 Image formation

The image irradiance $E = E(x)$ is the power per unit area of light at each point $x$ of the image plane. The scene radiance $L = L(X, \theta)$ is the power per unit area of light that can be thought of as emitted by each point $X$ of a surface in 3D space in the particular direction $\theta$. This surface can be fictitious, or it may be the actual radiating surface of a light source, or the illuminated surface of a solid. In the pinhole camera model, the camera has an infinitesimally small aperture and, hence, the image irradiance at $x$ is proportional to the scene radiance at the point $X$ in the direction $\theta_0$ (the direction of the line through $X$ and $O$). In practice, however, the aperture of any real optical device is finite and not very small (ultimately to avoid diffraction effects). Assuming that (i) the surface is Lambertian, i.e. $L = L(X)$, (ii) there are no losses within the system and (iii) the angular aperture (on the image side) is small, it can be proved (Born and Wolf, 1959) that

$$E(x) = L(X)\,\omega \cos^4 \varphi$$

where $\omega$ is the solid angle corresponding to the angular aperture and $\varphi$ the angle between the principal ray (that is, the ray passing through the center of the aperture) and the optical axis. With the further assumption that the aperture is much smaller than the distance of the viewed surface, the Lambertian hypothesis can be relaxed to give (Horn and Sjoberg, 1979)

$$E(x) = L(X, \theta_0)\,\omega \cos^4 \varphi \tag{5.3}$$

with $\theta_0$ the direction of the principal ray. In what follows it is assumed that the optical system has been calibrated so that eqn (5.3) can be rewritten as

$$E(x) = L(X, \theta_0). \tag{5.4}$$

Equation (5.4), known as the image irradiance equation, says that the image irradiance at x equals the scene radiance at X in the direction identified by the line through X and O. We are now in a position to formulate some of the classical problems of early vision as inverse problems.

5.2.2 Early vision as inverse optics

As already mentioned, the aim of early vision is the reconstruction of the physical properties of surfaces in 3D space from 2D images. This reconstruction amounts to finding the solution to different problems of inverse optics. Let us illustrate this point further by means of three specific examples.

5.2.2.1 Edge detection

The first step of many visual algorithms can be thought of as the problem of inferring the discontinuities of surface orientation, depth, texture and other physical properties of the viewed surfaces in 3D space from the sharp changes, or edges, of the image brightness pattern (Shanmugam et al., 1965; Canny, 1983; Torre and Poggio, 1986). In essence, edge detection requires the preliminary computation of spatial derivatives of the image brightness. As it can easily be seen in the 1D case, the inverse (and ill-posed) nature of edge detection resides in the differentiation stage. Let $g = g(x)$ be a function defined on the interval $[0, 1]$. The derivative $g' = u$ of $g$ can be computed as the solution to the equation

$$g(x) = \int_0^x u(t)\,dt. \tag{5.5}$$

Clearly, the problem of determining $u(x)$ from eqn (5.5) is an inverse problem. The problem is also ill-posed since the derivative $u$ does not depend continuously on $g$. This can be best seen by means of a simple example. If $\tilde{g} = g + \epsilon \sin \beta x$, then for all $x \in [0, 1]$ and independently of $\beta$

$$|\tilde{g} - g| \le \epsilon. \tag{5.6}$$

However, if $u = g'$ and $\tilde{u} = \tilde{g}'$, then $\tilde{u} = u + \epsilon\beta \cos \beta x$ and for $\beta \ge N/\epsilon$ and some $x$

$$|\tilde{u} - u| \ge N. \tag{5.7}$$
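The example of eqns (5.6) and (5.7) can be checked numerically. The minimal Python sketch below evaluates the size of the perturbation of the data and of the corresponding perturbation of the derivative on a grid of the interval [0, 1]. The values of eps and N and the grid spacing are hypothetical.

```python
import math


def sup_norm(values):
    """Largest absolute value over the sampled points."""
    return max(abs(v) for v in values)


eps = 1e-3      # size of the perturbation of the data (hypothetical)
N = 100.0       # prescribed change of the derivative (hypothetical)
beta = N / eps  # frequency chosen so that eps * beta = N

xs = [i / 1000 for i in range(1001)]  # grid on [0, 1]

# Perturbation of the data: g_tilde - g = eps * sin(beta * x).
data_gap = sup_norm(eps * math.sin(beta * x) for x in xs)

# Perturbation of the derivative: u_tilde - u = eps * beta * cos(beta * x).
derivative_gap = sup_norm(eps * beta * math.cos(beta * x) for x in xs)

print(data_gap, derivative_gap)
```

The data differ by at most eps, whereas the derivatives differ by eps * beta = N at x = 0. Since N is arbitrary, no bound on the perturbation of the data controls the perturbation of the derivative.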

From eqns (5.6) and (5.7) it follows that the derivatives of two arbitrarily close functions can be arbitrarily different. Let us now discuss the problem of the computation of optical flow.

5.2.2.2 Computing optical flow

The estimation of the motion field $v$ of eqn (5.2) from a sequence of images (Horn and Schunck, 1981; Hildreth, 1984; Adelson and Bergen, 1985; Heeger, 1987) is an essential step for the solution to problems like 3D motion and structure reconstruction (Longuet-Higgins and Prazdny, 1981; Rieger and Lawton, 1985; Heeger and Jepson, 1992). It has been proposed to compute $v$ from the apparent motion of the image brightness pattern $E = E(x, t)$ over time. For the sake of simplicity let us assume that this apparent motion, usually called optical flow, coincides with the motion field $v = (v_1, v_2)$ (for a critical discussion of the conditions under which optical flow and motion field are the same, see Verri and Poggio (1989)). Probably the most common hypothesis on the changing image brightness is that the total derivative of $E$ vanishes identically (Horn and Schunck, 1981), or

$$\frac{dE(x, t)}{dt} = 0. \tag{5.8}$$

The explicit evaluation of the left-hand side of eqn (5.8) reads

$$E_{x_1} v_1 + E_{x_2} v_2 = -E_t, \tag{5.9}$$

where $E_{x_1}$, $E_{x_2}$ and $E_t$ are the partial derivatives of $E = E(x, t)$ with respect to $x_1$, $x_2$ and $t$ respectively. From eqn (5.9) the inverse nature of the problem of computing optical flow from the spatial and temporal derivatives of the image brightness is apparent. Clearly, the problem is also ill-posed since the solution to eqn (5.9) (one equation for two unknowns, $v_1$ and $v_2$) is not unique.

5.2.2.3 Shape from shading

The visual information that can be extracted from the shading of an object can be usefully employed for the recovery of the shape of the viewed surface (Horn, 1975). Following Horn (1990) let us derive the fundamental equation of shape from shading. First, it is assumed that the depth range is small compared with the distance of the scene from the viewer. Consequently, the projection is approximately orthographic and by means of a suitable rescaling of the image coordinates eqn (5.1) can be rewritten

$$x_i = X_i, \qquad i = 1, 2. \tag{5.10}$$

Now, if (i) the reflective properties of the viewed surface are uniform, (ii) there is a single, far away light source (that is, the incident direction is constant) and (iii) the viewer is also far away (that is, the viewer direction is approximately the same for all the points of the scene), then the scene radiance depends on the orientation and not on the position of a surface patch in the scene. Therefore, by means of eqn (5.10) the scene radiance $L$ at the point $X$ in the direction $\theta_0$ (the same direction for every $X$ from (iii)) can be written in terms of the spatial derivatives of $f(x)$, the 'height' function which describes the viewed surface in terms of the image coordinates, or

$$L(X, \theta_0) = R(p(x), q(x)) \tag{5.11}$$

with $p = f_{x_1}$ and $q = f_{x_2}$ the partial derivatives of $f = f(x)$ with respect to $x_1$ and $x_2$ respectively, and components of $\mathrm{grad}\, f$, the gradient of $f$. The function $R(p, q)$, which is the scene radiance as a function of the components of $\mathrm{grad}\, f$, is called the reflectance map (Horn and Sjoberg, 1979). Through eqn (5.11), the image irradiance equation (5.4) reads

$$E(x) = R(p(x), q(x)). \tag{5.12}$$

The recovery of shape from eqn (5.12), the fundamental equation of shape from shading, is clearly an inverse (nonlinear) problem. The problem is also ill-posed since eqn (5.12) is only one equation for two unknowns, p and q. Let us now briefly review the fundamental concepts of regularization theory.

5.3 ILL-POSED PROBLEMS AND REGULARIZATION THEORY

In this section the distinction between well-posed and ill-posed problems is made precise. Then, the main techniques for determining the solution to (linear) ill-posed problems are briefly reviewed. For an exhaustive treatment of this subject see Tikhonov and Arsenin (1977) and Bertero (1989).

5.3.1 Definitions

For the sake of simplicity the presented analysis is restricted to the particular case of linear, inverse problems in functional spaces. With the only exception of shape from shading this assumption is sufficient to our purpose. Let $X$ and $Y$ be Hilbert spaces and $L$ a continuous, linear operator from $X$ to $Y$. The problem of determining $u \in X$ such that

$$g = Lu \tag{5.13}$$

for some $g \in Y$ is an inverse problem (the solution to eqn (5.13) requires the inversion of the operator $L$). We have the following definition: The problem of solving eqn (5.13) is well posed (Courant and Hilbert, 1962) if the following conditions are satisfied:

1) for each $g \in Y$ the solution $u$ is unique (uniqueness);
2) for each $g \in Y$, there exists a solution $u \in X$ (existence);
3) the solution $u$ depends continuously on $g$ (continuity).

Otherwise, the problem is said to be ill posed. Now, let

$$N(L) = \{ f \in X, \text{ with } Lf = 0 \}$$

be the null space of $L$, or the set of the invisible objects, and

$$R(L) = \{ g \in Y, \text{ with } g = Lf \text{ and } f \in X \}$$

the range of $L$, or the set into which the operator $L$ maps $X$. From the definition of null space of $L$, it follows that if $N(L) = \{0\}$, then $L$ is injective and the solution is always unique. Similarly, from the definition of range of $L$, if $Y = R(L)$, then $L$ is onto and the solution to eqn (5.13) exists for all the $g \in Y$. Since $L$ is linear, condition 3 (continuous dependence on the data) is implied by conditions 1 and 2. Therefore, if $L$ is injective and onto, the problem of solving eqn (5.13) is always well posed. Let us now briefly survey the main techniques that can be employed for the solution to eqn (5.13) when $R(L) \neq Y$ and $N(L) \neq \{0\}$.

5.3.2 Generalized inverse

If $R(L) \neq Y$ and the datum $g$ does not belong to $R(L)$ the existence condition is clearly violated and the solution to eqn (5.13) does not exist. A least-squares solution or pseudosolution can then be defined as the function $u \in X$ such that

$$\|Lu - g\| = \inf \{ \|Lf - g\|,\ f \in X \}.$$

It can easily be seen that a pseudosolution exists for every $g \in Y$ if and only if $R(L)$ is closed. Furthermore, the pseudosolution is unique if and only if $N(L) = \{0\}$. If $N(L) \neq \{0\}$ but $R(L)$ is closed, a pseudosolution $u^+$ of minimum norm, called a normal pseudosolution or generalized solution, can be found. It can be shown that the generalized solution $u^+$ is unique and orthogonal to $N(L)$. Since the mapping $g \mapsto u^+$ is continuous, it follows that if $R(L)$ is closed the problem of determining $u^+$ from $g$ is always well-posed. The operator $L^+$ defined by the equation

$$L^+ g = u^+ \tag{5.14}$$

is called the generalized inverse of $L$.
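The notions of pseudosolution and generalized solution can be illustrated in finite dimension. The following Python sketch is not part of the original treatment: it assumes a rank-one operator L = a bᵀ on the plane, for which the generalized inverse has the closed form L⁺ = b aᵀ / (‖a‖² ‖b‖²), and the vectors a, b and g are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def apply_L(a, b, u):
    """Rank-one operator L = a b^T, that is L u = a (b . u)."""
    s = dot(b, u)
    return [ai * s for ai in a]


def apply_L_plus(a, b, g):
    """Generalized inverse of L = a b^T: L+ g = b (a . g) / (|a|^2 |b|^2)."""
    s = dot(a, g) / (dot(a, a) * dot(b, b))
    return [bi * s for bi in b]


def residual(a, b, u, g):
    """Norm of L u - g."""
    return norm([li - gi for li, gi in zip(apply_L(a, b, u), g)])


# Hypothetical operator and datum.
a = [1.0, 1.0]
b = [1.0, 1.0]
g = [1.0, 3.0]              # g does not belong to R(L) = span{a}
null_vector = [1.0, -1.0]   # spans N(L), since b . null_vector = 0

u_plus = apply_L_plus(a, b, g)
# Adding an element of N(L) gives another pseudosolution with a larger norm.
u_other = [ui + 2.0 * ni for ui, ni in zip(u_plus, null_vector)]
```

Both u_plus and u_other attain the same residual, so both are pseudosolutions, but only u_plus is orthogonal to N(L) and has minimum norm.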

5.3.3 Condition number

It is important to notice that condition 3 (continuous dependence on the data) is weaker than stability or robustness of the solution against noise. The solution to a well-posed problem can still be severely ill conditioned. Let us discuss this point further under the assumption that $R(L)$ is closed. From eqn (5.14) and the linearity of $L$ we have

$$\|\Delta u^+\| \le \|L^+\|\,\|\Delta g\|. \tag{5.15}$$

Similarly, from eqn (5.13) with $u = u^+$ it follows that

$$\|g\| \le \|L\|\,\|u^+\|. \tag{5.16}$$

By combining eqns (5.15) and (5.16) it is easy to obtain

$$\frac{\|\Delta u^+\|}{\|u^+\|} \le \alpha\,\frac{\|\Delta g\|}{\|g\|}$$

with $\alpha = \|L\|\,\|L^+\| \ge 1$. The condition number $\alpha$ controls the stability of the generalized inverse. If $\alpha$ is not much larger than 1, the problem of solving eqn (5.13) through the computation of the generalized inverse is said to be well conditioned. Intuitively, small perturbations of the data of a well-conditioned problem cannot produce large changes in the solution. Instead, if $\alpha \gg 1$, small perturbations may cause large changes and the problem is ill conditioned. Let us now turn to the more difficult (and interesting) case in which $R(L)$ is not closed.
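The role of the condition number can be made concrete with a small numerical experiment. The Python sketch below assumes a hypothetical diagonal operator on the plane, for which the norms of L and of its inverse are the largest singular value and the reciprocal of the smallest one. The datum and its perturbation are chosen along the directions that make the bound on the relative error an equality.

```python
import math


def norm(v):
    return math.sqrt(sum(x * x for x in v))


# Hypothetical operator L = diag(1, 0.001).
singular_values = [1.0, 1e-3]
alpha = max(singular_values) / min(singular_values)  # ||L|| * ||L+||


def solve(g):
    """Solution of g = L u for the diagonal operator above."""
    return [gi / si for gi, si in zip(g, singular_values)]


g = [1.0, 0.0]    # datum along the largest singular value
dg = [0.0, 1e-3]  # perturbation along the smallest singular value

u = solve(g)
du = solve(dg)    # by linearity, the change in the solution

relative_data_error = norm(dg) / norm(g)
relative_solution_error = norm(du) / norm(u)
amplification = relative_solution_error / relative_data_error
```

Here a relative perturbation of the data of one part in a thousand changes the solution by an amount as large as the solution itself, and the amplification factor coincides with the condition number.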

5.3.4 Regularized solutions

Regularization theory (Tikhonov and Arsenin, 1977) was developed in the attempt to provide a solution to ill-posed problems in the general case. The devised regularization methods have two advantages over the theory of generalized inverse. First, the methods can be applied independently of the closedness of the range of $L$. Second, the numerical solution to a regularized problem is well conditioned. Consequently, regularized solutions are also stable against quantization and noise. The key idea of regularization is that in order to avoid wild oscillations in the solution to an ill-posed problem, the approximate solution to eqn (5.13) has to satisfy some smoothness (i.e., regularizing) constraint. Following the original formulation by Tikhonov let $C$ be a constraint operator defined by

$$\|Cu\|^2 = \sum_{r=0}^{R} \int c_r(x)\,|u^{(r)}(x)|^2\,dx$$

where the weights $c_r$ are strictly positive functions, $u^{(r)}$ denotes the $r$th-order derivative of $u$ and $R$ is the highest order of differentiation considered. A typical regularization method consists of minimizing the functional

$$\Phi_\lambda[u] = \|Lu - g\|^2 + \lambda \|Cu\|^2 \tag{5.17}$$

with $\lambda > 0$. Depending on the available a priori information on the solution, the parameter $\lambda$ can be determined in three different ways.

1) If the approximate solution $u$ is known to satisfy the constraint $\|Cu\| \le E$, then the problem is to find the function $u$ that minimizes the functional $\|Lu - g\|$ with $\|Cu\| \le E$. By means of the method of Lagrange multipliers the solution to this problem is equivalent to determining the minimum $u_\lambda$ of $\Phi_\lambda$ with arbitrary $\lambda$, and to the search of the unique $\lambda$ such that $\|Cu_\lambda\| = E$.

2) If the approximate solution $u$ is known to satisfy the constraint $\|Lu - g\| \le \epsilon$, then the problem is to find the function $u$ that minimizes the functional $\|Cu\|$ with $\|Lu - g\| \le \epsilon$. By means of the method of Lagrange multipliers the solution to this problem is equivalent to determining the minimum $u_\lambda$ of $\Phi_\lambda$ with arbitrary $\lambda$, and to the search of the unique $\lambda$ such that $\|Lu_\lambda - g\| = \epsilon$.

3) If the approximate solution $u$ is known to satisfy both the constraints $\|Cu\| \le E$ and $\|Lu - g\| \le \epsilon$, then the problem reduces to determining the minimum of $\Phi_\lambda$ with $\lambda = (\epsilon/E)^2$.

The first method looks for the function that best approximates the data in the set of sufficiently regular functions, the second method for the most regular function in the set of functions sufficiently close to the data. In the third method, a compromise between the degree of regularity and the closeness of the solution to the data is established.

Let us conclude this very brief overview of regularization theory with three observations. First, if $L$ is a convolution operator (as in the case of eqn (5.5)) the regularized solution can be simply obtained as a 'filtered' version of the non-regularized solution (Tikhonov and Arsenin, 1977). Second, let us mention the existence of methods (like cross validation (Wahba, 1977) and generalized cross validation (Craven and Wahba, 1979)) that can be usefully employed for the determination of the parameter $\lambda$ in the absence of reliable a priori information on the degree of regularity and closeness of the solution to the data. Third, if the operator $L$ is nonlinear, regularized solutions can still be obtained by minimizing the functional $\Phi_\lambda$ of eqn (5.17) (with $L$ a nonlinear operator). Under rather broad conditions on $L$ and $C$, the existence and continuous dependence of the solution on the data can easily be proved (Tikhonov and Arsenin, 1977). In general, the uniqueness of the regularized solution is an open problem.
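The interpretation of the regularized solution as a 'filtered' version of the non-regularized one can be illustrated in the simplest possible setting. The Python sketch below makes two assumptions that are not in the text: the operator L is diagonal with singular values s_i, and the constraint operator C is the identity. In this case the minimum of the functional of eqn (5.17) is u_i = s_i g_i / (s_i² + λ), that is, the naive solution g_i / s_i multiplied by the filter factor s_i² / (s_i² + λ). All numerical values are hypothetical.

```python
import math


def naive_solution(s, g):
    """Non-regularized solution u_i = g_i / s_i."""
    return [gi / si for si, gi in zip(s, g)]


def regularized_solution(s, g, lam):
    """Minimum of sum (s_i u_i - g_i)^2 + lam * sum u_i^2."""
    return [si * gi / (si * si + lam) for si, gi in zip(s, g)]


def functional(s, g, lam, u):
    """Value of the functional of eqn (5.17) with C equal to the identity."""
    misfit = sum((si * ui - gi) ** 2 for si, ui, gi in zip(s, u, g))
    penalty = sum(ui * ui for ui in u)
    return misfit + lam * penalty


def distance(u, w):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, w)))


# Hypothetical ill-conditioned problem with noisy data.
s = [1.0, 0.1, 0.001]
u_true = [1.0, 1.0, 1.0]
noise = [0.01, -0.01, 0.01]
g = [si * ui + ni for si, ui, ni in zip(s, u_true, noise)]
lam = 1e-4

u_naive = naive_solution(s, g)
u_reg = regularized_solution(s, g, lam)
filter_factors = [si * si / (si * si + lam) for si in s]

error_naive = distance(u_naive, u_true)
error_reg = distance(u_reg, u_true)
```

The filter factors are close to 1 for the components with large singular values and close to 0 for the component with the smallest one, which is exactly the component where the noise is amplified most by the naive inversion. The value of lam is chosen by hand here; in practice it would be determined by one of the three methods above or by cross validation.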

5.4 EARLY VISION REVISITED

In this section, the regularization of the problems of early vision formulated in Section 5.2 is outlined.

5.4.1 Edge detection

The typical procedure for the regularization of the problem of edge detection consists of two steps. In the first step the data are approximated by means of an analytic function, while in the second step the analytical derivative of the approximating function is computed. Let us first consider the 1D case. If $g_j$ is the datum at the location $x_j$, $j = 1, \ldots, N$, the approximating function is the function $f$ that minimizes the functional

$$\Phi_\lambda[f] = \sum_{j=1}^{N} (g_j - f(x_j))^2 + \lambda \int (f''(x))^2\,dx \tag{5.18}$$

with $\lambda > 0$. Clearly, the regularizing functional

$$\|Cf\|^2 = \int (f''(x))^2\,dx$$

ensures the smoothness of the approximating function. Under rather general assumptions (Poggio et al., 1984), it can be shown that the function $f$ that minimizes the functional $\Phi_\lambda$ of eqn (5.18) can be obtained by convolving the data with an appropriate filter $R$. Therefore, the derivative of $f$ can be computed by convolving the data with the derivative of the filter. In the 2D case, if the regularizing functional is

$$\|Cf\|^2 = \iint (\nabla^2\,\mathrm{grad}\, f)^2\,dx_1\,dx_2$$

where $\nabla^2$ is the Laplacian, it can be shown (Poggio et al., 1984) that the solution can be obtained by convolving the data with the filter

$$R_2(x_1, x_2) = \frac{1}{2\pi} \int_0^\infty J_0(\omega z)\,\frac{\omega}{\lambda\omega^6 + 1}\,d\omega$$

where $J_0$ is the 0th-order Bessel function and $z = \sqrt{x_1^2 + x_2^2}$. In practice, the convolution of the data with a low-pass filter is sufficient to regularize the problem of differentiation. Consequently, most of the proposed methods for edge detection, methods developed before the connection between early vision and regularization theory was pointed out, can be regarded as regularization methods. Let us now deal with the problem of the computation of optical flow.
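The remark that low-pass filtering is sufficient to regularize differentiation can be illustrated in 1D. The Python sketch below convolves the data with the derivative of a Gaussian filter, which is equivalent to smoothing first and differentiating afterwards. The Gaussian is an assumption made for simplicity and is not the filter R obtained from eqn (5.18); the data, the filter width and the function names are hypothetical.

```python
import math


def gaussian_derivative_kernel(sigma, radius):
    """Samples of the derivative of a normalized Gaussian low-pass filter."""
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    total = sum(weights)
    return [-(k / sigma ** 2) * w / total for k, w in zip(offsets, weights)]


def convolve(data, kernel):
    """Discrete convolution; samples outside the data replicate the border."""
    radius = len(kernel) // 2
    last = len(data) - 1
    out = []
    for i in range(len(data)):
        acc = 0.0
        for k in range(-radius, radius + 1):
            j = min(max(i - k, 0), last)
            acc += kernel[k + radius] * data[j]
        out.append(acc)
    return out


def strongest_edge(derivative):
    """Index of the largest absolute value of the derivative."""
    magnitudes = [abs(d) for d in derivative]
    return magnitudes.index(max(magnitudes))


# Hypothetical data: a unit step between samples 49 and 50 plus a
# rapidly oscillating perturbation of amplitude 0.5.
n = 100
data = [(1.0 if i >= 50 else 0.0) + 0.5 * math.sin(2.0 * i) for i in range(n)]

# Raw finite differences, kept only for comparison.
raw_differences = [data[i + 1] - data[i] for i in range(n - 1)]

smooth_derivative = convolve(data, gaussian_derivative_kernel(sigma=3.0, radius=12))
edge = strongest_edge(smooth_derivative)
```

In the raw differences the contribution of the perturbation can be as large as about 0.84, which is comparable with the unit step. After the convolution the perturbation is strongly attenuated and the largest response is found at the step. The width sigma plays the role of the regularization parameter: a larger value gives a smoother derivative at the price of a less accurate localization.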

5.4.2 Computing optical flow

In Section 5.2 it was shown that the problem of determining the optical flow from eqn (5.9) is ill-posed. The well-posedness can be restored by means of the notion of generalized solution. Let us rewrite eqn (5.9) as

$$\mathrm{grad}\, E \cdot v = -E_t. \tag{5.19}$$

The pseudosolutions to eqn (5.19) are the vector fields which minimize the functional $\|\mathrm{grad}\, E \cdot v + E_t\|$. If $\mathrm{grad}\, E \neq 0$, a generic vector field $v$ can be uniquely decomposed into the pair $(v_\perp, v_\parallel)$, with $v_\perp$ the component of $v$ in the direction orthogonal to the isobrightness contour (that is, parallel to $\mathrm{grad}\, E$), and $v_\parallel$ the component of $v$ in the direction parallel to the isobrightness contour. From the decomposition $v = (v_\perp, v_\parallel)$, it follows that all the pseudosolutions to eqn (5.19) are the vector fields $v$ with

$$v_\perp = -\frac{E_t}{\|\mathrm{grad}\, E\|}.$$
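At a single image point the pseudosolution with no component along the isobrightness contour can be written explicitly as the vector $-E_t\,\mathrm{grad}\, E / \|\mathrm{grad}\, E\|^2$. The Python sketch below computes it and verifies that adding any component parallel to the isobrightness contour leaves eqn (5.19) satisfied while increasing the norm. The function name and the values of the derivatives are hypothetical.

```python
import math


def normal_flow(Ex, Ey, Et):
    """Minimum-norm solution of Ex * v1 + Ey * v2 = -Et at one image point."""
    squared_gradient = Ex * Ex + Ey * Ey
    if squared_gradient == 0.0:
        return None  # grad E = 0: eqn (5.19) gives no information here
    scale = -Et / squared_gradient
    return (scale * Ex, scale * Ey)


# Hypothetical brightness derivatives at one image point.
Ex, Ey, Et = 3.0, 4.0, -10.0

v_plus = normal_flow(Ex, Ey, Et)

# Unit vector parallel to the isobrightness contour (orthogonal to grad E).
gradient_norm = math.hypot(Ex, Ey)
tangent = (-Ey / gradient_norm, Ex / gradient_norm)

# Another solution of eqn (5.19): same normal component, nonzero parallel one.
v_other = (v_plus[0] + 0.7 * tangent[0], v_plus[1] + 0.7 * tangent[1])
```

Both v_plus and v_other satisfy the constraint, which is the ambiguity of one equation for two unknowns; v_plus is singled out only by the requirement of minimum norm.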

The normal pseudosolution is the unique pseudosolution $v^+$ with $v_\parallel = 0$. Interestingly, the vector field $v^+$ can be successfully employed for motion understanding (Aloimonos and Duric, 1994). In the search for a unique optical flow more similar to the perceived image motion, Horn and Schunck (1981) proposed to look for the vector field $v = (v_1, v_2)$ which minimizes the functional

$$\Omega_\lambda[v] = \iint \left[ \left(\frac{dE}{dt}\right)^2 + \lambda\left(|\mathrm{grad}\, v_1|^2 + |\mathrm{grad}\, v_2|^2\right) \right] dx_1\,dx_2 \tag{5.20}$$

with $\lambda > 0$ and where the integral extends over the whole image plane. The second term in the right-hand side of eqn (5.20) is a smoothness constraint and, hence, the problem of minimizing $\Omega_\lambda[v]$ is well posed. Intuitively, the unique solution $v$ is the smoothest vector field that nearly satisfies eqn (5.8).
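A discrete minimization of a functional of the form of eqn (5.20) can be sketched as follows. The Python code below is written in the spirit of the iterative scheme of Horn and Schunck and is not a reproduction of their implementation. It assumes that the brightness derivatives are given on a small grid and that the Laplacian of each flow component is approximated by the difference between the average of the four nearest neighbours and the central value, with the constant factor absorbed into lam. Under these assumptions the Euler-Lagrange equations lead to the Jacobi-style update used in the loop. All names and values are hypothetical.

```python
def neighbour_average(field):
    """Average of the four nearest neighbours, replicating border values."""
    rows, cols = len(field), len(field[0])
    avg = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            up = field[max(i - 1, 0)][j]
            down = field[min(i + 1, rows - 1)][j]
            left = field[i][max(j - 1, 0)]
            right = field[i][min(j + 1, cols - 1)]
            avg[i][j] = 0.25 * (up + down + left + right)
    return avg


def smooth_flow(Ex, Ey, Et, lam, iterations):
    """Iteratively balance the constraint of eqn (5.9) against smoothness."""
    rows, cols = len(Ex), len(Ex[0])
    v1 = [[0.0] * cols for _ in range(rows)]
    v2 = [[0.0] * cols for _ in range(rows)]
    for _ in range(iterations):
        a1 = neighbour_average(v1)
        a2 = neighbour_average(v2)
        for i in range(rows):
            for j in range(cols):
                ex, ey, et = Ex[i][j], Ey[i][j], Et[i][j]
                # Violation of eqn (5.9) by the local average, rescaled.
                r = (ex * a1[i][j] + ey * a2[i][j] + et) / (lam + ex * ex + ey * ey)
                v1[i][j] = a1[i][j] - ex * r
                v2[i][j] = a2[i][j] - ey * r
    return v1, v2


# Hypothetical 5 x 5 image: brightness varies only in the two outer columns,
# where the derivatives are consistent with a unit flow along x1.
rows, cols = 5, 5
Ex = [[1.0 if j in (0, cols - 1) else 0.0 for j in range(cols)] for _ in range(rows)]
Ey = [[0.0] * cols for _ in range(rows)]
Et = [[-Ex[i][j] for j in range(cols)] for i in range(rows)]

v1, v2 = smooth_flow(Ex, Ey, Et, lam=1.0, iterations=500)
```

In this example the brightness gradient vanishes in the three central columns, where eqn (5.9) carries no information. The smoothness term fills in the flow there from the columns in which the gradient is different from zero.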

The iterative scheme proposed by Horn and Schunck for the minimization of $\Omega_\lambda[v]$ makes it clear that the solution propagates from the boundary to the interior of the image plane. An interesting generalization of the smoothness term of $\Omega_\lambda[v]$ can be found in Yuille and Grzywacz (1988), where the optical flow is computed as the vector field $v$ which minimizes the functional

$$\Omega_\lambda[v] = \iint \left[ \left(\frac{dE}{dt}\right)^2 + \lambda \sum_{r=0}^{\infty} c_r\,(D^r v)^2 \right] dx_1\,dx_2 \tag{5.21}$$

with $\lambda > 0$,

$$c_r = \frac{\sigma^{2r}}{2^r\, r!}$$

and

$$D^{2r} v = \nabla^{2r} v, \qquad D^{2r+1} v = \mathrm{grad}\,(D^{2r} v)$$

($\nabla^2$ is the Laplacian operator). The functional $\Omega_\lambda$ of eqn (5.21) has three interesting properties (Yuille and Grzywacz, 1988). First, the fact that $c_0 > 0$ is a necessary and sufficient condition for the interaction to fall faster than $1/r$, where $r$ is the distance between motion measurement sites. Second, due to the particular choice of the coefficients $c_r$, the smoothing effect of the regularizing term in $\Omega_\lambda$ is equivalent to a Gaussian interaction. Surprisingly enough, this 'short range' interaction, which is more effective than the 'long range' interaction of the Horn and Schunck method, is induced by the presence of higher-order derivatives in the regularizing functional. Third, the optical flow which is obtained by minimizing $\Omega_\lambda$ seems to be consistent with a number of psychophysical experiments on motion perception. Finally, let us turn to the regularization of shape from shading.

5.4.3 Shape from shading

In Section 5.2 it was shown that the problem of solving eqn (5.12) is ill posed. In order to recover shape from shading, Ikeuchi and Horn (1981) proposed to minimize the functional

$$\Omega_\lambda[p, q] = \iint \left[ |E(x) - R(p, q)|^2 + \lambda\left(|\mathrm{grad}\, p|^2 + |\mathrm{grad}\, q|^2\right) \right] dx_1\,dx_2 \tag{5.22}$$

with $\lambda > 0$ and where

$$R(p, q) = s \cdot n$$

is the reflectance map of a Lambertian surface, with $s$ the unit vector in the direction of the light source, and

$$n = \frac{1}{\sqrt{1 + p^2 + q^2}}\,(-p, -q, 1)$$

the unit vector normal to the surface $f = f(x_1, x_2)$. Since the reflectance map depends nonlinearly on $p$ and $q$, the problem of minimizing the functional $\Omega_\lambda$ of eqn (5.22) is not necessarily well posed. A rigorous proof of the existence and continuous dependence of the solution on the data of this problem can be found in (Bertero et al., 1988). The minimization of $\Omega_\lambda$, usually obtained by means of iterative schemes, poses an interesting question. The devised iterative algorithms tend to 'walk away' from the correct solution of the image irradiance equation when this solution is provided as the initial condition. In essence this somewhat surprising behavior is the typical side effect of regularization techniques: a small amount of error in the original equation is traded for an increase in the smoothness of the solution. From the solutions $p = p(x_1, x_2)$ and $q = q(x_1, x_2)$, the viewed surface $f = f(x_1, x_2)$ can then be recovered as the function that minimizes the functional

$$\iint \left[ (f_{x_1} - p)^2 + (f_{x_2} - q)^2 \right] dx_1\,dx_2$$

with the appropriate boundary conditions (Ikeuchi and Horn, 1981). In order to obtain more faithful surface reconstruction, Horn (1990) suggested the minimization of the functional

$$\Omega_{\lambda,\mu}[p, q, f] = \iint \left\{ |E(x) - R(p, q)|^2 + \lambda\left(|\mathrm{grad}\, p|^2 + |\mathrm{grad}\, q|^2\right) + \mu\left[(f_{x_1} - p)^2 + (f_{x_2} - q)^2\right] \right\} dx_1\,dx_2 \tag{5.23}$$

with $\lambda, \mu > 0$. The explicit presence of a term which measures the error of surface reconstruction in eqn (5.23) reduces the 'walk away' effect from the correct solution of iterative techniques and appears to produce better results on real images (Horn, 1990).
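The Lambertian reflectance map used in eqn (5.22) is simple to evaluate, and a few values are enough to see why the image irradiance equation alone cannot determine the surface orientation. The Python sketch below computes R(p, q) = s · n for a hypothetical light source placed along the viewing direction. The function names and the chosen values of p and q are assumptions made for illustration.

```python
import math


def surface_normal(p, q):
    """Unit normal n = (-p, -q, 1) / sqrt(1 + p^2 + q^2)."""
    length = math.sqrt(1.0 + p * p + q * q)
    return (-p / length, -q / length, 1.0 / length)


def reflectance_map(p, q, s):
    """Lambertian reflectance map R(p, q) = s . n, with s a unit vector."""
    n = surface_normal(p, q)
    return sum(si * ni for si, ni in zip(s, n))


# Hypothetical light source along the viewing direction.
s = (0.0, 0.0, 1.0)

flat = reflectance_map(0.0, 0.0, s)       # surface patch facing the viewer
tilted_x1 = reflectance_map(1.0, 0.0, s)  # patch tilted along x1
tilted_x2 = reflectance_map(0.0, 1.0, s)  # patch tilted along x2
```

With this light source the reflectance map depends on p and q only through p² + q², so that the two tilted patches produce the same brightness. A single brightness value therefore leaves a whole curve of admissible orientations (p, q), which is the ambiguity that the smoothness term of eqn (5.22) is meant to resolve.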

5.5 DISCUSSION

In this section the impact of regularization theory in computer vision is discussed. Let us start by listing the merits of the described framework.

5.5.1 A unified approach to early vision

First, the regularization theory approach to computer vision provides a coherent framework in which many visual problems can be analysed. As shown in Sections 5.2 and 5.4 several early vision problems can be formulated and solved as particular instances of a general class of problems, the class of ill-posed problems. Second, the connection between vision and regularization theory clarified important aspects of the variational formulation of visual problems (like the role of smoothness constraints and a priori information in the search for satisfactory solutions (Yuille and Grzywacz, 1988)), and stimulated the development of many fruitful ideas (like controlled continuity constraints (Terzopoulos, 1986), snakes (Kass et al., 1987), balloons (Cohen and Cohen, 1993), and deformable contours (Blake et al., 1993)). Third, a number of interesting works originated from the attempt to overcome the difficulty that regularization has in dealing with discontinuities. A largely incomplete list includes papers on Markov random fields and stochastic relaxation techniques for image segmentation (Geman and Geman, 1984; Poggio et al., 1988; Geman et al., 1990) and the solution to variational problems with discontinuities (Blake and Zisserman, 1987; Lee and Pavlidis, 1988; Mumford and Shah, 1989; Ambrosio and Tortorelli, 1990; March, 1992). Let us now turn to the major criticism that can be raised against the described approach.

5.5.2 Is early vision really ill posed?

The need for regularization theory in computer vision was advocated on the basis of the inverse nature of many early vision problems. This point bears reflection because the ill-posedness of an inverse problem might depend on the specific choice of the $X$ and $Y$ spaces and on the available a priori information on the solution. The uniqueness condition, for example, depends critically on the adopted (and available) geometric and heuristic constraints that are used to resolve ambiguities. The computation of image motion provides an instructive example. If the image brightness constancy equation is assumed to be the only available constraint, then the computation of optical flow is undoubtedly ambiguous. However, it has been shown that other constraint equations or suitable assumptions on the local structure of the optical flow of rigid objects (Tretiak and Pastor, 1984; Uras et al., 1988; Verri et al., 1990) can be used to efficiently and successfully reconstruct the full 2D motion field. In many cases, these constraints might be better suited than regularizing functionals for the description of the structural properties of the 2D motion field. In short, the extent to which the full machinery of regularization theory is really necessary and useful for computer vision seems to depend on the amount of a priori information actually available for each specific problem. Finally, let us summarize the content of this chapter.

5.6 CONCLUSIONS

Regularization theory studies methods for the solution to ill-posed problems (i.e., problems for which at least one of the conditions of uniqueness, existence, or continuous dependence of the solution on the data is not ensured). Many early vision problems can be formulated as problems of inverse optics and appear to be ill posed. In this chapter a few methods which have been proposed for the solution to classical early vision problems, like edge detection, motion estimation, and shape from shading have been discussed in the light of regularization theory. It is concluded that regularization theory provided a unified and inspiring framework for computational studies of early vision.

REFERENCES

Adelson, E.H. and Bergen, J.R. (1985) Spatiotemporal energy models for the perception of motion. J. Optical Society of America A2, 284-299.
Aloimonos, Y. and Duric, Z. (1994) Estimating the heading direction using normal flow. Int. J. Computer Vision 13, 33-56.
Ambrosio, L. and Tortorelli, V.M. (1990) Approximation of functionals depending on jumps by elliptic functionals via Γ-convergence. Commun. Pure & Applied Mathematics 43, 999-1036.
Bertero, M. (1989) Linear inverse and ill-posed problems. Advances in Electronics and Electron Physics 75, 1-120.
Bertero, M., Poggio, T. and Torre, V. (1988) Ill-posed problems in early vision. Proc. IEEE 76, 869-889.
Blake, A. and Zisserman, A. (1987) Visual reconstruction. MIT Press, Cambridge, Massachusetts.
Blake, A., Curwen, R. and Zisserman, A. (1993) A framework for spatiotemporal control in the tracking of visual contours. Int. J. Computer Vision 11, 127-145.
Born, M. and Wolf, E. (1959) Principles of optics. Pergamon Press, New York.
Canny, J.F. (1983) Finding edges and lines in images. AI Lab Memo 720, MIT, Cambridge, Massachusetts.
Cohen, L.D. and Cohen, I. (1993) Finite-element methods for active contour models and balloons for 2D and 3D images. IEEE Trans. Pattern Analysis Machine Intelligence PAMI-15, 1131-1147.
Courant, R. and Hilbert, D. (1962) Methods of mathematical physics (II). Wiley Interscience, London, UK.
Craven, P. and Wahba, G. (1979) Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross validation. Numerical Mathematics 31, 377-403.
Geman, S. and Geman, D. (1984) Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Trans. Pattern Analysis Machine Intelligence PAMI-6, 721-741.
Geman, D., Geman, S., Graffigne, C. and Dong, P. (1990) Boundary detection by constrained optimization. IEEE Trans. Pattern Analysis Machine Intelligence PAMI-12, 609-628.
Hadamard, J. (1902) Sur les problèmes aux dérivées partielles et leur signification physique. Princeton University Bulletin Vol. 13.
Heeger, D.J. (1987) Optical flow using spatiotemporal filters. Int. J. Computer Vision 1, 279-302.
Heeger, D.J. and Jepson, A.D. (1992) Subspace methods for recovering rigid motion I: algorithm and implementation. Int. J. Computer Vision 7, 95-117.
Hildreth, E.C. (1984) The computation of the velocity field. Proc. Royal Society of London B221, 189-220.
Horn, B.K.P. (1975) Obtaining shape from shading information. In: The psychology of computer vision (ed. P.H. Winston). McGraw-Hill, New York.
Horn, B.K.P. (1990) Height and gradient from shading. Int. J. Computer Vision 5, 37-75.
Horn, B.K.P. and Schunck, B.G. (1981) Determining optical flow. Artificial Intelligence 17, 185-203.
Horn, B.K.P. and Sjoberg, R.W. (1979) Calculating the reflectance map. Applied Optics 18, 1770-1779.
Ikeuchi, K. and Horn, B.K.P. (1981) Numerical shape from shading and occluding boundaries. Artificial Intelligence 17, 141-184.
Kass, M., Witkin, A. and Terzopoulos, D. (1987) Snakes: active contour models. Int. J. Computer Vision 1, 321-331.
Lee, D. and Pavlidis, T. (1988) One dimensional regularization with discontinuities. IEEE Trans. Pattern Analysis Machine Intelligence PAMI-10, 822-829.
Longuet-Higgins, H.C. and Prazdny, K. (1981) The interpretation of moving retinal images. Proc. Royal Society of London B208, 385-397.
March, R. (1992) Visual reconstruction with discontinuities using variational methods. Image and Vision Computing 10, 30-38.
Mumford, D. and Shah, J. (1989) Optimal approximations by piecewise smooth functions and associated variational problems. Commun. Pure & Applied Mathematics 42, 577-685.
Mundy, J.L. and Zisserman, A. (1992) Appendix - Projective geometry for machine vision. In: Geometric invariants in computer vision (eds J.L. Mundy and A. Zisserman), MIT Press, Cambridge, Massachusetts.
Poggio, T., Voorhees, H. and Yuille, A. (1984) Regularizing edge detection. AI Lab. Memo 776, MIT, Cambridge, Massachusetts.
Poggio, T., Torre, V. and Koch, C. (1985) Computational vision and regularization theory. Nature 317, 314-319.
Poggio, T., Gamble, E.B. and Little, J.J. (1988) Parallel integration of vision modules. Science 242, 436-440.
Rieger, J.H. and Lawton, D.T. (1985) Processing differential image motion. J. Optical Society of America A2, 354-359.
Shanmugam, K.F., Dickey, F.M. and Green, J.A. (1965) An optimal frequency domain filter for edge detection in digital pictures. IEEE Trans. Pattern Analysis Machine Intelligence 44, 99-149.
Terzopoulos, D. (1986) Regularization of inverse visual problems involving discontinuities. IEEE Trans. Pattern Analysis Machine Intelligence PAMI-8, 413-424.
Tikhonov, A.N. and Arsenin, V.Y. (1977) Solutions of ill-posed problems. Winston & Sons, Washington, DC, USA.
Torre, V. and Poggio, T. (1986) On edge detection. IEEE Trans. Pattern Analysis Machine Intelligence PAMI-8, 147-163.
Tretiak, O. and Pastor, L. (1984) Velocity estimation from image sequences with second order differential operators. Proc. Int. Conf. on Pattern Recognition, Montreal, Canada, 16-19.
Uras, S., Girosi, F., Verri, A. and Torre, V. (1988) A computational approach to motion perception. Biological Cybernetics 60, 79-87.
Verri, A. and Poggio, T. (1989) Motion field and optical flow: qualitative properties. IEEE Trans. Pattern Analysis Machine Intelligence PAMI-11, 490-498.
Verri, A., Girosi, F. and Torre, V. (1990) Differential techniques for optical flow. J. Optical Society of America A7, 912-922.
Wahba, G. (1977) Practical approximate solutions to linear operator equations when the data are noisy. SIAM J. Numerical Analysis 14.
Yuille, A. and Grzywacz, N.M. (1988) A computational theory for the perception of coherent visual motion. Nature 333, 71-73.