Computing depth from stereo images by using optical flow





Pattern Recognition Letters 4 (1986) 205-212, July 1986. North-Holland.

A. SCHEUING* and H. NIEMANN

Lehrstuhl für Informatik 5 (Mustererkennung), Universität Erlangen-Nürnberg, Martensstrasse 3, 8520 Erlangen, Fed. Rep. Germany

* A. Scheuing is a visiting scientist from the Institut für Informatik und Angewandte Mathematik, University of Bern, Switzerland.

Received 5 August 1985

Abstract: This paper presents an approach to depth determination which uses two stereo images and a combination of iteratively computing optical flow with a local correspondence criterion. The method is a low-level solution to depth determination in the sense that no contours, no features, and no matching of corresponding contours or features is necessary. Instead, an iterative optimization using pixel values only is employed. Experiments show that computation of depth on 128 x 128 images can be done on a minicomputer (MC 68000-type), that an accuracy of about 5% is obtainable, and that depth discontinuities can be handled.

Key words: Stereo, optical flow, local correspondence, depth computation, 3-D information.

1. Introduction

Determination of depth information is a necessary prerequisite for three-dimensional computer vision. Among the approaches to this problem are laser range finders, patterned light, ultrasonic devices, and stereo images. Surveys on depth determination are given in [1,2]. This paper addresses the problem of depth from stereo images. From a three-dimensional object two images are taken by two cameras separated by a known distance b. Several solutions and algorithms have been developed [3-6]. The basic approach is to determine corresponding points in the two images. If the camera parameters are known, the depth of an object point in a three-dimensional scene can be computed by well-known equations from the two-dimensional image coordinates of this object point in the two stereo images. The problem then is the automatic determination of two stereo image points belonging to the same object point, which is called the correspondence problem. In order to determine that two stereo image points are corresponding points by using only local information in the neighborhood of these points, the neighborhood must have sufficient structure. A common approach is the determination of contour lines (or other prominent features) in the stereo images and matching of contour lines [4,6] to obtain corresponding points.

Basically the same problem occurs when determining the two-dimensional velocity vector (that is, the velocity vector in the image plane) of moving object points. If two images separated by a known time interval Δt are taken of a moving object, and if two points in those images belonging to the same object point can be determined, the velocity vector of this point can be computed. One approach is computation of contour lines (or other features) and tracking of those lines from image to image in a time sequence of images [7,8]. Another approach is computation of optical flow by iterative solution of a variational equation [9,10]. The advantage of this approach is that it does not require initial determination of contour lines or other features, but uses pixel values only. The velocity of points can thus be determined independent of some initial image segmentation, and therefore velocity may in turn be used as a segmentation cue.

The contribution of this paper is a combination of a simple local correspondence criterion with the variational equations derived in [9] to obtain an iterative computation of depth. No initial segmentation or computation of features is necessary. In this sense the proposed method is a purely low-level approach. Since depth is obtained without prior segmentation it can be used as an independent cue for segmentation. For example, if a discontinuity of depth occurs, this is a strong cue for two different surfaces. It should be noted that depth is not computed from motion information, but from two static images.

Section 2 of the paper repeats some essential equations from stereo vision and introduces the employed notation. Section 3 repeats the essential equation from optical flow and gives the specialization to the stereo problem. Section 4 gives an overview of the proposed approach. Section 5 develops the iterative computation of depth. Section 6 presents experimental results.


2. Depth from corresponding stereo image points

Figure 1 shows the employed geometry and coordinate systems. The three-dimensional scene coordinate system is (x, y, z); it has its origin in the center of the right camera, and its z-axis lies in the camera axis. The two-dimensional image coordinates of the right camera are (x', y') and the image plane is z = -f, where f is the focal length of the camera [11]. At distance b to the left, in the direction of the x-axis, the left camera is located. The image coordinates of this camera are (x'', y''). A three-dimensional object is mapped to a right image $f^R(x', y')$ by the right camera and to a left image $f^L(x'', y'')$ by the left one. The depth coordinate z is lost by this mapping. It is assumed that the cameras are oriented such that the scan lines are parallel to the x'-, x''-axes and that those axes are parallel to the x-axis. In this case an object point (x, y, z) is mapped to a left image point (x'', y'') and a right image point (x', y'), where

$$y'' = y' \quad\text{and}\quad x'' = x' - fb/z. \qquad (1)$$

The two-dimensional image coordinates are

$$(x', y') = -\frac{f}{z}\,(x,\; y) \quad\text{and}\quad (x'', y'') = -\frac{f}{z}\,(x + b,\; y). \qquad (2)$$

The disparity - the displacement of corresponding image points - is

$$\delta u(x', y') = x'' - x' = -fb/z \qquad (3a)$$

and the disparity coefficient is

$$u(x', y') = -f/z. \qquad (3b)$$

If u can be computed from the two stereo images $f^L$ and $f^R$, and f and b are known, the three-dimensional object coordinates are

$$(x,\; y,\; z) = \frac{1}{u}\,(x',\; y',\; -f). \qquad (4)$$

In digital image processing it is convenient to measure the disparity $\delta u$ in units of pixel distance; denote this by u', the image width in pixels by N, and the image width in cm by $x_b$:

$$u' = \delta u\,\frac{N}{x_b} = -N\,\frac{fb}{x_b z} = -N\gamma\,\frac{b}{z}, \qquad \gamma = f/x_b. \qquad (5)$$

If the depth z ranges between $z_{min}$ and $z_{max}$, the disparity u' ranges from

$$u'_{min} = -N\gamma b/z_{max} \quad\text{to}\quad u'_{max} = -N\gamma b/z_{min}. \qquad (6)$$

Figure 1. Stereo geometry and coordinate systems adopted in the text.




These equations establish the basic relationships for depth computations if corresponding image points are known.
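To make the relations (3b)-(5) concrete, here is a minimal sketch (an editorial addition, not part of the original paper; the function and parameter names are assumptions). It works elementwise on scalars or NumPy arrays and assumes nonzero disparity.

```python
def depth_from_pixel_disparity(u_pix, f, b, N, x_b):
    """Depth from pixel disparity u' via (5): u' = -N*gamma*b/z with
    gamma = f/x_b, hence z = -N*gamma*b/u'. Assumes u_pix != 0."""
    gamma = f / x_b
    return -N * gamma * b / u_pix

def point_from_disparity_coefficient(x_img, y_img, u, f):
    """3-D coordinates from right-image coordinates (x', y') and the
    disparity coefficient u = -f/z, following (3b) and (4):
    (x, y, z) = (x', y', -f) / u."""
    return x_img / u, y_img / u, -f / u
```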


3. Computation of optical flow

In a time sequence of images showing a moving object one may view the intensity f(x, y, t) at every image point to have an associated velocity vector (u(x, y), v(x, y)). The field of velocity vectors is called optical flow, and it is shown in [9] that the relation

$$f_x u + f_y v + f_t = 0 \qquad (7)$$

holds, where $f_x$, $f_y$, $f_t$ denote the partial derivatives of f(x, y, t) with respect to x, y, t, respectively. Because on a larger moving object the velocity changes only slowly, the smoothness constraint

$$F_s = u_x^2 + u_y^2 + v_x^2 + v_y^2 \qquad (8)$$

is introduced. The problem is to find a velocity field which is smooth, that is, minimizes (8), and satisfies (7). Introducing the Lagrange multiplier $\alpha^2$, velocities u and v have to be computed which minimize

$$F = \iint \big[(f_x u + f_y v + f_t)^2 + \alpha^2 (u_x^2 + u_y^2 + v_x^2 + v_y^2)\big]\, dx\, dy = \iint \big(F_c^2 + \alpha^2 F_s\big)\, dx\, dy. \qquad (9)$$

The iterative solution of this variational problem is given in [9]. In the case of stereo images one may view an image point $f^L(x'', y'')$ as obtained from $f^R(x', y')$ by displacing it by the disparity $\delta u(x', y')$ as given in (1). To obtain formal equivalence, use the camera distance b of Figure 1 to replace t in f(x, y, t) and write

$$f^R(x, y) = f(x, y, 0) \quad\text{and}\quad f^L(x, y) = f(x, y, b). \qquad (10)$$

Because of the geometry adopted in Figure 1 there is only a velocity component in the x-direction, which is equivalent to the disparity coefficient u, and v = 0. Therefore, (7) and (8) reduce to

$$f_x u + f_b = 0 \quad\text{and}\quad F_s = u_x^2 + u_y^2. \qquad (11)$$

Above, $f_b$ - defined in (20), (21) - is the partial derivative with respect to b in Figure 1 and replaces $f_t$ in (7). In order to have a different weighting of $u_x$ and $u_y$ we modify (11) to

$$F_s' = u_x^2 + \beta^2 u_y^2. \qquad (12)$$

The direct solution

$$u^* = -f_b / f_x \qquad (13)$$

of (11) might be used if $f_x \neq 0$, but a specialization of (9) allows one to compute an approximation of u even if $f_x \approx 0$ and the value of $u^*$ is very noisy. The iterative solution in this case reduces errors. This is possible because (9) incorporates a neighborhood of a point to judge continuity. In fact, purely row-wise computation of u could be considered, because disparity occurs only in the x-direction. The prevalence of the x-direction is evident in (12) from the factor $\beta^2 < 1$. Therefore, again a minimization problem is introduced by modifying (9) with (10), (11) to

$$F' = \iint \big[(f_x u + f_b)^2 + \alpha^2 (u_x^2 + \beta^2 u_y^2)\big]\, dx\, dy. \qquad (14)$$

Using variational calculus the condition

$$f_x^2 u + f_x f_b = \alpha^2 (u_{xx} + \beta^2 u_{yy}) \qquad (15)$$

results.
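The intermediate step from (14) to (15), which the paper does not spell out, is the standard Euler-Lagrange condition of variational calculus, treating $f_x$ and $f_b$ as fixed coefficient functions as in [9]. A brief sketch, added here for clarity:

```latex
% Euler-Lagrange condition for F' = \iint L \, dx \, dy with
% L(u, u_x, u_y) = (f_x u + f_b)^2 + \alpha^2 (u_x^2 + \beta^2 u_y^2):
\frac{\partial L}{\partial u}
  - \frac{\partial}{\partial x}\!\left(\frac{\partial L}{\partial u_x}\right)
  - \frac{\partial}{\partial y}\!\left(\frac{\partial L}{\partial u_y}\right) = 0
\;\Longrightarrow\;
2 f_x (f_x u + f_b) - 2\alpha^2 u_{xx} - 2\alpha^2 \beta^2 u_{yy} = 0
\;\Longrightarrow\;
f_x^2 u + f_x f_b = \alpha^2 (u_{xx} + \beta^2 u_{yy}).
```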

Figure 2. Computation of a discrete approximation to $u_{xx} + \beta^2 u_{yy}$: the horizontal neighbors of u(i, j) enter with weight 2, the vertical neighbors with weight $2\beta^2$, the diagonal neighbors with weight $\beta^2$, normalized by $4 + 8\beta^2$.




This equation can be solved iteratively for u(x, y) as shown below. The terms f(x, y, b) and u(x, y) are replaced by their discrete versions f(i, j, k) and u(i, j), respectively, where i is the column index and j the row index. A discrete approximation of $u_{xx} + \beta^2 u_{yy}$ is, according to Figure 2,

$$u_{xx} + \beta^2 u_{yy} \approx \bar u(i,j) - u(i,j), \qquad (16)$$

where

$$\bar u(i,j) = (4 + 8\beta^2)^{-1}\big[2u(i-1,j) + 2u(i+1,j) + \beta^2\big(u(i-1,j-1) + 2u(i,j-1) + u(i+1,j-1) + u(i-1,j+1) + 2u(i,j+1) + u(i+1,j+1)\big)\big].$$

A discrete approximation to $f_x$ and $f_b$ will be obtained in Section 5, because these approximations are coupled to the local correspondence criterion. With (16) we obtain from (15)

$$u = \frac{\alpha^2 \bar u - f_x f_b}{f_x^2 + \alpha^2}. \qquad (17)$$

For an image with N x N pixels, (17) gives a system of linear equations for the N x N unknowns u(i, j), i, j = 1, ..., N. A solution can be computed by Gauss-Seidel iteration

$$u^0 = 0, \qquad u^{n+1} = \frac{\alpha^2 \bar u^n - f_x f_b}{f_x^2 + \alpha^2}, \qquad (18)$$

where n is the iteration step. Other initial values for $u^0$ are considered in Section 5. With the direct solution for $u^*$ in (13), equation (18) can be rewritten as

$$u^{n+1} = \frac{\alpha^2 \bar u^n + f_x^2 u^*}{f_x^2 + \alpha^2}, \qquad (19)$$

showing that $u^{n+1}$ is a weighted mean of $\bar u^n$ (that is, of the disparities of neighboring points) and of $u^*$, the direct solution for u(i, j). If $f_x \approx 0$, the influence of the direct solution vanishes and the neighboring points dominate.
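A minimal sketch of the neighborhood average of (16) and the update (18) (an editorial addition; NumPy, the replicated border, and the simultaneous Jacobi-style update in place of the paper's Gauss-Seidel sweeps are assumptions made for brevity). The arrays fx and fb are assumed to be precomputed, for example as described in Section 5.

```python
import numpy as np

def neighbor_mean(u, beta2):
    """Weighted mean of (16)/Figure 2: horizontal neighbors weighted 2,
    vertical 2*beta2, diagonal beta2, normalized by 4 + 8*beta2.
    Border values are replicated (an assumption)."""
    up = np.pad(u, 1, mode="edge")
    horiz = up[1:-1, :-2] + up[1:-1, 2:]          # u(i-1,j) + u(i+1,j)
    vert = up[:-2, 1:-1] + up[2:, 1:-1]           # u(i,j-1) + u(i,j+1)
    diag = up[:-2, :-2] + up[:-2, 2:] + up[2:, :-2] + up[2:, 2:]
    return (2.0 * horiz + beta2 * (2.0 * vert + diag)) / (4.0 + 8.0 * beta2)

def iterate_disparity(fx, fb, alpha2, beta2, n_iter, u0=None):
    """Iteration (18): u_new = (alpha2 * u_bar - fx * fb) / (fx**2 + alpha2).
    All pixels are updated simultaneously here for brevity."""
    u = np.zeros_like(fx, dtype=float) if u0 is None else u0.astype(float)
    for _ in range(n_iter):
        u_bar = neighbor_mean(u, beta2)
        u = (alpha2 * u_bar - fx * fb) / (fx ** 2 + alpha2)
    return u
```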

4. Overview of the algorithm

A short overview of the proposed method for depth computation is given in Figure 3. The problem is equivalent to iterative computation of velocities if the right and left stereo images are considered as images one unit of 'time' apart. This allows one to derive an iterative equation for the computation of disparity (see (19) in Section 3) which is in direct analogy to the computation of velocity as derived in [9]. A deviation from [9] is the introduction of a local correspondence criterion for the computation of the discrete approximations to $f_x$ and $f_b$ occurring in (15). This step is essential to allow for a sufficiently large range of depth values, and it also provides good initial values of disparity to start the iteration. Iterative depth computation then is fairly straightforward and gives the depth coordinate z.

Figure 3. The steps of depth computation: the 3-D scene is mapped to the stereo images $f^R = f(x, y, 0)$ and $f^L = f(x, y, b)$ (equivalence of 'time t' and 'displacement b', Sections 2, 3); local correspondence and partial derivatives $f_x$, $f_b$ (Sections 3, 5); direct solution $u^*$ and iterative solution u (Section 5); depth z from disparity (Section 5).



Volume 4, Number 3

PATTERN RECOGNITION LETTERS

5. Iterative computation of disparity


5.1. The case of small disparity

A straightforward discrete approximation of $f_x$, $f_b$ in (18), (19) would be

$$f_x(i,j) = \big[f_x^R(i,j) + f_x^R(i,j+1) + f_x^L(i,j) + f_x^L(i,j+1)\big]/4, \qquad f_x^R(i,j) = f^R(i+1,j) - f^R(i,j) \;\text{(and similarly } f_x^L\text{)},$$
$$f_b(i,j) = \big[f_b^{(0)}(i,j) + f_b^{(0)}(i,j+1) + f_b^{(0)}(i+1,j) + f_b^{(0)}(i+1,j+1)\big]/4, \qquad f_b^{(0)}(i,j) = f^L(i,j) - f^R(i,j). \qquad (20)$$

However, these discrete forms are suitable only for small disparities $0 \le u' \le 2$. A simple numerical example shows that this is not an interesting case. Let the image have N = 128 pixels, let $\gamma = f/x_b = 2$ in (5) and $z_{min} = 50$ cm; then from (6) we get $b \le u'_{max} z_{min}/(N\gamma) = 0.4$ cm. Such a small stereo basis could only be obtained by moving one camera 0.4 cm to the left, but not by a pair of cameras. In general, small disparities result in a small range of depth values. Therefore, approximations to $f_x$, $f_b$ should not be computed by (20). Instead we introduce

$$f_b^{(s)}(i,j) = f^L(i+s,j) - f^R(i,j), \qquad s = 0, 1, 2, \ldots, \qquad (21)$$

as a generalization of (20). The reason for introducing s is obvious: if the disparity is too large to compute reasonable values of $f_b$ by (20), the number s effects a shift of $f^L$ in the x-direction to obtain a more accurate approximation of $f_b$.

5.2. A local correspondence criterion

Computation of s in (21) is based on what is called in this paper a local correspondence criterion L. The range of s follows from (6):

$$u'_{min} \le s \le u'_{max}. \qquad (22)$$

The basic idea for the determination of s is that for the proper value of s one should observe, at some fixed row $j_0$,

$$f^R(i,j_0) = f^L(i+s,j_0), \qquad f_x^R(i,j_0) - f_x^L(i+s,j_0) = 0. \qquad (23)$$

If s is determined, it is an approximation to $u^*$. From (23) the local correspondence criterion

$$L^s(i,j) = \big(f^R(i,j) - f^L(i+s,j)\big)^2 + \big(f_x^R(i,j) - f_x^L(i+s,j)\big)^2 \qquad (24)$$

is derived. That value of s is selected which minimizes L. This is done by a search over the discrete values of s given in (22). Let $s^*$ be the value of s minimizing L in (24); the value $s^*$ is computed at every point of the image array. Since $s^*$ is an integer, but u' need not be an integer, a refined value of u' is computed by fitting a parabola $ax^2 + bx + c$ through the three points $(k, L^k(i,j))$, $k = s^*-1, s^*, s^*+1$. The minimum of this parabola is the refined value

$$u' = s^* + \frac{L^{s^*-1}(i,j) - L^{s^*+1}(i,j)}{4a}, \qquad a = 0.5\big(L^{s^*-1}(i,j) - 2L^{s^*}(i,j) + L^{s^*+1}(i,j)\big). \qquad (25)$$

Since 1/a is the radius of curvature in the minimum of the parabola, a is also a measure of reliability: a large value of a indicates high curvature and a well defined minimum. An estimate of $f_x$ is

$$f_x(i,j) = \big(f_x^R(i,j) + f_x^L(i+s^*,j)\big)/2. \qquad (26a)$$

A weighting of $f_x$ is justified because in (18), (19) $f_x$ determines only the relative weight of $\bar u$ and $u^*$. A reasonable weight is the factor a in (25), giving

$$f_x(i,j) = a\,\big(f_x^R(i,j) + f_x^L(i+s^*,j)\big)/2. \qquad (26b)$$

A large weight of $u^*$ is reasonable only if a strong local correspondence cue - measured by a in (25) - is available. The criterion L in (24) is essential for achieving a large depth range and a reliable computation of the disparity u by (18), (19). It would be interesting to investigate the extension of the local correspondence criterion L in (24) to the case of velocity computation, where a two-dimensional version of L is necessary.
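For illustration, here is a sketch of the correspondence search (22)-(25) for a single pixel (an editorial addition, not the authors' implementation; fR and fL are assumed to be image arrays indexed [row, column], s_min and s_max stand for the range (22), and no bounds checking is shown).

```python
def fx_of(img, i, j):
    """Horizontal forward difference f_x(i, j) = f(i+1, j) - f(i, j)."""
    return img[j, i + 1] - img[j, i]

def local_correspondence(fR, fL, i, j, s_min, s_max):
    """Search the shift range (22), evaluate the criterion (24), and refine
    the minimizing shift with the parabola fit (25). Returns the refined
    disparity u' and the curvature a used as a reliability measure."""
    L = {}
    for s in range(s_min, s_max + 1):
        d0 = fR[j, i] - fL[j, i + s]                    # intensity term of (24)
        d1 = fx_of(fR, i, j) - fx_of(fL, i + s, j)      # derivative term of (24)
        L[s] = d0 * d0 + d1 * d1
    s_star = min(L, key=L.get)
    if s_star - 1 in L and s_star + 1 in L:
        a = 0.5 * (L[s_star - 1] - 2.0 * L[s_star] + L[s_star + 1])
        if a > 0:
            return s_star + (L[s_star - 1] - L[s_star + 1]) / (4.0 * a), a
    return float(s_star), 0.0
```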




5.3. The iteration

To obtain a fast and reliable iterative solution of (18), (19) it is extremely helpful to start with reasonable initial values $u^0$. The following choices are conceivable:

$$u^0 = 0 \quad \text{(worst case, if nothing is known)},$$
$$u^0 = (u'_{min} + u'_{max})/2 \quad \text{(see (6))},$$
$$u^0 = u^* \text{ if } f_x^2 > T, \;\; 0 \text{ else} \quad \text{(see (25))},$$
$$u^0 = u^* \text{ if } f_x^2 > T, \;\; \text{linear interpolation between the values of } u^* \text{ to the left and right with } f_x^2 > T \text{ else}. \qquad (27)$$

According to some initial experiments the last choice of $u^0$ gave the best results. The threshold T in (27) should be selected with respect to $\alpha$ in (18), (19); according to some experiments we chose $T = \alpha/3$ and $\alpha = 0.1\,\max\{f_x\}$.
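A sketch of the last (and, according to the text, best) choice in (27): keep $u^*$ where $f_x^2 > T$ and interpolate linearly along each row elsewhere. This is one reading of (27), added for illustration; NumPy and the function name are assumptions.

```python
import numpy as np

def initial_disparity(u_star, fx, T):
    """Initial values u0 following the last choice in (27): keep u* where
    fx**2 > T and fill the remaining pixels of each row by linear
    interpolation between the reliable values."""
    u0 = np.zeros_like(u_star, dtype=float)
    cols = np.arange(u_star.shape[1])
    for j in range(u_star.shape[0]):
        reliable = fx[j] ** 2 > T
        if reliable.any():
            # np.interp extends the first/last reliable value to the row borders.
            u0[j] = np.interp(cols, cols[reliable], u_star[j, reliable])
    return u0
```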


It is helpful to do some bookkeeping in order to avoid unnecessary computation of $u^{n+1}$. This may occur at the final steps of the iteration, when some values $u^n(i,j)$ have already reached their stable values. As a compromise between treating every point and no bookkeeping at all, we found it useful to treat complete rows; that means, if the values of $u^n$ and $u^{n+1}$ in three adjacent rows are the same, the middle row is ignored in further iterations. The goodness of an iterative solution may be measured, for example, by

$$\varepsilon_{1,j_0} = \max_i \big|u^{n+1}(i,j_0) - u^n(i,j_0)\big| \,/\, u^{n+1}(i,j_0),$$

where $j_0$ is the row under consideration, by

$$\varepsilon_2 = \sum_{i,j} \big|u^{n+1}(i,j) - u^n(i,j)\big| \,/\, N^2, \qquad (28)$$

or by the value of the minimized criterion $(F_c^2 + \alpha^2 F_s')$ summed over the image.
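The two change-based measures in (28) are easy to state in code; the following small sketch (an editorial addition, NumPy assumed) mirrors the formulas above and assumes nonzero disparities along the reference row.

```python
import numpy as np

def eps1(u_new, u_old, j0):
    """Maximal relative change along row j0, cf. (28)."""
    return np.max(np.abs(u_new[j0] - u_old[j0]) / np.abs(u_new[j0]))

def eps2(u_new, u_old):
    """Mean absolute change over the whole image, cf. (28)."""
    return np.sum(np.abs(u_new - u_old)) / u_new.size
```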

Figure 4. Illustration of experiments. (a) Right camera image of the first scene; (b) right camera image of the second scene; (c) grey value representation of depth (scale in cm) computed after 11 iterations from the scene shown in (a); (d) same as (c) for the scene shown in (b).





Criteria for terminating the iteration might be

$$\varepsilon_{1,j_0} \le T_1 \;\; \forall j_0, \qquad \varepsilon_2 \le T_2 \;\; \text{(prespecified maximal error)}, \qquad n \ge N_{max} \;\; \text{(maximal number of iterations)}. \qquad (29)$$

In the experiments reported in Section 6 we chose $\varepsilon_1$. After computation of a stable value $u^n(i,j)$ for some n, the depth coordinate z is obtained from (4), (5). This completes depth computation. Only local information in a small pixel neighborhood was used.

6. Experimental results

Two experiments are shown to demonstrate the feasibility of the above approach. The right camera images are shown in Figure 4 (top). They show an oblique cube in front of a background. Depth computation after 11 iterations of these images with 128 x 128 pixels, as shown in the bottom row of Figure 4, was done on a minicomputer (MC 68000-type) in 10-15 minutes CPU time. In Figure 5 correct and computed depth along the horizontal and vertical lines indicated in Figure 4 is plotted. Although the experiments were performed only with standard equipment (vidicon camera), they show a promising precision of about 5%.

Figure 5. Correct and computed depth along (a) the horizontal line and (b) the vertical line shown in Figure 4a (depth z in cm, image position in pixels).

Figure 6. Correct and computed depth along (a) the horizontal line and (b) the vertical line shown in Figure 4b.




On the other hand the limitation of the approach is visible: a single point of wrong correspondence (Figure 5b, right) would need more iterations to fade.

References

[1] Jarvis, R.A. (1983). A perspective on range finding techniques for computer vision. IEEE Trans. PAMI 5, 122-139.
[2] Hall, E.L. and C.A. McPherson (1983). Three dimensional perception for robot vision. Proc. SPIE Conf. on Robotics and Robot Sensing Systems, Vol. 442, San Diego, CA, 117-143.
[3] Yakimovsky, Y. and R. Cunningham (1978). A system for extracting three-dimensional measurements from a pair of TV cameras. CGIP 7, 195-210.
[4] Shapira, R. (1974). A technique for the reconstruction of a straight-edge wire-frame object from two or more central projections. CGIP 3, 318-325.


[5] Grimson, W.E.L. (1981). A computer implementation of a theory of human stereo vision. MIT AI Memo 565, Cambridge, MA.
[6] Baker, H.H. (1982). Depth from edge and intensity based stereo. Report No. STAN-CS-82-930, Stanford University, Stanford, CA.
[7] Niemann, H., H. Bunke, I. Hofmann, G. Sagerer, F. Wolf and H. Feistel (1985). A knowledge based system for analysis of gated blood pool studies. IEEE Trans. PAMI 7, 246-259.
[8] Akita, K. (1984). Image sequence analysis of real world human motion. Pattern Recognition 17, 73-83.
[9] Horn, B.K.P. and B.G. Schunck (1981). Determining optical flow. Artificial Intelligence 17, 185-203.
[10] Nagel, H.-H. (1983). Displacement vectors derived from second-order intensity variations in image sequences. CVGIP 21, 85-117.
[11] Ballard, D.H. and C.M. Brown (1982). Computer Vision. Prentice-Hall, Englewood Cliffs, NJ.