Pattern Recognition Levers 4 (1986)
205- 2 t2
July 1986
North-Holland
Computing depth from stereo images by using optical flow A . SCHEUING* and H . NIEMANN Leltrstrdtl,f'it, Inforrattik 5 (Muslererkennung), Uaivercit it hiimtgen - :Vimtbarn, .1iarreacvrrgpe 3 . 8520 LNun,een, I-ed, Rep. German r
Received 5 August
1985
AbsoucC This paper presents an approach to depth determination which uses two screo images and a combination of iteratively computing optical flow with a local correspondence criterion . the method is a low-level solution to depth determination in the sense that no contours, no features, and no matching of corresponding contours of features is necessary . Instead of that an iterative optimization using pixel values only is employed . Experiments show that computation of depth on 128 x 128 images can he done on a minicomputer (NW 68000-type), that an accuracy of about 5' ,o is obtainable, and that depth discontinuities can he handled .
Rev words: Stereo, optical flow, local correspondence, depth computation, 3-D information .
1 . Introduction
Determination of depth information is a necessary prerequisite for three-dimensional computer vision . Among the approaches to this problem are laser range finders, patterned light, ultrasonic devices, and stereo images . Surveys on depth determination are given in [1,2] . This paper addresses the problem of depth from stereo images . From a three-dimensional object two images are taken by two cameras separated by a known distance h . Several solutions and algorithms were developed [3-6] . The basic approach is to determine corresponding points in the two images . If the camera parameters are known, the depth of an object point in a three-dimensional scene can be computed by well-known equations from the two-dimensional image coordinates of this object point in (tic two stereo images . The problem then is the automatic determination of two * A . Scheuing is a visiting scientist from the Institut fur Informatik and Angewandte Mathematik, University of Bern, Switzerland .
stereo image points belonging to the same object point which is called the correspondence problem . In order to determine this, two stereo image points are corresponding points by using only local information in the neighborhood of these points, the neighborhood must have sufficient structure . A common approach is determination of contour lines (or other prominent features) in the stereo images and matching of contour lines [4,6] to obtain corresponding points . Basically the same problem occurs when determining the two-dimensional velocity vector (that is, the velocity vector in the image plane) of moving object points . If two images separated by a known time interval Jt are taken from a moving object and if two points in those images belonging to the same object point can be determined, the velocity vector of this point can be computed . One approach is computation of contour lines (or other features) and tracking of those lines from image to image in a time sequence of images [7, 8] . Another approach is computation of optical flow by iterative solution of a variational equation [9, 10] . The advantage of this approach is that it does not re-
0167-8655/86/53 .50 + 1986, Flsevier Science Publishers B .V . (North-Holland)
205
Volume 4, Number 3
PATTERN RECOGNITION LETTERS
quire initial determination of contour lines or other features, but uses pixel values only . Velocity of points can thus be determined independent of some initial image segmentation, and therefore velocity may in turn be used as a segmentation cue . The contribution of this paper is a combination of a simple local correspondence criterion with the variational equations derived in [9] to obtain an iterative computation of depth . No initial segmentation or computation of features is necessary . In this sense the proposed method is a purely lowlevel approach . Since depth is obtained without prior segmentation it can be used as an independent cue to segmentation . For example, if a discontinuity of depth occurs, this is a strong cue for two different surfaces . It should be noted that depth is not computed from motion information, but from two static images . Section 2 of the paper repeats some essential equations from stereo vision and introduces the employed notation . Section 3 repeats the essential equation from optical flow and gives the specialization to the stereo problem . Section 4 gives an overview of the proposed approach . Section 5 developes the iterative computation of depth . Section 6 presents experimental results .
July 1986
the camera axis . The two-dimensional image coordinates of the right camera are (x',y') and the image plane is -z -f where f is the focal length of the camera [ii] . At distance b to the left, in the direction of the x-axis, the left camera is located . The image coordinates of this camera are (x", y") . A three-dimensional object is mapped to a right image f R (x ;y') by the right camera and to a left image f L (x",y") by the left one . The depth coordinate z is lost by this mapping . It is assumed that the cameras are oriented such that the scan lines are parallel to the x', x"-axis and that those axes are parallel to the x-axis . In this case an object point (x, y, z) is mapped to a left image point (x", y") and a right image point (x" y') where y"-y'
and
x"=x'-fb1z .
The two-dimensional image coordinates are ] l y'J
and
Z[Y
[Y"] --z[XYb] .
Figure I shows the employed geometry and coordinate systems . The three-dimensional scene coordinate system is (x,y,z), it has its origin in the center of the right camera, and it has its z-axis in
(2)
The disparity - the displacement of corresponding image points - is (3a)
bu(x', y')=x"-x'=-fb/z and the disparity coefficient is "W'Y')-f/z .
2 . Depth from corresponding stereo image points
(1)
(3b)
If u can be computed from the two stereo images f t and f R and f and b are known, the threedimensional object coordinates are x Y z
1 (4) U
In digital image processing it is convenient to measure the disparity bu in units of pixel distance ; denote this by u', the image width in pixel by N, the image width in [cm] by xb : N fb b u'=bu-=-N--=-Ny-, xb z Z Xb
(5)
Y =f/z,, If the depth z ranges between zm i n and Zmax, disparity u' ranges from Figure 1 . Stereo geometry and coordinate systems adopted in the text .
206
umin =-Nyb/Zmax~_
u's
ax =-Nyb/Zmin-
(6)
Volume 4, Number 3
PAT IL RN RECOGNITION LETTERS
These equations establish the basic relationships for depth computations if corresponding image points are known .
July 1986
f, in (7) . In order to have a different weighting of ux and u, we modify (10) to F,'=u,'+fi'u,, .
(12)
The direct solution 3 . Computation of optical flow (13)
u"= - .fr, J,
In a time sequence of images showing a moving object one may view the intensity J(x, y, l) at every image point to have an associated velocity vector (u(x, y), u(x, y)), . The field of velocity vectors is called optical flow and it is shown in [9] that the relation fr u+f .u+J, =o
(7)
holds where f,-, J',., f, denote partial derivatives of f(x, y, f) with respect to x, y, t, respectively . Because on a larger moving object the velocity is changing only slowly, the smootheness constraint h ;=u
+u, +u +u ;.
in (11) might he used if Jx#0, but a specialization of (9) allows one to compute an approximation of u even if f,=0 and the value of u* is very noisy . The iterative solution in this case reduces errors . This is possible because (9) incorporates a neighborhood of a point to judge continuity . In fact, purely row-wise computation of u could be considered because disparity is only in the .v-direction . The prevalence of the x-direction is evident in (12) from the factor 72 <1 . Therefore, again a minimization problem is introduced by modifying (9) with (10, 11) to
(8) f -
is introduced . The problem is to find a velocity field which is smooth, that is minimizes (8), and satisfies (7) . Introducing the Lagrange multiplier a 2 velocities u and u have to be computed which minimize F
II Qf,.u+f,u+f]
(9)
(14)
Using variational calculus the condition (15)
results . This equation can be solved iteratively for u(x, y) as shown below . The terms f(x, y, b) and u(x, y) are replaced by their discrete versions f(i, j, k) and u(i, j), respectively, where i is the column index and j the row index . A discrete approximation of u,,+f 2 u 1 ,, is according to Figure 2 u,~+f3-u, .,-ir(1,1)-tr(1,J)
and f'(x,y)=f(-v,_v h) . (10)
Because of the geometry adopted in Figure 1 there is only a velocity component in the x-direction, which is equivalent to the disparity coefficient u, and a=0 . Therefore, (7) and (8) reduce to fr u+f,=O
+a 2 [u, +$ 2 u2] dxdy
fu+ .ffv=a - ( u„+a 2 ur . )
The iterative solution of this variational problem is given in [9) . In the case of stereo images one may view an image point f 1 (x"y") to be obtained from f R (x', y') by displacing it by the disparity bu(x', y') as given in (1) . To obtain formal equivalence use the camera distance b in Figure 1 to replace i in f( .r,y,t) and write fR(x,Y)= .f(x,y,0)
IA"+L1 2
(F ; ;+a'F,)dxdy .
2
+a22[u I:+u,-+u,+v,"])dxdv .
II
and
Fs=u,+u y .
(11)
Above, f„ - defined in (20), (21) - is the partial derivative with respect to b in Figure 1 and replaces
r-1
132
1-r
1 .1
Y U (x, y )
u(4/)
0 (1 .
r 2R z /la
2
0
13 2
2/1 2
j) .
2
(4 .8/1 2)
fl2s 1
Figure
Computation of a discrete approximation to u .,- fl'u„ 207
Volume 4, Numher 3
PATTERN RECOGNITION LETTERS
where
July 1986
showing that a"" is a weighted mean of u" (that is disparities of neighboring points) and a*, the direct solution for u(i, j) . If f,-0, the influence of the direct solution vanishes and the neighboring points dominate .
u(i,j)=(4+8/32)-1[2u(i-I,j)+2u(i+I, j) +fl 2 (u(i- I, j- 1)+2u(i, j- 1) +u(i+ l, j-1)+u(i-l, j+ 1) +2u(i,j+1)+u(i+l,j+1))] .
4 . Overview of the algorithm
A discrete approximation to f, and fl, will be obtained in Section 5 because these approximations are coupled to the local correspondence criterion . With (16) we obtain from (15) u
A short overview of the proposed method for depth computation is given in Figure 3 . The problem is equivalent to iterative computation of velocities if the right and left stereo images arc considered as images one unit of `time' apart . This allows one to derive an iterative equation for computation of disparity (see (19) in Section 3) which is in direct analogy to computation of velocity as derived in [9] . A deviation from 191 is the introduction of a local correspondence criterion for computation of discrete approximations to f and f, occuring in (15) . This step is essential to allow for a sufficiently large range of depth values and it also provides good initial values of disparity to start the iteration . Iterative depth computation then is fairly straightforward and gives the depth coordinate z .
a' ff L+a~ a fx+a 2 .
For an image with N • N pixels (17) gives a system of linear equations for the N • N unknowns u(i, j), i,j=1, . . .,N . A solution can be computed by Gauss-Seidel iteration u ° =0,
f,.2
a2 +a 2
f fn 0 .+af,
where n is the iteration step . Other initial values for u ° are considered in Section 5 . With the direct solution for u* in (13) equation (18) can be rewritten as A ua„= ce, + f,+a 2 f +a
stereo images
3D scene
~~ a
direct solution
fx .
/b . U
u
z
ft=f(x,y,bl
equivalence "tree I' "displacement b"
sect. 2,3
p"rtiul denvat :ves
1"m! correspondence
sect . 3,5
sect . 5
Figure 3 . The steps of depth computation . 208
depth
fk_ f(x,y,0)
i
~LJ
iterative solution
depth from Csparity
sect . 5
^
Volume 4, Number 3
PATTERN RECOGNITION LETTERS
5 . Iterative computation of disparity
July 1986
If s is determined, it is an approximation to u* . From (23) the local correspondence criterion
5 .1 . The case of small disparity i . (i,j)=(.f,R(i,j)- ./,I (i+s, .i))' A straightforward discrete approximation of f„f,, in (18),(19) would be
f, (1,
j)=VR(i,J)+ .fR(i,J+ 1) 4f,'(1,J)+ .J (1,J+ I)]/4,
/, . s (i,
I) - f I .R (14
l,J)-f''K(1,J),
Jv(i,I)=U,°(1,J)+J1,(i,J+1) +J1,'(1+I,J)+fn(ii 1,J+l)]/4, (20)
fn`)(i,J)=fl-(i,J)-fs(i,J),
However, these discrete forms are suitable only for small disparities 0<--u'<2 . A simple numerical example shows that this is not an interesting case . Let the image have N = 128 pixels, let y=f/x,,=2 in (5) and z,,, ;,,=50cnr, then from (6) we get b< -a'z,,, ;,,/(Ny)=0 .4em . Such a small stereo basis could only be obtained by moving one camera 0,4 cm to the left, but not by a pair of cameras . In general, small disparities result in a small range of depth values . Therefore, approximations to f-, f, should not be computed by (20) . Instead we introduce f(iJ)=f li+s,J)-f x li,J),
s=0,l,2, . . .,
5 .2 . A local correspondence criterion Computation of s in (21) is based on what is called in this paper a local correspondence criterion L . The range of s follows from (6) : (22)
The basic idea for the determination of s is that for the proper value of s one should observe at some fixed row j„ f,R(i,Jo)= .f i+S,Io), ./ (i,Jo)=f,~(i+ I, j( ,)=0 .
`+A(i+ I,J)')
(24)
is derived . The value of s is selected which minimizes L . This is done by a search over the discrete values of s given in (22) . Let s* be the value of s minimizing l . in (24) ; the value s* is computed in every point of the image array . Since s* is integer, but a' need not be integer, a refined value of u' is computed by filling a parabole av 2 +hxi c through three points (k,L''(i, j)), k-i-i s-l,i+s,i+s+1 . The minimum of this parabola is the refined value u'-s*+ (L' -l (i, j)-L' -l (i,J))/(4a), a=0 .5(L'-'(i, .i)-2L`(i,J)+1."'(i,J)) .
(25)
Since I/a is the radius of curvature in the minimum of the parabola, a is also a measure of reliability . A large value of a indicates high curvature and a well defined minimum . An estimate of f,. is f(i,J)=(/ (i,I)+f,'-(i+s*,J))/2 .
(26a)
(21)
as a generalization of (20) . The reason for introducing s is obvious . If the disparity is too large to compute reasonable values of f, by (20), the number s effects a shift of f t . i n x-direction to obtain a more accurate approximation of J t' .
A m .,, < SG amux
4 ( .f'(i,A
(23)
A weighted of f, is justified because in (I8, 19) f,, determines only the relative weight of a and u* . A reasonable weight is the factor a in (25) giving f,-(i,j)=a(J,K(i,J)+f,
(26b)
A large weight of u* is reasonable only if a strong local correspondence cue - measured by a in (25) - is available . The criterion L in (24) is essential for achieving a large depth range and reliable computation of disparity a by (18, 19) . It would be interesting to investigate the extension of the local correspondence criterion L in (24) also to the case of velocity computation where a two-dimensional version of L is necessary . 5 .3 . The iteration To obtain a fast and reliable iterative solution of (18), (19) it is extremely helpful to start with reason209
PATTERN RECOGNI IION LETTERS
Volume 4, Number 3
able initial values u ° . The following choices are conceivable : u ° =0
(worst case, if nothing is known),
u°=(u,;, ;,,+µ;, ;,,)/2 :5 u,)-u tr°-Su* ~0
u
(see (6)),
(see (25)), if f,' - > T, else,
u* iff:>T, linear interpolation between the 1 values of u* to the left and tright with f,2 > T.
b
(27)
According to some initial experiments the last choice of u ° gave best results . The threshold T in (27) should be selected with respect to a in (IS), (19) ; according to some experiments we chose T=a/3 and a=0 .1 max{fr } .
July 1986
It is helpful to do some bookkeeping in order to avoid unnecessary computation of a" . This may occur at the final steps of iteration if some values U'(i,J) already have reached their stable values . As a compromise between treating every point and no bookkeeping at all we found it useful to treat complete rows ; that means, if the values of u" and u" I in three adjacent rows are the same, the middle row is ignored in further iterations . The goodness of an iterative solution may be measured, for example, by c i . =maxlu"' (i,jo)-u"(i,J°)I/u"- (i,j°), 010
where j° is the row under consideration . c,=~ u p 1(i,j)-a"(i,j)I/N z , (28) (F,; + a °F,. ) .
Figure 4 . Illustration of experiments . (a) right camera image of first scene ; (h) right camera image of second scene ; (c) grey value representation of depth (scale is in cm) computed after I I iterations front scene shown in (a) ; (d) sane as (c) for scene shown in (h) .
210
Vohune 4. Number 3
Criteria for terminating the iteration might be
6 . Experimental results
e1,,< T1 Vjo
e,
Two experiments are shown to demonstrate the feasibility of the above approach . The right camera images are shown in Figure 4 (top) . They show an oblique cube in front of a background . Depth computation after I I iterations of these images with 128 x 128 pixels as shown in the bottom row of Figure 4 was done on a minicomputer (MC 68000-type) in 10-15 minutes CPU-time . In Figure 5 correct and computed depth along the horizontal and vertical lines as indicated in Figure 4 is plotted . Although the experiments were performed only with standard equipment (vidicon carver) they show a promising high precision of about 5% . On the other hand the limitation of the approach is
prespecified maximal error,
n?Nmar
July 1986
PATTERN RECOGNITION LETTERS
maximal number of iterations . (29)
In the experiments reported in Section 6 we chose EI .
After computation of a stable value u"(i, j) for .some n the depth coordinate z is obtained from (4),(5)- This completes depth computation . Only local information in a small pixel neighborhood was used . z
z lcml It
[.m]
-160
- 760
- 740
-740
-720
- 720
20
60
700
xel]
20
60
700
b)
a)
Figure 5 . Correct and computed depth along (a) horizontal line and (b) vertical line shown in Figure 4a . z
z
-760
n
-140
-720 20
60
700
a) Figure 6 . Correct and computed depth along (a) horizontal line and (b) vertical line shown in Figure 4b . 211
Volume 4, Number 3
PATTERN RECOGNITION LETTERS
visible : a single point of wrong correspondence (Figure 5b, right) would need more iterations to fade .
References
[I] Jarvis, R .A . (1983) . A perspective on range finding techniques for computer vision . IEEE Trans . PAM! 5, 122-139 . [21 Hall, E .1 . . and C . A . McPherson (1983) . Three dimensional perception for robot vision . Proc. SPIECoot! on Robotics and Robot Sensing Systems, Vol . 442, San Diego, CA, 117-143 . 131 Yakimovski, Y . and R . Cunningham (1978) . A system for extracting three-dimensional measurements from a pair of TV cameras . CGIP 7, 195-210 . 141 Shapira, R . (1974) . A technique for the reconstruction of a straight-ed_ec wire-frame object from two or more central projections . CGIP 3, 318-325 .
212
July 1986
[5] (irinison, W .E .L . . (1981)) . A computer implementation of a theory of human stereo vision . MIT, At Memo 565, Cambridge . [61 Baker, H .H . (1982) . Depth from edge and intensity based stereo . Report no . STAN CS-82-930, Stanford University, Stanford, CA . [7] Niemann, H ., H . Bunke, I . Hofmann, G . Sagerer, F . Wolf and H . 1 eistel (1985) . A knowledge based system for analysis of gated blood pool studies . IEEE Irons. PAM! 7, 246-259 . [8] Akita, K . (1984) . Image sequences analysis of real world human motion . Pattern Recognition 17, 73-83 . [9] Horn, B .K .P . and B .G . Schanck (1981) . Determining optical flow . Al 17, 185-203 . [10] Nagel, 11 .-H . (1983) . Displacement vectors derived from second-order intensity variations in image sequences . CVGIP 21, 85-117[I I] Ballard . D .H . and C .M . Brown (1982) . Computer Vision . Prentice-Hall, Englewood Cliffs . NJ .