3D reconstruction in a constrained camera system


Pattern Recognition Letters 23 (2002) 1337–1347 www.elsevier.com/locate/patrec

Jin Won Gu a, Joon Hee Han b,*

a Technical Research Center, Munhwa Broadcasting Corp., 31 Yoido-dong, Youngdungpo-gu, Seoul 150-728, South Korea
b Department of Computer Science and Engineering, Pohang University of Science and Technology, San 31, Hyo-Ja Dong, Pohang 790-784, South Korea

Received 25 June 2001; received in revised form 23 October 2001

Abstract

Recently, there has been active research on structure from discrete motion and on shape from image sequences. To compute metric shapes, one may use a stereo algorithm with calibration parameters, or algorithms with a self-calibration method from three or more images. In this paper, we show a method of computing shape from two images without using a calibration target. First, the camera system is constrained so that the intrinsic and extrinsic parameters are limited to a minimum number. Then the constraints are relaxed step by step until real situations can be handled, and the parameter values computed for the restricted system are used as the initial values for the more general configuration. This method can be used when the motion between the two cameras is restricted. Experimental results with real images show the validity of our algorithm. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Stereo; Structure from motion; Epipolar geometry; 3D reconstruction; Camera calibration

1. Introduction

Stereo and image-sequence analysis have been studied extensively to obtain 3D structure from intensity images. To obtain a metric structure from a pair of images, camera calibration is needed (Tsai, 1987; Faugeras and Toscani, 1986). Motion and optical flow are traditional computer vision problems, and several different methods have been studied depending upon the constraints imposed. Tomasi and Kanade (1992) used singular value decomposition to factor motion and shape information from projected data under the assumption of orthographic projection. Longuet-Higgins (1981) computed 3D structural information from image correspondences with known camera parameters. Only recently have researchers obtained practical algorithms for computing a metric structure from image correspondences in the projective case. Several self-calibration methods, in which the camera calibration parameters are computed from image correspondences under perspective projection, have been proposed (Faugeras et al., 1992; Hartley, 1997; Pollefeys and Gool, 1999). A theoretical method of direct computation of focal

* Corresponding author. Tel.: +82-54-279-2247; fax: +82-54-279-2299. E-mail address: [email protected] (J.H. Han).


length from the fundamental matrix has been suggested, assuming that the focal length is the only unknown parameter (Kanatani and Matsunaga, 2000). While stereo algorithms need a calibration target to compute metric shapes (Tsai, 1987; Faugeras and Toscani, 1986), self-calibration methods compute the camera parameters from image correspondences (Faugeras et al., 1992; Pollefeys et al., 1998). A stereo method uses two images, while self-calibration methods need more than two images unless some constraints are imposed (Pollefeys et al., 1998; Hartley and Zisserman, 2000). Brooks et al. (1996) have shown a method of self-calibration from two images with special constraints: they consider the self-calibration of cameras arranged on a lateral stereo rig with coplanar axes, and show closed-form solutions for a limited number of parameters from fundamental matrices. Some researchers have used a turntable to compute 3D structures. Han et al. (1994) analyzed the trajectories of surface points to compute their 3D locations, Zheng et al. (1997) analyzed the trajectories of specular reflections on a smooth object to recover 3D structure, and Mendonça et al. (2001) analyzed the boundaries of a rotating object. These methods use many images taken at small angle intervals.

In this paper, we present a method of computing a 3D shape from a pair of images without using a calibration target. We therefore have both the advantage of the stereo method and that of self-calibration. Since we use only two images, our method differs from the turntable methods. We start with a constrained system for which the depth information is explicitly computed, and then relax the constraints until we can handle a much more general situation in which 10 parameters can be computed.

2. Constraints and reconstruction

To increase the accuracy of convergence in the minimization process of the parameter-computation step, and to decrease the dependency on the initial values, the extrinsic parameters as well as the intrinsic parameters are constrained. We show which parameters are constrained and how, and explain the method of reconstruction from image correspondences.

2.1. Extrinsic parameter constraints

We restrict the initial configuration and derive formulas that compute the depth of an object explicitly; the constraints are then relaxed to handle general (real) situations. First, we consider the most constrained case. Assume the camera system is as shown in Fig. 1. There are two pinhole cameras, with optical centers $C_1$ and $C_2$ respectively, whose optical axes intersect each other and whose Y-axes are parallel. The origin $O$ of the world coordinate system is located at the intersection of the two optical axes, the Z-axis is aligned with the camera center $C_1$ of the first camera, and the world Y-axis is parallel to the Y-axes of $C_1$ and $C_2$. Let $d_1$ and $d_2$ be the distances from $C_1$ and $C_2$ to $O$, respectively, and let $\theta$ be the angle between $\overrightarrow{OC_1}$ and $\overrightarrow{OC_2}$. The extrinsic parameters of this system are then $d_1$, $d_2$ and $\theta$ (see Fig. 1).

Fig. 1. A constrained camera system.

2.2. Intrinsic parameter constraints

Usually the intrinsic parameters of a pinhole camera model are $a_u$, $a_v$, $s$, $u_0$ and $v_0$ (Faugeras, 1993;

Xu and Zhang, 1996). In general, the aspect ratio of the CCD array of a digital camera is close to 1, there is little skew, and the principal point is near the center of the image. We therefore assume that $a_u = a_v = a$, $s = 0$ (Hartley, 1998), and $(u_0, v_0) = (0, 0)$ for convenience of calculation. In this case, the intrinsic camera calibration matrix is the $3 \times 3$ diagonal matrix

$$A = \mathrm{diag}(a, a, 1). \tag{1}$$

Then the projection matrix $P_1$ of the first camera is

$$P_1 = [A \mid \mathbf{0}] \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -1 & d_1 \\ 0 & 0 & 0 & 1 \end{bmatrix},$$

where $\mathbf{0} = [0\ 0\ 0]^T$. The second camera has the same intrinsic parameters as the first, and its projection matrix is obtained by a transformation of $P_1$: a rotation around the Y-axis, followed by a translation along the Z-axis of the world coordinate system. The second projection matrix $P_2$ is obtained by substituting $d_2$ for $d_1$ in $P_1$ and multiplying by the rotation matrix $R_y(-\theta)$:

$$P_2 = [A \mid \mathbf{0}] \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -1 & d_2 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\theta & 0 & -\sin\theta & 0 \\ 0 & 1 & 0 & 0 \\ \sin\theta & 0 & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$

Given a 3D point $M = (X, Y, Z)^T$ in world coordinates, the homogeneous coordinates of the image of $M$ in the first and second image planes, $p_1$ and $p_2$, are

$$p_1 = P_1 \begin{bmatrix} M \\ 1 \end{bmatrix} = \begin{bmatrix} aX \\ aY \\ d_1 - Z \end{bmatrix}, \tag{2}$$

$$p_2 = P_2 \begin{bmatrix} M \\ 1 \end{bmatrix} = \begin{bmatrix} a(X\cos\theta - Z\sin\theta) \\ aY \\ d_2 - (Z\cos\theta + X\sin\theta) \end{bmatrix}. \tag{3}$$

2.3. 3D reconstructed coordinates

From (2) and (3), the inhomogeneous image coordinates $(x_1, y_1)^T$ of $p_1$ and $(x_2, y_2)^T$ of $p_2$ are

$$x_1 = \frac{aX}{d_1 - Z}, \qquad y_1 = \frac{aY}{d_1 - Z} \tag{4}$$

and

$$x_2 = \frac{a(X\cos\theta - Z\sin\theta)}{d_2 - (Z\cos\theta + X\sin\theta)}, \qquad y_2 = \frac{aY}{d_2 - (Z\cos\theta + X\sin\theta)}. \tag{5}$$
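To make the constrained projection concrete, the following is a minimal sketch of Eqs. (2)–(5) (our illustration, not code from the paper; the function name and argument layout are assumptions):

```python
import numpy as np

def project(M, a, d1, d2, theta):
    """Project a world point M = (X, Y, Z) into both constrained cameras,
    per Eqs. (2)-(5): common focal length a, zero skew, centered principal
    point, and optical axes that meet at the world origin."""
    X, Y, Z = M
    c, s = np.cos(theta), np.sin(theta)
    # First camera, Eq. (4): projective depth is d1 - Z.
    x1, y1 = a * X / (d1 - Z), a * Y / (d1 - Z)
    # Second camera, Eq. (5): rotation by -theta about Y, depth d2 - (...).
    w2 = d2 - (Z * c + X * s)
    x2, y2 = a * (X * c - Z * s) / w2, a * Y / w2
    return (x1, y1), (x2, y2)
```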

Then, from (4),

$$X = \frac{1}{a}x_1(d_1 - Z) \quad \text{and} \quad Y = \frac{1}{a}y_1(d_1 - Z), \tag{6}$$

and we obtain two values of $Z$ from (5), denoted $Z_1$ and $Z_2$. Replacing $X$ and $Y$ with the expressions above gives

$$Z_1 = \frac{x_2 a d_2 - x_1 a d_1\cos\theta - x_1 x_2 d_1\sin\theta}{-a^2\sin\theta + a(x_2 - x_1)\cos\theta - x_1 x_2\sin\theta}, \qquad Z_2 = \frac{y_2 a d_2 - y_1 a d_1 - x_1 y_2 d_1\sin\theta}{y_2 a\cos\theta - y_1 a - x_1 y_2\sin\theta}.$$

Ideally, $Z_1$ and $Z_2$ should be equal.

2.4. Minimization of error function

In $X$, $Y$ and $Z$ ($Z_1$ and $Z_2$), $d_1$ and $d_2$ are common scale factors, and their ratio $d_2/d_1$ is constant because the cameras are fixed. We can therefore eliminate the parameter $d_1$ by dividing all 3D coordinates $X$, $Y$, $Z$ by $d_1$, so that $d_1 = 1$ and $d_2/d_1 = r$. Consequently, there are three parameters to be computed: $a$, $r$ and $\theta$. $Z_1$ and $Z_2$ can be rewritten as

$$Z_1 = \frac{x_2 a r - x_1 a\cos\theta - x_1 x_2\sin\theta}{-a^2\sin\theta + a(x_2 - x_1)\cos\theta - x_1 x_2\sin\theta} \tag{7}$$

and

$$Z_2 = \frac{y_2 a r - y_1 a - x_1 y_2\sin\theta}{y_2 a\cos\theta - y_1 a - x_1 y_2\sin\theta} \tag{8}$$
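A corresponding sketch of the two depth estimates (7) and (8), under the same scale convention $d_1 = 1$, $d_2 = r$ (again our illustration, not the authors' implementation):

```python
import numpy as np

def depth_estimates(x1, y1, x2, y2, a, r, theta):
    """Two depth estimates for one correspondence, Eqs. (7)-(8).
    With noise-free data and the correct (a, r, theta), z1 == z2."""
    c, s = np.cos(theta), np.sin(theta)
    z1 = ((x2 * a * r - x1 * a * c - x1 * x2 * s)
          / (-a**2 * s + a * (x2 - x1) * c - x1 * x2 * s))
    z2 = ((y2 * a * r - y1 * a - x1 * y2 * s)
          / (y2 * a * c - y1 * a - x1 * y2 * s))
    return z1, z2
```

A quick consistency check is to project a synthetic point with the `project` sketch above and confirm that both estimates recover its Z coordinate.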

For each correspondence, $Z_1$ must equal $Z_2$. The objective function used to find the parameters combines two types of errors: the difference between $Z_1$ and $Z_2$, and the difference between the image data and the reprojected data. Let $M' = [X\ Y\ Z_1]$ (or $[X\ Y\ Z_2]$), where $X$, $Y$ and $Z_1$ (or $Z_2$) are given in (6) and (8), and let $[a_1\ a_2\ a_3]^T = P_1[M'\ 1]^T$ with $x_1' = a_1/a_3$, $y_1' = a_2/a_3$, and $[b_1\ b_2\ b_3]^T = P_2[M'\ 1]^T$ with $x_2' = b_1/b_3$, $y_2' = b_2/b_3$. The reprojection errors are then defined as

$$f_1 = \left\| \begin{bmatrix} x_1 - x_1' \\ y_1 - y_1' \end{bmatrix} \right\|^2 \quad \text{and} \quad f_2 = \left\| \begin{bmatrix} x_2 - x_2' \\ y_2 - y_2' \end{bmatrix} \right\|^2.$$

Using $f_1$ and $f_2$, we define the error function as

$$f_e(a, r, \theta) = \sum \left( (Z_1 - Z_2)^2 + f_1 + f_2 \right). \tag{9}$$

That is, the error of the projection of a reconstructed 3D point onto both image planes should be minimized, as well as $(Z_1 - Z_2)^2$.

2.5. Analysis in epipolar geometry

We can also derive the error function using epipolar geometry. In epipolar geometry, the fundamental matrix contains the camera parameter information. The constraints we impose therefore affect the form of the fundamental matrix, and some parameters can be obtained from it (Hartley, 1992). The fundamental matrix is written as

$$F = A^{-T} E A^{-1} = A^{-T} [t]_\times R A^{-1}, \tag{10}$$

where $A$ is the camera calibration matrix, $[t]_\times$ is the skew-symmetric matrix made from the translation vector $t$, and $t$ and $R$ represent the translation and rotation of the second camera relative to the first. $E$ is the essential matrix. Fig. 2 shows the vector $t$ in the Z–X plane of the world coordinate system, where $O_1$ and $O_2$ represent the first and second camera centers, respectively. Note that the scale is relative to $d_1$; that is, $d_1 = 1$ and $r = d_2/d_1$. The translation vector $t$ is shown in Fig. 2, and $R$ can be computed from the constraints:

$$t = \begin{bmatrix} -r\sin\theta \\ 0 \\ r\cos\theta - 1 \end{bmatrix} \quad \text{and} \quad R = \begin{bmatrix} \cos\theta & 0 & -\sin\theta \\ 0 & 1 & 0 \\ \sin\theta & 0 & \cos\theta \end{bmatrix}.$$

Fig. 2. The translation vector t in the X–Z plane of the world coordinate system.

Thus, the essential matrix $E$ and the fundamental matrix $F$ are

$$E = [t]_\times R = \begin{bmatrix} 0 & 1 - r\cos\theta & 0 \\ r - \cos\theta & 0 & \sin\theta \\ 0 & -r\sin\theta & 0 \end{bmatrix},$$

$$F = \begin{bmatrix} 0 & \frac{1}{a^2}(1 - r\cos\theta) & 0 \\ \frac{1}{a^2}(r - \cos\theta) & 0 & \frac{1}{a}\sin\theta \\ 0 & -\frac{1}{a}r\sin\theta & 0 \end{bmatrix}. \tag{11}$$

$F$ must satisfy

$$\tilde{m}^T F \tilde{m}' = 0, \tag{12}$$

where $\tilde{m} = [x_1\ y_1\ 1]^T$ and $\tilde{m}' = [x_2\ y_2\ 1]^T$. Using (12), we have another error function,

$$f_e(a, r, \theta) = \sum \left( (\tilde{m}^T F \tilde{m}')^2 + f_1 + f_2 \right), \tag{13}$$

which is similar to (9). The error function (9) minimizes the distance error between Z1 and Z2 in 3D space, while function (13) minimizes the distance from an image point to a corresponding epipolar line in the image. Luong and Faugeras (1997) used a minimization method based on epipolar geometry. The minimization function (13) includes the epipolar constraints as well as the reprojection terms f1 and f2 .
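To show how the two error functions fit together, here is a compact sketch (our own illustration; `project` and `depth_estimates` are the sketches above, and the sign conventions of F follow the reconstruction of Eq. (11)):

```python
import numpy as np

def fundamental_matrix(a, r, theta):
    """F of Eq. (11) for the constrained configuration (d1 = 1, d2 = r)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[0.0,            (1.0 - r * c) / a**2, 0.0  ],
                     [(r - c) / a**2, 0.0,                  s / a],
                     [0.0,            -r * s / a,           0.0  ]])

def error_function(params, corrs, use_epipolar=True):
    """Eq. (13) (epipolar residual) or Eq. (9) (depth consistency), each
    combined with the reprojection terms f1, f2 and summed over the
    correspondences (x1, y1, x2, y2) in `corrs`."""
    a, r, theta = params
    F = fundamental_matrix(a, r, theta)
    total = 0.0
    for x1, y1, x2, y2 in corrs:
        z1, z2 = depth_estimates(x1, y1, x2, y2, a, r, theta)
        X, Y = x1 * (1.0 - z1) / a, y1 * (1.0 - z1) / a   # Eq. (6), d1 = 1
        (p1x, p1y), (p2x, p2y) = project((X, Y, z1), a, 1.0, r, theta)
        f1 = (x1 - p1x)**2 + (y1 - p1y)**2                # reprojection, image 1
        f2 = (x2 - p2x)**2 + (y2 - p2y)**2                # reprojection, image 2
        if use_epipolar:
            m, m2 = np.array([x1, y1, 1.0]), np.array([x2, y2, 1.0])
            total += float(m @ F @ m2)**2 + f1 + f2       # Eq. (13)
        else:
            total += (z1 - z2)**2 + f1 + f2               # Eq. (9)
    return total
```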


3. Generalization of the camera configuration

It is very difficult to make a real camera system satisfy the constraints shown in Fig. 1, so we must handle the case in which the constraints are not kept strictly. We therefore relax the constraints step by step.

3.1. Intersection

Assume that the two optical axes do not intersect, and let the distance between them along the Y-axis be $d$. It is easy to derive the second projection matrix $P_2$ by a translation of the camera along the Y-axis, or an inverse translation of the 3D points. Then $P_2$ becomes

$$P_2 = [A \mid \mathbf{0}] \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -1 & d_2 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\theta & 0 & -\sin\theta & 0 \\ 0 & 1 & 0 & -d \\ \sin\theta & 0 & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

and the second image coordinates become

$$p_2 = P_2 \begin{bmatrix} M \\ 1 \end{bmatrix} = \begin{bmatrix} a(X\cos\theta - Z\sin\theta) \\ a(Y - d) \\ d_2 - (Z\cos\theta + X\sin\theta) \end{bmatrix}.$$

From this, we have

$$x_2 = \frac{a(X\cos\theta - Z\sin\theta)}{d_2 - (Z\cos\theta + X\sin\theta)}, \qquad y_2 = \frac{a(Y - d)}{d_2 - (Z\cos\theta + X\sin\theta)}. \tag{14}$$

$P_1$ and the image coordinates of the first camera do not change: $x_1 = aX/(d_1 - Z)$, $y_1 = aY/(d_1 - Z)$. Therefore

$$X = \frac{1}{a}x_1(d_1 - Z), \qquad Y = \frac{1}{a}y_1(d_1 - Z). \tag{15}$$

Replacing $X$, $Y$ in (14) with (15) yields $Z_1$ and $Z_2$:

$$Z_1 = \frac{x_2 a d_2 - x_1 a d_1\cos\theta - x_1 x_2 d_1\sin\theta}{-a^2\sin\theta + a(x_2 - x_1)\cos\theta - x_1 x_2\sin\theta}, \qquad Z_2 = \frac{y_2 a d_2 - y_1 a d_1 - x_1 y_2 d_1\sin\theta + a^2 d}{y_2 a\cos\theta - y_1 a - x_1 y_2\sin\theta}.$$

As before, we use a scale relative to $d_1$; that is, $d_1 = 1$, $d_2/d_1 = d_2 = r$, and $d/d_1 = d = r_0$. The error function then becomes

$$f_e(a, r, r_0, \theta) = \sum \left( (Z_1 - Z_2)^2 + f_1 + f_2 \right). \tag{16}$$

We can also use the fundamental matrix to derive the error function. Since the translation vector $t$ is $[-r\sin\theta\ \ -r_0\ \ r\cos\theta - 1]^T$, we have

$$E = [t]_\times R = \begin{bmatrix} -r_0\sin\theta & 1 - r\cos\theta & -r_0\cos\theta \\ r - \cos\theta & 0 & \sin\theta \\ r_0\cos\theta & -r\sin\theta & -r_0\sin\theta \end{bmatrix},$$

$$F = \begin{bmatrix} -\frac{1}{a^2}r_0\sin\theta & \frac{1}{a^2}(1 - r\cos\theta) & -\frac{1}{a}r_0\cos\theta \\ \frac{1}{a^2}(r - \cos\theta) & 0 & \frac{1}{a}\sin\theta \\ \frac{1}{a}r_0\cos\theta & -\frac{1}{a}r\sin\theta & -r_0\sin\theta \end{bmatrix} \tag{17}$$

and the parameters should minimize the error function

$$f_e(a, r, r_0, \theta) = \sum \left( (\tilde{m}^T F \tilde{m}')^2 + f_1 + f_2 \right). \tag{18}$$

3.2. Relaxation of the angle constraint

It is difficult to make the camera system keep the angle constraint, i.e., an exact rotation around the Y-axis, so this constraint must also be relaxed. This is done by replacing $R_y(-\theta)$ with an arbitrary rotation matrix in $P_2$. Let the errors of the rotational angles around the X-, Y- and Z-axes be $\theta_x$, $(\theta_y - \theta)$ and $\theta_z$, respectively. Then the arbitrary rotation matrix $R' = [r'_{ij}]$ is

$$R' = R_z(-\theta_z) R_y(-\theta_y) R_x(-\theta_x) \tag{19}$$

and the homogeneous coordinates of the image point of the second camera are

$$\begin{bmatrix} a(r'_{11}X + r'_{12}Y + r'_{13}Z) \\ a(r'_{21}X + r'_{22}Y + r'_{23}Z - d) \\ d_2 - (r'_{31}X + r'_{32}Y + r'_{33}Z) \end{bmatrix},$$

so the inhomogeneous image coordinates $(x_2, y_2)$ are

$$x_2 = \frac{a(r'_{11}X + r'_{12}Y + r'_{13}Z)}{d_2 - (r'_{31}X + r'_{32}Y + r'_{33}Z)}, \qquad y_2 = \frac{a(r'_{21}X + r'_{22}Y + r'_{23}Z - d)}{d_2 - (r'_{31}X + r'_{32}Y + r'_{33}Z)}. \tag{20}$$


The two values of $Z$, denoted $Z_1$ and $Z_2$ and computed from each of the equations in (20), are

$$Z_1 = \frac{x_2(ad_2 - x_1 d_1 r'_{31}) + (-x_2 y_1 r'_{32} - x_1 a r'_{11} - y_1 a r'_{12})\,d_1}{a^2 r'_{13} - (x_1 r'_{11} + y_1 r'_{12} - x_2 r'_{33})\,a - x_1 x_2 r'_{31} - x_2 y_1 r'_{32}},$$

$$Z_2 = \frac{y_2(ad_2 - x_1 d_1 r'_{31}) + (-y_1 y_2 r'_{32} - x_1 a r'_{21} - y_1 a r'_{22})\,d_1 + a^2 d}{a^2 r'_{23} - (x_1 r'_{21} + y_1 r'_{22} - y_2 r'_{33})\,a - x_1 y_2 r'_{31} - y_1 y_2 r'_{32}}$$

and the error function becomes

$$f_e(a, r, r_0, \theta_x, \theta_y, \theta_z) = \sum \left( (Z_1 - Z_2)^2 + f_1 + f_2 \right). \tag{21}$$

We can also define an error function that uses the fundamental matrix. The camera rotation matrix $R = [r_{ij}]$, the inverse of $R'$, and $F$ are $R = R_x(\theta_x) R_y(\theta_y) R_z(\theta_z)$ and $F = [f_{ij}]$, where

$$f_{11} = \frac{1}{a^2}\left( r_{21}(1 - r\cos\theta_y) + r_0 r_{31} \right), \quad f_{12} = \frac{1}{a^2}\left( r_{22}(1 - r\cos\theta_y) + r_0 r_{32} \right), \quad f_{13} = \frac{1}{a}\left( r_{23}(1 - r\cos\theta_y) + r_0 r_{33} \right),$$

$$f_{21} = \frac{1}{a^2}\left( r\cos\theta_y\, r_{11} - r\sin\theta_y\, r_{31} \right), \quad f_{22} = \frac{1}{a^2}\left( r\cos\theta_y\, r_{12} - r\sin\theta_y\, r_{32} \right), \quad f_{23} = \frac{1}{a}\left( r\cos\theta_y\, r_{13} - r\sin\theta_y\, r_{33} \right),$$

$$f_{31} = \frac{1}{a}\left( -r_0 r_{11} + r\sin\theta_y\, r_{21} \right), \quad f_{32} = \frac{1}{a}\left( -r_0 r_{12} + r\sin\theta_y\, r_{22} \right), \quad f_{33} = -r_0 r_{13} + r\sin\theta_y\, r_{23}.$$

The error function becomes

$$f_e(a, r, r_0, \theta_x, \theta_y, \theta_z) = \sum \left( (\tilde{m}^T F \tilde{m}')^2 + f_1 + f_2 \right). \tag{22}$$

Note that $\theta_x$, $\theta_y$ and $\theta_z$ in (22) remove the angle constraint.

3.3. Intrinsic parameters

We also remove the constraints on the number of intrinsic parameters by considering all five of them. The camera calibration matrix $A'$ is then used instead of $A$:

$$A' = \begin{bmatrix} a_u & s & u_0 \\ 0 & a_v & v_0 \\ 0 & 0 & 1 \end{bmatrix}.$$

The fundamental matrix becomes $F = A'^{-T}[t]_\times R A'^{-1}$, and the reconstructed 3D coordinates $(X, Y, Z)$, using $F$, are

$$Z = \frac{C_4 + C_5 - a_u a_v (x_2 r + r_0 s)}{C_4 + C_5 - C_6}, \quad X = \frac{\left(a_v(x_1 - u_0) - s(y_1 - v_0)\right)(1 - Z)}{a_u a_v}, \quad Y = \frac{(y_1 - v_0)(1 - Z)}{a_v}, \tag{23}$$

where

$$C_1 = r_{11} a_u + r_{21} s + r_{31}(x_2 - u_0), \quad C_2 = r_{12} a_u + r_{22} s + r_{32}(x_2 - u_0), \quad C_3 = r_{13} a_u + r_{23} s + r_{33}(x_2 - u_0),$$

$$C_4 = C_1 \left( a_v(x_1 - u_0) - s(y_1 - v_0) \right), \quad C_5 = C_2 a_u (y_1 - v_0), \quad C_6 = C_3 a_u a_v.$$

The error function $f_e$ is then

$$f_e(p) = \sum \left( (\tilde{m}^T F \tilde{m}')^2 + f_1 + f_2 \right), \tag{24}$$

where $p = [a_u, a_v, s, u_0, v_0, r, r_0, \theta_x, \theta_y, \theta_z]^T$.
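As an illustration of the final reconstruction step (23), here is a sketch under the stated conventions ($d_1 = 1$; R is the rotation matrix $[r_{ij}]$). It is our own illustration with assumed names, not the authors' code:

```python
import numpy as np

def reconstruct_point(x1, y1, x2, y2, R, p):
    """3D coordinates of a correspondence from Eq. (23), given the rotation
    R = [r_ij] and the parameters p = (au, av, s, u0, v0, r, r0)."""
    au, av, s, u0, v0, r, r0 = p
    # C1..C3 couple the rotation with the second-image x-coordinate.
    C1 = R[0, 0] * au + R[1, 0] * s + R[2, 0] * (x2 - u0)
    C2 = R[0, 1] * au + R[1, 1] * s + R[2, 1] * (x2 - u0)
    C3 = R[0, 2] * au + R[1, 2] * s + R[2, 2] * (x2 - u0)
    C4 = C1 * (av * (x1 - u0) - s * (y1 - v0))
    C5 = C2 * au * (y1 - v0)
    C6 = C3 * au * av
    Z = (C4 + C5 - au * av * (x2 * r + r0 * s)) / (C4 + C5 - C6)
    X = (av * (x1 - u0) - s * (y1 - v0)) * (1.0 - Z) / (au * av)
    Y = (y1 - v0) * (1.0 - Z) / av
    return np.array([X, Y, Z])
```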

4. Algorithm

Minimizing the error function in the 10-dimensional parameter space is very difficult, since there may be locally optimal solutions. It is therefore necessary to start with values near the global minimum, and the constrained configuration is used to find good initial estimates of the parameters. In this section, we describe how the initial values are found, and then describe the general algorithm, which finds the 10 parameters using (24).

4.1. Initialization

4.1.1. r

The initial value of $r$ is 1, by the constraint of small translation along the Z-axis. In real experiments, the results when $r$ is fixed to 1 are not very different from those when it is left free. However, if the difference between $d_1$ and $d_2$ is too large to ignore, $r$ cannot be kept fixed during the minimization.

4.1.2. a

Longuet-Higgins (1981) formulated the relation between two corresponding points under the assumption of a normalized camera; the relation is constrained by the essential matrix $E$. The essential matrix can be decomposed into the product of an antisymmetric matrix and a rotation matrix (Faugeras and Maybank, 1990), and the fundamental matrix is obtained by multiplying the essential matrix by $A^{-T}$ on the left and $A^{-1}$ on the right. From (1) and $F = [f_{ij}] = A^{-T} E A^{-1}$,

$$E = A^T F A = \begin{bmatrix} a^2 f_{11} & a^2 f_{12} & a f_{13} \\ a^2 f_{21} & a^2 f_{22} & a f_{23} \\ a f_{31} & a f_{32} & f_{33} \end{bmatrix}. \tag{25}$$

There are constraints that must be satisfied by an essential matrix (Huang and Faugeras, 1989; Faugeras and Maybank, 1990): one is $\det(E) = 0$, and the other is

$$f(E) = 2\,\mathrm{Tr}\!\left[(EE^T)^2\right] - \left[\mathrm{Tr}(EE^T)\right]^2 = 0. \tag{26}$$

The initial value of $a$ may be obtained by solving (26), but it is difficult to find its solutions directly, both because the degree in $a$ is 8 and because of numerical errors in $F$. However, the previous constraints simplify (26): if the camera system keeps the constraints ideally, many entries of $F$ should be 0, as shown in Eq. (11). Let $F_0$ be the fundamental matrix given in (11), and let the corresponding essential matrix be $E_0$:

$$F_0 = \begin{bmatrix} 0 & f_{12} & 0 \\ f_{21} & 0 & f_{23} \\ 0 & f_{32} & 0 \end{bmatrix}, \qquad E_0 = A^T F_0 A.$$

By applying the constraint (26) to $E_0$, we have

$$f(E_0) = \left[ (f_{12}^2 - f_{21}^2)a^2 - (f_{23}^2 - f_{32}^2) \right]^2 a^4 = 0$$

and, since $a > 0$, we obtain

$$a = \sqrt{\frac{f_{23}^2 - f_{32}^2}{f_{12}^2 - f_{21}^2}}. \tag{27}$$

4.2. θ

Now there is only one parameter, $\theta$, to be initialized. From (25), $E = [e_{ij}]$, where

$$e_{12} = a^2 f_{12}, \quad e_{21} = a^2 f_{21}, \quad e_{23} = a f_{23}, \quad e_{32} = a f_{32},$$

and by (11), with the fact that $r \approx 1$,

$$e_{12} \approx e_{21} \approx s(1 - \cos\theta), \qquad e_{23} \approx -e_{32} \approx s\sin\theta,$$

where $s$ is an unknown scale. Let $c_1 = (e_{12} + e_{21})/2$ and $c_2 = (-e_{23} + e_{32})/2$. Then the above can be rewritten as

$$c_1 = s(1 - \cos\theta), \qquad c_2 = -s\sin\theta \tag{28}$$

and $\theta$ can be computed. If (28) has multiple solutions for $\theta$, the angle that gives the minimum value of $f_e$ in (13) is chosen and used as the initial value of $\theta$.

4.3. Algorithm outline

The reconstruction algorithm proposed in this paper is as follows:

1. Compute F from the given correspondences (Hartley, 1995).
2. Compute the initial values of a and θ by (27) and (28).
3. (Optional) Using the result from step 2, with r fixed to 1, choose a and θ to minimize (13).
4. Using the result from step 2 or 3, starting from r = 1, choose a, r and θ to minimize (13).
5. Using the result from step 4, starting from r0 = 0, choose the parameters that minimize (18).
6. Using the result from step 5, starting from θx = θz = 0 and θy = θ, choose the parameters that minimize (22).
7. Using the result from step 6, starting from au = av = a and s = u0 = v0 = 0, choose the parameters that minimize (24).

In the algorithm, the optional step 3 can be omitted when d1 and d2 are not similar to each other. Similar performance can be obtained by using (9), (16) and (21) in place of (13), (18) and (22), respectively.
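The initialization of Sections 4.1 and 4.2 and the staged minimization above can be sketched as follows. This is our own illustration: the paper solves the polynomial step with an eigenvalue method and minimizes with Powell's direction set method from Numerical Recipes, for which SciPy's Powell minimizer is substituted here, and `error_fn` is an assumed callback that evaluates the stage-appropriate error function ((13), (18), (22) or (24)) on the full parameter vector.

```python
import numpy as np
from scipy.optimize import minimize

def initial_a_theta(F):
    """Initial a and theta from Eqs. (27)-(28), assuming F keeps the sparse,
    near-ideal form of Eq. (11). tan(theta/2) = c1/(-c2) is invariant to the
    arbitrary overall sign of F; if several angles remain plausible, the
    paper keeps the one minimizing the error (13)."""
    a = np.sqrt((F[1, 2]**2 - F[2, 1]**2) / (F[0, 1]**2 - F[1, 0]**2))
    e12, e21 = a**2 * F[0, 1], a**2 * F[1, 0]
    e23, e32 = a * F[1, 2], a * F[2, 1]
    c1, c2 = (e12 + e21) / 2.0, (-e23 + e32) / 2.0
    theta = 2.0 * np.arctan(-c1 / c2)
    return a, theta

def run_stages(error_fn, a0, theta0):
    """Steps 3-7: progressively free parameters, reusing each stage's result
    as the next stage's starting point (control flow only)."""
    # p = [au, av, s, u0, v0, r, r0, theta_x, theta_y, theta_z]
    p = np.array([a0, a0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, theta0, 0.0])
    stages = [(3, [0, 8]),              # step 3: a, theta (r fixed to 1)
              (4, [0, 5, 8]),           # step 4: + r
              (5, [0, 5, 6, 8]),        # step 5: + r0
              (6, [0, 5, 6, 7, 8, 9]),  # step 6: + theta_x, theta_z
              (7, list(range(10)))]     # step 7: all 10 parameters
    for stage, indices in stages:
        idx = np.array(indices)
        def objective(x, idx=idx, stage=stage):
            q = p.copy()
            q[idx] = x
            if stage < 7:
                q[1] = q[0]             # av tracks au until step 7
            return error_fn(stage, q)
        p[idx] = minimize(objective, p[idx], method='Powell').x
    return p
```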

Table 1
Results from each step of the example box

Step    a         r        r0       θx        θy (θ)      θz        fe
3       896.166   (1)      (0)      (0)       12.5230°    (0)       87.726
4       896.166   1.0046   (0)      (0)       12.8171°    (0)       65.196
5       896.166   1.0044   0.0002   (0)       12.8330°    (0)       62.691
6       896.166   1.0043   0.0001   0.0730°   13.0296°    0.2412°   37.450

Numerals in parentheses are values at which the parameters were held fixed during the minimization. The change in a over steps 3-6 is very small.

5. Experimental results

We show experimental results with real images taken by a digital camera. The input image size is 768 × 512, the corresponding pairs are given manually, and the second image of each pair is captured after the object is rotated on a table, so that the camera system satisfies our constraints. The first pair of input images is shown in Fig. 3. The number of corresponding pairs is 62, and the computed F is

$$F = \begin{bmatrix} 0.00000241 & 0.00009607 & 0.00828702 \\ 0.00006598 & 0.00000552 & 0.63406990 \\ 0.01023987 & 0.63146694 & 0.44612926 \end{bmatrix}.$$

In step 2, $a$ and $\theta$ are

$$a = 896.166138, \qquad \theta = 13.092499°.$$

With these initial values, the results from each step are shown in Table 1. We used the "Eigenvalue Method" to solve polynomial equations and Powell's "Direction Set Methods in Multidimensions" to minimize the error functions (Press et al., 1992). The parameters computed in the last step of the algorithm are:

$$a_u = 896.166, \quad a_v = 896.166, \quad s = 0.000345, \quad u_0 = 0.000083, \quad v_0 = 0.000132,$$
$$r = 1.0043, \quad r_0 = 0.0001, \quad \theta_x = 0.0729°, \quad \theta_y = 13.0296°, \quad \theta_z = 0.2412°.$$

As mentioned above, the image coordinates were translated so that the origin lies at the center of the image; hence $u_0$ and $v_0$ are close to 0.

Fig. 3. A pair of box images.


Fig. 4. A 3D reconstruction of the example box with texture.

Fig. 5. A pair of images of a model house.

Then we can compute the 3D world coordinates by (23). The reconstruction result with texture is shown in Fig. 4.

The setting of the second example is similar to that of the first. The input images are shown in Fig. 5 and the number of correspondences is 246. The initial values in step 2 are

$$a = 1307.573853, \qquad \theta = 11.881117°$$

and the optimal parameters in the last step are

$$a_u = 1308.065, \quad a_v = 1308.065, \quad s = 0.000297, \quad u_0 = 0.001376, \quad v_0 = 0.000068,$$
$$r = 1.0421, \quad r_0 = 0.0003, \quad \theta_x = 0.2107°, \quad \theta_y = 11.8940°, \quad \theta_z = 0.1378°.$$

In Table 2, the parameters from each step are listed. In steps 3 and 4, θy changed considerably, because the distance ratio r was much further from 1 than in the box example, and the error fe of step 3 is much larger than that of step 4. It is clear that

Table 2
Parameter values of the model house example

Step    a          r        θy (θ)      fe
3       1307.578   (1)      3.5609°     2615.231
4       1307.577   1.0438   11.2230°    263.307
5       1307.577   1.0439   11.4350°    221.061
6       1307.577   1.0421   11.8534°    165.158

The value of fe dropped considerably in step 4.


Fig. 6. A 3D reconstruction of a model house with texture.

the parameters in step 3 are wrong, but the algorithm found the correct parameters in the next step. The reconstructed object with texture is shown in Fig. 6. The number of iterations depends upon the input image. For the images used in the experiments, the total number of iterations (steps 1–9) was less than 15; among these, step 4 took about 7 iterations. The computation took about two seconds on a personal computer with a Pentium III MMX 450 MHz processor.

6. Conclusion

A method of computing shapes from a pair of images without using a calibration target has been suggested. This is done by constraining the camera configuration in such a way that the depth information can be derived explicitly. In the system, 10 intrinsic and extrinsic parameters are used. Finding 10 parameters using a nonlinear objective function is very difficult; therefore, the initial camera configuration is constrained so that only a limited number of parameters are involved, and the constraints are then relaxed step by step until real situations can be handled. Two error functions are derived, one containing the depth values and the other containing the fundamental matrix. The parameters computed from the constrained system are used as the initial values for the more general configuration. By finding the camera parameters this way, we were able to avoid the problem of being trapped in a local minimum during the optimization. The depth information is directly related to the configuration, so we were able to compute it without using a calibration target or a self-calibration method. Experimental results with real images showed the validity of our method. The limitation of this method is that, even though we relaxed the constraints, the camera motion is still limited to a certain configuration. Future work includes removing the remaining restrictions.

References

Brooks, M., Agapito, L., Huynh, D., Baumela, L., 1996. Direct methods for self-calibration of a moving stereo head. In: Proc. ECCV'96 II, pp. 415–426.
Faugeras, O., 1993. Three-Dimensional Computer Vision. MIT Press, Cambridge.
Faugeras, O., Maybank, S., 1990. Motion from point matches: multiplicity of solutions. Internat. J. Comput. Vision 4, 225–246.
Faugeras, O., Toscani, G., 1986. The calibration problem for stereo. In: Proc. Comput. Vision Pattern Recognition'86, June 22–26, Miami Beach, FL, pp. 15–20.
Faugeras, O., Luong, Q.-T., Maybank, S., 1992. Camera self-calibration: theory and experiments. In: Proc. ECCV'92, pp. 321–334.
Han, J., Kim, M., Poston, T., 1994. Computation of fixed surface features. Pattern Recognition Lett. 15, 587–593.
Hartley, R., 1992. Estimation of relative camera positions for uncalibrated cameras. In: Europ. Conf. on Comput. Vision, pp. 579–587.
Hartley, R., 1995. In defense of the 8-point algorithm. In: Internat. Conf. on Comput. Vision, pp. 1064–1070.
Hartley, R., 1997. Self-calibration of stationary cameras. Internat. J. Comput. Vision 22 (1), 5–23.
Hartley, R., 1998. Minimizing algebraic error in geometric estimation problems. In: Proc. Internat. Conf. on Comput. Vision, pp. 469–476.
Hartley, R., Zisserman, A., 2000. Multiple View Geometry in Computer Vision. Cambridge University Press, Cambridge.
Huang, T., Faugeras, O., 1989. Some properties of the E matrix in two-view motion estimation. IEEE Trans. Pattern Anal. Machine Intell. 11 (12), 1310–1312.
Kanatani, K., Matsunaga, C., 2000. Closed-form expression for focal lengths from the fundamental matrix. In: Proc. ACCV'2000, pp. 128–133.
Longuet-Higgins, H., 1981. A computer algorithm for reconstructing a scene from two projections. Nature 293, 133–135.
Luong, Q.-T., Faugeras, O., 1997. Self-calibration of a moving camera from point correspondences and fundamental matrices. Internat. J. Comput. Vision 22 (3), 261–289.
Mendonça, P., Wong, K.Y., Cipolla, R., 2001. Epipolar geometry from profiles under circular motion. IEEE Trans. Pattern Anal. Machine Intell. 23 (6), 604–616.
Pollefeys, M., Gool, L., 1999. Stratified self-calibration with the modulus constraint. IEEE Trans. Pattern Anal. Machine Intell. 21 (8), 707–724.
Pollefeys, M., Koch, R., Gool, L.-V., 1998. Self-calibration and metric reconstruction in spite of varying and unknown internal camera parameters. In: Internat. Conf. on Comput. Vision, pp. 90–95.
Press, W.H., Teukolsky, S.A., Vetterling, W.T., Flannery, B.P., 1992. Numerical Recipes in C: The Art of Scientific Computing, second ed. Cambridge University Press, Cambridge.
Tomasi, C., Kanade, T., 1992. Shape and motion from image streams under orthography: a factorization method. Internat. J. Comput. Vision 9 (2), 137–154.
Tsai, R., 1987. A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses. IEEE J. Robot. Automat. RA-3 (4), 323–344.
Xu, G., Zhang, Z., 1996. Epipolar Geometry in Stereo, Motion and Object Recognition: A Unified Approach. Kluwer Academic Publishers, Dordrecht.
Zheng, J., Fukagawa, Y., Abe, N., 1997. 3D surface estimation and model construction from specular motion in image sequences. IEEE Trans. Pattern Anal. Machine Intell. 19 (5), 513–520.