Optics and Lasers in Engineering 90 (2017) 119–127
Sphere-based calibration method for trinocular vision sensor

Rui Lu a, Mingwei Shao b,⁎

a National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China
b Qingdao Research Center for Advanced Photonic Technologies, Laser Institute of Shandong Academy of Sciences, Qingdao 266100, China
Keywords: Sphere; Trinocular vision; Calibration; Feature matching

ABSTRACT
A new method to calibrate a trinocular vision sensor is proposed, and two main tasks are accomplished in this paper: determining the transformation matrix between each two cameras and determining the trifocal tensor of the trinocular vision sensor. A flexible sphere target with several spherical circles is designed. Owing to the isotropy of a sphere, the trifocal tensor of the three cameras can be determined exactly from the features on the sphere target. The fundamental matrix between each two cameras can then be obtained, and a compatible rotation matrix and translation matrix can easily be deduced based on the singular value decomposition of the fundamental matrix. In the proposed calibration method, image points are not required to be in one-to-one correspondence: once image points lying on the same feature are obtained, the transformation matrix between each two cameras and the trifocal tensor of the trinocular vision sensor can be determined. Experimental results show that the proposed calibration method obtains precise results, including measurement and matching results. The root mean square error of distance is 0.026 mm with regard to a view field of about 200×200 mm, and the feature matching of the three images is strict. As the projection of a sphere does not depend on its orientation, the calibration method is robust and easy to operate. Moreover, our calibration method also provides a new approach to obtain the trifocal tensor.
1. Introduction

In a multi-camera vision system, including a stereo vision system, the relative poses of the cameras are fixed. Therefore, determining the transformation matrix (including the translation matrix and the rotation matrix) between each two cameras is significant. Meanwhile, another task is to determine the multi-focal tensor of the multi-camera vision sensor, which is essential for feature matching. The procedure that accomplishes these two tasks is called calibration.

For a stereo vision system, various calibration methods exist, such as planar target-based methods, 1D target-based methods, etc. In planar target-based methods [1–3], different features on the planar target are utilized, e.g. centers of circles, corners, cross points, parallel lines and so on; in any case, the relationships of these features are known exactly. When images of these features are captured by the two cameras, their coordinates (or expressions) under each camera coordinate system can be deduced from the camera model and the constraints of the features. Once the corresponding features are confirmed, the relationship between the two cameras can be determined. In 1D target-based calibration methods [4,5], the essential matrix is calculated from the corresponding feature points. When singular value decomposition is applied to the essential matrix, the rotation matrix and
the translation matrix, which needs a scale factor, can be deduced. As the distance between two feature points is known, the factor can be confirmed easily. The sphere target is also widely used to calibrate stereo vision systems. In [6,7], Agrawal and Zhang calibrate the intrinsic parameters of a camera based on the projection of the sphere target and the obtained dual image of the absolute conic; meanwhile, the relationship between two cameras can be deduced by a 3D point-cloud registration method. However, when there are few feature points, the calibration results are noisier. In [8], two calibration methods are proposed to accomplish the calibration task: one uses sphere centers and points of tangency to obtain the fundamental matrix, while the other uses the homography matrix and epipoles to determine the fundamental matrix. In the sphere-based calibration methods mentioned above, the target must be placed in many different positions to obtain enough feature points, and a mass of calculation is inevitable.

For a multi-camera vision system, which consists of more than two cameras, the relationship of the cameras is also fixed. Normally, a calibration method for a stereo vision system is valid when each two cameras are treated as a stereo vision system. However, as is known, the relationship between two cameras (or two image planes) can be represented by a fundamental matrix.
⁎ Corresponding author. E-mail address: [email protected] (M. Shao).
http://dx.doi.org/10.1016/j.optlaseng.2016.10.004
Received 8 June 2016; received in revised form 2 September 2016; accepted 4 October 2016.
0143-8166/© 2016 Elsevier Ltd. All rights reserved.
Similarly, the relationship of the cameras in a multi-camera system is expressed by a multi-focal tensor; for a trinocular vision sensor, this tensor is the trifocal tensor. When each two cameras are calibrated with a stereo-vision method, the result may be only a local optimum and therefore cannot satisfy the constraint of the trifocal tensor. In addition, the normal calibration method often suffers from self-occlusion because of the different shooting angles of the cameras. For these reasons, a new approach to calibrate the trinocular vision sensor needs to be proposed.

In this paper, a new calibration method for a trinocular vision sensor is presented. A sphere target with several spherical circles is utilized to determine the trifocal tensor. Once the trifocal tensor is determined, the fundamental matrix between each two cameras can be obtained; the rotation matrix and the translation matrix with a scale factor are then deduced from the singular value decomposition of the fundamental matrix, and the scale factor is determined from the known spherical circle. With our method, the calibration result is precise and the problem of self-occlusion is avoided owing to the isotropy of a sphere.

2. Determination of trifocal tensor

On the sphere target there are several spherical circles, so the available features are the visible outline of the sphere target and the spherical circles. In this section we discuss the relationship between the trifocal tensor and these features.

2.1. Outline of the sphere

The projection of a sphere on the image plane is an ellipse, which can be expressed in matrix form as [6]:

$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix}^{T} \begin{bmatrix} a & c/2 & d/2 \\ c/2 & b & e/2 \\ d/2 & e/2 & f \end{bmatrix} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = 0, \tag{1}$$

where (u, v, 1)^T is the homogeneous coordinate of the projective point on the image plane and a, b, c, d, e, f are the parameters of the elliptical expression. In Fig. 1, O-XYZ is the Camera Coordinate System (CCS) while o-xy is the Image Coordinate System (ICS). Under the CCS, the projection center of the camera is at the origin and the optical axis points in the positive Z direction. Suppose a spatial point P is projected onto the plane Z = f0, referred to as the image plane under the CCS, where f0 is the effective focal length (EFL), and let p = (x, y, 1)^T be the projection of P = (X, Y, Z)^T on the image plane. Under the ideal pinhole imaging model, i.e. the undistorted model of the camera, P, p and the projection center O are collinear, which can be expressed as

$$Z \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} f_0 & 0 & 0 & 0 \\ 0 & f_0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}. \tag{2}$$

Fig. 1. The projection of a sphere under the camera coordinate system.

Practically, the radial distortion and the tangential distortion of the lens are inevitable. Considering the radial distortion, we have

$$\begin{cases} \breve{x} = x\,(1 + k_1 r^2 + k_2 r^4) \\ \breve{y} = y\,(1 + k_1 r^2 + k_2 r^4) \end{cases}, \tag{3}$$

where r^2 = x^2 + y^2, (x̆, y̆)^T is the distorted image coordinate, (x, y)^T is the idealized one, and k1, k2 are the radial distortion coefficients of the lens. According to Eqs. (1) and (2), the matrix representation of the right-circular cone under the CCS is obtained as

$$\begin{bmatrix} X & Y & Z \end{bmatrix} Q \begin{bmatrix} X & Y & Z \end{bmatrix}^{T} = 0, \tag{4}$$

where the matrix Q is defined as [7]:

$$Q = \begin{bmatrix} A & C/2 & D/2 \\ C/2 & B & E/2 \\ D/2 & E/2 & F \end{bmatrix}, \tag{5}$$

and the related definitions are

$$A = a f_0^2,\quad B = b f_0^2,\quad C = c f_0^2,\quad D = d f_0,\quad E = e f_0,\quad F = f. \tag{6}$$

Define the coordinate of the sphere center under the CCS as (X0, Y0, Z0)^T and the coordinate of its corresponding image point as (u0, v0, 1)^T; according to Eq. (2) they satisfy

$$K \begin{pmatrix} X_0 \\ Y_0 \\ Z_0 \end{pmatrix} = \lambda \begin{pmatrix} u_0 \\ v_0 \\ 1 \end{pmatrix}, \tag{7}$$

where K is the intrinsic parameter matrix of the camera. The projection of the sphere center can thus be deduced from the outline of the sphere through Eqs. (5) and (7):

$$K \begin{pmatrix} e_{3x} \\ e_{3y} \\ e_{3z} \end{pmatrix} = \lambda' \begin{pmatrix} u_0 \\ v_0 \\ 1 \end{pmatrix}, \tag{8}$$

where the matrix K in Eq. (8) is defined the same as in Eq. (7), and λ and λ' are scale factors that satisfy

$$\lambda' = \frac{\lambda}{R} \sqrt{\frac{\lambda_3}{\lambda_3 + \lambda_r}}, \tag{9}$$

where λ1, λ2 and λ3 are the eigenvalues of matrix Q; λ1 and λ2 must have the same sign while λ3 must have the opposite one. As Q corresponds to a sphere, we define λr = λ1 = λ2. [e3x, e3y, e3z]^T is the eigenvector corresponding to λ3 (if e3z < 0, the eigenvector is multiplied by −1), and R is the radius of the sphere.

In a trinocular vision sensor, corresponding image points P in the three views are related through the trifocal tensor by [8]

$$P_1^{i}\, P_2^{j}\, P_3^{k}\, \varepsilon_{jqu}\, \varepsilon_{krv}\, T_i^{qr} = 0_{uv}, \tag{10}$$

where T is the trifocal tensor,

$$\varepsilon_{ijk} = \begin{cases} 1, & (i, j, k)\ \text{is an even permutation} \\ -1, & (i, j, k)\ \text{is an odd permutation} \\ 0, & \text{otherwise} \end{cases} \tag{11}$$

and indices repeated in the contravariant and covariant positions imply summation over the range (1, 2, 3) of the index.
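For readers who want to experiment with the outline feature, the chain from Eq. (1) to Eq. (8) can be sketched in a few lines of numpy. The sketch below works directly in pixel coordinates, so the viewing cone is formed as Q ∝ KᵀcK instead of through the f0-scaled entries of Eq. (6); this is an equivalent formulation and not the authors' exact implementation, and all function and variable names are illustrative.

```python
import numpy as np

def fit_conic(uv):
    """Least-squares fit of a u^2 + c uv + b v^2 + d u + e v + f = 0 (Eq. (1))
    to pixel points uv (N x 2); returns the symmetric 3x3 conic matrix."""
    u, v = uv[:, 0], uv[:, 1]
    M = np.column_stack([u * u, u * v, v * v, u, v, np.ones_like(u)])
    a, c, b, d, e, f = np.linalg.svd(M)[2][-1]     # null-space direction of the design matrix
    return np.array([[a, c / 2, d / 2],
                     [c / 2, b, e / 2],
                     [d / 2, e / 2, f]])

def sphere_center_image(conic, K):
    """Image of the sphere centre from the outline conic (cf. Eqs. (5)-(8)).
    The viewing cone is Q ~ K^T conic K; the eigenvector whose eigenvalue has
    the 'odd' sign is the direction of the sphere centre, imaged as K e3."""
    Q = K.T @ conic @ K
    w, V = np.linalg.eigh(Q)                       # eigenvalues in ascending order
    odd = 0 if np.sign(w[0]) != np.sign(w[1]) else 2
    e3 = V[:, odd]
    if e3[2] < 0:                                  # keep the direction in front of the camera
        e3 = -e3
    p = K @ e3
    return p[:2] / p[2]
```

The returned point is the image of the sphere center, i.e. the quantity entering the point–point–point relation of Eq. (10).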
According to Eqs. (8) and (10), the relationship of the three projections of the sphere outline on the three image planes can be expressed as

$$\left[K_1 \begin{pmatrix} e_{3x}^{1} \\ e_{3y}^{1} \\ e_{3z}^{1} \end{pmatrix}\right]^{i} \left[K_2 \begin{pmatrix} e_{3x}^{2} \\ e_{3y}^{2} \\ e_{3z}^{2} \end{pmatrix}\right]^{j} \left[K_3 \begin{pmatrix} e_{3x}^{3} \\ e_{3y}^{3} \\ e_{3z}^{3} \end{pmatrix}\right]^{k} \varepsilon_{jqu}\, \varepsilon_{krv}\, T_i^{qr} = 0_{uv}, \tag{12}$$

where the related definitions are explained in the above equations.

2.2. Spherical circles

Define the coordinate of the center O of a spherical circle under the CCS as (X0, Y0, Z0)^T, the normal vector of the plane Π on which the circle C lies as (Vx, Vy, Vz)^T, and the projection of the spherical circle on the image plane as c, whose expression is

$$c = \begin{bmatrix} a & c/2 & d/2 \\ c/2 & b & e/2 \\ d/2 & e/2 & f \end{bmatrix}. \tag{13}$$

(X0, Y0, Z0)^T and (Vx, Vy, Vz)^T can be obtained from the projection c and the exactly known circle radius R. The two solutions are as follows [6,9]. The first one is

$$\begin{cases} X_0 = e_{1x} R\eta + e_{3x} R\delta \\ Y_0 = e_{1y} R\eta + e_{3y} R\delta \\ Z_0 = e_{1z} R\eta + e_{3z} R\delta \end{cases} \tag{14}$$

with its corresponding normal vector

$$\begin{cases} V_x = e_{1x}\tau - e_{3x}\kappa \\ V_y = e_{1y}\tau - e_{3y}\kappa \\ V_z = e_{1z}\tau - e_{3z}\kappa \end{cases}; \tag{15}$$

the other solution is

$$\begin{cases} X_0 = -e_{1x} R\eta + e_{3x} R\delta \\ Y_0 = -e_{1y} R\eta + e_{3y} R\delta \\ Z_0 = -e_{1z} R\eta + e_{3z} R\delta \end{cases} \tag{16}$$

with its corresponding normal vector

$$\begin{cases} V_x = -e_{1x}\tau - e_{3x}\kappa \\ V_y = -e_{1y}\tau - e_{3y}\kappa \\ V_z = -e_{1z}\tau - e_{3z}\kappa \end{cases}. \tag{17}$$

The related signs are defined as

$$\eta = \sqrt{\frac{\lambda_3 (\lambda_1 - \lambda_2)}{\lambda_1 (\lambda_1 + \lambda_3)}},\quad \delta = \sqrt{\frac{\lambda_1 (\lambda_2 + \lambda_3)}{\lambda_3 (\lambda_1 + \lambda_3)}},\quad \tau = \sqrt{\frac{\lambda_1 - \lambda_2}{\lambda_1 + \lambda_3}},\quad \kappa = \sqrt{\frac{\lambda_2 + \lambda_3}{\lambda_1 + \lambda_3}},$$

where λ1, λ2 and λ3 are the eigenvalues of matrix Q, λ1 and λ2 must have the same sign while λ3 must have the opposite one, and λ1 > λ2. [e1x, e1y, e1z]^T is the eigenvector corresponding to λ1 and [e3x, e3y, e3z]^T is the eigenvector corresponding to λ3 (if e3z < 0, the eigenvector is multiplied by −1). As the relationships of the spherical circles are known, the unique solution can be confirmed according to the distance constraint between each two centers and the angle constraint between two normal vectors.

Let C be a conic on a space plane and p a point on the plane; the line

$$l = Cp \tag{18}$$

is the polar of the point p with respect to the conic C, while the point p is the pole of l with respect to C. This relationship is illustrated in Fig. 2. A new coordinate system is now defined: its origin is the center of the spherical circle (point O), and the normal vector of the plane Π on which the circle lies is the Z direction. As the radius of the circle is known as R, the matrix form of the circle is

$$C = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & R \end{bmatrix}, \tag{19}$$

and the polar of O can be deduced from Eq. (18):

$$l = CO = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & R \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ R \end{bmatrix}. \tag{20}$$

In other words, the polar of the point O with respect to the conic C is the line at infinity of plane Π. According to the definition of the vanishing line, the projection of line l is the vanishing line of plane Π, and the relationship is expressed as

$$L = c\,o, \tag{21}$$

where o is the projection of the circle center O. Combined with Eq. (13), we get

$$L = c K \begin{pmatrix} \pm e_{1x} R\eta + e_{3x} R\delta \\ \pm e_{1y} R\eta + e_{3y} R\delta \\ \pm e_{1z} R\eta + e_{3z} R\delta \end{pmatrix}, \tag{22}$$

where K is the intrinsic parameter matrix of the camera, [e1x, e1y, e1z]^T and [e3x, e3y, e3z]^T are the same as defined in Eq. (14), and the sign (positive or negative) is fixed by the determination of the center of the spherical circle explained above. In a trinocular vision sensor, corresponding lines L in the three views are related through the trifocal tensor by [8]

$$(L_{1r}\, \varepsilon^{ris})\, L_{2j}\, L_{3k}\, T_i^{jk} = 0^{s}. \tag{23}$$

The related notations are the same as defined in Eq. (10). Therefore, the relationship between the spherical circle (with its projection c_n on the image plane of camera n) and its corresponding circular cone Q_n can be expressed as

$$\left[c_1 K_1 \begin{pmatrix} \pm e_{1x}^{1}\eta + e_{3x}^{1}\delta \\ \pm e_{1y}^{1}\eta + e_{3y}^{1}\delta \\ \pm e_{1z}^{1}\eta + e_{3z}^{1}\delta \end{pmatrix}\right]_{r} \varepsilon^{ris} \left[c_2 K_2 \begin{pmatrix} \pm e_{1x}^{2}\eta + e_{3x}^{2}\delta \\ \pm e_{1y}^{2}\eta + e_{3y}^{2}\delta \\ \pm e_{1z}^{2}\eta + e_{3z}^{2}\delta \end{pmatrix}\right]_{j} \left[c_3 K_3 \begin{pmatrix} \pm e_{1x}^{3}\eta + e_{3x}^{3}\delta \\ \pm e_{1y}^{3}\eta + e_{3y}^{3}\delta \\ \pm e_{1z}^{3}\eta + e_{3z}^{3}\delta \end{pmatrix}\right]_{k} T_i^{jk} = 0^{s}, \tag{24}$$

where K_n is the intrinsic parameter matrix of camera n and [e^n_{tx}, e^n_{ty}, e^n_{tz}]^T is the eigenvector of Q_n corresponding to λ_t, which satisfies the constraints described in Eq. (14).

Fig. 2. The pole–polar relationship.

2.3. Determination of trifocal tensor

Many approaches exist to calculate the trifocal tensor [8]; a typical one is the algebraic minimization algorithm. Once the relationships between the spherical features and the trifocal tensor are established (Eqs. (12) and (24)), the trifocal tensor of the trinocular vision sensor can be determined easily.
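The two candidate poses of a spherical circle in Eqs. (14)–(17) are straightforward to evaluate numerically once its image conic is known. The sketch below again forms the viewing cone as Q ∝ KᵀcK in pixel coordinates and uses eigenvalue magnitudes inside the square roots; it is a sketch of the closed-form solution under those assumptions, not the authors' code, and the names are illustrative.

```python
import numpy as np

def circle_pose_candidates(conic, K, radius):
    """Two candidate (centre, unit normal) pairs of a spherical circle from its
    image conic, following the closed form of Eqs. (14)-(17)."""
    Q = K.T @ conic @ K
    w, V = np.linalg.eigh(Q)
    odd = 0 if np.sign(w[0]) != np.sign(w[1]) else 2            # eigenvalue with the odd sign
    same = sorted([i for i in range(3) if i != odd], key=lambda i: -abs(w[i]))
    l1, l2, l3 = abs(w[same[0]]), abs(w[same[1]]), abs(w[odd])  # l1 >= l2
    e1, e3 = V[:, same[0]], V[:, odd]
    if e3[2] < 0:
        e3 = -e3
    eta = np.sqrt(l3 * (l1 - l2) / (l1 * (l1 + l3)))
    delta = np.sqrt(l1 * (l2 + l3) / (l3 * (l1 + l3)))
    tau = np.sqrt((l1 - l2) / (l1 + l3))
    kappa = np.sqrt((l2 + l3) / (l1 + l3))
    poses = []
    for s in (+1.0, -1.0):                                      # the two admissible solutions
        centre = radius * (s * eta * e1 + delta * e3)
        normal = s * tau * e1 - kappa * e3
        poses.append((centre, normal))
    return poses
```

As stated above, the ambiguity between the two returned poses is resolved in practice by the known distances between circle centers and the known angles between their normal vectors.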
3. Determination of the transformation matrix

3.1. Determination of rotation matrix

When the tensor is determined, the fundamental matrix between each two cameras that satisfies the tensor constraint can be deduced:

$$\begin{cases} F_{21} = [e']_{\times}\,[T_1, T_2, T_3]\,e'' \\ F_{31} = [e'']_{\times}\,[T_1^{T}, T_2^{T}, T_3^{T}]\,e' \end{cases}, \tag{25}$$

where e' is the epipole of camera 2 with respect to camera 1 and e'' is the epipole of camera 3 with respect to camera 1. The epipole e' is perpendicular to the left null-space of each Ti, while e'' is perpendicular to the right null-space of each Ti, so the epipoles can be deduced from

$$\begin{cases} e'^{T}\,[u_1, u_2, u_3] = 0^{T} \\ e''^{T}\,[v_1, v_2, v_3] = 0^{T} \end{cases}. \tag{26}$$

The essential matrix of each two cameras can also be confirmed:

$$\begin{cases} E_{21} = K_2^{T} F_{21} K_1 \\ E_{31} = K_3^{T} F_{31} K_1 \end{cases}. \tag{27}$$

When the singular value decomposition (SVD) of matrix E is computed as E = U diag(σ1, σ2, 0) V^T, the singular values satisfy σ1 = σ2 = σ; that is, the matrix E has two equal singular values and one zero singular value. The transformation matrices, including the rotation matrix and the translation matrix up to a scale factor, can then be deduced from the SVD of E. The four possible solutions are

$$\begin{cases} \text{A: } R = UZV^{T}, & t = \sigma u_3 \\ \text{B: } R = UZ^{T}V^{T}, & t = -\sigma u_3 \\ \text{C: } R = UZV^{T}, & t = -\sigma u_3 \\ \text{D: } R = UZ^{T}V^{T}, & t = \sigma u_3 \end{cases}, \tag{28}$$

where u3 is the vector with respect to the zero singular value, namely the last column of matrix U, and

$$Z = \begin{bmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

In Eq. (28), A and C (and similarly B and D) differ by a baseline reversal, while A and D (and similarly B and C) differ by a rotation of 180° about the baseline. In fact, only one solution is possible in the measurement process: in a multi-camera vision system, the reconstructed point must lie in front of both cameras. A method to confirm the possible solution from the four choices without reconstructing any point is therefore proposed. The related procedure is described as follows. Define the rotation matrix from the left camera coordinate system to the right camera coordinate system as R and the translation matrix as T; then

$$P_R = R P_L + T. \tag{29}$$

Assume the unit vector [0 0 1]^T of the left camera coordinate system is expressed as nR in the right camera coordinate system. Similarly, the origin of the left camera coordinate system is expressed as PR0 in the right camera coordinate system. The z-axis of the left camera coordinate system is projected onto the xy-plane of the right camera coordinate system, giving the projected line lLR, whose direction vector can be expressed as

$$n_l = n_y \times n_R \times n_y, \tag{30}$$

where ny = (0, 1, 0). The expression of line lLR is easily obtained from the point PR0 according to Eq. (30); then the intersection (point PLR) of line lLR with the z-axis of the right camera coordinate system can be determined. Similarly, the intersection (point PRL) of the line lRL, which is the projection of the z-axis of the right camera coordinate system, with the z-axis of the left camera coordinate system can also be confirmed. The possible solution is the one for which the z-coordinates of point PLR and point PRL are positive simultaneously (as illustrated in Fig. 3). In this case, the rotation matrix and the translation matrix, which still needs a scale factor, are determined.

Fig. 3. The determination of rotation matrix.

3.2. Optimization of rotation matrix

As the rotation matrix is an orthogonal matrix with three degrees of freedom, it is normally expressed as a vector, namely the rotation vector. Define the rotation vector as r = (rx, ry, rz); according to the Rodrigues theorem, the relationship between the rotation matrix and the rotation vector is

$$R = I \cos\theta + (1 - \cos\theta)\,\tilde{r}\tilde{r}^{T} + \sin\theta \begin{bmatrix} 0 & -\tilde{r}_z & \tilde{r}_y \\ \tilde{r}_z & 0 & -\tilde{r}_x \\ -\tilde{r}_y & \tilde{r}_x & 0 \end{bmatrix}, \tag{31}$$

where θ = |r|, r̃ = r/θ and I is the identity matrix. Eq. (31) can be rewritten as

$$\frac{R - R^{T}}{2} = \sin\theta \begin{bmatrix} 0 & -\tilde{r}_z & \tilde{r}_y \\ \tilde{r}_z & 0 & -\tilde{r}_x \\ -\tilde{r}_y & \tilde{r}_x & 0 \end{bmatrix}. \tag{32}$$

In order to obtain the optimized rotation matrix, the rotation vector can be used in place of the rotation matrix during the determination of the rotation matrix according to Eqs. (31) and (32).

3.3. Determination of translation matrix

In the trinocular vision sensor, we define the coordinate system of camera 1 as the global coordinate system and take camera 1 and camera 2 as an example. In our calibration method, the radius of the sphere target is known exactly, so a point (XC, YC, ZC) on the sphere satisfies

$$(X_C - X_0)^2 + (Y_C - Y_0)^2 + (Z_C - Z_0)^2 = R^2, \tag{33}$$

where (X0, Y0, Z0) is the center of the sphere. The coordinates of the feature points on the spherical circle can therefore be calculated by combining Eqs. (2) and (33). The ambiguity in the coordinates can be resolved because the z-coordinate of the visible point is smaller than that of the occluded one. The expression of the plane on which the spherical circle lies is then confirmed, and the homography matrix between camera 1 and camera 2 induced by this plane can also be determined:

$$H = K_2 (R + T n_d^{T}) K_1^{-1}, \tag{34}$$

where K1 and K2 are the intrinsic parameter matrices of camera 1 and camera 2 respectively, R is the rotation matrix from camera 1 to camera 2, and T is the translation matrix from camera 1 to camera 2. As the translation matrix has already been determined up to a scale factor λ, the factor can be determined from Eq. (34).
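To make the recovery chain of Section 3.1 concrete, the sketch below extracts the epipoles and fundamental matrices from the tensor slices (Eqs. (25)–(26)) and lists the four motion candidates of Eq. (28). It follows the conventions of Hartley and Zisserman [8]; the matrix written W here plays the role of Z above, and the translation is returned as a unit vector since its scale is free. This is an assumed, illustrative implementation, not the authors' code.

```python
import numpy as np

def skew(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def epipoles_from_tensor(T):
    """T: (3, 3, 3) array with T[i] the i-th slice.  Returns e' (view 2), e'' (view 3)."""
    U = np.column_stack([np.linalg.svd(T[i])[0][:, -1] for i in range(3)])  # left null vectors u_i
    V = np.column_stack([np.linalg.svd(T[i])[2][-1, :] for i in range(3)])  # right null vectors v_i
    e2 = np.linalg.svd(U.T)[2][-1, :]     # e'^T [u1 u2 u3] = 0^T  (Eq. (26))
    e3 = np.linalg.svd(V.T)[2][-1, :]     # e''^T [v1 v2 v3] = 0^T
    return e2, e3

def fundamentals_from_tensor(T):
    """F21 and F31 of Eq. (25)."""
    e2, e3 = epipoles_from_tensor(T)
    F21 = skew(e2) @ np.column_stack([T[i] @ e3 for i in range(3)])
    F31 = skew(e3) @ np.column_stack([T[i].T @ e2 for i in range(3)])
    return F21, F31

def motion_candidates(E):
    """Four (R, t) candidates from an essential matrix, cf. Eq. (28); t is unit length."""
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    t = U[:, 2]
    return [(U @ W @ Vt, t), (U @ W @ Vt, -t), (U @ W.T @ Vt, t), (U @ W.T @ Vt, -t)]
```

The candidate retained in practice is the one selected by the front-of-camera test described in Section 3.1, and its translation scale is fixed as in Section 3.3.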
4. Simulations

In this section, simulations are performed to verify the effects of the main factors, including the number of features and the image noise. The simulation conditions are as follows. Assume the intrinsic parameter matrices of the three cameras are identical:

$$K_1 = K_2 = K_3 = \begin{bmatrix} 5087.7 & 0 & 820.1 \\ 0 & 5087.3 & 612.2 \\ 0 & 0 & 1 \end{bmatrix}. \tag{35}$$

The rotation matrix and translation matrix from the coordinate system of camera one to the coordinate system of camera two are assumed to be

$$R_{12} = \begin{bmatrix} 0.952 & -0.028 & 0.305 \\ 0.034 & 0.999 & -0.016 \\ -0.305 & 0.026 & 0.952 \end{bmatrix},\quad T_{12} = \begin{bmatrix} -267.34 \\ 18.769 \\ 96.481 \end{bmatrix}, \tag{36}$$

and the rotation matrix and the translation matrix from the coordinate system of camera one to the coordinate system of camera three are

$$R_{13} = \begin{bmatrix} 0.812 & -0.048 & 0.582 \\ 0.073 & 0.997 & -0.020 \\ -0.579 & 0.059 & 0.813 \end{bmatrix},\quad T_{13} = \begin{bmatrix} -534.68 \\ 37.538 \\ 192.963 \end{bmatrix}. \tag{37}$$

Then the trifocal tensor of the three cameras can be calculated as

$$T_1 = \begin{bmatrix} -610.173 & 123.974 & 0.013 \\ -46.594 & -0.455 & 0.008 \\ 0.037 & -0.008 & -0.000 \end{bmatrix}, \tag{38}$$

$$T_2 = \begin{bmatrix} 5.826 & 560.556 & 0.004 \\ -1123.016 & 67.772 & 0.084 \\ -0.004 & -0.042 & 0.000 \end{bmatrix}, \tag{39}$$

$$T_3 = \begin{bmatrix} -8.791\times 10^{4} & 9.354\times 10^{4} & 641.080 \\ -8.466\times 10^{4} & 207.679 & -70.167 \\ -1.250\times 10^{3} & 144.590 & 0.046 \end{bmatrix}. \tag{40}$$

As the transformation matrix between each two cameras is decomposed from the trifocal tensor, the effect on the transformation matrix is evaluated in this section.

4.1. Effect of the number of features

The number of features, including the outline of the sphere and the spherical circles, is important for the determination of the trifocal tensor; a more precise calibration result is expected when more features are used to calculate it. In order to evaluate this effect, different numbers of features, varying from five to fifteen, are used to calculate the trifocal tensor. Gaussian noise with mean 0 and a standard deviation of 0.2 pixels is added to both coordinates of the image points to generate the perturbed image points of the features. The image points are normalized beforehand to make the calculation stable.
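The normalization step is not spelled out in the paper; a standard choice, and the one assumed in the sketch below, is Hartley's isotropic normalization (translate the points to their centroid and scale them so that their mean distance from the origin is √2):

```python
import numpy as np

def normalize_points(uv):
    """Isotropic normalization of pixel points uv (N x 2).  Returns the
    normalized homogeneous points and the 3x3 similarity transform used."""
    centroid = uv.mean(axis=0)
    mean_dist = np.linalg.norm(uv - centroid, axis=1).mean()
    s = np.sqrt(2.0) / mean_dist
    T = np.array([[s, 0.0, -s * centroid[0]],
                  [0.0, s, -s * centroid[1]],
                  [0.0, 0.0, 1.0]])
    uv_h = np.column_stack([uv, np.ones(len(uv))])
    return (T @ uv_h.T).T, T
```

The tensor estimated from the normalized points is afterwards denormalized using the stored transforms.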
Fig. 4. Calibration results with the number of features.
For clearer observation, the rotation matrix is expressed as the rotation vector [rx, ry, rz] according to the Rodrigues theorem; the rotation vector has the direction of the rotation axis, and its norm is the rotation angle θ. Define the translation matrix as [Tx, Ty, Tz]^T, and let Dis denote the error between the calculated translation matrix and the ideal one. The simulation results are illustrated in Fig. 4; each point in the figure represents the average of 1000 uniformly distributed rotations. The calibration result is stable with little fluctuation as the number of features increases. However, increasing the number of features leads to a heavier computational burden, which affects the efficiency of calibration, so the number of features should be chosen as a compromise in the actual calibration experiment.
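The rotation-vector conversion and the error measures used in Fig. 4 (and later in Fig. 5) can be written compactly. The sketch below uses the standard log-map of a rotation matrix (valid away from θ ≈ π) and is illustrative only:

```python
import numpy as np

def rotation_vector(R):
    """Rotation matrix -> rotation vector (unit axis times angle), cf. Eqs. (31)-(32)."""
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if np.isclose(theta, 0.0):
        return np.zeros(3)
    axis = np.array([R[2, 1] - R[1, 2],
                     R[0, 2] - R[2, 0],
                     R[1, 0] - R[0, 1]]) / (2.0 * np.sin(theta))
    return theta * axis

def calibration_errors(R_est, T_est, R_true, T_true):
    """Per-axis rotation-vector error and the translation error Dis of Fig. 4."""
    rot_err = np.abs(rotation_vector(R_est) - rotation_vector(R_true))
    dis = np.linalg.norm(np.asarray(T_est) - np.asarray(T_true))
    return rot_err, dis
```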
Table 1
Intrinsic parameters of the three cameras.

No.   fx         fy         u0        v0        kc1      kc2
1     5091.686   5091.540   850.107   616.595   −0.124    0.032
2     5098.552   5098.930   856.719   597.165   −0.147   −5.752
3     5074.441   5074.602   794.611   577.571   −0.116    0.439
4.2. Effect of image noise

Feature extraction is very important in this calibration method. In this section, simulations are conducted to evaluate the effect of image noise on the calibration results. In the simulations, fifteen features are used to obtain the trifocal tensor of the three cameras. Gaussian noise with mean 0 and standard deviation varying from 0 to 1 pixel is added to both coordinates of the image points to generate the perturbed image points. The image points are normalized before the calculation of the trifocal tensor, and the compatible rotation matrix and translation matrix are then decomposed from it. The calibration results are illustrated in Fig. 5: the error of the rotation matrix is shown in Fig. 5(a), while the error of the translation matrix is shown in Fig. 5(b). The rotation matrix is again expressed as the rotation vector [rx, ry, rz], and the translation matrix as [Tx, Ty, Tz]^T; the other notations are the same as in Fig. 4. Each point in the figure represents the average of 1000 uniformly distributed rotations. The error of the calibration results, including the rotation matrix and translation matrix between each two cameras, grows with increasing image noise. Nevertheless, the error of the rotation vector remains below 1.8×10−3. The error of the translation matrix between camera 1 and camera 2 is less than 1.2 mm in the x, y and z directions, and the overall error is less than 1.4 mm; similarly, the error of the translation matrix between camera 1 and camera 3 is less than 1.2 mm in the x, y and z directions, with an overall error below 1.4 mm. In the proposed calibration method, the calibration result is precise and consistent with the corresponding trifocal tensor.
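The noise experiment of this subsection can be reproduced with a simple Monte Carlo loop. In the sketch below, estimate_trifocal_tensor and decompose_motion are placeholders for the procedures of Sections 2.3 and 3 (they are not spelled out here), so the loop is indicative only:

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb(points, sigma):
    """Add zero-mean Gaussian noise of standard deviation sigma (pixels)
    to both coordinates of the simulated image points."""
    return points + rng.normal(0.0, sigma, size=points.shape)

# Indicative evaluation loop (placeholders as noted above):
# for sigma in np.linspace(0.0, 1.0, 11):
#     errors = []
#     for _ in range(1000):
#         noisy = [perturb(p, sigma) for p in feature_image_points]
#         T = estimate_trifocal_tensor(noisy)
#         R12_est, T12_est = decompose_motion(T)
#         errors.append(np.linalg.norm(T12_est - T12_true))
#     print(sigma, np.mean(errors))
```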
Fig. 5. Calibration results with the standard deviation of noise.
Fig. 6. The targets used in our calibration method. (a) is the sphere target used to calibrate the trinocular vision sensor; (b) is the 1D target used to evaluate the accuracy of the proposed calibration method.
Fig. 7. Feature extraction of the sphere target. The red circle is the projection of the outline of the sphere target, while the ellipses in other colors are the projections of the feature circles on the sphere target.
Table 2
Result of the evaluation.

No.   Dreal (mm)   Dmeasure (mm)   Error (mm)
1     40.000       40.023           0.023
2     40.000       40.030           0.030
3     40.000       39.972          −0.028
4     40.000       40.031           0.031
5     40.000       40.010           0.010
6     40.000       39.969          −0.031
7     40.000       39.983          −0.017
8     40.000       40.003           0.003
9     40.000       40.034           0.034
10    40.000       40.034           0.034
11    40.000       39.974          −0.026
12    40.000       40.035           0.035
13    40.000       40.034           0.034
14    40.000       39.998          −0.002
15    40.000       40.022           0.022
16    40.000       39.973          −0.027
17    40.000       39.994          −0.006
18    40.000       40.031           0.031
19    40.000       40.022           0.022
20    40.000       40.034           0.034
RMS                                 0.026

5. Experiments

5.1. Calibration of trinocular vision sensor

The trinocular vision sensor is composed of three CCD cameras (AVT F-504B) with a resolution of 2452×2056 pixels. In our experiment, each camera is calibrated by Zhang's calibration method [1]; the obtained intrinsic parameters are listed in Table 1.

A flexible sphere target with several spherical circles is designed and machined precisely. The relative relationship between each two spherical circles is not strictly required. The sphere target used for the calibration of the trinocular vision sensor is illustrated in Fig. 6(a). Images of the sphere target are captured by the three cameras; camera distortion is first compensated by rectifying all real images, and the sphere target is placed in 10 different positions. The projection of the sphere outline can be extracted easily as there are plenty of feature points, and when the projection of a spherical circle intersects the projection of the outline of the sphere target, a double-contact constraint is applied. The results of feature extraction are illustrated in Fig. 7. The trifocal tensor of the three cameras is computed based on the result of feature extraction and is described as
$$T_1 = \begin{bmatrix} 58.669 & -12.936 & 0.002 \\ -2.265 & 0.383 & -0.001 \\ 0.004 & -0.001 & -0.000 \end{bmatrix}, \tag{41}$$

$$T_2 = \begin{bmatrix} -18.456 & -61.573 & -0.004 \\ 100.070 & -28.915 & 0.013 \\ -0.002 & -0.009 & -0.000 \end{bmatrix}, \tag{42}$$

$$T_3 = \begin{bmatrix} 2.754\times 10^{4} & -1.304\times 10^{4} & -57.369 \\ 1.074\times 10^{4} & -3.138\times 10^{3} & -6.129 \\ 121.615 & -26.390 & 0.007 \end{bmatrix}. \tag{43}$$
Then the compatible rotation matrix and translation matrix can be deduced from the obtained trifocal tensor.
Fig. 8. The feature matching results based on the calibration results. (a) shows the images captured simultaneously by the three cameras at one position, while (b) shows those captured at another position. In each figure, the left image is captured by camera one, the middle one by camera two, and the right one by camera three.
Fig. 9. Feature matching results based on normal calibration method.
The rotation matrix from the camera 1 coordinate system to the camera 2 coordinate system is

$$R_{12} = \begin{bmatrix} 0.868 & 0.017 & -0.496 \\ -0.034 & 0.999 & -0.025 \\ 0.495 & 0.038 & 0.868 \end{bmatrix}, \tag{44}$$

and the translation matrix is

$$T_{12} = \begin{bmatrix} 248.80 \\ 10.421 \\ 210.386 \end{bmatrix}. \tag{45}$$

The rotation matrix from camera 1 to camera 3 is

$$R_{13} = \begin{bmatrix} 0.467 & 0.318 & -0.825 \\ -0.212 & 0.946 & 0.245 \\ 0.858 & 0.060 & 0.5099 \end{bmatrix}, \tag{46}$$

and the translation matrix is

$$T_{13} = \begin{bmatrix} 407.797 \\ -128.654 \\ 298.591 \end{bmatrix}. \tag{47}$$
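Given these calibrated quantities, the fundamental matrices used for matching in Section 5.3 follow directly from F = K2^{-T} [T]× R K1^{-1}, i.e. the inverse of Eq. (27). A minimal sketch, with the convention x2ᵀ F x1 = 0 and illustrative names:

```python
import numpy as np

def fundamental_from_calibration(K1, K2, R, T):
    """Fundamental matrix such that x2^T F x1 = 0 for corresponding
    homogeneous pixel points x1 (camera 1) and x2 (camera 2)."""
    Tx = np.array([[0.0, -T[2], T[1]],
                   [T[2], 0.0, -T[0]],
                   [-T[1], T[0], 0.0]])
    E = Tx @ R                              # essential matrix
    return np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)
```

A matrix obtained this way is the counterpart of the fundamental matrices reported in Eqs. (48)–(49) below.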
5.2. Measurement accuracy evaluation

A 1D target, illustrated in Fig. 6(b), is used to evaluate the accuracy of the proposed calibration method. The distance between each two adjacent feature points (Dreal) is known exactly. The feature points are reconstructed based on the calibration results and the measured distance (Dmeasure) is obtained, and all measured values are compared with their corresponding true values. As the target can be moved to different positions randomly, enough distances can be obtained to evaluate the calibration result; 20 of them are listed in Table 2. Considering that the sphere target used in our calibration experiment has a machining accuracy of only 0.01 mm, the RMS error of distance is 0.026 mm and the relative calibration accuracy is around 0.105‰ with regard to the view field of about 200×200 mm, so the calibration results are sufficiently precise.

5.3. Matching accuracy evaluation

From the calibration result, the compatible fundamental matrices can be obtained. The fundamental matrix from camera 1 to camera 2 is

$$F_{12} = \begin{bmatrix} 1.000 & -1.712\times 10^{1} & 1.555\times 10^{4} \\ 4.829 & -5.211\times 10^{-1} & -1.360\times 10^{5} \\ -1.099\times 10^{4} & 1.183\times 10^{5} & 8.491\times 10^{6} \end{bmatrix}, \tag{48}$$

and the fundamental matrix from camera 1 to camera 3 is

$$F_{13} = \begin{bmatrix} 1.000 & 6.111 & 1.034\times 10^{4} \\ 4.464 & -1.494 & 4.622\times 10^{4} \\ -5.434\times 10^{2} & -4.961\times 10^{4} & -5.536\times 10^{6} \end{bmatrix}. \tag{49}$$

The feature matching results are illustrated in Fig. 8. A laser is projected onto the surface of the object and the light stripe is extracted with Steger's extraction method [10]. As the extracted center points are numerous, only one point out of every thirty is shown in Fig. 8 for clarity. As can be seen in Fig. 8, there is no mismatched point, which indicates the strictness of the trifocal tensor constraint.

We also obtained the fundamental matrix between each two cameras based on the normal calibration method. Matching points in image 3 are determined from feature points in image 1; the matching points {P2i} between image 1 and image 2 are then obtained, and similarly the matching points {P̃2i} between image 3 and image 2 can also be obtained. The feature matching results based on the normal calibration results are shown in Fig. 9, and twenty matching points taken randomly from the results are listed in Table 3. As the feature matching based on our calibration results is strict ({P2i} = {P̃2i}), those results are not listed. As shown in Table 3, the feature matching results based on the normal calibration results are not precise enough: the root mean square (RMS) error in the x-direction is 0.007 pixels, while the RMS error in the y-direction is 0.069 pixels, because the epipolar lines are nearly horizontal. In the same situation, feature matching based on the trifocal tensor of our calibration method is stricter.

Table 3
Matching results based on the normal calibration results (pixels).

No.   {P̃2i} X    {P̃2i} Y    {P2i} X    {P2i} Y    △X       △Y
1     699.868    1000.633   699.883    1000.490   −0.015    0.144
2     770.432     166.291   770.445     166.152   −0.013    0.139
3     769.736     173.267   769.747     173.151   −0.012    0.116
4     768.621     184.851   768.629     184.765   −0.008    0.086
5     767.310     197.254   767.318     197.186   −0.007    0.068
6     766.051     210.491   766.056     210.445   −0.004    0.046
7     762.987     243.151   762.989     243.110   −0.002    0.041
8     758.639     292.949   758.642     292.914   −0.003    0.035
9     757.237     305.451   757.239     305.440   −0.001    0.011
10    732.108     562.037   732.106     562.064    0.003   −0.027
11    726.777     618.485   726.775     618.511    0.002   −0.025
12    708.774     805.714   708.777     805.679   −0.003    0.035
13    701.626     880.910   701.633     880.848   −0.006    0.061
14    750.719     468.919   750.721     468.903   −0.001    0.016
15    747.676     501.393   747.677     501.374   −0.001    0.018
16    744.952     529.273   744.953     529.257   −0.002    0.017
17    732.444     662.693   732.446     662.672   −0.002    0.021
18    720.501     788.148   720.506     788.099   −0.005    0.048
19    715.014     845.051   715.020     844.989   −0.007    0.062
20    707.405     921.576   707.412     921.473   −0.007    0.103
RMS                                                0.007    0.069

6. Conclusion

In this paper, a method to calibrate a trinocular vision sensor is proposed. A sphere target with several spherical features is designed, and the trifocal tensor is determined from images of the target. The rotation matrix and the translation matrix with a scale factor between each two cameras are then obtained based on the singular value decomposition of the essential matrices derived from the trifocal tensor, and as the relationship of each two spherical features is known exactly, the scale can be determined. Simulations and experiments show that the proposed method is precise and robust, with a root mean square error of 0.026 mm with regard to a view field of about 200×200 mm. Compared with the normal calibration method, the proposed method is more precise and stable, as the projection of a sphere does not depend on its orientation and image points are not required to be in one-to-one correspondence. Moreover, the trifocal tensor of the trinocular vision sensor, which plays a role similar to the fundamental matrix in a binocular vision sensor, is significant for feature matching, and the resulting feature matching is stricter. Our calibration method also provides a new approach to obtain the trifocal tensor.

References

[1] Zhang ZY. A flexible new technique for camera calibration. IEEE Trans Pattern Anal Mach Intell 2000;22(11):1330–4.
[2] Wei ZZ, Shao MW, Zhang GJ. Parallel-based calibration method for line-structured light vision sensor. Opt Eng 2014;53(3):033101.
[3] Wei ZZ, Liu XK. Vanishing feature constraints calibration method for binocular vision sensor. Opt Express 2015;23(15):18897–914.
[4] Wei ZZ, Cao LJ, Zhang GJ. A novel 1D target-based calibration method with unknown orientation for structured light vision sensor. Opt Laser Technol 2010;42(4):570–4.
[5] Liu Z, Zhang GJ, Wei ZZ, Sun JH. Novel calibration method for non-overlapping multiple vision sensors based on 1D target. Opt Laser Eng 2011;49(4):570.
[6] Shiu YC, Ahmad S. 3D location of circular and spherical features by monocular model-based vision. IEEE Int Conf Syst 1989;2:576–81.
[7] Safaee-Rad R, Tchoukanov I, Smith KC, Benhabib B. Three-dimensional location estimation of circular features for machine vision. IEEE Trans Robot Autom 1992;8(5):624–40.
[8] Hartley R, Zisserman A. Multiple view geometry in computer vision. 2nd ed. Cambridge University Press; 2003 [ch. 14–16].
[9] Safaee-Rad R, Tchoukanov I, Smith KC, Benhabib B. Three-dimensional location estimation of circular features for machine vision. IEEE Trans Robot Autom 1992;8(5):624–40.
[10] Steger C. Unbiased extraction of lines with parabolic and Gaussian profiles. Comput Vis Image Underst 2013;117:97–112.