Transformation calibration of a camera mounted on a robot

M E Bowman and A K Forrest

Centre of Robotics, Imperial College, Exhibition Rd, London SW7 2BX, UK
The paper describes the accurate calibration of the camera transformation for a vision system consisting of a camera mounted on a robot. The calibration includes an analysis of the linearity of the camera. A knowledge of the camera transformation allows the three-dimensional position of object points to be determined using triangulation.

Keywords: calibration, linearity, robot vision, stereo
Robot-mounted camera systems have been successfully used for some time in manufacturing industry, performing reliably in harsh conditions^1. Such systems offer great flexibility for a manufacturing cell, particularly for such tasks as automated assembly. A robot-camera system allows a work scene to be viewed from many different positions, enabling a vision system to triangulate points in three dimensions (3D) from multiple images. The information derived can be used for both object recognition and general workpiece manipulation. The requirements for a fully flexible robot-camera system are an accurate knowledge of the camera position and a model for the camera's mathematical transformation. The camera position is dependent on the robot, and high accuracy is needed if the vision system is to triangulate points effectively. The properties of the camera can be represented by a matrix transformation^{2-4}. The use of homogeneous coordinates^5 allows the camera model to map the world coordinates to image coordinates in a single transformation, rather than adding such factors as perspective at later stages of the transformation. Knowledge of the model allows the inverse operation of transforming image points to object points - triangulation - to be performed. The camera model can be found using a least-squares best-fit approach, mapping the world object points to the image points with minimal error. The resultant matrix holds information concerning the physical parameters of the camera and the distance and rotation from the camera to the object in view, and linearly maps homogeneous world coordinates to homogeneous image coordinates. If the transformation is not linear due to distortion effects, the transformation can be modelled using a polynomial estimate of the linearity of the camera^{6,7}.

The method of calibrating the camera within a robot-camera system is currently incorporated in a robot vision system, Muse (manipulator utilizing sight environment), which is being developed at Imperial College, London. Muse uses multiple images to triangulate points for object recognition.
SIMPLE MAPPING MODEL OF CAMERA
If the object points are defined as [X_i, Y_i, Z_i] with homogeneous coordinates [W_i X_i, W_i Y_i, W_i Z_i, W_i] and the corresponding image points are [u_i, v_i] with homogeneous coordinates [x_i, y_i, w_i], then matrices can be formed representing the object points and the image points as column vectors O and I respectively. The homogeneous object coordinates can be mapped to the homogeneous image coordinates by a linear transformation.
The transform can be represented in matrix form as
C O = I     (1)
where C is a 3 x 4 matrix representing the camera. For a single object point, the mapping to the corresponding image point can be explicitly described as
C [W_i X_i, W_i Y_i, W_i Z_i, W_i]^T = [x_i, y_i, w_i]^T     (2)
If C is represented as
    [ c_00   c_01   c_02   c_03 ]
C = [ c_10   c_11   c_12   c_13 ]     (3)
    [ c_20   c_21   c_22   c_23 ]

then

u_i = x_i / w_i = (c_00 X_i + c_01 Y_i + c_02 Z_i + c_03) / (c_20 X_i + c_21 Y_i + c_22 Z_i + c_23)     (4)

v_i = y_i / w_i = (c_10 X_i + c_11 Y_i + c_12 Z_i + c_13) / (c_20 X_i + c_21 Y_i + c_22 Z_i + c_23)     (5)

Equations 4 and 5 represent a homogeneous linear system in the variables c_ij, each object point [X_i, Y_i, Z_i] and matching image point [u_i, v_i] providing two linear equations of the system. To solve for the values of c_ij, the system must be made nonhomogeneous. This is achieved by setting a nonzero variable within the matrix C equal to one. The term c_23 is chosen since it has a scaling effect for 3D homogeneous coordinate points. Thus

c_00 X_i + c_01 Y_i + c_02 Z_i + c_03 - c_20 X_i u_i - c_21 Y_i u_i - c_22 Z_i u_i = u_i     (6)

c_10 X_i + c_11 Y_i + c_12 Z_i + c_13 - c_20 X_i v_i - c_21 Y_i v_i - c_22 Z_i v_i = v_i     (7)

Equations 6 and 7 form the basis of the following matrix equation for m points:

[ X_1  Y_1  Z_1  1  0    0    0    0  -X_1 u_1  -Y_1 u_1  -Z_1 u_1 ]       [ u_1 ]
[ 0    0    0    0  X_1  Y_1  Z_1  1  -X_1 v_1  -Y_1 v_1  -Z_1 v_1 ]       [ v_1 ]
[ .    .    .    .  .    .    .    .   .         .         .       ] x   = [  .  ]     (8)
[ X_m  Y_m  Z_m  1  0    0    0    0  -X_m u_m  -Y_m u_m  -Z_m u_m ]       [ u_m ]
[ 0    0    0    0  X_m  Y_m  Z_m  1  -X_m v_m  -Y_m v_m  -Z_m v_m ]       [ v_m ]

which can be represented as

A x = b     (9)

where the vector x represents all the C matrix elements c_ij except c_23. A solution to the system can be found if 11 equations are provided, corresponding to at least six points, and if each row vector of the system is linearly independent. If more than six points are chosen the linear system is said to be overdetermined and the solution to the vector x can be found on a best-fit basis which minimizes^{8,9} the Euclidean norm of the residual vector, r. Thus

r = A x - b     (10)

and

x = (A^T A)^{-1} A^T b     (11)
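The construction of Equations 6-11 can be illustrated with a short sketch (Python with NumPy; the function name and data layout are illustrative and not part of the original system). It assembles the 2m x 11 system of Equation 8 from m point correspondences and recovers the 3 x 4 matrix C with c_23 fixed at one.

    import numpy as np

    def fit_camera_matrix(world_pts, image_pts):
        """Least-squares fit of the 3x4 camera matrix C (Equations 6-11).

        world_pts : (m, 3) array of object points [X, Y, Z]
        image_pts : (m, 2) array of image points [u, v]
        Returns C with c_23 fixed at 1, as in the text.
        """
        m = len(world_pts)
        A = np.zeros((2 * m, 11))
        b = np.zeros(2 * m)
        for i, ((X, Y, Z), (u, v)) in enumerate(zip(world_pts, image_pts)):
            # Equation 6: row for the u coordinate
            A[2 * i] = [X, Y, Z, 1, 0, 0, 0, 0, -X * u, -Y * u, -Z * u]
            b[2 * i] = u
            # Equation 7: row for the v coordinate
            A[2 * i + 1] = [0, 0, 0, 0, X, Y, Z, 1, -X * v, -Y * v, -Z * v]
            b[2 * i + 1] = v
        # Overdetermined solution minimizing ||Ax - b|| (Equations 9-11)
        x, *_ = np.linalg.lstsq(A, b, rcond=None)
        return np.append(x, 1.0).reshape(3, 4)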
CAMERA LINEARITY

The simple camera model derived from earlier sections linearly maps homogeneous world coordinates to homogeneous image coordinates. If the transformation cannot be accurately described by a linear transform then higher-order polynomial functions may be used. Second-order polynomials are commonly used to counteract the effects of any geometric distortion. A test is made initially of the geometric distortion of the system. A correction can be made to the location of the image points within a frame to allow for the distortion of the original picture. The correction process can be illustrated as in Figure 1. The correction process attempts to map the actual image coordinates [u, v] back to the original image coordinates [j, k], correcting the geometric distortion. A second-order polynomial operator was chosen to correct the positioning of the observed image points. The correction mapping transform is

j' = a_0 + a_1 u + a_2 v + a_3 u^2 + a_4 uv + a_5 v^2     (12)

k' = b_0 + b_1 u + b_2 v + b_3 u^2 + b_4 uv + b_5 v^2     (13)

where the coefficients a_0, . . . , a_5 and b_0, . . . , b_5 are constants parameterizing the necessary geometric correction process for the camera and can be described by two vectors a and b, where

a = [a_0, a_1, a_2, a_3, a_4, a_5]^T     (14)

b = [b_0, b_1, b_2, b_3, b_4, b_5]^T     (15)

The correction process aims to minimize the error between the position of the adjusted points and the ideal points. The error (in pixels squared) between the ideal image point position [j, k] and the adjusted image point position [j', k'] is

E = (j - j')^2 + (k - k')^2     (16)

The Euclidean norm of an m-length vector x, where ||x||_2 = [|x_1|^2 + . . . + |x_m|^2]^{1/2}, can be used to describe the error for m points as

E = ||j - Pa||_2^2 + ||k - Pb||_2^2     (17)

where j and k are column vectors formed from the j and k coordinates respectively,

j^T = [j_1, j_2, . . . , j_m]     (18)

k^T = [k_1, k_2, . . . , k_m]     (19)

and P, a matrix containing up to and including second-order terms of the observed image points [u_i, v_i], is defined as

    [ 1  u_1  v_1  u_1^2  u_1 v_1  v_1^2 ]
P = [ .  .    .    .      .        .     ]     (20)
    [ 1  u_m  v_m  u_m^2  u_m v_m  v_m^2 ]

Figure 1. Correction of the location of the image points within a frame to allow for distortion of the original picture: the ideal image plus distortion gives the observed image plane [u, v], which the geometric correction maps to the corrected image plane [j', k']
The vectors a and b, which minimize the error E of the correction process, can be found using the least-squares method:

a = (P^T P)^{-1} P^T j     (21)

b = (P^T P)^{-1} P^T k     (22)

This allows the solution to be found by overdetermining the system. Only six points are necessary to find values for the vectors a and b, but using more than six points and minimizing the error E results in a statistically more dependable solution.
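Equations 12-22 can be illustrated by the following sketch (Python/NumPy again; the helper names are invented for this example). It fits the correction vectors a and b from matched observed and ideal image points and then applies the second-order correction to an observed point.

    import numpy as np

    def second_order_terms(u, v):
        """One row of the P matrix (Equation 20) for an observed point [u, v]."""
        return np.array([1.0, u, v, u * u, u * v, v * v])

    def fit_correction_vectors(observed_pts, ideal_pts):
        """Least-squares fit of the correction vectors a and b (Equations 21, 22).

        observed_pts : (m, 2) array of observed image points [u, v]
        ideal_pts    : (m, 2) array of ideal image points [j, k]
        """
        P = np.array([second_order_terms(u, v) for u, v in observed_pts])
        j = ideal_pts[:, 0]                      # Equation 18
        k = ideal_pts[:, 1]                      # Equation 19
        a, *_ = np.linalg.lstsq(P, j, rcond=None)
        b, *_ = np.linalg.lstsq(P, k, rcond=None)
        return a, b

    def correct_point(u, v, a, b):
        """Apply the second-order correction of Equations 12 and 13."""
        terms = second_order_terms(u, v)
        return terms @ a, terms @ b              # corrected [j', k']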
CALIBRATING THE CAMERA

Mounting the camera on the end of an n-joint robot has many advantages. To benefit from the flexibility of such a scheme the camera matrix transform C must be known. The scene can be illustrated by Figure 2. If the homogeneous 3D coordinates of a reference object in its own coordinate frame are described by the matrix O and the corresponding observed image by I_o, then

C (R A_1 . . . A_n)^{-1} O = I_o     (23)

where R is a 4 x 4 transform from the reference object to the robot base; A_1, . . . , A_n are 4 x 4 Denavit-Hartenberg A transforms for each joint^{10,11}; and C is a 3 x 4 transform from A_n to the lens and camera model inclusive. Equation 23 is of the same form as Equation 1 except that the term (R A_1 . . . A_n)^{-1} modifies the set of object points O. Thus if R represents a translation of p_x, p_y and p_z and a rotation of α, β and γ along and about the x, y and z axes, respectively, of the reference object coordinate frame, then

    [ C_β C_γ                  -C_β S_γ                 S_β        p_x ]
R = [ C_α S_γ + S_α S_β C_γ    C_α C_γ - S_α S_β S_γ    -S_α C_β   p_y ]     (24)
    [ S_α S_γ - C_α S_β C_γ    S_α C_γ + C_α S_β S_γ    C_α C_β    p_z ]
    [ 0                        0                        0          1   ]

where the notation C_α and S_α (etc.) represents cos α and sin α (etc.) respectively. The Denavit-Hartenberg A matrices relate the orientation and position of one link's coordinate frame to the next using four variables θ_i, d_i, a_i and α_i. The physical meaning of the A matrix convention is illustrated in Figure 3. The A transform for joint i is described by

      [ cos θ_i    -sin θ_i cos α_i    sin θ_i sin α_i     a_i cos θ_i ]
A_i = [ sin θ_i    cos θ_i cos α_i     -cos θ_i sin α_i    a_i sin θ_i ]     (25)
      [ 0          sin α_i             cos α_i             d_i         ]
      [ 0          0                   0                   1           ]

where θ_i is the joint angle, d_i is the joint i offset, a_i is the link i length and α_i is the link i twist angle. If the transforms A_1, . . . , A_n and R are known beforehand, i.e. if the robot model is accurate, then the 3D homogeneous coordinates O can be transformed to 3D points relative to the camera, ᶜO, where

ᶜO = (R A_1 . . . A_n)^{-1} O     (26)

Equation 23 can then be rewritten as

C ᶜO = I_o     (27)

which has the same form as Equation 1. ᶜO (4 x m) and I_o (3 x m) are homogeneous representations of 3D world and 2D image points with respect to C, where m is the number of points. Using the transform (R A_1 . . . A_n)^{-1}, more than one view of the reference object can be incorporated into ᶜO and I_o. In tests, the reference object used for the calibration was a cube, with the three visible sides each displaying a white embossed square contrasting strongly with the main cube body, which was coloured black. The reference object provides 12 clearly visible corner points. The calibration method devised for calculating C uses 120 points resulting from locating the corner points using one-dimensional Gaussian operators^{12,13} in each of ten views. C is then found using Equations 8 and 27, the solution being overdetermined to minimize the effects of repeatability and accuracy errors in the term (R A_1 . . . A_n)^{-1} and errors in I_o, the matrix of image points. The camera's linearity vectors a and b can be found using the method described above, provided that a set of ideal image points and corresponding observed image points are available. A set of ideal image points, I_i, can be calculated using the best-fit camera model, C, and the set of object points ᶜO:

I_i = C ᶜO     (28)

Figure 2. Camera mounted on an n-joint robot

Figure 3. A transform parameters
Equation 28 represents the purely linear transformation from the actual 3D homogeneous world points to a set of ideal 2D homogeneous image points. If all the terms in C and ᶜO were correct, i.e. the camera transformation was perfectly linear and the positional information implicit in R and the A matrices was totally accurate, the image points I_i and I_o would correspond exactly. If the camera transformation was nonlinear this exact correspondence would not be achieved, but the ideal image points I_i calculated from Equation 28 could be used as an estimate of the ideal image points to correct the observed image points I_o. Hence the set of ideal image points [j_1, k_1], . . . , [j_m, k_m] found from I_i can be used to form vectors j and k using Equations 18 and 19. The matrix P is formed from the set of corresponding observed image points [u_1, v_1], . . . , [u_m, v_m] using Equation 20. The vectors a and b are then calculated using Equations 21 and 22.
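The calibration flow of Equations 25-28 can be sketched as follows (Python/NumPy; not part of the original paper, and it reuses the illustrative fit_camera_matrix helper from the earlier sketch with placeholder joint parameters).

    import numpy as np

    def dh_matrix(theta, d, a, alpha):
        """Denavit-Hartenberg A matrix for one joint (Equation 25)."""
        ct, st = np.cos(theta), np.sin(theta)
        ca, sa = np.cos(alpha), np.sin(alpha)
        return np.array([[ct, -st * ca,  st * sa, a * ct],
                         [st,  ct * ca, -ct * sa, a * st],
                         [0.0,      sa,       ca,      d],
                         [0.0,     0.0,      0.0,    1.0]])

    def object_points_to_camera_frame(R, dh_params, O):
        """Equation 26: cO = (R A_1 ... A_n)^-1 O.

        R         : 4 x 4 transform from the reference object to the robot base
        dh_params : list of (theta, d, a, alpha) tuples, one per joint
        O         : 4 x m homogeneous object points in the reference-object frame
        """
        T = R.copy()
        for theta, d, a, alpha in dh_params:
            T = T @ dh_matrix(theta, d, a, alpha)
        return np.linalg.inv(T) @ O

    # Calibration sketch: accumulate camera-frame points cO (4 x m) and the
    # corresponding observed image points I_obs (m x 2) over several views, then
    # C = fit_camera_matrix((cO[:3] / cO[3]).T, I_obs)   # Equations 8 and 27
    # ideal = C @ cO                                     # Equation 28
    # ideal_jk = (ideal[:2] / ideal[2]).T                # ideal [j, k] points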
APPLICATION OF CAMERA MODEL TO STEREO VISION

Knowledge of C and the linearity vectors a and b allows the 3D position of observed image points from more than one view to be determined accurately using triangulation. The view of an object with coordinates O can be represented as before by

C (R A_1 . . . A_n)^{-1} O = I     (29)

If the substitution

C' = C (R A_1 . . . A_n)^{-1}     (30)

is made, then Equation 29 can be rewritten as

C' O = I     (31)
A single point [X_i, Y_i, Z_i, 1] will then correspond to an image point [u_i, v_i] with homogeneous image coordinates [ip_0i, ip_1i, ip_2i], where

ip_0i = c'_00 X_i + c'_01 Y_i + c'_02 Z_i + c'_03     (32)

ip_1i = c'_10 X_i + c'_11 Y_i + c'_12 Z_i + c'_13     (33)

ip_2i = c'_20 X_i + c'_21 Y_i + c'_22 Z_i + c'_23     (34)

u_i = ip_0i / ip_2i     (35)

v_i = ip_1i / ip_2i     (36)

Two linear equations in the unknowns X_i, Y_i and Z_i can be formed from Equations 32-36:

X_i (c'_00 - u_i c'_20) + Y_i (c'_01 - u_i c'_21) + Z_i (c'_02 - u_i c'_22) = u_i c'_23 - c'_03     (37)

X_i (c'_10 - v_i c'_20) + Y_i (c'_11 - v_i c'_21) + Z_i (c'_12 - v_i c'_22) = v_i c'_23 - c'_13     (38)
If an object point is viewed from two different viewpoints
then there will be four equations in X_i, Y_i and Z_i, overdetermining the solution to X_i, Y_i and Z_i. Solving for the two sets of Equations 37 and 38, corresponding to the two views of the homogeneous object point [X_i, Y_i, Z_i, 1], is equivalent to finding the intersection of the two lines of sight; each pair of Equations 37 and 38 corresponds to the equation of a line in 3D space. The effect of errors in the terms of Equations 37 and 38 can be minimized by choosing the angle of intersection between the lines of sight to be as large as possible. A least-squares approach can be employed to determine the object point's coordinates. The nonlinearity effects of the camera can be corrected if the image points are adjusted using Equations 12 and 13 and the linearity vectors a and b. The coordinates [X_i, Y_i, Z_i, 1] are found in terms of the reference base coordinate frame since C' is specified relative to the reference base coordinate frame. The object point [X_i, Y_i, Z_i, 1] is therefore found to the absolute accuracy of the robot model, and is dependent on errors in the repeatability and accuracy of the robot.
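The triangulation of Equations 37 and 38 can be sketched as follows (Python/NumPy, illustrative only; the image points are assumed to have been corrected beforehand using Equations 12 and 13). Each view contributes two rows to an overdetermined linear system in [X_i, Y_i, Z_i], which is solved by least squares.

    import numpy as np

    def triangulate(views):
        """Least-squares triangulation from two or more views.

        views : list of (C_prime, (u, v)) pairs, where C_prime is the 3x4
        transform of Equation 30 for that view and (u, v) is the observed,
        distortion-corrected image point.
        Returns the object point [X, Y, Z] in the reference base frame.
        """
        rows, rhs = [], []
        for C_prime, (u, v) in views:
            c = C_prime
            # Equation 37 (u row) and Equation 38 (v row)
            rows.append(c[0, :3] - u * c[2, :3])
            rhs.append(u * c[2, 3] - c[0, 3])
            rows.append(c[1, :3] - v * c[2, :3])
            rhs.append(v * c[2, 3] - c[1, 3])
        X, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
        return X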
SIMULATION RESULTS OF CAMERA CALIBRATION
A simulation was used to test the effectiveness of the calibration method, as there are two sources of potential error within the robot-camera system. The first source of error is the positional information derived from the term (R A_1 . . . A_n)^{-1} O which relates the 3D world object points to 3D object points relative to the camera transform C. Any errors in the robot R or A transforms, due to imperfect repeatability and accuracy, or in the set of object points O, will produce errors when calculating (R A_1 . . . A_n)^{-1} O. The second source of errors is from the transform C, which only holds true if the camera transform is perfectly linear. The interaction between the positional errors in (R A_1 . . . A_n)^{-1} O and nonlinearity errors in C is an important factor in deciding the effectiveness of the calibration method in determining C and the nonlinearity vectors a and b. A precise way to measure the interaction, rather than using an actual robot-camera system with a mixture of unknown positional and nonlinearity errors, is to simulate the entire process and superimpose known positional and nonlinearity errors in varying quantities.

A controlled test of the accuracy of the calibration technique was made using simulated sets of perfect and imperfect image points. A simple camera model was used for the C linear transform, and ten different positions of a robot end effector were calculated with the reference object in view. The set of image points was then found using Equation 29. The robot model used for the A matrices was a five-axis Mitsubishi RM-501 Movemaster. If the above set of image points was used to find C, a and b using the calibration method described earlier, the results would naturally reveal a perfect correspondence between object and image points with a purely linear transform and values for a and b as follows:

a = [0, 1, 0, 0, 0, 0]^T     (39)

b = [0, 0, 1, 0, 0, 0]^T     (40)
The effectiveness of the calibration method was tested by altering the input data in two ways. First, errors in the position of the end effector were introduced by adding small movements to the transform (R A_1 . . . A_n)^{-1} of known amplitude but random direction. Position errors can be caused by both accuracy and repeatability errors in the robot. The movements consisted of both a rotation and a translation. Secondly, the set of image points [j_1, k_1], . . . , [j_m, k_m] calculated from the homogeneous set of image points I was distorted using linearity vectors p and q to produce a new set of image points [u_1, v_1], . . . , [u_m, v_m] so that

u = p_0 + p_1 j + p_2 k + p_3 j^2 + p_4 jk + p_5 k^2     (41)

v = q_0 + q_1 j + q_2 k + q_3 j^2 + q_4 jk + q_5 k^2     (42)

where

p = [0.9, 1.09, 0.09, 0.00009, 0.00009, 0.00009]^T     (43)

q = [0.5, 0.05, 1.05, 0.00005, 0.00005, 0.00005]^T     (44)
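As a small worked example of Equations 41-44 (Python/NumPy, illustrative; the vectors p and q are those of Equations 43 and 44), the sketch below warps an ideal image point [j, k] into the simulated distorted observation [u, v].

    import numpy as np

    p = np.array([0.9, 1.09, 0.09, 0.00009, 0.00009, 0.00009])
    q = np.array([0.5, 0.05, 1.05, 0.00005, 0.00005, 0.00005])

    def distort(j, k):
        """Equations 41 and 42: simulate geometric distortion of an ideal point."""
        terms = np.array([1.0, j, k, j * j, j * k, k * k])
        return terms @ p, terms @ q   # distorted [u, v]

    # Example: distort an ideal image point at [100, 200] pixels
    u, v = distort(100.0, 200.0)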
Equations 41 and 42 simulated the geometric distortion that might be present in a heavily distorted camera. C, a and b were then found using the distorted input image values and the method described above. The quantitative error used to calculate the effectiveness of the calibration method was based on finding the set of 3D object coordinate points for the set of distorted input image points [u_1, v_1], . . . , [u_m, v_m]. The 3D object coordinates were found using C, a and b and the triangulation method described above. The set of input object coordinate points, O, described by the points [X_1, Y_1, Z_1], . . . , [X_m, Y_m, Z_m], can be used to form vectors x, y and z such that

x = [X_1, . . . , X_m]     (45)

y = [Y_1, . . . , Y_m]     (46)

z = [Z_1, . . . , Z_m]     (47)

The set of output object points O' found from Equations 37 and 38 can be similarly described by

x' = [X'_1, . . . , X'_m]     (48)

y' = [Y'_1, . . . , Y'_m]     (49)

z' = [Z'_1, . . . , Z'_m]     (50)
The triangulation error in determining O is then

E = ||x - x'||_2^2 + ||y - y'||_2^2 + ||z - z'||_2^2     (51)

where E represents the error (in m^2) summed over the points used in the calibration. Figure 4 shows the log-log variation in E with increasing position errors, using both two and ten viewpoints of the object. Lines 1-3 are the results based on using Equations 37 and 38 for only two views, whilst lines 4-6 are for ten views. Lines 1-3 demonstrate that geometric distortion was the main source of error in triangulation for two views until the position errors were of the order of 0.05 mm + 0.0005 rad, and lines 4-6 show that a similar transition occurs for ten views but at greater position errors of 0.1 mm + 0.001 rad. This is due to the averaging effect of the least-squares method reducing the significance of any position errors present when using more than the two views necessary for triangulation. In both instances the geometric distortion was significantly corrected, if not eliminated altogether, by the use of linearity vectors a and b. Line 6 shows that position errors interfere with the process of calculating the vectors a and b, which results in a higher triangulation error than that due to the lowest position errors. The graph shows that geometric distortion can be corrected to the extent that position errors become the dominant source of error.

An existing robot-camera system was calibrated using the method discussed. The robot used was a Mitsubishi RM-501 Movemaster with a quoted repeatability of 1 mm. The camera was a 512 x 512 CCD device. The results showed linearity errors to be negligible in comparison to positional errors.

Figure 4. Triangulation error, E, against position error (position errors from 0.01 mm + 0.0001 rad to 1.0 mm + 0.01 rad). 1, two views, no geometric distortion; 2, two views, distorted; 3, two views, distortion corrected; 4, ten views, no geometric distortion; 5, ten views, distorted; 6, ten views, distortion corrected

CONCLUSIONS
A method of calibrating the camera transformation on a robot has been detailed with a full analysis of the linearity of the camera. The simulation results show that robot repeatability and accuracy errors are probably the dominant source of triangulation errors in a robot vision system, but that geometric distortion of the camera can be corrected if it is a significant factor. Knowledge of the camera transform C allows the 3D position of object points to be determined using triangulation. This method of camera transform calibration is currently used in Muse, a vision system being developed at Imperial College, London.
REFERENCES

1 Clocksin, W F et al. 'Progress in visual feedback for robotic arc-welding of thin sheet steel' Proc. 2nd Int. Conf. Robot Vision & Sensory Controls (1982)
2 Haralick, R M 'Using perspective transformations in scene analysis' Comput. Graph. Image Process. Vol 13 (1980)
3 Haralick, R M and Chu, Y H 'Solving camera parameters from the perspective projection of a parameterized curve' Pattern Recogn. Vol 17 No 6 (1984)
4 Sobel, I 'On calibrating computer controlled cameras for perceiving 3-D scenes' Artificial Intelligence Vol 5 (1974)
5 Roberts, L G 'Machine perception of three-dimensional solids' Tech. Report No. 315, MIT Lincoln Laboratory, Cambridge, MA, USA (May 1963)
6 Pratt, W K Digital image processing Wiley, New York, NY, USA (1978)
7 Wong, R Y 'Sensor transformations' IEEE Trans. Syst. Man Cybernetics Vol SMC-7 No 12 (December 1977)
8 Ben-Israel, A and Greville, T N E Generalized inverses: theory and applications Wiley, New York, NY, USA (1974)
9 Noble, B Applied linear algebra Prentice-Hall, Englewood Cliffs, NJ, USA (1969)
10 Denavit, J and Hartenberg, R S 'A kinematic notation for lower-pair mechanisms based on matrices' ASME J. Appl. Mech. Vol 22 (1955)
11 Paul, R Robot manipulators: mathematics, programming and control MIT Press, Cambridge, MA, USA (1981)
12 Hildreth, E 'The detection of intensity changes by computer and biological vision systems' Comput. Vision Graph. Image Process. Vol 22 (1983)
13 Marr, D and Hildreth, E 'Theory of edge detection' Proc. R. Soc. London Ser. B Vol 207 (1980)