Auto-calibration of visual sensor parameters on a robotic head

Thierry Viéville
INRIA, Sophia Antipolis, BP 93, 06902 Valbonne, France (email: thierry@sophia.inria.fr)
Paper received: 3 July 1992; revised paper received: 7 April 1993

We propose a new method of auto-calibration for a visual sensor mounted on a robotic head. The method is based on the tracking of stationary targets while the robotic system performs a specific controlled displacement, namely a fixed-axis rotation. We derive the equations related to this paradigm, and demonstrate that both intrinsic and extrinsic parameters can be calibrated using this method. Experimental results are provided.

Keywords: calibration, auto-calibration, robotic head

Visual sensor calibration is the problem of determining the parameters of the transformation between the 3D information of the imaged object in space and the 2D observed image. Such a relationship is mandatory for 3D vision. More precisely, we have to know the location (translation) and attitude (rotation) of the visual sensor with respect to the rest of the robotic system (extrinsic parameters), and the different parameters of the lens, such as the focal length, the magnitude factors and the retinal location of the optical centre (intrinsic parameters). The problem of calibrating a number of cameras is becoming extremely important in the field of active vision, which makes use of artificial robotic heads [1, 2]. For such a system the accuracy of the visual perception relies on it. Even though this problem has often been considered in the past for static visual systems [3-5] (see Tsai [6] for a review), very little is known about the calibration of active visual systems, in which the extrinsic and intrinsic parameters of the visual sensor are modified dynamically. For instance, when tuning the zoom and focus ratios of a lens these parameters are modified and must be recomputed.

In addition, the lack of easily usable calibration procedures has led many workers in the field to base their calibration techniques on the observation of a specific object (a calibration grid) whose size and shape are known in advance. Such a calibration must be done at the beginning of a working session. In the case of an active visual system, the visual sensor would have to be calibrated for every possible configuration, which is not realistic since such a mount has 5-13 degrees of freedom. However, some authors have already attempted to define calibration procedures in which the equations used to identify the parameters do not come from the knowledge of a given object, but from hypotheses made about the visual surroundings. Such approaches are either based on general projective invariants [7, 8], or limited to the problem of calibrating the epipolar geometry of a stereo rig [9-11]. In fact the term 'auto-calibration' or self-calibration for a visual sensor is usable when the method allows the system to calibrate (1) automatically, (2) without any semantic knowledge of the observed objects, or (3) without a complete knowledge of the ego-motion.

Considering all these methods, the basic constraint is the following: the observed objects are assumed to be stationary, thus forming a rigid configuration. The first question is thus how to detect tokens corresponding to the same rigid object without knowledge of their 3D parameters, and without calibration. Perhaps surprisingly, this problem has already been solved: although the criteria used to segment moving objects are not entirely rigorous, they are often more robust than methods involving 3D parameters [15], especially for small motion disparities [16]. In the case of several moving objects present in the field of view, the segmentation mechanism can easily decide which rigid motion corresponds to the stationary tokens [12, 14]. It is then reasonable to base our study on the assumption that the visual input is a set of stationary tokens. Finally, token correspondences from one frame to another are required by such mechanisms. These are obtained using token-tracker techniques [17, 18]. In this paper, for simplicity, we take points as tokens.

On robotic heads, the need for calibration procedures which are not based on any specific object known in advance, but on the observation of stationary points of the visual surroundings, is crucial [1, 10]. Specific and possibly more efficient mechanisms can be developed in this case. Such mechanisms are specific because on a robotic head we partially know the motion of the visual sensor. Precisely, all robotic heads perform controlled rotations around a few fixed axes, and the relative angle of this rotation is known with a high precision*. Using this capability, one might expect some simplifications in the problem of calibration. Moreover, there is a need to design a specific method in this case because standard general methods fail, since this kind of motion is singular for a general method of auto-calibration (as shown by Luong [20]). To overcome this problem, we must introduce additional information, for instance about the angle of rotation. We will thus make two assumptions about the active motion of the sensor: (1) the rotation is performed around a fixed axis; and (2) the relative angle of rotation is known. Can a system be calibrated using these two assumptions? The answer is yes, and the related equations are developed in this paper.

*A typical precision is 0.01 deg, with a repeatability better than 0.1 deg, for an angular (rotary) joint.

REVIEWING THE VISUAL SENSOR CALIBRATION PROBLEM

The camera model used is the simple pin-hole model, assuming the camera performs a perfect perspective transformation with centre O (the camera's optic centre) at a distance f (the focal length) from the retinal plane. We denote by (u, v) the horizontal and vertical coordinates of a pixel in the image plane. All coordinates are related to a frame of reference attached to the retina (x, y, z), z being aligned with the optical axis, as shown in Figure 1. This retinal frame corresponds to a rotation angle θ = 0. However, let us consider that the camera is not yet calibrated, but that the projection (u, v) of a 3D point (X, Y, Z) onto the retina is given, in homogeneous coordinates, by:

$$
\begin{pmatrix} s\,u \\ s\,v \\ s \end{pmatrix}
=
\begin{pmatrix} \alpha_u & 0 & u_0 \\ 0 & \alpha_v & v_0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \end{pmatrix}
= A \begin{pmatrix} X \\ Y \\ Z \end{pmatrix}
$$

where $\alpha_u$ and $\alpha_v$ are the horizontal and vertical magnitude factors (or scales, including the focal length), and $(u_0, v_0)$ corresponds to the location of the optical centre. This is a standard camera model [3]. As shown in Figure 1, this corresponds to the usual equations $u = \alpha_u X/Z + u_0$ and $v = \alpha_v Y/Z + v_0$. See Toscani and Faugeras [3] for a review.
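To make the model concrete, here is a minimal numerical sketch of this pin-hole projection, assuming numpy; the parameter values are illustrative only, not taken from the paper.

```python
import numpy as np

def intrinsic_matrix(alpha_u, alpha_v, u0, v0):
    """Intrinsic matrix A of the pin-hole model (equation above)."""
    return np.array([[alpha_u, 0.0,     u0],
                     [0.0,     alpha_v, v0],
                     [0.0,     0.0,     1.0]])

def project(A, M):
    """Project a 3D point M = (X, Y, Z), given in the retinal frame, to pixel coordinates (u, v)."""
    s_uv = A @ np.asarray(M, dtype=float)   # homogeneous coordinates (su, sv, s)
    return s_uv[:2] / s_uv[2]               # perspective division; s = Z here

# Illustrative values only (not the paper's calibration results).
A = intrinsic_matrix(alpha_u=800.0, alpha_v=1200.0, u0=256.0, v0=256.0)
print(project(A, (0.1, -0.05, 2.0)))        # -> approximately [296., 226.]
```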

Figure 1: Geometric relationships in the calibration problem.

In the rest of the paper, we use the matrix A to represent the intrinsic parameters of the system. This matrix is invertible, since neither $\alpha_u$ nor $\alpha_v$ is expected to be zero; in fact $\alpha_u > 0$ and $\alpha_v > 0$. It is important to note that $\alpha_u/\alpha_v$ depends upon both the geometry of a pixel and the way in which the digitization is performed. It is usually different from one, because pixels are rectangular and the sampling rates of the digitization board might not be homogeneous. However, this value is almost fixed, and thus has to be calibrated only once. On the contrary, the average value of $\alpha_u$ and $\alpha_v$ is directly related to the focal length of the lens, and varies when zooming. When zooming, or when modifying the parameters of the lens, both the magnitude factors and the optical centre might change, and it is very important to be able to recover the intrinsic parameters $(u_0, v_0, \alpha_u, \alpha_v)$. Let us now see how to derive a method of calibration which is only based on the observation of stationary points in the background, while performing a partially known motion.†

†In fact we could have used a more sophisticated model for the calibration. This would have yielded the same kind of equations, but with more costly derivations.

EQUATIONS FOR TRACKING A STATIONARY POINT

Let us consider a fixed point M = (X, Y, Z) in space and its projection onto the image plane in homogeneous coordinates m = (u, v, 1). During the head displacement, the point M = (X, Y, Z) does not project directly onto the retina, but is subject to this rotation, say of axis Δ and angle θ. For θ = 0 the rotated frame of reference corresponds to the retinal frame of reference. The axis is parametrized by a unit vector $u = (u_x, u_y, u_z)^T$ aligned with the axis of rotation and a point $C = (C_x, C_y, C_z)$ on this axis. We choose C such that C ⊥ u. Thus, this point is uniquely defined, and ||OC|| is the distance from the axis Δ to the origin. The related equation is:

$$
M \mapsto M' = R(u, \theta)\, M + \big(I - R(u, \theta)\big)\, C = R(u, \theta)\,(M - C) + C
\tag{1}
$$

where $R(u, \theta)$ denotes the rotation matrix, while the induced translation is $t = (I - R(u, \theta))\,C$.

The projection of M onto the image plane describes a planar trajectory which can be parametrized by θ. Its coefficients are functions of: (1) the intrinsic calibration parameters $(u_0, v_0, \alpha_u, \alpha_v)$; (2) the rotation parameters (extrinsic parameters) u and C; and (3) the 3D depth Z of the point. Let us make this relation more explicit. There is an explicit formula for $R(u, \theta)$, using the notation‡ $\tilde{u} \equiv u \wedge \cdot$, which is related to a compact representation of rotations (the Rodrigues formula) [21]. We have§:

$$
R(u, \theta) = I_{3\times 3} + \sin(\theta)\, \tilde{u} + \big(1 - \cos(\theta)\big)\, \tilde{u}^2
\tag{2}
$$

‡For a vector $u = (u_x, u_y, u_z)^T$ we have
$$
\tilde{u} = \begin{pmatrix} 0 & -u_z & u_y \\ u_z & 0 & -u_x \\ -u_y & u_x & 0 \end{pmatrix}
$$
§This representation is well defined for rotations with $\theta \in\, ]0, \pi[$ and $u \in S^2$ (the unit sphere). If $\theta = 0$ (identity) or $\theta = \pi$ (symmetry), the representation is singular, but the formula is still valid. Moreover, since we control θ, this situation can be avoided in practice.
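A minimal numerical sketch of equation (2), assuming numpy; it can be checked on simple special cases.

```python
import numpy as np

def skew(u):
    """Antisymmetric matrix u~ such that skew(u) @ x == cross(u, x)."""
    ux, uy, uz = u
    return np.array([[0.0, -uz,  uy],
                     [ uz, 0.0, -ux],
                     [-uy,  ux, 0.0]])

def rodrigues(u, theta):
    """Rotation matrix of angle theta around the unit axis u (equation (2))."""
    u = np.asarray(u, dtype=float)
    u = u / np.linalg.norm(u)                    # enforce a unit axis
    u_tilde = skew(u)
    return np.eye(3) + np.sin(theta) * u_tilde + (1.0 - np.cos(theta)) * (u_tilde @ u_tilde)

# Quick check: a 90-degree rotation around z maps x onto y.
R = rodrigues([0.0, 0.0, 1.0], np.pi / 2)
print(np.round(R @ np.array([1.0, 0.0, 0.0]), 6))   # -> [0. 1. 0.]
```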

Using equations (1) and (2) we can write:

$$
z(\theta)\, m(\theta) = A\,\big[R(u,\theta)\,(M-C) + C\big]
= A\,\big[M + \sin(\theta)\,\tilde{u}\,(M-C) + (1-\cos(\theta))\,\tilde{u}^2\,(M-C)\big]
$$
$$
= A\,\big[u\,u^T M + C + \sin(\theta)\, u \wedge (M-C) + \cos(\theta)\,\big((I - u\,u^T)\,M - C\big)\big]
\tag{3}
$$

This equation can be factorized as:

$$
z(\theta)\, m(\theta) = Z\,\big[\mathcal{O} + \mathcal{C}\cos(\theta) + \mathcal{S}\sin(\theta)\big]
$$

if we write:

$$
\mathcal{O} = \frac{1}{Z}\, A\,\big(u\,u^T M + C\big), \qquad
\mathcal{S} = \frac{1}{Z}\, A\,\big(u \wedge (M-C)\big), \qquad
\mathcal{C} = \frac{1}{Z}\, A\,\big((I - u\,u^T)\,M - C\big)
$$
$$
\mathcal{P} = (O_1, O_2, O_3, S_1, S_2, S_3, C_1, C_2, C_3)^T
\tag{4}
$$

(The factor 1/Z simply normalizes the coefficients so that $O_3 + C_3 = 1$, since the third row of A is (0, 0, 1).) Considering each image coordinate, the previous set of equations can also be written:

$$
u(\theta) = \frac{C_1\cos(\theta) + S_1\sin(\theta) + O_1}{C_3\cos(\theta) + S_3\sin(\theta) + O_3} = u(\mathcal{P}, \theta), \qquad
v(\theta) = \frac{C_2\cos(\theta) + S_2\sin(\theta) + O_2}{C_3\cos(\theta) + S_3\sin(\theta) + O_3} = v(\mathcal{P}, \theta)
\tag{5}
$$

Considering equation (5), we have the following obvious result:

Proposition 1. During a fixed axis rotation, the projection of a stationary point onto the retina is a planar curve whose form is given in equation (5), parametrized by the angle of rotation θ, and whose coefficients $\mathcal{P} = (O_1, O_2, O_3, S_1, S_2, S_3, C_1, C_2, C_3)^T$ are uniquely defined up to a scale factor.

As expected, $\mathcal{P}$ is a function of the calibration parameters $\{u, C, (\alpha_u, \alpha_v, u_0, v_0)\}$ and of the point location M, as given by equations (4). Moreover, all the information available about the calibration parameters, considering the projection of a stationary point, is contained in $\mathcal{P}$. With this formulation, we can perform auto-calibration in two steps:

1. Identification of the coefficients of the trajectory of the projection of one or several stationary points during a rotation of the mount. (This is done in the following section.)
2. Computation, knowing these coefficients, of some information about the calibration parameters. (This is done in the subsequent section.)
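A sketch of equations (4) and (5), assuming numpy and reusing intrinsic_matrix(), project() and rodrigues() from the sketches above: it builds the coefficient vector P from (A, u, C, M) and checks that the closed-form trajectory matches the direct rotate-then-project model. All names and values are illustrative.

```python
import numpy as np
# Reuses intrinsic_matrix(), project() and rodrigues() from the sketches above.

def trajectory_coefficients(A, u, C, M):
    """Coefficient vector P = (O, S, C) of equations (4), normalized so that O3 + C3 = 1."""
    u, C, M = (np.asarray(x, dtype=float) for x in (u, C, M))
    Z = M[2]
    O  = A @ (np.outer(u, u) @ M + C) / Z
    S  = A @ np.cross(u, M - C) / Z
    Cc = A @ ((np.eye(3) - np.outer(u, u)) @ M - C) / Z
    return np.concatenate([O, S, Cc])           # (O1,O2,O3, S1,S2,S3, C1,C2,C3)

def curve(P, theta):
    """Image trajectory u(P, theta), v(P, theta) of equation (5)."""
    O, S, Cc = P[0:3], P[3:6], P[6:9]
    den = Cc[2] * np.cos(theta) + S[2] * np.sin(theta) + O[2]
    return ((Cc[0] * np.cos(theta) + S[0] * np.sin(theta) + O[0]) / den,
            (Cc[1] * np.cos(theta) + S[1] * np.sin(theta) + O[1]) / den)

# Consistency check against equations (1)-(3): rotate M about the axis, then project.
A = intrinsic_matrix(800.0, 1200.0, 256.0, 256.0)                      # illustrative values
u_axis, C_pt, M_pt = np.array([0., 1., 0.]), np.array([0.1, 0., 0.05]), np.array([0.3, -0.2, 2.0])
P = trajectory_coefficients(A, u_axis, C_pt, M_pt)
theta = 0.2
M_rot = rodrigues(u_axis, theta) @ (M_pt - C_pt) + C_pt
print(np.allclose(curve(P, theta), project(A, M_rot)))                 # -> True
```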

RECOVERING THE TRAJECTORY PARAMETERS

Given a set of angles $\theta_i$ and a set of related retinal points $(u_i, v_i)$, $i = 1, \ldots, N$, we now want to compute the vector $\mathcal{P}$, up to a scale factor, as given in equation (5). To choose a criterion for this estimation, we first make use of two simple hypotheses:

- For each data point $(u_i, v_i)$, the measurement errors are independent and have a constant covariance $\Lambda_i = \sigma^2 I_{2\times 2}$.
- The quantities $\{\theta_1, \ldots, \theta_N\}$ are known, and their related measurement errors are negligible.

In such a case, a reasonable criterion is:

$$
\frac{1}{\sigma^2} \sum_i
\left[ u_i - \frac{C_1\cos(\theta_i) + S_1\sin(\theta_i) + O_1}{C_3\cos(\theta_i) + S_3\sin(\theta_i) + O_3} \right]^2
+ \left[ v_i - \frac{C_2\cos(\theta_i) + S_2\sin(\theta_i) + O_2}{C_3\cos(\theta_i) + S_3\sin(\theta_i) + O_3} \right]^2
$$
$$
= \frac{1}{\sigma^2} \sum_i \big[u_i - u(\mathcal{P}, \theta_i)\big]^2 + \big[v_i - v(\mathcal{P}, \theta_i)\big]^2
\tag{6}
$$

This corresponds to a $\chi^2$ criterion, also called a pseudo-Mahalanobis distance, and has a statistical interpretation [22]. However, minimizing this criterion leads to non-linear equations, and it might not be straightforward to find the solutions. To overcome this difficulty, we recover the coefficients of the trajectory in two steps. We first derive analytically an initial estimate for the coefficients, and then refine this first estimate numerically.
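For concreteness, here is a direct transcription of criterion (6), assuming numpy and the curve() helper sketched above.

```python
import numpy as np
# Reuses curve() from the sketch above.

def chi2_criterion(P, thetas, us, vs, sigma=1.0):
    """Criterion (6): sum of squared retinal residuals for a candidate coefficient vector P."""
    u_model, v_model = curve(P, np.asarray(thetas, dtype=float))
    return (np.sum((np.asarray(us) - u_model) ** 2 + (np.asarray(vs) - v_model) ** 2)
            / sigma ** 2)
```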

Initial estimate of the coefficients

A simple way to obtain an initial estimate of the coefficients is to compute the Taylor expansion of these expressions. Let us only consider $u_i$; the same derivation applies to $v_i$. The Taylor expansion of $u_i$ up to the third order is a huge expression, but it can in fact be worked out if we use the constraint $O_3 + C_3 = 1$, which holds with the normalization of equations (4), and which we may in any case enforce since $\mathcal{P}$ is defined up to a scale factor. This constraint is useful to simplify the equations, and ensures that the expression is not ill conditioned, since $u_i|_{\theta_i = 0} = (O_1 + C_1)/(O_3 + C_3)$. In that case we obtain:

$$
u_i = u_{i0} + u_{i1}\,\theta_i + u_{i2}\,\theta_i^2 + u_{i3}\,\theta_i^3 + O(\theta_i^4)
$$
with
$$
u_{i0} = C_1 + O_1, \qquad
u_{i1} = S_1 - S_3\,(C_1 + O_1),
$$
$$
u_{i2} = \Big(\tfrac{C_3}{2} + S_3^2\Big)(C_1 + O_1) - \tfrac{C_1}{2} - S_1 S_3, \qquad
u_{i3} = \Big(\tfrac{S_3}{6} - S_3 C_3 - S_3^3\Big)(C_1 + O_1) + S_1\Big(\tfrac{C_3}{2} + S_3^2\Big) + \tfrac{C_1 S_3}{2} - \tfrac{S_1}{6}
$$

and a similar expansion exists for $v_i$. Considering this Taylor expansion up to the third order allows us to compute $\mathcal{O}$, $\mathcal{S}$ and $\mathcal{C}$ from a set of linear equations. Using the same notation for v we obtain, after a small amount of algebra:

$$
\begin{aligned}
u_{i1}\, O_3 + 2\, u_{i2}\, S_3 &= \tfrac{2}{3}\, u_{i1} - 2\, u_{i3} \\
v_{i1}\, O_3 + 2\, v_{i2}\, S_3 &= \tfrac{2}{3}\, v_{i1} - 2\, v_{i3} \\
C_3 &= 1 - O_3 \\
O_1 = 2\, u_{i2} + u_{i0}\, O_3 + 2\, u_{i1}\, S_3, \quad
S_1 &= u_{i1} + u_{i0}\, S_3, \quad
C_1 = -2\, u_{i2} + u_{i0}\, C_3 - 2\, u_{i1}\, S_3 \\
O_2 = 2\, v_{i2} + v_{i0}\, O_3 + 2\, v_{i1}\, S_3, \quad
S_2 &= v_{i1} + v_{i0}\, S_3, \quad
C_2 = -2\, v_{i2} + v_{i0}\, C_3 - 2\, v_{i1}\, S_3
\end{aligned}
\tag{7}
$$

The first three equations allow us to compute the common denominator of u and v from linear equations, and the remaining equations then give the numerators. These equations are well defined if and only if $u_{i1} v_{i2} - u_{i2} v_{i1} \neq 0$. A singular estimate corresponds to the case where the curvature of the trajectory, $\kappa = 2\,(u_{i1} v_{i2} - u_{i2} v_{i1})/(u_{i1}^2 + v_{i1}^2)^{3/2}$, is zero, i.e. a rectilinear trajectory. We have determined experimentally that this corresponds to the case where the axis of rotation is parallel to the retinal plane while the projection of the tracked trajectory is close to the centre of the retina. This situation can thus be avoided in practice.

At the implementation level, a first estimate of the trajectory can be obtained by fitting a polynomial model to each coordinate separately, using for instance least-squares minimization. The orders of the models fitted to the u and v data do not need to be three, but can depend upon the nature of the trajectory. As soon as one of the polynomial models for u or v is of order at least two and the other one is of order at least one, the set of equations (7) is well defined. A sketch of this initial estimation is given below.
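A minimal sketch of the initial estimate, assuming numpy: each coordinate is fitted with a cubic polynomial in θ and the linear system (7) is then solved. Function and variable names are mine, not the paper's.

```python
import numpy as np

def initial_coefficients(thetas, us, vs):
    """Initial estimate of P = (O1..O3, S1..S3, C1..C3) from equations (7)."""
    thetas = np.asarray(thetas, dtype=float)
    # Per-coordinate polynomial fits u ~ u0 + u1*t + u2*t^2 + u3*t^3 (same for v).
    u3, u2, u1, u0 = np.polyfit(thetas, us, 3)
    v3, v2, v1, v0 = np.polyfit(thetas, vs, 3)
    # First three equations of (7): recover the common denominator (O3, S3), then C3.
    M = np.array([[u1, 2.0 * u2],
                  [v1, 2.0 * v2]])
    rhs = np.array([2.0 / 3.0 * u1 - 2.0 * u3,
                    2.0 / 3.0 * v1 - 2.0 * v3])
    O3, S3 = np.linalg.solve(M, rhs)             # requires u1*v2 - u2*v1 != 0
    C3 = 1.0 - O3
    # Remaining equations of (7): the numerators.
    O1, S1, C1 = 2*u2 + u0*O3 + 2*u1*S3, u1 + u0*S3, -2*u2 + u0*C3 - 2*u1*S3
    O2, S2, C2 = 2*v2 + v0*O3 + 2*v1*S3, v1 + v0*S3, -2*v2 + v0*C3 - 2*v1*S3
    return np.array([O1, O2, O3, S1, S2, S3, C1, C2, C3])
```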

Minimizing the non-linear criterion

Using the previous algorithm to obtain an initial estimate, the coefficient vector $\mathcal{P}$ is finally obtained by minimizing the criterion given in equation (6). To estimate these nine coefficients up to a scale factor, at least four data points are required. This means that the point must be tracked in at least four views.

Let us now compute the covariance of the coefficient vector $\mathcal{P}$. This is a non-linear least-squares minimization, corresponding to fitting a parametric curve $(u(\mathcal{P}, \theta_i), v(\mathcal{P}, \theta_i))$ to a set of 2D points $\{(u_1, v_1), \ldots, (u_N, v_N)\}$. For such an estimate we have the following result:

Proposition 2. Assuming that:
- every data point $(u_i, v_i)$ has the same precision, as does every horizontal and vertical coordinate, and all errors are independent;
- the quantities $\{\theta_1, \ldots, \theta_N\}$ are exactly known;
- the data points are close to the fitted curve, so that covariances are expected to be small;

the inverse of the covariance matrix $\Lambda_{\mathcal{P}}$ (the information matrix) for the optimal estimate $\mathcal{P}^*$ is:

$$
\Lambda_{\mathcal{P}}^{-1} = \frac{1}{\sigma^2} \sum_i
\frac{\partial u(\mathcal{P}^*, \theta_i)}{\partial \mathcal{P}}
\frac{\partial u(\mathcal{P}^*, \theta_i)}{\partial \mathcal{P}}^T
+
\frac{\partial v(\mathcal{P}^*, \theta_i)}{\partial \mathcal{P}}
\frac{\partial v(\mathcal{P}^*, \theta_i)}{\partial \mathcal{P}}^T
$$

Proof. The quantity $\mathcal{P}^*$ is a solution of the normal equations of the criterion (6). Let us write

$$
f\big((u_1, v_1, \theta_1), \ldots, (u_N, v_N, \theta_N); \mathcal{P}\big)
= \sum_i \big[u(\mathcal{P}, \theta_i) - u_i\big]\,\frac{\partial u(\mathcal{P}, \theta_i)}{\partial \mathcal{P}}
+ \big[v(\mathcal{P}, \theta_i) - v_i\big]\,\frac{\partial v(\mathcal{P}, \theta_i)}{\partial \mathcal{P}}
$$

so that $0 = f(\ldots; \mathcal{P}^*)$. Because covariances are expected to be small, it is realistic to estimate them using first-order expansions. Considering $\mathcal{P}^*$ as given by the implicit equation $0 = f(\mathcal{P}^*, \ldots)$, with f a function of the data $(u_i, v_i, \theta_i)$, the implicit function theorem gives

$$
\Lambda_{\mathcal{P}} =
\left(\frac{\partial f}{\partial \mathcal{P}}\right)^{-1}
\left[\sum_i \frac{\partial f}{\partial (u_i, v_i)} \,\Lambda_i\, \frac{\partial f}{\partial (u_i, v_i)}^T\right]
\left(\frac{\partial f}{\partial \mathcal{P}}\right)^{-T}
$$

Since

$$
\frac{\partial f}{\partial u_i} = -\frac{\partial u(\mathcal{P}, \theta_i)}{\partial \mathcal{P}}, \qquad
\frac{\partial f}{\partial v_i} = -\frac{\partial v(\mathcal{P}, \theta_i)}{\partial \mathcal{P}},
$$

and since the data points are expected to be close to the fitted curve, so that second-order derivatives are not taken into account,

$$
\frac{\partial f}{\partial \mathcal{P}} \simeq
\sum_i \frac{\partial u(\mathcal{P}^*, \theta_i)}{\partial \mathcal{P}} \frac{\partial u(\mathcal{P}^*, \theta_i)}{\partial \mathcal{P}}^T
+ \frac{\partial v(\mathcal{P}^*, \theta_i)}{\partial \mathcal{P}} \frac{\partial v(\mathcal{P}^*, \theta_i)}{\partial \mathcal{P}}^T,
$$

combining the last equations and performing a small amount of algebra yields the expected formula. □

We are thus in a situation where the information matrix (inverse of the covariance) is mainly related to the Jacobians of the curve model, and thus to the numerical conditioning of the equations. In our case, considering a trajectory of N + 1 points given for $\theta_i \in \{-\tfrac{N}{2}\Delta\theta, \ldots, \tfrac{N}{2}\Delta\theta\}$, and neglecting higher-order terms, the previous result yields explicit formulas for $\mathrm{Cov}(\mathcal{O})$, $\mathrm{Cov}(\mathcal{C})$, $\mathrm{Cov}(\mathcal{S})$ and the correlation between $\mathcal{C}$ and $\mathcal{S}$: they are proportional to $\sigma^2$, involve the quantities $u_{i0}, v_{i0}, u_{i1}, v_{i1}$, and their entries decrease like $1/(\Delta\theta^2 N^3)$ or faster (some like $1/(\Delta\theta^4 N^5)$). As expected, the higher the number of points taken into account and the larger the amplitude of the angular step Δθ, the smaller the covariances. Note that for one configuration (when $l = v_{i1} u_{i0} - u_{i1} v_{i0} = 0$) these matrices seem to be singular. This corresponds to a non-generic situation in which the vector to the centre of the curve is parallel to its tangent. In fact the estimate is not singular, but the covariance has to be computed taking higher-order terms into account.
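A plausible sketch of the non-linear refinement and of the covariance of Proposition 2, using scipy's least_squares and the helpers sketched above; this illustrates the approach and is not the author's implementation.

```python
import numpy as np
from scipy.optimize import least_squares
# Reuses curve() and initial_coefficients() from the sketches above.

def refine_coefficients(thetas, us, vs, sigma=1.0):
    """Refine P by minimizing criterion (6); return (P*, covariance of P*) as in Proposition 2."""
    thetas, us, vs = map(np.asarray, (thetas, us, vs))

    def residuals(P):
        u_model, v_model = curve(P, thetas)
        return np.concatenate([us - u_model, vs - v_model]) / sigma

    P0 = initial_coefficients(thetas, us, vs)    # analytic initial estimate, equations (7)
    sol = least_squares(residuals, P0)

    # Information matrix of Proposition 2 (Gauss-Newton approximation): J^T J / sigma^2.
    # P is only defined up to scale, so the 9x9 information matrix is rank-deficient;
    # a pseudo-inverse gives a covariance on the normalized coefficients.
    info = sol.jac.T @ sol.jac
    cov = np.linalg.pinv(info)
    return sol.x, cov
```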

COMPUTING THE CALIBRATION PARAMETERS

Knowing $\mathcal{P}$, we must now analyse how to recover the calibration parameters from equations (4). All derivations are related to the following result:

Proposition 3. If M does not belong to the rotation axis Δ, the triplet of vectors $(u,\, A^{-1}\mathcal{C},\, A^{-1}\mathcal{S})$ forms a direct orthogonal frame of reference, with $\|A^{-1}\mathcal{C}\| = \|A^{-1}\mathcal{S}\|$. Moreover, the following relation holds:

$$
\|A^{-1}\mathcal{S}\| = \frac{1}{Z}\,\sqrt{\|M - C\|^2 - (u^T M)^2}
$$

Proof. From the third equation of (4), $u^T A^{-1}\mathcal{C} = u^T\big((I - u u^T) M - C\big)/Z = 0$, thus $u \perp A^{-1}\mathcal{C}$. From equations (4) again, $u \wedge A^{-1}\mathcal{C} = A^{-1}\mathcal{S}$. From this we obtain that $u \perp A^{-1}\mathcal{S}$ and $A^{-1}\mathcal{C} \perp A^{-1}\mathcal{S}$, but also that $(u, A^{-1}\mathcal{C}, A^{-1}\mathcal{S})$ forms, if none of these vectors is null, a direct frame of reference. Now, since $u \perp A^{-1}\mathcal{C}$, the previous equation yields $\|u\|\,\|A^{-1}\mathcal{C}\| = \|A^{-1}\mathcal{S}\|$, and u being unitary we obtain $\|A^{-1}\mathcal{C}\| = \|A^{-1}\mathcal{S}\|$. Then, computing for instance $\|A^{-1}\mathcal{S}\|^2$, we get $\|A^{-1}\mathcal{S}\| = \tfrac{1}{Z}\sqrt{\|M-C\|^2 - (u^T M)^2}$. Finally, $\|A^{-1}\mathcal{S}\| = 0$ if and only if $\|M - C\| = u^T M$, since $Z \neq 0$ as a projective factor. But $u^T M = u^T (M - C) = \|M - C\|\cos(u, M - C)$, since $u \perp C$ and u is unitary. We thus have $\|M - C\| = u^T M$ if and only if $u \parallel (M - C)$, that is if M belongs to the rotation axis Δ. If not, $\|A^{-1}\mathcal{C}\| = \|A^{-1}\mathcal{S}\| \neq 0$, and the triplet of vectors forms a true frame of reference. On the contrary, if M belongs to the axis of rotation, M is invariant, and neither $\mathcal{S}$ nor $\mathcal{C}$ is even defined, since the trajectory reduces to a point. The additional relation is straightforward to derive. □

It is in practice not common for M to belong to the rotation axis, because on a robotic head these axes are behind the camera. This situation could, however, happen for an 'arm-eye' device, with the camera mounted at the end of a robotic arm. We can now rewrite and simplify equations (4) as:

$$
A^{-1}\mathcal{S} = u \wedge A^{-1}\mathcal{C}, \qquad
u \wedge A^{-1}\mathcal{O} = \frac{1}{Z}\, u \wedge C, \qquad
A^{-1}\mathcal{C} + A^{-1}\mathcal{O} = \frac{M}{Z}
\tag{8}
$$

which is obviously equivalent to the original equations, knowing Proposition 3. With the equations put in this last form, one may see that it is easy to recover the extrinsic parameters. But more than that, these derivations have several interesting implications.

Equations for the intrinsic calibration parameters

It is possible to generate, knowing $\mathcal{P}$, two equations related to the intrinsic parameters $(u_0, v_0, \alpha_u, \alpha_v)$ by simply writing that $A^{-1}\mathcal{C}$ and $A^{-1}\mathcal{S}$ are orthogonal and have the same magnitude:

$$
0 = e_1 = \mathcal{S}^T A^{-T} A^{-1} \mathcal{C}
= \frac{(C_1 - u_0 C_3)(S_1 - u_0 S_3)}{\alpha_u^2}
+ \frac{(C_2 - v_0 C_3)(S_2 - v_0 S_3)}{\alpha_v^2}
+ C_3 S_3
$$
$$
0 = e_2 = \mathcal{C}^T A^{-T} A^{-1} \mathcal{C} - \mathcal{S}^T A^{-T} A^{-1} \mathcal{S}
= \frac{(C_1 - u_0 C_3)^2 - (S_1 - u_0 S_3)^2}{\alpha_u^2}
+ \frac{(C_2 - v_0 C_3)^2 - (S_2 - v_0 S_3)^2}{\alpha_v^2}
+ C_3^2 - S_3^2
\tag{9}
$$

These equations are quadratic with respect to $(u_0, v_0)$ and linear with respect to $(1/\alpha_u^2, 1/\alpha_v^2)$. If generic, two sets of such equations are sufficient to estimate the intrinsic parameters. Moreover, since $\alpha_u > 0$ and $\alpha_v > 0$, the equations are unambiguous with respect to $\alpha_u$ and $\alpha_v$. However, expanding the equations, one can easily verify that they are linear with respect to $b = (1/\alpha_u^2,\; 1/\alpha_v^2,\; u_0/\alpha_u^2,\; v_0/\alpha_v^2,\; u_0^2/\alpha_u^2 + v_0^2/\alpha_v^2)^T$. Therefore, by a suitable change of variables, it is possible to obtain linear equations related to the intrinsic parameters: considering the five components of b as unknowns, the measurement equations are now linear, and it is straightforward to calculate the intrinsic parameters from an estimate of b.

Thus two methods are available. One is non-linear: the equations being quadratic in $u_0$ and $v_0$, several solutions are expected, but we may choose the right one by considering a reasonable initial estimate for $u_0$ and $v_0$ and looking for a local solution close to these initial values; at least two trajectories are required. The other method is linear, but at least three trajectories are required.

The covariance related to these equations can be estimated from the knowledge of $\Lambda_{\mathcal{P}}$, using formulas similar to those evoked in the proof of Proposition 2. Up to a factor involving $\sigma^2$, Δθ and N, the covariance of $(e_1, e_2)$ is proportional to the matrix $\begin{pmatrix} a & b \\ b & c \end{pmatrix}$ with:

$$
a = \tfrac{1}{2}\big(720\, l_0^2\, C_3^2 + 2880\, l_1^2\, S_3^2 + 2880\, m\, C_3 S_3\big), \qquad
b = \tfrac{1}{2}\big(1440\, m\,(C_3^2 - S_3^2) + (2880\, l_1^2 - 720\, l_0^2)\, C_3 S_3\big),
$$
$$
c = 2880\, l_1^2\, C_3^2 + 720\, l_0^2\, S_3^2 - 2880\, m\, C_3 S_3, \qquad
l_0^2 = u_{i0}^2 + v_{i0}^2, \quad l_1^2 = u_{i1}^2 + v_{i1}^2, \quad m = u_{i1} u_{i0} + v_{i1} v_{i0}
$$

Considering P trajectories, the intrinsic parameters can be efficiently estimated using the following $\chi^2$ criterion:

$$
\min_{(u_0,\, v_0,\, \alpha_u,\, \alpha_v)} \; \sum_{p=1}^{P}
\big(e_1^p \;\; e_2^p\big)\; \Lambda_{(e_1, e_2)}^{-1} \begin{pmatrix} e_1^p \\ e_2^p \end{pmatrix}
$$

The final covariance of the intrinsic parameters is obtained (see Proposition 2) from the Jacobian of $(e_1, e_2)$ with respect to $(u_0, v_0, \alpha_u, \alpha_v)$ and from $\Lambda_{(e_1, e_2)}$:

$$
\Lambda_{(u_0, v_0, \alpha_u, \alpha_v)}^{-1} = \sum_{p=1}^{P}
\frac{\partial (e_1^p, e_2^p)}{\partial (u_0, v_0, \alpha_u, \alpha_v)}^T
\,\Lambda_{(e_1, e_2)}^{-1}\,
\frac{\partial (e_1^p, e_2^p)}{\partial (u_0, v_0, \alpha_u, \alpha_v)}
$$

Knowing $\mathcal{P}$, and $u_0$, $v_0$, $\alpha_u$ and $\alpha_v$ being estimated, we can evaluate $A^{-1}\mathcal{O}$, $A^{-1}\mathcal{S}$ and $A^{-1}\mathcal{C}$. However, equation (8) is true only if Proposition 3 holds, i.e. $A^{-1}\mathcal{C} \perp A^{-1}\mathcal{S}$ and $\|A^{-1}\mathcal{C}\| = \|A^{-1}\mathcal{S}\|$. Since this is not necessarily the case in practice, due to numerical errors and noise, we have to enforce these conditions. This can be done easily by rotating $A^{-1}\mathcal{C}$ and $A^{-1}\mathcal{S}$ symmetrically, around an axis orthogonal to both vectors, by the minimum angle which makes them perpendicular, while their common magnitude is set to the mean of their magnitudes. We leave the reader to verify that, among all transformations which generate vectors verifying Proposition 3, this one minimizes the distance between $A^{-1}\mathcal{C}$ and its transform, as it does for $A^{-1}\mathcal{S}$. A sketch of the intrinsic estimation using the linear change of variables is given below.
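A sketch of the linear method for the intrinsic parameters, assuming numpy and at least three trajectory coefficient vectors; for simplicity it omits the covariance weighting $\Lambda_{(e_1,e_2)}$ discussed above, and the helper name is mine.

```python
import numpy as np
# Each element of P_list is a trajectory coefficient vector (O1,O2,O3,S1,S2,S3,C1,C2,C3).

def intrinsics_linear(P_list):
    """Linear estimation of (u0, v0, alpha_u, alpha_v) from >= 3 trajectories: equations (9)
    rewritten in the unknowns b = (1/au^2, 1/av^2, u0/au^2, v0/av^2, u0^2/au^2 + v0^2/av^2)."""
    rows, rhs = [], []
    for P in P_list:
        O1, O2, O3, S1, S2, S3, C1, C2, C3 = P
        # e1 = 0, expanded and collected on b:
        rows.append([C1*S1, C2*S2, -(C1*S3 + S1*C3), -(C2*S3 + S2*C3), C3*S3])
        rhs.append(-C3*S3)
        # e2 = 0, expanded and collected on b:
        rows.append([C1**2 - S1**2, C2**2 - S2**2,
                     -2.0*(C1*C3 - S1*S3), -2.0*(C2*C3 - S2*S3), C3**2 - S3**2])
        rhs.append(-(C3**2 - S3**2))
    b, *_ = np.linalg.lstsq(np.asarray(rows), np.asarray(rhs), rcond=None)
    alpha_u, alpha_v = 1.0 / np.sqrt(b[0]), 1.0 / np.sqrt(b[1])
    u0, v0 = b[2] / b[0], b[3] / b[1]
    return u0, v0, alpha_u, alpha_v
```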

Extrinsic parameter computation

Considering equation (8), and knowing the intrinsic parameters, we can estimate the extrinsic parameters and the 3D location of the point, up to a scale factor. Let us detail this result. We can estimate

$$
u = \frac{A^{-1}\mathcal{C} \wedge A^{-1}\mathcal{S}}{\|A^{-1}\mathcal{S}\|^2}
$$

It is equivalent to say that each trajectory provides two linear homogeneous equations for u:

$$
0 = u^T A^{-1}\mathcal{C}, \qquad 0 = u^T A^{-1}\mathcal{S}
\tag{10}
$$

sufficient to estimate the direction of u. The centre of rotation C can then be recovered up to a scale factor. We obtain:

$$
\vec{OC} = \lambda\,\Big[\big(\mathcal{S}^T A^{-T} A^{-1}\mathcal{O}\big)\, A^{-1}\mathcal{S}
+ \big(\mathcal{C}^T A^{-T} A^{-1}\mathcal{O}\big)\, A^{-1}\mathcal{C}\Big]
$$

This scale-factor indetermination is not surprising when dealing with monocular cues. If the depth of a point, or the absolute distance between two points, is known, the absolute value of C can be recovered, since we have $\lambda = Z / \|A^{-1}\mathcal{S}\|^2$. It is equivalent to say that each trajectory provides two linear homogeneous equations for C:

$$
0 = e_5 = C^T \Big[\big(\mathcal{C}^T A^{-T} A^{-1}\mathcal{O}\big)\, A^{-1}\mathcal{S}
- \big(\mathcal{S}^T A^{-T} A^{-1}\mathcal{O}\big)\, A^{-1}\mathcal{C}\Big], \qquad
0 = e_6 = C^T \big[A^{-1}\mathcal{C} \wedge A^{-1}\mathcal{S}\big]
\tag{11}
$$

sufficient to estimate the orientation of C in the plane normal to u. It is also possible to recover the position of the point in space, up to the same scale factor, as:

$$
\vec{OM} = \lambda\, \|A^{-1}\mathcal{S}\|^2\, \big[A^{-1}\mathcal{C} + A^{-1}\mathcal{O}\big]
\tag{12}
$$

The estimate of C is related to the projection of $A^{-1}\mathcal{O}$ onto the plane orthogonal to u. Using this expression ensures that the estimate is defined as soon as $A^{-1}\mathcal{O}$ is not aligned with u, which corresponds again to the fact that M does not belong to the rotation axis Δ.

Calibration algorithm

All the previous developments can be summarized as follows:

Proposition 4. In the case of the tracking of stationary targets during a fixed axis rotation for which the relative angles are known, each trajectory provides two equations about the intrinsic parameters. Moreover, knowing the intrinsic parameters, each trajectory provides an estimate of the extrinsic parameters and of the 3D location of the target.

In fact, this is all that we can compute from the coefficients of the trajectory, since $\mathcal{P}$ is of dimension 8 (9 components up to a scale factor) and yields two equations for the intrinsic parameters, two equations to compute u, one equation to compute C with respect to u, and three equations to compute M, all this information being independent. There is thus no possibility to obtain any new information from $\mathcal{P}$. This yields the following algorithm (a sketch of steps 4 and 5 is given after the list):

1. Track at least two points during a fixed axis rotation of the head; record the angles and the trajectory coordinates.
2. Identify the coefficients $\mathcal{P}$, by first computing initial values from the fitting of a polynomial model using equations (7), and then refining this estimate by minimizing the criterion given in equation (6).
3. Compute the intrinsic parameters using equations (9) on at least two trajectories, and the related covariances.
4. Compute the values of $A^{-1}\mathcal{O}$, $A^{-1}\mathcal{S}$ and $A^{-1}\mathcal{C}$. Then rotate $A^{-1}\mathcal{S}$ and $A^{-1}\mathcal{C}$ symmetrically around $A^{-1}\mathcal{S} \wedge A^{-1}\mathcal{C}$ to obtain $A^{-1}\mathcal{C} \perp A^{-1}\mathcal{S}$, and modify their magnitudes symmetrically to get $\|A^{-1}\mathcal{C}\| = \|A^{-1}\mathcal{S}\|$.
5. Compute the extrinsic parameters $(u,\, \vec{OC}/\|\vec{OC}\|)$ and $\vec{OM}/\|\vec{OC}\|$, up to a scale factor, using equations (10), (11) and (12). Compute the related covariances.
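A plausible numerical sketch of steps 4 and 5, reusing rodrigues() from the earlier sketch; λ is set to 1, so OC and OM are recovered up to the common monocular scale factor, and all names are mine.

```python
import numpy as np
# Reuses rodrigues() from the sketch above; P is a refined coefficient vector, A the intrinsic matrix.

def enforce_proposition3(c_vec, s_vec):
    """Step 4: symmetrically rotate and rescale A^-1*C and A^-1*S so that they become
    orthogonal and of equal magnitude (Proposition 3)."""
    axis = np.cross(s_vec, c_vec)
    axis = axis / np.linalg.norm(axis)
    angle = np.arccos(np.dot(c_vec, s_vec) / (np.linalg.norm(c_vec) * np.linalg.norm(s_vec)))
    half = (np.pi / 2 - angle) / 2.0           # split the missing angle equally between the vectors
    c_new = rodrigues(axis,  half) @ c_vec
    s_new = rodrigues(axis, -half) @ s_vec
    mag = 0.5 * (np.linalg.norm(c_new) + np.linalg.norm(s_new))
    return mag * c_new / np.linalg.norm(c_new), mag * s_new / np.linalg.norm(s_new)

def extrinsics(P, A):
    """Step 5: axis direction u, axis point C and tracked point M, both up to a common scale
    (equations (10)-(12) with lambda = 1)."""
    Ainv = np.linalg.inv(A)
    O, S, Cc = Ainv @ P[0:3], Ainv @ P[3:6], Ainv @ P[6:9]
    Cc, S = enforce_proposition3(Cc, S)
    u = np.cross(Cc, S) / np.dot(S, S)                 # rotation axis direction
    OC = np.dot(S, O) * S + np.dot(Cc, O) * Cc         # centre of rotation, up to scale
    OM = np.dot(S, S) * (Cc + O)                       # tracked point, up to the same scale
    return u / np.linalg.norm(u), OC, OM
```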

EXPERIMENTAL RESULTS

Experiment 1: Parameter estimation with synthetic data

We have studied the robustness of the method on noisy trajectories generated artificially. Gaussian noise has been added to the 2D coordinates of the points of the trajectory, the standard deviation being given in pixels. Computations have been done using only two trajectories. We obtained the results given in Table 1. Errors for $(u_0, v_0)$ correspond to errors in the estimate of the projection of the optical centre, in pixels (p); errors for $(\alpha_u, \alpha_v)$ are given in percentages; errors for the orientations of u and $\vec{OC}$ are given in degrees.

Table 1: Experiment 1 results. True values: u0 = 256, v0 = 256, alpha_u = 256, alpha_v = 256, u = (0, 1, 0), OC = (0.71, 0, 0.71).

Noise    eps_u0    eps_v0    eps_alpha_u  eps_alpha_v  eps_u       eps_OC
0 p      ~0 p      0.11 p    ~0%          0.04%        0.0004 deg  0.006 deg
0.05 p   0.02 p    4.06 p    0.08%        6.1%         0.02 deg    0.03 deg
0.2 p    6.0 p     11.2 p    11%          16%          1.86 deg    7.04 deg
0.5 p    3.0 p     8.1 p     10%          6.7%         2.12 deg    6.25 deg
1 p      4.2 p     6 p       14%          25%          6.12 deg    13.23 deg
2 p      12.3 p    11.2 p    23%          32%          12.72 deg   22.94 deg
5 p      23.2 p    17.1 p    Huge         Huge         None        None

These results lead to three comments: (1) because the rotation axis was vertical, variations were almost in the horizontal plane, and parameters related to horizontal positions are better estimated than parameters related to vertical positions; (2) the method tolerates an error of about two pixels for each point of the trajectory, and diverges for higher levels of noise; (3) these results have the same order of magnitude as those obtained with standard methods of calibration [3-5]. Note that this has been estimated using only two trajectories. We have repeated the same experiment with ten trajectories and obtained, under the same conditions, the results given in Table 2.

Table 2: Same as Table 1 but with 10 trajectories.

Noise    eps_u0    eps_v0    eps_alpha_u  eps_alpha_v  eps_u       eps_OC
1 p      1.3 p     3.3 p     4%           8%           1.27 deg    7.4 deg

It thus seems possible to increase the precision of the method by tracking several trajectories at the same time.

Experiment 2: Trajectory identification using real data

The system being calibrated, we then studied the accuracy of the trajectory identification mechanism.

Point tracking and trajectory identification

In this experiment, we tracked the centre of gravity of a white blob, about 5 pixels in diameter, over a dark background, the intensity being thresholded using a statistical criterion on the intensity distribution. Blobs were identified by detecting connected regions of bright pixels and computing the centroid of these points. More than 20 positions were recorded for each trajectory. This part of the experiment is entirely automatic. The trajectory is estimated from a polynomial model computed up to order 10. The order of the development is automatically selected using a $\chi^2$ test on the residual error, the evaluation being stopped for a probability of distribution similarity better than p = 0.99. A typical trajectory is shown in Figures 2 and 3. A sketch of the blob-tracking step is given after the figure captions.

Figure 2: True and estimated horizontal coordinates during a tracking.
Figure 3: True and estimated vertical coordinates during a tracking.
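A minimal sketch of the blob-tracking step, assuming scipy.ndimage and a grayscale image array; the fixed threshold below is a stand-in for the statistical criterion on the intensity distribution mentioned above.

```python
import numpy as np
from scipy import ndimage
# image: 2D grayscale array with a bright blob over a dark background.

def blob_centroid(image, threshold=None):
    """Centre of gravity (u, v) of the largest bright connected region."""
    if threshold is None:
        threshold = image.mean() + 3.0 * image.std()       # crude stand-in for the paper's criterion
    mask = image > threshold
    labels, n = ndimage.label(mask)
    if n == 0:
        return None
    largest = 1 + np.argmax(ndimage.sum(mask, labels, index=range(1, n + 1)))
    rows, cols = np.nonzero(labels == largest)
    return cols.mean(), rows.mean()                         # (u, v) = (column, row)
```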

We plotted the x and y coordinates against the angle of rotation θ; the continuous line corresponds to the true trajectory and the dashed line to the estimated trajectory. Since the trajectory was almost horizontal, the x estimate is almost a line, and the order of the polynomial expansion was automatically stopped after order 1. For the vertical coordinate, the development was carried out up to the fourth order. We increased the scale for the vertical coordinates to show the differences between the two curves.

Behaviour of the algorithm

Identifying the trajectory using a polynomial model (equations (7)) yields an average retinal error of 2-3 pixels between the data points and the trajectory, while recomputing this estimate by minimizing the proposed criterion (equation (6)) yields an average retinal error lower than 1 pixel. However, minimizing the proposed criterion without a good initial estimate yields unstable results, as we also experienced. It is thus important to perform the estimation as proposed here.

Precision of the method

We obtained an overall precision of 0.2 pixel (standard deviation), and the order of magnitude of this value is fairly stable. This means that it is indeed possible to obtain subpixel accuracy for the trajectory, since we average several data points.

Experiment 3: Parameter estimation using real data

In this experiment, we tracked the corners of the calibration grid shown in Figure 4, and identified the targets using the standard method of calibration described by Toscani et al. [3]. This part of the experiment is entirely automatic. The parametric model of all the trajectories is represented in Figure 5. We observed 128 points during 31 views in this experiment. It can be seen that there are false matches for six of the trajectories, but we did not correct them, assuming that the method would be robust enough to deal with this bias.

Figure 4: Using a calibration grid to compare our method with a well-established calibration method.
Figure 5: Parametric trajectories identified during the tracking of 128 points along 30 frames.

Intrinsic parameters: comparison with a standard method

It is unfortunately not possible to know the 'true' values of the intrinsic parameters of a visual sensor. We thus only compared the results obtained with our new method with the results obtained using a standard method [3], under similar conditions, the main difference being, of course, that we do not introduce any information about the grid in our software. Running the standard method several times, we obtained the results shown in Table 3. Since such a method does not provide a covariance for the estimate, we roughly estimated the standard deviations from a set of 20 measurements. With such a method, the location of the optical centre is known with a precision of 2-3 pixels, and the scale factors with a precision of 5%.

Table 3: Standard method results (intrinsic parameters over repeated runs).

u0:      240.4, 241.8, 255.1, 242.4, 247.8, 238.5, 239.4, 247.7, 250.8, 255.2, 236.6   (mean 245.06)
v0:      215.6, 226.5, 223.6, 215.4, 213.3, 224.0, 205.5, 244.0, 224.8, 236.5, 203.1   (mean 221.12)
alpha_u: 825.5, 826.2, 830.9, 848.1, 845.5, 850.3, 837.1, 859.8, 820.6, 838.4, 835.4   (mean 837.98)
alpha_v: 1228.7, 1257.5, 1215.3, 1262.1, 1227.1, 1170.9, 1223.7, 1222.6, 1240.8, 1234.6, 1214.6   (mean 1227.08)

Using the method developed in this paper, we obtained the following performance for the same run:

u0 = 245.93 (SD = 0.03), v0 = 236.22 (SD = 0.3), alpha_u = 832.06 (SD = 0.5), alpha_v = 1280.99 (SD = 30.0)

We have checked these numbers by running the experiment several times, and obtained very similar values. Note two fundamental facts: (1) our method yields better results in terms of variances than the standard method, because we do not analyse only one image of the grid but a full set; (2) the values have a better accuracy for parameters related to horizontal positions than for those related to vertical positions ($u_0$ and $\alpha_u$ are much more accurate than $v_0$ and $\alpha_v$), because the motion of the head was carried out almost in a horizontal plane, as visible in Figure 5: on the one hand, horizontal-related values are close to the average value obtained with the standard method, while vertical-related values are not; on the other hand, estimated covariances are higher for vertical-related values.

In any case, the covariances seem to correspond to the true precision of the data, and an iterative statistical filter should be able to cope with the fact that we only have a partial observation of the parameters.

Since these results are limited by the fact that the other calibration method is itself imprecise, we have indirectly checked our results by verifying whether, after the intrinsic calibration, the constraints of Proposition 3 are satisfied. They should be if the calibration were perfect. Computing $Err_n$, the relative error in magnitude between the two vectors $A^{-1}\mathcal{C}$ and $A^{-1}\mathcal{S}$, and $Err_a$, the angular error (through an arc cosine) between these two vectors, we obtained:

Err_n ≈ 0.0217,  Err_a ≈ 3.45 deg

These values would have been zero if the two vectors had exactly the same magnitude and were orthogonal. Note that, to obtain a relevant estimate of the vertical calibration parameters, a rotation in the vertical plane would have to be performed.

Extrinsic parameters

The intrinsic parameters being calibrated, the rest of the experimentation is mainly a structure-from-motion paradigm, in which some assumptions are made about the kind of motion performed. The extrinsic parameters are the axis orientation, parametrized by the unit vector u, and the axis location up to a scale factor, parametrized by the orientation α of C in the plane perpendicular to u. A typical set of results, using a recursive mechanism of Kalman filtering to propagate the estimates and their covariances, is:

u = (0.032, -0.997, -0.072)   alpha = -73.94 deg
u = (0.027, -0.998, -0.059)   alpha = -80.79 deg
u = (0.029, -0.998, -0.056)   alpha = -70.04 deg
u = (0.029, -0.998, -0.057)   alpha = -90.66 deg
u = (0.024, -0.998, -0.053)   alpha = -65.61 deg
u = (0.029, -0.998, -0.057)   alpha = -90.12 deg
u = (0.026, -0.998, -0.058)   alpha = -85.73 deg
u = (0.026, -0.998, -0.049)   alpha = -83.13 deg
u = (0.026, -0.999, -0.036)   alpha = -79.61 deg
u = (0.029, -0.999, -0.035)   alpha = -88.52 deg
u = (0.032, -0.999, -0.033)   alpha = -90.43 deg

Each measurement comes with two probabilities, p(u) and p(alpha), of the measure not being equal to the estimated mean value. We computed the mean value of each estimate and then rejected measures whose probability of corresponding to the estimate is higher than a threshold p0 (see Viéville and Sander [23] for a precise description); the final estimate is performed using plausible measures only. We finally obtained:

u = (0.03, -1.00, -0.051)

Cov(u) ≈
( 4.3e-05   4.6e-06  -6.2e-05 )
( 4.6e-06   2.7e-06  -4.6e-05 )
(-6.2e-05  -4.6e-05   8.9e-04 )

alpha = -81.69 deg (SD = 8.27 deg)

while the expected values were u = (0, -1, 0) and alpha = -90.0 deg, considering the mechanical configuration of the camera on the mount [2]. We have thus obtained coherent estimates of the extrinsic parameters, but our experiments show that the axis orientation u is always estimated with a higher accuracy, whereas the orientation of C is a rather unstable parameter. This is to be expected, since u only depends upon the relative orientation of $A^{-1}\mathcal{S}$ and $A^{-1}\mathcal{C}$, whereas C is a complex, and thus more sensitive, function of all the parameters. As for the computation of the intrinsic parameters, we have also measured how accurately the constraint of Proposition 3 holds, stating that $u \perp A^{-1}\mathcal{S} \perp A^{-1}\mathcal{C}$. We obtained a mean quadratic error of $Err_a \approx 1.24$ deg, the order of magnitude being the same as before. A sketch of the rejection and averaging step is given below.
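The rejection of implausible measures can be sketched as follows; the chi-square test is my assumed reading of how the probabilities p(u) and p(alpha) are turned into a decision, not a description of the author's filter.

```python
import numpy as np
from scipy.stats import chi2
# values: one row per measure (e.g. the axis u, or the scalar alpha as a length-1 vector),
# with one covariance matrix per measure (here taken from the Kalman filter).

def plausible_mean(values, covariances, p0=0.75):
    """Average of the plausible measures only: a measure is rejected when the probability
    that it does not correspond to the current mean exceeds the threshold p0.
    The chi-square test below is an assumed reading of that rejection rule."""
    values = np.atleast_2d(np.asarray(values, dtype=float))
    mean = values.mean(axis=0)
    dim = values.shape[1]
    keep = []
    for x, cov in zip(values, covariances):
        cov = np.atleast_2d(cov)
        d2 = float((x - mean) @ np.linalg.inv(cov) @ (x - mean))
        keep.append(chi2.cdf(d2, df=dim) <= p0)
    keep = np.asarray(keep)
    return values[keep].mean(axis=0), keep
```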

CONCLUSION

We do not use any information about the absolute position of points in space, as is normally done in calibration procedures, and this is the real improvement of this method. We have made use of the nature of the motion of a robotic system, and obtain better results than a general projective algorithm [7, 9, 24] in this case, since this configuration is singular for those algorithms [20]. As soon as one can track at least two stationary points in at least four views during a fixed axis rotation, the system can be calibrated. The method can be used in a cluttered environment, running in parallel with other tasks. Moreover, the two points may be tracked either at the same time, or at different times but under the same conditions: since we only look at the whole trajectory, we do not need to track these points synchronously.

It has often been said that, when using a simple pin-hole model, the intrinsic parameters are only approximate, and reliable only locally with respect to the retinal location or the 3D depth. Since our method can be applied in different windows of attention corresponding to objects located in a given area of the visual surroundings, it is thus possible to adjust the parameters to fit this neighbourhood. We also demonstrate that auto-calibration of a robotic system is possible when using an active visual sensor. In this case, not only can the camera be calibrated, but also the mount itself, since the rotation axis is estimated in the frame of reference of the camera. This helps to calibrate the robotic head, as described elsewhere [1]. Moreover, more sophisticated models of the perspective projection could have been taken into account. However, this method needs to be applied in a precise experimental situation: the robotic head should only perform a fixed axis rotation, and not be subject to other motions (if mounted on a vehicle, the vehicle has to be stopped). This method is complementary to what has been obtained using projective invariants for auto-calibration [7], when calibrating stereo camera geometry using a variational principle [9, 24], or when calibrating up to an affine transform [25]. In comparison with these studies, our method seems to be more restrictive, since we: (1) make rather strong assumptions about the kind of motion performed by the system (note that Trivedi [9, 24] also considers a restricted rigid motion, while Faugeras et al. [7] consider a general rigid motion); and (2) have to track points over several views, while the other methods only require two or three views. However, we only need to track two points, while the other methods require seven or eight points, and it seems that our experimental results are more robust. Our method is thus less general, but well adapted when fixed axis rotations are used. Our implementation corresponds to a recursive process (iterative filtering) correcting these parameters as a function of the relative precision of the initial estimate versus the measures, and it eliminates false measures. But this is beyond the scope of the present paper, and the full operational implementation of this method will be reported in future work.

ACKNOWLEDGEMENTS

Olivier Faugeras, Steve Maybank and Rachid Deriche are gratefully acknowledged for some powerful ideas which are at the origin of this work. Jan-Olof Eklundh, Kourosh Pahlavan and Thomas Uhlin are gratefully acknowledged for the fruitful discussions we had during the progress of this work. Hervé Mathieu and Jean-Luc Szpyrka are gratefully acknowledged for their help during the progress of this work. This work is partially realized under Esprit Project P5390/RTGC, and the software is available, with GNU-like copyright rules, in the INRIA AcVis package.

Formal computations have been derived using the Maple software and its mpls package.

REFERENCES

1. Viéville, T 'Real time gaze control: Architecture for sensing behaviours', Stockholm Workshop on Computational Vision (J-O Eklundh, ed), Rosenön, Sweden (1991)
2. Eklundh, J-O, Pahlavan, K and Uhlin, T 'Presentation of the KTH robotic head', Fifth Scandinavian Workshop on Computational Vision, Rosenön, Sweden (1991)
3. Toscani, G and Faugeras, O 'Camera calibration for 3D computer vision', Proc. Int. Workshop on Machine Intelligence, Tokyo, Japan (February 1987)
4. Tsai, R Y 'An efficient and accurate camera calibration technique for 3D machine vision', Proc. IEEE CVPR'86, Miami Beach, FL (June 1986) pp 364-374
5. Brown, D C 'Close-range camera calibration', Photogrammetric Eng., Vol 37 (1971)
6. Tsai, R Y 'Synopsis of recent progress on camera calibration for 3D machine vision', Robotics Rev., Vol 1 (1989) pp 147-159
7. Faugeras, O, Luong, Q-T and Maybank, S 'Camera self-calibration: Theory and experiments', Proc. 2nd ECCV, Genoa, Italy (1992)
8. Maybank, S and Faugeras, O 'A theory of self-calibration of a moving camera', Int. J. Comput. Vision, Vol 8 (1992)
9. Trivedi, H 'Semi-analytic method for estimating stereo camera geometry from matched points', Image & Vision Comput., Vol 9 (1991)
10. Thacker, N A 'On-line calibration of a 4-dof robot head for stereo vision', Br. Machine Vision Assoc. Meeting on Active Vision, London, UK (1992)
11. Toscani, G, Vaillant, R, Deriche, R and Faugeras, O 'Stereo camera calibration using the environment', Proc. 6th Scandinavian Conf. on Image Analysis (June 1989) pp 953-960
12. François, E and Bouthemy, P 'Multiframe-based identification of mobile components of a scene with a moving camera', Proc. Conf. on Computer Vision and Pattern Recognition, Hawaii, HI (1991)
13. Rognone, A, Campani, M and Verri, A 'Identifying multiple motions from optical flow', Proc. 2nd ECCV, Genoa, Italy (1992) pp 258-268
14. Giai-Checa, B, Bouthemy, P and Viéville, T Detection of Moving Objects, Technical Report RR-xxx, INRIA, Sophia Antipolis, France (1993)
15. Navab, N and Zhang, Z 'From multiple objects motion analysis to behavior-based object recognition', Proc. ECAI 92, Vienna, Austria (August 1992) pp 790-794
16. Viéville, T 'Estimation of 3D-motion and structure from tracking 2D-lines in a sequence of images', Proc. 1st ECCV, Antibes, France (1990) pp 281-292
17. Deriche, R and Faugeras, O D 'Tracking line segments', Proc. 1st ECCV, Antibes, France (1990) pp 259-269
18. Stephens, M, Blissett, R, Charnley, D, Sparks, E and Pike, J 'Outdoor vehicle navigation using passive 3D vision', Proc. Computer Vision and Pattern Recognition, IEEE Computer Society Press (1989) pp 556-562
19. Pahlavan, K, Eklundh, J-O and Uhlin, T 'Integrating primary ocular processes', Proc. 2nd ECCV, Genoa, Italy (1992) pp 526-541
20. Luong, Q-T L'Autocalibration: de l'interprétation des données bruitées à l'obtention de résultats robustes, PhD thesis, École Polytechnique (1992)
21. Lenz, R Group Theoretical Methods in Image Processing, Lecture Notes in Computer Science 413, Springer-Verlag, Berlin (1990)
22. Ayache, N Artificial Vision for Robots, MIT Press, Cambridge, MA (1989)
23. Viéville, T and Sander, P Using Pseudo Kalman-Filters in the Presence of Constraints, Technical Report RR-1669, INRIA, Sophia Antipolis, France (1992)
24. Trivedi, H 'Estimation of stereo and motion parameters using a variational principle', Image & Vision Comput., Vol 5 (1987)
25. Crowley, J, Bobet, P and Schmid, C 'Auto-calibration of cameras by direct observation of objects', Image & Vision Comput., Vol 11 No 2 (March 1993) pp 67-81
