Vision and Lie's approach to invariance

PROMISING RESEARCH

L Van Gool, T Moons*, E Pauwels* and A Oosterlinck

Following the Lie group approach, invariants are found as solutions of systems of partial differential equations (PDEs). The starting point of Lie’s invariance theory is a group of transformations of some vector space (or manifold). Computer vision problems seldom present themselves in this form, however. The paper sketches the steps that are typically involved when exploiting this framework for vision. It then focuses on the vision-related problems this entails. The paper is not intended to be a theoretical treatment, and an effort is made to discuss the ideas on the basis of examples that are both instructive and practically relevant. Practical examples are given in a final section.

The application of invariance theory has gained renewed interest in the computer vision community. Recent results show that it offers a strong, unifying framework that helps in tackling problems such as calibration-less vision, efficient matching, shape-from-motion, grouping, and several other problems considered crucial to intelligent vision. Nonetheless, a systematic approach to the problem of extracting invariants is far from trivial. This paper describes one such approach, the theory of Lie groups. After a concise and non-rigorous account of the method itself, typical problems that arise in vision are discussed. For each of these problems, one or more relevant examples are given.

Keywords: invariance, Lie theory, recognition, feature extraction

INTRODUCTION

Invariants have proved very useful in vision applications, and their use has gained new impetus during the last few years. The invariance strand of research has spurred developments in several areas of vision research. In particular, a wealth of efficient and viewpoint-independent object recognition approaches has resulted. Other areas have benefited as well. A crucial additional advantage is often that such systems do not require calibration of the cameras or of their positions with respect to each other or the scene. New invariants have been identified, and more still have to be. These new features are interesting results in their own right, but all the more so because they are derived more systematically than many of their predecessors. This systematic analysis is due to the elaborate knowledge offered by invariance theory, the theoretical framework that encompasses them all. One approach to invariance is offered by Lie group theory.

Katholieke Universiteit Leuven, ESAT, Kardinaal Mercierlaan 94, B-3001 Leuven, Belgium
*Postdoctoral Research Fellow of the Belgian National Fund for Scientific Research (NFWO)
Paper received: 8 April 1994; revised paper received 12 September 1994

0262-8856/95/$09.50 © 1995 Elsevier Science B.V. All rights reserved
Image and Vision Computing Volume 13 Number 4 May 1995

LIE’S THEORY ON INVARIANCE

A recipe for invariants (nouvelle cuisine style)

At some stage, every vision system extracts measurements from one or more images. These measurements can be as simple as the coordinates of some points or local orientations in points on curves or surfaces, or they can be more involved, such as moments. Depending on the application, these initial measurements are then combined into features that are insensitive to a number of changes in the scene or the imaging process. It may be desirable for the features to be immune against (or invariant under) changes of the object pose in the case of a recognition task, or against changes in camera intrinsic parameters such as focal length when calibration is difficult. Although highly attractive, such features are not always easy to find. A systematic way of extracting invariants is offered by Lie group theory. This section gives an intuitive introduction.

The first thing to do is to specify the exact set of transformations a feature should be invariant under. In fact, invariance theory insists on having a group structure for this set. The relevance of the group concept for invariance can intuitively be understood as follows. If two transformations of the image leave a feature unchanged, then clearly the composition of these two transformations will not change the feature either. Therefore, the composition of any two such transformations is also contained in the set of transformations that keep the feature invariant. The identity transformation (which actually does nothing) clearly preserves the feature as well, and so does the inverse of any transformation already in the set. Closure under composition, the presence of the identity and the presence of inverses are precisely the defining properties of a group. So the initial problem is identifying the relevant group of transformations. Lie theory offers an elegant way to solve this problem, as described later. For now, the group is assumed to be known, and the focus is on finding the invariants.

Orbits of Lie group actions and their relation to invariants

The easiest way to get a feel for the existence of invariants is by looking at some simple groups. First, consider the group SO(2) of rotations of the plane $\mathbb{R}^2$. As measurements, we choose the coordinates of a single point. Under the action of this group, a point of $\mathbb{R}^2$ can be brought to any other point on the circle through that point with the origin as its centre. Such circles are called the orbits of the action of SO(2) on the plane. Obviously, the points on an orbit all have the same distance to the origin. This distance does not change under rotations, and hence it is an invariant for SO(2). Moreover, every feature that is invariant for SO(2) should have the same value for all points belonging to the same orbit, which is fully specified by this distance. Consequently, each invariant for SO(2) must be a function of the distance to the origin. In mathematical parlance, ‘the action of SO(2) on $\mathbb{R}^2$ has one independent invariant’.

Notice the relation between the concept of invariance and a special parameterization of the measurement space. Polar coordinates $(\rho, \theta)$ fit the situation naturally: varying $\theta$ and keeping $\rho$ fixed yields the orbits. For each fixed value of $\theta$, varying $\rho$ traces out a straight line; together these lines form a pencil with the origin as centre. These lines are transversal to the orbits. The value of $\rho$ can therefore be used to tag all the points on the same orbit, i.e. $\rho$ is invariant.

Now, add 2D translations to the set of rotations, resulting in the group M(2) of rigid motions of the plane. Any point can now be brought to any other point by an appropriate motion. The orbit covers the entire plane. Consequently, an invariant for M(2) must have the same value at each point of the plane, and thus must be a constant function. Constant functions are invariant in a trivial way, and are therefore called trivial invariants.

For practical applications, however, it is important to have non-trivial invariants, and there clearly is such an invariant for the rigid motions of the plane: the distance between two points. Selecting as measurements the coordinates of two points yields a measurement space that consists of quadruples of real numbers, and thus can be identified with $\mathbb{R}^4$. On the other hand, the rigid motions only allow 3 degrees of freedom: the angle of rotation and the translation distances in both coordinate directions. The orbits have dimension 3, and consequently can be parameterized using three parameters, leaving a fourth parameter to parameterize the direction transversal to the orbits. There is 4 - 3 = 1 independent invariant (viz. the distance between the two points).

Wrapping up, the number of independent invariants is the number of independent parameters needed to describe the measurement space minus the number of independent parameters needed to fix the position of a point of the measurement space within its orbit, i.e. the dimension of the orbit. A visual example of such a parameterization for two-dimensional (2D) orbits in a three-dimensional (3D) measurement space is given in Figure 1. If, for a certain choice of measurements, the orbits and the measurement space have equal dimension, then the number of measurements has to be increased, in the hope that the new orbits will have lower dimension than the enlarged measurement space.

Figure 1 Stack of 2D orbit patches in a 3D measurement space. Each triple of measurements is represented by a point in the measurement space. The action of the transformations describing the problem can bring such a point to any other point on its orbit. The shown (u, v, w)-parametrization of the measurement space follows the principles outlined in the text. The parameter w, with parameter lines that are everywhere transversal to the orbits, is the only independent invariant

The dimension of the measurement space is easy to determine (for independent measurements it is a matter of counting their number). What about the dimension of the orbits? The fact that most groups popping up in vision are Lie groups is key to solving this problem. Loosely speaking, a Lie group is a group that (at least locally) can be described by means of smooth functions in a number ($n_p$ say) of continuously varying real parameters $p_i$. The number $n_p$ is called the dimension of the Lie group. For instance, SO(2) is a 1D Lie group, completely characterized by the rotation angle. M(2) is a 3D Lie group, characterized by the rotation angle and the translation distances in both coordinate directions. In general, a Lie group locally looks exactly like Euclidean $n_p$-space $\mathbb{R}^{n_p}$. Now if such a Lie group acts on $\mathbb{R}^n$ (or more generally, on a manifold), then, by changing each of the $n_p$ group parameters, a point in $\mathbb{R}^n$ can move in at most $n_p$ different, independent directions. Consequently, the dimension of an orbit of a Lie group cannot exceed the dimension of the group. Increasing the dimension of the measurement space so as to exceed the dimension of the Lie group thus guarantees the existence of non-trivial invariants, and this rule of thumb is referred to as the counting argument. It goes without saying that it may turn out to be overly conservative, since it suffices to surpass the dimension of the orbits, which is lower than or equal to the dimension of the group.
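The two-point example and the counting argument can be checked numerically. The following is only a minimal sketch: the helper names `rigid_motion`, `distance` and `count_invariants` are hypothetical and chosen for illustration, and the numerical values are arbitrary.

```python
import math

def rigid_motion(point, theta, t1, t2):
    """Rotate a point over theta about the origin, then translate by (t1, t2)."""
    x, y = point
    return (x * math.cos(theta) - y * math.sin(theta) + t1,
            x * math.sin(theta) + y * math.cos(theta) + t2)

def distance(p, q):
    """Euclidean distance between two points of the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def count_invariants(n_measurements, orbit_dimension):
    """Counting argument: measurement-space dimension minus orbit dimension."""
    return n_measurements - orbit_dimension

# Two points give four measurements; the same motion is applied to both.
p, q = (1.0, 2.0), (4.0, 6.0)
before = distance(p, q)
after = distance(rigid_motion(p, 0.7, -3.0, 5.0),
                 rigid_motion(q, 0.7, -3.0, 5.0))

print(before, after)           # the two values agree up to rounding
print(count_invariants(4, 3))  # 4 measurements, 3D orbits
```

The check only confirms that the distance is one invariant; that it is the only independent one follows from the dimension count, not from the numerical experiment.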

Calculating invariants

Lie theory offers a systematic way to calculate the independent invariants for a given group action. Suppose the relevant transformation group describing the changes in the measurements has been identified. An invariant is a (real) function $f(m_1, m_2, \ldots, m_{n_m})$ of the $n_m$ measurements $m_i$ that does not change when the values of the measurements are changed by the action of the group. For instance, a rotation of the plane over an angle $\theta$ maps the point $(x, y) \in \mathbb{R}^2$ onto the point $(x', y')$, where:

$$x' = x\cos\theta - y\sin\theta, \qquad y' = x\sin\theta + y\cos\theta \tag{1}$$

The x- and y-coordinates of an image point are taken as the measurements (i.e. $m_1 = x$ and $m_2 = y$). An invariant for SO(2) is a real function $f(x, y)$ that satisfies the condition:

$$f(x\cos\theta - y\sin\theta,\; x\sin\theta + y\cos\theta) = f(x, y) \tag{2}$$

for all rotation angles $\theta$ and points $(x, y)$. Now note that any rotation around the origin can be obtained as a sequence of rotations over arbitrarily small angles. An invariant is thus a function whose value is at no point affected along this gradual change of the group parameter $\theta$. In particular, it is a function $f(x, y)$ for which:

$$-y\,\frac{\partial f}{\partial x} + x\,\frac{\partial f}{\partial y} = 0 \tag{3}$$

with the left-hand side expressing the initial rate of change (i.e. the directional derivative) of the function $f$ when the point $(x, y)$ rotates around the origin (i.e. moves along its orbit). And indeed, the solutions to this equation are just all functions of the form $f(x, y) = F(x^2 + y^2)$ for some function $F: \mathbb{R} \to \mathbb{R}$, as expected.

Solving equations like (3), which express that an invariant is not allowed to change in the directions in which the orbits extend, is a technique that can be generalized to any Lie group action, as explained next. For the sake of notation, the measurements are combined into a column vector $m = (m_1, m_2, \ldots, m_{n_m})^T$. When the objects or the camera move or the camera parameters change, the image(s) change too, and consequently so do the measurements $m_i$. For the moment, assume that the Lie group $G$ whose action on the measurement vectors models these changes is known. Formally, $m' = g \cdot m$, with $m$ the measurement vector computed before the change, $m'$ the one computed after the change, and $g$ the appropriate group element acting on the measurements. An invariant then is a function $f(m)$ of the measurement vector $m$ whose value is not affected by this action, i.e.:

$$f(m') = f(g \cdot m) = f(m) \quad \text{for all } g \in G$$

The reasoning is now as for SO(2). Since $G$ is a Lie group, $g$ can be expressed as a (smooth) function $g = g(p_1, p_2, \ldots, p_{n_p})$ of the parameters $p_j$. Suppose we smoothly vary these parameters; then the element $g$ will travel smoothly through the Lie group $G$, i.e. $g(t) = g(p_1(t), p_2(t), \ldots, p_{n_p}(t))$. But if the group element changes, then so does the measurement vector: $m' = m'(t) = g(t) \cdot m$. On the other hand, the value of the invariant must remain constant. Therefore the expression $f(m'(t))$ should be independent of $t$, or equivalently:

$$\frac{d}{dt} f(m'(t)) = 0$$

The function $f$ being a function of the measurements, which vary smoothly with the parameters $p_j$, the chain rule of differentiation yields that:

$$\sum_{i=1}^{n_m} \sum_{j=1}^{n_p} \frac{\partial f}{\partial m'_i}\,\frac{\partial m'_i}{\partial p_j}\,\frac{dp_j}{dt} = 0 \tag{4}$$

Two important remarks are to be made here. First, a truly invariant function $f$ should satisfy the above differential equation for every possible way in which the group parameters can change, i.e. for any function $g(t) = g(p_1(t), p_2(t), \ldots, p_{n_p}(t))$. Hence equation (4) should hold for all possible values of $dp_j/dt$. But this is only possible if:

$$\sum_{i=1}^{n_m} \frac{\partial f}{\partial m'_i}\,\frac{\partial m'_i}{\partial p_j} = 0 \quad \text{for all } j = 1, 2, \ldots, n_p \tag{5}$$

Note that the left-hand side of the equation is just the partial derivative of $f(m')$ with respect to the group parameter $p_j$.

A second remark is that, since the elements of a Lie group depend smoothly on the continuously varying parameters $p_j$, each group element can be reached starting from the identity element by consecutive small changes of the different parameters. In mathematical parlance, any element of a (connected) Lie group can be obtained as the composition of elements arbitrarily close to the identity. Recall that any rotation around the origin can be obtained as a sequence of rotations over arbitrarily small angles (the zero angle corresponding to the identity); similarly, any translation can be obtained by translations over arbitrarily small distances in each of the coordinate directions (the zero distance corresponding to the identity). It is therefore sufficient to concentrate on group elements $g = g(p_1, p_2, \ldots, p_{n_p})$ whose parameter values are close to those of the identity element $e$. For the PDEs (5), this means it suffices to evaluate the expressions $\partial m'_i/\partial p_j$ at the identity $e$. In particular, each invariant $f$ must satisfy the system:

$$\sum_{i=1}^{n_m} \left.\frac{\partial m'_i}{\partial p_j}\right|_{e} \frac{\partial f}{\partial m_i} = 0 \quad \text{for all } j = 1, 2, \ldots, n_p \tag{6}$$
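The step from a group action to an invariance PDE can be illustrated for SO(2) with finite differences. This is a rough numerical sketch rather than a symbolic derivation: the function names (`orbit_direction`, `gradient`, `pde_residual`) and the two test functions are assumptions made for the example, and the step size `h` is an arbitrary choice.

```python
import math

def rotate(point, theta):
    """Action of SO(2) on a point of the plane, as in equation (1)."""
    x, y = point
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

def orbit_direction(action, point, h=1e-6):
    """Central-difference estimate of d(action(point, p))/dp at the identity p = 0."""
    plus, minus = action(point, h), action(point, -h)
    return tuple((a - b) / (2 * h) for a, b in zip(plus, minus))

def gradient(f, point, h=1e-6):
    """Central-difference estimate of the gradient of f at the given point."""
    grads = []
    for i in range(len(point)):
        up, down = list(point), list(point)
        up[i] += h
        down[i] -= h
        grads.append((f(*up) - f(*down)) / (2 * h))
    return tuple(grads)

def pde_residual(f, action, point):
    """Directional derivative of f along the orbit: the left-hand side of the PDE."""
    direction = orbit_direction(action, point)
    return sum(g * d for g, d in zip(gradient(f, point), direction))

radial = lambda x, y: math.exp(-(x * x + y * y))  # a function of x^2 + y^2
not_invariant = lambda x, y: x                    # changes under rotation

point = (1.5, -0.5)
print(orbit_direction(rotate, point))              # approximately (-y, x)
print(pde_residual(radial, rotate, point))         # approximately zero
print(pde_residual(not_invariant, rotate, point))  # clearly non-zero
```

The estimated orbit direction $(-y, x)$ supplies the coefficients of the rotation PDE, so a function passes the test exactly when its gradient has no component along the orbit.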

Conversely, one proves in Lie group theory that each solution $f$ of the system above is an invariant for $G$. The differential operators

$$\sum_{i=1}^{n_m} \left.\frac{\partial m'_i}{\partial p_j}\right|_{e} \frac{\partial}{\partial m_i}, \qquad j = 1, 2, \ldots, n_p$$

are called the infinitesimal generators of the underlying group action.

Taking the example of rigid motions in the plane, each transformation is characterized by three parameters: the rotation angle $\theta$, and the translation distances $t_1$ and $t_2$ in both coordinate directions. For non-trivial invariants to exist, the x- and y-coordinates of two image points were shown to be necessary. If $(x_1, y_1)$ and $(x_2, y_2)$ are the image coordinates of two points, then the measurement vector $m = (x_1, y_1, x_2, y_2)^T$ is transformed into the vector $m' = (x'_1, y'_1, x'_2, y'_2)^T$ as:

$$\begin{aligned}
x'_1 &= x_1\cos\theta - y_1\sin\theta + t_1\\
y'_1 &= x_1\sin\theta + y_1\cos\theta + t_2\\
x'_2 &= x_2\cos\theta - y_2\sin\theta + t_1\\
y'_2 &= x_2\sin\theta + y_2\cos\theta + t_2
\end{aligned} \tag{7}$$

The above procedure yields three PDEs for finding an invariant:

$$\begin{aligned}
-y_1\frac{\partial f}{\partial x_1} + x_1\frac{\partial f}{\partial y_1} - y_2\frac{\partial f}{\partial x_2} + x_2\frac{\partial f}{\partial y_2} &= 0\\
\frac{\partial f}{\partial x_1} + \frac{\partial f}{\partial x_2} &= 0\\
\frac{\partial f}{\partial y_1} + \frac{\partial f}{\partial y_2} &= 0
\end{aligned} \tag{8}$$

The left-hand sides are the directional derivatives in each of the different orbit directions. So the same procedure (changing the group parameters and evaluating at the identity transformation) yields a system of PDEs that must be satisfied by every invariant. The solutions to this system are of the form

$$f(x_1, y_1, x_2, y_2) = F\big((x_1 - x_2)^2 + (y_1 - y_2)^2\big)$$

for an arbitrary function $F: \mathbb{R} \to \mathbb{R}$. This proves again that $(x_1 - x_2)^2 + (y_1 - y_2)^2$ is the only independent invariant for M(2) on the given measurement set. Recall that ‘independent’ means that every motion invariant is a function $F$ of this particular one.

APPLYING THE FRAMEWORK TO VISION

In computer vision, the search for invariants follows more or less the steps outlined in Figure 2. First, the application has to be studied carefully. It has to be decided which changes (i.e. transformations) of the scene are to be allowed, and what parameters of the camera(s) are to remain unknown (i.e. uncalibrated). The changes of the scene and the camera parameters will probably induce changes of the variables that are measured from the image(s). A good choice for the measurements is a first, difficult problem. The measurements typically have to be chosen before one can investigate how they are influenced by the changes. In particular, the dimension of the orbits depends upon the choice of the measurements. This dimension may be appreciably lower for some choices than for others.

Figure 2 Schematic overview of steps necessary for the extraction of invariants (flowchart boxes: application, select measurements, identify group, find invariants, ready)

Once the choice for the measurements has been made, the dimension of the orbits is determined, and it is checked whether this dimension is exceeded by the number of measurements. If not, measurements have to be added or replaced by others, until the orbit dimension is lower than their number. However, the orbit dimension may grow when measurements are added. It may therefore be difficult to estimate the number of measurements needed to arrive at the necessary dimension surplus. Estimating this surplus is also complicated by the fact that the number of parameters in the original formulation of a problem is not always the dimension of the relevant group or its orbits, as explained later.

Once there are enough measurements for the extraction of an invariant (i.e. the ‘Yes’ exit of the first test is taken), Lie’s theorems guarantee the existence of a group. The knowledge of the infinitesimal generators describing the orbits is equivalent to knowing the group. An explicit representation of that group is not always available, however. The situation is comparable to having the derivative of a curve. Assuming smoothness, the curve can be reconstructed through integration. Similarly, knowing the local structure of the orbits at every point, it is a matter of integration to fully describe the underlying group action in the measurement space. The ‘identify group’ step in the figure has been put into a dashed box because explicit knowledge about the group is not necessary for finding invariants. Indeed, the PDEs that yield invariants only need a local description of the orbits. Nevertheless, knowing the group facilitates the analysis of the unexpected variability that can be dealt with by the invariants.
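The test "is the orbit dimension exceeded by the number of measurements?" can be sketched numerically for the rigid-motion example. The code below estimates the orbit directions at the identity by finite differences and takes the rank of the resulting matrix as the orbit dimension at the chosen point. It is only an illustration under stated assumptions: all function names are hypothetical, the rank tolerance is arbitrary, and the result is valid at a generic measurement vector, not at degenerate ones.

```python
import math

def rigid_motion_on_points(m, params):
    """Action of M(2) on a flat list of coordinates (x1, y1, x2, y2, ...)."""
    theta, t1, t2 = params
    out = []
    for k in range(0, len(m), 2):
        x, y = m[k], m[k + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta) + t1)
        out.append(x * math.sin(theta) + y * math.cos(theta) + t2)
    return out

def generator_matrix(action, m, n_params, h=1e-6):
    """Row j holds the orbit direction dm'/dp_j at the identity (all parameters zero)."""
    rows = []
    for j in range(n_params):
        plus, minus = [0.0] * n_params, [0.0] * n_params
        plus[j], minus[j] = h, -h
        a, b = action(m, plus), action(m, minus)
        rows.append([(u - v) / (2 * h) for u, v in zip(a, b)])
    return rows

def rank(rows, tol=1e-6):
    """Matrix rank by Gaussian elimination with partial pivoting."""
    rows = [r[:] for r in rows]
    r = 0
    for c in range(len(rows[0])):
        if r == len(rows):
            break
        pivot = max(range(r, len(rows)), key=lambda i: abs(rows[i][c]))
        if abs(rows[pivot][c]) < tol:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def independent_invariants(action, m, n_params):
    """Number of measurements minus the estimated orbit dimension at m."""
    return len(m) - rank(generator_matrix(action, m, n_params))

one_point = [1.0, 2.0]
two_points = [1.0, 2.0, 4.0, 6.0]
print(independent_invariants(rigid_motion_on_points, one_point, 3))
print(independent_invariants(rigid_motion_on_points, two_points, 3))
```

For a single point the orbit fills the plane and nothing is left over, so a measurement has to be added; for two points the surplus of one matches the distance invariant found above.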
Indeed, it might turn out that the smallest group found is the same as that for situations allowing more changes in the scene or the camera parameters. In that case, it will be said that the group absorbs the additional changes as well.

To find the invariants, the system of PDEs has to be solved. Although systematic approaches exist, this may turn out to be a tiresome process. It was found that it usually helps to identify ‘building blocks’ from which the invariants can be composed. These blocks themselves do not remain invariant, but change by some factors that can then be eliminated by taking algebraic combinations. An example where such a strategy works out very nicely is that of the plane affine and projective groups. The problem of solving the PDEs is not discussed in this paper.

Finally, the invariants have to be extracted from real images. The stability and robustness of this extraction have been stumbling blocks for the use of invariants presented in mathematical textbooks. In particular, differential invariants (containing only the coordinates and derivatives thereof in a single point) have proved difficult to implement. Even for a simple group like the plane affine transformations, the order of derivatives needed grows beyond reach when they are to be calculated on image contours directly (a 4th order derivative is required, as will be shown). As an alternative, semi-differential invariants were presented [13-15, 19]. They are discussed in a separate section.

The remainder of the paper focuses on four problems that may be encountered when applying the Lie group framework to computer vision. They are summarized in Figure 3. A section is devoted to each problem. Problems 1 and 2 have to do with identifying the group and selecting the measurements. In a mathematical context, problems are usually formulated directly in terms of a group acting on some variables. Vision problems usually do not present themselves that way, and the variability one has to deal with has to be analysed first. Moreover, one usually has quite some leeway in the choice of the measurements. The group that describes the changes of these measurements may differ from one choice to the next. Thus, getting started might take more time than actually applying the framework as outlined thus far. Problems 3 and 4 concern deviations from pure Lie theory. The Lie group framework is geared towards connected groups that act semi-regularly. These conditions are explained later on, and it is shown that the framework remains useful even if they are violated.

Problem 1: Identifying the appropriate Lie group

If a measurement type has been selected, it is not necessarily clear what the corresponding group is. The application will have been formulated in terms of parameters that vary, but the corresponding transformations may not form a group. The smallest group containing these transformations then has to be found. Moreover, for the calculation of the invariants, it is the dimension of the orbits that matters most. This dimension is not always the same as that of the group. This is the central theme of this section, around which the following topics revolve. First, if the measurement set includes derivatives of coordinates, the transformation group will be enlarged due to the additional uncertainties in the parameterizations of curves and surfaces. As will be shown, the added complexity tends to be low.

Figure 3 Lie group theoretical framework for the calculation of invariants requires some care when applying it to vision. Four problems are discussed


The next subsection illustrates how the theory supports finding the group and orbit dimensions. It clarifies the Lie bracket as a useful mathematical tool, and illustrates the subtle differences between these two dimensions. The problems of measurement selection and the determination of the relevant group are heavily intertwined. It is illustrated how the theory helps in choosing camera setups and measurements. Finally, attention is drawn to an erroneous way of estimating group dimensions, which one is nevertheless tempted to embark on.

Coexistence with the reparametrization group

In previous examples, coordinates of image points were used for the measurements. On the other hand, local orientations were already suggested. An example where this choice is rather natural is when one is dealing with curves. The most practical way of describing a curve is by means of a parameterization. This is a function $\alpha: I \subset \mathbb{R} \to \mathbb{R}^2$ (or $I \subset \mathbb{R} \to \mathbb{R}^3$ in the case of a space curve) of which the image $\alpha(t)$ traces out the curve when $t$ runs through $I$. Apart from the image coordinates $x(t)$ and $y(t)$, one can also use their derivatives $d^i x/dt^i$ and $d^i y/dt^i$ as measurements. As argued next, the use of derivatives adds to the variability to be dealt with, i.e. it comes in addition to the geometric changes in the scene and the possible variations of camera parameters.

Allowing derivatives as measurements complicates the computation of invariants, since the parametrization of a curve will differ between views. Derivatives of the point coordinates with respect to these parametrizations will also change. Features (invariants) are needed that are insensitive to changes in the parametrization as well. Fortunately, the effect of such reparametrization on a curve can be described by the action of a Lie group, called the reparametrization group, and parametrization invariance can therefore be treated within the Lie framework.
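Consider a curve in the image, parametrized as $\alpha(t) = (x(t), y(t))$, and define a reparametrization $t = \phi(t^*)$, where $\phi$ is a monotone smooth function. If differentiation with respect to $t$ is denoted by a dot, and differentiation with respect to $t^*$ by a prime, then the chain rule gives the following transformation of the coordinate derivatives:

$$\begin{aligned}
x' &= \phi'\,\dot{x}\\
x'' &= \phi''\,\dot{x} + (\phi')^2\,\ddot{x}\\
x''' &= \phi'''\,\dot{x} + 3\phi'\phi''\,\ddot{x} + (\phi')^3\,\dddot{x}\\
&\;\;\vdots
\end{aligned} \tag{9}$$

continued up to order $d$, where $d$ is the highest derivative considered. Similar relations hold for the y-coordinates. Taking into account that the function $\phi$ is an arbitrary monotone function, it follows that the vectors containing subsequently higher derivatives (starting with 0th order, i.e. the coordinates themselves) transform linearly under reparametrization with the following $(d + 1) \times (d + 1)$ transformation matrix:

$$\begin{pmatrix}
1 & 0 & 0 & 0 & \cdots & 0\\
0 & a & 0 & 0 & \cdots & 0\\
0 & b & a^2 & 0 & \cdots & 0\\
0 & c & 3ab & a^3 & \cdots & 0\\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots
\end{pmatrix} \tag{10}$$

where $a, b, c, \ldots$ stand for the (arbitrary) values of $\phi', \phi'', \phi''', \ldots$ at the point considered.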
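That matrices of the form (10) are closed under multiplication can be verified directly for derivatives up to third order. The sketch below uses exact integer arithmetic; `reparam_matrix`, `matmul` and `compose_params` are hypothetical helper names, and the composition rule in `compose_params` was worked out by hand for this illustration rather than taken from the paper.

```python
def reparam_matrix(a, b, c):
    """Matrix (10) restricted to derivatives of order 0 to 3."""
    return [[1, 0, 0, 0],
            [0, a, 0, 0],
            [0, b, a * a, 0],
            [0, c, 3 * a * b, a ** 3]]

def matmul(A, B):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def compose_params(first, second):
    """Parameters (a, b, c) of the product reparam_matrix(second) * reparam_matrix(first)."""
    a1, b1, c1 = first
    a2, b2, c2 = second
    return (a2 * a1,
            b2 * a1 + a2 * a2 * b1,
            c2 * a1 + 3 * a2 * b2 * b1 + a2 ** 3 * c1)

first, second = (2, 3, 5), (7, 11, 13)
product = matmul(reparam_matrix(*second), reparam_matrix(*first))
same_form = reparam_matrix(*compose_params(first, second))
print(product == same_form)
```

The product has the same zero pattern and the same relations between its entries ($a^2$, $3ab$, $a^3$), which is the closure property needed for a group.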

The set of the above matrices is closed under matrix multiplication and forms a $d$-dimensional Lie group. That the relations between the elements are preserved under the composition of such matrices can be checked rather easily, but it also follows immediately from the observation that two subsequent reparametrizations are equivalent to one reparametrization, which still has to have the above form. The changes for both x- and y-coordinates at a single point are governed by the same reparametrization matrix, but different matrices apply at different points.

Investigating invariance under both the group describing the projection-induced deformations and the above reparametrization group is not that difficult when their effects can be disentangled. Indeed, the reparametrization acts by taking linear combinations of the vectors of coordinate derivatives of different orders, treating the x- and y-components alike. Suppose $\mathbf{x}$ denotes the vector of coordinates of a given curve point (i.e. $\mathbf{x} = (x(t), y(t))^T$), and $\mathbf{x}^{(k;t)}$ (or $\mathbf{x}^{(k)}$ for short) denotes the vector of kth order derivatives of the coordinates $x$ and $y$ with respect to the curve parameter $t$ (i.e. $\mathbf{x}^{(k)} = (d^k x/dt^k, d^k y/dt^k)^T$). Changing the parameter $t$ into $t^*$, for example, has the following effect on the third derivative vector: $\mathbf{x}^{(3;t^*)} = c\,\mathbf{x}^{(1;t)} + 3ab\,\mathbf{x}^{(2;t)} + a^3\,\mathbf{x}^{(3;t)}$. The projection-induced deformations, on the other hand, most of the time affect the individual coordinates differently, but act on each order of differentiation in a similar way. For instance, an affine transformation:

$$x' = a_{11}x + a_{12}y + b_1, \qquad y' = a_{21}x + a_{22}y + b_2 \tag{11}$$

transforms each derivative vector as x’(~) = AX(~) (where A is 2 x 2-matrix (au)). By the linearity of this transformation, it follows immediately that the action of the reparametrization group commutes with the affine transformations. This is bound to happen in many practical situations. From the point of view of invariants, this means that the set of invariants for both projection deformations and reparametrization is just the intersection of the set of deformation invariants with the set of reparametrization invariants. The reparametrization group is independent of the application, and its invariants can be computed once and for all. Table I contains the (independent) relative invariants for reparametrization involving derivatives up to the fourth order”. A relative invariant is a

4 May 1995

Vision and Lie’s approach Table 1 Relative invariants under reparametrization. stand for the determinant of the marix whose columns between them

(12)

y’=xsintl+yc0st3

w

x Y xu) yu)

0 0 1

(1)x(2), IX

3

121x(‘) xy

L van Go01 et al.

X’ = x cos e - y sin e + tl

Vertical bars are the vectors

R-invariant

3xP),x(‘)

to invariance:

where 8 and tI can take any real value. This set of transformations is not closed under composition, since a cascade of two such transformations with parameters (0, t,) and then (e’, t’,) yields:

1

xP),_xu)lxu) x(2) x(X,

x(3),

_ 5,xv)

X’ = x cos(e + e’) - y sin(8 + e/j + (t, cos 8’ + t’,)

5 .(3)

12 +

3, x(I) x(4 ,, x(‘l

x(4),

y’ = x sin(8 + et) + y c0s(e + e’) + tl sin e’

8

(13) function of the measurements whose value changes under transformation by multiplication with a scalar factor that depends upon the group parameters only. Formally, f‘(m) is a relative invariant for the group G if and only if f(g . m) = ;l(g)f(m) for some function A: G --f IR, coined the weight of f- The weights of relative reparametrization invariants are powers a” of the group parameter a = 4’ (cfr. equation (10)). The second column in the table gives the appropriate value of w for the invariants. Vertical bars stand for the determinant of the matrix whose columns are the vectors between them. The reason for being interested in relative invariants is two-fold. They allow the generation of invariant parameters” and absolute invariants are now easily found as appropriate ratios. A priori knowledge of the relative reparametrization invariants restricts the search space for projection invariants. For instance, consider the affine transformations (see equation (11)) that satisfy the condition alIa22 - ~1212~21 = 1, i.e. the unimodular transformations. Determinants of coordinate differences and/or derivatives are invariants for this group. It follows that a differential invariant (i.e. using only coordinate derivatives in one point) under unimodular transformations and reparametrization simultaneously is given by:

$$\frac{12\,|x^{(1)}\; x^{(2)}|\,|x^{(2)}\; x^{(3)}| - 5\,|x^{(1)}\; x^{(3)}|^2 + 3\,|x^{(1)}\; x^{(2)}|\,|x^{(1)}\; x^{(4)}|}{9\,|x^{(1)}\; x^{(2)}|^{8/3}}$$

which is the well-known affine curvature from differential geometry (expressed for an arbitrary parameter). Indeed, the weights $w$ of the numerator and denominator compensate to render the expression invariant under reparametrization. On the other hand, all the determinants in the expression are invariant under unimodular transformations in their own right, and consequently the overall expression cannot change either.

Determining the group and orbit dimensions

So far, nothing has been said about actually identifying the smallest, relevant group. Quite often, the set of transformations that describes the application is not a group, since it is not closed under composition. Moreover, the number of parameters determining the transformations often differs from the dimension of the underlying group. For instance, consider transformations of the form:

$$x' = x\cos\theta - y\sin\theta + t_1, \qquad y' = x\sin\theta + y\cos\theta \qquad (12)$$

Composing two such transformations in general yields a transformation with a non-zero translation part for the second argument. One must include translations in the y-direction in order to obtain a group structure, i.e. M(2). Although we started with a set of transformations depending on two parameters, the actual Lie group is 3D. A transformation $T$ of the form (12) can be interpreted as first rotating over an angle $\theta$ about the origin and subsequently translating over distance $t_1$ in the x-direction. Let us denote this formally by $T = X \circ R$. The composition of two transformations $T_1$ and $T_2$ then is $T_2 \circ T_1 = X_2 \circ R_2 \circ X_1 \circ R_1$. If the order of the middle factors $R_2$ and $X_1$ could be reversed, then one would get $(X_2 \circ X_1) \circ (R_2 \circ R_1)$, which is of the same form as the original transformation (12). Unfortunately, rotations and translations in general do not commute, i.e. $R_2 \circ X_1 \neq X_1 \circ R_2$. Instead:

$$R_2 \circ X_1 = (R_2 \circ X_1 \circ R_2^{-1} \circ X_1^{-1}) \circ (X_1 \circ R_2)$$

The order of $R_2$ and $X_1$ may be reversed provided the correcting commutator $R_2 \circ X_1 \circ R_2^{-1} \circ X_1^{-1}$ is applied. The nice thing about this commutator is that it is itself a translation with a component in the y-direction, thereby indicating the need to include such translations. The relevant Lie group for this example is the rigid motion group M(2), for which the infinitesimal generators have already been extracted (equation (8)):

$$-y\frac{\partial f}{\partial x} + x\frac{\partial f}{\partial y} = 0, \qquad \frac{\partial f}{\partial x} = 0, \qquad \frac{\partial f}{\partial y} = 0 \qquad (14)$$
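The claim that the correcting commutator is a pure translation with a y-component can be checked numerically. The following is only a minimal sketch: the helper names (`rotation`, `x_translation`, `compose`) and the values of `theta2` and `t1` are illustrative assumptions, not part of the original text.

```python
import math

def rotation(theta):
    """Plane rotation over angle theta about the origin, as a map on points."""
    c, s = math.cos(theta), math.sin(theta)
    return lambda p: (c * p[0] - s * p[1], s * p[0] + c * p[1])

def x_translation(t1):
    """Translation over distance t1 in the x-direction."""
    return lambda p: (p[0] + t1, p[1])

def compose(*maps):
    """compose(f, g, h)(p) = f(g(h(p))), matching the 'o' notation of the text."""
    def composed(p):
        for m in reversed(maps):
            p = m(p)
        return p
    return composed

theta2, t1 = 0.7, 2.0  # arbitrary illustrative values (assumption)
R2, R2_inv = rotation(theta2), rotation(-theta2)
X1, X1_inv = x_translation(t1), x_translation(-t1)

# The correcting commutator R2 o X1 o R2^-1 o X1^-1.
commutator = compose(R2, X1, R2_inv, X1_inv)

# A pure translation shifts every point by the same vector.
shift = commutator((0.0, 0.0))
test_points = [(1.0, 0.0), (-3.0, 2.5), (0.4, -7.0)]
is_translation = all(
    math.isclose(commutator(p)[0] - p[0], shift[0], abs_tol=1e-12)
    and math.isclose(commutator(p)[1] - p[1], shift[1], abs_tol=1e-12)
    for p in test_points
)

# Reordering identity: R2 o X1 = commutator o (X1 o R2).
lhs = compose(R2, X1)
rhs = compose(commutator, X1, R2)
reordering_holds = all(
    math.isclose(lhs(p)[0], rhs(p)[0], abs_tol=1e-12)
    and math.isclose(lhs(p)[1], rhs(p)[1], abs_tol=1e-12)
    for p in test_points
)
```

For these values the shift has y-component `t1 * sin(theta2)`, which is non-zero unless the rotation angle is a multiple of $\pi$, so translations in the y-direction cannot be avoided.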

For the example at hand, we only had the transformation formulas (12) at our disposal. These transformations depend upon two independent real parameters (viz. $\theta$ and $t_1$). Differentiating with respect to these two parameters and evaluating at the identity yields the first two PDEs of system (14). How do we find the third one? The missing link for the group structure is the commutator of any two transformations contained in the set. It can be proved that the infinitesimal generator corresponding to the commutator of two one-parameter transformations is just the Lie bracket of the infinitesimal generators of the two original transformations18. The Lie bracket of the two infinitesimal generators $\mathcal{L}_1$ and $\mathcal{L}_2$ is defined as $[\mathcal{L}_1, \mathcal{L}_2] = \mathcal{L}_1\mathcal{L}_2 - \mathcal{L}_2\mathcal{L}_1$; or, applied to a function $f$, $[\mathcal{L}_1, \mathcal{L}_2](f) = \mathcal{L}_1(\mathcal{L}_2(f)) - \mathcal{L}_2(\mathcal{L}_1(f))$. In the example above, the infinitesimal generator $\mathcal{L}_X$ of the translation component $X$ is $\mathcal{L}_X = \frac{\partial}{\partial x}$ and the infinitesimal generator of the rotation part $R$ is $\mathcal{L}_R = -y\frac{\partial}{\partial x} + x\frac{\partial}{\partial y}$. Hence, the Lie bracket $[\mathcal{L}_X, \mathcal{L}_R]$, applied to the function $f(x, y)$, is:

$$[\mathcal{L}_X, \mathcal{L}_R](f) = \mathcal{L}_X(\mathcal{L}_R(f)) - \mathcal{L}_R(\mathcal{L}_X(f)) = \frac{\partial}{\partial x}\left(-y\frac{\partial f}{\partial x} + x\frac{\partial f}{\partial y}\right) - \left(-y\frac{\partial}{\partial x} + x\frac{\partial}{\partial y}\right)\left(\frac{\partial f}{\partial x}\right) = \frac{\partial f}{\partial y}$$

So the Lie bracket of the first two infinitesimal generators (which were computed directly from the given set of transformations) yields the third infinitesimal generator (which is needed to turn the set of transformations into a group). This implies the following strategy for the identification of the appropriate Lie group: calculate the set of relevant transformations of the measurements, compute for each unknown parameter the corresponding infinitesimal generator, then compute the Lie bracket of any two of the generators in the set. These Lie brackets are themselves infinitesimal generators of one-parameter transformations of the appropriate Lie group, and thus their Lie brackets should be added as well. And so on.

So the question is: when to stop this process? Sophus Lie's Fundamental Theorems say that a set of infinitesimal generators corresponds to the action of a Lie group if and only if the Lie bracket of any two infinitesimal generators in the set can be expressed as an $\mathbb{R}$-linear combination of the infinitesimal generators of this set. Hence, stop when the Lie bracket of any two infinitesimal generators can be expressed as an $\mathbb{R}$-linear combination of those already contained in the set. Such a set of infinitesimal generators is said to be in involution. In mathematics, a (real) vector space that is closed under the operation of taking Lie brackets is called a (real) Lie algebra.

Now what about invariants? Well, each function $f$ that is a solution to the system of PDEs defined by the infinitesimal generators must also be a solution to each $\mathbb{R}$-linear combination of these equations, and vice versa. Thus, interestingly enough, considering the vector space spanned by the infinitesimal generators, it suffices to solve any system of equations whose differential operators form a basis for that vector space. Moreover, the dimension of the relevant Lie group is just the dimension (as a real vector space) of the smallest Lie algebra containing the infinitesimal generators.
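The stopping rule can be mimicked mechanically for generators with polynomial coefficients. The sketch below is an illustration under stated assumptions, not the authors' procedure: it assumes planar vector fields $a\,\partial/\partial x + b\,\partial/\partial y$ with polynomial $a$, $b$, assumes the starting generators are linearly independent, and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# A polynomial in x, y is a dict {(i, j): coeff} standing for sum coeff*x**i*y**j.
# A vector field a*d/dx + b*d/dy is a pair (a, b) of such polynomials.

def add(p, q, sign=1):
    out = dict(p)
    for k, c in q.items():
        out[k] = out.get(k, 0) + sign * c
    return {k: c for k, c in out.items() if c != 0}

def mul(p, q):
    out = {}
    for (i, j), c in p.items():
        for (k, l), d in q.items():
            out[(i + k, j + l)] = out.get((i + k, j + l), 0) + c * d
    return {k: c for k, c in out.items() if c != 0}

def diff(p, var):
    """Partial derivative with respect to x (var=0) or y (var=1)."""
    out = {}
    for (i, j), c in p.items():
        e = (i, j)[var]
        if e:
            key = (i - 1, j) if var == 0 else (i, j - 1)
            out[key] = out.get(key, 0) + c * e
    return out

def act(field, g):
    """Apply the first-order operator 'field' to the polynomial g."""
    return add(mul(field[0], diff(g, 0)), mul(field[1], diff(g, 1)))

def bracket(v, w):
    """Lie bracket [v, w] = v w - w v, again a first-order operator."""
    return (add(act(v, w[0]), act(w, v[0]), -1),
            add(act(v, w[1]), act(w, v[1]), -1))

def rank(vectors):
    """Rank over the rationals by Gauss-Jordan elimination."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def algebra_rank(fields):
    """Dimension of the real span of the fields (constant coefficients only)."""
    keys = sorted({(s,) + k for f in fields for s in (0, 1) for k in f[s]})
    vectors = [[f[s].get((i, j), 0) for (s, i, j) in keys] for f in fields]
    return rank(vectors) if keys else 0

def close_under_brackets(generators, max_rounds=10):
    """Add Lie brackets until the set is in involution (generators assumed independent)."""
    basis = list(generators)
    for _ in range(max_rounds):
        grew = False
        for v, w in combinations(list(basis), 2):
            candidate = bracket(v, w)
            if algebra_rank(basis + [candidate]) > algebra_rank(basis):
                basis.append(candidate)
                grew = True
        if not grew:
            return basis
    raise RuntimeError("no involution reached within max_rounds")

def tangent_vectors(fields, x, y):
    """Coefficient vectors of the fields at the point (x, y): they span the orbit's tangent space."""
    return [[sum(c * x**i * y**j for (i, j), c in comp.items()) for comp in f]
            for f in fields]

L_R = ({(0, 1): -1}, {(1, 0): 1})      # -y d/dx + x d/dy
L_X = ({(0, 0): 1}, {})                #  d/dx
m2_algebra = close_under_brackets([L_R, L_X])
```

Starting from the two generators of transformations (12), the loop adds one bracket and then stops, giving a 3-dimensional algebra. Evaluating `tangent_vectors` at a point and taking the `rank` gives the orbit dimension at that point, which uses point coordinates as coefficients and can therefore be smaller than the algebra dimension.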
There is a close relationship between the Lie algebra of infinitesimal generators and the dimension of the orbits. Recall that the left-hand sides of the PDEs can be interpreted as directional derivatives of the function $f$, thus expressing that an invariant is not allowed to change in the directions in which the orbits expand. In particular, this means that the differential operators showing up in the system in fact provide a local description of the orbits. The coefficients of the partial derivatives in the equations are just the components of the tangent vectors to the orbit passing through the point in the measurement space at which the value of the invariant is calculated. In particular, the dimension of the orbit passing through a given point is just the dimension of the (real) vector space spanned by these tangent vectors (i.e. the dimension of the tangent space to the orbit at the given point). Consider system (14). The span of $(-y, x)$, $(1, 0)$ and $(0, 1)$ is $\mathbb{R}^2$, which proves that the orbit is 2-dimensional. On the other hand, if we would take the x- and y-coordinates of two image points as the measurements, then the coefficients of the partial derivatives of $f$ are $(-y_1, x_1, -y_2, x_2)$, $(1, 0, 1, 0)$ and $(0, 1, 0, 1)$ (see system (8)). If $(x_1, y_1) \neq (x_2, y_2)$, then these vectors are independent and span a 3-dimensional vector space, embedded in the 4-dimensional space of four-tuples. Thus, for points in general positions, the orbit passing through them is 3-dimensional.

There is a subtle difference between the dimension of the Lie group and that of the orbits, although most of the time these two numbers are the same. This explains why the counting argument works in most cases. The reason that it may fail is that 'linear dependence' between infinitesimal generators is defined differently: to determine the dimension of the group only linear combinations with real numbers as coefficients may be taken, whereas for the dimension of the orbit also point coordinates, etc. may be used. An example will illustrate the difference: the infinitesimal generators

$$\mathcal{L}_1 = x\frac{\partial}{\partial x} + y\frac{\partial}{\partial y} \qquad \text{and} \qquad \mathcal{L}_2 = x^2\frac{\partial}{\partial x} + xy\frac{\partial}{\partial y}$$

are in involution, since $[\mathcal{L}_1, \mathcal{L}_2] = \mathcal{L}_2$. The (real) Lie algebra generated by them is 2-dimensional, because it is impossible to find real numbers $\alpha$ and $\beta$, not both zero, such that $\alpha\mathcal{L}_1 + \beta\mathcal{L}_2 \equiv 0$ (for all possible values of $x$ and $y$). So the corresponding Lie group has dimension 2. But the orbit through the point $\mathbf{x} = (x, y)$ generated by this group is 1D, since it extends in the directions $v_1 = (x, y)$ and $v_2 = (x^2, xy)$ (the coefficients of the partial derivatives) which, at the given point $\mathbf{x}$ (i.e. for fixed values of $x$ and $y$), are dependent: $v_2 = x v_1$, i.e. both directions are the same.

The group depends upon the measurements

In the introductory sections it was assumed that the Lie group acting on the measurements had been identified. Normally, the identification of the group is part of the problem. For one thing, the group is not only determined by the problem at hand (i.e. the variability to handle), but just the same by the measurements that are used, and this not only because of reparametrization. Ideally, one would like to use robust measurements that undergo changes that can be described by the smallest group possible. As a well-known example, consider planar objects viewed under pseudo-orthographic conditions (i.e.

simulating perspective projection as orthographic projection + scaling). The objects and the camera have an arbitrary relative orientation. Image point coordinates undergo affine transformations (cfr. equation (11)) between views. The plane affine transformations form a 6D Lie group, since they are determined by six independent parameters. On the other hand, choosing areas as measurements, these simply scale by the determinant of the affine transformation acting on the coordinates. Hence, a 1D group governs this case, since this scaling is described by a single parameter. The extraction of invariants requires at least four points but only two areas: at least four pairs of point coordinates are needed to surpass the six dimensions of the affine group, whereas only two areas suffice to surpass the one dimension of the scaling group.

The following example is more involved, and illustrates how the framework also helps in selecting camera configurations, a problem directly linked to measurement selection. It is known that two orthographic views taken from different viewing directions of at least five points allow one to position any of these points in an affine frame generated by the other four points8. Hence, taking four points as a basis, scenes can be reconstructed up to a 3D affine transformation. If the relative orientation of the cameras for the two views is known, one can do better. This case will serve as a sketchy example. The setup is illustrated in Figure 4. Suppose we attach the world coordinate system to the first camera. The image coordinates for that camera of a point with space coordinates $(X, Y, Z)$ are:

$$x = kX, \qquad y = kY$$

Suppose the relative orientation of the second camera is known and given by the rotation matrix $(r_{ij})$. The translation vector between the cameras is $(a, b, c)^T$. Hence, the image coordinates for the point $(X, Y, Z)$ as seen by the second camera are:

$$u = l\,(r_{11}X + r_{12}Y + r_{13}Z) + la, \qquad v = l\,(r_{21}X + r_{22}Y + r_{23}Z) + lb$$

where the different scaling factor $l$ has to account for changing distances to the object, a different focal length for the second camera, and possibly different pixel sizes. Together these formulas constitute the following relationship between the coordinates of a scene point and its images:

[equation (15) is missing from the source]

Considering the two cameras as a rigid stereo setup, the same object can be looked at after a relative 3D motion described by the equation $X' = EX$, with $X' = (X', Y', Z', 1)^T$ the space coordinates of the scene point after the motion, $X = (X, Y, Z, 1)^T$ those before the motion, and with:

$$E = \begin{pmatrix} R & t \\ 0^T & 1 \end{pmatrix}$$

where $R$ is a 3 × 3 rotation matrix and $t \in \mathbb{R}^3$ a translation vector. Using $s = (x, y, u, v)^T$ and $s' = (x', y', u', v')^T$ as measurements before and after the motion, resp., they are seen to transform as:

$$s' = U'(AE)U^{-1} s \qquad (16)$$

where $l'/l$ expresses the rescaling of the images induced by the motion, and with:

[matrices $U$, $U'$ and $A$: illegible in the source apart from the entries $k/l$ in $U$ and $k'/l'$ in $U'$]

the matrices that capture the stereo configuration. The crucial question is: what is the smallest Lie group that contains all transformations (16) (and their compositions)? It follows from equation (16) that, if the matrices $U$ and $U'$ are known (i.e. if the relative scales $k/l$ and $k'/l'$ are known), one is dealing with 3D similarities, i.e. a 7D Lie group Sim(3). If at least two points are available, $k/l$ and $k'/l'$ can be calculated. Writing the coordinate differences of these points as $\Delta$, it transpires that:

$$\frac{k}{l} = \frac{(r_{21}r_{13} - r_{11}r_{23})\,\Delta x + (r_{22}r_{13} - r_{12}r_{23})\,\Delta y}{r_{13}\,\Delta v - r_{23}\,\Delta u} \qquad (17)$$

Notice that the same points used for this calculation can be reused for the extraction of invariants under the resulting 3D similarities, but that the knowledge of $k/l$ and $k'/l'$ also reduces the number of independent measurements within each vector $s$ to 3. Hence, three points are needed to extract an invariant.

Figure 4 Stereo setup for obtaining two pseudo-orthographic views by a pair of cameras with known relative orientation

Next, suppose $s = (x, y, u)^T$ at different points are selected as measurements. This could be relevant when observing points projected by a laser and replacing the second camera by a linear CCD and the necessary optics. If $s'$ are the corresponding measurements obtained with the same setup but after a relative 3D motion of the scene and the cameras, then:

$$s' = U'(AR)U^{-1} s + \bar{t}$$

with this time:

[matrices $U$, $U'$ and $A$: illegible in the source apart from the entries $k/l$, $k'/l'$ and a row $(r_{11}\; r_{12}\; r_{13})$]

and $\bar{t} \in \mathbb{R}^3$ an arbitrary translation vector. Equation (17) can no longer be used to determine $k/l$, since it needs knowledge on the $v$s, which are not measured. The consequences of adding this single unknown parameter are more severe than might be anticipated. Indeed, the infinitesimal generator for $k/l$ produces, through the Lie bracket with the other generators, four other, additional generators, resulting in a system of rank 12 instead of 7. Admittedly, the above camera configuration is a bit unusual. Normally, it would make more sense to use three linear cameras instead. Needing less time to read out and process the visual data, as well as the availability of higher resolutions, might be reasons to opt for such a setup. The conclusion would be the same, i.e. a 12D group results. At least five points have to be combined for an invariant.

Implosion of vector fields near the identity

The study of the Lie algebra has been introduced to identify the relevant group. A danger lurking behind many vision applications is the following. When observing a 3D geometrical structure from different viewing positions or at different moments during a motion, the different views are often related by a transformation $T$ of the type $T = VSV^{-1}$, where $V$ captures the viewing conditions (i.e. the effect of projection and the camera parameters), whereas $S$ represents the changes in the scene (e.g. the new camera position and/or the motion of the observed object). The stereo problem of the previous subsection is a good example. Using the expounded strategy to identify the relevant Lie group, one should take smooth curves of relevant transformations (passing through the identity) by varying consecutively each of the (unknown) parameters in the transformation $T$. By the nature of the problem, one is tempted to take curves $T(t)$ of the type $T(t) = V(t)S(t)V(t)^{-1}$, where $V(t)$ is a smooth curve in the set of viewing transformations and $S(t)$ is a smooth curve in the set of scene changes, both passing through the identity transformation at, say, $t = 0$. If one then differentiates $T(t)$ with respect to $t$, one gets:

$$\dot{T}(0) = \dot{V}(0)S(0)V^{-1}(0) + V(0)\dot{S}(0)V^{-1}(0) - V(0)S(0)V^{-1}(0)\dot{V}(0)V^{-1}(0) \qquad (18)$$

and subsequently evaluating at the identity yields:

$$\dot{T}(0) = \dot{S}(0)$$

since we assumed that both $V(0)$ and $S(0)$ represent the identity transformation. Blind application of the theory thus yields the Lie algebra of the group generated by the transformations $S$ rather than that of the group of conjugated transformations. Of course, the method is not to blame here. Rather, the choice of the generating curves is unfortunate, since it only addresses a part of the directions in which the Lie group extends near the identity. A solution is to choose trajectories $V(t)$ that do not contain the identity. Indeed, there is by no means a need for such a choice, since, by the particular nature of the transformations $T$, $T(0) = V(0)S(0)V(0)^{-1} = V(0)V(0)^{-1} = I$ ($I$ is the identity) whenever $S(0)$ is the identity. Equation (18) shows that arbitrary choices of $V(t)$ add conjugations $\dot{T}(0) = V(0)\dot{S}(0)V(0)^{-1}$ of the $\dot{S}(0)$ to the Lie algebra of the relevant group.
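The implosion effect can be observed numerically on a toy case. The sketch below is purely illustrative and rests on assumptions not taken from the paper: 2 × 2 matrices, a rotation curve for $S(t)$, a one-parameter scaling curve for $V(t)$, and a central finite difference in place of the exact derivative.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def scene(t):
    """S(t): rotation over angle t, so S(0) is the identity (toy scene change)."""
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]

def viewing(t, v0):
    """V(t) = diag(v0 * exp(t), 1); passes through the identity only if v0 == 1."""
    return [[v0 * math.exp(t), 0.0], [0.0, 1.0]]

def conjugated(t, v0):
    """T(t) = V(t) S(t) V(t)^-1."""
    v = viewing(t, v0)
    return matmul(matmul(v, scene(t)), inverse(v))

def derivative_at_zero(curve, h=1e-5):
    """Central finite-difference estimate of the derivative of a matrix curve at t = 0."""
    plus, minus = curve(h), curve(-h)
    return [[(plus[i][j] - minus[i][j]) / (2 * h) for j in range(2)]
            for i in range(2)]

def close(a, b, tol=1e-6):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

S_DOT = [[0.0, -1.0], [1.0, 0.0]]  # exact derivative of scene(t) at t = 0

# V(0) = identity: the viewing curve leaves no trace in the generator.
through_identity = derivative_at_zero(lambda t: conjugated(t, 1.0))

# V(0) != identity: the generator is the conjugate V(0) S_DOT V(0)^-1.
off_identity = derivative_at_zero(lambda t: conjugated(t, 2.0))
v0 = viewing(0.0, 2.0)
expected_off = matmul(matmul(v0, S_DOT), inverse(v0))
```

In both cases $T(0)$ is the identity, but only the second choice of $V(t)$ produces a generator that differs from $\dot{S}(0)$.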

Problem 2: Selecting the measurements

Reference to the measurement selection problem has been made repeatedly in the previous section, since this issue cannot be decoupled from the group identification. This section illustrates two more points, directly related to the choice of measurements. A first subsection shows that there is no guarantee that invariants can be found for a given choice of measurements, even if they are not too exotic to be successful for slightly different problems. A second subsection discusses measurement sets that are mixtures of point coordinates and coordinate derivatives at different points. Such combinations are useful to avoid both numerical problems with high order differentiation and painstaking searches for a sufficient number of points to build invariants composed of coordinates only.

Failure to close the measurement set

One of the basic conditions for application of the theory is that the value of each measurement after the transformation is a (smooth) function of the measurements before the transformation. For instance, if the rotation group SO(2) is the relevant transformation group (cfr. equation (1)), then nobody will think of taking only the x-coordinate of an image point as a measurement, since its value $x' = x\cos\theta - y\sin\theta$ after rotation also depends on the value of the y-coordinate. Taking $x$ as a measurement forces one to include also $y$ as measurement (and vice versa). A set of measurements that satisfies the above condition is called closed with respect to the group action. So, $\{x, y\}$ is closed for the action of SO(2) on the plane, but $\{x\}$ is not. There are cases where closure cannot be achieved, however, as will be shown next.

When taking point coordinates or local directions as measurements, the closure condition will quite often immediately be satisfied, because in many cases the group action is defined by its direct effect on the point coordinates themselves. For other measurements it might not immediately be clear whether the measurement set is closed under the group action, and if it is not, which other measurements must be included to close it. For instance, consider using moments. They are defined as follows: if the image is given by a binary intensity map $i(x, y)$, with value 1 for object points and 0 for background, then the $(p, q)$th moment is defined as:

$$M_{p,q} = \iint_{\mathbb{R}^2} x^p y^q\, i(x, y)\, dx\, dy$$

where integration takes place over the whole image. One says that $p + q$ is the order of the moment. By cleverly combining moments, one can obtain moment invariants if the whole pattern undergoes the same transformation and remains completely visible. As an example, an expression is constructed that involves up to second order moments and is invariant under affine transformations of the image. An affine transformation (11) of the image will change the value of $M_{p,q}$ into:

$$M'_{p,q} = \iint_{\mathbb{R}^2} (a_{11}x + a_{12}y + b_1)^p\, (a_{21}x + a_{22}y + b_2)^q\, i(x, y)\, \mathrm{abs}(a_{11}a_{22} - a_{12}a_{21})\, dx\, dy \qquad (19)$$

Expanding the right-hand side results in a linear combination of moments of order less than or equal to $p + q$. So, if $M_{p,q}$ is a measurement, then all other moments of order lower than or equal to $p + q$ must also belong to the measurement set to meet the closure condition. In particular, taking $M_{1,1}$ as the first measurement forces one to include $M_{2,0}$, $M_{0,2}$, $M_{1,0}$, $M_{0,1}$ and $M_{0,0}$ in the measurement set. For the projective transformations, the expression for $M'_{p,q}$ becomes more complicated:

$$M'_{p,q} = \iint_{\mathbb{R}^2} (x')^p (y')^q\, i(x, y)\, \mathrm{abs}\!\left(\frac{\partial x'}{\partial x}\frac{\partial y'}{\partial y} - \frac{\partial x'}{\partial y}\frac{\partial y'}{\partial x}\right) dx\, dy \qquad (20)$$

where:

$$x' = \frac{p_{11}x + p_{12}y + p_{13}}{p_{31}x + p_{32}y + p_{33}}, \qquad y' = \frac{p_{21}x + p_{22}y + p_{23}}{p_{31}x + p_{32}y + p_{33}} \qquad (21)$$

with the $p_{ij}$ arbitrary real numbers that together form an invertible 3 × 3 matrix $P$. It is a non-trivial exercise to figure out from the formula above what other moments have to be included when taking $M_{1,1}$ as the first measurement. Fortunately, Lie theory offers a method to solve this problem in a much easier way. For the equations (6) to form a system of PDEs in the measurements $m_i$ (a necessary condition to solve it), the factors $\left.\partial m_i' / \partial p\,\right|_{p=e}$ should be functions of the $m_i$. For instance, in the affine case, formula (19) implies that a measurement set containing the moment $M_{p,q}$ should also contain the moments on the right-hand sides of:

$$\left.\frac{\partial M'_{p,q}}{\partial a_{11}}\right|_{e} = (p+1)\,M_{p,q}, \qquad \left.\frac{\partial M'_{p,q}}{\partial a_{22}}\right|_{e} = (q+1)\,M_{p,q},$$

$$\left.\frac{\partial M'_{p,q}}{\partial a_{12}}\right|_{e} = p\,M_{p-1,q+1}, \qquad \left.\frac{\partial M'_{p,q}}{\partial a_{21}}\right|_{e} = q\,M_{p+1,q-1},$$

$$\left.\frac{\partial M'_{p,q}}{\partial b_{1}}\right|_{e} = p\,M_{p-1,q}, \qquad \left.\frac{\partial M'_{p,q}}{\partial b_{2}}\right|_{e} = q\,M_{p,q-1},$$

(all derivatives evaluated at the identity $e$, i.e. $A = I$ and $b = 0$) in order to be closed under affine transformations of the image. This proves again that the smallest closed set containing $M_{1,1}$ is $\{M_{0,0}, M_{1,0}, M_{0,1}, M_{2,0}, M_{1,1}, M_{0,2}\}$. Once a closed measurement set is obtained, the system (6) of PDEs for the invariants can be written down and solved, yielding:

$$\frac{(M_{2,0}M_{0,0} - M_{1,0}^2)(M_{0,2}M_{0,0} - M_{0,1}^2) - (M_{1,1}M_{0,0} - M_{1,0}M_{0,1})^2}{M_{0,0}^6}$$

as affine invariant. As a side note, observe that the existence of this invariant is an example where the counting argument fails. Indeed, the counting argument predicts no non-trivial invariants for this measurement set: six independent measurements minus six independent group parameters gives 0 independent invariants. But recall that this counting argument is just a rule of thumb, and that the correct number of independent invariants actually equals the number of independent measurements minus the dimension of the orbit. The affine group admits 5D orbits in the 6D space of measurement vectors $m = (M_{0,0}, M_{1,0}, M_{0,1}, M_{2,0}, M_{1,1}, M_{0,2})^T$, thus yielding the existence of 1 independent invariant, which is the one given above.

Encouraged by this success, one may then want to go for projective invariant moments. In that case, the image coordinates transform as in equation (21) and the corresponding new values of the moments are as in equation (20). Observing that:

$$\left.\frac{\partial M'_{p,q}}{\partial p_{31}}\right|_{P=I} = -(p + q + 3)\,M_{p+1,q}$$

it is clear that the presence of a moment of order $p + q$ forces the measurement set to also contain a moment of order $p + q + 1$ if it is to be closed under projective transformations of the image. This shows that a finite set of moments cannot be closed under the action of the projective group, because, selecting any integer value for $p$, moments with ever larger $p$ will keep on being generated. Symmetry considerations make clear that a similar problem arises with the choice of $q$ when the derivative with respect to $p_{32}$ is taken. Hence, projective invariant moments do not exist. This tallies with observations made recently by Åström21.

Semi-differential invariants

If the dimension of the orbits gets large, it may be difficult to find invariants that can be computed robustly. Indeed, the higher this dimension gets, the more measurements have to be combined and, hence, the higher the expected complexity of the invariants. Suppose the task is to recognize planar curves from distances appreciably larger than the size of the objects. Different views of a curve will differ by an affine transformation in the plane. This yields a 6D transformation group if point coordinates are used as measurements, and at least four points have to be combined. If only coordinate derivatives at a single point are used, derivatives up to the fourth derivative were shown to be required for affine curvature, which is in fact only a relative invariant under affine transformations. The latter invariant is prone to noise, whereas finding the same combination of four points in different views can turn out to be a hard problem as well. The semi-differential framework14,15,19 offers a way out, in that it combines low order derivatives in different points. This is a way to keep both the number of points and the order of the derivatives low, and yet have sufficient measurements to arrive at invariants. As an example, suppose a curve segment has been found by an edge detector (see Figure 5).
The endpoints of the segment are not fixed, in the sense that this is just a portion of a curve that happens to have been extracted successfully from a particular view. Now suppose that two additional points $x_1$ and $x_2$ also are identified. These are supposed to be rigidly connected to the curve, i.e. the points and the curve undergo the same affine transformation when changing viewpoint. In this case, it would be difficult to pinpoint a third reference point. But a point belonging to the curve can bring in more measurements: its coordinates and their derivatives.

Figure 5 Segment of a curve and two rigidly attached reference points

As an example:

$$\frac{|x - x_1 \;\; x^{(1)}|}{|x - x_2 \;\; x^{(1)}|}$$

yields an invariant for every point $x$ on the curve. Note that it is an invariant under both the affine transformations and reparametrization. Since this expression only uses first order derivatives, it can be computed with sufficient robustness. More on the use of such invariants for plane affine and projective transformations is found elsewhere5,14,15.
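The double invariance of this ratio of determinants is easy to verify in exact arithmetic. The following sketch uses made-up data: the curve $(t, t^3)$, the reference points, the affine map and the reparametrization factor are all illustrative assumptions, and the function names are hypothetical.

```python
from fractions import Fraction as F

def det2(u, v):
    """Determinant of the 2 x 2 matrix with columns u and v."""
    return u[0] * v[1] - u[1] * v[0]

def sub(p, q):
    return (p[0] - q[0], p[1] - q[1])

def semi_differential_invariant(x, dx, x1, x2):
    """|x - x1, x'| / |x - x2, x'| for a curve point x with first derivative dx."""
    return det2(sub(x, x1), dx) / det2(sub(x, x2), dx)

def affine_point(A, b, p):
    return (A[0][0] * p[0] + A[0][1] * p[1] + b[0],
            A[1][0] * p[0] + A[1][1] * p[1] + b[1])

def affine_vector(A, v):
    """Derivatives transform with the linear part only."""
    return (A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1])

# Illustrative curve c(t) = (t, t^3) sampled at t = 1/2, with c'(t) = (1, 3 t^2).
t = F(1, 2)
x, dx = (t, t**3), (F(1), 3 * t**2)
x1, x2 = (F(2), F(-1)), (F(-1), F(3))   # rigidly attached reference points

original = semi_differential_invariant(x, dx, x1, x2)

# Affine change of viewpoint (any invertible A works; this one has determinant 5).
A, b = [[2, 1], [1, 3]], (4, -7)
after_affine = semi_differential_invariant(
    affine_point(A, b, x), affine_vector(A, dx),
    affine_point(A, b, x1), affine_point(A, b, x2))

# Reparametrization multiplies the first derivative by a scalar factor.
factor = F(7, 4)
after_reparam = semi_differential_invariant(
    x, (factor * dx[0], factor * dx[1]), x1, x2)
```

Both determinants pick up the same factor (the determinant of the linear part, respectively the reparametrization factor), which is why the ratio stays fixed.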

Problem 3: Disconnected Lie groups

Lie's theory for the calculation of invariants assumes that each group element (transformation) can be obtained by composing elements that are close to the identity. In mathematical terms, this means that the Lie group is connected. More precisely, a Lie group is connected if any two group elements can be joined by a smooth curve lying entirely in the group. The condition of connectedness is crucial to the calculation of invariants in the sense that it allows one to restrict the attention to the initial rate of change of a function along the orbit directions, thus resulting in PDEs for the invariants. Many Lie groups, however, are not connected. An interesting example is the transformation group that describes skewed symmetry. Skewed symmetry has to be understood here as the orthographic projection of a mirror symmetric contour, looked at obliquely. The transformation between symmetrically positioned points is of the form:

where $A$ is an invertible 2 × 2 matrix that embodies the effects of rotation in the object plane and projection. If the coefficients of $A$ are unknown, composition generates arbitrary affine transformations with the linear part having determinant $\pm 1$. So the relevant Lie group, in this case, consists of all affine transformations of the form $x' = Ax + b$ with $\det A = \pm 1$. This is a 5D Lie group, since the six parameters of the affine group are constrained to satisfy this additional relation. The group has two connected components: one consisting of all transformations with $\det A = +1$ (the unimodular group), and the other formed by the transformations with determinant $-1$. Obviously, the second component cannot be excluded, because the symmetry transformation (22) itself belongs to it. As the identity transformation belongs to the other component (the unimodular group), one can expect to get invariants for this component only. Indeed, there is no way one could generate a transformation with negative determinant through a combination of transformations that are close to the identity, since all of these have positive determinants. Sooner or later one has to perform a reflection to get a transformation with negative determinant; and this is a discontinuity in the process of gradual change. Since the unimodular group is 5D, the coordinates of three image points $x_1$, $x_2$ and $x_3$ suffice for the generation of an invariant. It is $|x_1 - x_3 \;\; x_2 - x_3|$. This expression changes sign under an affine transformation with determinant $-1$, and thus is not invariant for the entire group. Squaring the expression would solve the duality. In practice, however, one knows that the transformation must have determinant $-1$ if the structure looked at has a skewed mirror symmetry, and there is no real damage as far as the applicability goes.

That changing components can affect the invariants by more than a multiplication with a scalar is shown by the following example. Consider rigid motions of the form:

$$x' = x\cos(2\pi k/n) - y\sin(2\pi k/n) + t_1, \qquad y' = x\sin(2\pi k/n) + y\cos(2\pi k/n) + t_2 \qquad (23)$$

where $n$ is a positive integer, $k$ is an integer that runs from 0 to $n - 1$, and with $t_1$ and $t_2$ arbitrary real numbers. They constitute a 2D Lie group, with two continuously varying real parameters (viz. $t_1$ and $t_2$) involved in its description. Moreover, all transformations with the same $k$-value form a connected component of the group. So this Lie group has $n$ different connected components. The connected component containing the identity transformation is formed by pure translations. Applying the Lie approach therefore yields translation invariants. In particular, taking point coordinates $(x_i, y_i)$ ($i = 1, 2$) as measurements, two independent invariants are found: $x_1 - x_2$ and $y_1 - y_2$. The motion (23) transforms $x_1 - x_2$ into $(x_1 - x_2)\cos(2\pi k/n) - (y_1 - y_2)\sin(2\pi k/n)$.

This example is also illustrative in another way. Whereas the invariants found by the procedure described earlier may change value when applying an element belonging to another connected group component, they do have the same value for all transformations belonging to the same component: the expression is the same for all motions with the same $k$-value. Consequently, one only has to see how the invariant changes under one transformation of the given component. In the motion example, we only have to investigate how the invariant(s) change when applying a rotation over an angle $2\pi k/n$; and in the skewed symmetry case, one only has to check the effect of applying a reflection (e.g. with respect to the x-axis). This is explained as follows: in Lie group theory it is proved that the connected component $G_0$ of a Lie group $G$ that contains the identity element is a normal (Lie) subgroup of $G$ (with the same dimension, i.e. $\dim G_0 = \dim G$). In particular, this means that the connected component of $G$ passing through a particular element $g \in G$ is just the coset $G_0 g = \{g_0 g \mid g_0 \in G_0\}$ of $g$. For instance, in the rigid motion example, $G_0$ consists of all pure translations, and the component containing the transformation (23) is obtained by first applying a pure rotation over an angle $2\pi k/n$ and subsequently performing pure translations.

Turning back to the invariants, the procedure for their extraction yields invariants for the connected component $G_0$ only. Let $f$ be such an invariant. Then $f(g_0 \cdot m) = f(m)$ for all $g_0 \in G_0$. Now let $g$ be an arbitrary element of the Lie group $G$. The connected component of $G$ passing through $g$ consists of all group elements of the form $g_0 g$ where $g_0$ runs over all elements of $G_0$. But $f((g_0 g) \cdot m) = f(g_0 \cdot (g \cdot m)) = f(g \cdot m)$ for all $g_0 \in G_0$, which proves that $f$ takes the same value (viz. $f(g \cdot m)$) for all group elements belonging to the same connected component as $g$. This property can be used to turn an invariant for $G_0$ into one for $G$. One also proves in Lie group theory that a Lie group can only have a discrete number of connected components. In vision applications, almost all Lie groups have a finite number of connected components. In that case, one can choose a particular element in each different component. Let $\{g_0, g_1, \ldots, g_n\}$ (with $g_0 = e$, the identity element of $G$, taken as the representative of $G_0$) be such a set of elements for $G$. An invariant $f(m)$ of $G_0$ takes the following values under the action of $G$: $f(g_0 \cdot m) = f(m), f(g_1 \cdot m), \ldots, f(g_n \cdot m)$. Let $\bar{f}(m)$ be their product:

$$\bar{f}(m) = f(m)\, f(g_1 \cdot m) \cdots f(g_n \cdot m)$$

This $\bar{f}$ is an invariant for the whole of $G$. Indeed, let $g$ be an arbitrary element of $G$. Then:

$$\bar{f}(g \cdot m) = \prod_{i=0}^{n} f(g_i \cdot (g \cdot m)) = \prod_{i=0}^{n} f((g_i g) \cdot m)$$

As $G_0$ is a normal subgroup of $G$, right multiplication with $g$ permutes the cosets of $G_0$ in $G$. Hence:

$$\bar{f}(g \cdot m) = \prod_{i=0}^{n} f((g_i g) \cdot m) = \prod_{i=0}^{n} f(g_i \cdot m) = \bar{f}(m)$$

thus proving that $\bar{f}$ indeed is an invariant for $G$. In the same way, one proves that any symmetric function of these values is an invariant for $G$. In particular, this holds for the sum $\tilde{f}(m) = f(m) + f(g_1 \cdot m) + \cdots + f(g_n \cdot m)$. Quite often, $\bar{f}$ and $\tilde{f}$ will be independent invariants of $G$.
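The construction can be tried out on the motion group (23). The sketch below fixes $n = 4$ so that all rotations have integer entries and the check is exact; the point pair, the test motion and all names are illustrative assumptions. In this particular example the plain sum of the values vanishes identically, so the sum of squares is used as a second symmetric function.

```python
def rotate_quarter(p, k):
    """Rotate the point p over k * 90 degrees about the origin (n = 4 in (23))."""
    x, y = p
    for _ in range(k % 4):
        x, y = -y, x
    return (x, y)

def motion(k, t):
    """Group element of (23) with n = 4: rotation over k * 90 degrees, then translation t."""
    def g(p):
        x, y = rotate_quarter(p, k)
        return (x + t[0], y + t[1])
    return g

def act(g, m):
    """Apply the motion g to every point of the measurement tuple m."""
    return tuple(g(p) for p in m)

def f(m):
    """Invariant of the identity component (pure translations): x1 - x2."""
    (x1, _), (x2, _) = m
    return x1 - x2

# One representative per connected component: the pure rotations.
REPRESENTATIVES = [motion(k, (0, 0)) for k in range(4)]

def product_invariant(m):
    """Product of f over the coset representatives."""
    out = 1
    for g in REPRESENTATIVES:
        out *= f(act(g, m))
    return out

def square_sum_invariant(m):
    """Another symmetric function of the same values: the sum of their squares."""
    return sum(f(act(g, m)) ** 2 for g in REPRESENTATIVES)

m = ((5, 1), (2, 7))              # two image points (illustrative values)
g = motion(3, (10, -4))           # an element outside the identity component
moved = act(g, m)
```

Here `f` changes from 3 to -6 under `g`, whereas `product_invariant` and `square_sum_invariant` keep their values, because `g` merely permutes the cosets over which they are formed.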

Problem 4: Actions that are not semi-regular It was noted that the counting argument gives a conservative estimate of the number of measurements needed to generate invariants. The affine moments example shows that the actual number can be lower. The underlying reason is that the orbit actually is of strictly lower dimension than the group. But the counting argument also is conservative in another way:


it presumes that all the orbits have the same dimension (viz. all equal to the dimension of the group). This is not always the case. For instance, for the action of the rotation group $SO(2)$ on the plane, circles centred at the origin are 1D orbits, but the origin itself also is an orbit in its own right and has dimension 0. Actions for which all the orbits have the same dimension are called semi-regular. So the counting rule presumes that the action is semi-regular.

This condition did not affect the foregoing analysis all that much, since orbits with lower dimension were discarded. Mathematically, this was allowed since the resulting set (i.e. action space minus the lower dimensional orbits) still had a manifold structure. Sometimes this operation can be reversed: delete the orbits of maximal dimension and see whether the remaining subset can be considered as an action space in its own right. For instance, the PDEs for affine invariants based on three points $\mathbf{x}_1 = (x_1, y_1)$, $\mathbf{x}_2 = (x_2, y_2)$ and $\mathbf{x}_3 = (x_3, y_3)$ are:

$$
\begin{pmatrix}
x_1 & 0 & x_2 & 0 & x_3 & 0 \\
y_1 & 0 & y_2 & 0 & y_3 & 0 \\
0 & x_1 & 0 & x_2 & 0 & x_3 \\
0 & y_1 & 0 & y_2 & 0 & y_3 \\
1 & 0 & 1 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 1
\end{pmatrix}
\begin{pmatrix}
\partial f / \partial x_1 \\ \partial f / \partial y_1 \\ \partial f / \partial x_2 \\ \partial f / \partial y_2 \\ \partial f / \partial x_3 \\ \partial f / \partial y_3
\end{pmatrix}
=
\begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}
\tag{24}
$$

where the equations correspond to the parameters $a_{11}$, $a_{12}$, $a_{21}$, $a_{22}$, $b_1$, $b_2$, respectively (cf. equation (11)). Recall that the dimension of the orbit passing through the measurement vector $m = (x_1, y_1, x_2, y_2, x_3, y_3)^T$ is given by the rank of the system, evaluated at $m$. When the points $\mathbf{x}_1$, $\mathbf{x}_2$ and $\mathbf{x}_3$ are in general position (i.e. they do not satisfy any particular constraint), then this system clearly has rank 6, and no invariants are found, as predicted by the counting argument. But the rank of system (24) is strictly less than 6 if (and only if) the coefficient matrix has zero determinant. Up to sign, this determinant is:

$$
D = \begin{vmatrix} x_1 & x_2 & x_3 \\ y_1 & y_2 & y_3 \\ 1 & 1 & 1 \end{vmatrix}^2
$$

Thus the rank of the system drops when $D = 0$, i.e. if the points $\mathbf{x}_1$, $\mathbf{x}_2$ and $\mathbf{x}_3$ are collinear. As affine transformations map collinear points to collinear points, the set $M$ of measurements satisfying the collinearity constraint $D = 0$ is transformed into itself by the affine transformations, and thus can be considered as an action space in its own right. $M$ itself is just the set that is obtained by deleting from the original measurement space $\mathbb{R}^6$ all the orbits with maximal dimension (i.e. the orbits of dimension 6). Note that $M$ is 5-dimensional (6 parameters minus 1 constraint). If the points $\mathbf{x}_1$, $\mathbf{x}_2$ and $\mathbf{x}_3$ are collinear, the rank of the system reduces from 6 to 4. An action with 4D orbits in a 5D measurement space yields one independent invariant. This invariant is the well-known ratio of distances between three points on a line.

For another example, consider Figure 6. It can be shown that:

$$\frac{R_1 L_1^3}{R_2 L_2^3}$$

with $R_1$ and $R_2$ the radii of the osculating circles, is an invariant under plane projective transformations. Note how a rather simple combination of basic Euclidean concepts yields a projective invariant [15,22]. This configuration can be constrained by taking for the intersection a point where the two curves are tangent (see Figure 7). As a consequence, $L_1 = L_2$ for any possible second point, and the above invariant simplifies to:

$$\frac{R_1}{R_2}$$

Figure 6 Special configuration: two points, one of which is the intersection of two curves. A projective invariant can be built using the curvatures of the two curves at the intersection

Figure 7 Special configuration: a specialization of the configuration in Figure 6, where the two curves (different line styles for clarity) are tangent at the intersection

Tangency is preserved under plane projective transformations. Hence, the ratio of curvatures at a point where two curves are tangent is a projective invariant. Notice how little information this invariant is built of: it merely combines first and second order derivatives at the tangent point for the two curves. These eight measurements suffice to extract an invariant immune to both the eight independent projective group parameters and the four additional reparametrization parameters. Such special configurations can be identified systematically by looking for conditions that lower the rank of the systems of PDEs.
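The rank argument for the three-point affine example in this section can be checked with exact arithmetic. The sketch below assumes the affine map is written as $x' = a_{11} x + a_{12} y + b_1$, $y' = a_{21} x + a_{22} y + b_2$ (equation (11) is not reproduced here, so this convention is an assumption), builds the coefficient matrix of system (24), and compares points in general position with collinear points. The function names and the sample points are illustrative choices.

```python
from fractions import Fraction

def coefficient_matrix(points):
    """6x6 coefficient matrix of the affine PDE system for three points.

    Rows follow the parameters a11, a12, a21, a22, b1, b2; columns follow
    the measurements x1, y1, x2, y2, x3, y3.
    """
    rows = [[], [], [], [], [], []]
    for x, y in points:
        x, y = Fraction(x), Fraction(y)
        rows[0] += [x, 0]  # a11 generates x d/dx
        rows[1] += [y, 0]  # a12 generates y d/dx
        rows[2] += [0, x]  # a21 generates x d/dy
        rows[3] += [0, y]  # a22 generates y d/dy
        rows[4] += [1, 0]  # b1 generates d/dx
        rows[5] += [0, 1]  # b2 generates d/dy
    return rows

def rank(matrix):
    """Exact rank by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def affine(point, a, b):
    """Apply x' = a11 x + a12 y + b1, y' = a21 x + a22 y + b2."""
    x, y = point
    return (a[0][0] * x + a[0][1] * y + b[0],
            a[1][0] * x + a[1][1] * y + b[1])

def squared_distance_ratio(p1, p2, p3):
    """|p2 - p1|^2 / |p3 - p1|^2, kept squared to stay in exact arithmetic."""
    d12 = (p2[0] - p1[0]) ** 2 + (p2[1] - p1[1]) ** 2
    d13 = (p3[0] - p1[0]) ** 2 + (p3[1] - p1[1]) ** 2
    return Fraction(d12, d13)

general = [(0, 0), (1, 0), (0, 1)]
collinear = [(0, 1), (1, 3), (3, 7)]  # all on the line y = 2x + 1

print(rank(coefficient_matrix(general)))    # orbit dimension in general position
print(rank(coefficient_matrix(collinear)))  # orbit dimension for collinear points

a, b = [[2, 1], [-1, 3]], (4, -5)           # an arbitrary non-singular affine map
mapped = [affine(p, a, b) for p in collinear]
print(squared_distance_ratio(*collinear), squared_distance_ratio(*mapped))
```

Working with `Fraction` avoids having to pick a numerical tolerance for the rank, which matters here because the whole argument hinges on the rank dropping exactly on the collinear set.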

PRACTICAL EXAMPLES

This section gives some practical examples of the use of invariants in computer vision. A first example describes the recognition of planar shapes viewed under perspective conditions. This example highlights the use of the semi-differential approach succinctly outlined in one of the previous sections. A second example illustrates the potential of uncalibrated stereo for solving tasks that seem to call for Euclidean three-dimensional reconstruction. This example shows the importance of identifying the smallest group possible.

Example 1: Planar shape recognition

As a first application, consider the scene of Figure 8. It contains two spanners out of a larger set of planar objects to be recognized. The viewing direction is very oblique, and hence the shapes suffer from serious perspective distortions. After a Canny based edge detection step, the longest edge fragments are selected for further processing (Figure 9). A gap filling and edge linking step is used to enhance the available edges. To deal with the overlap in such scenes, it is necessary to base the recognition on localized segments of the contours [1,5,23]. In this case, segments are taken to lie between subsequent bitangent points (i.e. points that have the same tangent) or between an inflection and a point where a line through the inflection touches the curve. Examples of such segments are shown in Figure 10. Two contours are drawn superimposed on the image. Each contour is accompanied by a straight line. Both lines are drawn between the two points that demarcate the segments used in this example. For the left contour, the segment is one between two bitangents. The segment for the contour on the right lies between an inflection and the point where a line through the inflection touches the contour.

In order to generate a projectively invariant description of a segment, the knowledge of four reference points suffices. Indeed, any fifth point can be given invariant projective coordinates with respect to the four reference points [23,25] (2 + 8 measurements versus 8 independent group parameters leaves room for two invariants). Taking the end points of the segments as references ($e_1$ and $e_2$, say), two more points should be identified. Different methods to find additional points are available [23,24]. Here, the intersection of the tangent lines at the inflections between the end points is used as a third point. This point will be referred to as $c$. The fourth point is found using the semi-differential approach, i.e. by combining information on point coordinates and their derivatives.

Figure 8 Image containing oblique views of spanners to be recognized

Figure 9 Longest edges in Figure 8

Figure 10 Accepted matches are superimposed on the original image
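The statement that a fifth point acquires invariant projective coordinates once four reference points are fixed can be illustrated as follows. This is a minimal sketch, assuming that no three of the four reference points are collinear; the canonical target coordinates, the sample points, the test homography `G` and all function names are illustrative choices, not the authors' implementation.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def inverse3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    d = det3(m)
    def minor(r, c):
        rows = [i for i in range(3) if i != r]
        cols = [j for j in range(3) if j != c]
        return (m[rows[0]][cols[0]] * m[rows[1]][cols[1]]
                - m[rows[0]][cols[1]] * m[rows[1]][cols[0]])
    return [[(-1) ** (i + j) * minor(j, i) / d for j in range(3)]
            for i in range(3)]

def matmul3(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def basis_matrix(p1, p2, p3, p4):
    """Matrix sending (1,0,0), (0,1,0), (0,0,1), (1,1,1) to p1..p4 (homogeneous)."""
    cols = [(x, y, 1.0) for x, y in (p1, p2, p3)]
    m = [[cols[j][i] for j in range(3)] for i in range(3)]  # points as columns
    inv = inverse3(m)
    target = (p4[0], p4[1], 1.0)
    lam = [sum(inv[i][k] * target[k] for k in range(3)) for i in range(3)]
    return [[m[i][j] * lam[j] for j in range(3)] for i in range(3)]

def apply_h(h, point):
    """Apply a 3x3 homography to a 2D point."""
    x, y = point
    u, v, w = (h[i][0] * x + h[i][1] * y + h[i][2] for i in range(3))
    return (u / w, v / w)

# Illustrative canonical positions for the four reference points.
CANONICAL = [(-1.0, 0.0), (1.0, 0.0), (0.0, 3.0), (0.0, 1.0)]

def projective_coordinates(point, reference, canonical=CANONICAL):
    """Coordinates of `point` in the frame fixed by four reference points."""
    h = matmul3(basis_matrix(*canonical), inverse3(basis_matrix(*reference)))
    return apply_h(h, point)

reference = [(0.0, 0.0), (4.0, 0.0), (2.0, 5.0), (2.0, 1.5)]
fifth = (1.0, 1.0)
G = [[1.0, 0.2, 3.0], [-0.1, 0.9, 1.0], [0.001, 0.002, 1.0]]  # arbitrary view change

before = projective_coordinates(fifth, reference)
after = projective_coordinates(apply_h(G, fifth),
                               [apply_h(G, q) for q in reference])
print(before, after)  # the two coordinate pairs agree up to rounding
```

Applying the same function to every point of a contour segment yields a description of that segment in the canonical frame, which is what makes descriptions from different views comparable.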

To find the fourth reference point, an invariant parameterization of the segment between the inflections is generated, using the semi-differential parameter:

$$\int \frac{\left| \, x(t) - c \quad x^{(1)}(t) \, \right|}{\left| \, x(t) - e_1 \quad x(t) - e_2 \, \right|^{2}} \, dt$$

In this expression, $t$ stands for an arbitrary contour parameter as usual (it could, for instance, be Euclidean arclength), $x$ is the column vector containing the image coordinates of a point that can be considered to slide along the contour, and $x^{(1)}$ denotes the column vector of the first derivatives of the sliding point's coordinates with respect to $t$, as before. This parameter is not invariant as such, but its values in different views differ only by a constant factor. Normalizing the 'length' between the inflections to 1 lifts this caveat. At the point where the parameter reaches the value 1/2 the fourth reference point is found, denoted by $h$. Note that the numerator tends to zero close to the inflections. Indeed, $x - c$ and $x^{(1)}$ are almost parallel near the inflections. If the inflections are misplaced by a reasonable amount, the change in parameter length of the segment between the inflections will be very limited. The scheme is sensitive to the correctness of the tangent lines at the inflections rather than the positions of the inflections themselves.

Finally, the four reference points $e_1$, $e_2$, $c$ and $h$ are used to fix a projective reference frame. The invariant descriptions are generated by first giving the four reference points predefined coordinates. In this case, the following choices were made: in both examples $e_1(-1, 0)$, $e_2(1, 0)$ and $h(0, 1)$. The choice for $c$ was made dependent on whether this intersection was lying on the same side of the straight line as the contour segment (as for the left segment) or on the other side (as for the segment on the right). In the former case, the choice is $c(0, 3)$, for the latter $c(0, -1)$. The points of the contour segments are then given their invariant, projective coordinates with respect to this basis. The resulting invariant descriptions for the segments are shown in Figure 11, with the model invariant descriptions superimposed in solid lines. As can be seen, the model and image descriptions do not perfectly agree.
An important factor contributing to the differences is the thickness of the objects. As a result, the edges in the image partially belong to the top plane and partially to the bottom plane. This is a deviation from the intended case of purely planar shapes. Nevertheless, as Figure 11 shows, the differences between correct pairs of model and image segments are clearly much smaller than those between the descriptions of different segments. The descriptions remain useful for recognition. Once matching segments have been found, the complete shape model can be superimposed on the image for detailed matching. These are the larger contours shown in Figure 10, and they are seen to be in good agreement already.

Figure 11 Model and image segments shown in their canonical frames, with A the result for the segment indicated in Figure 10 of the spanner on the left and B the segment on the right

Figure 12 PCB with the points used in the analysis indicated

Example 2: 3D inspection via constrained stereo

Camera calibration often is a painstaking necessity. It may impose interruptions in production, the need for hiring trained personnel by the user of the inspection system, or increased expenditure for field services by the vision supplier. Complete camera calibration is not always necessary, however, even where metric quantities such as lengths or areas have to be inspected. This section shows such an example of inspection with incomplete calibration. Consider Figure 12. The task is to inspect whether the leftmost chip with a white label is well inserted. Figure 13 gives a detailed view of a case with a defect. Basically, the task is to check that the top plane of the chip is at an appropriate height above the PCB plane. This can be achieved with a camera with unknown pixel size and focal length and from a wide range of relative positions of the PCBs. If the PCBs pass by on a conveyor that translates along a fixed direction parallel to the image plane, then four pairs of corresponding points in two images suffice if the translation distance between the views is always the same. These data allow the elevation of the chip above the PCB to be assessed. If the value is too large, then the insertion is wanting. The feature used is a determinant type expression that is a function of the coordinates of the four points and that is extracted from the two images. Writing the

coordinates of a point $\mathbf{x}_i$ as $(x_{1i}, y_{1i})^T$ and $(x_{2i}, y_{2i})^T$ in the first and the second image, respectively, the expression takes the form:

$$
\frac{\begin{vmatrix}
x_{11} & x_{12} & x_{13} & x_{14} \\
y_{11} & y_{12} & y_{13} & y_{14} \\
x_{21} & x_{22} & x_{23} & x_{24} \\
1 & 1 & 1 & 1
\end{vmatrix}}{(x_{11} - x_{21})(x_{12} - x_{22})(x_{13} - x_{23})(x_{14} - x_{24})}
$$

When one uses this feature, one must not have a conveyor motion parallel to the pixel columns (y-direction), since the denominator would vanish. Its value is proportional to the volume of the tetrahedron formed by the four points. As long as the tetrahedron doesn't change, the feature is invariant, irrespective of the relative position and orientation of the PCB on the conveyor. Referring back to Figure 12, consider the points 1, 2, 4 and 6. Note how the points 1, 2 and 4 determine the PCB plane and 6 determines the elevation of the chip. The volume of the tetrahedron depends linearly on that elevation, since this is the only variable.

The fact that images are taken before and after translation over a fixed distance and parallel to the image is crucial. Normally, for an arbitrary translation and using a camera with unknown pixel size and focal length, the resulting transformation group capturing both geometrical and parameter variabilities would be 12-dimensional (3D affine transformations) [26]. Constraining the translation in the aforementioned way reduces the group to the 11-dimensional group of spatial unimodular transformations. Each pair of corresponding points yields three independent coordinates $(x_{1i}, x_{2i}, y_{1i} = y_{2i})$ as measurements. Hence, a minimum of four points is needed to generate a first invariant (4 × 3 = 12 measurements against 11 group parameters). This is one point less than required if the smallest group were 12D! This shows the importance of identifying the smallest group that captures all the variability and the advantages that mild constraints on generality might bring.

Figures 14 and 15 each show two pairs of stereo views for inspection, with correct insertion and insufficient insertion, respectively. The chip sticks out to exactly the same degree for the two stereo pairs of Figure 15. In fact, the pairs within a single figure were taken of the same board, with different orientations of the PCB. The values of the determinant feature for different choices of four points are shown in Table 2. The numbers referred to in the first column correspond to those given in Figure 12. The entries of the second and third columns on the one hand, and the fourth and fifth columns on the other, should theoretically be identical. However, the different degree of insertion of the chip is expected to show up as a difference between the entries in columns 2 and 3 and those in columns 4 and 5. This expectation is corroborated by the experiment.

Not all choices for the four points seem equally appropriate, however. To reduce the effect of noise, it is desirable to select four points that yield a maximum volume for the corresponding tetrahedron. Also, the variation to be measured (i.e. the position of point 6) should result in as large a change of the volume as possible. This is the case if point 6 moves perpendicularly to the plane defined by the other three points. Upon inspection of Figure 12, 1-2-4-6 would make a good choice, whereas 3-4-5-6 seems less appropriate, as borne out by the results in the table.

An actual inspection job would then proceed as follows. The measurements from one or several stereo pairs of flawless PCBs are used to estimate the desired values for the determinant. In the example, one might select the first column of values (Figure 14 top) as a reference. Then, a stereo pair of views of the board to be inspected is taken, and the invariant is calculated. If the measured value is too far off the reference values (as for the two Figure 15 columns of the table), a defect is reported. Note that no absolute lengths or distances are calculated and that the focal length and pixel size of the camera can be left unknown. Moreover, the determinant value is invariant under rotations and translations of the PCBs. Indeed, in the examples shown, the board is viewed from different directions without this variability disrupting the assessment. No special jigs, pallets or other positioning material are necessary. One might go as far as using views with the chip leads invisible, e.g. top views, although precision is expected to go down.

It only takes a fifth point to do away with the last restrictions: the need for a fixed translation parallel to the image. Similar invariants exist that allow inspection by dealing with the resulting 12-dimensional affine group, at the expense of higher computational complexity: the affine group absorbs the additional variability but requires more measurements for its invariants.

Figure 13 Detailed view of improperly inserted chip

Figure 14 Two pairs of stereo views (top and bottom pairs of figures) with the chip properly plugged in

Figure 15 Two pairs of stereo views (top and bottom pairs of figures) with insufficient insertion

Table 2 Values for the PCB inspection invariant

Selected Pts    Figure 14 top    Figure 14 bottom    Figure 15 top    Figure 15 bottom
1-2-4-6         6.26             6.25                7.12             7.08
1-2-5-6         4.82             4.97                5.68             5.15
1-4-5-6         1.65             1.76                2.27             2.26
3-4-5-6         1.86             1.83                1.96             1.98
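The behaviour of the determinant feature of Example 2 can be explored on synthetic data. The sketch below assumes an ideal pinhole camera whose scale factors and principal point (`fx`, `fy`, `cx`, `cy`) are arbitrary values never used by the feature itself, and a board that is shifted along the image x-direction over a fixed distance between the two views. The point coordinates, the rigid motion and all names are hypothetical; this is not the authors' experimental setup.

```python
import math

def det(matrix):
    """Determinant by Laplace expansion along the first row (fine for 4x4)."""
    if len(matrix) == 1:
        return matrix[0][0]
    total = 0.0
    for j in range(len(matrix)):
        minor = [row[:j] + row[j + 1:] for row in matrix[1:]]
        total += (-1) ** j * matrix[0][j] * det(minor)
    return total

def project(point, fx, fy, cx, cy):
    """Pinhole projection; the camera parameters stay unknown to the feature."""
    X, Y, Z = point
    return (fx * X / Z + cx, fy * Y / Z + cy)

def stereo_pair(points, shift, fx=800.0, fy=760.0, cx=320.0, cy=240.0):
    """Image four points before and after a translation `shift` along X."""
    first = [project(p, fx, fy, cx, cy) for p in points]
    second = [project((p[0] + shift, p[1], p[2]), fx, fy, cx, cy)
              for p in points]
    return first, second

def inspection_feature(first, second):
    """Determinant of (x1, y1, x2, 1) rows divided by the product of x-disparities."""
    rows = [[p[0] for p in first],
            [p[1] for p in first],
            [p[0] for p in second],
            [1.0] * len(first)]
    disparities = math.prod(a[0] - b[0] for a, b in zip(first, second))
    return det(rows) / disparities

def rigid_motion(points, about_z, about_x, offset):
    """Rotate about the Z axis, then about the X axis, then translate."""
    cz, sz = math.cos(about_z), math.sin(about_z)
    cx, sx = math.cos(about_x), math.sin(about_x)
    moved = []
    for X, Y, Z in points:
        X, Y = cz * X - sz * Y, sz * X + cz * Y
        Y, Z = cx * Y - sx * Z, sx * Y + cx * Z
        moved.append((X + offset[0], Y + offset[1], Z + offset[2]))
    return moved

# Three points in the board plane and one on top of the chip (hypothetical units).
board = [(0.0, 0.0, 10.0), (2.0, 0.0, 10.0), (0.0, 3.0, 10.0), (1.0, 1.0, 9.5)]
raised = board[:3] + [(1.0, 1.0, 9.0)]  # chip sticking out twice as far

reference = inspection_feature(*stereo_pair(board, 1.0))
repositioned = inspection_feature(
    *stereo_pair(rigid_motion(board, 0.4, 0.2, (0.5, -0.3, 1.0)), 1.0))
defective = inspection_feature(*stereo_pair(raised, 1.0))

print(reference, repositioned)  # same board, different pose
print(defective / reference)    # ratio of the two elevations
```

In an inspection setting the decision would then reduce to comparing the measured value against reference values from flawless boards; the acceptance threshold has to be chosen from the observed spread, such as that visible between the columns of Table 2.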

CONCLUSIONS

In the first part of this paper, the basic ideas underlying the Lie group approach to invariance were explained and illustrated using some simple examples, thus providing the reader with a working knowledge of invariants and their associated PDEs. The second part discusses a number of problems that tend to pop up when applying this mathematical theory to computer vision. Using illustrative examples taken from computer vision practice, we argue that these problems fall into two different categories. Firstly, there are the problems associated with turning a theoretical framework into a workable tool for the applications at hand (cf. Problems 1 and 2). Secondly, there are the problems faced when applying a mathematical model to a real-life application: one has to check whether or not the assumptions and prerequisites of the theory are met (cf. Problems 3 and 4). For each of the problems, specific solutions were suggested. Finally, the proposed framework was shown to offer solutions for practical problems.

Focusing on the difficulties may have created the impression that this approach is particularly tiresome and heavy to handle. It should be understood, however, that whatever theory is used, these two categories of problems have to be dealt with. Therefore, it may be useful to point out some of the advantages of the Lie theory approach (compared to the other invariance theories):

1. The Lie framework has a clear geometric interpretation, thus providing insight into the action of the groups on the measurements and the existence of invariants. This compares favourably to the algebraic methods, which yield invariants after largely formal and often rather opaque manipulations.
2. Lie theory admits a uniform treatment of the derivation of differential, semi-differential and algebraic invariants, applicable to algebraic and non-algebraic shapes, and hence provides a framework that not only unifies the procedure, but also allows the different approaches to be compared.
3. It is very well suited to make precise predictions about both the existence and the non-existence of invariants. It goes without saying that being able to show non-existence can save an enormous amount of search time.


Of course, some important issues remain to be studied further. One such issue not touched upon is completeness, i.e. the unambiguity of an invariant description. Although some theorems exist in this regard, the mathematical concepts and the unambiguous reconstruction of shape required in vision are not always equivalent. Furthermore, there is a great need for symbolic packages that help in generating, analysing and solving the systems of PDEs for the invariants. This would allow the possibilities of the theory to be exploited fully.

The Lie approach to invariance offers the necessary means to attack vision problems in a systematic manner. Taking 3D reconstruction as an example, it is like a huge matrix to be filled out. What can be seen from different numbers of views, with different levels of calibration and using different models for projection and different types of features? Many entries have been studied, but even more have to follow. A systematic analysis is needed to avoid duplicated work and missed opportunities.


ACKNOWLEDGEMENTS

This work was supported by ESPRIT Basic Research Action 6448 ‘VIVA’. Theo Moons and Eric Pauwels gratefully acknowledge the Post-doctoral Research grant supported by the Belgian National Fund for Scientific Research (NFWO). Part of this paper was written during a stay of the first author at the Isaac Newton Institute for Mathematical Sciences, Cambridge, UK.

REFERENCES

1 Bruckstein, A and Netravali, A ‘Differential invariants of planar curves and recognizing partially occluded shapes’, in Visual Form, Plenum Press, New York (1992) pp 89-98
2 Faugeras, O ‘What can be seen in three dimensions with an uncalibrated stereo rig’, Proc. 2nd Euro. Conf. Comput. Vision, Santa Margherita Ligure, Italy (1992) pp 563-578
3 Hartley, R ‘Euclidean reconstruction from uncalibrated views’, in J Mundy et al. (eds), Appl. Invariance in Comput. Vision, LNCS 825, Springer-Verlag, Berlin (1994) pp 237-256
4 Huttenlocher, D and Ullman, S ‘Object recognition using alignment’, Proc. Int. Conf. Comput. Vision (1987) pp 496-500
5 Kempenaers, P, Van Gool, L and Oosterlinck, A ‘Semi-differential invariants: algebraic-differential hybrids’, SPIE Proc. Hybrid Image and Signal Processing III, Orlando, FL (1992) pp 41-52
6 Koenderink, J and van Doorn, A ‘Affine structure from motion’, J. Opt. Soc. Am. A, Vol 8 (1991) pp 377-385
7 Lamdan, Y, Schwartz, J and Wolfson, H ‘On recognition of 3D objects from 2D images’, Proc. Int. Conf. Robotics Automation (1988) pp 1407-1413
8 Maybank, S ‘Classification based on the cross ratio’, in J Mundy et al. (eds), Appl. Invariance in Comput. Vision, LNCS 825, Springer-Verlag, Berlin (1994) pp 458-472
9 Mohr, R, Veillon, F and Quan, L ‘Relative 3D reconstruction using multiple uncalibrated images’, Proc. Conf. Comput. Vision Patt. Recogn., New York, NY (1993) pp 543-548
10 Moons, T, Van Gool, L, Van Diest, M and Pauwels, E ‘Affine reconstruction from perspective image pairs’, in J Mundy et al. (eds), Appl. Invariance in Comput. Vision, LNCS 825, Springer-Verlag, Berlin (1994) pp 297-316
11 Rothwell, C, Zisserman, A, Forsyth, D and Mundy, J ‘Using projective invariants for constant time library indexing in model based vision’, Proc. Br. Machine Vision Conf., Glasgow, UK (1991) pp 62-70
12 Shashua, A ‘On geometric and algebraic aspects of 3D affine and projective structures from perspective 2D views’, in J Mundy et al. (eds), Appl. Invariance in Comput. Vision, LNCS 825, Springer-Verlag, Berlin (1994) pp 127-144
13 Van Gool, L, Wagemans, J, Vandeneede, J and Oosterlinck, A ‘Similarity extraction and modelling’, Proc. Int. Conf. Comput. Vision, Osaka, Japan (1990) pp 530-534
14 Van Gool, L, Kempenaers, P and Oosterlinck, A ‘Recognition and semi-differential invariants’, Conf. Comput. Vision Patt. Recogn., Lahaina, HI, USA (1991) pp 454-460
15 Van Gool, L, Moons, T, Pauwels, E and Oosterlinck, A ‘Semi-differential invariants’, in Geometric Invariance in Computer Vision, MIT Press, Cambridge, MA (1992) pp 157-192
16 Van Gool, L, Brill, M, Barrett, E, Moons, T and Pauwels, E ‘Semi-differential invariants for nonplanar curves’, in Applications of Invariance in Vision, MIT Press, Cambridge, MA (1992) pp 293-309
17 Zisserman, A, Blake, A, Rothwell, C, Van Gool, L and Van Diest, M ‘Eliciting qualitative structure from image curve deformations’, Proc. Int. Conf. Comput. Vision (1993) pp 340-345
18 Olver, P Applications of Lie Groups to Differential Equations, Springer-Verlag, Berlin (1986)
19 Moons, T, Pauwels, E, Van Gool, L and Oosterlinck, A ‘Foundations of semi-differential invariants’, Int. J. Comput. Vision, Vol 14 (1995) pp 25-48
20 Weiss, I ‘Projective invariants of shapes’, Proc. Conf. Comput. Vision Patt. Recogn., Ann Arbor, MI (1988) pp 291-297
21 Astrom, K ‘Fundamental difficulties with projective normalization of planar curves’, IEEE Trans. PAMI, Vol 17 (1995) pp 77-87
22 Van Gool, L, Moons, T, Pauwels, E and Wagemans, J ‘Invariance from the Euclidean geometers perspective’, Perception (1994) pp 547-561
23 Zisserman, A, Forsyth, D, Mundy, J and Rothwell, C ‘Recognizing general curved objects efficiently’, in Applic. Invariance in Vision, MIT Press, Cambridge, MA (1992) pp 228-251
24 Van Diest, M, Van Gool, L, Moons, T and Pauwels, E ‘Projective invariants for planar contour recognition’, Proc. Euro. Conf. Comput. Vision, Stockholm, Sweden (1994) pp 527-534
25 Efimov, L Higher Geometry, MIR, Moscow (1980)
26 Van Gool, L, Moons, T, Proesmans, M and Van Diest, M ‘Affine reconstruction from perspective image pairs obtained by a translating camera’, Proc. Int. Conf. Patt. Recogn., Jerusalem, Israel (1994) pp 1.290-1.294