Pattern Recognition, Vol. 27, No. 10, pp. 1407-1414, 1994. Copyright © 1994 Pattern Recognition Society. Printed in Great Britain. All rights reserved. 0031-3203/94 $7.00+.00
AN ALGEBRAIC METHOD FOR DETECTION AND RECOGNITION OF POLYHEDRAL OBJECTS FROM A SINGLE IMAGE

G. MAROLA† and A. VACCARELLI‡

† Dipartimento di Ingegneria dell'Informazione, University of Pisa, Via Diotisalvi 2, 56126 Pisa, Italy
‡ Istituto di Elaborazione dell'Informazione, CNR, Via S. Maria 46, 56126 Pisa, Italy

(Received 23 April 1993; in revised form 24 February 1994; received for publication 20 April 1994)
Abstract--This paper proposes an algorithm to recognize rotated, scaled or overlapping polyhedral objects using only one orthographic projection. The procedure is not based on the numerical solution of nonlinear equations and does not require any a priori knowledge. It is an algebraic method for testing whether a cluster of edges and vertices in a line drawing corresponds to an orthographic view of a given 3D model. It has the advantage of being based on simple algebraic equations, which are easy to solve and do not require time-consuming numerical procedures.

Recognition, Polyhedral objects, Orthographic correspondence testing, Single 2D view, Gray-level image, Graph subisomorphism
1. INTRODUCTION

In computer vision the problem of the identification and localization of 3D objects from a single camera view has been widely considered, and many interesting contributions to its solution can be found in the literature. In most cases the proposed techniques are applicable to polyhedral objects and make use of the so-called model-based recognition paradigm. In the model-based approach, a set of geometrical models are stored in a database and then matched against features (vertices, edges, angles, etc.) extracted from a 2D image. For example, Kanade(1) tried to interpret the image line junctions as projections of trihedral vertices of the model. Horaud(2) uses an approach in which some image attributes are back-projected onto the 3D space using the geometric constraints of the perspective. Dhome et al.(3) compute the spatial location of a 3D object by solving an eighth-degree equation, and Kanatani(4) discusses various methods for estimating 3D object parameters from 2D image characteristics. An algorithm based on an alignment approach, which requires the solution of second-order equations, has been proposed by Huttenlocher and Ullman.(5) Kriegman and Ponce(6) use elimination theory for recognizing and locating curved 3D objects from their monocular image contours. Finally, Lowe(7) proposed a stabilized minimization approach for determining model and viewpoint parameters. In most cases, however, the proposed methods require a considerable computational burden due to the solution of nonlinear equations, which often results in difficulties: multiple solutions may exist, numerical computations may not converge, or the time required may be too long.
An alternative approach for solving the recognition problem consists of using graph theory. For example, Lie et al.(8) describe the object by attributed relational graphs, and an interpretation-tree search algorithm in 3D space is applied to the matching between the constructed relational graphs and those stored in the database. Interpretation-tree search algorithms have also been proposed and developed by Gaston and Lozano-Pérez,(9) Grimson and Lozano-Pérez(10) and Murray.(11) Recently, Wong(12) has carried out recognition by searching whether a 2D projection graph, constructed from the 2D projection of a 3D object, is isomorphic to a subgraph of the object's model graph. However, this approach is often impractical because, owing to the NP-complete nature of subgraph isomorphism, it is very unlikely that it can be solved by an algorithm running in polynomial time. In addition, a purely topological matching, using subgraph searching, must be followed by a projection of the 3D geometrical model onto the 2D plane for verification. This last step is not simple, because it is based on camera pose estimation, which may require complicated nonlinear optimization techniques (see, for example, Horaud et al.(13)).
The method presented in this paper combines a topological approach, in which local features of a 3D model are searched for in the image using subgraph isomorphism, with a quantitative test on the correspondences so found. The time complexity due to subgraph searching is reduced to acceptable values by using simple local features of the 3D model. The quantitative test, which in practice represents the main original contribution of this paper, is based on a simple algebraic procedure which does not require appreciable
computational efforts. An essential role in the proposed verification procedure is played by a set of scale factors that not only make the algorithm capable of tolerating overlapping of objects and inaccurate edge extraction, but also enable it to take moderate perspective effects into account. The paper is organized as follows. In Section 2 we give a brief overview of the whole procedure which must be used for image recognition. In Section 3 we describe an algebraic technique to be used for testing topological correspondences between model and image. Section 4 discusses the practical aspects of the method, including its time complexity. Finally, in Section 5, some experimental results, obtained using real scenes, are reported.
2. PRESENTATION OF THE METHOD
In this section we present the main features of our method and justify the assumptions that have been made. We have restricted our analysis to the case of polyhedral objects, which can appear in a single orthographic view in any position and orientation and can also be partially occluded. This type of representation is very popular because it permits a considerable compression of the data. However, it has the serious limitation of not being able to handle most commonly used objects, which have many curved surfaces. Fortunately, the polyhedral part of an object, if it exists, is in many cases sufficient to identify the object itself. Hence, a simple procedure capable of identifying a pattern built up from a limited number of interconnected edges (straight lines) is often sufficient to identify even complicated objects.

As is customary, we have assumed that a connected line drawing has been extracted from the scene using one of the efficient algorithms reported in the literature.(14-16) However, in order to treat real-world problems, we have not assumed that the line drawing is perfect: lines, or parts of them, may be missing, and extra lines caused by shadows and other disturbances may be present. Information on the line drawing of the input scene is stored effectively using a graph-based structural representation: the nodes of the graph correspond to the junctions of the line drawing and the arcs correspond to the edges linking them. The same representation is also used for the 3D models.

The procedure for recognizing a known object in a scene is performed in two steps. First we verify whether the model graph of an object can be included in the image graph; this corresponds to testing whether the model graph is isomorphic to one or more image subgraphs. In practice, bearing in mind that not all the vertices of a 3D polyhedral object are visible in a 2D view, it is convenient to use a library of subgraphs of the model, each containing a reduced number of nodes and representing a distinctive feature of the object itself. A minimal sketch of this representation and of the subgraph search is given below.
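As a concrete illustration of this graph-based representation and of the subgraph search (networkx is an assumed choice here, not part of the authors' implementation; the coordinates are purely illustrative):

```python
import networkx as nx

# Image graph: nodes are junctions of the line drawing, arcs are the straight
# edges linking them; 2D junction coordinates are stored as node attributes.
image_graph = nx.Graph()
image_graph.add_nodes_from([(1, {"xy": (12.0, 40.5)}),
                            (2, {"xy": (58.3, 44.1)}),
                            (3, {"xy": (60.2, 90.7)})])
image_graph.add_edges_from([(1, 2), (2, 3)])

# Model feature: a small subgraph (4-6 edges) from the model library, with the
# 3D coordinates of its vertices attached in the same way.
model_feature = nx.Graph()
model_feature.add_edges_from([("a", "b"), ("b", "c")])

# Candidate structural correspondences: subgraphs of the image graph isomorphic
# to the model feature (node-induced; a subgraph monomorphism test is a looser
# alternative when spurious extra edges are expected in the line drawing).
matcher = nx.algorithms.isomorphism.GraphMatcher(image_graph, model_feature)
candidates = list(matcher.subgraph_isomorphisms_iter())
```

Each candidate is then passed to the quantitative test described in Section 3.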
Once a subgraph isomorphic to the model graph has been found in the scene, we must verify whether the features of corresponding nodes can be matched. In other words, in this second step of the recognition procedure, we have to find the transformation, expressed by a rotation matrix, which orthographically projects the features of the model nodes onto the features of the image nodes. However, owing to the presence of overlapping and to imperfect edge extraction, some vertices of the considered model might not be visible in the image or might be lost in its line drawing. In such cases, even if there is a real correspondence between image and model, the features (coordinates) of corresponding nodes of isomorphic graphs could not be matched and the recognition procedure would fail.

In order to overcome the above difficulty we have introduced a set of multiplicative coefficients λi, one for each of the considered edges, as shown in Fig. 1.

Fig. 1. Ideal view and real view of a set of edges, with overlapping and inaccurate edge extraction.

As we can see from this figure, these generalized scale factors play the key role of interface between an ideal view of a given set of edges and the real scene, in which these same edges appear altered by overlapping, noise, inaccurate line drawing and also moderate perspective distortion. This seems to be a complication, because we have to find not only a rotation matrix R, but also this additional set Λ of unknown scale factors. However, as will be shown in the next section, the problem can be posed in the form of a set of linear equations and solved without difficulty both for R and for Λ. Moreover, using the identity RR^T = I (which, together with det(R) = 1, assures that R is a rotation matrix), the problem is solved in a natural way for models having 4, 5 or 6 edges (a greater number of edges would require quite involved testing procedures).

At first glance, the use of a limited set of 4, 5 or 6 edges as a distinctive feature of a model appears to limit the number of distinguishable objects drastically and to imply that the recognition method would not be able to work with a large database; in fact, the number of features required to represent objects in a database normally increases with the dimensions of the database itself. However, the proposed method combines a qualitative approach (graph isomorphism) with a quantitative (algebraic) testing procedure, based on a set of scale factors. It is thus not only able to distinguish
objects having different structural properties, but also to discriminate among objects having the same structure but different sizes. This allows us, in many cases, to overcome the limits imposed by a small database. As a last remark, we note that, even if the method is based on the assumption of orthographic views, the use of scale factors enables the procedure to work correctly even in the presence of a moderate distortion caused by the perspective.

3. THE VERIFICATION ALGORITHM

In this section, we solve the problem of testing whether an image subgraph, isomorphic to a model graph, represents an orthographic view of the model. Note that, even if this testing procedure should be considered as the final step of a recognition method, we discuss it first for one main reason: the testing procedure developed here suggests the use of some preferred numbers of edges as distinctive features of a 3D model, so that it also influences the subgraph searching itself. Thus we first describe the testing method and then, in the next section, we show how it can be integrated into a complete procedure for pattern recognition.

Let us represent N edges of the considered model by a set of vectors vi = (xi, yi, zi), i = 1, ..., N, centred at the origin of the reference frame x, y, z, as shown in Fig. 2. Moreover, we use the same description for the isomorphic subset of N edges of the line drawing of the image, by introducing a set of vectors wi = (ξi, φi), i = 1, ..., N, centred at the origin of the two-dimensional reference frame ξ, φ. Clearly, in this way, any knowledge of how the edges are connected together is lost, but this information has already been used while searching for isomorphism and is no longer needed.

Fig. 2. The representation of a 3D set of edges by means of a cluster of vectors centred at the origin of the reference frame.

If the set of vectors wi is obtained by means of a rotation and an orthographic projection of the 3D vectors vi, this correspondence can be established via the relationship

R'V = W    (1)

where R' is a [2 x 3] rotation matrix (the first two rows of a [3 x 3] rotation matrix), V = [v1, v2, ..., vN] is a [3 x N] matrix representing the model edges and W = [w1, w2, ..., wN] is a [2 x N] matrix representing the selected edges in the image frame. We note here that in the rotation matrix R' the dependence on the distance is disregarded. This approximation is valid if the distance of the imaging device from the objects is large compared with their size, and is equivalent to considering an orthographic projection instead of a perspective one. In most applications this is not a real limitation, as the condition is often satisfied.

Equation (1) holds only in the ideal case in which all projected edges are completely visible and the edge-extracting algorithm has worked without errors or imprecisions. In practice some edges in the image may be partially missing due to overlapping of objects, presence of shadows, imperfect line drawings, noise, etc. Therefore the length of each vector wi, provided it has not been completely lost, is changed by an unknown scale factor λi. These scale factors are introduced into equation (1), giving the more realistic matrix equation

R'V = WΛ    (2)

where Λ = diag[λ1, λ2, ..., λN]. Note that the introduction of the coefficients λi, denoted as generalized scale factors, is not only useful in the case of partially missing edges, but also allows the method to be insensitive to the moderate distortion introduced by perspective projection.

Let us now partition the matrices V, W and Λ, so that equation (2) is split into two equations

R'V3 = W3 Λ3    (3)

with V3 = [v1, v2, v3], W3 = [w1, w2, w3] and Λ3 = diag[λ1, λ2, λ3], and

R'VN = WN ΛN    (4)

with VN = [v4, v5, ..., vN], WN = [w4, w5, ..., wN] and ΛN = diag[λ4, λ5, ..., λN]. Note that the matrix R' is identical in both equations (3) and (4), by virtue of the orthographic projection used. Equation (3) can be solved with respect to R', provided that V3 is nonsingular, thus obtaining

R' = W3 Λ3 V3^-1    (5)

which, substituted into equation (4), yields

W3 Λ3 V3^-1 VN = WN ΛN.    (6)

By defining the [2(N - 3) x N] matrix

H = [ VN^T (V3^-1)^T diag(ξ1, ξ2, ξ3)    -diag(ξ4, ξ5, ..., ξN) ]
    [ VN^T (V3^-1)^T diag(φ1, φ2, φ3)    -diag(φ4, φ5, ..., φN) ]

equation (6) can be rearranged as a homogeneous linear system of 2(N - 3) equations in N unknowns:

H [λ1, λ2, ..., λN]^T = 0    (7)

which, once solved, gives all the values of the unknown scale factors λi.
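As a check on the derivation, a minimal numerical sketch of equations (5)-(7) might read as follows (numpy assumed; the first three columns of V are taken as the invertible block V3; intended for N = 5 or 6, where the null space of H is at most one-dimensional):

```python
import numpy as np

def scale_factors(V, W):
    """Solve H [lam1, ..., lamN]^T = 0 (eq. 7) for the generalized scale factors.

    V : (3, N) model edge vectors, W : (2, N) image edge vectors, with the
    first three columns forming the invertible block V3 of eq. (3)."""
    V3, VN = V[:, :3], V[:, 3:]
    W3, WN = W[:, :3], W[:, 3:]
    M = VN.T @ np.linalg.inv(V3).T                 # (N-3, 3) = (V3^-1 VN)^T
    H = np.vstack([
        np.hstack([M * W3[0], -np.diag(WN[0])]),   # xi-rows of eq. (6)
        np.hstack([M * W3[1], -np.diag(WN[1])]),   # phi-rows of eq. (6)
    ])
    _, s, Vt = np.linalg.svd(H)
    lam = Vt[-1]                                   # null-space direction
    lam = lam / lam[0]                             # fix the free factor; fails if lam[0] ~ 0 (a rejected pattern)
    # For N = 6, s[-1] ~ 0 corresponds to the solvability condition det(H) = 0;
    # for N = 4 the null space is two-dimensional (rows Vt[-2:], not handled here).
    return lam, s[-1]
```

For an accepted pattern all the returned λi should turn out real and strictly positive (cf. Section 4).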
In general, a non-trivial solution of equation (7) can be found only if rank(H) < N. Since rank(H) <= 2(N - 3), this condition is automatically met for N < 6, whereas for N = 6 it requires det(H) = 0; six edges therefore appear to be a natural upper limit for the proposed method. Of course, it is also possible to deal with a greater number of edges, provided we verify that rank(H) < N. However, verifying the rank of a rectangular matrix is not a simple task, especially for matrices of large dimensions. Thus, on the basis of the above considerations, we limit ourselves to the three cases N = 4, 5 and 6 (notice that for N <= 3 the system (7) contains no equations at all).

Let us first consider the case N = 6. In order to guarantee the existence of a non-trivial solution, it must first be verified that det(H) = 0. If this condition is met, one of the six unknown scale factors can be fixed arbitrarily. In the case N = 5, we have 4 equations in 5 unknowns and, also in this case, one of the scale factors can be fixed arbitrarily. Finally, in the case N = 4, as there are 2 equations and 4 unknowns, two scale factors are allowed to assume arbitrary values.

Solving equation (7) for all the scale factors is not sufficient, however, to ensure that there is an orthographic correspondence between the imaged subset of edges and the isomorphic set of edges in the 3D database. In fact, the matrix R', which can be calculated using equation (5), in general may not have the properties of a rotation matrix. Fortunately, we have one or even two scale factors at our disposal, whose values can be adjusted in order to overcome this difficulty. A necessary (but not sufficient) condition for a [3 x 3] matrix R to be a rotation matrix is that it preserves lengths and mutual angles between edges. This condition corresponds to the simple identity RR^T = I which, since the third row of R is not available from the two-dimensional data of the scene, reduces to

R'R'^T = I    (8)

where I is the [2 x 2] identity matrix. Equation (8), which in practice gives rise to three scalar relationships, should be solved in terms of the arbitrary scale factors. If the problem is solvable, i.e. if it is possible to find the scale factors (one for the case of 5 or 6 edges, and two for the case of 4 edges), the orthographic correspondence between image and model can be considered as proved. However, as in most cases the data are noisy, owing to the presence of disturbances of various kinds, the above relationships might not be satisfied exactly even if there is an exact correspondence between the model and the image. This difficulty can be overcome by solving equation (8) in the least-squares sense, i.e. by minimizing the Euclidean norm of the difference (R'R'^T - I), defined as(17)

Norm(R'R'^T - I) = √trace[(R'R'^T - I)(R'R'^T - I)^T].    (9)

In addition, the minimal value of Norm(R'R'^T - I) can be taken as a resemblance coefficient, being equal to zero when a particular view of the 3D object matches the 2D image exactly.

The case of 6 or 5 edges is very simple [remember that for N = 6 the condition det(H) = 0 has to be tested in advance]. In fact, by denoting by λa the unique scale coefficient to be found, we obtain

R'R'^T = λa^2 W3 Λ3(1) (V3^T V3)^-1 Λ3(1) W3^T    (10)

where Λ3(λa) = diag[λ1(λa), λ2(λa), λ3(λa)] and Λ3(1) denotes its value for λa = 1. Thus we have

[Norm(R'R'^T - I)]^2 = (a λa^2 - 1)^2 + (b λa^2 - 1)^2 + 2(c λa^2)^2    (11)

where

[ a  c ]
[ c  b ] = W3 Λ3(1) (V3^T V3)^-1 Λ3(1) W3^T.

The right-hand side of equation (11) is minimum if λa^2 assumes the value

λa^2 = (a + b) / (a^2 + b^2 + 2c^2).    (12)
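A sketch of this single-free-factor check, equations (10)-(12), assuming numpy and denoting by lam3_unit the first three scale factors of the eq. (7) solution with the free factor set to one (i.e. Λ3(1)):

```python
import numpy as np

def resemblance_norm(V3, W3, lam3_unit):
    """Closed-form verification for N = 5 or 6 edges (eqs. 10-12)."""
    L1 = np.diag(lam3_unit)                                 # Lambda_3(1)
    S = W3 @ L1 @ np.linalg.inv(V3.T @ V3) @ L1 @ W3.T      # [[a, c], [c, b]]
    a, b, c = S[0, 0], S[1, 1], S[0, 1]
    la2 = (a + b) / (a**2 + b**2 + 2 * c**2)                # eq. (12): optimal lambda_a^2
    norm2 = (a*la2 - 1)**2 + (b*la2 - 1)**2 + 2*(c*la2)**2  # eq. (11)
    return np.sqrt(norm2), la2                              # resemblance coefficient, lambda_a^2
```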
This value of λa has then to be substituted into equation (11) in order to verify whether there is a perfect match (norm equal to zero) or an approximate one (norm close to, but not equal to, zero). Of course, if the norm is above a fixed threshold, the correspondence cannot be established. The value of this threshold depends on several factors: noise level, pixel density, etc.

In the case of 4 edges, which corresponds to two arbitrary scale factors λa and λb, we have

R'R'^T = W3 [λa Λ3(1,0) + λb Λ3(0,1)] (V3^T V3)^-1 [λa Λ3(1,0) + λb Λ3(0,1)] W3^T    (13)

where Λ3(λa, λb) = diag[λ1(λa, λb), λ2(λa, λb), λ3(λa, λb)]. The norm can be written as

[Norm(R'R'^T - I)]^2 = (a1 λa^2 + a3 λa λb + a2 λb^2 - 1)^2 + (b1 λa^2 + b3 λa λb + b2 λb^2 - 1)^2 + 2(c1 λa^2 + c3 λa λb + c2 λb^2)^2    (14)

where

[ a1  c1 ]
[ c1  b1 ] = W3 Λ3(1,0) (V3^T V3)^-1 Λ3(1,0) W3^T

[ a2  c2 ]
[ c2  b2 ] = W3 Λ3(0,1) (V3^T V3)^-1 Λ3(0,1) W3^T

[ a3  c3 ]
[ c3  b3 ] = W3 [Λ3(1,0) (V3^T V3)^-1 Λ3(0,1) + Λ3(0,1) (V3^T V3)^-1 Λ3(1,0)] W3^T.

The right-hand side of equation (14) is a fourth-degree polynomial in the two variables λa and λb. The pair of values which makes this expression minimum can be found by using well-known techniques of numerical analysis. However, such methods are very time-consuming
and convergence problems could occur. Fortunately, the problem can be simplified by introducing the auxiliary variables α = λa/λb and β = λb^2 in the above norm expression (note that λb is always different from zero, as it is a scale coefficient). By equating to zero the derivatives with respect to α and β, the following system is obtained:

[β(a1 α^2 + a3 α + a2) - 1](2 a1 α + a3) + [β(b1 α^2 + b3 α + b2) - 1](2 b1 α + b3) + 2β(c1 α^2 + c3 α + c2)(2 c1 α + c3) = 0

[β(a1 α^2 + a3 α + a2) - 1](a1 α^2 + a3 α + a2) + [β(b1 α^2 + b3 α + b2) - 1](b1 α^2 + b3 α + b2) + 2β(c1 α^2 + c3 α + c2)^2 = 0.    (15)

The required minimum is obtained by solving this system analytically with respect to the unknowns α and β. By eliminating the variable β, the system reduces to a fourth-degree polynomial in α, whose four real or complex roots can be computed using the Cardan formulas.(18) The minimum of the four values found for the norm is chosen and can be used, in the same fashion as in the case of 5 or 6 edges, to verify a correspondence between the model and the image.
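A numerical sketch of this elimination step might look as follows (numpy assumed; a, b, c are the coefficient triples (a1, a2, a3), (b1, b2, b3), (c1, c2, c3) defined under equation (14); polynomial root finding stands in for the closed-form Cardan solution):

```python
import numpy as np
from numpy.polynomial import polynomial as P

def min_norm_four_edges(a, b, c):
    """Minimise eq. (14) written as (beta*A - 1)^2 + (beta*B - 1)^2 + 2*(beta*C)^2,
    with A, B, C quadratics in alpha = lambda_a/lambda_b and beta = lambda_b^2."""
    # ascending coefficients: A(alpha) = a2 + a3*alpha + a1*alpha^2, etc.
    A, B, C = (np.array([t[1], t[2], t[0]], float) for t in (a, b, c))
    dA, dB, dC = P.polyder(A), P.polyder(B), P.polyder(C)
    # eliminating beta from the stationarity conditions (eq. 15) gives
    # (A+B)(A dA + B dB + 2 C dC) - (dA+dB)(A^2 + B^2 + 2 C^2) = 0, a quartic in alpha
    quartic = P.polysub(
        P.polymul(P.polyadd(A, B),
                  P.polyadd(P.polyadd(P.polymul(A, dA), P.polymul(B, dB)),
                            2 * P.polymul(C, dC))),
        P.polymul(P.polyadd(dA, dB),
                  P.polyadd(P.polyadd(P.polymul(A, A), P.polymul(B, B)),
                            2 * P.polymul(C, C))))
    tol = 1e-9 * np.max(np.abs(quartic))
    while len(quartic) > 1 and abs(quartic[-1]) < tol:    # drop the cancelled leading term
        quartic = quartic[:-1]
    best = (np.inf, None, None)
    for alpha in np.roots(quartic[::-1]):                 # np.roots wants descending order
        if abs(alpha.imag) > 1e-8:
            continue                                      # keep real roots only
        alpha = alpha.real
        Av, Bv, Cv = P.polyval(alpha, A), P.polyval(alpha, B), P.polyval(alpha, C)
        beta = (Av + Bv) / (Av**2 + Bv**2 + 2 * Cv**2)    # stationary beta = lambda_b^2
        if beta <= 0:
            continue                                      # lambda_b^2 must be positive
        norm2 = (beta*Av - 1)**2 + (beta*Bv - 1)**2 + 2*(beta*Cv)**2
        if norm2 < best[0]:
            best = (norm2, alpha, beta)
    return best                                           # (squared norm, alpha, beta)
```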
In the above analysis we have considered the regular cases in which rank(H) = 2(N - 3). However, in certain practical situations, for example when some of the considered edges are parallel, the rank of H may decrease, thus increasing the number of free scale factors at our disposal. A number of free scale factors greater than three has no physical meaning, however, because the three equations corresponding to the constraint R'R'^T = I would then have infinitely many solutions. On the contrary, in the case of three free scale factors λa, λb, λc, the three constraint equations can be satisfied exactly, i.e. not merely in the least-squares sense. In fact we have

R'R'^T = W3 [λa Λ3(1,0,0) + λb Λ3(0,1,0) + λc Λ3(0,0,1)] (V3^T V3)^-1 [λa Λ3(1,0,0) + λb Λ3(0,1,0) + λc Λ3(0,0,1)] W3^T    (16)

where Λ3(λa, λb, λc) = diag[λ1(λa, λb, λc), λ2(λa, λb, λc), λ3(λa, λb, λc)]. Thus, by substituting into R'R'^T = I, we obtain the following three scalar equations:

a1 λa^2 + a2 λb^2 + a3 λc^2 + a4 λa λb + a5 λa λc + a6 λb λc = 1
b1 λa^2 + b2 λb^2 + b3 λc^2 + b4 λa λb + b5 λa λc + b6 λb λc = 1
c1 λa^2 + c2 λb^2 + c3 λc^2 + c4 λa λb + c5 λa λc + c6 λb λc = 0    (17)

where

[ a1  c1 ]
[ c1  b1 ] = W3 Λ3(1,0,0) (V3^T V3)^-1 Λ3(1,0,0) W3^T

[ a2  c2 ]
[ c2  b2 ] = W3 Λ3(0,1,0) (V3^T V3)^-1 Λ3(0,1,0) W3^T

[ a3  c3 ]
[ c3  b3 ] = W3 Λ3(0,0,1) (V3^T V3)^-1 Λ3(0,0,1) W3^T

[ a4  c4 ]
[ c4  b4 ] = W3 [Λ3(1,0,0) (V3^T V3)^-1 Λ3(0,1,0) + Λ3(0,1,0) (V3^T V3)^-1 Λ3(1,0,0)] W3^T

[ a5  c5 ]
[ c5  b5 ] = W3 [Λ3(1,0,0) (V3^T V3)^-1 Λ3(0,0,1) + Λ3(0,0,1) (V3^T V3)^-1 Λ3(1,0,0)] W3^T

[ a6  c6 ]
[ c6  b6 ] = W3 [Λ3(0,1,0) (V3^T V3)^-1 Λ3(0,0,1) + Λ3(0,0,1) (V3^T V3)^-1 Λ3(0,1,0)] W3^T.

(The first two equations of (17) correspond to the diagonal entries of R'R'^T = I, while the third corresponds to the off-diagonal entry, which must vanish.) By subtracting the second equation from the first one in system (17), and dividing both this difference and the third equation by λc^2, we obtain

d1 (λa/λc)^2 + d2 (λb/λc)^2 + d3 + d4 (λa λb/λc^2) + d5 (λa/λc) + d6 (λb/λc) = 0

c1 (λa/λc)^2 + c2 (λb/λc)^2 + c3 + c4 (λa λb/λc^2) + c5 (λa/λc) + c6 (λb/λc) = 0    (18)

having set di = (ai - bi). The left-hand sides of the above equations may be regarded as a pair of second-degree polynomials in the unknown (λa/λc), with (λb/λc) as a given parameter. As is known, two such polynomials have a common root if and only if their Sylvester resultant(19) vanishes, i.e.

det | d1    d4(λb/λc) + d5    d2(λb/λc)^2 + d6(λb/λc) + d3    0                              |
    | 0     d1                d4(λb/λc) + d5                  d2(λb/λc)^2 + d6(λb/λc) + d3   |
    | c1    c4(λb/λc) + c5    c2(λb/λc)^2 + c6(λb/λc) + c3    0                              |
    | 0     c1                c4(λb/λc) + c5                  c2(λb/λc)^2 + c6(λb/λc) + c3   |  = 0.    (19)
By developing the determinant on the left-hand side of equation (19), we obtain a fourth-degree polynomial in (λb/λc), which can again be solved analytically using Cardan's formula. Hence, knowing (λb/λc), the value of (λa/λc) can first be found by solving the second of equations (18), and λc can then be found from the first or the second of equations (17). Finally, among all these values of the scale factors, we select those which have a physical meaning, i.e. those which are real and positive.

4. PRACTICAL IMPLEMENTATION OF THE METHOD

In the previous section we have considered the particular problem of verifying whether a given set of edges in a 2D image can be considered as the orthographic projection of a given set of edges in the 3D database. The purpose of this section is to examine the practical aspects of the whole recognition procedure and to discuss its application limits.

In order to give a complete picture of the proposed technique, we consider the case of Fig. 3, in which a cube and a prism appear partially occluded by a pyramid. Let us suppose we wish to identify both the cube and the prism. To do this we assume the pattern consisting of six edges, shown at the upper-right corner of Fig. 3, as a distinctive feature of these two objects. Notice that in this case the cube will be identified if the three scale factors λ1, λ2, λ3 of the edges e1, e2, e3 are equal, whereas the prism is found if they are in a given ratio.

Fig. 3. A synthetic 2D scene consisting of a cube, a prism and a pyramid. A 3D pattern, characteristic of either the prism or the cube, appears at the upper-right corner and an example of a wrong correspondence is shown at the lower-right corner.

A first step in the recognition procedure may be the decomposition of the line drawing into disconnected
parts. Normally this is achieved by searching for junctions of collinear edges which do not correspond to real vertices but simply to the occlusion of an edge by the face of another object. Using this decomposition on Fig. 3, the problem of finding a subgraph isomorphic to the model graph is greatly simplified: we obtain only 18 of these structural correspondences, whereas, in the absence of such preprocessing, we would have found 466 subgraphs corresponding to the searched model pattern.

At this point, we have to decide which of these 18 subgraphs represents a real (partial) view of the searched object. Note that, as N = 6 and rank(H) = 3, we have three free scale coefficients at our disposal, so that the identity R'R'^T = I can be satisfied exactly and not in the least-squares sense. By applying the testing procedure outlined in Section 3, we obtain the results summarized in Table 1. In this table only the 3 accepted edge patterns are shown, and a closer examination makes it possible to separate the views of the cube, i.e. those having λ2/λ1 ≈ λ3/λ1 ≈ 1, from that of the prism, having λ2/λ1 ≈ 0.75 and λ3/λ1 ≈ 0.66. As a concluding remark we note that the remaining 15 isomorphic patterns have been rejected because some of their scale factors are zero or negative. Consider, for example, the pattern (14, 13, 16, 20, 18, 17, 19) belonging to the cube and shown at the lower-right corner of Fig. 3. Clearly its isomorphism with the 3D model is only accidental, i.e. due to the fact that a vertex of the cube is not visible. Hence a rotation matrix transforming the model into this pattern does not exist; correspondingly, some of the scale factors satisfying equation (7) assume a zero value, so that the pattern is rejected.

Using a Mac II-ci personal computer, the time required for carrying out graph subisomorphism and testing on the synthetic scene of Fig. 3 is about 3 s. In general, the time required for testing a single topological correspondence does not vary substantially (for a model having 4, 5 or 6 edges), so that the time complexity of the whole procedure (neglecting the preprocessing of the gray-level image) coincides with the time complexity of subgraph searching. Using the results of Wong(12) and assuming a model with 6 edges and a scene with n edges, we obtain the following general formula for the time complexity of our recognition procedure: T = O(n·2^3). As we can see, the required time is simply a linear function of the total number of edges in the scene.
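As an illustration of the decomposition step described above, a minimal sketch of the collinear-junction test could be the following (the point representation and the angular tolerance are assumptions, not taken from the paper):

```python
import numpy as np

def is_occlusion_junction(p, q1, q2, tol_deg=2.0):
    """A junction p joining two (nearly) collinear edges (p, q1) and (p, q2) is
    unlikely to be a real vertex; it is more probably the point where an edge is
    interrupted by the face of an occluding object, so the drawing is cut there."""
    u = np.asarray(q1, float) - np.asarray(p, float)
    v = np.asarray(q2, float) - np.asarray(p, float)
    cos_angle = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    angle = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    return angle > 180.0 - tol_deg        # the two edges continue one another
```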
Table 1. Results for the case shown in Fig. 3

Isomorphic pattern           λ1     λ2     λ3     λ4     λ5     λ6     Identified object
15, 16, 17, 19, 18, 13, 14   0.99   1.01   1.01   1.24   0.99   1.01   Cube
18, 16, 17, 14, 15, 13, 20   1.02   0.99   1.01   1.01   1.02   1.32   Cube
12, 9, 8, 11, 21, 10, 7      1.15   0.83   0.74   0.74   1.27   3.02   Prism
5. EXPERIMENTAL RESULTS
The method proposed in this paper has been tested on real-world images representing polyhedral objects in unknown positions. The images have been obtained using a digital camera with a focal length of 55 mm and a visual field of 376 x 284 pixels; they are slightly distorted by perspective effects.

A first example is shown in Fig. 4, where only polyhedral objects are present. First the gray-level image is pre-processed in order to extract a set of connected straight lines. This pre-processing has been carried out using a procedure based on the techniques described by Gu and Huang(14) and Venkateswar and Chellappa(16) for the extraction of connected straight lines, and the result is shown in Fig. 5. On the basis of the observations made in Sections 3 and 4, a set of 4, 5 or 6 edges must be chosen as a distinctive feature of an object. For example, if we search for the hexagonally shaped box at the left of Fig. 4, we can use a 3D pattern comprising four edges as a distinctive feature. As it has a reduced number of edges, this pattern is likely to be found in the scene even in the case of overlapping objects.
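The straight-line extraction itself follows references (14) and (16); as a rough, not equivalent, substitute one could obtain candidate segments with an edge detector and a probabilistic Hough transform (OpenCV assumed, file name hypothetical):

```python
import cv2
import numpy as np

gray = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input image
edges = cv2.Canny(gray, 50, 150)                       # edge map
segments = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=40,
                           minLineLength=20, maxLineGap=3)
# Each entry is (x1, y1, x2, y2); the segments must still be linked at common
# junctions to obtain the connected line drawing used by the recognition step.
if segments is not None:
    for x1, y1, x2, y2 in segments[:, 0]:
        print((x1, y1), (x2, y2))
```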
Fig. 4. A real-world scene consisting of some polyhedral objects.

Fig. 5. Connected line drawing extracted from the real-world scene of Fig. 4. The pattern found by means of the recognition process is highlighted.

Fig. 6. A real-world scene consisting of some objects only partially shaped as polyhedra.

Fig. 7. Connected line drawing extracted from the real-world scene of Fig. 6. The pattern found by means of the recognition process is highlighted.
By using a brute-force subgraph isomorphism search algorithm (no line-drawing decomposition has been carried out), we have found 160 possible structural correspondences between the selected pattern and the scene of Fig. 5. Applying the testing procedure, these 160 structural correspondences can be ordered by increasing values of the norm. In addition, the corresponding rotation matrix and scale coefficients are found, in order to verify how the rotated and projected 3D pattern appears in the scene. The smallest-norm case is highlighted in Fig. 5, and we can see that in this case a particular view of the 3D pattern exactly matches the 2D image.

As a second example, let us consider the picture shown in Fig. 6. In this case the objects appearing in the scene are only partially shaped as polyhedra. However, it is still possible to extract a set of connected straight lines, shown in Fig. 7, in such a way as to obtain a description of the objects which is based only on their polyhedral parts and which allows the identification. Let us suppose, for example, that we are searching for the tape holder located at the upper-right corner of Fig. 6. As a result of the recognition procedure, a representative 3D pattern consisting of five edges and
corresponding to the minimum norm is highlighted in Fig. 7. The resulting exact match demonstrates that, even in this case, the method is able to identify the searched object correctly.
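Putting the pieces together, the candidate correspondences returned by the subgraph search could be ranked by the resemblance norm of Section 3 roughly as follows (a sketch reusing the earlier helper functions; image_vectors and the threshold value are placeholders, not from the paper):

```python
def rank_candidates(candidates, V, image_vectors, threshold=0.1):
    """Score each structural correspondence and keep the plausible matches."""
    scored = []
    for corr in candidates:
        W = image_vectors(corr)                   # (2, N) image edge vectors, eq. (1)
        lam, residual = scale_factors(V, W)       # eq. (7), sketched in Section 3
        if min(lam) <= 0:                         # zero or negative factors: reject
            continue
        norm, _ = resemblance_norm(V[:, :3], W[:, :3], lam[:3])   # eqs (10)-(12)
        scored.append((norm, corr, lam))
    scored.sort(key=lambda t: t[0])               # order by increasing norm
    return [s for s in scored if s[0] < threshold]
```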
6. CONCLUSIONS

The procedure proposed in this paper makes it possible to recognize rotated, scaled and overlapping polyhedral-shaped objects using only one orthographic (or moderately perspective) view. The described method has the advantage of being based on simple algebraic equations which are easy to solve and do not require time-consuming numerical procedures. By utilizing a set of generalized scale factors, the method is able not only to distinguish between objects having different structural properties, but also to discriminate between objects having the same structure but different sizes. The algorithm has been implemented on a small-size computer and has been tested on several real-world scenes.

Acknowledgements--This work was supported in part by a grant from Progetto Finalizzato "Sistemi Informatici e Calcolo Parallelo" promoted by the Italian National Research Council. Thanks are also due to an anonymous referee for critical examination and valuable comments.
REFERENCES

1. T. Kanade, Recovery of the 3D shape of an object from a single view, Artif. Intell. 17, 409-460 (1981).
2. R. Horaud, New methods for matching 3D objects with single perspective views, IEEE Trans. Pattern Anal. Mach. Intell. 9, 401-412 (1987).
3. M. Dhome, M. Richetin, J. T. Lapresté and G. Rives, Determination of the attitude of a 3-D object from a single perspective view, IEEE Trans. Pattern Anal. Mach. Intell. 11, 1265-1278 (1989).
4. K.-I. Kanatani, 3D Euclidean versus 2D non-Euclidean: two approaches to 3D recovery from images, IEEE Trans. Pattern Anal. Mach. Intell. 11, 329-332 (1989).
5. D. P. Huttenlocher and S. Ullman, Recognizing solid objects by alignment with an image, Int. J. Comput. Vision 5, 195-212 (1990).
6. D. J. Kriegman and J. Ponce, On recognition and positioning curved 3-D objects from image contours, IEEE Trans. Pattern Anal. Mach. Intell. 12, 1127-1137 (1990).
7. D. G. Lowe, Fitting parameterized three-dimensional models to images, IEEE Trans. Pattern Anal. Mach. Intell. 13, 441-450 (1991).
8. W. Lie, C. Yu and Y. Chen, Model-based recognition and positioning of polyhedra using intensity-guided range sensing and interpretation in 3-D space, Pattern Recognition 23, 983-997 (1991).
9. P. C. Gaston and T. Lozano-Pérez, Tactile recognition and localization using object models: the case of polyhedra on a plane, IEEE Trans. Pattern Anal. Mach. Intell. 6, 257-266 (1984).
10. W. E. L. Grimson and T. Lozano-Pérez, Localizing overlapping parts by searching the interpretation tree, IEEE Trans. Pattern Anal. Mach. Intell. 9, 469-482 (1987).
11. D. W. Murray, Model-based recognition using 3D shape alone, Comput. Vision Graphics Image Process. 40, 250-266 (1987).
12. E. K. Wong, Model matching in robot vision by subgraph isomorphism, Pattern Recognition 25, 287-303 (1992).
13. R. Horaud, B. Conio, O. Leboulleux and B. Lacolle, An analytic solution for the perspective 4-point problem, Comput. Vision Graphics Image Process. 47, 33-44 (1989).
14. W. K. Gu and T. S. Huang, Connected line drawing extraction from a perspective view of a polyhedron, IEEE Trans. Pattern Anal. Mach. Intell. 7, 422-430 (1985).
15. J. Burns, A. Hanson and E. Riseman, Extracting straight lines, IEEE Trans. Pattern Anal. Mach. Intell. 8, 425-455 (1986).
16. V. Venkateswar and R. Chellappa, Extraction of straight lines in aerial images, IEEE Trans. Pattern Anal. Mach. Intell. 14, 1111-1114 (1992).
17. L. K. Timothy and B. E. Bona, State Analysis: An Introduction. McGraw-Hill, New York (1968).
18. B. L. Van der Waerden, Algebra: Vol. 1. Frederick Ungar, New York (1970).
19. W. Groebner, Moderne Algebraische Geometrie. Springer, Wien (1949).
About the Author--GIOVANNI MAROLA was born in Vicenza, Italy, in 1940. He received the Doct. Eng. degree in electronic engineering from Padua University, Italy, in 1965. He has been with the faculty of Electronic Engineering at Pisa University since 1967, where he is currently Associate Professor of Industrial Electronics. His fields of interest are control theory and robot vision.

About the Author--ANNA VACCARELLI was born in Taranto in 1959. She received the Dr. Eng. degree in electronic engineering from Pisa University, Italy, in 1984. After graduating, she has been involved in research on digital signal and image processing with the Dipartimento di Ingegneria dell'Informazione of Pisa University. At present she is with the Istituto di Elaborazione dell'Informazione of the Italian Research Council (CNR) as a researcher. She is involved in pattern recognition problems.