Pattern Recognition, Vol. 26, No. 7, pp. 1087-1098, 1993
Printed in Great Britain
0031-3203/93 $6.00+.00
Pergamon Press Ltd
© 1993 Pattern Recognition Society
REPRESENTATIVE PATTERNS FOR MODEL-BASED MATCHING

JORJA HENIKOFF and LINDA G. SHAPIRO†

Department of Electrical Engineering, University of Washington, Seattle, WA 98195, U.S.A.

(Received 4 February 1992; in revised form 7 January 1993; received for publication 20 January 1993)

Abstract--A single intensity image of a three-dimensional (3D) object obtained under perspective projection can be reduced to a two-dimensional (2D) line drawing which contains patterns characteristic of the object and of its pose. These patterns can provide clues to narrow down the search for possible objects and poses when comparing 2D view-class models of 3D objects against images. This paper describes a general way of representing 2D line patterns and of using the patterns to consistently label 2D models of 3D objects. The representation is based on groups of three line segments that are likely to be found in most images containing man-made objects, but are unlikely to occur by accident. Experimental results using representative patterns to match 2D view-class models of 3D objects against real images of the objects are included.

† Author to whom all correspondence should be addressed.
Object recognition   View-class models   Model-based matching   Image representation   Perceptual grouping   Consistent labeling   Discrete relaxation
1. INTRODUCTION

Model-based matching is a well-known approach to object recognition in machine vision.(1-3) Some matching schemes reduce an intensity image to an edge image and treat it as a line drawing, hypothesizing matches between image lines and vertices and those found in boundary representation models. The hypothesized matches are then verified and possibly extended. Verification can be accomplished by hypothesizing enough matches to constrain the geometrical transformation from a three-dimensional (3D) model to a two-dimensional (2D) image under perspective projection.(4) Images analyzed in this fashion are usually taken under controlled circumstances, for example, in a manufacturing setting. Even though noise and lighting are not particular problems in indoor scenes, edge detection is still often imperfect. In particular, detection of corners and trihedral vertices is troublesome.(5) Consequently, matching schemes that depend on the accurate detection of vertices can run into problems when applied to poor quality real images.(6,7)

Matching perceptual groupings of features was suggested by Lowe.(4) His approach was to match a few significant groupings made up of certain arrangements of lines found in images. The arrangements are abstract, and their significance is determined by the 3D structure they imply and by the amount of information they contribute to estimating the parameters in a transformation from a 3D model to the 2D image.

This paper is related to Lowe's perceptual groupings. Minimal processing is applied to an image to extract edges. The edges are grouped into modular building blocks called triples, a particular arrangement of three
lines. Triples and higher-level relationships among them form representative patterns. Different representative patterns show up in different views of a 3D object. A view class is a set of views in which the same representative patterns are visible. View-class models consist of one or more representative patterns of a view class. Here we use a single representative pattern as a model in our matching scheme.

This paper describes the use of representative patterns for model-based matching. Representative patterns extracted from an image are used to hypothesize matches between the image and a view-class model. Since patterns are modular, sub-patterns can also be easily labeled. This is particularly useful when an image contains only a part of a model pattern, due to imperfect edge detection or to occlusion.

Section 2 defines triples and describes how triples are linked into chains, and how related chains form representative patterns. The result is a feature pyramid. Section 3 relates representative patterns to graphs. Section 4 describes an application of representative patterns to model-based matching using simple representative models of polyhedral objects, and Section 5 presents the results of experiments using real images. Finally, Section 6 summarizes the results.

2. A FEATURE PYRAMID BASED ON TRIPLES
Our goal was to develop a well-defined, abstract and robust representation of line drawings that can be computed automatically and quickly for both models and images, and that is useful for hypothesizing matches between image and model line segments, even for poor quality images. The resulting representation provides an impressionistic sketch of an image that generates clues for recognition.
The patterns are arrangements of line segments made up from basic units called triples. The definition of a triple, and therefore of the entire representation, is strictly 2D and relational. In images, which are considered imperfect line drawings, patterns are built up from whatever line segments may be detected. In models, which are considered perfect line drawings, patterns are broken down into sub-patterns of a size comparable to the image being recognized. Matches are hypothesized between the mutually largest possible sub-patterns.
2.1. Junctions and triples

Perfect line drawings of representative views are assumed to be available from models of 3D objects. These may be obtained from a geometric modeling system, or be produced manually. Images are subjected to minimal processing to detect only the strongest step edges. Edges are further filtered by discarding those shorter than a threshold. For both models and images, junctions are then inferred between the remaining line segments if their endpoints are within the same distance threshold. A line segment with junctions at both ends is a candidate for anchoring an ordered set of three lines forming a "U" shape which we call a triple.

A triple, illustrated in Fig. 1(a), is an ordered set of three lines and two junctions. The angles at the junctions must both be less than 180 deg when viewed from one side, which is designated as the "inside" of the triple. The exact size of the angles is immaterial, as is the length of the line segments. The line segments in a triple are numbered l0, l1, l2 clockwise from the inside. Other researchers have noted the usefulness of groupings of three lines.(8,9) Three lines are sufficient to constrain a transformation from three dimensions to two with three rotation and three translation parameters.(10,11) Triples are usually plentiful in images of machined parts, but are not likely to be accidental. If a line segment does not participate in at least one triple, it is not used to make a representative pattern. The requirement of two convex enclosed angles makes it possible to orient triples by giving them a well-defined inside and outside. A special type of triple with parallel legs was used by Mohan and Nevatia.(12)

Fig. 1. Triples and pairs of triples. (a) A single triple is an ordered set of three line segments in a convex arrangement. The lines are labeled l0, l1, l2 clockwise from the inside of the triple. An arrow is placed on the end of the third segment to mark the orientation and end of the triple. (b) A non-undulating pair of triples share two of their line segments, emphasized as thicker lines. Here line segments l1 and l2 in one triple are also l0 and l1 in the other, forming a "1021" pair. (c) An undulating pair of triples share one line. Here, line segment l2 of each triple is shared, forming a "22" pair. Again, the shared lines are shown thicker. (d) Another undulating triple pair. Line segment l0 of each triple is shared for a "00" pair.

2.2. Triple pairs, triple chains and sub-chains

Two triples are considered to form a pair if they share one or two line segments. A pair type is noted by listing the numbers of the shared line segments. For example, in a "00" pair, line l0 of both triples is shared; in a "1021" pair, lines l1 and l2 of one triple are shared with lines l0 and l1, respectively, of the other triple. Some pair types are illustrated in Figs 1(b)-(d).

Triple pairs can be extended into chains of several triples. The size of the chain is expressed in triples, and a single triple is considered to be a chain of size one. In this paper we use only non-undulating chains, those chains that have a well-defined "inside" in which all angles between segments are less than 180 deg (Figs 2(a)-(b)). The triples are numbered t0, t1, t2, ... clockwise from inside the chain. A non-undulating chain can be either open or closed; a closed chain outlines a convex polygon. Chains are only defined on the innermost segments; there are no connected line segments inside a chain that do not belong to it. With this restriction, a triple can belong to one and only one triple chain. Furthermore, a line drawing can be decomposed into chains in only one way, and chain construction does not depend on any information other than the line segments.

A base chain is the maximal size non-undulating chain. A base chain of t triples can be broken into sub-chains. The number of possible sub-chains is O(t²), and sub-chain size ranges from one to t triples. A base chain and two of its sub-chains are illustrated in Figs 2(c) and (d).

Although most of the discussion in this paper assumes that all three segments of a triple are straight lines, there is no particular reason for this restriction, and the experiments included an object with curved lines. Curved lines make the determination of the convexity of the two angles ambiguous. For the experiments done here, angles were determined using only the endpoints of the segments, ignoring any curvature. Curved lines could be used as additional evidence for matching, provided they can be detected and classified reliably.

Fig. 2. Non-undulating chains. (a) An open, non-undulating chain of t = 3 triples. Open chains of t triples have t + 1 right turns and t + 2 sides. The triples are labeled t0, t1 and t2 next to their third segments. (b) A closed, non-undulating chain of t = 4 triples. Closed chains of t triples have t right turns and form convex polygons of t sides. Since the arrangement is circular, the designation of triple t0 is arbitrary. (c) An open base chain of three triples has sub-chains ranging from one to three triples in size. (d) Two sub-chains of the base chain in (c). On the left is one of three sub-chains of size 1 and on the right is one of two sub-chains of size 2.
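To make the constructions of Sections 2.1 and 2.2 concrete, the following is a minimal sketch, in Python, of junction inference and triple detection from a list of line segments. The distance threshold, the coordinate convention used to order l0, l1, l2, and all names are our illustrative assumptions, not the C implementation used in the experiments of Section 5.

    from itertools import combinations
    import math

    JUNCTION_DIST = 5.0  # assumed endpoint-distance threshold, in pixels

    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    def junctions(segments):
        """Infer junctions between segments whose endpoints nearly coincide.
        segments: list of ((x0, y0), (x1, y1)) endpoint pairs.
        Returns (i, ei, j, ej) tuples: end ei of segment i meets end ej of j."""
        joins = []
        for (i, s), (j, t) in combinations(enumerate(segments), 2):
            for ei in (0, 1):
                for ej in (0, 1):
                    if dist(s[ei], t[ej]) <= JUNCTION_DIST:
                        joins.append((i, ei, j, ej))
        return joins

    def cross(o, a, b):
        """z-component of the cross product of vectors o->a and o->b."""
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def triples(segments):
        """Enumerate triples (l0, l1, l2): the middle segment has a junction
        at each end, and both enclosed angles are convex when viewed from one
        side (the 'inside'). Angles use endpoints only, ignoring curvature."""
        # attach[m][e] = segments joined at end e of m, with their far endpoints
        attach = {m: {0: [], 1: []} for m in range(len(segments))}
        for i, ei, j, ej in junctions(segments):
            attach[i][ei].append((j, segments[j][1 - ej]))
            attach[j][ej].append((i, segments[i][1 - ei]))
        found = []
        for m, (p1, p2) in enumerate(segments):
            for a, p0 in attach[m][0]:
                for b, p3 in attach[m][1]:
                    if a == b:
                        continue  # the two legs must be distinct segments
                    t1 = cross(p1, p0, p2)  # turn at the first junction
                    t2 = cross(p2, p1, p3)  # turn at the second junction
                    if t1 * t2 > 0:         # both angles convex from one side
                        # order the legs so l0, l1, l2 run clockwise from the
                        # inside (the sign convention depends on whether y is up)
                        found.append((a, m, b) if t1 > 0 else (b, m, a))
        return found

    # Three segments forming a "U": one triple is found.
    segs = [((0, 10), (0, 0)), ((0, 0), (10, 0)), ((10, 0), (10, 10))]
    print(triples(segs))  # e.g. [(2, 1, 0)]

The convexity test simply requires the two turns along the polyline to go in the same direction, which is the "both angles less than 180 deg from one side" condition above; an opposite-sign pair of turns would be an undulating, Z-shaped arrangement.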
2.3. Related triple chains and representative patterns

Two base chains are related if they share a line segment. The chain relationship is expressed as a pair relationship between one triple from each chain. For example, a relationship between chains M and N is expressed as a pair type relationship between triple m of chain M and triple n of chain N. A pair relationship is denoted symbolically as

(rel) = chain.triple(pair type)chain.triple.

A single chain relationship may correspond to several different pair relationships between the two chains. In the example in Fig. 3(a), the single line segment shared by the two chains corresponds to four different triple pairs, one of which is shown in Fig. 3(b). Chain M consists of three triples and chain N of two. Their relationship in Fig. 3(b) is denoted as

(rel) = M.t1(22)N.t0.

This means that triple number one of chain M is related to triple number zero of chain N by a "22" pair type.

A representative pattern is defined as the set of related chains that represent a line drawing. A pattern is minimally a single triple and maximally the entire line drawing. Since a line drawing can be decomposed only one way into chains, the representative pattern is unique.

Since representative patterns are built up from topological relationships among lines without regard to line segment length or orientation, they are impervious to scale changes and translation, which are rigid transformations. Similarly, since the patterns do not depend on angle size, but only on the fact that the triple angles are no more than 180 deg when viewed from inside the triple, they are also impervious to rotation. Representative patterns are consequently unaffected by perspective projection. Furthermore, if an angle is less than 180 deg from one viewpoint, it will also be less than 180 deg from another viewpoint of the same view class. Thus, representative patterns are viewpoint invariant;(4) that is, a change of viewpoint within the same view class yields the same representative patterns.

Fig. 3. Related chains. (a) Two related chains. Chain M consists of three triples and chain N of two. The single line shared by the two chains can be expressed as triple pair relationships. (b) The chain relationship expressed as an undulating triple pair relationship (see Fig. 1(c)) between triple 1 of chain M and triple 0 of chain N.

2.4. The feature pyramid

The features of line drawings discussed so far form a relational pyramid as defined by Lu and Shapiro.(6) This is a relational structure built up from primitives with increasingly complex relations (Fig. 4). At the base are the line segments. Above them are binary relationships among the lines: the junctions. Next come ternary relationships among the lines: the triples. Above triples are triple chains, then related triple chains and finally representative patterns. The pyramid describes a data structure that can be used to implement the representative patterns.

Fig. 4. Feature pyramid. A relational pyramid of features representing an object. An object is expressed as a set of one or more representative patterns which are built hierarchically from the bottom to the top of the pyramid. From bottom to top, the levels are: line segments, binary relationships, ternary relationships, triple chains, related chains, representative patterns, object.
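The pair-type notation of Section 2.2 and the counts of features at the lower pyramid levels can be sketched directly on top of the triple detector above; the counts echo the summary pyramid discussed in the next paragraph. Both function names are illustrative assumptions, and chains and related chains are omitted for brevity.

    from collections import Counter
    from itertools import combinations

    def pair_type(t1, t2):
        """Pair type of two triples per Section 2.2: concatenate the positions
        (0, 1 or 2) of each shared line segment, first in t1, then in t2.
        E.g. '1021' for a non-undulating pair, '22' or '00' for an undulating
        one; None if no segment is shared."""
        code = "".join(f"{i}{t2.index(seg)}"
                       for i, seg in enumerate(t1) if seg in t2)
        return code or None

    def summary_counts(segments):
        """Counts of features at the lower pyramid levels, in the spirit of a
        summary vector over the relational pyramid."""
        trips = triples(segments)  # from the sketch in Section 2.1
        pairs = [pt for a, b in combinations(trips, 2)
                 if (pt := pair_type(a, b))]
        return Counter({"line segments": len(segments),
                        "junctions": len(junctions(segments)),
                        "triples": len(trips),
                        "triple pairs": len(pairs)})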
A feature pyramid of a different type was used by Lu and Shapiro(6) to determine the pose of a known object from an unknown view. Lu used line segments, junctions, loops, and relationships among junctions and loops in his three-level pyramid. A device called a summary pyramid that counted the number of different types of relationships in the relational pyramid was used to rapidly select view classes for matching.

3. CHAIN GRAPH AND SUB-CHAIN GRAPH

A chain graph can be used to describe the related chains in a representative pattern. The nodes of a chain graph correspond to triple chains and its arcs correspond to relationships between pairs of chains. A chain graph is planar, and non-undulating chains can be made to correspond to regions of a planar graph. Then properties of planar graphs and their duals can be used to prove that the number of features (triples,
chains, related chains) in a representative pattern is O(n), where n is the number of line segments.

As each base chain can be broken down into sub-chains, the chain graph can be thought of as a set of graphs determined by the sub-chains and by the relationships between the sub-chains of different base chains. Sub-chains may inherit all, some or none of the base chain's relationships depending on which triples they include.

Figure 5 illustrates the representative patterns and sub-chain graphs for a simple model and image. Figure 5(a) shows the model on the right with its six open base chains--which are collectively its representative pattern--indicated by broken lines and labeled A through F. On the left-hand side of Fig. 5(a) is the sub-chain graph corresponding to the model pattern. The square nodes represent the base chains and the chain letter from the model sketch appears inside. The circular nodes stacked behind the square ones represent the possible sub-chains and are numbered from 0, with 0 representing the base chain. The size of the sub-chains is indicated by a number next to the circular nodes. So, base chain F with a size of three triples has six sub-chains: one of size 3 (F0, the base chain itself), two of size 2 each (F1 and F2), and three of size 1 each (F3, F4, F5). The lines connecting the nodes represent related sub-chains. Thus, sub-chain F2 of base chain F is of size 2 and shares a line segment with sub-chain A0 of base chain A, which is of size 1.

On the right-hand side of Fig. 5(b) is a sketch of the edge image, which suffers from both one missing and one extra line segment when compared with the model. There are five base chains making up its representative patterns, indicated by broken lines and labeled V through Z. Notice that V and Z do not share a line, nor do Y and W. On the left is the corresponding sub-chain graph. Here base chain Z0 has a size of two triples and has three sub-chains, one of size 2 and two of size 1 each.

This layered graph structure can be used to match chains or sub-chains as described in the following section. If a feature represented by a long chain in a model is only partially detected in an image, due either to poor segmentation or to occlusion, the image sub-chain can be matched to a sub-chain of the same size in the model. In Fig. 5, for example, sub-chain Z0 in the image can be matched with sub-chain F2 in the model, and then sub-chain X0 in the image can be matched consistently with A0 in the model.

Fig. 5. Sub-chain graphs. (a) Model. The chain graph on the left represents the model line segments sketched on the right. The model has six base chains, A-F, indicated by broken lines. The square nodes in the graph represent the base chains and the circular nodes behind them their possible sub-chains. The size of the sub-chains is indicated by the numbers next to the circular nodes, so the size of sub-chain F2 is 2 triples. In a matching scheme, the model sub-chains are labeled with image sub-chains of the same size. (b) Image. The chain graph on the left represents the image line segments sketched on the right. The image has five base chains, V-Z. While it is not possible to label the base chain F0 in the model correctly with any image sub-chain, it is possible to label sub-chain F2 correctly with base chain Z0 from the image.

4. A MATCHING STRATEGY USING TRIPLE CHAINS

Matching using triple chains and the graph structures described above can be implemented in many ways; one possibility is discussed here. Simple representative models are derived from complete view classes of 3D models, and the objective is to find instances of all or part of a model in an image. Model triple sub-chains are the units and image triple sub-chains are the labels in a consistent labeling as formulated by Haralick and Shapiro.(13) The model features drive the matching
and careful construction of models can control matching computations, as noted by Bhanu and Faugeras.(3) As soon as enough matches have been hypothesized to compute a transformation from the full 3D model to the image, verification can proceed; it is not necessary or desirable to attempt a complete consistent labeling.
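Since both the units and the labels of this labeling are sub-chains, breaking a base chain into its sub-chains is the basic bookkeeping step. A minimal sketch, assuming a chain is represented simply as the ordered list of its triples:

    def sub_chains(base_chain):
        """All contiguous sub-chains of an open base chain, largest first.
        An open chain of t triples has t*(t+1)/2 sub-chains, which is the
        O(t^2) count noted in Section 2.2."""
        t = len(base_chain)
        return [base_chain[i:i + size]
                for size in range(t, 0, -1)
                for i in range(t - size + 1)]

    # An open base chain of three triples has six sub-chains: one of size 3,
    # two of size 2 and three of size 1, matching chain F in Fig. 5(a).
    assert len(sub_chains(["t0", "t1", "t2"])) == 6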
4.1. Representative models

For our purposes, a representative model is a topological subset of one or more complete views of a 3D object. For polyhedral objects with very distinctive faces, for example, a representative model can be constructed for each face. A representative model should be rich in representative patterns, where a pattern is richer if it contains more chains, and more relationships between chains. Chains that are the result of self-occlusion due to the pose of an object should not be included in representative models, since they are the result of an accident of viewpoint. Closed chains should be avoided in representative models since there are too many ways to align them with an image chain; there is one way to align two open chains of the same size, but there are t ways to align two closed chains of size t triples. Automatic generation of representative models from full 3D models is possible.

The representative pattern for each representative model is computed once and stored. This consists of all triple chains, sub-chains, and related sub-chains of different base chains. An image is reduced to its representative pattern representation for matching. It may be sufficient to match the base chains in the image and omit computing sub-chains. However, if image segmentation is poor, for example, if an edge is often broken into several short segments, it may be advantageous to compute and match sub-chains in the image as well as in the models, as we did in our experiments.

4.2. Consistent labeling

We labeled model sub-chains with image sub-chains using discrete relaxation.(14) Bhanu and Faugeras(3) used a more complex form of relaxation labeling to label individual line segments, but our approach of labeling larger, more abstract units allows us to use a simpler approach. The units are the model sub-chains and the labels are the image sub-chains. The unary unit-label constraints are the possible sub-chain matches based on chain size alone. As defined in Section 2.2, chain size is the number of triples making up the chain, and a triple represents a viewpoint invariant arrangement of three line segments that is not dependent on the length of the line segments and only loosely dependent on the angles between them. Higher order constraints are built by labeling more and more related chains, and the constraints are propagated.

Initially, nodes (sub-chains) in the model's chain graph are assigned all image sub-chains of the same size as possible labels. Next, all possible duos of related model sub-chains are labeled by duos of image sub-chains
related in the same way. In the next level of relaxation labeling, model sub-chains are grouped into related trios and possible labels are assigned. At the end of three levels of labeling, all groups of three related model chains will be labeled. Since three chains contain at least three triples sharing at most two lines, at least four line segments will be matched, enough to estimate a transformation and attempt verification.

The consistent labeling constraints are stated formally below for the first three orders. Let (rel) = triple(pair type)triple denote a relationship between two chains as described in Section 2. The three elements of (rel) are the numbers of the triples in the two chains which share a pair relationship, and the type of the pair relationship between them. Units U = {model sub-chains}. Labels L = {image sub-chains}. Then

Unary constraints:
U × L = {(u, l): u ∈ U, l ∈ L and size(u) = size(l)}.

Binary constraints (chain graph):
U × U = {(u1, u2): u1, u2 ∈ U and u1(rel)u2 for some (rel)}.
L × L = {(l1, l2): l1, l2 ∈ L and l1(rel)l2 for some (rel)}.
(U × L)² = {((u1, l1), (u2, l2)): (u1, l1), (u2, l2) ∈ U × L, (u1, u2) ∈ U × U, (l1, l2) ∈ L × L, and the value of (rel) is the same for the two pairs}.

Ternary constraints (related chain graph):
(U × L)³ = {((u1, l1), (u2, l2), (u3, l3)): ((u1, l1), (u2, l2)), ((u2, l2), (u3, l3)) ∈ (U × L)²}.

4.3. Scoring matches

When a match is hypothesized between one or more sub-chains using relaxation labeling, it is necessary to assign a score to it so that all hypothesized matches can be ordered for verification. One way to do this is to estimate the similarity between each matched image and model chain, which can be expressed as a conditional probability: Prob(ImageChain | ModelChain). The probabilities can either be estimated from sample images, or propagated up the feature pyramid.

For our experiments, we treated determination of the match scores as a calibration step of our imaging set-up and estimated them by analyzing 20 sample images of four different models. The representative pattern for each image was computed, the sizes of the image chains were manually compared with the sizes of the corresponding model chains, and the results were collected to estimate Prob(ImageChainSize = I | ModelChainSize = M). We then used these same probabilities to score the matches for all subsequent experiments.

At each level of relaxation labeling, the scores of all the individually matched chains have to be combined
to score the complete match. We do this simply by adding the scores for the individual matches. Thus at level one the maximum score is 1.0, at level two it is 2.0, and at level three it is 3.0. Additive scoring is reasonable because more inter-relationships among matched chains constitute stronger evidence for the match. If the score is only used to rank hypotheses for verification, more sophisticated methods of combining evidence from the individual matches are not required.
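A minimal sketch of the first two labeling levels with this additive scoring follows. The Chain record, the (rel) strings, and the score function standing in for the estimated Prob(ImageChainSize | ModelChainSize) are all illustrative assumptions; a full implementation would add the trio level and propagate the constraints.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Chain:
        name: str
        size: int  # chain size in triples (Section 2.2)

    def unary_labels(models, images, score):
        """Level one: a model sub-chain may be labeled by any image sub-chain
        of the same size (the unary constraint U x L); each label carries its
        estimated probability as a score, so the maximum here is 1.0."""
        return {u: [(l, score(l.size, u.size)) for l in images
                    if l.size == u.size]
                for u in models}

    def duo_labels(unary, model_rels, image_rels):
        """Level two: related model sub-chains (u1, u2) keep only label pairs
        (l1, l2) related by the same (rel) value; scores add, so the maximum
        here is 2.0."""
        out = []
        for u1, u2, rel in model_rels:
            for l1, s1 in unary[u1]:
                for l2, s2 in unary[u2]:
                    if (l1, l2, rel) in image_rels:
                        out.append(((u1, l1), (u2, l2), s1 + s2))
        return sorted(out, key=lambda h: -h[2])

    # Toy example in the spirit of Fig. 5: model sub-chains A0 (1 triple) and
    # F2 (2 triples) related by "t0(00)t1", image chains X0 and Z0 likewise.
    A0, F2 = Chain("A0", 1), Chain("F2", 2)
    X0, Z0 = Chain("X0", 1), Chain("Z0", 2)
    u = unary_labels([A0, F2], [X0, Z0], lambda i, m: 1.0)
    print(duo_labels(u, {(A0, F2, "t0(00)t1")}, {(X0, Z0, "t0(00)t1")}))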
4.4. Verifying hypothesized matches

If relaxation labeling is stopped after three levels, all singles, duos and trios of related model sub-chains will be labeled by all possible image singles, duos and trios. Some model chains may have several potential labels and some may have none. Each labeling is scored, and each represents a hypothesized match between model and image features. A group of related model sub-chains may have multiple possible image labels for several reasons: the model may occur more than once in the image; the model may be symmetric and match the image in more than one orientation; or incorrect labels may be assigned because of poor segmentation, because of the purely topological nature of triple chains, or because matching is being attempted to the wrong model.

Each hypothesized match of related sub-chains can be expressed as a set of matched line segments or junctions. (In general, matches between the free endpoints of open chains are unreliable.) The matched representative model features correspond to features in the full 3D model from which the representative model was obtained. Verification begins by estimating the geometric transformation from the 3D model to the image implied by the matches, using some convenient technique.(14) The highest scoring hypotheses with enough matched features to compute the transformation should be verified first. Once a transformation has been estimated, verification could be completed by a back-projection technique.(4) However, if the model is large and complex enough, different parts of it may have been labeled independently, so another possibility is to look for other independently hypothesized sub-chain matches that are consistent with the transformation. If the model is rich enough to allow this approach, it is much easier to compute than back-projection comparisons.
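A minimal sketch of this ordering and corroboration step follows. The Hypothesis record and the agreement test standing in for "consistent with the transformation" are our simplifications; a full verifier would actually estimate the 3D-to-2D transformation.

    from dataclasses import dataclass

    @dataclass
    class Hypothesis:
        score: float
        triples_matched: int
        matches: dict  # model line segment -> image line segment

    def verification_queue(hypotheses, min_triples=3):
        """Order hypotheses best score first, keeping only those with enough
        matched features to fix a transformation (at least three matched
        triples, the criterion used in the experiments of Section 5)."""
        return sorted((h for h in hypotheses if h.triples_matched >= min_triples),
                      key=lambda h: -h.score)

    def corroborates(h1, h2):
        """Cheap stand-in for transformation consistency: two independently
        hypothesized matches corroborate each other if every model segment
        they both label is labeled with the same image segment."""
        shared = h1.matches.keys() & h2.matches.keys()
        return all(h1.matches[s] == h2.matches[s] for s in shared)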
5. MATCHING EXPERIMENTS
The matching strategy outlined above has been implemented and applied to 33 real images of one of four different machined parts; each of these images is a picture of a single part. Several images containing more than one part were also analyzed. Each part is a polyhedron described by a geometric modeling system (PADL-2(15)), and a representative model was made for each one consisting of the outline of the most complex face (Fig. 6). The objects are named CUBECUT, TARROW, WIDGET and CUBE3CUT. The images each contain at least one instance of one of the four representative models.

Fig. 6. Representative models. The models for the four polyhedral objects used in the matching experiments. The face of each object with the most complex representative pattern was used. The idea behind the matching scheme is to look for these models in the images. The representative patterns are drawn as broken lines. (a) The CUBECUT object has two chains, which are related. (b) The TARROW object has two chains, which are not related. (c) The CUBE3CUT object has six chains which are richly related. (d) The WIDGET object has curved line segments which make its representative pattern somewhat ambiguous. The circles were broken into three segments and the semi-circles into two by our image processing routines.

The images were captured with a video camera system, using only the normal overhead lights in the room, and transferred to a SUN-3 workstation where they were processed using standard image processing routines with fixed parameters (GIPSy(16)). Four of the intensity images of single objects are shown in Fig. 7. Step edges were extracted using the Sobel operator with a threshold that kept only the strongest edges. The edges were then thinned, corners were detected, and the resulting line segments were classified as straight or conic. Finally, all lines less than nine pixels in length were discarded. Fixed thresholds and parameters, determined by analyzing six sample images in detail, were used to process all images. No attempt was made to optimize the results for each image.

Fig. 7. Sample images. Images typical of the 33 pictures of single objects analyzed in Table 1. (a) CUBECUT object, (b) TARROW object, (c) CUBE3CUT object, (d) WIDGET object.

A file of the final line segments including endpoint coordinates, length and classification was input to a series of C programs that detect the triple chains, build the chain graph and perform three levels of relaxation labeling. Sub-chains were used for matching in both the models and images. The final output is a list of hypothesized model-image line segment matches ordered by similarity score. Verification was done manually.
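As a rough illustration of the fixed-parameter edge extraction stage described above, the sketch below uses NumPy/SciPy in place of the GIPSy routines; the threshold value is an assumption standing in for the calibrated one, and the component-length filter is a crude stand-in for the thinning, corner-detection and line-classification stages.

    import numpy as np
    from scipy import ndimage

    EDGE_THRESHOLD = 100.0  # assumed value; the real one was fixed by calibration
    MIN_LENGTH = 9          # discard lines shorter than nine pixels

    def strong_edges(image):
        """Sobel gradient magnitude, thresholded to keep only the strongest
        step edges."""
        gx = ndimage.sobel(image.astype(float), axis=1)
        gy = ndimage.sobel(image.astype(float), axis=0)
        return np.hypot(gx, gy) > EDGE_THRESHOLD

    def drop_short_components(edges):
        """Remove connected edge components with fewer than MIN_LENGTH pixels,
        a crude stand-in for discarding short line segments."""
        labels, n = ndimage.label(edges)
        sizes = ndimage.sum(edges, labels, index=range(1, n + 1))
        keep = [i + 1 for i, s in enumerate(sizes) if s >= MIN_LENGTH]
        return np.isin(labels, keep)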
Similarity scores for hypothesized matches were assigned on the basis of chain size alone and were determined before the experiments were run, by analyzing several sample images to estimate the conditional probabilities that an image chain would be of size I whenever a corresponding model chain is of size M, for all possible combinations of I and M (see Section 4.3). Refinements were attempted on the basis of other chain properties (open or closed, inclusion of curved line segments), but they did not improve results. In fact, since the image processing was rather rudimentary, the refinements tended to rule out too many correct matches.

For the images of single parts, another step was actually done before the relaxation labeling: the four representative models were ranked using a summary feature vector from the feature pyramid described in Section 2. Counts of each type of feature in the image and models were compared, an approach similar to that used by Lu to determine the pose of a known object.(6) In our experiments, the correct model was usually ranked either first or second, but the results will not generalize to more complicated images with multiple objects where such simple counts would not
suffice. The rankings, though not always correct, were used to order the models to be labeled for the images containing one object. For the images containing multiple objects this ranking approach proved useless, and all four models were matched to each image and the results merged by similarity score. Since labeling is very fast (2-3 s per model on an IBM PC/AT for the multiple object images), this approach is not unreasonable.

Figures 8(a)-(d) illustrate the matching results for four images, one of each object used in the experiments. In each figure, the best match to the correct model is shown with heavier lines. The representative model is on the left and the edge image on the right. Correctly matched endpoints are circled.

Fig. 8. Sample matching results. The results of matching the images in Fig. 7 to the correct model. For each image, the model is sketched on the left and the line segments used for matching on the right. Matches are shown as heavy lines with the correctly matched endpoints circled. Incorrectly matched endpoints are shown as heavy dots. (a) CUBECUT object, (b) TARROW object, (c) CUBE3CUT object, (d) WIDGET object.

The highest scoring hypothesis was nearly correct for the CUBECUT object in Fig. 8(a). Seven matched points were hypothesized for verification, although two of them are incorrect. In the image, the outline of the object is traced as a chain, even though it follows
a shape caused by self-occlusion. One possible approach to this phenomenon would be to ignore obviously occluded lines in the image during chain formation, such as lines that form the stem of a T-junction. In this image, one of the corners also failed to be detected. Even with these problems, the match was reasonable.

The TARROW object was matched fairly well to its model on the seventh hypothesis, as illustrated in Fig. 8(b). This model is very weak because it includes no related chains, just two short, unrelated chains. As a consequence, our routines were unable to determine the correct model for matching, and attempted to label several wrong models before this match was found.

The first three hypotheses were correct for the CUBE3CUT object in Fig. 8(c). Each contained enough matched features to compute a transformation and they could be used to corroborate each other. This is
an especially good image for the representative model used.

Figure 8(d) illustrates the WIDGET object, which has curved line segments. After four tries, one end was partly matched, but only three of the five matched points are correct. The problem we encountered with most of the images containing curved lines was that the curves were broken into different numbers of segments by the image processing routines, depending on the orientation of the object.

The results for all 33 images of single objects are summarized in Table 1. The images averaged 22 line segments of at least 9 pixels in length, 14 junctions, 10 triples, and 5 triple chains, which were sub-divided into 21 sub-chains. On average, there were 22 relationships between groups of two sub-chains, and 47 relationships between groups of three sub-chains.
Table 1. Summary of results for images of one object

                                              Object
                                    CUBECUT  CUBE3CUT  WIDGET  TARROW      All
Images                                    7        11       7       7       33
Line segments                          16.8      22.5    27.0    24.1     22.4
Junctions                               8.1      16.6    18.3    12.7     14.1
Triples                                 6.0      11.3    12.0    10.7     10.0
Triple pairs                            7.1      14.6    15.4    16.6     13.4
Triple chains                           2.6       6.2     6.1     4.1      4.9
Sub-chains                             13.0      21.5    25.0    25.7     21.1
Duos                                    3.8      21.7    16.9    51.0     22.6
Trios                                   6.9      24.7    26.7   148.9     47.2
Models until correct one                1.3       2.0     1.6     2.9      1.9
Matches of ≥ 3 triples                  1.8      11.4     7.3    10.7      8.0
Hypotheses until first correct one      1.1       3.5     5.4    10.7      4.9
Images with no correct matches            1         0       1       3  5 (15%)
Seconds                                 4.5       9.4    10.7    13.3      9.3

All values are averages based on the number of images. The "Seconds" row reports time for feature detection and matching on an IBM PC/AT.
Of the hypothesized matches (some to incorrect models) after three levels of relaxation, an average of 8 included enough features to attempt verification (the criterion used was at least three matched triples), and four hypotheses had to be verified on the average before a correct transformation was found. After the image processing was completed, detection of triple chains and hypothesis generation averaged 9 s on an IBM PC/AT.

A more difficult image, containing three of the modeled objects, is shown in Fig. 9. The intensity image is shown in Fig. 9(a). The CUBE3CUT object is occluded, part of the CUBECUT object is in shadow, and part of the WIDGET object is outside of the image. Figure 9(b) shows the results of applying the Sobel edge operator, Fig. 9(c) shows the results after thinning the edge image, and Fig. 9(d) shows the final line segments of at least nine pixels in length, which were used for matching. The final set of line segments includes only the strongest edges in the image, and small noise lines have been eliminated. For this image and other images of multiple objects, line segments forming the stems of T-junctions were ignored when constructing representative patterns.

The best matches to the CUBE3CUT and CUBECUT models are indicated by heavier lines in Fig. 9(d); no correct hypotheses were generated for the WIDGET model. Three of the four highest scoring hypothesized matches to the CUBE3CUT object were correct, and these overlapping matches covered the visible part of its top. The highest scoring match was of three triples; the other two were of five triples each. The highest scoring match to the CUBECUT object, consisting of three triples, was correct. These results illustrate the
usefulness of the ability to match sub-chains when an object is occluded or only partially detected.

Three other images with multiple objects are shown in Fig. 10. The first image (Figs 10(a) and (b)) contains the CUBE3CUT, CUBECUT and WIDGET objects. The CUBE3CUT object is occluded, and the CUBECUT and WIDGET objects are poorly detected. No good hypotheses were generated for the latter two objects, but the best match to the CUBE3CUT model (Fig. 6(c)) includes five triples and is correct.

The image in Figs 10(c) and (d) contains spurious lines in addition to three of the modeled objects. The objects are again only partially visible. In addition, artifacts of the image processing routines caused a corner to be missed on the CUBECUT object. Nevertheless, a good match of four triples to the CUBE3CUT model was found on the fourth hypothesis. The rectangles in the image add closed chains to its representative pattern, and they contribute many possible image sub-chain labels for model sub-chains. The best match to the CUBECUT model is close, and would be correct if the corner were properly detected.

The image in Fig. 10(e), containing two of the modeled objects plus a third unknown object, required 43 s to analyze on the PC and 10 s on a SUN-3 workstation. The first hypothesized match to the CUBE3CUT model was correct and consisted of three triples. Figure 10(f) shows the edge image with the best match to the CUBE3CUT model. Four other corroborating matches were made, covering the rest of the top of the CUBE3CUT object. Ignoring the stems of T-junctions when computing the image representative pattern was very helpful for this object because the T-junctions are the result of self-occlusion. However, no good matches were hypothesized to the CUBECUT model, in part because some of the ignored T-junction stems belong to the top of the CUBECUT object and are the result of occlusion by other objects in the picture.
Fig. 9. An image with multiple objects. An image containing three of the objects. (a) Intensity image, (b) edge image, (c) thinned edge image, (d) line segments used for matching, with single best matches to the CUBE3CUT and CUBECUT models indicated as described for Fig. 8. Two other corroborating matches to the CUBE3CUT model covered the rest of the top of this object.
Fig. 10. Three images with multiple objects. For each of the three images, the intensity image is shown on the left and the line segments used for matching on the right, with the best matches indicated as described for Fig. 8. The models are shown in Fig. 6. (a) Intensity image of three objects and (b) matches to the CUBE3CUT model. (c) Intensity image with three objects plus spurious lines and (d) best matches to the CUBE3CUT and CUBECUT models. The match to the CUBECUT model is incorrect because one of the inside corners was missed by the image processing routines. (e) Intensity image with two modeled objects plus another object and (f) best matches to the CUBE3CUT model. The line segments which form the stems of T-junctions in the image were ignored when the representative patterns were computed.
6. DISCUSSION

Our goal was to define representative patterns and investigate their usefulness for the hypothesis generation phase of a model-based matching procedure. The patterns based on triple chains are abstract and flexible. They can be computed automatically for both models and images, and their modularity and hierarchical construction facilitate matching of incompletely detected features.

Representative patterns can be employed in any consistent labeling scheme. The strategy discussed here takes advantage of the modular properties of patterns, which we treat as clues to the identity of an object. Since no attempt is made to infer 3D structure from the patterns, they are equally applicable to matching 2D objects.

The most encouraging outcome of this work is that it is not necessary to have perfectly segmented images to make some sense out of them. As long as at least one triple can be found, hypotheses can be made for verification. It is even possible to use default parameter values for all image processing and feature detection operations.
REFERENCES
1. L. G. Roberts, Machine perception of three-dimensional solids, Optical and Electro-optical Information Processing, J. P. Tippett, ed. MIT Press, Cambridge, Massachusetts (1965).
2. R. C. Bolles and R. A. Cain, Recognizing and locating partially visible objects: the local-feature-focus method, Int. J. Robotics Res. 1, 57 (1982).
3. B. Bhanu and O. D. Faugeras, Shape matching of two-dimensional objects, IEEE Trans. Pattern Analysis Mach. Intell. 6, 137 (1984).
4. D. G. Lowe, Three-dimensional object recognition from single two-dimensional images, Artif. Intell. 31, 355 (1987).
5. E. De Micheli, B. Caprile, P. Ottonello and V. Torre, Localization and noise in edge detection, IEEE Trans. Pattern Analysis Mach. Intell. 11, 1106 (1989).
6. H. Lu and L. G. Shapiro, A relational pyramid approach to view class determination, IEEE Workshop on Interpretation of 3-D Scenes, November (1989).
7. N. T. Chu and L. G. Shapiro, Experiments in model-based matching using a relational pyramid representation, SPIE Conf. on Applications of Artificial Intelligence, April (1990).
8. S. T. Barnard, Choosing a basis for perceptual space, Comput. Vision Graphics Image Process. 29, 87 (1985).
9. S. Linnainmaa, D. Harwood and L. S. Davis, Pose determination of a three-dimensional object using triangle pairs, IEEE Trans. Pattern Analysis Mach. Intell. 10, 634 (1988).
10. M. Dhome, M. Richetin, J. T. LaPreste and G. Rives, Determination of the attitude of three-dimensional objects from a single perspective view, IEEE Trans. Pattern Analysis Mach. Intell. 11, 1265 (1989).
11. Y. Liu, T. S. Huang and O. D. Faugeras, Determination of camera location from a single perspective view, IEEE Trans. Pattern Analysis Mach. Intell. 12, 28 (1990).
12. R. Mohan and R. Nevatia, Using perceptual organization to extract 3-D structures, IEEE Trans. Pattern Analysis Mach. Intell. 11, 1121 (1989).
13. R. M. Haralick and L. G. Shapiro, The consistent labeling problem: part I, IEEE Trans. Pattern Analysis Mach. Intell. 1, 173 (1979).
14. D. H. Ballard and C. M. Brown, Computer Vision. Prentice-Hall, Englewood Cliffs, New Jersey (1982).
15. PADL-2, Production Automation Project, College of Engineering and Applied Science, The University of Rochester, Rochester, New York.
16. GIPSY, General Image Processing System, Intelligent Systems Laboratory, Electrical Engineering Department, University of Washington, Seattle, Washington.
About the Author--JORJA HENIKOFF received the B.S. degree in mathematics from Stanford University in 1968 and the M.A. degree in mathematics from Boston University in 1972. She worked in the data processing industry for 15 years and then received the M.S.E.E. degree in electrical engineering from the University of Washington in 1990. She is currently self-employed.
About the Author--LINDA G. SHAPIRO was born in Chicago, Illinois, in 1949. She received the B.S. degree in mathematics from the University of Illinois, Urbana, in 1970, and the M.S. and Ph.D. degrees in computer science from the University of Iowa, Iowa City, in 1972 and 1974, respectively. She was an Assistant Professor of Computer Science at Kansas State University, Manhattan, from 1974 to 1978 and was an Assistant Professor of Computer Science from 1979 to 1981 and Associate Professor of Computer Science from 1981 to 1984 at Virginia Polytechnic Institute and State University, Blacksburg. She was Director of Intelligent Systems at Machine Vision International in Ann Arbor from 1984 to 1986. She is currently Professor of Computer Science and Engineering and of Electrical Engineering at the University of Washington. Her research interests include computer vision, artificial intelligence, pattern recognition, robotics, and spatial database systems. She has co-authored two textbooks, one on data structures and one on computer and robot vision. Dr Shapiro is a senior member of the IEEE Computer Society and a member of the Association for Computing Machinery, the Pattern Recognition Society, and the American Association for Artificial Intelligence. She is Editor of CVGIP: Image Understanding and an editorial board member of IEEE Transactions on Pattern Analysis and Machine Intelligence and of Pattern Recognition. She was General Chairman of the IEEE Workshop on Directions in Automated CAD-based Vision in 1991, General Chairman of the IEEE Conference on Computer Vision and Pattern Recognition in 1986, General Chairman of the IEEE Computer Vision Workshop in 1985, and Co-program Chairman of the IEEE Computer Vision Workshop in 1982; and she has served on the program committees of a number of vision and AI workshops and conferences.