
COMPUTER VISION AND IMAGE UNDERSTANDING
Vol. 63, No. 1, January, pp. 182–195, 1996, Article No. 0013

NOTE

A Probabilistic Approach to Geometric Hashing Using Line Features

FRANK C. D. TSAI

IBM Zürich Research Laboratory, Säumerstrasse 4, CH-8803 Rüschlikon, Switzerland

Received June 14, 1993; accepted October 20, 1994

Most current object recognition algorithms assume reliable image segmentation, which in practice is often not available. We examine the combination of the Hough transform with a variation of geometric hashing as a technique for model-based object recognition in seriously degraded single intensity images. Prior work on the performance analysis of geometric hashing has focused on point features, which can be hard to detect in an environment affected by serious noise and occlusion. This paper uses line features to compute recognition invariants in a potentially more robust way. We investigate the statistical behavior of these line features analytically. Various viewing transformations, which 2-D (or flat 3-D) objects undergo during image formation, are considered. For the case of affine transformations, which are often suitable substitutes for more general perspective transformations, we show experimentally that the technique is noise resistant and can be used in highly occluded environments. © 1996 Academic Press, Inc.

1. INTRODUCTION

Visual object recognition can be complicated by the partial overlapping of the objects or the possible existence of occluding or unfamiliar objects in the scene. Prior research (see surveys [2, 3, 6]) has indicated that a model-based approach to object recognition can be very effective in overcoming occlusion, complication, and inadequate or erroneous low-level processing. A model-based paradigm, geometric hashing, was proposed by Lamdan et al. with application to point features [19]. The analysis of geometric hashing on point sets using bounded error models shows considerable noise sensitivity [11, 21]. Introducing Gaussian error models in conjunction with a Bayesian evidence accumulation scheme alleviates the problem significantly [29, 33]. However, the locations of point features can be hard to detect in noisy scenes. Line features are more robust and can be extracted by the Hough transform method with greater accuracy, since more image points are involved [16]. Although point features can be obtained by intersecting lines, this increases uncertainty. It can thus be advantageous to work directly

on line features by computing the line invariants using a combination of lines [13]. We examine line invariants under various transformations and investigate their statistical behavior under uncertainty. We impose minimal segmentation requirements; only positional information on edgels is assumed available. The method presented combines the Hough transform, geometric hashing, and the Bayesian weighted voting scheme of Rigoutsos [28].

This paper is organized as follows. In Section 2, we briefly review the original geometric hashing method; closed-form formulas for line invariants under various viewing transformations are given in Section 3. Since no reliable segmentation is assumed, we extract our line features by means of an improved Hough transform technique that operates on seriously degraded environments. In Section 4, we model the statistical behavior of detected line parameters as a Gaussian random process under mild assumptions and then analyze the statistics of the computed invariants. This analysis yields a measure of the closeness of two invariants, upon which a Bayesian maximum likelihood pattern classification is based. Section 5 describes the resulting improvement of the recognition procedure. We have implemented a system that makes use of the above techniques for the case of affine transformations; this is described in Section 6, where we also discuss possible best-least-squares match procedures for model registration and present experimental results. Section 7 discusses some observations on the approach and possible future extensions.

2. BRIEF REVIEW OF THE GEOMETRIC HASHING METHOD

The geometric hashing idea has its origins in work of Schwartz and Sharir [34]. Application of the idea to model-based visual recognition was described by Lamdan et al. This section outlines the original method; a more complete description can be found in Lamdan's dissertation [18]; a parallel implementation can be found in [31, 32]; for later developments of the technique, refer to [9, 10, 12, 15, 23, 24, 29, 36].

The geometric hashing method proceeds in two stages: a preprocessing stage and a recognition stage. In the preprocessing stage, we construct a model representation by computing and storing in a hash table redundant, transformation-invariant model information derived from local features of model objects. In particular, for each model, $M$, all bases consisting of $k$ points ($k$ depending on the transformation the model object undergoes during image formation) are exploited. For every feasible basis, $B$, the invariants of all the remaining model points are computed in terms of $B$, and $(M, B)$ is then stored in the hash entries indexed by these invariants. During the subsequent recognition stage, the same invariants are computed from features in a scene and used as indexing keys to retrieve possible matches with model features from the hash table. If a model's features score a sufficiently large number of matches, we hypothesize the existence of an instance of that model in the scene.
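For concreteness, the following minimal Python sketch implements the two-stage scheme for point features. The helpers `invariants` and `quantize`, and the feature representation, are illustrative assumptions, not something prescribed by the original formulation.

```python
from collections import defaultdict
from itertools import permutations

def preprocess(models, k, invariants, quantize):
    """Stage 1: store (model, basis) under every invariant it produces.
    models: {model_id: list of features}; invariants(basis, f) and
    quantize(...) are assumed, hypothetical helpers."""
    table = defaultdict(list)
    for m_id, features in models.items():
        for basis in permutations(features, k):   # every feasible k-feature basis
            for f in features:
                if f in basis:
                    continue
                table[quantize(invariants(basis, f))].append((m_id, basis))
    return table

def recognize(scene_features, basis, table, invariants, quantize):
    """Stage 2: probe one scene basis and tally votes for (model, basis) pairs."""
    votes = defaultdict(int)
    for f in scene_features:
        if f in basis:
            continue
        for entry in table.get(quantize(invariants(basis, f)), ()):
            votes[entry] += 1                     # one vote per retrieved entry
    return votes  # hypothesize the models whose score is sufficiently large
```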

3. LINE INVARIANTS UNDER VARIOUS TRANSFORMATION GROUPS

Potentially relevant transformation groups include rigid, similarity, affine, and projective transformations, depending on the manner in which an image is formed. Each of these transformation classes is a subgroup of the full group of projective transformations. Since a projective transformation is a collineation [17], all these transformations preserve lines. This allows us to use line features as inputs to the recognition procedures, because the feature type is invariant. The idea behind the geometric hashing method is to encode local geometric features in a manner which is invariant under the geometric transformation considered [20]. The same technique can apply directly to line features, without resorting to point features indirectly derived from lines, provided that we use some method of encoding line features in an invariant way [13, 20]. In the following, we present the formulas of the invariants under various transformation groups. For additional detail, refer to [36]. Throughout the following discussion, we represent a line by its normal parameterization $(\theta, \rho)$ with $\theta \in [0, \pi)$ and $\rho \in \mathbb{R}$. If a computed invariant $(\theta', \rho')$ has $\theta'$ outside $[0, \pi)$, we adjust it by adding $\pi$ or $-\pi$ and accordingly flipping the sign of $\rho'$.

3.1. Line Invariants under Projective Transformations

We need five lines to define a shape signature under the projective group; given an ordered quadruplet of lines $(\theta_i, \rho_i)^t_{i=1,2,3,4}$ in general position (i.e., no three lines are parallel to one another or intersect at a common point), any fifth line $(\theta, \rho)^t$ can be encoded in terms of the quadruplet as a basis, in a projective-invariant way. One way to encode the fifth line, $(\theta, \rho)^t$, is to find a transformation $T$ which maps the quadruplet to a canonical basis, e.g., $(0, 0)^t$, $(\pi/2, 0)^t$, $(0, 1)^t$, and $(\pi/2, 1)^t$, then apply $T$ to the fifth line. It can be shown that the invariant $(\theta', \rho')^t$ of $(\theta, \rho)^t$ in terms of $(\theta_i, \rho_i)^t_{i=1,2,3,4}$ is

$$\theta' = \tan^{-1}\!\left(\frac{DEF}{ABC}\right), \qquad \rho' = \frac{-GCF}{\sqrt{(ABC)^2 + (DEF)^2}},$$

where

$$\begin{aligned}
A &= \rho_1 \sin(\theta_2 - \theta_3) + \rho_2 \sin(\theta_3 - \theta_1) + \rho_3 \sin(\theta_1 - \theta_2),\\
B &= \rho_4 \sin(\theta - \theta_2) - \rho_2 \sin(\theta - \theta_4) + \rho \sin(\theta_2 - \theta_4),\\
C &= \rho_4 \sin(\theta_1 - \theta_3) - \rho_3 \sin(\theta_1 - \theta_4) + \rho_1 \sin(\theta_3 - \theta_4),\\
D &= -\rho_3 \sin(\theta - \theta_1) + \rho_1 \sin(\theta - \theta_3) - \rho \sin(\theta_1 - \theta_3),\\
E &= \rho_4 \sin(\theta_1 - \theta_2) - \rho_2 \sin(\theta_1 - \theta_4) + \rho_1 \sin(\theta_2 - \theta_4),\\
F &= \rho_4 \sin(\theta_2 - \theta_3) - \rho_3 \sin(\theta_2 - \theta_4) + \rho_2 \sin(\theta_3 - \theta_4),\\
G &= \rho_2 \sin(\theta - \theta_1) - \rho_1 \sin(\theta - \theta_2) + \rho \sin(\theta_1 - \theta_2).
\end{aligned}$$

3.2. Line Invariants under Rigid, Similarity, and Affine Transformations

Using similar methods to define shape signatures, we need a pair of nonparallel lines to encode a third line under the Euclidean group, a triplet of lines, not all parallel, to encode a fourth line under the similarity group, and a triplet of nonparallel lines to encode a fourth line under the affine group.

For the Euclidean group, the invariant $(\theta', \rho')^t$ of a third line $(\theta, \rho)^t$ in terms of the basis pair $(\theta_i, \rho_i)^t_{i=1,2}$ is

$$\theta' = \theta - \theta_1 + \frac{\pi}{2}, \qquad \rho' = \rho + \csc(\theta_1 - \theta_2)\left(\rho_2 \sin(\theta - \theta_1) - \rho_1 \sin(\theta - \theta_2)\right),$$

or

$$\theta' = \theta - \theta_1 - \frac{\pi}{2}, \qquad \rho' = \rho + \csc(\theta_1 - \theta_2)\left(\rho_2 \sin(\theta - \theta_1) - \rho_1 \sin(\theta - \theta_2)\right),$$

if $(\theta_i, \rho_i)^t_{i=1,2}$ are mapped to $(\pi/2, 0)^t$ and $(w, 0)^t$, respectively, where $w$ is fixed though unknown. Note that this mapping involves a 180° rotation ambiguity. This ambiguity can be easily broken if we know the position of a third line during verification.


For the similarity group, the invariant $(\theta', \rho')^t$ of a fourth line $(\theta, \rho)^t$ in terms of the basis triplet $(\theta_i, \rho_i)^t_{i=1,2,3}$ is (without loss of generality, assuming $(\theta_1, \rho_1)^t$ intersects $(\theta_i, \rho_i)^t_{i=2,3}$)

$$\theta' = \theta - \theta_1 + \frac{\pi}{2}, \qquad \rho' = \sin(\theta_1 - \theta_3)\,\frac{B}{|A|},$$

where

$$\begin{aligned}
A &= \rho_1 \sin(\theta_2 - \theta_3) + \rho_2 \sin(\theta_3 - \theta_1) + \rho_3 \sin(\theta_1 - \theta_2),\\
B &= \rho_2 \sin(\theta - \theta_1) - \rho_1 \sin(\theta - \theta_2) + \rho \sin(\theta_1 - \theta_2),
\end{aligned}$$

if $(\theta_i, \rho_i)^t_{i=1,2,3}$ are mapped to $(\pi/2, 0)^t$, $(w_1, 0)^t$, and $(w_2, \sin|\theta_1 - \theta_3|)^t$, respectively, with $w_1$ and $w_2$ fixed though unknown.

For the affine group, the invariant $(\theta', \rho')^t$ of a fourth line $(\theta, \rho)^t$ in terms of the basis triplet $(\theta_i, \rho_i)^t_{i=1,2,3}$ is

$$\theta' = \tan^{-1}\!\left(\frac{\csc(\theta_1 - \theta_3)\sin(\theta_1 - \theta)\csc(\theta_1 - \theta_2)A}{\csc(\theta_2 - \theta_3)\sin(\theta_2 - \theta)\csc(\theta_1 - \theta_2)A}\right), \tag{1}$$

$$\rho' = \frac{1}{t}\left(\rho_1 \csc(\theta_1 - \theta_2)\sin(\theta_2 - \theta) - \rho_2 \csc(\theta_1 - \theta_2)\sin(\theta_1 - \theta) + \rho\right), \tag{2}$$

where

$$t = \sqrt{\csc^2(\theta_1 - \theta_3)\sin^2(\theta_1 - \theta) + \csc^2(\theta_2 - \theta_3)\sin^2(\theta_2 - \theta)}\;\bigl|\csc(\theta_1 - \theta_2)A\bigr|,$$

$$A = \rho_1 \sin(\theta_2 - \theta_3) + \rho_2 \sin(\theta_3 - \theta_1) + \rho_3 \sin(\theta_1 - \theta_2),$$

if $(\theta_i, \rho_i)^t_{i=1,2,3}$ are mapped to $(0, 0)^t$, $(\pi/2, 0)^t$, and $(\pi/4, 1/\sqrt{2})^t$, respectively.
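As a concrete reading of Eqs. (1) and (2), the sketch below evaluates the affine-group invariant numerically. The quadrant-aware `arctan2` and the fold back into $[0, \pi)$ are implementation choices consistent with the adjustment rule stated at the beginning of this section; the function name and interface are illustrative only.

```python
import numpy as np

def affine_line_invariant(theta, rho, basis):
    """Invariant (theta', rho') of line (theta, rho) w.r.t. a triplet of
    pairwise nonparallel basis lines, following Eqs. (1) and (2).
    basis: [(theta1, rho1), (theta2, rho2), (theta3, rho3)]."""
    (t1, r1), (t2, r2), (t3, r3) = basis
    A = r1*np.sin(t2 - t3) + r2*np.sin(t3 - t1) + r3*np.sin(t1 - t2)
    csc12 = 1.0 / np.sin(t1 - t2)
    # numerator and denominator of the tan^{-1} argument in Eq. (1)
    num = (1.0/np.sin(t1 - t3)) * np.sin(t1 - theta) * csc12 * A
    den = (1.0/np.sin(t2 - t3)) * np.sin(t2 - theta) * csc12 * A
    theta_p = np.arctan2(num, den)   # quadrant-aware tan^{-1}
    t = np.hypot(num, den)           # sqrt(num^2 + den^2) equals t of Eq. (2)
    rho_p = (r1*csc12*np.sin(t2 - theta) - r2*csc12*np.sin(t1 - theta) + rho) / t
    # fold theta' back into [0, pi), flipping the sign of rho' accordingly
    if theta_p < 0.0:
        theta_p, rho_p = theta_p + np.pi, -rho_p
    elif theta_p >= np.pi:
        theta_p, rho_p = theta_p - np.pi, -rho_p
    return theta_p, rho_p
```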

4. THE EFFECT OF NOISE ON THE INVARIANTS

In an environment affected by serious noise and occlusion, line features are usually detected using the Hough transform. The line parameters so detected always deviate slightly from their true values. As long as a line segment is not too short, this induces only a slight perturbation of the line parameters of the segment. It is thus reasonable to assume that detected line parameters differ from the "true" values by a small perturbation having a Gaussian distribution. Similar to the assumptions made to analyze point features [7, 29, 33], the following mild assumptions are made: the measured line parameters of detected lines are statistically independent, and each is distributed according to a Gaussian distribution centered at the true value of the parameter, with a fixed covariance

$$\Sigma = \begin{pmatrix} \sigma_\theta^2 & \sigma_{\theta\rho} \\ \sigma_{\theta\rho} & \sigma_\rho^2 \end{pmatrix}.$$

Based upon these assumptions, we obtain the following general analytical formulas applicable to the perturbation functions of the invariants for all the viewing transformations considered in Section 3. We show that, to a first-order approximation, all these perturbation functions have a Gaussian distribution. From Section 3, the computed invariant is a function of the line being encoded, $(\theta, \rho)^t$, and the basis lines, $(\theta_i, \rho_i)^t_{i=1,\ldots,k}$ ($k$ depending on the specific transformation considered):

$$(\theta', \rho') = f\left((\theta, \rho)^t, (\theta_1, \rho_1)^t, \ldots, (\theta_k, \rho_k)^t\right).$$

Equivalently, we may write the above equation as

$$\theta' = f'\left((\theta, \rho)^t, (\theta_1, \rho_1)^t, \ldots, (\theta_k, \rho_k)^t\right), \tag{3}$$

$$\rho' = f''\left((\theta, \rho)^t, (\theta_1, \rho_1)^t, \ldots, (\theta_k, \rho_k)^t\right). \tag{4}$$

Introducing perturbations of $(\theta, \rho)^t$ and $(\theta_i, \rho_i)^t_{i=1,\ldots,k}$ results in a perturbation of $(\theta', \rho')^t$:

$$\theta' + \delta\theta' = f'\left((\theta + \delta\theta, \rho + \delta\rho)^t, (\theta_1 + \delta\theta_1, \rho_1 + \delta\rho_1)^t, \ldots, (\theta_k + \delta\theta_k, \rho_k + \delta\rho_k)^t\right), \tag{5}$$

$$\rho' + \delta\rho' = f''\left((\theta + \delta\theta, \rho + \delta\rho)^t, (\theta_1 + \delta\theta_1, \rho_1 + \delta\rho_1)^t, \ldots, (\theta_k + \delta\theta_k, \rho_k + \delta\rho_k)^t\right). \tag{6}$$


Expanding Eqs. (5) and (6) in Maclaurin series, ignoring second- and higher-order perturbation terms, and then subtracting Eqs. (3) and (4) from them, we have

$$\begin{aligned}
\delta\theta' &= \frac{\partial\theta'}{\partial\theta}\,\delta\theta + \frac{\partial\theta'}{\partial\rho}\,\delta\rho + \sum_{i=1}^{k}\left(\frac{\partial\theta'}{\partial\theta_i}\,\delta\theta_i + \frac{\partial\theta'}{\partial\rho_i}\,\delta\rho_i\right)\\
&= c_{11}\,\delta\theta + c_{12}\,\delta\rho + c_{13}\,\delta\theta_1 + c_{14}\,\delta\rho_1 + \cdots + c_{1,2k+1}\,\delta\theta_k + c_{1,2k+2}\,\delta\rho_k,
\end{aligned} \tag{7}$$

$$\begin{aligned}
\delta\rho' &= \frac{\partial\rho'}{\partial\theta}\,\delta\theta + \frac{\partial\rho'}{\partial\rho}\,\delta\rho + \sum_{i=1}^{k}\left(\frac{\partial\rho'}{\partial\theta_i}\,\delta\theta_i + \frac{\partial\rho'}{\partial\rho_i}\,\delta\rho_i\right)\\
&= c_{21}\,\delta\theta + c_{22}\,\delta\rho + c_{23}\,\delta\theta_1 + c_{24}\,\delta\rho_1 + \cdots + c_{2,2k+1}\,\delta\theta_k + c_{2,2k+2}\,\delta\rho_k,
\end{aligned} \tag{8}$$

where $c_{11} = \partial\theta'/\partial\theta$, $c_{12} = \partial\theta'/\partial\rho$, $c_{21} = \partial\rho'/\partial\theta$, $c_{22} = \partial\rho'/\partial\rho$, and, for $i = 1, \ldots, k$, $c_{1,2i+1} = \partial\theta'/\partial\theta_i$, $c_{1,2i+2} = \partial\theta'/\partial\rho_i$, $c_{2,2i+1} = \partial\rho'/\partial\theta_i$, and $c_{2,2i+2} = \partial\rho'/\partial\rho_i$.

Let $(\Delta\Theta, \Delta R)$, $(\Delta\Theta_i, \Delta R_i)_{i=1,\ldots,k}$, and $(\Delta\Theta', \Delta R')$ be stochastic variables denoting the perturbations of $(\theta, \rho)$, $(\theta_i, \rho_i)_{i=1,\ldots,k}$, and $(\theta', \rho')$, respectively. From Eqs. (7) and (8), we have

$$(\Delta\Theta', \Delta R')^t = \begin{pmatrix} c_{11} & c_{12} & \ldots & c_{1,2k+1} & c_{1,2k+2} \\ c_{21} & c_{22} & \ldots & c_{2,2k+1} & c_{2,2k+2} \end{pmatrix} V, \tag{9}$$

where $V = (\Delta\Theta, \Delta R, \Delta\Theta_1, \Delta R_1, \ldots, \Delta\Theta_k, \Delta R_k)^t$. Since $p(\Delta\Theta, \Delta R)$ and $p(\Delta\Theta_i, \Delta R_i)_{i=1,\ldots,k}$ are Gaussian and independent, the joint p.d.f. $p(\Delta\Theta, \Delta R, \Delta\Theta_1, \Delta R_1, \ldots, \Delta\Theta_k, \Delta R_k)$ is also Gaussian. More precisely,

$$p(\Delta\Theta, \Delta R, \Delta\Theta_1, \Delta R_1, \ldots, \Delta\Theta_k, \Delta R_k) = p(\Delta\Theta, \Delta R)\,p(\Delta\Theta_1, \Delta R_1)\cdots p(\Delta\Theta_k, \Delta R_k) = \frac{1}{(2\pi)^{k+1}\sqrt{|\hat{\Sigma}|}} \exp\left(-\frac{1}{2}\,V^t\,\hat{\Sigma}^{-1}\,V\right), \tag{10}$$

where $\hat{\Sigma}$ is the $(2k+2)\times(2k+2)$ block-diagonal matrix with $k + 1$ copies of $\Sigma$ on its diagonal,

$$\hat{\Sigma} = \mathrm{diag}(\Sigma, \Sigma, \ldots, \Sigma).$$

From Eqs. (9) and (10), we can show that $(\Delta\Theta', \Delta R')$ also has a Gaussian distribution, with p.d.f.

$$p(\Delta\Theta', \Delta R') = \frac{1}{2\pi\sqrt{|\Sigma'|}} \exp\left(-\frac{1}{2}\,(\Delta\Theta', \Delta R')\,\Sigma'^{-1}\,(\Delta\Theta', \Delta R')^t\right) \tag{11}$$

and covariance matrix

$$\Sigma' = \begin{pmatrix} \sigma_{\theta'}^2 & \sigma_{\theta'\rho'} \\ \sigma_{\theta'\rho'} & \sigma_{\rho'}^2 \end{pmatrix},$$

where

$$\begin{aligned}
\sigma_{\theta'}^2 &= \sigma_\theta^2 \sum_{i=0}^{k} c_{1,2i+1}^2 + \sigma_\rho^2 \sum_{i=0}^{k} c_{1,2i+2}^2 + 2\sigma_{\theta\rho} \sum_{i=0}^{k} c_{1,2i+1}\,c_{1,2i+2},\\
\sigma_{\rho'}^2 &= \sigma_\theta^2 \sum_{i=0}^{k} c_{2,2i+1}^2 + \sigma_\rho^2 \sum_{i=0}^{k} c_{2,2i+2}^2 + 2\sigma_{\theta\rho} \sum_{i=0}^{k} c_{2,2i+1}\,c_{2,2i+2},\\
\sigma_{\theta'\rho'} &= \sigma_\theta^2 \sum_{i=0}^{k} c_{1,2i+1}\,c_{2,2i+1} + \sigma_\rho^2 \sum_{i=0}^{k} c_{1,2i+2}\,c_{2,2i+2} + \sigma_{\theta\rho} \sum_{i=0}^{k} \left(c_{1,2i+1}\,c_{2,2i+2} + c_{2,2i+1}\,c_{1,2i+2}\right).
\end{aligned}$$
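Equation (9) says that, to first order, $\Sigma' = J\hat{\Sigma}J^t$, where $J$ is the $2 \times (2k+2)$ matrix of the coefficients $c_{ij}$. Where the closed-form partial derivatives are cumbersome, a finite-difference Jacobian yields the same first-order covariance; the sketch below is a numerical stand-in for the analytic expressions, not the paper's derivation itself, and its names are illustrative.

```python
import numpy as np

def invariant_covariance(f, params, Sigma, eps=1e-6):
    """First-order covariance of (theta', rho') per Eq. (9): Sigma' = J Sigma_hat J^t.
    f: callable mapping the flat vector (theta, rho, theta1, rho1, ..., thetak, rhok)
       to the invariant (theta', rho'); params: that flat vector;
    Sigma: the 2x2 per-line covariance (identical for every line, as assumed)."""
    x = np.asarray(params, dtype=float)
    f0 = np.asarray(f(x))
    J = np.zeros((2, x.size))
    for i in range(x.size):                 # finite-difference Jacobian, standing
        dx = np.zeros(x.size)               # in for the analytic coefficients
        dx[i] = eps                         # c_ij given in closed form in [35]
        J[:, i] = (np.asarray(f(x + dx)) - f0) / eps
    Sigma_hat = np.kron(np.eye(x.size // 2), np.asarray(Sigma))  # block diagonal
    return J @ Sigma_hat @ J.T
```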

Recall that in computing the invariant $(\theta', \rho')^t$ we may adjust the value of $\theta'$ by adding $\pi$ or $-\pi$ and flipping the sign of $\rho'$ accordingly. When this is done, the covariance matrix $\Sigma'$ is adjusted by flipping the sign of $\sigma_{\theta'\rho'}$.

5. GEOMETRIC HASHING WITH WEIGHTED VOTING

Even when we use an improved Hough transform to detect line features in a degraded image, we still face the problem of image noise and the resulting noise in the invariants used for hashing, as discussed in the preceding section. Performance is similarly degraded by the quantization noise introduced in constructing the hash table. Although hash space density can be equalized by rehashing [30], a drawback is that invariants that are close to each other can be far apart after rehashing, which is disadvantageous in the presence of noise. Hence, we make use of a Bayesian a posteriori maximum likelihood scheme based on geometric hashing, similar to that of [14] for point features, whose evidence synthesis formula we simplify for our purpose, as described in Section 5.2. A viewpoint consistency constraint is also applied to further filter out false alarms in the experiments.

5.1. A Measure of Matching between Scene and Model Invariants

During the recognition stage of geometric hashing, a computed scene invariant is used to index the geometric hash table and one vote is scored for each entry retrieved.


However, a perturbed scene invariant can hardly hit exactly the entry where its matching model invariant resides. Gavrila and Groen assumed a bounded circular region around each hash bin and tallied equal votes for all the bins within the region around the bin hashed [10]. In the case of line invariants, our analysis in Section 4 shows that the perturbation of an invariant has an approximately Gaussian distribution with a nonzero correlation coefficient. This implies that the region of hash space that would need to be accessed is an ellipse instead of a circle centered around the "true" value of the invariant. Moreover, this perturbation also depends on the line being encoded and the basis used for encoding, which implies that different invariants should be associated with hash space regions of different size, shape, and orientation for error tolerance. (Analyses of point features support similar observations, with the correlation coefficient being zero in the case of similarity transformations [29].) Logically, the nodes in hash space that the scene invariant more "closely matches" should receive a higher weighted score than those which are far apart, in a manner accurately representing the behavior of the noise which affects these model invariants.

Let $inv_s$ be a scene invariant and $inv_{M_i}$ be a model invariant computed from a basis $B_{M_i}$ and a line $l_j$ of $M_i$. Bayes' theorem [8, 27] allows us to precisely determine the probability that $inv_s$ results from the presence of a certain $[M_i, B_{M_i}, l_j]$:

$$P([M_i, B_{M_i}, l_j] \mid inv_s) = P([M_i, B_{M_i}, l_j])\,\frac{p(inv_s \mid [M_i, B_{M_i}, l_j])}{p(inv_s)},$$

where

• $P([M_i, B_{M_i}, l_j])$ is the a priori probability that $M_i$ is embedded in the scene and $B_{M_i}$ is selected to encode $l_j$. We assume that all the models are equally likely to appear and have the same number of features, a simplification that can easily be generalized.

• $p(inv_s \mid [M_i, B_{M_i}, l_j])$ is the conditional p.d.f. of observing $inv_s$ given that $M_i$ appears in the scene with $B_{M_i}$ being selected and $l_j$ being encoded. It consists of two parts, the spike induced by $[M_i, B_{M_i}, l_j]$ and a background distribution.

• $p(inv_s)$ is the p.d.f. of observing $inv_s$, regardless of which $[M_i, B_{M_i}, l_j]$ induces it.

Assuming that the model features used in the preprocessing stage are distinct and separate, $inv_s$ can be close to at most one model invariant. Given $inv_s$, the supporting evidence that $M_i$ appears in the scene and the basis $B_{M_i}$ is selected is then

$$P([M_i, B_{M_i}] \mid inv_s) = \max_j P([M_i, B_{M_i}, l_j] \mid inv_s) = P([M_i, B_{M_i}, l_j])\,\frac{\max_j p(inv_s \mid [M_i, B_{M_i}, l_j])}{p(inv_s)}. \tag{12}$$

5.2. Evidence Synthesis by Bayesian Reasoning

In the recognition stage, certain scene features are probed as a basis $B_s$ and the invariants of all the remaining features are computed in terms of this basis. Each invariant provides a certain degree of support for the existence of a $[M_i, B_{M_i}]$ combination, assuming $B_s$ corresponds to $B_{M_i}$. For every $[M_i, B_{M_i}]$ combination, if we synthesize support from all the invariants and then hypothesize that the one with maximum support appears in the scene, we in fact exercise a maximum likelihood approach to object recognition. The following simplified formula, based on a probability derivation using Bayes' theorem [5] and assuming conditional independence among the invariants [14], synthesizes support from different invariants:

$$\frac{P([M_i, B_{M_i}] \mid inv_{s_1}, \ldots, inv_{s_n})}{P([M_i, B_{M_i}])} = \frac{p(inv_{s_1}) \cdots p(inv_{s_n})}{p(inv_{s_1}, \ldots, inv_{s_n})} \prod_{k=1}^{n} \frac{P([M_i, B_{M_i}] \mid inv_{s_k})}{P([M_i, B_{M_i}])} \propto \frac{1}{p(inv_{s_1}, \ldots, inv_{s_n})} \prod_{k=1}^{n} \frac{\max_j p(inv_{s_k} \mid [M_i, B_{M_i}, l_j])}{P([M_i, B_{M_i}])} \quad \text{(by Eq. (12))}.$$

Thus

$$P([M_i, B_{M_i}] \mid inv_{s_1}, \ldots, inv_{s_n}) \propto \prod_{k=1}^{n} \max_j p(inv_{s_k} \mid [M_i, B_{M_i}, l_j]), \tag{13}$$

since $p(inv_{s_1}, \ldots, inv_{s_n})$ is independent of all the hypotheses and all $P([M_i, B_{M_i}])$'s are identical. In computing $P([M_i, B_{M_i}] \mid inv_{s_1}, \ldots, inv_{s_n})$, we need not compute absolute probabilities but only relative ones. Since the logarithm function is monotonic, we can apply the logarithm to both sides of Eq. (13),

$$\log P([M_i, B_{M_i}] \mid inv_{s_1}, \ldots, inv_{s_n}) = \log C + \sum_{k=1}^{n} \log \max_j p(inv_{s_k} \mid [M_i, B_{M_i}, l_j]), \tag{14}$$

turning products into sums.
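The sketch below shows one way Eq. (14) might be realized, together with the per-node maximum rule discussed in Sections 5.3 and 6.1 (each model line can match only one scene feature). Here `nearby` stands for the hash-table range query, and the node layout is an assumption made for illustration.

```python
import numpy as np
from collections import defaultdict

def log_gauss2(d, S):
    """Log of the bivariate Gaussian p.d.f. of Eq. (11) at offset d from the mean."""
    return float(-0.5 * d @ np.linalg.solve(S, d)
                 - np.log(2.0 * np.pi) - 0.5 * np.log(np.linalg.det(S)))

def accumulate_support(scene_invs, nearby, log_c):
    """Eq. (14) evidence synthesis. nearby(inv) is a hypothetical range query
    yielding hash nodes (model, basis, line, mu, Sigma_p) whose stored
    invariant mu lies in the neighborhood of the hashed scene invariant."""
    # keep only the maximum contribution ever credited to a [model, basis, line]
    best = defaultdict(lambda: log_c)
    for inv in scene_invs:
        for model, basis, line, mu, Sigma_p in nearby(inv):
            s = log_gauss2(np.asarray(inv) - np.asarray(mu), np.asarray(Sigma_p))
            key = (model, basis, line)
            best[key] = max(best[key], s)
    support = defaultdict(float)   # Eq. (14) scores, relative to the background log c
    for (model, basis, line), s in best.items():
        support[(model, basis)] += s - log_c
    return support                 # verify the top-scoring (model, basis) pairs
```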


5.3. The Algorithm

In the preprocessing stage, for a specific transformation group, we use the formulas in Section 3 to compute the invariants and Eq. (11) to compute their perturbations. Both pieces of information, together with the identity of [model, basis, line], are stored in a node in the hash entry indexed by each invariant. Since the perturbation function is Gaussian, we store its covariance matrix as a representation of the function. We note that each node records the information about a particular [model, basis, line] combination necessary for our recognition stage. The hash table is arranged so that the position of each entry represents a quantized coordinate $(\theta, \rho)$ of hash space. In this way, range queries can be performed easily.

In the recognition stage, line features are detected and a subset of these lines is probed as a basis for associating invariants with all the remaining lines. Equation (14) is then used to accumulate support from all these invariants. We note that a straightforward use of Eq. (14) runs through all the [model, basis, line] combinations for every scene invariant, which can result in an unpleasant computational cost. By exploiting the hash table prepared in the preprocessing stage, for each invariant $inv_{s_k}$ hashed into hash space, we access only its nearby nodes, the $[M_i, B_{M_i}, l_j]$'s. For each node accessed, we credit $\log p(inv_{s_k} \mid [M_i, B_{M_i}, l_j])$ to the node, whose initial score is set to 0. In computing $p(inv_{s_k} \mid [M_i, B_{M_i}, l_j])$, the background distribution function is usually negligible compared with the spike induced by $[M_i, B_{M_i}, l_j]$; we thus use Eq. (11) to approximate $p(inv_{s_k} \mid [M_i, B_{M_i}, l_j])$. Since typically this Gaussian p.d.f. falls off rapidly, we may approximately credit the same weighted score, $\log c$, to all those nodes outside the neighborhood of the hash, or, equivalently, credit $\log p(inv_{s_k} \mid [M_i, B_{M_i}, l_j]) - \log c$ to those within the neighborhood and keep the scores of the other nodes intact.

After all the scene invariants are processed, the support for each $[M_i, B_{M_i}]$ combination is histogrammed. The top few with sufficiently high scores are then verified by transforming the model onto the scene. A quality measure, QM, defined as the ratio of visible boundaries, is computed for each hypothesis. If the QM is above a preset threshold (e.g., 0.5), the hypothesis is considered to pass verification. We note that the verification can proceed hierarchically by verifying the basis first and rejecting the hypothesis immediately when the basis fails verification. Since there are multiple model instances in the scene, after a number of probings we apply a viewpoint consistency constraint to refine the result, as described in Section 6. In the following subsection, we give a probabilistic analysis of the number of probings needed to detect all the instances in the scene.

5.4. An Analysis of the Number of Probings Needed

Suppose there are $m$ model instances in the scene and $n$ lines are detected by the Hough transform, where $n = n_1 + n_2$ with $n_2$ lines being spurious. Assume that the degree of occlusion of each instance is the same. Each model instance then has an average of $n_1/m$ true lines, and the probability of choosing a scene basis, consisting of $k$ features, which happens to match a certain model basis is

$$p = \prod_{i=0}^{k-1} \frac{n_1/m - i}{n - i}.$$

The probability of detecting all $m$ model instances in $t$ trials, $t \ge m$, is

$$p' = C(t, m)\,m!\,p^m = \prod_{i=0}^{m-1} (t - i)\,p^m.$$

We may derive a lower bound on $t$ by requiring $p' \ge \varepsilon$, $0 \le \varepsilon \le 1$. In practice, we have to try more probings than theoretically predicted, because a probed scene basis, even if it matches a certain model basis, can be "bad" (e.g., too small, too big, or too perturbed), resulting in a badly distorted model-to-scene transformation that fails verification.
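A small worked example of these estimates, under assumed (hypothetical) values of $m$, $k$, $n_1$, and $n_2$:

```python
from math import prod

# Hypothetical numbers: m = 4 instances, affine bases of k = 3 lines,
# n1 = 40 true and n2 = 10 spurious detected lines.
m, k, n1, n2 = 4, 3, 40, 10
n = n1 + n2

# chance that one probed scene basis matches some model basis
p = prod((n1 / m - i) / (n - i) for i in range(k))   # ~0.006

def p_all_detected(t):
    """p' = C(t, m) m! p^m = prod_{i=0}^{m-1} (t - i) p^m; an approximation
    that can exceed 1 for large t, so it serves only as a bound."""
    return prod(t - i for i in range(m)) * p**m

t = m
while p_all_detected(t) < 0.5:   # smallest t with p' >= epsilon = 0.5
    t += 1
print(p, t)                      # ~0.006 and a t in the low hundreds
```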

6. THE EXPERIMENTS

Since affine transformations are often appropriate approximations to more general perspective transformations under suitable viewing situations (see p. 79 of [17]), we chose to implement the case of affine transformations. A similar approach applies to the other transformation groups.

6.1. Implementation of Affine Invariant Matching

To apply the procedure described in Section 5.3 to the affine group, we need a triplet of lines to form a basis and apply Eqs. (1) and (2) to compute the invariants of the other detected lines. Considering numerical stability, we should avoid using very small or very large bases for invariant computation. In our implementation, we used 256 × 256 images; if any side of the basis was less than 20 pixels or greater than 500 pixels, we considered the basis infeasible.

The hash table was implemented as a 2-D array representing the 2-D hash space. Since during recognition we have to consider the neighborhood of a hash entry when the entry is retrieved, the boundary of the hash table was treated with special care: the $\theta$-axis of the 2-D hash table should be viewed as round-wrapped with a flip, since an invariant $(\theta, \rho)$ is mathematically equivalent to $(\theta - \pi, -\rho)$ (e.g., $(179°, 2)$ is equivalent to $(-1°, -2)$, hence neighboring, say, $(0°, -2)$). We also need to specialize Eq. (11), the perturbation function of the invariant, to the affine case; refer to [35] for the expressions of the coefficients $c_{i,j}$ of its covariance matrix components.

In each basis probing, for each hash node $[M_i, B_{M_i}, l_j]$, multiple accumulation of weighted votes from different scene invariants should be avoided, since each model feature $l_j$ can match only one scene feature. We need to keep track of all the scores each hash node has received and count only the maximum score for the node.
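The round-wrap-with-flip boundary treatment described above can be made concrete as follows. The symmetric indexing of the $\rho$ bins about $\rho = 0$ is an assumption of this sketch, not something stated in the text.

```python
def wrapped_neighbors(it, ir, n_theta, n_rho, w=1):
    """Bins within a w-neighborhood of bin (it, ir) of the 2-D hash table,
    honoring the identification (theta, rho) ~ (theta - pi, -rho): crossing
    the theta boundary wraps the angle bin and flips the rho bin. Assumes
    rho bins are indexed symmetrically about rho = 0."""
    for dt in range(-w, w + 1):
        for dr in range(-w, w + 1):
            t, r = it + dt, ir + dr
            if t < 0 or t >= n_theta:   # stepped past theta = 0 or theta = pi
                t %= n_theta            # round-wrap the theta bin ...
                r = n_rho - 1 - r       # ... and flip the rho bin about the center
            if 0 <= r < n_rho:
                yield t, r
```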


We also need to filter out reflection cases. Although reflection transformations form a subgroup of affine transformations, a 2-D object, say a triangle with vertices 1, 2, 3 in clockwise sequence, preserves this sequence even when its shape is seriously skewed by viewing from a tilted camera. We tackle this problem by detecting the orientation (clockwise or counterclockwise) of a basis from its cross product, so that false matches of this kind between scene bases and model bases are never hypothesized.

Dealing with highly occluded scenes, we observed that a hypothesis can still be a false alarm even if it passes verification (by verifying its boundary). By the viewpoint consistency principle [26], the locations of all object features in an image should be consistent with the projection from a single viewpoint. It can be shown that all 2-D objects lying on the same plane have an identical enlarging (or shrinking) ratio of area when viewed from the same camera, assuming affine transformations approximate the perspective transformations. This ratio equals $|\det(A)|$, where $A$ is the skewing matrix of an affine transformation $T = (A, b)$. We call it the skewing factor and require that all the hypothesized instances have the same skewing factor. ($T$ can be obtained from the correspondence of bases between a model and its instance in the scene; for the formula computing $T$, refer to the Appendix of [36].) In each probing, if a hypothesis has sufficiently high votes, it is verified; if the verification succeeds, its skewing factor is computed and used to vote for the common skewing factor. After a sufficient number of bases are probed, which is determined by the statistical estimation procedure of Section 5.4, the common skewing factor for all model instances is obtained by taking the $|\det(A)|$ with the highest votes. Those hypothesized model instances which voted for this skewing factor are then passed to a disambiguation process, which we now describe.

It is possible that a scene feature is shared by many model instances, some of which can be spurious. We can refine the result by assigning the shared feature(s) to the model instance most strongly supported by the evidence. One possible technique for disambiguation is relaxation [37]; however, relaxation is a complicated process. Since we have associated with each model instance a quality measure, QM, as a measure of the strength of evidence supporting its existence (this definition of QM favors long segments of a model instance and is similar to that used in [1], with the same justification), we may use QM for our purpose. The following straightforward technique proves to work well.

We maintain two data structures, a QM-buffer and a frame-buffer, each of which is a 2-D array of the same size

as the image. After initializing both data structures, we process each candidate model instance one by one (the order is insignificant) by superimposing the candidate onto the QM-buffer and tracing its boundary. For each position traced, if the QM of the current candidate is greater than the value previously stored in that QM-buffer entry, we overwrite the corresponding entry of the frame-buffer with the id of this candidate and replace the value in the QM-buffer entry with the current QM. After all candidates are processed, we recompute the QM of each candidate on the frame-buffer; in this recomputation the boundary of each candidate is traced, but only the positions holding this candidate's id count. Finally, those candidates whose new QM is less than a preset threshold (e.g., 0.5) are removed from the list of candidate model instances.
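A sketch of the QM-buffer/frame-buffer procedure just described; the candidate interface is hypothetical, and visibility handling in the recomputed QM is omitted for brevity.

```python
import numpy as np

def disambiguate(candidates, image_shape, qm_threshold=0.5):
    """QM-buffer / frame-buffer disambiguation. Each candidate is assumed to
    carry .id, .qm, and .boundary_pixels(), the traced (x, y) positions of
    its transformed model boundary."""
    qm_buf = np.zeros(image_shape)          # best QM seen at each position
    frame_buf = np.full(image_shape, -1)    # id of the candidate owning it
    for c in candidates:                    # order is insignificant
        for x, y in c.boundary_pixels():
            if c.qm > qm_buf[y, x]:
                qm_buf[y, x] = c.qm
                frame_buf[y, x] = c.id
    survivors = []
    for c in candidates:
        pixels = list(c.boundary_pixels())
        owned = sum(1 for x, y in pixels if frame_buf[y, x] == c.id)
        if owned / max(len(pixels), 1) >= qm_threshold:  # recomputed QM
            survivors.append(c)
    return survivors
```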


6.2. Best Least-Squares Match

The transformation between a model and its instance in the scene can be recovered from the correspondence between the model basis and the scene basis alone. However, detected scene lines are usually somewhat distorted by noise, which distorts the computed transformation. We observed that this transformation can usually transform the model basis to match its scene basis relatively accurately; yet the rest of the model features deviate more or less from their scene correspondences after transformation. Knowledge of additional line correspondences can be used to improve the accuracy of the computed transformation.

In the case of point features, it is straightforward to minimize the squared distances between transformed model points and their corresponding scene points [19]. For line features the problem is harder, because line parameters, in contrast to their point counterparts, are an abstract description of the physical line in the image. One way to minimize the error is to treat each line as a point with coordinate $(\theta, \rho)$ in $(\theta, \rho)$-space and minimize the squared distance between $(\theta, \rho)$ and its correspondence, $(\theta', \rho')$. A drawback is that $\theta$ and $\rho$ are of different metrics: minimizing the squared distance between $(\theta, \rho)$ and $(\theta', \rho')$ implicitly puts equal weight on both components. To circumvent this problem, we note that a line $(\theta, \rho)$ can be uniquely represented by the point $(\rho \cos\theta, \rho \sin\theta)$, which is the projection of the origin onto the line; we can then minimize the squared distance between the points $(\rho \cos\theta, \rho \sin\theta)$ and $(\rho' \cos\theta', \rho' \sin\theta')$. Yet this approach suffers from the problem that the nearer the line is to the origin, the more weight is put on $\rho$. It is logical at the final registration stage to perform error minimization in the image domain, where model instances are registered.

Observing that the models in the model base are usually finite in the sense that they consist of line segments (though modeled by lines), we can minimize the squared distance between the endpoints of the transformed model line segments and their corresponding scene lines in image space. In the following, we derive the closed-form formula for the case of affine transformations; a similar technique applies to the other transformation groups.

Assume that the affine transformation $T = (A, b)$ transforms $n$ model line segments, with endpoints $u_{j1}$ and $u_{j2}$, to match $n$ corresponding scene lines $l_j$, $j = 1, \ldots, n$. The minimized sum of the squared distances from $T(u_{j1})$ to $l_j$ and from $T(u_{j2})$ to $l_j$, $j = 1, \ldots, n$, is

$$E = \min_T \sum_{j=1}^{n} \mathrm{dist}^2(T(u_{j1}), l_j) + \mathrm{dist}^2(T(u_{j2}), l_j).$$

Let line $l_j$ have parameters $(\theta_j, \rho_j)$; let endpoint $u_{ji}$ have coordinates $(x_{ji}, y_{ji})$, $i = 1, 2$; and let $A = [a_{ij}]_{i,j=1,2}$ and $b = (b_1, b_2)^t$. Then $T(u_{ji}) = (a_{11} x_{ji} + a_{12} y_{ji} + b_1,\; a_{21} x_{ji} + a_{22} y_{ji} + b_2)^t$ and

$$E = \min_T \sum_{j=1}^{n} \left((\cos\theta_j, \sin\theta_j)\,T(u_{j1}) - \rho_j\right)^2 + \left((\cos\theta_j, \sin\theta_j)\,T(u_{j2}) - \rho_j\right)^2.$$

To minimize $E$, we have to solve the following system of linear equations with six unknowns:

$$\frac{\partial E}{\partial a_{11}} = 0, \quad \frac{\partial E}{\partial a_{12}} = 0, \quad \frac{\partial E}{\partial b_1} = 0, \quad \frac{\partial E}{\partial a_{21}} = 0, \quad \frac{\partial E}{\partial a_{22}} = 0, \quad \frac{\partial E}{\partial b_2} = 0.$$

Rewriting the equations in matrix form (refer to the Appendix for the expressions of $m_{ij}$ and $n_i$, $i, j = 1, \ldots, 6$),

$$\begin{pmatrix}
m_{11} & m_{12} & m_{13} & m_{14} & m_{15} & m_{16}\\
m_{21} & m_{22} & m_{23} & m_{24} & m_{25} & m_{26}\\
m_{31} & m_{32} & m_{33} & m_{34} & m_{35} & m_{36}\\
m_{41} & m_{42} & m_{43} & m_{44} & m_{45} & m_{46}\\
m_{51} & m_{52} & m_{53} & m_{54} & m_{55} & m_{56}\\
m_{61} & m_{62} & m_{63} & m_{64} & m_{65} & m_{66}
\end{pmatrix}
\begin{pmatrix} a_{11}\\ a_{12}\\ a_{21}\\ a_{22}\\ b_1\\ b_2 \end{pmatrix}
=
\begin{pmatrix} n_1\\ n_2\\ n_3\\ n_4\\ n_5\\ n_6 \end{pmatrix},$$

we can solve for the unknowns by Cramer's rule (see p. 89 of [22]):

$$a_{11} = \frac{\det(M_1)}{\det(M)}, \quad a_{12} = \frac{\det(M_2)}{\det(M)}, \quad a_{21} = \frac{\det(M_3)}{\det(M)}, \quad a_{22} = \frac{\det(M_4)}{\det(M)}, \quad b_1 = \frac{\det(M_5)}{\det(M)}, \quad b_2 = \frac{\det(M_6)}{\det(M)},$$
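Numerically, the same minimization can be carried out without forming the 6 × 6 system explicitly: each segment endpoint contributes one equation that is linear in the six unknowns, and an ordinary least-squares solve minimizes the same $E$ as the closed-form system above. A sketch, with illustrative interface names:

```python
import numpy as np

def best_affine_fit(segments, lines):
    """Least-squares affine T = (A, b) mapping model segments onto scene lines.
    segments: ((x1, y1), (x2, y2)) endpoint pairs; lines: matching (theta, rho).
    Each endpoint gives one equation linear in (a11, a12, a21, a22, b1, b2):
    cos(t)*(a11*x + a12*y + b1) + sin(t)*(a21*x + a22*y + b2) = rho,
    so lstsq minimizes the same E as the closed-form system."""
    rows, rhs = [], []
    for ((x1, y1), (x2, y2)), (th, rh) in zip(segments, lines):
        c, s = np.cos(th), np.sin(th)
        for x, y in ((x1, y1), (x2, y2)):
            rows.append([c * x, c * y, s * x, s * y, c, s])
            rhs.append(rh)
    sol, *_ = np.linalg.lstsq(np.asarray(rows), np.asarray(rhs), rcond=None)
    a11, a12, a21, a22, b1, b2 = sol
    return np.array([[a11, a12], [a21, a22]]), np.array([b1, b2])
```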

where $M = [m_{ij}]_{i,j=1,\ldots,6}$ and $M_k$ is $M$ with the $k$th column substituted by $(n_1, \ldots, n_6)^t$, for $k = 1, \ldots, 6$. We note that a variation using a weighted sum of the squared distances can further improve the result, by giving more weight to longer model line segments when computing the squared distances between their transformed endpoints and corresponding scene lines.

6.3. Experimental Results

We have done a series of experiments using synthesized images and real images containing polygonal objects, which are modeled by line features. Our model base consists of twenty models, each with ten lines (see Fig. 1). For synthesized images, we impose positive and negative noise; both kinds of noise are supposed to be introduced by occlusion, e.g., flakes in the milling process. Negative noise represents missing boundary pixels and is the result of occlusion on the object boundary. Positive noise mainly includes edge pixels of occluding objects and the noise introduced in the imaging process. We generated negative noise by randomly erasing edgels from the object boundary; for example, if 20% of the boundary is expected to be occluded, we trace along the boundary of the object instances and erase the edgels whenever ran modulo 5 = 0, where ran is a random number. We used random dot noise and segment noise to simulate positive noise. Random dot noise is an additive dot noise whose position in the image is randomly selected; segment noise is added by randomly selecting a position and an orientation (from 0° to 360°) for the segment and then inserting in the image a segment of length varying randomly from 0 to 7.

Figures 2 and 3 show two examples of the experiments on synthesized images. Figure 4 shows an example of the experiments on real images, with best least-squares matches using the procedure discussed in Section 6.2. Figure 2a shows a composite overlapping scene of models 0, 3, and 4 (model 0 appears twice), which are significantly skewed. Figure 2b is the result of Hough analysis; fifty lines were detected. The covariance matrix used for the deviation of the detected lines from the true lines was

$$\Sigma = \begin{pmatrix} 0.003384 & -0.00624 \\ -0.00624 & 2.5198 \end{pmatrix},$$

obtained from statistics of extensive simulations on images with different levels of degradation. Figure 2c shows the model instances hypothesized in the recognition stage; five thousand probings were tried. Figure 2d shows the final result after disambiguation. Figure 3 shows the experiments on another composite overlapping scene, of models 0, 1, 2, 4, and 9. Sixty lines were detected by Hough analysis.


FIG. 1. The twenty models used in our experiments. From left to right, top to bottom are model 0 to model 19.

The covariance matrix used is the same as in Figure 2. We again tried five thousand probings. Figure 4a shows a real image of models 0, 2, 3, and 4 with flakes scattered. Figure 4b is the result of edge detection using the Boie–Cox edge detector [4]. Figure 4c shows the result of Hough analysis; fifty lines were detected. Five thousand probings were tried. Figures 4d and 4e show the recognition result before and after disambiguation, respectively. Figure 4f shows the best least-squares match.

7. DISCUSSION

When first conducting the experiments, we let the hypothesized candidate model instances vote for the common skewing factor without verifying them first.

A wrong common skewing factor was obtained at times, for two reasons: (1) the image is so noisy, with so many lines detected and multiple model instances coexisting in the scene, that irrelevant lines are often chosen as a basis in probing; (2) an affine transformation is so "powerful" that it is not particularly difficult to find an affine transformation mapping one quadruplet of lines to another if some error tolerance is allowed. Thus we modified our algorithm to allow only verified candidate instances to vote for the common skewing factor; the trade-off is that verification takes computational time. We also found that during probing, if the probed triplet of lines happens to correspond to a model basis (i.e., a correct match), then this [model, basis] pair usually

PROBABILISTIC GEOMETRIC HASHING USING LINES

191

FIG. 2. Experimental example 2.

accumulates the highest weighted score, above the others. In the cases where it does not rank highest, the cause is usually that fewer lines were detected for that particular model instance; even then, its weighted score is still among the highest few. The occasional high-scoring false alarms are usually rejected

by verification. We conclude that this weighted voting scheme is quite effective.

The method described above does not assume any grouping of features; such grouping would be expected to greatly expedite the recognition stage (at least for scene basis selection). Lowe


FIG. 3. Experimental example 1.

[25] first explicitly discussed the importance of grouping to recognition. However, if reliable segmentation is not available, intelligent grouping seems difficult. We pointed out above that false alarms sometimes pass even the verification, because of high noise in the scene. Statistically, however, false alarms of this kind disperse their votes over different

skewing factors, while correct matches accumulate their votes on the common skewing factor.

Finally, we end this paper with a brief discussion of two possible future directions: the intelligent fusion of multiple feature types, and reasoning by parts.

If an image can be well segmented and geometric


FIG. 4. Experimental example 3.


features like points, segments, circles, and ovals can be reliably extracted, our technique can be helpful in designing a suitable weighting function or in performing hierarchical filtering by different feature types. This may require evaluation criteria for measuring the relative merit of different feature types and can be application-dependent.

The techniques of reasoning by parts and geometric hashing can possibly benefit from each other. It is usually inadequate to represent a complicated object by a single primitive feature type; representing different parts of an object

by different primitive feature types can be more feasible. With well-segmented images, we may recognize different parts of objects by the geometric hashing technique using different feature types.

ACKNOWLEDGMENTS

The author thanks Professors J. T. Schwartz, R. A. Hummel, and J. Hong, Dr. I. Rigoutsos, Mr. Y. Hecker, and Mr. J. J. Liu for generous advice and discussions.

APPENDIX

The components of the matrix for the system of linear equations for the best least-squares match are as follows (all sums run over $j = 1, \ldots, n$):

$$\begin{aligned}
m_{11} &= \textstyle\sum_j \cos^2\theta_j\,(x_{j1}^2 + x_{j2}^2), & m_{31} &= \textstyle\sum_j \cos\theta_j \sin\theta_j\,(x_{j1}^2 + x_{j2}^2),\\
m_{12} &= \textstyle\sum_j \cos^2\theta_j\,(x_{j1} y_{j1} + x_{j2} y_{j2}), & m_{32} &= \textstyle\sum_j \cos\theta_j \sin\theta_j\,(x_{j1} y_{j1} + x_{j2} y_{j2}),\\
m_{13} &= \textstyle\sum_j \cos\theta_j \sin\theta_j\,(x_{j1}^2 + x_{j2}^2), & m_{33} &= \textstyle\sum_j \sin^2\theta_j\,(x_{j1}^2 + x_{j2}^2),\\
m_{14} &= \textstyle\sum_j \cos\theta_j \sin\theta_j\,(x_{j1} y_{j1} + x_{j2} y_{j2}), & m_{34} &= \textstyle\sum_j \sin^2\theta_j\,(x_{j1} y_{j1} + x_{j2} y_{j2}),\\
m_{15} &= \textstyle\sum_j \cos^2\theta_j\,(x_{j1} + x_{j2}), & m_{35} &= \textstyle\sum_j \cos\theta_j \sin\theta_j\,(x_{j1} + x_{j2}),\\
m_{16} &= \textstyle\sum_j \cos\theta_j \sin\theta_j\,(x_{j1} + x_{j2}), & m_{36} &= \textstyle\sum_j \sin^2\theta_j\,(x_{j1} + x_{j2}),\\
m_{21} &= \textstyle\sum_j \cos^2\theta_j\,(x_{j1} y_{j1} + x_{j2} y_{j2}), & m_{41} &= \textstyle\sum_j \cos\theta_j \sin\theta_j\,(x_{j1} y_{j1} + x_{j2} y_{j2}),\\
m_{22} &= \textstyle\sum_j \cos^2\theta_j\,(y_{j1}^2 + y_{j2}^2), & m_{42} &= \textstyle\sum_j \cos\theta_j \sin\theta_j\,(y_{j1}^2 + y_{j2}^2),\\
m_{23} &= \textstyle\sum_j \cos\theta_j \sin\theta_j\,(x_{j1} y_{j1} + x_{j2} y_{j2}), & m_{43} &= \textstyle\sum_j \sin^2\theta_j\,(x_{j1} y_{j1} + x_{j2} y_{j2}),\\
m_{24} &= \textstyle\sum_j \cos\theta_j \sin\theta_j\,(y_{j1}^2 + y_{j2}^2), & m_{44} &= \textstyle\sum_j \sin^2\theta_j\,(y_{j1}^2 + y_{j2}^2),\\
m_{25} &= \textstyle\sum_j \cos^2\theta_j\,(y_{j1} + y_{j2}), & m_{45} &= \textstyle\sum_j \cos\theta_j \sin\theta_j\,(y_{j1} + y_{j2}),\\
m_{26} &= \textstyle\sum_j \cos\theta_j \sin\theta_j\,(y_{j1} + y_{j2}), & m_{46} &= \textstyle\sum_j \sin^2\theta_j\,(y_{j1} + y_{j2}),\\
m_{51} &= \textstyle\sum_j \cos^2\theta_j\,(x_{j1} + x_{j2}), & m_{61} &= \textstyle\sum_j \cos\theta_j \sin\theta_j\,(x_{j1} + x_{j2}),\\
m_{52} &= \textstyle\sum_j \cos^2\theta_j\,(y_{j1} + y_{j2}), & m_{62} &= \textstyle\sum_j \cos\theta_j \sin\theta_j\,(y_{j1} + y_{j2}),\\
m_{53} &= \textstyle\sum_j \cos\theta_j \sin\theta_j\,(x_{j1} + x_{j2}), & m_{63} &= \textstyle\sum_j \sin^2\theta_j\,(x_{j1} + x_{j2}),\\
m_{54} &= \textstyle\sum_j \cos\theta_j \sin\theta_j\,(y_{j1} + y_{j2}), & m_{64} &= \textstyle\sum_j \sin^2\theta_j\,(y_{j1} + y_{j2}),\\
m_{55} &= 2\textstyle\sum_j \cos^2\theta_j, & m_{65} &= 2\textstyle\sum_j \cos\theta_j \sin\theta_j,\\
m_{56} &= 2\textstyle\sum_j \cos\theta_j \sin\theta_j, & m_{66} &= 2\textstyle\sum_j \sin^2\theta_j,\\
n_1 &= \textstyle\sum_j \cos\theta_j\,\rho_j\,(x_{j1} + x_{j2}), & n_3 &= \textstyle\sum_j \sin\theta_j\,\rho_j\,(x_{j1} + x_{j2}),\\
n_2 &= \textstyle\sum_j \cos\theta_j\,\rho_j\,(y_{j1} + y_{j2}), & n_4 &= \textstyle\sum_j \sin\theta_j\,\rho_j\,(y_{j1} + y_{j2}),\\
n_5 &= 2\textstyle\sum_j \cos\theta_j\,\rho_j, & n_6 &= 2\textstyle\sum_j \sin\theta_j\,\rho_j.
\end{aligned}$$

REFERENCES

1. N. Ayache and O. D. Faugeras, HYPER: A new approach for the recognition and positioning of two-dimensional objects, IEEE Trans. PAMI 8, 1986, 44–54.
2. P. J. Besl and R. C. Jain, Three-dimensional object recognition, ACM Comput. Surv. 17, 1985, 75–154.


3. T. O. Binford, Survey of model-based image analysis systems, Int. J. Robotics Res. 1, 1982, 18–64.
4. R. Boie and I. Cox, Two-dimensional optimum edge recognition using matched and Wiener filters for machine vision, in Proceedings, International Conference on Computer Vision, London, England, 1987, pp. 100–108.

5. E. Charniak and D. McDermott, Introduction to Artificial Intelligence, Addison-Wesley, MA, 1985.
6. R. T. Chin and C. R. Dyer, Model-based recognition in robot vision, ACM Comput. Surv. 18, 1986, 67–108.
7. M. Costa, R. Haralick, and L. Shapiro, Optimal affine-invariant point matching, in Proceedings, International Conference on Pattern Recognition, Atlantic City, 1990, pp. 233–236.
8. R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, Wiley, New York, 1973.
9. P. J. Flynn and A. K. Jain, 3-D object recognition using invariant feature indexing of interpretation tables, CVGIP: Image Understanding 55, 1992, 119–129.
10. D. Gavrila and F. Groen, 3-D object recognition from 2-D images using geometric hashing, Pattern Recognit. Lett. 13, 1992, 263–278.
11. W. E. L. Grimson and D. P. Huttenlocher, On the sensitivity of geometric hashing, in Proceedings, International Conference on Computer Vision, Osaka, Japan, 1990, pp. 334–338.
12. A. Gueziec and N. Ayache, New developments on geometric hashing for curve matching, in Proceedings, IEEE Conference on Computer Vision and Pattern Recognition, New York, 1993, pp. 703–704.
13. Y. C. Hecker and R. M. Bolle, Invariant Feature Matching in Parameter Space with Application to Line Features, IBM Research Report RC 16447, 1991.
14. R. Hummel and I. Rigoutsos, A Bayesian approach to model matching with geometric hashing, submitted.
15. R. Hummel and H. Wolfson, Affine invariant matching, in Proceedings, DARPA IU Workshop, 1988, pp. 351–364.
16. J. Illingworth and J. Kittler, A survey of the Hough transform, Comput. Vision Graphics Image Process. 44, 1988, 87–116.
17. F. Klein, Elementary Mathematics from an Advanced Standpoint: Geometry, Macmillan, New York, 1925.
18. Y. Lamdan, Object Recognition by Geometric Hashing, Ph.D. thesis, Computer Science Dept., Courant Institute of Mathematical Sciences, New York University, New York, 1989.
19. Y. Lamdan, J. T. Schwartz, and H. J. Wolfson, On recognition of 3-D objects from 2-D images, in Proceedings, IEEE International Conference on Robotics and Automation, Philadelphia, Pennsylvania, 1988, pp. 1407–1413.
20. Y. Lamdan and H. J. Wolfson, Geometric hashing: A general and efficient model-based recognition scheme, in Proceedings, International Conference on Computer Vision, Tampa, Florida, 1988, pp. 238–249.
21. Y. Lamdan and H. J. Wolfson, On the error analysis of geometric hashing, in Proceedings, IEEE Conference on Computer Vision and Pattern Recognition, Hawaii, 1991, pp. 22–27.


22. S. Lang, Linear Algebra, Addison-Wesley, MA, 1966.
23. J. J. Liu, Model-Based 3-D Object Recognition System Using Geometric Hashing with Attributed Features, Ph.D. thesis, Computer Science Dept., Courant Institute of Mathematical Sciences, New York University, New York, 1995.
24. J. J. Liu and R. A. Hummel, Geometric hashing with attributed features, in Proceedings, Second CAD-Based Vision Workshop, Champion, Pennsylvania, 1994, pp. 9–16.
25. D. G. Lowe, Perceptual Organization and Visual Recognition, Kluwer, MA, 1985.
26. D. G. Lowe, The viewpoint consistency constraint, Int. J. Comput. Vision 1, 1987, 57–72.
27. Y. H. Pao, Adaptive Pattern Recognition and Neural Networks, Addison-Wesley, MA, 1989.
28. I. Rigoutsos, Massively Parallel Bayesian Object Recognition, Ph.D. thesis, Computer Science Dept., Courant Institute of Mathematical Sciences, New York University, New York, 1992.
29. I. Rigoutsos and R. Hummel, Robust similarity invariant matching in the presence of noise, in Proceedings, 8th Israeli Conference on Artificial Intelligence and Computer Vision, Tel Aviv, Israel, 1991.
30. I. Rigoutsos and R. Hummel, Several results on affine invariant geometric hashing, in Proceedings, 8th Israeli Conference on Artificial Intelligence and Computer Vision, Tel Aviv, Israel, 1991.
31. I. Rigoutsos and R. Hummel, Massively parallel model matching: Geometric hashing on the Connection Machine, IEEE Computer: Special Issue on Parallel Processing for Computer Vision and Image Understanding, 1992.
32. I. Rigoutsos and R. A. Hummel, Implementation of geometric hashing on the Connection Machine, in IEEE Workshop on Directions in Automatic CAD-Based Vision, 1991.
33. K. B. Sarachik and W. E. L. Grimson, Gaussian error model for object recognition, in Proceedings, IEEE Conference on Computer Vision and Pattern Recognition, New York, 1993, pp. 400–406.
34. J. T. Schwartz and M. Sharir, Identification of partially obscured objects in two dimensions by matching of noisy "characteristic curves," Int. J. Robotics Res. 6, 1987, 29–44.
35. F. C. D. Tsai, A Probabilistic Approach to Geometric Hashing Using Line Features, Ph.D. thesis, Computer Science Dept., Courant Institute of Mathematical Sciences, New York University, New York, 1993.
36. F. C. D. Tsai, Geometric hashing with line features, Pattern Recognit. 27, 1994, 377–389.
37. S. W. Zucker, R. Hummel, and A. Rosenfeld, An application of relaxation labeling to line and curve enhancement, IEEE Trans. Comput. 26, 1977, 394–403.