Shape similarity matching for query-by-example



Pattern Recognition, Vol. 31, No. 7, pp. 931—944, 1998 ( 1998 Pattern Recognition Society. Published by Elsevier Science Ltd All rights reserved. Printed in Great Britain 0031-3203/98 $19.00#0.00

PII: S0031-3203(97)00076-9

SHAPE SIMILARITY MATCHING FOR QUERY-BY-EXAMPLE

BILGE GÜNSEL* and A. MURAT TEKALP

Department of Electrical Engineering and Center for Electronic Imaging Systems, University of Rochester, Rochester, NY 14627, U.S.A.

(Received 23 January 1997; in revised form 3 June 1997)

Abstract—This paper describes a unified approach for two-dimensional (2-D) shape matching and similarity ranking of objects by means of a modal representation. In particular, we propose a new shape-similarity metric in the eigenshape space for object/image retrieval from a visual database via query-by-example. This differs from prior work, which performed point correspondence determination and similarity ranking of shapes in separate steps. The proposed method employs selected boundary and/or contour points of an object as a coarse-to-fine shape representation, and does not require extraction of connected boundaries or silhouettes. It is rotation-, translation- and scale-invariant, and can handle mild deformations of objects (e.g. due to partial occlusions or pose variations). Results comparing the unified method with an earlier two-step approach using B-spline-based modal matching and Hausdorff distance ranking are presented on retail and museum catalog style still-image databases. © 1998 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Image databases; Shape similarity metrics; Content-based access; Modal matching; Object retrieval.


1. INTRODUCTION

Recently, a variety of content-based image indexing and retrieval methods have been proposed based on shape, color, texture, or combinations of them.(1—6) In general, queries based on content similarity to an example object/image in terms of color, texture, shape, etc. are known as Query-by-Example (QBE). This work focuses on shape-based QBE for color image databases, which requires a robust shape representation and a shape similarity measure suitable for fast retrieval and similarity ranking. A wide variety of shape representation and matching techniques have been proposed, among them template matching,(7) modal matching,(8) methods using B-splines or snakes,(9) Fourier descriptors,(10) and statistical feature-based methods such as moment invariants.(11) However, these techniques do not address the shape retrieval and similarity ranking issues raised by QBE applications. Recent content-based still-image indexing and retrieval systems include IBM's Query-by-Image Content (QBIC),(12) MIT's Photobook,(4) and the Multimedia/VOD testbed from Columbia University.(13) QBIC performs shape-based search utilizing a number of shape features: area, circularity, eccentricity, major axis of inertia, and higher-order algebraic moment invariants. In QBIC, these shape features are

*Author to whom correspondence should be addressed. E-mail: {gunsel, tekalp}@ee.rochester.edu.


combined into a feature vector, and shape similarity is measured by a weighted Euclidean distance metric. These features are sensitive to outliers and are reliable only for a small number of objects (mostly one object) contained in the image. The Columbia University Multimedia testbed integrates texture, color and shape information for image retrieval using Euclidean distance. Photobook is a content-based tool utilizing finite element modes to match shapes from a digital database to an example. Later work of Sclaroff and Pentland(3) incorporated functions for similarity ranking of objects into Photobook based on eigenshape decomposition. Their two-step approach utilized the eigenmodes computed by a finite-element-based correspondence algorithm to describe the rigid and nonrigid deformations needed to align one object with another. The extent of these deformations is defined as a "measure of similarity" between the two objects. Recent work by Sclaroff(14) orders the shapes of database objects in terms of nonrigid deformations and does not require comparison of the query example with all entries in a database. In this work, we define a new shape similarity measure based directly on elements of the mismatch matrix derived from the eigenshape decomposition,(8,15) which enables similarity ranking of the retrieved objects. We consider all or a subset of the boundary points of an object (database or query object) as its shape representation, and treat them as an m-dimensional feature vector, where m is automatically selected. Using the eigenshape representation of the object's boundary, we form an m×m


positive-definite proximity matrix which describes the relationship between the feature (boundary) points of that object. The distances between the eigenvectors of the proximity matrices of the query example and the database object under consideration form a mismatch matrix specifying matched feature points. We then define a new similarity metric as a function of the entries of the mismatch matrix corresponding to matched point pairs. We specifically address the following problems: (i) How do we select the appropriate number of feature points m for each object? (ii) How do we perform shape matching and similarity ranking when each object in the database, as well as the query example, has a different number of feature points? (iii) Can we handle mild deformations of the object shape, e.g. due to pose variations and partial occlusions?

An essential step in object-based indexing/retrieval is the specification of objects, sometimes referred to as the foreground/background (F/B) separation problem. In this paper, we employ a color-based F/B separation scheme, which allows users to semiautomatically identify objects of interest in the images. The boundary points of the database and example objects are estimated by either the Canny edge detector(16) or the Graduated Nonconvexity (GNC) algorithm.(17) Prior to shape matching, the histogram intersection method(18) is used for object-based color histogram matching, which allows us to eliminate database objects whose colors are significantly different from those of the example object. This step minimizes the number of shape similarity tests that need to be conducted, thus reducing the overall computational complexity of shape matching. We assume that F/B separation has also been employed in the creation of the image databases which can be searched with the proposed method.
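The color pruning step above relies on histogram intersection.(18) A minimal sketch, assuming simple binned color histograms; the function name and the normalization by the query histogram's total mass are our assumptions, not the paper's:

```python
import numpy as np

def histogram_intersection(query_hist, db_hist):
    # Histogram intersection: fraction of the query's color mass that is
    # also present in the database object's histogram (1.0 = identical).
    query_hist = np.asarray(query_hist, dtype=float)
    db_hist = np.asarray(db_hist, dtype=float)
    return np.minimum(query_hist, db_hist).sum() / query_hist.sum()

# Toy 3-bin color histograms; a low score would let us skip the
# (more expensive) shape similarity test for this database object.
q = np.array([10.0, 30.0, 60.0])
d = np.array([20.0, 20.0, 60.0])
print(histogram_intersection(q, d))  # -> 0.9
```

Database objects whose intersection score falls below a threshold are discarded before any shape matching is attempted.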
Moreover, it is assumed that the eigenshape representation of the boundary points of each database object, as well as their color histograms, have been stored in an appropriate indexing field. The F/B separation (object definition) problem can be avoided altogether if the database and example images are object-based encoded bitstreams, e.g. MPEG-4 bitstreams,(19) which already include object boundary information. The organization of the paper is as follows: Section 2 addresses feature point selection (data reduction) depending on the intended application. Section 3 describes a unified shape matching and similarity ranking approach. In particular, Section 3.1 presents eigenshape representations and Section 3.2 defines the new shape similarity metric. Issues related to robustness and computational complexity are discussed in Section 3.3. The proposed method is compared with the two-step approach of B-spline-based modal matching and Hausdorff distance ranking,(20) which is briefly reviewed in Section 4. Experimental results are presented in Section 5, and conclusions are given in Section 6. The F/B separation method is summarized in Appendix A.

2. FEATURE POINT SELECTION

The first step in shape matching and similarity ranking is to select the feature (boundary and/or contour) points that are representative of the 2-D shape of an object. Here, boundary points refer to all 2-D edge points that are projections of the 3-D boundaries of the object into the image plane, whereas contour points refer to the outline of the 2-D shape of the object. Hence, contour points are a subset of boundary points, which suggests a coarse-to-fine approach for selecting object shape feature points depending on the intended application. For example, if we are interested in retrieving objects which are similar in shape to a query example, then a coarse feature set comprising the contour points or the control points of a B-spline representation of a closed contour may be sufficient. On the other hand, if we are looking for objects with identical shape, including the pose, then a fine (dense) representation of all boundary points may be needed. The shape matching and similarity ranking method proposed in Section 3 is capable of using both representations, since (i) it allows both closed contours and unconnected object boundary points; and (ii) it can handle matching two shape descriptors with different numbers of feature points. A dense set of boundary points can be automatically detected by the Canny edge detector(16) or the GNC algorithm.(17) In this case, the boundary points need not form a closed or connected set. The number m of feature points for the eigenshape decomposition described in Section 3 is then set equal to the number of detected boundary points. It is assumed that the indexing field associated with each database image includes the boundary representation of its objects. This dense boundary representation is suitable for queries posed in terms of the silhouettes or sketches of sought objects.
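The paper uses the Canny or GNC detectors for this step. As an illustrative stand-in only, a thresholded gradient magnitude already produces the kind of unconnected boundary point set the method accepts; the function name and threshold below are ours:

```python
import numpy as np

def boundary_points(image, thresh=0.25):
    # Simplified edge detection (a stand-in for Canny/GNC): central-difference
    # gradient magnitude, thresholded relative to its maximum. Returns an
    # (m, 2) array of unconnected boundary points; no closed contour needed.
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    ys, xs = np.nonzero(mag > thresh * mag.max())
    return np.column_stack([xs, ys])

# A white square on a black background yields points along its border only;
# m (the number of feature points) is simply len(pts).
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0
pts = boundary_points(img)
```

The resulting point set can be fed directly to the eigenshape decomposition of Section 3, since neither connectivity nor ordering of the points is required.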
If a coarse representation of the shape is sufficient for the application at hand and closed contours of the objects under consideration are available, then we employ the control points of the B-spline representation of the shape for matching. Several properties make B-splines appealing for shape representation.(9,20) First, the global shape of an object can be efficiently specified by a small number of control points characterizing the fitted B-spline, giving effective data reduction. Second, a B-spline representation is affine-invariant, i.e. a B-spline undergoing an affine transformation is still a B-spline whose control points can be computed by a transformation of the original ones. Third, a B-spline provides a smooth boundary representation that is robust to noise. Finally, it allows decoupled treatment of the x and y coordinates. Selection of contour feature points by B-spline fitting is summarized in Appendix B.

3. A UNIFIED APPROACH FOR SHAPE MATCHING AND SIMILARITY RANKING

The basic requirements on a shape-based QBE system include: data reduction, a robust shape


representation, an accurate similarity metric, fast response time, and, depending on the application, invariance to rotation, scaling, translation, and mild (perspective) deformations. In the literature, shape matching and similarity ranking are generally done in two steps,(12,20) where the overall system may or may not fulfill all of these requirements. Shape similarity measures can be classified as geometric or perceptual. The perceptual approaches aim to model the characteristics of human visual similarity perception and define similarity measures based on such models.(7,21—24) The method proposed in this paper falls under the geometric approaches, which define a geometric shape similarity distance in a suitable metric space. We develop a retrieval scheme that performs shape matching and similarity ranking in a single step, thereby reducing errors arising from computations in different feature spaces and satisfying all the requirements mentioned above.

3.1. Shape representation and matching by eigen decomposition

In eigenshape decomposition methods, shape information is decomposed into an ordered basis of orthogonal principal components.(15) As a result, the less critical and often noisy high-order components can be discarded in order to obtain canonical descriptions. This allows for selection of only the most important components for efficient data reduction and robust shape matching. Furthermore, the orthogonality of eigenshapes ensures that the descriptions are unique.

In order to establish correspondences between boundary points of the query example and a database object by the eigen-decomposition method, we first define a proximity matrix H for the query example and each database object. The proximity matrix of an object is positive definite by definition, and its eigenvectors encode the 2-D object shape. The entry in the ith row and jth column of the m×m matrix H can be computed based on a Gaussian-weighted metric as in reference (15):

    H_ij = exp{−d_ij² / 2σ²},  i, j = 1, 2, …, m,   (1)

where m is the number of boundary points, and d_ij² = ||x_i − x_j||² is the squared Euclidean distance between the boundary points x_i and x_j of the object under consideration. The parameter σ controls the width of the Gaussian, and determines the degree of interaction between the boundary points. For small σ, the interaction is fairly local; for large σ, a bigger subset of boundary points affects each match. Next, we compute the eigenvalues and eigenvectors of the proximity matrix H by solving

    H e_i = λ_i e_i,  i = 1, 2, …, m.   (2)

All eigenvalues λ_i will be positive since H is positive definite. The normalized eigenvectors e_i are called modes or eigenshapes. The modal matrix is defined as

    V = [v_1^T v_2^T … v_m^T]^T = [e_1 | e_2 | … | e_m].   (3)

Modes having large eigenvalues are referred to as low-frequency modes, while those with small eigenvalues are called high-frequency modes. Because the matrix H is symmetric, the modal matrix V is orthogonal. Therefore, the m modes constitute an orthonormal basis, defining a set of coordinate axes in an m-dimensional space. The ith row v_i of the modal matrix V is termed the feature vector of boundary point i; its elements are the corresponding modal coordinates.

Suppose a query example e and a database object l have m_e and m_l boundary points, respectively. Then the modal matrices H^e and H^l are m_e×m_e and m_l×m_l, respectively. We next form the mismatch matrix Z between the query example e and the database object l. The entry in the ith row and jth column of the m_e×m_l matrix Z is given by

    Z_ij = ||v_i^e − v_j^l||²,  i = 1, 2, …, m_e,  j = 1, 2, …, m_l.   (4)

Recall that v_i^e is m_e×1 and v_j^l is m_l×1. Clearly, we need to redefine each vector as t×1 in order to subtract them. Determination of the common dimension t is discussed in Section 3.3. The values of Z_ij range between zero (for a perfect match) and two (for a failed match). Finally, matching feature points are determined through the following procedure: (i) Determine the minimum entry in the ith row of Z; suppose it is in the jth position. (ii) If the minimum entry in the jth column of Z is at the ith position, then a match is declared between the ith feature point of the example object and the jth point of the database object; otherwise, no match is found for the ith feature point of the example object. Thus, the matrix Z enables rotation-, translation- and scale-invariant matching of the feature points of the query example and database objects. Of course, the maximum number of matches is given by K_max = min{m_e, m_l}, and the actual number of matching feature points is K ≤ K_max.

3.2. A new shape similarity metric

We define the K×1 vector s_el, composed of the Z_ij values corresponding to matched feature points of the query example e and database object l, as the similarity vector. In case of a perfect match, K = K_max, and the similarity vector will be a zero vector. But if a perfect match cannot be established, the norm ||s_el|| gives a measure of the mismatch of the two shapes in the shape matching (feature) space. An additional penalty is added to ||s_el|| when K < K_max. Since max{Z_ij} = 2, the penalty term is set equal to 2(K_max − K). Thus, we define a shape similarity distance function d(s_el), between the query example e and database object l, by

    d(s_el) = (1/K_max) Σ_{s=1}^{K} Σ_{i=1}^{t} [v_si^e − v_si^l]²                      if K = K_max,
    d(s_el) = (1/K_max) [ Σ_{s=1}^{K} Σ_{i=1}^{t} [v_si^e − v_si^l]² + 2(K_max − K) ]   if K < K_max,   (5)

where v_si^e and v_si^l denote the components of v_i^e and v_j^l, respectively. Recall that the dimension of the considered shape matching space is t, i.e. both v_i^e and v_j^l are t×1. It can easily be shown that the function d(s_el) ranges between zero and two.

In the following, we show that the similarity distance (5) obeys metric properties; i.e. it is everywhere positive, has the properties of self-similarity and symmetry, and satisfies the triangle inequality.

Positivity: d(s_el) ≥ 0. This axiom is satisfied since the elements of the mismatch matrix Z are always greater than or equal to zero.

Self-similarity: d(s_ee) = d(s_ll) = 0. The self-similarity property is satisfied since d(s_el) is equal to zero if and only if there is a perfect match.

Symmetry: d(s_el) = d(s_le). The symmetry property corresponds to exchanging the rows of the matrix Z with its columns, which does not affect the coordinates of the matched points, and hence the value of the similarity distance. Thus, the distance measure is symmetric.

Triangle inequality: d(s_ep) + d(s_pl) ≥ d(s_el). In order to prove that the similarity distance d(s_el) obeys the triangle inequality in the feature space, it is sufficient to show that each Z_ij satisfies the triangle inequality, i.e.

    Z_ij^el ≤ Z_ic^ep + Z_cj^pl  ⇒  ||v_i^e − v_j^l||² ≤ ||v_i^e − v_c^p||² + ||v_c^p − v_j^l||²,   (6)

where i, j, c can be any boundary points, and e, p and l are any three objects included in the database. However, each Z_ij is defined as the Euclidean distance between the feature vectors;(8) thus, it satisfies the triangle inequality. Indeed, d(s_el) corresponds to the normalized squared Euclidean distance between modal coordinates of matched boundary points.

These properties relate to intuitive notions of shape resemblance; namely, that a shape is identical only to itself, the order of comparison of two shapes does not matter, and two shapes that are highly dissimilar cannot both be similar to some third shape. This final property (the triangle inequality) is particularly important in pattern matching applications in which several stored query example shapes are compared with an unknown shape. Most shape-comparison functions used in such applications do not obey the triangle inequality and can thus report that two highly dissimilar query example shapes are both similar to the unknown shape recognized from the observed image.

3.3. Computational aspects: reduction of dimensionality

The block diagram of the unified shape matching and similarity ranking method is shown in Fig. 1. The top part of the block diagram illustrates database preparation, i.e. extraction of indexing information for database objects. The indexing information stored for each image consists of the eigenshape representation

Fig. 1. Block diagram of the unified shape retrieval system.


and color histogram of objects included in the image. During the retrieval phase (the lower part of the block diagram), we need to compute the eigenshape representation of the query object, and then form the mismatch matrix between the query example and each image object under consideration. The matching is achieved in the feature space using the shape similarity metric. Thus, the computational complexity of the proposed retrieval method is directly related to the size of the mismatch matrix, which depends on the numbers of feature points m_e and m_l. Here, we propose methods for reducing the size of the mismatch matrix and the dimension of the feature matching space t.

Data reduction to limit the response time may be achieved in two ways: (i) initial reduction of the number of boundary or contour points m_e and m_l (we presented a B-spline fitting procedure for this purpose when closed contours are available); (ii) elimination of eigenmodes with small eigenvalues, starting with a dense boundary set (statistical data reduction). In the latter case, to select the most significant eigenmodes, we first rank the eigenvalues of the proximity matrix of each object in nonincreasing order. Assuming that the number of selected eigenmodes for the query example is m'_e, where m'_e < m_e, the sum of the eigenvalues corresponding to the discarded modes, Σ_{i=m'_e+1}^{m_e} λ_i, gives the mean-square error due to discarding m_e − m'_e modes. Then, we choose the first m'_e eigenvalues such that the ratio

    ( Σ_{i=t+1}^{m_e} λ_i ) / ( Σ_{i=1}^{m_e} λ_i ) ≤ P   (7)

remains less than some fixed percentage P.(25) The scalar P is generally selected experimentally to provide a reduction in the number of features, while retaining the variance present in the original feature vector. It is assumed that similar data reduction has been achieved in the preparation of the database, such that the number of modes of the lth database object has been reduced from m_l to m'_l.
In our experiments, we set the dimension of the feature matching space t = m'_e. A complete algorithmic description of the proposed reduced-dimensional unified shape matching and similarity ranking algorithm is summarized in Table 1. Furthermore, object-based color histogram matching can be employed to limit the search space of shape matching to foreground objects with the desired colors.
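A compact numpy sketch of Sections 3.1 to 3.3: the Gaussian proximity matrix (1), its eigenmodes (2) and (3), the eigenvalue-ratio truncation (7), the mismatch matrix (4), row/column-supremacy matching, and the metric (5). This is illustrative only: the function names are ours, σ and P are free parameters, and the eigenvector sign and ordering refinements used in practice (reference 15) are omitted.

```python
import numpy as np

def eigenmodes(points, sigma=1.0):
    # Proximity matrix H_ij = exp(-d_ij^2 / 2 sigma^2), Eq. (1), and its
    # eigendecomposition, Eq. (2); columns of V are the modes, Eq. (3),
    # returned with eigenvalues in nonincreasing order.
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    eigvals, V = np.linalg.eigh(np.exp(-d2 / (2.0 * sigma ** 2)))
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], V[:, order]

def reduced_dimension(eigvals, P=0.05):
    # Smallest t with (sum of discarded eigenvalues)/(total sum) <= P, Eq. (7);
    # eigvals must be in nonincreasing order.
    lam = np.asarray(eigvals, dtype=float)
    for t in range(1, len(lam) + 1):
        if lam[t:].sum() / lam.sum() <= P:
            return t
    return len(lam)

def similarity_distance(query_pts, db_pts, sigma=1.0, P=0.05):
    le, Ve = eigenmodes(np.asarray(query_pts, float), sigma)
    ll, Vl = eigenmodes(np.asarray(db_pts, float), sigma)
    t = reduced_dimension(le, P)            # common dimension t = m'_e
    Ve, Vl = Ve[:, :t], Vl[:, :t]           # sketch assumes t <= m_l
    # Mismatch matrix Z_ij = ||v_i^e - v_j^l||^2, Eq. (4).
    Z = ((Ve[:, None, :] - Vl[None, :, :]) ** 2).sum(axis=2)
    # A pair (i, j) matches when each is the other's row/column minimum.
    matches = [(i, j) for i, j in enumerate(Z.argmin(axis=1))
               if Z[:, j].argmin() == i]
    k_max = min(len(query_pts), len(db_pts))
    k = len(matches)
    # Metric (5): mean matched mismatch plus a penalty of 2 per unmatched pair.
    return (sum(Z[i, j] for i, j in matches) + 2.0 * (k_max - k)) / k_max

# An object compared with itself is at distance 0.
pts = np.array([[0, 0], [2, 0], [2, 1], [1, 2], [0, 1]], float)
print(round(similarity_distance(pts, pts), 6))  # -> 0.0
```

Ranking (Step 7 of Table 1) then amounts to evaluating similarity_distance between the query and every surviving database object and sorting the results.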

4. A TWO-STEP METHOD

The proposed unified approach has been compared with a two-step approach where shape matching and similarity ranking are performed in different feature spaces.(20) The shape matching in the two-step approach is quite similar to the one described in Section 3.1. The eigenshape representations of the objects are again computed, but this time using only the control


Table 1. Summary of the proposed shape similarity matching scheme

Step 1. For each query example e: detect a boundary/contour point set, and find the m_e×m_e proximity matrix H^e with eigenvalues λ_i^e, i = 1, …, m_e, where m_e is the number of boundary/contour points of e.

Step 2. For the lth database object: assuming that a boundary/contour point set is stored with each object, find the m_l×m_l proximity matrix H^l with eigenvalues λ_i^l, i = 1, …, m_l, where m_l is the number of boundary/contour points of the database object.

Step 3. Dimension of the matching space, t: let t = m'_e be the number of eigenvalues of the proximity matrix H^e satisfying the threshold condition

    ( Σ_{i=m'_e+1}^{m_e} λ_i ) / ( Σ_{i=1}^{m_e} λ_i ) ≤ P_e,

where P_e is predetermined. Find the reduced dimension m'_l of the representation of the database object l satisfying the condition

    ( Σ_{i=m'_l+1}^{m_l} λ_i ) / ( Σ_{i=1}^{m_l} λ_i ) ≤ P_l,  P_l ≥ P_e,

such that m'_l ≥ m'_e.

Step 4. Construction of the mismatch matrix: compute the t-dimensional eigenmodes v_i^e, i = 1, …, m'_e, and v_i^l, i = 1, …, m'_l, for e and l, respectively. Define the m'_e×m'_l mismatch matrix Z, where

    Z_ij = Σ_{s=1}^{t} [v_is^e − v_js^l]².

Here v_is^e and v_js^l denote the components of v_i^e and v_j^l, respectively.

Step 5. Feature point matching: suppose successful matching is established at K ≤ t points; that is, K points satisfy both column and row supremacy.

Step 6. Similarity metric: compute the shape similarity distance d(s_el) by

    d(s_el) = (1/t) Σ_{s=1}^{K} Σ_{i=1}^{t} [v_si^e − v_si^l]²                  if t = K,
    d(s_el) = (1/t) [ Σ_{s=1}^{K} Σ_{i=1}^{t} [v_si^e − v_si^l]² + 2(t − K) ]   if t > K.

Step 7. Visit each database object and rank the retrieval outcomes according to the shape similarity distance.

points of a B-spline fit to the contour of the object, rather than using object boundary points. Once the B-spline representations of the query object and of the database object under consideration are established,(9) eigenshape representations corresponding to the contour of each object are obtained, and matching control point pairs are determined from the mismatch matrix as presented in Section 3.1. To allow definition of a similarity measure and similarity ranking, the B-spline control points of the database object are transformed to the spatial coordinates of the query object through an affine transformation. The parameters of this affine transformation are computed, in the least-squares sense, from


    x_ri = a_11 x_qi + a_12 y_qi + a_13,
    y_ri = a_21 x_qi + a_22 y_qi + a_23,   i = 0, 1, …, n−1,   (8)

stacked over all n matched point pairs into a single linear least-squares system in the six unknowns [a_11, a_12, a_13, a_21, a_22, a_23]^T,

where a_11, a_12, a_13, a_21, a_22, a_23 represent the affine parameters, and (x_qi, y_qi) and (x_ri, y_ri) represent the coordinates of the ith matched points of the query example and database object, respectively. The similarity of the query example and the affine-transformed database object can be established in terms of the Hausdorff distance.(7) The Hausdorff distance between the contours of the example object A = {a_1, …, a_m_e} and the affine-transformed database object B = {b_1, …, b_m_l} is defined as

    H(A, B) = max{h(A, B), h(B, A)},   (9)

where the directed distance h(A, B) = max_{a∈A} min_{b∈B} ||a − b||, and ||·|| is the L_2 (Euclidean) norm on the points of A and B. Figure 2 shows the block diagram of the two-step retrieval scheme.

5. EXPERIMENTAL RESULTS

We tested the proposed shape matching and similarity ranking method on two prototype image databases: a binary Digital Clip Art database by Image Club Graphics Inc., and a color image database consisting of images of 10 different car models from car retail catalogs (scanned at 70 dpi). Figures 3 and 4 depict some of the images in the car and clip art databases, respectively. Since our goal is to build an image retrieval system which is invariant to isotropic scaling, rotation, and translation, rigid or nonrigid deformations of each image are also added to both databases. Nonrigid deformations are included to test the sensitivity of the shape matching and ranking method to possible perspective distortions.

5.1. Results on the clip art database

We first compare the performance of contour-based vs boundary-based similarity matching using the metric (5) on some clip art images. A potential query object (an airplane) is shown in Fig. 5a. Three rigid and three nonrigid deformations of this query object from the database are depicted in Fig. 5b—g. Shape correspondences between the query object and each of the database objects are computed using 14 modes, corresponding to the 14 largest eigenvalues, for P ≥ 0.05. Figure 6a and b plot the envelopes of the 1st and 20th eigenmodes of the query object (Fig. 5a) and the sixth database object (Fig. 5g), respectively. The first mode, depicted by the solid curve, shows the envelope of the first column of the corresponding modal matrix V vs the boundary point index. It can easily be seen that the 20th mode (depicted by the dotted curve) is a high-frequency mode while the 1st is a low-frequency mode. In Fig. 6c and d, we show the same plots for the B-spline fitted boundaries with 100 control points. The graphs show that the modes obtained by B-spline fitting are smoother. Figure 7 depicts the estimated matching points between the query object and two of the nonrigidly deformed database objects. Because there is a large number of data points, only 10% of the correspondences are shown. It can be seen that acceptable correspondences were found, except at the boundaries of the right wing

Fig. 2. Block diagram of the two-step retrieval system.
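The ranking stage of the two-step scheme in Fig. 2 can be sketched as follows, assuming matched point pairs are already available as numpy arrays (function names are ours): the affine parameters of equation (8) are estimated by linear least squares, the database contour is registered, and the Hausdorff distance (9) is computed:

```python
import numpy as np

def fit_affine(src, dst):
    # Least-squares estimate of Eq. (8): dst_i ~ [src_i, 1] @ T for the
    # matched point pairs, where the 3x2 matrix T stacks the affine parameters.
    M = np.hstack([src, np.ones((len(src), 1))])
    T, *_ = np.linalg.lstsq(M, dst, rcond=None)
    return T

def apply_affine(T, pts):
    return np.hstack([pts, np.ones((len(pts), 1))]) @ T

def hausdorff(A, B):
    # H(A, B) = max{h(A, B), h(B, A)}, Eq. (9), with the directed distance
    # h(A, B) = max over a in A of the distance to the nearest point of B.
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return max(d.min(axis=1).max(), d.min(axis=0).max())

# Toy check: a translated database contour registers back onto the query,
# so the Hausdorff distance after registration is ~0.
query = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
db = query + [2.0, -1.0]
T = fit_affine(db, query)                 # database -> query coordinates
print(round(hausdorff(query, apply_affine(T, db)), 6))  # -> 0.0
```

Unlike metric (5), this registration step assumes the database object is (approximately) an affine transform of the query, which is why the two-step scheme degrades under nonrigid deformations, as the results below show.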


Fig. 3. Examples from car image database.

Fig. 4. Examples from Digitalart Clip Art Catalog Images.

Fig. 5. (a) Query example. (b) 270° rotated. (c) 90° rotated. (d) 1/2 scaled. (e) Quite different view. (f) Very different view. (g) Very different view.

of the plane, although the correspondence model does not support nonrigid deformations. Ranking of the six database objects in terms of their similarity with the query example is given in Table 2.

The table lists the distances between the query and database objects computed by four different schemes for comparison purposes: the unified matching (UM) scheme with the proposed shape similarity metric using all boundary points (UM/all BP); the UM scheme using the B-spline control points of the contours (UM/B-sp); the two-step (TS) method using B-spline control points of the contours and the Hausdorff distance (HD) metric after affine registration (TS/B-sp & HD); and the HD metric using all boundary points (HD/all BP). Table 3 includes the least-squares estimates of the affine parameters employed by the two-step scheme. These parameters are used to register the database and query objects before computing the Hausdorff distance between them. These results can be interpreted as follows:

• The proposed similarity metric (1st and 2nd columns of Table 2) ranks the database objects in terms of their "similarity" to the query example in a way consistent with human perception. Observe that the rotated planes, (b) and (c), yield a perfect match with the UM/all BP method, while the scaled plane (d) results in a nonzero distance measure even though all boundary points were used. However, nonrigid deformations, (e), (f), and (g), give higher measures, suggesting lesser similarity.

• The UM/B-sp method does not provide a perfect match even in the case of rotations, because of the smoothing provided by the B-spline representation. However, it can clearly distinguish rigid deformations from nonrigid ones.

• The TS/B-sp & HD method fails to retrieve the nonrigidly deformed plane shown in Fig. 5(g). This is because the estimated affine parameters may be


unacceptable when the database object is not a rigid transformation of the query example.

• The Hausdorff distance metric fails to order similarities for the scaled and nonrigidly deformed images, while it performs very well with rotations.

Fig. 6. Envelope plots of the eigenvectors versus boundary points for the eigenshape representations of (a) Fig. 5(a) (all BP), (b) Fig. 5(g) (all BP), (c) Fig. 5(a) (B-sp), (d) Fig. 5(g) (B-sp). The solid and dotted plots denote the 1st and 20th eigenvectors, respectively.

Fig. 7. Correspondence of the feature points for query example shown in Fig. 5(a).

5.2. Results on the car database

Our second example demonstrates database preparation and compares contour-based vs boundary-based similarity matching of five possible query templates, depicted in Fig. 8, and an image from our car database (a blue car), shown in Fig. 9a. Figures 9b—d illustrate the steps in database preparation, i.e. shape feature extraction for storage with database images. First, a color sample patch is specified either on the background or on the foreground object, as marked in Fig. 9a. Color-based F/B separation is performed as described in Appendix A. Figure 9b shows the regions of the image whose colors match the color sample. Figure 9c shows the smoothed image obtained by a filter which eliminates blue regions smaller than 50 pixels. Figure 9d displays all closed regions surrounded by blue pixels by marking each with a different color. The boundary/contour representations stored in the database for each object are selected among these closed regions. For a coarse contour-based representation, only the outermost contour shown in Fig. 9g need be selected, and this specifies the boundary of the retrieved object shown in Fig. 9h. For a finer boundary representation, the points on the borders of


Table 2. Results of shape similarity ranking for the query example shown in Fig. 5a

Figure   UM/all BP   UM/B-sp    TS/B-sp & HD   HD/all BP
(b)      0.000000    0.048125      5.391831     0.000000 (θ = 270)
(c)      0.000000    0.109827      3.970885     0.000000 (θ = 90)
(d)      0.581323    0.117602      6.068726    84.403
(e)      1.825849    1.417289     42.792020    54.000
(f)      1.847753    1.447367    120.417610    74.000
(g)      1.881715    1.506393   1099.304612    81.000

Table 3. Affine registration parameters estimated by the two-step method (TS/B-sp)

Figure   a11         a22         a12         a21          a13          a23
(b)      -0.008652   -0.001655   -0.995743    0.997345    149.501383     0.418744
(c)      -1.005323   -0.998795    0.003322   -0.010402    151.974790    -0.443877
(d)       0.506172    0.505676   -0.005221   -0.004827     37.388406    38.069718
(e)      -0.296441    0.162715   -0.437827   -0.852070    130.692741   121.466011
(f)       0.375435    0.005785   -0.552567    0.218959     93.085879    58.406993
(g)      -0.136446    0.213968   -0.061169   -0.831523     86.887084   113.487696
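The six parameters above describe an affine map x' = a11 x + a12 y + a13, y' = a21 x + a22 y + a23. A least-squares estimate of such parameters from matched point pairs can be sketched as follows; the function name and the toy correspondences are illustrative assumptions, and in the paper's two-step method the correspondences are supplied by modal matching:

```python
import numpy as np

def estimate_affine(src, dst):
    """Least-squares affine parameters mapping src points onto dst points.

    Returns (a11, a12, a13, a21, a22, a23) such that
        x' = a11*x + a12*y + a13,   y' = a21*x + a22*y + a23.
    Requires at least three non-collinear correspondences.
    """
    X = np.column_stack([src, np.ones(len(src))])   # h x 3 design matrix [x y 1]
    P, *_ = np.linalg.lstsq(X, dst, rcond=None)     # 3 x 2 least-squares solution
    (a11, a21), (a12, a22), (a13, a23) = P
    return a11, a12, a13, a21, a22, a23

# Recover a known transform: a 90-degree rotation plus a translation of (10, 5).
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
rot = np.array([[0.0, -1.0], [1.0, 0.0]])           # rotation part
dst = src @ rot.T + np.array([10.0, 5.0])
print(np.round(estimate_affine(src, dst), 6))       # (0, -1, 10, 1, 0, 5)
```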

Fig. 8. Query examples. (a) Temp. 1, (b) Temp. 2, (c) Temp. 3, (d) Temp. 4, (e) Temp. 5.

each of the marked regions may be selected. Figure 9e shows the car template which is used as the query object. B-spline modeling of the object contour is shown in Fig. 9f, where the number of B-spline control points is set to 150. This number was selected to reduce the number of control points to approximately 50—60% of the original boundary points. As in the previous example, the shape similarity metric between the query template of Fig. 9f and the object contour of Fig. 9g is computed by four different schemes and presented in the third row of Table 4 for comparison purposes. The rest of Table 4 provides the

shape similarity distances obtained for the query templates shown in Fig. 8a—e. Table 5 lists the least-squares estimates of the affine parameters computed by the two-step method. It can be concluded from Table 4 that the similarity measures obtained by the UM/B-sp and TS/B-sp & HD methods are similar. This is because both the original boundary points and the B-spline contour points are adequate to model the shape in this case. However, only UM/all BP can detect the dissimilarity of the car template shown in Fig. 8e, since it uses all boundary points.

940

B. GÜNSEL and A. M. TEKALP

Fig. 9. (a) Database image (color example patch is bounded by a rectangle). (b) Color classification result. (c) After smoothing. (d) Closed segments. (e) Query example. (f) B-spline fitted contours of the query example. (g) Retrieved shape by modal matching. (h) Retrieved object.
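The smoothing step (c), which eliminates same-color regions smaller than 50 pixels, amounts to connected-component size filtering. A sketch using SciPy's component labeling; the 4-connectivity and the helper name are assumptions, while the 50-pixel threshold comes from the text:

```python
import numpy as np
from scipy import ndimage

def remove_small_regions(mask, min_size=50):
    """Drop connected components of a binary mask smaller than min_size pixels."""
    labels, n = ndimage.label(mask)                      # label 4-connected regions
    sizes = ndimage.sum(mask, labels, range(1, n + 1))   # pixel count per region
    keep = np.zeros(n + 1, dtype=bool)
    keep[1:] = sizes >= min_size                         # background (label 0) stays off
    return keep[labels]

# Toy mask: one 3x3 blob (9 px) and one isolated pixel; with min_size=5
# only the blob survives the filter.
mask = np.zeros((8, 8), dtype=bool)
mask[1:4, 1:4] = True
mask[6, 6] = True
cleaned = remove_small_regions(mask, min_size=5)
print(int(cleaned.sum()))  # 9
```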

Table 4. Results of shape similarity analysis between the query examples shown in Fig. 8 and the database image shown in Fig. 9a

Templates   UM/all BP   UM/B-sp    TS/B-sp & HD   HD/all BP
Temp. 1     0.170987    1.410236      7.274549    32.406371
Temp. 2     0.170987    1.298265      7.390784    42.329505
Temp. 3     0.174990    1.478280      8.612820    41.054752
Temp. 4     0.176338    1.408465      8.900485    25.744095
Temp. 5     1.314527    1.559369     11.307883    24.069159

Table 5. Affine registration parameters estimated by the two-step method (TS/B-sp)

Templates   a11        a22         a12         a21         a13          a23
Temp. 1     1.273142    1.247740    0.453186   -0.303530   -52.589716     9.237884
Temp. 2     1.279201   -1.238341   -0.458881   -0.299430    15.800702   195.458466
Temp. 3     1.331115    1.115751   -0.167853    0.075591    16.610467     0.374542
Temp. 4     1.135517    1.004790   -0.198221    0.331150    -5.542172   -24.345672
Temp. 5     1.067913    0.979725   -0.176451    0.064744    20.016668   -12.499075

The last example demonstrates the performance of the UM/all BP method in cases where contour-based representations cannot be obtained automatically. Figures 10a—c illustrate different car database images. The smoothed foreground objects after F/B separation are shown in Fig. 10d—f. Here, the F/B separation

is performed by using a background color sample. The textures of the pixels within the detected foreground regions are shown in Fig. 10g—i. The boundary points of the object of interest were obtained by the Canny edge detector (σ = 1). The resulting boundary points are depicted in Fig. 10j—l. In this case, it is clear


Fig. 10. (a—c) Database images (color example patches are bounded by rectangles.) (d—f ) Smoothed color classification results. (g—i) Objects of interest. (j—l) Object boundaries.

Fig. 11. (a) Query example. (b—d) Three retrieval outcomes. (e) Boundaries of the query example. (f—h) Retrieved shapes.


Table 6. Shape similarity distances and the respective number of important eigenvalues for the three outcomes of the retrieval

Query example: (a)                  (b)         (c)         (d)
HD/all BP                     29.154759   17.000000   40.360872
UM/all BP                      0.114627    0.233581    0.235116
No. of important eigenvalues         36          36          36

that closed object contours cannot easily be found automatically from the boundary points. Thus, we perform shape matching and similarity ranking in terms of object boundary points using the UM/all BP method. Figure 11b shows the mirror image of the query example shown in Fig. 11a. Figures 11c and d depict two nonrigidly deformed images. Figures 11e—h picture the object boundaries obtained by the Canny edge detector. Table 6 presents the shape similarity distances between the query example and the other images, and compares them with the HD/all BP method. Also provided are the numbers of modes used in the modal matching. The ordering of the shape similarity distances confirms the invariance of the proposed method to mild deformations as well as to rotation and scaling.
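Boundary-point extraction as in Figs 10j—l and 11e—h uses the Canny detector with σ = 1. The sketch below keeps only Canny's first two stages, Gaussian smoothing and gradient-magnitude thresholding, and omits non-maximum suppression and hysteresis; the threshold value and the toy image are illustrative assumptions:

```python
import numpy as np
from scipy import ndimage

def boundary_points(gray, sigma=1.0, thresh=0.2):
    """Simplified edge detection: Gaussian smoothing (the sigma = 1 of the text)
    followed by gradient-magnitude thresholding. Returns the (row, col)
    coordinates of the detected edge pixels."""
    smoothed = ndimage.gaussian_filter(gray.astype(float), sigma)
    gy, gx = np.gradient(smoothed)            # derivatives along rows and columns
    mag = np.hypot(gx, gy)                    # gradient magnitude
    ys, xs = np.nonzero(mag > thresh * mag.max())
    return np.column_stack([ys, xs])

# A bright square on a dark background: the detected pixels hug its border
# and none fall in the flat interior of the square.
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0
pts = boundary_points(img)
print(len(pts))
```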

6. CONCLUSIONS

This paper proposes a new shape similarity metric in the modal feature space, and thus extends previous work on modal decomposition-based shape matching(15,20) by (i) providing a unified procedure for shape matching and similarity ranking; and (ii) proposing both B-spline-based and statistical methods for reduction of dimensionality for faster response time. In particular, the proposed contour-point-based and boundary-point-based representations provide coarse-to-fine models for shape matching and similarity ranking in different database browsing applications. We compare the results of four different coarse-to-fine shape matching and similarity ranking schemes. These results indicate that pose differences between similar objects lead to measurable similarity ranking differences with fine (boundary-point-based) representations, while the similarity scores under the coarse (B-spline-fitted contour-based) representation do not indicate such differences. Future extensions of this work will include 3-D shape models, and 2-D/3-D mesh-based shape representations for reduction of dimensionality (i.e. selection of prominent boundary points by mesh-design procedures).

Acknowledgements—This work is supported by a National Science Foundation SIUCRC grant and a New York State Science and Technology Foundation grant to the Center for Electronic Imaging Systems at the University of Rochester.

APPENDIX A. FOREGROUND/BACKGROUND SEPARATION

F/B separation is a fundamental problem in object-based image and video processing. Because the precise definition of objects in a scene generally depends on the interpretation of the user, it is impossible to develop automatic methods that extract semantically meaningful objects under all conditions. To this effect, the QBIC system, for example, provides both automatic and semiautomatic methods for F/B separation.(12) The automatic method in QBIC is intended for images with a simple background. In reference (5), an algorithm which integrates color and edge information for improved segmentation in the presence of busy backgrounds is proposed to extract objects from outdoor images. However, the computational complexity of this algorithm does not allow near real-time performance.

In this paper, we employ an adaptive classification approach in a suitable luminance-chrominance space.(5,26) The number of classes is set equal to two: the object and the background. The color threshold for the background may be estimated automatically (unsupervised) or interactively (supervised) at run time from a user-clicked ‘‘background-truth’’ region. This step is optionally followed by a smoothing operation using the Graduated Non-Convexity (GNC) algorithm.(17,27) Median filtering can also provide satisfactory results when the image is not noisy. Clearly, this method will be successful for images with relatively uniform background colors, since it aims to separate pixels with the color of the ‘‘background-truth’’ region from the rest. It can, however, easily be modified for textured backgrounds by changing the elements of the feature vector from the three color components to textural features. The objects themselves can have any spatial distribution of color as long as they have sufficient contrast with the background.

In practice, users may also want to pose a query by specifying the color of the object of interest, simply by clicking a ‘‘foreground-truth’’ region. Our adaptive color clustering scheme can also handle this case and provides the region of interest, since the number of clusters is equal to two. In contrast to separation by background color, this second case requires objects with relatively uniform colors. The two-class adaptive clustering method seems suitable for the type of


image databases targeted in this paper, namely retail and museum catalogs.
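A minimal stand-in for the two-class clustering described above is a plain two-means iteration seeded from the background patch color. The synthetic colors, the iteration count, and the function name are assumptions; the paper's actual scheme is adaptive and operates in a luminance-chrominance space:

```python
import numpy as np

def separate_background(pixels, bg_seed, n_iter=10):
    """Two-class clustering of pixel colors into background/foreground.

    pixels: (N, 3) color vectors; bg_seed: mean color of the user-marked
    "background-truth" patch. One cluster is initialized at the seed color,
    the other at the color farthest from it, then k-means (k = 2) iterates.
    Returns a boolean mask that is True for foreground pixels.
    """
    c_bg = bg_seed.astype(float)
    c_fg = pixels[np.argmax(np.linalg.norm(pixels - c_bg, axis=1))].astype(float)
    for _ in range(n_iter):
        is_bg = (np.linalg.norm(pixels - c_bg, axis=1)
                 < np.linalg.norm(pixels - c_fg, axis=1))
        c_bg, c_fg = pixels[is_bg].mean(axis=0), pixels[~is_bg].mean(axis=0)
    return ~is_bg

# Synthetic image: 900 bluish background pixels and 100 reddish object pixels.
rng = np.random.default_rng(0)
bg = np.array([20.0, 30.0, 200.0]) + rng.normal(0, 5, (900, 3))
fg = np.array([180.0, 40.0, 30.0]) + rng.normal(0, 5, (100, 3))
pixels = np.vstack([bg, fg])
fg_mask = separate_background(pixels, bg_seed=bg.mean(axis=0))
print(int(fg_mask.sum()))  # 100: exactly the red pixels are flagged as foreground
```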

APPENDIX B. B-SPLINE FITTING

There are two ways of performing B-spline fitting.(9) The first uses a B-spline that interpolates the sampled data points. The second finds an approximating B-spline which minimizes a defined error function between the object contour and its B-spline representation. The second approach requires fewer parameters and is more robust to noise; therefore, it has been adopted in this paper.

Suppose we identify h ordered feature points on the continuous, closed contour of an object. The coordinates of the contour points are represented by the vector $O = (O_1, O_2, \ldots, O_h) = ((x_1, y_1), (x_2, y_2), \ldots, (x_h, y_h))$. The objective is to find a set of $m \le h$ control points $C_j$, $j = 0, 1, \ldots, m-1$, and represent the contour of the object using these control points and $m$ connected curve segments $O_i(t) = (x_i(t), y_i(t))$. Since we deal with cubic B-splines, each of these curves is a linear combination of four cubic polynomials in the parameter $t$, where $0 \le t \le 1$, i.e.

    $O_i(t) = C_{i-1} Q_0(t) + C_i Q_1(t) + C_{i+1} Q_2(t) + C_{i+2} Q_3(t), \quad i = 0, 1, \ldots, m-1,$   (B.1)

where

    $Q_k(t) = a_{k0} t^3 + a_{k1} t^2 + a_{k2} t + a_{k3}, \quad k = 0, 1, 2, 3.$   (B.2)

Let $O_j$ be a contour feature point and $O(t'_j)$ its corresponding point on the B-spline. Then the following closed-form expression represents the curve segment for $j \le t' \le j+1$:

    $O(t'_j) = \sum_{i=0}^{m+2} C_{i \bmod m}\, Q_{i,4}(t'_j), \quad j = 1, 2, \ldots, h,$   (B.3)

where the $Q_{i,4}(t')$, $i = 0, 1, \ldots, m-1$, are called the normalized cubic B-spline bases and are related to each other by horizontal translation. Since the B-spline is factorizable into its x and y components, the sum of the residual fitting error can be expressed as

    $d^2 = \sum_{j=1}^{h} d_j^2 = \sum_{j=1}^{h} \| O_j - O(t'_j) \|^2
         = \sum_{j=1}^{h} \left\{ \Big[ x_j - \sum_{i=0}^{m+2} C_{i \bmod m,\,x}\, Q_{i,4}(t'_j) \Big]^2 + \Big[ y_j - \sum_{i=0}^{m+2} C_{i \bmod m,\,y}\, Q_{i,4}(t'_j) \Big]^2 \right\},$   (B.4)

where $d_j^2$ is the squared residual error between the feature point $O_j$ and its corresponding point $O(t'_j)$ on the B-spline.

The goal is to specify the control points $(C_{i,x}, C_{i,y})$, $i = 0, 1, \ldots, m-1$, in such a way as to minimize the distance between the object contour and its cubic B-spline representation. If the values of the $t'_j$ in equation (B.4) are known, the MMSE estimates of the $m$ control points are obtained by a least-squares fitting algorithm. We have computed the $t'_j$ values using equation (B.5), as suggested in reference (9), and extracted the control points by minimizing $d^2$ with respect to $(C_{i,x}, C_{i,y})$:

    $t'_{j+1} = t'_j + m\, \frac{|O_j - O_{j-1}|}{l},$   (B.5)

where

    $l = \sum_{j=2}^{h+1} |O_j - O_{j-1}|$   (B.6)

represents the total length of the connected curve segments.
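Equations (B.1)—(B.6) translate directly into a small least-squares fit. The sketch below uses the standard uniform cubic B-spline basis and the chord-length parameterization of (B.5)—(B.6); the circle test data and the choice of m = 12 control points are illustrative:

```python
import numpy as np

def fit_closed_cubic_bspline(points, m):
    """Least-squares fit of m control points of a closed uniform cubic B-spline
    to h >= m ordered contour points, in the spirit of equations (B.1)-(B.6)."""
    h = len(points)
    # Chord-length parameterization, eqs (B.5)-(B.6): u_j in [0, m).
    seg = np.linalg.norm(np.diff(points, axis=0, append=points[:1]), axis=1)
    l = seg.sum()
    u = np.concatenate([[0.0], m * np.cumsum(seg[:-1]) / l])

    def Q(t):
        # Uniform cubic B-spline basis functions Q0..Q3 on t in [0, 1].
        return np.array([(1 - t) ** 3,
                         3 * t ** 3 - 6 * t ** 2 + 4,
                         -3 * t ** 3 + 3 * t ** 2 + 3 * t + 1,
                         t ** 3]) / 6.0

    B = np.zeros((h, m))
    for j, uj in enumerate(u):
        i, t = int(uj) % m, uj - int(uj)
        for k, q in enumerate(Q(t)):
            B[j, (i - 1 + k) % m] += q       # eq (B.1): C_{i-1} .. C_{i+2}
    C, *_ = np.linalg.lstsq(B, points, rcond=None)   # minimizes d^2 of eq (B.4)
    return C, B @ C   # control points and the fitted curve samples

# Fit 12 control points to 60 samples of a unit circle; the residual is tiny.
t = np.linspace(0, 2 * np.pi, 60, endpoint=False)
pts = np.column_stack([np.cos(t), np.sin(t)])
C, fitted = fit_closed_cubic_bspline(pts, 12)
print(np.linalg.norm(pts - fitted, axis=1).max())
```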

REFERENCES

1. B. Furht, S. W. Smoliar and H. Zhang, Video and Image Processing in Multimedia Systems. Kluwer Academic Publishers, Dordrecht (1995).
2. R. W. Picard and T. P. Minka, Vision texture for annotation. MIT Multimedia Laboratory Perceptual Computing Section TR No. 302 (1995).
3. S. Sclaroff and A. Pentland, Modal matching for correspondence and recognition. IEEE Trans. Pattern Anal. Mach. Intell. 17, 545—561 (1995).
4. A. Pentland, R. W. Picard and S. Sclaroff, Photobook: tools for content-based manipulation of image databases. Proc. SPIE S&RIV, Vol. 2185 (1993).
5. E. Saber, A. M. Tekalp, R. Eschbach and K. Knox, Automatic image annotation using adaptive color classification. Graph. Mod. Image Process. 58(2), 115—126 (1996).
6. A. K. Jain and A. Vailaya, Image retrieval using color and shape. Pattern Recognition 29(8), 1233—1244 (1996).
7. D. P. Huttenlocher, G. A. Klanderman and W. J. Rucklidge, Comparing images using the Hausdorff distance. IEEE Trans. Pattern Anal. Mach. Intell. 15, 850—863 (1993).
8. L. S. Shapiro, Towards a vision-based motion framework. TR, Department of Engineering Science, Oxford University (1991).
9. F. S. Cohen, Z. Huang and Z. Yang, Invariant matching and identification of curves using B-splines curve representation. IEEE Trans. Image Process. 4, 1—10 (1995).
10. E. Persoon and K. S. Fu, Shape discrimination using Fourier descriptors. IEEE Trans. Pattern Anal. Mach. Intell. 8, 388—397 (1986).
11. J. L. Mundy and A. Zisserman, eds, Geometric Invariance in Computer Vision. MIT Press, Cambridge, Massachusetts (1992).
12. M. Flickner et al., Query by image and video content: the QBIC system. IEEE Comput. 23—31 (1995).
13. S. F. Chang and J. R. Smith, Extracting multi-dimensional signal features for content-based visual query. Proc. SPIE 2501, 995—1006 (1995).
14. S. Sclaroff, Deformable prototypes for encoding shape categories in image databases. Pattern Recognition 30(4), 627—641 (1997).
15. L. S. Shapiro and J. M. Brady, Feature-based correspondence: an eigenvector approach. Image Vision Comput. 10(5), 283—288 (1992).
16. J. Canny, A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 8, 679—698 (1986).
17. B. Günsel, A. K. Jain and E. Panayirci, Reconstruction and boundary detection of range and intensity images using multiscale MRF representations. Comput. Vision Image Understanding 63(2), 353—366 (1996).
18. M. J. Swain and D. H. Ballard, Color indexing. Int. J. Comput. Vision 7(1), 11—32 (1991).
19. MPEG-4 Video Verification Model Version 5.0, International Organisation for Standardisation (November 1996).
20. E. Saber and A. M. Tekalp, Region-based affine shape matching for automatic image annotation and query-by-example. J. Visual Commun. Image Representation, March (1997).
21. R. N. Shepard, Toward a universal law of generalization for psychological science. Science, 1317—1323 (1987).
22. C. L. Krumhansl, Concerning the applicability of geometric models to similarity data: the interrelationship between similarity and spatial density. Psychological Rev. 85, 445—463 (1978).
23. A. Tversky and I. Gati, Similarity, separability, and the triangle inequality. Psychological Rev. 89, 123—154 (1982).
24. S. Santini and R. Jain, Similarity queries in image databases. Proc. IEEE Comput. Vision Pattern Recognition, 646—651 (1996).
25. D. L. Swets and J. Weng, Using discriminant eigenfeatures for image retrieval. IEEE Trans. Pattern Anal. Mach. Intell. 18(8), 831—836 (1996).
26. B. Günsel and A. M. Tekalp, Similarity analysis for shape retrieval by example. Proc. 13th Int. Conf. on Pattern Recognition (ICPR), pp. 330—334 (1996).
27. A. Blake and A. Zisserman, Visual Reconstruction. MIT Press, Cambridge, Massachusetts (1987).

About the Author—BİLGE GÜNSEL received M.S. and Ph.D. degrees in Electronics and Communication Engineering from Istanbul Technical University, Istanbul, Turkey, in 1988 and 1993, respectively. She is currently a Research Associate in the Electrical Engineering Department at the University of Rochester, Rochester, U.S.A. During 1994—1995 she was an Assistant Professor in the Electrical and Electronics Engineering Department at Istanbul Technical University. Her research interests include video/image content analysis and retrieval, stochastic image models and multimedia information systems. Dr Günsel is a member of IEEE.

About the Author—A. MURAT TEKALP received M.S. and Ph.D. degrees in Electrical, Computer and Systems Engineering from Rensselaer Polytechnic Institute (RPI), Troy, New York, in 1982 and 1984, respectively. From December 1984 to August 1987, he was a research scientist, and then a senior research scientist, at Eastman Kodak Company, Rochester, New York. He joined the Electrical Engineering Department at the University of Rochester, New York, as an Assistant Professor in September 1987, where he is currently a Professor. His current research interests are in the areas of digital image and video processing, including image restoration and reconstruction, object-based image/video editing and coding, object tracking, and image/video indexing for digital libraries. Dr Tekalp is a senior member of IEEE. He received the NSF Research Initiation Award in 1988. He has served as an Associate Editor for the IEEE Transactions on Signal Processing (1990—1992), the IEEE Transactions on Image Processing (1994—1996), and the Kluwer Journal on Multidimensional Systems and Signal Processing (1994—1996). He was the Special Sessions Chair for the IEEE Int. Conf. on Image Processing (1995), and served as the Chair of the IEEE Rochester Section in the 1994—1995 term of office. At present, he is the Chair of the IEEE Signal Processing Society Technical Committee on Image and Multidimensional Signal Processing. He is also on the editorial boards of the journals Graphical Models and Image Processing, and Visual Communications and Image Representation. He authored the Prentice-Hall book Digital Video Processing (1995).