Image and Vision Computing 17 (1999) 245–261
Shape indexing by multi-scale representation
A. Del Bimbo*, P. Pala
Dipartimento di Sistemi e Informatica, Università di Firenze, Via S. Marta 3, 50139 Florence, Italy
Received 25 July 1997; received in revised form 1 April 1998; accepted 20 April 1998
Abstract

Accessing large image databases requires effective indexing in order to restrict the number of database items that have to be processed. Indexing based on shapes is particularly challenging owing to the difficulty of deriving a similarity measure that supports clustering of shapes conforming with human perceptual similarity. Most previous techniques are based on the extraction of salient shape features and their organization into multi-dimensional point access structures. However, these features are extracted by analyzing shapes at a single resolution scale, and are not able to provide a robust representation. In this paper, we present a technique which exploits multi-scale analysis of shapes to derive a hierarchical shape representation in which shape details are progressively filtered out while shape characterizing elements are preserved. A graph structure is introduced to represent shape parts at different scales, and a procedure is defined to merge graphs of different shapes. Given a query shape, the graph can be traversed to select, through a coarse-to-fine matching, those database shapes which share similar structural parts with the query. © 1999 Elsevier Science B.V. All rights reserved.

Keywords: Image database; Shape representation; Shape indexing; Multi-resolution analysis; Scale–space filtering
1. Introduction

The ever increasing rate at which images are generated in many application areas gives rise to the need for content-based image retrieval systems that provide effective and efficient access to databases of images on the basis of their visual content. Since visual information is difficult to describe using words, image attributes themselves, instead of their textual representations, have to be used to access the database. Indexing procedures need to be developed to provide efficient retrieval. Through indexing, non-interesting images are filtered out and the number of images that must be matched is reduced. Several image properties and structures can be exploited to this end, such as colors, textures, shapes, and spatial relationships among image objects. In shape-based retrieval the query is focused on the form of the objects represented in the images. At database population time, shape contours must be extracted from database images. This can be done manually, through partial human interaction, or in some cases even automatically, if the background/foreground contrast or specific knowledge about the application context allows automatic techniques to be effective.

* Corresponding author. Tel.: +39 55 4796262; fax: +39 55 4796363.
Retrieval is the process of finding images containing objects whose contours are similar to the query sketch. Unlike retrieval by color or texture, shape-based image retrieval is particularly challenging owing to the lack of a complete mathematical model of how humans perceive shapes and similarities between different shapes. Queries are usually performed by sketching the contour of the required object. Shape-based retrieval systems can be distinguished by whether they employ feature-based representations of shapes, measuring similarity as a distance in the feature space, or evaluate similarity through template matching, using shape transformation techniques to derive a measure of shape similarity. Systems of the first class treat shapes as points in a multi-dimensional space, and evaluate the similarity between two shapes as the distance between the corresponding points in the feature space. If the dimension of this space is not too large, access to the database can be filtered using well-known multi-dimensional point access methods (PAMs). Feature values are computed off-line for the shapes in the database; at run-time, a comparison between the feature values of the sketch and of the database images is performed to measure their similarity. Feature-based approaches tend to be brittle when shapes are a distorted version of the query template, and for most features there is no guarantee that the human notion of
‘closeness’ corresponds to topological closeness in the feature space. Usually, shapes require the extraction of a large number of features in order to be reliably modeled, and in order to perform indexing this representation space must be reduced to a lower dimension. Examples of systems for shape-based retrieval following this approach have been proposed previously [5,10,13,14]. They all use multi-dimensional point access methods to speed up the search in the database. The system of Mehrotra and Gary [13] uses a polygonal approximation of the boundary with very few boundary key points. Structural features of the polygon boundary are defined considering couples of adjacent vertices of this polygon. For each vertex, the corresponding feature includes the internal angle at the vertex, the distance from its adjacent vertex in the clockwise direction, and its coordinates. Indexing is supported using a B-tree. A similar approach is used by Grosky and Mehrotra [10]. The QBIC system [5] assumes that shapes are non-occluded planar shapes. Instead of using external boundary descriptions, a large set of global features is used: area, circularity, eccentricity, major axis orientation and a set of algebraic moment invariants. Indexing is performed using an R-tree after the Karhunen–Loève transform has been applied in the feature space. The system works well for global feature properties, but provides only a limited ability to evaluate similarity according to local boundary properties.

In systems following the shape transformation approach, one shape acts as the template and the other is warped until the two coincide. The distance between the two shapes is a transformational distance, measured by the amount of deformation needed to match the two shapes. Examples of such systems are elastic template matching [15], the PHOTOBOOK system [18], and the system of Jain and Vailaya [8]. Indexing is usually obtained by adding filtering modules (based on shape properties [18] or on other features such as colors, textures [8] or spatial relationships [15]) which avoid the sequential scanning of the whole database. These methods are typically more robust to shape distortions and allow comparison with partially occluded shapes. However, their computational requirements are typically high, since database images must be processed sequentially: an accurate and effective measure of shape similarity is therefore associated with a loss of performance in terms of system efficiency.

Effective shape-based retrieval must be based on a shape representation model which combines the advantages of multi-dimensional indexing with the robustness to shape distortion of elastic approaches. In this paper, we present a method for indexing two-dimensional planar closed shapes on the basis of their visual appearance. The proposed approach is based on a hierarchical description of the shapes at different levels of abstraction. Shapes are analyzed at different scales of
resolution, partitioning the shape into significant parts at each scale. As the resolution changes from fine to coarse, adjacent parts tend to combine into more general structures, whose location, orientation and attributes are characteristic of the shape. Shapes which are partially different tend to exhibit similar characterizing elements if they are analyzed at coarse scales. This fact is exploited to derive a clustering criterion based on the most perceptually salient features of the shapes: the coarser the scale, the more the shapes are clustered according to their characterizing properties. To preserve the inherent uncertainty of visual attributes, shape parts are described through fuzzy attributes. This indexing technique has been included in a complete system [15,21] for image retrieval by color and color region similarity, as well as by 2D pattern similarity.

The paper is organized as follows. In Section 2, multi-scale analysis is reviewed and a novel multi-scale shape description is presented. In Section 3, the proposed shape description is used to define an index structure. In Section 4, the effectiveness of the approach is discussed through several examples.
2. Multi-scale shape analysis

Since the work of Attneave [1], many contributions have pointed out the central role played in the partitioning process by the points where the shape bends most sharply. Following the idea proposed in [2,3], we assume that the information modeling a shape includes the description of its parts and of their relationships. In particular, we assume that the characterizing elements of a curve are its protrusions and the way these protrusions are organized along the curve. Identification of curve protrusions is performed through the analysis of the curvature function. A protrusion is a portion of the curve bounded by two local maxima of curve concavity. We assume that the curve is traversed counterclockwise, so that minima of the curvature function correspond to the boundaries of curve protrusions.

For a generic object contour, the number of partition points can be high. Some of them identify characterizing elements of the shape, others identify insignificant details. Characterizing elements of a shape can be separated from details by analyzing the shape and its curvature at different scales. By progressively smoothing the shape, its details can be filtered out, and what remains is a coarser representation of the original shape, keeping its prominent features. This process can be applied several times, and each time the new shape can be viewed as a generalization of the preceding one, preserving its characterizing elements but losing some details. This method is commonly referred to as scale–space filtering [9].
Fig. 1. Shape of a horse and its curvature function at different scales. The corresponding values of σ are 4, 16 and 64, respectively.
2.1. Scale–space filtering

Mathematically, a planar, continuous, closed curve can be parameterized with respect to its arc length t and expressed as:

\[ c(t) = \{x(t), y(t)\}, \qquad t \in [0, 1] \]

To analyze c at varying levels of detail, the functions x(t) and y(t) are convolved with a one-dimensional Gaussian kernel [6]:

\[ g(t, \sigma) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{t^{2}}{2\sigma^{2}}\right) \]

The two components of the smoothed curve C(t, σ) are therefore evaluated as:

\[ X(t, \sigma) = \int_{-\infty}^{t} x(s)\, g(t - s, \sigma)\, ds \]
Fig. 2. Curve token and its equivalent triangle.
Fig. 3. Shape of a horse at the resolution of four curvature minima. Tokens, equivalent triangles and orientation vectors are highlighted. For each token, the membership values associated with the token attributes are reported.
\[ Y(t, \sigma) = \int_{-\infty}^{t} y(s)\, g(t - s, \sigma)\, ds \]

The curvature Γ(t, σ) of C(t, σ) at the point {X(t, σ), Y(t, σ)} can be expressed as [7]:

\[ \Gamma(t, \sigma) = \frac{X_t(t, \sigma)\, Y_{tt}(t, \sigma) - X_{tt}(t, \sigma)\, Y_t(t, \sigma)}{\left( X_t^{2}(t, \sigma) + Y_t^{2}(t, \sigma) \right)^{3/2}} \]

where X_t, Y_t and X_tt, Y_tt are the first and second derivatives of X and Y with respect to t, respectively. For a given value of σ, we define P_σ = {p_n} as the set of points p_n of the curve corresponding to the relative minima t_n of Γ(t, σ), i.e.
\[ p_n = C(t_n, \sigma) \]

The set P_σ partitions the curve C(t, σ) into tokens corresponding to the main protruding and indenting parts of the shape at the level of detail identified by σ. When σ varies, the number and locations of the p_n vary: the higher σ, the lower the number of p_n. Variations of σ can have two different effects: (i) the number and location of the minima change; or (ii) only the location of the minima changes. Values of σ at which the number of minima changes mark structural changes in the aspect of the curve. A minimal but significant representation of the curve can thus be derived by describing the curve only in correspondence with these critical values of σ. In Fig. 1, the effects of smoothing on the number and position of the curvature minima are shown, together with the correspondences between partition points along the curve and minima of the curvature. It can be noticed that, at the finest scale, the horse shape presents many partition points, which identify parts such as the muzzle, the neck, the back, the tail, and each single leg. At a coarser scale, the neck and the muzzle are combined into a single part which models the entire head of the horse, the two front legs are combined into a single protrusion, and the same happens for the rear legs. The tail disappears and is joined with the rest of the back.

2.2. Shape description

An effective description of a generic shape can be achieved through the representation of the visual aspect of each token at different scales of resolution. In the following, we will refer to the k-scale of a generic shape as the smoothed shape with k minima of its curvature function.
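To make the scale–space partitioning above concrete, the following sketch (not part of the original paper; a minimal illustration assuming the contour is given as equally spaced samples, as for the test database of Section 4) smooths the coordinate functions with a periodic Gaussian, evaluates the curvature Γ(t, σ) and extracts its relative minima, i.e. the partition points p_n.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def smooth_closed_curve(x, y, sigma):
    """Convolve the coordinate functions with a Gaussian of scale sigma (periodic)."""
    return (gaussian_filter1d(x, sigma, mode="wrap"),
            gaussian_filter1d(y, sigma, mode="wrap"))

def _dperiodic(f):
    """Central difference of a sampled periodic function."""
    return (np.roll(f, -1) - np.roll(f, 1)) / 2.0

def curvature(X, Y):
    """Curvature (Xt*Ytt - Xtt*Yt) / (Xt^2 + Yt^2)^(3/2) of the smoothed curve."""
    Xt, Yt = _dperiodic(X), _dperiodic(Y)
    Xtt, Ytt = _dperiodic(Xt), _dperiodic(Yt)
    return (Xt * Ytt - Xtt * Yt) / (Xt ** 2 + Yt ** 2) ** 1.5

def partition_points(X, Y):
    """Indices of the relative minima of the curvature (token boundaries p_n)."""
    G = curvature(X, Y)
    return np.nonzero((G < np.roll(G, 1)) & (G < np.roll(G, -1)))[0]

# toy example: a wavy ellipse sampled at 100 points, analysed at two scales
t = np.linspace(0.0, 2.0 * np.pi, 100, endpoint=False)
x, y = 3.0 * np.cos(t), np.sin(t) + 0.15 * np.sin(7.0 * t)
for sigma in (1.0, 4.0):
    X, Y = smooth_closed_curve(x, y, sigma)
    print(f"sigma = {sigma}: {len(partition_points(X, Y))} curvature minima")
```

Increasing σ removes the minima caused by the fine ripples while those due to the overall structure survive, which mirrors the behaviour illustrated for the horse shape in Fig. 1.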
Fig. 4. Representation structure associated with the shape of Fig. 3.
Fig. 5. Data structures representing the shapes of two horses. Correspondences between symbolic descriptions are highlighted with dotted arrows. Equivalence between the descriptions is achieved by a unitary circular shift of the tokens of the second shape. The first token t_0(C_0) of the first shape corresponds to the second token t_1(C_1) of the second shape; t_1(C_0) to t_2(C_1), t_2(C_0) to t_3(C_1) and t_3(C_0) to t_0(C_1).
Fig. 6. Shape rotated by 10, 20 and 30°. The corresponding symbolic descriptors are shown; equal descriptors in the four representations are highlighted with dashed lines.
Fig. 7. Boundary shape of a bottle with increasing degrees of occlusion, represented at four scales. The equivalent triangle of each token is shown.
2.2.1. Token description

For each token t_n we consider its equivalent triangle (see Fig. 2), with vertices at the two token extremes p_n, p_{n+1} and at the token center of mass b_n, computed by averaging the x and y coordinates along the token. The visual aspect of t_n is represented by three properties, accounting for the prominent visual features of the shape:

(i) the symmetry S of the equivalent triangle with respect to its basis (p_n, p_{n+1}); S is evaluated considering the difference between the two angles b_n p_n p_{n+1} and b_n p_{n+1} p_n;

(ii) the orientation O of the vector from b_n to m_n, where m_n is the median point of the segment (p_n, p_{n+1}); O is expressed through the angle θ_n which characterizes the vector in polar coordinates;

(iii) the normalized token length L; L is expressed through the ratio l_n/L between the token length and the shape length, where:

\[ l_n = \int_{C(t_n, \sigma)}^{C(t_{n+1}, \sigma)} ds, \qquad L = \int_{C(t_0, \sigma)}^{C(t_k, \sigma)} ds \]
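These attributes can be computed directly from the sampled token. The sketch below is again only an illustration, not the authors' code: it builds the equivalent triangle of a token and derives the symmetry S, the orientation angle of the vector b_n m_n (the direction from b_n to m_n is assumed here), and the normalized length l_n/L, with arc lengths approximated by sums of segment lengths.

```python
import numpy as np

def arc_length(xy):
    """Length of a polyline given as an (m, 2) array of points."""
    return np.linalg.norm(np.diff(xy, axis=0), axis=1).sum()

def token_attributes(token_xy, shape_length):
    """Symmetry S, orientation O and normalized length l_n/L of one token.

    token_xy     : (m, 2) array of samples from p_n to p_{n+1} along the token
    shape_length : total arc length L of the closed shape
    """
    p_a, p_b = token_xy[0], token_xy[-1]       # token extremes p_n, p_{n+1}
    b = token_xy.mean(axis=0)                  # centre of mass b_n of the token
    m = (p_a + p_b) / 2.0                      # median point m_n of the basis

    def vertex_angle(v, w):
        c = np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w))
        return np.arccos(np.clip(c, -1.0, 1.0))

    # (i) symmetry: difference of the angles b_n p_n p_{n+1} and b_n p_{n+1} p_n
    S = vertex_angle(b - p_a, p_b - p_a) - vertex_angle(b - p_b, p_a - p_b)
    # (ii) orientation: polar angle of the vector b_n m_n, mapped to [0, 2*pi)
    d = m - b                                  # direction b_n -> m_n assumed here
    O = np.arctan2(d[1], d[0]) % (2.0 * np.pi)
    # (iii) normalized length: token arc length over the total shape length
    return S, O, arc_length(token_xy) / shape_length

# a roughly symmetric bump: S close to 0, normalized length 0.23
pts = np.array([[0.0, 0.0], [0.4, 0.5], [0.5, 1.0], [0.6, 0.5], [1.0, 0.0]])
print(token_attributes(pts, shape_length=10.0))
```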
Shapes of the same object may exhibit a high variability in terms of shape details or spatial arrangement of the
Fig. 8. Data structures representing two different shapes of horses at three distinct scales. At each scale, the D(·), L^O_α(·) and L^SOL_α(·) descriptions are represented as graph nodes. Equal descriptions of the two shapes are highlighted with equal textures.
tokens. To improve the tolerance of the shape description to this variability, fuzzy sets have been used to represent the values of the three properties. A set of attributes is associated with each property S, O and L: (i) the value of the symmetry S, which ranges in [−π, π], is bucketed using the set of attributes (left, center, right); (ii) the value of the orientation O, ranging in [0, 2π], is bucketed using the set (north, south, east, west); and (iii) the value of the ratio L, which ranges in [0, 1], is bucketed using the set (short, med, long).
A membership function μ_a is defined for each attribute a. The value of μ_a (a real value in [0, 1]) for a token t_n represents the extent to which the visual aspect of t_n conforms to the attribute a. A symbolic description is derived by thresholding the values of the membership functions: for each token t, l_α(t, p) is defined as the set of labels representing those attributes a of the property p such that μ_a > α.
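The membership functions themselves are not specified in the paper; the sketch below assumes simple triangular functions centred on each attribute (hypothetical centres and widths) and shows how a label set l_α(t, p) is obtained by thresholding the membership values with α. Symmetry and length are handled analogously over their ranges [−π, π] and [0, 1].

```python
import numpy as np

# assumed bucket centres (the paper does not give the exact membership functions;
# simple triangular functions over the stated ranges are used here)
ORIENTATION = {"east": 0.0, "north": np.pi / 2, "west": np.pi, "south": 3 * np.pi / 2}

def memberships_O(theta, width=np.pi / 2):
    """Fuzzy membership mu_a of each orientation attribute for the angle theta."""
    out = {}
    for name, centre in ORIENTATION.items():
        d = abs(theta - centre)
        d = min(d, 2 * np.pi - d)               # circular distance in [0, pi]
        out[name] = max(0.0, 1.0 - d / width)   # triangular membership
    return out

def labels(memberships, alpha):
    """Label set l_alpha(t, p): attributes whose membership exceeds the threshold."""
    return {attr for attr, mu in memberships.items() if mu > alpha}

# example: a token oriented at about 120 degrees
mu = memberships_O(2.1)
print({k: round(v, 2) for k, v in mu.items()})   # north ~ 0.66, west ~ 0.34
print(labels(mu, alpha=0.3))                     # {'north', 'west'}
```

With these widths, at most two attributes per property can have nonzero membership, which matches the assumption used in Section 3.2 to bound the index size.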
Fig. 9. Index structure for the representations of the two shapes in Fig. 8. Nodes with identical descriptions are merged into a single node.
Fig. 10. Interface of the retrieval system.
Fig. 11. Bottle sketch and the first retrieved items.
Fig. 12. First 24 shapes indexed according to the query of Fig. 11.
2.2.2. Shape description

A shape C at a generic scale k is represented using three descriptors:

(i) the set of 10-dimensional numeric arrays D(C, k), one array for each token t_n with n ∈ {0, …, k − 1}, storing the values of μ_a for all the attributes of the token;

(ii) a symbolic description L^SOL_α(C, k) retaining the most relevant attributes of all three properties, symmetry, orientation and length:

\[ L^{SOL}_{\alpha}(C, k) = L^{S}_{\alpha}(C, k) \times L^{O}_{\alpha}(C, k) \times L^{L}_{\alpha}(C, k) \]
where

\[ L^{S}_{\alpha}(C, k) = l_{\alpha}(t_0, S) \times l_{\alpha}(t_1, S) \times \cdots \times l_{\alpha}(t_{k-1}, S) \]
\[ L^{O}_{\alpha}(C, k) = l_{\alpha}(t_0, O) \times l_{\alpha}(t_1, O) \times \cdots \times l_{\alpha}(t_{k-1}, O) \]
\[ L^{L}_{\alpha}(C, k) = l_{\alpha}(t_0, L) \times l_{\alpha}(t_1, L) \times \cdots \times l_{\alpha}(t_{k-1}, L) \]
(iii) L^O_α(C, k), the set of symbolic labels associated with the tokens t_n for the property O, according to the threshold α.

As an example, in Fig. 3 a curve C is shown with its fuzzy description D(C, 4). Several symbolic descriptions can be derived from D(C, 4) according to different values of the threshold α.

Table 1
Size of the index structure for some values of α. The numbers of A-, B- and C-nodes are reported separately. The sample database is composed of over 400 shapes described at different scales.
α       A-nodes   B-nodes   C-nodes
0.50    69        221       1600
0.45    79        295       1600
0.40    99        348       1600
0.35    122       444       1600
0.30    136       540       1600
0.25    177       975       1600
Fig. 13. Horse sketch and the first retrieved items.
If we assume α = 0.4, the symbolic description L^O_{0.4}(C, 4) can be computed as:

l_{0.4}(t_0, O) = {north}
l_{0.4}(t_1, O) = {west}
l_{0.4}(t_2, O) = {south, east}
l_{0.4}(t_3, O) = {north}

L^O_{0.4}(C, 4) = {(north, west, south, north), (north, west, east, north)}

This scheme provides three levels of abstraction for the description of a shape. D(C, k) corresponds to the lowest level of abstraction and is used for a fine analysis of shape properties. L^SOL_α(C, k) corresponds to an intermediate level: the shape is represented through symbolic labels computed on all the properties of the tokens. L^O_α(C, k) corresponds to the highest level: only the orientation property of the tokens is taken into account, and only the prominent perceptual features of the shape are retained. Using high values of α, the symbolic descriptions retain only those attributes of each token which have a high value of the membership function (the prominent perceptual attributes). With low values of α, non-prominent perceptual attributes are represented even at the highest level of abstraction. In Fig. 4, the representation of the shape in Fig. 3 is shown.

Since at the coarsest scales tokens identify characterizing elements of the shape, at these scales similar shapes share similar tokens. In Fig. 5, the representation schemes associated with two shapes are shown. Dotted arrows highlight symbolic labels which are equal apart from a circular shift (in the absence of a specific criterion to determine which token should be considered the first one, symbolic label matching needs to be invariant to circular shifts). In Fig. 5, the first token of the left horse corresponds to the front leg of the horse, whereas the first token of the right horse corresponds to the head of the horse.

The use of the orientation property for shape description makes it sensitive to rotations. A rotation of the shape by an angle θ determines a rotation θ of the orientation vectors. Therefore, the tolerance of the description to shape rotations depends on the value of α. As an example, in Fig. 6 the shape of Fig. 3 is rotated by θ = 10, 20, 30° and the corresponding descriptions L^O_α(C, k) are shown for α = 0.4, 0.35, 0.30. Equal descriptors are highlighted with dashed lines.

If an object is occluded, a portion of its contour is modified. In these cases, the description of the object shape can change. In our approach, since the shape description is obtained from a multi-scale representation, some minor occlusions can be coped with. In fact, if the occlusion does not alter the global appearance of the shape, the change of shape

Table 2
Indexing time for different values of α on a SGI R4600 133 MHz processor
α       Indexing time (s)
0.50    0.5
0.45    0.7
0.40    0.9
0.35    1.1
0.30    1.3
0.25    2.5
Fig. 14. First 22 shapes indexed according to the query of Fig. 13.
description is localized only at fine scales. In Fig. 7, four bottle shapes with different degrees of occlusion are shown at different resolutions. Tokens at the highest resolution are quite dissimilar but, as the resolution decreases, the characterizing elements of the shapes are preserved and the tokens become similar.
3. Index structure

Shape descriptions of different objects have been embedded into a graph index structure. At each scale k, the symbolic descriptions of different objects are clustered, separately for L^O_α(·) and L^SOL_α(·). Clusters define the nodes of a graph: A-nodes store the symbolic descriptions L^O_α(·), B-nodes the L^SOL_α(·) descriptions, and C-nodes the fuzzy descriptions D(·). Nodes are connected through arcs. Two types of arcs are provided:
(i) Arcs between nodes of the same type at different scales k_1 and k_2. They model relationships between the descriptions of an object at different scales. Only A-nodes are connected by this type of arc.

(ii) Arcs between nodes of different types at a given scale k. They model relationships between descriptions at different levels of abstraction, from coarse to fine.

In Fig. 8, the representations of two horse shapes are shown for three different scales. Links between nodes correspond to relationships between descriptions at the same scale (vertical links) or at different scales (horizontal links, only at the highest abstraction level). In Fig. 9, the two representations of Fig. 8 are combined into a single representation in which descriptions shared by the two shapes are represented through a single node. The threshold α determines the number of symbolic descriptions that are associated with each description, and tunes the tolerance of the index structure to the imprecision of the part description.
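The layered graph can be represented with ordinary dictionaries. The sketch below is an illustrative data layout, not the authors' implementation: one A-node per distinct L^O_α label tuple, one B-node per distinct L^SOL_α label tuple, C-nodes holding the fuzzy arrays D(·), plus the two kinds of arcs. For simplicity, every A-node of a shape at a scale is linked to every B-node of the same shape at that scale.

```python
from collections import defaultdict

class ShapeIndex:
    """Layered graph index: A-nodes (L^O_a), B-nodes (L^SOL_a), C-nodes (D)."""

    def __init__(self):
        # nodes are keyed by (scale, description); identical descriptions merge
        self.a_children = defaultdict(set)   # A-node -> linked B-nodes (same scale)
        self.b_children = defaultdict(set)   # B-node -> linked C-node ids (same scale)
        self.a_links = defaultdict(set)      # A-node -> A-nodes at the next finer scale
        self.c_nodes = {}                    # C-node id -> fuzzy description D

    def add_shape(self, shape_id, descriptions):
        """descriptions: {k: (LO, LSOL, D)}; LO and LSOL are sets of label tuples."""
        prev_a = []
        for k in sorted(descriptions):       # coarsest scale (fewest minima) first
            lo_set, lsol_set, d = descriptions[k]
            c_id = (shape_id, k)
            self.c_nodes[c_id] = d
            for lo in lo_set:
                a = (k, lo)
                for lsol in lsol_set:        # simplification: link every A to every B
                    b = (k, lsol)
                    self.a_children[a].add(b)
                    self.b_children[b].add(c_id)
                for pa in prev_a:            # arcs between A-nodes of adjacent scales
                    self.a_links[pa].add(a)
            prev_a = [(k, lo) for lo in lo_set]
```

Because nodes are keyed by (scale, description), two shapes sharing the same symbolic description at some scale are automatically merged into a single node, which is the behaviour illustrated in Fig. 9.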
Fig. 15. Christ sketch and the first retrieved items.
3.1. Traversing the index

Given a query shape C_q, the selection of database shapes sharing similar characterizing elements is performed by comparing the descriptions of C_q (obtained at different scales) against the graph nodes. Starting from the coarsest scale k_1, the elements of the set L^O_α(C_q, k_1) are compared with the A-nodes of the graph. If a match is detected, the elements of the set L^SOL_α(C_q, k_1) are compared with the B-nodes linked to the matched A-node. Matching two generic descriptions requires us to manage two different problems: (i) the scales of the two descriptions may be different; and (ii) there is no criterion to determine the 'first' token along the curve. Allowing cross-scale matching improves the robustness of the matching procedure but, at the same time, increases its computational complexity. A threshold y can be used to establish the degree of cross-matching allowed. Two generic descriptions l_1 and l_2, referring to the scales s_1 and s_2 respectively, match iff: (i) |s_1 − s_2| ≤ y; and (ii) there exists a number r of circular shifts of the symbols of l_1 such that the two descriptions are equal. By allowing circular shifting in the matching procedure, invariance is achieved with respect to the location of the first token along the object contour. In our approach, cross-scale matching is not allowed, and we therefore set y = 0. D(C_q, k_1) is then compared with the C-nodes of the graph linked to the matched B-node, and matching scores are computed. C-nodes contain numeric values corresponding to the fuzzy description D(C, k) of a database shape, and the numeric descriptions of corresponding tokens are compared by evaluating their average correlation.
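A sketch of the matching predicate (with y = 0, as in the paper) and of a possible C-node score is given below. The exact correlation measure is not specified in the paper, so a normalized dot product averaged over corresponding tokens, maximized over circular alignments, is assumed here.

```python
import numpy as np

def symbolic_match(l1, l2):
    """True if the two token label sequences are equal up to a circular shift."""
    if len(l1) != len(l2):        # y = 0: descriptions at different scales never match
        return False
    k = len(l1)
    return any(all(l1[i] == l2[(i + r) % k] for i in range(k)) for r in range(k))

def cnode_score(d1, d2):
    """Average correlation between the fuzzy arrays of corresponding tokens.

    d1, d2: (k, 10) arrays, row n holding the membership values of token t_n;
    the best circular alignment of the tokens is kept (assumption).
    """
    def corr(a, b):
        na, nb = np.linalg.norm(a), np.linalg.norm(b)
        return float(a @ b) / (na * nb) if na and nb else 0.0
    k = d1.shape[0]
    return max(np.mean([corr(d1[n], np.roll(d2, r, axis=0)[n]) for n in range(k)])
               for r in range(k))

# two orientation descriptions that are equal up to one circular shift
print(symbolic_match(("north", "west", "south", "north"),
                     ("north", "north", "west", "south")))      # True

d = np.random.rand(4, 10)
print(cnode_score(d, np.roll(d, 1, axis=0)))                    # 1.0 (up to rounding)
```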
The same analysis is propagated to the finer scales through the links between the A-nodes. The procedure terminates when the nodes corresponding to the finest scale have been analyzed. Since a database shape can be matched at several scales, a global matching score is computed by averaging the matching scores obtained at each scale. Assuming that a shape C has been matched at scales k_1, k_2, …, k_N with matching scores m_1, m_2, …, m_N respectively, the global matching score M is computed as:

\[ M = \frac{1}{N} \sum_{i=1}^{N} m_i \]
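The aggregation of the per-scale scores into M can be sketched as follows (a hypothetical helper, assuming the traversal emits one (shape, scale, score) triple per matched C-node):

```python
from collections import defaultdict

def global_scores(per_scale_hits):
    """Aggregate per-scale matching scores into the global score M of each shape.

    per_scale_hits: iterable of (shape_id, scale, score) triples produced while
    the index is traversed from the coarsest to the finest scale.
    """
    collected = defaultdict(list)
    for shape_id, _scale, m in per_scale_hits:
        collected[shape_id].append(m)
    # M is the average of the N scores collected for the shape
    return {s: sum(ms) / len(ms) for s, ms in collected.items()}

# a shape matched at scales 4, 8 and 16 with scores 0.9, 0.8 and 0.7 gets M = 0.8
print(global_scores([("horse_01", 4, 0.9), ("horse_01", 8, 0.8),
                     ("horse_01", 16, 0.7), ("bottle_07", 4, 0.6)]))
```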
3.2. Complexity and index size

The complexity of the indexing procedure depends on the number of graph nodes which are traversed during the search for a query shape C_q. In the following, the analysis is simplified by assuming that no cross-scale matching is allowed between symbolic descriptions (i.e. y = 0). Let N_{i,k} denote the number of nodes of type i at scale k. According to the traversal procedure, the A-nodes at the coarsest scale k_1 are all analyzed sequentially until a match with L^O_α(C_q, k_1) is found. Nodes at the other scales are not analyzed sequentially: only nodes which are linked to a node matched at a coarser scale are analyzed. Therefore, only a percentage p_{i,k} of the N_{i,k} nodes of type i is analyzed at a generic scale k. In the average case, the comparison of the A-nodes at the coarsest scale requires N_{A,k_1}/2 symbolic description comparisons. Let p_{B,k_1} be the percentage of B-nodes which are
Fig. 16. First 24 shapes indexed according to the query of Fig. 15.
linked to the matched A-node. These p_{B,k_1} N_{B,k_1} B-nodes are analyzed sequentially until a match with L^SOL_α(C_q, k_1) is found; in the average case this operation requires p_{B,k_1} N_{B,k_1}/2 symbolic description comparisons. Let p_{C,k_1} be the percentage of C-nodes which are linked to the matched B-node. The p_{C,k_1} N_{C,k_1} C-nodes are all analyzed sequentially, and for each node the matching score with D(C_q, k_1) is computed. To provide circular-shift invariance, matching between A-nodes at scale k_1 requires k_1^2 symbol comparisons, and matching between B-nodes at the same scale requires 3k_1 symbol comparisons. The matching score between C-nodes requires the correlation of two one-dimensional arrays of size 3k_1; this amounts to 3(k_1 + 1) products. The same analysis holds for the other scales, with the only difference that not all the A-nodes at scale k_2 are analyzed, but only the percentage p_{A,k_2} which are linked to the A-node matched at level k_1.
According to this, the complexity of the indexing procedure amounts to

\[ \sum_{k = k_1}^{k_N} \frac{p_{A,k}\, N_{A,k}\, k^{2} + p_{B,k}\, N_{B,k}\, 3k}{2} \]

symbol comparisons and

\[ \sum_{k = k_1}^{k_N} p_{C,k}\, N_{C,k}\, 3(k + 1) \]

products, where p_{A,k_1} = 1.

The size of the graph structure is bounded by: (i) the number of shapes in the database; (ii) the number of scales at which each shape is described; and (iii) the value of the tolerance to uncertainty, i.e. the threshold α.

Let us first consider the number of nodes generated by the description of a shape at the scale k. Membership functions can be chosen such that, for each
property, at most two membership functions exhibit values greater than α. In this case, the maximum number of elements of the set L^O_α(C, k) equals 2^k; similarly, for the set L^SOL_α(C, k) this maximum number equals 2^{3k}. Considering descriptions of the shape at the scales 2, …, L, the maximum number of nodes is:

A-nodes: 2^2 + … + 2^L = 2^{L+1} − 4
B-nodes: 2^{3·2} + … + 2^{3·L} = (8^{L+1} − 64)/7
C-nodes: L − 2

For a database composed of N shapes, the maximum value of the index size is therefore N[(L − 2) + (2^{L+1} − 4) + (8^{L+1} − 64)/7]. This worst-case condition corresponds to the case in which all the following hypotheses hold: (i) at any resolution, all tokens have two membership functions with value greater than α, for S, O and L; (ii) no two shapes share equal descriptions, and this holds at all scales; and (iii) each shape is described at all the scales 2, …, L. In this worst case, the number of nodes of the graph grows exponentially with the number of scales at which shapes are described and linearly with the number of shapes.

In practice, the index size is much smaller than this upper bound, because the worst-case hypotheses do not hold for real shapes. Many tokens exhibit marked visual properties, so that for many of them only one membership function exceeds the threshold α. Moreover, since the approach tends to preserve the characterizing elements of the shapes, many shapes have the same symbolic descriptions as the coarseness of the representation grows, and the number of A- and B-nodes is much smaller than the theoretical upper bound. Table 1 gives the size of the index for different values of α for a sample database.

4. Experimental results

In this section, we give a qualitative view of the performance of the proposed indexing approach through a set of examples. We show the sketch-based interface of the system for image retrieval and the results of indexing using the technique expounded in the previous sections. The examples use shapes of different degrees of complexity. For each sample sketch, we show the output of the system (the first indexed images) and the shapes of the objects that have been retrieved. Shapes are ranked according to the extent to which their characterizing elements are close to the characterizing elements of the sketch. The test database includes 555 shapes representing objects of different forms. Shapes have been extracted manually from images and sampled at 100 equally spaced points. The index is automatically built according to the technique expounded in the previous sections.
Fig. 10 shows the query interface of the retrieval system. The buttons used for querying by shape similarity are located in the upper part of the icon menu (the pointer, the pencil, the rubber, and the line and circle primitives). The buttons in the bottom-left menu correspond to the functions available for shape-based retrieval: (i) multi-resolution indexing (described in this paper); (ii) token indexing (allowing retrieval by sketching only parts of the object contour; not discussed in this paper); and (iii) elastic matching (used only for retrieval refinement on the image set extracted through the indexing procedures; reported in [15]). Two user-settable tuning knobs define the user's confidence in how well the sketch represents the desired shapes and the number of best-ranked shapes that are displayed.

Fig. 11 shows the sketch of a simple form (a rounded-body bottle) and the best-ranked indexed images. To highlight the shapes of the objects represented in the retrieved images, the object contours corresponding to the 24 best-ranked shapes are reported, after zooming, in Fig. 12. Most of these images (about 80%) represent bottles, even though their forms are quite different from each other. It can be noticed that multi-resolution indexing makes retrieval insensitive to differences between shapes of the same object produced, for instance, by artifacts in the bottle neck or body. A sort of generalization of the sketch, preserving its structural features, is performed internally in order to extract the most similar shapes from the entire database. This gives the user a certain degree of flexibility in sketching the profile of the object, without forcing a precise reproduction of its contour. Shapes of objects other than bottles share a somewhat similar structure with bottles: the bottle structure characterized by a thin neck attached to a larger body can be recognized in the shape of the flower vase (21st shape) and in that of the two pears (2nd and 3rd shapes).

Examples of indexing with more complex shapes are shown in Fig. 13 (with the first 22 indexed shapes shown in Fig. 14) and Fig. 15 (with the first 24 indexed shapes shown in Fig. 16). It can be observed that the system allows the retrieval of objects with substantially different postures. This is particularly visible in the case of the horse sketch: horse shapes with two legs are retrieved together with shapes of horses and other quadrupeds with more than two legs. Such differences in form are reflected in the rankings assigned by the indexing procedure. Shapes very similar to the sketch, even though somewhat strongly deformed, are ranked in the first positions of the retrieval list. Lower rankings are assigned as shapes become more different from the original sketch, following a gradual degradation. It can be observed that the profiles of the hats (ranked in the lowest positions) are a somewhat smoothed version of the horse sketch profile, still keeping its prominent features.

Fig. 17 shows average recall and precision values measured over a set of 100 different queries. Change of
Fig. 17. Recall R and precision P as functions of α for three values of β.
user confidence determines changes in recall and precision values. As the user confidence grows, a higher selectivity is obtained. As the number of best-ranked shapes that are displayed is increased, recall increases while precision may decrease. Plots of the relationship between recall and precision
for different (normalized) values of these parameters are shown in Fig. 17. Fig. 18 shows the percentage of C-nodes that are traversed in the database index as a function of the user confidence. It can be observed that this function decreases
Fig. 18. Percentage of traversed C-nodes with respect to their total number as a function of α.
monotonically with this parameter: between 85% and 90% of the comparisons are saved with respect to a sequential scan of the database. Table 2 gives data about the speed of the indexing procedure. The data refer to a SGI R4600 133 MHz processor running the IRIX 5.3 operating system. Indexing times have been computed for different values of α with reference to the test database described in Section 4.
5. Conclusions

In this paper, we have presented a technique for shape indexing based on multi-scale analysis of shapes. Shape descriptions are organized in a layered graph that groups shapes sharing similar parts. Viewing shapes at multiple scales allows the structural, constitutive elements of a shape to be captured. Rankings of similar shapes degrade gradually, with lower rankings assigned to increasingly different shapes. The complexity analysis shows that the approach can be effectively applied to large image databases.
References

[1] F. Attneave, Some informational aspects of visual perception, Psych. Review 61 (1954) 183–193.
[2] I. Biederman, Recognition by components, Psych. Review 94 (1987) 115–147.
[3] D.D. Hoffman, W.A. Richards, Parts of recognition, Cognition 18 (1985) 65–96.
[4] M. Leyton, Inferring causal history from shapes, Cognitive Science 13 (1989) 357–387.
[5] C. Faloutsos, M. Flickner, W. Niblack, D. Petkovic, W. Equitz, R. Barber, The QBIC project: efficient and effective querying by image content, Research Report 9453, IBM Research Division, Almaden Research Center, August 1993.
[6] A.K. Mackworth, F. Mokhtarian, Scale-based description of planar curves, Proc. 5th Conf. of the Canadian Society for Computational Studies of Intelligence, London, Ontario, Canada, May 1984, pp. 114–119.
[7] A.K. Mackworth, F. Mokhtarian, A theory of multiscale, curvature-based shape representation for planar curves, IEEE PAMI 14 (8) (1992) 789–805.
[8] A.K. Jain, A. Vailaya, Image retrieval using color and shape, Pattern Recognition 29 (8) (1996) 1233–1244.
[9] A.P. Witkin, Scale-space filtering, Proc. 7th Int. Joint Conf. on Artificial Intelligence, Karlsruhe, Germany, 1983, pp. 1019–1022.
[10] W.I. Grosky, R. Mehrotra, Index-based object recognition in pictorial data management, Computer Vision, Graphics and Image Processing 52 (1990) 416–436.
[11] R. Mehrotra, F.K. Kung, W. Grosky, Industrial part recognition using a component index, Image and Vision Computing 3 (1990) 225–231.
[12] W.I. Grosky, P. Neo, R. Mehrotra, A pictorial index mechanism for model-based matching, Proc. 5th Int. Conf. on Data Engineering, Los Angeles, CA, Feb. 1989, pp. 180–187.
[13] P. Mehrotra, J.E. Gary, Similar-shape retrieval in shape data management, IEEE Computer, September (1995) 57–62.
[14] H.V. Jagadish, A retrieval technique for similar shapes, Proc. ACM SIGMOD, New York, 1991, pp. 208–217.
[15] A. Del Bimbo, P. Pala, Visual image retrieval by elastic deformation of user sketches, IEEE Trans. on Pattern Analysis and Machine Intelligence, Feb. 1997.
[16] C. Teh, R.T. Chin, On image analysis by method of moments, IEEE Transactions on PAMI 10 (1988) 496–513.
[17] A. Khotanzad, Y.H. Hong, Invariant image recognition by Zernike moments, IEEE Transactions on PAMI 12 (1990) 489–497.
[18] S. Sclaroff, A. Pentland, Object recognition and categorization using modal matching, Proc. 2nd CAD-based Vision Workshop, Champion, PA, Feb. 1994.
[19] G. Congiu, A. Del Bimbo, E. Vicario, Iconic retrieval from databases of cardiological image sequences, Proc. 3rd Int. Conf. on Visual Database, Lausanne, Switzerland, March 1995.
[20] A. Pentland, R. Picard, S. Sclaroff, Photobook: tools for content-based manipulation of image databases, Proc. SPIE: Storage and Retrieval for Image and Video Databases II, vol. 2185, San Jose, CA, Feb. 1994.
[21] A. Del Bimbo, P. Pala, Image indexing using shape based visual features, Proc. Int. Conf. on Pattern Recognition, Vienna, August 1996.
[22] K. Hirata, T. Kato, Query by visual example: content-based image retrieval, in: A. Pirotte, C. Delobel, G. Gottlob (Eds.), Advances in Database Technology—EDBT'92, Lecture Notes in Computer Science, vol. 580.