Image and Vision Computing 22 (2004) 15–22 www.elsevier.com/locate/imavis
Image retrieval using resegmentation driven by query rectangles

L. Cinque*, F. De Rosa, F. Lecca, S. Levialdi

Dip. di Scienze dell'Informazione, Università di Roma "La Sapienza", Via Salaria 113, 00198 Rome, Italy

Received 10 January 2003; received in revised form 10 July 2003; accepted 21 July 2003
Abstract

Image retrieval using pictorial attributes such as color, position, and shape is a topic of vigorous research. We address two key issues in image retrieval: the use of rectangles in queries to express properties of regions in the desired target images, and the use of oversegmentation to build the index of images in the database. In our method, the rectangles in the user's query are used to control a partial resegmentation of each candidate image. These query-driven, partial resegmentations provide the features needed for determining the distance between the query and each candidate, so that the closest candidates can be determined and retrieved. This method enables the construction of image retrieval systems with completely automatic indexing and relatively fast querying time. We show results of both qualitative and performance tests.
© 2003 Published by Elsevier B.V.

Keywords: Image retrieval; Resegmentation; Query rectangle
* Corresponding author. Tel.: +39-6-4991-8431; fax: +39-6-884-1964. E-mail address: [email protected] (L. Cinque).

1. Introduction

Research in content-based image retrieval is an area of increasing attention, expanding in breadth. After early successes in a few applications, research is now concentrating on deeper problems, tackling the hard questions at the crossroads of the disciplines from which it was born: computer vision, databases, and information retrieval [1]. The growing number of images online offers opportunities to provide new services in the fields of art, history, medicine, and industry. The difficulty of image information retrieval is to infer from the query the human user's intentions and preferences for images in a database; the challenge, then, is to figure out and deliver what the user wants.

Most of the previous work on image retrieval by content uses image descriptions that are based upon statistical aspects (color histograms) or color and texture samples, taken at fixed locations in the images. Works on these problems include the QBIC system at IBM Almaden [2,3], work by Jain et al. at UC San Diego (see [4]), and others [5–8]. A survey of these techniques has been given by Cinque et al. (see Ref. [10]). Image query by content is defined as a form of 'iconic indexing' in Ref. [16].

In order to allow better and more general kinds of retrieval, we employ descriptions of images in terms of regions that are obtained from a partial analysis of each image. Regions have the drawback of larger computation times, as well as many possible ways and techniques for computing their features. However, many researchers have begun to use regions for image retrieval in recent years. Dimai and Stricker [11] use a small number of 'fuzzy' regions in fixed positions to guide the computation of the image signatures; because the shapes and positions of their regions are not data-dependent, they do not have the benefits of regions obtained by a segmentation. Another use of regions is that by Carson et al. [12], in which a small set of regions is determined for each image using expectation maximization with color and texture features. They use a scale-selection heuristic based on the directionality of edges in the image, but their regions come from only one scale of segmentation. Their method is therefore susceptible to low recall, due to oversegmentation and undersegmentation of the images. An important point in their paper is that the user should be presented with a view of the signature that is computed
for the query image, in order to have some idea of why the retrieval is working the way it does.

Several systems that combine information from all of the regions have been proposed. The SIMPLIcity system [13] uses integrated region matching as its similarity measure. By allowing many-to-many relationships between regions, the overall similarity approach reduces the adverse effect of inaccurate segmentation, helps to clarify the semantics of a particular region, and enables a single querying interface for region-based retrieval systems. Other efforts in this direction include WALRUS [14], a retrieval algorithm that employs a similarity model in which each image is first decomposed into regions, and the similarity measure between a pair of images is then defined as the fraction of the area of the two images covered by matching regions; and the successful region-to-region similarity measure proposed by Jing et al. [15]. That system presents a region-based retrieval framework with a region-based representation that is efficient in terms of storage and retrieval. The framework consists of methods for image segmentation and a grouping/indexing scheme based on a modified inverted-file strategy. Moreover, the framework supports continuous learning, which enables it to improve itself.

In this paper we address two key issues in image retrieval: the use of oversegmentation when preparing the index of images in the database, and the use of rectangles in queries to express properties of regions in the desired target images. Our approach is to precompute a general structure for each image in the database, called an oversegmentation. We show how to use the oversegmentation at query time to reduce the time needed to compute the 'query to candidate' distance values. In our method, first a subset of the images in the database is identified as candidates using an approximate, first-level comparison. Then the rectangles in the user's query are used to control a partial resegmentation of each candidate image. The resegmentation is performed at relatively low computational cost because it uses the results of the oversegmentations produced during the indexing phase. These query-driven partial resegmentations provide the features needed for determining the distance between the query and each candidate, so that the closest candidates can be determined and retrieved. This method enables the construction of image retrieval systems with completely automatic indexing.

The organization of the paper is as follows: Section 2 outlines the basic technical ideas that underlie our method; Section 3 describes the use of oversegmentation to build the index of images and the use of rectangles in queries; in Section 4 experimental results are presented; and finally in Section 5 some further problems are discussed and conclusions are drawn.
2. Index representation

Before we can describe our method in the next section, we must explain the basic technical ideas that underlie the system.

2.1. Image query by content: problem formulation

We define the image-query-by-content problem as follows. Given a query image $I_1$ and a database of images $\{I_2, \ldots, I_n\}$, find the image $I_i$ closest to $I_1$. The closeness is to be computed using a distance function $D(I_i, I_j)$ which evaluates the shape, color, texture, position, and/or importance of the regions within their images.

2.2. Rich region descriptions

We use a new data structure called the Rich Region Description, which is a list of regions obtained by performing different segmentations at different levels of detail. Since the candidate pool may contain many regions, a selection can be made, according to heuristics, to retain only those regions with an appropriate likelihood of being useful. The result of the selection is a set of regions which is not necessarily a segmentation of the image: for example, it may cover some pixels more than once, and there may be some pixels that are not covered by any of the regions. This is not a disadvantage but actually an advantage, because this description is used for retrieval, not for reconstruction, of the image.

The description for an image, termed $\mathrm{desc}(I)$, is a set of region descriptions $R_i$, where each of the regions is made up of pixels of $I$. It is not necessary for every pixel to be represented in a region, nor is it required that no pixel be represented more than once: regions may overlap, and they do not necessarily cover the entire image.

$$\mathrm{desc}(I) = \{R_1, R_2, \ldots, R_k\}$$

$$R_i = \langle x_i,\, y_i,\, \mathit{red}_i,\, \mathit{green}_i,\, \mathit{blue}_i,\, \mathit{color\ covariance}_i,\, \mathit{area}_i,\, \mathit{perimeter}_i,\, \mathit{convexity}_i,\, \mathit{number\ of\ bays}_i \rangle$$

The object $\mathrm{desc}(I)$ is what we term the rich region description for image $I$.
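As an illustration only, the tuple $R_i$ maps naturally onto a small Java data class. The paper stores region descriptions as XML (see Section 4) and does not give a class layout, so all names below are hypothetical:

```java
// Illustrative mapping of the tuple R_i onto a Java data class.
public class RegionDescription {
    public double x, y;                // region position (e.g. centroid)
    public double red, green, blue;    // mean color of the region's pixels
    public double[][] colorCovariance; // 3x3 covariance of the color values
    public double area;                // number of pixels in the region
    public double perimeter;
    public double convexity;
    public int numberOfBays;
}
// desc(I) is then simply a collection of RegionDescription objects; since
// regions may overlap and need not cover the image, no partition invariant
// is enforced on the collection.
```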
3. Oversegmentation and query-driven resegmentation

3.1. Computing oversegmentations

As already outlined, our approach to image retrieval relies on pre-calculated 'coarse' descriptions of images, which can be obtained off-line with an automatic segmentation procedure.
The segmentation algorithm is described in Ref. [9] and is based on two main criteria: a multi-resolution approach and a target number of regions, used as parameters of the algorithm. The procedure works in two phases. In the first phase it builds a reduced version of the image and performs simple region growing, which cycles until a target number of regions has been found. In the second phase, the results of the first are scaled to the original image size and region merging is performed, based on the relative color values and sizes of neighboring regions. Subsequent iterations perform merging with progressively relaxed thresholds, until the target number of regions is obtained. The regions resulting from this procedure form the descriptions for the images in the retrieval phase; we use 'oversegmentations', so that apparently unimportant details of images are not filtered out in the segmentation process. At the same time, using a relatively 'higher level' description than single pixels brings considerable speed improvements, as we will show in the results section.

3.2. Query-driven resegmentation

The main idea of this method is to produce a segmentation (to be used for the matching process) that depends on the current query that the user specifies. In our image retrieval system, the query is expressed using tools to draw rectangles with color, size and position constraints for the desired target images. As an example, a user trying to find an image like Fig. 1a will probably draw a query similar to Fig. 1b. In order to perform matching, the query image is 'superimposed' on the oversegmentation of the image that is stored in the database (Fig. 1c and d), to obtain a new segmentation to be used by the matching algorithm: all the regions of the oversegmentation which are found under the area of each query rectangle are merged together to constitute a new region, which inherits the properties of the original regions (a new mean color and new shape information). We used the following heuristic to decide whether a region should be merged or not: a region is merged if more than 60% of its area lies under the query region. This guarantees that the area of the resulting region does not exceed the area of the query region by more than a predictable difference. The new segmentation obtained is shown in Fig. 1e. The justification for this method is that, whenever possible, the resulting new segmentations for the matching procedure will have a structure similar to the user query. The query-driven partial resegmentations provide the features needed for determining the distance between the query and each candidate: experimental results show that the matching function gives higher relevance to images similar to the query. The resegmentation process consists of the following steps (a sketch of the merge heuristic is given after the list):
Fig. 1. Resegmentation process: (a) desired target image; (b) query drawn by the user; (c) oversegmentation in the database (409 regions); (d) superimposition of query and oversegmentation; (e) resulting resegmentation for the matching process.
1. We start from the oversegmentations of all the images in the database, obtained off-line without user intervention.
2. Each time a query is made to the system, a resegmentation is performed on the images of the database.
3. The matching function is applied to compare the new segmentation with the query, and the results are presented in relevance order.

The process is fast because the resegmentation is performed at relatively low computational cost: it uses the oversegmentations produced during the indexing phase. A further speed-up has been implemented: only a subset of the images in the database is identified as candidates, using an approximate, first-level comparison based on a k-d tree index data structure. In this way the computationally expensive part (resegmentation and matching calculation) is performed only on images which really have a probability of answering the user's query. In our experiments, on a database of 350 images, the tree data structure made it possible to prune the search and perform the matching on only about 10% of the database for typical queries.
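The following is a rough Java illustration of the 60% merge heuristic. The paper's regions are arbitrary pixel sets; this sketch simplifies them to bounding rectangles, and the class and method names are ours, not the paper's:

```java
import java.awt.Rectangle;
import java.util.ArrayList;
import java.util.List;

// Sketch of the merge heuristic of Section 3.2, on rectangular regions.
public class Resegmenter {
    // Fraction of the region's area that lies under the query rectangle.
    static double overlapFraction(Rectangle region, Rectangle query) {
        Rectangle inter = region.intersection(query);
        if (inter.isEmpty()) return 0.0;
        return ((double) inter.width * inter.height)
             / ((double) region.width * region.height);
    }

    // Regions of the oversegmentation to be fused into the new region
    // associated with one query rectangle.
    static List<Rectangle> regionsToMerge(List<Rectangle> overseg, Rectangle query) {
        List<Rectangle> merged = new ArrayList<>();
        for (Rectangle r : overseg) {
            if (overlapFraction(r, query) > 0.60) merged.add(r);
        }
        return merged;
    }
}
```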
3.3. Alternative merging criteria

The features of the merged regions are inherited from the original regions. This process obviously depends on the kind of feature: for example, in the case of region contours, when two adjacent regions are merged, the resulting contour can be easily defined. In the case of color, we use the mean of the mean colors of the original regions. Alternative feature-merging criteria could give better results; a color histogram, for example, would preserve more information about the original colors. Selection of the best features for the application is a complex matter and will be the subject of future work.

3.4. Matching method

A matching function has been studied and implemented in order to perform both efficiently and satisfactorily from the user's point of view. This function is applied to give a similarity score between the query image and every image that passed the pruning phase. The resulting images are sorted and presented in our system in order of score. In particular, since the features of the query are based on rectangles with a position and color, the similarity is expressed in terms of similarity between the regions of the query and the regions of the resegmentation of the database images. A typical oversegmentation contains about 300 regions, but the resegmented images contain only tens of regions, so the computational cost of the matching function is not very high.

If $R_1$ is the query region and $R_2$ is the stored region, we define the distance between two region descriptions as follows:

$$D(R_1, R_2) = \alpha_1\, d_{color}(R_1, R_2) + \alpha_2\, d_{position}(R_1, R_2) + \alpha_3\, d_{pixel}(R_1, R_2) + \alpha_4\, v_{color\text{-}variance}(R_1)$$

where the $\alpha$s are the weights of the function (whose sum is 1) and can be interactively changed by the user to give more importance to a particular feature. The variance is computed only with respect to the first region. The component feature distances are described in the following subsections.

To compute the similarity between two images, the following strategy is used to match query regions with corresponding regions of each candidate image: if a query region causes the fusion of one or more regions of the oversegmentation, then that query region and the resulting region are matched and the similarity between them is evaluated. This makes sense because, by the hypotheses of the resegmentation, these regions are in the same position in the query and candidate image. For query regions that do not cause a resegmentation (which is the more infrequent case), a match between the query region and the minimum-distance region is made. In formulas, if we label the query regions $Q_1, \ldots, Q_k$, the resegmentation regions $R_1, \ldots, R_n$ (the two sets can have different numbers of regions), and the set of the regions in the oversegmentation $S$, we have:

$$d_i = \begin{cases} D(Q_i, R_i) & \text{if } Q_i \text{ caused the fusion of some regions and } R_i \text{ is the region resulting from this fusion} \\ \min_{R_i \in S} D(Q_i, R_i) & \text{otherwise} \end{cases}$$

The distance between the query $Q$ and a candidate image $I$ is then:

$$D(Q, I) = \sum_{i=1}^{k} \frac{\|Q_i\|}{\|Q\|}\, d_i$$

where each $d_i$ is weighted by the size of the query region relative to the size of the entire query image.
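To make the combination concrete, here is a minimal Java sketch of the weighted region distance and of $D(Q, I)$. The component distances are left abstract (they are defined in Sections 3.4.1–3.4.4), and the class name, the example weight values and the helper signatures are ours, not the paper's; it reuses the hypothetical RegionDescription class from Section 2.2:

```java
import java.util.List;

// Minimal sketch of the matching of Section 3.4.
public abstract class Matcher {
    abstract double dColor(RegionDescription a, RegionDescription b);
    abstract double dPosition(RegionDescription a, RegionDescription b);
    abstract double dPixel(RegionDescription a, RegionDescription b);
    abstract double colorVariance(RegionDescription a);

    // User-adjustable weights; they must sum to 1 (values here are examples).
    double a1 = 0.4, a2 = 0.3, a3 = 0.2, a4 = 0.1;

    // D(R1, R2): weighted sum of the component distances.
    double regionDistance(RegionDescription q, RegionDescription r) {
        return a1 * dColor(q, r) + a2 * dPosition(q, r)
             + a3 * dPixel(q, r) + a4 * colorVariance(q);
    }

    // d_i: compare with the fused region if the query rectangle caused a
    // fusion, otherwise with the closest region of the oversegmentation S.
    double di(RegionDescription qi, RegionDescription fused, List<RegionDescription> s) {
        if (fused != null) return regionDistance(qi, fused);
        double best = Double.POSITIVE_INFINITY;
        for (RegionDescription r : s) best = Math.min(best, regionDistance(qi, r));
        return best;
    }

    // D(Q, I): each d_i weighted by the area of Q_i relative to the whole query.
    double imageDistance(List<RegionDescription> qs, List<RegionDescription> fused,
                         List<RegionDescription> s, double queryArea) {
        double total = 0.0;
        for (int i = 0; i < qs.size(); i++)
            total += (qs.get(i).area / queryArea) * di(qs.get(i), fused.get(i), s);
        return total;
    }
}
```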
3.4.1. Color features

A specific component distance for measuring color similarity has been studied. Let $c_1 = (h_1, s_1, b_1)$ and $c_2 = (h_2, s_2, b_2)$ be two colors represented in the HSB (Hue, Saturation and Brightness) color space. The following process is used for computing the distance:

1. Normalization of the hues to the range $[0, 1]$:
$$h_1 = \frac{h_1}{360}, \qquad h_2 = \frac{h_2}{360}$$

2. Computation of the brightness and saturation coefficients (to be used later):
$$c_{brightness} = \begin{cases} 1 & \text{if } b_1^2 + b_2^2 \ge 1 \\ b_1^2 + b_2^2 & \text{otherwise} \end{cases} \qquad c_{saturation} = \begin{cases} 1 & \text{if } s_1^2 + s_2^2 \ge 1 \\ s_1^2 + s_2^2 & \text{otherwise} \end{cases}$$

3. Computation of the hue difference:
$$d = \begin{cases} |h_1 - h_2| & \text{if } |h_1 - h_2| \le 0.5 \\ 1 - |h_1 - h_2| & \text{otherwise} \end{cases} \qquad d_{hue} = \begin{cases} c_{brightness}\, c_{saturation} & \text{if } d \ge 0.16 \\ \dfrac{d}{0.16}\, c_{brightness}\, c_{saturation} & \text{otherwise} \end{cases}$$

4. Computation of the saturation difference:
$$d_{saturation} = |s_1 - s_2|^3\, c_{brightness}$$
5. Computation of the brightness difference:
$$d_{brightness} = |b_1 - b_2|^3$$

6. Computation of the maximum difference between the two colors:
$$d_{color} = \max(d_{hue}, d_{saturation}, d_{brightness})$$

The value $d$ represents the numeric difference in hue. Since in the HSB color space the hue is represented on a circular interval (an angle), we need to know in which direction to 'turn' in order to have the lowest difference, which cannot exceed 0.5 in either case (Fig. 2). The constant 0.16, which appears at step 3, corresponds to a division of the hues into six main slices: any hue difference greater than this value is considered maximal, the justification being that for such differences the colors are perceptually totally different. The brightness and saturation coefficients are used to reduce the difference in hue as the colors become darker (and less 'colored'): in these cases, in fact, the human eye gives less importance to hue. As for the saturation difference, we multiply by the brightness coefficient for the same reason. On the other hand, the difference in brightness appears independent of both hue and saturation. The resulting total difference between two colors is taken as the maximum of the differences in hue, saturation and brightness. To help the user understand the color feature distance, our system provides an interactive tool that lets users pick two colors and examine the computation of their distance.

The distance presented above for measuring color similarity does not take into account color distributions within regions. One item of future work will be to modify this distance function to take color histograms into account; in particular, we plan to use histograms as distributions of HSB colors, following the definitions and algorithms on color histograms and binary color sets described in Ref. [17].
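The six steps translate directly into code. The following sketch (class and method names are ours; it expects hue in degrees and saturation and brightness in $[0, 1]$, as above) computes $d_{color}$:

```java
// Sketch of the HSB color distance of Section 3.4.1.
public final class ColorDistance {
    static double distance(double h1, double s1, double b1,
                           double h2, double s2, double b2) {
        // 1. Normalize hues to [0, 1].
        h1 /= 360.0;
        h2 /= 360.0;
        // 2. Brightness and saturation coefficients, clamped to 1.
        double cB = Math.min(1.0, b1 * b1 + b2 * b2);
        double cS = Math.min(1.0, s1 * s1 + s2 * s2);
        // 3. Circular hue difference (never above 0.5), then the hue term:
        //    saturated once the hues fall in different sixths of the circle.
        double d = Math.abs(h1 - h2);
        if (d > 0.5) d = 1.0 - d;
        double dHue = (d >= 0.16) ? cB * cS : (d / 0.16) * cB * cS;
        // 4.-5. Saturation and brightness differences.
        double dSat = Math.pow(Math.abs(s1 - s2), 3) * cB;
        double dBri = Math.pow(Math.abs(b1 - b2), 3);
        // 6. The overall difference is the maximum of the three components.
        return Math.max(dHue, Math.max(dSat, dBri));
    }

    public static void main(String[] args) {
        // Pure red vs. pure green: hue difference 1/3 >= 0.16, so the
        // distance saturates at 1.0.
        System.out.println(distance(0, 1, 1, 120, 1, 1));
    }
}
```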
Fig. 2. Part of the computation for the color similarity.
3.4.2. Position

This feature measures the spatial correspondence of the two regions $R_1$ and $R_2$, represented by the coordinates of the upper-left and lower-right corners of the minimum enclosing rectangle, $(X_{a_i}, Y_{a_i})$ and $(X_{b_i}, Y_{b_i})$ for $i = 1, 2$. Let $d$ be the length of the diagonal of the entire image. Then

$$d_{position}(R_1, R_2) = \frac{\sqrt{(X_{a_1} - X_{a_2})^2 + (Y_{a_1} - Y_{a_2})^2} + \sqrt{(X_{b_1} - X_{b_2})^2 + (Y_{b_1} - Y_{b_2})^2}}{2d}$$

This is the sum of the distances between the corresponding corners of the two rectangles, normalized by the maximum possible distance, in order to have a number in the range from 0 to 1 (Fig. 3). The intuition behind this definition is that $d_{position}$ represents a spatial overlap coefficient between two regions: a value at or near zero indicates 'perfect' or 'near perfect' overlap between the two regions, while a value at or near one indicates little or no overlap. The distance between the centers of two regions is not a good measure of region overlap, because it treats some cases of imperfect overlap as perfect overlap: in Fig. 4, even though the distance between the two centers ($C_1$ and $C_2$) is zero, the two regions do not overlap perfectly. In such cases $d_{position}$ correctly reflects the imperfect overlap.

3.4.3. Pixel overlap

This feature is used to compensate for the loss of information about the shape of the regions, which are represented only by their bounding boxes. This could result, in the matching phase, in very different regions being regarded as similar. As a first approximation, the difference between the shapes can be measured by the number of pixels that constitute each region. If we denote by $p_1$ and $p_2$ the total number of pixels in regions $R_1$ and $R_2$, we have:

$$d_{pixel}(R_1, R_2) = \frac{|p_1 - p_2|}{\max(p_1, p_2)}$$

3.4.4. Color variance

During the resegmentation phase, regions with very different colors are sometimes merged, and the resulting mean color may differ from all of the original colors (as an example, consider a black and a white region which, if merged, produce a grey region). To partially overcome this problem, another feature computed for each region is its color variance: when two regions with very different colors are merged, the resulting region has a higher variance value. Intuitively, this value represents the 'pureness' of the colors of the original regions.
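For completeness, here is a sketch of the two spatial components, operating directly on bounding-box corners and pixel counts (again, the class and method names are ours, not the paper's):

```java
// Sketch of the spatial components of Sections 3.4.2 and 3.4.3.
public final class SpatialDistance {
    // d_position: corner-to-corner distances of the two bounding boxes,
    // normalized by twice the image diagonal so the result lies in [0, 1].
    static double dPosition(double xa1, double ya1, double xb1, double yb1,
                            double xa2, double ya2, double xb2, double yb2,
                            double diag) {
        double upperLeft  = Math.hypot(xa1 - xa2, ya1 - ya2);
        double lowerRight = Math.hypot(xb1 - xb2, yb1 - yb2);
        return (upperLeft + lowerRight) / (2.0 * diag);
    }

    // d_pixel: relative difference in pixel counts, a coarse shape proxy.
    static double dPixel(long p1, long p2) {
        return Math.abs(p1 - p2) / (double) Math.max(p1, p2);
    }
}
```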
4. Experiments

In order to perform tests, we developed an image retrieval system in Java, based on our resegmentation method.
Fig. 5. User interface for expressing queries (white areas are treated as ‘do not care’).
Fig. 3. Region distance.
The user can build a database by selecting a folder of GIF or JPEG files and executing the command to index them: this run-once phase builds the oversegmentations to be used later for retrieval, and then builds the indexes on which the first-level comparison is made. A query interface lets the user create colored rectangles to express his or her preferences about the images (Fig. 5). We performed two kinds of tests on a 350-image database: a qualitative test to measure user satisfaction, and performance tests that show the differences in processing times and database sizes. In order to perform the tests, we had our system record the history of the queries posed by the users, and then we selected some 'typical' queries.
4.1. Qualitative tests

In the qualitative tests, users tried our system and rated the results of the matching. They also rated the results of the same queries on the QBIC system (see Refs. [2,3]) using the same database. The ratings were higher with our system, averaging a mark of 6.8. Moreover, a much appreciated feature was the possibility of visually examining the matching process (see Fig. 6 for the results of the query in Fig. 5): understanding the method enabled users to pose more and more effective queries.

4.2. Performance tests

We performed time comparisons to verify the performance of our method. We analyzed the following cases:

a. computing complete segmentations at query time;
b. computing resegmentations at query time without an index (performing matching on each image);
c. computing resegmentations at query time using an index.

Fig. 7 reports the timings for the segmentation phase of a typical 640 × 480 image of our database.
Fig. 4. Region position distance compared with the distance between region centers.
Fig. 6. Results of the query: the rows indicate the images found, the internal bounding box representation, the matching with the query, the resegmentation, and the numerical values.
Fig. 7. Performance measurements of our system, taken on an IBM x330 server (Pentium III, 1 GHz) with 1 GB of RAM and a database of 350 images.
Since segmentation is a bottom-up process, images with fewer regions take more time to compute. The slowest part of building a database is the segmentation, while the indexing phase is relatively fast: the whole run-once process completes in about 30 min, after which the system is ready to be used. Images can later be added without recomputing every segmentation. During query processing, on average only 10% of the images in the database are selected for resegmentation and matching, and both operations are very fast. This results in very fast response times: about 3 s per query. Case (a) cannot be used for user interaction because it is too slow, while the other two are acceptable; as expected, the fastest processing time is obtained by combining resegmentations with an index. The index must be tuned to avoid affecting the recall and precision of the system.

We also studied the influence of the granularity of the database (the number of regions in the starting oversegmentation) on processing times, on the size of the XML region descriptions, and on the results of the queries. There is not a big difference in processing times between 50 and 700 regions, and a very high number of regions would not be very useful for 'real-life' picture databases, so a good empirical setting is to use about 300 regions per image.
5. Conclusions and future work

The main features of our work are the use of rectangle-based queries to express properties of regions in the desired target images, and the use of oversegmentation when preparing the index of images in the database. In our method the rectangles in the user's query are used to control a partial resegmentation of each candidate image. An advantage is that the index is created off-line in an unattended mode, and the query methods work at the region level instead of the pixel level, enabling efficient computation of a matching function that tries to reflect users' expectations. The improvements to the current system currently being addressed are a more accurate selection of features, including more sophisticated shape features, and a client-server architecture to improve scalability and processing times.
References

[1] A. Smeulders, M. Worring, S. Santini, A. Gupta, R. Jain, Content-based image retrieval at the end of the early years, IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (12) (2000) 1349–1380.
[2] W. Niblack, et al., The QBIC project: querying images by content using color, texture and shape, Proceedings of Storage and Retrieval for Image and Video Databases, SPIE vol. 1908, Bellingham, WA, 1993, pp. 173–187.
[3] M. Flickner, et al., Query by image and video content: the QBIC system, Special issue on content-based image retrieval systems, Computer 28 (9) (1995) 23–32.
[4] S. Santini, R. Jain, Similarity matching, IEEE Transactions on Pattern Analysis and Machine Intelligence (1996).
[5] V.N. Gudivada, V.V. Raghavan (Guest Editors), Special issue on content-based image retrieval systems, Computer 28 (9) (1995).
[6] A. Del Bimbo, P. Pala, S. Santini, Visual image retrieval by elastic deformation of object sketches, Proceedings of the IEEE Symposium on Visual Languages (1994) 216–223.
[7] A. Rosenfeld, A.C. Kak, Digital Picture Processing, Academic Press, New York, 1992.
[8] R.M. Haralick, L.G. Shapiro, Computer and Robot Vision, Addison-Wesley, Reading, MA, 1992.
[9] L. Cinque, F. Lecca, S. Levialdi, S. Tanimoto, Retrieval of images using rich region descriptions, Journal of Visual Languages and Computing (2000).
[10] L. Cinque, M. De Marsico, S. Levialdi, Indexing pictorial documents by their content: a survey of current techniques, Image and Vision Computing 15 (1997) 119–141.
[11] A. Dimai, M. Stricker, Spectral covariance and fuzzy regions for image indexing, Technical Report BIWI-TR-173, Communications Technology Lab, Swiss Federal Institute of Technology (ETH), Zurich, Switzerland, April 1996.
[12] C. Carson, S. Belongie, H. Greenspan, J. Malik, Region-based image querying, Proceedings of the CVPR'97 Workshop on Content-Based Access of Image and Video Libraries, IEEE Computer Society, Alameda, CA, 1997.
[13] J.Z. Wang, J. Li, G. Wiederhold, SIMPLIcity: semantics-sensitive integrated matching for picture libraries, IEEE Transactions on Pattern Analysis and Machine Intelligence 23 (2001).
[14] A. Natsev, R. Rastogi, K. Shim, WALRUS: a similarity retrieval algorithm for image databases, Proceedings of the ACM-SIGMOD International Conference on Management of Data, 1999.
[15] F. Jing, B. Zhang, F.Z. Lin, W.Y. Ma, H. Zhang, A novel region-based image retrieval method using relevance feedback, Proceedings of the Third ACM International Workshop on Multimedia Information Retrieval (MIR), 2001.
[16] S.L. Tanimoto, An iconic/symbolic structuring scheme, in: C.H. Chen (Ed.), Pattern Recognition and Artificial Intelligence, Academic Press, Orlando, FL, 1976, pp. 452–471.
[17] J.R. Smith, Integrated spatial and feature image systems: retrieval, analysis and compression, PhD Thesis, Columbia University, 1997.