Content-based image retrieval by viewpoint-invariant color indexing





Image and Vision Computing 17 (1999) 475–488

Theo Gevers, Arnold W.M. Smeulders

ISIS, Faculty of WINS, University of Amsterdam, Kruislaan 403, 1098 SJ Amsterdam, The Netherlands Received 3 April 1997; received in revised form 27 February 1998; accepted 10 June 1998

Abstract

We aim at content-based image retrieval by color information indexing robust against varying imaging conditions. To this end, we propose a new set of color features robust to a large change in viewpoint, object geometry and illumination. From the proposed set, various color features are selected to construct color pattern-cards for each image. Matching measures are defined, expressing similarity between color pattern-cards, robust to a substantial amount of object occlusion and cluttering. Based on the color pattern-cards and matching measures, a hashing scheme is presented offering constant run-time image retrieval independent of the number of images in the image database. To evaluate accuracy of the image retrieval scheme, experiments have been conducted on a database consisting of 500 images taken from multicolored man-made objects in real world scenes. The results show that high image retrieval accuracy is achieved. Also, robustness is demonstrated against a change in viewing position, partial occlusion, and a substantial amount of object cluttering. Finally, the image retrieval scheme is integrated into the PicToSeek system on-line at http://www.wins.uva.nl/research/isis/PicToSeek/ for searching images on the World Wide Web. © 1999 Elsevier Science B.V. All rights reserved.

Keywords: Content-based image retrieval; Viewpoint-invariant color indexing; Dichromatic reflectance; Reflectance properties; Color models; Color invariants; Pattern-cards; Matching functions; Hashing; Image browser for Internet; World Wide Web

1. Introduction

Over the past few years, a substantial development has been going on in the field of managing large amounts of electronically stored image data, mainly due to the interest in building multimedia information systems (database community) and image database systems (computer vision community). Managing image data in this regard requires storage, retrieval and processing of pictorial entities. The convergence of database and image processing/pattern recognition technology yields the basis for the creation of such digital image archives. Very large digital image archives have been created and used in a number of applications including archives of images of postal stamps, textile patterns, museum objects, trademarks and logos, and views from everyday life as it appears in home videos and consumer photography. Moreover, with the growth and popularity of the World Wide Web, a tremendous amount of visual information has been made publicly accessible. As a consequence, there is a growing demand for search methods retrieving pictorial entities from large image archives. Currently, a large number of text-based search engines are available and they have proven to be very successful in retrieving

documents. To locate pictorial information, these text-based search engines assume that textual descriptions of the visual data are present. However, people are reluctant to verbally categorize visual information, a situation that is very common for images available on the World Wide Web. Moreover, using text as the basis for retrieval is almost always inadequate due to the semantic richness of pictorial information, i.e. the majority of pictorial information in an image cannot be fully captured by text due to the essential limitations in expressive power of language. Often no textual description of the pictorial information is present at all. Hence, in those cases, the capabilities of current text-based search engines for retrieving images are limited. Therefore, in this paper, we consider the retrieval of images for which the textual description of the images has already been exhausted for search or for which no description exists other than the image as pictorial data. We focus on image retrieval by image example, where an example query image is given by the user on input (e.g. [1–3]). A typical application is the problem of retrieving images containing instances of particular objects. Then, the query is specified by an example image taken from the object(s) at hand. The basic idea behind image retrieval by image example is to extract characteristic features from target images which are matched with those of the query. These




features are typically derived from color, shape or texture information of query and target. In this paper, we focus on using color information for the job at hand. A major problem of retrieval by image example occurs when the query and target image containing instances of the same object(s) are recorded under different imaging conditions. In general, images are taken from objects from different viewpoints. When two recordings are made of the same object from different viewpoints, the perceivable shape of the object will be projectively distorted. Also variations in viewpoint yield different shadowing and highlighting cues changing the intensity data fields considerably. Moreover, differences in illumination change the photometric composition of images. Therefore, in this paper, we propose a new set of color features which is:

1. robust to a change in viewpoint;
2. robust to a change in object geometry;
3. robust to a change in the direction of the illumination;
4. robust to a change in the intensity of the illumination.

From the proposed set, various color features are selected to construct color pattern-cards. Color pattern-cards indicate whether a particular color feature value is dominantly present in an image or not. Then, the problem of content-based image retrieval is reduced to the problem to what extent the color pattern-card derived from the query image is similar to the color pattern-card constructed for each image in the image database. In this paper, our aim is to define matching measures expressing difference between color pattern-cards according to the following criteria:

1. robustness to a substantial amount of object occlusion;
2. robustness to a substantial amount of object cluttering;
3. robustness to noise in the images;
4. high discriminative power.

No constraints are imposed on the objects being viewed and the image forming process other than that images should be taken from multicolored objects illuminated by white light. White illumination is not a severe restriction, because it is acceptable for a large variety of applications for which approximately white light can be assumed, lighting can be controlled, or color constancy methods can be applied to the images prior to the actual indexing process.

The paper is organized as follows. In Section 2, related work is discussed. In Section 3, the effect of a change in imaging circumstances is studied for dichromatic reflectance under white light. From the analysis, a new set of robust color models is proposed. Image retrieval based on color pattern-card matching is given in Section 4. Hash tables are formed, in Section 5, to enable fast run-time image retrieval. To test the accuracy of the retrieval scheme, experiments are carried out on a dataset of 500 images in Section 6. Finally, in Section 7, the application of searching images on the World Wide Web is considered.

2. Related work

Color provides powerful information for image retrieval. A simple and effective color indexing scheme is to represent and match images on the basis of color histograms as proposed by Swain and Ballard [4]. The work makes a significant contribution in introducing color for image indexing. Swain and Ballard show that the use of color for image retrieval is to a large degree robust to changes in position and orientation of objects as well as changes in object shapes. Further, attempts have been made to develop general purpose image retrieval systems based on multiple features (e.g. color, shape and texture) describing the image content [5–11]. We implemented the Enigma system [1] retrieving images based on query by example. Matching between query and target images is performed by image template matching based on mathematical morphology operations. The system has been shown effective on visual domains with a high degree of formal structure such as documents of electronic schemas and geographical maps. Also some success was obtained in domains with a weaker formal structure such as the basic anatomic structure of radiographs. IBM research at Almaden implemented the QBIC system [2,12] allowing for queries on large image and video databases on the basis of color, texture and shape features. Fast indexing techniques and similarity functions are proposed. The key idea behind Photobook [3] is semantics-preserving image compression, which reduces images to a small set of perceptually significant coefficients describing the shape, color and texture of the images. In contrast to full content-based image retrieval, Chabot [13] uses a combination of visual appearance and text-based cues to retrieve images. Recently, a number of image browsers have become available for retrieving images from the World Wide Web (e.g. [14–17]). These systems retrieve images on the basis of keywords and/or the image content. A drawback of the above-mentioned systems is, however, that the extracted features used during the retrieval process depend on the geometry of the object, on the viewpoint of the camera and on the accidental illumination conditions. As a result, using any of these systems, problems may occur when the query and target image are recorded under different imaging conditions. Consequently, there is an inevitable need for image search methods which are more robust to the image forming process. To this end, in this paper, we aim to formulate computational methods and data structures for the purpose of content-based image retrieval robust to a substantial change in viewpoint, object geometry and illumination.



3. Color invariant image features

Fig. 1. Body reflection vector $\vec{B}$ and surface reflection vector $\vec{S}$ defined in the RGB-color space.

We first present the dichromatic reflection model under white illumination in Section 3.1. The effect of a change in object orientation, camera viewpoint, and illumination intensity is studied for the dichromatic reflection in Section 3.2. From the analysis, a new set of color invariant models is proposed.

3.1. The reflection model

Consider an image of an infinitesimal surface patch of an inhomogeneous dielectric object. Using the red, green and blue sensors with spectral sensitivities given by $f_R(\lambda)$, $f_G(\lambda)$ and $f_B(\lambda)$, respectively, to obtain an image of the surface patch illuminated by an SPD of the incident light denoted by $e(\lambda)$, the measured sensor values are given by Shafer [18]:

$$C = m_b(\vec{n}, \vec{s}) \int_\lambda f_C(\lambda)\, e(\lambda)\, c_b(\lambda)\, d\lambda + m_s(\vec{n}, \vec{s}, \vec{v}) \int_\lambda f_C(\lambda)\, e(\lambda)\, c_s(\lambda)\, d\lambda \quad (1)$$

for $C = \{R, G, B\}$ giving the $C$th sensor response. Further, $c_b(\lambda)$ and $c_s(\lambda)$ are the surface albedo and Fresnel reflectance, respectively. $\lambda$ denotes the wavelength, $\vec{n}$ is the surface patch normal, $\vec{s}$ is the direction of the illumination source, and $\vec{v}$ is the direction of the viewer. Geometric terms $m_b$ and $m_s$ denote the geometric dependencies on the body and surface reflection component, respectively.

Considering the neutral interface reflection (NIR) model (assuming that $c_s(\lambda)$ has a constant value independent of the wavelength) and white illumination (equal energy density for all wavelengths within the visible spectrum), then $e(\lambda) = e$ and $c_s(\lambda) = c_s$, and hence both are constants. Then, we put forward that the measured sensor values are given by:

$$C_w = e\, m_b(\vec{n}, \vec{s})\, k_C + e\, m_s(\vec{n}, \vec{s}, \vec{v})\, c_s \int_\lambda f_C(\lambda)\, d\lambda \quad (2)$$

for $C_w \in \{R_w, G_w, B_w\}$ giving the red, green and blue sensor response under the assumption of a white light source. Further,

$$k_C = \int_\lambda f_C(\lambda)\, c_b(\lambda)\, d\lambda \quad (3)$$

is the compact formulation depending on the sensors and the surface albedo only. If the integrated white condition holds (as we assume throughout the paper):

$$\int_\lambda f_R(\lambda)\, d\lambda = \int_\lambda f_G(\lambda)\, d\lambda = \int_\lambda f_B(\lambda)\, d\lambda = f \quad (4)$$

we propose that the reflection from inhomogeneous dielectric materials under white illumination is given by:

$$C_w = e\, m_b(\vec{n}, \vec{s})\, k_C + e\, m_s(\vec{n}, \vec{s}, \vec{v})\, c_s\, f \quad (5)$$

In the next section, the effect of a change in imaging circumstances is studied for the given reflection model. From the analysis, a new set of color models is proposed which is robust to a change in viewpoint, surface orientation, illumination direction, illumination intensity, and highlights.
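To make Eq. (5) concrete, the following minimal sketch generates sensor values of a single surface patch under white light: a body-reflection term scaled by the albedo-dependent $k_C$ plus a highlight term that is equal in all three channels. This is an illustrative simulation, not code from the paper; the function name and the numeric values of $k_C$, $e$, $m_b$ and $m_s$ are assumptions.

```python
import numpy as np

def dichromatic_rgb(k, e, m_b, m_s, c_s=0.2, f=1.0):
    """Sensor response C_w = e*m_b*k_C + e*m_s*c_s*f (Eq. (5)).

    k   : the three albedo/sensor terms (k_R, k_G, k_B)
    e   : intensity of the white illuminant
    m_b : geometric body-reflection factor m_b(n, s)
    m_s : geometric surface-reflection factor m_s(n, s, v)
    """
    k = np.asarray(k, dtype=float)
    return e * m_b * k + e * m_s * c_s * f  # the same highlight term is added to R, G and B

# A matte point (m_s = 0) and a highlight point on the same hypothetical surface patch:
k_C = [0.6, 0.3, 0.1]
matte = dichromatic_rgb(k_C, e=1.0, m_b=0.8, m_s=0.0)
shiny = dichromatic_rgb(k_C, e=1.0, m_b=0.8, m_s=0.5)
print(matte, shiny)   # the highlight shifts the color towards the grey axis of RGB space
```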



3.2. Reflectance with white illumination

Consider the body reflection term of Eq. (5):

$$C_b = e\, m_b(\vec{n}, \vec{s})\, k_C \quad (6)$$

for $C_b \in \{R_b, G_b, B_b\}$ giving the red, green and blue sensor response of an infinitesimal matte surface patch under the assumption of white illumination. According to the body reflection model, the color depends only on $k_C$ (i.e. sensors and surface albedo) and the brightness on the factor $e\, m_b(\vec{n}, \vec{s})$. If a matte homogeneously colored surface (i.e. with fixed albedo) contains a variety of surface normals (e.g. a curved surface), then the set of measured colors will generate an elongated color cluster in RGB-color space. In fact, all measured colors of a matte homogeneously colored surface can be represented by the body reflection vector $\vec{B}$ in RGB-color space (see Fig. 1), where the direction of $\vec{B}$ is defined by $k_C$.

Consider the surface reflection term of Eq. (5):

$$C_s = e\, m_s(\vec{n}, \vec{s}, \vec{v})\, c_s\, f \quad (7)$$

for $C_s \in \{R_s, G_s, B_s\}$ giving the red, green and blue sensor response for a shiny infinitesimal surface patch under white illumination. Note that under the given conditions, the color of highlights is not related to the color of the surface on which they appear, but only to the color of the light source. Thus for the white light source, the measured values of a shiny surface can be represented by the surface reflection vector $\vec{S}$ on the grey axis of the RGB-color space (see Fig. 1). The extent of the highlight color cluster depends on the roughness of the object surface.

For a given point on a shiny surface, the contributions of the body reflection component $C_b$ and surface reflection component $C_s$ are added (cf. Eq. (5)). Hence, the observed colors of a uniformly colored (shiny) surface must be inside the triangular color plane in the RGB space spanned by the two reflection components (see Fig. 1). Therefore, any expression defining colors on this triangular plane is a color invariant for the dichromatic reflection model under white illumination. To that end, we propose the following basic set of irreducible color invariants:

$$\frac{C_1 - C_2}{C_3 - C_4} \quad (8)$$

where $C_1, C_2, C_3, C_4 \in \{R, G, B\}$ and $C_3 \neq C_4$, which is a color invariant for the dichromatic reflection model under white illumination as follows from substituting Eq. (5) in Eq. (8):

$$\frac{(e m_b(\vec{n},\vec{s}) k_{C_1} + e m_s(\vec{n},\vec{s},\vec{v}) c_s f) - (e m_b(\vec{n},\vec{s}) k_{C_2} + e m_s(\vec{n},\vec{s},\vec{v}) c_s f)}{(e m_b(\vec{n},\vec{s}) k_{C_3} + e m_s(\vec{n},\vec{s},\vec{v}) c_s f) - (e m_b(\vec{n},\vec{s}) k_{C_4} + e m_s(\vec{n},\vec{s},\vec{v}) c_s f)} = \frac{e m_b(\vec{n},\vec{s}) (k_{C_1} - k_{C_2})}{e m_b(\vec{n},\vec{s}) (k_{C_3} - k_{C_4})} = \frac{k_{C_1} - k_{C_2}}{k_{C_3} - k_{C_4}}$$

only dependent on the sensors and the surface albedo. Any (linear) combination of the basic set of irreducible color invariants will result in a new color invariant. Thus, having RGB as primary colors yielding the basic set of irreducible color invariants $(R - G)/(R - B)$, $(R - B)/(G - B)$, $(G - B)/(R - G)$ in RGB-color space, color invariants can be computed in a systematic manner:

$$L = \frac{\sum_i a_i (R_i - G_i)^p (R_i - B_i)^q (G_i - B_i)^r}{\sum_j b_j (R_j - G_j)^s (R_j - B_j)^t (G_j - B_j)^u} \quad (9)$$

where $p + q + r = s + t + u$, and $p, q, r, s, t, u \in \mathbb{R}$. Further, $i, j \geq 1$ and $a_i, b_j \in \mathbb{R}$.

Lemma 1. Assuming dichromatic reflection and white illumination, $L$ is independent of the viewpoint, surface orientation, illumination direction, illumination intensity, and highlights.

Proof. By substituting Eq. (5) in Eq. (9) we have:

$$\frac{\sum_i a_i (R_{w_i} - G_{w_i})^p (R_{w_i} - B_{w_i})^q (G_{w_i} - B_{w_i})^r}{\sum_j b_j (R_{w_j} - G_{w_j})^s (R_{w_j} - B_{w_j})^t (G_{w_j} - B_{w_j})^u} = \frac{\sum_i a_i (R_{b_i} - G_{b_i})^p (R_{b_i} - B_{b_i})^q (G_{b_i} - B_{b_i})^r}{\sum_j b_j (R_{b_j} - G_{b_j})^s (R_{b_j} - B_{b_j})^t (G_{b_j} - B_{b_j})^u} = \frac{\sum_i a_i (e m_b(\vec{n},\vec{s}))^{p+q+r} (k_{R_i} - k_{G_i})^p (k_{R_i} - k_{B_i})^q (k_{G_i} - k_{B_i})^r}{\sum_j b_j (e m_b(\vec{n},\vec{s}))^{s+t+u} (k_{R_j} - k_{G_j})^s (k_{R_j} - k_{B_j})^t (k_{G_j} - k_{B_j})^u} = \frac{\sum_i a_i (k_{R_i} - k_{G_i})^p (k_{R_i} - k_{B_i})^q (k_{G_i} - k_{B_i})^r}{\sum_j b_j (k_{R_j} - k_{G_j})^s (k_{R_j} - k_{B_j})^t (k_{G_j} - k_{B_j})^u}$$

only dependent on the sensors and the surface albedo, where $p + q + r = s + t + u$, and $p, q, r, s, t, u \in \mathbb{R}$. Further, $i, j \geq 1$ and $a_i, b_j \in \mathbb{R}$, and $C_w = e m_b(\vec{n},\vec{s}) k_C + e m_s(\vec{n},\vec{s},\vec{v}) c_s f$ and $C_b = e m_b(\vec{n},\vec{s}) k_C$. QED.

For instance, for the first order color invariants (i.e. $p + q + r = s + t + u = 1$), we have the set:

$$\left\{ \frac{R - G}{R - B},\; \frac{B - G}{R - B},\; \frac{(R - G) + (B - G)}{R - B},\; \frac{(R - G) + 3(B - G)}{(R - B) + 2(R - G)},\; \ldots \right\}$$

and for the second order color invariants (i.e. $p + q + r = s + t + u = 2$):

$$\left\{ \frac{(R - G)(R - B)}{(R - B)^2},\; \frac{(B - G)(R - B)}{(R - B)^2},\; \frac{(R - G)^2 + (B - G)^2}{(R - B)^2},\; \frac{(R - G)^2 + 3(B - G)^2}{(R - B)^2 + 2(R - G)^2},\; \ldots \right\}$$

and for the third order color invariants:

$$\left\{ \frac{(R - G)^3}{(R - B)^3},\; \frac{(B - G)^3}{(R - B)^3},\; \frac{(R - G)^3 + (B - G)^3}{(R - B)^3},\; \frac{(R - G)(R - B)(G - B) + 3(B - G)^3}{(R - B)^3 + 2(R - G)^3},\; \ldots \right\}$$

etc., where each expression is a color invariant for the dichromatic reflectance under white illumination. We can easily see that hue, given by [19]:

$$H(R, G, B) = \arctan\!\left( \frac{\sqrt{3}\,(G - B)}{(R - G) + (R - B)} \right) \quad (10)$$

ranging over $[0, 2\pi)$, is an instantiation of the first order color invariant of Eq. (9), as a function of arctan(), with $a_1 = \sqrt{3}$, $a_2 = 0$, $b_1 = 1$, $b_2 = 1$. Although any instantiation of $L$ can be taken for the purpose of viewpoint independent image retrieval, in this paper, hue is considered as an instantiation of $L$ because hue is intuitive and well-known in the color literature. In addition to hue, the following second order color invariant has


been selected as an instantiation of $L$ for viewpoint-invariant image retrieval:

$$l_1(R, G, B) = \frac{(R - G)^2}{(R - G)^2 + (R - B)^2 + (G - B)^2} \quad (11)$$

$$l_2(R, G, B) = \frac{(R - B)^2}{(R - G)^2 + (R - B)^2 + (G - B)^2} \quad (12)$$

$$l_3(R, G, B) = \frac{(G - B)^2}{(R - G)^2 + (R - B)^2 + (G - B)^2} \quad (13)$$

which is the set of normalized color differences (ncd), where $0 \leq l_i \leq 1$ and $l_1 + l_2 + l_3 = 1$. In the next section, these instantiations are considered for the purpose of viewpoint-invariant image retrieval.
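To make Eqs. (10)–(13) concrete, the sketch below computes hue and the $l_1 l_2 l_3$ values for one RGB triple and checks numerically that they do not change when the body-reflection brightness or the white highlight term of Eq. (5) varies. This is an illustrative sketch, not code from the paper; the simulated sensor values are assumptions, and the two-argument `atan2` is used so that hue covers $[0, 2\pi)$.

```python
import math

def hue(r, g, b):
    # Eq. (10): H = arctan( sqrt(3)*(G - B) / ((R - G) + (R - B)) ), mapped to [0, 2*pi)
    return math.atan2(math.sqrt(3.0) * (g - b), (r - g) + (r - b)) % (2.0 * math.pi)

def l1l2l3(r, g, b):
    # Eqs. (11)-(13): normalized squared color differences, l1 + l2 + l3 = 1
    d = (r - g) ** 2 + (r - b) ** 2 + (g - b) ** 2
    return ((r - g) ** 2 / d, (r - b) ** 2 / d, (g - b) ** 2 / d)

k = (0.6, 0.3, 0.1)                      # hypothetical k_R, k_G, k_B of one surface patch
for e, m_b, m_s in [(1.0, 0.9, 0.0),     # bright, matte
                    (0.5, 0.4, 0.0),     # dimmer, different geometry
                    (1.0, 0.9, 0.3)]:    # with a white highlight (Eq. (5), c_s*f taken as 1)
    rgb = tuple(e * m_b * kc + e * m_s for kc in k)
    print(round(hue(*rgb), 4), [round(x, 4) for x in l1l2l3(*rgb)])
# All three imaging conditions print the same hue and the same (l1, l2, l3).
```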

4. Viewpoint-invariant image retrieval by color pattern-card matching

This section is organised as follows. In Section 4.1, color pattern-cards are constructed on the basis of hue, $l_1 l_2 l_3$, and hue–hue pairs. Image retrieval based on color pattern-card matching is given in Section 4.2. In Section 4.3, various matching measures are proposed to express difference between color pattern-cards.

4.1. Color pattern-card construction

Histograms are created first in the standard way. Because the color distributions of histograms depend on the scale of the recorded object (e.g. distance object–camera), we define the color pattern-cards as thresholded histograms. In this way, color pattern-cards are scale-independent by indicating whether a particular color model value is substantially present in an image or not.

4.1.1. Hue

First, a hue histogram $H_H(i)$ is constructed by counting the number of times a hue value $H(R_{\vec{x}}, G_{\vec{x}}, B_{\vec{x}})$ is present in an image $I$:

$$H_H(i) = \frac{h\big(H(R_{\vec{x}}, G_{\vec{x}}, B_{\vec{x}}) = i\big)}{N} \quad \text{for } \forall \vec{x} \in I \quad (14)$$

where $h$ indicates the number of times $H(R_{\vec{x}}, G_{\vec{x}}, B_{\vec{x}})$, defined by Eq. (10), equals the value of index $(i)$. $N$ is the total number of image locations. In the following, index $(i)$ is called a hixel (histogram element). By histogramming hue values, pixels from a uniformly colored surface will, in theory, produce equivalent hue values and hence the total accumulation for a particular hixel $(i) \in H_H$ is a measure of the size of a homogeneously colored surface as measured by the camera.

To make the image representation scale-independent, we define the color pattern-card as a thresholded hue histogram specified by

$$P_H(i) = \begin{cases} 1, & \text{if } H_H(i) > t_b \\ 0, & \text{otherwise} \end{cases} \quad (15)$$

where the set of hixels with value 1 represents hue values to be substantially present in the image regardless of their actual amount. $t_b$ is a predefined threshold (i.e. a uniformly colored region should at least cover $t_b$ percent of the total image range). In the following, we denote hixels with value 1 as foreground hixels; they are displayed as black. Hixels with value 0 are said to be background hixels and will be displayed as white. Then the set of foreground hixels for $P_H$ is given by:

$$C_H = \{(i) : P_H(i) = 1\} \quad (16)$$
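A minimal sketch of Eqs. (14)–(16): build the normalized hue histogram of an image, threshold it at $t_b$ percent, and keep the indices of the foreground hixels. The bin count and threshold shown are the values used later in the experiments (q = 16, $t_b$ = 3); the NumPy-based implementation and the function name are illustrative assumptions rather than the paper's code.

```python
import numpy as np

def hue_pattern_card(hue_image, q=16, t_b=3.0):
    """Thresholded hue histogram (Eqs. (14)-(16)).

    hue_image : array of hue values in [0, 2*pi)
    q         : number of hue bins (hixels)
    t_b       : threshold in percent of the total number of pixels
    """
    hues = np.ravel(hue_image)
    hist, _ = np.histogram(hues, bins=q, range=(0.0, 2.0 * np.pi))
    hist = 100.0 * hist / hues.size            # Eq. (14), expressed in percent
    card = (hist > t_b).astype(np.uint8)       # Eq. (15): 1 = foreground hixel
    foreground = set(np.flatnonzero(card))     # Eq. (16): C_H
    return card, foreground
```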

4.1.2. Ncd

A 3D histogram is constructed for ncd:

$$H_{l_1 l_2 l_3}(i, j, k) = \frac{h\big( (l_1(R_{\vec{x}}, G_{\vec{x}}, B_{\vec{x}}) = i) \wedge (l_2(R_{\vec{x}}, G_{\vec{x}}, B_{\vec{x}}) = j) \wedge (l_3(R_{\vec{x}}, G_{\vec{x}}, B_{\vec{x}}) = k) \big)}{N} \quad \text{for } \forall \vec{x} \in I \quad (17)$$

where $\wedge$ denotes the logical AND. Similarly, the color pattern-card for ncd is defined by:

$$P_{l_1 l_2 l_3}(i, j, k) = \begin{cases} 1, & \text{if } H_{l_1 l_2 l_3}(i, j, k) > t_b \\ 0, & \text{otherwise} \end{cases} \quad (18)$$

where $t_b$ is a predefined threshold. The set of foreground hixels is given by:

$$C_{l_1 l_2 l_3} = \{(i, j, k) : P_{l_1 l_2 l_3}(i, j, k) = 1\} \quad (19)$$
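The same thresholding idea carries over to the three-dimensional $l_1 l_2 l_3$ histogram of Eqs. (17)–(19). A compact sketch under the same assumptions as above, using the bin size q = 6 mentioned later for hashing:

```python
import numpy as np

def ncd_pattern_card(l1, l2, l3, q=6, t_b=3.0):
    """Thresholded 3D ncd histogram (Eqs. (17)-(19)); l1, l2, l3 are per-pixel arrays in [0, 1]."""
    sample = np.stack([np.ravel(l1), np.ravel(l2), np.ravel(l3)], axis=1)
    hist, _ = np.histogramdd(sample, bins=(q, q, q), range=[(0, 1)] * 3)
    hist = 100.0 * hist / sample.shape[0]                    # Eq. (17), in percent
    card = (hist > t_b).astype(np.uint8)                     # Eq. (18)
    foreground = {tuple(idx) for idx in np.argwhere(card)}   # Eq. (19): C_l1l2l3
    return card, foreground
```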

4.1.3. Hue–hue edge pairs at local hue edge maxima

To incorporate local spatial color information into the color pattern-card, we consider the pair of hue values at either side of a significant edge found in the hue image. Due to the circular nature of hue, the standard difference operator is not suited for computing the difference between hue values. Therefore, the difference between two hue values $h_1$ and $h_2$, ranging over $[0, 2\pi)$, is defined as follows:

$$d(h_1, h_2) = |h_1 - h_2| \bmod \pi \quad (20)$$

yielding a difference $d(h_1, h_2) \in [0, \pi]$ between $h_1$ and $h_2$. This is a distance as it satisfies the following metric criteria:

1. $d(h_1, h_2) \geq 0$ for all $h_1$ and $h_2$;
2. $d(h_1, h_2) = 0$ if and only if $h_1 = h_2$;
3. $d(h_1, h_2) = d(h_2, h_1)$ for all $h_1$ and $h_2$; and
4. $d(h_1, h_2) + d(h_2, h_3) \geq d(h_1, h_3)$ for all $h_1$, $h_2$ and $h_3$ (triangular inequality).
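A one-line implementation of the circular hue difference of Eq. (20), reused below by the hue edge detector and by the geometry-based matching measures; a sketch that follows the formula as printed:

```python
import math

def hue_distance(h1, h2):
    """Circular difference between two hue values in [0, 2*pi), Eq. (20); result lies in [0, pi]."""
    return abs(h1 - h2) % math.pi
```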

Fig. 2. (a) Recorded color image. (b) Hue edge map M. (c) Observed hue–hue pairs of the original image in the computed color pattern-card with $t_b = 1$, i.e. a hue edge covering at least 1% of the total hue edge area. (d) Observed hue–hue pairs of the original image in the computed color pattern-card with $t_b = 3$.



To find hue edges in images, we use an edge detector of the Sobel type, where the component of the positive gradient

vector in the x-direction is defined as follows:

$$H_x(\vec{x}) = \tfrac{1}{4}\Big( d\big(H(x-1, y-1), H(x+1, y-1)\big) + 2\, d\big(H(x-1, y), H(x+1, y)\big) + d\big(H(x-1, y+1), H(x+1, y+1)\big) \Big) \quad (21)$$

And in the y-direction as:

$$H_y(\vec{x}) = \tfrac{1}{4}\Big( d\big(H(x-1, y-1), H(x-1, y+1)\big) + 2\, d\big(H(x, y-1), H(x, y+1)\big) + d\big(H(x+1, y-1), H(x+1, y+1)\big) \Big) \quad (22)$$

The gradient magnitude is represented by:

$$\|\nabla H(\vec{x})\| = \sqrt{H_x^2(\vec{x}) + H_y^2(\vec{x})} \quad (23)$$

After computing the gradient magnitude, nonmaximum suppression is applied to $\|\nabla H(x, y)\|$ to obtain local maxima in the gradient values [20]:

$$M(\vec{x}) = \begin{cases} \|\nabla H(\vec{x})\|, & \text{if } (\|\nabla H(\vec{x})\| > t_s) \text{ is a local maximum} \\ 0, & \text{otherwise} \end{cases} \quad (24)$$

where $t_s$ is a threshold based on the noise level in the hue image to suppress marginally visible edges. Then, for each local maximum, two neighboring points are computed based on the direction of the gradient to determine the hue value on both sides of the edge:

$$\vec{p}(\vec{x}) = \big( H(\vec{x} - \Delta \vec{n}),\; H(\vec{x} + \Delta \vec{n}) \big) \quad \text{for } \forall \vec{x} \in M \quad (25)$$

thus computed only at the two sides of a maximum. Furthermore, $\vec{n}$ is the normal to the intensity gradient at $\vec{x}$ and $\Delta$ is a preset fraction. A 2D hue–hue pair histogram is constructed in a standard way:

$$H_{HtoH}(i, j) = \frac{h\big(\vec{p}(\vec{x}) = (i, j)\big)}{M} \quad \text{for } \forall \vec{x} \in M \quad (26)$$

where $M$ is the number of local hue edge maxima. By histogramming hue–hue pairs at local hue edge maxima, pixels on a uniformly colored surface will not, in theory, produce hue edges and hence be discarded during histogram formation. However, pixels along the same hue edge will accumulate in the same hixel (histogram element). Hence the total accumulation for a particular hixel $(i, j)$ in $H_{HtoH}$ is a measure of the length of a specific hue edge in the image. The hue–hue pair pattern-card is given by:

$$P_{HtoH}(i, j) = \begin{cases} 1, & \text{if } H_{HtoH}(i, j) > t_b \\ 0, & \text{otherwise} \end{cases} \quad (27)$$

indicating the set of hixels representing dominant hue–hue pairs. Finally, the foreground hixels are specified by:

$$C_{HtoH} = \{(i, j) : P_{HtoH}(i, j) = 1\} \quad (28)$$

indicating the set of hixels representing hue–hue pairs at local hue edge maxima to be substantially present in the image.
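A sketch of the whole hue–hue pair pipeline of Eqs. (21)–(28): Sobel-type gradients built on the circular hue distance, a simple local-maximum test standing in for full nonmaximum suppression, sampling of the two hues on either side of each edge maximum, and thresholding of the resulting 2D histogram. Parameter values mirror those used later ($t_s = 4$, $\Delta = 3$, 8 bins per axis); the simplified 4-neighborhood maximum test and all function names are assumptions, not the paper's implementation.

```python
import numpy as np

def hue_dist(a, b):
    # Eq. (20), applied element-wise
    return np.abs(a - b) % np.pi

def hue_hue_pattern_card(H, t_s=4.0, delta=3, q=8, t_b=3.0):
    """Hue-hue pair pattern-card (Eqs. (21)-(28)) for a 2D hue image H in [0, 2*pi)."""
    H = np.asarray(H, dtype=float)
    rows, cols = H.shape
    Hx = np.zeros_like(H)
    Hy = np.zeros_like(H)
    # Eqs. (21)-(22): Sobel-type gradient components built on the circular hue distance
    Hx[1:-1, 1:-1] = 0.25 * (hue_dist(H[:-2, :-2], H[:-2, 2:])
                             + 2.0 * hue_dist(H[1:-1, :-2], H[1:-1, 2:])
                             + hue_dist(H[2:, :-2], H[2:, 2:]))
    Hy[1:-1, 1:-1] = 0.25 * (hue_dist(H[:-2, :-2], H[2:, :-2])
                             + 2.0 * hue_dist(H[:-2, 1:-1], H[2:, 1:-1])
                             + hue_dist(H[:-2, 2:], H[2:, 2:]))
    mag = np.sqrt(Hx ** 2 + Hy ** 2)                       # Eq. (23)

    hist = np.zeros((q, q))
    n_edges = 0
    for y in range(delta, rows - delta):
        for x in range(delta, cols - delta):
            m = mag[y, x]
            if m <= t_s:
                continue
            # crude stand-in for nonmaximum suppression (Eq. (24)): 4-neighborhood maximum
            if m < max(mag[y - 1, x], mag[y + 1, x], mag[y, x - 1], mag[y, x + 1]):
                continue
            # Eq. (25): hue on both sides of the edge, sampled along the gradient direction
            nx, ny = Hx[y, x] / m, Hy[y, x] / m
            h1 = H[int(round(y - delta * ny)), int(round(x - delta * nx))]
            h2 = H[int(round(y + delta * ny)), int(round(x + delta * nx))]
            i = min(int(h1 / (2 * np.pi) * q), q - 1)
            j = min(int(h2 / (2 * np.pi) * q), q - 1)
            hist[i, j] += 1
            n_edges += 1
    if n_edges:
        hist = 100.0 * hist / n_edges                      # Eq. (26), in percent
    card = (hist > t_b).astype(np.uint8)                   # Eq. (27)
    return card, {tuple(ij) for ij in np.argwhere(card)}   # Eq. (28): C_HtoH
```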

To illustrate the construction of $P_{HtoH}$ (cf. Eq. (27)) in practice, the color image shown in Fig. 2(a) is considered. The image is composed of an oven cloth of textile material against a white background. The oven cloth consists of five square areas of distinct color. The image is contaminated by a substantial amount of noise, minor surface orientation changes and shadows. In Fig. 2(b), the hue edge map M (cf. Eq. (24)) is shown, computed by the hue edge algorithm with nonmaximum suppression. Good performance is shown, where computed edges are not affected by the disturbing influences of surface orientation change and shadows. As one can see, the edge map is zero except where two regions of homogeneous color meet, resulting in five hue borders. Pixels along these borders will generate four observed clusters (note that the outermost object–background region outline is left out) of hue–hue pairs accumulating in the histogram $H_{HtoH}$ (not presented here). Fig. 2(c,d) shows the color pattern-cards as a result of thresholding $H_{HtoH}$ by $t_b = 1$ and $t_b = 3$, respectively. The color pattern-cards of hue–hue pairs are graphically represented on the basis of two hue axes $h_1$ and $h_2$ with 64 bins. Due to image noise, observed hue–hue pairs along the same hue edge are smeared out in the color pattern-card domain, yielding four blob-like regions (see Fig. 2(c,d)), where the pattern-card of (c) is more affected by noise (with only one clear outlier in the middle) than the one shown in (d) as a result of thresholding with different values. Note that $C_{HtoH}$ is symmetric around the diagonal, yielding a unique hue–hue pair characterization.



Fig. 3. (a) First recorded color image. (b) Observed hue–hue pairs of the first recording in the computed color pattern-card with $t_b = 3$. (c) Second recorded color image. (d) Observed hue–hue pairs of the second recording in the computed color pattern-card with $t_b = 3$. Note that despite photometric differences between the two images, their color pattern-cards are fairly the same.

To illustrate the robustness of hue–hue pair pattern-cards to the varying circumstances induced by the imaging process, the same 3D multicolored object is recorded twice with distinct position and orientation with respect to the camera (see Fig. 3). The recordings differ with respect to object scale, object orientation and camera viewpoint as well as with respect to shadowing, shading and highlighting cues. The pattern-cards are shown in Fig. 3(b) for image (a) and (d) for image (c). As one can see, the pattern-cards are fairly stable (except for a few outliers due to noise), discounting the disturbing influence of the image forming process. From the observations above, it can be concluded that each object in view will be summarized and characterized by an object-specific pattern-card robust to a change in viewing position, object geometry and illumination. In general, an object consisting of a large variety of colors will yield a highly object-specific pattern-card as opposed to an object consisting of only a few distinct colors.

4.2. Image retrieval

Let the image database consist of a set $\{I_k\}_{k=1}^{N_b}$ of $N_b$ color images. For each $I_k$, $C^{I_k}$ is created. $C^Q$ is created in a similar way from the query image. Then image retrieval is as follows. For each $C^{I_k}$, $k = 1, \ldots, N_b$, matching function $D$ compares $C^{I_k}$ with $C^Q$ to return a numerical measure $D(C^Q, C^{I_k})$ of difference. Then, the ordered list of images is displayed for viewing by ranking $R = \{I_k : D(C^Q, C^{I_k})\}$ according to matching measure $D$. In the next section, several alternatives for matching function $D$ are defined.

4.3. Matching measure D()

In practice, due to imperfections and noise (e.g. caused by surface imperfections, sensor noise, camera color clipping and blooming, and chromatic aberration in the camera lens), foreground hixels will be smeared out in the pattern-card

domain (see Figs. 2 and 3 for examples). Consequently, P will consist of point- or blob-like hixel clusters of small extent which are non-uniformly scattered. In addition, object clutter will introduce accidental foreground hixels due to surrounding objects. Moreover, partial occlusion of the object itself will discard a number of foreground hixels from the pattern-card. Therefore, matching functions are required which are robust to noise, and robust to a substantial amount of object occlusion and cluttering.

Let the set minus operator be given by $C^Q \setminus C^{I_k} = \{\vec{i} \in C^Q : \vec{i} \notin C^{I_k}\}$, where $\vec{i}$ is $(i)$ for hue, $(i, j)$ for hue–hue pairs, and $(i, j, k)$ for $l_1 l_2 l_3$. Then, matching functions can be characterized by two types of errors: false positives $C^Q \setminus C^{I_k}$ and false negatives $C^{I_k} \setminus C^Q$.

4.3.1. Statistic-based matching measures

The first function is based on the difference between the number of foreground hixels in $C^Q$ and $C^{I_k}$:

$$D_m(C^Q, C^{I_k}) = \frac{|h(C^Q) - h(C^{I_k})|}{M} \quad (29)$$

where $h$ denotes the number of foreground hixels and $M$ the total number of hixels in P. A matching function expressing the number of corresponding hixels (logical AND) is given by:

$$D_{and}(C^Q, C^{I_k}) = 1 - \frac{h(C^Q \cap C^{I_k})}{M} \quad (30)$$

where $D_{and}$ is sensitive to false positives but not to false negatives. The overall misclassification error (logical XOR) is given by:

$$D_{xor}(C^Q, C^{I_k}) = \frac{h\big((C^Q \setminus C^{I_k}) \cup (C^{I_k} \setminus C^Q)\big)}{M} \quad (31)$$

where $D_{xor}$ is symmetric in $C^Q$ and $C^{I_k}$ and hence sensitive to both false positives and false negatives. An important property of $D_{xor}$ is that it is a metric.

These functions measure difference between color pattern-cards based on the number of misclassified hixels, regardless of the distance of each misclassified hixel to the nearest correct set. An advantage of giving each hixel an equal weight in the outcome is that a small number of hixel



outliers (e.g. background noise), of false positive type for $D_{and}$ and of both types for $D_{xor}$, not affecting the overall shape, still yields high similarity. Outliers occur when color feature values are introduced due to sensing and measurement error, yielding isolated hixels in the pattern-card domain (see Figs. 2 and 3 for examples). A disadvantage of giving each hixel an equal weight in the outcome is that a small displacement of a large number of hixels (i.e. minor global distortions) results in a low similarity. Minor global distortions occur when color feature values change uniformly in value, for example due to color clipping and blooming, yielding a uniform shift of a large number of hixels in the color pattern-card domain. In the next section, matching functions are discussed which take into account the distance of each misclassified hixel to the nearest correct set.
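A minimal sketch of the statistic-based measures of Eqs. (29)–(31) on boolean pattern-card arrays, assuming NumPy arrays of equal shape; an illustration rather than the paper's implementation:

```python
import numpy as np

def d_m(card_q, card_t):
    """Eq. (29): difference in the number of foreground hixels."""
    return abs(int(card_q.sum()) - int(card_t.sum())) / card_q.size

def d_and(card_q, card_t):
    """Eq. (30): one minus the fraction of corresponding foreground hixels."""
    return 1.0 - np.logical_and(card_q, card_t).sum() / card_q.size

def d_xor(card_q, card_t):
    """Eq. (31): overall misclassification error (false positives plus false negatives)."""
    return np.logical_xor(card_q, card_t).sum() / card_q.size

# Usage on two small binary pattern-cards of the same shape:
q = np.array([[1, 0], [1, 1]], dtype=bool)
t = np.array([[1, 1], [0, 1]], dtype=bool)
print(d_m(q, t), d_and(q, t), d_xor(q, t))   # 0.0, 0.5, 0.5
```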

4.3.2. Geometry-based matching measures

For $P_H$, hixels represent whether a hue value is substantially present in an image or not. Due to the circular nature of hue, we define the distance between two hixels $i$ and $j$ by:

$$\rho_H(i, j) = d(i, j) \quad (32)$$

where $i, j \in [0, 2\pi)$ and d() is given by Eq. (20). Further, for $P_{HtoH}$, the distance between two pairs of hixels $(i, j)$ and $(k, l)$ is given by:

$$\rho_{HtoH}\big((i, j), (k, l)\big) = \sqrt{d(i, k)^2 + d(j, l)^2} \quad (33)$$

where $i, j, k, l \in [0, 2\pi)$ and d() is given by Eq. (20). Let $E(\vec{a}, C^{I_k})$ denote the shortest distance from $\vec{a} \in C^Q$ to $C^{I_k}$:

$$E(\vec{a}, C^{I_k}) = \min_{\vec{b} \in C^{I_k}} \rho(\vec{a}, \vec{b}) \quad (34)$$

where $\rho()$ is the standard Euclidean distance for $C_{l_1 l_2 l_3}$, the circular distance $\rho_H()$ for $C_H$, and the circular distance of index pairs $\rho_{HtoH}()$ for $C_{HtoH}$. The geometry-based matching functions used to compute difference are as follows. The first one is the root mean square error distance, mathematically specified as:

$$D_{rms}(C^Q, C^{I_k}) = \sqrt{\frac{1}{h(C^Q)} \sum_{\vec{a} \in C^Q} E(\vec{a}, C^{I_k})^2} \quad (35)$$

The second one is the maximum error distance:

$$D_{max}(C^Q, C^{I_k}) = \max_{\vec{a} \in C^Q} E(\vec{a}, C^{I_k}) \quad (36)$$

Furthermore, the Hausdorff distance is defined by:

$$D_h(C^Q, C^{I_k}) = \max\Big\{ \max_{\vec{a} \in C^Q} E(\vec{a}, C^{I_k}),\; \max_{\vec{b} \in C^{I_k}} E(\vec{b}, C^Q) \Big\} \quad (37)$$

i.e. it denotes the maximum distance from a point in one set to the nearest point in the other set. An important property of $D_h()$ is that it is a metric. Except for $D_h()$, the above defined matching functions are all sensitive to false positives and insensitive to false negatives. But, as opposed to the matching functions discussed in Section 4.3.1, these functions are less sensitive to small global distortions such as small displacements of a large number of hixels. However, the matching functions are more sensitive to outliers (e.g. background noise) of false positive type.

5. Hashing

For the purpose of efficient image retrieval, color pattern-cards can be used as keys to index images in a hash table. To that end, in this section, we propose the k nearest neighbor hash table.

5.1. k nearest neighbor hash table

Let $f(P) \rightarrow \mathbb{N}^+$ be defined as an indexing function for color pattern-card P resulting in a positive natural number. As stated, P is a binary array of hixels and hence the total number of possible configurations is $2^B$, where B is the total number of hixels in P. The total number of hixels depends on the number of dimensions and the bin size. As will be shown in Section 6.3, the number of bins q is of little influence on the retrieval accuracy as long as the number of bins is over q = 16. Consequently, for the hue-based pattern-card $P_H$ with bin size q = 16 and 1 dimension we obtain $f_H(P_H) \rightarrow \{1, \ldots, 2^{16}\}$. Furthermore, for the hue–hue pair pattern-card $P_{HtoH}$ with bin size q = 8 and 2 separated dimensions we obtain indexing function $f_{HtoH}(P_{HtoH}) \rightarrow \{1, \ldots, 2^{16}\}$. Finally, for $P_{l_1 l_2 l_3}$ with bin size q = 6 and 3 separated dimensions we obtain $f_{l_1 l_2 l_3}(P_{l_1 l_2 l_3}) \rightarrow \{1, \ldots, 2^{18}\}$.

5.1.1. Storage complexity

Hash tables are formed where each image is indexed according to its key. For each address, images in the database are ordered with respect to matching measure D(). To reduce the storage complexity, only the k nearest neighbor images are stored. Then the total storage complexity is $2^B \times k$ entries. We take k = 100 throughout the paper, resulting in a k nearest neighbor hash table with $2^{16} \times 100$ entries for $f_H()$ and $f_{HtoH}()$, and $2^{18} \times 100$ for $f_{l_1 l_2 l_3}()$, which can easily fit in the main memory of a standard PC.

5.1.2. Run-time image retrieval complexity

During run-time image retrieval, color pattern-card $P^Q$ is computed for the query image and the indexing function $f(P^Q)$ computes the address to retrieve the k most similar images with respect to matching measure D(). Because the k most similar images (i.e. nearest neighbors) have already been precomputed off-line and assuming that the k nearest neighbor hash table fits into main memory, the image retrieval scheme is constant time, very efficient and independent of the number of images in the image database. We measured run-time image retrieval on a database of 100,000 images, based on k nearest neighbor hashing (with k = 100), on a standard SPARCstation 5 with 110 MHz and 64M main memory, to be 0.67 s.
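A compact sketch of the k nearest neighbor hash table: the indexing function packs the binary pattern-card into an integer address, and for every occupied address the k best-matching database images are precomputed off-line so that a query is answered by a single lookup. Note that the paper packs each axis of the 2D and 3D pattern-cards separately; the sketch below simply packs the whole binary array, and the use of a Python dict and of $D_{xor}$ for the off-line ordering are illustrative assumptions.

```python
import numpy as np

def index_of(card):
    """f(P): pack the binary pattern-card into one integer address (big-endian bit order assumed)."""
    addr = 0
    for b in np.ravel(card).astype(np.uint8):
        addr = (addr << 1) | int(b)
    return addr

def build_hash_table(db_cards, d=None, k=100):
    """Off-line: for every occupied address, store the k database images closest under D()."""
    if d is None:
        d = lambda a, b: np.logical_xor(a, b).sum() / a.size   # D_xor, Eq. (31)
    table = {}
    for name, card in db_cards.items():
        addr = index_of(card)
        ranked = sorted(db_cards, key=lambda m: d(card, db_cards[m]))
        table[addr] = ranked[:k]
    return table

def retrieve(query_card, table):
    """Run-time: one address computation and one lookup, independent of the database size."""
    return table.get(index_of(query_card), [])
```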



Fig. 4. Left: Various images which are included in the image database of 500 images. The images are representative for the images in the database. Right: Corresponding images from the query set.

6. Experiments

To evaluate retrieval accuracy for the proposed content-based image retrieval scheme, the following issues will be addressed: high discriminative power; robustness to a change in viewpoint; robustness to a change in object orientation; robustness to object occlusion; robustness to object cluttering; robustness to noise in the images and model deviations in the object. The data sets and evaluation measures are given in Sections 6.1 and 6.2. Color pattern-card formation is given in Section 6.3.

6.1. Datasets

The database consists of $N_1 = 500$ images of domestic objects, tools, toys, food cans, art artifacts etc., all taken from two households. Objects were recorded in isolation, one per image, with the aid of the SONY XC-003P CCD color camera and the Matrox Magic Color frame grabber. The digitization was done in 8 bits per color. Objects were recorded against a white cardboard background. Two light sources of average daylight color were used to illuminate the objects in the scene. Objects were recorded at a pace of a few shots a minute. There was no attempt to individually control focus or illumination. The recordings show a considerable amount of noise, shadows, shading, specularities and self occlusion. As a result, recordings are best characterized as of snapshot quality, a good representation of views from everyday life as it appears in home video, the news, and consumer photography in general.

A second, independent set (the query set) of recordings was made of randomly chosen objects already in the database. These objects, $N_2 = 70$ in number, were recorded again one per image with a new, arbitrary position and orientation with respect to the camera, some recorded upside down, some rotated, some at different distances. In Fig. 4, various images from the image database of 500 images are shown on the left, whereas various images coming from the query set are shown on the right.

In the experiments, all pixels in a color image are discarded with a local intensity and saturation smaller than 5% of the total range (this number was empirically determined by visual inspection); otherwise the calculation of hue, hue–hue pairs and $l_1 l_2 l_3$ becomes unstable. Consequently, the white cardboard background as well as the grey, white, dark or nearly colorless parts of objects as recorded in the color image will not be considered in the matching process. Further, we set $\Delta = 3$ and $t_s = 4$ (cf. Eqs. (24) and (25)) in our experiments. These numbers were arrived at through experimentation by adjusting the parameter values and selecting the appropriate values by visual inspection. This has proved to be effective on our test images.

Fig. 5. The discriminative power of histogram matching by histogram intersection differentiated for the various color features, plotted against the ranking j. The cumulative percentile x for $l_1 l_2 l_3$, H, HtoH, rgb, S and RGB is given by $x_{l_1 l_2 l_3}$, $x_H$, $x_{HtoH}$, $x_{rgb}$, $x_S$ and $x_{RGB}$, respectively.



Fig. 6. The discriminative power of the pattern-card matching process differentiated for the various matching functions, plotted against the ranking j, with $N_1 = 500$ and $N_2 = 70$. The average percentile x for $D_{xor}$, $D_{and}$, $D_{rms}$, $D_h$, $D_{max}$, $D_m$ is denoted by $x_{D_{xor}}$, $x_{D_{and}}$, $x_{D_{rms}}$, $x_{D_h}$, $x_{D_{max}}$, $x_{D_m}$, respectively.

6.2. Error measures

For a measure of match quality, let rank $r^{Q_i}$ denote the position of the correct match for query image $Q_i$, $i = 1, \ldots, N_2$, in the ordered list of $N_1$ match values. The rank $r^{Q_i}$ ranges from $r = 1$ for a perfect match to $r = N_1$ for the worst possible match. Then, for one experiment, the average ranking percentile is defined by:

$$\bar{r} = \left( \frac{1}{N_2} \sum_{i=1}^{N_2} \frac{N_1 - r^{Q_i}}{N_1 - 1} \right) 100\% \quad (38)$$

The cumulative percentile of query images producing a rank smaller than or equal to $j$ is defined as:

$$x(j) = \left( \frac{1}{N_2} \sum_{k=1}^{j} h(r^{Q_i} = k) \right) 100\% \quad (39)$$

where $h$ reads as the number of query images having rank $k$.

6.3. Color pattern-card formation

The color pattern-card axes are partitioned uniformly. The resolution of the axes should be a compromise between retrieval accuracy and computational efficiency. We determine the appropriate bin size for our application empirically by varying the number of bins on the axes over

$q \in \{2, 4, 8, 16, 32, 64, 128, 256\}$. The results show (not presented here) that the number of bins was of little influence on the retrieval accuracy when the number of bins ranges from q = 16 and up. Therefore, the pattern-card bin size for each axis used during histogram formation is q = 16 in the sequel.

Also an appropriate value for $t_b$ is to be determined during the construction of the pattern-card C as defined in Section 4.1, where we considered the total accumulation for a particular hixel not substantial when the total accumulation is below $t_b$. Noise will introduce false positive and false negative errors affecting the performance of the matching functions. Because noise is application dependent, we determine the appropriate value of $t_b$ for our application by varying $t_b$ with q = 16 over $t_b \in \{1, 2, 3, 4, 6, 8\}$ and have chosen $t_b = 3$ (i.e. covering at least 3% of the total image area), which produced the highest discriminative power averaged over all matching functions.

6.4. Discriminative power differentiated for the various color models

In this section, we report on the recognition accuracy of the matching process for $N_2 = 70$ test images and $N_1 = 500$ reference images for the various color features. As stated, white lighting is used during the recording of the reference images in the image database and the independent test set. However, the objects were recorded with a new, arbitrary position and orientation with respect to the camera. For comparison reasons in the literature, we have also constructed color feature spaces for RGB and the following standard color features derived from RGB: normalized colors (color invariant for matte objects [21]):

$$r(R, G, B) = \frac{R}{R + G + B}, \quad g(R, G, B) = \frac{G}{R + G + B}, \quad b(R, G, B) = \frac{B}{R + G + B} \quad (40)$$

and saturation (color invariant for matte objects [21]):

$$S(R, G, B) = 1 - \frac{\min(R, G, B)}{R + G + B} \quad (41)$$
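For reference, a small sketch of the comparison features of Eqs. (40)–(41) and of Swain–Ballard histogram intersection [4], which is the matching rule used for the comparison in this subsection; this is a generic sketch (including the choice of normalization term), not the exact implementation used in the experiments:

```python
import numpy as np

def normalized_rgb(r, g, b):
    """Eq. (40): rgb chromaticities, invariant for matte objects."""
    s = r + g + b
    return r / s, g / s, b / s

def saturation(r, g, b):
    """Eq. (41): saturation, invariant for matte objects."""
    return 1.0 - min(r, g, b) / (r + g + b)

def histogram_intersection(h_query, h_target):
    """Swain-Ballard style match score: sum of bin-wise minima, normalized by the query mass [4]."""
    h_query = np.asarray(h_query, dtype=float)
    h_target = np.asarray(h_target, dtype=float)
    return np.minimum(h_query, h_target).sum() / h_query.sum()
```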

For comparison reasons in the literature, in this subsection, matching is based on histogram intersection [4]. In Fig. 5, the accumulated ranking percentile is shown for the various color features. From the results of Fig. 5 we can observe that the discriminative power of $l_1 l_2 l_3$, H and hue–hue pairs, followed by rgb, is higher than that of the other color features.

Fig. 7. Two of the 10 images generating together 8 images by blanking out $o \in \{50, 65, 80, 90\}$ percent of the total object area.



As expected, RGB has the worst discriminative power due to its sensitivity to varying imaging conditions. In the next section, hue–hue pairs are taken to study the discriminative power differentiated for the various matching measures under a change of viewpoint, object occlusion and object cluttering.

6.5. Discriminative power differentiated for the various matching measures

In this subsection, we report on the image retrieval accuracy of the matching process based on hue–hue pairs for $N_2 = 70$ query images and $N_1 = 500$ target images on the basis of various matching functions. The discriminative power differentiated for the various matching functions is shown in Fig. 6, showing the accumulated average percentile x for $j \leq 10$. For $x_{D_{xor}}$, 96% of the images have rank 1. In other words, with a probability of 96 perfect matches out of 100, very high retrieval accuracy is achieved. This is due to the fact that $D_{xor}$ is symmetric and hence sensitive to both false positive and false negative error types, and relatively insensitive to outliers. Furthermore, slightly worse retrieval accuracy is provided by $D_{and}$ and $D_{rms}$, for which 92% and 89% of the correct matches are within the first 10 images. All other matching functions produce worse retrieval accuracy.

As stated in Section 5, during off-line indexing, hash tables have been created where each image in the image database is indexed according to its indexing function $f(P) \rightarrow \mathbb{N}^+$. During run-time image retrieval, color pattern-card P for the query image is generated. Then, indexing function f(P) is computed, yielding the address to retrieve the k = 100 most similar images ordered with respect to matching measure D(). Because the k most similar images (i.e. nearest neighbors) have already been precomputed off-line, the image retrieval scheme is executed in constant time, i.e. independent of the number of images in the image database. We have measured run-time image retrieval, based on k nearest neighbor hashing on a standard SPARCstation 5 with 110 MHz and 64M main memory, to be 0.67 s.

6.6. Degradation of discriminative power due to occlusion and change in viewpoint

To test the effect of occlusion on the color invariant matching process, 10 objects, already in the database of 500 recordings, were randomly selected and in total 40 images were generated by blanking out $o \in \{50, 65, 80, 90\}$ percent of the total object area (see Fig. 7).

Fig. 8. The discriminative power of the pattern-card matching process differentiated for the various matching functions, plotted against the percentage of the object area blanked out, o, with $N_1 = 500$ and $N_2 = 10$.

Note that white as recorded in the color image will not be considered in the matching process. The average ranking percentile $\bar{r}$ with $N_1 = 500$ and $N_2 = 10$ is shown in Fig. 8. From the results we see that, in general, the shape and decrease of the curves for the different matching functions do not differ significantly, apart from their retrieval accuracy. This means that the effect of occlusion (i.e. blanking out object area) is largely the same for all matching functions: namely a gradual decrease in retrieval accuracy beyond 60% blanking.

To test the effect of a change in viewpoint, the 10 objects were put orthographically in front of the camera and in total 40 recordings were made by varying the angle between the camera and the object's surface normal over $s = \{45, 60, 75, 80\}$ degrees (see Fig. 9). The average ranking percentile with $N_1 = 500$ and $N_2 = 10$ is shown in Fig. 10. Looking at the results, the rate of decrease in retrieval accuracy is almost negligible for $s < 75°$. This means that pattern-card matching based on the proposed color invariants is highly robust to a change in viewpoint up to 75° of the object with respect to the camera.

6.7. Degradation of discriminative power in the presence of object clutter

Another important claim is that image retrieval is fairly insensitive to object clutter. To test the effect of object cluttering, a small and preliminary experiment has been conducted. Thirty images have been recorded from cluttered scenes. Each cluttered scene contained a set of six different multicolored objects.

Fig. 9. Two of the 10 images generating together 8 images by varying the angle between the camera and the object's surface normal over $s = \{45, 60, 75, 80\}$ degrees.



Fig. 10. The discriminative power of the pattern-card matching process differentiated for the various matching functions, plotted against the angle of rotation s, with $N_1 = 500$ and $N_2 = 10$.

Then, 10 objects were randomly selected which participated in exactly one of the cluttered scenes. These objects were recorded in isolation against a white background, yielding the query set. The query set of $N_2 = 10$ was matched against the database of $N_1 = 30$ images. Although the data set is small, with $N_1 = 30$ and $N_2 = 10$, some tentative results can be observed (see Fig. 11), showing the accumulated average ranking percentile for the various matching functions. As can be expected, $D_{xor}$ provides poor retrieval accuracy in the presence of clutter. This is because $D_{xor}$ is sensitive to false negatives. False negatives are introduced by hue–hue pairs coming from objects surrounding the object described by the query image. In fact, all matching functions which are sensitive to false negatives, $D_{xor}$ and $D_h$, provide poor retrieval accuracy in the presence of clutter. From the results we see that $D_{rms}$ and $D_{and}$ provide high image retrieval accuracy in the presence of object clutter, as $D_{rms}$ and $D_{and}$ are insensitive to false negatives.

6.8. Conclusion on matching measures

We have studied different matching functions for color pattern-card matching. Excellent performance is shown for

Fig. 11. The discriminative power of the pattern-card matching process differentiated for the various matching functions, plotted against the ranking j, with $N_1 = 30$ and $N_2 = 10$.

the XOR matching function $D_{xor}$, where 96% of the correct matches are within the first 1% of the rankings in a database of 500 different objects. This is due to the fact that $D_{xor}$ is symmetric and can be interpreted as the number of pixels with the same color model value in the query image which can be found present in the retrieved image and vice versa. This is a desirable property when one object per image is recorded without any object clutter. Furthermore, experimental results show that color pattern-card matching is robust to a change in viewpoint up to 75° of the object with respect to the camera. The effect of occlusion only gradually sets in beyond 60% blanking for all matching functions. In the presence of object clutter in the scene, tentative results reveal that high image retrieval accuracy is provided by the quadratic distance function $D_{rms}$. The overall conclusion is then that, for accurate image retrieval, $D_{xor}$ is most appropriate when there is no object clutter, while $D_{rms}$ yields the best image retrieval accuracy in the presence of object clutter.

7. PicToSeek: an image retrieval system for the World Wide Web

An important application is the content-based retrieval of images from the World Wide Web. To this end, the image retrieval scheme is incorporated into the content-based image browser PicToSeek [22] for searching pictorial information on the World Wide Web. PicToSeek collects images on the Web by means of autonomous Web-crawlers. Then, the collected images are automatically cataloged by image analysis methods into various image styles and types: JFIF-GIF, grey-color, photograph-synthetic, size, date of creation, and color depth. After cataloging images, the color invariant image features are extracted from the images and stored in the k nearest neighbor hash tables as defined in Section 5. When images are automatically collected, indexed and cataloged, PicToSeek allows for fast on-line image search.

To illustrate the query capability of the system, the typical application is considered of retrieving images containing an instance of a given object. To that end, the query is specified by an example image taken from the object at hand. A typical query specification is shown in Fig. 12, where images are taken from the image database used in the experiments. PicToSeek enables the user to select and display the image of the object at hand by a URL-address. At run time, the user specifies the preferred invariance. Then, the required color invariants are extracted from the query and matched with those of the target images in the database. After matching, images are ordered with respect to the query according to their matching measure and displayed in the retrieval unit one by one through image browsing or as an ordered set according to the user preferences. The Web-crawler and graphical user interface of PicToSeek have been implemented in Java.



Fig. 12. Overview of the system.

Image analysis and feature extraction methods have been implemented in C++. A database is used to store the images and the indexes. The server runs on a SPARCstation 5 with 110 MHz. PicToSeek can be experienced on-line at: http://www.wins.uva.nl/research/isis/zomax/.

8. Conclusion

In this paper, a new set of color models has been proposed for the purpose of content-based image retrieval robust to a large change in viewpoint, object geometry and illumination. From the proposed set, various color models have been selected to construct color pattern-cards for each image. Matching measures have been defined, expressing similarity between color pattern-cards, robust to a substantial amount of object occlusion and cluttering. Based on the theoretical and experimental results, it is concluded that high image retrieval accuracy is achieved by $l_1 l_2 l_3$, H and hue–hue pairs. RGB has the worst performance due to its sensitivity to varying imaging conditions. Also, robustness is demonstrated against a change in viewing position, partial occlusion, and a substantial amount of object cluttering. Furthermore, the pattern-card matching

process can be executed at very high speed, independent of the number of images in the image database. Finally, the image retrieval scheme has been integrated into the PicToSeek image browser for searching images on the World Wide Web. No constraints are imposed on the objects being viewed and the image forming process other than that images should be taken from multicolored objects illuminated by white light. White illumination is not a severe restriction, because white illumination is acceptable for a large variety of applications.

References

[1] T. Gevers, A.W.M. Smeulders, Enigma: an image retrieval system, in: Proceedings of 11th International Conference on Pattern Recognition, The Hague, The Netherlands, 1992, 697–700.
[2] W. Niblack, R. Barber, W. Equitz, M. Flickner, E. Glasman, D. Petkovic, P. Yanker, The QBIC Project: querying images by content using color, texture, and shape, in: Proceedings of Storage and Retrieval for Image and Video Databases, SPIE, 1993.
[3] A. Pentland, R.W. Picard, S. Sclaroff, Photobook: tools for content-based manipulation of image databases, International Journal of Computer Vision 18 (3) (1996) 233–254.
[4] M.J. Swain, D.H. Ballard, Color indexing, International Journal of Computer Vision 7 (1) (1991) 11–32.
[5] W. Grosky, R. Mehrotra, Image Database Management (special issue), Computer 22 (12) (1989).
[6] IFIP, Visual Database Systems I and II, Elsevier Science Publishers, North-Holland, 1989 and 1992.
[7] A.W.M. Smeulders, R. Jain (Eds.), Image Databases and Multi-Media Search, Series on Software Engineering and Knowledge Engineering, Vol. 8, World Scientific, Singapore, 1997.
[8] R. Jain, NSF Workshop on Visual Information Management Systems, SIGMOD Record 22 (3) (1993) 57–75.
[9] W. Niblack, R. Jain (Eds.), Proceedings of Storage and Retrieval for Image and Video Databases I, II and III, SPIE, Bellingham, 1993, 1994 and 1995.
[10] Proceedings of Visual Information Systems: The First International Conference on Visual Information Systems, Melbourne, Victoria, Australia, 1996.
[11] Proceedings of Visual Information Systems: The Second International Conference on Visual Information Systems, San Diego, CA, 1997.
[12] M. Flickner et al., Query by image and video content: the QBIC system, Computer 28 (9) (1995) 23–33.
[13] V.E. Ogle, M. Stonebraker, Chabot: retrieval from a relational database of images, Computer 28 (9) (1995) 40–49.
[14] S. Sclaroff, L. Taycher, M. La Cascia, ImageRover: a content-based image browser for the World Wide Web, in: Proceedings of IEEE Workshop on Content-Based Access of Image and Video Libraries, CVPR, 1997.
[15] C. Frankel, M. Swain, V. Athitsos, Webseer: An Image Search Engine for the World Wide Web, TR-95-010, Boston University, Boston, 1995.
[16] J.R. Smith, S.-F. Chang, VisualSEEK: a fully automated content-based image query system, in: Proceedings of ACM Multimedia, 1996.
[17] A. Gupta, Visual Information Retrieval Technology: A Virage Perspective, TR 3A, Virage, 1996.
[18] S.A. Shafer, Using color to separate reflection components, COLOR Res. Appl. 10 (4) (1985) 210–218.
[19] H. Levkowitz, G.T. Herman, GLHS: a generalized lightness, hue, and saturation color model, CVGIP: Graphical Models and Image Processing 55 (4) (1993) 271–285.
[20] J. Canny, A computational approach to edge detection, IEEE Transactions on Pattern Analysis and Machine Intelligence 8 (6) (1986) 679–698.
[21] T. Gevers, Color Image Invariant Segmentation and Retrieval, PhD thesis, University of Amsterdam, The Netherlands, 1996.
[22] T. Gevers, A.W.M. Smeulders, PicToSeek: a content-based image search engine for the World Wide Web, in: Proceedings of Visual Information Systems, San Diego, CA, 1997, 93–100.