Displays 29 (2008) 451–457
Perceived image similarity and quantization resolution

Hock Chuan Chan
National University of Singapore, 3 Science Drive 2, Singapore 117543, Singapore
Article history: Received 9 November 2007; accepted 3 March 2008; available online 12 March 2008.

Keywords: Content-based image retrieval; Color histogram; Color quantization; Resolution; Image similarity
Abstract

Color quantization is a key step in content-based image retrieval based on color histograms and is critical to retrieval performance. An important factor in color quantization is the quantization resolution. It is important to evaluate empirically how resolution levels affect human perception of image similarity. A laboratory study was conducted to analyze the effect of different resolution levels on human judgment of image similarity. The results show that the impact of quantization resolution on perceived image similarity is not linear; in fact, a logarithmic relationship fits the data very well. Furthermore, there is a surprising result: the objective measure of colorloss can predict perceived image similarity quite well. The study provides accurate data for content-based image retrieval researchers to decide on the tradeoff between processing speed (which is affected by the choice of resolution level) and perceived image similarity.
1. Introduction

Advances in hardware technology and retrieval methods have contributed to an increase in the use of digital images. Many image retrieval systems have been developed [16,26,46,55]. A common approach is content-based image retrieval, which captures visual features automatically or semi-automatically so as to create image annotations and indexes [1,46]. Ideally, the extracted features could identify distinct objects that correspond to human perception. However, most content-based image retrieval systems are pseudo-object-based. That is, low-level visual features are extracted from images in the hope of capturing high-level semantic information through them. In most cases these features may not have semantic meanings; nevertheless, they limit the search space and greatly increase retrieval efficiency. Typically, retrieval systems implement some similarity measure between the query image and the images in the database based on these low-level features. Two processes – feature extraction and similarity measurement – are considered crucial for content-based image retrieval systems.

A commonly used feature is color, which is used for discriminating between relevant and irrelevant images [46]. Many color-related features can be extracted from an image, such as global and local histograms, and prominent and salient colors. For example, color histograms are used in the image retrieval systems of Brunelli and Mich [3], Cinque et al. [10], Faloutsos et al. [17], Lu et al. [27], Ogle and Stonebraker [34], and Yoo et al. [55]. A color histogram records the colors present in an image, and their quantities.
[email protected] 0141-9382/$ - see front matter Ó 2008 Elsevier B.V. All rights reserved. doi:10.1016/j.displa.2008.03.002
The reason for its popularity is that it is "invariant to translation, rotation about an axis perpendicular to the image, and change only slowly with rotation about other axes, occlusion, and change of the distance to the object" [50]. The color histogram has also been found to give good retrieval performance [14,18].

Digital images are commonly represented as intensity values in the RGB color space. For 24-bit images, there are 16.8 million possible values. Maintaining a color histogram over 16 million colors is unwieldy and unnecessary for image retrieval. Typically, the color space is quantized to a much lower resolution. The resolution level is therefore critical for the histogram retrieval method, as it affects both the efficiency and the effectiveness of image retrieval systems [52]. For example, using 16 bins each for red, green and blue results in 4096 color bins. For each bin, one color is selected as the representative color of that bin. Two colors are considered the same if they fall into the same bin. Different quantization resolutions have been used, ranging from 64 to 4096 [9,19,32,49,51,52].

It is important to empirically evaluate the effect of different resolution levels (with different color spaces) on human perception of the similarity of the quantized image to the original image, and to investigate whether any objective measure can be used to predict human perception. Accurate data will let system developers know how their decisions on resolution levels affect user perception of the images, allowing them to make an informed decision on the tradeoff between system processing speed and user perception.

Section 2 of this paper presents a general model of content-based image retrieval, followed by a brief review of a few human factors studies in this area; it highlights the paucity of, and the need for, such studies. Section 3 clarifies the main research question, and provides a concise review of the relevant literature on color histogram and quantization resolution.
The laboratory study is detailed in Section 4. The experiment uses the most commonly used color space and a very basic quantization method; it is not an aim of this study to identify the best quantization method or color space. Data analyses and results are presented in Section 5. It was found that the relationship between perceived image similarity and resolution levels can be represented by a simple equation. The conclusion is in Section 6.

2. Human factors in content-based image retrieval

This section provides a brief review of content-based image retrieval, followed by a thorough review of relevant empirical studies involving human subjects. It establishes the need for more empirical studies based on actual human judgment of images, leading to the research question in the next section.

A literature review of previous studies indicates that several levels of information can be captured from images. Researchers have proposed models ranging from two levels to several levels [8,23,24,26,38,45,46]. Systems can also be classified into two categories: syntactical and pseudo-object levels [8]. At the syntactical level, a single feature is captured to characterize the images. At the pseudo-object level, systems use combinations of these single features to index the images, hoping to approximately identify the objects by clever use of these features [2]. Feature-based systems typically operate in two distinct phases. The first phase is feature extraction, where the features of each image in the database are extracted, usually represented by a vector, and stored. The second phase is the retrieval phase: a user presents a query and, based on a similarity measure in feature space, the system returns some images, which may be ranked by similarity values.

One important issue for researchers of content-based image retrieval systems is finding good models and similarity measures which best mirror human perception for comparing images, and which can be integrated into the systems [29,33]. Many factors influence the way people measure similarity. Picard and Minka [37] identified four such factors: visual features, viewpoint, semantics, and culture. Human factors research in this area is gaining increasing attention. Some examples include Chan and Wang [6], Cox et al. [11–13], Han et al. [21], Minka and Picard [30], Mojsilovic et al. [31], Papathomas et al. [35], Picard [39], Rao and Lohse [40], Rui et al. [44], and Scassellati et al. [47]. Generally, research in this area can be classified into three directions: refining queries, changing traditional display techniques, and understanding human perception. A few human factors studies on image similarity are briefly reviewed in the following paragraphs.

A study of human perception of shape similarity was conducted by Scassellati et al. [47], who tested seven computational shape matching methods against human perceptions. The methods include algebraic moments, parametric curve distance, turning angle, sign of curvature, and modified Hausdorff distance. The database used in the experiment has over 1400 images. Twenty original query shapes were drawn by the researchers. The shapes varied in complexity, number of angles, perimeter, etc. All query shapes were drawn to be perceptually similar to at least one object in the database. The 40 subjects in the experiment were told to choose up to ten matching objects for each query object, without ranking them.
No definition of similarity was offered to the subjects, but they were told that the relative size and orientation of the pictures were unimportant. The results showed that none of the methods matched the human selections well, although the turning angle method provided the best overall results.

Papathomas et al. [35] studied the importance of using semantic information, query feedback memory, and relative judgments of image similarity in an image retrieval system, PicHunter.
The experiment involved six system versions which differed from each other along these three factors. The first dimension has three options: purely pictorial features, purely semantic features, or both. The second dimension is whether the system employs the entire history of user responses or only the last response. The last dimension is relative vs. absolute distance for user feedback: in the relative distance mode, a user can select several similar images as feedback, whereas in the absolute distance mode only one image (the most similar) can be selected. Six subjects took part in the experiment. The database included 1500 images, and fifteen query images were employed. Subjects were required to find the identical target image. Performance was computed as the average number of iterations that subjects took to complete each search successfully across the 15 targets. The results showed that the use of long-term memory improves performance significantly when relative distance is employed, but there is no improvement for the absolute-distance versions. One surprising finding of the experiment is that the best performance is achieved when using only semantic cues, with memory and relative similarity judgment; the combination of semantic and visual cues under the same conditions achieves only the second best result.

Rogowitz et al. [42] conducted a psychophysical experiment aimed at uncovering the dimensions human observers use in rating the similarity of photographic images. They also compared the results with two algorithmic image similarity metrics, one based on similarities in color histograms and the other based on more sophisticated perceptually-relevant features. Ninety-seven JPEG images were selected from 5000 photographic images. Two psychophysical scaling methods were used to measure the perceived similarity of each image with every other image in the set. In the "table scaling" experiment, the 97 test images were printed and placed randomly on a large round table. Subjects were required to arrange these images so that the physical distances between them were inversely proportional to their perceived similarity. No definition of similarity was given to the subjects. Nine subjects served in this experiment. In the "computing scaling" experiment, the task was conducted on a display monitor. The reference stimulus was presented along the left edge of the display, accompanied by two rows of four test stimuli running horizontally along the display. On each trial, the subjects viewed a randomly-selected reference stimulus and eight test stimuli, selected randomly from the set of 97, and judged which of the eight appeared most similar. Again, no definition of similarity was given to the subjects. Fifteen subjects took part in this experiment. A multidimensional scaling analysis in two and three dimensions was performed on the data. Both scaling methods produced similar results, suggesting that monitor display and paper prints do not have a significant effect on perceived similarity. When compared with the algorithmic methods, the conclusion is that even though visual features do not capture the whole semantic meaning of the images, they correlate substantially with the semantics. Overall, more and more research on content-based image retrieval systems has focused on human perception.
The empirical experiments not only give us more insight into human perception of some basic features, such as texture [31,40] and shape [47], but also cover the dimensions human observers use in rating the similarity of photographic images [42].

3. Research question

The main research question that we are investigating is: how do human perceptions of image similarity vary with the resolution level? In addition, other objective measures will be compared with human perceptions.
There are potentially many other relevant factors that could be studied, such as color spaces [54], image color characteristics, other image features such as shape and texture [8,23,26], and specialized quantization methods [2,25,41]. For simplicity of the experimental design, we have excluded these factors.

This section presents the literature most relevant to the research question. First, two fundamental techniques in content-based image retrieval systems using color histograms – the selection of similarity metrics and quantization – are discussed. Then relevant studies on the selection of color quantization methods are reviewed. A major variable that affects the performance of a quantization method, the quantization resolution, is presented, as well as the dependent variable, perceived image similarity. An objective measure, colorloss, is also presented to help provide a more complete picture of the results later.

Color is one of the most salient features in an image. It is relatively robust to background complication and independent of image size and orientation. Many color-related features can be extracted, such as dominant colors, color distributions, color layout, etc. Among them, the color histogram is the most commonly used technique in color-based image retrieval systems. A color histogram describes the overall color distribution of an image, i.e., what colors are in the image and in what quantities. The use of color histograms for image retrieval has been widely explored [7,15,18,32,36,49,50]. The fundamental elements of this approach are the selection of the color quantization method and of the histogram distance metric (similarity metric). Color quantization is a process during which similar colors are grouped together into bins; the higher the quantization resolution, the more bins there are. Similarity metrics are used to compare two histograms. Many methods exist for both.

Several similarity metrics have been proposed for comparing two histograms. Rubner et al. [43] divided them into two categories. Bin-by-bin similarity metrics compare only the contents of corresponding histogram bins. Cross-bin similarity metrics, on the other hand, also compare non-corresponding bins. Bin-by-bin similarity metrics are based on the assumption that features falling into the same bin are close enough to be considered the same, while features that do not fall into the same bin are too far apart to be considered similar. Several bin-by-bin similarity metrics exist, e.g. the L1-metric [50]. Bin-by-bin dissimilarity measures are sensitive to bin size (resolution level): a binning that is too coarse will not have sufficient discriminative power, while a binning that is too fine will place similar features in different bins that will never be matched. This is the motivation for cross-bin similarity metrics, e.g. the quadratic form distance [20]. Many other similarity metrics exist for comparing two histograms, such as the Kullback-Leibler divergence and Jeffrey distance, the Kolmogorov-Smirnov distance, and parameter-based distances [18,43,46]. Smith [48] compared the recall and precision of different similarity metrics on a database containing 3100 images. The four queries in his experiment were sunsets, flowers, nature with blue sky, and lions. The results showed that the performances of the L1-metric and the quadratic form distance were almost the same.
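To make the bin-by-bin idea concrete, the following Python sketch (illustrative code, not from the paper; the function names are ours) builds a uniform-quantization color histogram and compares two histograms with the L1-metric of Swain and Ballard [50]:

```python
import numpy as np

def rgb_histogram(pixels, steps):
    """Histogram over a uniform steps x steps x steps quantization of RGB.

    pixels: (N, 3) array of 8-bit RGB values.
    steps:  divisions per axis, so the resolution is steps ** 3.
    """
    bins = (pixels.astype(np.int64) * steps) // 256            # per-channel bin index, 0..steps-1
    index = (bins[:, 0] * steps + bins[:, 1]) * steps + bins[:, 2]
    hist = np.bincount(index, minlength=steps ** 3).astype(float)
    return hist / hist.sum()                                   # normalize for comparability

def l1_distance(h1, h2):
    """Bin-by-bin L1-metric: 0 for identical histograms, 2 for disjoint ones."""
    return float(np.abs(h1 - h2).sum())
```

With normalized histograms the L1 distance ranges from 0 (identical color content) to 2 (completely disjoint); a cross-bin metric such as the quadratic form distance [20] would additionally credit pairs of similar but non-identical bins.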
Color quantization is a critical technique for color histogram retrieval [28,52]. It is a process during which similar colors are grouped together. Without data about human judgment, it is difficult to decide on the appropriate quantization resolution, which is critical to a system's efficiency. If the color space is quantized too finely (the quantization resolution is too large), the index file becomes too large, which seriously affects the system's efficiency. If the color space is quantized too coarsely (the quantization resolution is too small), two images which are quite different will be regarded as the same.
While there have been some system-based evaluations of quantization methods [52,53], there have been few, if any, direct comparisons with human perception. There is thus a need to investigate how human perception of images is affected by the resolution level.

The RGB color space is one of the most popular and widely used color spaces: its format is the most common one for digital images and is compatible with computer displays [21]. One main aim of quantization in the RGB color space is to make the quantized image as similar to the original image as possible, for display purposes [41]. Other color spaces are involved in some quantization methods [22,28]; details of color spaces can be found in, e.g., Wyszecki and Stiles [54]. For simplicity, and as an initial study of this issue, only the RGB color space is considered here. The RGB color space is a cube model in which red, green and blue are the three axes. The origin, with all coordinates zero, is black. The long diagonal, along which all axes have equal values, represents the gray values, running from the black corner through the cube to white at the opposite corner. For the RGB color space, researchers usually sample equally along the three axes [52], which is probably the most basic method [41]. Various other quantization methods have been proposed. In general, the methods can be independent of or dependent on the images: the various modified versions of the uniform quantization method are independent of the images, whereas median-cut quantization and adaptive-binning methods are based on each individual image's color characteristics [2,25,41]. Again, for simplicity of the experiment, color characteristics are not a factor in this study, and the basic uniform partitioning method is chosen.

Various quantization resolutions have been used, from 16 to 4096 color bins [32,36,49,51]. Researchers have long had to choose an appropriate resolution to balance the performance and the efficiency of the system. If the resolution is too small, the retrieval results will contain many irrelevant images; if it is too large, the computing complexity of the system increases dramatically. It is thus of great importance for researchers to find out how image similarity changes with the resolution.

In the literature, two methods have been used to evaluate the performance of different quantization techniques in image retrieval systems. In the first method, color histograms based on different quantization techniques are computed and a similarity metric is chosen for comparing two color histograms. Several queries are then issued to retrieve similar images from a specified database using that similarity metric and a given color quantization method. The quantization techniques are compared through the retrieval results [52,53]. The limitation of this method is that no similarity metric has been found that best mimics the human perception of similarity; the results could also be highly dependent on the particular system and the database/query images [28]. The second method, also an objective measure, is the calculation of colorloss [53]. Colorloss is a quantitative measurement of the color information lost through quantization. It is defined as the Euclidean color distance between a pixel in the original image and the corresponding pixel in the quantized image.
Let $Img_o$ denote an original image composed of $N$ pixels, where the RGB values of a pixel $p$ are $(r_o, g_o, b_o)$. Let $Img_q$ denote the quantized image, and let $p_q$ be the pixel corresponding to $p$, with RGB values $(r_q, g_q, b_q)$. The average color loss between the two images is defined as:

$$\mathrm{ColorLoss}(Img_o, Img_q) = \frac{1}{N} \sum_{p=1}^{N} \sqrt{(r_o - r_q)^2 + (g_o - g_q)^2 + (b_o - b_q)^2}$$
Colorloss describes the average color loss of a quantized image: the larger the colorloss, the greater the loss in color information. Neither method has been validated against actual human perception.
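In code, the definition translates directly. The sketch below is ours, for illustration; it scales the RGB channels to [0, 1], an assumption consistent with the magnitudes of the colorloss values reported in Table 1:

```python
import numpy as np

def colorloss(img_o, img_q):
    """Average Euclidean RGB distance between corresponding pixels.

    img_o, img_q: (H, W, 3) uint8 arrays for the original and quantized images.
    Channels are scaled to [0, 1], matching the value range in Table 1.
    """
    diff = (img_o.astype(float) - img_q.astype(float)) / 255.0
    return float(np.sqrt((diff ** 2).sum(axis=-1)).mean())   # mean per-pixel distance
```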
There is one experiment reported in the literature that used perceived quality to test quantization methods [5]. For resolution levels, the experiment tested only four levels (16, 64, 128, 256). It reported some significant differences in perceived quality across these levels. More details of the findings are presented in the discussion of our experiment.
4. Research methodology
A laboratory study was conducted to gather human (subjective) evaluations of images at various resolutions. In the following paragraphs, we present the experimental design, followed by a full description of the subjects, the task, and the experimental procedures.

The experiment involved quantization at 10 different resolutions. Only the uniform quantization scheme, applied to the RGB color space, was used. The RGB color space is a cube model, and the common way to implement uniform quantization is to quantize equally along the three axes: the whole color space is divided into many equal-sized small cubes (bins). Colors in the same bin are considered similar, and all are represented by the same representative color, usually the center of the bin. In the experiment, the numbers of quantization bins along the R, G and B axes were 3 × 3 × 3, 4 × 4 × 4, 5 × 5 × 5, 6 × 6 × 6, 7 × 7 × 7, 8 × 8 × 8, 9 × 9 × 9, 10 × 10 × 10, 11 × 11 × 11, and 12 × 12 × 12. Thus, the ten quantization resolutions were 27, 64, 125, 216, 343, 512, 729, 1000, 1331, and 1728.

Thirty graduate students, 18 males and 12 females, aged 23 to 30, participated in the experiment. All had normal vision and were naïve to the purpose of the experiment. Six images of postcard size (shown in Fig. 1) were chosen as original images according to the following criteria. First, we wanted to cover a broad range of colors: thirteen colors – red, orange, yellow, green, blue-green, light blue, blue, purple, pink, brown, white, gray, and black – "match human perceptual categories and tend to distinguish interesting objects from their background" [4], and all thirteen are covered in the six images. Second, all images are natural scenes. Third, the images are in both landscape and portrait mode. Color prints were produced for the subjects to sort, similar to the study by Han et al. [21]. Although printing introduces slight color changes, the alternative of on-screen sorting of many images may be too tedious for human subjects and may lead to unreliable data. With 10 quantized images (one per resolution) for each original image, there were 60 quantized images in total.

Subjects were required to rate image similarity from 1 to 9. The rating procedure was as follows. Subjects first sorted the images into three categories according to their similarity to the original image: category 1 for images which retain little information from the original image, category 2 for images which retain adequate information, and category 3 for images which retain the most information. Subsequently, subjects sorted the images within each category, assigning 1–3 points for category 1, 4–6 points for category 2, and 7–9 points for category 3. Subjects were allowed to put multiple images under the same point, and to leave some points with no images. To avoid any bias from the presentation sequence, the original images and their 10 quantized versions were given to subjects in random order. No definition of similarity was given, and there was no time limit.
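The uniform quantization scheme described above is straightforward to implement. The following sketch is ours, for illustration: it quantizes an 8-bit RGB image with k divisions per axis, replacing each pixel by the center color of its bin; the experiment's ten resolutions correspond to k = 3 through k = 12.

```python
import numpy as np

def uniform_quantize(img, k):
    """Uniformly quantize an (H, W, 3) uint8 RGB image with k divisions per axis.

    The color cube is split into k ** 3 equal bins (e.g. k = 8 gives 512 bins);
    every pixel is replaced by the center color of its bin.
    """
    bin_width = 256.0 / k
    bins = np.minimum((img / bin_width).astype(int), k - 1)  # per-channel bin index
    centers = (bins + 0.5) * bin_width                       # representative color: bin center
    return centers.astype(np.uint8)
```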
5. Data analysis and results

The results of the subjects' image similarity assessments and the objective measure of colorloss for each resolution level are summarized in Table 1. Fig. 2 charts perceived similarity against resolution level, and Fig. 3 charts colorloss against resolution level. The vertical bars in the figures indicate the 95% confidence intervals for the mean values.

The effect of resolution proved to be significant. The relation between perceived image similarity and resolution is not linear: perceived image similarity increases rapidly while the resolution is small, but the rate of increase slows as the resolution becomes larger. Regression analysis was used to determine how perceived image similarity changes as the resolution increases. Regression of average image similarity against ln(resolution) gives:

$$\text{Image Similarity} = -1.13 + 1.36 \ln(\text{resolution})$$

For the regression model: adjusted R² = 0.985, F = 600, p = 0.001; for the coefficient: t = 24.5, p = 0.001. Thus, the regression model fits the data very well. The R² shows that resolution levels can explain about 98% of the variation in average perceived color similarity. An equally high R² is produced when image similarity is regressed against ln(steps), where steps is the number of divisions along each axis of the RGB space. A regression of image similarity against steps directly produces a slightly lower R² (0.90).

For an image retrieval system, increasing the resolution means that the system can perform better by eliminating more irrelevant images, as the quantized images keep more and more of the information in the original image. On the other hand, increasing the quantization resolution also means that the system must bear the extra overhead that a higher resolution brings.
Table 1
Average image similarity and colorloss

Resolution | Mean perceived similarity | Standard deviation | Average colorloss
27         | 3.07                      | 1.39               | 0.157
64         | 4.54                      | 1.55               | 0.116
125        | 5.34                      | 1.73               | 0.095
216        | 6.45                      | 1.68               | 0.077
343        | 6.97                      | 1.49               | 0.068
512        | 7.46                      | 1.40               | 0.060
729        | 8.01                      | 1.11               | 0.055
1000       | 8.26                      | 1.05               | 0.050
1331       | 8.44                      | 1.07               | 0.045
1728       | 8.63                      | 0.75               | 0.041
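The logarithmic fit reported in Section 5 can be reproduced from the data in Table 1 with an ordinary least-squares regression on ln(resolution), as in this sketch (ours, for illustration):

```python
import numpy as np

# Mean perceived similarity per resolution, from Table 1.
resolution = np.array([27, 64, 125, 216, 343, 512, 729, 1000, 1331, 1728])
similarity = np.array([3.07, 4.54, 5.34, 6.45, 6.97, 7.46, 8.01, 8.26, 8.44, 8.63])

# Ordinary least squares of similarity on ln(resolution).
slope, intercept = np.polyfit(np.log(resolution), similarity, 1)
print(f"Image Similarity = {intercept:.2f} + {slope:.2f} ln(resolution)")
# On the tabulated means this yields roughly -1.12 + 1.35 ln(resolution),
# in line with the reported -1.13 + 1.36 (which was fitted on the full data).
```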
Fig. 1. Six original images.
Fig. 2. Perceived image similarity.

Fig. 3. Average colorloss.
The data show researchers the impact of their choice of resolution. For example, resolutions below 350 are likely to lead to medium or low similarity perceptions. As another example, increasing the resolution by 100 from 1500 to 1600 is not likely to improve similarity perceptions, but increasing it by 100 from 200 to 300 will likely lead to a big improvement in perceived similarity. The changes can be estimated from the regression formula.

It was also found that human perception of image similarity and colorloss are highly correlated. Linear regression of average image similarity against average colorloss over the 60 quantized images gives:

$$\text{Image Similarity} = 10.02 - 43.22 \times \text{colorloss}$$

For this regression model: adjusted R² = 0.731, F = 161, p = 0.001; for the coefficient: t = 12.7, p = 0.001. The R² shows that colorloss can explain about 73% of the variation in average perceived image similarity for the set of 60 quantized images.
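Taken together, the two fitted equations can serve as quick predictors of perceived similarity on the 1–9 scale. The helper below is ours, simply restating the published coefficients as functions:

```python
import math

def similarity_from_resolution(resolution):
    # Logarithmic fit from Section 5 (adjusted R^2 = 0.985).
    return -1.13 + 1.36 * math.log(resolution)

def similarity_from_colorloss(loss):
    # Linear fit against average colorloss (adjusted R^2 = 0.731).
    return 10.02 - 43.22 * loss

print(round(similarity_from_resolution(512), 2))   # ~7.35; Table 1 reports 7.46
print(round(similarity_from_colorloss(0.060), 2))  # ~7.43 for the same resolution
```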
Taking each set of 10 images (corresponding to one original image) alone, there is also a very strong correlation between image similarity and colorloss: the correlation coefficients range from 0.89 to 0.99, all with significant p values of 0.001. This equation, together with Figs. 2 and 3, allows a realistic interpretation of colorloss values, which were previously just numbers. For example, Figs. 2 and 3 tell us that a colorloss value of 0.20 is a very bad value, corresponding to very low image similarity.

The various analyses of colorloss and image similarity lead to the surprising result that there is a close match between objective calculations and subjective human perception: colorloss can be used to predict the average subjective human perception of image similarity.

This experiment used six original images, so it is important to analyze whether the results depend on any particular image. An analysis of variance on color similarity for the different images, taking colorloss values into consideration, shows no significant difference across images. There is thus a high probability that the results will be robust to other sets of images. While the coefficients may change slightly for other images, the logarithmic relationship between image similarity and resolution, and the linear (negative) relationship between image similarity and colorloss, should be very robust to the choice of images.

We now compare the findings with those of Chan and Nerheim-Wolfe [5]. Of the four resolutions that they tested, they found that the 16-color palette had the poorest perceived quality (statistically significantly different from the others), the 64-color palette was significantly poorer than the 256-color palette, and the 128-color palette fell between the 64-color and 256-color palettes without a significant difference from either. When we test the average image similarity across the 27, 64, 125 and 216 resolutions, we find an identical pattern: at 27 bins, image similarity is significantly poorer than at the others (p = 0.001), the 64-bin image similarity is significantly poorer than the 216-bin (p = 0.002), and the 125-bin is not significantly different from either the 64-bin or the 216-bin (p = 0.10 and 0.06, respectively). Thus this study confirms the findings of Chan and Nerheim-Wolfe [5]. In addition, this study empirically derives an equation describing the relationship between image similarity and resolution levels. The study by Chan and Nerheim-Wolfe [5] was in the context of display quality; as the results are similar, the findings of this study could also be used to decide on the resolution levels needed to achieve a targeted display quality.

Some histogram similarity metrics compare across bins and some do not [43]. The findings here help indicate when cross-bin comparison may be better. When the resolution level is very high (e.g. >1000), changes in resolution have little impact on perceived color similarity, implying that adjacent bins appear similar; cross-bin comparison of adjacent bins should then be included. When the resolution level is low (e.g. <100), changes in the resolution level have a big impact on perceived image similarity, implying that adjacent bins are different; cross-bin metrics are then unlikely to be better.

As an initial study on quantization using the human-subject experiment method, this study has a number of limitations, which are discussed here.
Firstly, there are many interesting questions which could be explored with this experimental method, but an experiment is constrained by the time and concentration that can be expected from human subjects. For example, questions about other color spaces and quantization methods cannot be answered in a single experiment. Similarly, the study limited the number of original images to six, producing a total of sixty-six images for each subject to view and evaluate; this was judged to be quite demanding for human subjects. The images were chosen to be colorful scenery pictures, so the results may be limited to this type of picture. For example, if the context is black and white pictures, the results are unlikely to apply.
Questions about the applicability of the results to other contexts cannot be answered by one experiment; many further studies are needed to provide answers.

6. Conclusion

The study of human perception of image content is important for the performance of image retrieval systems, because the ultimate end user of an image retrieval system is human. The focus on human evaluation has gained increasing attention in recent years, as seen in Section 2. It aims at exploring and understanding how humans perceive image content and how such knowledge can be integrated into retrieval systems. This work, which focused on human evaluation of image similarity at different quantization resolutions, is another research effort in this area.

Quantization resolution is an important decision in image retrieval systems. The traditional way to compare the performance of different quantization resolutions is through the retrieval performance or algorithmic calculations of a specific system that applies the different resolutions, e.g. [25]. The results of this method can depend on the system's retrieval method, or even on the particular queries or the set of images in the database. In this work, by contrast, perceived image similarity is employed as the measure for judging the performance of different quantization resolutions. Colorloss, a quantitative measure of the color information lost through quantization, is compared with the human results.

There are two major statistically significant findings from this study. First, the relationship between perceived image similarity and quantization resolution can be explained by a logarithmic relationship. Second, colorloss can predict human perception of image similarity at different resolution levels very well. The derived equations enable researchers to estimate human perception (based on colorloss) and to know with more certainty how their decisions on resolution levels will affect human perception of the images.

The study has focused on color-based retrieval using the commonly used RGB space and a basic method of quantization. It will be interesting to investigate whether the results apply to other situations, e.g. different color spaces and different quantization methods. In addition, the general approach of studying human performance data and applying it to computer retrieval systems is also applicable to other content-based retrieval methods.

References

[1] J.J. Ashley, R. Barber, M.D. Flickner, J.L. Hafner, D. Lee, C.W. Niblack, D. Petkovic, Automatic and semi-automatic methods for image annotation and retrieval in QBIC, in: Carlton W. Niblack, Ramesh C. Jain (Eds.), Proc. SPIE Storage and Retrieval for Image and Video Databases III, vol. 2420, 1995, pp. 24–35.
[2] S. Belongie, C. Carson, H. Greenspan, J. Malik, Color- and texture-based image segmentation using the expectation-maximization algorithm and its application to content-based image retrieval, in: Int. Conference on Computer Vision, Mumbai, India, 1998, pp. 675–682.
[3] R. Brunelli, O. Mich, Histogram analysis for image retrieval, Pattern Recognition 34 (2001) 1625–1637.
[4] C. Carson, V.E. Ogle, Storage and retrieval of feature data for a very large online image collection, Bulletin of the Technical Committee on Data Engineering 19 (4) (1996) 19–27.
[5] S.S. Chan, R. Nerheim-Wolfe, An empirical assessment of selected color-quantizing algorithms, in: Proceedings of Human Vision, Visual Processing and Digital Display V, vol. 2179, SPIE, 1994, pp. 298–309.
[6] H.C. Chan, Y. Wang, Human factors in color-based image retrieval: an empirical study on size estimate accuracies, Journal of Visual Communication and Image Representation 15 (2004) 113–131.
[7] S.-F. Chang, J.R. Smith, Extracting multi-dimensional signal features for content-based visual query, in: Lance T. Wu (Ed.), Proc. SPIE Visual Communications and Image Processing '95, vol. 2501, 1995, pp. 995–1006.
[8] T.S. Chua, W.C. Low, T. Sim, Visual information retrieval, The Journal of the Institute of Image Electronics Engineers of Japan 27 (1) (1998) 10–19.
[9] T.S. Chua, S.K. Lim, H.K. Pung, Content-based retrieval of segmented images, in: Proceedings of ACM Multimedia '94, 1994, pp. 211–218.
[10] L. Cinque, G. Ciocca, S. Levialdi, A. Pellicano, R. Schettini, Color-based image retrieval using spatial-chromatic histograms, Image and Vision Computing 19 (2001) 979–986.
[11] I.J. Cox, M.L. Miller, T.P. Minka, P.N. Yianilos, An optimized interaction strategy for Bayesian relevance feedback, in: IEEE Conference on Computer Vision and Pattern Recognition, 1998, pp. 553–558.
[12] I.J. Cox, M.L. Miller, S.M. Omohundro, P.N. Yianilos, PicHunter: Bayesian relevance feedback for image retrieval, in: Int. Conf. on Pattern Recognition, Vienna, Austria, vol. 3, 1996a, pp. 361–369.
[13] I.J. Cox, M.L. Miller, S.M. Omohundro, P.N. Yianilos, Target testing and the PicHunter Bayesian multimedia retrieval system, in: Advanced Digital Libraries ADL'96 Forum, Washington, DC, 1996b, pp. 66–75.
[14] V. Di Lecce, A. Guerriero, An evaluation of the effectiveness of image features for image retrieval, Journal of Visual Communication and Image Representation 10 (4) (1999) 351–362.
[15] F. Ennesser, G. Medioni, Finding Waldo, or focus of attention using local color information, IEEE Transactions on Pattern Analysis and Machine Intelligence 17 (8) (1995) 805–809.
[16] P. Enser, Visual image retrieval: seeking the alliance of concept-based and content-based paradigms, Journal of Information Science 26 (4) (2000) 199–210.
[17] C. Faloutsos, W. Equitz, M. Flickner, W. Niblack, D. Petkovic, R. Barber, Efficient and effective querying by image content, Journal of Intelligent Information Systems: Integrating Artificial Intelligence and Database Technologies 3 (3–4) (1994) 231–262.
[18] B.V. Funt, G.D. Finlayson, Color constant color indexing, IEEE Transactions on Pattern Analysis and Machine Intelligence 17 (5) (1995) 522–529.
[19] R.S. Gray, Content-based Image Retrieval: Color and Edges, Technical Report #95-252, Dartmouth University Department of Computer Science, 1995.
[20] J. Hafner, H.S. Sawhney, W. Equitz, M. Flickner, W. Niblack, Efficient color histogram indexing for quadratic form distance functions, IEEE Transactions on Pattern Analysis and Machine Intelligence 17 (7) (1995) 729–736.
[21] S. Han, B. Tao, T. Cooper, I. Testl, Comparison between different color transformations for the JPEG 2000, in: Proc. PICS-2000, 2000.
[22] B. Hill, Th. Roger, F.W. Vorhagen, Comparative analysis of the quantization of color spaces on the basis of the CIELAB color difference formula, ACM Transactions on Graphics 16 (2) (1997) 109–154.
[23] A. Jaimes, S.-F. Chang, Concepts and techniques for indexing visual semantics, in: V. Castelli, L. Bergman (Eds.), Image Databases: Search and Retrieval of Digital Imagery, John Wiley & Sons, 2001.
[24] B. Jorgensen, A. Jaimes, A.B. Benitez, S.-F. Chang, A conceptual framework and research for classifying visual descriptors, Journal of the American Society for Information Science and Technology (JASIST), Special Issue on Image Access: Bridging Multiple Needs and Multiple Perspectives (2001) 938–947.
[25] W.K. Leow, R. Li, The analysis and applications of adaptive-binning color histograms, Computer Vision and Image Understanding, Special Issue: Colour for Image Indexing and Retrieval 94 (1–3) (2004) 67–91.
[26] R. Li, W.K. Leow, From region features to semantic labels: a probabilistic approach, in: Proc. Int. Conf. on Multimedia Modeling, 2003, pp. 402–420.
[27] H. Lu, B.C. Ooi, K.L. Tan, Efficient image retrieval by color contents, in: Proceedings of the First International Conference on Applications of Databases (ADB-94), 1994, pp. 95–108.
[28] E. Mathias, A. Conci, Comparing the influence of color spaces and metrics in content-based image retrieval, in: International Symposium on Computer Graphics, Image Processing, and Vision, 1998, pp. 371–378.
[29] M. Mirmehdi, R. Perissamy, Perceptual image indexing and retrieval, Journal of Visual Communication and Image Representation 13 (4) (2002) 460–475.
[30] T. Minka, R.W. Picard, Interactive learning using a 'Society of Models', Pattern Recognition 30 (4) (1997) 565–581.
[31] A. Mojsilovic, J. Kovacevic, J. Hu, R.J. Safranek, S.K. Ganapathy, Matching and retrieval based on the vocabulary and grammar of color patterns, IEEE Transactions on Image Processing 9 (1) (2000) 38–54.
[32] W. Niblack, R. Barber, W. Equitz, M. Flickner, E. Glasman, D. Petkovic, P. Yanker, C. Faloutsos, The QBIC project: querying images by content using color, texture and shape, in: Proc. SPIE Storage and Retrieval for Image and Video Databases I, vol. 1908, 1993, pp. 173–187.
[33] H. Nishiyama, S. Kin, T. Yokoyama, Y. Matsushita, An image retrieval system considering subjective perception, in: Conference Proceedings on Human Factors in Computing Systems: Celebrating Interdependence, 1994, pp. 30–36.
[34] V. Ogle, M. Stonebraker, Chabot: retrieval from a relational database of images, IEEE Computer 28 (9) (1995) 40–48.
[35] T.V. Papathomas, T.E. Conway, I.J. Cox, J. Ghosn, M.L. Miller, T.P. Minka, P.N. Yianilos, Psychophysical studies of the performance of an image database retrieval system, in: Bernice E. Rogowitz, Thrasyvoulos N. Pappas (Eds.), Proc. SPIE Human Vision and Electronic Imaging III, vol. 3299, 1998, pp. 591–602.
[36] G. Pass, R. Zabih, J. Miller, Comparing images using color coherence vectors, in: ACM Conference on Multimedia, 1996, pp. 65–73.
[37] R.W. Picard, T.P. Minka, Vision texture for annotation, Multimedia Systems 3 (1995) 3–14.
[38] R.W. Picard, Digital libraries: meeting place for high-level and low-level vision, in: Asian Conference on Computer Vision, 1995.
[39] R.W. Picard, T.P. Minka, M. Szummer, Modeling user subjectivity in image libraries, in: IEEE International Conference on Image Processing, 1996.
[40] A.R. Rao, G.L. Lohse, Identifying high level features of texture perception, CVGIP: Graphical Models and Image Processing 55 (3) (1993) 218–233.
[41] S.A. Redfield, Efficient Object Recognition Using Color Quantization, Ph.D. Dissertation, University of Florida, 2001.
[42] B.E. Rogowitz, T. Frese, J.R. Smith, C.A. Bouman, E.B. Kalin, Perceptual image similarity experiments, in: Bernice E. Rogowitz, Thrasyvoulos N. Pappas (Eds.), Proc. SPIE Human Vision and Electronic Imaging III, vol. 3299, 1998, pp. 576–590.
[43] Y. Rubner, C. Tomasi, L.J. Guibas, Adaptive color-image embeddings for database navigation, in: Proceedings of the 1998 IEEE Asian Conference on Computer Vision, 1998, pp. 104–111.
[44] Y. Rui, T.S. Huang, S. Mehrotra, Relevance feedback techniques in interactive content-based image retrieval, in: Ishwar K. Sethi, Ramesh C. Jain (Eds.), Proc. SPIE Storage and Retrieval for Image and Video Databases VI, vol. 3312, 1997a, pp. 25–36.
[45] Y. Rui, T.S. Huang, S. Mehrotra, M. Ortega, A relevance feedback architecture for content-based multimedia information retrieval systems, in: Proc. of IEEE Workshop on Content-based Access of Image and Video Libraries, 1997b, pp. 82–89.
[46] Y. Rui, T.S. Huang, S.-F. Chang, Image retrieval: current techniques, promising directions, and open issues, Journal of Visual Communication and Image Representation 10 (1) (1999) 39–62.
[47] B. Scassellati, S. Alexopoulos, M. Flickner, Retrieving images by 2D shape: a comparison of computation methods with human perceptual judgments, in: Proceedings of the SPIE Conf. on Storage and Retrieval of Image and Video Databases II, vol. 2185, 1994, pp. 2–14.
[48] J.R. Smith, Integrated Spatial and Feature Image Systems: Retrieval, Analysis and Compression, Ph.D. Thesis, Columbia University, 1997.
[49] J. Smith, S. Chang, Tools and techniques for color image retrieval, in: Ishwar K. Sethi, Ramesh C. Jain (Eds.), Proc. SPIE Storage and Retrieval for Still Image and Video Databases IV, vol. 2670, 1996, pp. 426–437.
[50] M.J. Swain, D.H. Ballard, Color indexing, International Journal of Computer Vision 7 (1) (1991) 11–32.
[51] A. Vellaikal, C.-C.J. Kuo, Content-based image retrieval using multiresolution histogram representation, in: Proc. SPIE Digital Image Storage and Archiving Systems, vol. 2606, 1995, pp. 312–323.
[52] X. Wan, C.-C.J. Kuo, Color distribution analysis and quantization for image retrieval, in: Ishwar K. Sethi, Ramesh C. Jain (Eds.), Proc. SPIE Storage and Retrieval for Still Image and Video Databases IV, vol. 2670, 1996, pp. 8–16.
[53] J. Wang, W. Yang, R. Acharya, Color clustering techniques for color-content-based image retrieval from image databases, in: Proceedings of the International Conference on Multimedia Computing and Systems, 1997, pp. 442–447.
[54] G. Wyszecki, W.S. Stiles, Color Science: Concepts and Methods, Quantitative Data and Formulae, second ed., John Wiley & Sons, 2000.
[55] H.-W. Yoo, D.-S. Jang, S.-H. Jung, J.-H. Park, K.-S. Song, Visual information retrieval system via content-based approach, Pattern Recognition 35 (2002) 749–769.