Pattern Recognition Letters 26 (2005) 1105–1117 www.elsevier.com/locate/patrec
A main stem concept for image matching

Volodymyr Mosorov *

Department of Computer Engineering, Technical University of Lodz, Al. Politechniki 11, Lodz, 90-924, Poland

Received 18 December 2003; received in revised form 1 October 2004; available online 18 November 2004
Abstract

An original "main stem" concept for image matching is presented. The main stem is a global image feature defined as a reduced component tree without redundant and noise components. It is shown that this image feature is strongly invariant to different types of topological transformations and contains useful information about "meaningful" image regions and their interrelations. We present how to construct the main stem and devise an appropriate method for image matching based on main stems. A method for mapping the main stem onto a feature vector, and an appropriate metric to compare feature vectors in the selected representation space, are presented. Preliminary experiments show the validity of the proposed method for robust image matching.

© 2004 Elsevier B.V. All rights reserved.

Keywords: Image representation; Component tree; Image indexing; Similarity measure
* Tel./fax: +48 42 631 27 50. E-mail address: [email protected]. URL: http://www.kis.p.lodz.pl/~mosorow/index.html

1. Introduction

Image matching is the key task in image database retrieval systems. There are two main phases in image matching: definition of a feature space for image representation and definition of an appropriate metric in this feature space. Image transformation into features is called image indexing. The indices (feature vectors) are used to compactly represent image content; the matching of indices is then carried out.

An image histogram is one of the many ways to index images and is used in such systems as the Query By Image Content system (Flickner et al., 1995). However, a histogram is a poor image measure, because the histograms of two images may be very similar even though the images have completely unrelated semantics. Other image indexing approaches are based on the Fourier, discrete cosine or wavelet transforms, which extract suitable characteristics of images; selected coefficients of these transforms can then be used as image indices (e.g. Wang et al., 1997).
0167-8655/$ - see front matter 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.patrec.2004.10.005
Yet this approach has limitations in applications where high degrees of rotation and translation invariance are important. Moreover, such a representation is not in accord with our visual perception.

A large number of works on image representation adopt region-based approaches. Image regions are the basic building blocks in forming the visual content of an image, and as a result they have considerable potential in representing the image content and enabling image matching. Loncaric and Dhawan (1993) have proposed a shape description method by means of the Morphological Signature Transform (MST). However, this method has only shown how to use a structuring element to compute the MST of objects; it does not solve the problem of structuring element selection. Moreover, vectors of shape parameters may be very useful for shape classification, but not as a basis for shape similarity measures, because common shapes need hundreds of parameters to be represented explicitly and most of these parameters must be defined. For example, a large number of shape detection algorithms work effectively only on images with a relatively uniform background, and the majority of shape descriptions adopt such a procedure for image segmentation. Ma and Manjunath (1999) perform image retrieval based on segmented image regions. Their segmentation procedure is not fully automated, as it requires some parametric tuning and hand pruning of regions.

New image features have recently been introduced by Zhou and Huang (2001). They have proposed structural features for content-based image retrieval, especially edge/structure features extracted from edge maps. The feature vector is extracted by means of the "water-filling algorithm" applied to an edge map of the original image. However, the heuristic assumptions used in this algorithm are the main disadvantage of this approach.
A naive Bayes algorithm to learn image categories from the blob representation in a supervised learning scheme was proposed by Carson et al. (1997). The suggested framework entails learning blob rules per category. It should be noted that each blob is represented by a histogram, so the representation is discrete in the image plane as well as in feature space. Each query image is then compared to the extracted category models and associated with the closest matching category. In essence, the image matching problem is shifted to a one- or two-blob matching problem.

In our approach we propose an image representation concept termed the "main stem" structure, based on the component tree proposed by Bertrand et al. (1997). We show that this structure can be used for grey-scale image indexing and matching. The "main stem" is a global image feature defined as a reduced component tree with redundant and "noise" components removed. The main advantage of the proposed image representation concept is its invariance to image rotation and translation and also its insensitivity to noise. It is shown that the main stems of similar images correlate well and remain unchanged under certain transformations, such as small changes in the lighting conditions or in the angle of the scene observation view. Additionally, it can contain as much semantically meaningful and important information as is needed.

In our concept we first construct the main stem structure of an image and then generate the appropriate indices. These indices are generated by labelling stem components and building path sets of the stem structure. Images are compared and matched via an appropriate measure of similarity between image indices.

The rest of this paper is organised as follows. In Section 2 some preliminaries are given; in the following sections we elaborate on image representation using the proposed concept and introduce a similarity measure required for image matching. Preliminary experimental results are presented in Section 5 and conclusions appear in the last section.
2. Component tree structure

Let V denote the integer plane Z². A grey-scale image I is defined by a function I(x, y), assuming discrete intensity values 0, 1, ..., L−1 for (x, y) ∈ V. Let C be an adjacency relation in V, such as the 4-adjacency or the 8-adjacency.
To introduce the notion of a component tree we first need to define image level sets. In the thresholding decomposition of image I, the associated level set C(I, i) is the set obtained by thresholding the function I(x, y): C(I, i) = {(x, y) : I(x, y) ≥ i}, i = 0, 1, ..., L−1 (e.g. Acton and Mukherjee (2000)). Each level set C(I, i) consists of connected components {c_n^i}, n = 1, 2, ..., N_i, defined as subsets c_n^i ⊆ C(I, i), where N_i is the total number of connected components of C(I, i).

Any component c_n^i of level i may include either zero, one, or strictly more than one component c_m^{i+1} of the level set C(I, i+1). Hence, each component can be classified into one of three types: branch, leaf and node component. A branch component c_n^i is a component that includes exactly one level i+1 component c_m^{i+1}, c_m^{i+1} ⊆ c_n^i; a node component is a component that includes strictly more than one level i+1 component; a leaf component is a component that includes no level i+1 components. The lowest-level node component is called the root.

Definition 1. Let C(I) = {c_n^i}, n = 1, 2, ..., N_i, i = 0, 1, ..., L−1, denote the set of all connected components of an image I. A component tree (CT) can then be defined as follows:
Definition 2. For an image I, a component tree T(I) is the set of components C(I) connected by a set of links, where each component in the tree is an abstract representation with links determined according to the types of relations that exist between components at consecutive grey levels.

An example of a component tree corresponding to a grey-scale image (see Fig. 1a) is shown in Fig. 1b. The lines connecting the components are the links, which show the existing relations between components. Najman and Couprie (2004) have proposed an algorithmic construction of the component tree structure.

Component trees derived for real-life images contain too large a number of linked components of different types to be useful in practice. For instance, the numbers of components for the test images shown in Fig. 2 are listed in Table 1. Note that here and in the following examples we take into account only those components for which c_n^i ∩ c_m^{i+1} ≠ ∅.

Note also that not every component of a component tree can be interpreted as an image object (e.g. as a texture element), because each object consists of a number of components at different brightness levels. Thus the image shown in Fig. 1 comprises four objects, while the total number of components of the corresponding tree is equal to 16.
Fig. 1. Component tree of an image (a) is shown in (b), where ×: node component, ●: branch component, ○: leaf component; the dashed line marks the height h(c_n) of the chosen component (see Definition 4).
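As an illustration, the thresholding decomposition and the component links described above can be sketched in Python. This is a minimal sketch for a tiny image, not the efficient construction of Najman and Couprie (2004); the function names are ours:

```python
from collections import deque

def level_set_components(img, i):
    """Connected components (4-adjacency) of C(I, i) = {(x, y) : I(x, y) >= i}."""
    h, w = len(img), len(img[0])
    seen, comps = set(), []
    for y in range(h):
        for x in range(w):
            if img[y][x] >= i and (x, y) not in seen:
                comp, q = set(), deque([(x, y)])
                seen.add((x, y))
                while q:                      # flood fill one component
                    cx, cy = q.popleft()
                    comp.add((cx, cy))
                    for nx, ny in ((cx+1, cy), (cx-1, cy), (cx, cy+1), (cx, cy-1)):
                        if 0 <= nx < w and 0 <= ny < h and \
                           (nx, ny) not in seen and img[ny][nx] >= i:
                            seen.add((nx, ny))
                            q.append((nx, ny))
                comps.append(comp)
    return comps

def component_tree(img, levels):
    """Link each level-i component to the level-(i+1) components it includes.

    Returns (comps, children); a component with 0 children is a leaf, with
    exactly 1 child a branch, and with more than 1 child a node.
    """
    comps = {i: level_set_components(img, i) for i in levels}
    children = {}
    for i in levels[:-1]:
        for n, c in enumerate(comps[i]):
            children[(i, n)] = [(i + 1, m)
                                for m, d in enumerate(comps.get(i + 1, []))
                                if d <= c]    # subset test: inclusion link
    return comps, children
```

For example, a 3 × 3 image with three separated bright pixels yields a single root at level 0 with three level-1 children, i.e. a node component at the root.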
Fig. 2. Examples of 256 × 256 grey-scale images: (a) bridge, (b) Goldhill, (c) cameraman, (d) Lena, (e) pepper, (f) bird.
Table 1. Number of tree components for the 256 × 256 test images.

Image       Total components   Nodes   Leaves
Bridge      19,064             3328    6507
Goldhill    17,960             3286    7598
Pepper      15,484             2561    7047
Cameraman   15,079             2357    7075
Lena        12,629             2140    6056
Bird        7502               769     4761
3. A main stem concept for image description

Node and leaf components of T(I) represent the main topological discontinuities of a given image I. Branch components are not essential to characterise image topology. Our goal is to define a global feature for finding "meaningful" components c ∈ C(I) of T(I) that correspond to "meaningful" image regions (objects). The component tree obtained in such a way must consist of a considerably smaller number of components and should not comprise meaningless and noise components. The remaining components must be associated with the entire image shapes, their sizes and interrelations.

In the next definitions, parameters associated with each tree component c_n ∈ C(I) are introduced. First we define a component area.

Definition 3. The area s(c_n) of a component c_n is defined as the pixel count of the component c_n.

For typical real images the majority of T(I) components have an area of 1 to 3 pixels. This fact is illustrated in Table 2.

Table 2. Percentage of 1-3 pixel components for the test images.

Image       1-3 pixel components (%)
Bird        70.93
Lena        63.77
Cameraman   62.80
Goldhill    61.69
Pepper      60.49
Bridge      52.26

Fig. 3 demonstrates the component area distributions s(c_n) for different types of components c_n of the test images. These small components can be classified into two types: components corresponding to noise objects and components corresponding to the highest brightness levels of any object (object tops). As we can observe, the number of components in all examples quickly decreases, for leaves and nodes in particular (see Fig. 3). For instance, for s = 3 the total number of components is about six times smaller than for s = 1.
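To make the area statistic concrete, a short sketch that computes the component-area histogram of Definition 3 and the share of 1-3 pixel components. The component list below is a toy stand-in of our own, merely mirroring the small-component dominance reported in Table 2:

```python
from collections import Counter

def area_distribution(components):
    """Histogram of component areas s(c_n), i.e. pixel counts (Definition 3)."""
    return Counter(len(c) for c in components)

# Toy stand-in for C(I): most components are tiny.
components = [
    {(0, 0)}, {(5, 5)},                 # two 1-pixel components
    {(1, 0), (2, 0)},                   # a 2-pixel component
    {(0, 1), (1, 1), (2, 1)},           # a 3-pixel component
    {(x, 9) for x in range(10)},        # one larger component
]
hist = area_distribution(components)
small = sum(count for area, count in hist.items() if area <= 3)
print(small, "of", len(components), "components have 1-3 pixels")
```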
Definition 4. The height h(c_n^i) of any non-leaf component c_n^i is defined as the longest level path from the component c_n^i down to a leaf component c_n^{i+q} in T(I).

An example of the height h(c_n) = 120 for a chosen component c_n is indicated by a dotted line in Fig. 1b. Note that the component height characterises the grey-level contrast of a component in comparison to its neighbourhood.

Fig. 3. Examples of the component area distribution for the Lena image (a) and pepper image (b).

The definitions of the area and height of tree components allow for introducing the concept of a "reduced component tree".

Definition 5. A reduced component tree TR(I, so, ho) is a component tree consisting of components c_n ∈ C(I) of area s(c_n) larger than or equal to so, s(c_n) ≥ so, and height h(c_n) larger than or equal to
ho, h(c_n) ≥ ho, where so, ho are predefined threshold values.

TR(I, so, ho) is constructed by removing components {c'_n} from the original tree T(I) if s(c'_n) < so and h(c'_n) < ho. An example of a reduced component tree is illustrated in Fig. 4.

The proposed idea introduces an intuitive concept of a "main stem" of a component tree, based on the following assumptions:

Assumption 1. For each component tree T(I) there exists a main stem structure MS(T) comprising the meaningful components of the entire image.

Assumption 2. For each component tree T(I) there exist parameters sp, hp, called pruning parameters, which define the main stem MS(T) as a reduced component tree TR(I, sp, hp).

The main stem characterises the entire image and should include only the components that are particularly important for image retrieval. Moreover, the leaf components of MS(T) can be interpreted as the regions of the most important image objects with regard to their size and brightness.

In this work we propose a heuristic approach to seek the pruning parameters. For this purpose we have carried out experimental studies on the dependence of the number of leaf components on the component area and height. Fig. 5 shows how the total number of leaf components c_l ∈ C(I) decreases for areas s(c_l) > smin, smin = 1, 2, ..., 100 and heights h(c_l) > hmin, hmin = 1, 2, ..., 100, respectively, for the Lena image.

The obtained results show that the histograms (see Fig. 5) can be divided into two intervals. The first interval corresponds to a range of small values of s and h, where the number of leaf components quickly decreases. The second interval corresponds to a range of larger values of s and h, where the number of leaf components decreases more slowly. Obviously, the pruning parameters depend on the image size. This work reports preliminary results and the question of the pruning parameter definition remains open. In the simplest case we propose to use an n% capturing rule (number of leaf components at sp (hp) / total number of leaves × 100% > n).
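A sketch of the pruning of Definition 5, together with one possible reading of the n% capturing rule. Since the exact rule is left open above, `pruning_parameter` is a hypothetical percentile-based variant of our own, not the paper's definitive procedure:

```python
def reduce_tree(components, area, height, s_o, h_o):
    """T_R(I, s_o, h_o): keep components with s(c) >= s_o and h(c) >= h_o
    (Definition 5)."""
    return [c for c in components if area[c] >= s_o and height[c] >= h_o]

def pruning_parameter(leaf_values, n):
    """Hypothetical n% capturing rule: threshold at the n-th percentile of the
    leaf areas (or heights), so roughly (100 - n)% of leaves survive pruning."""
    vals = sorted(leaf_values)
    k = min(len(vals) - 1, len(vals) * n // 100)
    return vals[k]
```

With, say, `n = 95`, only the largest (highest-contrast) five per cent of leaf components would be retained, which matches the intent of keeping the "meaningful" components only.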
4. Image indexing and matching

For image indexing we introduce a simplified description of MS(T). This description takes into account only the node and leaf components of the main stem.

Proposition. Any main stem can be represented as a set P of the shortest paths {p_i}, i = 1, 2, ..., L, from the root to each of the leaf components in MS(T), where L is the number of all leaf components in MS(T). The path p_i is a family of labels of the nodes from the root to the leaf component.
Fig. 4. Original component tree (a) and reduced component tree (b): each component c_n from T(I) is removed if its area s(c_n) is smaller than so and its height h(c_n) is smaller than ho = 120. The corresponding level set C(I, i) for i = 80 is shown in (c).
Fig. 5. The histogram of the number of leaf components {c_l} which have (a) area larger than smin, s(c_l) > smin; (b) height larger than hmin, h(c_l) > hmin (for the Lena image and pepper image). The parameters sp and hp are chosen from a 95% energy capturing rule.
In order to determine the labelling procedure we first introduce the notions of "father-node" and "son-node".

Definition 6. A father-node is a low-level node that comprises n high-level nodes (son-nodes).

Labelling of nodes is carried out by consecutive indexing of all son-nodes for each father-node, beginning from the root. The root is labelled as 1. For image matching we first build the path sets P^k = {p_i^k}, i = 1, 2, ..., L, for all K possible cases of node indexing, k = 1, 2, ..., K, for one of the stems. Examples of two possible cases of node component labelling and their path sets P1 and P2 are shown in Fig. 6.

For the comparison of stems an original matching algorithm is proposed. Let the path sets PA and PB represent the main stems MS(TA) and MS(TB) of images A and B, respectively. Let PA = {PAk}, k = 1, 2, ..., K, where K is the number of possible combinations of path indexing in MS(TA) of image A. The algorithm for matching any two given images A and B then proceeds as follows:

1. Choose the path set PAx of the sets {PAk}, k = 1, 2, ..., K, which is the most similar to the path set PB according to the criterion:

   PAx = max_k (PAk ∩ PB), k = 1, 2, ..., K, x ∈ {1, 2, ..., K}.

2. Determine the joint part of both stems as a set J defined as the intersection of the sets PAx and PB:

   J = PAx ∩ PB.

   Set J includes the identical paths within PAx and PB.

3. Determine a set D that includes the differing paths of the sets PB and PAx. Set D is defined as the union of the set differences:
Fig. 6. The node component labelling of MS(T) is shown in (a); the path sets P1 and P2 for the two possible cases of labelling are presented in (b):

i-th leaf   path set P1   path set P2
l1          (1,1)         (1,2)
l2          (1,1)         (1,2)
l3          (1,2)         (1,1)
l4          (1,2,1)       (1,1,1)
l5          (1,2,1)       (1,1,1)
l6          (1)           (1)
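The labelling and path-set construction can be sketched as follows. This is a simplified variant of our own in which every child, leaf components included, receives a consecutive index; the tree in the example is hypothetical, not the one of Fig. 6:

```python
from itertools import permutations

def path_sets(children, root):
    """All path sets {p_i} of a stem: one per way of consecutively indexing
    the son-nodes of every father-node (Definition 6); the root is labelled 1."""
    def expand(node, prefix):
        kids = children.get(node, [])
        if not kids:                      # leaf component: its path ends here
            return [frozenset([prefix])]
        options = []
        for order in permutations(kids):  # each indexing order of the son-nodes
            per_child = [expand(kid, prefix + (j + 1,))
                         for j, kid in enumerate(order)]
            combined = [frozenset()]
            for cs in per_child:          # one labelling choice per subtree
                combined = [a | b for a in combined for b in cs]
            options.extend(combined)
        return options
    return set(expand(root, (1,)))

# Hypothetical stem: root r with a two-leaf node a and a one-leaf node b.
children = {'r': ['a', 'b'], 'a': ['l1', 'l2'], 'b': ['l3']}
K = len(path_sets(children, 'r'))   # number of distinct path sets
```

For this tree there are two distinct path sets, corresponding to the two ways of indexing the son-nodes a and b, just as Fig. 6 shows two labellings P1 and P2 for its stem.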
   D = (PAx − PB) ∪ (PB − PAx).

4. Calculate the total number of nodes and leaves NJ of set J and the total number of nodes and leaves ND of set D, respectively.

5. Calculate a similarity measure of the two stems:

   S[MS(TA), MS(TB)] = (NJ + 1) / ((ND + 1)(NR + 1)), 0 < S ≤ 1,

where NR is the total number of nodes and leaves in the main stem with the larger number of these components. The measure S is defined in such a way that its values are limited to the range (0, 1], with 1 corresponding to the largest similarity.

5. Experimental results

In order to verify the proposed concept a computer program was written in C++. Details about this program are given in (Kowalski, 2004). The view of an application window is presented in Fig. 7. The developed application allows for:

(i) constructing a component tree of a grey-scale image,
(ii) defining a main stem,
(iii) counting the number of different types of components in a tree,
(iv) visualisation of leaf components (i.e. the segmented image).

The test images are 8 bits/pixel images from the USC-SIPI Image Database, University of Southern California.

Fig. 7. The view of an application window.
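The five matching steps can be sketched in Python. Since the paper does not spell out how NJ and ND are counted for a path set, `count_units` below is a hypothetical proxy of our own (one leaf per path plus the distinct node-label prefixes occurring in the set):

```python
def count_units(paths):
    """Hypothetical proxy for the 'total number of nodes and leaves' of a
    path set: one leaf per path plus the distinct node-label prefixes."""
    prefixes = {p[:k] for p in paths for k in range(1, len(p) + 1)}
    return len(paths) + len(prefixes)

def similarity(PA_sets, PB, NR):
    """S[MS(T_A), MS(T_B)] = (N_J + 1) / ((N_D + 1) * (N_R + 1)), 0 < S <= 1."""
    PAx = max(PA_sets, key=lambda PA: len(PA & PB))   # step 1: best labelling
    J = PAx & PB                                      # step 2: joint part
    D = (PAx - PB) | (PB - PAx)                       # step 3: differing paths
    NJ, ND = count_units(J), count_units(D)           # step 4
    return (NJ + 1) / ((ND + 1) * (NR + 1))           # step 5
```

Note that for identical stems D = ∅ and NJ = NR, hence S = 1, consistent with the stated range 0 < S ≤ 1.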
Fig. 8. The Lena image (a), main stem (b) of (a) (number of leaves = 18; pruning parameters: sp = 80, hp = 80). Image of leaf components of (b) is shown in (c).
Fig. 9. 256 × 256 Moon surface image (a). Image corrupted by 3% "salt and pepper" noise (b). The main stem for (a) and (b); pruning parameters: sp = 50, hp = 40.
Fig. 10. Scheme of the image matching procedure. Pruning parameters for the images: sp = 200, hp = 55. Dashed ellipses indicate the differences between the query image stem and the stems of the test images.
In this section we validate the proposed concept, combining the main stem representation with the defined distance measure, and demonstrate the applicability of the presented concept to image matching.
Table 3. Parameters of the component trees and stems for the test image set.

             Component tree                    Main stem (sp = 200, hp = 55)
Image        Total     Nodes   Leaves          Total   Nodes   Leaves
Query image  20,979    1692    14,246          165     7       8
Motion_1     20,728    1638    14,191          165     7       8
Motion_2     20,808    1668    14,328          186     7       10
Tiffany      48,567    6730    28,660          160     8       11
Tank         47,889    6886    26,557          201     11      12
Table 4. Experimental results for the test image set (256 × 256 images; main stem sp = 100, hp = 100). Similarity measure S:

             Lena    Bird    Bridge  Goldhill  Cameraman  Pepper
Lena         1       0.028   0.038   0.036     0.040      0.028
Bird         0.028   1       0.008   0.031     0.007      0.008
Bridge       0.028   0.008   1       0.013     0.055      0.030
Goldhill     0.036   0.031   0.013   1         0.017      0.014
Cameraman    0.040   0.007   0.055   0.017     1          0.022
Pepper       0.028   0.008   0.030   0.014     0.022      1
We start by constructing a main stem structure for the Lena image. The complete component tree of the 256 × 256 Lena image comprises 12,629 components, including 2140 nodes and 6059 leaves. Note that only those components whose areas differ from the respective component areas at the previous level are taken into account. The constructed main stem is shown in Fig. 8b. It consists of 461 components, including 16 nodes and 18 leaves. The MS(T) is a good description of the main topological discontinuities of the image, and its leaf components can be interpreted as the main image objects (see Fig. 8c).
As indicated, MS(T) is designed to be invariant to the shape, size, structure and rotation of the primary shapes of an image. The main stem image representation is also resistant to noise corruption. Fig. 9 shows the Moon surface test image and the same image with 3% of its pixels contaminated by "salt and pepper" noise. This noise density was the highest level at which the shape of the main stem was preserved.

We next demonstrate the applicability of the presented concept to the image retrieval task. The images are matched using the proposed measure S, and the matching results are presented in Fig. 10. The top image is the query image; in the lower left-hand side is the test image set. Each 512 × 512 image is first processed to construct a complete component tree, then the main stem is extracted. The parameters of the component trees and stems of the test image set are presented in Table 3. For demonstration purposes the pruning parameters were chosen so as to obtain main stems as small as possible while still preserving the key image features.

The feature vectors that represent the path sets Pi, i = 1, ..., 4 are next generated for each of these main stems, as is the path set PQ for the main stem of the input query image. The PQ of the query image comprises two path sets PQ1, PQ2. The calculated similarity measures S for the test images are 0.175 (motion_1), 0.1111 (motion_2), 0.0667 (Tiffany) and 0.0448 (tank), respectively. The obtained results are in accord with the visual similarity of the images and their main stems.

Table 4 also shows the similarity measures S for a pairwise comparison of the images in the test set. The values of the similarity measures are small because here we used stems comprising a small number of components. The greatest degree of similarity is between the cameraman image and the bridge image, and the smallest between the cameraman image and the bird image.
Note that the matching scheme is based on a comparison of measured image parameters, which in the general case can differ from a comparison of the images as performed by the human brain.
6. Conclusion

A novel image matching concept was proposed, in which the image representation is based on a reduced component tree structure termed the main stem. In this approach images are matched via a similarity measure determined between stems. The main stem structure is universal in the sense that it allows for identifying or distinguishing objects of arbitrary shapes, i.e., no restrictions on shapes are required. The proposed similarity measure based on the image main stem enables two key image query tasks to be performed:

• recognition of perceptually similar images that are not identical,
• detection of the significant objects of an image.

The proposed image representation is also invariant to shift and rotation of image objects and is resistant to noise.

We view this work as the first step in an extensive research effort aimed at testing the proposed concept on larger data sets. The paper reports preliminary results of using the component tree structure for image description and matching. However, several research questions remain open. One of the main difficulties is the definition of the pruning parameters for constructing the main stem. Another problem is the accuracy of image indexing based on this representation. We also conclude that the main stem representation of images can be extended to represent different image categories, for example those containing buildings, people, vehicles, etc.

References

Acton, S.T., Mukherjee, D.P., 2000. Area operators for edge detection. Patt. Recog. Lett. 21, 771–777.
Bertrand, G., Everat, J.C., Couprie, M., 1997. Topological grayscale watershed transformation. Vis. Geometry VI, SPIE 3168, 136–146.
Carson, C., Belongie, S., Greenspan, H., Malik, J., 1997. Region-based image querying. In: Proc. IEEE Workshop on Content-Based Access of Image and Video Libraries (CVPR'97), pp. 42–49.
Flickner, M., Sawhney, H., Niblack, W., Ashley, J., Huang, Q., Dom, B., et al., 1995. Query by image and video content: the QBIC system. IEEE Comput. 28 (9), 23–32.
Kowalski, T., 2004. Image processing based on a tree component analysis. M.Sc. thesis (supervisor V. Mosorov). Technical University of Lodz, Poland.
Loncaric, S., Dhawan, A.P., 1993. A morphological signature transform for shape description. Patt. Recog. 26 (7), 1029–1037.
Ma, W.-Y., Manjunath, B.S., 1999. NeTra: A toolbox for navigating large image databases. Multimedia Syst. 7, 184–198.
Najman, L., Couprie, M., 2004. Quasi-linear algorithm for the component tree. In: Vision Geometry XII, part of the IS&T/SPIE Symposium on Electronic Imaging, CA, USA.
Wang, J.Z., Wiederhold, G., Firschein, O., Wei, S.X., 1997. Content-based image indexing and searching using Daubechies' wavelets. Int. J. Digital Libraries 1, 311–338.
Zhou, X.S., Huang, T.S., 2001. Edge-based structural features for content-based image retrieval. Patt. Recog. Lett. 22 (5), 457–468.