J. Vis. Commun. Image R. 23 (2012) 18–30
A multiresolution approach for rotation invariant texture image retrieval with orthogonal polynomials model

R. Krishnamoorthi*, S. Sathiya devi
Vision Lab, Department of Computer Science and Engineering, Anna University, Thiruchirappalli, Tamilnadu, India
Article info
Article history: Received 22 March 2010; Accepted 21 July 2011; Available online 3 August 2011.
Keywords: Texture image retrieval; Orthogonal polynomials; Rotation invariant; Gabor wavelet; Contourlet Transform; Multiresolution; Feature extraction; Canberra distance.
Abstract
In this paper, a simple and efficient Content Based Image Retrieval (CBIR) scheme based on an orthogonal polynomials model is presented. The model is built with a set of carefully chosen orthogonal polynomials and is used to extract the low level texture features present in the image under analysis. The orthogonal polynomials model coefficients are reordered into a multiresolution subband like structure. Simple statistical and perceptual properties are derived from the subband coefficients to represent the texture features, and these properties form a feature vector. The efficiency of the proposed feature vector extraction for texture image retrieval is evaluated on the standard Brodatz and MIT VisTex texture database images with the Canberra distance measure. The proposed method is compared with existing retrieval schemes such as the Discrete Cosine Transform (DCT) based multiresolution subband, Gabor wavelet and Contourlet Transform based retrieval schemes, and is found to outperform these schemes with less computational cost.
© 2011 Elsevier Inc. All rights reserved.
1. Introduction

With the advent of information technology, innumerable digitized images, photos and videos are being produced every day. This exponential growth has created a high demand for efficient tools for image searching, browsing and retrieval in various domains such as remote sensing, fashion, crime prevention, publishing, medicine and architecture. This issue has been addressed by an integrated framework called Content Based Image Retrieval (CBIR). This framework utilizes the low level features that are inherent in the images, such as color, shape and texture. Among these three features, texture is a prominent and indispensable feature for content based image indexing and retrieval [1]. Though texture has distinct properties such as direction, periodicity, coarseness and pattern complexity, it is perceived by the Human Visual System (HVS) chiefly on the aspects of orientation and scale. The earlier CBIR systems such as QBIC [2], NETRA [3], Photobook [4] and VisualSeek [5] employed texture as one of the dominant features for image retrieval. But most of these systems assume that the images are in the same orientation and scale. This assumption does not hold in many situations, and the retrieval rate may be affected as a result. In order to avoid this problem, several methods are reported in the literature, and a comparative study of these methods is presented in [6]. Among these methods, the Gabor wavelet
based representation and extraction of texture features is found to be attractive, but it is variant to rotation and scale and results in fairly poor retrieval performance. In order to avoid this, a modified Gabor representation is introduced in [7] that utilizes very few summations of conventional Gabor filter impulse responses for rotation and scale invariant texture image retrieval. Another variation of the Gabor filter, called the Circular Gabor Filter (CGF), is proposed in [8], wherein the traditional Gabor function is modified into a circular symmetric version and rotation invariant texture features are achieved via the channel output of the CGF for efficient texture segmentation. Zhang et al. [9] modeled rotation invariant texture feature extraction based on Gabor texture features, wherein rotation normalization is realized by a circular shift of the feature elements so that all the images have the same dominant direction. Cui et al. [10] proposed a rotation and scale invariant feature set based on the multiscale Radon transform. The feature matrix derived from the Radon wavelet domain is not only rotation and scaling invariant but also aims to reflect the different energy distributions of the texture image at different scales. Pun [11] extracted rotation invariant texture features using a polar transform followed by an adaptive row shift invariant wavelet packet transform. The energy signatures are computed from the wavelet packet transform and act as the feature vector for image retrieval. This method was claimed to outperform the Multi Channel Gabor Filtering (MCGF) and Wavelet Packet Signature (WPS) algorithms. Haley and Manjunath [12] used the magnitude of a Discrete Fourier Transform (DFT) with multiresolution filtering for rotation invariant texture classification using a complete space
frequency model. This model utilizes the polar-analytic form of a two-dimensional (2-D) Gabor wavelet to compute rotationally invariant, spatially localized amplitude, frequency and directional behavior of the texture, and produces a high classification rate for large sample sets. Milanese and Cherbuliez [13] utilized the Fourier power spectrum by mapping from Cartesian to logarithmic-polar coordinates. The Fourier power spectrum coefficients were similar to wavelet coefficients and were used to represent shape, color and texture features. They claimed that these features capture perceptually similar images and are also invariant to 2D rigid transformations, such as any combination of rotation, scaling and translation. In order to achieve a quality rotation and scale invariant statistical view of texture classification and retrieval, Muneeswaran et al. [14] combined the Gaussian, Mexican Hat and Daubechies wavelets to derive an invariant texture feature set. In this work, rotational invariance is achieved by using the wavelets with their directional properties, and scale invariance is achieved by a method that extends fractal dimension (FD) features. The first and second order statistical properties were taken for characterizing the feature vector. To reduce the computational complexity of the Gabor wavelets, the Dual Tree Complex Wavelet Transform (DTCWT) [15] was introduced to derive a scale and rotation invariant feature vector. Kokare et al. [16] proposed a combination of Dual Tree Rotated Complex Wavelet Filters (DTRCWF) and the Dual Tree Complex Wavelet Transform (DT-CWT) for efficient rotation invariant texture feature extraction. The new rotated complex wavelet filter is strongly oriented in six different directions and characterizes highly oriented textures. The energy features are obtained on each subband of the decomposed image in 12 different directions. Chen and Kundu [17] modeled the features of wavelet subbands with a Hidden Markov Model (HMM) for deriving rotation invariant texture features. Deng and Clausi [18] developed an anisotropic Circular Gaussian Markov Random Field (ACGMRF) model for retrieving rotation invariant texture features; the rotation invariant features are obtained from the ACGMRF model parameters with a Discrete Fourier Transform. They claimed that this method outperforms the Laplacian pyramid, Isotropic Circular GMRF (ICGMRF) and gray level co-occurrence probability features. A rotation invariant image retrieval scheme based on a transformation of the texture information via a steerable pyramid is introduced in [19]. In this scheme, a joint alpha-stable sub-Gaussian model of the subband coefficients captures their non-Gaussian behavior, and the normalized features are matched by minimizing a rotation invariant version of the Kullback-Leibler divergence. Recently, Rao et al. [20] have used the Contourlet Transform for image retrieval with the Manhattan distance for similarity, and claimed that the combination of Contourlet Transform based feature extraction with the Manhattan distance yields higher retrieval performance than Gabor-Zernike features. In practical applications, it is very difficult to ensure that the captured images are in the same orientation and scale as each other. Hence invariant texture analysis is highly desirable from both practical and theoretical viewpoints.
Though wavelet based multiscale analysis has been reported for texture image retrieval, the existing wavelet based methods exploit multiple stages of transformation to achieve rotation invariance, at the expense of high computational complexity and degraded retrieval performance. Hence in this paper, we propose a new rotation invariant texture feature extraction scheme that works directly on the orthogonal polynomials model coefficients, incorporating a multiresolution approach. The proposed method concentrates on both the perceptual and the statistical properties of the texture, namely coarseness, contrast, direction, mean, standard deviation and energy. This paper is organized as follows. In Section 2, a detailed description of the orthogonal polynomials model and the reordering
of these coefficients into a multiresolution subband like structure is presented. The rotation invariant feature vector extraction process for texture image retrieval is presented in Section 3. Section 4 discusses the similarity and performance measures of the CBIR system. The experimental results of the proposed method are presented in Section 5, together with experiments on the existing schemes and their comparison with the proposed scheme. Finally, the conclusion is presented in Section 6.

2. Multiresolution reordering with orthogonal polynomials model coefficients

In this section, the orthogonal polynomials model and the reordering of the orthogonal polynomials coefficients into a multiresolution subband structure are presented. Multiresolution analysis plays a vital role in the Human Visual System (HVS), and experimental studies have shown that the eye's sensitivity to a visual stimulus strongly depends upon the spatial frequency contents of the stimulus. Hence, the HVS motivates the use of multiscale image decompositions as a front end to complex image processing algorithms. In general, the multiscale decomposition of an image can be viewed as a linear transformation of the original pixel values into a set of bases of transformed coefficients. The connection between multiscale decompositions and the bases for image representation shows that images are sparse linear combinations of elementary images. Consider the following representation of a signal x(t) defined over some domain I:
x(t) = Σ_k a_k u_k(t),   t ∈ I   (1)
Here, u_k(t) is termed the basis function and the a_k are the coefficients of the signal x(t) in the basis a = {u_k(t)}. For a discrete (P × Q) image, let the variable t in Eq. (1) be the pair of integers (n_1, n_2) and the domain of x be I = {0, 1, ..., N - 1} × {0, 1, ..., M - 1}; then the basis is said to be discrete. Hence the subband decomposition of an image can be viewed as a linear transformation of the original P × Q pixel values x(t) into a set of P × Q subband coefficients a_k. In this proposed work, image analysis is considered as a linear transformation of the original pixel values with a set of orthogonal polynomial functions. The orthogonal polynomials, which have already been well established for image coding [21-23], are extended here to the proposed CBIR system. In order to analyze the content of an image for the proposed CBIR system, a linear 2-D image formation system is considered, built around a Cartesian coordinate separable, blurring, point spread operator in which the image I results from the superposition of point source impulses weighted by the value of the object f. The object function f is expressed in terms of derivatives of the image function I relative to its Cartesian coordinates. This function is very useful for analyzing the low level features of the image. The point spread function M(x, y) can be considered to be a real valued function defined for (x, y) ∈ X × Y, where X and Y are ordered subsets of real values. In the case of a gray-level image of size (n × n), where X (rows) consists of a finite set, which for convenience can be labeled as {0, 1, ..., n - 1}, the function M(x, y) reduces to a sequence of functions
M(i, t) = u_i(t),   i, t = 0, 1, ..., n - 1   (2)
The linear two dimensional transformation can be defined by the point spread operator M(x, y) (with M(i, t) = u_i(t)) as
β′(ζ, η) = ∫_{x∈X} ∫_{y∈Y} M(ζ, x) M(η, y) I(x, y) dx dy   (3)
Considering both X and Y to be a finite set of values {0, 1, 2, ..., n - 1}, Eq. (3) can be written in matrix notation as follows:
|β′_ij| = (|M| ⊗ |M|)′ |I|   (4)

where ⊗ is the outer product, the β′_ij, arranged in the dictionary sequence, are the coefficients of the transformation, I is the image and the point spread operator |M| is

|M| =
  u_0(t_1)   u_1(t_1)   ...   u_{n-1}(t_1)
  u_0(t_2)   u_1(t_2)   ...   u_{n-1}(t_2)
   ...         ...      ...      ...
  u_0(t_n)   u_1(t_n)   ...   u_{n-1}(t_n)     (5)

We consider the set of orthogonal polynomials u_0(t), u_1(t), ..., u_{n-1}(t) of degrees 0, 1, 2, ..., n - 1, respectively, to construct the polynomial operators of different sizes from Eq. (4) for n ≥ 2 and t_i = i. The generating formula for the polynomials is as follows:

u_{i+1}(t) = (t - μ) u_i(t) - b_i(n) u_{i-1}(t)   for i ≥ 1,
u_1(t) = t - μ,   u_0(t) = 1   (6)

where

b_i(n) = ⟨u_i, u_i⟩ / ⟨u_{i-1}, u_{i-1}⟩ = Σ_{t=1}^{n} u_i²(t) / Σ_{t=1}^{n} u_{i-1}²(t)

and μ = (1/n) Σ_{t=1}^{n} t. Considering the range of values of t to be t_i = i, i = 1, 2, 3, ..., n, we get

b_i(n) = i²(n² - i²) / (4(4i² - 1))   and   μ = (1/n) Σ_{t=1}^{n} t = (n + 1)/2

2.1. The orthogonal polynomials basis

For the sake of computational simplicity, the finite Cartesian coordinate set X, Y is labeled as {1, 2, 3}. The point spread operator in Eq. (3) that defines the linear orthogonal transformation for image analysis can then be obtained as M ⊗ M, where M can be computed and scaled from Eq. (4) as follows:

        u_0(x_0)  u_1(x_0)  u_2(x_0)       1  -1   1
|M| =   u_0(x_1)  u_1(x_1)  u_2(x_1)   =   1   0  -2     (7)
        u_0(x_2)  u_1(x_2)  u_2(x_2)       1   1   1

The set of polynomial operators O^n_ij (0 ≤ i, j ≤ n - 1) can be computed as

O^n_ij = û_i ⊗ û_j^t

where û_i is the (i + 1)th column vector of |M|. For example, the polynomial basis operators of size (3 × 3), with matrix rows separated by semicolons, are

[O³_00] = (1 1 1; 1 1 1; 1 1 1),   [O³_01] = (-1 0 1; -1 0 1; -1 0 1),   [O³_02] = (1 -2 1; 1 -2 1; 1 -2 1),
[O³_10] = (-1 -1 -1; 0 0 0; 1 1 1),   [O³_11] = (1 0 -1; 0 0 0; -1 0 1),   [O³_12] = (-1 2 -1; 0 0 0; 1 -2 1),
[O³_20] = (1 1 1; -2 -2 -2; 1 1 1),   [O³_21] = (-1 0 1; 2 0 -2; -1 0 1),   [O³_22] = (1 -2 1; -2 4 -2; 1 -2 1)

It can be shown that a set of (n × n) (n ≥ 2) polynomial operators forms a basis, which is complete, linearly independent and orthogonal.
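To make the construction concrete, the following minimal sketch (in Python with NumPy, an illustration of this presentation rather than anything prescribed by the paper) builds |M| of Eqs. (6)-(7) via the three-term recurrence and scales each column to its smallest integer form; the operators O^n_ij then follow as outer products of its columns. The function name poly_basis is ours.

```python
import numpy as np

def poly_basis(n):
    """Point spread operator |M| of Eqs. (6)-(7): column j holds the
    orthogonal polynomial u_j evaluated at t = 1..n, scaled to the
    smallest integer form (exact for the small n used here)."""
    t = np.arange(1, n + 1, dtype=float)
    mu = t.mean()                              # mu = (n + 1) / 2
    u = [np.ones(n), t - mu]                   # u_0(t) = 1, u_1(t) = t - mu
    for i in range(1, n - 1):
        b = (u[i] @ u[i]) / (u[i - 1] @ u[i - 1])   # b_i(n) of Eq. (6)
        u.append((t - mu) * u[i] - b * u[i - 1])
    M = np.column_stack(u)
    for j in range(n):                         # integer scaling, as in Eq. (7)
        col = M[:, j]
        M[:, j] = np.round(col / np.min(np.abs(col[col != 0])))
    return M

M3 = poly_basis(3)
print(M3)                               # [[ 1. -1.  1.] [ 1.  0. -2.] [ 1.  1.  1.]]
O_11 = np.outer(M3[:, 1], M3[:, 1])     # operator O^3_11 = u_1 (outer) u_1^t
print(O_11)                             # [[ 1.  0. -1.] [ 0.  0.  0.] [-1.  0.  1.]]
```

The printed operator matches [O³_11] above; the other eight (3 × 3) operators follow from the remaining column pairs.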
2.2. Multiresolution reordering

The orthogonal polynomials coefficients β′_ij obtained in the previous subsection are reordered to provide image subbands in a multiresolution decomposition like structure. In the proposed method, the transformed coefficients β′_ij are reordered into (3log₂N + 1) multiresolution subbands for each (N × N) block, where N is a power of 2. The following convention is used to reorder the coefficients β′_ij into the required multiresolution form. For a coefficient β′_ij, let 2^{a-1} ≤ i < 2^a and 2^{b-1} ≤ j < 2^b, where a and b are integers and i, j = 0, ..., (N - 1). Then the coefficient β′_ij is stored into a particular subband S_y, with y computed as

y = 0                                 for m = 0
y = 3(m - 1) + 2(a/m) + (b/m)         otherwise   (8)

where m = max(a, b). A similar process is repeated for each (N × N) block of the whole image of size (R × C), where R is the height and C is the width. The reordered location R of the coefficient β′_ij of the block B(z, w), where z and w represent the row and column value of the block B, is then determined as

R = (2^{m-1} z + i - 2^{a-1}, 2^{m-1} w + j - 2^{b-1})   (9)

For example, considering an (8 × 8) block, the rearrangement of the transformed orthogonal polynomials coefficients into ten multiresolution subbands is shown in Fig. 1; these subbands correspond to different sets of scales and orientations (different spatial frequency bands). The original transform coefficients are shown in Fig. 1(a) and the corresponding multiresolution representation of the subbands is presented in Fig. 1(b).

Fig. 1. (a) Transformed coefficients of an (8 × 8) block and (b) the reordered structure.
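A small sketch of the reordering rule of Eq. (8) follows (Python/NumPy, again as an illustration; reading a/m and b/m as integer divisions is our assumption, chosen because it reproduces the ten-subband layout of Fig. 1(b)):

```python
import numpy as np

def subband_index(i, j):
    """Subband number y of coefficient beta'_ij per Eq. (8):
    a, b are the integers with 2^(a-1) <= i < 2^a and 2^(b-1) <= j < 2^b
    (taking a = 0 for i = 0), and m = max(a, b)."""
    a = 0 if i == 0 else int(np.log2(i)) + 1
    b = 0 if j == 0 else int(np.log2(j)) + 1
    m = max(a, b)
    return 0 if m == 0 else 3 * (m - 1) + 2 * (a // m) + (b // m)

N = 8   # one (8 x 8) block -> 3 * log2(8) + 1 = 10 subbands S0..S9
print(np.array([[subband_index(i, j) for j in range(N)] for i in range(N)]))
```

The printed map groups the 64 coefficients into the subbands S0-S9 of Fig. 1(b), with S1, S4, S7 along the top (horizontal detail), S2, S5, S8 along the left (vertical detail) and S3, S6, S9 on the diagonal.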
3. Proposed rotation invariant texture feature extraction

In this section, the process of extracting features from the texture primitives present in the image under analysis, based on the orthogonal polynomials model with the multiresolution reordered subbands, is presented. Psychophysical studies reveal that the human brain performs a frequency analysis when it perceives an image, and the energy distribution in the frequency domain identifies the texture [24]. In order to simulate the human brain and the HVS, the texture feature extraction process in this proposed method is based on a combination of statistical, directional and perceptual properties of the multiresolution reordered subbands. The statistical property of the texture is characterized by the well known second order measures, namely mean (μ_k), standard deviation (σ_k) and energy (E_k). Inspired by Tamura et al. [25], the perceptual properties, namely coarseness, contrast and the direction of the texture in the horizontal, vertical and diagonal directions, are extracted from the subband coefficients. In the orthogonal polynomials model, it is observed that when a block of an image is rotated, the coefficients' magnitudes remain unaltered but their positions and signs vary; at the same time, the absolute difference between the corresponding coefficients of the original and the rotated block in the zig-zag sequence is zero. This is presented in Appendix A. Hence the proposed texture feature extraction considers the absolute values of the subband coefficients for extracting rotation invariant texture features. In the proposed work, the image under analysis is reordered into 10 multiresolution subbands, named S0, S1, S2, ..., S9 as in the case of a three level wavelet decomposition. The subband S0 of the reordered multiresolution structure is a coarse version of the original image and is modeled to carry the frequency and energy distribution. The remaining subbands S_i, i = 1, 2, ..., 9 are utilized to extract texture features in the vertical, horizontal and diagonal directions. The global rotation invariant texture feature vector f is the combination of the statistical (F_s), directional (F_d) and perceptual (F_p) feature vectors. The flowchart representing the proposed feature extraction process is shown in Fig. 2. Let X and Y be the size of each subband and β′_ij be the subband coefficients, where i and j represent the row and column values of the subband images, respectively. The statistical properties, namely mean (μ_k), standard deviation (σ_k) and energy (E_k), for each subband are modeled in terms of the transform coefficients as
μ_k = (1/(XY)) Σ_{i=0}^{X} Σ_{j=0}^{Y} |β′_ij|   (10)

σ_k = ((1/(XY)) Σ_{i=0}^{X} Σ_{j=0}^{Y} (β′_ij - μ_k)²)^{1/2}   (11)

E_k = Σ_{i=0}^{X} Σ_{j=0}^{Y} |β′_ij|²   (12)

where k denotes the specific subband.

Fig. 2. Flowchart of the proposed feature extraction method: the input image is converted to gray scale if it is a color image; the orthogonal polynomials transformation is applied and the coefficients are reordered into a three level multiresolution subband like structure; statistical features (mean, standard deviation and energy) are computed from the subbands S0, S1, ..., S9; direction features are computed in the horizontal (S7, S4, S1), vertical (S2, S5, S8) and 45°/135° diagonal (S3, S6, S9) directions; contrast and coarseness are computed from the S0 subband; and all the features are combined into a global texture feature vector of dimension 41.

Based on Eqs. (10)-(12), the statistical texture features F_s are calculated for all 10 subbands, resulting in a feature vector of dimension 30 (10 × 3 = 30) represented as follows:

F_s = (E_0, μ_0, σ_0, E_1, μ_1, σ_1, ..., E_9, μ_9, σ_9)   (13)
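For illustration, the per-subband statistics of Eqs. (10)-(12) can be computed as below (a Python/NumPy sketch; `bands` is the subband index map from the reordering sketch in Section 2, and reading the energy as a squared-magnitude sum is our assumption, since the exponent is not recoverable from the source):

```python
import numpy as np

def subband_stats(coeffs, bands, k):
    """Mean, standard deviation and energy of subband S_k, Eqs. (10)-(12).
    `coeffs` holds the reordered transform coefficients of the image and
    `bands` the subband index of each coefficient position."""
    s = coeffs[bands == k].astype(float)
    mu = np.abs(s).mean()                       # Eq. (10)
    sigma = np.sqrt(np.mean((s - mu) ** 2))     # Eq. (11)
    energy = np.sum(np.abs(s) ** 2)             # Eq. (12), square assumed
    return energy, mu, sigma

# F_s of Eq. (13): (E_k, mu_k, sigma_k) stacked over the ten subbands
# Fs = np.concatenate([subband_stats(coeffs, bands, k) for k in range(10)])
```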
Since the directional property is an important characteristic of texture images, the proposed feature vector also concentrates on the (i) horizontal, (ii) vertical and (iii) diagonal directions represented in terms of the orthogonal polynomials subband coefficients; their extraction is described in the subsequent subsections.

3.1. The horizontal direction

In this proposed method, the horizontal direction of the texture is computed in the S7, S4 and S1 subbands. In order to increase the contrast of these subbands, the subband coefficients are first convolved with a template T as described in [26]:

H(i, j) = |S(i, j)| * T   (14)

where T is a template mask defined as

T = (-1 0 1; -1 0 1; -1 0 1)   (15)

S(i, j) is the subband coefficients and H(i, j) is the result of the convolution. The convolution result is normalized for every row as follows:

H_n(i, j) = |H(i, j)| / Σ_{j=0}^{Y} |H(i, j)|   (16)
We then extract the horizontal direction Hd of the texture features as:
H_d = (1/(XY)) Σ_{i=0}^{X} Σ_{j=0}^{Y} |H_n(i, j)|   (17)
where X and Y are the size of the subband.
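A sketch of Eqs. (14)-(17) for one subband follows (Python with SciPy's ndimage convolution; the border handling and the guard against zero rows are our assumptions):

```python
import numpy as np
from scipy.ndimage import convolve

T = np.array([[-1, 0, 1],
              [-1, 0, 1],
              [-1, 0, 1]], dtype=float)          # template mask of Eq. (15)

def horizontal_direction(S):
    """H_d of Eq. (17) for a subband S (S7, S4 or S1): convolve |S| with T
    (Eq. (14)), normalize every row (Eq. (16)), then average."""
    H = np.abs(convolve(np.abs(S), T, mode='constant'))
    row_sums = H.sum(axis=1, keepdims=True)
    Hn = np.divide(H, row_sums, out=np.zeros_like(H), where=row_sums != 0)
    return Hn.mean()

# the vertical direction V_d (Eqs. (18)-(20)) follows the same steps with
# the 90-degree rotated template np.rot90(T) applied to S2, S5 and S8
```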
3.2. The vertical direction

The vertical direction V(i, j) is extracted similarly to the horizontal direction but with the S2, S5 and S8 subbands, as given below:

V(i, j) = |S(i, j)| * T_90   (18)

where T_90 is the 90° rotated version of the template mask presented in Eq. (15), S(i, j) is the subband coefficients and V(i, j) is the result of the convolution. The convolution result V(i, j) is normalized for every column as follows:

V_n(i, j) = |V(i, j)| / Σ_{i=0}^{X} |V(i, j)|   (19)

The vertical direction V_d is now computed as

V_d = (1/(XY)) Σ_{i=0}^{X} Σ_{j=0}^{Y} |V_n(i, j)|   (20)

where X and Y are the size of the subband.

3.3. The diagonal direction

In this proposed method, the diagonal direction is extracted at two angles, 45° (diagonal) and 135° (anti-diagonal), in the S3, S6 and S9 subbands. The procedure employed for calculating the horizontal or vertical direction is repeated for computing the diagonal direction of the subband for 45° and 135°, respectively, with different templates. The diagonal direction D_45(i, j) for 45° is calculated as follows:

D_45(i, j) = |S(i, j)| * T_45   (21)

where

T_45 = (0 0 1; 0 1 0; 1 0 0)

Then the normalized convolution D_45n(i, j) is represented as

D_45n(i, j) = |D_45(i, j)| / Σ_{j=0}^{Y} |D_45(i, j)|   (22)

Now the diagonal direction D_45 is computed as

D_45 = (1/(XY)) Σ_{i=0}^{X} Σ_{j=0}^{Y} |D_45n(i, j)|   (23)

where X and Y are the size of the subband. The diagonal direction for 135° is computed as follows:

D_135(i, j) = |S(i, j)| * T_135   (24)

where

T_135 = (1 0 0; 0 1 0; 0 0 1)

and the normalized diagonal direction D_135n(i, j) is

D_135n(i, j) = |D_135(i, j)| / Σ_{j=0}^{Y} |D_135(i, j)|   (25)

Now the diagonal direction for 135° is computed as

D_135 = (1/(XY)) Σ_{i=0}^{X} Σ_{j=0}^{Y} |D_135n(i, j)|   (26)

where X and Y are the size of the subband. The overall diagonal direction D_d is defined as the average of the diagonal directions for 45° and 135° and is computed as follows:

D_d = (1/2)(D_45 + D_135)   (27)

Hence the direction feature F_d is represented as

F_d = (H_d, V_d, D_d)   (28)

3.4. Contrast

Contrast is defined as the local variation of each pixel with respect to its neighborhood and is used for denoting the change of gray levels in a texture. In this proposed method, the subband S0 is used for calculating the contrast, since it contains the approximation coefficients representing the global texture information. The contrast L_con is calculated similarly to the method presented in [27]:

L_con(i, j) = (max_{S∈W×W}(S) - min_{S∈W×W}(S)) / (max_{S∈W×W}(S) + min_{S∈W×W}(S))   (29)

where S represents the coefficients in the S0 subband, i, j are the row and column values of the subband and (W × W) is the neighborhood mask size. The global contrast G_con is defined as the mean of all the local contrasts:

G_con = (1/(XY)) Σ_{i=1}^{X} Σ_{j=1}^{Y} L_con(i, j)   (30)
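A sketch of the contrast measure of Eqs. (29)-(30) (Python/SciPy; the (3 × 3) default mask size and the zero-denominator guard are our assumptions, since the paper leaves W as a parameter):

```python
import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter

def global_contrast(S0, W=3):
    """G_con of Eq. (30): the local max-min contrast of Eq. (29) over a
    (W x W) neighborhood of the S0 subband, averaged over the subband."""
    S = np.asarray(S0, dtype=float)
    mx = maximum_filter(S, size=W)        # max over the (W x W) mask
    mn = minimum_filter(S, size=W)        # min over the (W x W) mask
    denom = mx + mn
    lcon = np.divide(mx - mn, denom, out=np.zeros_like(S), where=denom != 0)
    return lcon.mean()
```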
3.5. Coarseness

Since coarseness represents the granularity of the texture, we propose to extract the coarseness feature from the S0 subband, similarly to the method presented in [25]. The algorithmic approach for calculating the coarseness is presented below.
Algorithm
Input: Image of size (M × M).
Output: Global coarseness G_coar.
Begin
(i) Divide the (M × M) image into K sub images in which each element is the average of the values in a neighboring (2^k × 2^k) block, where k = 1, 2, 3, ..., K. The average of the subband coefficients, A_k(x, y), is computed as follows:

A_k(x, y) = (1/2^{2k}) Σ_{i=x-2^{k-1}}^{x+2^{k-1}-1} Σ_{j=y-2^{k-1}}^{y+2^{k-1}-1} |S(i, j)|   (31)

where S(i, j) are the coefficients of the S0 subband and K = 4.
(ii) Compute the differences between pairs of averages that correspond to non overlapping neighborhoods in both the horizontal and vertical orientations:

E_{k,horizontal}(x, y) = |A_k(x + 2^{k-1}, y) - A_k(x - 2^{k-1}, y)|   (32)

E_{k,vertical}(x, y) = |A_k(x, y + 2^{k-1}) - A_k(x, y - 2^{k-1})|   (33)

(iii) For each point, select the best size of the neighborhood mask as the one giving the maximum difference among adjacent regions, B_best(x, y) = 2^k, where k maximizes the differences.
(iv) Compute the global coarseness value as the average of B_best:

G_coar = (1/(XY)) Σ_{i=1}^{X} Σ_{j=1}^{Y} B_best(i, j)   (34)

End.
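Steps (i)-(iv) translate directly into the following deliberately unoptimized Python sketch; defaulting to k = 1 at border points whose 2^k neighborhoods fall outside the subband is our own choice:

```python
import numpy as np

def coarseness(S0, K=4):
    """Tamura-style coarseness of the S0 subband, Eqs. (31)-(34)."""
    S = np.abs(np.asarray(S0, dtype=float))
    X, Y = S.shape
    best = np.zeros((X, Y))
    for x in range(X):
        for y in range(Y):
            emax, kbest = -1.0, 1
            for k in range(1, K + 1):
                h = 2 ** (k - 1)
                if x - 2 * h < 0 or x + 2 * h > X or y - 2 * h < 0 or y + 2 * h > Y:
                    break
                # A_k of Eq. (31): mean over a 2^k x 2^k block around a point
                avg = lambda cx, cy: S[cx - h:cx + h, cy - h:cy + h].mean()
                e_h = abs(avg(x + h, y) - avg(x - h, y))    # Eq. (32)
                e_v = abs(avg(x, y + h) - avg(x, y - h))    # Eq. (33)
                if max(e_h, e_v) > emax:
                    emax, kbest = max(e_h, e_v), k
            best[x, y] = 2 ** kbest                         # B_best(x, y)
    return best.mean()                                      # G_coar, Eq. (34)
```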
Based on Eqs. (30) and (34), the perceptual feature vector F_p is computed and represented as

F_p = (G_con, G_coar)   (35)
Thus the global rotation invariant texture feature vector f of dimension 41 is represented as follows:

f = (F_s, F_d, F_p)   (36)

4. Similarity and performance measure

Having designed the feature extraction process in the previous section, our next aim is to retrieve relevant images from the database against the query image using a similarity measure, since this is a key component of any Content Based Image Retrieval system. In the proposed retrieval scheme, the similarity between the query image and the images present in the database is calculated using the well known Canberra distance metric shown in Eq. (37), as it is reported to be the best among different distance metrics [28]:

d_c(x, y) = Σ_{i=1}^{d} |x_i - y_i| / (|x_i| + |y_i|)   (37)

where x_i is the feature vector of the query image Q, y_i is the feature vector of an image I in the database and d is the size of the feature vector. In the above equation, the numerator signifies the difference and the denominator normalizes the difference. Thus the distance value never exceeds one whenever either of the feature components is zero, and the normalization also reduces the scaling effect. The performance of the proposed method is measured in terms of precision and recall, defined as follows:

Recall = R_i / T,   Precision = R_i / T_i   (38)

where R_i is the number of relevant retrieved images, T is the total number of relevant images in the image database, and T_i is the number of all retrieved images. The performance is also measured in terms of the average recognition rate for some combinations of features, the recognition rate for the top matches considered, the retrieval time and the indexing time. The average retrieval rate is defined as the percentage of retrieved images in the top matches which belong to the same class as the query image.
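For illustration, the Canberra matching of Eq. (37) and the precision/recall of Eq. (38) can be sketched as follows (Python/NumPy; the set-based bookkeeping is our own):

```python
import numpy as np

def canberra(x, y):
    """Canberra distance of Eq. (37) between two feature vectors."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    denom = np.abs(x) + np.abs(y)
    return np.sum(np.divide(np.abs(x - y), denom,
                            out=np.zeros_like(denom), where=denom != 0))

def precision_recall(retrieved, relevant):
    """Eq. (38): R_i relevant retrieved images, T relevant images in the
    database, T_i retrieved images."""
    r_i = len(set(retrieved) & set(relevant))
    return r_i / len(retrieved), r_i / len(relevant)

# retrieval: rank the database by distance to the query feature vector f_q
# ranked = sorted(database, key=lambda item: canberra(f_q, item.features))
```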
5. Experiments and results

The orthogonal polynomials based texture image retrieval is experimented with standard images taken from the popular Brodatz [29] and MIT Vision Texture (VisTex) [30] databases. These images are of size (512 × 512) with pixel values in the range 0-255. Each (512 × 512) image is divided into 16 (128 × 128) non-overlapping sub images. Some of the sample texture images from the Brodatz and VisTex databases are shown in Figs. 3 and 4, respectively. In order to demonstrate the rotation invariance of the proposed feature vector, each image in the databases is rotated by 30°, 60°, 90°, 120°, 150° and 200° to generate a total of 10,192 (16 × 7 × 91) images from the Brodatz and 18,368 (16 × 7 × 164) images from the VisTex database. During experimentation, the proposed method decomposes the image into a three level multiresolution structure resulting in ten subbands. The images are divided into (8 × 8) blocks and the orthogonal polynomials transformation is applied to each block as described in Section 2. Then the transformed coefficients are reordered into ten subbands from which the texture features are extracted. The statistical feature vector F_s of dimension 30 is computed from the ten subbands. The energy feature is normalized between 0 and 1 using min-max normalization. The horizontal direction of the texture is computed from the three detail subbands, namely S1, S4 and S7, using Eq. (17), as described in Section 3.1. Then the vertical direction V_d of the texture
Fig. 3. Sample texture images taken from Brodatz texture database.
Fig. 4. Sample texture images taken from VisTex texture database.
Fig. 5. (a) Query image and (b) retrieval results obtained with the proposed orthogonal polynomials model corresponding to the query image (a) from Brodatz database.
is computed with the subbands S2, S5 and S8. The diagonal direction D_d of the texture is computed in two directions, namely 45° and 135°, with the subbands S3, S6 and S9, and the average of these two is represented as the diagonal feature. The overall directional feature F_d is then composed of H_d, V_d and D_d. Next, the perceptual features, namely coarseness and contrast, are computed from the S0 subband using Eqs. (30) and (34), as described in Sections 3.4 and 3.5, respectively. The resultant perceptual feature vector is represented as F_p. Hence the global texture feature vector f of dimension 41 is obtained for each image in the databases. In order to measure the performance of the proposed retrieval scheme, two sample query images, Bark.200_16 and Stone.0000.60_01, are considered from the Brodatz and VisTex databases, respectively, and are shown in Figs. 5(a) and 6(a). The global texture feature vectors f for these query images are computed in the same way as for the database images, as described above. Then the distance between the query image and the images present in the databases is computed with the Canberra distance metric, and the obtained distances are sorted in ascending order. The top 10 retrieval results of the proposed method for these two query images are shown in Figs. 5(b) and 6(b). The retrieval rate is 100% and 80% for the query images Bark.200_16 and Stone.0000.60_01, respectively. From these figures it can be observed that the proposed orthogonal polynomials model based texture image retrieval scheme retrieves images of the same class as the query image as well as perceptually similar images. Instead of considering one query image from each database, we then conduct experiments by taking each of the database images as the
query image, and the performance of the proposed method is evaluated in terms of average precision and recall. We obtain an average recall and precision of 0.926 and 0.74, respectively, with the proposed technique, and these results are presented in Table 1. The retrieval performance of the proposed method is further measured as a function of the number of top retrieved images, considering the database images with different query images. The average percentage of retrieval for the top 10, 20, 30, ..., 100 matches is obtained with respect to all the query images. The results are plotted with the top matches on the X-axis and the percentage of retrieval on the Y-axis, and are presented in Figs. 7 and 8 for the Brodatz and VisTex database images. The retrieval curve of the proposed method stays high across the top matches, indicating that it performs well, and the retrieval rate increases rapidly for the top 25, 30, 40, 50 and 90 matches. In order to evaluate the effectiveness and efficiency of the proposed texture feature based retrieval, performance comparisons are made with the DCT based multiresolution, Gabor wavelet and Contourlet Transform methods in terms of top matches, precision and recall, average recognition rate, feature extraction time and retrieval time. For estimating the retrieval accuracy of these three existing methods with the top 10 matches, the same query images Bark.200_16 and Stone.0000.60_01, shown in Figs. 5(a) and 6(a), from the Brodatz and VisTex databases are considered. The top 10 retrieval results obtained for these three existing methods are shown in Figs. 9-14.
Fig. 6. (a) Query image and (b) retrieval results obtained with the proposed orthogonal polynomials model corresponding to the query image (a) from VisTex database.
Table 1
Performance measure of the proposed method with the DCT based multiresolution, Gabor wavelet and Contourlet Transform methods.

Database   Proposed method       DCT based multiresolution   Gabor wavelet         Contourlet Transform
           Recall    Precision   Recall    Precision         Recall    Precision   Recall    Precision
Brodatz    0.926     0.74        0.80      0.64              0.92      0.72        0.934     0.726
VisTex     0.908     0.752       0.782     0.636             0.872     0.725       0.884     0.723

We could infer from Figs. 9-14 that though the Gabor wavelet method retrieves all the images of the same class as the query image, it fails to retrieve perceptually similar images, unlike the proposed method. The retrieval rates for these query images are 90% and 60%, respectively. The Contourlet Transform performs well and yields a retrieval rate of 90% for both query images, but it fails to retrieve the different rotated versions of the same query image. The retrieval rates of the DCT based multiresolution method for these query images are 70% and 30%. The recall and precision of the existing DCT based multiresolution, Gabor wavelet and Contourlet Transform methods are also measured using the above mentioned procedure, and the results are incorporated in Table 1.
Fig. 7. Retrieval performance (percentage of retrieval versus top matches considered, from 10 to 100) of the proposed, DCT based multiresolution, Gabor wavelet and Contourlet Transform methods for the Brodatz database.

Fig. 8. Retrieval performance (percentage of retrieval versus number of top matches, from 10 to 100) of the same four methods for the VisTex database.
Fig. 9. Retrieval results obtained with DCT based multiresolution structure corresponding to the query image 5(a) from Brodatz database.
Fig. 10. Retrieval results obtained with DCT based multiresolution structure corresponding to the query image 6(a) from VisTex database.
Fig. 11. Retrieval results obtained with Gabor wavelet scheme corresponding to the query image 5(a) from Brodatz database.
Fig. 12. Retrieval results obtained with Gabor wavelet scheme corresponding to the query image 6(a) from VisTex database.
Fig. 13. Retrieval results obtained with the Contourlet Transform corresponding to the query image 5(a) from the Brodatz database.
Fig. 14. Retrieval results obtained with the Contourlet Transform corresponding to the query image 6(a) from the VisTex database.
Table 2
Average retrieval accuracy of the proposed, DCT based multiresolution, Gabor based and Contourlet Transform methods for the Brodatz database.

Texture feature                                        Proposed    DCT based             Gabor based   Contourlet Transform
                                                       method (%)  multiresolution (%)   method (%)    method (%)
Direction                                              68.25       67.6                  56.72         70.04
Direction + statistical property (mean, std, energy)   82.45       70.35                 81.94         81.65
Direction + coarseness + contrast                      72.56       69.78                 80.23         67.20
Combination of all (proposed feature)                  91.72       80.12                 90.56         91.26
Table 3
Average retrieval accuracy of the proposed, DCT based multiresolution, Gabor based and Contourlet Transform methods for the VisTex database.

Texture feature                                        Proposed    DCT based             Gabor based   Contourlet Transform
                                                       method (%)  multiresolution (%)   method (%)    method (%)
Direction                                              65.72       62.34                 53.58         71.87
Direction + statistical property (mean, std, energy)   89.14       73.29                 81.25         85.65
Direction + coarseness + contrast                      71.87       65.16                 77.27         65.23
Combination of all (proposed feature)                  89.56       77.26                 85.85         87.29
Table 4
Feature extraction and retrieval time of the various texture feature methods (on a Pentium IV machine).

                                            Proposed   DCT based          Gabor    Contourlet
                                            method     multiresolution             Transform
Feature extraction time (for query image)   0.52 s     0.55 s             3.95 s   0.73 s
Searching time                              0.04 s     0.05 s             0.06 s   0.06 s
It is evident from the tabulated results that the proposed method performs well when compared to the other three existing methods. The retrieval performance of these three existing methods is further measured as a function of the number of top retrieved images, and the obtained results are plotted in the same Figs. 7 and 8. The Gabor wavelet and Contourlet Transform methods are found to perform well only at the top 30, 40 and 90 matches, whereas the proposed method performs well throughout the top 25 to 90 matches and yields a high retrieval rate. In order to strengthen the case for the efficiency of the proposed method, some combinations of texture features are considered for calculating the average recognition rate: (i) the directional property, (ii) direction with the statistical properties (mean, standard deviation and energy) and (iii) direction with the perceptual properties, namely coarseness and contrast. For the directional feature alone, the proposed, DCT based multiresolution, Gabor wavelet and Contourlet Transform methods yield average retrieval rates of 68.25%, 67.6%, 56.72% and 70.04%, respectively, for the Brodatz database and 65.72%, 62.34%, 53.58% and 71.87%, respectively, for the VisTex database. Since the Gabor basis is not orthonormal, it fails to capture the direction of the texture and yields a poor retrieval rate compared with the proposed, DCT based multiresolution and Contourlet approaches, whereas the Contourlet Transform, having the direction and anisotropy properties, performs well compared with the proposed and the other two existing schemes. Since the statistical measures play an important role in modeling the texture feature, the directional feature is combined with the mean, standard deviation and energy for computing the average recognition rate. This feature combination yields retrieval rates of 82.45%, 70.35%, 81.94% and 81.65% for the proposed, DCT based multiresolution, Gabor wavelet and Contourlet Transform methods, respectively, for the Brodatz database and 89.14%, 73.29%, 81.25% and 85.65%, respectively, for the VisTex database. The recognition rate thus increases sharply, from 68.25% to 82.45% in the case of the Brodatz database and from 65.72% to 89.14% for the VisTex database. The combination of the perceptual properties (coarseness and contrast) with direction also produces a good retrieval rate for the proposed, DCT and Gabor based methods, but not for the Contourlet Transform. These results are presented in Tables 2 and 3. It can be concluded from these experiments that each feature has unique characteristics and affects the retrieval rate positively or negatively. When the proposed texture feature vector combines all these characteristics, it produces high recognition rates of 91.72% and 89.56% for the Brodatz and VisTex databases. The obtained results of the proposed scheme are higher than those of the other existing methods. From the experimental results, it can be observed that the proposed method performs well and has good discriminative ability for texture images of the irregular, stochastic and near stochastic types. For example, for classes of images such as Bark, Food, Metal, Misc, Sand, Clouds, Water, Grass and Stone, it yields a high retrieval rate. It fails to perform well for regular and near regular texture images such as Fabric, Brick and Bubbles. The effectiveness of the proposed and the other three existing schemes is also measured with respect to the CPU time for extracting the feature vector and the time for searching the database images for a given query image. For these measures we conducted experiments with a Pentium IV machine, Java (JDK 1.5) and MS Access. The results are reported in Table 4. It is evident from these results that the proposed method is computationally attractive, and the feature extraction time for the query image is about 7.5 times less than that of the Gabor wavelet based method.
6. Conclusion

In this paper, a new rotation invariant texture feature extraction technique that operates directly on the orthogonal polynomials model coefficients, incorporating a multiresolution approach, is presented. The transformed coefficients are reordered into a multiresolution subband like structure and statistical, directional and perceptual
Fig. 15. (a) Original image D16, (b) pixel values of block B, and (c) the 90° left rotated result of block B (B90°):

Block B:              Block B90°:
 66 116 119 137       137  26 117 114
 86  95  71  26       119  71 139 100
 75 117 139 117       116  95 117  62
 21  62 100 114        66  86  75  21
rotation invariant texture features are extracted from them. The proposed method is evaluated on the Brodatz and VisTex texture databases, and its retrieval performance is compared with the DCT based multiresolution, Gabor wavelet and Contourlet Transform methods. The proposed method outperforms these schemes with less computational cost.
Acknowledgement

This work was sponsored by the All India Council for Technical Education (AICTE), New Delhi, India under the scheme "AICTE-RPS (Research Promotion Scheme)" with Grant No. F. No.: 8023/BOR/RID/RPS-127/2008-09.

Appendix A

In the proposed orthogonal polynomials model, it is observed that when the image is rotated, the positions and signs of the transformed coefficients vary while their magnitudes remain unaltered. Hence the proposed model uses the absolute values of the transformed coefficients for rotation invariant texture feature extraction, and the feature vector is sorted in ascending order. Let us prove this using the following assumptions and steps.

Assumptions:
No. of orientations: K
No. of subbands: N
Original image: I(x, y)
Rotated version of the original image: I′(x, y)
k = 1, ..., K - 1; n = 1, ..., N - 1

A.1. Rotation invariance

Consider the energy feature at the different subbands and orientations, denoted as

E_{k,n} = Σ_{i=0}^{X} Σ_{j=0}^{Y} |β′_ij|   (39)

Now we prove that rotation is equivalent to a sorting of the feature vector elements in ascending order. At the 0th orientation, the energy of each subband of I(x, y) in ascending order is

(E_{k,0}, E_{k,1}, ..., E_{k,i}, ..., E_{k,N-1})   (40)

Similarly, the energy distribution of I′(x, y) is

(E′_{k,-i}, E′_{k,1-i}, ..., E′_{k,0}, ..., E′_{k,N-1-i})   (41)

where E_{k,0} = E′_{k,-i}, E_{k,1} = E′_{k,1-i}, and so forth. Because E′_{k,n} = E′_{k,n+N} (an image has the same energy distribution after rotating 180°), the energy distribution of I′(x, y) can be written as follows:

(E′_{k,-i+N}, E′_{k,1-i+N}, ..., E′_{k,0}, ..., E′_{k,N-1-i})   (42)

Then, reordering the feature values of Eq. (42) in ascending order, we get

(E′_{k,0}, E′_{k,1}, ..., E′_{k,N-1-i}, E′_{k,N-i}, E′_{k,N-i+1}, ..., E′_{k,N-1})   (43)

Table 5
Transformation coefficients of block B obtained with the proposed orthogonal polynomials model (the source prints only magnitudes; the signs shown here are reconstructed from the pixel values of Fig. 15).

 1461   -253     9   -651
  477    655   589   -955
 -177      5    59     35
   29   -285    53     25

Table 6
Transformation coefficients of block B90° obtained with the proposed orthogonal polynomials model (signs reconstructed as in Table 5).

 1461   -477   -177    -29
 -253   -655      5    285
    9   -589     59    -53
 -651    955     35    -25
From Eqs. (40) and (43) it is inferred that rotation invariance is achieved by reordering the features in ascending order using the absolute values of the transformed coefficients. Fig. 15 gives a graphical illustration of this concept. A block of size (4 × 4) is extracted from the image D16. The original block and the 90° left rotated block are labeled B and B90°, respectively. The proposed orthogonal polynomials based transformation is applied to both blocks and the results are shown in Tables 5 and 6, respectively. From the tables it is noticed that the signs and positions of the transformed coefficients in the zig-zag ordering scheme differ between the normal and the rotated block, but the magnitudes remain unaltered; it is also seen that the coefficient β′_00 remains the same. The distance between blocks B and B90° becomes zero in the zig-zag scheme if the absolute values of the transformed coefficients are taken. Hence the proposed model uses the absolute values of the transformed coefficients for texture feature extraction, and the feature vector is sorted in ascending order.
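This magnitude-permutation behavior can be checked numerically from the Fig. 15 data. In the sketch below (Python/NumPy, reusing poly_basis from the Section 2 sketch), we assume the block transform is applied as C = MᵀBM, which reproduces the magnitudes of Tables 5 and 6 up to transposition:

```python
import numpy as np

B = np.array([[ 66, 116, 119, 137],     # block B from Fig. 15(b)
              [ 86,  95,  71,  26],
              [ 75, 117, 139, 117],
              [ 21,  62, 100, 114]], dtype=float)
B90 = np.rot90(B)                       # 90-degree left rotation, Fig. 15(c)

M = poly_basis(4)                       # 4x4 basis, cf. the Section 2 sketch
C, C90 = M.T @ B @ M, M.T @ B90 @ M     # transformation coefficients

print(C[0, 0] == C90[0, 0])             # True: beta'_00 is unchanged (1461)
# sorted magnitudes coincide: rotation only permutes positions and signs
print(np.allclose(np.sort(np.abs(C), axis=None),
                  np.sort(np.abs(C90), axis=None)))   # True
```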
References

[1] S. Antani, R. Kasturi, R. Jain, A survey on the use of pattern recognition methods for abstraction, indexing and retrieval of images and video, Pattern Recognition 35 (4) (2002) 945–965.
[2] M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, P. Yanker, Query by image and video content: the QBIC system, IEEE Computer 28 (9) (1995) 23–32.
[3] W.Y. Ma, B.S. Manjunath, Netra: a toolbox for navigating large image databases, in: Proceedings of the IEEE International Conference on Image Processing (ICIP97), vol. 1, 1997, pp. 568–571.
[4] A. Pentland, R.W. Picard, S. Sclaroff, Photobook: content based manipulation of image databases, International Journal of Computer Vision 18 (3) (1996) 233–254.
[5] J.R. Smith, S.F. Chang, Querying by color region using the VisualSEEK content based visual query system, in: Intelligent Multimedia Information Retrieval, AAAI Press, 1997, pp. 23–41.
[6] S.R. Fountain, T.N. Tan, K.D. Baker, Comparative study of rotation invariant classification and retrieval of texture images, in: Proceedings of the British Machine Vision Conference, vol. 1, 1998, pp. 266–275.
[7] J. Han, K. Ma, Rotation invariant and scale invariant Gabor features for texture image retrieval, Image and Vision Computing 25 (9) (2007) 1474–1481.
[8] J. Zhang, T. Tan, L. Ma, Invariant texture segmentation via circular Gabor filters, in: Proceedings of the 16th International Conference on Pattern Recognition, vol. 2, 2002, pp. 901–904.
[9] D. Zhang, A. Wong, M. Indrawan, G. Lu, Content based image retrieval using Gabor texture features, in: Proceedings of the 1st IEEE Pacific Rim Conference on Multimedia (PCM'00), 2000, pp. 1139–1142.
[10] P. Cui, J. Li, Q. Pan, H. Zhang, Rotation and scaling invariant texture classification based on Radon transform and multiscale analysis, Pattern Recognition Letters 27 (5) (2006) 408–413.
[11] C. Pun, Rotation invariant texture feature for image retrieval, Computer Vision and Image Understanding 89 (1) (2003) 24–43.
[12] G.M. Haley, B.S. Manjunath, Rotation invariant texture classification using complete space frequency model, IEEE Transactions on Image Processing 8 (2) (1999) 255–269.
[13] R. Milanese, M. Cherbuliez, A rotation, translation and scale invariant approach to content based image retrieval, Journal of Visual Communication and Image Representation 10 (2) (1999) 186–196.
[14] K. Muneeswaran, L. Ganesan, S. Arumugam, K. Ruba Soundar, Texture classification with combined rotation and scale invariant wavelet features, Pattern Recognition 38 (10) (2005) 1495–1506.
[15] H.S. Lo, M. Pickering, M. Frater, J. Arnold, Scale and rotation invariant texture features from the dual-tree complex wavelet transform, in: IEEE International Conference on Image Processing (ICIP), vol. 1, 2004, pp. 227–230.
[16] M. Kokare, P.K. Biswas, B.N. Chatterji, Texture image retrieval using new rotated complex wavelet filters, IEEE Transactions on Systems, Man and Cybernetics – Part B 35 (6) (2005) 1168–1178.
[17] J.L. Chen, A. Kundu, Rotational and gray-scale transform invariant texture identification using wavelet decomposition and hidden Markov model, IEEE Transactions on Pattern Analysis and Machine Intelligence 16 (2) (1994) 208–214.
[18] H. Deng, D.A. Clausi, Gaussian MRF rotation invariant features for image classification, IEEE Transactions on Pattern Analysis and Machine Intelligence 26 (7) (2004) 951–955.
[19] G. Tzagkarakis, B.B. Lozano, P. Tsakalides, Rotation invariant texture retrieval with Gaussianized steerable pyramids, IEEE Transactions on Image Processing 15 (9) (2006) 2702–2718.
[20] Ch. Srinivasa Rao, S. Srinivas Kumar, B.N. Chatterji, Content based image retrieval using Contourlet Transform, ICGST-GVIP Journal 7 (3) (2007) 9–15.
[21] R. Krishnamoorthi, P. Bhattacharyya, A new data compression scheme using orthogonal polynomials, in: International Conference on Information, Communications and Signal Processing (ICICS '97), vol. 1, 1997, pp. 490–494.
[22] R. Krishnamoorthi, Transform coding of monochrome images with a statistical design of experiments approach to separate noise, Pattern Recognition Letters 28 (7) (2007) 771–777.
[23] R. Krishnamoorthi, N. Kannan, A new integer image coding technique based on orthogonal polynomials, Image and Vision Computing 27 (8) (2009) 999–1006.
[24] Z. He, X. You, Y. Yuan, Texture image retrieval based on non-tensor product wavelet filter banks, Signal Processing 89 (2009) 1501–1510.
[25] H. Tamura, S. Mori, T. Yamawaki, Textural features corresponding to visual perception, IEEE Transactions on Systems, Man and Cybernetics 8 (6) (1978) 460–473.
[26] M. Jian, J. Dong, D. Gao, Z. Liang, New texture features based on wavelet transform coinciding with human visual perception, in: Eighth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing, vol. 1, 2007, pp. 369–373.
[27] S. Battiato, G. Gallo, S. Nicotra, Perceptive visual texture classification and retrieval, in: 12th International Conference on Image Analysis and Processing, 2003, pp. 524–529.
[28] M. Kokare, B.N. Chatterji, P.K. Biswas, Comparison of similarity metrics for texture image retrieval, in: IEEE TENCON 2003, Conference on Convergent Technologies for the Asia-Pacific Region, vol. 2, 2003, pp. 571–575.
[29] P. Brodatz, Textures: A Photographic Album for Artists and Designers, Dover, New York, 1966.
[30] MIT Vision and Modeling Group, Vision Texture (VisTex) database, <http://vismod.media.mit.edu/vismod/imagery/VisionTexture/>.