Image retrieval based on the texton co-occurrence matrix

Pattern Recognition 41 (2008) 3521–3527

Guang-Hai Liu*, Jing-Yu Yang
Department of Computer Science, Nanjing University of Science and Technology, Nanjing 210094, China

ARTICLE INFO

Article history: Received 4 June 2007; received in revised form 4 April 2008; accepted 9 June 2008

Keywords: Image retrieval; GLCM; Color gradient; Texton co-occurrence matrix

ABSTRACT

This paper puts forward a new co-occurrence matrix method for describing image features, one that can express the spatial correlation of textons. During feature extraction, we quantize the original images into 256 colors and compute the color gradient in RGB vector space, and then calculate the statistical information of textons to describe the image features. Image retrieval experiments show that the proposed method has the discrimination power of color, texture and shape features, and its performance is better than that of GLCM and CCG. © 2008 Elsevier Ltd. All rights reserved.

1. Introduction

Image retrieval is one of the main topics in the fields of computer vision and pattern recognition. In the early 1990s, researchers built many image retrieval systems, such as QBIC, MARS and FIDS. Unlike traditional image retrieval systems, these systems are based on image features such as color, texture and the shape of objects. Nowadays, the main research work in image retrieval consists of feature extraction techniques, image similarity matching and image retrieval methods. Many researchers have put forward various algorithms to extract color, texture and shape features.

Color is the most dominant and distinguishing visual feature. Color histogram-based techniques remain popular due to their simplicity, but the histogram lacks spatial information. Several color descriptors try to incorporate spatial information to varying degrees; they include compact color moments, the color coherence vector and color correlograms [1].

Texture specifies the roughness or coarseness of an object's surface and is described as a pattern with some kind of regularity. Many researchers have put forward algorithms for texture analysis, such as the gray co-occurrence matrices [2], the Markov random field (MRF) model [3], the simultaneous auto-regressive (SAR) model [4], the Wold decomposition model [5], Gabor filtering [6,7] and wavelet decomposition [8,9].

Shape features are widely used in areas such as object recognition and content-based image retrieval. The classic methods



Corresponding author. Tel./fax: +86 25 84315510. E-mail addresses: [email protected] (G.-H. Liu), [email protected] (J.-Y. Yang). 0031-3203/$ - see front matter © 2008 Elsevier Ltd. All rights reserved. doi:10.1016/j.patcog.2008.06.010

of describing shape features are moment invariants, Fourier transform coefficients, edge curvature and arc length [10].

In order to integrate color, texture and shape features, this paper puts forward a new co-occurrence matrix method for describing image features, one that can express the spatial correlation of textons. During feature extraction, we quantize the original images into 256 colors and compute the color gradient in RGB vector space, and then calculate the statistical information of textons to describe the image features. Image retrieval experiments show that the proposed method has the discrimination power of color, texture and shape features, and its performance is better than that of GLCM and CCG.

The paper is organized as follows. In Section 2 the gray co-occurrence matrix is presented. In Section 3, the texton co-occurrence matrix (TCM) is studied and recommended after considering certain techniques for color edge extraction. In Section 4, the image retrieval performance of GLCM, CCG and our proposed method is compared in two experiments over the VisTex texture database of MIT, Corel images and images from the web. Section 5 concludes the paper.

2. The gray co-occurrence matrix

The gray co-occurrence matrix is a traditional statistical method for texture analysis. Co-occurrence matrices characterize the relationship between the values of neighboring pixels [2]. For a coarse texture these matrices tend to have high values near the main diagonal, whereas for a fine texture the values are scattered. We denote the values of a gray image f as w ∈ {0, 1, …, W − 1} with f(P) = w. The pixel position is P = (x, y); let P1 = (x1, y1), P2 = (x2, y2), f(P1) = w and f(P2) = ŵ. The probability Pr of the two values (w, ŵ) with pixel positions


related by d defines the cell entry (w, ŵ) of C_{d,θ} [11]:

$$C_{d,\theta}(w, \hat{w}) = \Pr(f(P_1) = w \wedge f(P_2) = \hat{w} \mid |P_1 - P_2| = d) \qquad (1)$$
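As an illustrative sketch (not from the paper), a co-occurrence matrix for a given displacement can be accumulated and normalized as in Eq. (1), and Haralick-style features (energy, contrast, entropy and homogeneity, the same forms the TCM uses later in Eqs. (22)–(25)) computed from it. All function and variable names here are our own:

```python
import numpy as np

def cooccurrence(img, dx, dy, levels):
    """Accumulate the co-occurrence matrix C[w, w_hat] for pixel pairs
    separated by the displacement (dx, dy), then normalize to probabilities."""
    C = np.zeros((levels, levels), dtype=np.float64)
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                C[img[y, x], img[y2, x2]] += 1.0
    total = C.sum()
    return C / total if total > 0 else C

def haralick_features(C):
    """Energy, contrast, entropy and homogeneity of a normalized matrix."""
    n = C.shape[0]
    i, j = np.indices((n, n))
    energy = np.sum(C ** 2)
    contrast = np.sum((i - j) ** 2 * C)
    entropy = -np.sum(C[C > 0] * np.log(C[C > 0]))
    homogeneity = np.sum(C / (1.0 + (i - j) ** 2))
    return energy, contrast, entropy, homogeneity
```

For θ = 0° and d = 1 the displacement is (dx, dy) = (1, 0); θ = 45°, 90° and 135° correspond to (1, −1), (0, −1) and (−1, −1), respectively.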

θ usually takes the values 0°, 45°, 90° and 135°. Haralick extracted a set of 14 features from the co-occurrence matrix [2], such as energy, homogeneity, contrast and entropy.

3. The texton co-occurrence matrix

Image edges have a close relationship with contours and texture patterns; they provide abundant texture and shape information. The gradient of an image can detect abrupt changes of color, such as color edges and stripes. For a function f(x, y), the gradient of f at coordinates (x, y) is defined as the two-dimensional column vector

$$\nabla f = [G_x, G_y]^T = \left[\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right]^T \qquad (2)$$

The magnitude of this vector is given by

$$\nabla f = \mathrm{mag}(\nabla f) = [G_x^2 + G_y^2]^{1/2} = [(\partial f/\partial x)^2 + (\partial f/\partial y)^2]^{1/2} \qquad (3)$$

It is common practice to approximate the magnitude of the gradient by using absolute values instead of squares and square roots:

$$\nabla f \approx |G_x| + |G_y| \qquad (4)$$

A color image is considered as a two-dimensional vector field f(x, y) with three components R, G and B. Let r, g and b be unit vectors along the R-, G- and B-axes of RGB color space, and define the vectors [12]:

$$u = \frac{\partial R}{\partial x}r + \frac{\partial G}{\partial x}g + \frac{\partial B}{\partial x}b \qquad (5)$$

$$v = \frac{\partial R}{\partial y}r + \frac{\partial G}{\partial y}g + \frac{\partial B}{\partial y}b \qquad (6)$$

Let the quantities g_xx, g_yy and g_xy be defined in terms of the dot products of these vectors, as follows [12]:

$$g_{xx} = u^T u = \left|\frac{\partial R}{\partial x}\right|^2 + \left|\frac{\partial G}{\partial x}\right|^2 + \left|\frac{\partial B}{\partial x}\right|^2 \qquad (7)$$

$$g_{yy} = v^T v = \left|\frac{\partial R}{\partial y}\right|^2 + \left|\frac{\partial G}{\partial y}\right|^2 + \left|\frac{\partial B}{\partial y}\right|^2 \qquad (8)$$

$$g_{xy} = u^T v = \frac{\partial R}{\partial x}\frac{\partial R}{\partial y} + \frac{\partial G}{\partial x}\frac{\partial G}{\partial y} + \frac{\partial B}{\partial x}\frac{\partial B}{\partial y} \qquad (9)$$

Let n = (n1, n2) be a unit vector in the (x, y) plane. We define the squared local contrast of f(x, y) at P in direction n as

$$S(P, n) = g_{xx} n_1^2 + 2 g_{xy} n_1 n_2 + g_{yy} n_2^2 \qquad (10)$$

Let A be the 2 × 2 matrix

$$A = \begin{pmatrix} g_{xx} & g_{xy} \\ g_{xy} & g_{yy} \end{pmatrix} \qquad (11)$$

It is well known that the quadratic form of Eq. (10) has a maximum and a minimum value for varying n. These extreme values coincide with the eigenvalues of the matrix A, and they are attained when n is the corresponding eigenvector [13]. Elementary calculations show these extreme values to be

$$\lambda_{\pm} = \frac{g_{xx} + g_{yy}}{2} \pm \sqrt{\frac{(g_{xx} - g_{yy})^2}{4} + g_{xy}^2} \qquad (12)$$

and the corresponding eigenvectors are given by

$$n_{\pm} = (\cos\theta_{\pm}, \sin\theta_{\pm}) \qquad (13)$$

$$\theta_{+} = \frac{1}{2}\arctan\frac{2 g_{xy}}{g_{xx} - g_{yy}} + k\pi \qquad (14)$$

$$\theta_{-} = \theta_{+} \pm \pi/2 \qquad (15)$$

The gradient of the image function f(x, y) in direction θ is then given by [12]:

$$G(x, y) = \left(0.5 \times \left[(g_{xx} + g_{yy}) + (g_{xx} - g_{yy})\cos 2\theta + 2 g_{xy} \sin 2\theta\right]\right)^{1/2} \qquad (16)$$
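The per-pixel chain of Eqs. (5)–(16) can be sketched as follows (our own illustrative code, not the authors' implementation; finite differences stand in for the partial derivatives, and arctan2 is used so the case g_xx = g_yy is handled):

```python
import numpy as np

def dizenzo_gradient(rgb):
    """Di Zenzo-style color gradient of an H x W x 3 float image.
    Returns the gradient magnitude G(x, y) evaluated along the direction
    theta of maximal rate of change (Eqs. (12)-(16))."""
    # Partial derivatives of each channel (finite differences).
    dy, dx = np.gradient(rgb.astype(np.float64), axis=(0, 1))
    # Dot products of the channel-derivative vectors u and v (Eqs. (7)-(9)).
    gxx = np.sum(dx * dx, axis=2)
    gyy = np.sum(dy * dy, axis=2)
    gxy = np.sum(dx * dy, axis=2)
    # Direction of maximal contrast (Eq. (14)).
    theta = 0.5 * np.arctan2(2.0 * gxy, gxx - gyy)
    # Gradient magnitude along theta (Eq. (16)); clamp tiny negatives.
    G = np.sqrt(np.maximum(0.0, 0.5 * ((gxx + gyy)
               + (gxx - gyy) * np.cos(2.0 * theta)
               + 2.0 * gxy * np.sin(2.0 * theta))))
    return G, theta
```

Evaluating Eq. (16) at θ+ and θ− gives the two orthogonal gradient images Max(G) and Min(G) used below.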

Along one of two orthogonal directions, G attains the maximum color gradient Max(G); along the other, it attains the minimum Min(G). All color gradient values are normalized into the range [0, 1], and the gradient values are then projected as an image, as shown in Fig. 1.

Pixels are the basis of an image, but many studies have shown that it is difficult to obtain satisfactory results using only simple pixel-level algorithms for image analysis. In order to improve the performance of image retrieval, this paper puts forward the TCM to describe image features. Julesz [14] proposed the term "texton" more than 20 years ago, and it remains a very useful concept in texture analysis. As a general rule, textons are defined as a set of blobs or emergent patterns sharing a common property all over the image; however, defining textons remains a challenge.

Image features have a close relationship with textons and color diversification; differences among textons form various image features. If the textons in an image are small and the tonal differences between neighboring textons are large, a fine texture may result. If the textons are larger and consist of several pixels, a coarse texture may result. At the same time, the fine or coarse texture characteristic depends on scale [10]. If the textons in an image are large and consist of only a few texton categories, an obvious shape may result.

There are many types of textons in images. In this paper, we define only five special types of textons for image analysis. Suppose there is a

Fig. 1. Original image and color gradient images: (a) Original image; (b) Max(G) and (c) Min(G).


Fig. 2. Five special types of textons: (a) 2 × 2 grid; (b) T1 ; (c) T2 ; (d) T3 ; (e) T4 and (f) T5 .

Fig. 3. The flow chart of textons detecting: (a) original image, (b) five special types of textons, (c) textons detection, (d) five components of texton images and (e) the final texton image.

2 × 2 grid in the image, with pixels V1, V2, V3 and V4. If three or four of the pixel values are the same, those pixels form a texton. The five special types of textons are denoted T1, T2, T3, T4 and T5, as shown in Fig. 2, where the shading of the 2 × 2 grid marks the pixels whose values are the same; different shading structures form the various textons.

Fig. 3(a) is an image; if it is shifted by one pixel in every direction, a 2 × 2 grid may appear. We use the five special types of textons to examine every grid and determine whether one of them appears. Each type of texton detects one component texton image, so there are five component texton images, as shown in Fig. 3(c). In each of the five component texton images, the texton pixels keep their original values and all other pixels are replaced with the value 0, as shown in Fig. 3(d). Finally, we combine the five component texton images to form a final texton image. Let the pixel position be P = (x, y). At the same position, each component texton image has a pixel value, so the five component texton images give five pixel values, denoted W1, W2, W3, W4 and W5. If these five pixel values are the same, the final texton image keeps the original value at the corresponding position. If both zero and nonzero values appear among the five pixels, the final texton image keeps the nonzero value. This is shown in Fig. 3(e).

After detecting the final texton image, we calculate the number of co-occurrences of two values at pixel positions related by d; θ usually takes the values 0°, 45°, 90° and 135°, as shown in Fig. 4(c) and (d). In order to extract color information, this paper also quantizes the original image into 256 colors.
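As an illustrative sketch (not the authors' code), texton detection over overlapping 2 × 2 grids can be written as follows. Rather than matching the individual shading patterns of Fig. 2, we implement the "three or four equal pixel values" rule directly, which covers all five texton types T1–T5 at once; the function name is our own:

```python
import numpy as np

def texton_image(img):
    """Sketch of texton detection: in every overlapping 2 x 2 grid whose
    four pixels contain three or four equal values, the equal-valued
    (texton) pixels keep their original values; all others become 0."""
    h, w = img.shape
    out = np.zeros_like(img)
    for y in range(h - 1):
        for x in range(w - 1):
            block = img[y:y + 2, x:x + 2]
            vals, counts = np.unique(block, return_counts=True)
            m = counts.argmax()
            if counts[m] >= 3:                 # three or four pixels agree
                keep = block == vals[m]        # the texton pixels
                out[y:y + 2, x:x + 2][keep] = block[keep]
    return out
```

A real implementation would keep the five type-specific component images separate before merging, as Fig. 3 shows; this sketch produces the merged result directly.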

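A minimal sketch of this 256-color quantization, assuming 8-bit RGB input: 8 index levels each for R and G and 4 for B (8 × 8 × 4 = 256), with bin boundaries following Eqs. (17)–(20). The function name is our own:

```python
import numpy as np

def quantize_256(rgb):
    """Quantize an H x W x 3 uint8 RGB image into 256 color indices:
    C(x, y) = 32 * I(R) + 4 * I(G) + I(B)  (Eqs. (17)-(20))."""
    def index(channel, step):
        v = channel.astype(np.int64)
        # I = 0 for v <= step; otherwise i with step*i + 1 <= v <= step*(i+1)
        return np.where(v <= step, 0, (v - 1) // step)
    i_r = index(rgb[:, :, 0], 32)   # 8 levels
    i_g = index(rgb[:, :, 1], 32)   # 8 levels
    i_b = index(rgb[:, :, 2], 64)   # 4 levels
    return 32 * i_r + 4 * i_g + i_b
```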

We denote the index matrix of the 256-color image as C(x, y). Let I(R), I(G) and I(B) be the index values along the R-, G- and B-axes of RGB color space, defined as follows:

$$C(x, y) = 32 \cdot I(R) + 4 \cdot I(G) + I(B) \qquad (17)$$

where

$$I(R) = \begin{cases} 0, & 0 \le R \le 32 \\ i, & 32i + 1 \le R \le 32(i + 1), \; i \in \{1, 2, \ldots, 7\} \end{cases} \qquad (18)$$

$$I(G) = \begin{cases} 0, & 0 \le G \le 32 \\ i, & 32i + 1 \le G \le 32(i + 1), \; i \in \{1, 2, \ldots, 7\} \end{cases} \qquad (19)$$

$$I(B) = \begin{cases} 0, & 0 \le B \le 64 \\ i, & 64i + 1 \le B \le 64(i + 1), \; i \in \{1, 2, 3\} \end{cases} \qquad (20)$$

For the texton images of Max(G), Min(G) and the 256-color image C(x, y), we use co-occurrence matrices to extract their features. We denote the values of a texton image f as w ∈ {0, 1, …, W − 1} with f(P) = w. The pixel position is P = (x, y); let P1 = (x1, y1), P2 = (x2, y2), f(P1) = w and f(P2) = ŵ. The probability Pr of the two values w and ŵ co-occurring at pixel positions related by D defines the cell entry (w, ŵ) of the co-occurrence matrix C_{D,θ}:

$$C_{D,\theta}(w, \hat{w}) = 1 - \Pr\{f(P_1) = w \wedge f(P_2) = \hat{w} \mid |P_1 - P_2| = D\} \qquad (21)$$

θ usually takes the values 0°, 45°, 90° and 135°.

Fig. 4. The flow chart of the texton co-occurrence matrix: (a) original image, (b) texton image, (c) the texton co-occurrence matrix and (d) four directional texton co-occurrence matrices.

The TCMs use energy, contrast, entropy and homogeneity to describe image features. These features are given as follows [2,10]:

$$T_1 = \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} F^2(x, y) \qquad (22)$$

$$T_2 = \sum_{x-y=0}^{N-1} (x - y)^2 \left\{ \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} F(x, y) \right\} \qquad (23)$$

$$T_3 = -\sum_{x=0}^{N-1} \sum_{y=0}^{N-1} F(x, y) \log F(x, y) \qquad (24)$$

$$T_4 = \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} \frac{F(x, y)}{1 + (x - y)^2} \qquad (25)$$

Let m be the color quantization level or gray level. The computational complexities of the gray co-occurrence matrix, the color correlograms and the TCM are O(m²), O(m²d) and O(5m²), respectively; the computation of the texton co-occurrence matrix is therefore heavier than that of the gray co-occurrence matrix and the color correlograms.

4. Image retrieval

Two image database sets are used by our system. The first is the VisTex texture database of MIT. We selected eight image categories, each containing 10 images of size 512 × 512. Every category was divided into non-overlapping 256 × 256 sub-images, which were then added to the original images. Because there are only 8 original water images, we selected two images from the scene images of the VisTex database, namely GroundWaterCity.0008 and GroundWaterCity.0009, and divided them into non-overlapping 256 × 256 sub-images. Finally, all images were scaled down to 128 × 128, resulting in a set of 400 images.

The second image database set contains 2000 images, drawn from Corel images, VisTex texture images and the web. In order to test the discrimination power of shape features, we selected three image categories with obvious shape features: car, porcelain and airplane. The second image database set includes eight image categories: car, rock, porcelain, leaves,

Table 1. The average retrieval precision (%) with different distance parameters D on the first image database set.

Methods  D=1    D=2    D=3    D=4    D=5    D=6    D=7    D=8    D=9
GLCM     36.35  39.56  39.66  40.63  41.05  41.04  41.11  41.00  40.03
CCG      31.70  36.31  39.10  40.95  40.89  40.97  39.93  40.71  40.83
TCM      59.47  58.79  59.19  59.42  59.81  59.61  59.33  59.56  59.70

Table 2. The average retrieval precision (%) with different distance parameters D on the second image database set.

Methods  D=1    D=2    D=3    D=4    D=5    D=6    D=7    D=8    D=9
GLCM     41.46  43.75  42.50  42.85  43.06  43.54  42.85  43.54  43.19
CCG      38.12  39.10  40.87  42.15  42.15  43.47  45.00  44.31  44.10
TCM      60.39  59.61  61.95  59.56  59.00  59.66  59.00  59.51  59.50


Fig. 5. The performance curve figure using the methods of GLCM, CCG and TCM (a) on the first image database set with D = 5 and (b) on the second image database set with D = 3.

Table 3. The average retrieval precision (%) per image category on the first image database set with D = 5.

Methods  Bark   Fabric  Paintings  Food   Leaves  Tile   Terrain  Water  Average
GLCM     26.66  36.44   40.22      34.00  47.56   29.33  45.33    68.89  41.05
CCG      35.55  36.67   59.11      49.78  40.67   30.44  34.00    40.89  40.89
TCM      44.89  50.67   53.78      52.44  59.56   63.44  80.82    72.89  59.81

Table 4. The average retrieval precision (%) per image category on the second image database set with D = 3.

Methods  Car    Rock   Porcelain  Leaves  Food   Fabric  Bark   Airplane  Average
GLCM     46.11  65.55  41.11      56.11   42.22  28.33   27.78  32.78     42.50
CCG      48.89  41.67  44.44      48.33   33.33  37.22   43.65  29.44     40.87
TCM      72.50  65.75  70.42      64.58   47.50  46.11   66.67  62.08     61.95

food, fabric, bark and airplane. Each image category contains 250 images of size 128 × 128.

For each image, a 12-dimensional vector T = [T1, T2, …, T12] is obtained as the final image feature in retrieval. Let Q = [Q1, Q2, …, Q12] be the query image features; we use the Euclidean distance for feature matching:

$$D(i, j) = \sqrt{\sum_{i,j=1}^{12} (T_i - Q_j)^2} \qquad (26)$$

In order to evaluate the retrieval system, performance is assessed in terms of the commonly used precision and recall. Precision is the ratio of the number of retrieved images that are relevant to the number of retrieved images. Recall is the ratio of the number of retrieved images that are relevant to the total number of relevant images. They are defined as follows:

$$\mathrm{precision}(N) = I_N / N \qquad (27)$$

$$\mathrm{recall}(N) = I_N / M \qquad (28)$$

where I_N is the number of relevant images retrieved, M is the total number of relevant images and N is the total number of images retrieved.

We compare the retrieval performance of the TCM with the gray co-occurrence matrix (GLCM) and color correlograms (CCG). In the two experiments, we selected 50 images belonging to the same category as query images; the relevant images do not include the query image itself. The performance is evaluated as the average of the results calculated for each query separately. In the experiments, the distance parameter values used to calculate the co-occurrence matrices were D = 1, 2, …, 9. The average retrieval precision values are listed in Tables 1 and 2. The best performance of TCM was obtained at D = 5 for the first database set and D = 3 for the second. The average retrieval precision of the GLCM method ranges roughly from 36% to 41% on the first database set and from 41% to 43% on the second. The average retrieval precision of the CCG method is from 36% to 41% on the first database set and from 38% to 45% on the second. The average retrieval precision and recall curves are plotted in Fig. 5. It can be seen from Tables 3 and 4 and Fig. 5 that the proposed method achieves good results in terms of retrieval precision and recall compared with the GLCM and CCG methods; its performance is much better than both. On the first image database set, the average retrieval precision of the TCM method exceeds that of the GLCM and CCG methods by 18.76% and 18.92%, respectively. On the second image database set, it exceeds them by 19.45% and 21.08%, respectively. Figs. 6 and 7 show two example retrievals on the first and second image database sets:

(a) One query is a terrain image, and the relevant images have similar surface roughness or coarseness and similar color. The top nine images in Fig. 6 show a good match of color and texture.
(b) The other is a car image, and the relevant images have obvious shape features. The top eight images in Fig. 7 show a good match of shape.

On the second image database set, car, porcelain and airplane have obvious shape features, and the accuracy on these three image categories is significantly higher than that of the GLCM and CCG methods. This shows that TCM has the discrimination power of shape features.

The experiments were performed on a single-CPU 2.8 GHz Pentium PC with 512 MB memory running the Windows operating system. The average retrieval times of the gray co-occurrence matrix, the color correlograms and the TCM are 28.04 ms, 27.54 ms and 125.78 ms, respectively.
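The matching and evaluation pipeline of Eqs. (26)–(28) reduces to a few lines. This is our own illustrative sketch with made-up feature vectors, not the authors' implementation:

```python
import numpy as np

def retrieve(query, database, top_n):
    """Rank database feature vectors by Euclidean distance to the query
    (Eq. (26)) and return the indices of the top-N nearest images."""
    dists = np.sqrt(np.sum((database - query) ** 2, axis=1))
    return np.argsort(dists)[:top_n]

def precision_recall(retrieved, relevant, total_relevant):
    """precision(N) = I_N / N and recall(N) = I_N / M  (Eqs. (27)-(28))."""
    i_n = len(set(retrieved) & set(relevant))
    return i_n / len(retrieved), i_n / total_relevant
```

Averaging these per-query scores over the 50 query images per category yields the figures reported in Tables 1–4.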


Fig. 6. Example of image retrieval using TCM method on the first database set. The query is a terrain image. All return images are correctly retrieved and rank within top nine images. (The top-left image is the query image, the correct images do not include query image itself.)

Fig. 7. Example of image retrieval using the TCM method on the second database set. The query is a car image. Eight returned images are correctly retrieved and rank within the top nine images. (The top-left image is the query image; the correct images do not include the query image itself.)

5. Conclusion

In this paper, we have put forward a new co-occurrence matrix method for describing image features. It differs from the gray co-occurrence matrix and the color correlograms in that it also has the discrimination power of shape features.

Image retrieval experiments were conducted over two image database sets using the gray co-occurrence matrix, the color correlograms and the texton co-occurrence matrix. The two image database sets mainly come from the VisTex texture database of MIT, Corel images and the web. Experimental results show that the proposed method has the discrimination power of color, texture


and shape features, and that its performance is better than that of GLCM and CCG.

Acknowledgments

This work was supported by the National Natural Science Fund of China (No. 60632050). The authors would like to thank the reviewers for insightful comments that helped to improve the paper.

References

[1] J. Huang, S.R. Kumar, M. Mitra, et al., Image indexing using color correlograms, in: IEEE Conference on Computer Vision and Pattern Recognition, 1997, pp. 762–768.
[2] R.M. Haralick, K. Shanmugam, I. Dinstein, Textural features for image classification, IEEE Trans. Syst. Man Cybern. SMC-3 (6) (1973) 610–621.
[3] G. Cross, A. Jain, Markov random field texture models, IEEE Trans. Pattern Anal. Mach. Intell. 5 (1) (1983) 25–39.


[4] J. Mao, A. Jain, Texture classification and segmentation using multi-resolution simultaneous autoregressive models, Pattern Recognition 25 (2) (1992) 173–188.
[5] F. Liu, R. Picard, Periodicity, directionality, and randomness: Wold features for image modeling and retrieval, IEEE Trans. Pattern Anal. Mach. Intell. 18 (7) (1996) 722–733.
[6] B.S. Manjunath, W.Y. Ma, Texture features for browsing and retrieval of image data, IEEE Trans. Pattern Anal. Mach. Intell. 18 (8) (1996) 837–842.
[7] J. Han, K.-K. Ma, Rotation-invariant and scale-invariant Gabor features for texture image retrieval, Image Vision Comput. 25 (2007) 1474–1481.
[8] T. Chang, C.C. Jay Kuo, Texture analysis and classification with tree-structured wavelet transform, IEEE Trans. Image Process. 2 (4) (1993) 429–441.
[9] A. Laine, J. Fan, Texture classification by wavelet packet signatures, IEEE Trans. Pattern Anal. Mach. Intell. 15 (11) (1993) 1186–1191.
[10] M. Sonka, V. Hlavac, R. Boyle, Image Processing, Analysis, and Machine Vision, second ed., Thomson Brooks/Cole, Boston, MA, USA, 1998.
[11] C. Palm, Color texture classification by integrative co-occurrence matrices, Pattern Recognition 37 (5) (2004) 965–976.
[12] S. Di Zenzo, A note on the gradient of a multi-image, Comput. Vision Graphics Image Process. 33 (1986) 116–125.
[13] A. Cumani, Edge detection in multi-spectral images, CVGIP: Graphical Models Image Process. 53 (1991) 40–51.
[14] B. Julesz, Textons, the elements of texture perception, and their interactions, Nature 290 (5802) (1981) 91–97.

About the Author—GUANG-HAI LIU was born in Guangxi, China, on 20 May 1977. In 1999, he received the B.S. degree from Nanjing University of Science and Technology (NUST), Nanjing, China. He went on to complete a Master's degree in informatics in 2002 and his Ph.D. in the Department of Computer Science in 2005. His current research interests are in the areas of image processing, pattern recognition and artificial intelligence.

About the Author—JING-YU YANG received the B.S. degree in Computer Science from Nanjing University of Science and Technology (NUST), Nanjing, China. From 1982 to 1984 he was a visiting scientist at the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign. From 1993 to 1994 he was a visiting professor at the Department of Computer Science, Missouri University; in 1998, he was a visiting professor at Concordia University in Canada. He is currently a professor and chairman of the Department of Computer Science at NUST. He is the author of over 100 scientific papers in computer vision, pattern recognition and artificial intelligence, and has won more than 20 provincial and national awards. His current research interests are in the areas of image processing, robot vision, pattern recognition and artificial intelligence.