EFFICIENT ROTATION INVARIANT TEXTURE FEATURES FOR CONTENT-BASED IMAGE RETRIEVAL


Pattern Recognition, Vol. 31, No. 11, pp. 1725–1732, 1998. © 1998 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. Printed in Great Britain. 0031-3203/98 $19.00+0.00

PII: S0031-3203(98)00015-6

S. R. FOUNTAIN† and T. N. TAN‡

†Computational Vision Group, Department of Computer Science, The University of Reading, Reading RG6 6AY, U.K.
‡National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China

(Received 26 November 1997; in revised form 26 January 1998)

Abstract—An efficient approach to the extraction of rotation invariant texture features is presented. Histograms of intensity gradient directions are compiled. Rotation invariant features are extracted by taking the Fourier expansion of the histogram. The method is applied to image database annotation and content-based retrieval. With a database of over 400 randomly rotated images, all textures are correctly identified within two classification attempts. On presentation of a query texture to the image retrieval system, nine out of ten images returned by the search are of the same texture as the query image. The method requires no human intervention, and its simplicity and accuracy render it highly suited to applications such as content-based image retrieval. Extensive experimental results are included to demonstrate the performance of the method in texture classification and content-based retrieval. © 1998 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Image database; Rotation invariance; Content-based retrieval; Texture analysis

1. INTRODUCTION

Image databases are becoming more pervasive with the advent of digital cameras and the increased availability of inexpensive storage media. The adage "a picture speaks a thousand words" comes to light during image annotation and retrieval, where manual labelling proves both tedious and inadequate.(1, 2) Computer vision and image processing techniques can be employed to label and extract images from a database.(3, 4) Queries can be based on properties such as colour,(5) shape,(6) texture(7) and their combinations,(2) thus avoiding traditional keyword searches. This dynamic approach to image retrieval relies on the recognition and classification of image features.

This paper focuses on texture, which is a main cue in numerous situations such as the analysis of outdoor scenes. The majority of existing work on texture assumes that all images are acquired from the same viewpoint,(8) an unrealistic assumption in practical applications. A texture analysis approach should ideally be invariant to viewpoint. Obtaining viewpoint-invariant texture features is an extremely difficult task and current techniques are still in their infancy.(9, 10) This paper focuses on rotation invariance, an important aspect of the general viewpoint invariance problem. Previous attempts to measure such texture features have proven computationally expensive either to execute (e.g. references (11) and (12)) or to train (e.g. reference (13)). Time is of the essence in many applications where real-time solutions are required. The algorithm discussed in this paper boasts simplicity, efficiency and accuracy, rendering it ideal for applications such as content-based image retrieval and automatic image annotation.

The remainder of this paper discusses the development of the new technique. The performance of various methods for each subtask is analysed and the optimum combination used to produce the final results. Automatic image annotation takes the form of texture classification, and a working content-based image retrieval system is presented.

2. THE FEATURE EXTRACTION ALGORITHM

The algorithm, illustrated diagrammatically in Fig. 1, consists of three main stages, which are described in turn in the remainder of this section.

2.1. Gradient computation

A gradient operator is used to produce magnitude and gradient direction images of the input texture. The images represent the magnitude and direction of maximal grey-level change at each pixel of the input image. Such information is found to provide important cues for human visual perception(14) and should therefore be exploited in computer vision (see, for example, reference (15)).
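As a concrete illustration, the following NumPy sketch computes the two images for the Sobel case. The function name and the numpy-only convolution are our own choices for exposition, not the authors' implementation.

```python
import numpy as np

def sobel_gradients(image):
    """Magnitude and direction (in [0, 360) degrees) of maximal
    grey-level change at each pixel, using the 3x3 Sobel kernels."""
    p = np.pad(image.astype(float), 1, mode="edge")
    # Horizontal response: right column minus left column, weights 1-2-1.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[1:-1, :-2] - p[2:, :-2])
    # Vertical response: bottom row minus top row, weights 1-2-1.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[:-2, 1:-1] - p[:-2, 2:])
    magnitude = np.hypot(gx, gy)
    direction = np.degrees(np.arctan2(gy, gx)) % 360.0
    return magnitude, direction
```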


Fig. 1. Block diagram of the feature extraction algorithm.

Fig. 2. Gradient direction (b, d) and magnitude images (c, e) of D104 (a).

A number of gradient operators(16) could be used, but the experiments in this paper focus on the popular Canny(17) and Sobel(18) operators. All gradient directions are converted to values between 0° and 360°. Figure 2 shows an example of gradient computation. The original image, given in Fig. 2(a), is texture D104 taken from the Brodatz album.(19) The two gradient direction images obtained by the Canny and Sobel operators are shown in Figs 2(b) and (d), respectively. The corresponding magnitude images are shown in Figs 2(c) and (e).

2.2. Histogram compilation

A histogram of gradient directions (θ) is compiled for the image. This is accomplished by adding, at each pixel, the gradient magnitude to the histogram bin corresponding to that pixel's direction θ.
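A minimal sketch of this compilation step; the bin width (one bin per degree) is an assumption, as the paper does not state the number of bins used.

```python
import numpy as np

def direction_histogram(magnitude, direction, bins=360):
    """Magnitude-weighted histogram of gradient directions: each pixel
    adds its gradient magnitude to the bin of its direction."""
    idx = (direction / 360.0 * bins).astype(int) % bins  # one bin per degree (assumed)
    return np.bincount(idx.ravel(), weights=magnitude.ravel(), minlength=bins)
```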

Fig. 3. The effect of smoothing on the gradient direction histogram of texture D104. Smoothing iterations: 150 (dark plot), 100, 50, 10, 0 (light plot).

Smoothing. The histograms so obtained tend to be very spiky, so histogram smoothing is applied to remove spurious spikes. Treating the histogram as cyclic between 0° and 360°, the smoothing operation averages each set of three adjacent bins; the process is repeated for the required number of iterations. The effect of smoothing is shown in Fig. 3, where the number of iterations ranges from 0 (no smoothing) to 150.
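The smoothing step is a cyclic moving average; a sketch, assuming the histogram is held in a NumPy array:

```python
import numpy as np

def smooth_histogram(hist, iterations=100):
    """Iterated averaging of each set of three adjacent bins, treating
    the histogram as cyclic between 0 and 360 degrees."""
    h = hist.astype(float)
    for _ in range(iterations):
        h = (np.roll(h, 1) + h + np.roll(h, -1)) / 3.0
    return h
```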

Normalisation. It is very unlikely that images are acquired under identical lighting conditions. The undesirable effects due to differences in illumination can be eliminated to some extent by histogram normalisation. Two different normalisation techniques were used. The first method, area normalisation, divides each histogram bin by the total magnitude of the histogram, ensuring that the area under the curve totals unity:

B(\theta) = \frac{b(\theta)}{\sum_{\theta} b(\theta)},   (1)

where B(\theta) is the new histogram value and b(\theta) is the original histogram value at direction \theta.

The second method, maximum normalisation, incorporates both graph alignment and normalisation. The maximum histogram value is detected; each bin is then divided by the maximum and multiplied by a predetermined value h in order to scale the results between zero and h. Equation (2) gives the formal definition of the method. The graphs presented in Fig. 4 have been subjected to maximum normalisation.

B(\theta) = \frac{b(\theta)\, h}{m},   (2)

where h is the desired height of the histogram and m is the largest histogram value.
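Both normalisations reduce to one line each. In the sketch below the target height h = 100 is an arbitrary illustrative value; the paper does not say which h it used.

```python
import numpy as np

def area_normalise(hist):
    """Equation (1): divide each bin by the total so the area is unity."""
    return hist / hist.sum()

def maximum_normalise(hist, height=100.0):
    """Equation (2): scale bins so the histogram peak equals height h."""
    return hist * height / hist.max()
```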


Fig. 4. Maximum-normalised gradient direction histograms of texture D104 (dark) and its rotated version (grey).

2.3. Fourier expansion

It is not difficult to see that a direction histogram is a periodic function of θ with a period of 2π, and that a rotation of the input image is equivalent to a translation (cyclic shift) of this function. Figure 4 shows the histograms of a texture and its rotated version; a translation between the two plots is evident. The Fourier expansion of the function is calculated, and the magnitudes of the Fourier coefficients provide rotation invariant texture features. Feature vectors of dimension n are created from the first n magnitudes of the Fourier expansion.
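A sketch of the feature extraction. A cyclic shift of the histogram changes only the phases of its Fourier coefficients, so the magnitudes are rotation invariant; whether the DC term counts as one of the n features is not stated in the paper, and it is skipped here.

```python
import numpy as np

def rotation_invariant_features(hist, n=4):
    """First n Fourier magnitudes of the periodic direction histogram
    (DC term excluded; an assumption made for this sketch)."""
    coeffs = np.fft.rfft(hist)
    return np.abs(coeffs[1:n + 1])
```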

3. CLASSIFICATION

Texture classification is the process whereby texture samples are assigned to a finite set of texture classes. In the context of image databases this process is essential during image annotation. An image is presented to the system, which automatically analyses it and returns a description of the image content (i.e., a classification and labelling of the image components). A feature-based supervised classification method was used. Representative feature vectors (known as exemplars in this paper) for each class in the training set are "plotted" in an n-dimensional feature space, where n is the number of elements per feature vector. Upon classification the unknown texture's feature vector is "plotted" in the same space and the distance to each exemplar measured. The texture is assigned to the class of the nearest exemplar. Two distance measures were used, namely Euclidean (E) and weighted Euclidean (W), as defined in equations (3) and (4):

E"J( fM !fK ) z ( fM!fK ),

S

¼"

n ( fM !fK )2 i , + i 2 i i/1

(3) (4)

where fM is an exemplar vector, fK is a feature vector for the unknown texture, fK is the ith feature of fK , fM the ith i i feature of fM and p is the standard deviation of fM. i Other more sophisticated distance measures and classifiers such as the Mahalanobis distance and neural network classifiers(20) could have been used. The emphasis in this paper is however computational simplicity.
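A sketch of the nearest-exemplar rule with both distance measures; the dictionary-based interface is our own illustrative choice, not the authors' code.

```python
import numpy as np

def classify(feature, exemplars, stds=None):
    """Assign `feature` to the class of the nearest exemplar.
    stds=None gives the Euclidean distance of equation (3); otherwise
    each squared difference is divided by the class's per-feature
    variance, giving the weighted Euclidean distance of equation (4)."""
    best_label, best_dist = None, np.inf
    for label, exemplar in exemplars.items():
        diff = feature - exemplar
        if stds is None:
            dist = np.sqrt(np.dot(diff, diff))                    # eq. (3)
        else:
            dist = np.sqrt(np.sum(diff ** 2 / stds[label] ** 2))  # eq. (4)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```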

4. EXPERIMENTAL RESULTS

Ten textures from the Brodatz album(19) were randomly selected for use in the following experiments. A database of 460 images was created by the random rotation and cropping of the textures, which are shown in Fig. 5. Each test image was subjected to histogram equalisation to prevent bias towards images with similar grey levels. Where not otherwise stated, the experiments described below used the Sobel operator, area normalisation, four features, 100 smoothing iterations and the Euclidean distance measure. The standard deviation parameter for Canny was set to one. The training set comprised 20 rotations of each texture (shown in Fig. 5). The mean feature vector for each class was used as the exemplar vector during classification.
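Under these settings, training reduces to averaging feature vectors per class. The sketch below chains the illustrative helpers assumed in Section 2; it is not the authors' code.

```python
import numpy as np

def train_exemplars(training_images, n_features=4, iterations=100):
    """One exemplar per class: the mean feature vector of its training
    samples (20 rotations per texture in the paper). `training_images`
    maps class label -> list of grey-level image arrays."""
    exemplars = {}
    for label, images in training_images.items():
        feats = []
        for image in images:
            mag, direc = sobel_gradients(image)
            hist = smooth_histogram(direction_histogram(mag, direc), iterations)
            feats.append(rotation_invariant_features(area_normalise(hist), n_features))
        exemplars[label] = np.mean(feats, axis=0)
    return exemplars
```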

Fig. 5. Ten Brodatz texture classes used in the classification experiments.

Table 1. Effects of smoothing on classification accuracy (%)

No. of classification attempts   10 iterations   50 iterations   100 iterations   150 iterations
1st attempt                      68              77.5            84.4             84
2nd attempt                      17.1            22.5            15.6             16
3rd attempt                      14.9            —               —                —

Table 2. Effects of histogram normalisation on classification accuracy (%)

        No. of classification attempts   Maximum normalisation   Area normalisation
Sobel   1st attempt                      84.4                    63
        2nd attempt                      15.6                    30
        3rd attempt                      —                       7
Canny   1st attempt                      68.9                    45.6
        2nd attempt                      11.9                    17.9
        3rd attempt                      6.1                     16.5
        Further attempts                 13.1                    20

Table 3. Classification accuracy (%) under different numbers of features

        No. of classification attempts   3 features   4 features   5 features   6 features
Sobel   1st attempt                      74           84.4         82.5         81.5
        2nd attempt                      22           15.6         17.5         18.5
        3rd attempt                      4            —            —            —
Canny   1st attempt                      62.5         68.9         70           70.4
        2nd attempt                      14.2         11.9         12.7         12
        3rd attempt                      9            6.1          5.3          5.2
        Further attempts                 14.3         13.1         12           12.4

4.1. Histogram smoothing

The effects of histogram smoothing on classification performance were investigated. The results are summarised in Table 1, which shows the average classification accuracy for each given number of smoothing iterations. For example, with 50 iterations, 77.5% of the images were correctly classified at the first attempt and the remaining 22.5% at the second attempt. Classification accuracy improves significantly as the number of smoothing iterations increases; however, it starts to drop after 100 iterations as the histograms begin to lose their characteristic features.

4.2. Normalisation

Similar experiments were performed to study the effects of histogram normalisation. The two normalisation methods were tested using both the Canny and Sobel operators. The results are given in Table 2. It can be seen that all images were correctly classified within two attempts (84.4% on the first attempt and 15.6% on the second) when the Sobel operator and maximum normalisation were adopted. As a whole, maximum normalisation appears to be superior under both operators; the reason for this is not entirely clear.

4.3. Number of features

For the method to run efficiently, the smallest number of features that still gives accurate results should be used during classification. The optimum number of features was determined by experimentation; the classification results are summarised in Table 3.


Table 4. Classification accuracy (%) under different standard deviations (σ) of the Canny operator

σ       1st attempt   2nd attempt   3rd attempt   Further attempts
0.25    60.4          14.9          8.8           15.9
0.4     67.4          5.6           4.5           22.5
0.5     75.8          7.6           4             12.6
0.525   77            10            3             10
0.55    77            6.8           3.5           12.8
0.6     72.8          6.1           6.2           14.9
0.75    67.3          14.1          5             13.6
1       68.9          11.9          6.1           13.1
1.5     62.3          16.2          6.9           14.6
2       57.5          13.9          6.6           22

Table 5. Comparison of classification accuracy (%) of Euclidean and weighted Euclidean classifiers

        No. of classification attempts   Weighted   Non-weighted
Sobel   1st attempt                      84.5       84.4
        2nd attempt                      14.9       15.6
        3rd attempt                      0.6        —
Canny   1st attempt                      60.9       77
        2nd attempt                      23.6       10
        3rd attempt                      8.5        3
        Further attempts                 7          10

Fig. 6. Detailed comparison between Euclidean and weighted Euclidean distances for the Sobel operator.

Retaining more features is in fact retrogressive beyond a point, owing to the well-known curse-of-dimensionality problem in pattern recognition.(21) The optimum number was found to differ between Sobel and Canny. Five features are chosen as optimal for Canny even though six features gave a better first-place recognition rate; the difference between the two is very small and it is more efficient to use fewer features.

4.4. Standard deviation selection for Canny

The standard deviation (σ) is an important parameter of the Canny operator and should be chosen with care to ensure good results. σ governs the scale of edge detection; decreasing σ results in the detection of edges at smaller scales.

The effects of σ on the average classification accuracy are shown in Table 4. The optimum standard deviation is selected as 0.525, as this value gave a better second-attempt accuracy than 0.55.

4.5. Classification: Euclidean or weighted Euclidean?

For the Sobel operator the two distance measures gave similar results (see Table 5). The recognition rate was worse for the Canny operator when the weighted Euclidean distance was used, though the percentage of textures recognised within three attempts increased. A more detailed comparison between the Euclidean and weighted Euclidean distances for the Sobel operator is shown in Fig. 6. It can be seen that with the weighted distance measure, textures D16 and D56


Fig. 7. A comparison between the Sobel and Canny operators.

Table 6. Classification confusion matrix for the Sobel operator (rows: true class; columns: assigned class)

        D104   D15   D16   D17   D20   D31   D34   D53   D56   D84
D104     46     —     —     —     —     —     —     —     —     —
D15       —    25     —     —     —     —     —    21     —     —
D16       —     —    38     —     —     8     —     —     —     —
D17       —     —     —    46     —     —     —     —     —     —
D20       —     —     —     —    46     —     —     —     —     —
D31       —     —    28     —     —    18     —     —     —     —
D34       —     —     —     —     —     —    46     —     —     —
D53       —    15     —     —     —     —     —    31     —     —
D56       —     —     —     —     1     —     —     —    45     —
D84       —     —     —     —     —     —     —     —     —    46

have improved first-attempt performance, while the remainder either see no change or fare worse.

4.6. Sobel versus Canny

To compare the performance of the Sobel and Canny operators, the optimum settings of the variables discussed above were chosen. The optimum settings for Sobel are 100 smoothing iterations, maximum normalisation and four features. For Canny the optimum is 100 smoothing iterations, maximum normalisation, a standard deviation of 0.525 and five features. Experiments using the Sobel operator produced results superior to those using Canny. Figure 7 supports this claim by showing the percentages of correct classifications at various attempts. On the first attempt the Sobel operator achieved 84.4% accuracy while the Canny operator achieved only 77%. The confusion matrices show the classifications of each texture for both Sobel (Table 6) and Canny (Table 7). There were 46 test samples for each texture class; the diagonal holds the correct classifications. It is interesting to note that under the Sobel operator misclassifications are mainly due to confusion between D15 & D53 and D16 & D31. It is intuitive that D15 and D31 give rise to problems, as the edges within these textures occur in many different directions; the histograms produced are easily confusable with those of other textures.

Experiments using the Canny operator proved extremely tedious: it took great patience to discover the optimum standard deviation, and the execution time when using Canny was double that of the Sobel experiments. The confusion matrix (Table 7) is also harder to interpret, as misclassifications are plentiful.

4.7. Remarks

It is thought that the standard deviation parameter is responsible for the inferior results obtained with the Canny operator. The optimum value was discovered by trial and error on the test data as a whole; in reality, an ideal value for one texture image is likely to be sub-optimal for another. Physical analysis of each image would solve the problem but defeats the point of the exercise. Despite its simplicity, the Sobel operator produced good results without the need for human intervention (i.e. no tuning parameters). All textures were correctly classified on either the first or the second attempt.

5. CONTENT-BASED IMAGE RETRIEVAL

An image retrieval system was designed to test the power of the texture analysis method. A query texture is presented to the system, which returns a predetermined number of "similar" images from its database (cf. Section 3).


Table 7. Classification confusion matrix for the Canny operator (rows: true class; columns: assigned class)

        D104   D15   D16   D17   D20   D31   D34   D53   D56   D84
D104     20     4     3     3     8     —     4     2     —     2
D15       1    37     —     —     —     —     —     8     —     —
D16       —     —    46     —     —     8     —     —     —     —
D17       —     —     —    46     —     —     —     —     —     —
D20       7    16     2     —     9     1     3     5     3     —
D31       1     —     2     —     —    43     —     —     —     —
D34       —     —     —     —     6     —    40     —     —     —
D53       —     4     —     —     —     —     —    42     —     —
D56       —     1     —     —     2     —     2     5    36     —
D84       4     3     —     —     2     —     1     —     —    36

Fig. 8. Results for content-based texture image retrieval.

The ten most similar images were selected for return during experimentation. The current database contains 450 randomly orientated textures obtained from the ten classes in Fig. 5. The results are very encouraging: an average of nine out of ten returned images are of the same texture class as the query image. Figure 8 shows the results (as percentages) for each class in the database. Cross-referencing Fig. 8 with Table 6 shows that the texture classes with perfect results are the same in both cases: a texture that gives good classification results also gives good retrieval results. It follows that an accurate classification algorithm is imperative for work related to image databases.
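The retrieval step itself is a k-nearest-neighbour query in feature space. A sketch, assuming feature vectors are precomputed for the database and ranked by Euclidean distance:

```python
import numpy as np

def retrieve(query_feature, database, k=10):
    """Return the ids of the k database images whose feature vectors
    are closest to the query's. `database` is a list of
    (image_id, feature_vector) pairs (an assumed interface)."""
    ranked = sorted(database,
                    key=lambda item: np.linalg.norm(query_feature - item[1]))
    return [image_id for image_id, _ in ranked[:k]]
```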

6. FURTHER WORK

Preliminary testing suggests that the method performs well on a larger database, particularly for the application of content-based image retrieval.(22) Texture segmentation will be addressed so that the method can be applied to more general image databases, such as those containing natural scenes or medical images. The method is suited to this purpose because, unlike some others (e.g. reference (23), which utilises the 2-D Fourier transform), texture areas of any shape can easily be analysed. Further work includes these extensions as well as the natural progression to geometrically invariant texture classification.

7. CONCLUSIONS

An accurate and efficient method of rotation invariant texture classification has been described. The technique uses edge attribute information and the Fourier expansion to produce rotation invariant texture features. Real-time execution can be achieved, as the edge attribute information is quick and easy to calculate and many systems include hardware support for the Sobel edge operator. Experimentation with different approaches to each sub-task of the method was presented. The Sobel operator was found to be more accurate and efficient than Canny for this application. With the optimal settings identified in this paper, the Sobel operator is able to correctly classify 84.4% of the images on the first attempt, and all images if two attempts are allowed.


An image retrieval system was developed using the texture analysis method. A database of 460 images with ten returns per search gave an average retrieval accuracy of 90%. The speed and simplicity of the method render it ideal for such applications.

REFERENCES

1. S. Fountain et al., Content-based rotation-invariant image annotation, IEE Colloq. on Intelligent Image Databases (1996).
2. M. Flickner et al., Query by image and video content: the QBIC system, IEEE Comput. 28, 23–32 (1995).
3. A. Pentland et al., Photobook: content-based manipulation of image databases, Proc. SPIE Storage and Retrieval for Image and Video Databases II, Vol. 2185 (1994).
4. R. Picard and T. Minka, Vision texture for annotation, Multimedia Systems 3, 3–14 (1995).
5. J. Hafner et al., Efficient color histogram indexing for quadratic form distance functions, IEEE Trans. Pattern Anal. Mach. Intelligence 17(7), 729–736 (1995).
6. J. Eakins et al., Retrieval of trade mark images by shape feature: the ARTISAN project, IEE Colloq. on Intelligent Image Databases (1996).
7. A. Kankanhalli et al., Using texture for image retrieval, Proc. 3rd Int. Conf. on Automation, Robotics and Computer Vision, pp. 935–939, Singapore (1994).
8. T. Reed and J. Hans Du Buf, A review of recent texture segmentation and feature extraction techniques, CVGIP: Image Understanding 57(3), 359–372 (1993).
9. T. Tan, Noise robust and rotation invariant texture classification, Proc. VII European Signal Processing Conf., pp. 1377–1380 (1994).
10. T. Tan, Geometric transform invariant texture analysis, Proc. SPIE, Vol. 2488, pp. 475–485 (1995).
11. R. Kashyap and A. Khotanzad, A model based method for rotation invariant texture classification, IEEE Trans. Pattern Anal. Mach. Intelligence PAMI-8(4), 786–804 (1986).
12. W. Wu and S. Wei, Rotation and gray-scale transform-invariant texture classification using spiral resampling, subband decomposition, and hidden Markov model, IEEE Trans. Image Process. 5(10), 1423–1434 (1996).
13. J. You and H. Cohen, Classification and segmentation of rotated and scaled texture images using tuned masks, Pattern Recognition 26(2), 245–258 (1992).
14. D. Marr, Vision. Freeman, San Francisco (1982).
15. M. Gorkani and R. Picard, Texture orientation for sorting photos at a glance, Proc. IEEE Conf. on Pattern Recognition, pp. 459–464 (1994).
16. R. Gonzalez and R. E. Woods, Digital Image Processing. Addison-Wesley, Reading, MA (1992).
17. M. Sonka et al., Image Processing, Analysis and Machine Vision. Chapman and Hall, London (1993).
18. D. Ballard and C. Brown, Computer Vision. Prentice Hall, Englewood Cliffs (1982).
19. P. Brodatz, Textures: A Photographic Album for Artists and Designers. Dover, New York (1966).
20. C. Bishop, Neural Networks for Pattern Recognition. Oxford University Press, Oxford (1995).
21. A. Jain and B. Chandrasekaran, Dimensionality and sample size considerations in pattern recognition practice, in Handbook of Statistics, P. Krishnaiah and L. Kanal, eds, Vol. 2, pp. 835–855. North-Holland, Amsterdam (1982).
22. S. Fountain and T. Tan, Rotation invariant retrieval and annotation of image databases, Proc. BMVC '97, Vol. 2, pp. 390–399 (1997).
23. H. Greenspan et al., Rotation invariant texture recognition using a steerable pyramid, Proc. ICPR '94, pp. 162–167 (1994).

About the Author—STEPHANIE FOUNTAIN received a first-class B.Sc. honours degree in Computer Science from the University of Leeds in 1995. She was awarded the national Softwright Innovation Award for work on her undergraduate final-year project. She is currently studying for a Ph.D. at The University of Reading. Her research interests include texture analysis and image database issues such as content-based retrieval and annotation.

About the Author—TIENIU TAN received his B.Sc. (1984) in electronic engineering from Xi'an Jiaotong University, China, and M.Sc. (1986), DIC (1986) and Ph.D. (1989) in electronics engineering from Imperial College of Science, Technology and Medicine, London, England. In October 1989, he joined the Computational Vision Group at the Department of Computer Science, The University of Reading, England, where he worked as Research Fellow, Senior Research Fellow and Lecturer. In January 1998, he returned to China to take up a professorship at the National Laboratory of Pattern Recognition, located at the Institute of Automation of the Chinese Academy of Sciences, Beijing. Dr Tan has published widely on image analysis and computer vision. He is a Senior Member of the IEEE and an elected member of the Executive Committee of the British Machine Vision Association and Society for Pattern Recognition (BMVA). His current research interests include speech and image processing, machine and computer vision, pattern recognition, multimedia, and robotics.