Pattern Recognition 35 (2002) 783–790
www.elsevier.com/locate/patcog
Optimizing filter banks for supervised texture recognition
Manfred Bresch∗
Microelectronic Systems, University of Duisburg, Finkenstraße 61, 47057 Duisburg, Germany
Received 13 January 2000; accepted 28 March 2001
Abstract

Two criteria for invariant supervised texture segmentation based on multi-channel approaches are introduced. The texture segmentation is carried out by feature extraction using multi-channel Gabor filtering and classification with symmetric phase-only matched filtering. For the feature extraction, highly efficient filter banks are required that enable a clear distinction between feature vectors representing different textures in order to achieve a high classification rate. For the design of the filter banks, the variances of the frequency components must be maximized. The spar hyper volume spanned by the normalized feature vectors representing different textures must be maximized as well. These two criteria provide guidelines for filter bank design. © 2002 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Filter banks; Texture recognition; Pattern classification; Spar hyper volume; Design criteria
∗ Tel.: +49-203-2783204; fax: +49-203-2783278. E-mail address: [email protected] (M. Bresch).

1. Introduction

Texture analysis plays an important role in remote sensing, ultrasonic imaging, computer vision, image synthesis, satellite images, automotive applications, content-based image retrieval, detection of faulty regions in a surface, object recognition, scene analysis, medical image processing, texture-based image classification and many other fields of image processing. For most of these applications, a segmentation is necessary. Usually a texture recognition system is required to be insensitive to camera calibration, which implies invariance with respect to orientation, shift, and scale. A well-known method for this invariant texture segmentation is a multi-channel approach [1,2] using a polar-logarithmic [3] Gabor filter bank, as Gabor filters [4] are well known to be suitable for segmentation purposes [5–7].

However, for good segmentation results, a highly complex filter bank design is required. Thus, for an efficient design, reliable design guidelines must be available. These design guidelines should yield a feature extraction system that provides a maximum classification rate. The feature vectors representing different classes should therefore be as dissimilar as possible, while feature vectors representing the same class should be as similar as possible. The extracted feature vectors of different classes cannot be distinguished if the spectral power of all the textures is the same in every channel. Therefore, filter banks have to be designed in such a way that the textures yield different spectral powers in the channels.

This paper is organized as follows: Section 2 introduces the volume fitting criterion (called q) for the synthesis of filter banks for feature extraction, and Section 3 presents the normalized spar hyper volume as analysis criterion for extracted feature sets (called vol). In Section 4, a basic requirement for feature vectors to be classified is discussed. Section 5 applies the analysis criterion to texture analysis. Section 6 presents numerical results and Section 7 concludes. The features are extracted by multi-channel
0031-3203/02/$22.00 © 2002 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved. PII: S0031-3203(01)00075-9
M. Bresch / Pattern Recognition 35 (2002) 783–790
Gabor filtering (the average magnitude of each filter output becomes one feature [1]). The design criteria are necessary for a texture segmentation invariant with respect to orientation, scale, and position. Currently existing texture analysis techniques either lack invariance properties [8,9] or are not suitable for segmentation purposes [10,11].
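The feature extraction described above (one feature per channel, given by the average magnitude of the channel's filter output) can be sketched as follows. This is only a minimal illustration under stated assumptions: `extract_features` and `gaussian_channel` are hypothetical names, and a plain Gaussian passband in the frequency plane is used as a stand-in for a Gabor channel rather than the polar-logarithmic bank of the paper.

```python
import numpy as np


def gaussian_channel(shape, u0, v0, sigma):
    """Hypothetical stand-in for one Gabor channel: a Gaussian passband
    centred at frequency (u0, v0), given in periods per image width/height."""
    rows, cols = shape
    v = np.fft.fftfreq(rows) * rows  # vertical frequency of each FFT bin
    u = np.fft.fftfreq(cols) * cols  # horizontal frequency of each FFT bin
    uu, vv = np.meshgrid(u, v)
    return np.exp(-((uu - u0) ** 2 + (vv - v0) ** 2) / (2.0 * sigma ** 2))


def extract_features(texture, filter_bank):
    """Return one feature per channel: the average magnitude of the
    channel's filter output."""
    spectrum = np.fft.fft2(np.asarray(texture, dtype=float))
    # Filtering is a pointwise product in the frequency domain.
    outputs = np.fft.ifft2(spectrum[np.newaxis, :, :] * filter_bank)
    return np.abs(outputs).mean(axis=(-2, -1))


# Usage: a horizontal cosine with 8 periods per image width.
x = np.arange(128)
texture = np.tile(np.cos(2 * np.pi * 8 * x / 128), (128, 1))
bank = np.stack([gaussian_channel(texture.shape, 8, 0, 2.0),
                 gaussian_channel(texture.shape, 20, 0, 2.0)])
features = extract_features(texture, bank)
```

In this toy case the channel centred at 8 periods per image width responds strongly, while the channel centred at 20 periods barely responds, so the two features already separate this texture from one dominated by higher frequencies.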
2. Volume fitting criterion for designing a filter bank for supervised texture analysis invariant with respect to rotation and scale

Two textures can be well distinguished if the spectral powers of the frequency channels used for spectral analysis differ as much as possible. If both textures exhibit the same power in all channels, then the textures look the same and cannot be distinguished, which makes classification impossible. Therefore, the variance of the channel power should be as large as possible in all the feature extracting channels. Hence, the following method is recommended for the design of a filter bank for rotation and scale invariant texture recognition with multi-channel filtering. It consists of:
1. spectral analysis of the data base textures,
2. computation of the variance of each frequency component,
3. determination of a quality value for maximization of the variances,
4. determination of a subspace of all polar-logarithmic filter banks,
5. selection of the polar-logarithmic filter bank which leads to the optimal quality value for variance maximization.
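The five steps above can be sketched roughly as follows. This is a minimal sketch under several assumptions, not the implementation used in the paper: the quality value is the normalized q that this section goes on to define in Eq. (6), the variance of the complex spectra is taken as the mean of |T_i − average|² (an assumption about how the square in Eq. (2) is applied to complex values), and the candidate filter banks are simply passed in as arrays of non-negative transfer functions instead of being generated as polar-logarithmic Gabor banks. All function names are hypothetical.

```python
import numpy as np


def spectral_variance(textures):
    """Steps 1-2: variance of each frequency component over all textures."""
    spectra = np.fft.fft2(np.asarray(textures, dtype=float), axes=(-2, -1))
    mean = spectra.mean(axis=0)
    # Assumption: squared magnitude of the deviation for complex spectra.
    return (np.abs(spectra - mean) ** 2).mean(axis=0)


def quality(filter_bank, variance):
    """Step 3: normalized quality value q of an N-channel bank, cf. Eq. (6)."""
    weight = np.asarray(filter_bank, dtype=float).sum(axis=0)  # sum over channels
    denom = weight.sum() * variance.sum()
    return float((weight * variance).sum() / denom) if denom > 0 else 0.0


def select_bank(candidates, textures):
    """Steps 4-5: pick the candidate bank with the largest q."""
    variance = spectral_variance(textures)
    scores = [quality(bank, variance) for bank in candidates]
    return int(np.argmax(scores)), scores


# Usage: two textures that differ only at 8 periods per image width.
x = np.arange(64)
base = np.tile(np.cos(2 * np.pi * 4 * x / 64), (64, 1))
extra = np.tile(np.cos(2 * np.pi * 8 * x / 64), (64, 1))
textures = [base, base + extra]

bank_a = np.zeros((1, 64, 64)); bank_a[0, 0, 8] = 1.0  # passes the differing component
bank_b = np.zeros((1, 64, 64)); bank_b[0, 0, 4] = 1.0  # passes only the common component
best, scores = select_bank([bank_a, bank_b], textures)
```

Here the bank that passes the frequency at which the textures differ obtains the larger quality value and is selected, which is the behaviour the design procedure aims for.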
Fig. 1 shows that spectral components exhibiting higher variance should be treated with higher priority. When extracting features by spectral analysis, a spectral component with higher variance should be given more weight, since the data base textures differ more in this spectral region. Let the n textures of the data base be the discrete two-dimensional spatial functions $t_1(x,y), t_2(x,y), \ldots, t_n(x,y)$ and let their Fourier transforms be the two-dimensional spatial frequency functions

$$T_i(u,v) = \sum_{x=-\infty}^{\infty} \sum_{y=-\infty}^{\infty} t_i(x,y)\, e^{-j2\pi(ux+vy)}, \qquad i = 1, 2, \ldots, n. \tag{1}$$

The variances of these Fourier transforms are

$$\sigma^2(u,v) = \frac{1}{n} \sum_{i=1}^{n} \{T_i(u,v) - \mu(u,v)\}^2, \tag{2}$$

where the average is

$$\mu(u,v) = \frac{1}{n} \sum_{j=1}^{n} T_j(u,v). \tag{3}$$
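As a quick check of Eqs. (2) and (3), consider the special case of n = 2 textures. This worked example is added here for illustration and is not part of the original derivation; it writes $\mu$ for the average of Eq. (3) and $\sigma^2$ for the variance of Eq. (2):

$$\mu = \tfrac{1}{2}(T_1 + T_2), \qquad \sigma^2 = \tfrac{1}{2}\left[(T_1 - \mu)^2 + (T_2 - \mu)^2\right] = \tfrac{1}{4}(T_1 - T_2)^2 .$$

For two textures the variance at a frequency (u, v) is therefore just the squared difference of their spectra at that frequency, up to a constant factor, so frequencies at which the two spectra coincide contribute nothing.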
Fig. 1. Frequency components of large variance are treated with higher priority.

If the transfer function of a filter is interpreted as a weight $H(u,v)$ with the discrete spectral co-ordinates u and v, then the use of

$$H(u,v) = k\,\sigma^2(u,v) \tag{4}$$

with the variance $\sigma^2(u,v)$ of the spectral components as filter transfer function leads to a good feature extraction. An optimal filter for the feature extraction is, therefore,

$$H(u,v) = \sum_{i=1}^{n} \left\{ T_i(u,v) - \frac{1}{n} \sum_{j=1}^{n} T_j(u,v) \right\}^2 \tag{5}$$
with a constant factor for all u, v. The equation for an optimal filter bank does not necessarily lead to a polar-logarithmic filter bank. Texture recognition invariant with respect to rotation and scale, however, requires such a filter bank. Hence, volume fitting becomes inevitable. Volume fitting means selecting the polar-logarithmic filter bank which best fits the optimal filter bank according to Eq. (5). For n data base textures and an N-channel filter bank with discrete co-ordinates u, v, a normalized quality value q is defined according to

$$q = \frac{\sum_{i=1}^{N} \sum_{u} \sum_{v} H_i(u,v)\, \sigma^2(u,v)}{\sum_{i=1}^{N} \sum_{u} \sum_{v} H_i(u,v) \; \sum_{u} \sum_{v} \sigma^2(u,v)} \tag{6}$$

or

$$q = \frac{\sum_{i=1}^{N} \sum_{u} \sum_{v} H_i(u,v) \sum_{j=1}^{n} \left\{ T_j(u,v) - \frac{1}{n} \sum_{k=1}^{n} T_k(u,v) \right\}^2}{\sum_{i=1}^{N} \sum_{u} \sum_{v} H_i(u,v) \; \sum_{u} \sum_{v} \sum_{j=1}^{n} \left\{ T_j(u,v) - \frac{1}{n} \sum_{k=1}^{n} T_k(u,v) \right\}^2}. \tag{7}$$

The range of this value is $0 \le q \le 1$. Without this normalization the quality value would depend on [...] or on the signal power of the texture. The polar-logarithmic filter bank with the largest quality value must be selected. Here, q = 1 means that the filtering perfectly exploits the distinctness of the data base textures, while q = 0 means that any distinctness is suppressed.

3. Spar hyper volume as criterion for the design of filter banks for supervised texture analysis

Another criterion for filter bank design is presented in this section. If two different textures are to be distinguished according to their feature vectors, then it is desirable that all the feature vectors are orthogonal. In this case, one of the two textures possesses all those features which the other lacks. Particularly undesirable is linear dependence of feature vectors representing different classes. Here, both textures have the same features and cannot be classified. This motivates the use of the following method:
(A) Computation of all angles between the feature vectors.
(B) Determination of a quality value maximizing the angles between the feature vectors.
(C) Re-design of the filter bank until the quality value determined in (B) is optimal.

4. Distinctness of feature vectors

If two feature vectors are linearly dependent, then the represented objects cannot be distinguished, as they exhibit common features. But if two feature vectors are orthogonal, then the object represented by one of the two vectors has all those features which the other object lacks. In this case, a maximum classification rate is achieved. For classification, the feature extraction is the better, the larger the orthogonal projections of the vectors are, in other words, the larger the component of non-common features. In the same way, the smaller the parallel projections, the better the extraction, in other words, the smaller the component of common features. For optimal distinctness in a large set of feature vectors, an orthogonal base is most desirable.

The spar hyper volume of all feature vectors rises as the orthogonal projections of the vectors increase. For an orthogonal base, the spar hyper volume is maximal, while it is zero if at least two vectors of the base are linearly dependent. The spar hyper volume, however, also increases with the magnitude of the vectors, i.e. with the "edge lengths of the spars", without simultaneously increasing the distinctness of the objects represented by the feature vectors. Hence, the normalized spar hyper volume of the feature vectors, which only depends on the "shape of the spar", is an appropriate quality factor for feature extraction. Therefore, the length of each vector of the base is normalized to one and then the normalized spar hyper volume is calculated. The spar hyper volume of an orthonormal basis is one. Fig. 2 compares good feature extraction of large inner angles with feature extraction of small inner angles.

Fig. 2. The larger the angle between extracted feature vectors the better.

Fig. 3. Ten synthetic textures with 128 × 128 pixels, from which the following data bases are chosen.

Fig. 4. Used data base textures (from left to right: textures 1, 2, 3 and 4).

5. Normalized spar hyper volume for defining the quality of feature sets

The spar hyper volume of a vector base is the product of the orthogonal projections of all base vectors. The orthogonal projection of a vector is the component which is orthogonal to all other vectors of the base.
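The verbal definition above can be sketched numerically as follows. This is a minimal sketch under stated assumptions: the function name `normalized_spar_volume` is hypothetical, every vector is first scaled to unit length (the normalization that gives the section its title), at least two non-zero vectors are assumed, and the component of each vector orthogonal to all others is obtained as a least-squares residual, which is one possible way to compute it and not necessarily the way it was computed for the paper.

```python
import numpy as np


def normalized_spar_volume(vectors):
    """Product of the lengths of the components of each unit-normalized
    feature vector that are orthogonal to all other feature vectors."""
    v = np.asarray(vectors, dtype=float)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)  # unit length per row
    vol = 1.0
    for i in range(len(v)):
        others = np.delete(v, i, axis=0)
        # Least-squares fit of v[i] by the other vectors; the residual is the
        # part of v[i] that is orthogonal to their span.
        coeffs, *_ = np.linalg.lstsq(others.T, v[i], rcond=None)
        residual = v[i] - others.T @ coeffs
        vol *= np.linalg.norm(residual)
    return vol


# Usage: an orthonormal base, a base with two parallel vectors, and two
# vectors enclosing an angle of 60 degrees.
vol_orthonormal = normalized_spar_volume(np.eye(3))
vol_parallel = normalized_spar_volume([[1, 0, 0], [2, 0, 0], [0, 1, 0]])
vol_60_degrees = normalized_spar_volume([[1.0, 0.0], [0.5, np.sqrt(3) / 2]])
```

With this definition an orthonormal base gives a value of one and linearly dependent vectors give zero. Note that for two unit vectors enclosing an angle θ the product of the two orthogonal components is sin²θ, so the 60 degree example evaluates to 0.75.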
Let $v_1, v_2, \ldots, v_n$ be the m-dimensional base of n feature vectors $v_j = (v_{j1}, v_{j2}, \ldots, v_{jm})$. Note that the base does not necessarily span the whole feature space, as the number of feature vectors is not necessarily the same as the number of features contained in a vector. Then the orthogonal projections $o_1, o_2, \ldots, o_n$ with $i, j \in \{1, 2, \ldots, n\}$ satisfy

$$\langle o_i, v_j \rangle = 0 \quad \forall\, i \neq j, \tag{8}$$

$$\langle o_i, v_i \rangle \neq 0 \quad \forall\, i, \tag{9}$$

where the inner product of $o_i$ and $v_j$ is $\langle o_i, v_j \rangle = \sum_{k=1}^{m} o_{ik} v_{jk}$. All vectors are normalized, so $|v_1| = |v_2| = \ldots = |v_n| = 1$ with $|v_i| = \sqrt{\sum_{k=1}^{m} v_{ik}^2}$. The normalized spar hyper volume to be maximized is

$$\mathrm{vol} = \prod_{i=1}^{n} |o_i| \tag{10}$$

and is in the range $0 \le \mathrm{vol} \le 1$. It is the product of the components of all the feature vectors which are orthogonal to all other feature vectors. The normalized spar hyper volume only contains unit vectors ($|o_i| = 1$). The normalized spar hyper volume defines the quality of a set of features compared to an optimal feature set.

The "curse of dimensionality" [12] states that the distinctness of feature vectors does not increase monotonically with the dimension of the feature vector. This leads to the problem of selecting a suitable feature vector dimension for an optimal classification. All those features that appear in several vectors are "bad" for classification and should be used with low priority. Instead, all those features which appear in just one vector are apt for recognition purposes and should be used with high priority. When analyzing the normalized spar hyper volumes of feature bases of different dimensions, the basis of that dimension which achieves a high normalized spar hyper volume must be considered especially appropriate.

Fig. 6 points out that the two methods are almost equivalent, independent of the texture class. The larger the normalized spar hyper volume, the larger the variance (both values have to be maximized). However, as design criterion the volume fitting criterion should be applied, because the filter bank can be calculated from the texture data base before feature extraction takes place. The spar hyper volume serves for check purposes, because it can only be calculated after feature extraction using the filter bank.

6. Numerical results

From a set of data base textures shown in Figs. 3 and 4, features are extracted with the Gabor filter banks presented in Fig. 5 and afterwards classified using symmetric [13] phase-only matched filters (SPOMF), which exploit the importance of phase in image processing [14–16] extremely well. The simulation compares the spar hyper volumes of the extracted feature vectors.

Fig. 5. Gabor filter bank used in Tables 1–3. The axes are cartesian frequency axes $f_x$ and $f_y$. The DC part is in the center, the rims are 64 periods per image width.

Table 1. SPOMF filter outputs at spar hyper volume of 0.38

Feature of   Texture 1   Texture 2   Texture 3   Texture 4
Texture 1    1.000000    0.532973    0.658543    0.571215
Texture 2    0.532973    1.000000    0.611287    0.630673
Texture 3    0.658543    0.611287    1.000000    0.831551
Texture 4    0.571215    0.630673    0.831551    1.000000

Table 2. SPOMF filter outputs at spar hyper volume of 0.000008

Feature of   Texture 1   Texture 2   Texture 3   Texture 4
Texture 1    1.000000    0.999999    0.611827    0.630672
Texture 2    0.999999    1.000000    0.612493    0.630012
Texture 3    0.611827    0.612493    1.000000    0.831551
Texture 4    0.630672    0.630012    0.831551    1.000000

Table 3. SPOMF filter outputs at spar hyper volume of 0.278473

Feature of   Texture 1   Texture 2   Texture 3   Texture 4
Texture 1    1.000000    0.500000    0.826999    0.630672
Texture 2    0.500000    1.000000    0.500939    0.630889
Texture 3    0.826999    0.500939    1.000000    0.631330
Texture 4    0.630672    0.630889    0.631330    1.000000
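The SPOMF classification step used in this section can be sketched as follows. This is a minimal sketch, assuming that the feature vector of a texture is stored as a one- or multi-dimensional array and that the classification score is the peak of the phase-only correlation; the function name `spomf_peak` and the small constant `eps` that guards against division by zero are assumptions of this example, not taken from the paper.

```python
import numpy as np


def spomf_peak(a, b, eps=1e-12):
    """Peak of the symmetric phase-only matched filter output of a and b."""
    fa = np.fft.fftn(np.asarray(a, dtype=float))
    fb = np.fft.fftn(np.asarray(b, dtype=float))
    # Keep only the phase of both spectra (symmetric phase-only filtering).
    pa = fa / np.maximum(np.abs(fa), eps)
    pb = fb / np.maximum(np.abs(fb), eps)
    # Inverse transform of the phase-only cross spectrum.
    correlation = np.fft.ifftn(pa * np.conj(pb)).real
    return float(correlation.max())


# Usage: identical vectors, a cyclically shifted copy, and a different vector.
reference = np.array([1.0, 0.0, 0.0, 0.0])
peak_same = spomf_peak(reference, reference)
peak_shifted = spomf_peak(reference, np.roll(reference, 2))
peak_other = spomf_peak(reference, np.array([1.0, 0.5, 0.0, 0.0]))
```

In this sketch two identical arrays, as well as cyclically shifted copies of the same array, give a peak of one, while arrays with different phase spectra give a lower peak. The comparison of a texture with every data base texture in this way yields score tables of the kind shown in Tables 1–3.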
It must be mentioned that the lower the second highest resulting SPOMF output (which cannot exceed the range [0,1]), the better the feature vectors and the higher the classification rate. The SPOMF matched filtering of two identical feature vectors always yields 1.0. In Table 2, the feature vectors of textures 1 and 2 are almost parallel. Therefore, the spar hyper volume is very small and the distinctness is very low. Finally, a simulation with Brodatz [17] textures shown in Fig. 6 underlines the equivalence of the normalized spar hyper volume as analysis criterion with the variance fitting as synthesis criterion. Features of two Brodatz textures shown in Fig. 7 were extracted using different Gabor filter banks and the corresponding spar hyper volumes were calculated. The spectral
Fig. 6. Comparing normalized spar hyper volume with volume fitting, solid line: normalized spar hyper volume, dotted line: maximum of SPOMF filter output (peak sharpness).
Fig. 7. Used Brodatz textures [17] D11 and D24.
variances were calculated and fitted to different Gabor filter banks. The Gabor filter banks consist of four channels in radial direction with a radial bandwidth of one octave and eight channels in azimuthal direction with an azimuthal bandwidth of 22.5°. The banks differ in the minimum radial center frequency, which is between 1 and 8 periods per image width. The banks also differ in the ratio of adjacent radial center frequencies. At a minimum radial center frequency between 1 and 6 periods per image width the ratio is 2, at 7 periods per image width it is 1.8, and at 8 periods per image width it is 1.7. Keeping this ratio constant at a value of 2.0 would violate Shannon's theorem at a minimum radial center frequency of more than 6 periods per image width. Figs. 8(a) and (b) compare the segmentation results with two different filter banks. The left filter bank extracts features spanning a normalized spar hyper volume of 0.06, the right filter bank extracts features spanning a normalized spar hyper volume of 0.68. Fig. 9
explains the feature extraction with multi-channel Gabor filtering.

7. Summary

We have presented two criteria for invariant supervised texture segmentation based on multi-channel approaches. For the design of the filter banks, the variances of the frequency components must be maximized. The spar hyper volume spanned by the normalized feature vectors representing different textures must be maximized as well. These two criteria give guidelines for filter bank design. A comparison of the spar hyper volume with the volume fitting criterion demonstrates their feasibility and equivalence for texture segmentation. As design criterion, the volume fitting criterion should be applied, because the filter bank can be calculated from the texture data base before feature extraction takes place. The spar hyper volume serves for check purposes.
Fig. 8. (a) Top row: left: input image, middle: segmentation result with a feature set of a spar hyper volume of 0.06, right: segmentation result with a feature set of a spar hyper volume of 0.68, bottom row: middle: feature extracting filter bank leading to a feature vector set with a spar hyper volume of 0.06, right: feature extracting filter bank leading to a feature vector set with a spar hyper volume of 0.68. (b) From left to right: input image, Gabor filter bank leading to feature set of spar hyper volume of 0.04 with segmentation result, Gabor filter bank leading to feature set of spar hyper volume of 0.34 with segmentation result.
Fig. 9. Feature extraction with multi-channel Gabor filtering.
References

[1] M. Bresch, Invariant supervised texture recognition using multi-channel Gabor filters, Proceedings of Eusipco'98, Rhodes, Vol. IV, September 1998, pp. 2521–2524.
[2] M. Bresch, Invariant texture segmentation with reduced illumination sensitivity, ECMCS'99, European Conference on Multi-Media Communications and Services, Krakow, June 1999 (extended version: Signal Processing 81 (4)).
[3] Y. Sheng, H.H. Arsenault, Experiments on pattern recognition using invariant Fourier-Mellin descriptors, J. Opt. Soc. Am. A 3 (6) (1986) 771–776.
[4] D. Gabor, Theory of communication, J. Inst. Electr. Eng. 93 (1946) 429–457.
[5] F. Farrokhnia, A.K. Jain, Unsupervised texture segmentation using Gabor filters, Pattern Recognition 24 (1991) 1167–1186.
[6] F. Farrokhnia, A.K. Jain, A multi-channel filtering approach to texture segmentation, IEEE Trans. on Image Proc. (1991) 364–370.
[7] A.K. Jain, N.K. Ratha, S. Lakshmanan, Object detection using Gabor filters, Pattern Recognition 30 (2) (1997) 295–309.
[8] M. Yoshimura, S. Oe, Evolutionary segmentation of texture image using genetic algorithms towards automatic decision of optimum number of segmentation areas, Pattern Recognition 32 (12) (1999) 2041–2054.
[9] G. Cristóbal, J. Hormigo, Texture segmentation through eigen-analysis of the Pseudo-Wigner distribution, Pattern Recognition Lett. 20 (3) (1999) 337–345.
[10] A. Khotanzad, Y.H. Hong, Rotation invariant image recognition using features selected via a systematic method, Pattern Recognition 23 (10) (1990) 1089–1101.
[11] S.O. Belkasim, M. Shridhar, M. Ahmadi, Pattern recognition with moment invariants: a comparative study and new results, Pattern Recognition 24 (12) (1991) 1117–1138.
[12] J. Mao, A.K. Jain, Texture classification and segmentation using multiresolution simultaneous autoregressive models, Pattern Recognition 25 (2) (1992) 173–189.
[13] Q.S. Chen, M. Defrise, F. Deconinck, Symmetric phase-only matched filtering of Fourier-Mellin transforms for image registration and recognition, IEEE Trans. Pattern Anal. Mach. Intell. 16 (12) (1994) 1156–1168.
[14] J.S. Lim, A.V. Oppenheim, The importance of phase in signals, Proc. IEEE 69 (5) (1981) 529–541.
[15] T.S. Huang, J.W. Burnett, The importance of phase in image processing filters, IEEE Trans. ASSP ASSP-23 (6) (1975) 529–542.
[16] J.L. Horner, P.D. Gianino, Phase-only matched filtering, Applied Optics 23 (6) (1984) 812–816.
[17] P. Brodatz, Textures: A Photographic Album for Artists and Designers, Dover, New York, 1965.
About the Author: M. BRESCH studied Electrical Engineering at the University of Dortmund from 1989 to 1995 and finished his Ph.D. thesis on "Texture Segmentation using Multi-Channel Approaches" in November 1999 at the Chair of Microelectronic Systems at the Gerhard-Mercator-University of Duisburg. His main interests are Communication and Information Theory, Image Processing, and Pattern Recognition.