Learning completed discriminative local features for texture classification


Pattern Recognition 67 (2017) 263–275


Zhong Zhang a,∗, Shuang Liu a, Xing Mei b, Baihua Xiao c, Liang Zheng d

a Tianjin Key Laboratory of Wireless Mobile Communications and Power Transmission, Tianjin Normal University, Tianjin 300387, China
b National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
c The State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
d Centre for Quantum Computation and Intelligent Systems, University of Technology Sydney, Ultimo, NSW 2007, Australia

Article history: Received 18 March 2016; Revised 15 February 2017; Accepted 16 February 2017; Available online 17 February 2017

Keywords: Texture classification; Discriminative learning; Local binary patterns; Adaptive histogram accumulation

Abstract

Local binary patterns (LBP) and its variants have shown great potential in texture classification tasks. LBP-like texture classification methods usually follow a two-step feature extraction process: in the first step (pattern encoding), the local structure information around each pixel is encoded into a binary string; in the second step (histogram accumulation), the binary strings are accumulated into a histogram that serves as the feature vector of a texture image. The performance of these classification methods is closely related to the distinctiveness of the feature vectors. In this paper, we propose a novel feature representation method, namely Completed Discriminative Local Features (CDLF), for texture classification. The proposed CDLF improves the distinctiveness of LBP-like feature vectors in two respects: in the pattern encoding step, we learn a transformation matrix using labeled data, which significantly increases the discrimination power of the encoded binary strings; in the histogram accumulation step, we use an adaptive weighting strategy to account for the contributions of pixels in different regions. Experimental results on three challenging texture databases demonstrate that the proposed CDLF achieves significantly better results than previous LBP-like feature representation methods for texture classification tasks.

1. Introduction

Texture is customarily defined as a visual pattern with repeated basic primitives, which describes the distinctive appearance of many natural objects [1]. Texture classification, as an important technique in texture analysis, has attracted much attention owing to its value for both theoretical challenges and practical applications, such as texture retrieval, remote sensing, image classification, and medical image analysis. Typically, a texture classification algorithm comprises two main stages. The first stage is texture representation, which extracts a feature vector from each texture image. The second stage is classification: with the extracted feature vector as input, a classifier determines the category of the given texture image. The most frequently used classifiers are the k-nearest neighbor (k-NN) classifier [2] and the support vector machine (SVM) [3]. In this paper, we focus on the texture representation step. We aim to develop discriminative and robust features that reduce the intra-class variance and enlarge the margin between different texture classes.

∗ Corresponding author. E-mail address: [email protected] (Z. Zhang).

http://dx.doi.org/10.1016/j.patcog.2017.02.021

Texture representations have been widely studied in the past decades. Some early studies propose to extract invariant features using statistical, model-based and signal processing methods, such as co-occurrence matrices [4,5], the hidden Markov model [6], and filter-based methods [7]. Representative filter-based methods include the wavelet transform [8–10] and the Gabor transform [11–13]; these methods provide a precise and unifying framework for the analysis and characterization of a texture image at different scales. That said, they might fail in some challenging situations, such as intense lighting and view changes.

In the last decade, a number of texture classification algorithms have been proposed with significantly improved classification accuracy. Among these algorithms, the bag-of-words (BoW) model and the local binary patterns (LBP) are the most representative due to their competitive performance and potential applications. The BoW model learns a dictionary from a set of filter responses [33] or from the original image patches [34]; the dictionary is then employed to build a histogram for each texture image. Some BoW-based methods learn discriminative features [35]. Meanwhile, LBP [36] and its variants [37–39] have also been widely used as recent state-of-the-art methods. They can typically be decomposed into two steps, as shown in Fig. 1.


Fig. 1. The two steps involved in extracting an LBP-like feature.

The first step is pattern encoding: the center pixel is compared with its neighbouring ones and encoded into a binary string according to certain rules. The second step is histogram accumulation: each binary string is translated into a decimal number, which increments the count of the corresponding bin with equal weight. Step 2 thus produces a feature vector.

LBP [36] is regarded as a state-of-the-art texture representation method due to its attractive properties, such as rotation invariance, low computational complexity and robustness against monotonic illumination changes. However, there are still some limitations in the current LBP method. First, the encoding technique of LBP is handcrafted, and the label information of the local region is neglected in the pattern encoding process; the encoding therefore cannot adapt to imaging environment changes, such as noise, illumination, scale and view changes. We believe that the performance will improve by utilizing a data-driven strategy and the label information. Second, most LBP-like algorithms treat each pixel's contribution equally during the histogram accumulation process, ignoring the local contrast information in texture regions. Nevertheless, image regions with high local contrast usually carry rich structure information, and therefore they should contribute more to the discrimination of the texture image than flat regions. A texture image from the CUReT database [40] is shown in Fig. 2(a), and its standard deviation map over 3 × 3 regions is shown in Fig. 2(b). Intuitively, the regions with high standard deviation carry more discriminative information for classification.

In this paper, we propose a novel feature extraction approach, namely Completed Discriminative Local Features (CDLF), for texture classification. CDLF builds upon LBP with two important improvements. First, it learns a discriminative encoding strategy in a data-driven way. Our key idea is to learn a transformation matrix for texture images that maximizes the mutual information between the local features and their category labels. Second, we propose an adaptive histogram accumulation (AHA) algorithm, which leverages the local contrast characteristic in the process of histogram accumulation: we use the standard deviation of a local region to weight the contribution of each pixel, so that more of the local contrast information useful for texture classification is exploited. Furthermore, inspired by CLBP [65], we also adopt completed representations for CDLF. Concretely, we learn the transformation matrix for the sign and magnitude components, respectively, and adopt the AHA strategy to generate the histograms. With the above improvements, the proposed CDLF extracts discriminative and robust local features. Our approach is verified on three challenging texture image databases, and the experimental results demonstrate that the proposed CDLF achieves better results than previous algorithms in both the noiseless and noisy cases.

The remainder of this paper is organized as follows. The next section presents the related work. Section 3 gives a brief review of LBP and CLBP. Section 4 presents the proposed CDLF in detail. Section 5 shows the experimental results, which outperform the state-of-the-art algorithms on three publicly available texture databases. Finally, we conclude the paper in Section 6.

2. Related work

Before presenting our method, we briefly discuss a number of texture classification methods, which can be categorized into four main classes: model-based, structural, filter-based, and statistical methods [16]. In addition, researchers have found that classification performance improves when the low-level features are restructured; the BoW model is a representative example, and we also discuss it in this section.

The model-based methods consider contextual and spatial information for texture classification. The spatial relationship among pixels is modeled by different algorithms, including the Gibbs random field (GRF) [17], the Gaussian Markov random field [18,19], the auto-regressive (AR) model [20] and the hidden Markov model [6]. Awate et al. [21] focus on unsupervised texture segmentation and utilize a nonparametric statistical model of image neighborhoods; they then employ entropy minimization on higher-order statistics to optimize the segmentation.

The structural methods try to find the texture primitives and determine their arrangement rules. The challenge for this kind of method is how to determine the primitives and the rules [22,23]. Mathematical morphology is usually employed to extract structural features for texture analysis [24,25].

The filter-based methods mainly comprise spatial, frequency and spatial-frequency filters. Varma and Zisserman [33] design a bank of spatial filters named MR8 to achieve rotation invariance. The frequency filters, such as the Fourier transform, decompose an image into frequency components [26]. The spatial-frequency filters, such as the wavelet and Gabor transforms, show superior accuracy to spatial and frequency filters. The pyramid-structured wavelet transform (PSWT) [27] and the tree-structured wavelet transform (TSWT) [28] use the energy of frequency regions at different scales as features. Several extensions of the wavelet, and combinations with other approaches, have been developed. For example, Lasmar et al. [29] combine an asymmetric power distribution model with wavelet subbands for texture classification. The Radon transform combined with the wavelet transform yields rotation-invariant features [14,15]. Ahmadvand and Daliri [30] extend the wavelet and combine it with the LL channel filter bank to obtain rotation invariant representations. As a type of wavelet extension, shearlets are effective in capturing intrinsic geometric structures [31]. Dong et al. [32] utilize energy features to represent each shearlet subband, and then model them using linear regression.

The BoW model [33,41–43] represents images as histograms over a dictionary of local features. This model first extracts features from local regions, and then learns a dictionary using these extracted local features. Finally, a histogram is computed for each texture image over the learned dictionary. A number of approaches derived from the BoW model focus on the design of local texture features. Some approaches, such as SPIN and RIFT [44,45], utilize sparse sampling, which requires detecting salient regions before local feature extraction. Another kind of approach adopts dense sampling, which applies texture features pixel by pixel. Leung and Malik [46] construct a 3D texton representation using local geometric and photometric properties, and then build a dictionary with the k-means algorithm. Varma and Zisserman [33] claim that the raw pixel values of local regions can achieve better performance [34]. To improve the discrimination and reduce the dimensionality, Liu et al. [48,49] employ the random projection technique on local patches. Furthermore, they fuse complementary information utilizing multiple sorted random projection channels [50]. To reduce the quantization loss, Xie et al. [51] employ sparse learning for texture classification; afterwards, they propose an efficient sparse texton learning scheme to speed up the l0-norm or l1-norm minimization [52]. Mehta and Egiazarian [53] present a patch-based feature set called Dense Micro-block Difference (DMD), and minimize the quantization loss at the dictionary learning step.

The statistical methods form the main group of feature representations for texture classification. Some simple features, such as the mean, variance and standard deviation, are used for texture classification.


Fig. 2. (a) A texture image from the CUReT database; (b) its standard deviation map over 3 × 3 regions.

Some methods utilize color information to overcome illumination variance [54]. Based on the co-occurrence matrix, the grey-level difference method is proposed [55]. LBP [36] is particularly popular, and it has been applied in many fields, including human detection [56], face analysis [57,58], image segmentation [59], background modeling [60], and biomedical image analysis [61].

Despite the great success of the original LBP in many applications, it also has several limitations, and many LBP variants [39,62,63] have been proposed to improve the ordinary LBP at the pattern encoding step. Tan and Triggs [39] propose the local ternary pattern (LTP), which quantizes the difference between a pixel and its neighbors into three levels and then encodes the ternary pattern into two LBPs. Furthermore, Zhao et al. [68] validate that increasing the local quantization level can enhance the local discriminativeness, and present the local quantization code (LQC), in which pixels are assigned to different quantization levels. To overcome the drawbacks of gradient-based features, Mu et al. [56] design the Semantic LBP (SLBP) and Fourier LBP (FLBP) for human detection: SLBP reflects the semantic information of a local region and its space complexity is easily controlled, while FLBP encodes the patterns in the frequency domain primarily to avoid improper local thresholding. Heikkilä et al. [62] propose the center symmetric local binary patterns (CS-LBP) descriptor; specifically, instead of comparing each neighbor against the center pixel, the center-symmetric pairs of neighbour pixels are compared and encoded. Based on these improvements, the center symmetric local ternary patterns (CS-LTP) [63] are proposed to obtain more discriminative features. Self-adaptive quantization thresholds combined with N-nary coding are employed in the local energy pattern (LEP) [64] for material and dynamic texture classification. The completed LBP (CLBP) [65] decomposes the local differences into two complementary components, i.e., the signs and the magnitudes, which are jointly encoded with the difference between the center pixel and the global mean value. Based on CLBP, Zhao et al. [66] present the completed local binary count (CLBC), which counts the number of 1s, for rotation invariant texture classification. Liu et al. [67] develop a computationally simple, noise tolerant descriptor, BRINT, relying on a circular averaging strategy before binarization. Hafiane et al. [2] present adaptive median binary patterns (AMBP), which utilize an adaptive threshold selection that switches between the central pixel and median values based on an adaptive window size. Liu et al. [70] present the median robust extended LBP, which is robust to noise. Another recent trend is to use a learning stage to build LBP-like features; dominant patterns, such as DLBP [3] and DRLBP [69], are introduced into LBP. For more details about LBP, please refer to the recent review [71].

Although the LBP-like descriptors have achieved great success, they may fail to handle situations where the textures present complex local patterns, because they utilize pre-defined encoding strategies. In this paper, we propose an information-theoretic framework that allows learning completed discriminative local features, aiming for a better representation of texture images. In addition, we consider the local contrast characteristic in the process of histogram accumulation for a further improvement.

3. Brief review of LBP and CLBP

In this section, we give a brief review of LBP and CLBP. As mentioned above, CLBP extends LBP to a completed form.

3.1. Local Binary Patterns (LBP)

The LBP descriptor [36] is widely used in texture representation due to its excellent performance. It labels each pixel by computing the signs of the differences between the intensity of that pixel and the intensities of its neighboring pixels; the resulting binary string is converted into a decimal number, and the image is represented by a histogram of these decimal numbers. Formally, the LBP descriptor first calculates the differences between the intensities of the center pixel and its neighboring pixels:

$$\mathbf{f} = [f_0, \ldots, f_p, \ldots, f_{P-1}]^T = [x_0 - x_c, \ldots, x_p - x_c, \ldots, x_{P-1} - x_c]^T, \tag{1}$$

where $\mathbf{f} \in \mathbb{R}^{P \times 1}$ is the local difference vector, $P$ is the total number of involved neighboring pixels, and $x_c$ and $x_p$ ($p = 0, 1, \ldots, P-1$) are the intensities of the center pixel and the neighboring pixels, respectively. The LBP value at $x_c$ is defined as

$$LBP_{P,R} = \sum_{p=0}^{P-1} h(f_p) \cdot 2^p, \qquad h(y) = \begin{cases} 1 & \text{if } y \ge 0 \\ 0 & \text{otherwise} \end{cases} \tag{2}$$


Fig. 3. The flowchart of the LBP coding process.

where $R$ is the sampling radius. If the sampled neighboring pixels do not fall on integer coordinates, their intensities are computed by bilinear interpolation. The extension of the original LBP, i.e., the uniform rotation invariant LBP, is shown in Appendix A.
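To make the encoding concrete, the following is a minimal Python sketch of Eqs. (1) and (2), not the authors' code. It assumes P = 8, R = 1 and takes the eight immediate neighbors on integer coordinates, so no bilinear interpolation is needed:

```python
import numpy as np

def lbp_code(patch):
    """LBP_{8,1} code of the center of a 3x3 patch, per Eqs. (1)-(2)."""
    xc = float(patch[1, 1])
    # The eight neighbors x_0, ..., x_7 in circular order around the center.
    coords = [(1, 2), (0, 2), (0, 1), (0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
    f = np.array([float(patch[r, c]) - xc for r, c in coords])  # Eq. (1)
    h = (f >= 0).astype(int)                      # threshold function h(y)
    return int(np.sum(h * (2 ** np.arange(8))))   # Eq. (2): binary-weighted sum
```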

Given a texture image of size $N_1 \times N_2$, the whole texture image is represented by a histogram over all the LBP patterns. We define the histogram accumulation process as

$$H(k) = \sum_{i=1}^{N_1} \sum_{j=1}^{N_2} \phi(LBP_{P,R}(i,j), k), \quad k \in [0, K], \tag{3}$$

$$\phi(LBP_{P,R}(i,j), k) = \begin{cases} 1, & LBP_{P,R}(i,j) = k \\ 0, & \text{otherwise} \end{cases} \tag{4}$$

where $LBP_{P,R}(i,j)$ is the LBP value at pixel $(i,j)$, and $K$ is the maximal LBP pattern value. Eq. (3) indicates that each $LBP_{P,R}$ value is accumulated into the histogram with a weight of 1. Fig. 3 shows the flowchart of the LBP coding process.
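Putting Eqs. (1)–(4) together, a small accumulation sketch (reusing `lbp_code` from the sketch above; K = 255 is assumed for the raw 8-bit codes):

```python
import numpy as np

def lbp_histogram(image, K=255):
    """Accumulate raw LBP_{8,1} codes into a histogram, per Eqs. (3)-(4)."""
    H = np.zeros(K + 1)
    N1, N2 = image.shape
    for i in range(1, N1 - 1):          # skip the 1-pixel border at R = 1
        for j in range(1, N2 - 1):
            k = lbp_code(image[i - 1:i + 2, j - 1:j + 2])
            H[k] += 1                   # Eq. (4): each pixel counts with weight 1
    return H
```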

3.2. Completed Local Binary Patterns (CLBP)

In [65], a limitation of the LBP descriptor is pointed out: LBP only encodes the signs of the local differences, without considering the magnitudes. Consequently, Guo et al. propose to inject both the signs and the magnitudes of the local differences into the LBP descriptor, which is called the completed local binary patterns (CLBP). CLBP therefore provides richer visual information than LBP, which is beneficial for discriminative representations. They decompose the local differences $f_p$ into two components, i.e., the signs ($s_p$) and the magnitudes ($m_p$):

$$f_p = s_p * m_p, \quad \text{where} \quad \begin{cases} s_p = h(f_p) \\ m_p = |f_p| \end{cases} \tag{5}$$

where $h(y)$ is defined as in Eq. (2), and $|f_p|$ is the absolute value of $f_p$. $s_p$ and $m_p$ correspond to two operators, i.e., CLBP_Sign (CLBP_S) and CLBP_Magnitude (CLBP_M). CLBP_S is the same as the original LBP in Eq. (2). CLBP_M is encoded by $h(\cdot)$, whose input is the difference between $m_p$ and its mean value over the whole texture image. Similarly, CLBP_Center (CLBP_C) is also encoded by $h(\cdot)$, whose input is the difference between $x_c$ and its mean value over the whole texture image. The three operators, CLBP_S, CLBP_M and CLBP_C, are combined to further improve the performance.

4. Approach

The proposed CDLF representation consists of three operators, i.e., CDLF_Sign (CDLF_S), CDLF_Magnitude (CDLF_M), and CDLF_Center (CDLF_C). Note that CDLF_C is the same as CLBP_C. The other two operators improve the discriminative ability in two aspects. First, they learn a discriminative encoding strategy in a data-driven way. Second, they treat the local contrast information as the bin weight of the histogram. In the following, we introduce the proposed discriminative learning methods in detail.

4.1. Discriminative encoding strategy

4.1.1. Maximizing discrimination

We merge the discriminative learning into the CDLF descriptor at the pattern encoding step. Since CDLF_M and CDLF_S have a similar encoding process, we only discuss CDLF_M for convenience. Specifically, we utilize information theory to learn a transformation matrix for CDLF_M:

$$\mathbf{g} = \mathbf{T}_m \cdot \mathbf{k}, \tag{6}$$

where $\mathbf{k} = [k_0, \ldots, k_p, \ldots, k_{P-1}]^T = [m_0 - c, \ldots, m_p - c, \ldots, m_{P-1} - c]^T$, $m_p$ is defined as in Eq. (5), and $c$ is the mean value of $m_p$ over the whole texture image. Here, $\mathbf{T}_m \in \mathbb{R}^{P \times P}$ is a transformation matrix satisfying $\mathbf{T}_m^T \mathbf{T}_m = \mathbf{I}$, i.e., it has orthogonal columns of unit length. In Eq. (6), we try to find a transformation matrix $\mathbf{T}_m$ that broadens the margin between local features $\mathbf{k}$ from different texture categories. For simplicity, we first discuss a two-class problem; the multi-class problem can be handled as a set of two-class problems using the one-versus-all strategy. Aiming at discriminant maximization, Eq. (6) can be rewritten as

$$\max_{\mathbf{T}_m} I(\mathbf{g}_a; l), \tag{7}$$

where $I$ represents the mutual information between two random variables, $\mathbf{g}_a$ is a set of labeled $\mathbf{g}$, and $l \in \{0, 1\}$ are their class labels. Maximizing the mutual information increases the degree of dependence between the features $\mathbf{g}_a$ and their labels, and therefore the discrimination of $\mathbf{g}_a$ is improved. The optimization algorithm is shown in Appendix B. After learning the transformation matrix $\mathbf{T}_m$, we obtain the features $\mathbf{g}$ by Eq. (6). Afterwards, we encode CDLF_M using $h(\cdot)$, whose input is $g_p$. In the same way, we utilize $f_p$ to learn another transformation matrix $\mathbf{T}_s$ for CDLF_S, and encode it like CLBP_S. Note that the proposed CDLF also adopts the uniform and rotation invariant strategy as in [36,65].
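As an illustration of the encoding just described, here is a hedged sketch of the CDLF_M code for a single neighborhood, before the uniform rotation invariant mapping of Appendix A is applied. `T_m` is assumed to have been learned beforehand by the Appendix B procedure, and `c` is the image-wide mean of the magnitudes:

```python
import numpy as np

def cdlf_m_code(f, c, T_m):
    """CDLF_M code for one neighborhood, per Eqs. (5)-(6).

    f:   local difference vector of length P (Eq. (1))
    c:   mean of the magnitudes m_p over the whole texture image
    T_m: learned orthogonal P x P transformation matrix (Eq. (7))
    """
    m = np.abs(np.asarray(f, dtype=float))   # magnitudes m_p = |f_p|, Eq. (5)
    k = m - c                                # k_p = m_p - c
    g = T_m @ k                              # transformed features g, Eq. (6)
    h = (g >= 0).astype(int)                 # binarize with h(.) as in Eq. (2)
    return int(np.sum(h * (2 ** np.arange(len(h)))))
```

CDLF_S would be obtained analogously by applying the learned T_s directly to f.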


Fig. 4. (a) Histogram accumulation, and (b) Adaptive Histogram Accumulation (AHA).

Fig. 5. The framework of CDLF.

As suggested in [65], there are two ways to combine different operators, i.e., jointly and hybridly. We adopt the joint way to combine CDLF_S, CDLF_M, and CDLF_C. Let us denote by "CDLF_S_M" the 2-D joint histogram of CDLF_S and CDLF_M. We build a 3-D joint histogram of the three CDLF operators, denoted "CDLF_S/M/C".

4.2. Adaptive histogram accumulation (AHA)

In this section, we improve the discrimination of the histogram by assigning a different weight to each local region by means of the local contrast information. We define the weight as

$$\omega_{P,R} = 1 + \alpha \cdot Std_{P,R}, \tag{8}$$

where $\alpha$ is a positive constant, and $Std_{P,R}^2 = \frac{1}{P}\sum_{p=0}^{P-1}\Bigl(x_p - \frac{1}{P}\sum_{p=0}^{P-1} x_p\Bigr)^2$. The Std should be normalized by the maximum Std of the texture image. The advantage of this weight is two-fold. First, different texture regions yield different $\omega_{P,R}$, which explicitly accounts for the importance of microscopic texture structures: the weight $\omega_{P,R}$ is high at boundaries, while it is low over flat regions. Second, the weight $\omega_{P,R}$ is invariant when the texture image rotates.

Based on the above properties, we propose adaptive histogram accumulation (AHA) for texture image representation. We calculate $\omega_{P,R}$ for each local region and then treat it as the weight in the histogram accumulation stage. Note that the proposed AHA technique can be applied after any encoding strategy, including CDLF_S, CDLF_M, CDLF_C and the joint combination of the three CDLF operators. For convenience, we only discuss the AHA for CDLF_M; it is formulated as

$$H(k) = \sum_{i=1}^{N_1} \sum_{j=1}^{N_2} \psi(CDLF\_M_{P,R}^{riu2}(i,j), k), \quad k \in [0, K] \tag{9}$$

$$\psi(CDLF\_M_{P,R}^{riu2}(i,j), k) = \begin{cases} \omega, & CDLF\_M_{P,R}^{riu2}(i,j) = k \\ 0, & \text{otherwise} \end{cases} \tag{10}$$
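A minimal sketch of AHA under the assumption that the pattern codes and their P-pixel neighborhoods have already been extracted; alpha = 2 follows the value found best in Section 5.1:

```python
import numpy as np

def aha_histogram(codes, neighborhoods, K, alpha=2.0):
    """Adaptive histogram accumulation, per Eqs. (8)-(10).

    codes[i]:         pattern code of the i-th pixel
    neighborhoods[i]: the P neighbor intensities around the i-th pixel
    """
    stds = np.array([np.std(nb) for nb in neighborhoods])  # Std_{P,R}
    stds = stds / (stds.max() + 1e-12)   # normalize by the image maximum Std
    weights = 1.0 + alpha * stds         # Eq. (8)
    H = np.zeros(K + 1)
    for k, w in zip(codes, weights):
        H[int(k)] += w                   # Eq. (10): weight omega instead of 1
    return H
```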

We compare the traditional histogram accumulation with the AHA in Fig. 4. The traditional histogram accumulation assigns the same weight to each pattern, while the AHA assigns different weights according to the local contrast information. The framework of CDLF is illustrated in Fig. 5. Given a texture image as input, we extract the center intensities and the difference features $f_p$ of Eq. (1). The center intensities are encoded into CDLF_C. The features $k_p$ are computed as defined in Eq. (6).


Fig. 6. Texture samples from the Outex TC10 and Outex TC12.

We employ $f_p$ and $k_p$ to learn the transformation matrices $\mathbf{T}_s$ and $\mathbf{T}_m$, respectively, and then generate CDLF_S and CDLF_M. Finally, the CDLF histogram is built by combining the above three operators using the AHA technique.

4.3. Dissimilarity metric and multi-scale CDLF

A number of distance metrics have been proposed to evaluate the dissimilarity between two histograms. In this paper, we utilize the chi-square distance, formulated as

$$d(A, B) = \sum_{y=1}^{Y} \frac{(A_y - B_y)^2}{A_y + B_y}, \tag{11}$$

where $Y$ is the number of bins, and $A_y$ and $B_y$ are the $y$-th bin values of histograms $A$ and $B$, respectively. We utilize the multi-resolution strategy [36,65] to further improve the classification performance; that is, multiple operators with various $(P, R)$ are utilized.

5. Experimental results

To validate the effectiveness of the proposed algorithm, extensive experiments are carried out for texture classification. We compare our algorithm with the state-of-the-art algorithms on three public texture databases: the Outex database [73], the UIUC database [44], and the CUReT database [40]. In the experiments, each texture image is normalized to an average intensity of 128 and a standard deviation of 20 [36,65]. We employ the nearest neighbor classifier with the chi-square distance for classification.
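A small sketch of this classification step, combining the chi-square distance of Eq. (11) with a 1-NN decision; the `eps` term is our own guard against empty bins, not part of the paper:

```python
import numpy as np

def chi_square(A, B, eps=1e-10):
    """Chi-square dissimilarity between two histograms, per Eq. (11)."""
    return float(np.sum((A - B) ** 2 / (A + B + eps)))

def classify_1nn(query_hist, train_hists, train_labels):
    """Assign the label of the chi-square nearest training histogram."""
    d = [chi_square(query_hist, h) for h in train_hists]
    return train_labels[int(np.argmin(d))]
```

For the multi-resolution strategy, the histograms from (P, R) = (8, 1), (16, 2) and (24, 3) can simply be concatenated before computing the distance.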

5.1. Experimental results on the Outex database

We test our method on the Outex database [73] using two experimental test suites: Outex_TC_00010 (TC10) and Outex_TC_00012 (TC12). The two test suites contain 24 classes, and some samples are shown in Fig. 6. The texture images are captured under nine rotation angles (0°, 5°, 10°, 15°, 30°, 45°, 60°, 75°, and 90°) and three illumination conditions, i.e., 2300 K horizon sunlight (horizon), 2856 K incandescent (inca), and 4000 K fluorescent tl84 (tl84). There are 20 non-overlapping texture images of size 128 × 128 under each imaging condition. For the TC10 test suite, the 24 × 20 texture images with rotation angle 0° and illumination "inca" are used as training samples, and the 8 × 24 × 20 texture images with the other eight rotation angles and illumination "inca" are treated as testing samples. For the TC12 test suite, the training samples are the same as TC10, and the testing samples consist of two parts, i.e., nine rotation angles with illumination "tl84" and nine rotation angles with illumination "horizon".

On this database, we first report the computational cost of the proposed CDLF. We implement the algorithm in Matlab 2012 on a laptop with a 3.4 GHz CPU and 16 GB memory. The training time of CDLF is about 40 minutes. The computational cost of CDLF, CLBP, LTP and LBP is 8.7 ms, 8.2 ms, 7.2 ms, and 6.4 ms, respectively, for each texture image of size 128 × 128 when (P, R) equals (8, 1). We observe that the computational cost of CDLF is comparable with that of CLBP; this conclusion generalizes to other (P, R) values and databases.

Then, we study the influence of the parameter α in Eq. (8), which controls the importance of the local standard deviation. We mainly report the results of the proposed CDLF_S/M/C combined with the AHA technique (CDLF_S/M/C+AHA) when (P, R) equals (8, 1) on the tl84 database; our experiments have shown that the conclusions generalize to the other databases as well. From Fig. 7, we can see that we achieve superior results when α is set to 2.


Table 1. Classification accuracies (%) of different methods on the TC10 and TC12 databases (tl84 and horizon are the two TC12 test sets; "–": not reported).

| Method | R=1,P=8 TC10 | tl84 | horizon | Avg. | R=2,P=16 TC10 | tl84 | horizon | Avg. | R=3,P=24 TC10 | tl84 | horizon | Avg. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LBPri | 78.8 | 71.97 | 69.98 | 73.58 | 91.72 | 88.26 | 88.47 | 89.48 | 98.15 | 87.13 | 87.08 | 86.96 |
| LBP/VAR | 96.56 | 79.31 | 78.08 | 84.65 | 97.84 | 85.76 | 84.54 | 89.38 | 98.2 | 93.59 | 89.42 | 93.74 |
| LTP | 94.14 | 75.88 | 73.96 | 81.33 | 96.95 | 90.16 | 86.94 | 91.35 | 98.1 | 91.6 | 87.4 | 92.37 |
| DLBP | – | – | – | – | 97.7 | 92.1 | 88.7 | 92.83 | – | – | – | – |
| CLBC_S | 82.94 | 65.02 | 63.17 | 70.38 | 88.67 | 82.57 | 77.41 | 82.88 | 91.35 | 83.82 | 82.75 | 85.97 |
| CLBP_S | 84.81 | 65.46 | 63.68 | 71.31 | 89.40 | 82.26 | 75.20 | 82.28 | 95.07 | 85.04 | 80.78 | 86.96 |
| CDLF_S | 85.03 | 66.67 | 64.72 | 72.14 | 90.23 | 83.22 | 76.85 | 83.43 | 95.94 | 86.09 | 81.50 | 87.84 |
| CDLF_S+AHA | 85.16 | 66.92 | 64.44 | 72.17 | 90.60 | 83.73 | 77.43 | 83.92 | 95.91 | 87.18 | 82.92 | 88.67 |
| CLBC_M | 78.96 | 53.63 | 58.01 | 63.53 | 92.45 | 70.35 | 72.64 | 78.48 | 91.85 | 72.59 | 74.58 | 79.67 |
| CLBP_M | 81.74 | 59.3 | 62.77 | 67.93 | 93.67 | 73.79 | 72.40 | 79.95 | 95.52 | 81.18 | 78.65 | 85.11 |
| CDLF_M | 83.41 | 60.28 | 64.10 | 69.26 | 94.84 | 75.09 | 74.21 | 81.38 | 96.12 | 82.73 | 80.25 | 86.37 |
| CDLF_M+AHA | 84.01 | 61.06 | 65.0 | 70.02 | 95.96 | 75.53 | 75.90 | 82.46 | 96.41 | 83.24 | 80.28 | 86.64 |
| CLBC_S_M | 95.23 | 82.13 | 83.59 | 86.98 | 98.10 | 89.95 | 90.42 | 92.82 | 98.7 | 91.41 | 90.25 | 93.45 |
| CLBP_S_M | 94.66 | 82.75 | 83.14 | 86.85 | 97.89 | 90.55 | 91.11 | 93.18 | 99.32 | 93.58 | 93.35 | 95.41 |
| CDLF_S_M | 96.82 | 83.75 | 85.05 | 88.54 | 98.59 | 91.71 | 92.20 | 94.17 | 99.50 | 93.73 | 95.83 | 96.35 |
| CDLF_S_M+AHA | 97.55 | 84.21 | 85.32 | 89.03 | 99.04 | 92.25 | 91.99 | 94.43 | 99.64 | 93.66 | 95.90 | 96.40 |
| CLBC_S/M/C | 97.16 | 89.79 | 92.92 | 93.29 | 98.54 | 93.26 | 94.07 | 95.29 | 98.78 | 94.00 | 93.24 | 95.67 |
| CLBP_S/M/C | 96.56 | 90.30 | 92.29 | 93.05 | 98.72 | 93.54 | 93.91 | 95.39 | 98.93 | 95.32 | 94.52 | 96.26 |
| CDLF_S/M/C | 97.29 | 91.06 | 93.31 | 93.89 | 99.48 | 94.24 | 95.02 | 96.25 | 99.11 | 95.65 | 95.25 | 96.67 |
| CDLF_S/M/C+AHA | 97.45 | 91.62 | 94.03 | 94.37 | 99.64 | 94.40 | 95.44 | 96.49 | 99.22 | 96.02 | 95.67 | 96.97 |

Fig. 7. Performance of our algorithm under different α on the tl84 database.

We compare the proposed algorithm with LBP [36], LTP [39], DLBP [3], CLBP [65], and CLBC [66]. The experimental results are listed in Table 1, from which we can draw a number of interesting conclusions. First, the operators of CDLF and their combinations (CDLF_S, CDLF_M, CDLF_S_M, and CDLF_S/M/C) obtain higher classification accuracies than those of CLBP (CLBP_S, CLBP_M, CLBP_S_M, and CLBP_S/M/C), respectively, because we adaptively learn a transformation matrix from the training samples that maximizes the mutual information between the local features and their labels. Second, when the operators of CDLF and their combinations incorporate the proposed AHA technique, the classification accuracies further improve, owing to explicitly considering the local contrast information of the texture images. Third, the proposed CDLF_S/M/C+AHA achieves better performance than the other algorithms in all situations. This clearly demonstrates that the combination of CDLF_S, CDLF_M, and CDLF_C, with the discriminative encoding strategy and the AHA technique, constitutes a powerful representation of texture images.

The multi-scale technique offers improvements over single-scale analysis. Concretely, we fuse three scales, i.e., (P, R) = (8, 1), (P, R) = (16, 2), and (P, R) = (24, 3), for texture classification [65]. The multi-scale CLBP obtains classification accuracies of 99.14%, 95.18% and 95.55% for TC10, 'tl84' and 'horizon', respectively; the multi-scale CDLF achieves 99.61%, 96.04% and 96.11%, and the multi-scale CDLF+AHA reaches 99.74%, 96.37% and 96.32% on the same test sets. The average accuracies of multi-scale CLBP, multi-scale CDLF and multi-scale CDLF+AHA are 96.62%, 97.25% and 97.48%, respectively. We thus also achieve superior performance with the multi-scale strategy.

5.2. Experimental results on the UIUC database

There are 25 homogeneous classes in the UIUC database, as shown in Fig. 8. Each texture class includes 40 images of size 640 × 480 captured under significant view variations, which makes this database more challenging. As in [66,74], we randomly choose N texture images from each class as training samples, while the remaining 40 − N texture images are used for testing. The average accuracies over 100 random splits with N = 20, 15, 10, 5 are reported in Table 2. The results lead us to the following conclusions. First, the proposed methods (CDLF and CDLF+AHA) achieve better results than CLBP under all conditions, owing to the discriminative encoding strategy and the AHA technique. Second, the proposed CDLF+AHA obtains the best result when (R, P) equals (2, 16) for 20 and 10 training samples, and when (R, P) = (3, 24) for 15 and 5 training samples. Third, the classification accuracies of all algorithms decrease with fewer training samples; however, CDLF degrades much more gracefully than CLBP. For example, when (R, P) equals (1, 8), the proposed CDLF is 2% better than CLBP using 20 training samples, and the improvement grows to 3% using 5 training samples. When the multi-scale technique is adopted, i.e., (P, R) = (8, 1), (P, R) = (16, 2), and (P, R) = (24, 3), the results further improve (see Table 3).

In order to test the robustness of the proposed algorithm, the texture images are corrupted by additive Gaussian noise with different signal-to-noise ratios (SNRs). We randomly choose 20 training samples from each class, and the remaining samples are used for testing. Fig. 9 shows the average results over 100 random splits with (R, P) = (3, 24). These results indicate that all methods maintain their performance when SNR > 15, because the noise power is then negligible compared with the texture image signal intensity.


Fig. 8. Texture samples from the UIUC database.

Table 2. Classification accuracies (%) of different methods on the UIUC database (columns: number of training images N).

| Method | R=1,P=8 N=20 | 15 | 10 | 5 | R=2,P=16 N=20 | 15 | 10 | 5 | R=3,P=24 N=20 | 15 | 10 | 5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LTP | 67.16 | 64.29 | 58.20 | 48.15 | 79.25 | 75.80 | 70.77 | 60.34 | 82.34 | 79.10 | 73.94 | 62.19 |
| LBP/VAR | 66.61 | 63.98 | 58.49 | 50.37 | 73.31 | 70.58 | 66.04 | 57.03 | 75.01 | 71.94 | 66.86 | 57.55 |
| CLBC_S | 55.61 | 51.11 | 46.69 | 39.85 | 62.39 | 59.17 | 53.07 | 43.37 | 66.90 | 63.48 | 57.46 | 47.19 |
| CLBP_S | 54.78 | 51.85 | 46.79 | 40.53 | 61.04 | 55.84 | 51.77 | 41.88 | 64.11 | 60.11 | 54.67 | 44.45 |
| CDLF_S | 56.57 | 53.42 | 49.12 | 43.28 | 63.76 | 58.87 | 54.87 | 44.73 | 67.12 | 63.87 | 57.79 | 48.53 |
| CDLF_S+AHA | 57.87 | 54.18 | 51.65 | 45.87 | 64.46 | 59.32 | 55.97 | 46.43 | 68.45 | 66.27 | 59.83 | 50.22 |
| CLBC_M | 52.12 | 49.84 | 45.51 | 39.04 | 67.1 | 64.42 | 59.01 | 50.67 | 69.33 | 66.63 | 60.62 | 51.68 |
| CLBP_M | 57.52 | 54.14 | 50.11 | 40.95 | 72.12 | 68.99 | 64.47 | 57.06 | 74.45 | 71.47 | 65.21 | 56.72 |
| CDLF_M | 59.28 | 56.43 | 51.82 | 42.58 | 73.46 | 69.39 | 66.21 | 58.97 | 76.26 | 72.65 | 66.40 | 58.67 |
| CLBC_S_M | 82.40 | 78.86 | 74.88 | 65.28 | 88.51 | 86.31 | 82.04 | 73.16 | 89.72 | 87.68 | 83.92 | 75.16 |
| CLBP_S_M | 81.80 | 78.55 | 74.8 | 64.84 | 87.87 | 85.07 | 80.59 | 71.64 | 89.18 | 87.42 | 81.95 | 72.53 |
| CDLF_S_M | 83.87 | 80.20 | 77.38 | 67.91 | 88.62 | 86.76 | 82.91 | 73.79 | 90.87 | 88.39 | 84.05 | 75.63 |
| CDLF_S_M+AHA | 84.80 | 80.83 | 78.69 | 68.03 | 89.11 | 87.94 | 84.75 | 74.66 | 91.14 | 88.85 | 85.73 | 76.09 |
| CLBC_S/M/C | 87.83 | 85.66 | 82.35 | 74.57 | 91.04 | 89.66 | 86.63 | 79.48 | 91.39 | 90.10 | 86.45 | 79.75 |
| CLBP_S/M/C | 87.64 | 85.70 | 82.65 | 75.05 | 91.04 | 89.42 | 86.29 | 78.57 | 91.19 | 89.21 | 85.95 | 78.05 |
| CDLF_S/M/C | 89.57 | 87.73 | 84.74 | 78.39 | 92.85 | 90.17 | 88.73 | 80.55 | 92.90 | 91.18 | 87.34 | 81.78 |
| CDLF_S/M/C+AHA | 90.16 | 88.30 | 85.17 | 78.82 | 94.10 | 90.53 | 89.58 | 80.82 | 93.77 | 92.83 | 88.80 | 82.11 |


Table 3. Classification accuracies (%) using the multi-scale scheme on the UIUC database.

| Method | N=20 | N=15 | N=10 | N=5 |
|---|---|---|---|---|
| multi-scale CLBP | 91.57 | 89.84 | 86.73 | 78.42 |
| multi-scale CDLF | 93.28 | 92.07 | 89.21 | 79.74 |
| multi-scale CDLF+AHA | 94.03 | 93.28 | 89.73 | 80.36 |

When the SNR drops below 15, the noise intensity ascends so that all the compared algorithms undergo a considerable decrease in classification accuracy. However, as the noise level increases, the advantage of CDLF and CDLF+AHA over CLBP is further enlarged.

Fig. 9. Classification accuracy (%) on the UIUC database with Gaussian noise at different SNRs.

5.3. Experimental results on the CUReT database

The CUReT database contains 61 texture classes, each with 92 images captured under different illumination directions, as shown in Fig. 10. As in [44,65,66,74], we randomly choose N texture images from each class as training samples, while the remaining 92 − N are used for testing. The average accuracies over 100 random splits with N = 46, 23, 12, 6 for the single-scale and multi-scale schemes are listed in Table 4 and Table 5, respectively. Conclusions similar to those on the UIUC database can be drawn: our algorithm yields the best results in both the single-scale and multi-scale settings, which once again proves its effectiveness on this database.

We further add Gaussian noise with different SNRs to the original texture images. We randomly choose 46 texture images from each class as training samples, and the remaining for testing. Fig. 11 shows the average results over 100 random splits with (R, P) = (3, 24). From this figure, we can see that the proposed CDLF+AHA achieves the best results in all situations. The proposed CDLF obtains higher accuracy than CLBP except at SNR = 30. As the noise level increases, the relative performance difference between the proposed algorithms (CDLF, CDLF+AHA) and CLBP increases, which proves the noise tolerance of the proposed algorithms.

Fig. 10. 61 texture samples from the CUReT database.


Table 4. Classification accuracies (%) of different methods on the CUReT database (columns: number of training images N).

| Method | R=1,P=8 N=46 | 23 | 12 | 6 | R=2,P=16 N=46 | 23 | 12 | 6 | R=3,P=24 N=46 | 23 | 12 | 6 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LTP | 85.77 | 78.49 | 70.77 | 60.48 | 90.21 | 84.74 | 76.24 | 66.75 | 91.04 | 85.15 | 77.88 | 68.64 |
| LBP/VAR | 61.55 | 55.33 | 49.28 | 41.96 | 55.49 | 50.76 | 45.14 | 39.07 | 55.6 | 51.33 | 44.5 | 38.82 |
| CLBC_S | 78.82 | 72.89 | 66.21 | 56.88 | 79.78 | 74.42 | 68.95 | 60.42 | 80.14 | 74.21 | 70.57 | 60.82 |
| CLBP_S | 80.03 | 73.07 | 67.60 | 58.68 | 84.05 | 79.05 | 72.01 | 62.73 | 86.06 | 81.63 | 75.51 | 67.00 |
| CDLF_S | 81.63 | 74.87 | 69.32 | 59.76 | 86.40 | 80.22 | 74.48 | 65.02 | 86.97 | 82.79 | 77.72 | 69.30 |
| CDLF_S+AHA | 82.38 | 75.08 | 70.44 | 60.86 | 86.65 | 81.28 | 76.06 | 66.52 | 88.73 | 83.21 | 78.05 | 70.43 |
| CLBC_M | 66.61 | 57.82 | 58.62 | 50.12 | 73.89 | 66.05 | 58.7 | 50.63 | 77.41 | 68.36 | 60.53 | 51.23 |
| CLBP_M | 74.78 | 67.86 | 59.95 | 57.52 | 82.71 | 75.93 | 68.32 | 57.55 | 86.59 | 79.76 | 72.12 | 62.81 |
| CDLF_M | 76.29 | 70.32 | 61.85 | 59.88 | 83.28 | 77.39 | 70.78 | 60.72 | 88.48 | 81.51 | 75.33 | 64.84 |
| CDLF_M+AHA | 77.21 | 70.76 | 62.63 | 61.87 | 83.37 | 77.89 | 71.57 | 61.48 | 89.70 | 82.06 | 75.35 | 65.37 |
| CLBC_S_M | 93.10 | 86.62 | 79.88 | 69.89 | 93.78 | 89.60 | 82.71 | 72.16 | 93.60 | 89.12 | 81.57 | 70.52 |
| CLBP_S_M | 93.24 | 88.19 | 80.43 | 71.45 | 93.20 | 89.01 | 81.93 | 72.54 | 93.94 | 89.88 | 83.95 | 73.23 |
| CDLF_S_M | 94.75 | 90.63 | 83.37 | 74.23 | 95.85 | 91.37 | 82.64 | 75.74 | 94.28 | 90.61 | 86.19 | 75.80 |
| CDLF_S_M+AHA | 95.08 | 91.47 | 84.15 | 75.66 | 96.20 | 91.87 | 83.59 | 77.08 | 95.58 | 91.86 | 87.29 | 77.39 |
| CLBC_S/M/C | 94.78 | 90.12 | 82.92 | 72.85 | 95.39 | 91.30 | 85.91 | 75.17 | 95.26 | 90.55 | 84.07 | 73.18 |
| CLBP_S/M/C | 95.19 | 91.2 | 83.81 | 73.44 | 95.35 | 91.24 | 84.66 | 75.41 | 95.38 | 91.77 | 85.01 | 76.16 |
| CDLF_S/M/C | 96.71 | 92.28 | 85.73 | 75.30 | 96.22 | 93.07 | 86.18 | 77.62 | 95.87 | 92.50 | 86.79 | 78.95 |
| CDLF_S/M/C+AHA | 97.40 | 93.21 | 86.06 | 76.10 | 96.88 | 93.65 | 87.17 | 78.81 | 96.27 | 92.80 | 87.45 | 79.82 |

Table 5. Classification accuracies (%) using the multi-scale scheme on the CUReT database.

| Method | N=46 | N=23 | N=12 | N=6 |
|---|---|---|---|---|
| multi-scale CLBP | 97.39 | 94.19 | 88.72 | 79.88 |
| multi-scale CDLF | 98.37 | 95.70 | 90.08 | 82.58 |
| multi-scale CDLF+AHA | 98.71 | 96.10 | 91.39 | 84.22 |

Fig. 11. Classification accuracy (%) on the CUReT database with Gaussian noise at different SNRs.

5.4. Comparison with the BoW-based methods

Varma and Zisserman [34] utilize the joint distribution of intensity values (VZ_Joint) to classify texture images under the framework of the BoW model. They claim that classification based on a BoW model learned directly from the raw image patches outperforms that based on filter bank responses [46,47]. Therefore, we only compare against the representative BoW-based method, i.e., VZ_Joint. Table 6 shows the classification accuracies on the three databases. We can see that the proposed approach leads to higher accuracy than VZ_Joint on the TC10 suite, the UIUC database and the CUReT database.

6. Conclusion

This paper proposes a learning-based discriminative descriptor for texture classification. Based on CLBP, the extraction of the CDLF descriptor consists of two steps. First, it learns transformation matrices through mutual information maximization. Second, an adaptive histogram accumulation (AHA) scheme is proposed to assign different weights to each pixel based on the local contrast information. In essence, the CDLF descriptor inherits the advantages of CLBP, and more importantly, it 1) adapts to different global contexts (datasets) by learning the transformation matrix, and 2) adapts to each image through its local context. The experimental results clearly demonstrate that the proposed method achieves higher accuracy than the state-of-the-art methods.

Acknowledgment

This work is supported by the National Natural Science Foundation of China under Grants No. 61401309, No. 61501327, and No. 61401310, the Natural Science Foundation of Tianjin under Grant No. 15JCQNJC01700, and the Doctoral Fund of Tianjin Normal University under Grants No. 5RL134 and No. 52XB1405. This work is also supported by the Open Projects Program of the National Laboratory of Pattern Recognition under Grant No. 201700001.

Table 6. Classification accuracies (%) of different methods on the three databases.

| Method | Outex TC10 | tl84 | horizon | Avg. | UIUC N=20 | 15 | 10 | 5 | CUReT N=46 | 23 | 12 | 6 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| VZ_Joint [34] | 98.51 | 97.45 | 98.35 | 98.10 | 93.27 | 92.00 | 88.39 | 80.87 | 96.51 | 93.42 | 88.22 | 79.14 |
| multi-scale CDLF | 99.61 | 96.04 | 96.11 | 97.25 | 93.28 | 92.07 | 89.21 | 79.74 | 98.37 | 95.70 | 90.08 | 82.58 |
| multi-scale CDLF+AHA | 99.74 | 96.37 | 96.33 | 97.48 | 94.03 | 93.28 | 89.73 | 80.36 | 98.71 | 96.10 | 91.39 | 84.22 |



Appendix A. The uniform rotation invariant LBP

The rotation invariant LBP is usually obtained by

$$LBP_{P,R}^{ri} = \min\{ROR(LBP_{P,R}, i) \mid i = 0, 1, \ldots, P-1\}, \tag{A.1}$$

where $ROR(x, i)$ performs a circular bitwise right shift $i$ times on the bit number $x$. In order to reduce the interference of noise, Ojala et al. [36] defined the $U$ value at each pixel as the number of bitwise transitions between 0 and 1:

$$U(LBP_{P,R}) = |h(f_{P-1}) - h(f_0)| + \sum_{p=1}^{P-1} |h(f_p) - h(f_{p-1})|, \tag{A.2}$$

The uniform LBP only includes those patterns with no more than two transitions (i.e., $U \le 2$) in the circular binary string. Accordingly, a uniform rotation invariant pattern $LBP_{P,R}^{riu2}$ is defined as

$$LBP_{P,R}^{riu2} = \begin{cases} \sum_{p=0}^{P-1} h(f_p), & U(LBP_{P,R}) \le 2 \\ P+1, & \text{otherwise} \end{cases} \tag{A.3}$$

where the superscript riu2 denotes uniform rotation invariant patterns with $U \le 2$.
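A short sketch of this riu2 mapping, operating on the local difference vector f of Eq. (1):

```python
import numpy as np

def lbp_riu2(f):
    """Uniform rotation invariant code, per Eqs. (A.2)-(A.3)."""
    h = (np.asarray(f, dtype=float) >= 0).astype(int)
    # U value: number of circular 0/1 transitions in the binary string, Eq. (A.2).
    U = abs(int(h[-1]) - int(h[0])) + int(np.sum(np.abs(h[1:] - h[:-1])))
    P = len(h)
    return int(h.sum()) if U <= 2 else P + 1   # Eq. (A.3)
```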

Appendix B. Optimization algorithm

With the help of the differential entropy $H$, we reformulate Eq. (7) as

$$I(\mathbf{g}_a; l) = H(\mathbf{g}_a) - H(\mathbf{g}_a \mid l) = H(\mathbf{g}_a) - P(l=1)H(\mathbf{g}_u) - P(l=0)H(\mathbf{g}_v), \tag{B.1}$$

where $\mathbf{g}_u$ and $\mathbf{g}_v$ are the positive and negative features from $\mathbf{g}_a$, respectively. Meanwhile, we suppose that the prior probabilities of the two classes are both equal to 0.5. The differential entropy $H$ is estimated from $\mathbf{g}$ as $H(\mathbf{g}) = \frac{1}{2}\ln((2\pi)^P \det \Sigma)$, where the covariance matrix $\Sigma$ is computed from $\mathbf{g}$. Here, we assume the distribution of $\mathbf{g}$ is approximately Gaussian. Thus, Eq. (B.1) is approximately equal to

$$I(\mathbf{g}_a; l) \doteq \ln\det\Sigma_a - \frac{1}{2}\ln\det\Sigma_u - \frac{1}{2}\ln\det\Sigma_v, \tag{B.2}$$

where $\Sigma_a$, $\Sigma_u$, and $\Sigma_v$ are the covariance matrices computed from all labeled $\mathbf{g}_a$, $\mathbf{g}_u$, and $\mathbf{g}_v$, respectively.

Let us denote the objective function of Eq. (7) as $W(\mathbf{T}_m)$. The target of the optimization algorithm is to search for the transformation matrix $\mathbf{T}_m$ that maximizes $W(\mathbf{T}_m)$. Traditionally, we would calculate the gradient $\partial W(\mathbf{T}_m)/\partial \mathbf{T}_m$ and apply a gradient-based method, but it is difficult to compute this gradient subject to $\mathbf{T}_m^T \mathbf{T}_m = \mathbf{I}$. Thus, we seek $\mathbf{L}(r) \in SO(P)$ to iteratively optimize $\mathbf{T}_m(r)$ at step $r$ so as to achieve a higher value of $W(\mathbf{T}_m)$ than at step $(r-1)$:

$$\mathbf{T}_m(r) = \mathbf{L}(r)\,\mathbf{T}_m(r-1), \tag{B.3}$$

where $SO(P)$ is the P-dimensional special orthogonal group, which corresponds to a set of rotation operations in $\mathbb{R}^P$. As a result, $\mathbf{T}_m$ keeps the property of orthogonality, i.e., $\mathbf{T}_m^T \mathbf{T}_m = \mathbf{I}$. We expect $\mathbf{L}(r)$ to provide an optimal rotation direction that makes the objective value $W$ increase rapidly. On the basis of the Lie algebra, $\mathbf{L}$ is defined as

$$\mathbf{L}^n = \exp\Bigl(n\sigma \sum_{i,j} (\mathbf{E}_{i,j} - \mathbf{E}_{j,i})\Bigr), \tag{B.4}$$

where $2 \le i \le P$, $i+1 \le j \le P$, and $\mathbf{E}_{i,j}$ is a matrix whose $(i,j)$-th element is one and all others are zero. Here, $\sigma$ is the step length, and $n$ is the step number for searching the optimal rotation direction, defined as

$$n^* = \arg\max_{0 \le n \le N} J(\mathbf{L}^n \mathbf{T}_m(r-1)). \tag{B.5}$$

In the optimization process, we first initialize $\mathbf{T}_m$ (see the end of this appendix). At step $r$, we compute $\mathbf{L}^n$ for all $n$ ($1 \le n \le N$), and obtain the optimal $n^*$ by choosing the maximum value of $J(\mathbf{L}^n \mathbf{T}_m(r-1))$ (see Eq. (B.5)). After obtaining the optimal $n^*$, we calculate $\mathbf{L}^{n^*}$ by Eq. (B.4). Then, we obtain $\mathbf{T}_m(r)$ at step $r$ by Eq. (B.3). The iterative optimization terminates when $\|\mathbf{T}_m(r) - \mathbf{T}_m(r-1)\|_F < \varepsilon$, where $\|\cdot\|_F$ denotes the Frobenius norm and $\varepsilon$ is a small positive number. The above optimization process is illustrated in Algorithm 1.

Algorithm 1: Optimization Process
  Input: T_m(0), σ > 0, ε > 0, N > 0
  Output: T_m(r)
  Initialize T_m(0) using the basis of principal subspaces;
  repeat
    1. Obtain the optimal n* by Eq. (B.5)
    2. Compute the optimal rotation direction L^{n*} by Eq. (B.4)
    3. L(r) = L^{n*}
    4. T_m(r) = L(r) T_m(r − 1)
  until ‖T_m(r) − T_m(r − 1)‖_F < ε

The mathematical principle behind this algorithm and further details can be found in [72]. It is crucial to initialize the transformation matrix $\mathbf{T}_m$ for CDLF_M; we utilize the basis of the principal subspaces of $\mathbf{k}$ in Eq. (6) as the initialization of $\mathbf{T}_m$.
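The following is a hedged sketch of this procedure, not the authors' implementation: the objective J is the Gaussian estimate of Eq. (B.2), the rotation direction follows Eq. (B.4) with a single fixed skew-symmetric generator, and `K_pos`/`K_neg` are assumed matrices whose rows are the labeled local feature vectors k:

```python
import numpy as np
from scipy.linalg import expm

def mi_objective(T, K_pos, K_neg):
    """Eq. (B.2): Gaussian estimate of I(g_a; l) for features g = T k."""
    G_pos, G_neg = K_pos @ T.T, K_neg @ T.T
    G_all = np.vstack([G_pos, G_neg])
    logdet = lambda X: np.linalg.slogdet(np.cov(X, rowvar=False))[1]
    return logdet(G_all) - 0.5 * logdet(G_pos) - 0.5 * logdet(G_neg)

def learn_transform(K_pos, K_neg, T0, sigma=0.1, N=10, eps=1e-4, max_iter=100):
    """Algorithm 1: iterative rotation updates T(r) = L(r) T(r-1)."""
    P = T0.shape[0]
    A = np.zeros((P, P))                 # skew-symmetric generator of Eq. (B.4)
    for i in range(P):
        for j in range(i + 1, P):
            A[i, j] += 1.0
            A[j, i] -= 1.0
    T = T0.copy()
    for _ in range(max_iter):
        # Search the step count n maximizing J(L^n T(r-1)), Eq. (B.5).
        candidates = [expm(n * sigma * A) @ T for n in range(N + 1)]
        T_new = max(candidates, key=lambda M: mi_objective(M, K_pos, K_neg))
        if np.linalg.norm(T_new - T, ord='fro') < eps:  # Frobenius stopping rule
            return T_new
        T = T_new
    return T
```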

References

[1] B. Julesz, Textons, the elements of texture perception, and their interactions, Nature 290 (5802) (1981) 91–97.
[2] A. Hafiane, K. Palaniappan, G. Seetharaman, Joint adaptive median binary patterns for texture classification, Pattern Recognit. 48 (8) (2015) 2609–2620.
[3] S. Liao, M. Law, A. Chung, Dominant local binary patterns for texture classification, IEEE Trans. Image Process. 18 (5) (2009) 1107–1118.
[4] R.M. Haralick, K. Shanmugam, I. Dinstein, Textural features for image classification, IEEE Trans. Syst. Man Cybern. 3 (6) (1973) 610–621.
[5] L.S. Davis, Polarograms: a new tool for image texture analysis, Pattern Recognit. 13 (3) (1981) 219–223.
[6] J. Chen, A. Kundu, Rotation and gray scale transform invariant texture identification using wavelet decomposition and hidden markov model, IEEE Trans. Pattern Anal. Mach. Intell. 16 (2) (1994) 208–214.
[7] T. Randen, J.H. Husoy, Filtering for texture classification: a comparative study, IEEE Trans. Pattern Anal. Mach. Intell. 21 (4) (1999) 291–310.
[8] A. Laine, J. Fan, Texture classification by wavelet packet signatures, IEEE Trans. Pattern Anal. Mach. Intell. 15 (11) (1993) 1186–1191.
[9] D. Charalampidis, T. Kasparis, Wavelet-based rotational invariant roughness features for texture classification and segmentation, IEEE Trans. Image Process. 11 (8) (2002) 825–837.
[10] Y. Xu, X. Yang, H. Ling, H. Ji, A new texture descriptor using multifractal analysis in multi-orientation wavelet pyramid, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010, pp. 161–168.
[11] A.C. Bovik, M. Clark, W.S. Geisler, Multichannel texture analysis using localized spatial filters, IEEE Trans. Pattern Anal. Mach. Intell. 12 (1) (1990) 55–73.
[12] A.K. Jain, F. Farrokhnia, Unsupervised texture segmentation using gabor filters, Pattern Recognit. 24 (12) (1991) 1167–1186.
[13] B.S. Manjunath, W.Y. Ma, Texture features for browsing and retrieval of image data, IEEE Trans. Pattern Anal. Mach. Intell. 18 (8) (1996) 837–842.
[14] K. Jafari-Khouzani, H. Soltanian-Zadeh, Radon transform orientation estimation for rotation invariant texture analysis, IEEE Trans. Pattern Anal. Mach. Intell. 27 (6) (2005) 1004–1008.
[15] P. Cui, J. Li, Q. Pan, H. Zhang, Rotation and scaling invariant texture classification based on radon transform and multiscale analysis, Pattern Recognit. Lett. 27 (5) (2006) 408–413.
[16] R. Maani, S. Kalra, Y.H. Yang, Noise robust rotation invariant features for texture classification, Pattern Recognit. 46 (8) (2013) 2103–2116.
[17] I.M. Elfadel, R.W. Picard, Gibbs random fields, cooccurrences, and texture modeling, IEEE Trans. Pattern Anal. Mach. Intell. 16 (1) (1994) 24–37.
[18] H. Deng, D.A. Clausi, Gaussian MRF rotation-invariant features for image classification, IEEE Trans. Pattern Anal. Mach. Intell. 26 (7) (2004) 951–955.


[19] C. Dharmagunawardhana, S. Mahmoodi, M. Bennett, M. Niranjan, Rotation invariant texture descriptors based on gaussian markov random fields for classification, Pattern Recognit. Lett. 69 (1) (2016) 15–21.
[20] J. Mao, A.K. Jain, Texture classification and segmentation using multiresolution simultaneous autoregressive models, Pattern Recognit. 25 (2) (1992) 173–188.
[21] S.P. Awate, T. Tasdizen, R.T. Whitaker, Unsupervised texture segmentation with nonparametric neighborhood statistics, Eur. Conf. Comput. Vis. (2006) 494–507.
[22] E.R. Urbach, M.H.F. Wilkinson, Efficient 2-d grayscale morphological transformations with arbitrary flat structuring elements, IEEE Trans. Image Process. 17 (1) (2008) 1–8.
[23] F.M. Khellah, Texture classification using dominant neighborhood structure, IEEE Trans. Image Process. 20 (11) (2011) 3270–3279.
[24] W.K. Lam, C.K. Li, Scale invariant texture classification by iterative morphological decomposition, in: International Conference on Image Processing (ICIP), 2001, pp. 141–144.
[25] E. Aptoula, Extending morphological covariance, Pattern Recognit. 45 (12) (2012) 4524–4535.
[26] R. Azencott, J.P. Wang, L. Younes, Texture classification using windowed fourier filters, IEEE Trans. Pattern Anal. Mach. Intell. 19 (2) (1997) 148–153.
[27] L. Xavier, T.B.I. Mary, N.D.W. Raj, Content based image retrieval using textural features based on pyramid-structure wavelet transform, in: International Conference on Electronics Computer Technology (ICECT), 2011, pp. 79–83.
[28] T. Chang, C.-C.J. Kuo, Texture analysis and classification with tree-structured wavelet transform, IEEE Trans. Image Process. 2 (4) (1993) 429–441.
[29] N.E. Lasmar, A. Baussard, G.L. Chenadec, Asymmetric power distribution model of wavelet subbands for texture classification, Pattern Recognit. Lett. 5 (1) (2015) 1–8.
[30] A. Ahmadvand, M.R. Daliri, Rotation invariant texture classification using extended wavelet channel combining and LL channel filter bank, Knowl. Based Syst. 97 (1) (2016) 75–88.
[31] G. Easley, D. Labate, W.Q. Lim, Sparse directional image representations using the discrete shearlet transform, Appl. Comput. Harmon. Anal. 25 (1) (2008) 25–46.
[32] Y. Dong, D. Tao, X. Li, J. Ma, J. Pu, Texture classification and retrieval using shearlets and linear regression, IEEE Trans. Cybern. 45 (3) (2015) 358–369.
[33] M. Varma, A. Zisserman, A statistical approach to texture classification from single images, Int. J. Comput. Vis. 62 (1–2) (2005) 61–81.
[34] M. Varma, A. Zisserman, A statistical approach to material classification using image patch exemplars, IEEE Trans. Pattern Anal. Mach. Intell. 31 (11) (2009) 2032–2047.
[35] L. Zheng, S. Wang, J. Wang, Q. Tian, Accurate image search with multi-scale contextual evidences, Int. J. Comput. Vis. 120 (1) (2016) 1–13.
[36] T. Ojala, M. Pietikäinen, T. Maenpää, Multiresolution grayscale and rotation invariant texture classification with local binary patterns, IEEE Trans. Pattern Anal. Mach. Intell. 24 (7) (2002) 971–987.
[37] X. Qian, X. Hua, P. Chen, L. Ke, PLBP: an effective local binary patterns texture descriptor with pyramid representation, Pattern Recognit. 44 (10) (2011) 2502–2515.
[38] J. Ren, X. Jiang, J. Yuan, Noise-resistant local binary pattern with an embedded error-correction mechanism, IEEE Trans. Image Process. 22 (10) (2013) 4049–4060.
[39] X. Tan, B. Triggs, Enhanced local texture feature sets for face recognition under difficult lighting conditions, IEEE Trans. Image Process. 19 (6) (2010) 1635–1650.
[40] K.J. Dana, B.V. Ginneken, S.K. Nayar, J.J. Koenderink, Reflectance and texture of real-world surfaces, ACM Trans. Graph. 18 (1) (1999) 1–34.
[41] M. Varma, A. Zisserman, Texture classification: are filter banks necessary? in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2003, pp. 691–698.
[42] L. Zheng, S. Wang, Z. Liu, Q. Tian, Packing and padding: coupled multi-index for accurate image retrieval, IEEE Conf. Comput. Vis. Pattern Recognit. (2014) 1939–1946.
[43] L. Zheng, S. Wang, Q. Tian, Coupled binary embedding for large-scale image retrieval, IEEE Trans. Image Process. 23 (8) (2014) 3368–3380.
[44] S. Lazebnik, C. Schmid, J. Ponce, A sparse texture representation using local affine regions, IEEE Trans. Pattern Anal. Mach. Intell. 27 (8) (2005) 1265–1278.
[45] J. Zhang, M. Marszałek, S. Lazebnik, C. Schmid, Local features and kernels for classification of texture and object categories: a comprehensive study, Int. J. Comput. Vis. 73 (2) (2007) 213–238.
[46] T. Leung, J. Malik, Representing and recognizing the visual appearance of materials using three-dimensional textons, Int. J. Comput. Vis. 43 (1) (2001) 29–44.
[47] O.G. Cula, K.J. Dana, 3D texture recognition using bidirectional feature histograms, Int. J. Comput. Vis. 59 (1) (2004) 33–60.
[48] L. Liu, P. Fieguth, Texture classification from random features, IEEE Trans. Pattern Anal. Mach. Intell. 34 (3) (2012) 574–586.
[49] L. Liu, P. Fieguth, D. Clausi, G. Kuang, Sorted random projections for robust rotation-invariant texture classification, Pattern Recognit. 45 (6) (2012) 2405–2418.
[50] L. Liu, P. Fieguth, D. Hu, Y. Wei, G. Kuang, Fusing sorted random projections for robust texture and material classification, IEEE Trans. Circuits Syst. Video Technol. 25 (3) (2015) 482–496.
[51] J. Xie, L. Zhang, J. You, D. Zhang, Texture classification via patch-based sparse texton learning, IEEE Int. Conf. Image Process. (2010) 2737–2740.
[52] J. Xie, L. Zhang, J. You, S. Shiu, Effective texture classification by texton encoding induced statistical features, Pattern Recognit. 48 (2) (2015) 447–457.
[53] R. Mehta, K.E. Eguiazarian, Texture classification using dense micro-block difference, IEEE Trans. Image Process. 25 (4) (2016) 1604–1616.
[54] O. Arandjelović, Colour invariants under a non-linear photometric camera model and their application to face recognition from video, Pattern Recognit. 45 (7) (2012) 2499–2509.
[55] T. Ojala, K. Valkealahti, E. Oja, M. Pietikäinen, Texture discrimination with multidimensional distributions of signed gray-level differences, Pattern Recognit. 34 (3) (2001) 727–739.
[56] Y. Mu, S. Yan, Y. Liu, T. Huang, B. Zhou, Discriminative local binary patterns for human detection in personal album, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008, pp. 1–8.
[57] T. Ahonen, A. Hadid, M. Pietikainen, Face description with local binary patterns: application to face recognition, IEEE Trans. Pattern Anal. Mach. Intell. 28 (12) (2006) 2037–2041.
[58] S. Moore, R. Bowden, Local binary patterns for multi-view facial expression recognition, Comput. Vision Image Understanding 115 (4) (2011) 541–558.
[59] P. Nammalwar, O. Ghita, P.F. Whelan, A generic framework for colour texture segmentation, Sensor Rev. 30 (1) (2010) 69–79.
[60] M. Heikkilä, M. Pietikäinen, A texture-based method for modeling the background and detecting moving objects, IEEE Trans. Pattern Anal. Mach. Intell. 28 (4) (2006) 657–662.
[61] K. Burçin, N.V. Vasif, Down syndrome recognition using local binary patterns and statistical evaluation of the system, Expert Syst. Appl. 38 (7) (2011) 8690–8695.
[62] M. Heikkilä, M. Pietikäinen, C. Schmid, Description of interest regions with local binary patterns, Pattern Recognit. 42 (3) (2009) 425–436.
[63] R. Gupta, H. Patil, A. Mittal, Robust order-based methods for feature description, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010, pp. 334–341.
[64] J. Zhang, J. Liang, H. Zhao, Local energy pattern for texture classification using self-adaptive quantization thresholds, IEEE Trans. Image Process. 22 (1) (2013) 31–42.
[65] Z. Guo, L. Zhang, D. Zhang, A completed modeling of local binary pattern operator for texture classification, IEEE Trans. Image Process. 19 (6) (2010) 1657–1663.
[66] Y. Zhao, D. Huang, W. Jia, Completed local binary count for rotation invariant texture classification, IEEE Trans. Image Process. 21 (10) (2012) 4492–4497.
[67] L. Liu, Y. Long, P.W. Fieguth, S. Lao, G. Zhao, BRINT: binary rotation invariant and noise tolerant texture classification, IEEE Trans. Image Process. 23 (7) (2014) 3071–3084.
[68] Y. Zhao, R.G. Wang, W.M. Wang, W. Gao, Local quantization code histogram for texture classification, Neurocomputing 207 (2016) 354–364.
[69] R. Mehta, K. Egiazarian, Dominant rotated local binary patterns (DRLBP) for texture classification, Pattern Recognit. Lett. 71 (2016) 16–22.
[70] L. Liu, S. Lao, P.W. Fieguth, Y. Guo, X. Wang, M. Pietikäinen, Median robust extended local binary pattern for texture classification, IEEE Trans. Image Process. 25 (3) (2016) 1368–1381.
[71] L. Liu, P. Fieguth, Y. Guo, X. Wang, M. Pietikäinen, Local binary features for texture classification: taxonomy and experimental study, Pattern Recognit. 62 (2017) 135–160.
[72] B. Hall, Lie Groups, Lie Algebras, and Representations: An Elementary Introduction, Springer, 2003.
[73] T. Ojala, T. Mäenpää, M. Pietikainen, J. Viertola, J. Kyllönen, S. Huovinen, Outex — new framework for empirical evaluation of texture analysis algorithms, in: International Conference on Pattern Recognition (ICPR), 2002, pp. 701–706.
[74] Y. Zhao, W. Jia, R. Hu, H. Min, Completed robust local binary pattern for texture classification, Neurocomputing 106 (2013) 68–76.


Zhong Zhang received the Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences. He then joined Tianjin Normal University, where he is currently an Associate Professor. He has published about 60 papers in the areas of pattern recognition and computer vision in international journals and conferences such as IEEE Transactions on Circuits and Systems for Video Technology, IEEE Transactions on Information Forensics and Security, Signal Processing (Elsevier), CVPR, ICPR and ICIP. His current research interests include pattern recognition, computer vision and machine learning.

Shuang Liu is an Associate Professor at Tianjin Normal University. She received the B.S. and M.S. degrees in mathematics from Heilongjiang University in 2006 and 2009, respectively, and the Ph.D. degree in pattern recognition and intelligent systems from the Institute of Automation, Chinese Academy of Sciences in 2014. Her current research interests include texture analysis, pattern recognition and computer vision.

Xing Mei is an assistant professor at the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences (CASIA), and was a research scholar at SUNY Albany during 2013–2015. His research interests are in computer vision and computer graphics. Mei received a Ph.D. in pattern recognition and intelligent systems from CASIA.

Baihua Xiao received the B.S. degree in Electronic Engineering from Northwestern Polytechnical University, Xi'an, China, and the Ph.D. degree in Pattern Recognition and Intelligent Control from the Institute of Automation, Chinese Academy of Sciences, Beijing, China, in 1995 and 2000, respectively. Since 2005, he has been a Professor at the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China. His research interests include pattern recognition, image processing and machine learning.

Liang Zheng received the Ph.D. degree in Electronic Engineering from Tsinghua University, China, in 2015, and the B.E. degree in Life Science from Tsinghua University, China, in 2010. He was a postdoctoral researcher at the University of Texas at San Antonio, USA. He is currently a postdoctoral researcher at the Centre for Quantum Computation and Intelligent Systems, University of Technology Sydney, Australia. His research interests include image retrieval, classification, and person re-identification.