
Eye Tracking Data Guided Feature Selection for Image Classification

Xuan Zhou^{a,1}, Xin Gao^{b,1}, Jiajun Wang^{a,∗}, Hui Yu^{a}, Zhiyong Wang^{c}, Zheru Chi^{d,e}

^a School of Electronic and Information Engineering, Soochow University, Suzhou 215006, P.R. China
^b Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou 215163, P.R. China
^c School of Information Technologies, The University of Sydney, NSW 2006, Australia
^d Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
^e PolyU Shenzhen Research Institute, Shenzhen, P.R. China

Abstract

Feature selection has played a critical role in image classification, since it is able to remove irrelevant and redundant features and eventually reduce the dimensionality of the feature space. Although existing feature selection methods have achieved promising progress, human factors have seldom been taken into account. To tackle this problem, a novel two-stage feature selection method is proposed for image classification by taking human factors into account and leveraging the value of eye tracking data. In the coarse selection stage, with the help of eye tracking data, Regions of Interest (ROIs) from the human perspective are first identified to represent an image with visual features. Then, with an improved quantum genetic algorithm (IQGA) that incorporates a novel mutation strategy for alleviating premature convergence, a subset of features is obtained for the subsequent fine selection. In the fine selection stage, a hybrid method is proposed to integrate the efficiency of the minimal-Redundancy-Maximal-Relevance (mRMR) criterion and the effectiveness of the Support Vector Machine based Recursive Feature Elimination (SVM-RFE). In particular, the ranking criterion of the SVM-RFE is improved by incorporating the ranking information obtained from the mRMR. Comprehensive experimental results on two benchmark datasets demonstrate that eye tracking data are of great importance for improving the performance of feature selection for image classification.

Keywords: Eye tracking, Feature selection, Quantum genetic algorithm (QGA), mRMR, SVM-RFE

∗ Corresponding author: [email protected], Tel: (+86) 0512 6522 1873, Fax: (+86) 0512 6787 1211
1 Xuan Zhou and Xin Gao contributed equally to this study.

Preprint submitted to Pattern Recognition, September 12, 2016

1. INTRODUCTION

Feature selection has been one of the key components of many pattern recognition systems such as image classification [1, 2] and cancer classification systems² [3], as increasingly diverse information becomes available to characterize entities such as images and objects to be classified. Since more features do not always lead to better classification performance, feature selection aims to identify a set of relevant and necessary features and to reduce the dimensionality of the feature space for improving classification performance [4]. It also reduces storage and computational costs.

There are two types of feature selection methods: filters and wrappers [5, 6]. Filter type methods utilize the general characteristics of feature data and select the top-ranked features according to a criterion. In general, they aim to maximize the relevancy of a set of features while minimizing the redundancy among features [3, 7, 8, 9]. For example, in [8], with a minimal-redundancy-maximal-relevance (mRMR) criterion defined based on mutual information, near-optimal features were identified with an incremental search method. However, the mRMR method does not allow a flexible trade-off between the relevancy and the redundancy of features, although a suitable trade-off could be useful for improving classification performance [2]. Greedy algorithms and simulated annealing algorithms [3, 10] are typical examples that take the trade-off into account and attempt to determine the optimal trade-off between the relevancy and the redundancy of a set of genes. The most important merit of filter methods lies in their efficiency. However, since no feedback process is involved, it is difficult to ensure improved classification performance when the resulting feature subset is used for model training and learning.

Wrapper type methods are usually able to achieve higher classification accuracy than filter type ones, since the characteristics of classifiers are taken into account in the feature selection process. Among the wrapper type methods, the support vector machine (SVM) is one of the most widely used classifiers. For example, the SVM recursive feature elimination (SVM-RFE) method [11] has attracted more and more attention. In this method, the weights of the trained SVM classifier are used as the ranking measures of genes, and the genes with the poorest ranks are removed. Different from the SVM-RFE [11], which recursively eliminates individual genes, an SVM recursive cluster elimination (SVM-RCE) method [12] was proposed to remove gene clusters according to a score defined in terms of the accuracy of the SVM for each gene cluster. Duan et al. proposed a multiple SVM-RFE (MSVM-RFE) method [13] where the SVM is trained on multiple subsets of training data and genes are ranked through statistical analysis of the gene weights across multiple runs. In [14], Wahde and Szallasi reviewed evolutionary algorithm based wrapper methods where gene selection was achieved with genetic operation based optimizations. In general, wrapper type methods outperform filter type ones, while suffering from high computational cost and limited robustness.

² Feature selection is often referred to as gene selection in the field of bioinformatics.

Recently, some researchers have also proposed to combine filter type and wrapper type feature selection methods to achieve both efficiency and effectiveness. For example, Mundra et al. proposed to minimize the redundancy among selected genes by incorporating the mutual information based mRMR method into the SVM-RFE method [15]. The drawback of such single-filter-single-wrapper (SFSW) approaches is that their classification accuracy depends on the choices of specific filters and wrappers. In [16], Leung et al. proposed a multiple-filter-multiple-wrapper (MFMW) approach that uses multiple filters and multiple wrappers to identify promising biomarker genes for improving classification accuracy and robustness. Their experimental results demonstrated that the MFMW approach outperforms SFSW methods in all cases generated by all combinations of the filters and wrappers used in the MFMW approach.

Similarly, there have been many studies on feature selection specifically for image classification [17, 18, 19]. Although much promising progress has been achieved, little attention has been paid to human factors in the feature selection process. On the contrary, most existing feature selection methods aim to mathematically identify a subset from a given set of low level visual features such as color and texture [20, 21]. Considering the fact that human beings are very good at interpreting visual information for various tasks such as object recognition and image classification, it is particularly valuable to leverage the cognitive process of human beings.

In order to explore the mechanisms of human eyes for processing visual information, we propose to use eye tracking technology. Since the 1960s, this technology has been used as an objective tool for the analysis of visual attention and perception by determining fixations. In [22], eye tracking experiments were carried out to explore the relationship between gaze behavior and a visual attention model that identifies Regions of Interest (ROIs) in images. The results demonstrated that eye gaze behavior on images with clear ROIs is different from that on images without clear ROIs. In [23], the feasibility of using an eye tracker as an image retrieval interface was explored, which showed that eye tracking data could be used to retrieve target images in fewer steps than random selection. In [24], fixation data were used to locate human focused ROIs (hROIs) for image matching, while the relative gaze duration of each hROI was used to weight the similarity measure for image retrieval. This method outperforms conventional content-based image retrieval methods, especially when important regions in images are difficult to locate based on visual features. Therefore, it is anticipated that eye tracking data will help identify important features for image classification by leveraging the cognitive process of human beings.

Based on the above observations, in this paper we propose a two-stage feature selection method. In the coarse selection stage, an eye tracking device is employed to acquire eye tracking data for identifying hROIs as sample images. With the hROIs identified, an improved quantum genetic algorithm (IQGA) is proposed to select an initial subset of features by utilizing the QGA's efficient search capability and effective capacity for optimization problems of a combinatorial nature. In order to alleviate the premature convergence problem of the traditional QGA [25], an adaptive mutation strategy is employed. In the fine selection stage, we propose a hybrid method to take the complementary advantages of both filter type and wrapper type feature selection methods. This method operates on the components of the coarsely selected features by combining the filter type mRMR method and the wrapper type SVM-RFE method sequentially, and is called mRMR-SVM-RFE. The former selects a near-optimal subset of feature components efficiently, while the latter selects more effective feature components from the subset obtained in the former step.

In summary, the key contributions of our work are:

1. Different from most existing feature selection methods, which aim to improve feature selection by devising mathematically sound algorithms, our method is one of the first studies taking human factors into account in the process of feature selection for image classification by using eye tracking data. In addition, instead of utilizing several visual features, we investigate 75 visual features. To the best of our knowledge, this is the largest number of visual features studied in the image classification literature.

2. We propose a two-stage feature selection method. In the coarse selection stage, a subset of visual features is identified with the help of eye tracking data and the quantum genetic algorithm (QGA). We also propose an improved mutation strategy to alleviate the premature convergence issue of the QGA.

3. We propose a hybrid method, namely mRMR-SVM-RFE, in the fine feature selection stage to combine the efficiency merit of the mRMR and the effectiveness merit of the SVM-RFE. We devise an improved SVM-RFE method by integrating the ranking information of individual feature components obtained from the mRMR method into the ranking criterion of the original SVM-RFE method. As a result, our proposed mRMR-SVM-RFE method performs better than the one proposed in [15], where the ratio between the relevancy and the redundancy in mRMR was directly used to devise the ranking criterion in SVM-RFE.

The rest of this paper is organized as follows. In Section 2, we briefly introduce the 75 visual features used in our study. In Section 3, we introduce three aspects of our feature selection method: the acquisition of eye tracking data, the QGA based coarse feature selection, and the mRMR-SVM-RFE based fine feature selection. In Section 4, we present and discuss our experimental results of image classification on two benchmark datasets. In Section 5, we conclude our work together with discussions on our future work.

2. FEATURE EXTRACTION

Low level visual features such as color, texture and shape are fundamental for characterizing visual content such as images [26, 27, 28, 29]. In this paper, we investigate 75 descriptors of these three types of visual features, which have been widely used for visual content representation in various tasks such as image classification and image retrieval.

Color features include the color histogram [30, 31], the dominant color [32], color moments [33], the color set [34], the color structure descriptor [32], the color layout [32], and the scalable color descriptor [35]. In combination with different color spaces and quantization methods, a total of 54 color descriptors (indexed from 1 to 54) are extracted for each image, as shown in Table 1.

In total, 11 texture descriptors (indexed from 55 to 65) [27], as shown in Table 2, are studied in this work: three texture descriptors recommended in MPEG-7, including the local edge histogram descriptor (EHD) [36], the texture browsing descriptor (TBD) and the homogeneous texture descriptor (HTD) [37], the Tamura texture [38], the Gabor feature [39], the primitive length descriptor, the auto-correlation descriptor [40], the edge frequency descriptor [41], the Fourier descriptor, and the co-occurrence matrix descriptor [42].

In this work, 10 widely used shape features (indexed from 66 to 75) [29], as shown in Table 3, are used: the area, the Euler number, area projections, the eccentricity, the principal direction, geometric invariant moments, Legendre moments, Zernike moments [43], and pseudo-Zernike moments [44].

For convenience of the following discussions, each feature is assigned a unique index. That is, all features in the feature set are indexed by I^(0) = {m | m = 1, 2, ..., M}, where M is the number of image features and equals 75 in this paper. Note that other visual features can also be utilized and integrated in our feature selection framework.
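To make the descriptor construction concrete, the following minimal sketch computes one of the 54 color descriptors of Table 1, the 8:3:3 uniformly quantized HSV histogram (feature index 21, 72 dimensions). The function name and the assumption that the HSV channels are already normalized to [0, 1] are ours, not from the paper:

```python
import numpy as np

def hsv_histogram_833(hsv_image):
    """Minimal sketch of an 8:3:3 uniformly quantized HSV histogram
    (cf. feature index 21 in Table 1, 8*3*3 = 72 bins).
    hsv_image: float array of shape (H, W, 3) with H, S, V in [0, 1]."""
    h, s, v = hsv_image[..., 0], hsv_image[..., 1], hsv_image[..., 2]
    # Uniform quantization: 8 hue bins, 3 saturation bins, 3 value bins.
    hq = np.minimum((h * 8).astype(int), 7)
    sq = np.minimum((s * 3).astype(int), 2)
    vq = np.minimum((v * 3).astype(int), 2)
    codes = hq * 9 + sq * 3 + vq          # joint bin index in [0, 71]
    hist = np.bincount(codes.ravel(), minlength=72).astype(float)
    return hist / hist.sum()              # L1-normalized 72-dim descriptor
```

The other color, texture and shape descriptors of Tables 1-3 are computed analogously and concatenated into the 75-feature representation used in Section 3.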

3. FEATURE SELECTION

Our proposed feature selection method consists of two stages: a coarse selection stage and a fine selection stage. In the coarse selection stage, guided by eye tracking data, an improved quantum genetic algorithm (IQGA) is proposed to select a subset of visual features from the total of 75 features so as to achieve the best classification performance in the dataset where the eye tracking data were obtained.


Table 1. Summary of color features

Feature Index  Feature Name                                    Dimension
1∼10           General histogram (in 10 color spaces)          48
11∼20          Accumulated histogram (in 10 color spaces)      48
21             HSV histogram (8:3:3 uniform quantization)      72
22             HSV histogram (16:4:4 uniform quantization)     256
23             HSV histogram (non-uniform quantization)        35
24∼36          Dominant colors (in 13 color spaces)            16
37∼49          Color moment (in 13 color spaces)               9
50             HSV color set (8:3:3 uniform quantization)      8
51             Color structure (in RGB space)                  256
52             Color layout (in YCbCr space)                   12
53             Scalable color (in 16:4:4 HSV space)            66
54             Scalable color (in 8:3:3 HSV space)             20

For the color histogram features, the order of all 13 color spaces is: HSL (Hue, Saturation, Lightness), HSV (Hue, Saturation, Value), JPEG/YCbCr, Lab, Lch (Lightness, chroma, hue), Luv, RGB, XYZ, YCbCr, YDbDr, YIQ, YPbPr and YUV. The general histogram and the accumulative histogram are derived in the 10 color spaces other than HSL, HSV and Lch.

Table 2. Summary of texture features

Feature Index  Feature Name                     Dimension
55             EHD                              8
56             TBD                              5
57             HTD                              62
58             Tamura texture descriptor        3
59             Gabor texture descriptor         24
60             Primitive length descriptor      5
61             Autocorrelation descriptor       8
62             Edge frequency descriptor        385
63             Fourier texture (energy)         256
64             Fourier texture (amplitude)      256
65             Co-occurrence matrix             8

In the fine selection stage, a hybrid feature selection method, mRMR-SVM-RFE, is proposed to take advantage of both the mRMR and the SVM-RFE.

3.1. Coarse Selection with Eye Tracking Data and IQGA

The coarse selection stage consists of three components: eye tracking data acquisition, hROI identification, and coarse feature selection with the IQGA.

3.1.1. Eye Tracking Data Acquisition

Eye tracking data were acquired from a purposely prepared image dataset, namely Dataset1. This dataset comprises two categories of images: one with distinct objects and the other without distinct objects. Each category contains 50 images which were randomly picked from the 7,346-image Hemera color image database [45]. Each image in this dataset has a resolution of around 1,545 × 1,024 pixels, which covers most of the screen with a resolution of 1,920 × 1,280 pixels. Such a setting guarantees that almost all eye tracking data are located inside the image area displayed on the screen.

A non-intrusive table-mounted eye tracker, Tobii X120, was used in a user-friendly environment to obtain eye tracking data for images in Dataset1. A high accuracy of 0.5 degree (with 0.3 degree drift)

Table 3. Summary of shape features

Feature Index  Feature Name                   Dimension
66             Area                           1
67             Euler number                   1
68             Horizontal area projection     127
69             Vertical area projection       127
70             Eccentricity                   1
71             Principal direction            1
72             Geometric invariant moment     7
73             Legendre moment                25
74             Zernike moment                 10
75             Pseudo-Zernike moment          10

can be achieved with this eye tracker. The experiment was conducted at a sampling rate of 120 Hz. The freedom of head movement was 30 × 22 × 30 cm [24], and the highest supported head movement speed was 35 cm/s. A calibration was carried out with a grid of nine calibration points to minimize errors in the eye tracking data.

In the experiment, one participant was invited to view each of the 100 images within 5 seconds under a free-view condition. The participant was a proficient computer user with normal vision and new to eye tracking devices. The participant sat at a viewing distance of about 68 cm in front of the computer screen, and the corresponding subtended visual angle was about 41.5° × 26.8°. In total, nearly 600 samples of raw gaze data for each image were collected with the Tobii X120 eye tracker.

As shown in Fig. 1, four sample images are overlaid with raw gaze samples (marked with blue asterisks) to illustrate the eye tracking patterns of object-distinctive images and non-object-distinctive images. It is clearly observed that the eye gaze data of object-distinctive images often concentrate in the regions of objects while those of non-object-distinctive images scatter broadly, which indicates that not all image content is necessary for human perception. As a result, visual features extracted from eye gaze regions would be more helpful for image classification.

3.1.2. hROI Identification

In order to identify hROIs from raw gaze data, gaze samples are first clustered to form fixation points. The clustering was completed with the Tobii Studio software, where the fixation radius (35 pixels) and the minimum fixation duration (100 ms) were set to extract fixation data from the raw gaze data. Next, square regions around individual fixation points are identified as hROIs. To guarantee a proper coverage of visual content in an hROI, a suitable region size should be chosen. On one hand, larger regions tend to include unnecessary or even noisy information, which may compromise the performance of image classification. On the other hand, smaller regions tend to miss necessary information which could be discriminative for image classification. In this work, a size of 127 × 127 pixels was chosen for hROIs after different sizes had been tried.
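The paper relies on Tobii Studio's fixation filter; the sketch below is our own dispersion-style approximation of the hROI extraction step, using the stated 35-pixel radius, 100 ms minimum duration and 127 × 127 crop size. All function and variable names are hypothetical:

```python
import numpy as np

def extract_hrois(gaze_xy, t_ms, image, radius=35, min_dur=100, size=127):
    """Rough sketch of the fixation-based hROI extraction described above.
    gaze_xy: (N, 2) raw gaze samples; t_ms: (N,) timestamps in milliseconds."""
    fixations, start = [], 0
    for i in range(1, len(gaze_xy) + 1):
        chunk = gaze_xy[start:i]
        center = chunk.mean(axis=0)
        if i < len(gaze_xy) and np.linalg.norm(gaze_xy[i] - center) <= radius:
            continue                                # still inside the cluster
        if t_ms[i - 1] - t_ms[start] >= min_dur:    # long enough to be a fixation
            fixations.append(center)
        start = i
    half = size // 2
    rois = []
    for cx, cy in fixations:                        # crop a 127x127 hROI per fixation
        x0 = int(np.clip(cx - half, 0, image.shape[1] - size))
        y0 = int(np.clip(cy - half, 0, image.shape[0] - size))
        rois.append(image[y0:y0 + size, x0:x0 + size])
    return fixations, rois
```

With the 15∼20 fixations typically obtained per image, the paper keeps 15 hROIs per image, which yields the 1,500 samples used below.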

Fig. 1. Illustration of eye tracking data and hROIs, where blue asterisks indicate raw gaze samples and green squares indicate fixation points and hROIs. (a) Eye tracking data and hROIs for object-distinctive images. (b) Eye tracking data and hROIs for non-object-distinctive images.

Fig. 2. Illustration of the enlarged version of the hROIs shown in Fig. 1. (a) hROIs for object-distinctive images. (b) hROIs for non-object-distinctive images.

As shown in Fig. 1, fixation points and the corresponding hROIs are marked with green squares, while the red dotted lines connect temporally adjacent fixation points. Note that the number of fixation points varies from image to image. Fig. 2 gives the enlarged version of the hROIs shown in Fig. 1 for the object-distinctive and non-object-distinctive cases, respectively.

After identifying hROIs for images in Dataset1, all the 75 features listed in Tables 1, 2 and 3 are extracted for each hROI. Therefore, a concatenated feature vector x with M features can be obtained to represent each hROI,

\[ \mathbf{x} = [\mathbf{x}_1^T, \mathbf{x}_2^T, \ldots, \mathbf{x}_M^T]^T, \tag{1} \]

where x_m, m = 1, 2, ..., M, is a column vector denoting the m-th feature and T denotes transposition. In our case, since 15∼20 fixation points can be derived for each image in Dataset1, we select 15 hROIs from each image as samples for the following coarse selection procedure. As a result, the 100 images in Dataset1 produce 1,500 such samples. Upon extracting features for all L = 1500 hROI samples as in Eq. (1), a pool P_x = {x^(1), x^(2), ..., x^(L)} of feature vectors can be obtained.

3.1.3. Coarse Feature Selection with the IQGA

Our coarse feature selection aims to best classify the L hROIs represented as P_x into their corresponding class labels P_lb = {y_1, y_2, ..., y_L} (either the object-distinctive class or the non-object-distinctive class) with a subset of the M (= 75) visual features. In this section, we introduce our proposed IQGA algorithm from five aspects: the encoding strategy, the observation operator, the fitness function, the improved rotation gate for the mutation, and the quantum crossover. Note that in this stage, the selection is performed on individual visual features, instead of individual components of each visual feature.

(1) Encoding Strategy for Feature Selection

Rather than encoding a feature as a binary value (0 or 1) as in a traditional genetic algorithm (GA), the quantum genetic algorithm (QGA) encodes the selection status of a feature with a Q-bit which gives the probability of the feature being selected [25, 46]. In this way, the selection task for M features can be encoded as a Q-bit vector with M components (i.e., a chromosome with M genes),

\[ \mathbf{q} = (q_1, q_2, \ldots, q_M) = \begin{pmatrix} \alpha_1 & \alpha_2 & \ldots & \alpha_M \\ \beta_1 & \beta_2 & \ldots & \beta_M \end{pmatrix}, \tag{2} \]

where q_m = (α_m, β_m)^T (m = 1, 2, ..., M) represents the Q-bit (selection status) corresponding to the m-th feature. Here, both α_m and β_m are variables with continuous values in [0, 1], with |α_m|² being the probability of the m-th feature not being selected and |β_m|² = 1 − |α_m|² the probability of its being selected. With such a coding strategy, the quantum genetic algorithm searches the sample space for the optimal chromosome in terms of the fitness function through mutation and crossover operations. Since all α and β in Eq. (2) take continuous values, the quantum chromosome sample space is much larger than the traditional binary valued chromosome sample space. Therefore, the QGA has a greater probability of finding a globally optimal chromosome in such a space.

(2) Observation Operator

In order to complete the feature selection task, the quantum chromosome has to be converted to conventional binary bits which indicate whether the corresponding features are selected. In the QGA, such a conversion is usually referred to as the observation procedure. In this paper, the observation procedure is mathematically formulated as:

\[ o_m = \begin{cases} 0, & \text{if } |\alpha_m|^2 > r \\ 1, & \text{otherwise} \end{cases} \qquad m = 1, 2, \ldots, M, \tag{3} \]

where r is a random number uniformly distributed in [0, 1]; o_m = 1 implies that the m-th feature is selected, otherwise the m-th feature is discarded.

(3) Fitness Function

The fitness function is used to evaluate the effectiveness of a chromosome sample. In our case, the purpose of feature selection is to select an optimal subset of features which achieves the best classification accuracy. Hence, the fitness function fit(o) is defined as the classification accuracy P achieved with the subset of features selected using the chromosome guided selection scheme o,

\[ fit(\mathbf{o}) \triangleq P = \frac{N_{correct}(\mathbf{o})}{N_{total}}, \tag{4} \]

where N_correct(o) and N_total are the number of correctly classified hROIs according to the selection scheme o and the total number of hROIs in the test subset of Dataset1, respectively. In the coarse selection stage, the following support vector machine (SVM) classifier is employed [47] for evaluating different feature selection schemes:

\[ f(\mathbf{x}_o^{(l)}) = \mathbf{w} \cdot \Phi(\mathbf{x}_o^{(l)}) + b = \sum_j \alpha_j y_j K(\mathbf{x}_o^{(l)}, \mathbf{x}_o^{(j)}) + b, \tag{5} \]

where x_o^(l) is the vector of the selected features under the selection scheme o for the l-th hROI sample to be classified, x_o^(j) is the vector of the selected features of the j-th training sample, y_j denotes the class label of the j-th sample, α_j is the Lagrange parameter obtained from the training sample set, K(·,·) is a kernel function, and b is a classification threshold. In our case, the radial basis function (RBF) is selected as the kernel function of the SVM classifier:

\[ K(\mathbf{x}_o^{(l)}, \mathbf{x}_o^{(t)}) = \exp\left(-g \left\|\mathbf{x}_o^{(l)} - \mathbf{x}_o^{(t)}\right\|^2\right), \tag{6} \]

where g is set to 1 according to a ten-fold cross validation of the classification results obtained when setting g to different values in the set {10⁻⁵, 10⁻⁴, ..., 10⁰, ..., 10⁵}.

(4) Improved Rotation Gate for Mutation

In the QGA, the mutation operation is usually implemented with an invertible normalized matrix referred to as a quantum rotation gate. The mutation operation with respect to the m-th gene in a chromosome can be expressed as:

\[ \begin{pmatrix} \alpha'_m \\ \beta'_m \end{pmatrix} = \begin{pmatrix} \cos(\theta_m) & -\sin(\theta_m) \\ \sin(\theta_m) & \cos(\theta_m) \end{pmatrix} \begin{pmatrix} \alpha_m \\ \beta_m \end{pmatrix}, \tag{7} \]

where (α_m, β_m)^T and (α'_m, β'_m)^T are, respectively, the Q-bit of the m-th gene before and after the mutation operation, and θ_m = s(α_m, β_m) · Δθ_m is the rotation angle. Here, Δθ_m determines the amount of rotation and s(α_m, β_m) controls the rotation direction. In most existing QGAs, these two parameters are determined as shown in Table 4 [48], where o_m is the observed binary value of the m-th gene in the chromosome being considered, o_{m,opt} is the observed binary value of the m-th gene in the optimal chromosome in terms of the fitness function, and fit(o) and fit(o_opt) are the fitness values of the above two chromosomes as defined in Eq. (4).

From Table 4, we can see that Δθ can only take a few fixed discrete values. The limited number of values for Δθ is not helpful for increasing the diversity of the chromosome population, and hence the search is prone to be trapped in a local optimum. In order to tackle this problem, upon comprehensive consideration of the relationship between Δθ and the evolution generation, and the

Table 4. Rotation angles and rotation directions [48]

                                                   s(α_m, β_m)
o_m  o_m,opt  fit(o) ≥ fit(o_opt)  Δθ_m     α_m·β_m > 0  α_m·β_m < 0  α_m = 0  β_m = 0
0    0        False                0        0            0            0        0
0    0        True                 0        0            0            0        0
0    1        False                0        0            0            0        0
0    1        True                 0.05π    −1           +1           ±1       0
1    0        False                0.01π    −1           +1           ±1       0
1    0        True                 0.025π   +1           −1           0        ±1
1    1        False                0.005π   +1           −1           0        ±1
1    1        True                 0.025π   +1           −1           0        ±1

relationship between Δθ and the fitness values, the rotation angle is alternatively computed as follows:

\[ \theta_m = s(\alpha_m, \beta_m) \cdot \Delta\theta_m \exp\!\left(-\frac{[fit(\mathbf{o}_{opt}) - fit(\mathbf{o})]\, t}{t_{max}}\right), \tag{8} \]

where s(α_m, β_m) and Δθ_m still denote the rotation direction and the rotation amount whose values can be found in Table 4, fit(o_opt) − fit(o) is the difference between the fitness value of the optimal chromosome and that of the chromosome being considered, and t and t_max are the current and the maximum generation numbers, respectively. With such an improved mutation strategy, the amount of rotation is adaptively adjusted with the evolution generation and the fitness value. As a result, the diversity of the chromosome population is increased, which effectively alleviates the premature convergence problem of the conventional QGA.

Although the diversity of the population can be increased by revising the rotation scheme, such diversity will be lost when the Q-bit components α and β prematurely converge to a value in the vicinity of 0 or 1 upon quantum rotation. To address this problem, the Hε gate [49] is employed to modify the mutation result as follows:

\[ \begin{pmatrix} \alpha''_m \\ \beta''_m \end{pmatrix} = H_\varepsilon(\alpha'_m, \beta'_m) = \begin{cases} (\sqrt{\varepsilon}, \sqrt{1-\varepsilon})^T, & \text{if } |\alpha'_m|^2 < \varepsilon; \\ (\sqrt{1-\varepsilon}, \sqrt{\varepsilon})^T, & \text{if } |\alpha'_m|^2 > 1-\varepsilon; \\ (\alpha'_m, \beta'_m)^T, & \text{if } \varepsilon \le |\alpha'_m|^2 \le 1-\varepsilon, \end{cases} \tag{9} \]

where 0 < ε ≪ 1 is a threshold set to 0.01 in this paper [50]. Through such a modification, (α''_m, β''_m)^T will stay away from the vicinity of 0 or 1, avoiding being forced to these two values during the observation procedure. Hence, the diversity of the observed Q-bit values can be increased and the premature convergence problem can be alleviated.

(5) Quantum Crossover

In this paper, the crossover operation is performed with probability P_c on the population after observation. Suppose that o^k = (o^k_1, o^k_2, ..., o^k_M) and o^k_cr = (o^k_{cr,1}, o^k_{cr,2}, ..., o^k_{cr,M}) are the observed states of the k-th chromosome before and after the crossover operation. The crossover operation can be formulated as:

\[ o^k_{cr,m} = o^{((k+m-1))_K}_m, \tag{10} \]

where ((k + m − 1))_K = (k + m − 1) mod K and K is the number of samples in the population. This operation simulates the quantum interference procedure and makes full use of the information contained in the population. The crossover operation is very helpful for increasing the diversity of the population and avoiding premature convergence of the algorithm.

In summary, the algorithm for coarse feature selection is described in Algorithm 1.

Algorithm 1: Coarse Feature Selection with IQGA
Input: feature vector pool P_x (half for training and half for testing), class label pool P_lb, crossover probability P_c, maximum generation number t_max
Output: feature subset S_f^(1) and its index set I_f^(1) for the selected features
Initialize: the generation number t = 0 and the population Q = {q_1, q_2, ..., q_K}
Begin
  Obtain O = {o_1, o_2, ..., o_K} for Q from Eq. (3);
  while (t < t_max) do
    for (k ← 1 to K) do
      Perform feature selection according to o_k;
      Train the SVM on the training set with the selected features;
      Perform classification on the testing set and compute fit(o_k);
    end
    Find the chromosome index k_opt with the highest classification accuracy;
    for (k ← 1 to K and k ≠ k_opt) do
      Perform mutation for q_k according to Eqs. (7), (8) and (9);
      Obtain o_k for q_k from Eq. (3);
    end
    for (k ← 1 to K and (r = rand(0, 1)) ≤ P_c) do
      Perform the crossover operation according to Eq. (10);
    end
    t ← t + 1;
  end
  Find o_opt with the highest classification accuracy;
  Output subsets I_f^(1) = {m | o_opt,m = 1}, S_f^(1) = {x_m | m ∈ I_f^(1)};
End

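The following sketch illustrates one generation of the IQGA described by Eqs. (3)-(10) and Algorithm 1. It is a simplified illustration, not the authors' implementation: the SVM-based fitness evaluation of Eq. (4) is assumed to happen outside this function, the full Table 4 lookup is collapsed into a single illustrative direction rule, and the protection of the best chromosome from mutation is omitted:

```python
import numpy as np

def observe(alpha):
    """Eq. (3): collapse each Q-bit to a binary gene; o_m = 1 selects feature m."""
    return (np.random.rand(*alpha.shape) >= alpha**2).astype(int)

def iqga_generation(alpha, beta, fit, o_opt, fit_opt, t, t_max, pc=0.1):
    """One IQGA generation sketch. alpha, beta: (K, M) Q-bit amplitudes;
    fit: (K,) fitness values; o_opt, fit_opt: best observed chromosome/fitness."""
    K, M = alpha.shape
    o = observe(alpha)
    for k in range(K):
        # Adaptive rotation angle, Eq. (8); direction s and base step delta
        # would normally come from the Table 4 lookup -- simplified here.
        s = np.where(o[k] == o_opt, 1.0, -1.0)
        delta = 0.025 * np.pi
        theta = s * delta * np.exp(-(fit_opt - fit[k]) * t / t_max)
        a, b = alpha[k].copy(), beta[k].copy()
        alpha[k] = np.cos(theta) * a - np.sin(theta) * b  # rotation gate, Eq. (7)
        beta[k]  = np.sin(theta) * a + np.cos(theta) * b
        # H-epsilon gate, Eq. (9): keep |alpha|^2 inside [eps, 1 - eps].
        eps = 0.01
        lo, hi = alpha[k]**2 < eps, alpha[k]**2 > 1 - eps
        alpha[k][lo], beta[k][lo] = np.sqrt(eps), np.sqrt(1 - eps)
        alpha[k][hi], beta[k][hi] = np.sqrt(1 - eps), np.sqrt(eps)
    # Quantum crossover, Eq. (10), applied per chromosome with probability pc.
    o_cr = o.copy()
    for k in range(K):
        if np.random.rand() <= pc:
            o_cr[k] = np.array([o[(k + m) % K, m] for m in range(M)])
    return alpha, beta, o_cr
```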

3.2. Fine Selection with mRMR-SVM-RFE

After the coarse feature selection, the M_coarse selected features in S_f^(1) are utilized to characterize image content. That is, each image is represented with a concatenated feature vector x_coarse = [x_{i_1}^T, x_{i_2}^T, ..., x_{i_{M_coarse}}^T]^T with i_1, i_2, ..., i_{M_coarse} ∈ I_f^(1), where x_{i_k} is the i_k-th selected feature with n_{i_k} (≥ 1) components, i.e., x_{i_k} = [x_{i_k,1}, x_{i_k,2}, ..., x_{i_k,n_{i_k}}]^T. We perform fine selection with respect to individual components of x_coarse, as opposed to the feature-wise approach used in the coarse selection. Therefore, in the following discussion, the coarsely selected and concatenated feature vector x_coarse is rewritten in a component-wise manner as:

\[ \mathbf{x}_{coarse} = [x_1, x_2, \ldots, x_N]^T, \qquad N = \sum_{i_k \in I_f^{(1)}} n_{i_k}. \]

As a result, the correspondences between I_f^(1) and I_c^(1) = {i | i = 1, 2, ..., N} and between S_f^(1) and S_c^(1) = {x_i | i = 1, 2, ..., N} can be established.
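A small sketch of this component-wise bookkeeping, with hypothetical toy dimensions, may help:

```python
import numpy as np

def flatten_selected(features, selected):
    """Feature-wise to component-wise flattening described above.
    features: dict mapping feature index m to its vector x_m;
    selected: sorted list of coarsely selected indices (I_f^(1)).
    Returns x_coarse (length N) and, per component, its source feature index."""
    parts, origin = [], []
    for m in selected:
        parts.append(np.asarray(features[m], dtype=float))
        origin.extend([m] * len(features[m]))   # component i -> source feature
    return np.concatenate(parts), np.array(origin)   # N = sum of n_{i_k}

# Toy example: three selected features of sizes 2, 3 and 1.
feats = {4: [0.1, 0.9], 21: [0.2, 0.3, 0.5], 57: [0.7]}
x, origin = flatten_selected(feats, [4, 21, 57])
assert len(x) == 6 and list(origin) == [4, 4, 21, 21, 21, 57]
```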

In terms of the component-wise representation, the purpose of the fine selection is to select n_fine (< N) feature components for effective image content representation. Our fine feature selection process consists of two steps: mRMR based feature selection to further reduce the number of feature components, and an improved SVM-RFE based feature selection to derive the best performing feature components. Therefore, we name our proposed fine feature selection method mRMR-SVM-RFE.

3.2.1. mRMR based Feature Selection

In the mRMR based feature selection method [8], the redundancy R and the relevance D of a feature subset are measured in terms of the mutual information (MI), defined as follows:

\[ R(S) = \frac{1}{|S|^2} \sum_{x_i, x_j \in S} I(x_i; x_j), \tag{11} \]

and

\[ D(S, c) = \frac{1}{|S|} \sum_{x_i \in S} I(x_i; c), \tag{12} \]

where R(S) is the redundancy of the feature subset S, D(S, c) is the relevance of the feature subset S to the target classes c, I(x_i; x_j) and I(x_i; c) are the mutual information between the feature components x_i and x_j and between x_i and c, respectively, and |S| is the number of elements in the subset S. Therefore, the objective of simultaneously maximizing the relevance and minimizing the redundancy can be expressed as the following optimization problem:

\[ S_c^{(2)} = \arg\max_{S \subset S_c^{(1)}} \phi(S, c) = \frac{D(S, c)}{R(S)}, \tag{13} \]

where the subset S_c^(2) is the optimally selected subset of feature components in the subset S_c^(1). In our work, the incremental search scheme [9] is employed to find a near-optimal solution to the problem in Eq. (13). With this approach, n_mM feature components are finally selected to construct the feature subset S_c^(2).

3.2.2. Improved SVM-RFE based Feature Selection

After the near-optimal subset S_c^(2) of feature components is obtained with the filter type mRMR based feature selection method, the search space has been reduced enough to cater for the computationally expensive wrapper type SVM-RFE based feature selection method.

(1) Traditional SVM-RFE Method

The SVM-RFE based feature selection method [51] starts with all feature components and recursively removes the feature component with the least importance for classification in a backward elimination manner. The measure of the importance of a feature component is computed from the weight vector of the SVM [15]:

\[ \mathbf{w} = \sum_l \alpha_l \, y_l \, \mathbf{x}_{mM}^{(l)}, \tag{14} \]

where α_l is the Lagrange multiplier, y_l is the class label of the l-th sample, and x_mM^(l) is the feature vector resulting from the mRMR selection for the l-th sample. With such a weight vector, the importance of the i-th component c_i can be determined as w_i², and hence the feature components can be selected according to their importance.

(2) Improved SVM-RFE Method

Since the mRMR criterion aims to simultaneously maximize the relevance and minimize the redundancy of a feature component, it is helpful to integrate such a criterion into ranking the importance of a feature component in the SVM-RFE method. Therefore, we devise a new ranking criterion through a convex combination as follows:

\[ c_i = (1 - \beta) \times I'(i) + \beta \times |w_i|, \tag{15} \]

where c_i is the importance of the i-th feature component, β is a constant satisfying β ∈ [0, 1], w_i is as defined in Eq. (14), and I'(i) indicates the relevance-redundancy factor of the i-th feature component in terms of the mRMR criterion.

Now there are two problems to be solved: how to define I'(i) and how to choose a suitable β. For defining I'(i), we employ a simple yet effective solution: ranking the feature components in terms of the mRMR criterion in a decreasing order and assigning a decreasing order number to each feature component. As a result, an ordered set I' = {n_mM, n_mM − 1, ..., 1} can be obtained for S_c^(2) using the mRMR.

Algorithm 2: Fine Feature Selection Algorithm mRMR-SVM-RFE
Input: pool of features P_x_coarse = {x_coarse^(1), x_coarse^(2), ..., x_coarse^(L)}, pool of class labels P_lb = {y_1, y_2, ..., y_L}, subset S_c^(1) of feature components after coarse selection, the number n_mM of feature components from the mRMR, and the number n_fine of finally selected feature components
Output: subset S_c^(3) of the n_fine selected feature components
Initialize: feature rank list r = [ ], feature subset obtained from the mRMR selection S_c^(2) = ∅, I_c^(2) = ∅
Begin
  while (|S_c^(2)| ≠ n_mM) do
    Search for the best feature component x̂ in S_c^(1) − S_c^(2) with the incremental search scheme;
    S_c^(2) ← S_c^(2) ∪ {x̂}, I_c^(2) ← I_c^(2) ∪ {index(x̂)};
  end
  Set S = S_c^(2), I = I_c^(2);
  Obtain the set I' of the order numbers for the features in S;
  Set X = [x_coarse^(1), x_coarse^(2), ..., x_coarse^(L)]^T, X_t = X(:, I);
  Compute the combination coefficient β from Eq. (18);
  while (S ≠ ∅) do
    Train the SVM classifier: α = SVM-train(X_t, y);
    Compute the weight vector w from Eq. (14);
    Compute the ranking criterion c_i from Eq. (15) for all feature components;
    Find the feature component î = arg min_{i ∈ I} c_i;
    r ← [î, r], S ← S − {x_î}, I ← I − {î};
    Set X_t = X(:, I);
  end
  Output S_c^(3) = {x_i | i = r[k], k = 1, 2, ..., n_fine};
End
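The first loop of Algorithm 2, the incremental mRMR search of Eqs. (11)-(13), can be sketched as follows. Mutual information is estimated on equal-width discretized components, and the greedy D/R update below is one common incremental variant, not necessarily the exact scheme of [9]:

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def mrmr_select(X, y, n_mm, n_bins=10):
    """Greedy mRMR sketch. X: (L, N) component-wise features, y: (L,) labels."""
    Xq = np.stack([np.digitize(c, np.histogram_bin_edges(c, bins=n_bins)[1:-1])
                   for c in X.T], axis=1)               # discretize each component
    relevance = np.array([mutual_info_score(Xq[:, i], y) for i in range(X.shape[1])])
    selected = [int(np.argmax(relevance))]              # start from the most relevant
    while len(selected) < n_mm:
        best, best_score = None, -np.inf
        for i in range(X.shape[1]):
            if i in selected:
                continue
            red = np.mean([mutual_info_score(Xq[:, i], Xq[:, j]) for j in selected])
            score = relevance[i] / (red + 1e-12)        # phi = D / R, cf. Eq. (13)
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected
```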

Rather than choosing the combination coefficient β empirically, we propose to obtain its value through a mathematical model that characterizes the dependence of the classification accuracy on β. This model is formulated by fitting data to a polynomial function of order k as follows:

\[ P(\beta) = d_0 + d_1 \beta + \cdots + d_k \beta^k, \tag{16} \]

where d_i (i = 0, 1, ..., k) are the fitting coefficients, which are determined with the following least squares fitting strategy. Suppose that P_j (j = 1, 2, ..., J) is the classification accuracy achieved with the j-th of J different choices of β. With these discrete data, the fitting coefficients are determined as follows:

\[ (d_0, d_1, \ldots, d_k) = \arg\min_{(d_0, d_1, \ldots, d_k)} \sum_{j=1}^{J} [P_j - P(\beta_j)]^2, \tag{17} \]

where P(β_j) is the classification accuracy computed from Eq. (16) when β = β_j. With this model, the value of β is obtained by solving the following optimization problem:

\[ \beta = \arg\max_{\beta} P(\beta). \tag{18} \]

In summary, the fine feature selection algorithm is described in Algorithm 2.
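The remainder of Algorithm 2, the rank-fused elimination loop of Eq. (15) together with the polynomial-based choice of β from Eqs. (16)-(18), can be sketched as follows. This is an illustration under our own simplifications: a linear-kernel SVM whose multi-class weights are aggregated by summing absolute values, and no normalization of the two ranking terms, which a practical implementation would likely add so that I'(i) and |w_i| live on comparable scales:

```python
import numpy as np
from sklearn.svm import SVC

def mrmr_svm_rfe(X, y, mrmr_order, n_fine, beta=0.859):
    """Fused ranking criterion of Eq. (15) inside an RFE loop.
    mrmr_order: component indices ranked best-first by the mRMR (length n_mM);
    beta = 0.859 is the value estimated in Section 4.3.1."""
    active = list(mrmr_order)
    # I'(i): decreasing order numbers; the best mRMR component gets n_mM.
    iprime = {i: len(mrmr_order) - r for r, i in enumerate(mrmr_order)}
    while len(active) > n_fine:
        clf = SVC(kernel='linear').fit(X[:, active], y)
        w = np.abs(clf.coef_).sum(axis=0)             # |w_i| per active component
        scores = [(1 - beta) * iprime[i] + beta * w[pos]   # Eq. (15)
                  for pos, i in enumerate(active)]
        active.pop(int(np.argmin(scores)))            # eliminate least important
    return active                                     # the n_fine surviving components

def fit_beta(betas, accuracies, degree=4):
    """Eqs. (16)-(18): fit accuracy as a polynomial in beta and maximize it."""
    poly = np.polynomial.Polynomial.fit(betas, accuracies, degree)
    grid = np.linspace(0, 1, 1001)
    return grid[int(np.argmax(poly(grid)))]
```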

4. Experimental Results and Discussions

4.1. Datasets

Three image datasets were used in our experiments. The first dataset, namely Dataset1, was used for the coarse selection. As discussed in Section 3.1.1, under the guidance of eye tracking data, a total of 1,500 hROIs were extracted from images in this dataset. Among these hROIs, half were used for training the SVM classifier and the other half were used to evaluate the performance of the selected features. In order to perform statistical comparisons, 25 rounds of experiments were conducted with different initial populations.

The second dataset, namely Dataset2, is composed of 5,306 images randomly picked from the Caltech image database [52]. Images in this dataset were divided into seven categories, namely, airplanes, cars, faces, guitars, leaves, motorbikes, and background. Our fine feature selection process was first performed on one half of the images (i.e., the training set) of this dataset, and then performance evaluations of different feature selection algorithms were conducted on the other half (i.e., the testing set). Some sample images of each category in Dataset2 are shown in Fig. 3. A detailed description of Dataset2 is provided in Table 5.

The third dataset, namely Dataset3, is composed of 1,000 images in 10 categories (100 images per category) picked from the COREL image database³. This dataset was constructed with the same protocol as that in [53]. Sample images of each category in Dataset3 are shown in Fig. 4.

In order to perform statistical comparisons among different algorithms in the fine selection stage, 10 rounds of experiments were conducted on both Dataset2 and Dataset3 by partitioning the datasets

³ http://wang.ist.psu.edu/docs/home.shtml

Fig. 3. Sample images of each category in Dataset2: (a) airplanes; (b) cars; (c) faces; (d) guitars; (e) leaves; (f) motorbikes; (g) background.

Table 5. Statistics of Dataset2

Category     Images in the category  Images for training  Images for testing
airplanes    800                     400                  400
cars         1,155                   578                  577
faces        435                     217                  218
guitars      1,030                   515                  515
leaves       186                     93                   93
motorbikes   800                     400                  400
background   900                     450                  450

Fig. 4. Sample images of each category in Dataset3: (a) African people and villages; (b) beach; (c) buildings; (d) buses; (e) dinosaurs; (f) elephants; (g) flowers; (h) horses; (i) mountains and glaciers; (j) foods.

randomly into two parts for feature selection and for performance evaluation in 10 different ways. Unless otherwise specified, the experimental results discussed below correspond to one of the 10 rounds.

4.2. Results of Coarse Feature Selection

The coarse feature selection using eye tracking data was performed on Dataset1 with our proposed improved QGA algorithm (namely IQGA-E, as described in Algorithm 1). For the implementation of this algorithm, the population size K and the crossover probability P_c were set to 20 and 0.1, respectively, to maintain the diversity of the population. The maximum generation number t_max was set to 200 to ensure convergence of the evolutionary algorithm while keeping a sufficient number of features for the subsequent fine selection. The number of genes was set to 75, which equals the number of features in our study.

Since our IQGA-E algorithm is an evolutionary algorithm, different initial populations will result in slightly different results, one of which is presented in Table 6. As can be seen from this table, 18 visual features were selected: 13 color features, 3 texture features, and 2 shape features, which indicates that color features play important roles in human vision when discriminating different types of images. Our classification evaluation shows that, with these features, we can achieve the best accuracy of 92.4%

Table 6. Results of coarse feature selection with IQGA-E

Feature Index  Feature Name                                    Dimension
1              General histogram (in JPEG/YCbCr space)         48
4              General histogram (in RGB space)                48
5              General histogram (in XYZ space)                48
8              General histogram (in YIQ space)                48
14             Accumulative histogram (in RGB space)           48
16             Accumulative histogram (in YCbCr space)         48
21             HSV histogram (8:3:3 uniform quantization)      72
22             HSV histogram (16:4:4 uniform quantization)     256
26             Dominant colors (in JPEG/YCbCr space)           16
30             Dominant colors (in RGB space)                  16
43             Color moment (in RGB space)                     9
52             Color layout (in YCbCr space)                   12
53             Scalable color (in 16:4:4 HSV space)            66
57             HTD                                             62
59             Gabor texture descriptor                        24
65             Co-occurrence matrix                            8
73             Legendre moment                                 25
74             Zernike moment                                  10

Table 7. Classification performance of three coarse feature selection algorithms (CI: confidence interval)

Algorithm  Accuracy% (95%CI)    Precision% (95%CI)    Recall% (95%CI)       F1 score (95%CI)      Feature Number  p-value
QGA        87.6 (87.02, 88.18)  87.56 (87.10, 88.02)  87.20 (86.79, 87.61)  87.38 (86.96, 87.80)  16              p < 0.05
IQGA       89.2 (88.22, 90.18)  89.59 (89.23, 89.95)  89.20 (88.79, 89.61)  89.39 (89.01, 89.77)  20              p < 0.05
IQGA-E     91.8 (91.53, 92.07)  92.47 (92.28, 92.66)  91.28 (91.01, 91.55)  91.87 (91.65, 92.09)  18              p < 0.05

in the test set of the hROIs. The 18 selected features form a feature subset S_f^(1) for the next-stage fine selection. By concatenating all the 18 features, the feature subset S_f^(1) is converted to a new set S_c^(1) with 864 feature components.

We also compared our proposed IQGA-E coarse selection algorithm with two other algorithms: the traditional QGA and the improved QGA without using eye tracking data (denoted as IQGA). When eye tracking data are not utilized, global feature extraction was conducted on each image in Dataset1. All the algorithms were run 25 times with different initial populations under the same parameter settings as given above. As shown in Table 7, with the help of eye tracking data, IQGA-E clearly outperforms both IQGA and QGA. It is also observed that IQGA performs better than QGA, which demonstrates the effectiveness of our improved mutation strategy.

In order to investigate whether the features selected with our proposed IQGA-E algorithm are the best for image classification, 10 rounds of experiments were performed using bootstrapping techniques, where a subset of 18 features was chosen randomly for training and classification in each round. Ten

20

rounds of classification results based on 18 randomly selected features show that we can only achieve an average accuracy of 82.98% which is significantly lower than that (91.8%) achieved based on 18 features selected by our proposed IQGA-E. 4.3. Results of Fine Feature Selection In this section, we report the fine feature selection results from our proposed mRMR-SVM-RFE method (referred to Algorithm 2) for the training set of Dataset2. This method sequentially implements the mRMR method and the improved SVM-RFE method which integrates the mRMR ranking information into the SVM-RFE ranking criterion. Considering the fact that the number of feature components (1)

in Sc

is 864 (see Section 4.2), we kept 300 feature components (i.e., nmM = 300) after the mRMR

selection in our experiments to balance the computational efficiency of the filter model and the wrapper model. Then the number of final selected feature components with mRMR-SVM-RFE, nf ine , was set to 120, because when it increases, there is no clear improvement of classification accuracy in our experiments. 4.3.1. Parameter Estimation In our experiments, the combination coefficient β is determined in a manner as described in Eqs. (16) and (17). In the experiments, we used a fourth order (i.e. k = 4) polynomial function to characterize the dependence of classification performance on β. For the robustness of our proposed algorithm, the above-mentioned polynomial function was constructed under three different settings for the parameter g in the RBF kernel of the SVM. These three values for g are, respectively, 1 × 10−6 , 5 × 10−6 and 8 × 10−6 . With each value of g, classification performance was obtained at 20 different values of β in the training set. By fitting these data to Eq. (17), three dependence equations can be obtained as follows:

\[ \begin{aligned} P_1(\beta) &= -1.07\beta^4 + 1.93\beta^3 - 0.99\beta^2 + 0.16\beta + 0.90, \\ P_2(\beta) &= -0.79\beta^4 + 1.48\beta^3 - 0.82\beta^2 + 0.15\beta + 0.90, \\ P_3(\beta) &= -0.97\beta^4 + 1.93\beta^3 - 1.20\beta^2 + 0.26\beta + 0.88, \end{aligned} \tag{19} \]

where P_j(β) (j = 1, 2, 3) is the classification accuracy under the j-th setting of g. From these three equations, the three β values maximizing the three accuracies in Eq. (19) can be obtained by performing the optimization procedure of Eq. (18). The three optimal β values obtained are 0.848, 0.854 and 0.877.
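As a quick numerical check of this estimation step, the fitted polynomials of Eq. (19) can be maximized on a dense grid. Because the printed coefficients are rounded to two decimals, the maximizers may differ slightly from the reported values:

```python
import numpy as np

# Coefficients d0..d4 of P1, P2, P3 from Eq. (19), low order first.
coeffs = [
    [0.90, 0.16, -0.99, 1.93, -1.07],
    [0.90, 0.15, -0.82, 1.48, -0.79],
    [0.88, 0.26, -1.20, 1.93, -0.97],
]
grid = np.linspace(0.0, 1.0, 100001)
optima = [grid[np.argmax(np.polynomial.polynomial.polyval(grid, c))] for c in coeffs]
print(optima, float(np.mean(optima)))  # the paper averages these to beta = 0.859
```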

Table 8. Results of fine feature selection with mRMR-SVM-RFE on Dataset2

Feature Index  Feature Name                                    Selected Dimensions
1              General histogram (in JPEG/YCbCr space)         18
4              General histogram (in RGB space)                13
8              General histogram (in YIQ space)                5
14             Accumulative histogram (in RGB space)           4
16             Accumulative histogram (in YCbCr space)         6
21             HSV histogram (8:3:3 uniform quantization)      7
22             HSV histogram (16:4:4 uniform quantization)     24
26             Dominant colors (in JPEG/YCbCr space)           5
30             Dominant colors (in RGB space)                  2
52             Color layout (in YCbCr space)                   1
53             Scalable color (in 16:4:4 HSV space)            16
57             HTD                                             18
59             Gabor texture descriptor                        1

The final optimal β value is determined as the arithmetic mean of these three values, i.e., β = 0.859, while the parameter g in the final experiment is set to 1 × 10⁻⁶.

4.3.2. Features from Fine Selection

Table 8 gives the feature components selected from the 864-dimension subset S_c^(1) of 18 different features. It is observed that a total of 13 features with 120 feature components survive our fine selection procedure: 11 color features and 2 texture features. The absence of shape features indicates that general region based shape features are not necessary to differentiate these image categories. The top 4 most frequently selected features out of the 18 coarsely selected ones are the 16:4:4 uniformly quantized histogram in the HSV space, the homogeneous texture descriptor (HTD), the general histogram in the JPEG/YCbCr color space, and the scalable color in the 16:4:4 HSV color space. This result further indicates the important role of colors in discriminating different types of images. It is also observed that a large proportion of the components of the color features are from the HSV space, since the HSV color space complies better with the human vision system.

4.4. Performance Evaluation

4.4.1. Classification Performance

In this subsection, we present image classification results in the test set of Dataset2 with selected features obtained using different combinations of methods, such as with or without the coarse selection and with or without eye tracking data. The SVM with the RBF kernel is again used as the classifier for performance evaluation. This classifier was trained with the training set of Dataset2, and the parameter g in the RBF kernel was set to 1 × 10⁻⁶ according to a ten-fold cross validation of the classification results obtained when setting g to different values in the set {10⁻¹⁰, 10⁻⁹, ..., 10⁰, ..., 10¹⁰}.
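A sketch of this kind of grid search, assuming scikit-learn's SVC and cross_val_score, is shown below; the function name and grid bounds are ours:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def select_rbf_gamma(X, y, exponents=range(-10, 11)):
    """Ten-fold cross validation over the grid {10^-10, ..., 10^10} to pick g."""
    best_g, best_acc = None, -np.inf
    for e in exponents:
        g = 10.0 ** e
        acc = cross_val_score(SVC(kernel='rbf', gamma=g), X, y, cv=10).mean()
        if acc > best_acc:
            best_g, best_acc = g, acc
    return best_g, best_acc
```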

Fig. 5. Classification accuracies with different numbers of feature components selected with different fine selection algorithms (after coarse selection with IQGA-E).

Fig. 6. Classification accuracies for different image categories with the features selected using IQGA-E&mRMR-SVM-RFE (with eye tracking data) and IQGA&mRMR-SVM-RFE (without eye tracking data).

The classification results are presented in Fig. 5, where mRMR&SVM-RFE denotes the hybrid method that performs feature selection by simply running the filter type mRMR and the wrapper type SVM-RFE algorithm sequentially in the fine selection stage. It can be observed from this figure that the mRMR&SVM-RFE method generally performs better than either the mRMR based selection method or the SVM-RFE based one, which indicates the effectiveness of the hybrid approach. The mRMR-SVM-RFE method performs the best among all the compared methods, owing to the fusion of the ranking information of the mRMR and the SVM-RFE methods. In particular, the performance improvement is more significant when a small number of feature components is selected.

In order to study the impact of the eye tracking data on improving the classification performance with selected features, coarse selection was performed with and without eye tracking data involved. As shown in Fig. 6 for the testing images in Dataset2, eye tracking data are always helpful for improving

Fig. 7. Classification accuracies with the features selected using IQGA&mRMR-SVM-RFE and QGA&mRMR-SVM-RFE.

image classification performance. The average accuracy of the IQGA-E&mRMR-SVM-RFE method is 94.21%, which is clearly higher than the 92.88% of the IQGA&mRMR-SVM-RFE method. In particular, significant improvement has been achieved for some categories such as guitar and motorbike. However, an exception exists for the leaf class, where the classification accuracy with eye tracking data is lower than that without eye tracking data. This may be due to two reasons: first, the eye tracking data were only obtained from an image dataset with two classes of images; second, some key features for human vision in recognizing the leaf class may not be included in our initial feature pool of only 75 features, which deteriorates the final classification performance.

To investigate the performance improvement obtained by using our proposed IQGA rather than the traditional QGA in the coarse selection stage, the features selected with these two methods in combination with our proposed mRMR-SVM-RFE fine selection algorithm were used for the classification of the testing images in Dataset2. Classification accuracies with respect to testing images in different categories in Dataset2 are shown in Fig. 7. From this figure, we can observe that using our proposed IQGA as the coarse selection method also improves the final classification accuracies.

In order to validate the advantage of incorporating the ranking information of the mRMR into that of the SVM-RFE method, the features selected with our proposed IQGA-E coarse selection method in combination with either the simple hybrid mRMR&SVM-RFE algorithm or the rank-fused mRMR-SVM-RFE algorithm were used for the classification of the testing images in Dataset2. Results of these two combinations are shown in Fig. 8. From this figure, it can be observed that incorporating the ranking information of the mRMR method into the ranking process of the SVM-RFE method is helpful for the

Fig. 8. Classification accuracies with the features selected using IQGA-E&mRMR&SVM-RFE and IQGA-E&mRMR-SVM-RFE.

Table 9. Comparison of different feature selection methods on Dataset2

Selection Method       Accuracy% (95%CI)     Precision% (95%CI)    Recall% (95%CI)       F1 score (95%CI)      p-value
mRMR-SVM-RFE           90.66 (90.51, 90.81)  88.98 (88.22, 89.73)  89.01 (88.32, 89.70)  88.99 (88.30, 89.69)  p < 0.05
QGA&mRMR-SVM-RFE       91.21 (91.00, 91.42)  89.91 (89.30, 90.53)  90.21 (89.59, 90.84)  90.06 (89.46, 90.66)  p < 0.05
IQGA&mRMR-SVM-RFE      92.88 (92.67, 93.09)  92.25 (91.61, 92.90)  92.37 (92.04, 92.69)  92.31 (91.86, 92.77)  p < 0.05
IQGA-E&mRMR-SVM-RFE    94.21 (94.01, 94.41)  93.31 (92.86, 93.76)  93.53 (93.04, 94.01)  93.42 (93.01, 93.83)  p < 0.05
IQGA-E&mRMR&SVM-RFE    93.15 (92.80, 93.51)  92.17 (91.01, 93.33)  92.56 (91.75, 93.37)  92.36 (91.40, 93.33)  p < 0.05
IQGA-E&mRMR            91.39 (91.08, 91.70)  90.70 (89.82, 91.57)  90.72 (89.76, 91.67)  90.71 (89.81, 91.60)  p < 0.05
IQGA-E&SVM-RFE         92.52 (92.27, 92.77)  91.64 (90.90, 92.37)  92.10 (91.79, 92.41)  91.87 (91.36, 92.37)  p < 0.05

feature selection and eventually improves the image classification accuracy.

To further evaluate the different feature selection algorithms, the Receiver Operating Characteristic (ROC) curve metric is also used for performance comparison. Fig. 9 provides the values of the Area Under the receiver operating Characteristic curve (AUC) for different feature selection methods on Dataset2. From this figure, we can see that the AUC corresponding to our proposed IQGA-E&mRMR-SVM-RFE is the largest (0.9815), which shows the best classification performance among the compared methods. Similarly, Fig. 10 shows the AUC values for different feature selection methods on Dataset3, which also demonstrates that our proposed IQGA-E&mRMR-SVM-RFE method clearly outperforms the other methods.

T-tests were performed to assess whether the improvement of our proposed method is statistically significant. We randomly partitioned Dataset2 into two parts in ten different ways. In each way, half

Fig. 9. Comparison of different methods in terms of the area under the receiver operating characteristic curve (AUC) on Dataset2.

Fig. 10. Comparison of different methods in terms of the area under the receiver operating characteristic curve (AUC) on Dataset3.

Table 10. Comparison of different feature selection methods on Dataset3

Selection Method       Accuracy% (95%CI)     Precision% (95%CI)    Recall% (95%CI)       F1 score (95%CI)      p-value
mRMR-SVM-RFE           80.80 (79.96, 81.64)  81.37 (80.63, 82.11)  80.88 (80.06, 81.70)  81.12 (80.56, 81.68)  p < 0.05
QGA&mRMR-SVM-RFE       81.88 (81.34, 82.42)  82.10 (81.51, 82.69)  81.60 (81.25, 81.95)  81.85 (81.42, 82.28)  p < 0.05
IQGA&mRMR-SVM-RFE      83.12 (82.62, 83.62)  83.51 (83.04, 83.98)  82.72 (82.30, 83.14)  83.11 (82.73, 83.49)  p < 0.05
IQGA-E&mRMR-SVM-RFE    85.04 (84.65, 85.43)  85.83 (85.37, 86.29)  85.44 (84.87, 86.01)  85.63 (85.17, 86.09)  p < 0.05
IQGA-E&mRMR&SVM-RFE    83.47 (83.03, 83.91)  84.36 (84.68, 85.04)  83.92 (83.50, 84.34)  84.14 (83.62, 84.66)  p < 0.05
IQGA-E&mRMR            81.04 (80.59, 81.49)  80.91 (80.38, 81.44)  80.56 (79.72, 81.39)  80.74 (80.11, 81.37)  p < 0.05
IQGA-E&SVM-RFE         82.52 (82.01, 83.03)  82.46 (82.04, 82.88)  82.24 (81.97, 82.51)  82.35 (82.01, 82.69)  p < 0.05

Performance evaluations of the different feature selection algorithms on Dataset2 are presented in Table 9, where three further metrics (precision, recall, and F1 score) are listed along with the average classification accuracy. From this table, it can be seen that, at the 0.05 level, the proposed IQGA-E&mRMR-SVM-RFE method performs the best among all methods. Similar evaluations were conducted on Dataset3 based on our eye tracking data guided coarse selection results. As shown in Table 10, we can likewise conclude that, at the 0.05 level, our proposed IQGA-E&mRMR-SVM-RFE method performs the best on Dataset3, which indicates a consistent superiority of this algorithm over the others. As in the coarse selection, in order to show whether the final 13 features selected with our proposed IQGA-E&mRMR-SVM-RFE are the best ones, 10 rounds of experiments were performed using bootstrapping techniques, where a subset of 13 features was chosen randomly for training and classification in each round. The ten rounds of classification based on 13 randomly selected features achieve an average accuracy of only 79.68% on Dataset2 and 59.40% on Dataset3, clearly lower than the accuracies (94.21% and 85.04%) achieved with the 13 features selected by our proposed IQGA-E&mRMR-SVM-RFE.

4.4.2. Computational Time
Table 11 shows the computational times of the different feature selection methods when 120 feature components are selected. A single-core 2 GHz computer running MATLAB R2009b was used in the experiments. From this table, we can see that the filter-model mRMR is the least computationally expensive while the wrapper-model SVM-RFE is the most expensive. Although the computational cost of the hybrid method is higher than that of the mRMR, it is much lower than that of the SVM-RFE.

Table 11. Comparison of computational time

Method         | With coarse selection (s) | Without coarse selection (s)
mRMR           | 202                       | 747
SVM-RFE        | 5,293                     | 19,584
mRMR-SVM-RFE   | 472                       | 1,094

In addition, when the mRMR-SVM-RFE was applied directly to the original 75 image features with 3,268 components, 1,094 seconds were required to select a 120-dimension feature set, which achieved an average classification accuracy of 90.66%. In contrast, our proposed two-stage IQGA-E&mRMR-SVM-RFE method took only 472 seconds to select a 120-dimension feature set and achieved a classification accuracy of 94.21%. Therefore, eye tracking data guided feature selection is both more efficient and more effective than existing methods.
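A minimal timing harness for this kind of comparison might look as follows; fine_select and coarse_idx are hypothetical names for the fine selection routine and the feature indices surviving the coarse stage.

```python
import time

def timed_selection(selector, X, y):
    """Wall-clock a feature selection routine (cf. Table 11)."""
    t0 = time.perf_counter()
    selected = selector(X, y)
    return selected, time.perf_counter() - t0

# Fine selection on all 3,268 components versus only the components
# surviving the eye tracking guided coarse stage (hypothetical names):
# _, t_full   = timed_selection(fine_select, X_all, y)
# _, t_coarse = timed_selection(fine_select, X_all[:, coarse_idx], y)
```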

5. Conclusions and Future Work
In this paper, we present a two-stage feature selection method for image classification that takes human factors into account. Rather than relying solely on mathematically driven techniques, we identify a number of features with the help of eye tracking data, which reveal how humans perceive visual content. In the coarse feature selection stage, an improved quantum genetic algorithm (IQGA) is proposed to utilize the eye tracking data. In the fine feature selection stage, a novel hybrid feature selection method is proposed to combine the efficiency of the filter-type mRMR method with the effectiveness of the wrapper-type SVM-RFE method by integrating the ranking information of the two methods. Comprehensive experiments have been conducted with 75 visual features on three image datasets. The experimental results consistently demonstrate that eye tracking data are clearly helpful in improving the performance of image classification. In addition, the coarse-to-fine selection strategy greatly improves the efficiency of the whole feature selection process. Based on the promising results achieved, we outline our future work in three directions. First, we will investigate advanced algorithms to better model eye tracking data from both static and dynamic perspectives. In this work, the eye tracking data were only used to guide the determination of hROIs. Since both the order of the fixation points and the gaze duration are highly important for human vision, taking these two measures into account in our feature selection procedure should further improve the performance of image classification. Second, we will investigate how different types of images affect eye tracking data. In our current experiments, the features for coarse selection were simply determined according to a two-class (object-distinctive vs


non-object-distinctive) classification problem under the guidance of the eye tracking data. However, such a simple setting may be limiting for classification problems with a large number of classes. Last, we will investigate more features, such as Scale Invariant Feature Transform (SIFT) descriptors and Local Binary Pattern (LBP) features, since those used in our work are mainly global features.

Acknowledgments
This work is supported by the National Natural Science Foundation of China (No. 60871086 and No. 61473243), the Natural Science Foundation of Jiangsu Province, China (No. BK2008159), and the Natural Science Foundation of Suzhou (No. SYG201113). The authors thank the anonymous reviewers for their constructive comments and valuable suggestions.

References
[1] A. Jain, D. Zongker, Feature selection: Evaluation, application, and small sample performance, IEEE Trans. Pattern Anal. Mach. Intell. 19 (1997) 153–158.
[2] I. Guyon, A. Elisseeff, An introduction to variable and feature selection, Journal of Machine Learning Research 3 (2003) 1157–1182.
[3] X. Liu, A. Mondry, An entropy-based gene selection method for cancer classification using microarray data, BMC Bioinformatics 6 (2005) 76.
[4] Z. Xu, I. King, M. R. Lyu, R. Jin, Discriminative semi-supervised feature selection via manifold regularization, IEEE Transactions on Neural Networks 21 (2010) 1033–1047.
[5] M. Dash, K. Choi, P. Scheuermann, H. Liu, Feature selection for clustering - a filter solution, in: Proc. Second Int'l Conf. Data Mining, pp. 115–122.
[6] R. Caruana, D. Freitag, Greedy attribute selection, in: Proc. 11th Int'l Conf. Machine Learning, pp. 28–36.
[7] J. Zhang, H. Deng, Gene selection for classification of microarray data based on the Bayes error, BMC Bioinformatics 8 (2007) 370.
[8] H. Peng, F. Long, C. Ding, Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy, IEEE Transactions on Pattern Analysis and Machine Intelligence 27 (2005) 1226–1238.


[9] C. Ding, H. Peng, Minimum redundancy feature selection from microarray gene expression data, in: Proc. Second IEEE Computational Systems Bioinformatics Conference, pp. 523–528.
[10] R. Battiti, Using mutual information for selecting features in supervised neural net learning, IEEE Transactions on Neural Networks 5 (1994) 537–550.
[11] I. Guyon, J. Weston, S. Barnhill, V. Vapnik, Gene selection for cancer classification using support vector machines, Machine Learning 46 (2002) 389–422.
[12] M. Yousef, S. Jung, L. Showe, M. Showe, Recursive cluster elimination (RCE) for classification and feature selection from gene expression data, BMC Bioinformatics 8 (2007) 114.
[13] K. B. Duan, J. C. Rajapakse, H. Wang, F. Azuaje, Multiple SVM-RFE for gene selection in cancer classification with expression data, IEEE Transactions on Nanobioscience 4 (2005) 228–234.
[14] M. Wahde, Z. Szallasi, A survey of methods for classification of gene expression data using evolutionary algorithms, Expert Rev. Molecular Diagnostics 6 (2006) 101–110.
[15] P. A. Mundra, J. C. Rajapakse, SVM-RFE with mRMR filter for gene selection, IEEE Transactions on Nanobioscience 9 (2010) 31–37.
[16] Y. Leung, Y. Hung, A multiple-filter-multiple-wrapper approach to gene selection and microarray data classification, IEEE/ACM Transactions on Computational Biology and Bioinformatics 7 (2010) 108–117.
[17] C. Shang, D. Barnes, Fuzzy-rough feature selection aided support vector machines for Mars image classification, Computer Vision and Image Understanding 117 (2013) 202–213.
[18] A. Vavilin, K.-H. Jo, Automatic context analysis for image classification and retrieval based on optimal feature subset selection, Neurocomputing 116 (2013) 201–207.
[19] C.-Y. Chang, S.-J. Chen, M.-F. Tsai, Application of support-vector-machine-based method for feature selection and classification of thyroid nodules in ultrasound images, Pattern Recognition 43 (2010) 3494–3506.
[20] S. Zhong, Y. Liu, Y. Liu, F.-L. Chung, A semantic no-reference image sharpness metric based on top-down and bottom-up saliency map modeling, in: IEEE 17th International Conference on Image Processing, pp. 1553–1556.
[21] L. Wang, Feature selection with kernel class separability, IEEE Transactions on Pattern Analysis and Machine Intelligence 30 (2008) 1534–1546.

[22] O. Oyekoya, F. Stentiford, Exploring human eye behaviour using a model of visual attention, in: Proc. 17th International Conference on Pattern Recognition (ICPR'04), volume 4, IEEE Computer Society, Washington, DC, USA, pp. 945–948.
[23] O. Oyekoya, F. Stentiford, Perceptual image retrieval using eye movements, Advances in Machine Vision, Image Processing, and Pattern Analysis (2006) 281–289.
[24] Z. Liang, H. Fu, Y. Zhang, Z. Chi, D. D. Feng, Content-based image retrieval using a combination of visual features and eye tracking data, in: Proceedings of the 2010 Symposium on Eye-Tracking Research and Applications (ETRA '10), pp. 41–44.
[25] A. Draa, S. Meshoul, H. Talbi, M. Batouche, A quantum-inspired differential evolution algorithm for solving the N-queens problem, The International Arab Journal of Information Technology 7 (2010) 21–27.
[26] K. E. A. van de Sande, T. Gevers, C. G. M. Snoek, Evaluating color descriptors for object and scene recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (2010) 1582–1596.
[27] Y. D. Chun, N. C. Kim, I. H. Jang, Content-based image retrieval using multiresolution color and texture features, IEEE Transactions on Multimedia 10 (2008) 1073–1084.
[28] L. Nanni, J. Shi, S. Brahnam, A. Lumini, Protein classification using texture descriptors extracted from the protein backbone image, Journal of Theoretical Biology 24 (2010) 1024–1032.
[29] M. Carlin, Measuring the performance of shape similarity retrieval methods, Computer Vision and Image Understanding 84 (2001) 44–61.
[30] Text of ISO/IEC 15938-3 Multimedia Content Description Interface - Part 3: Visual, Final Committee Draft, ISO/IEC/JTC1/SC29/WG11, 2001. Doc. N4062.
[31] M. Swain, D. Ballard, Color indexing, International Journal of Computer Vision 7 (1991) 11–32.
[32] L. Cieplinski, MPEG-7 color descriptors and their applications, in: Proc. 9th International Conference on Computer Analysis of Images and Patterns, Seville, pp. 11–20.
[33] M. Stricker, M. Orengo, Similarity of color images, in: Proc. SPIE Storage and Retrieval for Image and Video Databases, pp. 381–392.
[34] J. Smith, S. Chang, Single color extraction and image query, in: IEEE International Conference on Image Processing, volume 3, pp. 528–531.

[35] J. Hafner, H. Sawhney, W. Equitz, M. Flickner, W. Niblack, Efficient color histogram indexing for quadratic form distance functions, IEEE Transactions on Pattern Analysis and Machine Intelligence 17 (1995) 729–736.
[36] C. S. Won, D. K. Park, Image block classification and variable block size segmentation using a model-fitting criterion, Optical Engineering 36 (1997) 2204–2209.
[37] B. S. Manjunath, J. Ohm, V. V. Vasudevan, A. Yamada, Color and texture descriptors, IEEE Transactions on Circuits and Systems for Video Technology 11 (2001) 703–715.
[38] H. Tamura, S. Mori, T. Yamawaki, Texture features corresponding to visual perception, IEEE Transactions on Systems, Man, and Cybernetics SMC-8 (1978) 460–473.
[39] J. Han, K. Ma, Rotation-invariant and scale-invariant Gabor features for texture image retrieval, Image and Vision Computing 25 (2007) 1474–1481.
[40] M. Kreutz, H. B. Völpel, Scale-invariant image recognition based on higher-order autocorrelation features, Pattern Recognition 29 (1996) 19–26.
[41] R. M. Haralick, Edge and region analysis for digital image data, Computer Graphics and Image Processing 12 (1980) 60–73.
[42] R. C. Gonzalez, R. E. Woods, Digital Image Processing, 2nd edition, Prentice Hall, New Jersey, 2002.
[43] R. Mukundan, K. R. Ramakrishnan, Fast computation of Legendre and Zernike moments, Pattern Recognition 28 (1995) 1433–1442.
[44] C.-W. Chong, P. Raveendran, R. Mukundan, An efficient algorithm for fast computation of pseudo-Zernike moments, International Journal of Pattern Recognition and Artificial Intelligence 17 (2003) 1011–1023.
[45] Z. Liang, H. Fu, Z. Chi, et al., Image pre-classification based on saliency map for image retrieval, in: IEEE 7th International Conference on Information, Communications and Signal Processing (ICICS 2009), pp. 1–5.
[46] Y. W. Jeong, J. B. Park, S. H. Jang, A new quantum-inspired binary PSO: application to unit commitment problems for power systems, IEEE Transactions on Power Systems 25 (2010) 1486–1495.


[47] J. Weston, S. Mukherjee, O. Chapelle, M. Pontil, T. Poggio, V. Vapnik, Feature selection for SVMs, in: Proceedings of NIPS 2000, pp. 668–674.
[48] K. H. Han, J. H. Kim, Genetic quantum algorithm and its application to combinatorial optimization problem, in: Proceedings of the 2000 IEEE Congress on Evolutionary Computation, pp. 1354–1360.
[49] T. C. Lu, G. R. Yu, An adaptive population multi-objective quantum-inspired evolutionary algorithm for multi-objective 0/1 knapsack problems, Information Sciences (2013) 39–56.
[50] K.-H. Han, J.-H. Kim, Quantum-inspired evolutionary algorithms with a new termination criterion, Hε gate, and two-phase scheme, IEEE Transactions on Evolutionary Computation 8 (2004) 156–169.
[51] M. Luo, L. Luo, Feature selection for text classification using OR+SVM-RFE, in: Chinese Control and Decision Conference (CCDC), pp. 1648–1652.
[52] G. Griffin, A. Holub, P. Perona, Caltech-256 object category dataset, Technical Report UCB/CSD04-1366, California Institute of Technology, 2007.
[53] R. Kachouri, K. Djemal, H. Maaref, Adaptive feature selection for heterogeneous image databases, in: Conference on Image Processing Theory, Tools and Applications (IPTA), pp. 26–31.

Xuan Zhou received his BSc from Nanjing University Jinling College, China, in 2012 and is now a master student in Electronic Engineering at Soochow University, Suzhou, China. His current research focuses on image classification.

Xin Gao received his Ph.D. from Zhejiang University, China, in 2004 and is now a researcher at the Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences. His research is mainly focused on medical imaging, evaluation of radiotherapy, and interventional diagnosis and treatment.

Jiajun Wang received his BSc and MSc, both in physics, in 1992 and 1995 from Soochow University, China, and his Ph.D. in Biomedical Engineering from Zhejiang University in 1999. He is currently a professor with the School of Electronic and Information Engineering, Soochow University, China. His research is mainly focused on medical imaging, image processing, pattern recognition, and bioinformatics. He has published more than 40 scientific journal and conference papers.

Hui Yu received her BSc from Soochow University, China, in 2009 and is now a master student in Electronic Engineering at Soochow University, Suzhou, China. Her current research focuses on image classification.

Zhiyong Wang received his BEng and MEng degrees in electronic engineering from South China University of Technology, Guangzhou, China, and his PhD degree from Hong Kong Polytechnic University, Hong Kong. He is a senior lecturer in the School of Information Technologies, The University of Sydney, which he joined as a Postdoctoral Research Fellow. His research interests include multimedia information processing, retrieval and management, Internet multimedia computing, human-centred multimedia computing, pattern recognition, and machine learning.

Zheru Chi received the BEng and MEng degrees from Zhejiang University, in 1982 and 1985, respectively, and the PhD degree from the University of Sydney, in March 1994, all in electrical engineering. Between 1985 and 1989, he was a faculty member of the Department of Scientific Instruments, Zhejiang University. He worked as a senior research assistant/research fellow in the Laboratory for Imaging Science and Engineering, University of Sydney, from April 1993 to January 1995. Since February 1995, he has been with The Hong Kong Polytechnic University, where he is now an associate professor in the Department of Electronic and Information Engineering. Since 1997, he has served on the organization or program committees of a number of international conferences. He was an associate editor of the IEEE Transactions on Fuzzy Systems between 2008 and 2010, and is currently an editor of the International Journal of Information Acquisition. His research interests include image processing, pattern recognition, and computational intelligence. He has authored/co-authored one book and 11 book chapters, and published more than 190 technical papers. He is a member of the IEEE.
