Computers and Electrical Engineering 79 (2019) 106456
An innovative multi-kernel learning algorithm for hyperspectral classification ✩

Fei Li a,b,∗, Huchuan Lu a, Pingping Zhang a

a School of Information and Communication Engineering, Dalian University of Technology, Dalian 116023, China
b Engineering Training Center, Shenyang Aerospace University, Shenyang 110136, China


Article history: Received 28 April 2019 Revised 22 July 2019 Accepted 2 September 2019

Keywords: Hyperspectral; Kernel learning; Classification; Adaptive boosting

Abstract

Many studies have shown that support vector machine (SVM) techniques achieve superior performance in hyperspectral classification, and that SVM classifiers can be improved further by combining multiple kernel tricks. Such multi-kernel learning (MKL) algorithms have already been used in hyperspectral processing. Combining SVM and MKL, we propose a novel method for hyperspectral classification that adopts a boosting algorithm, rather than a global optimization method, to search for the optimal combination of kernel SVMs. Experiments on several reflectance spectra of plankton and on two hyperspectral remote sensing images verify the approach: the proposed method achieves better classification performance than several conventional algorithms.

© 2019 Elsevier Ltd. All rights reserved.

1. Introduction

Hyperspectral data contains more spectral information than multispectral data, with dozens or even hundreds of bands [1]. Hyperspectral image classification, the task of assigning a category to each pixel of the image, is one of the most important problems in hyperspectral data processing. In theory, the richer spectral information should allow hyperspectral classification to achieve higher accuracy than multispectral classification, because high spectral resolution makes it possible to reliably discriminate materials with similar spectral characteristics. In practice, however, several factors degrade the classification results: (1) it is difficult to obtain sufficient samples for training a reliable classifier, and (2) a high spectral dimension requires even more training samples. Moreover, manually labeling samples in a remote sensing image is very laborious. As a result, the combination of high spectral dimension and few labeled samples leads to the curse of dimensionality and a high risk of overfitting [2,3].

In the remote sensing literature, many machine learning methods have been developed and applied to multispectral and hyperspectral image classification [4–7], such as K-nearest neighbour (KNN), Random Forest (RF) [8], Support Vector Machines (SVM) [9–11], Bayes methods [12], Multinomial Logistic Regression (MLR) [13,14], Sparse Representation (SR) [15,16], and deep learning [17–19]. Among these methods, kernel-based approaches have attracted growing interest because they map the data from the original space to a high-dimensional kernel space in which the data become linearly separable [10]. SVM is the most representative kernel machine and has been applied successfully in many remote sensing studies because of its elegant properties,

✩ This paper is for CAEE special section SI-aicv3. Reviews processed and recommended for publication to the Editor-in-Chief by Associate Editor Dr. Huimin Lu.
∗ Corresponding author at: School of Information and Communication Engineering, Dalian University of Technology, Dalian 116023, China. E-mail address: [email protected] (F. Li).



Fig. 1. Illustration of the Multiple Kernel Boosting (MKB) process. Each x, a pixel with hundreds of spectral bands, is fed into multiple kernel functions to form a pool of SVM classifiers. The final strong classifier is assembled from the pool by boosting.

such as its fitness for small training sets, low sensitivity to the Hughes phenomenon, and robustness to noisy samples [11,20,21]. Many kernel methods have been developed to deal with complicated problems of hyperspectral image classification [21] or to select features for classification [22]. In any kernel method, the key ingredient is the kernel function, which is predefined: for example, a polynomial kernel or a Gaussian kernel. However, in classification problems no single kernel suits all data coming from multiple sources or different classes. Hence, a number of works consider multiple kernels rather than a single one. These methods, called Multiple Kernel Learning (MKL), train the weight of each kernel and select the best combination of kernel functions for classification [23]. MKL significantly improves the classification accuracy over a single kernel, but it also brings another problem, namely the increased complexity of the optimization. To address this complexity, improved MKL algorithms were designed, such as SimpleMKL, which employs mixed-norm regularization [24].

In hyperspectral image classification, several works have utilized MKL to improve performance. Camps-Valls et al. presented a heuristic algorithm of composite kernel machines to enhance the classification of hyperspectral images [10]. Cui and Prasad proposed a linear combination of kernels that can assign weights to distinct meaningful feature sets [16]. In [20], intra-class and inter-class information is incorporated into the criteria for optimizing kernel parameters in order to enhance the separability of the kernel feature space. Inspired by MKL, Yang et al. [25] proposed the Multiple Kernel Boosting (MKB) algorithm for human tracking. We therefore attempt to apply the MKB algorithm to hyperspectral classification.

Fig. 1 illustrates the flow of our proposed approach. In the SVM pool, every kernel SVM is regarded as a weak classifier. The pixel vectors are projected into the different kernel spaces, and boosting is adopted to find the optimal kernel SVMs and build a strong classifier. To find an optimal group of SVMs, we use boosting techniques instead of global optimization: in each iteration, we recalculate the weights of the weak classifiers until a satisfactory strong classifier is generated.

The rest of this paper is organized as follows. Section 2 introduces the classic SVM and MKL. Section 3 presents our proposed algorithm. Section 4 describes the experiments and results on plankton spectral data and two hyperspectral remote sensing datasets. Finally, Section 5 concludes.

2. Related work

2.1. SVM

The original SVM is a model for the binary classification problem; its aim is to find a separating hyperplane in the feature space. The algorithm can be expressed as the following constrained minimization problem:

\[
\min_{w,b,\xi}\ \frac{1}{2} w^{T} w + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad y_i \left( w^{T} x_i + b \right) \geq 1 - \xi_i, \quad \xi_i \geq 0,\ i = 1, 2, \dots, n
\tag{1}
\]

where w is the normal vector of the separating hyperplane, b is a bias, and the ξ_i are non-negative slack variables, so that w^T x + b = 0 is the separating hyperplane. The decision function can be expressed as:

\[
f(x) = \operatorname{sgn}\left( w^{T} x + b \right)
\tag{2}
\]

The result of f(x), +1 or −1, serves as the sample's label. However, most classification applications are non-linear or linearly inseparable problems. To address them, kernel methods were put forward: features are projected from the original space into a higher-dimensional feature space, making the problem linearly separable, so that:

\[
y_i \left( w^{T} \phi(x_i) + b \right) \geq 1 - \xi_i, \quad \xi_i \geq 0,\ i = 1, 2, \dots, n
\tag{3}
\]


where φ(x) denotes the projection of x into a kernel space. Using the dual formulation, the problem can be solved through the corresponding Lagrange function, which is defined as:

\[
\max_{\alpha}\ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j\, \phi(x_i)^{T} \phi(x_j)
\quad \text{subject to} \quad \sum_{i=1}^{n} \alpha_i y_i = 0, \quad 0 \leq \alpha_i \leq C,\ i = 1, 2, \dots, n
\tag{4}
\]

Once α is determined, the decision function for a test sample is obtained. The mapping function φ itself never has to be evaluated: by the kernel trick, only the inner products \(K(x_i, x_j) = \phi(x_i)^{T} \phi(x_j)\) are needed.
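To make the single-kernel baseline concrete, here is a minimal sketch of a kernelized SVM with cross-validated hyperparameters, using scikit-learn rather than the authors' implementation; the data shapes, labels, and parameter grids are illustrative assumptions.

```python
# Minimal single-kernel SVM baseline (a sketch, not the authors' code).
# X: (n_samples, n_bands) pixel spectra, y: integer class labels.
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X = np.random.rand(200, 102)          # placeholder spectra, e.g. 102 bands
y = np.repeat(np.arange(4), 50)       # placeholder labels for 4 classes

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, train_size=0.2, stratify=y, random_state=0)

# Cross-validate C and gamma for the RBF kernel, as done for the
# single-kernel SVM baseline in Section 4.
grid = GridSearchCV(SVC(kernel="rbf"),
                    {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0]}, cv=3)
grid.fit(X_tr, y_tr)
print("test accuracy:", grid.score(X_te, y_te))
```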

2.2. MKL

Selecting a proper kernel for a single-kernel SVM is a non-trivial task in any given application. MKL aims to find the best combination of multiple kernels and the corresponding classifier simultaneously. Let \(\{K_m\}_{m=1}^{M}\) be the multiple kernel matrices and \(\{\beta_m\}_{m=1}^{M}\) the weights of the mixture. The combination of multiple kernels can then be expressed as:

\[
K(x, x_i) = \sum_{m=1}^{M} \beta_m K_m(x, x_i)
\tag{5}
\]

where the kernel weights satisfy \(\beta_m \geq 0\) and \(\sum_{m=1}^{M} \beta_m = 1\). The decision function is written as:

\[
F(x) = \sum_{i=1}^{D} \alpha_i y_i \sum_{m=1}^{M} \beta_m K_m(x, x_i) + b
\tag{6}
\]

The values of α_i, β_m, and b can be derived from an optimization algorithm, as described in depth in [23].

3. Multiple kernel boosting

Although MKL successfully solves the problem of how to select kernels for SVMs, its heavy computational cost makes it impractical for hyperspectral image processing, whereas a boosting method can simplify the MKL optimization procedure. The decision function above can be rewritten as:

\[
F(x) = \sum_{m=1}^{M} \beta_m \sum_{i=1}^{D} \alpha_i y_i K_m(x, x_i) + b
     = \sum_{m=1}^{M} \beta_m \left( K_m(x)^{T} \alpha + b_m \right)
\tag{7}
\]

where \(K_m(x) = [K_m(x, x_1), K_m(x, x_2), \dots, K_m(x, x_D)]^{T}\), \(\alpha = (\alpha_1 y_1, \alpha_2 y_2, \dots, \alpha_D y_D)^{T}\), and \(b = \sum_{m=1}^{M} b_m\). The decision function F(x) can therefore be seen as a linear combination of several SVMs. Writing the objective of a single kernel as \(f_m(x) = K_m(x)^{T} \alpha + b_m\), Eq. (7) becomes:

\[
F(x) = \sum_{m=1}^{M} \beta_m f_m(x)
\tag{8}
\]

We use the boosting method to obtain the values of β and thus avoid a complex optimization procedure. First, every SVM is trained separately on the entire training set, which fixes the M weak SVM classifiers. Then, the kernel SVM with the best discriminative ability is repeatedly selected from the pool of weak SVMs and added to the final decision function, iterating until a predefined stopping criterion is reached. In each iteration, the classification error of a single weak classifier is calculated by:

\[
\delta = \frac{\sum_{i=1}^{D} w(i)\, |f(x_i)| \left( \operatorname{sgn}(-y_i f(x_i)) + 1 \right)}{2 \sum_{i=1}^{D} w(i)\, |f(x_i)|}
\tag{9}
\]

where sgn(x) is the sign function: it equals 1 when x is positive and −1 otherwise. The weak classifier with the least error is the one added to the final objective function, and its weight is obtained by \(\beta_i = \frac{1}{2} \log \frac{1 - \delta_i}{\delta_i}\).
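A direct transcription of Eq. (9) and the weight formula might look as follows; `f_scores` is assumed to hold the real-valued outputs f(x_i) of one weak SVM and `w` the current sample weights.

```python
# Sketch of Eq. (9) and the resulting classifier weight beta.
import numpy as np

def weak_error(w, y, f_scores):
    """Weighted, margin-scaled error delta of Eq. (9); y in {-1, +1}."""
    miss = np.sign(-y * f_scores) + 1.0   # 2 where misclassified, 0 where correct
    num = np.sum(w * np.abs(f_scores) * miss)
    den = 2.0 * np.sum(w * np.abs(f_scores))
    return num / den

def beta_from_error(delta):
    """beta = (1/2) * log((1 - delta) / delta)."""
    return 0.5 * np.log((1.0 - delta) / delta)
```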

The workflow is outlined as follows:

Step 1: Prepare the kernel functions \(\{K_m\}_{m=1}^{M}\) and the training set \(\{x_i, y_i\}_{i=1}^{N}\);
Step 2: Train the SVMs to form a pool of kernel SVM candidates;
Step 3: Initialize all sample weights to \(w_i = \frac{1}{N}\);


Fig. 2. Reflectance spectral curves for Pronghorn algae (a), Dinoflagellate (b), Noctiluca scintillans (c), and Heterosigma akashiwo (d).

Step 4: For each SVM, compute the classification error using Eq. (9), and select the SVM with the minimum error;
Step 5: Compute the weight \(\beta_k = \frac{1}{2} \log \frac{1 - \delta_k}{\delta_k}\) of the selected SVM;
Step 6: Recalculate all sample weights using \(w_i^{k+1} = \frac{w_i^{k}\, e^{-\beta^{k} y_i h^{k}(x_i)}}{2 \sqrt{\delta_k (1 - \delta_k)}}\), where \(h^{k}\) denotes the selected weak SVM;
Step 7: Repeat Steps 4 to 6 until either K iterations have been performed or β falls below 0.

At last, we build the strong classifier from all the selected SVMs and their corresponding weights; a compact sketch of this loop follows.
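The sketch below is one interpretation of Steps 4–7 for the binary case (labels in {−1, +1}), not the authors' code; it reuses the `weak_error` and `beta_from_error` helpers from the earlier sketch and normalizes the sample weights by their sum, which plays the same role as the closed-form normalizer in Step 6.

```python
# Sketch of the MKB selection loop (Steps 4-7); `pool` is a list of trained
# weak SVMs, each exposing decision_function(X), e.g. sklearn SVC objects.
import numpy as np

def mkb_select(pool, X, y, K=10):
    n = len(y)
    w = np.full(n, 1.0 / n)                  # Step 3: uniform sample weights
    strong = []                              # selected (beta, classifier) pairs
    for _ in range(K):                       # at most K rounds (Step 7)
        scores = [clf.decision_function(X) for clf in pool]
        errors = [weak_error(w, y, f) for f in scores]   # Step 4, Eq. (9)
        m = int(np.argmin(errors))
        beta = beta_from_error(errors[m])    # Step 5
        if beta <= 0:                        # Step 7 stopping criterion
            break
        strong.append((beta, pool[m]))
        w *= np.exp(-beta * y * np.sign(scores[m]))      # Step 6 re-weighting
        w /= w.sum()                         # normalize instead of 2*sqrt(d(1-d))
    return strong

def mkb_predict(strong, X):
    F = sum(beta * clf.decision_function(X) for beta, clf in strong)
    return np.sign(F)                        # Eq. (8), the strong classifier
```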

4. Experiments and results

In this section, the performance of the MKB algorithm is tested on plankton spectral data and two hyperspectral remote sensing images in several respects: class-specific accuracies, overall accuracy, average accuracy, and the Kappa coefficient. MKB is compared with four classical classifiers: KNN, SVM, Bayes, and Sparse Representation via Orthogonal Matching Pursuit (SR-OMP) [15]. We first introduce the characteristics of the datasets and the experimental settings, and then present the results for comparative analysis.

4.1. Data description

Plankton reflectance spectral data: This dataset consists of the reflectance spectra of four plankton species at different chlorophyll concentrations, measured in the laboratory and provided by the China Marine Environment Monitoring Center. The four species of phytoplankton are Pronghorn algae, Dinoflagellate, Noctiluca scintillans, and Heterosigma akashiwo. The spectra have 240 bands; the wavelength ranges from 400 to 1000 nanometers with a sampling interval of 2.7 nanometers. This dataset has several challenging characteristics: (1) the number of bands is large; (2) the number of samples in each class is very small; (3) the waveforms are very similar. All of these increase the classification difficulty. The spectral curves of the four species are illustrated in Fig. 2.

Salinas-A dataset: Salinas-A is a hyperspectral remote sensing image, a small part of the Salinas scene collected by the AVIRIS sensor over Salinas Valley, California. Its spatial resolution is 3.7 m per pixel. Each band image has 86 rows and 83 columns, and there are 224 bands in total. Following previous works, 20 water absorption bands are discarded. Fig. 3 shows the gray-scale and ground truth images of Salinas-A.

Pavia-U dataset: Pavia-U was acquired by the ROSIS sensor over the area of Pavia University, northern Italy. Its spatial resolution is 1.3 m per pixel. Due to the black strip and water absorption noise, 13 bands are removed. The image includes 1096 × 715 pixels with 102 channels, and there are 9 classes of land cover in the scene. Fig. 4 shows the colour composite image and ground truth of Pavia-U.

4.2. Experiments

For the phytoplankton spectral data, because of the small number of samples in every category, we select only three samples from every class as training samples.


Fig. 3. Sample image in the Salinas-A Scene dataset. (a) Grayscale image. (b) Corresponding pseudo color ground truth.

Fig. 4. Sample image in the Pavia University Scene dataset. (a) True color image. (b) Corresponding pseudo color ground truth.

The SR-OMP algorithm requires that the dictionary be positive semi-definite, i.e., the sample dimension must be much smaller than the number of samples. Therefore, we reduce the dimension of the phytoplankton spectral data to 10 by Principal Component Analysis (PCA) to meet this requirement. In Salinas-A, 10 training samples were selected randomly in every category; in Pavia-U, we selected 50 samples randomly in every category. The number of categories is used as the parameter of KNN, while the compared SVM uses the RBF kernel, with parameters obtained by cross-validation. We used four kernels in MKB: a linear kernel, a polynomial kernel, an RBF kernel, and a sigmoid kernel; a sketch of this pool follows.
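A minimal sketch of such a four-kernel SVM pool (Steps 1–2 of the workflow) is given below; the hyperparameters are illustrative assumptions, not the values used in the experiments.

```python
# Sketch of the four-kernel weak-SVM pool used by MKB in the experiments.
from sklearn.svm import SVC

def build_pool(X_train, y_train):
    pool = [SVC(kernel="linear"),
            SVC(kernel="poly", degree=3),
            SVC(kernel="rbf", gamma="scale"),
            SVC(kernel="sigmoid")]
    for clf in pool:
        clf.fit(X_train, y_train)            # Step 2: train each weak SVM
    return pool
```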


It can be seen from Tables 1–3 that MKB obtains better results than the other methods. Comparing the datasets, the classification accuracy on Salinas-A is very high, while on Pavia-U it is somewhat lower. The reasons can be attributed to the following: (1) although the number of categories in the two datasets is similar, the number of pixels differs greatly, with far more pixels in Pavia-U than in Salinas-A; (2) in Salinas-A the pixels of the same category appear in blocks and the distinction between categories is clear, whereas in Pavia-U the pixels of the same category are scattered.

Table 1
Classification results on phytoplankton spectral data (%).

Name                   KNN      SVM      BAYES    SR-OMP   MKB
Pronghorn algae        55.56    55.56    44.44    66.67    66.67
Dinoflagellate         100      100      100      66.67    100
Noctiluca scintillans  80       80       60       100      100
Heterosigma akashiwo   87.5     87.5     50       62.5     87.5
Overall                78.57    75       64.29    71.43    85.71
Average                80.76    77.99    66.39    73.96    88.54
Kappa                  0.7711   0.7332   0.6200   0.6949   0.8012

Table 2
Classification results on Salinas-A dataset (%).

Name                       KNN      SVM      BAYES    SR-OMP   MKB
Brocoli green weeds 1      99.74    99.74    99.48    99.23    99.81
Corn senesced green weeds  77.43    95.53    44.68    94.42    95.64
Lettuce romaine 4wk        95.29    64.96    98.05    88.64    96.54
Lettuce romaine 5wk        99.87    100      96.26    86.10    99.82
Lettuce romaine 6wk        99.85    100      98.96    68.99    100
Lettuce romaine 7wk        99.87    92.74    99.50    87.73    96.31
Overall                    93.24    97.16    84.57    87.53    97.55
Average                    94.84    97.16    89.49    87.52    98.02
Kappa                      0.9303   0.9710   0.8409   0.8711   0.982

Table 3
Classification results on Pavia-U dataset (%).

Name          KNN      SVM      BAYES    SR-OMP   MKB
Asphalt       63.41    82.32    69.51    68.05    84.41
Meadows       51.31    81.50    67.33    67.78    86.75
Gravel        71.70    83.99    28.68    65.45    84.63
Trees         96.87    87.66    90.53    97.30    98.30
Bitumen       99.11    99.47    99.48    99.70    100
Soil          72.36    78.58    35.99    73.34    82.3
Metal Sheets  90       90       84.89    88.45    92.53
Bricks        70.97    71.21    78.76    82.13    79.21
Shadows       99.84    99.89    98.94    94.58    99.54
Overall       68.48    82.2     66.99    75.12    85.3
Average       79.51    86.07    72.67    82.36    89.74
Kappa         0.6474   0.8179   0.6626   0.6618   0.883

Fig. 5. The overall classification accuracy on two datasets.


Fig. 6. The kappa coefficient on two datasets.

Table 4
Computational time on Salinas-A (s).

Number  10      20      30      40      50      60      70      80      90      100
KNN     0.250   0.095   0.097   0.116   0.179   0.153   0.172   0.198   0.214   0.234
BAYES   0.352   0.161   0.141   0.131   0.161   0.124   0.124   0.127   0.124   0.123
SVM     0.165   0.277   0.455   0.840   0.977   1.229   1.702   1.907   2.332   2.831
SR-OMP  10.559  14.146  16.779  20.331  22.705  26.236  28.116  31.387  32.887  35.522
MKB     1.279   1.844   1.921   2.101   2.614   2.435   4.123   5.331   7.342   6.995

Table 5
Computational time on Pavia-U (s).

Number  10       20       30       40       50       60       70       80       90       100
KNN     1.313    1.750    2.068    2.580    3.122    3.545    3.954    4.395    4.895    5.514
BAYES   2.876    2.467    2.270    2.365    2.347    2.344    2.278    2.312    2.397    2.313
SVM     2.523    3.955    5.653    7.171    9.730    11.739   12.816   15.798   21.492   19.089
SR-OMP  319.634  316.655  322.381  332.410  310.147  329.987  325.271  327.335  326.634  327.676
MKB     3.561    5.134    7.681    9.213    12.540   14.362   16.339   18.134   20.677   25.221

Fig. 5 shows the overall classification accuracy for different numbers of training samples. On the Salinas-A dataset, except for BAYES, the overall accuracy rises to a steady level as the number of training samples increases. On the Pavia-U dataset, the overall accuracy increases slowly throughout. It is apparent that MKB produces the best results among the compared methods; the results of BAYES and KNN, in turn, are not as good as those of SVM.

Fig. 6 shows the Kappa coefficient for different numbers of training samples. The Kappa coefficient is an index that measures the agreement of the classification results, which is why its trend is similar to that of the classification accuracy. On the Salinas-A dataset, when the number of training samples exceeds 15, the Kappa coefficients of the four methods are higher than 0.9, indicating that the consistency of the classification results is almost perfect. On the Pavia-U dataset, however, the results of the MKB and SVM methods lie in the range of 0.7 to 0.9, implying substantial consistency.

Finally, in order to evaluate the efficiency of MKB, the run time of each classification algorithm is also reported. For a fair comparison, all algorithms were run in the same environment: an Intel Core i5-2400 CPU, 12 GB of random access memory, Windows 7, and Matlab 2016a. The run time covers each algorithm from selecting the training samples to testing the complete dataset. Tables 4 and 5 show that the computational times of the BAYES and KNN algorithms are very small and almost unaffected by the number of samples. Although MKB is not the fastest algorithm, it is much faster than SR-OMP. Each algorithm also spends more time on the Pavia-U dataset than on Salinas-A, because Pavia-U contains not only more categories but also more test samples.

From the above analysis, we can see that the MKB method outperforms the other methods in both classification accuracy and Kappa coefficient, and its computational time is acceptable.
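For reference, the Kappa coefficient reported in Tables 1–3 can be computed from a confusion matrix as sketched below; scikit-learn's cohen_kappa_score yields the same value from raw label vectors.

```python
# Sketch: Cohen's kappa from a confusion matrix.
import numpy as np

def kappa(conf):
    conf = np.asarray(conf, dtype=float)
    total = conf.sum()
    po = np.trace(conf) / total                                     # observed agreement
    pe = (conf.sum(axis=0) * conf.sum(axis=1)).sum() / total ** 2   # chance agreement
    return (po - pe) / (1.0 - pe)
```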


5. Conclusion

In this paper, we have proposed a novel hyperspectral image classification method named MKB. MKB combines MKL and SVM to construct a strong classifier that strengthens the discriminative ability of SVM. Using the boosting method, an update scheme adjusts the sample weights in order to reselect good kernel SVMs. The experiments show that MKB performs better than the compared methods. In future work, we plan to combine spatial-contextual constraints with the MKB framework to further improve the classification performance.

Declaration of Competing Interest

None.

Supplementary material

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.compeleceng.2019.106456.

References

[1] Chang CI. Hyperspectral imaging: techniques for spectral detection and classification. Plenum Publishing Corporation; 2003.
[2] Hughes G. On the mean accuracy of statistical pattern recognizers. IEEE Trans Inf Theory 1968;14(1):55–63.
[3] Li F, Zhang P, Lu H. Unsupervised band selection of hyperspectral images via multi-dictionary sparse representation. IEEE Access 2018;6:71632–43.
[4] Lan R, He J, Wang S, Gu T, Luo X. Integrated chaotic systems for image encryption. Signal Process 2018;147:133–45.
[5] Lan R, Lu H, Zhou Y, Liu Z, Luo X. An LBP encoding scheme jointly using quaternionic representation and angular information. Neural Comput Appl 2019(5):1–7.
[6] Serikawa S, Lu H. Underwater image dehazing using joint trilateral filter. Comput Electr Eng 2014;40(1):41–50.
[7] Lu H, Li Y, Mu S, Dong W, Kim H, Serikawa S. Motor anomaly detection for unmanned aerial vehicles using reinforcement learning. IEEE Internet Things J 2018;5(4):2315–22.
[8] Liu W, Li S, Zhang M, Wu Y, Su SZ, Ji R. Spectral-spatial classification of hyperspectral imagery based on random forests. In: Proceedings of the international conference on internet multimedia computing and service. ACM; 2013. p. 163–8.
[9] Melgani F, Bruzzone L. Classification of hyperspectral remote sensing images with support vector machines. IEEE Trans Geosci Remote Sens 2004;42(8):1778–90.
[10] Camps-Valls G, Gomez-Chova L, Muñoz-Marí J, Vila-Francés J, Calpe-Maravilla J. Composite kernels for hyperspectral image classification. IEEE Geosci Remote Sens Lett 2006;3(1):93–7.
[11] Kuo B-C, Ho H-H, Li C-H, Hung C-C, Taur J-S. A kernel-based feature selection method for SVM with RBF kernel for hyperspectral image classification. IEEE J Sel Top Appl Earth Obs Remote Sens 2014;7(1):317–26.
[12] Swain YJ. Bayesian contextual classification based on modified M-estimates and Markov random fields. IEEE Trans Geosci Remote Sens 2002;34(1):67–75.
[13] Li J, Bioucas-Dias JM, Plaza A. Semisupervised hyperspectral image classification using soft sparse multinomial logistic regression. IEEE Geosci Remote Sens Lett 2013;10(2):318–22.
[14] Khodadadzadeh M, Li J, Plaza A, Bioucas-Dias JM. A subspace-based multinomial logistic regression for hyperspectral image classification. IEEE Geosci Remote Sens Lett 2014;11(12):2105–9.
[15] Chen Y, Nasrabadi NM, Tran TD. Hyperspectral image classification using dictionary-based sparse representation. IEEE Trans Geosci Remote Sens 2011;49(10):3973–85.
[16] Cui M, Prasad S. Class-dependent sparse representation classifier for robust hyperspectral image classification. IEEE Trans Geosci Remote Sens 2015;53(5):2683–95.
[17] Chen Y, Lin Z, Zhao X. Deep learning-based classification of hyperspectral data. IEEE J Sel Top Appl Earth Obs Remote Sens 2014;7(6):2094–107.
[18] Lu H, Li Y, Min C, Kim H, Serikawa S. Brain intelligence: go beyond artificial intelligence. Mob Netw Appl 2018;23(2):368–75.
[19] Lu H, Li Y, Uemura T, Kim H, Serikawa S. Low illumination underwater light field images reconstruction using deep convolutional neural networks. Future Gener Comput Syst 2018;82:142–8.
[20] Bruzzone L, Persello C. A novel context-sensitive semisupervised SVM classifier robust to mislabeled training samples. IEEE Trans Geosci Remote Sens 2009;47(7):2142–54.
[21] Fauvel M, Benediktsson JA, Chanussot J, Sveinsson JR. Spectral and spatial classification of hyperspectral data using SVMs and morphological profiles. IEEE Trans Geosci Remote Sens 2008;46(11):3804–14.
[22] Pal M, Foody GM. Feature selection for classification of hyperspectral data by SVM. IEEE Trans Geosci Remote Sens 2010;48(5):2297–307.
[23] Bach FR, Lanckriet GR, Jordan MI. Multiple kernel learning, conic duality, and the SMO algorithm. In: Proceedings of the twenty-first international conference on machine learning. ACM; 2004. p. 6.
[24] Rakotomamonjy A, Bach F, Canu S, Grandvalet Y. SimpleMKL. J Mach Learn Res 2008;9:2491–521.
[25] Yang F, Lu H, Yang M-H. Robust visual tracking via multiple kernel boosting with affinity constraints. IEEE Trans Circuits Syst Video Technol 2014;24(2):242–54.

Fei Li received the M.Sc. degree from Nankai University, China, in 2005. She is currently pursuing the Ph.D. degree in the School of Information and Communication Engineering, Dalian University of Technology (DUT), China. Her research interests include machine learning and hyperspectral image processing.

Huchuan Lu received the M.S. degree in signal and information processing and the Ph.D. degree in system engineering from Dalian University of Technology (DUT), China, in 1998 and 2008, respectively. He has been a faculty member since 1998 and a professor since 2012 in the School of Information and Communication Engineering of DUT. His research interests are in the areas of computer vision and pattern recognition.

Pingping Zhang received the B.E. degree in mathematics and applied mathematics from Henan Normal University (HNU), Xinxiang, China, in 2012. He is currently a Ph.D. candidate in the School of Information and Communication Engineering, Dalian University of Technology (DUT), China. His research interests are deep learning, saliency detection, object tracking, and semantic segmentation.