Wavelet SVM in Reproducing Kernel Hilbert Space for hyperspectral remote sensing image classification

Optics Communications 283 (2010) 4978–4984

Peijun Du a,⁎, Kun Tan a,b, Xiaoshi Xing b

a Key Laboratory for Land Environment and Disaster Monitoring of State Bureau of Surveying and Mapping of China, China University of Mining and Technology, Xuzhou City, Jiangsu Province 221116, PR China
b Center for International Earth Science Information Network (CIESIN), Columbia University, 61 Route 9W, PO Box 1000, Palisades, NY 10964, USA

⁎ Corresponding author. E-mail address: [email protected] (P. Du).

Article history: received 15 December 2009; received in revised form 10 July 2010; accepted 3 August 2010.

Keywords: hyperspectral remote sensing; wavelet transform; Support Vector Machine (SVM); Reproducing Kernel Hilbert Space (RKHS); classification

Abstract

Combining the Support Vector Machine (SVM) with wavelet analysis, we constructed a wavelet SVM (WSVM) classifier based on wavelet kernel functions in Reproducing Kernel Hilbert Space (RKHS). In conventional kernel theory, SVM faces the bottleneck of kernel parameter selection, which leads to time-consuming training and low classification accuracy. The wavelet kernel in RKHS is a kind of multidimensional wavelet function that can approximate arbitrary nonlinear functions, and implications for semiparametric estimation are proposed in this paper. An airborne Operational Modular Imaging Spectrometer II (OMIS II) hyperspectral remote sensing image with 64 bands and Reflective Optics System Imaging Spectrometer (ROSIS) data with 115 bands were used to evaluate the performance and accuracy of the proposed WSVM classifier. The experimental results indicate that the WSVM classifier obtains the highest accuracy when the Coiflet kernel function is used in the wavelet transform. In contrast with some traditional classifiers, including Spectral Angle Mapping (SAM) and Minimum Distance Classification (MDC), and with an SVM classifier using the Radial Basis Function kernel, the proposed wavelet SVM classifier using the wavelet kernel function in Reproducing Kernel Hilbert Space clearly improves classification accuracy.

1. Introduction

New sensors and novel classification methods in remote sensing are providing more capabilities than ever before for target recognition [1], environment monitoring [2], agriculture [3] and other fields. As one of the most significant earth observation data sources, a hyperspectral remote sensing image has hundreds of bands, so its classification is often challenging owing to uncertainty, noise and the high-dimensional data volume [4]. Many classification approaches have been proposed and applied to hyperspectral remote sensing images, and in recent years one alternative technique, the Support Vector Machine (SVM) [5,6], has been applied to hyperspectral remote sensing image classification with good performance [7,8]. SVM represents a promising direction in the machine learning area [7]. Many scientists have carried out productive research in this field, on topics such as data uncertainty [9], the use of small sampling sets [10,11], composite kernels [12] and so on. The kernel function plays an important role in SVM classification, and many kernel functions are available, such as linear, polynomial, Gaussian, and sigmoid-shaped functions [13].


However, novel kernel functions still need to be studied, designed and applied to SVM, owing to the limitations of the different kernel functions and the difficulty of designing a universal kernel function, which is currently a hot topic in machine learning.

Wavelet theory has developed rapidly since the 1980s, with wide fields of use such as signal processing [14], image restoration [15], fault diagnosis [16], and so on. The wavelet network was proposed for classification with great success [17]. There are a variety of contributions on wavelet kernel functions for SVM, such as WSVM [18], reproducing wavelet kernel frames [19] and Least Squares SVM [20]. However, most of those studies focused on support vector regression, and applications to hyperspectral remote sensing image classification are still unavailable. In this paper, a new wavelet kernel is introduced into SVM in order to obtain better generalization performance. Following Reproducing Kernel Hilbert Space (RKHS) theory, our approach is to construct a wavelet estimator and optimize the wavelet kernel by solving an empirical risk minimization problem in RKHS. Hyperspectral remote sensing images are used to test the proposed classifier, and different wavelet kernels are evaluated and compared as well.

The rest of this paper is organized into four sections. Section 2 recalls the principles of the support vector machine for hyperspectral remote sensing image classification, Section 3 introduces regularization theory and wavelet theory in RKHS, and Section 4 presents two experiments. Finally, Section 5 summarizes the observations and concluding remarks.


2. Principles of Support Vector Machine classification

Training data are required to train the SVM model, but in general these data cannot be separated without errors. The data points closest to the hyperplane are used to measure the margin, and SVM seeks the hyperplane that maximizes the margin while minimizing a quantity proportional to the number of misclassification errors. The optimal hyperplane can be determined as the solution of the following convex quadratic programming problem [21,22]:

$$\min_{w,\,b,\,\xi_i} \;\; \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{l}\xi_i \qquad (1)$$

subject to:

$$y_i[(w \cdot x_i) + b] \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad \forall i = 1, 2, \cdots, l \qquad (2)$$

where {(x_1, y_1), ⋯, (x_l, y_l)} is the labeled training dataset with x_i ∈ ℝ^N and y_i ∈ {−1, 1}, w and b define a linear classifier in the feature space, C is a user-defined regularization parameter, and ξ_i is a positive slack variable that accounts for permitted errors. Using a Lagrangian formulation, this linearly constrained optimization problem can be transformed into the following dual problem.

Maximize:

$$\sum_{i=1}^{l}\alpha_i - \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}\alpha_i\alpha_j y_i y_j K(x_i, x_j) \qquad (3)$$

subject to:

$$\sum_{i=1}^{l}\alpha_i y_i = 0 \quad \text{and} \quad 0 \le \alpha_i \le C, \quad i = 1, 2, \cdots, l.$$

For a linearly nonseparable case, a kernel function is introduced that satisfies the condition stated by Mercer's theorem and corresponds to some type of inner product in the transformed (higher-dimensional) feature space:

$$K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j). \qquad (4)$$

The final result is a discrimination function f(x) conveniently expressed as a function of the data in the original (lower-dimensional) feature space:

$$f(x) = \mathrm{sgn}[(w \cdot x) + b] = \mathrm{sgn}\!\left(\sum_{i=1}^{l}\alpha_i y_i K(x_i, x) + b\right). \qquad (5)$$

Some popular kernel functions include [23]:

(1) Linear kernel:

$$K(x_i, x) = (x_i \cdot x) \qquad (6)$$

(2) Polynomial kernel:

$$K(x_i, x) = (x_i \cdot x + 1)^d \qquad (7)$$

where d is a constant;

(3) Gaussian Radial Basis Function kernel:

$$K(x_i, x) = \exp\!\left(-\gamma\|x - x_i\|^2\right) \qquad (8)$$

(4) Sigmoid kernel:

$$K(x_i, x_j) = \tanh\!\left(\gamma\, x_i^{T} x_j + \gamma\right)^{d} \qquad (9)$$

3. Regularization theory and wavelet theory

The main idea of the proposed method is to construct a wavelet estimator by solving an empirical risk minimization problem. Learning from samples can be viewed as estimating the functional dependency between an input x and an output y of a system, given a set of examples

$$D = \{(x_i, y_i);\; x_i \in \chi \subset \mathbb{R}^n,\; y_i \in \gamma = \{-1, 1\},\; i = 1, \cdots, l\}. \qquad (10)$$

The solution of the problem is the function f that minimizes the expected risk:

$$R[f] = E\{(f(X) - Y)^2\}. \qquad (11)$$

Considering the probability law of the data, the empirical risk is:

$$R_{emp}[f] = \frac{1}{l}\sum_{i=1}^{l} C(x_i, y_i, f(x_i)) \qquad (12)$$

where C(⋅, ⋅, ⋅) is a cost function. In order to avoid an ill-posed problem, we turn it into a well-posed one using regularization theory [24]:

$$R_{reg}[f] = \frac{1}{l}\sum_{i=1}^{l} C(x_i, y_i, f(x_i)) + \lambda\|f\|_{H}^{2} \qquad (13)$$

where H is a Reproducing Kernel Hilbert Space (RKHS) and λ is a regularization parameter. By minimizing the regularized empirical risk in Eq. (13), the WSVM estimator takes the form:

$$f(x) = \mathrm{sgn}\!\left(\sum_{i=1}^{l}\alpha_i^{*} y_i K(x, x_i) + b^{*}\right) \qquad (14)$$

where b⁎ ∈ ℝ, the α⁎_i are Lagrange multipliers, and K(⋅, ⋅) is the reproducing wavelet kernel.

RKHS is a Hilbert space with special properties, and the interest of RKHS arises from its associated kernel functions. For simplicity, the following proposition is assumed without proof.

Theorem. A finite set of wavelet frames of L²(ℝ), a Hilbert space endowed with an inner product, spans a RKHS. Defining an indexed family of functions Γ_t(⋅) ∈ L²(ℝ), indexed by t ∈ χ (χ being any subset of ℝⁿ), its reproducing kernel is K(x, y) = ⟨Γ_x(⋅), Γ_y(⋅)⟩ in L²(ℝ). A detailed proof can be found in [25,26].

The idea of wavelet analysis is to approximate a function using a family of functions produced by translation and dilation of a mother wavelet function [27]:

$$\varphi_{a,b}(x) = |a|^{-1/2}\,\varphi\!\left(\frac{x - b}{a}\right) \qquad (15)$$

where a, b ∈ ℝ, a is the dilation factor and b is the translation factor. Rewriting it in multi-index form gives

$$\varphi_i(x) = \varphi_{j,k}(x) = |a_0|^{-j/2}\,\varphi\!\left(a_0^{-j} x - k b_0\right) \qquad (16)$$

where a_0, b_0 ∈ ℝ, j, k ∈ ℤ, and j denotes the multi-index. Considering the necessary condition of the theorem and an orthonormal basis {e_i} of L²(ℝ), the function Γ_x(⋅) can be rewritten as:

$$\Gamma_x(\cdot) = \sum_{i,j} \alpha_{i,j}\,\varphi_j(x)\,e_i(\cdot) \qquad (17)$$

where α_{i,j} = c_j δ_{i,j} are coefficients combining with the wavelet basis {φ_j} of Γ_x(⋅) ∈ L²(ℝ) [28]. Finally, the wavelet kernel in the Reproducing Kernel Hilbert Space is:

$$K(x, y) = \sum_{i,j,n} \alpha_{i,j}\,\alpha_{j,n}\,\varphi_j(x)\,\varphi_n(y) \qquad (18)$$

Fig. 1. Examples of wavelet kernels: (a) Haar Kernel, (b) Daubechies Kernel, (c) Coiflets Kernel and (d) Symlets Kernel.
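For concreteness, here is a minimal sketch (our illustration, not code from the paper) of the conventional kernels of Eqs. (6)–(9) and the dual decision function of Eq. (5); the default values of gamma and d are placeholders, and Eq. (9) is reproduced in the form printed above:

```python
import numpy as np

def linear_kernel(xi, x):
    """Eq. (6): K(xi, x) = xi . x"""
    return float(np.dot(xi, x))

def polynomial_kernel(xi, x, d=2):
    """Eq. (7): K(xi, x) = (xi . x + 1)^d, with d a constant."""
    return (float(np.dot(xi, x)) + 1.0) ** d

def rbf_kernel(xi, x, gamma=0.019):
    """Eq. (8): K(xi, x) = exp(-gamma * ||x - xi||^2)."""
    diff = np.asarray(x, dtype=float) - np.asarray(xi, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

def sigmoid_kernel(xi, xj, gamma=0.01, d=1):
    """Eq. (9), as printed: K(xi, xj) = tanh(gamma * xi^T xj + gamma)^d."""
    return float(np.tanh(gamma * np.dot(xi, xj) + gamma)) ** d

def decision_function(x, support_vectors, alphas, labels, b, kernel=rbf_kernel):
    """Eq. (5): f(x) = sgn( sum_i alpha_i * y_i * K(x_i, x) + b )."""
    s = sum(a * y * kernel(sv, x)
            for a, y, sv in zip(alphas, labels, support_vectors))
    return np.sign(s + b)
```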

So far, we can construct the multidimensional wavelet kernel from the straightforward one-dimensional wavelet kernel function. First, the wavelet basis of L²(ℝ) is replaced with a new basis of L²(ℝ^d). Then, according to the tensor products of RKHS [29], we write:

$$K_d(x, y) = \prod_{i=1}^{d} k(x_i, y_i) \qquad (19)$$

Moreover, the kernel with coefficients in RKHS is:

$$K(x, y) = \sum_{j=j_{min}}^{j_{max}} \sum_{k} \frac{1}{2^{j}}\,\varphi_{j,k}(x)\,\varphi_{j,k}(y) \qquad (20)$$

where k is the translation parameter of the mother wavelet function, and j_min, j_max are the minimal and maximal dilations [19]. The mother wavelet and its minimal and maximal dilations must be set in practical experiments. It is obvious that the conditions for having a frameable RKHS are easier to verify than Mercer's condition; examples are given below within the context of semiparametric estimation. A multiscale algorithm can take advantage of the wavelet multiresolution structure, and the wavelet kernels are efficient for estimating high-dimensional functions. The minimal and maximal dilations can be selected by cross-validation [30]. Traditional SVM only provides a two-class classification algorithm, so it is important to extend it to multi-class classification; the 1-against-1 (1-a-1) model is applied in this paper [31], as reflected in the sketch below.
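To make the construction concrete, the sketch below (ours, under stated assumptions, not the authors' code) implements Eq. (16) with a Haar mother wavelet and the common dyadic choice a0 = 2, b0 = 1, builds the tensor-product kernel of Eqs. (19)–(20), and plugs it into scikit-learn's SVC, whose default multiclass strategy is one-against-one. The truncation of the translation sum (k_range) is our assumption; in theory k ranges over all of ℤ.

```python
import numpy as np
from sklearn.svm import SVC

def haar(t):
    """Haar mother wavelet: 1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere."""
    t = np.asarray(t, dtype=float)
    return np.where((0.0 <= t) & (t < 0.5), 1.0,
                    np.where((0.5 <= t) & (t < 1.0), -1.0, 0.0))

def phi_jk(x, j, k, a0=2.0, b0=1.0):
    """Eq. (16): phi_{j,k}(x) = |a0|^(-j/2) * phi(a0^(-j) * x - k * b0)."""
    return np.abs(a0) ** (-j / 2.0) * haar(a0 ** (-j) * x - k * b0)

def wavelet_kernel_1d(u, v, jmin=-2, jmax=2, k_range=range(-8, 9)):
    """Eq. (20): sum over dilations j and translations k, weighted by 2^(-j)."""
    return sum(2.0 ** (-j) * float(phi_jk(u, j, k)) * float(phi_jk(v, j, k))
               for j in range(jmin, jmax + 1) for k in k_range)

def wavelet_kernel(X, Y, jmin=-2, jmax=2):
    """Eq. (19): tensor product over the d features, K_d(x, y) = prod_i k(x_i, y_i)."""
    X, Y = np.atleast_2d(X), np.atleast_2d(Y)
    K = np.empty((X.shape[0], Y.shape[0]))
    for a, x in enumerate(X):
        for b, y in enumerate(Y):
            K[a, b] = np.prod([wavelet_kernel_1d(xi, yi, jmin, jmax)
                               for xi, yi in zip(x, y)])
    return K

# With training pixels X_train (n_samples x n_bands) and labels y_train, a
# 1-against-1 multiclass WSVM is then simply:
#   clf = SVC(kernel=wavelet_kernel, C=100).fit(X_train, y_train)
```

Because the Haar wavelet has compact support, kernel entries vanish wherever the chosen dilation range does not cover the data, so the normalization step described in Section 4 matters in practice.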

Table 1. The main properties of some wavelet functions (App. = approximate).

Property                        Haar    Daubechies   Coiflets   Symlets
Form                            Haar    dbN          coifN      Sym
Orthogonality                   Yes     Yes          Yes        Yes
Biorthogonality                 Yes     Yes          Yes        Yes
Compact support                 Yes     Yes          Yes        Yes
Continuous wavelet transform    Yes     Yes          Yes        Yes
Length of support               1       2N − 1       6N − 1     2N − 1
Symmetry                        Yes     App.         App.       App.
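The properties in Table 1 can also be inspected programmatically; the snippet below uses the PyWavelets package (our tooling choice; the paper does not name an implementation), whose naming of symmetry classes may differ slightly from the table's:

```python
import pywt

# Representative members of the four families used in this paper.
for name in ("haar", "db2", "coif2", "sym2"):
    w = pywt.Wavelet(name)
    print(f"{name}: orthogonal={w.orthogonal}, biorthogonal={w.biorthogonal}, "
          f"symmetry={w.symmetry}, filter_length={w.dec_len}")
```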


Fig. 2. (a) Original remote sensing image; WSVM classification results with (b) Haar Kernel, (c) Daubechies Kernel, (d) Coiflets Kernel and (e) Symlets Kernel.

4. Experiments

In our experiments, we constructed WSVM classifiers with different wavelet functions. Table 1 shows the main properties of some widely used wavelets, and the representations of the wavelet kernel SVM functions using Haar, Daubechies, Coiflets, and Symlets are shown in Fig. 1. They are used in the following two classification experiments.

4.1. Experiment 1

The experiment data is an airborne OMIS hyperspectral image of the Changping area, Beijing, China, with 512 rows, 512 columns and 64 bands. Fig. 2 shows the RGB composite of the hyperspectral image (R: Band 36 at a wavelength of 0.81 μm, G: Band 23 at 0.68 μm, and B: Band 11 at 0.56 μm). In Fig. 2, the green region is grass land, the black region is the fish pond, the yellow region is yellow grass, and the white region is the inhabited area.

After the whole dataset was normalized, training and test samples were selected. Taking into account all spectral and texture features, the Pixel Purity Index (PPI) was computed; one thousand pure pixels were obtained as endmember pixels and then chosen as samples. In the experiment, these pure pixels together with ground truth were taken as training and test samples. The classification problem involves the identification of eight land cover types (C1: crop land (230 pixels), C2: inhabited area (111 pixels), C3: inhabited area 1 (67 pixels), C4: crop land 1 (78 pixels), C5: water (103 pixels), C6: road (65 pixels), C7: bare soil (91 pixels), and C8: plant (88 pixels)). The test samples were obtained in the same way.

The experiment was conducted using WSVMs with the different wavelet kernels, and the classification process was performed using 5-fold cross-validation, as sketched below. For the Haar Kernel function, the classification accuracy is 87.13% and the kappa coefficient is 0.8520 with the parameters (jmin = −8, jmax = 8). The Daubechies Kernel function obtained a classification accuracy of 88.51% and a kappa coefficient of 0.8678 with the parameters (jmin = −6, jmax = 6). With the Coiflet Kernel function, we obtained the best classification accuracy, 88.94%, and kappa coefficient, 0.8728, with the parameters (jmin = −2, jmax = 2). For the Symlet Kernel function, the classification accuracy is 88.51% and the kappa coefficient is 0.8678 with the parameters (jmin = −6, jmax = 6).
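The 5-fold cross-validated selection of (jmin, jmax) can be sketched as follows. The symmetric candidate grid, the reuse of `wavelet_kernel` from the Section 3 sketch, and the stand-in data are all our assumptions:

```python
import numpy as np
from functools import partial
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Stand-in for the normalized PPI training samples (real runs would load
# the labeled OMIS pixels instead).
rng = np.random.default_rng(0)
X_train = rng.random((40, 4))
y_train = np.arange(40) % 3        # balanced stand-in labels

best = None
for j in (2, 4, 6, 8):             # symmetric ranges, as reported in the text
    kernel = partial(wavelet_kernel, jmin=-j, jmax=j)
    score = cross_val_score(SVC(kernel=kernel, C=100),
                            X_train, y_train, cv=5).mean()
    if best is None or score > best[0]:
        best = (score, -j, j)
print("selected (accuracy, jmin, jmax):", best)
```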

Table 2. Haar Kernel WSVM classifier confusion matrix (rows: classification result; columns: reference class; the bottom-right entry is the overall accuracy).

                       C1      C2      C3      C4      C5      C6      C7      C8   User acc. (%)
C1                    136       0       0       0       0       0       0       0   100.00
C2                      0      28      35       0       0      28       5       0    29.17
C3                      0       0      66       0       1      23       2       1    70.97
C4                      0       0       0     164       0       0       0       0   100.00
C5                      0       0       0       0     135       0       0       0   100.00
C6                      0       0      12       0       0      80       0      11    77.67
C7                      0       0       0       0       0       0     105       0   100.00
C8                      0       0       0       0       0       0       3     105    97.22
Producer acc. (%)  100.00  100.00   58.41  100.00   99.26   61.07   91.30   89.74    87.13
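For reference, the overall accuracy and kappa coefficient quoted throughout can be recomputed from any of these confusion matrices; a minimal sketch using Table 2's matrix:

```python
import numpy as np

# Rows: classification result; columns: reference class (Table 2).
cm = np.array([
    [136,  0,  0,   0,   0,  0,   0,   0],
    [  0, 28, 35,   0,   0, 28,   5,   0],
    [  0,  0, 66,   0,   1, 23,   2,   1],
    [  0,  0,  0, 164,   0,  0,   0,   0],
    [  0,  0,  0,   0, 135,  0,   0,   0],
    [  0,  0, 12,   0,   0, 80,   0,  11],
    [  0,  0,  0,   0,   0,  0, 105,   0],
    [  0,  0,  0,   0,   0,  0,   3, 105],
])

n = cm.sum()
po = np.trace(cm) / n                                  # overall accuracy
pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2    # chance agreement
kappa = (po - pe) / (1 - pe)
print(f"overall accuracy = {po:.4f}, kappa = {kappa:.4f}")  # 0.8713, 0.8520
```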


Table 3. Daubechies Kernel WSVM classifier confusion matrix.

                       C1      C2      C3      C4      C5      C6      C7      C8   User acc. (%)
C1                    135       0       1       0       0       0       0       0    99.26
C2                      0      29      30       0      18      19       0       0    30.21
C3                      0       0      70       0       1      21       0       1    75.27
C4                      0       0       0     164       0       0       0       0   100.00
C5                      0       0       0       0     135       0       0       0   100.00
C6                      0       0       7       0       0      94       0       2    91.26
C7                      0       0       0       0       0       0     105       0   100.00
C8                      0       0       0       0       0       0       8     100    92.59
Producer acc. (%)  100.00  100.00   64.81  100.00   87.66   70.15   92.92   97.09    88.51

Table 4. Coiflet Kernel WSVM classifier confusion matrix.

                       C1      C2      C3      C4      C5      C6      C7      C8   User acc. (%)
C1                    135       0       1       0       0       0       0       0    99.26
C2                      0      28      54       0       0       7       7       0    29.17
C3                      0       0      68       0       0      25       0       0    73.12
C4                      0       0       0     164       0       0       0       0   100.00
C5                      0       0       0       0     135       0       0       0   100.00
C6                      0       0       6       0       0      97       0       0    94.17
C7                      0       0       0       0       0       0     105       0   100.00
C8                      0       0       0       0       0       0       4     104    96.30
Producer acc. (%)  100.00  100.00   52.71  100.00  100.00   75.19   90.52  100.00    88.94

Table 5. Symlet Kernel WSVM classifier confusion matrix.

                       C1      C2      C3      C4      C5      C6      C7      C8   User acc. (%)
C1                    134       0       2       0       0       0       0       0    98.53
C2                      0      28      39       0      12      16       1       0    29.17
C3                      0       0      74       0       0      18       0       1    79.57
C4                      0       0       0     164       0       0       0       0   100.00
C5                      0       0       0       0     135       0       0       0   100.00
C6                      0       0       7       0       0      94       0       2    91.26
C7                      0       0       0       0       0       0     104       1    99.05
C8                      0       0       0       0       0       0       9      99    91.67
Producer acc. (%)  100.00  100.00   60.66  100.00   91.84   73.44   91.23   96.12    88.51

Tables 2–5 are the confusion matrices of the different kernel functions, and Fig. 2 shows the classification results of the different wavelet kernel SVMs. As these tables show, the accuracy differs from class to class. From the confusion matrices, it is found that C2 has the lowest user accuracy because the C2, C3 and C6 classes have similar spectra, as shown in Fig. 3.

In order to compare this performance with conventional SVM kernels, we also chose an SVM with the RBF kernel for classification; the RBF kernel gives better accuracy than the three other standard kernels [32–34] (Linear, Polynomial and Sigmoid). Using the same training and testing samples, its overall accuracy is 84.26% and its kappa coefficient is 0.8190 with the parameters (C = 100 and δ = 0.019). As traditional classification algorithms, Spectral Angle Mapping (SAM) and Minimum Distance Classification (MDC) were also applied in our experiments: the accuracy of SAM is 78.93%, while the accuracy is just 76.38% for MDC.
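The SAM baseline reduces to measuring the angle between each pixel spectrum and every class's reference spectrum and assigning the smallest one; a minimal sketch (variable names are ours; `endmembers` stands for an assumed n_classes × n_bands array of reference spectra):

```python
import numpy as np

def spectral_angle(x, r):
    """Angle between a pixel spectrum x and a reference spectrum r."""
    cos = np.dot(x, r) / (np.linalg.norm(x) * np.linalg.norm(r))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def sam_classify(pixels, endmembers):
    """Assign each pixel to the endmember with the smallest spectral angle."""
    angles = np.array([[spectral_angle(x, r) for r in endmembers]
                       for x in pixels])
    return np.argmin(angles, axis=1)
```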

Fig. 3. The spectrum of C2, C3 and C6.

4.2. Experiment 2

The experiment data were collected in 2003 by the ROSIS sensor over the University of Pavia, with spectral coverage from 0.43 to 0.86 μm, an image size of 610 × 340 pixels, a spatial resolution of 1.3 m/pixel, and 115 spectral bands; the data are atmospherically corrected. The classification involves the identification of nine land cover types (C1: brick (514 pixels), C2: shadow (231 pixels), C3: metal sheets (265 pixels), C4: bare soil (532 pixels), C5: trees (524 pixels), C6: meadows (540 pixels), C7: gravel (392 pixels), C8: asphalt (548 pixels), and C9: bitumen (375 pixels)). Test samples provided by Prof. Paolo Gamba of the University of Pavia, Italy, were used for classification accuracy assessment. After Principal Component Analysis (PCA), the first 10 principal components were retained for classification.
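This dimensionality reduction step can be sketched as follows (a random stand-in cube is generated here, since the ROSIS data are not distributed with the paper):

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for the atmospherically corrected ROSIS cube, using the
# dimensions quoted in the text: 610 x 340 pixels, 115 bands.
cube = np.random.rand(610, 340, 115)

X = cube.reshape(-1, cube.shape[-1])            # one row per pixel
X_pca = PCA(n_components=10).fit_transform(X)   # keep the first 10 components
```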


The experiment was conducted using WSVMs with the different wavelet kernels. For the Haar Kernel function, the classification accuracy is 90.13% and the kappa coefficient is 0.8820 with the parameters (jmin = −6, jmax = 6). For the Daubechies Kernel function, we obtained a classification accuracy of 89.91% and a kappa coefficient of 0.8788 with the parameters (jmin = −4, jmax = 4). With the Coiflet Kernel function, the best overall accuracy, 91.23%, and kappa coefficient, 0.8956, were obtained with the parameters (jmin = −2, jmax = 2). For the Symlet Kernel function, the classification accuracy is 88.98% and the kappa coefficient is 0.8789 with the parameters (jmin = −8, jmax = 8). Fig. 4 shows the classification results of the different wavelet kernel SVMs. For the SVM using the RBF kernel, the accuracy is 85.31% with a kappa coefficient of 0.8437.

The best classification accuracy (88.94% in Experiment 1 and 91.23% in Experiment 2) was obtained using the Coiflet Kernel function. The Coiflet function has some interesting properties that make it useful in image and signal processing: it possesses the maximal number of vanishing moments of the shifted scaling function for a given number of scaling coefficients; Coiflets are separable filters, in the sense that spatial frequencies in the x, y, and diagonal directions can be selected using this type of filter; and Coiflet filters maintain a close match between the trend values of the signal and the original signal.


5. Conclusions and discussions

In this paper, a wavelet kernel in RKHS was constructed and combined with the SVM classifier, and four different wavelet kernels were used in the experiments. For the OMIS II data, the results highlight that the WSVM can derive an accurate classification (highest: 88.94%, lowest: 87.13%), with the best classification accuracy of 88.94% obtained using the Coiflet Kernel function. Compared with the RBF Kernel SVM, SAM, and MDC, WSVM is more effective. For the ROSIS data, WSVM is also the most valuable classifier, and again the Coiflet Kernel WSVM has the best classification accuracy. Coiflets have some interesting properties when applied in remote sensing image processing: they possess the maximal number of vanishing moments of the shifted scaling function, and they keep a close match between the trend values of the signal and the original signal. However, training takes a long time when the parameters (jmin, jmax) are set to large values. The other limitation is that it is difficult to discriminate classes with similar spectra. We will investigate these issues further in future work.

Fig. 4. (a) Original remote sensing image; WSVM classification results with (b) Haar Kernel, (c) Daubechies Kernel, (d) Coiflets Kernel and (e) Symlets Kernel.


Acknowledgments

The authors would like to thank the anonymous reviewers for their constructive comments, which were very useful for improving this paper. The paper was also completed with support from the Center for International Earth Science Information Network (CIESIN), Columbia University, USA. This work is supported by research grants from the National Natural Science Foundation of China (No. 40401038, No. 40871195), the National High-Tech Program of China (No. 2007AA12Z162), and the Jiangsu Provincial Innovative Planning Fund for Graduate Students (CX08B_112Z). Our thanks also go to Prof. Paolo Gamba from the University of Pavia for providing the ROSIS data.

References

[1] S. Prasad, L.M. Bruce, IEEE Geosci. Remote Sens. Lett. 5 (2008) 625.
[2] M. Garcia, et al., Remote Sens. Environ. 112 (2008) 3618.
[3] K.C. Swain, et al., J. Appl. Remote Sens. 1 (2007).
[4] A. Plaza, et al., Remote Sens. Environ. 113 (2009) S110.
[5] B. Scholkopf, A. Smola, Learning with Kernels, MIT Press, Cambridge, MA, 2002.
[6] V.N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, 2000.
[7] M. Pal, P. Mather, Int. J. Remote Sens. 26 (2005) 1007.
[8] F. Melgani, L. Bruzzone, IEEE Trans. Geosci. Remote Sens. 42 (2004) 1778.
[9] T. Oommen, et al., Math. Geosci. 40 (2008) 409.
[10] G.M. Foody, et al., Remote Sens. Environ. 104 (2006) 1.
[11] G.M. Foody, A. Mathur, Remote Sens. Environ. 103 (2006) 179.
[12] G. Camps-Valls, et al., IEEE Trans. Geosci. Remote Sens. 46 (2008) 1822.
[13] G. Camps-Valls, et al., Neurocomputing 62 (2004) 501.
[14] I. Daubechies, IEEE Trans. Inf. Theory 36 (1990) 961.
[15] J.M. Zhong, H.F. Sun, IEEE Trans. Circuits Syst. Regul. Pap. 55 (2008) 2716.
[16] Z.M. Du, et al., HVAC&R Res. 14 (2008) 959.
[17] Q. Zhang, et al., IEEE Trans. Neural Netw. 3 (1992) 889.
[18] L. Zhang, et al., IEEE Trans. Syst. Man Cybern. B 34 (2004) 34.
[19] A. Rakotomamonjy, et al., Appl. Stoch. Models Bus. Ind. 21 (2005) 153.
[20] F.F. Wu, Y.L. Zhao, Least Squares Support Vector Machine on Gaussian Wavelet Kernel Function Set, 2006, pp. 936–941.
[21] V.N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, 1995.
[22] B. Scholkopf, et al., New Support Vector Algorithms, vol. 12, MIT Press, 2000, p. 1207.
[23] B. Scholkopf, et al., Advances in Kernel Methods: Support Vector Learning, MIT Press, 1999.
[24] T. Evgeniou, et al., Adv. Comput. Math. 13 (2000) 1.
[25] A. Rakotomamonjy, S. Canu, J. Mach. Learn. Res. 6 (2005) 1485.
[26] S. Canu, et al., Advances in Learning Theory: Methods, Models and Applications 190 (2003) 89.
[27] Q. Zhang, et al., IEEE Trans. Neural Netw. 3 (1992) 889.
[28] L. Debnath, P. Mikusinski, Hilbert Spaces with Applications, Academic Press, 2005.
[29] N. Aronszajn, Trans. Am. Math. Soc. 68 (1950) 337.
[30] C. Hsu, et al., A Practical Guide to Support Vector Classification, 2003.
[31] C. Hsu, C. Lin, IEEE Trans. Neural Netw. 13 (2002) 415.
[32] K. Tan, P.J. Du, Spectrosc. Spectr. Anal. 28 (2008) 2009.
[33] F. Melgani, L. Bruzzone, IEEE Trans. Geosci. Remote Sens. 42 (2004) 1778.
[34] G. Camps-Valls, L. Bruzzone, IEEE Trans. Geosci. Remote Sens. 43 (2005) 1351.