A novel binary tree support vector machine for hyperspectral remote sensing image classification


Optics Communications 285 (2012) 3054–3060


Peijun Du a,b,⁎, Kun Tan b,c, Xiaoshi Xing c

a Department of Geographical Information Science, Nanjing University, Nanjing City, Jiangsu Province 210093, P.R. China
b Jiangsu Provincial Key Laboratory for Resources and Environment Information Engineering, China University of Mining and Technology, Xuzhou City, Jiangsu Province 221116, P.R. China
c Center for International Earth Science Information Network (CIESIN), Columbia University, 61 Route 9 W, PO Box 1000, Palisades, NY 10964, USA
⁎ Corresponding author at: Department of Geographical Information Science, Nanjing University, Nanjing City, Jiangsu Province 210093, P.R. China. E-mail addresses: [email protected], [email protected] (P. Du).

Article history: Received 31 October 2010; Accepted 24 February 2012; Available online 10 March 2012

Keywords: Support vector machine (SVM); Hyperspectral remote sensing; Classification; Binary tree; J–M distance

Abstract

According to the principle of the support vector machine (SVM) and the inter-class separability of hyperspectral data, a novel binary tree SVM classifier based on a separability measure among classes is proposed for hyperspectral image classification. The J–M distance is used to measure separability so that the binary tree is generated automatically. In experiments using airborne Operational Modular Imaging Spectrometer II (OMIS II) data, satellite EO-1 Hyperion hyperspectral data and airborne AVIRIS data, the classification accuracies of different multi-class SVMs are obtained and compared. The experimental results indicate that the proposed adaptive binary tree classifier outperforms other existing multi-class SVM strategies. The adaptive binary tree SVM classifier is thus a novel approach for improving the accuracy of hyperspectral image classification and for expanding the possibilities of interpretation and application of hyperspectral remote sensing images.

1. Introduction

Hyperspectral remote sensing (RS) images acquired by imaging spectrometer sensors such as EO-1 Hyperion, HyMap and AVIRIS have proven widely useful in many geoscience, environmental and agricultural applications. In general, a hyperspectral RS image has fine spectral resolution and abundant information that can be used to address a variety of resource and environmental issues; in particular, hyperspectral sensors can be used to distinguish different types within the same species [20]. According to the published literature, hyperspectral RS information processing is complicated by data uncertainty [21], the curse of dimensionality [20], small sample sizes [11,12], and so on. As a result, it is difficult to apply a traditional classifier to hyperspectral data with high performance, and classification accuracy decreases seriously because of the Hughes phenomenon [15].

An alternative classification technique, the support vector machine (SVM), has been put forward and applied to hyperspectral remote sensing image classification successfully. Many results show that the SVM outperforms traditional classifiers such as the artificial neural network (ANN), minimum distance classifier (MDC), maximum likelihood classifier (MLC) and spectral angle mapper (SAM) [22,28]. As one of the most popular and effective statistical learning algorithms, the SVM has become a hot topic in the pattern recognition and machine learning fields in recent years. The SVM has such advantages as a lower requirement for

prior knowledge, greater suitability to small sample sizes [8], more robustness to noise [10], higher learning efficiency and more powerful generalization capacity [2]. It can therefore be used for spatial data processing and analysis, including hyperspectral RS image classification [20], spatial fitting and regression [1], data mining [14], and object detection [16].

The SVM was originally designed for binary classification; however, it can be extended to the multi-class problems that RS applications present on most occasions. Usually there are two strategies for obtaining a multi-class pattern recognition system [13], and several common multi-class SVM methods based on these schemes have been proposed: 1-against-all (1-a-a) [13,17], 1-against-1 (1-a-1) [13], the decision directed acyclic graph (DDAG) SVM [23], and error correcting output codes (ECOC) [9]. These algorithms can obtain high accuracy; however, they also entail a time-consuming and tedious parameter tuning process, and can even leave a large unclassifiable region.

Tree architectures are often leveraged in decision theory, and the SVM with a binary tree architecture has been introduced to reduce the number of binary classifiers and achieve fast decisions [19]. However, given the requirement of high classification accuracy and good generalization capacity for a binary tree SVM classifier, it remains an open issue when only small samples are available for hyperspectral RS images. To address these issues, various SVMs with binary tree architectures have been investigated to reduce the number of binary classifiers for time saving and fast decision [6,7].

In this paper, some key techniques in multi-class SVM based hyperspectral RS image classification are investigated, and a novel adaptive binary tree SVM is proposed.


Fig. 1. The demonstration of the optimized classification margin (separating hyperplane H with margin hyperplanes H1 and H2).

Three hyperspectral remote sensing data sets, namely OMIS hyperspectral RS images, EO-1 Hyperion data and AVIRIS Indian Pines data, are used in experiments to compare the proposed algorithm with existing methods and to demonstrate its advantages in practical use.

The remainder of the paper is organized as follows. Section 2 introduces four common multi-class strategies for the SVM. Section 3 presents the proposed adaptive binary tree multi-class SVM classifier based on a separability measure. Experiments and analyses are reported in Section 4. Finally, Section 5 concludes the paper with some remarks.

2. Common multi-class strategies for SVM

The SVM, one of the most effective statistical learning algorithms, uses the structural risk minimization (SRM) criterion rather than the empirical risk minimization (ERM) criterion used by other machine learning methods. Because the SVM mitigates such difficulties of hyperspectral classification as small sample sizes, high dimensionality, poor generalization and uncertainty, it has been employed for hyperspectral RS image classification in recent years [3,4]. Although it is generally concluded that the SVM performs better than other conventional classifiers and is suitable for high-dimensional features (for example, the direct use of all bands of a hyperspectral image), time consumption and computational load remain challenging; therefore, feature extraction and dimensionality reduction are still worthwhile on many occasions. The theory of the SVM for a two-class problem can be found in many references [26,29] (Fig. 1).

A traditional SVM only provides a two-class classification algorithm, so it is important to extend it to multi-class cases. Several multi-class SVM strategies have already been put forward and have become comparatively mature. In general these methods fall into two types: one constructs and combines several binary classifiers, while the other considers all data in one optimization formulation directly [13]. Four popular methods, 1-against-1 (1-a-1), 1-against-all (1-a-a), the decision directed acyclic graph (DDAG) and the binary tree (BT), are introduced below.

A: 1-against-all (1-a-a) method

The 1-a-a method is the simplest strategy for extending the SVM to multi-class pattern recognition problems. It builds N binary SVM classifiers, one for each of the N classes. A testing sample X is fed to the N two-class classifiers, the discriminant function of each classifier is calculated, and the testing sample is assigned to the class whose discriminant function is maximal.

B: 1-against-1 (1-a-1) method

In this method, a child SVM classifier is constructed for every pair of classes, so N × (N − 1)/2 child SVM classifiers are required. In the testing stage, each testing sample is processed by all N × (N − 1)/2 child classifiers, and the class that receives the most votes is taken as the predicted class label.

C: decision directed acyclic graph (DDAG) method

In the DDAG method, the class sets at the decision nodes of adjacent levels differ by only one class; that is, there are many overlapping pattern classes at the internal nodes, while only the leaf nodes hold single classes. Therefore, more support vectors are needed to separate two subgroups, which significantly slows training and testing. As in the 1-a-1 method, N × (N − 1)/2 child SVM classifiers are obtained, but without a nonseparable region. In addition, different choices of the root node yield different results for a given class, which leads to uncertainty in the classification result.

D: binary tree method

Binary tree classifiers divide a complex problem into several simple ones and then tackle each sub-problem. If the classes separated at a node connected to the top node still comprise several classes, a hyperplane separating them must be obtained; this repeats until only one class remains in each separated region. Finally, the original multi-class problem is divided into many two-class nodes. Fig. 2 shows two representative trees: the slant binary tree SVM (SBTSVM) and the balanced binary tree SVM (BBTSVM). In the SBTSVM, one class is separated from the remaining classes at each node, whereas in the BBTSVM several classes are separated from the rest at each node. The top of the tree includes all the original classes, and each internal node consists of either a set of class pairs or a single class.

Fig. 2. The sketch for binary tree SVMs to solve a 4-class problem.
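For readers who want to experiment with these strategies, the 1-a-a and 1-a-1 schemes map directly onto scikit-learn's generic multi-class wrappers around a binary SVM. The sketch below is an illustration under assumed placeholder data (X_train, y_train and X_test are hypothetical arrays), not the authors' implementation; the C and γ values are borrowed from Experiment 1 later in the paper.

```python
# A minimal sketch (assumed, not the authors' code) of the 1-a-a and 1-a-1
# multi-class strategies built from binary SVMs.
import numpy as np
from sklearn.svm import SVC
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier

X_train = np.random.rand(100, 64)         # placeholder: 100 pixels, 64 bands
y_train = np.random.randint(0, 8, 100)    # placeholder: 8 land-cover classes
X_test = np.random.rand(20, 64)

base = SVC(kernel="rbf", C=2048, gamma=0.5)

# 1-against-all: N binary classifiers, one per class
one_vs_all = OneVsRestClassifier(base).fit(X_train, y_train)

# 1-against-1: N*(N-1)/2 pairwise classifiers combined by voting
one_vs_one = OneVsOneClassifier(base).fit(X_train, y_train)

print(one_vs_all.predict(X_test))
print(one_vs_one.predict(X_test))
```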


3. Adaptive binary tree SVM classifier based on separability measure

As mentioned above, the SBTSVM and BBTSVM separate classes at the nodes of a binary tree. However, the partitioning is arbitrary: we do not know which classes should be separated first. Therefore, a separability measure should be calculated when constructing the binary tree. To reduce the classification error at the root node, the most widely separated classes should be split first. In general, a hyperspectral image has massive bands with abundant information, and several separability rules are available: the average distance between samples, the relative distance between samples, the discrete degree, and so on. If samples of different classes lie at the same distance, or samples of the same class lie at different distances, the average distance cannot measure the separability; the relative distance is likewise of no effect when the distance between different classes is zero.


The discrete degree is very useful in hyperspectral classification; however, it becomes meaningless once the class distance extends beyond a single point.

To determine the statistical separability among classes for a given band combination, the Jeffries–Matusita (JM) distance is introduced as the separability measure for hyperspectral data. The maximum JM distance can be regarded as a relative gauge of the separation of ambiguous classes [27]. The JM distance is defined by the equation

J_{ij} = \int_x \left[ \sqrt{p(x|w_i)} - \sqrt{p(x|w_j)} \right]^2 dx \quad (1)

where p(x|w_i) is the pixel probability density of class w_i. It can also be written in the form J = 2(1 − p), where

p = \int_x \sqrt{p(x|w_i)\, p(x|w_j)}\, dx \quad (2)

If p_i(x), i = 1, 2, …, n, is a multivariate Gaussian density as above, then

J_{ij} = 2\left(1 - e^{-B}\right) \quad (3)

where

B = \frac{1}{8}(\mu_i - \mu_j)^T \left[\frac{\sigma_i + \sigma_j}{2}\right]^{-1} (\mu_i - \mu_j) + \frac{1}{2}\ln\left\{\frac{\left|(\sigma_i + \sigma_j)/2\right|}{|\sigma_i|^{1/2}\,|\sigma_j|^{1/2}}\right\} \quad (4)

The JM distance is a monotonically increasing function of B in Eq. (4). Since 0 < e^{-B} < 1, J_{ij} ranges from 0 to 2, with 2 corresponding to the largest separation. This bounded, saturating behavior of J accounts for its utility as a feature selection criterion in multi-class problems. For a k-class problem, with X_i (i = 1, ⋯, k) the training data of the ith class, we define the separability JM_{ij} between class i and class j as

JM_{ij} = J_{ij} \quad (5)
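As a concrete reading of Eqs. (3)–(4), the JM distance between two classes can be estimated from their training pixels under the Gaussian assumption. The following is a minimal NumPy sketch, not the authors' code; `jm_distance` is a hypothetical helper name.

```python
# A minimal sketch (assumed, not the authors' code) of Eqs. (3)-(4): the JM
# distance between two classes under the multivariate Gaussian assumption.
# Xi and Xj are (n_samples, n_bands) arrays of training pixels per class.
import numpy as np

def jm_distance(Xi, Xj):
    mi, mj = Xi.mean(axis=0), Xj.mean(axis=0)
    Si = np.cov(Xi, rowvar=False)
    Sj = np.cov(Xj, rowvar=False)
    S = (Si + Sj) / 2.0
    d = mi - mj
    # Bhattacharyya distance B, Eq. (4); slogdet avoids overflow on many bands
    B = (d @ np.linalg.solve(S, d)) / 8.0 \
        + 0.5 * (np.linalg.slogdet(S)[1]
                 - 0.5 * np.linalg.slogdet(Si)[1]
                 - 0.5 * np.linalg.slogdet(Sj)[1])
    return 2.0 * (1.0 - np.exp(-B))   # Eq. (3): J ranges over (0, 2)
```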

JM_{ij} is the JM distance between two classes: if JM_{ij} ≥ 1, there is no overlap between classes i and j; if JM_{ij} < 1, the two classes overlap. In short, the larger JM_{ij} is, the more easily the two classes are separated. The most easily separated pair of classes is therefore given by Eq. (6):

(i, j) = \arg\max_{i = 1, \ldots, k;\; j = 1, \ldots, k} JM_{ij} \quad (6)

Based on these properties of the JM distance, a novel binary tree SVM classifier is designed. In this strategy, the most easily separated classes are split first when the decision tree is formed, and the derived separability measures are introduced into the formation of the decision tree. Two improved algorithms for the binary tree SVM (BTSVM) are tested: one is the adaptive binary tree support vector machine (ABTSVM) based on the Jeffries–Matusita (JM) distance, and the other is the Kullback–Leibler distance binary tree SVM (KLBTSVM), which uses the Kullback–Leibler distance in place of the JM distance. The iterative process is shown in Fig. 3.

Fig. 3. The diagram for the iterative process.

Fig. 4. False color composite of OMIS II hyperspectral remote sensing image.

Fig. 5. The sketch for the novel binary tree SVM classification ((a) Experiment 1, (b) Experiment 2, (c) Experiment 3).

The process of establishing the ABTSVM is as follows:

(1) Calculate the separability between every pair of classes based on the separability measure, obtaining a separability matrix JM:

JM = \begin{bmatrix} 0 & JM_{12} & JM_{13} & \cdots & JM_{1l} \\ JM_{21} & 0 & JM_{23} & \cdots & JM_{2l} \\ \vdots & \vdots & \vdots & & \vdots \\ JM_{l1} & JM_{l2} & JM_{l3} & \cdots & 0 \end{bmatrix}

(2) Let the set J contain all the classes, and select the maximal separability JM_{ij}, assigning class i to partition J1 and class j to partition J2;

(3) For each remaining class k other than i and j: if the separability measure JM_{ik} between class k and class i is smaller than JM_{jk} between class k and class j, class k belongs to J1; otherwise it belongs to J2;

(4) Repeat step (3) until all remaining classes are partitioned;

(5) Treat all the classes in set J1 as one aggregate class and the classes in set J2 as the other, then construct the separating hyperplane in the same feature space;

(6) Let J = J1 and J = J2 in turn, and repeat steps (2)–(5) until only one class remains in each of the sets J1 and J2.

After the binary tree is generated, an SVM is trained at each node of the tree to classify the hyperspectral remote sensing image.

4. Experiments and analysis

4.1. Experiment 1

The experimental data source is the OMIS II hyperspectral image of Changping, Beijing, China. It has 512 lines, 536 samples and 64 bands.
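The partitioning procedure in steps (1)–(6) can be expressed compactly as a recursion over a precomputed separability matrix. The sketch below is an assumed implementation, not the authors' code; `build_tree` is a hypothetical helper that returns the nested (J1, J2) structure of the tree.

```python
# A minimal sketch (assumed, not the authors' code) of the ABTSVM tree
# construction, given a precomputed separability matrix JM (e.g. filled with
# jm_distance from the sketch above). Returns a nested (J1, J2) tuple.
import numpy as np

def build_tree(classes, JM):
    if len(classes) == 1:
        return classes[0]                        # leaf: a single class
    # step (2): the most separable pair seeds the two partitions, Eq. (6)
    pairs = [(JM[a, b], a, b) for a in classes for b in classes if a != b]
    _, i, j = max(pairs)
    J1, J2 = [i], [j]
    # steps (3)-(4): each remaining class joins the side it is less
    # separable from (i.e. the side it resembles more)
    for k in classes:
        if k in (i, j):
            continue
        (J1 if JM[i, k] < JM[j, k] else J2).append(k)
    # steps (5)-(6): an SVM separates J1 from J2 at this node, then recurse
    return (build_tree(J1, JM), build_tree(J2, JM))
```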

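Continuing the same assumed sketch (again an illustration, not the authors' implementation), a binary SVM can then be trained at each internal node of the tree, and a pixel is classified by descending from the root until a leaf (a single class) is reached.

```python
# A minimal sketch (assumed, not the authors' code): train one binary SVM per
# internal node of the tree and classify a pixel by walking root to leaf.
import numpy as np
from sklearn.svm import SVC

def leaves(node):
    # collect the class labels under a (sub)tree
    if not isinstance(node, tuple):
        return [node]
    return leaves(node[0]) + leaves(node[1])

def train_nodes(node, X, y, models=None):
    # train one SVM per internal node; label True means "descend left"
    models = {} if models is None else models
    if isinstance(node, tuple):
        left, right = leaves(node[0]), leaves(node[1])
        mask = np.isin(y, left + right)
        models[id(node)] = SVC(kernel="rbf", C=2048, gamma=0.5).fit(
            X[mask], np.isin(y[mask], left))
        train_nodes(node[0], X, y, models)
        train_nodes(node[1], X, y, models)
    return models

def classify(x, node, models):
    # descend the tree, choosing a branch at each node's SVM
    while isinstance(node, tuple):
        node = node[0] if models[id(node)].predict(x.reshape(1, -1))[0] else node[1]
    return node

# usage with build_tree above (placeholder data):
#   tree = build_tree(list(range(8)), JM)
#   models = train_nodes(tree, X_train, y_train)
#   label = classify(X_test[0], tree, models)
```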
Fig. 4 is the RGB composite of the hyperspectral image (R: band 36 at 0.81 μm, G: band 23 at 0.68 μm, and B: band 11 at 0.56 μm). In Fig. 4, the green region is grassland, the black region is fish pond, the yellow region is yellowed grass, and the white region is the inhabited area. After the entire data set is normalized (the value of each pixel lies between zero and one), samples are selected. Taking into account all spectral and texture features, the pixel purity index (PPI) of the image is computed; one thousand pure pixels are obtained over all classes, and endmembers are chosen as samples. In this experiment, all pure pixels together with ground truth are finally taken as training samples. The classification problem involves the identification of eight land cover types (C1: crop land (230 pixels), C2: inhabited area (111 pixels), C3: inhabited area B (67 pixels), C4: crop land B (78 pixels), C5: water (103 pixels), C6: road (65 pixels), C7: bare soil (91 pixels), C8: plant (88 pixels)) for the OMIS data set. The training sets are 0.318% of the data set; the testing samples, about 0.359% of the data set, are obtained in the same way. Following the conclusions of other papers [5,20], a Gaussian kernel is used for the multi-class SVM, and a grid search is applied to select C and γ. After pre-processing the data, the training samples are used to design the binary tree structure shown in Fig. 5(a).

As the results in Fig. 6 and Table 1 show, the classification accuracy is very good over large areas. The SVM parameters are C = 2048 and γ = 0.5; the classification accuracy is 96.52%, the Kappa coefficient is 0.8335, and the time consumed is 1.0310 s. The main inhabited areas are difficult to classify because of confusion with other ground objects such as small water bodies, grass and roads, which decreases classification accuracy in those local areas. From Table 1, it is found that C2 (inhabited area) and C3 (inhabited area B) are difficult to classify because of their spectral similarity, while better accuracy is obtained for the remaining classes.

Fig. 6. Classification result image of the novel binary tree SVM.
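The grid search over C and γ described above can be sketched with scikit-learn as follows. The exponential grids are a common convention and an assumption here, not taken from the paper; the paper only reports the selected values C = 2048 and γ = 0.5.

```python
# A minimal sketch (assumed, not the authors' code) of an RBF-kernel grid
# search over C and gamma with cross-validation.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

X_train = np.random.rand(100, 64)        # placeholder training pixels
y_train = np.random.randint(0, 8, 100)   # placeholder class labels

param_grid = {
    "C": [2.0 ** p for p in range(-5, 16, 2)],      # 2^-5 ... 2^15
    "gamma": [2.0 ** p for p in range(-15, 4, 2)],  # 2^-15 ... 2^3
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_)
```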

Table 1 Adaptive binary tree SVM classification confusion matrix (columns: reference classes; rows: classified classes).

                     C1       C2       C3      C4      C5       C6      C7      C8      User accuracy
C1                  134        0        0       2       0        0       0       0      98.53%
C2                    0       34       26       0       0       36       0       0      35.42%
C3                    0        0       26       0       0       67       0       0      27.96%
C4                    0        0        0     164       0        0       0       0      100.00%
C5                    0        0        0       0     135        0       0       0      100.00%
C6                    0        0        0       0       0      103       0       0      100.00%
C7                    0        0        0       0       0        0     103       2      97.22%
C8                    0        0        0       0       0        0       3     105      97.22%
Producer accuracy   100.00%  100.00%  50.00%  98.80%  100.00%  50.00%  97.17%  98.13%   Overall accuracy: 96.52%
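For reference, the overall accuracy and Kappa coefficient reported in Table 1 and the later tables are standard statistics of a confusion matrix; a minimal sketch (not the authors' code) is given below.

```python
# A minimal sketch (assumed, not the authors' code) of overall accuracy and
# the kappa coefficient for a confusion matrix C, where C[r, c] counts test
# pixels of reference class c assigned to class r.
import numpy as np

def overall_accuracy(C):
    return np.trace(C) / C.sum()

def kappa(C):
    n = C.sum()
    po = np.trace(C) / n                          # observed agreement
    pe = (C.sum(axis=0) @ C.sum(axis=1)) / n**2   # chance agreement
    return (po - pe) / (1.0 - pe)
```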


In order to compare the algorithms under a different separability rule, the inter-class distance is also computed with the Kullback–Leibler distance in place of the JM spectral separability. The Kullback–Leibler distance binary tree SVM (KLBTSVM) is applied for classification with the same training and testing samples. The parameters are C = 1024 and γ = 0.5; the classification accuracy is 93.82%, the Kappa coefficient is 0.7682, and the time consumed is 0.9982 s. Comparing the results of the two algorithms, Fig. 7 shows that the two methods reach similar accuracy for most ground objects; however, the results differ between road and bare soil. The reason can be seen in Fig. 8, which shows the spectra of road and bare soil in the experimental data set. Overall, the JM distance serves the BTSVM better than the Kullback–Leibler distance, especially in hyperspectral classification.

Fig. 7. Comparison of classification accuracy for different ground objects.

Fig. 8. The spectra of road and bare soil.

We also compare the ABTSVM with other common multi-class SVMs: the one-against-all SVM (1-a-a SVM), one-against-one SVM (1-a-1 SVM), decision directed acyclic graph SVM (DDAG SVM), slant binary tree SVM (SBTSVM), and balanced binary tree SVM (BBTSVM). Table 2 shows the accuracy statistics of the different algorithms.

Table 2 Classification results of different multi-class SVMs.

                   1-a-a SVM   1-a-1 SVM   DDAG SVM   SBTSVM     BBTSVM
Overall accuracy   94.36%      94.72%      93.92%     93.68%     93.72%
Kappa              0.7874      0.7882      0.7692     0.7669     0.7679
Time consumed      1.8280 s    1.5940 s    0.1100 s   0.2650 s   0.8440 s

As representatives of traditional classification algorithms, spectral angle mapper (SAM) classification and minimum distance classification (MDC) are also applied in our experiments. SAM is a physically based spectral classification that uses an n-dimensional angle to match pixel spectra to reference spectra: it determines the spectral similarity between two spectra by calculating the angle between them, treating them as vectors in a space whose dimensionality equals the number of bands [18]. MDC uses the mean vector of each endmember and calculates the Euclidean distance from each unknown pixel to the mean vector of each class [25]. Table 3 shows their accuracies.

Table 3 Classification results of SAM and MDC.

                   SAM        MDC
Overall accuracy   78.9262%   76.3830%
Kappa              0.7577     0.7281

Compared with these common classifiers, the SVM has the highest overall accuracy and kappa. Moreover, the proposed binary tree SVM (ABTSVM) based on the JM distance is the best among all the multi-class SVMs tested, with 96.52% overall accuracy. Besides, the ABTSVM is less time-consuming than the 1-a-1 SVM and 1-a-a SVM. Therefore, the JM spectral separability measure is a good choice for SVM classification of hyperspectral data sets, though it is a bit more time-consuming than the other BTSVMs.
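The two traditional classifiers can likewise be sketched in a few lines of NumPy. This is an assumed illustration of SAM and MDC as described above, not the authors' code; `refs` and `means` are hypothetical arrays of reference spectra and class mean vectors.

```python
# A minimal sketch (assumed, not the authors' code) of SAM and MDC. `pixels`
# is (n_pixels, n_bands); `refs`/`means` are (n_classes, n_bands).
import numpy as np

def sam_classify(pixels, refs):
    # spectral angle between each pixel vector and each reference spectrum
    cos = (pixels @ refs.T) / (
        np.linalg.norm(pixels, axis=1, keepdims=True)
        * np.linalg.norm(refs, axis=1))
    return np.argmin(np.arccos(np.clip(cos, -1.0, 1.0)), axis=1)

def mdc_classify(pixels, means):
    # Euclidean distance from each pixel to each class mean vector
    d = np.linalg.norm(pixels[:, None, :] - means[None, :, :], axis=2)
    return np.argmin(d, axis=1)
```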

4.2. Experiment 2

In this experiment, the EO-1 Hyperion image is used. The data cover the city of Fremont, part of Silicon Valley, located on the southeast side of the San Francisco Bay in California. The image has 550 lines, 310 samples, and 242 bands with a spectral range covering 400 nm to 2500 nm. The Hyperion data were geographically registered using ground control points (GCPs) [30]. Fig. 9 shows a false color composite of the EO-1 Hyperion image. The diverse landscapes in the study area range from the hilly regions in the north, through the urban area in the middle, to the salt evaporators at the southern end of the San Francisco Bay. The urban part includes old residential areas, the Quarry Lake, new residential areas, highways, industrial and commercial areas, and city parks and school lawns. Thirty-eight empty bands and 51 noisy bands are removed, so 153 bands are used for classification. Samples are selected after the entire data set is normalized. The pixel purity index (PPI) of the data is computed, and about one thousand five hundred pure pixels are obtained; endmember pixels are then selected as samples. The classification process involves the identification of eight land cover types (C1: meadow land (200 pixels), C2: oak woodland (206 pixels), C3: swamp (181 pixels), C4: commercial (106 pixels), C5: highway (171 pixels), C6: salt evaporator (195 pixels), C7: lake (268 pixels), and C8: dry grass (123 pixels)). The training sets are 0.850% of the data set.

Fig. 9. Case study area in Experiment 2.


Table 4 Hyperion data classification results of different classifiers.

                   ABTSVM   KLBTSVM   1-a-a SVM   1-a-1 SVM   DDAG SVM   SBTSVM   BBTSVM    SAM      MDC
Overall accuracy   97.12%   96.33%    94.46%      93.62%      93.78%     93.18%   94.72%    80.62%   78.65%
Kappa              0.9178   0.8995    0.8723      0.8634      0.8654     0.8615   0.8791    0.7512   0.7411
Test time (s)      7.5190   7.4800    8.1220      7.7940      5.1410     6.1630   11.8090   4.1542   4.2153

The testing samples, about 0.720% of the data set, are different from the training sets but obtained in the same way. The accuracy statistics are listed in Table 4. The binary tree SVM based on the JM distance is again the best among the multi-class SVMs, with 97.1245% overall accuracy, the highest among all of the algorithms; the novel multi-class SVM presented here therefore has the highest accuracy in this experiment as well. Fig. 10 shows the classification results using the ABTSVM.

Fig. 10. Classification results of Hyperion data using multi-class SVMs ((a) JM distance SVM, (b) KL distance SVM, (c) 1-a-a SVM, (d) 1-a-1 SVM, (e) DDAG SVM, (f) SBT SVM, (g) BBT SVM; classes: highway, salt evaporator, dry grass, meadow land, swamp, commercial, lake, oak woodland).

4.3. Experiment 3

In this experiment the AVIRIS Indian Pines image is used. The Indian Pines scene was gathered by the AVIRIS instrument in June 1992 with 220 bands. The data is available online from http://dynamo.ecn.purdue.edu/biehl/MultiSpec. It consists of 145 × 145 pixels and 16 ground-truth classes, ranging from 20 to 2468 pixels in size. It covers an area of mixed agricultural and forest landscape in NW Indiana.


Table 5 AVIRIS data classification accuracy of different classifiers.

                   ABTSVM   KLBTSVM   1-a-a SVM   1-a-1 SVM   DDAG SVM   SBTSVM   BBTSVM   SAM      MDC
Overall accuracy   97.92%   96.15%    95.02%      94.61%      94.32%     93.66%   94.13%   83.71%   80.92%
Kappa              0.9775   0.9612    0.9210      0.8965      0.9087     0.8723   0.8976   0.7623   0.7567
Test time (s)      1.1900   1.0230    2.6140      2.2730      1.0450     1.1270   1.7302   1.0292   1.0876

The data set represents a very challenging land-cover classification scenario, in which the primary crops of the area (mainly corn and soybeans) were very early in their growth cycle, with only about 5% canopy cover [24]. We removed 20 noisy bands covering the region of water absorption, leaving 200 spectral bands for analysis. As in the experiments above, the pixel purity index (PPI) of the data is computed first and about four hundred pure pixels are obtained; endmember pixels are then selected as samples. The classification process involves the identification of eight land cover types (C1: corn-notill (46 pixels), C2: corn-min (40 pixels), C3: corn (40 pixels), C4: grass/pasture (59 pixels), C5: grass/pasture-mowed (73 pixels), C6: soybeans-notill (71 pixels), C7: soybeans-clean (58 pixels), and C8: woods (74 pixels)). The training sets are 2.19% of the data set; the testing samples, about 2.24% of the data set, are obtained in the same way.

The accuracy statistics are listed in Table 5. The proposed adaptive binary tree SVM based on the JM distance again performs best among all the multi-class SVM strategies, with 97.9154% overall accuracy, better than the other multi-class SVMs and the traditional classifiers (MDC and SAM). The proposed multi-class SVM thus has the highest classification accuracy, consistent with the other two experiments.

5. Conclusion

An adaptive binary tree SVM based on the JM distance is proposed and evaluated in this paper. Three experiments using different hyperspectral data sets have shown that the ABTSVM classifier outperforms other multi-class SVM classifiers (1-a-a, 1-a-1, DDAG, SBTSVM, and BBTSVM) as well as traditional classifiers. The conclusions are consistent across all of the applications of the multi-class SVMs and traditional classifiers to hyperspectral images. It is concluded that the proposed multi-class SVM based on the JM distance is effective for hyperspectral remote sensing image classification. Future studies will focus on optimizing the structure and parameters of multi-class SVMs in order to further improve classification accuracy.

Acknowledgment

The authors are grateful to Dr. Christopher Small, Lamont Research Professor at the Lamont-Doherty Earth Observatory, The Earth Institute at Columbia University, USA, for his valuable review and comments. The authors also thank Prof. Peng Gong, Director of the State Key Laboratory of Remote Sensing Science of China, and Dr. Bing Xu, Associate Professor at the University of Utah, USA, for providing the Hyperion data. We also thank Dr. D. Landgrebe of Purdue University, West Lafayette, IN, USA, for making the AVIRIS data set and the documents available online.

This work is supported by research grants from the National Natural Science Foundation of China (No. 40401038, No. 40871195, No. 41101423), the National High-Tech Program of China (No. 2007AA12Z162), and the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD). The paper was also completed with support from the Center for International Earth Science Information Network (CIESIN), Columbia University, USA.

References

[1] X. An, S.G. Su, T. Wang, S. Xu, W.J. Huang, L.D. Zhang, Spectroscopy and Spectral Analysis 27 (2007) 1619.
[2] N. Ancona, R. Maglietta, E. Stella, Pattern Recognition 39 (2006) 1588.
[3] L. Bruzzone, M. Chi, M. Marconcini, IEEE Transactions on Geoscience and Remote Sensing 44 (2006) 3363.
[4] G. Camps-Valls, L. Gomez-Chova, J. Calpe-Maravilla, J. Martin-Guerrero, E. Soria-Olivas, L. Alonso-Chorda, J. Moreno, IEEE Transactions on Geoscience and Remote Sensing 42 (2004) 1530.
[5] G. Camps-Valls, L. Gomez-Chova, J. Munoz-Mari, J. Vila-Frances, J. Calpe-Maravilla, IEEE Geoscience and Remote Sensing Letters 3 (2006) 93.
[6] J. Chen, C. Wang, R. Wang, Neurocomputing 72 (2009) 3370.
[7] J. Chen, R. Wang, International Journal of Remote Sensing 28 (2007) 2821.
[8] M. Chi, R. Feng, L. Bruzzone, Advances in Space Research 41 (2008) 1793.
[9] T. Dietterich, G. Bakiri, Journal of Artificial Intelligence Research 2 (1995) 263.
[10] P.J. Du, X.M. Wang, K. Tan, G. Foody, in: 8th International Symposium on Spatial Accuracy Assessment in Natural Resources and Environmental Sciences, Shanghai, 25–27 June 2008, World Academic Union Ltd, Liverpool, 2008, p. 138.
[11] G. Foody, A. Mathur, Remote Sensing of Environment 93 (2004) 107.
[12] G. Foody, A. Mathur, C. Sanchez-Hernandez, D. Boyd, Remote Sensing of Environment 104 (2006) 1.
[13] C. Hsu, C. Lin, IEEE Transactions on Neural Networks 13 (2002) 415.
[14] C. Huang, J. Dun, Applied Soft Computing Journal (2007) 1381.
[15] G. Hughes, IEEE Transactions on Information Theory 14 (1968) 55.
[16] J. Inglada, ISPRS Journal of Photogrammetry and Remote Sensing 62 (2007) 236.
[17] K.K. Chin, Support Vector Machines Applied to Speech Pattern Classification, Master's thesis, University of Cambridge, UK, 1998.
[18] F.A. Kruse, A.B. Lefkoff, J.W. Boardman, K.B. Heidebrecht, A.T. Shapiro, P.J. Barloon, A.F.H. Goetz, Remote Sensing of Environment 44 (1993) 145.
[19] Z. Liu, W. Shi, Q. Qin, X. Li, D. Xie, in: Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS '05), 2005, p. 186.
[20] F. Melgani, L. Bruzzone, IEEE Transactions on Geoscience and Remote Sensing 42 (2004) 1778.
[21] T. Oommen, D. Misra, N. Twarakavi, A. Prakash, B. Sahoo, S. Bandopadhyay, Mathematical Geosciences 40 (2008) 409.
[22] M. Pal, P. Mather, International Journal of Remote Sensing 26 (2005) 1007.
[23] J. Platt, N. Cristianini, J. Shawe-Taylor, Advances in Neural Information Processing Systems 12 (2000) 547.
[24] A. Plaza, J. Benediktsson, J. Boardman, J. Brazile, L. Bruzzone, G. Camps-Valls, J. Chanussot, M. Fauvel, P. Gamba, A. Gualtieri, Remote Sensing of Environment 113 (2009) 110.
[25] J. Richards, X. Jia, Springer, Berlin, 2006, p. 201.
[26] B. Scholkopf, C. Burges, A. Smola, Advances in Kernel Methods: Support Vector Learning, MIT Press, Cambridge, MA, USA, 1999.
[27] P. Swain, Remote Sensing: The Quantitative Approach, 1978, p. 136.
[28] K. Tan, P.J. Du, Spectroscopy and Spectral Analysis 28 (2008) 2009.
[29] V.N. Vapnik, Statistical Learning Theory, Springer, NY, 1998.
[30] B. Xu, P. Gong, Photogrammetric Engineering and Remote Sensing 73 (2007) 955.