Active extreme learning machines for quad-polarimetric SAR imagery classification


International Journal of Applied Earth Observation and Geoinformation 35 (2015) 305–319


Alim Samat a,b, Paolo Gamba c, Peijun Du a,b,*, Jieqiong Luo a,b

a Key Laboratory for Satellite Mapping Technology and Applications of State Administration of Surveying, Mapping and Geoinformation of China, Nanjing University, China
b Jiangsu Provincial Key Laboratory of Geographic Information Science and Technology, Nanjing University, China
c Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Italy

Article info

Article history: Received 16 April 2014; Accepted 25 September 2014

Keywords: PolSAR; Extreme learning machine; Ensemble learning; Active learning; Active extreme learning machines

Abstract

Supervised classification of quad-polarimetric SAR images is often constrained by the availability of reliable training samples. Active learning (AL) provides a unique capability for selecting samples with high representation quality and low redundancy. The most important part of AL is the criterion used to rank and select the most informative candidates (pixels). In this paper, class supports based on the posterior probability function are approximated by ensemble learning and majority voting. This approximation is statistically meaningful when a sufficiently large classifier ensemble is exploited. In this work, we propose to use extreme learning machines and apply AL to quad-polarimetric SAR image classification. Extreme learning machines are ideal because of their fast operation, straightforward solution and strong generalization. As inputs to the so-called active extreme learning machines, both polarimetric and spatial features (morphological profiles) are considered. In order to validate the proposed method, results and performance are compared with random sampling and state-of-the-art AL methods, such as margin sampling, normalized entropy query-by-bagging and multiclass level uncertainty. Experimental results for four quad-polarimetric SAR images collected by RADARSAT-2, AirSAR and EMISAR indicate that the proposed method achieves promising results in different scenarios. Moreover, the proposed method is faster than existing techniques in both the learning and the classification phases. © 2014 Elsevier B.V. All rights reserved.

Introduction

Polarimetric SAR (PolSAR, also called quad-polarimetric SAR, or fully polarimetric SAR) not only has all the advantages of a conventional SAR, such as all-weather and day/night observation capabilities, but is also able to capture more information about the backscattering phenomena through multiple polarization modes. Accordingly, PolSAR has attracted growing interest in remote sensing applications. The impact of single, dual and fully polarimetric data sets, as well as of the corresponding polarimetric features and decomposition methods, on the characterization of multiple materials, from vegetation to ice and from natural terrain to artificial infrastructure, has already been investigated in various studies (Conradsen et al., 2003; Ainsworth et al., 2009; McNairn et al., 2009; Lonnqvist et al., 2010; van Zyl et al., 2011;

∗ Corresponding author at: Jiangsu Provincial Key Laboratory of Geographic Information Science and Technology, Nanjing University, China. Tel.: +86 15905159291. E-mail addresses: [email protected] (A. Samat), [email protected] (P. Gamba), [email protected] (P. Du), [email protected] (J. Luo). http://dx.doi.org/10.1016/j.jag.2014.09.019 0303-2434/© 2014 Elsevier B.V. All rights reserved.

Paladini et al., 2013). However, due to the data collection mechanism, the complexity of the interaction between the ground surface and the incident electromagnetic wave, and the inherent speckle effect, the classification of PolSAR images is still a challenging research topic. To overcome these challenges and process PolSAR data more effectively, many supervised and unsupervised classifiers have been proposed (Doulgeris et al., 2008; Ferro-Famil et al., 2001; He et al., 2013; Qi et al., 2012). Despite their excellent performance, even the most effective supervised classifiers, such as support vector machines (SVM) (Lardeux et al., 2009), classifier ensembles (Waske and Braun, 2009), or the Wishart supervised classifier (Lee et al., 1994), rely on the quality, amount and availability of labeled training samples. These training data must represent the actual statistical properties of the land cover classes in the scene. This constraint usually leads to extensive visual interpretation (inherently subjective) or to expensive and time-consuming field work. Moreover, due to their redundancy, manually selected training samples do not always guarantee an effective selection. Therefore, training samples with high representation quality and low redundancy, adequate for training any supervised classifier, are urgently needed.


Fig. 1. False colored RGB images obtained using the Pauli decomposition (R = |S11 + S22|, G = |S12 + S21|, B = |S11 − S22|) as well as reference maps for each of the four considered data sets. First row: San Francisco (C-band, Radarsat-2); second row: Foulum (L-band, AirSAR); third row: Flevoland (L-band, EMISAR); fourth row: Flevoland (C-band, Radarsat-2).

Recently, active learning (AL) techniques have been extensively investigated due to their ability to select additional informative samples from unlabeled data. AL selects samples from a pool of unlabeled samples via query strategies suited to the properties of the classifier, the current labeled sample set and the auxiliary unlabeled data (Tuia et al., 2011a,b). So far, several AL algorithms have been introduced, with promising results in multi- and hyperspectral image classification (Rajan et al., 2008; Tuia et al., 2009, 2011a,b). As a sampling technique, AL relies on human–machine interaction, under the assumption that the samples most informative to the user will improve the generalization capability of the classifier. According to the selected query strategy, AL methods can be categorized into three main groups: (1) methods relying on the classifier features (e.g., the geometrical features of SVM; Tong and Koller, 2002; Demir et al., 2011); (2) methods based on the estimation of the class posterior probability functions (Roy and McCallum, 2001; Mitra et al., 2004); and (3) methods based on the query-by-committee paradigm (Freund et al., 1997; Copa et al., 2010). The first group includes margin sampling (Tuia et al., 2011a,b), multiclass level uncertainty (Demir et al., 2011) and significance space construction (SSC) (Pasolli et al., 2011). These approaches mainly take advantage of the geometrical properties of SVM, but also share

their sensitivity to model parameters. The second group, including KL-max (Rajan et al., 2008) and breaking ties (BT; Luo et al., 2004), is more computationally efficient, but its training sample selection criterion relies on approximations of the posterior probabilities, and is highly likely to fail when the initial number of training samples is very limited. The last group, including query-by-bagging (Freund et al., 1997; Copa et al., 2010), entropy query-by-bagging (EQB) and normalized entropy query-by-bagging (Tuia et al., 2009), is applicable to any kind of model or combination of models, but models with strong generalization and small computational cost are usually preferred (Tuia et al., 2011a,b). Summing up, there is still a need for AL algorithms with small computational cost and good generalization, able to work efficiently with very small initial training sets and without approximations of the posterior probabilities. In this regard, our idea is to exploit the potential of extreme learning machines (ELM) in an active learning framework. ELM was proposed as a generalized algorithm for single-hidden layer feed-forward neural networks, performing well in both regression and classification (Huang et al., 2006a,b). Due to its remarkable advantages, such as fast operation, straightforward solution and strong generalization, ELM has attracted considerable attention in the pattern recognition field (Suresh et al., 2009; Minhas et al., 2010;


Fig. 2. Representative VHR samples for (a): “built-up 1” class (“sparse residential”); (b) “built-up 2” class (“residential”); (c) “built-up 3” class (“dense residential”). Please note that red arrows point to the North. (For interpretation of the references to color in this text, the reader is referred to the web version of the article.)

Huang et al., 2012). Recently, ELM and its extensions, e.g., ensemble extreme learning machines (E2LMs) and kernel extreme learning machines (KELM), have been investigated for multi- and hyperspectral image classification (Pal, 2009; Pal et al., 2013; Samat et al., 2014a). However, ELM also has some drawbacks: (1) the randomness of the input weights and biases can result in ill-posed problems, leading to low performance or even no solution; and (2) the performance depends on the hidden layer structure, which must be specified by the user before the training phase (Huang et al., 2006a,b). To reduce the probability of model overfitting, a Bayesian extreme learning machine (BELM) was proposed in Soria-Olivas et al. (2011). An error-minimized extreme learning machine (EM-ELM) was instead proposed in Feng et al. (2009), implementing incremental learning and adaptively changing the hidden layer structure. Additionally, two ensemble versions of ELM were investigated in Samat et al. (2014a) to control the effects of the random selection of input weights and biases. In this work, exploiting the fact that the performance of a weak classifier can be improved by majority voting over bootstrap replicas, the ELM drawbacks are addressed by a Bagging technique (Breiman, 1996; Lam and Suen, 1997; Samat et al., 2014a). Specifically, a query-by-Bagging algorithm with ELM as the base learner is applied in an active learning framework. Instead of using a single ELM to estimate the posterior probability function of the classes, the most informative samples are selected according to the votes of an ELM ensemble. Additionally, to control possible limitations caused by the ELM hidden layer structure, a k-fold cross-validation technique is used to select the optimal number of hidden layer neurons at each iteration step. The resulting approach, labeled active extreme learning machines (AELM), is expected to be

capable of fast operation and excellent performance, as well as of strong generalization. AELM was tested on four quad-polarimetric SAR images. For validation and comparison, basic and advanced AL algorithms, such as random sampling, margin sampling, entropy query-by-bagging and multiclass level uncertainty, were applied in all experiments. In all of the tests, in addition to polarimetric features (six parameters computed from the coherency matrix), spatial features such as morphological profiles (MP) and differential morphological profiles (DMP) were considered as input, with the aim of investigating the benefits of spatial features for PolSAR image classification and evaluating the capability of the proposed method to handle high-dimensional data.

The remainder of this paper is organized as follows. Section "Related methods" briefly presents the related methods, namely polarimetric coherency decomposition, morphological and differential morphological profiles, active learning and extreme learning machines. Section "Proposed AELM classification approach" presents the proposed AELM method in detail. Section "Experimental data sets" presents the experimental data sets, while results are presented and discussed in Section "Results and discussions". Conclusions are eventually drawn in Section "Conclusions".

Related methods

Polarimetric coherency decomposition

The polarization of electromagnetic fields scattered by a ground target does not depend only on the polarization of the incident


Fig. 3. Backscattering power for the Yamaguchi decomposition: (a) double bounce; (b) helix; (c) odd bounce; and (d) volume scattering.

electromagnetic field, but also on the characteristics of the target such as its shape, size, direction, structure and material (Cloude and Pottier, 1996). In order to make full use of the polarization information, the scattering process for the target of interest must be considered as a function of the electromagnetic fields. The relation between the incident (EI ) and the scattered (ES ) wave can be described by the Sinclair matrix S (Cloude and Pottier, 1996):

E_S = S E_I = \frac{e^{-jkr}}{r}
\begin{bmatrix}
S_{hh} & S_{hv} \\
S_{vh} & S_{vv}
\end{bmatrix} E_I
\quad (1)

where the 2 × 2 complex coherent Sinclair matrix S describes the information directly related to the physical properties of the target material and its orientation, S_{ij} is the so-called complex scattering coefficient (or complex scattering amplitude), and i (h or v) and j (h or v) denote the polarizations of the scattered and incident waves, respectively. Although the Sinclair matrix S can be used for polarimetric decomposition and for the analysis of the scattering mechanisms, "scattering vectors" are used to extract a more physical description of the phenomenon. In the monostatic backscattering case, assuming the complex Pauli spin matrix basis set

\Psi_P = \left\{ \sqrt{2}\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},\;
\sqrt{2}\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix},\;
\sqrt{2}\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \right\}
\quad (2)

As a way to express the most common backscattering mechanisms of concentrated targets, the scattering vector K_P is defined (Cloude and Pottier, 1996) as:

K_P = \frac{1}{\sqrt{2}} \begin{bmatrix} S_{hh}+S_{vv} & S_{hh}-S_{vv} & 2S_{hv} \end{bmatrix}^T
\quad (3)

Practically, most radar targets are distributed, and generally situated in a dynamically changing environment, subject to spatial and temporal variations. For such distributed targets, one has to make assumptions concerning stationarity, homogeneity and ergodicity. This can be analyzed more precisely by introducing the concept of space- and time-varying stochastic processes, where the target or the environment is described by the second-order moments of the fluctuations, extracted from the polarimetric coherency or covariance matrices (Cloude and Pottier, 1996). In the monostatic backscattering case, the coherency matrix T3 is given as:

T_3 = \left\langle K_P \cdot K_P^{*T} \right\rangle = \frac{1}{2}
\begin{bmatrix}
\langle |S_{hh}+S_{vv}|^2 \rangle & \langle (S_{hh}+S_{vv})(S_{hh}-S_{vv})^* \rangle & 2\langle (S_{hh}+S_{vv})S_{hv}^* \rangle \\
\langle (S_{hh}-S_{vv})(S_{hh}+S_{vv})^* \rangle & \langle |S_{hh}-S_{vv}|^2 \rangle & 2\langle (S_{hh}-S_{vv})S_{hv}^* \rangle \\
2\langle S_{hv}(S_{hh}+S_{vv})^* \rangle & 2\langle S_{hv}(S_{hh}-S_{vv})^* \rangle & 4\langle |S_{hv}|^2 \rangle
\end{bmatrix}
\quad (4)

where ⟨·⟩ denotes the ensemble average, and the superscript *T denotes the conjugate transpose. Therefore, for each pixel the PolSAR information is embedded in three real (diagonal) and three complex (off-diagonal) parameters,

Fig. 4. Several false color composites of the San Francisco area using Yamaguchi decomposition; (a): R = odd, G = helix, B = double; (b): R = volume, G = helix, B = double; (c): R = volume, G = odd, B = helix; (d): R = volume, G = odd, B = double.
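The computations of Eqs. (3) and (4) can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: the boxcar multilook window size and the use of off-diagonal magnitudes as the three correlation features are assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def t3_features(S_hh, S_hv, S_vv, looks=3):
    """Six coherency-matrix features per pixel (illustrative sketch).

    Builds the Pauli scattering vector of Eq. (3), forms the multilooked
    coherency matrix T3 of Eq. (4) with a boxcar filter approximating the
    ensemble average, and returns the 3 real diagonal terms plus the
    magnitudes of the 3 upper-triangle terms (a hypothetical feature choice).
    """
    # Pauli scattering vector, shape (3, H, W); monostatic case (S_hv = S_vh)
    k = np.stack([S_hh + S_vv, S_hh - S_vv, 2.0 * S_hv]) / np.sqrt(2.0)

    def smooth(a):  # boxcar multilook standing in for the ensemble average <.>
        return uniform_filter(a.real, looks) + 1j * uniform_filter(a.imag, looks)

    # Upper triangle of T3 = <K_P K_P^{*T}> (Hermitian, so this is sufficient)
    T = {(i, j): smooth(k[i] * np.conj(k[j])) for i in range(3) for j in range(i, 3)}
    feats = [T[(0, 0)].real, T[(1, 1)].real, T[(2, 2)].real,
             np.abs(T[(0, 1)]), np.abs(T[(0, 2)]), np.abs(T[(1, 2)])]
    return np.stack(feats, axis=-1)  # (H, W, 6) feature stack
```

In practice, each pixel's six-element vector would then feed the classifiers described below.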


Fig. 5. Overall classification accuracy (%) and consumed time (seconds) for the proposed and compared methods applied to data set 1. Different input sets are considered: (a) and (b): coherency; (c) and (d): coherency and MP; (e) and (f): coherency, MP and DMP.

for a total of nine parameters. Due to the symmetry of the coherency matrix, these nine parameters can be reduced to a feature vector with five or six elements, assumed to follow a joint Gaussian distribution (Kong et al., 1988; Rignot et al., 1992; Cloude and Pottier, 1996; Maghsoudi et al., 2011; Samat et al., 2014b). In this work, the three diagonal terms and the three complex correlation coefficients from the upper triangle were used to build the feature vector, able to represent the polarimetric backscattering behavior of all targets.

Morphological profiles

Morphological profiles (MP) (Pesaresi and Benediktsson, 2001), extended morphological profiles (EMP) (Benediktsson et al., 2005) and differential morphological profiles (DMP) (Dalla Mura et al., 2010) are spatial features that can be computed for each pixel in an image by recursively applying mathematical morphology operators. They are powerful tools for extracting spatial and contextual components from an image, and are useful in the representation

and description of geometric structures. Additionally, they have already been studied extensively for multispectral and hyperspectral data interpretation (Benediktsson et al., 2005; Dalla Mura et al., 2010; Soille and Pesaresi, 2002), although these operators have not yet attracted much attention in the field of SAR or PolSAR image processing. More in detail, an MP is defined by a sequence of morphological opening and closing operations with different structuring element sizes. Opening by reconstruction (O_R) and closing by reconstruction (C_R) of a gray-level image f are respectively defined as (Benediktsson et al., 2005):

O_R^i(f) = R_f^{\delta}[\varepsilon^i(f)], \qquad C_R^i(f) = R_f^{\varepsilon}[\delta^i(f)]
\quad (5)

where \varepsilon^i and \delta^i are the erosion and dilation operations with a structuring element (SE) whose radius is i pixels, and R_f^{\delta} and R_f^{\varepsilon} are the geodesic reconstruction by dilation and erosion, respectively.
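The operators above can be sketched with plain NumPy/SciPy. This is a minimal illustration, not the authors' implementation: geodesic reconstruction is done by iterated 3×3 dilation, the square SE is a stand-in for whatever SE shape the authors used, and the profile depth n is arbitrary.

```python
import numpy as np
from scipy.ndimage import grey_dilation, grey_erosion

def _reconstruct_by_dilation(seed, mask):
    """Geodesic reconstruction by dilation: iteratively dilate `seed`
    (3x3 SE) and clip it under `mask` until the result is stable."""
    prev = seed
    while True:
        cur = np.minimum(grey_dilation(prev, size=(3, 3)), mask)
        if np.array_equal(cur, prev):
            return cur
        prev = cur

def opening_by_reconstruction(f, radius):
    size = 2 * radius + 1  # square SE of "radius" pixels (a simplification)
    return _reconstruct_by_dilation(grey_erosion(f, size=(size, size)), f)

def closing_by_reconstruction(f, radius):
    # duality: C_R(f) = -O_R(-f)
    return -opening_by_reconstruction(-f, radius)

def morphological_profile(f, n=3):
    """MP of Eq. (6) and DMP of Eq. (7) as channel stacks."""
    ors = [opening_by_reconstruction(f, i) for i in range(1, n + 1)]
    crs = [closing_by_reconstruction(f, i) for i in range(1, n + 1)]
    mp = np.stack(ors + [f] + crs, axis=-1)                 # 2n + 1 channels
    dmp = np.stack([ors[i] - ors[i - 1] for i in range(1, n)]
                   + [crs[i] - crs[i - 1] for i in range(1, n)], axis=-1)
    return mp, dmp
```

Opening by reconstruction removes bright structures smaller than the SE while preserving the shape of the remaining ones, which is what makes these profiles useful as spatial features.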


Fig. 6. Classification maps for AELM (first row), RS (second row), MS (third row), nEQB (fourth row) and MCLU (fifth row) applied to data set 1. Different input sets are considered: coherency (first column); coherency and MP (second column); coherency, MP and DMP (third column).

By using a sequence of closing and opening operations by reconstruction, the MP can be formally defined as (Benediktsson et al., 2005):

MP(f) = \{ O_R^{i=1}(f), O_R^{i=2}(f), \ldots, O_R^{i=n}(f),\; f,\; C_R^{i=1}(f), C_R^{i=2}(f), \ldots, C_R^{i=n}(f) \}, \quad i \in [1, n]
\quad (6)

Similarly, and according to the definition in Benediktsson et al. (2005), the DMP is expressed as:

DMP(f) = \{ O_R^{i=2}(f) - O_R^{i=1}(f), \ldots, O_R^{i=n}(f) - O_R^{i=n-1}(f),\; C_R^{i=2}(f) - C_R^{i=1}(f), \ldots, C_R^{i=n}(f) - C_R^{i=n-1}(f) \}, \quad i \in [2, n]
\quad (7)

Extreme learning machine

To overcome the learning-speed bottleneck of artificial neural networks (ANNs), the extreme learning machine (ELM) was first proposed for single-hidden layer feed-forward neural networks (SLFNs) (Huang et al., 2006a,b, 2012).

For N arbitrary distinct samples (x_i, y_i), where x_i = [x_{i1}, x_{i2}, \ldots, x_{iD}]^T \in R^D and y_i = [y_{i1}, y_{i2}, \ldots, y_{iP}]^T \in R^P, with the superscripts T, D and P denoting transpose, input space dimension and output class number, respectively, the standard decision rule for an SLFN with L hidden neurons and an activation function g(x) can be expressed as:

\sum_{j=1}^{L} \beta_j \, g(w_j \cdot x_i + b_j) = y_i, \quad i = 1, 2, \ldots, N
\quad (8)

where w_j = [w_{j1}, w_{j2}, \ldots, w_{jD}]^T and \beta_j = [\beta_{j1}, \beta_{j2}, \ldots, \beta_{jP}]^T represent the weight vectors from the input to the hidden layer and from the hidden layer to the output layer, respectively, b_j is the bias of the j-th hidden neuron, and g(w_j \cdot x_i + b_j) is the output of the j-th hidden neuron with respect to the input sample x_i. Notice that Eq. (8) can be rewritten in a compact form as:

H \cdot \beta = Y
\quad (9)


Fig. 7. Overall classification accuracy (%) and consumed time (seconds) for the proposed and compared methods applied to data set 2. Different input sets are considered: (a) and (b): coherency; (c) and (d): coherency and MP; (e) and (f): coherency, MP and DMP.

where

H = \begin{bmatrix}
g(w_1, b_1, x_1) & \cdots & g(w_L, b_L, x_1) \\
g(w_1, b_1, x_2) & \cdots & g(w_L, b_L, x_2) \\
\vdots & \ddots & \vdots \\
g(w_1, b_1, x_N) & \cdots & g(w_L, b_L, x_N)
\end{bmatrix}_{N \times L}, \quad
\beta = \begin{bmatrix} \beta_1^T \\ \beta_2^T \\ \vdots \\ \beta_L^T \end{bmatrix}_{L \times P}, \quad
Y = \begin{bmatrix} y_1^T \\ y_2^T \\ \vdots \\ y_N^T \end{bmatrix}_{N \times P}

Here, H is the hidden layer output matrix of the SLFN, and \beta is the output weight matrix. Optimal weights and biases for an SLFN can be found using the back-propagation learning algorithm, which requires the user to specify a learning rate and momentum. However, there is no guarantee that the global minimum error will be found; the learning algorithm may thus suffer from defects such as local minima and over-training. By exploring the approximation capability of feed-forward neural networks with a finite training set, researchers proved that SLFNs can reach a specified approximation error level \varepsilon (\varepsilon > 0) even when the number of hidden layer neurons is much smaller than the number of training samples (Huang et al., 2006b). According to the minimum norm least-squares solution, the weight matrix \beta in Eq. (9) can then be obtained by:

\beta = H^+ \cdot Y
\quad (10)

where H^+ is the Moore-Penrose generalized inverse of the matrix H. Eq. (10) is the key component of the extreme learning machine (ELM), and provides some remarkable advantages with respect to the standard back-propagation approach. For example, it directly yields the smallest-norm weight solution with the best generalization performance, and it avoids the local minima in which the back-propagation learning algorithm may be trapped. Furthermore, the input weights and biases can be randomly assigned if the activation function is infinitely differentiable (Huang et al., 2012).


Fig. 8. Classification maps for AELM (first row), RS (second row), MS (third row), nEQB (fourth row) and MCLU (fifth row) applied to data set 2. Different input sets are considered: coherency (first column); coherency and MP (second column); coherency, MP and DMP (third column).

The algorithmic pseudo-code of the ELM method can be summarized as follows:

Algorithm 1: ELM
Input: labeled training set X, activation function g(x), number of hidden neurons L.
1. Randomly assign the input layer weights W and biases B;
2. Calculate the hidden layer output matrix H using Eq. (9);
3. Calculate the output weight matrix β using Eq. (10);
Output: input layer weights W, biases B and output weight matrix β.
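Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions (sigmoid activation, one-hot ±1 targets), not the authors' implementation:

```python
import numpy as np

def elm_train(X, y, n_hidden=50, n_classes=None, rng=None):
    """Minimal ELM: random input weights and biases, sigmoid activation,
    output weights solved via the Moore-Penrose pseudo-inverse (Eq. (10))."""
    rng = np.random.default_rng(rng)
    N, D = X.shape
    n_classes = n_classes or int(y.max()) + 1
    Y = -np.ones((N, n_classes))
    Y[np.arange(N), y] = 1.0                   # one-hot targets in {-1, +1}
    W = rng.uniform(-1, 1, (D, n_hidden))      # random input-layer weights
    b = rng.uniform(-1, 1, n_hidden)           # random biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))     # hidden layer output matrix (Eq. (9))
    beta = np.linalg.pinv(H) @ Y               # Eq. (10): beta = H^+ Y
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return np.argmax(H @ beta, axis=1)         # class with the largest output
```

Because training reduces to one pseudo-inverse, retraining an entire ensemble of such networks at each AL iteration remains cheap, which is exactly the property the proposed AELM exploits.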

Proposed AELM classification approach

The query by committee (QBC) paradigm and its variants (Seung et al., 1992; Freund et al., 1997; Leskes, 2005; Copa et al., 2010) are among the most positively evaluated techniques for active learning. In this paradigm, the agreement of the current classifiers on the candidate instances is explored, and the inconsistency between the outputs of the individual classifiers and of the ensemble is used as the recommendation criterion. Besides, Bagging (Bootstrap Aggregating) is one of the most well-known, intuitive and simplest ensemble learning (EL) frameworks for regression and classification problems. Let us consider an ensemble of ζ independent classifiers {D_1, D_2, \ldots, D_ζ}, with s = [s_1, s_2, \ldots, s_ζ]^T representing the label output of this ensemble, where s_i ∈ Ω is the label assigned by classifier D_i to a pixel x, and P(s_i) denotes the probability that classifier D_i labels x as class s_i ∈ Ω. From conditional independence, the posterior probability used to label x can be expressed as (Kuncheva, 2004):

P(\vartheta_k \mid s) = \frac{P(\vartheta_k) P(s \mid \vartheta_k)}{P(s)} = \frac{P(\vartheta_k) \prod_{i=1}^{\zeta} P(s_i \mid \vartheta_k)}{P(s)}
\quad (11)


Fig. 9. Overall classification accuracy (%) and consumed time (seconds) for the proposed and compared methods applied to data set 3. Different input sets are considered; (a) and (b): coherency; (c) and (d): coherency and MP; (e) and (f): coherency, MP and DMP.

The denominator P(s) does not depend on \vartheta_k and can be ignored, so the support for class \vartheta_k can be calculated as:

\mu_k(x) \propto P(\vartheta_k) \prod_{i=1}^{\zeta} P(s_i \mid \vartheta_k)
\propto \log(P(\vartheta_k)) + \log\!\left( \prod_{i=1,\, s_i = \vartheta_k}^{\zeta} p_i \prod_{i=1,\, s_i \neq \vartheta_k}^{\zeta} (1 - p_i) \right)
\quad (12)

where p_i denotes the individual accuracy of classifier D_i. Let CM^i be the confusion matrix for the classifier D_i; then cm^i_{(k,s)} is the number of pixels labeled by D_i as belonging to class \vartheta_s while their true class label is \vartheta_k, and N_k is the total number of elements from class \vartheta_k. Taking cm^i_{(k,s)}/N_k as an estimate of the probability P(s_i \mid \vartheta_k), and N_k/N as an estimate of the prior probability of class \vartheta_k (Kuncheva, 2004), Eq. (12) is equivalent to:

\mu_k(x) \propto \frac{1}{N_k^{\zeta - 1}} \prod_{i=1}^{\zeta} cm^i_{(k, s_i)}
\quad (13)

Applying majority voting, the support for class \vartheta_k in Eq. (13) depends on the votes for class \vartheta_k, and Eq. (13) is equivalent to:

\mu_k(x) \propto \sum_{i=1}^{\zeta} d_{i,k}, \qquad
d_{i,k} = \begin{cases} 1 & \text{if classifier } D_i \text{ labels } x \text{ into class } k, \\ 0 & \text{otherwise.} \end{cases}
\quad (14)

Using SVM margin maximization as an example, the margin for an instance is related to the certainty of its classification. Instances


Fig. 10. Classification maps for AELM (first row), RS (second row), MS (third row), nEQB (fourth row) and MCLU (fifth row) applied to data set 3. Different input sets are considered: coherency (first column); coherency and MP (second column); coherency, MP and DMP (third column).

with a correct label and high certainty will have large margins, while a small margin causes instability in the classification. In this sense, a good AL criterion Q must be capable of assigning a high rank to samples located within the margin, close to the support vectors (Tuia et al., 2009, 2011b). Moreover, according to the concept of "margin" from statistical learning theory, maximizing the margins ("boosting the margins") by providing the most informative training set leads to a smaller upper bound on the testing error (Schapire et al., 1998; Zhou, 2012). Therefore, we can conclude that in EL the most informative samples are located in small margins and are hard to classify correctly with the base classifier.

From a mathematical point of view, if the supports \{\mu_k(x) \mid k = 1, 2, \ldots, c\} for a candidate x have the minimum variance (i.e., the


Table 1. Overall classification accuracy (OA) and kappa statistic (κ) for the proposed and compared methods applied to data set 1 (F1: coherency; F2: coherency and MP; F3: coherency, MP and DMP).

Features   AELM (OA%/κ)    RS (OA%/κ)      MS (OA%/κ)      nEQB (OA%/κ)    MCLU (OA%/κ)
F1         71.90/0.64      68.39/0.60      70.54/0.62      69.75/0.62      70.02/0.62
F2         91.78/0.90      91.58/0.89      92.19/0.90      91.99/0.90      92.29/0.90
F3         91.66/0.89      91.67/0.89      92.07/0.90      91.69/0.89      92.24/0.90

The underlined values are the highest accuracy achieved using different feature sets.

Table 2. Overall classification accuracy (OA) and kappa statistic (κ) for the proposed and compared methods applied to data set 2 (F1: coherency; F2: coherency and MP; F3: coherency, MP and DMP).

Features   AELM (OA%/κ)    RS (OA%/κ)      MS (OA%/κ)      nEQB (OA%/κ)    MCLU (OA%/κ)
F1         81.60/0.76      81.96/0.78      82.63/0.77      82.46/0.77      83.51/0.78
F2         85.26/0.81      85.11/0.81      85.43/0.81      84.07/0.80      85.41/0.81
F3         86.86/0.83      84.72/0.81      85.45/0.81      85.66/0.81      85.33/0.81

The underlined values are the highest accuracy achieved using different feature sets.

Table 3. Overall classification accuracy (OA) and kappa statistic (κ) for the proposed and compared methods applied to data set 3 (F1: coherency; F2: coherency and MP; F3: coherency, MP and DMP).

Features   AELM (OA%/κ)    RS (OA%/κ)      MS (OA%/κ)      nEQB (OA%/κ)    MCLU (OA%/κ)
F1         67.69/0.63      67.80/0.63      68.99/0.65      69.42/0.65      68.85/0.65
F2         80.72/0.78      75.45/0.72      76.22/0.73      75.67/0.72      78.79/0.76
F3         77.85/0.75      76.84/0.74      78.88/0.76      79.57/0.77      79.46/0.77

The underlined values are the highest accuracy achieved using different feature sets.

supports for each class are almost equal and there is maximum uncertainty in the class assignment), then this candidate will have the highest ranking score. This can be expressed mathematically as:

Q(x) ≅ min{ Var(μ_1(x), μ_2(x), ..., μ_c(x)) }    (15)
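As an illustration, the minimum-variance ranking of Eq. (15) can be sketched in a few lines; this is a minimal example assuming the supports μ_k(x) are estimated as the fraction of committee votes per class, and the function name `rank_candidates` is ours, not from the paper:

```python
import numpy as np

def rank_candidates(votes, n_classes):
    """Rank unlabeled candidates by the minimum-variance criterion of Eq. (15).

    votes: (n_classifiers, n_samples) array of predicted class labels.
    Returns candidate indices sorted from most to least informative
    (smallest variance of the class supports first).
    """
    # Class supports mu_k(x): fraction of committee votes for each class.
    supports = np.stack(
        [(votes == k).mean(axis=0) for k in range(n_classes)], axis=1
    )  # shape (n_samples, n_classes)
    # Low variance across the supports = near-uniform votes = high uncertainty.
    return np.argsort(supports.var(axis=1))

# Toy committee of 5 classifiers voting on 3 candidates with 3 classes:
votes = np.array([
    [0, 1, 2],
    [0, 1, 0],
    [0, 2, 1],
    [0, 1, 2],
    [0, 0, 1],
])
order = rank_candidates(votes, n_classes=3)
# Candidate 0 gets unanimous votes (high variance of supports, least
# informative); candidate 2 is maximally split (most informative).
```

Here `order` ranks the evenly split candidate first and the unanimously labeled one last, which is exactly the behavior the criterion is designed to produce.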

To correctly exploit this idea, we need to stress that, from a statistical point of view, the larger the number of independent classifiers ζ, the closer the resulting approximation to the real posterior probability P(s|ϑ_k), and the closer to its actual value the support μ_k(x) in Eq. (11). Furthermore, the following equation expresses the accuracy of the majority voting strategy, where ⌊ζ/2⌋ denotes the floor of ζ/2 and ε the accuracy of each base classifier:

Q_majV = Σ_{m=⌊ζ/2⌋+1}^{ζ} C(ζ, m) ε^m (1 − ε)^(ζ−m)    (16)
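Eq. (16) can be checked numerically with a short sketch; the helper name `majority_voting_accuracy` is ours, and ζ and ε appear as `zeta` and `eps`:

```python
import math

def majority_voting_accuracy(zeta, eps):
    """Eq. (16): probability that a majority of zeta independent base
    classifiers, each with individual accuracy eps, votes correctly."""
    return sum(
        math.comb(zeta, m) * eps**m * (1 - eps)**(zeta - m)
        for m in range(zeta // 2 + 1, zeta + 1)
    )

# With eps > 0.5 the ensemble accuracy grows monotonically with zeta:
accs = [majority_voting_accuracy(z, 0.6) for z in (1, 11, 101)]
# accs is increasing and approaches 1 as zeta grows.
```

For a single classifier the expression reduces to its individual accuracy, while enlarging the committee drives the ensemble accuracy toward 1, which is the behavior discussed below.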

It can be noticed that, if ε > 0.5, Q_majV monotonically increases and Q_majV → 1 as ζ → ∞. Hence, the quality criterion Q for the posterior probability estimate depends on the number of classifiers ζ and on the classification accuracy ε. This requires that a considerable number of base classifiers be used; hence, the base classifiers must be capable of fast operation and high generalization performance. As a matter of fact, ELM meets these requirements, exactly like SVM, which has already been widely investigated in AL but has the drawback of being sensitive to kernel parameters. As introduced above, AELM is based on the query-by-committee paradigm, in order to exploit both the advantage of AL in selecting the most informative samples and that of ELM in fast and effective operation. In this approach, a large number of ELMs is considered as the set of base learners, using a Bagging strategy with 100% bootstrap ratio (in this sense, diversity is caused only by the randomness of the input layer weights and biases of each ELM). Unlabeled candidates are ranked according to the minimum variance

criterion defined by Eq. (15); this step is followed by the labeling, by the user, of the candidates with the highest ranking score, which are eventually added to the labeled training set. This process is iteratively repeated until the unlabeled data pool is empty or the training samples are adequate to train the ELM. The pseudo-code of the proposed AELM method can be summarized as follows:

Algorithm 2: AELM
Inputs: initial labeled training set X; unlabeled candidate pool U; number of instances to add at each iteration; number of ELMs L; activation function g(x) in ELM; k-fold cross validation setting used to search for the optimum hidden layer structure.
Repeat:
  1: Divide the training set X into K partitions;
  2: for i = 1 to L do
     2.1: Get the optimum hidden layer structure (number of neurons n_i) of the current ELM_i for the current training set X_i by using the k-fold cross validation strategy;
     2.2: Get ELM_model^i = ELM(X_i, g(x), n_i);
     2.3: Calculate the hidden layer output H_U of U based on Eq. (9), using W and B from ELM_model^i;
     2.4: Get the labels Y_U^i of U by Eq. (9), using H_U and β from ELM_model^i;
     end for
  3: Calculate the supports for all candidates in U based on Y_U = [Y_U^1, Y_U^2, ..., Y_U^L] by Eq. (14) for all classes;
  4: Rank the candidate pool U according to the criterion in Eq. (15);
  5: Select the most interesting candidates S′ from U;
  6: The user assigns labels to the recommended candidates S′;
  7: Update the training set X by adding the labeled S′;
  8: Update the candidate pool U by removing the selected S′;
Until: a stopping criterion is met.
Outputs: labeled set X.
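The committee loop above can be sketched compactly. This is a minimal illustration, not the authors' implementation: the hidden layer size is fixed rather than cross-validated, each ELM is trained on the full labeled set (so diversity comes only from the random input weights and biases, as in the paper), and all function names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_elm(X, y, n_classes, n_hidden=50):
    """Train one ELM: random input weights, closed-form output weights."""
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # sigmoid hidden layer output
    T = np.eye(n_classes)[y]                  # one-hot targets
    beta = np.linalg.pinv(H) @ T              # beta = H^+ T (least squares)
    return W, b, beta

def predict_elm(model, X):
    W, b, beta = model
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return np.argmax(H @ beta, axis=1)

def aelm_select(X_lab, y_lab, X_pool, n_classes, L=25, q=5):
    """One AELM iteration: train a committee of L ELMs and return the q
    pool indices whose class supports have minimum variance (Eq. (15))."""
    votes = np.stack([
        predict_elm(train_elm(X_lab, y_lab, n_classes), X_pool)
        for _ in range(L)
    ])
    supports = np.stack(
        [(votes == k).mean(axis=0) for k in range(n_classes)], axis=1
    )
    return np.argsort(supports.var(axis=1))[:q]

# Synthetic two-class demo: query the q most ambiguous pool samples.
X_lab = np.vstack([rng.normal(-2, 1, (20, 2)), rng.normal(2, 1, (20, 2))])
y_lab = np.array([0] * 20 + [1] * 20)
X_pool = rng.normal(0, 2, (100, 2))
queried = aelm_select(X_lab, y_lab, X_pool, n_classes=2)
```

In a full AELM loop the queried samples would be labeled by the user, appended to `X_lab`/`y_lab`, removed from `X_pool`, and the committee retrained until a stopping criterion is met.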

Experimental data sets

Four quad-polarimetric SAR data sets are used in the experiments. The first and fourth data sets are freely distributed by MDA Geospatial Services Inc. (http://gs.mdacorporation.com), while the second and


Fig. 11. Overall classification accuracy (%) and consumed time (seconds) for the proposed and compared methods applied to data set 4. Different input sets are considered; (a) and (b): coherency; (c) and (d): coherency and MP; (e) and (f): coherency, MP and DMP.

third data sets are provided by the European Space Agency (ESA) (http://earth.eo.esa.int/polsarpro/default.html).

Data set 1 (see Fig. 1a), with a size of 600 × 1200 pixels, was captured by RADARSAT-2 in fine quad-polarization mode on November 11, 2009. The study site is located close to the San Francisco Golden Gate Bridge, and the data set has a spatial resolution of 8 m. Five classes were considered for classification: water, vegetation, built-up area 1, built-up area 2 and built-up area 3 (Fig. 1b). Building areas are divided into built-up areas 1, 2 and 3 according to the backscatter responses, block size and directions, and size of building units, as detailed in Fig. 2 and visible in Figs. 3 and 4.

Data set 2 is a subset of an airborne L-band fully polarized EMISAR image of Foulum (Denmark), acquired on April 17, 1998, with 5 m spatial resolution. It represents a scene including forest, rye, oats, winter wheat and urban areas (Fig. 1c and d).

Data set 3 was obtained by the airborne sensor AirSAR, operating in L-band and flown by NASA JPL over the Flevoland region (The Netherlands) in 1989. The scene shown in Fig. 1e has a spatial resolution of 6.6 m in the slant range direction and 12.10 m in the

azimuth direction, and covers a large agricultural area of flat topography and homogeneous soils. The data set was used for pixel-based classification research by Lee et al. (1994), and the ground truth map shown in Fig. 1f lists 11 classes, consisting of eight different crops, from stem beans to wheat, in addition to bare soil, water and forest.

Finally, data set 4, shown in Fig. 1g, was captured by RADARSAT-2 in fine quad-polarization mode on April 8, 2008, over the same Flevoland area. This data set has a spatial resolution of 8 m, and five classes were considered: water, forest, two types of crops (labeled "farmland 1" and "farmland 2") and urban area (Fig. 1h).

To describe in more detail the relations between polarimetric backscattering responses and the physical properties of the ground targets, Fig. 2 shows a VHR image sample for each of the "built-up 1", "built-up 2" and "built-up 3" classes, while the four backscattering responses from the Yamaguchi decomposition model (Yamaguchi et al., 2005) are shown in Figs. 3 and 4. As expected, different built-up areas show different backscattering responses. For instance, the "built-up 1" and "built-up 2" classes have a stronger backscatter response than the "built-up 3" class with respect to double


Fig. 12. Classification maps for AELM (first row), RS (second row), MS (third row), nEQB (fourth row) and MCLU (fifth row) applied to data set 4. Different input sets are considered; coherence (first column); coherence and MP (second column); coherence, MP and DMP (third column).

bounce and odd bounce (see Fig. 3a and c). Additionally, they show a backscattering response similar to that of vegetation areas for the helix and volume contributions.

Results and discussions

All the experiments were carried out using the Matlab™ software on a Windows 8 64-bit system with an Intel® Core™ i7-3770

CPU and 32 GB of RAM. The total number of ELMs in the committee is set to 100, a sigmoid is used as the activation function, the search range for the optimum hidden layer structure (number of hidden neurons) is set to [10, 500], and a 3-fold cross validation technique is used to select the optimum number of neurons. For the AL methods used for comparison, i.e., margin sampling, random sampling, multi-class level uncertainty and entropy query-by-bagging, SVMs with RBF kernels were used, and a grid search using 3-fold cross


Table 4
Overall classification accuracy (OA) and kappa statistics (κ) for the proposed and compared methods applied to data set 4 (F1: coherence; F2: coherence and MP; F3: coherence, MP and DMP).

Method   F1 OA (%)   F1 κ    F2 OA (%)   F2 κ    F3 OA (%)   F3 κ
AELM     79.89       0.73    82.88       0.77    82.95       0.77
RS       80.22       0.73    82.68       0.77    82.94       0.77
MS       78.52       0.72    82.96       0.77    84.14       0.79
EQB      80.58       0.74    83.53       0.78    84.3        0.79
MCLU     78.61       0.72    82.85       0.77    84.07       0.79

The underlined values are the highest accuracy achieved using different feature sets.

validation was implemented for optimal parameter selection. Additionally, the number of bags in entropy query-by-bagging is set to 4, while its bootstrap sampling ratio is set to 0.63, considered the "golden ratio" in sampling-based EL paradigms. Moreover, in all tests 10 pixels from each class are randomly selected as initial training samples, and the 10 unlabeled pixels with the highest ranking score are added to the labeled training set at each iteration. To compute MP and DMP, the structuring elements are set to be disks with diameters increasing from 3 to 9 pixels. Finally, the overall accuracy (OA), kappa coefficient (κ) and CPU time consumption (in seconds) were used to evaluate the performance of the considered methods.

Results on data set 1 (San Francisco)

Fig. 5 shows the overall accuracies and computational costs during the learning phase for all the considered methods with different input sets, while Fig. 6 shows the corresponding classification maps with overall accuracy values (in percentage). Table 1 reports the overall classification accuracy (OA) and kappa statistics (κ). AELM performs better than multi-class level uncertainty, random sampling, normalized entropy query-by-bagging and margin sampling when using the polarimetric coherency features alone. After adding the spatial features MP and DMP, the overall accuracy increases (from 71% to 92%), especially for multi-class level uncertainty, normalized entropy query-by-bagging, margin sampling and AELM. Although the accuracy of AELM is slightly lower than that of margin sampling, multi-class level uncertainty and normalized entropy query-by-bagging, its computational cost is always much lower than that of all other methods: each training phase takes less than 25 s for AELM, while the other approaches take at least 40 s.

Results on data set 2

Fig. 7 shows the overall accuracies and computational costs during the learning phase for all the considered methods with different input sets, while Fig. 8 shows the corresponding classification maps with overall accuracy values (in percentage). Table 2 reports the overall classification accuracy (OA) and kappa statistics (κ). It has to be noted that AELM shows the lowest overall accuracy of the five methods on the polarimetric coherence features alone. However, AELM performs best after adding the MP spatial features (OA larger than 85%), and shows a similar performance after adding DMP. In comparison with the SVM-based methods, AELM is the most efficient one, with at least twice the speed of the other approaches.

Results on data set 3

Fig. 9 shows the overall accuracies and computational costs during the learning phase for all the considered methods with different input sets, while Fig. 10 shows the corresponding classification maps with overall accuracy values (in percentage). Table 3 shows

the overall classification accuracy (OA) and kappa statistics (κ). Several interesting conclusions can be drawn from these figures and tables: (1) for the coherence features, AELM achieves the same overall classification accuracy as multi-class level uncertainty and random sampling, while margin sampling and normalized entropy query-by-bagging perform better; (2) AELM shows the best overall classification accuracy after adding the MP spatial features (from 67% to 80%), while normalized entropy query-by-bagging shows the worst performance when considering both coherency and MP; (3) the computational costs of all considered methods grow with the training sample size, but AELM has the fastest training speed when considering coherency and spatial features together.

Results on data set 4

Finally, results for data set 4 using polarimetric features and polarimetric-spatial features are shown in Figs. 11 and 12 and Table 4. It must be noted that AELM performs better than multi-class level uncertainty and margin sampling using the polarimetric coherency alone, but worse than entropy query-by-bagging and random sampling. After adding spatial features, its overall classification accuracy increases (from 78% to 83%), although it remains slightly lower than that of the other methods. On the other hand, AELM is consistently faster than the others (less than 15 s, while the other methods take at least 30 s).

Conclusions

In this paper a new method, called active extreme learning machine (AELM), is proposed for polarimetric SAR image classification by combining active learning with extreme learning machines and integrating polarimetric and spatial features. To validate the performance of the proposed method, four full-polarimetric SAR images collected by spaceborne (RADARSAT-2) and airborne (AirSAR and EMISAR) sensors were considered.
Moreover, state-of-the-art active learning algorithms such as margin sampling, normalized entropy query-by-bagging and multi-class level uncertainty were used for comparison and validation purposes. According to the experimental results, the proposed AELM method shows strong classification performance in terms of overall accuracy, in some cases better than that of the other AL algorithms. In general, spatial features (morphological profiles and differential morphological profiles) improve the discrimination capability for some of the classes with respect to using coherence only, demonstrating in our opinion the advantage of combining spatial-polarimetric features. Finally, by using Bagging and ensemble learning to estimate the multi-class uncertainty instead of calculating the posterior probability function, the proposed AELM technique outperforms the other approaches in terms of speed and efficiency in all the tests.


Acknowledgements

The research is partially supported by the National Natural Science Foundation of China under Grant No. 41171323, Jiangsu Provincial Natural Science Foundation under Grant No. BK2012018, and Key Laboratory of Geo-Informatics of National Administration of Surveying, Mapping and Geoinformation of China under Grant No. 201109. This work was performed while A. Samat was visiting the TLC & RS Lab of the University of Pavia under a grant by the CARIPLO Foundation, Project 2009-2936.

References

Ainsworth, T.L., Kelly, J.P., Lee, J.S., 2009. Classification comparisons between dual-pol, compact polarimetric and quad-pol SAR imagery. ISPRS J. Photogr. Remote Sens. 64 (5), 464–471.
Benediktsson, J.A., Palmason, J.A., Sveinsson, J.R., 2005. Classification of hyperspectral data from urban areas based on extended morphological profiles. IEEE Trans. Geosci. Remote Sens. 43 (3), 480–491.
Breiman, L., 1996. Bagging predictors. Mach. Learn. 24 (2), 123–140.
Cloude, S.R., Pottier, E., 1996. A review of target decomposition theorems in radar polarimetry. IEEE Trans. Geosci. Remote Sens. 34 (2), 498–518.
Conradsen, K., Nielsen, A.A., Schou, J., Skriver, H., 2003. A test statistic in the complex Wishart distribution and its application to change detection in polarimetric SAR data. IEEE Trans. Geosci. Remote Sens. 41 (1), 4–19.
Copa, L., Tuia, D., Volpi, M., Kanevski, M., 2010. Unbiased query-by-bagging active learning for VHR image classification. Proc. SPIE 7830, Image and Signal Processing for Remote Sensing XVI, 78300K.
Dalla Mura, M., Atli Benediktsson, J., Waske, B., Bruzzone, L., 2010. Morphological attribute profiles for the analysis of very high resolution images. IEEE Trans. Geosci. Remote Sens. 48 (10), 3747–3762.
Demir, B., Persello, C., Bruzzone, L., 2011. Batch-mode active-learning methods for the interactive classification of remote sensing images. IEEE Trans. Geosci. Remote Sens. 49 (3), 1014–1031.
Doulgeris, A.P., Anfinsen, S.N., Eltoft, T., 2008. Classification with a non-Gaussian model for PolSAR data. IEEE Trans. Geosci. Remote Sens. 46 (10), 2999–3009.
Feng, G., Huang, G.B., Lin, Q., Gay, R., 2009. Error minimized extreme learning machine with growth of hidden nodes and incremental learning. IEEE Trans. Neural Netw. 20 (8), 1352–1357.
Ferro-Famil, L., Pottier, E., Lee, J.S., 2001. Unsupervised classification of multifrequency and fully polarimetric SAR images based on the H/A/Alpha-Wishart classifier. IEEE Trans. Geosci. Remote Sens. 39 (11), 2332–2342.
Freund, Y., Seung, H.S., Shamir, E., Tishby, N., 1997. Selective sampling using the query by committee algorithm. Mach. Learn. 28 (2–3), 133–168.
He, C., Li, S., Liao, Z., Liao, M., 2013. Texture classification of PolSAR data based on sparse coding of wavelet polarization textons. IEEE Trans. Geosci. Remote Sens. 51 (8), 4576–4590.
Huang, G.B., Chen, L., Siew, C.K., 2006a. Universal approximation using incremental constructive feedforward networks with random hidden nodes. IEEE Trans. Neural Netw. 17 (4), 879–892.
Huang, G.B., Zhu, Q.Y., Siew, C.K., 2006b. Extreme learning machine: theory and applications. Neurocomputing 70, 489–501.
Huang, G.B., Zhou, H., Ding, X., Zhang, R., 2012. Extreme learning machine for regression and multiclass classification. IEEE Trans. Syst. Man Cybern. B: Cybern. 42 (2), 513–529.
Kong, J.A., Swartz, A.A., Yueh, H.A., Novak, L.M., Shin, R.T., 1988. Identification of terrain cover using the optimum polarimetric classifier. J. Electromagn. Waves Appl. 2 (2), 171–194.
Kuncheva, L.I., 2004. Combining Pattern Classifiers: Methods and Algorithms. John Wiley & Sons, New Jersey, pp. 123–148.
Lam, L., Suen, S.Y., 1997. Application of majority voting to pattern recognition: an analysis of its behavior and performance. IEEE Trans. Syst. Man Cybern. 27 (5), 553–568.
Lardeux, C., Frison, P.L., Tison, C., Souyris, J.C., Stoll, B., Fruneau, B., Rudant, J.P., 2009. Support vector machine for multifrequency SAR polarimetric data classification. IEEE Trans. Geosci. Remote Sens. 47 (12), 4143–4152.
Lee, J.S., Grunes, M.R., Kwok, R., 1994. Classification of multi-look polarimetric SAR imagery based on complex Wishart distribution. Int. J. Remote Sens. 15 (11), 2299–2311.
Leskes, B., 2005. The value of agreement: a new boosting algorithm. In: Learning Theory. Springer Berlin Heidelberg, pp. 95–110.
Lonnqvist, A., Rauste, Y., Molinier, M., Hame, T., 2010. Polarimetric SAR data in land cover mapping in boreal zone. IEEE Trans. Geosci. Remote Sens. 48 (10), 3652–3662.


Luo, T., Kramer, K., Goldgof, D.B., Hall, L.O., Samson, S., Remsen, A., Hopkins, T., 2004. Active learning to recognize multiple types of plankton. In: IEEE Proceedings of the 17th International Conference on Pattern Recognition, 3, pp. 478–481.
Maghsoudi, Y., Collins, M., Leckie, D.G., 2011. Nonparametric feature selection and support vector machine for polarimetric SAR data classification. In: IEEE International Geoscience and Remote Sensing Symposium (IGARSS), pp. 2857–2860.
McNairn, H., Shang, J., Jiao, X., Champagne, C., 2009. The contribution of ALOS PALSAR multipolarization and polarimetric data to crop classification. IEEE Trans. Geosci. Remote Sens. 47 (12), 3981–3992.
Minhas, R., Baradarani, A., Seifzadeh, S., Jonathan Wu, Q.M., 2010. Human action recognition using extreme learning machine based on visual vocabularies. Neurocomputing 73 (10), 1906–1917.
Mitra, P., Murthy, C.A., Pal, S.K., 2004. A probabilistic active support vector learning algorithm. IEEE Trans. Pattern Anal. Mach. Intell. 26 (3), 413–418.
Pal, M., 2009. Extreme learning machine based land cover classification. Int. J. Remote Sens. 30 (14), 3835–3841.
Pal, M., Maxwell, A.E., Warner, T.A., 2013. Kernel-based extreme learning machine for remote-sensing image classification. Remote Sens. Lett. 4 (9), 853–862.
Paladini, R., Ferro Famil, L., Pottier, E., Martorella, M., Berizzi, F., Dalle Mese, E., 2013. Point target classification via fast lossless and sufficient – invariant decomposition of high-resolution and fully polarimetric SAR/ISAR data. Proc. IEEE 101 (3), 798–830.
Pasolli, E., Melgani, F., Bazi, Y., 2011. Support vector machine active learning through significance space construction. IEEE Geosci. Remote Sens. Lett. 8 (3), 431–435.
Pesaresi, M., Benediktsson, J.A., 2001. A new approach for the morphological segmentation of high-resolution satellite imagery. IEEE Trans. Geosci. Remote Sens. 39 (2), 309–320.
Qi, Z., Yeh, A.G.O., Li, X., Lin, Z., 2012. A novel algorithm for land use and land cover classification using RADARSAT-2 polarimetric SAR data. Remote Sens. Environ. 118, 21–39.
Rajan, S., Ghosh, J., Crawford, M.M., 2008. An active learning approach to hyperspectral data classification. IEEE Trans. Geosci. Remote Sens. 46 (4), 1231–1242.
Rignot, E., Chellappa, R., Dubois, P., 1992. Unsupervised segmentation of polarimetric SAR data using the covariance matrix. IEEE Trans. Geosci. Remote Sens. 30 (4), 697–705.
Roy, N., McCallum, A., 2001. Toward Optimal Active Learning Through Monte Carlo Estimation of Error Reduction. ICML, Williamstown, pp. 441–448.
Samat, A., Du, P., Liu, S., Li, J., Cheng, L., 2014a. E2LMs: ensemble extreme learning machines for hyperspectral image classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 7 (4), 1060–1069.
Samat, A., Du, P., Baig, M.H.A., Chakravarty, S., Cheng, L., 2014b. Ensemble learning with multiple classifiers and polarimetric features for polarized SAR image classification. Photogr. Eng. Remote Sens. 80 (3), 239–251.
Schapire, R.E., Freund, Y., Bartlett, P., Lee, W.S., 1998. Boosting the margin: a new explanation for the effectiveness of voting methods. Ann. Stat. 26 (5), 1651–1686.
Seung, H.S., Opper, M., Sompolinsky, H., 1992. Query by committee. In: Proceedings of the Fifth Annual Workshop on Computational Learning Theory. ACM, pp. 287–294.
Soille, P., Pesaresi, M., 2002. Advances in mathematical morphology applied to geoscience and remote sensing. IEEE Trans. Geosci. Remote Sens. 40 (9), 2042–2055.
Soria-Olivas, E., Gomez-Sanchis, J., Martin, J.D., Vila-Frances, J., Martinez, M., Magdalena, J.R., Serrano, A.J., 2011. BELM: Bayesian extreme learning machine. IEEE Trans. Neural Netw. 22 (3), 505–509.
Suresh, S., Venkatesh Babu, R., Kim, H.J., 2009. No-reference image quality assessment using modified extreme learning machine classifier. Appl. Soft Comput. 9 (2), 541–552.
Tong, S., Koller, D., 2002. Support vector machine active learning with applications to text classification. J. Mach. Learn. Res. 2, 45–66.
Tuia, D., Ratle, F., Pacifici, F., Kanevski, M.F., Emery, W.J., 2009. Active learning methods for remote sensing image classification. IEEE Trans. Geosci. Remote Sens. 47 (7), 2218–2232.
Tuia, D., Volpi, M., Copa, L., Kanevski, M., Munoz-Mari, J., 2011a. A survey of active learning algorithms for supervised remote sensing image classification. IEEE J. Sel. Top. Signal Process. 5 (3), 606–617.
Tuia, D., Pasolli, E., Emery, W.J., 2011b. Using active learning to adapt remote sensing image classifiers. Remote Sens. Environ. 115 (9), 2232–2242.
van Zyl, J.J., Arii, M., Kim, Y., 2011. Model-based decomposition of polarimetric SAR covariance matrices constrained for nonnegative eigenvalues. IEEE Trans. Geosci. Remote Sens. 49 (9), 3452–3459.
Waske, B., Braun, M., 2009. Classifier ensembles for land cover mapping using multitemporal SAR imagery. ISPRS J. Photogr. Remote Sens. 64 (5), 450–457.
Yamaguchi, Y., Moriyama, T., Ishido, M., Yamada, H., 2005. Four-component scattering model for polarimetric SAR image decomposition. IEEE Trans. Geosci. Remote Sens. 43 (8), 1699–1706.
Zhou, Z.H., 2012. Ensemble Methods: Foundations and Algorithms. CRC Press, Boca Raton.