Neurocomputing 108 (2013) 103–110
Rare signal component extraction based on kernel methods for anomaly detection in hyperspectral imagery

Yanfeng Gu a,*, Lin Zhang b

a School of Electronics and Information Engineering, Harbin Institute of Technology, Harbin, China
b Division of Adults and Graduates Studies, Eastern Nazarene College, MA, USA

* Corresponding author. Tel.: +86 451 86403020; fax: +86 451 86413583. E-mail address: [email protected] (Y. Gu).
Article history: Received 6 September 2012; received in revised form 25 October 2012; accepted 12 November 2012; available online 3 January 2013. Communicated by Lixian Zhang.

Keywords: Hyperspectral imagery; anomaly detection; independent component analysis; kernel methods (KM); high-order statistics; singular value decomposition (SVD).

Abstract

Anomaly detection is one of the hot research topics in hyperspectral remote sensing. For this task, the RX detector (RXD) is a benchmark method. Unfortunately, the Gaussian distribution assumption adopted by RXD cannot be well satisfied in hyperspectral images because of the high dimensionality of the data and the complicated correlation between spectral bands. In this paper, we address this problem and propose an algorithm called rare signal component extraction (RSCE), which aims at finding a subspace where the Gaussian assumption is well obeyed and thereby improving the detection performance of RXD. The RSCE algorithm first utilizes kernel singular value decomposition (KSVD) to construct a kernel-based whitening operator and then carries out kernel-based whitening on the hyperspectral data. After that, the RSCE algorithm extracts and determines a singular signal subspace by means of independent component analysis in a reproducing kernel Hilbert space (RKHS) together with a singularity measure. Numerical experiments were conducted on two real hyperspectral datasets. The experimental results show that the proposed RSCE algorithm greatly improves the detection performance of RXD and outperforms other state-of-the-art methods.
1. Introduction

Hyperspectral imaging provides fine spectral resolution and therefore a good description of ground materials. Owing to this merit, anomaly detection, as a class of crucial techniques promoting rapid application of hyperspectral images, has recently become a hot topic. In most natural scenes, the detection of rare and small targets is a relevant issue because such targets are usually man-made. In hyperspectral imagery, anomaly detection aims at finding rare pixels whose spectral signatures differ from those of the surrounding background clutter. The main characteristic of anomaly detection is that it carries out detection without a priori spectra of the target materials. In addition, it is generally assumed that the rare anomalies are embedded within the background clutter with a very low signal-to-clutter ratio (SCR); that is, the anomalies are mixed with the background clutter even at the subpixel level [8–10,13–16,18,20,21]. The RX detector (RXD), proposed by Reed and Yu [19], has been considered a benchmark method for anomaly detection in multispectral and hyperspectral imaging. RXD is derived from the generalized likelihood ratio test (GLRT) and is an adaptive detector with a constant false alarm rate (CFAR), which is exactly what we would pursue
in anomaly detection. In RXD, it is assumed that both the spectra of the anomalous targets and the covariance of the background clutter are unknown. Furthermore, the background clutter is assumed to obey a unimodal Gaussian distribution. Thus, the Mahalanobis distance between the pixel under test and the background mean vector is compared to a threshold to detect anomalies, while the mean vector and covariance matrix of the background clutter are estimated from the pixels surrounding the pixel under test. After RXD was proposed, a series of modified RXD-based detectors were investigated. For example, Chang et al. proposed a number of modified RXDs, such as the normalized RXD (NRXD) and the correlation-matrix-based NRXD (CNRXD) [3]. Actually, anomaly detection in hyperspectral images can be thought of as the design of a filter that extracts anomalies of interest from discrete spatial–spectral signals, and there are important factors to consider, such as the rationality and stability of the model and nonlinear physical phenomena [24–26]. When these detectors are applied to hyperspectral data, two main factors limit the performance of RXD and its modifications, namely the rationality of the Gaussian assumption and the high dimensionality of the data. First, the better the assumption of unimodal Gaussian background clutter is satisfied, the more effective RXD is. However, the unimodal Gaussian assumption generally cannot be satisfied in real cases because the background clutters are mixed with each other or with anomalies at the pixel or subpixel scale, which leads to poor false alarm performance [14]. If there are multiple classes of
background clutter in the test region, it is even more difficult to properly represent the underlying distribution by a unimodal Gaussian distribution [2]. To overcome this limitation, researchers [22] proposed a linear mixture of Gaussian models to more properly characterize nonhomogeneous multicomponent scenes. The parameters of the Gaussian mixture model can be estimated using the stochastic expectation maximization (SEM) approach [17]. However, this approach is badly limited by the requirement of knowing the number of Gaussian components. Second, the high dimensionality of hyperspectral images is another factor that severely limits the performance of the conventional RXD, since it makes covariance estimation of the background clutter more difficult and unreliable. Principal component analysis (PCA), one of the most common feature extraction and dimensionality reduction algorithms in pattern classification, can be used to preprocess the hyperspectral images and is helpful for improving RXD. Furthermore, several independent component analysis (ICA)-based algorithms have been proposed [7]. Compared with PCA, the merit of ICA stems from its use of high-order statistics. By means of ICA, those algorithms build an anomaly detection map by combining the thresholded independent components (ICs) with high kurtosis. Unfortunately, for hyperspectral images the high dimensionality implies complicated correlation between spectral bands [4]. As a result, linear algorithms such as PCA and ICA are not always satisfactory for detecting anomalies in hyperspectral images. Along with the wide application of kernel methods (KM), nonlinear detection algorithms have also been developed in hyperspectral remote sensing. In kernel-based detectors, the high-order correlation between spectral bands is implicitly exploited through a kernel function. In this way, one is able to capture the high-order correlation structure of the input data and obtain nonlinear decision boundaries. Kwon and Nasrabadi proposed a nonlinear kernelized version of the conventional RXD by extending RXD from the original input space to a high-dimensional feature space over the whole dimensionality of the input data [12]. This detection algorithm, called kernel RX, demonstrated superior performance over the conventional RXD on real hyperspectral data. Kernel RX is a fully kernelized algorithm in feature space; the kernelization improves the separability of anomalies of interest from background clutter, but the different roles of the kernelized components in feature space are not distinguished. Banerjee et al. proposed a method for anomaly detection in hyperspectral images based on support vector data description (SVDD), which is a kernel method for modeling the support of a distribution [2]. SVDD-based detection is a nonparametric algorithm that possesses several attractive merits, such as sparsity, good generalization, and the use of kernels. By using a kernel to model the support of nontrivial multimodal distributions, the SVDD-based algorithm avoids the obstacle brought by the unimodal Gaussian assumption. In fact, the SVDD-based algorithm models the probability density function (PDF) through a one-class support vector machine (SVM). For this reason, there is no guarantee that all local samples come from the same single background clutter rather than from multiple background clutters.
As a consequence, the detection performance of SVDD-based detection declines when the number of training samples is inadequate or when multiple background clutters surround the pixel under test. Recently, Gu et al. [7] proposed a selective kernel principal component analysis (SKPCA)-based feature extraction algorithm for anomaly detection in hyperspectral images, in which KPCA is used to capture the higher-order correlation structure of the input data and local kurtosis is used as a measure to select the most singular component for anomaly detection. In fact, the data variance is only weakly affected by the rare anomalous signals and is mainly controlled by the background clutter. Therefore, techniques such as PCA and KPCA estimate a signal subspace that mostly addresses the background clutter and ignores the presence of rare target pixels [1].
In this paper, we investigate anomaly detection in hyperspectral images and present a rare signal component extraction algorithm, called RSCE, to improve the detection performance of RXD. In the RSCE algorithm, kernel-based singular value decomposition (KSVD) is utilized to build a kernel subspace for the background clutter and to capture the nonlinear higher-order correlation of the input hyperspectral images. Then, a kernel whitening operator derived from this kernel subspace is applied to the original data. After that, ICA is performed to obtain kernel-based nonlinear ICs. Finally, the rare signal component (RSC) is found among the kernel-based ICs by means of a singularity measure. Accordingly, the main contributions of this paper are as follows: (1) the construction of a KSVD-based whitening operator, which effectively reduces the data dimensionality and rapidly captures the nonlinear higher-order correlation of the input hyperspectral dataset; (2) the separable extraction of kernel-based ICs, which is realized by KSVD and linear FastICA and overcomes the computational limitations of a direct kernel independent component analysis (KICA); and (3) the determination of the RSC from the kernel ICs by a singularity measure, which makes the unimodal Gaussian assumption reasonable and is more suitable for rare signal detection than kernel-based principal components (KPCs). The rest of this paper is organized as follows. In Section 2, we provide a brief introduction to RXD, which implements the final detection in the RSCE algorithm. In Section 3, we describe the RSCE algorithm in detail, including its key procedures: KSVD-based whitening, IC extraction, and singularity measures for the RSC; an outline of the realization of the RSCE algorithm is also given. Experiments and result analyses on two real hyperspectral datasets are presented in Section 4, and conclusions are drawn in Section 5.
2. RX algorithm

In the conventional RXD, the two competing hypotheses to be distinguished are given by

$H_0: \mathbf{z} = \mathbf{n}$ (target absent), $H_1: \mathbf{z} = a\mathbf{t} + \mathbf{n}$ (target present)   (1)

where $\mathbf{z}$ is the input spectral signal to be detected, consisting of $l$ spectral bands, $\mathbf{z} = [z_1\ z_2\ \cdots\ z_l]^{T}$; $a > 0$; $\mathbf{n}$ is a vector that represents the background clutter and noise process; and $\mathbf{t}$ is the spectral signature of the anomalous target, $\mathbf{t} = [t_1\ t_2\ \cdots\ t_l]^{T}$. In RXD, the target signature $\mathbf{t}$ and the background covariance $G_B$ are assumed to be unknown. The model assumes that the data arise from two normal probability density functions with the same covariance matrix but different means: under $H_0$ the data (background clutter) are modeled as $N(0, G_B)$, and under $H_1$ they are modeled as $N(\mathbf{t}, G_B)$. In most cases, a local unimodal Gaussian distribution is adopted in the conventional RXD. Generally, the mean and covariance of the background clutter are estimated within a double concentric window. The inner window protects possibly anomalous information, and its size should equal the size of the target of interest in the test scene. The spectral samples in the outer window, in contrast, are used to estimate the mean $\hat{\mu}_B$ and covariance $\hat{G}_B$ of the background clutter. Based on the description above, the conventional RXD is expressed as

$\mathrm{RX}(\mathbf{z}) = (\mathbf{z}-\hat{\mu}_B)^{T}\,\hat{G}_B^{-1}\,(\mathbf{z}-\hat{\mu}_B)$   (2)

where $\hat{G}_B$ is the background covariance matrix estimated from the reference data of the background clutter in the outer window, and $\hat{\mu}_B$ is the estimated sample mean of the background clutter. Let $\eta$ be a threshold: if $\mathrm{RX}(\mathbf{z}) \geq \eta$, the target-present hypothesis is adopted; otherwise, the target-absent hypothesis is adopted.
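To make the detector concrete, the sketch below (Python/NumPy, our own illustrative code rather than the authors' implementation) evaluates Eq. (2) with a double concentric window: the background mean and covariance are estimated from the outer window while the inner guard window is excluded. The window half-widths `inner` and `outer` and the diagonal loading `eps` are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def local_rx(cube, inner=3, outer=9, eps=1e-6):
    """Local RX detector of Eq. (2) with a double concentric window.

    cube  : (rows, cols, bands) hyperspectral image.
    inner : half-width of the guard window protecting possible targets.
    outer : half-width of the outer window used for background statistics.
    eps   : small diagonal loading to keep the covariance invertible.
    (Window sizes and eps are illustrative choices, not values from the paper.)
    """
    rows, cols, bands = cube.shape
    scores = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            r0, r1 = max(0, i - outer), min(rows, i + outer + 1)
            c0, c1 = max(0, j - outer), min(cols, j + outer + 1)
            # mask out the inner (guard) window inside the outer window
            mask = np.ones((r1 - r0, c1 - c0), dtype=bool)
            g0, g1 = max(0, i - inner), min(rows, i + inner + 1)
            h0, h1 = max(0, j - inner), min(cols, j + inner + 1)
            mask[g0 - r0:g1 - r0, h0 - c0:h1 - c0] = False
            bg = cube[r0:r1, c0:c1, :][mask]           # background samples
            mu = bg.mean(axis=0)                       # estimated background mean
            cov = np.cov(bg, rowvar=False) + eps * np.eye(bands)
            diff = cube[i, j, :] - mu
            scores[i, j] = diff @ np.linalg.solve(cov, diff)   # Mahalanobis distance
    return scores   # compare against a threshold to declare targets
```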
3. Rare signal component extraction
In this paper, the rare signal component is defined as a signal subspace composed only of the information of the rare signal of interest and noise. The procedure of the proposed RSCE algorithm is shown in Fig. 1. First, KSVD is performed on the input hyperspectral data, which are centered and normalized beforehand. In this way, one obtains a nonlinear kernel prewhitening operator, which is the key factor in the ICA model. Then, ICA is used to transform the data and mine independent components (ICs), which here are nonlinear kernel independent components (KICs). After that, we search for the rare signal component (RSC) by calculating and comparing a singularity measure for each KIC. The RSC extracted from the hyperspectral data then serves as the input of the final RXD-based anomaly detection.

Fig. 1. The flowchart of the proposed RSCE algorithm: data preprocessing (centering and normalizing), kernel SVD and calculation of the kernel prewhitening matrix, calculation of VD, separation of KICs by FastICA, search for the RSC by calculating the singularity of each KIC, and output of the RSC for anomaly detection.

3.1. Subspace model

Let $H$ be an original hyperspectral data cube and $X^{(i)}$ be the $i$th spectral band of size $n\times m$, so the hyperspectral data can be denoted by $H = \{X^{(i)}\,|\,X^{(i)}\in R^{n\times m},\ i=1,2,\ldots,l\}$. For convenience, let $X_{3D}$ be the 3D matrix form of $H$, denoted by $X_{3D} = [X^{(1)}, X^{(2)}, \ldots, X^{(l)}]$, $X_{3D}\in R^{l\times n\times m}$. Furthermore, we introduce a vectorization operator that orders a matrix into a vector, e.g., $\mathrm{vec}\!\left(\begin{smallmatrix}1&2\\3&4\end{smallmatrix}\right) = [1\ 2\ 3\ 4]^{T}$. Using this operator, we obtain the 2D matrix form of the hyperspectral data $H$ and write it as $X_{2D} = [x_1\ x_2\ \cdots\ x_l]^{T}$, $X_{2D}\in R^{l\times d}$, where $d = n\cdot m$ and $x_i = [x_{i1}\ x_{i2}\ \cdots\ x_{id}]$. So now we have $l$ samples (signals or spectral images) with dimensionality $d$, i.e., $\{x_i\}_{i=1}^{l}$. The $l$ samples should be centered and normalized before the following processing.

To carry out KSVD, the sample set $\{x_i\}_{i=1}^{l}$ is first mapped into a feature space $F$ by means of a mapping $\phi: x\in X \rightarrow \phi(x)\in F$. The mapped samples in feature space are denoted by $P = [\phi(x_1)\ \phi(x_2)\ \cdots\ \phi(x_l)]^{T}$. In the feature space $F$, we define an inner product as follows:

$k(x_i, x_j) = \phi(x_i)\cdot\phi(x_j)$   (3)

where $k(\cdot,\cdot)$ is a kernel function that must satisfy the Mercer condition, so that $\phi$ belongs to a function space with the structure of a so-called reproducing kernel Hilbert space (RKHS). Now consider the kernel matrix

$K = PP^{T} = [\phi(x_1)\ \phi(x_2)\ \cdots\ \phi(x_l)]^{T}\,[\phi(x_1)\ \phi(x_2)\ \cdots\ \phi(x_l)]$   (4)

where $K$ is symmetric and positive semidefinite (PSD), with eigenvalue decomposition $K = QDQ^{T}$. Here, $Q$ is composed of the eigenvectors of $K$ and $D$ is a diagonal matrix containing the eigenvalues of $K$. Generally speaking, the singular value decomposition (SVD) of the mapped dataset $P$ can be written as

$P = U\Sigma V^{T}$   (5)

where $U$ and $V$ are real orthogonal (unitary) matrices and $\Sigma$ is a rectangular diagonal matrix with nonnegative real numbers on the diagonal. Without loss of generality, the singular values can be assumed to be sorted in decreasing order. In the case of KSVD, one cannot directly determine the number $M$ of non-zero singular values, since it depends on the mapping (it is infinite for the Gaussian RBF kernel) and is related to the size of the dataset. Assuming there are at most $M$ non-zero singular values, the number of columns of $U$ and $V$ is $M$, and the problem becomes how to compute the subspace model $U$ of the mapped data. Consider a projection onto an $r$-dimensional subspace, i.e., the matrix $U$ is composed of $r$ ($r<M$) leading columns:

$U = [U^{(r)}\ U^{(M-r)}]$   (6)

As noted above, the kernel matrix $K$ is an $l\times l$ symmetric PSD matrix under the assumption on the number of non-zero singular values, so the matrix $U^{(r)}$ can be obtained by computing the eigenvalue decomposition of $K$ and is given by

$U^{(r)} = P^{T} Q^{(r)} (D^{(r)})^{-1/2}$   (7)
Generally, the mapped samples $\phi(x_i)$ need to be centered in $F$. This can be realized by substituting $K_c$ for $K$, i.e.,

$K_c = K - 1_l K - K 1_l + 1_l K 1_l$   (8)

where $1_l$ is an $l\times l$ matrix whose elements are all equal to $1/l$. Afterwards, the projection and the centering operations in the new subspace $U^{(r)}$ can easily be integrated, so the result of the projection can be written as

$Y = (U^{(r)})^{T} P^{T} = (D^{(r)})^{-1/2} (Q^{(r)})^{T} K_c$   (9)

Here, $Y$ is the projected data and $Y\in R^{r\times d}$.
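As a concrete illustration of this whitening step, the sketch below (Python/NumPy, our own reading of Eqs. (4), (8) and (9) rather than the authors' code) builds the Gaussian RBF kernel matrix of the band images, centers it, and forms the operator $(D^{(r)})^{-1/2}(Q^{(r)})^{T}$ from its $r$ leading eigenpairs. Following Eq. (9) this operator is applied to $K_c$; applying the same operator to the centered band matrix $X_{2D}$ is our assumption for obtaining the $r$ pixel-indexed signals passed to FastICA in the next subsection. The kernel width `sigma` and the dimension `r` (the VD estimate) are treated as given inputs.

```python
import numpy as np

def ksvd_prewhitening(X2D, r, sigma):
    """KSVD-based prewhitening, a sketch of Eqs. (4), (8) and (9).

    X2D   : (l, d) matrix whose rows are the centered, normalized band images.
    r     : subspace dimension (the VD estimate in the paper).
    sigma : Gaussian RBF width (determined experimentally in the paper).
    """
    l = X2D.shape[0]
    # Eq. (4): Gaussian RBF kernel matrix between the l band images
    sq = np.sum(X2D ** 2, axis=1)
    K = np.exp(-(sq[:, None] + sq[None, :] - 2.0 * X2D @ X2D.T) / (2.0 * sigma ** 2))
    # Eq. (8): centering in feature space
    ones_l = np.full((l, l), 1.0 / l)
    Kc = K - ones_l @ K - K @ ones_l + ones_l @ K @ ones_l
    # Eigendecomposition Kc = Q D Q^T; keep the r leading eigenpairs
    d_vals, Q = np.linalg.eigh(Kc)
    idx = np.argsort(d_vals)[::-1][:r]
    D_r, Q_r = d_vals[idx], Q[:, idx]
    # Whitening operator (D^(r))^{-1/2} (Q^(r))^T used in Eq. (9)
    W_k = np.diag(1.0 / np.sqrt(np.clip(D_r, 1e-12, None))) @ Q_r.T
    Y_proj = W_k @ Kc    # projection exactly as written in Eq. (9)
    Y_pix = W_k @ X2D    # same operator applied to the band matrix (our assumption)
    return W_k, Y_proj, Y_pix
```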
3.2. ICA

ICA performs a linear decomposition of the observations to obtain a set of statistically independent components. Let $Y$ be a matrix composed of $r$ observations, denoted by $Y = [y_1\ y_2\ \cdots\ y_r]^{T}$, and let $S$ be another matrix containing $r$ unknown signal sources, denoted by $S = [s_1\ s_2\ \cdots\ s_r]^{T}$. The ICA mixture model is given by

$Y = AS$   (10)

where $A$ is a full-rank matrix of size $r\times r$, called the mixing matrix.
The goal of ICA is to estimate each unknown independent signal source $s_i$ and the mixing matrix $A$ using only the observations $Y$, subject to the constraint that the independent signal sources obey non-Gaussian distributions (at most one source may be Gaussian). We assume that the estimate of the independent signal sources can be obtained from $Y$ through a linear transform matrix $W$, so we have

$\hat{S} = WY$   (11)

Substituting Eq. (10) for $Y$ in Eq. (11), we have

$\hat{S} = WAS$   (12)

If the estimate of the signal sources in Eq. (12) is valid, the constraint

$WA = I$   (13)

must hold, where $I$ is the $r\times r$ identity matrix. It can be seen that $W$ is an unmixing matrix related to $A$. Non-Gaussianity maximization is frequently used to measure statistical independence; by maximizing non-Gaussianity under the constraint in Eq. (13), the ICA mixture model is converted into a constrained optimization problem. In the RSCE algorithm, the FastICA algorithm is adopted because it is well suited to the detection and discrimination of rare features [11]. Generally, the intrinsic dimensionality of the hyperspectral images needs to be determined before performing ICA. In our algorithm, the intrinsic dimensionality is used as the dimensionality of the output data after KSVD, i.e., the dimensionality $r$ of the subspace model, which is also the input parameter of the FastICA model. The virtual dimensionality (VD), which we use to determine the intrinsic dimensionality of the original hyperspectral data, is defined as the minimum number of spectrally distinct signal sources that characterize the hyperspectral data from the perspective of target detection and classification; more details about VD can be found in [5]. By integrating KSVD and ICA, RSCE realizes a fast extraction of nonlinear KICs from the huge data volume of hyperspectral images, which is a very difficult task for existing kernel ICA or fast KICA algorithms.
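The separation step can be realized with any standard FastICA implementation; the sketch below is a minimal symmetric fixed-point FastICA (log-cosh nonlinearity, corresponding to the $G_1$ contrast discussed in the next subsection) written in Python/NumPy. It assumes the input $Y$ has already been kernel-whitened as above and is not the authors' code.

```python
import numpy as np

def fastica_symmetric(Y, max_iter=200, tol=1e-5, seed=0):
    """Minimal symmetric FastICA for whitened data Y of shape (r, N)."""
    r, N = Y.shape
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((r, r))

    def sym_decorrelate(W):
        # W <- (W W^T)^{-1/2} W  (symmetric decorrelation)
        s, U = np.linalg.eigh(W @ W.T)
        return U @ np.diag(1.0 / np.sqrt(np.clip(s, 1e-12, None))) @ U.T @ W

    W = sym_decorrelate(W)
    for _ in range(max_iter):
        WY = W @ Y
        g = np.tanh(WY)                 # log-cosh nonlinearity g(u) = tanh(u)
        g_prime = 1.0 - g ** 2
        W_new = (g @ Y.T) / N - np.diag(g_prime.mean(axis=1)) @ W
        W_new = sym_decorrelate(W_new)
        # converged when the rows of W stop rotating
        if np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0)) < tol:
            W = W_new
            break
        W = W_new
    S_hat = W @ Y                        # estimated independent components (KICs)
    return S_hat, W
```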
3.3. Singularity measure of RSC

In our research, three singularity measures are adopted: kurtosis, kurtosis-based local singularity, and negentropy. Given a random variable $a$, let $\bar{a}$ and $\hat{\sigma}$ be the mean and variance estimates of $N$ observations $a_i$ ($i = 1, 2, \ldots, N$). Here, $a$ is the gray value of a pixel in a KIC, and $N$ is the total number of pixels in the KIC or in a local region of interest. The kurtosis estimate can be written as

$\hat{\gamma}_4 = -3 + \sum_{i=1}^{N}(a_i - \bar{a})^4 \big/ \big[(N-1)\hat{\sigma}^4\big]$   (14)

In the FastICA algorithm, Hyvärinen and Oja [6] defined negentropy to measure the non-Gaussianity of each independent component. The negentropy is given by

$J(t) = H(t_{\mathrm{gauss}}) - H(t)$   (15)

where $t$ is the random variable to be measured, $t_{\mathrm{gauss}}$ is a Gaussian random variable with the same variance as $t$, $J$ is the negentropy, and $H$ denotes differential entropy. To simplify the computation of negentropy, Hyvärinen et al. proposed the approximation

$J(t) \propto \{E[G(t)] - E[G(v)]\}^2$   (16)

where $v$ is a Gaussian random variable with zero mean and the same variance as $t$ (unit variance if the data are standardized), and $G(\cdot)$ is practically any nonquadratic function. To give the objective function good statistical properties, especially robustness, $G(\cdot)$ is generally chosen from three functions: $G_1(t) = (1/a_1)\log\cosh(a_1 t)$, $G_2(t) = -\exp(-t^2/2)$, and $G_3(t) = (1/4)t^4$. Among all random variables with the same variance, the Gaussian-distributed variable has maximum entropy. To find the most singular component among the KICs, we compute the singularity value of each KIC obtained by KSVD and FastICA, using kurtosis or negentropy as the singularity measure, and then choose the component with the maximum kurtosis or negentropy as the most singular one. Local singularity (LS) is another effective measure that can be used to select the most effective component for a detector such as RXD. The LS, which is also based on kurtosis but is computed in local sliding windows, is described in detail in [13].

3.4. The procedure of the RSCE algorithm

The procedure of the RSCE algorithm is summarized as follows.

Algorithm: Rare signal component extraction
1: Convert the original hyperspectral data from 3D to 2D form by means of the vectorization operator
2: Center and normalize the 2D data
3: Compute the VD of the normalized 2D data
4: Compute the kernel matrix K according to Eq. (4)
5: Compute the centered kernel matrix Kc according to Eq. (8)
6: Perform SVD on Kc and build the subspace model for prewhitening in feature space
7: Prewhiten the 2D data by the subspace projection
8: Input the prewhitened data and the subspace model into the FastICA model
9: Output the KICs
10: Compute the singularity of each KIC according to the singularity measures given in Section 3.3 and find the RSC with maximum singularity
11: Output the extracted RSC
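As a sketch of how the RSC can be selected in practice (our illustrative Python/NumPy code, not the authors' implementation), the functions below compute the global kurtosis of Eq. (14) and the log-cosh negentropy approximation of Eq. (16) for each KIC and return the index of the most singular one; the kurtosis-based local singularity of [13] would follow the same pattern with the kurtosis evaluated in sliding windows.

```python
import numpy as np

def global_kurtosis(comp):
    """Kurtosis of one component, following Eq. (14)."""
    a = np.asarray(comp, dtype=float).ravel()
    a_bar = a.mean()
    sigma = a.std() + 1e-12          # guard against zero variance
    return np.sum((a - a_bar) ** 4) / ((a.size - 1) * sigma ** 4) - 3.0

def negentropy_logcosh(comp, n_gauss=100_000, seed=0):
    """Negentropy approximation of Eq. (16) with G(t) = log cosh(t).

    E[G(v)] for the standard Gaussian is estimated here by sampling; for
    standardized data it is a constant and could be precomputed instead.
    """
    a = np.asarray(comp, dtype=float).ravel()
    t = (a - a.mean()) / (a.std() + 1e-12)
    v = np.random.default_rng(seed).standard_normal(n_gauss)
    return (np.mean(np.log(np.cosh(t))) - np.mean(np.log(np.cosh(v)))) ** 2

def select_rsc(kics, measure=negentropy_logcosh):
    """Return the index of the most singular KIC (the RSC) and all scores.

    kics : (r, d) array, one kernel independent component per row.
    """
    scores = np.array([measure(row) for row in kics])
    return int(np.argmax(scores)), scores
```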
Fig. 2. The first hyperspectral data collected by AVIRIS.
4. Experiments and result analysis

In this section, we present the relevant experiments and result analysis.

4.1. Data

Two real hyperspectral datasets were used in our detection experiments. Fig. 2 shows the image scene of the first hyperspectral dataset, which was collected by AVIRIS (airborne visible/infrared imaging spectrometer) with a spatial resolution of 3.5 m × 3.5 m per pixel. After removing the bands corresponding to water absorption regions, low-SNR bands, and bad bands, 126 available bands in the wavelength range from 0.4 μm to 1.8 μm remained for our experiments. The spatial size of the first dataset is 400 × 100 pixels. In the first dataset, there are 3 airplanes and 38 anomalous man-made targets in the image scene, which are the anomalies of interest. The second hyperspectral dataset was collected by OMIS (operative modular imaging spectrometer), also with a spatial resolution of 3.5 m × 3.5 m per pixel. The spatial size of the second dataset is 190 × 400 pixels, and it has 64 available bands in the wavelength range from 0.4 μm to 0.7 μm. In this image scene, there are 4 anomalous targets to be detected.
Fig. 3. The first-ten KICs prewhitened by KSVD. (a) KIC1 (b) KIC2 (c) KIC3 (d) KIC4 (e) KIC5 (f) KIC6 (g) KIC7 (h) KIC8 (i) KIC9 (j) KIC10.
Table 1
Singularity measures of the kernel independent components by three methods.

Measure of RSC for KICs            1st     2nd     3rd     4th     5th     6th     7th     8th     9th     10th
Global kurtosis                    290.3   142.2   74.74   67.41   49.89   13.30   15.91   1.37    7.05    3.38
Kurtosis-based local singularity   10      2       3       1       13      27      7       16      6       6
Negentropy                         4.06    3.92    4.00    4.03    4.90    4.95    3.96    4.04    3.91    4.05
4.2. Experimental settings

In the RSCE algorithm, the Gaussian radial basis function (RBF) kernel, $(K)_{ij} = k(x_i, x_j) = \exp\{-\|x_i - x_j\|^2 / 2\sigma^2\}$, was used to build the kernel prewhitening matrix because of its good performance and wide applicability. In this paper, the value of $\sigma$ was determined experimentally. In addition, the number of output kernel ICs was set to 10 according to the VD. To validate the effectiveness of the proposed RSCE algorithm for anomaly detection, the ROC curve is used to evaluate and compare the detection performance quantitatively. The ROC represents the relationship between the detection probability $P_d$ and the false alarm rate $P_f$ and provides a quantitative comparison of detection performance. Based on the ground truth, the coordinate ranges of the target panels in the image scene were obtained. Anomalous targets and false alarms were identified by determining whether anomaly-like detections fell into the coordinate ranges of the panels, with each pixel treated as the unit of identification. $P_d$ and $P_f$ are defined as

$P_d = N_d/N_T \quad \mathrm{and} \quad P_f = N_f/N_A$   (17)

where $N_d$ is the number of detected target pixels, $N_T$ is the total number of true target pixels in the scene, $N_f$ is the number of false alarms, and $N_A$ is the total number of image pixels.

Fig. 3 shows the first ten KICs extracted from the first hyperspectral dataset. From this figure, it can be seen that the 6th KIC contains almost all the key information for describing the anomalies. This indicates that the 6th KIC is exactly the RSC, i.e., the most informative component for anomaly detection. By means of high-order statistics, one is able to search for and extract the RSC. Here, the three aforementioned measures, i.e., global kurtosis, kurtosis-based local singularity (LS) [7], and negentropy, were computed on the KICs; the measured values are listed in Table 1. From this table, it is easy to see that LS and negentropy perform similarly, and both are better than global kurtosis. For the first hyperspectral dataset, the 6th KIC is readily identified as the RSC according to the LS or negentropy measure. In this RSC, the H0/H1 assumptions are well met, and accordingly RXD works well. To illustrate this, we plotted the 1D histogram distribution of the RSC and compared it with a Gaussian distribution having the same mean and variance as the RSC. Fig. 4 shows these comparisons. Through this statistical comparison, it can be seen that the 4th and 10th KICs obey the Gaussian distribution well; on the contrary, the RSC, i.e., the 6th KIC, departs clearly from the Gaussian distribution. In other words, the RSC extracted from the original hyperspectral data indeed contains the key anomalous information. Furthermore, we used the ROC curve to evaluate and compare the proposed RSCE algorithm with the conventional RXD and other state-of-the-art algorithms, namely selective KPCA (SKPCA) [7], SVDD [2], and kernel RX [12]. The ROC curves obtained from the first hyperspectral dataset with the five different algorithms are shown in Fig. 5.
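A minimal sketch of how the $(P_f, P_d)$ pairs of Eq. (17) can be traced from a detection map and a ground-truth target mask is given below (Python/NumPy, our own illustrative code; the quantile-based threshold grid is an assumption rather than the authors' protocol).

```python
import numpy as np

def roc_points(scores, target_mask, n_thresholds=200):
    """Trace (P_f, P_d) pairs from an RX detection map, following Eq. (17).

    scores      : (rows, cols) detector output on the selected RSC.
    target_mask : boolean ground-truth mask of the anomalous pixels.
    """
    scores = np.asarray(scores, dtype=float)
    target_mask = np.asarray(target_mask, dtype=bool)
    n_t = target_mask.sum()         # N_T: number of true target pixels
    n_a = scores.size               # N_A: total number of image pixels
    thresholds = np.quantile(scores, np.linspace(0.0, 1.0, n_thresholds))
    p_d, p_f = [], []
    for eta in thresholds:
        detected = scores >= eta
        p_d.append((detected & target_mask).sum() / n_t)     # P_d = N_d / N_T
        p_f.append((detected & ~target_mask).sum() / n_a)    # P_f = N_f / N_A
    return np.array(p_f), np.array(p_d)
```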
Fig. 4. Comparisons between the 1D histogram distribution and Gaussian distribution for RSC and other KICs. (a) 4th KIC, (b) RSC, i.e., 6th KIC and (c) 10th KIC.
By comparing the ROC curves of the different algorithms, it can be seen that the RSCE algorithm greatly outperforms the other algorithms in terms of detection performance. It is also observed that RSCE enormously improves the detection performance of the conventional RXD by performing KSVD and ICA as a feature extraction step before detection, which again strongly validates the effectiveness of RSCE for anomaly detection. Similarly, a detection experiment was conducted on the second hyperspectral dataset, which differs greatly from the first one. Fig. 6 shows the ROC curves of the detection results of the five algorithms; the RSCE algorithm again shows performance superior to the other algorithms. In the experiments on both real hyperspectral datasets, the proposed RSCE algorithm, which embeds feature extraction in feature space, outperformed the kernel RX algorithm, in which kernel functions are also used but the conventional RXD is kernelized over the whole feature space. Additionally, the features that the proposed algorithm provides to the final detection are kernelized ICs rather than the kernelized PCs output by the SKPCA algorithm, and the experimental results indicate that the kernel ICs extracted by RSCE are superior to the kernel PCs from SKPCA in terms of anomaly detection. The RSCE algorithm utilizes KSVD to build a kernel-based prewhitening operator and introduces this operator into the FastICA model; in this way, RSCE realizes an efficient kernel-based ICA that can be viewed as a separable kernel ICA. In the experiments, we compared the computational time of FastICA with KSVD against kernel ICA (KICA) [23]. The experiments were run on an Intel dual-core 2.2 GHz machine with 3 GB RAM. When three bands of the first hyperspectral dataset were input to the two methods, the computational time of FastICA with KSVD was 1.94 s, whereas that of KICA was 170 s. When all bands of the hyperspectral data were used, KICA generally failed to complete the separation of the KICs.
5. Conclusions

In this paper, a kernel-based rare signal component extraction algorithm, called RSCE, is proposed for anomaly detection in hyperspectral imagery. The core contributions of this paper are the construction of the KSVD-based whitening operator, the separable extraction of kernel-based ICs, and the determination of the RSC from the kernel ICs by a singularity measure. Through these procedures, RSCE enables one to mine anomalous information from high-dimensional and nonlinear data while guaranteeing the rationality and effectiveness of the mathematical assumption underlying the conventional RX detector. The ability to extract an RSC that preserves the information of the anomalies ensures that the proposed RSCE algorithm achieves good anomaly detection performance in a separable and efficient manner. The experimental results show that the proposed algorithm can greatly improve the performance of the conventional RXD and, meanwhile, outperforms several state-of-the-art algorithms for anomaly detection in hyperspectral images.
Fig. 5. ROC curves (probability of detection vs. false alarm rate) from the 1st hyperspectral data obtained by the five different algorithms (RSCE, SKPCA, RXD, SVDD, kernel RX).

Fig. 6. ROC curves (probability of detection vs. false alarm rate) from the 2nd hyperspectral data obtained by the five different algorithms (RSCE, SKPCA, RXD, SVDD, kernel RX).
Acknowledgement

This work was supported by the Natural Science Foundation of China under Grant 60972144, the Fundamental Research Funds for the Central Universities (Grant HIT.NSRIF.2010095), the Research Fund for the Doctoral Program of Higher Education of China under Grant 20092302110033, and the Heilongjiang Province Postdoctoral Scientific Research Foundation.

References

[1] N. Acito, M. Diani, G. Corsini, A new algorithm for robust estimation of the signal subspace in hyperspectral images in the presence of rare signal components, IEEE Trans. Geosci. Remote Sens. 47 (11) (2009) 3844–3856.
[2] A. Banerjee, P. Burlina, C. Diehl, A support vector method for anomaly detection in hyperspectral imagery, IEEE Trans. Geosci. Remote Sens. 44 (2006) 2282–2291.
[3] C.-I. Chang, Anomaly detection and classification for hyperspectral imagery, IEEE Trans. Geosci. Remote Sens. 40 (6) (2002) 1314–1325.
[4] G. Camps-Valls, L. Bruzzone, Kernel-based methods for hyperspectral image classification, IEEE Trans. Geosci. Remote Sens. 43 (6) (2005) 1351–1362.
[5] C.-I. Chang, Q. Du, Estimation of number of spectrally distinct signal sources in hyperspectral imagery, IEEE Trans. Geosci. Remote Sens. 42 (3) (2004) 608–619.
[6] A. Hyvärinen, E. Oja, Independent component analysis: algorithms and applications, Neural Networks 13 (4/5) (2000) 411–430.
[7] Y.F. Gu, Y. Liu, Y. Zhang, A selective KPCA algorithm based on high-order statistics for anomaly detection in hyperspectral imagery, IEEE Geosci. Remote Sens. Lett. 5 (1) (2008) 43–47.
[8] P. Gurram, H. Kwon, T. Han, Sparse kernel-based hyperspectral anomaly detection, IEEE Geosci. Remote Sens. Lett. 9 (5) (2012) 943–947.
[9] P. Gurram, H. Kwon, Support-vector-based hyperspectral anomaly detection using optimized kernel parameter, IEEE Geosci. Remote Sens. Lett. 8 (6) (2011) 1060–1064.
[10] J.C. Harsanyi, Detection and Classification of Subpixel Spectral Signatures in Hyperspectral Image Sequences, Ph.D. Dissertation, University of Maryland, 1993.
[11] A. Huck, M. Guillaume, A CFAR algorithm for anomaly detection and discrimination in hyperspectral images, in: International Conference on Image Processing, 2008, pp. 1868–1891.
[12] H. Kwon, N.M. Nasrabadi, Kernel RX algorithm: a nonlinear anomaly detector for hyperspectral imagery, IEEE Trans. Geosci. Remote Sens. 43 (2) (2005) 388–397.
[13] D. Landgrebe, Hyperspectral image data analysis, IEEE Signal Process. Mag. (2002) 17–28.
[14] D. Manolakis, C. Siracusa, G. Shaw, Hyperspectral subpixel target detection using the linear mixing model, IEEE Trans. Geosci. Remote Sens. 39 (7) (2001) 1392–1409.
[15] D. Manolakis, G. Shaw, Detection algorithms for hyperspectral imaging applications, IEEE Signal Process. Mag. (2002) 29–43.
[16] D. Manolakis, R. Lockwood, T. Cooley, J. Jacobson, Is there a best hyperspectral detection algorithm? Proc. SPIE 7334 (2009) 733402-1–733402-16.
[17] P. Masson, W. Pieczynski, SEM algorithm and unsupervised statistical segmentation of satellite images, IEEE Trans. Geosci. Remote Sens. 31 (1993) 618–633.
[18] N.M. Nasrabadi, Regularized spectral matched filter for target recognition in hyperspectral imagery, IEEE Signal Process. Lett. 15 (2008) 317–320.
[19] S. Reed, X. Yu, Adaptive multiple-band CFAR detection of an optical pattern with unknown spectral distribution, IEEE Trans. Acoust. Speech Signal Process. 38 (1990) 1760–1770.
[20] J.A. Richards, X. Jia, Remote Sensing Digital Image Analysis, third ed., Springer, New York, 1999.
[21] F.C. Robey, D.R. Fuhrmann, E.J. Kelly, A CFAR adaptive matched filter detector, IEEE Trans. Aerosp. Electron. Syst. 28 (1) (1992) 208–216.
[22] D. Stein, S. Beaven, L.E. Hoff, E. Winter, A. Shaum, A.D. Stocker, Anomaly detection from hyperspectral imagery, IEEE Signal Process. Mag. 19 (1) (2002) 58–69.
[23] H. Shen, S. Jegelka, A. Gretton, Fast kernel-based independent component analysis, IEEE Trans. Geosci. Remote Sens. 57 (9) (2009) 3498–3502.
[24] L. Zhang, J. Lam, Necessary and sufficient conditions for analysis and synthesis of Markov jump linear systems with incomplete transition descriptions, IEEE Trans. Autom. Control 55 (7) (2010) 1695–1701.
[25] L. Zhang, N. Cui, M. Liu, Y. Zhao, Asynchronous filtering of discrete-time switched linear systems with average dwell time, IEEE Trans. Circuits Syst. I 58 (5) (2011) 1109–1118.
[26] L. Zhang, P. Shi, Stability, l2-gain and asynchronous H∞ control of discrete-time switched systems with average dwell time, IEEE Trans. Autom. Control 54 (9) (2009) 2193–2200.
Yanfeng Gu received his Ph.D. in information and communication engineering from Harbin Institute of Technology (HIT), China, in 2005. He then joined the School of Electronics and Information Engineering, HIT, as a Lecturer and was appointed Associate Professor at the same institute in 2006; meanwhile, he was enrolled in the first Outstanding Young Teacher Training Program of HIT. He was a visiting scholar in the Department of Electrical Engineering and Computer Science, University of California, Berkeley, USA, from 2011 to 2012. His current research focuses on image processing in remote sensing, machine learning and pattern analysis, and multiscale geometric analysis. He has published more than 60 peer-reviewed papers and four book chapters, and he is the inventor or co-inventor of seven patents. He is currently an Associate Professor in the Department of Information Engineering, Harbin Institute of Technology, Harbin, China. He is an IEEE member and a peer reviewer for several international journals, such as IEEE Transactions on Geoscience and Remote Sensing, IEEE Transactions on Instrumentation and Measurement, IEEE Geoscience and Remote Sensing Letters, and IET Electronics Letters.
Lin Zhang received her B.Sc. degree from the College of Resources and Environment, Northeast Agricultural University, China, in 2006. From September 2006 to November 2011, she worked as a Laboratory Assistant in the Department of Quality Evaluation, Harbin Sanctity Pharmaceutical Co., Ltd. Currently, she is pursuing her M.Sc. degree in the Division of Adults and Graduates Studies, Eastern Nazarene College, Massachusetts, USA. Mrs. Zhang's research interests include computation and information processing in multiple disciplines, spanning from economic management and communication networks to systems biology.