Extraction of Endmembers from Spectral Mixtures F. J. Garcı´a-Haro,* M. A. Gilabert,* and J. Melia´*
L
inear spectral mixture modeling (LSMM) divides each ground resolution element into its constituent materials using endmembers which represent the spectral characteristics of the cover types. However, it is difficult to identify and estimate the spectral signature of pure components or endmembers which form the scene, since they vary with the scale and purpose of the study. We propose three different methods to estimate the spectra of pure components from a set of unknown mixture spectra. Two of the methods consist in different optimization procedures based on objective functions defined from the coordinate axes of the dominant factors. The third one consists in the design of a neural network whose architecture implements the LSMM principles. The different procedures have been tested for the case of three endmembers. First, were used simulated and real data corresponding to mixtures of vegetation and soil. Factors that limit the accuracy of the results, such as the number of channels and the level of data noise have been analyzed. Results have indicated that the three methods provide accurate estimations of the spectral endmembers, especially the third one. Moreover, the second method, that is based on the exploration of the mixture positions in the factor space, has demonstrated to be the most appropriate when the dimensionality of the data is reduced. Finally, this procedure was applied on a Landsat-5 TM scene. Elsevier Science Inc., 1999
INTRODUCTION The growing trend toward quantification in the Earth sciences generally involves the analysis of compositional * Departament de Termodina`mica, Facultat de Fı´sica, Universitat de Vale`ncia, Burjassot, Vale`ncia, Spain Address correspondence to F. J. Garcı´a-Haro, Dept. de Termodina`mica, Facultat de Fı´sica, Univ. de Vale`ncia, Dr. Moliner, 50, 46100 Burjassot, Vale`ncia, Spain. E-mail:
[email protected] Received 6 August 1997; revised 24 November 1998. REMOTE SENS. ENVIRON. 68:237–253 (1999) Elsevier Science Inc., 1999 655 Avenue of the Americas, New York, NY 10010
measurements on a large number of materials (Full et al., 1981). One way of viewing such a data set is to consider individual samples as mixtures of a relatively small number of endmembers. Linear spectral mixture modeling (LSMM) divides each ground resolution element into its constituent materials or components using endmembers which represent the spectral characteristics of the cover types. Endmembers are the features recognizable in a scene as being meaningful for an observer, and constitute abstractions of real objects that can be regarded as having uniform properties (Strahler et al., 1986). In many problems, the number of endmembers and their composition may be unknown and the problem becomes to unmix the given compositions into a number of endmembers supported by the dimensionality of the data. The estimation of these endmembers is not straightforward since they vary depending upon the scale and purpose of the study (Milton and Emery, 1995). The endmembers are also influenced by processes such as the scattering by surface materials or other factors such as the illumination geometry, which could be considered as additional sources of noise in the modeling approach. Methods developed in literature to deduce the endmembers have limitations. Some methods extract them simply from the image using the known spectral signatures of surface dominant cover types (Bierwirth, 1990). Other techniques such as principal component (PC) analysis enclose the cloud of pixel spectra within a solid geometric figure whose number of vertex is equal to the dimensionality of the basis space (White and Drake, 1993; Bryant, 1996). However, despite that these methods guarantee the feasibility of the outcoming fractions, they assume that the true endmembers are contained within the data set, which is usually not realistic. In practice, it is difficult to find pure targets (i.e., fully covered by an unique material) at the spatial resolution of satellite sensors. For example, the “matorral” spectral class presents a pixel cover of typically less than 40% in semi-arid areas. In this case its spectral signature cannot be captured 0034-4257/99/$–see front matter PII S0034-4257(98)00115-1
238
Garcı´a-Haro et al.
within a sample set of pixels nor predicted theoretically. Other methods have been developed to solve the problem when the endmembers are not represented by the samples (external endmembers). One of them consists in the alignment of image spectra to reference endmembers by means of matching techniques that compare pixel spectra to each library spectra. It enables us to accept or reject them according to parameters introduced in an expert system (Johnson et al., 1985). Other methods that are based on the QMODEL (Full et al., 1981), fuzzy algorithms (Bezdek, 1981; Full et al., 1982), or the parallel coordinate representation (Bateson and Curtiss, 1996) provide feasible endmembers from the innate relationships between samples located as vectors in the PCs space. It is common that compositional variables are inherently correlated (e.g., vegetation and soil fractions are often negatively correlated). A large degree of dependence between variables implies that the same amount of variability can be carried by a smaller number of new variables, each of which would be mutually independent (Full et al., 1982). These abstract variables (representing a new coordinate reference axes) can be defined and related to the original ones via linear equations, operations which are termed factor analysis (FA). In this article, three new methods have been developed to identify the number and the spectral signature of the major components from a set of mixture spectra. The first method consists in deducing the best possible rotation of the factor abstract axes to “align” them upon the directions of the real component spectra. The second method is based on the position of all the objects in the reduced “factor space.” Both methods employ a minimization procedure of an objective function appropriately defined. The third method consists in the design of a neural network whose architecture implements the LSMM principles. The three methods have been validated for the case of three endmembers. First, we have used simulated and real data corresponding to mixtures of vegetation and soil. Finally, the procedures have been applied on data from the sensor Thematic Mapper (TM) of satellite Landsat-5, since this sensor is well suited for vegetation monitoring (Franklin, 1986).
from the mixtures matrix D (when the fractions matrix F is also unknown). The first step of methods 1 and 2 consists in applying the FA procedure to decompose the original data into a unknown number, c, of abstract spectra (factors) which coincides with the number of pure components [Eq. (2)]: D5R·F
where R and F constitute the abstract representation of the “real” matrices R and F: the column vectors of the matrix R;{R1, R2,..., Rc} are the independent factor loadings which contain the dominant reflecting features, and ideally, each one represents one spectral property (Huete, 1986). The procedure to compute c, R, and F is based on the diagonalization of the covariance matrix of the observations D. The procedure is different depending if the number of spectral channels, n, is greater or smaller than the number of samples, s (Garcı´a-Haro, 1997). The problem of finding the number of factors, c, is one of the least known in FA (Malinkowski and Howery, 1977). Method 1: Rotation of the Factor Spectra {Ri} The original matrix R can be expressed from its abstract matrix by means of a linear transformation T: R5R·T.
(3)
In this way, the problem (i.e., the matrices R and F) is parametrized from uniquely the matrix T. The coefficients, tuv, of matrix T cannot be computed using standard methods, since matrix R is unknown. The procedure consists in computing the coefficients of the matrix T by means of an optimization procedure which minimizes the objective function, P1, defined from the addition of the chi-square residuals of Eq. (1) plus a penalty function which is introduced to constraint the possible solutions of matrices R and F to physically allowable values (i.e., fractions and reflectances must be nonnegative and fractions must add up to 1 within each sample or pixel): rij>0 (i51,2, . . . , n, j51,2, . . . , c), fik>0 (i51,2, . . . , c, k51,2, . . . , s),
THEORY AND METHODS
c
To set up the basis of the problem, we will express the LSMM formulation. Let n be the dimensionality of the spectral observations. Assuming s mixtures of c composing endmembers, the LSMM is expressed as c
dik5 o rij fjk , j51
o fik51
(k51,2, . . . , s).
i51
c
P1(tuv)5 o
o1 s
i51 k51
(1)
where D is the mixtures spectra matrix (n3s), R is the endmembers matrix (n3c), and F is the fractions matrix (c3s). The main aim is the estimation of the matrix R
(4)
The physical sense of introducing these constraints is to guarantee endmember combinations that adequately describe the observed mixture within each pixel. Thus the objective function is expressed as
or matricially: D5R·F
(2)
2
n fik(tuv) 2 d(fik)1 o kfl i51 n
o1 1 j 1
3d(rij)1 o i5
where:
s
5
2
eij(tuv) 2 , reij
o1 c
j51
2
rij(tuv) 2 krl (5)
Extraction of Endmembers from Spectral Mixtures
• e is the residual matrix: e5D2R·F.
51,0,
d(x)5
x>0, x,0.
(6) (7)
• reij represents the uncertainty of the element eij (provided that these uncertainties were unknown, they could be considered gaussian variables, reij5√eij). We took a unique value from averaging the matrix e, which does not alter significantly the results. • k fl and krl are the average values of matrices F and R. These parameters were introduced to weight adequately the two inequality constraints of Eq. (5) (since reflectance and fraction values are different in absolute terms). These parameters provide a formal measure of the uncertainty in each element of the unknown matrices. In order to simplify the computations, they were considered constant (values 1/c and 0.15 were arbitrarily assigned to k f l and krl, respectively), since the solution is quite insensitive to small variations of these parameters. The matrices R, F, and e must be evaluated within every iteration. The matrix R was computed using Eq. (3), and the matrix F was computed by applying the LSMM using the constrained least-squares estimator (Garcı´a-Haro et al., 1996). Finally, e was computed using Eq. (6). The two first terms on the right side of Eq. (5) are distinct from zero for the first iterations, but they vanish when the unknown parameters converge to the solution; hence the objective function becomes a chisquare (v2) one. The minimization procedure was based on the MIGRAD algorithm (Minuit, 1992), which demonstrated to be more accurate than other numerical procedures like SIMPLEX (Minuit, 1992). MIGRAD is a variable metric method with inexact line search, a stable metric updating scheme, and checks for the positive-definiteness of the covariance matrix, which is guaranteed by adding an appropriate constant along the diagonal (as determined by its eigenvalues). The main weakness of the algorithm is that it fails provided that the first derivatives are very inaccurate. Method 2: Exploration of the Positions of the Objects in the Factor Space The columns of the matrix F, which are usually termed the principal components (PCs), constitute the scores of the data points in the abstract axes space. The proposed method is based on the idea that all the objects (data vectors) are enclosed within a straight-sided geometric figure called polytope (Full et al., 1981) whose number of sides (or edges) is equal to the dimensionality of the basis space plus one. In this case all sample vectors can
239
be described as linear combinations of compositions represented by each vertex on such a figure. The procedure to find the vertex of the polytope, E, consists in an optimization procedure of an objective function, P2, defined by c
o1 1 k 1
P25 o i5
s
5
n fpik 2 p7 d(fpik)1 o i51 k fp l
o 1 21a·DIS, j 1 re s
5
eij
(8)
ij
where • E is a (c3c) matrix of unknown parameters. • FP is a (c3s) matrix that is computed (every iteration) from matrices E and F by means of a minimization procedure based on the leastsquares constrained estimator [Eq. (9)]: F5Fp·E.
(9)
• e is the residual of the minimization [Eq. (10)]: e5F2Fp·E.
(10)
• reij represents the uncertainty of the element eij, and k fpl the average of the values of the matrix FP. • The function DIS expresses the “proximity” between the positions of the computed endmembers, E, and the cloud of data, F [Eq. (11)]: c
DIS5 o
s
c
5
5
o o u fij2eiku 1 j 1 k 1
i5
b
(11)
• a and b are constants, which we define according to data characteristics. Once computed, the matrix E, the matrix of pure components spectra, R, was obtained as in Eq. (12): R5R·ET.
(12)
Figure 1 shows a clarifying diagram corresponding to the case of three components, showing how the hyperspace of objects falls within a plane and the polytope is a triangle. The first term on the right in Eq. (8) assures that the extremes {E1, E2, E3} situate on the vertices of the triangle, the second term guarantees that the extremes situate inside the plane and the third term controls the proximity of the extremes to the center of the cloud. As will be seen subsequently, appropriate values of parameter a are about 1 when there are not external endmembers and lay in the range 0.01–0.2 when the pure components are contained within the data. Otherwise, parameter b let us filter the presence of isolated points (i.e., it increases the stability of the mean compared to an individual data value) in order to deduce the endmembers from the entire data structure rather than from a few outlying points (e.g., R and Q in Fig. 1). We have employed as default values a51.0 and b51.0. Method 3: Design of a Neural Network Recently, either artificial intelligence concepts or neural computing techniques have received increasing attention for most remote sensing applications, especially for clas-
240
Garcı´a-Haro et al.
Figure 1. Diagram of the principles of method 2, showing the data vectors in the reduced abstract space. rp represents the residual of point P, and P9 its projection onto the plane.
sification. It is frequent the use of a supervised, feedforward structure employing a backpropagation algorithm that adjusts the network weights to produce convergence between the network outputs and a set of training data (Moody et al., 1996). The primary advantage of the neural networks model is its generality, inasmuch as it does not require a priori knowledge of the data statistical distribution and hence it does not require nonsingular covariance matrices. Neural network algorithms have also potential for treating complex data sets that include multispectral (Baraldi and Parmiggiani, 1995), multitemporal (Gopal and Woodcock, 1996), multisensor (Serpico
and Roli, 1995) or multidirectional (Abuelgasim et al., 1996) characteristics. In this study, we have analyzed the possibilities of neural networks to estimate the spectra of pure components. The neural network consisted in three layers of interconnected neurons. The structure of the neural architecture is shown in Figure 2A, where nli stands for the ith neuron of the lth layer. The input layer presented c processing units or nodes representing the fractions of the components within each mixture (hence c coincides with the number of endmembers). Thus the kth input vector presented to the network input, {xk1, xk2, . . . , xkc}, constitutes the unknown vector of fractions for the kth mixture (1
oli5 o wlijolj21 j51
(13)
The output of the neural network consists of n elements representing the data wavebands. Thus the output vector associated to the kth input, {yk1, yk2, . . . , ykn}, constitutes the spectral signature of the kth mixture. In our case, for a given input mixture k, oli is the network out-
Figure 2. A) Layer architecture of the neural network and notation employed to denote the neurons. B) Computations of the neuron nil.
Extraction of Endmembers from Spectral Mixtures
put (i.e., the reflectance in ith waveband yki). Thus the designed architecture implements the principles of the LSMM [Eq. (1)]. The basis of the backpropagation learning algorithm (Heermann and Khazenie, 1992) is the optimization technique of gradient descendent on the sum of the squared differences between the activation of the nodes in the output ykj and their desired values dkj (which in our case are the elements of D, the n3s mixtures spectra matrix). The objective is to minimize the error function, which is a least-squares classifier for the data of the training set [Eq. (14)]: s
d5 o
n
o (dkj2ykj)2. 1 j 1
k5
5
(14)
The application of the gradient descendent method yields the following iterative update rule for the weights of the last hidden layer [Eq. (15)]: W(i)5W(i21)1lr·E·XT,
(15)
where E5Y2D is the error matrix as obtained from the difference between the actual output matrix, Y, and the desired or target matrix, D, and lr is the learning ratio factor. Although we have employed the backpropagation algorithm, important modifications have been introduced. The most outstanding one is that the network inputs are not known and must be computed. Thus the training model must update not only the hidden layer weights but also the input vectors. We considered two different learning ratios, one for the weights and other for the input. In addition, a momentum term, M, was added to the error function in the backpropagation algorithm that allows the network to respond not only to the local gradient but also to recent trends in error surface. It accelerates the training and filters the high frequency variations in data (Rumelhart and McClelland, 1986). The training network was made as follows: (i) The change to be made in the matrix W for the epoch ith, DW(i), was the result of applying the generalized delta rule with momentum [Eq. (16)]: dW5lrW·E·XT, DW(i)5MW·DW(i21)1(12MW)·dW,
(16)
where lrW and MW correspond to the learning ratio and the momentum of matrix W. (ii) The change to be made in the fractions matrix, DX(i), was undertaken so that Eq. (1) were accomplished. Thus a new estimation of this matrix, X1, was computed by applying the LSMM to unmix the matrix of mixtures, Y, using the last estimation of the matrix of pure components, W; it was carried out by means of the constrained least-squares estimator (Garcı´a-Haro et al., 1996). Then, the change was computed as in Eq. (17):
241
dX5lrX·(X2X1) , DX(i)5MX·DX(i21)1(12MX)·dX,
(17)
where lrX and MX represent the learning ratio and the momentum of matrix X. When making the change to matrices W or X implied transverse into the region which has not physical behavior (i.e., fractions or reflectances not ranging between 0 and 1), the magnitude of the change was diminished to keep them inside the desired range. (iii) The function error was computed from the changed values of matrices W and X, either changing each of them separately or changing both together. It originates a new error function which can be used to estimate the definitive change in both matrices and in their learning ratios: If the new error exceeds the old error by more than a predefined ratio, the new values are discarded (and the learning ratios are decreased). Otherwise the best change is kept and learning ratios are, accordingly, increased. This modified adaptive learning rate algorithm increases the speed and reliability of the training procedure. The training network convergence does not depend on the initial values of LRY and LRW. One question of the three methods concerns the initial values of parameters in the iteration procedure. In all cases the election is equivalent to the a priori prediction of either the matrix of fractions or the matrix of pure component spectra, since both matrices are related [Eq. (1)]. Moreover, both matrices present physical constraints that should be accomplished [Eq. (4)]. There are several possibilities of predicting the initial inputs. The most simple procedure consists in selecting them randomly. Alternatively, when possible, it might be used a set of spectra representative of likely components. Another possible election is based on the cloud distribution of the mixtures in the PC features space (White and Drake, 1993; Bryant, 1996) since the feasible endmembers occupy the most extreme positions provided that the cloud has a convex pattern. An algorithm was applied to automatically identify the extremes for this precise situation. We tested both the procedure based on the random input and the procedure based on the PCs. Both of them provided generally similar outcomes. However, when using the random procedure, the proposed methods failed in a few cases (i.e., the iterative procedure did not converge or converged to an erroneous solution), and otherwise, when generating the initial values using the PCs procedure, the solutions hardly ever failed. This indicates the importance of an appropriate procedure to estimate the free parameters for the first iteration. The procedure based on the PCs seems to be suitable because it is general, operative and guarantees the reliability of the out-
242
Garcı´a-Haro et al.
comes, in agreement with other studies (Pinet et al., 1996; Martin et al., 1996). Nevertheless, the proposed methods can be considered robust and reliable inasmuch as they provide good results for different elections of the first iteration values of parameters. APPLICATION Simulated Data and Real Data The simulation and real cases addressed in this article deal with three-component mixtures showing a convex geometry for the topology of the data cloud distribution. The reason for not considering a more complex situation is twofold: (1) When applying LSMM from data with a reduced number of spectral bands, such as Landsat-5 TM data, the accurate estimation of a number of endmember proportions larger than 3 is limited when the spectral separability of the materials is small, irrespectively of the method employed to extract the endmembers (Garcı´a-Haro et al., 1996). In fact, the attainable accuracy (which is proportional to the hypervolume occupied by the endmembers positions in the features space) considerably diminishes when one endmember is collinear in some spectral regions with the rest of endmembers (Sabol et al., 1992; Garcı´a-Haro, 1997). (2) The proposed methods parametrize the spectral endmembers by means of a reduced number of free parameters (e.g., this number is equal to c2 in methods 1 and 2). Thus, the greater the number of endmembers, the lower the accuracy of the estimations and the greater the time of computations. Nevertheless, at a regional scale it is frequent that a high number of materials intermingle within the scene.
Furthermore, components that have not very distinctive spectral features (e.g., showing a reduced spectral contrast) may be not necessarily located at the edges of the cloud (which might have invaginations, i.e., a concave pattern). Although we have not taken into account such a situation in this article and hence we have overestimated the likely success of the different techniques, it does not invalidate their feasibility to handle the endmembers on scenes with a high level of complexity. For example, in order to optimize the computations, reference endmembers (from field measurements or a spectral library) might be incorporated into the model though having a well-calibrated image, and on the assumption that the reference materials adequately represent the spectral variability of the endmembers in the image scene (Tompkins et al., 1997). High spectral resolution spectra from 0.4 lm to 2.5 lm corresponding to three components: vegetation, soil, and coal (which was introduced to address the influence of soil brightness), which were mixed in appropriate proportions. We divided the fractions space into cubic grids of 1/r side (with r a positive integer) [Eq. (18)]: F5(p1/r, p2/r, . . . , pc /r) ,
(18)
where 0
c
(i51, . . . , c)
and
o pi5r . i 1 5
The number of generated mixtures is . 1r1c22 c21 2
s5
Since we considered r55, a number of s521 spectral mixtures were obtained. These spectra were sampled at different spectral resolutions (from 10 nm to 300 nm) to generate a synthetic set of mixtures spectra, and subsequently they were repeatedly combined with noise to simulate the real case. In fact, the imaging system limits
Figure 3. A) Spectra of the three pure components considered. B) Results of the FA decomposition of the real mixtures (10 nm resolution).
Extraction of Endmembers from Spectral Mixtures
243
Figure 4. Estimated spectra of the pure components (dashed lines) jointly with the original curves (solid lines) from synthetic data sets with different levels of noise (10 nm resolution).
the detectability in terms of i) spectral contrast (affected by the number, width, and location of spectral bands), and ii) level of noise. Noise was modeled to obtain a normal distribution about the actual signal, (i.e., zero mean gaussian noise, with covariance matrix given by r2I). Figure 3A shows the three spectra (10 nm spectral resolution) that were used to generate the synthetic mixtures: vegetation (Quercus), and two types of soil (carbonate and coal). The simulated data, although not analogous to real data, enabled us to assess the performance of the different methods under controlled, ideal circumstances. We then extended results to a more realistic case using data from a set of 21 mixtures of vegetation, soil, and coal measured in laboratory (Garcı´a-Haro et al., 1996). Designed plots presented seven varying amounts of vegetation characterized by their leaf area index (LAI), that ranged from 0 (bare soil) to 2.4 (close to 100% coverage). Two different sets of data were used, corresponding to high spectral resolution (around 10 nm) and data Figure 5. An example of spectrum regeneration, from data with 100 nm resolution and r52.0, corresponding to the mixture: 40% veg, 40% soil, 20% coal.
sampled in TM bands. Figure 3B shows the three primary loading factors extracted through the high resolution data. The first factor is a mean curve and corresponds mostly to the albedo of the spectral mixtures. Its shape is characteristic of vegetation response, although with unusual brightness values, which indicates that vegetation is responsible of the dominant variance in the spectral mixtures. The second factor contains spectral features of vegetation and soil. Finally, the third factor is low and flat, similarly to coal. Application of Method 1 Synthetic Mixtures Figure 4 shows the estimated spectra of the pure components from high resolution data sets presenting different levels of noise, jointly with the original spectra. In general, the three components spectra are well regenerated in all regions, although the performance seems to be less accurate for the coal component in the NIR region and for the soil component in the MIR region. The fact that Figure 6. Values of v2min/t as derived using method 1 from different spectral mixtures with varying level of noise and spectral resolution.
244
Garcı´a-Haro et al.
Figure 7. Values of the statistical error of matrix R (expressed in % reflectance units) vs. the number of bands for different levels of noise, as estimated from the uncorrelated errors of parameters t11, t12, and t13, which provide an estimation of the statistical errors of the vegetation, soil and shadow endmembers, respectively.
the coal endmember regeneration is the most problematic one is not surprising, since its signal-to-noise ratio is the lowest one, given its low albedo. Logically, the deviation between original and estimated spectra was directly proportional to the level of noise. Moreover, this residual also increased at decreasing the number of channels (as will be shown in Fig. 8), since the gaussian noise becomes more apparent when the data dimensionality decreases (Sabol et al., 1992). However, what is important is not only the number of wavebands, but also the width and location of the spectral channels. For example, the performance attained from TM wavebands was better than from spectra sampled at 200 nm, despite that the dimensionality of the TM data is smaller. These results are in agreement with results obtained by other authors (Sabol et al., 1992). Figure 5 shows an example of spectrum regeneration. The “pure” curve represents the original spectrum, which has been convoluted with a high level of noise to simulate extreme conditions. The “regenerated” curve represents the spectrum computed from the combination (in the estimated proportions) of the derived spectra. It can be observed that despite the great level of noise in-
troduced (“noisy” curve) the regenerated spectrum resembles the actual curve. Error Analysis Since residuals are normally distributed, v2min is a chisquare variable with t5N2L2C freedom degrees, where N is the number of data (N5n3s), L is the number of constraints (which coincides with s, because the fractions sum-to-one in each sample), and C is the number of free parameters (C5c2). It let us apply a chisquare test of the fit performance. Figure 6 shows the values of v2min/t (i.e., v2min normalized) for different spectra data sets with varying level of noise and different spectral resolutions. The fact that this variable was close to 1 confirmed the validity of the modeling approach. Results indicated that for p50.05 (95% confidence level) the null hypothesis that both spectra (the original and the modeled) are the same cannot be invalidated for any of the mixtures. Typical values obtained for the free parameters, tuv, were
1
2
1
2
0.36 20.23 20.06 0.02 0.02 0.02 Tˆ5 20.50 0.32 0.30 , E(Tˆ)5 0.07 0.07 0.08 , 20.01 0.42 0.53 0.06 0.05 0.06
Extraction of Endmembers from Spectral Mixtures
245
Figure 8. Values of RMS residual of matrices R, D (both expressed in % reflectance units), and F vs. the number of bands for different levels of noise. 1 refers to difference respect to the original (i.e., the real error), and 2 refers to the simulated data (i.e., the fit goodness).
where E(Tˆ) is a matrix composed by the errors of elements of matrix Tˆ as supplied by the minimization procedure. According to Eq. (3), the ith endmember is a linear combination of the three factor loadings, with coefficients that coincide with the elements of ith column of Tˆ. Since the relative errors associated with the first row (which constitute the projections of this endmember into the factor axes) are small, and the first factor contains the major contribution to the albedo, the error of the estimating spectra will be relatively small. In addition, the minimization algorithm itself supplies an error matrix associated with the unknown parameters, which are the inverse of the matrix of second derivatives of the objective function multiplied by the number of standard deviations we establish for the confidence level (Frodesen et al., 1979). We applied an algorithm (Garcı´a-Haro et al., 1996) to compute separately the confidence interval limits of these coefficients in a process that took also into account their correlations. Figure 7 shows the statistical errors, r, of the pure components spectra associated with the estimated uncorrelated errors of parameters t11, t12 and t13, which [ac-
cording to Eq. (3)] correspond to the components of vegetation, soil, and shadow, respectively. It can be observed as the error in the spectra decreases at increasing the number of wavebands, because it reduces the ambiguity introduced by the gaussian noise (Sabol et al., 1992). Moreover, the error of the coal component also increases proportionally to the data noise. This is due to that a slight variation of the parameters keeps them far from the “unallowable” region, that is, when an endmember combination sums up to 1 with negative and superpositive fractions (fi.1). Therefore, it originates a significant variation of the objective function, P1, which is proportional to the noise in data, since this is a chi-square function when the penalty functions keep null [see Eq. (5)]. Nevertheless, the errors of vegetation and soil components do not depend significantly on the noise of the spectral data. The reason for this is that the limits of the confidence level of parameters are not dictated by the chi-square function since the penalty functions predominate when the parameters transverse into the region that has not physical behavior. t21 and t31 behaved similarly to t11 (vegetation) and identically occurred to the components of soil and coal. As a conse-
246
Garcı´a-Haro et al.
Figure 9. A) Estimated spectral endmembers applying method 1 on real mixtures of vegetation, soil and coal (100 nm resolution). B) Example of spectra regeneration corresponding to two mixtures characterized by 16 g m22 coal and two different values of LAI. Measured spectra are shown in solid lines, and modeled spectra are shown in dashed lines.
quence of the correlation of parameters tuv, their overall error is inferior than it would be obtained by means of the quadratic error propagation law, being similar to that obtained in Figure 7. This error is reasonably small (below 2.0 in % reflectance units), being comparable to the simulated noise. Similarly, errors for the fractions were around 0.04 for vegetation, 0.025 for soil, and 0.03 for coal. Alternatively, we have analyzed the performance of the method on the basis of the deviation (residual) between the actual and the estimated values of matrices. Figure 8 shows the values of the root mean square (RMS) residual of matrices R, F, and D. They were obtained averaging the results from 20 noisy distributed data. In general, in all cases the RMS decreased with the number of bands and increased with the level of noise. Deviations from this are due to numerical inaccuracies in the minimization procedure. Similar behavior was found for matrices D and F (although with typical values around 0.2–0.8 for D and 0.01–0.03 for F). Despite the
fact that the values of the RMS of the endmembers matrix is many times greater than those of the samples matrix D, they are generally below 2%, which indicates a good performance. Identically, the RMS residual of estimating fractions (F matrix) were satisfactorily small. We have considered two different residuals for matrix D. 1 refers to difference respect to the original (pure), and 2 refers to the simulated (noisy) data and is hence a v2 function. As might be expected, the RMS2(D) residual is proportional to the level of noise, with a relation 1:1 except for the case of data with a reduced dimensionality, when the solution is adapted to the noise (at the expense of decreasing the reliability of the estimates). Nevertheless, the RMS1(D) residual behaves similarly to the RMS residual of matrices R and F, although it does not increase so fast when the number of wavebands is low. Moreover, it is inferior to the RMS2(D) residual, which indicates the validity of the method, that has the capability of filtering spurious noise in data.
Figure 10. A) Polytope, i.e. values of the scores of the data points in the abstract space PC1–PC2 (∗), extremes captured by the literature PCs method (n), and extremes captured by the proposed method 2 from data with 250 nm resolution and r51.0. B) Spectra of the pure components corresponding to Figure 10A (dashed lines) jointly with the original curves (solid lines).
Extraction of Endmembers from Spectral Mixtures
247
Figure 11. Values of RMS residual of matrices D, R, and F vs. the number of bands for different levels of noise.
Real Mixtures Finally, method 1 was undertaken on the set of real mixtures sampled at different spatial resolutions. The chisquare test indicated a similar good performance between original and modeled spectra, confirming the validity of the method to extract the spectra of pure components for mixtures of vegetation and soil, even from data with
a reduced dimensionality. Figure 9A shows the estimated spectra of the three components using data in TM wavebands. The spectra resemble the spectral signature of vegetation, soil, and coal, which reveals the applicability of the method to actual data. Figure 9B shows two examples of spectra regeneration, corresponding to plots with varying amounts of vegetation. It reveals a good agree-
Figure 12. Case of external components, that is, with the absence of the three pure data points, for the TM data set with r51.0. A) Polytope (symbols have the same meaning as in Fig. 10A). B) Estimated spectra of the pure components (dashed lines) jointly with the original curves (solid lines).
248
Garcı´a-Haro et al.
Figure 13. Case of 16 mixtures with the absence of five of the six data points on the soil–coal edge, for the TM data set with r51.0. We have considered both the default value for a, i.e., a51.0 (top) and a more appropriate value for this particular case (a50.2) (bottom). A) Polytope (symbols have the same meaning as in Fig. 10A). B) Estimated spectra of the pure components (dashed lines) jointly with the original curves (solid lines).
ment between the observed and the modeled curves. Similar results were obtained for plots with greater amounts of vegetation. Application of Method 2 Synthetic Mixtures Figure 10A shows an example of the polytope (mixture positions in the abstract space PC1–PC2), jointly with the extremes captured by the literature PCs method (White and Drake, 1993), which coincide with the pure data points, since the endmembers are contained in the mixtures, and the extremes captured by the method 2. It reveals that the proposed method has captured quite well the positions of the pure data points, despite that greater deviations were obtained for the coal endmember. Figure 10B shows the estimated spectra of the three components corresponding to the above data set (250 nm and r51.0). Although in this example, data noise induced a slight bias (underestimation) in the derived spectra of coal component, this effect is not systematic, but it depends on the noise distribution of the data. Figure 11 shows the RMS residual of matrices R, F,
and D (corresponding to average the results from 10 noisy distributed data). It can be observed as the residual increases proportionally to the level of noise. However, it does not tend to increase at decreasing the number of wavebands as occurred with method 1, but presents a maximum corresponding to the cases when the number of wavebands is similar to the number of samples (21). It could be due to inadequacies of the eigenvectors, because in this precise case the covariance matrix of data tends to be singular. The most outstanding fact is that the residual does not diverge when the number of wavebands is very small, which indicates the suitability of this method to be applied on data with a reduced dimensionality like TM. Effect of External Endmembers We tested the feasibility of method 2 to extract external endmembers, by applying it from a set of 18 mixtures with the absence of the three pure data points. Figure 12A shows an example of the polytope, jointly with the extremes captured by the literature PC method (which now do not coincide with the pure data points), and the extremes captured by method 2. It reveals that the pro-
Extraction of Endmembers from Spectral Mixtures
249
Figure 14. Polytope, that is, values of the scores of the data points in the abstract space PC1–PC2 (∗) and extremes captured by the method 2, using a50, from two different data sets (r51.0).
posed method captured relatively well the positions of the pure data points. Figure 12B shows the estimated spectra of the three components corresponding to the aforementioned case. Results show that endmembers are not considerably affected at taking off the three samples of pure endmembers, which indicates that the method does not require the presence of vertex-extreme samples (estimating outcomes represent purer spectra than could be found in any of the mixtures). Effect of Data Points on the Polytope Edges The method is likely to fail provided that no data fall on or near an edge of the mixing polytope. To test this effect, we have undertaken method 2 from a set of 16 mixtures with the absence of five of the six data points on the soil–coal edge. Figure 13A top shows the results obtained assigning the default value to parameter a (i.e., a51.0). It is observed as the method addressed reasonably well the vegetation and soil positions, but it did not work well enough for the coal one. Figure 13B top shows the estimated spectra of the three components, which re-
veals that coal spectrum was not adequately reproduced. Nevertheless, a more suitable election of the a value is bound to improve the results, as can be observed in Figure 13A bottom, which shows the results for the case a50.2. It revealed that when the value of a was diminished, the minimization was allowed to capture external endmembers, and the performance was significantly increased (see Fig. 13B bottom). In fact, the results were even better for lesser values of parameter a (e.g., a50.01). However, it is necessary to assign values distinct from zero to a, otherwise the method will probably fail (as it is observed in Fig. 14). In general, the assignation of an optimum value to a is not straightforward since it depends on the characteristics of the system and the data dimensionality. Real Mixtures Finally, method 2 was undertaken from the set of real spectral mixtures. Figure 15 shows the estimating spectra of the three pure components from data in TM wavebands. They adequately reproduce the spectral signa-
Figure 15. Case of real mixtures in TM wavebands. A) Polytope (symbols have the same meaning as in Fig. 10A). B) Estimated spectra of the pure components (dashed lines) jointly with the original curves (solid lines).
250
Garcı´a-Haro et al.
Figure 16. Estimated spectra of the pure components as obtained from method 3 (dashed lines) jointly with the original curves (solid lines) for TM synthetic data sets with different levels of noise.
tures of vegetation, soil, and coal. The chi-square test indicated a good performance too. Application of Method 3 Synthetic Mixtures The convergence was obtained at about only 200 epochs, when the function error presented values similar to the levels of noise (in order to avoid an overtraining that would adapt the solution to fine details associated to noise). Figure 16 shows two examples of the endmember regeneration, as obtained from TM data presenting different levels of noise. Despite the fact that the increase of noise brought about a greater fluctuation of the estimating endmembers, the residuals were comparable to introduced noise and the systematic error was small. Figure 17 shows the values of RMS residual of matrix R (corresponding to average the results from five noisy distributed data). This residual showed a behavior similar to method 1, tending to increase at increasing the number of bands and decreasing the number of wavebands. Identically occurred for matrices F and D. However, this reFigure 17. Values of RMS residual of endmembers matrix R vs. the number of bands for different levels of noise.
sidual is even smaller than in method 1, specially for the extreme case when the noise is maximum and the number of wavebands reduced (e.g., for TM data). Real Mixtures Figure 18 shows the estimated spectra of pure components as obtained from the real mixtures in TM wavebands. They reasonably reproduce the spectra of vegetation, soil, and coal. Moreover, the test of the fit of the model was successfully applied. APPLICATION TO A LANDSAT TM SCENE The methods developed are likely to estimate the composing endmembers in actual satellite scenes. Thematic Mapper (TM) data were obtained from a Landsat-5 overpass taken on 7 April 1993 (a floating window from the scene 199/34 of 80 km by 60 km in size, and limited by the UTM coordinates: Xmin5565000; Xmax5645000; Ymin5415300; Ymax54213000) which comprises the major part of the Guadalentı´n Basin—a semiarid landscape situated in SE Spain. The image was atmospheriFigure 18. Estimated spectra of the three pure components from real mixtures (in TM wavebands) as obtained from method 3.
Extraction of Endmembers from Spectral Mixtures
251
Figure 19. Values of the scores of the data points in the abstract space (∗) and extremes captured by method 2 (s) from TM pixels presenting quaternary background.
cally corrected using an invertible atmospheric radiative transfer model (Gilabert et al., 1994). We next checked the capability of method 2 to derive suitable endmembers of soil and different types of seminatural vegetation. This method was selected due to the optimal performance that offers on data with a reduced dimensionality. The area shows high variability of lithological exposure. The main dominant lithologies are carbonate, marl, quaternary, phyllite, greywackes, and gypsum. Quaternary deposits include pebbles of various sizes and compositions of sedimentary and metamorphic origin (Younis et al., 1997). In order to control soil influence we analyzed separately the zones presenting uniform soil background. This was undertaken by utilizing a lithological classification of the area (MEDALUS II Final Report, 1995). Figure 19 shows the scores of the data points (in the axes space PC1–PC2 and PC1–PC3) from 500 TM pixels randomly selected in quaternary background areas (which present matorral and pine forest covers). The polytope is enclosed by the endmember positions of the three estimated pure components. As shown in this figure, the endmember positions gather the distribution topology.
Figure 20A shows the estimated spectra of the three endmembers, for areas with quaternary background. The spectra resemble the signatures of green vegetation, soil (quaternary), and shadow. The application of this methodology to the areas with other different lithologies produced similar values: one vegetation spectrum, one soil spectrum, and a third endmember which in combination with the other two reproduces the variability of the area (shadow, other soil background, or another vegetation species). Results demonstrated the validity of the method, for example, RMS residual of the fit was comparable to data error. Furthermore, the estimated endmembers guaranteed the acceptability of the fractions [i.e., Eqs. (4)]. Figure 20B shows the distribution of the endmember fractions in the baricentric triangle axes, corresponding to areas presenting quaternary background. Even though its contribution in the scene is low, the spectrum of green vegetation was captured from the data. However, the accuracy of the different methods depends on the quality of the samples: the more biased the samples, the worse the results.
Figure 20. Case of Figure 19. Left) Estimated spectra of the pure components. Right) Distribution of the endmember fractions in the baricentric triangle axes.
252
Garcı´a-Haro et al.
Although it is theoretically possible to apply the method to a greater set of endmembers, the accuracy of the estimates is conditioned by the real dimensionality of the scene and the contrast of the materials (it is proportional to the hypervolume of the space defined by the data). Therefore, a TM scene hardly can support the computation of four endmembers with the desirable accuracy (Garcı´a-Haro et al., 1996). CONCLUSIONS Three different methods have been developed to estimate pure component spectra from spectral mixtures. They were checked from mixtures of three components. Results demonstrated the validity of the methods, reproducing the original data for a 95% confidence level. The analysis of errors provided inaccuracies of the pure components spectra below the introduced noise and relatively small values for the fractions in all cases. In general, RMS residual of estimated spectra increases at increasing the level of noise and decreasing the number of wavebands (mainly in method 1) and it is dependent not only on the number of mixture spectra but also on the spectral contrast between materials, effects that will be carefully analyzed in future work. Typical values of RMS residual were below 0.2 for endmember reflectance and 0.04 for fractions. The error of estimated parameters supplied by the minimization algorithm was similar. Results indicated that method 3 seems to be the most accurate in overall and method 2 shows the best performance when the number of wavebands is reduced, which makes it specially adequate to be applied on TM data (as confirmed subsequently). Similarly, good results were obtained for the case of real mixtures of vegetation and soil, even from data with a reduced dimensionality. Moreover, endmembers are selected from the data space itself, and therefore, the methods are less problematic than library based methods which can hardly have into account all processes and factors influencing the data, which is specially critical when the number of spectral channels is reduced. Same results were reached when handing Clementine data analysis (five bands in the ultraviolet-visible domain) with a similar type of method (Martin et al., 1997; Pinet et al., 1996). Nevertheless, the results depend on the quality of the sample. Samples should express the variability of the system of interest and present a significant variation in the mixture proportions. Though the methods can ideally identify external endmembers, they could fail provided that data do not fall near an edge of the mixing polytope, which determines the accuracy of the outcomes. The results are also conditioned by the original input (first iteration). However, despite that an appropriate election of the parameters for the first iteration improves the reliability of the computations, the three proposed methods
seem to be robust since they provided acceptable outcomes in all cases. Providing that the accuracy attainable by the methods is inadequate to extract the complete set of spectra, it is always possible to introduce a modification in order to use one or more endmembers whose signatures are well known. This increases very much the sensitivity of the method to reliably estimate the rest of pure components as it was obtained in preliminary results. Nevertheless, regardless of the method used to select the endmembers, calibration based on ground measurements is needed to convert derived fractions to absolute abundances. Concluding, the optimization proposed method provides a deep insight into the nature of spectral endmembers, offering new possibilities in imaging spectroscopy. This work has been supported by the MEDALUS III Project, funded by the European Communities (ENV4-CT95-0119). Special thanks are due to the anonymous referees for their helpful comments on the first draft of the article. J. Garcı´a-Haro has had a research grant (BOE 23.11.1994) from the Ministry of Science and Education (Spain).
REFERENCES Abuelgasim, A. A., Gopal, S., Irons, J. R., and Strahler, A. H. (1996), Classification of ASAS multiangle and multispectral measurements using artificial neural networks. Remote Sens. Environ. 57:79–87. Baraldi, A., and Parmiggiani, F. (1995), A neural network for unsupervised categorization of multivalued input patterns: an application to satellite image clustering. IEEE Trans. Geosci. Remote Sens. 33:305–316. Bateson, A., and Curtiss, B. (1996), A method for manual endmember selection and spectral unmixing. Remote Sens. Environ. 55:229–243. Bezdek, J. C. (1981), Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum, New York. Bierwirth, P. N. (1990), Mineral mapping and vegetation removal via data-calibrated pixel unmixing, using multispectral images. Int. J. Remote Sens. 11:1999–2017. Bryant, R. G. (1996), Validated linear mixture modeling of Landsat TM data for mapping evaporite minerals on a playa surface: methods and applications. Int. J. Remote Sens. 17:315–330. Franklin, J. (1986), Thematic mapper analysis of coniferous forest structure and composition. Int. J. Remote Sens. 7:1287–1301. Frodesen, A. G., Skjeggestad, O., and Tøfte, H. (1979), Probability and Statistics in Particle Physics, Universitetsforlaget, Oslo, Norway. Full, W. E., Ehrlich, R., and Klovan, J. E. (1981), Extended Qmodel—objective definition of external end members in the analysis of mixtures. Math. Geol. 13:331–344. Full, W. E., Ehrlich, R., and Bezdek, J. C. (1982), Fuzzy Qmodel—a new approach for linear unmixing. Math. Geol. 14:259–270.
Extraction of Endmembers from Spectral Mixtures
Garcı´a-Haro, F. J., Gilabert, M. A., and Melia´, J. (1996), Linear spectral mixture modeling to estimate vegetation amount from optical spectral data. Int. J. Remote Sens. 17:3373–3400. Garcı´a-Haro, F. J. (1997), Modelizacio´n y estimacio´n de para´metros relacionados con la cubierta vegetal en teledeteccio´n, MSc thesis, Universitat de Vale`ncia, Spain. Gilabert, M. A., Conese, C., and Maselli, F. (1994), An atmospheric correction method for the automatic retrieval of surface reflectances from TM images. Int. J. Remote Sens. 15: 2065–2086. Gopal, S., and Woodcock, C. (1996), Remote sensing of forest change using artificial neural network. IEEE Trans. Geosci. Remote Sens. 34:398–404. Heermann, P. D., and Khazenie, N. (1992), Classification of multispectral remote sensing data using a Back-Propagation Neural network. IEEE Trans. Geosci. Remote Sens. 30:81–88. Huete, A. R. (1986), Separation of soil–plant spectral mixtures by factor analysis. Remote Sens. Environ. 19:237–251. Johnson, P. E., Smith, M. O., and Adams, J. B. (1985), Quantitative analysis of planetary reflectance spectra with principal component analysis. In Proc. 15th Lunar and Planetary Science Conference. Pt 2, J. Geophys. Res., pp. C805–C810. Malinkowski, E. R., and Howery, D. G. (1977), Determination of the number of errors and the experimental error in a data matrix. Anal. Chem. 49:612–617. Martin, P. D., Pinet, P. C., Chevrel, S. D., and Daydou, Y. H. (1997), New methodology for spectral analyses and high spatial resolution: an example through the investigations of the more humorum region of the Moon. In Proc. XXVIII Lunar and Planetary Science Conference, Houston, pp. 877–878. Martin, P. D., Pinet, P. C., Bacon, R., Rousset, A., and Bellagh, F. (1996), Martian surface mineralogy from 0.8 to 1.05 microns TIGER spectro-imagery measurements in Terra Sirenum and Tharsis Montes formation. Planet Space Sci. 44:859–888.
253
MEDALUS II Final Report (1995), Internal report presented to MEDALUS office. Thatcham, Berkshire, U.K. Milton, E. J., and Emery, D. R. (1995), The identification of reference endmembers using high spatial resolution multispectral images. In Proc. RSS95 Remote Sensing in Action, Remote Sensing Society 95, Southampton, pp. 579–586. Minuit (1992), CERN program library entry D506, Reference manual, Computing and Network Division, Geneve, Switzerland. Moody, A., Gopal, S., and Strahler, A. H. (1996), Artificial neural network response to mixed pixels in coarse-resolution satellite data. Remote Sens. Environ. 58:329–343. Pinet, P. C., Martin, P. D., Costard, S. C., Daydou, Y., and Johnson, P. E. (1996), Aristarchus Plateau: Clementine spectro-imaging and geological inferences. In Proc. XXVII Lunar and Planetary Science Conference, Houston, pp. 1037–1038. Rumelhart, D. E., and McClelland, J. L. (1986), Parallel Distributed Processing, MIT Press, Cambridge, MA, Vol. 3. Sabol, D. E., Adams, J. B., and Smith, M. O. (1992), Quantitative subpixel spectral detection of targets in multispectral images. J. Geophys. Res. 25:2659–2672. Serpico, S. B., and Roli, F. (1995), Classification of multisensor remote-sensing images by structured neural networks. IEEE Trans. Geosci. Remote Sens. 33:562–578. Strahler, A. H., Woodcock, C. E., and Smith, J. A. (1986), On the nature models in remote sensing. Remote Sens. Environ. 20:121–139. Tompkins, S., Mustard, J. F., Pieters, C. M., and Forsyth, W. (1997), Optimization of endmembers for spectral mixtures analysis. Remote Sens. Environ. 59:472–489. White, K., and Drake, N. (1993), Mapping the distribution and abundance of gypsum in South-Central Tunisia from Landsat Thematic Mapper data. Z. Geomorphol. 37:309–325. Younis, M. T., Gilabert, M. A., Melia´, J., and Bastida, J. (1997), Weathering process effects on spectral reflectance of rocks in a semi-arid environment, Int. J. Remote Sens. 18:3361–3377.