International Journal of Applied Earth Observation and Geoinformation 11 (2009) 423–430
Classifiers vs. input variables—The drivers in image classification for land cover mapping
M. Heinl a,b,*, J. Walde c, G. Tappeiner d, U. Tappeiner a,b
a Institute of Ecology, University of Innsbruck, Sternwartestr. 15, 6020 Innsbruck, Austria
b Institute for Alpine Environment, EURAC, Viale Druso 1, 39100 Bolzano, Italy
c Department of Statistics, University of Innsbruck, Universitätsstr. 15, 6020 Innsbruck, Austria
d Department of Economics, University of Innsbruck, Universitätsstr. 15, 6020 Innsbruck, Austria
ARTICLE INFO
Article history: Received 11 November 2008; Accepted 20 August 2009
ABSTRACT
The study investigates the performance of image classifiers for landscape-scale land cover mapping and the relevance of ancillary data for classification success, in order to assess and quantify the importance of these components in image classification. Specifically tested is the performance of maximum likelihood classification (MLC), artificial neural networks (ANN) and discriminant analysis (DA) based on Landsat7 ETM+ spectral data in combination with topographic measures and NDVI. ANN produced high accuracies of more than 75% even with limited input information, while MLC and DA produced comparable results only when ancillary data were incorporated into the classification process. The superiority of ANN classification was less pronounced at the level of the single land cover classes. The use of ancillary data generally increased classification accuracy and showed a potential for increasing classification accuracy similar to that of the selection of the classifier. Therefore, a stronger focus on the development of appropriate and optimised sets of input variables is suggested. The definition and selection of land cover classes has also been shown to be crucial and cannot simply be adopted from existing land cover class schemes. A stronger research focus on discriminating land cover classes by their typical spectral, topographic or seasonal properties is therefore suggested to advance image classification. © 2009 Elsevier B.V. All rights reserved.
Keywords: Classification; Landsat; Artificial neural network; Discriminant analysis; Maximum likelihood; Land use; Land cover; Thematic map; Ancillary data
This paper is dedicated to Professor Walter Larcher on the occasion of his 80th birthday.
* Corresponding author at: Institute of Ecology, University of Innsbruck, Sternwartestr. 15, 6020 Innsbruck, Austria. Tel.: +43 512 507 5980. E-mail address: [email protected] (M. Heinl).
0303-2434/$ – see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.jag.2009.08.002
1. Introduction
Detailed and accurate land cover data are among the most crucial information required for large-scale environmental research. Knowledge of the spatial configuration of the Earth’s surface is the key to assessing habitat distribution, landscape composition or land use changes and is an essential requirement for landscape modelling and scenario building, particularly in times of global change. The suitability of remote sensing for acquiring land cover data has long been recognised, but the process of generating land cover information from remotely sensed data is still far from being standardised or optimised (Foody, 2002; Lu and Weng, 2007). An extensive variety of multi-spectral image classification methods has been developed, which were recently reviewed by Lu and Weng (2007), though none of the developed classifiers is described as inherently superior to any other, as their performance largely depends on the kind and quality of the input data for the classification and the desired output. Even unsupervised ISODATA classification has been used successfully, for example to extract specific, spectrally distinct features such as forests, fire scars, coastlines or urban areas (Ekercin, 2007; Heinl et al., 2006; Kaya and Curran, 2006; Souza et al., 2003). However, for obtaining thematic land cover data, supervised classification is to be preferred in most cases (Foody, 2001; Jensen, 2005; Kavzoglu, 2009), as the desired output classes are already pre-defined and post-classification analyses and class aggregations are not necessarily required. In particular, the use of advanced approaches such as artificial neural networks, fuzzy sets or support vector machines has produced levels of accuracy higher than those of, e.g., the popular maximum likelihood classifier or discriminant analysis (Berberoglu et al., 2007; Dixon and Candade, 2008; Jensen, 2005; Kavzoglu and Mather, 2003; Kavzoglu and Reis, 2008; Pal and Mather, 2005). However, only a few specific comparisons have been published (Berberoglu et al., 2007; Hardin, 2000; Kavzoglu and Reis, 2008; Paola and Schowengerdt, 1995; Zhang et al., 2007), usually documenting a superiority of the advanced approaches, but also suggesting maximum likelihood classification as the better alternative (Carvalho et al., 2004). The use of different numbers and types of land cover classes and sample sizes complicates a quantitative comparison of
the results. Despite its often documented inferiority in classification success, maximum likelihood classification is still one of the most widely used classification algorithms (Jensen, 2005), most likely also due to advantages in data handling and processing times (Paola and Schowengerdt, 1995). Therefore, many applied landscape-scale studies and much land use/land cover research rely on these standard classification approaches (Brandt and Townsend, 2006; Cushman and Wallin, 2000; Jianchu et al., 2005; Joy et al., 2003; Ruiz-Luna and Berlanga-Robles, 2003). In contrast, advanced approaches are primarily limited to methodological studies for optimising the classification process, often using only very limited sample sizes (Fassnacht et al., 2006; Foody, 2001; Kavzoglu and Mather, 2003; Kavzoglu and Reis, 2008; Ouyang and Ma, 2006; Paola and Schowengerdt, 1997; Yemefack et al., 2006).
Besides the type of image classifier, the use of ancillary data is recognised as being crucial for the performance of image classification. Ancillary data have been used successfully to improve image classification, especially by including topographic measures, NDVI or texture measures in the classification process in addition to the spectral information for separating features with similar spectral properties (Berberoglu et al., 2007; Carpenter et al., 1999; Giannetti et al., 2001; Islam et al., 2008; Joy et al., 2003; Kozak et al., 2008; Lu and Weng, 2007; Saadat et al., 2008; Watanachaturaporn et al., 2008). Despite decades of extensive research on classifiers and ancillary data, comparisons and applications of image classifiers using standardised samples at the landscape scale are largely missing (Lu and Weng, 2007). To close this gap, the present study jointly addresses the performance of different classifiers and the importance of ancillary data for landscape-scale land cover assessments using pre-defined land cover classes.
The present study therefore investigates the effect of a variety of selected and widely accessible input variables and classifiers on classification accuracy, both overall and at the level of specific land cover classes, and assesses and quantifies the importance of these components in image classification. We hypothesize that advanced classification approaches achieve higher overall accuracies than standard classifiers with little or no ancillary data, while incorporating ancillary data reduces the importance of the type of classifier. Specifically compared is the performance of maximum likelihood classification, discriminant analysis and artificial neural networks, covering presumably the most widely used hard classifiers and representing parametric and non-parametric classifiers. Ancillary data in the form of topographic measures and NDVI were incorporated stepwise into the classification to document the relevance of these input data. Classification results at the level of land cover classes are discussed in the context of reference data selection and land cover class definition.
2. Methods
2.1. Spectral data properties and study region
The spectral information for the image classification was acquired by the Landsat7 ETM+ sensor (path 193/row 027) on 13 September 1999. The imagery was provided by the Global Land Cover Facility (GLCF) (www.landcover.org) as an orthorectified GeoCover data set in GeoTIFF format with UTM projection (UTM 32N), WGS-84 datum and 28.5 m pixel size. The six bands representing the visible and infrared spectrum (ETM+ bands 1–5, 7) were used in the study. The scene was cut to 1650 × 3300 pixels to fit the extent of the available reference data and covers 3541 km², including the city of Innsbruck (Austria) in the north-east (Fig. 1). The landscape is mountainous, with elevation ranging from 390 m to 3739 m a.s.l., and consists primarily of grasslands and urban areas in the valley bottoms and of forests, alpine grasslands, bare rocks and glaciers on slopes and at high altitude.
2.2. Ancillary data
Elevation, slope, aspect, sun elevation angle (terrain illumination) and the Normalised Difference Vegetation Index (NDVI) were calculated and used as ancillary data in the classification process. A digital elevation model with 90 m resolution (pixel size) (Jarvis et al., 2006) was used to derive elevation, slope and aspect for the study region in ArcGIS 9.2. The resulting data were resampled to the spatial resolution of the spectral data (28.5 m) by cubic convolution. The three terrain measures were used in combination as one classification input package (DEM). The sun elevation angle cos(i) was used as a measure of terrain illumination and accounts for topographic effects (Teillet et al., 1982), calculated as
$\cos(i) = \cos\theta_n \cos\theta_{sz} + \sin\theta_n \sin\theta_{sz} \cos(\phi_s - \phi_n)$,
where $\theta_n$ is the terrain slope angle, $\theta_{sz}$ is the solar zenith angle, $\phi_s$ is the solar azimuth angle and $\phi_n$ is the aspect angle (Twele and Erasmi, 2005). Solar zenith and azimuth angle were derived from the Landsat metadata file. Slope and aspect angle were calculated from the digital elevation model. The Normalised Difference Vegetation Index (NDVI) was calculated as
$\mathrm{NDVI} = (\mathrm{NIR}_{TM4} - \mathrm{RED}_{TM3}) / (\mathrm{NIR}_{TM4} + \mathrm{RED}_{TM3})$.
2.3. Reference data
Reference data for training and validating the classifications were provided by an extensive land cover assessment campaign for selected districts in North Tyrol, Austria (Neustift, Fulpmes, Mutters, Innsbruck, Längenfeld), South Tyrol, Italy (St. Leonhard in Passeier, St. Martin in Passeier) and Upper Bavaria, Germany (Garmisch-Partenkirchen, Farchant) (Tasser et al., 2009). The land cover data were derived from visual interpretation of aerial photography from 2000 in combination with field sampling in 2000/2001. The data were mapped consistently at a scale of 1:10 000 with a minimum mapping unit (MMU) of 4 ha. Only pixels within the core areas of the reference data polygons, i.e. pixels at least 50 m away from the polygon boundary (10 m for water courses), were included to reduce delineation errors. The data were transferred to raster format using the spatial resolution of the spectral data (28.5 m), which resulted in 716 524 pixels as reference data. The land cover classes used for the mapping were reclassified for the present study to meet the European CORINE Level 2 data criteria (Bossard et al., 2000; Nunes de Lima, 2005). The hierarchical CORINE land cover classification scheme includes in Level 1 ‘artificial surfaces’, ‘agricultural areas’, ‘forest and semi-natural areas’, ‘wetlands’ and ‘water bodies’, and 15 classes in Level 2, of which 11 were recorded in the study area. Only class 34 (‘glaciers and perpetual snow’) was introduced from CORINE Level 3 to account for the wide areas covered by snow and glaciers in the study area, so that 12 land cover classes were used in the present study (Table 1).
2.4. Classification process
To assess the relevance of classifiers and input variables for classification success, 15 classifications were calculated, using three classifiers with five different input combinations. The input variables included (1) the spectral Landsat information from bands 1–5 and 7 (ETM), (2) ETM in combination with the topographic measures elevation, slope and aspect (DEM), (3) ETM, DEM and NDVI, (4) ETM, DEM and cos(i), and (5) ETM, DEM, NDVI and cos(i). Supervised classification was performed using discriminant analysis (DA) in SPSS, maximum likelihood classification (MLC) in Geomatica and the artificial neural network (ANN) in MATLAB. The MLC was calculated so that every pixel was assigned to a training class and no Null-class was created. DA was calculated using prior probabilities
equal for all groups. (Although prior probabilities proportional to the group sample size usually lead to higher classification accuracy, very small classes are then rarely preserved in the classification outcome.) MLC and DA were calculated with 30% (214 931) randomly selected pixels of the reference data (training data), and were validated by the remaining 70% (501 593) of the reference data. Accuracy assessment is based on the agreement between validation data and classification outcomes. The artificial neural network, in particular a fully connected three-layer perceptron (MLP), was used as a non-linear analysis tool. The MLP consisted of an input layer containing 6, 9, 10 or 11 processing units according to the number of input variables, of a hidden layer including as many processing units as necessary to approximate the relationship, and of an output layer. The output layer had 12 output units corresponding to the land cover classes (cf. Table 1). The logistic function was employed as activation function for the output units to ensure the interpretation of the output as probability. The hyperbolic tangent function was employed as activation function for the hidden layer. The Levenberg–Marquardt algorithm (Bishop, 1996) was used as training algorithm. The optimal hit ratio, which was defined as the proportion of correctly classified pixels to all pixels in the validation set, was achieved with 40 hidden units. A sample of 100 000 pixels (based on the 30% (214 931) initial training data sets) was selected randomly for the ANN classification, of which 70% were used as training data to estimate the parameters and 30% were used to control for the generalization ability. In order to have equal prior probabilities, each class was represented by 8333 data sets. For classes with fewer than this number of pixels, the units in the random sample were duplicated. Final validation of the ANN classification was performed with the same 70% (501 593 pixels) of the reference data set as for MLC and DA.
Fig. 1. Outline of the districts providing the reference data (white areas), presented on Landsat7 ETM+ imagery (false colour composite of bands 5, 4, 3 with UTM coordinates, Zone 32N).
2.5. Post-classification assessments
Classification accuracy was assessed as overall accuracy, representing the proportion of ‘correctly’ classified pixels (i.e. pixels with corresponding reference and classification class) relative to the total number of investigated pixels. For detailed analyses, user’s and producer’s accuracy as well as the Kappa coefficient of agreement were also calculated (cf. Foody, 2002; Jensen, 2005). A significance level of 1% was chosen. Upper and lower limits of the confidence interval were computed using the formula for confidence intervals for proportions (e.g. in Thomas and Allcock, 1984) for each class separately in order to consider the underlying heterogeneity. Afterwards the weighted average was calculated to obtain the limits for the overall accuracy. Additionally, a few training samples were drawn randomly and the methods optimised on the new samples. Subsequently the distribution of the obtained overall accuracies was analysed. As the width of the confidence intervals obtained in this way was quite similar to the width of the confidence intervals calculated before, these results are omitted for the sake of brevity. Due to the spatial resolution of 28.5 m of the input data (MMU of 0.08 ha), the classification results are provided at a different level of detail than the reference data (MMU of 4 ha). Therefore, classification accuracy is underestimated, as a discrepancy between reference and classification data is not necessarily a result of misclassification but can also be caused by the missing level of detail in the reference data. These differences in MMU were addressed in an additional assessment by limiting the validation data to pixels from pixel clusters larger than 4 ha in the classification outcomes.
3. Results
3.1. Overall classification accuracy related to classifiers and input variables
The classifications by DA and MLC produced very similar overall accuracies for all input combinations.
Accuracies were in the range of 55–60% when using only spectral data (ETM) as input variables and reached about 75% when ancillary data were included (Fig. 2). The classifications using ANN produced higher overall accuracies for all input combinations compared to MLC and DA, reaching about 75% when using only spectral data (ETM) and 85% with ancillary data. The maximum overall classification accuracy of 86.3% was achieved by using ANN and all input information; the lowest accuracy of 56.3% resulted from DA using spectral information (ETM) only. The stepwise incorporation of ancillary data into the classification process showed the most pronounced increase in classification accuracy for DEM data (i.e. elevation, slope and aspect), independent of the classifier. The increase of classification accuracy by incorporating DEM data was highest for MLC and DA with 18.8% and 21.1%, respectively, and lowest for ANN with 8.4%. The further incorporation of NDVI values increased classification accuracy
Table 1
Number and percentage of pixels of the land cover classes in the reference data.
LCC    Class description                            Abbr.   Number of pixels
11     Urban fabric                                 urb       21,666 (3.0%)
12     Industrial, commercial and transport units   ind        4,412 (0.6%)
14     Artificial, non-agric. vegetated areas       art           96 (<0.1%)
21     Arable land                                  arab       1,453 (0.2%)
22     Permanent crops                              crop         301 (<0.1%)
23     Grassland                                    grass     33,509 (4.7%)
31     Forests                                      for      356,986 (49.8%)
32     Scrub and/or herbaceous associations         scrub    108,629 (15.2%)
33     Open spaces with little or no vegetation     open     156,446 (21.8%)
34     Glaciers and perpetual snow                  glac      30,794 (4.3%)
41     Inland wetlands                              wet          596 (0.1%)
51     Inland waters                                wat        1,636 (0.2%)
Total                                                        716,524 (100%)
Land cover class coding (LCC) and description follow the CORINE Land Cover Level 2 nomenclature (Bossard et al., 2000; Nunes de Lima, 2005).
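Table 1 shows how unevenly the classes are represented, which is why the ANN sample described in Section 2.4 used equally sized class samples (100 000 pixels over 12 classes gives 8333 per class, with duplication for classes that have fewer pixels). The following Python sketch illustrates one way such a balanced sample could be drawn. The function name, the toy labels and the choice to top up small classes with random duplicates are assumptions made for illustration; the study does not describe its duplication procedure in more detail.

```python
import random
from collections import defaultdict


def balanced_sample(labels, per_class, seed=0):
    """Return pixel indices so that every class contributes exactly per_class entries.

    Classes with at least per_class pixels are subsampled without replacement.
    Smaller classes keep all their pixels and are topped up with random
    duplicates (an assumed reading of the duplication step).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    sample = []
    for label in sorted(by_class):
        members = by_class[label]
        if len(members) >= per_class:
            chosen = rng.sample(members, per_class)
        else:
            extra = rng.choices(members, k=per_class - len(members))
            chosen = members + extra
        sample.extend(chosen)
    rng.shuffle(sample)
    return sample


# Toy example: three classes of very different size (codes as in Table 1).
toy_labels = [31] * 50 + [33] * 20 + [14] * 3
picked = balanced_sample(toy_labels, per_class=10, seed=42)

counts = defaultdict(int)
for i in picked:
    counts[toy_labels[i]] += 1
print(sorted(counts.items()))
```

In the toy example every class ends up with ten entries, and all three pixels of the smallest class are present at least once, so no rare-class information is discarded.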
by roughly 2–3%, the inclusion of cos(i) increased it by roughly 1–2%, and their combined inclusion increased it by about 3–4%. Only pixels from pixel clusters larger than 4 ha were then used to correct for the differences in the MMU of reference and image data. This correction increased overall accuracy to a final maximum in this study of 89.5% (Kappa: 0.84) for MLC, 89.2% (Kappa: 0.83) for DA and 94.3% (Kappa: 0.91) for ANN classification. Overall, for the same input information, ANN always produced the most accurate classification. Regarding the input variables, DEM data in particular significantly increased classification accuracy, and the increase was most pronounced for DA and MLC. Considering the MMU in the reference data further increased overall accuracy by about 9%. The maximum overall accuracy reached by all three classifiers was of the order of 90%.
3.2. Accuracy of single land cover classes related to classifiers and input variables
The relation described above, i.e. increasing overall accuracy with an increasing number of input variables and highest accuracies for ANN classification, was only partially supported by the results for single land cover classes, namely by ‘industrial units’ (12), ‘forest’ (31), ‘scrub’ (32) and ‘open’ (33) (Fig. 3). In contrast, classification success was hardly affected by the input variables for ‘art. veg. areas’ (14), ‘arable’ (21), ‘crop’ (22), ‘wetland’ (41) or ‘water’ (51). These classes were also generally not well classified by ANN, which even produced empty output classes. As a third group, the classes ‘urban fabric’ (11), ‘grassland’ (23) and ‘glaciers’ (34) were least accurately classified by ANN with ETM as the only input information, whereas incorporating further input variables again resulted in the highest accuracies for ANN.
In the case of the ‘grassland’ (23) class, the incorporation of NDVI data also clearly increased classification accuracy, besides the already mentioned generally positive effect of DEM data on overall accuracy. This indicates the importance of ancillary data, especially DEM but also NDVI data, in addition to the spectral information (ETM) for the discrimination of specific land cover classes.
3.3. “Confusion” of land cover classes
Misclassifications between single land cover classes are best assessed by an error or confusion matrix. As an example of the classification outcomes, the results from discriminant analysis,
Fig. 2. Overall accuracies of maximum likelihood classification (MLC), discriminant analysis (DA) and artificial neural network classification (ANN) with different input data combinations (ETM: Landsat7 ETM+ bands 1–5, 7; DEM: aspect, slope, elevation; NDVI; cos(i): terrain illumination). Presented are also overall accuracies after considering the minimum mapping unit (MMU) in the reference data, i.e. the assessment was limited to pixel clusters larger than the MMU of 4 ha (cf. Section 2).
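The MMU restriction mentioned in the caption of Fig. 2 (validation limited to pixel clusters larger than 4 ha in the classification outcome) can be illustrated with a small Python sketch. At 28.5 m pixel size, 4 ha corresponds to roughly 49 pixels. The function name, the toy map and the use of 4-connected clusters are assumptions for illustration; the study does not state which connectivity rule or software was used for this step.

```python
from collections import deque

PIXEL_SIZE_M = 28.5
MMU_HA = 4.0
# 4 ha = 40,000 m^2; one pixel covers 28.5 m * 28.5 m = 812.25 m^2.
MIN_PIXELS = MMU_HA * 10_000 / PIXEL_SIZE_M ** 2


def mmu_mask(class_map, min_pixels=MIN_PIXELS):
    """Mark pixels that belong to same-class clusters larger than min_pixels.

    class_map is a list of rows of class codes. Clusters are 4-connected
    regions of identical class (the connectivity rule is an assumption).
    """
    rows, cols = len(class_map), len(class_map[0])
    keep = [[False] * cols for _ in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0]:
                continue
            label = class_map[r0][c0]
            cluster = []
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            # Breadth-first flood fill over neighbours with the same class.
            while queue:
                r, c = queue.popleft()
                cluster.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc]
                            and class_map[nr][nc] == label):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if len(cluster) > min_pixels:
                for r, c in cluster:
                    keep[r][c] = True
    return keep


# Toy classification outcome; a threshold of 3 pixels stands in for the MMU.
toy_map = [
    [31, 31, 33, 33],
    [31, 31, 33, 23],
    [31, 23, 23, 23],
]
toy_keep = mmu_mask(toy_map, min_pixels=3)
print(toy_keep)
```

Only pixels flagged in the returned mask would enter the accuracy assessment; in the toy map the three-pixel cluster of class 33 is dropped while the larger clusters of classes 31 and 23 are kept.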
using all input variables (ETM, DEM, NDVI, cos(i)) and considering the MMU are presented (Table 2). Producer’s accuracies higher than 80% were recorded for 6 out of the 12 land cover classes, and all classes showed producer’s accuracies higher than 50%. User’s accuracies higher than 80% were also produced for 6 out of the 12 land cover classes. The other classes showed user’s accuracies below 50%, ‘artificial, non-agric. vegetated areas’ (14) and ‘wetland’ (41) even below 10%. The land cover classes can be grouped into three main categories according to the classification success. One group consists of classes with both high user’s and producer’s accuracy, including ‘forest’ (31), ‘scrub’ (32), ‘open’ (33) and ‘glaciers’ (34). A second group consists of classes with high user’s accuracy and low producer’s accuracy, which includes ‘urban fabric’ (11) and ‘grassland’ (23). ‘Urban fabric’ (11) was primarily misclassified as ‘industrial unit’ (12) inside the group of urban classes. ‘Grassland’ (23) was primarily misclassified as ‘forest’ (31) and ‘wetland’ (41), but also to a lesser extent as ‘arable land’ (21), ‘scrub’ (32), ‘crop’ (22) and ‘artificial, non-agric. vegetated areas’ (14). The third group consists of classes with high producer’s accuracy and low user’s accuracy, including ‘industrial units’ (12), ‘artificial, non-agric. vegetated areas’ (14), ‘arable land’ (21), ‘crop’ (22), ‘wetland’ (41) and ‘water’ (51). Areas classified as ‘industrial units’ (12) included primarily pixels from ‘urban fabric’ (11), indicating again a confusion inside the ‘urban’ classes. The ‘artificial, non-agric. vegetated areas’ (14) classification was mainly confused by falsely included pixels from ‘forest’ (31), ‘urban fabric’ (11) and ‘scrub’ (32). ‘Arable land’ (21), ‘crop’ (22) and ‘wetland’ (41) were all primarily misclassified due to the inclusion of pixels originating from ‘grassland’ (23).
The areas classified as ‘water’ (51) included primarily pixels from ‘forest’ (31), but also from ‘open’ habitats (33). Adequate classification accuracy of about 80% or higher could therefore only be achieved for ‘forest’ (31), ‘scrub’ (32), ‘open’ (33) and ‘glaciers’ (34). Taking into account that the misclassification of ‘urban fabric’ (11) and ‘industrial units’ (12) occurred mainly between these two classes, a combined urban class would also qualify as well represented by the classification.
4. Discussion
4.1. The relevance of input variables and classifiers for image classification accuracy
Spectral data, topographic measures and NDVI data were used to test their performance in image classifications by maximum
Fig. 3. Number of correctly classified pixels by MLC (*), DA (&) and ANN (!) classification for each land cover class, using different input data combinations (ETM: Landsat7 ETM+ bands 1–5, 7; DEM: aspect, slope, elevation; NDVI; cos(i): terrain illumination). Zero values for classes not present in the classification outcomes were omitted.
likelihood classification (MLC), discriminant analysis (DA) and artificial neural networks (ANN). The use of ancillary data significantly improved the classification accuracy for the present data set compared to using spectral data (ETM) only. These increases in overall accuracy were observed independent of the classifier. Incorporating topographic information (elevation, slope, aspect; DEM) in particular, but also the NDVI, showed positive effects on the overall accuracy, indicating strong interdependences between these factors and land cover. The NDVI is derived from surface reflectance and, therefore, inevitably responds to land cover; but due to its nonlinearity, the index adds new information to the spectral bands, enhancing class separability (Cihlar et al., 1996; Defries and Townshend, 1994; Hansen et al., 2000; Jensen, 2005; Muchoney and Strahler, 2002). Topography, however, is only an indirect measure, reflecting environmental gradients (e.g. temperature) that originally affect the land cover. Hence, land cover is not a response to topography, but to environmental conditions or land use often associated with topography. The incorporation of topographic data into the classification process will therefore not increase classification accuracy in all cases, but becomes most relevant where it reflects environmental gradients (e.g. as in the case of mountainous regions in the present study) (Islam et al., 2008; Saadat et al., 2008). Therefore, hydrological, soil or geological data, etc. can also be used
successfully for improving image classification accuracy. However, environmental gradients or landscape heterogeneity need to be reflected, so that the relevance of such data obviously depends on the study area (Giannetti et al., 2001; Mas, 2004; Shrestha and Zinck, 2001; Watanachaturaporn et al., 2008). Regarding the classifiers, the study revealed large differences in classification accuracy between ANN and MLC or DA for the minimum input setting (ETM). The differences between the classifiers are less pronounced when comparing the classification results using the maximum number of input variables (cf. Fig. 2). The level of classification accuracy was similar for DA and MLC including ancillary data and ANN without ancillary data. This suggests that investing in the incorporation of ancillary data can be as productive for increasing classification accuracy as the preparation of classification algorithms: incorporating ancillary data increased overall classification accuracy by about 10% for ANN and by about 20% for MLC and DA, while the different classifiers varied in the range of 20% for ETM as sole input information and otherwise differed by less than 10%. Overall, this indicates a superiority of ANN classification in case of limited input information, the necessity of incorporating ancillary data when using DA or MLC for achieving adequate classification accuracies comparable to ANN, and a minor importance of the type of classifier with increasing
Table 2
Error matrix of the supervised classification by discriminant analysis (DA).
      Reference data
LCC        urb     ind     art    arab    crop   grass     for   scrub    open    glac     wet     wat       Sum  UA [%]
            11      12      14      21      22      23      31      32      33      34      41      51
11       3,834       0       1       0       0       7       0       0      81       0       0       1     3,924    97.7
12       1,766   1,564       3       4       0      49       0       4      50       0       0      22     3,462    45.2
14         220       6      29      17       0     133   2,564     220       3       0       6      25     3,223     0.9
21           1       0       0     162       0     591       0       0       0       0       0       0       754    21.5
22           0       0       0       9     109     246      25       0       0       0       0       0       389    28.0
23           3       3       5       0       0   5,195     373     631       0       0       0      12     6,222    83.5
31          23      17       0       5       6   1,863 208,659   3,677   1,158       0      20      75   215,503    96.8
32           0       0       0       0       0     572   5,358  49,708   6,250       0      98       0    61,986    80.2
33           0       0       0       0       0       0   1,086  10,931  87,417   2,698       0      12   102,144    85.6
34           0       0       0       0       0       0       0      38   1,580  17,968       0       0    19,586    91.7
41           4      10       0       0       0   1,372     166     320      26       0     140       4     2,042     6.9
51          12       5       0       1       0       8     605       1     200       0       0     494     1,326    37.3
Sum      5,863   1,605      38     198     115  10,036 218,836  65,530  96,765  20,666     264     645   420,561
PA [%]    65.4    97.4    76.3    81.8    94.8    51.8    95.3    75.9    90.3    86.9    53.0    76.6
Input data are the Landsat bands 1–5 and 7 (ETM), elevation, slope and aspect (DEM), NDVI and cos(i). Only pixels from pixel clusters larger than 4 ha are considered in order to correct for the MMU in the reference data. See Table 1 for coding and abbreviation of the land cover classes (LCC). Overall accuracy: 89.2. Kappa coefficient: 83.5. All underlined values indicate hit ratios that are significantly higher than the corresponding hit ratios computed by employing a-priori probabilities. A-priori probabilities for class membership may either be obtained by the equal distribution of the classes or by the empirically given class distribution.
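The accuracy measures reported with Table 2 can be recomputed from the error matrix itself. The Python sketch below does this for overall accuracy, user's accuracy (UA), producer's accuracy (PA) and the Kappa coefficient, using the standard textbook definitions. It assumes that rows hold the classified class and columns the reference class, which is the orientation under which the printed row and column sums agree; the function and variable names are illustrative and not taken from the study.

```python
# Error matrix from Table 2: rows = classified class, columns = reference class.
CLASSES = [11, 12, 14, 21, 22, 23, 31, 32, 33, 34, 41, 51]
MATRIX = [
    [3834, 0, 1, 0, 0, 7, 0, 0, 81, 0, 0, 1],
    [1766, 1564, 3, 4, 0, 49, 0, 4, 50, 0, 0, 22],
    [220, 6, 29, 17, 0, 133, 2564, 220, 3, 0, 6, 25],
    [1, 0, 0, 162, 0, 591, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 9, 109, 246, 25, 0, 0, 0, 0, 0],
    [3, 3, 5, 0, 0, 5195, 373, 631, 0, 0, 0, 12],
    [23, 17, 0, 5, 6, 1863, 208659, 3677, 1158, 0, 20, 75],
    [0, 0, 0, 0, 0, 572, 5358, 49708, 6250, 0, 98, 0],
    [0, 0, 0, 0, 0, 0, 1086, 10931, 87417, 2698, 0, 12],
    [0, 0, 0, 0, 0, 0, 0, 38, 1580, 17968, 0, 0],
    [4, 10, 0, 0, 0, 1372, 166, 320, 26, 0, 140, 4],
    [12, 5, 0, 1, 0, 8, 605, 1, 200, 0, 0, 494],
]


def accuracy_metrics(matrix):
    """Return overall accuracy, user's and producer's accuracies, and Kappa."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    diag = [matrix[i][i] for i in range(k)]
    row_sums = [sum(row) for row in matrix]                              # classified totals
    col_sums = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # reference totals

    overall = sum(diag) / n
    users = [d / r if r else float("nan") for d, r in zip(diag, row_sums)]
    producers = [d / c if c else float("nan") for d, c in zip(diag, col_sums)]

    # Chance agreement from the marginal totals, then Cohen's Kappa.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / n ** 2
    kappa = (overall - expected) / (1 - expected)
    return overall, users, producers, kappa


overall, users, producers, kappa = accuracy_metrics(MATRIX)
print(f"overall accuracy: {overall:.1%}, kappa: {kappa:.3f}")
for code, ua, pa in zip(CLASSES, users, producers):
    print(f"class {code}: UA {ua:.1%}, PA {pa:.1%}")
```

Run on the Table 2 counts, the sketch gives an overall accuracy of about 89.2% and a Kappa of about 0.835, in line with the values stated in the table note.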
number of inputs. The latter becomes apparent when assuming that all classifiers ideally work towards the same level of accuracy, i.e. 100%, so that high classification accuracies of different classifiers will inevitably converge. But this convergence is basically driven by the input variables. Choosing a superior classifier, such as ANN (Hardin, 2000; Jensen, 2005; Kavzoglu and Mather, 2003; Zhang et al., 2007), is therefore expected to become less crucial the more input information is included in the classification. However, ancillary data need to be carefully selected, as increasing the input information does not necessarily enhance classification accuracy (Kavzoglu and Mather, 2002). The selection of appropriate ancillary data therefore becomes highly relevant when trying to achieve high classification accuracies, especially with standard classifiers like MLC or DA. The accuracy assessment at the level of the single land cover classes also often revealed more pronounced advantages of incorporating ancillary data compared to the selection of the classifier (cf. Fig. 3). Only ‘industrial units’ (12), ‘forest’ (31), ‘scrub’ (32) and ‘open’ (33) showed the same superiority of ANN classification for all input combinations that was documented for the overall accuracy. But these classes represent more than 85% of the reference data set, so that they obviously dominate the trends in overall accuracy. In contrast, ‘urban fabric’ (11), ‘grassland’ (23) and ‘glaciers’ (34) produced the lowest accuracies for ANN classification without ancillary data, and only the incorporation of DEM and NDVI led to accuracies comparable with MLC and DA. As these classes represent only 12% of the reference data, their accuracies have little effect on overall accuracy, but they are of course highly relevant for land cover mapping.
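The two class shares quoted here follow directly from the pixel counts in Table 1. A minimal Python check is given below; the dictionary and function names are illustrative only.

```python
# Reference pixel counts per land cover class, copied from Table 1.
PIXELS = {11: 21666, 12: 4412, 14: 96, 21: 1453, 22: 301, 23: 33509,
          31: 356986, 32: 108629, 33: 156446, 34: 30794, 41: 596, 51: 1636}


def share(classes, pixels=PIXELS):
    """Fraction of all reference pixels that falls into the given classes."""
    total = sum(pixels.values())
    return sum(pixels[c] for c in classes) / total


dominant_share = share([12, 31, 32, 33])  # classes where ANN led for all inputs
minor_share = share([11, 23, 34])         # classes where ANN needed ancillary data
print(f"{dominant_share:.1%} {minor_share:.1%}")  # prints: 87.4% 12.0%
```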
Therefore, despite the relatively small effect of ancillary data on the overall accuracy of the ANN classifications, the assessment of the single land cover classes clearly revealed the importance of ancillary data for specific classes for ANN classification as well, and not only for MLC and DA as indicated by the results on overall accuracy. Including ancillary data in the classification process is therefore considered crucially important for achieving high classification accuracies, independent of the classifier, both at the level of overall accuracy and at the level of the single land cover classes. The rather extensive discussion in the scientific literature on techniques and performances of image classifiers (Dixon and
Candade, 2008; Foody, 2001; Kavzoglu and Mather, 2003; Kavzoglu and Reis, 2008; Ouyang and Ma, 2006; Pal and Mather, 2005; Paola and Schowengerdt, 1997) and only little attention and efforts towards developing and optimising input variables (Berberoglu et al., 2007; Carra˜o et al., 2008; Zhu and Tateishi, 2006) does, however, not reflect this picture (Lu and Weng, 2007). A stronger focus on the development of appropriate and optimised sets of input variables for discrimination of specific land cover classes rather than on image classifiers would therefore be an important step for further advances in image classification. 4.2. Aspects of reference data selection and class definition Besides input variables and classifiers, also the aspects of reference data selection and the land cover class definition are of crucial importance for any kind of supervised image classification. The approach of using data polygons for reference data selection with a minimum mapping unit (MMU) has shown to be problematic, as the discrepancy in mapping detail between reference and image data produces inaccurate land cover information for areas smaller than the MMU, but is rather unavoidably in any non-pixel based approach (Cihlar et al., 1996). The effect of the MMU could be illustrated in the present study by the increase in overall accuracy by about 10% after excluding pixels that are part of pixel clusters smaller than the MMU; considering the MMU in the accuracy assessment is therefore definitely of significant importance. The derived confusion matrix for the classification by discriminant analysis (DA) revealed not only the classification accuracy of the single land cover classes, but could also illustrate effects of inappropriate land cover class definition, which became especially evident for ‘grassland’ (23) and the classes representing artificial surfaces. 
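The effect of excluding small pixel clusters from the accuracy assessment can be sketched as follows. This is not the procedure used in the study but a minimal Python illustration under stated assumptions: clusters are taken as 4-connected groups of equally classified pixels in the classified map, the MMU is expressed as a hypothetical pixel count (`min_pixels`), and the small rasters are invented.

```python
from collections import deque


def cluster_sizes(class_map):
    """For each pixel, return the size of its 4-connected cluster of equal class labels."""
    rows, cols = len(class_map), len(class_map[0])
    sizes = [[0] * cols for _ in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            label = class_map[r][c]
            seen[r][c] = True
            queue = deque([(r, c)])
            members = []
            # Breadth-first flood fill over neighbours with the same label.
            while queue:
                y, x = queue.popleft()
                members.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and class_map[ny][nx] == label):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            for y, x in members:
                sizes[y][x] = len(members)
    return sizes


def overall_accuracy(classified, reference, min_pixels=1):
    """Overall accuracy, ignoring pixels in clusters smaller than min_pixels.

    min_pixels is a hypothetical MMU expressed in pixels; at least one pixel
    is assumed to remain after filtering.
    """
    sizes = cluster_sizes(classified)
    kept = 0
    correct = 0
    for r, row in enumerate(classified):
        for c, label in enumerate(row):
            if sizes[r][c] < min_pixels:
                continue  # pixel belongs to a cluster below the assumed MMU
            kept += 1
            if label == reference[r][c]:
                correct += 1
    return correct / kept


# Invented example: the reference polygon is uniformly class 1, while the
# classified map contains two isolated single-pixel clusters (classes 2 and 3).
classified = [
    [1, 1, 1, 2],
    [1, 1, 1, 1],
    [1, 3, 1, 1],
    [1, 1, 1, 1],
]
reference = [[1] * 4 for _ in range(4)]

print(overall_accuracy(classified, reference))                # all pixels
print(overall_accuracy(classified, reference, min_pixels=2))  # small clusters excluded
```

In this toy case the two isolated pixels are the only disagreements, so excluding clusters below the assumed MMU raises the computed accuracy. Real data would also lose correctly classified small clusters, so the size of the effect depends on the landscape and on the MMU chosen.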
The ‘grassland’ (23) class showed low producer’s accuracy and was mainly misclassified as ‘forest’ (31), ‘scrub’ (32) and ‘wetland’ (41) (cf. Table 2), which indicates that this class is defined too heterogeneously. Indeed, ‘grassland’ (23) includes intensive meadows, extensively managed pastures and even abandoned grasslands, and therefore covers many different ‘subclasses’ with different spectral properties caused by woody components, senescent vegetation or treatment that would require
a refinement of this class. However, comparing the classification success for the ‘grassland’ (23) class across the different classifiers indicates a better separability by ANN. Artificial neural networks have, indeed, been found to be more robust to training site heterogeneity than other classifiers (Kavzoglu and Reis, 2008; Muchoney and Strahler, 2002; Paola and Schowengerdt, 1995), although this could only be supported in this study when ancillary data were incorporated. ANN classification therefore seems less sensitive to class definition than MLC or especially DA, provided that sufficient non-spectral information is available for discriminating the classes. Inappropriate class definition most likely also affects the classification accuracy of other classes, e.g. the land cover classes ‘arable’ (21), ‘crop’ (22) and ‘wetland’ (41), which show low user’s accuracies and many falsely included ‘grassland’ (23) pixels. These obviously spectrally and also topographically rather similar land cover types would require further information for their separation in image classification, e.g. by merging multi-temporal data to account for seasonal variations related to management practices (Brown de Colstoun et al., 2003; Capao et al., 2007; Carrão et al., 2008; Langley et al., 2001; Liu et al., 2002; Zhu and Tateishi, 2006). In contrast to the suggested refinement or subdivision of the ‘grassland’ (23) class, the classes representing artificial surfaces appear to be inappropriately subdivided. ‘Urban fabric’ (11) and ‘industrial units’ (12) are largely confused with each other (cf. Table 2) and appear to be difficult to distinguish spectrally or even in the process of reference data mapping. A combination of these classes would clearly improve the classification outcome, though obviously any reduction of thematic resolution would increase classification accuracy (Bach et al., 2006; Latifovic and Olthof, 2004). Inappropriate class definition also became evident for ‘artificial, non-agric.
vegetated areas’ (14), which represents parks and leisure facilities largely composed of grassy and woody surfaces and is therefore primarily confused with ‘grassland’ (23) and ‘forest’ (31) due to spectral similarities. Hence, classes that are primarily defined by their spatial appearance (‘land use’) and not by their surface characteristics (‘land cover’) seem inappropriate for image classification purely based on spectral and topographic information. In cases where land use classes are not sufficiently separable by remotely sensed data, classification accuracy may be increased through the use of texture measures or contextual information (Berberoglu et al., 2007; Carvalho et al., 2004). Nevertheless, the results clearly stress the importance of well-defined land cover classes for image classification. Class definition needs to be based on typical surface characteristics rather than on land use or spatial connectivity with other classes. Using global or continental land cover classification schemes (e.g. FAO’s LCCS (Di Gregorio and Jansen, 2000) or CORINE (Bossard et al., 2000; Nunes de Lima, 2005)) therefore requires careful consideration of whether the classes used appropriately represent the land cover in the regional or local study area. The land cover class definition also needs to consider the spectral properties of the classes and the separability of the classes by the input variables. Inappropriate class definition or missing input variables for discriminating classes will inevitably lead to inaccurate and blurred classification results, as, e.g., for the ‘grassland’ (23) class in the present study. The understanding of what kind of land cover classes can be discriminated by what kind of spectral, topographic or temporal information is, however, still rudimentary, and further research on the separability of specific land cover classes is required to advance image classification.

5. Conclusion

The comparison of the performance of MLC, DA and ANN in image classification revealed advantages of ANN classification in overall accuracy and for single land cover classes. The incorporation of ancillary data into the classification process clearly increased
classification accuracy overall and on the level of single land cover classes, independent of the classifier used. ANN produced high accuracies even with limited input information, while MLC and DA produced comparable results only by incorporating ancillary data into the classification process. However, the superiority of ANN classification was less pronounced on the level of the single land cover classes, especially when no ancillary data were incorporated in the classification. Overall, the magnitude of the differences in overall accuracy between the input data combinations indicates a large potential of ancillary data for increasing classification accuracy, comparable to that of the selection of the classifier. Therefore, further approaches that work towards an optimised set of input variables for discriminating specific land cover classes are required. The definition and selection of land cover classes has also been shown to be crucial and cannot simply be adopted from existing land cover class schemes, especially for regional or local studies. Land cover classes need to be defined based on separability by the input variables used and on local habitat characteristics. A stronger focus on discriminating land cover types by their spectral, topographic or seasonal properties is therefore required to further advance the process of image classification.

Acknowledgements

The research was kindly supported by the University of Innsbruck Vice Rectorate for Research and the European Academy Bolzano (EURAC). The authors thank two anonymous reviewers for their valuable comments and suggestions.

References

Bach, M., Breuer, L., Frede, H.G., Huisman, J.A., Otte, A., Waldhardt, R., 2006. Accuracy and congruency of three different digital land-use maps. Landscape and Urban Planning 78 (4), 289–299.
Berberoglu, S., Curran, P.J., Lloyd, C.D., Atkinson, P.M., 2007. Texture classification of Mediterranean land cover. International Journal of Applied Earth Observation and Geoinformation 9 (3), 322–334.
Bishop, C., 1996. Neural Networks for Pattern Recognition. Oxford University Press, New York.
Bossard, M., Feranec, J., Otahel, J., 2000. CORINE Land Cover Technical Guide – Addendum 2000. Technical Report No 40 (Copenhagen, EEA).
Brandt, J.S., Townsend, P.A., 2006. Land use–land cover conversion, regeneration and degradation in the high elevation Bolivian Andes. Landscape Ecology 21 (4), 607–623.
Brown de Colstoun, E.C., Story, M.H., Thompson, C., Commisso, K., Smith, T.G., Irons, J.R., 2003. National Park vegetation mapping using multitemporal Landsat 7 data and a decision tree classifier. Remote Sensing of Environment 85 (3), 316–327.
Capao, L., Carrao, H., Araujo, A., Caetano, M., 2007. An approach for land cover mapping with multi-temporal MERIS imagery. In: IEEE Geoscience and Remote Sensing Symposium (IGARSS), Proceedings. pp. 3836–3839.
Carpenter, G.A., Gopal, S., Macomber, S., Martens, S., Woodcock, C.E., Franklin, J., 1999. A neural network method for efficient vegetation mapping. Remote Sensing of Environment 70, 326–338.
Carrão, H., Gonçalves, P., Caetano, M., 2008. Contribution of multispectral and multitemporal information from MODIS images to land cover classification. Remote Sensing of Environment 112 (3), 986–997.
Carvalho, L.M.T.D., Clevers, J.G.P.W., Skidmore, A.K., Jong, S.M.D., 2004. Selection of imagery data and classifiers for mapping Brazilian semideciduous Atlantic forests. International Journal of Applied Earth Observation and Geoinformation 5 (3), 173–186.
Cihlar, J., Ly, H., Xiao, Q.H., 1996. Land cover classification with AVHRR multichannel composites in northern environments. Remote Sensing of Environment 58 (1), 36–51.
Cushman, S.A., Wallin, D.O., 2000. Rates and patterns of landscape change in the Central Sikhote-alin Mountains, Russian Far East. Landscape Ecology 15 (7), 643–659.
Defries, R.S., Townshend, J.R.G., 1994. NDVI-derived land-cover classifications at a global-scale. International Journal of Remote Sensing 15 (17), 3567–3586.
Di Gregorio, A., Jansen, L., 2000. Land Cover Classification System (LCCS): Classification Concepts and User Manual. FAO, Rome, Italy.
Dixon, B., Candade, N., 2008. Multispectral landuse classification using neural networks and support vector machines: one or the other, or both? International Journal of Remote Sensing 29 (4), 1185–1206.
Ekercin, S., 2007. Coastline change assessment at the Aegean Sea Coasts in Turkey using multitemporal Landsat imagery. Journal of Coastal Research 23 (3), 691–698.
Fassnacht, K.S., Cohen, W.B., Spies, T.A., 2006. Key issues in making and using satellite-based maps in ecology: a primer. Forest Ecology and Management 222 (1–3), 167–181.
Foody, G.M., 2001. Thematic mapping from remotely sensed data with neural networks: MLP, RBF and PNN based approaches. Journal of Geographical Systems 3, 217–232.
Foody, G.M., 2002. Status of land cover classification accuracy assessment. Remote Sensing of Environment 80, 185–201.
Giannetti, F., Montanarella, L., Salandin, R., 2001. Integrated use of satellite images, DEMs, soil and substrate data in studying mountainous lands. International Journal of Applied Earth Observation and Geoinformation 3 (1), 25–29.
Hansen, M.C., Defries, R.S., Townshend, J.R.G., Sohlberg, R., 2000. Global land cover classification at 1 km spatial resolution using a classification tree approach. International Journal of Remote Sensing 21 (6–7), 1331–1364.
Hardin, P.J., 2000. Neural networks versus nonparametric neighbor-based classifiers for semisupervised classification of Landsat Thematic Mapper imagery. Optical Engineering 39 (7), 1898–1908.
Heinl, M., Neuenschwander, A., Sliva, J., Tacheba, B., 2006. Interactions between fire and flooding in a southern African floodplain system (Okavango Delta, Botswana). Landscape Ecology 21, 699–709.
Islam, M.A., Thenkabail, P.S., Kulawardhana, R.W., Alankara, R., Gunasinghe, S., Edussriya, C., Gunawardana, A., 2008. Semi-automated methods for mapping wetlands using Landsat ETM plus and SRTM data. International Journal of Remote Sensing 29 (24), 7077–7106.
Jarvis, A., Reuter, H.I., Nelson, A., Guevara, E., 2006. Hole-filled Seamless SRTM Data V3. International Centre for Tropical Agriculture (CIAT), available from http://srtm.csi.cgiar.org.
Jensen, J.R., 2005. Introductory Digital Image Processing: A Remote Sensing Perspective. Pearson, Prentice Hall, USA.
Jianchu, X., Xihui, A., Xiqing, D., 2005. Exploring the spatial and temporal dynamics of land use in Xizhuang watershed of Yunnan, southwest China. International Journal of Applied Earth Observation and Geoinformation 7 (4), 299–309.
Joy, S.M., Reich, R.M., Reynolds, R.T., 2003. A non-parametric, supervised classification of vegetation types on the Kaibab National forest using decision trees. International Journal of Remote Sensing 24 (9), 1835–1852.
Kavzoglu, T., 2009. Increasing the accuracy of neural network classification using refined training data. Environmental Modelling & Software 24 (7), 850–858.
Kavzoglu, T., Mather, P.M., 2002. The role of feature selection in artificial neural network applications. International Journal of Remote Sensing 23 (15), 2919–2937.
Kavzoglu, T., Mather, P.M., 2003. The use of backpropagating artificial neural networks in land cover classification. International Journal of Remote Sensing 24 (23), 4907–4938.
Kavzoglu, T., Reis, S., 2008. Performance analysis of maximum likelihood and artificial neural network classifiers for training sets with mixed pixels. Giscience & Remote Sensing 45 (3), 330–342.
Kaya, S., Curran, P.J., 2006. Monitoring urban growth on the European side of the Istanbul metropolitan area: A case study. International Journal of Applied Earth Observation and Geoinformation 8 (1), 18–25.
Kozak, J., Estreguil, C., Ostapowicz, K., 2008. European forest cover mapping with high resolution satellite data: the Carpathians case study. International Journal of Applied Earth Observation and Geoinformation 10 (1), 44–55.
Langley, S.K., Cheshire, H.M., Humes, K.S., 2001. A comparison of single date and multitemporal satellite image classifications in a semi-arid grassland. Journal of Arid Environments 2001 (49), 401–411.
Latifovic, R., Olthof, I., 2004. Accuracy assessment using sub-pixel fractional error matrices of global land cover products derived from satellite data. Remote Sensing of Environment 90 (2), 153–165.
Liu, Q.J., Takamura, T., Takeuchi, N., Shao, G., 2002. Mapping of boreal vegetation of a temperate mountain in China by multitemporal Landsat TM imagery. International Journal of Remote Sensing 23 (17), 3385–3405.
Lu, D., Weng, Q., 2007. A survey of image classification methods and techniques for improving classification performance. International Journal of Remote Sensing 28 (5), 823–870.
Mas, J.F., 2004. Mapping land use/cover in a tropical coastal area using satellite sensor data, GIS and artificial neural networks. Estuarine Coastal and Shelf Science 59 (2), 219–230.
Muchoney, D.M., Strahler, A.H., 2002. Pixel- and site-based calibration and validation methods for evaluating supervised classification of remotely sensed data. Remote Sensing of Environment 81 (2–3), 290–299.
Nunes de Lima, M.V.E., 2005. CORINE land cover updating for the year 2000. Image 2000 and CLC2000, products and methods. EUR 21757 EN (Ispra, JRC-IES).
Ouyang, Y., Ma, J., 2006. Classification of multi-spectral remote sensing data using a local transfer function classifier. International Journal of Remote Sensing 27 (24), 5401–5408.
Pal, M., Mather, P.M., 2005. Support vector machines for classification in remote sensing. International Journal of Remote Sensing 26 (5), 1007–1011.
Paola, J.D., Schowengerdt, R.A., 1995. A detailed comparison of backpropagation neural network and maximum-likelihood classifiers for urban land use classification. Geoscience and Remote Sensing 33 (4), 981–996.
Paola, J.D., Schowengerdt, R.A., 1997. The effect of neural-network structure on a multispectral land-use/land-cover classification. Photogrammetric Engineering & Remote Sensing 63 (5), 535–544.
Ruiz-Luna, A., Berlanga-Robles, C., 2003. Land use, land cover changes and coastal lagoon surface reduction associated with urban growth in northwest Mexico. Landscape Ecology 18 (2), 159–171.
Saadat, H., Bonnell, R., Sharifi, F., Mehuys, G., Namdar, M., Ale-Ebrahim, S., 2008. Landform classification from a digital elevation model and satellite imagery. Geomorphology 100 (3–4), 453–464.
Shrestha, D.P., Zinck, J.A., 2001. Land use classification in mountainous areas: integration of image processing, digital elevation data and field knowledge (application to Nepal). International Journal of Applied Earth Observation and Geoinformation 3 (1), 78–85.
Souza, C., Firestone, L., Silva, L., Roberts, D., 2003. Mapping forest degradation in Eastern Amazon from SPOT4 through spectral mixture models. Remote Sensing of Environment 87 (4), 494–506.
Tasser, E., Ruffini, F., Tappeiner, U., 2009. An integrative approach for analysing landscape dynamics in diverse cultivated and natural mountain areas. Landscape Ecology 24 (5), 611–628.
Teillet, P.M., Guindon, B., Goodenough, D.G., 1982. On the slope-aspect correction of multispectral scanner data. Canadian Journal of Remote Sensing 8 (2), 84–106.
Thomas, I.L., Allcock, G.M., 1984. Determining the confidence level for a classification. Photogrammetric Engineering and Remote Sensing 50 (10), 1491–1496.
Twele, A., Erasmi, S., 2005. Evaluating topographic correction algorithms for improved land cover discrimination in mountainous areas of central Sulawesi. In: Erasmi, S., Cyffka, B., Kappas, M. (Eds.), Remote Sensing and GIS for Environmental Studies. Göttinger Geographische Abhandlungen 113, Göttingen, pp. 287–295.
Watanachaturaporn, P., Arora, M.K., Varshney, P.K., 2008. Multisource classification using support vector machines: an empirical comparison with decision tree and neural network classifiers. Photogrammetric Engineering and Remote Sensing 74 (2), 239–246.
Yemefack, M., Bijker, W., De Jong, S.M., 2006. Investigating relationships between Landsat-7 ETM+ data and spatial segregation of LULC types under shifting agriculture in southern Cameroon. International Journal of Applied Earth Observation and Geoinformation 8 (2), 96–112.
Zhang, Y., Gao, J., Wang, J., 2007. Detailed mapping of a salt farm from Landsat TM imagery using neural network and maximum likelihood classifiers: a comparison. International Journal of Remote Sensing 28 (10), 2077–2089.
Zhu, L., Tateishi, R., 2006. Fusion of multisensor multitemporal satellite data for land cover mapping. International Journal of Remote Sensing 27 (5–6), 903–918.