ISPRS Journal of Photogrammetry and Remote Sensing 84 (2013) 114–129
Contents lists available at ScienceDirect
ISPRS Journal of Photogrammetry and Remote Sensing journal homepage: www.elsevier.com/locate/isprsjprs
A data mining approach for evaluation of optimal time-series of MODIS data for land cover mapping at a regional level Fuqun Zhou a,⇑, Aining Zhang a, Lawrence Townley-Smith b a b
Canada Centre for Remote Sensing, 588 Booth Street, Ottawa, Ontario K1A 0Y7, Canada Agriculture-Ago Food Canada, Science and Technology Branch, Agriculture and Agri-Food Canada, 1800 Hamilton Street, Regina, Saskatchewan S4P 4L2, Canada
a r t i c l e
i n f o
Article history: Received 4 January 2013 Received in revised form 24 May 2013 Accepted 17 July 2013 Available online 21 August 2013 Keywords: Land cover Time-series imagery Data mining Information extraction Optimal variable combination MODIS data
a b s t r a c t Optical Earth Observation data with moderate spatial resolutions, typically MODIS (Moderate Resolution Imaging Spectroradiometer), are of particular value to environmental applications due to their high temporal and spectral resolutions. Time-series of MODIS data capture dynamic phenomena of vegetation and its environment, and are considered as one of the most effective data sources for land cover mapping at a regional and national level. However, the time-series, multiple bands and their derivations such as NDVI constitute a large volume of data that poses a significant challenge for automated mapping of land cover while optimally utilizing the information it contains. In this study, time-series of 10-day cloud-free MODIS composites and its derivatives – NDVI and vegetation phenology information, are fully assessed to determine the optimal data sets for deriving land cover. Three groups of variable combinations of MODIS spectral information and its derived metrics are thoroughly explored to identify the optimal combinations for land cover identification using a data mining tool. The results, based on the assessment using time-series of MODIS data, show that in general using a longer time period of the time-series data and more spectral bands could lead to more accurate land cover identification than that of a shorter period of the time-series and fewer bands. However, we reveal that, with some optimal variable combinations of few bands and a shorter period of time-series data, the highest possible accuracy of land cover classification can be achieved. Crown copyright Ó 2013 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS) Published by Elsevier B.V. All rights reserved.
1. Introduction Optical Earth Observation (EO) data with medium spatial resolution, typically MODIS (Moderate Resolution Imaging Spectroradiometer), are of particular value to environmental applications and are considered as the most effective data sources for land cover mapping at regional and national scales due to their high temporal and spectral resolutions. Time-series of EO data, including from Advanced Very High Resolution Radiometer (AVHRR), SPOT VGT, and Moderate Resolution Imaging Spectroradiometer (MODIS), have been widely used in land cover identification (Hirosawa et al., 1996; Kasischke and French, 1997; Liang, 2001; Friedl et al., 2002; Bagan et al., 2005; Latifovic and Pouliot, 2005; Xiao et al., 2005; Matsuoka et al., 2007; Wardlow et al., 2007; Carrão et al., 2008; Zhang et al., 2008; Clark et al., 2010; Lv and Liu, 2010; Wardlow and Egbert, 2010; Colditz et al., 2012a) and for land cover dynamics assessment and land cover change detection (Jakubauskas et al., 2002; Yang and Lo, 2002; Huttich et al.,
⇑ Corresponding author. E-mail address:
[email protected] (F. Zhou).
2007; Donohue et al., 2008; Giriraj et al., 2008; Liu et al., 2011; Alcantara et al., 2012). Time-series of EO data have many advantages compared to a single time image for land cover identification (Maxwell et al., 2002a) as the former captures dynamic spectral information of vegetation at various vegetation growth stages. In some of these studies, the derived metrics from the time-series EO spectral information such as Normalized Difference Vegetation Index (NDVI) (Fuller, 1998; Azzalil and Menentil, 1999; Weiss et al., 2001; Hill and Donald, 2003; Hilla et al., 2004; Knight et al., 2006; Wardlow and Egbert, 2008; Baldi et al., 2008; Evrendilek and Gulbeyaz, 2008; Kleynhans et al., 2011; Fensholt and Proud, 2012; Neeti et al., 2012) and vegetation phenology are used (Reed et al., 1994; de Beurs and Henebry, 2004; Lupo et al., 2007; Bradley and Mustard, 2008). These derived metrics augment the data dimension and enrich the information for vegetation mapping and monitoring. However, the time-series, multiple bands and their derivations constitute a large volume of data that poses a significant challenge for automated mapping of land cover. How to optimally use this wealth of information and maximally realize its advantages remains an issue. For example:
0924-2716/$ - see front matter Crown copyright Ó 2013 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS) Published by Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.isprsjprs.2013.07.008
F. Zhou et al. / ISPRS Journal of Photogrammetry and Remote Sensing 84 (2013) 114–129
(1) Would the utilization of all the information available give the highest accuracy for land cover identification? (2) Would utilization of only a portion of the time-series of data (for example, a few spectral bands, shorter time period or the derived metrics such as NDVI, vegetation phenology) achieve similar or better accuracy than using of all the information? (3) Would utilization of the derived metrics such as NDVI and vegetation phenology metrics combined with spectral bands (fewer bands) result in the same or higher accurate land cover maps? (4) If the answers to questions 2 or/and 3 are positive, what combinations of bands and time-periods of data and NDVI or vegetation phenology may lead to the optimal result? Maxwell et al. (2002a,b) examined AVHRR time-series data (16day composites) for land cover identification, and addressed some of the above questions. In their first study (Maxwell et al., 2002a), they evaluated the effect of reducing the number of composite periods and altering the spacing of those composite periods on land cover classification accuracy. They found that the number of composite periods can be halved, reducing from fourteen (14) composite dates to seven (7) composite dates, without significantly reducing overall classification accuracy, and that concentrating more composites near the beginning and end of the growing season, as compared to using evenly spaced time periods, produced slightly higher classification values. The study result was similar to the earlier studies by McGwire et al. (1992) and Ehrlich and Lambin (1996). In another study, Maxwell et al. (2002b) assessed the reduction of spectral dimensionality of the AVHRR bands while retaining information needed for land cover mapping. For AVHRR, they concluded that three channels among five are necessary for effective land cover type discrimination. Recently, Biradar et al. (2007) assessed the best spectral bands and timing of imagery for land use – land cover class separation using Landsat ETM+ and Terra MODIS data (16-day composites). Their study found that using certain bands and time-series period combinations of both datasets can better separate land cover and land use in their small study area. Although these studies have answered some of the above questions, further studies are needed to fully explore the optimal reduced dataset combinations for land cover identification for potential monitoring operations at a regional scale. Moreover, although NDVI and vegetation phenology derived from time-series of spectral bands are increasingly used for characterizing land cover and detecting land cover change, their contributions along with the spectral bands as well as their optimal combination for accurate land cover mapping need a thorough evaluation. This study tries to answer the above questions by fully exploring the time-series of MODIS 10-day cloud-free composites in the growing season to identify the optimal variable combinations for land cover identification at a regional level. As the possible combinations of the time-series of MODIS data and its derivatives are enormous, we have developed and evaluated four variable combination models to identify the optimal combination of variables for land cover mapping which is described in section below.
2. Study area, data and method 2.1. Study area The area under the study is the southern agriculture region of Saskatchewan, Canada as shown in Fig. 1. Saskatchewan is in the central part of the Canadian Prairies and is bounded on the west
115
by Alberta, on the north by the Northwest Territories, on the east by Manitoba of Canada, and on the south by the states of Montana and North Dakota, USA. Saskatchewan is vast – 651,900 km2 and contains two major natural regions: the boreal forest in the north and the prairies in the south. The southern half of Saskatchewan contains almost half of Canada’s total cultivated farmland. Saskatchewan continues to be a leader in cereal crop production across the country supplying 5% of the world’s total exported wheat and is Canada’s most important grain-producing region. Saskatchewan also produces oilseeds such as canola, flaxseed, mustard, sunflower and safflower. In the northern part of the province, forestry is also a significant industry. Overall, annual crop land dominates the study region, with substantial pasture/perennial crop land such as alfalfa. Forest land is mainly distributed in the north. (http://www.agriculture.gov.sk.ca/Saskatchewan_Picture/ – accessed on August 31st, 2012). 2.2. EO data and derived variables MODIS sensors aboard the Terra satellites provide global coverage every one to two days, acquiring data in 7 spectral bands suitable for land surface applications, with a spatial resolution of 250 m (bands 1–2) or 500 m (bands 3–7). It is one of the most advanced sensors available for large-scale terrestrial applications (Salomonson et al., 1989). To facilitate use of the datasets, Canada Centre for Remote Sensing has developed a technology to produce 10-day cloud-free composites of Terra MODIS’ 7 bands (1–7) covering Canada and North America. In addition, bands 3–7 are downscaled from the original spatial resolution of 500 m to 250 m (Luo et al., 2008). Therefore all the 7 bands of the 10-day cloud-free MODIS composites used in the study have the same spatial resolution and dimension. Canada is located at high latitude and has only one growing season. For this study, only the data in the growing season (April to October) is included. The selected time-series datasets span 7 months and each month has 3 composites. In total, 21 composites are available in the period of the vegetation growing season. Considering all the 7 bands for each composite, there is a total of 147 ‘bands’ or layers of the selected time series. Two additional datasets, NDVI and a set of vegetation phenological metrics in the vegetation-growing season, are derived from the time-series of MODIS bands and analyzed. As NDVI is derived from MODIS bands 1 and 2, it has the same length of the MODIS composites, i.e., there are 3 NDVI values per month, and in total, a timeseries of 21 NDVI variables are derived during the studied season. A time-series of NDVI of three typical vegetation types (annual crop, native grass and deciduous tree) are shown in Fig. 2a, c, and e. It can be seen that the three types of vegetation, in general, have different NDVI patterns. Vegetation phenology represents stages of growth or development of plants which occur during a growing season, and apply to both natural vegetation (e.g. trees, and grassland) and cultivated crops. NDVI responds to the growth of vegetation which differs with land cover and can be captured by EO sensors with a high revisit frequency. Vegetation phenological curves can be derived from a time-series of NDVI (Fig. 2) and described by phenological parameters from the curves (Fig. 3). The phenology curves of three typical vegetation types (annual crop, native grass, and deciduous tree) are shown in Fig. 2b, d, and f. In this study, a total of 11 vegetation phenological parameters were derived from the phonology curves using the TIMESAT package (Jönsson and Eklundh, 2003). We discuss these parameters and their applications in detail in later sections. Fig. 2 shows the typical time-series of NDVI in a growing season and the derived vegetation phenological curve of three different
116
F. Zhou et al. / ISPRS Journal of Photogrammetry and Remote Sensing 84 (2013) 114–129
Fig. 1. The study area.
types of vegetation. The three vegetation types have different NDVI values at various growing stages and therefore different phenological curves. The phenological curves filter out some of the noise in the NDVI time series. The typical annual crop has a narrow shape, indicating a short growing period, and the peak of the curve is often during July. Native grass has a flat but relatively constant NDVI distribution over the growing season. The typical deciduous tree has the largest NDVI over almost all the growing season. With all the spectral information (time-series of MODIS 10-day cloud-free 7 band composites) and the derived data (time-series of NDVI, and phenology parameters), there are a number of possible variable combinations that could be useful for land cover identification. To fully explore the optimal variable combinations for land cover identification, a data mining methodology is developed to identify the optimal variable combinations for achieving land cover mapping by using a commercial data mining tool See5/C5.0. MODIS became operational in 2000 and there is a circa 2000 land cover map (c2000 map) in the study area which was generated from Landsat ETM images. Therefore year 2000 is selected for the evaluation of the optimal variable combination for the land cover mapping using the c2000 map (Agriculture and Agri-Food Canada, 2012) as the reference. 2.3. Classification scheme The circa 2000 land cover map of Saskatchewan has 12 land cover types, ten of them are included in the study, and the other 2 are disregarded as in total they consist of about only 0.6% of
the land area, and due to the objective of the study for land cover identification at regional level. The 10 land cover types and their description are listed in Table 1. For the study, it is not critical to separate deciduous, coniferous, and mixed forest land cover. They are grouped into forest land cover type in the study. Wetland was masked out before the mapping. All the analysis and results described below are based on the redefined land cover types. 2.4. Data mining approach The datasets used in this study as described above are timeseries of 10-day cloud-free MODIS 7 bands composites and their derived NDVI and vegetation phenological metrics. As a variety and a large volume of data is available, the main objective of the study is to identify the optimal spectral or/and the spectrally derived variable combinations and time period which could produce land cover maps with the highest possible accuracy. Instead of selecting a few variable combinations subjectively for the evaluation, this study develops 4 variable combination models and thoroughly evaluates all possible variable combinations with various time periods to explore the optimal variable set which can achieve the highest possible accuracy of land cover identification. Fig. 4 shows the workflow of the data mining process. The input data are stored in the variable space and extracted into a feature space using the data mining process, which uses feature extractors to convert the variables into features. Then the data mining models
F. Zhou et al. / ISPRS Journal of Photogrammetry and Remote Sensing 84 (2013) 114–129
117
Fig. 2. Time-series of NDVI variables and phenology curves of three typical vegetation types: at the left are NDVI time series: (a) Annual crop; (c) native grass; (e) deciduous tree. At the right (b), (d), and (f) are the corresponding fitted phenology curves.
are trained and verified using training and verification features respectively before they are deployed to the data of the entire study area. The data mining tool used in the study is See5/C5.0, a commercial and generic tool that extracts patterns from the input data into the form of a decision tree or rule sets for aiding decision-making. See5/C5.0 has been applied by several studies in land cover classification based on remote sensing (Keane et al., 2004; Pal and Mather 2003; Kumar et al., 2010; Evrendilek and Gulbeyaz, 2011; Boryan et al., 2011; Boulila et al., 2011; Colditz et al., 2012b; Schneider, 2012). It has been designed to analyze large volumes of data and incorporates innovations such as adaptive boosting, which constructs multiple classifiers for ensemble classification. When boosting is enabled, multiple classifiers are created; each classifier votes for its predicted classes and the votes are counted to determine the final classes (Quinlan, 1996). In these studies,
ten ensemble classifiers (with See5/C5 boost option 10 trials) have been found to be the optimum improving accuracy and were used for all the models. The variables of the input EO and EO-derived metrics in the variable space can be described as: (a) Spectral data of MODIS time-series of 10-day cloud-free composites: spectral(bt) where b e (1, . . . , 7) (band index of the cloud-free composite) and t e (1, . . . , 21) (time index of the time-series of cloud-free composites). (b) Derived NDVI time-series: NDVI(t) where t e (1, . . . , 21) (time index of the NDVI time-series). (c) Vegetation phenological parameters: VPP(k)
118
F. Zhou et al. / ISPRS Journal of Photogrammetry and Remote Sensing 84 (2013) 114–129
n1 ðm; nÞ ¼
m X n X spectralðbt Þ
mP3
ð1Þ
b¼1 t¼1
n2 ðm; nÞ ¼
n m X X ðNDVIðtÞ þ spectralðbt ÞÞ t¼1
n3 ðm; nÞ ¼
11 m X n X X VPPðkÞ þ spectralðbt Þ k¼1
n4 ðlÞ ¼
b P 3;
mPb
ð2Þ
b P 3;
mPb
ð3Þ
b¼3
b¼3 t¼1
l X VPPðkÞ
ð4Þ
k¼1
Fig. 3. Illustration of a phenology curve and some of phenology parameters (the original figure from Eklundh and Jonsson (2012)). The black line is the NDVI curve and the red line is the fitted function. The phenology features include: (a) date of beginning of growing season; (b) date of end of season; (c) date of reaching 80% of seasonal amplitude (rising); (d) date of reaching 80% of seasonal amplitude (declining); (e) maximum value; (f) seasonal amplitude; (g) growing season length; (h) small seasonal integral (area between the base level and the phonology curve); and (i) large seasonal integral (area below phenology curve).
where k e (1, . . . , 11) (index of phenology parameters). As shown above and in Fig. 4, the variable space contains timeseries of spectral information spectral(bt), derived NDVI(t), and vegetation phenology parameters VPP(k). Four feature models are constructed to extract variable combinations from the variable space into the feature space; each of them generates a large number of feature sets:
where n(m, n) is the feature model with parameters m and n; m e (1, . . . , 7) the MODIS band index; n e (1, . . . , 21) a composite index of the time-series; l e (1 . . . , 11) the phenology parameter index; P and max min is the operator of variable combinations from min to max (max P min). Model 1 n1(m, n) generates features with minimum 3 spectral bands and maximum 7 bands of various time-series periods from one single time stamp to the entire length of the time-series; Model 2 n2(m, n) generates features of NDVI and at least 1 spectral band and maximum 5 (exclusive bands 1 and 2) in various lengths of the time-series; and Model 3 n3(m, n) generates features of vegetation phenology parameters and at least 1 spectral band and maximum 5 (exclusive bands 1 and 2) with various time-series lengths; Model 4 n4(l) generates features of various combinations of vegetation phenology parameters. For the Models 2 and 3 as shown in Eqs. (2) and (3), in order to avoid redundancy, bands 1 and 2 are excluded from the feature extraction when spectral bands are combined with NDVI or phenology metrics.
Table 1 Major land cover types and their description. Land cover type
Description
Annual cropland
Annually cultivated cropland and woody perennial crops. Includes annual field crops, vegetables, summer fallow, orchards and vineyards Lakes, reservoirs, rivers, streams, salt water, etc. Land predominantly built-up or developed; including vegetation associated with these cover conditions. This may include road surfaces, railway surfaces, buildings and paved surfaces, urban areas, parks, industrial sites, mine structures and farmsteads Predominantly native grasses and other herbaceous vegetation, may include some shrubland cover. Land used for range or native unimproved pasture may appear in this class Periodically cultivated cropland. Includes tame grasses and other perennial crops such as alfalfa and clover grown alone or as mixtures for hay, pasture or seed Predominantly woody vegetation of relatively low height (generally ±2 m)
Water bodies Developed land Native grassland Perennial (perennial cropland and pasture) Shrubland Forest land Deciduous forest Coniferous forest Mixed forest Wetland
Predominantly broadleaf/deciduous forests or treed areas Predominantly coniferous forests or treed areas Mixed coniferous and broadleaf/deciduous forests or treed areas Land with a water table near/at/above soil surface for enough time to promote wetland or aquatic processes (semi-permanent or permanent wetland vegetation, including fens, bogs, swamps, sloughs, marshes, etc.)
Training Data Variable Space Spectral (bt) NDVI (t) VPP(k)
Feature Extractor
Feature space of variable combinations
Verification Data
Data Ming See5/C5.0
Decision Tree for Land Cover Classification
Land Cover Classification
Fig. 4. EO-based data mining procedures for land cover identification.
F. Zhou et al. / ISPRS Journal of Photogrammetry and Remote Sensing 84 (2013) 114–129
In the feature space, each feature set, i.e., a combination of variables with a time period (either a single time slot – one composite, or multiple time frames – multiple composites) generated by Model 1, 2, 3 or 4, is processed by the data mining tool and then evaluated to determine if any variable combination can sufficiently identify all the land cover types of the study. 2.5. Sampling strategies Point samples like ground reference collected from field survey using GPS are not the best representations for both model training and verification of land cover study using MODIS images due to its medium spatial resolution. As the ground dimension of the MODIS data used in the study is 250 m by 250 m, a pixel may be mixed by two or more land cover types on the ground. If one of the land cover types in such pixel is sampled and then used to represent the entire pixel either for model training or verification, the result does not reflect the true situation. In order to avoid this situation, instead, an area sampling method is used in the study. A pixel of EO images (especially with a medium or coarse spatial resolution) could have three land cover situations: (1) only one land cover type; (2) a dominant land cover; or (3) mixed land cover types. A pixel with only one land cover type (situation 1) is called a homogeneous pixel. A pixel with a dominant land cover (situation 2) is defined as within the ground dimension of the pixel there are two or more land cover types, but among those land cover types, one land cover type covers more than a half of the ground area, and other land cover types occupy the rest area of the pixel. A pixel of the situation 3 can be called a heterogeneous pixel of which the ground surface of the pixel is covered by multiple land cover types, but without a dominant one. Homogeneity, dominance, and heterogeneity described above are relative terms and changeable depending on the scale of a study. When they are applied to EO-based applications, they are determined by the spatial resolution of imagery in relation to the size of features on the ground. For this study, the three types of landscape settings are distinguished within the dimension of a MODIS pixel. The c2000 map was used to identify and evaluate the distribution of these pixel types within the study area. The c2000 map is raster-based and generated mainly from Landsat ETM images, and its spatial resolution is 30 m. Each pixel of the map is assigned one land cover, therefore every pixel of the reference map can be considered as a homogeneous one. When a MODIS pixel is superimposed on the land cover map, it covers about 64 pixels of the land cover map. A MODIS pixel is identified as homogeneous if and only if all the corresponding 64 pixels of the c2000 map have the same land cover, or as dominant if more than one land cover types are found, but 32 or more sub-pixels have the same land cover (the dominant one), or as heterogeneous (other land cover combinations). Within the study area, the proportion of homogeneous and dominant (MODIS) pixels is 48.16%, and 47.76%, respectively. In total, they occupy about 96% of the study area. Only about 4% of the area is occupied by multiple land cover types (heterogeneity) without a dominant land cover defined above. In this study, we used only homogeneous samples for model training, but had two sampling strategies for model verification: (1) homogeneous samples only and (2) dominant and homogeneous samples. The latter samples represent about 96% of the landscape of the study region. All the samples are selected randomly within the study area and 1000 samples were gathered independently for each of the land cover types. Deciduous, coniferous, and mixed forest were sampled and classified separately, but their results were aggregated as forest land. Therefore, as a group, forest land has 3000 samples in total.
119
3. Result and analysis 3.1. Results and analysis of homogeneous pixels for training and verification Using the methodology developed in Section 2.4: (1) Model 1 (Eq. (1)) of spectral band combinations only, (2) Model 2 (Eq. (2)) of NDVI/spectral band combinations, (3) Model 3 (Eq. (3)) of phenology parameter/band combinations, and (4) Model 4 (Eq. (4)) of phenology parameter combinations only. This section will discuss the results of the first 3 models (Models 1, 2 and 3) of the variable combinations, and then discuss the results of Model 4 in Section 3.6. In the first three groups of variable combinations, each variable associates with a time period. To simplify the analysis, the unit of the time-series period is 1 month. Therefore all the 3 composites within a month are arranged as a unit. That means that if a month of the time-series is considered for evaluation, all the data of the three composites within the month are used. Table 2 shows the lowest and highest Kappa values of the three models for land cover identification using homogeneous pixels for both training and verification processes. The highest Kappa values of the three models are very similar, but the lowest has large variations. The NDVI and one spectral band of a single time stamp (1 month) yields the lowest Kappa (0.55), while the combination of band and vegetation phenology parameters yields the highest low-end Kappa (0.75). An analysis of the Kappa values of different variable combinations reveals that, in general, the longer the length of the time-series of data, the higher the Kappa value, and thus the higher identification accuracy. This observation suggests that time series data are more informative than a single snapshot image for pattern recognition of land cover. For example, the low-end Kappa for phenology/spectral band combinations had a much larger value than that of other two combinations as phenology parameters are derived from all the data in a full growing season. In the following sections, the analysis is based on the accuracy of land cover classification against the verification samples, instead of using Kappa value. Table 3 shows the result of one of the optimal NDVI/band variable combinations (one from the variable combination group with the highest Kappa). The confusion matrix patterns of the land cover types are typical of the 3 models with the highest Kappa. The numbers in bold in the diagonal of the matrix represent the pixels that are correctly classified and suggest that the optimal variable combinations (the highest overall accuracy) can discriminate most of the land cover types with an overall average accuracy of about 89%. The land covers of water bodies, forest land, annual cropland, and developed land all have accuracies over 90%. Perennial land has the lowest accuracy (76%). In the land covers with accuracies below 80%, perennial cover and native grassland are often misclassified as each other (130 and 184 out of 1000 samples, respectively). The perennial land includes seeded grasslands which are similar to native grassland in spectral signature and phenology. Shrubland and forest are also misclassified with each other, as the main difference between them is height and density. In many situations, they likely have similar spectral signatures and phenology. Although the native grassland and perennial land are difficult to separate from each other, they are easily distinguished from the other land cover types. Only 8 out of 1000 samples of perennial land are classified as forest land cover, and no grassland is mislabeled as forest land. Twenty-six and 50 out of 1000 samples of grassland and perennial land are misclassified as cropland, respectively. Developed land has an accuracy of 93%. Although it does not confuse with water bodies, it confuses with all other types of land cover slightly. Water has the least confusion with others.
120
F. Zhou et al. / ISPRS Journal of Photogrammetry and Remote Sensing 84 (2013) 114–129
Table 2 Kappa of land cover classification. Model
No. of variable combinations
Spectral bands with various time-series periods NDVI + spectral bands with various time-series periods Phenology metrics + spectral bands with various time-series periods
12,573 3937 3937
Kappa Lowest
Highest
0.58 0.55 0.75
0.84 0.85 0.84
Table 3 Accuracy matrix of a band/NDVI combination for land cover classification (one of the best variable combinations). Land cover type
Cropland
Cropland Forest land Shrubland Grassland Perennial Developed Water Average
934 4 10 26 50 22 2
Forest land
Shrub land
12 2872 118 8 6 2
102 798 8 26 8
3.2. Results of homogeneous pixels for training and dominant pixels for verification In the study region, about 52% of the land cover is not homogeneous based on the spatial dimension of a MODIS pixel and 30 m c2000 map. Therefore, we conducted an evaluation using homogeneous samples for model training, but dominant land cover samples for model verification. Statistically, the verification samples represent about 96% of all the MODIS pixels of the study region. The same three groups of features (Models 1, 2, and 3) described above were evaluated. Table 4 lists the agreement percent of land cover types of the best results of the three groups of variable combinations, respectively. In comparison with the results of using homogeneous pixels for both training and verification, the optimal variable combinations of the three groups using homogeneous pixels for training and dominant pixels for verification yield similar but lower accuracy. However, the overall averaged accuracy of all the land cover identification is above 80%. Although there are some differences in the results for each land cover type, overall they are similar to the results using homogenous samples: cropland and forest land and water bodies have highest accuracy, and perennial and grassland have the lowest. It can be seen, from comparison of Tables 3 and 4, that the overall averaged accuracy decreases about 9% using dominant pixels for verification, which is likely caused by land cover mixing within a pixel. The accuracy of cropland cover type decreases the least from 93% to 90%. This is because cropland in Saskatchewan is usually in large parcels relative to the MODIS pixel size. Once cropland cover becomes a dominant land cover within a pixel, it is likely Table 4 Maximum overall averaged accuracy of land cover identification using dominant pixels for verification (one of the best combinations of each group). Land cover type
Cropland Forest land Shrubland Grassland Perennial Developed Water Overall average
Grassland
The highest overall accuracy (%) Model 1 (Eq. (1))
Model 2 (Eq. (2))
Model 3 (Eq. (3))
89.20 86.29 71.00 71.20 71.40 74.20 86.60 80.27
89.80 85.56 72.40 71.60 71.20 75.60 86.20 80.39
90.40 84.53 71.80 71.60 71.40 75.00 89.40 80.35
22 760 130 12 2
Perennial 36 22 44 184 758 26
Developed
Water
18 6 20 28 926 2
2 2
992
% 93.40 95.73 79.80 76.00 75.80 92.60 99.20 87.50
that, if not all, the majority of the 64 sub-pixels of a MODIS pixel are cropland. The accuracy of developed land cover decreases the most from about 92% to about 75%, and the accuracy of water bodies drops more than 10%. Other land covers yield a lower accuracy (about 4–8% lower). The results are explainable because the land covers other than the dominant one within a MODIS pixel would contribute to the spectral information, which makes it more difficult to accurately identify the dominant land cover. The degree of the confusion may depend on the number and the types of land covers within the pixels of a dominant land cover. 3.3. Time-series of band variable combinations In Section 3.2, the best results of land cover identification from all possible band variable combinations of various timeseries periods were presented. In this section, we will present and analyze the statistics of all the results of band variable combinations (Eq. (1)) based on the evaluation of homogeneous land cover samples for model training, and dominant land cover samples for validation. As described in Section 3.1, in order to make the statistical analysis concise, the time series of data are assessed in a unit of 1 month, leading to a total of 12,753 possible band variable combinations of the time-series data with a condition that any variable combination must have at least 3 bands. For example, a combination can be bands 1, 2, and 6 in April; or bands 1, 2, and 6 in April and May, or bands 1, 2, 3, and 6 in June, July and August, and so on. Two groups of statistical results for the band variable combinations are analyzed. One group presents the results of a fixed number of bands vs. various time-series periods; and the other presents the results of a fixed time-series period vs. various numbers of bands. Fig. 5 shows the overall averaged accuracy of land cover classification with a fixed number of bands and various time-series periods, and Fig. 6 shows the overall averaged accuracy of land cover classification with a fixed time-series period and various numbers of bands. As shown in Fig. 5, in general, when the number of bands is fixed, the overall averaged accuracy gains with the increase of time-series length. For example, 3 band combinations from 1-month time period could produce an overall averaged accuracy about 73%; when the length of the time-series is increased to 2 months, the averaged accuracy is improved by about 3% to about 76%. This implies that the reduced accuracy of a smaller number of bands can be compensated by a longer time-series length. This
121
No. of Bands: 3
82 80 78 76 74 72 70
Accuracy (%)
Accuracy (%)
F. Zhou et al. / ISPRS Journal of Photogrammetry and Remote Sensing 84 (2013) 114–129
2
3
4
5
6
1
7
2
3
4
5
6
7
Time-series length (in month)
Time-series length (in month)
(a)
(b)
No. of Bands: 5
82 80 78 76 74 72 70
Accuracy (%)
Accuracy (%)
1
No. of Bands: 4
82 80 78 76 74 72 70
2
3
4
5
6
7
1
2
3
4
5
6
Time-series length (in month)
Time-series length (in month)
(c)
(d)
Accuracy (%)
1
No. of Bands: 6
82 80 78 76 74 72 70
7
No. of Bands: 7
82 80 78 76 74 72 70 1
2
3
4
5
6
7
Time-series length (in month)
(e) Fig. 5. Overall averaged accuracy and uncertainty of band combinations grouped by a fixed number of bands and various time period. The uncertainty is the standard deviation.
observation applies to all the other cases of a fixed number of band combinations. However, the accuracy improvement with the increasing time-series length is not linear, with the incremental improvement of accuracy being reduced as time series length is increased. In one case, using all 7 bands and increasing the time series from 6 to 7 months, increasing the time-series length causes a slight reduction (approximately 0.5%) in accuracy. This similar phenomenon was reported by Pouliot et al. (2009) and is caused by feature saturation. These patterns indicate that time-series data do help improve the land cover classification accuracy, but there are diminishing returns from increasing the length and eventually the improvement is small. For example, when 5 bands are used, the overall accuracy increases only about 1% from time-series length 3 months to 7 months. As data with a longer time-series length would involve more processes (such as geometrical and radiometric corrections) and more computation, there is a trade-off of computational cost and accuracy improvement. The uncertainty bars shown in Fig. 5 are the standard deviation of all the results of the possible combinations of each group. In general, the accuracy of the combinations with fewer bands and a shorter time-series period has a larger uncertainty than that of the combinations with more bands and longer time-series. This means that when fewer bands with a shorter time-series length are used in land cover classification, choosing a ‘right’ band combination for a better accuracy is more critical than using the data of more bands and a longer time-series.
Similarly, as shown in Fig. 6, when the time-series period is fixed the overall averaged accuracy is higher when more bands are utilized. There are also diminishing returns from increasing the number of bands, particularly above 4 bands. Moreover, when the length of time-series is 5 months or longer, there is almost no improvement from 6 bands to 7. In the case of 7 month period, the accuracy even decreases from 6 bands to 7 bands. It is worth mentioning that as discussed above, more uncertainty is associated with variable combinations of a shorter time-series length and fewer bands. Therefore carefully selecting the band and the time period for an optimal variable combination to achieve the best possible accurate output is necessary, especially when a short period and a fewer bands are used. Of the 12,736 possible combinations (Eq. (1)) with various numbers of bands and time-series periods, there are 439 combinations whose accuracy is over 80%. As shown in Fig. 7a, among the 439 band combinations that produced higher accuracy classifications, bands 1 and 2 have the highest occurrences – band 2 is part of all the band combinations, and band 1 is involved in 95% of the band combinations. They are followed by band 6. Band 5 has the lowest occurrences, with band 4 and band 7 having just a little bit higher occurrences than that of band 5. Therefore, bands 1, 2 and 6 have an advantage over other bands of generating higher accuracy outputs in terms of band selection. The better performance of Bands 1 and 2 is due to their higher (original) spatial resolution, and are in the spectral range of red and near infrared range
F. Zhou et al. / ISPRS Journal of Photogrammetry and Remote Sensing 84 (2013) 114–129
82 80 78 76 74 72 70
Time-series period -1 month
3
4
4
5
6
(a)
(b)
4
82 80 78 76 74 72 70
5
6
7
3
4
5
6
No. of Bands
(c)
(d) Accuracy (%)
82 80 78 76 74 72 70
4
5
6
7
7
Time-series period - 6 months
3
4
5
6
No. of Bands
No. of Bands
(e)
(f)
82 80 78 76 74 72 70
7
Time-series period - 4 months
No. of Bands
Time-series period - 5 months
3
3
7
No. of Bands
Accuracy (%)
Accuracy (%)
6
No. of Bands
Time-series period -3 months
3
82 80 78 76 74 72 70
5
Time-series period -2 months
82 80 78 76 74 72 70
Accuracy (%)
82 80 78 76 74 72 70
Accuracy (%)
Accuracy (%)
Accuracy (%)
122
7
Time-series period - 7 months
3
4
5
6
7
No. of Bands
(g) Fig. 6. Overall averaged accuracy and uncertainty of various number of spectral band combinations grouped by a fixed time period. The uncertainty is the standard deviation.
400
(a)
6
7
em be r O ct ob er
5
gu st
4
Band
pt
3
Se
2
Ap
1
ly
0
ril
0
100
Ju
100
200
Au
200
300
ne
300
Ju
400
Histogram of Individual month for overall accuracy over 80%
M ay
Occurrence
Occurrence
500
Histogram of individual band for overall accuracy over 80%
(b)
Fig. 7. Histogram of individual band (a) and month (b) for overall averaged accuracy over 80%.
123
F. Zhou et al. / ISPRS Journal of Photogrammetry and Remote Sensing 84 (2013) 114–129
20
150
Percent (%)
Occurrence
200
Percent of number of bands for overall accuracy over 80%
Histogram of number of bands for overall accuracy over 80%
100 50 0
3
4
5
6
15 10 5 0
7
3
4
5
6
Number of Bands
Number of Bands
(a)
(b)
7
Fig. 8. Histogram of the bands (a) and percentage (b) for overall averaged accuracy over 80%.
Histogram of time-series period for overall accuracy over 80%
25
Percent (%)
Occurence
200 150 100 50 0
Percent of time-series period for overall accuracy over 80%
20 15 10 5 0
1
2
3
4
5
6
7
1
2
3
4
5
6
Time-series length in month
Time-series length in month
(a)
(b)
7
Fig. 9. Histogram of time-series period (a) and percentage (b) for overall averaged accuracy over 80%.
which are very good for vegetation identification. Band 6 is in the range of middle infrared, and is also good for identifying vegetation. Fig. 7b shows individual month occurrence for the variable combinations whose accuracy is over 80%. It can be seen that almost all the months have the similar number of occurrences and no single month stands out although August and October have a slightly higher occurrence. This means that, in terms of the time factor, no period of the growing season is critical for land cover classification, but rather a time-series period of EO observations is required. Furthermore, among the 439 variable combinations with overall average accuracy exceeding 80%, combinations involving 5 bands have the highest number (159), followed by combinations of 4 bands (137) (Fig. 8a). Combinations with 7 bands are the least common. However, in terms of classification accuracy exceeding 80%, the combinations engaging all 7 bands have the highest percent, and combinations of 3 bands have the lowest (Fig. 8b). This implies that the combinations with more bands have a higher probability of producing a higher accuracy output. As the percentage of fewer bands for generating a higher accurate output is low, it further confirms that when fewer bands are used, band selection is essential for achieving a high accuracy result. When time-series length was examined, combinations with only 1 or 2 months did not produce any output with accuracy over 80% (Fig. 9a). Band combinations with a time-series period of 5 months have the highest occurrence, and following by those of 6 months. As in the combinations of numbers of bands, the longest period (7 months) has the highest probability of producing a high accuracy output. The probability of combinations with a time-series length of 3 months for a good result is very small and for the combinations of a time-series of length of 1 or 2 months, the probability is 0 (Fig. 9b). This further confirms that
data with a longer time-series length is preferable for land cover classification. 3.4. NDVI and band combinations with various time-series lengths As NDVI is derived from bands 1 and 2 of time-series MODIS data, the NDVI is used to replace bands 1 and 2 in all the band combinations involving them (Eq. (2)) to determine if NDVI contributes more to the accuracy of land cover classification than bands 1 and 2 directly. In this model band 1 and band 2 in the form of NDVI are always present in the exploration. As presented in Section 3.3, bands 1 and 2 contribute mostly to the higher accuracy results of land cover classification. In the exploration, NDVI is combined with one or more bands from among band 3 to 7 with various time periods of 1 month or more. As in the spectral band analysis above, the land cover classification accuracies are assessed by two groups – one group is a fixed number of bands from 1 to 5 with various time-series periods in the unit of month from 1 to 7; and the other is a fixed time-series period from 1 to 7 months with different number of bands (3–5). The statistical results of the assessment are shown in Figs. 10–13. As shown in Fig. 10, the results were similar to the band variable analysis; lower accuracy outputs are resulted from the variable combinations of NDVI with few bands and a shorter time-series period. The low end of the averaged accuracy of the NDVI with one band within 1-month period is about 71%, and the high end of the accuracy is over 80% which can be achieved with NDVI plus 3 bands or more within a time period of 5 months or longer or NDVI plus 2 bands or more within a time-series period of 6 months or longer. It is noted that except for the combinations of a shorter time-series periods of 1 or 2 months, in general, NDVI/ band variable combinations could perform the same as or even better than that of band variable combinations (Fig. 5 vs. Fig. 10).
F. Zhou et al. / ISPRS Journal of Photogrammetry and Remote Sensing 84 (2013) 114–129
Overall Averaged Accuracy No. of Bands: 1
82 80 78 76 74 72 70
82 80 78 76 74 72 70
2
3
4
5
6
1
7
3
4
5
6
Time-series period in month
(a)
(b)
Overall Averaged Accuracy No. of Bands: 3
Overall Averaged Accuracy No. of Bands: 4
2
3
4
5
6
82 80 78 76 74 72 70 1
7
2
3
4
5
6
Time-series period in month
Time-series period in month
(c)
(d)
Accuracy (%)
1
2
Time-series period in month
Accuracy (%)
Accuracy (%)
1
Overall Averaged Accuracy No. of Bands: 2
82 80 78 76 74 72 70
Accuracy (%)
Accuracy (%)
124
7
7
Overall Averaged Accuracy No. of Bands: 5
82 80 78 76 74 72 70 1
2
3
4
5
6
7
Time-series period in month
(e) Fig. 10. Overall averaged accuracy and its uncertainty of NDVI and spectral combinations. Uncertainty is the standard deviation.
As it can be seen from Fig. 10, the uncertainty of the most combinations is a little smaller compared to those of band combinations; therefore, combinations of NDVI with the same number of bands and the same length of the period perform similarly or even better. Fig. 11 shows overall average accuracy and uncertainty grouped by fixed time period in months and various numbers of bands. When the time period is more than 4 months, the accuracy differences among various numbers of bands are small, however, when only 1 month of data are used, the overall accuracies are only from about 71% (NDVI plus one band) to about 76% (NDVI and all 5 bands). Hence, it is confirmed again that time-series can improve the accuracy of land cover classification. In total, there are 3937 possible NDVI/band combinations. Among them, 421 (11%) combinations produced overall averaged accuracy over 80%, with the maximum of 81%. The distribution of the occurrences of these combinations is shown in Figs. 12 and 13. The majority combinations are those of 2–4 bands (Fig. 12a) and 4–6 month period (Fig. 13a). The highest frequency of more accurate classifications is NDVI plus 3 bands and 5 month period. However, when the percentage of each combinations is considered, the combinations with 4 or 5 bands have the similar and highest frequency (Fig. 12b), and, in term of time scale, the combinations with the time period of 6 months have the highest frequency (Fig. 13b). Although these results indicate that variable combinations with a longer period or more bands, in general, could have
a larger chance to produce good results, if selected correctly, variable combinations with a shorter period (say NDVI + 2 or 3 bands, and a time series of 4 or 5 months) could also have a good chance to reach the similar results. Furthermore, in terms of band contribution, as shown in Fig. 14a, band 6 had the highest representation, and then followed by bands 4 and 3. Therefore, combinations including NDVI and bands 6, 4 and 3 could have the highest probability of producing a land cover classification with higher accuracy. On the opposite, a combination with band 5 has the lower chance of producing good results. For time period contribution, October has the most occurrences and is followed by the months of April, May and June. September has the lowest frequency (Fig. 14b). Hence, choosing the months of October, April, May or June has the better opportunity to produce a better land cover map. As April, May and June show a higher frequency of higher accurate land cover classification, it suggests that it is possible to generate an accurate land cover map by mid-season from MODIS data. 3.5. Phenology metrics and band variable combinations As discussed in Section 2.2 and shown in Fig. 3, vegetation phenology metrics are derived from time-series of NDVI in a full vegetation growing season. Eleven phenology parameters are used to describe a vegetation phenology pattern: they are (a) time for
125
F. Zhou et al. / ISPRS Journal of Photogrammetry and Remote Sensing 84 (2013) 114–129
Time-series period: 1 month
80 78 76 74 72
80 78 76 74 72
70
70 1
3
4
5
1
2
3
4
No. of Bands
No. of Bands
(a)
(b)
80 78 76 74 72
5
Time-series period: 4 month
82 80
Accuracy (%)
Accuracy (%)
2
Time-series period: 3 month
82
78 76 74 72 70
70 1
2
3
4
5
1
2
3
No. of Bands
No. of Bands
(c)
(d)
Time-series period: 5 month
82 80 78 76 74 72
4
5
Time-series period: 6 month
82 80
Accuracy (%)
Accuracy (%)
Time-series period: 2 month
82
Accuracy (%)
Accuracy (%)
82
78 76 74 72 70
70 1
2
3
4
1
5
2
3
No. of Bands
No. of Bands
(e)
(f)
5
Time-series period: 7 month
82
Accuracy (%)
4
80 78 76 74 72 70 1
2
3
4
5
No. of Bands
(g)
200
Histogram of number of bands for overall accuracy over 80%
25
Percent (%)
Occurrence
Fig. 11. Overall averaged accuracy and its uncertainty of NDVI and a fixed time-series period with various numbers of bands. Uncertainty is the standard deviation.
150 100 50 0
Percent of number of bands for overall accuracy over 80%
20 15 10 5 0
1
2
3
4
5
1
2
3
4
Number of Bands
Number of Bands
(a)
(b)
5
Fig. 12. Histogram of the bands combined with NDVI (a) and percentage (b) for overall averaged accuracy over 80%.
126
F. Zhou et al. / ISPRS Journal of Photogrammetry and Remote Sensing 84 (2013) 114–129
Percent of time-series period for overall accuracy over 80%
50
Percent (%)
Occurence
200
Histogram of time-series period for overall accuracy over 80%
150 100 50 0 1
2
3
4
5
6
40 30 20 10 0
7
1
2
3
4
5
6
Time-series length in month
Time-series length in month
(a)
(b)
7
Fig. 13. Histogram of time-series period of band and NDVI combinations (a) and percentage (b) for overall averaged accuracy over 80%.
400 300 200 100
5
6
7
Band
(a)
ay
Ju ly Au gu Se st pt em be r O ct ob er
4
Ju ne
3
M
ril
0
Ap
350 300 250 200 150 100 50 0
Histogram of Individual month for overall accuracy over 80%
Occurrence
Occurrence
Histogram of individual band for overall accuracy over 80%
(b)
Fig. 14. Histogram of NDVI and individual band (a) and month (b) for overall averaged accuracy over 80%.
the start of the season – time for which the left edge has increased to a user defined level (e.g., 20% of the seasonal amplitude) measured from the left minimum level; (b) time for the end of the season – time for which the right edge has decreased to a user defined level measured from the right minimum level; (g) length of the season; (j) base level – the average of the left and right minimum values; time for the middle of the season – the mean value of the times for which, respectively, the left edge has increased to the 80% level (c) and the right edge has decreased to the 80% level (d); (e) largest data value for the fitted function during the season; (f) seasonal amplitude – difference between the maximal value and the base level; rate of increase at the beginning of the season – calculated as the ratio between the values evaluated at the season start and at the left 80% level divided by the corresponding time difference; rate of decrease at the end of the season – calculated as the ratio between the values evaluated at the season end and at the right 80% level divided by the corresponding time difference; (i) large seasonal integral – integral of the function describing the season from the season start to the season end; and (h) small seasonal integral – integral of the difference between the function describing the season and the base level from season start to season end (Jönsson and Eklundh, 2003). The phenology metrics and spectral band combinations (Eq. (3)) for land cover classification exploration consist of all phenology parameters plus one or more bands (bands 3, 4, 5, 6, 7) within various time-series periods. In order to avoid redundancy, bands 1 and 2 are excluded from all the combinations as they were used to compute the NDVI time series which then were used to derive the phenology parameters. For the evaluation, like other models, overall averaged classification accuracies are analyzed from two groups: a group of a fixed number of bands with various time-series period, and the other of a fixed time-series period with various numbers of bands. The low-end of the overall accuracies of all the combinations is higher than that of any variable combinations of other models (Eqs. (1) and (2)). This is expected because vegetation phenology param-
eters are derived from the data of a full vegetation growing season. The high end of the overall accuracies is a little more than 80% that is similar to the high-end accuracy of other variable combinations. Fig. 15 shows the overall accuracy of a fixed number (1–5) of bands (3–7) with various time-series periods. Although, in general, the variable combinations with more bands and a longer time-series period would produce a higher accuracy, the accuracy improvement with the increase of the time-series period from 3 months to longer periods is not substantial. The accuracy difference from the lowest to the highest due to the time-series period difference is within 2% for almost for all combinations. For example, for the combinations of 2 bands and phenology parameters, the overall averaged accuracy of 1 month time-series period is about 78%, and the accuracy of 7 month period is above 79%; and for the combinations of 5 bands and phenology parameters, the averaged accuracy of 1 month period is about 79%, and the accuracy of 7 month time-series is slightly lower than 80%. Although the maximum accuracy of the combinations can be over 80% with 4 or 5 bands and 5 month time-series period or longer, the accuracy difference among these combinations is not substantial. Therefore, any combinations of phenology parameters with more than 2 bands and 3 month time-series period would produce accuracy similar to those of combinations with more bands and a longer time-series period. This is likely because phenologic metrics contribute significantly in the land cover classification. The observations above can be confirmed from Fig. 15a–e. All the overall accuracies of all the variable combinations are within the range of 78–80% (except the combinations of phenology parameters plus 1 band in less than 4 month time period). It is also noted that the uncertainties of all the combination groups are smaller than that of the combinations of other models. This implies that what bands used additional to phenology parameters are not critical if a certain number of bands (P2) and time-series period (P3 months) are utilized. Although phenology metrics are useful information for land cover identification, one obvious disadvantage of using the
127
F. Zhou et al. / ISPRS Journal of Photogrammetry and Remote Sensing 84 (2013) 114–129
No. of Bands: 2
Accuracy (%)
Accuracy (%)
No. of Bands: 1 82 80 78 76 74 72 70
1
2
3
4
5
6
1
7
2
3
4
5
6
Time-series period in month
Time-series period in month
(a)
(b)
No. of Bands 3
1
2
3
4
5
6
80 78 76 74 72 70
7
1
2
3
4
5
6
Time-series period in month
Time-series period in month
(c)
(d)
Accuracy (%)
7
No. of Bands: 4
82
82 80 78 76 74 72 70
Accuracy (%)
Accuracy (%)
82 80 78 76 74 72 70
7
No. of Bands: 5
82 80 78 76 74 72 70
1
2
3
4
5
6
7
Time-series period in month
(e) Fig. 15. Overall averaged accuracy of phenology metrics and time-series period with a fixed number of spectral bands. Uncertainty is the standard deviation.
phenology parameters is that it requires the data spanning a full vegetation growing season. This is a severe constraint to most optical sensors as cloud cover may prevent obtaining full season data. Furthermore, if land cover information is needed before the end of vegetation growing season, phenology metric does not help at all.
3.6. Vegetation phenology For the final assessment, we tested the full range of phenology metrics, combinations of one to all parameters to assess if they are sufficient to distinguish different land covers without additional
Overall Averaged Accuracy
80
Accuracy (%)
70 60 50 40 30 20 10 0
1
2
3
4
5
6
7
8
9
10
11
Number of Phenology Metric Fig. 16. Overall averaged accuracy and uncertainty of land cover classification with phenology metrics.
information. In general, higher overall accuracy was achieved with an increase in the number of phenology variables (Fig. 16). Similarly to the other tests, there is large increase in accuracy when only a few metrics are used and the increase diminishes as the number of metrics used increases. For example, when 2 variables instead of one are used, the overall accuracy improves from about 42% to 53%; when the number of variables is increased to 6, the overall accuracy reaches about 72%. Once the number of parameters included exceeds 6, the overall accuracy only increases slightly. For example, from 6 variables to 10 variables, the overall accuracy increases only about 2% from 72% to about 74%. Like the other types of variable combinations for land cover classification, accuracy uncertainty is bigger for a smaller number of variable combinations and accuracy variation is smaller when the number of variables is 9 or more. The choice of metrics which are selected for land cover identification is not critical if a large number of variables (>9) is used. However, when the number of variables is decreased, as shown in Fig. 15, the performance of different variable combinations in each group will be substantially different. Overall, phenology parameters derived from seasonal time-series of EO data without other information are not the best metrics for land cover identification as its overall best accuracy is lower than those of other types of variable combinations. Compared to phenology with band combinations of Model 3 (Eq. (3)), the accuracy is about 5% lower. One reason of the lower accuracy is that although phenology metrics are derived from the data of a full vegetation growing season, only two bands (MODIS bands 1 and 2) are
128
F. Zhou et al. / ISPRS Journal of Photogrammetry and Remote Sensing 84 (2013) 114–129
involved in the phenology parameters computation. As mentioned above, one of the drawbacks of using phenology parameters for land cover identification is that they require the data of a full vegetation growing season. If we need land cover information before the end of a growing season, it cannot be fulfilled using phenology information.
landscape of homogeneous/heterogeneity needs to be conducted in order to determine if medium spatial resolution of EO data such as MODIS is sufficient for land cover mapping of a region.
Aknowledgments 4. Conclusions EO-based land cover mapping and its subsequent transitional land assessment at a regional and national level require large coverage and adequate spatial and temporal resolutions of EO data. MODIS data is a reasonable choice due to its large swath, medium spatial resolution, and its daily capability of image acquisition. This study fully explored, with the developed data mining strategy, the potential of the time-series of MODIS data (10-day cloudfree composites) in various forms for land cover classification at a regional level. In particular, MODIS spectral bands, and its derived NDVI and vegetation phenology metric were studied through variable combinations of different numbers of bands and various time-series periods for optimal utilization of the time-series of MODIS data for land cover mapping. This study concludes that: (a) Time-series of MODIS data would produce more accurate output than single time stamped data. The accuracy improvement can reach about 10%. (b) In general, variable combinations of a longer time-series length and more spectral bands could produce the higher accuracy outputs among all possible variable combinations; However, reducing time period from 7 to 3 months and spectral bands from 7 to 4 would decrease the accuracy about only 2%, but trim down the data volume by a factor of about 4. (c) Variable combinations with more bands and a longer timeseries length not only produce higher accuracy outputs, but also a smaller uncertainty, which means that when a longer period of time-series and more bands are incorporated for land cover identification, the selection of what bands in what period is not critical as long as the selected data consist of enough number of bands with enough time period. Therefore, MODIS data at early or middle season are sufficient to produce the best possible accurate outputs of land cover identification. (d) Of the 4 groups of variable combinations, although the lowend accuracy for land cover classification is different, three of the variable combinations (Models 1, 2 and 3) can reach the similar high-end accuracy. Although NDVI and vegetation phenology provide us with some extra options, MODIS spectral band data by itself can produce the same highest possible accuracy of land cover classification as those with additional information. (e) As the phenology parameters are derived from the data of a full vegetation growing season, other types of variable combinations are preferable, especially in order to meet the requirement of an early or middle season land cover identification. (f) Although promising, based on our evaluations, the usefulness of time-series MODIS data for land cover identification also depends on the homogeneity–heterogeneity of the landscape being studied. The degree of landscape homogeneity–heterogeneity, i.e., the degree of mixed information within a pixel, is a function of the spatial distribution patterns of land cover and the size of land parcels relative to the MODIS pixels. Thus heterogeneity affects the accuracy of land cover identification. Therefore, an initial analysis of
The authors would like to thank Drs. Ian Olthof, and Junhua Li for their valuable comments that have improved this manuscript. The authors wish to thank the two anonymous reviewers and Prof. Song, the associate editor of the journal for their insightful critiques, which led to significant improvements of the manuscript. The long-term satellite record team of Canada Centre for Remote Sensing is acknowledged for the provision of input data. This research is partially financed by Canadian Space Agency through the Government Related Initiatives Program. References Agriculture and Agri-Food Canada, 2012. ISO 19131 Land Cover for Agricultural Regions of Canada, Circa 2000 – Data Product Specification.
. Alcantara, C., Kuemmerle, T., Prishchepov, A.V., Radeloff, V.C., 2012. Mapping abandoned agriculture with multi-temporal MODIS satellite data. Remote Sensing of Environment 124 (2012), 334–347. Azzalil, S., Menentil, M., 1999. Mapping isogrowth zones on continental scale using temporal Fourier analysis of AVHRR-NDVI data. International Journal of Applied Earth Observation and Geoinformation 1 (1), 9–20. Bagan, H., Wang, Q., Watanabe, M., Yang, Y., Ma, J., 2005. Land cover classification from MODIS EVI times-series data using SOM neural network. International Journal of Remote Sensing 26 (22), 4999–5012. Baldi, G., Nosetto, M.D., Aragón, R., Aversa, F., Paruelo, J.M., Jobbágy, E.G., 2008. Long-term satellite NDVI data sets: evaluating their ability to detect ecosystem functional changes in South America. Sensors 8, 5397–5425. http://dx.doi.org/ 10.3390/s8095397. Biradar, C.M., Thenkabail, P.S., Lslam, Md.A., Anputhas, M., Tharme, R., Vithanage, J., Alankara, R., Gunasinghe, S., 2007. Establishing the best spectral bands and timing of imagery for land use-land cover (LULC) class separability using Landsat ETM+ and Terra MODIS data. Canadian Journal Remote Sensing 33 (5), 421–444. Boryan, C., Yang, Z., Mueller, R., Craig, M., 2011. Monitoring US agriculture: the US department of agriculture, national agricultural statistics service, cropland data layer program. Geocarto International 2011, 1–18. Boulila, W., Farah, I.R., Ettabaa, K.S., Solaiman, B., Ghézala, H.B., 2011. A data mining based approach to predict spatiotemporal changes in satellite images. International Journal of Applied Earth Observation and Geoinformation 13 (2011), 386–395. Bradley, B., Mustard, J., 2008. Comparison of phenology trends by land cover class: a case study in the Great Basin, USA. Global Change Biology 14, 334–346, doi: 10.1111/j.1365-2486.2007.01479. Carrão, H., Gonçalves, P., Caetano, M., 2008. Contribution of multispectral and multitemporal information from MODIS images to land cover classification. Remote Sensing of Environment 112 (2008), 986–997. Clark, M.I., Aide, T.M., Grau, H.R., Riner, G., 2010. A scalable approach to mapping annual land cover at 250 m using MODIS time series data: a case study in the dry chaco ecoregion of South America. Remote Sensing of Environment 114 (2010), 2816–2832. Colditz, R.R., Saldaña, G.L., Maeda, P., Espinoza, J.A., Tovar, C.M., Victoria, A., 2012a. Generation and analysis of the 2005 land cover map for Mexico using 250 m MODIS data. Remote Sensing of Environment 123 (2012), 541–552. Colditz, R.R., Schmidt, M., Conrad, C., Hansen, M.C., Dech, S., 2012b. Land cover classification with coarse spatial resolution data to derive continuous and discrete maps for complex regions. Remote Sensing of Environment 115 (2011), 3264–3275. de Beurs, K.M., Henebry, G.M., 2004. Land surface phenology, climatic variation, and institutional change: analyzing agricultural land cover change in Kazakhstan. Remote Sensing of Environment 89 (2004), 497–509. Donohue, R.J., Roderick, M.L., McVicar, T.R., 2008. Deriving consistent long-term vegetation information from AVHRR reflectance data using a cover-trianglebased framework. Remote Sensing of Environment 112 (2008), 2938–2949. Ehrlich, D., Lambin, E.F., 1996. Broad scale land-cover classification and interannual climatic variability. International Journal of Remote Sensing 17, 845–862. Eklundh L., Jönsson, P., 2012. Timesat 3.1 Software manual. Evrendilek, F., Gulbeyaz, O., 2008. Deriving vegetation dynamics of natural terrestrial ecosystems from MODIS NDVI/EVI data over Turkey. Sensors 8, 5270–5302, doi: 10.3390/s8095270. Evrendilek, F., Gulbeyaz, O., 2011. Boosted decision tree classifications of land cover
F. Zhou et al. / ISPRS Journal of Photogrammetry and Remote Sensing 84 (2013) 114–129 over Turkey integrating MODIS, climate and topographic data. International Journal of Remote Sensing 32 (12), 3461–3483. Fensholt, R., Proud, S.R., 2012. Evaluation of earth observation based global long term vegetation trends — comparing GIMMS and MODIS global NDVI time series. Remote Sensing of Environment 119 (2012), 131–147. Friedl, M.A., McIver, D.K., Hodges, J.C.F., Zhang, X.Y., Muchoney, D., Strahler, A.H., Woodcock, C.E., Gopal, S., Schneider, A., Cooper, A., Baccini, A., Gao, F., Schaaf, C., 2002. Global land cover mapping from MODIS algorithms and early results. Remote Sensing of Environment 83 (2002), 287–302. Fuller, D.O., 1998. Trends in NDVI time series and their relation to rangeland and crop production in Senegal, 1987–1993. International Journal of Remote Sensing 19 (10), 2013–2018. Giriraj, A., Irfan-Ullah, M., Murthy, MSR., Beierkuhnlein, C., 2008. Modelling spatial and temporal forest cover change patterns (1973–2020): a case study from South Western Ghats (India). Sensors 8, 6132–6153, doi: 10.3390/s8106132. Hill, M.J., Donald, G.E., 2003. Estimating spatio-temporal patterns of agricultural productivity in fragmented landscapes using AVHRR NDVI time series. Remote Sensing of Environment 84 (2003), 367–384. Hilla, M.J., Donalda, G.E., Hyderb, M.W., Smith, R.C.G., 2004. Estimation of pasture growth rate in the south west of Western Australia from AVHRR NDVI and climate data. Remote Sensing of Environment 93 (2004), 528–545. Hirosawa, Y., Marsh, S.E., Kliman, D.H., 1996. Application of standardized principal component analysis to land-cover characterization using multitemporal AVHRR data. Remote Sensing of Environment 58, 267–281. Huttich, C., Herold, M., Schmullius, C., Egorov, V., Bartalev, S.A., 2007. Indicators of Northern Eurasia’s land-cover change trends from SPOT-VEGETATION timeseries analysis 1998–2005. International Journal of Remote Sensing 28 (18), 4199–4206. Jakubauskas, ME., Peterson, DL., Kastens, JH., Legates, D.R., 2002. Time series remote sensing of landscape-vegetation interactions in the Southern Great Plains. Photogrammetric Engineering and Remote Sensing 68 (10), 1021–1030. Jönsson, P., Eklundh, L., 2003. Seasonality extraction from satellite sensor data. In: Chen, C.H. (Ed.), Frontiers of Remote Sensing Information Processing. World Scientific Publishing, pp. 487–500. Kasischke, E.S., French, N.H.F., 1997. Constraints on using AVHRR composite index imagery to study patterns of vegetation cover in boreal forests. International Journal of Remote Sensing 18 (11), 2403–2426. Keane, R.E., Cary, G.J., Davies, I.D., Flannigan, M.D., Gardner, R.H., Lavorel, S., Lenihan, J.M., Li, C., Rupp, T.S., 2004. A classification of landscape fire succession models: spatial simulations of fire and vegetation dynamics. Ecological Modelling 179 (2004), 3–27. Kleynhans, W., Olivier, J.C., Wessels, K.J., Salmon, B.P., van den Bergh, F., Steenkamp, K., 2011. Detecting land cover change using an extended kalman filter on MODIS NDVI time-series data. Geoscience and Remote Sensing Letters, IEEE 8 (3), 507–511. Knight, J.F., Lunetta, R.L., Ediriwickrema, J., Khorram, S., 2006. Regional scale landcover characterization using MODIS-NDVI 250 m multi-temporal imagery: a phenology based approach. GIScience and Remote Sensing 43 (1), 1–23. Kumar, U., Kerle, N., Punia, M., Ramachandra, T.V., 2010. Perceptron and decision tree from MODIS data. Journal of the Indian Society of Remote Sensing 38 (4), 592–603, doi: 10.1007/s12524-011-0061-y. Latifovic, R., Pouliot, D., 2005. Multitemporal land cover mapping for Canada: methodology and products. Canadian Journal of Remote Sensing 31 (5), 347– 363. Liang, S., 2001. Land-cover classification methods for multiyear AVHRR data. International Journal of Remote Sensing 22 (8), 1479–1493. Liu, Y., Wang, X., Guo, M., Tani, H., Matsuoka, N., Matsumura, S., 2011. Spatial and temporal relationships among NDVI, climate factors, and land cover changes in Northeast Asia from 1982 to 2009. GIScience and Remote Sensing 48 (3), 371– 393, doi: 10.2747/1548-1603.48.3.371. Luo, Y., Trishchenko, A.P., Khlopenkov, K.V., 2008. Developing clear-sky, cloud and cloud shadow mask for producing clear-sky composites at 250-meter spatial
129
resolution for the seven MODIS land bands over Canada and North America. Remote Sensing of Environment 112 (12), 4167–4185. Lupo, F., Linderman, M., Vanacker, V., Bartholome, E., Lambin, E.F., 2007. Categorization of land-cover change processes based on phonological indicators extracted from time series of vegetation index data. International Journal of Remote Sensing 28 (11), 2469–2483. Lv, T., Liu, C., 2010. Study on extraction of crop information using time-series MODIS data in the Chao Phraya Basin of Thailand. Advances in Space Research 45 (2010), 775–784. Matsuoka, M., Hayasaka, T., Fukushima, Y., Honda, Y., 2007. Land cover in East Asia classified using Terra MODIS and DMSP OLS products. International Journal of Remote Sensing 28 (2), 221–248. Maxwell, S.K., Hoffer, R.M., Chapman, P.L., 2002a. AVHRR composite period selection for land cover classification. International Journal of Remote Sensing 23, 5043–5059. Maxwell, S.K., Hoffer, R.M., Chapman, P.L., 2002b. AVHRR channel selection for land cover classification. International Journal of Remote Sensing 23, 5061–5073. McGwire, K.C., Fairbanks, D.H.K., Estes, J.E., 1992. Examining regional vegetation associations using multi-temporal AVHRR imagery. In: Proceedings of the National ASPRS/ACSM 1992 Annual Convention, Bethesda, Maryland, pp. 304– 313. Neeti, N., Rogan, J., Christman, Z., Eastman, J.R., Millones, M., Schneider, L., Nickl, E., Schmook, B., Turner II, B.L., Ghimire, B., 2012. Mapping seasonal trends in vegetation using AVHRR-NDVI time series in the Yucatán Peninsula, Mexico. Remote Sensing Letters 3 (5), 433–442. Pal, M., Mather, P.M., 2003. An assessment of the effectiveness of decision tree methods for land cover classification. Remote Sensing of Environment 86, 554– 565. Pouliot, D., Latifovic, R., Fernandes, R., Olthof, I., 2009. Evaluation of annual forest disturbance monitoring using a static decision tree approach and 250 m MODIS data. Remote Sensing of Environment 113 (2009), 1749–1759. Quinlan, J.R., 1996. Bagging, boosting, and C4.5. In: Proceedings AAAI-96 Fourteenth National Conference on Artificial Intelligence, Portland, OR. Reed, B.C., Brown, J.F., VanderZee, D., Loveland, T.R., Merchant, J.W., Ohlen, D.O., 1994. Measuring phenological variability from satellite imagery. Journal of Vegetation Science 5, 703–714. Salomonson, VV., Barnes, WL., Maymon, PW., Montgomery, HE., Ostrow, H., 1989. MODIS: advanced facility instrument for studies of the earth as a system. IEEE Transactions on Geoscience Remote Sensing 27 (2), 145–153. Schneider, A., 2012. Monitoring land cover change in urban and peri-urban areas using dense time stacks of Landsat satellite data and a data mining approach. Remote Sensing of Environment 124 (2012), 689–704. Wardlow, B.D., Egbert, S.L., 2008. Large-area crop mapping using time-series MODIS 250 m NDVI data: an assessment for the US Central Great Plains. Remote Sensing of Environment 112 (2008), 1096–1116. Wardlow, B.D., Egbert, S.L., 2010. A comparison of MODIS 250-m EVI and NDVI data for crop mapping: a case study for southwest Kansas. International Journal of Remote Sensing 31 (3), 805–830. Wardlow, B.D., Egbert, S.L., Kastens, J.H., 2007. Analysis of time-series MODIS 250 m vegetation index data for crop classification in the US Central Great Plains. Remote Sensing of Environment 108 (2007), 290–310. Weiss, E., Marsh, S.E., Pfirman, E.S., 2001. Application of NOAA-AVHRR NDVI timeseries data to assess changes in Saudi Arabia’s rangelands. International Journal of Remote Sensing 22 (6), 1005–1027. Xiao, X., Boles, S., Liu, J., Zhuang, D., Frolking, S., Li, C., Salas, W., Moore III, B., 2005. Mapping paddy rice agriculture in southern China using multi-temporal MODIS images. Remote Sensing of Environment 95 (2005), 480–492. Yang, X., Lo, C.P., 2002. Using a time series of satellite imagery to detect land use and land cover changes in the Atlanta, Georgia metropolitan area. International Journal of Remote Sensing 23 (9), 1775–1798. Zhang, X., Sun, R., Zhang, B., Tong, Q., 2008. Land cover classification of the North China Plain using MODIS EVI time series. ISPRS Journal of Photogrammetry and Remote Sensing 63, 476–48.