Mapping global urban areas using MODIS 500-m data: New methods and datasets based on ‘urban ecoregions’


Remote Sensing of Environment 114 (2010) 1733–1746



Journal homepage: www.elsevier.com/locate/rse

Annemarie Schneider a,⁎, Mark A. Friedl b, David Potere c

a Center for Sustainability and the Global Environment, Nelson Institute for Environmental Studies, University of Wisconsin-Madison, 1710 University Avenue, Madison, WI 53726, United States
b Department of Geography and Environment, Boston University, 675 Commonwealth Avenue, Boston, MA 02215, United States
c Office of Population Research, Princeton University, 207 Wallace Hall, Princeton, NJ 08544, United States

Article info

Article history: Received 2 October 2009; received in revised form 1 March 2010; accepted 6 March 2010

Keywords: Urban areas; Urbanization; Cities; Global monitoring; Land cover; Environment; Classification; Decision trees; Machine learning

Abstract

Although cities, towns and settlements cover only a tiny fraction (<1%) of the world's surface, urban areas are the nexus of human activity with more than 50% of the population and 70–90% of economic activity. As such, material and energy consumption, air pollution, and expanding impervious surface are all concentrated in urban areas, with important environmental implications at local, regional and potentially global scales. New ways to measure and monitor the built environment over large areas are thus critical to answering a wide range of environmental research questions related to the role of urbanization in climate, biogeochemistry and hydrological cycles. This paper presents a new dataset depicting global urban land at 500-m spatial resolution based on MODIS data (available at http://sage.wisc.edu/urbanenvironment.html). The methodological approach exploits temporal and spectral information in one year of MODIS observations, classified using a global training database and an ensemble decision-tree classification algorithm. To overcome confusion between urban and built-up lands and other land cover types, a stratification based on climate, vegetation, and urban topology was developed that allowed region-specific processing. Using reference data from a sample of 140 cities stratified by region, population size, and level of economic development, results show a mean overall accuracy of 93% (κ = 0.65) at the pixel level and a high level of agreement at the city scale (R² = 0.90).

© 2010 Elsevier Inc. All rights reserved.

⁎ Corresponding author. E-mail addresses: [email protected] (A. Schneider), [email protected] (M.A. Friedl), [email protected] (D. Potere). 0034-4257/$ – see front matter © 2010 Elsevier Inc. All rights reserved. doi:10.1016/j.rse.2010.03.003

1. Introduction

In a relatively short period of time, urbanization has emerged as a top environmental issue facing many parts of the Earth (Montgomery, 2008). Urban areas have profound environmental impacts that extend beyond city boundaries, including urban heat island effects, impervious surfaces that alter sensible and latent heat fluxes, conversion and fragmentation of natural ecosystems, loss of agricultural land, contamination of air, soil and water, increased water use and runoff, and reduced biodiversity (Pickett et al., 1997; El Araby, 2002; Alberti, 2005; Shepherd, 2005). New estimates that half of the global population now lives in urban areas mean that the importance and impact of cities are greater than ever before (UN, 2008). An additional two billion people are expected to arrive in cities by 2050, with nearly 90% of this growth expected in developing countries. Clearly, the impact of urban areas on the human population and the global environment is significant, and will become even more pronounced in the future (Mills, 2007).

While we are beginning to comprehend the local environmental impacts of urbanization, there is a new interest across several disciplines in understanding how urbanization, regionally or cumulatively, contributes to global environmental change (Grimmond, 2008; Mills, 2007). Urban areas are the primary source regions of anthropogenic carbon emissions (Svirejeva-Hopkins et al., 2004), yet global models of climate and biogeochemistry include only relatively crude representations of urban areas (Pataki et al., 2006). Recent studies have demonstrated that accurate representation of urban land use is both important and poorly captured in current models (Oleson et al., 2008; Peters-Lidard et al., 2004). Accurate and timely information on the distribution and characteristics of urban areas is therefore essential for a wide array of geophysical research questions related to the impact of humans on the environment (Kaye et al., 2006).

The datasets that have emerged during the last decade show considerable disagreement on the location and extent of urban areas (Potere & Schneider, 2007; Potere et al., 2009). Moreover, most environmental modeling efforts require additional information that is not available at regional to global scales, including urban vegetation types, presence of irrigation, building heights and materials, and


urban surface radiative and thermodynamic properties (Oleson et al., 2008).

In this paper we present results from a new effort to create a global map of urban, built-up and settled areas, which serves as the first stage in our development of a comprehensive database of urban land surface characteristics for 2001–2010. This work builds on previous mapping efforts using Moderate Resolution Imaging Spectroradiometer (MODIS) data at 1-km spatial resolution (Schneider et al., 2003, 2005), which is included as part of the MODIS Collection 4 (C4) Global Land Cover Product (Friedl et al., 2002). Here we address weaknesses in the first map as well as several limitations of contemporary global urban maps by developing a methodology that relies solely on newly released Collection 5 (C5) MODIS 500-m resolution data. Specifically, a supervised decision-tree classification algorithm is used to map urban areas using region-specific parameters. At the heart of our approach is a new, global stratification of “urban ecoregions” that facilitates study and mapping of cities and towns at regional to global scales. After describing our methods, we expand on work presented in Schneider et al. (2009) by presenting our results, a validation of the new map using a global, stratified random sample of 140 case study cities, and assessment of urban land densities captured by the new global urban map.

2. Background: strengths and weaknesses of global urban mapping efforts

During the last two decades, eight different teams have developed global maps that offer circa-2000 portraits of urban areas (see Gamba and Herold, 2009). Table 1 summarizes the data, methods, and urban

definitions used for each map, as well as the estimates of global urban area derived from each. Substantial progress has been made since the first maps were released in the early 1990s, yet mapping urban areas at global scale remains a complex challenge. The global area of urban land is small relative to the total land surface, urban areas are heterogeneous mixes of land cover types, and there are significant differences in how different groups and disciplines define the term ‘urban’. We have discussed the methodologies and key attributes of each map elsewhere (Potere & Schneider, 2007). Here, we briefly review the strengths and weaknesses of these past efforts to provide a point of departure for our current work. One of the strengths of recent global urban maps is that nearly all exploit remotely sensed imagery, either directly as data input or indirectly using remote sensing-based products. Seven maps in Table 1 exploit multispectral data sources, including Systeme Pour l'Observation de la Terre (SPOT)-Vegetation data (Bartholome & Belward, 2005), Medium Resolution Imaging Spectrometer (MERIS) data (Arino et al., 2007), and C5 MODIS data. These global remote sensing datasets have provided a way to apply consistent definitions and methodologies to the diverse set of cities across the globe. The second strength of these global urban mapping efforts is the use of semi-automated classification algorithms and/or data fusion methods that draw on a combination of satellite imagery, census data, and GIS datasets. Four maps rely specifically on supervised or unsupervised classification methods (GLC2000, Bartholome & Belward, 2005; GlobCover, ESA, 2008; MODIS 1-km, Schneider et al., 2003, 2005; MODIS 500-m), and four exploit data fusion approaches based on regression techniques or decision rules. This latter group includes the History Database of the Environment (HYDE, Goldewijk,

Table 1 The ten global maps depicting urban (or urban-related) areas shown in order of increasing global urban extent. The five maps used for comparison (shown in bold) with the new MODIS 500-m map of urban extent include the MODIS 1-km map, Global Land Cover 2000's urban class (GLC2000), the GlobCover urban class, NOAA's Impervious Surface Area map (IMPSA), and CIESIN's Global Rural–Urban Mapping Project (GRUMP).

| Abbreviation | Map (citation) | Producer | Definition of urban features | Resolution | Data | Methods | Urban extent, km² (% of land area) |
|---|---|---|---|---|---|---|---|
| VMAP | Vector Map Level Zero (Danko, 1992) | US National Geospatial-Intelligence Agency | Populated places | 1:1 mil. | Aeronautical charts, maps | Digitization | 276,000 (0.21) |
| **GLC2000** | Global Land Cover 2000 (Bartholome and Belward, 2005) | European Commission Joint Research Center | Artificial surfaces and associated areas | 988 m | SPOT-Vegetation, Nighttime lights data | Unsupervised classification, 18 regional teams | 308,000 (0.24) |
| **GlobCover** | GlobCover v2 (Arino et al., 2007; ESA, 2008) | European Commission Joint Research Center | Artificial surfaces and associated areas (urban areas > 50%) | 309 m | MERIS (GLC2000 in some areas) | Unsupervised classification | 313,000 (0.24) |
| HYDE | History Database of the Global Environment v3 (Goldewijk, 2001, 2005) | Netherlands Environmental Assessment Agency | Urban area (built-up, cities) | 9000 m | Landscan, UN census data, city gazetteers | Data fusion by aggregation, decision rules | 532,000 (0.41) |
| **IMPSA** | Global Impervious Surface Area (Elvidge et al., 2007) | US Nat. Geophysical Data Center (US-NOAA) | Density of impervious surface area | 927 m | Landscan, Nighttime lights data | Data fusion by linear regression | 572,000 (0.44) |
| MODIS 500-m | MODIS Urban Land Cover 500-m (Schneider et al., 2009) | University of Wisconsin, Boston University (US-NASA) | Built environment (> 50%), including non-vegetated, human-constructed elements, with minimum area > 1 km² | 463 m | MODIS 463-m data | Supervised classification, Landsat-based exemplars | 657,000 (0.51) |
| **MODIS 1-km** | MODIS Urban Land Cover 1-km (Schneider et al., 2003) | Boston University (US-NASA) | Urban and built-up areas | 927 m | MODIS 1 km, Nighttime lights, population density | Supervised classification, data fusion | 727,000 (0.57) |
| **GRUMP** | Global Rural–Urban Mapping Project (CIESIN, 2004) | Earth Institute at Columbia University | Urban extent | 927 m | VMAP, census data, Nighttime lights, maps | Data fusion by logarithmic regression | 3,524,000 (2.74) |
| Lights | Nighttime Lights v2 (Elvidge et al., 2001; Imhoff et al., 1997) | National Geophysical Data Center (US-NOAA) | Nighttime illumination intensity | 927 m | DMSP-OLS 2.2 km data | Data compositing | NA |
| LandScan | LandScan 2005 (Bhaduri et al., 2002) | US Oak Ridge National Laboratory (US-DOE) | Ambient (average over 24 h) global population distribution | 927 m | Geocover maps, VMAP0, MODIS 1-km, Landsat, census data, high-resolution imagery | Data fusion | NA |

Abbreviations: SPOT, Systeme Pour l'Observation de la Terre; MERIS, Medium Resolution Imaging Spectrometer; MODIS, Moderate Resolution Imaging Spectroradiometer; DMSP-OLS, Defense Meteorological Satellite Program-Operational Linescan System; ESA, European Space Agency; UN, United Nations; NOAA, National Oceanic and Atmospheric Administration; NASA, National Aeronautics and Space Administration; CIESIN, Center for International Earth System Information Network; DOE, Department of Energy.


2005), Global Impervious Surface Area dataset (IMPSA, Elvidge et al., 2007), Global Rural–Urban Mapping Project (GRUMP, CIESIN, 2004), and LandScan (Bhaduri et al., 2002). The drawback to data fusion is that representation of urban areas is often data-dependent, and definitions of urban areas become blurred when multiple data types are used (Potere & Schneider, 2007). GRUMP characterizes “urban extents” based on their definition in Table 1, yet the strong reliance on buffered census data has resulted in a map that corresponds more closely to population than built-up areas.

There are several challenges associated with using currently available global urban datasets. First, the maps exhibit a high degree of variability in how they depict the global urban landscape (rightmost column, Table 1). The most problematic areas tend to be in developing regions, where accurate and up-to-date maps are most needed. The total extent of urban land in China, for instance, ranges from 10,000 km² in the GLC2000 dataset (Bartholome & Belward, 2005) to over 261,000 km² in GRUMP (CIESIN, 2004), a 25-fold difference. Given such inconsistent estimates, it is impossible that all maps depict urban areas correctly. Three research teams have conducted global-scale accuracy assessments (ESA, 2008; Mayaux et al., 2006; Strahler, 2003), but these efforts have been hampered by a lack of test sites in urban areas. Our research has worked to address this validation gap; this study and a companion article (Potere et al., 2009) attempt to systematically characterize the accuracy of the global urban maps at global, regional, and local scales.
A second challenge in global urban mapping efforts is the lack of a consistent, unambiguous definition of “urban area.” Our previous work has shown that the representation of urban land in global urban maps often reflects the input data: maps from census data correspond to population distribution, those utilizing Nighttime Lights data correlate with national income levels, while maps from multispectral data align most closely with ‘built-up areas’ (Potere & Schneider, 2007; Potere et al., 2009). The issue of urban definition can become problematic when the maps are used. It is not uncommon, for example, for Nighttime Lights data and derivative maps to be used in applications where maps of built-up areas might provide a better approximation (Sterling & Ducharne, 2008; Trusilova et al., 2008).

The coarse spatial resolution of available global urban maps further exacerbates the class definition issue. Nearly all maps in Table 1 were produced at spatial resolutions of 1–2 km, resulting in lack of spatial detail, omission of small cities (<5–10 km²), and significant errors along city edges due to the difficulty of depicting ‘mixed’ pixels.

3. Defining urban extent

Because of the difficulties associated with defining ‘urban areas’, it is important to provide a clear conceptual framework of the urban environment for regional and global mapping studies. “Urbanized land” is a depiction of land use, and includes commercial, industrial, residential and transportation land use types, because these classes are most functional for urban planners and practitioners. These classes are distinct from land cover, which is defined as the physical attributes, composition, condition and characteristics of the Earth's surface. Urban areas are heterogeneous mixtures of land cover types, and may contain any number of vegetated (grass, shrubs, and trees) and man-made surfaces (cement and asphalt).
Here, we utilize a definition of urban areas based on physical attributes: urban areas are places that are dominated by the built environment. The ‘built environment’ includes all non-vegetative, human-constructed elements, such as buildings, roads, runways, etc. (i.e. a mix of human-made surfaces and materials), and ‘dominated’ implies coverage greater than or equal to 50% of a given landscape unit (here, the pixel). Pixels that are predominantly vegetated (e.g. a park) are not considered urban, even though in terms of land use, they may function as urban space. Although ‘impervious surface’ is often used to characterize urban areas within the remote sensing literature


(Ridd, 1995), we prefer the more direct term ‘built environment’ because of uncertainty and scaling issues surrounding the impervious surface concept (Small, 2003; Small & Lu, 2006; Stow et al., 2007). Finally, we also define a minimum mapping unit: urban areas are contiguous patches of built-up land greater than 1 km².

4. Methods

4.1. Overview

In this section, we describe our methodology for deriving urban areas from C5 MODIS data using region-specific processing. We first present a stratification of urban ecoregions that we developed to allow region-specific image processing. We then describe the data inputs and methodology for the classification and post-processing steps. Finally, we describe the validation data and methods used to assess the accuracy of the new global urban map.

4.2. Urban ecoregions: a new approach for stratification of global urban systems

To facilitate processing and classification of the MODIS data, a global stratification of the Earth's land surface was developed based on the natural, physical and structural elements of urban areas. While urban environments represent some of the most complex and heterogeneous landscapes in the world, research within urban studies, urban ecology and land change science has shown that there is surprising regularity in city structure, configuration, constituent elements, and vegetation types within geographic regions and by level of economic development (Angel et al., 2005; Schneider & Woodcock, 2008). The approach here exploits these local similarities to define 16 quasi-homogeneous strata that we term urban ecoregions (Fig. 1, Table 2). This typology applies early ideas from Brady et al. (1979) and Pickett et al. (2001) on the ecological, cultural and social elements of cities; here we extend these concepts, applying them for the first time in a spatially explicit format at the global scale.
It is important to point out that this stratification is distinct from the concept of an “urban ecosystem”, where elements of the social and built environment are intermixed with biological and physical features within a given area, such as a watershed (Pataki et al., 2006; Pickett et al., 2001). Rather, we delineate regions that span large areas (i.e. continental-scale), and that encompass multiple cities and a range of land cover types. As illustrated in Table 2, the stratification is based on three elements: (1) a biome designation that synthesizes general climate and vegetation (Olson et al., 2001); (2) regional differences in urban topology, including city structure, organization and historic development (Bairoch, 1988; Lynch, 1981); and (3) the level of economic development defined by per-capita gross domestic product (GDP) in purchasing power parity (UN, 2008; World Bank, 2008). The biome label plays an important role in the stratification scheme, because it is directly correlated to vegetation density and form within and surrounding urban areas. Previous studies have shown significant differences in the vegetation phenology (i.e. green up, senescence) between urban and rural areas due to differential temperature and precipitation patterns (Roetzer et al., 2000; Zhang et al., 2004). These temporal patterns are integral to the multitemporal nature of our classification approach, and the urban ecoregion scheme is particularly adept at capturing the region-dependent nature of these trends. Fig. 2 shows the timing of vegetation cycles (e.g. green up, senescence) for different land cover types in five cities using one year of 8-day MODIS 500-m Nadir BRDF-Adjusted Reflectance (NBAR) observations of the Enhanced Vegetation Index (Huete et al., 2002). While spectral signatures of urban areas and bare fields are often similar in coarse resolution pixels, for example, the temporal signatures of majority urban and mixed urban pixels are clearly


Fig. 1. The map of urban ecoregions produced for this research (see Table 2 for descriptions). The sample of 140 cities used for validation is indicated by points.

distinct from other cover types. By using the urban ecoregion stratification in conjunction with the semi-automated classification methods described below, we exploit these temporal patterns to increase map accuracy. Specifically, the urban ecoregion stratification is intersected with the global dataset of MODIS imagery, and regions are processed on a case-by-case basis.
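As a rough illustration of the phenological contrast described above, the sketch below computes EVI from near-infrared, red, and blue reflectances using the standard MODIS coefficients reported by Huete et al. (2002), then compares the seasonal amplitude of two short series. The function names and the reflectance values are hypothetical and chosen only to show the idea; they are not drawn from the NBAR data used in this study.

```python
def evi(nir, red, blue, gain=2.5, c1=6.0, c2=7.5, soil=1.0):
    """Enhanced Vegetation Index from surface reflectances in [0, 1]."""
    denom = nir + c1 * red - c2 * blue + soil
    if denom == 0:
        return float("nan")
    return gain * (nir - red) / denom


def amplitude(series):
    """Seasonal amplitude (max minus min) of an index series, skipping gaps (None)."""
    valid = [v for v in series if v is not None]
    return max(valid) - min(valid)


# Hypothetical (NIR, red, blue) reflectances at three dates in one year.
cropland = [(0.25, 0.15, 0.08), (0.45, 0.06, 0.04), (0.28, 0.14, 0.07)]
urban = [(0.22, 0.16, 0.10), (0.25, 0.15, 0.09), (0.23, 0.16, 0.10)]

crop_evi = [evi(*bands) for bands in cropland]
urban_evi = [evi(*bands) for bands in urban]

# The vegetated series shows a much larger green-up/senescence swing.
print(round(amplitude(crop_evi), 3), round(amplitude(urban_evi), 3))
```

In this toy case the cropland series has a far larger amplitude than the urban one, which is the kind of temporal difference a multitemporal classifier can exploit.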

4.3. Supervised classification with ensemble decision trees

Decision trees have been widely used for remote sensing applications during the last decade (Chan et al., 2001; Friedl & Brodley, 1997; Hansen et al., 1996; Pal & Mather, 2003), including several studies focused on mapping land cover from coarse resolution data (DeFries

Table 2 Characteristics of the urban ecoregions, including the amount of global land area and urban land (from the MODIS 500-m map) in each region.

| No. | Climate | Dominant land cover | Urban topology | Per-capita income (a) | Geographic region | Total land, km² (% of total land) (b) | Urban land, km² (% of total urban land) |
|---|---|---|---|---|---|---|---|
| 1 | Temperate | Broadleaf mixed forest | Rectilinear grid structure, segregated land use, high density core, low-density suburbs | High | U.S., Canada, Australia | 7,057,600 (4.7) | 125,000 (19.0) |
| 2 | Temperate | Broadleaf mixed forest | Radial street structure; varying bldg density, blurred urban/rural boundary | High | Europe, Japan | 6,970,400 (4.7) | 100,800 (15.4) |
| 3 | Temperate | Broadleaf mixed forest | Rectilinear grid, high density rect. buildings, newly developed outer nuclei/towns | Moderate | Eastern China, South Korea | 3,182,200 (2.1) | 58,100 (8.9) |
| 4 | Temperate | Grassland, shrubland | Low-density, rectilinear grid, segregated land use, wide streets | High | North–South America, Australia | 9,654,700 (6.5) | 33,900 (5.2) |
| 5 | Temperate | Grassland, shrubland | Rectilinear grid, high density block structure | Moderate–low | Middle East | 4,561,400 (3.1) | 37,700 (5.7) |
| 6 | Tropical | Broadleaf forest | Rectilinear grid, central axis layout, typically new frontier cities | Moderate | South America | 9,487,400 (6.4) | 21,200 (3.2) |
| 7 | Tropical | Broadleaf forest | Compact cities/towns, high density, mixed land use, structure varies | Low | Sub-Saharan Africa | 2,479,600 (1.7) | 5970 (0.9) |
| 8 | Tropical–subtropical | Mixed forest | High density, tightly spaced buildings, narrow streets, mixed land use | Moderate–low | South-Central Asia, South-East Asia | 9,191,100 (6.2) | 80,500 (12.3) |
| 9 | Tropical–subtropical | Savannah, grasslands | Rectilinear grid, central axis, dense block structure, segregated land use | Moderate | South America | 5,961,300 (4.0) | 47,500 (7.2) |
| 10 | Tropical–subtropical | Savannah, grasslands | Irregular street structure, medium density, vegetated core, high density outskirts | Low | Sub-Saharan Africa | 16,500,600 (11.1) | 18,600 (2.8) |
| 11 | Tropical–subtropical | Grasslands | Radial structure, central axes, high-medium density core/outskirts, segregated land use | Moderate | South America, Southern Africa | 1,777,100 (1.2) | 12,300 (1.9) |
| 12 | Temperate Mediterranean | Mixed forest, shrubland | Radial, semi-irregular street structure, high-medium building density | High | Southern Europe, Northern Africa | 2,489,500 (1.7) | 44,400 (6.8) |
| 13 | Arid, semi-arid | Desert, barren | Varies | Varies | Africa, Middle East | 22,488,100 (15.1) | 38,600 (5.9) |
| 14 | Arid, semi-arid | Steppe, shrubland | Irregular street layout; conjoined buildings; segregated land use | Moderate–low | Central Asia | 8,952,400 (6.0) | 25,500 (3.9) |
| 15 | Arctic | Boreal forest | Small towns, villages | Moderate to high | North America, Northern Eurasia | 20,139,700 (13.5) | 6000 (0.9) |
| 16 | Arctic | Ice, snow | n/a | n/a | North, South Pole | n/a | n/a |

(a) Per-capita income measured by per-capita gross domestic product in purchasing power parity circa 2000; value reflects median across region as follows: low, <$3000; moderate–low, $3000–5200; moderate, $5200–17,000; high, >$17,000.
(b) Total land area excludes Antarctica and Greenland.


Fig. 2. Phenological signatures of urban areas and nearby land cover types derived from one year of MODIS 500-m Nadir BRDF-Adjusted Reflectance (NBAR) observations of the Enhanced Vegetation Index (EVI) for five cities. The plots depict trends from different urban ecoregions (labeled at the bottom of each plot).

et al., 2000; Friedl et al., 2002). This classification method is able to handle large, non-parametric datasets with noisy or missing data, complex, non-linear relationships between features and classes, and problems that require a many-to-one mapping approach (Fayyad & Irani, 1992; Friedl & Brodley, 1997). For this work we use C4.5, a mature and extensively tested algorithm (Quinlan, 1993), which is also used in the MODIS Land Cover product (Friedl et al., 2009). Decision-tree construction is simple and intuitive: the training data are split in a recursive manner into increasingly homogeneous subsets based on statistical tests applied to the feature values (the satellite data). After the decision tree has been estimated, tested, and ‘pruned’ to eliminate over-fitting to the training data, the decision rules are applied to the entire image to produce a classified map.

The efficiency and efficacy of decision trees increase significantly with the addition of boosting, an ensemble classification technique developed in the machine learning community (Quinlan, 1996). This technique improves class discrimination by estimating multiple classifiers while forcing the classifier to focus on difficult classes. The final classification is produced by an accuracy-weighted vote across the classifications (Quinlan, 1996). Boosting has been shown to be equivalent to a form of additive logistic regression (Collins et al., 2002; Friedman et al., 2000), and as a result, probabilities of class membership can be assigned for each class at every pixel.

Our classification approach employs a one-year time series of MODIS data to exploit spectral and temporal properties of land cover types. Specifically, we utilize the differences in temporal signatures for urban and rural areas that result from phenological differences between vegetation inside and outside the city (Fig. 2).
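The boosting idea described above can be sketched in a few lines. The example below is a minimal binary AdaBoost that uses one-split decision stumps as a stand-in for the boosted C4.5 trees used in this work; it is not the C4.5 implementation. The two features (a yearly EVI amplitude and a mean reflectance), the six training samples, and all function names are hypothetical. The vote weight follows the usual AdaBoost form, and the probability uses the logistic link implied by the additive logistic regression view of boosting (Friedman et al., 2000).

```python
import math


def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best one-split rule."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                # Predict `sign` above the threshold and `-sign` at or below it.
                err = sum(wi for wi, row, yi in zip(w, X, y)
                          if (sign if row[j] > t else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best


def stump_predict(stump, row):
    _, j, t, sign = stump
    return sign if row[j] > t else -sign


def adaboost(X, y, rounds=10):
    """Binary AdaBoost; labels must be +1 (urban) or -1 (other)."""
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1.0 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1.0 - err) / err)     # accuracy-based vote weight
        ensemble.append((alpha, stump))
        # Re-weight so the next round concentrates on misclassified samples.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def score(ensemble, row):
    """Weighted vote across all stumps; positive means 'urban'."""
    return sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)


def prob_urban(ensemble, row):
    """Class-membership probability from the logistic link p = 1 / (1 + exp(-2F))."""
    return 1.0 / (1.0 + math.exp(-2.0 * score(ensemble, row)))


# Hypothetical samples: (yearly EVI amplitude, mean reflectance).
X = [(0.05, 0.30), (0.08, 0.28), (0.06, 0.12),
     (0.40, 0.10), (0.35, 0.29), (0.10, 0.33)]
y = [1, 1, -1, -1, -1, 1]

ensemble = adaboost(X, y, rounds=10)
labels = [1 if score(ensemble, row) > 0 else -1 for row in X]
```

No single threshold separates these toy samples, so the ensemble has to combine several weak rules, which is the behaviour that boosting relies on.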
The MODIS inputs are 8-day NBAR values for the seven land bands (463.3-m resolution) for one year (18 February 2001 to 17 February 2002). These data are normalized to a nadir-viewing angle to reduce the effect of varying illumination and viewing geometries (Schaaf et al., 2002), and the 8-day values are aggregated to 32-day averages to reduce the temporal correlation and the frequency of missing values from cloud cover. Monthly and yearly minima, maxima and means for each band and the Enhanced Vegetation Index (Huete et al., 2002) are also included.

The training data include 1860 training sites ranging from 1 to 100 km² in area (Fig. 3), selected and labeled according to the International Geosphere–Biosphere Programme (IGBP) 17-class system (Belward & Loveland, 1997). Each site consists of a polygon obtained by manual interpretation of Landsat and Google Earth data (4–30-m resolution) (Friedl et al., 2009) where land cover is uniform and representative of one IGBP class. The full training site database was reviewed and revised for use with C5 MODIS data (see Friedl et al., 2009), and an independent set of urban training sites was also compiled from 182 cities located across the globe.

4.4. Estimating posterior probabilities for select urban ecoregions

While our previous approach for C4 MODIS data utilized the same basic algorithm (Schneider et al., 2003, 2005), our new method differs in several key ways. Specifically, the C5 methodology does not rely on external datasets to constrain the classification (e.g. DMSP Lights data). The increased resolution of the MODIS data and improvements to the training site database are typically sufficient to generate the final urban map for most regions. For areas where substantial classification errors are present (typically semi-arid/arid regions without significant settlement), the ability of the boosted decision trees to produce class membership probabilities is exploited. To this end, the classification algorithm is run twice: the first classification utilizes the full set of land cover exemplars that includes urban sites, and the second excludes the urban training sites.
The first classification characterizes the urban core and mixed urban spaces, with the caveat that some non-urban areas (typically shrubland) are


Fig. 3. The distribution of classification exemplars labeled according to the 17-class International Geosphere–Biosphere Programme (IGBP) classification scheme. These 2190 sites generated 44,715 training pixels that were used as input to the decision-tree classifier.

erroneously labeled urban land (Fig. 4a). These areas of confusion are identified by low membership in the urban class. For these pixels, we take advantage of the information in the second classification to estimate a posteriori probabilities for the urban classes (Fig. 4b). The theoretical basis for computing a posteriori probabilities is Bayes' Rule, which is derived from the definition of conditional probability (Robert, 1997). Here, the probabilities from the second classification function as a priori information to modify the urban land cover probabilities from the first. The new a posteriori urban probabilities (Fig. 4c) are then used to determine if a pixel should indeed be labeled ‘urban land’, or whether it should be characterized by another land cover class.

The land cover probabilities are used as prior information to constrain the decision-tree output in the following urban ecoregions: (4) Temperate grasslands of North–South America; (5) Temperate grassland in the Middle East; (11) Tropical–subtropical grasslands; (12) Temperate Mediterranean; and (14) Arid, semi-arid steppe in Central Asia. In these regions, urban areas are typically confused with open/closed shrubland due to a relatively similar ‘mixed’ signal of vegetation and bare ground/bright surfaces. To remedy this, the land cover probabilities from the open shrubland and savannah classes are used to develop a probability surface based on the assumption that areas with a low (or zero) probability of shrubland/savannah are those suitable for urban land. The a priori probability of urban land, P(urban), is therefore estimated as:

$$P(\mathrm{urban}) = 1 - P(\mathrm{shrubland}) \tag{1}$$

where P(shrubland) is the probability of shrubland/savannah from the decision-tree classification output run without urban training data. Bayes' Rule is then applied at every pixel in the region to combine the a priori information with the conditional probabilities from the decision-tree output. The resulting posterior probabilities are then visually assessed against high-resolution data (e.g. aerial photos, Google Earth), and a suitable threshold chosen to create the final map of urban extent (mean threshold, 40%).

The final step in our methodology thus includes post-processing on a region-by-region basis. In addition to estimation of a posteriori probabilities for difficult-to-classify regions, regional post-processing includes application of the MODIS 500-m water mask, use of a spatial filter to remove single, stand-alone urban pixels, and hand-editing.

4.5. Accuracy assessment

To assess accuracy at a global scale, we use a sample of 140 Landsat-based maps (30-m resolution) of urban areas and metropolitan regions (Angel et al., 2005; Schneider & Woodcock, 2008). Fig. 1 shows the global distribution of these cities, which were chosen using a random-stratified sampling design based on population size, geographic region and income (for full details, see Potere et al., 2009). These 140 cities (Table 3) range in size from 20 to 8000 km² (100,000–15 million inhabitants), and are independent of the training exemplars used during classification of the MODIS 500-m map. To define the extent of each case study city, the urban core is typically buffered by 30 km to include peri-urban areas. In cases where two cities converge (e.g. Washington D.C. and Baltimore, Maryland), political boundaries are used to delineate the study area for each. Detailed views of four sample cities are shown in Fig. 5.

To ensure that these data provide a statistically defensible basis for characterizing the quality of global maps, 10,000 random samples were drawn from very high-resolution (4 m) imagery in Google Earth to complete an independent assessment of the Landsat-based reference maps. Each site was labeled as urban or non-urban by multiple analysts using a double-blind procedure to reduce uncertainty and bias during the labeling procedure. The pooled confusion matrix results showed that the maps for individual cities range in accuracy from 82.8 to 91.0% correctly classified, and are thus considered suitable for use in validating the new MODIS 500-m map.
To provide context for assessment of the MODIS 500-m urban map and to demonstrate the differences across global urban maps, the analysis was also completed for five currently available global urban maps, including the MODIS C4 1-km map, GLC2000, GlobCover, Global Impervious Surface Area dataset (IMPSA), and GRUMP. These additional datasets were chosen because their definitions of ‘urban areas’ align closely with the definition used in our approach.


Fig. 4. An illustration of the post-processing technique used in select urban ecoregions to adjust the urban probabilities derived from the decision-tree output (a) using the probabilities of other land cover classes (b). The posterior probabilities in (c) are then thresholded to produce the final map, shown in (d).
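The post-processing illustrated in Fig. 4 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the Bayes update is normalized over two classes only (urban versus everything else), the spatial filter is assumed to use an 8-pixel neighbourhood, and the 0.40 default is the mean threshold reported in the text rather than a region-specific value. Hand-editing is not represented.

```python
def posterior_urban(p_tree_urban, p_shrubland):
    """Two-class Bayes update for one pixel.

    The prior follows Eq. (1): P(urban) = 1 - P(shrubland).
    Treating the decision-tree urban membership as the conditional term
    and normalizing over urban/non-urban only is an assumption.
    """
    prior_urban = 1.0 - p_shrubland
    prior_other = p_shrubland
    numerator = p_tree_urban * prior_urban
    denominator = numerator + (1.0 - p_tree_urban) * prior_other
    return numerator / denominator if denominator > 0 else 0.0


def remove_isolated(mask):
    """Drop urban pixels with no urban neighbour (8-neighbourhood assumed)."""
    rows, cols = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            has_neighbour = any(
                mask[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            )
            if not has_neighbour:
                out[r][c] = False
    return out


def postprocess(p_tree_urban, p_shrubland, water, threshold=0.40):
    """Posterior -> threshold -> water mask -> isolated-pixel filter."""
    rows, cols = len(p_tree_urban), len(p_tree_urban[0])
    mask = [
        [
            (not water[r][c])
            and posterior_urban(p_tree_urban[r][c], p_shrubland[r][c]) >= threshold
            for c in range(cols)
        ]
        for r in range(rows)
    ]
    return remove_isolated(mask)


# Hypothetical 3 x 3 probability surfaces, for illustration only.
p_tree = [[0.6, 0.6, 0.1], [0.9, 0.2, 0.1], [0.1, 0.1, 0.7]]
p_shrub = [[0.8, 0.1, 0.5], [0.1, 0.5, 0.5], [0.5, 0.5, 0.1]]
no_water = [[False] * 3 for _ in range(3)]
urban_mask = postprocess(p_tree, p_shrub, no_water)
```

In this toy example the pixel with tree probability 0.6 and shrubland probability 0.8 falls below the threshold after the update, while the same tree probability paired with a shrubland probability of 0.1 is retained, which is the behaviour the prior is meant to produce.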

5. Results

5.1. Local and regional views of urban extent

We begin our assessment at the scale of cities and regions. Representative results from the MODIS 500-m map are shown in Fig. 5 for four cities: two are located in more developed countries (Washington D.C., USA; London, U.K.), and two are located in less developed regions (Johannesburg, South Africa; Guangzhou, China). When compared against the Landsat-based reference maps (top row), the MODIS 500-m map (second row) provides a more detailed, articulated representation of built-up areas for each sample city than any of the other global urban maps. In Guangzhou, especially, the new map delineates transportation corridors extending outward from the city as well as nearby small town development; these features are portrayed in the other global maps as continuous urban fabric. Similarly, vegetated areas along the fringe of Johannesburg's urban core appear well-defined in the new map when compared to the earlier C4 1-km map results (third row). The effect of the Nighttime Lights data used in the MODIS 1-km map, IMPSA and GRUMP maps is also noticeable when these maps are compared against the new MODIS 500-m map. While IMPSA represents “percent impervious surface”, the city views reveal difficulties that might occur when a thresholded, urban/non-urban version of the map is needed. Based on the results for four cities, the choice of thresholds appears to be region- and context-dependent. When a continuous-value map of urban extent is needed (e.g. within-pixel percent of urban cover), one alternative is to aggregate the MODIS 500-m map to 1 km (or coarser) resolution to create percent cover (examples are shown in the left-most column, Fig. 5).

Turning to the regional view, Fig. 6a shows the new MODIS 500-m map for the Eastern U.S. and Canada. Although it is difficult to assess map quality at this scale, the sizes and shapes of the cities are in good agreement with the expected urban morphology of the region. Fig. 6b shows the secondary land cover class label derived from the boosted decision-tree output. The secondary label (included as part of the release of the C5 MODIS 500-m map) provides an approximation of the second most-likely land cover type within a pixel. This information can be used to determine the dominant class that is mixed with urban land at a sub-pixel level.

5.2. Descriptive statistics from the new global urban map

Table 2 provides a selection of statistics based on the global urban ecoregion stratification, including the total land area and the amount of urban land within each stratum. While temperate ecoregions comprise only 21% of the Earth's land surface (regions 1–5, Table 2), 53% (355,486 km2) of the urban land around the world falls within


Table 3 The geopolitical distribution and population of the 140 cities included in the validation sample (Angel et al., 2005; Schneider & Woodcock, 2008). For detailed information on the sample design, see Potere et al. (2009).

North America
No. City              Country         Population
1   Los Angeles       United States   16,373,645
2   Chicago           United States   9,157,540
3   Philadelphia      United States   6,188,463
4   Houston           United States   4,669,571
5   Washington DC     United States   3,727,565
6   Montreal          Canada          3,678,000
7   Phoenix           United States   3,251,876
8   Minneapolis       United States   2,968,806
9   Baltimore         United States   2,552,994
10  Pittsburgh        United States   2,358,695
11  Cincinnati        United States   1,979,202
12  Sacramento        United States   1,796,857
13  Calgary           Canada          1,079,310
14  Tacoma            United States   700,820
15  Springfield       United States   591,932
16  Modesto           United States   446,997
17  St. Catherine's   Canada          389,600
18  Victoria          Canada          317,506

Latin and South America
No. City              Country         Population
1   Mexico City       Mexico          18,100,000
2   São Paulo         Brazil          17,800,000
3   Buenos Aires      Argentina       12,600,000
4   Santiago          Chile           5,538,000
5   Belo Horizonte    Brazil          4,160,000
6   Guadalajara       Mexico          3,908,000
7   Monterrey         Mexico          3,400,000
8   Curitiba          Brazil          3,261,168
9   Guatemala City    Guatemala       3,242,000
10  Caracas           Venezuela       3,153,000
11  Brasilia          Brazil          1,800,000
12  San Salvador      El Salvador     1,408,000
13  Montevideo        Uruguay         1,236,000
14  Tijuana           Mexico          1,167,000
15  Kingston          Jamaica         912,500
16  Ribeirão Preto    Brazil          502,333
17  Valledupar        Colombia        274,300
18  Guarujá           Brazil          269,104
19  Ilhéus            Brazil          161,898
20  Jequié            Brazil          130,207

Europe
No. City              Country         Population
1   Paris             France          9,624,000
2   Moscow            Russia          9,321,000
3   London            United Kingdom  8,219,226
4   Milan             Italy           4,251,000
5   Madrid            Spain           4,072,000
6   Warsaw            Poland          2,269,000
7   Vienna            Austria         2,070,000
8   Budapest          Hungary         1,825,000
9   Prague            Czech Republic  1,215,000
10  Thessaloniki      Greece          789,000
11  Palermo           Italy           684,300
12  Sheffield         United Kingdom  640,048
13  Astrakhan         Russia          486,100
14  Leipzig           Germany         446,491
15  Le Mans           France          194,825
16  Castellon         Spain           144,500
17  Oktyabrsky        Russia          111,500

Sub-Saharan Africa
No. City              Country         Population
1   Nairobi           Kenya           3,138,295
2   Addis Ababa       Ethiopia        2,639,000
3   Johannesburg      South Africa    2,335,000
4   Accra             Ghana           1,976,000
5   Harare            Zimbabwe        1,752,000
6   Ibadan            Nigeria         1,731,000
7   Pretoria          South Africa    1,508,000
8   Kampala           Uganda          1,212,000
9   Bamako            Mali            1,131,000
10  Ouagadougou       Burkina Faso    1,130,000
11  Ndola             Zambia          568,600
12  Banjul            Gambia          399,386
13  Kigali            Rwanda          351,400

Western Asia and Northern Africa
No. City              Country         Population
1   Cairo             Egypt           10,600,000
2   Istanbul          Turkey          9,451,000
3   Alexandria        Egypt           4,113,000
4   Casablanca        Morocco         3,541,000
5   Ankara            Turkey          3,190,000
6   Algiers           Algeria         2,760,740
7   Tel Aviv-Jaffa    Israel          2,181,000
8   Baku              Azerbaijan      1,936,000
9   Sana'a            Yemen           1,653,300
10  Yerevan           Armenia         1,406,765
11  Kuwait City       Kuwait          1,190,000
12  Marrakech         Morocco         736,500
13  Malatya           Turkey          437,000
14  Port Sudan        Sudan           384,100
15  Aswan             Egypt           219,017
16  Tébessa           Algeria         163,279
17  Zugdidi           Georgia         104,947

South and Central Asia
No. City              Country         Population
1   Mumbai            India           18,100,000
2   Kolkata           India           12,900,000
3   Dhaka             Bangladesh      12,300,000
4   Teheran           Iran            7,225,000
5   Hyderabad         India           6,842,000
6   Bangalore         India           6,787,000
7   Ahmedabad         India           5,375,000
8   Pune              India           3,489,000
9   Lakhnau           India           2,685,000
10  Kanpur            India           2,450,000
11  Jaipur            India           2,145,000
12  Coimbatore        India           1,292,000
13  Vijayawada        India           1,237,000
14  Rajshahi          Bangladesh      1,016,000
15  Ahvaz             Iran            997,000
16  Shimkent          Kazakhstan      360,100
17  Jalna             India           244,523
18  Gorgan            Iran            188,710
19  Saidpur           Bangladesh      114,000

East Asia
No. City              Country         Population
1   Tokyo             Japan           26,400,000
2   Shanghai          China           12,900,000
3   Beijing           China           10,800,000
4   Seoul             Korea           9,887,779
5   Wuhan             China           7,243,000
6   Hongkong          China           6,927,000
7   Chengdu           China           5,293,000
8   Guangzhou         China           5,162,000
9   Dongguan          China           4,528,000
10  Pusan             Korea           3,830,000
11  Zhengzhou         China           2,070,000
12  Yulin             China           1,558,000
13  Yiyang            China           1,343,000
14  Fukuoka           Japan           1,341,470
15  Leshan            China           1,137,000
16  Ulan Bator        Mongolia        738,000
17  Changzhi          China           593,500
18  Anqing            China           566,100
19  Ansan             Korea           549,900
20  Akashi            Japan           293,117
21  Chinju            Korea           287,100
22  Chonan            Korea           114,600

Southeast Asia, Pacific, and Australia
No. City              Country         Population
1   Manila            Philippines     10,900,000
2   Bangkok           Thailand        7,281,000
3   Ho Chi Minh City  Vietnam         4,615,000
4   Hanoi             Vietnam         3,678,000
5   Sydney            Australia       3,664,000
6   Singapore         Singapore       3,567,000
7   Bandung           Indonesia       3,409,000
8   Medan             Indonesia       1,879,000
9   Palembang         Indonesia       1,422,000
10  Kuala Lumpur      Malaysia        1,378,000
11  Cebu              Philippines     718,821
12  Ipoh              Malaysia        566,211
13  Bacolod           Philippines     429,076
14  Songkhla          Thailand        342,475

Fig. 5. A comparison of the global urban maps for four metropolitan areas (top to bottom): Washington D.C.-Baltimore, U.S.A.; London, United Kingdom; Johannesburg, South Africa; and Guangzhou, China. The maps include (left to right) a regional view of the new MODIS 500-m map of urban extent aggregated to 8-km resolution, and local views of a Landsat-based classification (30-m resolution), the new map of urban extent from MODIS 500-m data, the MODIS 1-km map of urban extent, the urban class from Global Land Cover 2000, the urban class from GlobCover, the “urban extents” from the Global Rural–Urban Mapping Project, and NOAA's Impervious Surface Area (IMPSA) map (for references, see Table 1).


Fig. 6. A regional view of the East Coast of the U.S. for (a) the new map of global urban extent shown with the newly released Collection 5 MODIS 500-m Land Cover map (Friedl et al., 2009); and (b) the secondary land cover class label within the urban extent of the MODIS 500-m map.

these zones. This result is not surprising given the large number of highly populated areas with low urban population density in these regions relative to other parts of the world (UN, 2008), and the expansive nature of cities in post-industrial countries such as the U.S. and Canada. A similarly high ratio of urban land to total land area is seen in the Mediterranean region, where nearly 7% of the world's total urban land is located in just under 1.8% of the global landscape. Finally, the densely populated regions of India and China (tropical, subtropical mixed forests of South, Southeast Asia) contain just 6% of the world's land area, yet comprise over 12% of the total global urbanized land area.

Although the urban ecoregion divisions are informative, it is also useful to compare the amounts of urban land across the global urban maps for different geopolitical regions. Estimates of urban land from coarse resolution (1–2 km) global maps have been reported at 2–3% of total land area (see Potere et al., 2009; Schneider et al., 2009). Our results indicate that these estimates may in fact be inflated; the amount of urban land in the MODIS 500-m map varies from only 0.13% of total continental land area in Australia to 0.97% in East Asia, with most regions near the continental average of 0.5% urbanized (e.g. South America, 0.47%; Southeast Asia, 0.63%). The exception is Western Europe (2.11%), a result that is to be expected given the extensive nature of urban areas in this region.

5.3. Assessment of city size

We compared estimates of urban size derived from analysis of the 140 Landsat-based maps to those obtained from the MODIS 500-m map and five global urban datasets (Fig. 7, Table 4). To do this, we used the native resolution of the reference maps (30-m resolution, x-axis) and the native resolutions for each global map (0.3–1.1-km2) to estimate the global urban map size (y-axis) (note that IMPSA was thresholded at 20% based on previous testing, Potere et al., 2009).
We assume that the maps with the highest accuracies are those with the best fit to the reference sample (i.e. low root mean square error, or RMSE, and high R2). The results in Fig. 7 show that the MODIS 500-m map most closely approximates city sizes across the globe. Except for a few outliers, nearly all sample cities have sizes close to those depicted in the high-resolution reference data. In addition to the tight fit about the 1:1 line, the MODIS 500-m data have the lowest RMSE, 142.6 km2, compared to the other sources. In contrast, several of the global urban maps systematically overestimate city sizes. GRUMP is the extreme case, with nearly all of its data points above the 1:1 line and the highest RMSE (726.0 km2).

Next we examined the plots for patterns with respect to temperate regions (grey points) and tropical regions (black diamonds). The RMS errors for temperate cities are lower than those for tropical regions across all global urban datasets except the MODIS 500-m map. MODIS 1-km, for example, has the greatest contrast in RMSE values: 443.1 km2 for temperate and 710.5 km2 for tropical cities. The MODIS 500-m map performs well in cities from both regions (RMSEtemp = 218.5 km2, RMSEtrop = 234.7 km2), underscoring the improvements in the new dataset. Moreover, its RMSE for tropical zones is significantly lower than that of the other global maps, which range from 405.7 to 775.5 km2. Because the temperate/tropical designation is directly related to geographic region and income level (e.g. tropical regions are dominated by developing countries, while temperate regions include mostly industrialized countries), this result shows that the MODIS 500-m urban map offers greater accuracy in developing countries where the global urban maps differ from one another significantly (Potere & Schneider, 2007). The long history of mapping cities in North America and Europe and familiarity with these regions have allowed some degree of consensus on how these areas should be depicted (also apparent in Fig. 5). Cities in developing countries (India, China), however, remain problematic for many of the global urban maps because of the small scale of features and increased mixing of urban and vegetation land use types.
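The two fit statistics used in this section, RMSE about the 1:1 line and the slope, intercept and adjusted R2 of a univariate regression, can be computed as in the sketch below. It is a generic ordinary least squares illustration, not the authors' code; the function names and the example city sizes are hypothetical, and the regression takes the Landsat-based size as the independent variable and the global-map size as the dependent variable, as described for Table 4.

```python
import math


def rmse_about_one_to_one(reference, mapped):
    """Root mean square difference between mapped and reference city sizes."""
    n = len(reference)
    return math.sqrt(sum((m - r) ** 2 for r, m in zip(reference, mapped)) / n)


def linear_fit(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns slope, intercept, R2 and adjusted R2 (one predictor).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - 2)
    return slope, intercept, r2, adj_r2


# Hypothetical city sizes in km2 (not values from the validation sample).
landsat_km2 = [120.0, 300.0, 650.0, 1100.0, 1800.0]
global_map_km2 = [150.0, 390.0, 700.0, 1300.0, 2100.0]
rmse = rmse_about_one_to_one(landsat_km2, global_map_km2)
slope, intercept, r2, adj_r2 = linear_fit(landsat_km2, global_map_km2)
```

A map that overestimates every city by a constant amount would show a slope near 1 with a positive intercept and a nonzero RMSE, whereas a slope well below 1 with a high R2 indicates a proportional bias that a correction factor could remove.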
Table 4 provides intercept, slope and adjusted R2 values for regressions of city extents for the global maps (dependent variable) versus those for the Landsat-based maps (independent variable). Because the residuals from these regressions are much higher for cities at the extremes of the city size-distribution, we excluded all cities with urban extents less than 100 km2 or more than 2000 km2 in the Landsat-based maps (note that sample sizes were still greater than 100 for all maps). As demonstrated in Fig. 7, the MODIS 500-m


Fig. 7. Scatter plots of the 140 cities in the validation sample, where each plot shows city sizes from the high-resolution Landsat-based maps (x-axis) against one of the global urban maps (y-axis). The results are color-coded to show the differences for cities in temperate ecoregions (grey points) and tropical ecoregions (black diamonds). Note the log scale on both axes. Adapted from Schneider et al. (2009).

map has the least unexplained variance in its estimates of city size (R2 = 0.90), while GLC2000 (R2 = 0.35) and GlobCover (R2 = 0.27) have the most unexplained variance in city size. All of the slopes in Table 4 are highly significant (p < 0.01) and the intercepts are significant for all of the maps except GRUMP (p = 0.66).

5.4. Per-pixel analysis of map agreement

For the next phase of analysis, each of the 140 Landsat-based reference maps was compared to its counterpart in the MODIS 500-m map and in the five global urban maps on a pixel-by-pixel basis. First, the coarse resolution grid (0.3–1 km2) from each global urban map was overlaid on each sample city, and the mean percent coverage of urban land was assigned to each grid cell. Second, a series of binary urban/non-urban contingency matrices were constructed to calculate the level of agreement between the reference data and the global map of interest.
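The two steps of the per-pixel comparison can be sketched as follows: block-average a fine binary reference grid to the coarse cell size, convert it to urban/non-urban, and summarize the resulting contingency matrix with overall accuracy and Cohen's Kappa (Cohen, 1960). This is a simplified illustration with hypothetical function names. It assumes the coarse cells align exactly with whole blocks of reference pixels and that a reference cell counts as urban when more than 50% of it is built up; the text does not state which cutoff or cutoffs were used for the contingency matrices.

```python
def aggregate_fraction_urban(fine, factor):
    """Mean urban fraction of each factor x factor block of a 0/1 grid."""
    rows, cols = len(fine) // factor, len(fine[0]) // factor
    coarse = []
    for r in range(rows):
        row = []
        for c in range(cols):
            block = [
                fine[r * factor + i][c * factor + j]
                for i in range(factor)
                for j in range(factor)
            ]
            row.append(sum(block) / len(block))
        coarse.append(row)
    return coarse


def contingency(reference_fraction, global_map, cutoff=0.5):
    """Counts (tp, fp, fn, tn) with 'urban' as the positive class.

    cutoff=0.5 (majority urban) is an assumption made for this sketch.
    """
    tp = fp = fn = tn = 0
    for ref_row, map_row in zip(reference_fraction, global_map):
        for fraction, mapped_urban in zip(ref_row, map_row):
            ref_urban = fraction > cutoff
            if mapped_urban and ref_urban:
                tp += 1
            elif mapped_urban:
                fp += 1
            elif ref_urban:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def accuracy_and_kappa(tp, fp, fn, tn):
    """Overall accuracy and Cohen's Kappa for a 2 x 2 contingency matrix."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Agreement expected by chance from the row and column totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (observed - expected) / (1.0 - expected) if expected < 1.0 else 0.0
    return observed, kappa


# Hypothetical counts: urban is rare, so accuracy is high but Kappa is lower.
accuracy, kappa = accuracy_and_kappa(tp=40, fp=10, fn=20, tn=130)
```

With these made-up counts the overall accuracy is 0.85 while Kappa is 0.625, which shows how a dominant non-urban class can inflate overall accuracy relative to Kappa.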

We summarize the accuracy statistics findings by presenting the global and regional distributions of Cohen's Kappa (k, Cohen, 1960) for each global map (Fig. 8). Although overall accuracy is often used as a standard indicator of map quality, many contend that Kappa provides a better overall measure because it also incorporates information on the errors of omission and commission (Allouche et al., 2006; Foody, 2007). At the global scale (Fig. 8a), the box plots for each urban map reinforce previous results. The MODIS 500-m map has the highest mean and median Kappa (k = 0.65), compared to those for IMPSA, GlobCover, GLC2000, and MODIS 1-km (mean k = 0.45–0.6, median k = 0.45–0.6). GRUMP stands apart from the others with a low mean Kappa of 0.2 due to its high commission error for the urban class. The regional trends in Fig. 8b–f also echo the city size results: all maps perform well in temperate North America and temperate Europe (Fig. 8b, c), with the exception of GLC2000 and GlobCover, which both have low values for North American cities (mean k = 0.45, 0.37 respectively), and GRUMP, which has consistently low values across all regions. MODIS 500-m leads

Table 4 City size correction factors for estimating reference city size using a linear univariate regression based on the extents from each global urban map.

Map          Sample size  City size range for global map (km2) a  Slope b  Intercept (km2) b  Standard error of slope  Standard error of intercept (km2)  Adjusted R2
GLC2000      101          (6–2420)                                0.60     258.6              0.080                    5.6                                0.36
GlobCover    105          (5–2428)                                0.61     278.2              0.090                    6.0                                0.30
IMPSA        112          (23–1441)                               0.99     122.8              0.076                    3.4                                0.60
MODIS 500-m  112          (13–2137)                               0.80     42.0               0.025                    2.3                                0.90
MODIS 1-km   112          (9–3971)                                0.46     158.4              0.031                    5.1                                0.67
GRUMP        110          (97–5127)                               0.27     −14.8              0.015                    −0.4                               0.75

a Only cities with a non-zero extent and area between 100 and 2000 km2 in the Landsat-based map are included.
b All slopes are highly significant (p < 0.01) and all intercepts are significant (p < 0.03) except for GRUMP (p = 0.66).


Fig. 8. Box plots showing the distribution of Kappa values for the 140 validation cities for (a) the global set of 140 validation cities, and (b–f) for regional subsets of validation cities in five urban ecoregions. The six global urban maps shown along the x-axis for each region are (left to right, in order of increasing global urban extent): Global Land Cover 2000 (GLC2000), the recently released GlobCover map, NOAA's Impervious Surface Area map (IMPSA, thresholded at 20%), the MODIS-based maps of urban extent at 500-m and 1-km spatial resolution (MODIS 500 m, MODIS 1 km), and CIESIN's Global Rural–Urban Mapping Project (GRUMP).

the group in North America but IMPSA fares slightly better than MODIS 500-m in European cities (both have median k = 0.7, but IMPSA's mean and distribution are higher). In tropical and semi-arid regions (Fig. 8d, e, f), the Kappa values decline for all global maps, and greater variability is evident. The Kappa values for the MODIS 500-m map remain high in all regions.

5.5. Urban land density within MODIS pixels

In this section, we evaluate how areas with different densities of urban and built-up land are classified in the new MODIS 500-m urban map. Essentially, we ask the question, how much urban land is depicted in a MODIS global urban pixel? We again rely on the aggregated high-resolution city maps, and use simple histograms to plot the frequency distribution of pixels (binned in 10% increments along the x-axis). Fig. 9 shows the distribution of pixels for: (1) the validation data (grey bars), aggregated from 30 m to 500 m, and (2) the corresponding sub-pixel percentages of pixels mapped as ‘urban’ in the MODIS 500-m global map (black bars). Individual histograms are shown for twelve of the 140 cities, grouped into temperate forest ecoregions (top row), temperate grassland-savannah ecoregions (middle row), and tropical-subtropical forest ecoregions (bottom row). The results for the 12 cities illustrate several important trends representative of the larger sample. First, the distribution of the validation data when aggregated is roughly bimodal; nearly all cities have large numbers of pixels at the extremes of 0–10% and 90–100%. Only a few cities are the exception to this rule (e.g. Kampala, Calcutta, and Guangzhou), but visual inspection suggests that the large number of low-density pixels in these areas is tied to the dense network of small villages located outside the core city.
These settlements are difficult to characterize because the small scale and fragmented nature of these areas mean that they do not qualify as ‘urban’ under our definition in Section 3. Second, the MODIS urban pixels have a skewed distribution toward high values of percent urban cover. Because our methodology is designed to capture ‘majority urban’ pixels (>50% built-up land), we expect the map to do well in areas where the density of urban land is high (e.g. 80–100%), and indeed, the MODIS 500-m map does a good job mapping these pixels regardless of city size, location, or surrounding land cover type. Similarly, we expect the MODIS 500-m map's ability to map urban land to decrease as the density of urban land decreases. This gradual decline is clearly evident in Chicago, Paris, Fukuoka and in other cities in temperate forested ecoregions, but less so in other areas. This indicates that the urban signal may be easier to discern in temperate regions, or may simply suggest that we have good training data coverage for mid-density urban areas of North America and the European Union, allowing the classifier to perform well in these regions. Cities in temperate grasslands and savannas (middle row, Fig. 9), on the other hand, display a greater percentage of low-density pixels labeled ‘urban’ in the new MODIS 500-m map. Because these areas have proven difficult to classify with coarse resolution data (the probabilities were adjusted to correct for over-estimation of urban land per Section 4.4), error pixels such as these are not unexpected. Overall, the results in Fig. 9 suggest that, even in regions where it is difficult to map urban land, there is a great deal of consistency in how the MODIS 500-m map depicts dense urban areas (70–100% urban land).

6. Discussion and conclusions

This paper describes a new map of circa 2001–2002 global urban extent derived from C5 MODIS 500-m data. Despite limitations related to cloud cover and missing data within urban cores, our methods were successful in depicting cities, towns and settlements of multiple sizes and scales. The improved quality of the C5 MODIS data, combined with the new, region-based mapping approach, translates into several significant improvements in the new map. Chief among these is the increased level of accuracy provided by the new map when compared against the C4 MODIS urban map and other available global urban datasets.
Using a suite of accuracy measures, our results reveal that (1) the urban areas depicted in the MODIS 500-m map have a high level of agreement (R2 = 0.90) with Landsat-based maps of 140 cities (previously verified by Google Earth 4-m imagery), and (2) the MODIS 500-m map has a high per-pixel agreement (mean k = 0.65) when compared against the coarsened reference maps at global and regional


Fig. 9. Bar plots showing the frequency distribution of urban land densities for the validation city maps when aggregated from 30 m to 500 m spatial resolution (grey bars), and the densities that correspond to pixels mapped as ‘urban and built-up’ in the new MODIS 500-m map (black bars). Representative results for 12 cities are shown above, grouped into temperate forested ecoregions (urban ecoregions 1, 2, and 3), temperate grassland/savannah ecoregions (urban ecoregions 4, 5, and 12), and tropical–subtropical forested ecoregions (urban ecoregions 8 and 10). The numbers on the horizontal axes refer to the maximum value of each bin.

scales. The increased accuracy is due, in part, to the four-fold increase in spatial resolution offered by the MODIS 500-m map; the greater level of detail provides better representation of fringe areas and small settlements than was possible at 1–2 km resolution. Each of these improvements is important to studies of how urban areas are contributing to global environmental change processes. The new C5 MODIS 500-m map is a departure from our earlier efforts using C4 MODIS data, since this new map relies solely on multispectral daytime satellite observations. No ancillary datasets (themselves subject to error) were needed to constrain or enhance the classification. These results illustrate the strength of the new methodology: the decision-tree classifier provides a repeatable, semiautomated approach, while region-specific processing offers increased flexibility. Avoiding the use of ancillary population or nighttime lights datasets also means that we can avoid issues that crop up with data fusion. Additional advances in global mapping are offered by the set of urban ecoregions defined in this study. To characterize and model cities appropriately and effectively, earth system scientists need to take advantage of the more than 100 years of urban theory that describes key differences in the urban structure, layout, building sizes, vegetation types and phenology. The urban ecoregion typology offers one way to differentiate important parameters of the urban environment (e.g. surface roughness, albedo, sensible and latent heat flux, etc.), which provides improvement over parameter assignments based on a single global value. Moreover, the use of urban ecoregions in this research showed that the stratification can provide a framework for processing a global dataset quickly and efficiently. The results presented in this paper provide a depiction of urban versus non-urban areas, as well as information on the second most-

likely land cover class, which in turn can be used to understand vegetation types and properties in urbanized regions. While these maps are useful for showing the extent of the built environment, it is clear from recent research efforts that an extended database — including finer resolution data or sub-pixel information — is needed. With this in mind, our ongoing efforts are focused on: (1) creating updated maps of urban extent circa 2005–2006; (2) creating global maps that provide sub-pixel estimates of urban land use and vegetation; and (3) providing a more refined suite of land surface characteristics for urban areas by differentiating core downtown areas from low-density residential areas. In addition, we hope to build on these results to create a globally consistent, validated map of “hot spots” of land cover change in rapidly developing metropolitan areas. In summary, human activities appropriate a large proportion of the Earth's land surface for uses ranging from agriculture and pasture to managed forests and grasslands, to recreational parks and spaces. While urbanization may be the most intense of these uses, it is the least expansive of all land use types. Recently, there has been much attention in the media regarding rapid urban expansion and estimates on the current total extent of urban land are widely disputed (1–3%). The dataset presented in this paper provides a more refined picture of global urban land areas that can inform both modeling and policy activities. Acknowledgments The authors wish to thank Solly Angel and Dan Civco for generous use of their datasets, Scott Macomber and Damien Sulla-Menashe for technical support, and Mutlu Ozdogan for comments on an earlier draft of this paper. This work was supported by NASA grant NNX08AE61A.

