Ecological Informatics 9 (2012) 11–18
Contents lists available at SciVerse ScienceDirect
Ecological Informatics journal homepage: www.elsevier.com/locate/ecolinf
Regionalization of forest pattern metrics for the continental United States using contiguity constrained clustering and partitioning
John A. Kupfer ⁎, Peng Gao, Diansheng Guo
Department of Geography, University of South Carolina, Columbia, SC 29208, USA
Article info
Article history: Received 11 July 2011; Received in revised form 9 February 2012; Accepted 9 February 2012; Available online 16 February 2012
Keywords: Regionalization; Environmental cluster analysis; Land classification; Landscape pattern metric; Forest fragmentation; Regional conservation planning
Abstract
Technological advances have created new opportunities for defining and mapping ecological and biogeographical regions on the basis of quantitative criteria while generating a need for studies that evaluate the sensitivity of ecoregionalizations to clustering methods and approaches. In this study, we used a novel regionalization algorithm, regionalization with dynamically constrained agglomerative clustering and partitioning (REDCAP), to identify hierarchical regions based on measures of forest extent, connectivity, and change for 2109 watersheds in the continental U.S. Unlike regionalizations developed using non-spatial clustering techniques, REDCAP directly incorporates a spatial contiguity constraint into a traditional hierarchical clustering method, resulting in contiguous regions that optimize a homogeneity measure. Results of our analyses identified nine- and eighteen-class Forest Pattern Regions that reflected the influence of natural and anthropogenic factors structuring forest extent and fragmentation. Because these regions are defined by the forest pattern metrics themselves, rather than pre-defined political or ecological units, they provide a valuable means for visualizing forest pattern information and quantifying forest patterns across a large, diverse geographic area. In contrast, regionalizations of the same data using two non-spatial methods (k-means clustering and non-spatial average linkage clustering) resulted in more homogeneous classes composed of many discontiguous units. While it should not be viewed as a replacement for non-spatial clustering techniques, REDCAP provides an alternative approach to developing ecological regionalizations by placing greater emphasis on maintaining the spatial contiguity of units, a property that may be desirable in many broad-scale regionalizations because it reduces data complexity and facilitates the visualization and interpretation of ecological or biogeographic data. © 2012 Elsevier B.V. All rights reserved.
⁎ Corresponding author at: Department of Geography, 709 Bull Street, Columbia, SC 29208, USA. Tel.: +1 803 777 6739; fax: +1 803 777 4972.
E-mail addresses: [email protected] (J.A. Kupfer), [email protected] (P. Gao), [email protected] (D. Guo).
1574-9541/$ – see front matter © 2012 Elsevier B.V. All rights reserved. doi:10.1016/j.ecoinf.2012.02.001

1. Introduction
Regionalization is a foundation of geographic data analysis (Haggett et al., 1977) that has been applied in fields as diverse as climatology (Fovell and Fovell, 1993), demography (Openshaw and Rao, 1995), political science (George et al., 1997), hydrology (Peterson et al., 2011) and agricultural science (Lark, 1998), among many others. Ecological land classification, the process of delineating and classifying ecologically distinctive areas that are homogeneous with respect to environmental conditions or biotic communities, provides perhaps the best example of the use of regionalization concepts and methods in biogeography and ecology. The results of such classifications are often hierarchical and define units of land at spatial scales ranging from fine-scale environmental domain classifications (e.g., ecological land types) to broad-scale features such as ecoregions (e.g., Cleland et al., 1997). Because regionalizations provide a
perspective that is relevant to many basic and applied questions in biogeography, ecology, evolution and conservation (Kreft and Jetz, 2010), the development, testing and application of appropriate methods is an area of growing interest to many scientists and environmental managers (e.g., Heikinheimo et al., 2007; Procheş, 2005). The process of delineating ecological regions involves important considerations and tradeoffs regarding homogeneity of the regions with respect to any implicit or explicit classification criteria, and spatial contiguity of the resulting units. At the broad spatial extents of ecoregions, nations or continents, a commonly accepted and clearly articulated theoretical basis for ecoregionalization is often lacking (McMahon et al., 2004), and ecoregions are typically delineated by experts utilizing a mix of subjective and objective criteria and environmental proxies (e.g., Bailey, 1995; Omernik, 1987; Wiken, 1986). Spatial contiguity of the resulting regions is often desired but rarely handled quantitatively. Ecological regionalizations produced using more strictly quantitative methods (e.g., data mining techniques), on the other hand, provide a more repeatable, transparent, and defensible alternative (Coops et al., 2009). In this study, we used a novel regionalization algorithm, regionalization with dynamically constrained agglomerative clustering and
partitioning (REDCAP), to identify hierarchical regions defined by metrics of forest extent, connectivity, and change. Forest loss, degradation and fragmentation collectively pose the greatest current threat to global biodiversity (Kupfer and Franklin, 2009), and a number of regional, national and international initiatives related to the sustainable use of forests and protection of biodiversity have prioritized the measurement and monitoring of deforestation and forest fragmentation at broad spatial scales (Kupfer, 2006). In the majority of such assessments, measures of forest area, pattern or fragmentation are simply mapped, with data on the extent or connectivity of forest cover provided for administrative or political units (Petit et al., 2001; Svancara et al., 2009) or summed by pre-defined units such as ecoregions or ecozones (Wulder et al., 2008). Such units may not provide the optimal basis to analyze or assess underlying patterns and drivers of landscape change. Regionalization provides an alternative approach for identifying regions defined instead by forest pattern characteristics themselves. By reducing data complexity, regionalization can assist in the visualization and interpretation of forest pattern information and thereby serve as an effective means for quantifying and monitoring forest patterns, particularly across large, diverse geographic areas (Long et al., 2010).

We applied REDCAP to land cover data for the continental United States to cluster more than 2100 watersheds into Forest Pattern Regions on the basis of five landscape metrics. In this paper, we present methods for visualizing and interpreting the spatial distribution of forest pattern as depicted by the regions and highlight the utility of this approach by comparing it to regionalizations created using non-spatial clustering methods.

2. Materials and methods
2.1. Data
The National Land Cover Dataset (NLCD) is the primary source of seamless land-cover data in the United States.
The NLCD 2001 (2001 being the nominal year from which most of the Landsat imagery was acquired) mapped 16 land-cover classes across the conterminous U.S. at a 30 m cell size with a 0.4 ha minimum mapping unit (Homer et al., 2007). We used a similar approach to that employed by Wade et al. (2003) by reclassifying the land cover data into five categories: 1) forest, 2) non-forested natural and semi-natural land covers, 3) developed lands, 4) agricultural lands, and 5) nodata (Table 1). Measures of landscape change were made using the NLCD 1992/2001 Retrofit Land Cover Change Product, which was developed to map land cover changes between the 1992 and 2001 NLCD classifications (Fry et al., 2009). Based on a probability sample of 15,000 pixels from across the contiguous U.S., Wickham et al. (2010) cited an overall accuracy of 85.3% for NLCD 2001 Anderson Level I land cover classes, with forest and
Table 1 Scheme for reclassifying NLCD 2001 data into five general cover types.

General cover type                  NLCD 2001 code   NLCD 2001 class
Forested                            41               deciduous forest
                                    42               evergreen forest
                                    43               mixed forest
                                    90               woody wetlands
Other natural/semi-natural cover    52               shrub/scrub
                                    71               grassland/herbaceous
                                    95               emergent herbaceous wetland
Human uses: developed lands         21               developed, open space
                                    22               developed, low intensity
                                    23               developed, medium intensity
                                    24               developed, high intensity
Human uses: agricultural lands      81               pasture/hay
                                    82               cultivated crops
No data                             11               open water
                                    12               perennial ice/snow
                                    31               barren land (rock/sand/clay)
cropland user accuracies of 87% and 82%, respectively. Forest user accuracy improved to 91.5% when one region, which was dominated by shrubland, was excluded. Regional class-specific accuracies ranged from 79 to 91%, with higher accuracies for regionally-abundant land cover classes. Based on error matrices presented in Wickham et al. (2010), aggregating the Anderson Level I classes into the more general categories used here would result in user accuracies for forests and human land uses that exceed 90% in a majority of regions. User accuracy was lower for non-forested natural and semi-natural land covers (ca. 70%), with most errors involving confusion with forested cover; however, accuracy was typically much higher (>80–90%) for regions where non-forested natural and semi-natural land covers were dominant.

Dozens of metrics have been used to quantify landscape pattern on the basis of land cover data (Haines-Young and Chopping, 1996) and to assess broad-scale patterns of deforestation (the loss of forested area) and fragmentation (the alteration of forest pattern across the landscape) (Kupfer, 2006). Using the reclassified NLCD data, we calculated 13 metrics at the scale of the 2109 eight-digit HUC (Hydrologic Unit Code) watersheds in the continental U.S. (http://water.usgs.gov/GIS/huc.html; last accessed 4 January 2012). Four metrics quantified the percentage of a watershed covered by forest, non-forested natural and semi-natural covers, developed areas, and agriculture. Another six metrics quantified land cover changes from 1992 to 2001, including the percentage of forest pixels that changed to developed, agricultural or non-forested natural land covers, and vice versa. The final three measures quantified forest connectivity using an approach similar to that developed by Riitters et al. (2000) in which a 9 × 9-pixel moving window was superimposed onto the map and the proportion of forest pixel edges shared with other forest pixels (Pff), non-forested natural land cover pixels (Pfn), or human-dominated land use (developed or agricultural) pixels (Pfh) was calculated. These values could each range from 0 to 100%, depending on the surrounding context of forest pixels.

While multiple landscape metrics are needed to adequately capture various aspects of forest extent and fragmentation, metrics tend to exhibit a great deal of redundancy, and a few can often capture most of the variation in landscape characteristics. Because the use of intercorrelated variables in multivariate regionalizations should be avoided, we sought to identify a subset of metrics that were relevant to our specific application and collectively quantified system characteristics while minimizing redundancy. Preliminary screening using principal component analysis allowed us to select five uncorrelated indices of landscape pattern that identified unique aspects of landscape pattern in watersheds across the continental U.S.: 1) the percentage of forested area in the watershed, 2) the percentage of developed area in the watershed, 3) forest connectivity with non-forested natural and semi-natural land covers, 4) forest connectivity with human-dominated land covers, and 5) the percentage of agricultural land that reverted to forest from 1992 to 2001.

2.2. Theory and calculation
Existing applications of regionalization methods to broad-scale environmental and ecological data most commonly involve: 1) the use of standard non-spatial clustering methods, followed by a revision of the clusters to enforce spatial contiguity of the resulting units, or 2) non-spatial clustering with a spatially-weighted dissimilarity measure. These approaches both have inherent limitations that affect the definition and pattern of the resultant units (Guo, 2008).
For example, because the clusters constructed in the first stage may not be spatially connected, the algorithms need to either divide or merge the clusters to form regions. This process causes the final number of regions to be unpredictable or difficult to specify, and the overall quality of regions (in terms of a homogeneity measure) is difficult to optimize. The method used in this study, REDCAP, directly incorporates a spatial contiguity constraint into a traditional hierarchical clustering
method and partitions the spatial hierarchy to optimize a homogeneity measure. Spatial contiguity is thus guaranteed through the clustering and optimization process, resulting in more consistently and efficiently derived regions (Guo, 2008). Here, we provide a brief outline of REDCAP, but readers are referred to Guo (2008) and Guo and Wang (2011) for additional detail on its algorithms. REDCAP involves two steps: contiguity constrained hierarchical clustering and spatially contiguous tree partitioning. The first step is similar to that used in a traditional agglomerative hierarchical clustering approach in that units or clusters that best meet specified criteria are joined, but it requires that every cluster at each hierarchical level be spatially contiguous. A spatially contiguous tree is thus built by iteratively merging the most similar units that are spatially connected (Fig. 1). REDCAP then partitions the spatially contiguous tree to obtain a number of sub-trees, each of which corresponds to a spatially contiguous region. In this study, the objective of the tree partitioning step was to minimize the total heterogeneity of all regions. Heterogeneity (H), a measure of dissimilarity among the watersheds within each region (R), was defined as:
$$H(R) = \sum_{j=1}^{d} \sum_{i=1}^{n_r} \left( x_{ij} - \bar{x}_j \right)^2 \qquad (1)$$
where d = the number of forest metrics; nr = the number of watersheds in R; xij = the value for the jth metric of the ith watershed, and x̄j = the mean value of the jth metric for all watersheds in R. We classified watersheds on the basis of the landscape metrics using three contiguity-constrained agglomerative clustering methods available in REDCAP. These methods differ in whether they define the distance between two clusters as: 1) the dissimilarity between the closest pair of data points from each cluster (single linkage clustering), 2) the average dissimilarity between all cross-cluster pairs of data points (average linkage clustering), or 3) the dissimilarity
Fig. 1. A hypothetical example of spatially-constrained clustering. Based on five areas (A–E) that differ in the amount of forest cover (represented by darkness of the shading), a non-spatial method would first link Areas A and E, and Areas D and C. A solution involving two clusters would be composed of: 1) a large region containing Areas A, B, and E, and 2) a smaller cluster with two disjunct elements, Areas C and D. Spatially-constrained clustering requires that every cluster at each hierarchical level be spatially contiguous. In this case, the first two regions formed would contain Areas A and B, and Areas D and E. The final two-region solution differs from that produced using the non-spatial clustering method because of the manner by which the spatial tree is partitioned.
between the furthest pair of data points in the clusters (complete linkage clustering). A recent assessment of REDCAP indicated that the average linkage and complete linkage methods performed comparably for synthetic datasets while the single linkage method performed significantly worse (Guo and Wang, 2011).

The spatial constraint added by contiguity-constrained agglomerative clustering requires that two clusters be spatially contiguous to be merged. REDCAP employs two different strategies to perform clustering under a contiguity constraint: first-order constraints and full-order constraints. The first-order constraining strategy defines the distance between two clusters using only first-order edges, i.e., those that directly connect two spatial neighbors, during the clustering process. Therefore, the inter-cluster distances calculated using single, average and complete linkage approaches only involve cluster members that are directly connected. A full-order constraining strategy defines the distance between two clusters over all edges between them and is thus dynamic in nature because the contiguity matrix is updated after each merge to track all edges that connect two different clusters. Experiments have shown that the full-order strategy performs significantly better than the first-order strategy in optimizing an objective homogeneity function (Guo, 2008).

While we calculated results using all three clustering methods combined with the full-order strategy, the results presented here are from the regionalization using full-order, average linkage clustering, which performed as well as or better than other methods in terms of regional heterogeneity, size balance, internal variation, and preservation of data distribution in a previous assessment of REDCAP (Guo, 2008). The REDCAP software is freely available at: www.spatialdatamining.org (last accessed 21 January 2012).
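The contiguity-constrained merging and the heterogeneity measure of Eq. (1) described above can be illustrated with a short Python sketch. This is not the REDCAP implementation: the function names, the toy data, and the chain-shaped neighbor graph are assumptions made for illustration, and the sketch simply stops merging when k clusters remain rather than building and partitioning a spatially contiguous tree as REDCAP does. It follows one reading of the full-order average linkage strategy, in which the distance between two clusters is averaged over all cross-cluster pairs but only clusters that share at least one first-order edge may merge.

```python
import numpy as np


def heterogeneity(X, members):
    """Eq. (1): sum of squared deviations from the region mean, over all metrics."""
    sub = X[list(members)]
    return float(((sub - sub.mean(axis=0)) ** 2).sum())


def constrained_average_linkage(X, edges, k):
    """Merge spatially contiguous clusters (average linkage) until k remain.

    X     : (n, d) array of normalized metric values (hypothetical input)
    edges : iterable of (i, j) pairs of units that share a boundary
    k     : number of clusters at which merging stops
    """
    n = len(X)
    clusters = {i: {i} for i in range(n)}
    # neighbours[c] holds the ids of clusters contiguous with cluster c
    neighbours = {i: set() for i in range(n)}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    # Euclidean distance between every pair of units
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))

    while len(clusters) > k:
        best = None
        for a in clusters:
            for b in neighbours[a]:
                if a < b:  # evaluate each contiguous pair once
                    rows, cols = sorted(clusters[a]), sorted(clusters[b])
                    d = D[np.ix_(rows, cols)].mean()
                    if best is None or d < best[0]:
                        best = (d, a, b)
        if best is None:  # no contiguous pair left (disconnected graph)
            break
        _, a, b = best
        clusters[a] |= clusters.pop(b)
        # update contiguity: cluster a inherits the neighbours of cluster b
        for c in neighbours.pop(b):
            neighbours[c].discard(b)
            if c != a:
                neighbours[c].add(a)
                neighbours[a].add(c)
        neighbours[a].discard(a)
    return list(clusters.values())


# Toy example: five units in a chain 0-1-2-3-4 with one metric (forest cover).
X = np.array([[0.9], [0.8], [0.3], [0.05], [0.95]])
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
regions = constrained_average_linkage(X, edges, 3)
print(regions, [heterogeneity(X, r) for r in regions])
```

In this toy case, unit 4 resembles units 0 and 1 in forest cover but touches only unit 3, so the contiguity constraint keeps it from joining them, which is the kind of behavior Fig. 1 describes.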
Prior to the regionalization, each forest metric was normalized to a zero mean and unit variance. As with other clustering methods that are based on the degree of similarity in conditions between study units, REDCAP allows users to assign differential variable weights, which can then produce different clustering solutions (see, for example, Leathwick et al., 2003). In the absence of a quantitative basis for assigning such weights, researchers often ignore them or define them on the basis of expert opinion. Previous research has documented the relative importance of land cover composition over configuration in shaping biotic responses to forest loss and fragmentation (Fahrig, 2003). We therefore assigned weights of five (5.0) to the forest and developed land cover variables, two (2.0) to the two connectivity variables, and one (1.0) to the land cover change variable. Experiments with alternative weightings that still emphasized land cover composition over configuration (e.g., weights of 4 vs. 2, or 5 vs. 3) yielded nearly identical results for the coarsest divisions in the hierarchical classifications (e.g., the first ten regions) but differed slightly in boundary locations when more than 20 regions were mapped. We determined the statistically optimal number of regions by analyzing heterogeneity at each hierarchical level. Visually, diagnostic graphs can clarify how within-group heterogeneity varies with the number of clusters in a hierarchical classification. In this case, we plotted the heterogeneity function calculated in the clustering process (Eq. (1)) against the cluster number, with the latter ranging from 2 to the total number of possible regions. We used the L-method algorithm (Salvador and Chan, 2004) to locate points of maximum curvature in the evaluation plots under the assumption that an appropriate number of clusters often coincides with rapid changes in the evaluation metric classification as clusters are merged (Kreft and Jetz, 2010). 
To do so, the evaluation graph was iteratively divided into two parts, those with a lower or higher number of clusters (Lc and Rc, respectively) for each possible number of clusters (c). Separate lines were fitted for Lc and Rc, and total RMSE (Root Mean Squared Error) at c was defined as:

$$\mathrm{RMSE}_c = \frac{c-1}{b-1}\,\mathrm{RMSE}(L_c) + \frac{b-c}{b-1}\,\mathrm{RMSE}(R_c) \qquad (2)$$
where RMSE(Lc) and RMSE(Rc) were RMSEs of the lines in Lc and Rc respectively, and b was the largest number of clusters that could be obtained (i.e., the number of watersheds). The statistically optimal number of regions occurred at the c that minimized RMSEc and thus separated the many similar clusters that form a nearly straight line on the right side of the graph from the more rapidly increasing region on the left side. Traditionally, regionalization is a special form of classification in which spatial units are grouped together based not only on a set of defined criteria but also a set of contiguity constraints (Haining, 2003). A number of recent biogeographic and ecological ‘regionalizations’, however, have opted not to spatially constrain the classification process (e.g., Coops et al., 2009; Procheş, 2005; Rueda et al., 2010). To demonstrate the influence of imposing spatial contiguity on a regionalization, we classified our data using two additional non-spatial methods. First, we used average linkage clustering to classify the watersheds, but without a spatial contiguity constraint. The result is directly comparable to that produced by REDCAP because it employs the same clustering strategy. Second, we classified watersheds using a non-hierarchical classification technique, k-means clustering. Unlike hierarchical methods such as average linkage clustering, k-means clustering partitions study units into k clusters that minimize a measure of within-cluster dispersion (Han et al., 2001). In this case, we used the mean normalized metric values of the watersheds in a cluster as the cluster center, and minimized the within-cluster sum of square error. Both k-means clustering and hierarchical, agglomerative clustering have been widely applied to ecological and biogeographical data and utilized to develop environmental regionalizations (e.g., Kreft and Jetz, 2010; Salvati and Zitti, 2009). 
To facilitate comparisons, we defined the desired number of clusters for both approaches a priori based on the optimal number of regions identified using the L-method algorithm for the REDCAP analysis. Analyses were conducted using SPSS v. 17.0.
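The L-method selection described in Section 2.2 (Eq. (2)) can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function names and the synthetic heterogeneity curve are hypothetical, and the sketch assumes the evaluation graph is sampled at every integer cluster number from 2 to b, so that the weights (c − 1)/(b − 1) and (b − c)/(b − 1) equal the share of points on each side of the split.

```python
import numpy as np


def line_rmse(x, y):
    """RMSE of a least-squares straight line fitted to the points (x, y)."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    return float(np.sqrt(np.mean(residuals ** 2)))


def l_method(n_clusters, heterogeneity):
    """Return the cluster number c that minimizes RMSE_c in Eq. (2).

    n_clusters    : cluster numbers 2, 3, ..., b in increasing order
    heterogeneity : value of the evaluation metric at each cluster number
    """
    x = np.asarray(n_clusters, dtype=float)
    y = np.asarray(heterogeneity, dtype=float)
    b = x[-1]
    best_c, best_rmse = None, np.inf
    # Each side of the split needs at least two points to fit a line.
    for i in range(1, len(x) - 2):
        c = x[i]
        left = slice(0, i + 1)        # L_c: cluster numbers 2..c
        right = slice(i + 1, len(x))  # R_c: cluster numbers c+1..b
        total = ((c - 1) / (b - 1)) * line_rmse(x[left], y[left]) \
            + ((b - c) / (b - 1)) * line_rmse(x[right], y[right])
        if total < best_rmse:
            best_c, best_rmse = c, total
    return int(best_c)


# Synthetic evaluation graph: heterogeneity falls steeply up to 9 clusters,
# then declines slowly (the break at 9 is built in by construction).
c_values = np.arange(2, 41)
h_values = np.where(c_values <= 9,
                    100 - 10 * (c_values - 2),
                    25 - 0.5 * (c_values - 10))
print(l_method(c_values, h_values))
```

On real evaluation graphs the two segments are rarely exactly linear, so the selected value is best read alongside the plot itself, as was done with Fig. 2.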
3. Results
3.1. REDCAP and forest pattern regions
Based on the L-method analysis, the statistically optimum number of Forest Pattern Regions was nine, with a secondary optimum of 18. Graphically, these values represent visible inflection points where heterogeneity begins to decline less rapidly with increasing numbers of regions in the classification (Fig. 2).

The nine primary Forest Pattern Regions reflect the influence of variations in forest extent, pattern and change caused by two factors. First, a number of the regions parallel well-defined ecoregional boundaries related to natural variability in forest cover due to gradients in elevation, soils and precipitation (Fig. 3-top). This was particularly evident for: 1) Northwestern Mountain Forests, Northern Rocky Mountain Forests, Boreal Shield Forests, and Eastern Temperate Forests, all of which were characterized by high forest cover and
Fig. 2. Within-group heterogeneity as a function of the number of regions in a REDCAP clustering of 2109 watersheds on the basis of land cover, forest connectivity and change data. Statistically optimum numbers of forest pattern regions (9 and 18) were identified using the L-method algorithm (Salvador and Chan, 2004).
Fig. 3. Forest Pattern Regions identified by a REDCAP clustering of 2109 watersheds on the basis of land cover, forest connectivity and change data. Region boundaries are superimposed over values for: (top) the percentage of forested area in the watershed; (middle) forest fragmentation by human-dominated land covers; (bottom): forest fragmentation by non-forested natural and semi-natural land covers. Region names: (1) Northwestern Mountain Forests; (2) Northern Rocky Mountain Forests; (3) Southern/ Central Rocky Mountain Forests; (4) Great Basin Deserts and Grasslands; (5) Great Plains; (6) Boreal Shield Forests; (7) Agricultural Heartland; (8) Mid-Atlantic Metropolitan Corridor; (9) Eastern Temperate Forests.
connectivity, and 2) the Great Basin Deserts and Grasslands region and the Great Plains region, which had low forest cover due to moisture limitations (Table 2). The importance of natural variability in conditions was also evident in differences in the amount of natural and semi-natural non-forested cover and its connectivity with forested systems. The Southern and Central Rocky Mountain Forests region was distinguished from other forest regions because of its large spatial extents of low elevation shrub, grassland and desert systems interspersed with montane forested areas.

The second major factor structuring Forest Pattern Regions was anthropogenic activity, the effects of which would not normally be directly included in ecoregional delineations. Two regions, in particular, were distinguished by greater extents of agriculture (the Agricultural Heartland) and developed lands (the Mid-Atlantic Metropolitan Corridor) and thus greater fragmentation of forested habitats by human-dominated land uses (Fig. 3-middle; Table 2). With respect to the forested regions, the REDCAP regionalization clearly distinguished between areas where forest was fragmented by more natural systems such as shrublands and grasslands (Southern and Central Rocky Mountain Forests), those experiencing a high degree of fragmentation by human activities (Eastern Temperate Forests), and those where forest connectivity was relatively high and fragmentation by either factor was low (Fig. 3-bottom).

Subdivision of Forest Pattern Regions into 18 subregions generally occurred in two areas. The Eastern Temperate Forest was divided into several subregions that differed in terms of forest cover (most notably, the Lower Mississippi River delta region), human-caused fragmentation (e.g., in more agricultural areas of the Atlantic and Gulf Coastal Plain), and natural fragmentation (particularly the Everglades and Southern Florida) (Fig. 4). Regions in the western U.S. were similarly divided on the basis of forest extent and connectivity, although human-caused fragmentation was less of a distinguishing factor, except in southern and central California. Instead, subregions varied mainly as a function of differences in elevation, topography, and climate, with several well-defined Level II and III ecoregions emerging (e.g., the Columbia Plateau and Snake River Plain in the Pacific Northwest and the Mogollon Rim and Upper Gila Mountains in the Southwest).

3.2. The role of spatial contiguity in clustering
The importance of imposing spatial contiguity on cluster development in the regionalization process was evident when REDCAP results were compared to regionalizations created by clustering watersheds without such a constraint. A nine-class classification developed using a non-spatial k-means clustering approach resulted in 221 unique units, with individual classes containing 8–42 discontiguous units (Fig. 5-top). While there was some overlap with the REDCAP-defined Forest Pattern Regions (e.g., Region E with the Agricultural Heartland or Region B with the two Rocky Mountain
Regions), widely-separated watersheds were often joined in a single cluster because of similarities in the five landscape pattern metrics (e.g., heavily forested watersheds in the Pacific Northwest, Boreal Shield, and Appalachian Mountains). In other cases, an extensive unit type was perforated by a number of smaller, disjunct units, for example, the various forested areas of Region B embedded within Region H. The nine-class, non-spatial average linkage hierarchical clustering generally had greater congruence with the REDCAP regionalization, but still resulted in a total of 133 disjunct units (Fig. 5-bottom).

While the nine-class k-means and average linkage clustering solutions had substantially more individual units arranged as discontiguous patches, they had much lower heterogeneity values (11,200 and 14,845, respectively) than the nine-region REDCAP solution (23,132). This finding is not surprising because requiring regions to be spatially contiguous to be joined inherently increases within-region heterogeneity. In fact, heterogeneity of the nine classes for the non-spatial methods approached that for the 18-class subregional REDCAP classification, and there were more similarities in the pattern of units, particularly with the non-spatial average linkage cluster results (Figs. 4 and 5). As the number of regions in all classifications increased, heterogeneity values began to converge because the individual, contiguous REDCAP subregions began to approximate the numerous disjunct patches created by non-spatial methods.

4. Discussion
While broad-scale assessments of forest pattern have generally focused on mapping or reporting measures of forest loss and fragmentation for political or ecological units, the approach used in this study provides a complementary perspective in two distinct ways.
First, it facilitates the identification of regions that are perhaps most vulnerable to forest transformation effects or have the greatest potential conservation value, which has practical significance in biodiversity assessment and monitoring plans. Environmental cluster analysis has been cited as a potential tool for identifying sites that collectively represent regional species diversity (Trakhtenbrot and Kadmon, 2005), and a small but growing number of studies are beginning to apply quantitative regionalization methods to the examination of landscape patterns. Riitters and Coulston (2005), for example, used spatial scan statistics to identify geographic concentrations of forest located near holes (‘perforations’) in otherwise intact forest canopies, justifying their approach by noting that: “National assessments of forest fragmentation satisfy international biodiversity conventions, but they do not identify specific places where ecological impacts are likely” (p. 483). REDCAP provides a powerful yet flexible methodology for defining such regions and mapping variability of sample units within them. For example, while the Agricultural Heartland is characterized by a high level of human impact, it is possible to identify areas undergoing forest reversion or additional forest loss within the context of
Table 2 Summaries of landscape metrics for nine Forest Pattern Regions identified by REDCAP clustering of watersheds (n = 2109) on the basis of land cover, forest connectivity and change data.

                                                      % Non-forest
Region                                    % Forest    Non-human land cover   Agriculture   Developed   PFN    PFH    Net Δ: agriculture to forest
Eastern temperate forests                 67.7        3.7                    22.6          6.0         2.9    21.0   0.0001
Northwestern mountain forests             74.6        15.5                   5.5           4.3         14.2   7.3    0.0001
Southern/central rocky mountain forests   37.5        54.3                   6.5           1.6         24.0   3.4    −0.0007
Boreal shield forests                     80.7        6.5                    11.1          1.7         5.9    10.3   0.0129
Agricultural heartland                    16.6        6.1                    73.4          4.0         4.5    44.1   0.0129
Great plains                              4.6         65.4                   29.4          0.5         11.9   4.3    0.0022
Northern rocky mountain forests           75.5        16.3                   4.9           3.3         12.9   4.8    0.0021
Great basin deserts and grasslands        16.0        66.0                   15.4          2.6         23.8   7.3    0.0012
Mid-Atlantic metropolitan corridor        40.8        10.6                   25.3          23.3        3.8    38.1   −0.0305
Fig. 5. Nine class regionalizations produced using non-spatial clustering of 2109 watersheds on the basis of land cover, forest connectivity and change data. (top): k-means clustering; (bottom): hierarchical, agglomerative classification using average linkage clustering.
Fig. 4. Forest Pattern subregions identified by a REDCAP clustering of 2109 watersheds on the basis of land cover, forest connectivity and change data. Region boundaries are superimposed over values for: (top) the percentage of forested area in the watershed; (middle) forest fragmentation by human-dominated land covers; (bottom): forest fragmentation by non-forested natural and semi-natural land covers.
broader regional trends (Fig. 6). Similarly, it is simple to identify and map subregions within the Eastern Temperate Forest that vary on the basis of forest cover, agent of fragmentation, or net gains and losses of forest area (Fig. 7). These two examples highlight the second advantage of the approach used in this research: as a hierarchical, agglomerative approach, REDCAP generates a region hierarchy, starting from one region (the whole data set) and continuing to a specified maximum number of regions. In contrast to methods such as k-means clustering, hierarchical methods are particularly well suited for applications such as this because ecological and biogeographical entities are hierarchically arranged (McLaughlin, 1992), and the relative relationships
between regions can provide useful insights into underlying biotic connections and processes (Kreft and Jetz, 2010). For example, while we identified statistical optima regarding cluster sizes, REDCAP itself makes no assumptions about appropriate scales of analysis and display. Rather, it facilitates the exploration of regionalization patterns because the user can readily select and display any number of desired clusters, noting how regional patterns change with classification detail, and examine the cluster dendrogram, which is useful for understanding similarities among spatial units and regions.
Results of this research underscore that any decision about whether to incorporate spatial contiguity into ecological land classifications needs to be made with an awareness of the importance of spatial scale. In fine-scale environmental domain classifications (e.g., those at the landscape scale), spatial contiguity of ecological land types is rarely expected, so spatial contiguity of units belonging to the same class is unlikely to be a constraint. Even at broader spatial scales, a number of researchers have chosen not to enforce spatial constraints during clustering to avoid forcing cohesion of clusters that may not be justified by the environmental or ecological data (e.g., Rueda et al., 2010). Such studies emphasize within-region homogeneity over spatial contiguity and may, as our results show, yield comparatively complicated solutions composed of regions with many disjunct units. However, the usefulness of a regionalization extends beyond its ability to partition within- and between-region variability and needs to be firmly rooted in an awareness of the phenomena under study. Procheş (2005) argued that spatial contiguity should not serve as a constraint for diversity or endemism centers, even at large grain sizes, noting that the primary criterion should be the degree of similarity (or uniqueness) in organism assemblages. They
Fig. 6. Net forest change from 1992 to 2001 for watersheds within the Agricultural Heartland Forest Pattern region.
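The idea of a contiguity-constrained region hierarchy that can be displayed at any number of regions can be illustrated with a short sketch. It is explicitly not the REDCAP algorithm of Guo (2008). It is a simplified stand-in that performs average-linkage agglomeration while allowing only spatially adjacent clusters to merge, and then replays the merge sequence to obtain nested solutions. The function names (`build_hierarchy`, `cut`), the six-unit chain, and the attribute values are hypothetical.

```python
from itertools import combinations


def avg_linkage(a, b, values):
    """Mean Euclidean distance over all cross-cluster pairs of units."""
    total = 0.0
    for i in a:
        for j in b:
            total += sum((x - y) ** 2 for x, y in zip(values[i], values[j])) ** 0.5
    return total / (len(a) * len(b))


def build_hierarchy(values, neighbors):
    """Merge spatially adjacent clusters, most similar pair first; return merge order."""
    clusters = {i: {i} for i in range(len(values))}
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            # Contiguity constraint: some unit in a must border some unit in b.
            if not any(v in clusters[b] for u in clusters[a] for v in neighbors[u]):
                continue
            d = avg_linkage(clusters[a], clusters[b], values)
            if best is None or d < best[0]:
                best = (d, a, b)
        if best is None:  # disconnected study area: nothing left to merge
            break
        _, a, b = best
        clusters[a] |= clusters.pop(b)  # merged cluster keeps the smaller id
        merges.append((a, b))
    return merges


def cut(n_units, merges, k):
    """Replay merges until k regions remain; return one region label per unit."""
    labels = list(range(n_units))
    for a, b in merges[: max(0, n_units - k)]:
        labels = [a if lab == b else lab for lab in labels]
    return labels


# Hypothetical data: six units in a row, one attribute each (assumed values).
values = [(1.0,), (1.2,), (4.0,), (1.5,), (5.0,), (5.3,)]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}

merges = build_hierarchy(values, neighbors)
three_regions = cut(len(values), merges, 3)
two_regions = cut(len(values), merges, 2)
```

Because every coarser solution is obtained by replaying more of the same merge sequence, the two-region solution is a union of regions from the three-region solution, which is the nesting property that makes hierarchical methods convenient for exploring regional patterns at several levels of detail. Note also that unit 3 is most similar in value to units 0 and 1 but does not border them, so under the constraint it joins its neighbor, unit 2, instead.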
did suggest that such regions may be intuitively easier to accept if they are contiguous. By its nature, REDCAP places greater emphasis on maintaining the spatial contiguity of units, a property that may be desirable in many broad-scale regionalizations because it reduces data complexity and facilitates the visualization and interpretation of ecological or biogeographic data (in this case, forest pattern information). In fact, the presentation of spatial information within more broadly defined regions gives users the ability to examine variations in individual unit characteristics within a broader regional context (e.g., Figs. 3–4 and 6–7). In contrast to k-means clustering, the hierarchical nature of REDCAP provides a more flexible platform for exploring the relative relationships between regions, and we advocate greater consideration for approaches that specifically incorporate spatial contiguity as a constraint in regionalizations, when appropriate.
Finally, while errors in the source data such as land cover misclassifications may contribute to errors in the regionalization process, the data presented by Wickham et al. (2010) indicate a high level of accuracy for the coarse Anderson Level I NLCD data (85.3%) that was further improved by aggregating thematic classes into more common categories. Consequently, accuracy of the land cover data used to produce the regionalization exceeded 90% for many regions and classes, particularly those regions where a particular cover class was relatively well represented. Further, by enumerating a number of variables at the scale of watersheds, the importance of some fine-scale errors was likely minimized. However, further research evaluating the potential effects of misclassifications in the source data on the resulting regions is warranted.
Fig. 7. Variations in forest cover, human-caused fragmentation, and net forest change for subregions within the Eastern Temperate Forest region.
5. Conclusions
Advances in computing power, data availability and methodological techniques are opening new avenues for defining and mapping ecological and biogeographical regions using quantitative approaches (Mackey et al., 2008). Researchers have thus begun to evaluate: 1) the sensitivity of regionalizations and, more broadly, environmental classifications, to differences in the clustering method (Trakhtenbrot and Kadmon, 2005), and 2) the effectiveness of different clustering techniques for biodiversity applications (Kent and Carmel, 2011). In this research, we explored the use of a novel regionalization algorithm, regionalization with dynamically constrained agglomerative clustering and partitioning (REDCAP), for the identification of hierarchical regions defined by metrics of forest extent, connectivity, and change. Our results demonstrate the utility of this approach for visualizing and interpreting the spatial patterns and controls of forest extent and connectivity, but the basic approach is applicable to a much broader range of ecological and biogeographic applications.
Despite recent advances in quantitative regionalization methods, there has been little effort to evaluate the issue of spatial contiguity in deriving and interpreting the resulting regions. Our findings highlight the value of explicitly incorporating spatial contiguity into broad-scale ecological land classifications and regionalizations, at least as a complement to non-spatial clustering approaches; specifically, our results point toward the necessity of weighing (and justifying) the tradeoff between methods and approaches that emphasize within-region homogeneity vs. those that consider spatial contiguity of the resulting regions. This decision is particularly important when classifying an area into relatively few regions; as the number of regions grows, the significance of spatial contiguity declines while heterogeneity values from spatial and non-spatial methods begin to converge.
Acknowledgments
This work was supported in part by Grant #BCS-0748813 from the National Science Foundation to D. Guo.
Comments from two anonymous reviewers greatly improved the quality of this paper. JAK would like to thank Harvey Miller and Gordon Mulligan for conversations that helped to stimulate his interest in this topic.
References
Bailey, R.G., 1995. Description of the Ecoregions of the United States, Miscellaneous Publications No. 1391, Second Edition. U.S. Department of Agriculture, Forest Service, Washington, DC.
Cleland, D.T., Avers, P.E., McNab, W.H., Jensen, M.E., Bailey, R.G., King, T., Russell, W.E., 1997. National hierarchical framework of ecological units. In: Boyce, M.S., Haney, A. (Eds.), Ecosystem Management Applications for Sustainable Forest and Wildlife Resources. Yale University Press, New Haven, CT, pp. 181–200.
Coops, N.C., Wulder, M.A., Iwanicka, D., 2009. An environmental domain classification of Canada using earth observation data for biodiversity assessment. Ecological Informatics 4, 8–22.
Fahrig, L., 2003. Effects of habitat fragmentation on biodiversity. Annual Review of Ecology, Evolution, and Systematics 34, 487–515.
Fovell, R.G., Fovell, M.Y.C., 1993. Climate zones of the conterminous United States defined using cluster analysis. Journal of Climate 6, 2103–2135.
Fry, J.A., Coan, M.J., Homer, C.G., Meyer, D.K., Wickham, J.D., 2009. Completion of the National Land Cover Database (NLCD) 1992–2001 Land Cover Change Retrofit product. U.S. Geological Survey Open-File Report 2008–1379. 18 pp.
George, J.A., Lamar, B.W., Wallace, C.A., 1997. Political district determination using large-scale network optimization. Socio-Economic Planning Sciences 31, 11–28.
Guo, D., 2008. Regionalization with dynamically constrained agglomerative clustering and partitioning (REDCAP). International Journal of Geographical Information Science 22, 801–823.
Guo, D., Wang, H., 2011. Automatic region building for spatial analysis. Transactions in GIS 15, 29–45.
Haggett, P.A., Cliff, D., Frey, A.E., 1977. Locational Analysis in Human Geography, second ed. Arnold, London.
Haines-Young, R., Chopping, M., 1996. Quantifying landscape structure: a review of landscape indices and their application to forested landscapes. Progress in Physical Geography 20, 418–445.
Haining, R., 2003. Spatial Data Analysis-Theory and Practice. Cambridge University Press, Cambridge.
Han, J., Kamber, M., Tung, A.K.H., 2001. Spatial Clustering Methods in Data Mining: A Survey. In: Miller, H.J., Han, J. (Eds.), Geographic Data Mining and Knowledge Discovery. Taylor and Francis, London, pp. 188–217.
Heikinheimo, H., Fortelius, M., Eronen, J., Mannila, H., 2007. Biogeography of European land mammals shows environmentally distinct and spatially coherent clusters. Journal of Biogeography 34, 1053–1064.
Homer, C., Dewitz, J., Fry, J., Coan, M., Hossain, N., Larson, C., Herold, N., McKerrow, A., Van Driel, J.N., Wickham, J., 2007. Completion of the 2001 National Land Cover Database for the conterminous United States. Photogrammetric Engineering and Remote Sensing 73, 337–341.
Kent, R., Carmel, T., 2011. Evaluation of five clustering algorithms for biodiversity surrogates. Ecological Indicators 11, 896–901.
Kreft, H., Jetz, W., 2010. A framework for delineating biogeographical regions based on species distributions. Journal of Biogeography 37, 2029–2053.
Kupfer, J.A., 2006. National assessments of forest fragmentation patterns in the U.S. Global Environmental Change 16, 73–82.
Kupfer, J.A., Franklin, S.B., 2009. Linking spatial pattern and ecological responses in human-modified landscapes: The effects of deforestation and forest fragmentation on biodiversity. Geography Compass 3, 1331–1355.
Lark, R.M., 1998. Forming spatially coherent regions by classification of multi-variate data: an example from the analysis of maps of crop yield. International Journal of Geographical Information Science 12, 83–98.
Leathwick, J.R., Overton, J.M., McLeod, M., 2003. An environmental domain classification of New Zealand and its use as a tool for biodiversity management. Conservation Biology 17, 1612–1623.
Long, J., Nelson, T., Wulder, M., 2010. Regionalization of landscape pattern indices using multivariate cluster analysis. Environmental Management 46, 134–142.
Mackey, B.G., Berry, S.L., Brown, T., 2008. Reconciling approaches to biogeographical regionalization: a systematic and generic framework examined with a case study of the Australian continent. Journal of Biogeography 35, 213–229.
McLaughlin, S.P., 1992. Are floristic areas hierarchically arranged? Journal of Biogeography 19, 21–32.
McMahon, G., Wiken, E.B., Gauthier, D.A., 2004. Toward a scientifically rigorous basis for developing mapped ecological regions. Environmental Management 34, S111–S124.
Omernik, J.M., 1987. Ecoregions of the conterminous United States. Annals of the Association of American Geographers 77, 118–125.
Openshaw, S., Rao, L., 1995. Algorithms for reengineering 1991 census geography. Environment and Planning A 27, 425–446.
Peterson, H.M., Nieber, J.L., Kanivetsky, R., 2011. Hydrologic regionalization to assess anthropogenic changes. Journal of Hydrology 408, 212–225.
Petit, S., Firbank, R., Wyatt, B., Howard, D., 2001. MIRABEL: Models for integrated review and assessment of biodiversity in European landscapes. Ambio 30, 81–88.
Procheş, Ş., 2005. The world's biogeographical regions: cluster analyses based on bat distributions. Journal of Biogeography 32, 607–614.
Riitters, K., Wickham, J., O'Neill, R., Jones, K.B., Smith, E., 2000. Global-scale patterns of forest fragmentation. Conservation Ecology 4 (2), 3 http://www.consecol.org/vol4/iss2/art3 [online].
Riitters, K.H., Coulston, J.W., 2005. Hot spots of perforated forest in the eastern United States. Environmental Management 35, 483–492.
Rueda, M., Rodriguez, M.A., Hawkins, B.A., 2010. Towards a biogeographic regionalization of the European biota. Journal of Biogeography 37, 2067–2076.
Salvador, S., Chan, P., 2004. Determining the number of clusters/segments in hierarchical clustering/segmentation algorithms. Proceedings of the Sixteenth IEEE International Conference on Tools with Artificial Intelligence. Institute of Electrical and Electronics Engineers, Piscataway, N.J, pp. 576–584.
Salvati, L., Zitti, M., 2009. The environmental “risky” region: Identifying land degradation processes through integration of socio-economic and ecological indicators in a multivariate regionalization model. Environmental Management 44, 888–898.
Svancara, L.K., Scott, J.M., Loveland, T.R., Pidgorna, A.B., 2009. Assessing the landscape context and conversion risk of protected areas using satellite data products. Remote Sensing of Environment 113, 1357–1369.
Trakhtenbrot, A., Kadmon, R., 2005. Environmental cluster analysis as a tool for selecting complementary networks of conservation sites. Ecological Applications 15, 335–345.
Wade, T.G., Riitters, K.H., Wickham, J.D., Jones, K.B., 2003. Distribution and causes of global forest fragmentation. Conservation Ecology 7 (2), 7 http://www.consecol.org/vol7/iss2/art7/ [online].
Wickham, J.D., Stehman, S.V., Fry, J.A., Smith, J.H., Homer, C.G., 2010. Thematic accuracy of the NLCD 2001 land cover for the conterminous United States. Remote Sensing of Environment 114, 1286–1296.
Wiken, E.B. (compiler). 1986. Terrestrial Ecozones of Canada. Ecological Land Classification Series No. 19. Hull, PQ: Environment Canada.
Wulder, M.A., White, J.C., Han, T., Coops, N.C., Cardille, J.A., Holland, T., Grills, D., 2008. Monitoring Canada's forests. Part 2: National forest fragmentation and pattern. Canadian Journal of Remote Sensing 34, 563–584.