Regional fish community indicators of landscape disturbance to catchments of the conterminous United States




Ecological Indicators 26 (2013) 163–173

Contents lists available at SciVerse ScienceDirect

Ecological Indicators journal homepage: www.elsevier.com/locate/ecolind

Peter C. Esselman a,∗, Dana M. Infante b, Lizhu Wang c, Arthur R. Cooper b, Daniel Wieferich b, Yin-Phan Tsang b, Darren J. Thornbrugh b, William W. Taylor b

a Department of Zoology and Center for Water Sciences, Michigan State University, East Lansing, MI 48824, USA
b Department of Fisheries and Wildlife, Michigan State University, East Lansing, MI 48824, USA
c Great Lakes Regional Office, International Joint Commission, P.O. Box 32869, Detroit, MI 48232, USA

Article info

Article history: Received 11 July 2012; received in revised form 25 October 2012; accepted 30 October 2012

Keywords: Large-scale multimetric index; River assessment; Fish communities; Boosted regression

Abstract

Biological assessments of river conditions are increasingly conducted at regional and continental scales that match the extent of large-scale river management efforts. Multimetric indices composed of biological community indicators are commonly used to assess ecological condition and indices have recently been applied in large regions. Methods for large-scale multimetric index creation emphasize repeatability, comparability across regions, and objective selection of candidate metrics. Here we used an extensive fish dataset to create a large pool of fish community metrics which were screened to create multimetric indices (MMIs) in eight ecoregions covering the conterminous U.S. Candidate metrics were tested for metric range, corrected for natural gradients using boosted regression trees, and then tested for repeatability and sensitivity to landscape disturbance. Temporally stable and repeatable metrics were then evaluated for redundancy and used to compose MMIs for each region. Our MMIs were significantly correlated to independently developed MMIs, accurately reproducing prior index values with moderate to high precision and little bias. Our study demonstrates the utility of boosted regression tree models for correcting metric values for natural abiotic gradients and shows that the order of screening tests has a potentially important influence on metric selection. The resultant regional indices and component metrics provide a basis for assessing condition and testing hypotheses about landscape influences on aquatic ecosystems at a national scale in the US. © 2012 Elsevier Ltd. All rights reserved.

∗ Corresponding author. Tel.: +1 517 432 1927; fax: +1 517 432 2789. E-mail address: [email protected] (P.C. Esselman).
1470-160X/$ – see front matter © 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.ecolind.2012.10.028

1. Introduction

Biological assessments of river condition are increasingly conducted at regional and continental scales to support river management efforts. Where past efforts were typically carried out at basin- or state-wide extents, recent efforts assess areas many times this size. For example, the US Environmental Protection Agency’s Wadeable Streams Assessment (Paulsen et al., 2008; US Environmental Protection Agency, 2006) evaluated biological condition of wadeable streams of the lower-48 states of the US using benthic macroinvertebrate and fish indicators. US Geological Survey’s National Water Quality Assessment (NAWQA) data have been used to conduct multiple assessments over regional and national scales (Carlisle et al., 2008, 2010; Meador et al., 2008; Meador and Carlisle, 2009). Other efforts have used biological indicators to assess river environments over large portions of North America (Pont et al., 2009), Europe (Hering et al., 2004; Pont et al., 2006), and Australia (Harris and Silveira, 1999). Many of these studies employed multimetric indices (MMIs) composed of taxonomic and functional metrics identified to be sensitive to anthropogenic disturbance gradients. The growing list of large-scale studies represents an ambitious and challenging front in biological assessment of river ecosystems.

As the spatial extent of assessments expands, data needs and challenges grow. Large regional assessments require representative data for both natural and anthropogenic conditions in stream habitats or watersheds and regionally consistent response variables (Wang et al., 2011). Regional biological and environmental datasets are expensive to obtain, often resulting in low site densities and large gaps in spatial coverage, thus limiting assessments to generalizations based on relatively few sites over large areas. Alternatively, when datasets from different biological sampling programs are combined to achieve greater site densities, biases caused by inconsistent site distributions and differences in sampling protocols can become problematic. Variable site densities resulting from combined datasets can lead to results that are biased toward trends expressed in heavily sampled areas. Sampling effort influences the accuracy and precision of estimates of biological community attributes (Angermeier and Smogor, 1995; Flotemersch and Blocksom, 2005; Lyons, 1992), so combining data from programs with different effort could increase variability of community metrics and reduce metric performance. Thus tradeoffs exist between lower site densities with less bias but low statistical power, and higher site densities with greater statistical power but increased bias and variable metric accuracy and precision.

Early MMIs were developed at relatively small scales and relied on best professional judgment to help select aquatic community indicators (Karr, 1981; Karr et al., 1986). As the study extent expands, best professional judgment becomes less and less reliable because few scientists have extensive knowledge of faunal responses to environmental conditions over large scales. For this reason, regional studies tend to emphasize standardized approaches that can be repeated in different contexts, comparability across regions, and objective metric selection (Whittier et al., 2007). As the geophysical diversity of riverine landscapes increases with spatial extent, it also becomes important to control for natural abiotic variation that can confound the interpretation of biological condition (Pont et al., 2006). Models are necessary to correct biological metrics for natural variation, leading to a more valid comparison of disturbed versus undisturbed river conditions (Baker et al., 2005; Cao et al., 2007; Pont et al., 2006) but involving more methodological complexity.

Geographically extensive data about the landscape contexts for stream habitats are often used in regional biological assessments as proxies for local habitat conditions. The accuracy of landscape-scale biological assessments is affected by use of only those landscape-scale disturbances that can be modeled or mapped across the region of interest.
While many robust land use/land cover data sets exist for conducting large-scale assessments (e.g., US National Land Cover Data Set, US National Wetlands Inventory), data on some important drivers of stream habitat degradation are not widely available in a comparable fashion at national scales (e.g., locations and intensity of animal feedlot operations in the United States). Landscape factors tend to operate indirectly through intermediate factors that directly induce changes to stream flow, thermal regimes, sediment, water chemistry, etc. (Poff, 1997), making attribution of the proximate causes of biological decline difficult. Nonetheless, landscape anthropogenic variables often correlate strongly to the physical, chemical, and biological condition of streams and rivers (Allan, 2004; Allan et al., 1997; Gergel et al., 2002; Paul and Meyer, 2001), and many of the most influential anthropogenic drivers of change (e.g., urban land use, point sources of pollution) have been characterized for the entire US and other parts of the world. Thus, while challenges remain for teasing apart the relative contributions of different landscape disturbances, use of landscape-scale information represents a practical approach for developing a consistent picture of biological condition over large regions. The biological assemblages from which candidate indicators are calculated and screened are likely to be reflective of local habitat conditions that are the result of natural and anthropogenic landscape gradients.

Recent work has advanced methods for screening indicators for use in regional multi-metric biological indices by placing a strong emphasis on rigorous testing and use of statistical principles to select individual metrics (Bramblett et al., 2005; Fore and Grafe, 2002; Klemm et al., 2003; McCormick et al., 2001) with an emphasis on transferability of methods across regions/settings (Stoddard et al., 2008).
An advantage of using a robust, standardized process for MMI construction is that it can be applied to appropriate data in any region at almost any spatial scale to produce internally consistent results (Stoddard et al., 2008). Whittier et al. (2007) and Stoddard et al. (2008) recommend similar processes for screening indicators for use in large regional biological assessment. Their approach relies on successive performance tests to eliminate indicators that have (1) insufficient empirical range to reflect a wide gradient of ecosystem states, (2) high temporal variation in repeat visits to the same site, and (3) low responsiveness to habitat degradation. Using the methods of Stoddard et al. (2008) and Whittier et al. (2007), a large pool of candidate metrics can be progressively reduced to a small set of indicators that can be consistently applied to indicate a biological condition gradient.

In this paper, we apply an approach similar to Whittier et al. (2007) and Stoddard et al. (2008) to define fish community indicators of biological condition in nine large regions of the conterminous US. Our intent was to identify biological indicators that could be used in a national river habitat condition assessment for the National Fish Habitat Partnership (http://www.fishhabitat.org). We set out to identify a set of fish community indicators in each region that could be used in follow-up analyses to understand the influence of landscape disturbances on stream biological conditions. We opted to define indicators from an extensive fish community dataset combined from multiple state and federal data sources, which were screened against a cumulative anthropogenic disturbance gradient defined from a spatially extensive GIS database. Below, we describe the indicator screening process and our procedures for accounting for natural gradients, variable sampling densities and biases associated with combining data from different sampling protocols. We then present our results, explore the accuracy and precision of an MMI composed of our indicators, and discuss our approach and findings.

2. Methods

2.1. Study area and spatial framework

Our study focused on fish communities of streams and rivers of the conterminous US. We used the National Hydrography Dataset Plus (NHDPlus) (USEPA and USGS, 2005) as our base layer for geographic representation of stream reaches and their catchments.
The NHDPlus consists of 1:100,000-scale river reaches and the catchment boundaries demarcating the land area estimated to drain to each reach, as well as other stream and drainage attributes. We define a reach as a confluence-to-confluence river segment in the NHDPlus. The NHDPlus has a defined network topology that accounts for upstream–downstream connections between reaches and their associated catchments. This topology allowed us to summarize conditions for natural and anthropogenic landscape variables at two relevant catchment scales for each reach (Wang et al., 2011): the local catchment (the topographically defined land area draining laterally to a reach) and the network catchment (the entire land area upstream of a reach, including its local catchment).

The rivers of the United States drain a land area of approximately 7.8 million km² and integrate a large amount of abiotic and biogeographic variation (Abell et al., 1999). To control for some of this variation, we screened fish community indicators separately in nine ecoregions. We used the aggregated Omernik (1987) ecoregions defined by the EPA’s Wadeable Streams Assessment (Paulsen et al., 2008), because: (1) they encompass areas with similar physiographic conditions that fundamentally influence the structure and function of aquatic biological communities (Herlihy et al., 2008); and (2) they provide a basis for drawing comparisons to previous studies that were conducted in the same regions (e.g., Pont et al., 2009; Stoddard et al., 2008). Within aggregated ecoregions, we sub-sampled our fish collection sites within finer scale spatial units called ecological drainage units (EDUs). EDUs are 1000 to 10,000 km² watershed-based spatial units defined by similarities in aquatic fauna, physiography, climate, and ecosystem connectivity (Higgins et al., 2005). Two hundred and fifty-seven EDUs have been defined for the United States (Higgins et al., 2005; Sowa et al., 2007).
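The distinction between local and network catchments can be illustrated with a short sketch. This is not the NHDPlus tooling: it assumes a simplified, non-braided network stored in a hypothetical `downstream` mapping, and it walks upstream from each reach to accumulate network catchment area and an area-weighted mean of a local attribute. All names and the toy values are illustrative.

```python
from collections import defaultdict


def network_summaries(local_area, local_value, downstream):
    """Accumulate local catchments into network catchments.

    local_area:  {reach_id: local catchment area (km^2)}
    local_value: {reach_id: mean of an attribute over the local catchment}
    downstream:  {reach_id: next reach downstream, or None at an outlet}
    Returns {reach_id: (network_area, area_weighted_mean)}.
    Assumes a tree-shaped network (no flow divergences) and positive areas.
    """
    # Invert the downstream links so each reach knows its direct tributaries.
    upstream = defaultdict(list)
    for reach, down in downstream.items():
        if down is not None:
            upstream[down].append(reach)

    result = {}
    for start in local_area:
        # Visit the reach itself and every reach upstream of it.
        stack, area, weighted = [start], 0.0, 0.0
        while stack:
            reach = stack.pop()
            area += local_area[reach]
            weighted += local_area[reach] * local_value[reach]
            stack.extend(upstream[reach])
        result[start] = (area, weighted / area)
    return result


# Toy network: headwater reaches A and B join to form reach C.
areas = {"A": 10, "B": 30, "C": 60}          # km^2
elevation = {"A": 100, "B": 200, "C": 300}   # local mean elevation (m)
flows_to = {"A": "C", "B": "C", "C": None}
print(network_summaries(areas, elevation, flows_to)["C"])  # (100.0, 250.0)
```

For a headwater reach the network catchment equals the local catchment, so the two summaries coincide; they diverge only downstream of a confluence.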
Each river reach was assigned to one of six size-strata based on catchment area (A): headwater streams (A ≤ 10 km²), creeks (10 < A ≤ 100 km²), small rivers


Table 1. Abiotic and anthropogenic variables used in the screening process with associated units, resolution, quartile and mean values from catchments across the entire study area, and source of data. NA = “not applicable” based on scale or to point feature data.

| Variable (units) | Scale/resolution | Local catchment 25% | Local catchment 75% | Local catchment mean | Network catchment 25% | Network catchment 75% | Network catchment mean | Source |
|---|---|---|---|---|---|---|---|---|
| Abiotic variables | | | | | | | | |
| Baseflow index (baseflow/total flow × 100) | 1:250,000 | NA | NA | NA | 32.28 | 52.35 | 42.65 | Wolock, 2003 |
| Mean local catchment elevation (masl) | 30 m | 154 | 966 | 645 | NA | NA | NA | Gesch et al., 2002 |
| Mean annual air temperature (°C) | 4 km | NA | NA | NA | 7.31 | 11.28 | 9.75 | USEPA and USGS (2005) |
| Mean annual precipitation (mm) | 4 km | NA | NA | NA | 884.2 | 1132 | 1008 | USEPA and USGS (2005) |
| Network catchment area (km²) | 1:100,000 | NA | NA | NA | 22.93 | 374.97 | 6076 | USEPA and USGS (2005) |
| Anthropogenic variables: land use in catchment (%) | | | | | | | | |
| Developed, open and low intensity | 30 m | 2.97 | 14.34 | 13.00 | 3.39 | 9.68 | 9.38 | Homer et al., 2007 |
| Developed, medium intensity | 30 m | 0 | 1.00 | 2.43 | 0.01 | 0.73 | 1.27 | Homer et al., 2007 |
| Developed, high intensity | 30 m | 0 | 0.08 | 0.88 | 0 | 0.20 | 0.43 | Homer et al., 2007 |
| Pasture/hay | 30 m | 0.08 | 34.20 | 20.31 | 0.83 | 47.93 | 25.59 | Homer et al., 2007 |
| Cultivated crops | 30 m | 0.28 | 16.02 | 10.74 | 2.56 | 16.92 | 11.49 | Homer et al., 2007 |
| Anthropogenic variables: density in catchment (per km²) | | | | | | | | |
| Population | 1 km | 3.95 | 43.50 | 90.43 | 5.09 | 39.95 | 57.07 | Dobson et al., 2000 |
| Road crossings | 1:100,000 | 0.12 | 0.79 | 0.97 | 0.30 | 0.68 | 0.55 | U.S. Census Bureau (2002) |
| Road length (m) | 1:100,000 | 1189 | 2867 | 2461.10 | 1242 | 2003 | 1856 | U.S. Census Bureau (2002) |
| Dams | NA | 0 | 0 | 0.03 | 0 | 0.02 | 0.02 | USACE (2012) |
| Mines or mineral processing plants | NA | 0 | 0 | 0.004 | 0 | 0.0002 | 0.0002 | USGS (2005) |
| Toxics Release Inventory sites | NA | 0 | 0 | 0.04 | 0 | 0.01 | 0.02 | USEPA (2012) |
| National pollutant discharge elimination system sites | NA | 0 | 0 | 0.01 | 0 | 0 | 0.001 | USEPA (2012) |
| Superfund national priority sites | NA | 0 | 0 | 0.001 | 0 | 0 | 0.001 | USEPA (2012) |

(100 < A ≤ 1000 km²), medium rivers (1000 < A ≤ 10,000 km²), large rivers (10,000 < A ≤ 25,000 km²), and great rivers (A > 25,000 km²).

2.2. Landscape data

We identified national datasets of physiographic, climatic, geomorphic, and anthropogenic landscape characteristics that met four conditions: (1) representative of conditions since 2000; (2) consistent across the study area in the way that they were assembled; (3) meaningful for assessing fish habitat based on our understanding of the influence of landscape characteristics on fluvial habitats (reviewed in Allan, 2004; Gergel et al., 2002); and (4) of sufficient resolution to draw comparisons between local catchments. From national datasets that met these conditions, we identified five physiographic and climatic variables shown in the literature to directly or indirectly influence fish community composition and structure at regional scales (Table 1): upstream catchment area, mean local catchment elevation, baseflow contribution to network catchment, and mean network catchment air temperature and precipitation. We also identified 13 anthropogenic stress variables, including land use types and catchment densities of human population, dams, roads, road crossings, and point sources of pollution and toxics (Table 1).

The values of each of the 13 landscape disturbance variables were summarized in local and network catchments of every river reach (Wang et al., 2011). Data summaries in local and network catchments were generated using the variable attribution and accumulation tools distributed with the NHDPlus (CA3T version 1.0.0, Horizon Systems, Herndon, VA). Depending on variable type (grid, polygon, point, line), we summarized the data as catchment means, catchment percentages, or densities of points or linear features per unit catchment area (Table 1).
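The three kinds of catchment summary can be written down in a few lines. The functions below are a minimal sketch, not the CA3T tool: the function names, argument names, and example numbers are hypothetical, and inputs are assumed to be already clipped to a single catchment.

```python
def percent_cover(class_cells, total_cells):
    """Percentage of a catchment's grid cells that fall in one land use class."""
    return 100.0 * class_cells / total_cells


def point_density(n_points, catchment_area_km2):
    """Point features (e.g., dams) per km^2 of catchment."""
    return n_points / catchment_area_km2


def line_density(total_length_m, catchment_area_km2):
    """Length of linear features (e.g., roads) in metres per km^2 of catchment."""
    return total_length_m / catchment_area_km2


# Illustrative values only.
print(percent_cover(2500, 10000))   # 25.0
print(point_density(3, 150.0))      # 0.02
print(line_density(4500.0, 2.5))    # 1800.0
```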
Means for network catchments were calculated as area-weighted averages to account for different contributions of local catchments to the area of each network catchment.

2.3. Fish data

We assembled fish community data from multiple sampling programs to calculate potential ecological condition indicators.

Fish community data were gathered from federal and state biological assessment programs that used single-pass boat, barge, or backpack electrofishing methods and that focused on collecting whole-community data. We used only datasets that met three criteria: collected since 1990, counts for all species captured in a sample, and geographic coordinates associated with each sampling site. Repeat samples were available at some sites. Data received from providers were subjected to a rigorous quality assurance procedure that involved verification of data quality and geographic accuracy of sites. Data quality was assessed by manually reviewing each dataset to ensure that non-game species were well-represented in all samples, that the gear used was listed as electrofishing, and that species counts for each sample were free of missing values or unusually large numbers that could have resulted from errors in data entry. Sample events that did not meet these criteria were discarded from the dataset.

The geographic accuracy of sites was verified by projecting sites into an equal-area projection in ArcGIS 9.x (ESRI Corp, Redlands, CA), calculating distance to the nearest NHDPlus river arc, and then individually checking the locations of all sites more than 50 m from an arc. Location checks were accomplished in Google Earth (Google Corp., Mountain View, CA) by comparing projected sample localities to river arcs overlain on aerial and satellite images to establish whether the site could be clearly associated with an NHDPlus arc, was on an unmapped tributary, or was unassociated with any obvious river feature. Descriptions of sample locations were used to verify nearby landmarks. Sites that could be clearly associated with an NHDPlus arc were snapped to that arc, while those that had ambiguous associations or were on unmapped river reaches were discarded. By these methods we verified the locations and quality of 49,340 sampling events at 26,468 sites.
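The distance screen used to flag sites for manual review can be sketched in plain Python. This is an illustration rather than the ArcGIS workflow: it assumes sites and river arcs are already in a projected coordinate system with units of metres, represents each arc as a list of vertices, and uses hypothetical function names. Only the 50 m threshold comes from the text.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_nearest_arc(site, arcs):
    """arcs: {arc_id: [(x, y), ...]} polylines. Returns (arc_id, distance_m)."""
    best_id, best_d = None, math.inf
    for arc_id, vertices in arcs.items():
        for a, b in zip(vertices, vertices[1:]):
            d = point_segment_distance(site, a, b)
            if d < best_d:
                best_id, best_d = arc_id, d
    return best_id, best_d


def needs_manual_check(site, arcs, threshold_m=50.0):
    """True when the site lies more than threshold_m from every arc."""
    return distance_to_nearest_arc(site, arcs)[1] > threshold_m


arcs = {"r1": [(0.0, 0.0), (100.0, 0.0)], "r2": [(0.0, 200.0), (100.0, 200.0)]}
print(distance_to_nearest_arc((50.0, 30.0), arcs))  # ('r1', 30.0)
print(needs_manual_check((50.0, 80.0), arcs))       # True
```

A brute-force scan over all segments is adequate for a sketch; a national dataset would normally rely on a spatial index, which is omitted here.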
Though data were collected in different field sampling programs, all protocols targeted the whole fish community using electrofishing methods. After controlling for potential systematic bias caused by differences in sampling programs (described below), the data were assumed to be sufficient for capturing inter-site differences in the presence and relative abundances of fishes from which we could calculate fish richness, functional, and tolerance group metrics. We attributed our fish taxonomic data with tolerance, habitat, trophic, and reproductive guilds from the trait matrix presented by Frimpong and Angermeier (2009) and the tolerance matrix of Esselman et al. (2011). We used an established list of candidate metrics (Stoddard et al., 2005a) as a basis for calculating 85 metrics for absolute and proportional taxa richness and for proportional abundance for each sample. Each metric was attributed with an expected direction of response (Supplement 1).

In most ecoregions, combining data from different state and federal fish sampling programs led to higher site densities in states with greater sampling effort. To reduce spatial bias in highly sampled regions, sites were sub-sampled to meet a target site density in each ecoregion equal to the mean site density across all EDUs in that region. We subsampled our dataset by EDU, taking a random sample of sites within stream size strata in proportion to the relative length of each size stratum in each EDU. This led to a reduction in site densities in EDUs with high densities and to retention of all sites in EDUs with below average site densities. Because of potentially strong differences in the ecological structure and function of large and great river systems, we did not use data from these strata in the current assessment. After subsampling and removal of large and great river sites, 12,279 sites remained.

2.4. Indicator screening

The indicator screening process requires identification of least- and most-disturbed sites. In the absence of local habitat data at sites, we relied instead on information about anthropogenic disturbances to local and network catchments that are known to cause local habitat disturbance. This approach is supported by a large body of research that demonstrates landscape effects of human activities on stream habitats and biota (reviewed in Allan, 2004; Gergel et al., 2002).
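One simple way to derive least- and most-disturbed classes from catchment disturbance data is rank averaging, which is the scheme this section goes on to describe. The sketch below is illustrative only: the function names are hypothetical, ties are given their average rank (the text does not say how ties were handled), and the cap that keeps the two groups from overlapping is an added assumption.

```python
def average_ranks(values):
    """Rank values from 1 (lowest stress); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def disturbance_groups(stress, fraction=0.10, minimum=50):
    """stress: {site_id: [value per variable/scale combination]}.

    Returns (least_disturbed, most_disturbed, mean_rank_by_site).
    """
    sites = list(stress)
    n_vars = len(stress[sites[0]])
    mean_rank = dict.fromkeys(sites, 0.0)
    for v in range(n_vars):
        ranks = average_ranks([stress[s][v] for s in sites])
        for s, r in zip(sites, ranks):
            mean_rank[s] += r / n_vars
    # Ten percent of sites or `minimum`, whichever is greater ...
    n_group = max(minimum, round(fraction * len(sites)))
    # ... capped (assumption) so the two groups cannot share sites.
    n_group = min(n_group, len(sites) // 2)
    ordered = sorted(sites, key=mean_rank.get)
    return ordered[:n_group], ordered[-n_group:], mean_rank


toy = {"a": [0, 0], "b": [1, 2], "c": [2, 1], "d": [5, 5]}
least, most, ranks = disturbance_groups(toy, fraction=0.25, minimum=1)
print(least, most)  # ['a'] ['d']
```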
Based on the local and network catchment stress data associated with each reach, we ranked sites according to their relative stress magnitude for each of the 13 stress variables at the two spatial scales (26 separate rankings total). Sites with lower rank values had lower relative stress magnitudes. We averaged the stress rank across all 26 variable/scale combinations to derive a measure of relative cumulative stress. For each of the aggregated ecoregions, we placed ten percent or 50 sites (whichever was greater) with the lowest mean stress ranks in the least-disturbed category, while the same number of sites with the highest mean stress rank was designated as most-disturbed. Thus, for regions with fewer than 500 sites, 50 sites were still used, with the exception of the Northern Plains region (N = 106), where we selected only 30 sites for the least- and most-disturbed categories.

Successive screening tests were applied to each of the 85 metrics in each region to identify those that were temporally stable and sensitive to landscape disturbance. Steps in this process were as follows:

1. Metric range and percent zero-values – Metrics with a narrow range are unlikely to vary sufficiently to discriminate site-level differences in condition, and those with a high proportion of zeros across sites are undesirable because they cannot be measured in many locations. Metrics with zero-values at >33% of sites were eliminated from consideration, as were richness metrics with a range of less than three species. The most recently collected sample at sites with repeat visits was used.

2. Adjust for natural gradients and systematic bias – Prior to further testing of candidate metrics, we controlled for variation due to natural environmental gradients and potential biases caused by differences in sampling protocols among the 35 programs from which we derived our national fish community data. We did this by developing boosted regression tree models for each community metric using important abiotic landscape predictors (Table 1) and a categorical variable representing sampling protocol. Models were fit to data from least-disturbed sites only and used to predict, for all sites, the metric scores that would be expected under least-disturbed conditions. The residuals of the least-disturbed model (residual = predicted − observed) were rescaled from 0 to 100 and used as the corrected biological metrics. Boosted regression trees are a machine-learning approach that combines many simple regression trees in an additive framework. Each tree trains on the residuals of the tree that precedes it until predictive deviance is minimized (see Elith, Leathwick, and Hastie, 2008, for an overview). We implemented each model with a tree complexity of 5 and a bag ratio of 0.5, and adjusted the learning rate of each model to minimize prediction error after 2000–3500 iterations.

3. Reproducibility – Community metrics that vary greatly in repeat samples at the same site are unlikely to be reliable indicators of inter-site differences. We used the signal-to-noise ratio (S/N), the ratio of variance across sites to variance of repeated samples at the same sites, as our measure of metric reproducibility. Metrics with higher S/N have a greater likelihood of being reproduced in samples taken at later dates and thus of serving as reliable indicators of inter-site differences in condition. Metrics with S/N less than three were excluded from further consideration.

4. Responsiveness test – Metrics that can discriminate most-disturbed from least-disturbed conditions are responsive to the disturbance gradient, and thus are good candidates for use as indicators of habitat condition. We used Z-tests of metric values between least- and most-disturbed sites to judge the ability of each metric to distinguish the extremes of the disturbance gradient. The absolute value of the Z-statistic serves as a single measure of the statistical difference between the means of each disturbance group.

5. Final metric selection and check for redundancy – All metrics with significant Z-test results that agreed with the expected direction of response were evaluated to compose regional MMIs. We proceeded by choosing the candidate metric that was most discriminating first, and then iteratively adding the most responsive metric from each metric category (trophic, reproductive, habitat, etc.) that was not redundant with metrics already selected in the index. If fewer than four metrics met these requirements, we then included the most discriminating non-redundant metrics from a metric category already represented in the final list of metrics. The metric selection process reflects the logic that the strongest MMIs are composed of the most responsive individual metrics that carry information unique from other metrics in the index (Stoddard et al., 2008; Van Sickle, 2010). Redundancy was assessed by evaluating Pearson correlation strengths between metrics at least-disturbed sites to verify that the metrics used to compose the MMI were not strongly correlated (|r| ≤ 0.70). Least-disturbed sites were used to assess redundancy so as not to inadvertently exclude metrics that co-varied as a result of anthropogenic disturbance (Stoddard et al., 2008).

6. Metric scoring and calculation of final MMI – MMIs were composed by summing the selected metrics and rescaling the final MMI to range from 0 to 100 to facilitate interpretation and comparison.
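Several of the screening steps above reduce to short calculations, sketched here under stated assumptions. The thresholds (33% zeros, a richness range of three, |r| of 0.70) come from the text; everything else is illustrative. In particular, S/N is computed from the variance of site means and a pooled within-site variance, the Z-statistic uses the unequal-variance form, and the greedy selection omits both the expected-direction check and the fallback used when fewer than four metrics qualify. Function and argument names are hypothetical.

```python
import math
from statistics import mean, variance


def passes_range_and_zero_test(values, is_richness,
                               max_zero_fraction=0.33, min_range=3):
    """Step 1: reject metrics with too many zeros or too narrow a range."""
    zero_fraction = sum(1 for v in values if v == 0) / len(values)
    if zero_fraction > max_zero_fraction:
        return False
    if is_richness and max(values) - min(values) < min_range:
        return False
    return True


def signal_to_noise(repeat_visits):
    """Step 3: among-site variance over pooled within-site variance.

    repeat_visits: {site_id: [metric value per visit]}, two or more visits
    per site and at least some variation between visits.
    """
    signal = variance([mean(v) for v in repeat_visits.values()])
    ss_within = sum(sum((x - mean(v)) ** 2 for x in v)
                    for v in repeat_visits.values())
    df_within = sum(len(v) - 1 for v in repeat_visits.values())
    return signal / (ss_within / df_within)


def z_statistic(least, most):
    """Step 4: standardized difference between the two group means."""
    se = math.sqrt(variance(least) / len(least) + variance(most) / len(most))
    return (mean(least) - mean(most)) / se


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def select_metrics(z_scores, category, least_values, max_r=0.70):
    """Step 5 (simplified): strongest |Z| first, one metric per category,
    skipping any metric correlated above max_r with one already chosen.
    least_values holds each metric's values at least-disturbed sites."""
    chosen, used = [], set()
    for name in sorted(z_scores, key=lambda m: abs(z_scores[m]), reverse=True):
        if category[name] in used:
            continue
        if any(abs(pearson(least_values[name], least_values[c])) > max_r
               for c in chosen):
            continue
        chosen.append(name)
        used.add(category[name])
    return chosen
```

As a usage example, `signal_to_noise({"s1": [10, 12], "s2": [20, 22], "s3": [30, 32]})` gives a ratio of 50, far above the cutoff of three, because the sites differ by tens of units while repeat visits differ by only two.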

Fig. 1. Densities of sampling sites in 8-digit USGS hydrologic units after subsampling (top), and the locations of least- and most-disturbed sites within the 9 aggregated ecoregions used to stratify the metric screening process (bottom).

The process used here is very similar to that presented by Stoddard et al. (2008), except that we used BRTs rather than multiple linear regression models to correct for natural gradients, and we conducted the reproducibility test after metric correction rather than before. We hypothesized that including natural variation in the S/N test would lead to inflation of the “signal” in the ratio. Given that our interest was to identify strong indicators of anthropogenic disturbance, we first removed variation attributable to natural gradients, thus providing a stronger test that between-site differences are caused by differences in stream condition rather than by sampling variation within a site. To test whether the order of steps had an important influence on our results, we compared S/N ratios of pre- and post-corrected metrics in each region using Student’s t-tests.
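The natural-gradient correction can be illustrated with a deliberately small boosting sketch in plain Python. It is not the authors' implementation: it uses single-split trees ("stumps", i.e., a tree complexity of 1 rather than 5), numeric predictors only (the categorical sampling-protocol variable is left out), a fixed number of trees, and min-max rescaling of residuals as one reading of "rescaled from 0 to 100". All function names and default values are hypothetical.

```python
import random
from statistics import mean


def fit_stump(X, residuals):
    """Best single-split tree for the residuals, or None if no split exists."""
    best = None
    for j in range(len(X[0])):
        # Dropping the largest value guarantees a non-empty right branch.
        for threshold in sorted({row[j] for row in X})[:-1]:
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            lm, rm = mean(left), mean(right)
            sse = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, lm, rm)
    return None if best is None else best[1:]


def fit_boosted(X, y, n_trees=200, learning_rate=0.05, bag_fraction=0.5, seed=0):
    """Each stump is fit to the current residuals of a random half of the data."""
    rng = random.Random(seed)
    base, stumps = mean(y), []
    current = [base] * len(X)              # running prediction for every row
    n_bag = max(2, int(bag_fraction * len(X)))
    for _ in range(n_trees):
        idx = rng.sample(range(len(X)), n_bag)
        stump = fit_stump([X[i] for i in idx], [y[i] - current[i] for i in idx])
        if stump is None:
            continue
        stumps.append(stump)
        j, t, lm, rm = stump
        for i, row in enumerate(X):
            current[i] += learning_rate * (lm if row[j] <= t else rm)
    return base, learning_rate, stumps


def predict(model, row):
    base, learning_rate, stumps = model
    return base + sum(learning_rate * (lm if row[j] <= t else rm)
                      for j, t, lm, rm in stumps)


def corrected_metric(X_all, y_all, is_least_disturbed, **fit_options):
    """Fit on least-disturbed sites, predict everywhere, and rescale the
    residuals (predicted - observed) to run from 0 to 100."""
    X_ref = [x for x, ref in zip(X_all, is_least_disturbed) if ref]
    y_ref = [v for v, ref in zip(y_all, is_least_disturbed) if ref]
    model = fit_boosted(X_ref, y_ref, **fit_options)
    residuals = [predict(model, x) - obs for x, obs in zip(X_all, y_all)]
    lo, hi = min(residuals), max(residuals)
    return [100.0 * (r - lo) / (hi - lo) for r in residuals]
```

Fitting to least-disturbed sites only means the model describes what a metric should look like along natural gradients in the absence of heavy disturbance, so the residual at any site is a departure from that expectation rather than from the regional average.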

2.5. Multi-metric index and testing

Validation of the final indicators is necessary to impart confidence in their use. The metric screening process provides an internal validation of each metric’s reproducibility, responsiveness, and redundancy relative to available datasets. However, external validation against independent datasets is a stronger test of metric performance. As an initial test of external validity we used our indicators to compose an MMI in each region, and we compared them with independently derived multimetric indices developed using similar methods. We compared our multimetric score with the vertebrate index of biotic integrity (IBI) scores calculated using Western Environmental Monitoring and Assessment Program (EMAP) data by Stoddard et al. (2005b) and the predictive MMI of Pont et al. (2009). We used least squares regression to examine the fit between our multimetric scores and the Western EMAP vertebrate IBI scores in the Western Mountain (WMT) and Xeric West (XER) ecoregions (Fig. 1). The WMT and XER regions were the only ones for which we had access to MMI scores derived using similar methods.
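A least squares comparison of two indices needs only a slope, an intercept, and a coefficient of determination. The sketch below assumes, for illustration, that the new MMI is the predictor and the independent index is the response, with scores paired by site; the text does not state which index was treated as the response. The function name and example scores are hypothetical.

```python
from statistics import mean


def least_squares(x, y):
    """Ordinary least squares fit of y on x.

    Returns (slope, intercept, r_squared).
    """
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Made-up paired scores: the second index sits a constant 2 points higher.
our_mmi = [10.0, 20.0, 30.0, 40.0]
other_ibi = [12.0, 22.0, 32.0, 42.0]
print(least_squares(our_mmi, other_ibi))  # (1.0, 2.0, 1.0)
```

Read this way, a slope near 1 with an intercept near 0 suggests little bias between the two indices, while the coefficient of determination speaks to precision.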

3. Results 3.1. Indicator screening The metric screening process led to a reduction in candidate metrics from 85 to five or less in each region that met all screening criteria (Table 2). The test for zero values led to the greatest reduction in candidate metrics under consideration in the Coastal Plain (CPL), Northern Plains (NPL), Upper Midwest (UMW), Western Mountain (WMT), and Xeric West (XER) regions. No metrics were eliminated by the range test, suggesting that fish data varied sufficiently to indicate a broad range of conditions. The reproducibility test led to the greatest reduction in candidate metrics in the Northern Appalachian (NAP), Southern Plains (SPL), and Temperate Plains (TPL) regions. In the Southern Appalachian (SAP) region, the responsiveness and redundancy tests led to the greatest reductions. Boosted regression tree models were developed for 94% (445/475) of the metrics that remained after the zero and range test. For the other 6% of metrics, a minimum number of boosted regression tree iterations (n = 50) failed to reduce model deviance, implying that they were not strongly correlated to the natural gradient variables used in our models. The 6% of fish metrics with no model (N = 30) were used for further metric screening without correction. Regional summaries of boosted regression tree results

168

P.C. Esselman et al. / Ecological Indicators 26 (2013) 163–173

Table 2
Summary of the screening process by region. The columns Start through Redun. give the number of metrics remaining after each test. S/N = signal-to-noise test. Resp. = Z test for responsiveness. Redun. = test for redundancy and agreement of response direction with expectations.

| Abbrev. | Region | N | Start | Zero/range | S/N | Resp. | Redun. |
|---|---|---|---|---|---|---|---|
| CPL | Coastal Plain | 446 | 85 | 54 | 24 | 10 | 4 |
| NAP | Northern Appalachians | 4672 | 85 | 51 | 17 | 16 | 4 |
| NPL | Northern Plains | 106 | 85 | 42 | 31 | 1 | 0 |
| SAP | Southern Appalachians | 1896 | 85 | 66 | 57 | 48 | 5 |
| SPL | Southern Plains | 521 | 85 | 54 | 25 | 14 | 3 |
| TPL | Temperate Plains | 2365 | 85 | 70 | 32 | 21 | 4 |
| UMW | Upper Midwest | 1426 | 85 | 53 | 27 | 19 | 4 |
| WMT | Western Mountain | 625 | 85 | 42 | 36 | 30 | 4 |
| XER | Xeric West | 222 | 85 | 43 | 32 | 20 | 4 |

showed that metric correlations to natural gradients and sampling protocols were highly variable (Table 3). Mean regional correlation strengths between observed and model predicted values ranged from r = 0.27 (±0.21 SE) in the NPL region to r = 0.56 (±0.05 SE) in the SAP region, averaging 0.41 (±0.14 SE) across all models in all regions. Regional minimum correlation strengths suggested that some metrics were not related to regional environmental gradients, while maximum correlation strengths from 0.61 in the NAP region to a high of 0.89 in the CPL region indicate that spatial variation in other metrics was strongly linked to natural gradients (Table 3).

Correcting for natural gradients and sampling protocols prior to running the test for reproducibility had an important influence on candidate metrics, particularly those measuring richness. In almost every region, we observed a general trend of lower S/N ratios after correction for natural gradients (Table 4), as a result of reduced among-site variability (i.e., “signal”) after accounting for variation associated with natural gradients and sampling protocols. Student’s t-tests comparing mean S/N ratios of metrics before and after correction, run for all metric categories combined and for percent individuals, percent taxa, and richness metrics separately, showed that richness metrics corrected for abiotic gradients had significantly lower S/N ratios in five of the nine regions (p < 0.10). The trend toward lower post-correction S/N values was present for relative abundance and relative richness metrics in most regions, but the differences were not statistically significant (Table 4). Had we corrected for natural gradients after the reproducibility test, an additional 43 metrics would have passed to the responsiveness test.

Responsiveness testing identified metrics in each region that could statistically distinguish between sites in least- and most-disturbed catchments. The NPL region had only one metric with significant differences between the means of the least and most disturbed sites (Table 2; Supplement 2), and this metric (% native herbivore taxa) responded negatively to stress, contrary to the expectation that herbivores tend to benefit from anthropogenic disturbance. Responsive metrics were more numerous in other regions, ranging from 10 metrics in the CPL region to 48 in the SAP region (Table 2, Supplement 2).

The final metrics for each region matched our expected response and exhibited low percentages of zeros, sufficient range, good reproducibility, sensitivity to the extremes of the disturbance gradient, and low redundancy with the other metrics chosen (Table 5). The majority of metrics selected exhibited a high degree of responsiveness to the disturbance gradient as measured by the Z-test. p-Values for 15 of the 29 metrics (52%) were less than 0.0001, indicating a strong ability to discern least- from most-disturbed sites. Some of the metrics selected for regional MMIs included fish community attributes that responded positively to increased disturbance. For instance, percent omnivore taxa showed a strong positive response to increased disturbance in the Southern Appalachian region. The most frequent metric classes selected were habitat (10 metrics), trophic (9 metrics), reproductive (6 metrics), tolerance (5 metrics), and composition (2 metrics). In the CPL, SPL, and TPL regions, it was necessary to include two non-redundant metrics from the same metric categories to have at least three metrics in each region to calculate an MMI.
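The responsiveness test can be illustrated with a short sketch. The exact form of the Z-test is not given in this section, so the code below assumes a standard two-sample Z statistic with a two-sided normal p-value; the function name `two_sample_z` and the example values are hypothetical.

```python
import math
from statistics import mean, variance


def two_sample_z(least, most):
    """Two-sample Z statistic and two-sided normal p-value.

    `least` and `most` hold corrected metric values at least- and
    most-disturbed sites. A positive Z means the metric is higher at
    least-disturbed sites, i.e. it declines with disturbance.
    """
    n1, n2 = len(least), len(most)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two sites")
    se = math.sqrt(variance(least) / n1 + variance(most) / n2)
    if se == 0:
        raise ValueError("no variation within either group")
    z = (mean(least) - mean(most)) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p


# Hypothetical metric values (illustrative only).
least_disturbed = [42.0, 38.5, 51.0, 47.2, 44.8, 40.1]
most_disturbed = [30.3, 28.9, 35.5, 26.0, 33.1, 31.7]
z, p = two_sample_z(least_disturbed, most_disturbed)
print(f"Z = {z:.2f}, p = {p:.4g}")
```

The sign of Z gives the direction of response, which is what would be compared against the expected response direction during screening.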

3.2. Multi-metric index testing

Our MMIs were significantly correlated to the independently developed IBIs of Whittier et al. (2007) and Pont et al. (2009) for the WMT and XER regions. Our MMI was most strongly related to Whittier et al.’s (2007) vertebrate IBI for the WMT region (r² = 0.69; p < 0.0001), followed by their vertebrate IBI for the XER region (r² = 0.37; p < 0.0001), then to Pont et al.’s (2009) modeled IBI scores in the XER (r² = 0.33; p < 0.0001) and WMT regions (r² = 0.11; p < 0.0001) (Fig. 2). The scatter of points around the 1:1 line indicates a tendency for our MMI to have lower scores than Whittier et al.’s (2007) IBI at the low end of the disturbance gradient, and higher scores at the high end of the gradient (Figs. 2a and b). Compared to Pont et al.’s (2009) modeled IBI, neither regional comparison showed close correspondence to the 1:1 line, with most of Pont et al.’s scores falling in the intermediate range of values across the full range of our MMI (Fig. 2c), and a general tendency for our MMI scores to be higher across almost the full range of values in the Xeric region (Fig. 2d).

Table 3
Summary of boosted regression tree models of abiotic influences on sites in least-disturbed landscapes of each region. Mean ± SE, Min, and Max refer to the cross-validation correlation.

| Region | N models trained | N least disturbed sites | Mean ± SE | Min | Max |
|---|---|---|---|---|---|
| CPL | 52 | 45 | 0.46 ± 0.10 | −0.03 | 0.89 |
| NAP | 51 | 467 | 0.43 ± 0.04 | 0.18 | 0.61 |
| NPL | 34 | 30 | 0.27 ± 0.21 | −0.19 | 0.79 |
| SAP | 66 | 190 | 0.56 ± 0.05 | 0.17 | 0.88 |
| SPL | 44 | 52 | 0.39 ± 0.13 | −0.04 | 0.73 |
| TPL | 70 | 237 | 0.50 ± 0.05 | 0.09 | 0.74 |
| UMW | 53 | 143 | 0.46 ± 0.07 | 0.16 | 0.68 |
| WMT | 38 | 63 | 0.30 ± 0.13 | −0.01 | 0.65 |
| XER | 37 | 30 | 0.32 ± 0.20 | −0.10 | 0.67 |
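The idea of correcting a metric for natural gradients with boosted regression trees can be sketched in a few lines. The following is a toy illustration under stated assumptions, not the modeling software or settings used in the study: it boosts single-split trees (stumps) under squared-error loss, omits the stochastic subsampling and cross-validation of a full implementation, and uses hypothetical names (`fit_boosted_stumps`, `predict`, `correct_metric`) and made-up data. The model is trained on least-disturbed sites only, and the corrected metric is the residual (observed minus expected).

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Best single split on one predictor: (sse, threshold, left_mean, right_mean) or None."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = mean(left), mean(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best  # None when the predictor takes a single value


def fit_boosted_stumps(X, y, n_trees=100, learning_rate=0.1):
    """Gradient boosting with stumps; each stump is fit to the current residuals."""
    base = mean(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [obs - p for obs, p in zip(y, pred)]
        candidates = []
        for j in range(len(X[0])):
            stump = fit_stump([row[j] for row in X], residuals)
            if stump is not None:
                candidates.append((stump[0], j, stump[1], stump[2], stump[3]))
        if not candidates:
            break
        _, j, threshold, left_mean, right_mean = min(candidates)
        stumps.append((j, threshold, left_mean, right_mean))
        pred = [p + learning_rate * (left_mean if row[j] <= threshold else right_mean)
                for p, row in zip(pred, X)]
    return base, learning_rate, stumps


def predict(model, X):
    """Expected metric value at each site given its natural-gradient predictors."""
    base, learning_rate, stumps = model
    out = []
    for row in X:
        value = base
        for j, threshold, left_mean, right_mean in stumps:
            value += learning_rate * (left_mean if row[j] <= threshold else right_mean)
        out.append(value)
    return out


def correct_metric(model, X, observed):
    """Corrected metric = observed value minus the value expected from natural gradients."""
    return [obs - exp for obs, exp in zip(observed, predict(model, X))]


# Hypothetical least-disturbed sites: predictors are [catchment area (km2), elevation (m)].
reference_X = [[10, 900], [25, 700], [60, 500], [120, 300], [300, 150], [800, 50]]
reference_richness = [4.0, 5.0, 8.0, 11.0, 15.0, 18.0]
model = fit_boosted_stumps(reference_X, reference_richness, n_trees=200)

# Two hypothetical test sites in the same natural setting, one with fewer species.
test_X = [[120, 300], [120, 300]]
test_richness = [11.0, 6.0]
print(correct_metric(model, test_X, test_richness))
```

Because the expected value depends only on natural predictors, two sites in the same setting receive the same expectation, and the site with fewer species ends up with the more negative corrected value.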


Table 4
T-test results comparing mean metric S/N ratios before and after correction for natural gradients. Region codes are the same as those in Table 2. Metric types include all metrics combined (ALL), relative abundance (PIND), relative richness (PTAX), and absolute richness (RICH). N refers to the number of metrics in each class of the comparison. StDev = standard deviation. t-Value and p are the test statistic and significance level for each t-test, respectively. Bold results indicate those that are significantly different after correction.

| Region | Type | N | Uncorrected mean | Uncorrected StDev | Corrected mean | Corrected StDev | t-Value | p |
|---|---|---|---|---|---|---|---|---|
| CPL | ALL | 52 | 3.39 | 1.74 | 3.40 | 1.55 | −0.05 | 0.9615 |
| CPL | PIND | 17 | 3.36 | 1.26 | 3.47 | 1.22 | −0.27 | 0.7883 |
| CPL | PTAX | 16 | 3.39 | 2.32 | 3.68 | 1.98 | −0.37 | 0.7121 |
| CPL | RICH | 19 | 3.41 | 1.63 | 3.11 | 1.42 | 0.60 | 0.5501 |
| NAP | ALL | 51 | 2.97 | 0.84 | 2.81 | 0.63 | 1.12 | 0.2658 |
| NAP | PIND | 16 | 3.07 | 0.90 | 3.00 | 0.69 | 0.24 | 0.8095 |
| NAP | PTAX | 15 | 2.72 | 0.91 | 2.87 | 0.65 | −0.50 | 0.6236 |
| NAP | RICH | 20 | 3.09 | 0.74 | 2.61 | 0.54 | 2.31 | 0.0263 |
| NPL | ALL | 34 | 16.18 | 12.04 | 16.03 | 12.84 | 0.05 | 0.9624 |
| NPL | PIND | 9 | 12.47 | 7.67 | 15.44 | 14.74 | −0.56 | 0.5839 |
| NPL | PTAX | 9 | 7.93 | 6.46 | 5.99 | 3.81 | 0.76 | 0.4567 |
| NPL | RICH | 16 | 22.62 | 13.21 | 22.02 | 11.83 | 0.13 | 0.8938 |
| SAP | ALL | 66 | 4.21 | 1.05 | 3.90 | 0.89 | 1.79 | 0.0753 |
| SAP | PIND | 23 | 3.85 | 1.15 | 3.75 | 1.09 | 0.30 | 0.7652 |
| SAP | PTAX | 18 | 4.11 | 1.01 | 4.01 | 0.86 | 0.30 | 0.7631 |
| SAP | RICH | 25 | 4.61 | 0.72 | 3.97 | 0.70 | 3.17 | 0.0026 |
| SPL | ALL | 44 | 3.42 | 1.00 | 3.24 | 0.77 | 0.95 | 0.3430 |
| SPL | PIND | 11 | 2.82 | 0.79 | 2.86 | 1.07 | −0.11 | 0.9126 |
| SPL | PTAX | 13 | 3.67 | 1.37 | 3.59 | 0.56 | 0.20 | 0.8432 |
| SPL | RICH | 20 | 3.58 | 0.68 | 3.21 | 0.60 | 1.82 | 0.0769 |
| TPL | ALL | 70 | 3.31 | 0.80 | 3.08 | 0.77 | 1.80 | 0.0746 |
| TPL | PIND | 25 | 2.98 | 0.70 | 2.85 | 0.67 | 0.63 | 0.5310 |
| TPL | PTAX | 19 | 3.15 | 0.74 | 2.96 | 0.69 | 0.82 | 0.4149 |
| TPL | RICH | 26 | 3.76 | 0.76 | 3.37 | 0.83 | 1.74 | 0.0882 |
| UMW | ALL | 53 | 3.36 | 1.11 | 3.23 | 0.87 | 0.66 | 0.5111 |
| UMW | PIND | 17 | 2.93 | 0.44 | 2.96 | 0.47 | −0.20 | 0.8421 |
| UMW | PTAX | 15 | 2.50 | 0.46 | 2.66 | 0.45 | −0.99 | 0.3309 |
| UMW | RICH | 21 | 4.33 | 1.10 | 3.87 | 0.96 | 1.46 | 0.1519 |
| WMT | ALL | 38 | 6.50 | 2.63 | 6.36 | 2.56 | 0.23 | 0.8195 |
| WMT | PIND | 11 | 7.57 | 3.90 | 7.32 | 4.25 | 0.15 | 0.8854 |
| WMT | PTAX | 10 | 5.40 | 1.58 | 4.99 | 1.18 | 0.65 | 0.5229 |
| WMT | RICH | 17 | 6.46 | 1.89 | 6.56 | 1.12 | −0.18 | 0.8549 |
| XER | ALL | 37 | 5.54 | 2.79 | 4.42 | 1.81 | 2.05 | 0.0400 |
| XER | PIND | 11 | 5.85 | 2.44 | 5.03 | 2.34 | 0.80 | 0.4332 |
| XER | PTAX | 10 | 7.37 | 4.09 | 5.33 | 1.86 | 1.44 | 0.1679 |
| XER | RICH | 16 | 4.19 | 0.67 | 3.43 | 0.57 | 3.48 | 0.0016 |
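The t-values in Table 4 can be approximately recomputed from the tabulated means, standard deviations, and N. The sketch below assumes an independent two-sample Student's t-test with pooled variance, which is an assumption on our part since the exact test configuration is not stated here; the function name `pooled_t` is hypothetical.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t statistic and degrees of freedom from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


# Richness metrics in the Southern Appalachian (SAP) region, values from Table 4:
# uncorrected mean 4.61 (StDev 0.72), corrected mean 3.97 (StDev 0.70), N = 25.
t, df = pooled_t(4.61, 0.72, 25, 3.97, 0.70, 25)
print(f"t = {t:.2f}, df = {df}")
```

The recomputed statistic is close to the tabulated value of 3.17; small differences are expected because the tabulated means and standard deviations are rounded to two decimal places.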

4. Discussion

Management of aquatic ecosystems in the face of rapid global change requires information about the condition of ecosystems at extents that are relevant to the decisions being made (Angermeier and Winston, 1999; Fausch et al., 2002; Meador et al., 2008). For national-scale management initiatives, we need to define indicators that respond consistently to anthropogenic disturbances over large regions that vary physiographically and biogeographically. Indicators defined at this extent are necessarily different from those defined for smaller-scale assessments, and the latter may not be effective at larger scales (Ode et al., 2008). Indicators for large spatial extents must respond consistently to a wide range of anthropogenic conditions across variable abiotic contexts and must account for greater taxonomic and functional diversity and a greater diversity of community types. Working at larger extents also leads to greater data demands to characterize the range of both natural and anthropogenic responses within the region of interest. This can lead to a reliance on landscape proxies for local scale habitat variation and on biological data assembled from multiple sources, as it did here.

Using one of the most extensive fish abundance datasets yet compiled for the United States, we identified repeatable and responsive indicators in eight of the nine aggregated ecoregions of the conterminous US. We had little success identifying repeatable and responsive metrics for the Northern Plains (NPL) ecoregion, likely due to low sample size in this region (N = 106). The one metric with a statistically significant ability to discriminate least from most disturbed sites in the NPL (percent native herbivore taxa) had a response that was opposite (negative) from that expected from the literature (Supplement 2), and thus we excluded it from the final MMI calculation. In most of the other regions, more than a dozen metrics passed the responsiveness test (Supplement 2), but many of these were redundant with one another. Other combinations of non-redundant metrics could have been selected in place of those presented in Table 5, but with lower power to discriminate between sites in least- and most-disturbed landscapes. We observed a general trend of increasing numbers of responsive metrics with increasing sample sizes in a region, except in the TPL and NAP regions, which had high sample sizes (N = 2365 and 4672, respectively; Table 2) but low numbers of responsive metrics relative to other regions (Table 2). Previous studies have also had difficulty defining metrics in the TPL and NAP regions. In the TPL, reference sites have been observed to be of low quality due to the ubiquity of agricultural land use and associated legacies (Bramblett et al., 2005; Stoddard et al., 2005b). Our results in the TPL region may have been similarly


Table 5
Final metrics selected in each region showing direction of response (±), % zeros, S/N ratio, and results of sensitivity analysis. Abs. Z = absolute value of Z statistic (|Z|). p = significance level for the Z test.

| Region | Class | Metric code | Description | Response (±) | % Zeros | S/N | Abs. Z | p |
|---|---|---|---|---|---|---|---|---|
| CPL – Coastal Plains Region | Trophic | INV NAT RICH | Native invertivore richness | − | 25.34 | 5.20 | 4.53 | <0.0001 |
| | Trophic | OMNI PTAX | % omnivore taxa | + | 0.13 | 5.14 | 3.96 | <0.0001 |
| | Habitat | LOTC NAT PTAX | % native lotic taxa | − | 13.75 | 4.75 | 3.19 | 0.0014 |
| | Composition | CYPR PIND | % cyprinid individuals | − | 20.39 | 4.37 | 2.88 | 0.0039 |
| NAP – Northern Appalachian Region | Habitat | RIVR NAT PTAX | % native large river taxa | + | 12.33 | 3.89 | 23.74 | <0.0001 |
| | Reproductive | HIDE NAT PTAX | % native egg hider taxa | − | 18.16 | 3.49 | 22.43 | <0.0001 |
| | Tolerance | INTOL PIND | % intolerant individuals | − | 19.91 | 3.39 | 16.38 | <0.0001 |
| | Trophic | INVPISC PIND | % invertivore/piscivore individuals | − | 12.43 | 3.20 | 8.35 | <0.0001 |
| SAP – Southern Appalachian Region | Habitat | RIVR NAT PTAX | % native large river taxa | + | 2.31 | 6.48 | 11.98 | <0.0001 |
| | Reproductive | HIDE NAT PIND | % native egg hider individuals | − | 5.47 | 4.03 | 11.46 | <0.0001 |
| | Trophic | HERB RICH | Herbivore richness | + | 1.95 | 3.58 | 9.41 | <0.0001 |
| | Composition | CYPR PIND | % cyprinid individuals | − | 3.03 | 3.24 | 5.93 | <0.0001 |
| | Tolerance | TE PIND | % threatened and endangered individuals | − | 31.86 | 3.98 | 5.31 | <0.0001 |
| SPL – Southern Plains Region | Habitat | LOTC NAT PIND | % native lotic individuals | − | 11.98 | 5.70 | 3.91 | <0.0001 |
| | Habitat | RIVR NAT RICH | Native large river species richness | + | 0.87 | 3.54 | 3.90 | <0.0001 |
| | Trophic | OMNI RICH | Omnivore richness | + | 0.29 | 3.50 | 3.41 | 0.0006 |
| TPL – Temperate Plains Region | Habitat | RIVR NAT PTAX | % native large river taxa | + | 0.76 | 4.25 | 11.27 | <0.0001 |
| | Trophic | OMNI RICH | Omnivore richness | + | 0.26 | 3.56 | 6.58 | <0.0001 |
| | Habitat | RHEO NAT PIND | % native rheophilic individuals | − | 8.05 | 3.28 | 3.33 | 0.0009 |
| | Reproductive | LITH PTAX | % lithophilic spawner taxa | − | 4.56 | 3.71 | 2.08 | 0.0374 |
| UMW – Upper Midwest Region | Tolerance | TE RICH | Threatened and endangered species richness | − | 23.41 | 3.63 | 10.29 | <0.0001 |
| | Habitat | RIVR NAT PTAX | % native large river taxa | + | 7.55 | 3.10 | 9.45 | <0.0001 |
| | Reproductive | LITH PIND | % lithophilic spawner individuals | − | 6.31 | 3.07 | 5.82 | <0.0001 |
| | Trophic | OMNI RICH | Omnivore richness | + | 0.20 | 4.37 | 5.53 | <0.0001 |
| WMT – Western Mountain Region | Reproductive | HIDE NAT PIND | % native egg hider individuals | − | 12.01 | 9.06 | 69.01 | <0.0001 |
| | Tolerance | INTOL PIND | % intolerant individuals | − | 28.64 | 18.60 | 63.78 | <0.0001 |
| | Trophic | PISC PIND | % piscivore individuals | − | 2.31 | 7.19 | 40.79 | <0.0001 |
| | Habitat | RIVR NAT PIND | % native large river individuals | − | 7.77 | 4.54 | 40.56 | <0.0001 |
| XER – Xeric West Region | Reproductive | LITH PTAX | % lithophilic spawner taxa | − | 11.60 | 6.37 | 5.23 | <0.0001 |
| | Habitat | WCOL NAT PIND | % native water column individuals | + | 5.10 | 4.26 | 3.88 | <0.0001 |
| | Tolerance | TE RICH | Threatened and endangered species richness | − | 18.56 | 3.92 | 3.59 | 0.0003 |
| | Trophic | HERB RICH | Herbivore richness | + | 12.76 | 3.43 | 2.96 | 0.0030 |

affected by the lack of a sufficiently wide gradient of ecological condition to define a robust indicator set. In the NAP region, defining effective fish indicators of ecological condition is made challenging by the relatively depauperate post-glacial fauna composed of generalist species that have colonized since the end of the last ice age (Halliwell et al., 1999). The generalist character of the NAP fauna is consistent with the low numbers of metrics passing the S/N test (Table 2), as low faunal variation among sites would lead to lower signal variances (S). Despite the small numbers of metrics available for responsiveness testing, the subset of metrics selected for our MMI had very high responsiveness to the landscape disturbance gradient (Table 5), indicated by very high statistical significance (low p-values).

When composed in a multi-metric index, our indicators correlated strongly to the spatially extensive regional vertebrate IBI on which we based our methods and drew our data (Fig. 2a and b; Whittier et al., 2007). The strength of correlation is striking given that: (1) we excluded amphibians and reptiles from our data, (2) our approaches to defining least and most disturbed sites and to modeling were different, and (3) the final metrics selected were largely different (except for the use of tolerance-based metrics in XER and WMT and lithophilic spawners in XER). We defined the disturbance gradient from landscape disturbances only, eliminating the need for expert opinion (Whittier et al., 2007). This lends support to the validity of using landscape proxies and also suggests that the generalized approach of Whittier et al. (2007) and Stoddard et al. (2008) may be robust to slight methodological changes and still produce similar results. The weaker agreement to the modeled IBI scores of Pont et al. (2009) likely reflects stronger divergences in the methods used by Pont et al. (2009) even though their dataset and least disturbed sites were identical to those used by Whittier et al. (2007). In particular, Pont et al.’s (2009) approach differed in the way metric residuals were treated after correction by models (transformation to standard deviation units followed by a normal probability transformation). The stronger divergence of Pont et al.’s (2009) scores from ours and from those of Whittier et al. (2007) (see Fig. 5 in Pont et al., 2009) suggests that differences in modeling (multiple linear regression versus boosted regression) may be less influential than the treatment of residuals and composition of final scores after the modeling phase is complete.

Our indicator screening process differs from that of Whittier et al. (2007) and Stoddard et al. (2008) in several ways. First, we


Fig. 2. Our multimetric scores against other regional MMIs: (a) Western EMAP IBI results for the Western Mountain and (b) Xeric West regions (Whittier et al., 2007); modeled IBI scores of Pont et al. (2009) for the Western Mountain (c) and Xeric West (d) regions. The solid line represents the best fit line, while the dotted line represents a perfect (1:1) fit.

conducted the S/N test after correcting for natural gradients rather than before, as practiced by these authors. The purpose of the S/N test is to identify metrics that have high variation in space (“signal”) relative to variation in time (“noise”) to ensure that between-site differences are caused by differences in stream condition rather than by sampling variation within a site (Kaufmann et al., 1999). If the signal is inflated by natural abiotic differences between sites rather than anthropogenic factors, then the test does not serve its purpose of identifying temporally stable indicators of biological condition. Assuming that models effectively control for natural variation, changing the order of the S/N screening should lead to signal values that are primarily due to differences in condition caused by anthropogenic drivers. Our comparison of S/N ratios of uncorrected and natural-gradient-corrected metrics (Table 4) suggests that the order of the S/N test can have significant effects on the screening process.

Another difference between our screening process and that of previous work in the western US was our use of a non-parametric modeling approach instead of linear regression (Pont et al., 2009; Stoddard et al., 2008; Whittier et al., 2007). Large regional landscape data frequently have skewed distributions and non-linear responses, so often they do not meet the assumptions for standard regression and ANOVA techniques, even after data transformations (Esselman et al., 2011). Machine-learning approaches like boosted regression trees work well with skewed data that exhibit non-linear responses to predictors (De’ath, 2007). High average correlation strengths between modeled and observed metric values indicated that boosted regression trees were well-suited to the task of accounting for natural variation (Table 3). Cao et al. (2007) used classification and regression trees (CART) – the technique that forms the basis of boosted regression trees – to partition variation in diatom-based biological indicators due to natural environmental factors. They found that the CART-adjusted metrics estimated the underlying condition gradient more precisely and were less prone to Type I and Type II errors. Like CART, boosted regression trees offer distinct advantages over parametric alternatives like multiple linear regression, including the ability to use a wide range of data types (numeric, binary, categorical, etc.), insensitivity to outliers, and freedom from assumptions about the form of the relationship between predictor and response variables (De’ath and Fabricius, 2000). Compared with CART, boosted regression trees tend to have greater predictive accuracy, and they can model smooth functions and interactions (Elith et al., 2008; Friedman, 2002). Relative to studies that corrected for natural gradients using linear regression, nonlinear models like BRTs will often lead to a better fit to data and therefore to lower residual variation and less bias in the variation around the best fit line.

Regional metrics of ecological condition have great potential for application to river assessment and conservation planning. Biological indicator datasets at sampled reaches have recently been used to train models that predict ecological condition at all reaches in a spatially continuous manner (Carlisle et al., 2009; Riseng et al., 2010). Such datasets can be used to calibrate spatially continuous indices of cumulative landscape disturbance (Esselman et al., 2011; Wang et al., 2008), and also to characterize drivers of ecological condition loss (Van Sickle and Paulsen, 2008). In our prior work characterizing cumulative landscape disturbance to habitats throughout the conterminous U.S., we concluded that assessments over such a large extent would benefit from stratification of the country into biophysically similar regions, from inclusion of multiple biological response variables per region, and from application of non-linear statistical techniques to identify metrics and model their responses to landscape indicators (Esselman et al., 2011). The candidate metrics defined here (Table 4; Supplement 2) satisfy all of these recommendations and should lead to a greater ability in follow-up work to discern and compare the relative influences of landscape stresses on stream ecological condition.
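The S/N screening discussed in this section can be illustrated with a simplified calculation. The sketch below is not the variance-components procedure of Kaufmann et al. (1999); it assumes that "signal" can be approximated by the variance of site means and "noise" by the pooled within-site variance from repeat visits. The function name `signal_to_noise` and the example data are hypothetical.

```python
from statistics import mean, variance


def signal_to_noise(visits_by_site):
    """Ratio of among-site variance to pooled within-site (repeat-visit) variance.

    `visits_by_site` maps a site identifier to the list of metric values
    recorded on each visit. Sites visited once contribute to the signal only.
    """
    repeated = [v for v in visits_by_site.values() if len(v) >= 2]
    if len(visits_by_site) < 2 or not repeated:
        raise ValueError("need at least two sites and one revisited site")
    site_means = [mean(v) for v in visits_by_site.values()]
    signal = variance(site_means)
    ss_within = sum((len(v) - 1) * variance(v) for v in repeated)
    df_within = sum(len(v) - 1 for v in repeated)
    noise = ss_within / df_within
    if noise == 0:
        raise ValueError("no within-site variation; ratio is undefined")
    return signal / noise


# Hypothetical corrected metric values (illustrative only).
visits = {
    "site_a": [12.0, 14.0],
    "site_b": [25.0, 23.0, 24.0],
    "site_c": [40.0],
    "site_d": [31.0, 35.0],
}
print(f"S/N = {signal_to_noise(visits):.2f}")
```

Running the same calculation on uncorrected and on natural-gradient-corrected values of a metric shows directly how removing natural among-site variation lowers the signal term while leaving the repeat-visit noise largely unchanged.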


Biological assessments at sub-continental and continental extents represent an important frontier in applied ecological research (Paulsen et al., 2008). Essential challenges to these endeavors will continue to include difficulty in defining regional reference conditions, assembling adequate response datasets, and applying appropriate statistical and modeling techniques. Continued efforts in this area will facilitate a deeper understanding of the influence of study extent on indicator effectiveness, and methodological advances that promise to benefit assessments conducted at any extent. The fish community indicators presented here provide a basis for assessing a suite of landscape stresses and have much potential to help answer questions of importance for national river management and protection.

Acknowledgements

This research was supported by grants from the U.S. Fish and Wildlife Service and U.S. Geological Survey provided to Michigan State University, which completed data preparation and analysis. We wish to acknowledge the National Fish Habitat Partnership Science and Data Committee, Gary Whelan (Michigan DNR), Andrea Ostroff (USGS), and Doug Beard (USGS), The Nature Conservancy, and the World Wildlife Fund. Numerous organizations provided data for this project including GA DNR (T. Litts), IL DNR (A. Holtrop), FL Fish and Wildlife Commission (J. Estes), Indiana DNR (A. Grier), KY Division of Water (J. Brumley), LA Department of Wildlife and Fisheries (B. Alford), MA Department of Fisheries and Wildlife (T. Richards), MI DNR, OH EPA (D. Mishne), OK Conservation Commission (G. Kloxin), SC DNR (M. Scott), TN Wildlife Resources Agency (F. Fiss), TX Parks and Wildlife Department (T. Birdsong), University of Georgia (J. Chamblee) and Rushing Rivers Institute (J. Rogers). We are grateful to Robert Hughes and Thomas Whittier for sharing results from the W-EMAP research and to Peter Ruhl and Darren Carlisle for providing NAWQA data. We also wish to acknowledge Ralph Tingley, Jacqui Fenner, and Jared Ross for assistance with data management. Finally, we thank Scott Sowa, Jonathan Higgins, and Paul Seelbach for their helpful input.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.ecolind.2012.10.028.

References

Abell, R., Olson, D.M., Dinerstein, E., Hurley, P., Diggs, J.T., Eichbaum, W., Walters, W., Allnutt, T., Loucks, C.J., Hedao, P., 1999. Freshwater Ecoregions of North America: A Conservation Assessment. Island Press, Washington, DC.
Allan, J.D., 2004. Landscapes and riverscapes: the influence of land use on stream ecosystems. Annu. Rev. Ecol. Evol. Syst. 35, 257–284.
Allan, J.D., Erickson, D.L., Fay, J., 1997. The influence of catchment land use on stream integrity across multiple spatial scales. Freshw. Biol. 37, 149–161.
Angermeier, P.L., Smogor, R.A., 1995. Estimating number of species and relative abundances in stream-fish communities: effects of sampling effort and discontinuous spatial distributions. Can. J. Fish. Aquat. Sci. 52, 936–949.
Angermeier, P.L., Winston, M.R., 1999. Characterizing fish community diversity across Virginia landscapes: prerequisite for conservation. Ecol. Appl. 9, 335–349.
Baker, E.A., Wehrly, K.E., Seelbach, P.W., Wang, L., Wiley, M., Simon, T., 2005. A multimetric assessment of stream condition in the Northern Lakes and Forests Ecoregion using spatially explicit statistical modeling and regional normalization. Trans. Am. Fish. Soc. 134, 697–710.
Bramblett, R.G., Johnson, T.R., Zale, A.V., Heggem, D.G., 2005. Development and evaluation of a fish assemblage index of biotic integrity for northwestern Great Plains streams. Trans. Am. Fish. Soc. 134, 624–640.
Cao, Y., Hawkins, C.P., Olson, J., Kosterman, M.A., 2007. Modeling natural environmental gradients improves the accuracy and precision of diatom-based indicators. J. N. Am. Benthol. Soc. 26, 566–585.
Carlisle, D.M., Hawkins, C.P., Meador, M.R., Potapova, M., Falcone, J., 2008. Biological assessments of Appalachian streams based on predictive models for fish, macroinvertebrate, and diatom assemblages. J. N. Am. Benthol. Soc. 27, 16–37.
Carlisle, D.M., Falcone, J., Meador, M.R., 2009. Predicting the biological condition of streams: use of geospatial indicators of natural and anthropogenic characteristics of watersheds. Environ. Monit. Assess. 151, 143–160.
Carlisle, D.M., Wolock, D.M., Meador, M.R., 2010. Alteration of streamflow magnitudes and potential ecological consequences: a multiregional assessment. Front. Ecol. Environ. 9, 264–270.
De’ath, G., Fabricius, K.E., 2000. Classification and regression trees: a powerful yet simple technique for ecological data analysis. Ecology 81, 3178–3192.
De’ath, G., 2007. Boosted trees for ecological modeling and prediction. Ecology 88, 243–251.
Dobson, J.E., Bright, E.A., Coleman, P.R., Durfee, R.C., Worley, B.A., 2000. A global population database for estimating populations at risk. Photogramm. Eng. Rem. S. 66, 849–857.
Elith, J., Leathwick, J.R., Hastie, T., 2008. A working guide to boosted regression trees. J. Anim. Ecol. 77, 802–813.
Esselman, P.C., Infante, D.M., Wang, L., Wu, D., Cooper, A.R., Taylor, W.W., 2011. An index of cumulative disturbance to river fish habitats of the conterminous United States from landscape anthropogenic activities. Ecol. Restor. 29, 133–151.
Fausch, K.D., Torgersen, C.E., Baxter, C.V., Li, H.W., 2002. Landscapes to riverscapes: bridging the gap between research and conservation of stream fishes. Bioscience 52, 483–498.
Flotemersch, J.E., Blocksom, K.A., 2005. Electrofishing in boatable rivers: does sampling design affect bioassessment metrics? Environ. Monit. Assess. 102, 263–283.
Fore, L.S., Grafe, C., 2002. Using diatoms to assess the biological condition of large rivers in Idaho (U.S.A.). Freshw. Biol. 47, 2015–2037.
Friedman, J.H., 2002. Stochastic gradient boosting. Comput. Stat. Data Anal. 38, 367–378.
Frimpong, E.A., Angermeier, P.L., 2009. Fish traits: a database of ecological and life-history traits of freshwater fishes of the United States. Fisheries 34, 487–495.
Gergel, S.E., Turner, M.G., Miller, J.R., Melack, J.M., Stanley, E.H., 2002. Landscape indicators of human impacts to riverine systems. Aquat. Sci. 64, 118–128.
Gesch, D., Oimoen, M., Greenlee, S., Nelson, C., Steuck, M., Tyler, D., 2002. The national elevation dataset. Photogramm. Eng. Rem. S. 68, 5–11. http://ned.usgs.gov/
Halliwell, D.B., Langdon, R.W., Daniels, R.A., Kurtenbach, J.P., Jacobson, R.A., 1999. Classification of freshwater fish species of the northeastern United States for use in the development of indices of biological integrity, with regional applications. In: Simon, T.P. (Ed.), Assessing the Sustainability and Biological Integrity of Water Resources Using Fish Communities. CRC Press, Boca Raton, FL, pp. 301–333.
Harris, J.H., Silveira, R., 1999. Large-scale assessments of river health using an index of biotic integrity with low diversity fish communities. Freshw. Biol. 41, 235–252.
Hering, D., Moog, O., Sandin, L., Verdonschot, P.F.M., 2004. Overview and application of the AQEM assessment system. Hydrobiologia 516, 1–20.
Herlihy, A.T., Paulsen, S.G., Van Sickle, J., Stoddard, J.L., Hawkins, C.P., Yuan, L.L., 2008. Striving for consistency in a national assessment: the challenges of applying a reference-condition approach at a continental scale. J. N. Am. Benthol. Soc. 27, 860–877.
Higgins, J.V., Bryer, M.T., Khoury, M.L., Fitzhugh, T.W., 2005. A freshwater classification approach for biodiversity conservation planning. Conserv. Biol. 19, 432–445.
Homer, C., Dewitz, J., Fry, J., Coan, M., Hossain, N., Larson, C., Herold, N., McKerrow, A., VanDriel, J.N., Wickham, J., 2007. Completion of the 2001 National Land Cover Database for the conterminous United States. Photogramm. Eng. Rem. S. 73, 337–341. www.mrlc.gov/nlcd2001.php
Karr, J.R., 1981. Assessment of biotic integrity using fish communities. Fisheries 6, 21–27.
Karr, J.R., Fausch, K.D., Angermeier, P.L., Yant, P.R., Schlosser, I.J., 1986. Assessing Biological Integrity in Running Waters: A Method and Its Rationale, 5th ed. Illinois Natural History Survey, Champaign, IL.
Kaufmann, P.R., Levine, P., Robinson, E.G., Seeliger, C., Peck, D., 1999. Quantifying physical habitat in wadeable streams. Office of Research and Development, US Environmental Protection Agency, Washington, DC.
Klemm, D.J., Blocksom, K.A., Fulk, F.A., Herlihy, A.T., Hughes, R.M., Kaufmann, P.R., Peck, D.V., Stoddard, J.L., Theny, W.T., Griffith, M.B., 2003. Development and evaluation of a macroinvertebrate biotic integrity index (MBII) for regionally assessing Mid-Atlantic Highlands streams. Environ. Manage. 31, 656–669.
Lyons, J., 1992. Using the Index of Biotic Integrity (IBI) to Measure Environmental Quality in Warmwater Streams of Wisconsin. U.S. Department of Agriculture, Forest Service, North Central Forest Experimental Station, St. Paul, MN.
McCormick, F.H., Hughes, R.M., Kaufmann, P.R., Peck, D.V., Stoddard, J.L., Herlihy, A.T., 2001. Development of an index of biotic integrity for the Mid-Atlantic Highlands region. Trans. Am. Fish. Soc. 130, 857–877.
Meador, M.R., Whittier, T.R., Goldstein, R.M., Hughes, R.M., Peck, D.V., 2008. Evaluation of an index of biotic integrity approach used to assess biological condition in western US streams and rivers at varying spatial scales. Trans. Am. Fish. Soc. 137, 13–22.
Meador, M.R., Carlisle, D.M., 2009. Predictive models for fish assemblages in eastern US streams: implications for assessing biodiversity. Trans. Am. Fish. Soc. 138, 725–740.
Ode, P.R., Hawkins, C.P., Mazor, R.D., 2008. Comparability of biological assessments derived from predictive models and multimetric indices of increasing geographic scope. J. N. Am. Benthol. Soc. 27, 967–985.
Omernik, J.M., 1987. Ecoregions of the conterminous United States. Ann. Assoc. Am. Geogr. 77, 118–125.

P.C. Esselman et al. / Ecological Indicators 26 (2013) 163–173 Paul, M.J., Meyer, J.L., 2001. Streams in the urban landscape. Annu. Rev. Ecol. Syst. 32, 333–365. Paulsen, S.G., Mayio, A., Peck, D.V., Stoddard, J.L., Tarquinio, E., Holdsworth, S.M., Van Sickle, J., Yuan, L.L., Hawkins, C.P., Herlihy, A.T., Kaufmann, P.R., Barbour, M.T., Larsen, D.P., Olsen, A.R., 2008. Condition of stream ecosystems in the US: an overview of the first national assessment. J. N. Am. Benthol. Soc. 27, 812–821. Poff, N.L., 1997. Landscape filters and species traits: towards mechanistic understanding and prediction in stream ecology. J. N. Am. Benthol. Soc. 16, 391–409. Pont, D., Hugueny, B., Beier, U., Goffaux, D., Melcher, A., Noble, R., Rogers, C., Roset, N., Schmutz, S., 2006. Assessing river biotic condition at a continental scale: a European approach using functional metrics and fish assemblages. J. Appl. Ecol. 43, 70–80. Pont, D., Hughes, R.M., Whittier, T.R., Schmutz, S., 2009. A predictive index of biotic integrity model for aquatic-vertebrate assemblages of western US streams. Trans. Am. Fish. Soc. 138, 292–305. Riseng, C.M., Wiley, M.J., Seelbach, P.W., Stevenson, R.J., 2010. An ecological assessment of Great Lakes tributaries in the Michigan Peninsulas. J. Great Lakes Res. 36, 505–519. Sowa, S.P., Annis, G., Morey, M.E., Diamond, D.D., 2007. A gap analysis and comprehensive conservation strategy for riverine ecosystems of Missouri. Ecol. Monogr. 77, 301–334. Stoddard, J.L., Peck, D.V., Olsen, A.R., Larsen, D.P., Van Sickle, J., Hawkins, C.P., Hughes, R.M., Whittier, T.R., Lomnicky, G., Herlihy, A.T., Kaufmann, P.R., Peterson, S.A., Ringold, P.L., Paulsen, S.G., Blair, R., 2005a. Environmental Monitoring and Assessment Program (EMAP): Western Streams and Rivers Statistical Summary. U.S. Environmental Protection Agency, Washington, DC. 
Stoddard, J.L., Peck, D.V., Paulsen, S.G., Van Sickle, J., Hawkins, C.P., Herlihy, A.T., Hughes, R.M., Kaufmann, P.R., Larsen, D.P., Lomnicky, G., Olsen, A.R., Peterson, S.A., Ringold, P.L., Whittier, T.R., 2005b. An Ecological Assessment of Western Streams and Rivers. U.S. Environmental Protection Agency, Washington, DC.
Stoddard, J.L., Herlihy, A.T., Peck, D.V., Hughes, R.M., Whittier, T.R., Tarquinio, E., 2008. A process for creating multimetric indices for large-scale aquatic surveys. J. N. Am. Benthol. Soc. 27, 878–891.

US Army Corps of Engineers (USACE), 2012. National Inventory of Dams. geo.usace.army.mil/pgis/f?p=397:1:0:NO: (accessed Jan. 2007).
U.S. Census Bureau, 2002. UA Census 2000 TIGER/Line Files Technical Documentation. U.S. Census Bureau, Washington, DC. www.esri.com/data/download/census2000-tigerline
US Environmental Protection Agency, 2006. Wadeable Streams Assessment: a Collaborative Survey of the Nation’s Streams. Office of Water, US Environmental Protection Agency, Washington, DC.
US Environmental Protection Agency (USEPA), US Geological Survey (USGS), 2005. National Hydrography Dataset Plus – NHDPlus Version 1.0. www.horizon-systems.com/nhdplus/
US Environmental Protection Agency (USEPA), 2012. EPA Geospatial Data Access Project. http://www.epa.gov/enviro/geo data.html (accessed Jan. 2007).
US Geological Survey (USGS), 2005. Active mines and mineral processing plants in the United States in 2003. U.S. Geological Survey, Reston, Virginia. mrdata.usgs.gov/mineplant/.
Van Sickle, J., Paulsen, S.G., 2008. Assessing the attributable risks, relative risks, and regional extents of aquatic stressors. J. N. Am. Benthol. Soc. 27, 920–931.
Van Sickle, J., 2010. Correlated metrics yield multimetric indices with inferior performance. Trans. Am. Fish. Soc. 139, 1802–1817.
Wang, L.Z., Brenden, T., Seelbach, P., Cooper, A., Allan, D., Clark, R., Wiley, M., 2008. Landscape based identification of human disturbance gradients and reference conditions for Michigan streams. Environ. Monit. Assess. 141, 1–17.
Wang, L., Infante, D., Esselman, P., Cooper, A., Wu, D., Taylor, W., Beard, D., Whelan, G., Ostroff, A., 2011. A hierarchical spatial framework and database for the national river fish habitat condition assessment. Fisheries 36, 436–449.
Whittier, T.R., Hughes, R.M., Stoddard, J.L., Lomnicky, G.A., Peck, D.V., Herlihy, A.T., 2007. A structured approach for developing indices of biotic integrity: three examples from streams and rivers in the western USA. Trans. Am. Fish. Soc. 136, 718–735.
Wolock, D.M., 2003. Base-flow index grid for the conterminous United States. U.S. Geological Survey, Reston, VA. water.usgs.gov/GIS/metadata/usgswrd/XML/bfi48grd.xml.