Predicting hydrologic disturbance of streams using species occurrence data



Science of the Total Environment 686 (2019) 254–263

Contents lists available at ScienceDirect

Science of the Total Environment journal homepage: www.elsevier.com/locate/scitotenv

Predicting hydrologic disturbance of streams using species occurrence data☆

John Tyler Fox a,⁎, Daniel D. Magoulick b

a Arkansas Cooperative and Wildlife Research Unit, Department of Biological Sciences, University of Arkansas, Fayetteville, AR, USA
b U.S. Geological Survey, Arkansas Cooperative Fish and Wildlife Research Unit, Department of Biological Sciences, University of Arkansas, Fayetteville, AR, USA

H I G H L I G H T S

• Climate non-stationarity is expected to increase stream hydrologic variability.
• We modeled hydrologic disturbance (HDI) using fish records, flow, and geographic variables via random forest (RF).
• RF models predicted HDI class of ungaged streams with high overall and class accuracy.
• Georeferenced biological data can inform predictive models of hydrologic disturbance.

a r t i c l e  i n f o

Article history:
Received 7 December 2018
Received in revised form 16 April 2019
Accepted 11 May 2019
Available online 25 May 2019

Editor: Ralf Ludwig

Keywords: Aquatic GAP; Flow alteration; Ecological flows; Machine learning; Random forest; Geospatial analysis; Watershed conservation and management; Natural flow regime

a b s t r a c t

Aquatic organisms have adapted over evolutionary time-scales to hydrologic variability represented by the natural flow regime of rivers and streams in their unimpaired state. Rapid landscape change coupled with growing human demand for water has altered natural flow regimes of many rivers and streams on a global scale. Climate non-stationarity is expected to further intensify hydrologic variability, placing increased pressure on aquatic communities. Using a machine learning approach and georeferenced species occurrence data, we modeled and mapped spatial patterns of hydrologic disturbance for streams in Arkansas, Missouri, and eastern Oklahoma. Random forest (RF) models trained on fish community data, hydrologic, and landscape metrics for gaged streams in the National Hydrography (NHDPlusV2) database were used to predict a hydrologic disturbance index (HDI) for ungaged streams. The HDI is part of the USGS Geospatial Attributes of Gages for Evaluating Streamflow (GAGESII) database and is a composite index of watershed-scale disturbance from anthropogenic stressors. Models based on fish presence/absence data had overall prediction accuracy (77%; 95% CI: 0.74, 0.80) similar to that of models based on flow variables (76%; CI: 0.73, 0.80). Including topographic variables increased the RF prediction accuracy of both the fish (90%; CI: 0.88, 0.92) and flow models (86%; CI: 0.84, 0.89). Spatial patterns of hydrologic disturbance suggest distinct ecohydrological regions exist where conservation actions may be focused. Streams with low HDI were predominantly located in the Ozark Highlands, Boston Mountains, and Ouachita Mountains. Correlation analysis of HDI by flow regime showed groundwater stable streams had the lowest disturbance frequency, with over 50% of stream reaches with low HDI located in forested land cover. HDI was highest for big rivers, intermittent runoff streams, and streams in areas of agricultural land use. Our results show long-term georeferenced biological data can provide a valuable resource for predictive modeling of hydrologic disturbance for ungaged rivers and streams.

© 2019 Published by Elsevier B.V.

☆ This draft manuscript is distributed solely for the purposes of scientific peer review. Its content is deliberative and predecisional, so it must not be disclosed or released by reviewers. Because the manuscript has not yet been approved for publication by the U.S. Geological Survey (USGS), it does not represent any official USGS findings or policy.
⁎ Corresponding author. E-mail address: [email protected] (J.T. Fox).

https://doi.org/10.1016/j.scitotenv.2019.05.156
0048-9697/© 2019 Published by Elsevier B.V.

1. Introduction

Aquatic species have evolved life-history traits adapted to naturally varying but predictable environmental conditions, and this variability is often vital for survival and reproduction, particularly for species requiring a range of habitat conditions at different life stages (Poff et al., 1997). The flow regime of rivers and streams, in particular, plays a central role in sustaining native biodiversity and is one of the most influential regulators of fish life-history strategies and community structure (Power et al., 1988; Olden and Kennard, 2010; Poff et al., 2010; Matthews, 2012; Mims and Olden, 2013). However, hydrology is also among the most frequently altered components of lotic systems due to human activities and other environmental disturbance (Poff, 1997; Poff et al., 1997). Local and landscape-scale factors including dams, diversions, roads, and artificial canals can impact the composition of fish communities by pushing stream flow outside of the bounds of normal function (Bunn and Arthington, 2002; Carlisle et al., 2011; McManamay et al., 2012; McManamay et al., 2015). Seasonal and interannual variability of high and low flows is projected to increase for many streams and rivers, even in regions where mean annual river discharge remains relatively unchanged, with an increasing percentage of streams predicted to have experienced flow regime shifts by the 2050s (Döll and Schmied, 2012).

Environmental flows (e-flows) science has received increased attention since the formulation of the natural flow regime paradigm (Poff et al., 1997), accompanied by the development of an array of hydrologic metrics (HMs) to describe natural flow regimes, quantify flow alteration, and aid in defining environmental flow standards (Eng et al., 2017).
Recent studies have highlighted the importance of certain HMs for predicting fish and invertebrate community integrity, particularly those describing depleted high flows, homogenization of flows, and erratic flows (Carlisle et al., 2017). However, the predictive performance of many commonly-used HMs frequently has little or no bearing on the actual ecological relevance of those metrics (Carlisle et al., 2017; Eng et al., 2017). Furthermore, reliance on historical hydrologic time series for calculating reference conditions to guide environmental flows practices and allocation of conservation resources is challenging given shifting baselines due to rapid climate change and other disturbance (Poff, 2017). Incorporating other “non-flow” environmental variables and broadening the ecological framework of e-flow science to incorporate population and community-level biological data is an important component in developing and understanding flow-ecology relationships, and improving the transferability and scalability of predictive models (Poff, 2017).

Species occurrence records represent an underutilized and potentially valuable resource for evaluating and understanding the impacts of anthropogenic modification and other disturbance on streamflow and the structure and function of aquatic ecosystems. Environmental factors interact to exert selective pressure on aquatic organisms in complex and multidimensional ways over different spatial, temporal, and taxonomic scales of observation (Rahel, 1990; McManamay et al., 2015). Biotic factors including competition, predator-prey interactions, morphology, and resource partitioning can have strong effects on fish communities via both direct and indirect mechanisms (Power et al., 1988; Jackson et al., 2001; Matthews, 2012), while abiotic factors including temperature, flow, and water chemistry often determine whether individuals are able to successfully colonize and persist in a potential habitat (Smith and Powell, 1971; Power et al., 1988; McManamay et al., 2015). Thus, abiotic factors operate hierarchically and unidirectionally at larger scales (e.g. basins, stream segments) to influence both structural and biotic elements at smaller scales (e.g. stream reaches) (Smith and Powell, 1971; Rahel, 1990; Poff, 1997; McManamay et al., 2015). Biotic and abiotic stream components are complementary and may frequently play similar and synergistic roles in regulating the structure of fish communities (Matthews, 2012).

Understanding the role of hydrologic disturbance in ecological processes of streams and rivers is a critical part of developing regional flow standards. Mapping and analyzing the spatial patterns of hydrologic disturbance can provide an important tool for resource managers to identify potential risks of aquatic ecosystem degradation and to prioritize and target conservation and restoration decisions. Biological datasets compiled under the USGS Aquatic GAP initiative and other sources represent an extensive source of high-quality georeferenced species occurrence data that, when linked to the National Hydrography Dataset Plus Version 2 (NHDPlusV2) and Geospatial Attributes of Gages for Evaluating Streamflow (GAGES II) databases, can be a powerful tool for characterizing ecological responses to changes in flow and other environmental variables across varying spatial and temporal scales (Troia and McManamay, 2017). The GAGES II database was developed as part of a national effort to centralize data from USGS flow monitoring stations and their upstream watersheds (Falcone et al., 2010; Falcone, 2011). Falcone et al. (2010) used a subset of landscape variables to calculate a cumulative hydrologic disturbance index (HDI), which provides an indicator for assessing important anthropogenic stressors within each USGS gage's watershed.
The HDI is composed of seven metrics of anthropogenic disturbance: major dam density (#/100 km²), change in dam storage (1950–2006; megaliters/km²), % canals and artificial paths, distance to National Pollutant Discharge Elimination Sites (NPDES), county-level freshwater withdrawal estimates (m³/km²), road density (km/km²), and landscape fragmentation calculated from the 2001 NLCD.

In the present study, our objectives were to: (1) provide an alternative framework for identifying gaged and ungaged streams at risk of hydrologic impairment using existing biological datasets that can be applied locally and regionally, or expanded nationally, (2) evaluate the relationships between HDI, natural flow regime, and landuse to aid conservation specialists and natural resource managers in developing conservation and management actions, and (3) identify which fish species are influential in predicting hydrologic disturbance of streams in the Ozark-Ouachita Interior Highlands and Gulf Coastal Plains regions of the southern United States.

2. Methods

2.1. Study area

The study area encompassed rivers and streams in Arkansas, Missouri, and eastern Oklahoma, U.S.A. and was defined based on the spatial extent of fish sampling records bounded by Level III ecoregions (Fig. 1). North American ecoregions are designated at varying levels of coarseness ranging from Level I–IV (finest scale) and denote areas where similar ecosystems and environmental resources are concentrated. The study region is both hydrologically and geographically diverse, encompassing eleven Level III ecoregions, most of which occur in more than one state. Leasure et al. (2016) identified seven natural flow regimes in the Ozark-Ouachita Interior Highlands region including: Groundwater stable (GS), Groundwater (GW), Groundwater flashy (GF), Perennial runoff (PR), Runoff flashy (RF), Intermittent runoff (IR), and Intermittent flashy (IF), in addition to big rivers (BR). Groundwater streams tend to dominate in the Ozark Highlands and runoff streams are more prevalent in the Boston Mountains, while intermittent flow regimes are found throughout the region, but particularly in the Ouachita Mountains.

Fig. 1. Map of the study area showing (a.) the distribution of USGS stream gages and fish sampling locations; (b.) 2011 NLCD landuse categories and Level III Ecoregion boundaries.

2.2. Geospatial data sources

This study used georeferenced fish records compiled for Arkansas and Missouri by the USGS National Gap Analysis Project (Aquatic GAP) (Scott et al., 1993; Sowa et al., 2007; Annis et al., 2011), and survey and museum collection data maintained by the Missouri Department of Conservation (MDC), Oklahoma Natural Heritage Inventory (ONHI), and Oklahoma Conservation Committee (Appendix A1 and A2 in the Supporting Information). Limitations of using large biodiversity datasets can include sampling bias in favor of recorder distribution, lack of survey effort assessment, and incomplete coverage of species distributions (Ruete, 2015). However, these drawbacks were minimized by the extensive geographic coverage of fish sampling data and our use of presence/absence occurrences in place of count or abundance data to control for differences in fish sampling methodologies, effort, and gear (Guo and Olden, 2014). Although data on survey effort are not available, records used in the present study were collected predominantly by professional biologists knowledgeable in fish sampling protocols, with approximately 93% of fish records collected using either drag/kick seines or backpack electroshockers (Appendix A2). Additional spatial data layers used in the analysis included the USGS National Hydrography Dataset Plus Version 2 (NHDPlusV2), which includes the National Land Cover Database (2001, 2011 NLCD) (Homer et al., 2015), and the Geospatial Attributes of Gages for Evaluating Streamflow (GAGES II) (Falcone, 2011) (Table 1).

Table 1 Description of geospatial variables used in RF models. Mean annual and monthly stream flow (cubic feet per second, cfs) and velocity (feet per second, fps) were measured at USGS gages and estimated for upstream and downstream stream reaches using the USGS Enhanced Unit Runoff Method (EROM), a multi-step process designed to improve flow estimates based on the 30-year period from 1971 to 2000.

Dataset | Variable | Spatial scale | Description
GAGESII | HDI | Watershed | Hydrologic Disturbance Index representing cumulative disturbance of selected anthropogenic stressors within the watershed of each USGS gage
NLCD 2011 | Land cover | 30 m | Land cover for the conterminous United States based on Landsat TM satellite imagery
NHDPlusV2 (Topology) | Drainage area | Catchment | Total upstream catchment area from downstream end of flowline
NHDPlusV2 (Topology) | Max elevation | Stream reach | Maximum elevation of flowline (smoothed) in centimeters
NHDPlusV2 (Topology) | Slope | Stream reach | Slope of flowline (m/km) based on smoothed elevations
NHDPlusV2 (Hydrology) | QA_MA | Stream reach | Mean Annual Flow from runoff (cfs)
NHDPlusV2 (Hydrology) | VA_MA | Stream reach | Mean Annual Velocity for QA (fps)
NHDPlusV2 (Hydrology) | QC_MA | Stream reach | Mean Annual Flow with Reference Gage Regression applied to QB (cfs) (Best EROM estimate of “natural” mean flow)
NHDPlusV2 (Hydrology) | VC_MA | Stream reach | Mean Annual Velocity for QC (fps) (Best EROM estimate of “natural” mean velocity)
NHDPlusV2 (Hydrology) | QE_MA | Stream reach | Mean Annual Flow from gage adjustment (cfs) (Best EROM estimate of actual mean flow)
NHDPlusV2 (Hydrology) | VE_MA | Stream reach | Mean Annual Velocity from gage adjustment (fps) (Best EROM estimate of actual mean velocity)
NHDPlusV2 (Hydrology) | QA_01-12 | Stream reach | Mean Monthly Flow from runoff (cfs)
NHDPlusV2 (Hydrology) | VA_01-12 | Stream reach | Mean Monthly Velocity for QA (fps)
NHDPlusV2 (Hydrology) | QC_01-12 | Stream reach | Mean Monthly Flow with Reference Gage Regression applied to QB (cfs) (Best EROM estimate of “natural” mean flow)
NHDPlusV2 (Hydrology) | VC_01-12 | Stream reach | Mean Monthly Velocity for QC (fps) (Best EROM estimate of “natural” mean velocity)
NHDPlusV2 (Hydrology) | QE_01-12 | Stream reach | Mean Monthly Flow from gage adjustment (cfs) (Best EROM estimate of actual mean flow)
NHDPlusV2 (Hydrology) | VE_01-12 | Stream reach | Mean Monthly Velocity from gage adjustment (fps) (Best EROM estimate of actual mean velocity)

2.3. Data preprocessing

We combined and standardized fish location records from Arkansas, Missouri, and Oklahoma to correct for misspelled or redundant species names, and we removed records with uncertain species IDs. Records for hybrids were also excluded and subspecies designations were truncated to species level. The cleaned regional species dataset consisted of 202,390 individual records for 255 species collected from 1916 to 2016. This encompasses a period of major dam construction in the U.S., which peaked in the 1950s–60s and was followed in the latter half of the 20th century by a tripling in the demand for irrigation water from impoundments, reservoirs, and groundwater resources (Biemans et al., 2011). Despite the potential of older fish collections to contain less reliable records, exploratory data analysis indicated that temporally constraining the fish dataset to sampling records collected since the 1970s resulted in a marked decline in model accuracy, as did limiting the analysis only to species identified as having high importance in random forest models.

Geographic locations (i.e. latitude, longitude) were used to link fish survey records to the NHDPlusV2 and to assign each collection record a stream reach code using linear referencing in ArcMap v.10.5 (Environmental Systems Research Institute, Redlands, California). Stream reach codes are unique 14-digit numbers which form the basis of the NHD stream linear referencing system and allow georeferenced observations to be linked to specific points along a stream network. We constrained our analysis to streams second order and larger (Strahler method) and used the unique reach codes as identifiers for creating a binary presence/absence table of fish communities for 8218 stream reaches. There were 521 USGS gages within our study area and we spatially joined fish presence/absence records to the nearest gage up to a maximum stream network distance of 10 km. This distance was conservatively chosen based on the findings of Bond and Kennard (2017) in the Murray-Darling Basin of Australia that ecological data and hydrologic metrics were readily transposable up to 25 km, after which prediction uncertainty rapidly increased. The resulting dataset included fish sampling records for a total of 2521 gaged stream reaches linked to hydrologic and topographic variables (watershed area, elevation, and slope) extracted from the NHDPlusV2, and the HDI from the GAGESII database (Falcone, 2011; Table 1). HDI values for gaged streams ranged from 3 to 33, and we used the quantile values to assign each stream to one of three categories of low (HDI 3–10), medium (HDI 11–19), and high (HDI 20–33) hydrologic disturbance for the random forest analysis. Statistical relationships between the individual components of the HDI calculated for a subset of 128 reference streams using the approach described by Lynch et al. (2019) are illustrated by multiple pairwise comparisons using Spearman's rank correlation (Appendix A3).

Mean annual and monthly discharge (cubic feet per second, cfs) and velocity (feet per second, fps) measured at USGS gage sites were included in the analysis, along with flow data for upstream and downstream reaches estimated using the USGS Enhanced Unit Runoff Method (EROM, Table 1) (McKay et al., 2012). The EROM routes unit runoff for each catchment using an incremental, regressive approach to improve the accuracy of flow estimates and is coordinated to reflect the 1971 to 2000 time period.
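The preprocessing steps described in this subsection (collapsing occurrence records into a binary presence/absence table keyed by reach, and binning HDI into three classes) can be sketched roughly as follows. This is a minimal illustration only: the record layout, the reach identifiers, and the function names `presence_absence` and `hdi_class` are assumptions made for the example, not the authors' code. The class cut points (3–10, 11–19, 20–33) are the ones reported in the text.

```python
from collections import defaultdict

# Hypothetical occurrence records as (reach identifier, species) pairs.
records = [
    ("reach_A", "Gambusia affinis"),
    ("reach_A", "Notropis boops"),
    ("reach_B", "Gambusia affinis"),
    ("reach_B", "Gambusia affinis"),  # repeat collections collapse to one presence
    ("reach_C", "Cottus carolinae"),
]

def presence_absence(records):
    """Return (sorted species list, {reach: [0/1 per species]})."""
    species = sorted({sp for _, sp in records})
    seen = defaultdict(set)
    for reach, sp in records:
        seen[reach].add(sp)
    table = {reach: [int(sp in found) for sp in species]
             for reach, found in seen.items()}
    return species, table

def hdi_class(hdi):
    """Bin an HDI score using the cut points reported in the text."""
    if not 3 <= hdi <= 33:
        raise ValueError("HDI outside the 3-33 range reported for gaged streams")
    if hdi <= 10:
        return "low"
    if hdi <= 19:
        return "medium"
    return "high"

species, table = presence_absence(records)
```

Using presence/absence rather than counts means that repeated collections of the same species on a reach contribute a single 1, which is the property the authors rely on to reduce sensitivity to sampling effort and gear.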
Input data for the EROM include precipitation, runoff, and temperature, and the model is calibrated using flow records from the USGS stream gage nearest to the stream reach of interest, using only gages with at least 10 complete years of record. The EROM also incorporates numerous enhancements to the previous unit runoff methods, including accounting for excess potential evapotranspiration, major water withdrawals and additions, and stream network-interpolated adjustments to correct estimated flows upstream and downstream of gage locations. EROM-corrected flow estimates (annual and monthly Q_E values; Table 1) are considered the “best” NHDPlusV2 flow estimates for use in models and analyses (McKay et al., 2012). A full description of the EROM can be found online: http://www.horizonsystems.com/NHDPlus/. Our use of the NHDPlusV2 flow variables without additional hydrologic metrics (e.g. Hydrological Indices Tool (HIT) metrics) was based on their widespread availability and use, and the extensive validation and quality assessment and control that USGS stream flow estimates have undergone. However, we wish to make clear that the NHDPlusV2 hydrologic variables are limited to stream discharge and velocity (i.e. magnitude), and therefore do not reflect the full range of streamflow conditions including ecologically-important measures of duration and frequency of high and low flows (Poff et al., 1997).

2.4. HDI model development

We used a random forest (RF) classification approach to predict hydrologic disturbance of streams in the study area. RF statistical classification is a supervised machine learning method that builds and averages multiple decision trees to improve model predictions (Breiman, 2001). As trees are built, the relative importance of predictor variables is computed by measuring the mean decrease in model accuracy as each predictor is randomly permuted in turn. The RF model makes no formal assumptions about the theoretical distributions of the data and therefore can handle non-linear interactions (Breiman, 2001). We developed four separate RF models (Table 1) based on fish presence/absence (Model1), NHDPlusV2 hydrologic metrics (Model2), NHDPlusV2 hydrologic and topographic variables (Model3), and fish presence/absence and topographic variables (Model4). The models were developed to assess the predictive ability of fish presence/absence records, with and without the influence of topographic variables, compared to hydrologic metrics available in the NHDPlusV2 database. The RF models were trained on a random subset (70%) of streams in the study area. The remaining 30% of the dataset was withheld to evaluate model performance using repeated k-fold cross validation, by first splitting the data into k = 10 subsets and randomly holding each one out in turn, while training the model on the remaining subsets.
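The resampling scheme can be illustrated with a small sketch of a 70/30 split followed by 10-fold cross-validation repeated three times. The text leaves some ambiguity about which subset the folds were drawn from; this sketch assumes the folds are drawn within the 70% training subset, which is one plausible reading. The helper names, the seed, and the use of integer reach indices are all illustrative assumptions, and model fitting itself is omitted.

```python
import random

def train_test_split(ids, train_frac=0.7, seed=0):
    """Shuffle ids and split them into training and held-out test sets."""
    rng = random.Random(seed)
    shuffled = ids[:]
    rng.shuffle(shuffled)
    cut = round(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def repeated_kfold(ids, k=10, repeats=3, seed=0):
    """Yield (training, validation) id lists for each fold of each repeat."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = ids[:]
        rng.shuffle(shuffled)
        folds = [shuffled[i::k] for i in range(k)]  # k near-equal folds
        for held_out in range(k):
            validation = folds[held_out]
            training = [x for j, fold in enumerate(folds)
                        if j != held_out for x in fold]
            yield training, validation

reach_ids = list(range(2521))          # one index per gaged reach
train_ids, test_ids = train_test_split(reach_ids)
splits = list(repeated_kfold(train_ids))
```

With k = 10 and three repeats this yields 30 training/validation splits; averaging accuracy across them gives a resampled estimate for each candidate hyper-parameter value, while the held-out 30% is reserved for the final confusion-matrix statistics.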
This splitting process was repeated three times to reduce overfitting on the training set, and the overall model accuracy was taken as the mean of these repeats. The number of variables randomly sampled as candidates at each split (mtry) was tuned using the caret package in R (Kuhn, 2008) with the number of trees (ntree) set at 500. A random search approach was used for hyper-parameter optimization, which has been shown to outperform grid and manual search procedures for finding best models with faster computational time (Bergstra and Bengio, 2012). Each trained RF model was then used to predict on the testing set and to calculate standard confusion matrix statistics, including overall model accuracy and 95% confidence intervals based on the percentage of correctly classified HDI categories, as well as Cohen's Kappa which was normalized to the baseline of random chance (i.e., 50%). In addition, sensitivity, specificity, and balanced accuracy of the RF model predictions were calculated for each of the different HDI categories.

The best performing RF model was trained on the full dataset for gaged streams (n = 2521) and used to predict HDI category membership for ungaged stream reaches (n = 5697). The datasets for gaged and ungaged streams were then combined to map regional patterns of hydrologic disturbance. The HDI classifications were extended to neighboring stream reaches in the NHDPlusV2 database by assigning category membership based on their proximity to a classified location up to a maximum distance of 10 km. To assess how well HDI values of gaged and ungaged streams represented those extrapolated to the NHDPlusV2 stream network, a subset of HDI values (n = 8218) was randomly generated from across the NHDPlusV2 network and compared to HDI values for gaged and ungaged streams using a Pearson Chi-square (χ²) goodness-of-fit test, with the p-value estimated by Monte Carlo simulation with N = 10,000 repetitions.

2.5. Natural flow regimes and NLCD landuse

Relationships between HDI and the seven natural flow regime classifications for gaged and ungaged streams developed by Leasure et al. (2016) (Groundwater stable (GS), Groundwater (GW), Groundwater flashy (GF), Perennial runoff (PR), Runoff flashy (RF), Intermittent runoff (IR), and Intermittent flashy (IF), in addition to big rivers (BR)) were examined using row-standardized correspondence analysis (CA) in the ‘vcd’ (Friendly, 2013) and ‘ca’ (Nenadic and Greenacre, 2007) R statistical packages.
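Both the correspondence analysis and the association plots in this subsection start from a contingency table of flow regime by HDI class. As a rough illustration of that starting point, the sketch below computes expected frequencies under independence, Pearson residuals, and the Chi-square statistic for a small table. The counts are invented for the example and are not the study's data; the function name is hypothetical; and the decomposition step that turns the residuals into CA factor scores is deliberately left out.

```python
from math import sqrt

# Hypothetical counts of reaches by flow regime (rows) and HDI class (columns).
hdi_classes = ["low", "moderate", "high"]
observed = {
    "GS": [60, 30, 10],   # groundwater stable
    "IR": [20, 40, 40],   # intermittent runoff
}

def pearson_residuals(observed):
    """Return (expected counts, Pearson residuals, chi-square statistic)."""
    n_cols = len(next(iter(observed.values())))
    row_tot = {r: sum(v) for r, v in observed.items()}
    col_tot = [sum(v[j] for v in observed.values()) for j in range(n_cols)]
    n = sum(row_tot.values())
    expected, residuals, chi2 = {}, {}, 0.0
    for r, counts in observed.items():
        expected[r] = [row_tot[r] * col_tot[j] / n for j in range(n_cols)]
        residuals[r] = [(o - e) / sqrt(e) for o, e in zip(counts, expected[r])]
        chi2 += sum(res ** 2 for res in residuals[r])
    return expected, residuals, chi2

expected, residuals, chi2 = pearson_residuals(observed)
```

A positive residual marks a cell with more reaches than independence would predict (here, low HDI among the groundwater stable rows), and a negative residual marks a deficit. That sign and magnitude is what an association plot displays.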
CA is an extension of principal component analysis (PCA) and is used to analyze frequencies formed by categorical data summarized in a contingency table format to provide factor scores for rows and columns. Coordinate bi-plots and 2-way association plots were used to show significant departures between observed and expected frequencies based on Pearson's Chi-square tests (Friendly and Meyer, 2015). We further quantified the distribution of HDI based on land use categories in the 2011 NLCD using land use (30 m spatial resolution) values extracted for a 200 m buffer around each fish sampling location. While we evaluated how observed and predicted HDI values vary in relation to land use and flow regime, our primary objective was to assess how well long-term fish community data perform in predicting HDI, not to make inferences about how specific land uses and hydrologic disturbances influence biological communities.

2.6. Geostatistical analysis

Spatial patterns of HDI were interpolated and mapped using Empirical Bayesian Kriging (EBK) (Krivoruchko, 2012) based on the RF predictions for the combined datasets for gaged and ungaged streams (n = 8218). The HDI classes were recoded as numerical values from 1 (low HDI) to 3 (high HDI) and represent the quantile range of HDI for streams in the study area. EBK differs from other kriging methods in that it explicitly accounts for error introduced by estimating many semivariogram models instead of using a single “true” semivariogram to make predictions at unknown locations. EBK does this via a restricted maximum likelihood approach by subsetting and simulating data at unmeasured locations to obtain an optimal empirical semivariogram and then calculating weights to minimize interpolation mean square error (Pilz and Spöck, 2008). The default spatial correlation is the power model with weights for each new semivariogram calculated using Bayes' rule, which indicates how likely it is that the observed data can be generated from the theoretical semivariogram (Krivoruchko, 2012). In addition to aiding visualization of landscape-level patterns of HDI, EBK can be used to generate a prediction standard error surface to enable better understanding of spatial patterns of prediction uncertainty.

3. Results

3.1. HDI model prediction accuracy

The RF model trained only on fish presence/absence data (Model1; Appendix A4) predicted HDI of the validation dataset for gaged streams with an overall accuracy of 77% and a balanced class accuracy of between 75 and 86% (Table 2). However, sensitivity was low for streams with high HDI (54%) (Table 3). Model performance for predicting HDI based only on NHDPlusV2 flow variables (Model2) was similar to Model1, although the fish presence/absence data predicted low HDI with higher accuracy. Accounting for watershed drainage area (km²), elevation (cm), and slope (m/km) substantially improved the prediction accuracy of both the flow (Model3) and fish (Model4) models. The best performing model in terms of both overall and individual class accuracies was Model4. Sensitivity for predicting streams with low HDI based on fishes and topographic variables was nearly 10% higher than for Model3. A Pearson Chi-square analysis showed that the observed frequencies of HDI levels were statistically similar for both modeled sites and the NHDPlusV2 stream network (χ² = 5.55, df = 4, p = 0.234), indicating that the study sites were representative of HDI values extrapolated to the larger NHDPlusV2 stream network.

Streams with low HDI tended to be located in smaller watersheds at lower elevations and with steeper slope gradients, compared to streams with high and moderate HDI (Fig. 2). While streams with low HDI were found throughout the study area, the greatest concentration was in the Boston Mountains and Ozark and Ouachita Interior Highlands ecoregions (Fig. 3), although a large proportion of streams with moderate HDI were also seen throughout the Ozark Highlands.
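The accuracy statistics reported in this section (overall accuracy, Cohen's Kappa, and per-class sensitivity, specificity, and balanced accuracy) can all be derived from a single confusion matrix. The sketch below shows the standard definitions for a three-class case. The matrix is hypothetical and is not taken from the study, the function name is an assumption, and Kappa here uses the usual expected agreement computed from the row and column totals, which may differ from the chance baseline described in the Methods.

```python
def classification_stats(cm, labels):
    """cm[i][j] = number of reaches observed in class i and predicted as class j."""
    k = len(labels)
    n = sum(sum(row) for row in cm)
    row_tot = [sum(cm[i]) for i in range(k)]
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]

    accuracy = sum(cm[i][i] for i in range(k)) / n
    # Agreement expected by chance, from the marginal totals.
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2
    kappa = (accuracy - expected) / (1 - expected)

    per_class = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        fn = row_tot[i] - tp          # observed in class i, predicted elsewhere
        fp = col_tot[i] - tp          # predicted as class i, observed elsewhere
        tn = n - tp - fn - fp
        sens = tp / (tp + fn)
        spec = tn / (tn + fp)
        per_class[label] = {
            "sensitivity": sens,
            "specificity": spec,
            "balanced_accuracy": (sens + spec) / 2,
        }
    return accuracy, kappa, per_class

labels = ["low", "moderate", "high"]
cm = [[40, 8, 2],      # hypothetical counts, rows = observed class
      [5, 60, 5],
      [1, 9, 20]]
accuracy, kappa, per_class = classification_stats(cm, labels)
```

Balanced accuracy is simply the mean of sensitivity and specificity for a class, which matches the pattern in Table 3 (for example, 0.54 and 0.96 average to 0.75 for the high HDI class in Model1).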
In contrast, the Western Corn Belt, Interior River Valley, and Mississippi Alluvial Plains ecoregions had the greatest proportion of streams classified as having moderate and high HDI, and the fewest streams with low HDI. Variables identified as important predictors of HDI in the RF classifications based on randomized column permutation are shown in Table 4. Nine species were identified as having high importance in both Model1 (Fishes) and Model4 (Fishes topography) (Table 4). These included several species of darters (Etheostoma spectabile, E. caeruleum, E. uniporum, E. mihileze) and sculpin (Cottus immaculatus, C. carolinae), as well as the western mosquito fish (Gambusia affinis), and big-eyed shiner (Notropis boops). The most influential NHDPlusV2 flow variables in both Model2 (Flow) and Model3 included mean monthly flow velocity (fps) in January, February, and during the fall from September to November. The topographic metrics (drainage area, elevation, and slope) were the most influential variables in both Model3 (flow geography) and Model4, and their addition reduced the relative importance of any single flow metric or fish species.

Table 2 Summary of global accuracy statistics for the four RF models including 95% confidence intervals and Cohen's Kappa.

Model | Overall accuracy | 95% CI | Kappa
Model1 - fishes | 0.77 | 0.74, 0.8 | 0.62
Model2 - flow | 0.76 | 0.73, 0.8 | 0.6
Model3 - flow geography | 0.86 | 0.84, 0.89 | 0.77
Model4 - fishes geography | 0.9 | 0.88, 0.92 | 0.84

Table 3 Category-level accuracy statistics calculated by standard contingency table analysis of observed vs. expected outcomes for the different RF models.

Confusion matrix statistic | HDI class: High | Moderate | Low
Model1 - fishes
Sensitivity | 0.54 | 0.85 | 0.78
Specificity | 0.96 | 0.71 | 0.93
Balanced accuracy | 0.75 | 0.78 | 0.86
Model2 - flow
Sensitivity | 0.55 | 0.84 | 0.75
Specificity | 0.96 | 0.72 | 0.9
Balanced accuracy | 0.75 | 0.78 | 0.83
Model3 - flow geography
Sensitivity | 0.75 | 0.92 | 0.85
Specificity | 0.97 | 0.83 | 0.96
Balanced accuracy | 0.86 | 0.87 | 0.91
Model4 - fishes geography
Sensitivity | 0.76 | 0.95 | 0.94
Specificity | 0.99 | 0.88 | 0.97
Balanced accuracy | 0.87 | 0.91 | 0.95

3.2. Natural flow regimes and HDI

The best performing RF classification (Model4) was used to predict HDI for 5696 ungaged streams, of which 43% were classified as having moderate HDI, 29% with low HDI, and 28% with high HDI. Extending the classification to neighboring streams (10 km search neighborhood) in the NHDPlusV2 network resulted in 106,949 stream reaches, with 50% classified as having moderate HDI, 27% with low HDI, and 23% with high HDI (Fig. 3). Correspondence analysis of predicted HDI and natural flow regime indicated that groundwater stable streams were more likely to have low HDI, while big rivers and intermittent runoff streams had the highest frequency of high HDI (Fig. 4). A 2-way association plot of HDI by natural flow regime class with Pearson residuals shows a high degree of similarity in the observed versus expected frequencies of low, medium, and high HDI among intermittent flashy streams (Fig. 5). Big rivers and intermittent runoff streams had a significantly greater than expected number of reaches with high HDI, whereas groundwater flashy and groundwater stable flow classes contained a significantly greater than expected number of streams with low HDI and fewer than expected streams with high HDI.

Fig. 3. Map of the regional stream network classified by HDI category. HDI values were assigned to stream reaches based on their proximity to a classified location (gaged and ungaged) within the linear stream network.

3.3. Empirical Bayesian kriging models

Empirical Bayesian kriging (EBK) maps of HDI showed areas of high disturbance predominantly surrounding larger urban areas, and in the Arkansas, Missouri, and Mississippi River Valleys (Fig. 6a). Areas least impacted by hydrologic disturbance included the Ozark and Ouachita Interior Highlands and the Boston Mountains in northwestern Arkansas. Prediction uncertainty of the EBK model used to map landscape-level spatial patterns of the HDI across the study area was generally low, indicated by a root mean square standardized error close to one (1.0008) and an average standard error (0.48) close to the root mean square error (RMSE; 0.51). Prediction standard errors for the model represent the variation between true and predicted values, where the true value lies within ±2 times the prediction standard error in 95% of cases. Prediction uncertainty tended to be highest in areas with sparse USGS stream gage coverage and fewer fish sampling records (Figs. 6b and 1). Average prediction standard error for individual HDI categories

Fig. 2. Boxplots of (a.) watershed drainage area (km²; with outliers truncated at 5000 km²), (b.) maximum elevation (m) and (c.) slope (m/km) for gaged and ungaged stream reaches where fish sampling data were available (n = 8218).


Table 4 Importance scores of the top 20 variables in RF models calculated by averaging the decrease in accuracy of each RF tree following leave-one-out permutation and scaled between 0 and 100. Fish species and flow metrics identified as important in multiple models are shown in bold.

Model1 - fishes                      Model2 - flow         Model3 - flow geography              Model4 - fishes geography
Variable                 Importance  Variable  Importance  Variable                 Importance  Variable                 Importance
Gambusia affinis         100         VE_02     100         Drainage area (km²)      100.00      Drainage area (km²)      100.00
Notropis boops           92.54       VC_01     98.11       Maximum elevation (cm)   52.62       Maximum elevation (cm)   54.47
Cyprinella lutrensis     87.9        VA_01     82.38       VE_02                    7.83        Slope (m/km)             14.58
Campostoma anomalum      67.63       VE_09     77.13       Slope (m/km)             7.71        Cottus immaculatus       10.29
Cottus immaculatus       65.64       VC_09     73.66       VC_01                    4.86        Notropis boops           8.80
Etheostoma spectabile    64.12       VC_10     73.4        VA_01                    4.81        Nocomis asper            7.39
Etheostoma caeruleum     63.82       VE_01     68.34       VA_02                    4.76        Cyprinella lutrensis     6.98
Etheostoma uniporum      62.74       VC_02     68.84       VE_07                    4.62        Etheostoma uniporum      6.30
Lepomis megalotis        61.92       VA_10     65.68       VC_10                    4.36        Percina nasuta           6.21
Lepomis cyanellus        61.85       VA_09     65.29       VC_11                    3.88        Cottus carolinae         5.54
Lepomis macrochirus      61.31       QE_11     65.69       QC_01                    3.83        Etheostoma caeruleum     5.45
Noturus exilis           59.65       VC_11     59.45       QE_02                    3.80        Etheostoma mihileze      5.27
Fundulus olivaceus       59.14       VE_07     56.57       VE_09                    3.70        Etheostoma spectabile    3.34
Lythrurus umbratilis     56.98       QE_01     55.84       VE_01                    3.68        Lepomis megalotis        3.27
Percina nasuta           56.77       VE_11     55.53       VC_02                    3.51        Luxilus cardinalis       3.08
Semotilus atromaculatus  56.44       VE_10     54.46       VA_10                    3.32        Gambusia affinis         2.68
Pimephales notatus       55.88       VA_02     54.22       VA_12                    3.28        Notropis telescopus      2.66
Micropterus punctulatus  53.91       QC_10     53.49       QC_10                    3.28        Phoxinus erythrogaster   2.63
Micropterus dolomieu     53.57       QE_09     52.71       QE_08                    3.19        Pimephales notatus       2.60
Notropis telescopus      53.28       QA_10     51.4        VA_11                    3.09        Nocomis biguttatus       2.54
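The importance scores in Table 4 rest on permuting a predictor and measuring the resulting drop in accuracy. The sketch below illustrates that idea in simplified form: it permutes one column at a time for a whole toy model (not per RF tree), averages the accuracy decrease over repeats, and rescales the result to 0–100. The min-max scaling, the toy rule `predict`, the data, and all helper names are assumptions made for illustration and do not reproduce the study's RF workflow.

```python
import random
from typing import Callable, List, Sequence


def accuracy(predict: Callable[[Sequence[int]], int],
             X: List[List[int]], y: List[int]) -> float:
    """Fraction of rows whose predicted label matches the observed label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict: Callable[[Sequence[int]], int],
                           X: List[List[int]], y: List[int],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean decrease in accuracy after shuffling each column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    drops = []
    for j in range(len(X[0])):
        total_drop = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between column j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            total_drop += baseline - accuracy(predict, X_perm, y)
        drops.append(total_drop / n_repeats)
    return drops


def scale_0_100(values: List[float]) -> List[float]:
    """Min-max rescale so the largest value is 100 and the smallest is 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [100.0 * (v - lo) / (hi - lo) for v in values]


def predict(row: Sequence[int]) -> int:
    """Toy rule (hypothetical): the class equals feature 0; feature 1 is ignored."""
    return row[0]


# Toy data: feature 0 determines the class, feature 1 is uninformative.
X = [[i % 2, (i // 2) % 2] for i in range(40)]
y = [row[0] for row in X]

raw = permutation_importance(predict, X, y)
scaled = scale_0_100(raw)
print(scaled)
```

Because the toy rule never reads feature 1, shuffling that column leaves accuracy unchanged and its scaled score is 0, while the informative feature receives the maximum score.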

was nearly equivalent, ranging from 0.45 for moderate HDI to 0.46 for high and low HDI.
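The cross-validation diagnostics reported for the EBK model can be illustrated with a short sketch. It assumes the common geostatistical definitions: RMSE as the root of the mean squared prediction error, average standard error as the root of the mean squared prediction standard error, and root mean square standardized error (RMSSE) as the root of the mean squared ratio of error to standard error. The input values and the function name `cv_diagnostics` are hypothetical; the study's own values were produced by its kriging software.

```python
import math
from typing import Dict, List


def cv_diagnostics(observed: List[float], predicted: List[float],
                   std_errors: List[float]) -> Dict[str, float]:
    """Summary diagnostics from cross-validated predictions and their standard errors."""
    errors = [p - o for p, o in zip(predicted, observed)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    avg_std_error = math.sqrt(sum(s * s for s in std_errors) / n)
    rmsse = math.sqrt(sum((e / s) ** 2 for e, s in zip(errors, std_errors)) / n)
    # Share of observations falling within +/- 2 prediction standard errors.
    coverage = sum(abs(e) <= 2 * s for e, s in zip(errors, std_errors)) / n
    return {
        "rmse": rmse,
        "avg_std_error": avg_std_error,
        "rmsse": rmsse,
        "coverage_2se": coverage,
    }


# Hypothetical cross-validation output for four locations.
observed = [1.0, 2.0, 3.0, 2.0]
predicted = [1.5, 1.5, 3.5, 1.5]
std_errors = [0.5, 0.5, 0.5, 0.5]

diag = cv_diagnostics(observed, predicted, std_errors)
print(diag)
```

An RMSSE near 1 together with an average standard error near the RMSE, as reported here (1.0008, 0.48, and 0.51), indicates that the prediction standard errors are neither systematically too small nor too large.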

3.4. Land use and landscape variables

Examining the distribution of HDI for gaged and ungaged streams by 2011 NLCD land use showed that 51% of sampling locations classified as having low HDI were located in forested areas (deciduous, evergreen, and mixed), as opposed to only 26% of locations with high HDI (Fig. 7). In contrast, the proportion of streams in agricultural land use (i.e. cultivated crops) with high HDI (16%) was four times higher than those with low HDI (4%). While streams in developed areas tended to have higher HDI, the difference among the proportions of HDI categories in developed areas was substantially less than that observed in forested land use.

Fig. 4. Correspondence analysis contribution bi-plot of HDI by natural flow regime categories: big rivers (BR), groundwater (G), groundwater flashy (GF), groundwater stable (GS), intermittent flashy (IF), intermittent runoff (IR), perennial runoff (PR), and runoff flashy (RF). GS streams were more likely to have low HDI, while BR and IR flow regimes had the highest frequency of streams with high HDI.

4. Discussion

Using a machine learning approach, we were able to predict and map the spatial distribution of hydrologic disturbance index (HDI) levels (low, moderate, high) for ungaged streams in Arkansas, Missouri, and eastern Oklahoma with low model and class uncertainty. The ability of fish occurrence data to accurately predict HDI proved to be as high or higher than hydrologic metrics describing annual and monthly mean flow rate or discharge (cfs) and velocity (fps) currently available in the NHDPlusV2 database. Models of fish occurrence data (Model1 and Model4; Table 3) also predicted low HDI class with higher sensitivity than RF models trained on flow metrics.

Streams with high and moderate HDI tended to be lower elevation rivers and streams with larger drainage areas and lower gradients of slope (Fig. 3). Larger rivers in temperate and tropical regions typically support higher fish species diversity and more complex community assemblage structure compared to headwater streams, due to increased habitat complexity, food availability, geologic age, and stability of environmental conditions (Jackson et al., 2001; Matthews, 2012). However, large river systems are also more likely to be impacted by dams and other anthropogenic land alteration, with nearly all watersheds in the United States larger than 2000 km² containing dams (Graf, 1999).

Aquatic communities are shaped by physical and chemical gradients that define the success of individual species in a given habitat and impose fundamental controls over their ecological interactions (Smith and Powell, 1971; Southwood et al., 1974; Southwood, 1977; Power et al., 1988; Rahel, 1990; Poff, 1997; Matthews, 2012). The same topographic features that act to limit the distributions of certain fish species (e.g. elevation, drainage area, slope) may also predispose certain watersheds to hydrologic disturbance (e.g. dam construction, urban development, agriculture).
Abiotic variability and disturbance, for example, may directly limit the species present in local fish assemblages, while indirectly invoking behavioral responses like migration to more favorable habitat (Matthews, 2012). Human demands for surface and ground water have increased significantly over the past several decades, with agriculture alone accounting for nearly 85% of global water consumption and, together with urbanization, leading to substantial degradation of surface water quality, loss of aquatic habitat, and unsustainable levels of water withdrawals (Foley et al., 2005). Loss of forest cover in


Fig. 5. 2-way bar plot of (A) high, moderate, and low HDI categories and (B) natural flow regime classes: big rivers (BR), groundwater (G), groundwater flashy (GF), groundwater stable (GS), intermittent flashy (IF), intermittent runoff (IR), perennial runoff (PR), and runoff flashy (RF), for streams in the Ozark and Ouachita Interior Highlands of Arkansas, Missouri, and Oklahoma (n = 23,307). Pearson residuals show the departure of observed from expected frequencies of disturbance under the null model (i.e. independence) and the size of each flow class, indicated by the width of the bar.
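The Pearson residuals in Fig. 5 follow the standard definition for a two-way table, (observed − expected) / sqrt(expected), with expected counts computed under independence from the row and column totals. A minimal sketch, using hypothetical counts rather than the study's data (the function name `pearson_residuals` is illustrative):

```python
import math
from typing import List


def pearson_residuals(table: List[List[float]]) -> List[List[float]]:
    """Pearson residuals of a two-way contingency table under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected under independence
            res_row.append((observed - expected) / math.sqrt(expected))
        residuals.append(res_row)
    return residuals


# Hypothetical counts: rows = HDI class (high, moderate, low),
# columns = two flow regime classes.
counts = [
    [30, 10],
    [20, 20],
    [10, 30],
]
res = pearson_residuals(counts)
for hdi_class, row in zip(["high", "moderate", "low"], res):
    print(hdi_class, [round(r, 2) for r in row])
```

Positive residuals mark cells with more reaches than expected under independence and negative residuals mark cells with fewer; residuals with absolute values above roughly 2 are conventionally read as notable departures.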

watersheds can greatly contribute to hydrologic disturbance of rivers by decreasing the lag time for precipitation runoff to reach stream tributaries, in addition to increasing the amount of sediment and chemical contaminants (Booth et al., 2002). In the present study, we observed that the proportion of stream reaches in forested catchments with low hydrologic disturbance was nearly twice that of streams with high HDI, while the number of stream reaches in agricultural land use in the category of high HDI was five times greater than those with low HDI. Similarly, Hill et al. (2017) observed that streams with larger watersheds

and in areas dominated by agricultural land use had a lower probability of being in good biological condition, although large rivers receiving much of their flow from snowmelt tended to have higher biological condition despite flowing through areas dominated by human-related land uses.

Geographical patterns of hydrologic disturbance across the study area suggest that distinct ecohydrological regions exist where conservation actions may be focused. Streams with high HDI values were located predominantly in the Arkansas, Missouri, and Mississippi River Valleys,

Fig. 6. Maps of (a.) HDI predicted by empirical Bayesian kriging and (b.) associated spatial patterns of prediction standard error representing the variation between true and predicted values where the true value lies within ±2 times the prediction standard error in 95% of cases.


Fig. 7. Bar graph showing the distribution of 2011 NLCD land use by HDI category for gaged and ungaged streams (n = 8218).

and around agricultural and large urban areas, while streams with the lowest disturbance were concentrated in the Ozark Highlands, Boston Mountains, and Ouachita Mountains. Big river and intermittent runoff flow classes had a significantly greater frequency of high disturbance. In contrast, groundwater stable and, to a lesser degree, groundwater flashy flow classes had a significantly higher frequency of stream reaches with low HDI. Streams with natural flow regimes dominated by perennial runoff had the highest frequency of moderate disturbance, but also a significantly lower frequency of high HDI, indicating a potential opportunity for conservation actions targeted at mitigating landscape factors contributing to hydrologic disturbance, and limiting or reducing their impact on aquatic ecosystems. Fishes identified as important for predicting HDI were predominantly benthic species, including several species of darters (Etheostoma spectabile, E. caeruleum, E. uniporum, E. mihileze) and sculpin (Cottus immaculatus, C. carolinae). These species tend to be less tolerant of turbidity and pollution, and are also typically found in riffles, runs, and shallow pools of higher gradient, headwater streams with silt-free substrate of rock, cobble, or gravel (Pflieger et al., 1975). In contrast, the western mosquitofish (Gambusia affinis), which was the most important predictor of HDI in Model1, is a generalist that occurs in habitats of all kinds, often in deep pools and stagnant ponds with low dissolved oxygen and higher temperatures. Western mosquitofish have been observed to dominate fish communities during extended low flow events along with red shiner (Cyprinella lutrensis) (Rahel and Olden, 2008), which was also identified as an important species for predicting HDI in Model1 and Model4. Overall, nine fish species were identified as having high importance in both models (Table 4). 
The central stoneroller (Campostoma anomalum) and several other species identified as influential in Model1 became less important when accounting for drainage area, elevation, and slope (Model4). Averaging variable importance over all RF trees and performing a random search to tune and optimize the ntree parameter based on out-of-bag error rates helped minimize differences in variable importance rankings due to autocorrelation among predictors (Nicodemus et al., 2010). However, some differences in the rankings of variable importance in an RF model are to be expected with the inclusion of additional factors that explain variations in HDI for streams in the study area (Strobl et al., 2008).

Using georeferenced fish records for predicting HDI for ungaged streams represents an alternative framework to using hydrologic metrics (HMs) computed with daily mean flow data. While HMs are often

hypothesized to be ecologically relevant streamflow attributes, empirical evidence of their relevance to aquatic species and communities is frequently lacking (Carlisle et al., 2017) due to incomplete streamflow data and mathematical constraints (Eng et al., 2017). Reliance on historical hydrologic time series data to determine reference conditions of streams and catchments is becoming increasingly problematic given the non-stationarity and shifting of hydrologic baselines driven by rapid global climate change coupled with anthropogenic land transformation (Poff, 2017). Analysis of aggregated biological assessment data collected in proximity to stream gaging sites in combination with long-term flow records has the potential to enhance prediction and understanding of biological responses to watershed-scale disturbance contributing to streamflow alteration.

Our categorization of HDI into high, moderate, and low levels of hydrologic disturbance based on the quantiles of the HDI scores likely ignored meaningful variations in stream impairment within the assigned HDI categories. It should also be noted that while the NHDPlusV2 provides mean annual and monthly stream flow and velocity measures, many other aspects of the flow regime (e.g. daily flow variability, base flow index, constancy of daily flows) are not accounted for, which limits the scope and utility of these measures. Performance of our approach may also be reduced in areas having sparse sampling records or low fish diversity. However, the high predictive capability of fish community data in combination with basic topographic variables for explaining variation in stream HDI across different ecoregions with highly variable hydrology and land use suggests our method can be applied to other regions for which long-term biological data are available.

5. Conclusions

The performance of stream reach-scale fish community data for accurately predicting HDI further illustrates the importance of incorporating long-term species monitoring records in environmental-flows science. Long-term biological datasets like the USGS aquatic GAP may prove to be a particularly valuable resource for predicting HDI of ungaged streams, given that reference streams used to generate hydrologic metrics are likely to be increasingly impacted by rapid changes in climate and other environmental conditions (Poff, 2017). The extensive spatiotemporal scope of the USGS aquatic GAP and other biological databases makes them a particularly valuable resource for characterizing


spatial patterns of hydrologic alteration using species location data, while alleviating the typical logistical, time, and fiscal constraints associated with conducting field surveys. The continued collection and compilation of biological data complementary to long-term streamflow datasets is vital for identifying and understanding biological community responses to hydrologic disturbance. Furthermore, modeling and mapping the spatial patterns of disturbance can inform the development of adaptive strategies to better manage for ecosystem resilience under increasing local and regional climatic variability, and provide valuable information for stakeholders to identify and prioritize conservation needs and target management actions.

Supplementary data to this article can be found online at https://doi.org/10.1016/j.scitotenv.2019.05.156.

Acknowledgements

We would like to thank the Arkansas Game and Fish Commission, Missouri Department of Conservation, Oklahoma Natural Heritage Inventory, Oklahoma Conservation Commission, and USGS Aquatic Gap Program for providing the biological survey data used in this research. We would also like to thank B. Peoples and our anonymous reviewers for their comments, which helped to substantially improve the manuscript. Funding for this work was provided by the Arkansas Game and Fish Commission State Wildlife Grant Program (grant no. AR-TF16AF01276). The authors have no conflicts of interest related to this research. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

References

Annis, G.M., Millican, D., Gallipeau, C., Inlander, E., Diamond, D., Sowa, S., Morey, M., Hanberry, P., Garringer, A., Mabrey, K., 2011. Developing stream reach-scale predicted distribution models for fish species in Arkansas. Final Report Submitted to the USGS National Gap Analysis Program, p. 1,079.
Bergstra, J., Bengio, Y., 2012. Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13, 281–305.
Biemans, H., Haddeland, I., Kabat, P., Ludwig, F., Hutjes, R., Heinke, J., Von Bloh, W., Gerten, D., 2011. Impact of reservoirs on river discharge and irrigation water supply during the 20th century. Water Resour. Res. 47.
Bond, N.R., Kennard, M.J., 2017. Prediction of hydrologic characteristics for ungauged catchments to support hydroecological modeling. Water Resour. Res. 53 (11), 8781–8794.
Booth, D.B., Hartley, D., Jackson, R., 2002. Forest cover, impervious-surface area, and the mitigation of stormwater impacts. J. Am. Water Resour. Assoc. 38, 835–845.
Breiman, L., 2001. Random forests. Mach. Learn. 45, 5–32.
Bunn, S.E., Arthington, A.H., 2002. Basic principles and ecological consequences of altered flow regimes for aquatic biodiversity. Environ. Manag. 30, 492–507.
Carlisle, D.M., Wolock, D.M., Meador, M.R., 2011. Alteration of streamflow magnitudes and potential ecological consequences: a multiregional assessment. Front. Ecol. Environ. 9, 264–270.
Carlisle, D.M., Grantham, T.E., Eng, K., Wolock, D.M., 2017. Biological relevance of streamflow metrics: regional and national perspectives. Freshwater Sci. 36, 927–940.
Döll, P., Schmied, H.M., 2012. How is the impact of climate change on river flow regimes related to the impact on mean annual runoff? A global-scale analysis. Environ. Res. Lett. 7, 014037.
Eng, K., Grantham, T.E., Carlisle, D.M., Wolock, D.M., 2017. Predictability and selection of hydrologic metrics in riverine ecohydrology. Freshwater Sci. 36, 915–926.
Falcone, J.A., 2011. GAGES-II: Geospatial Attributes of Gages for Evaluating Streamflow. US Geological Survey.
Falcone, J.A., Carlisle, D.M., Wolock, D.M., Meador, M.R., 2010. GAGES: a stream gage database for evaluating natural and altered flow conditions in the conterminous United States. Ecology 91, 621.
Foley, J.A., DeFries, R., Asner, G.P., Barford, C., Bonan, G., Carpenter, S.R., Chapin, F.S., Coe, M.T., Daily, G.C., Gibbs, H.K., 2005. Global consequences of land use. Science 309, 570–574.
Friendly, M., 2013. Working with categorical data with R and the vcd and vcdExtra packages. Retrieved from cran.r-project.org/web/packages/vcdExtra/vignettes/vcd-tutorial.pdf (last accessed July 2013).
Friendly, M., Meyer, D., 2015. Discrete Data Analysis With R: Visualization and Modeling Techniques for Categorical and Count Data. CRC Press.
Graf, W.L., 1999. Dam nation: a geographic census of American dams and their large-scale hydrologic impacts. Water Resour. Res. 35, 1305–1311.
Guo, Q., Olden, J.D., 2014. Spatial scaling of non-native fish richness across the United States. PLoS One 9, e97727.
Hill, R.A., Fox, E.W., Leibowitz, S.G., Olsen, A.R., Thornbrugh, D.J., Weber, M.H., 2017. Predictive mapping of the biotic condition of conterminous US rivers and streams. Ecol. Appl. 27, 2397–2415.
Homer, C., Dewitz, J., Yang, L., Jin, S., Danielson, P., Xian, G., Coulston, J., Herold, N., Wickham, J., Megown, K., 2015. Completion of the 2011 National Land Cover Database for the conterminous United States–representing a decade of land cover change information. Photogramm. Eng. Remote. Sens. 81, 345–354.
Jackson, D.A., Peres-Neto, P.R., Olden, J.D., 2001. What controls who is where in freshwater fish communities the roles of biotic, abiotic, and spatial factors. Can. J. Fish. Aquat. Sci. 58, 157–170.
Krivoruchko, K., 2012. Empirical Bayesian Kriging. ESRI: Redlands, CA, USA. Available online at: http://www.esri.com/news/arcuser/1012/empirical-byesian-kriging.html (Last accessed 08.02.2016).
Kuhn, M., 2008. Building predictive models in R using the caret package. J. Stat. Softw. 28, 1–26.
Leasure, D., Magoulick, D.D., Longing, S., 2016. Natural flow regimes of the Ozark–Ouachita interior highlands region. River Res. Appl. 32, 18–35.
Lynch, D.T., Leasure, D.R., Magoulick, D.D., 2019. Flow alteration-ecology relationships in Ozark Highland streams: consequences for fish, crayfish and macroinvertebrate assemblages. Sci. Total Environ. 672, 680–697. https://doi.org/10.1016/j.scitotenv.2019.03.383.
Matthews, W.J., 2012. Patterns in Freshwater Fish Ecology. Springer Science & Business Media.
McKay, L., Bondelid, T., Rea, A., Johnston, C., Moore, R., Dewald, T., McKay, L., Bondelid, T., Rea, A., Johnston, C., 2012. User Guide (Data Model Version 2.1).
McManamay, R.A., Orth, D.J., Dolloff, C.A., 2012. Revisiting the homogenization of dammed rivers in the southeastern US. J. Hydrol. 424, 217–237.
McManamay, R.A., Peoples, B.K., Orth, D.J., Dolloff, C.A., Matthews, D.C., 2015. Isolating causal pathways between flow and fish in the regulated river hierarchy. Can. J. Fish. Aquat. Sci. 72, 1731–1748.
Mims, M.C., Olden, J.D., 2013. Fish assemblages respond to altered flow regimes via ecological filtering of life history strategies. Freshw. Biol. 58, 50–62.
Nenadic, O., Greenacre, M., 2007. Correspondence analysis in R, with two- and three-dimensional graphics: the ca package. J. Stat. Softw. 20.
Nicodemus, K.K., Malley, J.D., Strobl, C., Ziegler, A., 2010. The behaviour of random forest permutation-based variable importance measures under predictor correlation. BMC Bioinf. 11, 110.
Olden, J.D., Kennard, M.J., 2010. Intercontinental comparison of fish life history strategies along a gradient of hydrologic variability. American Fisheries Society Symposium, pp. 83–107.
Pflieger, W.L., Sullivan, M., Taylor, L., 1975. The Fishes of Missouri. Missouri Department of Conservation Jefferson City.
Pilz, J., Spöck, G., 2008. Why do we need and how should we implement Bayesian kriging methods. Stoch. Env. Res. Risk A. 22 (5), 621–632.
Poff, N.L., 1997. Landscape filters and species traits: towards mechanistic understanding and prediction in stream ecology. J. N. Am. Benthol. Soc. 16, 391–409.
Poff, N.L., 2017. Beyond the natural flow regime? Broadening the hydro-ecological foundation to meet environmental flows challenges in a non-stationary world. Freshw. Biol. 63 (8), 1011–1021.
Poff, N.L., Allan, J.D., Bain, M.B., Karr, J.R., Prestegaard, K.L., Richter, B.D., Sparks, R.E., Stromberg, J.C., 1997. The natural flow regime. BioScience 47, 769–784.
Poff, N.L., Richter, B.D., Arthington, A.H., Bunn, S.E., Naiman, R.J., Kendy, E., Acreman, M., Apse, C., Bledsoe, B.P., Freeman, M.C., 2010. The ecological limits of hydrologic alteration (ELOHA): a new framework for developing regional environmental flow standards. Freshw. Biol. 55, 147–170.
Power, M.E., Stout, R.J., Cushing, C.E., Harper, P.P., Hauer, F.R., Matthews, W.J., Moyle, P.B., Statzner, B., Irene, R., De Badgen, W., 1988. Biotic and abiotic controls in river and stream communities. J. N. Am. Benthol. Soc. 7, 456–479.
Rahel, F.J., 1990. The hierarchical nature of community persistence: a problem of scale. Am. Nat. 136, 328–344.
Rahel, F.J., Olden, J.D., 2008. Assessing the effects of climate change on aquatic invasive species. Conserv. Biol. 22, 521–533.
Ruete, A., 2015. Displaying bias in sampling effort of data accessed from biodiversity databases using ignorance maps. Biodiversity Data J. 3.
Scott, J.M., Davis, F., Csuti, B., Noss, R., Butterfield, B., Groves, C., Anderson, H., Caicco, S., D'Erchia, F., Edwards Jr., T.C., 1993. Gap analysis: a geographic approach to protection of biological diversity. Wildl. Monogr. 3–41.
Smith, C.L., Powell, C.R., 1971. The summer fish communities of Brier Creek, Marshall County, Oklahoma. American Museum Novitates; no. 2458.
Southwood, T.R., 1977. Habitat, the templet for ecological strategies? J. Anim. Ecol. 337–365.
Southwood, T., May, R., Hassell, M., Conway, G., 1974. Ecological strategies and population parameters. Am. Nat. 108, 791–804.
Sowa, S.P., Annis, G., Morey, M.E., Diamond, D.D., 2007. A gap analysis and comprehensive conservation strategy for riverine ecosystems of Missouri. Ecol. Monogr. 77, 301–334.
Strobl, C., Boulesteix, A.-L., Kneib, T., Augustin, T., Zeileis, A., 2008. Conditional variable importance for random forests. BMC Bioinf. 9, 307.
Troia, M.J., McManamay, R.A., 2017. Completeness and coverage of open-access freshwater fish distribution data in the United States. Divers. Distrib. 23, 1482–1498.