Application of multiple index development approaches to benthic invertebrate data from the Virginian Biogeographic Province, USA

Application of multiple index development approaches to benthic invertebrate data from the Virginian Biogeographic Province, USA

Ecological Indicators 23 (2012) 176–188 Contents lists available at SciVerse ScienceDirect Ecological Indicators journal homepage: www.elsevier.com/...

1MB Sizes 0 Downloads 10 Views

Ecological Indicators 23 (2012) 176–188

Contents lists available at SciVerse ScienceDirect

Ecological Indicators journal homepage: www.elsevier.com/locate/ecolind

Application of multiple index development approaches to benthic invertebrate data from the Virginian Biogeographic Province, USA Marguerite C. Pelletier a,b,∗ , Arthur J. Gold b , Liliana Gonzalez c , Candace Oviatt d a U.S. Environmental Protection Agency, Office of Research and Development, National Health and Environmental Effects Laboratory, Atlantic Ecology Division, Narragansett, RI 02882, United States b University of Rhode Island, Department of Natural Resources Science, Kingston, RI 02881, United States c University of Rhode Island, Department of Computer Science and Statistics, Kingston, RI 02881, United States d University of Rhode Island, Graduate School of Oceanography, Narragansett, RI 02882, United States

a r t i c l e

i n f o

Article history: Received 23 December 2010 Received in revised form 15 March 2012 Accepted 18 March 2012 Keywords: Benthic indices Method comparison Invertebrates Estuarine

a b s t r a c t Previous work had indicated that the Virginian Province Index did not perform well in a smaller estuarine complex. While it was hoped that the existing Chesapeake Bay Benthic Index of Biotic Integrity, with its greater number of metrics and habitat separation would be more adaptable, this index also did not perform well outside of Chesapeake Bay. In this study we assembled additional metrics and applied different methods of index compilation to explore the indices relative strengths and weaknesses. Three different approaches were utilized – two multimetric indices (Chesapeake Bay IBI and the Mebane IBI) and a statistical logistic regression technique. The data were subdivided by habitat (salinity and grain size), and indices compiled using the same initial group of benthic metrics. Each approach was examined for its classification accuracy for both reference and impaired sites for the entire Virginian Province. The Chesapeake Bay IBI approach did not perform well in this study. In contrast, another multimetric approach, the Mebane IBI approach, performed well, as did the statistical logistic regression approach. Both techniques have promise for index development and could be useful in applying a biological condition gradient to estuaries. Published by Elsevier Ltd.

1. Introduction Macroinvertebrates have commonly been used to detect pollution impacts in estuaries (Pinto et al., 2009; Dauvin et al., 2010). They are abundant, easy to collect and their communities are very diverse, with representatives from many different phyla (Snelgrove, 1998) utilizing many different habitats and feeding strategies (Rhoads, 1974; Weisberg et al., 1997; Little, 2000). Macroinvertebrate assemblages also respond predictably to pollution (Pearson and Rosenberg, 1978; Hart and Fuller, 1979), are relatively sedentary, and act as integrators of stress over months to years (Weisberg et al., 1997; Paul et al., 2001). Benthic communities can be assessed using multivariate, univariate and multimetric approaches. Multivariate analysis is generally more sensitive (Warwick and Clarke, 1993; Clarke and Warwick, 2001). Less information is lost, and more complicated

∗ Corresponding author at: U.S. Environmental Protection Agency, Office of Research and Development, National Health and Environmental Effects Laboratory, Atlantic Ecology Division, Narragansett, RI 02882, United States. Tel.: +1 401 782 3131; fax: +1 401 782 3030. E-mail address: [email protected] (M.C. Pelletier). 1470-160X/$ – see front matter. Published by Elsevier Ltd. doi:10.1016/j.ecolind.2012.03.019

environmental gradients may be detected. Unfortunately, these analyses may be more difficult to interpret by non-experts. Given the potential difficulty of interpretation and the requirement for advanced statistical packages, many environmental managers prefer to use univariate (e.g., Shannon’s H ) or multivariate indices (Weisberg et al., 1997; Paul et al., 2001) in estuaries to assess benthic community response. Benthic indices based on macroinvertebrates have been developed and used for assessing estuarine condition (Marques et al., 2009). In the United States, it has been suggested that biomonitoring and indices be used in marine and estuarine waters much as they are currently used in streams to address water quality standards (Gibson et al., 2000). Benthic indices summarize aspects of the invertebrate community into a single number that can be used by environmental managers (Engle and Summers, 1999; Pinto et al., 2009). These indices can be related to various environmental factors which can help to diagnose identified impairment. Both multimetric and multivariate indices generally are composed of metrics representing different aspects of the benthic community including diversity, productivity, pollution tolerance-sensitivity, trophic composition and species composition, which are expected to change in response to environmental stressors such as sediment contamination or estuarine eutrophication. If chosen well, these metrics, and indices,

M.C. Pelletier et al. / Ecological Indicators 23 (2012) 176–188

177

Fig. 1. Map of Virginian Biogeographic Province.

will be primarily responsive to the environmental gradient of concern (e.g., anthropogenic impairment) and minimize the impact of natural gradients. However, because these indices are community summaries, some information is lost; it is possible that they will be unresponsive to factors not accounted for during index development or that some undetected gradient could influence the index. A variety of indices have been developed worldwide (Marques et al., 2009). In the United States, benthic indices comprised of multiple metrics are commonly utilized (Karr et al., 1986; Weisberg et al., 1997; Engle and Summers, 1999; Van Dolah et al., 1999; Paul et al., 2001; Llansó et al., 2002a,b; Hale and Heltshe, 2008), although the method of their construction varies. The Index of Biotic Integrity (IBI), an additive multimetric technique, was first developed for streams using fish communities (Karr et al., 1986). This approach was later applied to stream macroinvertebrate communities (Ohio EPA, 1987; Plafkin et al., 1989; Kerans and Karr, 1994), and then to estuarine macroinvertebrate communities (Weisberg et al., 1997; Van Dolah et al., 1999). This approach is now used in more than 80% of water quality programs in the United States (Norris and Hawkins, 2000). Although the traditional approach for developing IBIs advocated by Karr has been utilized in U.S. estuarine waters (Weisberg et al., 1997; Van Dolah et al., 1999), alternative IBI development practices have been utilized in freshwater. One modification uses continuous scoring, and it has been shown to be more effective than non-continuous scoring (i.e., 1, 3, 5) of traditional IBI approaches (Pinto et al., 2009; Blocksom, 2003). Continuous scoring was used to develop a fish index for Pacific Northwest rivers (Mebane et al., 2003), and macroinvertebrate indices in mid-Atlantic highland streams (Blocksom, 2003). A variation on the traditional IBI approaches is to use statistical techniques to assemble multimetric indices. For example, Engle and Summers (1999) developed a benthic index comprised of five

metrics for Gulf of Mexico estuaries using discriminant analysis. This index was scaled between zero and ten to aid in interpretation. Paul et al. (2001) also used discriminant analysis to develop a benthic index using three metrics for the Virginian Biogeographic Province (Fig. 1) located along the east coast of the United States. To account for habitat differences, some metrics were adjusted for salinity. Index values below zero are considered impaired while those above zero are considered unimpaired. Hale and Heltshe (2008) used logistic regression rather than discriminant analysis to develop a benthic index composed of three metrics for the Gulf of Maine. This index also ranges from 0 to 10. Diaz et al. (2004) suggested that development of new indices should only proceed after examination of the appropriateness of existing indices. The Virginian Province Index (Paul et al., 2001) was developed using discriminant analysis. This index was well calibrated for the entire Province (Fig. 1), while also performing well when applied to the Chesapeake Bay (Ranasinghe et al., 2002). However, it was not particularly effective when applied to assessing benthic condition in the Hudson-Raritan estuary (D. Adams, personal communication). The better performance in the Chesapeake Bay was likely due to the fact that almost half of the stations used to develop the Virginian Province Index were from Chesapeake Bay. The Chesapeake Bay Benthic Index of Biotic Integrity (Weisberg et al., 1997; Llansó, 2002) is a multimetric index that is subdivided by habitat and contains many more metrics summarizing benthic community condition. Given the success in applying this index within Chesapeake Bay, the largest estuary in the United States, and its subestuaries, we hoped that it would be possible to directly apply the Index to the entire Virginian Province. However, our initial attempts to directly apply this index did not provide good classification accuracy, especially for impaired sites. Our results corresponded to other unsuccessful attempts to apply this index to

178

M.C. Pelletier et al. / Ecological Indicators 23 (2012) 176–188

Delaware Bay, Long Island Sound, and the Hudson-Raritan estuary (see Engle and Summers, 1999). Thus, both of the existing indices in the Virginian Province were not universally applicable. In this paper we examined three different methods of index compilation to examine the strengths and weaknesses of each index compilation technique, rather than trying to produce the best index for the region. Later research will focus on producing an optimal index for the Province. Two multimetric approaches were used (Chesapeake Bay IBI approach; Weisberg et al., 1997; Llansó, 2002; Van Dolah et al., 1999 and Mebane IBI approach; Mebane et al., 2003) and one multivariate approach (logistic regression approach; Hale and Heltshe, 2008). Each approach was examined for its classification accuracy for both reference and impaired sites. To examine the broader applicability of our results we applied our index approaches to another area, the Acadian Biogeographic Region.

Table 1 Benthic habitat categories. Habitat Tidal freshwater Oligohaline Low mesohaline High mesohaline Polyhaline mud Polyhaline sand

TF O LM HM PM PS

Salinity (ppt)

Grain size (% silt/clay)

0–0.05 ≥0.05 to 5 ≥5 to 12 ≥12 to 18 ≥18 ≥18

n/a n/a n/a n/a ≥40 0–40

this context, reference stations are not pristine but rather are those which are least impacted. At impaired sites, one or more of the following conditions existed: ten or more ERLs or one or more ERM exceeded, amphipod survival less than 80%, or bottom D.O. less than or equal to 2 mg/L. 2.2. Benthic metrics and analysis overview

2. Materials and methods 2.1. Study region/field data Data from the Environmental Monitoring and Assessment Program (EMAP, www.epa.gov/emap) were assembled for this study (1856 stations). These data encompassed seven different monitoring efforts between 1990 and 2006 during the summer index period (July through early October) in the Virginian Biogeographic Province (Fig. 1). These monitoring efforts used similar gear and sampling/analysis methods. Data from stations sampled more than one time during a single year were averaged. Data from stations sampled over multiple years (12 stations over a four year period) were treated as independent samples. Strict quality assurance protocols were followed for samples collected in the field (U.S. EPA, 2001) and analyzed in the laboratory (U.S. EPA, 1995). Variables used in our analysis included benthic invertebrate abundance, sediment chemistry (organic and inorganic analytes), sediment grain size and total organic carbon, bottom dissolved oxygen and salinity measures, and results from sediment toxicity tests using the amphipod Ampelisca abdita (U.S. EPA, 1995, 2001). Most of the macroinvertebrate data were collected using a 0.04 m2 Young-modified Van Veen grab. Less than two percent of the samples were collected using a 0.04 m2 Ponar grab or a 0.1 m2 Smith MacIntyre grab. Samples were sieved though a 0.5 mm screen, preserved in formalin and then enumerated and identified to the lowest possible taxonomic level, generally genus or species. Both sorting and counting required that 10% of all samples processed by each technician were rechecked. For this study, species nomenclature was standardized against the Integrated Taxonomic Information System (ITIS) database (http://www.itis.gov) With the exception of Class Oligochaeta, only organisms identified to genus or species level were retained for analysis. Pelagic and epifaunal organisms were also eliminated as they are more motile and less directly exposed to in situ pollution at the sampling site. Because habitat differences can cause large differences in community structure, all stations were classified by habitat (Pelletier et al., 2010), defined by salinity and grain size (Table 1). All stations were additionally classified as ‘reference’ and ‘impaired’ using concurrently collected EMAP data based on sediment contaminant threshold values (Effects Range Median (ERM) and Effects Range Low (ERL); Long et al., 1995), amphipod toxicity, total organic carbon (TOC), and dissolved oxygen (D.O.) levels (see Weisberg et al., 1997; Hale et al., 2004). Reference sites were defined using the following criteria: no ERMs and less than 4 ERLs exceeded, amphipod survival greater than 80% relative to the control, TOC less than or equal to 2%, and bottom D.O. greater than or equal to 5 mg/L. In

To evaluate estuarine benthic indices, a total of 23 candidate metrics were compiled from the literature (Table 2). A majority of these metrics had been utilized in benthic indices to discriminate between reference and impaired sites. These metrics represented different aspects of the benthic community including diversity, productivity, pollution tolerance-sensitivity, trophic composition (Tables 3 and 4) and species composition. Shannon’s diversity, number of species, number of individuals and ES50 (used to calculate mES50 and AbwES50), were calculated in PRIMER 6 (Clarke and Gorley, 2006). ES50 is a diversity measure calculated using rarefaction (Hurlbert, 1971; Legendre and Legendre, 1998). For each species, the lower 5% of the distribution of the ES50s was calculated to derive a tolerance value with low values indicating pollution tolerant species and high values indicating pollution sensitive species (Rosenberg et al., 2004). Then for each station, the average of the tolerance values for all species was calculated (mES50). To calculate the abundance weighted value (AbwES50), each tolerance value was multiplied by the species abundance prior to averaging tolerance values. Dominance scores, percent sensitivity and tolerance, tolerance score and all trophic and species composition scores were computed in SAS 9.2 (SAS Institute 2002–2008). The resulting data were then divided into calibration (75% of the data) and validation (the remaining 25% of the data) datasets (Hosmer and Lemeshow, 2000). The calibration data were used to develop the models. The metric scoring and models developed from the calibration data were validated by applying them to the remaining 25% of the data not used in model development. The benthic data were subdivided into six habitat categories based on salinity and grain size (Table 1). All analyses were then performed on the individual habitats. The benthic invertebrate data were screened to remove epibenthic and pelagic species. Benthic metrics that had been used in other index development exercises were calculated to summarize different aspects of the benthic community (Table 2). All metrics in each of the six habitats were examined for their ability to discriminate between reference and impaired sites using SPSS (Version16.0). The Mann–Whitney test, a nonparametric analogue to the t-test (Zar, 1999), was used to determine if there were significant differences (˛ = 0.10) between reference and impaired sites (for a given metric in a given habitat). An alpha of 0.10 was chosen to reduce the risk of classifying degraded habitats as nondegraded and to be consistent with other studies (Van Dolah et al., 1999; Llansó et al., 2002a,b). For each habitat, box plots of the distribution of each metric were evaluated to determine if there was adequate discrimination between reference and impaired sites (Fig. 2a). A metric was eliminated if the reference interquartile range was encompassed by the impaired interquartile range or vice versa. A metric was also eliminated if >70% of the values were zero

M.C. Pelletier et al. / Ecological Indicators 23 (2012) 176–188

179

Table 2 Benthic metrics. All potential benthic metrics are listed. The metrics selected for inclusion in the benthic indices are indicated by the boxes below each habitat (TF, tidal freshwater; O, oligohaline; LM, low mesohaline; HM, high mesohaline; P M, polyhaline mud; P S, polyhaline sand).

TF Species Diversity S H' GleasD % dom1 % dom2 % dom3 Productivity N

1 2 3 4 4 4

2

O

LM

HM

P_M

P_S

Number of species Shannon-Weiner diversity Gleason's D % abundance of most dominant species % abundance of two most dominant species % abundance of three most dominant species Total number of individuals

Pollution tolerance-sensitivity 5 % abundance of pollution sensitive species % sens 5 % abundance of pollution tolerant species % tol 6 Tolerance score tol score 4 mES50 Station mean of Rosenberg's tolerance value 4 Abundance weighted station mean of Rosenberg's tolerance v alue AbwES50 Trophic composition 2 % carn % abundance of carnivores and omnivores 2 % depo % deep deposit feeders Species composition 1 % bivalve % abundance Bivalvia 1 % crust % abundance Crustacea 7 % echino % abundance of Echinodermata 1 % abundance Tellinidae % tellin 1 % abundance Lucinidae % lucin 1 % capi % abundance Capitellidae 3 % spio % abundance of Spionidae 3 % abundance of Tubificidae % tubi 6 tany:chir Abundance ratio Tanypodinae:Chironomidae 1

Van Dolah et al. (1999). Weisberg et al. (1997). 3 Paul et al. (2001). 4 Hale and Heltshe (2008). 5 Pelletier et al. (2010). 6 Llansó (2002). 7 Grall and Glémarec (1997). 2

or non-detects. For each habitat, Pearson’s correlation analysis was performed to remove redundant metrics (r > 0.70). Twenty-three metrics (in each habitat) were reduced to 2 in the tidal freshwater habitat, 3 in the oligohaline habitat, 5 each in the low mesohaline and high mesohaline habitats, and 8 each in polyhaline mud and polyhaline sand habitats (Table 2). The resulting metrics were used to develop indices using the three different approaches. Each approach was examined for its classification accuracy for both reference and impaired sites. Overall classification accuracy is the percentage of both reference and impaired sites that are correctly identified by the model. The correct impaired classification is the percentage of impaired sites correctly identified as impaired by the model. This can also be referred to as sensitivity (probability of correct classification of an event). The correct reference classification is the percentage of reference sites correctly identified as reference, sometimes called specificity (probability of correct classification of a nonevent). Since the Chesapeake Bay approach focused on reference sites while the logistic regression approach focused on impaired sites, the definition of event/non-event will vary for the same stations, so the terms sensitivity and specificity are not used here.

2.3. Chesapeake Bay IBI approach Following the procedure of Weisberg et al. (1997), Llansó (2002) and Van Dolah et al. (1999), individual metrics were scored as 1, 3, or 5 based on the distribution of that metric across all the reference sites. For each station, a score was assigned to each metric (Van Dolah et al., 1999). For metrics expected to be higher at reference sites (e.g., % abundance of carnivores and omnivores (% carn)), the following scoring was utilized: a score of 5 was assigned if the value for that metric at that station was equal to or above the 50th percentile of the corresponding reference values, a score of 1 was assigned if the value for that metric at that station was below the 10th percentile of the corresponding reference values, while a score of 3 was assigned to values between the 10th and 50th percentile. For metrics expected to be lower at reference sites (e.g., % abundance of the two most dominant species (% dom2)), the following scoring was utilized: a score of 5 was assigned if the value for that metric at that station was equal to or below the 50th percentile of the corresponding reference values, a score of 1 was assigned if the value for that metric at that station was above the 90th percentile of the corresponding reference values, while a score of 3 was assigned to values between 50th and 90th percentiles. The

180

M.C. Pelletier et al. / Ecological Indicators 23 (2012) 176–188

Fig. 2. Example of steps for scoring metrics using the Mebane IBI approach. Two metrics from the polyhaline mud habitat (% carn and % dom2) were selected as representative cases. (a) Metrics are examined to determine if they are higher in reference (% abundance of carnivores and omnivores (% carn)) or impaired (% abundance of the two most dominant species (% dom2)) sites. (b) Cumulative distribution function (cdf) is created from binned data from the entire dataset (reference, impaired and unclassified sites). (c) The truncated cdf (5th to 95th percentile of the distribution) is scaled between 0 and 1 and curve fit to the points. These equations are later applied to the entire dataset to score the metrics.

metric based on total numbers of individuals was expected to have a bimodal distribution – both very high and very low values may indicate ecological impact (Pearson and Rosenberg, 1978). Because of this, stations were scored as 5 if the value for that station was between the 25th and 75th percentiles of the corresponding reference values, as 3 if the value was between the 10th and 25th or between the 75th and 90th percentiles and 1 if the value for that metric at that station was below the 10th or above the 90th percentile. The index was calculated by averaging across all metrics at a site. Index scores of 3 or higher indicate that the benthic community is not impaired.

2.4. Mebane IBI approach Individual metrics were scored continuously from 0 to 1, based on the distribution of that metric across all stations (reference, impaired, unclassified). A cumulative frequency distribution (cdf) was created for each metric in each habitat in SPSS (Fig. 2b). The proportions from the cdf were then converted to scores. The minimum and maximum values were set by the 5th and 95th percentile of all sites. The proportions were then rescaled between 0 and 1. For metrics expected to be lower at reference sites (e.g., % dom2, Fig. 2a), the proportions were reversed so that a proportion of 0 was

M.C. Pelletier et al. / Ecological Indicators 23 (2012) 176–188 Table 3 Species list of carnivores and omnivores for this study. Ablabesmyia parajanta Acteocina Acteocina canaliculata Acteocina oryza Aglaophamus verrilli Alpheus heterochaelis Anaitides mucosa Ancinus depressus Ancistrosyllis Ancistrosyllis hartmanae Arabella Arabella iricolor Arabella mutans Brania clavata Brania wellfleetensis Cabira incerta Carinoma tremaphoros Ceratonereis irritabilis Chaoborus punctipennis Chiridotea Chiridotea almyra Chiridotea arenicola Chiridotea nigrescens Chiridotea tuftsi Chironomus Cladotanytarsus Clinotanypus pinguis Coelotanypus Cryptochironomus Cryptochironomus fulvus Cryptotendipes Cyathura Cyathura burbancki Cyathura polita Dicrotendipes Dicrotendipes nervosus Dinophilus Diopatra cuprea Drilonereis longa Drilonereis magna Edwardsia elegans Eobrolgus spinosus Eteone fauchaldi Eteone heteropoda Eteone lactea Eteone longa Eumida sanguinea Exogone Exogone dispar Exogone hebes

Exogone rolani Fimbriosthenelais Fimbriosthenelais minor Glycera Glycera americana Glycera dibranchiata Glycinde armigera Glycinde solitaria Glyptotendipes Gomphus Goniadella gracilis Grubeosyllis clavata Haminoea solitaria Harnischia Hexagenia Laeonereis culveri Lumbrineris Lumbrineris acicularum Lumbrineris ernesti Lumbrineris tenuis Lumbrineris verrilli Lysidice Marphysa sanguinea Microphthalmus Microphthalmus aberrans Microphthalmus sczelkowii Microphthalmus similis Micrura leidyi Nassarius Nassarius obsoletus Nassarius trivittatus Nassarius vibex Neanthes succinea Nematonereis hebes Nephtys Nephtys incisa Nephtys picta Nereis Nereis acuminata Nereis grayi Neverita duplicata Ninoe nigripes Notocirrus spiniferus Ogyrides alphaerostris Onuphis eremita Parahesione luteola Paranaitis speciosa Parapionosyllis longicirrata Paratanytarsus Paratendipes

Parougia caeca Pettiboneia duofurca Phyllodoce Phyllodoce arenae Pinnixa Pinnixa chaetopterana Pinnixa sayana Pionosyllis Pisione remota Podarke obscura Podarkeopsis levifuscina Polinices duplicatus Polypedilum Polypedilum convictum Polypedilum halterale Procladius Procladius sublettei Ptilanthura tenuis Rictaxis punctostriatus Robackia Schistomeringos rudolphi Sigalion arenicola Sigambra Sigambra bassi Sigambra tentaculata Sphaerosyllis taylori Sthenelais boa Sthenelais limicola Stictochironomus Streptosyllis arenae Streptosyllis pettiboneae Stylochus ellipticus Syllis corallicoides Syllis gracilis Tanypus Tanytarsus Tectonatica pusilla Terebra dislocata Tubulanus Tumidotheres maculatus Xenochironomus

Table 4 Species list of deep deposit feeders for this study. Amastigos caperatus Aulodrilus pigueti Axiothella mucosa Branchiura sowerbyi Capitella Capitella capitata Capitellides jonesi Clymenella Clymenella torquata Dero Ennucula tenuis Euclymene zonalis Heteromastus filiformis Ilyodrilus templetoni Leitoscoloplos Leitoscoloplos fragilis Leptosynapta Leptosynapta tenuis Limnodrilus Limnodrilus hoffmeisteri

Macroclymene zonalis Maldane glebifex Mediomastus Mediomastus ambiseta Mediomastus californiensis Notomastus Notomastus latericeus Nucula Nucula proxima Oligochaeta Orbinia americana Pectinaria Pectinaria gouldi Pectinaria granulata Peloscolex benedeni Peloscolex heterochaetus Phoxocephalus holbolli Polygordius Rhepoxynius epistomus Rhepoxynius hudsoni

Sabaco elongatus Scalibregma inflatum Scoloplos Scoloplos acutus Scoloplos armiger Scoloplos robustus Scoloplos rubra Solemya velum Stylaria lacustris Travisia carnea Travisia parva Tubificoides Tubificoides wasselli Yoldia Yoldia limatula

181

equivalent to a score of 1. A line was fitted to the scores (Fig. 2c) using a least squared algorithm in SigmaPlot (Version 11.0). The equation was then used to score the metric for each station in that habitat. Resulting values greater than one were corrected to 1 while values less than zero were corrected to 0. Classification accuracy was calculated using a variety of thresholds (1st percentile to the 99th percentile) based on the reference distribution. The index was created by averaging across all metrics at a site, then multiplying by 10 so that the index ranged from 0 to 10, with 0 indicating poor condition. There is no set impairment threshold. Threshold selection is a management decision based on the distribution of the index at reference sites. 2.5. Logistic regression approach Logistic regression was used to model the probability of impairment based on the benthic community (Agresti, 1996; Hosmer and Lemeshow, 2000). The index was developed from the logistic regression model: p = elog it

p

(1 + elog it p )

,

where p = probability that a site is impaired

The benthic index is calculated as: 10 × (1 − p) as per Hale and Heltshe (2008) so the index is scaled from 0 to 10, with 10 indicating a lower probability of impairment. The saturated model, which contained all significant metrics for a given habitat, was compared to all possible subset (‘reduced’) models using SAS 9.2 (SAS Institute, 2002–2008). These ‘reduced’ models were examined for overall significance (p < 0.10), maximum rescaled R2 and the Akaike’s Information Criterion (AIC). AIC is a log-likelihood measure that accounts for the number of parameters in the model and is used to evaluate the relative strength of competing models (Agresti, 1996; Burham and Anderson, 2002). A lower AIC value indicates that less information was lost, indicating a potentially better model. Reduced models were then compared to the saturated model using the likelihood ratio statistic, G2 (Agresti, 1996). A more parsimonious model will have a low AIC value and a non-significant G2 test. In addition, a goodness of fit measure, the Hosmer–Lemeshow test was examined to select appropriate models. The Hosmer and Lemeshow test divides the data into deciles based on predicted probabilities then calculates 2 based on observed and expected values. A non-significant test indicates that the model fits the data (Hosmer and Lemeshow, 2000). All models meeting these criteria are valid models that are more parsimonious and not significantly different from the saturated model. The final index has no set impairment threshold. Thresholds for defining impaired and unimpaired condition are based on management objectives. 2.6. Acadian Province validation Data from the Acadian Biogeographic Province (Fig. 3) were assembled for this study (182 stations, www.epa.gov/emap). These data were collected between 2000 and 2001 during the summer index period (July through October). These data were collected using similar gear and sampling/analysis and quality assurance methods as those used in the Virginian Biogeographic Province. As in the Virginian Province, with the exception of Class Oligochaeta, only organisms identified to genus or species level were retained for analysis. All stations were classified by habitat. The dominant habitats in the Acadian Province, polyhaline mud and polyhaline sand, were examined (Table 1). All analyses were conducted on each individual habitat separately. All stations were additionally classified as ‘reference’ and ‘impaired’ based on the criteria described previously. The same 23 candidate metrics (Table 2) were

182

M.C. Pelletier et al. / Ecological Indicators 23 (2012) 176–188

Fig. 3. Map of Acadian Biogeographic Province (validation region).

calculated for the Acadian Province. These metrics were screened to identify those metrics that could discriminate between reference and impaired sites and were not redundant as described above. The metrics were then used to construct indices using the three different approaches detailed above. Classification accuracy was calculated for each approach.

all scored metrics were averaged and scaled to produce an index (Fig. 5). A sliding threshold was developed based on the values of the index at reference sites for each individual habitat. At lower cut-offs, more reference sites were correctly classified while at

3. Results 3.1. Chesapeake Bay approach Thresholds were identified based on the reference distribution for each metric in each habitat (Table S1), which allowed the metrics to be scored. The benthic index did a very good job classifying reference sites, but a relatively poor job classifying impaired sites for both calibration (Fig. 4) and validation data. Reference stations were accurately classified 88–94% of the time using calibration data and 88–100% using validation data, while classification accuracy for impaired sites ranged from 25% to 62% using calibration data and 29–64% using validation data (Table 5). The tidal freshwater, polyhaline mud and polyhaline sand habitats had the lowest classification accuracy for impaired sites, especially using calibration data. The overall classification accuracy for the indices ranged between 41% and 80% using calibration data and 45–77% using validation data (Table 5). 3.2. Mebane approach Linear, quadratic, sigmoidal, or hyperbolic curves were empirically fit to the scored data (Fig. 2c) and the best fit line then used to score each metric in each habitat (Table S2). For each habitat,

Fig. 4. Benthic index developed using Chesapeake Bay IBI approach using calibration data. The box encompasses the interquartile range. The line within the box is the median. The whiskers encompass the highest and lowest values that are not outliers (∼99% of the distribution). Note: Some extreme values not shown for clarity. Reference sites are coded in white while impaired sites are grey. The dotted line indicates the threshold between impaired and unimpaired sites.

M.C. Pelletier et al. / Ecological Indicators 23 (2012) 176–188

183

Table 5 A comparison of classification accuracy for all three approaches. In the Chesapeake Bay IBI approach, all stations with index scores ≥3 are coded as reference, while stations with index values <3 are coded as impaired. For the Mebane IBI approach, the threshold was determined by the reference distribution for the index at reference sites. Multiple thresholds were examined. The values here represent the level that maximized correct classification for both reference and impaired sites. If a station has an index score greater than or equal to the threshold, it is coded as reference, otherwise it is coded as impaired. In the logistic regression approach, multiple thresholds were selected from the entire distribution. The values shown were those that maximized correct classification for both reference and impaired sites. If a station has an index score greater than or equal to the threshold, it is coded as reference, otherwise it is coded as impaired. Index approach

Habitat

Index threshold

Calibration classification accuracy (%)

Validation classification accuracy (%)

Chesapeake Bay

Tidal freshwater Oligohaline Low mesohaline High mesohaline Polyhaline mud Polyhaline sand

3 3 3 3 3 3

Overall

Reference

Impaired

Overall

41 65 67 80 53 75

94 89 88 91 93 89

25 60 53 62 43 32

45 69 71 77 59 75

100 100 100 95 100 88

29 64 54 45 47 36

Mebane

Tidal freshwater Oligohaline Low mesohaline High mesohaline Polyhaline mud Polyhaline sand

50th = 5.81 25th = 7.06 25th = 4.84 25th = 4.93 25th = 5.93 50th = 5.58

66 82 73 75 81 58

53 78 75 76 75 61

70 83 72 74 82 49

59 88 67 80 78 49

0 100 63 89 57 40

76 86 69 64 84 76

Logistic regression

Tidal freshwater Oligohaline Low mesohaline High mesohaline Polyhaline mud Polyhaline sand

2.50 3.60 4.50 6.50 2.60 7.60

63 80 78 76 82 68

65 78 79 74 82 70

62 80 78 79 82 63

64 88 76 80 81 58

20 100 75 84 23 59

76 86 77 73 84 56

higher percentile cut-offs, impaired sites were more often correctly classified using both calibration and validation data (Table 6). For most habitats, using the 25th percentile of the reference distribution resulted in overall classification accuracies of 75–82%. Using the 50th percentile for the tidal freshwater and polyhaline sand habitat resulted in overall classification accuracies of 66% and 58%, respectively using calibration data, and slightly lower values using validation data (Table 5). 3.3. Logistic regression approach In the tidal freshwater habitat, no subset ‘reduced’ models had lower AIC and non-significant likelihood ratio statistic, G2 , than

Fig. 5. Benthic index developed using Mebane IBI approach using calibration data. The box encompasses the interquartile range. The line within the box is the median. The whiskers encompass the highest and lowest values that are not outliers (∼99% of the distribution). Note: Some extreme values not shown for clarity. Reference sites are coded in white while impaired sites are grey.

Reference

Impaired

the saturated model (Table S3). In all other habitats, one or more models had AIC lower than the saturated model, a non-significant likelihood ratio statistic, a non-significant Hosmer–Lemeshow test, and acceptable c values (Table S3). In the oligohaline habitat only one subset model met all criteria. In the low mesohaline habitat four models met the criteria while in the high mesohaline three models did. In the polyhaline mud and polyhaline sand habitats, 25 and 43 models, respectively, met all criteria. The benthic index was calculated for all models using the logistic regression equations for both the saturated model and identified ‘reduced’ models. The index values and classification accuracy were similar for all ‘reduced’ models so further discussion will focus on the reduced model with the lowest AIC value in each habitat (Table S4) along with the saturated model. For each habitat, different values of the benthic index were examined for classification accuracy (Table 7). At lower benthic index values, more reference sites were correctly classified while at higher index values, impaired sites were more often correctly classified (Table 7). The threshold values of the benthic index that provided reasonable classification accuracy for both reference and impaired sites varied by habitat (Table 5). In the oligohaline, low mesohaline, high mesohaline and polyhaline mud, overall classification accuracy ranged from 76% to 82% in the calibration data set, when balancing ability to identify reference and impaired sites. The accuracy obtained in the validation data was similar to the results obtained from the calibration data. When balancing the ability to discriminate between reference and impaired sites, overall classification accuracy was only 63–64% in the tidal freshwater habitat for both calibration and validation data. In the polyhaline sand habitat, overall classification accuracy was only 68% when balancing identification of reference and impaired sites. The validation response was slightly lower, but otherwise similar. The benthic index scores had very little correspondence among habitats in the reference sites (Fig. 6a), but showed more similarity of response in the impaired sites, with the exception of the polyhaline sand habitat. The ‘reduced’ subset model scores exhibited a similar pattern to the saturated model (Fig. 6b) and classification accuracy was likewise similar.

184

M.C. Pelletier et al. / Ecological Indicators 23 (2012) 176–188

Table 6 An example of classification accuracy using a sliding threshold for the index developed using the Mebane IBI approach.

Habitat High mesohaline

Polyhaline mud

Calibration Data

Validation Data

Classification Accuracy (%)

Classification Accuracy (%)

Reference percentile cutoff

Reference Percentile value

Overall

Reference

Impaired

Overall

Reference

Impaired

1 5 10 25 50 75 90 95 99 1 5 10 25 50 75 90 95 99

2.63 3.27 3.82 4.93 6.41 7.34 7.99 8.07 8.56 1.94 2.87 4.72 5.53 6.15 6.99 7.35 7.67 8.39

77 80 80 75 65 52 42 39 38 28 34 66 75 77 81 78 80 79

100 97 91 76 50 24 9 3 2 98 95 91 75 50 25 9 5 0

38 53 62 74 91 100 100 100 100 10 18 59 75 84 95 96 100 100

67 73 77 80 53 50 40 37 37 25 35 67 81 81 78 75 78 78

95 95 95 89 37 21 5 0 0 100 100 93 71 43 14 0 0 0

18 36 45 64 82 100 100 100 100 4 16 59 84 92 96 96 100 100

Boxes indicate where the accuracy of identifying both reference and impaired sites are approximately equal.

Table 7 An example of classification accuracy using a sliding threshold for the index developed using the logistic regression approach.

Calibration Data

Validation Data

Classification Accuracy (%)

Classification Accuracy (%)

Benthic Index cutoff

Overall

Reference

Impaired

Overall

Reference

Impaired

Tidal freshwater

5.50 5.00 4.50 4.00 3.80 3.60 3.40 3.20 3.00 2.50 2.00 1.50 1.00

79 76 74 73 73 73 71 71 73 63 56 47 40

12 12 29 35 35 35 35 41 53 65 71 94 94

100 96 89 85 85 85 83 81 79 62 51 32 23

77 77 77 77 68 68 68 64 68 64 64 73 68

0 0 0 0 0 0 0 0 20 20 40 80 100

100 100 100 100 88 88 88 82 82 76 71 71 59

High mesohaline

8.50 8.00 7.50 7.00 6.50 6.40 6.20 6.00 5.80 5.60 5.50 5.00 4.50 4.00 3.50

67 71 74 74 76 76 76 75 79 79 78 79 80 82 80

55 60 66 69 74 74 74 74 81 81 81 84 88 93 95

88 88 88 82 79 79 79 76 76 76 74 71 68 62 56

63 63 73 80 80 77 77 80 83 83 83 83 80 80 77

47 47 63 79 84 84 84 89 95 95 95 95 95 95 95

91 91 91 82 73 64 64 64 64 64 64 64 55 55 45

Habitat

Boxes indicate where the accuracy of identifying both reference and impaired sites are approximately equal.

M.C. Pelletier et al. / Ecological Indicators 23 (2012) 176–188

185

For each habitat, all scored metrics were averaged and scaled to produce an index. A sliding threshold was developed based on the values of the index at reference sites for each individual habitat. At lower cut-offs, more reference sites were correctly classified while at higher percentile cut-offs, impaired sites were more often correctly classified, as was seen in the Virginian Province. Overall classification accuracy was 94% in the polyhaline mud habitat and 68% in the polyhaline sand habitats, using a reference cut-off of the 10th and 25th percentiles, respectively. In the polyhaline sand habitat, no ‘reduced’ models were possible as the index was composed of 2 metrics. In the Polyhaline mud habitat, fourteen models fit well had lower AIC values and were not significantly different from the saturated model (Table S7). The benthic index was calculated for all models using the logistic regression equations for both the saturated model and identified ‘reduced’ models. As in the Virginian Province, further discussion will focus on the reduced model with the lowest AIC value in each habitat (Table S8) along with the saturated model. At lower benthic index values, more reference sites were correctly classified while at higher index values, impaired sites were more often correctly classified. The threshold values of the benthic index that provided reasonable classification accuracy for both reference and impaired sites varied by habitat (Table 8). In the Polyhaline mud habitat, all stations were correctly classified. In the polyhaline sand habitat, overall classification accuracy was 68% when balancing identification of reference and impaired sites.

4. Discussion

Fig. 6. Benthic indices developed using logistic regression approach using calibration data. The box encompasses the interquartile range. The line within the box is the median. The whiskers encompass the highest and lowest values that are not outliers (∼99% of the distribution). Note: Some extreme values not shown for clarity. Reference sites are coded in white while impaired sites are grey. (a) Saturated models developed using all metrics. (b) Reduced model developed from a subset of all possible metrics. The models shown are those that had the lowest AIC value from each habitat, and were not significantly different from the saturated model. Because the tidal freshwater index was based on only 2 metrics, no reduced model was possible.

3.4. Acadian Province validation For the Chesapeake Bay approach, thresholds were identified based on the reference distribution for each metric in each habitat (Table S5), which allowed the metrics to be scored. The benthic index did a very good job classifying reference sites, but a relatively poor job classifying impaired sites (Table 8). Reference stations had 100% classification accuracy in both the polyhaline mud and polyhaline sand habitats. As was seen in the Virginian Province, classification accuracy was lower for impaired stations (60% for polyhaline mud and 0% for polyhaline sand). For the Mebane approach, quadratic, sigmoidal, exponential or hyperbolic curves were empirically fit to the scored data and the best fit line then used to score each metric in each habitat (Table S6).

In our study we examined three different approaches to index assembly – two multimetric approaches (Chesapeake Bay and Mebane) and one statistical approach (logistic regression). The study was designed to compare the different index development approaches, rather than create one ‘best’ model for the Virginian Province. Because the Chesapeake Bay approach used multiple habitats, the Mebane and logistic regression approaches also were tested for the different habitats. This resulted in slightly different impairment thresholds for both the Mebane and logistic regression approach. If developing a ‘best approach’ for these two methods, we would have adjusted the metrics for habitat differences as was done by Engle and Summers (1999) and Paul et al. (2001). The Chesapeake Bay IBI approach has a set threshold for impairment, based on scoring the individual metrics. This approach had very good classification accuracy for reference sites but poor classification accuracy for impaired sites. This pattern was also seen in the Acadian Providence validation case. This is likely due to the fact that the metrics were scored based on the reference distribution and were therefore more likely to accurately reflect reference conditions. Metrics that are more reflective of impaired conditions are not represented well by this approach, especially where there was some overlap in metric response at reference and impaired sites. Although this approach has worked well in Chesapeake Bay, where it was developed), it did not perform well in this study. Our results were consistent with attempts by other researchers (see Engle and Summers, 1999). Despite using completely different approaches – the Mebane IBI approach uses continuous scoring of metrics and then averages across metrics while the logistic regression uses weights the raw metrics so that the overall model discriminates well between reference and impaired sites – classification accuracies were similar for the two approaches and were comparable to classification accuracies seen in other studies (Table 9). Although the Mebane IBI approach is also a multimetric approach, it worked better than the Chesapeake IBI approach, probably because scoring was continuous and based on the entire range of data. Both Pinto et al.

186

M.C. Pelletier et al. / Ecological Indicators 23 (2012) 176–188

Table 8 A comparison of classification accuracy for all three approaches in the Acadian Province. In the Chesapeake Bay IBI approach, all stations with index scores ≥3 are coded as reference, while stations with index values <3 are coded as impaired. For the Mebane IBI approach, the threshold was determined by the reference distribution for the index at reference sites. Multiple thresholds were examined. The values here represent the level that maximized correct classification for both reference and impaired sites. If a station has an index score greater than or equal to the threshold, it is coded as reference, otherwise it is coded as impaired. In the logistic regression approach, multiple thresholds were selected from the entire distribution. The values shown were those that maximized correct classification for both reference and impaired sites. If a station has an index score greater than or equal to the threshold, it is coded as reference, otherwise it is coded as impaired. Index approach

Habitat

Index threshold

Acadian Province Classification accuracy (%)

Chesapeake Bay

Polyhaline mud Polyhaline sand

3 3

88 41

100 100

69 0

Mebane

Polyhaline mud Polyhaline sand

10th = 5.41 25th = 7.01

94 68

89 74

100 46

Logistic Regression

Polyhaline mud Polyhaline sand

5.00 4.40

100 68

100 67

100 69

Overall

(2009) and Blocksom (2003) indicated that continuous scoring produced better results than non-continuous scoring. For calibration and validation data, overall classification accuracy was relatively high for most habitats using the Mebane IBI approach. The exceptions were the tidal freshwater and polyhaline sand habitats. Low overall classification accuracy (66%) for the tidal freshwater environment was previously seen when applying the Chesapeake Bay B-IBI in Chesapeake Bay (R. Batuik, personal communication). Poor classification accuracy in polyhaline sand was also seen by Teixeira et al. (2010), who noted that it was hard to distinguish between natural and anthropogenic stress in these high energy areas. Sand tends to be prevalent in high energy areas, and have less sediment cohesion (than mud). Because of this, there is generally high sediment movement, which can cause a variety of successional stages to be present at any given time (Hall, 1994), which likely contributed to the low classification accuracy in this environment. Although the overall classification accuracies were similar, there were also some differences between the Mebane IBI approach and the logistic regression approach. While the Mebane approach produced index values that were somewhat similar across habitats (Fig. 5), the logistic regression index values increased as salinity increased (Fig. 6). An exception is the polyhaline mud habitat. The amount of organic carbon in sediments is largely controlled by grain size (Burdige, 2006), and many chemical contaminants are also often associated with high carbon sediments (Schwarzenbach et al., 2003). Thus it would be expected that there would be more impaired sites in the polyhaline mud habitat and more reference sites in the polyhaline sand habitat. Empirically, that is what we saw in our data. Interestingly, the index created using the logistic regression approach also showed this trend, with the polyhaline mud index having lower values, even in the reference sites than the polyhaline sand index (Fig. 6). This suggests that classifying based on grain size and salinity did not adequately control for these environmental variables. Paul et al. (2001) corrected for dominant

Reference

Impaired

environmental gradients by adjusting individual metrics for salinity, and creating one index for all habitats. It is likely that adjusting for metrics for these major environmental gradients (e.g., salinity and grain size) would produce more equal index values across habitats. Both approaches utilized in this study appear to have merit. Previous work has shown that multimetrics and multivariate approaches can give similar results (Herbst and Silldorff, 2006). The strength of the Mebane IBI approach is the ease of explaining how the index was constructed to managers. While the logistic regression approach is not as easily explained, it has been used in index development and is not itself an esoteric approach. In addition, this technique allows an accepted method to select a more parsimonious model. To aid comparison between approaches, models were developed in individual habitats. For the Mebane approach, the index values were relatively stable across habitats, implying that important natural environmental gradients have been accounted for. However, the index thresholds varied. In contrast, the logistic regression approach had different index values across habitats. Adjusting the metrics for major habitat gradients and then developing the index would help to stabilize index values across habitats for the logistic regression approach and allow a single threshold across all both approaches appear to have merit in index development in estuarine waters. Although classification accuracies for the Mebane IBI approach and the logistic regression approach were within the range of accuracies seen with other studies, they were not as high as seen in the Chesapeake Bay B-IBI in Chesapeake Bay or by Paul et al. (2001) for four years of data in the Virginian Province. Engle and Summers (1999) noted that when building an index applicable across a broad area, the amount of variability inherent in the benthos may decrease precision. The data used in the study were obtained from probabilistic surveys, which are designed to evaluate overall condition rather than focusing on areas of concern. In addition, our

Table 9 Comparison of classification accuracy for other invertebrate-based benthic indices. Study

Area of application

Chesapeake Bay B-IBIa Engle and Summers (1999) Van Dolah et al. (1999) Paul et al. (2001) Hale and Heltshe (2008) This study – Mebane approach This study – logistic regression approach

Chesapeake Bay Gulf of Mexico estuaries Carolinian Biogeograhic Province Virginian Biogeographic Province Gulf of Maine estuaries Virginian Biogeographic Province Virginian Biogeographic Province

Classification accuracy Calibration

a

R. Batuik, personal communication.

Validation

Overall 66–95%

91–96% 90–100% 87–90% 81% 58–82% 63–82%

66–84% 50–82% 81–88% 76–84% 49–88% 58–88%

M.C. Pelletier et al. / Ecological Indicators 23 (2012) 176–188

criteria to select reference and impaired sites yielded a broad range of sites rather than the very best and very worst sites. Reference conditions may vary across habitats and ecoregions (Teixeira et al., 2010), which can cause variations in index results. This suggests that more locally calibrated indices be developed for a given area or water body, but that intercalibration exercises, such as that done by Ranasinghe et al. (2002) and Teixeira et al. (2010) be conducted. These sorts of exercises can be useful in applying the biological condition gradient developed for streams (Davies and Jackson, 2006) to estuaries. The biological condition index allows the binning of ecological attributes (in this case, benthic index scores) along a gradient of conditions from ‘natural’ or ‘minimally disturbed’ to severely altered. The biological condition gradient is also analogous to the ecological quality ratio (Andersen et al., 2004) used under the European Water Framework Directive. These approaches hold great promise for not only diagnosing ‘good’ and ‘bad’ sites, but also allowing an understanding of whether sites are improving or degrading, and improving communication for all stakeholders. 5. Conclusion Multiple approaches for benthic index development have been explored worldwide. In this study, we examined three approaches utilized in the United States – two multimetric and one multivariate, and explored their strengths and weaknesses. Although the Chesapeake Bay IBI approach has been successfully utilized in the area for which it was developed, it did not perform well in this study. In contrast, another multimetric approach, the Mebane IBI approach, performed well, as did the multivariate logistic regression approach. The results from the Acadian Province validation were similar, so our conclusions appear to be more broadly applicable. The strength of the Mebane IBI approach is the ease of explaining how the index was constructed to managers. The strength of the logistic regression approach is that this technique allows selection of a more parsimonious model. Both techniques have promise for index development and could be useful in applying a biological condition gradient to estuaries. Acknowledgements We would like to thank the EMAP field crews and IT staff for providing the data used in this study, Trish Decastro for graphics support and Anne Kuhn-Hines, Steven Hale and James Heltshe for their technical reviews. This manuscript has been reviewed by U.S. EPA’s National Health and Environmental Effects Research Laboratory, Atlantic Ecology Division, Narragansett, Rhode Island and approved for publication. Approval does not signify that the contents necessarily reflect the views and policies of the Agency. Mention of trade names or commercial products does not constitute endorsement or recommendation for use. This is Contribution # AED-10-082. Appendix A. Supplementary data Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.ecolind.2012.03.019. References Agresti, A., 1996. An Introduction to Categorical Data Analysis. John Wiley & Sons, New York, 290pp. Andersen, J.P., Conley, D.J., Hedal, S., 2004. Palaeoecology, reference conditions and classification of ecological status: the EU Water Framework Directive in practice. Mar. Pollut. Bull. 49, 283–290. Blocksom, K.A., 2003. A performance comparison of metric scoring methods for a multimetric index for mid-Atlantic highlands streams. Environ. Manage. 31, 670–6852.

187

Burdige, D.J., 2006. Geochemistry of Marine Sediments. Princeton University Press, Princeton, NJ, 609pp. Burham, K.P., Anderson, D.R., 2002. Model Selection and Multimodel Inference: A Practical Information-theoretic Approach, 2nd ed. Springer-Verlag, New York, 488pp. Clarke, K.R., Gorley, R.N., 2006. PRIMER v6: User Manual/Tutorial. PRIMER-E, Plymouth, UK, 172pp. Clarke, K.R., Warwick, R.M., 2001. Change in Marine Communities: An Approach to Statistical Analysis and Interpretation, 2nd ed. PRIMER-E, Plymouth, UK, 169pp. Dauvin, J., Bellan, G., Bellan-Santini, D., 2010. Benthic indicators: from subjectivity to objectivity—where is the line? Mar. Pollut. Bull. 60, 947–953. Davies, S.P., Jackson, S.K., 2006. The biological condition gradient: a descriptive model for interpreting change in aquatic ecosystems. Ecol. Appl. 16, 1251–1266. Diaz, R.J., Solan, M., Valente, R.M., 2004. A review of approaches for classifying benthic habitats and evaluating habitat quality. J. Environ. Manage. 73, 165–181. Engle, V.D., Summers, J.K., 1999. Refinement, validation, and application of a benthic condition index for Gulf of Mexico estuaries. Estuaries 22, 624–635. Gibson, G.R., Bowmann, M.L., Gerritsen, J., Snyder, B.D., 2000. Estuarine and Coastal Marine Waters: Bioassessment and Biocriteria Technical Guidance. EPA/822/B-00/024. United States Environmental Protection Agency, Office of Water, Washington, DC, 289pp. Grall, J., Glémarec, M., 1997. Using biotic indices to estimate macrobenthic community perturbations in the Bay of Brest. Estuar. Coast. Shelf Sci. 44, 43–53. Hale, S.S., Paul, J.F., Heltshe, J.F., 2004. Watershed landscape indicators of estuarine benthic condition. Estuaries 27, 283–295. Hale, S.S., Heltshe, J.F., 2008. Signals from the benthos: development and evaluation of a benthic index for nearshore Gulf of Maine. Ecol. Indic. 8, 338–350. Hart, C.W., Fuller, S.L.H. (Eds.), 1979. Pollution Ecology of Estuarine Invertebrates. Academic Press. New York, 406pp. Hall, S.J., 1994. Physical disturbance and marine benthic communities: life in unconsolidated sediments. Oceanogr. Mar. Biol. Annu. Rev. 32, 179–239. Herbst, D.B., Silldorff, E.L., 2006. Comparison of the performance of different bioassessment methods: similar evaluations of biotic integrity from separate programs and procedures. J. N. Am. Benth. Soc. 25, 513–530. Hosmer, D.W., Lemeshow, 2000. Applied Logistic Regression, 2nd ed. John Wiley & Sons, New York, 375pp. Hurlbert, S.N., 1971. The non-concept of species diversity: a critique and alternate parameters. Ecology 52, 577–586. Karr, J.R., Fausch, K.D., Angermeier, P.L., Yant, P.R., Schlosser, I.J., 1986. Assessing biological integrity in running waters. A method and its rationale. Illinois Natural History Survey, Special Publication 5, Champaign, IL, 28pp. Kerans, B.L., Karr, J.R., 1994. A benthic index of biotic integrity for rivers of the Tennessee Valley. Ecol. Appl. 4, 768–785. Legendre, P., Legendre, L., 1998. Numerical Ecology, 2nd English Edition. Elsevier Science B.V., Amsterdam, 853pp. Little, C., 2000. The Biology of Soft Shores and Estuaries. Oxford University Press, New York, NY, 252pp. Llansó, R.J., Scott, L.C., Dauer, D.M., Hyland, J.L., Russell, D.E., 2002a. An estuarine benthic index of biotic integrity for the mid-Atlantic region of the United States. I. Classification of assemblages and habitat definition. Estuaries 25, 1219–1230. Llansó, R.J., Scott, L.C., Dauer, D.M., Hyland, J.L., Dauer, D.M., Russell, D.E., Kutz, F.W., 2002b. An estuarine benthic index of biotic integrity for the mid-Atlantic region of the United States. II. Index development. Estuaries 25, 1231–1242. Llansó, R.J., 2002. Methods for Calculating the Chesapeake Bay Benthic Index of Biotic Integrity. Versar, Inc., Columbia, MD, 27pp., http://www.baybenthos.versar.com. Long, E.R., MacDonald, D.D., Smith, S.L., Calder, F.D., 1995. Incidence of adverse biological effects within ranges of chemical concentrations in marine and estuarine sediments. Environ. Manage. 19, 81–97. Marques, J.C., Salas, F., Patrício, J., Teixeira, H., Neto, J.M., 2009. Ecological Indicators for Coastal and Estuarine Environmental Assessment: A User Guide. WIT Press, Southampton, UK, 183pp. Mebane, C.A., Maret, T.R., Hughes, R.M., 2003. An index of biological integrity (IBI) for the Pacific Northwest. T. Am. Fish. Soc. 132, 239–261. Norris, R.H., Hawkins, C.P., 2000. Monitoring river health. Hydrobiologia 435, 5–17. Ohio EPA, 1987. Biological Criteria for the Protection of Aquatic Life, vols. I–III. Division of Water Quality, Ohio Environmental Protection Agency, Columbus, OH, 362pp. Paul, J.F., Scott, K.J., Campbell, D.E., Gentile, J.H., Strobel, C.S., Valente, R.M., Weisberg, S.B., Holland, A.F., Ranasinghe, J.A., 2001. Developing and applying a benthic index of estuarine condition for the Virginian Biogeographic Province. Ecol. Indic. 1, 83–99. Pearson, T.H., Rosenberg, R., 1978. Macrobenthic succession in relation to organic enrichment and pollution of the marine environment. Oceanogr. Mar. Biol. Ann. Rev. 16, 229–311. Pelletier, M.C., Gold, A.J., Heltshe, J.F., Buffum, H.W., 2010. A method to identify estuarine macroinvertebrate pollution indicator species in the Virginian Biogeographic Province. Ecol. Indic. 10, 1037–1048. Pinto, R., Patrício, J., Baeta, A., Fath, B.D., Neto, J.M., Marques, J.C., 2009. Review and evaluation of estuarine biotic indices to assess benthic condition. Ecol. Indic. 9, 1–25. Plafkin, J.L., Barbour, M.T., Porter, K., Gross, S., Hughes, R.M., 1989. Rapid bioassessment protocols for use in rivers and streams: benthic macroinvertebrates and

188

M.C. Pelletier et al. / Ecological Indicators 23 (2012) 176–188

fish. EPA/440/4-89/001. U.S. Environmental Protection Agency, Washington, DC, 190pp. Ranasinghe, A., Frithsen, J.B., Kutz, F.W., Paul, J.F., Russell, D.E., Batuik, R.A., Hyland, J.L., Scott, J., Dauer, D.M., 2002. Application of two indices of benthic community condition in Chesapeake Bay. Environmetrics 13, 499–511. Rhoads, D.C., 1974. Organism–sediment relations on the muddy sea floor. Oceanogr. Mar. Biol. Ann. Rev. 12, 263–300. Rosenberg, R., Blomquist, M., Nilsson, H.C., Cederwell, H., Dimming, A., 2004. Marine quality assessment by use of benthic species-abundance distributions: a proposed new protocol within the European Water Framework Directive. Mar. Pollut. Bull. 49, 728–739. SAS Institute, 2008. Version 9.2, Cary, NC, USA. Schwarzenbach, R.P., Gschwend, P.M., Imboden, D.M., 2003. Environmental Organic Chemistry. John Wiley & Sons, Hoboken, NJ, 1313pp. Snelgrove, P.V.R., 1998. The biodiversity of macrofaunal organisms in marine sediments. Biodivers. Conserv. 7, 1123–1132. Teixeira, H., Borja, A., Weisberg, S.B., Ranasinghe, J.A., Cadien, D.B., Dauer, D.M., Dauvin, J., Degraer, S., Diaz, R.J., Grémere, A., Karakassis, I., Llansó, R.J., Lovell, L.L., Marques, J.C., Montagne, D.E., Occhipinti-Ambrogi, A., Rosenberg, R., Sardá, R., Schaffner, L.C., Velarde, R.C., 2010. Assessing coastal benthic macrofauna

community condition using best professional judgment—developing consensus across North America and Europe. Mar. Pollut. Bull. 60, 589–600. Van Dolah, R.F., Hyland, J.L., Holland, A.F., Rosen, J.S., Snoots, T.R., 1999. A benthic index of biological integrity for assessing habitat quality in estuaries of the southeastern USA. Mar. Environ. Res. 48, 269–283. Warwick, R.M., Clarke, K.R., 1993. Comparing the sensitivity of disturbance: a meta-analysis of marine macrobenthic community data. Mar. Ecol. Prog. Ser. 92, 221–231. Weisberg, S.B., Ranasinghe, J.A., Schaffner, L.C., Diaz, R.J., Dauer, D.M., Frithsen, J.B., 1997. An estuarine index of biological integrity (B-IBI) for Chesapeake Bay. Estuaries 20, 149–158. U.S. EPA, 1995. Environmental Monitoring and Assessment Program (EMAP): Laboratory Methods Manual—Estuaries, vol. 1: Biological and Physical Analyses. EPA/620/R-95/008. United States Environmental Protection Agency, Office of Research and Development, Narragansett, RI, 128pp. U.S. EPA, 2001. National Coastal Assessment: Field Operations Manual. EPA/620/R-01/003. United States Environmental Protection Agency, Office of Research and Development, National Health and Environmental Effects Laboratory, Gulf Breeze, FL, 72pp. Zar, J.H., 1999. Biostatistical Analysis, 4th ed. Prentice-Hall Inc, NJ, 663pp.