Evaluating the Type II error rate in a sediment toxicity classification using the Reference Condition Approach


Aquatic Toxicology 101 (2011) 207–213


Pilar Rodriguez a,∗, Zuriñe Maestre a, Maite Martinez-Madrid b, Trefor B. Reynoldson c

a Department of Zoology and Animal Cell Biology, University of the Basque Country, Apdo. 644, 48080 Bilbao, Spain
b Department of Genetics, Physical Anthropology and Animal Physiology, University of the Basque Country, Apdo. 644, 48080 Bilbao, Spain
c Acadia Center for Estuarine Research, Acadia University, Box 115, Wolfville, NS B2R 4P6, Canada

Article info

Article history: Received 22 June 2010; received in revised form 10 September 2010; accepted 25 September 2010

Keywords: Sediment toxicity; Tubifex; Reference condition; Probability ellipses

Abstract

Sediments from 71 river sites in Northern Spain were tested using the chronic bioassay with the oligochaete Tubifex tubifex (Annelida, Clitellata). Forty-seven sediments were identified as reference, primarily from macroinvertebrate community characteristics. The data for the toxicological endpoints were examined using non-metric MDS. Probability ellipses were constructed around the reference sites in multidimensional space to establish a classification assigning test sediments to one of three categories (Non Toxic, Potentially Toxic, and Toxic). The construction of such probability ellipses sets the Type I error rate. However, we also wished to include in the decision process for identifying pass–fail boundaries the degree of disturbance required to be detected, and the likelihood of being wrong in detecting that disturbance (i.e. the Type II error). Choosing the ellipse size from the Type I error alone does not take the probability of Type II error into account. To address this, the toxicological response observed in the reference sediments was manipulated by simulating different degrees of disturbance (simpacted sediments), and the Type II error rate was measured for each set of simpacted sediments. This procedure quantifies, for each probability ellipse, how often impairment is identified in sediments with a known level of disturbance. Thirteen levels of disturbance and seven probability ellipses were tested. Based on the results, the decision boundary between Non Toxic and Potentially Toxic was set at the 80% probability ellipse, and the boundary between Potentially Toxic and Toxic at the 95% probability ellipse. Using this approach, 9 test sediments were classified as Toxic, 2 as Potentially Toxic, and 13 as Non Toxic.

© 2010 Elsevier B.V. All rights reserved.

∗ Corresponding author. Tel.: +34 946 012 712; fax: +34 946 013 500. E-mail address: [email protected] (P. Rodriguez).
0166-445X/$ – see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.aquatox.2010.09.020

1. Introduction

Sediments are an important component of aquatic ecosystems and provide habitat for the benthic community. Sediments have been called “the long term memory of water quality” (Heise, 2007) and are both a source and a sink for contaminants entering the ecosystem from point and non-point sources. Therefore, a comprehensive water quality evaluation should include assessment of the quantity and quality of sediments, which can cause adverse effects on the ecological status of freshwaters (Casper, 2008; Salomons and Brils, 2004). Sediment quality has a direct bearing on the achievement of the objective of Good ecological and chemical status for water bodies, as defined in the European Water Framework Directive (WFD: CEE, 2000). However, the role of sediments as a secondary source of contaminants has been neglected in the WFD and sediment quality is usually assessed by regional authorities under the standstill principle (CEE, 2008), i.e. priority substances present in sediments should not significantly increase their concentration, in agreement with Article 4 (5c) of the WFD.

Over the last few years, numerous authors have stressed that the presence of chemicals at high concentrations implies only exposure and is not evidence of ecological degradation (Burton et al., 2002; Chapman and Anderson, 2005; Grapentine et al., 2002). The study of biological condition (benthic communities, ecotoxicity, and bioaccumulation) is required for a sound assessment of any adverse effects of sediment contaminants (Chapman, 2007) or for measuring the effectiveness of any remedial action (Chapman and Anderson, 2005).

In the last decade, the “Weight of Evidence” approach (WOE: Chapman et al., 2002) has established that the assessment of sediment quality must be performed by integrating different lines of evidence (LOEs), which include both field and laboratory data. There is no single methodology for the WOE approach, but it usually includes sediment chemistry, toxicity tests, benthic communities, and often habitat conservation, and bioaccumulation or food chain transfer models (Burton et al., 2002). Typically some sort of decision matrix is used, based on either scores or nominal categories from univariate or, less frequently, multivariate analysis, to provide a WOE integrating the individual lines of evidence. A common approach for deriving scores for individual LOEs is the comparison of data for the different variables from test sites with data from


sites considered as reference for a healthy condition. This assessment is based on measuring the degree of deviation of any single LOE measured at a given site from a reference condition established from the study of a number of unpolluted sites in the same study area (Reference Condition Approach: Bailey et al., 2004; Reynoldson et al., 1997, 2000).

The sediment toxicity lines of evidence are based on the analysis of acute or chronic endpoints measured in laboratory bioassays where test organisms (usually benthic species) have been exposed to field sediments. In the present study, the sediment-dwelling oligochaete species Tubifex tubifex (cultured in the laboratory) was exposed to sediments from both reference and test sites in chronic bioassays (ASTM, 2005) and several endpoints were measured, including survival, growth (total production) and reproduction (number of cocoons and young, hatch percentage). In agreement with the WFD philosophy for the assessment of the ecological status of water bodies, toxicity data for the test sites are compared with the reference condition, thus allowing for the natural variability of sediments in the study area.

Toxicity risk assessment requires knowledge of the probability of occurrence of certain effects. Two decision-making approaches based on laboratory sediment toxicity data can be used, as explained by Grapentine et al. (2002). In the first, a significant impairment in a toxicity endpoint is indicated when conditions at a test site show a statistically significant difference from reference sites. This univariate approach typically requires, for each endpoint measured at every test site, an analysis against a value for the reference condition. The results of these individual analyses then need to be combined into a single assessment of sediment toxicity. The alternative approach considers multiple endpoints simultaneously using multivariate analysis, e.g. plotting reference sites together with the test sites in ordination space (Reynoldson et al., 2002a).
In this study, the sediment toxicity LOE was analysed using the multivariate approach. The aim of the present study was to carry out a screening assessment of sediment toxicity based on several chronic variables (survival, growth and reproduction) of the oligochaete worm T. tubifex, using the Reference Condition Approach. Using multivariate analysis of the toxicity variables, we built probability ellipses to evaluate both Type I and Type II statistical errors in the site assessments. To determine the Type II error, we created disturbed (simpacted) sites by modifying the endpoints for each of the reference sites, e.g. reducing survival and reproduction by various amounts. With this approach we are certain that the value has changed from reference and can determine whether the assessment detects this change. With field sites we cannot be certain that the endpoint is showing a response.

We are aware of the limitation of using only one sediment bioassay for toxicity assessment, since the use of at least three bioassays with different species is recommended in the assessment of sediment toxicity (Chapman, 2007). However, the purpose of this paper was to examine an approach for establishing Type II errors, and the approach can easily be adapted to other sediment tests (Reynoldson et al., 2002a). In our opinion, data derived from the T. tubifex 28 d sediment bioassay are both ecologically relevant and informative for screening-level sediment toxicity evaluation. In fact, reproductive impairment in this bioassay was the main response identified in a sediment toxicity assessment using 4 different sediment chronic bioassays with invertebrates (Reynoldson et al., 2002b), although this may vary depending on the toxicants (Milani et al., 2003).

2. Material and methods

A total of 71 sediments from river locations in Northern Spain were tested. For most sites, the IBMWP (invertebrate community) scores (Alba-Tercedor and Sánchez-Ortega, 1988; Alba-Tercedor and Pujante, 2000) were available from the surveillance monitoring databases of the various water authorities (Basque Government, 2007; CHE, 2008; CHN, 2004). At 10 sites we calculated the IBMWP scores following the water authority procedures (Basque Government, 2006; CHE, 2006). For each study site, the Ecological Quality Ratio (EQS: Stroffek, 2001) was calculated using the reference condition score for each ecoregion, as calculated by the water authorities.

Reference sites in the study area were selected in two steps. First, sites with benthic communities evaluated as having Good or Very Good quality (EQS ratios close to 1) were selected. Second, we excluded any of those sites with more than 50% mortality and/or more than 2 toxicity endpoints below the 5th percentile of the total variable distribution (Reynoldson et al., 2000, 2002a).

T. tubifex sediment chronic bioassays were run following the standardised method proposed by Reynoldson et al. (1991) and later included in ASTM (2005). Sampling, storage, the general procedure for running the chronic bioassay, and the endpoints used in the present study are described in detail by Maestre et al. (2007). Test organisms were cultured in the laboratory and their sensitivity was regularly checked (96 h LC50 = 4.89–7.22 mg Cr(VI) l−1, 0.25–0.39 mg Cd l−1, 0.03–0.08 mg Cu l−1; Maestre et al., 2009a). In each set of bioassays, a negative control was run using culture sediment. Mean values of the toxicity endpoints for controls were within the range of values previously reported in Maestre et al. (2007) (i.e. survival > 90%, number of cocoons per adult > 9, coefficients of variation for cocoon production < 25%, and for young production < 50%). Both survival and variability of cocoon production agree with the validity criteria in ASTM (2005).
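The two-step selection of reference sites described earlier in this section can be written as a simple filter. The sketch below is only an illustration: the record layout, the endpoint names, the function name `is_reference_site`, and all numeric values are hypothetical and are not data from the study. It assumes the 5th-percentile thresholds have already been computed for each endpoint.

```python
def is_reference_site(site, p5, good_classes=("Good", "Very Good")):
    """Apply the two-step reference screen to one (hypothetical) site record."""
    # Step 1: benthic community quality must be Good or Very Good.
    if site["quality"] not in good_classes:
        return False
    endpoints = site["endpoints"]
    # Step 2a: exclude sites with more than 50% mortality (survival below 50%).
    if endpoints["survival"] < 50.0:
        return False
    # Step 2b: exclude sites with more than 2 endpoints below the 5th percentile.
    n_low = sum(1 for name, value in endpoints.items() if value < p5[name])
    return n_low <= 2


# Hypothetical 5th-percentile thresholds and site records (illustrative only).
p5 = {"survival": 70.0, "cocoons": 6.0, "hatch": 10.0, "young": 3.0, "growth": 0.01}
sites = [
    {"id": "S1", "quality": "Very Good",
     "endpoints": {"survival": 95.0, "cocoons": 9.1, "hatch": 35.0,
                   "young": 25.0, "growth": 0.04}},
    {"id": "S2", "quality": "Good",
     "endpoints": {"survival": 40.0, "cocoons": 5.0, "hatch": 8.0,
                   "young": 2.0, "growth": 0.02}},
    {"id": "S3", "quality": "Moderate",
     "endpoints": {"survival": 98.0, "cocoons": 10.0, "hatch": 40.0,
                   "young": 30.0, "growth": 0.05}},
]
reference_ids = [s["id"] for s in sites if is_reference_site(s, p5)]
print(reference_ids)
```

Only S1 passes here: S2 fails on mortality and S3 fails the community-quality step.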
At most study sites, a subsample from the bulk sediment sampled for bioassays was used for chemical analyses of 7 metals (Cd, Cr, Cu, Hg, Ni, Pb and Zn) and 1 metalloid (As). Data for 25 sediments sampled in 2004 were provided by the water authorities which control the different river monitoring networks. Metals were analysed in the <63 µm sediment fraction using inductively coupled plasma atomic emission spectroscopy (ICP-AES), after microwave assisted acid digestion (USEPA, 2000) or microwave digestion combined with inductively coupled plasma mass spectrometry (ICP MS) (Navarro et al., 2006). Hg concentration was measured by cold vapour atomic absorption spectrometry (CV AAS) or by flow injection hydride generation atomic absorption spectrometry (FI HG AAS) (Sanz et al., 2004). Chemical data for all study sites are shown in Table 1. Spearman correlation analysis between chemicals and toxicological endpoints was performed with SPSS (2006).

Values for toxicological endpoints in the chronic bioassays were ordinated and plotted in multivariate space (non-metric multidimensional scaling analysis, nMDS) to examine patterns in site distribution. Probability ellipses in the multidimensional space of the reference sites were created with an Excel macro (Microsoft Office Excel, 2003) using the MDS scores of reference sites. Site classification (cluster analysis), ANOSIM (analysis of similarities), and multivariate analysis (nMDS) for ordination of sites were performed with Primer 6 (Clarke and Gorley, 2006), based on the data set derived from sediment toxicity bioassays.

The selection of the pass–fail boundaries for the toxicity assessment of the test sediments was based on the construction of probability ellipses in the MDS ordination space. The ordination of the reference sites in the multivariate space describes the range of variation for toxicological variables in unimpaired sediments.
By constructing probability ellipses based on reference sites only, the test sites can be compared with the response range in natural conditions (Reynoldson et al., 2000). The greater the departure from the reference space, the greater the difference from reference condition and, consequently, the higher the degree of impairment.
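The idea of a probability ellipse around reference sites can be illustrated with a short sketch. The paper built its ellipses with an Excel macro whose exact method is not given here, so the code below is an assumption: it treats the two ordination axes as bivariate normal and uses the Mahalanobis distance with the chi-square quantile for 2 degrees of freedom, which has the closed form -2 ln(1 - p). The function names and the reference scores are hypothetical.

```python
import math


def fit_ellipse(points):
    """Return the mean and the inverse sample covariance of 2-D scores."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    inv = ((syy / det, -sxy / det), (-sxy / det, sxx / det))
    return (mx, my), inv


def mahalanobis_sq(point, mean, inv):
    """Squared Mahalanobis distance of a point from the reference centroid."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return inv[0][0] * dx * dx + 2.0 * inv[0][1] * dx * dy + inv[1][1] * dy * dy


def ellipse_threshold(prob):
    """Chi-square quantile with 2 degrees of freedom (closed form)."""
    return -2.0 * math.log(1.0 - prob)


def inside(point, mean, inv, prob):
    """True if the point lies within the probability ellipse of size prob."""
    return mahalanobis_sq(point, mean, inv) <= ellipse_threshold(prob)


# Hypothetical ordination scores of reference sites (illustrative only).
reference_scores = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, -1)]
mean, inv = fit_ellipse(reference_scores)
print(inside((0.5, 0.5), mean, inv, 0.75))
print(inside((3.0, -3.0), mean, inv, 0.95))
```

Under these assumptions, a larger probability gives a larger ellipse, so fewer reference sites fall outside it (lower Type I error) but more disturbed sites can fall inside it.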


Table 1 Total organic content, silt–clay percent and metal concentration (mg kg−1 dw) in reference and test sediments. In italics, concentrations above TEC: Threshold Effect Concentration; in bold, concentrations above PEC (Probable Effect Concentration). Sediment criteria from MacDonald et al. (2000).

Sediment                  %TOC    % Silt–clay As      Cd      Cr      Cu      Hg      Ni      Pb      Zn
Reference sediments (n = 47)
Mean                      2.8     23.9        6.0     0.4     20.1    18.4    0.2     20.9    32.3    91.6
SD                        2.2     14.1        5.6     0.3     16.3    14.5    0.2     14.6    52.1    83.3
Maximum                   11.1    84.0        22.9    1.8     113.9   242     0.92    54.9    305.8   502.5
Minimum                   0.3     0.9         1.4     <0.01   3.5     1.6     <0.01   1.8     3.1     16.2
No. sediments above TEC   –       –           8       2       2       6       14      16      8       9
No. sediments above PEC   –       –           0       0       1       1       0       2       3       1
Control sediment (n = 4) (a)
Mean                      12.4    25.8        12.0    0.5     19.2    15.8    0.2     24.2    195.5   121.1
SD                        2.5     3.5         9.4     0.3     5.3     6.9     0.2     13.4    203.0   57.94
Maximum                   15.3    29          21.1    0.8     25.1    23.6    0.4     43.9    479.1   178.5
Minimum                   10.5    22          2.5     0.1     12.3    9.3     0.02    15.1    16.4    41.5
Test sediments
A202 (b)                  2.32    4.9         15.4    0.48    35.7    66.5    0.23    25.9    49.9    155.6
ARAR150 (b)               6.99    15.1        25.0    1.19    70.3    63.4    <0.1    29.4    273.3   428.4
AS160 (b)                 0.61    6.2         13.5    1.59    183.3   94.5    4.09    90.2    90.2    303.8
AS160-A                   3.26    6.2         <5      <1      149.0   31.8    1.8     92.8    25.2    132
B226 (b)                  6.95    7.6         5.6     0.44    28.17   15.5    <0.1    13.5    36.7    64.7
BA558 (b)                 1.79    9.5         41.5    2.35    47.0    157.1   <0.1    27.0    90.4    183.3
BHE                       1.39    41.0        <5      <1      79.5    34.8    <0.1    21.6    17.0    221.0
D202 (b)                  14.60   28.5        2.9     0.68    22.2    10.7    0.32    11.2    18.5    110.8
N338 (b)                  14.90   7.5         16.8    1.24    2117    347.2   0.25    713.6   104.1   524.9
NO3023 (b)                –       –           8.0     1.20    36.0    115.0   0.59    33.0    74.0    376.0
NO3070 (b)                –       –           11.0    1.60    63.0    133.0   0.54    51.0    79.0    519.0
NO3096 (b)                –       –           8.0     0.80    24.0    53.0    0.43    29.0    46.0    479.0
OK114 (b)                 1.11    4.3         10.5    0.43    317.9   231.7   <0.1    248.2   46.8    175.1
OM080                     2.20    3.2         <5      <0.8    10.9    8.5     <0.1    5.6     27.7    115.0
OM380 (b)                 –       –           <0.25   0.41    8.3     1.7     <0.1    1.7     2.51    25.8
SP18 (b)                  n.d.    79          4.0     n.d.    100.0   108.0   n.d.    15.0    81.0    752.0
SP8 (b)                   –       –           <2      5.20    211.0   383.0   0.98    99.0    79.0    2399
UR320                     2.29    8.1         20.9    <0.8    27.5    49.6    0.30    27.8    253.0   477.0
UR434 (b)                 3.39    28.0        16.5    0.74    74.8    46.7    1.21    34.2    49.4    198.3
URS34                     2.12    21.2        7.2     <0.01   15.1    16.8    0.08    33.1    13.5    53.2
Z828                      1.66    22.1        5.1     <0.01   24.2    25.2    2.40    43.7    22.1    98.7
68                        3.47    23.9        2.9     <0.01   12.9    36.9    0.42    47.9    19.1    107.4
124                       2.31    13.4        4.3     <0.01   77.0    20.1    1.37    43.8    14.4    54.8
183                       0.38    1.1         7.2     <0.01   18.9    35.5    0.11    73.6    18.6    70.2

(a) n = 3 for %TOC and % fines.
(b) Sediment concentrations measured by water authorities (Basque Government, 2007; CHE, 2008; CHN, 2004).

The use of probability ellipses based on the response of the reference sites in essence sets the Type I error rate (i.e. a probability ellipse of 75% has a Type I error rate of 0.25). The probability of Type II error was examined using ellipses of 75, 80, 85, 90, 95, 99 and 99.9% probability in the multivariate space of the reference sediments. The amount of disturbance was manipulated by simulating different degrees of alteration of the reference sites. Thus, the simpacted sediments (in the sense used by Bailey et al., 2004) assessed in this exercise were created by increasing the degree of alteration of individual response variables in each of the 47 reference sediments. The alteration was applied as a percentage reduction (25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, and 85%) of the value of the toxicity variables. The ability of each probability ellipse to detect impairment for each set of simpacted sediments was then quantified as the proportion of sediments outside the ellipse.

We also wished to select an ellipse size for the classification of sediment toxicity by evaluating the Type II error (i.e. the likelihood of failing to identify disturbance) and the degree of disturbance required to be detected. The Type II error rates for each sediment simpact level were calculated as the proportion of simpacted sites inside an ellipse, for each probability ellipse.

3. Results

In the evaluation of the line of evidence of sediment toxicity, a total of 47 sediments were selected to establish the reference

condition in the study region. These sediments show a relatively large range of values in compositional variables (particle size distribution, organic matter and metal concentration) (Table 1). The toxicity variables were not significantly correlated with particle size distribution (sediment fraction < 0.5 mm) (Spearman rs, p > 0.05), whereas Ni concentration had significant although low correlation values with cocoons per adult (rs = −0.33) and hatch percentage (rs = −0.34), and As with number of cocoons per adult (rs = −0.34).

The range of variation of the toxicity endpoints in the chronic bioassay of T. tubifex when exposed to reference sediments is shown in Table 2. Survival and number of cocoons per adult were the variables with the lowest coefficients of variation (CV = 10.4 and 15.8%, respectively), while the number of young worms was the endpoint with the greatest variability (CV = 68.1%). Cluster analysis using the data of the toxicity bioassays revealed two not significantly differentiated groups (Anosim, R = 0.52; Simprof, π = 0.14, p < 0.5%). Therefore, the range of values for the endpoints in the bioassays using the 47 reference sediments can be interpreted as the natural range of variability of the chronic response for T. tubifex expected in the study area, under the exposure conditions of the sediment bioassay.

The Type II error rates were calculated from the proportion of simpacted sites identified as equivalent to reference or Non Toxic (i.e. inside an ellipse) for each probability ellipse (which sets the Type I error rate) and degree of alteration (Table 3). For example, at a


Table 2 Descriptive statistics of the endpoints measured in the sediment toxicity bioassays with Tubifex tubifex in selected reference sites (n = 47) and the control series (n = 14). Endpoints: survival (%); CcAd, number of cocoons per adult; Hatch percentage (%); YgAd, number of young worms per adult; TGR, total growth rate (d−1); CV, coefficient of variation.

                      % Survival  CcAd    % Hatch  YgAd    TGR
Reference sediments
Mean                  93.3        9.00    33.52    24.94   0.037
Standard deviation    9.7         1.42    14.16    16.99   0.013
Maximum               100.0       11.40   68.00    76.10   0.061
Minimum               60.0        5.56    5.78     1.30    0.004
%CV                   10.4        15.8    42.2     68.1    39.9
Control sediments
Mean                  98.9        10.33   37.41    33.06   0.036
Standard deviation    2.1         0.58    10.54    14.16   0.008
Maximum               100.0       11.15   53.2     59.7    0.050
Minimum               95.0        8.74    21.7     12.6    0.024
%CV                   2.1         5.6     28.2     42.8    22.2

30% reduction in the test endpoint values with a 75% probability ellipse, 26% of the simpacted sites were inside the ellipse and therefore identified as equivalent to reference (Type II error p = 0.26). At a disturbance level of 50%, all the sites were outside the 75% probability ellipse; therefore, 100% were classified as Toxic and the Type II error rate was zero. In contrast, at a disturbance level of 50% and with a 95% ellipse (Type I error p = 0.05), 34% of the simpacted sites were inside that ellipse and the Type II error is therefore p = 0.34.

Balancing Type I and II errors is a trade-off between misidentifying reference sites as disturbed and failing to identify disturbed sites. As the probability values used for the ellipses get larger (i.e. the Type I error is reduced), the probability of Type II error increases and only higher levels of disturbance (i.e. larger reductions in the endpoints) can be detected and the sediment classified as Toxic (Fig. 1). From these simpact data (Table 3), and setting a Type II error below 0.05, identifying simpacted sediments with a disturbance level of 40% as Toxic requires the use of the 80% probability ellipse; similarly, detecting a reduction >50% requires the 85% ellipse, and so on. With the commonly used threshold of 95% probability, and with a Type II error below 0.05, disturbance can only be detected when the reduction relative to reference is 60% or greater.

In the absence of information as to whether decision making should favour industry (cost of remediation or regulation) or the environment (habitat or species protection), it would seem reasonable that the chance of making both types of error should be equal (Glozier et al., 2002; Mapstone, 1995; Underwood, 1993). We have used this balanced approach to select the appropriate probability ellipse as the boundary for classifying sites as toxic.
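The Type II error calculation described above can be sketched in a few lines. This is not the authors' procedure: as a simplification, the ellipse is fitted directly to two raw endpoints rather than to nMDS scores, it is assumed to be a bivariate normal ellipse, every endpoint is reduced by the same fraction, and the reference data are synthetic values drawn with a fixed seed. All function and variable names are hypothetical.

```python
import math
import random


def fit_reference(points):
    """Mean vector and inverse sample covariance of 2-D reference data."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy * sxy
    return (mx, my), ((syy / det, -sxy / det), (-sxy / det, sxx / det))


def distance_sq(point, mean, inv):
    """Squared Mahalanobis distance from the reference centroid."""
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return inv[0][0] * dx * dx + 2.0 * inv[0][1] * dx * dy + inv[1][1] * dy * dy


def type2_error(reference, reduction, prob):
    """Share of simpacted sites that still fall inside the ellipse."""
    mean, inv = fit_reference(reference)
    limit = -2.0 * math.log(1.0 - prob)  # chi-square quantile, 2 d.f.
    # Simpacted sites: each reference endpoint reduced by the same fraction.
    simpacted = [(x * (1.0 - reduction), y * (1.0 - reduction))
                 for x, y in reference]
    n_inside = sum(1 for p in simpacted if distance_sq(p, mean, inv) <= limit)
    return n_inside / len(simpacted)


# Synthetic reference data: (survival %, cocoons per adult) for 47 sites.
rng = random.Random(0)
reference = [(min(100.0, rng.gauss(93, 5)), rng.gauss(9, 1.2))
             for _ in range(47)]

for prob in (0.75, 0.80, 0.95):
    rates = [round(type2_error(reference, r / 100, prob), 2)
             for r in (25, 50, 85)]
    print(prob, rates)
```

Because the ellipses are nested, the Type II error at a fixed disturbance level can only stay the same or grow as the ellipse probability increases, which is the trade-off with the Type I error discussed in the text.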

Fig. 1. Relationships of Type I and II error rates at 3 levels of disturbance of theoretical simpacted sediments, constructed as a certain percentage of reduction of the endpoints in reference sites (for details see text).

Sediment quality bands have been established using two probability ellipses that define three impairment classes based on the response in the toxicity tests: Band 1, Non Toxic or equivalent to reference; Band 2, Potentially Toxic or possibly different from reference; and Band 3, Toxic or different from reference condition. Given the aim of keeping the Type II error below 5%, simpact disturbances ≤ 35% cannot be detected by any of the probability ellipses that were used. Thus, taking a 40% reduction in endpoint values as a concern of potential toxicity, that is, as a warning of impairment with respect to the reference condition, and requiring a Type II error of less than 5%, we identified the 80% probability ellipse (Table 3) as the pass/fail boundary for classifying sediments as either Non Toxic or Potentially Toxic. A second ellipse of 95% defines a second pass/fail boundary between the Potentially Toxic and Toxic categories (Fig. 2). The selection of this second probability ellipse was based on the general recommendation that both types of error occur at about the same probability level. This boundary identifies simpact disturbances with a 60% reduction in the values of the endpoints of reference sites.

With these probability ellipses selected, we assessed sediment toxicity for test sediments from 24 sites following the approach described by Reynoldson et al. (2002a). Each test sediment was compared to reference sites in multidimensional space one at a time, and the toxicological assessment of the sediment was based on the position of the test sediment in that reference ordination space using the 80 and 95% probability ellipses constructed around the reference sites (Table 4).
This approach concluded that 13 test sediments were inside the 80% probability ellipse and were assessed as Non Toxic, whereas 9 sediments were located outside the 95% probability ellipse and thus assessed as Toxic, and 2 test sediments ordered between both ellipses were considered as Potentially Toxic.
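The three-band rule can be expressed as a small function. The sketch assumes that each test sediment has already been reduced to a squared Mahalanobis distance from the reference centroid in a two-dimensional ordination, and that the ellipses are bivariate normal, so the 80% and 95% boundaries are chi-square quantiles with 2 degrees of freedom. These assumptions, the function name, and the example distances are illustrative and are not taken from the study.

```python
import math
from collections import Counter

# Ellipse boundaries as squared distances (chi-square quantiles, 2 d.f.).
LIMIT_80 = -2.0 * math.log(1.0 - 0.80)  # about 3.22
LIMIT_95 = -2.0 * math.log(1.0 - 0.95)  # about 5.99


def toxicity_band(d2):
    """Assign a band from the squared distance to the reference centroid."""
    if d2 <= LIMIT_80:
        return "Non Toxic"
    if d2 <= LIMIT_95:
        return "Potentially Toxic"
    return "Toxic"


# Hypothetical squared distances for five test sediments.
distances = {"T1": 1.0, "T2": 2.9, "T3": 4.0, "T4": 7.0, "T5": 25.0}
bands = {site: toxicity_band(d2) for site, d2 in distances.items()}
tally = Counter(bands.values())
print(bands)
print(tally)
```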

Table 3 Type II error calculated for each probability ellipse, considering different degrees of disturbance (% reduction) of the toxicological variables with respect to the reference sediments (theoretical simpacted sediments). In bold, those cases with Type II error < 0.05.

Probability   Simpacted sediments, % reduction
ellipse       25%   30%   35%   40%   45%   50%   55%   60%   65%   70%   75%   80%   85%
75%           0.45  0.26  0.09  0.02  0.02  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
80%           0.51  0.32  0.15  0.04  0.04  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
85%           0.57  0.38  0.26  0.13  0.13  0.04  0.00  0.00  0.00  0.00  0.00  0.00  0.00
90%           0.74  0.55  0.34  0.38  0.38  0.15  0.00  0.00  0.00  0.00  0.00  0.00  0.00
95%           0.85  0.79  0.55  0.62  0.62  0.34  0.11  0.02  0.00  0.00  0.00  0.00  0.00
99%           0.98  0.96  0.89  0.87  0.87  0.81  0.72  0.55  0.40  0.23  0.09  0.00  0.00
99.9%         1.00  1.00  0.98  0.98  0.98  0.98  0.98  0.98  0.98  0.98  0.98  0.98  0.98
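The way an ellipse is chosen from Table 3 can be checked with a short lookup: for a given disturbance level, take the largest probability ellipse (lowest Type I error) whose Type II error is still below 0.05. The values below are transcribed from Table 3; the function name and the data layout are illustrative choices.

```python
REDUCTIONS = [25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85]

# Type II error per probability ellipse (%), one value per reduction level.
TYPE2 = {
    75.0: [0.45, 0.26, 0.09, 0.02, 0.02, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
    80.0: [0.51, 0.32, 0.15, 0.04, 0.04, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
    85.0: [0.57, 0.38, 0.26, 0.13, 0.13, 0.04, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
    90.0: [0.74, 0.55, 0.34, 0.38, 0.38, 0.15, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
    95.0: [0.85, 0.79, 0.55, 0.62, 0.62, 0.34, 0.11, 0.02, 0.00, 0.00, 0.00, 0.00, 0.00],
    99.0: [0.98, 0.96, 0.89, 0.87, 0.87, 0.81, 0.72, 0.55, 0.40, 0.23, 0.09, 0.00, 0.00],
    99.9: [1.00, 1.00, 0.98, 0.98, 0.98, 0.98, 0.98, 0.98, 0.98, 0.98, 0.98, 0.98, 0.98],
}


def largest_ellipse(reduction, max_type2=0.05):
    """Largest ellipse (%) with Type II error below max_type2, or None."""
    col = REDUCTIONS.index(reduction)
    ok = [prob for prob, row in TYPE2.items() if row[col] < max_type2]
    return max(ok) if ok else None


for level in (35, 40, 50, 60):
    print(level, largest_ellipse(level))
```

This reproduces the choices reported in the text: no ellipse qualifies at a 35% reduction, the 80% ellipse at 40%, the 85% ellipse at 50%, and the 95% ellipse at 60%.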


Table 4 Toxicological assessment of the test sediments using the probability ellipses of 80 and 95%, compared with the assessment using 5th and 0.5th percentiles of the endpoints in reference sites. Abbreviations: NT, Non Toxic; T, Toxic.

Test sediment   Probability ellipses   Assessment by percentiles (from Maestre et al., 2009b)
                (this study)           Survival        TGR             Reproduction
A202            NT                     NT              NT              NT
ARAR150         NT                     NT              NT              NT
AS160-500       NT                     NT              NT              NT
B226            NT                     NT              NT              NT
BA558           NT                     NT              NT              NT
NO3023          NT                     NT              NT              NT
NO3070          NT                     NT              NT              NT
NO3096          NT                     NT              NT              NT
OK114           NT                     NT              NT              NT
UR434           NT                     NT              NT              NT
Z828            NT                     NT              NT              NT
183             NT                     NT              NT              NT
UR320           NT                     NT              NT              T
N338            Potentially T          NT              NT              Potentially T
AS160           Potentially T          Potentially T   NT              Potentially T
D202            T                      T               NT              NT
BHE500          T                      T               NT              Potentially T
OM380           T                      NT              T               Potentially T
SP18            T                      NT              T               Potentially T
SP7.1           T                      NT              T               Potentially T
URS34           T                      NT              T               T
SP8             T                      T               T               Potentially T
68              T                      T               T               T
124             T                      T               T               T

Fig. 2. The pass–fail boundaries of the 80% and 95% probability ellipses on the multivariate MDS space of the chronic toxicity endpoints (survival, cocoons per adult, young per adult, and total growth) measured in 47 reference sediments. Diamonds represent individual reference sites. Grey circles show the relative position of the theoretical simpacted sediments (see text) with reductions of up to 40%, between 40 and 60% and more than 60% of the reference values.

4. Discussion

Sediment toxicity assessment constitutes one of the lines of evidence in the weight of evidence procedure (Chapman et al., 2002). The identification of the natural responses for toxicity test endpoints (i.e. the reference condition) is a necessary and critical step in the assessment of sediment quality as they provide the indicator values for determining the levels of stress or effect (Burton et al., 2002). In the present study, the selection of reference sites was based on biological variables (the good conservation state of macroinvertebrate communities and the absence of toxicological effects observed in laboratory bioassays). This explains why some of the reference sediments may have relatively high levels of some chemicals (see in Table 1 some substances over the TEC or PEC values at reference sites). This fact was also observed in the reference sediments selected by other authors (Chapman, 2007; Reynoldson

et al., 2002a) and can be attributed to low bioavailability of those chemicals. Lack of correlation between toxicological variables and sediment characteristics in the present study may be related to insufficient analysis of contaminants of potential concern, particularly the absence of data on organic compounds. However, the same lack of correlation was observed for reference sediments in the Great Lakes of North America (Reynoldson et al., 1995, 2002a). With a few exceptions, the natural characteristics of the reference sediments (particle size distribution, organic carbon, metals) were not related to the toxicological variables; therefore, other variables, such as microorganism communities (Balykin, 1983) could be responsible for differences observed in growth and reproduction in the bioassays. The variability in the toxicological endpoints attributable to the natural variability of the test organisms is expressed by the values of the coefficients of variation in the control series of the bioassays. These are relatively low for most variables (less than 26%, except for young per adult), and always lower than that observed in the reference sediments. The observed variability in our test organisms has been discussed in a previous publication (Maestre et al., 2007), and the results here are in accord with previous observations describing high variability in young production. The emphasis in this study on Type II statistical error is because this error has a direct implication on health or environmental risk as it measures the probability that toxic sites are assessed as Non Toxic. In contrast, Type I error is related to the risk of unnecessary expense because non toxic sites have been identified as Toxic. Many researchers on benthic communities have used a Type I error of 0.05, but there is no consensus on an appropriate Type II error rate, although some authors recommend setting similar probability levels for Type I and II errors (e.g. Glozier et al., 2002). 
Reynoldson et al. (2002b) noted that a determination of the degree of impairment or distance from reference condition that is unacceptable is ultimately a subjective decision. The selection of an appropriate level for α, as represented by the probability ellipses in a decision-making process, has that degree of subjectivity. In the present study, we have used what we consider reasonable criteria (i.e. percentage reduction of the endpoints >40%, control of Type II error below 5%, and maintaining a balance between statistical errors). The evaluation of the Type II error in sediment toxicity assessment is not usual (Environment Canada, 1998; Reynoldson et al., 2002a); however, it is not trivial and constitutes an essential tool for toxicity risk assessment because it provides information on the error of classifying a toxic sediment as Non Toxic. Such inaccuracy may have important consequences in conservation of aquatic ecosystems, and therefore represents critical information for environmental managers.

Sediment toxicity assessment for the same test sites was also analysed in a previous publication using a univariate approach, namely the 5th and 0.5th percentile values of the endpoints measured in reference sites (Maestre et al., 2009b). The assessments in this study and in the previous percentile-based study coincided for 14 test sediments (Table 4), mostly Non Toxic sites. At 10 test sites, toxicity assessment using probability ellipses appears to be more sensitive, since it identifies as Toxic all test sediments (except one) that showed at least one endpoint assessed as Toxic using the percentiles approach. The main contributions of the multivariate approach and the use of probability ellipses as boundaries for sediment toxicity classification are the simultaneous treatment of all toxicity endpoint values and the additional information it provides to the site toxicity assessment on the statistical errors, which are associated with both socio-economic and environmental costs.

5. Conclusions

The classification of test sediments for assessing toxicity has been done using a Reference Condition Approach. Probability ellipses around reference sites provide the pass–fail boundaries for the toxicity categories (Non Toxic, Potentially Toxic, and Toxic). The construction of these ellipses around the reference sites is done in the multidimensional space established by MDS multivariate analysis of the toxicological endpoints.
The main attributes of probability ellipses in the multivariate approach are:

• simultaneous consideration of all toxicity endpoint values,
• provision of additional information on the statistical Type I and Type II errors, which are associated with both socio-economic and environmental costs.

The method has proved generally more sensitive for sites where the evaluation of individual endpoints gives equivocal results. Consideration of the Type II statistical error in sediment toxicity assessment may have important consequences for the conservation of aquatic ecosystems and should therefore be evaluated in environmental risk assessment.

Acknowledgments

This research was made possible by the financial support provided to the research project by the Spanish Government (CGL2008-04502) and the Basque Government (GIC07/125-IT-40407). We thank Dr. Keith Somers of the Ontario Department of Environment (Canada) for providing the Excel macro for constructing the ellipses and Dr. Jesús de la Cal (Applied Mathematics department, UPV/EHU) for interesting remarks on some statistical concepts.

References

Alba-Tercedor, J., Sánchez-Ortega, A., 1988. Un método rápido y simple para evaluar la calidad biológica de las aguas corrientes basado en el de Hellawell (1978). Limnetica 4, 51–56.
Alba-Tercedor, J., Pujante, A.M., 2000. Running-water biomonitoring in Spain: opportunities for a predictive approach. In: Wright, J.F., et al. (Eds.), Assessing the Biological Quality of Freshwaters. Freshwater Biological Association, Ambleside, pp. 207–216 (Chapter 14).
ASTM, American Society for Testing and Materials, 2005. Standard Test Method for Measuring the Toxicity of Sediment-Associated Contaminants with Freshwater Invertebrates. ASTM E1706-05. ASTM, Philadelphia, PA, USA.
Bailey, R.C., Norris, R.H., Reynoldson, T.B., 2004. Bioassessment of Freshwater Ecosystems Using the Reference Condition Approach. Kluwer Academic Press, USA.
Balykin, A.V., 1983. On the relationship between the microflora and the habitat of the Tubificidae. In: Kurashvili, B.E. (Ed.), Aquatic Oligochaeta. Proceedings of the Fourth All-Union Symposium. “Metsniereba” Publishing House, Tbilisi, pp. 16–22.
Basque Government, 2006. Red de Seguimiento del Estado Ecológico de los Ríos de la Comunidad Autónoma del País Vasco. Tomo 1: Metodologías utilizadas, http://www.uragentzia.euskadi.net/u81-0003/es/contenidos/informe estudio/red rios/es red agua/adjuntos/2005 01.pdf.
Basque Government, 2007. Red de Seguimiento del Estado Ecológico de los Ríos de la Comunidad Autónoma del País Vasco (2005–2007). Dirección de Aguas de Gobierno Vasco, http://www.uragentzia.euskadi.net/u81-0003/es/contenidos/informe estudio/red masas agua superficial/es red agua/indice.html.
Burton Jr., G.A., Batley, G.E., Chapman, P.M., Forbes, V.E., Smith, E.P., Reynoldson, T.B., Schlekat, C.E., den Besten, P.J., Bailer, A.J., Green, A.S., Dwyer, R.L., 2002. A weight-of-evidence framework for assessing sediment (or other) contamination: improving certainty in the decision-making process. Hum. Ecol. Risk Assess. 8, 1675–1696.
Casper, S.T., 2008. Regulatory frameworks for sediment management. In: Owens, P.N. (Ed.), Sustainable Management of Sediment Resources: Sediment Management at the River Basin Scale. Elsevier, Amsterdam, The Netherlands, pp. 55–81.
CEE, 2000. Directive 2000/60/EC of the European Parliament and of the Council of 23 October 2000 establishing a framework for community action in the field of water policy. Off. J. Eur. Union L327/1 (22.12.2000).
CEE, 2008. Directive 2008/105/EC of the European Parliament and of the Council of 16 December 2008 on environmental quality standards in the field of water policy, amending and subsequently repealing Council Directives 82/176/EEC, 83/513/EEC, 84/156/EEC, 84/491/EEC and 86/280/EEC and amending Directive 2000/60/EC of the European Parliament and of the Council. Off. J. Eur. Union L348/84 (24.12.2008).
Chapman, P.M., 2007. Determining when contamination is pollution – weight of evidence determinations for sediments and effluents. Environ. Int. 33, 492–501.
Chapman, P.M., Anderson, J., 2005. A decision-making framework for sediment contamination. Integr. Environ. Assess. Manage. 1, 163–173.
Chapman, P.M., McDonald, B.G., Lawrence, G.S., 2002. Weight-of-evidence issues and frameworks for sediment quality (and others) assessments. Hum. Ecol. Risk Assess. 8, 1489–1515.
CHE, 2006. Hydrographical Confederation of River Ebro. Metodología para el Establecimiento del Estado Ecológico según la Directiva Marco del Agua, http://195.55.247.234/webcalidad/estudios/indicadoresbiologicos/Manual bentonicos.pdf.
CHE, 2008. Hydrographical Confederation of River Ebro. Red de Macroinvertebrados (2005), Control Biológico en Ríos (2006 and 2007), http://195.55.247.234/webcalidad/es estudios.htm.
CHN, 2004. Hydrographical Confederation of Northern Spain. Red de Tóxicos (unpublished document).
Clarke, K.R., Gorley, R., 2006. PRIMER v6: User Manual/Tutorial. PRIMER-E, Plymouth, UK.
Environment Canada, 1998. Pulp and Paper Technical Guidance. Document for Aquatic Environmental Effects Monitoring. EEM/1998/1. Environment Canada, Ottawa, Ontario, Canada.
Glozier, N.E., Culp, J.M., Reynoldson, T.B., Bailey, R.C., Lowell, R.B., Trudel, L., 2002. Assessing metal mine effects using benthic invertebrates for Canada’s environmental effects program. Water Qual. Res. J. Can. 37, 251–278.
Grapentine, L., Marvin, C., Painter, S., 2002. Initial development and evaluation of a sediment quality index for the Great Lakes region. Hum. Ecol. Risk Assess. 8, 1549–1567.
Heise, S., 2007. Preface. In: Sustainable Management of Sediment Resources, Sediment Risk Management and Communication, vol. 3. Elsevier, Amsterdam, pp. v–vii.
Maestre, Z., Martinez-Madrid, M., Rodriguez, P., Reynoldson, T., 2007. Ecotoxicity assessment of river sediments and a critical evaluation of some of the procedures used in the aquatic oligochaete Tubifex tubifex chronic bioassay. Arch. Environ. Contam. Toxicol. 53, 559–570.
Maestre, Z., Martinez-Madrid, M., Rodriguez, P., 2009a. Monitoring the sensitivity of the oligochaete Tubifex tubifex in laboratory cultures using three toxicants. Ecotoxicol. Environ. Saf. 72, 2083–2089.
Maestre, Z., Rodriguez, P., Martinez-Madrid, M., 2009b. Application of the sediment quality TRIAD to rivers in northern Spain. In: Santos, E.B. (Ed.), Ecotoxicology Research Development. Nova Science Publishers, New York, pp. 205–224.
MacDonald, D.D., Ingersoll, C.G., Berger, T.A., 2000. Development and evaluation of consensus-based sediment quality guidelines for freshwater ecosystems. Arch. Environ. Contam. Toxicol. 39, 20–31.
Mapstone, B.D., 1995. Scalable decision rules for environmental impact studies: effect size, Type 1, and Type 2 errors. Ecol. Appl. 5, 401–410.
Microsoft, 2003. Microsoft Office Excel. In: Operating System Windows XP. Microsoft Corporation.
Milani, D., Reynoldson, T.B., Borgmann, U., Kolasa, J., 2003. The relative sensitivity of four benthic invertebrates to metals in spiked-sediments exposures and application to contaminated field sediment. Environ. Toxicol. Chem. 22, 845–854.

Navarro, P., Raposo, J.C., Arana, G., Etxebarria, N., 2006. Optimisation of microwave assisted digestion of sediments and determination of Sn and Hg. Anal. Chim. Acta 566, 37–44.
Reynoldson, T.B., Thompson, S.P., Bampsey, J.L., 1991. A sediment bioassay using the tubificid oligochaete worm Tubifex tubifex. Environ. Toxicol. Chem. 10, 1061–1072.
Reynoldson, T.B., Bailey, R.C., Day, K.E., Norris, R.H., 1995. Biological guidelines for freshwater sediment based on benthic assessment of sediment (the BEAST) using a multivariate approach for predicting biological state. Aust. J. Ecol. 20, 198–219.
Reynoldson, T.B., Norris, R.H., Resh, V.H., Day, K.E., Rosenberg, D.M., 1997. The reference condition: a comparison of multimetric and multivariate approaches to assess water-quality impairment using benthic macroinvertebrates. J. North Am. Benthol. Soc. 16, 833–852.
Reynoldson, T.B., Day, K.E., Pascoe, T., 2000. The development of the BEAST: a predictive approach for assessing sediment quality in the North American Great Lakes. In: Wright, J.F., Sutcliffe, D.W., Furse, M.T. (Eds.), Assessing the Biological Quality of Freshwaters. Freshwater Biological Association, Ambleside, UK, pp. 165–180.
Reynoldson, T.B., Thompson, S.P., Milani, D., 2002a. Integrating multiple toxicological endpoints in a decision-making framework for contaminated sediments. Hum. Ecol. Risk Assess. 8, 1569–1584.

Reynoldson, T.B., Smith, E.P., Bailer, A.J., 2002b. A comparison of three weight-of-evidence approaches for integrating sediment contamination data within and across lines of evidence. Hum. Ecol. Risk Assess. 8, 1613–1624.
Salomons, W., Brils, J., 2004. Contaminated Sediments in European River Basins. EC contract no. EVK1-CT-2001-2002. European Sediment Research Network (SEDNET).
Sanz, J., de Diego, A., Raposo, J.C., Madariaga, J.M., 2004. Methylmercury determination in sediments and fish tissue from the Nerbioi-Ibaizabal estuary (Basque Country, Spain). Anal. Chim. Acta 508, 107–117.
SPSS, 2006. SPSS 15.0.1 for Windows. SPSS, Chicago, IL, USA.
Stroffek, S., 2001. Determination of Reference Conditions and Class Boundaries in Monitoring and Assessing of Surface Water Ecological Status in France. REFCOND workshop, Uppsala, Sweden.
Underwood, A.J., 1993. The mechanics of spatially replicated sampling programs to detect environmental impacts in a variable world. Aust. J. Ecol. 18, 99–116.
USEPA, 2000. Determination of Inorganic Analytes by Inductively Coupled Plasma-Atomic Emission Spectrometry, Test Methods for Evaluating Solid Waste. United States Environmental Protection Agency (USEPA 6010C). SW-846. U.S. Government Printing Office, Washington, DC, USA.