Inter-laboratory comparison of a sediment toxicity test using the marine amphipod, Rhepoxynius abronius


Marine Environmental Research 19 (1986) 13-37

A. J. Mearns Ocean Assessments Division, US NOAA, 7600 Sand Point Way NE, Seattle, Washington 98115, USA

R. C. Swartz Marine Science Center, US EPA, Newport, Oregon 97365, USA

J. M. Cummins US EPA, Region 10 Laboratory, PO Box 549, Manchester, Washington 98353, USA

P. A. Dinnel Fisheries Research Institute, University of Washington WH-10, Seattle, Washington 98195, USA

P. Plesha National Marine Fisheries Service, Mukilteo Field Station, PO Box 21, Mukilteo, Washington 98275, USA

P. M. Chapman* E.V.S. Consultants, 195 Pemberton Avenue, North Vancouver, BC V7P 2R4, Canada

(Received: 29 November, 1985)

ABSTRACT

An inter-laboratory comparison of the Swartz et al. (1985) amphipod sediment toxicity test was performed for seven marine sediments of varying toxicity. Five laboratories participated. Four a priori and one a posteriori hypotheses or criteria were tested for three end points (survival, emergence and reburial). The bioassay met the a priori criterion of success, acceptable survival and behavior (emergence and reburial) of controls. It also met two of three a priori hypotheses: acceptable agreement on the rank order of toxicity for all three end points and acceptable agreement on mean values for the end points. The third hypothesis, classification of sediments as toxic or non-toxic, was only met for the emergence end point; however, this was probably due to the narrow range of toxic sediments tested (four of the seven sediments tested were only marginally toxic). Review of these and other amphipod sediment toxicity test data indicates that sediments that are clearly non-toxic (survival is greater than 87%) and those that are clearly toxic (survival is less than 76%) will be accurately classified, whereas those of marginal toxicity (survival is between 76% and 87%) can only be classified based on emergence data. An a posteriori comparison indicated that the amphipod sediment toxicity test was more precise in LC50 and EC50 determinations with a reference toxicant (cadmium-amended sediments) than has previously been shown in inter-laboratory comparisons. Based on the results of this study, we recommend the wider use of this toxicity test to determine the toxicity of field-collected marine sediments and for laboratory studies with contaminant-amended sediments.

* To whom correspondence should be addressed.

Marine Environ. Res. 0141-1136/86/$03.50 © Elsevier Applied Science Publishers Ltd, England, 1986. Printed in Great Britain

INTRODUCTION

Many toxicity tests are now available to assess the potential biological consequences of waste disposal and marine pollution. However, aquatic toxicity tests are seldom incorporated into large-scale, region-wide or long-term monitoring programs. One of the factors that limit wider use of toxicity tests is lack of information about their reproducibility (McIntyre, 1984). Without such information, management and regulatory agencies may be reluctant to endorse, sponsor or require the use of toxicity tests in monitoring and assessment programs.

A major new area of concern is the assessment of toxic materials in marine sediments. Marine sediments are now clearly recognized as major reservoirs for toxic chemicals in the marine environment. A sediment toxicity test using the infaunal amphipod, Rhepoxynius abronius (Barnard; Phoxocephalidae) (Swartz et al., 1985) is one of several tests being considered and, in several cases, used as a tool for surveying sediment quality. During the past three years, this test has been used by five different laboratories to test sediment toxicity at several hundred sites in Puget Sound, Washington (Long, 1984) and at selected sites elsewhere in the United States (Swartz et al., 1985). In addition, this test is one of two principal procedures being used in interim criteria by the US Environmental Protection Agency for evaluating permits for the disposal of dredge material in Puget Sound (US EPA, 1984, 1985).

Because of the increasing importance of this particular test, an experiment was conducted to determine the variability of the amphipod sediment toxicity test among different laboratories using the same protocol. This study is the first inter-laboratory comparison test to use field-collected sediment. Previous inter-laboratory comparisons used specific individual toxicants in the aqueous phase alone (Davis & Hoos, 1975; Reish et al., 1978; Vanhaecke & Persoone, 1981) or in the presence of clean sediment (Pesch & Hoffman, 1983). The present study is also the first large-scale comparison of the Swartz et al. (1985) amphipod toxicity test. Chapman & Fink (1983) reported only a limited comparison of this test.

The test developed by Swartz et al. (1985) is a 10-day static experiment conducted with seawater of ≥25 ppt salinity at 15 °C under constant light. The experimental design requires five replicate 1-liter beakers, each supplied with constant aeration and each containing 175 ml or 2 cm of sediment and 20 healthy 3- to 5-mm amphipods (Rhepoxynius abronius). The principal end point is the mean number of amphipods surviving at the end of Day 10; secondary end points are (1) the mean number of amphipods capable of reburial in clean sediment at the end of Day 10 and (2) the mean number of emerged amphipods determined at the end of Day 10. A principal outcome is classification of test sediments as toxic, where mean survival is significantly (p < 0.05) lower than control survival, or non-toxic, where there is no significant difference between mean survival in control and test beakers. Sediment toxicity is indicated by increasing mortality, increasing emergence or decreasing reburial.

The protocols of Swartz et al. (1985) are specific for all steps, including: (1) collection and preparation of sediments; (2) collection of amphipods; (3) transport and acclimatization of amphipods; (4) experimental set-up and testing procedures; (5) monitoring; (6) termination and enumeration of amphipods and end points and (7) data analysis and interpretation.

The experiment conducted for this report involved five separate laboratories operating voluntarily under the purview of a non-participating referee, the senior author of this paper (AJM). All participants contributed to refinement of the experimental design and synchronous implementation of the experiment. The experiment tested the variability associated with all steps following collection of the amphipods (i.e. steps (3) to (6) above). Collection and preparation of sediments, collection of amphipods, some portions of the transport and acclimatization of amphipods and data analysis were cooperative activities not subject to inter-laboratory differences. One a priori criterion, three a priori hypotheses and one a posteriori hypothesis were tested for all three end points (emergence, survival and reburial).

METHODS

Five laboratories participated in the inter-laboratory comparison experiment: E.V.S. Consultants Ltd., Vancouver, British Columbia; the Mukilteo Laboratory, Environmental Conservation Division, National Marine Fisheries Service, Mukilteo, Washington; the University of Washington Fisheries Research Institute Bioassay Laboratory located at the West Point Treatment Plant, Municipality of Metropolitan Seattle; the US Environmental Protection Agency Region 10 Laboratory, Manchester, Washington and the US Environmental Protection Agency Ocean Discharge Division Laboratory, Newport, Oregon. The experiment was conducted during February, 1984.

Approach

Prior to conducting the experiment, participants attended several coordinating and planning workshops. The workshops resulted in the development of a priori criteria and hypotheses, the experimental design, and the planning and assignment of responsibilities for five task areas: (1) collection and preparation of sediments; (2) collection of amphipods; (3) transport and acclimatization of amphipods and sediments; (4) initiation, monitoring and termination of the experiment and (5) data collection and analysis.

Participants agreed a priori that a successful experiment must meet one criterion and test three primary hypotheses. The criterion was that mean survival in control sediments must be >90% for every laboratory. The three hypotheses were based primarily on survival:

(1) There is no significant difference among at least 80% (four of five) of the laboratories in the rank order of seven sediment samples as determined by mean survival.

(2) There is no significant difference in mean survival in each test sediment among at least 80% of the laboratories for at least 80% of the test sediments.

(3) At least 80% of the laboratories agree on the classification of sediment as toxic or not toxic for at least 80% of the test sediments. Sediment is toxic if mean survival in the test sediment is significantly less than the mean control survival. It is not toxic if the control and test sediment survival means are not significantly different.
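The criterion and the 80% rule above can be written as two small checks. This is only a sketch under stated assumptions: the function names are hypothetical, the control criterion is read as "at least 18 of 20 amphipods" (the reading used later in the Results), and the example inputs are the control means from Table 2 and the agreement counts for the six test sediments from Tables 5 and 8.

```python
# Sketch of the a priori decision rules; names are illustrative only.
MIN_CONTROL_SURVIVORS = 18   # 90% of the 20 amphipods in each beaker


def control_criterion_met(control_means):
    """True if every laboratory's mean control survival is at least 18 of 20."""
    return all(m >= MIN_CONTROL_SURVIVORS for m in control_means)


def hypothesis_met(labs_agreeing_per_sediment, n_labs=5):
    """True if at least 80% of laboratories agree for at least 80% of sediments.

    Integer arithmetic is used so that exactly 80% counts as meeting the rule.
    """
    sediment_ok = [5 * n >= 4 * n_labs for n in labs_agreeing_per_sediment]
    return 5 * sum(sediment_ok) >= 4 * len(sediment_ok)


# Mean control survival per laboratory (Table 2).
CONTROL_MEANS = [18.4, 19.2, 20.0, 18.8, 20.0]
# Laboratories agreeing per test sediment: mean survival (Table 5)
# and toxic/non-toxic classification by survival (Table 8).
MEAN_VALUE_AGREEMENT = [4, 4, 3, 4, 4, 5]
CLASSIFICATION_AGREEMENT = [5, 3, 5, 3, 3, 3]

print(control_criterion_met(CONTROL_MEANS))
print(hypothesis_met(MEAN_VALUE_AGREEMENT))
print(hypothesis_met(CLASSIFICATION_AGREEMENT))
```

With these inputs the control criterion and the mean-value rule are satisfied, while the classification rule is not, which matches the outcome reported for survival.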

These three hypotheses were secondarily applied to amphipod emergence and reburial data. The minimum experimental design that the participants agreed would best meet the criterion and hypotheses involved three treatments comprising seven sediments: (1) a non-toxic control sediment, (2) three test sediments containing a standard reference toxicant designed to bracket the range of highly toxic to slightly toxic and (3) three field-collected sediment samples from locations that, in previous testing, ranged from highly toxic to slightly toxic. To maintain anonymity, each laboratory was assigned a code name known only to the referee. The seven sediment samples were assigned color codes. Specific tasks were conducted as described below.

Collection and preparation of test sediments

Control and test sediments were collected 7 to 14 days prior to the experiment and were kept refrigerated (4 °C) until distributed to the participating laboratories. Control sediments were collected from shallow subtidal sites (West Beach, Washington and Yaquina Bay, Oregon). Portions of the control sediments were amended with the reference toxicant, cadmium chloride (CdCl2), and the measured concentrations were 4.11, 8.25 and 12.19 mg Cd/kg dry weight. Field samples were taken from three sites in Puget Sound: Central Basin (Metro Seattle Station A600E), inner Sinclair Inlet and Slip No. 1 in City Waterway, Commencement Bay. At each site, several samples were collected with a van Veen grab and mixed together. Sediment samples were stored in 4-liter glass jars labelled with color-coded tape and replicate number, and were distributed randomly to the participants. Subsamples of sediments from all treatments were analysed to determine sediment characteristics, which included sediment particle size distribution, total solids content, per cent water, volatile solids content, Eh, cadmium concentrations and grease and oil content.


Collection of amphipods

Live amphipods were collected by benthic dredge from a depth of 6 m off West Beach, Whidbey Island, Washington. Approximately 18000 Rhepoxynius abronius were captured and maintained in groups of 200 in sieved sediment from this site. The sediment was contained in plastic trays stored on ice in plastic ice chests. Sediment temperature was monitored at intervals throughout the 24-36 h capture and initial transport period. The temperature did not exceed 10°C.

Initiation, monitoring and termination of experiments

Each laboratory set up all the beakers with seawater and sediments on the same morning, using its own source of clean seawater. Since the various sediment types had different densities, exposure was standardized by using the wet weight of each sediment type required to create a sediment layer 2 cm deep in a 1-liter beaker. Beakers were numbered randomly and arranged randomly on shelves in temperature-controlled rooms or in water baths. Amphipods were selected and introduced into the beakers. All amphipods were pre-tested for ability to rebury in sediment from the collection site: animals not reburying in less than 1 h were not used. Amphipods were drawn randomly from each holding tray on a rotating schedule and distributed randomly into each beaker. All criteria described by Swartz et al. (1985) for experimental set-up were followed.

Daily monitoring of temperature, pH, salinity and dissolved oxygen (DO) content of water was conducted in one beaker of each treatment by each laboratory. All beakers were examined daily for amphipods emerged from sediments.

On Day 10, each experiment was terminated in the order in which it was set up. First, DO, pH, temperature and salinity were recorded in one beaker for each treatment. Next, data were recorded in the following order: (1) number of amphipods totally or partially emerged in undisturbed beakers, (2) total number of surviving amphipods retained during screening of contents of each beaker, (3) number of amphipods that reburied within 1 min following introduction into a beaker that contained clean sediment and (4) number of dead amphipods (no movement or response to physical stimulation). Amphipods were examined under a low power dissecting microscope.
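The random numbering of beakers described above can be illustrated with a short sketch. The paper does not say how the randomization was carried out, so the shuffle-based approach, the function name and the optional seed are assumptions made purely for illustration; only the seven treatments and five replicates come from the text.

```python
import random

# Seven sediments, five replicate beakers each (35 beakers per laboratory).
TREATMENTS = [
    "Control", "4 mg/kg Cd", "8 mg/kg Cd", "12 mg/kg Cd",
    "Central Basin", "Sinclair Inlet", "City Waterway",
]
REPLICATES = 5


def randomize_beakers(seed=None):
    """Return {beaker number: (treatment, replicate)} in random order.

    Hypothetical helper: shuffles all treatment/replicate pairs and then
    numbers the beakers 1..35 in the shuffled order.
    """
    rng = random.Random(seed)
    beakers = [(t, r) for t in TREATMENTS for r in range(1, REPLICATES + 1)]
    rng.shuffle(beakers)
    return {number: beaker for number, beaker in enumerate(beakers, start=1)}


layout = randomize_beakers(seed=1984)
for number in sorted(layout)[:5]:
    print(number, layout[number])
```

Fixing the seed only makes the illustration repeatable; a laboratory would normally keep the resulting layout on record rather than regenerate it.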


Compilation and analysis of data

All data manipulations and statistical analyses were performed using raw data on amphipod emergence, survival and reburial. Kendall's Test of Concordance (Sokal & Rohlf, 1969) was applied to the ranking values to test the hypothesis that there was inter-laboratory agreement on rank order of the treatments. The extent of inter-laboratory agreement on absolute means (independent of controls) was determined using ANOVA and the SNK (Student-Newman-Keuls) multiple range test (Sokal & Rohlf, 1969). Dunnett's Procedure (Steel & Torrie, 1960) was used to test the hypothesis that there was inter-laboratory agreement on which sediments were toxic and which were not. Homogeneity of variances was tested prior to ANOVA and Dunnett's procedure by either Hartley's Fmax test (Sokal & Rohlf, 1969) or, when a treatment mean had zero variance, Cochran's test (Eisenhart et al., 1947). If variances were not homogeneous, an arcsine transformation was applied. In all cases, this transformation was successful in meeting the assumption of homogeneity of variances. Grand means were calculated by pooling the five replicates from the five laboratories (n = 25). In addition to the a priori tests, a posteriori computations were also made of the 10-day LC50 for survival and EC50s for reburial and emergence, for each of the three cadmium-amended sediments, using the moving average angle method (Finney, 1971).
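As an illustration of the concordance test, the sketch below computes Kendall's coefficient of concordance W from the survival ranks reported for the five laboratories in Table 2. It uses the textbook formula without a tie correction and the usual chi-square approximation; it is not necessarily the exact computation performed by the authors, and the function name is hypothetical.

```python
# Survival ranks per laboratory, transcribed from Table 2, in the order:
# Control, 4, 8, 12 mg/kg Cd, Central Basin, Sinclair Inlet, City Waterway.
SURVIVAL_RANKS = {
    1: [1, 2, 3, 7, 4, 5, 6],
    2: [2, 1, 3, 7, 5, 6, 4],
    3: [1, 2, 6, 7, 3, 4, 5],
    4: [2, 1, 6, 7, 3, 4, 5],
    5: [1, 2, 5, 7, 4, 6, 3],
}


def kendalls_w(rank_table):
    """Return (W, chi-square) for m judges ranking n items without ties."""
    m = len(rank_table)        # judges (laboratories)
    n = len(rank_table[0])     # items (sediments)
    rank_sums = [sum(judge[i] for judge in rank_table) for i in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((x - mean_sum) ** 2 for x in rank_sums)
    w = 12 * s / (m ** 2 * (n ** 3 - n))
    chi_sq = m * (n - 1) * w   # approximately chi-square with n - 1 df
    return w, chi_sq


w, chi_sq = kendalls_w(list(SURVIVAL_RANKS.values()))
print(f"W = {w:.3f}, chi-square = {chi_sq:.2f} on 6 degrees of freedom")
```

For these ranks W is about 0.83 and the chi-square statistic is about 24.9, which exceeds the tabulated 0.001 critical value for 6 degrees of freedom (about 22.46) and is therefore consistent with p < 0.001.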

RESULTS

Sediment characteristics

Sediment characteristics varied markedly among the different sediments used in this study (Table 1). The control and cadmium-amended sediment was predominantly clean sand. In contrast, the three field-collected sediments were muds (predominantly silt and clay) with high organic content and varying organic content and cadmium contamination. Temperature, salinity, pH and DO of water in the test beakers were within the acceptable levels defined by Swartz et al. (1985).


TABLE 1
Characteristics of Tested Sediments

| Characteristic | Control and amended sediment (a) | Puget Sound: Central Basin | Puget Sound: Sinclair Inlet | Puget Sound: City Waterway |
|---|---|---|---|---|
| Sand (%) | 98.0, 98.5 | 2.1 | 18.2 | 17.2 |
| Silt (%) | 0.3, 2.0 | 37.5 | 46.1 | 42.5 |
| Clay (%) | 0.0, 1.2 | 60.4 | 35.7 | 40.3 |
| Total solids (%) | 77.5, 79.8 | 32.4 | 41.9 | 29.1 |
| Water (%) | 20.0, 21.0 | 67.9 | 58.3 | 71.2 |
| Total volatile solids (%) | 1.0, 1.3 | 10.5 | 9.3 | 29.6 |
| Eh (mV) | >250 | 140 | 188 | 80 |
| Oil and grease (mg/liter) | 68, 100 | 843 | 2573 | 20294 |
| Cadmium (mg/kg) (b) | 0.0, 0.1 | 0.4 | 4.2 | 5.6 |

(a) Control and amended sediment values are based on two determinations. All other values except cadmium concentrations are based on single determinations.
(b) Cadmium values other than control are grand means of the five sets of means derived for all laboratories from triplicate determinations for each laboratory.

TABLE 2
Survival of R. abronius in Test Sediments, Mean ± Standard Deviation

| Treatment | Laboratory 1 | Laboratory 2 | Laboratory 3 | Laboratory 4 | Laboratory 5 | Grand mean |
|---|---|---|---|---|---|---|
| Control | 18.4 ± 1.3 (1) (a) | 19.2 ± 0.8 (2) | 20.0 ± 0.0 (1) | 18.8 ± 1.3 (2) | 20.0 ± 0.0 (1) | 19.3 ± 1.1 (1) |
| 4 mg/kg Cd | 17.8 ± 1.3 (2) | 19.6 ± 0.6 (1) | 19.4 ± 0.6 (2) | 19.8 ± 0.4 (1) | 19.6 ± 0.9 (2) | 19.2 ± 1.1 (2) |
| 8 mg/kg Cd | 17.4 ± 1.8 (3) | 18.0 ± 2.0 (3) | 15.6 ± 2.1 (6) | 10.0 ± 3.0 (6) | 15.4 ± 0.6 (5) | 15.3 ± 3.4 (6) |
| 12 mg/kg Cd | 1.6 ± 0.6 (7) | 8.2 ± 2.2 (7) | 2.4 ± 1.5 (7) | 1.2 ± 1.1 (7) | 5.6 ± 2.3 (7) | 3.8 ± 3.1 (7) |
| Central Basin | 16.6 ± 2.3 (4) | 13.8 ± 1.5 (5) | 18.0 ± 1.6 (3) | 18.4 ± 1.1 (3) | 16.0 ± 0.7 (4) | 16.6 ± 2.2 (4) |
| Sinclair Inlet | 15.6 ± 2.6 (5) | 13.4 ± 2.2 (6) | 17.4 ± 1.5 (4) | 17.6 ± 0.6 (4) | 14.8 ± 1.8 (6) | 15.8 ± 2.4 (5) |
| City Waterway | 14.8 ± 2.3 (6) | 17.4 ± 2.4 (4) | 16.6 ± 2.5 (5) | 16.8 ± 2.2 (5) | 17.4 ± 0.6 (3) | 16.6 ± 2.2 (3) |

(a) Ranks: (1) = least toxic; (7) = most toxic sediment.


Comparison of control treatments

The a priori criterion for success was that every laboratory had a mean control survival of at least 90% (eighteen amphipods). Overall grand mean (n = 25) survival was 19.3 ± 1.1 amphipods or 96.4% (Table 2). Individual laboratory means ranged from 18.4 to 20.0 amphipods (92% to 100% survival) with a coefficient of variation (CV) of 3.7%. Hence, the criterion for control survival was met by all five laboratories. There were no a priori criteria for acceptable control values for reburial or emergence at Day 10. Reburial controls averaged 19.2 amphipods with individual laboratory means ranging from 18.4 to 20.0 (CV = 3.5%, Table 3). Emergence in controls averaged 0.4 amphipods with individual laboratory means ranging from 0 to 1.0 amphipods (Table 4).

Rank orders of toxicity

The first a priori hypothesis was that there would be no significant difference among at least 80% (four of five) of the laboratories on the

TABLE 3
Reburial of R. abronius in Test Sediments, Mean ± Standard Deviation

| Treatment | Laboratory 1 | Laboratory 2 | Laboratory 3 | Laboratory 4 | Laboratory 5 | Grand mean |
|---|---|---|---|---|---|---|
| Control | 18.4 ± 1.3 (1) (a) | 19.2 ± 0.8 (2) | 19.8 ± 0.4 (1) | 18.8 ± 1.3 (2) | 20.0 ± 0.0 (1) | 19.2 ± 1.1 (1) |
| 4 mg/kg Cd | 17.8 ± 1.3 (2) | 19.6 ± 0.6 (1) | 19.0 ± 0.0 (2) | 19.8 ± 0.4 (1) | 19.6 ± 0.9 (2) | 19.2 ± 1.0 (2) |
| 8 mg/kg Cd | 16.0 ± 1.9 (4) | 17.8 ± 1.8 (3) | 14.2 ± 2.8 (6) | 7.8 ± 2.? (6) | 12.2 ± 1.8 (6) | 13.6 ± 4.1 (6) |
| 12 mg/kg Cd | 1.2 ± 0.8 (7) | 4.4 ± 2.0 (7) | 1.6 ± 1.5 (7) | 0.0 ± 0.0 (7) | 1.4 ± 1.1 (7) | 1.7 ± 1.2 (7) |
| Central Basin | 16.4 ± 2.4 (3) | 13.8 ± 1.5 (5) | 17.8 ± 1.9 (3) | 18.4 ± 1.1 (3) | 16.0 ± 0.7 (4) | 16.5 ± 2.2 (3) |
| Sinclair Inlet | 15.2 ± 2.8 (5) | 13.0 ± 2.6 (6) | 17.4 ± 1.5 (4) | 17.6 ± 0.6 (4) | 14.8 ± 1.8 (5) | 15.6 ± 2.4 (4) |
| City Waterway | 13.6 ± 2.9 (6) | 17.0 ± 2.7 (4) | 14.4 ± 3.6 (5) | 14.8 ± 3.0 (5) | 16.2 ± 0.8 (3) | 15.2 ± 2.8 (5) |

(a) Ranks: (1) = least toxic; (7) = most toxic sediment.


TABLE 4
Day 10 Emergence of R. abronius in Test Sediments, Mean ± Standard Deviation

| Treatment | Laboratory 1 | Laboratory 2 | Laboratory 3 | Laboratory 4 | Laboratory 5 | Grand mean |
|---|---|---|---|---|---|---|
| Control | 1.0 ± 1.0 (5) (a) | 0.4 ± 0.9 (6) | 0.0 ± 0.0 (7) | 0.6 ± 0.9 (5) | 0.0 ± 0.0 (7) | 0.4 ± 0.8 (7) |
| 4 mg/kg Cd | 1.6 ± 2.0 (4) | 0.4 ± 0.6 (6) | 0.4 ± 0.6 (5) | 0.2 ± 0.4 (6) | 0.4 ± 0.9 (6) | 0.6 ± 1.1 (5) |
| 8 mg/kg Cd | 2.8 ± 2.2 (3) | 1.2 ± 1.6 (3) | 4.6 ± 2.5 (2) | 10.6 ± 2.1 (2) | 6.4 ± 2.0 (3) | 5.1 ± 3.8 (2) |
| 12 mg/kg Cd | 12.8 ± 3.3 (1) | 13.6 ± 1.8 (1) | 16.6 ± 2.0 (1) | 17.8 ± 1.1 (1) | 17.0 ± 0.7 (1) | 15.6 ± 2.7 (1) |
| Central Basin | 0.4 ± 0.6 (7) | 0.8 ± 0.8 (5) | 0.4 ± 0.6 (5) | 0.2 ± 0.4 (6) | 0.8 ± 1.3 (5) | 0.5 ± 0.8 (6) |
| Sinclair Inlet | 1.0 ± 1.0 (5) | 1.2 ± 1.1 (3) | 0.8 ± 0.8 (4) | 0.8 ± 0.8 (4) | 2.6 ± 1.1 (4) | 1.3 ± 1.1 (4) |
| City Waterway | 5.4 ± 2.7 (2) | 3.0 ± 1.9 (2) | 3.4 ± 0.9 (3) | 4.0 ± 2.0 (3) | 7.4 ± 3.4 (2) | 4.6 ± 2.7 (3) |

(a) Ranks: (1) = least toxic; (7) = most toxic sediment.

relative ranking of samples in terms of toxicity as measured, primarily, by the end point of survival, and, secondarily, by the end points of reburial and emergence at ten days. The null hypothesis was that there was no concordance of ranking among all five laboratories for survival, reburial or emergence. Using Kendall's Test of Concordance, that hypothesis was rejected for each of the three end points (p < 0.001 in all cases). The alternative hypothesis of inter-laboratory agreement on ranking was accepted for each of the three end points when all treatments were taken together.

Further inspection of mean survival data indicated that there was perfect (100%) inter-laboratory agreement in ranking within the three cadmium treatments for all three end points: i.e. the rank from least to most toxic was 4 mg/kg Cd, 8 mg/kg Cd, 12 mg/kg Cd for survival, emergence and reburial (Tables 2, 3 and 4). For the three field samples from Puget Sound, inter-laboratory agreement on ranking was also perfect for emergence: i.e. the rank from least to most toxic was (1) Central Basin, (2) Sinclair Inlet, (3) City Waterway (Table 4). However, there was not agreement for reburial and survival. Three of the laboratories agreed in the toxicity ranking noted above, whereas the other two agreed on the order 2-3-1.

Agreement on mean values

The second a priori hypothesis was that there would be an average 80% or greater level of agreement between all five laboratories on the actual numerical values of mean survival. This hypothesis was secondarily applied to mean reburial and emergence. The results of ANOVA and SNK multiple range tests of the data are presented in Tables 5, 6 and 7. For each of the three end points, on average, four of five laboratories (80%) agreed on absolute means. There was perfect agreement for all laboratories on control means for survival, emergence and reburial. For survival, there was also perfect agreement for the sample from City Waterway (Table 5). For survival data on all other samples except the

TABLE 5
Laboratory Agreement on Survival, Results of ANOVA and SNK Multiple Range Tests

| Treatment | Laboratory number as ranked (low to high) | Maximum number of non-significant differences (agreements) |
|---|---|---|
| Control | 1, 4, 2, 3, 5 | 5 |
| 4 mg/kg Cd | 1, 3, 2, 5, 4 | 4 |
| 8 mg/kg Cd | 4, 5, 3, 1, 2 | 4 |
| 12 mg/kg Cd | 4, 1, 3, 5, 2 | 3 |
| Central Basin | 2, 5, 1, 3, 4 | 4 |
| Sinclair Inlet | 2, 5, 1, 3, 4 | 4 |
| City Waterway | 1, 3, 4, 5, 2 | 5 |
| Mean agreement | | 4.1 |
| Number in full agreement | | 2 |
| Number meeting 80% agreement criterion | | 6 |

Mean survival for laboratories that are connected by the same underline is not significantly different (p < 0.05).


TABLE 6
Laboratory Agreement on Reburial, Results of ANOVA and SNK Multiple Range Tests

| Treatment | Laboratory number as ranked (low to high) | Maximum number of non-significant differences (agreements) |
|---|---|---|
| Control | 1, 4, 2, 3, 5 | 5 |
| 4 mg/kg Cd | 1, 3, 2, 5, 4 | 4 |
| 8 mg/kg Cd | 4, 5, 3, 1, 2 | 2 |
| 12 mg/kg Cd | 4, 1, 5, 3, 2 | 4 |
| Central Basin | 2, 5, 1, 3, 4 | 4 |
| Sinclair Inlet | 2, 5, 1, 3, 4 | 4 |
| City Waterway | 1, 3, 4, 5, 2 | 5 |
| Mean agreement | | 4.0 |
| Number in full agreement | | 2 |
| Number meeting 80% agreement criterion | | 6 |

Mean survival for laboratories that are connected by the same underline is not significantly different (p < 0.05).

12 mg/kg cadmium addition, four of five laboratories were in full agreement; laboratories 1 and 4 each disagreed with the others once and laboratory 2, twice. For survival data from the treatment with 12 mg/kg Cd, laboratories 2 and 5 differed from each other whereas laboratories 1, 3 and 4 were in full agreement and significantly different from laboratories 2 or 5.

A similar pattern of agreement was seen with the reburial end point (Table 6). However, there was better agreement for the 12 mg/kg Cd treatment (four of five laboratories, laboratory 2 being the exception) and considerable disagreement for the 8 mg/kg Cd treatment (maximum of two laboratories in agreement). Nonetheless, overall, a mean of four laboratories agreed.

The end point of emergence on Day 10 resulted in the best inter-laboratory agreement on mean values (Table 7). All five laboratories were in full agreement (no significant differences) for controls and three


TABLE 7
Laboratory Agreement on Day 10 Emergence, Results of ANOVA and SNK Multiple Range Tests

| Treatment | Laboratory number as ranked (low to high) | Maximum number of non-significant differences (agreements) |
|---|---|---|
| Control | 3, 5, 2, 4, 1 | 5 |
| 4 mg/kg Cd | 4, 2, 3, 5, 1 | 5 |
| 8 mg/kg Cd | 2, 1, 3, 5, 4 | 2 |
| 12 mg/kg Cd | 1, 2, 3, 5, 4 | 3 |
| Central Basin | 4, 1, 3, 2, 5 | 5 |
| Sinclair Inlet | 3, 4, 1, 2, 5 | 5 |
| City Waterway | 2, 3, 4, 1, 5 | 4 |
| Mean agreement | | 4.1 |
| Number in full agreement | | 4 |
| Number meeting 80% agreement criterion | | 5 |

Mean survival for laboratories that are connected by the same underline is not significantly different (p < 0.05).

treatments (4 mg/kg Cd treatment, Central Basin and Sinclair Inlet). A maximum of four laboratories agreed on mean emergence for the sample from City Waterway, three for the 12 mg/kg Cd treatment and only two for the 8 mg/kg Cd treatment.

In summary, four of five laboratories (80%) were, on average, in agreement on mean values for the three end points. The end point of emergence at Day 10 produced the highest number of cases (four of seven treatments) of full agreement.

Agreement on 'toxic' versus 'non-toxic' classification of samples

The third a priori hypothesis was that there would be, on average, an 80% or greater level of agreement between the five laboratories in distinguishing non-toxic samples from significantly toxic samples. Using Dunnett's t procedure, there was perfect inter-laboratory agreement for survival for only two of the six test treatments: amended with 4 mg/kg Cd, non-toxic and amended with 12 mg/kg Cd, toxic (Table 8). For the 8 mg/kg Cd-amended sediment and the three field samples from Puget Sound, agreement among the five laboratories was split: two toxic, three non-toxic or three toxic, two non-toxic. The average agreement was 73% among the five laboratories in the discrimination of toxic versus non-toxic on the basis of survival. For emergence on Day 10, there was perfect

TABLE 8
Laboratory Agreement on Survival, Results of Dunnett's t Procedure (Means for each treatment were compared against the individual laboratory control.)

| Treatment | Number of laboratories ranking sediment as toxic | Number of laboratories ranking sediment as non-toxic |
|---|---|---|
| 4 mg/kg Cd | 0 | 5 |
| 8 mg/kg Cd | 3 | 2 |
| 12 mg/kg Cd | 5 | 0 |
| Central Basin | 2 | 3 |
| Sinclair Inlet | 2 | 3 |
| City Waterway | 3 | 2 |

Mean agreement: 3.7
Number in full agreement: 2
Number meeting 80% agreement criterion: 2

- = Not significant; + = p < 0.05.

agreement for three of the six test treatments, the 4 mg/kg and 12 mg/kg Cd and City Waterway sediments (Table 9). The 8 mg/kg Cd treatment was ranked toxic by three laboratories and not toxic by two laboratories, and the field sample from Sinclair Inlet was ranked toxic by one laboratory and not toxic by four laboratories. The average agreement among the laboratories on the basis of emergence was 90%. For reburial, there was also perfect inter-laboratory agreement for the two cadmium treatments (4 mg/kg and 12 mg/kg) and split agreement for the remaining four treatments (Table 10). The average agreement among laboratories on the basis of reburial was 77%.
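The average agreement figures quoted above can be reproduced from the number of laboratories that classified each test sediment as toxic. In the sketch below, agreement for a sediment is taken to be the size of the larger group (toxic or non-toxic); that reading, the function name and the transcription of the counts from Tables 8, 9 and 10 are assumptions made for illustration.

```python
# Laboratories (out of five) classifying each test sediment as toxic, in the
# order: 4, 8, 12 mg/kg Cd, Central Basin, Sinclair Inlet, City Waterway.
TOXIC_COUNTS = {
    "survival": [0, 3, 5, 2, 2, 3],    # Table 8
    "emergence": [0, 3, 5, 0, 1, 5],   # Table 9
    "reburial": [0, 3, 5, 2, 3, 4],    # Table 10
}


def mean_agreement(toxic_counts, n_labs=5):
    """Return (mean number of agreeing laboratories, same value as a percentage)."""
    agreements = [max(t, n_labs - t) for t in toxic_counts]
    mean_labs = sum(agreements) / len(agreements)
    return mean_labs, 100 * mean_labs / n_labs


for end_point, counts in TOXIC_COUNTS.items():
    labs, pct = mean_agreement(counts)
    print(f"{end_point}: {labs:.1f} laboratories ({pct:.0f}%)")
```

Under these assumptions the three end points give roughly 73%, 90% and 77% agreement, so only emergence reaches the 80% level set a priori.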

TABLE 9
Laboratory Agreement on Day 10 Emergence, Results of Dunnett's t Procedure

| Treatment | Number of laboratories ranking sediment as toxic | Number of laboratories ranking sediment as non-toxic |
|---|---|---|
| 4 mg/kg Cd | 0 | 5 |
| 8 mg/kg Cd | 3 | 2 |
| 12 mg/kg Cd | 5 | 0 |
| Central Basin | 0 | 5 |
| Sinclair Inlet | 1 | 4 |
| City Waterway | 5 | 0 |

Mean agreement: 4.5
Number in full agreement: 4
Number meeting 80% agreement criterion: 5

- = Not significant; + = p < 0.05.

TABLE 10
Laboratory Agreement on Reburial, Results of Dunnett's t Procedure

| Treatment | Number of laboratories ranking sediment as toxic | Number of laboratories ranking sediment as non-toxic |
|---|---|---|
| 4 mg/kg Cd | 0 | 5 |
| 8 mg/kg Cd | 3 | 2 |
| 12 mg/kg Cd | 5 | 0 |
| Central Basin | 2 | 3 |
| Sinclair Inlet | 3 | 2 |
| City Waterway | 4 | 1 |

Mean agreement: 3.8
Number in full agreement: 2
Number meeting 80% agreement criterion: 3

- = Not significant; + = p < 0.05.


LC50 and EC50 values for cadmium

Although not an a priori objective of the experiment, we computed sediment cadmium LC50 values for survival and EC50 values for reburial and emergence using nominal values for the three cadmium-amended sediments (Table 11). Individual laboratory estimates for mean LC50 values for cadmium-amended sediments ranged from 9.44 to 11.45 mg/kg with a grand mean (n = 25) of 9.81 and a grand mean coefficient of variation of 12.04%.

TABLE 11
Comparison of LC50 (Survival) and EC50 Concentrations (Emergence, Reburial) for Cadmium Treatments by Moving Average Angle Method (95% confidence limits are in parentheses)

| Laboratory | LC50 (mg/kg), survival | EC50 (mg/kg), emergence | EC50 (mg/kg), reburial |
|---|---|---|---|
| 1 | 9.95 (9.68-10.23) | 11.06 (10.52-11.79) | 9.59 (9.29-9.88) |
| 2 | 11.45 (10.86-12.28) | 10.93 (10.50-11.46) | 10.39 (10.02-10.80) |
| 3 | 9.44 (9.05-9.81) | 9.57 (9.14-10.00) | 9.09 (8.68-9.46) |
| 4 | 8.17 (7.53-8.67) | 7.90 (7.47-8.37) | 7.66 (7.25-7.99) |
| 5 | 10.02 (9.48-10.64) | 9.12 (8.59-9.58) | 8.60 (8.08-9.03) |

Individual laboratory 95% confidence intervals ranged in width from 0.55 to 1.42 mg/kg. EC50 values based on emergence at Day 10 ranged from 9.12 to 11.06 mg/kg cadmium with a grand mean (n = 25) of 9.72 and a grand mean coefficient of variation of 13.57%. Individual laboratory 95% confidence intervals ranged in width from 0.86 to 1.27 mg/kg. EC50 values based on reburial ranged from 7.66 to 10.39 mg/kg with a grand mean (n = 25) of 9.07 mg/kg and a grand mean coefficient of variation of 11.33%. Individual laboratory 95% confidence intervals ranged in width from 0.59 to 0.95 mg/kg.
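The grand means and coefficients of variation quoted for the cadmium LC50 and EC50 values can be checked directly from the five laboratory estimates in Table 11. The sketch below assumes the CV is the sample standard deviation of the five laboratory values divided by their mean; the dictionary and function names are hypothetical.

```python
from statistics import mean, stdev

# Point estimates (mg/kg) for laboratories 1 to 5, transcribed from Table 11.
TABLE_11 = {
    "LC50 survival": [9.95, 11.45, 9.44, 8.17, 10.02],
    "EC50 emergence": [11.06, 10.93, 9.57, 7.90, 9.12],
    "EC50 reburial": [9.59, 10.39, 9.09, 7.66, 8.60],
}


def grand_mean_and_cv(values):
    """Return (mean, coefficient of variation in per cent) using the sample SD."""
    m = mean(values)
    return m, 100 * stdev(values) / m


SUMMARY = {name: grand_mean_and_cv(vals) for name, vals in TABLE_11.items()}
for name, (m, cv) in SUMMARY.items():
    print(f"{name}: grand mean {m:.2f} mg/kg, CV {cv:.2f}%")
```

Computed this way, the three end points give 9.81 mg/kg (12.04%), 9.72 mg/kg (13.57%) and 9.07 mg/kg (11.33%), in line with the values stated in the text.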


DISCUSSION

This experiment was designed to test the inter-laboratory variability of the amphipod sediment toxicity test as proposed by Swartz et al. (1985). All five laboratories performed the test as described in the protocol of Swartz et al. (1985) and all data requested by a non-participatory referee were obtained.

In designing this test, the participants agreed that the test sediments should include non-toxic (90% to 100% survival), moderately toxic (on the order of 50% survival) and highly toxic sediments (near 0% survival). In the case of the field-collected sediments, actual toxicity was not what we expected based on previous toxicity tests with sediments from these sites. This may be due to the patchy nature of sediment toxicity, or may indicate an actual change in toxicity at these sites. Of the seven sediments tested, two fell in the non-toxic range (mean survival 96% to 96.5%), one fell in the highly toxic range (mean survival 19%), and none fell in the moderately toxic range. Instead, the majority of the sediments (four of seven) were marginally toxic (mean survival 76.5% to 83%). Therefore, in this inter-laboratory comparison, the ability of the Swartz et al. (1985) toxicity test to differentiate between the sediments was tested only at the extremes of the toxic gradient.

The results of this experiment are summarized in Table 12. By three of four measures, the amphipod sediment toxicity test as proposed by Swartz et al. (1985) successfully passed a priori criteria and hypotheses proposed for a robust toxicity test. These include excellent (>90%) mean control survival, significant agreement on ranking all six test samples for each of the three end points and excellent agreement on mean responses. However, the amphipod sediment toxicity test failed one a priori hypothesis. Specifically, there was incomplete inter-laboratory agreement on classification of samples as toxic or non-toxic.
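The distribution of the seven sediments across the survival bands can be illustrated with a small sketch. It applies the bands quoted in the abstract (above 87% clearly non-toxic, below 76% clearly toxic, otherwise marginal) to the grand mean survival values in Table 2; the function name and the band labels are illustrative choices, not part of the published protocol.

```python
from collections import Counter

# Grand mean survivors out of 20 amphipods per beaker (Table 2).
GRAND_MEAN_SURVIVAL = {
    "Control": 19.3,
    "4 mg/kg Cd": 19.2,
    "8 mg/kg Cd": 15.3,
    "12 mg/kg Cd": 3.8,
    "Central Basin": 16.6,
    "Sinclair Inlet": 15.8,
    "City Waterway": 16.6,
}


def survival_band(mean_survivors, per_beaker=20):
    """Place a mean survival value in one of the three bands from the abstract."""
    pct = 100 * mean_survivors / per_beaker
    if pct > 87:
        return "clearly non-toxic"
    if pct < 76:
        return "clearly toxic"
    return "marginal"


BANDS = {name: survival_band(m) for name, m in GRAND_MEAN_SURVIVAL.items()}
print(Counter(BANDS.values()))
```

Two sediments fall in the clearly non-toxic band, one in the clearly toxic band and four in the marginal band, the same split described in the paragraph above.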
The implications of each of these findings are discussed in more detail below. The criterion of both success and robustness was high control survival. No laboratory failed to meet the a priori requirement that mean control survival remain above 90'?,.;;. Moreover. there were no significant differences in mean control survival among the five laboratories. The end points, reburial and emergence at Day 10, also conformed to these conditions (emergence

TABLE 12
Summary of Success of Inter-Laboratory Comparison Experiment in Passing Criteria and Hypotheses

A priori criteria

1. Acceptable control responses
   Primary criterion: All laboratories have control survival > 90%.
   Pass? Yes
   Remarks: All laboratories met criterion.
   Related results: Control reburial > 90% and control emergence < 10% for all laboratories. There were no significant differences (p < 0.05) between all five laboratories in mean control survival, reburial and emergence.

A priori hypotheses

1. Toxicity ranking of sediments
   Primary criterion: 80% of laboratories agree on ranking of mean survival for 80% of treatments.
   Pass? Yes
   Remarks: No significant difference (p < 0.05) between all five laboratories for all treatments (control included).
   Related results: There was no significant difference (p < 0.05) between all five laboratories in ranking of mean reburial and emergence for all seven sediment treatments. There was perfect agreement among all laboratories in the ranking of the three cadmium treatments for all three end points. There was perfect agreement among all laboratories in the ranking of all three field samples for reburial. Three of five laboratories agreed perfectly in ranking of field sediments for survival and emergence, and there was no significant difference (p < 0.05) between all five laboratories in the ranking of these two end points.

2. Interlaboratory agreement on mean responses
   Primary criterion: 80% of laboratories have no significant difference in mean survival for 80% of treatments.
   Pass? Yes
   Remarks: Criterion met for six of seven treatments (86%).
   Related results: This criterion was met for six of seven treatments (86%) for reburial, and five of seven treatments (71%) for emergence.

3. Classification of samples as toxic or not toxic based on comparison with control for individual laboratories
   Primary criterion: 80% of laboratories agree on classifications by survival for 80% of treatments.
   Pass? No
   Remarks: Criterion met for only two of six treatments (33%).
   Related results: This criterion was met for five of six treatments (83%) classified by emergence, but only three of six treatments (50%) classified by reburial.

A posteriori hypothesis

1. Similarity of cadmium LC50 and EC50 values
   Primary criterion: There were no a priori criteria for judging differences between laboratories in estimates of the cadmium LC50s and EC50s.
   Related results: The LC50 (and EC50) estimates were remarkably similar between all five laboratories. The coefficient of variation was less than 14% for all three end points.
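The pass/fail entries in Table 12 for the second and third hypotheses reduce to a proportion check: the share of treatments on which the laboratories agreed must reach 80%. The following is a minimal Python sketch of that arithmetic, using only the treatment counts reported in the table. The function name `criterion_met` and the dictionary layout are illustrative assumptions and are not part of the Swartz et al. (1985) protocol.

```python
from fractions import Fraction

# Required share of treatments on which laboratories must agree (80%).
REQUIRED = Fraction(80, 100)


def criterion_met(treatments_in_agreement: int, treatments_total: int) -> bool:
    """Return True when the share of agreeing treatments is at least 80%."""
    return Fraction(treatments_in_agreement, treatments_total) >= REQUIRED


# (treatments meeting the criterion, treatments compared), as listed in Table 12.
# The keys are hypothetical labels chosen for this sketch.
table_12_counts = {
    ("mean response", "survival"): (6, 7),
    ("mean response", "reburial"): (6, 7),
    ("mean response", "emergence"): (5, 7),
    ("classification", "survival"): (2, 6),
    ("classification", "reburial"): (3, 6),
    ("classification", "emergence"): (5, 6),
}

results = {key: criterion_met(*counts) for key, counts in table_12_counts.items()}

for (hypothesis, end_point), passed in results.items():
    agreed, total = table_12_counts[(hypothesis, end_point)]
    verdict = "meets" if passed else "below"
    print(f"{hypothesis:14s} {end_point:9s} {agreed}/{total} "
          f"= {100 * agreed / total:.0f}%  {verdict} 80%")
```

Exact fractions are used so that a borderline case such as four of five treatments is not misjudged through floating-point rounding. Only the survival rows correspond to the primary criteria; the reburial and emergence rows are the related results.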

The first hypothesis was inter-laboratory consistency in the rank order of treatments. Although there was not perfect inter-laboratory agreement on the rank order of mean survival, reburial and emergence among all six test sediments, the differences were not significant and the hypothesis of inter-laboratory agreement in ranking was accepted at p < 0.05 for survival, reburial and emergence at Day 10. Moreover, there was perfect inter-laboratory agreement in rank order assignments of the subset of three cadmium-amended sediments for all three end points. However, lack of agreement occurred with the subset of three field samples from Puget Sound. The difference in agreement between the two subsets is not surprising since the cadmium-amended sediments resulted in a wider toxicity gradient (96%, 76.5%, 19% mean survival) than the field samples, which exhibited a narrow range of values for mean survival (79%, 83%, 83%). However, this does not mean that a toxicity gradient was not present among these field samples; a gradient was evident in toxic effects of these sediments on emergence and reburial and there was perfect inter-laboratory agreement in ranking of all three field samples (and the cadmium-amended samples) for emergence and reburial. This result suggests that emergence and reburial (sublethal responses) may be more precise measures than survival in differentiating toxic sediments from non-toxic sediments under moderately contaminated conditions.

The second hypothesis tested inter-laboratory agreement on mean values for survival, reburial and emergence independent of control conditions. On average, 80% or more of the laboratories agreed on absolute values for all three end points.

The third hypothesis tested inter-laboratory ranking of test sediments as toxic or not toxic relative to the inter-laboratory control. The requisite 80% level of agreement was achieved for only two of the six test sediments for survival but increased to three of six test sediments for reburial and to five of six (83%) for emergence. The two sediments that yielded perfect agreement for all three end points were the low dose and high dose cadmium-amended sediments (4 and 12 mg/kg, respectively); and for emergence all laboratories agreed that the Central Basin sediment sample was non-toxic and that the City Waterway sediment sample was toxic. The hypothesis of agreement on classification of samples was only upheld for emergence. As previously noted, the lack of agreement on classifying sediment samples, based on differences between control and test survival, was due to the low range of mean survival obtained from the three field samples (79-83%); the 8 mg/kg Cd-amended sediment also resulted in an overall mean survival near this range (76.5%). From these results, it appears that the Swartz et al. (1985) amphipod sediment toxicity test does not lead to a confident conclusion about sediment toxicity when survival is in the range of 76.5% to 83%. This suggestion is supported by other data.
Table 13 presents data from 365 separate amphipod sediment toxicity tests conducted by three laboratories (E.V.S. Consultants; EPA, Manchester and EPA, Newport) prior to, and independent of, the present results. These data clearly show that samples with survivals below 75.5% are always classified as toxic, whereas samples with survivals above 87.5% are always classified as non-toxic. In contrast, samples with survivals between 76% and 87% (the three field-collected sediments used in the present study and the 8 mg/kg Cd-amended sediment fall into this range) are not consistently classified as toxic or non-toxic. Thus, the partial rejection of the third hypothesis appears to be a reflection of the narrow range of toxic sediments used in this study. It is likely that this hypothesis would have been accepted if the range of toxicity had been wider.

TABLE 13
Independent Data from Three Laboratories on Mean Survival Values that are Significantly Different from Controls

Mean survival    Mean survival    Number of sediments tested (a)    Per cent of sediments
(out of 20)      (%)              Toxic (b)      Non-toxic          rated as toxic (b)

0-15.1           0-75.5           129            0                  100
15.2-15.9        76-79.5          14             11                 56
16.0-16.9        80-84.5          19             20                 49
17.0-17.4        85-87            7              18                 28
17.5-20.0        87.5-100         0              147                0

(a) 365 separate sediments tested.
(b) Significant at p < 0.05, compared with controls.

The last comparison, in this case an a posteriori comparison, was on 10-day LC50 and EC50 data for cadmium. The agreement in these data was much better in this study than that obtained in other inter-laboratory comparisons using reference toxicants. For instance, the inter-laboratory coefficient of variation (CV) was 11.3% for reburial, 12.4% for survival and 13.6% for emergence. In contrast, sequential, repetitive tests with Daphnia exposed to cadmium have yielded 48-h LC50 CVs of 20.9% (D. pulex, nine separate tests) to 72.4% (D. magna, eight separate tests) (Lewis & Weber, 1985). The ratio of high to low mean end-point values (i.e. LC50 and EC50 values) for the present study is compared in Table 14 with four previous studies using various reference toxicants. The present study had the lowest ratio of 1.4; in the other studies, these ratios varied from 1.8 to 10.5. Similarly, the ratio derived in the present study is lower than that derived by Lewis & Weber (1985) in sequential, repetitive cadmium 48-h LC50s with Daphnia pulex (ratio of 1.7) and Daphnia magna (ratio of 6.7).

The results of the cadmium comparison demonstrate the ability of the amphipod sediment toxicity test to measure precise LC50 and EC50 values for toxicant additions in sediments, as well as the corollary dose-dependent relationship. This further establishes that the Swartz et al. (1985) amphipod sediment toxicity test is capable of measuring, with good inter-laboratory precision, toxic effects due to a chemical-laden sediment.
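The two precision measures used in the cadmium comparison, the inter-laboratory coefficient of variation and the ratio of the highest to the lowest laboratory value, can be sketched in a few lines of Python. The per-laboratory LC50 values below are hypothetical placeholders chosen only to make the example run; they are not the values measured in this study. The use of the sample standard deviation (n - 1 denominator) is also an assumption, since the formula is not stated in this section.

```python
import statistics


def coefficient_of_variation(values):
    """Coefficient of variation in per cent: 100 * s / mean.

    Assumes the sample standard deviation (n - 1 denominator).
    """
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def high_low_ratio(values):
    """Ratio of the highest to the lowest laboratory estimate."""
    return max(values) / min(values)


# HYPOTHETICAL 10-day LC50 estimates (mg Cd/kg), one per laboratory.
# These are illustration values, NOT data from the present study.
hypothetical_lc50 = [8.0, 9.0, 10.0, 10.0, 11.0]

cv = coefficient_of_variation(hypothetical_lc50)
ratio = high_low_ratio(hypothetical_lc50)

print(f"CV = {cv:.1f}%")
print(f"high/low ratio = {ratio:.2f}")
```

Both statistics are unitless, which is what allows values from this test to be set beside those from Daphnia and other reference-toxicant studies despite differences in species, exposure time and concentration units.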

SUMMARY

The Swartz et al. (1985) amphipod sediment toxicity test, as performed in this study by five laboratories, was robust from several standpoints: acceptable survival and behavior (emergence and reburial) of controls, determination of the rank order of toxicity and agreement of mean values for survival and behavior. The fact that the test was not found to be robust in classifying sediments as toxic or non-toxic, except in the case of emergence, is probably due to the small intervals between mean survival of control and test material for several treatments. The results with the cadmium-amended sediments demonstrate that this test is extremely precise in lethal (survival) and sublethal (emergence, reburial) response parameters. Any laboratory that performs the amphipod sediment toxicity test according to the protocol of Swartz et al. (1985) should be able to rank the relative toxicity of a series of sediment samples with a high degree of confidence, provided that there is a reasonable gradient from highly toxic to relatively non-toxic. However, where samples are only marginally toxic (mean survival 76-87%), only the end point of emergence may accurately distinguish and rank samples.
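The marginal band of 76-87% survival rests on the independent data in Table 13. As a cross-check, the Python sketch below recomputes the per cent of sediments rated toxic in each survival bin from the tabulated counts and labels each bin accordingly. The names `TABLE_13`, `percent_toxic` and `label_bin` are illustrative, and rounding to whole per cent is an assumption made to match the table.

```python
# (mean survival range in %, sediments rated toxic, sediments rated non-toxic),
# transcribed from Table 13.
TABLE_13 = [
    ("0-75.5", 129, 0),
    ("76-79.5", 14, 11),
    ("80-84.5", 19, 20),
    ("85-87", 7, 18),
    ("87.5-100", 0, 147),
]


def percent_toxic(n_toxic: int, n_non_toxic: int) -> int:
    """Share of sediments in a bin rated toxic, rounded to whole per cent."""
    return round(100 * n_toxic / (n_toxic + n_non_toxic))


def label_bin(pct: int) -> str:
    """Describe how consistently a survival bin was classified."""
    if pct == 100:
        return "consistently toxic"
    if pct == 0:
        return "consistently non-toxic"
    return "marginal (classification varies)"


summary = []
for survival_range, n_toxic, n_non_toxic in TABLE_13:
    pct = percent_toxic(n_toxic, n_non_toxic)
    summary.append((survival_range, pct, label_bin(pct)))

total_tested = sum(n_toxic + n_non_toxic for _, n_toxic, n_non_toxic in TABLE_13)

for survival_range, pct, label in summary:
    print(f"{survival_range:>9s}%  {pct:3d}% toxic  {label}")
print(f"total sediments: {total_tested}")
```

The three middle bins are the only ones in which laboratories reached both verdicts, which is why survival alone cannot classify sediments in that range.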

ACKNOWLEDGEMENTS

The authors thank the following individuals for their assistance and encouragement: Howard Harris, Louis Butler, Bruce McCain, Edward Long, John Landall and Michael Buchman (NOAA); Gary O'Neal, Arnold Gahler and Donald Baumgartner (EPA), and John Lampe (Municipality of Metropolitan Seattle). Individual laboratory acknowledgements are as follows. E.V.S. Consultants: David Mitchell, Craig Barlow, Kim McKim and Laurie Mitchell; EPA (Newport): Faith Cole, Kathy Sercu, Don Schults, George Ditsworth, Janet Lamberson, Jill Jones and Paul Kemp; University of Washington: Fredericka Ott, Joan Miniken and Quentin Stober; EPA (Manchester): Carolyn Gangmark, Roy Arp, Michael Schlender and Bruce Woods; National Marine Fisheries Service: William Grondlund and Michael Schiewe. Word processing of the manuscript was done by Muriel Heatley (NOAA) and Marla Mees (E.V.S. Consultants). Valuable review comments on an early draft were provided by Carol Pesch, Don Reish, Tom Ginn, Les Williams and three anonymous referees. Bob Spies provided final editorial review of the paper.

REFERENCES

Chapman, P. M. & Fink, R. (1983). Additional marine sediment toxicity tests in connection with toxicant pretreatment planning studies. Report prepared for the Municipality of Metropolitan Seattle by E.V.S. Consultants, Seattle, Washington.
Davis, J. C. & Hoos, R. A. W. (1975). Use of sodium pentachlorophenate and dehydroabietic acid as reference toxicants for salmonid bioassays. J. Fish. Res. Bd. Canada, 32, 411-16.
Eisenhart, C., Hastay, M. W. & Wallace, W. A. (1947). Techniques of statistical analysis. Chapter 15. McGraw-Hill Book Co., NY.
Finney, D. J. (1971). Probit analysis. Cambridge University Press, Cambridge, UK.
Grothe, D. R. & Kimerle, R. A. (1985). Inter- and intralaboratory variability in Daphnia magna effluent toxicity test results. Environm. Toxicol. Chem., 4, 189-92.
Lewis, P. A. & Weber, C. I. (1985). A study of the reliability of Daphnia acute toxicity tests. In: Aquatic Toxicology and Hazard Assessment: Seventh Symposium (Cardwell, R. D., Purdy, R. & Bahner, R. C. (Eds)). ASTM STP 854, 73-86.
Long, E. R. (1984). Sediment bioassays: A summary of their use in Puget Sound. Unpublished manuscript, Seattle Project Office, US NOAA. 30 pp.
McIntyre, A. D. (1984). What happened to biological effects monitoring? Mar. Pollut. Bull., 15, 391-2.
Pesch, C. E. & Hoffman, G. L. (1983). Interlaboratory comparison of a 28-day toxicity test with the polychaete Neanthes arenaceodentata. In: Aquatic Toxicity and Hazard Assessment: Sixth Symposium (Bishop, W. E., Cardwell, R. D. & Heidolph, B. B. (Eds)). ASTM STP 802, 482-93.
Reish, D. J., Pesch, C. E., Gentile, J. H., Bellan, G. & Bellan-Santini, D. (1978). Interlaboratory calibration experiments using the polychaetous annelid Capitella capitata. Mar. Environm. Res., 1, 109-18.
Sokal, R. R. & Rohlf, F. J. (1969). Biometry. W. H. Freeman and Co., San Francisco. 776 pp.
Steel, R. G. D. & Torrie, J. H. (1960). Principles and procedures of statistics. McGraw-Hill Book Co., Inc., New York.
Swartz, R. C., DeBen, W. A., Phillips, J. K., Lamberson, J. O. & Cole, F. A. (1985). Phoxocephalid amphipod bioassay for marine sediment toxicity. In: Aquatic Toxicology and Hazard Assessment: Seventh Symposium (Cardwell, R. D., Purdy, R. & Bahner, R. C. (Eds)). ASTM STP 854, 284-307.
US EPA. (1984). Interim decision criteria for disposal of dredged material at Four-Mile Rock open-water disposal site. Environmental Protection Agency Region 10, June 1984. 18 pp.
US EPA. (1985). Interim decision criteria for unconfined disposal of dredged material at the Port Gardner open-water disposal site. Environmental Protection Agency Region 10, May 1985. 15 pp.
Vanhaecke, P. & Persoone, G. (1981). Report on an intercalibration exercise on a short-term standard toxicity test with Artemia nauplii (ARC-test). Inserm, 106, 359-76.