Applied Radiation and Isotopes 66 (2008) 1575–1581
Choice and criteria for selection of sampling strategies in environmental radioactivity monitoring

E.M. Scott a,*, P. Dixon b, G. Voigt c, W. Whicker d

a Department of Statistics, University of Glasgow, Glasgow G12 8QW, UK
b Department of Statistics, Cochran Hall, Iowa State University, Ames, IA, USA
c IAEA Seibersdorf Laboratories, Vienna, Austria
d Department of Environmental and Radiological Health Science, Colorado State University, Fort Collins, CO 80523, USA
Article info

Keywords: Statistical sampling strategies; Representative
Abstract

Environmental radioactivity monitoring requires a sampling strategy to be defined, adopted and delivered using sound scientific principles. Statistical sampling delivers a set of sampling units from the population that is representative of all sampling units that could be taken. Such a representative set can then be used to draw inference(s) and conclusion(s) about the population based upon a statistical model. The environmental knowledge of the context in which the sampling is to be carried out plays a vital role in determining the appropriate statistical sampling strategy.

© 2008 IAEA. Published by Elsevier Ltd. All rights reserved.
1. Introduction

Environmental radioactivity monitoring requires a sampling strategy to be defined, adopted and delivered. If statistical sampling is to be used, the nature of the population and its characteristics must be known. The concept of population is important: the population is the set of all items that could be sampled, while a sampling unit is a unique member of the population that can be selected as an individual sample for collection and measurement. Statistical sampling leads to a description of the sampled units from the population and to inference(s) and conclusion(s) about the population based upon a statistical model.

Intrinsic to the concept of population is stochastic variation. Variation in environmental media is an ever-present issue, in that soil or sediment samples taken side-by-side, or material taken from different parts of the same plant, or from different animals in the same environment, exhibit different activity concentrations of a given radionuclide. Individual sampling units are unique and the attribute of interest varies over the sampling units, following some statistical distribution, often called the sampling distribution. The set of sampling units is collectively called the sample. The observed distribution of values over the sampling units provides an estimate of the variability inherent in the population of sampling units that, theoretically, could be taken. An essential statistical concept is that taking a sufficient number of individual sampling units provides a true reflection of the underlying population and that, by the process of statistical sampling, the sampling variation can be quantified.

This variability is caused by natural variations in the processes that control radionuclide transport and uptake in the environment. As a result, radionuclide measurements in the sampling units will vary, and this variation is quantified as the sampling variation. It should be clearly distinguished from the measurement variation, which reflects the fact that all measurement is subject to uncertainty: every time that an analytical measurement is repeated under identical conditions on an identical sampling unit (even if this were possible), a different result would typically be obtained.

The use of valid statistical sampling techniques increases the chance that a set of specimens or sampling units (the sample, in the collective sense) is collected in a manner that is representative of the population. Representativeness of environmental samples is difficult to demonstrate and is usually considered justified by the procedure used to select the samples (Gilbert and Pulsipher, 2005).

The theoretical sampling strategy provides, as a minimum, a plan of what and where to measure, how many sampling units to collect, the time frame over which sampling units should be collected, and the sampling frequency. The resulting measurements made following the strategy should be ‘fit for purpose’, whether that be for emergency or regulatory management, or simply for compliance purposes. The sampling strategy designed and adopted must deliver the necessary confidence and power to allow the appropriate inferences to be drawn. Statistical sampling, however, is not simply a ‘recipe’-based approach: the environmental knowledge of the context in which the sampling is to be carried out plays a vital role in determining the sampling strategy.

* Corresponding author. Tel.: +44 141 330 4814. E-mail address: [email protected] (E.M. Scott).
0969-8043/$ - see front matter © 2008 IAEA. Published by Elsevier Ltd. All rights reserved. doi:10.1016/j.apradiso.2007.10.015
Environmental knowledge also plays a key role in the transformation of any theoretically designed sampling strategy, which may need to be adapted or modified to reflect unexpected circumstances in order to deliver the realised sampling strategy. In the following sections, a five-stage process is described and some case studies are used to illustrate how the principles and approaches described can be applied in a number of situations.
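The distinction drawn in this introduction between sampling variation and measurement variation can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the normal distributions, the numerical values (a mean of 50 with standard deviations of 10 and 2, in arbitrary activity units) and the function names are hypothetical and do not come from any data set discussed here. It repeats the measurement on each simulated sampling unit and separates the two variance components with a standard balanced one-way analysis.

```python
import random
import statistics


def simulate_measurements(n_units, n_repeats, true_mean,
                          sampling_sd, measurement_sd, seed=1):
    """Return one list of repeated measurements per sampling unit."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_units):
        # Unit-to-unit differences: the sampling variation.
        unit_value = rng.gauss(true_mean, sampling_sd)
        # Repeat-to-repeat differences on one unit: the measurement variation.
        repeats = [rng.gauss(unit_value, measurement_sd)
                   for _ in range(n_repeats)]
        data.append(repeats)
    return data


def variance_components(data):
    """Estimate (sampling variance, measurement variance) for a balanced design."""
    n_repeats = len(data[0])
    within = statistics.mean([statistics.variance(unit) for unit in data])
    unit_means = [statistics.mean(unit) for unit in data]
    # The variance of the unit means contains a share of the measurement variance.
    between = statistics.variance(unit_means) - within / n_repeats
    return max(between, 0.0), within


# Hypothetical values, chosen only to make the two components easy to tell apart.
data = simulate_measurements(n_units=200, n_repeats=3, true_mean=50.0,
                             sampling_sd=10.0, measurement_sd=2.0)
sampling_var, measurement_var = variance_components(data)
print(f"sampling variance ~ {sampling_var:.1f}, "
      f"measurement variance ~ {measurement_var:.1f}")
```

In this hypothetical setting the sampling component dominates, which is the situation the text describes as typical of environmental media.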
2. The process in designing a statistically based sampling scheme

The process of designing a statistically based sampling scheme is based on five stages, which are listed below:

Stage 1: Define the objectives
Stage 2: Summarise the environmental context
Stage 3: Identify the target population
Stage 4: Select an appropriate sampling design
Stage 5: Implement the sampling design and, after laboratory analysis, summarise the results.
Following the sequence of stages logically orders and brings together the necessary information to support the monitoring/sampling scheme, and documenting each stage provides important evidence, which can then be subjected to scrutiny under any quality assurance procedure.

2.1. Specifying the objectives

The first and potentially most important pre-experimental stage is to specify clearly the objectives that must be met. In the radioecological context, there are many possible objectives, including:

- describing a characteristic of interest (usually the average, but it could also be the variability or a high percentile), e.g. what is the average activity of 14C in foodstuffs,
- describing spatial patterns, including mapping the spatial distribution, such as has been done for Chernobyl fallout,
- quantifying contamination above a background or specified intervention level,
- detecting temporal or spatial trends,
- assessing environmental impacts of specific facilities, or of events such as accidental releases,
- demonstrating protection of the most exposed member of a population.
The reality is that often, an existing sampling programme is used to satisfy a number of different objectives (including reporting requirements under international treaties).
2.2. Using your scientific knowledge

Having specified objectives, the next stage in the design of the sampling programme is to utilise prior knowledge concerning the nuclides of interest and their environmental behaviour. This would include:

- the nature of the population, such as the physical or biological material of interest, its spatial extent, its temporal stability, and other important characteristics,
- the expected behaviour and environmental properties of the nuclide of interest in the population members,
- the sampling unit (i.e. individual sample or specimen),
- the expected pattern and magnitude of variability in the observations.

2.2.1. Examples

The population is the set of all items that could be sampled, such as all fish in a lake, all people living in the UK, all trees in a spatially defined forest, or all 20 g soil samples from a field. Appropriate specification of the population includes a description of its spatial extent and perhaps its temporal stability. In some cases, sampling units are discrete entities (e.g. animals, trees), but in others, the sampling unit might be investigator-defined and arbitrarily sized.

Example 1. Technetium in shellfish for radiological protection. The objective here is to provide a measure (the average) of technetium in shellfish (e.g. lobsters for human consumption) from the west coast of Scotland. The population is all lobsters on the west coast of Scotland and the sampling unit is an individual animal.

Example 2. Radiocarbon in cereal crops for impact assessment. Similarly here, the objective might be to provide an estimate of the baseline level of 14C in cereal crops remote from any anthropogenic disturbances. For this particular problem, definition of the population should include identification of the species and information on where and when it grew and its spatial context. The analysis requirements would then define how much material would need to be collected for each sampling unit. In terms of the temporal extent, it would be logical for the samples to be selected from a single growing season in a specific year, such as 2007. This results in a clear definition of the population (the chosen cereal crop in the selected location growing in a specific year) and of a sampling unit (a bulked sample of the cereal crop harvested from a specific location). Knowledge concerning the analytical procedure to be used would constrain the quantity of material, in this case cereal, to be sampled. A realistic figure might be 10 g; the population would therefore be all possible 10 g samples of cereal that could be collected, and a sampling unit would be a 10 g sample of cereal collected from a crop. Naturally, in this context, each sampling unit would also have a spatial dimension.

Example 3. Mapping 137Cs (Sanderson et al., 2004). In an emergency, the ability to rapidly map deposition of nuclides over a wide area could prove important in management of the situation and protection of the population. Therefore a large EU-funded project (ECCOMAGS) was begun to:

- assess comparability of the European aerial gamma spectrometry (AGS) systems for mapping contamination, and
- assess comparability of AGS with ground-based systems (including in-situ and soil samples).

An AGS system (with NaI detectors) mounted in a helicopter or fixed-wing craft might typically ‘see’ an area that corresponds to 100 m², an in-situ spectrometry system might see an area that corresponds to 10 m² and, finally, soil samples (taken using a coring tool) might ‘see’ an area corresponding to 10 cm². Each system in addition has a third dimension, namely depth. The spatial extent of the sampling unit is sometimes known as its support. Knowledge concerning the measuring and sampling devices immediately helps identify the population and sampling unit. The population of sampling units would be defined by the boundary of the area that is to be mapped and would be conceptually broken into sampling units of the measurement/sampling device extent.

Example 4. Sampling to detect and remove radioactive particles from a beach (DPAG, 2006). A final example concerns sampling with the objective to ‘detect and promptly remove’ radioactive particles on a beach in the north of Scotland close to the Dounreay nuclear site. Monitoring systems have included, in the early years, irregular strandline surveys, while a more recent monitoring system has used a series of NaI detectors mounted on a vehicle. This example is therefore similar to Example 3 (using in-situ spectrometry), but the objective is not to map a contamination field but rather to identify point sources spatially distributed on the beach. The temporal dimension of the sampling is challenging, in that the arrival or redistribution rate of particles on the beach is unknown. The current sampling is semi-continuous: a large extent of the beach is monitored each month (a survey taking several weeks to complete). Spatially, it is possible to consider the beach area as being divided into conceptual grid cells whose dimension is determined by the detector extent, the integration time and the vehicle speed.

3. Statistical sampling schemes

The basis of many sampling methods is probability sampling, where it is assumed that the population can be enumerated such that each member of the population has a known and non-zero probability of being selected. Under a probabilistic sampling strategy, estimators for the population characteristics of interest can be derived and their efficiency evaluated. There are many statistical sampling designs, some of which are described in ICRU (2006) as well as in classic statistical sampling textbooks including Cochran (1977), Thompson (2000), Thompson and Seber (1996) and Barnett (2002). Three of the most widely used schemes are described below.

3.1. Simple random sampling (SRS)

Suppose, as in Example 2, we wanted to estimate the baseline level (the average) of 14C in cereal crops remote from any anthropogenic disturbances. For SRS, every sampling unit in the population has an equal probability of being included in the sample. The first step requires complete enumeration of the population members. In the simple random sampling scheme, one generates a set of random numbers that are used to objectively identify the individuals to be sampled and measured. How can such a scheme be applied to the problem?

3.1.1. The sampling frame

In simple random sampling, one might assume a population of N units (e.g. N 100 cm² areas), and use simple random sampling to select n of these units. This typically involves generation of n random numbers between 1 and N, which identify the units to sample. If a number is repeated, then one would simply generate a replacement number. The actual sample (set of sampling units to be measured) is chosen by randomisation, using published tables of random numbers or computer algorithms. Selecting a probability sample is easy when the population can be enumerated.

As a simple example, imagine sampling ten 100 cm² areas from a specified geographic area for soil sampling. We could use a map to enumerate all individual grid cells. Suppose that the area comprised 56 such cells; then we could generate 10 random numbers lying between 1 and 56, such as 5, 17, 23, 25, 31, 33, 42, 45, 46, 51, to identify the 10 individuals. If the same number were generated more than once, then we would simply continue the process until we had 10 unique random numbers, and these would then identify the individual sampling units to be monitored. The actual numbers (5, 17, etc.) are read from random number tables or may be generated by statistical software.

In the context of the cereal crop, we would first define the population by identifying where cereal crops were grown in the remote area. We would then identify the sampling units by the amount of crop that would need to be harvested to make the measurement. Then, for a particular field, a scheme such as that identified in Table 1 could be used.

Table 1. The selection of a random sample of ten cells from a numbered grid (cells 5, 17, 23, 25, 31, 33, 42, 45, 46 and 51, as listed in the text). Selected cells are shown in bold.
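The selection procedure of Section 3.1.1 (draw numbers between 1 and N, discard repeats, stop at n unique numbers) can be written as a short sketch. The function name `simple_random_sample` and the seed value are hypothetical choices made for this illustration; the values N = 56 and n = 10 are those of the soil-sampling example. The cells drawn will generally differ from the ten listed in the text, since they depend on the random number generator.

```python
import random


def simple_random_sample(population_size, sample_size, seed=None):
    """Select sample_size distinct unit numbers from 1..population_size."""
    if not 0 < sample_size <= population_size:
        raise ValueError("sample_size must lie between 1 and population_size")
    rng = random.Random(seed)
    selected = set()
    while len(selected) < sample_size:
        # A repeated number adds nothing to the set, so another one is drawn.
        selected.add(rng.randint(1, population_size))
    return sorted(selected)


# Ten of the 56 enumerated grid cells; the seed is an arbitrary choice that
# only makes the illustration repeatable.
cells = simple_random_sample(population_size=56, sample_size=10, seed=2008)
print(cells)
```

In practice, the standard-library call `random.sample(range(1, 57), 10)` performs the same selection without replacement in one step; the explicit loop is shown only because it mirrors the procedure described in the text.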
3.2. Stratified random sampling

Suppose in another example, we wanted to estimate the inventory of 60Co in the sediments of an estuary whose boundaries have been clearly defined. We know that 60Co is particle reactive and we have a map of sediment type in the estuary, showing the location and area of sand, silt, and mud features. The radionuclide measurement requires a surface sample of area 10 cm², so that again we can conceptually define the population within the estuary as all 10 cm² areas. How would we make use of this information about the sediment type?

If we chose a simple random sampling plan, a random sample of locations within the estuary, at which a sediment sample would be taken and the 60Co measured, would be chosen from the target population of locations. Such a random sampling scheme might over-sample muddy locations and under-sample silty locations. Over-sampling simply means that more muddy locations were sampled than would be required based on their relative area in the estuary.

Where the population is divided into two or more strata that individually are more homogeneous than the entire population, a stratified sampling method is often used to estimate the properties of each stratum. Usually, the proportion of sample observations in each stratum is similar to the stratum proportion in the population. If there is knowledge of different strata over the sampling domain (such as sediment type), the use of a stratified sample would be recommended and a random sample of locations would be selected within each stratum. Using a stratified random sampling scheme, the population would still be defined as all 10 cm² sampling units, but the random selection of sampling units would be done within each of the strata (sand, silt, mud, etc.) and the number of sampling units selected in each stratum would be proportional to the area of that stratum within the estuary. Fig. 1 shows an example of stratified random sampling, in which a total of 50 sampling locations were selected in four different strata.

Fig. 1. Results of a stratified random sampling scheme (number of points = 50; axes: eastings, northings). Shaded areas represent the different strata; dots identify the sampled locations.

Any form of random sampling with a spatial context may prove to be impractical and inefficient. In the estuarine setting, the sampling is likely to be done from a boat, so manoeuvrability would argue for sampling in a straight line, possibly from one shore to the other, which leads to the third sampling scheme, systematic sampling (Table 2).

3.3. Systematic sampling

Using the same example, assume there are N (= nk) units in the population. Then, to sample n units, a first unit is selected for sampling at random and subsequent samples are taken at every kth unit thereafter. Systematic sampling has a number of advantages over simple random sampling, not least of which is convenience of collection. A systematic sample is also spread more evenly over the population. In a spatial context such as the sediment sampling problem, this would involve laying out a regular grid of points, which are fixed distances apart in both directions within a plane surface. Usually, for systematic sampling the region is considered as being overlaid by a grid (rectangular or otherwise), and sampling locations are at gridline intersections at a fixed distance apart in each of the two directions. The starting location should be randomly selected. Both the extent of the grid and the spacing between locations are important. The sampling grid should span the area of interest (the population). Table 2 provides a simple illustration of a systematic sample taken from a grid. In this setting a systematic sample would also produce sampling units drawn from the different strata in appropriate proportions.

Table 2. The selection of a systematic sample of 9 cells from a numbered grid (cell numbers shown include 6, 12, 18, 24, 30, 36, 42, 48 and 54). Selected cells are shown in bold.

Systematic samples are especially common for monitoring over time. Samples are frequently collected daily, monthly, quarterly, or annually. One particular disadvantage of systematic sampling arises when the underlying attribute has a cyclical pattern (e.g. a diurnal cycle, say for 7Be in air sampling, or a cycle in 137Cs due to ploughed furrows in a field): it would then be possible for a systematic sample to confound the sampling with the inherent cyclicity in the attribute of interest. This introduces bias when estimating the mean, but it may be desirable if the study goal concerns the most-exposed individuals (presuming that the sampling occurs at the peak of the cycle).

3.4. General spatial sampling

In several of the illustrative examples, there has been a strong spatial component. If we assume that the attribute is spatially continuous, and that in principle it is possible to measure the attribute at any location defined by coordinates (x, y) over the domain or area of interest, then a number of different sampling schemes (some specialised) have been developed. A motivating example would be mapping 137Cs over an area. This is a subject area where there has been much specialised development, but systematic sampling still remains one of the most popular sampling schemes in the spatial context, and we introduce two simple variants, quadrat and line transect sampling. Readers interested in geostatistical sampling techniques are referred to Webster and Oliver (2001).
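Systematic selection with a random start, and the caution about cyclical attributes, can both be illustrated with a short sketch. The function names, the seed, the choice N = 54 with n = 9 (so that k = 6) and the sinusoidal attribute are all assumptions made for this illustration only. The attribute is deliberately given a period equal to the sampling interval k, the worst case described in Section 3.3.

```python
import math
import random


def systematic_sample(population_size, sample_size, seed=None):
    """Select every k-th unit from 1..N after a random start, with N = n * k."""
    if sample_size <= 0 or population_size % sample_size != 0:
        raise ValueError("this sketch assumes N is an exact multiple of n")
    k = population_size // sample_size      # sampling interval
    rng = random.Random(seed)
    start = rng.randint(1, k)               # random start among the first k units
    return [start + i * k for i in range(sample_size)]


def cyclic_attribute(unit, period):
    """Hypothetical attribute with a cycle of the given period (arbitrary units)."""
    return 10.0 + 5.0 * math.sin(2.0 * math.pi * unit / period)


cells = systematic_sample(population_size=54, sample_size=9, seed=1)
interval = cells[1] - cells[0]

# The cycle length equals the sampling interval, so every selected unit
# falls at the same phase of the cycle.
population_mean = sum(cyclic_attribute(u, interval) for u in range(1, 55)) / 54
sample_values = [cyclic_attribute(u, interval) for u in cells]
sample_mean = sum(sample_values) / len(sample_values)

print("selected cells:", cells)
print("population mean:", round(population_mean, 3))
print("systematic sample mean:", round(sample_mean, 3))
```

Because all nine selected units share one phase of the cycle, the sampled values are identical and the sample mean reflects that phase rather than the population mean, unless the random start happens to fall where the cycle crosses its average.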
3.4.1. Quadrat and line transect sampling

A quadrat is a well-defined area within which one or more samples are taken. The position and orientation of the quadrat will be chosen as part of the sampling scheme. This is a popular approach in ecology, where it is commonly used in estimating abundance of particular species, but the approach is equally relevant in radioecology. A line transect is a straight line along which samples are taken, the starting point and orientation of which will be chosen as part of the sampling scheme. In addition, the number of samples to be collected along the transect, and their spacing, require definition (Fig. 2).

3.5. What can go wrong?

In any real-life situation, as we move from the theoretical to the practical implementation of the sampling scheme, problems may be encountered. In a sampling scheme, the most commonly encountered problem is that it is not possible to collect a sampling unit (e.g. no trees at the location, sediment bottom too rocky, etc.). The appropriate response to this problem depends on the study goal and target population. If the goal is to monitor accumulation in soft-bottom sediments, then rock and gravel samples should be ignored or replaced by a sample from a substitute location. If the goal is to estimate the inventory in a whole lake bottom, rock or gravel samples should be analysed for radionuclides or treated as zero values.
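The effect of the two responses described in Section 3.5 on an estimated mean can be seen in a small numerical sketch. The six values below are hypothetical (arbitrary activity units), with `None` marking locations where the bottom was too rocky to sample; the function names are likewise chosen only for this illustration.

```python
import statistics

# Hypothetical results; None marks a location with no collectable sediment.
measurements = [12.0, 8.5, None, 10.2, None, 9.3]


def soft_bottom_mean(values):
    """Goal: accumulation in soft-bottom sediments.

    Rocky locations lie outside the target population and are ignored.
    """
    return statistics.mean([v for v in values if v is not None])


def whole_lake_mean(values):
    """Goal: inventory over the whole lake bottom.

    Rocky locations remain in the target population and count as zero.
    """
    return statistics.mean([0.0 if v is None else v for v in values])


print("soft-bottom mean:", soft_bottom_mean(measurements))
print("whole-lake mean: ", whole_lake_mean(measurements))
```

The same field data thus yield two different, equally legitimate estimates, which is why the target population has to be fixed before deciding how to treat locations that cannot be sampled.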
Fig. 2. A transect sampling scheme (number of points = 44; axes: eastings, northings), where the transects are equally spaced over the region, but the sampling locations on each transect are randomly selected.
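A layout of the kind shown in Fig. 2 can be generated with a few lines of code. The sketch below is an assumption-laden illustration: the region size (100 by 50 units) is read from the axis ranges of the figure, while the split of the 44 points into 11 transects of 4 points each, the orientation of the transects and the function name are hypothetical, since the figure itself does not state them.

```python
import random


def transect_sample(x_max, y_max, n_transects, points_per_transect, seed=None):
    """Equally spaced transects running along the northing axis,
    with sampling locations placed at random along each transect."""
    rng = random.Random(seed)
    spacing = x_max / n_transects
    points = []
    for i in range(n_transects):
        # Each transect sits at the centre of an equal-width strip.
        easting = (i + 0.5) * spacing
        for _ in range(points_per_transect):
            northing = rng.uniform(0.0, y_max)   # random position on the line
            points.append((easting, northing))
    return points


# 11 transects x 4 locations = 44 points (an assumed split of the total).
points = transect_sample(x_max=100.0, y_max=50.0,
                         n_transects=11, points_per_transect=4, seed=3)
for easting, northing in points[:4]:
    print(f"easting {easting:6.2f}, northing {northing:6.2f}")
```

Fixing the transect positions keeps travel simple, while the random positions along each line retain an element of probability sampling.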
4. Criteria for choosing a design

A good monitoring programme provides a scientifically justifiable basis for conclusions about conditions or trends in a target population. All four statistical sampling designs discussed above provide that basis. The specific choice of design is influenced by statistical efficiency and practical considerations.

A more efficient sampling design provides a more precise estimate of the quantity of interest, e.g. the population mean, from the same number of samples. In general, the precision of the estimate depends on the sample size. When designs are compared at the same number of samples, spatial, stratified and systematic designs are generally more efficient than simple random sampling (Cochran, 1977). The details (e.g. how much more efficient) depend on the characteristics of the target population.

Implementing a monitoring programme requires choosing a sample size. More samples provide a more precise estimate, and the appropriate choice of sample size depends on the desired precision. The details of the computation of the standard error depend on the design; they are given in various sampling texts (e.g. Cochran, 1977; ICRU, 2006; Thompson, 2000). A formal approach for choosing a design and sample size for environmental decision making is the US Environmental Protection Agency’s Data Quality Objectives (DQO) process (EPA, 2006). The steps in the DQO process are very similar to the stages listed in Section 2, except that the DQO process includes an explicit statement of the required performance (i.e. the required precision for an estimate or the maximum error rates for a hypothesis test). The required performance is then used to determine the appropriate sample size. This approach has been incorporated into Visual Sample Plan, a public-domain software package (EPA, 2006).

Practical considerations determine how easily a design on paper can be implemented in the field. A simple random sample of locations can require considerable travel time to visit all locations. In many environments, systematic sampling and especially transect sampling minimise travel time and are much easier to implement.
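The statement that a stratified design can be more efficient than simple random sampling at the same sample size can be checked by simulation. The sketch below assumes a hypothetical estuary of 1000 sampling units in three strata (sand, silt and mud) whose sizes, mean activities and spread are invented for illustration; the function names are also hypothetical. It draws many samples of n = 20 under each design, using proportional allocation for the stratified design, and compares the spread of the resulting estimates of the mean, which approximates their standard errors.

```python
import random
import statistics


def build_strata(seed=0):
    """Hypothetical population: stratum name -> list of unit values."""
    rng = random.Random(seed)
    spec = {"sand": (500, 5.0), "silt": (300, 20.0), "mud": (200, 50.0)}
    return {name: [rng.gauss(mean, 3.0) for _ in range(size)]
            for name, (size, mean) in spec.items()}


def srs_mean(all_values, n, rng):
    """Mean of a simple random sample of n units from the whole population."""
    return statistics.mean(rng.sample(all_values, n))


def stratified_mean(strata, n, rng):
    """Proportional allocation: stratum h receives about n * N_h / N units;
    stratum means are combined with weights N_h / N."""
    total = sum(len(values) for values in strata.values())
    estimate = 0.0
    for values in strata.values():
        n_h = max(1, round(n * len(values) / total))
        estimate += len(values) / total * statistics.mean(rng.sample(values, n_h))
    return estimate


def compare_designs(n=20, replicates=2000, seed=42):
    """Return the empirical standard errors (SRS, stratified) of the mean."""
    strata = build_strata()
    all_values = [v for values in strata.values() for v in values]
    rng = random.Random(seed)
    srs = [srs_mean(all_values, n, rng) for _ in range(replicates)]
    strat = [stratified_mean(strata, n, rng) for _ in range(replicates)]
    return statistics.stdev(srs), statistics.stdev(strat)


se_srs, se_strat = compare_designs()
print(f"standard error, simple random sampling: {se_srs:.2f}")
print(f"standard error, stratified sampling:    {se_strat:.2f}")
```

The gain is large here only because the invented strata differ strongly in their means while being homogeneous internally; as the text notes, how much more efficient a design is depends on the characteristics of the target population.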
5. A detailed case study

Returning to Example 3 in Section 2.2, the overall aim of the ECCOMAGS exercise was to assess the comparability of the European AGS systems and to evaluate their comparability with ground-based systems. This EU- (FP5 Nuclear Safety) funded project provides an opportunity to compare more than three measurement techniques (differing in their spatial and temporal extent) in terms of sampling strategies. It is simplest to imagine a hierarchy of sampling based on the spatial support of the ‘sampling unit’.

The first level is soil sampling, with the results of the soil sampling forming the basis of ‘ground truthing’ of both in-situ and AGS measurements. In ECCOMAGS, three common areas were chosen for ground truthing of the AGS systems, and in each area, soil-sampling locations were selected at random. The objective of the ground sampling was (a) to produce a spatial average for the area and (b) to allow point-to-point comparison with spatially matched AGS measurements.

In-situ measurement uses the same technology as the AGS systems, but the detector is smaller (for portability) and the height above ground is typically of the order of a few centimetres, so the spatial field of view of the detector is much smaller. Three calibration sites were defined for in-situ measurement (which could also be used by the AGS systems). At each site, soil samples were collected, the objective again being to provide a representative, spatially averaged estimate of the 137Cs activity. However, the sampling scheme used for this part of the programme drew on detailed knowledge of the detector physics, and a distance-weighted sampling scheme was devised, as shown in Fig. 3 (Tyler et al., 1996).

An AGS survey follows a series of predefined flight lines, the spacing of which determines the spatial coverage. Thus the survey uses a systematic sampling scheme. The AGS detector sees a ‘field of view (ellipse)’ defined by the aircraft height and by the time over which an observation is recorded. The raw data generated are energy spectra, each accumulated over a short period of time (typically 1–2 s). An observation therefore has a spatial coverage of some hundreds of m², depending on a number of operational factors including the height and speed of the aircraft. The sampling scheme is therefore characterised by the flight line spacing and by the speed/integration period of the detector.

Some of the results from the AGS-in-situ and AGS-ground comparisons of activity per unit area are shown in the scatterplots of Fig. 4, with the line of equality included to aid interpretation. These show that there is very good agreement between the methods, with points scattered around the line of equality.
Fig. 3. Expanding hexagonal sampling pattern showing the radial numbers and shell designations for in-situ characterisation (Tyler et al., 1996).

Fig. 4. Air to in-situ and air to ground pointwise comparisons (Sanderson et al., 2004).

6. Conclusions

Sampling for radionuclides in the environment is carried out for many purposes, including estimation of certain characteristics, such as the activity concentration of a radionuclide (Bq kg⁻¹) in sediment, water, or biological tissue within a defined area, the areal activity density of radionuclide deposition (Bq m⁻²) in a given type of soil, etc. Many experimental and monitoring programmes have multiple objectives that must be clearly specified before the sampling programme is designed, because different purposes require different sampling strategies and sampling intensities in order to be efficient and to permit general inferences. Statistical sampling is pertinent and necessary in radioecology because of the natural stochastic variation that occurs in all environmental media, and the fact that this variation is usually much larger than variations associated with measurement uncertainties.

In general, sampling for radionuclides in the environment is not unlike sampling for other attributes of environmental media, or other types of survey sampling. The principles discussed here are elaborated in textbooks and papers about environmental sampling and in more general sampling textbooks, where additional details can be found.
References

Barnett, V.B., 2002. Sample Survey. Principles and Methods. Arnold, London.
Cochran, W.G., 1977. Sampling Techniques, third ed. Wiley, New York.
Dounreay Particle Advisory Group (DPAG), 2006. Third Report. Scottish Environment Protection Agency, Stirling, UK.
Gilbert, R.O., Pulsipher, B.A., 2005. Role of sampling designs in obtaining representative data. Environ. Forensics 6, 27–33.
ICRU Report 75, 2006. Sampling for radionuclides in the environment. J. ICRU 6 (1).
Sanderson, D.C.W., Cresswell, A., Scott, E.M., Lang, J.J., 2004. Demonstrating the European capability for airborne gamma spectrometry: results from the ECCOMAGS exercise. Radiat. Prot. Dosim. 109 (1–2), 119–125.
Thompson, S.K., 2000. Sampling. Wiley, New York, 341pp.
Thompson, S.K., Seber, G.A.F., 1996. Adaptive Sampling. Wiley, New York, 265pp.
US Environmental Protection Agency, 2006. Guidance on Systematic Planning Using the Data Quality Objectives Process, QA/G-4. US EPA, Washington DC, EPA/240/B-06/001.
Tyler, A.N., Sanderson, D.C.W., Allyson, J.D., Scott, E.M., 1996. Accounting for spatial variability and fields of view in environmental gamma-ray spectrometry. J. Environ. Radioactiv. 33 (3), 213–235.
Webster, R., Oliver, M.A., 2001. Geostatistics for Environmental Scientists. Wiley Statistics in Practice Series. Wiley, New York.