Reliability Engineering and System Safety 42 (1993) 271-291
Deriving parameter probability density functions
M. E. Stephens, B. W. Goodwin & T. H. Andres
Environmental and Safety Assessment Branch, AECL Research, Whiteshell Laboratories, Pinawa, Manitoba, Canada R0E 1L0
The long-term performance of a nuclear fuel waste disposal system is typically studied by modelling potential releases of contaminants to the environment and the consequent health risk to humans. To deal with uncertainty in the estimated consequences of the releases, the behaviour of the disposal system may be repeatedly simulated under the direction of a probabilistic assessment code. In each simulation, parameters in the system model take different values to reflect the uncertainty about their values in the real system. The selection of the possible values of the parameters is governed by a probability density function (PDF) for each parameter. This paper examines the question of how to derive parameter PDFs for such assessments. The paper includes: 1. A definition of the concept of probability, and the mathematical properties of PDFs and the related cumulative distribution functions (CDFs). 2. A discussion of the influence of the context in which PDFs are derived on their form and interpretation. Key elements of the context include what is known about the disposal system, the structure and extent of the model description of the system, and how the results of the assessment will be used. 3. A description of approaches to deriving parameter PDFs. In cases where data are unavailable or incomplete, reliance must be placed on the judgement of experts to generate subjective probability distributions. Examples are given of PDFs defined for recent Canadian assessments. The paper concludes with remarks on managing reliably the large body of PDF data for an assessment, how values are sampled from PDFs, and desirable future developments.
© 1993 Canadian Government.
1 INTRODUCTION
The long-term performance of nuclear fuel waste disposal systems is typically studied by modelling potential releases of waste materials from containment in a repository or vault, their movement back through the geosphere, the resulting distribution of radiological and chemical contamination in the biosphere, and the consequent health risk to humans. For this purpose, the performance of a nuclear fuel waste disposal system may be defined as:
how well the system protects human health and the environment from potential releases of radionuclides and toxic substances.
Associated with a model description of a system are various uncertainties, that is, possible differences between the characteristics of the model and those of the real system itself.¹ Recognition is growing of the importance of characterizing and analysing these uncertainties in system assessments and policy analysis in general.² Uncertainties enter an assessment for a variety of reasons, and at several stages:³
1. Creating a conceptual model of the real system. Are the important features of the system known? Should one assess more than one scenario, that is, more than one combination of possible system characteristics and sequences of events? Could the processes occurring in the system be described by different models which could lead to significantly different conclusions?
2. Expressing the conceptual model in mathematical terms, devising numerical solutions to the mathematical equations, and coping with the limits of accuracy of machine computational methods. What will be the impact of limitations on the scope of the model description, and of different simplifications needed to define tractable mathematical equations? For example, what is the effect of using different methods and degrees of discretization of a finite element mesh to describe a continuous flow field? How different would the results be if the model code was written using different mathematical libraries on different computers?
3. Deciding on the most appropriate values of the model parameters to use. How different would be the best-estimate values provided by different experts? What would different experts give as the range of possible values? How would their estimates change if they were asked for pessimistic versus realistic values?
Substantial expert judgment has always been required in performing all these stages, and attempts are now being made to formalize its use.⁴ Expert judgment is also required in deciding how far to continue efforts to reduce uncertainty, and in recognizing and allowing for irreducible uncertainty when applying the results of the assessment.
Probabilistic assessment codes provide a means of capturing and assimilating expert judgment about the appropriate values of model parameters to use. These codes direct repeated simulation of the evolution of the disposal system. Several such codes have been developed to assess radioactive waste disposal systems.⁵ The models used in these codes are usually constructed to be as realistic as possible while remaining mathematically tractable. If any bias is intended, it is in the sense that would lead to overpredicting the estimated consequences such as radiation doses or concentrations of contaminants in environmental compartments.
Parameters in the models are allowed to take on different values from one simulation to another to reflect the uncertainty about their values in the real system. A probability density function (PDF) for each parameter relates the possible values of the parameter to probabilities that they will be observed in the real system. One value of the parameter is sampled from the PDF to be used in each simulation. The results from the set of simulations are used to generate statistics on the distribution of the consequences of interest, typically the estimated radiological dose to humans, and chemical contamination of parts of the biosphere.⁶ The distribution of doses and the associated probabilities can be used to define a mean consequence, commonly called the risk from the system. The risk thus encapsulates in one number estimates of both the possible consequences and the corresponding probabilities, consistent with all the information incorporated in the models and data.⁷ The risk is analogous to an expected profit or rate of return one might calculate before making a financial decision.
The justifications for the PDFs used determine how the distribution of results of the calculations should be interpreted and how they can properly be applied. Thus PDFs are a key component of model-based calculations for a performance assessment. However, capturing the uncertainty in the parameter values does not obviate the need to recognize and deal with the uncertainties in the earlier stages of model abstraction from the real system.
This paper focuses on the task of deriving parameter PDFs from available information. The body of the paper consists of three sections, which take up:
1. the meaning of 'probability' and the mathematical properties of PDFs and related cumulative distribution functions (CDFs);
2. the influence of the context in which PDFs are derived on their form and interpretation, that is, what is known about the disposal system, the structure and extent of the model description of the system, and how the results of the assessment will be used; and
3. approaches to deriving PDFs when data are available, and when reliance must be placed on expert judgment.
Examples are given of PDFs defined for recent Canadian assessments. The paper concludes with remarks on managing reliably the large body of PDF data for an assessment, how values are sampled from PDFs, and desirable future developments.
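The repeated-simulation procedure described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the structure of any actual assessment code: the model `toy_dose`, its two parameters, and the lognormal and uniform PDFs assigned to them are all hypothetical.

```python
import math
import random
import statistics


def toy_dose(release_rate, retardation):
    """HYPOTHETICAL stand-in for a system model; returns a dose in arbitrary units."""
    return release_rate / retardation


def sample_parameters(rng):
    """Draw one value of each parameter from its (assumed) PDF."""
    return {
        # Assumed lognormal PDF: geometric mean 1, geometric standard deviation 2.
        "release_rate": rng.lognormvariate(math.log(1.0), math.log(2.0)),
        # Assumed uniform PDF between 1 and 10.
        "retardation": rng.uniform(1.0, 10.0),
    }


def run_assessment(n_simulations, seed=0):
    """Run the model once per sampled parameter set and collect the consequences."""
    rng = random.Random(seed)
    return [toy_dose(**sample_parameters(rng)) for _ in range(n_simulations)]


doses = run_assessment(10_000)

# The mean consequence over all simulations plays the role of the 'risk'.
risk = statistics.fmean(doses)
# quantiles(..., n=20) returns 19 cut points; the last one is the 95th percentile.
p95 = statistics.quantiles(doses, n=20)[-1]

print(f"mean consequence (risk): {risk:.3f}")
print(f"95th percentile of dose: {p95:.3f}")
```

Fixing the seed makes a set of simulations reproducible, which helps when the distribution of results has to be re-examined later.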
2 THE MEANING OF 'PROBABILITY' AND THE MATHEMATICAL PROPERTIES OF PDFs AND CDFs
We suggest the following definition for probability:⁸
The degree of belief, held by the person(s) providing the information, that a quantitative system characteristic will have a value in a particular interval under specified conditions of measurement.
In probabilistic performance assessments, the probability associated with a value of a parameter corresponds to the relative frequency with which randomly sampled values would lie in different intervals of the allowed range of values, in the limit as the number of samples goes to infinity. The probability that the observed value of the parameter is less than a particular value is specified by its cumulative distribution function (CDF). The probability of observing a value of the parameter in different possible intervals is governed by the corresponding probability density function (PDF).
A parameter X is continuous if it can assume a value x equal to any of a continuum of values. The cumulative distribution function (CDF) of X, F(x), defined over the sample space −∞ to +∞, gives the probability P that X will have a value less than or equal to x, that is:
$$F(x) = P[X \le x]$$
If F is a continuous and smooth function, the corresponding probability density function (PDF) of X, f(x), is defined as the derivative of the CDF:
$$f(x) = \frac{dF(x)}{dx}$$
The value of the CDF for X = x is equal to the integral from −∞ to x of the PDF of X:
$$F(x) = \int_{-\infty}^{x} f(x')\,dx'$$
The probability that X has a value in the interval between a and b is given by:
$$P[a < X \le b] = \int_{a}^{b} f(x)\,dx = F(b) - F(a)$$
Because F(x) corresponds to an increasing probability, it must increase monotonically and range from 0 to 1 as x increases from its lowest to highest possible value.² Correspondingly, the integral of the PDF of X from −∞ to +∞ is 1:
$$P[-\infty \le X \le +\infty] = \int_{-\infty}^{+\infty} f(x)\,dx = 1$$
Figure 1 shows a plot of a typical CDF for a continuous variable; Fig. 2 shows the corresponding PDF. The parameter value corresponding to a given fraction of the cumulative probability (that is, a given value of the CDF) is termed a quantile, percentile or fractile. For example, for a normally distributed parameter, the value of the parameter equal to the mean is the 0.5 quantile; the value corresponding to the mean plus one standard deviation is the 0.84 quantile (approximately) of its CDF.
Fig. 1. A typical cumulative probability distribution (CDF). The CDF gives the probability that the parameter has a value less than a particular value. In this example, the probability that the parameter has a value less than 1 is 0.5. [Plot not reproduced; horizontal axis: Parameter Value [units], 0 to 5; vertical axis: 0.0 to 1.0.]
Fig. 2. The probability density function (PDF) corresponding to the CDF in Fig. 1. The PDF is lognormal, with a geometric mean (GM) of 1 and a geometric standard deviation (GSD) of 2. The integral of the PDF between two values gives the probability that the parameter has a value between those values. [Plot not reproduced; horizontal axis: Parameter Value [units], 0 to 5.]
A parameter X is discrete if it can only assume a value x selected from a set of individually distinct values. Discrete parameters are commonly used as switches in assessment codes to select between distinct possibilities. Examples are the number of persons in a critical group of human receptors, whether domestic water comes from a well or a surface water body, and which of several types of soil occurs on farmland. The cumulative distribution function for a discrete parameter X, F(x), defined over a sample space $U = \{x_i\}_{i=1,\dots,n}$, gives the probability that X has a value less than or equal to a given value x:
$$F(x) = P[X \le x]$$
Whereas X is defined on a discrete set, F(x) has a value at all x. The CDF of X has the following property:
$$F(x_i) = \sum_{j=1}^{i} f(x_j)$$
The probability function of X, f(x), gives the probability that X has its different possible values:
$$f(x_i) = P[X = x_i]$$
Because F(x) corresponds to an increasing probability, it must increase monotonically and range from 0 to 1 as x increases from its lowest to highest possible value. Correspondingly, the sum of the probabilities that X has each of its possible values must equal 1:
$$\sum_{i=1}^{n} f(x_i) = 1$$
As is the case for a continuous parameter, the CDF gives the probability that X assumes a value less than or equal to x.
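The relations in this section can be checked numerically. The sketch below uses the lognormal distribution of Figs 1 and 2 (GM = 1, GSD = 2) and compares F(b) − F(a) with a direct integration of the PDF. The function names, the trapezoidal integrator, and the three-valued discrete example are illustrative assumptions and are not taken from any assessment code.

```python
import math
from itertools import accumulate


def normal_cdf(z):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def lognormal_cdf(x, gm=1.0, gsd=2.0):
    """F(x) for a lognormal parameter with geometric mean gm and GSD gsd."""
    if x <= 0.0:
        return 0.0
    return normal_cdf((math.log(x) - math.log(gm)) / math.log(gsd))


def lognormal_pdf(x, gm=1.0, gsd=2.0):
    """f(x) = dF/dx for the same lognormal parameter."""
    if x <= 0.0:
        return 0.0
    s = math.log(gsd)
    z = (math.log(x) - math.log(gm)) / s
    return math.exp(-0.5 * z * z) / (x * s * math.sqrt(2.0 * math.pi))


def integrate(f, a, b, n=10_000):
    """Composite trapezoidal rule on [a, b] with n equal steps."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + k * h) for k in range(1, n))
    return total * h


def discrete_cdf(probabilities):
    """F(x_i) as the running sum of f(x_j) for j = 1..i."""
    return list(accumulate(probabilities))


# The median of this lognormal is its geometric mean, so F(1) = 0.5 (Fig. 1 caption).
print(lognormal_cdf(1.0))

# P[a < X <= b] computed two ways: from the CDF and by integrating the PDF.
a, b = 0.5, 2.0
by_cdf = lognormal_cdf(b) - lognormal_cdf(a)
by_integral = integrate(lognormal_pdf, a, b)
print(round(by_cdf, 4), round(by_integral, 4))

# Mean plus one standard deviation of a normal parameter: about the 0.84 quantile.
print(round(normal_cdf(1.0), 2))

# Discrete parameter with three hypothetical values and probabilities.
print(discrete_cdf([0.5, 0.3, 0.2]))
```

The last entry of the discrete CDF should equal 1 to within rounding, which is a convenient sanity check on any set of option probabilities.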
3 THE INFLUENCE OF THE CONTEXT IN WHICH PDFs ARE DERIVED ON THEIR FORM AND INTERPRETATION
The PDFs chosen for a performance assessment are strongly influenced by what is known about the disposal system, the structure and extent of the model description of the system, and how the results of the assessment will be used.
3.1 What is known about the system
Uncertainties from several sources must be incorporated into PDFs for the parameters to be used in model descriptions of a disposal system. These sources include, for example:
3.1.1 Limited information on the repository design
Assessments are commonly performed before the disposal facility is constructed. There may be differences between the design of the intended system and the actual facility eventually constructed. Even when an existing facility is being assessed, there are limits to the detail and accuracy of information that can be obtained about the facility and its surroundings, and how they will change in the future. Examples are long-term changes in climate, topography of the location, and dispersion mechanisms and paths.
3.1.2 Limitations in describing real world systems
It is generally not possible to describe a real system exhaustively. As the size and complexity of the system increase, there is a limit to the level of detailed description and modelling that is practical. The model of a disposal system may thus represent only selected characteristics of its behaviour. The importance of the omitted characteristics may not yet be well recognized. It is possible to imagine that a more exhaustive
assessment might yield different results, and therein lies the uncertainty. For instance, concurrent physical and chemical processes may have complex, poorly characterized interactions among them. A case in point is the long-term rate of nuclide movement in soil, which is affected by many simultaneous processes, including leaching, capillary rise, runoff and evaporation. Different experts might characterize such processes differently, giving rise to uncertainty in model results.
As well, the disposal system may be affected by unpredictable perturbations. For example, earthquakes may change the pathways in the geosphere that are available to contaminants released from a disposal repository.
Some processes may exhibit variability, that is, fluctuations in time or space. For instance, annual rainfall changes with climate changes. One cannot anticipate accurately what rainfall a site will receive, and so uncertainty arises about environmental impacts that are strongly rainfall-dependent. In modelling such impacts, the PDFs selected to describe rainfall should reflect the extent of the uncertainty.
The net effect of concurrent processes and variability may be modelled through the use of effective or lumped parameters, which estimate the overall or average rate of the detailed processes. Such simplifications introduce uncertainty in the accuracy of the description of local effects, and in the appropriate values of the effective parameters. The uncertainty arises because the abstraction process that defines effective parameters could be carried out in different ways, possibly leading to different results. The effective parameter approach may also have been adopted because of insufficient data on local values, for instance, permeability in a volume of rock. Alternatively, an effective parameter may be considered adequate for the needs of the assessment.
For example, it may be sufficiently accurate to estimate the anticipated rate at which waste containers corrode and fail at different places in the repository as a function of the mean value of the distribution of temperatures throughout the repository. Further, the mechanism(s) governing the phenomena may only be partially understood. For example, many sorption or sorption-like processes are approximated by a linear model (that is, the amount of a substance sorbed is assumed to be proportional to its concentration in groundwater). With care, non-linear sorption isotherms can be approximated using the linear Kd approach. 9 Where a modeller is uncertain about the best way to model a process, one option is to use one model and adjust parameter distributions so that results would span those produced by alternative models. Two guidelines apply. First, the full range of behaviour of all feasible models should be covered. Second,
parameter distributions should be adjusted to favour more pessimistic results, to deter criticism that adjustments to distributions are made to produce 'acceptable' results. Great caution must be used in broadening or skewing a PDF for this purpose. It may not be easy to determine which parameter values will give higher consequences. For instance, changing the degree of retardation of different nuclides in a decay chain during transport may lead to either higher or lower estimated doses from the whole chain. An explicit justification should always be recorded. Allowing for model uncertainty by adjusting a PDF is NOT appropriate if the model would exhibit a fundamentally different form of behaviour from that of the real phenomenon.
3.1.3 Generalizing from research data
There may be measurement error in the data due to human error or undetected instrument malfunction. Or there may be inherent ambiguities in relating the value of the parameter actually measured to the value of the parameter of interest. For example, it is very difficult to obtain meaningful values for the in situ redox potential of groundwater by direct measurement. Uncertainty is also introduced when data are extrapolated to estimate parameter values outside the range of calibration, over much longer periods of time, or for different conditions. For instance, corrosion rates for metals are measured in short-term experiments. These rates must be projected to cover the very long times of an assessment, which leads to uncertainty in estimates of the rate at which waste containers will be perforated. Similarly, measurements of sorption properties of buffer clay materials in controlled laboratory conditions must be interpreted to predict buffer behaviour when it is exposed to real groundwater, and to repository heat and pressure. Uncertainty arises because extrapolated trends are not tightly constrained, and could take a variety of forms.
3.1.4 Human behaviour
Unpredictable human choices in the far future, such as agricultural practices, new lifestyles and housing conditions, will determine who the most exposed humans will be, and how they will be exposed. Uncertainty about credible future human behaviour can be quite large.
3.1.5 Mathematical approximations in setting up and solving model equations
Approximations and simplifications may be made in describing the system to arrive at a mathematically tractable set of model equations. For example, an actinide nuclide decay chain may be shortened to include only its main members. Approximate numerical methods, for instance finite difference techniques, may be required to solve the resulting equations when analytic solutions are not feasible. Uncertainty arises if the approximations and simplifications lead to significantly different results from the original model.
For high-level waste disposal systems, the impact of uncertainty from these five sources is compounded by the need to assess performance for hundreds or thousands of years.
3.2 The structure and extent of the model description of the system
The more extensive and complex the system is, the greater may be the uncertainty in the parameters used in the models. A model description of a disposal system may be structured in different ways, and the description may extend to greater or lesser extent into the surrounding environment. The challenge lies in establishing how precise a description is necessary for the purposes of the assessment, and whether it is feasible to achieve this level of precision given the information available. Exhaustive quantitative simulation of system behaviour is neither necessary nor possible.
To decide on the structure and extent of a model description required, one needs a systematic way of identifying and combining the factors (features, events and processes in the system and its surroundings) that can affect system performance. A scenario for a waste disposal system is a combination of system factors describing a complete mechanism and pathway by which radioactive or toxic materials may be released from an engineered containment, be transported through the geosphere to the biosphere, and come into contact with humans.¹⁰ Factors can be natural (for example groundwater flow) or man-induced (human intrusion into the repository). A scenario can include concurrent factors (simultaneous diffusion and advection of nuclides through rock). Different scenarios can have several factors in common.
Significant discrete events, such as human intrusion into the repository, can instead be modelled separately in their own scenario. Methods to define and analyse scenarios are currently under development.¹¹ The context provided by each scenario determines the range and probability of allowed values of parameters in the system model used to describe the scenario.
PDFs used with a model must also be compatible with the extent of the system description, particularly how tightly the system is considered to be coupled to its surrounding environment. For instance, a near-surface, low-level waste repository in a clay bed near a sea coast may well be affected by erosion, sea-level and climate changes. Such processes would be of much less significance to a deep geological repository for high-level waste located in the middle of a continent. On the other hand, either repository may be significantly affected by long-term weather cycles through changes in regional flow patterns of groundwater that could disperse any released waste material.
3.3 How the results of the assessment will be used
A performance assessment may be performed for a variety of purposes. The PDFs should be chosen accordingly. For example, assessments may be performed:
3.3.1 To scope and explore major alternatives offered by a generic disposal concept
An assessment may need to allow for greater variability and uncertainty if one is evaluating a generic proposition (for instance, high-level radioactive waste in metal containers buried at some unspecified depth in a geological formation) than if one is evaluating a specific design (used CANDU™ fuel in titanium containers emplaced in tunnels of a particular geometry at a depth of 1000 m in granitic rock with particular properties). The PDFs used may include all the alternatives to be studied because it is not certain which of them will be selected. Examples might be representative sites in different geological settings, several container materials, or several repository geometries. Alternatively, the alternatives could be studied in separate sets of calculations. The PDFs used in each set of calculations would be chosen to represent the remaining uncertainty consistent with the lone alternative.
3.3.2 To identify research needed to reduce uncertainty about a system characteristic to an acceptable level
The aim in this situation is to decide which parameters in a particular system design should be studied to achieve the most useful reduction in uncertainty. For example, the PDFs describing possible physical and chemical characteristics of an important barrier layer of high-quality rock may incorporate ranges of values to investigate what values are required to slow waste transport in groundwater to an acceptable rate. Even design specifications such as wall thickness of waste containers could be subject to uncertainty when sensitivity analysis is performed, and PDFs should be adjusted accordingly.
3.3.3 To optimize disposal system design
PDFs for a controllable parameter of uncertain importance, for instance, container wall thickness, may be varied to study what parameter values are best suited to meet safety and other performance criteria.
3.3.4 To compare system behaviour to risk and safety criteria set by the regulatory authorities, to obtain approval of a generic disposal concept or to license a particular facility
The regulator may prescribe in detail the manner in which the calculations are to be performed (for example, Refs 12 and 13). For instance, the regulator may set a minimum period over which mathematical models must be used to estimate health impacts from the disposal system. PDFs used to describe chemical properties of buffer clay, which may slowly evolve over time, would then have to be consistent with possible parameter values over the period set by the regulator. Another example would be the case where the regulator has specified that doses are to be calculated with respect to a defined receptor. Uncertainty about what kind of person would live near the disposal site thousands of years in the future would then not be a factor in the calculation.
4 APPROACHES TO DERIVING PROBABILITY DISTRIBUTIONS
Each parameter in a model may have a unique combination of sources of uncertainty associated with it. Therefore it is neither practical nor possible to set down an explicit, step-by-step, all-inclusive procedure for deriving a set of PDFs for an assessment. Even so, progress has recently been made in systematizing the process of deriving CDFs and PDFs for performance assessments.⁴,¹⁴ For example, Fig. 3 shows the five-step procedure used by Tierney to construct CDFs for a recent assessment of the Waste Isolation Pilot Plant (WIPP) in the USA.¹⁴
Guidance to data contributors may take the form of a suggested series of stages to follow, for example:
(a) become familiar with relevant information on the repository design and site characteristics;
(b) decide whether and how the PDF is to be used to select options;
(c) decide whether and how the PDF is to allow for model error and uncertainties;
(d) decide whether and how the PDF is to incorporate any dependence on another parameter;
(e) select a PDF type for the parameter; and
(f) quantify the attributes of the PDF.
Each of these stages is discussed below. Illustrative examples from recent Canadian assessments are briefly described. More detailed information on the examples may be found in Refs 15 and 16.
Fig. 3. Five-step procedure used to construct CDFs for a recent assessment of WIPP. RI refers to responsible investigator (that is, subject-matter expert); MEF refers to the Maximum Entropy Formalism.¹⁴ [Flow chart not reproduced. Recoverable box labels: "For each variable X, solicit information about X from RI in the following manner"; "Analyst constructs either an empirical CDF or a piecewise-linear CDF from data"; "Analyst constructs MEF distribution that is appropriate to kind of subjective estimate provided by RI"; "Analyst uses distribution suggested by RI". Drawing number TRI-6342-634-0.]
4.1 Become familiar with the repository design and site characteristics
Data contributors must be familiar with all available information about the design of the disposal system and the geological characteristics of the site(s) to be assumed. They must have a clear idea of how firm knowledge is about the disposal system, what the structure and extent of the model description of the system are, and how the results of the assessment will be used. It may seem almost superfluous to make this point, but our experience shows that specific effort is essential to be sure data contributors do understand the problem being assessed.
4.2 Decide whether and how the PDF is to be used to select options
Some elements of a model may constitute alternatives or options, such as whether a future population practises irrigation. Each option may be studied in a separate set of assessment calculations. To calculate an overall risk estimate, the results of the studies must then be combined using the probability of each option. Alternatively, the options can be combined into one set of simulations by introducing a decision variable. For each simulation a value is sampled for the decision variable, determining which of the options will be selected. The PDF for the decision variable gives the probabilities of each option.
For example, in current studies with the SYVAC3-CC3 (SYstems Variability Assessment Code, Generation 3 – Canadian Disposal Concept, Generation 3) code, a decision variable is used to select between a well and a lake as the source of drinking water for a critical group. The PDF for the source is a piecewise uniform distribution over the two zero-width intervals [1, 1] and [2, 2], which were assigned equal weights of 0.5, corresponding to expert opinion on the probability that future individual households will use a well or lake.
Another example from SYVAC3-CC3 is the type of soil in the fields from which the critical group obtains its food. A decision variable is used to select among four soil types commonly found on the Canadian Shield. The PDF for the decision variable has four zero-width intervals [1, 1], [2, 2], [3, 3], [4, 4], with weights of 0.57, 0.05, 0.24, and 0.14. The weights correspond to the areas on the Shield currently covered by the four soil types (sand, loam, clay and organic), adjusted according to expert judgment to account for the suitability of the different types for farming.
4.3 Decide whether and how the PDF is to allow for model error and uncertainties
The models and parameters describing a system are highly interdependent. As a consequence, in some instances parameter PDFs may be used to account for the error or imprecision resulting from modelling choices.
For example, the peak value of the PDF for a sorption coefficient may be skewed to lower values. This would bias the distribution of results towards shorter transport times, hence earlier, higher doses from a less-dispersed pulse of released radionuclide. This might compensate for possible inaccuracy of the linear sorption model. While this approach can be used to increase the weight of higher consequences in an average consequence, justification must be provided for doing this instead of using a more detailed model. As well, as noted in Section 3.1.2, it may be difficult to determine which bias will lead to higher consequences.
PDFs can be used to introduce small, random variations into model results by adding a residual error parameter in the defining equations.¹⁷ For example, the SYVAC3-CC3 geosphere model uses a distribution coefficient, Kd, to quantify the extent of sorption of an element on a mineral. The value used for Kd is calculated from an equation fitted to results from laboratory measurements, multiplied by a random error factor expressing the uncertainty in the fitted equation. The error factor is based upon a random number sampled from a lognormal PDF with geometric mean 1.0 and geometric standard deviation [illegible], limited to the interval 0.1-10.0 (± about 3σ of the distribution). The discussion in Section 4.5 on the lognormal distribution explains the meaning of geometric mean and geometric standard deviation.
Uncertainty as to which of several possible concurrent processes or pathways dominates can be dealt with in several ways. Individual simulations can be run with detailed research models of the different mechanisms to establish which of them dominates under what conditions. An example is whether the release of radionuclides from a waste form is kinetically controlled or solubility controlled. Both processes can then be included in the system model for use when the appropriate conditions arise.
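The residual error factor applied to Kd, described earlier in this section, can be illustrated with a short sampling sketch. Several things here are assumptions rather than facts from the text: the geometric standard deviation is taken as 10^(1/3), chosen only because it makes the interval 0.1-10.0 correspond to about three geometric standard deviations either side of the geometric mean; `KD_FROM_FIT` is a hypothetical placeholder for the value given by the fitted equation; and rejection sampling is simply one convenient way to impose the limits, not necessarily the method used in SYVAC3-CC3.

```python
import math
import random

GM = 1.0                      # geometric mean of the error factor (from the text)
GSD = 10.0 ** (1.0 / 3.0)     # ASSUMPTION: GSD**3 == 10, so the limits are about +/- 3 sigma
LOWER, UPPER = 0.1, 10.0      # limits of the error factor (from the text)
KD_FROM_FIT = 2.5             # HYPOTHETICAL value from the fitted equation, arbitrary units


def sample_error_factor(rng):
    """Draw from the lognormal PDF, rejecting values outside [LOWER, UPPER]."""
    while True:
        value = rng.lognormvariate(math.log(GM), math.log(GSD))
        if LOWER <= value <= UPPER:
            return value


def sample_kd(rng):
    """Fitted Kd multiplied by a random error factor."""
    return KD_FROM_FIT * sample_error_factor(rng)


rng = random.Random(42)
kd_samples = [sample_kd(rng) for _ in range(5)]
print([round(kd, 3) for kd in kd_samples])
```

Because the limits sit about three standard deviations out in log space, very few draws are rejected, so the truncation changes the shape of the distribution only slightly.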
Comparison calculations could identify which alternative produces the most unfavourable result, and a decision-making algorithm could be included in the assessment code to systematically select the higher consequence alternative, for instance, the most contaminated of several possible sources driving an exposure pathway. Another option is to include each of the alternatives in the system model, for example, parallel exposure pathways in the geosphere and biosphere, and sum their effects.

PDFs may be used in the final assessment code to describe uncertainty about the characteristics of the competing pathways. For instance, details of groundwater movement at an assumed site in a Canadian assessment were modelled using the MOTIF groundwater flow research code.18 The predicted groundwater velocity fields were used to define a network of flow segments which summarized the essential characteristics of the pathways through the successive layers of rock in the SYVAC3-CC3 geosphere model. The PDFs used to set the hydraulic properties of the segments (permeability, rock sorption characteristics, and so on) introduced uncertainty in the description of flow due to use of the summary model.

4.4 Decide whether and how the PDF is to incorporate any dependence on another parameter

Sampled parameters may be correlated to (that is, tend to depend on, or follow the value of) one or more independent parameters. The degree of the tendency is often measured by a correlation coefficient. Correlation coefficients always lie between -1 and +1, and take one or other extreme value if and only if the two parameters are exactly linearly related. If the parameters are independent, then the correlation coefficient equals 0. Ignoring correlations may lead to physically impossible simulations. Figure 4 gives examples of data correlated to various positive and negative degrees.

As mentioned in the discussion of normal and correlated normal distribution types in Section 4.5 below, a set of values of a dependent parameter can be correlated by a simple expression to a set of values of another independent parameter if both parameters
Fig. 4. Illustrations of data exhibiting various degrees of positive and negative correlation. Indicative values of the correlation coefficient, C, are shown below each plot; the panels range from strong negative correlation (C = -1.0) through C = -0.8, -0.4, 0.4 and 0.8 to strong positive correlation (C = 1.0).
are represented by untruncated normal or lognormal PDFs, and if an appropriate correlation coefficient is available. For example, soil partition coefficients and soil-to-plant concentration ratios are correlated in this way in the SYVAC3-CC3 biosphere model.19

Strong dependence is better treated through a functional relationship than by separately sampling the parameters and coupling them by a correlation coefficient.20 For example, in SYVAC3-CC3 solubility and sorption coefficients for five key elements are represented as functions of basic chemical parameters such as salinity and electrochemical potential. The latter are treated as the randomly sampled 'independent' parameters. The coefficients are then intercorrelated via their dependence on the basic parameters.21

4.5 Select a PDF type for the parameter
Table 1 shows eight types of PDF recognized by the SYVAC3 executive code.7 The available types are typical of many such assessment codes. Table 1 gives, for each PDF type: sample plots of the PDF type and the corresponding CDF; the attributes (that is, coefficients) that must be specified to define the PDF completely; the mode, median, mean and standard deviation of the PDF defined in terms of the attributes; comments on the shape of the PDF; typical characteristics of parameters for which the PDF may be used; and examples of the use of the PDF.

If required, PDFs can be truncated, that is, the parameter values can be limited to lie between specified boundaries. If a PDF is truncated, the remaining part of the PDF must be rescaled so that it integrates to 1, as mentioned above in the definition of a PDF. The rescaling is handled automatically by SYVAC3.

The eight PDF types are:

(i) Constant

A constant PDF implies the belief that parameter X can have one, and only one, precisely known value. A constant PDF is thus a point estimate of the value of the parameter; the standard deviation is zero. Strictly speaking, the constant distribution should be used only when the value of the parameter is known with great accuracy, or when variations in its value have negligible effect on the system description.

It is sometimes convenient to use a constant PDF to set the value of a well-defined constant property of the system, the initial amount of waste in the repository for instance. This simplifies changing the value later, for sensitivity analysis, for example. The constant distribution may also find use in replacing another, broader distribution for a parameter when it becomes possible to fix its value with great confidence.
(ii) Uniform

A uniform PDF implies that all parameter values in an allowed range are considered equally probable, and all values outside the range are impossible. The uniform distribution can be justified on the basis of Laplace's principle of insufficient reason: if available evidence rules out favouring any of the possibilities, then equal probability should be assigned to each. Use of the uniform distribution may lead to a relatively wide spread in model results in comparison with calculations using one of the peaked types of distribution with limited range (for instance, truncated normal or triangular).

The uniform PDF may suggest itself if no information is available on the relative probabilities of the different possible values, but bounding values are known with confidence. There should be good physical reasons for the discontinuities in the PDF at the endpoints. If the range is also ill-defined, another type of distribution should be considered instead.

(iii) Piecewise uniform

The piecewise uniform distribution assigns different probabilities to parameter values lying in each of several non-overlapping intervals. This distribution is very flexible because it can include intervals over which f(x) = 0. This distribution can be used to approximate any continuous distribution expressed in the form of a histogram. It can approximate data poorly fitted by the other available PDF types. Finally, the piecewise uniform distribution can fit a discrete distribution, or a mixed (part continuous, part discrete) distribution. As discussed previously in Section 4.2, it can be used to assign probabilities to decision variables selecting among options.

(iv) Loguniform

A loguniform PDF assigns equal probability to all values of the logarithm of the parameter value. Values of the parameter must be positive and lie between two bounding values. Lower parameter values are more probable.
The loguniform distribution may be suitable if, for instance, little information is available on the relative probabilities of different values of the parameter, except that it is positive and measured on a log scale. There should be good physical reasons for the discontinuities in the PDF.

Table 1. Types of probability density functions (PDFs) recognized by SYVAC3. The sample PDF and CDF plots are not reproduced.

Constant
  PDF: $f(x) = \delta(x - a)$ (Dirac delta function) if $x$ is continuous; $f(x) = 1$ for $x = a$ and 0 elsewhere if $x$ is discrete
  Attributes/constraints: constant value of X: $a$
  Mode: $a$
  Median: $a$
  Mean: $a$
  Standard deviation: 0
  Comments on shape of PDF: only one value is allowed
  Used for parameters: with a well-known, fixed value
  Example: radionuclide decay constants

Uniform
  PDF: $f(x) = 1/(b - a)$ for $a \le x < b$; 0 otherwise
  Attributes/constraints: lower bound of X: $a$; upper bound of X: $b$
  Mode: not unique
  Median: $(a + b)/2$
  Mean: $(a + b)/2$
  Standard deviation: $(b - a)/\sqrt{12}$
  Comments on shape of PDF: all allowed values are equally probable
  Used for parameters: for which few data are available, but firm bounds are known
  Example: rate constant for UO2 dissolution

Piecewise uniform
  PDF: $f(x) = w_i\,\delta(x - a_i)$ if $a_i = b_i$; $f(x) = w_i/(b_i - a_i)$ if $a_i \le x < b_i$; 0 otherwise
  Attributes/constraints: number of non-overlapping intervals where $f(x) > 0$: $n$; for each interval $i$: lower and upper bounds of X: $a_i$ and $b_i$ ($a_i \le b_i$), and weight of X values in $i$: $w_i$ ($w_i > 0$), where $\sum_i w_i = 1$
  Mode: may not be unique
  Median: no simple formula
  Mean: $\frac{1}{2}\sum_i w_i (b_i + a_i)$
  Standard deviation: no simple formula
  Comments on shape of PDF: can approximate almost any functional shape
  Used for parameters: for which any of the other PDF types are not suitable, or with two or more disjoint sets of possible values, or which are to be represented by a histogram
  Example: probability of food coming from a field lying on different types of soil

Loguniform
  PDF: $f(x) = 1/(x \ln(b/a))$ for $0 < a \le x < b$; 0 otherwise
  Attributes/constraints: lower bound of X: $a$; upper bound of X: $b$
  Mode: $a$
  Median: $\sqrt{ab}$
  Mean: $(b - a)/\ln(b/a)$
  Standard deviation: $\sqrt{\dfrac{b^2 - a^2}{2\ln(b/a)} - \left(\dfrac{b - a}{\ln(b/a)}\right)^2}$
  Comments on shape of PDF: $\ln(x)$ is uniformly distributed between $\ln(a)$ and $\ln(b)$; lower values of X are more likely to be selected; all allowed decades of values are equally probable
  Used for parameters: which are intrinsically positive, with limits which have been estimated; relative weights may be poorly known, but lower values are considered more likely
  Example: permeability of backfill in repository

Normal/Correlated normal
  PDF: $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\left(-\dfrac{(x - \mu)^2}{2\sigma^2}\right)$
  Attributes/constraints: mean value of X: $\mu$; standard deviation of X: $\sigma$; if correlated to Y, correlation coefficient: $C$; if truncated, whether by value or quantile, and the bounds: $a$ and $b$
  Mode, median and mean (normal): $\mu$
  Standard deviation (normal): $\sigma$
  Mode, median and mean (correlated normal): $\mu'_x$, where $\mu'_x = \mu_x + (y - \mu_y)\,C\,\sigma_x/\sigma_y$
  Standard deviation (correlated normal): $\sigma'_x = \sigma_x\sqrt{1 - C^2}$
  Comments on shape of PDF: about 68.3% of randomly sampled values will be between $\mu - \sigma$ and $\mu + \sigma$, about 95.5% between $\mu - 2\sigma$ and $\mu + 2\sigma$, and about 99.7% between $\mu - 3\sigma$ and $\mu + 3\sigma$; can be correlated to another normal or lognormal PDF
  Used for parameters: which are the sum of values or variables; may be correlated to an independent normal or lognormal PDF
  Example: mean corrosion rate of waste containers; animals' water ingestion rate (correlated to animals' feed ingestion rate)

Lognormal/Correlated lognormal
  PDF: $f(x) = \dfrac{1}{x\sqrt{2\pi[\ln(GSD)]^2}} \exp\left(-\dfrac{[\ln(x) - \ln(GM)]^2}{2[\ln(GSD)]^2}\right)$ for $x > 0$; 0 for $x \le 0$
  Attributes/constraints: geometric mean of X: GM; geometric standard deviation of X: GSD (see Figure 5); if correlated to Y, correlation coefficient: $C$; if truncated, whether by value or quantile, and the bounds: $a$ and $b$
  Mode (lognormal): $GM/GV$, where $GV = e^{\sigma^2}$
  Median (lognormal): $GM$
  Mean (lognormal): $GM\sqrt{GV}$
  Standard deviation (lognormal): $GM\sqrt{GV(GV - 1)}$
  Mode (correlated lognormal): $GM'/GV'$, where $GM' = e^{\mu'}$ and $GV' = e^{\sigma'^2}$
  Median (correlated lognormal): $GM'$
  Mean (correlated lognormal): $GM'\sqrt{GV'}$
  Standard deviation (correlated lognormal): $GM'\sqrt{GV'(GV' - 1)}$
  Comments on shape of PDF: $\ln(x)$ is normally distributed with mean $\mu$ and standard deviation $\sigma$ (Figure 5); about 68.3% of randomly sampled values will lie between $GM/GSD$ and $GM \cdot GSD$, about 95.5% between $GM/GSD^2$ and $GM \cdot GSD^2$, and about 99.7% between $GM/GSD^3$ and $GM \cdot GSD^3$; can be correlated to another normal or lognormal PDF
  Used for parameters: which are the product of values or variables; may be correlated to an independent normal or lognormal PDF
  Example: depth of a domestic well; distribution coefficients for elements on soil (correlated to plant/soil concentration ratio)

Triangular
  PDF: $f(x) = \dfrac{2(x - a)}{(b - a)(c - a)}$ for $a \le x < c$; $f(x) = \dfrac{2(b - x)}{(b - a)(b - c)}$ for $c \le x < b$; 0 otherwise
  Attributes/constraints: lower bound of X: $a$; upper bound of X: $b$; most likely value of X: $c$, where $a \le c \le b$
  Mode: $c$
  Median: $a + \sqrt{(c - a)(b - a)/2}$ if $c > (a + b)/2$; $c$ if $c = (a + b)/2$; $b - \sqrt{(b - c)(b - a)/2}$ if $c < (a + b)/2$
  Mean: $(a + b + c)/3$
  Standard deviation: $\sqrt{(a^2 + b^2 + c^2 - ab - ac - bc)/18}$
  Comments on shape of PDF: the bulk of the values can be skewed towards either limit by shifting the mode
  Used for parameters: for which upper and lower bounds and the most likely value have been estimated
  Example: pH of groundwater in repository

Beta
  PDF: $f(x) = \dfrac{x'^{\,\alpha_1 - 1}(1 - x')^{\alpha_2 - 1}}{(b - a)\,B(\alpha_1, \alpha_2)}$ for $a \le x < b$; 0 otherwise, where $x' = (x - a)/(b - a)$ and $B(\alpha_1, \alpha_2) = \int_0^1 t^{\alpha_1 - 1}(1 - t)^{\alpha_2 - 1}\,dt$ (see Figure 6)
  Attributes/constraints: lower bound of X: $a$; upper bound of X: $b$; shape parameters: $\alpha_1$ and $\alpha_2$, where $\alpha_1 > 0$ and $\alpha_2 > 0$
  Mode: $a + \dfrac{(\alpha_1 - 1)(b - a)}{\alpha_1 + \alpha_2 - 2}$ if $\alpha_1 > 1$ and $\alpha_2 > 1$; $a$ if $\alpha_1 < 1$ and $\alpha_2 \ge 1$, or if $\alpha_1 = 1$ and $\alpha_2 > 1$; $a$ and $b$ if $\alpha_1 < 1$ and $\alpha_2 < 1$; $b$ if $\alpha_1 \ge 1$ and $\alpha_2 < 1$, or if $\alpha_1 > 1$ and $\alpha_2 = 1$; no unique mode if $\alpha_1 = \alpha_2 = 1$
  Median: $a + (b - a)\,I^{-1}_{0.5}(\alpha_1, \alpha_2)$, where $I_x(\alpha_1, \alpha_2)$ is the incomplete beta function and $I^{-1}_{0.5}$ is the value of $x$ at which it equals 0.5
  Mean: $a + (b - a)\,\dfrac{\alpha_1}{\alpha_1 + \alpha_2}$
  Standard deviation: $(b - a)\,\dfrac{\sqrt{\alpha_1 \alpha_2}}{(\alpha_1 + \alpha_2)\sqrt{\alpha_1 + \alpha_2 + 1}}$
  Comments on shape of PDF: Figure 6 gives examples with different values of $\alpha_1$ and $\alpha_2$
  Used for parameters: for which upper and lower bounds and the most likely values have been estimated; requiring a very flexible function (see Figure 6)
  Example: factor used to calculate the redox potential of water in buffer material from the redox potential of water in contact with used fuel

(v) Normal and correlated normal

The normal (or Gaussian) distribution is unbounded and symmetric with respect to its mean value, which can be any real number. A normally distributed variable X, with mean $\mu$ and standard deviation $\sigma$, has a probability of 0.683 (approximately) of lying between $\mu - \sigma$ and $\mu + \sigma$, and a probability of 0.955 (approximately) of lying between $\mu - 2\sigma$ and $\mu + 2\sigma$. Because the normal distribution is unbounded, it must often be truncated to limit values to physically possible ranges.

Parameters whose values reflect the sum of the effects of several independent variables (each not necessarily normal) often follow the normal distribution. The goodness of the fit depends on the characteristics of the contributing distributions, and improves as the number of variables increases. These properties follow from the Central Limit Theorem.22 The normal distribution is suitable for parameters representing the sum of the effects of unmodelled contributing processes. An example might be human food consumption rate.

As discussed above in Section 4.4, a parameter X may be correlated to an independent parameter Y. If both X and Y are distributed normally, a correlation coefficient C can be introduced that will adjust the sampled X values for a set of simulations to allow for the correlation to Y.23 That is:
$$x = \mu_x + (y - \mu_y)\,C\,\sigma_x/\sigma_y + z\,\sigma_x\sqrt{1 - C^2}$$

where
$x$ = the sampled value of X, adjusted for correlation to Y
$\mu_x$ = the specified mean for sampled values of X
$\sigma_x$ = the specified standard deviation for sampled values of X
$y$ = the sampled value of Y
$\mu_y$ = the specified mean for sampled values of Y
$\sigma_y$ = the specified standard deviation for sampled values of Y
$C$ = the specified degree of correlation between X and Y
$z$ = a random number, sampled from a normal distribution with mean $\mu = 0$ and standard deviation $\sigma = 1$.
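As a minimal sketch of the adjustment above, the following Python fragment applies the equation to paired samples and checks that the sample correlation comes out close to C. The function names, the seed and the numerical attributes are illustrative assumptions, not values or routines taken from SYVAC3.

```python
import math
import random


def sample_correlated_normal(mu_x, sigma_x, y, mu_y, sigma_y, c, rng):
    """Sample X given the sampled value y of Y, using the equation above."""
    z = rng.gauss(0.0, 1.0)  # standard normal deviate, independent of y
    shift = (y - mu_y) * c * sigma_x / sigma_y      # part of X that follows Y
    scatter = z * sigma_x * math.sqrt(1.0 - c * c)  # residual random part
    return mu_x + shift + scatter


def sample_correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical attributes, chosen only for illustration.
MU_X, SIGMA_X = 5.0, 2.0
MU_Y, SIGMA_Y = 10.0, 3.0
C = 0.8

rng = random.Random(1)  # fixed seed so that the run is repeatable
ys = [rng.gauss(MU_Y, SIGMA_Y) for _ in range(20000)]
xs = [sample_correlated_normal(MU_X, SIGMA_X, y, MU_Y, SIGMA_Y, C, rng) for y in ys]
print("sample correlation:", round(sample_correlation(xs, ys), 3))
```

With C = 1 or C = -1 the scatter term vanishes and X becomes an exact linear function of Y, consistent with the remark in Section 4.4 that the extreme values of the correlation coefficient correspond to an exact linear relation.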
This technique may be used to cause groups of parameters to exhibit a common correlation. For instance, in SYVAC3-CC3, the PDFs for diffusion coefficients and sorption capacities of elements on buffer and backfill are all correlated to one of four dummy parameters, respectively for cations, anions, actinides and tracer elements. Values of each dummy parameter are selected from a normal PDF for that class of species. Since the PDFs for all elements of each class are correlated to a common parameter, they will all tend to have higher or lower values together in a given simulation.

(vi) Lognormal and correlated lognormal

The lognormal distribution is used when the logarithm of the parameter value is normally distributed. Figure 5 illustrates the relationship between the PDFs for the natural log of X and for X itself. The lognormal distribution is skewed toward lower values. X cannot have a value less than zero.
Fig. 5. Characteristics of the lognormal distribution. The figure shows the PDF of $x$, with the mode ($GM/GV$), median ($GM$) and mean ($GM\sqrt{GV}$) marked together with $GM/GSD$ and $GM \cdot GSD$, and the corresponding PDF of $\log(x)$. $GM = e^{\mu}$, where $\mu$ can be estimated from $n$ data points $\{x_i\}$ by

$$\mu = \frac{1}{n}\sum_{i=1}^{n} \ln x_i$$

$GSD = e^{\sigma}$ and $GV = e^{\sigma^2}$, where $\sigma$ can be estimated from $n$ data points $\{x_i\}$ by

$$\sigma = \sqrt{\sum_{i=1}^{n} (\ln x_i - \mu)^2/(n - 1)}$$
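The estimators given with Fig. 5 can be sketched in a few lines of Python. The function name and the three data values are hypothetical, chosen so that the result is easy to check by hand; the mode and mean use the expressions $GM/GV$ and $GM\sqrt{GV}$ listed in Table 1.

```python
import math


def lognormal_attributes(data):
    """Estimate GM, GSD and GV from positive data using moments of ln(x)."""
    logs = [math.log(x) for x in data]  # math.log raises ValueError if x <= 0
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / (n - 1))
    return math.exp(mu), math.exp(sigma), math.exp(sigma ** 2)


# Hypothetical measurements spanning two decades.
data = [1.0, 10.0, 100.0]
gm, gsd, gv = lognormal_attributes(data)

mode = gm / gv
median = gm
mean = gm * math.sqrt(gv)
print(f"GM = {gm:.3g}, GSD = {gsd:.3g}")
print(f"about 68.3% of values expected between {gm / gsd:.3g} and {gm * gsd:.3g}")
print(f"mode = {mode:.3g}, median = {median:.3g}, mean = {mean:.3g}")
```

The ordering mode < median < mean that results reflects the skew of the lognormal distribution toward lower values.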
The lognormal distribution is suitable for an experimentally measured quantity that is the product of several other independent measured quantities, such as the output from a sequence of dilutions or geochemical separations. The logarithms of the quantities will then be additive, and the Central Limit Theorem will apply to them.

The lognormal distribution for a parameter X can be interpreted by analogy to the corresponding normal distribution for ln X. A lognormally distributed variable X, with geometric mean GM and geometric standard deviation GSD, has a probability of 0.683 (approximately) of lying between $GM/GSD$ and $GM \cdot GSD$, and a probability of 0.955 (approximately) of lying between $GM/GSD^2$ and $GM \cdot GSD^2$.

(vii) Triangular

The triangular distribution is bounded and has a single mode. A triangular distribution may be a suitable choice in cases where upper and lower limits on the possible values of X are known with confidence, and the most probable value (mode) of X can be estimated. Use of the triangular PDF also makes it clear that great detail is not claimed for the characteristics of the distribution.
(viii) Beta

The beta distribution can take on a variety of shapes (Fig. 6). Parameter values are limited to some well-defined range of values. The beta distribution must be used with care because it is defined in terms of two quantities, $\alpha_1$ and $\alpha_2$, which are related to the mean, mode, median and standard deviation of the distribution in non-trivial ways.

Pertinent experimental data may provide valuable indicators of the appropriate type of PDF for each parameter, and help in making first estimates of the attributes of the distributions. However, conditions in which field and laboratory measurements have been taken may not correspond to the anticipated conditions in the far future covered by the assessment. It may also be difficult to judge what the future conditions are likely to be. For either reason, a PDF that was derived directly from measurements may have to be modified to allow for the different conditions.

If many data points are available, a histogram of the data can be used directly as a piecewise uniform distribution. Well-developed techniques are available
Fig. 6. Sample beta distribution functions showing the effect of changing shape parameters $\alpha_1$ and $\alpha_2$. The panels cover $\alpha_1$ = 0.5, 1, 2 and 3 and $\alpha_2$ = 0.5, 1, 2 and 3.
to fit the data directly to a standard mathematical distribution type or to define an empirical distribution. For example, Fig. 7 shows a lognormal PDF that was derived for the depth of wells that might be drilled in the vicinity of a hypothetical repository. Data on the depth of 89 existing domestic wells in the study area were grouped into the histogram shown in the figure, and then fitted to the lognormal distribution that gave the best value of a selection function measuring the match between the data and lognormal-type PDFs.24 An upper bound of 200 m could be placed on the possible well depth because of the spacing and orientation of fracture zones at the site from which wells could draw water.
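Truncation at a physical bound, such as the 200 m limit on well depth, can be sketched as inverse-CDF sampling restricted to the retained part of the distribution, which is equivalent to rescaling the truncated PDF so that it integrates to 1. The geometric mean, geometric standard deviation and lower bound below are hypothetical placeholders (the fitted attributes of Fig. 7 are not used), and the approach shown is one common way to truncate, not necessarily the algorithm used in SYVAC3.

```python
import math
import random
from statistics import NormalDist


def sample_truncated_lognormal(gm, gsd, lower, upper, rng):
    """Draw one value from a lognormal PDF truncated to [lower, upper]."""
    log_dist = NormalDist(mu=math.log(gm), sigma=math.log(gsd))
    p_lo = log_dist.cdf(math.log(lower))  # probability removed below the lower bound
    p_hi = log_dist.cdf(math.log(upper))  # cumulative probability at the upper bound
    u = 0.0
    while not 0.0 < u < 1.0:  # inv_cdf is defined only on the open interval (0, 1)
        u = p_lo + rng.random() * (p_hi - p_lo)
    value = math.exp(log_dist.inv_cdf(u))
    return min(max(value, lower), upper)  # guard against rounding at the bounds


# Hypothetical attributes; only the 200 m upper bound comes from the text.
GM_DEPTH, GSD_DEPTH = 40.0, 2.5
LOWER, UPPER = 1.0, 200.0

rng = random.Random(7)  # fixed seed so that the run is repeatable
depths = [sample_truncated_lognormal(GM_DEPTH, GSD_DEPTH, LOWER, UPPER, rng)
          for _ in range(5000)]
print("shallowest and deepest sampled wells (m):",
      round(min(depths), 1), round(max(depths), 1))
```

Sampling a uniform number between the two cumulative probabilities, rather than rejecting out-of-range draws, keeps the cost per sample fixed even when the bounds remove a large part of the distribution.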
Another example is shown in Fig. 8. The parameter involved relates the concentration of a contaminant in the atmosphere to its concentration in the surface soil below.25 The histogram in Fig. 8 represents annual average dust loading data from 149 locations on the Canadian Shield over 14 years. A lognormal PDF
Fig. 7. Use of a selection function to fit data on the depth of domestic wells at a Canadian site to a lognormal PDF.16,24 The original depth data from 89 local wells are shown as a histogram. The lognormal curve fitted to the data is also plotted, along with its attributes.

Fig. 8. Distributions of atmospheric dust-load parameter ADL.25 The histogram represents annual mean data over 14 years from 149 Canadian sites. The lognormal distribution labelled 'atmosphere submodel' was derived from it for the assessment code. The curve has the shape of the normal distribution type because the log of ADL is plotted along the abscissa (compare to Fig. 5). The distribution labelled 'biosphere 2' was used in an earlier atmosphere model, in which ADL values were derived from daily means at a generic 'dusty' site. The distribution labelled 'Atikokan' is based on daily mean measurements of ADL over eight years at one site. This distribution is broader, but has a lower mean value than the distribution of annual mean data from 149 sites.
fitted to these data was used in calculations to estimate dust loading at a randomly selected site on the Shield. Also shown in the figure is a PDF for dust loading that was generated from daily average data collected at one site near Atikokan over an eight-year period. The distribution calculated from daily data for one site has a lower mean value, but a broader range, than the annual data for many sites. If the dust-loading parameter is important to the model in which it is used, the distribution of consequences, such as dose to an individual from air inhalation, will have a different mean and broader range of possible outcomes if the daily average data from the one site are used. This underscores the importance of matching the PDF derived from actual data to its intended interpretation in the assessment. The PDF labelled 'biosphere 2' was used in an earlier model of the dust loading process. This distribution was derived from daily means at a generic 'dusty' site.

More often than not, directly relevant data are sparse. Projection or analogy may be invoked to assign probabilities. For instance, annual total precipitation may exhibit a distribution over a future assessment period of a few centuries similar to its distribution in the recorded past. In the same vein, it might be argued that the sorption properties of strontium will be similar to those of calcium because both elements are in the same group of the periodic table of the elements.

Formal methods are being developed to estimate probabilities on the basis of the judgment of one or more experts. One technique, probability encoding, involves first using experts' responses to generate points on the CDF for the parameter; the CDF is then translated into the corresponding PDF.8 For instance, to define a probability, p, that a sorption coefficient is less than a value x, the expert would be asked whether he/she prefers either:

(i) entering a lottery with a probability p of winning a sum of money; or
(ii) winning the same sum of money if the sorption coefficient has a value less than x.

The value of p for which the person is indifferent as to which of the two wagers he/she makes gives a point on the CDF for the parameter. Repeating the exercise several times gives the shape of the CDF representing the expert's judgment. The derivative of the curve is the corresponding PDF.

Specific precautions must be taken to avoid well-known biases that tend to affect estimation of probabilities:

(i) motivational biases: wanting to please the assessment team, to appear to be the expert, or to avoid appearing pessimistic; and
(ii) cognitive biases: focusing on recent rather than all information, making comparisons with respect to a single individual piece of information, filtering out information that doesn't match expectation, believing or considering more probable events for which a complete scenario can be envisaged, or making unrecognized assumptions.

To limit the impact of the biases of any individual expert, one may collect and seek to reconcile the opinions of several experts. In a recent study,26 eight experts were asked to generate probability distributions for hydraulic conductivity and porosity in basalt at the Hanford Site in Washington State. The results are shown in Fig. 9.27 There were clearly differences in the experts' views. Methods exist to resolve such differences by having the experts exchange estimates and seek consensus.8,28

However, probability encoding is an expensive process. Experts must be chosen carefully and should include someone familiar with the practicalities of defining and measuring the quantity, as well as a person familiar with how the parameter will be used and interpreted in the assessment model. Finally, encoding should not be used as a substitute for research that could provide direct evidence.26

Similarly, methods exist to update a probability distribution when new information becomes available.28,17 To apply these methods it must be
Fig. 9. CDF plots for the effective porosity of basalt at the Hanford Reservation in Washington State. The plots represent the independent judgements of eight experts, and were generated using the probability encoding method.27 Cumulative probability (0 to 100) is plotted against effective porosity (10^-7 to 10^0, logarithmic scale).
possible to make a reasonable initial guess at the distribution.

Whether or not a formal process is used to elicit probability information, the following sorts of questions may arise about the significance of the available data when one is selecting an appropriate PDF type for a parameter:

(i) What is the primary source of uncertainty in the data? For instance, do the data come from replicated experiments, or from experiments executed under a variety of conditions? The distribution of data from repeated experiments may correspond to experimental error; data from experiments under varying conditions may incorporate uncertainties from other sources as well. This may influence the shape of the PDF. For instance, if the dominant uncertainty is experimental error in repeated measurements, the normal distribution may be appropriate.

(ii) Were at least some of the measurements made under conditions near the anticipated extremes of possible conditions in the disposal system? If no such measurements are available, can firm limiting values for the parameter PDF be set for other reasons? For example, experiments on radionuclide sorption as a function of water chemistry and temperature may not cover the complete range of expected conditions in a repository. It may still be possible to estimate bounds for the possible values on the basis of observations on natural analogs. Pertinent well-established physical arguments may help in setting firm lower and upper limits on the possible values of the parameter. For example, porosity must lie between 0 and 1. As more information about a parameter accumulates, uncertainty about its possible values may decrease. It may then be possible to update the PDF by decreasing the allowed range of parameter values and establishing more pronounced peaks.28,17

(iii) Do the assumed conditions in the assessment correspond to a subrange of the conditions in which the data were measured? Will such considerations affect an estimate of the most-likely value(s)? For example, under the assumptions made in the assessment, the rate at which groundwater corrodes waste containers may be limited to a smaller range than laboratory measurements if the electrochemical potential in the geosphere is fixed by the presence of large amounts of certain minerals.

(iv) Are there any limits on the range of parameter values required by modelling considerations? How close are the data values to these modelling limits? Are limits on the range of allowed parameter values needed to retain model validity? How should the distribution of allowed parameter values reflect this limit? For example, how should a model of soil uptake of nuclides, and the parameters in the model, be restricted to yield only concentrations within the cation exchange capacity of the soil?
4.6 Quantify the attributes of the PDF

Once a PDF type has been selected for the parameter, the available data will have to be interpreted to specify quantitative attributes fully defining the PDF. The sample mean and variance of the data are unbiased estimators of the mean and variance of the distribution to be expected if the sampling could be continued indefinitely.29 As well, for some distribution types an estimate of the most probable value of the parameter may be used to set the mode of the PDF. Some distributions may have more than one mode.

The following are typical issues that arise about the significance of the data and of the role of the parameter in the model when distribution attributes are being set:

(i) If firm limits on the allowed values of the parameter cannot be set, is it still possible to define 'soft' limits? Soft limits are values on the high and low sides of a PDF that one believes should bound some fraction (for instance, 95% or 99%) of the probability. As outlined above in Section 4.5, one could select corresponding values of the CDF, and then translate those values into attributes of the PDF. Such values will be particularly useful in fixing the characteristics of a PDF that has an infinite range, such as a normal distribution.

(ii) Is the parameter used as an 'effective' value in the model? As noted earlier, effective values are often used to describe the net influence of multiple, fluctuating or locally varying processes. Means, standard deviations and correlations between effective parameters may differ from their values for the original data. For instance, radionuclide sorption on different minerals may be approximated by a single effective sorption parameter for a particular rock type. This may lead to uncertainty in the value of the effective parameter to use.
Some processes and material properties exhibit scale-dependent characteristics; that is, they depend on the scale at which they are measured and described. Rock porosity, for example, is described on increasing scales in terms of intergranular cavities, microfractures, and large fractures. It may be important to verify that the available data on transport through the rock are relevant to the scale to be modelled.

Has the number of dimensions of the model description been reduced? For example, would uncertainty in estimated variations in groundwater flow patterns be increased if flow in a three-dimensional array of fractures is modelled as one- or two-dimensional flow through a slab?

For long-term assessments, key transport processes such as groundwater flow typically change only slowly over hundreds or thousands of years. Processes that exhibit short-term fluctuations, such as daily or monthly cycles of climatic and living systems, may be relevant only if they affect annual or lifetime doses to individuals. For example, monthly rainfall values will vary more than annual rainfall values. While 1000-year averages may vary little, forecasts of a 1000-year average will be uncertain because of lack of knowledge. Which average does the model assume? How will the output distribution be interpreted?

If predictable time-dependence of a quantity is an important characteristic of the system, it is preferable to introduce the dependence as a function of time-dependent variables, instead of as a time-independent input sampled parameter. Such treatments require more detailed information.

Even if the models in the assessment code do not directly incorporate time-dependent parameters, it may be possible to allow for the uncertainty introduced by approximate handling of time variation. Following is an example of how one time-dependent parameter was handled in a Canadian assessment.15 The molecular diffusion coefficient increases with temperature, which changes with time in a high-level waste vault. However, a time-independent value was required to meet model requirements. A procedure was developed to estimate a PDF that would represent the variation in the diffusion coefficient as a function of time. To begin with, it is expected that when the first waste containers in the repository start to fail and expose the waste to groundwater, the water will be past its peak temperature, and, at all times, will be less than 150°C, surface.
The most likely water temperature over the assessment period was taken to be 100°C, the average maximum reference temperature near the container. For long times, the temperature allowed was 15°C, the ambient rock temperature at 1000-m depth. A three-interval piecewise uniform distribution was defined for the molecular diffusion coefficient [m²/a], with values in the interval:
from 0.01 to 0.05 assigned a weight of 0.15,
from 0.05 to 0.24 assigned a weight of 0.70, and
from 0.24 to 0.42 assigned a weight of 0.15.
The weights were assigned from estimates of the duration of the three temperature regimes after container failure. In this case, the PDF had to be broader, to include all the possible parameter values at any time in the assessment period, than if only a range of values at early times (when molecular diffusion would be highest) had been allowed. Any value within that broader range may come to be used throughout a given simulation. Therefore extreme, improbable parameter values which are most appropriate at limited times or locations may be used throughout a few complete simulations rather than at different times in more simulations. Such bias will often generate a broader range of estimated effects of the process than if values were repeatedly sampled throughout each simulation to reproduce explicitly the short-term fluctuations.30 This is usually acceptable for a long-term assessment code used to scan the gross features of many conceivable eventualities rather than the fine detail of any one particular possible evolution.
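As a minimal sketch of how the three-interval piecewise uniform distribution described above could be evaluated and sampled, the following Python fragment uses the interval limits and weights quoted in the text. The function names, the inverse-CDF approach and the random seed are assumptions made for illustration; they are not taken from the assessment code.

```python
import random

# (lower limit, upper limit, weight) for the molecular diffusion
# coefficient [m^2/a], as quoted in the text above.
INTERVALS = [(0.01, 0.05, 0.15), (0.05, 0.24, 0.70), (0.24, 0.42, 0.15)]


def pdf(x, intervals=INTERVALS):
    """Density of the piecewise uniform distribution at x."""
    for lo, hi, weight in intervals:
        if lo <= x < hi:
            # Within an interval the density is constant: weight / width.
            return weight / (hi - lo)
    return 0.0


def inverse_cdf(u, intervals=INTERVALS):
    """Map a probability u in [0, 1) to a parameter value."""
    cumulative = 0.0
    for lo, hi, weight in intervals:
        if u < cumulative + weight:
            # Interpolate linearly inside the selected interval.
            return lo + (u - cumulative) / weight * (hi - lo)
        cumulative += weight
    return intervals[-1][1]  # guard against rounding when u is close to 1


def sample(rng, n, intervals=INTERVALS):
    """Draw n independent values, one per simulation (hypothetical usage)."""
    return [inverse_cdf(rng.random(), intervals) for _ in range(n)]


rng = random.Random(12345)  # fixed seed so the illustration is repeatable
values = sample(rng, 20000)
middle_fraction = sum(0.05 <= v < 0.24 for v in values) / len(values)
print(f"fraction of samples in the middle interval: {middle_fraction:.3f}")
```

Each sampled value would be held fixed for a whole simulation, which is the situation discussed above: a value drawn from one of the outer intervals applies throughout that simulation, not only during the period for which it is most appropriate.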
5. MANAGING INFORMATION ON PDFs
Once data contributors have decided on the PDFs to be assigned to all parameters for an assessment, it is essential that the data are both reliably transmitted to the assessment team, and recorded in a suitable form for creating computer input data files, for interpreting results, and for archiving. Reference 31 contains a compendium of this type of PDF data for a recent assessment of the Waste Isolation Pilot Plant (WIPP) in the USA. Manual handling of the masses of information involved is tedious and error-prone. Consequently an automated data handling system was created for a recent Canadian assessment.32 All the required information on each PDF was collected on standard data transmittal forms. Figure 10 shows a completed example. One such form is required for each parameter. Information on the form includes a description of the PDF type and attributes, a justification for its selection, and approvals by the data contributor, modeller, and the manager of the database.
Fig. 10a. Completed example of the data submission form used in the latest Canadian assessment (Front). [Form "SYVAC3-CC3 Parameter Characteristics", page 1 of 2, with sections: 1. Data Authorization; 2. Parameter Full Name, Complete Definition and Mathematical Symbol (example entry: TERRESTRIAL CATCHMENT AREA, symbol Ad, defined as the lake catchment or watershed area including the surface area of the lake, with a note that the unit qualifier "drysoil" is equivalent to "soil" here); 3. SI Units; 4. Probability Density Function (PDF) for the Parameter (PDF type; bounds: none, value bounds, or quantile bounds; upper and lower bound; attributes as appropriate for type); 5. Dependence (if any) on Another Parameter via a Correlation Coefficient (between -1 and +1).]
A computerized database was designed and implemented to store the information on the data transmittal forms. Several computer codes were written to automate the creation of the database, entry of data into it, and creation of the SYVAC3-CC3 input files from its contents. The database contains information on approximately 8000 parameters and occupies 14 megabytes (14 million characters). Data can only be entered into the database by one person, the database manager. Anyone in the research team can examine and print out data from the database. Data entered into the database are printed out and returned for review and signoff to the data contributor, the leader of the group responsible for the model in which the data will be used, and the person overseeing preparation of the model code. The signed forms are archived. More details on the system are given in Reference 32.
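To make the content of one data transmittal form concrete, the following Python sketch shows one way such a record and its basic consistency checks might be represented. It is not the schema of the database described in Reference 32: the class name, field names, sign-off labels, and the example PDF type, units and attribute values are all hypothetical. Only the kinds of information recorded (name, definition, symbol, units, PDF type, attributes, bounds, correlation between -1 and +1, justification, a long name of up to 32 characters, and sign-offs) follow the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set

BOUND_KINDS = ("none", "value", "quantile")
# Hypothetical labels for the three reviewers named in the text.
REQUIRED_SIGNOFFS = {"data contributor", "model group leader", "code overseer"}


@dataclass
class ParameterRecord:
    full_name: str
    definition: str
    symbol: str
    si_units: str
    pdf_type: str
    attributes: Dict[str, float] = field(default_factory=dict)
    bound_kind: str = "none"
    lower_bound: Optional[float] = None
    upper_bound: Optional[float] = None
    depends_on: Optional[str] = None      # full name of the other parameter
    correlation: Optional[float] = None   # correlation coefficient, if dependent
    justification: str = ""
    long_name: str = ""
    signoffs: Set[str] = field(default_factory=set)

    def problems(self) -> List[str]:
        """Return a list of inconsistencies found in the record."""
        found = []
        if self.bound_kind not in BOUND_KINDS:
            found.append("unknown bound kind")
        if self.bound_kind != "none" and self.lower_bound is None and self.upper_bound is None:
            found.append("bounds requested but none given")
        if (self.lower_bound is not None and self.upper_bound is not None
                and self.lower_bound > self.upper_bound):
            found.append("lower bound exceeds upper bound")
        if (self.depends_on is None) != (self.correlation is None):
            found.append("dependence and correlation must be given together")
        if self.correlation is not None and not -1.0 <= self.correlation <= 1.0:
            found.append("correlation coefficient outside [-1, +1]")
        if len(self.long_name) > 32:
            found.append("long name exceeds 32 characters")
        if not self.justification.strip():
            found.append("missing justification")
        return found

    def ready_for_input_file(self) -> bool:
        """True only if the record is consistent and fully signed off."""
        return not self.problems() and REQUIRED_SIGNOFFS <= self.signoffs


record = ParameterRecord(
    full_name="TERRESTRIAL CATCHMENT AREA",
    definition="Lake catchment or watershed area including the lake surface",
    symbol="Ad",
    si_units="m2",                        # assumed for illustration
    pdf_type="uniform",                   # illustrative, not from the form
    attributes={"a": 1.0e6, "b": 1.0e8},  # illustrative values only
    justification="Illustrative entry only.",
    long_name="TERRESTRIAL CATCHMENT AREA",
)
print(record.problems(), record.ready_for_input_file())
```

Keeping the checks in one method mirrors the review step described above: a record would be used to build input files only after it is both internally consistent and signed off.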
Fig. 10b. Completed example of the data submission form used in the latest Canadian assessment (Back). [Page 2 of 2, with sections: 6. Reasons for This Choice of PDF (justification for the given information, including PDF type, attributes, bounds, the principal sources of uncertainty, underlying assumptions, simplifications and qualifying conditions, with a plot of the PDF and data points used, or a reference where this information may be found); SYVAC3-CC3 Information, to be completed by ESAB (short name of the parameter in SYVAC3-CC3, long name of up to 32 characters, model constraints); and signed checks, including that data have been correctly entered into the SYVAC3-CC3 data base.]
6. SAMPLING PDFs IN AN ASSESSMENT CODE
Once PDFs have been derived and entered into an assessment code, there remains the practical question of how to sample values from them for different simulations. Several sampling methods have been developed, including random sampling of all parameters, and systematically sampling combinations of selected parameters from different intervals, for instance, 'high', 'mid-range', or 'low' intervals, in their allowed range. Additional transformations of sampled PDF values may also be useful, for instance to choose values preferentially from intervals of particular interest, such as high-consequence, low-probability intervals worth extra scrutiny because they contribute a large share to risk. Each sampling technique has advantages and disadvantages as regards efficiency of implementation, degree of independence from the form of the model in which the parameter is used, possibility of combining independent sample sets, and degree of bias in estimators of characteristic output distributions.33,34 Sampling is an important subject in its own right, and beyond the scope of this paper.
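The distinction drawn above between random sampling and systematic sampling from 'low', 'mid-range' and 'high' intervals can be illustrated with a short Python sketch. The two parameters, their uniform distributions, the choice of three equal-probability intervals, and all function names are hypothetical; this is not the sampling scheme of SYVAC3-CC3 or of References 33 and 34.

```python
import itertools
import random

LABELS = ("low", "mid-range", "high")
# Assumed: the three intervals each carry one third of the probability.
EDGES = (0.0, 1.0 / 3.0, 2.0 / 3.0, 1.0)


def uniform_inverse_cdf(lo, hi):
    """Inverse CDF of a uniform PDF on [lo, hi] (illustrative PDF type)."""
    def inverse(u):
        return lo + u * (hi - lo)
    return inverse


def random_runs(rng, inverse_cdfs, n):
    """Simple random sampling: every parameter drawn independently."""
    return [{name: inv(rng.random()) for name, inv in inverse_cdfs.items()}
            for _ in range(n)]


def draw_from_interval(rng, inverse_cdf, label):
    """Draw one value from the named probability interval of a parameter."""
    i = LABELS.index(label)
    u = EDGES[i] + rng.random() * (EDGES[i + 1] - EDGES[i])
    return inverse_cdf(u)


def systematic_runs(rng, inverse_cdfs):
    """One run for every combination of intervals across the parameters."""
    names = sorted(inverse_cdfs)
    runs = []
    for combo in itertools.product(LABELS, repeat=len(names)):
        values = {name: draw_from_interval(rng, inverse_cdfs[name], label)
                  for name, label in zip(names, combo)}
        runs.append((dict(zip(names, combo)), values))
    return runs


rng = random.Random(2024)  # fixed seed so the illustration is repeatable
pdfs = {"k": uniform_inverse_cdf(0.0, 3.0), "q": uniform_inverse_cdf(10.0, 40.0)}
for labels, values in systematic_runs(rng, pdfs):
    print(labels, {name: round(v, 2) for name, v in values.items()})
```

With two parameters and three intervals the systematic scheme produces nine runs that are guaranteed to cover every combination, whereas the same number of purely random runs may leave some combinations unvisited. The number of combinations grows quickly with the number of parameters, which is one reason such schemes are applied only to selected parameters.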
7. SUMMARY AND DESIRABLE FUTURE DEVELOPMENTS
Principles and practical techniques to derive CDFs and PDFs for performance assessments of disposal systems are available and in active use.1,8,10,14,26,28,31 Further development is desirable to better handle the effects of correlations, spatial and temporal variability, extreme values and stochastic natural and anthropogenic phenomena.14 In cases where data are sparse, the generation of subjective probabilities by use of techniques such as probability encoding is still 'clearly more art than science',26 and should not replace research that could produce direct evidence. Risk results from a probabilistic assessment using PDFs must be interpreted within the context of what is understood by probability, what is known about the system, the structure and extent of the model description of the system, and how the results will be used. A desirable improvement in the use of PDFs for system performance assessments would be clearer statements of what is meant by such concepts as probability and risk. For example, the calculated radiological risk to human health for a disposal facility is not exactly equivalent to an expected long-term return from repeatedly playing a game of chance like a weekly lottery. In a lottery, one makes repeated observations of the variable behaviour of the real system. In contrast, when studying the characteristics of a model of a system, one estimates variations in the model behaviour resulting from variable assumptions about the parameter values in its defining equations. Risk in the performance assessment context is therefore more akin to making a series of estimates of the various possible results of proposed activities, such as how long it will take to construct a building, how much paint will be needed to paint a fence of estimated dimensions, or what the rate of return will be from a new business venture.35 In each case, uncertainty about the actual, eventual outcome of the intended activity comes from a variety of sources. The uncertainty represented in the distribution of outcomes from simulations of the operating processes reflects, to the degree of accuracy of which the model is capable, the combined distributions of the uncertain input parameter values. This provides the most complete information available on which to base decisions in the face of irreducible uncertainty.
ACKNOWLEDGMENTS
This paper was prepared as part of the Canadian Nuclear Fuel Waste Management Program. The Program is jointly funded by AECL and Ontario Hydro under the auspices of the CANDU Owners Group. The authors wish to thank their colleagues at AECL Whiteshell Laboratories for many helpful discussions over the years on the philosophy and practicalities of deriving PDFs for disposal system assessments, and collaboration in developing and using the PDF database for the SYVAC3-CC3 code.
REFERENCES
1. ERL, Handling Uncertainty in Prediction. Handling Uncertainty in Environmental Impact Assessment, Vol. 18. Environmental Resources Limited, London, 1985.
2. Morgan, M. G. & Henrion, M., Uncertainty: A Guide to Dealing with Uncertainty in Quantitative Risk and Policy Analysis. Cambridge University Press, Cambridge, 1990.
3. IAEA/NES/CEC, Proceedings of the International Symposium on Safety Assessment of Radioactive Waste Repositories, Paris, France, 9-13 October 1989, Organisation for Economic Co-operation and Development, Paris, France, 1990.
4. Bonano, E. J., Hora, S. C., Keeney, R. L. & von Winterfeldt, D., Elicitation and Use of Expert Judgment in Performance Assessment for High-Level Radioactive Waste Repositories. Sandia Report SAND89-1821, Sandia National Laboratories, Albuquerque, 1989.
5. Thompson, B. G. J., Goodwin, B. W., Nies, A., Saltelli, A., Kjellbert, N. A., Galson, D. A. & Sartori, E. J., The OECD Nuclear Energy Agency Probabilistic Assessment Codes (PSAC) User Group; Objectives, Achievements and Programme of Activities. Proceedings of the International Symposium on Safety Assessment of Radioactive Waste Repositories, Paris, France, 9-13 October 1989, Organisation for Economic Co-operation and Development, Paris, France, 1990.
6. Dormuth, K. W. & Quick, R. D., Accounting for Parameter Variability in Risk Assessment for a Canadian Nuclear Fuel Waste Disposal Vault. Int. J. Energy Systems, 1 (1986) 125-27.
7. Goodwin, B. W., Andres, T. H., Davis, P. A., LeNeveu, D. M., Melnyk, T. W., Sherman, G. R. & Wuschke, D. M., Post-closure Environmental Assessments for the Canadian Nuclear Fuel Waste Management Program. Radioactive Waste Management and the Nuclear Fuel Cycle, 8 (2, 3) (1987) 241-72.
8. von Holstein, C.-A. S. S. & Matheson, J. E., A Manual for Encoding Probability Distributions. SRI Report AD-A092259, SRI International, Menlo Park, CA, 1978.
9. Walker, J. R. & LeNeveu, D. M., Nonlinear Chemical Sorption Isotherms in the Assessment of Nuclear Fuel Waste Disposal. AECL Report AECL-8394, Chalk River, Ontario, Canada, 1987.
10. Stephens, M. E., Goodwin, B. W. & Andres, T. H., Guidelines for Defining Probability Density Functions for SYVAC3-CC3 Parameters. AECL Technical Record TR-479, Chalk River, Ontario, Canada, 1989.
11. OECD Nuclear Energy Agency, Working Group on the Identification and Selection of Scenarios for Performance Assessment of Radioactive Waste Disposal, Systematic Approaches to Scenario Development. Draft Final Report, OECD, Paris, France, 1992.
12. AECB, Deep Underground Disposal of Nuclear Fuel Waste: Background Information and Regulatory Requirements Regarding the Concept Assessment Phase. Atomic Energy Control Board Regulatory Document R-71, Ottawa, Canada, 1985.
13. AECB, Regulatory Objectives, Requirements and Guidelines for the Disposal of Radioactive Wastes: Long Term Aspects. Atomic Energy Control Board Regulatory Document R-104, Ottawa, Canada, 1987.
14. Tierney, M. S., Constructing Probability Distributions of Uncertain Variables in Models of the Performance of the Waste Isolation Pilot Plant: The 1990 Performance Simulations. Sandia Report SAND90-2510, Sandia National Laboratories, Albuquerque, 1990.
15. Wuschke, D. M. et al., Second Interim Assessment of the Canadian Concept for Nuclear Fuel Waste Disposal, Vol. 4: Post-Closure Assessment. Atomic Energy of Canada Limited Report, AECL-8373-4, Chalk River, Ontario, Canada, 1985.
16. AECL, Environmental Impact Statement on the Concept for Disposal of Canada's Nuclear Fuel Waste. Atomic Energy of Canada Limited, Pinawa, Manitoba, Canada (in preparation).
17. Eslinger, P. W. & Sagar, B., Use of Bayesian Analysis for Incorporating Subjective Information. In Proceedings of the Conference on Geostatistical, Sensitivity, and Uncertainty Methods for Ground-water Flow and Radionuclide Transport Modelling, San Francisco, 15-17 September 1987, Battelle Press, Columbus, Ohio, USA, 1989.
18. Chan, T., Scheier, N. E. & Reid, Keith J. A., Finite-Element Thermohydrogeological Modelling for Canadian Nuclear Fuel Waste Management. In Proceedings of the Second International Conference on Radioactive Waste Management, Winnipeg, Manitoba, Canada, 7-11 September 1986, Canadian Nuclear Society, Toronto, Ontario, Canada, 1986.
19. Zach, R. & Sheppard, S. C., Food-Chain and Dose Model, CALDOS, for Assessing Canada's Nuclear Fuel Waste Management Concept. Health Physics, 60 (5) (1991) 643-56.
20. Andres, T. H., Identification of Parameter Correlation and Functional Relationships Using Expert Opinion. Atomic Energy of Canada Limited Report prepared for SENES Consultants Limited, 1986.
21. Garisto, F. & Garisto, N. C., A UO2 Solubility Function for the Assessment of Used Nuclear Fuel Disposal. Nucl. Sci. Eng., 90 (1985) 103.
22. Feller, W., An Introduction to Probability Theory and its Applications. John Wiley and Sons, New York, 1950.
23. Kleijnen, J. P. C., Statistical Techniques in Simulation, Part 1. Marcel Dekker, New York, 1974.
24. Frech, K. J. & O'Connor, P. A., FITDIS User's Manual. AECL Technical Record TR-407, Chalk River, Ontario, Canada, 1986.
25. Amiro, B. D., The Atmosphere Submodel for the Assessment of Canada's Nuclear Fuel Waste Management Concept. AECL Report AECL-9889, Whiteshell, Manitoba, Canada, 1992.
26. Merkhofer, M. W. & Runchal, A. K., Probability Encoding: Quantifying Uncertainty over Hydrologic Parameters for Basalt. In Proceedings of the Conference on Geostatistical, Sensitivity, and Uncertainty Methods for Ground-water Flow and Radionuclide Transport Modelling, San Francisco, 15-17 September 1987, Battelle Press, Columbus, Ohio, USA, 1989.
27. Loo, W. W., Arnett, R. C., Leonhart, L. S., Luttrell, S. P., I-Seng Wang & McSpadden, W. R., Effective Porosities of Basalt: A Technical Basis for Values and Probability Distributions Used in Preliminary Performance Assessments. SD-BWI-TI-254, Rockwell Hanford Operations, Richland, Washington, 1984.
28. Dalrymple, G. J., The Use of Expert Opinion in Specifying Input Distributions for Use in Probabilistic Risk Analysis of Radioactive Waste Disposal. Wastes Management, 79 (12) (1989) 912-22.
29. Freund, J. E., Mathematical Statistics. Prentice-Hall, Englewood Cliffs, New Jersey, 1962.
30. Kremer, J. N., Ecological Implications of Parameter Uncertainty in Stochastic Simulation. Ecological Modelling, 18 (1983) 187-207.
31. Rechard, R. P., Iuzzolino, H. & Sandha, J. S., Data Used in Preliminary Performance Assessment of the Waste Isolation Pilot Plant. Sandia Report SAND-2408, Sandia National Laboratories, Albuquerque, 1990.
32. Stephens, M. E. & Witzke, K. H., The CC3 Database Management System, Vol. I: Description. AECL Technical Record TR-XXX-1, Chalk River, Ontario, Canada (in preparation).
33. Andres, T. H., Statistical Sampling Strategies. In Proceedings from a Workshop on Uncertainty Analysis. OECD Nuclear Energy Agency, Paris, France, 1987.
34. Ackoff, R. L., The Design of Social Research. University of Chicago Press, Chicago, 1953, table reproduced as Table 1 in Risk-Assessment Methodology Development for Waste Isolation in Geological Media, Technical Review of Documents NUREG/CR-0394, NUREG/CR-0424, and NUREG/CR-0458, Stevens, C. A., R. R. Fullwood and S. L. Basin, NUREG/CR-1672, Science Applications Inc., Palo Alto, California, 1980.
35. Reilly, P. M. & Johri, H. P., Decision-Making Through Opinion Analysis. Chemical Engineering, 76 (1969) 122-9.