Environmental Software 9 (1994) 1-11
© 1994 Elsevier Science Limited. Printed in Great Britain. All rights reserved. 0266-9838/94/$7.00
UNCSAM: a tool for automating sensitivity and uncertainty analysis

P. H. M. Janssen, P. S. C. Heuberger & R. Sanders

National Institute of Public Health and Environmental Protection (RIVM), PO Box 1, 3720 BA Bilthoven, The Netherlands
(Received 1 September 1992; revised version received 4 August 1993; accepted 3 September 1993)
ABSTRACT

The important role of sensitivity and uncertainty analysis in the mathematical modelling process is indicated, and guidelines are given for these analyses. The main features are presented of the software package UNCSAM, which applies efficient Monte Carlo sampling in combination with regression and correlation analysis to perform sensitivity and uncertainty analyses on a large variety of simulation models. The use of UNCSAM is illustrated by an environmental application study.

Key words: Sensitivity, Uncertainty, Latin Hypercube sampling.

Software Availability
Name of the software: UNCSAM
Contact address: Mrs. M.E.J. Kroeze; RIVM (CWM); PO Box 1; 3720 BA Bilthoven; The Netherlands
Telephone: (31) 30 743320; Telefax: (31) 30 250740; Telex: 47215 rivm nl
Year first available: 1992
Hardware required: no specific requirements; a machine with a mathematical co-processor is recommended
Program language: standard Fortran 77, embedded in an ANSI C environment
Program size: 3 Mbyte
Cost: $3000 (half price for universities, i.e. $1500)
1. INTRODUCTION

Mathematical models are useful tools in the study of many real-life problems. However, in general our knowledge and information on these problems is limited, uncertain and poor. In order to enable a reliable development and application of such models, it is therefore often necessary to perform a thorough model analysis to gain insight into the reliability of the model(s). Model analysis usually consists of performing a

• sensitivity analysis: the study of the influence of variations in model parameters, initial conditions etc. on model outputs;
• uncertainty analysis: the study of the uncertain aspects of a model, and of their influence on the (uncertainty of the) model outputs.
These analyses serve to clarify the crucial aspects of the model, and to convey the origins and effects of model uncertainties. The results will moreover lead to increased insight into the model, and can provide useful suggestions for further model development, calibration and data acquisition. During the last decades a variety of techniques have been proposed to perform these analyses; for an overview see refs 1-4. The majority of techniques are concerned with deterministic (simulation) models; com-
parative evaluation studies show that in particular the efficient Monte Carlo based method which uses the recently developed Latin Hypercube sampling technique 5,6 is an appropriate candidate for analysing these models. Currently, various excellent, easy-to-use software packages (e.g. Crystal Ball, @RISK) 7,8 have become commercially available for performing basic risk and uncertainty computations. These packages are based on the Monte Carlo approach and offer a wide variety of distributions from which samples can be drawn; moreover, they provide excellent reporting and visualization facilities. The major limitation of this software is, however, that it is primarily intended to analyse spreadsheet models. In most applications at the RIVM (Dutch National Institute for Public Health and Environmental Protection) this limitation was too severe. The class of spreadsheet models is far too limited to describe the complexity and the dynamical features encountered in many biological and environmental systems. Instead, often physically oriented dynamical models are employed in practice, which are based on ordinary or partial differential equations (e.g. atmospheric transport models; groundwater flow and transport models; biogeochemical models). These models can vary widely in complexity, sophistication, method of implementation (e.g. using low-level programming languages, or applying dedicated dynamical simulation tools), run-time (from seconds to hours), and hardware requirements (PCs, mini-, micro- and super-computers, workstations). A second limitation of the above-mentioned risk and uncertainty analysis software is that it does not provide measures to express which parts of the model contribute substantially to the uncertainty of the output variables. In many applications, it is this specific information on the dominant risk or uncertainty factors which is of utmost interest. Against this background it was decided in 1990 at the RIVM to develop a comprehensive and flexible, Monte Carlo based software package UNCSAM (UNCertainty analysis by Monte Carlo SAMpling techniques) for performing sensitivity and uncertainty analyses, focussing especially on the above-mentioned limiting issues, i.e. (1) UNCSAM should enable treatment of a large variety of mathematical models, largely independent of their form and implementation, and (2) it should provide straightforward information on the contribution of the various model components to the sensitivity or uncertainty of the model outputs. Although UNCSAM was primarily intended for internal use at the RIVM, it was decided, after successful experiences with the software, to update the package and make it available for external use. The main features of the present version 1.1 of UNCSAM will be described in this paper. Before doing this, some background material is presented on sensitivity and uncertainty analysis. Finally we illustrate
the use of UNCSAM by an environmental application study, and end with some conclusions.

2. SENSITIVITY AND UNCERTAINTY ANALYSIS

2.1 Strategy

The process of performing sensitivity and uncertainty analyses can typically be divided into various steps, which also serve as useful guidelines for planning these activities:

1. Problem formulation: In this vital step one has to clarify why sensitivity or uncertainty analysis is needed for the model at hand, what knowledge is desired, and in which form this information should be reported.

2. Inventory of sensitivity or uncertainty 'sources': In general the model outputs will depend on
(a) the model structure, i.e. the form of the mathematical equations which constitute the model;
(b) the model inputs/external factors: these represent the influence of the external 'environment' on the system (e.g. sources and sinks, forcing functions, etc.);
(c) boundary and/or initial conditions;
(d) model parameters: usually the model contains various coefficients (parameters) characterizing the (dynamical) behaviour of the modelled processes (e.g. diffusivity, reaction rate etc.);
(e) the applied computational scheme in which the model is implemented (i.e. model operation).
One has to specify which 'sources' will be involved in a sensitivity or uncertainty analysis. In general those 'sources' are chosen that have (partially) unknown values, or factors which will or can be manipulated or controlled.

3. Quantification of the 'sources':
(a) For sensitivity analysis this comes down to specifying the nominal values of the 'sources' and the appropriate variations around these values.
(b) For uncertainty analysis this will depend on the nature of the uncertainty and on the information available on it. If the program has been checked thoroughly (syntactically and semantically), the uncertainties (errors) in model operation will usually be minor compared to the other uncertainties. Uncertainties in model structure are usually difficult to quantify: in practice one often resorts to a comparative study of various feasible model structures. The uncertainty in the other 'sources' is usually described in a probabilistic setting by specifying their probability distributions (e.g. by giving the mean, standard deviation, minimum, maximum, distribution type), and their mutual correlations.

4. Evaluation of the effects on the model outputs:
(a) In a sensitivity analysis it is evaluated how the variations of the various 'sources' around their nominal values affect the considered model outputs. This information can be reported either in a direct graphical way (e.g. by plotting the model output(s) for various values of the varying 'sources'), or by computing sensitivity measures (e.g. partial derivatives, regression coefficients etc.) and reporting them in tabular or graphical form. Moreover, the importance of the various 'sources' is ranked, in order to detect the most sensitive ones.
(b) In an uncertainty analysis one first evaluates the total influence of the uncertainties in the various 'sources' on the model outputs. This information can be reported in various ways, e.g. by presenting (a) the range of the model outputs, (b) the mean and variance, (c) empirical distribution functions and histograms, (d) percentile values, e.g. the 2.5, 50 (median) and 97.5 percentiles. Since these quantities are usually presented as estimates based on a finite number of model simulations, it is also desirable to have additional information on the accuracy of these estimates (e.g. confidence bands); a small sketch of such computations follows this list. In addition to the overall uncertainty picture, one tries to discover which specific sources contribute substantially to the total uncertainty. Typically, correlation or regression analysis is applied to obtain measures to express this contribution. This will be briefly discussed in subsection 2.3.

5. Performing a robustness study of the results: The results of the foregoing sensitivity or uncertainty computations usually depend on the choices and specifications of the various 'sources'. Since these choices can be rather subjective and arbitrary, it is important to check the robustness of the results with respect to these choices.
2.2 Monte Carlo based methods

Monte Carlo based methods for sensitivity and uncertainty analysis rely on the fact that variations or
uncertainties in the 'sources' can usually be suitably described by specifying probability distributions and mutual correlations, which reflect these variations or the probability of the values which the 'sources' can take. Sampling is performed from these distributions, resulting in a set of values for the various 'sources' (Monte Carlo sampling). These sampled values are subsequently used to generate model outputs (Monte Carlo simulation), and the results are saved for further analysis. This further analysis consists of computing and showing basic statistical information (means, variances, percentiles, histograms etc.), of assessing the accuracy of the computed quantities (confidence bounds), and of performing regression and correlation analysis to obtain insight into the sensitivity and uncertainty contribution of the various 'sources'. This Monte Carlo approach is simple and straightforward, and can be applied to a wide variety of models. In order to limit the computational load, especially for large models, it is recommended to use an efficient sampling technique. In particular, the recently developed Latin Hypercube sampling (LHS) technique is considered an appropriate candidate 5,6. This technique uses a stratified way of sampling from the separate 'sources', based on a subdivision of the range of each 'source' into N disjunct equiprobable intervals. Sampling one value in each interval according to the associated distribution, one obtains N samples for each 'source'. The sampled values for the first 'source' are subsequently randomly paired with the sampled values of the second 'source'. These pairs are furthermore randomly combined with the sampled values of the third 'source' etc., which finally results in N combinations of p parameters. This set of p-tuples is the Latin Hypercube sample. It is possible to incorporate correlations in the sampled set, and to avoid spurious correlations due to the sampling process, using the restricted pairing technique of Iman and Conover 9. When using LHS, the parameter space is representatively sampled with only a few samples (i.e. N can be small). A choice of N > (4/3)·p samples, where p is the number of parameters to be sampled, usually gives satisfactory results 6. Moreover, LHS leads to efficient estimates of various statistical characteristics of the model output (e.g. expectation, probability distribution etc.) 10.
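The stratified sampling and random pairing just described can be sketched in a few lines of Python. This is a minimal illustration of plain LHS for independent 'sources' only; the restricted-pairing correction of Iman and Conover 9 for correlated sources is omitted, and the function name and the use of scipy.stats distributions are our own choices, not UNCSAM's interface.

```python
import numpy as np
from scipy import stats

def latin_hypercube(dists, n, seed=None):
    """Draw n Latin Hypercube samples from p independent 'sources'.
    dists: list of frozen scipy.stats distributions, one per 'source'."""
    rng = np.random.default_rng(seed)
    p = len(dists)
    sample = np.empty((n, p))
    for j, dist in enumerate(dists):
        # one uniform draw inside each of the n equiprobable strata
        u = (np.arange(n) + rng.uniform(size=n)) / n
        # map the strata to the source's distribution via the inverse CDF,
        # then shuffle the column: the random pairing across 'sources'
        sample[:, j] = rng.permutation(dist.ppf(u))
    return sample

# Example: p = 3 sources; the guideline N > (4/3)p is amply met with N = 50.
X = latin_hypercube([stats.norm(10, 2), stats.lognorm(0.5), stats.uniform(0, 1)],
                    n=50, seed=1)
```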
2.3 Sensitivity and uncertainty contributions

Various statistics are employed to quantify the sensitivity and uncertainty contribution of the 'sources' to the model outputs. Most of them are based on regression and correlation analysis 11. For the purpose of this study we will only present a few measures based on linear regression analysis. Suppose that least-squares regression of the model output y(k) in the k-th simulation run (k = 1, ..., N) on the associated sampled 'sources' x_1(k), ..., x_p(k) re-
sults in the regression equation:

$$y(k) = b_0 + \sum_{i=1}^{p} b_i\, x_i(k) + e(k) =: \hat{y}(k) + e(k) \qquad (1)$$
The quantities b_0, b_1, ..., b_p denote the ordinary regression coefficients, and e(k) denotes the regression residual. The b_i values can be considered absolute sensitivity measures, measuring approximately (to first order) the absolute change Δy of y if x_i changes by an amount Δx_i, while the other 'sources' remain constant. The coefficient of determination (COD; also called R^2) of this regression is equal to:

$$R^2 := \frac{S_{\hat{y}}^2}{S_y^2} \qquad (2)$$
where S^2 denotes the sample variance of the associated quantity. R^2 is a number between 0 and 1, expressing the validity of the linear regression model ŷ(k) as an approximation of the original model output y(k); R^2 ≈ 1 indicates a good approximation. On the basis of the regression equation (1) the uncertainty in y (as expressed by its variance S_y^2) is readily expressed in terms of the b_i S_{x_i}:

$$S_y^2 = \sum_{i=1}^{p} \sum_{j=1}^{p} (b_i S_{x_i})(b_j S_{x_j})\, r_{x_i x_j} + S_e^2 \qquad (3)$$
Here r_{x_i x_j} denotes the (sample) correlation coefficient between the uncertainty 'sources' x_i and x_j. For the case when x_i is uncorrelated with the other 'sources', it is obvious that the quantity b_i S_{x_i} measures the linear uncertainty contribution of the source x_i. The related standardized quantity

$$b_i^{(s)} := \frac{b_i\, S_{x_i}}{S_y} \qquad (i = 1, \ldots, p) \qquad (4)$$

is called the standardized regression coefficient (SRC), and thus measures the fraction of the uncertainty in y which is contributed by x_i (if r_{x_i x_j} = 0 for i ≠ j). The SRCs will therefore only give a valid impression of the uncertainty contribution if the 'sources' show no substantial correlation, and if the linear regression model is a fair approximation of the original model output y (i.e. R^2 ≈ 1).

Expression (3) shows that correlations between sources make an unequivocal determination of the uncertainty contribution of an individual 'source' impossible, since it cannot be neatly separated from the contributions of the other correlated 'sources'. In the literature various alternative measures have been proposed to cope with this problem 11. One of these measures, the so-called partial uncertainty contribution (PUC), tries to express the relative change ΔS_y/S_y in the uncertainty of the model response y, due to a relative change ΔS_{x_i}/S_{x_i} in the uncertainty of the individual source x_i, accounting for the induced change of the correlated 'sources' x_j (j ≠ i):

$$\frac{\Delta S_y}{S_y} = \mathrm{PUC}_i \cdot \frac{\Delta S_{x_i}}{S_{x_i}} \qquad (5)$$

The PUC can be expressed as a combination of regression and correlation quantities 11:

$$\mathrm{PUC}_i := \mathrm{SRC}_i \cdot \sum_{k=1}^{p} r_{x_i x_k}\, r_{x_k y} \qquad (6)$$

Here r_{x_k y} denotes the (sample) correlation between the source x_k and the model output y. If x_i is uncorrelated with x_k, k ≠ i, the PUC simplifies to:

$$\mathrm{PUC}_i = (\mathrm{SRC}_i)^2 \qquad (7)$$

Due to this relationship, the square root of the PUC is often used as a measure for the uncertainty contribution (RTU = RooT of Uncertainty):

$$\mathrm{RTU}_i := \sqrt{\,|\mathrm{PUC}_i|\,} \qquad (8)$$

Being based on the linear regression model (1), the measures presented thus far only capture the linear uncertainty contribution. Their usefulness therefore strongly hinges on the condition that the nonlinearities in the relationship between the 'sources' x_i and the model output y are minor (reflected by R^2 ≈ 1). When encountering important nonlinearities one should resort to other methods, e.g. considering more sophisticated (nonlinear) regression models, or applying suitable transformations on the data (i.e. y and the x_i's) 11.
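A compact sketch of these computations may help the reader: it estimates the coefficients of (1) by ordinary least squares and then evaluates equations (2), (4), (6) and (8). This is an illustrative Python fragment only (UNCSAM itself is implemented in Fortran 77); the function name is ours.

```python
import numpy as np

def linear_uncertainty_measures(X, y):
    """X: (N, p) sampled 'sources'; y: (N,) model output.
    Returns b, SRC, PUC, RTU and R^2 following eqs. (1)-(8)."""
    N, p = X.shape
    A = np.column_stack([np.ones(N), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)        # eq. (1), least squares
    b = coef[1:]                                        # b_1 ... b_p
    y_hat = A @ coef
    r2 = np.var(y_hat, ddof=1) / np.var(y, ddof=1)      # eq. (2)
    src = b * X.std(axis=0, ddof=1) / np.std(y, ddof=1) # eq. (4)
    rxx = np.corrcoef(X, rowvar=False)                  # r_{x_i x_k}
    rxy = np.array([np.corrcoef(X[:, i], y)[0, 1] for i in range(p)])
    puc = src * (rxx @ rxy)                             # eq. (6)
    rtu = np.sqrt(np.abs(puc))                          # eq. (8)
    return b, src, puc, rtu, r2
```

For uncorrelated sources rxx reduces to the identity matrix and r_{x_i y} approaches SRC_i, so puc collapses to (SRC_i)^2, in line with equation (7).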
3. THE SOFTWARE PACKAGE UNCSAM
UNCSAM is a comprehensive and flexible software package for performing sensitivity and uncertainty analyses of mathematical models, based on Monte Carlo techniques in combination with regression and correlation analysis. The main features of UNCSAM are as follows.

Application area: UNCSAM can be used to analyse a broad spectrum of (simulation) models, largely independent of their form and implementation.

Basic components: UNCSAM, version 1.1, comprises a collection of programs, developed for the various activities needed in sensitivity and uncertainty analysis:
• Sampling
• Basic statistical analysis
• Confidence bounds for estimated quantities
• Determination of sensitivity and uncertainty contributions
Additional programs are available for UNCSAM-model interfacing, file manipulation (reading, selecting signals from data files), graphics etc. To be more specific, we present additional information on these items:
• Distributions: Sampling can be done from a variety of continuous probability distributions: (log)uniform, (log)normal, (log)triangular, exponential, logistic, Weibull, Beta.
• Sampling: Use can be made of simple random sampling and of Latin Hypercube sampling. User-specified parameter correlations are taken into account; a correction for spurious correlations can be applied 9.
• Basic statistical analysis: Information on mean, variance, median, skewness, kurtosis, percentiles, correlations and covariances can be computed, and shown in tabular or graphical form. Moreover, empirical distribution functions and histograms can be determined.
• Confidence bounds: The aforementioned statistical quantities are computed as estimates on the basis of a finite number of samples. Information on the accuracy of these statistics can be obtained by computing their confidence bounds. UNCSAM offers the opportunity to compute the confidence bounds for the empirical cumulative distribution function, based on the Kolmogorov statistic 12, or on the binomial model 13.
• Sensitivity and uncertainty contributions: The user can choose amongst a large number (20!) of sensitivity and uncertainty measures, based on linear regression and correlation analysis of the original and the rank-transformed data values 11.
• File organization: UNCSAM generates a large number of files, containing data or information needed in the various stages of the analysis process. The storage load is constrained and loss of accuracy is avoided by storing intermediate numerical results in binary files. The final results (tables and plot instructions) are stored in ASCII files.
• Interfacing: UNCSAM is developed in a model-independent way, irrespective of how the user's model is implemented. This offers the user the flexibility to use the package for many application studies. The user should
however establish an interface between UNCSAM and his/her model. The function of this interface is to pass data files (of the desired format) from the package to the model and vice versa. To enhance ease of use, interface programs are provided for a fairly general class of models (a sketch of such a file-based coupling is given below).

Operational mode: UNCSAM can be used in batch mode as well as in interactive mode. Software is available to generate instruction files for batch-mode execution. A simple menu-driven option is available when working in interactive mode.

Data input: The various programs which constitute UNCSAM require input files containing the necessary information. In most cases these files have been generated before by other application programs of UNCSAM. In a few situations the input files have to be constructed explicitly by the user (e.g. by using a word processor or text editor).

Data output: The final results of the analyses are stored in ASCII files in the form of tables, or in the form of plot instructions for graphical presentation. The tabular information can be displayed on screen, or printed on a hardcopy device. The graphical information can be made visible by the graphical software at hand.

Hardware requirements: The software is written in ANSI standard Fortran 77, embedded in an ANSI C environment (for the overall 'master program' which calls the various Fortran subprograms). Being written in portable code, it can thus be used on a variety of devices, such as PC XT/AT/286/386/486/OS2, VAX, SUN workstations etc. It is advised to use UNCSAM on machines which are equipped with a co-processor. UNCSAM will occupy about 3 Mb of memory.

Documentation: A comprehensive user's manual is available 14. A simple case study is included in the manual as an illustration.

Availability: An executable version of UNCSAM (version 1.1), along with documentation, is available from the RIVM.
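To illustrate how such a loose, file-based coupling can look in practice, the fragment below writes one parameter file per Monte Carlo run, invokes the user's model executable, and collects the outputs for later statistical analysis. This is a minimal sketch assuming a hypothetical executable my_model that reads a parameter file and writes a single output value; the file names and formats are placeholders, not UNCSAM's actual conventions (see the manual 14 for those).

```python
import subprocess
import numpy as np

def run_offline(samples, names, exe="./my_model"):
    """samples: (N, p) array of sampled 'source' values; names: p labels.
    Writes params_k.txt, runs the model, reads back output_k.txt."""
    outputs = []
    for k, row in enumerate(samples):
        with open(f"params_{k:04d}.txt", "w") as f:
            for name, value in zip(names, row):
                f.write(f"{name} {value:.6g}\n")
        # the (hypothetical) model reads the first file, writes the second
        subprocess.run([exe, f"params_{k:04d}.txt", f"output_{k:04d}.txt"],
                       check=True)
        with open(f"output_{k:04d}.txt") as f:
            outputs.append(float(f.read()))
    return np.array(outputs)
```

Because the runs only exchange plain files, the loop body can equally well be executed off-line, even on a different machine: exactly the separation of simulation from sampling and analysis discussed below.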
Finally, some comments are in order concerning the major differences between UNCSAM and software packages like Crystal Ball and @RISK (see Introduction). The application range of UNCSAM differs considerably from that of Crystal Ball and @RISK. The latter two packages are add-ins to spreadsheet software and are thus restricted to spreadsheet models. UNCSAM, on the other hand, enables the analysis of a wide variety of models, largely independent of their form and implementation. UNCSAM thus provides tools for the analysis of many practical application studies for which spreadsheet models are far too restrictive. Moreover, unlike UNCSAM, the packages Crystal Ball and @RISK do not render straightforward information on the sensitivity and uncertainty contributions of the 'sources', information which is of utmost importance in many applications. UNCSAM also offers information on the confidence of the results obtained by the Monte Carlo analysis, which is not the case for Crystal Ball and @RISK.

Analysis by @RISK and Crystal Ball is typically performed in an on-line, interactive fashion. For most applications of these packages this is suitable, since most spreadsheet models do not require much computation time, and Monte Carlo simulation can thus be performed on-line. UNCSAM, however, is often applied to computer-intensive, process-oriented dynamic models, for which the computation time for simulation dominates the time for sampling and subsequent statistical analysis. Since UNCSAM offers appropriate interfacing tools, the simulation activities can be separated from the sampling and the analysis, and can therefore easily be performed off-line, even on a different platform. Moreover, UNCSAM offers the opportunity to perform the sampling and analysis activities in an interactive as well as in a non-interactive (batch) way. This 'batch operation' option is especially useful if similar activities have to be repeated a large number of times.

Being add-ins to spreadsheet software, Crystal Ball and @RISK have excellent graphical and menu facilities, and are extremely easy to use. Compared to this, the capabilities of UNCSAM in these respects are more modest, albeit definitely appropriate. A simple menu option is offered for running the various application programs which constitute UNCSAM. Running these programs is simple and straightforward. The graphical information generated by UNCSAM consists of plot-instruction files, which can easily be processed by the user's graphical software into plots. Crystal Ball and @RISK offer a larger choice of distributions than UNCSAM, although the distributions offered by UNCSAM will be appropriate for most real-life applications.

4. APPLICATION STUDY

UNCSAM has been used in various application studies (e.g. uncertainty analysis of acidification models, an atmospheric process model, a global warming model). Presently it is in use for a sensitivity and uncertainty
study of biospheric models, risk assessment studies in ecological settings, and a sensitivity and uncertainty study of pesticide leaching models. Moreover, it serves as an aid when performing sensitivity analyses for calibration studies. It has also been applied in the field of public health (uncertainties in predicting the spread of infectious diseases such as AIDS).

In order to illustrate the use of UNCSAM, some results are presented of a recent uncertainty analysis of an acidification model 15. The analysed model is the soil acidification model RESAM (Regional Soil Acidification Model), developed at the Winand Staring Centre for Integrated Land, Soil and Water Research. The primary aim of this model is to make predictions of long-term environmental effects of acid deposition on forest soils on a regional scale. RESAM is a dynamic, one-dimensional, multi-layer, process-oriented model, based on the conceptual relationship between forest element cycling and soil acidification. It simulates the major biogeochemical processes occurring in the forest canopy, litter layer and mineral soil horizons. The model consists of a set of mass balance equations, equilibrium equations and rate-limited equations (mostly first-order reactions); the model input includes atmospheric deposition and hydrological data 16.

We will now discuss the various steps which constitute the uncertainty analysis:

1. Problem formulation: Since the results of RESAM are used to analyse the acidification problem and to evaluate the benefit of potential abatement strategies (policy applications), it is imperative that the uncertainty of the model predictions be analysed, particularly due to the lack of sufficient data to calibrate the long-term model. The major aim of the uncertainty analysis is therefore the quantification of the uncertainty in the model responses to a given deposition scenario, due to uncertainty and spatial variability in the data. The secondary aim is to determine which additional data would most improve the reliability of predictions. The analysis is restricted to one specific forest soil ecosystem: a leptic podzol with Douglas fir, subject to a reducing deposition scenario. The soil profile consists of four distinguished horizons (layers): A0 (litter layer, 4 cm), A1 (15 cm), Bh (25 cm) and C (20 cm). The investigated model responses are the pH, the Al/Ca ratio and the NH4/K ratio in the root zone. These variables are generally used as indicators for forest soil acidification and for potential forest damage. All variables are yearly water-flux averaged quantities, considered on a regional scale (the Netherlands is divided into 20 typical receptor areas), during the time period 1987-2010.

2. Inventory of uncertainty sources: The uncertainty
in the model structure is not considered; this will be the subject of a future study in which the impact of various process formulations is investigated. Due to the prohibitively large number of potential uncertainty sources (> 200), a considerable reduction was attempted by neglecting those sources which were a priori regarded as insignificant for the investigated model responses. Applying various additional assumptions, the number of considered uncertainty sources was further reduced to about 80. These sources concerned 15:

(a) Input terms: The applied deposition scenario is based on intended emission reductions in the Netherlands in the time periods 1987-2000 and 2000-2010. The uncertainty in this scenario is restricted to the initial values (1987), comprising 12 sources. The additional uncertainty in the remaining time trajectory, due to political and technological factors, is not considered.

(b) Initial conditions: The amounts and element contents in needles, roots and stems, and the root distribution, for the Douglas fir tree species are considered as uncertain. This comprises 33 sources.

(c) Model parameters: Parameters which characterize the relevant soil and vegetation processes are considered as uncertain. This comprises 34 sources.

3. Quantification of uncertainty sources: The required information was obtained from expert judgment, extensive field research and literature data. Characterization of these uncertainties was an elaborate task. For an exact specification we refer to the literature 15.

4. Evaluation of the effects on the model outputs: Applying the Latin Hypercube sampling technique, 150 samples were determined and the associated model simulations were performed. Attention was especially focussed on the uncertainty over the time range 1987-2010 for the pH, the Al/Ca ratio and the NH4/K ratio in the top of the root zone (layer 1; A1, 15 cm), and the bottom of the root zone (layer 3; C, 20 cm). This uncertainty is presented by:

• showing tables with the mean, standard deviation, and coefficient of variation at the beginning (1987), halfway (2000), and at the end (2010) of the simulation period;
• showing plots with the trajectories of the mean, median (50 percentile), and 2.5 and 97.5 percentiles during the simulation period. Moreover, the model response is given for a so-called reference run, obtained by simulating the model with the parameter values set at their mean values. The motivation for this last piece of information is to check whether a single (reference) simulation run suffices to obtain a realistic impression of the mean behaviour in extensive regional applications;
• showing the histograms of the quantities in 1987, 2000 and 2010.

Moreover, the contribution of the sources to the uncertainty is presented by showing tables with the values of the RTU and SRC, and by plotting the trajectory of the RTU for the three sources which are most important either at the beginning or at the end of the simulation period.

We only discuss the results for the pH, focussing mainly on layer 3. Considering the results in Table 1, it is obvious that the absolute uncertainty (standard deviation) of the pH in the subsoil is slightly higher than in the topsoil, whereas the opposite is true for the relative uncertainty (coefficient of variation). Both the absolute and the relative uncertainty remain fairly constant in both layers during the simulation period. The mean value of the pH in both layers increases during the simulation period, due to the reducing deposition scenario. This behaviour also shows up in the trajectories of the mean, percentiles etc. in both layers during the simulation period; see Fig. 1 for the results of layer 3. This figure moreover shows that the reference run and the mean correspond reasonably well. Notice also a slight difference between the median and the mean, which indicates that the pH distribution is skewed to the left. This is confirmed by the histograms in Fig. 2.
                        Layer 1                  Layer 3
pH                  1987   2000   2010      1987   2000   2010
Mean                2.96   3.10   3.22      4.10   4.21   4.33
Standard deviation  0.106  0.111  0.126     0.131  0.123  0.138
Coef. of variation  0.036  0.036  0.039     0.032  0.031  0.032

Table 1: Mean, standard deviation and coefficient of variation of the pH in layers 1 and 3 in 1987, 2000 and 2010.
Figure 1: Temporal evolution of the mean, median, 2.5 and 97.5 percentiles, and the reference run of the pH in layer 3.
Before studying the uncertainty contribution of the individual 'sources', one should first check whether the underlying simple linear regression model gives an adequate fit to the simulated model responses. The results on the R^2 in Fig. 3 show that this is indeed the case.

According to the results in Table 2 and Fig. 4, the uncertainty of the pH in layer 3 is mainly determined by the aluminium dissolution dynamics, characterized by the equilibrium constant of aluminium hydroxide (KAlox) in layer 3 (saturation). This quantity determines the H+ buffering by aluminium hydroxides. Its uncertainty contribution decreases during the simulation period, due to the decreasing deposition. Next to the parameter KAlox, the uncertainty in the pH is most determined by the dry deposition fluxes of NH3 (FNH3,dd) and SO2 (FSO2,dd), which are the main contributors to the external acid load. The difference between the corresponding values of the SRC and the RTU for FNH3,dd and FSO2,dd in Table 2 is mainly due to the strong correlation between these quantities. The SRC does not account for correlations, and is relatively small for FSO2,dd as compared to FNH3,dd. For a more elaborate discussion of the uncertainty contributions of the various sources see [15].
Figure 2: Histograms of the pH in layer 3 at the beginning and at the end of the simulation period.
Figure 3: Temporal evolution of the regression R^2 of the pH in layer 3.
Source       RTU 1987    SRC 1987     RTU 2000    SRC 2000     RTU 2010    SRC 2010
KAlox        0.90 (1)    0.91 (1)     0.89 (1)    0.89 (1)     0.80 (1)    0.81 (1)
FNH3,dd      0.16 (2)   -0.17 (2)     0.16 (2)   -0.19 (3)     0.21 (3)   -0.18 (4)
FSO2,dd      0.15 (3)    0.05 (17)    0.15 (3)    0.08 (8)     0.21 (4)    0.01 (67)
FNO2,dd      0.12 (4)   -0.09 (5)     0.12 (4)   -0.10 (6)     0.18 (5)   -0.12 (8)
KK,ex,1      0.10 (5)   -0.14 (3)     0.10 (6)   -0.14 (4)     0.09 (10)  -0.12 (7)
kNa          0.09 (6)    0.09 (6)     0.10 (7)    0.10 (7)     0.15 (7)    0.15 (5)
KNH4,ex,3    0.07 (7)   -0.05 (14)    0.08 (9)   -0.02 (36)    0.08 (11)  -0.01 (66)
kni,3        0.07 (8)   -0.06 (12)    0.04 (22)  -0.03 (22)    0.05 (30)  -0.04 (31)
KNH4,ex,1    0.06 (9)    0.05 (18)    0.06 (12)   0.05 (15)    0.06 (19)   0.06 (12)
FCl,dw       0.06 (10)  -0.11 (4)     0.11 (5)   -0.20 (2)     0.23 (2)   -0.36 (2)

Table 2: Uncertainty contributions (RTU, SRC) and rankings for the uncertainty sources of the pH in layer 3 in 1987, 2000 and 2010.
Figure 4: Temporal evolution of the RTU between the model parameters (KAlox, FNH3,dd, FSO2,dd) and the pH in layer 3.

5. CONCLUSIONS

Sensitivity and uncertainty analysis should be an essential part of the model building activities, especially in the context of ill-defined, uncertain, or information-poor application areas. Performing these analyses in a structured way will not only convey the crucial aspects of a model and the origins and effects of model uncertainties, but will also lead to increased insight
in the model and its reliability/validity, and provide useful suggestions for well-structured further research (model development, calibration, data acquisition). The software package UNCSAM, which is based on Monte Carlo techniques in combination with regression and correlation analysis, is a useful tool for actually performing these analyses on a large variety of simulation models, largely independent of the way in which the models are implemented. Sampling can be performed from a large number of distributions, and many ways are offered, graphical as well as tabular, to obtain a clear and complete picture of the uncertainties and their main contributors.

ACKNOWLEDGEMENT

We thank J. Kros and W. de Vries of the Winand Staring Centre for Integrated Land, Soil and Water Research for their permission to present part of the results on the uncertainty analysis of RESAM, and for providing the illustrations in the text.
REFERENCES

1. Iman, R.L., Helton, J.C. An investigation of uncertainty and sensitivity analysis techniques for computer models. Risk Analysis, 1988, 8, 71-90.
2. Iman, R.L., Helton, J.C. A comparison of uncertainty and sensitivity analysis techniques for computer models. Report NUREG/CR-3904, U.S. Nuclear Regulatory Commission, Washington, 1985.
3. Rose, K.A., Swartzman, G.L. A review of parameter sensitivity methods applicable to ecosystem models. Internal report NUREG/CR-2016, U.S. Nuclear Regulatory Commission, Washington, 1981.
4. Janssen, P.H.M., Slob, W., Rotmans, J. Sensitivity analysis and uncertainty analysis: an inventory of ideas, methods and techniques (in Dutch). Report 958805001, RIVM, Bilthoven, the Netherlands, 1990.
5. McKay, M.D., Beckman, R.J., Conover, W.J. A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics, 1979, 21, 239-245.
6. Iman, R.L., Conover, W.J. Small sample sensitivity
analysis techniques for computer models, with an application to risk assessment. Communications in Statistics, 1980, A9, 1749-1842.
7. Crystal Ball, Version 2.0. Decisioneering, Inc., 1727 Conestoga Street, Boulder, CO 80301, USA.
8. @RISK, Version 2.01. Palisade Corporation, 31 Decker Road, Newfield, New York 14867, USA.
9. Iman, R.L., Conover, W.J. A distribution-free approach to inducing rank correlations among input variables. Communications in Statistics, 1982, B11, 311-334.
10. Stein, M. Large sample properties of simulations using Latin Hypercube Sampling. Technometrics, 1987, 29, 143-151.
11. Janssen, P.H.M. Assessing sensitivities and uncertainties in models: a critical evaluation. Conference on Predictability and Nonlinear Modelling in Natural Sciences and Economics, Wageningen, The Netherlands, 5-7 April 1993.
12. Conover, W.J. Practical Nonparametric Statistics. John Wiley & Sons, Inc., New York, 1971.
13. Andres, T.H. Confidence bounds on an empirical cumulative distribution function. Internal report AECL-8382, Whiteshell Nuclear Research Establishment, Pinawa, Canada, 1986.
14. Janssen, P.H.M., Heuberger, P.S.C., Sanders, R. UNCSAM 1.1: a Software Package for Sensitivity and Uncertainty Analysis; Manual. Report 959101004, RIVM, Bilthoven, the Netherlands, 1992.
15. Kros, J., de Vries, W., Janssen, P.H.M., Bak, C.I. The uncertainty in forecasting regional trends of forest soil acidification. Water, Air and Soil Pollution, 1993, 66, 29-58.
16. De Vries, W., Kros, J. The long-term impact of acid deposition on the aluminium chemistry of an acid forest soil. In Regional Acidification Models, ed. Kämäri, J., Brakke, D.F., Jenkins, A., Norton, S.A., Wright, R.F., pp. 113-128, Springer-Verlag, Berlin, Heidelberg, 1989.