PII:
Atmospheric Environment Vol. 32, No. 21, pp. 3619—3628, 1998 ( 1998 Published by Elsevier Science Ltd. All rights reserved Printed in Great Britain S1352–2310(97)00419–6 1352—2310/98 $19.00#0.00
MONTE CARLO ESTIMATES OF UNCERTAINTIES IN PREDICTIONS BY A PHOTOCHEMICAL GRID MODEL (UAM-IV) DUE TO UNCERTAINTIES IN INPUT VARIABLES STEVEN R. HANNA,*- JOSEPH C. CHANG and MARK E. FERNAU Earth Tech, Inc., 196 Baker Ave., Concord, MA 01742, U.S.A. (First received 20 March 1997 and in final form 22 August 1997. Published August 1998) Abstract—Because photochemical grid models such as UAM-IV are being used to make policy decisions concerning emissions controls, it is important to know (1) the uncertainties in the model predictions due to the combined effects of uncertainties in the full set of input variables, and (2) the individual input parameters whose variations have the greatest effect on variations in model predictions. A preliminary Monte Carlo uncertainty analysis system has been developed and the methodology has been demonstrated using an application of the standard U.S. regulatory model, UAM-IV, to the 230 km by 290 km New York City domain for the 6—8 July 1988 ozone episode. As a first step, ten modeling experts were asked to estimate the typical uncertainties of 109 UAM-IV input parameters, including 23 variables related to emissions, boundary conditions, and meteorological conditions; and 86 variables related to chemical rate constants. For many of the model inputs, the assumed range of uncertainty was about plus or minus 30% of a normal mid-range value, and, in most cases, the distributions were assumed to have a log-normal shape. The regulatory agency’s ‘‘base run’’ application of UAM-IV to this ozone episode was used to define the mid-range or median values of all input parameters. 50 Monte Carlo UAM-IV runs were then carried out by simple random sampling of each of the 109 input parameters from the assumed distributions. The 50 predicted values of peak hourly averaged ozone concentrations anywhere on the geographic domain for the episode were found to follow a log-normal distribution and exhibit a variability from 176 to 331 ppb. The locations of the 50 predicted ozone peaks varied from 100 km upwind (southwest) of New York City to 150 km downwind (northeast) of the city. Variability in the input parameter known as the anthropogenic volatile organic compound (VOC) area source emissions had the most influence on the variations in the 50 predicted peak ozone concentrations. ( 1998 Published by Elsevier Science Ltd. All rights reserved
1. OBJECTIVES AND BACKGROUND
The 1990 Clean Air Act Amendments (CAAA) passed by the U.S. Congress require that photochemical grid models be used to demonstrate that candidate emissions control strategies will provide desired reductions in ozone concentrations. The Urban Airshed Model — Carbon Bond IV (UAM-IV) is a standard regulatory model recommended by the U.S. Environmental Protection Agency (EPA, 1991; Scheffe and Morris, 1993) and is the most commonly used model in the U.S. for these applications. However, comprehensive reviews by the National Research Council (1991) and by Barchet et al. (1994) point out that there are uncertainties in all photochemical grid models that create a risk of developing inaccurate emissions control recommendations. The primary objective of
*Correspondence should be addressed to: Steven R. Hanna, CSI Mail stop SC3, 103 Science and Technology I, George Mason University, Fairfax, VA 22030-4444, U.S.A. Tel.: 703 993 1992, Fax: 703 9931980, e-mail:
[email protected]. - Current Affiliation: George Mason University, Fairfax, VA 22030-4444, U.S.A.
the current study is to help to quantify part of that risk by developing a preliminary methodology for assessing the uncertainties in the predictions of photochemical grid models due to typical uncertainties in a full set of input parameters. The uncertainty/assessment methodology should be sufficiently general that it can be applied to a wide range of models and domains and it can be included as a routine in a comprehensive modeling system (Hansen et al., 1994). It is noted that the uncertainty analysis methods discussed in this paper address only one component of the total model uncertainty, namely the component due to uncertainties in specifying input parameters. Not addressed are other components of total model uncertainty, such as uncertainties due to deficiencies in model physics and chemistry assumptions, and uncertainties due to natural or stochastic fluctuations in the atmosphere. Furthermore, no comparisons with actual observations are given in this paper. Previous comparisons by Hanna and Fernau (1997) of the predictions of the photochemical grid model UAM-IV with observations on the New York domain for 7—8 July 1988 showed that there are biases of #12% on July 7 and #14% on July 8 in predictions
3619
3620
S. R. HANNA et al.
of peak hourly averaged ozone concentrations anywhere on the geographic domain. A relative scatter of 39% was found on 7 July and 25% on 8 July in predictions of hourly averaged ozone concentrations at given monitor locations and times. Errors of about a factor of two were found in predictions of hourly averaged NO and VOC concentrations at given rex ceptor locations. A survey of a wide range of related literature demonstrated no shortage of methods available for analyzing uncertainties in model predictions due to variations in input parameters. Questions concerning model uncertainty arise in all scientific and engineering fields. Unfortunately for photochemical grid modelers, their problem lies at the high end of the complexity range, due to the large numbers of input parameters, the many equations and parameterizations, the thousands of grid points, and the high time resolution required to solve the differential equations. Several optional approaches for analyzing the uncertainty of complex modeling systems are summarized below. Uncertainties in modeling systems are often studied using sensitivity analysis procedures. It is assumed that a model consists of a set of equations with m dependent or ‘‘output’’ variables and n independent variables plus input parameters (Rabitz et al., 1983). The sensitivity coefficient, S, can be defined as the ratio of the fractional change in an output variable to the corresponding fractional change in an input variable. As with any differential procedure, it is implied that the fractional changes are small (say, less than 10%). The combined effects of variations in multiple input parameters can be estimated by assuming that there are no correlations among variables and there are no nonlinear effects, giving the result that the total fractional uncertainty in a given dependent or ‘‘output’’ variable is the square root of the sum of the squares of each individual sensitivity coefficient. Because the sensitivity coefficient approach is less accurate for input parameters that exhibit relatively large uncertainties (e.g. larger than 10—20%), for systems that are nonlinear, and for systems where input variables are strongly correlated, it is less useful for large photochemical grid models such as UAM-IV. Advanced mathematical procedures and software systems have been developed that allow sensitivity coefficients to be evaluated for more complex modeling systems. These include the direct decoupled method (DDM) applied by Dunker (1981, 1984), Milford et al. (1992), and Gao et al. (1995); the Fourier Amplitude Sensitivity Test (FAST) applied by McRae et al. (1982); the Green’s Function or Adjoint Green’s Function methods discussed by Rabitz et al. (1983) and Rabitz and Hales (1995); and the automatic differentiation technique tested by Carmichael et al. (1997). Nearly all applications of these advanced methods have been to subsets of the full models, with emphasis on the equations describing photochemical mechanisms.
Stochastic Response Surface Models (SRSMs) represent a relatively new approach that may yield useful results for some classes of simplified photochemical grid models. The software system transforms the model’s input variables into a set of independent orthogonal variables, which then allow the model’s output variables to be expressed as functions of these same variables. The method is computationally efficient as long as the number of degrees of freedom is not very large, and it can be applied to linear models, to some nonlinear models, and to some ‘‘black-box’’ models. The SRSM approach can also be thought of as a ‘‘model of a model’’. The challenge is to express the photochemical grid model in a form that can be analyzed by the SRSM. An example of a SRSM is the Deterministic Equivalent Modeling Method (DEMM) discussed by Tatang (1995) and Isukapalli et al. (1995). Thus far SRSMs have been tested with limited sets of inputs and have not been applied to the full set of input parameters required for a photochemical grid model such as UAM-IV. Monte Carlo methods have been applied for about 20 yr to both simple atmospheric dispersion models such as the Gaussian plume model (e.g. Irwin et al., 1987) and to complex photochemical models (e.g. Stolarski et al., 1978; Thompson and Stewart 1991; Gao et al., 1996; Yang et al., 1995). Ang and Tang (1984) provide a good general description of the Monte Carlo method. The advantages of the Monte Carlo method are that (1) it can be applied to a complete set of about 100 or more input parameters, (2) it allows useful estimates of the uncertainties in model outputs to be obtained with only 50 or 100 model runs, (3) it allows use of standard nonparametric statistical tests concerning confidence intervals, and (4) it is widely used in the analysis of other environmental problems. The above comments apply to the socalled Monte Carlo Simple Random Sampling (SRS) method, as distinguished from the Latin Hypercube Sampling (LHS) method, which is also used in many applications. The SRS method employs straight-forward independent random sampling from the entire distribution for each sample. However, the LHS method forces the distribution of random samples to be more equally spread across the specified probability density function, which is thought to be useful when large numbers of samples are not possible. The LHS method does not allow the application of standard nonparametric statistical tests, and it is often found to underestimate the total variance of the output parameters (NCRP, 1996). The SRS Monte Carlo method was selected for use in this preliminary study and is more thoroughly described in the next section.
2. MONTE CARLO METHOD
The Monte Carlo method (with Simple Random Sampling) for uncertainty analysis is quite simple in
Uncertainties in predictions by UAM-IV
principle (Ang and Tang, 1984; NCRP, 1996). The major challenge with the method is to efficiently carry out many runs of a complex photochemical grid model such as UAM-IV, and then to distill the millions of three-dimensional, gridded, time-dependent outputs into a few useful conclusions. First, the 100 or more important input variables must be identified and their uncertainty ranges and shapes of their probability density functions (PDFs) prescribed. There are no restrictions on the shapes of the PDFs, although most studies make use of a few basic PDF shapes, such as normal, log-normal, uniform, or triangular. Maximum and minimum limits on each variable can be prescribed to prevent unrealistic selection of extreme values. Next the method randomly selects a new set of input parameters (the selection process can assume independence among the input parameters or can prescribe correlations, if known). The modeling system is then run to generate results using the set of input parameters that was selected. The process is repeated many times (there is always a compromise that must be reached concerning the optimum number of runs, since each model run takes several hours on a computer workstation). For each model output variable, a PDF is generated. It is important to stress again that the input and output variables can have any shape PDF. As explained by Conover (1971), the ‘‘confidence’’ in the results may be interpreted through statistical tolerance limits, which show that the SRS Monte Carlo uncertainty analysis is valid even for a limited number of trials (say 50 or 100). Tolerance limits differ from statistical confidence intervals in that the tolerance limits provide an interval within which at least a proportion, q, of the population lies, with probability, p or more, that the stated interval does indeed ‘‘contain’’ the proportion, q, of the population. For example, the minimum and maximum values obtained from a simple random sample of size 50 will produce 95% statistical confidence that this range contains (at least) 90% of the population of possible values. This procedure is valid only when the Monte Carlo simulations are based on Simple Random Sampling (SRS). To date, there is not an acceptable technique for producing statistical tolerance limits using Latin Hypercube Sampling (LHS). The use of statistical tolerance limits in uncertainty analysis is discussed in both IAEA Safety Series No. 100 (IAEA, 1989) and NCRP Commentary No. 14 (NCRP, 1996). A further objective of this study is to generalize the methodology so that it can be included in a comprehensive modeling system (Hansen et al., 1994) and applied to any photochemical grid model. A preliminary software system has been developed that is modular so that the user can specify different input parameters, PDFs, and models relatively easily. Portions of the preliminary code are generic to any Monte Carlo application and portions are specific to the chosen model. The user chooses which output parameters to save and to analyze. The demonstra-
3621
tion system includes a set of analysis programs and plotting tools (some examples of resulting scatterplots and correlation summaries are given later). A statistical package is included that allows calculation of standard statistics and confidence intervals using the bootstrap method as described by Hanna (1989) and Olesen (1995). The code can be easily modified to add new analysis methods. Interested readers can obtain the diskette containing this preliminary Monte Carlo code framework from the authors. It must be emphasized that the current uncertainty analysis method deals with only one component of the total model uncertainty and says nothing about the component of model uncertainty associated with the lack of validity of the physics and chemistry algorithms in the model itself. The model is assumed to be a true representation of atmospheric processes. Also, the analysis methods do not deal with the component of model uncertainty associated with natural or stochastic (turbulent) fluctuations that occur in the atmosphere. When compared with observations, the best photochemical grid models typically show uncertainties of about 10% in predicting peak daily hourly averaged ozone concentrations, and the uncertainty grows to about a factor of two in predicting hourly averaged NO and VOC concentrations at fixed rex ceptor sites (Scheffe and Morris, 1993; Hanna et al., 1996; Hanna and Fernau, 1997). These ‘‘minimum uncertainties’’ represent the sum of the contributions of the three components of model uncertainty, and can be thought of as the limits to accuracy of current photochemical grid models.
3. UNCERTAINTIES IN UAM-IV INPUT PARAMETERS
The uncertainty in model predictions due to uncertainties in input parameters is a function of the magnitudes and shapes of the PDFs of the uncertainties in individual inputs. Correlations between inputs may also be important in some cases. As described by Morgan and Henrion (1990), the information concerning the individual inputs can be estimated in many ways, with the most detailed method involving a formal elicitation process in which the opinions of experts are obtained by means of written communications, face-to-face meetings, and multiple model runs by individual experts. Less detailed informal methods can also be followed which involve only limited questionnaires and no face-to-face meetings. It is the case that, for most large environmental models, the uncertainties in inputs are not well-known and are not often discussed in the literature. An example of an exception is the article by Atkinson et al. (1992), who discuss uncertainties in the chemical rate parameters used in regional photochemical grid models. The preliminary study described in this paper is based on an informal elicitation method that used responses to questionnaires that were mailed to ten
3622
S. R. HANNA et al.
experts, whose backgrounds represent the four major components of the model (emissions, boundary conditions, meteorology, and chemistry). The questionnaire sent to the experts did not mention any specific geographic domain or time period, and emphasized the EPA’s (1990, 1991) regulatory photochemical grid model, UAM-IV. Consistent with definitions in the model, the input variables are assumed to be constant over a grid square (5 km by 5 km horizontal spacing and 50 m to about 500 m vertical spacing) and over a time period of about one hour. Of course, the uncertainties in various inputs will vary with location, with episode type, with time of day, and with many other influences. Variations in some types of input variables were not included in our preliminary analysis, since they would require major effort for each new Monte Carlo run. A good example of such an input is grid size (5 km in this UAM-IV study), which is known to affect the model predictions but which would require substantial additional effort to re-interpolate the emissions fields and other input fields and rerun the preprocessors if grid size were varied for each new Monte Carlo run. The grid size is an example of a ‘‘model domain parameter’’ referred to by Morgan and Henrion (1990). For ease in application of this preliminary method, spatial- and time-variable inputs such as wind speed, wind direction, and mixing depth were assumed to be uniformly perturbed by a constant factor across the entire domain (e.g. all wind directions are perturbed by #12° on all grid squares). It would be possible to perturb the velocity field more realistically, but it would be necessary to be sure that known relations for mesoscale and regional space and time energy spectra were satisfied. Perturbations in a few parameters, such as vertical diffusivity or dry deposition velocity, had to be implemented by adding a few statements to the UAM-IV code to read information on the Monte Carlo samples and to apply the perturbations. The median values of all input parameters were assumed to be those used in the New York State Implementation Plan (SIP) application of UAM-IV to the 6—8 July 1988 ozone episode on the New York City UAM domain (NYSDEC, 1994), with the exception that the meteorological input files were from the Modeling Ozone Cooperative (MOCA) study (Fernau et al., 1995). The MOCA study used essentially the same meteorological preprocessing software and observational inputs as were used by NYSDEC (1994), but the methods of determining vertical wind extrapolation and overwater temperatures differed slightly. The 6—8 July 1988 episode was one of the worst that has occurred in the Northeast U.S. in the past 15 yr. Like most ozone episodes in the region, it was characterized by hot weather, mostly clear skies, a widespread high pressure system, and light winds with general flow from the southwest to northeast (i.e. the winds were lined-up along the axis of the U.S. east coast megalopolis). The ambient air quality standard
for ozone (120 ppb for a one-hour average) was exceeded on 6—8 July 1988 over a broad area extending from Virginia to Maine. For this exercise, the probability density functions (PDFs) of the input variables were assumed to be either clipped normal or clipped log-normal. The distributions were clipped at the high and low ends to avoid unphysical out-of-range extremes. Log-normal distributions are found to be more appropriate for environmental variables that are widely distributed and that tend to have a few large values (e.g. wind speed). Normal distributions are more appropriate for variables such as wind direction and temperature. A Standard Random Sampling (SRS) code was applied to the transformed normal variables and the arbitrary assumption was made that only resampled variables that were within $ two standard deviations of the median of the transformed normal variable were used. Correlations among perturbations in input variables were not accounted for in this preliminary test of the methodology. The NCRP (1996) document points out that, for 50 random samples, the 95% confidence bars on any correlation coefficients with magnitudes less than about 0.3 will overlap zero, thus implying that these correlations will have little influence on the variance or uncertainty of the output variables. For photochemical grid models, there are few, if any, pairs of input variables whose perturbations have correlations with magnitudes exceeding about 0.3. Further tests could investigate the effects of any correlations exceeding 0.3 in more detail. However, the experts currently have a difficult enough time estimating the uncertainty of individual input variables without also having to estimate the correlation between pairs of input variables. The correlation itself would have a large uncertainty. Table 1 lists the 109 input variables for the regional photochemical grid model, UAM-IV, that were used in the analysis. The preliminary estimates of uncertainty ranges (intended to include 95% of the data) are given, based on information received from the ten experts. The variables fall into four groups: emissions (variable numbers 1—6), initial and boundary chemical concentrations (variable numbers 7—12), meteorological conditions (variable numbers 13—23), and chemical reactions (variable numbers 24—109). The chemical reaction rate constants for the 86 Carbon Bond-IV reactions are all assumed to have an uncertainty of $30%, on the basis of the advice of the four chemists who responded to the questionnaire. Future application of the method could use variable values of the uncertainties in the rate constants (e.g. Atkinson et al., 1992) and could account for groupings of the chemical mechanisms. Readers interested in discussions of the set of 86 chemical equations and associated rate constants are referred to the model’s technical document (EPA, 1990). It is noted that there are also ‘‘uncertainties’’ in the uncertainty estimates in Table 1. Because of a lack of
Uncertainties in predictions by UAM-IV
3623
Table 1. Listing of 109 UAM-IV (EPA, 1990) input variables and assumed 95% uncertainty ranges (with respect to the medians) Number 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24—109
Input variable Emissions Group Anthropogenic NO area source emiss. x Anthropogenic VOC area source emiss. Anthropogenic NO point source emiss. x Anthropogenic VOC point source emiss. Biogenic NO emissions x Biogenic VOC emissions
95% Uncertainty range
$40% $80% $30% $50% $Factor of two $Factor of two
PDF L-N L-N L-N L-N L-N L-N
Initial and boundary chemical concentrations group Initial ozone concentration $30% L-N Initial NO and VOC concentration $50% L-N x Boundary ozone concentration $30% L-N Boundary NO and VOC concentration $80% L-N x Top ozone concentration $70% L-N Top NO and VOC concentration $Factor of two L-N x Meteorological conditions group Wind speed $30% L-N Wind direction $0.52 rad ($30°) N Mixing depth $30% L-N Vertical diffusivity K $50% L-N z Relative humidity $30% N Surface temperature $3 K N Stability class $One class N Temperature gradient in day mixed layer $0.002 K/m N Temperature gradient at night and aloft $0.005 K/m N Region top $30% L-N Deposition velocity v $50% L-N $ Chemical reaction group Chemical rate constants for UAM-IV reactions 1—86 (all assumed to have an uncertainty of $30%, a L-N distribution, and to vary independently). See EPA (1990) for reactions.
Note. Log-Normal and normal distributions are denoted by L-N or N, respectively.
detailed observations over a wide range of conditions, the experts’ confidence in their uncertainty estimates can be as high as a factor of two in some cases. To simplify the generation of input files for new model runs in this preliminary demonstration exercise, some cases of ‘‘double counting’’ of input variable perturbations were allowed. For example, the vertical diffusivity K and the deposition velocity v are not z $ strictly ‘‘input’’ variables but are calculated inside the code. They are functions of other input variables such as wind speed and stability class. Therefore they were first calculated by the code using the new (perturbed) wind speed and the new stability class, and then a new line of code multiplied these values by the randomly sampled perturbation for K or v . The compounded z $ uncertainty in K and v may be as high as about two z $ times the values (plus and minus 50%) listed in Table 1. In addition, the non-photolytic chemical rate constants are functions of both the temperature perturbation and the randomly sampled perturbation for the chemical rate constant. The effect of the temperature perturbation is expected to be minor when compared with the 30% uncertainty assumed for the chemical rate constants in Table 1.
4. VARIABILITY IN OZONE PREDICTIONS DUE TO UNCERTAINTIES IN INPUTS
Fifty Monte Carlo runs were made, with the number primarily limited by the fact that a single threeday UAM-IV run requires about 1 day of time on 2 a computer workstation. As mentioned in Section 1, for 50 samples we have 95% confidence that the range (minimum to maximum of the 50 values of any output variable) will contain 90% of the population of possible values. These are satisfactory ‘‘tolerance limits’’ for large environmental models (NCRP, 1996). With this number of Monte Carlo samples, any correlations among input variables less than about 0.3 will have no significant effect on the output statistics. Furthermore, since the total uncertainty of predictions by UAM-IV of peak daily hourly averaged ozone concentration anywhere on the domain is known to be about 10 to 20%, there are diminishing returns in carrying out many more Monte Carlo runs in order to refine the outputs. The preliminary analysis of the outputs of the Monte Carlo runs is complicated by the sheer volume of output data (three-dimensional, time-dependent fields of all output variables for all 50 Monte Carlo
3624
S. R. HANNA et al.
runs). The current demonstration exercise has been limited to analysis of predictions of ozone in the lowest UAM-IV grid layer, whose horizontal dimensions are 5 km by 5 km per grid cell, and whose thickness varies from 50 m at night to as much as 1 to 2 km during the day. The analysis is centered on 7 and 8 July, since on 6 July the model is still moving toward an equilibrium state in relation to the initial inputs. Following the NCRP’s (1996) advice to first ‘‘look at the data,’’ many scatter plots, contour plots, time series plots, and distribution plots such as Fig. 1 were created. The location and magnitude of the 50 predicted peak hourly averaged domain-wide ozone concentration maxima for 7 or 8 July are plotted in Fig. 1. The locations of the 50 predicted maxima extend over a broad region covering a range of about 250 km. It is seen that a large clustering or grouping of the predicted maximum ozone concentrations occurs about 50—100 km northeast (downwind) of New York City, and that another clustering or grouping occurs near the southwest (upwind) boundary. Winds tended to flow from southwest to northeast during this episode. The first grouping is found to be related to positive perturbations of anthropogenic VOC area source emissions, while the second grouping is found to be related to positive perturbations of boundary concen-
trations of ozone. The grouping northeast of New York City contains the five highest ozone predictions. Only two of the 50 maxima occur in the same grid square as the maximum predicted by the base run (near the 205 and 239 ppb values just east of Receptor 8 in New York City). The 50 predicted concentrations in Fig. 1 range from a minimum of 176 ppb to a maximum of 331 ppb. We have 95% confidence that this interval contains 90% of the population. The cumulative distribution function (CDF) for the 50 predicted domain-wide peak hourly averaged ozone concentrations is shown in Fig. 2, which includes 95% confidence limits. The figure also contains a best-fit three-parameter log-normal distribution curve. The PDF for this curve is given by the formula: 1 p(ln(O !C))" 3 J2npln(O !C) 3
G A
BH
1 ln(O !C)!k(ln(O !C)) 2 3 3 ]exp ! pln(O !C) 2 3
(1)
where O is the predicted ozone concentration and 3 C is a constant with the same units as the ozone concentration. The CDF is obtained by integrating
Fig. 1. Map of 230 km by 290 km New York geographic domain used in the UAM-IV modeling uncertainty study. Larger numbers and dots with names indicate locations of the ten receptors being studied. Smaller numbers indicate magnitudes (ppb) and locations of UAM-IV predicted domain-wide peak hourly averaged ozone concentrations on 7 or 8 July 1988 for the 50 Monte Carlo runs. In some cases, the numbers are offset by a small amount in order to avoid having them lie on top of one another. The location of the predicted maximum for the UAM-IV base run (which used mean values of all input variables) was in New York City, near the Mabel Dean receptor site.
Uncertainties in predictions by UAM-IV
3625
Fig. 2. Cumulative distribution function (CDF), shown as solid line, for the 50 Monte Carlo runs with UAM-IV for the predictions of the domain-wide peak hourly averaged ozone concentration (ppb). 95% confidence bounds on the CDF are shown as dashed and dotted lines, and the 50 predictions are shown as solid squares. The best-fit three-parameter log-normal distribution is given in equations (1) and (2).
the PDF: ln(O !C) 3
CDF(ln(O !C))" 3
: p(x) dx. ~=
(2)
The three best-fit parameters in equation (1) are: C"175 ppb, k(ln(O !C))"3.64, pln(O !C)"0.66. 3 3 The three-parameter log-normal distribution fits these data best because of the long tail at high concentrations and because of the bunching of the points near the lowest concentration of 175 ppb. Note that it is implied by the shape of the distribution that there is a 2% probability that the predicted concentration will be even higher than the maximum of 331 ppb produced by the 50 Monte Carlo runs. Equations (1) and (2) can be used to estimate the probability of exceeding any concentration of interest. Of course, because of physical and chemical constraints, it is not appropriate to extend the curve in Fig. 2 indefinitely. Also, it should be remembered that this curve is valid only for the UAM-IV model applied to the New York domain for the 6—8 July 1988 time period. The ten receptor locations (indicated by large numbers and dots) shown in Fig. 1 were selected as representative sites for special analysis of the variability of the peak hourly averaged ozone predictions of the 50 Monte Carlo runs for 7—8 July 1988. Some characteristics of the distributions of the sets of 50 predicted Monte Carlo peak ozone concentrations at the ten receptors are compared in Table 2. The distribution of the 50 domain-wide peak ozone concentrations is also included at the bottom of the table. Several key points on the CDFs are listed, including the minimum, me-
dian, and maximum, and the 95% probability ranges (based on the 2nd and 49th ranked values of the predicted ozone concentrations). The probability ranges are expressed both as absolute magnitudes in ppb, and as percentages of the medians. Since the 95% probability ranges are defined by the mean plus and minus two standard deviations, then the standard deviations of the normalized concentrations would be about one-fourth as large as the percentages for the entire 95% range shown on the right side of the table. Recall from Section 1 that it can be stated that, for 50 Monte Carlo runs, we have 95% confidence that the interval defined by the predicted maxima and minima in Table 2 encompasses 90% of the population. The ranges of uncertainty in predicted concentrations at the ten stations listed in Table 2 are seen to be different for two geographic zones — one geographic zone (receptors at E. Fishkill, NY; Greenwich, CT; and Mabel Dean, NY) represents the urban plume in and just downwind of New York City, and the other geographic zone (all other receptors) represents more rural areas. The first geographic zone shows about twice as much uncertainty (about 60 to 80%) in predicted maximum ozone concentrations. The Mabel Dean receptor site is nearest New York City and shows the largest variability, partly due to ozone ‘‘scavenging’’ that occurs in the urban plume for Monte Carlo runs with relatively large positive perturbations in NO emissions. The 54% uncertainty x that is found for the domain-wide ozone maxima is between the uncertainties for the two categories mentioned above. Also note that all ozone concentration points on the domain-wide CDF are about 30 ppb or more higher than the highest ozone concentration points for the individual stations. As the number of
S. R. HANNA et al.
3626
Table 2. Characteristics of the distribution of the sets of 50 predicted UAM-IV peak hourly averaged ozone concentrations (in ppb) for 7—8 July 1988 from the Monte Carlo uncertainty runs 95% Probability range Receptor 1 Catskill, NY 2 Oxford, MA 3 E. Hartford, CT 4 E. Fishkill, NY 5 New Haven, CT 6 Babylon, NY 7 Greenwich, CT 8 Mabel Dean, NY 9 Plainfield, NJ 10 Flemington, NJ Domain-Wide Peak
Min. (ppb)
Med. (ppb)
Max. (ppb)
Expressed in ppb
Expressed as Percent of Median! (%)
72 98 115 126 116 109 127 94 135 154 176
90 123 137 179 141 137 177 135 163 181 213
126 157 161 266 169 165 265 233 210 214 331
78—121 98—145 116—155 136—258 120—169 122—163 133—251 101—218 141—194 154—212 184—298
48 38 28 68 35 30 67 87 33 32 54
Note. Data are given for the ten receptors (see Fig. 1 for locations) and for the domain-wide peak. ! Percentage is calculated by taking the difference between the 2nd and 49th-ranked predicted concentrations and dividing by the median.
receptors being analyzed becomes larger, the maxima for the receptors will approach the domain-wide maxima. An effort was made to identify those input parameters whose variations have a relatively strong effect on the variations of the outputs described above and summarized in Table 2. With only 50 Monte Carlo runs, it is not wise to carry any correlation analyses too far, due to the influence of inevitable spurious correlations that occur between the input variables. Nevertheless, a few relatively strong correlations (0.60 to 0.70) appeared between variations in predicted ozone concentrations at the receptor locations in Fig. 1 and variations in some individual input parameters, as described below. For example, this limited analysis suggested that the variations in predicted peak hourly averaged ozone concentration over the entire domain are most strongly related to variations in anthropogenic VOC area source emissions, a result that is consistent with UAM-IV model evaluation and emissions sensitivity studies on the New York domain by Hanna and Fernau (1997). The relatively strong dependence on anthropogenic VOC area source emissions is also found at individual receptors in and just downwind of New York City (i.e. Mabel Dean, NY; E. Fishkill, NY; and Greenwich, CT). Variations in peak predicted hourly averaged ozone concentrations at receptors located near upwind boundaries (i.e., Catskill, NY and Flemington, NJ) are found to be most strongly dependent on variations in boundary ozone concentrations. Variations in predicted peak ozone concentrations at the farthest-downwind rural receptor in Fig. 1 and Table 2 (Oxford, MA) are found to be most strongly correlated with variations in wind direction, which tend to shift the position of the broad urban ozone plume extending downwind from the New York City source region.
5. IMPLICATIONS AND LIMITATIONS
A preliminary Monte Carlo methodology for assessing the variations in predictions of photochemical grid models due to uncertainties in input variables has been developed and tested, and some examples of results have been given in Section 4 above. A draft computer code, based on Monte Carlo Simple Random Sampling (SRS), was applied to the UAM-IV photochemical grid model (EPA, 1990) as run on the New York City domain for the ozone episode occurring during the time period 6—8 July 1988. Several possible ways of expressing and interpreting the variability in the outputs have been presented and 95% confidence limits have been given for several of the statistical measures. There are obvious implications concerning the risk of making erroneous decisions when key input variables are not well-known. For this preliminary demonstration exercise using a set of 50 UAM-IV runs, where inputs are randomly selected from distributions based on their expected ranges of uncertainty, the predicted domain-wide maximum hourly averaged ozone concentration ranged from 176 to 331 ppb (almost a factor of two range). Because a diskette is available containing the inputs for the 50 Monte Carlo runs and the ozone predictions at key locations for these runs, it is possible for risk analysts and others to more thoroughly investigate these issues. Despite the useful results that have been obtained from this demonstration exercise using the preliminary Monte Carlo methodology, there are several limitations that can be stated and will be the subject of future studies in which the methodology is improved: ¸imitation 1 — Estimation of uncertainties in inputs — The estimates of the uncertainties in the
Uncertainties in predictions by UAM-IV
inputs were made based on an informal process where ten modelers responded to a questionnaire. Furthermore, the estimates of uncertainties have not been keyed to geographic domains and other site-specific considerations. More work needs to be done to better estimate these uncertainties. ¸imitation 2 — Correlations among inputs — Where known, large correlations (i.e., greater than about 0.5) among inputs should be accounted for in the Monte Carlo resampling. Similarly, groups of related chemical reactions should be identified. The methodology should be designed so that physically improbable combinations of inputs are avoided (e.g. a large positive perturbation in anthropogenic VOC emissions is unlikely to occur at the same time as a large negative perturbation in VOC boundary concentrations). ¸imitation 3 — Accounting for ‘‘difficult’’ inputs — The application described above used only those variables that could be changed without difficulty for each new Monte Carlo run. Model domain parameters such as grid size were not changed. Also, even though uncertainties in some gridded input fields such as wind speed, wind direction, and emissions are known to vary in space, the resampling method assumed that the local grid values of these inputs were perturbed the same relative amount everywhere. Some variations in space could be allowed, as long as the extra labor time and computer time can be minimized, and known space and time energy spectra relations could be satisfied. ¸imitation 4 — ¹ime required for Monte Carlo runs — It takes many weeks to complete 50 runs on a single workstation for a photochemical grid model such as UAM-IV. This time could be reduced by making better use of computer resources (e.g., making runs on parallel processors). ¸imitation 5 — Interpretations of results — A few standard statistical tests were carried out in this preliminary methodology demonstration. This set of tests could be modified and expanded and new analysis/interpretation methods developed and applied. ¸imitation 6 — Near-surface ozone at a few sites — Even though the 50 Monte Carlo UAM-IV runs produced vast numbers of outputs in three dimensions and in time, the analysis was restricted to maximum near-surface ozone at a few stations. Clearly there is much more that can be done, involving other pollutants, other vertical layers, other times, and so on. ¸imitation 7 — One model, one geographic domain, one time period — The methodology should be expanded to other photochemical grid models (e.g. UAM-V, SAQM, CALGRID, Models-3), other geographic domains (e.g. Houston, Atlanta, Los Angeles), and other time periods (including both polluted and clean episodes). The next phase of the project will deal with UAM-V on the Eastern U.S. domain for the 6—12 July 1995 time period. ¸imitation 8 — Base case emissions — The analysis was limited to base case mean emissions files. The
3627
mean emissions assumed could be altered to determine how the uncertainties may change as emissions are systematically reduced by means of control strategies. Acknowledgments—The project is supported by the Electric Power Research Institute, with D. Alan Hansen as technical manager. H. Christopher Frey (North Carolina State University), Eduard Hofer (GRS, Munich, Germany), and F. Owen Hoffman (SENES Oak Ridge, Inc.) made major contributions to the design of the Monte Carlo project and the interpretation of the results.
REFERENCES
Ang, A. H.-S. and Tang, W. H. (1984) Probability Concepts in Engineering Planning and Design, »ol. 2: Decision, Risk, and Reliability. Wiley, New York. Atkinson, R. A., Baulch, D. L., Cox, R. A., Hampson, R. F. Jr., Kerr, J. A. and Troe, J. (1992) Evaluated kinetic and photochemical data for atmospheric chemistry. Supplement IV. IUPAC Subcommittee on Gas Kinetic Data Evaluation for Atmospheric Chemistry. Journal of Physical and Chemical Reference Data 21, 1125—1135. Barchet, W. R., Roth, P. M. and Tesche, T. W. (1994) Evaluating the Risks of Developing Inaccurate Emissions Control Recommendations using Regional Photochemical Air Quality Simulation Models: A Draft Protocol. Prepared for EPRI, P.O. Box 10412, Palo Alto, California. Dated 19 January 1994, plus 24 March 1994 Addendum, total 123 pp. Carmichael, G. R., Sandu, A. and Potra, F. A. (1997) Sensitivity analysis for atmospheric chemistry models via automatic differentiation. Atmospheric Environment 31, 475—489. Conover, W. J. (1971) Practical Nonparametric Statistics. 2nd Edn. Wiley, New York. Dunker, A. M. (1981) Efficient calculation of sensitivity coefficients for complex atmospheric models. Atmospheric Environment 15, 1155—1161. Dunker, A. M. (1984) The decoupled direct method for calculating sensitivity coefficients in chemical kinetics. Journal of Chemical Physics 81, 2385—2393. EPA (1990) User’s Guide for the Urban Airshed Model (Five Volumes). EPA-450/4-90-007, U.S. EPA/OAQPS, Research Triangle Park, North Carolina. EPA (1991) Guidelines for Regulatory Applications of the Urban Airshed Model. EPA-450/4-91-013. NTIS No. PB92-108760. USEPA/OAQPS, Research Triangle Park, North Carolina. Fernau, M. E., Schulman, L. L. and Scire, J. S. (1995) Application of CALMET to the 4-11 July 1988 Episode for the Modeling Ozone Cooperative. Final Report, MOCA c/o Edison Electric Institute, 701 Pennsylvania Ave. NW, Washington, District of Columbia. Gao, D., Stockwell, W. R. and Milford, J. B. (1995) Firstorder sensitivity and uncertainty analysis for a regionalscale gas-phase chemical mechanism. Journal of Geophysics Research 100D, 23,153—23,166. Gao, D., Stockwell, W. R. and Milford, J. B. (1996) Global uncertainty analysis of a regional-scale gas-phase chemical mechanism. Journal of Geophysics Research 101, 9107—9119. Hanna, S. R. (1989) Confidence limits for air quality models, as estimated by bootstrap and jackknife resampling methods. Atmospheric Environment 23, 1385—1395. Hanna, S. R. and Fernau, M.E. (1997) Photochemical Model Evaluations in the Northeast. Final Report prepared for the American Petroleum Institute, 1220 L Street NW, Washington, District of Columbia.
3628
S. R. HANNA et al.
Hanna, S. R., Moore, G. E. and Fernau, M. E. (1996) Evaluation of photochemical grid models (UAM-IV, UAM-V, and the ROM/UAM-IV couple) using data from the Lake Michigan Ozone Study (LMOS). Atmospheric Environment 30, 3265—3279. Hansen, D. A., Dennis, R. L., Ebel, A., Hanna, S., Kaye, J. and Thuillier, R. (1994) The quest for an advanced regional air quality model. Environmental Science and ¹echnology 28, 560A—569A. IAEA (1989) Evaluating the Reliability of Predictions made using Environmental Transfer Models. IAEA Safety Series No. 100. International Atomic Energy Agency, Vienna, Austria. Irwin, J. S., Rao, S. T., Petersen, W. B. and Turner, D. B. (1987) Relating error bounds for maximum concentration estimates to diffusion meteorology uncertainty. Atmospheric Environment 21, 1927—1938. Isukapalli, S. S., Roy, A. and Georgopoulos, P. G. (1995) Stochastic response Surface Methods (SRSMs) for uncertainty propagation: application to physiologically-based pharmacokinetic modeling. Draft manuscript available from Rutgers Univ. EOHSI, 681 Frelinghuysen Rd., Piscataway, New Jersey. McRae, G. J., Goodin, W. R. and Seinfeld, J. H. (1982) Mathematical Modeling of Photochemical Air Pollution. Tech. Report, Environ. Qual. Lab., California Institute of Technology, Pasadena, California; EQL Report No. 18. Milford, J. D., Gao, D., Russell, A. G. and McRae, G. J. (1992) Use of sensitivity analysis to compare chemical mechanisms for air-quality modeling. Environmental Science and ¹echnology 26, 1179—1189. Morgan, M. G. and Henrion, M. (1990) ºncertainty: A Guide to Dealing with ºncertainty in Quantitative Risk and Policy Analysis. Cambridge University Press, New York. National Research Council (1991) Rethinking the Ozone Problem in ºrban and Regional Air Pollution. National Academy Press, Washington, DC. NCRP (1996) A Guide for ºncertainty Analysis in Dose and Risk Assessments Related to Environmental Contamination
(ed. F. O. Hoffman), NCRP Commentary No. 14, National Council on Radiation Protection and Measurements, 7910 Woodmont Ave., Bethesda, Maryland. NYSDEC (1994) New ½ork State Implementation Plan for Ozone. Vol. I, Appendix A-3: New York airshed urban airshed modeling, July 5 to 11 base case. Division of Air Resources, New York State Department of Environmental Conservation, Albany, New York. Olesen, H. R. (1995) Toward the establishment of a common framework for model evaluation. In Air Pollution Modeling and Its Applications XI, pp. 519—528. Plenum Press, New York. Rabitz, H. and Hales, J. (1995) An ACP Primer on Sensitivity Analysis. Monthly ºpdate, DOE Atmospheric Chemistry Program, Vol. 6, No. 11, K9-30, BPNL, POB 999, Richland, Washington, 1—7. Rabitz, H., Kramer, M. and Docol, D. (1983) Sensitivity analysis in chemical kinetics. Annual Review of Physical Chemistry 34, 419—461. Scheffe, R. D. and Morris, R. E. (1993) A review of the development and application of the Urban Airshed Model. Atmospheric Environment 27B, 23—39. Stolarski, R. S, Butler, D. M. and Rundel, R. D. (1978) Uncertainty propagation in a stratospheric model. 2. Monte Carlo analysis of imprecisions due to reaction rates. Journal of Geophysics Research 83, 3074—3078. Tatang, M. A. (1995) Direct incorporation of uncertainty in chemical and environmental engineering systems. Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, Massachusetts. Thompson, A. M. and Stewart, R. W. (1991) Effect of chemical kinetic uncertainties on calculated constituents in a tropospheric photochemical model. Journal of Geophysics Research 96, 13,089—13,108. Yang, Y.-J., Stockwell, W. R. and Milford, J. B. (1995) Uncertainties in incremental reactivities of volatile organic compounds. Environmental Science and ¹echnology 29, 1336—1345.
.