PM and light extinction model performance metrics, goals, and criteria for three-dimensional air quality models


Atmospheric Environment 40 (2006) 4946–4959 www.elsevier.com/locate/atmosenv

James W. Boylan (a,*), Armistead G. Russell (b)

(a) Georgia Department of Natural Resources, Air Protection Branch, 4244 International Parkway, Suite 120, Atlanta, GA 30354, USA
(b) Department of Civil and Environmental Engineering, Georgia Institute of Technology, 311 Ferst Drive, Atlanta, Georgia 30332-0512, USA

Received 30 March 2005; received in revised form 9 September 2005; accepted 9 September 2005

Abstract

In order to use an air quality modeling system with confidence, model performance must be evaluated against observations. While ozone modeling and evaluation are fairly well developed, particulate matter (PM) modeling is still an evolving science. EPA has issued minimal guidance on PM and visibility model performance evaluation metrics, goals, and criteria. This paper addresses these issues by examining various bias and error metrics and proposes PM model performance goals (the level of accuracy that is considered to be close to the best a model can be expected to achieve) and criteria (the level of accuracy that is considered to be acceptable for modeling applications) that vary as a function of concentration and extinction. We propose that a model performance goal has been met when both the mean fractional error (MFE) and the mean fractional bias (MFB) are less than or equal to +50% and ±30%, respectively. Additionally, the model performance criteria have been met when both MFE ≤ +75% and MFB ≤ ±60%. Less abundant species would have less stringent performance goals and criteria. These recommendations are based upon an analysis of numerous PM and visibility modeling studies performed throughout the country. © 2006 Elsevier Ltd. All rights reserved.

Keywords: Bias; Error; Particulate matter; Aerosols; Regional haze

* Corresponding author. Fax: +1 404 363 7100.
1352-2310/$ - see front matter © 2006 Elsevier Ltd. All rights reserved. doi:10.1016/j.atmosenv.2005.09.087

1. Introduction

Promulgation of the PM2.5 National Ambient Air Quality Standard (NAAQS) and the Regional Haze Rule in the US requires states to write State Implementation Plans (SIPs) that detail emission control strategies that will be implemented to reduce ambient particulate matter (PM) concentrations to levels deemed acceptable by the US EPA. Counties with measured PM2.5 concentrations that violate the daily standard (24-hour average > 65 µg m⁻³) or the annual standard (annual average over 3 years > 15 µg m⁻³) are designated "nonattainment" and must show attainment with these standards by 2009. Regional haze is the impairment of visibility caused by the presence of PM in the atmosphere that scatters and absorbs light. The Regional Haze Rule (US EPA, 1999) requires that visibility at Class I Federal areas (US EPA, 2002) be returned to natural conditions (i.e., no anthropogenic emission impacts) by 2064. Every 10 years, reasonable progress towards this goal must be demonstrated.

Atmospheric modeling, using multi-dimensional, emissions-based air quality models (e.g., CMAQ (Byun and Ching, 1999), CAMx (Environ, 2002), URM-1ATM (Boylan et al., 2002), REMSAD (ICF Consulting, 2002), CMAQ-MADRID (Zhang et al., 2004a), etc.), provides a scientific method for linking future changes in emissions with future air quality and is the approach required by US EPA for demonstrating attainment of the PM2.5 NAAQS and regional haze reasonable progress goals. Internationally, these models have a less formal role, but they still provide the foundation for policy assessment. Additionally, they are used in scientific analyses to better understand the processes affecting pollutant fate and impacts.

Typically, air quality models are used to simulate historical air pollution periods. The modeled pollutant concentrations are then compared to observations from the same period to ensure the model is capturing the important chemical and physical processes in the atmosphere. For regulatory applications, the model must demonstrate "acceptable" performance before future-year emissions are estimated and run through the model to estimate future-year pollutant concentrations under similar meteorological conditions. These future-year pollutant concentrations are used to demonstrate that the PM2.5 NAAQS and regional haze reasonable progress goals will be met in the future.

Sound model performance evaluations are critical for any modeling exercise, and many approaches have been recommended (Chang and Hanna, 2004; Seigneur et al., 2000). The science behind ozone modeling has been evolving for many decades, and the chemical processes governing the formation and destruction of ozone are well established. Model performance criteria for ozone were published by the US EPA (1991) and continue to be used to help evaluate the level of confidence in ozone modeling results. However, no such performance guidance has yet been developed for PM models.
The question this paper attempts to address is: "When has a PM model demonstrated acceptable performance?" This is not a simple question since PM consists of many components (e.g., sulfate, nitrate, ammonium, organic carbon, elemental carbon, soils, coarse mass, etc.) and the model may perform well for some components, but not others. A second question is: "What model performance metrics should be used for scientific model applications to facilitate consistent inter-comparisons between applications and models?" Regulators can use the information presented in this paper to aid in their decision as to whether or not a model application is acceptable for regulatory purposes.

1.1. Particulate matter

Particulate matter (PM) is defined as microscopic and submicroscopic particles (solid or liquid) that exist in the atmosphere. Particles with an aerodynamic diameter less than 2.5 µm are referred to as fine PM or PM2.5, those with aerodynamic diameter less than 10 µm are referred to as PM10, and those with aerodynamic diameter between 2.5 and 10 µm are referred to as coarse particles. Major PM species that are routinely measured and simulated using emission-based models are sulfate (SO₄²⁻), nitrate (NO₃⁻), ammonium (NH₄⁺), organic carbon (OC), elemental carbon (EC), soils, and coarse mass (CM). In addition, trace metals are often monitored, though frequently not simulated individually using three-dimensional air quality models.

A majority of the sulfate and nitrate is formed in the atmosphere through the oxidation of sulfur dioxide (SO2) and nitrogen oxides (NOx), respectively. Ammonium is formed when ammonia gas (NH3) reacts with acidic particles or gases (e.g., sulfate or nitrate). Organic carbon can either be emitted directly or formed in the atmosphere through the oxidation of volatile organic compounds (VOCs). Elemental carbon is emitted directly into the atmosphere via combustion processes and is treated as an inert (i.e., non-reactive) species by most models. Soils, or crustal materials, are also treated as inert and consist of metal oxides from wind-blown dust, road dust, industrial activities, and agricultural activities. Coarse mass can be made up of a combination of the species discussed above.

1.2. Visibility and light extinction

Visibility is a measure of the degree of clearness of the atmosphere. Light extinction is the impairment of visibility caused by the presence of particles and gases in the atmosphere that scatter and absorb light (Seinfeld and Pandis, 1998).
Since light extinction is related to the concentrations of particles in the atmosphere, PM models can be used to estimate light extinction and visibility. While comparison of modeled PM to observations is fairly straightforward, comparison of modeled visibility to measured visibility is less so, since there are very few direct measurements of visibility. However, Eq. (1) can be used to estimate the total light extinction as a linear combination of specific pollutant concentrations, component-specific light scattering/absorption coefficients, and a relative humidity adjustment factor (Malm et al., 2000):

$$b_{\mathrm{ext}} = 3\, f(\mathrm{RH})\,[\mathrm{SO_4^{2-}}] + 3\, f(\mathrm{RH})\,[\mathrm{NO_3^{-}}] + 4\,[\mathrm{OC}] + 10\,[\mathrm{EC}] + 1\,[\mathrm{Soils}] + 0.6\,[\mathrm{CM}] + b_{\mathrm{Rayleigh}}, \tag{1}$$

where $[\mathrm{SO_4^{2-}}]$ is the mass associated with fine sulfate PM (µg m⁻³), $[\mathrm{NO_3^{-}}]$ is the mass associated with fine nitrate PM (µg m⁻³), [OC] is the fine total organic PM mass (µg m⁻³), [EC] is the fine elemental carbon PM mass (µg m⁻³), [Soils] is the fine soil (or crustal) PM mass (µg m⁻³), [CM] is the coarse PM mass (µg m⁻³), $b_{\mathrm{Rayleigh}}$ is Rayleigh scattering (10 Mm⁻¹), and $f(\mathrm{RH})$ is the climatological relative humidity adjustment factor (unitless). This equation assumes that the sulfate and nitrate are fully neutralized with ammonium and adds this assumed ammonium mass to the sulfate and nitrate mass, respectively. As with total PM2.5, individual components of total light extinction can be separated and evaluated.

2. PM model evaluations

Air quality modeling and ambient measurements are two different ways to estimate actual ambient concentrations of pollutants in the atmosphere. Both modeling and measurements have some degree of uncertainty associated with their estimates. Uncertainties associated with three-dimensional atmospheric modeling systems are very difficult to quantify. The uncertainties associated with emission levels alone are still being investigated for many sources, including some of the most important ones. Very little work has been done to quantify uncertainty due to meteorological inputs, and uncertainty in the chemistry has been treated almost solely with box models. Quantifying three-dimensional air quality modeling uncertainty is an on-going process that has been attempted primarily for ozone, and then only for a few inputs applied to a few modeled days (Hanna et al., 2001, 2005; Russell and Dennis, 2000).

Uncertainties have been published for measurements obtained by various monitoring networks. However, studies have shown large differences between measured components of PM from different monitoring networks due to sampling and analysis techniques. For example, the difference between measurements using the Interagency Monitoring of Protected Visual Environments (IMPROVE) protocol and the Speciation Trends Network (STN) protocol at co-located sites can be as much as 20% for sulfate, 30% for nitrate, 15% for organic carbon (after blank correction), and 30% for EC (Solomon et al., 2004). In addition, measurements are made using the aerodynamic diameter, not the Stokes diameter used in the models. To remove additional uncertainty in the comparisons, the Stokes diameter can be converted into an equivalent aerodynamic diameter (Seinfeld and Pandis, 1998). Even if this conversion is not performed, the errors associated with this uncertainty are typically far smaller than other uncertainties in the modeling and measurement systems.

Next, there is the issue of comparing a point measurement to a volumetric, grid-cell-averaged modeled concentration (Seinfeld and Pandis, 1998). Most PM measurements are taken at specific locations that are thought to be representative of the PM concentrations over a finite area. The extent to which a point measurement represents other nearby points or an average concentration in a specific area is related to the concentration gradients, which vary by PM species (Marmur et al., 2004). Measurements of species with sharp concentration gradients will be much less representative of an area-average concentration than those with smaller concentration gradients. Surface-level model grid cells used for PM modeling typically range between 4 and 36 km on a side in the horizontal and between 20 and 40 m in the vertical. Since the modeling results represent an average concentration over the entire grid cell and the observations likely do not represent the average concentration over the same volume of air, it is not possible to exactly match modeled results to measurements.
For ozone, NO2, and CO, the amount of calculated model error that could be ascribed to this volume averaging was found to be a significant fraction of the total (McNair et al., 1996). Therefore, when comparing modeling results to observations, the measurements should not be considered the absolute truth, and bias and error calculations should not necessarily be normalized by the observations alone.

2.1. Standard performance metrics

There are a number of performance metrics that can be used to examine the performance of air quality models (Bencala and Seinfeld, 1979; Seigneur et al., 2000). Bias is a measure of the tendency of the model to over- or under-predict the observations and can be positive or negative. Error is a measure of the extent to which the model deviates from the observations and is always positive. Bias and error can be represented in either absolute terms or relative terms (typically normalized by observations). Correlations can also be used. The following subsections discuss the standard performance metrics that have been used for comparing modeled atmospheric pollutant concentrations to observations.

2.1.1. Mean bias and error

The mean bias (MB) and mean error (ME) are defined as the average difference between all model-observed pairs, with the exception that the error includes only the absolute deviation between the two. Mean bias and error can be written as

$$\mathrm{MB} = \frac{1}{N}\sum_{i=1}^{N} (C_m - C_o), \tag{2}$$

$$\mathrm{ME} = \frac{1}{N}\sum_{i=1}^{N} |C_m - C_o|, \tag{3}$$

where $C_m$ is the model-estimated concentration at station $i$, $C_o$ is the observed concentration at station $i$, and $N$ equals the number of estimate–observation pairs drawn from all valid monitoring station data for the comparison time period of interest. Since the mean bias and error are given in absolute terms (µg m⁻³), it is difficult to evaluate whether the bias and errors are significant without other information, such as the magnitude of (and possibly the variability in) the observed concentrations.

2.1.2. Mean normalized bias and error

Mean normalized bias (MNB) and mean normalized error (MNE) normalize the bias and error for each model-observed pair by the observation before taking the average (Eqs. (4) and (5)).

$$\mathrm{MNB} = \frac{1}{N}\sum_{i=1}^{N} \left(\frac{C_m - C_o}{C_o}\right), \tag{4}$$

$$\mathrm{MNE} = \frac{1}{N}\sum_{i=1}^{N} \left|\frac{C_m - C_o}{C_o}\right|. \tag{5}$$

Mean normalized bias and error are reported in percent and range from −100% to +∞ for MNB and from 0% to +∞ for MNE. Current EPA guidance (US EPA, 1991) recommends using these metrics for ozone model performance evaluations in conjunction with an observation-based minimum threshold (e.g., 60 ppb). An observation-based minimum threshold is required since the normalized quantities can become large when the observations are small. Whenever the observation is smaller than the threshold value, that estimate–observation pair is excluded from the calculations. Excluding lower ozone concentrations from the performance statistics is acceptable because the ozone NAAQS are directed at peak ozone concentrations. In contrast, the PM NAAQS are concerned with 24-hour average and annual average concentrations, and visibility applications are focused on both the cleanest and dirtiest days. Also, some components of PM can be very small, making it difficult to set a reasonable minimum threshold value without excluding a majority of the data points. Without a minimum threshold, very large normalized biases and errors can result when observations are close to zero even though the absolute biases and errors are very small. In many instances, a few data points can dominate the metric, leading to skewed impressions of model performance. Furthermore, since the MNB ranges from −100% to +∞ and MNE ranges from 0% to +∞, simulated concentrations higher than the observations are weighted more than equivalent concentrations where the simulated level is lower than the observations. These metrics also assume observations are the absolute truth.

2.1.3. Normalized mean bias and error

The normalized mean bias (NMB) and normalized mean error (NME) differ from the MNB and MNE by normalizing the mean bias and mean error by the mean observation (Eqs. (6) and (7)).

$$\mathrm{NMB} = \frac{\sum_{i=1}^{N} (C_m - C_o)}{\sum_{i=1}^{N} C_o}, \tag{6}$$

$$\mathrm{NME} = \frac{\sum_{i=1}^{N} |C_m - C_o|}{\sum_{i=1}^{N} C_o}. \tag{7}$$

The NMB and NME metrics typically give a much better sense of the model performance than MNB and MNE and do not require an observation-based minimum threshold. Again, these metrics assume observations are the absolute truth. Also, the NMB ranges from −100% to +∞ and NME ranges from 0% to +∞, resulting in values simulated higher than the observations being weighted more than equivalent (on a relative basis) concentrations simulated lower than the observations.

2.1.4. Mean fractional bias and error

The mean fractional bias (MFB) and mean fractional error (MFE) normalize the bias and error for each model-observed pair by the average of the model and observation before taking the average (Eqs. (8) and (9)).

$$\mathrm{MFB} = \frac{1}{N}\sum_{i=1}^{N} \frac{(C_m - C_o)}{(C_o + C_m)/2}, \tag{8}$$

$$\mathrm{MFE} = \frac{1}{N}\sum_{i=1}^{N} \frac{|C_m - C_o|}{(C_o + C_m)/2}. \tag{9}$$

Since the MFB ranges from −200% to +200% and MFE ranges from 0% to +200%, these metrics have the advantage of bounding the maximum bias and error and do not allow a few data points to dominate the metric. Furthermore, these metrics do not assume that observations are the absolute truth. Finally, these metrics are symmetric (i.e., they give equal weight, on a relative basis, to concentrations simulated higher than observations and those simulated lower than observations).

2.1.5. Normalized mean bias and error factor

Normalized mean bias factor (NMBF) and normalized mean error factor (NMEF) are relatively new metrics. Although they are not considered traditional evaluation metrics, they do have some advantages, such as treating over- and under-predictions proportionately and avoiding the inflation associated with small concentrations. However, the equations used to determine NMBF and NMEF (Yu et al., 2005) are more complex than the traditional metrics and assume observations are the absolute truth.

2.2. Benchmark studies

A number of modeling studies have been examined in order to benchmark the ability of current air quality models to simulate PM concentrations. Monitoring networks used in these comparisons included Interagency Monitoring of Protected Visual Environments (IMPROVE) (IMPROVE, 1995; Sisler and Malm, 2000; Malm et al., 2000) operated by the National Park Service, Speciation Trends Network (STN) (US EPA, 2000) operated by US EPA, Clean Air Status and Trends Network (CASTNet) (US EPA, 2003a) operated by US EPA, and Southeastern Aerosol Research and Characterization network (SEARCH) (Hansen et al., 2003) operated by Southern Company.

IMPROVE measurements are 24-hour average concentrations and consist of PM2.5 (total mass, sulfate, nitrate, organic carbon, elemental carbon, and soils), PM10, and coarse mass. Additionally, some sites now measure ammonium. IMPROVE measurements are made once every three days. STN measurements consist of total PM2.5 and various components of PM2.5 (sulfate, nitrate, ammonium, organic carbon, and elemental carbon); they are typically collected once every three days, but some sites collect once every six days. CASTNet measurements are weekly averages and consist of sulfate, nitrate, and ammonium components of PM2.5. SEARCH measurements consist of 24-hour concentrations of PM2.5 (sulfate, nitrate, ammonium, organic carbon, elemental carbon, and metal oxides), PM10, and coarse mass. These measurements are typically made once every three days, although daily measurements are sometimes available. In addition, SEARCH measures continuous (hourly) concentrations of many of these same pollutants. However, these data have not been used in any of the benchmark studies presented below.
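As an illustration of the metrics defined in Section 2.1, the following is a minimal Python sketch of Eqs. (2)–(9), with the fractional metrics normalized by the pair average as described in Section 2.1.4. The function name `performance_metrics` and the sample values are hypothetical and not taken from any benchmark study; normalized and fractional metrics are returned in percent.

```python
def performance_metrics(model, obs):
    """Bias and error metrics of Section 2.1 for paired model/observed values.

    `model` and `obs` are equal-length sequences (C_m and C_o); every
    observation must be non-zero for MNB and MNE to be defined.
    """
    pairs = list(zip(model, obs))
    n = len(pairs)
    diff = [cm - co for cm, co in pairs]              # C_m - C_o
    abs_diff = [abs(d) for d in diff]                 # |C_m - C_o|
    pair_avg = [(cm + co) / 2.0 for cm, co in pairs]  # (C_o + C_m) / 2
    obs_vals = [co for _, co in pairs]
    obs_sum = sum(obs_vals)
    return {
        "MB": sum(diff) / n,                                                # Eq. (2)
        "ME": sum(abs_diff) / n,                                            # Eq. (3)
        "MNB": 100.0 * sum(d / co for d, co in zip(diff, obs_vals)) / n,    # Eq. (4)
        "MNE": 100.0 * sum(d / co for d, co in zip(abs_diff, obs_vals)) / n,  # Eq. (5)
        "NMB": 100.0 * sum(diff) / obs_sum,                                 # Eq. (6)
        "NME": 100.0 * sum(abs_diff) / obs_sum,                             # Eq. (7)
        "MFB": 100.0 * sum(d / a for d, a in zip(diff, pair_avg)) / n,      # Eq. (8)
        "MFE": 100.0 * sum(d / a for d, a in zip(abs_diff, pair_avg)) / n,  # Eq. (9)
    }


# Hypothetical pairs (ug/m3): the near-zero observation in the first pair
# inflates MNB far more than it inflates MFB.
stats = performance_metrics(model=[0.5, 4.0, 6.0], obs=[0.05, 5.0, 6.0])
```

With these made-up values the single small observation pushes MNB well above 100%, while MFB stays within its ±200% bounds, which is the behavior the text attributes to the two metrics.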
The mean fractional bias and error were calculated for each of the following benchmark studies and will be presented in the next section: Southern Appalachian Mountain Initiative (SAMI) using the URM-1ATM model on 6 episodes compared to IMPROVE measurements (Odman et al., 2002; Boylan et al., 2006); Western Regional Air Partnership (WRAP) using the CMAQ model for a 1996 annual simulation (Wang et al., 2002) and a summer and winter season in 2002 (Tonnesen et al., 2004) compared to IMPROVE measurements; Visibility Improvement State & Tribal Association of the Southeast (VISTAS) using CMAQ for 3 episodes (Morris et al., 2004a) and a 2002 annual simulation (Morris et al., 2004b) compared to IMPROVE, STN, SEARCH, and CASTNet measurements; Midwest RPO using CMAQ, CAMx, and REMSAD for a winter and summer month compared to IMPROVE measurements (Baker, 2004a); Georgia Institute of Technology using CMAQ for a winter and summer month compared to IMPROVE and SEARCH measurements (Park and Russell, 2003); Electric Power Research Institute (EPRI) using CMAQ and CMAQ-MADRID (Bailey et al., 2004) and Coordinating Research Council (CRC) using CMAQ and PM-CAMx (Zhang et al., 2003; 2004b) to simulate a summer episode compared to IMPROVE and SEARCH measurements; and Clear Skies using REMSAD to simulate four seasons in 1996 compared to IMPROVE measurements (US EPA, 2003b).

3. Performance goals and criteria

Performance "goals" are defined as the level of accuracy that is considered to be close to the best a model can be expected to achieve in that application. Performance goals can be set for regulatory and/or scientific applications. Here, performance "criteria" are defined as the level of accuracy that is considered to be acceptable for standard modeling applications, at least without further evidence of adequacy.

Current EPA draft guidance (US EPA, 2001) for modeling regional haze and PM provides minimal guidance for determining if PM model performance is "acceptable" for regulatory applications. It has been suggested that current air quality models can achieve mean normalized errors of 50% or less for PM2.5 and sulfate for urban-scale domains (Seigneur, 2001). PM concentrations can vary by more than an order of magnitude depending on species, geographic location, and time of year. Therefore, one might suggest that different performance goals and criteria should be developed for: (1) different components of PM, (2) different seasons of the year, (3) different parts of the country, (4) urban vs. rural sites, and/or (5) clean vs. polluted days. However, performance goals and criteria that vary as a function of concentration and light extinction can address all these circumstances with a single set of goals and criteria.

3.1. Particulate matter

Analysis of modeling performed by Boylan et al. (2006) and Baker (2004b) suggests that the mean fractional error and bias are the least biased and most robust of the various performance metrics discussed in Section 2.1. Therefore, these are the metrics that have been incorporated into the PM model performance goals and criteria that we propose here. Proposed performance goals and criteria should account for scientific considerations (e.g., various uncertainties, model capabilities, and representativeness of monitored concentrations), application considerations (e.g., how the results are to be used), and past model performance. This is similar to the approach used by Dennis and Downton (1984) and Tesche et al. (1990) for ozone and adopted by US EPA (1991).

EPA defines "major" components of PM2.5 as those that comprise more than 30% of the total PM2.5 (US EPA, 2001). Applying this definition to a measured PM2.5 concentration that is half the annual PM2.5 NAAQS (15 µg m⁻³) results in a concentration of 2.25 µg m⁻³ (0.30 × 7.5 µg m⁻³). For our analysis we defined major PM species as those with concentrations above 2.25 µg m⁻³ and minor PM species as those with concentrations below 2.25 µg m⁻³. For visibility, we defined major PM species as those with light extinction above 10 Mm⁻¹ (light extinction due to Rayleigh scattering) and minor PM species as those with light extinction below 10 Mm⁻¹.

Based on the benchmark studies discussed in Section 2.2 (Figs. 1–9), we propose that the model performance goal for major components of PM2.5 has been met when both the mean fractional error (MFE) and the mean fractional bias (MFB) are less than or equal to approximately +50% and ±30%, respectively. Additionally, we propose that the model performance criteria for major components of PM2.5 have been met when both the MFE and the MFB are less than or equal to approximately +75% and ±60%, respectively. Since less abundant or minor species (< 2.25 µg m⁻³) contribute less to total PM mass, they should also have less stringent performance goals and criteria (i.e., higher fractional bias and errors) than those that are more abundant. Furthermore, past model applications suggest that normalized performance is typically poorer for minor species. This is not surprising since a few large deviations can have a significant impact on the overall performance assessment.
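The major-species thresholds proposed above can be sketched as a simple check. This is only an illustrative reading of the text: the names `MAJOR_THRESHOLD`, `is_major`, and `assess_major_species` are hypothetical, and the limits apply only at the large-concentration end, since minor species are given looser, concentration-dependent limits.

```python
ANNUAL_NAAQS = 15.0    # ug/m3, annual PM2.5 standard quoted in the text
MAJOR_FRACTION = 0.30  # share of total PM2.5 that defines a "major" component

# 30% of half the annual standard: 0.30 * 7.5 = 2.25 ug/m3
MAJOR_THRESHOLD = MAJOR_FRACTION * 0.5 * ANNUAL_NAAQS


def is_major(mean_concentration):
    """True when a species' mean concentration (ug/m3) exceeds the threshold."""
    return mean_concentration > MAJOR_THRESHOLD


def assess_major_species(mfe, mfb):
    """Compare MFE and MFB (both in percent) with the major-species limits.

    Returns "goal" when MFE <= 50 and |MFB| <= 30, "criteria" when
    MFE <= 75 and |MFB| <= 60, and "neither" otherwise.
    """
    if mfe <= 50.0 and abs(mfb) <= 30.0:
        return "goal"
    if mfe <= 75.0 and abs(mfb) <= 60.0:
        return "criteria"
    return "neither"


# Hypothetical example: a species averaging 5 ug/m3 with MFE 62% and MFB -35%.
result = assess_major_species(mfe=62.0, mfb=-35.0) if is_major(5.0) else None
```

In this made-up case the error and bias exceed the goal limits but satisfy the criteria limits, so the function returns "criteria".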
In the extreme, consider the case where the observed concentration of a PM constituent is zero at all times, but there is a trivially small amount of that same constituent simulated. In this case the MFE and MFB are both +200%. The magnitudes are the same if the simulated level is zero and the observed level is non-zero (the MFB is then −200%). Thus, the performance goal should range from +200% (MFE) or ±200% (MFB) at a zero concentration to +50% and ±30%, respectively. The performance criteria should range from +200% (MFE) or ±200% (MFB) at a zero concentration to +75% and ±60%, respectively.

Fig. 1. PM2.5 (µg m⁻³) mean fractional error (left) and mean fractional bias (right) for all benchmark runs compared to proposed performance goals and criteria.

Fig. 2. PM10 (µg m⁻³) mean fractional error (left) and mean fractional bias (right) for all benchmark runs compared to proposed performance goals and criteria.

Fig. 3. Sulfate (µg m⁻³) mean fractional error (left) and mean fractional bias (right) for all benchmark runs compared to proposed performance goals and criteria.

Fig. 4. Nitrate (µg m⁻³) mean fractional error (left) and mean fractional bias (right) for all benchmark runs compared to proposed performance goals and criteria.

Fig. 5. Ammonium (µg m⁻³) mean fractional error (left) and mean fractional bias (right) for all benchmark runs compared to proposed performance goals and criteria.

Fig. 6. Organics (µg m⁻³) mean fractional error (left) and mean fractional bias (right) for all benchmark runs compared to proposed performance goals and criteria.

[Figs. 1–6: each panel plots mean fractional error or mean fractional bias (%) against average concentration (µg m⁻³), with the goal and criteria curves overlaid.]

To smoothly interpolate between zero and larger concentrations, we propose an exponential of the form:

$$\mathrm{MFE}\ (\text{or MFB}) \le A_1\, e^{-0.5(\bar{C}_o + \bar{C}_m)/a_2} + A_3, \tag{10}$$

where $\bar{C}_o$ is the mean observed concentration and $\bar{C}_m$ is the mean modeled concentration for each component. The coefficients $A_1$, $a_2$, and $A_3$ have been determined from analysis of past model applications and are chosen so that the limits asymptotically approach the model performance goals and criteria discussed above. In our analysis, we found that the values given in Table 1 capture trends found in prior applications. Using those equations, a mean concentration of 0.8 µg m⁻³ would result in a performance goal of 100% for MFE, and a mean concentration of 0.45 µg m⁻³ would result in a performance goal of ±100% for MFB. Meeting the performance goals and criteria for each component of PM helps ensure that they will also be met for total PM2.5 and PM10.
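The concentration-dependent limit of Eq. (10) can be sketched in Python as follows. Here $A_3$ is taken as the large-concentration goal or criterion stated in the text, and $A_1$ is set to $200 - A_3$ so that the limit equals 200% at zero concentration. The $a_2$ values (0.75 and 0.5 µg m⁻³) are assumptions chosen to roughly reproduce the two worked examples above; the coefficients actually proposed are those listed in Table 1. The function name `bugle_limit` is hypothetical.

```python
import math


def bugle_limit(mean_conc, asymptote, a2):
    """Eq. (10): allowed MFE or |MFB| (percent) at a given mean concentration.

    mean_conc  0.5 * (mean observed + mean modeled concentration), ug/m3
    asymptote  A3, the goal or criterion approached at large concentration (%)
    a2         concentration scale of the exponential, ug/m3 (assumed value)
    """
    a1 = 200.0 - asymptote  # makes the limit 200% at zero concentration
    return a1 * math.exp(-mean_conc / a2) + asymptote


A2_MFE = 0.75  # ASSUMED scale for the MFE curves, ug/m3
A2_MFB = 0.5   # ASSUMED scale for the MFB curves, ug/m3

# Worked examples from the text: both limits should come out close to 100%.
mfe_goal_at_0p8 = bugle_limit(0.8, asymptote=50.0, a2=A2_MFE)
mfb_goal_at_0p45 = bugle_limit(0.45, asymptote=30.0, a2=A2_MFB)
```

Under these assumptions both limits land within a few percent of 100%, and each curve falls from 200% at zero concentration toward its asymptote as the mean concentration grows.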

The proposed model performance goals and criteria are combined with the mean fractional biases and errors from the benchmark studies (Figs. 1–9) to show how performance varies as a function of concentration. Note that the x-axis (‘‘average concentration’’) in these plots is the average of the mean modeled and mean observed concentrations for each PM species during each

ARTICLE IN PRESS J.W. Boylan, A.G. Russell / Atmospheric Environment 40 (2006) 4946–4959

4954

EC Mean Fractional Error

EC Mean Fractional Bias 200.0

150.0 Criteriaa

100.0

Goal

50.0

Mean Fractional Bias

Mean Fractional Error

200.0

0.0

150.0 (+) Criteria

100.0 50.0

(+) Goal

0.0

(-) Goal

-50.0

(-) Criteria

-100.0 -150.0 -200.0

0.0

0.5

1.0

1.5

2.0

0.0

Average Concentration (µg/m3)

0.5

1.0

1.5

2.0

Average Concentration (µg/m3)

Fig. 7. Elemental carbon (mg m3) mean fractional error (left) and mean fractional bias (right) for all benchmark runs compared to proposed performance goals and criteria.

[Figure 8: two panels, "Soils Mean Fractional Error" and "Soils Mean Fractional Bias", each plotted against Average Concentration (µg/m3) with the goal and criteria lines overlaid.]

Fig. 8. Soils (µg m⁻³) mean fractional error (left) and mean fractional bias (right) for all benchmark runs compared to proposed performance goals and criteria.

[Figure 9: two panels, "Coarse Mass Mean Fractional Error" and "Coarse Mass Mean Fractional Bias", each plotted against Average Concentration (µg/m3) with the goal and criteria lines overlaid.]

Fig. 9. Coarse mass (µg m⁻³) mean fractional error (left) and mean fractional bias (right) for all benchmark runs compared to proposed performance goals and criteria.

episode/month/season. Since the solid lines identifying the goals and criteria for bias resemble a horn, these types of plots are referred to as "bugle" plots.

3.2. Visibility

Light extinction is a linear combination of PM concentrations multiplied by appropriate light scattering efficiencies, absorption coefficients, and/or relative humidity multipliers. Therefore, we can use the same model performance goals (MFE ≤ +50% and MFB ≤ ±30%) and criteria (MFE ≤ +75% and MFB ≤ ±60%) for components of light extinction as were used for PM. The climatological relative humidity adjustment factor ranges from 1.0 to 5.6 with an average value of 2.6. Therefore, on average, 1.0 µg m⁻³ of sulfate, nitrate, or ammonium will be equal to 7.8 Mm⁻¹ of light extinction. Furthermore, 1.0 µg m⁻³ of organic carbon, elemental carbon, soils, and coarse mass


Table 1
Proposed model performance goals and criteria for PM

PM performance metrics and equations (a):

Mean fractional bias goal: $\mathrm{MFB} \le \pm\left(170\,e^{-0.5(\bar{C}_o + \bar{C}_m)/0.5\,\mu\mathrm{g\,m^{-3}}} + 30\right)$ (11)

Mean fractional error goal: $\mathrm{MFE} \le 150\,e^{-0.5(\bar{C}_o + \bar{C}_m)/0.75\,\mu\mathrm{g\,m^{-3}}} + 50$ (12)

Mean fractional bias criteria: $\mathrm{MFB} \le \pm\left(140\,e^{-0.5(\bar{C}_o + \bar{C}_m)/0.5\,\mu\mathrm{g\,m^{-3}}} + 60\right)$ (13)

Mean fractional error criteria: $\mathrm{MFE} \le 125\,e^{-0.5(\bar{C}_o + \bar{C}_m)/0.75\,\mu\mathrm{g\,m^{-3}}} + 75$ (14)

(a) $\bar{C}_o$ is the mean observed concentration (µg m⁻³) and $\bar{C}_m$ is the mean modeled concentration (µg m⁻³) for each component of PM.

Table 2
Proposed model performance goals and criteria for light extinction

Extinction performance metrics and equations (a):

Mean fractional bias goal: $\mathrm{MFB} \le \pm\left(170\,e^{-0.5(\bar{C}_o + \bar{C}_m)/2.5\,\mathrm{Mm^{-1}}} + 30\right)$ (15)

Mean fractional error goal: $\mathrm{MFE} \le 150\,e^{-0.5(\bar{C}_o + \bar{C}_m)/3.75\,\mathrm{Mm^{-1}}} + 50$ (16)

Mean fractional bias criteria: $\mathrm{MFB} \le \pm\left(140\,e^{-0.5(\bar{C}_o + \bar{C}_m)/2.5\,\mathrm{Mm^{-1}}} + 60\right)$ (17)

Mean fractional error criteria: $\mathrm{MFE} \le 125\,e^{-0.5(\bar{C}_o + \bar{C}_m)/3.75\,\mathrm{Mm^{-1}}} + 75$ (18)

(a) $\bar{C}_o$ is the mean observed concentration (Mm⁻¹) and $\bar{C}_m$ is the mean modeled concentration (Mm⁻¹) for each component of light extinction.
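The average mass-to-extinction factors quoted in Section 3.2 and the Table 2 limits can be combined in a few lines. This is a minimal sketch under stated assumptions: the dictionary keys and function names are hypothetical, the factors are the averages given in the text (the 7.8 value already includes the mean relative humidity adjustment of 2.6), and the ± sign in the bias equations is read as applying to the whole right-hand side.

```python
import math

# Average extinction per unit mass (Mm^-1 per ug/m3) as quoted in Section 3.2.
# Keys are illustrative labels, not identifiers from the paper.
EXTINCTION_PER_UG = {
    "sulfate": 7.8,
    "nitrate": 7.8,
    "ammonium": 7.8,
    "organic_carbon": 4.0,
    "elemental_carbon": 10.0,
    "soils": 1.0,
    "coarse_mass": 0.6,
}


def to_extinction(species, conc_ug_m3):
    """Convert a mass concentration (ug/m3) to average light extinction (Mm^-1)."""
    return EXTINCTION_PER_UG[species] * conc_ug_m3


def extinction_limits(avg_ext):
    """Table 2 limits (%) at avg_ext = 0.5 * (mean observed + mean modeled), in Mm^-1."""
    bias_decay = math.exp(-avg_ext / 2.5)
    error_decay = math.exp(-avg_ext / 3.75)
    return {
        "mfb_goal": 170.0 * bias_decay + 30.0,       # applies as +/- this value
        "mfe_goal": 150.0 * error_decay + 50.0,
        "mfb_criteria": 140.0 * bias_decay + 60.0,   # applies as +/- this value
        "mfe_criteria": 125.0 * error_decay + 75.0,
    }


if __name__ == "__main__":
    ext = to_extinction("soils", 2.0)
    print(ext, extinction_limits(ext))
```

Because soils and coarse mass carry small factors, a given mass concentration maps to a low extinction value, where the Table 2 limits are still relatively loose.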

[Figure 10: two panels, "MFE Goals and Criteria" and "MFB Goals and Criteria", each plotted against Average Concentration (µg/m3). Zone 1 lies inside the goal lines, Zone 2 between the goal and criteria lines, and Zone 3 outside the criteria lines.]

Fig. 10. Performance zones for mean fractional error (left) and mean fractional bias (right).

will equal 4.0, 10.0, 1.0, and 0.6 Mm⁻¹, respectively. To account for the differences in units, the specific equations used to define the goals and criteria for light extinction (Table 2) differ slightly from those used for PM.

3.3. Performance zones

The goals and criteria proposed in Sections 3.1 and 3.2 are not necessarily a pass/fail test, but can be used to help identify the level of model evaluation that must be performed to ensure the modeling is reliable, as well as to provide guidance for

scientific model applications. Mean fractional bias and error values can fall into one of three performance zones on the bugle plots (Fig. 10). The area below the goal line for error and between the positive and negative goal lines for bias is Zone 1 and corresponds to good model performance. We suggest that an operational evaluation be performed for species in this zone to quantify the level of agreement. The area between the goal and criteria lines is Zone 2 and corresponds to average model performance; it would require a diagnostic evaluation of the model's ability to predict precursors, oxidants, PM size


distributions, temporal variations, spatial variations, and mass fluxes. The area outside the criteria line(s) is Zone 3 and is an area of poor model performance. Modeled components of PM that fall into this zone would require extended diagnostic and mechanistic evaluations that will likely include sensitivity analysis.

Failure to meet the proposed criteria should not necessarily mean that the modeling cannot be used for regulatory applications. Consideration should be given to the extent to which modeling results fall outside the criteria. EPA modeling guidance recommends a "Weight of Evidence Approach" (US EPA, 2001) in which results of other analyses (e.g., other model outputs, trend analysis, use of observational models) are used to complement the findings of the photochemical model. Also, if it can be demonstrated that the modeling results can still be used with confidence for certain components of PM even though other components are modeled poorly, there is no reason to deem the modeling as a whole unacceptable. As PM models and measurement techniques mature and typical performance improves, performance zones can be made more restrictive by simply adjusting the coefficients in the performance goals and criteria equations (Tables 1 and 2) to meet the current state of the science.
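The zone assignment described in this section can be sketched end to end, from paired observations and predictions to a zone for each metric. The code below is a hedged illustration rather than the authors' implementation: the function names are hypothetical, MFB and MFE are computed from their usual definitions (the mean of the signed or absolute difference divided by the pair average, in percent), and the default scales correspond to the PM equations in Table 1. Passing 2.5 and 3.75 instead would correspond to the Table 2 extinction equations.

```python
import math


def fractional_metrics(observed, modeled):
    """Return (MFB %, MFE %, average concentration) for paired values.

    Pairs where observed + modeled == 0 would divide by zero and are assumed
    to have been filtered out beforehand.
    """
    pairs = list(zip(observed, modeled))
    n = len(pairs)
    mfb = 200.0 / n * sum((m - o) / (m + o) for o, m in pairs)
    mfe = 200.0 / n * sum(abs(m - o) / (m + o) for o, m in pairs)
    mean_obs = sum(o for o, _ in pairs) / n
    mean_mod = sum(m for _, m in pairs) / n
    return mfb, mfe, 0.5 * (mean_obs + mean_mod)


def zone(value, goal, criteria):
    """Zone 1: within the goal; Zone 2: within the criteria; Zone 3: outside."""
    magnitude = abs(value)
    if magnitude <= goal:
        return 1
    if magnitude <= criteria:
        return 2
    return 3


def performance_zones(observed, modeled, bias_scale=0.5, error_scale=0.75):
    """Classify error and bias separately, as on the two bugle-plot panels."""
    mfb, mfe, avg = fractional_metrics(observed, modeled)
    bias_decay = math.exp(-avg / bias_scale)
    error_decay = math.exp(-avg / error_scale)
    return {
        "mfb": mfb,
        "mfe": mfe,
        "average": avg,
        "bias_zone": zone(mfb, 170.0 * bias_decay + 30.0, 140.0 * bias_decay + 60.0),
        "error_zone": zone(mfe, 150.0 * error_decay + 50.0, 125.0 * error_decay + 75.0),
    }


if __name__ == "__main__":
    # A factor-of-two overprediction at high concentration.
    print(performance_zones([10.0, 10.0], [20.0, 20.0]))
    # A factor-of-three overprediction at very low concentration.
    print(performance_zones([0.1], [0.3]))
```

The two example calls illustrate the concentration dependence: the same kind of relative disagreement is judged more leniently when the average concentration is small.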

3.4. Discussion of results

The model performance for PM2.5 (Fig. 1) is fairly good, with about 50% of the points meeting the goals and a large majority meeting the criteria. A handful of points between 3.8 µg m⁻³ and 8.8 µg m⁻³ do not meet the criteria. A majority of the PM10 (Fig. 2) performance values meet the goals and criteria; however, a larger percentage of the total points fall outside the criteria compared to PM2.5.

Sulfate (Fig. 3) performance is better than that of most components of PM2.5 that were examined, with a majority of the points meeting the goal and only two points falling outside the criteria. This can likely be attributed to good estimates of SO2 emissions, sulfate chemistry being less complex than that of other PM components, and the high spatial homogeneity of sulfate. In contrast, nitrate (Fig. 4) performance is quite poor, with approximately 40% of the points falling outside the criteria and only values smaller than 0.8 µg m⁻³ meeting the goal. There is a strong positive bias when the average concentrations are over 1.0 µg m⁻³ and a strong negative bias when they are below 1.0 µg m⁻³. When not directly measured, observed ammonium was calculated by assuming the sulfate and nitrate were fully neutralized with ammonium (Malm et al., 2000). Therefore, it is not surprising that ammonium (Fig. 5) performance falls between that of sulfate and nitrate. Overall, the performance is fairly good because the majority of the ammonium is associated with sulfate, which performs well. Less than 40% of the organic (Fig. 6) performance assessments meet the goal and a fair number fall outside the criteria. The large errors are mainly due to simulated organic levels being lower than observations, especially at the urban STN sites. Elemental carbon (Fig. 7) concentrations are typically below 1.0 µg m⁻³ and the performance is within the performance goals for all assessments except one, which is just barely outside the goal. Approximately 40% of the soil (Fig. 8) points meet the goal, while another 45% are outside the criteria. Most of the soil error is caused by large overestimations by the model. Finally, coarse mass (Fig. 9) performance is poor, with only two points meeting the goal and a majority falling outside the criteria. Fortunately for PM2.5 modeling, coarse mass is not considered. However, it is likely that

[Figure 11: two panels, "Organics Mean Fractional Error" and "Organics Mean Fractional Bias", each plotted against Average Extinction (Mm-1) with the goal and criteria lines overlaid.]

Fig. 11. Organics (Mm⁻¹) mean fractional error (left) and mean fractional bias (right) for all benchmark runs compared to proposed performance goals and criteria.


[Figure 12: two panels, "Soils Mean Fractional Error" and "Soils Mean Fractional Bias", each plotted against Average Extinction (Mm-1) with the goal and criteria lines overlaid.]

Fig. 12. Soils (Mm⁻¹) mean fractional error (left) and mean fractional bias (right) for all benchmark runs compared to proposed performance goals and criteria.

EPA will be promulgating a new coarse PM standard in the near future, making it critical that models improve their predictions of CM.

An identical analysis has been performed for components of light extinction. In summary, performance patterns similar to those of PM were identified for sulfate, nitrate, ammonium, organics (Fig. 11), and elemental carbon. These similarities arise because each of these components gets a multiplier that is on the same order of magnitude as the scale change when converting from mass to light extinction. However, since soils (Fig. 12) and coarse mass do not get large multipliers due to their minimal impact on visibility, their performance is much improved over their PM mass performance. In fact, only a handful of soils and coarse mass points fall outside the criteria lines.

4. Conclusions

The authors have proposed model performance goals and criteria that vary as a function of PM concentration and light extinction. The goal has been met when both the mean fractional error and the mean fractional bias are less than or equal to +50% and ±30%, respectively. The criteria have been met when both the mean fractional error and the mean fractional bias are less than or equal to +75% and ±60%, respectively. Less abundant or "minor" species should have less stringent performance goals and criteria. Performance evaluation should be done on an episode-by-episode basis or on a month-by-month basis for annual modeling. Recommended performance goals and criteria should be used to help identify areas that can be improved upon in future modeling. Extended diagnostic evaluation and sensitivity tests should be performed to address poor performance. Using

the identified metrics, sulfate and EC components of PM2.5 and light extinction are generally the most accurately simulated, while nitrate and organic carbon performance is poor. Poor soil performance is of concern for PM2.5, but not visibility. Poor coarse mass performance does not impact total PM2.5 and has minimal impact on total light extinction.

Acknowledgements

The authors gratefully acknowledge Dr. Gail Tonnesen (University of California, Riverside), Mr. Kirk Baker (LADCO), and US EPA for supplying modeling results. We would also like to acknowledge the many individuals who have contributed to the collection of the data used for model evaluation in this and related studies, including those individuals associated with IMPROVE, STN, CASTNet, and SEARCH.

References

Bailey, E., Gautney, L., Jacobs, M., Kelsoe, J., Pun, B., Seigneur, C., Douglas, S., Haney, J., Kumar, N., 2004. A comparison of model performance of CMAQ, MADRID-1, MADRID-2 and REMSAD. AAAR 2004 Annual Conference, Atlanta, GA.

Baker, K.R., 2004a. Application of multiple one-atmosphere air quality models emphasizing PM2.5 performance evaluation (Paper # 04-A-320-AWMA). Air and Waste Management Association Conference, Indianapolis, IN.

Baker, K.R., 2004b. Interpretation of PM2.5 model performance metrics. Presented at PM Model Performance Workshop, Chapel Hill, NC. 11 February 2004. Located at: http://www.ladco.org/tech/photo/present/metrics1.pdf.

Bencala, K.E., Seinfeld, J.H., 1979. An air quality model performance assessment package. Atmospheric Environment 13 (8), 1181–1185.


Boylan, J.W., Odman, M.T., Wilkinson, J.G., Russell, A.G., Doty, K.G., Norris, W.B., McNider, R.T., 2002. Development of a comprehensive, multiscale "one-atmosphere" modeling system: application to the Southern Appalachian Mountains. Atmospheric Environment 36, 3721–3734.

Boylan, J.W., Odman, M.T., Wilkinson, J.G., Russell, A.G., 2006. Integrated assessment modeling of atmospheric pollutants in the Southern Appalachian Mountains: Part II. Fine particulate matter and visibility. Journal of the Air and Waste Management Association 56, 12–22.

Byun, D.W., Ching, J.K.S., 1999. Science Algorithms of the EPA Models-3 Community Multiscale Air Quality (CMAQ) Modeling System. EPA/600/R-99/030, US Environmental Protection Agency, Office of Research and Development, Washington, DC.

Chang, J.C., Hanna, S.R., 2004. Air quality model performance evaluation. Meteorology and Atmospheric Physics 87 (1–3), 167–196.

Dennis, R.L., Downton, M.W., 1984. Evaluation of urban photochemical models for regulatory use. Atmospheric Environment 18, 2055–2069.

Environ, 2002. User's Guide: Comprehensive Air Quality Model with Extensions (CAMx). Environ International Corporation, Novato, CA.

Hanna, S.R., Lu, Z., Frey, H.C., Wheeler, N., Vukovich, J., Arunachalam, S., Fernau, M., Hansen, D.A., 2001. Uncertainties in predicted ozone concentration due to input uncertainties for the UAM-V photochemical grid model applied to the July 1995 OTAG domain. Atmospheric Environment 35, 891–903.

Hanna, S.R., Russell, A.G., Wilkinson, J.G., Vukovich, J., Hansen, D.A., 2005. Monte Carlo estimation of uncertainties in BEIS3 emission outputs and their effects on uncertainties in chemical transport model predictions. Journal of Geophysical Research 110, D01302.

Hansen, D.A., Edgerton, E.S., Hartsell, B.E., Jansen, J.J., Kandasamy, N., Hidy, G.M., Blanchard, C.L., 2003. The Southeastern aerosol research and characterization study: part 1 - overview. Journal of the Air and Waste Management Association 53, 1460–1471. Data located at http://www.atmospheric-research.com/public/index.html.

ICF Consulting, 2002. User's Guide to the Regional Modeling System for Aerosols and Deposition (REMSAD) Version 7. ICF Consulting, San Rafael, CA.

IMPROVE, 1995. IMPROVE Data Guide - A Guide To Interpret Data. University of California Davis. Prepared for National Park Service, Air Quality Research Division, Fort Collins, CO. Data located at http://vista.cira.colostate.edu/improve/Data/IMPROVE/improve_data.htm.

Malm, W.C., Pitchford, M.L., Scruggs, M., Sisler, J.F., Ames, R., Copeland, S., Gebhart, K.A., Day, D.E., 2000. Spatial and seasonal patterns and temporal variability of haze and its constituents in the United States, report III; ISSN: 0737-535247. Cooperative Institute for Research in the Atmosphere, Colorado State University, Fort Collins, CO.

Marmur, A., Russell, A.G., Mulholland, J.A., 2004. Air-quality modeling of PM2.5 mass and composition in Atlanta: results from a two-year simulation and implications for use in health studies. Reprints from third Annual CMAS Models-3 Conference, 2004. Chapel Hill, NC.

McNair, L., Harley, R., Russell, A.G., 1996. Spatial inhomogeneity in pollutant concentrations and their implications for air quality model evaluation. Atmospheric Environment 30, 4291–4301.

Morris, R.E., Koo, B., Lau, S., Tesche, T.W., McNally, D., Loomis, C., Stella, G., Tonnesen, G., Wang, Z., 2004a. VISTAS emissions and air quality modeling phase I task 4cd report: Model Performance Evaluation and Model Sensitivity Tests for Three Phase I Episodes - Final Report. Prepared for Visibility Improvement State and Tribal Association of the Southeast (VISTAS), Swannanoa, NC.

Morris, R.E., Koo, B., Lau, S., Tesche, T.W., McNally, D., Loomis, C., Stella, G., Tonnesen, G., Chien, C-J., 2004b. VISTAS Phase II Emissions and Air Quality Modeling Task 4a Report: evaluation of the initial CMAQ 2002 annual simulation - revised draft final report. Prepared for Visibility Improvement State and Tribal Association of the Southeast (VISTAS), Swannanoa, NC.

Odman, M.T., Boylan, J.W., Wilkinson, J.G., Russell, A.G., Mueller, S.F., Imhoff, R.E., Doty, K.G., Norris, W.B., McNider, R.T., 2002. SAMI air quality modeling final report. Southern Appalachian Mountains Initiative, Asheville, NC.

Park, S-K., Russell, A.G., 2003. Sensitivity of PM2.5 to emissions in the Southeast. Reprints from Second Annual CMAS Models-3 Conference, Chapel Hill, NC.

Russell, A.G., Dennis, R., 2000. NARSTO critical review of photochemical models and modeling. Atmospheric Environment 34, 2283–2324.

Seigneur, C., 2001. Current status of air quality models for particulate matter. Journal of the Air and Waste Management Association 51, 1508–1521.

Seigneur, C., Pun, B., Pai, P., Louis, J.F., Solomon, P., Emery, C., Morris, R., Zahniser, M., Worsnop, D., Koutrakis, P., White, W., Tombach, I., 2000. Guidance for the performance evaluation of three-dimensional air quality modeling systems for particulate matter and visibility. Journal of the Air and Waste Management Association 50, 588–599.

Seinfeld, J.H., Pandis, S.N., 1998. Atmospheric Chemistry and Physics. Wiley, New York, NY.

Sisler, J.F., Malm, W.C., 2000. Interpretation of trends of PM2.5 and reconstructed visibility from the IMPROVE Network. Journal of the Air and Waste Management Association 50, 775–789.

Solomon, P., Klamser-Williams, T., Egeghy, P., Crumpler, D., Rice, J., 2004. STN/IMPROVE comparison study - preliminary results. Presented at PM Model Performance Workshop, Chapel Hill, NC. 10 February 2004. Located at: http://www.cleanairinfo.com/PMModelPerformanceWorkshop2004/presentations/RiceSTNImprove.ppt.

Tesche, T.W., Georgopoulos, P., Seinfeld, J.H., Lurmann, F., Roth, P.M., 1990. Improvements in procedures for evaluating photochemical models. Report A832-103, California Air Resources Board, Sacramento, CA.

Tonnesen, G., Wang, B., Chien, C.J., Wang, Z., Omary, M., Adelman, Z., Holland, A., Morris, R., 2004. WRAP 2002 Visibility Modeling: Annual CMAQ Performance Evaluation using Preliminary 2002 version C Emissions. Prepared for Western Regional Air Partnership, http://pah.cert.ucr.edu/aqm/308/ppt_files/WRAP_Pre02C_eval_results.ppt.

US EPA, 1991. Guidance for Regulatory Application of the Urban Airshed Model (UAM). US Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC.

US EPA, 1999. Federal Register, 40 CFR Part 51. Regional Haze Regulations; Final Rule. US Environmental Protection Agency, Washington, DC.

US EPA, 2000. Quality Assurance Guidance Document - Final Quality Assurance Project Plan: PM2.5 Speciation Trends Network Field Sampling (EPA-454/R-01-001). US Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC. Data located at: http://www.epa.gov/ttn/airs/airsaqs/detaildata/downloadaqsdata.htm.

US EPA, 2001. Draft Guidance for Demonstrating Attainment of Air Quality Goals for PM2.5 and Regional Haze. US Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC.

US EPA, 2002. List of 156 mandatory Class I Federal areas: www.epa.gov/oar/vis/class1.html.

US EPA, 2003a. Clean air status and trends network (CASTNet) - 2002 annual report (EPA contract no. 68-D-03-052). Prepared by MACTEC, Inc. for US Environmental Protection Agency, Office of Atmospheric Programs (OAP), Washington, DC. Data located at http://www.epa.gov/castnet/data.html.

US EPA, 2003b. Technical support document for the clear skies act 2003 air quality modeling analyses. US Environmental Protection Agency, Office of Air Quality Planning and Standards, Research Triangle Park, NC.


Wang, Z., Tonnesen, G., Morris, R., Shankar, U., 2002. WRAP Regional Haze CMAQ 1996 Model Performance Evaluation. Prepared for Western Regional Air Partnership, http://pah.cert.ucr.edu/rmc/ppt_files/UCR_Denver_1996_evaluation.070502.ppt.

Yu, S., Eder, B.K., Dennis, R.L., Chu, S-H., Schwartz, S., 2005. On the development of new metrics for the evaluation of air quality models. Atmospheric Environment, submitted for publication.

Zhang, Y., Pun, B., Wu, S-Y., Vijayaraghavan, K., Yelluru, G.K., Seigneur, C., 2003. Performance Evaluation of CMAQ and PM-CAMx for the July 1999 SOS Episode. CRC Project Number A-40-1 (Document Number CP131-03-02). Prepared for Coordinating Research Council, Inc., Alpharetta, GA.

Zhang, Y., Pun, B., Vijayaraghavan, K., Wu, S-Y., Seigneur, C., Pandis, S.N., Jacobson, M.Z., Nenes, A., Seinfeld, J.H., 2004a. Development and application of the Model of aerosol dynamics, reaction, ionization, and dissolution (MADRID). Journal of Geophysical Research 109, D01202, doi:10.1029/2003JD003501.

Zhang, Y., Pun, B., Wu, S-Y., Vijayaraghavan, K., Seigneur, C., 2004b. Application and evaluation of two air quality models for particulate matter for a southeastern US episode. Journal of the Air and Waste Management Association 54, 1478–1493.