Expert Systems with Applications 41 (2014) 869–876
Expert Systems with Applications journal homepage: www.elsevier.com/locate/eswa
Multi-criteria selection of an Air Quality Model configuration based on quantitative and linguistic evaluations
V.H. Almanza ⇑, I. Batyrshin, G. Sosa
Instituto Mexicano del Petróleo, 07730 México D.F., Mexico
Article info
Keywords: Multi-criteria evaluation Pareto Fronts Air quality WRF-Chem
Abstract
This study presents the application of multi-criteria evaluation in the selection of an optimal configuration for an Air Quality Model. The simulation domains focus on the Mexico City Metropolitan Area. A set of 10 different configurations was considered as alternatives. These configurations included convective parameterization, 6th order diffusion and exclusion of data assimilation within the Planetary Boundary Layer. In addition, model integration in a continuous setup and in a segmented setup was also considered. The modeling variables were surface temperature, wind speed, wind direction, and sulfur dioxide. The performance of the meteorological fields was evaluated with statistical metrics together with the Local Trend Association measure, and these were further used as criteria. The air pollution field was evaluated qualitatively with five expert-based linguistic criteria, which were then converted into numerical terms. Meteorological variables and sulfur dioxide were aggregated into two single arrays, one representing meteorology and the other representing transport of a large industrial plume. Pareto Fronts were constructed for these two arrays under different weighting scenarios. Results suggested that a model with no parameterizations in a continuous integration setup and a model with segmented integrations using 6th order diffusion were the optimal configurations to conduct future air quality studies.
© 2013 Elsevier Ltd. All rights reserved.
1. Introduction
The negative impacts of air pollution on human health and ecosystems have long been recognized. The role of anthropogenic emissions is at the forefront of these impacts (WMO, 2012). To effectively address air pollution at several spatial scales, air quality modeling is of utmost relevance. Air quality forecasting uses source and receptor models to study the degree of pollution in future events. There are several approaches to modeling air pollution, and the choice among them depends on the scientific goals of the study, including the spatial scale of interest. In this respect, expert-based approaches are important in air quality studies. Examples include forecasting air pollution with fuzzy time series methods (Domanska & Wojtylak, 2012); classifying concentrations of criteria pollutants in Mexico (Barrón-Adame, Cortina-Januchs, Vega-Corona, & Andina, 2012); controlling emissions (Zhou, Huang, & Chan, 2004); identifying the geographical distribution of particle pollution (Li & Shue, 2004); and suggesting a system for air quality management at regional scale based on interval programming with stochastic variables (Cao, Huang, & He, 2011).
⇑ Corresponding author: V.H. Almanza.
© 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.eswa.2013.08.017
Another approach consists in the application of deterministic models which solve the conservation equations of different physical and chemical processes (Zhang, Bocquet, Mallet, Seigneur, & Baklanov, 2012). These models are known as Air Quality Models (AQM) or Chemical Transport Models (CTM). Models of this kind systematically combine current knowledge of meteorology, emissions and atmospheric chemistry to make forecasts of ambient concentrations (Molina et al., 2004). In addition, they provide scientific understanding of pollutant processes; can address and track, both in space and time, the long-range transport of air pollutants; and can help in formulating scenarios involving changes in meteorology or emissions (Zhang et al., 2012). In Mexico, air quality studies using this type of model have been conducted in recent years, focusing on different research objectives and on different kinds of pollutants: sensitivity of ozone formation to emission changes (Lei, de Foy, Zavala, Volkamer, & Molina, 2007); assessment of possible underestimation of volatile organic compounds in the official National Emissions Inventory (West et al., 2004); and estimation of the source contribution of large sulfur dioxide plumes (de Foy et al., 2009), among others.
Meteorological fields, including wind, hourly temperature, mixing depth and solar insolation fields, are an important input for any modeling exercise with Air Quality Models (Russell & Dennis, 2000). These fields can carry great uncertainty, which may result
in mispredicting airborne chemical species, aerosols and particulate matter (Seaman, 2000). Thus, accurate meteorological fields in a CTM are of utmost importance, since they lead to more reliable forecasts of air pollution events.
Multi-criteria evaluation is suitable for addressing complex problems involving high uncertainty, conflicting objectives and different forms of data, among other features (Wang, Jing, Zhang, & Zhao, 2009). It selects the best choice in a set of potential alternatives (Dalalah, Hayajneh, & Batieha, 2011). Since it is unlikely that optimal values can be achieved in all objectives at the same time, the final solution is a compromise among the initial alternatives in the selection process (Behnamian, Ghomi, & Zandieh, 2009). In environmental sciences it has been used in the development of a support system to help plan water resources management in the Haihe river (China) (Weng, Huang, & Li, 2010) and in the estimation of the optimum number of parameters for the coupled system of a multi-layer perceptron and the Numerical Weather Prediction (NWP) model HIRLAM (Niska et al., 2005).
In this respect, the best configuration for either an NWP model or a CTM cannot be known a priori, even though there are recommended parameterizations which produce reliable results. Therefore, sensitivity experiments are conducted in order to choose the best configuration among different physical parameterizations (Borge, Alexandrov, del Vas, Lumbreras, & Rodríguez, 2008; Lo, Yang, & Pielke, 2008). However, not all possible combinations can feasibly be evaluated, since some parameters are specific to some modeling options. Although sensitivity studies usually focus on the meteorological fields, there are works which additionally assess the potential transport of pollutants for each configuration. Trajectory calculations are computed in order to evaluate the meteorological fields (Godowitch, Gilliam, & Rao, 2011; Ngan et al., 2012; Seaman, 2000).
This allows depicting the degree of uncertainty in the meteorological fields. Nevertheless, the plume transport is not considered in these works as an extra criterion to evaluate model performance. The main objective of this study is to select an optimum configuration for an AQM based on multi-criteria evaluation. The main motivation is to have a trade-off between meteorology and plume transport in order to conduct a regional air quality simulation focusing on the Mexico City Metropolitan Area (MCMA). Section 2 presents the AQM and the area of study. The methodology is presented in Section 3. Results and discussion are given in Section 4, and conclusions of this study are summarized in Section 5.
2. Air Quality Model and area of study
2.1. WRF-Chem
Detailed Eulerian numerical Air Quality Models have been developed for research purposes and to support emissions-control policy decisions (Seaman, 2000). An example is the WRF-Chem model. It is an on-line chemistry model fully coupled to the Weather Research and Forecasting (WRF) model (Skamarock et al., 2005) with lead developments at the National Oceanic and Atmospheric Administration (NOAA) and the Pacific Northwest National Laboratory (Fast et al., 2006; Grell et al., 2005). In this respect, the WRF model is a state-of-the-art mesoscale system designed for both short- and long-term weather and climate simulations. It is a non-hydrostatic model, with different physical parameterizations, including cumulus parameterizations, Planetary Boundary Layer, land surface models, long-wave and short-wave radiation, and microphysics models. It also includes multi-scale data assimilation. Thus, WRF-Chem preserves the transport, grid and physics schemes of WRF's meteorological component when solving the transport in sub-grid scales. It has different chemical mechanisms, as well as several schemes of
aerosol and photolysis (WRF-Chem, 2010; WRF, 2010). WRF-Chem version 3.2.1 is used in the present study.
2.2. Tula region and MCMA
The city of Tula is located in the Mezquital Valley, in southwest Hidalgo, with a total population of nearly 94,000 inhabitants and more than 140 industries. The region is semi-arid, with average temperatures of 17 °C and precipitation ranging from 432 to 647 mm, increasing from north to south. In this region, the Tula Industrial Complex is settled in an area of 400 km². The major industries of the city are located within this region, including the Miguel Hidalgo Refinery, the Francisco Perez Rios power plant, several cement plants and limestone quarries. Other minor industries include metal manufacturing, processed food, chemicals and incineration of industrial waste. The emission of pollutants from the combustion processes of these industries impacts the regional air quality. In addition, the inflow of untreated sewage water from Mexico City causes severe pollution of soil and water resources (Cifuentes, Blumenthal, Ruíz-Palacios, Bennett, & Peasey, 1994; Vazquez-Alarcon, Justin-Cajuste, Siebe-Grabach, Alcantar-Gonzalez, & de la Isla-de Bauer, 2001). According to current environmental regulations, this region is classified as a critical area due to the high emissions of SO2 and particulate matter (SEMARNAT-INE, 2006).
The MCMA is the largest megacity in North America, and the third largest urban agglomeration, with nearly 22 million inhabitants, after Tokyo (Japan) and Delhi (India) (UN, 2012). It is located in the subtropics within an elevated U-shaped basin surrounded by mountain ridges which border the west, east and south regions of the city. The metropolitan area covers 1500 km² on the southwest side (Parrish, Singh, Molina, & Madronich, 2011; Williams, Brown, Cruz, Sosa, & Streit, 1995).
This topography acts like a barrier to large-scale circulations and isolates the basin from the winds of synoptic weather systems at low levels (Zhang & Dubey, 2009). Previous work has shown that emissions from the Tula Industrial Complex can impinge on the megacity, exerting an influence on sulfur dioxide pollution levels in the city (de Foy et al., 2009; Williams et al., 1995; Almanza, Molina, & Sosa, 2012).
3. Air Quality Model configurations
A set of 10 different WRF-Chem configurations was considered in this study. Each model run encompasses a 6-day simulation period, from 00:00 UTC 22 March to 00:00 UTC 28 March of 2006, using three domains with horizontal resolutions of 27, 9 and 3 km, and 35 vertical levels (Fig. 1a). All of the simulations performed Multiscale Four Dimensional Data Assimilation (FDDA) to improve accuracy in the meteorological fields and kept the same physics to maintain consistency. Operational meteorological data; Mexico City Air Quality Network (RAMA) surface data; and radar wind profilers, radiosondes and surface data from the MILAGRO field campaign were used in the FDDA assimilation process. The MILAGRO field campaign was a major international collaborative project to examine the behavior and export of atmospheric emissions from the Mexico megacity (Molina et al., 2010).
The simulations considered two of the main approaches to running an AQM. The first is a single continuous integration requiring an initialization of meteorology and chemistry just at the beginning of the run. The second is a multiple integration of overlapping short segments requiring an initialization at the beginning of each segment run. Even though the two approaches appear similar, final results may differ. In addition, computing time and disk storage are important features to take into account when running the AQM in either of these approaches. Segmented runs tend to require relatively more disk space than a single continuous run.
Fig. 1. (a) WRF-Chem simulation domains: d01 is the parent domain; d02 and d03 indicate the number of the nested domain. (b) Location of meteorological monitoring sites (circles) and SO2 monitoring sites (stars). The square represents the extent of the Tula Industrial Complex (TIC) containing Francisco Perez Rios Power Plant (FPRPP) and Miguel Hidalgo Refinery (MHR). Label bar indicates terrain height.
Table 1. Parameters employed for the sensitivity cases of this study. M1 is the baseline case. M1 to M5 correspond to model configurations in the continuous integration setup (C). M6 to M10 correspond to model configurations in the segmented integrations setup (S). Y denotes inclusion of a parameterization.

Parameter \ Case name                M1/M6  M2/M7  M3/M8  M4/M9  M5/M10
Convective parameterization in d03   –      Y      –      –      Y
6th order diffusion in d01 and d02   –      –      Y      –      Y
Wind field nudging above the PBL     –      –      –      Y      –
Integration                          C/S    C/S    C/S    C/S    C/S
Previous work obtained reliable results of air quality simulations for the MCMA using some of the most common modeling options (Almanza et al., 2012; Rivera et al., 2009). Thus, rather than performing an exhaustive sensitivity test of all the modeling options, the present study just considered (see Table 1) the inclusion of convective parameterization in the third domain (d03), 6th order horizontal diffusion in the first two domains (d01 and d02), and inclusion of grid nudging of the wind field within the Planetary Boundary Layer (PBL), also in the first two domains, with the purpose of increasing the accuracy of future air quality simulations for this specific modeling period. Grid nudging is a kind of Newtonian relaxation which is ideal for assimilating synoptic data covering most or all of the model domain at discrete intervals (Seaman, 2000). M1 to M10 denote the corresponding set of the 10 WRF configurations. For example, M2 included convective parameterization under a continuous integration setup, whilst M7 included convective parameterization under a segmented integration setup. Neither M1 nor M6 included any of the studied parameterizations; they differ only in how they were integrated.
4. Model performance
4.1. Quantitative evaluation
The performance of the meteorological fields obtained with WRF-Chem was assessed by means of the following measures: Root Mean Squared Error (RMSE):
$$m_1(P,O) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(P_i - O_i)^2}$$
the mean bias (BIAS):
$$m_2(P,O) = \frac{1}{n}\sum_{i=1}^{n}(P_i - O_i)$$
and the Index of Agreement (IOA), which is designed to better handle differences in predicted and observed means and variances (Willmott et al., 1985):
$$m_3(P,O) = 1 - \frac{\sum_{i=1}^{n}(P_i - O_i)^2}{\sum_{i=1}^{n}\left(|P_i - \bar{O}| + |O_i - \bar{O}|\right)^2}$$
where $P_i$ denotes the model predictions, $O_i$ the experimental observations and $\bar{O}$ the average of the observations over the $n$ simulation data points. The meteorological observations from 15 monitoring stations, including the Mexico City RAMA monitoring network and the surface measurements from the MILAGRO campaign, were used to compare model results. The model surface variables considered for this purpose were the temperature at 2 m above ground level (T), wind speed (WS) at 10 m above ground level and wind direction (WD). With respect to the wind field, the RMSE of the vector wind difference (RMSEvec) was calculated. This statistic considers both speed and direction errors (Fast, 1995).
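As an illustration of the statistical measures above, the following is a minimal sketch in plain Python. It assumes the standard definitions of RMSE, mean bias and Willmott's squared-difference Index of Agreement; for RMSEvec it assumes that the squared vector wind difference is averaged over all samples before taking the square root. The function names and the sample values are hypothetical and are not taken from the study.

```python
import math

def rmse(pred, obs):
    """Root mean squared error m1(P, O)."""
    n = len(obs)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / n)

def mean_bias(pred, obs):
    """Mean bias m2(P, O): average of the differences P_i - O_i."""
    n = len(obs)
    return sum(p - o for p, o in zip(pred, obs)) / n

def index_of_agreement(pred, obs):
    """Index of Agreement m3(P, O), squared-difference form (assumed)."""
    o_bar = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for p, o in zip(pred, obs))
    den = sum((abs(p - o_bar) + abs(o - o_bar)) ** 2 for p, o in zip(pred, obs))
    # A zero denominator means all values equal the observed mean.
    return 1.0 - num / den if den > 0 else float("nan")

def rmse_vec(u_pred, v_pred, u_obs, v_obs):
    """RMSE of the vector wind difference (assumes averaging over samples)."""
    n = len(u_obs)
    sq = sum((up - uo) ** 2 + (vp - vo) ** 2
             for up, vp, uo, vo in zip(u_pred, v_pred, u_obs, v_obs))
    return math.sqrt(sq / n)

# Hypothetical 2 m temperatures (deg C), for illustration only.
obs = [12.0, 14.0, 18.0, 21.0]
pred = [13.0, 14.0, 17.0, 23.0]
print(rmse(pred, obs), mean_bias(pred, obs), index_of_agreement(pred, obs))
```

RMSE and BIAS are errors (smaller magnitude is better), whereas IOA equals 1 for a perfect match, so the criteria do not share a common orientation before normalization.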
$$m_4(P,O) = \left[(u_P - u_O)^2 + (v_P - v_O)^2\right]^{1/2}$$
where $u_P, v_P$ are the predicted and $u_O, v_O$ the observed zonal and meridional wind components, respectively. In this study the air quality simulations were run just in the third domain (d03) to save computing time. Thus, the performance was evaluated only for the innermost domain.
In addition to the statistical evaluation methods, the Local Trend Association (LTA) measure based on the Moving Approximation Transform (MAT) (Batyrshin, Herrera-Avelar, Sheremetov, & Panova, 2007) was used for the performance evaluation of the model configurations. Unlike the statistical evaluation measures, LTA measures the similarity in the dynamics of time-dependent variables given in measurements and in models. MAT transforms a time series $y = (y_1, \ldots, y_n)$ into a sequence $MAT_k(y) = (a_1, \ldots, a_{n-k+1})$ of slopes of least-squares linear regressions $f_i = a_i t + b_i$ of the time series $(y_i, y_{i+1}, \ldots, y_{i+k-1})$ in a sliding window of size $k \in \{2, \ldots, n\}$. These slope values are calculated as (Batyrshin et al., 2007):
$$a_i = \frac{6\sum_{j=0}^{k-1}(2j - k + 1)\,y_{i+j}}{k(k^2 - 1)}, \quad i \in \{1, \ldots, n-k+1\}.$$
The Local Trend Association between time series P and O is calculated for window size k by
$$lta_k(P,O) = \cos(MAT_k(P), MAT_k(O)).$$
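The MAT slopes and the LTA measure can be sketched as follows. This is a minimal illustration in plain Python, assuming a unit time step between samples and interpreting the cosine as the cosine of the angle between the two slope sequences; the function names and the sample series are hypothetical.

```python
import math

def mat_slopes(y, k):
    """MAT_k(y): least-squares slope in each sliding window of size k."""
    n = len(y)
    if not 2 <= k <= n:
        raise ValueError("window size k must satisfy 2 <= k <= n")
    denom = k * (k * k - 1)
    # a_i = 6 * sum_j (2j - k + 1) * y_{i+j} / (k (k^2 - 1)), with 0-based i here
    return [6.0 * sum((2 * j - k + 1) * y[i + j] for j in range(k)) / denom
            for i in range(n - k + 1)]

def lta(p, o, k):
    """Local Trend Association: cosine between the slope sequences of P and O."""
    a, b = mat_slopes(p, k), mat_slopes(o, k)
    dot = sum(x * z for x, z in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(z * z for z in b))
    # Undefined when one of the series has no trend at all (all slopes zero).
    return dot / norm if norm > 0 else float("nan")

# Hypothetical series, for illustration only.
observed = [1.0, 3.0, 2.0, 5.0, 4.0]
same_trend = [2.0 * v + 1.0 for v in observed]   # same dynamics, different scale
opposite = [-v for v in observed]                # mirrored dynamics
print(lta(same_trend, observed, 2), lta(opposite, observed, 4))
```

Because the cosine ignores scale and offset, a model series that rises and falls together with the observations scores close to 1 even when its magnitudes are biased, which is what distinguishes LTA from the error statistics.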
This measure was used in Almanza and Batyrshin (2011) for analysis of associations between atmospheric pollutants and meteorological variables in the MCMA. In the present study, we used an aggregation of LTA obtained for window sizes k = 2 and k = 4:
$$m_5(P,O) = \max(lta_2(P,O), lta_4(P,O)).$$
4.2. Qualitative evaluation
Unlike the meteorological fields, the model performance of the regional plume transport for each configuration was assessed qualitatively in terms of SO2. Five features were considered for this purpose: the dynamics on 22 March (d22), the timing of the concentration peak on 23 March (p23), the magnitude of the peak on 23 March (m23), the concentration in the early morning on 24 March (mo24), and the peak of late night and early morning of 24 and 25 March (p2425), respectively. Each model of the set of 10 configurations was assessed on the basis of previous results and of the expected behavior of the model (Almanza et al., 2012; de Foy et al., 2009; Rivera et al., 2009). One of the most rigorous tests of model performance is the reproduction of the peak concentration in the grid cell containing the measurement location at the same hour as it is measured, as well as the phasing of the peak, which would be within ±1 h (Seinfeld, 1988). For this purpose, observations from 25 monitoring stations including RAMA and MILAGRO were used to analyze plume transport from Tula to the MCMA. It should be noted that some stations which monitor meteorological variables do not monitor sulfur dioxide and vice versa (Fig. 1b). At this stage, just the emissions from Tula are considered in order to simplify the analysis. The aim is to select an optimum configuration for future air quality simulations.
5. Multi-criteria evaluation
5.1. Quantitative criteria: meteorology
The evaluation of a set of model configurations taking the statistical quantities {RMSE, IOA, BIAS, RMSEvec, LTA} as the criteria j = 1, ..., 5 and the model configurations as the alternatives {M1, ..., M10} (Ng, 2008) was performed for the aggregated meteorological variables T, WS and WD. The aggregation consisted in taking the arithmetic average of measurements from the 15 monitoring stations previously mentioned in Section 4.1. The median and the lexicographical minimum were considered as aggregation methods but they tended to exclude the outliers (not shown). Each model configuration $M_i$, i = 1, ..., 10, was evaluated by converting the statistical quantities under all criteria into a single performance measure $PM_{v_{ik}}$ for each meteorological variable k ∈ {T, WS, WD}. The matrix $MP_{i,j}$ {i = 1, ..., 10; j = 1, ..., 5} represents the performance of $M_{i,j}$, that is, the model configuration $M_i$ under criterion j, normalized in a 0–1 scale. The normalization used the following linear transformation:
$$MP_{i,j} = \frac{M_{i,j} - \min_{i=1,\ldots,10}\{M_{i,j}\}}{\max_{i=1,\ldots,10}\{M_{i,j}\} - \min_{i=1,\ldots,10}\{M_{i,j}\}}$$
Thus, the performance measure of a model configuration for each meteorological variable was expressed as the weighted sum $PM_{v_{ik}} = \sum_{j=1}^{5} x_{ij}\,MP_{ij}$, where $x_{ij}$ is the weight of criterion j of model configuration $M_i$ such that $\sum_{j=1}^{5} x_{ij} = 1$. Subsequently, the performance measures of all meteorological variables were further aggregated into a single performance array $MET_i$ by taking their weighted sum $MET_i = \sum_{k=1}^{3} W_{ik}\,PM_{v_{ik}}$, where $W_{ik}$ is the weight of meteorological variable k for performance measure $PM_{v_{ik}}$, such that $\sum_{k=1}^{3} W_{ik} = 1$. In this study, wind direction is relatively more important than wind speed and temperature. Thus, a slightly higher weight was assigned to wind direction. If ozone simulations were to be conducted, temperature would be as important as wind direction and similar weights could be assigned. In this work, however, the analysis just considers the gas phase chemistry of sulfur dioxide emissions released by large point sources.
In order to examine the influence of the statistical criteria in the configuration selection, a sensitivity test on criteria weights was conducted. It consisted in assigning equal weights to all statistical quantities (S1), moderately higher weights to some of them (S2), relatively equal weights but one higher than the rest (S3–S6) and considering just one of the statistical quantities (S7–S10), as shown in Table 2. The purpose was to represent the relative importance of each criterion in the selection process and to further investigate how the selection could be influenced if considering just one statistical performance measure. Thus, a single performance array $MET_i$ was obtained for each sensitivity test.
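The normalization and the two weighted sums of Section 5.1 can be sketched as below. This is a minimal illustration under stated assumptions: the same criterion weights are applied to every configuration, and criteria for which lower values are better are flipped after normalization so that 1 always denotes the best performance. The text does not specify how the orientation of the criteria was handled, so the `lower_is_better` argument is an assumption, and all names and numbers are hypothetical.

```python
def min_max_normalize(column):
    """Scale one criterion across all configurations to the 0-1 range."""
    lo, hi = min(column), max(column)
    if hi == lo:  # assumption: a constant criterion cannot discriminate
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def performance(matrix, weights, lower_is_better=()):
    """PM_i = sum_j x_j * MP_ij for raw values matrix[i][j] of criterion j."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("criterion weights must sum to 1")
    n_crit = len(weights)
    cols = [min_max_normalize([row[j] for row in matrix]) for j in range(n_crit)]
    for j in lower_is_better:  # assumed orientation handling
        cols[j] = [1.0 - v for v in cols[j]]
    return [sum(weights[j] * cols[j][i] for j in range(n_crit))
            for i in range(len(matrix))]

def aggregate(pm_by_variable, var_weights):
    """MET_i = sum_k W_k * PM_ik over the meteorological variables."""
    if abs(sum(var_weights) - 1.0) > 1e-9:
        raise ValueError("variable weights must sum to 1")
    n_cfg = len(pm_by_variable[0])
    return [sum(w * pm[i] for w, pm in zip(var_weights, pm_by_variable))
            for i in range(n_cfg)]

# Hypothetical values: three configurations, two criteria (an error and IOA).
raw = [[2.0, 0.8], [1.0, 0.9], [3.0, 0.6]]
pm_t = performance(raw, weights=[0.5, 0.5], lower_is_better=(0,))
# Hypothetical variable weights (T, WS, WD), wind direction slightly higher.
met = aggregate([pm_t, pm_t, pm_t], var_weights=[0.3, 0.3, 0.4])
print(pm_t, met)
```

With the weights of a sensitivity test from Table 2 passed as `weights`, repeating the call for each test would yield one MET array per test.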
Table 2. Assigned weights to the statistical quantities in the sensitivity tests.

Statistical measure   S1    S2     S3     S4     S5     S6     S7  S8  S9  S10
RMSE/RMSEvec          0.25  0.045  0.295  0.235  0.235  0.235  1   0   0   0
IOA                   0.25  0.045  0.235  0.295  0.235  0.235  0   1   0   0
BIAS                  0.25  0.455  0.235  0.235  0.295  0.235  0   0   1   0
LTA                   0.25  0.455  0.235  0.235  0.235  0.295  0   0   0   1
Table 3. Numerical values assigned by priority to the considered linguistic attributes. E, VG, G, F and P represent the linguistic attributes Excellent, Very Good, Good, Fair and Poor. The subscripts H and L represent high priority and low priority, respectively.

Value                 100  90   75       70   60  45       10  0
Linguistic attribute  EH   VGH  GH = EL  VGL  GL  FL = FH  PL  PH

Table 4. Number of dominated model configurations for best candidates.

Model configuration  S1  S2  S3  S4  S5  S6  S7  S8  S9  S10  Total
M1                   4   4   0   0   4   4   0   0   3   4    23
M2                   0   0   0   0   0   0   4   0   0   4    8
M8                   4   0   4   4   0   4   4   5   0   0    25
M9                   0   0   0   0   0   0   4   0   3   0    7
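The conversion of linguistic attributes into numerical values described in Section 5.2, using the values of Table 3, can be sketched as follows. The values and the feature priorities are taken from the text; the equal-weight average over the five features of a station and the division by 100 to reach a 0–1 scale are assumptions made only for this sketch, since the text does not state how these two steps were carried out. The ratings in the example are hypothetical.

```python
# Numerical values of Table 3, keyed by (linguistic attribute, priority).
SCORES = {
    ("E", "H"): 100, ("VG", "H"): 90, ("G", "H"): 75, ("F", "H"): 45, ("P", "H"): 0,
    ("E", "L"): 75, ("VG", "L"): 70, ("G", "L"): 60, ("F", "L"): 45, ("P", "L"): 10,
}

# Priority of each model feature: p23 and mo24 are high priority, the rest low.
PRIORITY = {"d22": "L", "p23": "H", "m23": "L", "mo24": "H", "p2425": "L"}

def station_score(ratings):
    """Mean numerical value of one station (assumed equal feature weights)."""
    values = [SCORES[(attr, PRIORITY[feat])] for feat, attr in ratings.items()]
    return sum(values) / len(values)

def sd_performance(stations):
    """Average over stations, scaled to 0-1 by dividing by 100 (assumption)."""
    return sum(station_score(r) for r in stations) / len(stations) / 100.0

# Hypothetical ratings of one configuration at two stations.
station_a = {"d22": "G", "p23": "E", "m23": "F", "mo24": "VG", "p2425": "P"}
station_b = {"d22": "G", "p23": "G", "m23": "G", "mo24": "G", "p2425": "G"}
print(station_score(station_a), sd_performance([station_a, station_b]))
```

Keeping separate value sets for high and low priority means that the same attribute contributes more when it is awarded to a high-priority feature, except for Fair, which maps to 45 in both sets, and Poor, which is penalized more heavily for high-priority features.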
5.2. Linguistic criteria: SO2 plume transport
With respect to sulfur dioxide, the model performance was evaluated subjectively by means of linguistic measures. The criteria in this case consisted of the five model features previously mentioned in Section 4.2. The linguistic attributes used for this purpose were {Excellent, Very Good, Good, Fair, Poor}, la = 1, ..., 5, according to how the model performed for each feature. The subjective evaluation was preferred since at this stage only one emission source, the Tula Industrial Complex, was considered. That is, the emissions from Mexico City, the Popocatepetl volcano and surrounding regions were excluded, so a quantitative evaluation would not be representative of the performance for air pollutants. Thus, the evaluation of model configurations consisted in taking the model features {d22, p23, m23, mo24, p2425}, js = 1, ..., 5, as the criteria and the model configurations as the alternatives {M1, ..., M10}. Similar to the meteorological variables, each model configuration $M_i$, i = 1, ..., 10, was evaluated by converting the linguistic attributes la under all model features js into a single performance measure for sulfur dioxide, $PM_{v_i=\{SO_2\}}$. For this purpose, the linguistic attributes la were converted into numerical values in a 0–100 scale. The procedure to assign numerical
values considered the priority among the model features. In the present study, the most important model features were the timing of the peak on 23 March (p23) and the magnitude of the early morning concentration on 24 March (mo24). Thus, these model features were ranked with high (H) priority, whilst the other ones were ranked with low (L) priority. In order to maintain consistency between high and low priority model features, the numerical assignment was split into two sets, one for each type of priority. This is presented in Table 3. For instance, the highest value (100) was assigned to the linguistic attribute {Excellent} with high priority features (EH), whilst the same value (75) was assigned both to the linguistic attribute {Good} with high priority features and to the linguistic attribute {Excellent} with low priority features. The numerical assignment was done for each of the 25 monitoring stations. As for the meteorological variables, the weighted performance of SO2 for all stations was further aggregated into the single performance measure array $SD_i$ by taking the arithmetic average of the resulting numerical assignments of the monitoring stations, and subsequently normalized into a 0–1 scale. With these two performance arrays $MET_i$ and $SD_i$, one for the meteorology and one for sulfur dioxide, the Pareto Front of model configurations was constructed. The best two models were selected by counting the number of dominated models for each point in the Pareto Front. This procedure was performed for each case of the weights sensitivity test {S1, ..., S10}. The configurations with higher dominance were selected as the best model candidates.

Fig. 2. Pareto Fronts (stars) of sensitivity cases S1 (upper panel), S5 (mid panel) and S10 (bottom panel). Horizontal axis: $MET_i$; vertical axis: $SD_i$.

Fig. 3. Regional transport of the SO2 plume from the Tula Industrial Complex for the model configurations M1 (yellow) and M8 (purple dashed) on 23 March at 11:00 LST (left panel), and 24 March at 05:00 LST (right panel). The dots depict the location of the monitoring stations. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 4. Sulfur dioxide time series (SO2 in ppb versus days in LST, 22 to 28 March) at Tultitlan (TLI) monitoring station for the observations (OBS) and model configurations M1 (blue), M2 (orange) and M8 (cyan). All times are in Local Standard Time (LST). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

6. Results and discussion
6.1. Pareto Fronts: optimal configurations
The Pareto Fronts of the three sensitivity cases S1, S5 and S10 are presented in Fig. 2. It shows the best models when the statistical measures of the meteorological variables were assumed to be of equal importance (upper panel); when the BIAS was considered slightly more important for assessing performance (mid panel); and when LTA was assumed to be the most important performance criterion (lower panel). In all cases, model configurations M1, M2 and M8 were suggested as reliable candidates to conduct an air quality simulation. In the first case S1, the configuration M1 dominated over four configurations. It is denoted $M_1 \xrightarrow{d} [M_3, M_5, M_6, M_{10}]$. The configuration M8 also dominated over four configurations, $M_8 \xrightarrow{d} [M_3, M_4, M_5, M_{10}]$, and the configurations M2, M7 and M9 only dominated over two configurations. When the bias was given slightly more priority (case S5), results similar to case S1 were obtained, except that configuration M8 only dominated over three models instead of four as in case S1. When comparing M8 with respect to $MET_i$ for S1 and S5, it can be seen that in the S5 case M8 presented slightly lower performance in meteorology, implying slightly higher bias in wind direction with respect to M1. According to LTA, the configurations which better represented the dynamics of the observed time series were M1 and M2, each with four dominated configurations. An interesting finding regarding configuration M2 is that even though it presented slightly lower performance in meteorology with respect to the statistical measures, it had the best performance in terms of plume transport. In addition, it represented relatively well the dynamics of both meteorology and transport of sulfur dioxide from the Tula region. This suggests that the dynamics were better reproduced with the continuous integration than with multiple integrations of segmented runs.
Table 4 shows the number of dominated models for the remaining cases of the sensitivity analysis. It only shows model configurations with the highest degree of dominance, so that a zero means that, for the respective sensitivity case, those particular configurations were not the highest ones. It is clear that configurations M1 and M8 were the best candidates for conducting an air quality simulation, since they had the greatest dominance among all the model configurations, M8 in particular. However, M2 could give reliable results given its performance in representing the dynamics of plume transport.
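The selection step, building the Pareto Front from the two performance arrays and counting how many configurations each front member dominates, can be sketched as follows. This is a minimal illustration that assumes higher values are better in both objectives (meteorology and plume transport); the configuration names and the scores are hypothetical and do not reproduce the values of Fig. 2.

```python
def dominates(a, b):
    """a dominates b: at least as good in every objective, better in one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_dominance(points):
    """Map each non-dominated configuration to the number it dominates."""
    front = {}
    for name, p in points.items():
        others = [q for other, q in points.items() if other != name]
        if not any(dominates(q, p) for q in others):
            front[name] = sum(dominates(p, q) for q in others)
    return front

# Hypothetical (MET, SD) pairs, for illustration only.
scores = {
    "cfg_a": (0.60, 0.40),
    "cfg_b": (0.40, 0.60),
    "cfg_c": (0.50, 0.30),
    "cfg_d": (0.30, 0.50),
    "cfg_e": (0.20, 0.20),
    "cfg_f": (0.55, 0.35),
}
front = pareto_dominance(scores)
best_two = sorted(front, key=front.get, reverse=True)[:2]
print(front, best_two)
```

Counting dominated points gives a simple way to rank members of the front, which are otherwise mutually incomparable; repeating the procedure for each set of weights shows how stable the ranking is.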
875
V.H. Almanza et al. / Expert Systems with Applications 41 (2014) 869–876 Table 5 Best model configurations. Results
Sensitivity case S1
This work Ng method
M1 –
S2 M8 –
M1 M1
S3 M8 M8
M8 M1
S4 M1 M8
M8 M8
S5 M1 M1
M1 M1
6.2. Regional transport: influence of model configuration Fig. 3 shows the spatial distribution of the SO2 plume from the Tula Industrial Complex. This Figure depicts the subtle but important differences in the regional transport between the best model candidates. On 23 March at 11:00 LST the modeled plume transport is very similar for configuration M1 (yellow) and configuration M8 (purple dashed), but with a slight delay and lower spread for M1 (Fig. 3, left). In contrast, on 24 March at 05:00 LST configuration M1 presented a wider penetration in the northwest region of the basin even reaching part of the southwest; whilst M8 prevailed mostly on the western ridge of the basin suggesting no plume impingement at this hour (Fig. 3, right). The time series at Tultitlan monitoring station (TLI) of these model configurations are presented in Fig. 4 to better depict the contribution of model parameterizations in the final plume transport. It also includes the results obtained with M2 configuration. Although M8 overpredicted the magnitude of SO2 concentration on 22 March it reproduced the concentration in the early morning on 24 March better than M1. In addition M1 and M8 reproduced the timing of the peak on 23 March, whilst M2 predicted it one hour earlier. Moreover, the early morning peak on 25 March was better reproduced with M8 in both magnitude and timing, since M1 and M2 presented higher overprediction. In order to support these results, the best two models of this study for each sensitivity case were compared with the best models obtained with the method of Ng (2008). This method ranks the criteria in descending order of importance. In this study we ranked the modeling variables as rv = {SO2, WD, WS, T} in that order. This resulted in a matrix containing the aggregated performance measures of each variable ½PMv i¼fSO g PMv i¼fWDg PMv i¼fWSg PMv i¼fTg . Re2 sults are presented in Table 5. 
Ng method showed a great similarity with the suggested modeling configurations, in particular M1 and M8. Nevertheless, when all statistical quantities have the same importance (S1), Ng method could not give representative results. The main differences resulted when RMSE (S3) and LTA (S6) were slightly more important in the evaluation of model performance. In addition, this approach also showed that M2 reproduced relatively well the dynamics of the observed time series (S10). Thus, M1 and M8 were selected as the best alternatives for conducting a full air quality simulation in this specific simulation domain. They correspond to the simplest case (M1) and to a segmented integration with 6th order diffusion (M8). This suggests that convective parameterization would not be required. However, M2 included this parameterization and performed relatively well with respect to the dynamics. Ignoring data assimilation within the Planetary Boundary Layer resulted in the worst configurations in terms of both meteorology and plume transport (M5 and M10). Another aspect that is important in practical terms concerns to the portability, disk storage and scripting between M1 (continuous integration) and M8 (segmented integration). The results suggests that a single run could be as accurate as a segmented run, which would require relatively less disk storage, less scripting, increased portability, and slightly lower computing time. Scripting refers to the required code to pre-process, automate, and post-process the model inputs and outputs. For instance, in their numerical experiments (Lo et al., 2008) analyzed the influence of applying
Table 5 (continued)
S6    M8   M8
      M1   M8
S7    M8   M1
      M8   M8
S8    M9   M9
      M8   M8
S9    M6   M6
      M1   M1
S10   M9   M9
      M1   M8
      M2   M2
nudging in short-segment re-initializations for regional climate downscaling, and found that the results matched those of a continuous run with nudging. They suggested that it could improve the portability of their experiments. Even though these practical aspects were not included in the procedure to select the best model configurations, it would be feasible to also weight model configurations with these practical criteria.
7. Conclusions
The selection of the optimal model configurations for an Air Quality Model using Multi-Criteria Evaluation was investigated. The main motivation for applying this approach was that a preliminary simulation gave relatively good results, so that a more accurate run could be used in future studies. For this purpose, some of the most widely used statistical metrics for evaluating model performance were used as the criteria, and a set of 10 different model configurations was used as the alternatives. In addition to the statistical metrics, an approach based on Local Trend Associations was applied with the aim of evaluating the dynamics of the model configurations. The results suggested that the simplest model configuration in continuous integration (M1) and the model configuration with 6th order diffusion in segmented integrations (M8) were the optimal configurations to conduct future air quality studies for the simulation period from 22 March 2006 to 27 March 2006. These configurations were Pareto optimal and prevailed after a sensitivity test on the weights of the criteria. In addition, the application of the method of Ng (2008) suggested similar configurations. The criterion using Local Trend Associations provided insight into which configurations better reproduced the dynamics of the meteorological variables, giving supporting information for selecting potential model configurations that might otherwise be overlooked.
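The notion of configurations that remain Pareto optimal under different weight scenarios can be sketched as follows. This is a minimal illustration, assuming each configuration has already been reduced to two aggregated scores (meteorology and plume transport) for which higher is better. The scenario labels, the scores and the function name `pareto_front` are hypothetical and do not reproduce the results of this study.

```python
from typing import Dict, List, Tuple

# (meteorology score, plume-transport score); higher is better (assumption)
Point = Tuple[float, float]


def pareto_front(points: Dict[str, Point]) -> List[str]:
    """Return the names of the non-dominated alternatives (maximization)."""
    front = []
    for a, pa in points.items():
        dominated = any(
            # b dominates a: at least as good everywhere, strictly better somewhere
            all(q >= p for p, q in zip(pa, pb))
            and any(q > p for p, q in zip(pa, pb))
            for b, pb in points.items()
            if b != a
        )
        if not dominated:
            front.append(a)
    return front


# Hypothetical aggregated scores under two weight scenarios.
scenarios = {
    "S1": {"M1": (0.80, 0.70), "M8": (0.70, 0.85),
           "M5": (0.40, 0.30), "M2": (0.75, 0.60)},
    "S2": {"M1": (0.78, 0.72), "M8": (0.74, 0.80),
           "M5": (0.35, 0.33), "M2": (0.79, 0.55)},
}

# Pareto front per scenario, then the configurations present in every front.
fronts = {s: set(pareto_front(p)) for s, p in scenarios.items()}
robust = set.intersection(*fronts.values())
print(sorted(robust))
```

Intersecting the fronts is only one simple way to express "prevailing" across scenarios; counting how often each configuration appears on a front would be a softer alternative.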
Multi-criteria evaluation provided a more rigorous approach to selecting an optimum configuration for an Air Quality Model at the regional scale.
Acknowledgments
The authors thank Dr. Ed Scott Wilson Garcia for his help with MPICH. In addition, the experimental data of the MILAGRO campaign are gratefully acknowledged.
References
Almanza, V., & Batyrshin, I. (2011). On trend association analysis of time series of atmospheric pollutants and meteorological variables in Mexico City Metropolitan Area. In J.-F. Martínez-Trinidad et al. (Eds.), Proceedings of 3rd Mexican conference on pattern recognition, MCPR’11 (Vol. 6718, pp. 95–102). Heidelberg: Springer.
Almanza, V. H., Molina, L. T., & Sosa, G. (2012). Soot and SO2 contribution to the supersites in the MILAGRO campaign from elevated flares in the Tula Refinery. Atmospheric Chemistry and Physics, 12, 10583–10599. http://dx.doi.org/10.5194/acp-12-10583-2012.
Barrón-Adame, J. M., Cortina-Januchs, M. G., Vega-Corona, A., & Andina, D. (2012). Unsupervised system to classify SO2 pollutant concentrations in Salamanca, Mexico. Expert Systems with Applications, 39, 107–116.
Batyrshin, I., Herrera-Avelar, R., Sheremetov, L., & Panova, A. (2007). Moving approximation transform and local trend associations in time series data bases. In I. Batyrshin, J. Kacprzyk, L. Sheremetov, & L. Zadeh (Eds.), Perception-based data mining and decision making in economics and finance. Studies in computational intelligence (Vol. 36, pp. 55–83). Springer Physica-Verlag.
Behnamian, J., Ghomi, S., & Zandieh, M. (2009). A multi-phase covering Pareto-optimal front method to multi-objective scheduling in a realistic hybrid flowshop using a hybrid metaheuristic. Expert Systems with Applications, 36, 11057–11069.
Borge, R., Alexandrov, V., del Vas, J. J., Lumbreras, J., & Rodríguez, E. (2008). A comprehensive sensitivity analysis of the WRF model for air quality applications over the Iberian Peninsula. Atmospheric Environment, 42, 8560–8574.
Cao, M. F., Huang, G. H., & He, L. (2011). An approach to interval programming problems with left-hand-side stochastic coefficients; an application to environmental decisions analysis. Expert Systems with Applications, 38, 11538–11546.
Cifuentes, E., Blumenthal, U., Ruíz-Palacios, G., Bennett, S., & Peasey, A. (1994). Epidemiological panorama for the agricultural use of wastewater: The Mezquital Valley, Mexico. Salud Pública de México, 36, 3–9.
Dalalah, D., Hayajneh, M., & Batieha, F. (2011). A fuzzy multi-criteria decision making for supplier selection. Expert Systems with Applications, 38, 8384–8391.
de Foy, B., Krotkov, N. A., Bei, N., Herndon, S. C., Huey, L. G., Martínez, A.-P., et al. (2009). Hit from both sides: Tracking industrial and volcanic plumes in Mexico City with surface measurements and OMI SO2 retrievals during the MILAGRO field campaign. Atmospheric Chemistry and Physics, 9, 9599–9617. http://dx.doi.org/10.5194/acp-6-2321-2006.
Domanska, D., & Wojtylak, M. (2012). Application of fuzzy time series models for forecasting pollution concentrations. Expert Systems with Applications, 39, 7673–7679.
Fast, J. D. (1995). Mesoscale modeling and four-dimensional data assimilation in areas of highly complex terrain. Journal of Applied Meteorology, 34, 2762–2782.
Fast, J. D., Gustafson, W. I., Jr., Easter, R. C., Zaveri, R. A., Barnard, J. C., Chapman, E. G., et al. (2006). Evolution of ozone, particulates, and aerosol direct forcing in an urban area using a new fully-coupled meteorology, chemistry, and aerosol model. Journal of Geophysical Research, 111. http://dx.doi.org/10.1029/2005JD006721.
Godowitch, J. M., Gilliam, R. C., & Rao, S. T. (2011). Diagnostic evaluation of ozone production and horizontal transport in a regional photochemical air quality modeling system. Atmospheric Environment, 45, 3977–3987.
Grell, G. A., Peckham, S. E., Schmitz, R., McKeen, S. A., Frost, G., Skamarock, W. C., et al. (2005). Fully coupled "online" chemistry within the WRF model. Atmospheric Environment, 39, 6957–6975.
Lei, W., de Foy, B., Zavala, M., Volkamer, R., & Molina, L. T. (2007). Characterizing ozone production in the Mexico City Metropolitan Area: A case study using a chemical transport model. Atmospheric Chemistry and Physics, 7, 1347–1366. http://dx.doi.org/10.5194/acp-7-1347-2007.
Li, S.-T., & Shue, L. Y. (2004). Data mining to aid policy making in air pollution management. Expert Systems with Applications, 27, 331–340.
Lo, J., Yang, Z. L., & Pielke, R. A., Sr. (2008). Assessment of three dynamical climate downscaling methods using the weather research and forecasting (WRF) model. Journal of Geophysical Research, 113, D09112. http://dx.doi.org/10.1029/2007JD009216.
Molina, L. T., Madronich, S., Gaffney, J. S., Apel, E., de Foy, B., Fast, J., et al. (2010). An overview of the MILAGRO 2006 Campaign: Mexico City emissions and their transport and transformation. Atmospheric Chemistry and Physics, 10, 8697–8760. http://dx.doi.org/10.5194/acp-10-8697-2010.
Molina, L. T., Molina, M. J., Slott, R. S., Kolb, C. E., Gbor, P. K., Meng, F., et al. (2004). Air quality in selected megacities. Journal of the Air & Waste Management Association, 54(12), 1–73.
Ng, W. L. (2008). An efficient and simple model for multiple criteria supplier selection problem. European Journal of Operational Research, 186, 1059–1067.
Ngan, F., Byun, D., Kim, H., Lee, D., Rappenglück, B., & Pour-Biazar, A. (2012). Performance assessment of retrospective meteorological inputs for use in air quality modeling during TexAQS 2006. Atmospheric Environment, 54, 86–96.
Niska, H., Rantamäki, M., Hiltunen, T., Karppinen, A., Kukkonen, J., Ruuskanen, J., et al. (2005). Evaluation of an integrated modeling system containing a multilayer perceptron model and the numerical weather prediction model HIRLAM for the forecasting of urban airborne pollutant concentrations. Atmospheric Environment, 39, 6524–6536.
Parrish, D. D., Singh, H. B., Molina, L., & Madronich, S. (2011). Air quality progress in North American megacities: A review. Atmospheric Environment, 45, 7015–7025.
Rivera, C., Sosa, G., Wöhrnschimmel, H., de Foy, B., Johansson, M., & Galle, B. (2009). Tula industrial complex (Mexico) emissions of SO2 and NO2 during the MCMA 2006 field campaign using a mobile mini-DOAS system. Atmospheric Chemistry and Physics, 9, 6351–6361. http://dx.doi.org/10.5194/acp-9-6351-2009.
Russell, A., & Dennis, R. (2000). NARSTO critical review of photochemical models and modeling. Atmospheric Environment, 34, 2283–2324.
Seaman, N. L. (2000). Meteorological modeling for air-quality assessments. Atmospheric Environment, 34, 2231–2259.
Seinfeld, J. (1988). Ozone air quality models: A critical review. JAPCA, 38, 616–645.
SEMARNAT-INE., Secretaría del Medio Ambiente y Recursos Naturales/Instituto Nacional de Ecología., Inventario Nacional de Emisiones de Mexico 1999, Mexico, 2006.
Skamarock, W. C., Klemp, J. B., Dudhia, J., Gill, D. O., Barker, D. M., Wang, W., et al. (2005). A description of the advanced research WRF version 2, NCAR Technical Note, NCAR/TN-468+STR, p. 8.
United Nations (UN), (2012). Department of Economic and Social Affairs, Population Division: World urbanization Prospects: The 2011 revision.
Vazquez-Alarcon, A., Justin-Cajuste, L., Siebe-Grabach, C., Alcantar-Gonzalez, G., & de la Isla-de Bauer, M. L. (2001). Cadmio, níquel y plomo en agua residual, suelo y cultivos en el Valle del Mezquital, Hidalgo, México. Agrociencia, 35, 267–274.
Wang, J. J., Jing, Y. Y., Zhang, C. F., & Zhao, J. H. (2009). Review on multi-criteria decision analysis aid in sustainable energy decision-making. Renewable and Sustainable Energy Reviews, 13, 2263–2278.
Weng, S. Q., Huang, G. H., & Li, Y. P. (2010). An integrated scenario-based multicriteria decision support system for water resources management and planning: A case study in the Haihe River Basin. Expert Systems with Applications, 37, 8242–8254.
West, J. J., Zavala, M. A., Molina, L. T., Molina, M. J., San Martini, F., McRae, G. J., et al. (2004). Modeling ozone photochemistry and evaluation of hydrocarbon emissions in the Mexico City metropolitan area. Journal of Geophysical Research, 109, D19312. http://dx.doi.org/10.1029/2004JD004614.
Williams, M. D., Brown, M. J., Cruz, X., Sosa, G., & Streit, G. (1995). Development and testing of meteorology and air dispersion models for Mexico City. Atmospheric Environment, 29, 2929–2960.
Willmott, C. J., Ackleson, S. G., Davis, R. E., Feddema, J. J., Klink, K. M., Legates, D. R., et al. (1985). Statistics for the evaluation of models. Journal of Geophysical Research, 90(C5), 8995–9005.
WMO. (2012). World Meteorological Organization/International Global Atmospheric Chemistry: GAW Report No. 205, WMO/IGAC Impacts of Megacities on air pollution and climate, Ch. 7, pp. 250–275.
WRF. (2010). Weather Research and Forecasting, ARW version 3.2.1 Modeling System User’s Guide, National Center for Atmospheric Research.
WRF-Chem version 3.2 user guide. (2010).
Zhang, Y., Bocquet, M., Mallet, V., Seigneur, C., & Baklanov, A. (2012). Real-time air quality forecasting part I: History, techniques and current status. Atmospheric Environment, 60, 632–655.
Zhang, Y., & Dubey, M. K. (2009). Comparisons of WRF/Chem simulated O3 concentrations in Mexico City with ground-based RAMA measurements during the MILAGRO period. Atmospheric Environment, 43, 4622–4631.
Zhou, Q., Huang, G., & Chan, C. (2004). Development of an intelligent decision support system for air pollution control at coal-fired power plants. Expert Systems with Applications, 26, 335–356.