Atmospheric Environment 36 (2002) 5953–5959
A simple semi-empirical model for predicting missing carbon monoxide concentrations
Kim N. Dirks a,*, Murray D. Johns a, John E. Hay a, Andrew P. Sturman b
a University of Auckland, Private Bag 92019, Auckland, New Zealand
b University of Canterbury, Private Bag 4800, Christchurch, New Zealand
Received 8 May 2002; received in revised form 4 September 2002; accepted 11 September 2002
Abstract
Carbon monoxide monitoring using continuous samplers is carried out in most major urban centres in the world and generally forms the basis for air quality assessments. Such assessments become less reliable as the proportion of data missing due to equipment failure and periods of calibration increases. This paper presents a semi-empirical model for the prediction of atmospheric carbon monoxide concentrations near roads for the purpose of interpolating missing data without the need for any traffic or emissions information. The model produces reliable predictions while remaining computationally simple by being site-specifically optimized. The model was developed for, and evaluated at, both a suburban site and an inner city site in Hamilton, New Zealand. Model performance statistics were found to be significantly better than other simple methods of interpolation with little additional computational complexity. © 2002 Elsevier Science Ltd. All rights reserved.
Keywords: Carbon monoxide; Urban air quality; Empirical modeling; Interpolation; Missing data
*Corresponding author. E-mail address: [email protected] (K.N. Dirks).
1. Introduction
Carbon monoxide, originating mainly from vehicles, is one of the major air pollutants in New Zealand’s urban centres. For this reason, air quality monitoring is often carried out adjacent to busy roads. Regional authorities in New Zealand generally base their assessment of the quality of air on statistics such as the proportion of time over which the air pollution levels are in excess of the recommended air quality guideline values. Such assessments of air quality may be estimated from archives of monitoring data without the need for an air quality model. However, the results become less reliable when a significant proportion of the data are missing due to equipment failure, routine maintenance or calibration. In order to interpolate for these missing data so that better air quality assessments can be made, an air pollution model may be used. Many of the
existing commercial models (generally designed to make predictions for sites where air pollution monitoring has yet to be carried out) are too complex given the limitation of the routinely available meteorological and traffic data. Even the standard box model requires knowledge of the depth of the box, information that generally cannot be obtained from wind measurements at only one height. The topographical structures generally found in urban and suburban environments result in complex wind flows which are difficult to model. Indeed, even many of the quite detailed models have been found to be unsuitable for complex urban and suburban environments (Hoydysh and Dabberdt, 1994). Also, in many situations for which continuous concentrations are required, no traffic information exists for the area of interest, a basic requirement of such models. Another approach to the problem is to use a simplistic interpolation method such as assuming the ‘season’s average’ concentration for the particular time of day that is missing, or linearly interpolating between the previous and the following day, in order to obtain
1352-2310/02/$ - see front matter © 2002 Elsevier Science Ltd. All rights reserved. PII: S1352-2310(02)00767-7
continuous data sets. Neither of these methods is ideal since the meteorology on the missing day may have been significantly different from the days on which the interpolation was based, leading to unrealistic predictions. Clearly, a complementary method is required. This paper presents a simple semi-empirical model for the prediction of carbon monoxide concentrations near roads based on existing wind speed, wind direction and carbon monoxide concentration data collected at the sampling site of interest. In the absence of continuous traffic information for the site and under the assumption that traffic flow rates are the same for a given time of day for weekdays, the emission rate and the background concentrations may be determined empirically for each time of day using regression. The model parameters may then be used to interpolate for missing carbon monoxide data providing wind speed and direction data exist for the site for the periods for which carbon monoxide data are missing and interpolation is required. As with the semi-empirical model for urban PM10 concentrations presented by Kukkonen et al. (2001), new parameters are required if the model is to be used for a different site. The model is kept as simple as possible in order to facilitate its implementation. The model is developed for, and evaluated at, both a suburban and an inner city site in Hamilton, New Zealand.
2. Model description
This semi-empirical model is based on the box model approach where the emission rate $Q$ (mg m$^{-1}$ s$^{-1}$) is assumed to be constant along a road and the pollutants are mixed uniformly within a two-dimensional box of height $\Delta z$ (m) (Hanna et al., 1982). The horizontal wind speed, $u$ (m s$^{-1}$), assumed to be uniform within the layer and running at an angle $\theta$ to the road, removes the pollutants through advection. At the same time, pollutants are introduced into the box through advection of the background concentration. The concentration $C$ (mg m$^{-3}$) under steady-state conditions and for conditions when the monitor is leeward of the road is given by
$$C = \frac{Q}{u\,\Delta z\,\sin\theta} + C_B, \qquad (1)$$
where the background concentration is given by $C_B$ (mg m$^{-3}$). The semi-empirical model presented here assumes the concentration ($C$) to be inversely related to the wind speed ($u$) (as with the box model) but with a wind speed offset, $u_o$ (m s$^{-1}$), included to avoid severe overpredictions in very light wind speed conditions as suggested by Chock (1978). The assumption of the inverse relationship with the spread of the plume in the vertical direction ($\Delta z$) is maintained.
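The effect of the wind speed offset can be illustrated with a minimal sketch that compares the pure inverse wind speed factor $1/u$ with the offset form $1/(u + u_o)$. The function name `dilution_term` and the wind speeds are illustrative assumptions; the offset of 0.38 m s$^{-1}$ is the inner city optimum reported in the results of this paper.

```python
import numpy as np

def dilution_term(u, u_o=0.0):
    """Return 1 / (u + u_o), the wind speed factor that multiplies Q / dz."""
    u = np.asarray(u, dtype=float)
    return 1.0 / (u + u_o)

# Illustrative wind speeds in m/s (assumed values, not measurements from the study).
winds = np.array([0.05, 0.5, 2.0])
plain = dilution_term(winds)          # box-model form, grows without bound as u -> 0
offset = dilution_term(winds, 0.38)   # offset form, never exceeds 1 / u_o

for u, p, o in zip(winds, plain, offset):
    print(f"u = {u:4.2f} m/s   1/u = {p:6.2f}   1/(u + u_o) = {o:5.2f}")
```

The two forms nearly coincide at moderate wind speeds and differ mainly in near-calm conditions, which is where a model without the offset would over-predict most severely.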
Observations at the site suggest that, provided the leeward and windward wind cases are separated, there does not appear to be an effect of wind direction on concentration for the two sites tested in this study. This is discussed in more detail later. For this reason, to avoid problems of discontinuity, and for the sake of simplicity, the $(\sin\theta)^{-1}$ term was omitted from the semi-empirical model. An important aspect of the semi-empirical model presented here, particularly in terms of fulfilling the purpose of being able to use the model for interpolating missing data, is that predictions are made for wind directions such that the monitor is located both windward and leeward of the road (i.e., for all wind conditions), rather than simply leeward conditions. Concentrations for leeward conditions are given by
$$C = \frac{Q_l}{\Delta z\,(u + u_o)} + C_l, \qquad (2)$$
while for windward conditions by
$$C = \frac{Q_w}{\Delta z\,(u + u_o)} + C_w. \qquad (3)$$
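As a rough illustration of how Eqs. (2) and (3) might be applied together, the sketch below selects the leeward or windward parameter pair from the wind direction and evaluates the concentration. The function names, the parameter values in the usage lines, and the exact handling of the 180° and 360° boundaries are assumptions made for this example; the 180° to 360° leeward range follows the convention the paper uses for the suburban site.

```python
import numpy as np

def is_leeward(direction_deg):
    """True where the monitor is leeward of the road.

    Assumes directions are measured relative to the road and that
    180 <= direction < 360 degrees is leeward (boundary handling is a guess).
    """
    d = np.mod(np.asarray(direction_deg, dtype=float), 360.0)
    return (d >= 180.0) & (d < 360.0)

def predict_co(u, direction_deg, slope_l, c_l, slope_w, c_w, u_o):
    """Evaluate C = slope / (u + u_o) + background, as in Eqs. (2) and (3).

    slope_l and slope_w stand for Q_l / dz and Q_w / dz (mg m^-2 s^-1);
    c_l and c_w are the background concentrations (mg m^-3).
    """
    u = np.asarray(u, dtype=float)
    lee = is_leeward(direction_deg)
    slope = np.where(lee, slope_l, slope_w)
    background = np.where(lee, c_l, c_w)
    return slope / (u + u_o) + background

# Hypothetical parameters for a single 10 min interval (not fitted values).
c_lee = predict_co(1.9, 270.0, slope_l=2.0, c_l=0.5, slope_w=1.0, c_w=0.2, u_o=0.1)
c_wind = predict_co(1.9, 90.0, slope_l=2.0, c_l=0.5, slope_w=1.0, c_w=0.2, u_o=0.1)
```

Because both arguments may be arrays, the same call can fill a whole run of missing intervals provided the per-interval parameters are passed as arrays of matching length.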
The emissions term for leeward conditions ($Q_l$) incorporates both emissions from the road adjacent to the monitor and emissions from other roads in the vicinity which have a wind speed dependence. In contrast, the emissions term for the windward conditions ($Q_w$) consists of simply the contribution from other roads in the vicinity that have a wind speed dependence. The terms $C_l$ and $C_w$ represent the background concentrations for leeward and windward conditions, respectively.
The optimum value for $u_o$ was determined empirically through the minimization of the model root-mean-squared error (RMSE). Optimum model parameters ($Q_l \Delta z^{-1}$, $C_l$, $Q_w \Delta z^{-1}$ and $C_w$) were found by performing linear regressions of carbon monoxide concentration ($C$) on the wind speed function $(u + u_o)^{-1}$ for leeward and windward conditions separately for each 10 min interval throughout the day across all days in the data set. This gives 144 regression coefficients and 144 intercepts, one for each 10 min interval throughout the day, calculated simply using the wind speed and concentration measurements for the site. Weekend data were treated separately from weekday data (giving a new set of optimum parameters for weekends) as traffic flow patterns were expected to be significantly different. In both cases the optimum parameters were constrained to avoid negative concentration predictions. If the slope ($Q_l \Delta z^{-1}$ or $Q_w \Delta z^{-1}$) was found to be negative for a particular time interval, it was set to zero and the intercept parameter ($C_l$ or $C_w$) was taken as the average concentration for the particular time interval. In other words, it was assumed that the concentration was independent of wind speed. If the intercept parameter was found to be negative, it was set to zero and the regression coefficient was re-evaluated with the intercept constrained to be zero. Data were also sorted by season to take into account the seasonal patterns in the meteorology.
To evaluate the model for the purpose of interpolating missing data, the fictitious point method (Seaman, 1983) was used whereby each day was systematically removed from the data set, predictions made for that day and then compared with the observed concentrations. This procedure was repeated for all days in the data set. The model results are presented based on the root-mean-squared error (RMSE), the relative-root-mean-squared error (RRMSE), and the correlation coefficient between the observed and predicted concentrations. The model parameters are considered optimized when the model RMSE is minimized.
The ability of the semi-empirical model to predict for missing days is also tested relative to other simplistic interpolation methods. The first is to assume the ‘season average’ concentration for the particular time of day. The second approach is to assume the same concentration as recorded on the previous weekday at the same time of day, while the third method is to linearly interpolate between the previous weekday and the following weekday (interpolation method) for each time interval throughout the day. The ‘previous day extrapolation’ method is of particular interest because Cogliani (2001) has suggested that the air pollution index on the previous day is a significant predictive factor for air pollution levels on a given day.
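The constrained regression described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are assumed to be complete arrays of shape (days, intervals) already restricted to one wind sector, day type and season, the offset search scores in-sample RMSE over a caller-supplied list of candidates, and all function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_interval(u, c, u_o):
    """Fit C = slope * x + intercept, x = 1 / (u + u_o), for one time-of-day interval.

    Returns (slope, intercept) after applying the non-negativity constraints.
    """
    x = 1.0 / (np.asarray(u, dtype=float) + u_o)
    c = np.asarray(c, dtype=float)
    slope, intercept = np.polyfit(x, c, 1)
    if slope < 0.0:
        # No usable wind speed dependence: use the interval mean as the intercept.
        return 0.0, float(c.mean())
    if intercept < 0.0:
        # Refit with the intercept constrained to zero (least squares through the origin).
        return float(np.sum(x * c) / np.sum(x * x)), 0.0
    return float(slope), float(intercept)

def fit_day_set(u, c, u_o):
    """u, c: arrays of shape (n_days, n_intervals). Returns (slopes, intercepts)."""
    params = [fit_interval(u[:, k], c[:, k], u_o) for k in range(u.shape[1])]
    slopes, intercepts = map(np.array, zip(*params))
    return slopes, intercepts

def rmse_for_offset(u, c, u_o):
    """In-sample RMSE of the fitted model for a given wind speed offset."""
    slopes, intercepts = fit_day_set(u, c, u_o)
    predicted = slopes / (u + u_o) + intercepts
    return float(np.sqrt(np.mean((predicted - c) ** 2)))

def best_offset(u, c, candidates):
    """Return the candidate offset with the smallest RMSE."""
    errors = [rmse_for_offset(u, c, u_o) for u_o in candidates]
    return candidates[int(np.argmin(errors))]

# Synthetic demonstration data (assumed, not from the study): 40 days x 144 intervals.
rng = np.random.default_rng(0)
u_demo = rng.uniform(0.2, 5.0, size=(40, 144))
c_demo = 2.0 / (u_demo + 0.3) + 0.4 + rng.normal(0.0, 0.05, size=u_demo.shape)
slopes, intercepts = fit_day_set(u_demo, c_demo, u_o=0.3)
chosen = best_offset(u_demo, c_demo, [0.0, 0.1, 0.3, 0.5])
```

In practice the same routine would be run separately for leeward and windward intervals, for weekdays and weekends, and for each season, and real records with gaps would need the missing intervals masked before fitting.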
3. Data
3.1. Site descriptions and data sets
The city of Hamilton has a population of 117 000 people and is the fourth largest urban centre in New Zealand. It is the main New Zealand centre located inland and is positioned in a shallow basin. Seasonal wind frequencies show that westerly winds prevail during the spring and summer but that there is no predominant wind direction during the autumn and winter seasons. The annual average wind speed at this site is 2.0 m s$^{-1}$ (L. Pigott, pers. comm.). Two air quality monitoring sites have been established in Hamilton: an inner city site and a suburban site. The inner city site is located near a T-junction within the city’s central business district, while the suburban site is located 85 m along one branch of a four-way light-controlled intersection. Wind speed, wind direction and carbon monoxide concentrations were recorded continuously at both sites with 10 min averages archived. Data were collected for 3 months during the winter of 1996 at the inner city site giving a set of 40 weekdays and 14 weekend days. Data were also collected for 6 months at the suburban site during the summer and autumn of 1997 giving two sets of 60 weekdays and 27 weekend days each.
3.2. Wind direction effect on concentration
Relatively steady wind direction was observed for the autumn season at the inner city site throughout the period of observation, almost certainly due to channeling as a result of local topography. Much more variable wind directions were observed at the suburban site. Fig. 1 shows the effect of wind direction on concentration for the suburban site with the data sorted into wind speed quartiles. Wind directions between 180° and 360° represent conditions when the monitor was leeward of the road. Concentrations for leeward conditions are clearly higher than for windward conditions, but it is not clear that concentrations for near-parallel leeward conditions are higher than for near-perpendicular conditions, as has been suggested to be generally the case (Hoydysh and Dabberdt, 1994). An exception is perhaps the medium-to-high wind speed case that shows a slight tendency towards higher concentrations in parallel wind conditions. This suggests that there is no obvious wind direction dependence, although a model would clearly require the sorting of leeward from windward conditions. Similar conclusions were drawn for the summer season at this site. At the inner city site, it was found that the wind was such that the monitor was persistently leeward of the road. For this reason, it was assumed that the single leeward component of the model applied for all intervals throughout the monitoring period.
4. Results and discussion
In the first instance, the value of the wind speed offset, $u_o$, was determined empirically through the minimization of the model root-mean-squared error (RMSE) for the 40 days of winter weekday data at the inner city site and for the 60 days each of summer and autumn weekday data at the suburban site. The optimum value for $u_o$ was found to be 0.38 m s$^{-1}$ for the inner city site and 0.1 m s$^{-1}$ for the two seasons at the suburban site. In all three cases the optimization minima were very shallow indicating that only a very small change in model performance was observed throughout the range of possible $u_o$ values. This means that there was little quantitative improvement to be gained by including a wind speed offset. However, its inclusion significantly reduced short-term severe over-predictions otherwise encountered at low wind speeds and it was therefore retained. Fig. 2 shows the leeward optimum model parameters for the winter season at the inner city site for weekdays
[Figure 1: four panels (a)–(d), each plotting [CO] (mg m$^{-3}$) against Wind Direction Relative to Road (degrees).]
Fig. 1. The effect of wind direction on concentration at the suburban site for the autumn season sorted by wind speed quartile. Angles from 180° to 360° correspond to conditions where the monitor is leeward of the road, while angles from 90° to 180° and from 360° to 450° represent windward wind conditions: (a) lowest wind speed quartile, (b) second lowest wind speed quartile, (c) second highest wind speed quartile, and (d) highest wind speed quartile.
[Figure 2: $Q_l/\Delta z$ (mg m$^{-2}$ s$^{-1}$, left axis) and $C_l$ (mg m$^{-3}$, right axis) plotted against Time (hh:mm).]
Fig. 2. Diurnal variations in the model optimum parameters for weekdays at the inner city site.
with all 40 days included. Note that the parameter $Q_l \Delta z^{-1}$ peaks during the day and shows a minimum during the night. The background concentration tends to remain low throughout the 24 h period for this site. Fig. 3 shows the model optimum parameters throughout the day for weekdays for leeward and windward conditions for the autumn season at the suburban site. Note that $Q_l \Delta z^{-1}$ and $Q_w \Delta z^{-1}$ were found to be highest
during the day and lowest during the night with peaks around the morning and evening rush hours. As expected, both parameters for the leeward conditions are higher than the equivalent parameters for the windward conditions.

[Figure 3: two panels plotting the model parameters against Time (hh:mm): (a) $Q_l/\Delta z$ (mg m$^{-2}$ s$^{-1}$) and $C_l$ (mg m$^{-3}$); (b) $Q_w/\Delta z$ (mg m$^{-2}$ s$^{-1}$) and $C_w$ (mg m$^{-3}$).]
Fig. 3. Autumn parameters at the suburban site for (a) leeward and (b) windward conditions.

Weekend data were treated separately from weekday data because the weekend traffic flow patterns were expected to be significantly different and therefore required separate optimum parameters. In order to justify this decision quantitatively, a routine was developed which optimized the model parameters based on all the days, including weekend days. The relative-root-mean-squared error (RRMSE) given by
$$\mathrm{RRMSE} = \frac{\mathrm{RMSE}}{\bar{O}} \qquad (4)$$
was calculated for each day, where $\bar{O}$ is the mean of the observed concentrations for a day. The day showing the largest error according to the RRMSE was removed and the routine was repeated until only 3 days remained in the data set. Fig. 4 shows the average RRMSE as a function of the number of days removed from the data set, with the days removed labelled as ‘weekend’ or ‘weekday’ days. This figure shows that there was a strong tendency for weekend days to be removed first, indicating that indeed these days are significantly different from the weekdays. A similar procedure was used to show the importance of treating both windward and leeward intervals separately for the suburban site for both seasons.
Fig. 5 shows an example of 1 week of predicted and observed concentrations for the autumn season at the suburban site and 1 week of observed and predicted concentrations at the inner city site. The fictitious point method was used to obtain ‘independent’ predictions. For example, for a 40 day data set, the remaining 39 days were used to produce optimum model parameters to predict for the ‘missing’ day. As with most models based on regression, there is a slight tendency to underestimate peak concentrations and overestimate for low concentrations. However, in general, the model produces good predictions of concentration. The RMSE, RRMSE and the correlation coefficient between the observed and predicted
[Figure 4: Mean RRMSE plotted against Number of Days Removed, with each removed day marked as a weekday or a weekend day.]
Fig. 4. The ‘worst’ day is systematically removed from the data set and overall model performance evaluated based on the reduced data set until three days remain. This highlights the need to treat weekends separately from weekdays.
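The removal procedure behind Fig. 4 can be sketched roughly as below, using the RRMSE of Eq. (4) as the per-day score. The hook `fit_and_predict` is a hypothetical caller-supplied function; the `slot_mean_model` used in the demonstration is only a stand-in (the mean at each time of day), not the semi-empirical model, and the two concentration profiles are invented for illustration.

```python
import numpy as np

def rrmse_per_day(observed, predicted):
    """Eq. (4) per day: RMSE over the day divided by that day's mean observation.

    Both arrays have shape (n_days, n_intervals).
    """
    rmse = np.sqrt(np.mean((predicted - observed) ** 2, axis=1))
    return rmse / observed.mean(axis=1)

def remove_worst_days(observed, fit_and_predict, min_days=3):
    """Refit on the remaining days, drop the day with the largest RRMSE, repeat.

    fit_and_predict(obs_subset) must return predictions of the same shape.
    Returns (order in which days were removed, mean RRMSE before each removal).
    """
    remaining = list(range(observed.shape[0]))
    removed, history = [], []
    while len(remaining) > min_days:
        obs = observed[remaining]
        scores = rrmse_per_day(obs, fit_and_predict(obs))
        history.append(float(scores.mean()))
        worst = int(np.argmax(scores))
        removed.append(remaining.pop(worst))
    return removed, history

def slot_mean_model(obs):
    """Stand-in model: predict every day with the mean at each time of day."""
    return np.tile(obs.mean(axis=0), (obs.shape[0], 1))

# Invented profiles: days 0-7 share one shape, days 8-9 follow a different one.
weekday_profile = np.array([1.0, 2.0, 3.0, 2.0])
weekend_profile = np.array([3.0, 1.0, 1.0, 1.0])
demo = np.vstack([np.tile(weekday_profile, (8, 1)), np.tile(weekend_profile, (2, 1))])
removed, history = remove_worst_days(demo, slot_mean_model)
```

With these invented profiles the two atypical days are the first to be dropped, which mirrors the qualitative behaviour the figure reports for weekend days.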
[Figure 5: (a) and (c) time series of Observed and Predicted [CO] (mg m$^{-3}$) against Date, for 29 July to 5 August 1996 and 15 to 22 June 1998, respectively; (b) and (d) Predicted [CO] (mg m$^{-3}$) against Observed [CO] (mg m$^{-3}$).]
Fig. 5. One week of observed and model predicted concentrations for the winter season at the inner city site. 3 and 4 August 1996 were weekend days. (a) Time series of observed and predicted concentrations. (b) Predicted versus observed concentrations. Equivalent graphs for the autumn season at the suburban site are given in (c) and (d). 20 and 21 June 1998 were weekend days. Data series start on Monday with tick marks indicating midnight.
Table 1
Model performance statistics for the semi-empirical model

Site        Season  Weekday/weekend  RMSE (mg m$^{-3}$)  RRMSE  Correlation coefficient
Inner city  Winter  Weekday          0.55                0.54   0.81
Suburban    Summer  Weekday          0.25                0.61   0.72
Suburban    Autumn  Weekday          0.61                0.84   0.77
Inner city  Winter  Weekend          0.51                0.87   0.57
Suburban    Summer  Weekend          0.20                0.57   0.60
Suburban    Autumn  Weekend          0.51                1.00   0.60
concentrations were calculated for each season at each of the sites and are presented in Table 1. Weekend results are presented separately from weekday results. Although the RMSE values were found to be better for the weekend data than for the weekday data, the RRMSE and correlation coefficient were found to be worse. This indicates that the model performed better on weekdays and that the improved RMSE for weekends was simply due to the lower concentrations observed during the weekend. The poorer model performance at the weekends is likely to be due to there being fewer weekend days than weekdays in the data set with which to optimize the model parameters (a weekday-to-weekend ratio of 5 to 2). Fewer data result in poorer estimates of the optimum parameters. Also, traffic patterns change less from day to day for
Table 2
Average model statistics for three interpolation methods compared to the semi-empirical model applied to weekdays for the inner city site

Method                      RMSE (mg m$^{-3}$)  RRMSE  Correlation coefficient
Semi-empirical model        0.55                0.54   0.81
Seasonal averaging          0.69                0.77   0.55
Previous day extrapolation  0.96                1.01   0.35
Linear interpolation        0.99                1.00   0.44
Table 3
Average model statistics for three interpolation methods compared to the semi-empirical model applied to weekdays for the suburban site

                            RMSE (mg m$^{-3}$)  RRMSE           Correlation coefficient
Method                      Summer  Autumn      Summer  Autumn  Summer  Autumn
Semi-empirical model        0.25    0.525       0.61    0.68    0.72    0.77
Seasonal averaging          0.29    0.78        0.77    1.21    0.59    0.58
Previous day extrapolation  0.36    1.03        1.00    1.51    0.44    0.34
Linear interpolation        0.31    0.91        0.81    1.26    0.52    0.43

Summer and autumn results are given in separate columns.
weekdays but there are significant differences between Saturdays and Sundays. Anomalies in traffic flow patterns (sports events, fairs, etc.) are also expected to be more common during the weekend than on weekdays. These would affect the optimum parameters for the weekend days and result in poorer model performance.
Tables 2 and 3 compare the performance of the semi-empirical model to the results obtained using various simplistic interpolation methods for both the inner city and suburban sites. ‘Seasonal averaging’ assumes the season average of a particular time of day for any missing period. ‘Previous day extrapolation’ assumes the same concentration as recorded on the previous weekday at the same time of day, while ‘linear interpolation’ linearly interpolates between the previous weekday and the following weekday for each missing time interval throughout the day. Each of these methods is applied to the 40 days of weekday data at the inner city site and both sets of 60 days of weekday data at the suburban site assuming that each day in turn is missing. The model performance statistics shown in Tables 2 and 3 are averaged over the 40 (or 60) days for the four interpolation methods, including the semi-empirical model. Clearly, interpolating missing data using the semi-empirical approach produces much more reliable estimates than other common methods, with relatively little additional computational effort, although wind data for the site and period of interest must be available.
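The three simple interpolation methods, and the practice of treating each day in turn as missing, can be sketched as follows. This is an illustrative reading rather than the authors' code: the array `c` is assumed to hold consecutive weekdays as rows and 10 min intervals as columns, linear interpolation is taken as the midpoint of the neighbouring weekdays at each time of day, only interior days are scored so that every method has both neighbours, and all names and the synthetic data are hypothetical.

```python
import numpy as np

def seasonal_average(c, day):
    """Mean of all other days in the season at each time of day."""
    return np.delete(c, day, axis=0).mean(axis=0)

def previous_day(c, day):
    """Same concentrations as the previous weekday at the same time of day."""
    return c[day - 1]

def linear_interpolation(c, day):
    """Midpoint between the previous and the following weekday at each time of day."""
    return 0.5 * (c[day - 1] + c[day + 1])

def day_stats(obs, pred):
    """Return (RMSE, RRMSE, correlation coefficient) for one day."""
    rmse = float(np.sqrt(np.mean((pred - obs) ** 2)))
    return rmse, rmse / float(obs.mean()), float(np.corrcoef(obs, pred)[0, 1])

def evaluate(c, method):
    """Treat each interior day in turn as missing and average the daily statistics."""
    days = range(1, c.shape[0] - 1)
    stats = np.array([day_stats(c[d], method(c, d)) for d in days])
    return stats.mean(axis=0)

# Synthetic weekday data (assumed): a shared diurnal shape scaled by a daily factor.
rng = np.random.default_rng(0)
shape = 1.0 + np.sin(np.linspace(0.0, np.pi, 144))
c_demo = rng.uniform(0.5, 1.5, size=(40, 1)) * shape + rng.normal(0.0, 0.05, size=(40, 144))
results = {m.__name__: evaluate(c_demo, m)
           for m in (seasonal_average, previous_day, linear_interpolation)}
```

Each entry of `results` holds the averaged RMSE, RRMSE and correlation coefficient for one method, which is the form in which the comparison tables are laid out.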
5. Conclusions
A simple semi-empirical model for the interpolation of missing carbon monoxide concentrations near roads is presented. The main difference between this model and most existing models is that the parameters of this model are optimized for each site of interest, so that many of the complexities of the site-specific characteristics, such as turbulence, can be incorporated
into the optimum parameters. Data have also been sorted by time of day so that predictions may be made without the need for traffic or emissions information. Given that data already exist for the site, and despite the simplicity of the model, model performance statistics show that the model performs better than other simple interpolation methods, without the need to introduce a complex commercial model for the purpose. The usefulness of a model of this type lies in its ability to produce reliable estimates of pollutant concentrations with minimal data and computational requirements.
Acknowledgements
The authors wish to thank Environment Waikato for providing the data required for this study and for their financial support throughout the project.
References
Chock, D.P., 1978. A simple line-source model for dispersion near roadways. Atmospheric Environment 12, 823–829.
Cogliani, E., 2001. Air pollution forecast in cities by an air pollution index highly correlated with meteorological variables. Atmospheric Environment 35, 2871–2877.
Hanna, S.R., Briggs, G.A., Hosker, R.P., 1982. Handbook on Atmospheric Diffusion, Technical Information Center, US Department of Energy, Virginia, pp. 102.
Hoydysh, W.G., Dabberdt, W.F., 1994. Concentration fields at urban intersections: fluid modelling studies. Atmospheric Environment 28 (11), 1849–1860.
Kukkonen, J., Harkonen, J., Karppinen, A., Pohjola, M., Peitarila, H., Koskentalo, T., 2001. A semi-empirical model for urban PM10 concentrations, and its evaluation against data from an urban measurement network. Atmospheric Environment 35, 4433–4442.
Seaman, R.S., 1983. Objective analysis accuracies of statistical interpolation and successive correction schemes. Australian Meteorological Magazine 31, 225–240.