Ozone indices based on simple meteorological parameters: potentials and limitations of regression and neural network models


Atmospheric Environment 33 (1999) 4299–4307

G. Soja*, A-M. Soja
Research Centre Seibersdorf, Department of Environmental Research, A-2444 Seibersdorf, Austria
Received 18 September 1998; accepted 11 February 1999

Abstract

This study tested the usefulness of extremely simple meteorological models for the prediction of ozone indices. The models were developed with the input parameters of daily maximum temperature and sunshine duration and are based on a data collection period of three years. For a rural environment in eastern Austria, the meteorological and ozone data of three summer periods have been used to develop functions to describe three ozone exposure indices (daily maximum, 7 h mean 9.00–16.00 h, accumulated ozone dose AOT40). Data sets for other years or stations not included in the development of the models were used as test data to validate the performance of the models. Generally, optimized regression models performed better than the simplest linear models, especially in the case of AOT40. For the description of the summer period from May to September, the mean absolute daily differences between observed and calculated indices were 8 ± 6 ppb for the maximum half hour mean value, 6 ± 5 ppb for the 7 h mean and 41 ± 40 ppb h for the AOT40. When the parameters were further optimized to describe individual months separately, the mean absolute residuals decreased by ≤10%. Neural network models did not always perform better than the regression models. This is attributed to the low number of inputs in this comparison and to the simple architecture of these models (2-2-1). Further factorial analyses of those days when the residuals were higher than the mean plus one standard deviation should reveal possible reasons why the models did not perform well on certain days. It was observed that overestimations by the models mainly occurred on days with partly overcast, hazy or very windy conditions. Underestimations more frequently occurred on weekdays than on weekends. It is suggested that the application of this kind of meteorological model will be more successful in topographically homogeneous regions and in rural environments with relatively constant rates of emission and long-range transport of ozone precursors. Under conditions too demanding for advanced physico/chemical models, the presented models may offer useful alternatives to derive ecologically relevant ozone indices directly from meteorological parameters. © 1999 Elsevier Science Ltd. All rights reserved.

Keywords: Meteorological model; Regression model; Neural network model; Ozone index; Calculation

1. Introduction

Surface-level ozone is the primary constituent of photooxidative smog. It is known that an elevated concentration of ozone is a potential human health hazard (Spektor et al., 1991; Kinney et al., 1996) and affects vegetation adversely. The negative effects may relate to visible foliar injury, to physiological impairments, to enhanced leaf senescence and consequently to significant yield and growth reduction in agricultural crops and forests (Krupa et al., 1994; Legge et al., 1995; Loibl and Smidt, 1996; Hogsett et al., 1997).

* Corresponding author. E-mail address: [email protected] (G. Soja)

1352-2310/99/$ - see front matter © 1999 Elsevier Science Ltd. All rights reserved. PII: S1352-2310(99)00126-0


The importance of ozone as an air quality parameter has induced legislative measures establishing national air quality standards. Some of these include mandatory public warnings and traffic restrictions. The institution of public information systems and of possible sanctions in cases where ozone limit values are exceeded has increased the demand for effective prediction models for maximum ozone concentrations. Prediction models either rely on the statistical analysis of previous atmospheric conditions (Proyou et al., 1991; Abdul-Wahab et al., 1996; Spichtinger et al., 1996; Xu et al., 1996; Yi and Prybutok, 1996) or on theories related to physical and chemical processes in the atmosphere (Stedman and Williams, 1992; Renner et al., 1993; Simpson, 1993; Zlatev et al., 1993). Many of these models require copious input of physical and chemical data, demanding sophisticated data collection campaigns. If these inputs are not known, the transfer and application of a model from one region to another is problematic. Reliable emission inventories are indispensable for the chemical and physical/chemical models (Schneider et al., 1997), but they exist only in part and are limited in their regional validity. Although ozone prediction models exist and some are in use, they mainly concentrate on hourly or maximum ozone concentrations and not on other exposure (or absorption) indices that are more meaningful for assessments of ecological effects (Fuhrer et al., 1997). Therefore, the purpose of this work is threefold:

• To develop models for direct prediction of daily ozone (exposure) indices, e.g. 7 h mean or AOT40.
• To test the usefulness of models with a minimum of meteorological input parameters.
• To compare the performance and analyse the limitations of linear, nonlinear and neural network models.

These types of models were chosen to test the usefulness of classical modelling tools against an alternative methodology based on pattern recognition theory. Neural networks have proved their usefulness especially in cases when the mathematical relationships describing the processes of the system to be studied were not fully known (Batchelor, 1998; Gardner and Dorling, 1998). A further aim of this study was to test whether a neural network would also perform efficiently with a very low number of inputs. Furthermore, this work was conducted to analyse the precision with which ozone indices can be calculated when emission inventories and chemical input data are not known or are too uncertain to apply advanced physicochemical models successfully.

2. Data sources and model descriptions

The data sets for developing the models were obtained during three consecutive years (1993–1995) from a rural monitoring station (Seibersdorf: 48°00′ north, 16°30′ east) during the months May to September. Seibersdorf is situated within 30–50 km downwind of the industrial areas of Vienna, Bratislava and Wiener Neustadt in an open landscape characteristic of eastern Austria. It was assumed that this region is exposed to a typical rural environment concerning precursor levels, meteorological conditions and ozone concentrations. Thus, Seibersdorf was considered to be representative for a rural region (outside of the urban areas) of 3000–4000 km². For testing the models, data from another year (1996) at the same station and from a different station in the same geographical region (Stixneusiedl) were used.

For calculation of the ozone indices, linear, nonlinear, and neural network functions were developed. Daily maximum temperature and daily sunshine duration were used as input or independent parameters. Three different ozone indices characterizing the daily ozone impact were used as output or dependent variables: 7 h mean (9.00–16.00 h), daily integrated ozone dose AOT40 (accumulated over threshold of 40 ppb during daylight hours), and maximum half hour mean. For each of the output variables, different models were developed: one for the entire period from May to September without seasonal weighting, and five for the individual months with optimized coefficients for each month. The following commercial software packages were used for the developments:

1. WinStat (G. Greulich Software, Hohenstaufen, Germany),
2. TableCurve 3D (SPSS, Erkrath, Germany),
3. Statistica (StatSoft, Tulsa, USA),
4. Neural Connection (SPSS, Erkrath, Germany).

For the neural network models, a 2-2-1 architecture (with two hidden nodes) proved to give the best results (Table 1).

2.1. Model development and initial results

At first, linear regression functions were developed as the simplest approach to determine the dependence of the ozone indices on maximum temperature and sunshine duration. In a pre-study, these parameters were found to be the most significant meteorological inputs for explaining ozone exposure indices in Eastern Austria (results not shown). This choice also largely confirms findings by Spichtinger et al. (1996) for the Munich region. In Figs. 1 and 2 the univariate dependence of these variables is shown. The 7 h mean as well as the ozone dose AOT40 showed a higher correlation to temperature (r = 0.64 and 0.56, respectively) than to sunshine duration (r = 0.57 and 0.49, respectively). The eminent importance of temperature incorporation in ozone models has also been stressed by Olszyna et al. (1997).
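The three daily indices used as dependent variables (maximum half hour mean, 7 h mean for 9.00–16.00 h, and AOT40 accumulated above 40 ppb during daylight hours) could be derived from half-hourly ozone means roughly as sketched below. This is a minimal illustration, not the authors' procedure. The function name, the convention that value i covers the half hour starting at i/2 h, and the 8.00–20.00 h daylight window are assumptions, since the text does not specify them.

```python
from typing import Dict, Sequence


def daily_ozone_indices(
    half_hour_ppb: Sequence[float],
    daylight_start_h: float = 8.0,   # assumed; the paper only says "daylight hours"
    daylight_end_h: float = 20.0,    # assumed
    threshold_ppb: float = 40.0,     # AOT40 threshold from the text
) -> Dict[str, float]:
    """Compute three daily ozone indices from 48 half-hourly means (ppb).

    Value i is assumed to cover the interval [i/2, i/2 + 0.5) h local time.
    """
    if len(half_hour_ppb) != 48:
        raise ValueError("expected 48 half-hourly values for one day")

    step_h = 0.5
    starts = [i * step_h for i in range(48)]

    # Maximum half hour mean of the day.
    o3_max = max(half_hour_ppb)

    # 7 h mean over 9.00-16.00 h (14 half-hourly values).
    window = [c for t, c in zip(starts, half_hour_ppb) if 9.0 <= t < 16.0]
    mean_7h = sum(window) / len(window)

    # AOT40: exceedances above the threshold, integrated over daylight hours.
    aot40 = sum(
        max(c - threshold_ppb, 0.0) * step_h
        for t, c in zip(starts, half_hour_ppb)
        if daylight_start_h <= t < daylight_end_h
    )

    return {
        "max_half_hour_ppb": o3_max,
        "mean_7h_ppb": mean_7h,
        "aot40_ppb_h": aot40,
    }
```

Because only the excess above 40 ppb is accumulated, AOT40 is zero on any day whose daylight concentrations never exceed the threshold, which is one reason its distribution is more skewed than that of the 7 h mean.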


Table 1
Performance of different architectures (different numbers of hidden nodes) for multi-layer perceptron (MLP) and Bayesian network models (B) with test data of ozone dose AOT40. RMS: root mean square of error, which is the square root of the mean square of the difference between predicted and actual output; m.a.e.: mean absolute error

Model type               MLP     MLP     MLP     B       B       B
Number of hidden nodes   2       3       4       2       3       4
RMS                      57.5    52.0    56.2    72.3    69.8    52.3
m.a.e.                   42.5    40.5    41.2    55.2    51.1    39.24
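The two error measures defined in the caption of Table 1 can be written out as a short Python sketch. The function names are illustrative and not taken from the software packages used in the study.

```python
import math
from typing import Sequence


def rms_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Square root of the mean squared difference between prediction and observation."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean of the absolute differences between prediction and observation."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)
```

RMS weights large deviations more strongly than m.a.e., so a model with a few badly predicted days can rank worse on RMS while ranking similarly on m.a.e.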

Fig. 1. Linear regressions between the daily ozone 7 h mean (9.00–16.00) and meteorological parameters (top: sunshine duration, bottom: maximum temperatures) during three summer periods (1993–1995).

Fig. 2. Linear regressions between the daily ozone AOT40 (daylight hours) and meteorological parameters (top: sunshine duration, bottom: maximum temperatures) during three summer periods (1993–1995).

These inputs were later combined to develop multiple linear and nonlinear regression functions. Tables 2 and 3 make it clear that the performance of the optimized regression models usually was superior (and only in a few cases equivalent) to the simplest case of linear models (z = a + bx + cy). The improvement using those regression functions was more pronounced in the case of AOT40 (+6% of explained data variance) than in the case of the other ozone indices. Generalizing these results, the optimized regression models (Figs. 3–5) were considered to be the better choice for describing the dependence of the ozone indices on the analysed meteorological parameters. The combination of all data to determine functions for the whole summer period (May to September) resulted in correlation coefficients (r) very similar to the mean of the r of the individual months (indicated as the percentages of the explained variance in Table 2). The daily 7 h mean (9.00–16.00 h) produced slightly better regression fits than the corresponding ozone dose and ozone maximum. Apparently, the importance of additional physical and chemical parameters not included in these models is higher for these ozone indices than for the 7 h mean. Our results are in accordance with the report of Bloomfield et al. (1996) who also preferred the use of regression models for relating ozone to meteorological conditions.
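As an illustration of the simplest model form, z = a + bx + cy, the following sketch fits the three coefficients by ordinary least squares and reports the percentage of explained variance. It uses only the Python standard library and is not the procedure of the commercial packages named in Section 2; the function names are hypothetical. Functions that remain linear in a, b and c (for example with a term such as x ln x) could be fitted the same way by transforming the inputs first.

```python
from typing import Sequence, Tuple


def fit_linear_model(
    x: Sequence[float], y: Sequence[float], z: Sequence[float]
) -> Tuple[float, float, float]:
    """Least-squares fit of z = a + b*x + c*y; returns (a, b, c)."""
    if not (len(x) == len(y) == len(z)) or len(z) < 3:
        raise ValueError("need at least three observations of equal length")
    rows = [(1.0, xi, yi) for xi, yi in zip(x, y)]

    # Build the normal equations (X^T X) beta = X^T z as an augmented 3x4 matrix.
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * zi for r, zi in zip(rows, z))]
        for i in range(3)
    ]

    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("inputs are collinear; coefficients are not unique")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, 3):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, 4):
                aug[r][k] -= factor * aug[col][k]

    # Back substitution.
    beta = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        rest = sum(aug[i][k] * beta[k] for k in range(i + 1, 3))
        beta[i] = (aug[i][3] - rest) / aug[i][i]
    return beta[0], beta[1], beta[2]


def explained_variance_percent(
    observed: Sequence[float], calculated: Sequence[float]
) -> float:
    """100 * (1 - residual sum of squares / total sum of squares)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - c) ** 2 for o, c in zip(observed, calculated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 100.0 * (1.0 - ss_res / ss_tot)
```

Here x stands for the daily maximum temperature in °C, y for the sunshine duration in h, and z for the ozone index, as in Table 2.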


Table 2
Regression functions for explaining the dependence of ozone indices (z) on daily maximum temperature (x, in °C) and sunshine duration (y, in h). The numbers in the columns a, b, and c indicate a constant and the regression coefficients for the independent variables. Additionally, the performance of z = a + bx + cy in comparison to the best linear and nonlinear models is given

Month/period   Function                           a        b          c          % of variance explained     % of variance explained
                                                                                 by linear and nonlinear     by z = a + bx + cy
                                                                                 models

Maximum half-hour mean (in ppb):
May            z = a + bx  + cy                   25.7     0.248      0.00787    57.5                        54.9
June           z = a + bx ln x + cy               13.5     0.392      5.33       64.2                        63.7
July           z = a + bx + cy ln y               -9.94    2.40       1.21       51.4                        51.2
August         z = a + bx + cy                    19.0     0.0473     0.0536     67.5                        66.5
September      z = a + b/x  + cy                  94.7     -254       0.142      67.4                        65.6
All months     z = a + bx  + cy ln y              20.2     0.230      0.515      63.2                        62.9

7 h mean (in ppb):
May            z = a + b exp(-x/c) exp(-y/c)      29.2     2.15       -13.6      71.6                        67.0
June           z = a + bx  + cy                   9.30     0.210      6.00       73.9                        73.3
July           z = a + bx  + cy                   15.6     0.00617    4.93       64.1                        63.5
August         z = a + bx  + cy ln y              18.0     0.00720    0.254      84.3                        82.4
September      z = a + bx ln x + cy               9.55     0.291      1.527      75.5                        75.5
All months     z = a + bx  + cy                   22.0     0.00505    1.491      73.5                        72.4

AOT40 (in ppb h):
May            z = a exp(-x/b) exp(-y/c)          3.05     -9.18      -7.55      69.8                        59.3
June           z = a + bx + cy  ln y              -106     0.282      13.4       64.2                        63.9
July           z = a + bx  + cy                   -105     0.0523     0.784      66.1                        64.5
August         z = a + bx  + cy                   18.7     -0.174     0.038      75.7                        67.9
September      z = a + bx + cy                    -22.9    0.00314    0.0636     70.5                        57.6
All months     z = a + bx + cy                    -43.9    0.00704    0.228      70.0                        64.2

Table 3
Comparison of regression models and neural network models for different ozone indices. The data are the sum of absolute residuals for a test data set not included in the development of the models

Model                                     O3 max. half hour mean   O3 7 h mean   O3 dose AOT40
z = a + bx + cy                           1216                     1018          7380
Other linear and nonlinear regressions    1029                     998           6229
Neural networks:
  Multi-layer perceptron                  1229                     1018          6352
  Radial basis function                   1179                     996           6229
  Bayesian                                1203                     996           6397

2.2. Model validation

The best regression models were applied to two sets of test data not included in the development of the functions (Tables 4 and 5). The mean absolute daily differences between the observed and the calculated indices (residuals) were 8 ± 6 ppb (mean ± s.d.) for the maximum half-hour mean value, 6 ± 5 ppb for the 7 h mean (9.00–16.00 h) and 41 ± 40 ppb h for the daily integrated ozone dose AOT40. When the optimized models for the individual months were used, the mean absolute residuals decreased by 0–10% (Tables 4 and 5).

For further comparison, neural network models were developed and their performance was compared with that of the regression models (Table 3). Both types of models were applied to the test data set used in Table 4 (Seibersdorf, 1996). The comparison was based on the sum of the absolute values of the residuals for the individual ozone indices. Using the functions for the combined period from May to September as examples, in no instance did the neural network models produce clear improvements. In the cases of the 7 h mean (9.00–16.00 h) and the ozone dose AOT40, the neural network models matched the performance of the nonlinear functions. The maximum half hour mean, however, was best reproduced by the optimized regression functions. Apparently, the logarithmic expression included in the regression function for this ozone index (Table 2) could not satisfactorily

be incorporated into the neural network algorithm. On the other hand, the exponential terms in the regression functions for ozone dose and 7 h mean were well reproduced by the neural network model. This limitation of the neural network model was probably due to the simple architecture of the model (2 inputs, 2 hidden nodes, 1 output). Although the number of hidden nodes was optimal in the case of one hidden layer (Table 1), different designs (more hidden layers or feedbacks) might have improved the performance of the neural networks for the ozone maximum.

Fig. 3. Dependence of the daily ozone dose AOT40 on sunshine duration and maximum temperature.

Fig. 4. Dependence of the daily ozone 7 h mean (9.00–16.00) on sunshine duration and maximum temperature.

Fig. 5. Dependence of the daily maximum ozone half-hour mean on sunshine duration and maximum temperature.

Table 4
Validation of the regression models with test data (not included in the models). Location/year: Seibersdorf 1996. Actually observed mean values and differences between observed and calculated ozone exposure indices (monthly mean ± s.d.). O3 max: maximum half hour mean value in nl l⁻¹, MV7: 7 h mean value in nl l⁻¹, AOT40: daily integrated ozone dose above 40 nl l⁻¹ during daylight hours

               O3 max                         MV7                            AOT40
               Observed    Mean absolute      Observed    Mean absolute      Observed    Mean absolute
               mean        residual           mean        residual           mean        residual
Regression models for individual months:
May            45 ± 14     10.4 ± 7.6         39 ± 14     9.1 ± 6.8          58 ± 59     43 ± 35
June           60 ± 17     8.4 ± 7.6          52 ± 13     5.9 ± 3.5          126 ± 95    50 ± 34
July           48 ± 9      7.9 ± 4.6          42 ± 9      6.2 ± 4.4          51 ± 58     65 ± 51
August         50 ± 11     4.9 ± 2.8          42 ± 11     3.8 ± 3.1          56 ± 63     37 ± 27
September      28 ± 12     6.0 ± 6.0          23 ± 10     5.5 ± 5.2          2.7 ± 5.9   2.6 ± 5.4
Regression model for the combined months:
May–Sept.      46 ± 16     8.0 ± 6.4          40 ± 15     6.5 ± 4.8          59 ± 74     41 ± 40


Table 5
Validation of the regression models with test data (not included in the models). Location/year: Stixneusiedl 1995. Actually observed mean values and differences between observed and calculated ozone exposure indices (monthly mean ± s.d.). O3 max: maximum half-hour mean value in nl l⁻¹, MV7: 7 h mean value in nl l⁻¹, AOT40: daily integrated ozone dose above 40 nl l⁻¹ during daylight hours

               O3 max                         MV7                            AOT40
               Observed    Mean absolute      Observed    Mean absolute      Observed     Mean absolute
               mean        residual           mean        residual           mean         residual
Regression models for individual months:
May            60 ± 13     7.2 ± 5.2          49 ± 12     6.0 ± 4.6          132 ± 105    43 ± 40
June           46 ± 11     8.3 ± 7.5          37 ± 12     9.9 ± 7.7          49 ± 63      47 ± 57
July           71 ± 11     7.7 ± 6.7          59 ± 12     8.4 ± 5.3          238 ± 112    76 ± 57
August         60 ± 15     8.0 ± 4.9          49 ± 13     4.9 ± 3.3          132 ± 101    41 ± 36
September      46 ± 11     6.4 ± 4.3          33 ± 9      4.5 ± 4.3          30 ± 42      13 ± 16
Regression model for the combined months:
May–Sept.      57 ± 15     7.8 ± 6.2          45 ± 15     7.5 ± 5.9          118 ± 115    47 ± 51
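The monthly validation statistic reported in Tables 4 and 5 (mean absolute residual ± s.d.) could be computed along the following lines. This is a sketch under assumptions: the function name and record layout are hypothetical, and the sample standard deviation is used because the paper does not state which estimator was applied.

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def monthly_residual_summary(
    records: Iterable[Tuple[str, float, float]]
) -> Dict[str, Tuple[float, float]]:
    """Group daily (month, observed, calculated) records by month.

    Returns, per month, the mean and the sample s.d. of the absolute residuals.
    """
    by_month = defaultdict(list)
    for month, observed, calculated in records:
        by_month[month].append(abs(observed - calculated))

    summary = {}
    for month, residuals in by_month.items():
        # The sample s.d. needs at least two values; report 0.0 for a single day.
        sd = statistics.stdev(residuals) if len(residuals) > 1 else 0.0
        summary[month] = (statistics.mean(residuals), sd)
    return summary
```

Summing the same absolute residuals over all test days instead of averaging them gives the quantity used for the model comparison in Table 3.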

It is hypothesized that with a larger number of input parameters the potential of neural networks could have been better exploited. The merits and slight superiorities of neural networks under such circumstances have already been analysed by Comrie (1997).
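To make the 2-2-1 architecture concrete, the sketch below implements a multi-layer perceptron with two inputs, two hidden nodes and one output in plain Python. It is an illustration only: the tanh activation, the linear output node, the batch gradient descent training, the learning rate and the scaling of inputs and targets to roughly [0, 1] are all assumptions, since the paper does not describe the internals of the software it used.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def init_params(seed: int = 0) -> Dict[str, list]:
    """Small random weights for a 2-2-1 network (two inputs, two hidden nodes, one output)."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(2)],
        "b1": [0.0, 0.0],
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(2)],
        "b2": [0.0],
    }


def forward(p: Dict[str, list], x: Sequence[float]) -> Tuple[List[float], float]:
    """Return the two hidden activations (tanh) and the linear output."""
    hidden = [
        math.tanh(p["w1"][j][0] * x[0] + p["w1"][j][1] * x[1] + p["b1"][j])
        for j in range(2)
    ]
    out = p["w2"][0] * hidden[0] + p["w2"][1] * hidden[1] + p["b2"][0]
    return hidden, out


def mse(p: Dict[str, list], xs: Sequence[Sequence[float]], zs: Sequence[float]) -> float:
    """Mean squared error of the network over a data set."""
    return sum((forward(p, x)[1] - z) ** 2 for x, z in zip(xs, zs)) / len(zs)


def train(p, xs, zs, lr: float = 0.05, epochs: int = 2000):
    """Batch gradient descent on the mean squared error (assumed training scheme)."""
    n = len(zs)
    for _ in range(epochs):
        gw1 = [[0.0, 0.0], [0.0, 0.0]]
        gb1 = [0.0, 0.0]
        gw2 = [0.0, 0.0]
        gb2 = 0.0
        for x, z in zip(xs, zs):
            hidden, out = forward(p, x)
            d_out = 2.0 * (out - z) / n          # d(MSE)/d(out)
            gb2 += d_out
            for j in range(2):
                gw2[j] += d_out * hidden[j]
                # Backpropagate through tanh: derivative is 1 - tanh^2.
                d_h = d_out * p["w2"][j] * (1.0 - hidden[j] ** 2)
                gb1[j] += d_h
                gw1[j][0] += d_h * x[0]
                gw1[j][1] += d_h * x[1]
        # Apply all updates only after the full pass over the data.
        p["b2"][0] -= lr * gb2
        for j in range(2):
            p["w2"][j] -= lr * gw2[j]
            p["b1"][j] -= lr * gb1[j]
            p["w1"][j][0] -= lr * gw1[j][0]
            p["w1"][j][1] -= lr * gw1[j][1]
    return p
```

With only two hidden nodes the network has nine adjustable parameters, compared with three for z = a + bx + cy, which gives a sense of how little extra flexibility this architecture offers over the regression functions.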

2.3. Analysis of over- and under-estimations by the models

For studying the limitations of the models, those days with especially high differences between observed and predicted ozone indices (residuals) were further scrutinized. For these analyses only those days with differences of more than the mean residual plus one standard deviation were selected (64 out of 450 cases), and additional physical parameters not used in the model development were introduced. The importance of these parameters for explaining high residuals was analysed using factorial analysis. Figs. 6 and 7 show the relative weight (the varimax rotated factor loadings) of the newly introduced parameters in the two main factors that were related to the residuals. Environmental parameters with higher factor loadings explained high absolute residuals (higher than the mean of residuals plus one s.d.) better than parameters with low factor loadings. A differentiation between days with overestimations and days with underestimations made it clear that the respective deviations from the observed ozone indices were caused by different environmental parameters. It was observed that overestimations more frequently occurred on days with lower global radiation (compared to sunshine duration) and high wind speeds (Fig. 6). Apparently, on days with hazy or partly overcast conditions sunshine duration is a less reliable substitute for global radiation. High wind speed creates very turbulent atmospheric conditions and an increased mixing height. These conditions favour the dilution of high ground-level ozone concentrations by air masses from high tropospheric layers.

Fig. 6. Factor loadings of additional physical parameters for explaining high residuals (higher than the mean of the residuals plus one standard deviation) on days with overestimations of ozone indices by the best regression models.

A very different situation was revealed for the days when the model underestimated the observed ozone level (Fig. 7). The most important factor was whether these situations occurred on weekdays (Monday to Friday) or on weekends (Saturday, Sunday or official holiday). Underestimations more frequently occurred during the week than on weekends, probably due to higher precursor emissions by traffic during the week. The importance of considering these short-term variations in the emission situation was also reported by Ziomas et al. (1995) and Cerveny and Balling (1998). Altshuler et al. (1995), however, stress the importance of considering the weekend NOx/VOC ratio


because occasionally these changes may also result in an increase in ozone on Sunday. These analyses indicate that small improvements would require additional data input. When this kind of model is applied or adapted to the regional requirements, it must be decided if the addition of more sophisticated radiation and wind data as inputs in neural network models will be worth an increase in the precision of the models presented in this paper.

Fig. 7. Factor loadings of additional physical parameters for explaining high residuals (higher than the mean of the residuals plus one standard deviation) on days with underestimations of ozone indices by the best regression models.

3. Conclusions

The peculiarity of the models presented here is their low demand for input parameters. It has been shown that the knowledge of temperature and radiation alone can be sufficient to predict ozone indices with a precision in the same range as that of more advanced models. Although the application of simple models might not be necessary or desirable for regions where emission inventories and the atmospheric conditions are well characterized both physically and chemically, these models could have importance in less intensely studied areas. The application will be more successful in topographically homogeneous regions and in rural environments where the emissions and long-range transport of ozone precursors are relatively constant. The regional adaptation of the parameters will require temporary monitoring campaigns for air quality and meteorological conditions to derive locally optimized parameters for the specific functions. If these conditions are met and the data input requirements of other models are too demanding for a useful application, the presented models offer a simple alternative to derive ecologically relevant ozone indices directly from meteorological conditions.

Acknowledgements

The authors gratefully acknowledge financial support by the Austrian Ministry for Science and Traffic (FTSP-Programm) and by the Jubiläumsfonds der Österreichischen Nationalbank (project nr. 4112). Thanks to D. Seeley for linguistic improvements.

References

Abdul-Wahab, S., Bouhamra, W., Ettouney, H., Sowerby, B., Crittenden, B.D., 1996. A statistical model for predicting ozone levels in the Shuaiba industrial area, Kuwait. ESPR 3, 195–204.
Altshuler, S.L., Arcado, T.D., Lawson, D.R., 1995. Weekday vs weekend ambient ozone concentrations: discussion and hypotheses with focus on northern California. Journal of Air and Waste Management Association 45, 967–972.
Batchelor, W.D., 1998. Fundamentals of neural networks. In: Peart, R.M., Curry, R.B. (Eds.), Agricultural Systems Modeling and Simulation. Marcel Dekker, New York, pp. 597–628.
Bloomfield, P., Royle, J.A., Steinberg, L.J., Yang, Q., 1996. Accounting for meteorological effects in measuring urban ozone levels and trends. Atmospheric Environment 30, 3067–3077.
Cerveny, R.S., Balling Jr, R.C., 1998. Weekly cycles of air pollutants, precipitation and tropical cyclones in the coastal NW Atlantic region. Nature 394, 561–563.
Comrie, A.C., 1997. Comparing neural networks and regression models for ozone forecasting. Journal of Air and Waste Management Association 47, 653–663.
Fuhrer, J., Skärby, L., Ashmore, M.R., 1997. Critical levels for ozone effects on vegetation in Europe. Environmental Pollution 97, 91–106.
Gardner, M.W., Dorling, S.R., 1998. Artificial neural networks (the multilayer perceptron) – a review of applications in the atmospheric sciences. Atmospheric Environment 32, 2627–2636.
Hogsett, W.E., Weber, J.E., Tingey, D., Herstrom, A., Lee, E.H., Laurence, J.A., 1997. An approach for characterizing tropospheric ozone risk to forests. Environmental Management 21, 105–120.
Kinney, P.L., Thurston, G.D., Raizenne, M., 1996. The effects of ambient ozone on lung function in children: a reanalysis of six summer camp studies. Environmental Health Perspectives 104, 170–174.
Krupa, S.V., Nosal, M., Legge, A.H., 1994. Ambient ozone and crop loss: establishing a cause–effect relationship. Environmental Pollution 83, 269–276.
Legge, A.H., Grünhage, L., Nosal, M., Jäger, H.J., Krupa, S.V., 1995. Ambient ozone and adverse crop response: an evaluation of North American and European data as they relate to exposure indices and critical levels. Journal of Applied Botany 69, 192–205.
Loibl, W., Smidt, St., 1996. Ozone exposure – areas of potential risk for selected tree species in Austria. ESPR 3, 213–217.
Olszyna, K.J., Luria, M., Meagher, J.F., 1997. The correlation of temperature and rural ozone levels in southeastern U.S.A. Atmospheric Environment 31, 3011–3022.
Proyou, A.G., Toupance, G., Perros, P.E., 1991. A two-year study of ozone behaviour at rural and forested sites in eastern France. Atmospheric Environment 25A, 2145–2153.
Renner, E., Rolle, W., Helmig, D., 1993. Comparison of computed and measured photooxidant concentrations at a forest site. Chemosphere 27, 881–898.
Schneider, C., Kessler, C., Moussiopoulos, N., 1997. Influence of emission input data on ozone level predictions for the Upper Rhine Valley. Atmospheric Environment 31, 3185–3203.
Simpson, D., 1993. Photochemical model calculations over Europe for two extended summer periods: 1985 and 1989. Model results and comparison with observations. Atmospheric Environment 27A, 921–943.
Spektor, D.M., Thurston, G.D., Mao, J., He, D., Hayes, C., Lippmann, M., 1991. Effects of single and multiday ozone exposures on respiratory function in active normal children. Environmental Research 55, 107–122.
Spichtinger, N., Winterhalter, M., Fabian, P., 1996. Ozone and Grosswetterlagen. ESPR 3, 145–152.
Stedman, J.R., Williams, M.L., 1992. A trajectory model of the relationship between ozone and precursor emissions. Atmospheric Environment 26A, 1271–1281.
Xu, D.P., Yap, D., Taylor, P.A., 1996. Meteorologically adjusted ground level ozone trends in Ontario. Atmospheric Environment 30, 1117–1124.
Yi, J., Prybutok, V.R., 1996. A neural network model forecasting for prediction of daily maximum ozone concentration in an industrialized urban area. Environmental Pollution 92, 349–357.
Ziomas, I.C., Melas, D., Zerefos, C.S., Bais, A.F., 1995. On the relationship between peak ozone levels and meteorological variables. Fresenius Environmental Bulletin 4, 53–58.
Zlatev, Z., Christensen, J., Eliassen, A., 1993. Studying high ozone concentrations by using the Danish Eulerian model. Atmospheric Environment 27A, 845–865.