Selection and validation of parameters in multiple linear and principal component regressions


Environmental Modelling & Software 23 (2008) 50–55
www.elsevier.com/locate/envsoft

J.C.M. Pires, F.G. Martins*, S.I.V. Sousa, M.C.M. Alvim-Ferraz, M.C. Pereira
LEPAE, Departamento de Engenharia Química, Faculdade de Engenharia, Universidade do Porto, Rua Dr. Roberto Frias, 4200-465 Porto, Portugal
Received 13 April 2007; accepted 23 April 2007
Available online 13 June 2007

Abstract

This paper aims to select statistically valid regression parameters using multiple linear and principal component regression models. The selection methods were: (i) backward elimination based on the confidence interval limits; (ii) backward elimination based on the correlation coefficient; (iii) forward selection based on the correlation coefficient; (iv) forward selection based on the sum of square errors; and (v) combinations of all variables. For the purpose of the work, a case study was considered. The case study focused on the determination of the parameters that influence the concentration of tropospheric ozone. The explanatory variables were meteorological data (temperature, relative humidity, wind speed, wind direction and solar radiation), and environmental data (nitrogen oxides and ozone concentrations of the previous day). The results showed that each selection method led to different multiple linear regression models, as a consequence of the collinearities between explanatory variables. Such collinearities can be removed by pre-processing the explanatory data set, through the application of principal component analysis. The application of this procedure allowed the achievement of the same regression model using all selection methods.
© 2007 Elsevier Ltd. All rights reserved.

Keywords: Multiple linear regression; Parameter validation methodologies; Principal component analysis

* Corresponding author. Tel.: +351 22 508 1974; fax: +351 22 508 1449. E-mail address: [email protected] (F.G. Martins).
1364-8152/$ - see front matter © 2007 Elsevier Ltd. All rights reserved. doi:10.1016/j.envsoft.2007.04.012

1. Introduction

Multiple linear regression (MLR) attempts to model the relationship between two or more explanatory variables and a response variable, by fitting a linear equation to the observed data. The dependent variable ($y$) is given by:

$$y = \hat{\beta}_0 + \sum_{i=1}^{k} \hat{\beta}_i x_i + \varepsilon \qquad (1)$$

where $x_i$ ($i = 1, \ldots, k$) are the explanatory independent variables, $\hat{\beta}_i$ ($i = 0, \ldots, k$) are the regression coefficients, and $\varepsilon$ is the error associated with the regression, assumed to be normally distributed with zero expectation and constant variance (Agirre-Basurko et al., 2006).

The predicted value given by the regression model ($\hat{y}$) is calculated by:

$$\hat{y} = \hat{\beta}_0 + \sum_{i=1}^{k} \hat{\beta}_i x_i \qquad (2)$$

The most common method to estimate the regression parameters $\hat{\beta}_i$ is the minimization of the sum of square errors (SSE). The equation is as follows:

$$\hat{\beta}_i = \arg\min \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \qquad (3)$$
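As an illustration of Eqs. (1)-(3), the following minimal sketch estimates the coefficients by least squares and returns the SSE. It assumes NumPy is available; the function name `fit_mlr` and the synthetic demonstration data are hypothetical and are not taken from the paper.

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares fit of y = b0 + sum_i b_i * x_i; returns (coefficients, SSE)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so that the first coefficient is the intercept b0.
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    residuals = y - A @ beta          # y_i - y_hat_i
    return beta, float(residuals @ residuals)

# Synthetic, noise-free data (illustrative only): y = 1 + 2*x1 - 3*x2.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 2))
y_demo = 1.0 + 2.0 * X_demo[:, 0] - 3.0 * X_demo[:, 1]
beta_demo, sse_demo = fit_mlr(X_demo, y_demo)
```

Because the demonstration data contain no noise, the fitted coefficients recover the generating values and the SSE is zero up to rounding.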

MLR is one of the most widely used forecasting methods. It is used to fit observed data and to build predictive models in many research fields, such as biology (Khan et al., 2006; Mercaldo-Allen et al., 2006; Smith and Wachob, 2006), medicine (Sanchez-Ortuno et al., 2006; Andersen et al., 2005; Dorbala et al., 2006), psychology (Ansiau et al., 2005), economics (Mohamed and Bodger, 2005; Singh, 2006) and environment (Sousa et al., 2006; Turalıoğlu et al., 2006; Lu and Chang, 2005; Friis et al., 2005; Goyal et al., 2006).

Environmental research is a field in which MLR has many applications. Sousa et al. (2006) used the MLR approach to predict hourly mean ozone concentrations, comparing it with feedforward artificial neural networks (FANN) and time series (TS) modelling. In that study, MLR models showed good performance in the development step, but in the validation step FANN presented lower residual errors. Turalıoğlu et al. (2006) used MLR to determine the relationship between daily average total suspended particulate (TSP) and sulphur dioxide (SO2) concentrations and meteorological parameters (temperature, wind speed, relative humidity, pressure and precipitation) in Erzurum, Turkey. Pollutant concentrations had a strong relation with temperature and a significant correlation with wind speed and pressure, whereas precipitation and humidity were weakly correlated with SO2 and TSP. This method was also used by Goyal et al. (2006) to forecast the daily averaged concentration of respirable suspended particulate matter (RSPM) in Delhi and Hong Kong based on some meteorological factors. The results were compared with an autoregressive integrated moving average (ARIMA) time series model and with the combination of the two models. The combination of MLR and ARIMA performed better than either MLR or ARIMA alone.

The studies described above did not, however, consider the dependence between variables before the application of MLR. When explanatory variables are correlated with each other, the application of this method usually presents some drawbacks, because high correlations between predictor variables can hinder a correct analysis. The dependence of the explanatory variables can be removed through the application of principal component analysis (PCA). PCA creates new variables, the principal components (PC), that are orthogonal and uncorrelated.
These variables are linear combinations of the original variables. The PC are ordered in such a way that the first component accounts for the largest fraction of the original data variability (Abdul-Wahab et al., 2005; Wang and Xiao, 2004; Sousa et al., 2007). To evaluate the influence of each variable on the PC, varimax rotation is generally used to obtain the rotated factor loadings, which represent the contribution of each variable to a specific principal component.

Principal component regression (PCR) is a method that combines linear regression and PCA. PCR establishes a relationship between the output variable ($y$) and the selected PC of the input variables ($x_i$). To develop these models with variables that correspond to significant regression parameters, it is necessary to use one of the statistical procedures described in Section 2. There are several published papers where these procedures were ignored (Boughton and Chiew, 2007; Liu et al., in press; Zhu et al., 2007). The application of these procedures avoids the inclusion in the models of input variables that are less correlated with the output variable.

In this paper, regressions were performed with and without the application of PCA to the original data. The aim was to apply a methodology, based on a statistical procedure, to select the explanatory variables to be used in the development of multiple linear and principal component regression models. The following five methods were compared: (i) backward elimination based on the confidence interval limits; (ii) backward elimination based on the correlation coefficient; (iii) forward selection based on the correlation coefficient; (iv) forward selection based on the sum of square errors; and (v) combinations of all variables. A case study was considered, regarding the determination of the parameters that influence the concentration of tropospheric ozone. The explanatory variables were meteorological data (temperature, relative humidity, wind speed, wind direction and solar radiation), and environmental data (nitrogen oxides and ozone concentrations of the previous day).

The remainder of this paper is organised as follows: Section 2 presents different methods to validate the regression coefficients of multiple linear and principal component regression models; Section 3 describes the case study; Section 4 discusses the results of the parameter validation methods; and Section 5 presents some conclusions.

2. Selection methods

The significance of the regression parameters in the MLR and PCR models was evaluated through the calculation of their confidence interval. The parameter $\hat{\beta}_i$ is valid if (Hayter et al., 2005):

$$\left| \hat{\beta}_i \right| > \frac{t^{\alpha/2}_{n-k-1}\, \hat{\sigma}}{\sqrt{S_{xx_i}}} \qquad (4)$$

where $t$ is the Student $t$ distribution, $n$ is the number of points, $k$ is the number of parameters, $\alpha$ is the significance level, $\hat{\sigma}$ is the standard deviation given by $\sqrt{SSE/(n-k-1)}$ and $S_{xx_i}$ is the sum of squares related to $x_i$, given by $\sum_{j=1}^{n} (x_{ij} - \bar{x}_i)^2$. The description of the methods is given below.

2.1. Method 1 – backward elimination based on the confidence interval limits

The regression parameters were considered valid for positive values of $\gamma_i$. This parameter was defined by:

$$\gamma_i = \left| \hat{\beta}_i \right| - \frac{t^{\alpha/2}_{n-k-1}\, \hat{\sigma}}{\sqrt{S_{xx_i}}} \qquad (5)$$

The following procedure was applied:

Step 1 Determination of the regression equation with all explanatory variables.
Step 2 Calculation of $\gamma_i$ for each regression parameter with a specific significance level.
Step 3 If all values are positive, the regression is valid and the procedure is concluded; if at least one value is negative, the explanatory variable with the lowest value of $\gamma_i$ is eliminated and the procedure continues with step 4.
Step 4 Determination of the regression equation with the remaining explanatory variables and return to step 2.
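The four steps above can be sketched as follows. This is only an illustrative reading of the procedure, under stated assumptions: NumPy is available; the names `gamma_values`, `backward_elimination` and `t_quantile` are hypothetical; only the slope parameters are tested (the intercept is always kept); and `t_quantile` is a user-supplied callable that returns the Student t critical value for a given number of degrees of freedom (for instance `lambda df: scipy.stats.t.ppf(1 - alpha / 2, df)` if SciPy is installed).

```python
import numpy as np

def gamma_values(X, y, t_quantile):
    """gamma_i of Eq. (5) for every slope of a least-squares fit on the columns of X."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    sse = float(np.sum((y - A @ beta) ** 2))
    sigma = np.sqrt(sse / (n - k - 1))                  # standard deviation of Eq. (4)
    sxx = np.sum((X - X.mean(axis=0)) ** 2, axis=0)     # S_xx_i for each variable
    return np.abs(beta[1:]) - t_quantile(n - k - 1) * sigma / np.sqrt(sxx)

def backward_elimination(X, y, t_quantile):
    """Return the column indices of X that survive Steps 1-4."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    kept = list(range(X.shape[1]))                      # Step 1: start with all variables
    while kept:
        gamma = gamma_values(X[:, kept], y, t_quantile) # Step 2
        if np.all(gamma > 0):                           # Step 3: all parameters valid
            break
        del kept[int(np.argmin(gamma))]                 # drop the lowest gamma, refit (Step 4)
    return kept

# Synthetic demonstration (not the paper's data): only column 0 drives y.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
A_demo = np.column_stack([np.ones(200), X_demo])
noise = rng.normal(scale=0.5, size=200)
# Remove the part of the noise explained by the regressors, so the fitted slopes are (2, 0, 0).
noise -= A_demo @ np.linalg.lstsq(A_demo, noise, rcond=None)[0]
y_demo = 5.0 + 2.0 * X_demo[:, 0] + noise
# 1.97 approximates the two-sided 5% critical value for about 200 degrees of freedom.
kept_demo = backward_elimination(X_demo, y_demo, lambda df: 1.97)
```

In the demonstration the two uninformative columns are eliminated one after the other and only the first variable is retained.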

2.2. Method 2 – backward elimination based on the correlation coefficient

This method also uses backward elimination. In the previous method, the variable selected for elimination corresponded to the lowest value of $\gamma_i$. In the present method, the variable selected for elimination is the explanatory variable that presents the lowest correlation with the dependent variable ($y$).

2.3. Method 3 – forward selection based on the correlation coefficient

The procedure of this method is as follows:

Step 1 Determination of the regression equation by calculating only $\hat{\beta}_0$.
Step 2 Ordering of the explanatory variables according to their correlation with the dependent variable (from the most to the least correlated variable).
Step 3 Selection of the explanatory variable best correlated with the dependent variable and determination of the regression equation.
Step 4 Verification of the validity of all the regression parameters; if valid, proceed to step 5; if not, elimination of the last added explanatory variable, determination of the regression equation using the validated variables, and conclusion of the procedure.
Step 5 Addition of one of the remaining explanatory variables following the order described in step 2.
Step 6 Repetition of steps 4 and 5 until all the explanatory variables are used in the regression.
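A possible sketch of the forward selection of Method 3 is given below. It rests on several assumptions that are not stated in the text: NumPy is available; the variables are ordered by the absolute value of their correlation with $y$; validity is checked for the slopes only, with the criterion of Eq. (4); and `slopes_valid`, `forward_selection` and `t_quantile` (a callable returning the Student t critical value for a given number of degrees of freedom) are hypothetical names.

```python
import numpy as np

def slopes_valid(X, y, t_quantile):
    """True if every slope of the least-squares fit satisfies the criterion of Eq. (4)."""
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    sigma = np.sqrt(np.sum((y - A @ beta) ** 2) / (n - k - 1))
    sxx = np.sum((X - X.mean(axis=0)) ** 2, axis=0)
    limit = t_quantile(n - k - 1) * sigma / np.sqrt(sxx)
    return bool(np.all(np.abs(beta[1:]) > limit))

def forward_selection(X, y, t_quantile):
    """Return the column indices of X selected by Steps 1-6."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Step 2: order the variables from the most to the least correlated with y.
    corr = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    order = sorted(range(X.shape[1]), key=lambda j: -corr[j])
    selected = []                       # Step 1: the model starts with the intercept only
    for j in order:                     # Steps 3, 5 and 6: add one variable at a time
        candidate = selected + [j]
        if not slopes_valid(X[:, candidate], y, t_quantile):
            break                       # Step 4: discard the last added variable and stop
        selected = candidate
    return selected

# Synthetic demonstration (not the paper's data): only column 0 drives y.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
A_demo = np.column_stack([np.ones(200), X_demo])
noise = rng.normal(scale=0.5, size=200)
# Make the noise orthogonal to the regressors, so the fitted slopes are (2, 0, 0).
noise -= A_demo @ np.linalg.lstsq(A_demo, noise, rcond=None)[0]
y_demo = 5.0 + 2.0 * X_demo[:, 0] + noise
selected_demo = forward_selection(X_demo, y_demo, lambda df: 1.97)
```

Because the procedure stops at the first invalid addition, a useful variable that comes later in the correlation ordering is never tried, which is one reason why the forward and backward methods can return different models.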

2.4. Method 4 – forward selection based on the sum of square errors

This method is similar to method 3. The selected variable corresponds to the one presenting the lowest sum of square errors (SSE). Besides the t-test, the Partial F test was also applied for the regression parameter validation; these two variants are referred to from now on as methods 4a and 4b, respectively. For the application of the Partial F test, the F statistic is calculated by (Kleinbaum et al., 1988):

$$F_i = \frac{\left( SSE_{i-1} - SSE_i \right) \times (n - k - 1)}{SSE_i \times k} \qquad (6)$$

where $SSE_{i-1}$ and $SSE_i$ are the sums of square errors before and after the addition of an explanatory variable. The value of $F_i$ is then compared with the critical value ($f_c$) given by $F_{k,\,n-k-1,\,1-\alpha}$, and the added explanatory variable is considered valid if the critical value is less than the calculated value $F_i$.

2.5. Method 5 – combinations of all variables

Method 5 determines the regression equations corresponding to all the possible combinations of explanatory variables. This method selects the combination corresponding to the regression with valid parameters and the lowest value of SSE. Thus, it guarantees the achievement of the optimal solution, but when datasets have many variables, a large increase in computation time is observed. The methods described above are faster, but they are not guaranteed to reach the optimal solution.

3. Case study

This study aims to determine the parameters that most influence the concentration of tropospheric ozone. Increased tropospheric ozone levels have been affecting human health, climate, vegetation, materials and atmospheric composition. Tropospheric ozone is formed by reactions involving solar radiation and anthropogenic pollutants (methane, non-methane volatile organic compounds, carbon monoxide) in the presence of nitrogen oxides. Consequently, a typical daily profile of ozone concentrations (strongly influenced by meteorological factors as well) shows maximum values during the early afternoon, with a sharp decrease during the late afternoon and minimum values during the night and early morning. However, this typical daily profile is not always observed. Nocturnal ozone maxima, observed between midnight and sunrise, have been studied by several authors (Corsmeier et al., 1997; Baumbach and Vogt, 1999; Eliasson et al., 2003). Nocturnal ozone maxima were always lower than diurnal maxima; however, significantly high concentrations (220 µg/m³ and 240 µg/m³) were measured during the night (Samson, 1978; Steinberger and Ganor, 1980). Nocturnal ozone maxima may have environmental and health impacts; recent studies showed that increased ozone levels at night affected vegetation (Musselman and Massman, 1999).

The data were recorded at an urban site situated in Oporto, Northern Portugal, and the treatment was based on the hourly averages of pollutant concentrations and meteorological parameters. The analysed period was August 2003, during night time, 22 h–6 h GMT (219 data points). The explanatory variables considered were: air temperature (T), solar radiation (SR), relative humidity (RH), wind speed (WS), wind direction (WD), and the concentrations of the previous day of nitrogen monoxide (NO), nitrogen dioxide (NO2) and ozone (O3) (Alvim-Ferraz et al., 2006; Baur et al., 2004). The explanatory variables were standardized to have zero mean and unit standard deviation.

4. Results and discussion

The selected variables for each method were validated with the t-test and the Partial F test (method 4b) using a significance level of 0.05. Table 1 shows the MLR coefficients and the corresponding values of $t_i$ obtained using the different variable selection methods, for the analysed period. The value of $t_i$ is equal to $\gamma_i$ or ($F_i - f_c$) for the methods that apply, respectively, the t-test or the Partial F test for the validation of the regression parameters (in both cases, a parameter is valid for positive values of $t_i$).

Table 1
Statistically valid regression parameters (significance level of 0.05) and the corresponding values of $t_i$ using MLR, for each selection method

Parameters   Methods
             1       2       3       4a      4b      5
β0           22.7    22.7    22.7    22.7    22.7    22.7
t0           20.3    20.3    20.3    20.3            20.3
β3           3.28
t3           0.57
β5                                           3.56    3.56
t5                                           0.78    0.88
β6           5.22    3.96    3.96    3.96    5.12    5.12
t6           2.60    1.52    1.52    1.52    6.33    2.57
β7           2.82                            3.01    3.01
t7           0.30                            2.83    0.47

Notes: β0: independent parameter; β3: temperature; β5: relative humidity; β6: wind speed; β7: wind direction; $t_i$ is equal to $\gamma_i$ or ($F_i - f_c$) for the methods that apply, respectively, the t-test or the Partial F test for the validation of the regression parameters (in both cases, a parameter is valid for positive values of $t_i$).

The selected parameters that influenced the concentration of tropospheric ozone were different for each method. For all of them, the concentrations of the previous day of nitrogen oxides (β1 and β2) and ozone (β8), and solar radiation (β4), were statistically invalid. The constant β0 was always statistically valid, presenting the same value for all methods, which corresponded to the mean ozone concentration. For method 1, temperature (β3), wind speed (β6) and wind direction (β7) were considered to be the variables that influenced the concentration of tropospheric ozone. For methods 2, 3 and 4a, wind speed was the only variable considered to be statistically valid. For methods 5 and 4b, relative humidity (β5), wind speed (β6) and wind direction (β7) were considered to be the variables that most influence the concentration of tropospheric ozone. The influence of wind speed and wind direction was validated, but the influence of both temperature and relative humidity was not well defined.

In conclusion, the results showed that each selection method led to different multiple linear regression models, which resulted from the existence of collinearities between explanatory variables. Table 2 shows the correlation matrix of the explanatory variables; these coefficients provide a measure of the linear relationship between two different variables. For the validation of these values, the critical correlation coefficient was calculated (with a significance level of 0.05) and compared with the correlation coefficient. A correlation coefficient is valid if it is higher than the critical value. The values in bold presented in Table 2 correspond to statistically valid coefficients (Sousa et al., 2007). Temperature (β3) and relative humidity (β5) showed the greatest correlation coefficient.

Table 2
Correlation matrix of the explanatory variables

       NO      NO2     T       SR      RH      WS      WD      O3
NO     1       0.757   0.361   0.155   -0.310  0.037   -0.284  -0.496
NO2            1       0.612   0.070   -0.511  0.153   -0.177  -0.440
T                      1       -0.145  -0.891  0.370   -0.269  0.140
SR                             1       0.097   0.045   -0.141  0.115
RH                                     1       -0.313  0.302   -0.186
WS                                             1       0.015   0.265
WD                                                     1       0.083
O3                                                             1

Note: Values in bold correspond to statistically valid correlation coefficients.

The collinearities can be removed by pre-processing the explanatory data set, through the application of principal component analysis. Applying PCA, the variables were transformed into an equal number of principal components, maintaining all the information of the original data. After the variable transformation, varimax rotation was used to maximize the loadings of the predictor variables on each principal component. Table 3 shows the results of the varimax rotation on the eight PC and the variance for each component. Values in bold correspond to the main contributions of the explanatory variables in each principal component. The first principal component (PC1) has important contributions from two meteorological variables, namely temperature (T) and relative humidity (RH). PC2 is heavily loaded on two primary pollutants (NO and NO2). PC3, PC4, PC5 and PC6 have important contributions from, respectively, solar radiation (SR), wind direction (WD), wind speed (WS) and ozone concentration (O3).
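The removal of collinearity by PCA can be illustrated with a short sketch. It assumes NumPy is available and computes the components from the eigen-decomposition of the correlation matrix of the standardized variables; the varimax rotation step is not included, and the function name `principal_components` and the synthetic data are hypothetical.

```python
import numpy as np

def principal_components(X):
    """PCA of standardized data; returns (scores, loadings, cumulative variance in %)."""
    X = np.asarray(X, dtype=float)
    # Standardize to zero mean and unit standard deviation, as in the case study.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigval, eigvec = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    order = np.argsort(eigval)[::-1]            # largest explained variance first
    eigval, eigvec = eigval[order], eigvec[:, order]
    scores = Z @ eigvec                         # the new, uncorrelated variables
    cumulative = 100.0 * np.cumsum(eigval) / eigval.sum()
    return scores, eigvec, cumulative

# Synthetic demonstration: the first two variables are strongly anti-correlated.
rng = np.random.default_rng(0)
base = rng.normal(size=(300, 1))
X_demo = np.hstack([
    base + 0.3 * rng.normal(size=(300, 1)),
    -base + 0.3 * rng.normal(size=(300, 1)),
    rng.normal(size=(300, 1)),
])
scores_demo, loadings_demo, cumvar_demo = principal_components(X_demo)
score_corr = np.corrcoef(scores_demo, rowvar=False)   # close to the identity matrix
```

The correlation matrix of the scores is the identity up to rounding, so a regression on the components is free of the collinearity present in the original variables, and all the components together retain 100% of the variance.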

Table 3
Varimax rotated loadings of the principal components

                        PC1     PC2     PC3     PC4     PC5     PC6     PC7     PC8
NO                      0.201   -0.928  0.098   0.136   0.036   0.251   0.029   0.003
NO2                     0.452   -0.617  0.075   0.028   0.095   0.320   0.540   0.013
T                       -0.907  0.204   0.087   0.111   0.186   0.067   0.119   0.241
SR                      0.078   0.057   0.989   0.072   0.011   0.039   0.018   0.003
RH                      0.944   0.141   0.037   0.143   0.121   0.097   0.017   0.197
WS                      0.198   0.011   0.011   0.022   -0.971  0.113   0.024   0.004
WD                      0.164   0.109   0.076   -0.974  0.021   0.033   0.009   0.001
O3                      0.132   0.290   0.049   0.036   0.134   0.932   0.067   0.000
% Cumulative variance   37.32   60.66   74.26   85.14   92.81   96.95   98.87   100.00

Note: Values in bold indicate the variables that most influence each principal component.

Table 4
Statistically valid regression parameters (significance level of 0.05) and the corresponding values of $t_i$ using PCR, for each selection method

Parameters   Methods
             1       2       3       4a      4b      5
β0           22.7    22.7    22.7    22.7    22.7    22.7
t0           20.3    20.3    20.3    20.3            20.3
β5           6.52    6.52    6.52    6.52    6.52    6.52
t5           3.48    3.48    3.48    3.48    13.9    3.48
β6           4.37    4.37    4.37    4.37    4.37    4.37
t6           0.22    0.22    0.22    0.22    1.29    0.22
β7                                           5.51
t7                                           0.58

Note: $t_i$ is equal to $\gamma_i$ or ($F_i - f_c$) for the methods that apply, respectively, the t-test or the Partial F test for the validation of the regression parameters (in both cases, a parameter is valid for positive values of $t_i$).

Table 4 shows the selected variables and the respective values of $t_i$ achieved using PCR, after the application of the different selection methods. For all methods, PC1 to PC4 and PC8 were statistically invalid. Besides PC5 and PC6, which were statistically valid for all methods, PC7 was also validated using method 4b. PC7 was selected because of the difference in the parameter validation criteria. When applying this method, the addition of PC7 did not change the regression coefficients of the other valid PC, which demonstrates that the input variables (PC) were independent. The original variables associated with the valid principal components, PC5 and PC6, were, respectively, wind speed and ozone concentration of the previous day. These results were different from the ones obtained using the original variables in MLR.

Tropospheric ozone is not produced by photochemical reactions during the night, owing to both the absence of sunlight and the lower emission of ozone precursors such as NOx and VOCs. Nocturnal ozone can be explained by: (i) accumulation of ozone during daytime; (ii) vertical mixing with higher concentrations of ozone from upper layers in the atmosphere; and (iii) horizontal transport due to local winds such as land-sea breezes (Eliasson et al., 2003; Mesquita et al., 2004). Therefore, during the night, the wind and the ozone concentration of the previous day strongly influence tropospheric ozone concentration. This fact is in agreement with what was concluded by analysing the data using PCR.

5. Conclusions

To select statistically valid regression parameters using multiple linear and principal component regression models, five methods were compared. These methods were used to evaluate the variables that influenced tropospheric ozone concentration during the night period.

When multiple linear regression was used, the results showed that each selection method led to different models, as a consequence of the collinearities between explanatory variables. In contrast, when principal component regression was applied, all five methods selected wind speed and ozone concentration of the previous day as the variables influencing the concentration of tropospheric ozone, which is in agreement with what naturally occurs. Therefore, the application of principal component analysis removed the collinearities between the explanatory variables, allowing the achievement of the same regression model using all selection methods.

Acknowledgements

The authors are grateful to Comissão de Coordenação da Direcção Regional-Norte and to Instituto Geofísico da Universidade do Porto for kindly providing the air quality and meteorological data. This work was supported by Fundação para a Ciência e Tecnologia (FCT). J.C.M. Pires also thanks the FCT for the fellowship SFRD/BD/23302/2005.

References

Abdul-Wahab, S.A., Bakheit, C.S., Al-Alawi, S.M., 2005. Principal component and multiple regression analysis in modelling of ground-level ozone and factors affecting its concentrations. Environmental Modelling & Software 20 (10), 1263–1271.
Agirre-Basurko, E., Ibarra-Berastegi, G., Madariaga, I., 2006. Regression and multilayer perceptron-based models to forecast hourly O3 and NO2 levels in the Bilbao area. Environmental Modelling & Software 21, 430–446.
Alvim-Ferraz, M.C.M., Sousa, S.I.V., Pereira, M.C., Martins, F.G., 2006. Contribution of anthropogenic pollutants to the increase of tropospheric ozone levels in Oporto Metropolitan Area, Portugal since the 19th century. Environmental Pollution 140, 516–524.
Andersen, J.R., Søgnen, E., Natvig, G.K., 2005. Diet quality in 116 Norwegian men and women with coronary heart disease. European Journal of Cardiovascular Nursing 5, 244–250.
Ansiau, D., Marquié, J.C., Soubelet, A., Ramos, S., 2005. Relationships between cognitive characteristics of the job, age, and cognitive efficiency. International Congress Series 1280, 43–48.
Baumbach, G., Vogt, U., 1999. Experimental determination of the effect of mountain-valley breeze circulation on air pollution in the vicinity of Freiburg. Atmospheric Environment 33, 4019–4027.
Baur, D., Saisana, M., Schulze, N., 2004. Modelling the effects of meteorological variables on ozone concentration – a quantile regression approach. Atmospheric Environment 38 (28), 4689–4699.
Boughton, W., Chiew, F., 2007. Estimating runoff in ungauged catchments from rainfall, PET and the AWBM model. Environmental Modelling & Software 22 (4), 476–487.
Corsmeier, U., Kalthoff, N., Kolle, O., Kotzian, M., Fiedler, F., 1997. Ozone concentration jump in the stable nocturnal boundary layer during a LLJ-event. Atmospheric Environment 31, 1977–1989.
Dorbala, S., Crugnale, S., Yang, D., Di Carli, M.F., 2006. Effect of body mass index on left ventricular cavity size and ejection fraction. American Journal of Cardiology 97, 725–729.
Eliasson, I., Thorsson, S., Andersson-Sköld, Y., 2003. Summer nocturnal ozone maxima in Göteborg, Sweden. Atmospheric Environment 37, 2615–2627.
Friis, K., Körtzinger, A., Pätsch, J., Wallace, D.W.R., 2005. On the temporal increase of anthropogenic CO2 in the subpolar North Atlantic. Deep-Sea Research I 52, 681–698.
Goyal, P., Chan, A.T., Jaiswal, N., 2006. Statistical models for the prediction of respirable suspended particulate matter in urban cities. Atmospheric Environment 40, 2068–2077.
Hayter, A.J., Wynn, H.P., Liu, W., 2005. Slope modified confidence bands for a simple linear regression model. Statistical Methodology 3, 186–192.
Khan, Z.A., Badruddin, I.A., Quadir, G.A., Seetharamu, K.N., 2006. A quick and accurate estimation of heat losses from a cow. Biosystems Engineering 93 (3), 313–323.
Kleinbaum, D.G., Kupper, L.L., Muller, K.E., 1988. Applied Regression Analysis and Other Multivariable Methods. PWS-KENT Publishing Company, Boston, pp. 124–129.
Liu, G., Wang, L., Qu, H., Shen, H., Zhang, X., Zhang, S., Mi, Z., 2007. Artificial neural network approaches on composition-property relationships of jet fuels based on GC-MS. Fuel, in press, http://dx.doi.org/10.1016/j.fuel.2007.02.023.
Lu, H.C., Chang, T.S., 2005. Meteorologically adjusted trends of daily maximum ozone concentrations in Taipei, Taiwan. Atmospheric Environment 39, 6491–6501.
Mercaldo-Allen, R., Kuropat, C., Caldarone, E.M., 2006. A model to estimate growth in young-of-the-year tautog, Tautoga onitis, based on RNA/DNA ratio and seawater temperature. Journal of Experimental Marine Biology and Ecology 329, 187–195.
Mesquita, M.C., Alvim-Ferraz, M.C.M., Ferreira, I., Góis, J., 2004. A Influência dos Ventos Locais no Aumento das Concentrações de Ozono Superficial, 8ª Conferência Nacional de Ambiente, Lisbon.
Mohamed, Z., Bodger, P., 2005. Forecasting electricity consumption in New Zealand using economic and demographic variables. Energy 30 (10), 1833–1843.
Musselman, R., Massman, W., 1999. Ozone flux to vegetation and its relationship to plant response and ambient air quality standards. Atmospheric Environment 33, 65–73.
Samson, P., 1978. Nocturnal ozone maxima. Atmospheric Environment 12, 951–955.
Sanchez-Ortuno, M., Moore, N., Taillard, J., Valtat, C., Leger, D., Bioulac, B., Philip, P., 2006. Sleep duration and caffeine consumption in French middle-age working population. Sleep Medicine 6, 247–251.
Singh, G., 2006. Estimation of a mechanisation index and its impact on production and economic factors – a case study in India. Biosystems Engineering 93, 99–106.
Smith, C.M., Wachob, D.G., 2006. Trends associated with residential development in riparian breeding bird habitat along the Snake River in Jackson Hole, WY, USA: implications for conservation planning. Biological Conservation 128, 431–446.
Sousa, S.I.V., Martins, F.G., Pereira, M.C., Alvim-Ferraz, M.C.M., 2006. Prediction of ozone concentrations in Oporto city with statistical approaches. Chemosphere 64, 1141–1149.
Sousa, S.I.V., Martins, F.G., Alvim-Ferraz, M.C.M., Pereira, M.C., 2007. Multiple linear regression and artificial neural networks based on principal components to predict ozone concentrations. Environmental Modelling & Software 22, 97–103.
Steinberger, E., Ganor, E., 1980. High ozone concentrations at night in Jerusalem and Tel-Aviv. Atmospheric Environment 14, 221–225.
Turalıoğlu, F.S., Nuhoğlu, A., Bayraktar, H., 2006. Impacts of some meteorological parameters on SO2 and TSP concentrations in Erzurum, Turkey. Chemosphere 59, 1633–1642.
Wang, S., Xiao, F., 2004. AHU sensor fault diagnosis using principal component analysis method. Energy and Buildings 36 (2), 147–160.
Zhu, Y.M., Lu, X.X., Zhou, Y., 2007. Suspended sediment flux modeling with artificial neural network: An example of the Longchuanjiang River in the Upper Yangtze Catchment, China. Geomorphology 84 (1–2), 111–125.