Neighbourhood scale nitrogen dioxide land use regression modelling with regression kriging in an urban transportation corridor



Atmospheric Environment xxx (xxxx) xxx

Contents lists available at ScienceDirect

Atmospheric Environment journal homepage: http://www.elsevier.com/locate/atmosenv

Tuo Shi a,b,c, Nick Dirienzo d, Weeberb J. Requia e, Marianne Hatzopoulou f, Matthew D. Adams a,*

a Department of Geography, University of Toronto Mississauga, 3359 Mississauga Road, Mississauga, Ontario, L5L 1C6, Canada
b CAS Key Laboratory of Forest Ecology and Management, Institute of Applied Ecology, Chinese Academy of Sciences, No. 72 Wenhua Road, Shenyang, 110016, China
c College of Resources and Environment, University of Chinese Academy of Sciences, No. 19 Yuquan Road, Beijing, 100049, China
d Institute of Environmental Science, Carleton University, 1125 Colonel By Drive, Ottawa, Ontario, K1S 5B6, Canada
e Department of Environmental Health, School of Public Health, Harvard University, 677 Huntington Avenue, Boston, MA, 02115, United States
f Department of Civil and Mineral Engineering, University of Toronto, 35 St George Street, Toronto, Ontario, M5S 1A4, Canada

HIGHLIGHTS

• Land use regression modelling is demonstrated at a local scale.
• Nitrogen dioxide concentrations were measured in an urban transportation corridor.
• Spatial modelling relied on traffic-related variables.
• Regression kriging was able to improve the model’s predictive performance.

ARTICLE INFO

Keywords: Air pollution modelling; Land use regression; Regression kriging; Nitrogen dioxide

ABSTRACT

Land use regression (LUR) models associate observed air pollution concentrations with surrounding land use characteristics for air pollution modelling. The technique is common in urban landscapes and is typically applied at a city-wide spatial scale. Our study tested the applicability of LUR modelling at a local scale, defined as multiple air monitors within a neighbourhood. The study area was 15.4 km of an urban transportation corridor in Mississauga, Canada. Nitrogen dioxide (NO2) was sampled at 112 sites during the summer of 2018, and observations ranged from 5.8 ppb to 19.65 ppb. A linear regression LUR model explained 69% of the variation in NO2 concentrations at this local scale, with estimated prediction errors no greater than 1.61 ppb across three cross-validation methods. Traffic volume and major and minor road lengths were key determinants among the predictor variables, and park area and distance to the nearest major intersection were the only variables with negative coefficients in the local-scale model. Extending the linear model with regression kriging improved the model’s explanatory ability, with a coefficient of determination of 0.91; however, smaller improvements were observed during cross-validation. Leave-one-out cross-validation results for the linear LUR model (RMSE = 1.44 ppb, R² = 0.64) and the regression kriging LUR model (RMSE = 1.34 ppb, R² = 0.69) were similar. Model performance remained stable when 10-fold cross-validation was performed with the regression kriging LUR model (R² = 0.68, RMSE = 1.36 ppb). The predicted air pollution levels ranged from 4.5 ppb to 25.6 ppb. This study demonstrates that LUR modelling can perform well at a local scale in transportation-dominated urban environments.

* Corresponding author. E-mail address: [email protected] (M.D. Adams).
https://doi.org/10.1016/j.atmosenv.2019.117218
Received 18 August 2019; Received in revised form 5 December 2019; Accepted 9 December 2019
1352-2310/© 2019 Elsevier Ltd. All rights reserved.

1. Introduction

Exposure to urban ambient air pollution is associated with negative health effects that include impacts on the respiratory, cardiovascular and nervous systems (Kampa and Castanas, 2008; Lave and Seskin, 2013). In developed regions, levels of criteria air contaminants have declined because of industrial restructuring, technological change and pollution control; however, increased road traffic and congestion have resulted in hot spots of traffic-related air pollution (Goodman et al., 2009; Harris et al., 2015; Zhang and Batterman, 2013). In dense urban areas, traffic-related air pollution and residents’ exposure are closely linked because of the proximity of residences and workplaces to traffic (Nafstad et al., 2003; Weichenthal et al., 2015).

To understand urban air pollution exposure, land use regression (LUR) models have become a widely used method for spatial modelling of air pollution in a city (Eeftens et al., 2012; Marshall et al., 2008). LUR links surrounding land use conditions to observed air pollution concentrations with stochastic models using predictor variables obtained through geographic information systems (Hoek et al., 2008). This approach has shown stronger performance than dispersion modelling, remote sensing techniques and spatial interpolation for intra-city ambient air pollution studies (Marshall et al., 2008). The SAVIAH (Small Area Variations In Air quality and Health) study was the first to use LUR to model small scale variations in air pollution. It selected NO2 as a marker for traffic-related pollution and monitored it at 80 locations in four study areas in Europe to construct a LUR model (Briggs et al., 1997). The results showed that the model produced accurate predictions of monitored levels. LUR modelling has since become widely used (Beelen et al., 2013; Shi et al., 2016).

Although relevant studies all take LUR as the theoretical framework, the monitoring methods, predictors and statistical models have differed and have been continually improved (Kashima et al., 2018). Commonly, LUR models use a linear regression equation (Briggs et al., 1997). Model performance can be improved by combining LUR with kriging (Araki et al., 2015; Mercer et al., 2011). Kriging models estimate the values at unsampled locations by a weighted spatial averaging of nearby samples.
The correlations among neighbouring values are modelled as a function of the geographic distance between the points across the study area, defined by a variogram (Miller et al., 2007). Regression kriging (equivalent to universal kriging or kriging with external drift) is a hybrid method that combines regression models with kriging of the regression residuals. The kriged residual values are then added to the regression predictions to provide an improved estimate (Motaghian and Mohammadi, 2011). Some studies have demonstrated improved prediction performance by combining the LUR model with universal kriging or kriging with external drift (Mercer et al., 2011; Pearce et al., 2009).

Spatial and temporal scale are critical aspects of LUR models. They include the spatial resolution of estimates, the spatial density of air pollution observations (e.g. observations from regional regulatory monitoring vs. passive sampling vs. route mobile monitoring), the spatial extent of a model (e.g. city vs. country), the temporal resolution of outputs (e.g. hourly vs. seasonal), and the temporal period (e.g. single season vs. multiyear forecasting). LUR models have shown poor performance when applied to an independent data set (e.g. a new location or time) (Miskell et al., 2015; Molter et al., 2010; Mukerjee et al., 2012; Poplawski et al., 2009). Poplawski et al. (2009) and Allen et al. (2011) studied the spatial transferability of LUR models between different cities with similar weather and pollution regimes. The results indicated that although it was possible to transfer LUR models between geographically similar cities, better exposure assessments could be achieved by developing LUR models locally. Weissert et al. (2018) tested the within-city transferability of LUR models developed at different spatial scales (local scale and city scale), and found that within-city transferability was limited. As Molter et al. (2010) pointed out, most previous LUR studies did not include temporal variation because they were based on short-term monitoring campaigns and did not have historic pollution data. Mukerjee et al. (2012) also detected a strong summer/winter seasonal influence in the LUR models for Cleveland, Ohio.

In this paper, we explore one of the spatial scale aspects of LUR modelling: the use of LUR modelling for spatial interpolation at a local scale. This is important because LUR models can be easily applied at fine spatial resolutions (e.g. household or neighbourhood); however, little work has examined the performance of LUR modelling at these

spatial scales. We examine NO2 estimates at a neighbourhood scale using hybrid LUR modelling, including both linear regression models and regression kriging. We focused our research on a major urban transportation corridor in Mississauga, Canada. The paper examines whether LUR modelling is suitable at a local scale in a transportation-dominated environment.

2. Materials and methods

2.1. Study area

Mississauga is a city in the Canadian province of Ontario, adjacent to Toronto, which is Canada’s largest city and the fourth largest city in North America. It is situated on the shores of Lake Ontario in the Regional Municipality of Peel, with a population of 721,599 as of the 2016 census. Within Peel Region, Hurontario Street is a major urban thoroughfare through the cities of Mississauga and Brampton, and is one of the busiest transit corridors in the Greater Toronto Area (McCallion, 2004). Within Canada, the Region of Peel hosts the largest, most intense cluster of freight distribution and logistics industries. Toronto Pearson International Airport, located in the north of Mississauga, is Canada’s largest and busiest airport by passenger volume, with 45,440,557 passengers and movement of 443,153.2 tonnes of freight in 2017. In Fig. 1, we present the 15.4 km Hurontario Corridor within Mississauga.

2.2. Nitrogen dioxide concentration measurements

Passive two-sided Ogawa diffusion tubes were used to sample 7-day average NO2 concentrations at 112 sites around Hurontario Street during the summer of 2018. These samplers were selected because they are lightweight and easy to install in the urban environment, do not require a power source, and have shown strong performance when compared with EPA reference methods (Mukerjee et al., 2004).
Ogawa samplers collect NO2 on triethanolamine-coated filters, where it is converted to nitrite (Mukerjee et al., 2004; Poplawski et al., 2009). The filters are then extracted into water and the nitrite is analyzed using colorimetry of a diazo-coupling reaction. Samplers were attached to utility poles (e.g. light posts) about 3 m above the ground. White opaque rain shelters were used to reduce the chance of rain contacting the sample pads. Samplers faced north to reduce exposure to sunlight (reducing degradation of the collection media), and the geographical coordinates of each site were recorded with a Global Positioning System unit.

Sample locations were manually selected to reflect small scale changes in various types of urban land use around the Hurontario corridor. Sites were also chosen to represent the range of expected spatial variability of air pollutant concentrations (Weissert et al., 2018). The choice was constrained by the sampling locations available within the corridor and by our ability to acquire permission for each location.

Three 7-day sampling campaigns were conducted between July 10th and August 13th, 2018. In the first sampling period (July 10th to 16th), 37 samples were collected; in the second sampling period (July 24th to 30th), 37 samples were collected; and the final sampling campaign (August 7th to 13th) generated 38 samples. Between 5 and 8 duplicates were included in each sampling period, in addition to field blanks. Each sampling campaign also included a sampler that was collocated with a regulatory air pollution monitor in Mississauga that is operated by the provincial government.

2.3. Predictor variables

LUR predictor variables were selected based on those identified as significant in past LUR models (Ho et al., 2015; Kashima et al., 2018; Shi et al., 2016).
Four categories comprising 106 independent variables were generated to reflect pollutant emission intensity: (1) land use, (2) traffic, (3) population and (4) geographic information (Table 1). We analyzed the land use, population and most traffic


Fig. 1. NO2 sampling sites, land use and roads in the study area.

predictor variables by the coefficient of determination (R²). (2) All predictor variables with an R² larger than zero were included in the initial model. (3) Stepwise multiple linear regression was conducted to establish LUR models of NO2, which were ranked by the Akaike information criterion (AIC). In the R programming language, the step function was used to select the best-fit model: variables were repeatedly added and dropped during model development, and the candidate model with the minimum AIC was retained. (4) The significance level and variance inflation factor (VIF) of each predictor variable were checked to confirm the variables’ significance and to ensure there were no issues of multicollinearity. (5) The final model was identified and its predictive ability was evaluated using leave-one-out cross-validation (LOOCV), 10-fold cross-validation and 10-block cross-validation.

Following the linear regression model, we developed a regression kriging model. Regression kriging included the significant predictor variables retained from the linear regression model. The variogram was estimated using the autoKrige function from the “automap” package in R (Hiemstra et al., 2009). LOOCV, 10-fold cross-validation and 10-block cross-validation were applied to estimate prediction capacity at unobserved locations.
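Steps (3) and (4) above can be illustrated with a minimal sketch. It is written in Python rather than R, and it is not the code used in the study. It assumes that selection starts from the full initial model, that each iteration tries every single-variable drop or addition, and that AIC is computed as n·ln(RSS/n) + 2k, where k counts the fitted coefficients. The variable names (traffic, parks, noise) and the simulated data are hypothetical.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def rss(data, names, target):
    """Residual sum of squares of an OLS fit of `target` on `names` plus an intercept."""
    y = data[target]
    cols = [[1.0] * len(y)] + [data[v] for v in names]
    xtx = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(p * q for p, q in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(b * c[i] for b, c in zip(beta, cols)) for i in range(len(y))]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

def aic(data, names, target):
    """AIC up to an additive constant: n * ln(RSS / n) + 2 * (number of coefficients)."""
    n = len(data[target])
    return n * math.log(rss(data, names, target) / n) + 2 * (len(names) + 1)

def stepwise(data, candidates, target):
    """Start from the full model; repeatedly apply the single drop/add that lowers AIC most."""
    current = list(candidates)
    best = aic(data, current, target)
    while True:
        moves = [[v for v in current if v != d] for d in current]
        moves += [current + [a] for a in candidates if a not in current]
        scored = [(aic(data, m, target), m) for m in moves]
        if not scored or min(scored, key=lambda t: t[0])[0] >= best:
            return current, best
        best, current = min(scored, key=lambda t: t[0])

def vif(data, names):
    """VIF of each predictor: 1 / (1 - R²) from regressing it on the others (= TSS / RSS)."""
    out = {}
    for v in names:
        mean = sum(data[v]) / len(data[v])
        tss = sum((x - mean) ** 2 for x in data[v])
        out[v] = tss / rss(data, [o for o in names if o != v], v)
    return out

# Hypothetical, simulated data: two informative predictors and one pure-noise predictor.
rng = random.Random(1)
n = 40
data = {name: [rng.uniform(0, 10) for _ in range(n)] for name in ("traffic", "parks", "noise")}
data["no2"] = [8 + 0.8 * t - 0.5 * p + rng.gauss(0, 0.5)
               for t, p in zip(data["traffic"], data["parks"])]

selected, score = stepwise(data, ["traffic", "parks", "noise"], "no2")
vifs = vif(data, selected)
print(selected, round(score, 2), {k: round(v, 2) for k, v in vifs.items()})
```

The significance check on individual coefficients described in step (4) is not reproduced here; a statistics library would normally supply the t tests.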

parameters using eight buffer sizes ranging from 1 m to 500 m, and three traffic parameters using distance to the object. Longitude, latitude and elevation were determined for each point. We calculated the distance to the airport and to Lake Ontario from the sampling points. The wind direction of each sampling point relative to Hurontario Street was calculated and manually interpreted to determine whether the location was upwind or downwind of Hurontario Street based on the primary wind direction.

The land use and roadway data were obtained from DMTI Spatial Inc., a commercial GIS data provider. The elevation data were obtained from a 0.75 arc second (approximately 20 m) DEM from the Open Government Portal of Canada. Population density data were obtained from the 2016 Census Profiles at the dissemination area level of geography. The wind direction data were obtained from historical hourly weather and climate data for Toronto Pearson International Airport, which is 5 km from Hurontario Street.

The traffic volume data were estimated from an interpolated Annual Average Daily Traffic (AADT) surface. The sources of AADT included monitoring stations governed by the Ontario Ministry of Transportation and the Transportation Departments of the Region of Peel and the City of Mississauga. The obtained data reflect the traffic volume at monitored locations only. Inverse distance weighted interpolation was used to estimate a traffic volume surface along Hurontario Street (distance exponent of 2, search radius of 12 points), and the traffic volume was then estimated for the unmonitored road segments.
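The inverse distance weighted estimate described above can be sketched as follows. This is an illustration rather than the GIS routine used in the study: it assumes that the "search radius of 12 points" means the 12 nearest monitored locations are used, and the function name, coordinates and AADT values are hypothetical.

```python
import math

def idw(stations, target, power=2.0, k=12):
    """Inverse distance weighted estimate at `target`.

    stations: list of ((x, y), value) pairs for monitored locations.
    power:    distance exponent (2 in the text).
    k:        number of nearest stations used (assumed meaning of "12 points").
    """
    nearest = sorted(stations, key=lambda s: math.dist(s[0], target))[:k]
    num = den = 0.0
    for xy, value in nearest:
        d = math.dist(xy, target)
        if d == 0.0:
            return value  # target coincides with a monitored location
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

# Hypothetical AADT counts (vehicles per day) at monitored locations, coordinates in metres.
counts = [((0.0, 0.0), 10000.0), ((2000.0, 0.0), 20000.0), ((1000.0, 1500.0), 16000.0)]
print(idw(counts, (500.0, 0.0)))
```

With an exponent of 2, a station twice as far away receives one quarter of the weight, so the estimate at an unmonitored segment is dominated by its closest counts.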

2.5. Cross-validation & spatial autocorrelation

The prediction capacity of the two methods was evaluated by leave-one-out cross-validation (LOOCV) and 10-fold cross-validation. LOOCV uses a single observation from the original sample as the validation data, and the remaining observations as the training data. This is repeated such that each observation in the sample is used once as the validation data. From the cross-validation, the R² and root-mean-squared error (RMSE) were reported to evaluate and compare the predictive abilities

2.4. Land use regression modelling

Linear regression models were developed using a modified stepwise selection procedure (Shi et al., 2017), which included five steps. (1) We evaluated the relationship between each predictor variable and measured NO2 concentrations using a linear regression model and ranked the


Table 1
Description of predictor variables in the model.

No.  Predictor variables                                           Abbreviation    Unit         Buffer size (radius of buffer in metres)
Land use data
1    Commercial land use                                           COMMERCIAL      m²           1, 5, 10, 25, 50, 100, 200, 500
2    Government and institutional land use                         GOVERNMENT      m²           1, 5, 10, 25, 50, 100, 200, 500
3    Open area                                                     OPEN            m²           1, 5, 10, 25, 50, 100, 200, 500
4    Parks and recreational land use                               PARKS           m²           1, 5, 10, 25, 50, 100, 200, 500
5    Residential area                                              RESIDENTIAL     m²           1, 5, 10, 25, 50, 100, 200, 500
6    Resources and industrial land use                             RESOURCE        m²           1, 5, 10, 25, 50, 100, 200, 500
7    Waterbody                                                     WATERBODY       m²           1, 5, 10, 25, 50, 100, 200, 500
Traffic information
8    Minor road length                                             Minor           m            1, 5, 10, 25, 50, 100, 200, 500
9    Major road length                                             Major           m            1, 5, 10, 25, 50, 100, 200, 500
10   Highway (regional road/provincial freeway/transitway) length  Highway         m            1, 5, 10, 25, 50, 100, 200, 500
11   Distance to the nearest road                                  D_Road          m            NA
12   Distance to the nearest major road                            D_Major         m            NA
13   Distance to the nearest major intersection                    D_Intersection  m            NA
14   Traffic volume on major road                                  Tra_Vol         veh·day⁻¹    1, 5, 10, 25, 50, 100, 200, 500
Population
15   Population density                                            Pop_Den         counts·km⁻²  1, 5, 10, 25, 50, 100, 200, 500
Geographic information
16   X-Coordinate                                                  Point_X         degree       NA
17   Y-Coordinate                                                  Point_Y         degree       NA
18   X-Coordinate multiplied by Y-Coordinate                       X_Y             degree²      NA
19   Elevation                                                     Elevation       m            NA
20   Distance to Pearson Airport                                   D_Airport       m            NA
21   Distance to Lake Ontario                                      D_Lake          m            NA
22   Wind direction of Hurontario St.                              Wind_Dir        NA           NA

2.6. Pollution surface mapping

A convex hull surrounding all sampling locations was generated and used as the boundary for spatial estimation using the LUR models. Prediction points were generated on a 10-m grid within the convex hull for a total of 233,001 points. The land use parameters included in both the linear regression and regression kriging models were calculated for each grid cell centre (point). Using the land use data, air pollution concentrations were estimated for each point of the 10-m grid.

2.7. Sensitivity analysis

To determine if our model’s performance was sensitive to combining the three sampling periods, we completed two analyses. (1) Using data from the regional monitoring network, operated by the Ontario Ministry of the Environment, Conservation and Parks, we calculated the mean value for each week from the 17 available monitors within 75 km of our study area. The three sets of mean values were compared using an ANOVA to test if there were significant differences between the three periods. We used the wider region because only one regulatory monitor operates in Mississauga; it is located on the University of Toronto Mississauga Campus and is not reflective of all land use conditions in our study area. (2) We included a dummy variable for sampling week in our model and tested its statistical significance.

3. Results

3.1. Field sampling descriptive statistics

Sampler measurements ranged from 5.80 to 19.65 ppb, with sampling week mean and standard deviation (SD) values of 11.87 ppb (SD = 3.17 ppb), 8.89 ppb (SD = 1.27 ppb), and 9.55 ppb (SD = 1.32 ppb) for periods one to three, respectively. Across sampling campaigns one to three, 17 duplicate pairs of samplers were installed, and we found an average difference of 1.18 ppb between duplicates. During each sampling period we collocated an air pollution sampler with the regulatory monitor. The sampler was installed on an adjacent light post (to replicate our sampling protocol), which was about 20 m from the monitoring enclosure. The collocated samples had a mean bias of 0.3 ppb and a mean absolute error of 1 ppb when compared with the regulatory monitors operated by the Ontario Ministry of the Environment, Conservation and Parks. For the regional monitoring network (17 sites within 75 km), the regional mean values were 7 ppb, 6 ppb, and 7 ppb for sampling periods one to three.
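The comparison of regional weekly means across the three sampling periods rests on a one-way ANOVA, preceded by a test of equal variances. The sketch below shows how the two F statistics could be computed. It is an illustration under stated assumptions, not the study's code: Levene's test is implemented in its classical form (an ANOVA on absolute deviations from each group mean), the function names are hypothetical, and the three groups are made-up numbers rather than monitoring data. P values are omitted because they require the F distribution, which a statistics library would normally supply.

```python
def f_statistic(groups):
    """One-way ANOVA F statistic and its degrees of freedom (between, within)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k)), (k - 1, n - k)

def levene_statistic(groups):
    """Classical Levene statistic: ANOVA on absolute deviations from each group mean."""
    deviations = [[abs(v - sum(g) / len(g)) for v in g] for g in groups]
    return f_statistic(deviations)

# Made-up weekly mean NO2 values (ppb) at a few monitors, one list per sampling period.
period_1 = [7.2, 6.5, 8.1, 6.9, 7.4]
period_2 = [6.1, 5.8, 6.7, 6.0, 6.4]
period_3 = [7.0, 6.6, 7.5, 6.8, 7.3]

print(levene_statistic([period_1, period_2, period_3]))
print(f_statistic([period_1, period_2, period_3]))
```

A small Levene statistic supports the equal-variance assumption behind the ANOVA; the ANOVA F statistic is then compared against the F distribution with the reported degrees of freedom.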

and prediction errors of the methods (Hoek et al., 2008; Ryan and LeMasters, 2007). 10-fold cross-validation separated the data into 10 folds, and the same performance metrics were evaluated as in the LOOCV.

In addition, h-block cross-validation was used to verify the prediction capacity of the model. The h-block CV technique removes spatial blocks or subseries of observations to address problems of dependent random variables (Burman et al., 1994). In our study, the average distance between sampling points was 5.09 km and the minimum distance was 0.11 km (Appendix A). h-block CV removes the h nearest monitoring stations relative to a sampling location from the training set in the corresponding iteration (hence LOOCV is a special case, equivalent to 0-block cross-validation). In the study, integers from 0 to 20 were selected as the h-block size, and the R² and RMSE of the 10-block CV were reported.

Both the LUR and regression kriging methods rely on errors being randomly distributed. Our spatial data have the potential for spatial autocorrelation among the residuals, which occurs when the data are either spatially lagged, or a potential predictor variable is missing from the model (spatial error). To test if errors were spatially correlated, we calculated Moran’s I on the models’ residuals.
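The h-block procedure described above can be sketched in a few lines. The example is a simplified illustration, not the study's implementation: it uses a single-predictor least-squares line in place of the full LUR model, it reports R² as the squared correlation between predictions and observations (an assumption, since the text does not state which definition was used), and the site coordinates, traffic values and NO2 values are hypothetical.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1

def h_block_cv(coords, x, y, h):
    """Hold out each site in turn and also drop its h nearest neighbours from training.

    h = 0 reduces to leave-one-out cross-validation. Returns (RMSE, R²).
    """
    n = len(y)
    preds = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(coords[i], coords[j]))
        train = others[h:]  # remove the h sites closest to the held-out site
        b0, b1 = fit_line([x[j] for j in train], [y[j] for j in train])
        preds.append(b0 + b1 * x[i])
    rmse = math.sqrt(sum((p - o) ** 2 for p, o in zip(preds, y)) / n)
    mp, mo = sum(preds) / n, sum(y) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(preds, y))
    r2 = cov ** 2 / (sum((p - mp) ** 2 for p in preds) * sum((o - mo) ** 2 for o in y))
    return rmse, r2

# Hypothetical sites: coordinates in metres, traffic in thousand vehicles per day, NO2 in ppb.
coords = [(0, 0), (100, 0), (200, 50), (400, 100), (650, 120), (900, 300), (1200, 350), (1500, 500)]
traffic = [5.0, 9.0, 7.0, 12.0, 15.0, 11.0, 18.0, 20.0]
no2 = [7.9, 9.8, 9.1, 11.2, 12.9, 10.7, 14.1, 15.3]

for h in (0, 2, 4):
    print(h, h_block_cv(coords, traffic, no2, h))
```

Increasing h forces every prediction to rely on more distant sites, which is why the procedure probes how much a model depends on nearby observations.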

3.2. LUR models

The linear regression model included six statistically significant variables following the variable selection process: (1) traffic volume within 200 m (Tra_vol_200), (2) major road length within 50 m (Major_50), (3) government land use within 500 m (Government_500), (4) distance to the nearest major intersection (D_Intersection), (5) minor road length within 100 m (Minor_100) and (6) park area within 500 m (PARKS_500). The model’s R² was 0.69 and the coefficient values are presented in Table 2. Based on the partial R² values, Tra_vol_200, Major_50 and Minor_100 explained the most variation in the model, while PARKS_500 and D_Intersection were the variables with negative coefficients. Moran’s I was 0.560, indicating the residuals do not display a spatial pattern. The LOOCV R² of the LUR model was 0.64 and the LOOCV RMSE was 1.44 ppb; the 10-fold CV R² was 0.63 and the 10-fold CV RMSE was 1.45 ppb; and the 10-block CV R² was 0.55 and the 10-block RMSE was 1.61 ppb.

Table 2
Summary of the final resultant LUR model of NO2.

Variable         Coefficient    Std. error    t value   P(>|t|)        VIF    Partial R²
Intercept        8.09           4.99 × 10⁻¹   16.22     <2 × 10⁻¹⁶
Tra_vol_200      1.91 × 10⁻⁶    3.88 × 10⁻⁷   4.93      3.26 × 10⁻⁶    1.73   0.19
Major_50         1.02 × 10⁻²    2.18 × 10⁻³   4.65      6.97 × 10⁻⁶    2.44   0.18
Government_500   1.99           9.95 × 10⁻¹   2.00      4.80 × 10⁻²    1.23   0.04
D_Intersection   -2.63 × 10⁻³   8.70 × 10⁻⁴   -3.02     3.20 × 10⁻³    1.94   0.08
Minor_100        4.55 × 10⁻³    1.11 × 10⁻³   4.12      7.84 × 10⁻⁵    2.10   0.14
PARKS_500        -9.07          2.45          -3.71     3.46 × 10⁻⁴    1.42   0.12

R² = 0.69; Adj. R² = 0.67. LOOCV R² = 0.64; LOOCV RMSE = 1.44 ppb. 10-fold CV R² = 0.63; 10-fold CV RMSE = 1.45 ppb. 10-block CV R² = 0.55; 10-block RMSE = 1.61 ppb.

Fig. 2. Cross-validation R² of different h-block sizes.

The regression kriging model included the significant predictors from the linear model. The variogram was fitted with a circular function, with a nugget of 0.99, a sill of 1.90 and a range of 646. The R² value for the regression kriging model was 0.91. The results of the three cross-validation methods for the regression kriging model, ordinary kriging model and LUR are summarized in Table 3. The cross-validation R² for different h-block sizes is displayed in Fig. 2. In Fig. 3, we present a scatter plot of the predicted versus actual values for both the linear regression model and the regression kriging model. Predicted values for the air pollution surface from the linear regression ranged between 3.9 and 25.2 ppb, the regression kriging approach predicted values between 4.5 and 25.6 ppb, and the standard error ranged from 0 to 3.3 ppb. The exposure surface and standard error surface are presented in Fig. 4. The differences between the two surfaces, calculated as regression kriging minus linear regression, ranged from 1.9 to 4.9 ppb. A surface of the differences is also included in Fig. 4.
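The regression kriging prediction reported above can be illustrated with a compact sketch: a linear trend is fitted, its residuals are interpolated by ordinary kriging, and the two parts are added. This is not the autoKrige workflow used in the study. It assumes a single hypothetical predictor, treats the reported sill of 1.90 as the total sill (nugget plus partial sill), takes the range to be in the same units as the coordinates, and uses the circular variogram form commonly implemented in geostatistical software. The site data are made up.

```python
import math

# Variogram parameters reported in the text; SILL is assumed to be the total sill.
NUGGET, SILL, RANGE = 0.99, 1.90, 646.0

def circular_variogram(h):
    """Circular variogram model with nugget; equals the sill at and beyond the range."""
    if h == 0.0:
        return 0.0
    if h >= RANGE:
        return SILL
    r = h / RANGE
    shape = (2.0 / math.pi) * (r * math.sqrt(1.0 - r * r) + math.asin(r))
    return NUGGET + (SILL - NUGGET) * shape

def solve(a, b):
    """Solve the linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def fit_trend(x, y):
    """Least-squares line y = b0 + b1 * x; stands in for the multi-variable LUR model."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1

def krige_residual(coords, resid, target):
    """Ordinary kriging of the residuals at `target` (weights constrained to sum to 1)."""
    n = len(coords)
    a = [[circular_variogram(math.dist(coords[i], coords[j])) for j in range(n)] + [1.0]
         for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [circular_variogram(math.dist(c, target)) for c in coords] + [1.0]
    weights = solve(a, b)[:n]
    return sum(w * r for w, r in zip(weights, resid))

def rk_predict(coords, x, y, target, x_target):
    """Regression kriging: trend prediction plus kriged trend residual."""
    b0, b1 = fit_trend(x, y)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    return b0 + b1 * x_target + krige_residual(coords, resid, target)

# Hypothetical sites: coordinates in metres, traffic in vehicles per day, NO2 in ppb.
coords = [(0.0, 0.0), (300.0, 0.0), (0.0, 400.0), (500.0, 500.0), (800.0, 200.0)]
traffic = [12000.0, 30000.0, 8000.0, 22000.0, 15000.0]
no2 = [9.1, 13.4, 7.8, 11.9, 10.2]

print(rk_predict(coords, traffic, no2, (250.0, 250.0), 18000.0))
```

In this formulation the kriged residual reproduces the observed residual at a sampled site, so the in-sample fit is closer than that of the trend alone, while predictions far from any site fall back towards the regression trend. This matches the pattern in the results, where regression kriging improves the fitted R² more than the cross-validated R².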

Fig. 3. Scatter plots of NO2 predictions from regression kriging (RK) and land use regression (LUR) and observations in the study area. The dotted line is the 1:1 line.

3.3. Sensitivity analysis

To examine whether the three sampling campaigns occurred under different regional conditions, a one-way analysis of variance (ANOVA) was carried out. Levene’s test of equality of variances identified that the group variances were not significantly different (F = 0.58, P = 0.57). The ANOVA showed that there were no statistically significant differences in NO2 concentrations between the three sampling weeks (F = 1.53, P = 0.23). When the sampling periods were added as factors to the final linear regression model, the periods in the fitted model were not significant (P > 0.1), which indicates that our combined use of the three sampling campaigns should not have a significant effect on the analysis.

4. Discussion

The modelled air pollution surface identified hotspots of NO2 in areas with a dense road network and heavy traffic volumes, such as road intersections. In addition, transportation variables, such as traffic volume and major and minor road length, all had stronger effects on the NO2 concentration than the other variables at the local scale. Although the study areas, sampling methods and scales differ, traffic as the dominant explanatory variable is consistent with other LUR models (Shi et al., 2016; Weissert et al., 2018). In urban environments, traffic-related fossil fuel combustion, especially from diesel-fueled vehicles, contributes substantially to ambient NO2 concentration (Mavroidis and Chaloulakou, 2011). In addition, park area has a negative coefficient, which supports the importance of urban green space in mitigating air pollution (Abhijith et al., 2017; Selmi et al., 2016). Trees remove gaseous air pollution primarily by uptake via leaf stomata, though some gases are removed by the plant surface. Once inside the leaf, gases diffuse into intercellular spaces and may be absorbed by water films to form acids or react with inner-leaf surfaces (Chaparro-Suarez et al., 2011; Nowak et al., 2006). Epidemiological studies have provided evidence that long-term NO2 exposure may decrease lung function and increase the risk of respiratory

Table 3
Land use regression, ordinary kriging and regression kriging models cross-validation summary.

Cross-validation   Model                R²     RMSE
LOOCV              LUR                  0.64   1.44 ppb
                   Ordinary kriging     0.53   1.66 ppb
                   Regression kriging   0.69   1.34 ppb
10-fold CV         LUR                  0.63   1.45 ppb
                   Ordinary kriging     0.52   1.67 ppb
                   Regression kriging   0.68   1.36 ppb
10-block CV        LUR                  0.55   1.61 ppb
                   Ordinary kriging     0.05   2.45 ppb
                   Regression kriging   0.45   1.78 ppb


Fig. 4. Air pollution surface generated using regression kriging (RK) (A), RK standard error (B) and the differences of pollution surfaces generated by RK and LUR (C) (resolution is 10 m).

to reduce the prediction R2 but it stabilized around an h of 5 with no further reduction. Our interpretation is based on how the models are formulated. The kriging methods rely on distance weighted averages of surrounding observations to estimate a value at location. As the neigh­ bours increase in distance it is natural for the prediction ability to decrease. However, the LUR model can utilize the relationships between land use and pollution across the entire study area to retain most of its predictive power. These results suggest that LUR models may perform better in areas of few or no observations. However, all models appear to perform better as the number of local observations increases. In our study, we focused on NO2 air pollution because it is a trans­ portation related air pollutant with identified health effects. In epide­ miological studies, however, there are many other pollutants of concern. Adams and Kanaroglou (2015) and Liu et al. (2016) respectively used mobile monitoring data and stationary monitoring data to develop LUR models of PM2.5 and NO2 at a city scale. Their results showed that although the urban elements were the same, the explanatory ability of variables and the predictive ability of models varied due to the differ­ ences in the properties of pollutants. We are not sure how well LUR models will function at a local scale for other pollutants, or in envi­ ronments with multiple air pollution sources. The ability to model spatial change in air pollution at a small scale is important for measuring the impact of policy interventions. Hurontario Street will be undergoing the development of a light rail transit system. Our small-scale spatial model can be used to model baseline conditions prior to the construction and operation of the light rail line and be used to quantify changes in future air pollution conditions. The measurement of policy intervention is common in air pollution research. Zhou et al. 
(2010) reported that after implementation of temporary transportation control measures during the 2008 Beijing Olympics, urban traffic emissions of multiple pollutants were significantly reduced during the event. Titos et al. (2015) analyzed the effect of different transportation

symptoms (World Health Organization, 2003). Moreover, NO2, an accepted marker of traffic related air pollution, is related to individuals’ living environment (Wang et al., 2016). LUR modelling has frequently been used to estimate air pollution exposure in epidemiological studies. Most commonly ambient concentration measurements from central-site monitors are used as the exposure estimates in epidemiologic studies. However, the approach may lack the spatial and temporal resolution required to capture exposure variability, thus resulting in exposure € prediction errors (Baxter et al., 2013; Ozkaynak et al., 2013). Few studies have applied LUR at a local scale to predict NO2 concentrations, where we define local as multiple monitors within a neighbourhood. Miskell et al. (2015) developed a LUR model at a local scale and city scale within a central business district in Auckland, New Zealand, whereby using seven urban design variables explained 62% of the observed variability in the NO2 measurements. A transportation related variable was the strongest predictor in the model, which was the number of lanes. Weissert et al. (2018) developed a microscale LUR model for a heavily trafficked section of road in Auckland, New Zealand, which explained 66% of the variability in NO2 concentrations, and predictor variables reflecting traffic information, such as street width, distance to major road and number of bus stops were important determinants at the local scale. Therefore, it is reasonable to believe that, although pollutant sampling methods and study areas were different and predictor vari­ ables varied, the LUR model has significant applicability for predicting NO2 at a local urban scale, but such studies at the local scale are few. 
In the cross-validation results of the different modelling methods, the 10-block CV indicated that the LUR model may be the most robust to spatial correlation, because it retained its predictive power best when the spatial neighbours were removed. We assessed how different h-block sizes affected the prediction ability for hold-out samples (Fig. 2). The regression kriging model demonstrated a continued reduction in R2 values as h was increased; however, as h increased for the LUR model it began


5. Conclusions

regulations on air quality in two similarly sized cities, Granada (Spain) and Ljubljana (Slovenia). In both study areas, significant reductions of black carbon and particulate matter concentrations were observed after the restrictions were implemented. Depending on the scale of the intervention, researchers will require models capable of capturing the change for the appropriate spatial unit. A limitation of this study was the selection of monitoring locations. Each sampling location was selected in the field and monitors were installed on light posts. Our approach was to begin selecting sites at an intersection and to move outwards at increasing distances from the initial location. As locations were selected, we visually assessed their spatial distribution in the field using GIS and GPS and included additional sites in land use types or areas that had not yet been selected. We were unable to select sites prior to being in the field, because we had to ensure that each light post was free from overhead wires, street signs and streetlights, details that were not included in the streetlight database we received from the municipal government. One concern was that our locations were duplicated samples, such that they may have been located too close together to show any variation between locations. To evaluate this, we examined the variogram from the kriging method and noted that even at the closest distances the semi-variance values were not zero, which indicates variation in concentrations at the small distances we compared. The variogram is included in Appendix B. A second limitation of our study is that we did not evaluate the spatial transferability of our LUR model within the city. We have demonstrated the ability for local modelling, but these intensive sampling campaigns likely suffer from transferability issues.
For example, our model is calibrated for a transportation-dominated environment and may not perform well in non-transportation environments within the city. Spatial transferability at a local scale should be investigated in future work. Past studies have identified temporally nonstationary predictors whose explanatory power differs between seasons (Ghassoun et al., 2015; Mukerjee et al., 2012), so it is unclear whether the same modelling effectiveness will occur in other seasons. The LUR model’s predictive power appeared strong based on cross-validation; however, it is unclear whether additional local land use variables may help improve model performance. For example, Tang et al. (2013) and Shi et al. (2016) concluded that LUR model performance could be enhanced by integrating urban morphological factors, such as building heights and street configuration.

In this study, we assessed the performance of LUR modelling at the local scale, which was represented as a major urban transportation corridor in Mississauga, Canada. We showed that NO2 concentrations can be modelled at this scale using linear regression. LUR modelling demonstrated a good performance (R2 = 0.69, Adj. R2 = 0.67; LOOCV R2 = 0.64, LOOCV RMSE = 1.44 ppb; 10-fold CV R2 = 0.63, 10-fold CV RMSE = 1.45 ppb), while regression kriging was able to improve predictive performance (R2 = 0.91, LOOCV R2 = 0.69, RMSE = 1.34 ppb; 10-fold CV R2 = 0.68, RMSE = 1.36 ppb). According to the final LUR model, variables capturing traffic information, such as traffic volume and road length, played dominant roles in explaining the NO2 concentration changes in the study, and urban park and distance to the nearest major intersection were the variables with negative coefficients. This study demonstrated the ability of LUR modelling at a local scale.

Author contributions

Tuo Shi: Formal analysis, Writing - Original Draft, Writing - Reviewing and Editing. Nick Dirienzo: Methodology, Investigation, Writing - Original Draft, Writing - Reviewing and Editing. Weeberb J. Requia: Methodology, Writing - Reviewing and Editing. Marianne Hatzopoulou: Conceptualization, Methodology, Writing - Reviewing and Editing, Funding acquisition. Matthew D. Adams: Conceptualization, Methodology, Resources, Writing - Reviewing and Editing, Funding acquisition.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

The research was supported by the Natural Sciences and Engineering Research Council of Canada [RGPIN-2018-04845] and University of Toronto XSeed Grant. Tuo Shi thanks the China Scholarship Council (No. [2018]3101) for fellowship support.

Appendix C. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.atmosenv.2019.117218.

Appendix A. Distance between Neighbour stations in different h-block size
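The h-block cross-validation discussed in the text can be illustrated with a minimal sketch. It is not the authors' implementation: ordinary least squares stands in for the LUR model, the data are synthetic, and the names `h_block_cv`, `coords`, `X` and `y` are hypothetical. For each held-out site, every training site within distance h is also dropped before the model is refitted, so the count of remaining training sites shows how many neighbours each block size removes.

```python
import numpy as np


def h_block_cv(coords, X, y, h):
    """Leave-one-out CV that also drops training sites within distance h.

    coords : (n, 2) site coordinates (assumed projected, e.g. metres)
    X      : (n, p) predictor matrix (hypothetical LUR predictors)
    y      : (n,) observed concentrations
    h      : block radius, in the same units as coords
    Returns (r2, rmse, n_train); n_train[i] is the number of sites
    used to fit the model that predicts site i.
    """
    n = len(y)
    # Pairwise Euclidean distances between all sites.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    design = np.column_stack([np.ones(n), X])  # add an intercept column
    preds = np.empty(n)
    n_train = np.empty(n, dtype=int)
    for i in range(n):
        # Strict inequality removes site i itself (distance 0) and
        # every neighbour within h; h = 0 reduces to ordinary LOOCV
        # as long as no two sites share the same coordinates.
        train = dist[i] > h
        beta, *_ = np.linalg.lstsq(design[train], y[train], rcond=None)
        preds[i] = design[i] @ beta
        n_train[i] = int(train.sum())
    rmse = float(np.sqrt(np.mean((y - preds) ** 2)))
    # R2 is taken here as the squared Pearson correlation between
    # observed and predicted values (one of several possible choices).
    r2 = float(np.corrcoef(y, preds)[0, 1] ** 2)
    return r2, rmse, n_train


# Synthetic demonstration data (not the study's measurements).
rng = np.random.default_rng(0)
coords = rng.uniform(0.0, 1000.0, size=(60, 2))
X = rng.normal(size=(60, 2))
y = 10.0 + 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=60)

for h in (0.0, 100.0, 200.0):
    r2, rmse, n_train = h_block_cv(coords, X, y, h)
    print(f"h={h:5.0f}  R2={r2:.2f}  RMSE={rmse:.2f}  "
          f"mean training sites={n_train.mean():.1f}")
```

Plotting R2 against h for each candidate model would give a comparison of the kind the text describes. For regression kriging, the refit inside the loop would also need to re-estimate the residual variogram, which this sketch omits.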


Appendix B. Experimental Variogram from Kriging Model
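As an illustration of how an experimental variogram of this kind can be computed, the following is a minimal sketch under stated assumptions. It is not the authors' code: the function name `empirical_semivariogram`, the bin edges and the toy values are hypothetical, and in a regression kriging setting the input values would typically be the regression residuals rather than raw concentrations. For each distance bin, the semi-variance is half the mean squared difference between all pairs of sites whose separation falls in that bin.

```python
import numpy as np


def empirical_semivariogram(coords, values, bin_edges):
    """Experimental semivariogram from point observations.

    coords    : (n, d) site coordinates
    values    : (n,) observed values (or regression residuals)
    bin_edges : increasing distance bin edges, same units as coords
    Returns (lags, gammas, counts) for the non-empty bins, where
    gamma = sum((z_i - z_j)^2) / (2 * N) over the N pairs in a bin.
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)  # each unordered pair once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diff = (values[i] - values[j]) ** 2
    lags, gammas, counts = [], [], []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (dist >= lo) & (dist < hi)
        if in_bin.any():
            lags.append(dist[in_bin].mean())          # mean pair distance
            gammas.append(0.5 * sq_diff[in_bin].mean())
            counts.append(int(in_bin.sum()))
    return np.array(lags), np.array(gammas), np.array(counts)


# Toy example: three collinear sites one distance unit apart.
lags, gammas, counts = empirical_semivariogram(
    coords=[[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]],
    values=[1.0, 3.0, 6.0],
    bin_edges=[0.0, 1.5, 2.5],
)
for lag, gamma, count in zip(lags, gammas, counts):
    print(f"lag={lag:.1f}  semi-variance={gamma:.2f}  pairs={count}")
```

A semi-variance clearly above zero in the shortest bin is the pattern the discussion uses to argue that closely spaced monitors were not duplicated samples.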

References

Ghassoun, Y., Ruths, M., Lowner, M.O., Weber, S., 2015. Intra-urban variation of ultrafine particles as evaluated by process related land use and pollutant driven regression modelling. Sci. Total Environ. 536, 150–160. https://doi.org/10.1016/j.scitotenv.2015.07.051.
Goodman, P.G., Rich, D.Q., Zeka, A., Clancy, L., Dockery, D.W., 2009. Effect of air pollution controls on black smoke and sulfur dioxide concentrations across Ireland. J. Air Waste Manag. Assoc. 59, 207–213. https://doi.org/10.3155/10473289.59.2.207.
Harris, M., Beck, M., Gerasimchuk, I., 2015. The End of Coal: Ontario’s Coal Phase-Out. International Institute for Sustainable Development.
Hiemstra, P.H., Pebesma, E.J., Twenhöfel, C.J., Heuvelink, G.B., 2009. Real-time automatic interpolation of ambient gamma dose rates from the Dutch radioactivity monitoring network. Comput. Geosci. 35, 1711–1721. https://doi.org/10.1016/j.cageo.2008.10.011.
Ho, C.C., Chan, C.C., Cho, C.W., Lin, H.I., Lee, J.H., Wu, C.F., 2015. Land use regression modeling with vertical distribution measurements for fine particulate matter and elements in an urban area. Atmos. Environ. 104, 256–263. https://doi.org/10.1016/j.atmosenv.2015.01.024.
Hoek, G., Beelen, R., de Hoogh, K., Vienneau, D., Gulliver, J., Fischer, P., Briggs, D., 2008. A review of land-use regression models to assess spatial variation of outdoor air pollution. Atmos. Environ. 42, 7561–7578. https://doi.org/10.1016/j.atmosenv.2008.05.057.
Kampa, M., Castanas, E., 2008. Human health effects of air pollution. Environ. Pollut. 151, 362–367. https://doi.org/10.1016/j.envpol.2007.06.012.
Kashima, S., Yorifuji, T., Sawada, N., Nakaya, T., Eboshida, A., 2018. Comparison of land use regression models for NO2 based on routine and campaign monitoring data from an urban area of Japan. Sci. Total Environ. 631–632, 1029–1037. https://doi.org/10.1016/j.scitotenv.2018.02.334.
Lave, L.B., Seskin, E.P., 2013. Air Pollution and Human Health. RFF Press. https://doi.org/10.4324/9781315064451.
Liu, C., Henderson, B.H., Wang, D.F., Yang, X.Y., Peng, Z.R., 2016. A land use regression application into assessing spatial variation of intra-urban fine particulate matter (PM2.5) and nitrogen dioxide (NO2) concentrations in City of Shanghai, China. Sci. Total Environ. 565, 607–615. https://doi.org/10.1016/j.scitotenv.2016.03.189.
Marshall, J.D., Nethery, E., Brauer, M., 2008. Within-urban variability in ambient air pollution: comparison of estimation methods. Atmos. Environ. 42, 1359–1369. https://doi.org/10.1016/j.atmosenv.2007.08.012.
Mavroidis, I., Chaloulakou, A., 2011. Long-term trends of primary and secondary NO2 production in the Athens area. Variation of the NO2/NOx ratio. Atmos. Environ. 45, 6872–6879. https://doi.org/10.1016/j.atmosenv.2010.11.006.
McCallion, H., 2004. Building on success in Mississauga, Ontario. Ekistics 71, 135–137. http://www.mississauga.ca/file/COM/September_1st_Strategy_Launch_Presentation.pdf.
Mercer, L.D., Szpiro, A.A., Sheppard, L., Lindström, J., Adar, S.D., Allen, R.W., Avol, E.L., Oron, A.P., Larson, T., Liu, L.J.S., Kaufman, J.D., 2011. Comparing universal kriging and land-use regression for predicting concentrations of gaseous oxides of nitrogen (NOx) for the Multi-Ethnic Study of Atherosclerosis and Air Pollution (MESA Air). Atmos. Environ. 45, 4412–4420. https://doi.org/10.1016/j.atmosenv.2011.05.043.
Miller, J., Franklin, J., Aspinall, R., 2007. Incorporating spatial dependence in predictive vegetation models. Ecol. Model. 202, 225–242. https://doi.org/10.1016/j.ecolmodel.2006.12.012.

Abhijith, K.V., Kumar, P., Gallagher, J., McNabola, A., Baldauf, R., Pilla, F., Broderick, B., Di Sabatino, S., Pulvirenti, B., 2017. Air pollution abatement performances of green infrastructure in open road and built-up street canyon environments – a review. Atmos. Environ. 162, 71–86. https://doi.org/10.1016/j.atmosenv.2017.05.014.
Adams, M.D., Kanaroglou, P.S., 2015. Mapping real-time air pollution health risk for environmental management: combining mobile and stationary air pollution monitoring with neural network models. J. Environ. Manag. 168, 133. https://doi.org/10.1016/j.jenvman.2015.12.012.
Allen, R.W., Amram, O., Wheeler, A.J., Brauer, M., 2011. The transferability of NO and NO2 land use regression models between cities and pollutants. Atmos. Environ. 45, 369–378. https://doi.org/10.1016/j.atmosenv.2010.10.002.
Araki, S., Yamamoto, K., Kondo, A., 2015. Application of regression kriging to air pollutant concentrations in Japan with high spatial resolution. Aerosol Air Qual. Res. 15, 234–241. https://doi.org/10.4209/aaqr.2014.01.0011.
Baxter, L.K., Dionisio, K.L., Burke, J., Sarnat, S.E., Sarnat, J.A., Hodas, N., Rich, D.Q., Turpin, B.J., Jones, R.R., Mannshardt, E., 2013. Exposure prediction approaches used in air pollution epidemiology studies: key findings and future recommendations. J. Expo. Sci. Environ. Epidemiol. 23, 654. https://doi.org/10.1038/jes.2013.62.
Beelen, R., Hoek, G., Vienneau, D., Eeftens, M., Dimakopoulou, K., Pedeli, X., Tsai, M.Y., Kunzli, N., Schikowski, T., Marcon, A., Eriksen, K.T., Raaschou-Nielsen, O., Stephanou, E., Patelarou, E., Lanki, T., Yli-Tuomi, T., Declercq, C., Falq, G., Stempfelet, M., Birk, M., Cyrys, J., von Klot, S., Nador, G., Varro, M.J., Dedele, A., Grazuleviciene, R., Molter, A., Lindley, S., Madsen, C., Cesaroni, G., Ranzi, A., Badaloni, C., Hoffmann, B., Nonnemacher, M., Kraemer, U., Kuhlbusch, T., Cirach, M., de Nazelle, A., Nieuwenhuijsen, M., Bellander, T., Korek, M., Olsson, D., Stromgren, M., Dons, E., Jerrett, M., Fischer, P., Wang, M., Brunekreef, B., de Hoogh, K., 2013. Development of NO2 and NOx land use regression models for estimating air pollution exposure in 36 study areas in Europe - the ESCAPE project. Atmos. Environ. 72, 10–23. https://doi.org/10.1016/j.atmosenv.2013.02.037.
Briggs, D.J., Collins, S., Elliott, P., Fischer, P., Kingham, S., Lebret, E., Pryl, K., VanReeuwijk, H., Smallbone, K., VanderVeen, A., 1997. Mapping urban air pollution using GIS: a regression-based approach. Int. J. Geogr. Inf. Sci. 11, 699–718. https://doi.org/10.1080/136588197242158.
Burman, P., Chow, E., Nolan, D., 1994. A cross-validatory method for dependent data. Biometrika 81, 351–358.
Chaparro-Suarez, I.G., Meixner, F.X., Kesselmeier, J., 2011. Nitrogen dioxide (NO2) uptake by vegetation controlled by atmospheric concentrations and plant stomatal aperture. Atmos. Environ. 45, 5742–5750. https://doi.org/10.1016/j.atmosenv.2011.07.021.
Eeftens, M., Beelen, R., de Hoogh, K., Bellander, T., Cesaroni, G., Cirach, M., Declercq, C., Dėdelė, A., Dons, E., de Nazelle, A., Dimakopoulou, K., Eriksen, K., Falq, G., Fischer, P., Galassi, C., Gražulevičienė, R., Heinrich, J., Hoffmann, B., Jerrett, M., Keidel, D., Korek, M., Lanki, T., Lindley, S., Madsen, C., Mölter, A., Nádor, G., Nieuwenhuijsen, M., Nonnemacher, M., Pedeli, X., Raaschou-Nielsen, O., Patelarou, E., Quass, U., Ranzi, A., Schindler, C., Stempfelet, M., Stephanou, E., Sugiri, D., Tsai, M.-Y., Yli-Tuomi, T., Varró, M.J., Vienneau, D., Klot, S.v., Wolf, K., Brunekreef, B., Hoek, G., 2012. Development of land use regression models for PM2.5, PM2.5 absorbance, PM10 and PMcoarse in 20 European study areas; results of the ESCAPE project. Environ. Sci. Technol. 46, 11195–11205. https://doi.org/10.1016/j.atmosenv.2007.08.012.


Selmi, W., Weber, C., Rivière, E., Blond, N., Mehdi, L., Nowak, D., 2016. Air pollution removal by trees in public green spaces in Strasbourg city, France. Urban For. Urban Green. 17, 192–201. https://doi.org/10.1016/j.ufug.2016.04.010.
Shi, Y., Lau, K.K.L., Ng, E., 2016. Developing street-level PM2.5 and PM10 land use regression models in high-density Hong Kong with urban morphological factors. Environ. Sci. Technol. 50, 8178–8187. https://doi.org/10.1021/acs.est.6b01807.
Shi, Y., Lau, K.K.L., Ng, E., 2017. Incorporating wind availability into land use regression modelling of air quality in mountainous high-density urban environment. Environ. Res. 157, 17–29. https://doi.org/10.1016/j.envres.2017.05.007.
Tang, R., Blangiardo, M., Gulliver, J., 2013. Using building heights and street configuration to enhance intraurban PM10, NOX, and NO2 land use regression models. Environ. Sci. Technol. 47, 11643–11650. https://doi.org/10.1021/es402156g.
Titos, G., Lyamani, H., Drinovec, L., Olmo, F.J., Močnik, G., Alados-Arboledas, L., 2015. Evaluation of the impact of transportation changes on air quality. Atmos. Environ. 114, 19–31. https://doi.org/10.1016/j.atmosenv.2015.05.027.
Wang, A., Fallah-Shorshani, M., Xu, J., Hatzopoulou, M., 2016. Characterizing near-road air pollution using local-scale emission and dispersion models and validation against in-situ measurements. Atmos. Environ. 142, 452–464. https://doi.org/10.1016/j.atmosenv.2016.08.020.
Weichenthal, S., Van Ryswyk, K., Kulka, R., Sun, L., Wallace, L., Joseph, L., 2015. In-vehicle exposures to particulate air pollution in Canadian metropolitan areas: the urban transportation exposure study. Environ. Sci. Technol. 49, 597–605. https://doi.org/10.1021/es504043a.
Weissert, L., Salmond, J., Miskell, G., Alavi-Shoshtari, M., Williams, D., 2018. Development of a microscale land use regression model for predicting NO2 concentrations at a heavy trafficked suburban area in Auckland, NZ. Sci. Total Environ. 619, 112–119. https://doi.org/10.1016/j.scitotenv.2017.11.028.
World Health Organization, 2003. Health Aspects of Air Pollution with Particulate Matter, Ozone and Nitrogen Dioxide: Report on a WHO Working Group. Copenhagen: WHO Regional Office for Europe, Bonn. Germany 13-15 January 2003. https://apps.who.int/iris/handle/10665/107478.
Zhang, K., Batterman, S., 2013. Air pollution and health risks due to vehicle traffic. Sci. Total Environ. 450–451, 307–316. https://doi.org/10.1016/j.scitotenv.2013.01.074.
Zhou, Y., Wu, Y., Yang, L., Fu, L., He, K., Wang, S., Hao, J., Chen, J., Li, C., 2010. The impact of transportation control measures on emission reductions during the 2008 Olympic Games in Beijing, China. Atmos. Environ. 44, 285–293. https://doi.org/10.1016/j.atmosenv.2009.10.040.

Miskell, G., Salmond, J., Longley, I., Dirks, K.N., 2015. A novel approach in quantifying the effect of urban design features on local-scale air pollution in central urban areas. Environ. Sci. Technol. 49, 9004–9011. https://doi.org/10.1021/acs.est.5b00476.
Molter, A., Lindley, S., de Vocht, F., Simpson, A., Agius, R., 2010. Modelling air pollution for epidemiologic research - Part II. Predicting temporal variation through land use regression. Sci. Total Environ. 409, 211–217. https://doi.org/10.1016/j.scitotenv.2010.10.005.
Motaghian, H.R., Mohammadi, J., 2011. Spatial estimation of saturated hydraulic conductivity from terrain attributes using regression, kriging, and artificial neural networks. Pedosphere 21, 170–177. https://doi.org/10.1016/S1002-0160(11)60115-X.
Mukerjee, S., Smith, L.A., Norris, G.A., Morandi, M.T., Gonzales, M., Noble, C.A., Neas, L.M., Özkaynak, A.H., 2004. Field method comparison between passive air samplers and continuous monitors for VOCs and NO2 in El Paso, Texas. J. Air Waste Manag. Assoc. 54, 307–319. https://doi.org/10.1080/10473289.2004.10470903.
Mukerjee, S., Willis, R.D., Walker, J.T., Hammond, D., Norris, G.A., Smith, L.A., Welch, D.P., Peters, T.M., 2012. Seasonal effects in land use regression models for nitrogen dioxide, coarse particulate matter, and gaseous ammonia in Cleveland, Ohio. Atmospheric Pollution Research 3, 352–361. https://doi.org/10.5094/apr.2012.039.
Nafstad, P., Håheim, L., Oftedal, B., Gram, F., Holme, I., Hjermann, I., Leren, P., 2003. Lung cancer and air pollution: a 27 year follow up of 16 209 Norwegian men. Thorax 58, 1071–1076.
Nowak, D.J., Crane, D.E., Stevens, J.C., 2006. Air pollution removal by urban trees and shrubs in the United States. Urban For. Urban Green. 4, 115–123. https://doi.org/10.1016/j.ufug.2006.01.007.
Özkaynak, H., Baxter, L.K., Dionisio, K.L., Burke, J., 2013. Air pollution exposure prediction approaches used in air pollution epidemiology studies. J. Expo. Sci. Environ. Epidemiol. 23, 566. https://doi.org/10.1038/jes.2013.15.
Pearce, J.L., Rathbun, S.L., Aguilar-Villalobos, M., Naeher, L.P., 2009. Characterizing the spatiotemporal variability of PM2.5 in Cusco, Peru using kriging with external drift. Atmos. Environ. 43, 2060–2069. https://doi.org/10.1016/j.atmosenv.2008.10.060.
Poplawski, K., Gould, T., Setton, E., Allen, R., Su, J., Larson, T., Henderson, S., Brauer, M., Hystad, P., Lightowlers, C., 2009. Intercity transferability of land use regression models for estimating ambient concentrations of nitrogen dioxide. J. Expo. Sci. Environ. Epidemiol. 19, 107. https://doi.org/10.1038/jes.2008.15.
Ryan, P.H., LeMasters, G.K., 2007. A review of land-use regression models for characterizing intraurban air pollution exposure. Inhal. Toxicol. 19, 127–133. https://doi.org/10.1080/08958370701495998.
