Journal Pre-proof
Estimating fine particulate concentration using a combined approach of linear regression and artificial neural network
Maqbool Ahmad, Khan Alam, Shahina Tariq, Sajid Anwar, Jawad Nasir, Muhammad Mansha
PII: S1352-2310(19)30689-2
DOI: https://doi.org/10.1016/j.atmosenv.2019.117050
Reference: AEA 117050
To appear in: Atmospheric Environment
Received Date: 11 May 2019
Revised Date: 8 October 2019
Accepted Date: 10 October 2019
Please cite this article as: Ahmad, M., Alam, K., Tariq, S., Anwar, S., Nasir, J., Mansha, M., Estimating fine particulate concentration using a combined approach of linear regression and artificial neural network, Atmospheric Environment (2019), doi: https://doi.org/10.1016/j.atmosenv.2019.117050. This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2019 Published by Elsevier Ltd.
Graphical Abstract
[Graphical abstract diagram. Inputs: ground-observed PM2.5 data and auxiliary data (MODIS AOD, NDVI, meteorology, day/month of year). Models: multiple linear regression model and artificial neural network (input, hidden layer, output). Output: estimated PM2.5 concentration.]
Estimating fine particulate concentration using a combined approach of linear regression and artificial neural network
Maqbool Ahmad1*, Khan Alam2*, Shahina Tariq1, Sajid Anwar3, Jawad Nasir4, Muhammad Mansha4
1 Department of Meteorology, COMSATS University Islamabad, Pakistan
2 Department of Physics, University of Peshawar, Peshawar 25120, Khyber Pakhtunkhwa, Pakistan
3 Faculty of Computer Science & Engineering, Ghulam Ishaq Khan Institute of Engineering & Technology, Swabi, Pakistan
4 Pakistan Space and Upper Atmosphere Research Commission (SUPARCO), P.O. Box 8402, Off University Road, Karachi-75270, Pakistan
*Corresponding author:
Email: [email protected]; [email protected]
Phone: +92-91-9216727
Abstract
Fine particulate matter (PM2.5) is directly associated with degraded air quality and adverse environmental health effects. PM2.5 is gaining attention because of its environmental impacts, but the scarcity of ground-based measurements limits the understanding of PM2.5 over many regions. This study employs an integrated approach of multiple linear regression (MLR) and artificial neural networks (ANN) to estimate ground-level PM2.5 concentration using satellite aerosol optical depth (AOD), land use data and meteorological parameters. AOD from the Moderate Resolution Imaging Spectroradiometer (MODIS) aerosol product (MOD04), retrieved with the combined Dark Target and Deep Blue algorithm at 10 km spatial resolution, was obtained for Karachi, the most urbanized and industrialized city of Pakistan, for 2015-2017. The MLR model agreed well with the ground-observed data, with correlation coefficients (R) of 0.96, 0.87 and 0.76 for 2015, 2016 and 2017, respectively. The ANN with the error back-propagation algorithm was developed using AOD together with binned land use and meteorological parameters and the associated spatio-temporal terms. The data were divided into three groups, with 80 % for training and 10 % each for validation and testing. The ANN yielded correlation coefficients (R) of 0.80, 0.80 and 0.78 for training, testing and validation, respectively. The study shows that including meteorological and land use data with satellite AOD improves the accuracy of PM2.5 estimation. The results show that MLR and ANN are in close agreement and that both can estimate PM2.5 concentrations. Overall, ANN is the more powerful technique for estimating particulate concentration and can be used to estimate long-term particulate matter concentration, with associated guidelines, to monitor air quality in any region.
Keywords: Multiple linear regression, artificial neural network, AOD, MODIS
1. Introduction
The unprecedented rate of urbanization in developing countries continuously threatens air quality, public health and the environment. In addition, simultaneous growth in population and transportation has driven air pollutants to dangerous levels (Malashock et al., 2018). Among these pollutants, particulate matter (PM) is considered critical because of its effects on air quality and human health. These issues are compounded by road traffic, industrialization, infrastructure activities and the absence of environmental regulation (Nguyen et al., 2015). PM with an aerodynamic diameter of 2.5 µm or less (PM2.5) has attracted particular attention because of its association with respiratory problems, cardiovascular disease and morbidity outcomes (Chu et al., 2013; You et al., 2016). Algorithms for estimating PM2.5 concentrations depend to a great extent on ground and remote sensing observations. However, ground-level PM2.5 estimates are constrained by the limited spatial coverage of monitoring stations.
Satellite remote sensing of aerosols has played a major role in mitigating the limitations of ground-based measurements. Satellite-derived aerosol optical depth (AOD) has proven to be a potent parameter for assessing air quality on a large scale. However, it cannot be used alone to estimate PM2.5 concentration, because AOD measures the columnar aerosol loading while particulate matter indicates the mass concentration near the earth's surface (Alvarez et al., 2003). Despite their different units, satellite AOD can be used as a surrogate for PM (Gupta and Christopher, 2009). Incorporating meteorological parameters and mixing layer height alongside AOD improves the estimation of PM concentration in regression models. It has also been shown that considering the vertical profile of relative humidity, rather than the aerosol size distribution, can improve the AOD-PM2.5 correlation (Soni et al., 2018; Karimian et al., 2016). Wang and Christopher (2003) reported a good correlation (> 0.7) between ground-level PM2.5 and AOD at 0.55 μm from the Moderate Resolution Imaging Spectroradiometer (MODIS). Ma et al. (2016) improved the correlation (0.68) by including meteorological parameters in their statistical model. Despite the long history of regression models in various disciplines, limitations and uncertainties remain because of the non-linear relationships among the parameters. These models may therefore not provide a high level of accuracy for complex, extreme, non-linear data, given the regression assumptions and the multicollinearity among variables (Ul-Saufie et al., 2011). To overcome these issues, new approaches have been introduced in several studies (Elbayoumi et al., 2015).
Artificial neural networks (ANN) with more than one hidden layer are called deep neural networks (DNN). DNNs have recently shown very good performance on many vision, speech and natural language processing (NLP) benchmarks. The structure of an ANN model can vary with the problem at hand and with the training behavior in terms of geo-location, meteorological and temporal conditions (Russo et al., 2015). ANNs have shown good accuracy where monitoring stations are used to measure PM2.5 concentrations (Fang and Wang, 2017). An ANN can approximate an unknown function f(x) by learning from training data, where f(x) can represent a classification or a regression problem. In this study, an ANN is trained to estimate particulate matter as a regression problem.
Likewise, monitoring and understanding the temporal variation of particulate concentration on local scales over a complex urban environment like Karachi needs a robust algorithm in support of regression models (Dastoorpoor et al., 2016). Ground-based PM2.5 measurements are rarely available in most developing regions like Pakistan, which makes it very difficult to assess long-term PM2.5 concentration trends. In Karachi, air quality was assessed by analyzing PM2.5, aerosols and other trace elements collected at two locations, Tibet Center and Korangi (Lurie et al., 2019). However, that study did not investigate the influence of land cover type and meteorological conditions, together with AOD, on the estimation of PM2.5. Therefore, this study proposes a new approach to estimate PM2.5 from MODIS AOD with land use data and other climatic conditions at twenty-one locations in Karachi. A DNN, in support of MLR, was used to estimate PM2.5 concentration from August 2015 to December 2017. This distinctive approach provides a basis for estimating long-term PM2.5 concentration over the study area.
2. Materials and Methods
2.1. Data description
2.1.1. Ground measurements
The study domain covers Karachi, the most urbanized city of Pakistan. Karachi (24.86° N, 67.00° E) is a coastal city on the shore of the Arabian Sea. The study region has a total area of 3527 km² and, with a population of about 20 million, is among the most densely populated cities in South Asia (Alam et al., 2011; Lurie et al., 2019). According to Khwaja et al. (2013), Karachi comprises residential, commercial and urban/suburban locations. It has a desert-like (subtropical) climate with scanty rainfall, humid in summer (wettest month: August) and dry in winter (driest month: December). The major air pollutants (aerosols) in Karachi are emitted mainly by automobile exhaust, chemical pollutants from land vehicles, local dust transport, fossil fuel burning and industrial byproducts (Alam et al., 2014). In total, 21 monitoring stations in Karachi were used to estimate daily PM2.5 concentrations for 2015-2017; details are given in Table 1. The ground PM observation stations and the meteorological station in Karachi are shown by red and green dots, respectively (see Figure 1).
Figure 1. Study area with spatial distribution of monitoring stations.
The data obtained from the stations were pre-processed using geo-statistical techniques to bring them to a consistent scale. The data were then used as input parameters for the statistical and neural network models. Missing values and outliers were excluded.
Table 1. List of ground PM2.5 monitoring sites in Karachi.
Site | Name of Site   | Site | Name of Site   | Description
1    | Coastal Site 1 | 12   | Port Qasim1    | Coastal
2    | Coastal Site 2 | 13   | Port Qasim2    | Coastal
3    | Coastal Site 3 | 14   | Coastal Site 4 | Coastal
4    | KEPZ           | 15   | Clifton        | Coastal
5    | Bahria Town    | 16   | Dawood         | Residential
6    | TF Highway     | 17   | Hoshang Road   | Transportation
7    | PITB           | 18   | Brooks         | Commercial
8    | FTC            | 19   | Regal          | Commercial
9    | NIPA           | 20   | Sarjani Town   | Public
10   | Sohrab Goat1   | 21   | Sohrab Goat2   | Suburb
11   | Nizamabad      |      |                | Suburb
2.1.2. Meteorological Data
Meteorological data sets (temperature, wind speed, specific humidity, atmospheric pressure and wind direction) were acquired from weather monitoring stations installed at each location and from the Pakistan Meteorological Department (PMD) in the study region. The meteorological data were collected simultaneously with the PM concentrations.
Planetary boundary layer (PBL) height data from the Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2) were used. The PBL significantly affects empirical models of the PM-AOD relationship, because hygroscopic growth changes particle extinction properties at different heights (Chu et al., 2013; Tsai et al., 2011; van Donkelaar et al., 2006). Conventionally, AOD products represent the light extinction of both coarse and fine particles. However, when the boundary layer is assumed to be well mixed, the specific humidity and the extinction coefficient of dry aerosol remain unchanged (Wallace and Hobbs, 2006). Likewise, the correlation between PM2.5 and AOD depends in part on the vertical profile of relative humidity (RH). Because PM was measured in the dry state to remove the effect of surface-level water vapor, a humidity correction was also needed, considering surface-level RH and the vertical distribution of RH within the boundary layer.
2.1.3. Aerosol Robotic Network
The Aerosol Robotic Network (AERONET) is a ground-based aerosol network established by the National Aeronautics and Space Administration (NASA). It measures direct sun and diffuse sky radiances with CIMEL sun/sky radiometers in the wavelength ranges of 340-1020 nm and 440-1020 nm, respectively (Bibi et al., 2017). AERONET data come in three levels: Level 1.0 (unscreened), Level 1.5 (cloud screened) and Level 2.0 (cloud screened and quality assured). In the present study, AERONET Level 2.0 data are used to validate the satellite AOD data and confirm their retrieval accuracy under the conditions of this study. For validation, the daily mean AOD at 500 nm from AERONET was converted to the common MODIS wavelength of 550 nm using the following equation.
$$\mathrm{AOD}_{550} = \mathrm{AOD}_{500}\left(\frac{550}{500}\right)^{-\alpha} \qquad (1)$$
where α is the Ångström exponent for the 440–870 nm wavelength range (Alam et al., 2011).
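The wavelength conversion in Equation 1 can be illustrated with a short sketch. It assumes the standard Ångström power law, AOD550 = AOD500 · (550/500)^(−α), with α derived from AOD at 440 nm and 870 nm. The function names and the sample AOD values are hypothetical and are not taken from the study.

```python
import math


def angstrom_exponent(aod_short, aod_long, wl_short_nm, wl_long_nm):
    """Angstrom exponent from AOD at two wavelengths (standard two-point form)."""
    return -math.log(aod_short / aod_long) / math.log(wl_short_nm / wl_long_nm)


def aod_500_to_550(aod_500, alpha):
    """Equation 1: shift AOD from 500 nm to the MODIS wavelength of 550 nm."""
    return aod_500 * (550.0 / 500.0) ** (-alpha)


# Hypothetical AERONET daily means, for illustration only.
alpha = angstrom_exponent(aod_short=0.60, aod_long=0.30,
                          wl_short_nm=440.0, wl_long_nm=870.0)
aod_550 = aod_500_to_550(aod_500=0.50, alpha=alpha)
print(f"alpha = {alpha:.3f}, AOD550 = {aod_550:.3f}")
```

For a positive α (AOD decreasing with wavelength), the converted AOD at 550 nm is smaller than the AOD at 500 nm, as expected for fine-mode dominated aerosol.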
2.1.4. Moderate Resolution Imaging Spectroradiometer
The Moderate Resolution Imaging Spectroradiometer (MODIS), mounted on the Terra and Aqua satellites, has high radiometric resolution (Draxler and Rolph, 2011). Terra crosses the equator from north to south at 10:30 and Aqua from south to north at 13:30 (local time). MODIS has 36 spectral bands covering wavelengths from 0.41 µm to 14.4 µm and a swath width of 2330 km. Owing to its broad spectral coverage and spatial resolution, MODIS provides detailed near-daily measurements of aerosol optical depth, among other applications. According to Qian et al. (2012), the Deep Blue (DB) algorithm is effective over bright surfaces (e.g. desert, urban and semi-arid areas), while the Dark Target (DT) algorithm is suited to surfaces that are dark at mid-visible and red wavelengths (e.g. vegetated land). MODIS datasets are widely used to assess aerosol properties and their associated climatic impacts using various retrieval algorithms.
In this study, the Collection 6, Level 2 MODIS dataset (MOD04 for Terra) with AOD at 550 nm from the combined Dark Target and Deep Blue algorithm was selected at a spatial resolution of 10 km from August 2015 to December 2017. The relationship between MODIS AOD and AERONET AOD was also calculated to assess the accuracy of the satellite data, and a good correlation (0.95) was found. Satellite retrievals were re-sampled and re-projected to a common projected coordinate system. Generally, aerosols are mixed and confined within the PBL, so AOD values normalized by the PBL height were treated as the mean extinction within the PBL (in km⁻¹). This quantity is therefore related to the surface PM concentration while accounting for variations in PBL depth (Arvani et al., 2016). To represent land use cover, the MODIS Level 3 Normalized Difference Vegetation Index (NDVI) product (MOD13) was also obtained for the same period. MODIS-based NDVI was used in the AOD-PM2.5 model to reflect the land use cover type. The MODIS data can be downloaded from www.search.earthdata.nasa.gov/search.
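The PBL normalization described above can be written as a one-line calculation. The sketch below assumes that the column AOD is divided by the PBL height in km to give a mean extinction in km⁻¹. The function name and the sample values are hypothetical.

```python
def aod_to_extinction(aod, pbl_height_km):
    """Mean extinction (km^-1), assuming aerosols are confined to a well-mixed PBL."""
    if pbl_height_km <= 0:
        raise ValueError("PBL height must be positive")
    return aod / pbl_height_km


# Hypothetical (AOD, PBL height in km) pairs: same column AOD, different PBL depth.
samples = [(0.45, 0.9), (0.45, 1.8)]
extinctions = [aod_to_extinction(aod, pbl) for aod, pbl in samples]
print(extinctions)
```

The same column AOD corresponds to a larger near-surface extinction when the boundary layer is shallow, which is why PBL depth matters for relating AOD to surface PM.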
2.2. Linear regressions
Linear regression models relate the response of an output parameter to one or more input parameters, with the main aim of understanding the effect of the inputs on the output. Several studies have used both simple and multiple linear regression models to estimate PM2.5. However, simple linear regression models impose strict assumptions, handle measurement uncertainties improperly and may cause non-negligible errors (Wu et al., 2018). These errors appear in estimated ground-level PM2.5 concentrations when AOD is the sole predictor (see Equation 2). Therefore, simple linear regression is not used to estimate PM2.5 in this study.
$$\mathrm{PM}_{2.5}(d) = \beta_{0}\,\mathrm{AOD} + \beta_{1} \qquad (2)$$
where AOD is the aerosol optical depth at a given region and β0, β1 are the regression coefficients obtained by the least squares method from the regression of ground-based measured PM concentration on AOD. Multiple linear regression (MLR) models can resolve these issues and improve model reliability to a great extent. In this study, satellite AOD and meteorological parameters are taken as input parameters and PM2.5 is the output variable to be estimated. Nevertheless, these estimates are influenced by the vertical distribution of aerosols along with other meteorological parameters (Tandon et al., 2010). MODIS AOD, meteorological variables, PBL height (in km) and land use cover (NDVI) were included in the multiple regression model to improve the ground-level estimation of PM2.5, as shown in Equation 3 (Saunders et al., 2014).
$$\mathrm{PM}_{2.5}(d) = \beta_{0} + \beta_{1}\,\mathrm{AOD} + \beta_{2}\,\mathrm{NDVI} + \beta_{3}\,T + \beta_{4}\,\mathrm{WS} + \beta_{5}\,H + \beta_{6}\,P + \beta_{7}\,\mathrm{PBL} \qquad (3)$$
where β0 and β1 to β7 are the regression coefficients, AOD is the MODIS AOD value, d represents each day, NDVI is the normalized difference vegetation index, T is temperature, WS is wind speed, H is humidity, P is pressure and PBL is the planetary boundary layer height.
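A least-squares fit of a model with the form of Equation 3 can be sketched as follows. This is a minimal illustration, not the authors' implementation: it solves the normal equations in plain Python, and the synthetic predictors (scaled to the range 0 to 1) and the "true" coefficients are hypothetical values chosen only so that the fit can be checked.

```python
import random


def fit_mlr(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    X holds one row of predictors per sample; an intercept column is added here.
    Returns [b0, b1, ..., bp].
    """
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for k in range(p):
            if k != col:
                factor = A[k][col] / A[col][col]
                A[k] = [a - factor * b for a, b in zip(A[k], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Synthetic, noise-free data with seven predictors standing in for
# AOD, NDVI, T, WS, H, P and PBL (all hypothetical, scaled to 0..1).
rng = random.Random(0)
true_beta = [30.0, 2.0, -5.0, 0.3, -0.5, -0.4, -0.2, -1.0]
X = [[rng.random() for _ in range(7)] for _ in range(200)]
y = [true_beta[0] + sum(b * x for b, x in zip(true_beta[1:], row)) for row in X]

beta_hat = fit_mlr(X, y)
print([round(b, 4) for b in beta_hat])
```

With real data the predictors have very different scales (for example pressure versus AOD), so a library solver based on QR or SVD would usually be preferred over the normal equations for numerical stability.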
2.3. Artificial Neural Networks
Artificial neural networks were used alongside the regression models to allow a comparative analysis of the estimated PM2.5. ANN is known for its superiority over traditional regression methods owing to its efficient computation, generalization and limited dependence on prior knowledge (Elangasinghe et al., 2014). A prototype ANN with the error back-propagation algorithm is shown in Figure 2.
[Figure 2 diagram: a 2-2-1 network with input neurons N1 and N2 (inputs x1 and x2), hidden neurons N3 and N4, and output neuron N5, connected by weights w; the error computed at the output is propagated back through the network.]
Figure 2. Schematics of ANN with error back propagation algorithm. “In” represents input and “Out” represents the output.
The output of each neuron is computed from the weighted sum of the outputs of the neurons in the previous layer, passed through an activation function. The weights (w) are initialized randomly from a uniform or normal distribution. In Figure 2, the ANN has one input, one hidden and one output layer, with 2, 2 and 1 neurons, respectively. The figure illustrates the computation for N3 using Equation 4.
$$N_{3} = f(x_{1} w_{13} + x_{2} w_{23}) \qquad (4)$$
where f(·) is the activation function (ReLU, sigmoid, etc.), N denotes a neuron and x the input vector. Once the output is computed at N5, it is compared with the ground-truth value to compute the error. Because the initial weights are drawn from a random distribution, the current output may not match the ground truth. This error is then propagated backwards to compute the change in each weight. The error term (δ) is computed by Equation 5a. The factor Out_k(1 − Out_k) is the derivative of the sigmoid activation function in the output layer, while the second factor, (Target_k − Out_k), is the difference between the desired and current outputs.
$$\delta_{k} = \mathrm{Out}_{k}\,(1 - \mathrm{Out}_{k})\,(\mathrm{Target}_{k} - \mathrm{Out}_{k}) \qquad (5a)$$
Consequently, the weight is updated using Equation 5b, where η is the learning rate and the right-hand side adds the correction δ · Out · η to the current weight.
$$\Delta w_{3} = \delta_{k} \cdot \mathrm{Out}_{3} \cdot \eta + w_{3} \qquad (5b)$$
In this way the weights and biases of the neural network are fine-tuned and the cost function is minimized. The network uses the mean squared error (MSE) as the cost function, as shown in Equation 6.
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}(y_{i} - t_{i})^{2} \qquad (6)$$
where N, y and t represent the number of training samples, the current output and the ground truth, respectively. Based on the error back-propagation algorithm, the proposed DNN is shown in Figure 3.
[Figure 3 diagram: input layer (spatial terms, temporal terms, MODIS AOD, MODIS NDVI, meteorological data), several hidden layers trained with back-propagation, and an output layer giving the estimated PM2.5.]
Figure 3. Estimation of PM2.5 with several hidden layers using DNN.
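The forward pass and error back-propagation described in this section can be sketched for the 2-2-1 network of Figure 2. This is a minimal illustration under stated assumptions: sigmoid activations everywhere, no bias terms, and the standard hidden-layer delta rule (which the text does not spell out). The inputs, target, initial weights and learning rate are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """x = [x1, x2]; w_hidden[j] holds the input weights of hidden neuron j;
    w_out[j] is the weight from hidden neuron j to the output neuron."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in w_hidden]
    out = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, out


def train_step(x, target, w_hidden, w_out, lr):
    """One back-propagation update; returns new weights and the squared error."""
    hidden, out = forward(x, w_hidden, w_out)
    # Output error term: sigmoid derivative times (target - output).
    delta_out = out * (1.0 - out) * (target - out)
    # Hidden error terms use the output weights before they are updated.
    delta_hidden = [h * (1.0 - h) * w * delta_out
                    for h, w in zip(hidden, w_out)]
    # New weight = old weight + learning rate * delta * upstream output.
    new_w_out = [w + lr * delta_out * h for w, h in zip(w_out, hidden)]
    new_w_hidden = [[w + lr * d * xi for w, xi in zip(ws, x)]
                    for ws, d in zip(w_hidden, delta_hidden)]
    return new_w_hidden, new_w_out, (target - out) ** 2


# Hypothetical single training sample and initial weights.
x, target = [0.5, 0.8], 0.9
w_hidden = [[0.1, -0.2], [0.3, 0.4]]
w_out = [0.2, -0.1]
errors = []
for _ in range(200):
    w_hidden, w_out, err = train_step(x, target, w_hidden, w_out, lr=0.5)
    errors.append(err)
print(f"squared error: first = {errors[0]:.4f}, last = {errors[-1]:.6f}")
```

Repeating the update drives the output toward the target, so the squared error shrinks over the iterations. A practical network would add bias terms and average the error over many samples, as in the MSE cost function.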
In this study, a seven-dimensional input was mapped to a scalar floating-point PM2.5 concentration for 2015-2017 using a function-fitting neural network. The model is trained on input parameters including satellite AOD, meteorological parameters, NDVI and spatio-temporal information. The data were divided into three groups, with 80 % for training and 10 % each for validation and testing. The DNN with the error back-propagation algorithm was implemented in MATLAB.
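The 80/10/10 partition can be sketched as below. The text does not state how samples were assigned to the three groups, so a seeded random shuffle is assumed here. The function name is hypothetical, and the sample count of 3536 is the input dataset size reported in Section 3.4.

```python
import random


def split_dataset(samples, train_frac=0.8, val_frac=0.1, seed=0):
    """Randomly split samples into training, validation and test subsets."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_train = int(train_frac * len(samples))
    n_val = int(val_frac * len(samples))
    train = [samples[i] for i in indices[:n_train]]
    val = [samples[i] for i in indices[n_train:n_train + n_val]]
    test = [samples[i] for i in indices[n_train + n_val:]]  # remainder
    return train, val, test


samples = list(range(3536))  # stand-in for the input records
train, val, test = split_dataset(samples)
print(len(train), len(val), len(test))
```

The test subset takes whatever remains after the training and validation counts are rounded down, so the three subsets always cover every sample exactly once.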
2.4. Evaluation of estimation performance
The estimated PM2.5 concentrations from both modeling approaches are evaluated with statistical parameters, i.e. the correlation coefficient (R), the coefficient of determination (R2), the root mean square error (RMSE) and the mean absolute error (MAE), given in Equations 7-9 (Ghaedrahmat et al., 2019; Maleki et al., 2019; Ceylan and Bulkan, 2018; Nguyen et al., 2015).
$$R^{2} = 1 - \left[\frac{\sum_{i=1}^{n}(c_{i} - m_{i})^{2}}{\sum_{i=1}^{n}(c_{i} - \bar{m})^{2}}\right] \qquad (7)$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\lvert c_{i} - m_{i}\rvert \qquad (8)$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(c_{i} - m_{i})^{2}}, \quad \mathrm{MSE} = (\mathrm{RMSE})^{2} \qquad (9)$$
where c is the value estimated by the model, m the observed value and n the number of observed data pairs. The overall methodology of the proposed study is shown in Figure 4.
[Figure 4 flowchart. Elements: satellite images (MOD04/MOD13), re-sampling, pre-processing, ground PM and meteorological data, spatio-temporal data integration, multiple linear regression model, artificial/deep neural network, correlation results, PM estimation, temporal variation, validation.]
Figure 4. Methodology of the proposed research work. Grd. PM and Sat. PM represent ground and satellite observed PM, respectively.
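The evaluation statistics of Section 2.4 can be computed with a few short functions. This sketch uses the conventional definitions, with c the estimated and m the observed values. For R², the denominator sums squared deviations of the observed values from their mean, which may differ from the equation as printed. The sample values are hypothetical.

```python
import math


def pearson_r(est, obs):
    """Pearson correlation coefficient between estimated and observed values."""
    mean_e, mean_o = sum(est) / len(est), sum(obs) / len(obs)
    cov = sum((e - mean_e) * (o - mean_o) for e, o in zip(est, obs))
    var_e = sum((e - mean_e) ** 2 for e in est)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov / math.sqrt(var_e * var_o)


def r_squared(est, obs):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_o = sum(obs) / len(obs)
    ss_res = sum((e - o) ** 2 for e, o in zip(est, obs))
    ss_tot = sum((o - mean_o) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def mae(est, obs):
    """Mean absolute error."""
    return sum(abs(e - o) for e, o in zip(est, obs)) / len(obs)


def rmse(est, obs):
    """Root mean square error."""
    return math.sqrt(sum((e - o) ** 2 for e, o in zip(est, obs)) / len(obs))


# Hypothetical PM2.5 values in ug/m3, for illustration only.
observed = [30.0, 35.0, 40.0, 25.0]
estimated = [32.0, 33.0, 41.0, 26.0]
print(pearson_r(estimated, observed), r_squared(estimated, observed),
      mae(estimated, observed), rmse(estimated, observed))
```

Reporting R together with MAE and RMSE is useful because a high correlation alone does not rule out a systematic bias in the estimates.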
3. Results and Discussion
3.1. Validation of satellite-derived AOD data
The first step of the analysis was to validate the satellite-derived AOD against ground-level AERONET AOD data. Although there are many ways to validate it, cross validation is the most widely used method, as AERONET Level 2 data are free of cloud contamination (Boiyo et al., 2017; Ma et al., 2016). This validation was important because satellite AOD provides columnar values over a larger area than ground-level PM2.5 point data. AERONET AOD data were compared with MODIS AOD. The study period, 2015 to 2017, was selected on the basis of the available ground PM data. MODIS AOD was found to be in good agreement (R > 0.85) with AERONET during the study period. This agreement supports the use of MODIS AOD in PM2.5 estimation.
3.2. Model estimation and Accuracy analysis
Several MLR models were developed to estimate PM2.5 concentrations in this study, and the best model, with the highest R (0.96), was chosen (see Equation 10).
$$\mathrm{PM}_{2.5} = 498.94 + 2.73\,\mathrm{AOD} - 111.28\,\mathrm{NDVI} + 0.24\,T - 0.53\,\mathrm{WS} - 0.39\,H - 0.42\,P - 0.001\,\mathrm{PBL} \qquad (10)$$
In this model, MODIS-derived AOD, NDVI, temperature, wind speed, humidity, pressure and PBL height were used as independent variables. Missing AOD values at the study sites were replaced by median values. The role of the vertical profiles of aerosol and RH in the AOD-PM2.5 correlation was assessed by regressing PM on AOD without any auxiliary data, and then with mixing layer height and surface-level RH in the regression model. Likewise, a linear regression was performed for the vertical profile of RH. The analysis revealed that introducing the hygroscopic factor yielded a less significant improvement in the PM-AOD correlation, so it was discarded in this study, whereas introducing PBL data led to a significant improvement and was retained. The regression results of the model are given in Table 2. As is clear from this table, the statistically significant regression coefficients (p ≤ 0.01) indicate the adequacy of the model fit (Chelani, 2018).
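Applying Equation 10 to one day of inputs, after filling a missing AOD value with the median, can be sketched as follows. The coefficients are those of Equation 10. The input values are hypothetical, and their units (temperature in °C, wind speed in m/s, humidity in %, pressure in hPa, PBL height in m) are assumptions, since the text does not state the units used in the fit.

```python
from statistics import median

# Coefficients of Equation 10.
EQ10 = {"AOD": 2.73, "NDVI": -111.28, "T": 0.24, "WS": -0.53,
        "H": -0.39, "P": -0.42, "PBL": -0.001}
EQ10_INTERCEPT = 498.94


def fill_missing_with_median(values):
    """Replace None entries by the median of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def estimate_pm25(row):
    """Evaluate Equation 10 for one set of predictor values."""
    return EQ10_INTERCEPT + sum(coef * row[name] for name, coef in EQ10.items())


# Hypothetical AOD series with one missing retrieval.
aod_series = fill_missing_with_median([0.4, None, 0.5, 0.6])

# Hypothetical predictors for the day with the filled AOD value.
day = {"AOD": aod_series[1], "NDVI": 0.1, "T": 30.0, "WS": 3.0,
       "H": 60.0, "P": 1005.0, "PBL": 1000.0}
pm25 = estimate_pm25(day)
print(f"estimated PM2.5 = {pm25:.3f}")
```

Because the intercept and the pressure term are both large and of opposite sign, the estimate is sensitive to the pressure input, so consistent units matter when applying such a fitted equation.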
Table 2. Statistical results of MLR analysis.
Parameter   | Coefficient | p-Value | Standard Error
Constant    | 433.64      | 0.00    | 42.22
AOD         | 2.04        | 0.01    | 1.22
NDVI        | -95.01      | 0.00    | 14.40
Temperature | 0.26        | 0.00    | 0.03
Wind Speed  | -0.42       | 0.00    | 0.04
Humidity    | -0.30       | 0.00    | 0.00
Pressure    | -0.36       | 0.00    | 0.40
The regressions in the scatter matrix (see Figure 5) show significant correlations (R) between estimated and ground-observed PM2.5 concentrations of 0.96, 0.87 and 0.76 in 2015, 2016 and 2017, respectively.
Figure 5. Scatter matrix and linear regression analysis of estimated PM2.5 (PM.Est) and ground-based PM2.5 (PM.Obs) concentrations in μg/m³ during 2015-2017.
The MLR model estimates were validated with mean monthly ground-level PM2.5 data at four coastal sites of the study area. These sites were selected based on the availability of ground PM2.5 concentration data. ANN was used to estimate PM2.5 at the remaining study sites, with details given in Sections 3.4 and 3.5.
3.3. Temporal Variation of PM2.5 concentration
The mean monthly concentration of PM2.5 from August 2015 to December 2017 was obtained using Equation 10. Figure 6 depicts the time series of PM2.5 concentration over the selected sites in the study region during the study period. The results show the variation in the PM2.5 concentration pattern, with a mean monthly estimated PM2.5 concentration and standard deviation of 30.48 ± 4.8 µg/m³, while the associated monthly mean ground-based PM2.5 is 30.18 µg/m³. The regression results show the highest averaged estimated PM2.5 concentrations of 39.38 μg/m³ and 46.4 μg/m³ in November 2015 and 2016, respectively, and 39.85 μg/m³ in December 2017 in the study area. In contrast, the lowest averaged estimated PM2.5 concentrations were 22.31 μg/m³ in August 2015, 19.9 μg/m³ in March 2016 and 25.19 μg/m³ in June 2017. This variation pattern may be influenced by local meteorological conditions, land use and geography (Xu et al., 2017; Lin et al., 2016).
Figure 6. Time series variation of mean monthly PM2.5 concentrations (μg/m³ ± standard deviation) during 2015-2017. The green horizontal line represents the mean concentration level of PM2.5.
Several studies have linked the highest PM2.5 values during November-December (cold days), compared with the lowest values in March, June and August (warm days), to low ventilation capability, low humidity, lower temperature and decreased atmospheric height (Elbayoumi et al., 2015). Similar results were observed in urban Beijing, with the highest PM2.5 concentration in winter and the lowest in summer (Zhao et al., 2009). The increasing trend of PM concentrations in metropolitan cities is associated with widespread urbanization and human activities that emit anthropogenic pollutants, including aerosols from urban construction, industrial emissions, automobile exhaust and fossil fuel burning. The overall correlation between observed and estimated PM2.5 is shown in Figure 7a. To establish the adequacy of the linear regression model, the residuals (errors) must be uncorrelated, as shown in Figure 7b (Ul-Saufie et al., 2011). Together, both figures indicate the goodness of fit of the MLR model in estimating PM2.5 concentration from August 2015 to December 2017.
Figure 7. (a) Correlation between mean monthly PM2.5 concentrations (μg/m³) during 2015-2017 and (b) residual analysis.
The calculated correlation (see Figure 7a) was based solely on the complete set of ground and satellite data during the study period. From Figures 5 and 7, the cross-validation results suggest that the MLR model (i.e. Equation 10) can be used to estimate ground-level PM2.5 from satellite AOD under the applied conditions. However, it can be used only at locations for which complete input datasets exist; otherwise, the results need to be validated with different approaches. The fusion of meteorological and land use data with satellite AOD has proven important for estimating particulate matter. Likewise, satellite AOD retrievals alone are not sufficient to estimate PM2.5 concentration (Chelani, 2018). Nguyen et al. (2015) followed a similar procedure to predict ground-based PM concentration using MLR at various sites in Vietnam. Their results revealed a good correlation (0.69) between MODIS AOD and ground-level AOD. Extended MLR models have produced better results than simple and ordinary MLR models. Xin et al. (2014) used satellite AOD as a predictor together with climatic parameters and found results comparable with other statistical models in the continental United States. Likewise, Liu et al. (2007) estimated PM2.5 from satellite AOD and obtained good estimates.
3.4. PM estimations with neural networks
This section reports the PM estimation with neural networks. Each input vector consists of MODIS AOD, MODIS NDVI and meteorological data. Based on these inputs, 7 neurons were employed in the input layer, and the output layer consists of a single neuron for the estimated PM2.5. The input dataset (n = 3536) for training the DNN was collected from 21 stations in Karachi during 2015-2017 and evaluated over every station, rather than over a few stations as in the case of the MLR model. To obtain optimum ANN performance, various pairs of transfer functions were tested for the hidden and output layers while changing the number of neurons in the hidden layers. Among these combinations, the one with the maximum correlation coefficient and minimum RMSE was taken as the best. The average accuracy over several runs for each architecture (network) is reported in Table 3. In general, one hidden layer is enough to estimate any non-linear output with the required accuracy. However, more than one hidden layer can be appropriate for complex data sets (Chaloulakou et al., 2003). The network with seven neurons in the input layer, two hidden layers including 14 hidden neurons, and a single neuron in the output layer gave the best estimates.
364
Table 3. Statistical results for different ANN networks. R S.No.
Network
Validation
Test set
Training set
All
1
7-32-32-1
0.78
0.78
0.80
0.80
2
7-32-64-1
0.79
0.72
0.81
0.80
3
7-128-128-1
0.79
0.74
0.81
0.80
4
7-256-256-1
0.79
0.76
0.80
0.80
5
7-32-32-32-1
0.78
0.80
0.80
0.80
6
7-32-64-128-1
0.79
0.78
0.76
0.80
365 366
3.5. Daily estimation of PM2.5
Daily averaged PM2.5 concentrations were estimated by the pre-trained DNNs (see Figure 8). Convergence of the training was decided by evaluating the accuracy on the validation set after each epoch. If the accuracy on the validation set did not increase, we reduced the learning rate and continued for several iterations. If the accuracy did not improve for 6 checks, we stopped the training and concluded that it had converged.
Figure 8. Time series variation of daily mean PM2.5 concentrations (μg/m³) during 2015-2017.

The relationships between observed and estimated PM2.5 concentrations using ANN are shown in Figure 9.

Figure 9. The correlation (R) values for training, validation, test, and all data regression analysis.

The R values for the regression analysis of the training, validation, test and all data are 0.80, 0.78, 0.80, and 0.80, respectively. An R value > 0.70 is considered significant here and indicates a strong positive linear relationship between input and output vectors. Correlation values near 0.6 have been considered normal in the context of random climate change (Ul-Saufie et al., 2013); the proposed ANN model is therefore capable of handling such random variation and is considered good and acceptable. In addition, Figure 10 shows the progress of network training. The mean squared error (MSE) between the target and estimated output decreases as training proceeds, and the plot shows that neither under-fitting nor over-fitting occurred. The convergence criterion determines when network training should stop, which is required to avoid both under-fitting and over-fitting. In the proposed work, we divided the dataset into training, validation, and test sets. Convergence was decided by evaluating the validation accuracy after each epoch. If the accuracy on the validation set did not increase for three consecutive iterations, we reduced the learning rate and continued training. If the accuracy did not improve for six checks and the learning rate became very small, we stopped the training and concluded that it had converged. As shown in the figure, convergence was achieved at the 23rd epoch, after which the error changed little.

Figure 10. Mean square error values for training, validation and test.

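The stopping rule described in this section (reduce the learning rate after three checks without improvement, stop after six checks once the learning rate is very small) can be sketched as follows. This is an illustrative reading of the text, not the authors' code: the class name, the initial learning rate, the reduction factor, the learning-rate threshold, the reset of the counter on improvement, and the validation history are all assumptions.

```python
class PlateauStopper:
    """Track validation accuracy, shrink the learning rate on plateaus, signal when to stop."""

    def __init__(self, lr=1e-3, reduce_after=3, stop_after=6, factor=0.1, lr_floor=1e-4):
        self.lr = lr                      # current learning rate (assumed start value)
        self.reduce_after = reduce_after  # checks without gain before each LR reduction
        self.stop_after = stop_after      # checks without gain required before stopping
        self.factor = factor              # multiplicative LR reduction (assumed)
        self.lr_floor = lr_floor          # "very small" LR threshold (assumed)
        self.best = float("-inf")
        self.stale_checks = 0

    def update(self, val_accuracy):
        """Record one validation check; return True when training should stop."""
        if val_accuracy > self.best:
            # Improvement: remember it and restart the patience counter.
            self.best = val_accuracy
            self.stale_checks = 0
            return False
        self.stale_checks += 1
        if self.stale_checks % self.reduce_after == 0:
            self.lr *= self.factor
        return self.stale_checks >= self.stop_after and self.lr <= self.lr_floor


def run(val_history, stopper):
    """Feed per-epoch validation accuracies; return the stopping epoch or None."""
    for epoch, accuracy in enumerate(val_history, start=1):
        if stopper.update(accuracy):
            return epoch
    return None


# Hypothetical validation accuracies, one per epoch (not values from the study).
val_history = [0.60, 0.70, 0.75, 0.78, 0.78, 0.77, 0.78, 0.78, 0.77, 0.78]
stopper = PlateauStopper()
stopped_at = run(val_history, stopper)
print(f"stopped at epoch {stopped_at}, best validation accuracy {stopper.best}")
```

In this toy history the accuracy stops improving after the fourth epoch, so the learning rate is reduced twice and training halts on the sixth check without improvement.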
4. Conclusion
The main objective of this study was to use a combined approach of linear regression (MLR) and machine learning (ANN) to estimate PM2.5 concentrations from 2015 to 2017 in the most urbanized city of Pakistan. Satellite AOD was combined with meteorological and land use (NDVI) parameters to develop these models, and both models responded positively with respect to the target. The MLR model was used to estimate mean monthly PM2.5 only at sites where ground data were available, namely four out of seven coastal sites. The MLR results were therefore good but limited to a few stations. In contrast, the ANN was trained on the input datasets to estimate daily averaged PM2.5 over all study sites, and its results were consistent and strongly correlated with the observed PM2.5 concentrations. The results obtained from the combined approach of MLR and ANN models were found to be significant with respect to ground-observed measurements. The main outcomes are given below.
• The results revealed that the ANN can be preferred over the MLR model in the context of handling large datasets, high efficiency and accuracy with low error.
• The accuracy of PM2.5 estimation was enhanced by including the meteorological and land use data with satellite AOD.
• When estimating PM2.5 concentration, if the relationship between input and output vectors is nonlinear, the deep neural network can still approximate PM2.5 owing to its nonlinear activation functions.
• An ANN with a nonlinear activation function will perform better than MLR. However, MLR can perform better if the decision boundaries are linearly separable.
• A properly trained ANN can be extended to a large study area to fill the spatiotemporal gaps in ground-level PM observations, with associated guidelines to monitor air quality.

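The contrast drawn in the outcomes above between MLR and an ANN with nonlinear activation functions can be written compactly. The notation below is illustrative and is not taken from the paper: $x_1,\dots,x_p$ denote the input variables, $\beta_i$ the regression coefficients, $W_k$ and $\mathbf{b}_k$ the weights and biases of layer $k$, and $\varphi$ a nonlinear activation function applied element-wise; a network with two hidden layers is assumed.

```latex
\begin{align}
\widehat{\mathrm{PM}}_{2.5}^{\mathrm{MLR}} &= \beta_0 + \sum_{i=1}^{p} \beta_i x_i , \\
\widehat{\mathrm{PM}}_{2.5}^{\mathrm{ANN}} &= \mathbf{w}_3^{\top}\,
  \varphi\!\left( W_2\, \varphi\!\left( W_1 \mathbf{x} + \mathbf{b}_1 \right) + \mathbf{b}_2 \right) + b_3 .
\end{align}
```

If $\varphi$ were the identity, the second expression would collapse to a linear function of $\mathbf{x}$, i.e. to the same form as the MLR model; the nonlinearity of $\varphi$ is what allows the network to represent nonlinear input-output relationships.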
References
Alam, K., Blaschke, T., Madl, P., Mukhtar, A., Hussain, M., Trautmann, T., & Rahman, S. 2011. Aerosol size distribution and mass concentration measurements in various cities of Pakistan. Journal of Environmental Monitoring, 13, 1944-1952.
Alam, K., Khan, R., Blaschke, T., & Mukhtiar, A. 2014. Variability of aerosol optical depth and their impact on cloud properties in Pakistan. Journal of Atmospheric and Solar-Terrestrial Physics, 107, 104-112.
Alam, K., Qureshi, S., Blaschke, T. 2011. Monitoring spatio-temporal aerosol patterns over Pakistan based on MODIS, TOMS and MISR satellite data and a HYSPLIT model. Atmos. Environ. 45, 4641-4651.
Alvarez, R., Bonifaz, R., Lunetta, R.S., García, C., Gómez, G., Castro, R., Bernal, A. and Cabrera, A.L., 2003. Multitemporal land-cover classification of Mexico using Landsat MSS imagery. International Journal of Remote Sensing, 24, 2501-2514.
Arvani B, Pierce RB, Lyapustin AI, Wang Y, Ghermandi G, Teggi S. 2016. Seasonal monitoring and estimation of regional aerosol distribution over Po valley, northern Italy, using a high-resolution MAIAC product. Atmospheric Environment. 141:106-121.
Boiyo, R., Raghavendra Kumar, K., Zhao, T. 2017. Statistical intercomparison and validation of multisensory aerosol optical depth retrievals over three AERONET sites in Kenya, East Africa. Atmos. Res. 197 (15), 277-288.
Chaloulakou, A., Grivas, G., & Spyrellis, N. 2003. Neural network and multiple regression models for PM10 prediction in Athens: a comparative assessment. Journal of the Air & Waste Management Association, 53(10), 1183-1190.
Chelani, A. B. 2018. Estimating PM2.5 concentration from satellite derived aerosol optical depth and meteorological variables using a combination model. Atmospheric Pollution Research.
Ceylan, Z., & Bulkan, S. (2018). Forecasting PM10 levels using ANN and MLR: A case study for Sakarya City. Global NEST Journal, 20(2), 281-290.
Chu, D.A.; Tsai, T.-C.; Chen, J.-P.; Chang, S.-C.; Jeng, Y.-J.; Chiang, W.-L.; Lin, N.-H. Interpreting aerosol lidar profiles to better estimate surface PM2.5 for columnar AOD measurements. Atmos. Environ. 2013, 79, 172–187.
Dastoorpoor M, Idani E, Khanjani N, Goudarzi G, Bahrampour A. 2016. Relationship between air pollution, weather, traffic, and traffic-related mortality. Trauma Monthly. 21(4).
Draxler, R.R., Rolph, G.D. 2011. HYSPLIT (HYbrid Single-particle Lagrangian Integrated Trajectory) Model, Accessed via NOAA ARL READY Website. NOAA Air Resources Laboratory, Silver Spring, MD.
Elangasinghe, M., Dirks, K., Singhal, N., Costello, S., Longley, I., Salmond, J. 2014. A simple semi-empirical technique for apportioning the impact of roadways on air quality in an urban neighbourhood. Atmos. Environ. 83, 99-108.
Elbayoumi, M., Ramli, N. A., & Yusof, N. F. F. M. 2015. Development and comparison of regression models and feed forward back propagation neural network models to predict seasonal indoor PM2.5–10 and PM2.5 concentrations in naturally ventilated schools. Atmospheric Pollution Research, 6(6), 1013-1023.
Fang D and Wang J. 2017. A Novel Application of Artificial Neural Network for Wind Speed Estimation, International Journal of Sustainable Energy, 36(5), 415-429.
Ghaedrahmat Z, Vosoughi M, Birgani YT, Neisi A, Goudarzi G, Takdastan A. 2019. Prediction of O3 in the respiratory system of children using the artificial neural network model and with selection of input based on gamma test, Ahvaz, Iran. Environmental Science and Pollution Research. 26(11):10941-50.
Gupta, P and Christopher, S.A. 2009. Particulate matter air quality assessment using integrated surface, satellite, and meteorological products: multiple regression approach. J. Geophys. Res. Atmos. 114, D14205.
Karimian, H., Li, Q., Li, C., Jin, L., Fan, J., Li, Y., 2016. An Improved Method for Monitoring Fine Particulate Matter Mass Concentrations via Satellite Remote Sensing. Aerosol and Air Quality Research 16, pp. 1081-1092.
Kaskaoutis, D.G., Kharol, S.K., Sinha, P.R., Singh, R.P., Badarinath, K.V.S., Mehdi, W., Sharma, M., 2011. Contrasting aerosol trends over South Asia during the last decade based on MODIS observations. Atmospheric Measurement Techniques 4, 5275-5323.
Khwaja, H.A., Fatmi, Z., Malashock, D., Aminov, Z., Kazi, A., Siddique, A., Qureshi, J. and Carpenter, D.O. Effect of air pollution on daily morbidity in Karachi, Pakistan. Journal of Local and Global Health Science, 2013. p.3.
Liu, Y., Koutrakis, P., Kahn, R., Turquety, S., & Yantosca, R. M. (2007). Estimating fine particulate matter component concentrations and size distributions using satellite-retrieved fractional aerosol optical depth: Part 2. A case study. Journal of the Air & Waste Management Association, 57(11), 1360-1369.
Lin, C.Q.; Li, Y.; Yuan, Z.B.; Alexis, K.H.; Deng, X.J.; Tse, T.K.L.; Fung, J.C.H.; Li, C.C.; Li, Z.Y.; Lu, X.C.; et al. Estimation of long-term population exposure to PM2.5 for dense urban areas using 1-km MODIS data. Remote Sens. Environ. 2016, 179, 13-22.
Lurie K, Nayebare SR, Fatmi Z, Carpenter DO, Siddique A, Malashock D, Khan K, Zeb J, Hussain MM, Khatib F, Khwaja HA. 2019. PM2.5 in a megacity of Asia (Karachi): Source apportionment and health effects. Atmospheric Environment. 202:223-33.
Ma, X., Wang, J., Yu, F., Jia, H., Hu, Y., 2016. Can MODIS AOD be employed to derive PM2.5 in Beijing-Tianjin-Hebei over China? Atmos. Res. 181, 250–256.
Malashock, D., Khwaja, H., Fatmi, Z., Siddique, A., Lu, Y., Lin, S., & Carpenter, D. (2018). Short-Term Association between Black Carbon Exposure and Cardiovascular Diseases in Pakistan’s Largest Megacity. Atmosphere, 9(11), 420.
Maleki H, Sorooshian A, Goudarzi G, Baboli Z, Birgani YT, Rahmati M. 2019. Air pollution prediction by using an artificial neural network model. Clean Technologies and Environmental Policy. 1:1-2.
Mishra, D., Goyal, P., Upadhyay, A., 2015. Artificial intelligence based approach to forecast PM2.5 during haze episodes: a case study of Delhi, India. Atmos. Environ. 102, 239-248.
Nguyen, Thanh TN, Hung Q. Bui, Ha V. Pham, Hung V. Luu, Chuc D. Man, Hai N. Pham, Ha T. Le, and Thuy T. Nguyen. "Particulate matter concentration mapping from MODIS satellite data: a Vietnamese case study." Environmental Research Letters 10, no. 9 (2015): 095016.
Russo, A., Lind, P.G., Raischel, F., Trigo, R., Mendes, M., 2015. Neural network forecast of daily pollution concentration using optimal meteorological data at synoptic and local scales. Atmos. Pollut. Res. 6, 540-549.
Saunders, R. O., Kahl, J. D., & Ghorai, J. K. (2014). Improved estimation of PM2.5 using Lagrangian satellite-measured aerosol optical depth. Atmospheric Environment, 91, 146-153.
Soni, M., Payra, S., Verma, S., 2018. Particulate matter estimation over a semi-arid region Jaipur, India using satellite AOD and meteorological parameters. Atmos. Pollut. Res. 949-958.
Tandon, A., Yadav, S., Attri, A.K., 2010. Coupling between meteorological factors and ambient aerosol load. Atmospheric Environment 44 (9), 1237-1243.
Tsai, TC, Jeng, YJ, Chu, DA, Chen, JP, and Chang, SC. 2011. Analysis of the relationship between MODIS aerosol optical depth and particulate matter from 2006 to 2008, Atmospheric Environment, 45, 4777-4788.
Ul-Saufie, A., Yahaya, A., Ramli, N., Awang, N., Hamid, H. 2013. Future daily PM10 concentrations prediction by combining regression models and feed forward back propagation models with principle component analysis (PCA). Atmos. Environ. 77, 621-630.
van Donkelaar, A., Martin, R. V., and Park, R. J. 2006. Estimating ground-level PM2.5 using aerosol optical depth determined from satellite remote sensing, Journal of Geophysical Research: Atmospheres, 111, D21201.
Wallace, J.M. and Hobbs, P.V. 2006. Atmospheric science: an introductory survey. Vol. 92. Elsevier.
Wang, J., and S. A. Christopher. 2003. Inter-comparison between satellite derived aerosol optical thickness and PM2.5 mass: Implications for air quality studies, Geophys. Res. Lett., 30(21), 2095.
Wu, C.; Yu, J.Z. Evaluation of linear regression techniques for atmospheric applications: The importance of appropriate weighting. Atmos. Meas. Tech. 2018, 11, 1233-1250.
Xin, J., Zhang, Q., Wang, L., Gong, C., Wang, Y., Liu, Z., & Gao, W. 2014. The empirical relationship between the PM2.5 concentration and aerosol optical depth over the background of North China from 2009 to 2011. Atmospheric Research, 138, 179-188.
Xu, J., & Pei, L. 2017. Air Quality Index Prediction Using Error Back Propagation Algorithm and Improved Particle Swarm Optimization. In International Conference on Mechatronics and Intelligent Robotics (pp. 9-14). Springer, Cham.
You, W., Zang, Z., Zhang, L., Li, Y., Pan, X., Wang, W. 2016. National-Scale Estimates of Ground-Level PM2.5 Concentration in China Using Geographically Weighted Regression Based on 3 km Resolution MODIS AOD. Remote Sens. 8, 184.
Zhao X, Zhang X, Xu X, Xu J, Meng W, Pu W. Seasonal and diurnal variations of ambient PM2.5 concentration in urban and rural environments in Beijing. Atmos Environ. 2009; 43: 2893-2900.

Highlights
• New & robust integrated approach of linear regression & deep neural network is used.
• Land use and meteorological data significantly improve PM2.5 estimation.
• Neural network with nonlinear activation function can better estimate PM2.5.
• Deep neural network is more efficient and robust than previous regression models.

Declaration of interests
☒ The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
☐ The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: