Estimating fine particulate concentration using a combined approach of linear regression and artificial neural network


Journal Pre-proof

Estimating fine particulate concentration using a combined approach of linear regression and artificial neural network

Maqbool Ahmad, Khan Alam, Shahina Tariq, Sajid Anwar, Jawad Nasir, Muhammad Mansha

PII: S1352-2310(19)30689-2
DOI: https://doi.org/10.1016/j.atmosenv.2019.117050
Reference: AEA 117050
To appear in: Atmospheric Environment
Received Date: 11 May 2019
Revised Date: 8 October 2019
Accepted Date: 10 October 2019

Please cite this article as: Ahmad, M., Alam, K., Tariq, S., Anwar, S., Nasir, J., Mansha, M., Estimating fine particulate concentration using a combined approach of linear regression and artificial neural network, Atmospheric Environment (2019), doi: https://doi.org/10.1016/j.atmosenv.2019.117050. This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. © 2019 Published by Elsevier Ltd.

Graphical Abstract

[Diagram: the inputs (ground observed PM2.5 data and auxiliary data: MODIS-AOD, NDVI, meteorology, day/month of year) feed two models, a multiple linear regression model and an artificial neural network with input, hidden and output layers, which give the estimated PM2.5 concentration.]

Estimating fine particulate concentration using a combined approach of linear regression and artificial neural network

Maqbool Ahmad1*, Khan Alam2*, Shahina Tariq1, Sajid Anwar3, Jawad Nasir4, Muhammad Mansha4

1 Department of Meteorology, COMSATS University Islamabad, Pakistan
2 Department of Physics, University of Peshawar, Peshawar 25120, Khyber Pakhtunkhwa, Pakistan
3 Faculty of Computer Science & Engineering, Ghulam Ishaq Khan Institute of Engineering & Technology, Swabi, Pakistan
4 Pakistan Space and Upper Atmosphere Research Commission (SUPARCO), P.O. Box 8402, Off University Road, Karachi-75270, Pakistan

*Corresponding author:
Email: [email protected]; [email protected]
Phone: +92-91-9216727

Abstract

Fine particulate matter (PM2.5) is directly associated with degraded air quality and adverse environmental health effects. PM2.5 is gaining much attention because of its environmental impacts, but the scarcity of ground-based measurements limits the understanding of PM2.5 over many regions. This study employs a new, integrated approach of multiple linear regression (MLR) and artificial neural networks (ANN) to estimate the ground-level PM2.5 concentration using satellite aerosol optical depth (AOD), land use data and meteorological parameters. AOD from the Moderate Resolution Imaging Spectroradiometer (MODIS) aerosol product (MOD04), retrieved with the combined Dark Target and Deep Blue algorithm at 10 km spatial resolution, was obtained for Karachi, the most urbanized and industrialized city of Pakistan, during 2015-2017. The MLR model showed good agreement with the ground-observed data, with correlation coefficients (R) of 0.96, 0.87 and 0.76 for 2015, 2016 and 2017, respectively. The ANN with the error back propagation algorithm was developed using AOD with binned land use and meteorological parameters and associated spatio-temporal terms. The data were divided into three groups, with 80 % for training and 10 % each for validation and testing. The ANN yielded correlation coefficients (R) of 0.80, 0.80 and 0.78 for training, testing and validation, respectively. The study shows that including meteorological and land use data with satellite AOD enhances the accuracy of PM2.5 estimation. The results show that MLR and ANN are in close agreement and that both are capable of estimating PM2.5 concentrations. Overall, ANN is the more powerful technique for estimating particulate concentration and can be used to estimate long-term particulate matter concentration, with associated guidelines, to monitor air quality in any region.

Keywords: Multiple linear regression, artificial neural network, AOD, MODIS

1. Introduction

The unprecedented rate of urbanization in developing countries continuously threatens air quality and, with it, public health and the environment. At the same time, the simultaneous growth in population and transportation has pushed air pollutants to dangerous levels (Malashock et al., 2018). Among these pollutants, particulate matter (PM) is considered critical because of its effects on air quality and its adverse health impacts. These issues are compounded by road traffic, industrialization, infrastructure activities and the absence of environmental regulation (Nguyen et al., 2015). PM with an aerodynamic diameter of 2.5 µm or less (PM2.5) has attracted particular attention because of its links to respiratory problems, cardiovascular disease and morbidity outcomes (Chu et al., 2013; You et al., 2016). Algorithms for estimating PM2.5 concentrations depend to a great extent on ground and remote sensing observations. However, ground-level PM2.5 estimates are constrained by their limited spatial coverage.

Satellite remote sensing of aerosols has played a major role in mitigating the limitations of ground-based measurements. Satellite-derived aerosol optical depth (AOD) has proven to be a potent parameter for assessing air quality on a large scale. However, it cannot be used alone to estimate the PM2.5 concentration, because AOD measures the columnar aerosol loading while particulate matter indicates the mass concentration near the earth's surface (Alvarez et al., 2003). Despite their contrasting units, satellite AOD can be used as a surrogate for PM (Gupta and Christopher, 2009). Incorporating meteorological parameters and mixing layer height along with AOD improves the estimation of PM concentration in regression models. It has also been shown that considering the vertical profile of relative humidity, rather than the aerosol size distribution, can improve the AOD-PM2.5 correlation (Soni et al., 2018; Karimian et al., 2016). Wang and Christopher (2003) showed a good correlation (> 0.7) between ground-level PM2.5 and AOD at 0.55 μm from the Moderate Resolution Imaging Spectroradiometer (MODIS). Ma et al. (2016) enhanced the correlation results (0.68) by including meteorological parameters in their statistical model. Despite the long history of regression models in various disciplines, limitations and uncertainties still exist because of the non-linear relationships among the parameters. Therefore, these models may not provide a high level of accuracy for complex, extreme, non-linear data, and their results are affected by the regression assumptions and by multicollinearity among the variables (Ul-Saufie et al., 2011). To overcome these issues, new approaches have been introduced in several studies (Elbayoumi et al., 2015).

Artificial neural networks (ANN) consisting of more than one hidden layer are called deep neural networks (DNN). DNNs have recently shown very good performance on many vision, speech, and natural language processing (NLP) benchmarks. The structure of an ANN model can vary depending on the situation at hand and on the training behavior in terms of geo-location, meteorological and temporal conditions (Russo et al., 2015). ANNs have shown good accuracy where monitoring stations are used to measure concentrations of PM2.5 (Fang and Wang, 2017). An ANN can approximate an unknown function f(x) by learning from training data, where f(x) can represent a classification or a regression problem. In this study, an ANN is trained to estimate particulate matter as a regression problem.

Likewise, monitoring and understanding the temporal variation of particulate concentration on local scales over a complex urban environment like Karachi needs a robust algorithm in support of regression models (Dastoorpoor et al., 2016). Ground-based PM2.5 data are rarely available in most developing regions like Pakistan. Therefore, it is very difficult to assess long-term PM2.5 concentration trends. In Karachi, air quality was assessed by analyzing PM2.5, aerosols, and other trace elements collected at two locations, Tibet Center and Korangi (Lurie et al., 2019). However, that study did not discuss the influence of land cover type and meteorological conditions, together with AOD, on the estimation of PM2.5. Therefore, this study proposes a new approach to estimate PM2.5 from MODIS AOD with land use data and other climatic conditions at twenty-one locations in Karachi. A DNN, in support of MLR, was used to estimate PM2.5 concentration from August 2015 to December 2017. This distinctive approach provides a basis for estimating long-term PM2.5 concentration over the study area.

2. Materials and Methods

2.1. Data description

2.1.1. Ground measurements

The study domain covers Karachi, the most urbanized city of Pakistan. Karachi (24.86° N, 67.00° E) is a coastal city situated on the Arabian Sea. The study region covers a total area of 3527 km² and, with a population of about 20 million, is among the most densely populated in South Asia (Alam et al., 2011; Lurie et al., 2019). According to Khwaja et al. (2013), Karachi comprises residential, commercial and urban/suburban locations. It has a desert-like (subtropical) climate with scanty rainfall, humid in summer (August is the wettest month) and dry in winter (December is the driest month). The major air pollutants (aerosols) in Karachi are mainly emitted into the atmosphere from automobile exhaust gases, chemical pollutants from land vehicles, local dust transport, fossil fuels, and industrial byproducts (Alam et al., 2014). In total, 21 monitoring stations in the Karachi domain were included to estimate daily PM2.5 concentrations for 2015-2017; details are given in Table 1. The ground PM observation stations and the meteorological station in Karachi are shown by red and green dots, respectively (see Figure 1).

Figure 1. Study area with spatial distribution of monitoring stations.

The data obtained from the stations were pre-processed using geo-statistical techniques to bring them to a consistent scale. The data were then used as input parameters for the statistical and neural network models. Records with missing values and outliers were excluded.

Table 1. List of ground PM2.5 monitoring sites in Karachi.

Site  Name of Site      Site  Name of Site
1     Coastal Site 1    12    Port Qasim1
2     Coastal Site 2    13    Port Qasim2
3     Coastal Site 3    14    Coastal Site 4
4     KEPZ              15    Clifton
5     Bahria Town       16    Dawood
6     TF Highway        17    Hoshang Road
7     PITB              18    Brooks
8     FTC               19    Regal
9     NIPA              20    Sarjani Town
10    Sohrab Goat1      21    Sohrab Goat2
11    Nizamabad

Description: Coastal, Coastal, Coastal, Coastal, Residential, Transportation, Commercial, Commercial, Public, Suburb, Suburb

2.1.2. Meteorological Data

Meteorological data sets (temperature, wind speed, specific humidity, atmospheric pressure and wind direction) were acquired from weather monitoring stations installed at each location and from the Pakistan Meteorological Department (PMD) in the study region. The meteorological data were collected simultaneously with the PM concentrations.

Planetary Boundary Layer (PBL) height data from the Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2) were used. Accounting for the PBL significantly affects empirical models of the PM-AOD relationship, because hygroscopic growth changes the extinction properties of particles at different heights (Chu et al., 2013; Tsai et al., 2011; van Donkelaar et al., 2006). Conventionally, AOD products reflect the light extinction of both coarse and fine particles. However, in a well-mixed boundary layer, the specific humidity and the extinction coefficient of dry aerosol are assumed to remain unchanged (Wallace and Hobbs, 2006). Likewise, the PM2.5-AOD correlation depends approximately on the vertical profile of relative humidity (RH). To remove the effect of surface-level water vapor on PM, PM was measured in the dry state. Therefore, a humidity correction was also needed, considering the surface-level RH and the vertical distribution of RH within the boundary layer.

2.1.3. Aerosol Robotic Network

The Aerosol Robotic Network (AERONET) is a ground-based aerosol network established by the National Aeronautics and Space Administration (NASA). It measures direct sun and diffuse sky radiances with CIMEL sun/sky radiometers in the wavelength ranges of 340-1020 nm and 440-1020 nm, respectively (Bibi et al., 2017). AERONET data come in three levels: Level 1.0 (unscreened), Level 1.5 (cloud screened) and Level 2.0 (cloud screened and quality assured). In the present study, AERONET Level 2.0 data are used to validate the satellite AOD data and confirm their retrieval accuracy under the conditions of this study. For validation, the daily mean AOD at 500 nm from AERONET was converted to the MODIS wavelength of 550 nm using the following equation.

$$\mathrm{AOD}_{550} = \mathrm{AOD}_{500}\left(\frac{550}{500}\right)^{-\alpha} \tag{1}$$

where α is the Ångström exponent for the 440-870 nm wavelength range (Alam et al., 2011).
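As an illustration of Equation 1, the following Python sketch converts an AERONET AOD value at 500 nm to 550 nm. The function names and the sample AOD values are hypothetical, and the Ångström exponent is assumed here to be computed from AOD at 440 nm and 870 nm; the study does not describe its own implementation.

```python
import math


def angstrom_exponent(aod_a, aod_b, wl_a_nm, wl_b_nm):
    """Angstrom exponent from AOD measured at two wavelengths (in nm)."""
    return -math.log(aod_a / aod_b) / math.log(wl_a_nm / wl_b_nm)


def aod_at_550(aod_500, alpha):
    """Equation 1: scale AOD from 500 nm to 550 nm with exponent alpha."""
    return aod_500 * (550.0 / 500.0) ** (-alpha)


# Hypothetical daily mean AERONET values, for illustration only.
alpha = angstrom_exponent(aod_a=0.60, aod_b=0.30, wl_a_nm=440.0, wl_b_nm=870.0)
aod_550 = aod_at_550(aod_500=0.50, alpha=alpha)
```

For a positive Ångström exponent the converted AOD at 550 nm is smaller than the value at 500 nm, as expected for fine-mode dominated aerosol.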

2.1.4. Moderate Resolution Imaging Spectroradiometer

The Moderate Resolution Imaging Spectroradiometer (MODIS), mounted on the spaceborne Terra and Aqua satellites, has high radiometric resolution (Draxler and Rolph, 2011). These satellites cross the equator from north to south at 10:30 and from south to north at 13:30, respectively. MODIS has 36 spectral bands covering wavelengths from 0.41 µm to 14.4 µm and a swath width of 2330 km. Owing to its broad spectral and spatial coverage, MODIS provides detailed, near-daily measurements of aerosol optical depth and supports other applications. According to Qian et al. (2012), the Deep Blue (DB) algorithm is very effective over bright surfaces (e.g. desert, urban and semi-arid areas), while the Dark Target (DT) algorithm is suited to surfaces that are dark at mid-visible and red wavelengths, such as vegetated land. MODIS datasets are broadly used to assess various properties of aerosols and their associated climatic impacts using various retrieval algorithms.

In this study, the Collection 6, Level 2 MODIS dataset (MOD04 for Terra) with AOD at 550 nm from the combined Dark Target and Deep Blue algorithm was selected at a spatial resolution of 10 km from August 2015 to December 2017. The relationship between MODIS AOD and AERONET AOD was also calculated to assess the accuracy of the satellite data. A good correlation (0.95) between MODIS AOD and AERONET AOD was found. The satellite retrievals were re-sampled and re-projected to a common projected coordinate system. Aerosols are generally well mixed and confined within the PBL, so the AOD values normalized by the mean PBL height were treated as extinction (in km⁻¹). This quantity is therefore used to represent the surface PM concentration while accounting for variations in PBL depth (Arvani et al., 2016). To account for land use cover, the MODIS Level 3 Normalized Difference Vegetation Index (NDVI) vegetation product (MOD13) was also obtained for the same period. The MODIS-based NDVI was used in the AOD-PM2.5 model to reflect the land use cover type. The MODIS data can be downloaded from the website, www.search.earthdata.nasa.gov/search.
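The PBL normalization described above can be sketched as follows. This is a minimal illustration, assuming the PBL height is supplied in kilometres; the function name and the sample values are hypothetical and not taken from the study.

```python
def aod_per_km(aod, pbl_height_km):
    """Columnar AOD divided by PBL height, giving a mean extinction in km^-1.

    Assumes the aerosol is well mixed and confined within the PBL.
    """
    if pbl_height_km <= 0:
        raise ValueError("PBL height must be positive")
    return aod / pbl_height_km


# Hypothetical values: the same AOD over a shallower PBL implies a
# higher near-surface extinction.
deep_layer = aod_per_km(aod=0.5, pbl_height_km=1.0)
shallow_layer = aod_per_km(aod=0.5, pbl_height_km=0.5)
```

The comparison shows why PBL depth matters: an unchanged column AOD corresponds to a larger surface-level extinction when the mixing layer is shallow.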

2.2. Linear regressions

Linear regression models relate the response of the output parameters to the input parameters. The aim is mainly to understand the effect of the input parameters on the output parameters. Several studies have used both simple and multiple linear regression models for the estimation of PM2.5. However, simple linear regression models rely on strict assumptions, handle measurement uncertainties improperly and may cause non-negligible errors (Wu et al., 2018). These errors are found in estimated ground-level PM2.5 concentrations where AOD is the sole predictor (see Equation 2). Therefore, simple linear regression is not used to estimate PM2.5 in this study.

$$\mathrm{PM}_{2.5(d)} = \beta_0\,\mathrm{AOD} + \beta_1 \tag{2}$$

where AOD is the aerosol optical depth at a given region and β0 and β1 are the regression coefficients (slope and intercept, respectively) obtained by the least squares method from the regression of ground-measured PM concentration on AOD. Multiple linear regression (MLR) models can resolve these issues and improve the model reliability to a great extent. In this study, satellite AOD and meteorological parameters are taken as input parameters and PM2.5 is the output variable to be estimated. Nevertheless, these estimates are influenced by the vertical distribution of aerosols along with other meteorological parameters (Tandon et al., 2010). The MODIS AOD with meteorological variables, PBL (in km), and land use cover (e.g. NDVI) were included in the multiple regression models to improve the ground-level estimation of PM2.5, as shown by Equation 3 (Saunders et al., 2014).

$$\mathrm{PM}_{2.5(d)} = \beta_0 + \beta_1\,\mathrm{AOD} + \beta_2\,\mathrm{NDVI} + \beta_3\,(T) + \beta_4\,(WS) + \beta_5\,(H) + \beta_6\,(P) + \beta_7\,(PBL) \tag{3}$$

where β0 and β1, ..., β7 are the regression coefficients, AOD is the MODIS AOD value, d represents each day, NDVI is the normalized difference vegetation index, T is temperature, WS is wind speed, H is humidity, P is pressure and PBL is the Planetary Boundary Layer height.
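A least squares fit of the form of Equation 3 can be sketched in plain Python as below. This is an illustration only: the data are synthetic and noise-free, the coefficient values in `true_beta` are hypothetical, and the helper `fit_mlr` solves the normal equations by Gauss-Jordan elimination rather than reproducing whatever statistical software the study used. It assumes the predictors are not perfectly collinear.

```python
import random


def fit_mlr(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    A column of ones is prepended to each row, so b[0] is the intercept
    and b[1:] are the slopes of the predictors in column order.
    """
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(p)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(p):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] for i in range(p)]


# Synthetic predictors in the order AOD, NDVI, T, WS, H, P, PBL.
rng = random.Random(0)
true_beta = [50.0, 3.0, -20.0, 0.5, -0.8, -0.3, -0.02, -2.0]  # hypothetical
rows = [[rng.random() for _ in range(7)] for _ in range(200)]
targets = [
    true_beta[0] + sum(b * v for b, v in zip(true_beta[1:], r)) for r in rows
]
beta_hat = fit_mlr(rows, targets)
```

Because the synthetic targets contain no noise, the fitted coefficients in `beta_hat` match `true_beta` up to floating-point error; with real observations the fit would instead minimize the squared residuals.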

2.3. Artificial Neural Networks

The ANN is used alongside the regression models to allow a comparative analysis of the estimated PM2.5. ANN is known for its advantages over traditional regression methods: efficient computation, generalization and limited dependence on prior knowledge (Elangasinghe et al., 2014). A prototype ANN with the error back propagation algorithm is used, as shown in Figure 2.

Figure 2. Schematics of ANN with error back propagation algorithm. “In” represents input and “Out” represents the output. [Diagram: an input layer (N1, N2, with inputs x1 and x2), a hidden layer (N3, N4) and an output layer (N5), connected by weights w13, w14, w23, w24, w35 and w45. Annotations: x1 ∗ w13 + x2 ∗ w23 at N3; In5 = N3 ∗ w35 + N4 ∗ w45; Out5 = f(In5); error δ5 = Out5(1 − Out5).]

In the above figure, the ANN has one input, one hidden and one output layer with 2, 2 and 1 neurons, respectively. The output at each neuron is computed from the weighted sum of the outputs of the neurons in the previous layer. The weights (w) were initialized randomly from a uniform or normal distribution. The figure shows this for N3 using Equation 4.

$$N_3 = f(x_1 w_{13} + x_2 w_{23}) \tag{4}$$

where f(·) is the activation function (ReLU, sigmoid, etc.), N represents a neuron and x the input vector. Once the output is computed at N5, it is compared with the ground truth value to compute the error. As the initial weights were drawn from a random distribution, the current output may not match the ground truth. This error is then propagated backwards to compute the change in each weight. The error (δ) can be computed by Equation 5a. The first term, Out_{N5}(1 − Out_{N5}), is the derivative of the sigmoid activation function in the output layer, while the second term, (Target_{N5} − Out_{N5}), is the difference between the desired and the current output.

$$\delta_{N5} = \mathrm{Out}_{N5}\,(1 - \mathrm{Out}_{N5})\,(\mathrm{Target}_{N5} - \mathrm{Out}_{N5}) \tag{5a}$$

Consequently, the weight is then updated using Equation 5b, where η is the learning rate.

$$\Delta w_{35} = \delta_{N5} \cdot \mathrm{Out}_{N3} \cdot \eta + w_{35} \tag{5b}$$

In this way the weights and biases of the neural network are fine-tuned and the cost function is minimized. The network uses the mean squared error (MSE) as the cost function, which is shown in Equation 6. Based on the error back propagation algorithm, the proposed DNN is shown in Figure 3.

Figure 3. Estimation of PM2.5 with several hidden layers using DNN. [Diagram: the input layer receives spatial terms, MODIS-AOD, temporal terms, MODIS-NDVI and meteorological data; the hidden layers are trained by back propagation; the output layer gives the estimated PM2.5.]

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}(y_i - t_i)^2 \tag{6}$$

where N, y and t represent the number of training samples, the current output and the ground truth, respectively.
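The forward pass, error term and weight update of Equations 4, 5a and 5b can be sketched for the 2-2-1 prototype of Figure 2 as follows. This is a simplified illustration, not the MATLAB implementation used in the study: it assumes sigmoid activations in every neuron, no bias terms, a single hypothetical training sample, and fixed hypothetical starting weights instead of random initialization.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_step(w, x, target, lr):
    """One forward and backward pass through the 2-2-1 network.

    w maps weight names such as 'w13' (from N1 to N3) to values.
    Returns the updated weights and the squared error before the update.
    """
    x1, x2 = x
    out3 = sigmoid(x1 * w["w13"] + x2 * w["w23"])      # Equation 4
    out4 = sigmoid(x1 * w["w14"] + x2 * w["w24"])
    out5 = sigmoid(out3 * w["w35"] + out4 * w["w45"])
    d5 = out5 * (1 - out5) * (target - out5)           # Equation 5a
    # Hidden-layer errors use the output weights before they are updated.
    d3 = out3 * (1 - out3) * w["w35"] * d5
    d4 = out4 * (1 - out4) * w["w45"] * d5
    new_w = dict(w)
    new_w["w35"] += lr * d5 * out3                     # Equation 5b
    new_w["w45"] += lr * d5 * out4
    new_w["w13"] += lr * d3 * x1
    new_w["w23"] += lr * d3 * x2
    new_w["w14"] += lr * d4 * x1
    new_w["w24"] += lr * d4 * x2
    return new_w, (target - out5) ** 2


# Hypothetical starting weights and a single hypothetical sample.
weights = {"w13": 0.1, "w14": 0.4, "w23": 0.8, "w24": 0.6,
           "w35": 0.3, "w45": 0.9}
errors = []
for _ in range(200):
    weights, err = train_step(weights, x=(0.35, 0.9), target=0.8, lr=0.5)
    errors.append(err)
```

Repeating the step drives the squared error on this sample down, which is the single-sample version of minimizing the MSE of Equation 6.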

In this study, seven input dimensions were mapped to a scalar floating-point value of PM2.5 concentration during 2015-2017 using a function-fitting neural network. The model is trained on input parameters such as satellite AOD, meteorological parameters, NDVI, and spatio-temporal information. The data were divided into three groups, with 80 % for training and 10 % each for validation and testing. The DNN with the error back propagation algorithm was implemented in MATLAB.
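The 80/10/10 partition described above can be sketched as a random split of sample indices. The helper `split_indices` and the seed are hypothetical, and the sample count of 3536 is assumed from the input dataset size quoted in Section 3.4; the study's MATLAB code may divide the data differently.

```python
import random


def split_indices(n, train=0.8, val=0.1, seed=0):
    """Shuffle indices 0..n-1 and split them into train/validation/test."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = int(round(n * train))
    n_val = int(round(n * val))
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])


train_idx, val_idx, test_idx = split_indices(3536)
```

The test set takes whatever remains after the training and validation shares are rounded, so the three groups always cover every sample exactly once.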

2.4. Evaluation of estimation performance

The estimated PM2.5 concentrations from the modeling approaches are evaluated using statistical parameters, namely the correlation coefficient (R), the coefficient of determination (R²), the mean absolute error (MAE) and the root mean square error (RMSE), given in Equations 7-9 (Ghaedrahmat et al., 2019; Maleki et al., 2019; Ceylan and Bulkan, 2018; Nguyen et al., 2015).

$$R^2 = 1 - \frac{\sum_{i=1}^{n}(c_i - m_i)^2}{\sum_{i=1}^{n}(c_i - \bar{m})^2} \tag{7}$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|c_i - m_i\right| \tag{8}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(c_i - m_i)^2}, \qquad \mathrm{MSE} = (\mathrm{RMSE})^2 \tag{9}$$

where c represents the value estimated by the model, m the observed value, and n the number of observed data pairs. The overall methodology of the proposed study is shown in Figure 4.
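The error measures of Equations 8 and 9, together with the correlation coefficient R reported throughout the results, can be sketched as below. The function names and the sample values are hypothetical; R is computed here as the Pearson correlation, which is an assumption since the text does not state which correlation is used.

```python
import math


def mae(c, m):
    """Mean absolute error between estimates c and observations m."""
    return sum(abs(ci - mi) for ci, mi in zip(c, m)) / len(c)


def rmse(c, m):
    """Root mean square error between estimates c and observations m."""
    return math.sqrt(sum((ci - mi) ** 2 for ci, mi in zip(c, m)) / len(c))


def pearson_r(c, m):
    """Pearson correlation coefficient between c and m."""
    n = len(c)
    c_bar = sum(c) / n
    m_bar = sum(m) / n
    cov = sum((ci - c_bar) * (mi - m_bar) for ci, mi in zip(c, m))
    var_c = sum((ci - c_bar) ** 2 for ci in c)
    var_m = sum((mi - m_bar) ** 2 for mi in m)
    return cov / math.sqrt(var_c * var_m)


# Hypothetical estimated and observed PM2.5 values.
estimated = [10.0, 20.0, 30.0]
observed = [12.0, 18.0, 33.0]
sample_mae = mae(estimated, observed)
sample_rmse = rmse(estimated, observed)
sample_mse = sample_rmse ** 2
sample_r = pearson_r(estimated, observed)
```

RMSE penalizes large individual errors more strongly than MAE, which is why the two are usually reported together.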

Figure 4. Methodology of the proposed research work. Grd. PM and Sat. PM represent ground and satellite observed PM, respectively. [Flowchart: satellite images (MOD04/MOD13) are re-sampled, and ground PM and meteorological data (Grd. MET) are pre-processed; after spatiotemporal data integration, the data feed the multiple linear regression model and the artificial/deep neural network; the outputs are PM estimation, correlation results, temporal variation and validation.]

3. Results and Discussion

3.1. Validation of satellite-derived AOD data

The first step in the analysis was the validation of satellite-derived AOD against ground-level AERONET AOD data. Although there are many ways to validate it, the cross validation method is the most widely used, as AERONET Level 2 data are free of cloud contamination (Boiyo et al., 2017; Ma et al., 2016). This validation is important because satellite AOD is a columnar quantity with larger spatial coverage, whereas ground-level PM2.5 is point data. AERONET AOD data were compared with MODIS AOD. The study period, 2015 to 2017, was selected on the basis of the available ground PM data. The MODIS AOD was found to be in good agreement (R > 0.85) with AERONET during the study period. This agreement supports the use of MODIS AOD in PM2.5 estimation.

3.2. Model estimation and Accuracy analysis

Several MLR models were developed to estimate the PM2.5 concentrations in this study, and the best model, with the highest R (0.96), was chosen (see Equation 10).

$$\mathrm{PM}_{2.5} = 498.94 + 2.73\,\mathrm{AOD} - 111.28\,\mathrm{NDVI} + 0.24\,T - 0.53\,WS - 0.39\,H - 0.42\,P - 0.001\,PBL \tag{10}$$

In this model, MODIS-derived AOD, NDVI, temperature, wind speed, humidity, pressure and PBL were used as independent parameters. Missing AOD values at the study sites were replaced by the median values. The role of the vertical profiles of aerosol and RH in the AOD-PM2.5 correlation was assessed by fitting AOD-PM regressions without any auxiliary data, and with mixing layer height and surface-level RH. Likewise, a linear regression was performed for the vertical profile of RH. The analysis revealed that introducing the hygroscopic factor for the measurements of PM mass concentration gave a less significant improvement in the PM-AOD correlation, so it was discarded in this study, whereas introducing the PBL data led to a significant improvement and was therefore retained. The regression results of the model are given in Table 2. As is clear from this table, the statistically significant regression coefficients (p ≤ 0.01 in Table 2) indicate the adequacy of the model fit (Chelani, 2018).
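Applying Equation 10 to a set of inputs, after median imputation of missing AOD values, can be sketched as follows. The coefficients are those of Equation 10; the function names, the use of `None` to mark a missing retrieval, and all input values are hypothetical, and the inputs are assumed to be in the same units used when the model was fitted, which the text does not list.

```python
import statistics

# Coefficients of Equation 10.
EQ10 = {"intercept": 498.94, "AOD": 2.73, "NDVI": -111.28, "T": 0.24,
        "WS": -0.53, "H": -0.39, "P": -0.42, "PBL": -0.001}


def fill_missing_with_median(values):
    """Replace None entries with the median of the available values."""
    median = statistics.median(v for v in values if v is not None)
    return [median if v is None else v for v in values]


def estimate_pm25(aod, ndvi, t, ws, h, p, pbl):
    """Evaluate Equation 10 for one set of predictor values."""
    return (EQ10["intercept"] + EQ10["AOD"] * aod + EQ10["NDVI"] * ndvi
            + EQ10["T"] * t + EQ10["WS"] * ws + EQ10["H"] * h
            + EQ10["P"] * p + EQ10["PBL"] * pbl)


# Hypothetical AOD series with one missing retrieval.
aod_series = fill_missing_with_median([0.4, None, 0.6, 0.8])
# Hypothetical values for the remaining predictors.
estimates = [
    estimate_pm25(aod, ndvi=0.1, t=30.0, ws=3.0, h=60.0, p=1005.0, pbl=1000.0)
    for aod in aod_series
]
```

Since the AOD coefficient is positive, a higher AOD raises the estimate when the other predictors are held fixed, while higher NDVI, wind speed, humidity, pressure and PBL lower it.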

Table 2. Statistical results of MLR analysis.

              Coefficient  p-Value  Standard Error
Constant      433.64       0.00     42.22
AOD           2.04         0.01     1.22
NDVI          -95.01       0.00     14.40
Temperature   0.26         0.00     0.03
Wind Speed    -0.42        0.00     0.04
Humidity      -0.30        0.00     0.00
Pressure      -0.36        0.00     0.40

The scatter matrix regressions (see Figure 5) show significant correlation (R) values of 0.96, 0.87 and 0.76 between estimated and ground-observed PM2.5 concentrations in 2015, 2016 and 2017, respectively.

Figure 5. Scatter matrix with linear regression analysis of estimated PM2.5 (PM.Est) concentrations and ground-based PM2.5 (PM.Obs) concentrations in μg/m³ during 2015-2017.

The MLR model estimates were validated against mean monthly ground-level PM2.5 data at four coastal sites of the study area. These sites were selected on the basis of the available ground PM2.5 concentration data. ANN was used to estimate PM2.5 at the remaining study sites, with details given in Sections 3.4 and 3.5.

3.3. Temporal Variation of PM2.5 concentration

The mean monthly concentration of PM2.5 from August 2015 to December 2017 was obtained using Equation 10. Figure 6 depicts the time series of PM2.5 concentration over the selected sites in the study region during the study period. The results show how PM2.5 concentrations vary around the mean monthly estimated PM2.5 concentration of 30.48 ± 4.8 µg/m³ (mean ± standard deviation), while the associated monthly mean ground-based PM2.5 is 30.18 µg/m³. The regression results show the highest averaged estimated PM2.5 concentrations of 39.38 μg/m³ and 46.4 μg/m³ in November 2015 and 2016, respectively, and 39.85 μg/m³ in December 2017 in the study area. In contrast, the lowest averaged estimated PM2.5 concentrations were 22.31 μg/m³ in August 2015, 19.9 μg/m³ in March 2016 and 25.19 μg/m³ in June 2017. This variation pattern may be influenced by local meteorological conditions, land use and geography (Xu et al., 2017; Lin et al., 2016).

Figure 6. Time series variation of mean monthly PM2.5 concentrations (μg/m³ ± standard deviation) during 2015-2017. The green horizontal line represents the mean concentration level of PM2.5.

Several studies have linked the highest values of PM2.5 during November-December (cold days), as compared to the lowest values in March, June and August (warm days), with the lowest ventilation capability, lowest humidity, lower temperature, and decreased atmospheric height (Elbayoumi et al., 2015). Similar results were observed in the urbanized location of Beijing, with the highest PM2.5 concentration in winter and the lowest in summer (Zhao et al., 2009). The increasing trend of PM concentrations in metropolitan cities is associated with distributed urbanization and human activities that produce anthropogenic pollutants. The sources of these pollutants include aerosols, urban construction, industrial emissions, automobiles and the burning of fossil fuels. The overall correlation between observed and estimated PM2.5 is shown in Figure 7a. Similarly, the adequacy of a linear regression model requires that the residuals (errors) be uncorrelated, which is shown in Figure 7b (Ul-Saufie et al., 2011). Hence, both figures indicate the goodness of the MLR application in estimating the PM2.5 concentration from August 2015 to December 2017.

Figure 7. (a) Correlation between mean monthly PM2.5 concentration (μg/m³) during 2015-2017 and (b) Residual analysis.

The calculated correlation (see Figure 7a) was based solely on the complete set of ground and satellite data during the study period. From Figures 5 and 7, the cross validation results suggest that the MLR model (Equation 10) can be used to estimate ground-level PM2.5 from satellite AOD under the applied conditions. However, it can be used only at locations for which complete input datasets exist; otherwise, its results need to be validated with different approaches. The fusion of meteorological and land use data with satellite AOD has proven important for particulate matter estimation. Likewise, satellite AOD retrievals alone are not sufficient to estimate PM2.5 concentration (Chelani, 2018). Nguyen et al. (2015) followed a similar procedure to predict the ground-based PM concentration using MLR at various sites in Vietnam. Their results revealed a good correlation (0.69) between MODIS-AOD and ground level AOD concentration. Extended MLR models have given better results than simple and ordinary MLR models. Xin et al. (2014) used satellite AOD as a predictor with climatic parameters and found results comparable with other statistical models in the continental United States. Likewise, Liu et al. (2007) estimated PM2.5 from satellite AOD and obtained good estimates.

3.4. PM estimations with neural networks

This section reports the PM estimation with the NN. Each input vector consists of MODIS-AOD, MODIS-NDVI and meteorological data. Based on these inputs, 7 neurons were employed in the input layer, and the output layer consists of a single neuron for the estimated PM2.5. The input dataset (3536) for training the DNN was collected from 21 stations in Karachi during 2015-2017 and evaluated over each station, rather than over a few stations as in the case of the MLR model. To obtain optimum ANN performance, various pairs of transfer functions were tested for the hidden and output layers while changing the number of neurons in the hidden layers. Among these combinations, the results of the best combination, with maximum correlation coefficient and minimum RMSE, were taken. The average accuracy of several runs for each architecture (network) is reported in Table 3. In general, one hidden layer is enough in the network structure to estimate any nonlinear output with the required accuracy. However, more than one hidden layer can be appropriate for complex data sets (Chaloulakou et al., 2003). The network with seven neurons in the input layer, two hidden layers including 14 hidden neurons, and a single neuron in the output layer gave the best estimates.
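To make the layer notation concrete, the following is a minimal sketch of a fully connected feed-forward network with seven inputs and one output, written in plain Python. It is not the authors' implementation: the tanh hidden activation, the identity output, the random initialisation, and the sample input values are all assumptions chosen for illustration, and the 7-32-32-1 layout is simply the first architecture listed in Table 3.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for one fully connected layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def build_network(layer_sizes, seed=0):
    """Create one (weights, biases) pair per pair of adjacent layers."""
    rng = random.Random(seed)
    return [init_layer(a, b, rng)
            for a, b in zip(layer_sizes, layer_sizes[1:])]

def count_parameters(layer_sizes):
    """Total number of weights and biases in the network."""
    return sum(a * b + b for a, b in zip(layer_sizes, layer_sizes[1:]))

def forward(network, x):
    """Propagate one input vector through the network.

    Assumption: tanh on hidden layers, identity on the output layer.
    """
    activation = list(x)
    last = len(network) - 1
    for index, (weights, biases) in enumerate(network):
        z = [sum(w * a for w, a in zip(row, activation)) + b
             for row, b in zip(weights, biases)]
        activation = z if index == last else [math.tanh(v) for v in z]
    return activation

sizes = [7, 32, 32, 1]                           # 7 inputs, two hidden layers, 1 output
net = build_network(sizes)
sample = [0.5, 0.3, 0.1, -0.2, 0.7, 0.0, 0.4]    # hypothetical normalised inputs
estimate = forward(net, sample)                  # one-element list: the PM2.5 estimate
print(count_parameters(sizes), estimate)
```

The network here is untrained, so the printed estimate is meaningless; the sketch only shows how an input vector of seven predictors maps to a single output value and how the number of trainable parameters grows with the hidden-layer widths.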

Table 3. Statistical results (R) for different ANN networks.

S.No.   Network          Validation   Test set   Training set   All
1       7-32-32-1        0.78         0.78       0.80           0.80
2       7-32-64-1        0.79         0.72       0.81           0.80
3       7-128-128-1      0.79         0.74       0.81           0.80
4       7-256-256-1      0.79         0.76       0.80           0.80
5       7-32-32-32-1     0.78         0.80       0.80           0.80
6       7-32-64-128-1    0.79         0.78       0.76           0.80
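The two selection criteria used above, the correlation coefficient (R) and the RMSE between observed and estimated PM2.5, can be computed as in the short sketch below. The function names and the four-point toy series are hypothetical and serve only to illustrate the formulas; they are not data from this study.

```python
import math

def pearson_r(observed, estimated):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_e = sum(estimated) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(observed, estimated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return cov / math.sqrt(var_o * var_e)

def rmse(observed, estimated):
    """Root mean squared error between two equal-length series."""
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)

# Hypothetical toy values, not measurements from the paper.
observed = [10.0, 20.0, 30.0, 40.0]
estimated = [12.0, 18.0, 33.0, 37.0]
r_value = pearson_r(observed, estimated)
error = rmse(observed, estimated)
print(round(r_value, 3), round(error, 3))
```

A higher R together with a lower RMSE indicates a better architecture; R alone does not penalise a systematic bias, which is why both are usually reported.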


3.5. Daily estimation of PM2.5

Daily averaged PM2.5 concentrations were estimated by the pre-trained DNNs (see Figure 8). The convergence of the training was decided by evaluating the accuracy on the validation set after each epoch. If the accuracy on the validation set did not increase, we reduced the learning rate and continued for several iterations. If the accuracy did not improve for 6 checks, we stopped the training and concluded that training had converged.

Figure 8. Time series variation of daily mean PM2.5 concentrations (μg/m³) during 2015-2017.

The relationships between observed and estimated PM2.5 concentrations using ANN are shown in Figure 9.

Figure 9. The correlation (R) values for training, validation, test, and all data regression analysis.

The R values for the regression analysis of training, validation, test and all data are 0.80, 0.78, 0.80, and 0.80, respectively. An R value > 0.70 is considered significant here and indicates a strong positive linear relationship between input and output vectors. Therefore, the ANN model proposed in this study is capable of handling such random variation and is considered good and acceptable. Correlation values nearly equal to 0.6 have been considered normal in the context of random climate change (Ul-Saufie et al., 2013). In addition, Figure 10 shows the progress of network training. It can be observed that the mean squared error (MSE) between the target and estimated output decreases as training proceeds. The plot shows that neither under-fitting nor over-fitting occurred. The convergence criterion determines when the network training should stop; this is required to avoid both under-fitting and over-fitting. In the proposed work, we divided the data set into training, validation, and test sets. Convergence was decided by evaluating the validation accuracy after each epoch. If the accuracy on the validation set did not increase for three consecutive iterations, we reduced the learning rate and continued training. If the accuracy did not improve for six checks and the learning rate became very small, we stopped the training and concluded that training had converged. As shown in the figure, convergence was achieved at the 23rd epoch, and the error did not change much thereafter.
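The stopping rule described above can be sketched as a small controller that is called once per validation check. This is an illustrative reading of the text, not the authors' code: the class name, the decay factor of 0.1, the starting learning rate, the floor that stands in for "very small", and the sequence of validation scores are all assumptions.

```python
class PlateauController:
    """Lowers the learning rate on a plateau and signals when to stop."""

    def __init__(self, learning_rate, min_learning_rate,
                 reduce_patience=3, stop_patience=6, factor=0.1):
        self.learning_rate = learning_rate
        self.min_learning_rate = min_learning_rate   # assumed floor for "very small"
        self.reduce_patience = reduce_patience       # stalled checks before each reduction
        self.stop_patience = stop_patience           # stalled checks before stopping is allowed
        self.factor = factor                         # assumed multiplicative decay
        self.best = float("-inf")
        self.stalled = 0

    def update(self, validation_score):
        """Register one validation check; return True if training should stop."""
        if validation_score > self.best:
            self.best = validation_score
            self.stalled = 0
            return False
        self.stalled += 1
        if self.stalled % self.reduce_patience == 0:
            # Lower the learning rate, but never below the floor.
            self.learning_rate = max(self.learning_rate * self.factor,
                                     self.min_learning_rate)
        return (self.stalled >= self.stop_patience
                and self.learning_rate <= self.min_learning_rate)

# Hypothetical validation scores, one per epoch.
scores = [0.60, 0.70, 0.75, 0.78, 0.78, 0.78, 0.77, 0.78, 0.78, 0.78, 0.78, 0.78]
controller = PlateauController(learning_rate=1e-2, min_learning_rate=2e-4)
stopped_at = None
for epoch, score in enumerate(scores, start=1):
    if controller.update(score):
        stopped_at = epoch
        break
print(stopped_at, controller.learning_rate)
```

In a real training loop the weights from the best validation check would normally be kept, and the controller's current learning rate would be passed to the optimiser at the start of each epoch.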


Figure 10. Mean square error values for training, validation and test.


4. Conclusion

The main objective of the proposed study was to use a combined approach of multiple linear regression (MLR) and machine learning (ANN) to estimate the PM2.5 concentrations from 2015 to 2017 in the most urbanized city of Pakistan. In addition, satellite AOD was combined with meteorological and land use (NDVI) parameters to develop these models. Both models showed a positive response with respect to the target. The MLR model was used to estimate mean monthly PM2.5 only at sites where ground data were present, that is, at four out of seven coastal sites. Therefore, the MLR results were good but limited to a few stations. In contrast, the ANN was prepared and trained on input datasets to estimate the daily averaged PM2.5 over all study sites. The results were consistent and strongly correlated with observed PM2.5 concentration. The results obtained from the combined approach of MLR and ANN models were found to be significant with respect to ground-observed measurements. The main outcomes are given below.

• The results revealed that the ANN can be preferred over the MLR model in the context of handling large datasets, high efficiency and accuracy with low error.

• The accuracy of PM2.5 estimation was enhanced by including the meteorological and land use data with satellite AOD.

• In the case of estimating PM2.5 concentration, if the relationship between input and output vectors is nonlinear, the deep neural network can still approximate PM2.5 owing to its nonlinear activation functions.

• An ANN with a nonlinear activation function will perform better than MLR. However, MLR can perform better if decision boundaries can be linearly separated.

• A properly trained ANN can be extended to a large study area to fill the spatiotemporal gaps in the ground-level PM observations with associated guidelines to monitor air quality.
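The role of the nonlinear activation function mentioned above can be made explicit with a short derivation. As an illustrative sketch, consider a network with one hidden layer, where the symbols W_1, b_1, W_2, b_2 (weights and biases), σ (activation function), x (input vector) and ŷ (estimated PM2.5) are notation introduced here rather than taken from the paper:

```latex
\begin{aligned}
\hat{y} &= W_2\,\sigma(W_1 x + b_1) + b_2 \\
\text{if } \sigma(z) = z:\quad
\hat{y} &= W_2\,(W_1 x + b_1) + b_2
         = (W_2 W_1)\,x + (W_2 b_1 + b_2)
\end{aligned}
```

With an identity activation the network collapses to a single linear map plus an intercept, which has the same functional form as a linear regression on x. Only a nonlinear σ allows the network to represent relationships that a linear model cannot.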



References

Alam, K., Blaschke, T., Madl, P., Mukhtar, A., Hussain, M., Trautmann, T., Rahman, S., 2011. Aerosol size distribution and mass concentration measurements in various cities of Pakistan. Journal of Environmental Monitoring 13, 1944-1952.

Alam, K., Khan, R., Blaschke, T., Mukhtiar, A., 2014. Variability of aerosol optical depth and their impact on cloud properties in Pakistan. Journal of Atmospheric and Solar-Terrestrial Physics 107, 104-112.

Alam, K., Qureshi, S., Blaschke, T., 2011. Monitoring spatio-temporal aerosol patterns over Pakistan based on MODIS, TOMS and MISR satellite data and a HYSPLIT model. Atmos. Environ. 45, 4641-4651.

Alvarez, R., Bonifaz, R., Lunetta, R.S., García, C., Gómez, G., Castro, R., Bernal, A., Cabrera, A.L., 2003. Multitemporal land-cover classification of Mexico using Landsat MSS imagery. International Journal of Remote Sensing 24, 2501-2514.

Arvani, B., Pierce, R.B., Lyapustin, A.I., Wang, Y., Ghermandi, G., Teggi, S., 2016. Seasonal monitoring and estimation of regional aerosol distribution over Po valley, northern Italy, using a high-resolution MAIAC product. Atmospheric Environment 141, 106-121.

Boiyo, R., Raghavendra Kumar, K., Zhao, T., 2017. Statistical intercomparison and validation of multisensory aerosol optical depth retrievals over three AERONET sites in Kenya, East Africa. Atmos. Res. 197 (15), 277-288.

Chaloulakou, A., Grivas, G., Spyrellis, N., 2003. Neural network and multiple regression models for PM10 prediction in Athens: a comparative assessment. Journal of the Air & Waste Management Association 53(10), 1183-1190.

Chelani, A.B., 2018. Estimating PM2.5 concentration from satellite derived aerosol optical depth and meteorological variables using a combination model. Atmospheric Pollution Research.

Ceylan, Z., Bulkan, S., 2018. Forecasting PM10 levels using ANN and MLR: A case study for Sakarya City. Global NEST Journal 20(2), 281-290.

Chu, D.A., Tsai, T.-C., Chen, J.-P., Chang, S.-C., Jeng, Y.-J., Chiang, W.-L., Lin, N.-H., 2013. Interpreting aerosol lidar profiles to better estimate surface PM2.5 for columnar AOD measurements. Atmos. Environ. 79, 172-187.

Dastoorpoor, M., Idani, E., Khanjani, N., Goudarzi, G., Bahrampour, A., 2016. Relationship between air pollution, weather, traffic, and traffic-related mortality. Trauma Monthly 21(4).

Draxler, R.R., Rolph, G.D., 2011. HYSPLIT (HYbrid Single-particle Lagrangian Integrated Trajectory) Model, accessed via NOAA ARL READY Website. NOAA Air Resources Laboratory, Silver Spring, MD.

Elangasinghe, M., Dirks, K., Singhal, N., Costello, S., Longley, I., Salmond, J., 2014. A simple semi-empirical technique for apportioning the impact of roadways on air quality in an urban neighbourhood. Atmos. Environ. 83, 99-108.

Elbayoumi, M., Ramli, N.A., Yusof, N.F.F.M., 2015. Development and comparison of regression models and feed forward back propagation neural network models to predict seasonal indoor PM2.5–10 and PM2.5 concentrations in naturally ventilated schools. Atmospheric Pollution Research 6(6), 1013-1023.

Fang, D., Wang, J., 2017. A Novel Application of Artificial Neural Network for Wind Speed Estimation. International Journal of Sustainable Energy 36(5), 415-429.

Ghaedrahmat, Z., Vosoughi, M., Birgani, Y.T., Neisi, A., Goudarzi, G., Takdastan, A., 2019. Prediction of O3 in the respiratory system of children using the artificial neural network model and with selection of input based on gamma test, Ahvaz, Iran. Environmental Science and Pollution Research 26(11), 10941-50.

Gupta, P., Christopher, S.A., 2009. Particulate matter air quality assessment using integrated surface, satellite, and meteorological products: multiple regression approach. J. Geophys. Res. Atmos. 114, D14205.

Karimian, H., Li, Q., Li, C., Jin, L., Fan, J., Li, Y., 2016. An Improved Method for Monitoring Fine Particulate Matter Mass Concentrations via Satellite Remote Sensing. Aerosol and Air Quality Research 16, 1081-1092.

Kaskaoutis, D.G., Kharol, S.K., Sinha, P.R., Singh, R.P., Badarinath, K.V.S., Mehdi, W., Sharma, M., 2011. Contrasting aerosol trends over South Asia during the last decade based on MODIS observations. Atmospheric Measurement Techniques 4, 5275-5323.

Khwaja, H.A., Fatmi, Z., Malashock, D., Aminov, Z., Kazi, A., Siddique, A., Qureshi, J., Carpenter, D.O., 2013. Effect of air pollution on daily morbidity in Karachi, Pakistan. Journal of Local and Global Health Science, p. 3.

Liu, Y., Koutrakis, P., Kahn, R., Turquety, S., Yantosca, R.M., 2007. Estimating fine particulate matter component concentrations and size distributions using satellite-retrieved fractional aerosol optical depth: Part 2. A case study. Journal of the Air & Waste Management Association 57(11), 1360-1369.

Lin, C.Q., Li, Y., Yuan, Z.B., Alexis, K.H., Deng, X.J., Tse, T.K.L., Fung, J.C.H., Li, C.C., Li, Z.Y., Lu, X.C., et al., 2016. Estimation of long-term population exposure to PM2.5 for dense urban areas using 1-km MODIS data. Remote Sens. Environ. 179, 13-22.

Lurie, K., Nayebare, S.R., Fatmi, Z., Carpenter, D.O., Siddique, A., Malashock, D., Khan, K., Zeb, J., Hussain, M.M., Khatib, F., Khwaja, H.A., 2019. PM2.5 in a megacity of Asia (Karachi): Source apportionment and health effects. Atmospheric Environment 202, 223-33.

Ma, X., Wang, J., Yu, F., Jia, H., Hu, Y., 2016. Can MODIS AOD be employed to derive PM2.5 in Beijing-Tianjin-Hebei over China? Atmos. Res. 181, 250-256.

Malashock, D., Khwaja, H., Fatmi, Z., Siddique, A., Lu, Y., Lin, S., Carpenter, D., 2018. Short-Term Association between Black Carbon Exposure and Cardiovascular Diseases in Pakistan’s Largest Megacity. Atmosphere 9(11), 420.

Maleki, H., Sorooshian, A., Goudarzi, G., Baboli, Z., Birgani, Y.T., Rahmati, M., 2019. Air pollution prediction by using an artificial neural network model. Clean Technologies and Environmental Policy 1, 1-2.

Mishra, D., Goyal, P., Upadhyay, A., 2015. Artificial intelligence based approach to forecast PM2.5 during haze episodes: a case study of Delhi, India. Atmos. Environ. 102, 239-248.

Nguyen, T.T.N., Bui, H.Q., Pham, H.V., Luu, H.V., Man, C.D., Pham, H.N., Le, H.T., Nguyen, T.T., 2015. Particulate matter concentration mapping from MODIS satellite data: a Vietnamese case study. Environmental Research Letters 10(9), 095016.

Russo, A., Lind, P.G., Raischel, F., Trigo, R., Mendes, M., 2015. Neural network forecast of daily pollution concentration using optimal meteorological data at synoptic and local scales. Atmos. Pollut. Res. 6, 540-549.

Saunders, R.O., Kahl, J.D., Ghorai, J.K., 2014. Improved estimation of PM2.5 using Lagrangian satellite-measured aerosol optical depth. Atmospheric Environment 91, 146-153.

Soni, M., Payra, S., Verma, S., 2018. Particulate matter estimation over a semi-arid region Jaipur, India using satellite AOD and meteorological parameters. Atmos. Pollut. Res. 949-958.

Tandon, A., Yadav, S., Attri, A.K., 2010. Coupling between meteorological factors and ambient aerosol load. Atmospheric Environment 44 (9), 1237-1243.

Tsai, T.C., Jeng, Y.J., Chu, D.A., Chen, J.P., Chang, S.C., 2011. Analysis of the relationship between MODIS aerosol optical depth and particulate matter from 2006 to 2008. Atmospheric Environment 45, 4777-4788.

Ul-Saufie, A., Yahaya, A., Ramli, N., Awang, N., Hamid, H., 2013. Future daily PM10 concentrations prediction by combining regression models and feed forward back propagation models with principle component analysis (PCA). Atmos. Environ. 77, 621-630.

van Donkelaar, A., Martin, R.V., Park, R.J., 2006. Estimating ground-level PM2.5 using aerosol optical depth determined from satellite remote sensing. Journal of Geophysical Research: Atmospheres 111, D21201.

Wallace, J.M., Hobbs, P.V., 2006. Atmospheric science: an introductory survey. Vol. 92. Elsevier.

Wang, J., Christopher, S.A., 2003. Inter-comparison between satellite derived aerosol optical thickness and PM2.5 mass: Implications for air quality studies. Geophys. Res. Lett. 30(21), 2095.

Wu, C., Yu, J.Z., 2018. Evaluation of linear regression techniques for atmospheric applications: The importance of appropriate weighting. Atmos. Meas. Tech. 11, 1233-1250.

Xin, J., Zhang, Q., Wang, L., Gong, C., Wang, Y., Liu, Z., Gao, W., 2014. The empirical relationship between the PM2.5 concentration and aerosol optical depth over the background of North China from 2009 to 2011. Atmospheric Research 138, 179-188.

Xu, J., Pei, L., 2017. Air Quality Index Prediction Using Error Back Propagation Algorithm and Improved Particle Swarm Optimization. In: International Conference on Mechatronics and Intelligent Robotics (pp. 9-14). Springer, Cham.

You, W., Zang, Z., Zhang, L., Li, Y., Pan, X., Wang, W., 2016. National-Scale Estimates of Ground-Level PM2.5 Concentration in China Using Geographically Weighted Regression Based on 3 km Resolution MODIS AOD. Remote Sens. 8, 184.

Zhao, X., Zhang, X., Xu, X., Xu, J., Meng, W., Pu, W., 2009. Seasonal and diurnal variations of ambient PM2.5 concentration in urban and rural environments in Beijing. Atmos. Environ. 43, 2893-2900.

Highlights



• New & robust integrated approach of linear regression & deep neural network is used.

• Land use and meteorological data significantly improve PM2.5 estimation.

• Neural network with nonlinear activation function can better estimate PM2.5.

• Deep neural network is more efficient and robust than previous regression models.

Declaration of interests

☒ The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

☐ The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: