Evaluating the effect of air pollution on global and diffuse solar radiation prediction using support vector machine modeling based on sunshine duration and air temperature


Renewable and Sustainable Energy Reviews 94 (2018) 732–747


Junliang Fan (a), Lifeng Wu (a,b,*), Fucang Zhang (a,c), Huanjie Cai (a,c), Xiukang Wang (d), Xianghui Lu (b), Youzhen Xiang (a)

(a) Key Laboratory of Agricultural Soil and Water Engineering in Arid and Semiarid Areas of Ministry of Education, Northwest A&F University, Yangling 712100, China
(b) School of Hydraulic and Ecological Engineering, Nanchang Institute of Technology, Nanchang 330099, China
(c) Institute of Water-saving Agriculture in Arid Areas of China, Northwest A&F University, Yangling 712100, China
(d) College of Life Sciences, Yan’an University, Yan’an 716000, China

Keywords: Global dimming; Air pollutant; Air quality index; Suspended particulate matters; Ozone

Abstract

Increasing air pollutants attenuate surface solar radiation and can thus be influential variables for solar radiation prediction. In this study, six air pollutants (PM2.5, PM10, SO2, NO2, CO and O3) as well as the air quality index (AQI) were chosen to analyze their single and integrated effects on daily global and diffuse solar radiation (Rs and Rd) prediction. Seven single air pollution parameters, 15 combinations of two parameters and 20 combinations of three parameters were considered using Support Vector Machine (SVM) models based on sunshine duration or air temperature. Daily meteorological and air pollution data from January 2014 to December 2015 from China's capital city of Beijing were used to train the SVM models, and data from January 2016 to December 2016 were used for testing. Results show that AQI was the most relevant air pollution parameter for both Rs and Rd prediction, followed by O3 for Rs and by PM2.5 for Rd, each differing only slightly from AQI. The combination of PM10 and O3 and the combination of PM2.5 and O3 were the most influential combinations of two air pollution inputs for Rs and Rd prediction, respectively. The combination of PM2.5, PM10 and O3 was the best combination of three air pollution inputs for both daily Rs and Rd prediction. Compared with SVM models that did not consider air pollution, the accuracy of SVM models with the most influential combinations of one, two and three air pollution inputs was improved by 13.9%, 19.8% and 22.2%, respectively, in terms of RMSE for sunshine-based Rs. The corresponding values were 15.2%, 22.0% and 22.8% for temperature-based Rs, 16.1%, 21.5% and 24.5% for sunshine-based Rd, and 16.8%, 22.0% and 23.3% for temperature-based Rd. The results demonstrate the importance of appropriately selecting air pollution inputs to improve the accuracy of Rs and Rd prediction in air-polluted regions.

1. Introduction

Among the renewable and sustainable energy resources (e.g. solar, wind, biomass, geothermal and hydroelectric), solar energy has attracted much attention due to its abundant availability on the Earth's surface and its environmental friendliness [57,31,45]. Global solar radiation (Rs) and its diffuse component (Rd) at a given location are of great importance for agricultural and hydrological modeling as well as the optimal design and application of solar energy systems [2,18,25,29,38,41]. Nevertheless, unlike other meteorological variables (e.g. sunshine duration and air temperature), reliable measurements of global and diffuse solar radiation are not available at many locations worldwide, particularly in developing countries, due to the high costs and the difficulty of installing and maintaining measuring instruments such as pyranometers [23]. Taking China as an example, there are 726 long-term weather stations, of which 98 record global solar radiation but only 17 measure diffuse solar radiation [26]. Therefore, various techniques have been proposed to predict Rs and Rd, e.g. empirical models [1,4,8,24,15] and machine learning models [11,16,17,34,37,40].

The empirical models are most commonly used due to their simplicity and low computational costs [22,23,44]. Over the past few decades, many efforts have been made to predict Rs and Rd with various types of empirical models, e.g. sunshine-based models [24,25,36,5,6] and temperature-based models [16,17,22,30,50]. Generally, the sunshine-based empirical models provide better estimates than those based on maximum/minimum temperature [13,54]. However, when sunshine duration measurements are lacking, the temperature-based empirical models are highly preferred for solar radiation prediction because air temperature is the most widely available meteorological variable at stations around the world [22]. Although empirical models have been widely employed for solar radiation prediction, they have difficulty dealing with nonlinear and multidimensional relationships in noisy environments [28]. Thus, various machine learning techniques have been applied to predict solar radiation, e.g. Artificial Neural Networks (ANN) [7,51,32], Support Vector Machines (SVM) [37,40] and Adaptive Neuro Fuzzy Inference System (ANFIS) [34,56]. Among these machine learning models, the SVM model has recently been employed for predicting solar radiation from sunshine duration or air temperature data owing to its more accurate predictions compared with the other models [21,33,39,47]. Apart from sunshine duration and air temperature, other meteorological and geographical factors such as precipitation, relative humidity and location (longitude and latitude) were also used for solar radiation prediction [54]. Mohammadi et al. [34] used ANFIS models to evaluate the effect of nine important parameters on the prediction of daily global and diffuse solar radiation in Iran. According to their results, n, N and Ra were generally the most influential input parameters for predicting global and diffuse solar radiation, while Tmin and Tmax were also significant parameters when sunshine data were lacking.

Recently, decreasing global solar radiation, referred to as “global dimming”, has been observed at many locations around the globe. This has been largely attributed to the anthropogenic alteration of the pollutant load in the atmosphere, such as suspended particulate matters smaller than 2.5 µm in aerodynamic diameter (PM2.5), suspended particulate matters smaller than 10 µm in aerodynamic diameter (PM10), nitrogen dioxide (NO2), sulfur dioxide (SO2), carbon monoxide (CO) and ozone (O3), which attenuate surface solar radiation by scattering and absorbing it [27,48,52]. Solar energy systems designed on the basis of conventional meteorological data may thus struggle to meet the current solar power demand, which inevitably influences the development and exploration of solar energy resources.

Nomenclature
C: Penalty parameter of the error
CO: Carbon monoxide (mg m−3)
k: Number of observations
n: Sunshine duration (h)
N: Maximum sunshine duration (h)
NO2: Nitrogen dioxide (μg m−3)
O3: Ozone (μg m−3)
PM2.5: Suspended particulate matters < 2.5 µm in aerodynamic diameter (μg m−3)
PM10: Suspended particulate matters < 10 µm in aerodynamic diameter (μg m−3)
Ra: Extra-terrestrial solar radiation (MJ m−2 d−1)
Rd: Diffuse solar radiation (MJ m−2 d−1)
Rs: Global solar radiation (MJ m−2 d−1)
SO2: Sulfur dioxide (μg m−3)
Tmax: Maximum temperature (°C)
Tmin: Minimum temperature (°C)
φ: Higher-dimensional feature space
ω: Weights vector
ɛ: Tube size
λ: Regularization parameter
γ: Minimum loss
Ω: Regularization term

Abbreviations: ANFIS, Adaptive Neuro Fuzzy Inference System; ANN, Artificial Neural Networks; API, Air Pollution Index; AQI, Air Quality Index; BTH, Beijing–Tianjin–Hebei; GDP, Gross Domestic Product; MAE, Mean absolute error (MJ m−2 d−1); R2, Coefficient of determination; RF, Random Forest; RMSE, Root mean square error (MJ m−2 d−1); SVM, Support Vector Machine
* Corresponding author at: School of Hydraulic and Ecological Engineering, Nanchang Institute of Technology, Nanchang 330099, China. E-mail address: [email protected] (L. Wu).
https://doi.org/10.1016/j.rser.2018.06.029
Received 28 February 2018; Received in revised form 13 June 2018; Accepted 13 June 2018
1364-0321/ © 2018 Elsevier Ltd. All rights reserved.

Fig. 1. The geographical locations of the 14 major cities in the Beijing-Tianjin-Hebei (BTH) region of China. The average seasonal AQI during 2014–2016 in each city is also presented.

Fig. 2. Observed daily global and diffuse solar radiation (Rs and Rd), sunshine duration (n), maximum/minimum temperature (Tmax and Tmin) and the mass concentrations of six major air pollutants (PM2.5, PM10, SO2, NO2, CO and O3) along with calculated air quality index (AQI) from January 2014 to December 2016 in Beijing.


Table 1 Minimum, maximum, mean and standard deviation values of the 12 input parameters and the two output parameters (Rs and Rd) during 2014–2016 used in this study. Parameter

n (hr) N (hr) Tmin (°C)

Tmax (°C)

PM2.5 (μg m−3)

PM10 (μg m−3)

SO2 (μg m−3)

NO2 (μg m−3)

CO (mg m−3)

O3 (μg m−3)

AQI

Ra (MJ m−2 d−1)

Rs (MJ m−2 d−1)

Rd (MJ m−2 d−1)

Min

0.0

9.2

0.0

2.0

8.0

0.2

2.0

18.0

13.8

0.0

0.0

13.9 6.7 4.1

14.8 12.0 1.9

− 10.9 41.1 19.2 11.3

5.0

Max Mean Standard deviation

− 15.2 27.9 9.0 10.8

476.0 78.1 69.3

501.0 103.3 76.4

133.0 13.4 16.7

152.0 49.3 24.7

8.1 1.2 1.0

169.0 56.8 38.2

478.0 41.9 112.5 28.4 76.3 10.0

31.1 14.3 7.5

17.5 6.7 3.9

Therefore, attempts have recently been made to evaluate the effects of air pollution on the prediction of solar radiation. Considering the air pollution index (API), an index for daily prediction and assessment of air quality, Zhao et al. [55] developed linear, exponential and logarithmic empirical models for predicting Rs from sunshine duration in nine cities of China. Their results emphasized the importance of API for reducing statistical errors and improving prediction accuracy. Similarly, considering API and geographical factors, Suthar et al. [43] presented linear, exponential-linear and exponential-quadratic empirical models for estimating Rs in nine cities of India. They found that air pollution was more important than location for Rs estimation. Furthermore, Sun et al. [42] evaluated the potential of API in estimating global solar radiation using Random Forest (RF) models for three cities of China with different air pollution conditions. Their results indicated that the RF models developed with API data performed better than the empirical models of Zhao et al. [55], with 2.0–17.4% lower values of root mean square error in the testing stage. Vakili et al. [45] evaluated the effects of suspended particulate matters on the prediction of daily Rs using ANN models. The results showed that their proposed model had relatively higher accuracy than those of previous researchers. Furlan et al. [19] estimated hourly diffuse solar radiation by introducing a new regression model, which considered the effects of cloudiness and air pollution (concentration of particulate matters) in addition to meteorological variables for Rd prediction. The results showed that their model was superior to previously developed ones. Yao et al. [53] also developed new empirical models for Rd estimation in four cities of China with heavy fog and haze by considering the air quality index (AQI), an index similar to API. They found that the AQI-modified models were more accurate than the existing daily diffuse solar radiation models.

The air pollution index (API) or air quality index (AQI) has been considered as a relative index for air quality evaluation rather than for solar radiation estimation. Both are mostly determined as the maximum of the individual air pollution or quality indices (IAPI or IAQI) calculated for each air pollutant from its mass concentration [53], which means that the API or AQI only represents the IAPI or IAQI of the primary air pollutant. However, solar radiation attenuation is the integrated result of multiple air pollutants rather than of a single parameter, even the dominant one. More importantly, the attenuation of solar radiation through absorption or scattering by particulate matters and gaseous pollutants is expected to be more closely associated with their absolute mass concentrations than with a relative index.

It is apparent from the literature reviewed above that the potential of the calculated API or AQI and of the concentration of integrated particulate matters (PM2.5 + PM10) has been evaluated in modeling solar radiation, but the effects of other specific air pollution parameters (e.g. PM2.5, PM10, NO2, SO2, CO and O3), and particularly their integrated effects on Rs and Rd prediction, have not been studied so far. Therefore, the main aim of the present study was to explore the significance and potential effects of using different combinations of air pollution input parameters on the prediction accuracy of global and diffuse solar radiation by SVM modeling in regions with heavy urban air pollution, which has implications for the selection of air pollution inputs when modeling global and diffuse solar radiation.

Fig. 3. Simple flowchart of the proposed SVM selection procedure.

Table 2
Statistical results of SVM models for predicting daily global and diffuse solar radiation only based on sunshine duration or air temperature in the training and testing stages without considering the air pollution effects. RMSE and MAE are in MJ m−2 d−1.

Model type | Inputs | R2 training | R2 testing | RMSE training | RMSE testing | MAE training | MAE testing
Rs_n | n, N and Ra | 0.924 | 0.943 | 2.193 | 1.794 | 1.575 | 1.248
Rs_T | Tmin, Tmax and Ra | 0.694 | 0.743 | 4.235 | 3.817 | 3.142 | 2.709
Rd_n | n, N and Ra | 0.740 | 0.748 | 1.962 | 1.896 | 1.369 | 1.344
Rd_T | Tmin, Tmax and Ra | 0.569 | 0.621 | 2.543 | 2.420 | 2.027 | 1.857

Note: Rs_n: sunshine-based global solar radiation; Rs_T: temperature-based global solar radiation; Rd_n: sunshine-based diffuse solar radiation; Rd_T: temperature-based diffuse solar radiation; the same below.
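The percentage improvements quoted in the abstract can be related to the baseline errors in Table 2 with a short calculation. The sketch below assumes the improvement is the relative reduction in testing-stage RMSE against the baseline model; the paper does not spell out the formula, and the function name `rmse_improvement` is illustrative. The input values are the testing RMSEs for sunshine-based Rs from Table 2 (baseline) and from the best one-, two- and three-input models in Tables 3, 4 and 8.

```python
def rmse_improvement(baseline_rmse: float, model_rmse: float) -> float:
    """Relative RMSE reduction (%) of a model against its baseline (assumed formula)."""
    return 100.0 * (baseline_rmse - model_rmse) / baseline_rmse


# Testing-stage RMSE for sunshine-based Rs (MJ m-2 d-1), taken from the tables.
baseline = 1.794  # Table 2: n, N and Ra only
with_pollution = {
    "AQI": 1.545,                # Table 3, best single input
    "PM10 + O3": 1.439,          # Table 4, best pair
    "PM2.5 + PM10 + O3": 1.396,  # Table 8, best triple
}

improvements = {
    name: round(rmse_improvement(baseline, rmse), 1)
    for name, rmse in with_pollution.items()
}
print(improvements)
```

Under this assumption the three values come out at 13.9%, 19.8% and 22.2%, which agrees with the figures reported in the abstract for sunshine-based Rs.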


Table 3
Statistical results of SVM models for predicting daily global and diffuse solar radiation as well as the ranking of input parameters in the training and testing stages for different model types. RMSE and MAE are in MJ m−2 d−1.

Model type | Input | R2 training | R2 testing | RMSE training | RMSE testing | MAE training | MAE testing | Ranking
Rs_n | AQI | 0.935 | 0.958 | 2.046 | 1.545 | 1.462 | 1.097 | 1
Rs_n | PM2.5 | 0.933 | 0.956 | 2.044 | 1.582 | 1.459 | 1.105 | 3
Rs_n | PM10 | 0.937 | 0.955 | 1.997 | 1.603 | 1.443 | 1.120 | 4
Rs_n | SO2 | 0.927 | 0.947 | 2.114 | 1.754 | 1.514 | 1.229 | 7
Rs_n | NO2 | 0.925 | 0.949 | 2.116 | 1.700 | 1.493 | 1.156 | 5
Rs_n | CO | 0.929 | 0.949 | 2.085 | 1.719 | 1.486 | 1.190 | 6
Rs_n | O3 | 0.933 | 0.956 | 2.043 | 1.573 | 1.525 | 1.127 | 2
Rs_T | AQI | 0.785 | 0.820 | 3.435 | 3.238 | 2.342 | 2.150 | 1
Rs_T | PM2.5 | 0.789 | 0.806 | 3.422 | 3.343 | 2.279 | 2.158 | 3
Rs_T | PM10 | 0.778 | 0.801 | 3.563 | 3.367 | 2.470 | 2.256 | 4
Rs_T | SO2 | 0.743 | 0.771 | 3.808 | 3.615 | 2.733 | 2.488 | 7
Rs_T | NO2 | 0.776 | 0.792 | 3.545 | 3.453 | 2.492 | 2.329 | 5
Rs_T | CO | 0.753 | 0.780 | 3.706 | 3.546 | 2.564 | 2.337 | 6
Rs_T | O3 | 0.787 | 0.817 | 3.442 | 3.241 | 2.348 | 2.154 | 2
Rd_n | AQI | 0.801 | 0.834 | 1.729 | 1.591 | 1.158 | 1.123 | 1
Rd_n | PM2.5 | 0.801 | 0.828 | 1.734 | 1.621 | 1.168 | 1.125 | 2
Rd_n | PM10 | 0.806 | 0.814 | 1.713 | 1.691 | 1.183 | 1.176 | 3
Rd_n | SO2 | 0.755 | 0.766 | 1.920 | 1.891 | 1.327 | 1.322 | 7
Rd_n | NO2 | 0.750 | 0.769 | 1.936 | 1.875 | 1.319 | 1.291 | 6
Rd_n | CO | 0.773 | 0.794 | 1.846 | 1.771 | 1.246 | 1.226 | 5
Rd_n | O3 | 0.794 | 0.810 | 1.744 | 1.701 | 1.237 | 1.206 | 4
Rd_T | AQI | 0.682 | 0.734 | 2.184 | 2.013 | 1.561 | 1.451 | 1
Rd_T | PM2.5 | 0.687 | 0.732 | 2.163 | 2.015 | 1.504 | 1.441 | 2
Rd_T | PM10 | 0.701 | 0.712 | 2.134 | 2.101 | 1.581 | 1.513 | 3
Rd_T | SO2 | 0.619 | 0.663 | 2.412 | 2.273 | 1.842 | 1.667 | 7
Rd_T | NO2 | 0.634 | 0.667 | 2.336 | 2.259 | 1.713 | 1.640 | 6
Rd_T | CO | 0.661 | 0.706 | 2.283 | 2.152 | 1.630 | 1.512 | 5
Rd_T | O3 | 0.650 | 0.699 | 2.248 | 2.120 | 1.738 | 1.610 | 4

Note: the top three ranked inputs were highlighted in blue, green and orange, respectively; the same below.
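To make the ranking column concrete, the following sketch orders the seven single inputs of the sunshine-based Rs models by their testing-stage RMSE. This is a simplification: the paper ranks inputs using R2, RMSE and MAE together, and the helper `rank_inputs` is a hypothetical name introduced here. For this block of Table 3, sorting on testing RMSE alone happens to reproduce the listed ranking.

```python
# Testing-stage RMSE (MJ m-2 d-1) of the sunshine-based Rs models, from Table 3.
testing_rmse = {
    "AQI": 1.545,
    "PM2.5": 1.582,
    "PM10": 1.603,
    "SO2": 1.754,
    "NO2": 1.700,
    "CO": 1.719,
    "O3": 1.573,
}


def rank_inputs(rmse_by_input):
    """Rank inputs from lowest to highest RMSE (1 = most influential).

    Ties are not handled specially in this sketch.
    """
    ordered = sorted(rmse_by_input, key=rmse_by_input.get)
    return {name: position for position, name in enumerate(ordered, start=1)}


ranking = rank_inputs(testing_rmse)
print(ranking)
```

The same helper could be applied to the other three model types, although agreement with the published ranking is only checked here for the Rs_n block.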

2. Materials and methods

2.1. Study area

The Beijing–Tianjin–Hebei (BTH) region is located in northern China and includes two municipalities (Beijing and Tianjin) and one province (Hebei), as shown in Fig. 1. It is one of the most economically developed regions in China, covering 2.3% of the national territory while producing 9.3% of the total national gross domestic product (GDP) in 2016. The BTH region accounts for 8.1% of the total population and 8.6% of the total motor vehicles in China. In addition, BTH is also one of the most seriously air-polluted regions in China and around the world due to the rapid development of industrialization, urbanization and motorization during recent decades. The average seasonal AQI values during 2014–2016 in 14 major cities of the BTH region are shown in Fig. 1. The average AQI values in spring and autumn were higher than 100 in most cities, and half of the cities had an AQI value over 150 in winter, which exceeds China's current air quality standards.

2.2. Data collection and analysis

Continuously observed daily Rs, Rd, sunshine duration (n), maximum/minimum temperature (Tmax and Tmin) and the mass concentrations of six major air pollutants (PM2.5, PM10, SO2, NO2, CO and O3) along with the calculated AQI from January 2014 to December 2016 were collected from China's capital city of Beijing as a representative city in the BTH region


Table 4
Statistical results of SVM models for predicting daily sunshine-based global solar radiation as well as the ranking of input combinations of two parameters in the training and testing stages. RMSE and MAE are in MJ m−2 d−1.

Combination # | Inputs | R2 training | R2 testing | RMSE training | RMSE testing | MAE training | MAE testing | Ranking
1 | PM2.5, PM10 | 0.937 | 0.958 | 2.018 | 1.526 | 1.450 | 1.088 | 7
2 | PM2.5, SO2 | 0.935 | 0.958 | 2.034 | 1.529 | 1.465 | 1.082 | 8
3 | PM2.5, NO2 | 0.931 | 0.960 | 2.044 | 1.496 | 1.467 | 1.066 | 4
4 | PM2.5, CO | 0.933 | 0.958 | 2.037 | 1.552 | 1.458 | 1.104 | 10
5 | PM2.5, O3 | 0.937 | 0.963 | 1.954 | 1.443 | 1.438 | 1.055 | 2
6 | PM10, SO2 | 0.937 | 0.955 | 2.004 | 1.589 | 1.454 | 1.112 | 12
7 | PM10, NO2 | 0.933 | 0.960 | 2.003 | 1.511 | 1.441 | 1.066 | 5
8 | PM10, CO | 0.937 | 0.956 | 1.979 | 1.584 | 1.423 | 1.114 | 11
9 | PM10, O3 | 0.941 | 0.964 | 1.941 | 1.439 | 1.433 | 1.044 | 1
10 | SO2, NO2 | 0.925 | 0.951 | 2.096 | 1.672 | 1.486 | 1.156 | 14
11 | SO2, CO | 0.929 | 0.949 | 2.083 | 1.709 | 1.490 | 1.183 | 15
12 | SO2, O3 | 0.937 | 0.958 | 1.984 | 1.543 | 1.484 | 1.115 | 9
13 | NO2, CO | 0.929 | 0.953 | 2.072 | 1.658 | 1.469 | 1.143 | 13
14 | NO2, O3 | 0.939 | 0.962 | 1.958 | 1.452 | 1.443 | 1.033 | 3
15 | CO, O3 | 0.937 | 0.960 | 1.971 | 1.522 | 1.472 | 1.087 | 6

Table 5
Statistical results of SVM models for predicting daily temperature-based global solar radiation as well as the ranking of input combinations of two parameters in the training and testing stages. RMSE and MAE are in MJ m−2 d−1.

Combination # | Inputs | R2 training | R2 testing | RMSE training | RMSE testing | MAE training | MAE testing | Ranking
1 | PM2.5, PM10 | 0.787 | 0.819 | 3.441 | 3.219 | 2.364 | 2.146 | 6
2 | PM2.5, SO2 | 0.792 | 0.823 | 3.394 | 3.196 | 2.264 | 2.088 | 4
3 | PM2.5, NO2 | 0.785 | 0.817 | 3.468 | 3.249 | 2.419 | 2.172 | 8
4 | PM2.5, CO | 0.781 | 0.814 | 3.483 | 3.274 | 2.398 | 2.183 | 9
5 | PM2.5, O3 | 0.821 | 0.834 | 3.172 | 3.098 | 2.165 | 2.037 | 2
6 | PM10, SO2 | 0.746 | 0.776 | 3.751 | 3.587 | 2.619 | 2.369 | 15
7 | PM10, NO2 | 0.787 | 0.812 | 3.427 | 3.299 | 2.305 | 2.137 | 10
8 | PM10, CO | 0.774 | 0.790 | 3.546 | 3.464 | 2.488 | 2.266 | 13
9 | PM10, O3 | 0.817 | 0.846 | 3.187 | 2.976 | 2.110 | 2.017 | 1
10 | SO2, NO2 | 0.781 | 0.810 | 3.480 | 3.310 | 2.395 | 2.128 | 11
11 | SO2, CO | 0.783 | 0.803 | 3.457 | 3.365 | 2.320 | 2.148 | 12
12 | SO2, O3 | 0.799 | 0.817 | 3.354 | 3.240 | 2.315 | 2.204 | 7
13 | NO2, CO | 0.771 | 0.789 | 3.591 | 3.489 | 2.551 | 2.284 | 14
14 | NO2, O3 | 0.805 | 0.826 | 3.294 | 3.167 | 2.180 | 2.095 | 3
15 | CO, O3 | 0.801 | 0.821 | 3.324 | 3.209 | 2.239 | 2.101 | 5

(Fig. 2). Moreover, the values of maximum sunshine duration (N) and extra-terrestrial solar radiation (Ra) were determined following the procedures described by Allen et al. [3]. The weather data were provided and quality-checked by the National Meteorological Information Center (NMIC) of the China Meteorological Administration, while the air pollution data were provided by the China National Environmental Monitoring Center. Data were excluded if any of the above meteorological data were missing or if any of the ratios Rs/Ra, Rd/Rs and n/N was greater than 1. The measured data from January 2014 to December 2016 were then divided into two datasets. The first dataset (January 2014–December 2015) was used to train


Table 6
Statistical results of SVM models for predicting daily sunshine-based diffuse solar radiation as well as the ranking of input combinations of two parameters in the training and testing stages. RMSE and MAE are in MJ m−2 d−1.

Combination # | Inputs | R2 training | R2 testing | RMSE training | RMSE testing | MAE training | MAE testing | Ranking
1 | PM2.5, PM10 | 0.808 | 0.845 | 1.690 | 1.541 | 1.136 | 1.071 | 3
2 | PM2.5, SO2 | 0.801 | 0.839 | 1.730 | 1.570 | 1.178 | 1.083 | 7
3 | PM2.5, NO2 | 0.799 | 0.841 | 1.732 | 1.562 | 1.166 | 1.076 | 4
4 | PM2.5, CO | 0.801 | 0.841 | 1.731 | 1.563 | 1.165 | 1.095 | 6
5 | PM2.5, O3 | 0.821 | 0.852 | 1.644 | 1.504 | 1.137 | 1.060 | 1
6 | PM10, SO2 | 0.806 | 0.815 | 1.722 | 1.681 | 1.194 | 1.164 | 12
7 | PM10, NO2 | 0.806 | 0.826 | 1.709 | 1.629 | 1.185 | 1.132 | 10
8 | PM10, CO | 0.815 | 0.828 | 1.675 | 1.622 | 1.162 | 1.131 | 9
9 | PM10, O3 | 0.821 | 0.848 | 1.637 | 1.520 | 1.156 | 1.081 | 2
10 | SO2, NO2 | 0.755 | 0.780 | 1.929 | 1.836 | 1.325 | 1.267 | 15
11 | SO2, CO | 0.776 | 0.799 | 1.838 | 1.753 | 1.258 | 1.211 | 14
12 | SO2, O3 | 0.803 | 0.824 | 1.721 | 1.639 | 1.221 | 1.155 | 11
13 | NO2, CO | 0.781 | 0.806 | 1.813 | 1.724 | 1.236 | 1.185 | 13
14 | NO2, O3 | 0.808 | 0.841 | 1.692 | 1.562 | 1.193 | 1.109 | 4
15 | CO, O3 | 0.803 | 0.830 | 1.720 | 1.609 | 1.208 | 1.118 | 8

Table 7
Statistical results of SVM models for predicting daily temperature-based diffuse solar radiation as well as the ranking of input combinations of two parameters in the training and testing stages. RMSE and MAE are in MJ m−2 d−1.

Combination # | Inputs | R2 training | R2 testing | RMSE training | RMSE testing | MAE training | MAE testing | Ranking
1 | PM2.5, PM10 | 0.701 | 0.734 | 2.120 | 2.019 | 1.484 | 1.411 | 7
2 | PM2.5, SO2 | 0.681 | 0.733 | 2.189 | 2.020 | 1.527 | 1.431 | 8
3 | PM2.5, NO2 | 0.684 | 0.738 | 2.173 | 2.003 | 1.486 | 1.419 | 6
4 | PM2.5, CO | 0.689 | 0.743 | 2.155 | 1.978 | 1.497 | 1.424 | 4
5 | PM2.5, O3 | 0.721 | 0.767 | 2.039 | 1.887 | 1.452 | 1.359 | 1
6 | PM10, SO2 | 0.691 | 0.709 | 2.171 | 2.111 | 1.605 | 1.508 | 13
7 | PM10, NO2 | 0.704 | 0.729 | 2.110 | 2.035 | 1.541 | 1.462 | 9
8 | PM10, CO | 0.706 | 0.728 | 2.114 | 2.042 | 1.530 | 1.472 | 10
9 | PM10, O3 | 0.719 | 0.752 | 2.059 | 1.949 | 1.539 | 1.412 | 2
10 | SO2, NO2 | 0.642 | 0.681 | 2.331 | 2.208 | 1.713 | 1.594 | 15
11 | SO2, CO | 0.658 | 0.707 | 2.267 | 2.119 | 1.653 | 1.522 | 14
12 | SO2, O3 | 0.679 | 0.722 | 2.220 | 2.056 | 1.702 | 1.503 | 11
13 | NO2, CO | 0.661 | 0.712 | 2.247 | 2.094 | 1.607 | 1.500 | 12
14 | NO2, O3 | 0.707 | 0.746 | 2.095 | 1.971 | 1.555 | 1.439 | 3
15 | CO, O3 | 0.701 | 0.743 | 2.117 | 1.979 | 1.576 | 1.413 | 5

the SVM models, while the second dataset (January 2016–December 2016) was used to test the SVM models. Table 1 presents the minimum, maximum, mean and standard deviation values of the 12 input parameters and two output parameters (Rs and Rd) during 2014–2016 used in this study.

Beijing is characterized by a temperate monsoon climate, with a hot and wet season from June to August and a cold and dry season from December to February. Fig. 2 and Table 1 show that Tmin ranged from −15.2 °C in January to 27.9 °C in July, while Tmax varied from −10.9 °C in January to 41.1 °C in July. The average n was 6.7 h


Table 8
Statistical results of SVM models for predicting daily sunshine-based global solar radiation as well as the ranking of input combinations of three parameters in the training and testing stages. RMSE and MAE are in MJ m−2 d−1.

Combination # | Inputs | R2 training | R2 testing | RMSE training | RMSE testing | MAE training | MAE testing | Ranking
1 | PM2.5, PM10 and SO2 | 0.935 | 0.960 | 2.028 | 1.514 | 1.471 | 1.079 | 15
2 | PM2.5, PM10 and NO2 | 0.935 | 0.962 | 2.020 | 1.455 | 1.452 | 1.045 | 9
3 | PM2.5, PM10 and CO | 0.937 | 0.960 | 2.007 | 1.507 | 1.437 | 1.083 | 14
4 | PM2.5, PM10 and O3 | 0.941 | 0.966 | 1.930 | 1.396 | 1.412 | 1.019 | 1
5 | PM2.5, SO2 and NO2 | 0.931 | 0.962 | 2.044 | 1.479 | 1.468 | 1.056 | 11
6 | PM2.5, SO2 and CO | 0.935 | 0.960 | 2.037 | 1.517 | 1.465 | 1.084 | 16
7 | PM2.5, SO2 and O3 | 0.939 | 0.964 | 1.959 | 1.434 | 1.445 | 1.040 | 6
8 | PM2.5, NO2 and CO | 0.931 | 0.960 | 2.058 | 1.495 | 1.473 | 1.074 | 12
9 | PM2.5, NO2 and O3 | 0.939 | 0.964 | 1.936 | 1.406 | 1.435 | 1.026 | 2
10 | PM2.5, CO and O3 | 0.939 | 0.962 | 1.973 | 1.455 | 1.446 | 1.057 | 9
11 | PM10, SO2 and NO2 | 0.933 | 0.960 | 2.024 | 1.503 | 1.454 | 1.067 | 13
12 | PM10, SO2 and CO | 0.937 | 0.955 | 1.995 | 1.595 | 1.447 | 1.122 | 19
13 | PM10, SO2 and O3 | 0.939 | 0.962 | 1.942 | 1.434 | 1.440 | 1.041 | 6
14 | PM10, NO2 and CO | 0.935 | 0.958 | 2.012 | 1.519 | 1.445 | 1.073 | 17
15 | PM10, NO2 and O3 | 0.937 | 0.964 | 1.952 | 1.409 | 1.440 | 1.021 | 3
16 | PM10, CO and O3 | 0.941 | 0.963 | 1.925 | 1.431 | 1.414 | 1.038 | 5
17 | SO2, NO2 and CO | 0.929 | 0.951 | 2.069 | 1.664 | 1.467 | 1.149 | 20
18 | SO2, NO2 and O3 | 0.937 | 0.964 | 1.955 | 1.445 | 1.446 | 1.042 | 8
19 | SO2, CO and O3 | 0.937 | 0.958 | 1.973 | 1.528 | 1.473 | 1.090 | 18
20 | NO2, CO and O3 | 0.939 | 0.964 | 1.944 | 1.427 | 1.440 | 1.025 | 4

with the minimum and maximum values of 0 h and 13.9 h appearing in November and June, respectively. The concentrations of PM2.5, PM10, SO2, NO2 and CO were higher in the cold and dry winter than in summer due to the combustion of fossil fuels for heating purposes. Similarly, the highest AQI was observed in winter (478.0) and the lowest value was found in summer (18.0). In contrast, the concentration of O3, a secondary air pollutant, showed the opposite trend, with the highest value in the hot and wet summer (169.0 μg m−3) and the lowest in winter (2.0 μg m−3) due to the high radiation and temperature in summer. In addition, the highest Rs and Rd values occurred in June at 31.0 MJ m−2 d−1 and 17.5 MJ m−2 d−1, while the lowest values occurred in November at 0.8 MJ m−2 d−1 and 0 MJ m−2 d−1.

2.3. Modeling with Support Vector Machine (SVM)

The SVM algorithm developed by Vapnik [46] is a supervised machine learning model for data analysis and pattern recognition, and it is widely applied for regression and forecasting. The SVM model estimates the regression based on a series of kernel functions, which implicitly convert the original, lower-dimensional input data to a higher-dimensional feature space. Compared with the ANN model, which normally has multiple local minima, the SVM gives a unique solution resulting from the convex nature of the optimization problem [10]. The approximated function in the SVM algorithm is presented as follows:

$$f(x) = \omega \varphi(x) + b \tag{1}$$

where φ(x) is the higher-dimensional feature space converted from the input vector x. ω and b are the weights vector and a threshold, respectively, which can be estimated by minimizing the following regularized risk function:

$$R(C) = C \frac{1}{n} \sum_{i=1}^{n} L_{\varepsilon}(d_i, y_i) + \frac{1}{2} \lVert \omega \rVert^{2} \tag{2}$$

where C is the penalty parameter of the error, d_i is the desired value, n is the number of observations, and $C \frac{1}{n} \sum_{i=1}^{n} L_{\varepsilon}(d_i, y_i)$ is the empirical error, in which the function L_ε is defined as:

$$L_{\varepsilon}(d, y) = \begin{cases} \lvert d - y \rvert - \varepsilon, & \lvert d - y \rvert \ge \varepsilon \\ 0, & \text{otherwise} \end{cases} \tag{3}$$

where $\frac{1}{2} \lVert \omega \rVert^{2}$ is the so-called regularization term and ε is the tube size. The approximated function in Eq. (1) is finally expressed in an explicit form by introducing Lagrange multipliers and exploiting the optimality constraints:

$$f(x, \alpha_i, \alpha_i^{*}) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*}) K(x, x_i) + b \tag{4}$$

where K(x, x_i) is the kernel function. The commonly used non-linear RBF kernel function was used in this study due to its better performance in predicting solar radiation compared with other kernel functions [37,9]:

$$K_{rbf}(x, x_i) = \exp\left[ \frac{-(x - x_i)^{2}}{2\sigma^{2}} \right] \tag{5}$$

The detailed information and computation procedure of the SVM algorithm can be found in Vapnik [46].

2.4. Model inputs and SVM parameters

In the present study, as shown in Fig. 3, two baseline models were


Table 9
Statistical results of SVM models for predicting daily temperature-based global solar radiation as well as the ranking of input combinations of three parameters in the training and testing stages. RMSE and MAE are in MJ m−2 d−1.

Combination # | Inputs | R2 training | R2 testing | RMSE training | RMSE testing | MAE training | MAE testing | Ranking
1 | PM2.5, PM10 and SO2 | 0.787 | 0.828 | 3.433 | 3.155 | 2.362 | 2.058 | 10
2 | PM2.5, PM10 and NO2 | 0.792 | 0.832 | 3.390 | 3.113 | 2.274 | 2.064 | 7
3 | PM2.5, PM10 and CO | 0.781 | 0.815 | 3.480 | 3.268 | 2.393 | 2.173 | 18
4 | PM2.5, PM10 and O3 | 0.815 | 0.850 | 3.179 | 2.947 | 2.118 | 1.986 | 1
5 | PM2.5, SO2 and NO2 | 0.780 | 0.808 | 3.493 | 3.323 | 2.422 | 2.107 | 19
6 | PM2.5, SO2 and CO | 0.778 | 0.815 | 3.517 | 3.264 | 2.486 | 2.179 | 17
7 | PM2.5, SO2 and O3 | 0.817 | 0.830 | 3.196 | 3.131 | 2.197 | 2.046 | 8
8 | PM2.5, NO2 and CO | 0.785 | 0.819 | 3.447 | 3.227 | 2.303 | 2.106 | 14
9 | PM2.5, NO2 and O3 | 0.812 | 0.848 | 3.222 | 2.969 | 2.099 | 1.995 | 2
10 | PM2.5, CO and O3 | 0.819 | 0.845 | 3.181 | 3.002 | 2.150 | 2.001 | 4
11 | PM10, SO2 and NO2 | 0.781 | 0.815 | 3.478 | 3.248 | 2.409 | 2.097 | 15
12 | PM10, SO2 and CO | 0.783 | 0.821 | 3.480 | 3.212 | 2.441 | 2.143 | 12
13 | PM10, SO2 and O3 | 0.799 | 0.819 | 3.338 | 3.220 | 2.269 | 2.113 | 13
14 | PM10, NO2 and CO | 0.790 | 0.815 | 3.396 | 3.257 | 2.288 | 2.115 | 16
15 | PM10, NO2 and O3 | 0.815 | 0.845 | 3.196 | 2.997 | 2.192 | 2.049 | 3
16 | PM10, CO and O3 | 0.823 | 0.834 | 3.158 | 3.094 | 2.163 | 2.039 | 5
17 | SO2, NO2 and CO | 0.771 | 0.789 | 3.582 | 3.482 | 2.544 | 2.273 | 20
18 | SO2, NO2 and O3 | 0.799 | 0.824 | 3.322 | 3.181 | 2.236 | 2.090 | 11
19 | SO2, CO and O3 | 0.806 | 0.830 | 3.277 | 3.136 | 2.129 | 2.041 | 9
20 | NO2, CO and O3 | 0.817 | 0.832 | 3.190 | 3.107 | 2.164 | 2.025 | 6

first considered to predict daily Rs and Rd based on sunshine duration or air temperature, i.e. (1) measured sunshine duration, calculated maximum sunshine duration and extraterrestrial solar radiation (n, N and Ra), and (2) measured minimum/maximum temperature and calculated extraterrestrial solar radiation (Tmin, Tmax and Ra) were used as model inputs. Building on the baseline models, six air pollution parameters (PM2.5, PM10, SO2, NO2, CO and O3) as well as AQI were further considered to analyze their single and combined effects on the prediction accuracy of daily Rs and Rd. Therefore, seven single parameters, 15 possible combinations of two parameters and 20 possible combinations of three parameters were considered for daily Rs and Rd prediction. For the SVM models, the default value of 0.1 was used for the parameter ε, while the parameters C and γ were optimized by a grid search with C and γ each ranging from 2^−8 to 2^8 in powers of two (2^n with n = −8, −7, …, 0, …, 7, 8). All 289 pairs of (C, γ) were tried and the one with the best accuracy was selected.
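The counts in this setup (7 single inputs, 15 pairs, 20 triples and 289 parameter pairs) follow directly from enumerating the combinations. The sketch below is illustrative only: the names `POLLUTANTS`, `BASELINES`, `pollution_input_sets` and `parameter_grid` are introduced here, and the model-fitting step is left out. It assumes, as the counts imply, that AQI is used only as a single input while pairs and triples are drawn from the six pollutants.

```python
from itertools import combinations

POLLUTANTS = ["PM2.5", "PM10", "SO2", "NO2", "CO", "O3"]
BASELINES = {
    "sunshine": ["n", "N", "Ra"],
    "temperature": ["Tmin", "Tmax", "Ra"],
}


def pollution_input_sets():
    """Enumerate single, paired and tripled air pollution inputs."""
    singles = [(p,) for p in POLLUTANTS + ["AQI"]]  # 6 pollutants + AQI
    pairs = list(combinations(POLLUTANTS, 2))       # C(6, 2)
    triples = list(combinations(POLLUTANTS, 3))     # C(6, 3)
    return singles, pairs, triples


def parameter_grid(low=-8, high=8):
    """All (C, gamma) pairs with both values taken from 2**low .. 2**high."""
    powers = [2.0 ** e for e in range(low, high + 1)]
    return [(c, g) for c in powers for g in powers]


singles, pairs, triples = pollution_input_sets()
grid = parameter_grid()
print(len(singles), len(pairs), len(triples), len(grid))  # 7 15 20 289

# Full input list for one candidate model: baseline variables plus pollutants.
example_inputs = BASELINES["sunshine"] + list(pairs[0])
print(example_inputs)  # ['n', 'N', 'Ra', 'PM2.5', 'PM10']
```

In practice each (C, γ) pair would be passed, together with ε = 0.1, to an SVM regression implementation with an RBF kernel, and the pair giving the best accuracy would be kept for each input set.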

10

2.5. Model comparison and statistical error analysis

The accuracy and performance of the SVM models with different combinations of air pollution parameters for daily Rs and Rd prediction were evaluated and compared using three commonly used statistical indicators [14]: the coefficient of determination (R2, Eq. (6)), the root mean square error (RMSE, Eq. (7)) and the mean absolute error (MAE, Eq. (8)).

$$R^2 = 1 - \frac{\sum_{i=1}^{k} (Y_{i,m} - Y_{i,e})^2}{\sum_{i=1}^{k} (Y_{i,m} - \bar{Y}_{m})^2} \tag{6}$$

$$RMSE = \sqrt{\frac{1}{k} \sum_{i=1}^{k} (Y_{i,m} - Y_{i,e})^2} \tag{7}$$

$$MAE = \frac{1}{k} \sum_{i=1}^{k} \left| Y_{i,m} - Y_{i,e} \right| \tag{8}$$

where $Y_{i,m}$, $Y_{i,e}$, $\bar{Y}_{m}$ and $k$ are the measured value, the predicted value, the mean of the measured values of global or diffuse solar radiation, and the number of observations, respectively. Higher values of R2 are preferred: a value closer to 1 indicates better model performance and a regression line that fits the data well. Conversely, the lower the RMSE and MAE values, the better the model performs. The significance of a single input or combination of inputs was then ranked based on the three statistical indicators. The input or combination that gave the highest R2 and the lowest RMSE and MAE was taken as the most influential for Rs and Rd prediction, and vice versa. In view of the requirements of the SVM algorithm, the raw meteorological data were scaled to a fixed range from 0 to 1 using the min-max normalization method:

$$x_{norm} = \frac{x_i - x_{min}}{x_{max} - x_{min}} \tag{9}$$
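As an illustration of Eqs. (6)–(9), the indicators and the scaling step can be written in a few lines of plain Python. This is a minimal sketch rather than the authors' implementation: the function names are hypothetical, and R2 is taken in its standard form of one minus the ratio of the residual to the total sum of squares.

```python
import math


def r_squared(measured, predicted):
    """Coefficient of determination, Eq. (6)."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - e) ** 2 for m, e in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def rmse(measured, predicted):
    """Root mean square error, Eq. (7)."""
    k = len(measured)
    return math.sqrt(sum((m - e) ** 2 for m, e in zip(measured, predicted)) / k)


def mae(measured, predicted):
    """Mean absolute error, Eq. (8)."""
    k = len(measured)
    return sum(abs(m - e) for m, e in zip(measured, predicted)) / k


def min_max_scale(values, x_min=None, x_max=None):
    """Min-max normalization to the range 0 to 1, Eq. (9).

    If x_min or x_max is not given, it is taken from `values` itself.
    """
    lo = min(values) if x_min is None else x_min
    hi = max(values) if x_max is None else x_max
    return [(v - lo) / (hi - lo) for v in values]
```

Passing `x_min` and `x_max` explicitly lets the same bounds be reused for both the training and the testing data, so that both sets are scaled consistently.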


Table 10 Statistical results of SVM models for predicting daily sunshine-based diffuse solar radiation as well as the ranking of input combinations of three parameters in the training and testing stages.

Combination  Inputs                R2                  RMSE (MJ m−2 d−1)   MAE (MJ m−2 d−1)    Ranking
#                                  Training  Testing   Training  Testing   Training  Testing
1            PM2.5, PM10 and SO2   0.806     0.848     1.701     1.526     1.156     1.055     13
2            PM2.5, PM10 and NO2   0.810     0.852     1.679     1.509     1.145     1.039     9
3            PM2.5, PM10 and CO    0.814     0.854     1.666     1.491     1.127     1.051     6
4            PM2.5, PM10 and O3    0.830     0.867     1.595     1.432     1.104     1.012     1
5            PM2.5, SO2 and NO2    0.799     0.845     1.731     1.541     1.171     1.055     14
6            PM2.5, SO2 and CO     0.805     0.848     1.714     1.524     1.168     1.059     12
7            PM2.5, SO2 and O3     0.819     0.857     1.647     1.477     1.141     1.032     3
8            PM2.5, NO2 and CO     0.801     0.848     1.719     1.522     1.161     1.056     11
9            PM2.5, NO2 and O3     0.823     0.861     1.627     1.457     1.132     1.023     2
10           PM2.5, CO and O3      0.821     0.856     1.646     1.487     1.142     1.049     5
11           PM10, SO2 and NO2     0.806     0.828     1.718     1.624     1.196     1.124     19
12           PM10, SO2 and CO      0.815     0.832     1.673     1.607     1.172     1.124     18
13           PM10, SO2 and O3      0.819     0.850     1.646     1.518     1.167     1.078     10
14           PM10, NO2 and CO      0.815     0.839     1.667     1.565     1.157     1.086     16
15           PM10, NO2 and O3      0.821     0.854     1.636     1.494     1.168     1.066     7
16           PM10, CO and O3       0.828     0.857     1.608     1.479     1.142     1.049     4
17           SO2, NO2 and CO       0.780     0.806     1.821     1.723     1.244     1.182     20
18           SO2, NO2 and O3       0.808     0.841     1.702     1.559     1.208     1.111     15
19           SO2, CO and O3        0.810     0.837     1.689     1.576     1.204     1.106     17
20           NO2, CO and O3        0.819     0.854     1.648     1.495     1.169     1.059     8

where xnorm and xi represent the normalized and raw training and testing data; xmin and xmax are the minimum and maximum of the training and testing data.

3. Results and discussion

3.1. Case 1: Baseline models with no air pollution input

For the first case, only measured sunshine duration, calculated maximum sunshine duration and extraterrestrial solar radiation, or maximum/minimum temperature and calculated extraterrestrial solar radiation, were considered as inputs for the SVM models. Table 2 shows the statistical results of SVM models for predicting daily Rs and Rd based only on sunshine duration or air temperature in the training and testing stages, without considering air pollution effects. As shown in Table 2, for global solar radiation, the sunshine-based SVM models performed much better, with R2 = 0.943, RMSE = 1.794 MJ m−2 d−1 and MAE = 1.248 MJ m−2 d−1 in the testing stage, than the temperature-based models (R2 = 0.743, RMSE = 3.817 MJ m−2 d−1 and MAE = 2.709 MJ m−2 d−1). The results also indicate that the sunshine duration-based models (R2 = 0.748, RMSE = 1.896 MJ m−2 d−1 and MAE = 1.344 MJ m−2 d−1 in the testing stage) were more accurate than the models based on air temperature (R2 = 0.621, RMSE = 2.420 MJ m−2 d−1 and MAE = 1.857 MJ m−2 d−1) for predicting diffuse solar radiation, which is generally in good agreement with previous studies [13,26,54].

3.2. Case 2: SVM models with one air pollution input

For the second case, only one air pollution parameter was further considered as input for the sunshine- or temperature-based Rs and Rd models. For this purpose, seven SVM models were developed for each of the four model types to evaluate the impact of each air pollution input on Rs and Rd prediction. Table 3 shows the statistical results of SVM models for predicting daily Rs and Rd as well as the ranking of input parameters in the training and testing stages for the different model types. The results reveal that AQI was the most relevant air pollution parameter for the prediction of both daily Rs and Rd, which can be largely attributed to the fact that AQI represents the pollution condition of the dominant air pollutant. This is in good agreement with previous studies [42,53,55], where AQI or API was found to be closely correlated with solar radiation and improved the prediction accuracy of Rs and Rd. O3, PM2.5 and PM10 were the next most influential parameters for predicting daily Rs, performing only slightly worse than AQI, while PM2.5, PM10 and O3 were the next most significant variables for daily Rd prediction. These results indicate that PM2.5, PM10 and O3 were the most influential air pollutants for improving daily Rs and Rd prediction based on sunshine duration and air temperature. PM2.5 and PM10 are the major air pollutants in Beijing during winter, but O3 has increasingly become the dominant air pollutant in summer [12,20]. Therefore, both particulate matter and O3 have essential effects on the estimation of daily Rs and Rd. Although identifying the best input parameters is important, identifying the worst inputs is also of interest. The results show that SO2 was the least influential variable for both daily Rs and Rd prediction, followed by


Table 11 Statistical results of SVM models for predicting daily temperature-based diffuse solar radiation as well as the ranking of input combinations of three parameters in the training and testing stages.

Combination  Inputs                R2                  RMSE (MJ m−2 d−1)   MAE (MJ m−2 d−1)    Ranking
#                                  Training  Testing   Training  Testing   Training  Testing
1            PM2.5, PM10 and SO2   0.692     0.733     2.151     2.023     1.513     1.414     17
2            PM2.5, PM10 and NO2   0.701     0.745     2.118     1.977     1.479     1.391     14
3            PM2.5, PM10 and CO    0.704     0.746     2.105     1.967     1.472     1.395     9
4            PM2.5, PM10 and O3    0.733     0.774     1.999     1.857     1.439     1.320     1
5            PM2.5, SO2 and NO2    0.686     0.738     2.174     2.001     1.511     1.412     16
6            PM2.5, SO2 and CO     0.689     0.746     2.165     1.968     1.517     1.409     10
7            PM2.5, SO2 and O3     0.714     0.766     2.075     1.894     1.490     1.358     4
8            PM2.5, NO2 and CO     0.689     0.746     2.155     1.968     1.474     1.409     10
9            PM2.5, NO2 and O3     0.721     0.769     2.047     1.880     1.444     1.338     3
10           PM2.5, CO and O3      0.719     0.769     2.050     1.876     1.452     1.354     2
11           PM10, SO2 and NO2     0.697     0.728     2.141     2.042     1.581     1.465     18
12           PM10, SO2 and CO      0.701     0.726     2.132     2.045     1.560     1.470     19
13           PM10, SO2 and O3      0.714     0.752     2.092     1.948     1.568     1.405     8
14           PM10, NO2 and CO      0.711     0.741     2.085     1.988     1.510     1.437     15
15           PM10, NO2 and O3      0.721     0.759     2.050     1.918     1.501     1.373     6
16           PM10, CO and O3       0.729     0.764     2.019     1.896     1.487     1.360     5
17           SO2, NO2 and CO       0.663     0.712     2.249     2.097     1.620     1.506     20
18           SO2, NO2 and O3       0.702     0.745     2.126     1.972     1.584     1.430     12
19           SO2, CO and O3        0.706     0.745     2.122     1.972     1.590     1.409     12
20           NO2, CO and O3        0.719     0.759     2.053     1.921     1.504     1.376     7
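The Ranking column of Table 11 appears to be consistent with ordering the 20 combinations by their testing RMSE, with tied values sharing the same rank. The sketch below checks this reading against the tabulated values; the helper name is hypothetical, and the ranking rule is an inference from the table rather than a procedure stated by the authors.

```python
def competition_rank(values):
    """Rank values in ascending order; ties share the lowest applicable rank."""
    ordered = sorted(values)
    # list.index returns the first position, so equal values get equal ranks.
    return [ordered.index(v) + 1 for v in values]


# Testing RMSE (MJ m-2 d-1) of combinations 1 to 20 in Table 11.
TABLE11_TEST_RMSE = [
    2.023, 1.977, 1.967, 1.857, 2.001, 1.968, 1.894, 1.968, 1.880, 1.876,
    2.042, 2.045, 1.948, 1.988, 1.918, 1.896, 2.097, 1.972, 1.972, 1.921,
]

# Ranking column of Table 11, in the same order.
TABLE11_RANKING = [17, 14, 9, 1, 16, 10, 4, 10, 3, 2,
                   18, 19, 8, 15, 6, 5, 20, 12, 12, 7]

ranks = competition_rank(TABLE11_TEST_RMSE)
print(ranks == TABLE11_RANKING)
```

Combinations 6 and 8 share rank 10 and combinations 18 and 19 share rank 12 because their testing RMSE values are identical to three decimal places.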

CO and NO2 for daily Rs prediction and by NO2 and CO for daily Rd prediction.

3.3. Case 3: SVM models with two air pollution inputs

To identify the best input combination of two air pollution parameters, 15 possible combinations of air pollution inputs were further considered and analyzed for the sunshine- or temperature-based Rs and Rd models. Tables 4–7 provide the statistical results of SVM models for predicting daily Rs and Rd as well as the ranking of input combinations of two parameters in the training and testing stages. The best combination of air pollution inputs differed between the Rs and Rd models, owing to the different characteristics of global and diffuse solar radiation. For Rs, the combination of PM10 and O3 was found to be the most influential combination of two air pollution inputs, while for Rd the combination of PM2.5 and O3 was the most influential. The combinations of PM2.5 and O3 as well as NO2 and O3 were the next most influential combinations for predicting daily Rs. The combinations of PM10 and O3 as well as PM2.5 and PM10 were the next most significant combinations for daily sunshine-based Rd prediction, while the combinations of PM10 and O3 as well as NO2 and O3 were the next most important combinations for daily temperature-based Rd prediction. In all of the significant combinations for the different model types, both inputs were drawn from the four parameters PM2.5, PM10, O3 and NO2. In addition, the combinations of SO2 and NO2 as well as SO2 and CO were among the worst combinations for predicting daily Rs and Rd.

3.4. Case 4: SVM models with three air pollution inputs

To select the most influential combination of three air pollution parameters for improving the prediction of global and diffuse solar radiation based on sunshine duration and air temperature, 20 possible combinations of air pollution inputs were considered and assessed. Tables 8–11 provide the statistical results of SVM models for predicting daily Rs and Rd as well as the ranking of input combinations of three parameters in the training and testing stages. The results show that the combination of PM2.5, PM10 and O3 was the best combination of three inputs for both daily Rs and Rd prediction. For sunshine- and temperature-based Rs prediction, the next most important combinations of three inputs were PM2.5, NO2 and O3 as well as PM10, NO2 and O3, with small differences in the statistical indicators compared with those of the best combination. The next most significant combinations of three inputs were PM2.5, NO2 and O3 as well as PM2.5, SO2 and O3 for sunshine-based Rd prediction, and PM2.5, CO and O3 as well as PM2.5, NO2 and O3 for temperature-based Rd prediction. These results further indicate the significant effect of PM2.5, PM10 and O3 on global and diffuse solar radiation prediction. Furthermore, the results reveal that the worst combination of three inputs was SO2, NO2 and CO.

3.5. Comparison of models with different air pollution inputs

Although the most significant combinations of one, two and three air pollution inputs were identified for the prediction of global and diffuse solar radiation, it is also essential to identify the best combination among all of them. This was evaluated based only on RMSE, a widely accepted statistical indicator of the prediction accuracy of daily Rs and Rd. Fig. 4 shows the values of RMSE in both training and testing stages for the most influential combination with no, one, two and three air pollution inputs for the four model types. Compared with the SVM models without air pollution inputs, the RMSE of the SVM models with one, two and three air pollution inputs decreased by 13.9%, 19.8% and 22.2% in the testing stage for sunshine-based Rs, respectively. The corresponding values were 15.2%, 22.0% and 22.8% for temperature-based Rs, 16.1%, 21.5% and 24.5% for sunshine-based Rd, and 16.8%, 22.0% and 23.3% for temperature-based Rd. As the number of most influential air pollution parameters increased, the statistical errors declined and the prediction accuracy increased accordingly. However, the gain shrank considerably for all model types when the number of input parameters increased from two to three. Thus, increasing the number of input parameters beyond three for the construction of SVM models is not expected to be desirable, given the minor improvements (no more than 3%) and the further complexity in the inputs required to predict daily Rs and Rd. Therefore, the most significant combinations of two inputs (i.e., PM10 and O3 for Rs, PM2.5 and O3 for Rd) are preferable in terms of the number of air pollution inputs, as they balance model simplicity and prediction accuracy. However, the most significant combination of three air pollution inputs can also be considered when higher accuracy is required. In this case, the measured values of PM2.5 for Rs and PM10 for Rd should be incorporated as the third input parameter to produce a more accurate prediction of daily Rs and Rd. The results also indicate that when sunshine duration is not available, the commonly measured air temperatures along with the calculated maximum possible sunshine duration and influential air pollution parameters can be used to predict daily global and diffuse solar radiation with favorable accuracy.

Fig. 4. Values of RMSE in both training and testing stages for the most influential set of parameters with no, one, two and three air pollution inputs for the four model types (Rs_n: sunshine-based global solar radiation; Rs_T: temperature-based global solar radiation; Rd_n: sunshine-based diffuse solar radiation; Rd_T: temperature-based diffuse solar radiation; the same below).

The daily global and diffuse solar radiation values predicted by the SVM models using the most important combination with no, one, two and three air pollutants as inputs in the testing stage are compared with the corresponding measured values in Fig. 5. The results show a decrease in statistical errors, and thus an improvement in prediction accuracy, when air pollution parameters were considered. As seen from Fig. 5, the agreement between the predicted and measured solar radiation values was more favorable for combinations of two and three parameters. The dispersion of the data points in the scatter plots for combinations of two and three parameters was substantially lower than that for one and no air pollution parameter. These results further confirm the importance of air pollution parameters for the accurate prediction of daily global and diffuse solar radiation.

Fig. 5. Scatter plots of the developed models using most significant combination of no, one, two and three air pollutants as inputs in the testing data stage for the four model types.

Fig. 6 shows the monthly values of RMSE in the testing stage for the baseline models without considering the effect of air pollution and for the models improved by the most influential combination of three air pollution inputs for the four model types. The prediction accuracy in terms of RMSE was significantly improved on a monthly basis when the most influential combination of three air pollution inputs was considered, with improvements ranging from 13% in autumn to 20% in winter for sunshine-based global solar radiation, from 12% in summer to 44% in winter for temperature-based global solar radiation, from 11% in autumn to 27% in summer for sunshine-based diffuse solar radiation and from 14% in autumn to 31% in summer for temperature-based diffuse solar radiation. As seen from Fig. 6, the greatest improvement in prediction accuracy was obtained in winter for Rs and in summer for Rd. This can be largely attributed to the fact that PM2.5 and PM10 were the dominant air pollutants in winter, whereas O3 has become more dominant in summer in recent years (Fig. 2), which was also observed in previous studies [35,49].

Table 12 compares the results of this study with those of previous studies on the modeling of global and diffuse solar radiation considering air pollution inputs. As seen in Table 12, the prediction accuracy of Rs modeling from sunshine duration and AQI using the SVM model (RMSE = 1.545 MJ m−2 d−1) in this study was superior to that of the corresponding empirical models (RMSE = 1.893 MJ m−2 d−1) based on sunshine duration and API for Beijing [55]. The SVM models also outperformed the empirical models proposed by Suthar et al. [43] for nine cities in India (average RMSE = 3.293 MJ m−2 d−1) and the RF models by Sun et al. [42] for three cities in China (average RMSE = 2.268 MJ m−2 d−1). As stated earlier, the SVM model using sunshine duration along with PM2.5, PM10 and O3 further improved the prediction accuracy of Rs in Beijing (RMSE = 1.396 MJ m−2 d−1). Although the temperature-based SVM models performed worse than the sunshine duration-based SVM models for Rs estimation, their performance was better than or close to that of the sunshine-based empirical models proposed by Suthar et al. [43] and the RF models by Sun et al. [42]. Unsurprisingly, the prediction accuracy of Rd modeling from sunshine duration and AQI using the SVM model (RMSE = 1.591 MJ m−2 d−1) in this study was lower than that of the global solar radiation-based empirical models (RMSE = 1.354 MJ m−2 d−1) for Beijing [53]. However, when PM2.5, PM10 and O3 were included instead of AQI, the performance of the sunshine duration-based SVM model was comparable to that of the global solar radiation-based empirical models. These results further confirm the importance of appropriate selection of air pollution inputs for improving the accuracy of Rs and Rd prediction in air-polluted regions.
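The percentage improvements reported in Section 3.5 appear to correspond to the relative reduction in testing RMSE with respect to the baseline model. The sketch below reproduces the sunshine-based figures from the testing RMSE values given in Section 3.1 and Table 12; the function and variable names are hypothetical, and the formula is inferred from the reported numbers rather than stated explicitly in the text.

```python
def rmse_reduction_percent(baseline_rmse, improved_rmse):
    """Relative RMSE reduction of an improved model over the baseline, in %."""
    return (baseline_rmse - improved_rmse) / baseline_rmse * 100.0


# Testing RMSE (MJ m-2 d-1) of the sunshine-based baseline models (Section 3.1).
BASELINE = {"Rs": 1.794, "Rd": 1.896}

# Testing RMSE of the sunshine-based models with air pollution inputs (Table 12).
WITH_AQI = {"Rs": 1.545, "Rd": 1.591}
WITH_PM25_PM10_O3 = {"Rs": 1.396, "Rd": 1.432}

for target in ("Rs", "Rd"):
    one_input = rmse_reduction_percent(BASELINE[target], WITH_AQI[target])
    three_inputs = rmse_reduction_percent(BASELINE[target], WITH_PM25_PM10_O3[target])
    print(target, round(one_input, 1), round(three_inputs, 1))
```

Rounded to one decimal place, these values agree with the 13.9% and 22.2% reported for sunshine-based Rs and the 16.1% and 24.5% reported for sunshine-based Rd.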

Fig. 6. Monthly values of RMSE in the testing stage for the baseline models without considering the effect of air pollution and for the improved models by the most influential combination of three air pollution inputs for the four model types.

Table 12 Comparison between the results of this study with the results of previous studies regarding the modeling of global and diffuse solar radiation by considering air pollution inputs.

Reference           Location                Model type  Input parameter                      Output  RMSE (MJ m−2 d−1)
Zhao et al. [55]    Beijing, China          Empirical   n, API                               Rs      1.893
Suthar et al. [43]  Nine cities in India    Empirical   n, API                               Rs      3.293
Sun et al. [42]     Three cities in China   RF          n, Tmax, Tmin, API                   Rs      2.268
Present study       Beijing, China          SVM         n, AQI                               Rs      1.545
Present study       Beijing, China          SVM         n, PM2.5, PM10 and O3                Rs      1.396
Vakili et al. [45]  Tehran, Iran            ANN         Tmax, Tmin, RH, U2, PM2.5, PM10      Rs      0.077
Present study       Beijing, China          SVM         Tmax, Tmin, AQI                      Rs      3.343
Present study       Beijing, China          SVM         Tmax, Tmin, PM2.5, PM10 and O3       Rs      2.947
Yao et al. [53]     Beijing, China          Empirical   Rs, AQI                              Rd      1.354
Present study       Beijing, China          SVM         n, AQI                               Rd      1.591
Present study       Beijing, China          SVM         n, PM2.5, PM10 and O3                Rd      1.432
Present study       Beijing, China          SVM         Tmax, Tmin, AQI                      Rd      2.013
Present study       Beijing, China          SVM         Tmax, Tmin, PM2.5, PM10 and O3       Rd      1.857

4. Conclusions

The effects of the six air pollutants PM2.5, PM10, SO2, NO2, CO and O3, along with AQI, on the prediction of daily global and diffuse solar radiation were evaluated. Support Vector Machine (SVM) models were used to identify the most significant single air pollution parameters and combinations of parameters for daily Rs and Rd prediction, based on meteorological and air pollution data between January 2014 and December 2016 from Beijing, the capital city of China. Specifically, seven single parameters, 15 possible combinations of two parameters and 20 possible combinations of three parameters were considered for daily Rs and Rd prediction. The results show that the appropriate selection of air pollution input parameters played a significant role in the accurate prediction of daily Rs and Rd. AQI was found to be the most influential air pollution parameter for the prediction of daily Rs and Rd, followed by O3 for Rs and by PM2.5 for Rd, both of which performed only slightly worse than AQI. The combination of PM10 and O3 and the combination of PM2.5 and O3 were the most significant combinations of two air pollution inputs for daily Rs and Rd prediction, respectively. The combination of PM2.5, PM10 and O3 was the most important combination of three air pollution inputs for both daily Rs and Rd prediction. Compared with the SVM models without air pollution inputs, the accuracy of the SVM models with the most influential combinations of one, two and three air pollution inputs was improved by 13.9%, 19.8% and 22.2% in the testing stage in terms of RMSE for sunshine-based Rs, respectively. The corresponding values were 15.2%, 22.0% and 22.8% for temperature-based Rs, 16.1%, 21.5% and 24.5% for sunshine-based Rd, and 16.8%, 22.0% and 23.3% for temperature-based Rd. The results indicate that the improvement in prediction accuracy decreased considerably when the number of inputs increased from two to three. Therefore, the most significant combinations of two air pollution inputs (i.e., PM10 and O3 for Rs, PM2.5 and O3 for Rd) are preferable in terms of the number of inputs, as they balance model simplicity and prediction accuracy.

Acknowledgements

This study was jointly supported by the National Key Research and Development Program of China (No. 2016YFC0400201), the National Natural Science Foundation of China (Nos. 51509208, 51709143 and 51669015), Jiangxi Natural Science Foundation of China (No. 20171BAB216051), the Scientific Startup Foundation for Doctors of Northwest A&F University (No. Z109021613) and the “111” Project (B12007). Thanks to the National Meteorological Information Center of China Meteorological Administration for offering the meteorological data. We also acknowledge the constructive comments and suggestions from the three anonymous reviewers.

References

[1] Achour L, et al. Bouharkat M, Assas O. Hybrid model for estimating monthly global solar radiation for the Southern of Algeria: (Case study: Tamanrasset, Algeria). Energy 2017;135:526–39.
[2] Abal G, Aicardi D, Suárez RA, Laguarda A. Performance of empirical models for diffuse fraction in Uruguay. Sol Energy 2017;141:166–81.
[3] Allen RG, Pereira LS, Raes D, Smith M, et al. Crop evapotranspiration-Guidelines for computing crop water requirements-FAO Irrigation and drainage paper 56 300. Rome: FAO; 1998. p. D05109.
[4] Bakirci K. Models for the estimation of diffuse solar radiation for typical cities in Turkey. Energy 2015;82:827–38.
[5] Bakirci K. Models of solar radiation with hours of bright sunshine: a review. Renew Sustain Energy Rev 2009;13:2580–8.
[6] Bayrakçı HC, Demircan C, Keçebaş A. The development of empirical models for estimating global solar radiation on horizontal surface: a case study. Renew Sustain Energy Rev 2017.
[7] Behrang MA, Assareh E, Ghanbarzadeh A, Noghrehabadi AR. The potential of different artificial neural network (ANN) techniques in daily global solar radiation modeling based on meteorological data. Sol Energy 2010;84:1468–80.
[8] Besharat F, Dehghan AA, Faghih AR. Empirical models for estimating global solar radiation: a review and case study. Renew Sustain Energy Rev 2013;21:798–821.
[9] Chen J-L, Li G-S. Evaluation of support vector machine for estimation of solar radiation from measured meteorological variables. Theor Appl Climatol 2014;115:627–38.
[10] Chen J-L, Li G-S, Wu S-J. Assessing the potential of support vector machine for estimating daily solar radiation using sunshine duration. Energy Convers Manag 2013;75:311–8.
[11] Chen J-L, Li G-S, Xiao B-B, Wen Z-F, Lv M-Q, Chen C-D, Jiang Y, Wang X-X, Wu S-J. Assessing the transferability of support vector machine model for estimation of global solar radiation from air temperature. Energy Convers Manag 2015;89:318–29.
[12] Cheng N, Li Y, Zhang D, Chen T, Sun F, Chen C, Meng F. Characteristics of ground ozone concentration over Beijing from 2004 to 2015: trends, transport, and effects of reductions. Atmos Chem Phys Discuss 2016. http://dx.doi.org/10.5194/acp2016-508,Rev.
[13] Chukwujindu NS. A comprehensive review of empirical models for estimating global solar radiation in Africa. Renew Sustain Energy Rev 2017;78:955–95.
[14] Despotovic M, Nedic V, Despotovic D, Cvetanovic S. Review and statistical analysis of different global solar radiation sunshine models. Renew Sustain Energy Rev 2015;52:1869–80.
[15] Fan J, Chen B, Wu L, Zhang F, Lu X, Xiang Y. Evaluation and development of temperature-based empirical models for estimating daily global solar radiation in humid regions. Energy 2017.
[16] Fan J, Wang X, Wu L, Zhang F, Bai H, Lu X, Xiang Y. New combined models for estimating daily global solar radiation based on sunshine duration in humid regions: a case study in South China. Energy Convers Manag 2018;156:618–25.
[17] Fan J, Wang X, Wu L, Zhou H, Zhang F, Yu X, Xiang Y. Comparison of Support Vector Machine and Extreme Gradient Boosting for predicting daily global solar radiation using temperature and precipitation in humid subtropical climates: a case study in China. Energy Convers Manag 2018;164:102–11.
[18] Fan J, Wu L, Zhang F, Xiang Y, Zheng J. Climate change effects on reference crop evapotranspiration across different climatic zones of China during 1956–2015. J Hydrol 2016;542:923–37.
[19] Furlan C, De Oliveira AP, Soares J, Codato G, Escobedo JF. The role of clouds in improving the regression model for hourly values of diffuse solar radiation. Appl Energy 2012;92:240–54.
[20] Gao W, Tie X, Xu J, Huang R, Mao X, Zhou G, Chang L. Long-term trend of O3 in a mega City (Shanghai), China: characteristics, causes, and interactions with precursors. Sci Total Environ 2017;603:425–33.
[21] Hosseini Nazhad SH, Lotfinejad MM, Danesh M, et al. A comparison of the performance of some extreme learning machine empirical models for predicting daily horizontal diffuse solar radiation in a region of southern Iran. Int J Remote Sens 2017;38(23):6894–909.
[22] Hassan GE, Youssef ME, Mohamed ZE, Ali MA, Hanafy AA. New temperature-based models for predicting global solar radiation. Appl Energy 2016;179:437–50.
[23] Jahani B, Dinpashoh Y, Nafchi AR. Evaluation and development of empirical models for estimating daily solar radiation. Renew Sustain Energy Rev 2017;73:878–91.
[24] Jamil B, Akhtar N. Comparison of empirical models to estimate monthly mean diffuse solar radiation from measured data: case study for humid-subtropical climatic region of India. Renew Sustain Energy Rev 2017.
[25] Jamil B, Siddiqui AT. Generalized models for estimation of diffuse solar radiation based on clearness index and sunshine duration in India: applicability under different climatic zones. J Atmos Sol-Terr Phys 2017;157:16–34.
[26] Jiang Y. Estimation of monthly mean daily diffuse radiation in China. Appl Energy 2009;86:1458–64.
[27] Khodakarami J, Ghobadi P. Urban pollution and solar radiation impacts. Renew Sustain Energy Rev 2016;57:965–76.
[28] Kisi O, Parmar KS. Application of least square support vector machine and multivariate adaptive regression spline models in long term prediction of river water pollution. J Hydrol 2016;534:104–12.
[29] Li H, Liu Z, Liu K, Zhang Z. Predictive power of machine learning for optimizing solar water heater performance: the potential application of high-throughput screening. Int J Photoenergy 2017:2017.
[30] Liu X, Mei X, Li Y, Wang Q, Jensen JR, Zhang Y, Porter JR. Evaluation of temperature-based global solar radiation models in China. Agric For Meteorol 2009;149:1433–46.
[31] Liu Z, Wu D, Yu H, Ma W, Gin G. Field measurement and numerical simulation of combined solar heating operation modes for domestic buildings based on the Qinghai-Tibetan plateau case. Energy Build. 2018;167:312–21.
[32] Lotfinejad MM, Hafezi R, Khanali M, et al. A comparative assessment of predicting daily solar radiation using Bat Neural Network (BNN), Generalized Regression Neural Network (GRNN), and Neuro-Fuzzy (NF) system: a case study. Energies 2018;11(5):1188.
[33] Meenal R, Selvakumar AI. Assessment of SVM, empirical and ANN based solar radiation prediction models with most influencing input parameters. Renew Energy 2017.
[34] Mohammadi K, Shamshirband S, Petković D, Khorasanizadeh H. Determining the most important variables for diffuse solar radiation prediction using adaptive neuro-fuzzy methodology; case study: city of Kerman, Iran. Renew Sustain Energy Rev 2016;53:1570–9.
[35] Muller CO, Yu H, Zhu B. Ambient air quality in China: the impact of particulate and gaseous pollutants on IAQ. Procedia Eng 2015;121:582–9.
[36] Pandey CK, Katiyar AK. A comparative study to estimate daily diffuse solar radiation over India. Energy 2009;34:1792–6.
[37] Quej VH, Almorox J, Arnaldo JA, Saito L. ANFIS, SVM and ANN soft-computing techniques to estimate daily global solar radiation in a warm sub-humid environment. J Atmos Sol-Terr Phys 2017;155:62–70.
[38] Quej VH, Almorox J, Ibrakhimov M, Saito L. Empirical models for estimating daily global solar radiation in Yucatán Peninsula, Mexico. Energy Convers Manag 2016;110:448–56.
[39] Ramli MAM, Twaha S, Al-Turki YA. Investigating the performance of support vector machine and artificial neural networks in predicting solar radiation on a tilted surface: Saudi Arabia case study. Energy Convers Manag 2015;105:442–52.
[40] Shamshirband S, Mohammadi K, Khorasanizadeh H, Yee L, Lee M, Petković D, Zalnezhad E. Estimating the diffuse solar radiation using a coupled support vector machine–wavelet transform model. Renew Sustain Energy Rev 2016;56:428–35.
[41] Sharifi SS, Rezaverdinejad V, Nourani V. Estimation of daily global solar radiation using wavelet regression, ANN, GEP and empirical models: a comparative study of selected temperature-based approaches. J Atmos Sol-Terr Phys 2016;149:131–45.
[42] Sun H, Gui D, Yan B, Liu Y, Liao W, Zhu Y, Lu C, Zhao N. Assessing the potential of random forest method for estimating solar radiation using air pollution index. Energy Convers Manag 2016;119:121–9.
[43] Suthar M, Singh GK, Saini RP. Effects of air pollution for estimating global solar radiation in India. Int J Sustain Energy 2017;36:20–7.
[44] Teke A, Yıldırım HB, Çelik Ö. Evaluation and performance comparison of different models for the estimation of solar radiation. Renew Sustain Energy Rev 2015;50:1097–107.
[45] Vakili M, Sabbagh-Yazdi SR, Khosrojerdi S, Kalhor K. Evaluating the effect of particulate matter pollution on estimation of daily global solar radiation using artificial neural network modeling based on meteorological data. J Clean Prod 2017;141:1275–85.
[46] Vapnik V. The nature of statistical learning theory. New York: Springer Science & Business Media; 2013.
[47] Voyant C, Notton G, Kalogirou S, Nivet M-L, Paoli C, Motte F, Fouilloy A. Machine learning methods for solar radiation forecasting: a review. Renew Energy 2017;105:569–82.
[48] Wang Y, Yang Y, Zhao N, Liu C, Wang Q. The magnitude of the effect of air pollution on sunshine hours in China. J Geophys Res Atmos 2012:117.
[49] Xie Y, Zhao B, Zhang L, Luo R. Spatiotemporal variations of PM2.5 and PM10 concentrations between 31 Chinese cities and their relationships with SO2, NO2, CO and O3. Particuology 2015;20:141–9.
[50] Yacef R, Mellit A, Belaid S, Sen Z. New combined models for estimating daily global solar radiation from measured air temperature in semi-arid climates: application in Ghardaïa, Algeria. Energy Convers Manag 2014;79:606–15.
[51] Yadav AK, Chandel SS. Solar radiation prediction using Artificial Neural Network techniques: a review. Renew Sustain Energy Rev 2014;33:772–81.
[52] Yang X, Zhao C, Zhou L, Wang Y, Liu X. Distinct impact of different types of aerosols on surface solar radiation in China. J Geophys Res Atmos 2016;121:6459–71.
[53] Yao W, Zhang C, Wang X, Sheng J, Zhu Y, Zhang S. The research of new daily diffuse solar radiation models modified by air quality index (AQI) in the region with heavy fog and haze. Energy Convers Manag 2017;139:140–50.
[54] Zhang J, Zhao L, Deng S, Xu W, Zhang Y. A critical review of the models used to estimate solar radiation. Renew Sustain Energy Rev 2017;70:314–29.
[55] Zhao N, Zeng X, Han S. Solar radiation estimation using sunshine hour and air pollution index in China. Energy Convers Manag 2013;76:846–51.
[56] Zou L, Wang L, Xia L, Lin A, Hu B, Zhu H. Prediction and comparison of solar radiation using improved empirical models and adaptive neuro-fuzzy inference systems. Renew Energy 2017;106:343–53.
[57] Zhou L, Zheng Y, Ouyang M, Lu L. A study on parameter variation effects on battery packs for electric vehicles. J. Power Sour. 2017;364:242–52.