Improved pollution forecasting hybrid algorithms based on the ensemble method


Accepted Manuscript

Improved Pollution Forecasting Hybrid Algorithms based on the Ensemble Method

Hui Liu, Yinan Xu, Chao Chen

PII: S0307-904X(19)30230-6
DOI: https://doi.org/10.1016/j.apm.2019.04.032
Reference: APM 12779

To appear in: Applied Mathematical Modelling

Received date: 6 December 2018
Revised date: 4 April 2019
Accepted date: 10 April 2019

Please cite this article as: Hui Liu , Yinan Xu , Chao Chen , Improved Pollution Forecasting Hybrid Algorithms based on the Ensemble Method, Applied Mathematical Modelling (2019), doi: https://doi.org/10.1016/j.apm.2019.04.032

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Highlights

- The proposed model realizes accurate multi-step forecasting for urban fine particle concentration.
- The empirical wavelet transform and Stacking ensemble methods improve the accuracy of the proposed model.
- The Hampel identifier and outlier robust extreme learning machine are used to realize the robustness of the forecasting.
- The inverse empirical wavelet transform reconstruction solves the over-fitting problem of the forecasting.


Hui Liu*, Yinan Xu, Chao Chen

Institute of Artificial Intelligence and Robotics (IAIR), Key Laboratory of Traffic Safety on Track of Ministry of Education, School of Traffic and Transportation Engineering, Central South University, Changsha 410075, China

*Corresponding Author: Prof. Hui Liu, Professor, Institute of Artificial Intelligence and Robotics, Key Laboratory of Traffic Safety on Track of Ministry of Education, School of Traffic and Transportation Engineering, Central South University, Changsha 410075, Hunan, China. Tel.: +86 13637487240. Email: [email protected]

Abstract

A new forecasting model for fine particle concentrations is proposed. First, Hampel identifier outlier correction preprocessing detects and corrects the outliers in the original series. The empirical wavelet transform then adaptively decomposes the corrected series into a set of subseries, and each subseries is used to train a Stacking ensemble. Within the Stacking ensemble, an outlier robust extreme learning machine meta-learner combines several Elman neural network base learners and outputs the forecast for each subseries. Finally, the forecast subseries are combined and reconstructed by the inverse empirical wavelet transform to obtain the final fine particle concentration forecasts. The experiments in the study show that the proposed model achieves better accuracy and wider applicability than the existing models it is compared with.

Keywords: Urban fine particle concentration forecasting; hybrid forecasting model; Stacking ensemble method

1 Introduction

1.1 Background and significance

With the rapid development of industry and the growth of population, air pollution has become a global problem. Particulate matter smaller than 2.5 micrometers (PM2.5), a mix of solid and liquid particles in the air, can be inhaled directly into the alveoli of the lungs, making it a major concern for the international community. It has been shown that PM2.5 may affect the human respiratory system and cause chronic respiratory diseases [1]. An extremely serious air pollution incident occurred in North China in February 2014, during which the average PM2.5 concentrations in Beijing and Shijiazhuang were 170 μg/m3 and 290 μg/m3, respectively. It is also recorded that the severe haze of January 2013 may have led to losses of USD 253.8 million in Beijing [2].

Given the poor state of current air quality, solutions are urgently needed. However, the management and control of fine particle air pollution is complex and difficult because of the complicated sources of air pollution and its strong association with the weather and wind direction [3]. Predicting the atmospheric composition is considered an effective way to support better air quality management measures [4].

1.2 Related works

In recent years, many fine particle concentration forecasting models have been proposed to assist air quality management. These models can be divided into three major categories: deterministic models, statistical models and hybrid models [5, 6]. Deterministic methods simulate the discharge, accumulation, diffusion and transfer of atmospheric elements by employing meteorological, emission and chemistry models [7]. However, they need large amounts of computational resources and are extremely time-consuming [8]. Because of their complex computational steps and slow computing speed, deterministic models are not implemented in most instances [9].

In contrast, statistical models are computationally much quicker and simpler [10]. An ARIMA model was applied to forecast fine particle concentrations [11]. The results showed that fine particle concentrations in Fuzhou, China experienced seasonal fluctuations over the past two years, which were attributed to the Fuzhou Government's macro-policy on air quality. Another statistical model widely used in air pollution forecasting is the grey model. The grey model approach was adopted to control indoor air quality [20], and a grey multivariable convolution model with new information priority accumulation was proposed, which could be used in air pollution prediction [21]. The grey model was initially used to predict air pollutant emissions from a medical incinerator [22]. Air quality indicators in the Beijing-Tianjin-Hebei region were predicted [12], lightly polluted days in the Jing-Jin-Ji region of China were predicted [13], and the monthly air quality index for cities in China was predicted using the grey Holt-Winters model [14]; the results showed that air pollutant emissions could be predicted even when the emission data were insufficient. However, although these single statistical models can generate forecasts in a short time, their accuracy still has room for improvement.

Hybrid models combine data decomposition, modeling or optimization methods to improve the forecasting performance. For example, a decomposition-ensemble principle was presented together with a meta-heuristic optimization method called the grey wolf optimizer [8]. The original fine particle series was decomposed into several intrinsic mode functions (IMFs) by complementary ensemble empirical mode decomposition (CEEMD), and each IMF was then predicted by support vector regression (SVR) optimized by the grey wolf optimizer. The results showed that the proposed decomposition-ensemble model has high prediction accuracy. Likewise, secondary decomposition methods have been used to further improve the forecasting ability. For instance, a hybrid secondary decomposition model was built [15]. Wavelet packet decomposition was used to decompose the original series, and the high-frequency components were further decomposed by the CEEMD algorithm. Phase space reconstruction was then employed to determine the optimal input of each IMF. The decomposed components were modeled by the least squares support vector regression (LSSVR) model, optimized by the particle swarm optimization method. The results showed that the hybrid model outperforms the other benchmark methods. A two-phase decomposition technique was proposed and combined with the extreme learning machine (ELM) optimized by the differential evolution algorithm [5]. CEEMD was first used to decompose the original series into a set of IMFs, and variational mode decomposition (VMD) was then used to decompose the high-frequency IMFs into a number of variational modes. A dynamic evaluation framework was used to monitor ambient air pollution [16], in which an air contaminant concentration forecast engine was proposed that combines the improved CEEMD and LSSVM optimized by an improved sine cosine optimization algorithm. In the field of fine particle concentration forecasting, it has been shown that these hybrid models can predict fine particle concentrations effectively.

However, it is worth investigating whether other methods can further improve the accuracy, stability and robustness of hybrid models. In the study, a new fine particle concentration forecasting framework is proposed using decomposition, ensemble, forecasting and reconstruction strategies. In the framework, the Hampel identifier (HI) outlier correction preprocessing and the outlier-robust ELM (ORELM) meta-learner provide robustness, the Stacking ensemble method improves the forecasting accuracy compared with some existing mainstream models, and the inverse empirical wavelet transform (IEWT) reconstruction method guarantees the stability of the ensemble framework. As shown in Table 1, the proposed model improves the fine particle concentration forecasting accuracy compared with some other existing mainstream models. A group of real data is used to verify the effectiveness of the proposed model.

Table 1 Some available forecasting results of the models proposed in related works.

| Models (Step-1) | Study area (China) | Horizon | RMSE (μg/m3) |
|---|---|---|---|
| [3] | Lanzhou | 1 day | 4.89 (Table 3 in [3]) |
| [5] | Shanghai | 1 day | 3.27 (Table 2 in [5]) |
| [15] | Chengdu | 1 hour | 3.77 (Table 3 in [15]) |
| New model | Beijing | 1 hour | 0.36 |
| New model | Shijiazhuang | 1 hour | 0.20 |
| New model | Harbin | 1 hour | 0.28 |

In machine learning and neural networks, the processing of the training set is important. The presence of outliers can influence the training process of learning algorithms and lead to over-fitting or poor generalization ability [17]. Outlier detection and correction methods have been used in many fields, including noisy highly periodic data [18], gas turbine dynamic time series [19] and streaming data [20]. However, outlier correction techniques are typically not applied in fine particle concentration forecasting. In the paper, the Hampel identifier (HI) [21], one of the most robust outlier detection and correction methods, is employed to detect and correct the outliers in the original fine particle concentration data.

All of the decomposition methods in existing hybrid models require parameters to be predetermined manually, such as the number of IMFs in CEEMD and the number of modes in VMD. It has been shown that too few modes in VMD can lead to insufficient segmentation, while too many modes may capture additional noise or cause mode mixing problems [22]. The empirical wavelet transform (EWT) can autonomously divide the frequency band according to the spectrum of the signal and generate corresponding filters to decompose the data series, which overcomes the need to predetermine parameters [23]. However, the EWT has not been widely used in fine particle concentration forecasting. In the field of wind speed forecasting, the EWT has been used to decompose the wind speed series into several subseries [24], and it was concluded that the EWT can improve the forecasting accuracy. In the study, the EWT method is employed to decompose the corrected series after HI preprocessing.

1.3 Innovations and contributions

In general, there is room for improvement in the accuracy, robustness and stability of existing models. In the study, an ensemble multi-step forecasting model for fine particle concentration is proposed.

The main contributions of the study are summarized as follows:

(A) ORELM and HI promote the robustness of the hybrid model. The HI outlier detection and correction method is used to preprocess the training set of the original fine particle concentration series, recognizing and correcting outliers and reducing the complexity of the original series. ORELM is a novel ELM with better robustness [25]. The model combines HI outlier correction preprocessing with the ORELM to improve robustness and stability.

(B) A Stacking ensemble method that integrates four ENN base learners with an ORELM meta-learner is proposed to enhance the forecasting performance. The ENNs serve as base learners, and the ORELM serves as the meta-learner that integrates their results. The Stacking ensemble method can improve the prediction accuracy.

(C) The IEWT reconstruction method is used as post-processing to correct the over-fitted values of the PM2.5 concentration results and to promote stability. The forecast subseries are combined and reconstructed by the IEWT. The results of the study indicate that the IEWT can solve the over-fitting problem of Stacking and consequently improve the learning and fitting ability of the proposed model.


2 Study area and data description

Northern China, whose air pollution is among the most serious in the world, is the focus area of the study. Five sets of PM2.5 concentration data sampled at one-hour intervals are used to verify the effectiveness of the proposed model.

All fine particle concentration series used in the study are collected from official Chinese PM2.5 concentration monitoring sites. Series 1 and 2 are taken from the Chemical Engineering School and High-Tech District sites in Shijiazhuang, Hebei province, whose codes are 1028A and 1030A. Series 3 and 4 are taken from the Peace-Magnificent Park and Jianguo Road sites in Harbin, Heilongjiang province, whose codes are 1133A and 1137A. Series 5 is taken from Haidian District, Beijing, whose code is 1007A. All of the experimental data were collected from April 3rd, 2016 to May 5th, 2016.

Series 1-4 are used to analyze the performance of the proposed model, and series 5 is used to compare the proposed model with some existing models. Each PM2.5 concentration series includes 900 observations and is divided into three parts. The 1st to 500th observations form the training set, used to train the hybrid model or the base learners of the Stacking ensemble method. The 501st to 700th observations form the validation set, used to find the optimal window length of HI and to train the meta-learners of the Stacking ensemble method. The 701st to 900th observations form the testing set, used to estimate the forecasting accuracy of the proposed models. Fig. 1 shows the original data used in the study.
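The 500/200/200 partition described above can be sketched in a few lines of Python. This is only an illustration: the function name `split_series` and the synthetic placeholder series are assumptions, not part of the study, and the real monitoring data are not reproduced here.

```python
import numpy as np

def split_series(series, n_train=500, n_val=200, n_test=200):
    """Split a 1-D series chronologically into training, validation and testing sets."""
    series = np.asarray(series, dtype=float)
    if series.size != n_train + n_val + n_test:
        raise ValueError("series length must equal n_train + n_val + n_test")
    train = series[:n_train]                    # observations 1..500
    val = series[n_train:n_train + n_val]       # observations 501..700
    test = series[n_train + n_val:]             # observations 701..900
    return train, val, test

# Synthetic placeholder standing in for one 900-point PM2.5 series (assumption).
rng = np.random.default_rng(0)
demo_series = rng.uniform(10.0, 150.0, size=900)
train, val, test = split_series(demo_series)
```

The split is chronological rather than shuffled, so the validation and testing sets always lie after the training data in time.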


Fig. 1 Five fine particle concentration series in the study.

3 Methodology


3.1 Hampel identifier outlier correction method

The Hampel identifier (HI) is considered one of the most effective and robust outlier correction methods, and it is widely used in practice [21]. HI can filter the abnormal information out of a data series. The specific procedure of HI can be described as follows.

Set the input fine particle concentration series $X = \{x_1, x_2, x_3, \ldots, x_n\}$. Set the length of the sliding window $w$. Set the evaluation parameter to its default value 0.6745. Set the threshold $TR$. According to the $3\sigma$ statistical rule, the threshold $TR$ is set to 3 in the study. If the sample $x_i$ meets the condition $|Z'| > TR$, the sample $x_i$ is regarded as an outlier and replaced by $m_i$. In the sliding window, the median of the local subseries is calculated as follows:

$$m_i = \operatorname{median}\left(x_{i-k}, x_{i-(k-1)}, \ldots, x_i, \ldots, x_{i+(k-1)}, x_{i+k}\right) \tag{1}$$

where $x_i$ is the $i$th sample of the input data. The median absolute deviation (MAD) of the local subseries in the sliding window is defined as:

$$\mathrm{MAD}_i = \operatorname{median}\left(|x_{i-k} - m_i|, \ldots, |x_i - m_i|, \ldots, |x_{i+(k-1)} - m_i|, |x_{i+k} - m_i|\right) \tag{2}$$

After that, the $Z'$ score can be defined as:

$$Z' = \frac{x_i - m_i}{\mathrm{MAD}_i / 0.6745} \tag{3}$$
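The procedure of Eqs. (1)-(3) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `hampel_correct`, the half-width argument `k`, the truncation of the window at the ends of the series and the skipping of windows whose MAD is zero are all assumptions made for the example.

```python
import numpy as np

def hampel_correct(x, k=3, threshold=3.0, scale=0.6745):
    """Replace samples whose |Z'| exceeds the threshold by the local median."""
    x = np.asarray(x, dtype=float)
    corrected = x.copy()
    is_outlier = np.zeros(x.size, dtype=bool)
    for i in range(x.size):
        # Window x[i-k .. i+k], truncated at the series boundaries (assumption).
        lo, hi = max(0, i - k), min(x.size, i + k + 1)
        window = x[lo:hi]
        m_i = np.median(window)                    # Eq. (1)
        mad_i = np.median(np.abs(window - m_i))    # Eq. (2)
        if mad_i == 0.0:
            continue  # Z' is undefined; leave the sample unchanged (assumption)
        z = (x[i] - m_i) / (mad_i / scale)         # Eq. (3)
        if abs(z) > threshold:
            corrected[i] = m_i
            is_outlier[i] = True
    return corrected, is_outlier

# Toy series with one obvious spike at index 5.
toy = [10, 11, 10, 12, 11, 100, 10, 11, 12, 10, 11]
toy_corrected, toy_flags = hampel_correct(toy, k=3)
```

In this toy series only the spike is flagged, and it is replaced by the median of its window. The half-width `k` plays the role of the window length that the study tunes on the validation set.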

3.2 Empirical wavelet transform decomposition

The EWT mainly contains two functions: the empirical scaling function and the empirical wavelet function. The empirical scaling function is defined as a low-pass filter, and it can be expressed as follows:

$$\hat{\phi}_n(\omega) = \begin{cases} 1, & |\omega| \le (1-\gamma)\omega_n \\ \cos\left[\dfrac{\pi}{2}\beta\left(\dfrac{1}{2\gamma\omega_n}\left(|\omega| - (1-\gamma)\omega_n\right)\right)\right], & (1-\gamma)\omega_n \le |\omega| \le (1+\gamma)\omega_n \\ 0, & \text{otherwise} \end{cases} \tag{4}$$

The empirical wavelet function is defined as a band-pass filter on each frequency band, and it can be expressed as follows:

$$\hat{\psi}_n(\omega) = \begin{cases} 1, & (1+\gamma)\omega_n \le |\omega| \le (1-\gamma)\omega_{n+1} \\ \cos\left[\dfrac{\pi}{2}\beta\left(\dfrac{1}{2\gamma\omega_{n+1}}\left(|\omega| - (1-\gamma)\omega_{n+1}\right)\right)\right], & (1-\gamma)\omega_{n+1} \le |\omega| \le (1+\gamma)\omega_{n+1} \\ \sin\left[\dfrac{\pi}{2}\beta\left(\dfrac{1}{2\gamma\omega_n}\left(|\omega| - (1-\gamma)\omega_n\right)\right)\right], & (1-\gamma)\omega_n \le |\omega| \le (1+\gamma)\omega_n \\ 0, & \text{otherwise} \end{cases} \tag{5}$$

where $\omega_n$ is the $n$th boundary between frequency bands and $\gamma \in (0,1)$ controls the width of the transition phase. In order to make sure that the EWT set $\{\phi_1(t), \{\psi_n(t)\}_{n=1}^{N}\}$ is a set of orthogonal bases of $L^2(\mathbb{R})$, $\gamma$ should be limited as:

$$\gamma < \min_n \left(\frac{\omega_{n+1} - \omega_n}{\omega_{n+1} + \omega_n}\right)$$

In the study, $\beta(x)$ is set to the following widely used form: $\beta(x) = x^4(35 - 84x + 70x^2 - 20x^3)$, $0 \le x \le 1$.

3.3 Stacking ensemble method

In the section, the Stacking ensemble method, the ENN base learners and the ORELM meta-learner are introduced.

3.3.1 Stacked generalization

Stacked generalization (Stacking) is a nonlinear ensemble method [26]. Stacking uses a meta-learner to apply nonlinear weightings to different base learners, so it can also be considered a combination strategy. The Stacking method makes the hybrid model extensible. Many experimental results demonstrate that an effective Stacking hybrid model can outperform any of its base learners [27].

The procedure for establishing the Stacking hybrid model can be described as follows. The initial data are denoted $A = \{(x_n, y_n) \mid n = 1, 2, \ldots, N\}$, where $x_n$ is the input data and $y_n$ is the output (predicted) data. In the study, ENNs are used as the base learners. If the base learners differ from each other, the primary integration is heterogeneous, which can perform better in prediction than a homogeneous primary integration. To make the four ENN base learners distinct, their specific parameters are set differently. The base learners are denoted $\{M_k \mid k = 1, 2, 3, 4\}$. The non-empty proper subsets $A_1$, $A_2$, $A_3$ of $A$ should meet the following requirements:

$$A_1 \cup A_2 \cup A_3 = A, \qquad A_1 \cap A_2 \cap A_3 = \varnothing \tag{6}$$

The subset $A_1 = \{(x_n, y_n) \mid n = 1, 2, \ldots, N_1\}$ is the training set in the study and is used to train the base learners, so that the effects of base learners with different settings can be compared. The subset $A_2 = \{(x_n, y_n) \mid n = 1, 2, \ldots, N_2\}$ is the validation set; the trained base learners generate predictions on it, and these predictions are used to train the meta-learner. The outputs of the base learners, which are the inputs of the meta-learner, are denoted $P_n = \{(P_n^1, P_n^2, P_n^3, P_n^4) \mid n = 1, 2, \ldots, N_2\}$. The ORELM is used as the meta-learner $M'$ in the study. The outputs of the base learners and the validation targets, $\{(P_n, y_n) \mid n = 1, 2, \ldots, N_2\}$, are used to train the meta-learner $M'$. After all base learners and the meta-learner are trained, the subset $A_3 = \{(x_n, y_n) \mid n = 1, 2, \ldots, N_3\}$ is used as the testing set to test the hybrid Stacking model.
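The three-subset procedure above can be illustrated with a short sketch. To keep the example self-contained, ridge regressors with different penalties stand in for the four ENN base learners, and a ridge regressor stands in for the ORELM meta-learner; these stand-ins, the helper names, the lag length of 6 and the synthetic series are all assumptions, not the models used in the study.

```python
import numpy as np

def make_lagged(series, n_lags):
    """Build (inputs, targets) pairs for one-step-ahead forecasting."""
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y

def fit_ridge(X, y, alpha):
    """Closed-form ridge regression with a bias column; returns the weights."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.linalg.solve(Xb.T @ Xb + alpha * np.eye(Xb.shape[1]), Xb.T @ y)

def predict_ridge(w, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ w

# Synthetic placeholder series (assumption): daily cycle plus noise.
rng = np.random.default_rng(0)
t = np.arange(900)
series = 50 + 20 * np.sin(2 * np.pi * t / 24) + rng.normal(0, 2, size=900)

X, y = make_lagged(series, n_lags=6)
X1, y1 = X[:500], y[:500]        # A1: trains the base learners
X2, y2 = X[500:700], y[500:700]  # A2: trains the meta-learner
X3, y3 = X[700:], y[700:]        # A3: tests the stacked model

# Four base learners M_1..M_4 that differ in one hyperparameter.
base_models = [fit_ridge(X1, y1, a) for a in (0.01, 1.0, 10.0, 100.0)]

# Base-learner outputs P_n on A2 become the meta-learner inputs.
P2 = np.column_stack([predict_ridge(w, X2) for w in base_models])
meta_model = fit_ridge(P2, y2, alpha=1e-3)

# Test on A3: base learners first, then the meta-learner.
P3 = np.column_stack([predict_ridge(w, X3) for w in base_models])
y_hat = predict_ridge(meta_model, P3)
stack_rmse = np.sqrt(np.mean((y3 - y_hat) ** 2))
```

The key point is that the meta-learner never sees the data used to fit the base learners: it is trained only on base-learner predictions for the validation subset.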

3.3.2 Elman neural network base learner

The ENN is a typical recurrent neural network [28]. Unlike other artificial neural networks, the ENN contains several context units. The context units store the output of the hidden layer and feed it back to the hidden layer after a delay in subsequent iterations. As a result, the ENN is sensitive to the history of its input data, and the context units give it a short-term memory for processing dynamic information [29].
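The role of the context units can be seen in a forward pass of a generic Elman network. The sketch below is an assumption-laden illustration: the tanh activation, the zero initial context, the weight shapes and the name `elman_forward` are choices made for the example, and training is not shown.

```python
import numpy as np

def elman_forward(x_seq, W_in, W_ctx, b_h, W_out, b_out):
    """Run an Elman network over a sequence and return one output per time step."""
    context = np.zeros(W_ctx.shape[0])  # context units start empty (assumption)
    outputs = []
    for x_t in x_seq:
        # The hidden layer sees the current input and the previous hidden state.
        hidden = np.tanh(W_in @ x_t + W_ctx @ context + b_h)
        outputs.append(W_out @ hidden + b_out)
        context = hidden  # context units store the hidden output for the next step
    return np.array(outputs)

# Random weights for a network with 3 inputs, 15 hidden units and 1 output.
rng = np.random.default_rng(1)
n_in, n_hidden = 3, 15
W_in = rng.normal(size=(n_hidden, n_in))
W_ctx = 0.1 * rng.normal(size=(n_hidden, n_hidden))
b_h = np.zeros(n_hidden)
W_out = rng.normal(size=(1, n_hidden))
b_out = np.zeros(1)
x_seq = rng.normal(size=(10, n_in))
y_seq = elman_forward(x_seq, W_in, W_ctx, b_h, W_out, b_out)
```

Setting `W_ctx` to zero removes the feedback and reduces the network to a feedforward one, which makes the contribution of the context units easy to isolate.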

3.3.3 Outlier-robust extreme learning machine meta-learner

The extreme learning machine (ELM) is one of the most powerful techniques in machine learning because of its excellent calculation speed and results [30]. However, it can be unreliable when outliers exist in the training samples. Therefore, a novel outlier-robust extreme learning machine (ORELM) was proposed to improve the robustness of the model based on the sparsity characteristic of outliers [25]. The mathematical model for finding the output weight $\beta$ with the $l_1$-norm can be described as:

$$\min_{\beta} \ \|e\|_1 + \frac{1}{C}\|\beta\|_2^2, \quad \text{s.t. } y - H\beta = e \tag{7}$$

where $e$ is the training error, $H$ is the hidden-layer output matrix and $C$ is the regularization parameter that balances the training error against the norm of the output weight. The augmented Lagrangian function is given as follows:

$$L_\mu(e, \beta, \lambda) = \|e\|_1 + \frac{1}{C}\|\beta\|_2^2 + \lambda^T (y - H\beta - e) + \frac{\mu}{2}\|y - H\beta - e\|_2^2 \tag{8}$$

where $\lambda$ is the vector of Lagrangian multipliers and $\mu$ is a penalty parameter set as $2N / \|y\|_1$ [31].

The augmented Lagrange multiplier (ALM) method is used to estimate the optimal solution $(e, \beta)$ and the multiplier $\lambda$:

$$\begin{cases} \beta_{k+1} = \arg\min_{\beta} L_\mu(e_k, \beta, \lambda_k) \\ e_{k+1} = \arg\min_{e} L_\mu(e, \beta_{k+1}, \lambda_k) \\ \lambda_{k+1} = \lambda_k + \mu (y - H\beta_{k+1} - e_{k+1}) \end{cases} \tag{9}$$

This leads to the following solution for $\beta_{k+1}$:

$$\beta_{k+1} = \left(H^T H + \frac{2}{C\mu} I\right)^{-1} H^T \left(y - e_k + \frac{\lambda_k}{\mu}\right) \tag{10}$$

The ORELM is used in the study because it can handle over-fitting to outliers and improve robustness. The numbers of input nodes and hidden nodes are set to six and five, respectively, and the regularization parameter $C$ is set to 240 in the study.
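The ALM iteration of Eqs. (9) and (10) can be sketched as below. Several details are assumptions made for the example rather than settings reported in the study: the sigmoid activation, the uniform random input weights, the fixed number of 20 iterations and the function names. The update of $e$ uses the standard soft-thresholding solution of the $l_1$ subproblem, which is not written out in the text above.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise shrinkage: closed-form minimiser of the l1 subproblem in e."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def hidden_output(X, W, b):
    """Hidden-layer output matrix H with a sigmoid activation (assumption)."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def orelm_fit(X, y, n_hidden=5, C=240.0, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # random, fixed input weights
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = hidden_output(X, W, b)
    n = y.size
    mu = 2.0 * n / np.sum(np.abs(y))  # penalty parameter 2N / ||y||_1
    # (H^T H + 2/(C mu) I)^{-1} H^T does not change across iterations.
    P = np.linalg.solve(H.T @ H + (2.0 / (C * mu)) * np.eye(n_hidden), H.T)
    e = np.zeros(n)
    lam = np.zeros(n)
    for _ in range(n_iter):
        beta = P @ (y - e + lam / mu)                          # Eq. (10)
        e = soft_threshold(y - H @ beta + lam / mu, 1.0 / mu)  # e-update in Eq. (9)
        lam = lam + mu * (y - H @ beta - e)                    # multiplier update in Eq. (9)
    return W, b, beta

def orelm_predict(X, W, b, beta):
    return hidden_output(X, W, b) @ beta

# Toy regression data with a few gross outliers in the targets (assumption).
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(200, 6))
y_demo = 3.0 + X_demo @ np.array([0.5, -0.2, 0.1, 0.3, -0.4, 0.2])
y_demo[::25] += 30.0
W_demo, b_demo, beta_demo = orelm_fit(X_demo, y_demo)
y_fit = orelm_predict(X_demo, W_demo, b_demo, beta_demo)
```

Because the error term is penalised with the $l_1$-norm, large residuals caused by outliers are absorbed by the sparse vector `e` instead of distorting `beta`.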

3.4 Inverse empirical wavelet transform reconstruction

The IEWT is used to reconstruct the combined PM2.5 concentration subseries after the forecasting process in the study. The reconstruction of the IEWT method can be defined by the following equation:

$$f(t) = w_f(0,t) \star \phi_1(t) + \sum_{n=1}^{N} w_f(n,t) \star \psi_n(t) = \left(\hat{w}_f(0,\omega)\,\hat{\phi}_1(\omega) + \sum_{n=1}^{N} \hat{w}_f(n,\omega)\,\hat{\psi}_n(\omega)\right)^{\vee} \tag{11}$$

where $\star$ denotes convolution, the hat denotes the Fourier transform and $(\cdot)^{\vee}$ denotes the inverse Fourier transform.

3.5 Framework of the proposed ensemble hybrid model

The framework of the proposed ensemble model is shown in Fig. 2. The detailed modeling steps are given as follows:

(A) The original fine particle concentration series is divided into three datasets: the training set, the validation set and the testing set.
(B) HI outlier correction preprocessing corrects the outliers in the training set. The corrected training set and the original validation and testing sets are the input series of the following steps.
(C) The training set of the input data series is adaptively decomposed into a series of subseries by the EWT decomposition method.
(D) Each subseries is forecasted by the ENN base predictors. The numbers of neurons of the different base predictors are set to different values.
(E) The ORELM is set as the meta-predictor to combine the results of the different ENN base predictors by the Stacking ensemble method.
(F) The Stacking outputs the forecasting results of the corresponding original subseries. The IEWT reconstruction method combines the results and reconstructs them to get the final results.


Fig. 2 The framework of the proposed PM2.5 concentration forecasting hybrid model


4 Evaluation indices and main results

4.1 Error evaluation indices

In order to evaluate the forecasting performance of the proposed model accurately, three widely used evaluation indices of PM2.5 concentration forecasting are selected to calculate the forecasting errors: the Mean Absolute Percentage Error (MAPE), the Mean Absolute Error (MAE) and the Root Mean Square Error (RMSE), which are calculated by the following equations:

$$\mathrm{MAPE} = \left(\sum_{t=1}^{N} \left|\frac{x(t) - \hat{x}(t)}{x(t)}\right|\right) / N \tag{12}$$

$$\mathrm{MAE} = \left(\sum_{t=1}^{N} \left|x(t) - \hat{x}(t)\right|\right) / N \tag{13}$$

$$\mathrm{RMSE} = \sqrt{\left(\sum_{t=1}^{N} \left[x(t) - \hat{x}(t)\right]^2\right) / N} \tag{14}$$

where $x(t)$ is the actual value of the PM2.5 concentration series, $\hat{x}(t)$ is the forecast value and $N$ is the number of samples in the testing set.

4.2 Sample entropy

Sample entropy (SampEn, SE) is a modification of approximate entropy (ApEn). SE is widely used to evaluate the complexity of time series [32]. In general, a smaller value of SE represents better self-similarity of the time series.

The sample entropy can be defined as follows:

$$SE = -\log\frac{A}{B} \tag{15}$$

where $A$ is the number of template vector pairs satisfying $d[X_{m+1}(i), X_{m+1}(j)] < r$, and $B$ is the number of template vector pairs satisfying $d[X_m(i), X_m(j)] < r$. In the study, $m$ is set to 3 and $r$ is set to 0.2 times the standard deviation.
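The error indices of Section 4.1 and the sample entropy of Section 4.2 can be computed with the sketch below. The function names are illustrative, and two details of the sample entropy are assumptions because the text does not fix them: the distance $d$ is taken as the Chebyshev (maximum) distance, and the same number of templates is used for lengths $m$ and $m+1$.

```python
import numpy as np

def mape(actual, forecast):
    """Eq. (12); multiply by 100 for the percentage reported in the tables."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return np.mean(np.abs((actual - forecast) / actual))

def mae(actual, forecast):
    """Eq. (13)."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return np.mean(np.abs(actual - forecast))

def rmse(actual, forecast):
    """Eq. (14)."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return np.sqrt(np.mean((actual - forecast) ** 2))

def sample_entropy(x, m=3, r_factor=0.2):
    """Eq. (15) with r = r_factor * std(x) and Chebyshev distance (assumption)."""
    x = np.asarray(x, dtype=float)
    r = r_factor * np.std(x)
    n = x.size

    def count_pairs(length):
        # n - m templates for both lengths, so that A and B are comparable.
        templates = np.array([x[i:i + length] for i in range(n - m)])
        count = 0
        for i in range(len(templates) - 1):
            d = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.sum(d < r))
        return count

    B = count_pairs(m)
    A = count_pairs(m + 1)
    return -np.log(A / B)  # undefined when A or B is zero

# A perfectly periodic series: every length-3 match is also a length-4 match.
periodic = np.tile([1.0, 2.0, 3.0, 4.0], 25)
se_periodic = sample_entropy(periodic)
```

For the periodic toy series, A equals B, so the sample entropy is zero; this matches the statement that smaller SE values indicate stronger self-similarity.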

4.3 Main forecasting results

In the section, the main results for the different series are displayed. Figures 3-7 show the results of multi-step PM2.5 concentration forecasting. As shown in Figures 3-6, series 1-4 are used to analyze different components of the proposed model: series 1 is used for HI, series 2 for Stacking and series 4 for IEWT reconstruction. As shown in Fig. 3 and Fig. 4, the EWT-Stacking-Elman model (in yellow) and the HI-EWT-Stacking-Elman model (in grey) do not perform well in 5-step forecasting, especially where the waveform of the original series changes drastically. The Stacking ensemble method may lead to over-fitting because of the high complexity of the original series. The proposed model solves this over-fitting problem by means of the IEWT reconstruction method. The IEWT reconstruction method can significantly improve the fitting and learning ability of the hybrid model and restrain over-fitting, which is a particularly serious problem in the Stacking ensemble method. The IEWT and Stacking therefore complement each other.

Fig. 3 Results of the PM2.5 concentration series 1 in 5-step forecasting.


Fig. 4 Results of the PM2.5 concentration series 2 in 5-step forecasting.


Fig. 5 Results of the PM2.5 concentration series 3 in 5-step forecasting.

Fig. 6 Results of the PM2.5 concentration series 4 in 5-step forecasting.

Fig. 7 Results of the PM2.5 concentration series 5 in 1-step forecasting.

5 Comparison analysis

5.1 Analysis of HI outlier correction

In the section, the effectiveness and improvement of the HI outlier correction method are analyzed. The major parameters of HI are set as follows: the length of the sliding window w is selected by trial and error, and the threshold TR is set to its default value of 3. In particular, the length of the sliding window w is selected in a range of [5, 45] to improve adaptability to different series. SE is used to show the decrease in complexity of the initial data after HI outlier correction, and Table 2 shows the specific reduction. Fig. 8 shows the results of series 1 after HI outlier correction. The outlier correction result shows that HI can distinguish the outliers from normal points and correct them to normal positions.

Table 2 The sample entropy of initial series and series after HI preprocessing.

| Sample entropy | Series 1 | Series 2 | Series 3 | Series 4 |
|---|---|---|---|---|
| Before HI | 0.6074 | 0.5927 | 0.8834 | 0.5405 |
| After HI | 0.6071 | 0.5832 | 0.8830 | 0.5325 |
| Best window length | 16 | 11 | 3 | 6 |

From Table 2, it can be seen that: (A) because different series have different volatility characteristics, the best sliding window lengths of HI differ, and the outliers in the four series appear to be random; (B) HI reduces the sample entropy of the initial data. As shown in Table 2, the SE values of all series after HI outlier correction are lower than those before HI. The complexity of the original series is reduced by removing the outliers. Consequently, the self-similarity and learnability of the input data are improved.

Fig. 8 HI outlier correction results of four PM2.5 concentration series.

In addition, the evaluation indices of the validation set are calculated to show the improvement brought by HI. Four groups of models are compared: (A) HI-EWT-Elman vs. EWT-Elman; (B) HI-EWT-Stacking-Elman vs. EWT-Stacking-Elman; (C) HI-EWT-Elman-IEWT vs. EWT-Elman-IEWT; (D) HI-EWT-Stacking-Elman-IEWT vs. EWT-Stacking-Elman-IEWT. The comparison results of the different groups are shown in Tables 3 and 4.

Table 3 Specific error indices of different comparison models in series 1.

| Step | Model without HI | MAPE (%) | MAE (μg/m3) | RMSE (μg/m3) | Model with HI | MAPE (%) | MAE (μg/m3) | RMSE (μg/m3) |
|---|---|---|---|---|---|---|---|---|
| 1 | EWT-Elman | 1.6737 | 0.7322 | 0.9142 | HI-EWT-Elman | 1.5076 | 0.6772 | 0.8433 |
| 3 | EWT-Elman | 3.6097 | 1.5184 | 1.8846 | HI-EWT-Elman | 3.1869 | 1.3495 | 1.6733 |
| 5 | EWT-Elman | 8.0443 | 3.2954 | 4.0561 | HI-EWT-Elman | 7.3583 | 3.0230 | 3.7150 |
| 1 | EWT-Stacking-Elman | 1.5882 | 0.8244 | 1.4599 | HI-EWT-Stacking-Elman | 1.4169 | 0.6136 | 0.7550 |
| 3 | EWT-Stacking-Elman | 3.1724 | 1.8559 | 3.9777 | HI-EWT-Stacking-Elman | 3.0941 | 1.7550 | 3.3723 |
| 5 | EWT-Stacking-Elman | 6.7110 | 3.7029 | 7.5899 | HI-EWT-Stacking-Elman | 6.0797 | 3.5538 | 7.1464 |
| 1 | EWT-Elman-IEWT | 0.8580 | 0.3778 | 0.4594 | HI-EWT-Elman-IEWT | 0.7798 | 0.3219 | 0.4078 |
| 3 | EWT-Elman-IEWT | 3.3248 | 1.3850 | 1.6800 | HI-EWT-Elman-IEWT | 2.7820 | 1.1544 | 1.4174 |
| 5 | EWT-Elman-IEWT | 6.9544 | 2.8674 | 3.4108 | HI-EWT-Elman-IEWT | 6.7192 | 2.7839 | 3.2884 |
| 1 | EWT-Stacking-Elman-IEWT | 0.6014 | 0.3088 | 0.4530 | HI-EWT-Stacking-Elman-IEWT | 0.3839 | 0.1614 | 0.2006 |
| 3 | EWT-Stacking-Elman-IEWT | 2.6553 | 1.3936 | 2.0241 | HI-EWT-Stacking-Elman-IEWT | 1.7387 | 0.8392 | 1.1380 |
| 5 | EWT-Stacking-Elman-IEWT | 5.3945 | 2.3010 | 2.8923 | HI-EWT-Stacking-Elman-IEWT | 4.5303 | 1.8427 | 2.2465 |

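Table 4 The improving percentages after HI for series 1.

| Comparison | Step | PMAPE (%) | PMAE (%) | PRMSE (%) |
|---|---|---|---|---|
| HI-EWT-Elman vs. EWT-Elman | 1 | 9.9210 | 7.5173 | 7.7521 |
| HI-EWT-Elman vs. EWT-Elman | 3 | 11.7130 | 11.1196 | 11.2136 |
| HI-EWT-Elman vs. EWT-Elman | 5 | 8.5276 | 8.2661 | 8.4076 |
| HI-EWT-Stacking-Elman vs. EWT-Stacking-Elman | 1 | 10.7842 | 25.5690 | 48.2827 |
| HI-EWT-Stacking-Elman vs. EWT-Stacking-Elman | 3 | 2.4670 | 5.4634 | 15.2190 |
| HI-EWT-Stacking-Elman vs. EWT-Stacking-Elman | 5 | 9.4067 | 4.0275 | 5.8430 |
| HI-EWT-Elman-IEWT vs. EWT-Elman-IEWT | 1 | 9.1131 | 14.8077 | 11.2322 |
| HI-EWT-Elman-IEWT vs. EWT-Elman-IEWT | 3 | 16.3239 | 16.6470 | 15.6311 |
| HI-EWT-Elman-IEWT vs. EWT-Elman-IEWT | 5 | 3.3814 | 2.9138 | 3.5886 |
| HI-EWT-Stacking-Elman-IEWT vs. EWT-Stacking-Elman-IEWT | 1 | 36.1774 | 47.7515 | 55.7248 |
| HI-EWT-Stacking-Elman-IEWT vs. EWT-Stacking-Elman-IEWT | 3 | 34.5202 | 39.7841 | 43.7783 |
| HI-EWT-Stacking-Elman-IEWT vs. EWT-Stacking-Elman-IEWT | 5 | 16.0197 | 19.9191 | 22.3284 |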
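The formula behind the improving percentages is not stated in this part of the text, but the values in Table 4 appear consistent with the relative reduction of each error index, that is, (error without HI minus error with HI) divided by error without HI. The sketch below checks this reading against the 1-step errors of EWT-Elman and HI-EWT-Elman from Table 3; the function name is illustrative and the formula should be treated as an inference rather than the authors' stated definition.

```python
def improving_percentage(error_before, error_after):
    """Relative error reduction in percent (inferred definition, see lead-in)."""
    return (error_before - error_after) / error_before * 100.0

# 1-step errors of EWT-Elman (before) and HI-EWT-Elman (after) from Table 3.
p_mape = improving_percentage(1.6737, 1.5076)
p_mae = improving_percentage(0.7322, 0.6772)
p_rmse = improving_percentage(0.9142, 0.8433)
```

The three results agree with the first row of Table 4 to within the rounding of the four-decimal inputs.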

Tables 3 and 4 show that, in all four comparison groups, HI improves the forecasting accuracy at every horizon from one step to five steps. In addition, the improving percentages after HI depend on the specific structure of the hybrid model: they can exceed 40% and reach as high as 55.72% (PRMSE of the HI-EWT-Stacking-Elman-IEWT model in Table 4). The maximum improving percentage appears in the 1-step forecasting.

5.2 Analysis of Stacking ensemble method

In the proposed hybrid model, the Stacking ensemble method is used to improve the predictive performance of the ENNs. The ORELM is chosen as the meta-predictor to integrate the results of the base learners. This section analyzes the effectiveness of the Stacking ensemble method by comparing the individual HI-EWT-ENN models with the HI-EWT-Stacking-ENN model. Each HI-EWT-ENN has a different number of hidden neurons: 15, 20, 25 or 30. The MAEs of the base predictors are presented in Fig. 9, and the forecasting performance of the above models is presented in Table 5.
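The Stacking idea described above (base forecasts become the input features of a meta-predictor) can be sketched as follows. This is only an illustration under stated assumptions: four moving-average forecasters stand in for the four ENNs, and a ridge-regularized linear least-squares combiner stands in for the ORELM, so the combination here is linear rather than nonlinear. The synthetic series, the helper names and the train/hold-out split are all hypothetical.

```python
import math


def moving_average_forecast(series, window):
    """One-step-ahead forecast at time t: mean of the previous `window` values."""
    return [sum(series[t - window:t]) / window for t in range(window, len(series))]


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_meta_weights(base_preds, targets, ridge=1e-6):
    """Linear meta-learner: least-squares weights over the base forecasts."""
    k = len(base_preds)
    gram = [[sum(p * q for p, q in zip(base_preds[i], base_preds[j]))
             + (ridge if i == j else 0.0) for j in range(k)] for i in range(k)]
    rhs = [sum(p * y for p, y in zip(base_preds[i], targets)) for i in range(k)]
    return solve_linear(gram, rhs)


def mse(pred, actual):
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)


# Synthetic series for illustration only (not PM2.5 data).
series = [50 + 10 * math.sin(0.3 * t) + 3 * math.sin(1.7 * t) for t in range(120)]
windows = [2, 3, 4, 5]  # stand-ins for base learners of different capacity
max_w = max(windows)
targets = series[max_w:]
# Align every base forecast with the same target time stamps.
base_preds = [moving_average_forecast(series, w)[max_w - w:] for w in windows]

split = 80  # meta-learner is fitted on the first part, checked on the rest
weights = fit_meta_weights([p[:split] for p in base_preds], targets[:split])
stacked = [sum(w * p[t] for w, p in zip(weights, base_preds))
           for t in range(len(targets))]

train_mse_base = [mse(p[:split], targets[:split]) for p in base_preds]
train_mse_stacked = mse(stacked[:split], targets[:split])
test_mse_stacked = mse(stacked[split:], targets[split:])
print(train_mse_base, train_mse_stacked, test_mse_stacked)
```

On the fitting segment the combined forecast cannot be worse than the best single base forecast (up to the small ridge term), because each base forecast is itself one admissible weight vector. The hold-out error is printed separately, since a meta-learner fitted this way can still over-fit.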

Fig. 9 The MAEs of different base learners

Table 5 The improving percentages after Stacking in PM2.5 concentration series 2.

Step  PMAPE (%)  PMAE (%)  PRMSE (%)
HI-EWT-ENN-15 vs. HI-EWT-Stacking-ENN
1     72.8564    67.4546   66.8889
3     29.7356    13.7283   7.1794
5     40.2696    33.8558   33.3260
HI-EWT-ENN-20 vs. HI-EWT-Stacking-ENN
1     53.4858    42.9999   42.4487
3     30.5257    11.9403   7.0224
5     38.9948    33.8853   35.0481
HI-EWT-ENN-25 vs. HI-EWT-Stacking-ENN
1     65.4461    61.0827   60.7301
3     32.1511    22.2556   15.3172
5     36.6464    33.1665   34.2609
HI-EWT-ENN-30 vs. HI-EWT-Stacking-ENN
1     71.1172    67.9485   67.8744
3     59.7242    55.2615   52.7095
5     43.1676    39.2120   40.9019

As shown in Fig. 9 and Table 5, the base predictors produce diverse forecasting results, which means that they can provide diverse features for the Stacking method. In addition, the improving percentage of the Stacking ensemble method reaches up to 72.86% (Table 5), which indicates the remarkable ability of Stacking. This can be explained by the fact that the Stacking method uses the meta-predictor to achieve a nonlinear combination of the base predictors with minimum error.

5.3 Analysis of IEWT reconstruction

In this section, the results of the IEWT reconstruction method are analyzed. The processes of EWT and IEWT are illustrated in Fig. 10. To quantify the effectiveness of IEWT, four comparison groups of hybrid models are set up as follows: (A) EWT-Elman-IEWT vs. EWT-Elman; (B) EWT-Stacking-Elman-IEWT vs. EWT-Stacking-Elman; (C) HI-EWT-Elman-IEWT vs. HI-EWT-Elman; (D) HI-EWT-Stacking-Elman-IEWT vs. HI-EWT-Stacking-Elman. The evaluation indices of the comparison groups are shown in Tables 6 and 7.
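For reference, the three evaluation indices reported in these tables can be computed as in the sketch below. It assumes the standard definitions of MAE, RMSE and MAPE (the latter expressed in percent); the function names and the three sample values are illustrative and are not taken from the study.

```python
import math


def mae(actual, predicted):
    """Mean absolute error, in the unit of the series (e.g. μg/m3)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error, in the unit of the series."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical observed and forecast concentrations.
observed = [10.0, 20.0, 40.0]
forecast = [12.0, 18.0, 44.0]
print(mae(observed, forecast), rmse(observed, forecast), mape(observed, forecast))
```

MAE and RMSE share the unit of the series, while MAPE is scale-free, which is why the tables report all three side by side.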

Table 6 Specific error indices of different comparison models in series 4

Step  MAPE (%)  MAE (μg/m3)  RMSE (μg/m3)
EWT-Elman
1     2.6846    0.9365       1.1407
3     4.6131    1.8123       2.2640
5     8.6505    3.4586       4.3352
EWT-Elman-IEWT
1     1.5160    0.5427       0.6825
3     3.7084    1.4168       1.7740
5     7.7719    3.0210       3.7315
EWT-Stacking-Elman
1     1.2097    0.4237       0.5137
3     3.7214    1.2624       1.6163
5     8.2265    2.8987       3.5254
EWT-Stacking-Elman-IEWT
1     0.7519    0.2575       0.3051
3     3.5916    1.2411       1.4984
5     7.5199    2.6462       3.2419
HI-EWT-Elman
1     1.7677    0.6321       0.7955
3     3.9682    1.5019       1.8513
5     8.0752    3.1647       3.8740
HI-EWT-Elman-IEWT
1     1.4231    0.4680       0.5769
3     3.3668    1.3132       1.6753
5     7.6682    2.9777       3.6298
HI-EWT-Stacking-Elman
1     1.0466    0.3823       0.5011
3     3.6853    1.2939       1.5789
5     7.2568    2.7389       3.4912
HI-EWT-Stacking-Elman-IEWT
1     0.5062    0.1848       0.2294
3     2.5835    0.9624       1.1926
5     4.5327    1.5825       2.0211

Table 7 The improving percentages of IEWT in the PM2.5 concentration series 4

Step  PMAPE (%)  PMAE (%)  PRMSE (%)
EWT-Elman-IEWT vs. EWT-Elman
1     43.5306    43.5306   43.5306
3     19.6123    19.6123   19.6123
5     10.1575    10.1575   10.1575
EWT-Stacking-Elman-IEWT vs. EWT-Stacking-Elman
1     37.8430    37.8430   37.8430
3     3.4874     3.4874    3.4874
5     8.5893     8.5893    8.5893
HI-EWT-Elman-IEWT vs. HI-EWT-Elman
1     19.4958    19.4958   19.4958
3     15.1559    15.1559   15.1559
5     5.0401     5.0401    5.0401
HI-EWT-Stacking-Elman-IEWT vs. HI-EWT-Stacking-Elman
1     51.6315    51.6759   54.2282
3     29.8985    25.6205   24.4628
5     37.5385    42.2210   42.1017

Tables 6 and 7 show that, in all four comparison groups, IEWT improves the forecasting accuracy at every horizon from one step to five steps. In addition, the improving percentages of IEWT reach 51.63%-54.23% for the HI-EWT-Stacking-Elman-IEWT model in the 1-step forecasting (Table 7).

Fig. 10 The procedure and result of IEWT in the PM2.5 concentration series 4

As shown in Fig. 10, the EWT decomposition method decomposes the complex original series into subseries with higher learnability, which improves the forecasting accuracy of the proposed hybrid model.

5.4 Analysis of comparison with the existing models

In this section, several existing fine particle concentration forecasting models are selected for comparison with the model proposed in the study: (A) CEEMD-VMD-DE-ELM; (B) EEMD-PSR-LSSVR; (C) SDA-LSSVR-PSO; (D) CEEMD-GWO-SVR. Because some of the existing models cannot forecast in multiple steps, the proposed model is compared with them in the 1-step forecasting (Step-1). The evaluation indices of the different models are shown in Table 8.

Table 8 The evaluation indices of some existing models and the model proposed in the study

Models (Step-1)    MAPE (%)  MAE (μg/m3)  RMSE (μg/m3)  PMAPE (%)  PMAE (%)  PRMSE (%)
CEEMD-VMD-DE-ELM   6.1525    1.6178       2.1039        83.36      82.58     82.80
EEMD-PSR-LSSVR     15.0236   4.7555       6.0483        93.18      94.07     94.01
SDA-LSSVR-PSO      29.5788   7.3659       8.5226        96.53      96.17     95.75
CEEMD-GWO-SVR      16.9457   5.6800       7.5946        93.95      95.03     95.23
New model          1.0236    0.2818       0.3617        -          -         -
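The improving percentages in Table 8 are consistent with the relative error reduction of the new model with respect to each compared model. The sketch below assumes that reading, P = (E_compared - E_new) / E_compared × 100, and reproduces the CEEMD-VMD-DE-ELM row from the error values listed in Table 8; the function name `improvement_pct` is illustrative.

```python
def improvement_pct(compared_error, new_error):
    """Relative error reduction of the new model, in percent."""
    return (compared_error - new_error) / compared_error * 100.0


# Step-1 errors from Table 8: CEEMD-VMD-DE-ELM vs. the new model.
p_mape = improvement_pct(6.1525, 1.0236)
p_mae = improvement_pct(1.6178, 0.2818)
p_rmse = improvement_pct(2.1039, 0.3617)
print(round(p_mape, 2), round(p_mae, 2), round(p_rmse, 2))
```

Small differences in the last digit relative to the tabulated percentages are expected, because the inputs used here are the rounded four-decimal errors.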

From Table 8, the following conclusions can be drawn: (A) The replication results of models A, B and D in the study are close to the RMSE results of the existing models given in Table 1, so it is reasonable to use the replication results for comparison with the model proposed in the study. (B) The model proposed in the study outperforms the existing hybrid models. As shown in Table 8, the improvement percentages of the evaluation indices of the proposed model relative to the existing models range from 82.58% to 96.53%, an improvement attributed to the Stacking ensemble method and the EWT decomposition method.

6 Conclusions

In this study, a novel ensemble hybrid model is proposed to forecast hourly fine particle concentrations in multiple steps. Focusing on Northern China, three of the most polluted cities are selected to train and test the hybrid model. The proposed model combines outlier correction preprocessing, a data decomposition method, an ensemble method, neural networks and data reconstruction. The HI outlier correction preprocessing detects and corrects the outliers of the original PM2.5 concentration series, which improves the robustness of the hybrid model. The EWT adaptively decomposes the corrected series into a set of subseries, and each subseries is used to train the Stacking ensemble. Four different ENNs are set as the base learners, and the ORELM is selected as the meta-learner to integrate the results of the base learners. The IEWT method reconstructs the combined forecasts of the subseries to obtain the final PM2.5 concentration forecast.

The experiments and analysis lead to the following conclusions: (A) The model proposed in the study improves the performance of multi-step urban fine particle concentration forecasting compared with the existing models. (B) The EWT decomposition and Stacking ensemble methods improve the forecasting accuracy of the hybrid model. (C) The HI outlier correction preprocessing and the ORELM improve the robustness of the hybrid model. (D) The Stacking method may lead to over-fitting in fine particle forecasting. (E) The IEWT reconstruction method solves the over-fitting problem of the hybrid model and improves its stability.

Acknowledgements

The study is fully supported by the National Natural Science Foundation of China (Grant No. 61873283), the Shenghua Yu-ying Talents Program of the Central South University and the innovation driven project of the Central South University (Project No. 2019CX005).
