Accepted Manuscript

Improved Pollution Forecasting Hybrid Algorithms based on the Ensemble Method

Hui Liu, Yinan Xu, Chao Chen

PII: S0307-904X(19)30230-6
DOI: https://doi.org/10.1016/j.apm.2019.04.032
Reference: APM 12779

To appear in: Applied Mathematical Modelling

Received date: 6 December 2018
Revised date: 4 April 2019
Accepted date: 10 April 2019

Please cite this article as: Hui Liu, Yinan Xu, Chao Chen, Improved Pollution Forecasting Hybrid Algorithms based on the Ensemble Method, Applied Mathematical Modelling (2019), doi: https://doi.org/10.1016/j.apm.2019.04.032
Highlights
The proposed model realizes the accurate multi-step forecasting for urban fine particle concentration.
The empirical wavelet transform and Stacking ensemble methods improve the accuracy of the proposed model.
The Hampel identifier and outlier robust extreme learning machine are used to realize the robustness of the forecasting.
The inverse empirical wavelet transform reconstruction solves the over-fitting problem of the forecasting.
Improved Pollution Forecasting Hybrid Algorithms based on the Ensemble Method
Hui Liu*, Yinan Xu, Chao Chen
Institute of Artificial Intelligence and Robotics (IAIR), Key Laboratory of Traffic Safety on Track of Ministry of Education, School of Traffic and Transportation Engineering, Central South University, Changsha 410075, China

*Corresponding author: Prof. Hui Liu, Institute of Artificial Intelligence and Robotics, Key Laboratory of Traffic Safety on Track of Ministry of Education, School of Traffic and Transportation Engineering, Central South University, Changsha 410075, Hunan, China. Tel.: +86 13637487240. Email: [email protected]
Abstract

In a new fine particle concentration forecasting model, Hampel identifier outlier correction preprocessing detects and corrects the outliers in the original series. The empirical wavelet transform method adaptively decomposes the corrected series into a set of subseries, and each subseries is used to train the Stacking ensemble method. In the Stacking ensemble forecasting method, an outlier-robust extreme learning machine meta-learner combines different Elman neural network base learners and outputs the forecasting results of the different subseries. The forecasting subseries are then combined and reconstructed by the inverse empirical wavelet transform to obtain the final fine particle concentration forecasts. The study shows that the proposed model achieves better accuracy and wider applicability than the existing models.

Keywords: Urban fine particle concentration forecasting; hybrid forecasting model; Stacking ensemble method

1 Introduction

1.1 Background and significance
With the rapid development of industry and the expansion of population, air pollution has become a global problem. Particulate matter smaller than 2.5 micrometers (PM2.5), a mix of solid and liquid particles in the air, can be inhaled directly into the alveoli of the lungs, making it a major concern in the international community. It has been proved that PM2.5 may affect the human respiratory system and cause chronic respiratory diseases [1]. An extremely serious air pollution incident happened in North China in February 2014, during which the average PM2.5 concentrations in Beijing and Shijiazhuang reached 170 μg/m3 and 290 μg/m3, respectively. It is also recorded that the severely hazy conditions in January 2013 may have led to $253.8 million USD in losses in Beijing [2].

Because of the poor state of current air quality, it is urgent to find solutions. However, the management and control of fine particle air pollution is complex and difficult due to the complicated sources of air pollution and the strong associations with the weather and wind direction [3]. Prediction of the atmospheric composition is considered an effective way to support better air quality management measures [4].
1.2 Related works
In recent years, plenty of fine particle concentration forecasting models have been proposed to assist the management of air quality. The forecasting models can be divided into three major categories: deterministic models, statistical models and hybrid models [5, 6]. The deterministic methods simulate the processes of discharge, accumulation, diffusion and transfer of atmospheric elements by employing meteorological, emission and chemistry models [7]. However, the deterministic methods need masses of computational resources and are extremely time-consuming in accomplishing the forecasting process [8]. Because of the disadvantages of complex computational steps and slow computing speed, these deterministic models are not implemented in most instances [9].

On the contrary, statistical models are computationally much quicker and easier [10]. An ARIMA model was applied to forecast fine particle concentrations [11]. The results showed that fine particle concentrations in Fuzhou, China experienced seasonal fluctuations over the past two years, caused by the Fuzhou Government's macro-policy on air quality. Another statistical model widely used in air pollution forecasting is the grey model. The grey model approach was adopted to control indoor air quality [20], and the grey multivariable convolution model was proposed with new information priority accumulation, which could be used in air pollution prediction [21]. Initially, the grey model was used to predict air pollutant emissions from a medical incinerator [22]. Air quality indicators in the Beijing-Tianjin-Hebei region were predicted [12], the lightly polluted days in the Jing-Jin-Ji region of China were predicted [13], and the monthly air quality index for cities in China was predicted by using the grey Holt-Winters model [14]; the results showed that the air pollutant emissions could be predicted even if the emission data were insufficient. However, although the previously mentioned single statistical models can generate forecasting results in a short time, their accuracy still has room for improvement.
Hybrid models combine data decomposition, modeling or optimization methods to improve the forecasting performance. For example, a decomposition-ensemble principle was presented with a meta-heuristic optimization method called the grey wolf optimizer [8]. The original fine particle series was decomposed into several intrinsic mode functions (IMFs) by complementary ensemble empirical mode decomposition (CEEMD), and then each IMF was predicted by support vector regression (SVR) optimized by the grey wolf optimizer. The study results showed that the proposed decomposition-ensemble model has high prediction accuracy. Likewise, some secondary decomposition methods are utilized to further improve the forecasting ability. For instance, a hybrid secondary decomposition model was built [15]. The wavelet packet decomposition was used to decompose the original series, and the high-frequency components were further decomposed by the CEEMD algorithm. Then phase space reconstruction was employed to determine the optimal input of each IMF. The decomposed components were modeled by the least square support vector regression (LSSVR) model, optimized by the particle swarm optimization method. Results showed that the hybrid model outperforms the other benchmark methods. A two-phase decomposition technique was proposed, combined with the extreme learning machine (ELM) optimized by a differential evolution algorithm [5]. The CEEMD was firstly used to decompose the original series into a set of IMFs, and then variational mode decomposition (VMD) was utilized to decompose the high frequency IMFs into a number of variational modes. A dynamic evaluation framework was utilized to monitor ambient air pollution [16], where an air contaminant concentration forecast engine was proposed, combining the improved CEEMD and LSSVM optimized by an improved sine cosine optimization algorithm. In the field of fine particle concentration forecasting, it has been proved that the previously mentioned hybrid models can predict fine particle concentrations effectively.
However, it is worth investigating whether there are methods which can further promote the accuracy, stability and robustness of the hybrid models. In the study, a new fine particle concentration forecasting framework is proposed using decomposition, ensemble, forecasting and reconstruction strategies. In the framework, the Hampel identifier (HI) outlier correction preprocessing and the outlier-robust ELM (ORELM) meta-learner are utilized to realize the robustness of the fine particle concentration forecasting, the Stacking ensemble method improves the forecasting accuracy of the framework compared to some existing mainstream models, and the inverse empirical wavelet transform (IEWT) reconstruction method guarantees the stability of the ensemble framework. As shown in Table 1, the new model proposed in the study improves the fine particle concentration forecasting accuracy compared to some other existing mainstream models. A group of real data is used to verify the effectiveness of the model proposed in the study.
Table 1 Some available forecasting results of the proposed model and models in related works.

Models (Step-1)    Study area (China)    Horizon    RMSE (μg/m3)
[3]                Lanzhou               1 day      4.89 (Table 3 in [3])
[5]                Shanghai              1 day      3.27 (Table 2 in [5])
[15]               Chengdu               1 hour     3.77 (Table 3 in [15])
New model          Beijing               1 hour     0.36
New model          Shijiazhuang          1 hour     0.20
New model          Harbin                1 hour     0.28
In the field of machine learning and neural networks, the processing of the training set is important. The presence of outliers can influence the training process of learning algorithms and lead to over-fitting or poor generalization ability [17]. Outlier detection and correction methods have been used in many fields, including noisy highly periodic data [18], gas turbine dynamic time series [19], streaming data [20], etc. But outlier correction techniques are typically not applied in fine particle concentration forecasting. In the paper, the Hampel identifier (HI) [21], one of the most robust outlier detection and correction methods, is employed to detect and correct the outliers in the original fine particle concentration data.

All of the decomposition methods in existing hybrid models need to predetermine parameters artificially, like the number of IMFs in CEEMD and the number of modes in VMD. It has been proved that too few modes in VMD can lead to insufficient segmentation, while too many modes may capture additional noise or cause mode mixing problems [22]. The empirical wavelet transform (EWT) can autonomously divide the frequency band according to the spectrum of the signal and generate corresponding filters to decompose the data series, overcoming the defect of predetermined parameters [23]. However, the EWT has not been widely used in fine particle concentration forecasting. In the field of wind speed forecasting, the EWT has been utilized to decompose the wind speed series into several subseries [24], and the conclusion was drawn that the EWT can improve the forecasting accuracy. In the study, the EWT method is employed to decompose the corrected series after HI preprocessing.

1.3 Innovations and contributions

In general, there is room for improving the accuracy, robustness and stability of existing models. In the study, an ensemble fine particle concentration multi-step forecasting model is proposed.
The main contributions of the study are summarized as follows:
(A) ORELM and HI promote the robustness of the hybrid model. In the study, the HI outlier detection and correction method is selected to preprocess the training set of the original fine particle concentration series, to recognize and correct outliers in the normal data series and to reduce the complexity of the original series. ORELM is a novel ELM with better robustness [25]. Innovatively, the model employs the HI outlier correction preprocessing and the novel ORELM to improve the robustness and stability.
(B) The Stacking ensemble method integrated with four ENN base learners and an ORELM meta-learner is proposed to enhance the forecasting performance. Using the ENNs as base learners, and the ORELM as the meta-learner to integrate the results of the base learners, the Stacking ensemble method is proposed to forecast the fine particle concentrations. The Stacking ensemble method can improve the prediction accuracy.
(C) The IEWT reconstruction method is selected as a post-processing step to correct the over-fitted values of the PM2.5 concentration results and promote the stability. The forecasting subseries are combined and reconstructed by the IEWT. The results of the study indicate that the IEWT can solve the over-fitting problem of Stacking and consequently improve the learning and fitting ability of the model proposed in the study.
2 Study area and data description

Northern China, whose air pollution issue is one of the most critical and serious in the world, is the focus area of the study. In the study, five sets of hourly PM2.5 concentration data are used to verify the effectiveness of the model proposed in the study.

All fine particle concentration series used in the study are collected from Chinese official PM2.5 concentration monitoring sites. Series 1 and 2 are taken from the Chemical Engineering School and High-Tech District sites in Shijiazhuang, Hebei province, whose codes are 1028A and 1030A. Series 3 and 4 are taken from the Peace-Magnificent Park and Jianguo Road sites in Harbin, Heilongjiang province, whose codes are 1133A and 1137A. Series 5 is taken from Haidian District, Beijing, whose code is 1007A. All of the experimental data used in the study are collected from April 3rd, 2016 until May 5th, 2016.
Series 1-4 are used to analyze the performance of the model proposed in the study, and series 5 is used to compare the model proposed in the study with some existing models. Each PM2.5 concentration series includes 900 observations and is divided into three parts. The 1st-500th data are defined as the training set to train the hybrid model or the base learners of the Stacking ensemble method. The 501st-700th data are defined as the validation set to get the optimal window length of HI and to train the meta-learner of the Stacking ensemble method. And the 701st-900th data are defined as the testing set to estimate the forecasting accuracy of the models proposed in the study. Fig. 1 demonstrates the original data used in the study.
Fig. 1 Five fine particle concentration series in the study.
3 Methodology

3.1 Hampel identifier outlier correction method
The Hampel identifier (HI) is considered to be one of the most effective and robust outlier correction methods, and it is widely utilized in practice [21]. HI can filter the abnormal information of a data series. The specific procedure of HI can be described as follows.

Set the input fine particle concentration series X = \{x_1, x_2, x_3, \ldots, x_n\}. Set the length of the sliding window w. Set the evaluation parameter \kappa, which is set to the default value 0.6745. Set the threshold TR; according to the 3\sigma statistical rule, the threshold TR is set as 3 in the study. If a sample x_i meets the condition Z' > TR, the sample x_i is regarded as an outlier and is replaced by the value m_i. In the sliding window, the median of the local subseries is calculated by the following equation:

m_i = \mathrm{median}\left( x_{i-k}, x_{i-(k-1)}, \ldots, x_i, \ldots, x_{i+(k-1)}, x_{i+k} \right)    (1)

where x_i is the i-th sample of the input data. The MAD (median absolute deviation) of the local subseries in the sliding window is defined as:

\mathrm{MAD}_i = \mathrm{median}\left( \left| x_{i-k} - m_i \right|, \ldots, \left| x_i - m_i \right|, \ldots, \left| x_{i+(k-1)} - m_i \right|, \left| x_{i+k} - m_i \right| \right)    (2)

After that, the Z' score can be defined as:

Z' = \frac{\left| x_i - m_i \right|}{\mathrm{MAD}_i / \kappa} = \frac{\left| x_i - m_i \right|}{\mathrm{MAD}_i / 0.6745}    (3)
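A minimal sketch of the HI procedure in Eqs. (1)-(3) is given below, assuming a symmetric window of half-width k (so w = 2k + 1) and the default κ = 0.6745 and TR = 3 used in the study; it is illustrative and not the authors' implementation.

```python
import numpy as np

def hampel_correct(x, k=8, kappa=0.6745, tr=3.0):
    """Detect outliers and replace them with the local median, following Eqs. (1)-(3)."""
    x = np.asarray(x, dtype=float)
    corrected = x.copy()
    n = len(x)
    for i in range(n):
        lo, hi = max(0, i - k), min(n, i + k + 1)    # sliding window around x_i
        window = x[lo:hi]
        m_i = np.median(window)                      # Eq. (1): local median
        mad_i = np.median(np.abs(window - m_i))      # Eq. (2): local MAD
        if mad_i == 0:
            continue                                 # flat window, nothing to flag
        z = np.abs(x[i] - m_i) / (mad_i / kappa)     # Eq. (3): Z' score
        if z > tr:                                   # 3-sigma rule: treat as outlier
            corrected[i] = m_i                       # replace the outlier with the median
    return corrected

# Example: an obvious spike is pulled back to the local level.
demo = np.array([20., 21., 19., 22., 180., 21., 20., 22., 19., 21.])
print(hampel_correct(demo, k=3))
```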
3.2 Empirical wavelet transform decomposition
The EWT mainly contains two functions: the empirical wavelet function and the empirical scale function. The empirical wavelet function is defined as a band-pass filter on each frequency band, and it can be expressed as follows:

\hat{\psi}_n(\omega) =
\begin{cases}
1, & (1+\gamma)\omega_n \le |\omega| \le (1-\gamma)\omega_{n+1} \\
\cos\left[ \frac{\pi}{2} \beta\left( \frac{|\omega| - (1-\gamma)\omega_{n+1}}{2\gamma\omega_{n+1}} \right) \right], & (1-\gamma)\omega_{n+1} \le |\omega| \le (1+\gamma)\omega_{n+1} \\
\sin\left[ \frac{\pi}{2} \beta\left( \frac{|\omega| - (1-\gamma)\omega_n}{2\gamma\omega_n} \right) \right], & (1-\gamma)\omega_n \le |\omega| \le (1+\gamma)\omega_n \\
0, & \text{otherwise}
\end{cases}    (4)

The empirical scale function is defined as a low-pass filter, and it can be expressed as follows:

\hat{\phi}_n(\omega) =
\begin{cases}
1, & |\omega| \le (1-\gamma)\omega_n \\
\cos\left[ \frac{\pi}{2} \beta\left( \frac{|\omega| - (1-\gamma)\omega_n}{2\gamma\omega_n} \right) \right], & (1-\gamma)\omega_n \le |\omega| \le (1+\gamma)\omega_n \\
0, & \text{otherwise}
\end{cases}    (5)
where \gamma \in (0,1) controls the width of the transition band. In order to make sure that the EWT \{\phi_1(t), \{\psi_n(t)\}_{n=1}^{N}\} is a set of orthogonal bases of L^2(\mathbb{R}), the \gamma should be limited as \gamma < \min_n \left[ (\omega_{n+1} - \omega_n) / (\omega_{n+1} + \omega_n) \right]. In the study, the \beta(x) is set as follows, which is widely used: \beta(x) = x^4 (35 - 84x + 70x^2 - 20x^3), \ 0 \le x \le 1.
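The empirical filters in Eqs. (4)-(5) translate directly to code. The sketch below only builds the Fourier-domain masks on a frequency grid with hypothetical boundary frequencies `omega` and γ = 0.1; it is not a full EWT implementation and not the authors' code.

```python
import numpy as np

def beta(x):
    """Transition polynomial used in the study: beta(x) = x^4 (35 - 84x + 70x^2 - 20x^3)."""
    x = np.clip(x, 0.0, 1.0)
    return x**4 * (35 - 84 * x + 70 * x**2 - 20 * x**3)

def scale_filter(w, wn, gamma):
    """Low-pass empirical scale function of Eq. (5) evaluated on frequency grid w."""
    aw = np.abs(w)
    out = np.zeros_like(aw)
    out[aw <= (1 - gamma) * wn] = 1.0
    band = (aw >= (1 - gamma) * wn) & (aw <= (1 + gamma) * wn)
    out[band] = np.cos(0.5 * np.pi * beta((aw[band] - (1 - gamma) * wn) / (2 * gamma * wn)))
    return out

def wavelet_filter(w, wn, wn1, gamma):
    """Band-pass empirical wavelet function of Eq. (4) between boundaries wn and wn1."""
    aw = np.abs(w)
    out = np.zeros_like(aw)
    out[(aw >= (1 + gamma) * wn) & (aw <= (1 - gamma) * wn1)] = 1.0
    up = (aw >= (1 - gamma) * wn1) & (aw <= (1 + gamma) * wn1)
    out[up] = np.cos(0.5 * np.pi * beta((aw[up] - (1 - gamma) * wn1) / (2 * gamma * wn1)))
    lo = (aw >= (1 - gamma) * wn) & (aw <= (1 + gamma) * wn)
    out[lo] = np.sin(0.5 * np.pi * beta((aw[lo] - (1 - gamma) * wn) / (2 * gamma * wn)))
    return out

# Hypothetical segment boundaries (rad/sample) detected from a spectrum; gamma is admissible.
w = np.linspace(0, np.pi, 512)
omega = [0.6, 1.8]
gamma = 0.1
phi1 = scale_filter(w, omega[0], gamma)               # low-pass subband mask
psi1 = wavelet_filter(w, omega[0], omega[1], gamma)   # first band-pass subband mask
# Multiplying the FFT of a series by these masks and inverting yields the EWT subseries.
```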
3.3 Stacking ensemble method

In this section, the Stacking ensemble method, the ENN base learners and the ORELM meta-learner are introduced.

3.3.1 Stacked generalization
Stacked generalization is a nonlinear ensemble method [26]. Stacking selects a meta-learner to apply nonlinear weightings to the different base learners, so Stacking can also be considered a combination strategy. The Stacking method gives the hybrid model expansibility. Many experimental results demonstrate that an effective Stacking hybrid model can do better than any of its base learners [27].

The specific procedure for establishing the Stacking hybrid model can be described as follows.
The initial data can be set as A = \{(x_n, y_n) \mid n = 1, 2, \ldots, N\}, where x_n is the input data and y_n is the predicted (output) data. In the study, the ENNs are set as the base learners. In particular, if the base learners are different, the primary integration is heterogeneous, which can behave better in prediction than a homogeneous primary integration. In order to separate the 4 ENN base learners from each other, the specific parameters of the 4 ENN base learners are different. The base learners can be described as \{M_k \mid k = 1, 2, 3, 4\}. The non-void proper subsets of A are set as A_1, A_2, A_3, which should meet the following requirements:

A_1 \cup A_2 \cup A_3 = A, \quad A_1 \cap A_2 \cap A_3 = \varnothing    (6)

The subset A_1 = \{(x_n, y_n) \mid n = 1, 2, \ldots, N_1\} is used to train the base learners, in order to compare the effects of base learners with different numbers of samples. In the study, the subset A_1 is set as the training set. The subset A_2 = \{(x_n, y_n) \mid n = 1, 2, \ldots, N_2\} is set as the validation set in the study, which is used to generate the predicted data through the base learners that have been trained and to train the meta-learner. The output data of the base learners is the input data of the meta-learner, which can be described as P_n = (P_n^1, P_n^2, P_n^3, P_n^4), \ n = 1, 2, \ldots, N_2. The ORELM is used as the meta-learner M' in the study. The output data of the base learners and the validation set are used to train the meta-learner M', which can be described as \{(P_n, y_n) \mid n = 1, 2, \ldots, N_2\}. The subset A_3 = \{(x_n, y_n) \mid n = 1, 2, \ldots, N_3\} is set as the testing set to test the hybrid Stacking model after all base learners and the meta-learner are trained.
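The procedure above can be summarized by the following sketch. The scikit-learn `MLPRegressor` and `Ridge` objects are placeholders standing in for the four Elman base learners and the ORELM meta-learner; they are assumptions for illustration, not the authors' models.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor   # placeholder for the Elman base learners
from sklearn.linear_model import Ridge            # placeholder for the ORELM meta-learner

rng = np.random.default_rng(1)

def make_supervised(series, lag=6):
    """Turn a series into (input window, next value) pairs; lag=6 mirrors six input nodes."""
    X = np.array([series[i:i + lag] for i in range(len(series) - lag)])
    return X, series[lag:]

# Hypothetical subseries (e.g. one EWT component); A1/A2/A3 follow Eq. (6).
series = np.sin(np.linspace(0, 60, 900)) + 0.1 * rng.standard_normal(900)
X, y = make_supervised(series)
X1, y1 = X[:494], y[:494]          # A1: training set for the base learners
X2, y2 = X[494:694], y[494:694]    # A2: validation set, used to train the meta-learner
X3, y3 = X[694:], y[694:]          # A3: testing set

# Four base learners with different parameters (heterogeneous primary integration).
base_learners = [MLPRegressor(hidden_layer_sizes=(h,), max_iter=2000, random_state=0)
                 for h in (15, 20, 25, 30)]
for m in base_learners:
    m.fit(X1, y1)

# Base-learner outputs on A2 become the meta-learner inputs P_n.
P2 = np.column_stack([m.predict(X2) for m in base_learners])
meta_learner = Ridge(alpha=1.0).fit(P2, y2)

# Final Stacking prediction on the testing set A3.
P3 = np.column_stack([m.predict(X3) for m in base_learners])
y_hat = meta_learner.predict(P3)
print("test MAE:", np.mean(np.abs(y3 - y_hat)))
```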
3.3.2 Elman neural network base learner
The ENN is a typical recurrent neural network [28]. Different from other artificial neural networks, the ENN contains several context units. The context units store the output of the hidden layer and feed it back to the hidden layer after a certain delay in subsequent iterations. So the ENN is sensitive to the original datasets, and it increases the ability of short-term memory to process dynamic information [29].
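As a brief illustration of the context-unit mechanism, the following sketch performs a forward pass of a small Elman network with hypothetical random (untrained) weights; it is not the trained base learner used in the study.

```python
import numpy as np

rng = np.random.default_rng(2)
n_in, n_hidden = 6, 15                  # e.g. 6 lagged inputs, 15 hidden neurons

# Hypothetical weights of a small, untrained Elman network.
W_in = rng.standard_normal((n_hidden, n_in)) * 0.1        # input -> hidden
W_ctx = rng.standard_normal((n_hidden, n_hidden)) * 0.1   # context units -> hidden
w_out = rng.standard_normal(n_hidden) * 0.1               # hidden -> output

def elman_forecast(windows):
    """Process input windows in order; the context units carry the previous hidden state."""
    context = np.zeros(n_hidden)        # context units start empty
    outputs = []
    for x in windows:
        hidden = np.tanh(W_in @ x + W_ctx @ context)   # hidden layer sees input + memory
        outputs.append(w_out @ hidden)                 # linear output neuron
        context = hidden                               # store hidden state for the next step
    return np.array(outputs)

demo_windows = rng.standard_normal((5, n_in))
print(elman_forecast(demo_windows))
```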
3.3.3 Outlier-robust extreme learning machine meta-learner

The extreme learning machine is one of the most powerful techniques in machine learning because of its excellent calculation speed and results [30]. But it can be unreliable when outliers exist in the training samples. Therefore, a novel outlier-robust extreme learning machine has been proposed to improve the robustness of the model, based on the sparsity characteristic of outliers [25]. The mathematical model for finding the output weight \beta with the l_1-norm can be described as:

\min_{\beta} \; \|e\|_1 + \frac{1}{C}\|\beta\|_2^2, \quad \text{s.t. } y - \mathbf{H}\beta = e    (7)

where e is the training error and C is the regularization parameter introduced to balance the proportion between the training error and the norm of the output weight. The augmented Lagrangian function is given as follows:

L_{\mu}(e, \beta, \lambda) = \|e\|_1 + \frac{1}{C}\|\beta\|_2^2 + \lambda^{T}(y - \mathbf{H}\beta - e) + \frac{\mu}{2}\|y - \mathbf{H}\beta - e\|_2^2    (8)

where \lambda is the vector of Lagrangian multipliers and \mu is a penalty parameter set as \mu = 2N / \|y\|_1 [31]. The ALM (augmented Lagrangian multiplier) method is used to estimate the optimal solution (e, \beta) and the multiplier \lambda:

\beta^{k+1} = \arg\min_{\beta} L_{\mu}(e^{k}, \beta, \lambda^{k}), \quad e^{k+1} = \arg\min_{e} L_{\mu}(e, \beta^{k+1}, \lambda^{k}), \quad \lambda^{k+1} = \lambda^{k} + \mu\left( y - \mathbf{H}\beta^{k+1} - e^{k+1} \right)    (9)

This leads to the following solution for \beta^{k+1}:

\beta^{k+1} = \left( \mathbf{H}^{T}\mathbf{H} + \frac{2}{C\mu}\mathbf{I} \right)^{-1} \mathbf{H}^{T} \left( y - e^{k} + \frac{\lambda^{k}}{\mu} \right)    (10)
The ORELM is utilized in the study because it can handle the problem of over-fitting to outliers and improve the robustness. Note that the input node number and hidden node number are set as six and five, respectively, and the regularization parameter C is chosen as 240 in the study.
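A compact sketch of the ALM iteration in Eqs. (8)-(10) is given below. The soft-thresholding step for the e-update, the random-feature hidden layer and the value of C here are standard choices assumed for illustration, not the authors' code or settings.

```python
import numpy as np

def soft_threshold(v, tau):
    """Elementwise shrinkage operator used for the l1 e-update."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def orelm_fit(X, y, n_hidden=5, C=16.0, n_iter=50, seed=0):
    """Outlier-robust ELM trained via the augmented Lagrangian iteration of Eqs. (8)-(10)."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1, 1, size=(X.shape[1], n_hidden))    # random input weights (ELM style)
    b = rng.uniform(-1, 1, size=n_hidden)
    H = np.tanh(X @ W + b)                                 # hidden-layer output matrix
    N = len(y)
    mu = 2.0 * N / np.sum(np.abs(y))                       # penalty parameter, mu = 2N / ||y||_1
    e, lam = np.zeros(N), np.zeros(N)
    A = H.T @ H + (2.0 / (C * mu)) * np.eye(n_hidden)      # system matrix of Eq. (10)
    for _ in range(n_iter):
        beta = np.linalg.solve(A, H.T @ (y - e + lam / mu))        # beta-update, Eq. (10)
        e = soft_threshold(y - H @ beta + lam / mu, 1.0 / mu)      # l1-minimizing e-update
        lam = lam + mu * (y - H @ beta - e)                        # multiplier update, Eq. (9)
    return W, b, beta

def orelm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy check with a few gross outliers injected into the training targets.
rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(200, 6))
y = X @ np.array([1.0, -0.5, 0.3, 0.0, 0.2, -0.1]) + 0.05 * rng.standard_normal(200)
y[::40] += 10.0                                            # sparse outliers
W, b, beta = orelm_fit(X, y)
print("clean-sample MAE:", np.mean(np.abs(orelm_predict(X, W, b, beta) - y)[1::2]))
```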
3.4 Inverse empirical wavelet transform reconstruction

The IEWT is selected to reconstruct the combined PM2.5 concentration subseries after the forecasting process in the study. The reconstruction model of the IEWT method can be defined by the following equation:

f(t) = W_f(0,t) \star \phi_1(t) + \sum_{n=1}^{N} W_f(n,t) \star \psi_n(t) = \mathcal{F}^{-1}\!\left( \hat{W}_f(0,\omega)\,\hat{\phi}_1(\omega) + \sum_{n=1}^{N} \hat{W}_f(n,\omega)\,\hat{\psi}_n(\omega) \right)    (11)

where W_f(0,t) and W_f(n,t) are the approximation and detail coefficients of the EWT, and the hat denotes the Fourier transform.
3.5 Framework of the proposed ensemble hybrid model

The framework of the proposed ensemble model is shown in Fig. 2. The detailed modeling steps are given as follows (a minimal sketch of the whole pipeline is given after this list):
(A) The original fine particle concentration series is divided into three datasets, including the training set, validation set and testing set.
(B) The HI outlier correction preprocessing corrects the outliers in the training set. The corrected training set and the original validation and testing sets are the input series of the following steps.
(C) The training set of the input data series is adaptively decomposed into a series of subseries by the EWT decomposition method.
(D) Each subseries is forecasted by the ENN base predictors. The numbers of neurons of the different base predictors are set diversely.
(E) The ORELM is set as the meta-predictor to combine the results of the different ENN base predictors by the Stacking ensemble method.
(F) The Stacking method outputs the forecasting results of the corresponding original subseries. The IEWT reconstruction method combines the results and reconstructs them to get the final results.
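The sketch below only shows how steps (A)-(F) fit together; the trivial stand-in functions (`identity`, `split_two`, `persist`, `sum_up`) are hypothetical placeholders so the structure runs, while the real components are those described in Sections 3.1-3.4.

```python
import numpy as np

def forecasting_pipeline(series, correct, decompose, forecast_subseries, reconstruct):
    """Steps (A)-(F): split, correct, decompose, forecast each subseries, reconstruct."""
    train, val, test = series[:500], series[500:700], series[700:900]   # (A) three datasets
    train = correct(train)                                              # (B) HI on the training set
    subseries = decompose(np.concatenate([train, val, test]))           # (C) EWT decomposition
    forecasts = [forecast_subseries(s) for s in subseries]              # (D)-(E) Stacking per subseries
    return reconstruct(forecasts)                                       # (F) IEWT reconstruction

# Trivial stand-ins so the sketch runs end to end.
identity = lambda x: x
split_two = lambda x: [x * 0.5, x * 0.5]       # pretend "decomposition" into two subseries
persist = lambda s: s                          # pretend "forecast" (persistence)
sum_up = lambda parts: np.sum(parts, axis=0)   # pretend "reconstruction"

demo = np.linspace(10, 40, 900)
print(forecasting_pipeline(demo, identity, split_two, persist, sum_up)[:3])
```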
Fig. 2 The framework of the proposed PM2.5 concentration forecasting hybrid model
4 Evaluation indices and main results

4.1 Error evaluation indices

In order to evaluate the forecasting performance of the model proposed in the study accurately, three widely used evaluation indices of PM2.5 concentration forecasting are selected to calculate the forecasting errors, including the Mean Absolute Percentage Error (MAPE), the Mean Absolute Error (MAE) and the Root Mean Square Error (RMSE), which are calculated by the following equations:

\mathrm{MAPE} = \frac{1}{N} \sum_{t=1}^{N} \left| \frac{x(t) - \hat{x}(t)}{x(t)} \right|    (12)

\mathrm{MAE} = \frac{1}{N} \sum_{t=1}^{N} \left| x(t) - \hat{x}(t) \right|    (13)

\mathrm{RMSE} = \sqrt{ \frac{1}{N} \sum_{t=1}^{N} \left( x(t) - \hat{x}(t) \right)^2 }    (14)

where x(t) is the actual value of the PM2.5 concentration series, \hat{x}(t) is the forecasting value and N is the number of samples in the testing set.
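The three indices in Eqs. (12)-(14) translate directly to code; a minimal sketch (reporting MAPE as a percentage, as in the result tables) is:

```python
import numpy as np

def mape(actual, forecast):
    """Mean Absolute Percentage Error, Eq. (12); returned here as a percentage."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))

def mae(actual, forecast):
    """Mean Absolute Error, Eq. (13), in μg/m3."""
    return np.mean(np.abs(np.asarray(actual, float) - np.asarray(forecast, float)))

def rmse(actual, forecast):
    """Root Mean Square Error, Eq. (14), in μg/m3."""
    return np.sqrt(np.mean((np.asarray(actual, float) - np.asarray(forecast, float)) ** 2))

# Tiny hypothetical example.
x, x_hat = [50.0, 60.0, 55.0], [48.0, 63.0, 54.0]
print(mape(x, x_hat), mae(x, x_hat), rmse(x, x_hat))
```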
4.2 Sample entropy
Sample entropy (SampEn, SE) is a modification of approximate entropy (ApEn). SE is widely used to evaluate the complexity of a time series [32]. In general, a smaller value of SE represents better self-similarity of the time series. The sample entropy can be defined as follows:

\mathrm{SE} = -\log\frac{A}{B}    (15)

where A is the number of template vector pairs satisfying d\left[ X_{m+1}(i), X_{m+1}(j) \right] < r, and B is the number of template vector pairs satisfying d\left[ X_m(i), X_m(j) \right] < r. In the study, m is set to 3 and r is set to 0.2 times the standard deviation of the series.
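A direct (unoptimized) implementation of Eq. (15) with the settings used in the study (m = 3, r = 0.2 times the standard deviation) is sketched below; the Chebyshev distance for comparing templates is a standard SampEn choice assumed here.

```python
import numpy as np

def sample_entropy(series, m=3, r_factor=0.2):
    """Sample entropy, Eq. (15): SE = -log(A/B), with m = 3 and r = 0.2 * std in the study."""
    x = np.asarray(series, dtype=float)
    r = r_factor * np.std(x)

    def count_matches(length):
        # Build all templates of the given length and count pairs closer than r
        # under the Chebyshev (maximum) distance, excluding self-matches.
        templates = np.array([x[i:i + length] for i in range(len(x) - m)])
        count = 0
        for i in range(len(templates)):
            d = np.max(np.abs(templates - templates[i]), axis=1)
            count += np.sum(d < r) - 1          # subtract the self-match
        return count

    B = count_matches(m)       # matches of length-m templates
    A = count_matches(m + 1)   # matches of length-(m+1) templates
    return -np.log(A / B)

rng = np.random.default_rng(4)
smooth = np.sin(np.linspace(0, 20, 400))
noisy = smooth + 0.5 * rng.standard_normal(400)
print(sample_entropy(smooth), sample_entropy(noisy))   # the noisier series gives a larger SE
```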
4.3 Main forecasting results

In this section, the main results of the different series are displayed. Figures 3-7 show the results of the PM2.5 concentration multi-step forecasting. As shown in Figures 3-6, series 1-4 are used to analyze different components of the model proposed in the study, where series 1 is used for HI, series 2 is used for Stacking and series 4 is used for IEWT reconstruction. As shown in Fig. 3 and Fig. 4, the EWT-Stacking-Elman model (in yellow) and the HI-EWT-Stacking-Elman model (in grey) do not perform well in 5-step forecasting, especially where the original series waveform changes drastically. The Stacking ensemble method may lead to over-fitting because of the high complexity of the original series. The model proposed in the study solves the over-fitting problem by benefiting from the IEWT reconstruction method. The IEWT reconstruction method can improve the fitting and learning ability of the hybrid model significantly and restrain the over-fitting phenomenon, which is a particularly serious problem in the Stacking ensemble method. The IEWT fits well with the Stacking method, and they complement each other.
Fig. 3 Results of the PM2.5 concentration series 1 in 5-step forecasting.
Fig. 4 Results of the PM2.5 concentration series 2 in 5-step forecasting.
Fig. 5 Results of the PM2.5 concentration series 3 in 5-step forecasting.
Fig. 6 Results of the PM2.5 concentration series 4 in 5-step forecasting.
Fig. 7 Results of the PM2.5 concentration series 5 in 1-step forecasting.

5 Comparison analysis

5.1 Analysis of HI outlier correction

In this section, the effectiveness and improvement of the HI outlier correction method are analyzed. Some major parameters of HI are listed and set as follows: the length of the sliding window w is selected and adjusted by trial and error; the threshold TR is set to the default value of 3. In particular, the length of the sliding window w is selected in a range of [5, 45] to improve the adaptability to different series. SE is used to show the decrease of the complexity of the initial data after HI outlier correction. Table 2 shows the specific reduction. Fig. 8 shows the specific results of series 1 after HI outlier correction. The outlier correction results show that HI can distinguish the outliers from normal points and correct them to normal positions.

Table 2 The sample entropy of the initial series and the series after HI preprocessing.
Sample Entropy        Series 1    Series 2    Series 3    Series 4
Before HI             0.6074      0.5927      0.8834      0.5405
After HI              0.6071      0.5832      0.8830      0.5325
Best window length    16          11          3           6
From Table 2, it can be seen that: (A) depending on the different volatility characteristics of the series, the best sliding window lengths of HI are different, and the outliers in the four series are apparently random; (B) HI can reduce the sample entropy of the initial data. As shown in the bold text of Table 2, the SE indices of all series after HI outlier correction are lower than those before HI. The complexity of the original series is reduced by removing the outliers. In consequence, the self-similarity and learnability of the input data are improved.

Fig. 8 HI outlier correction results of four PM2.5 concentration series.

In addition, the evaluation indices on the validation set are calculated to show the improvement brought by HI. Four comparison model groups are set to contrast and analyze: (A) HI-EWT-Elman vs. EWT-Elman; (B) HI-EWT-Stacking-Elman vs. EWT-Stacking-Elman; (C) HI-EWT-Elman-IEWT vs. EWT-Elman-IEWT; (D) HI-EWT-Stacking-Elman-IEWT vs. EWT-Stacking-Elman-IEWT. The comparison results of the different groups are shown in Tables 3 and 4.

Table 3 Specific error indices of different comparison models in series 1
Model                          Step    MAPE (%)    MAE (μg/m3)    RMSE (μg/m3)
EWT-Elman                      1       1.6737      0.7322         0.9142
EWT-Elman                      3       3.6097      1.5184         1.8846
EWT-Elman                      5       8.0443      3.2954         4.0561
HI-EWT-Elman                   1       1.5076      0.6772         0.8433
HI-EWT-Elman                   3       3.1869      1.3495         1.6733
HI-EWT-Elman                   5       7.3583      3.0230         3.7150
EWT-Stacking-Elman             1       1.5882      0.8244         1.4599
EWT-Stacking-Elman             3       3.1724      1.8559         3.9777
EWT-Stacking-Elman             5       6.7110      3.7029         7.5899
HI-EWT-Stacking-Elman          1       1.4169      0.6136         0.7550
HI-EWT-Stacking-Elman          3       3.0941      1.7550         3.3723
HI-EWT-Stacking-Elman          5       6.0797      3.5538         7.1464
EWT-Elman-IEWT                 1       0.8580      0.3778         0.4594
EWT-Elman-IEWT                 3       3.3248      1.3850         1.6800
EWT-Elman-IEWT                 5       6.9544      2.8674         3.4108
HI-EWT-Elman-IEWT              1       0.7798      0.3219         0.4078
HI-EWT-Elman-IEWT              3       2.7820      1.1544         1.4174
HI-EWT-Elman-IEWT              5       6.7192      2.7839         3.2884
EWT-Stacking-Elman-IEWT        1       0.6014      0.3088         0.4530
EWT-Stacking-Elman-IEWT        3       2.6553      1.3936         2.0241
EWT-Stacking-Elman-IEWT        5       5.3945      2.3010         2.8923
HI-EWT-Stacking-Elman-IEWT     1       0.3839      0.1614         0.2006
HI-EWT-Stacking-Elman-IEWT     3       1.7387      0.8392         1.1380
HI-EWT-Stacking-Elman-IEWT     5       4.5303      1.8427         2.2465

Table 4 The improving percentages after HI of series 1.

Comparison                                                Step    PMAPE (%)    PMAE (%)    PRMSE (%)
HI-EWT-Elman vs. EWT-Elman                                1       9.9210       7.5173      7.7521
HI-EWT-Elman vs. EWT-Elman                                3       11.7130      11.1196     11.2136
HI-EWT-Elman vs. EWT-Elman                                5       8.5276       8.2661      8.4076
HI-EWT-Stacking-Elman vs. EWT-Stacking-Elman              1       10.7842      25.5690     48.2827
HI-EWT-Stacking-Elman vs. EWT-Stacking-Elman              3       2.4670       5.4634      15.2190
HI-EWT-Stacking-Elman vs. EWT-Stacking-Elman              5       9.4067       4.0275      5.8430
HI-EWT-Elman-IEWT vs. EWT-Elman-IEWT                      1       9.1131       14.8077     11.2322
HI-EWT-Elman-IEWT vs. EWT-Elman-IEWT                      3       16.3239      16.6470     15.6311
HI-EWT-Elman-IEWT vs. EWT-Elman-IEWT                      5       3.3814       2.9138      3.5886
HI-EWT-Stacking-Elman-IEWT vs. EWT-Stacking-Elman-IEWT    1       36.1774      47.7515     55.7248
HI-EWT-Stacking-Elman-IEWT vs. EWT-Stacking-Elman-IEWT    3       34.5202      39.7841     43.7783
HI-EWT-Stacking-Elman-IEWT vs. EWT-Stacking-Elman-IEWT    5       16.0197      19.9191     22.3284
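The improving percentage indices PMAPE, PMAE and PRMSE reported in Tables 4, 5 and 7 are consistent with the relative reduction of each error index; a minimal sketch assuming that definition is:

```python
def improving_percentage(index_before, index_after):
    """Relative reduction of an error index, in percent (assumed definition, matches Table 4)."""
    return 100.0 * (index_before - index_after) / index_before

# Step-1 MAPE of EWT-Elman vs. HI-EWT-Elman from Table 3:
print(round(improving_percentage(1.6737, 1.5076), 4))
# ~9.92, cf. 9.9210 in Table 4 (small difference due to rounding of the tabulated errors).
```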
From Tables 3 and 4, it can be seen that in the four comparison groups HI improves the forecasting accuracy from one-step to five-step forecasting. In addition, as shown in the bold text of Table 4, the improving percentages after HI can reach up to 40%-50%, depending on the specific structure of the hybrid model. The maximum improving percentages appear in the 1-step forecasting.

5.2 Analysis of Stacking ensemble method

In the proposed hybrid model, the Stacking ensemble method is used to improve the predicting performance of the ENNs. The ORELM is chosen as the meta-predictor to integrate the results of each base learner. This section analyzes the effectiveness and improvement of the Stacking ensemble method by comparing the HI-EWT-ENNs and HI-EWT-Stacking-ENN. Each HI-EWT-ENN has a different hidden neuron number, including 15, 20, 25 and 30. The MAEs of the base predictors are presented in Fig. 9, and the forecasting performance of the above models is presented in Table 5.
Fig. 9 The MAEs of different base learners
Table 5 The improving percentages after Stacking in PM2.5 concentration series 2.

Comparison                                 Step    PMAPE (%)    PMAE (%)    PRMSE (%)
HI-EWT-ENN-15 vs. HI-EWT-Stacking-ENN      1       72.8564      67.4546     66.8889
HI-EWT-ENN-15 vs. HI-EWT-Stacking-ENN      3       29.7356      13.7283     7.1794
HI-EWT-ENN-15 vs. HI-EWT-Stacking-ENN      5       40.2696      33.8558     33.3260
HI-EWT-ENN-20 vs. HI-EWT-Stacking-ENN      1       53.4858      42.9999     42.4487
HI-EWT-ENN-20 vs. HI-EWT-Stacking-ENN      3       30.5257      11.9403     7.0224
HI-EWT-ENN-20 vs. HI-EWT-Stacking-ENN      5       38.9948      33.8853     35.0481
HI-EWT-ENN-25 vs. HI-EWT-Stacking-ENN      1       65.4461      61.0827     60.7301
HI-EWT-ENN-25 vs. HI-EWT-Stacking-ENN      3       32.1511      22.2556     15.3172
HI-EWT-ENN-25 vs. HI-EWT-Stacking-ENN      5       36.6464      33.1665     34.2609
HI-EWT-ENN-30 vs. HI-EWT-Stacking-ENN      1       71.1172      67.9485     67.8744
HI-EWT-ENN-30 vs. HI-EWT-Stacking-ENN      3       59.7242      55.2615     52.7095
HI-EWT-ENN-30 vs. HI-EWT-Stacking-ENN      5       43.1676      39.2120     40.9019
As shown in Fig. 9 and Table 5, the base predictors produce diverse forecasting results. This phenomenon proves that the forecasting results can provide diverse features for the Stacking method. In addition, as shown in the bold text of Table 5, the improving percentages of the Stacking ensemble method can reach up to 72.8%, which proves the remarkable ability of Stacking. This phenomenon can be explained by the fact that the Stacking method utilizes the meta-predictor to achieve a nonlinear combination of the base predictors with minimum error.

5.3 Analysis of IEWT reconstruction

In this section, the results of the IEWT reconstruction method are analyzed. The process of EWT and IEWT is shown visually in Fig. 10. In order to quantify and analyze the effectiveness of IEWT, four comparison hybrid model groups are set, which are shown as follows: (A) EWT-Elman-IEWT vs. EWT-Elman; (B) EWT-Stacking-Elman-IEWT vs. EWT-Stacking-Elman; (C) HI-EWT-Elman-IEWT vs. HI-EWT-Elman; (D) HI-EWT-Stacking-Elman-IEWT vs. HI-EWT-Stacking-Elman. The evaluation indices of the different comparison hybrid model groups are shown in Tables 6 and 7.

Table 6 Specific error indices of different comparison models in series 4
Model                          Step    MAPE (%)    MAE (μg/m3)    RMSE (μg/m3)
EWT-Elman                      1       2.6846      0.9365         1.1407
EWT-Elman                      3       4.6131      1.8123         2.2640
EWT-Elman                      5       8.6505      3.4586         4.3352
EWT-Stacking-Elman             1       1.2097      0.4237         0.5137
EWT-Stacking-Elman             3       3.7214      1.2624         1.6163
EWT-Stacking-Elman             5       8.2265      2.8987         3.5254
HI-EWT-Elman                   1       1.7677      0.6321         0.7955
HI-EWT-Elman                   3       3.9682      1.5019         1.8513
HI-EWT-Elman                   5       8.0752      3.1647         3.8740
HI-EWT-Stacking-Elman          1       1.0466      0.3823         0.5011
HI-EWT-Stacking-Elman          3       3.6853      1.2939         1.5789
HI-EWT-Stacking-Elman          5       7.2568      2.7389         3.4912
EWT-Elman-IEWT                 1       1.5160      0.5427         0.6825
EWT-Elman-IEWT                 3       3.7084      1.4168         1.7740
EWT-Elman-IEWT                 5       7.7719      3.0210         3.7315
EWT-Stacking-Elman-IEWT        1       0.7519      0.2575         0.3051
EWT-Stacking-Elman-IEWT        3       3.5916      1.2411         1.4984
EWT-Stacking-Elman-IEWT        5       7.5199      2.6462         3.2419
HI-EWT-Elman-IEWT              1       1.4231      0.4680         0.5769
HI-EWT-Elman-IEWT              3       3.3668      1.3132         1.6753
HI-EWT-Elman-IEWT              5       7.6682      2.9777         3.6298
HI-EWT-Stacking-Elman-IEWT     1       0.5062      0.1848         0.2294
HI-EWT-Stacking-Elman-IEWT     3       2.5835      0.9624         1.1926
HI-EWT-Stacking-Elman-IEWT     5       4.5327      1.5825         2.0211

Table 7 The improving percentages of IEWT in the PM2.5 concentration series 4

Comparison                                              Step    PMAPE (%)    PMAE (%)    PRMSE (%)
EWT-Elman-IEWT vs. EWT-Elman                            1       43.5306      43.5306     43.5306
EWT-Elman-IEWT vs. EWT-Elman                            3       19.6123      19.6123     19.6123
EWT-Elman-IEWT vs. EWT-Elman                            5       10.1575      10.1575     10.1575
EWT-Stacking-Elman-IEWT vs. EWT-Stacking-Elman          1       37.8430      37.8430     37.8430
EWT-Stacking-Elman-IEWT vs. EWT-Stacking-Elman          3       3.4874       3.4874      3.4874
EWT-Stacking-Elman-IEWT vs. EWT-Stacking-Elman          5       8.5893       8.5893      8.5893
HI-EWT-Elman-IEWT vs. HI-EWT-Elman                      1       19.4958      19.4958     19.4958
HI-EWT-Elman-IEWT vs. HI-EWT-Elman                      3       15.1559      15.1559     15.1559
HI-EWT-Elman-IEWT vs. HI-EWT-Elman                      5       5.0401       5.0401      5.0401
HI-EWT-Stacking-Elman-IEWT vs. HI-EWT-Stacking-Elman    1       51.6315      51.6759     54.2282
HI-EWT-Stacking-Elman-IEWT vs. HI-EWT-Stacking-Elman    3       29.8985      25.6205     24.4628
HI-EWT-Stacking-Elman-IEWT vs. HI-EWT-Stacking-Elman    5       37.5385      42.2210     42.1017
From Tables 6 and 7, it can be seen that in the four comparison groups IEWT universally improves the forecasting accuracy from one-step to five-step forecasting. In addition, as shown in the bold text of Table 7, the improving percentages of IEWT can reach up to 51.63%-54.23%.

Fig. 10 The procedure and result of IEWT in the PM2.5 concentration series 4.

As shown in Fig. 10, the EWT decomposition method can decompose the complex initial series into subseries with higher learnability. The forecasting accuracy of the proposed hybrid model is improved in this way.
5.4 Analysis of comparison with the existing models

In this section, some existing fine particle concentration forecasting models are selected to compare with the model proposed in the study, which are shown as follows: (A) CEEMD-VMD-DE-ELM; (B) EEMD-PSR-LSSVR; (C) SDA-LSSVR-PSO; (D) CEEMD-GWO-SVR. Some existing models cannot forecast in multi-step, so the model proposed in the study is compared with the above models in Step-1. The evaluation indices of the different models are shown in Table 8.
Table 8 The evaluation indices of some existing models and the model proposed in the study

Models (Step-1)       MAPE (%)    MAE (μg/m3)    RMSE (μg/m3)    PMAPE (%)    PMAE (%)    PRMSE (%)
CEEMD-VMD-DE-ELM      6.1525      1.6178         2.1039          83.36        82.58       82.80
EEMD-PSR-LSSVR        15.0236     4.7555         6.0483          93.18        94.07       94.01
SDA-LSSVR-PSO         29.5788     7.3659         8.5226          96.53        96.17       95.75
CEEMD-GWO-SVR         16.9457     5.6800         7.5946          93.95        95.03       95.23
New model             1.0236      0.2818         0.3617          —            —           —
From Table 8, the following conclusions about the hybrid models can be drawn: (A) The replication results of models A, B and D in the study are close to the RMSE results of the existing models as given in Table 1, so it is reasonable to adopt the replication results for comparison with the model proposed in the study. (B) The model proposed in the study outperforms the existing hybrid models. As shown in Table 8, the evaluation index improvement percentages of the model proposed in the study reach up to 80%-96% compared to the existing models, which is enabled by the Stacking ensemble method and the EWT decomposition method.
6 Conclusions

In the study, a novel ensemble hybrid model is proposed to forecast hourly fine particle concentrations in multi-step. Aiming at Northern China, three of the most polluted cities are selected to train and test the hybrid model. The model proposed in the study combines outlier correction preprocessing, a data decomposition method, an ensemble method, neural networks and data reconstruction. The HI outlier correction preprocessing detects and corrects the outliers of the original PM2.5 concentration series, which improves the robustness of the hybrid model. The EWT is used to decompose the corrected series into a set of subseries adaptively, and each subseries is used to train the Stacking ensemble method. Four different ENNs are set as the base learners, and the ORELM is selected as the meta-learner to integrate the results of the different base learners. The IEWT method is used to reconstruct the combined forecasting output subseries to get the final forecasting PM2.5 concentration results.

After the experiments and analysis, the following conclusions can be drawn: (A) The model proposed in the study improves the performance in multi-step urban fine particle concentration forecasting compared to the existing models. (B) The EWT decomposition and Stacking ensemble methods can promote the forecasting accuracy of the hybrid model. (C) The HI outlier correction preprocessing and the ORELM improve the robustness of the hybrid model. (D) The Stacking method may lead to an over-fitting problem in the fine particle forecasting. (E) The IEWT reconstruction method solves the over-fitting problem of the hybrid model and improves the stability of the hybrid model.

Acknowledgements

The study is fully supported by the National Natural Science Foundation of China (Grant No. 61873283), the Shenghua Yu-ying Talents Program of the Central South University and the innovation driven project of the Central South University (Project No. 2019CX005).
References
[1] T. Li, R. Hu, Z. Chen, Q. Li, S. Huang, Z. Zhu, L.-F. Zhou, Fine particulate matter (PM2.5): The culprit for chronic lung diseases in China, Chronic Diseases and Translational Medicine, 4 (2018) 176-186.
[2] M. Gao, S.K. Guttikunda, G.R. Carmichael, Y. Wang, Z. Liu, C.O. Stanier, P.E. Saide, M. Yu, Health impacts and economic losses assessment of the 2013 severe haze event in Beijing area, Science of the Total Environment, 511 (2015) 553-561.
[3] M. Niu, K. Gan, S. Sun, F. Li, Application of decomposition-ensemble learning paradigm with phase space reconstruction for day-ahead PM2.5 concentration forecasting, Journal of Environmental Management, 196 (2017) 110-118.
[4] F. Biancofiore, M. Busilacchio, M. Verdecchia, B. Tomassetti, E. Aruffo, S. Bianco, S. Di Tommaso, C. Colangeli, G. Rosatelli, P. Di Carlo, Recursive neural network model for analysis and forecast of PM10 and PM2.5, Atmospheric Pollution Research, 8 (2017) 652-659.
[5] D. Wang, S. Wei, H. Luo, C. Yue, O. Grunder, A novel hybrid model for air quality index forecasting based on two-phase decomposition technique and modified extreme learning machine, Science of The Total Environment, 580 (2017) 719-733.
[6] B. Zhai, J. Chen, Development of a stacked ensemble model for forecasting and analyzing daily average PM2.5 concentrations in Beijing, China, Science of The Total Environment, 635 (2018) 644-658.
[7] Y. Chen, R. Shi, S. Shu, W. Gao, Ensemble and enhanced PM10 concentration forecast model based on stepwise regression and wavelet analysis, Atmospheric Environment, 74 (2013) 346-359.
[8] M. Niu, Y. Wang, S. Sun, Y. Li, A novel hybrid decomposition-and-ensemble model based on CEEMD and GWO for short-term PM2.5 concentration forecasting, Atmospheric Environment, 134 (2016) 168-180.
[9] P. Perez, E. Gramsch, Forecasting hourly PM2.5 in Santiago de Chile with emphasis on night episodes, Atmospheric Environment, 124 (2016) 22-27.
[10] H.J. Fernando, M. Mammarella, G. Grandoni, P. Fedele, R. Di Marco, R. Dimitrova, P. Hyde, Forecasting PM10 in metropolitan areas: Efficacy of neural networks, Environmental Pollution, 163 (2012) 62-67.
[11] L. Zhang, J. Lin, R. Qiu, X. Hu, H. Zhang, Q. Chen, H. Tan, D. Lin, J. Wang, Trend analysis and forecast of PM2.5 in Fuzhou, China using the ARIMA model, Ecological Indicators, 95 (2018) 702-710.
[12] L. Wu, N. Li, Y. Yang, Prediction of air quality indicators for the Beijing-Tianjin-Hebei region, Journal of Cleaner Production, 196 (2018) 682-687.
[13] L. Wu, H. Zhao, Using FGM (1, 1) model to predict the number of the lightly polluted day in Jing-Jin-Ji region of China, Atmospheric Pollution Research, (2018).
[14] L. Wu, X. Gao, Y. Xiao, S. Liu, Y. Yang, Using grey Holt-Winters model to predict the air quality index for cities in China, Natural Hazards, 88 (2017) 1003-1012.
[15] K. Gan, S. Sun, S. Wang, Y. Wei, A secondary-decomposition-ensemble learning paradigm for forecasting PM2.5 concentration, Atmospheric Pollution Research, (2018).
[16] R. Li, Y. Dong, Z. Zhu, C. Li, H. Yang, A dynamic evaluation framework for ambient air pollution monitoring, Applied Mathematical Modelling, 65 (2019) 52-71.
[17] O.P. Panagopoulos, P. Xanthopoulos, T. Razzaghi, O. Şeref, Relaxed support vector regression, Annals of Operations Research, (2018) 1-20.
[18] D.T. Shipmon, J.M. Gurevitch, P.M. Piselli, S.T. Edwards, Time Series Anomaly Detection: Detection of anomalous drops with limited features and sparse examples in noisy highly periodic data, arXiv preprint arXiv:1708.03665, (2017).
[19] G.F. Ceschini, N. Gatta, M. Venturini, T. Hubauer, A. Murarasu, Optimization of Statistical Methodologies for Anomaly Detection in Gas Turbine Dynamic Time Series, Journal of Engineering for Gas Turbines and Power, 140 (2018) 032401.
[20] S. Ahmad, A. Lavin, S. Purdy, Z. Agha, Unsupervised real-time anomaly detection for streaming data, Neurocomputing, 262 (2017) 134-147.
[21] R.K. Pearson, Outliers in process modeling and identification, IEEE Transactions on Control Systems Technology, 10 (2002) 55-63.
[22] K. Dragomiretskiy, D. Zosso, Variational mode decomposition, IEEE Transactions on Signal Processing, 62 (2014) 531-544.
[23] J. Gilles, Empirical Wavelet Transform, IEEE Transactions on Signal Processing, 61 (2013) 3999-4010.
[24] H. Liu, X.W. Mi, Y.F. Li, Wind speed forecasting method based on deep learning strategy using empirical wavelet transform, long short term memory neural network and Elman neural network, Energy Conversion and Management, 156 (2018) 498-514.
[25] K. Zhang, M. Luo, Outlier-robust extreme learning machine for regression problems, Neurocomputing, 151 (2015) 1519-1527.
[26] D.H. Wolpert, Stacked generalization, Neural Networks, 5 (1992) 241-259.
[27] Z. Ma, Q. Dai, Selected an Stacking ELMs for Time Series Prediction, Neural Processing Letters, 44 (2016) 1-26.
[28] J.L. Elman, Finding structure in time, Cognitive Science, 14 (1990) 179-211.
[29] P. Lin, Z. Peng, Y. Lai, S. Cheng, Z. Chen, L. Wu, Short-term power prediction for photovoltaic power plants using a hybrid improved Kmeans-GRA-Elman model based on multivariate meteorological factors and historical power datasets, Energy Conversion and Management, 177 (2018) 704-717.
[30] G.-B. Huang, Q.-Y. Zhu, C.-K. Siew, Extreme learning machine: theory and applications, Neurocomputing, 70 (2006) 489-501.
[31] J. Yang, Y. Zhang, Alternating direction algorithms for l1-problems in compressive sensing, SIAM Journal on Scientific Computing, 33 (2011) 250-278.
[32] J.S. Richman, J.R. Moorman, Physiological time-series analysis using approximate entropy and sample entropy, American Journal of Physiology-Heart and Circulatory Physiology, 278 (2000) H2039-H2049.