Accepted Manuscript

A SVR-ANN combined model based on ensemble EMD for rainfall prediction

Yu Xiang, Ling Gou, Lihua He, Shoulu Xia, Wenyong Wang

PII: S1568-4946(18)30535-0
DOI: https://doi.org/10.1016/j.asoc.2018.09.018
Reference: ASOC 5097
To appear in: Applied Soft Computing Journal
Received date: 10 April 2017
Revised date: 14 August 2018
Accepted date: 11 September 2018

Please cite this article as: Y. Xiang, et al., A SVR-ANN combined model based on ensemble EMD for rainfall prediction, Applied Soft Computing Journal (2018), https://doi.org/10.1016/j.asoc.2018.09.018

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Highlights

1. A novel SVR-ANN combined model based on the information extracted with EEMD is proposed for rainfall prediction.
2. Different prediction methods are adopted for different components of the input data: Support Vector Regression (SVR) for short-period component prediction and Artificial Neural Network (ANN) for long-period component prediction.
3. The E-SVR-ANN model performs better than traditional methods and offers a new approach to rainfall prediction.
A SVR-ANN Combined Model Based on Ensemble EMD for Rainfall Prediction

Yu Xiang a,b, Ling Gou a, Lihua He a,1, Shoulu Xia a, Wenyong Wang a

a School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China
b Corresponding author
Email: [email protected] (Yu Xiang)
Postal address: School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan 611731, China
Abstract

Accurate and timely rainfall prediction is very important in hydrological modeling, and various prediction methods have been proposed in recent years. In this work, the short-to-long time-scale variation within the original rainfall time series is explored using Ensemble Empirical Mode Decomposition (EEMD) on three rainfall datasets collected by meteorological stations located in Kunming, Lincang and Mengzi, Yunnan Province, China. Considering both prediction accuracy and time efficiency, a novel combined model based on the information extracted with EEMD is then proposed. The model adopts different supervised learning methods for different components of the input data: Support Vector Regression (SVR) for short-period component prediction and Artificial Neural Network (ANN) for long-period component prediction. Our results show better performance than traditional methods, offering a new approach to rainfall prediction.

Keywords: rainfall prediction; time series; Ensemble Empirical Mode Decomposition (EEMD); SVR-ANN combined model

1 Present address (Lihua He): School of Computer Science and Engineering, Guilin University of Aerospace Technology, Guilin, Guangxi 541004, China.
1. Introduction

Rainfall is one of the important factors in hydrological models, affecting local water quantity and quality. Small changes in rainfall levels may cause severe flooding or drought and affect food production and even economic activity [1, 2, 3]. As a result, accurate and timely rainfall forecasting is crucial for weather prediction, water resources planning, natural hazard prediction and so on [4, 5].
Much research has addressed quantitative rainfall prediction using diverse techniques. Traditional methods, such as numerical prediction models and Bayesian methods, have been used in precipitation and rainfall prediction [6, 7]. In recent years, machine learning methods such as Support Vector Machine (SVM) and Artificial Neural Network (ANN), or their variations, have achieved excellent performance in rainfall prediction. Pai and Hong used SVR with different parameter selection methods to predict rainfall, namely genetic algorithms (GA) [8], simulated annealing (SA) [9] and the chaotic particle swarm optimization algorithm (CPSO) [10], and the simulation results revealed that the proposed models all achieved good forecasting performance. Kashiwao et al. proposed local rainfall prediction based on ANN using meteorological data obtained from the website of the Japan Meteorological Agency (JMA) [11]. Other SVM- and ANN-based algorithms are summarized in [12]. Theoretical comparisons between SVR and ANN have also been made. In [13], after presenting the theory of SVM and ANN, the authors discussed the advantages of SVM over ANN. They pointed out that SVR is founded on computational learning theory, so that an optimal set of weights and thresholds of the trained network can be found. Liong et al. also pointed out that, compared with ANN, SVM is data-adaptive and is expected to give relatively good generalization performance for hydrologic conditions as well, unless the catchment undergoes a drastic natural or man-made change affecting the underlying physical process [14]. W-C Hong et al. pointed out that although SVR models with different evolutionary algorithms are all superior to other competitive forecasting models, these algorithms almost entirely lack knowledge memory or storage functions, which makes the search for suitable parameters time-consuming or inefficient [15]. It can be concluded from the above literature that SVR and ANN each have their own advantages under specific circumstances. Therefore, models that combine the benefits of various learning algorithms still need to be researched.
Since rainfall is a complex atmospheric process that is nonlinear and nonstationary, it is not easy to predict [16,17]. Researchers have shown that pre-processing the input data improves prediction accuracy. Singular Spectrum Analysis (SSA) was used to decompose the inputs into multiple components and was combined with SVM to forecast rainfall in [18]. Wavelet analysis has also been widely used and coupled with machine learning methods to predict rainfall and air pollutants [4, 19]. Recently, Empirical Mode Decomposition (EMD) and its improved version, Ensemble EMD (EEMD), have proved to be empirical, intuitive, direct and self-adaptive data processing methods that are especially suitable for nonlinear and nonstationary time series [20,21,22]. Naik et al. decomposed the single-channel electromyography (EMG) signal into a set of noise-canceled intrinsic mode functions (IMFs) by EEMD to improve the performance of diagnosing neuromuscular disorders [23]. Furthermore, studies have shown that EMD combined with prediction methods can achieve superior performance in time series prediction. For example, Huang S. and Wang W. et al. employed EMD-based data pre-processing methods to decompose the time series into several components and then adopted SVM or ANN to predict rainfall or runoff [24,25]. Sun et al. used EEMD and least squares support vector regression (LSSVR) with parameters optimized by the gravitational search algorithm (GSA) to forecast solar radiation [43]. In [44], EEMD and least squares support vector machine (EEMD-LSSVM) based on phase space reconstruction (PSR) is proposed for day-ahead PM2.5 concentration prediction.
With EEMD, each decomposed component of the input is distinct: different components have different frequencies and represent different physical meanings. Recently, some models that use different prediction methods for different components of the input data have been proposed. In [41], for the components obtained by EEMD, the proposed model used ARIMA to predict stationary, low-complexity components and LSSVR\FNN for the others, with application to nuclear energy consumption forecasting. In [42], a novel methodology is proposed for container throughput forecasting, in which an individual forecasting model is selected for each component based on data characteristic analysis (DCA): ARIMA\SARIMA for stationary, low-complexity data and LSSVR for the others. Zhang et al. proposed a new hybrid EEMD-based method for crude oil price forecasting, which considered both the nonlinearity and the time-varying dynamics of crude oil price movement [45]. However, in the rainfall prediction area, most researchers still apply the same prediction method to all components. Here we introduce a novel decomposition-ensemble model named E-SVR-ANN into the rainfall prediction area. In our proposed model, we first decompose the original data into several components using EEMD, and then employ SVR for short-period component prediction and ANN for long-period component prediction. Experimental results using standard datasets and real rainfall data show that our new model, which considers both prediction accuracy and time efficiency, improves rainfall prediction performance.
The rest of this paper is organized as follows: Section 2 gives a brief introduction to the methodology and our data, Section 3 describes the proposed model, Section 4 presents and discusses the results, and Section 5 concludes.

2. Method and Data

2.1 Methodology

2.1.1 Ensemble Empirical Mode Decomposition (EEMD)
EMD can be seen as a sifting process that decomposes nonlinear and nonstationary data into several components and a residue, as given in Eq. (1):

$$x(t) = \sum_{i=1}^{n} \mathrm{IMF}_i(t) + R_n(t) \qquad (1)$$

where $x(t)$ is the original time series, $\mathrm{IMF}_i(t)$ is the $i$th intrinsic mode function, and $R_n(t)$ is the residue. IMFs have to satisfy two basic conditions: (1) the number of extrema and the number of zero-crossings must be equal or differ by one; (2) the mean of the maximum envelope and the minimum envelope must be zero [26]. The EMD algorithm is described in Fig. 1.

Fig. 1. EMD algorithm
The EMD algorithm includes two loops: an inner loop, which uses local extrema and stopping criterion I to generate a single IMF, and an outer loop, which stops when the residue $R_n(t)$ becomes a monotonic function from which no more IMFs can be extracted [27]. To avoid the serious “mode mixing” problem of EMD, in which an IMF consists of oscillations of dramatically disparate scales, Wu and Huang, inspired by the noise-assisted data analysis (NADA) method, proposed an improved algorithm, Ensemble EMD (EEMD), as follows [28]:
(1) Add a white noise series to the input data;
(2) Decompose the data with the added white noise into IMFs;
(3) Repeat steps 1 and 2 N (ensemble) times, each time with a different white noise series;
(4) Take the means of the corresponding IMFs as the final decomposition result.
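The four EEMD steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this work: the decomposition routine is passed in as a caller-supplied function (`decompose` is a hypothetical name), the noise amplitude and ensemble size are illustrative defaults, and `toy_decompose` is a moving-average stand-in that exists only so the sketch runs; it is not a real EMD.

```python
import random

def eemd(series, decompose, trials=100, noise_std=0.2, seed=0):
    """Average the components of `trials` noise-perturbed decompositions."""
    if trials < 1:
        raise ValueError("trials must be at least 1")
    rng = random.Random(seed)
    sums = None
    for _ in range(trials):  # Step (3): repeat with a different noise series.
        # Step (1): add a fresh white-noise series to the input data.
        noisy = [x + rng.gauss(0.0, noise_std) for x in series]
        # Step (2): decompose the noisy copy into components.
        comps = decompose(noisy)
        if sums is None:
            sums = [[0.0] * len(series) for _ in comps]
        # Trials may return different numbers of components; keep the common ones.
        sums = sums[:min(len(sums), len(comps))]
        for acc, comp in zip(sums, comps):
            for t, value in enumerate(comp):
                acc[t] += value
    # Step (4): the ensemble mean of each component is the final result.
    return [[v / trials for v in acc] for acc in sums]

def toy_decompose(xs):
    """Stand-in for EMD (assumption): split into detail and 3-point trend."""
    n = len(xs)
    trend = []
    for i in range(n):
        window = xs[max(0, i - 1):min(n, i + 2)]
        trend.append(sum(window) / len(window))
    detail = [x - t for x, t in zip(xs, trend)]
    return [detail, trend]

series = [((t % 12) - 6) / 6.0 for t in range(60)]
components = eemd(series, toy_decompose, trials=200, noise_std=0.1, seed=1)
print(len(components), len(components[0]))  # prints: 2 60
```

Because each trial adds independent zero-mean noise, the noise contribution shrinks as the ensemble grows, so the averaged components sum back to approximately the original series.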
2.1.2 Back Propagation Artificial Neural Network (BP-ANN)

Among various neural network architectures, BP-ANN provides excellent performance in hydrological modeling [27, 29]. BP-ANN is a multilayer feed-forward network trained by the error back-propagation algorithm; its structure is displayed in Fig. 2.

Fig. 2. BP-ANN topological structure (input layer $x_1, \dots, x_n$; hidden layer $h_1, \dots, h_l$; output layer $y_1, \dots, y_m$; errors are propagated back to adjust the weights and biases)

The BP-ANN can be described as follows:

$$\begin{cases} y_k = f\left(\sum_{j=1}^{l} w_{jk} h_j + b_k\right) \\ h_j = f\left(\sum_{i=1}^{n} w_{ij} x_i + a_j\right) \end{cases} \qquad (2)$$
where $f(\cdot)$ is a nonlinear activation function, such as a sign function or a continuous sigmoid function, which is applied to the input data and to the hidden layer output. We define $x_i$ ($i = 1, 2, \dots, n$) as the input time series and $y_k$ ($k = 1, 2, \dots, m$) as the output time series; $w_{ij}$ is the connection weight between input-layer neuron $x_i$ and hidden-layer neuron $h_j$ ($j = 1, 2, \dots, l$), and $a_j$ is the bias of neuron $h_j$. Similarly, $w_{jk}$ and $b_k$ are the weight and bias between hidden-layer neuron $h_j$ and output-layer neuron $y_k$, respectively. As Fig. 2 shows, BP-ANN has two phases: in the first, data propagate from the input layer to the hidden layer and then to the output layer; in the second, the error propagates from the output layer back to the hidden layer and then to the input layer, and the corresponding weights and biases are adjusted along the way. ANN obtains its predictions through weighted linear transformations between adjacent layers, so once the parameters of the input and output layers are chosen, the number of nodes in the hidden layer must be determined in order to obtain the best prediction performance.
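The forward phase of Eq. (2) can be written out directly. The sketch below is illustrative only: it assumes a sigmoid activation, stores the weights as nested lists, and uses names (`forward`, `w_in`, `w_out`) that are introduced here and do not come from the original work. Training by back propagation is not shown.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w_in, a, w_out, b, f=sigmoid):
    """Forward pass of Eq. (2).

    x: n inputs; w_in[i][j]: weight from input i to hidden node j;
    a[j]: hidden biases; w_out[j][k]: weight from hidden node j to
    output k; b[k]: output biases.
    """
    n, l, m = len(x), len(a), len(b)
    # h_j = f(sum_i w_ij * x_i + a_j)
    h = [f(sum(w_in[i][j] * x[i] for i in range(n)) + a[j]) for j in range(l)]
    # y_k = f(sum_j w_jk * h_j + b_k)
    y = [f(sum(w_out[j][k] * h[j] for j in range(l)) + b[k]) for k in range(m)]
    return h, y

# Two inputs, three hidden nodes, one output.
hidden, output = forward(
    x=[1.0, 2.0],
    w_in=[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    a=[0.0, 0.0, 0.0],
    w_out=[[1.0], [1.0], [1.0]],
    b=[-1.5],
)
print(hidden, output)
```

With all input weights set to zero, every hidden node outputs f(0) = 0.5, so the single output is f(3 × 0.5 − 1.5) = f(0) = 0.5.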
2.1.3 ε-SVR

Consider a set of training data $\{(x_0, y_0), \dots, (x_l, y_l)\}$, where $x_i$ is an input and $y_i$ is a target output. The idea of SVR is to determine a function $f(x, w)$ that can approximate future values accurately [30]; it is given by:

$$y = f(x, w) = w \cdot \varphi(x) + b \qquad (3)$$

where $w$ and $b$ represent the weights and the bias, respectively. The vector $x$ in the input space is mapped to a high-dimensional feature space via a nonlinear mapping function $\varphi(x)$. In this paper, we use $\varepsilon$-SVR and introduce the slack variables $\xi_i$ and $\xi_i^*$, which measure the deviations 'above' and 'below' the $\varepsilon$-tube. The standard form of $\varepsilon$-SVR is:

$$\min \left[ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} (\xi_i + \xi_i^*) \right] \quad \text{subject to} \quad \begin{cases} y_i - (w \cdot \varphi(x_i) + b) \le \varepsilon + \xi_i \\ (w \cdot \varphi(x_i) + b) - y_i \le \varepsilon + \xi_i^* \\ \xi_i, \xi_i^* \ge 0 \end{cases} \qquad (4)$$

where $\|w\|^2$ denotes the flatness of the regression function.

2.1.4 Partial Autocorrelation Function (PACF)
PACF measures the partial correlation of a time series with its own lagged values while controlling for the values of the series at all shorter lags, thereby eliminating the effects of the intermediate variables. A brief introduction to PACF follows.

Let $\phi_{k,j}$ denote the $j$th regression coefficient of the autoregressive model of lag $k$. The autoregressive model of lag $k$ can then be expressed as:

$$y_t = \phi_{k,1} y_{t-1} + \phi_{k,2} y_{t-2} + \phi_{k,3} y_{t-3} + \dots + \phi_{k,k} y_{t-k} + u_t \qquad (5)$$

where $y_t$ is the value of the time series $\{y_i\}$ at time $t$, and $\phi_{k,k}$ is the last regression coefficient. Regarded as a function of the lag $k$, $\phi_{k,k}$ is the PACF: it represents the autocorrelation coefficient between $y_t$ and $y_{t-k}$ without the influence of $y_{t-1}, y_{t-2}, \dots, y_{t-k+1}$. $\phi_{k,k}$ can be calculated as follows:

$$\begin{cases} \phi_{1,1} = \rho_1 \\ \phi_{k+1,k+1} = \left(\rho_{k+1} - \sum_{j=1}^{k} \rho_{k+1-j}\,\phi_{k,j}\right)\left(1 - \sum_{j=1}^{k} \rho_j\,\phi_{k,j}\right)^{-1} \\ \phi_{k+1,j} = \phi_{k,j} - \phi_{k+1,k+1}\,\phi_{k,k-j+1}, \quad j = 1, 2, \dots, k \end{cases} \qquad (6)$$

where $\rho_k = \gamma_k / \gamma_0$ is the autocorrelation function at lag $k$, and $\gamma_k = \mathrm{cov}(y_t, y_{t-k}) = E[(y_t - \mu)(y_{t-k} - \mu)]$ is the covariance between $y_t$ and $y_{t-k}$.

The partial autocorrelation diagram can thus be plotted from the values of $\phi_{k,k}$ obtained by Eq. (6), and the input variables can be determined by analyzing the PACF plot against the lag length. For example, if the output variable is $y_i$ and the PACF at lag $k$ lies outside the 95% confidence interval, then $y_{i-k}$ is taken as one of the input variables [31].
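The recursion of Eq. (6) is short enough to sketch directly. The code below is an illustration rather than the authors' implementation. Two points are assumptions made here: the 95% bound is taken as the common large-sample approximation ±1.96/√N, which the text does not spell out, and `select_lags` reads "input variables" as the leading run of lags whose PACF lies outside that bound.

```python
import math

def autocorr(y, max_lag):
    """Sample autocorrelations rho_0 .. rho_max_lag (rho_0 = 1)."""
    n = len(y)
    mu = sum(y) / n
    gamma0 = sum((v - mu) ** 2 for v in y) / n
    return [
        sum((y[t] - mu) * (y[t - k] - mu) for t in range(k, n)) / n / gamma0
        for k in range(max_lag + 1)
    ]

def pacf_from_acf(rho):
    """Return [phi_11, phi_22, ...] from rho via the recursion of Eq. (6)."""
    max_lag = len(rho) - 1
    if max_lag < 1:
        return []
    phi = [rho[1]]          # phi[j - 1] holds phi_{k,j} for the current k
    result = [rho[1]]
    for k in range(1, max_lag):
        num = rho[k + 1] - sum(rho[k + 1 - j] * phi[j - 1] for j in range(1, k + 1))
        den = 1.0 - sum(rho[j] * phi[j - 1] for j in range(1, k + 1))
        last = num / den    # phi_{k+1,k+1}
        # phi_{k+1,j} = phi_{k,j} - phi_{k+1,k+1} * phi_{k,k-j+1}
        phi = [phi[j - 1] - last * phi[k - j] for j in range(1, k + 1)] + [last]
        result.append(last)
    return result

def select_lags(pacf, n_samples, z=1.96):
    """Count leading lags whose PACF lies outside +-z/sqrt(N) (assumed bound)."""
    bound = z / math.sqrt(n_samples)
    count = 0
    for value in pacf:
        if abs(value) <= bound:
            break
        count += 1
    return count

# For an AR(1) process with coefficient 0.6, rho_k = 0.6 ** k.
ar1_pacf = pacf_from_acf([0.6 ** k for k in range(6)])
print(ar1_pacf)
```

For the AR(1) autocorrelations used in the example, the PACF equals 0.6 at lag 1 and vanishes (up to rounding) at all higher lags, which is the textbook signature that only $y_{t-1}$ is needed as an input.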
2.2 Study Area and Data

2.2.1 Geographic information

Yunnan Province, which lies in the southwest of China, has a distinctive geographical position and diverse topography, and the regional climate varies markedly within the province. The rainfall datasets used in this paper were collected from three stations located in Kunming, Lincang and Mengzi, which represent the various climatic characteristics of Yunnan. The positions of the three meteorological stations are shown in Fig. 3.

Fig. 3. Location of the three meteorological stations

2.2.2 Data feature

The daily rainfall data collected from Kunming, Lincang and Mengzi are aggregated into monthly rainfall datasets as the input of this study. The datasets of Kunming and Mengzi each span 720 months, from January 1951 to December 2015, while the dataset of Lincang spans 696 months, from January 1953 to December 2015. In this paper, the data from January 1951 to August 2007 for Kunming and Mengzi, and from January 1953 to August 2007 for Lincang, are used for model training, while the data from September 2007 to December 2015 are used for validation. Fig. 4 shows the monthly precipitation at the three stations, in units of 0.1 mm.

Fig. 4. Observed rainfall time series
3. Model Description

3.1 E-SVR-ANN model

Since rainfall time series are nonlinear and nonstationary, the original series may comprise a number of components with various frequencies. The original rainfall data can be decomposed by EEMD into different IMFs plus a residue, as shown in Eq. (1). Each IMF represents a different intrinsic mode of oscillation, that is, a component with a fixed frequency or period. The term “frequency” here is equivalent to the average period of each IMF, which can be calculated by measuring the time intervals between consecutive zero-crossings on successive waves; the results are summarized in Table 1. Table 1 shows that the average period of IMF1 is about 3 months, corresponding to a season; the average period of IMF2 is about 11 months, corresponding to a year; and the average period of IMF5 is about 56 months, corresponding to roughly 5 years. As mentioned in Section 1, we find it better to treat different components with different prediction methods, since each IMF has its own characteristic (i.e. frequency or period). As IMF1 is the shortest-period component of the original series, it is hard to predict accurately. Some previous studies considered IMF1 as noise to be discarded [32,33]. However, Table 1 shows that the variance percentages of IMF1 in Kunming, Lincang and Mengzi are 19.50%, 16.77% and 25.27% respectively, which means that IMF1 contains relevant characteristics of the original series and cannot be neglected. The problem then becomes how to choose appropriate prediction models to forecast the various IMFs.

Table 1. Average periods and variance percentage of IMFs
        Period (month)                    Variance percentage (%)
        Kunming   Lincang   Mengzi        Kunming   Lincang   Mengzi
IMF1       3.36      3.26     3.18          19.50     16.77    25.27
IMF2      10.81     11.12    10.38          40.55     48.27    36.76
IMF3      13.53     13.95    14.30          13.44     11.87    12.85
IMF4      27.82     30.06    26.46           2.25      0.80     1.00
IMF5      56.80     55.42    56.20           0.40      0.36     1.34
IMF6     121.82    124.27   141.50           0.42      0.29     0.33
IMF7     409.50    198.40   307.25           0.34      0.12     0.22
IMF8     471.00    413.00   462.00           0.21      0.07     0.07
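The zero-crossing estimate of the average period described in this section can be illustrated as follows. This is a sketch of one plausible reading, not the procedure used to produce Table 1: it assumes that consecutive zero-crossings are half a period apart, locates each crossing by linear interpolation between samples, and uses a function name (`average_period`) introduced here for illustration.

```python
import math

def average_period(xs):
    """Average period = 2 x mean spacing between consecutive zero-crossings."""
    crossings = []
    for t in range(len(xs) - 1):
        a, b = xs[t], xs[t + 1]
        if a == 0.0:
            # A sample that is exactly zero is treated as a crossing.
            crossings.append(float(t))
        elif a * b < 0.0:
            # Sign change between t and t + 1: interpolate the crossing time.
            crossings.append(t + a / (a - b))
    if len(crossings) < 2:
        return float("inf")  # no full half-wave observed
    mean_gap = (crossings[-1] - crossings[0]) / (len(crossings) - 1)
    return 2.0 * mean_gap

# A synthetic monthly series with a 12-month cycle (illustrative data only).
demo = [math.sin(2.0 * math.pi * (t + 0.5) / 12.0) for t in range(120)]
demo_period = average_period(demo)
print(demo_period)
```

Applied to the synthetic 12-month sine wave, the estimate comes out very close to 12, which is the kind of value Table 1 reports for the annual component.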
Studies comparing the prediction performance of ANN and SVR have shown that SVM gives the best overall results on short-period time series. Yisheng Lv et al. adopted five different models to predict traffic flow and found that SVR-based models performed much better than ANN for 15-min prediction, while the performance of ANN improved as the prediction time slot increased [34]. Ren et al. compared a number of EEMD-based forecasting models for short-term wind speed time series and found that the EEMD-based hybrid SVR methods performed better than the EEMD-based hybrid ANN methods in most cases [35]. Since the data values vary quickly in the short-period rainfall component, we expect SVR to perform better in predicting the short-period components decomposed by EEMD. We ran experiments to explore this assumption further, using the rainfall time series of Kunming as an example; the results are shown in Table 2.
Table 2. Performance comparison of ANN and SVR (Kunming)

        ANN                                SVR
        R        RMSE       MAE            R        RMSE       MAE
IMF1    0.6731   281.1855   226.4995       0.7298   265.2048   217.6252
IMF2    0.9764    93.4635    69.1586       0.9725   103.8298    68.9446
IMF3    0.9993    16.0276    10.1524       0.9972    30.3736    15.4123
IMF4    0.9999     1.7035     1.4165       0.9999     1.9745     1.5972
IMF5    0.9999     0.5304     0.4197       0.9997     1.9058     0.7076
IMF6    0.9999     0.0473     0.0444       0.9999     0.0937     0.0710
IMF7    0.9999     0.6759     0.2149       0.8046    63.5202    46.7239
IMF8    0.9999     0.0031     0.0025       0.9999     0.0111     0.0098
Rn      0.9999     0.1607     0.1581       0.9999     0.2261     0.1701
Table 2 compares the forecasting performance of ANN and SVR on each component of the Kunming rainfall time series. SVR performs better than ANN on IMF1, whereas on IMF2 to IMF8 and Rn, ANN performs better than SVR in 79.2% of the comparisons. Experiments with the other datasets give similar results. A mathematical analysis of this assumption is left for future work.
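The text does not say how the 79.2% figure is derived. One reading that reproduces it, offered here as an assumption rather than as the authors' stated method, is a count of strict ANN wins over the three indicators and the eight components IMF2 to IMF8 and Rn, with ties counted as non-wins. The values below are the Kunming entries of Table 2 as read here, with the ANN and SVR columns assigned so that they agree with the surrounding discussion.

```python
# (R, RMSE, MAE) per component for Kunming, as read from Table 2.
ANN = {
    "IMF2": (0.9764, 93.4635, 69.1586),
    "IMF3": (0.9993, 16.0276, 10.1524),
    "IMF4": (0.9999, 1.7035, 1.4165),
    "IMF5": (0.9999, 0.5304, 0.4197),
    "IMF6": (0.9999, 0.0473, 0.0444),
    "IMF7": (0.9999, 0.6759, 0.2149),
    "IMF8": (0.9999, 0.0031, 0.0025),
    "Rn": (0.9999, 0.1607, 0.1581),
}
SVR = {
    "IMF2": (0.9725, 103.8298, 68.9446),
    "IMF3": (0.9972, 30.3736, 15.4123),
    "IMF4": (0.9999, 1.9745, 1.5972),
    "IMF5": (0.9997, 1.9058, 0.7076),
    "IMF6": (0.9999, 0.0937, 0.0710),
    "IMF7": (0.8046, 63.5202, 46.7239),
    "IMF8": (0.9999, 0.0111, 0.0098),
    "Rn": (0.9999, 0.2261, 0.1701),
}

def ann_wins(ann, svr):
    """Count indicator-by-indicator comparisons that ANN strictly wins."""
    wins = total = 0
    for name in ann:
        (r_a, rmse_a, mae_a), (r_s, rmse_s, mae_s) = ann[name], svr[name]
        # Higher R is better; lower RMSE and MAE are better. Ties are not wins.
        wins += (r_a > r_s) + (rmse_a < rmse_s) + (mae_a < mae_s)
        total += 3
    return wins, total

wins, total = ann_wins(ANN, SVR)
print(wins, total, round(100.0 * wins / total, 1))  # prints: 19 24 79.2
```

Under this counting, ANN wins 19 of the 24 comparisons; the exceptions are four ties on R (IMF4, IMF6, IMF8 and Rn) and the MAE of IMF2, where SVR is slightly better.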
SVM can be highly sensitive to its parameters, and the corresponding optimization process consumes considerable tuning time. We ran experiments to compare the time consumed by SVM and ANN on each IMF, as shown in Fig. 5. The experiments were run on a PC with an Intel(R) Core(TM) i3-4160 CPU @ 3.60GHz and 4GB RAM.

Fig. 5. Time consumption using Kunming datasets

The experimental results show that SVM consumes much more time than ANN on IMFn (n≥4). Considering both prediction accuracy and time efficiency, the principle of our proposed model is to adopt SVR to predict the short-period component IMF1 and ANN to predict the longer-period IMFs and Rn. The results are then added together to form the final rainfall prediction. We call this the E-SVR-ANN model; it is shown in Fig. 6. In our proposed model, we choose IMF1 as the short-period component, instead of IMF1+IMF2 or IMF1+IMF2+IMF3. The influence of different short-period component combinations is analyzed in Section 4.5.
Fig. 6. E-SVR-ANN model (rainfall time series → EEMD → IMF1, IMF2, …, IMFn, Rn; each component → PACF → standardization → SVR for IMF1, ANN for IMF2, …, IMFn and Rn; ∑ → prediction)
Before prediction begins, the input data for each IMF are selected by PACF: the longest run of consecutive lagged values outside the confidence interval is chosen as the input of the predictor. The data are first standardized to alleviate large fluctuations during learning, as follows:

$$X = \frac{x - \min(x)}{\max(x) - \min(x)} \qquad (7)$$

The algorithm based on the E-SVR-ANN model is summarized as follows.

Input: data: the rainfall time series; noise: the added white noise; n: the number of IMFs; iter: the number of iterations, iter ∈ [50, 200]; x: the input vector; yt: the observed data; y: the predicted data.
Output: O: the predicted result.
Procedure:
IMFs <- eemd(data, noise, iter);
O <- 0;
for i = 1; i <= n do
    lag <- Pacf(IMFi); // calculating the number of input variables for the training data, using Eq. (6)
    IMFi <- Normalization(IMFi); // using Eq. (7)
    if i = 1 then
        [yt, x] <- SVR_format(IMF1); // using lag
        SVR_parameters <- SVR_gridsearch(yt, x); // finding the values of the cost (c), gamma (g) and the error tolerance (ε)
        Model1 <- SVR_train(yt, x); // using Eqs. (3)-(4)
        y <- SVR_predict(yt, x, Model1); // using Eq. (3)
    else
        [yt, x] <- ANN_format(IMFi); // using lag
        ANN_parameters <- ANN_traverse(yt, x);
        Modeli <- ANN_train(yt, x); // using Eq. (2)
        y <- ANN_predict(yt, x, Modeli);
    end if
    O <- O + y; // O = Σ_{i=1}^{n} y
end for
return O
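The control flow of the procedure above can be sketched in Python with the learners left pluggable. This is a skeleton under stated assumptions, not the authors' code: `fit_short` and `fit_long` are hypothetical callables standing in for SVR and ANN training (each takes the lagged inputs and targets and returns a one-step predictor), the components and lags are assumed to come from EEMD and PACF, and `fit_persistence` is a trivial stand-in used only to make the example runnable.

```python
def min_max(xs):
    """Eq. (7): scale to [0, 1]; also return what is needed to invert it."""
    lo, hi = min(xs), max(xs)
    span = (hi - lo) or 1.0  # guard against a constant component
    return [(x - lo) / span for x in xs], lo, span

def lagged(xs, lag):
    """Rows [x_{t-1}, ..., x_{t-lag}] paired with the target x_t."""
    inputs = [[xs[t - j] for j in range(1, lag + 1)] for t in range(lag, len(xs))]
    targets = xs[lag:]
    return inputs, targets

def forecast_next(components, lags, fit_short, fit_long):
    """One-step-ahead forecast: sum of the per-component predictions."""
    total = 0.0
    for i, (comp, lag) in enumerate(zip(components, lags)):
        scaled, lo, span = min_max(comp)
        inputs, targets = lagged(scaled, lag)
        # The first component (IMF1) uses the short-period learner.
        fit = fit_short if i == 0 else fit_long
        predict = fit(inputs, targets)
        latest = [scaled[-j] for j in range(1, lag + 1)]
        total += predict(latest) * span + lo  # undo Eq. (7)
    return total

def fit_persistence(inputs, targets):
    """Stand-in learner (assumption): predict the most recent value."""
    return lambda row: row[0]

components = [[1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 12.0, 14.0]]
forecast = forecast_next(components, [2, 1], fit_persistence, fit_persistence)
print(forecast)
```

With the persistence stand-in, each component's forecast is simply its last value, so the example returns 4 + 14 = 18; swapping in real SVR and ANN learners changes only the two `fit_*` arguments.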
3.2 Three comparing indicators

We compare E-SVR-ANN with several models, including ANN, SVR, E-ANN, E-SVR, EEMD-ARIMA [36], EMD-RBFNN [37], and EMD-DBN [38]. Our measures are the Pearson correlation coefficient (R), the root mean square error (RMSE), and the mean absolute error (MAE). R is a linear correlation coefficient that reflects the degree of linear correlation between two variables; the greater the absolute value of R, the stronger the correlation. RMSE is a commonly used error statistic that represents the sample standard deviation of the differences between predicted and observed values. MAE is also a statistic for measuring the predictive capability of a model. The larger the values of RMSE and MAE, the worse the prediction. R, RMSE and MAE are defined as:

$$R = \frac{\sum_{i} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i} (X_i - \bar{X})^2 \sum_{i} (Y_i - \bar{Y})^2}} \qquad (8)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (X_i - Y_i)^2} \qquad (9)$$

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |X_i - Y_i| \qquad (10)$$

In this paper, $X$ represents the predicted time series and $Y$ the observed time series; $\bar{X}$ and $\bar{Y}$ are their means and $N$ is the number of samples.
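Eqs. (8) to (10) translate directly into code. The sketch below uses plain Python lists and function names chosen here for illustration (`pearson_r`, `rmse`, `mae`); it assumes the two series have equal length and, for R, that neither is constant.

```python
import math

def pearson_r(xs, ys):
    """Eq. (8): Pearson correlation between predictions and observations."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(sxx * syy)

def rmse(xs, ys):
    """Eq. (9): root mean square error."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))

def mae(xs, ys):
    """Eq. (10): mean absolute error."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

predicted = [1.0, 2.0, 3.0]
observed = [1.0, 2.0, 5.0]
print(pearson_r(predicted, observed), rmse(predicted, observed), mae(predicted, observed))
```

In the example only the last point is off, by 2, so RMSE = √(4/3) and MAE = 2/3; RMSE exceeds MAE because squaring weights the single large error more heavily.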
4. Results and Discussions

4.1 Validation of E-SVR-ANN model

First, our model was validated on standard datasets such as NN5 [39]. Owing to space limitations, we randomly selected three columns (6, 7, 8) from NN5 to compare the performance of ANN, SVR, E-ANN, E-SVR and E-SVR-ANN. The first 735 data points are used to construct the prediction models, while the last 56 data points are used for validation. Fig. 7 shows the forecasting results of E-ANN, E-SVR and E-SVR-ANN along with the curve of the original data, using column 6 of NN5 as an example. As shown in Table 3, the predictions of the E-SVR-ANN model are closest to the observed data, since its values of R, RMSE and MAE are the best among all models. Meanwhile, Fig. 8 shows the scatter plots of the three models for the same NN5 data and reveals that, among all the models, the plot of E-SVR-ANN is the closest to the 45-degree line. Together with Table 4, this shows that E-SVR-ANN obtains the best performance. We discuss the E-SVR-ANN model in more depth and compare it with more existing methods in Section 4.6.
Fig. 7. Predicted and observed NN5 data (column 6)

Fig. 8. Scatter plot of prediction in NN5 (column 6)

Table 3. Forecasting performance indicators of time series in NN5

Indicator   Model        Column 6   Column 7   Column 8   Average
R           ANN          0.7517     0.5868     0.7945     0.7110
            SVR          0.7180     0.8330     0.8046     0.7852
            E-ANN        0.8571     0.8864     0.8619     0.8685
            E-SVR        0.8590     0.8934     0.8850     0.8791
            E-SVR-ANN    0.8750     0.8938     0.8862     0.8850
RMSE        ANN          5.7803     6.2966     6.8615     6.3128
            SVR          4.9475     5.8061     6.0042     5.5859
            E-ANN        3.7062     4.4921     5.2845     4.4943
            E-SVR        3.5644     4.3062     4.7923     4.2210
            E-SVR-ANN    3.4697     4.2518     4.7239     4.1485
MAE         ANN          4.1765     4.4463     4.3489     4.3239
            SVR          3.4259     3.8667     4.0087     3.7671
            E-ANN        3.0993     3.3470     4.0946     3.5136
            E-SVR        2.6808     3.2637     3.4138     3.1194
            E-SVR-ANN    2.6687     3.2586     3.3975     3.1083

Table 4. R-squares and slopes of scatter plots

             NN5_6                NN5_7                NN5_8
             R-square   slope     R-square   slope     R-square   slope
E-ANN        0.7429     0.8977    0.7346     0.9742    0.7857     0.9437
E-SVR        0.7832     1.0018    0.7379     1.0216    0.7982     0.9556
E-SVR-ANN    0.7853     0.9862    0.7657     1.0512    0.7988     0.9624
4.2 The decomposed original rainfall time series
The original rainfall time series are decomposed into eight IMFs and a residue Rn, as shown in Fig. 9. As mentioned in Section 3.1, the frequency of the components decreases from IMF1 to IMF8, which is also reflected in the period values shown in Table 1. Rn represents the trend of the original series, showing that precipitation rose to a peak between 1951 and 1988 and then began to decline.

Fig. 9. Decomposed IMFs of original rainfall time series
4.3 PACF Results

PACF is applied to determine the input variables for each IMF before prediction. Variables outside the confidence interval have a larger effect on the output, so we choose the longest run of consecutive lags outside the confidence interval as input variables. All the PACF results are summarized in Table 5, where [1, n] represents the time series data from $x_{t-1}$ to $x_{t-n}$ (i.e. $[x_{t-1}, x_{t-2}, \dots, x_{t-n}]$).

Table 5. PACF results of three stations (input variables)

           IMF1    IMF2    IMF3     IMF4     IMF5    IMF6    IMF7    IMF8     Rn
Kunming    [1,4]   [1,4]   [1,6]    [1,10]   [1,8]   [1,7]   [1,7]   [1,7]    [1,5]
Lincang    [1,6]   [1,6]   [1,10]   [1,8]    [1,7]   [1,7]   [1,8]   [1,10]   [1,5]
Mengzi     [1,3]   [1,6]   [1,10]   [1,10]   [1,8]   [1,6]   [1,7]   [1,7]    [1,14]
4.4 Parameters of ANN and SVR
The parameters of ANN and SVR must be selected carefully to achieve good results. The number of hidden nodes in ANN needs to be decided, so we traverse the range [2, 20] and select the value that performs best for the given input and output variables. The resulting numbers of hidden nodes are shown in Table 6. Meanwhile, the cost (c), gamma (g), and error tolerance (ε) of SVR have to be determined; the parameters chosen by the grid search method are shown in Table 7.

Table 6. Number of hidden nodes in ANN
           IMF1   IMF2   IMF3   IMF4   IMF5   IMF6   IMF7   IMF8   Rn
Kunming    11     8      6      4      4      19     4      3      4
Lincang    2      12     6      3      4      4      5      2      5
Mengzi     5      16     12     5      2      3      2      4      18
Table 7. Parameters c, g and ε in SVR

        Kunming                  Lincang                  Mengzi
        c      g     ε           c      g     ε           c      g     ε
IMF1    1024   16    256         1024   8     256         1024   8     256
IMF2    1024   8     32          1024   4     16          1024   2     4
IMF3    1024   2     0.5002      1024   2     1           1024   1     2
IMF4    1024   0.5   0.2504      1024   4     0.0010      1024   4     0.0010
IMF5    1024   4     0.0039      1024   8     0.0078      1024   8     0.0156
IMF6    1024   8     0.0010      1024   16    0.0010      1024   32    0.0010
IMF7    1024   8     0.0078      1024   8     0.0020      1024   32    0.0010
IMF8    1024   8     0.0010      1024   2     0.0010      1024   2     0.0020
Rn      1024   4     0.0039      1024   0.1   0.0020      1024   2     0.0010
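The two selection procedures of this section, the traversal of hidden-node counts and the grid search over c, g and ε, share the same shape and can be sketched generically. The code below is illustrative only: `score` is a hypothetical caller-supplied function returning a validation error for a candidate setting, and the base-2 grid from 2^-10 to 2^10 is an assumption suggested by values such as 1024 and 0.0010 in Table 7, not something the text states.

```python
import itertools

def grid_search(score, exponents=range(-10, 11)):
    """Return (best_error, params) minimizing score(c, g, eps) on a base-2 grid."""
    grid = [2.0 ** e for e in exponents]
    best = None
    for c, g, eps in itertools.product(grid, repeat=3):
        error = score(c, g, eps)
        if best is None or error < best[0]:
            best = (error, {"c": c, "g": g, "eps": eps})
    return best

def best_hidden_nodes(score, candidates=range(2, 21)):
    """Traverse [2, 20] and return the hidden-node count with the lowest error."""
    return min(candidates, key=score)

# Toy error surfaces with known minima, used only to exercise the search.
svr_error, svr_params = grid_search(
    lambda c, g, eps: abs(c - 8.0) + abs(g - 0.25) + abs(eps - 0.03125)
)
hidden = best_hidden_nodes(lambda h: abs(h - 7))
print(svr_params, hidden)
```

An exhaustive grid of this size evaluates 21³ = 9261 settings per component, which is consistent with the observation in Section 3.1 that SVR tuning dominates the running time.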
4.5 Influences of short-period components combinations

The influences of short-period component combinations on prediction performance and time consumption are summarized in Table 8 and Fig. 10, respectively. We used the Kunming dataset as an example. In both Table 8 and Fig. 10, Case 1 represents the experimental results with IMF1 as the only short-period component, Case 2 represents the results with IMF1+IMF2 as the short-period component, and Case 3 represents the results with IMF1+IMF2+IMF3 as the short-period component. Considering both prediction performance and time consumption, we conclude that the best choice is IMF1 as the short-period component, instead of IMF1+IMF2 or IMF1+IMF2+IMF3.

Table 8. Performance comparison of short-period components combinations
           R                          RMSE                             MAE
           Case1    Case2    Case3    Case1      Case2      Case3      Case1      Case2      Case3
Kunming    0.9406   0.9394   0.9381   271.0670   271.1602   273.6014   212.3615   208.2951   208.8073
Lincang    0.9442   0.9294   0.9290   284.2566   322.9070   324.1195   208.2581   231.1351   231.4450
Mengzi     0.9337   0.9260   0.9244   211.8788   222.9883   225.6493   167.5038   174.2003   173.8431

Fig. 10. Time comparison of short-period components combinations

4.6 Performance and analysis
The tables and figures in this section compare the results of eight models. In these tables and figures, SVR denotes SVR applied to the original rainfall data without any data-processing method, ANN denotes ANN applied to the original rainfall data without any data-processing method, E-SVR denotes SVR applied to the EEMD-decomposed rainfall data, and E-ANN denotes ANN applied to the EEMD-decomposed rainfall data. EEMD-ARIMA, EMD-RBFNN and EMD-DBN are the methods proposed in [36,37,38] respectively.

As shown in Table 9, the ANN model has the worst performance of all the models, and the results of the SVR model are similar to those of ANN. The other six EEMD-based models perform better, which indicates that EEMD plays a significant role in improving prediction accuracy. The following comparisons are therefore carried out mainly among the EEMD-based models.
It also be obviously observed from Table 9 that the E-SVR-ANN model show the best 24 / 39
1
performance on R, RMSE, and MAE in their average values. E-SVR-ANN model
2
obtaining a best R, RMSE and MAE values of 0.9395, 255.7341 and 196.0411
3
respectively, which improves R by 1.2%, reduces RMSE and MAE by 9.1% and
4
7.98%, respectively compared with E-SVR. Similarly, E-SVR-ANN improves R by
5
0.36%, reduces RMSE and MAE by 10.11% and 11.5%, respectively compared with
6
E-ANN with the average value. While comparing with the other three models
7
EEMD-ARIMA, EMD-RBFNN and EMD-DBN, R of E-SVR-ANN is improved by
8
3.1%, 8.6% and 12.8% respectively. Meanwhile, RMSE is reduced by 6.7%, 36.2%
9
and 32.1%, and MAE is reduced by 16.6%, 33.4% and 27.6% respectively.
The station-based values of R, RMSE and MAE lead to the same conclusion. For example, for the Kunming rainfall series, the R of E-SVR-ANN is improved by 0.69%, 0.6%, 4.8%, 10.4% and 13.9% compared with E-ANN, E-SVR, EEMD-ARIMA, EMD-RBFNN and EMD-DBN, respectively. At the same time, RMSE is reduced by 5.5%, 3.5%, 12.4%, 26% and 35.9%, while MAE is reduced by 7.2%, 1%, 12.1%, 21.8% and 29%, respectively, compared with E-ANN, E-SVR, EEMD-ARIMA, EMD-RBFNN and EMD-DBN.

Table 9. Prediction performance comparisons
R
Model       Kunming   Lincang   Mengzi    Average
ANN         0.7266    0.7910    0.6769    0.7315
SVR         0.7511    0.8569    0.7332    0.7804
E-ANN       0.9350    0.9439    0.9310    0.9366
E-SVR       0.9351    0.9288    0.9223    0.9287
EEMD-ARIMA  0.8975    0.9249    0.9115    0.9113
EMD-RBFNN   0.8518    0.8767    0.8660    0.8648
EMD-DBN     0.8258    0.8664    0.8073    0.8332
E-SVR-ANN   0.9406    0.9442    0.9337    0.9395

RMSE
Model       Kunming   Lincang   Mengzi    Average
ANN         544.8936  544.4374  450.1345  513.1552
SVR         523.0857  449.2077  405.1217  459.1384
E-ANN       280.8749  293.3639  270.5488  281.5959
E-SVR       285.9490  324.5389  232.5319  281.0066
EEMD-ARIMA  309.2700  288.6400  224.4900  274.1333
EMD-RBFNN   366.3900  502.8700  333.3900  400.8833
EMD-DBN     423.0900  350.2200  356.8400  376.7167
E-SVR-ANN   271.0670  284.2566  211.8788  255.7341

MAE
Model       Kunming   Lincang   Mengzi    Average
ANN         356.4787  407.1008  332.4932  365.3576
SVR         362.3314  326.6116  304.1594  331.0341
E-ANN       214.6052  223.5826  209.6316  215.9398
E-SVR       226.1165  231.8147  177.9335  211.9549
EEMD-ARIMA  241.4900  216.4700  172.0000  209.9867
EMD-RBFNN   271.4400  354.0800  257.4300  294.3167
EMD-DBN     298.9200  256.4300  257.2200  270.8567
E-SVR-ANN   212.3615  208.2581  167.5038  196.0411
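As a hedged illustration of how the three measures reported in Table 9 can be computed, the following Python sketch assumes that R is the Pearson correlation coefficient between observed and predicted values and that RMSE and MAE follow their usual definitions. The function names, including the `relative_reduction` helper, are hypothetical and are not taken from the original implementation. The last line recomputes the reduction in average RMSE of E-SVR-ANN relative to EMD-RBFNN from the values in Table 9, with the baseline model in the denominator (an assumed convention).

```python
import math

def rmse(observed, predicted):
    """Root mean squared error (usual definition, assumed here)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def mae(observed, predicted):
    """Mean absolute error (usual definition, assumed here)."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n

def pearson_r(observed, predicted):
    """Pearson correlation coefficient (assumed meaning of R)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    dev_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    dev_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    return cov / (dev_o * dev_p)

def relative_reduction(baseline, proposed):
    """Percentage reduction of an error measure relative to the baseline."""
    return 100.0 * (baseline - proposed) / baseline

# Average RMSE values from Table 9: EMD-RBFNN versus E-SVR-ANN.
print(round(relative_reduction(400.8833, 255.7341), 1))  # 36.2
```

The same helper applied to R would use the gain over the baseline instead of the reduction; the sign convention is a choice of this sketch.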
The prediction results of the models at the three stations are presented in Figs. 11-16. Figs. 11-13 illustrate the rainfall prediction results of the E-SVR, E-ANN, EEMD-ARIMA, EMD-RBFNN, EMD-DBN and E-SVR-ANN models with the data of Kunming, Lincang and Mengzi, respectively. All six models simulate the rainfall well, while E-SVR-ANN fluctuates less than the others. Figs. 14-16 are scatter plots of the prediction results of E-SVR, E-ANN, EEMD-ARIMA, EMD-RBFNN, EMD-DBN and E-SVR-ANN in Kunming, Lincang and Mengzi, in which the black line represents y = x. These figures show that the predictions of all six models lie close to the 45-degree line, and that the E-SVR-ANN model is the most stable one. The parameters of the fitted linear regression lines of all the prediction models are shown in Table 10, which indicates the fitting accuracy of the lines. The slopes of the six models are all close to 1, as shown in Figs. 14-16, and the R-square of the E-SVR-ANN model is better in most cases. This indicates that the E-SVR-ANN predictions are less dispersed, which means that they are closer to the observed data and fit better than those of the other models.
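The slope and R-square values discussed above come from fitting a straight line to each scatter plot. A minimal sketch of such a fit is given below, assuming ordinary least squares with the observed values on the x-axis and the predicted values on the y-axis. The axis assignment and the function name `fit_line` are assumptions of this sketch, not details stated in the text, and the data are toy values rather than rainfall records.

```python
def fit_line(observed, predicted):
    """Ordinary least-squares fit: predicted = slope * observed + intercept.

    Returns (slope, intercept, r_square).
    """
    n = len(observed)
    mean_x = sum(observed) / n
    mean_y = sum(predicted) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, predicted))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-square: share of the variance of the predictions explained by the line.
    ss_res = sum((y - (slope * x + intercept)) ** 2
                 for x, y in zip(observed, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in predicted)
    r_square = 1.0 - ss_res / ss_tot
    return slope, intercept, r_square

# Toy data (not from the paper): predictions that track the observations closely.
obs = [100.0, 200.0, 300.0, 400.0]
pred = [110.0, 190.0, 310.0, 390.0]
slope, intercept, r2 = fit_line(obs, pred)
print(slope, intercept, r2)
```

A slope near 1 together with a high R-square means that the points cluster tightly around the y = x line.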
Fig. 11. Predicted and observed rainfall time series (Kunming)
Fig. 12. Predicted and observed rainfall time series (Lincang)
Fig. 13. Predicted and observed rainfall time series (Mengzi)
Fig. 14. Scatter plots of prediction in Kunming
Fig. 15. Scatter plots of prediction in Lincang
Fig. 16. Scatter plots of prediction in Mengzi
Table 10. R-squares and slopes of scatter plots
E-ANN E-SVR EEMD-ARIMA EMD-RBFNN EMD-DBN E-SVR-ANN
Kunming R-square slope 0.8743 1.0543 0.8745 1.0337 0.8486 0.9940 0.8595 0.7909 0.7100 1.0104 1.0619 0.8847
Lincang R-square slope 0.8909 0.9749 0.8627 0.9409 0.9659 0.8954 0.8061 0.7214 0.8315 1.0293 0.8916 0.9516
Mengzi R-square Slope 0.8668 1.0994 0.8506 1.0353 0.8559 1.0115 0.8156 0.9085 0.6809 1.0712 1.0313 0.8717
Table 11. DM test results between E-SVR-ANN and other models
DM-MAE (Kunming, Lincang, Mengzi) DM-RMSE (Kunming, Lincang, Mengzi)
ANN 7.8046 29.4237 20.1783 19.6078 34.9980 27.3331
SVR 9.9246 30.6048 48.9603 20.1747 32.3356 39.1509
EEMD-ARIMA 3.4772 9.5704 0.5949 8.4155 5.1880 5.5340 37.0603 15.8411 4.6826 3.8619 5.6881 0.1378 6.4732 8.7812 4.8534 60.8196 23.5138 11.2056 E-ANN
E-SVR
EMD-RBFNN EMD-DBN 12.0152 3.7638 6.6017 21.7559 17.9307 9.6847 23.4175 8.0816 13.3633 21.3823 32.5325 19.5049
Note: DM-MAE denotes the DM test statistic based on Mean Absolute Error; DM-RMSE denotes the DM test statistic based on Root Mean Squared Error.
Finally, the prediction performance of the above eight models is compared using the DM test to assess statistical significance. Using the DM test described in [40], the comparisons between E-SVR-ANN and the other prediction models are summarized in Table 11. The following conclusions can be drawn from Table 11: 1. For the DM test based on MAE with the Kunming data, the absolute value of DM-MAE = 7.8046 > 2.58, so the null hypothesis is rejected at the 1% level of significance; that is, the observed differences are significant and, combined with Table 9, the prediction accuracy of E-SVR-ANN is better than that of ANN. 2. For the DM test based on RMSE with the Kunming data, the absolute value of DM-RMSE = 19.6078, so the null hypothesis is again rejected and the observed differences are still significant; combined with Table 9, the prediction accuracy of E-SVR-ANN is still better than that of ANN. Similarly, almost all the DM test values in Table 11 show that the proposed model outperforms the alternative models.
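For readers unfamiliar with the DM test, the sketch below computes a Diebold-Mariano-type statistic for two forecast error series. It assumes one-step-ahead forecasts, so the variance of the mean loss differential is estimated without autocovariance terms, and it assumes that the MAE-based and RMSE-based variants use absolute and squared errors as loss functions. Neither detail is specified in the text above, and the function name `dm_statistic` is hypothetical. Under the null hypothesis of equal accuracy the statistic is approximately standard normal, which is where the 2.58 threshold for the 1% level comes from.

```python
import math

def dm_statistic(errors_a, errors_b, loss="abs"):
    """DM-type statistic comparing the forecast errors of models A and B.

    Positive values indicate that model A has the larger average loss.
    Assumes one-step-ahead forecasts (no autocovariance correction).
    """
    if loss == "abs":
        d = [abs(a) - abs(b) for a, b in zip(errors_a, errors_b)]
    elif loss == "sq":
        d = [a * a - b * b for a, b in zip(errors_a, errors_b)]
    else:
        raise ValueError("loss must be 'abs' or 'sq'")
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / n
    return mean_d / math.sqrt(var_d / n)

# Toy errors (not from the paper): model A is consistently worse than model B.
errors_a = [3.0, -4.0, 5.0, -3.0, 4.0, -5.0]
errors_b = [1.0, -1.0, 2.0, -1.0, 1.0, -2.0]
stat = dm_statistic(errors_a, errors_b, loss="abs")
print(stat > 2.58)  # True: equal accuracy is rejected at the 1% level
```

For multi-step forecasts the variance estimate would normally include autocovariances of the loss differential, which this sketch deliberately omits.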
5. Conclusion
Since rainfall time series are nonlinear and non-stationary, traditional models may not achieve ideal results. In this paper, we proposed a combined model that applies different supervised learning methods to components of different scales according to their features. Unlike traditional prediction models, which treat all the IMFs identically or ignore the shortest-period component, the proposed model proved to be more accurate in rainfall prediction. First, we use EEMD to decompose the original rainfall time series into a set of IMFs that represent intrinsic modes of oscillation from short to long periods. Then we use SVR to predict the short-period component IMF1 and ANN to predict the other components, which gives better predictions on different datasets from Yunnan province, southwest China. The experimental results show that the proposed E-SVR-ANN model performs best, with the largest R and the lowest RMSE and MAE in all test cases. With E-SVR-ANN, R is improved by 0.36% to 12.8%, RMSE is reduced by 6.7% to 36.2%, and MAE is reduced by 7.98% to 33.4%. Furthermore, the scatter plots show that E-SVR-ANN is more stable than the other models. The DM test results also show that the proposed model outperforms the alternative models, which supports its reliability.
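The decompose-predict-recombine scheme summarized above can be sketched as follows. This is only an illustration of the routing logic: it assumes the EEMD components are already available as lists ordered from short to long period, and it uses two trivial stand-in forecasters (`persistence` and `window_mean`, both hypothetical) where the paper uses SVR and ANN. Taking the final forecast as the sum of the component forecasts is likewise an assumption of this sketch, motivated by the additive nature of the decomposition.

```python
def persistence(series):
    """Stand-in for the short-period model (SVR in the paper): repeat the last value."""
    return series[-1]

def window_mean(series, window=3):
    """Stand-in for the long-period model (ANN in the paper): mean of the last values."""
    tail = series[-window:]
    return sum(tail) / len(tail)

def combined_forecast(components, short_model, long_model, n_short=1):
    """Forecast one step ahead by routing components to two models and summing.

    components: list of series ordered from short to long period (IMF1 first).
    n_short: number of leading components sent to the short-period model.
    """
    total = 0.0
    for index, series in enumerate(components):
        model = short_model if index < n_short else long_model
        total += model(series)
    return total

# Toy components (not real IMFs): a fast oscillation and a slow trend.
imf1 = [1.0, -1.0, 1.0, -1.0]
trend = [10.0, 11.0, 12.0, 13.0]
print(combined_forecast([imf1, trend], persistence, window_mean))  # 11.0
```

Setting `n_short` to 2 or 3 would correspond to treating IMF1+IMF2 or IMF1+IMF2+IMF3 as the short-period component.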
It may be interesting to combine prediction models with recently developed deep learning methods, such as the stacked autoencoder algorithm, to improve the performance of rainfall prediction. These research topics are considered in our ongoing work.
Figure Caption
Fig. 1. EMD algorithm
Fig. 2. BP-ANN topological structure
Fig. 3. Location of the three meteorological stations
Fig. 4. Observed rainfall time series
Fig. 5. Time consumption using Kunming datasets
Fig. 6. E-SVR-ANN model
Fig. 7. Predicted and observed NN5 data (column 6)
Fig. 8. Scatter plot of prediction in NN5 (column 6)
Fig. 9. Decomposed IMFs of original rainfall time series
Fig. 10. Time comparison of short-period components combinations
Fig. 11. Predicted and observed rainfall time series (Kunming)
Fig. 12. Predicted and observed rainfall time series (Lincang)
Fig. 13. Predicted and observed rainfall time series (Mengzi)
Fig. 14. Scatter plots of prediction in Kunming
Fig. 15. Scatter plots of prediction in Lincang
Fig. 16. Scatter plots of prediction in Mengzi
References
[1] Wang B, Xiang B, Li J, et al. Rethinking Indian monsoon rainfall prediction in the context of recent global warming[J]. Nature Communications, 2015, 6.
[2] Brown C, Meeks R, Ghile Y, et al. Is water security necessary? An empirical analysis of the effects of climate hazards on national-level economic growth[J]. Phil. Trans. R. Soc. A, 2013, 371(2002): 20120416.
[3] Kusiak A, Wei X, Verma A P, et al. Modeling and prediction of rainfall using radar reflectivity data: A data-mining approach[J]. IEEE Transactions on Geoscience and Remote Sensing, 2013, 51(4): 2337-2342.
[4] Ramana R V, Krishna B, Kumar S R, et al. Monthly rainfall prediction using wavelet neural network analysis[J]. Water Resources Management, 2013, 27(10): 3697-3711.
[5] Bui D T, Pradhan B, Lofman O, et al. Regional prediction of landslide hazard using probability analysis of intense rainfall in the Hoa Binh province, Vietnam[J]. Natural Hazards, 2013, 66(2): 707-730.
[6] Ganguly A R, Bras R L. Distributed quantitative precipitation forecasting using information from radar and numerical weather prediction models[J]. Journal of Hydrometeorology, 2003, 4(6): 1168-1180.
[7] Nikam V B, Meshram B B. Modeling rainfall prediction using data mining method: A Bayesian approach[C]//Computational Intelligence, Modelling and Simulation (CIMSim), 2013 Fifth International Conference on. IEEE, 2013: 132-136.
[8] Pai P F, Hong W C. A recurrent support vector regression model in rainfall forecasting[J]. Hydrological Processes, 2007, 21(6): 819-827.
[9] Hong W C, Pai P F. Potential assessment of the support vector regression technique in rainfall forecasting[J]. Water Resources Management, 2007, 21(2): 495-513.
[10] Hong W C. Rainfall forecasting by technological machine learning models[J]. Applied Mathematics and Computation, 2008, 200(1): 41-57.
[11] Kashiwao T, Nakayama K, Ando S, et al. A neural network-based local rainfall prediction system using meteorological data on the Internet: A case study using data from the Japan Meteorological Agency[J]. Applied Soft Computing, 2017, 56: 317-330.
[12] Cheng C, Sa-Ngasoongsong A, Beyca O, et al. Time series forecasting for nonlinear and non-stationary processes: a review and comparative study[J]. IIE Transactions, 2015, 47(10): 1053-1071.
[13] Sivapragasam C, Liong S Y, Pasha M F K. Rainfall and runoff forecasting with SSA–SVM approach[J]. Journal of Hydroinformatics, 2001, 3(3): 141-152.
[14] Liong S Y, Sivapragasam C. Flood stage forecasting with support vector machines[J]. JAWRA Journal of the American Water Resources Association, 2002, 38(1): 173-186.
[15] Hong W C, Dong Y, Zhang W Y, et al. Cyclic electric load forecasting by seasonal SVR with chaotic genetic algorithm[J]. International Journal of Electrical Power & Energy Systems, 2013, 44(1): 604-614.
[16] Vasiliades L, Galiatsatou P, Loukas A. Nonstationary frequency analysis of annual maximum rainfall using climate covariates[J]. Water Resources Management, 2015, 29(2): 339-358.
[17] Wu C L, Chau K W. Prediction of rainfall time series using modular soft computing methods[J]. Engineering Applications of Artificial Intelligence, 2013, 26(3): 997-1007.
[18] Chau K W, Wu C L. A hybrid model coupled with singular spectrum analysis for daily rainfall prediction[J]. Journal of Hydroinformatics, 2010, 12(4): 458-473.
[19] Gan K, Sun S, Wang S, et al. A secondary-decomposition-ensemble learning paradigm for forecasting PM2.5 concentration[J]. Atmospheric Pollution Research, 2018, https://doi.org/10.1016/j.apr.2018.03.008
[20] Hao H, Wang H L, Rehman N U. A joint framework for multivariate signal denoising using multivariate empirical mode decomposition[J]. Signal Processing, 2017, 135: 263-273.
[21] Guo Y, Naik G R, Nguyen H. Single channel blind source separation based local mean decomposition for Biomedical applications[C]//Engineering in Medicine and Biology Society (EMBC), 2013 35th Annual International Conference of the IEEE. IEEE, 2013: 6812-6815.
[22] Krishna P K M, Ramaswamy K. Single Channel speech separation based on empirical mode decomposition and Hilbert Transform[J]. IET Signal Processing, 2017.
[23] Naik G R, Selvan S E, Nguyen H T. Single-channel EMG classification with ensemble-empirical-mode-decomposition-based ICA for diagnosing neuromuscular disorders[J]. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 2016, 24(7): 734-743.
[24] Huang S, Chang J, Huang Q, et al. Monthly streamflow prediction using modified EMD-based support vector machine[J]. Journal of Hydrology, 2014, 511: 764-775.
[25] Wang W, Xu D, Chau K, et al. Improved annual rainfall-runoff forecasting using PSO–SVM model based on EEMD[J]. Journal of Hydroinformatics, 2013, 15(4): 1377-1390.
[26] Xiang Y, Wang X, He L, et al. Spatial-Temporal Analysis of Environmental Data of North Beijing District Using Hilbert-Huang Transform[J]. PLoS ONE, 2016, 11(12): e0167662.
[27] Govindaraju R S. Artificial neural networks in hydrology. II: hydrologic applications[J]. Journal of Hydrologic Engineering, 2000, 5(2): 124-137.
[28] Wu Z, Huang N E. Ensemble empirical mode decomposition: a noise-assisted data analysis method[J]. Advances in Adaptive Data Analysis, 2009, 1(01): 1-41.
[29] Govindaraju R S. Artificial neural networks in hydrology. I: Preliminary concepts[J]. Journal of Hydrologic Engineering, 2000, 5(2): 115-123.
[30] Wu C H, Ho J M, Lee D T. Travel-time prediction with support vector regression[J]. IEEE Transactions on Intelligent Transportation Systems, 2004, 5(4): 276-281.
[31] Guo Z, Zhao W, Lu H, et al. Multi-step forecasting for wind speed using a modified EMD-based artificial neural network model[J]. Renewable Energy, 2012, 37(1): 241-249.
[32] Chattopadhyay S, Chattopadhyay G. Identification of the best hidden layer size for three-layered neural net in predicting monsoon rainfall in India[J]. Journal of Hydroinformatics, 2008, 10(2): 181-188.
[33] Bray M, Han D. Identification of support vector machines for runoff modelling[J]. Journal of Hydroinformatics, 2004, 6(4): 265-280.
[34] Lv Y, Duan Y, Kang W, et al. Traffic flow prediction with big data: a deep learning approach[J]. IEEE Transactions on Intelligent Transportation Systems, 2015, 16(2): 865-873.
[35] Ren Y, Suganthan P N, Srikanth N. A comparative study of empirical mode decomposition-based short-term wind speed forecasting methods[J]. IEEE Transactions on Sustainable Energy, 2015, 6(1): 236-244.
[36] Wang W, Chau K, Xu D, et al. Improving forecasting accuracy of annual runoff time series using ARIMA based on EEMD decomposition[J]. Water Resources Management, 2015, 29(8): 2655-2675.
[37] Fu Q, Liu D, Li T, et al. EMD-RBFNN Coupling Prediction Model of Complex Regional Groundwater Depth Series: A Case Study of the Jiansanjiang Administration of Heilongjiang Land Reclamation in China[J]. Water, 2016, 8(8): 340.
[38] Qiu X, Ren Y, Suganthan P N, et al. Empirical mode decomposition based ensemble deep learning for load demand time series forecasting[J]. Applied Soft Computing, 2017, 54: 246-255.
[39] Crone S. Time series forecasting competition for computational intelligence. http://www.neural-forecastingcompetition.com. 2008.
[40] Derrac J, García S, Molina D, et al. A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms[J]. Swarm and Evolutionary Computation, 2011, 1(1): 3-18.
[41] Tang L, Wang S, He K, et al. A novel mode-characteristic-based decomposition ensemble model for nuclear energy consumption forecasting[J]. Annals of Operations Research, 2015, 234(1): 111-132.
[42] Xie G, Zhang N, Wang S. Data characteristic analysis and model selection for container throughput forecasting within a decomposition-ensemble methodology[J]. Transportation Research Part E: Logistics and Transportation Review, 2017, 108: 160-178.
[43] Sun S, Wang S, Zhang G, et al. A decomposition-clustering-ensemble learning approach for solar radiation forecasting[J]. Solar Energy, 2018, 163: 189-199.
[44] Niu M, Gan K, Sun S, et al. Application of decomposition-ensemble learning paradigm with phase space reconstruction for day-ahead PM2.5 concentration forecasting[J]. Journal of Environmental Management, 2017, 196: 110-118.
[45] Zhang J L, Zhang Y J, Zhang L. A novel hybrid method for crude oil price forecasting[J]. Energy Economics, 2015, 49: 649-659.
Acknowledgements
This paper is supported by the following projects:
1. Sichuan Provincial Science and Technology Plan Program on Key Research Project (No. 182DYF2573).
2. Next Generation Internet Technology Innovation Project (No. NGII20160327).