Predicting monthly biofuel production using a hybrid ensemble forecasting methodology
Lean Yu a,∗, Shaodong Liang a, Rongda Chen b, Kin Keung Lai c,∗
a School of Economics and Management, Beijing University of Chemical Technology, Beijing 100029, China
b School of Finance, Zhejiang University of Finance & Economics, Hangzhou 310018, China
c College of Economics, Shenzhen University, Shenzhen 518060, China
Keywords: Biofuel production forecasting; Hybrid ensemble forecasting; EMD; LSTM; ELM; Fine-to-coarse reconstruction
Abstract

This paper proposes a hybrid ensemble forecasting methodology that integrates empirical mode decomposition (EMD), long short-term memory (LSTM) and extreme learning machine (ELM) to forecast monthly biofuel production (a typical agriculture-related energy product), based on the decomposition–reconstruction–ensemble principle. The proposed methodology involves four main steps: data decomposition via EMD, component reconstruction via a fine-to-coarse (FTC) method, individual prediction via the LSTM and ELM algorithms, and ensemble prediction via a simple addition (ADD) method. For illustration and verification, the monthly biofuel production data of the USA are used as the sample data, and the empirical results indicate that the proposed hybrid ensemble forecasting model statistically outperforms all benchmark models considered in terms of forecasting accuracy. This indicates that the proposed hybrid ensemble forecasting methodology, which integrates the EMD, LSTM and ELM models based on the decomposition–reconstruction–ensemble principle, is a competitive model for the prediction of biofuel production.
© 2019 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.
∗ Corresponding authors.
E-mail addresses: [email protected] (L. Yu), [email protected] (K.K. Lai).

1. Introduction

It is well known that there is a close relationship between agriculture and biofuel (Jafari & Othman, 2016). In particular, the development of biofuels has brought the two industries closer together and has had far-reaching effects on agricultural development and food security. The main impacts relate to the following four aspects. Firstly, modern agriculture increasingly depends on fossil-fuel-derived fertilizers and diesel-driven machinery. Food storage, processing and marketing are all energy-intensive activities, which ties food even more closely to energy inputs. Secondly, the biofuel industry's demand for raw materials has further expanded the market space for agricultural products, which has led to a non-traditional and non-food demand for agricultural products and opened up direct links between energy and food demand. Thirdly, the development of the biofuel industry has changed the competitive situation and the distribution pattern of the energy market. Bioethanol and biodiesel compete directly with, and substitute for, petroleum-based gasoline and diesel, and thus there is a linkage effect among the prices of these energy products. Finally, in the agricultural product market, biofuel enterprises, food processing enterprises and feed production enterprises all compete directly with one another. The complexity and sensitivity of energy market demand and price changes mean that small changes in energy demand are likely to cause large fluctuations in the demand for biofuel feedstocks, which will have a direct effect on the stability and sustainable development of the agricultural product market. We display these direct effects between agriculture and biofuel by using Chinese data for illustrative purposes.
Due to the shortage of fossil energy, bioenergy has been proposed as an important alternative energy source. The two oil crises in 1973 and 1979 were a heavy blow to the world's major oil-dependent economies, and the potential exhaustion of fossil energy, energy conservation and the development of alternative new energy sources have attracted increasing attention globally. Therefore, people naturally consider the development and utilization of renewable energy as an effective alternative to fossil energy. In China, the government attaches great importance to the development of the biofuel industry, and it has developed rapidly with the involvement of several major oil groups, as is shown in Fig. 1. As Fig. 1 shows, biofuel production shows a general rising trend, except in 2015, when cost pressures narrowed the price gap between biofuels and oil. The shrinking profits of biofuel manufacturers led to a significant decrease in biofuel production in China in 2015 compared with 2014. In recent years, more than 40 manufacturers with annual outputs of more than 5000 tons have developed to a large scale. With the rapid growth of biofuel production, more and more agricultural products, including various crops, are needed. Because some major crops, such as corn and rice, are important both for the development of biofuels and for the consumption of low-income households, the rapid development of the biofuel industry has led to a food crisis. For example, Timilsina, Beghin, Mensbrugghe, and Mevel (2012) analyzed the impact of the bioenergy industry on food security from the perspectives of the market, target and impact, and found that the bioenergy industry had a direct impact on cultivated land. Similarly, Lee, Clark, and Devereaux (2008) cited a report of the Food and Agriculture Organization (FAO) to show that about 40% of the world food crisis in 2007 was due to the development of the bioenergy industry. The energy properties of agricultural products have led to many new characteristics emerging in the adjustment of major crop production structures, as is illustrated in Fig. 2. This tendency is reflected in two main ways. First, considering the production of the three major food crops in 2017, the rice production of China was 212.676 million tons and the wheat output was 134.334 million tons, while the corn production was 259.071 million tons. Thus, corn production has surpassed that of rice, becoming the largest food crop in China. In particular, the contribution of corn to grain production is now much larger than those of either rice or wheat, and this growth in corn production is the main reason for the grain production increase in China. Conversely, soybean production shows a significant downward trend, with continuous negative changes in planting area. Second, from the perspective of the planting areas of major domestic crops, the sown area of corn increased year by year from 2000 to 2015 due to the development of biofuel, reaching 44,968 thousand hectares in 2015. Conversely, the decline in the area of corn in 2015–2017 is due in part to the regulation of biofuel production. In general, the proportion of the grain sown area growing corn increased from 21.26% in 2000 to 41.34% in 2015. Unlike corn, China's rice planting area is relatively stable.
In 2015, the sown area of rice was 30,784 thousand hectares, up by 255 thousand hectares from 2011; its share of the grain sown area has been maintained largely due to planting subsidies (Li, Liao, Wang and Huang, 2018). In contrast, the planting area of wheat was 24,569 thousand hectares in 2014, a decrease of 201 thousand hectares from 2011, and the proportion of the grain sown area growing wheat also decreased from 24.57% in 2000 to 21.35% in 2015. Similarly, the soybean planting area is decreasing year by year: from 12,660 thousand hectares in 2000 to 8443 thousand hectares in 2015. The proportion of the grain sown area growing soybeans also fell from 8.58% in 2000 to 5.9% in 2017. The trends in the international arena are similar. In recent years, the crop planting structure of the United States of America (USA) has also been undergoing necessary adjustments in order to adapt to structural changes in the international market and domestic demand. The increase in demand in the international market and the large domestic consumption of corn for ethanol fuel have led to much land originally used for wheat production in the USA being switched to corn, meaning that the corn planting area has shown a growing trend. At the same time, while corn stalks may also be used to produce ethanol in the future, at present the stalks are returned to the field directly in order to increase soil fertility. However, if a large proportion of corn stalks is used to produce fuel ethanol, it will increase the demand for fertilizer to a certain extent, and will undoubtedly have a negative impact on soil conditions. In summary, the development of the biofuel industry has further consolidated the relationship between biofuel and agriculture. The development of biofuel has expanded the agricultural product market, meaning that an increase in the production of agricultural products can stimulate and guarantee biofuel production. Meanwhile, the development of biofuel has changed the structure and pattern of global food production and trade. The short-term contribution of biofuel to global energy and the environment is relatively small, but it has inevitably exacerbated the contradiction between supply and demand in the global agricultural product market, becoming one of the main reasons for sharp fluctuations in food prices. In order to reduce the large price variation of agricultural products, it is necessary to predict biofuel production. Accurate predictions of biofuel production can help to adjust the production of agricultural products in time, and the timely adjustment of agricultural product prices according to the corresponding biofuel production forecasts can help to alleviate grain shortages in specific periods. In this sense, biofuel production forecasting is not only necessary, but also extremely important for the development of the biofuel industry and agriculture. In the area of biofuel production forecasting, the previous literature has mainly used traditional econometric and statistical methods such as semi-empirical models (Melikoglu, 2014), the Hubbert model (Li & Zhang, 2014) and the fuzzy grey-Markov model (Geng, Yong, Sun, Jiang, & Chen, 2015).
Fig. 1. Biofuel production in China, 2006–2017. Source: National Bureau of Statistics of China (http://www.stats.gov.cn).
Fig. 2. Sown areas of major crops in China over the period 2000–2017. Source: National Bureau of Statistics of China (http://www.stats.gov.cn).
Melikoglu (2014) used semi-empirical models to forecast bioethanol and biodiesel demand in view of Turkey's Vision 2023 goals, Energy Market Regulatory Authority targets, and European Union directives. The results predicted a considerable increase in biodiesel demand between 2014 and 2023, and suggested that Turkey's biodiesel potential could be intensified further by extensive biofuel farming on fertile unused agricultural land across the country. Li and Zhang (2014) applied a Hubbert model to analyze the development trend of biofuel production in China and provided a brief comparison of China's biofuel production with that of the USA. Their empirical results show that the biofuel production of the first-generation technology in China will reach 58 million barrels in 2025, which is equal to that of the USA in 2002. Geng et al. (2015) addressed the influence of random data fluctuations and the weak anti-interference capability of the Markov chain model by proposing a dynamic fuzzy grey-Markov prediction model for biofuel production forecasting, in order to improve the prediction performance of conventional prediction methods based upon past production levels in conjunction with economic factors, governmental policies, and technological developments. Their empirical results demonstrated the superiority of the proposed fuzzy grey-Markov model relative to the benchmark prediction models. However, the biofuel production system is a complex system, which is easily affected by various factors such as the economy, governmental policies, resources, technological developments and social issues. Thus, the above methods can only provide good prediction results under linear assumptions, being unable to capture the hidden nonlinear features of the biofuel production series. Numerous experiments have demonstrated that the predictive performance may be very poor if one uses these traditional statistical and econometric methods (Mejdoub & Ghorbel, 2018; Song & Yu, 2018; Weigend, 2018). Therefore, the traditional methods are not suitable for predicting biofuel production (Geng et al., 2015). Given the limitations of the traditional methods, some nonlinear and emerging artificial intelligence (AI)
methods, such as artificial neural networks (ANN; see Rajendra, Jena, & Raheman, 2009), genetic algorithms (GA; see Rajendra et al., 2009) and support vector regression (SVR; see Li, Liu, Fang, & Chen, 2010; Paraskevopoulos & Posch, 2018), provide powerful solutions to complex time series prediction. In recent years, AI methods have been utilized in various big data-driven models for complex financial time series prediction (Zhou et al., 2016). However, such methods have several shortcomings. For example, AI-based nonlinear forecasting methodologies are often inefficient and require considerable computational time. Furthermore, most AI-based prediction models, such as ANN, are parameter-sensitive and prone to being trapped in local minima or to overfitting (Husaini, Ghazali, Nawi, & Ismail, 2011). This paper endeavors to overcome these shortcomings by providing an efficient and fast solution to biofuel production forecasting. In such a situation, we propose a hybrid ensemble forecasting methodology that integrates empirical mode decomposition (EMD), long short-term memory (LSTM) and extreme learning machine (ELM) (EMD-LSTM-ELM for short), based on the decomposition–reconstruction–ensemble principle (Yu, Wang, & Tang, 2015), for predicting monthly biofuel production. The proposed methodology involves four main steps. Firstly, for simplicity, we apply EMD, a competitive decomposition method, to divide the original complex data into some relatively simple components, namely intrinsic mode functions (IMFs). Secondly, considering the time required to model all of the decomposed IMFs, we apply a fine-to-coarse reconstruction method to capture inner factors and reduce the computational cost. Thirdly, we use LSTM, an excellent deep learning algorithm for dealing with complex time series, to forecast the high-frequency component. Unlike the high-frequency component, the low-frequency and tendency components are relatively regular; thus, ELM is used to predict them, balancing time complexity and prediction accuracy. Finally, we integrate these individual forecasting results into an aggregated output by means of simple addition.

In general, the main contributions of this paper are as follows. First of all, it proposes a novel hybrid ensemble forecasting method integrating EMD, LSTM and ELM based on the decomposition–reconstruction–ensemble principle. Our empirical results show that this proposed hybrid ensemble forecasting method can improve the prediction accuracy and reduce the computational complexity. Second, the proposed hybrid ensemble forecasting methodology is applied to biofuel production prediction for the first time, in order to overcome some shortcomings of the traditional linear methods and AI-based nonlinear methods in the previous literature. Lastly, we compare the forecasting performance of the proposed model with those of different benchmark prediction models, and the empirical results show the robustness and superiority of the proposed hybrid ensemble forecasting model. The main aims of this paper are to propose a novel hybrid ensemble forecasting methodology for monthly biofuel production forecasting, and to compare it with other forecasting techniques, including popular single models and typical decomposition–ensemble models with reconstruction.

The remainder of this paper is organized as follows. Section 2 describes the formulation process of the proposed hybrid ensemble forecasting method. The empirical results are reported and discussed further in Section 3. Section 4 concludes the paper and outlines further research directions.

2. Methodology formulation

This section presents the proposed hybrid ensemble forecasting methodology in terms of the decomposition–reconstruction–ensemble principle for monthly biofuel production prediction. In particular, Section 2.1 provides an overview of the proposed methodology, and Sections 2.2–2.5 respectively describe the four main steps of the proposed hybrid ensemble methodology, together with the related techniques.

2.1. General framework of the proposed methodology

We improve the forecasting accuracy and reduce the computational complexity by proposing a hybrid ensemble forecasting methodology for predicting monthly biofuel production, as illustrated in Fig. 3. As the diagram shows, the proposed hybrid ensemble forecasting methodology involves four main steps: data decomposition, component reconstruction, individual prediction and ensemble prediction. In the first step, based on the "divide and conquer" strategy (Yu et al., 2015), empirical mode decomposition (EMD), a competitive decomposition method, is applied to divide the original complex data into some relatively simple components. In the second step, a fine-to-coarse (FTC) reconstruction process based on the T-test, which can explore the inner factors of the different components, is used to reduce the number of components, thus decreasing the computational complexity (Zhang, Lai, & Wang, 2008). This reconstruction process yields three main parts of the original time series data, namely a high-frequency component, a low-frequency component and a tendency component. In the third step, we employ two different machine learning approaches, namely long short-term memory (LSTM) and extreme learning machine (ELM), based on the advantages of each given the characteristics of the three components. Because there may be some complex patterns hidden in the reconstructed high-frequency component, LSTM, a typical deep learning model, is selected for forecasting the high-frequency component. Similarly, ELM (Yu, Yang, & Ling, 2016), a fast and efficient learning method, is chosen to predict the low-frequency and tendency components because these reconstructed components are relatively simple and regular. Finally, the three individual forecasting results above are combined into a final ensemble prediction by simple addition. The remainder of this section will present the details of the four steps.
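Before those details, the following minimal Python sketch shows how the four steps fit together (the paper's own experiments were implemented in MATLAB). The helper names, the use of the third-party PyEMD package, and the stand-in predictors are illustrative assumptions rather than the authors' implementation; only the overall decomposition–reconstruction–ensemble flow follows the description above, and the FTC grouping shown here is one common formulation of the fine-to-coarse idea rather than the paper's exact procedure.

```python
# Minimal sketch of the decomposition-reconstruction-ensemble pipeline.
# Assumptions: PyEMD is installed for EMD; scipy provides the T-test; the
# predictors passed to forecast() are simple stand-ins for LSTM and ELM.
import numpy as np
from PyEMD import EMD            # assumed third-party package
from scipy import stats

def decompose(series):
    """Step 1: EMD splits the series into IMFs plus one residue (last row)."""
    return EMD().emd(np.asarray(series, float))

def ftc_reconstruct(imfs, alpha=0.05):
    """Step 2: fine-to-coarse reconstruction (illustrative).
    Add IMFs from fine to coarse and use a one-sample T-test on the partial
    sum to locate where its mean first departs significantly from zero;
    everything before that point is grouped as the high-frequency part."""
    residue = imfs[-1]
    components = imfs[:-1]
    split = len(components)
    for k in range(1, len(components) + 1):
        partial_sum = components[:k].sum(axis=0)
        if stats.ttest_1samp(partial_sum, 0.0).pvalue < alpha:
            split = k - 1
            break
    high = components[:split].sum(axis=0)
    low = components[split:].sum(axis=0)
    return high, low, residue    # high-frequency, low-frequency, tendency

def forecast(series, predict_high, predict_regular):
    """Steps 3 and 4: individual prediction and the ADD ensemble.
    predict_high / predict_regular are stand-ins for the LSTM and ELM models."""
    high, low, trend = ftc_reconstruct(decompose(series))
    return predict_high(high) + predict_regular(low) + predict_regular(trend)
```

The ADD ensemble in the last line is simply the sum of the three component forecasts, which mirrors the equal (1:1:1) weighting described in Section 2.5.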
Fig. 3. General framework of the proposed hybrid ensemble forecasting methodology.
2.2. Data decomposition The first step in the proposed hybrid ensemble forecasting approach is data decomposition. In this step, EMD, a competitive decomposition method, is used to divide the original complex time series data into a few relatively simple components, for the sake of simplicity. Compared with other traditional decomposition methodologies, such as Fourier decomposition (Wen, Gao, Xiao, Xu, & Tao, 2013) and wavelet decomposition (Feng, Ying, Wu, & Wan, 2014), EMD is a kind of empirical, intuitive, direct and self-adaptive data processing method, and is suitable for both nonlinear and nonstationary data (Huang et al., 1998). EMD can capture various hidden patterns in the complex data systems effectively even without a priori knowledge, and has been applied widely to various nonstationary data processing, due to its flexibility and self-adaptivity (Yu et al., 2015). Usually, EMD decomposes the original time series into some independent and nearly periodic intrinsic mode functions (IMFs). Mathematically, IMFs must satisfy two conditions: (1) in each whole function, the number of extrema (i.e., both maxima and minima) and the number of zero crossings must be equal, or different by no more than one; and (2) the functions must be symmetric with respect to a local zero mean. Hence, the original data
series $X_t$ ($t = 1, 2, \ldots, T$) can be expressed as the sum of $N$ IMFs and one residue,

$$x_t = \sum_{j=1}^{N} c_{j,t} + r_{N,t}, \qquad (1)$$
where $N$ is the number of IMFs, $r_{N,t}$ is the residue, and $c_{j,t}$ ($j = 1, 2, \ldots, N$) is the $j$th IMF at time $t$. The IMF components cover different frequency bands, which change with the variation of the time series $X_t$, while $r_{N,t}$ represents the central tendency of $X_t$. In practice, the total number of IMFs, $N$, can be determined based on the data size $T$ according to the equation $N = \log_2 T$.

2.3. Component reconstruction

The previous step decomposes the original time series into a number of IMFs and one residue, for the sake of simplicity. However, individual forecasting becomes quite time-consuming if all of the decomposed IMFs are modeled separately. In addition, Yu et al. (2015) noted that the prediction errors of all IMFs accumulate in the final ensemble prediction step if one models every IMF and the residue. We address this problem by introducing a component reconstruction step into the original decomposition–ensemble model; thus, the
decomposition–reconstruction–ensemble methodology is proposed in order to improve the computational efficiency and predictive performance. There are various reconstruction methods available, such as data-characteristic-driven (DCD) reconstruction (Yu et al., 2015), fine-to-coarse (FTC) reconstruction (Zhang et al., 2008), phase space reconstruction (PSR) (Yan, Wang, Zhang, & Ding, 2014), and sample entropy measurement (Zhang, Wu, & Liu, 2014). Yu et al. (2015) proposed a decomposition–ensemble prediction model for crude oil price forecasting based on the data-characteristic-driven (DCD) reconstruction method and obtained a better predictive performance. Zhang et al. (2008) applied an FTC reconstruction method to analyze the characteristics of crude oil price volatility based on EMD. Yan et al. (2014) introduced PSR as a reconstruction method when developing a novel decomposition–ensemble paradigm for uranium resource price forecasting. Zhang et al. (2014) proposed a characteristic component analysis method for wind speed forecasting by reconstructing new components based on sample entropy measurement. However, all of these studies conducted the component reconstruction according to only a single criterion, such as the frequency (Yan et al., 2014) or complexity (Zhang et al., 2014), while neglecting inner factors of the data. Unlike DCD and PSR, the FTC method reconstructs the IMFs and the residue into three components using the T-test: a high-frequency component, a low-frequency component and a tendency component. Moreover, the reconstructed sequence of each component has obvious regularity and contains centralized feature information. For this reason we adopt FTC, a simple reconstruction method based on the T-test, for exploring the inner factors of the different IMFs and reducing the computational cost. For more details of these four reconstruction methods, please refer to Yan et al. (2014), Yu et al. (2015), Zhang et al. (2008) and Zhang et al. (2014). Given the merits and drawbacks of these four reconstruction methods, this paper selects the FTC method based on the T-test as our reconstruction tool. The FTC reconstruction method involves two steps. First, the similarity of all decomposed IMFs obtained from the previous step is tested using the T-test. Second, similar IMFs with insignificant differences at a certain confidence level are reconstructed into new components. Usually, the residue in the decomposed parts is the tendency component. If the T-test finds no significant difference between a certain IMF and other IMFs, they are reconstructed into the same type of component, such as a high-frequency or low-frequency component. Thus, the decomposed IMFs from the previous step are merged into three parts, a high-frequency component, a low-frequency component and a tendency component, based on the results of the T-test.

2.4. Individual prediction

The second step of the proposed methodology leaves us with three different components of the complex time series data, namely the high-frequency, low-frequency and tendency components, by reconstructing the decomposed IMFs.
The fluctuation of the high-frequency component is fast and irregular over time, and the rules behind its internal information are complex. Thus, we forecast the high-frequency component using long short-term memory (LSTM), an excellent deep learning algorithm for dealing with complex time series. Unlike the high-frequency component, the low-frequency and tendency components are relatively regular. Thus, extreme learning machine (ELM), a fast and efficient algorithm, is used to predict these two components, balancing time complexity and prediction accuracy. The remainder of this subsection presents brief introductions to the LSTM and ELM models.

2.4.1. LSTM

Long short-term memory (LSTM), which was first proposed by Hochreiter and Schmidhuber (1997), is a typical deep learning algorithm with a recurrent neural network (RNN) structure (Assaad, Boné, & Cardot, 2008). As a variant of the RNN, the biggest innovation of LSTM is the introduction of control gates, which alleviates the problem of exploding or vanishing gradients and makes the deep neural network (DNN; see Yu et al., 2016) that is deployed in time easier to train. In recent years, it has been used widely in speech processing, behavior recognition, video analysis and other fields. The basic structure of LSTM is illustrated in Fig. 4.

As can be seen from Fig. 4, the first step in LSTM is to decide what information is going to be thrown away from the cell state. This decision is made by the following forget gate ($f_t$):

$$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f). \qquad (2)$$

The next step is to decide which new information is going to be stored in the cell state. This step has two parts. First, the input gate ($i_t$) layer decides which values to update. Second, a tanh layer creates a vector of new candidate values for $c_t$. These two processes can be described as

$$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i), \qquad (3)$$

$$c_t = f_t c_{t-1} + i_t \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c). \qquad (4)$$

The final step is to decide what is going to be produced as output. This output is based on the cell state, but in a filtered version. In this step, the output gate ($o_t$) decides what parts of the cell state are going to be output. Then, the cell state goes through another tanh layer (to force the values to be between −1 and 1) and is multiplied by the output gate as follows:

$$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_t + b_o). \qquad (5)$$

In these four equations, $i_t$, $f_t$, $o_t$ and $c_t$ are the vectors for the input gate, forget gate, output gate and cell activation, respectively, at time $t$, and all of the vectors have the same size as the hidden vector $h_t$. The weight matrices $W$ represent the connections between any two components; for instance, $W_{xo}$, $W_{ho}$ and $W_{co}$ describe the weights from the inputs, hidden vectors and cell activations to the output gate, respectively. $b_i$, $b_f$, $b_c$ and $b_o$ are the biases. Interested readers are referred to Zheng, Chen, Wu, Chen, and Liu (2017) for more details.
Fig. 4. The cell of the long short-term memory neural network.
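To connect Eqs. (2)–(5) with an implementation, the following is a minimal NumPy sketch of one forward step of such a peephole LSTM cell. The dictionary-based weight layout, the element-wise (diagonal) peephole terms and the final hidden-state update $h_t = o_t \tanh(c_t)$ are standard choices assumed here for illustration; they are not taken from the authors' code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One forward step of a peephole LSTM cell, mirroring Eqs. (2)-(5).
    W is a dict of weights (e.g. W['xf'] maps the input to the forget gate,
    peephole weights W['cf'], W['ci'], W['co'] act element-wise);
    b is a dict of bias vectors."""
    f_t = sigmoid(W['xf'] @ x_t + W['hf'] @ h_prev + W['cf'] * c_prev + b['f'])  # Eq. (2)
    i_t = sigmoid(W['xi'] @ x_t + W['hi'] @ h_prev + W['ci'] * c_prev + b['i'])  # Eq. (3)
    c_hat = np.tanh(W['xc'] @ x_t + W['hc'] @ h_prev + b['c'])
    c_t = f_t * c_prev + i_t * c_hat                                             # Eq. (4)
    o_t = sigmoid(W['xo'] @ x_t + W['ho'] @ h_prev + W['co'] * c_t + b['o'])     # Eq. (5)
    h_t = o_t * np.tanh(c_t)   # standard output update (assumption)
    return h_t, c_t
```

In practice, an established deep learning library would be used instead of such a hand-rolled cell; the sketch only maps the symbols of Eqs. (2)–(5) to code.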
Benefiting from three control gates and a memory cell, LSTM can keep, read, reset and update long-term information easily. LSTM establishes a long time delay between input and feedback, and the gradient will neither explode nor vanish, because the internal state of the memory cell maintains a continuous error flow. In this sense, it is very suitable for processing and predicting time series problems with long intervals and delays. However, like other deep learning algorithms, LSTM also has some weaknesses, such as the difficulty of tuning its many parameters and the resulting long training time.

2.4.2. ELM

The extreme learning machine (ELM), first proposed by Huang, Zhu, and Siew (2005), is a typical fast and efficient algorithm from the family of neural networks. The unique merits of the ELM are its time saving and high accuracy, and thus there are a great many applications of ELM in different areas (Yu et al., 2016). Essentially, ELM is a special case of a single-layer feedforward network (SLFN) (Tang, Dai, Yu, & Wang, 2015), as illustrated in Fig. 5. The computational structure of an ELM with $K$ arbitrary samples $(X_j, t_j)$ can be represented as

$$\sum_{i=1}^{L} \beta_i g(W_i \cdot X_j + b_i) = o_j, \quad j = 1, \ldots, K, \qquad (6)$$

where $W_i = [W_{i1}, W_{i2}, \ldots, W_{iK}]^T$ denotes the weight vector that connects the input nodes to the $i$th hidden node and $\beta_i$ is the weight vector that connects the output nodes with the $i$th hidden node. In addition, $b_i$ is the threshold of the $i$th hidden node. The inner product of $W_i$ and $X_j$ is denoted by the operation $W_i \cdot X_j$ in Eq. (6). Let us consider that the standard ELM with $L$ hidden nodes employing the activation function $g(x)$ can approximate these $K$ samples with zero error. In such a situation, we obtain the following equation:

$$\sum_{j=1}^{K} \left\| Z_j - t_j \right\| = 0, \qquad (7)$$

where $Z$ denotes the actual output value of the ELM. This indicates the existence of $\beta_i$, $W_i$ and $b_i$ such that

$$\sum_{i=1}^{L} \beta_i g(W_i \cdot X_j + b_i) = t_j, \quad j = 1, \ldots, K. \qquad (8)$$

Fig. 5. Basic structure of ELM.

A succinct expression of the above equations can be written as

$$H\beta = T, \qquad (9)$$

where $\beta$ is the output weight, $T$ is the expected output and $H$ is the output of the hidden layer nodes, which is elaborated as

$$H(W_1, \ldots, W_L, b_1, \ldots, b_L, X_1, \ldots, X_N) = \begin{bmatrix} g(W_1 \cdot X_1 + b_1) & \cdots & g(W_L \cdot X_1 + b_L) \\ \vdots & \ddots & \vdots \\ g(W_1 \cdot X_N + b_1) & \cdots & g(W_L \cdot X_N + b_L) \end{bmatrix}_{N \times L}. \qquad (10)$$

To train the ELM, we hope to find $\hat{W}_i$, $\hat{b}_i$ and $\hat{\beta}_i$ such that

$$\left\| H(\hat{W}_i, \hat{b}_i)\hat{\beta}_i - T \right\| = \min_{W, b, \beta} \left\| H(W_i, b_i)\beta_i - T \right\|. \qquad (11)$$
This is equivalent to minimizing the loss function

$$E = \sum_{j=1}^{N} \left( \sum_{i=1}^{L} \beta_i g(W_i \cdot X_j + b_i) - t_j \right)^{2}. \qquad (12)$$
The training of the single-hidden-layer ELM neural network can thus be transformed into solving a linear system. Accordingly, the output weight can be determined as

$$\hat{\beta} = H^{+} T, \qquad (13)$$
where $H^{+}$ is the Moore–Penrose generalized inverse of the matrix $H$ (Prasad & Bapat, 1992). It has also been proved that the norm of the solution obtained in this way is the smallest and that the solution is unique. In the ELM algorithm, the output matrix of the hidden layer is determined uniquely once the input weights and the biases of the hidden layer have been set randomly. The main feature of ELM is that the hidden-layer node parameters can be given randomly or artificially, and therefore no parameter adjustment is needed; the learning process only needs to calculate the output weights. In other words, the biggest advantages of ELM are its fast learning speed and good generalization performance. Although randomizing the hidden layer of the ELM neural network greatly improves the computation speed, it inevitably brings a hidden danger of over-fitting (Lin, Liu, Fang, & Xu, 2017).

In summary, LSTM, an excellent deep learning algorithm for dealing with complex time series, is used to forecast the high-frequency component. In contrast to the high-frequency component, the low-frequency and tendency components are relatively regular. Thus, the ELM model, with its unique merits of time economy and high accuracy, is selected for forecasting the low-frequency and tendency components. After the individual predictions based on the LSTM and ELM methods have been completed, the individual prediction results need to be integrated into an aggregated output for the final prediction, as presented in Section 2.5.

2.5. Ensemble prediction

Section 2.4 uses two important learning algorithms, LSTM and ELM, as individual prediction tools for predicting the high-frequency, low-frequency and tendency components. However, ensemble prediction is necessary in order to combine the individual prediction results of the different components. For the ensemble forecasting of the different components, the final forecasting result for the original time series data $X_t$ can be expressed as

$$\hat{x}_t = h\big(\hat{d}_t(1), \hat{d}_t(2), \ldots, \hat{d}_t(P)\big), \qquad (14)$$
where $\hat{x}_t$ denotes the final forecasting result at time $t$, $\hat{d}_t(j)$ is the individual forecasting value of the $j$th component, and $h(\cdot)$ is the ensemble prediction function. This paper begins by decomposing the original time series data into a linear expansion of modes, i.e., IMFs, via EMD. Then, the FTC reconstruction method is used to reconstruct the decomposed IMFs into three meaningful components, where the sum of the components is equivalent to the actual value of the original time series data. For the three components, the LSTM and ELM algorithms are used for the individual predictions. Accordingly, a simple but effective ensemble approach, the simple addition (ADD) strategy, is used to aggregate the three individual prediction results $\hat{d}_t(j)$ ($j = 1, 2, \ldots, P$) into the final ensemble prediction. That is, following decomposition and reconstruction, the original data series becomes a high-frequency component, a low-frequency component and a tendency component, which enter the final prediction with equal weights. Thus, for ensemble prediction, the weighting of the ADD strategy is 1:1:1.

3. Empirical analysis

We test the effectiveness of our proposed hybrid ensemble forecasting model by using the US monthly biofuel production data as sample data. For comparison purposes, we also introduce as benchmarks some classical forecasting techniques, including five single models, namely the autoregressive integrated moving average (ARIMA) model, the Markov switching (MS) model (see Engel, Wahl, & Zagst, 2018; Li, Dong, Huang and Pierre, 2018), the support vector regression (SVR) model, the extreme learning machine (ELM) model and the long short-term memory (LSTM) model, as well as five decomposition–ensemble forecasting models with reconstruction (R), i.e., EMD-R-ARIMA, EMD-R-MS, EMD-R-SVR, EMD-R-ELM and EMD-R-LSTM. Section 3.1 describes the experimental design, and Section 3.2 reports and further discusses the corresponding results.

3.1. Experimental design

This section presents the data descriptions, evaluation criteria and benchmark models. The experimental design of this paper is a rolling forecast, where the length of the rolling estimation window is fixed at 10. For multi-step-ahead prediction, we adopt recursive forecasting.

3.1.1. Data descriptions

This paper uses as experimental data the monthly biofuel production data in US markets, collected from the US Energy Information Administration (EIA) (http://www.eia.doe.gov/), as shown in Fig. 6. As can be seen from Fig. 6, the US monthly biofuel production data cover the period from January 1981 to May 2018, with a total of 449 observations. The sample data are divided into two subsets, namely a training set and a testing set. The first 80% of the observations are used as the training set for model training, while the remaining testing set is used for model performance evaluation. Furthermore, multi-step-ahead forecasting is performed with prediction horizons of one, two, three and six months to test the robustness of the proposed hybrid ensemble forecasting model.
Fig. 6. Time series data of US biofuel monthly production. Source: US Energy Information Administration (EIA) (http://www.eia.doe.gov/).
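As an illustration of the experimental design described above (an 80/20 split, an estimation window of length 10, and recursive multi-step forecasting), the sketch below wires these choices to a small NumPy ELM used as a stand-in predictor. The function names, the sigmoid hidden layer and the pseudo-inverse solution follow the spirit of Section 2.4.2 but are assumptions, not the authors' MATLAB code; the data file name is hypothetical.

```python
import numpy as np

def make_lagged(series, window=10):
    """Build (X, y) pairs from a 1-D series using a lag window of length 10."""
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y

def elm_fit(X, y, hidden=25, seed=0):
    """Closed-form ELM training (cf. Eqs. (9)-(13)): random hidden layer,
    sigmoid activation, output weights via the Moore-Penrose pseudo-inverse."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1, 1, (X.shape[1], hidden))
    b = rng.uniform(-1, 1, hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    beta = np.linalg.pinv(H) @ y
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

def recursive_forecast(history, model, window=10, horizon=6):
    """Recursive multi-step forecasting: each prediction is fed back as an input."""
    W, b, beta = model
    buf = list(history[-window:])
    preds = []
    for _ in range(horizon):
        x = np.array(buf[-window:])[None, :]
        yhat = float(elm_predict(x, W, b, beta))
        preds.append(yhat)
        buf.append(yhat)
    return preds

# Usage sketch: first 80% of the series for training, the rest for evaluation.
# series = np.loadtxt("us_biofuel_monthly.csv")   # hypothetical file
# split = int(0.8 * len(series))
# X_train, y_train = make_lagged(series[:split], window=10)
# model = elm_fit(X_train, y_train, hidden=25)
# print(recursive_forecast(series[:split], model, horizon=6))
```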
3.1.2. Evaluation criteria

We measure the level forecasting accuracy using two popular criteria, the mean absolute percent error (MAPE) and the root mean squared error (RMSE):

$$\mathrm{MAPE} = \frac{1}{M}\sum_{t=1}^{M}\left|\frac{x_t - \hat{x}_t}{x_t}\right|, \qquad (15)$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{M}\sum_{t=1}^{M}(\hat{x}_t - x_t)^2}, \qquad (16)$$

where $\hat{x}_t$ and $x_t$ ($t = 1, 2, \ldots, M$) are the prediction and real values at time $t$ respectively, and $M$ is the size of the test set. In addition, we also consider the directional forecasting accuracy, in terms of a directional statistic ($D_{stat}$; see Yu et al., 2016), represented by

$$D_{stat} = \frac{1}{M}\sum_{t=1}^{M} a_t, \qquad (17)$$
where $a_t = 1$ if $(x_{t+1} - x_t)(\hat{x}_{t+1} - x_t) \geq 0$ and $a_t = 0$ otherwise.

Finally, we show the statistical superiority of the proposed hybrid ensemble forecasting model by implementing the Diebold–Mariano (DM) test. The DM test investigates the null hypothesis of equal forecast accuracy against the alternative of different forecasting capabilities between the target model A and its benchmark model B. Using the mean square error loss function, the DM statistic can be written as

$$S = \frac{g}{(\hat{V}_g/M)^{1/2}}, \qquad (18)$$

where $g = \frac{1}{M}\sum_{t=1}^{M}\left[(x_t - \hat{x}_{A,t})^2 - (x_t - \hat{x}_{B,t})^2\right]$ and $\hat{V}_g = \gamma_0 + 2\sum_{l=1}^{\infty}\gamma_l$. Here, $\hat{x}_{A,t}$ and $\hat{x}_{B,t}$ represent the values predicted for $x_t$ by the tested model A and the benchmark model B, respectively, at time $t$.

3.1.3. Benchmark models

We verify the superiority of the proposed hybrid ensemble forecasting methodology for monthly biofuel production prediction by introducing some other forecasting models as benchmarks for comparison purposes. These models include five single models without the decomposition–reconstruction process, namely ARIMA, MS, SVR, ELM and LSTM, and five ensemble models with decomposition–reconstruction (EMD-R), namely EMD-R-ARIMA, EMD-R-MS, EMD-R-SVR, EMD-R-ELM and EMD-R-LSTM. Our reasons for selecting these particular benchmark models are that ARIMA is the classical traditional econometric model, while SVR, ELM and LSTM, as typical AI methods, might be the most competitive nonlinear forecasting models in the area of energy forecasting. Many empirical investigations have demonstrated their superiority over traditional linear models. For the sake of a fair comparison with the proposed hybrid ensemble forecasting model, the five ensemble forecasting models with decomposition–reconstruction are built on the same competitive learning approaches. Note that all methods in this study are run in MATLAB R2018a on a computer with a 3.5 GHz CPU and 8 GB RAM.

3.2. Empirical results

In this subsection, we first present an illustrative computational process of the proposed hybrid ensemble forecasting methodology. Then some real-world biofuel production forecasting experiments are conducted, and some interesting results are presented by comparing these results with those of some benchmark forecasting methods.
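Before turning to the results, here is a hedged NumPy sketch of the evaluation criteria in Eqs. (15)–(18). The DM variance below uses only the lag-0 autocovariance of the loss differential (a simplification of the $\gamma_l$ sum above), so it is an illustrative approximation rather than the exact statistic used in the paper.

```python
import numpy as np

def mape(actual, pred):
    """Mean absolute percent error, Eq. (15)."""
    actual, pred = np.asarray(actual, float), np.asarray(pred, float)
    return np.mean(np.abs((actual - pred) / actual))

def rmse(actual, pred):
    """Root mean squared error, Eq. (16)."""
    actual, pred = np.asarray(actual, float), np.asarray(pred, float)
    return np.sqrt(np.mean((pred - actual) ** 2))

def dstat(actual, pred):
    """Directional statistic, Eq. (17): share of correctly predicted directions."""
    actual, pred = np.asarray(actual, float), np.asarray(pred, float)
    return np.mean((actual[1:] - actual[:-1]) * (pred[1:] - actual[:-1]) >= 0)

def dm_stat(actual, pred_a, pred_b):
    """Diebold-Mariano statistic, Eq. (18), with a simplified variance estimate
    (lag-0 autocovariance only); negative values favour model A over model B."""
    actual = np.asarray(actual, float)
    d = (actual - np.asarray(pred_a, float)) ** 2 - (actual - np.asarray(pred_b, float)) ** 2
    g = d.mean()
    v = d.var(ddof=0)          # gamma_0 only (assumption)
    return g / np.sqrt(v / len(d))
```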
3.2.1. Computational process of the hybrid ensemble forecasting methodology

Section 2 showed that the proposed hybrid ensemble forecasting methodology consists of four steps: data decomposition, component reconstruction, individual prediction and ensemble prediction. In the computational
experiment, the first step of the proposed hybrid ensemble methodology is to decompose the original series data of monthly biofuel production via EMD, with the decomposition results presented in Fig. 7. The figure shows that the original monthly biofuel production time series data are decomposed into a total of eight components: seven IMFs and one residue. In the second step of the proposed hybrid ensemble forecasting methodology, a similarity test is performed between the first few IMFs and the later IMFs by means of a T-test. This experiment finally determines the highfrequency component to consist of the first, second, and third IMFs. Accordingly, the low-frequency component consists of the 4–7 IMFs. The residue is regarded as the tendency component. Fig. 8 shows the results of the reconstruction by means of the FTC method. The third step of the proposed hybrid ensemble forecasting methodology uses the three reconstructed components for individual prediction. For the high-frequency component with high irregularity, LSTM is selected as the main forecasting method. For the low-frequency and tendency components, the ELM algorithm is selected as the main forecasting tool. In the final step, we conduct ensemble prediction. In this experiment, the simple addition (ADD) method is used to fuse the individual prediction results of the three different reconstructed components into the final ensemble prediction. These four steps enable the proposed hybrid ensemble forecasting methodology to be used for biofuel production prediction. Accordingly, we next report the computational results. 3.2.2. Performance comparison of single forecasting methods This section employs five single benchmarking models, ARIMA, MS, SVR, ELM and LSTM, for predicting the biofuel monthly production data of USA. The parameters of ARIMA (p − d − q) are estimated by minimizing the Schwarz criterion (SC). In SVR, we select the Gaussian RBF kernel function, and set the regularization and kernel parameters via the grid search method (Tang et al., 2015). In the ELM model, we adopt a standard ELM(10-25-1) model in which a sigmoid function is used as the transfer function in the hidden layer, and the lag order of l0 is determined by partial autocorrelation (PAC) analysis. By trial and error, the number of hidden neurons is set to 25. Each ELM model is trained iteratively 1000 times. For LSTM, we use MATLAB’s toolbox to build the model, which consists of one input layer, one LSTM layer with memory blocks and one output layer. The input layer has a dimensionality of 10 features, and the sigmoid function is chosen as an activation function. The number of hidden units is set to 200 by trial and error, and the sigmoid function is selected as the activation function. The minibatch gradient descent was used with a fixed learning rate of 0.001. For the gradient descent method, we trained batches of 128 sequences and 200 epochs. For the Markov switching model, the lag order is 10 and the number of states is 2. We used the second-order partial derivative to calculate the standard error of the fitted coefficients. Figs. 9–11 report the predictive performances of the five
single models based on these parameter settings in terms of MAPE, RMSE and Dstat, respectively. One important conclusion can be drawn immediately from these comparison results, namely that the proposed hybrid ensemble forecasting, i.e., EMD-R-(LSTM+ELM)ADD, has a better predictive performance than any of the single models at different prediction horizons, and performs steadily, which proves its robustness. In terms of the level accuracy, measured by the MAPE (Fig. 9) and RMSE (Fig. 10), it is obvious that the proposed hybrid ensemble forecasting methodology obtains the best predictive performance, followed by the LSTM and SVR. Of the five single models, LSTM and SVR perform better than ELM, MS and ARIMA at all prediction horizons. The main reason for this is that the LSTM is a typical deep learning algorithm and the SVR is a global optimum algorithm. For ELM and ARIMA, the results are mixed. That is, ELM is better than ARIMA when the prediction horizon is short, such as 1 or 2, while ARIMA is better than ELM when the prediction horizon is long, such as 3 or 6. One possible reason for this result is that ELM may tend to fall into a local minimum when the prediction horizon is slightly longer. In regard to the directional accuracy, measured by Dstat, it can be seen that the proposed hybrid ensemble forecasting method is still best, as Fig. 11 shows. Of the five single models, LSTM has the best directional accuracy, followed by ELM and ARIMA, while SVR performs the worst when the horizon is short. The main reason for this is that LSTM is good at processing the information contained in high-frequency data, whereas SVR pays more attention to the global optimum in data processing and less attention to the direction accuracy. When the prediction horizon is 3 or 6, the performance of MS is the worst among the five single models, as it is unable to capture the hidden nonlinear features of the biofuel production series at the longer horizons. 3.2.3. Performance comparison of ensemble forecasting methods For further comparison, we also consider predictions from five decomposition–reconstruction–ensemble forecasting models using the single algorithms above as individual forecasting tools. For the sake of consistency, the parameter specification of each individual forecasting tool is determined in the same way as when considered as a single benchmark. Finally, the five ensemble forecasting models are compared with each other in terms of MAPE, RMSE and Dstat, with the results presented in Figs. 12–14, respectively. In terms of level prediction accuracy (measured by MAPE in Fig. 12 and RMSE in Fig. 13), we find two main conclusions. On the one hand, the proposed hybrid ensemble forecasting methodology, EMD-R-(LSTM+ELM), performs the best of the five ensemble forecasting models, followed by EMD-R-SVR, while the EMD-R-ARIMA model performs worst at all prediction horizons. One possible reason for this could be that LSTM is a superior algorithm for processing information in the high-frequency component and SVR is a global optimum algorithm that is good at nonlinear data processing, while the linear
Fig. 7. Data decomposition results of biofuel production via EMD.
Fig. 8. Data reconstruction results of biofuel production via FTC.
ARIMA model is unsuitable for complex nonlinear biofuel production forecasting. On the other hand, the results for the EMD-R-ELM and EMD-R-LSTM models are mixed. That is, the predictive performance of the EMD-R-ELM is better than that of EMD-R-LSTM for short prediction horizons, such as 1 or 2, whereas EMD-R-LSTM performs better than the EMD-R-ELM model for longer prediction horizons, such as 3 or 6. The main reason for this may be that ELM can tend to fall into local minima when the prediction horizon is long. These two findings demonstrate that the AI-based methods are more suitable for complex nonlinear time series forecasting than the linear ARIMA model. Focusing now on directional accuracy, three similar conclusions can be obtained. First of all, the proposed hybrid ensemble forecasting method achieves the best directional accuracy due to the good generalization capability of hybridizing LSTM and ELM. Second, of the five benchmarking ensemble models, the directional accuracies of EMD-R-LSTM and EMD-R-ELM are better than those of EMD-R-SVR, EMD-R-MS and EMD-R-ARIMA. This
may be because LSTM and ELM are better at processing the information in high frequency data, while SVR pays more attention to the global optimum in data processing and less attention to the directional accuracy. Finally, EMD-R-ARIMA performs worst at all prediction horizons, because the traditional linear method is unsuitable for complex nonlinear time series forecasting. 3.2.4. Statistical testing of different forecasting models Sections 3.2.2 and 3.2.3 compare the predictive performances of five individual forecasting models and five ensemble forecasting models, in addition to the proposed ensemble model, but do not determine whether there are significant differences among them. To achieve this, it is necessary to statistically test the predictive performances of different models, and we do this using the DM test. The results are reported in Tables 1–4. As these tables show, in general the conclusions in Sections 3.2.2 and 3.2.3 can be confirmed statistically. Six main conclusions can be drawn from these tables.
Fig. 9. Performance comparison of single models in terms of MAPE.
Fig. 10. Performance comparison of single models in terms of RMSE.
First, the proposed hybrid ensemble forecasting methodology has an obvious advantage over the five benchmark ensemble forecasting models (EMD-R-LSTM, EMD-R-ELM, EMD-R-SVR, EMD-R-MS and EMD-R-ARIMA) at the 10% significance level for prediction horizons of 1, 2 and 6, indicating the effectiveness of the proposed hybrid ensemble forecasting methodology. However, when the prediction horizon is 3, there is no significant difference in predictive performance between the proposed hybrid ensemble forecasting methodology and three of the ensemble forecasting models (i.e., EMD-R-LSTM, EMD-R-ELM, and EMD-R-SVR). That is, while the proposed EMD-R-(LSTM+ELM) performs better than the other three ensemble forecasting models for the three-step-ahead prediction, the difference is not significant.
Second, the advantage of the proposed hybrid ensemble forecasting method over the single models at the 10% significance level is obvious, except for SVR and LSTM. Thus, although the predictive performance of the proposed hybrid ensemble forecasting methodology is better than those of the LSTM and SVR models in the empirical analysis, there is no significant difference between the proposed hybrid ensemble forecasts and those of the two single forecasting models. Third, a comparison of the five benchmark ensemble forecasting models, i.e., EMD-R-LSTM, EMD-R-ELM, EMD-R-SVR, EMD-R-MS and EMD-R-ARIMA, shows that the former four are superior to the last one at the 5% significance level for prediction horizons of 1 and 6. However, there are no significant differences among these
Fig. 11. Performance comparison of single models in terms of Dstat.
Fig. 12. Performance comparison of ensemble models in terms of MAPE.
five ensemble forecasts when the two-step-ahead predictions are considered. Finally, for a prediction horizon of 3, the first three ensemble forecasting models (i.e., EMD-RLSTM, EMD-R-ELM and EMD-R-SVR) perform better than the EMD-R-MS model, while there is no significant difference between the first three models and EMD-R-ARIMA. The reasons behind these strange phenomena are unknown, and would be worth exploring further in the future. Fourth, a comparison of the five benchmark ensemble forecasting models and the five single forecasting models shows that the five ensemble forecasting models are superior to the single ARIMA model at the 10% significance level for prediction horizons of 1 and 6. Similarly, the five ensemble forecasting models perform better than the
single ELM model at the 10% significance level when considering six-step-ahead prediction. Apart from these two exceptions, there are no significant differences between the five ensemble forecasting models and the other four single forecasting models (i.e., LSTM, ELM, MS and SVR). This result requires us to explore more competitive forecasting methodologies. Interestingly, the performances of EMD-R-MS for the one-, two- and three-step-ahead prediction horizons are better than those of four of the single models (LSTM, ELM, SVR and ARIMA) at the 10% significance level, which indicates the predictive ability of the EMD-R-MS model relative to some single forecasting models. Fifth, a comparison of the single MS model with the other four single models (i.e., LSTM, ELM, SVR and ARIMA)
Fig. 13. Performance comparison of ensemble models in terms of RMSE.
Fig. 14. Performance comparison of ensemble models in terms of Dstat.
shows that the MS model performs better than the other single models at the 10% significance level for the one-, two-, and six-step-ahead prediction horizons, implying that the MS model is a promising forecasting model relative to other single models. Finally, a comparison of the single LSTM model with three of the other single models (i.e., ELM, SVR and ARIMA) indicates that the LSTM model has an obvious advantage over the ELM and ARIMA models for one-, two-, three- and six-step-ahead prediction at the 10% significance level, revealing the effectiveness of the LSTM. However, there is no significant difference between LSTM and SVR, because the SVR is a global optimum algorithm and is good at processing the nonlinear time series forecasting problem.
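For readers who wish to reproduce pairwise comparisons of the kind reported in Tables 1–4, the short sketch below shows how a matrix of DM statistics could be assembled from stored test-set forecasts, reusing the illustrative dm_stat helper sketched after Section 3.1. The dictionary of forecasts and its model names are hypothetical.

```python
# Hypothetical layout: `forecasts` maps model names to test-set predictions,
# `actual` holds the corresponding observed values.
def dm_matrix(actual, forecasts, dm_stat):
    """Pairwise DM statistics: entry (target, benchmark) compares two models."""
    names = list(forecasts)
    table = {}
    for a in names:
        for b in names:
            if a != b:
                table[(a, b)] = dm_stat(actual, forecasts[a], forecasts[b])
    return table

# Example with illustrative names only:
# stats = dm_matrix(actual, {"EMD-R-(LSTM+ELM)": p1, "EMD-R-LSTM": p2, "ARIMA": p3}, dm_stat)
```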
3.2.5. Further discussions about reconstruction and computational efficiency The previous subsections have compared the predictive performances of different single and ensemble forecasting models and obtained some interesting results. However, the empirical analysis thus far has not presented the reconstruction function or computational efficiency details. We investigated the function of the reconstruction and computational efficiency of the proposed hybrid ensemble forecasting methodology by conducting two additional experiments that included EMD-LSTM-ADD and EMD-ELM-ADD without the reconstruction process. That is, after the original biofuel production time series have been decomposed into seven IMFs and one residue, the LSTM and ELM models are used to conduct the prediction experiments directly based on the eight decomposed modes without the reconstruction
Table 1
DM test results for one-step-ahead prediction. Each cell reports the Diebold–Mariano statistic, with its p-value in parentheses, for a target model (rows) tested against a benchmark model (columns); the models compared are EMD-R-(LSTM+ELM), EMD-R-LSTM, EMD-R-ELM, EMD-R-SVR, EMD-R-ARIMA, EMD-R-MS, MS, LSTM, ELM, SVR and ARIMA.
Table 2
DM test results for two-step-ahead prediction. Each cell reports the Diebold–Mariano statistic, with its p-value in parentheses, for a target model (rows) tested against a benchmark model (columns); the models compared are EMD-R-(LSTM+ELM), EMD-R-LSTM, EMD-R-ELM, EMD-R-SVR, EMD-R-ARIMA, EMD-R-MS, MS, LSTM, ELM, SVR and ARIMA.
This subsection focuses on the computational efficiency of the models and the role of the reconstruction process. Table 5 reports the computational times of the three different ensemble models. As can be seen from Table 5, the computational time of EMD-LSTM-ADD is the longest, followed by EMD-R-(LSTM+ELM)-ADD, while EMD-ELM-ADD runs the fastest. The main reason is that LSTM, as a deep learning method, requires a long running time because of its complex parameter adjustment, whereas ELM is a fast and efficient algorithm. However, because irregularities are hidden in the high-frequency component, LSTM is selected as the prediction tool for the high-frequency component. At the same time, the reconstruction method reduces the eight modes to three main components in order to improve the computational efficiency. Thus, EMD-R-(LSTM+ELM)-ADD is designed to balance predictive performance and computational efficiency. Hybridizing LSTM and ELM means that the proposed hybrid ensemble forecasting methodology with the reconstruction process improves the prediction performance relative to the ensemble forecasting models without reconstruction (i.e., EMD-LSTM-ADD and EMD-ELM-ADD), even though its computational efficiency is limited by LSTM.
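The fine-to-coarse grouping that reduces the eight modes to three components can be sketched as follows. The variant below accumulates the IMFs from the highest frequency downwards and uses a one-sample t-test to find the first partial sum whose mean departs significantly from zero; the modes before that point form the high-frequency component, the remaining IMFs form the low-frequency component, and the residue is kept as the trend. The t-test and the 5% significance level are assumptions, since the paper does not report the exact threshold it uses.

```python
import numpy as np
from scipy import stats

def fine_to_coarse(modes, alpha=0.05):
    """Group EMD modes into high-frequency, low-frequency and trend components.

    modes : array of shape (n_modes, T); IMF1..IMFn plus the residue as the last row.
    Returns the three reconstructed components, each of length T.
    """
    imfs, residue = modes[:-1], modes[-1]
    change_point = len(imfs)                       # default: everything is high frequency
    partial = np.zeros_like(imfs[0])
    for i, imf in enumerate(imfs):                 # accumulate from fine to coarse
        partial = partial + imf
        _, p = stats.ttest_1samp(partial, 0.0)     # does the running sum leave a zero mean?
        if p < alpha:
            change_point = i
            break
    high = imfs[:change_point].sum(axis=0) if change_point > 0 else np.zeros_like(residue)
    low = imfs[change_point:].sum(axis=0) if change_point < len(imfs) else np.zeros_like(residue)
    return high, low, residue

# Hypothetical usage with the modes from the earlier decomposition sketch:
# high_freq, low_freq, trend = fine_to_coarse(modes)
# LSTM would then be fitted to high_freq, and ELM to low_freq and trend.
```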
Table 3
DM test results for three-step-ahead prediction. Each cell reports the Diebold–Mariano statistic, with its p-value in parentheses, for a target model (rows) tested against a benchmark model (columns); the models compared are EMD-R-(LSTM+ELM), EMD-R-LSTM, EMD-R-ELM, EMD-R-SVR, EMD-R-ARIMA, EMD-R-MS, MS, LSTM, ELM, SVR and ARIMA.
Table 4
DM test results for six-step-ahead prediction. Each cell reports the Diebold–Mariano statistic, with its p-value in parentheses, for a target model (rows) tested against a benchmark model (columns); the models compared are EMD-R-(LSTM+ELM), EMD-R-LSTM, EMD-R-ELM, EMD-R-SVR, EMD-R-ARIMA, EMD-R-MS, MS, LSTM, ELM, SVR and ARIMA.
In addition, we conducted two kinds of experiments to compare the performances of different prediction methods after the decomposition and reconstruction of the original time series data. First, we examine the superiority of LSTM for forecasting the high-frequency component: for comparison, the ARIMA, SVR and ELM models are used to predict the high-frequency component, while ELM remains the method for the low-frequency and trend components. Second, we examine the superiority of ELM for predicting the low-frequency and trend components: similarly, ARIMA, SVR and LSTM are used to predict these components, while LSTM remains the method for the high-frequency component. Thus, we conduct seven computational experiments, the results of which are shown in Tables 6–9 for four different prediction horizons. As can be seen from Tables 6–9, we obtain three main findings. First of all, when all three components are predicted by a single algorithm, the predictive performance of EMD-R-ELM is superior to that of EMD-R-LSTM, which may be because LSTM is suited to forecasting the high-frequency component rather than the low-frequency and trend components.
Table 5
Computational efficiency comparison of three different ensemble models.

Model                    Computational time (s)
EMD-LSTM-ADD             13,718.271
EMD-R-(LSTM+ELM)-ADD     4,396.882
EMD-ELM-ADD              12.021
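The speed gap in Table 5 reflects how the two learners are trained: the LSTM is fitted by iterative gradient descent over many epochs, whereas the ELM draws its hidden-layer weights at random and obtains the output weights in a single least-squares step. A minimal single-hidden-layer ELM sketch is given below; the sigmoid activation, the 64 hidden nodes and the ridge term (used in place of the Moore–Penrose pseudo-inverse) are illustrative assumptions rather than the settings used in the paper.

```python
import numpy as np

class ELM:
    """Minimal extreme learning machine: random hidden layer, analytic output weights."""

    def __init__(self, n_hidden=64, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))     # sigmoid activations

    def fit(self, X, y, ridge=1e-6):
        n_features = X.shape[1]
        self.W = self.rng.normal(size=(n_features, self.n_hidden))   # random, never retrained
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Regularised least squares for the output weights (one linear solve, no iterations)
        self.beta = np.linalg.solve(H.T @ H + ridge * np.eye(self.n_hidden), H.T @ y)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta

# Hypothetical usage on lagged values of the low-frequency component:
# elm = ELM().fit(X_train, y_train); y_hat = elm.predict(X_test)
```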
Second, when SVR and ARIMA are used as generic forecasting algorithms in the hybrid ensemble forecasting methods, the predictive performances of EMD-R-(SVR+ELM) and EMD-R-(LSTM+SVR) are better than those of EMD-R-(ARIMA+ELM) and EMD-R-(LSTM+ARIMA).
Table 6
Performance comparisons of different frequencies for one-step-ahead prediction.

Model                  Dstat    MAPE     RMSE
EMD-R-(LSTM+ELM)       0.8315   0.0356   8.1819
EMD-R-ELM              0.8090   0.0479   11.6519
EMD-R-LSTM             0.7978   0.0487   11.8665
EMD-R-(ARIMA+ELM)      0.6966   0.0589   13.2342
EMD-R-(SVR+ELM)        0.7978   0.0363   8.2146
EMD-R-(LSTM+ARIMA)     0.6517   0.0604   13.2265
EMD-R-(LSTM+SVR)       0.7303   0.0403   8.3910
Table 7
Performance comparisons of different frequencies for two-step-ahead prediction.

Model                  Dstat    MAPE     RMSE
EMD-R-(LSTM+ELM)       0.7865   0.0369   8.2601
EMD-R-ELM              0.8539   0.0440   11.1496
EMD-R-LSTM             0.7416   0.052    13.2459
EMD-R-(ARIMA+ELM)      0.7078   0.0507   11.4635
EMD-R-(SVR+ELM)        0.7415   0.0454   0.0480
EMD-R-(LSTM+ARIMA)     0.6179   0.0857   18.5716
EMD-R-(LSTM+SVR)       0.6853   0.0517   11.1698
Table 8
Performance comparisons of different frequencies for three-step-ahead prediction.

Model                  Dstat    MAPE     RMSE
EMD-R-(LSTM+ELM)       0.7978   0.0414   9.2627
EMD-R-ELM              0.7753   0.0494   12.0638
EMD-R-LSTM             0.8090   0.0455   11.9844
EMD-R-(ARIMA+ELM)      0.7415   0.0541   11.8796
EMD-R-(SVR+ELM)        0.7191   0.0458   10.0823
EMD-R-(LSTM+ARIMA)     0.6179   0.0843   18.3860
EMD-R-(LSTM+SVR)       0.6853   0.0512   10.9281
Table 9
Performance comparisons of different frequencies for six-step-ahead prediction.

Model                  Dstat    MAPE     RMSE
EMD-R-(LSTM+ELM)       0.8401   0.0405   9.1888
EMD-R-ELM              0.7303   0.0565   13.8686
EMD-R-LSTM             0.7528   0.0506   12.287
EMD-R-(ARIMA+ELM)      0.6741   0.0582   12.5120
EMD-R-(SVR+ELM)        0.6966   0.0480   10.0823
EMD-R-(LSTM+ARIMA)     0.6292   0.0854   18.6061
EMD-R-(LSTM+SVR)       0.7078   0.0536   10.8087
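For reference, the three evaluation criteria reported in Tables 6–9 can be computed as in the sketch below. The definition of Dstat as the share of correctly predicted directional changes follows the usual convention for this statistic and is assumed to match the paper's; MAPE and RMSE are standard.

```python
import numpy as np

def rmse(actual, forecast):
    return float(np.sqrt(np.mean((np.asarray(actual, float) - np.asarray(forecast, float)) ** 2)))

def mape(actual, forecast):
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return float(np.mean(np.abs((actual - forecast) / actual)))

def dstat(actual, forecast):
    """Directional accuracy: fraction of periods whose predicted change has the right sign."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    true_move = np.diff(actual)
    pred_move = forecast[1:] - actual[:-1]          # predicted change relative to the last observation
    return float(np.mean(np.sign(true_move) == np.sign(pred_move)))

# Hypothetical usage with a test series y_test and its forecasts y_hat:
# print(rmse(y_test, y_hat), mape(y_test, y_hat), dstat(y_test, y_hat))
```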
This means that AI-based forecasting methods have some advantages over traditional linear forecasting methods. Finally, the predictive performance of the hybrid ensemble forecasting methodology proposed in this paper, i.e., EMD-R-(LSTM+ELM), is better than that of the other six ensemble forecasting methods listed in Tables 6–9 in terms of both the level prediction accuracy, measured by MAPE and RMSE, and the directional accuracy, measured by Dstat. This reveals that the proposed EMD-R-(LSTM+ELM) model can serve as a competitive hybrid ensemble forecasting methodology for biofuel production prediction.

4. Conclusions

Given the close relationship between agriculture and biofuel produced by crop fermentation, this paper proposes a hybrid ensemble forecasting methodology for predicting monthly biofuel production, with the aim of supporting the coordinated development of agriculture and the biofuel industry. For illustration and verification purposes, we use US biofuel production data for computation and testing. The empirical results show that the proposed hybrid ensemble forecasting methodology integrating EMD, LSTM, ELM and ADD performs the best among the forecasting models considered for biofuel production forecasting. In all testing cases, it has the lowest RMSE and MAPE and the highest Dstat, indicating that the proposed hybrid ensemble forecasting methodology can be used as
a very promising forecasting tool for predicting biofuel production. Beyond biofuel production, the proposed hybrid ensemble forecasting methodology could also be extended to other real-world problems in order to test its generalization ability more thoroughly. Meanwhile, additional factors that affect the biofuel market could be taken into consideration as input features of the LSTM, which may further enhance the forecasting performance. In addition, when the amount of data is relatively small (e.g., fewer than 100 observations), the LSTM cannot be trained effectively, and the proposed hybrid ensemble forecasting model might fail. We will look into these issues in the near future.

Acknowledgments

This work is partially supported by grants from the Key Program of the National Natural Science Foundation of China (NSFC nos. 71433001 and 71631005), the National Program for Support of Top-Notch Young Professionals, and the Beijing Advanced Innovation Center for Soft Matter Science and Engineering.