Accepted Manuscript
Hidden Markov model for municipal waste generation forecasting under uncertainties P. Jiang , X. Liu PII: DOI: Reference:
S0377-2217(15)00842-5 10.1016/j.ejor.2015.09.018 EOR 13237
To appear in:
European Journal of Operational Research
Received date: Revised date: Accepted date:
22 September 2014 7 August 2015 9 September 2015
Please cite this article as: P. Jiang , X. Liu , Hidden Markov model for municipal waste generation forecasting under uncertainties, European Journal of Operational Research (2015), doi: 10.1016/j.ejor.2015.09.018
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
ACCEPTED MANUSCRIPT Highlights
A model-driven learning method is proposed to trace fluctuations dynamically.
The fluctuation forecasting under uncertainties is made possible using the HMM.
Case studies of different scenarios are presented to show the validity of the HMM. Two forecasting formulas are proposed to cope with two scenarios separately.
AC
CE
PT
ED
M
AN US
CR IP T
ACCEPTED MANUSCRIPT
Hidden Markov model for municipal waste generation forecasting under uncertainties
CR IP T
P. Jiang, X. Liu*
Corresponding author: Professor Xiao LIU
AN US
Department of Industry Engineering, Shanghai Jiao Tong University, P. R. China
Department of Industrial Engineering management, Shanghai Jiao Tong University, 800 Dongchuan Road, Min-Hang District, Shanghai 200240, P. R. China
M
Email:
[email protected]
AC
CE
PT
Fax: +65-6601-4089
ED
Phone: +65-6601-4089
ACCEPTED MANUSCRIPT Abstract: Waste generation forecasting is a complex process that is found to be influenced by some latent influencing parameters and their uncertainties, such as economic growth, demography, individual behaviors, activities and events, and management policies. These hidden features play an important role in forecasting the fluctuations of waste generation. We therefore focus on revealing the trend of waste generation in megacities which face significant influences of social and economic changes to achieve urban sustainable development. To dynamically trace fluctuations caused by these uncertainties, we propose a probability
CR IP T
model-driven statistical learning approach which hybridizes a wavelet de-noising, a Gaussian mixture model, and a hidden Markov model. First, to gain the actual underlying trend, wavelet de-noising is used to eliminate the noise of data. Next, the Expectation-Maximization and the Viterbi algorithms are employed for learning parameters and discerning the most probable sequence of hidden states, respectively. Subsequently, the state transition matrix is
AN US
updated by fractional predictable changes of influencing parameters to perform non-periodic fluctuation problem forecasting, and the forward algorithm is utilized to search the most similar data pattern for the current pattern from historical data in order to forecast the future trend of the periodic fluctuation problem. Finally, we apply the approaches into two kinds of case studies that test both a small dataset and a large dataset. How uncertainty factors
M
influence forecasted results is analyzed in the subsection of results and discussion. The computational results demonstrate that the proposed approaches are effective in solving the
PT
ED
municipal waste generation forecasting problem.
Keywords: Forecasting, Waste generation, Wavelet de-noising, Hidden Markov model,
AC
CE
Gaussian mixture model
ACCEPTED MANUSCRIPT 1. Introduction Accurate forecasting of municipal waste generation can provide theoretical guidance in disposal capacity design and capital budget planning for waste management systems (Beigl et al., 2008; Cherian and Jacob, 2012; Rimaityte et al., 2012; Intharathirat et al., 2015). Waste generation forecasting in megacities is a complex process that is found to be influenced by some latent influencing factors and their uncertainties, such as rapid economic growth or
CR IP T
economic crisis, relentless demographic forces, ever-changing individual behaviors, special activities and events, and complex waste management measures (Pires et al., 2011; Cherian and Jacob, 2012; Denafas et al., 2014; Ghiani et al., 2014). For example, the annual waste generation trend in a megacity could be altered by large-scale activities (such as the Olympic Games and the World EXPO), long-term epidemic diseases (such as SARS), or a new
AN US
regulation policy for municipal waste. The fluctuation of waste generation comes from unmeasured uncertainties of the economy, individual behaviors, population mobility, and unexpected disasters. The latent factors and their uncertainties are difficult to quantify, and they play an important role in tracing the fluctuation of waste generation dynamically and forecasting the consequences of changes. The fluctuation and nonlinearity of both municipal
M
solid waste (MSW) and wastewater generation is a huge obstacle for accurate forecasting. It is difficult to forecast fluctuation of waste generation accurately via traditional statistical models,
ED
such as the moving average and linear regression models. In this study, we focus on fluctuation forecasting problem. Three major contributions are made in this paper: 1) A probability model-driven statistical learning approach, that hybridizes
PT
a wavelet de-noising (WDE), a Gaussian mixture model (GMM), and a hidden Markov model (HMM), is proposed to trace waste generation fluctuations under uncertainties dynamically; 2)
CE
The way of weighted average Gaussian mixture and the way of searching the most similar pattern are proposed as two concrete HMM-based forecasting ways to cope with the periodic
AC
and non-periodic fluctuation problems, respectively; and 3) Two kinds of case studies of different scenarios are presented in this study. They correspond to a small sample and a large sample datasets to show the effectiveness of the proposed approaches. The first case uses annual MSW generation in Shanghai and Beijing from 1978 to 2010, and the second case utilizes the daily wastewater level of a Spanish urban sewage treatment plant from 1990 to 1991. The paper is organized as follows: Section 2 summarizes the details of related studies. Problem description and forecasting modelling are presented in Section 3. Section 4 provides
ACCEPTED MANUSCRIPT two kinds of case studies to test the effectiveness of the proposed approaches. Finally, we close this paper with some conclusions and guidelines for future work. 2. Related works: a summary So far, a number of forecasting techniques have been applied to waste generation forecasting. Mainstream methods include traditional statistical models, the gray and fuzzy models, simulation models, and non-probabilistic statistical learning models. In this section, we
CR IP T
summarize the application scope, advantages, and weaknesses of these mainstream methods. Subsequently, literature of probability model-driven statistical learning approaches as a comparison is introduced briefly.
Traditional statistical models include linear regression models (LRM), autoregressive moving average (ARMA), autoregressive integrated moving average (ARIMA), seasonal
AN US
ARIMA (sARIMA), exponential smoothing (ES), and seasonal ES (sES). LRM aims to discover a linear function to quantify waste generation. Essentially, the problem of waste generation is nonlinear, so LRM is just an approximate fitting method for waste generation forecasting. The ARMA, a mixture of autoregressive and moving average models, can be employed to forecast waste generation of stationary time series (Katsamaki et al., 1998).
M
ARIMA stochastic models are effective time series forecasting tools which can take influence factors into consideration, and can also process periodic time series data effectively. ARIMA
ED
model has been applied to a range of forecasting fields, such as portfolio optimization (Ustun and Kasimbeyli, 2012), and MSW generation (Owusu-Sekyere, 2013). The ES model is
PT
performed by utilizing the weighted mean of all values in time series, which assigns exponentially decreasing weights over time. Both sARIMA and sES can take into account
CE
seasonal changes and trends (Xu et al., 2013; Song and He, 2014; Rimaityte et al., 2012; Denafas et al., 2014). Traditional statistical models can forecast the moving average or discover smooth trend effectively; however, fluctuations are hard to be traced accurately.
AC
The gray and fuzzy models are based on fuzzy set theory. The fuzzy mathematic method
can solve problems with uncertainty, and can also obtain reliable models given poor data. Chen and Chang (2000) applied a gray fuzzy dynamic prediction technique to forecast MSW generation when the sample numbers were low. Karavezyris et al. (2002) employed fuzzy logic to enhance confidence in the validity of the system dynamics model. Fuzzy logic is an attractive method; however, the knowledge base of the fuzzy model is founded on expert experience. Gray model is suitable to study the uncertainty of systems and handle small sample dataset (Intharathirat et al., 2015); however, the conventional gray prediction model is sensitive to
ACCEPTED MANUSCRIPT initial values (Guo et al., 2014). Simulation models are suitable for simulating complex systems which are difficult to express via mathematic formula. System dynamics (SD) is one of the promising of these methods for studying complex feedback systems. SD models have been widely used for predicting waste generation (Karavezyris et al., 2002; Dyson and Chang, 2005; Kollikkathara et al., 2010; Ahmad, 2012). SD models use a multivariate method and are far more complex due to the multifarious interactions between the selected parameters (Beigl et al., 2008). Therefore,
CR IP T
it is hard to achieve model validation.
Non-probabilistic statistical learning models for waste generation forecasting mainly include artificial neural networks (ANN) and support vector machines (SVM). ANN, one of the most widespread of intelligent statistical-learning models, has powerful ability to deal with nonlinear forecasting problems (Setzler et al., 2009) and has been applied to generation
AN US
forecasting fields (Jalili Ghazi Zade and Noori, 2008; Ali Abdoli, 2012; Antanasijevic et al., 2013). Although ANN is promising, the disadvantages, such as over-fitting and local minimum, limit its application. SVM is a nonlinear and kernel-based intelligent technique. Noori et al. (2009), Abbasi et al. (2013), and Abbasi et.al. (2014) proposed SVM models to implement weekly prediction of waste generation in the Iranian cities of Mashhad and Tehran, respectively.
M
Compared with ANN, SVM overcomes the problem of over-fitting training data and local minimum. However, SVM is still a black box technique which cannot establish a mathematical
ED
functional mapping between the input variables and the drifting parameters (Dong et al., 2009). Recently, probability model-driven statistical learning approaches have been effectively
PT
applied to perform forecasting under uncertainties. Compared with ANN and SVM, they are white box methods which focus more on the components of the mathematical model. For
CE
example, Bayesian network (BN) is a combination of graph theory and probability theory which has unique advantages for handling problems with uncertainty, and mining causality
AC
among variables. BN has been introduced into forecasting areas, such as traffic flow forecasting (Sun et al., 2006) and wastewater inflow monitoring (Cheon et al., 2008). The quality of results produced by the BN method largely depends on the amount of evidence (Dong et al., 2009); however, they are not available in most cases. Another example is HMM which became popular for classification and pattern recognition problems after its appearance. The HMM has been applied to a range of fields, such as stock market forecasting (Hassan, 2009), PM2.5 concentration prediction (Dong et al., 2009), failure prognosis (Zhou et al., 2010; Kim et al., 2011), and manpower planning (Guerry, 2011). However, to the best of our knowledge, no study has previously forecasted waste generation via an HMM.
ACCEPTED MANUSCRIPT Although the first four stream methods have respective merits for various applications, there are still some difficulties using these methods for measuring and tracing fluctuations of waste generation dynamically. The advantages of the HMM lie in its theoretical basis, rigorous mathematical structure, and its proven fitness for modelling dynamic systems under uncertainties. 3. Model
CR IP T
In this section, the study problems and their characteristics are first introduced. Then, models are constructed to forecast MSW and wastewater generation. 3.1. Problem description
Municipal waste generation in megacities is a complex process of the interaction of
AN US
intertwined systems, which is found to be influenced by latent influencing factors and their uncertainties. So waste generation has the characteristics of nonlinearity, fluctuation, and strong interference. The key challenges for waste generation forecasting problem of both MSW and wastewater lie in generation fluctuations which include non-periodic and periodic fluctuation. Annual MSW generation is a non-stationary time series. The increment rate,
M
transformed from MSW generation, shows an obvious non-periodic fluctuation (Fig. 1). Daily wastewater generation level is an approximate stationary time series with periodic fluctuation
AC
MSW Generation / tonne
CE
PT
MSW Increment Rate / %
ED
(Fig. 2).
Year
Fig. 1. MSW generation and the increment rate of generation in city X
Waste Generation / tonne
ACCEPTED MANUSCRIPT 109
107 105 103 101 99
97 95
1
2
3
4
5
6
7
8
9
Period
CR IP T
Fig. 2. Wastewater generation in city Y
Two techniques based on HMM are proposed to deal with above challenges of fluctuations:
1) For the periodic fluctuation problem, all data from each period are selected to set up data patterns. For instance, we will obtain 9 patterns as shown in Fig. 2. Once two similar
AN US
patterns are revealed (e.g., patterns 3 and 6, or patterns 4 and 7), we employ the similar pattern to forecast waste generation trend in the coming period.
2) For the non-periodic fluctuation problem, the variable to be forecasted and other observable factors can be integrated to constitute data patterns. Then, we can use the approach as above to forecast waste generation. However, if the number of searchable patterns is too
M
small, the error of forecasted results becomes unacceptable. In this situation, one alternative
ED
forecasting way (weighted average Gaussian mixture) is employed to solve this problem. 3.2. Waste generation forecasting modelling
PT
A probability model-driven statistical learning approach, which hybridizes a wavelet de-noising, a Gaussian mixture model, and a hidden Markov model, is constructed to dynamically trace fluctuations of waste generation. An HMM is a double embedded stochastic
CE
process which includes a Markov process and a general stochastic process. Fig. 3 shows a simple HMM component structure sketch with a hidden state sequence and a directly visible
AC
observation sequence.
q1
q2
...
qt
...
qT
Hidden state sequence
O1
O2
...
Ot
...
OT
Observation sequence
Fig. 3. Sketch of the structure of a simple hidden Markov model
ACCEPTED MANUSCRIPT In this study, we define the total uncertainty degree of waste generation as a hidden variable. The hidden state sequence is assumed to follow a discrete-time, finite-state, and first-order Markov chain which implies that the current state is dependent only on the previous state. We also regard the variable to be forecasted as an observation which is a continuous random variable. The observation is assumed to be conditionally independent of all others given the state that generated it. Uncertainties do not need to be measured directly; instead, we let historical data tell us what the total uncertainty degree is most likely to be
CR IP T
through data analysis.
Fig. 4 illustrates the process of HMM-based forecasting which includes three parts: Part I: data preprocessing, Part II: HMM modelling, and Part III: forecasting. There are two scenarios which follow different processes in Part III. Scenario I is for the non-periodic fluctuation problem with small sample sizes, and Scenario II is for the periodic fluctuation
AN US
problem with large sample sizes. In this paper, we present two kinds of case studies that represent the different scenarios. MSW generation forecasting for Shanghai and Beijing (Scenario I) is carried out via the way of weighted average Gaussian mixture (WAGM) based on an HMM, while wastewater generation forecasting for the Spain Manresa sewage
(SMSP) based on an HMM.
ED
PT
CE
Part II:Hidden Markov Model
Forecasting by WAGM Forecasting by SMSP
Data post-processing
Viterbi decoding
Training and parameter determination
Parameter initialization
Part I:Preprocessing
Scenario I
HMM modelling
Scenario II
Data preprocessing
Variable selection
Scenario I
M
treatment plant (Scenario II) is implemented via the way of searching the most similar pattern
Part III:Forecasting
Scenario II
AC
Fig. 4. The process of HMM-based forecasting for two different scenarios
3.2.1. Preprocessing - Part I This subsection mainly focuses on the preprocessing process for the whole model. This
involves two steps: variable selection and data preprocessing. Step1. Variable selection The total uncertainty of latent influencing factors (non-observable variables) is taken as an integrated hidden variable which is called the total uncertainty degree of waste generation.
ACCEPTED MANUSCRIPT The variable to be forecasted is regarded as a manifest variable (i.e., observation variable). In Fig. 3, hidden state sequence is composed of different total uncertainty degrees of waste generation, and observation sequence is consist of the actual values of the manifest variable. Step2. Data preprocessing This subsection includes wavelet de-noising and data pattern setup. The former is a tool for removing noise in the data series, and the latter is the basis step of pattern recognition in
CR IP T
an HMM. (I) Wavelet de-noising
The municipal waste generation is fluctuant and nonlinear due to uncertainties. The statistic data of waste generation is not a true reflection of the underlying trend caused by sources of noises. Wavelet de-noising, that uses scaled and translated versions of the mother
AN US
wavelet ψ j ,k ( x) to build an approximation f ( x) of an original function f ( x) , is a reliable method to eliminate the data noise of a time series signal (De Souza Silva et al., 2010). Wavelet translation can be expressed as follows:
c j ,k j ,k ( x)
(1)
ψ j ,k ( x) = δψ (2 j x - k )
(2)
f ( x)
M
j ,k
ED
c j ,k
f ( x) j ,k ( x)dx
(3)
where c j ,k is wavelet coefficients; δ is a constant; k decides the translated versions of a
PT
wavelet; and 2 j is its scale parameter.
Ingrid Daubechies invented compactly supported orthonormal wavelets, which makes the
CE
analysis of discrete wavelets practicable. The families of orthonormal wavelets are also called Daubechies wavelets. The Symlets are nearly symmetrical wavelets that were proposed as
AC
modifications to the Daubechies wavelets (Misiti et al., 2011). In this study, the Daubechies and Symlets
wavelets were employed to carry out wavelet decomposition and
synthesis-reconstruction. After wavelet decomposition, data signal could be decomposed into two categories of filters: low-pass filter (the approximation part) and high-pass filter (the detailed parts). The underlying trend of the all factors can be captured by the approximation part, and the noise features can be recorded by the detailed parts. Eq. (4) reconstructs the approximation part ( AJ ) and the detailed parts ( D1 , D2 ,..., DJ ) into a new time series data signal (Misiti et al., 2011):
ACCEPTED MANUSCRIPT (4)
S = AJ + D1 + D2 +... + DJ
where S is new signal, and J is scale parameter. After reconstruction, the new time series will be regarded as updated data for training and testing of the WDE-HMM approach. This will be illustrated by a specific example in Section 4. (II) Data pattern setup Data patterns that can reflect the characteristics of fluctuation refer to specific and similar
CR IP T
data vectors which follow a defined rule. In order to forecast fluctuations of waste generation in a large sample via an HMM, datasets of observations are usually arranged in a specific order so that every data vector forms a pattern which is the basis for data pattern recognition in an HMM. For example, the wastewater generation data for each week (no data on Saturday) of the Manresa sewage treatment plant is arranged in a data pattern [Mon., Tues., Wed., Thur.,
AN US
Fri., and Sun.]. This pattern can interpret the wastewater generation behavior of each week. Waste generation in historical municipal activities or incidents can provide reference for future forecasting. When a similar situation reoccurs, this historical pattern can be used to forecast. 3.2.2. Hidden Markov model - Part II
M
The main difference between continuous HMMs and discrete HMMs lies in the statistic property of observations. In this study, the observation of each state is a vector of continuous
ED
random variables. This subsection includes four steps: continuous HMM modelling, parameter initialization, training and parameter determination, and Viterbi decoding.
PT
Step 1. Continuous HMM modelling Fig. 5 shows the translation relationship of hidden states and observations in a continuous
CE
HMM. For each hidden state, a probability exists for the current state to transform into any possible state, while each observation vector (i.e., data pattern) can be observed with a
AC
probability in its corresponding hidden state.
ACCEPTED MANUSCRIPT a1N a1j
a2N a2j
a1i a11
a22 a12 a21
s1
s2
aii a2i ... ai2
ai1
aiN ajj aij ... aji
si aj2
sj
aNN ajN ... aNj
sN
aNi aN2
aj1
b1(O)
b2(O)
O
O
bi(O) ...
O
CR IP T
aN1 bj(O) ...
O
bN(O)
...
O
Hidden states
Observation vector
Transition probability
Emission probability
AN US
Fig. 5. The relationship of hidden states and observations in continuous HMM
The primary notations that will be used for the rest of this paper are listed as follows:
N : The number of the hidden states of this model;
M
T : Length of the state sequence or observation sequence. t = 1, 2,..., T ;
S : A set of states. S = {s1 , s2 ,..., si ,..., s N } and i = 1, 2,..., N ;
t;
ED
Q : State sequence {q1, q2 ,..., qT } , where qt = si represents the hidden state at time
O : Observation vector sequence, O = {O1 , O2 ,..., Ot ,..., ΟT } ;
PT
π : The prior probability of initial state { πi };
A : Transition matrix A={aij } , aij represents state transition probability from the
CE
hidden state i to state j , j = 1, 2,..., N ; B : A family of probability density functions B = {b j (O)} ;
AC
b j (O ) : the occurrence probability of observation at state j . b j (O ) is illustrated by a
widely used multi-dimensional Gaussian distribution: b j (O) = ∑P( K j = k | st = j ) b jk (O) k
= ∑P( K j = k | st = j )Φ[O, μ jk , Σ jk ] k
∑c jk [O, μ jk , Σ jk ],
1 j N ,1 k K
(5)
k
where O is the vector of observations being modeled; [O, μ jk , Σ jk ] is Gaussian density;
ACCEPTED MANUSCRIPT c jk , μ jk , and Σ jk represent mixture coefficients, mean vector, and covariance matrix for the
k th mixture component at state j , respectively; and
∑c
jk
=1;
k
λ : The overall parameters λ = (π, A, B) , and
πi = 1 , ∑aij = 1 , b j ( x)dx 1 . ∑ i j
Step 2. Parameter initialization
CR IP T
Prior knowledge about probabilities of parameters λ = (π, A, B) is extremely useful for parameter initialization, but it is often not available. For case with prior information, parameters can be initialized easily. For case without prior knowledge, we only consider the basic constraints above to initialize parameters ( λ ) randomly. Note that the proper number of hidden states is unknown in this step which will be eventually decided in parameter
AN US
determination of Step 3. Step 3. Training and parameter determination
In this step, parameters are updated through training the HMM. The most suitable number of hidden states and the proper number of mixture components of a GMM are determined by
M
Akaike’s information criterion (AIC). (I) Training
ED
In the training step, once we obtain the observation sequence O = {O1 , O2 ,..., Ot ,..., ΟT } the parameters ( λ ) of an HMM can be trained by the Expectation-Maximization (EM) algorithm (Rabiner, 1989; Bilmes, 1998) which iterates ceaselessly until it converges to the maximum
PT
likelihood. After updating all parameters, the new parameter set is λML = (π , A, B ) = arg max P (O | λ) , where π = {πi } , A = {aij } , and B = {b j (O)} . The updated initial state
CE
λ
probability ( πi ), transition probability ( aij ), mixture coefficients ( c jk ), mean vector ( μ jk ), and
AC
covariance matrix ( Σ jk ) are expressed as follows:
πi = ξ1 (i)
(6)
T -1
T -1
t =1
t =1
T
T
aij =∑ξt (i, j ) /∑ξt (i ) K
c jk = ∑ξt ( j , k ) /∑∑ξt ( j , k ) t =1
T
μ jk =
(7)
(8)
t =1 k =1
T
ξt ( j, k ) Ot /∑ξt ( j, k ) ∑ t =1 t =1
(9)
ACCEPTED MANUSCRIPT
Σ jk =
T
T
t =1
t =1
∑ξt ( j, k ) (Ot - μ jk )(Ot - μ jk )T /∑ξt ( j, k )
(10)
where ξt (i) and ξt ( j ) represent the probability of being in si and s j at time t , respectively; ξt (i, j ) represents the probability of being in si at time t and in s j at time
t +1 ; ξt ( j, k ) represents the probability of being in s j at time t with the k th mixture component; and Ot is the observation at time t in the observation sequence. The
N
ξt (i) = αt (i) βt (i) /∑αt (i) βt (i) i =1
N
ξt ( j ) = αt ( j ) βt ( j ) /∑αt ( j ) βt ( j )
ξt (i, j ) =
AN US
j =1
CR IP T
forward-backward variables α and β can be used to illustrate these probabilities:
αt (i )aij b j (Ot +1 ) βt +1 ( j ) N
N
(11)
(12)
(13)
∑ ∑αt (i)aij b j (Ot +1 ) βt +1( j) i =1 j =1
M
t ( j ) t ( j ) cjk[Ot , μ jk , Σ jk ] t ( j , k ) N t ( j ) t ( j ) K cjk[O , μ , Σ ] t jk jk j 1 k 1
(14)
ED
Since the EM algorithm easily converges to a local optimum or a saddle point, the log-likelihood estimator of the EM algorithm may less than from the maximum log-likelihood
PT
estimator. To improve the quality of parameter estimation, in this study, the model is trained by varied initial parameters ( λ ) with multiple runs to obtain the updated parameters ( λ ) by
CE
choosing the biggest log-likelihood estimator. (II) Parameter determination It is difficult yet crucial to determine the number of hidden states and components of the
AC
GMM in an HMM. Tradeoff always exists between a more accurate result and a smaller computing complexity for this problem. In this paper, AIC (Akaike, 1973), which is a simple and useful model selection criterion based on K-L information, is employed to determine the numbers, as the method attempts to optimize the tradeoff between complexity and goodness of fit (Eq. (15)). We consider the number combination of hidden states and mixture components with the smallest AIC as the best parameter for modeling.
AIC = -2ln( L) + 2 p
(15)
p=N +N ( N -1) + N ( K + K ( K+1) / 2)
(16)
ACCEPTED MANUSCRIPT where L is the likelihood function; p is the number of parameters to be estimated which include state, transition, and emission parameters; N is the number of hidden states; and K is the number of components of the GMM. Step 4. Viterbi decoding In the decoding step, the new parameter set λ and the observation sequence O are
Viterbi decoding (Rabiner, 1989). Q* can be expressed as: _
Q* = arg max{P(Q | O, λ)} Q
CR IP T
available. The most probable sequence of states Q* = {q1* , q2* ,..., qT* } can be calculated by the
(17)
Let δt (i) be the maximal probability of state sequences { q1, q2 ,..., qt , qt = si } at time t
AN US
which generate the observations { O1, O2 ,..., Ot } for the given model.
_
δt (i) = max P(q1, q2 ,..., qt , qt = si , O1, O2 ,..., Ot | λ) q1 ,q2 ,...,qt -1
(18)
The Viterbi decoding is a dynamic programming method. Q* = {q1* , q2* ,..., qT* } can be
M
derived when achieving recursive results as max[δt (i)] . 3.2.3. Forecasting and post-processing - Part III
ED
This subsection includes two steps: forecasting waste generation for the two scenarios in Fig. 4 and data post-preprocessing by the bootstrapping.
PT
Step 1. Forecasting
The way of weighted average Gaussian mixture and the way of searching the most similar
CE
pattern (Hassan, 2009) can be employed to forecast non-periodic and non-periodic fluctuations via the HMM, respectively. After repeated experiments, we found that the former is suitable
AC
for both small sample (approximately 25 to 30 data points with non-periodic fluctuation) and large sample forecasting, and that the latter is proper for large sample forecasting. (I) Forecasting way for Scenarios I Conventionally, a sample size less than 30 is considered small, making it difficult to detect similar patterns from historical data with such limited training data. In this situation, the way of weighted average Gaussian mixture, which integrates states transition and observations probability, is a better choice than the way of searching the most similar pattern. The state transition matrix A(i, j ) , learned by the training dataset, can reflect historical
ACCEPTED MANUSCRIPT relations among hidden states. Some fresh information about changes of fractional latent influencing parameters is available at some forecasting points beforehand, such as a new issuing management policy on controlling MSW generation, a deteriorating economic crisis, or an expected large-scale activity. Although the controlling effects of the policy, real deteriorating degree of the crisis, and concrete participation degree of residents in the activity are also uncertain, they can still provide useful information to update the state transition
Fτ = Fτ-1 (1+ rτ )
N A (qT , s j ) E j ( X ), j 1 r N j 1 A (qT , s j ) E j ( X ),
CR IP T
matrix A(i, j ) to a new one A(i, j ) . The forecasted amount Fτ at time T+τ is given by: (19)
if fresh information exists
(20)
if no fresh information
AN US
where rτ is the MSW increment rate at time T+τ , forecasted via the proposed model; F0=OT when τ = 1 ; X
is a continuous random variable which represents the MSW
increment rate; E j ( X ) is the expectation of X under hidden state j ; qT is the hidden state at time T ; s j represents one state in the set of states S = {s1, s2 ,..., sN } ; and updated
M
Aτ (qT , s j ) is the τ -step state transition probability from the state of time T to the state s j .
(II) Forecasting way for Scenarios II
ED
For a large data sample, once the HMM is trained, the log-likelihood estimator of each data pattern can be calculated by the forward algorithm (Rabiner, 1989). Then, we use Eq. (21)
PT
to discover the time point with the most similar pattern in historical data to the current pattern
CE
(i.e., the pattern at time T).
t * arg min{| LLET LLEt |} 1 t T 1
(21)
t
where LLET and LLEt are log-likelihood estimators at current time ( T ) and historical time
AC
( t ), respectively.
Theoretically, the nearest log-likelihood estimators can be discerned at time point t*
* from the historical data. The waste generation behavior of time point t is most similar to
that of current time point T . We assume that the increment rate that follows the reference point is the same as the point we are attempting to forecast. Thus, the i th forecasted amount in the data pattern at time T +1 is Fi : Fi = OTi [(Oti* +1 - Oti* ) / Oti* +1] 1 i I
(22)
ACCEPTED MANUSCRIPT where I is the element number of the data pattern defined in a specific case. Step 2. Data post-processing Discovering the most similar pattern is crucial for the second forecasting way. Due to the local optimum characteristic of EM algorithm, the HMM would find several similar patterns to the current one from the historical data. The local optimum may be a better predictor of behavior than a temporally distant global optimum as similar patterns that are closer in time
CR IP T
should have more impact on one another. Here we make use of this fact by training the model via multi-simulation of K runs. After each multi-simulation training, we extract the most similar pattern and calculate a forecasted amount Fk ( k = 1, 2,..., K ). Then, the bootstrapping method (Efron, 1979) was utilized to estimate 95% confidence intervals of the forecasted
sample dataset with unknown distribution. 4. Case study
AN US
amount {F1 , F2 ,..., FK } , which is an especially proper statistical resampling algorithm for small
This section illustrates two kinds of cases, that are yearly MSW generation with non-periodic fluctuation for both Shanghai and Beijing (case I), and daily wastewater
M
generation with periodic fluctuation for the Manresa sewage treatment plant (case II). The original datasets were extracted from the Shanghai Statistical Yearbook 2011 (Shanghai
ED
Bureau, 2011), the Beijing Statistical Yearbook 2011 (Beijing Bureau, 2011), and the UC Irvine Machine Learning Repository (Bache and Lichman, 2013), respectively.
PT
To assess the forecasting accuracy, four frequently used criteria were applied for comparisons of the proposed models and other models. The mean absolute error (MAE), the
CE
mean square error (MSE), the coefficient of determination ( R 2 ), and the adjusted coefficient of
AC
determination ( adjusted R2 ) are given by: MAE=
1 n ∑ A -F n i =1 i i
(23)
MSE=
1 n ( Ai -Fi )2 ∑ n i =1
(24) n
n
i =1
i =1
_
R 2 = 1- SSE / SST = 1- ∑( Ai -Fi ) 2 / ∑( Ai - A) 2 adjusted R 2 = 1 -
n -1 SSE / SST n - k -1
(25) (26)
where Ai is the actual amount; Fi is the forecasted amount; n is the testing sample size; k
ACCEPTED MANUSCRIPT is the number of explanatory variables. ARIMA and sARIMA, ES and sES, ANN, and SVM, introduced in Section 2, are all widely used and effective models to perform waste generation forecasting. Therefore, they were also implemented to compare and assess the effectiveness of the proposed models. For building ARIMA and sARIMA models, the autocorrelation function plot and AIC were both considered to discover the best-fitted model from the profile of different parameters (p, d, q). To build ES model for non-seasonal trend forecasting, the best suitable model is selected out
CR IP T
among simple ES, double (Brown) ES, linear (Holt) ES, and damped-trend linear ES. To analysis seasonal trend by sES model, simple sES, winters additive algorithm based ES, and winters multiplicative algorithm based ES were tested on the formatted time series. The most suitable ES or sES models were selected by the Expert Modeler in the SPSS 19.0 software. The BP neural network, the most frequently used architecture of ANN, was utilized in this
AN US
study, which is a three-layer network including input layer, hidden layer, and output layer. The classical Levenberg-Marquardt backpropagation algorithm is employed to update weights and bias states. Optimized parameters of ANN are given in the subsection of 4.1.3 and 4.2.3. The SVM regression was performed using the LIBSVM software (Chang and Lin, 2011). SVM regression with four types of kernel functions (i.e. linear, polynomial, radial basis
M
function, and sigmoid) were tested to find out the most appropriate kernel function for the given application, according to criteria of MSE and R2 during training and testing stages. A
ED
two-step grid search method consisted of coarse grid search and finer grid search was used to select the best parameter of gamma function and loss function. In the process of parameters
PT
selection, the cross validation was employed to avoid over fitting problem.
CE
4.1. Case I: MSW generation forecasting 4.1.1. Background
AC
Shanghai, the economic center of China, is a megacity with a resident population over 23 million and yearly GDP of more than ¥2 trillion; Beijing, the capital city of China, is another megacity even in the world. The waste generation of both Shanghai and Beijing remains a strong upward trend over the past three decades. Fig. 6 shows MSW generation with non-periodic fluctuation in these two cities from 1978 to 2010.
750 650 550
Shanghai
Beijing
450 350 250 150 50 1978 1980 1982 1984 1986 1988 1990 1992 1994 1996 1998 2000 2002 2004 2006 2008 2010 Year
CR IP T
Solid Waste Generation / 104 tonne
ACCEPTED MANUSCRIPT
Fig. 6. MSW generation from 1978 to 2010 of Shanghai and Beijing 4.1.2. Forecasting process
The dataset of waste generation increment rate was chosen as input data in order to meet
AN US
the requirement of stationary time series which was tested by the autocorrelations function and the Q-Q plot. An assumption test method, the one-sample Kolmogorov-Smirnov test, was used to explore the distribution of input data which can be utilized to initialize the state transition matrix. The testing results are presented in Table 1, which show that we cannot reject the null hypothesis that the data in MSW increment rate dataset comes from a normal
M
distribution at the significance level of 0.05. Testing for independent and identically distributed (i.i.d.) hypothesis with the correlogram was implemented for the time series of the
ED
MSW increment rate. The results show that only 1 of the 32 values of ρˆ k lies outside the bounds ±1.96 / T ( α = 0.05 ), and there is no evidence to reject the null hypothesis
PT
( H 0 : ρk = 0 ) that the observations are independently distributed. The selected dataset was divided into two sets, one for training and the other for testing.
CE
To avoid random error, the HMM was tested using data from 2004 to 2010. We trained it using the data from 1978 to 2003 to forecast waste generation in 2004, continued to train it
AC
using the data from 1978 to 2004 to forecast waste generation in 2005, and so forth. In order to forecast waste generation via the WDE-HMM approach, waste generation data signal was decomposed by Symlets of level 5 for Shanghai and level 3 for Beijing, and then de-noised by the penalized high thresholding function via the MATLAB wavelet toolbox. Using AIC, we define the number of hidden states (N) as 5 and the mixture components (K) of a GMM as 3 (see Table 2).
Table 1. One-sample Kolmogorov-Smirnov test for waste increment rate datasets
ACCEPTED MANUSCRIPT
Sample Size
Beijing (1978-2010) 32
Shanghai (1978-2010) 32
Beijing (1978-2003) 25
Shanghai (1978-2003) 25
Normal (Mean/SD)
0.077/0.061
0.066/0.093
0.075/0.057
0.075/0.103
Most Extreme Differences
0.117
0.162
0.162
0.166
Kolmogorov-Smirnov Z
0.664
0.915
0.810
0.830
Asymp. Sig. (2-tailed)
0.769
0.373
0.528
0.495
NHS
NGMC
MLE
AIC
NHS
NGMC
MLE
AIC
1
-160.5
350.9
4
3
-113.5
331.1
3
2
-142.7
333.5
4
4
-95.8
335.7
3
3
-131.9
335.9
5
1
-143.2
356.3
3
4
-120.5
342.3
5
2
-118.7
337.3
4
1
-154.3
356.5
5
3
-91.0
322.0
4
2
-125.5
323.1
5
4
-74.4
338.8
NHS
NGMC
MLE
AIC
6
1
-130.6
357.1
6
2
-101.1
334.1
6
3
-72.1
324.3
6
4
-50.4
340.9
AN US
3
CR IP T
Table 2. AIC scores of different hidden Markov models
Note: NHS=the number of hidden states; NGMC=the number of GMM mixture components; MLE=the maximum log-likelihood estimator.
M
In this study, the total uncertainty degree of waste generation (denoted as ζ ) is taken as a hidden variable. The lower the total uncertainty degree is, the smaller the increment rate
ED
fluctuation around the mean value will be; while the higher it is, the larger the increment rate fluctuation will be. ζ=ζ (ς1, ς2 , ς3 , ς4 , ς5 ,...) expresses that ζ is correlated to a number of
PT
uncertainties, mainly such as uncertainty from the economy ( ς1 ), population mobility ( ς 2 ), individual behaviors ( ς3 ), management policy ( ς 4 ), unexpected disasters ( ς5 ), etc. In case I,
CE
the number of hidden states is five. So total uncertainty degree can be divided into five discrete states (denoted as “–high,” “–low,” “0,” “+low,” and “+high”), where “–” means total
AC
uncertainty of negative effects which resulting in a waste increment rate that is lower than the mean increment rate, while “+” indicates the total uncertainty of positive effects which cause the waste increment rate to be higher than the mean. The initial state transition probability among different total uncertainty degrees can be expressed by Gaussian distribution of the MSW increment rate and a random parameter, for example, A0 (3, 3) = Φ( μ + ασ ) - Φ( μ - ασ ) ,
A0 (3, 2) = A0 (3, 4) = Φ( μ - ασ ) - Φ( μ - 2ασ ) and
A0 (3,1) = A0 (3, 5) = Φ( μ - 2ασ ) , where μ is the mean
value of the increment rate; σ is standard deviation; and [0.5,1] is a random term. The
initial transition matrix can be expressed as:
ACCEPTED MANUSCRIPT high low high a h, h low al , h A0 (i, j ) = 0 a0, h low a l , h high a h, h
a h,l al ,l a0,l a l ,l a h,l
low high
0 a h,0 al ,0 a0,0 a l ,0 a h,0
a h, l al , l a0, l a l , l a h, l
a h, h al , h a0, h a l , h a h, h
(27)
For each annual forecasting, 150 training runs were carried out via different initial
CR IP T
parameters ( λ ) to obtain updated parameters ( λ ) by choosing the largest log-likelihood estimator. The iterative curve of the largest log-likelihood estimator of Year 2009 is plotted in Fig. 7, and it shows that the log-likelihood estimator achieves its maximum (-91.0) after 21 iterations. The most probable state sequence from 1979 to 2008 (Fig. 8) can be revealed via Viterbi decoding, which illustrates the changes in the levels of hidden states. An assumption
AN US
test method, the chi-square test, was used to verify the Markov property of the hidden state χ 2 = 41.6 (the case of Beijing) and
sequence.
χ 2 = 36.0 (the case of Shanghai),
χ 2 > χα2 ((q -1)2 ) = 26.3 ( q=5 and α=0.05 ), where α is significance level, and (q -1)2 is the degree of freedom. The test result shows the hidden state sequence {q1, q2 ,..., qT } satisfies
M
the Markov property. -100
ED
-110
-120 -130 -140
PT
Log-likelihood estimator
-90
-150 -160
CE
0
3
6
9 12 Iterations
15
18
21
Fig. 7. The log-likelihood estimator iterative curve of MSW forecasting of Year 2009 Hidden States
AC
+high
Shanghai
Beijing
+low
0
-low
-high 1979 1981 1983 1985 1987 1989 1991 1993 1995 1997 1999 2001 2003 2005 2007 Year
Fig. 8. The most probable state sequence from 1979 to 2008 of Shanghai and Beijing
ACCEPTED MANUSCRIPT Fresh information of uncertainty factors can be utilized to update state transition matrix and improve forecasted results. In Year 2008, Beijing hosted the Olympic Games, and Beijing government encouraged citizens to conserve energy and reduce emissions. After the Olympic Games, the government issued a policy on comprehensively advancing domestic waste sorting on April 28, 2009. The effect of this policy would likely last a few years. Shanghai is another city that experiences the same effect. In order to transform itself into a National Environmental Protection Model City, the government issued a policy on comprehensively
CR IP T
controlling environment pollution on July 9, 2007. The policy emphasized on carrying out diagnostic investigations in rest months of Year 2007 and completing control task at the end of Year 2008. The effect of this policy would be significant in the target year but limited for following years, as it is a short term task whose aim is to win the title of Model City.
The fresh information makes intuitive sense that waste generation of next year will
AN US
change and state transition matrix should be updated to respond to the new information. Once fresh information is obtained, we can consult experts in waste management system two questions: 1) Which is the most probable state of the next year considering fresh information? 2) How many probabilities will this state absorb from other states? Experts can give a probability interval if they are uncertain about the exact value. Then, the most common
M
method (the weighted average) in group decision problem was utilized to aggregate experts’
PT
Year 2009 for instance):
ED
knowledge. The updated transition matrix can be expressed as (taking Beijing’s situation of
R-l ,- h
R0,-h
R+l ,- h
(28)
R+ h,- h
CE
A(i, j ) =
high low 0 low high i [ai , h ai ,l ai ,0 ai ,l ai , h ]
where R j ,-h is the absorption ratio of probabilities from other states. To simplify this
AC
process, the same ratios are chosen for R j ,-h of different j .
4.1.3. Parameters setup of comparison methods ES model can only utilize waste generation variable to perform forecasting. However, the
two major drivers for waste generation, the Resident Population and the Real GDP (Wang and Nie, 2001), were chosen as predictors in the ARIMA model to improve its forecasting accuracy. In ANN and SVM models, the best suitable predictors are the Resident population, the Real GDP and the Resident Consumption Level, which were selected by criteria of MSE and R2 during training and testing stages. The increment rate of waste generation in t +1
ACCEPTED MANUSCRIPT year ( RtW+1 ) is a function of the increment rate in t year ( RtW ) and increment rates of another three factors in t year ( RtP , RtGDP , and RtC ). All models used in this study are data-driven style, which means it is needed to retrain these models once new data is obtained. As a result, the best parameters for each year forecasting may be different. In the case of Shanghai, ARIMA (1, 1, 1) was selected for years of 2004, 2005, and 2009; ARIMA (1, 2, 1) was used for another 4 years. Linear (Holt) ES is
CR IP T
the best fitted ES model for Shanghai case. In the case of Beijing, ARIMA (1, 1, 1) was selected for years 2004 to 2010. Linear (Holt) ES was used for years 2004 and 2008, and double (Brown) ES was chosen for other years. The basic parameters of non-probabilistic statistical learning models for Shanghai and Beijing cases are the same. Optimized BP neural network parameters: learn rate equals to 0.2; transfer function of hidden layer is tansig; error
AN US
goal is 0.0001; input nodes are 4; maximum iterations is 5000; hidden nodes are 10; output node is 1; training algorithm is Levenberg-Marquardt backpropagation. In SVM regression, radial basis function (RBF) was selected as kernel function, which is consistent with the work of Abbasi et al. (2014). The parameter of gamma function and penalty parameter are both set to be the range [2-8, 28] in coarse grid search. Parameter ranges were narrowed in finer grid
M
search via the results of coarse grid search. The steps of two searches are set to be 1 and 0.2, respectively.
ED
4.1.4. Results and discussion
We compare the results of ARIMA, ES, ANN, SVM, HMM, and WDE-HMM
PT
approaches for Shanghai and Beijing cases in Figs. 9 and 10, respectively. Tables 3 and 4 summarize MAE, MSE, R2, and adjusted R2 of the forecasted results for Shanghai and Beijing
CE
cases, respectively. The results indicate that HMM outperforms ARIMA, ES, ANN, and SVM (except for the MAE of HMM is slightly more than that of ES in Shanghai case). After
AC
wavelet de-noising, WDE-HMM performs better than HMM. The forecasting accuracies of HMM* and WDE-HMM* are further improved after taking into fresh information of management policies consideration. Table 3. Forecasting accuracies of different models for Shanghai case ARIMA
ES
ANN
SVM
HMM
HMM*
WDE-HMM
WDE-HMM*
MAE
14.84
11.93
13.37
15.67
12.45
10.78
11.74
9.93
MSE
463.8
247.8
371.7
336.1
240.5
152.1
225.0
130.8
R
0.741
0.861
0.792
0.812
0.865
0.915
0.874
0.927
adjusted R2
0.612
0.861
0.584
0.624
0.865
0.898
0.874
0.912
2
ACCEPTED MANUSCRIPT
Table 4. Forecasting accuracies of different models for Beijing case ARIMA
ES
ANN
SVM
HMM
HMM*
WDE-HMM
WDE-HMM*
MAE
26.13
26.30
25.34
24.49
23.9
22.1
19.9
17.9
MSE
929.8
945.0
918.0
865.0
801.3
653.6
687.2
540.4
0.893
0.891
0.894
0.900
0.907
0.925
0.921
0.938
0.840
0.891
0.789
0.801
0.907
0.910
0.921
0.925
2
R
2
adjusted R
CR IP T
Note: The testing sample size is small (n=7), so the adjusted R2 have a relatively big gap with R2 for those methods with more predictors (such as ANN and SVM). * Considering fresh information of policies, the most probable states inferred by experts are both “-h” for Year 2008 in Shanghai and Year 2009 in Beijing; the absorption rate of probabilities were calculated to be 0.25 and 0.35, respectively. All experts’ knowledge is collected in the Shanghai Environmental Protection Bureau.
All the forecasted amount of the proposed approaches are close to actual amount in Figs. 9 and 10. Not only does WDE-HMM have more accurate results in regular years but also it
AN US
appears to be more reliable than other methods when encountering a drastic fluctuation, such as the fluctuation points of Year 2008 in Shanghai case and Year 2009 in Beijing case. The disadvantages of comparison methods are revealed in these two cases. ARIMA cannot catch the fluctuation effectively. ES chooses a compromise between general and fluctuation points. Although ES can effectively reflect the upward trend, it would lost lots of fluctuation
M
information. ANN is not stable due to its characteristic of over-fitting and local minimum.
ED
740
690
665
PT
715
Actual amount ARIMA ES ANN SVM HMM* WDE-HMM*
CE
MSW Generation / 104 tonne
Both ANN and SVM depends on influencing factors (predictors) heavily.
640
AC
615
590 2004
2005
2006
2007 Year
2008
2009
2010
Fig. 9. Forecasted and actual amount of Shanghai case from 2004 to 2010
ACCEPTED MANUSCRIPT
660 620
580
Actual amount ARIMA ES ANN SVM HMM* WDE-HMM*
540 500
460 420
380 2004
2005
2006
2007 Year
CR IP T
MSW Generation / 104 tonne
700
2008
2009
2010
Fig. 10. Forecasted and actual amount of Beijing case from 2004 to 2010
As mentioned in Section 3.2, uncertainties are not measured directly; instead, historical
AN US
data indicate the likely total uncertainty degree. The total uncertainty degree
ζ=ζ (ς1, ς2 , ς3 , ς4 , ς5 ,...) is determined by many uncertainties. In most cases, ζ is high when one or two of the uncertainty factors vary sharply, and ζ is low if all uncertainties are general. According to the calculated result of the total uncertainty degree (Fig. 8), we can
M
search for the source of uncertainty in reality from the five major sources. ζ of Year 2000 is “+high”, likely due to government policy regarding increasing environmental sanitation
ED
facilities. ζ of Year 2003 is “+high” in Shanghai and “+low” in Beijing, probably because of an explosion of disposable supplies during the SARS outbreak. The global economic crisis
PT
and winning a title of Model City are the probable reasons that ζ of Year 2008 is “-high” in Shanghai; however, ζ in the same year of Beijing is “0”, partly caused by the fact that the
CE
Olympic Games was held. ζ of Years 2006 and 2007 in Shanghai, and ζ of Year 2007 in Beijing are “0” since there were few changes in economic, population mobility, individual
AC
behaviors, and management policy. The way of weighted average Gaussian mixture was proposed to deal with the case I.
Once we have determined the total uncertainty degree in a year, the emission matrix ( B ), and the state transition probability matrix ( A ) which are updated by training the model via the historical dataset and even some fresh information, we can forecast weighted waste generation of the next year via probability calculation. Thus, fluctuation forecasting of waste generation can be implemented even when some of the uncertainty factors vary sharply.
ACCEPTED MANUSCRIPT 4.2. Case II: wastewater generation forecasting 4.2.1. Background The study site is the Manresa sewage treatment plant, located in Barcelona (Spain) which is one of the most populous cities in the world. Spain, the fourth largest country in Europe, is located at the western edge of the Mediterranean. Spain has a population of over 47 million people which ranks in the top 30 worldwide. The wastewater treatment system of Spain is
CR IP T
well developed, with more than 80% of the wastewater being treated. Spain has been actively promoting the reuse of wastewater that has been treated by urban sewage treatment plants as an aspect of implementing sustainable urban development. The original dataset of wastewater generation with periodic fluctuation has 477 daily data, which is a period from January 1,
AN US
60000 50000
40000 30000
Original signal De-noised signal
20000 10000 0
50
100
M
Wastewater Generation / tonne
1990 to July 10, 1991, not including Saturdays (see the original signal in Fig. 11).
150
200
250
300
350
400
450
Day
ED
Fig. 11. The original vs. de-noised signal of daily wastewater generation
PT
4.2.2. Forecasting process
Using MATLAB wavelet toolbox, the original signal was decomposed by Daubechies of level 2, and then de-noised by the penalized medium thresholding function. Fig. 11 shows
CE
both the original signal and the de-noised signal. Considering weekly periodicity, the datasets for HMM and WDE-HMM approaches were transformed into a matrix datasets with rows that
AC
form a weekly time series, and columns that follow a pattern [Mon., Tues., Wed., Thur., Fri., and Sun.]. Each row data is a stationary time series tested by the autocorrelations function and the Q-Q plot. The distribution of input data was explored by the one-sample Kolmogorov-Smirnov test (Table 5). Testing for i.i.d. hypothesis with the correlogram was also implemented for time series data of wastewater generation. The results show that 2, 1, 2, 1, 1, and 3 of the 78 values of ρˆ k lie outside the bounds ±1.96 / T ( α = 0.05 ) for datasets of Mon., Tues., Wed., Thur., Fri., and Sun., respectively. So there is no evidence to reject the null hypothesis that the observations are independently distributed. The selected dataset was
ACCEPTED MANUSCRIPT also divided into two sets, one for training and the other for testing. We tested the proposed approaches using the 79th and the 80th week data. Using AIC criterion, we set the number of hidden states to 5 and the mixture components of a GMM to 3 (see Table 6). The result of the chi-square test is: χ 2 = 71.8 > χα2 ((q -1)2 ) = 26.3 ( q=5 and α=0.05 ), which shows that the hidden state sequence {q1, q2 ,..., qT } satisfies the Markov property.
Sample Size
Monday 80
Tuesday 80
Wednesday 80
Thursday 79
Friday 79
Sunday 79
38943/6387
39544/7151
38741/5643
37054/6489
37404/5603
35027/6994
Most Extreme Differences Kolmogorov-Smirnov Z
0.076
0.101
0.149
0.091
0.096
0.101
0.678
0.902
1.329
0.807
0.855
0.899
Asymp. Sig. (2-tailed)
0.748
0.391
0.059
0.533
0.458
0.394
AN US
Normal (Mean/SD)
CR IP T
Table 5. One-sample Kolmogorov-Smirnov test for wastewater generation datasets
NHS
NGMC
MLE
AIC
NHS
NGMC
MLE
AIC
NHS
NGMC
MLE
AIC
3
1
-133.5
297.0
3
2
-98.8
245.5
3
3
-81.9
235.9
4
1
-117.0
281.9
4
2
-82.5
237.0
4
3
-57.6
219.1
5
1
-95.6
261.1
5
2
-60.8
221.7
5
3
-30.5
200.9
6
1
-83.4
262.7
6
M
Table 6. AIC scores of different hidden Markov models
-46.8
225.6
6
3
-23.4
226.7
2
ED
Note: NHS=the number of hidden states; NGMC=the number of GMM mixture components; MLE=the maximum log-likelihood estimator.
PT
To facilitate the forecasting accuracies of HMM and WDE-HMM approaches, we carried out 20 runs of multi-simulations (30 simulations each run) from different initial parameters ( λ )
CE
to obtain 20 total forecasted amount {F1 , F2 ,..., F20 } via choosing the biggest log-likelihood estimator. For the WDE-HMM approach, only the top three most similar historical patterns of
49000
Wastewater Generation / tonne
Wastewater Generation / tonne
AC
the current pattern were selected to illustrate the result of pattern recognition in Fig. 12. 44000 39000 34000 29000
The 44th week The 60th week
The 58th week The 78th week
24000
Mon.
Tues.
Wed. Thur. Time
Fri.
Sun.
54000
49000 44000 39000 34000 29000 24000
The 14th week The 70th week Mon.
Tues.
The 24th week The 79th week
Wed. Thur. Time
Fri.
Sun.
Fig. 12. The current wastewater generation pattern, (left, for the 78th week) and (right, for the 79th week), matched with historical similar data patterns
ACCEPTED MANUSCRIPT 4.2.3. Parameters setup of comparison methods Historical data of influencing factors for wastewater generation is not available in this case. So only historical data of wastewater generation can be utilized as predictors for ANN and SVM models. However, this does not affect sES and sARIMA models since they were supposed to utilize wastewater generation data to implement forecasting. As the week period is 6, for one week ahead forecasting of ANN and SVM models, the wastewater generation in
CR IP T
t +1 day ( Wt +1 ) is a function of wastewater generation in t - 5 day ( Wt -5 ), t - 6 ( Wt -6 ), … , and t -10 ( Wt -10 ). For one day ahead forecasting, the wastewater generation in t +1 day ( Wt +1 ) is a function of wastewater generation in t day ( Wt ), t -1 day ( Wt -1 ), … , and t -5 day ( Wt -5 ).
The best-fitted model of sARIMA in this case was selected to be sARIMA (2, 0, 1) (1, 0,
AN US
1)6. A time lag for 6 days was used as the seasonal parameter for the Auto Regressive. Winters' Multiplicative algorithm based ES is the most suitable model for wastewater generation forecasting with seasonal period of 6. Optimized BP neural network parameters: learn rate equals to 0.2; transfer function of hidden layer is tansig; error goal is 0.0001; input
M
nodes are 6; maximum iterations is 10000; hidden nodes are 15; output node is 1; training algorithm is Levenberg-Marquardt backpropagation. In SVM regression, RBF was selected as
ED
the best kernel function. The parameter of gamma function and penalty parameter are both set to be the range [2-8, 28] in coarse grid search. The steps of coarse grid search and finer grid
PT
search are set to be 1 and 0.1, respectively. 4.2.4. Results and discussion
CE
For each day, the mean value of 20 forecasted amount of the proposed approaches was regarded as the final forecasted amount. The forecasted results of sARIMA, sES, ANN, SVM, HMM, and WDE-HMM approaches are compared in Fig. 13.
AC
Table 7 summarizes MAE, MSE, and R2 of the forecasted results. The results indicate
that the HMM approach outperforms sARIMA, sES, ANN, and SVM. The WDE-HMM approach also performs better than the HMM approach after wavelet de-noising. Since data of influencing factors is not available, the forecasting accuracies for one week ahead forecasting of ANN and SVM are much worse than the proposed approaches; however, the forecasting accuracies were raised under one day ahead forecasting.
ACCEPTED MANUSCRIPT
34000
31000 Actual amount sARIMA ANN SVM sES HMM WDE-HMM
28000
25000 1
2
3
4
5 Day
6
CR IP T
Wastewater Generation / tonne
37000
7
8
9
Fig. 13. Forecasted and actual amount of the testing dataset
sARIMA*
sES*
ANN**
2086.3
1490.3
1395.1
9.44
4.04
5.54
-0.407
0.397
0.175
MAE 6
MSE (/10 ) 2
R
AN US
Table 7. Forecasting accuracies of different models for wastewater generation ANN*
SVM**
SVM*
HMM*
WDE-HMM*
1696.3
1462.3
1684.5
1272.9
986.3
6.76
4.71
6.29
3.28
2.03
-0.008
0.298
0.063
0.511
0.697
Note: * means one week ahead forecasting, and **means one day ahead forecasting. Adjusted R equals to R2,
M
since there are no extra predictors in this case.
2
ED
Next, a bootstrapping method was utilized to calculate 95% confidence intervals of forecasted amount derived from HMM and WDE-HMM approaches (Figs. 14 and 15). The 95%
PT
confidence intervals of WDE-HMM are narrower than those of HMM. The scope of the confidence intervals of WDE-HMM encapsulates a greater number of observation data points. This visually displays that the forecasting precision of WDE-HMM is higher than HMM.
CE
Fig. 15 shows that all actual wastewater generations (except for the 6th point) are within the scope of 95% confidence intervals, and that the trend of forecasted amount is similar to
AC
that of actual amount. The reason why the 6th point is not in the 95% confidence intervals is that new waste generation uncertainties exist for the current pattern which cannot be fully traced in history. However, the good news is that this new feature can be utilized in the future forecasting. In the case II, the way of searching the most similar pattern was used to forecast wastewater generation with periodic fluctuation for a large sample case. The results demonstrate that this approach is a good way to dynamically trace historical patterns. This also indicates that we have achieved the goal of forecasting fluctuations of waste generation.
40000 37000 34000
31000 Upper confidence interval Lower confidence interval Actual amount
28000 25000
1
2
3
4
5 Day
6
7
8
9
CR IP T
Wastewater Generation / tonne
ACCEPTED MANUSCRIPT
40000
34000 31000
AN US
37000
Upper confidence interval Lower confidence interval Actual amount
28000 25000 1
2
3
4
M
Wastewater Generation / tonne
Fig. 14. 95% confidence intervals of forecasted amount of the HMM approach
5 Day
6
7
8
9
ED
Fig. 15. 95% confidence intervals of forecasted amount of the WDE-HMM approach 5. Conclusions and future work
PT
In this study, we have proposed a probability model-driven statistical learning approach to forecast the fluctuant generation of municipal waste under uncertainties. The proposed
CE
approaches, which hybridizes a wavelet de-noising, a Gaussian mixture model, and a hidden Markov model, were tested by two kinds of case studies. They are Shanghai and Beijing
AC
annual MSW generation with non-periodic fluctuation, and daily wastewater generation with periodic fluctuation of the Manresa sewage treatment plant. The results have indicated that the HMM approach generally outperforms the ARIMA, sARIMA, ES, sES, ANN, and SVM approaches in both MAE, MSE, R2, and adjusted R2 criteria. The WDE-HMM approach performs better after the wavelet de-noising. The computational results have demonstrated that the proposed approaches are effective in solving the municipal waste generation forecasting problem. We presented two concrete forecasting ways, the way of weighted average Gaussian mixture (WAGM) and the way of searching the most similar pattern (SMSP), to forecast the
ACCEPTED MANUSCRIPT MSW generation with non-periodic fluctuation, and the wastewater generation with periodic fluctuation, respectively. Actually, different steps of state transition in the state probability matrix can be chosen in the way of WAGM. Therefore, the way of WAGM is suitable for short or medium term forecasting of both small and large samples; while the way of SMSP is advantageous for short term forecasting with large samples. In this study, we have implemented the fluctuation forecasting of waste generation which is found to be influenced by some latent influencing factors and their uncertainties. How
CR IP T
uncertainty factors influence forecasted results was illustrated in the subsection of results and discussion. In MSW generation cases, the fresh information is effectively utilized to update the state transition matrix to better foresee future trend.
Future work is needed to improve the EM algorithm. Due to the local optimum characteristics of the EM algorithm, the model was trained repeatedly. This required a
AN US
significant computing effort. In addition, one can further attempt to forecast middle-term waste generation via an improved hidden Markov model. Furthermore, n this study, the transition matrix is time homogeneous in each forecasting process although it can be updated online when new data is available, and can be updated once fresh information is obtained. However, non-homogeneous hidden Markov models are likely to achieve higher forecasting
M
accuracy.
ED
References
Abbasi, M., Abduli, M. A., Omidvar, B., & Baghvand, A. (2013). Forecasting municipal solid
PT
waste generation by hybrid support vector machine and partial least square model. International Journal of Environmental Research, 7(1), 27-38.
CE
Abbasi, M., Abduli, M. A., Omidvar, B., & Baghvand, A. (2014). Results uncertainty of support vector machine and hybrid of wavelet transform-support vector machine models for
AC
solid waste generation forecasting. Environmental Progress & Sustainable Energy, 33(1), 220-228.
Ahmad, K. (2012). A system dynamics modeling of municipal solid waste management systems in Delhi. Renewable Energy. 1, 628–641.
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Petrov, B., & Csaki, F., (Eds.), Second international symposium on information theory (pp. 267-281). Budapest. Ali Abdoli, M., Falah Nezhad, M., Salehi Sede, R., & Behboudian, S. (2012). Longterm forecasting of solid waste generation by the artificial neural networks. Environmental
ACCEPTED MANUSCRIPT Progress & Sustainable Energy, 31(4), 628-636. Antanasijevic, D., Pocajt, V., Popovic, I., Redzic, N., & Ristic, M. (2013). The forecasting of municipal waste generation using artificial neural networks and sustainability indicators. Sustainability science, 8(1), 37-46. Bache,
K.,
&
Lichman,
M.
(2013).UCI
Machine
Learning
Repository
[http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science. [last accessed on Aug 2, 2015].
review. Waste management, 28(1), 200-214.
CR IP T
Beigl, P., Lebersorger, S., & Salhofer, S. (2008). Modelling municipal solid waste generation: A
Bilmes, J. A. (1998). A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Technical Report, Computer
Science
ICSI-TR-97-021, 4(510): 126. Bureau,
S.
S.,
2011.
Institute,
University
of
AN US
International
Shanghai
statistical
California
Berkeley,
yearbook
2011
[http://www.stats-sh.gov.cn/data/toTjnj.xhtml?y=2011e]. [last accessed on Aug 2, 2015]. Bureau, B, S., 2011. Beijing statistical yearbook 2011 [http://www.bjstats.gov.cn/]. [last accessed on Aug 2, 2015].
M
Chang, C. C., & Lin, C. J. (2011). LIBSVM : a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3), 27.
ED
Chen, H. W., & Chang, N. B. (2000). Prediction analysis of solid waste generation based on grey fuzzy dynamic modeling. Resources, Conservation and Recycling, 29(1), 1-18.
PT
Cheon, S., Kim, S., Kim, J., & Kim, C. (2008). Learning Bayesian networks based diagnosis system for wastewater treatment process with sensor data. Water Science & Technology,
CE
58(12), 2381–2393.
Cherian, J., & Jacob, J. (2012). Management Models of Municipal Solid Waste: A Review
AC
Focusing on Socio Economic Factors. International Journal of Economics and Finance, 4(10), 131-139.
Denafas, G., Ruzgas, T., Martuzevicius, D., Shmarin, S., Hoffmann, M., Mykhaylenko, V., ... & Ludwig, C. (2014). Seasonal variation of municipal solid waste generation and composition in four East European cities. Resources, Conservation and Recycling, 89, 22-30. De Souza e Silva, E. G., Legey, L. F., & de Souza e Silva, E. A. (2010). Forecasting oil price trends using wavelets and hidden Markov models. Energy economics, 32(6), 1507-1519. Dong, M., Yang, D., Kuang, Y., He, D., Erdal, S., & Kenski, D. (2009). PM2.5 concentration
ACCEPTED MANUSCRIPT prediction using hidden semi-Markov model-based times series data mining. Expert Systems with Applications, 36(5), 9046-9055. Dyson, B., & Chang, N. B. (2005). Forecasting municipal solid waste generation in a fast-growing urban region with system dynamics modeling. Waste management, 25(7), 669-679. Efron, B. (1979). Bootstrap methods: another look at the jackknife. The annals of Statistics, 7(1), 1-26.
CR IP T
Ghiani, G., Laganà, D., Manni, E., Musmanno, R., & Vigo, D. (2014). Operations research in solid waste management: A survey of strategic and tactical issues. Computers & Operations Research, 44, 22-32.
Guerry, M. A. (2011). Hidden heterogeneity in manpower systems: A Markov-switching model approach. European Journal of Operational Research, 210(1), 106-113.
AN US
Guo, X., Liu, S., Wu, L., & Tang, L. (2014). Application of a Novel Grey Self-Memory Coupling Model to Forecast the Incidence Rates of Two Notifiable Diseases in China: Dysentery and Gonorrhea. PloS one, 9(12), e115664.
Hassan, M. R. (2009). A combination of hidden Markov model and fuzzy model for stock market forecasting. Neurocomputing, 72(16), 3439-3446.
M
Intharathirat, R., Salam, P. A., Kumar, S., & Untong, A. (2015). Forecasting of municipal solid waste quantity in a developing country using multivariate grey models. Waste
ED
Management, 39, 3-14.
Jalili Ghazi Zade, M., & Noori, R. (2008). Prediction of municipal solid waste generation by
PT
use of artificial neural network: A case study of Mashhad. International Journal of Environmental Research, 2(1), 13-22.
CE
Karavezyris, V., Timpe, K. P., & Marzi, R. (2002). Application of system dynamics and fuzzy logic to forecasting of municipal solid waste. Mathematics and Computers in Simulation,
AC
60(3), 149-158.
Katsamaki, A., Willems, S., & Diamadopoulos, E. (1998). Time series analysis of municipal solid waste generation rates. Journal of Environmental Engineering, 124(2), 178-183.
Kim, M. J., Jiang, R., Makis, V., & Lee, C. G. (2011). Optimal Bayesian fault prediction scheme for a partially observable system subject to random failure. European Journal of Operational Research, 214(2), 331-339. Kollikkathara, N., Feng, H., & Yu, D. (2010). A system dynamic modeling approach for evaluating municipal solid waste generation, landfill capacity and related cost management issues. Waste management, 30(11), 2194-2203.
ACCEPTED MANUSCRIPT Misiti, M., Misiti, Y., Oppenheim, G., & Poggi, J. M. (2011). Wavelet Toolbox™ 4 Getting Started Guide. The MathWorks, Inc.: Natick, MA, (Chapter1). Noori, R., Abdoli, M. A., Ghasrodashti, A. A., & Jalili Ghazizade, M. (2009). Prediction of municipal solid waste generation with combination of support vector machine and principal component analysis: A case study of Mashhad. Environmental Progress & Sustainable Energy, 28(2), 249-258. Owusu-Sekyere, E. (2013). Forecasting and Planning for Solid Waste Generation in the
CR IP T
Kumasi Metropolitan Area of Ghana: An ARIMA Time Series Approach. International Journal of Sciences, 2(2013-04), 69-83.
Pires, A., Martinho, G., & Chang, N. B. (2011). Solid waste management in European countries: A review of systems analysis techniques. Journal of environmental management, 92(4), 1033-1050.
AN US
Rabiner, L. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), 257-286.
Rimaityte, I., Ruzgas, T., Denafas, G., Racys, V., & Martuzevicius, D. (2012). Application and evaluation of forecasting methods for municipal solid waste generation in an eastern-European city. Waste Management & Research, 30(1), 89-98.
M
Setzler, H., Saydam, C., & Park, S. (2009). EMS call volume predictions: A comparative study. Computers & Operations Research, 36(6), 1843-1851.
ED
Song, J., & He, J. (2014). A multistep chaotic model for municipal solid waste generation prediction. Environmental engineering science, 31(8), 461-468.
PT
Sun, S., Zhang, C., & Yu, G. (2006). A Bayesian network approach to traffic flow forecasting. IEEE Transactions on Intelligent Transportation Systems, 7(1), 124-132.
CE
Ustun, O., & Kasimbeyli, R. (2012). Combined forecasts in portfolio optimization: a generalized approach. Computers & Operations Research, 39(4), 805-819.
AC
Wang, H., & Nie, Y. (2001). Municipal Solid Waste Characteristics and Management in China. Journal of the Air & Waste Management Association, 51(2), 250-263.
Xu, L., Gao, P., Cui, S., & Liu, C. (2013). A hybrid procedure for MSW generation forecasting at multiple time scales in Xiamen City, China. Waste Management, 33(6), 1324-1331. Zhou, Z. J., Hu, C. H., Xu, D. L., Chen, M. Y., & Zhou, D. H. (2010). A model for real-time failure prognosis based on hidden Markov model and belief rule base. European Journal of Operational Research, 207(1), 269-283.