Hidden Markov model for municipal waste generation forecasting under uncertainties

Hidden Markov model for municipal waste generation forecasting under uncertainties

Accepted Manuscript Hidden Markov model for municipal waste generation forecasting under uncertainties P. Jiang , X. Liu PII: DOI: Reference: S0377-...

1MB Sizes 11 Downloads 111 Views

Accepted Manuscript

Hidden Markov model for municipal waste generation forecasting under uncertainties P. Jiang , X. Liu PII: DOI: Reference:

S0377-2217(15)00842-5 10.1016/j.ejor.2015.09.018 EOR 13237

To appear in:

European Journal of Operational Research

Received date: Revised date: Accepted date:

22 September 2014 7 August 2015 9 September 2015

Please cite this article as: P. Jiang , X. Liu , Hidden Markov model for municipal waste generation forecasting under uncertainties, European Journal of Operational Research (2015), doi: 10.1016/j.ejor.2015.09.018

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT Highlights 

A model-driven learning method is proposed to trace fluctuations dynamically.



The fluctuation forecasting under uncertainties is made possible using the HMM.



Case studies of different scenarios are presented to show the validity of the HMM. Two forecasting formulas are proposed to cope with two scenarios separately.

AC

CE

PT

ED

M

AN US

CR IP T



ACCEPTED MANUSCRIPT

Hidden Markov model for municipal waste generation forecasting under uncertainties

CR IP T

P. Jiang, X. Liu*

Corresponding author: Professor Xiao LIU

AN US

Department of Industry Engineering, Shanghai Jiao Tong University, P. R. China

Department of Industrial Engineering management, Shanghai Jiao Tong University, 800 Dongchuan Road, Min-Hang District, Shanghai 200240, P. R. China

M

Email: [email protected]

AC

CE

PT

Fax: +65-6601-4089

ED

Phone: +65-6601-4089

ACCEPTED MANUSCRIPT Abstract: Waste generation forecasting is a complex process that is found to be influenced by some latent influencing parameters and their uncertainties, such as economic growth, demography, individual behaviors, activities and events, and management policies. These hidden features play an important role in forecasting the fluctuations of waste generation. We therefore focus on revealing the trend of waste generation in megacities which face significant influences of social and economic changes to achieve urban sustainable development. To dynamically trace fluctuations caused by these uncertainties, we propose a probability

CR IP T

model-driven statistical learning approach which hybridizes a wavelet de-noising, a Gaussian mixture model, and a hidden Markov model. First, to gain the actual underlying trend, wavelet de-noising is used to eliminate the noise of data. Next, the Expectation-Maximization and the Viterbi algorithms are employed for learning parameters and discerning the most probable sequence of hidden states, respectively. Subsequently, the state transition matrix is

AN US

updated by fractional predictable changes of influencing parameters to perform non-periodic fluctuation problem forecasting, and the forward algorithm is utilized to search the most similar data pattern for the current pattern from historical data in order to forecast the future trend of the periodic fluctuation problem. Finally, we apply the approaches into two kinds of case studies that test both a small dataset and a large dataset. How uncertainty factors

M

influence forecasted results is analyzed in the subsection of results and discussion. The computational results demonstrate that the proposed approaches are effective in solving the

PT

ED

municipal waste generation forecasting problem.

Keywords: Forecasting, Waste generation, Wavelet de-noising, Hidden Markov model,

AC

CE

Gaussian mixture model

ACCEPTED MANUSCRIPT 1. Introduction Accurate forecasting of municipal waste generation can provide theoretical guidance in disposal capacity design and capital budget planning for waste management systems (Beigl et al., 2008; Cherian and Jacob, 2012; Rimaityte et al., 2012; Intharathirat et al., 2015). Waste generation forecasting in megacities is a complex process that is found to be influenced by some latent influencing factors and their uncertainties, such as rapid economic growth or

CR IP T

economic crisis, relentless demographic forces, ever-changing individual behaviors, special activities and events, and complex waste management measures (Pires et al., 2011; Cherian and Jacob, 2012; Denafas et al., 2014; Ghiani et al., 2014). For example, the annual waste generation trend in a megacity could be altered by large-scale activities (such as the Olympic Games and the World EXPO), long-term epidemic diseases (such as SARS), or a new

AN US

regulation policy for municipal waste. The fluctuation of waste generation comes from unmeasured uncertainties of the economy, individual behaviors, population mobility, and unexpected disasters. The latent factors and their uncertainties are difficult to quantify, and they play an important role in tracing the fluctuation of waste generation dynamically and forecasting the consequences of changes. The fluctuation and nonlinearity of both municipal

M

solid waste (MSW) and wastewater generation is a huge obstacle for accurate forecasting. It is difficult to forecast fluctuation of waste generation accurately via traditional statistical models,

ED

such as the moving average and linear regression models. In this study, we focus on fluctuation forecasting problem. Three major contributions are made in this paper: 1) A probability model-driven statistical learning approach, that hybridizes

PT

a wavelet de-noising (WDE), a Gaussian mixture model (GMM), and a hidden Markov model (HMM), is proposed to trace waste generation fluctuations under uncertainties dynamically; 2)

CE

The way of weighted average Gaussian mixture and the way of searching the most similar pattern are proposed as two concrete HMM-based forecasting ways to cope with the periodic

AC

and non-periodic fluctuation problems, respectively; and 3) Two kinds of case studies of different scenarios are presented in this study. They correspond to a small sample and a large sample datasets to show the effectiveness of the proposed approaches. The first case uses annual MSW generation in Shanghai and Beijing from 1978 to 2010, and the second case utilizes the daily wastewater level of a Spanish urban sewage treatment plant from 1990 to 1991. The paper is organized as follows: Section 2 summarizes the details of related studies. Problem description and forecasting modelling are presented in Section 3. Section 4 provides

ACCEPTED MANUSCRIPT two kinds of case studies to test the effectiveness of the proposed approaches. Finally, we close this paper with some conclusions and guidelines for future work. 2. Related works: a summary So far, a number of forecasting techniques have been applied to waste generation forecasting. Mainstream methods include traditional statistical models, the gray and fuzzy models, simulation models, and non-probabilistic statistical learning models. In this section, we

CR IP T

summarize the application scope, advantages, and weaknesses of these mainstream methods. Subsequently, literature of probability model-driven statistical learning approaches as a comparison is introduced briefly.

Traditional statistical models include linear regression models (LRM), autoregressive moving average (ARMA), autoregressive integrated moving average (ARIMA), seasonal

AN US

ARIMA (sARIMA), exponential smoothing (ES), and seasonal ES (sES). LRM aims to discover a linear function to quantify waste generation. Essentially, the problem of waste generation is nonlinear, so LRM is just an approximate fitting method for waste generation forecasting. The ARMA, a mixture of autoregressive and moving average models, can be employed to forecast waste generation of stationary time series (Katsamaki et al., 1998).

M

ARIMA stochastic models are effective time series forecasting tools which can take influence factors into consideration, and can also process periodic time series data effectively. ARIMA

ED

model has been applied to a range of forecasting fields, such as portfolio optimization (Ustun and Kasimbeyli, 2012), and MSW generation (Owusu-Sekyere, 2013). The ES model is

PT

performed by utilizing the weighted mean of all values in time series, which assigns exponentially decreasing weights over time. Both sARIMA and sES can take into account

CE

seasonal changes and trends (Xu et al., 2013; Song and He, 2014; Rimaityte et al., 2012; Denafas et al., 2014). Traditional statistical models can forecast the moving average or discover smooth trend effectively; however, fluctuations are hard to be traced accurately.

AC

The gray and fuzzy models are based on fuzzy set theory. The fuzzy mathematic method

can solve problems with uncertainty, and can also obtain reliable models given poor data. Chen and Chang (2000) applied a gray fuzzy dynamic prediction technique to forecast MSW generation when the sample numbers were low. Karavezyris et al. (2002) employed fuzzy logic to enhance confidence in the validity of the system dynamics model. Fuzzy logic is an attractive method; however, the knowledge base of the fuzzy model is founded on expert experience. Gray model is suitable to study the uncertainty of systems and handle small sample dataset (Intharathirat et al., 2015); however, the conventional gray prediction model is sensitive to

ACCEPTED MANUSCRIPT initial values (Guo et al., 2014). Simulation models are suitable for simulating complex systems which are difficult to express via mathematic formula. System dynamics (SD) is one of the promising of these methods for studying complex feedback systems. SD models have been widely used for predicting waste generation (Karavezyris et al., 2002; Dyson and Chang, 2005; Kollikkathara et al., 2010; Ahmad, 2012). SD models use a multivariate method and are far more complex due to the multifarious interactions between the selected parameters (Beigl et al., 2008). Therefore,

CR IP T

it is hard to achieve model validation.

Non-probabilistic statistical learning models for waste generation forecasting mainly include artificial neural networks (ANN) and support vector machines (SVM). ANN, one of the most widespread of intelligent statistical-learning models, has powerful ability to deal with nonlinear forecasting problems (Setzler et al., 2009) and has been applied to generation

AN US

forecasting fields (Jalili Ghazi Zade and Noori, 2008; Ali Abdoli, 2012; Antanasijevic et al., 2013). Although ANN is promising, the disadvantages, such as over-fitting and local minimum, limit its application. SVM is a nonlinear and kernel-based intelligent technique. Noori et al. (2009), Abbasi et al. (2013), and Abbasi et.al. (2014) proposed SVM models to implement weekly prediction of waste generation in the Iranian cities of Mashhad and Tehran, respectively.

M

Compared with ANN, SVM overcomes the problem of over-fitting training data and local minimum. However, SVM is still a black box technique which cannot establish a mathematical

ED

functional mapping between the input variables and the drifting parameters (Dong et al., 2009). Recently, probability model-driven statistical learning approaches have been effectively

PT

applied to perform forecasting under uncertainties. Compared with ANN and SVM, they are white box methods which focus more on the components of the mathematical model. For

CE

example, Bayesian network (BN) is a combination of graph theory and probability theory which has unique advantages for handling problems with uncertainty, and mining causality

AC

among variables. BN has been introduced into forecasting areas, such as traffic flow forecasting (Sun et al., 2006) and wastewater inflow monitoring (Cheon et al., 2008). The quality of results produced by the BN method largely depends on the amount of evidence (Dong et al., 2009); however, they are not available in most cases. Another example is HMM which became popular for classification and pattern recognition problems after its appearance. The HMM has been applied to a range of fields, such as stock market forecasting (Hassan, 2009), PM2.5 concentration prediction (Dong et al., 2009), failure prognosis (Zhou et al., 2010; Kim et al., 2011), and manpower planning (Guerry, 2011). However, to the best of our knowledge, no study has previously forecasted waste generation via an HMM.

ACCEPTED MANUSCRIPT Although the first four stream methods have respective merits for various applications, there are still some difficulties using these methods for measuring and tracing fluctuations of waste generation dynamically. The advantages of the HMM lie in its theoretical basis, rigorous mathematical structure, and its proven fitness for modelling dynamic systems under uncertainties. 3. Model

CR IP T

In this section, the study problems and their characteristics are first introduced. Then, models are constructed to forecast MSW and wastewater generation. 3.1. Problem description

Municipal waste generation in megacities is a complex process of the interaction of

AN US

intertwined systems, which is found to be influenced by latent influencing factors and their uncertainties. So waste generation has the characteristics of nonlinearity, fluctuation, and strong interference. The key challenges for waste generation forecasting problem of both MSW and wastewater lie in generation fluctuations which include non-periodic and periodic fluctuation. Annual MSW generation is a non-stationary time series. The increment rate,

M

transformed from MSW generation, shows an obvious non-periodic fluctuation (Fig. 1). Daily wastewater generation level is an approximate stationary time series with periodic fluctuation

AC

MSW Generation / tonne

CE

PT

MSW Increment Rate / %

ED

(Fig. 2).

Year

Fig. 1. MSW generation and the increment rate of generation in city X

Waste Generation / tonne

ACCEPTED MANUSCRIPT 109

107 105 103 101 99

97 95

1

2

3

4

5

6

7

8

9

Period

CR IP T

Fig. 2. Wastewater generation in city Y

Two techniques based on HMM are proposed to deal with above challenges of fluctuations:

1) For the periodic fluctuation problem, all data from each period are selected to set up data patterns. For instance, we will obtain 9 patterns as shown in Fig. 2. Once two similar

AN US

patterns are revealed (e.g., patterns 3 and 6, or patterns 4 and 7), we employ the similar pattern to forecast waste generation trend in the coming period.

2) For the non-periodic fluctuation problem, the variable to be forecasted and other observable factors can be integrated to constitute data patterns. Then, we can use the approach as above to forecast waste generation. However, if the number of searchable patterns is too

M

small, the error of forecasted results becomes unacceptable. In this situation, one alternative

ED

forecasting way (weighted average Gaussian mixture) is employed to solve this problem. 3.2. Waste generation forecasting modelling

PT

A probability model-driven statistical learning approach, which hybridizes a wavelet de-noising, a Gaussian mixture model, and a hidden Markov model, is constructed to dynamically trace fluctuations of waste generation. An HMM is a double embedded stochastic

CE

process which includes a Markov process and a general stochastic process. Fig. 3 shows a simple HMM component structure sketch with a hidden state sequence and a directly visible

AC

observation sequence.

q1

q2

...

qt

...

qT

Hidden state sequence

O1

O2

...

Ot

...

OT

Observation sequence

Fig. 3. Sketch of the structure of a simple hidden Markov model

ACCEPTED MANUSCRIPT In this study, we define the total uncertainty degree of waste generation as a hidden variable. The hidden state sequence is assumed to follow a discrete-time, finite-state, and first-order Markov chain which implies that the current state is dependent only on the previous state. We also regard the variable to be forecasted as an observation which is a continuous random variable. The observation is assumed to be conditionally independent of all others given the state that generated it. Uncertainties do not need to be measured directly; instead, we let historical data tell us what the total uncertainty degree is most likely to be

CR IP T

through data analysis.

Fig. 4 illustrates the process of HMM-based forecasting which includes three parts: Part I: data preprocessing, Part II: HMM modelling, and Part III: forecasting. There are two scenarios which follow different processes in Part III. Scenario I is for the non-periodic fluctuation problem with small sample sizes, and Scenario II is for the periodic fluctuation

AN US

problem with large sample sizes. In this paper, we present two kinds of case studies that represent the different scenarios. MSW generation forecasting for Shanghai and Beijing (Scenario I) is carried out via the way of weighted average Gaussian mixture (WAGM) based on an HMM, while wastewater generation forecasting for the Spain Manresa sewage

(SMSP) based on an HMM.

ED

PT

CE

Part II:Hidden Markov Model

Forecasting by WAGM Forecasting by SMSP

Data post-processing

Viterbi decoding

Training and parameter determination

Parameter initialization

Part I:Preprocessing

Scenario I

HMM modelling

Scenario II

Data preprocessing

Variable selection

Scenario I

M

treatment plant (Scenario II) is implemented via the way of searching the most similar pattern

Part III:Forecasting

Scenario II

AC

Fig. 4. The process of HMM-based forecasting for two different scenarios

3.2.1. Preprocessing - Part I This subsection mainly focuses on the preprocessing process for the whole model. This

involves two steps: variable selection and data preprocessing. Step1. Variable selection The total uncertainty of latent influencing factors (non-observable variables) is taken as an integrated hidden variable which is called the total uncertainty degree of waste generation.

ACCEPTED MANUSCRIPT The variable to be forecasted is regarded as a manifest variable (i.e., observation variable). In Fig. 3, hidden state sequence is composed of different total uncertainty degrees of waste generation, and observation sequence is consist of the actual values of the manifest variable. Step2. Data preprocessing This subsection includes wavelet de-noising and data pattern setup. The former is a tool for removing noise in the data series, and the latter is the basis step of pattern recognition in

CR IP T

an HMM. (I) Wavelet de-noising

The municipal waste generation is fluctuant and nonlinear due to uncertainties. The statistic data of waste generation is not a true reflection of the underlying trend caused by sources of noises. Wavelet de-noising, that uses scaled and translated versions of the mother

AN US

wavelet ψ j ,k ( x) to build an approximation f ( x) of an original function f ( x) , is a reliable method to eliminate the data noise of a time series signal (De Souza Silva et al., 2010). Wavelet translation can be expressed as follows:

 c j ,k j ,k ( x)

(1)

ψ j ,k ( x) = δψ (2 j x - k )

(2)

f ( x) 

M

j ,k

ED

c j ,k  





f ( x) j ,k ( x)dx

(3)

where c j ,k is wavelet coefficients; δ is a constant; k decides the translated versions of a

PT

wavelet; and 2 j is its scale parameter.

Ingrid Daubechies invented compactly supported orthonormal wavelets, which makes the

CE

analysis of discrete wavelets practicable. The families of orthonormal wavelets are also called Daubechies wavelets. The Symlets are nearly symmetrical wavelets that were proposed as

AC

modifications to the Daubechies wavelets (Misiti et al., 2011). In this study, the Daubechies and Symlets

wavelets were employed to carry out wavelet decomposition and

synthesis-reconstruction. After wavelet decomposition, data signal could be decomposed into two categories of filters: low-pass filter (the approximation part) and high-pass filter (the detailed parts). The underlying trend of the all factors can be captured by the approximation part, and the noise features can be recorded by the detailed parts. Eq. (4) reconstructs the approximation part ( AJ ) and the detailed parts ( D1 , D2 ,..., DJ ) into a new time series data signal (Misiti et al., 2011):

ACCEPTED MANUSCRIPT (4)

S = AJ + D1 + D2 +... + DJ

where S is new signal, and J is scale parameter. After reconstruction, the new time series will be regarded as updated data for training and testing of the WDE-HMM approach. This will be illustrated by a specific example in Section 4. (II) Data pattern setup Data patterns that can reflect the characteristics of fluctuation refer to specific and similar

CR IP T

data vectors which follow a defined rule. In order to forecast fluctuations of waste generation in a large sample via an HMM, datasets of observations are usually arranged in a specific order so that every data vector forms a pattern which is the basis for data pattern recognition in an HMM. For example, the wastewater generation data for each week (no data on Saturday) of the Manresa sewage treatment plant is arranged in a data pattern [Mon., Tues., Wed., Thur.,

AN US

Fri., and Sun.]. This pattern can interpret the wastewater generation behavior of each week. Waste generation in historical municipal activities or incidents can provide reference for future forecasting. When a similar situation reoccurs, this historical pattern can be used to forecast. 3.2.2. Hidden Markov model - Part II

M

The main difference between continuous HMMs and discrete HMMs lies in the statistic property of observations. In this study, the observation of each state is a vector of continuous

ED

random variables. This subsection includes four steps: continuous HMM modelling, parameter initialization, training and parameter determination, and Viterbi decoding.

PT

Step 1. Continuous HMM modelling Fig. 5 shows the translation relationship of hidden states and observations in a continuous

CE

HMM. For each hidden state, a probability exists for the current state to transform into any possible state, while each observation vector (i.e., data pattern) can be observed with a

AC

probability in its corresponding hidden state.

ACCEPTED MANUSCRIPT a1N a1j

a2N a2j

a1i a11

a22 a12 a21

s1

s2

aii a2i ... ai2

ai1

aiN ajj aij ... aji

si aj2

sj

aNN ajN ... aNj

sN

aNi aN2

aj1

b1(O)

b2(O)

O

O

bi(O) ...

O

CR IP T

aN1 bj(O) ...

O

bN(O)

...

O

Hidden states

Observation vector

Transition probability

Emission probability

AN US

Fig. 5. The relationship of hidden states and observations in continuous HMM

The primary notations that will be used for the rest of this paper are listed as follows:

N : The number of the hidden states of this model;

M

T : Length of the state sequence or observation sequence. t = 1, 2,..., T ;

S : A set of states. S = {s1 , s2 ,..., si ,..., s N } and i = 1, 2,..., N ;

t;

ED

Q : State sequence {q1, q2 ,..., qT } , where qt = si represents the hidden state at time

O : Observation vector sequence, O = {O1 , O2 ,..., Ot ,..., ΟT } ;

PT

π : The prior probability of initial state { πi };

A : Transition matrix A={aij } , aij represents state transition probability from the

CE

hidden state i to state j , j = 1, 2,..., N ; B : A family of probability density functions B = {b j (O)} ;

AC

b j (O ) : the occurrence probability of observation at state j . b j (O ) is illustrated by a

widely used multi-dimensional Gaussian distribution: b j (O) = ∑P( K j = k | st = j ) b jk (O) k

= ∑P( K j = k | st = j )Φ[O, μ jk , Σ jk ] k

∑c jk [O, μ jk , Σ jk ],

1  j  N ,1  k  K

(5)

k

where O is the vector of observations being modeled; [O, μ jk , Σ jk ] is Gaussian density;

ACCEPTED MANUSCRIPT c jk , μ jk , and Σ jk represent mixture coefficients, mean vector, and covariance matrix for the

k th mixture component at state j , respectively; and

∑c

jk

=1;

k

λ : The overall parameters λ = (π, A, B) , and



πi = 1 , ∑aij = 1 ,  b j ( x)dx  1 . ∑  i j

Step 2. Parameter initialization

CR IP T

Prior knowledge about probabilities of parameters λ = (π, A, B) is extremely useful for parameter initialization, but it is often not available. For case with prior information, parameters can be initialized easily. For case without prior knowledge, we only consider the basic constraints above to initialize parameters ( λ ) randomly. Note that the proper number of hidden states is unknown in this step which will be eventually decided in parameter

AN US

determination of Step 3. Step 3. Training and parameter determination

In this step, parameters are updated through training the HMM. The most suitable number of hidden states and the proper number of mixture components of a GMM are determined by

M

Akaike’s information criterion (AIC). (I) Training

ED

In the training step, once we obtain the observation sequence O = {O1 , O2 ,..., Ot ,..., ΟT } the parameters ( λ ) of an HMM can be trained by the Expectation-Maximization (EM) algorithm (Rabiner, 1989; Bilmes, 1998) which iterates ceaselessly until it converges to the maximum

PT

likelihood. After updating all parameters, the new parameter set is λML = (π , A, B ) = arg max P (O | λ) , where π = {πi } , A = {aij } , and B = {b j (O)} . The updated initial state

CE

λ

probability ( πi ), transition probability ( aij ), mixture coefficients ( c jk ), mean vector ( μ jk ), and

AC

covariance matrix ( Σ jk ) are expressed as follows:

πi = ξ1 (i)

(6)

T -1

T -1

t =1

t =1

T

T

aij =∑ξt (i, j ) /∑ξt (i ) K

c jk = ∑ξt ( j , k ) /∑∑ξt ( j , k ) t =1

T

μ jk =

(7)

(8)

t =1 k =1

T

ξt ( j, k ) Ot /∑ξt ( j, k ) ∑ t =1 t =1

(9)

ACCEPTED MANUSCRIPT

Σ jk =

T

T

t =1

t =1

∑ξt ( j, k ) (Ot - μ jk )(Ot - μ jk )T /∑ξt ( j, k )

(10)

where ξt (i) and ξt ( j ) represent the probability of being in si and s j at time t , respectively; ξt (i, j ) represents the probability of being in si at time t and in s j at time

t +1 ; ξt ( j, k ) represents the probability of being in s j at time t with the k th mixture component; and Ot is the observation at time t in the observation sequence. The

N

ξt (i) = αt (i) βt (i) /∑αt (i) βt (i) i =1

N

ξt ( j ) = αt ( j ) βt ( j ) /∑αt ( j ) βt ( j )

ξt (i, j ) =

AN US

j =1

CR IP T

forward-backward variables α and β can be used to illustrate these probabilities:

αt (i )aij b j (Ot +1 ) βt +1 ( j ) N

N

(11)

(12)

(13)

∑ ∑αt (i)aij b j (Ot +1 ) βt +1( j) i =1 j =1

M

   t ( j )  t ( j )   cjk[Ot , μ jk , Σ jk ]   t ( j , k )   N  t ( j )  t ( j )   K cjk[O , μ , Σ ]  t jk jk    j 1    k 1

(14)

ED

Since the EM algorithm easily converges to a local optimum or a saddle point, the log-likelihood estimator of the EM algorithm may less than from the maximum log-likelihood

PT

estimator. To improve the quality of parameter estimation, in this study, the model is trained by varied initial parameters ( λ ) with multiple runs to obtain the updated parameters ( λ ) by

CE

choosing the biggest log-likelihood estimator. (II) Parameter determination It is difficult yet crucial to determine the number of hidden states and components of the

AC

GMM in an HMM. Tradeoff always exists between a more accurate result and a smaller computing complexity for this problem. In this paper, AIC (Akaike, 1973), which is a simple and useful model selection criterion based on K-L information, is employed to determine the numbers, as the method attempts to optimize the tradeoff between complexity and goodness of fit (Eq. (15)). We consider the number combination of hidden states and mixture components with the smallest AIC as the best parameter for modeling.

AIC = -2ln( L) + 2 p

(15)

p=N +N ( N -1) + N ( K + K ( K+1) / 2)

(16)

ACCEPTED MANUSCRIPT where L is the likelihood function; p is the number of parameters to be estimated which include state, transition, and emission parameters; N is the number of hidden states; and K is the number of components of the GMM. Step 4. Viterbi decoding In the decoding step, the new parameter set λ and the observation sequence O are

Viterbi decoding (Rabiner, 1989). Q* can be expressed as: _

Q* = arg max{P(Q | O, λ)} Q

CR IP T

available. The most probable sequence of states Q* = {q1* , q2* ,..., qT* } can be calculated by the

(17)

Let δt (i) be the maximal probability of state sequences { q1, q2 ,..., qt , qt = si } at time t

AN US

which generate the observations { O1, O2 ,..., Ot } for the given model.

_

δt (i) = max P(q1, q2 ,..., qt , qt = si , O1, O2 ,..., Ot | λ) q1 ,q2 ,...,qt -1

(18)

The Viterbi decoding is a dynamic programming method. Q* = {q1* , q2* ,..., qT* } can be

M

derived when achieving recursive results as max[δt (i)] . 3.2.3. Forecasting and post-processing - Part III

ED

This subsection includes two steps: forecasting waste generation for the two scenarios in Fig. 4 and data post-preprocessing by the bootstrapping.

PT

Step 1. Forecasting

The way of weighted average Gaussian mixture and the way of searching the most similar

CE

pattern (Hassan, 2009) can be employed to forecast non-periodic and non-periodic fluctuations via the HMM, respectively. After repeated experiments, we found that the former is suitable

AC

for both small sample (approximately 25 to 30 data points with non-periodic fluctuation) and large sample forecasting, and that the latter is proper for large sample forecasting. (I) Forecasting way for Scenarios I Conventionally, a sample size less than 30 is considered small, making it difficult to detect similar patterns from historical data with such limited training data. In this situation, the way of weighted average Gaussian mixture, which integrates states transition and observations probability, is a better choice than the way of searching the most similar pattern. The state transition matrix A(i, j ) , learned by the training dataset, can reflect historical

ACCEPTED MANUSCRIPT relations among hidden states. Some fresh information about changes of fractional latent influencing parameters is available at some forecasting points beforehand, such as a new issuing management policy on controlling MSW generation, a deteriorating economic crisis, or an expected large-scale activity. Although the controlling effects of the policy, real deteriorating degree of the crisis, and concrete participation degree of residents in the activity are also uncertain, they can still provide useful information to update the state transition

Fτ = Fτ-1 (1+ rτ )

 N A (qT , s j ) E j ( X ),  j 1 r   N    j 1 A (qT , s j ) E j ( X ),

CR IP T

matrix A(i, j ) to a new one A(i, j ) . The forecasted amount Fτ at time T+τ is given by: (19)

if fresh information exists

(20)

if no fresh information

AN US

where rτ is the MSW increment rate at time T+τ , forecasted via the proposed model; F0=OT when τ = 1 ; X

is a continuous random variable which represents the MSW

increment rate; E j ( X ) is the expectation of X under hidden state j ; qT is the hidden state at time T ; s j represents one state in the set of states S = {s1, s2 ,..., sN } ; and updated

M

Aτ (qT , s j ) is the τ -step state transition probability from the state of time T to the state s j .

(II) Forecasting way for Scenarios II

ED

For a large data sample, once the HMM is trained, the log-likelihood estimator of each data pattern can be calculated by the forward algorithm (Rabiner, 1989). Then, we use Eq. (21)

PT

to discover the time point with the most similar pattern in historical data to the current pattern

CE

(i.e., the pattern at time T).

t *  arg min{| LLET  LLEt |} 1  t  T  1

(21)

t

where LLET and LLEt are log-likelihood estimators at current time ( T ) and historical time

AC

( t ), respectively.

Theoretically, the nearest log-likelihood estimators can be discerned at time point t*

* from the historical data. The waste generation behavior of time point t is most similar to

that of current time point T . We assume that the increment rate that follows the reference point is the same as the point we are attempting to forecast. Thus, the i th forecasted amount in the data pattern at time T +1 is Fi : Fi = OTi [(Oti* +1 - Oti* ) / Oti* +1] 1  i  I

(22)

ACCEPTED MANUSCRIPT where I is the element number of the data pattern defined in a specific case. Step 2. Data post-processing Discovering the most similar pattern is crucial for the second forecasting way. Due to the local optimum characteristic of EM algorithm, the HMM would find several similar patterns to the current one from the historical data. The local optimum may be a better predictor of behavior than a temporally distant global optimum as similar patterns that are closer in time

CR IP T

should have more impact on one another. Here we make use of this fact by training the model via multi-simulation of K runs. After each multi-simulation training, we extract the most similar pattern and calculate a forecasted amount Fk ( k = 1, 2,..., K ). Then, the bootstrapping method (Efron, 1979) was utilized to estimate 95% confidence intervals of the forecasted

sample dataset with unknown distribution. 4. Case study

AN US

amount {F1 , F2 ,..., FK } , which is an especially proper statistical resampling algorithm for small

This section illustrates two kinds of cases, that are yearly MSW generation with non-periodic fluctuation for both Shanghai and Beijing (case I), and daily wastewater

M

generation with periodic fluctuation for the Manresa sewage treatment plant (case II). The original datasets were extracted from the Shanghai Statistical Yearbook 2011 (Shanghai

ED

Bureau, 2011), the Beijing Statistical Yearbook 2011 (Beijing Bureau, 2011), and the UC Irvine Machine Learning Repository (Bache and Lichman, 2013), respectively.

PT

To assess the forecasting accuracy, four frequently used criteria were applied for comparisons of the proposed models and other models. The mean absolute error (MAE), the

CE

mean square error (MSE), the coefficient of determination ( R 2 ), and the adjusted coefficient of

AC

determination ( adjusted R2 ) are given by: MAE=

1 n ∑ A -F n i =1 i i

(23)

MSE=

1 n ( Ai -Fi )2 ∑ n i =1

(24) n

n

i =1

i =1

_

R 2 = 1- SSE / SST = 1- ∑( Ai -Fi ) 2 / ∑( Ai - A) 2 adjusted R 2 = 1 -

n -1 SSE / SST n - k -1

(25) (26)

where Ai is the actual amount; Fi is the forecasted amount; n is the testing sample size; k

ACCEPTED MANUSCRIPT is the number of explanatory variables. ARIMA and sARIMA, ES and sES, ANN, and SVM, introduced in Section 2, are all widely used and effective models to perform waste generation forecasting. Therefore, they were also implemented to compare and assess the effectiveness of the proposed models. For building ARIMA and sARIMA models, the autocorrelation function plot and AIC were both considered to discover the best-fitted model from the profile of different parameters (p, d, q). To build ES model for non-seasonal trend forecasting, the best suitable model is selected out

CR IP T

among simple ES, double (Brown) ES, linear (Holt) ES, and damped-trend linear ES. To analysis seasonal trend by sES model, simple sES, winters additive algorithm based ES, and winters multiplicative algorithm based ES were tested on the formatted time series. The most suitable ES or sES models were selected by the Expert Modeler in the SPSS 19.0 software. The BP neural network, the most frequently used architecture of ANN, was utilized in this

AN US

study, which is a three-layer network including input layer, hidden layer, and output layer. The classical Levenberg-Marquardt backpropagation algorithm is employed to update weights and bias states. Optimized parameters of ANN are given in the subsection of 4.1.3 and 4.2.3. The SVM regression was performed using the LIBSVM software (Chang and Lin, 2011). SVM regression with four types of kernel functions (i.e. linear, polynomial, radial basis

M

function, and sigmoid) were tested to find out the most appropriate kernel function for the given application, according to criteria of MSE and R2 during training and testing stages. A

ED

two-step grid search method consisted of coarse grid search and finer grid search was used to select the best parameter of gamma function and loss function. In the process of parameters

PT

selection, the cross validation was employed to avoid over fitting problem.

CE

4.1. Case I: MSW generation forecasting 4.1.1. Background

AC

Shanghai, the economic center of China, is a megacity with a resident population over 23 million and yearly GDP of more than ¥2 trillion; Beijing, the capital city of China, is another megacity even in the world. The waste generation of both Shanghai and Beijing remains a strong upward trend over the past three decades. Fig. 6 shows MSW generation with non-periodic fluctuation in these two cities from 1978 to 2010.

750 650 550

Shanghai

Beijing

450 350 250 150 50 1978 1980 1982 1984 1986 1988 1990 1992 1994 1996 1998 2000 2002 2004 2006 2008 2010 Year

CR IP T

Solid Waste Generation / 104 tonne

ACCEPTED MANUSCRIPT

Fig. 6. MSW generation from 1978 to 2010 of Shanghai and Beijing 4.1.2. Forecasting process

The dataset of waste generation increment rate was chosen as input data in order to meet

AN US

the requirement of stationary time series which was tested by the autocorrelations function and the Q-Q plot. An assumption test method, the one-sample Kolmogorov-Smirnov test, was used to explore the distribution of input data which can be utilized to initialize the state transition matrix. The testing results are presented in Table 1, which show that we cannot reject the null hypothesis that the data in MSW increment rate dataset comes from a normal

M

distribution at the significance level of 0.05. Testing for independent and identically distributed (i.i.d.) hypothesis with the correlogram was implemented for the time series of the

ED

MSW increment rate. The results show that only 1 of the 32 values of ρˆ k lies outside the bounds ±1.96 / T ( α = 0.05 ), and there is no evidence to reject the null hypothesis

PT

( H 0 : ρk = 0 ) that the observations are independently distributed. The selected dataset was divided into two sets, one for training and the other for testing.

CE

To avoid random error, the HMM was tested using data from 2004 to 2010. We trained it using the data from 1978 to 2003 to forecast waste generation in 2004, continued to train it

AC

using the data from 1978 to 2004 to forecast waste generation in 2005, and so forth. In order to forecast waste generation via the WDE-HMM approach, waste generation data signal was decomposed by Symlets of level 5 for Shanghai and level 3 for Beijing, and then de-noised by the penalized high thresholding function via the MATLAB wavelet toolbox. Using AIC, we define the number of hidden states (N) as 5 and the mixture components (K) of a GMM as 3 (see Table 2).

Table 1. One-sample Kolmogorov-Smirnov test for waste increment rate datasets

ACCEPTED MANUSCRIPT

Sample Size

Beijing (1978-2010) 32

Shanghai (1978-2010) 32

Beijing (1978-2003) 25

Shanghai (1978-2003) 25

Normal (Mean/SD)

0.077/0.061

0.066/0.093

0.075/0.057

0.075/0.103

Most Extreme Differences

0.117

0.162

0.162

0.166

Kolmogorov-Smirnov Z

0.664

0.915

0.810

0.830

Asymp. Sig. (2-tailed)

0.769

0.373

0.528

0.495

NHS

NGMC

MLE

AIC

NHS

NGMC

MLE

AIC

1

-160.5

350.9

4

3

-113.5

331.1

3

2

-142.7

333.5

4

4

-95.8

335.7

3

3

-131.9

335.9

5

1

-143.2

356.3

3

4

-120.5

342.3

5

2

-118.7

337.3

4

1

-154.3

356.5

5

3

-91.0

322.0

4

2

-125.5

323.1

5

4

-74.4

338.8

NHS

NGMC

MLE

AIC

6

1

-130.6

357.1

6

2

-101.1

334.1

6

3

-72.1

324.3

6

4

-50.4

340.9

AN US

3

CR IP T

Table 2. AIC scores of different hidden Markov models

Note: NHS=the number of hidden states; NGMC=the number of GMM mixture components; MLE=the maximum log-likelihood estimator.

M

In this study, the total uncertainty degree of waste generation (denoted as ζ ) is taken as a hidden variable. The lower the total uncertainty degree is, the smaller the increment rate

ED

fluctuation around the mean value will be; while the higher it is, the larger the increment rate fluctuation will be. ζ=ζ (ς1, ς2 , ς3 , ς4 , ς5 ,...) expresses that ζ is correlated to a number of

PT

uncertainties, mainly such as uncertainty from the economy ( ς1 ), population mobility ( ς 2 ), individual behaviors ( ς3 ), management policy ( ς 4 ), unexpected disasters ( ς5 ), etc. In case I,

CE

the number of hidden states is five. So total uncertainty degree can be divided into five discrete states (denoted as “–high,” “–low,” “0,” “+low,” and “+high”), where “–” means total

AC

uncertainty of negative effects which resulting in a waste increment rate that is lower than the mean increment rate, while “+” indicates the total uncertainty of positive effects which cause the waste increment rate to be higher than the mean. The initial state transition probability among different total uncertainty degrees can be expressed by Gaussian distribution of the MSW increment rate and a random parameter, for example, A0 (3, 3) = Φ( μ + ασ ) - Φ( μ - ασ ) ,

A0 (3, 2) = A0 (3, 4) = Φ( μ - ασ ) - Φ( μ - 2ασ ) and

A0 (3,1) = A0 (3, 5) = Φ( μ - 2ασ ) , where μ is the mean

value of the increment rate; σ is standard deviation; and  [0.5,1] is a random term. The

initial transition matrix can be expressed as:

ACCEPTED MANUSCRIPT high low high  a h, h  low  al , h A0 (i, j ) = 0  a0, h  low  a l , h  high  a h, h

a h,l al ,l a0,l a l ,l a h,l

low  high

0 a h,0 al ,0 a0,0 a l ,0 a h,0

a h, l al , l a0, l a l , l a h, l

a h, h  al , h  a0, h   a l , h  a h, h 

(27)

For each annual forecasting, 150 training runs were carried out via different initial

CR IP T

parameters ( λ ) to obtain updated parameters ( λ ) by choosing the largest log-likelihood estimator. The iterative curve of the largest log-likelihood estimator of Year 2009 is plotted in Fig. 7, and it shows that the log-likelihood estimator achieves its maximum (-91.0) after 21 iterations. The most probable state sequence from 1979 to 2008 (Fig. 8) can be revealed via Viterbi decoding, which illustrates the changes in the levels of hidden states. An assumption

AN US

test method, the chi-square test, was used to verify the Markov property of the hidden state χ 2 = 41.6 (the case of Beijing) and

sequence.

χ 2 = 36.0 (the case of Shanghai),

χ 2 > χα2 ((q -1)2 ) = 26.3 ( q=5 and α=0.05 ), where α is significance level, and (q -1)2 is the degree of freedom. The test result shows the hidden state sequence {q1, q2 ,..., qT } satisfies

M

the Markov property. -100

ED

-110

-120 -130 -140

PT

Log-likelihood estimator

-90

-150 -160

CE

0

3

6

9 12 Iterations

15

18

21

Fig. 7. The log-likelihood estimator iterative curve of MSW forecasting of Year 2009 Hidden States

AC

+high

Shanghai

Beijing

+low

0

-low

-high 1979 1981 1983 1985 1987 1989 1991 1993 1995 1997 1999 2001 2003 2005 2007 Year

Fig. 8. The most probable state sequence from 1979 to 2008 of Shanghai and Beijing

ACCEPTED MANUSCRIPT Fresh information of uncertainty factors can be utilized to update state transition matrix and improve forecasted results. In Year 2008, Beijing hosted the Olympic Games, and Beijing government encouraged citizens to conserve energy and reduce emissions. After the Olympic Games, the government issued a policy on comprehensively advancing domestic waste sorting on April 28, 2009. The effect of this policy would likely last a few years. Shanghai is another city that experiences the same effect. In order to transform itself into a National Environmental Protection Model City, the government issued a policy on comprehensively

CR IP T

controlling environment pollution on July 9, 2007. The policy emphasized on carrying out diagnostic investigations in rest months of Year 2007 and completing control task at the end of Year 2008. The effect of this policy would be significant in the target year but limited for following years, as it is a short term task whose aim is to win the title of Model City.

The fresh information makes intuitive sense that waste generation of next year will

AN US

change and state transition matrix should be updated to respond to the new information. Once fresh information is obtained, we can consult experts in waste management system two questions: 1) Which is the most probable state of the next year considering fresh information? 2) How many probabilities will this state absorb from other states? Experts can give a probability interval if they are uncertain about the exact value. Then, the most common

M

method (the weighted average) in group decision problem was utilized to aggregate experts’

PT

Year 2009 for instance):

ED

knowledge. The updated transition matrix can be expressed as (taking Beijing’s situation of

R-l ,- h

R0,-h

R+l ,- h

(28)

R+ h,- h

CE

A(i, j ) =

high low 0 low  high i [ai , h ai ,l ai ,0 ai ,l ai , h ]

where R j ,-h is the absorption ratio of probabilities from other states. To simplify this

AC

process, the same ratios are chosen for R j ,-h of different j .

4.1.3. Parameters setup of comparison methods ES model can only utilize waste generation variable to perform forecasting. However, the

two major drivers for waste generation, the Resident Population and the Real GDP (Wang and Nie, 2001), were chosen as predictors in the ARIMA model to improve its forecasting accuracy. In ANN and SVM models, the best suitable predictors are the Resident population, the Real GDP and the Resident Consumption Level, which were selected by criteria of MSE and R2 during training and testing stages. The increment rate of waste generation in t +1

ACCEPTED MANUSCRIPT year ( RtW+1 ) is a function of the increment rate in t year ( RtW ) and increment rates of another three factors in t year ( RtP , RtGDP , and RtC ). All models used in this study are data-driven style, which means it is needed to retrain these models once new data is obtained. As a result, the best parameters for each year forecasting may be different. In the case of Shanghai, ARIMA (1, 1, 1) was selected for years of 2004, 2005, and 2009; ARIMA (1, 2, 1) was used for another 4 years. Linear (Holt) ES is

CR IP T

the best fitted ES model for Shanghai case. In the case of Beijing, ARIMA (1, 1, 1) was selected for years 2004 to 2010. Linear (Holt) ES was used for years 2004 and 2008, and double (Brown) ES was chosen for other years. The basic parameters of non-probabilistic statistical learning models for Shanghai and Beijing cases are the same. Optimized BP neural network parameters: learn rate equals to 0.2; transfer function of hidden layer is tansig; error

AN US

goal is 0.0001; input nodes are 4; maximum iterations is 5000; hidden nodes are 10; output node is 1; training algorithm is Levenberg-Marquardt backpropagation. In SVM regression, radial basis function (RBF) was selected as kernel function, which is consistent with the work of Abbasi et al. (2014). The parameter of gamma function and penalty parameter are both set to be the range [2-8, 28] in coarse grid search. Parameter ranges were narrowed in finer grid

M

search via the results of coarse grid search. The steps of two searches are set to be 1 and 0.2, respectively.

ED

4.1.4. Results and discussion

We compare the results of ARIMA, ES, ANN, SVM, HMM, and WDE-HMM

PT

approaches for Shanghai and Beijing cases in Figs. 9 and 10, respectively. Tables 3 and 4 summarize MAE, MSE, R2, and adjusted R2 of the forecasted results for Shanghai and Beijing

CE

cases, respectively. The results indicate that HMM outperforms ARIMA, ES, ANN, and SVM (except for the MAE of HMM is slightly more than that of ES in Shanghai case). After

AC

wavelet de-noising, WDE-HMM performs better than HMM. The forecasting accuracies of HMM* and WDE-HMM* are further improved after taking into fresh information of management policies consideration. Table 3. Forecasting accuracies of different models for Shanghai case ARIMA

ES

ANN

SVM

HMM

HMM*

WDE-HMM

WDE-HMM*

MAE

14.84

11.93

13.37

15.67

12.45

10.78

11.74

9.93

MSE

463.8

247.8

371.7

336.1

240.5

152.1

225.0

130.8

R

0.741

0.861

0.792

0.812

0.865

0.915

0.874

0.927

adjusted R2

0.612

0.861

0.584

0.624

0.865

0.898

0.874

0.912

2

ACCEPTED MANUSCRIPT

Table 4. Forecasting accuracies of different models for Beijing case ARIMA

ES

ANN

SVM

HMM

HMM*

WDE-HMM

WDE-HMM*

MAE

26.13

26.30

25.34

24.49

23.9

22.1

19.9

17.9

MSE

929.8

945.0

918.0

865.0

801.3

653.6

687.2

540.4

0.893

0.891

0.894

0.900

0.907

0.925

0.921

0.938

0.840

0.891

0.789

0.801

0.907

0.910

0.921

0.925

2

R

2

adjusted R

CR IP T

Note: The testing sample size is small (n=7), so the adjusted R2 have a relatively big gap with R2 for those methods with more predictors (such as ANN and SVM). * Considering fresh information of policies, the most probable states inferred by experts are both “-h” for Year 2008 in Shanghai and Year 2009 in Beijing; the absorption rate of probabilities were calculated to be 0.25 and 0.35, respectively. All experts’ knowledge is collected in the Shanghai Environmental Protection Bureau.

All the forecasted amount of the proposed approaches are close to actual amount in Figs. 9 and 10. Not only does WDE-HMM have more accurate results in regular years but also it

AN US

appears to be more reliable than other methods when encountering a drastic fluctuation, such as the fluctuation points of Year 2008 in Shanghai case and Year 2009 in Beijing case. The disadvantages of comparison methods are revealed in these two cases. ARIMA cannot catch the fluctuation effectively. ES chooses a compromise between general and fluctuation points. Although ES can effectively reflect the upward trend, it would lost lots of fluctuation

M

information. ANN is not stable due to its characteristic of over-fitting and local minimum.

ED

740

690

665

PT

715

Actual amount ARIMA ES ANN SVM HMM* WDE-HMM*

CE

MSW Generation / 104 tonne

Both ANN and SVM depends on influencing factors (predictors) heavily.

640

AC

615

590 2004

2005

2006

2007 Year

2008

2009

2010

Fig. 9. Forecasted and actual amount of Shanghai case from 2004 to 2010

ACCEPTED MANUSCRIPT

660 620

580

Actual amount ARIMA ES ANN SVM HMM* WDE-HMM*

540 500

460 420

380 2004

2005

2006

2007 Year

CR IP T

MSW Generation / 104 tonne

700

2008

2009

2010

Fig. 10. Forecasted and actual amount of Beijing case from 2004 to 2010

As mentioned in Section 3.2, uncertainties are not measured directly; instead, historical

AN US

data indicate the likely total uncertainty degree. The total uncertainty degree

ζ=ζ (ς1, ς2 , ς3 , ς4 , ς5 ,...) is determined by many uncertainties. In most cases, ζ is high when one or two of the uncertainty factors vary sharply, and ζ is low if all uncertainties are general. According to the calculated result of the total uncertainty degree (Fig. 8), we can

M

search for the source of uncertainty in reality from the five major sources. ζ of Year 2000 is “+high”, likely due to government policy regarding increasing environmental sanitation

ED

facilities. ζ of Year 2003 is “+high” in Shanghai and “+low” in Beijing, probably because of an explosion of disposable supplies during the SARS outbreak. The global economic crisis

PT

and winning a title of Model City are the probable reasons that ζ of Year 2008 is “-high” in Shanghai; however, ζ in the same year of Beijing is “0”, partly caused by the fact that the

CE

Olympic Games was held. ζ of Years 2006 and 2007 in Shanghai, and ζ of Year 2007 in Beijing are “0” since there were few changes in economic, population mobility, individual

AC

behaviors, and management policy. The way of weighted average Gaussian mixture was proposed to deal with the case I.

Once we have determined the total uncertainty degree in a year, the emission matrix ( B ), and the state transition probability matrix ( A ) which are updated by training the model via the historical dataset and even some fresh information, we can forecast weighted waste generation of the next year via probability calculation. Thus, fluctuation forecasting of waste generation can be implemented even when some of the uncertainty factors vary sharply.

ACCEPTED MANUSCRIPT 4.2. Case II: wastewater generation forecasting 4.2.1. Background The study site is the Manresa sewage treatment plant, located in Barcelona (Spain) which is one of the most populous cities in the world. Spain, the fourth largest country in Europe, is located at the western edge of the Mediterranean. Spain has a population of over 47 million people which ranks in the top 30 worldwide. The wastewater treatment system of Spain is

CR IP T

well developed, with more than 80% of the wastewater being treated. Spain has been actively promoting the reuse of wastewater that has been treated by urban sewage treatment plants as an aspect of implementing sustainable urban development. The original dataset of wastewater generation with periodic fluctuation has 477 daily data, which is a period from January 1,

AN US

60000 50000

40000 30000

Original signal De-noised signal

20000 10000 0

50

100

M

Wastewater Generation / tonne

1990 to July 10, 1991, not including Saturdays (see the original signal in Fig. 11).

150

200

250

300

350

400

450

Day

ED

Fig. 11. The original vs. de-noised signal of daily wastewater generation

PT

4.2.2. Forecasting process

Using MATLAB wavelet toolbox, the original signal was decomposed by Daubechies of level 2, and then de-noised by the penalized medium thresholding function. Fig. 11 shows

CE

both the original signal and the de-noised signal. Considering weekly periodicity, the datasets for HMM and WDE-HMM approaches were transformed into a matrix datasets with rows that

AC

form a weekly time series, and columns that follow a pattern [Mon., Tues., Wed., Thur., Fri., and Sun.]. Each row data is a stationary time series tested by the autocorrelations function and the Q-Q plot. The distribution of input data was explored by the one-sample Kolmogorov-Smirnov test (Table 5). Testing for i.i.d. hypothesis with the correlogram was also implemented for time series data of wastewater generation. The results show that 2, 1, 2, 1, 1, and 3 of the 78 values of ρˆ k lie outside the bounds ±1.96 / T ( α = 0.05 ) for datasets of Mon., Tues., Wed., Thur., Fri., and Sun., respectively. So there is no evidence to reject the null hypothesis that the observations are independently distributed. The selected dataset was

ACCEPTED MANUSCRIPT also divided into two sets, one for training and the other for testing. We tested the proposed approaches using the 79th and the 80th week data. Using AIC criterion, we set the number of hidden states to 5 and the mixture components of a GMM to 3 (see Table 6). The result of the chi-square test is: χ 2 = 71.8 > χα2 ((q -1)2 ) = 26.3 ( q=5 and α=0.05 ), which shows that the hidden state sequence {q1, q2 ,..., qT } satisfies the Markov property.

Sample Size

Monday 80

Tuesday 80

Wednesday 80

Thursday 79

Friday 79

Sunday 79

38943/6387

39544/7151

38741/5643

37054/6489

37404/5603

35027/6994

Most Extreme Differences Kolmogorov-Smirnov Z

0.076

0.101

0.149

0.091

0.096

0.101

0.678

0.902

1.329

0.807

0.855

0.899

Asymp. Sig. (2-tailed)

0.748

0.391

0.059

0.533

0.458

0.394

AN US

Normal (Mean/SD)

CR IP T

Table 5. One-sample Kolmogorov-Smirnov test for wastewater generation datasets

NHS

NGMC

MLE

AIC

NHS

NGMC

MLE

AIC

NHS

NGMC

MLE

AIC

3

1

-133.5

297.0

3

2

-98.8

245.5

3

3

-81.9

235.9

4

1

-117.0

281.9

4

2

-82.5

237.0

4

3

-57.6

219.1

5

1

-95.6

261.1

5

2

-60.8

221.7

5

3

-30.5

200.9

6

1

-83.4

262.7

6

M

Table 6. AIC scores of different hidden Markov models

-46.8

225.6

6

3

-23.4

226.7

2

ED

Note: NHS=the number of hidden states; NGMC=the number of GMM mixture components; MLE=the maximum log-likelihood estimator.

PT

To facilitate the forecasting accuracies of HMM and WDE-HMM approaches, we carried out 20 runs of multi-simulations (30 simulations each run) from different initial parameters ( λ )

CE

to obtain 20 total forecasted amount {F1 , F2 ,..., F20 } via choosing the biggest log-likelihood estimator. For the WDE-HMM approach, only the top three most similar historical patterns of

49000

Wastewater Generation / tonne

Wastewater Generation / tonne

AC

the current pattern were selected to illustrate the result of pattern recognition in Fig. 12. 44000 39000 34000 29000

The 44th week The 60th week

The 58th week The 78th week

24000

Mon.

Tues.

Wed. Thur. Time

Fri.

Sun.

54000

49000 44000 39000 34000 29000 24000

The 14th week The 70th week Mon.

Tues.

The 24th week The 79th week

Wed. Thur. Time

Fri.

Sun.

Fig. 12. The current wastewater generation pattern, (left, for the 78th week) and (right, for the 79th week), matched with historical similar data patterns

ACCEPTED MANUSCRIPT 4.2.3. Parameters setup of comparison methods Historical data of influencing factors for wastewater generation is not available in this case. So only historical data of wastewater generation can be utilized as predictors for ANN and SVM models. However, this does not affect sES and sARIMA models since they were supposed to utilize wastewater generation data to implement forecasting. As the week period is 6, for one week ahead forecasting of ANN and SVM models, the wastewater generation in

CR IP T

t +1 day ( Wt +1 ) is a function of wastewater generation in t - 5 day ( Wt -5 ), t - 6 ( Wt -6 ), … , and t -10 ( Wt -10 ). For one day ahead forecasting, the wastewater generation in t +1 day ( Wt +1 ) is a function of wastewater generation in t day ( Wt ), t -1 day ( Wt -1 ), … , and t -5 day ( Wt -5 ).

The best-fitted model of sARIMA in this case was selected to be sARIMA (2, 0, 1) (1, 0,

AN US

1)6. A time lag for 6 days was used as the seasonal parameter for the Auto Regressive. Winters' Multiplicative algorithm based ES is the most suitable model for wastewater generation forecasting with seasonal period of 6. Optimized BP neural network parameters: learn rate equals to 0.2; transfer function of hidden layer is tansig; error goal is 0.0001; input

M

nodes are 6; maximum iterations is 10000; hidden nodes are 15; output node is 1; training algorithm is Levenberg-Marquardt backpropagation. In SVM regression, RBF was selected as

ED

the best kernel function. The parameter of gamma function and penalty parameter are both set to be the range [2-8, 28] in coarse grid search. The steps of coarse grid search and finer grid

PT

search are set to be 1 and 0.1, respectively. 4.2.4. Results and discussion

CE

For each day, the mean value of 20 forecasted amount of the proposed approaches was regarded as the final forecasted amount. The forecasted results of sARIMA, sES, ANN, SVM, HMM, and WDE-HMM approaches are compared in Fig. 13.

AC

Table 7 summarizes MAE, MSE, and R2 of the forecasted results. The results indicate

that the HMM approach outperforms sARIMA, sES, ANN, and SVM. The WDE-HMM approach also performs better than the HMM approach after wavelet de-noising. Since data of influencing factors is not available, the forecasting accuracies for one week ahead forecasting of ANN and SVM are much worse than the proposed approaches; however, the forecasting accuracies were raised under one day ahead forecasting.

ACCEPTED MANUSCRIPT

34000

31000 Actual amount sARIMA ANN SVM sES HMM WDE-HMM

28000

25000 1

2

3

4

5 Day

6

CR IP T

Wastewater Generation / tonne

37000

7

8

9

Fig. 13. Forecasted and actual amount of the testing dataset

sARIMA*

sES*

ANN**

2086.3

1490.3

1395.1

9.44

4.04

5.54

-0.407

0.397

0.175

MAE 6

MSE (/10 ) 2

R

AN US

Table 7. Forecasting accuracies of different models for wastewater generation ANN*

SVM**

SVM*

HMM*

WDE-HMM*

1696.3

1462.3

1684.5

1272.9

986.3

6.76

4.71

6.29

3.28

2.03

-0.008

0.298

0.063

0.511

0.697

Note: * means one week ahead forecasting, and **means one day ahead forecasting. Adjusted R equals to R2,

M

since there are no extra predictors in this case.

2

ED

Next, a bootstrapping method was utilized to calculate 95% confidence intervals of forecasted amount derived from HMM and WDE-HMM approaches (Figs. 14 and 15). The 95%

PT

confidence intervals of WDE-HMM are narrower than those of HMM. The scope of the confidence intervals of WDE-HMM encapsulates a greater number of observation data points. This visually displays that the forecasting precision of WDE-HMM is higher than HMM.

CE

Fig. 15 shows that all actual wastewater generations (except for the 6th point) are within the scope of 95% confidence intervals, and that the trend of forecasted amount is similar to

AC

that of actual amount. The reason why the 6th point is not in the 95% confidence intervals is that new waste generation uncertainties exist for the current pattern which cannot be fully traced in history. However, the good news is that this new feature can be utilized in the future forecasting. In the case II, the way of searching the most similar pattern was used to forecast wastewater generation with periodic fluctuation for a large sample case. The results demonstrate that this approach is a good way to dynamically trace historical patterns. This also indicates that we have achieved the goal of forecasting fluctuations of waste generation.

40000 37000 34000

31000 Upper confidence interval Lower confidence interval Actual amount

28000 25000

1

2

3

4

5 Day

6

7

8

9

CR IP T

Wastewater Generation / tonne

ACCEPTED MANUSCRIPT

40000

34000 31000

AN US

37000

Upper confidence interval Lower confidence interval Actual amount

28000 25000 1

2

3

4

M

Wastewater Generation / tonne

Fig. 14. 95% confidence intervals of forecasted amount of the HMM approach

5 Day

6

7

8

9

ED

Fig. 15. 95% confidence intervals of forecasted amount of the WDE-HMM approach 5. Conclusions and future work

PT

In this study, we have proposed a probability model-driven statistical learning approach to forecast the fluctuant generation of municipal waste under uncertainties. The proposed

CE

approaches, which hybridizes a wavelet de-noising, a Gaussian mixture model, and a hidden Markov model, were tested by two kinds of case studies. They are Shanghai and Beijing

AC

annual MSW generation with non-periodic fluctuation, and daily wastewater generation with periodic fluctuation of the Manresa sewage treatment plant. The results have indicated that the HMM approach generally outperforms the ARIMA, sARIMA, ES, sES, ANN, and SVM approaches in both MAE, MSE, R2, and adjusted R2 criteria. The WDE-HMM approach performs better after the wavelet de-noising. The computational results have demonstrated that the proposed approaches are effective in solving the municipal waste generation forecasting problem. We presented two concrete forecasting ways, the way of weighted average Gaussian mixture (WAGM) and the way of searching the most similar pattern (SMSP), to forecast the

ACCEPTED MANUSCRIPT MSW generation with non-periodic fluctuation, and the wastewater generation with periodic fluctuation, respectively. Actually, different steps of state transition in the state probability matrix can be chosen in the way of WAGM. Therefore, the way of WAGM is suitable for short or medium term forecasting of both small and large samples; while the way of SMSP is advantageous for short term forecasting with large samples. In this study, we have implemented the fluctuation forecasting of waste generation which is found to be influenced by some latent influencing factors and their uncertainties. How

CR IP T

uncertainty factors influence forecasted results was illustrated in the subsection of results and discussion. In MSW generation cases, the fresh information is effectively utilized to update the state transition matrix to better foresee future trend.

Future work is needed to improve the EM algorithm. Due to the local optimum characteristics of the EM algorithm, the model was trained repeatedly. This required a

AN US

significant computing effort. In addition, one can further attempt to forecast middle-term waste generation via an improved hidden Markov model. Furthermore, n this study, the transition matrix is time homogeneous in each forecasting process although it can be updated online when new data is available, and can be updated once fresh information is obtained. However, non-homogeneous hidden Markov models are likely to achieve higher forecasting

M

accuracy.

ED

References

Abbasi, M., Abduli, M. A., Omidvar, B., & Baghvand, A. (2013). Forecasting municipal solid

PT

waste generation by hybrid support vector machine and partial least square model. International Journal of Environmental Research, 7(1), 27-38.

CE

Abbasi, M., Abduli, M. A., Omidvar, B., & Baghvand, A. (2014). Results uncertainty of support vector machine and hybrid of wavelet transform-support vector machine models for

AC

solid waste generation forecasting. Environmental Progress & Sustainable Energy, 33(1), 220-228.

Ahmad, K. (2012). A system dynamics modeling of municipal solid waste management systems in Delhi. Renewable Energy. 1, 628–641.

Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Petrov, B., & Csaki, F., (Eds.), Second international symposium on information theory (pp. 267-281). Budapest. Ali Abdoli, M., Falah Nezhad, M., Salehi Sede, R., & Behboudian, S. (2012). Longterm forecasting of solid waste generation by the artificial neural networks. Environmental

ACCEPTED MANUSCRIPT Progress & Sustainable Energy, 31(4), 628-636. Antanasijevic, D., Pocajt, V., Popovic, I., Redzic, N., & Ristic, M. (2013). The forecasting of municipal waste generation using artificial neural networks and sustainability indicators. Sustainability science, 8(1), 37-46. Bache,

K.,

&

Lichman,

M.

(2013).UCI

Machine

Learning

Repository

[http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science. [last accessed on Aug 2, 2015].

review. Waste management, 28(1), 200-214.

CR IP T

Beigl, P., Lebersorger, S., & Salhofer, S. (2008). Modelling municipal solid waste generation: A

Bilmes, J. A. (1998). A gentle tutorial of the EM algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models. Technical Report, Computer

Science

ICSI-TR-97-021, 4(510): 126. Bureau,

S.

S.,

2011.

Institute,

University

of

AN US

International

Shanghai

statistical

California

Berkeley,

yearbook

2011

[http://www.stats-sh.gov.cn/data/toTjnj.xhtml?y=2011e]. [last accessed on Aug 2, 2015]. Bureau, B, S., 2011. Beijing statistical yearbook 2011 [http://www.bjstats.gov.cn/]. [last accessed on Aug 2, 2015].

M

Chang, C. C., & Lin, C. J. (2011). LIBSVM : a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3), 27.

ED

Chen, H. W., & Chang, N. B. (2000). Prediction analysis of solid waste generation based on grey fuzzy dynamic modeling. Resources, Conservation and Recycling, 29(1), 1-18.

PT

Cheon, S., Kim, S., Kim, J., & Kim, C. (2008). Learning Bayesian networks based diagnosis system for wastewater treatment process with sensor data. Water Science & Technology,

CE

58(12), 2381–2393.

Cherian, J., & Jacob, J. (2012). Management Models of Municipal Solid Waste: A Review

AC

Focusing on Socio Economic Factors. International Journal of Economics and Finance, 4(10), 131-139.

Denafas, G., Ruzgas, T., Martuzevicius, D., Shmarin, S., Hoffmann, M., Mykhaylenko, V., ... & Ludwig, C. (2014). Seasonal variation of municipal solid waste generation and composition in four East European cities. Resources, Conservation and Recycling, 89, 22-30. De Souza e Silva, E. G., Legey, L. F., & de Souza e Silva, E. A. (2010). Forecasting oil price trends using wavelets and hidden Markov models. Energy economics, 32(6), 1507-1519. Dong, M., Yang, D., Kuang, Y., He, D., Erdal, S., & Kenski, D. (2009). PM2.5 concentration

ACCEPTED MANUSCRIPT prediction using hidden semi-Markov model-based times series data mining. Expert Systems with Applications, 36(5), 9046-9055. Dyson, B., & Chang, N. B. (2005). Forecasting municipal solid waste generation in a fast-growing urban region with system dynamics modeling. Waste management, 25(7), 669-679. Efron, B. (1979). Bootstrap methods: another look at the jackknife. The annals of Statistics, 7(1), 1-26.

CR IP T

Ghiani, G., Laganà, D., Manni, E., Musmanno, R., & Vigo, D. (2014). Operations research in solid waste management: A survey of strategic and tactical issues. Computers & Operations Research, 44, 22-32.

Guerry, M. A. (2011). Hidden heterogeneity in manpower systems: A Markov-switching model approach. European Journal of Operational Research, 210(1), 106-113.

AN US

Guo, X., Liu, S., Wu, L., & Tang, L. (2014). Application of a Novel Grey Self-Memory Coupling Model to Forecast the Incidence Rates of Two Notifiable Diseases in China: Dysentery and Gonorrhea. PloS one, 9(12), e115664.

Hassan, M. R. (2009). A combination of hidden Markov model and fuzzy model for stock market forecasting. Neurocomputing, 72(16), 3439-3446.

M

Intharathirat, R., Salam, P. A., Kumar, S., & Untong, A. (2015). Forecasting of municipal solid waste quantity in a developing country using multivariate grey models. Waste

ED

Management, 39, 3-14.

Jalili Ghazi Zade, M., & Noori, R. (2008). Prediction of municipal solid waste generation by

PT

use of artificial neural network: A case study of Mashhad. International Journal of Environmental Research, 2(1), 13-22.

CE

Karavezyris, V., Timpe, K. P., & Marzi, R. (2002). Application of system dynamics and fuzzy logic to forecasting of municipal solid waste. Mathematics and Computers in Simulation,

AC

60(3), 149-158.

Katsamaki, A., Willems, S., & Diamadopoulos, E. (1998). Time series analysis of municipal solid waste generation rates. Journal of Environmental Engineering, 124(2), 178-183.

Kim, M. J., Jiang, R., Makis, V., & Lee, C. G. (2011). Optimal Bayesian fault prediction scheme for a partially observable system subject to random failure. European Journal of Operational Research, 214(2), 331-339. Kollikkathara, N., Feng, H., & Yu, D. (2010). A system dynamic modeling approach for evaluating municipal solid waste generation, landfill capacity and related cost management issues. Waste management, 30(11), 2194-2203.

ACCEPTED MANUSCRIPT Misiti, M., Misiti, Y., Oppenheim, G., & Poggi, J. M. (2011). Wavelet Toolbox™ 4 Getting Started Guide. The MathWorks, Inc.: Natick, MA, (Chapter1). Noori, R., Abdoli, M. A., Ghasrodashti, A. A., & Jalili Ghazizade, M. (2009). Prediction of municipal solid waste generation with combination of support vector machine and principal component analysis: A case study of Mashhad. Environmental Progress & Sustainable Energy, 28(2), 249-258. Owusu-Sekyere, E. (2013). Forecasting and Planning for Solid Waste Generation in the

CR IP T

Kumasi Metropolitan Area of Ghana: An ARIMA Time Series Approach. International Journal of Sciences, 2(2013-04), 69-83.

Pires, A., Martinho, G., & Chang, N. B. (2011). Solid waste management in European countries: A review of systems analysis techniques. Journal of environmental management, 92(4), 1033-1050.

AN US

Rabiner, L. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), 257-286.

Rimaityte, I., Ruzgas, T., Denafas, G., Racys, V., & Martuzevicius, D. (2012). Application and evaluation of forecasting methods for municipal solid waste generation in an eastern-European city. Waste Management & Research, 30(1), 89-98.

M

Setzler, H., Saydam, C., & Park, S. (2009). EMS call volume predictions: A comparative study. Computers & Operations Research, 36(6), 1843-1851.

ED

Song, J., & He, J. (2014). A multistep chaotic model for municipal solid waste generation prediction. Environmental engineering science, 31(8), 461-468.

PT

Sun, S., Zhang, C., & Yu, G. (2006). A Bayesian network approach to traffic flow forecasting. IEEE Transactions on Intelligent Transportation Systems, 7(1), 124-132.

CE

Ustun, O., & Kasimbeyli, R. (2012). Combined forecasts in portfolio optimization: a generalized approach. Computers & Operations Research, 39(4), 805-819.

AC

Wang, H., & Nie, Y. (2001). Municipal Solid Waste Characteristics and Management in China. Journal of the Air & Waste Management Association, 51(2), 250-263.

Xu, L., Gao, P., Cui, S., & Liu, C. (2013). A hybrid procedure for MSW generation forecasting at multiple time scales in Xiamen City, China. Waste Management, 33(6), 1324-1331. Zhou, Z. J., Hu, C. H., Xu, D. L., Chen, M. Y., & Zhou, D. H. (2010). A model for real-time failure prognosis based on hidden Markov model and belief rule base. European Journal of Operational Research, 207(1), 269-283.