Journal of Hydrology 477 (2013) 119–128
Advancing monthly streamflow prediction accuracy of CART models using ensemble learning paradigms
Halil Ibrahim Erdal (Turkish Cooperation and Coordination Agency (TIKA), Atatürk Bulvarı No. 15, Ulus, Ankara, Turkey)
Onur Karakurt (Gazi University, Engineering Faculty, Civil Engineering Department, Celal Bayar Bulvarı, 06570 Maltepe, Ankara, Turkey)
Article history: Received 3 July 2012; received in revised form 5 November 2012; accepted 6 November 2012; available online 16 November 2012. This manuscript was handled by Geoff Syme, Editor-in-Chief.
Keywords: Bagging (bootstrap aggregating); classification and regression trees; ensemble learning; stochastic gradient boosting; streamflow prediction; support vector regression
Summary
Streamflow forecasting is one of the most important steps in water resources planning and management, and ensemble techniques such as bagging, boosting and stacking have gained popularity in hydrological forecasting in recent years. This study investigates the potential of two ensemble learning paradigms (bagging and stochastic gradient boosting) for building ensembles of classification and regression trees (CART) that advance streamflow prediction accuracy. The study first investigates the use of CART for monthly streamflow forecasting, employing a support vector regression (SVR) model as the benchmark. The analytic results indicate that CART outperforms SVR in both the training and testing phases. However, while the results of the CART model in the training phase are considerable, those in the testing phase are not. Thus, to optimize the prediction accuracy of CART for monthly streamflow forecasting, we incorporate bagging and stochastic gradient boosting, which are rooted in the same philosophy: advancing the prediction accuracy of weak learners. The comparison shows that the bagged regression trees (BRT) and stochastic gradient boosted regression trees (GBRT) models possess more satisfactory monthly streamflow forecasting performance than the CART and SVR models. Overall, it is found that ensemble learning paradigms can remarkably advance the prediction accuracy of CART models in monthly streamflow forecasting.
Crown Copyright © 2012 Published by Elsevier B.V. All rights reserved.
1. Introduction
Forecasting streamflow at daily, monthly or longer time intervals is of great importance both for managing existing water resources systems and for planning new ones effectively. Studies that forecast streamflow data with different methods, in order to operate water resources systems and carry out re-planning activities effectively, continue to increase. Realistic forecasting of streamflow, which depends on a number of hydrologic factors, contributes to planning short- and long-term water resources systems (such as flood control and reservoir operation) in the most effective manner. Therefore, the prediction of flow in water systems planning studies is essential, and the variety of models developed for this purpose continues to grow.
In recent years, ensemble learning techniques (also known as committee machines) (Anctil and Lauzon, 2004; Snelder et al.,
2009) have been employed in the modeling and estimation of hydrologic variables in different research areas. In general, ensemble methods (e.g., bagging, boosting) are designed to overcome problems with weak predictors (Hancock et al., 2005). Artificial neural networks (ANNs) and decision trees (DTs) are commonly used as base predictors in building ensemble machine learning models (Zhang et al., 2008). This study investigates bagging and boosting techniques, which are widely used in the machine learning literature, to build strong predictors for streamflow estimation. Bagging (an acronym for bootstrap aggregating) was proposed by Breiman (1996) to improve the prediction accuracy of weak learners; it aims to minimize prediction variance by generating bootstrapped replica datasets. Boosting (also known as arcing) is another very popular ensemble method that comes from the same philosophy. Boosting creates a linear combination out of many models (Hancock et al., 2005), with each new model dependent on the preceding model (Friedman, 2002). These techniques differ in the ways they process the training data and combine the predictions coming from their base predictors (Zhang et al., 2008). To the best of our knowledge, ensemble methods have not been applied extensively in hydrological time series analysis, especially in streamflow estimation. The available implementations of ensemble methods in hydrological time series are as follows: Jeong
and Kim (2005) applied an ensemble neural network (ENN) to forecasting monthly inflows to the Daecheong dam in Korea; the ENN combined the outputs of member neural networks using the bagging method, and the overall results showed that the ENN outperformed a simple ANN among the three rainfall–runoff models. Cannon and Whitfield (2002) studied the use of ensemble neural network modeling in streamflow forecasting. Li et al. (2010) applied bagging to construct various support vector machine (SVM) models for streamflow prediction; the results showed that the bagged SVM model outperforms bagged multiple linear regression (MLR), simple SVM and simple MLR models in all of the adopted evaluation scores. Tiwari and Chatterjee (2011) investigated hybrid wavelet bootstrapped artificial neural network models in daily discharge forecasting; the results revealed that the model which uses the capabilities of both bootstrap and wavelet methods was the best. Araghinejad et al. (2011) investigated both generation and combination techniques of ANN ensembles in forecasting the peak discharge of floods; the results indicated that ANN ensembles could enhance the probabilistic forecast skill for hydrological events. Tiwari and Chatterjee (2010) built a hybrid wavelet–bootstrap–ANN model for hourly flood forecasting and showed that the proposed model can enhance flood forecasting results. Snelder et al. (2009) developed a method of mapping flow regime classes using boosted regression trees; its performance was compared with other methods (i.e., linear discriminant analysis, LDA, and classification and regression trees, CARTs), and they found that boosted regression trees could increase the confidence in decisions associated with setting environmental flows and the ability to undertake broad-scale ecohydrological research. Boucher et al. (2010) used bagged multi-layer perceptrons for 1-day-ahead streamflow forecasting on three watersheds. Shu and Burn (2004) incorporated bagging and boosting in building ANN ensembles for estimating the index flood and the 10-year flood quantile; the comparison between ANN ensembles and a single ANN showed that the ensembles were more accurate in flood estimation.
To our knowledge there are limited applications of classification and regression trees in hydrological forecasting: Vezza et al. (2010) applied multiple regression with morphoclimatic catchment characteristics in sub-regions obtained through four classification methods: seasonality indices (SIs), classification and regression trees (CRTs), the residual pattern approach (RPA) and weighted cluster analysis (WCA). They found that the CRT model outperforms the models obtained by the other classification techniques in terms of explained variance.
SVM models have been used successfully to estimate seasonal, monthly and hourly streamflows and have shown good generalization performance: Kisi and Cimen (2011) investigated a wavelet–support vector regression (WSVR) conjunction model for monthly streamflow forecasting; the test results, compared with those of a single support vector regression model, showed that the discrete wavelet transform could increase the accuracy of the SVR model in forecasting monthly streamflows. Guo et al. (2011) worked to improve the performance of the SVM model in predicting monthly streamflow and compared the results of the modified SVM with those of an ANN model and a conventional SVM model. Asefa et al. (2006) used SVMs in predicting both seasonal and hourly streamflows; the results showed a promising performance in predicting site-specific, real-time streamflows. Samsudin et al. (2010) demonstrated how monthly river flow could be well represented by hybrid models combining the group method of data handling and least squares support vector machines.
2. Method
In this study, we first use a classification and regression trees (CART) model for monthly streamflow forecasting. A conventional, well-known machine learning model, support vector regression (SVR), is employed as the benchmark. Then, the bagging and stochastic gradient boosting ensemble learning methods are incorporated to build tree-based ensembles that optimize the prediction accuracy of the single CART model. In building the tree-based ensembles, a training sample is first drawn randomly from the entire observed data, and a CART model is built using this replica dataset. This procedure is repeated 100 times to obtain 100 individual forecast models that capture the variability in sampling. Each of these CART models is then used to forecast the current flow, and the linear combination of these forecasts is used as the final forecast.
Forecasting streamflow at daily, monthly or longer time intervals is of great importance for the effective operation of a water resources system: many activities related to planning and operating the components of such a system require decent forecasts of future events. Storage–yield sequences are generally related to monthly periods; hence, monthly river flow forecasting is very important for water resources system planning (Kisi et al., 2011). In the applications, the first 30 years of data (360 months) are used for training and the remaining 5 years (60 months) for testing.
The surface water hydrographs of rivers exhibit large variations due to many natural phenomena. One of the most commonly used approaches for interpolating and extending streamflow records is to fit the observed data with an analytic power model. However, such analytic models may not adequately represent the flow process, because they are based on many simplifying assumptions about the natural phenomena that influence river flow. This paper demonstrates how a simple ensemble model can be used as an adaptive model as well as a predictor. Three attribute combinations based on preceding monthly streamflows are therefore developed to forecast current streamflow values: (i) Qt–1; (ii) Qt–1 and Qt–2; and (iii) Qt–1, Qt–2 and Qt–3. In all combinations, the output is the discharge Qt for the current month.
The following three performance measures are used to evaluate the proposed predictive models. The correlation coefficient (R) is a common measure of how well the predictions fit the actual data; a value of 1 indicates a perfect fit between actual and predicted values. The mathematical formula for computing R is:
$$R = \frac{n\sum yy' - \left(\sum y\right)\left(\sum y'\right)}{\sqrt{\left[n\sum y^{2} - \left(\sum y\right)^{2}\right]\left[n\sum y'^{2} - \left(\sum y'\right)^{2}\right]}} \qquad (1)$$
where y is the actual value, y' the predicted value, and n the number of data samples. The root mean squared error (RMSE) is the square root of the mean squared error; it is the average distance of a data point from the fitted line, measured along a vertical line, and is given by:
$$RMSE = \sqrt{\frac{\sum \left(y' - y\right)^{2}}{n}} \qquad (2)$$
The mean absolute error (MAE) measures how close forecasts or predictions are to the eventual outcomes:
$$MAE = \frac{1}{n}\sum_{i=1}^{n} \left|y - y'\right| \qquad (3)$$
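For concreteness, Eqs. (1)–(3) can be computed with a few lines of NumPy. This is a minimal illustrative sketch, not code from the original study; `y` and `y_pred` are assumed arrays of observed and predicted monthly flows.

```python
import numpy as np

def r_rmse_mae(y, y_pred):
    """Correlation coefficient, RMSE and MAE per Eqs. (1)-(3)."""
    y, y_pred = np.asarray(y, dtype=float), np.asarray(y_pred, dtype=float)
    n = len(y)
    num = n * np.sum(y * y_pred) - np.sum(y) * np.sum(y_pred)
    den = np.sqrt((n * np.sum(y**2) - np.sum(y)**2) *
                  (n * np.sum(y_pred**2) - np.sum(y_pred)**2))
    r = num / den                               # Eq. (1)
    rmse = np.sqrt(np.mean((y_pred - y) ** 2))  # Eq. (2)
    mae = np.mean(np.abs(y - y_pred))           # Eq. (3)
    return r, rmse, mae
```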
In addition, six numerical descriptors were computed to investigate the statistical relation between the observed and predicted streamflow data (the last two are sketched in code below):
Maximum discharge (Max Q).
Minimum discharge (Min Q).
Mean of discharge (Mean Q).
Variance of discharge (Var Q).
Maximum under-prediction (MUP).
Maximum over-prediction (MOP).
Moreover, the accuracy of peak flow estimates has a major effect on the design of drainage or flood control facilities. Thus, this study also examines the peak flow prediction accuracy of the proposed predictive models.
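The paper does not state formulas for MUP and MOP. The sketch below reflects one plausible reading, the largest single under-prediction and over-prediction in the series, and should be taken as an assumption rather than the authors' definition.

```python
import numpy as np

def mup_mop(y, y_pred):
    """Assumed definitions (not stated in the paper):
    MUP = largest under-prediction, MOP = largest over-prediction."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y, dtype=float)
    mup = float(np.max(-err))   # worst under-prediction, y - y'
    mop = float(np.max(err))    # worst over-prediction, y' - y
    return mup, mop
```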
2.1. Classification and regression trees
Classification and regression trees (CART) were proposed by Breiman et al. (1984) and have gained popularity in recent years. However, CART is identified as an unstable learner because it is prone to overfitting (Breiman, 1996); more specifically, it is very sensitive to small changes in the training dataset (Hastie et al., 2001). It works as follows (Hancock et al., 2005): each node within the tree has a partitioning rule, and the partitioning rule is defined through minimization of the relative error (RE), which for regression problems is the minimization of the sums-of-squares of a split:
$$RE(d) = \sum_{l=0}^{L} \left(y_l - \bar{y}_L\right)^{2} + \sum_{r=0}^{R} \left(y_r - \bar{y}_R\right)^{2} \qquad (4)$$
where y_l and y_r are the left and right partitions with L and R observations of y in each, with respective means ȳ_L and ȳ_R. The decision rule d is a point in some estimator variable x that is used to determine the left and right branches. The partitioning rule that minimizes the RE is then used to construct a node in the tree. The primary parameters for the CART are the number of folds, the minimum total weight and the seed; the values for these parameters are 3, 2 and 1, respectively. A CART structure is depicted in Fig. 1.
Fig. 1. A CART structure.
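As an illustration of the single-CART setup, the sketch below builds the lagged attribute matrix for combination (iii), applies the paper's 360/60-month split, and fits a regression tree. It uses scikit-learn's DecisionTreeRegressor as a stand-in for the authors' CART implementation (the folds/weight/seed settings above suggest a Weka-style tree and have no exact scikit-learn equivalents); the data file name is hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def lagged_matrix(q, n_lags=3):
    """Inputs [Q(t-1), ..., Q(t-n_lags)] and target Q(t) from a flow series."""
    q = np.asarray(q, dtype=float)
    X = np.column_stack([q[n_lags - k : len(q) - k] for k in range(1, n_lags + 1)])
    return X, q[n_lags:]

# Hypothetical file holding the 420 monthly flows (m3/s); first 30 years train,
# last 5 years test, as in the paper.
q = np.loadtxt("coruh_monthly_flows.txt")
X, y = lagged_matrix(q, n_lags=3)
X_tr, y_tr = X[:-60], y[:-60]
X_te, y_te = X[-60:], y[-60:]

cart = DecisionTreeRegressor(min_samples_leaf=2, random_state=1)  # rough analog of
cart.fit(X_tr, y_tr)                    # "minimum total weight = 2, seed = 1"
cart_pred = cart.predict(X_te)
```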
2.2. Bagging
Bagging (an acronym for bootstrap aggregating) is one of the earliest and most popular ensemble methods. The bootstrap resampling method (Efron, 1979) and aggregating form the basis of bagging. Variety in bagging is derived by using bootstrapped replicas of the original data: different training sub-datasets are drawn at random with replacement from the training dataset, separate models are produced from these subsets and used to predict the entire data, and the various estimated models are then aggregated, using the mean for regression problems or majority voting for classification problems (Pino-Mejias et al., 2008). For a regression problem it works as follows (Bühlmann and Yu, 2002). A training set D consists of data {(X_i, Y_i); i = 1, 2, ..., n}, with Y_i the real-valued response and X_i a p-dimensional explanatory variable for the ith instance. A predictor E(Y|X = x) = f(x) is denoted by
$$C_n(x) = h_n(D_1, \ldots, D_n)(x) \qquad (5)$$
where h_n is the nth hypothesis. Theoretically, bagging is defined as follows. First, construct a bootstrapped sample
$$D_i^{*} = (Y_i^{*}, X_i^{*}) \qquad (6)$$
according to the empirical distribution of the pairs D_i = (X_i, Y_i), where i = 1, 2, ..., n. Secondly, estimate the bootstrapped predictor by the plug-in principle:
$$C_n^{*}(x) = h_n(D_1^{*}, \ldots, D_n^{*})(x) \qquad (7)$$
where C_n*(x) is the predictor built on the bootstrapped samples. Finally, the bagged predictor is:
$$C_{n;B}(x) = E^{*}\left[C_n^{*}(x)\right] \qquad (8)$$
where E* denotes expectation with respect to the bootstrap resampling.
To sum up, bagging is one of the simplest ensemble techniques to implement, and it can reduce variance when combined with base learner generation while achieving good performance (Wang et al., 2011). A more detailed version of bagging is described in Breiman (1999). The bagging ensemble model structure developed in the present study is shown in Fig. 2. A CART is employed as the base predictor of the BRT, and its primary parameters are identical to Section 2.1. The bagging parameters are the size of each bag (as a percentage), the number of iterations (number of trees) and the seed; in this case, the values for these parameters are 100, 100 and 1, respectively.
Fig. 2. Bagging ensemble model structure.
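A minimal sketch of the BRT procedure described above: 100 bootstrap replicas, each the size of the training set (bag size 100%), one CART per replica, and the mean as the final forecast, mirroring Eqs. (5)–(8). It reuses X_tr, y_tr and X_te from the earlier sketch and is not the authors' code; scikit-learn's BaggingRegressor offers the same behavior ready-made.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def brt_predict(X_tr, y_tr, X_te, n_trees=100, seed=1):
    """Bagged regression trees: average of CARTs trained on bootstrap replicas."""
    rng = np.random.default_rng(seed)
    n = len(y_tr)
    preds = np.zeros((n_trees, len(X_te)))
    for m in range(n_trees):
        idx = rng.integers(0, n, size=n)   # bootstrap replica, drawn with replacement
        tree = DecisionTreeRegressor(min_samples_leaf=2, random_state=m)
        tree.fit(X_tr[idx], y_tr[idx])
        preds[m] = tree.predict(X_te)
    return preds.mean(axis=0)              # Eq. (8): mean over bootstrapped predictors

brt_pred = brt_predict(X_tr, y_tr, X_te)
```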
2.3. Stochastic gradient boosting
Boosting is an important machine learning meta-algorithm because it enhances the prediction accuracy of weak predictors such as decision (regression) trees and artificial neural networks. The first boosting algorithm was introduced by Schapire (1990). This study employs the stochastic gradient boosting (also known as TreeBoost) algorithm introduced by Friedman (2001, 2002) to construct the gradient boosted regression trees model. Stochastic gradient boosting is a statistical method that fits an additive model of base functions and generates replica datasets from the original dataset to optimize prediction accuracy. It can be defined as follows (Hancock et al., 2005). A training sample D consists of data {(x_i, y_i), i = 1, 2, ..., n}, where x_i is a variable within the predictor set and y_i is the response variable. First, initialize the model F_0(x) with a constant value:
$$F_0(x) = \arg\min_{c} \sum_{i=1}^{N} \Psi(y_i, c) \qquad (9)$$
Next, compute the so-called pseudo-residuals:
$$z_{im} = -\left[\frac{\partial \Psi(y_i, F(x_i))}{\partial F(x_i)}\right]_{F(x) = F_{m-1}(x)} \qquad (10)$$
where z_im is the direction of steepest descent, Ψ is the loss function and m is the model index (m = 1, ..., M). This direction is used to constrain each new model entering the boosted ensemble. Then, compute the parameter a_m:
$$a_m = \arg\min_{a,\rho} \sum_{i=1}^{N} \left[z_{im} - \rho\, h(x_i; a)\right]^{2} \qquad (11)$$
where a is the split point of x_i, h(x_i; a) is the model (in this study, a CART) and ρ is the weighting of the tree. After that, calculate
the optimal value of the expansion coefficient β_m for the model (base learner) h(x_i; a_m):
$$\beta_m = \arg\min_{\beta} \sum_{i=1}^{N} \Psi\big(y_i, F_{m-1}(x_i) + \beta\, h(x_i; a_m)\big) \qquad (12)$$
where β is the weight of each new tree in the direction of z_im. After the above two-step least-squares process, we can update F_m:
$$F_m(x) = F_{m-1}(x) + \beta_m h(x; a_m) \qquad (13)$$
The shrinkage parameter ν controls the learning rate of the procedure and reduces the risk of overfitting:
$$F_m(x) = F_{m-1}(x) + \nu\, \beta_m h(x; a_m) \qquad (14)$$
where 0 < ν ≤ 1. Tree-based gradient boosting ensembles increase their accuracy by generating a series of models and then combining them into an ensemble model with higher performance. Moreover, gradient boosted regression trees inherit the advantages of regression trees while overcoming their inaccuracy (Chou et al., 2011). The gradient boosting ensemble model structure built in the present study is shown in Fig. 3. A CART is used as the base learner, with parameters identical to Section 2.1; the best parameters for the GBRT are 100 iterations (number of trees) and a shrinkage ν of 0.5.
Fig. 3. Gradient boosting ensemble model structure.
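The recursion of Eqs. (9)–(14) is easy to make concrete for squared loss, where the pseudo-residuals of Eq. (10) reduce to y − F(x) and F_0 is the training mean. The sketch below follows that special case with the paper's ν = 0.5 and 100 trees; the subsample fraction and tree depth are illustrative assumptions (the paper does not report them), and for squared loss the coefficient β_m of Eq. (12) is absorbed into the fitted tree's leaf values.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbrt_predict(X_tr, y_tr, X_te, n_trees=100, nu=0.5, subsample=0.5, seed=1):
    """Stochastic gradient boosting for squared loss, per Eqs. (9)-(14)."""
    rng = np.random.default_rng(seed)
    n = len(y_tr)
    F_tr = np.full(n, y_tr.mean())            # Eq. (9): loss-minimizing constant
    F_te = np.full(len(X_te), y_tr.mean())
    for m in range(n_trees):
        z = y_tr - F_tr                       # Eq. (10): pseudo-residuals (squared loss)
        idx = rng.choice(n, size=int(subsample * n), replace=False)  # stochastic step
        h = DecisionTreeRegressor(max_depth=3, random_state=m)       # base learner h(x; a_m)
        h.fit(X_tr[idx], z[idx])              # Eqs. (11)-(12): least-squares fit
        F_tr += nu * h.predict(X_tr)          # Eq. (14): shrunken update
        F_te += nu * h.predict(X_te)
    return F_te

gbrt_pred = gbrt_predict(X_tr, y_tr, X_te)
```

scikit-learn's GradientBoostingRegressor(n_estimators=100, learning_rate=0.5, subsample=0.5) packages the same procedure.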
2.4. Support vector regression
As a statistical method, support vector machines were originally utilized by Vapnik (1995) for the solution of binary classification problems. In the following years, SVM regression was developed by Vapnik, Drucker, Burges, Kaufman and Smola (Schölkopf et al., 1999); this method is named support vector regression (SVR). It can be defined as follows (Erdal and Ekinci, 2012). Considering a set of training data {(x_1, y_1), ..., (x_ℓ, y_ℓ)}, where each x_i ∈ R^n and y_i ∈ R, the decision function is given by
$$f(x) = \left(w \cdot \Phi(x)\right) + b \qquad (15)$$
with respect to w ∈ R^n and b ∈ R, where Φ denotes a non-linear transformation from R^n to a high-dimensional feature space. The primal optimization problem is given by:
$$\text{minimize } R_{reg}(f) = \frac{1}{2}\left\|w\right\|^{2} + C \sum_{i=0}^{\ell} S\big(f(x_i) - y_i\big) \qquad (16)$$
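For the benchmark, a polynomial-kernel SVR with the best settings reported below (complexity 1, epsilon 1.0E−12, exponent 1) can be approximated in scikit-learn as follows, reusing X_tr, y_tr and X_te from the earlier sketch. The original study appears to have used a different implementation, so treat this as an analog rather than a reproduction; the feature-scaling step is our assumption (polynomial-kernel SVRs are scale-sensitive).

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Analog of the reported best SVR configuration: poly kernel, C = 1,
# epsilon = 1.0E-12, degree (exponent) = 1.
svr = make_pipeline(StandardScaler(),
                    SVR(kernel="poly", degree=1, C=1.0, epsilon=1e-12))
svr.fit(X_tr, y_tr)
svr_pred = svr.predict(X_te)
```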
Decent parameter settings are vital for the forecasting accuracy of learning machines. The SVR parameters considered are as follows: the kernel is the polynomial kernel; the complexity parameter is 1, 2 or 3; epsilon is 1.0E−11 or 1.0E−12; and the exponent is 1, 2 or 3. The best configuration for this technique is a complexity parameter of 1, an epsilon of 1.0E−12 and an exponent of 1.
3. Study area
In this study, the monthly streamflow data of the Karşıköy observation station on the Çoruh River in the Eastern Black Sea Region of Turkey are used. The location of the station is shown in Fig. 4; its coordinates are 41°42′38″ E longitude and 41°27′07″ N latitude. The drainage area of the station is 19654.4 km² and its elevation is 57 m. Streamflow was measured at the Karşıköy Station between 1964 and 2002 (39 years); however, 35 years of measured data (1968–2002; 420 months) are used in the application. The observed data are for hydrologic years, i.e., the first month of the year is October and the last month is September. This study focuses on the Çoruh River because a total of 13 hydro-electric dams are planned as part of the Çoruh River Development Plan, while a total of 27 are proposed for the Çoruh River Catchment. Under the plan, two dams have been completed (Muratlı Dam and Tortum Dam), another is under construction (Deriner Dam), and Yusufeli Dam, just upstream, is in its final planning phase.
Fig. 4. The Karşıköy Station on the Çoruh River in Turkey.
4. Results
The correlation coefficient (R), root mean squared error (RMSE) and mean absolute error (MAE) statistics of the classification and regression trees (CART) and support vector regression (SVR) models in the training and testing phases are given in Tables 1 and 2. The tables show that the CART and SVR models whose inputs are the flows of the three previous months have the best accuracy in both the training and test periods. Tables 1 and 2 show that the CART model (Rtraining = 0.9096, Rtesting = 0.7349) is superior to the SVR model (Rtraining = 0.7792, Rtesting = 0.7033) in terms of the correlation coefficient. The relative correlation coefficient (R) differences between the CART (attribute combination iii) and the SVR (attribute combination iii) models are 16.74% in the training phase and 4.49% in the testing phase. We found a direct relationship between R, MAE and RMSE in the training phase, where CART is superior to SVR in minimizing RMSE (CART = 79.46 m3/s; SVR = 126.32 m3/s) and MAE (CART = 49.88 m3/s; SVR = 73.88 m3/s). However, the RMSE (CART = 137.01 m3/s; SVR = 132.77 m3/s) and MAE (CART = 77.76 m3/s; SVR = 75.28 m3/s) statistics are inconsistent with the R statistics in the testing phase.
Even though the CART model is superior to the SVR model, its results in the testing phase are not satisfactory. Thus, we use bagging and stochastic gradient boosting to build CART ensemble models (bagged regression trees, BRT; stochastic gradient boosted regression trees, GBRT) to enhance the forecasting accuracy. The performance statistics of the BRT and GBRT models in the training and testing phases are given in Tables 3 and 4, which show that the BRT and GBRT models with three attributes yield the best results in both the training and test periods. Tables 1–4 indicate that the results of the CART (Rtraining = 0.9096), BRT (Rtraining = 0.9086) and GBRT (Rtraining = 0.9163) models are very close and better than the SVR (Rtraining = 0.7792) model in the training phase. However, the ensemble models (BRT = 0.8085; GBRT = 0.8054) are superior to the single machine learning models (CART = 0.7349; SVR = 0.7033) in terms of the R statistic in the testing phase. The relative correlation coefficient (R) difference between the BRT (attribute combination iii) and the CART (attribute combination iii) models is 10.01%, and the difference between the GBRT (attribute combination iii) and the CART (attribute combination iii) models is 9.59% in the testing phase. It can thus be understood from the empirical results that ensemble learning paradigms can improve the prediction accuracy of their base predictors.
Table 1
R, MAE and RMSE statistics for CART (MAE and RMSE in m3/s).

                                 Training                  Testing
Model attributes                 MAE     RMSE    R         MAE     RMSE    R
(i) Q(t–1)                       95.54   145.33  0.6479    92.86   137.35  0.6493
(ii) Q(t–1) & Q(t–2)             52.42   88.87   0.8851    83.21   141.12  0.7038
(iii) Q(t–1), Q(t–2) & Q(t–3)    49.88   79.46   0.9096    77.76   137.01  0.7349
Table 2
R, MAE and RMSE statistics for SVR (MAE and RMSE in m3/s).

                                 Training                  Testing
Model attributes                 MAE     RMSE    R         MAE     RMSE    R
(i) Q(t–1)                       88.40   148.55  0.6695    89.85   152.56  0.5833
(ii) Q(t–1) & Q(t–2)             73.94   126.79  0.7778    75.30   133.17  0.7018
(iii) Q(t–1), Q(t–2) & Q(t–3)    73.88   126.32  0.7792    75.28   132.77  0.7033
Table 3
R, MAE and RMSE statistics for BRT (MAE and RMSE in m3/s).

                                 Training                  Testing
Model attributes                 MAE     RMSE    R         MAE     RMSE    R
(i) Q(t–1)                       86.95   127.90  0.7457    96.11   143.99  0.6077
(ii) Q(t–1) & Q(t–2)             49.90   79.57   0.9122    68.51   117.17  0.7818
(iii) Q(t–1), Q(t–2) & Q(t–3)    50.21   81.33   0.9086    67.65   106.86  0.8085

Table 4
R, MAE and RMSE statistics for GBRT (MAE and RMSE in m3/s).

                                 Training                  Testing
Model attributes                 MAE     RMSE    R         MAE     RMSE    R
(i) Q(t–1)                       89.24   130.19  0.7441    93.23   136.27  0.6587
(ii) Q(t–1) & Q(t–2)             67.02   99.18   0.8907    76.71   116.52  0.7651
(iii) Q(t–1), Q(t–2) & Q(t–3)    53.86   80.39   0.9163    69.15   108.38  0.8054
In addition, the relative correlation coefficient (R) difference between the BRT (attribute combination iii) and the SVR (attribute combination iii) models is 14.96%, and the difference between the GBRT (attribute combination iii) and the SVR (attribute combination iii) models is 14.52% in the testing phase. These results indicate that tree-based ensemble models are superior alternatives to conventional machine learning models. Moreover, the MAE and RMSE statistics are inconsistent with the correlation statistics in the training and testing phases. In minimizing the training-phase RMSE, CART (79.46 m3/s) is the best model, GBRT (80.39 m3/s) the second, BRT (81.33 m3/s) the third and SVR (126.32 m3/s) the worst; in minimizing the training-phase MAE, CART (49.88 m3/s) is the best, BRT (50.21 m3/s) the second, GBRT (53.86 m3/s) the third and SVR (73.88 m3/s) the worst. In the testing phase, the best model for minimizing RMSE (106.86 m3/s) and MAE (67.65 m3/s) is BRT, the second is GBRT (RMSE = 108.38 m3/s, MAE = 69.15 m3/s), the third is SVR (RMSE = 132.77 m3/s, MAE = 75.28 m3/s) and the worst is CART (RMSE = 137.01 m3/s, MAE = 77.76 m3/s).
The BRT, GBRT, CART and SVR forecasts and residuals in the test period are shown in Fig. 5. The BRT and GBRT models approximate the hydrographs better than the CART and SVR models, and the underestimation of the peaks by the proposed predictive models can be seen from the residuals. Numerical descriptors (Max Q, Min Q, Mean Q, Var Q, MUP and MOP) for the proposed models are summarized in Tables 5 and 6.
The streamflow peak estimates obtained by the BRT, GBRT, CART and SVR models and the corresponding observed values are compared in Table 7 and Fig. 6. The table and figure show that the tree-based models (CART, BRT and GBRT) are more accurate than the SVR model. The BRT estimate of the maximum peak is 566.6 m3/s against an observed 740.8 m3/s, an underestimation of 23.5%, while the GBRT, SVR and CART give 546.2 m3/s (26.3%), 524.8 m3/s (29.2%) and 520.8 m3/s (29.7%), respectively. The CART estimate of the second maximum peak is 520.8 m3/s against an observed 701.4 m3/s, an underestimation of 25.8%, while the BRT, GBRT and SVR give 447.6 m3/s (36.2%), 444.5 m3/s (36.6%) and 205.4 m3/s (70.7%), respectively.
The box plots in Figs. 7 and 8 depict the distributions of the monthly observed and predicted streamflows for the training and test phases. The box height corresponds to the interquartile range, the whiskers mark the 5th and 95th percentiles, the horizontal line within each box is the median, and dots indicate values outside the range. Observed and predicted streamflow data are positively skewed in both phases. Compared with the distribution of the observed streamflow data, the BRT model performs better than the SVR, CART and GBRT models: the distribution of the streamflow data predicted by the BRT model is the most similar to that of the observed data, and the BRT model does the best job of capturing the observed data in the training and test phases.
Figs. 9 and 10 depict the distributions of the model predictions against the measured flow data for the training and testing phases. The results of BRT and GBRT give a better fit to a straight line than those of SVR and CART, indicating that these techniques are more accurate for predicting streamflow.
5. Discussion
In this study, a classification and regression tree (CART) is first employed in monthly streamflow forecasting and compared with the support vector regression (SVR) model. In terms of the correlation coefficient, the CART model yields better results than the SVR model: the relative R differences between the CART and SVR models are 16.74% in the training phase and 4.49% in the testing phase.
Fig. 5. Monthly streamflow estimates of SVR, CART, BRT and GBRT models in test period.
The results indicate that the CART model is significantly better than the SVR model, especially in the training phase. This may be because, first of all, CART does not need a priori information about the dataset, which allows a bigger variety of possible model specifications to be considered. Even if the training set holds some irrelevant information (e.g., measurement errors, misspecification), the model will choose correct splits by itself and hence account for disturbances automatically. Although all possible data splits are analyzed, the CART architecture is flexible enough to account for all of them and to do so quickly. CART can use any combination of continuous and categorical data, so researchers are no longer limited to a particular class of data, and the created models are able to capture more real-life effects (Andriyashin, 2005). Moreover, it is well known that decent parameter settings are vital for the forecasting accuracy of SVMs, and that the performance of an SVM depends strongly on the selection of the kernel function, which forces researchers to make additional assumptions. The architecture of CART, on the other hand, is non-parametric; when no hypotheses about the data structure are available, non-parametric analysis becomes a very effective data mining tool. Besides this, when using CART for predictive modeling, researchers do not need to make any additional assumptions concerning the distribution of model errors (Andriyashin, 2005). Ultimately, the flexible structure of CART helps it fit the target correctly, whereas the SVR is a more rigid method and its stiffness (bias) may increase the prediction error of the model.
However, in the testing phase, the result obtained by CART (R = 0.7349) is not remarkable. Thus, two ensemble learning paradigms (bagging and stochastic gradient boosting) are incorporated to build CART ensembles. In bagging and boosting, multiple versions of the CART are formed by making bootstrap replicas of the learning set and using these as new learning sets; the predicted value generated by the ensembles is an average over these multiple versions of the predictors (Grunwald et al., 2009). Although bagging and boosting both combine the outputs from different predictors, they differ in the ways they permutate the training data and combine the estimates coming from their base learners (Zhang et al., 2008). In bagging, each new resample is drawn at random with replacement from the entire learning dataset, and the resamples are all independent; in stochastic gradient boosting, however, the resampling for the next CART depends on the performance of the previous CART. More precisely, in boosting the algorithm trains the first CART with the original sample, and the training set of each new CART is assembled based on the performance of the prior CART, emphasizing observations whose predictions from the previous CART differ significantly from the observed values.
Table 5
Numerical descriptors for the four predictive models and the observed data for the training phase (discharges in m3/s).

          Min Q    Max Q      Mean Q    Var Q       MOP      MUP
Observed  48.96    1119.58    210.63    44365.70    –        –
SVR       34.43    893.51     174.30    30379.87    313.36   799.15
CART      63.94    984.44     211.99    44938.54    380.74   300.81
BRT       72.85    774.81     210.94    44495.49    340.04   399.29
GBRT      67.79    842.09     212.07    44974.77    304.46   316.19
Table 6
Numerical descriptors for the four predictive models and the observed data for the testing phase (discharges in m3/s).

          Min Q    Max Q      Mean Q    Var Q       MOP      MUP
Observed  56.89    740.84     204.83    41953.42    –        –
SVR       12.95    613.23     170.93    29215.56    254.30   495.98
CART      63.94    984.44     218.99    47957.53    681.34   390.16
BRT       73.11    663.32     210.20    44184.57    355.47   405.15
GBRT      80.00    722.04     215.50    46438.58    418.94   410.10
Table 7
The comparison of SVR, CART, BRT and GBRT peak estimates for the test period.

Peak   Observed        Estimates (m3/s)                    Relative error (%)
no.    peaks (m3/s)    BRT     GBRT    CART    SVR         BRT    GBRT   CART   SVR
1      318.6           168.0   222.5   171.0   161.3       47.3   30.2   46.3   49.4
2      581.9           530.4   476.4   518.6   293.2       8.9    18.1   10.9   49.6
3      251.8           139.3   159.2   171.0   111.4       44.7   36.8   32.1   55.7
4      659.8           538.8   559.1   520.8   255.8       18.3   15.3   21.1   61.2
5      561.2           156.0   151.1   171.0   130.4       72.2   73.1   69.5   76.8
6      740.8           566.6   546.2   520.8   524.8       23.5   26.3   29.7   29.2
7      701.4           447.6   444.5   520.8   205.4       36.2   36.6   25.8   70.7
8      391.7           152.9   154.1   159.0   136.5       61.0   60.7   59.4   65.1
9      589.5           550.7   559.1   520.8   367.7       6.6    5.2    11.7   37.6
Fig. 6. Monthly streamflow peak estimates of SVR, CART, BRT and GBRT models in test period.
Fig. 7. Box plots of monthly observed and predicted streamflows for training phase.
Fig. 8. Box plots of monthly observed and predicted streamflows for test phase.
The patterns of the new sample are adjusted to have a higher probability of being sampled, so they have a greater chance of appearing in the new sample than those correctly predicted; this means that different CARTs specialize in different parts of the observation space (Shu and Burn, 2004). In this study, 100 randomly drawn replica datasets are generated for building both the bagged and the boosted regression trees. These datasets are then used to build individual CART models, and the outputs of the CART models are combined linearly to produce the final forecast. To the best of our knowledge, this is the first study that employs these ensemble learning paradigms in building CART ensembles for monthly streamflow forecasting.
In the training phase, the results of CART (R = 0.9096), the bagged regression trees (R = 0.9086) and the stochastic gradient boosted regression trees (R = 0.9163) are very close to each other; it appears that the ensemble learning methods (bagging and boosting) do not help in the training phase. This result is not surprising, because the training result of CART is already remarkable, so the ensemble learning methods, which are expected to advance the accuracy of a weak learning process, may not contribute further. However, good training accuracy does not guarantee good testing accuracy. Indeed, in the testing phase the empirical results suggest that the bagged regression trees (BRT) and stochastic gradient boosted regression trees (GBRT) models (RBRT = 0.8085, RGBRT = 0.8054) are superior to the single CART model. Although BRT slightly outperforms GBRT, there is no significant difference between bagging and boosting; similar results were reported by Ismail and Mutanga (2010) in an empirical comparison of ensemble learning methods based on CART. The bagging and boosting methods increase the R statistic of the single CART model by 10.01% and 9.59%, respectively. In addition, the relative correlation coefficient (R) difference between the BRT and SVR models is 14.96%, and the difference between the GBRT and SVR models is 14.52%, in the testing phase. These results indicate that the ensemble methods can noticeably improve the accuracy of the single CART model and that they are superior alternatives to the well-known support vector machine model. Our findings are in line with previously reported results: Jeong and Kim (2005) used a bagged neural network in monthly inflow forecasting, and the overall results indicated that the bagged neural network outperformed a simple ANN among the three rainfall–runoff models.
Besides this, the peak estimates of the tree-based models are also more accurate than those of the SVR model. In general, however, the CART model performed better than the proposed ensemble models in forecasting monthly peak flows.
Fig. 9. Observed versus predicted monthly streamflow data for training phase.
Fig. 10. Observed versus predicted monthly streamflow data for testing phase.
This may be because the proposed ensemble models try to reduce noise by generating sub-datasets to enhance the overall prediction accuracy, a procedure that usually smooths the data and may lead to a lack of fit of the model for peak estimation. Visually, however, the distributions of the streamflow data predicted by the BRT and GBRT models fit the distribution of the observed data remarkably well in both the training and testing phases, while those of CART and SVR do not.
A question then comes to mind: why do CART ensembles work? As argued before, the main philosophy of the bagging and stochastic gradient boosting ensemble learning methods is to enhance the prediction accuracy of conventional weak predictors. In general, tree-based ensembles inherit almost all the advantages of tree-based models while overcoming their primary problem, which is inaccuracy. Moreover, tree-based ensemble models can reasonably increase their accuracy by generating many replica datasets, creating various models which have a lower bias, and then integrating them into an ensemble model with higher performance (Chou et al., 2011). Since CART and SVR are data-driven techniques, they require a large number of input–output data sets for their training process compared with ensemble models. This is why the tree-based ensemble models perform better than the single CART and SVR models: using only one model to predict the streamflow time series usually may not illuminate the internal mechanism of the phenomenon.
6. Conclusions
In this study, 35 years of measured data (1968–2002) from the Karşıköy observation station on the Çoruh River in Turkey are used, and three attribute combinations based on preceding monthly streamflows are developed to forecast current streamflow values. The correlation coefficient (R), mean absolute error (MAE) and root
mean squared error (RMSE) statistics are used to evaluate the proposed predictive models. The results indicate that: (i) classification and regression trees (CART) are a promising technique for monthly streamflow forecasting and yield better results than the support vector regression (SVR) model; (ii) the results of the CART model in the testing phase are noticeable but not overwhelming; (iii) ensemble learning methods (bagging, stochastic gradient boosting) can noticeably increase the accuracy of the single CART model; and (iv) the bagging ensemble model yields slightly better results than the boosting ensemble model. As a result of this empirical study, tree-based ensemble models can be implemented to successfully forecast monthly streamflow time series, and the proposed models are very easy to use.
In this study, bagging and stochastic gradient boosting are used to build the ensemble models; other ensemble learning methods (e.g., stacking, AdaBoost) could be used for this process. Moreover, this study employs a single CART model as the base predictor; in constructing ensemble models, other machine learning models (e.g., support vector machines, artificial neural networks) could be used as base predictors. These may be subjects of future work. Finally, only three input–output combinations based on preceding monthly streamflows are used here to forecast current streamflow values. A broader analysis of the environmental factors that influence streamflow is beyond the scope of this paper, but it could be the subject of another important future study.
Acknowledgement
The authors would like to thank the editor and the anonymous referees for their valuable and helpful comments.
References
Anctil, F., Lauzon, N., 2004. Generalisation for neural networks through data sampling and training procedures with applications to streamflow predictions. Hydrol. Earth Syst. Sci. 8 (5), 940–958. http://dx.doi.org/10.5194/hess-8-940-2004.
Andriyashin, A., 2005. Financial Applications of Classification and Regression Trees. Master Thesis, Humboldt University, Berlin, pp. 6–8.
Araghinejad, S., Azmi, M., Kholghi, M., 2011. Application of artificial neural network ensembles in probabilistic hydrological forecasting. J. Hydrol. 407 (1–4), 94–104. http://dx.doi.org/10.1016/j.jhydrol.2011.07.011.
Asefa, T., Kemblowski, M., Mckee, M., Khalil, A., 2006. Multi-time scale stream flow predictions: the support vector machines approach. J. Hydrol. 318 (1–4), 7–16. http://dx.doi.org/10.1016/j.jhydrol.2005.06.001.
Boucher, M.-A., Laliberté, J.-P., Anctil, F., 2010. An experiment on the evolution of an ensemble of neural networks for streamflow forecasting. Hydrol. Earth Syst. Sci. 14, 603–612.
Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J., 1984. Classification and Regression Trees. Wadsworth Int. Group, Belmont, California, pp. 357–358.
Breiman, L., 1996. Bagging predictors. Machine Learning 24 (2), 123–140. http://dx.doi.org/10.1023/A:1018054314350.
Breiman, L., 1999. Using Adaptive Bagging to Debias Regressions. Technical Report No. 547, University of California, Berkeley.
Bühlmann, P., Yu, B., 2002. Analyzing bagging. Ann. Stat. 30 (4), 927–961.
Cannon, A.J., Whitfield, P.H., 2002. Downscaling recent streamflow conditions in British Columbia, Canada using ensemble neural network models. J. Hydrol. 259, 136–151. http://dx.doi.org/10.1016/S0022-1694(01)00581-9.
Chou, J.S., Chiu, C.K., Farfoura, M., Al-Taharwa, I., 2011. Optimizing the prediction accuracy of concrete compressive strength based on a comparison of data-mining techniques. J. Comput. Civil Eng. 25 (3), 242–263. http://dx.doi.org/10.1061/(ASCE)CP.1943-5487.0000088.
Efron, B., 1979. Bootstrap methods: another look at the jackknife. Ann. Stat. 7 (1), 1–26. http://dx.doi.org/10.1214/aos/1176344552.
Erdal, H.I., Ekinci, A., 2012. A comparison of various artificial intelligence methods in the prediction of bank failures. Comput. Econ. http://dx.doi.org/10.1007/s10614-012-9332-0.
Friedman, J.H., 2001. Greedy function approximation: a gradient boosting machine. Ann. Stat. 29 (5), 1189–1232.
Friedman, J.H., 2002. Stochastic gradient boosting. Comput. Stat. Data Anal. 38 (4), 367–378. http://dx.doi.org/10.1016/S0167-9473(01)00065-2.
Grunwald, S., Daroub, S.H., Lang, T.A., Diaz, O.A., 2009. Tree-based modeling of complex interactions of phosphorus loadings and environmental factors. Sci. Total Environ. 407 (12), 3772–3783.
Guo, J., Zhou, J., Qin, H., Zou, Q., Li, Q., 2011. Monthly streamflow forecasting based on improved support vector machine model. Expert Syst. Appl. 38 (10), 13073–13081. http://dx.doi.org/10.1016/j.eswa.2011.04.114.
Hancock, T., Put, R., Coomans, D., Vander Heyden, Y., Everingham, Y., 2005. A performance comparison of modern statistical techniques for molecular descriptor selection and retention prediction in chromatographic QSRR studies. Chemomet. Intell. Lab. Syst. 76 (2), 185–196. http://dx.doi.org/10.1016/j.chemolab.2004.11.001.
Hastie, T., Tibshirani, R., Friedman, J., 2001. The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer-Verlag, New York, p. 500.
Ismail, R., Mutanga, O., 2010. A comparison of regression tree ensembles: predicting Sirex noctilio induced water stress in Pinus patula forests of KwaZulu-Natal, South Africa. Int. J. Appl. Earth Obs. Geoinf. 12S, S45–S51.
Jeong, D.-I., Kim, Y.-O., 2005. Rainfall–runoff models using artificial neural networks for ensemble streamflow prediction. Hydrol. Process. 19 (19), 3819–3835. http://dx.doi.org/10.1002/hyp.5983.
Kisi, O., Cimen, M., 2011. A wavelet-support vector machine conjunction model for monthly streamflow forecasting. J. Hydrol. 399 (1–2), 132–140. http://dx.doi.org/10.1016/j.jhydrol.2010.12.041.
Li, P.-H., Kwon, H.-H., Sun, L., Lall, U., Kao, J.-J., 2010. A modified support vector machine based prediction model on streamflow at the Shihmen Reservoir, Taiwan. Int. J. Climatol. 30 (8), 1256–1268. http://dx.doi.org/10.1002/joc.1954.
Pino-Mejias, R., Jimenez-Gamero, M.D., Cubiles-de-la-Vega, M.D., Pascual-Acosta, A., 2008. Reduced bootstrap aggregating of learning algorithms. Pattern Recogn. Lett. 29 (3), 265–271. http://dx.doi.org/10.1016/j.patrec.2007.10.002.
Samsudin, R., Saad, P., Shabri, A., 2010. A hybrid least squares support vector machines and GMDH approach for river flow forecasting. Hydrol. Earth Syst. Sci. Discuss. 7, 3691–3731. http://dx.doi.org/10.5194/hessd-7-3691-2010.
Schapire, R.E., 1990. The strength of weak learnability. Machine Learning 5, 197–227. http://dx.doi.org/10.1023/A:1022648800760.
Schölkopf, B., Burges, C.J.C., Smola, A., 1999. Advances in Kernel Methods: Support Vector Learning. MIT Press, Cambridge, MA.
Shu, C., Burn, D.H., 2004. Artificial neural network ensembles and their application in pooled flood frequency analysis. Water Resour. Res. 40 (W09301). http://dx.doi.org/10.1029/2003WR002816.
Snelder, T.H., Lamouroux, N., Leathwick, J.R., Pella, H., Sauquet, E., Shankar, U., 2009. Predictive mapping of the natural flow regimes of France. J. Hydrol. 373, 57–67. http://dx.doi.org/10.1016/j.jhydrol.2009.04.011.
Tiwari, M.K., Chatterjee, C., 2010. Development of an accurate and reliable hourly flood forecasting model using Wavelet–Bootstrap–ANN (WBANN) hybrid approach. J. Hydrol. 394 (3–4), 458–470. http://dx.doi.org/10.1016/j.jhydrol.2010.10.001.
Tiwari, M.K., Chatterjee, C., 2011. A new Wavelet–Bootstrap–ANN hybrid model for daily discharge forecasting. J. Hydroinform. 13 (3), 500–519. http://dx.doi.org/10.2166/hydro.2010.142.
Vapnik, V.N., 1995. The Nature of Statistical Learning Theory. Springer-Verlag, New York.
Vezza, P., Comoglio, C., Rosso, M., Viglione, A., 2010. Low flows regionalization in North-Western Italy. Water Resour. Manage. 24, 4049–4074. http://dx.doi.org/10.1007/s11269-010-9647-3.
Wang, G., Hao, J., Ma, J., Jiang, H., 2011. A comparative assessment of ensemble learning for credit scoring. Expert Syst. Appl. 38 (1), 223–230. http://dx.doi.org/10.1016/j.eswa.2010.06.048.
Zhang, C.X., Zhang, J.S., Wang, G.W., 2008. An empirical study of using rotation forest to improve regressors. Appl. Math. Comput. 195 (2), 618–629. http://dx.doi.org/10.1016/j.amc.2007.05.010.