Development of multivariate adaptive regression spline integrated with differential evolution model for streamflow simulation


Journal of Hydrology 573 (2019) 1–12

Contents lists available at ScienceDirect

Journal of Hydrology journal homepage: www.elsevier.com/locate/jhydrol

Research papers

Zainab Abdulelah Al-Sudani (a), Sinan Q. Salih (b), Ahmad Sharafati (c), Zaher Mundher Yaseen (d)

(a) Water Resources Department, College of Engineering, University of Baghdad, Iraq
(b) Computer Science Department, College of Computer Science and Information Technology, University of Anbar, Ramadi, Iraq
(c) Department of Civil Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran
(d) Sustainable Developments in Civil Engineering Research Group, Faculty of Civil Engineering, Ton Duc Thang University, Ho Chi Minh City, Viet Nam

ARTICLE INFO

This manuscript was handled by G. Syme, Editor-in-Chief, with the assistance of Helge Bormann, Associate Editor

Keywords: MARS-DE; Streamflow simulation; Semi-arid environment; Antecedent values

ABSTRACT

Among the components of the hydrological cycle, streamflow is one of the essential processes that needs to be studied. An accurate and reliable soft computing model for forecasting this process is vital for water resource planning and management. The climatological environment strongly influences streamflow, and studying this influence is significant from the hydrological perspective. Machine learning models have become predominant in capturing the complexity of hydrological applications. This research presents the implementation of a novel hybrid model, the Multivariate Adaptive Regression Spline integrated with Differential Evolution (MARS-DE), to forecast streamflow patterns in a semi-arid region. To achieve this, monthly streamflow time series data at Baghdad station, located on the Tigris River, Iraq, are inspected. For model validation, Least Square Support Vector Regression (LSSVR) and standalone MARS models are also developed. Several statistical indicators are computed to verify the modeling accuracies. Based on the achieved results, the MARS-DE model exhibited an excellent predictive capability for monthly streamflow in a semi-arid region. Quantitatively, the MARS-DE, LSSVR and MARS models achieved minimum root mean square error (RMSE) and mean absolute error (MAE) values of 46.64 and 35.25 m³/s, 57.50 and 49.20 m³/s, and 78.01 and 62.65 m³/s, respectively. In conclusion, several perspectives are suggested for further studies to enhance the forecasting capability of the model.

1. Introduction

1.1. Streamflow forecasting background

Streamflow forecasting is one of the challenging problems in hydrological engineering because streamflow exhibits non-stationary variability, chaotic distribution, and non-linearity (Yaseen et al., 2015). Given the sophistication of hydrological applications, reliable and robust predictive models are needed for real-time streamflow forecasting. To date, the exploration of newly developed intelligent models continues to motivate hydrologists and soft computing developers. This is because every watershed is characterized differently and because of the randomness, natural stochasticity and periodicity in streamflow patterns (Kember et al., 1993; Tilmant et al., 2007). Additionally, no single model can be demonstrated as a general model for all the diverse watershed conditions. In practice, streamflow is a complex phenomenon whose future pattern is very challenging to forecast because of its dynamic, non-linear, and non-stationary behaviour (Villarini et al., 2011). Understanding the complexity of the streamflow pattern is essential for achieving appropriate water resources management. Several studies have been conducted on streamflow simulation over the past three decades; the major area of interest has been to comprehend the influence of global and regional climatic changes on the hydrological cycle, for instance by studying flood and drought trends over a specific region (Chua and Holz, 2005; Condon et al., 2014; Das Gupta, 2008).

Corresponding author. E-mail address: [email protected] (Z.M. Yaseen).

https://doi.org/10.1016/j.jhydrol.2019.03.004 Received 7 February 2019; Received in revised form 6 March 2019; Accepted 7 March 2019 Available online 13 March 2019 0022-1694/ © 2019 Elsevier B.V. All rights reserved.


1.2. Literature review

Based on the existing literature, there are two main categories of streamflow modeling methods: (i) physical-based models, and (ii) soft computing models. The first category requires deeper analysis and extensive information on watershed hydrological parameters, such as boundary conditions and several others (Yaseen et al., 2018c), whereas soft computing models (presented in the form of artificial intelligence (AI) models) require less effort and less prior knowledge of the watershed characteristics. AI models are conceptual-based models that use historical information to capture the non-linear relationship between the input and output variables (Maier et al., 2014). In addition, AI-based modeling involves searching for the optimal modeling solution, identifying the associated variables, and validating the final proposed model (Araghinejad et al., 2011). One of the major concerns with AI models is the identification of the appropriate modeling procedure. At this stage, the learning and feature extraction processes are applied to the input data, while the optimal model is approximated by minimizing the training error between the predicted and the target matrix. The optimal model is selected from the set of trained models based on an independent validation set, usually judged by the lowest mean square error. Various factors that influence real hydrological conditions can affect the approximation accuracy of AI models. Such factors include the time scale or forecast horizon, the input data determination, and the configuration of the AI models. Several intelligent models such as fuzzy logic, neural network, support vector machine and genetic programming models have recently been used for modeling streamflow patterns (Yaseen et al., 2015). This is owing to their ability to detect the non-linearity, redundancy, and dynamic pattern of the time series (Yaseen et al., 2018a).

Additionally, these models need no prior knowledge of the studied problem, i.e., the hydrological or climatological processes (Ghorbani et al., 2018). However, AI models have several problems in terms of their modeling architecture: they must be matched with a learning mechanism that extracts the valuable information from the historical streamflow data set. AI models also require an appropriate optimization procedure to re-adjust the biases and weights of the model structure (Keshtegar et al., 2016; Labat, 2005). Above all, AI models need numerous trial-and-error runs during the training phase to estimate several parameters (such as the number of hidden layers, the type of transfer function, and the number of neurons in the hidden layers) in order to achieve an optimal architecture.

1.3. Research inspiration

Hydrologists are always keen to determine the fate of rainfall water within a particular watershed. Various research efforts have been devoted to this matter, as watershed management is highly significant nowadays (Tongal and Booij, 2018). The main challenge in quantifying streamflow is the interaction of several hydrological components, such as groundwater seepage, evaporation/evapotranspiration, sub-water flow, and many other water losses. Owing to the foregoing hydrological and climatological processes, the streamflow pattern is characterised by high non-linearity, non-stationarity, and random distribution; hence, achieving an accurate streamflow forecasting model is highly challenging because of this high stochasticity. In this regard, hydrologists have focused on elucidating the hydrological processes behind these components by developing hydrological mathematical models. Based on the established state-of-the-art research, hydrological mathematical models are grouped into mechanistic models and data-driven (or machine learning) models (Yaseen et al., 2016a,b). Streamflow forecasting can be beneficial for multiple purposes in water resources engineering (Diop et al., 2018; Ghorbani et al., 2018; Yaseen et al., 2018b). More specifically, accurate long-term streamflow forecasting (monthly or yearly) is necessary for the planning and operation of water storage, river sediment transport, agricultural and irrigation usage, and others. On the other hand, an accurate short-term (real-time) streamflow forecast, such as hourly or daily, is important for flood prediction in areas affected by flooding (Kagoda et al., 2010; Kisi, 2010; Shiri and Kisi, 2010). Forecasting of this nature is a valuable tool for providing advance warnings of likely flooding, to minimize and mitigate its effects on infrastructure and human health. Recently, there have been new explorations of reliable machine learning models (Keshtegar et al., 2018; Mehdizadeh et al., 2017; Yaseen et al., 2016a,b). The multivariate adaptive regression spline (MARS) model has demonstrated its capability in solving regression engineering problems (Kayadelen et al., 1992; Tong et al., 2016; Wang et al., 2015; Zhang and Goh, 2016; Zhou and Leung, 2007). The main advantage of the MARS model as a non-parametric regression procedure is its potential to capture non-linear relationships and the interactions between variables (Lall et al., 1996; Sephton, 2001). The MARS model has made substantial progress in hydrology and related fields, especially in drought prediction (Deo et al., 2017), air quality modeling (García Nieto and Álvarez Antón, 2014), pan evaporation prediction (Kisi, 2015), river water pollution (Kisi and Parmar, 2016), susceptibility to earth-flow landslide modeling (Conoscenti et al., 2015), ground water level forecasting (Rezaie-balf et al., 2017), soil cation exchange capacity estimation (Emamgolizadeh et al., 2015), dissolved oxygen concentration prediction (Heddam and Kisi, 2018), and longitudinal dispersion coefficient prediction (Haghiabi, 2016).

The construction of a MARS model requires optimization of its internal parameters. Hybrid models, as robust machine learning models, have shown remarkable predictive capability. The current study is devoted to the integration of the MARS model with a nature-inspired algorithm called Differential Evolution (DE). The DE algorithm is used as a hyperparameter tuning tool to optimize the internal parameters during the training stage. DE is one of the well-developed metaheuristic algorithms and is based on an evolutionary global search method (Piotrowski and Napiorkowski, 2011). It was derived from Genetic Algorithms (GA) and is essentially used to solve multi-dimensional optimization problems that involve continuous variables. Hence, to the best of our knowledge, MARS-DE is proposed in this study as a new hybrid predictive model for streamflow forecasting in a semi-arid region.

1.4. Research objectives

The application of the multivariate adaptive regression spline model hybridized with the differential evolution algorithm is examined to forecast streamflow in a semi-arid region at Baghdad meteorological station, Tigris River, Iraq. The capability of the proposed model is validated against one of the popular regression models (i.e., the LSSVR model), the standalone MARS model, and a couple of models established in the literature on the same case study. Various input combinations are constructed to build the predictive models based on the correlated streamflow of previous months. The results achieved are comprehensively discussed and comparatively analysed. The rest of this article is structured as follows: Section 2 covers the case study description and data collection, while Section 3 covers the methodological overview. Section 4 reports the application and analysis of the findings, while Section 5 presents the conclusions drawn from the study.

2. Case study and data description

The Tigris River is considered one of the largest rivers in the Middle Eastern region, with a total length of approximately 1718 km. The Tigris, also called Nahar Degla, flows from its source in Turkey to Iraq, and about 85% of the total basin area (253,000 km²) resides in the Iraqi region (Macklin and Lewin, 2015) (see Fig. 1). The Tigris and Euphrates rivers provide the main source of fresh water for all the cities, from the northern to the southern part of the region. The water is used for many purposes such as domestic usage, agriculture, irrigation, and industrial consumption. This region, particularly the capital city of Baghdad, is experiencing massive population growth and technological and industrial development. Hence, to satisfy these critical demands, an accurate picture of the magnitude of the streamflow is essential. The climate of the river basin is semi-arid; several scientific studies have reported that the mean rainfall of the area is approximately 216 mm, with a seasonal character (December to February) (Yaseen et al., 2018a). The mean monthly streamflow of the Tigris River at the investigated meteorological station is 411.35 m³/s, and the standard deviation is 234.52 m³/s. The air temperature varies between summer and winter: during the summer it might exceed 45 °C, whereas it may drop below 10 °C during the winter. The current research is established using 20 years (1991–2010) of monthly streamflow data (see Fig. 2). The statistical characteristics of the complete, training (90%) and testing (10%) datasets are tabulated in Table 1.

Fig. 1. The location of the studied meteorological station, Baghdad station on Tigris river.

Fig. 2. The actual dataset of streamflow used to build the predictive models.

Table 1
The statistical characteristics of the inspected streamflow dataset.

                    Minimum (m³/s)   Maximum (m³/s)   Mean (m³/s)   Standard deviation   Skewness
The complete data   298.10           2651.00          726.54        364.23               2.36
Training phase      298.10           2651.00          745.25        377.14               2.25
Testing phase       331.40           936.40           565.61        150.82               0.75

3. Methodology overview

In this section, the frameworks of the multivariate adaptive regression spline (MARS), the differential evolution (DE) algorithm, the hybridized MARS-DE and the benchmark model (LSSVR) are explained mathematically.

3.1. Multivariate adaptive regression spline (MARS) model

The MARS model is one of the well-established models for regression and classification in engineering applications (Friedman, 1991). As the main problem solved in this research is regression, a detailed methodological explanation is presented here. The main goal of the MARS model is to solve a regression problem, that is, to estimate a continuous variable $y$ ($n \times 1$) based on a set of variables $X$ ($n \times p$). Here, the defined regression model consists of a linear function $f(x)$ and an error term $e$; the formula is expressed as follows:

$$y = f(x) + e \tag{1}$$

The MARS model is a mathematical model whose internal function is based on piecewise polynomials of degree $q$, called basis functions ($T$) or splines. The polynomials are fitted over different intervals of the input attributes. The intervals are bounded by knots ($k$). The MARS spline functions can be defined as (Friedman and Roosen, 1995a,b):

$$[-(x - k)]_+^q = \begin{cases} (k - x)^q & \text{if } x < k \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

$$[+(x - k)]_+^q = \begin{cases} (x - k)^q & \text{if } x \ge k \\ 0 & \text{otherwise} \end{cases} \tag{3}$$

where $q$ ($\ge 0$) is the power that determines the degree of the piecewise polynomial function. If $q = 1$, the splines are linear (as in this study).

If $y$ is to be estimated with $M$ basis functions, the MARS model would be:

$$\hat{Y} = \hat{f}_M(X) = C_0 + \sum_{m=1}^{M} C_m B_m(X) \tag{4}$$

where $\hat{Y}$ is the forecasted outcome given by this model, $C_0$ is the intercept, $B_m(X)$ is the $m$-th basis function, and $C_m$ is the corresponding coefficient. During the training procedure, the model (both the variables and the node positions) is optimized. For instance, for a given dataset $X$ with $n$ samples and $p$ independent variables, there are $N = n \times p$ pairs of spline basis functions, where $x_{ij}$ ($i = 1, 2, \ldots, n$; $j = 1, 2, \ldots, p$) are the node locations. At the first stage, the MARS model is constructed in the forward process. Two pairs of basis functions are embedded at the first step of the forward process (Friedman, 1991; Friedman and Roosen, 1995a,b). The resulting model usually overfits, which limits its capability in solving the regression problem. Hence, the model must be pruned to enhance its forecasting accuracy, and this is usually done in the backward process using the generalized cross-validation (GCV) parameter (Wang et al., 2010). The mathematical form of the GCV can be expressed as:

$$GCV(M) = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{f}_M(X_i)\right)^2}{\left(1 - \frac{C(M)}{n}\right)^2} \tag{5}$$

where $C(M)$ is the parameter that penalizes complexity and increases with the number of basis functions; it can be obtained through (Friedman and Roosen, 1995a,b; Sekulic and Kowalski, 1992):

$$C(M) = (M + 1) + dM \tag{6}$$

where $M$ is the number of basis functions present in formula (4), and $d$ is the penalty factor. Once the modeling procedure has been completed, it is possible to identify which input variables should be used in the basis functions (not all the input variables will end up in the final model) based on different criteria, such as the number of subsets in which a variable appears, the increment of GCV, or the residual sum of squares (RSS) when the variable is removed from the model (Cheng and Cao, 2014a,b; Zhang et al., 2015).

3.2. Optimization algorithm: Differential evolution (DE)

In the field of optimization, the DE algorithm is one of the robust nature-inspired algorithms (Zafar et al., 2017); its major concept is the iterative improvement of candidate solutions, from which the optimum candidate is selected (Zhang et al., 2012). The main advantage of the DE algorithm is that there is no requirement for any optimization function to deliver the best solution; instead, the algorithm tunes any assigned problem by sustaining a population of candidate solutions and creating new candidate solutions in parallel to the existing ones. The best fitness criterion is the deciding function for the optimum population candidates (Mandal et al., 2011). The main concept of the DE algorithm is based on assuming that the optimized variables are encoded as vectors. The length $n$ of each vector equals the number of variables of the prediction problem. The population consists of a number of parents (NP); a vector $X_p^g$ is defined, where $p = 1, \ldots, NP$ indexes the individual in the population and $g$ is the corresponding generation. Each vector is composed of components $X_{p,m}^g$, where $m$ indexes the variable, and each variable is restricted to the domain bounded by $X_m^{\min}$ and $X_m^{\max}$. The mechanism of the DE algorithm is performed through four stages, including (Azimi et al., 2017):

i. Initialization: The first generation (i.e., population) is initialized randomly by considering the minimum and maximum magnitudes of the variables.
ii. Mutation: Random noise vectors are constructed from a number of partner individuals in this stage.
iii. Recombination: A trial vector is generated randomly after the partner vector is computed, and the result is checked against $X_p^g$.
iv. Selection: The final stage is to compare the trial vector with the original one; the best value is determined through the fitness function.

The algorithm of the proposed MARS-DE model is presented in the following sub-section.

3.3. The hybridized MARS-DE model

In this paper, a hybrid forecasting model based on multivariate adaptive regression spline (MARS) and differential evolution (DE) is proposed. The main motivation for integrating the DE algorithm with MARS is to determine the optimal values of the hyper-parameters of the MARS model. These hyper-parameters have an essential influence on the performance of MARS models (Cheng and Cao, 2014a,b), and they are: (i) the maximum number of basis functions (MaxFun, $M_{max}$), (ii) the penalty parameter ($d$) (smoothing parameter), and (iii) the maximum interaction between variables ($I_{max}$). There are several optimization methods used to compute the optimal hyper-parameters, including random search, Nelder-Mead search, grid search, heuristic search, genetic algorithms, pattern search, and several others (AL-Musaylh et al., 2018; Chou et al., 2004; Crino and Brown, 2007; Friedman and Roosen, 1995a,b; García Nieto and Álvarez Antón, 2014; Zhang and Goh, 2016). In the current study, a robust nature-inspired optimization algorithm based on the differential evolution concept is hybridized with the MARS model to tune its internal parameters. The proposed MARS-DE is used to forecast the one-month-ahead streamflow value from several correlated lag times selected using correlation statistics. The determination coefficient is used to guide the optimization process. The flowchart of the proposed MARS-DE model is shown in Fig. 3. The determination coefficient is the statistical metric used to evaluate how well the regression curve approximates the actual observations. In addition, it is a descriptive metric that ranges between 0 and 1, where the optimal score is 1, representing an accurate prediction. Cross-validation is the standard technique used here to find the real coefficient of determination (R²) for the analysed dataset. The main steps of the proposed MARS-DE model are given as follows:

Step 1: The original dataset is normalized to [0, 1] to harmonize the scale of the data and avoid a large numeric range; the normalization function is given as follows:

$$X_i^N = \frac{X_i - X_{min}}{X_{max} - X_{min}} \tag{7}$$

where $X_i$ is the input value, $X_{min}$ is the minimum value, $X_{max}$ is the maximum value, and $X_i^N$ is the normalized value of the dataset.

Step 2: Define the fitness function as the mean absolute percentage error (MAPE), which is established as:

$$MAPE = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{sf_o - sf_f}{sf_o}\right| \tag{8}$$

where $sf_o$ is the actual value at period $t$, $sf_f$ is the predicted value at period $t$, and $n$ is the number of periods used in the calculation.

Step 3: Initialize the values of the MARS parameters ($M_{max}$, $I_{max}$, $d$) randomly based on a uniform distribution, as given in Eq. (9). The initial values of these parameters are generated within predefined ranges; the lower and upper bounds for the case study are as follows:

• $M_{max}$ in the range [3, 100].
• $I_{max}$ in the range [1, 4].
• $d$ in the range [−1, 4].

$$x_{ij} = (Ub - Lb) \times rand + Lb \tag{9}$$

Therefore, the representation of each solution consists of three main components, as given in Fig. 4.

Step 4: Execute the mutation operator. For each individual $X$, the standard DE algorithm generates a corresponding mutated individual, which is expressed by the following equation:

$$V_i^{G+1} = X_{r1}^{G} + F \times \left(X_{r2}^{G} - X_{r3}^{G}\right) \tag{10}$$

where $X_{r1}$, $X_{r2}$, and $X_{r3}$ are individuals selected randomly from the population. None of them may coincide with the target individual $i$; hence, the population must contain at least four individuals. $F$ is the scaling factor that governs the degree of mutation and lies within the range [0, 2], as reported in the literature.

Step 5: Execute the crossover operator and generate an experimental individual as follows:

$$u_{ij}^{G+1} = \begin{cases} V_{ij}^{G+1} & \text{if } r(j) \le CR \text{ or } j = rn(i) \\ X_{ij}^{G} & \text{otherwise} \end{cases} \tag{11}$$

where $r(j)$ represents a randomly generated number in the range [0, 1], and $j$ denotes the $j$-th gene of an individual. $CR$ represents the crossover rate, which is in the range [0, 1] and is determined by the user. The gene index $rn(i)$ is generated randomly in the range [1, $D$] and is applied to ensure that at least one dimension of the experimental individual comes from the mutated individual. The global search of the proposed algorithm is enhanced when the value of $CR$ gets smaller.

Step 6: Perform the selection operator. Each objective individual $X_i^G$ must compete with its associated experimental individual $u_i^{G+1}$, which is generated after the mutation and crossover operations. When the experimental individual $u_i^{G+1}$ has a better fitness value than the objective individual $X_i^G$, $u_i^{G+1}$ becomes the offspring; otherwise $X_i^G$ automatically becomes the offspring. Taking a minimization problem as an example, the selection method is represented in Eq. (12), where $f$ is the fitness function.

$$X_i^{G+1} = \begin{cases} u_i^{G+1} & \text{if } f\left(u_i^{G+1}\right) < f\left(X_i^{G}\right) \\ X_i^{G} & \text{otherwise} \end{cases} \tag{12}$$

Step 7: The proposed MARS-DE is stopped when the stop criterion is met. If the stop criterion is not met, the model proceeds to the next generation. The number of function evaluations (NFE) is used to stop the DE while searching for the optimal parameters of MARS.

3.4. Least square support vector regression (LSSVR) model

Owing to advances in machine learning, several model improvements have been proposed, and the LSSVR model is one of the newly extended versions (Suykens and Vandewalle, 1999). LSSVR is a modified version of support vector regression (SVR), which was originally developed by Vapnik (1995). The LSSVR model was selected as a benchmark model because of its potential for solving several hydrological problems (Deo et al., 2016; Deo and Samui, 2017; Kumar et al., 2016; Mouatadid et al., 2018). It has been acknowledged that the main drawback of the SVR model is the convergence of its learning process, which the LSSVR model resolves by eliminating the need for a quadratic programming solution. This improvement could exclude several limitations, such as overfitting and trapping in local minima. Further, it may produce a stable solution in place of solving the quadratic programming problem (Ji et al., 2014; Xie et al., 2013).

Fig. 3. The proposed MARS-DE predictive model for streamflow forecasting.
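For the MARS component of the hybrid model, the hinge basis functions of Eqs. (2) and (3) and the GCV criterion of Eqs. (5) and (6) can be illustrated with a few lines of Python. This is only a sketch under stated assumptions: the function names are hypothetical, the GCV denominator uses the number of samples n, and no forward or backward MARS pass is implemented.

```python
def hinge_pair(x, knot, q=1):
    """Eqs. (2)-(3): left and right truncated power functions at a knot."""
    left = (knot - x) ** q if x < knot else 0.0
    right = (x - knot) ** q if x >= knot else 0.0
    return left, right


def effective_parameters(m, d):
    """Eq. (6): C(M) = (M + 1) + d * M, with penalty factor d."""
    return (m + 1) + d * m


def gcv(observed, fitted, m, d):
    """Eq. (5): mean squared residual inflated by the complexity penalty.

    Assumption: n in the denominator is the number of samples.
    """
    n = len(observed)
    mse = sum((y - f) ** 2 for y, f in zip(observed, fitted)) / n
    return mse / (1.0 - effective_parameters(m, d) / n) ** 2


# Hypothetical example: 20 observations, every fitted value off by one unit,
# for a model with M = 2 basis functions and penalty d = 3.
observed = [float(i) for i in range(20)]
fitted = [y + 1.0 for y in observed]
score = gcv(observed, fitted, m=2, d=3)
```

With the residuals held fixed, a larger M or d raises C(M) and therefore the GCV, which is how the backward pass discourages unnecessary basis functions.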

Fig. 4. The solution representation.
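The search over the three-component solution (Mmax, Imax, d) in Steps 3 to 7 of Section 3.3 can be sketched as follows. This is a minimal pure-Python illustration, not the authors' implementation: the function names, the population size, F = 0.5, CR = 0.9, the fixed number of generations, the clipping to the bounds, and above all the fitness are assumptions. In the hybrid model the fitness would be the error of a MARS model trained with the candidate parameters; a smooth hypothetical surrogate takes its place here so that the block runs on its own.

```python
import random

# Search ranges for (Mmax, Imax, d) as listed in Step 3.
BOUNDS = [(3.0, 100.0), (1.0, 4.0), (-1.0, 4.0)]


def surrogate_fitness(x):
    """Hypothetical stand-in for the MARS error (not from the paper)."""
    target = (20.0, 2.0, 3.0)  # arbitrary optimum chosen for illustration
    return sum(((xi - ti) / (hi - lo)) ** 2
               for xi, ti, (lo, hi) in zip(x, target, BOUNDS))


def clip(value, lo, hi):
    """Keep a mutated gene inside its range (an assumption of this sketch)."""
    return max(lo, min(hi, value))


def differential_evolution(fitness, bounds, pop_size=20, f=0.5, cr=0.9,
                           generations=100, seed=0):
    rng = random.Random(seed)
    dim = len(bounds)
    # Eq. (9): uniform random initialization inside [Lb, Ub].
    pop = [[(hi - lo) * rng.random() + lo for lo, hi in bounds]
           for _ in range(pop_size)]
    scores = [fitness(ind) for ind in pop]
    # A fixed generation count stands in for the NFE stop criterion of Step 7.
    for _ in range(generations):
        new_pop, new_scores = list(pop), list(scores)
        for i in range(pop_size):
            # Eq. (10): mutation from three distinct individuals other than i.
            r1, r2, r3 = rng.sample([k for k in range(pop_size) if k != i], 3)
            mutant = [clip(pop[r1][j] + f * (pop[r2][j] - pop[r3][j]), *bounds[j])
                      for j in range(dim)]
            # Eq. (11): crossover; gene j_rand always comes from the mutant.
            j_rand = rng.randrange(dim)
            trial = [mutant[j] if (rng.random() <= cr or j == j_rand) else pop[i][j]
                     for j in range(dim)]
            # Eq. (12): greedy selection for a minimization problem.
            trial_score = fitness(trial)
            if trial_score < scores[i]:
                new_pop[i], new_scores[i] = trial, trial_score
        pop, scores = new_pop, new_scores
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


best_params, best_score = differential_evolution(surrogate_fitness, BOUNDS)
```

In an actual MARS-DE run, Mmax and Imax would presumably be rounded to integers before each MARS model is trained; that detail is also an assumption and is left out of the sketch.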

3.5. Input variables modeling procedure

The applied forecasting models were constructed based on the correlated lag times of previous months to forecast one month ahead. In the present research, the auto-correlation function and its partial statistics were computed (see Fig. 5). It can be noticed that the antecedent streamflow values were correlated for up to five months; hence, five input combinations were constructed based on the lags of the previous months. The input combinations are presented as follows:

Fig. 5. The correlations of the river flow data for different time lags.

$$\text{M1: } Sf_t = (Sf_{t-1}) \tag{13}$$

$$\text{M2: } Sf_t = (Sf_{t-1}, Sf_{t-2}) \tag{14}$$

$$\text{M3: } Sf_t = (Sf_{t-1}, Sf_{t-2}, Sf_{t-3}) \tag{15}$$

$$\text{M4: } Sf_t = (Sf_{t-1}, Sf_{t-2}, Sf_{t-3}, Sf_{t-4}) \tag{16}$$

$$\text{M5: } Sf_t = (Sf_{t-1}, Sf_{t-2}, Sf_{t-3}, Sf_{t-4}, Sf_{t-5}) \tag{17}$$

where $Sf_t$ denotes the current forecasted value of streamflow, and $Sf_{t-1}$, $Sf_{t-2}$, $Sf_{t-3}$, $Sf_{t-4}$, and $Sf_{t-5}$ are the lagged values of the previous months (up to five months). The current research was designed based on 20 years of monthly streamflow data (1991–2010). Ten percent (10%) of the dataset (2009–2010) was used for model validation (testing phase), whereas the rest of the data was used for the training phase.

3.6. Forecasting skill performance indicators

Various statistical performance indicators were computed, including mean absolute percentage error (MAPE), scatter index (SI), root mean square error (RMSE), root mean square relative error (RMSRE), mean absolute error (MAE), mean relative error (MRE), BIAS, Nash-Sutcliffe efficiency (NSE), and relative error (RE) (Tao et al., 2018). The mathematical expressions, Eqs. (18)–(26), can be described as follows:

$$MAPE = 100 \times \frac{1}{n}\sum_{i=1}^{n}\frac{|Sf_o - Sf_f|}{Sf_o}$$

$$SI = \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n}(Sf_o - Sf_f)^2}}{\overline{Sf_o}}$$

$$RMSE = \sqrt{\frac{\sum_{i=1}^{n}(Sf_o - Sf_f)^2}{n}}$$

$$RMSRE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{Sf_o - Sf_f}{Sf_o}\right)^2}$$

$$MRE = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{Sf_o - Sf_f}{Sf_o}\right)$$

$$MAE = \frac{\sum_{i=1}^{n}|Sf_o - Sf_f|}{n}$$

$$BIAS = \frac{\sum_{i=1}^{n}(Sf_o - Sf_f)}{\sum_{i=1}^{n}(Sf_o)}$$

$$NSE = 1 - \frac{\sum_{i=1}^{n}(Sf_o - Sf_f)^2}{\sum_{i=1}^{n}(Sf_o - \overline{Sf_o})^2}$$

$$RE = \frac{Sf_o - Sf_f}{Sf_o} \times 100$$

where $n$ is the number of the testing samples, $Sf_o$ and $Sf_f$ are the actual and forecasted streamflow values, and $\overline{Sf_o}$ is the mean value of the actual streamflow.

4. Application results and analysis

The effectiveness of the proposed MARS-DE and the comparable LSSVR models was inspected on real historical streamflow data sourced from an official organization authorized to monitor streamflow at Baghdad station, located on the Tigris River. This section details the ability of the applied predictive models to forecast streamflow one step ahead. It is worth stating that the applied dataset is continuous and has no missing monitoring data during the period under study. In this section, the robustness of the models was evaluated using multiple statistical indicators covering absolute error measures and goodness of fit (i.e., SI, MAPE, RMSE, MAE, RMSRE, MRE, BIAS and NSE). In Tables 2 and 3, five input combinations and their forecasting skills are reported based on the antecedent streamflow records. According to Table 2, the input combination with two antecedent months (M2) gave the best prediction metrics. It can be observed that this is the case for both the proposed model and the comparable model. In quantitative terms, the MARS-DE model achieved its best performance indicators (RMSE ≈ 46.64 m³/s, MAE ≈ 35.25 m³/s and NSE ≈ 0.89), while the LSSVR model gave its best forecasts with RMSE ≈ 57.50 m³/s, MAE ≈ 49.20 m³/s and NSE ≈ 0.83 (see Tables 2 and 3).

Table 2
The statistical performance indicators of the proposed MARS-DE predictive model over the testing phase.

Models   SI       MAPE     RMSE (m³/s)   MAE (m³/s)   RMSRE    MRE       BIAS       NSE
M1       0.1116   0.0892   64.2432       50.6862      0.1111   −0.0038   10.7497    0.7973
M2*      0.0810   0.0621   46.6489       35.2595      0.0811   0.0138    −6.4488    0.8931
M3       0.1237   0.1076   71.2015       60.9413      0.1255   0.0303    −10.8752   0.7510
M4       0.1543   0.1259   88.8058       67.3775      0.1729   0.0576    −27.6989   0.6127
M5       0.1448   0.1260   83.3386       71.2891      0.1428   0.0134    2.8887     0.6589

Both models achieved good forecasting performances according to the statistical guidelines established by Moriasi et al. (2007). However, attaining a higher level of accuracy is always the motivation. Based on the tabulated results, the input combinations with three to five antecedent months demonstrated lower forecasting skill. This can best be explained by the decrease in correlation at longer lags of the streamflow, as clearly illustrated in Fig. 5. Note that the established forecasting procedure is based on univariate modeling, in which only streamflow information is used to build the proposed predictive model. Practically, this setting is more challenging, and it reflects watersheds that lack meteorological data. Figs. 6 and 7 displayed the variance between the actual and forecasted (i.e., MARS-DE and LSSVR) streamflow pattern (presented in the form of scatter plots) over the testing phase for each designed input

Table 3. The statistical performance indicators of the comparable LSSVR predictive model over the testing phase.

| Models | SI | MAPE | RMSE (m³/s) | MAE (m³/s) | RMSRE | MRE | BIAS | NSE |
|---|---|---|---|---|---|---|---|---|
| M1 | 0.1337 | 0.1098 | 76.9290 | 57.5613 | 0.1540 | 0.0521 | −24.2822 | 0.7094 |
| M2* | 0.0999 | 0.0874 | 57.5033 | 49.2066 | 0.1026 | 0.0128 | −1.0852 | 0.8376 |
| M3 | 0.1373 | 0.1084 | 79.0402 | 63.4961 | 0.1290 | −0.0204 | 14.3844 | 0.6932 |
| M4 | 0.1493 | 0.1157 | 85.9033 | 69.2461 | 0.1399 | −0.0198 | 12.6344 | 0.6376 |
| M5 | 0.1586 | 0.1260 | 91.3036 | 74.7509 | 0.1436 | −0.0096 | 15.4164 | 0.5906 |
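The indicators reported in Table 3 can be computed from paired observed and forecasted flows. The sketch below is a minimal illustration, not the authors' code: the function name `forecast_metrics` and the demo values are hypothetical, and RMSE and MAE use their standard definitions, which is an assumption because their expressions are not reproduced in this part of the text.

```python
import numpy as np

def forecast_metrics(observed, forecast):
    """Return a dict of skill indicators for paired observed/forecast flows."""
    obs = np.asarray(observed, dtype=float)
    fc = np.asarray(forecast, dtype=float)
    err = obs - fc                                   # Sf_o - Sf_f
    rmse = np.sqrt(np.mean(err ** 2))                # standard RMSE (assumed)
    return {
        "RMSE": rmse,
        "MAE": np.mean(np.abs(err)),                 # standard MAE (assumed)
        "SI": rmse / np.mean(obs),                   # RMSE scaled by mean observed flow
        "MAPE": 100.0 * np.mean(np.abs(err / obs)),  # percentage form
        "NSE": 1.0 - np.sum(err ** 2) / np.sum((obs - np.mean(obs)) ** 2),
    }

# Hypothetical flows in m3/s, for illustration only (not data from the study).
obs_demo = [520.0, 610.0, 480.0, 700.0, 560.0]
fc_demo = [500.0, 640.0, 470.0, 690.0, 590.0]
metrics_demo = forecast_metrics(obs_demo, fc_demo)
for name, value in metrics_demo.items():
    print(f"{name}: {value:.4f}")
```

Because SI is the RMSE divided by the mean observed flow, the SI and RMSE columns of a table such as Table 3 should differ by a constant factor across input combinations evaluated on the same testing period.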


[Figure: five scatter panels (M1 to M5) with actual streamflow on the horizontal axis and the MARS-DE forecast on the vertical axis; each panel shows a least-squares line and its R². Fitted lines shown in the panels: y = 0.9842x + 15.527 (R² = 0.9026); y = 0.9081x + 80.566 (R² = 0.7074); y = 0.819x + 115.02 (R² = 0.7613); y = 0.7058x + 158.51 (R² = 0.8186); y = 0.6837x + 179.11 (R² = 0.6602).]

Fig. 6. The scatter plots between the applied MARS-DE predictive model and the actual river flow over the testing period and for all the investigated input combinations.

combination. The scatter plots show the least-squares regression line (y = ax + b) and the coefficient of determination between the actual and forecasted values. Graphically, the M2 input combination showed the closest scatter around the best-fit line. MARS-DE achieved a maximum R² ≈ 0.90, whereas the best LSSVR result was R² ≈ 0.83. To compare the investigated models (MARS-DE and LSSVR) in more detail, the relative error (RE) indicator of Eq. (26) was computed over the testing period for the best input combination. The results of the two models are illustrated in Fig. 8 for each observation in the testing phase. In general, the MARS-DE model showed a smaller percentage error, bounded between +14% and −18%, whereas the LSSVR model showed a slightly wider RE range (+22% to −15%). It can be concluded that the residual error over the testing phase was reduced by the proposed MARS-DE model compared with the LSSVR model. Some observations showed a high percentage error in both models. This could be due to sudden increases in streamflow caused by occasional heavy rainfall events or by other inflows from upstream. In addition, the relationship among several statistical indicators (correlation, standard deviation and root mean square error) was presented in the form of a Taylor diagram. Fig. 9 shows the Taylor diagram over the testing phase for the proposed MARS-DE and the comparable forecasting model. With the M2 input combination, MARS-DE gave the coordinate closest to the actual streamflow (benchmark value). Over the testing phase, the position of the MARS-DE model relative to the actual data point is consistent with the lower RMSE and the highest correlation coefficient depicted in the scatter plots.
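The regression line y = ax + b and the coefficient of determination shown in each scatter panel can be obtained with an ordinary least-squares fit. The following is a minimal sketch under stated assumptions: the helper name `fit_line_and_r2` and the demo arrays are hypothetical and are not taken from the study.

```python
import numpy as np

def fit_line_and_r2(actual, forecast):
    """Fit forecast = a * actual + b by least squares and return (a, b, R^2)."""
    x = np.asarray(actual, dtype=float)
    y = np.asarray(forecast, dtype=float)
    a, b = np.polyfit(x, y, deg=1)          # slope first, then intercept
    y_hat = a * x + b
    ss_res = np.sum((y - y_hat) ** 2)       # scatter left around the fitted line
    ss_tot = np.sum((y - np.mean(y)) ** 2)  # total scatter of the forecasts
    return a, b, 1.0 - ss_res / ss_tot

# Hypothetical values: forecasts that lie exactly on the line y = 0.9x + 50.
actual_demo = np.array([300.0, 450.0, 600.0, 750.0, 900.0])
forecast_demo = 0.9 * actual_demo + 50.0
slope_demo, intercept_demo, r2_demo = fit_line_and_r2(actual_demo, forecast_demo)
print(slope_demo, intercept_demo, r2_demo)
```

A slope near 1 with an intercept near 0 indicates forecasts that track the observations without systematic scaling or offset, which is a separate property from a high R².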

5. MARS-DE modeling validation

The proposed hybrid MARS-DE forecasting model provided an accurate outcome for the investigated case study. It is worthwhile to validate this performance against the standalone MARS model and against the literature studies performed on the same case study. Table 4 reports the statistical performance of the standalone MARS model. Its best input combination for streamflow forecasting was M1 (one-month lead time). This contrasts with the MARS-DE and LSSVR models, for which the optimal input combination was the two-month lead time. This is best explained by the fact that machine learning forecasting models behave differently from one case to another, depending on the complexity of the simulated problem. The standalone MARS forecast attained minimum absolute error measures of RMSE ≈ 78.01 m³/s and MAE ≈ 62.65 m³/s. This evidences a remarkable gain in predictability from integrating the differential evolution optimization algorithm with the MARS model. With respect to validation against the literature, Yaseen and co-authors developed an extreme learning machine (ELM), a classical support vector regression (SVR) and a generalized regression neural network (GRNN) for streamflow forecasting of the same case study using an 80–20% training and testing division (Yaseen et al., 2016a,b). ELM exhibited the best predictability potential over the SVR and GRNN models. Nevertheless, the MARS-DE model of the present study improved forecasting accuracy over the ELM model by 51.1% and 57.9% in terms of RMSE and MAE, respectively. The high level of


[Figure: five scatter panels (M1 to M5) with actual streamflow on the horizontal axis and the LSSVR forecast on the vertical axis; each panel shows a least-squares line and its R². Fitted lines shown in the panels: y = 0.8993x + 82.249 (R² = 0.7628); y = 0.9435x + 19.873 (R² = 0.717); y = 0.6402x + 191.59 (R² = 0.6045); y = 0.8965x + 45.158 (R² = 0.7376); y = 0.8361x + 95.398 (R² = 0.8377).]

Fig. 7. The scatter plots between the applied LSSVR predictive model and the actual river flow over the testing period and for all the investigated input combinations.
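The input combinations are built from antecedent monthly flows, and the last part of the record is held out for testing. The sketch below illustrates one way to assemble such a dataset. It is an assumption-laden example: the helper names are hypothetical, the series is a placeholder, and treating M2 as the two most recent lags is an assumption about how the combinations are defined.

```python
import numpy as np

def make_lagged_dataset(flow, n_lags):
    """Return (X, y): row i of X holds the n_lags flows preceding target y[i]."""
    flow = np.asarray(flow, dtype=float)
    n = len(flow)
    # Column k-1 holds the series lagged by k months, i.e. Sf(t-k).
    X = np.column_stack([flow[n_lags - k : n - k] for k in range(1, n_lags + 1)])
    y = flow[n_lags:]
    return X, y

def chronological_split(X, y, n_test):
    """Keep the last n_test samples for testing; the time order is never shuffled."""
    return X[:-n_test], y[:-n_test], X[-n_test:], y[-n_test:]

# Placeholder series standing in for 240 monthly flows (20 years of record).
flow_demo = np.arange(240, dtype=float)
X_demo, y_demo = make_lagged_dataset(flow_demo, n_lags=2)   # two-lag input (assumed M2)
X_tr, y_tr, X_te, y_te = chronological_split(X_demo, y_demo, n_test=24)
print(X_demo.shape, X_tr.shape, X_te.shape)
```

Holding out the final 24 months mirrors the use of the last two years of a 20-year record for testing; a random split would leak future information into training for a time series.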

accuracy enhancement could be explained by the larger training dataset, since the current study was established using 90% of the historical data. In addition, using the nature-inspired algorithm to tune the hyperparameters of the MARS model could contribute to this enhanced prediction accuracy.

6. Modeling assessment and possible future research

The obtained modeling results revealed a shortcoming in the length of the dataset used in the current study. In general, the length of the data used (20 years on a monthly basis) has a considerable effect on the overall accuracy. Hence, future research should incorporate a longer dataset so that more streamflow patterns are included in the training phase. The training stage should experience the majority of the streamflow patterns so that the models can forecast with an acceptable level of accuracy in the testing phase. In this context, 20 years of streamflow data does not appear fully sufficient, and this can affect the level of forecasting accuracy. Furthermore, it might be essential to incorporate other causal hydrological variables as external attributes for the streamflow simulation. The main intuition is that weather conditions (mainly rainfall) are the main driver of streamflow patterns. Thus, the forecasting procedure could be enhanced if related climate variables are combined with the historical streamflow datasets. Adding climatological information such as rainfall, temperature or humidity as input attributes might enhance the learning mechanism and yield a better-performing forecasting model. In addition, monthly streamflow is associated with a seasonal trend, and thus eliminating or incorporating those trends might be a beneficial proposition for providing informative knowledge to the predictive models.

7. Conclusion

The current study explored a newly developed machine learning predictive model called MARS-DE for monthly streamflow forecasting in semi-arid environments. Historical streamflow data covering 20 years at Baghdad station, Tigris River, was obtained for the model construction. The architecture of the proposed model was established on several antecedent values of streamflow. Among the machine learning models introduced in the literature, the support vector machine has been shown to be reliable for forecasting streamflow and was therefore selected for the model validation. The LSSVR model was developed using the same input combinations. The results of the models were examined using several statistical indicators and graphical presentations of prediction skill. In general, the results demonstrated the potential of the developed MARS-DE model for monthly streamflow forecasting in semi-arid environments, with a remarkable forecasting accuracy. Based on the statistical metrics (RMSE and MAE), the MARS-DE model improved the streamflow predictability by 18.8–28.3% and 40.2–43.7% over the LSSVR and standalone MARS models, respectively. As a future research proposition, the forecasting capability can be enhanced by incorporating related climate parameters such as rainfall, humidity and temperature to provide more information on the streamflow phenomenon.

Declaration of interest

The authors state that there is no conflict of interest in publishing the current research.


[Figure: two panels (MARS-DE and LSSVR) plotting the relative error (%) against the months of the testing phase.]

Fig. 8. The percentage of the relative distribution error for the best input combination (M2) for both applied predictive models over the test modeling phase.
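The per-observation relative error plotted in Fig. 8 can be sketched as follows. The sign convention used here (observed minus forecast, divided by observed) is an assumption, since the expression for RE is not reproduced in this part of the text, and the function name and demo values are hypothetical.

```python
import numpy as np

def relative_error_percent(observed, forecast):
    """Relative error in percent for each observation (assumed sign convention)."""
    obs = np.asarray(observed, dtype=float)
    fc = np.asarray(forecast, dtype=float)
    # Positive values mean the forecast is below the observation.
    return 100.0 * (obs - fc) / obs

# Hypothetical flows in m3/s, for illustration only.
obs_re_demo = np.array([500.0, 400.0, 800.0])
fc_re_demo = np.array([450.0, 440.0, 800.0])
re_demo = relative_error_percent(obs_re_demo, fc_re_demo)
print(re_demo, re_demo.min(), re_demo.max())
```

The minimum and maximum of this series give the kind of error band quoted for each model over the testing phase.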

[Figure: two Taylor diagram panels (MARS-DE and LSSVR) with standard deviation on both axes, showing the points M1 to M5 and the actual streamflow.]

Fig. 9. Taylor diagram graphical presentation for both applied predictive models over the testing phase.
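A Taylor diagram places each model using three quantities: the standard deviation of the forecast, its correlation with the observations, and the centered root mean square difference. A minimal sketch of these statistics follows; the function name and demo data are hypothetical, and population standard deviations (divisor n) are assumed.

```python
import numpy as np

def taylor_statistics(observed, forecast):
    """Return (std_obs, std_fc, correlation, centered RMS difference)."""
    obs = np.asarray(observed, dtype=float)
    fc = np.asarray(forecast, dtype=float)
    std_obs = np.std(obs)                  # population form, divisor n
    std_fc = np.std(fc)
    corr = np.corrcoef(obs, fc)[0, 1]      # Pearson correlation
    # Means are removed first, so a constant bias does not affect this term.
    crmsd = np.sqrt(np.mean(((fc - fc.mean()) - (obs - obs.mean())) ** 2))
    return std_obs, std_fc, corr, crmsd

# Hypothetical flows in m3/s, for illustration only.
obs_t_demo = np.array([520.0, 610.0, 480.0, 700.0, 560.0])
fc_t_demo = np.array([500.0, 640.0, 470.0, 690.0, 590.0])
s_o, s_f, r_demo, e_demo = taylor_statistics(obs_t_demo, fc_t_demo)
# Law-of-cosines relation that makes the diagram geometry possible.
identity_gap = e_demo ** 2 - (s_o ** 2 + s_f ** 2 - 2.0 * s_o * s_f * r_demo)
print(s_o, s_f, r_demo, e_demo, identity_gap)
```

The relation checked at the end is what lets the three statistics share one two-dimensional plot: the distance from a model point to the reference point equals the centered RMS difference.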


Table 4. The statistical performance indicators of the standalone MARS predictive model over the testing phase.

| Models | SI | MAPE | RMSE (m³/s) | MAE (m³/s) | RMSRE | MRE | BIAS | NSE |
|---|---|---|---|---|---|---|---|---|
| M1* | 0.1355 | 0.1115 | 78.0114 | 62.6527 | 0.1402 | −0.0267 | 23.761 | 0.7011 |
| M2 | 0.1448 | 0.1320 | 83.349 | 74.8996 | 0.1442 | 0.0352 | −17.219 | 0.6588 |
| M3 | 0.1452 | 0.14136 | 83.5435 | 77.5599 | 0.1524 | 0.0536 | −23.384 | 0.6573 |
| M4 | 0.1722 | 0.14716 | 99.1160 | 76.8163 | 0.1964 | 0.0266 | −12.941 | 0.5176 |
| M5 | 0.1793 | 0.1484 | 103.211 | 86.7239 | 0.1717 | −0.0292 | 26.9907 | 0.4769 |
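The relative gains of MARS-DE over the LSSVR and standalone MARS models follow from the best RMSE and MAE values of each model. The sketch below reproduces that arithmetic; the helper name is hypothetical, the MARS-DE values are the rounded figures quoted in the text, and the LSSVR and MARS values are the starred rows of Tables 3 and 4.

```python
def percent_improvement(baseline_error, proposed_error):
    """Percentage reduction of an error measure relative to a baseline model."""
    return 100.0 * (baseline_error - proposed_error) / baseline_error

# Best-case errors in m3/s for each model.
mars_de = {"RMSE": 46.64, "MAE": 35.25}
lssvr = {"RMSE": 57.5033, "MAE": 49.2066}
mars = {"RMSE": 78.0114, "MAE": 62.6527}

gain_vs_lssvr = {k: percent_improvement(lssvr[k], mars_de[k]) for k in mars_de}
gain_vs_mars = {k: percent_improvement(mars[k], mars_de[k]) for k in mars_de}

for label, gains in (("LSSVR", gain_vs_lssvr), ("MARS", gain_vs_mars)):
    print(label, {k: round(v, 1) for k, v in gains.items()})
```

With these inputs the gains come out at roughly 19% and 28% over LSSVR and roughly 40% and 44% over standalone MARS, in line with the ranges quoted in the text; small differences in the last digit depend on how the inputs are rounded.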

Acknowledgments

The authors would like to express their gratitude and appreciation to the Editor-in-Chief, Prof. Geoff Syme, and the Associate Editor, Prof. Helge Bormann, for handling the manuscript. The appreciation is extended to the reviewers, whose valid comments helped to enhance the content and visualization of the manuscript.

References

AL-Musaylh, M.S., Deo, R.C., Li, Y., Adamowski, J.F., 2018. Two-phase particle swarm optimized-support vector regression hybrid model integrated with improved empirical mode decomposition with adaptive noise for multiple-horizon electricity demand forecasting. Appl. Energy. https://doi.org/10.1016/j.apenergy.2018.02.140.
Araghinejad, S., Azmi, M., Kholghi, M., 2011. Application of artificial neural network ensembles in probabilistic hydrological forecasting. J. Hydrol. 407, 94–104. https://doi.org/10.1016/j.jhydrol.2011.07.011.
Azimi, H., Bonakdari, H., Ebtehaj, I., Ashraf Talesh, S.H., Michelson, D.G., Jamali, A., 2017. Evolutionary pareto optimization of an ANFIS network for modeling scour at pile groups in clear water condition. Fuzzy Sets Syst. 319, 50–69. https://doi.org/10.1016/j.fss.2016.10.010.
Cheng, M.-Y., Cao, M.-T., 2014a. Accurately predicting building energy performance using evolutionary multivariate adaptive regression splines. Appl. Soft Comput. 22, 178–188. https://doi.org/10.1016/j.asoc.2014.05.015.
Cheng, M.Y., Cao, M.T., 2014b. Accurately predicting building energy performance using evolutionary multivariate adaptive regression splines. Appl. Soft Comput. J. https://doi.org/10.1016/j.asoc.2014.05.015.
Chou, S.M., Lee, T.S., Shao, Y.E., Chen, I.F., 2004. Mining the breast cancer pattern using artificial neural networks and multivariate adaptive regression splines. Expert Syst. Appl. https://doi.org/10.1016/j.eswa.2003.12.013.
Chua, L.H.C., Holz, K.-P., 2005. Hybrid neural network—finite element river flow model. J. Hydraul. Eng. 131, 52–59.
Condon, L.E., Gangopadhyay, S., Pruitt, T., 2014. Climate change and non-stationary flood risk for the Upper Truckee River Basin. Hydrol. Earth Syst. Sci. Discuss. 11, 5077–5114. https://doi.org/10.5194/hessd-11-5077-2014.
Conoscenti, C., Ciaccio, M., Caraballo-Arias, N.A., Gómez-Gutiérrez, Á., Rotigliano, E., Agnesi, V., 2015. Assessment of susceptibility to earth-flow landslide using logistic regression and multivariate adaptive regression splines: a case of the Belice River basin (western Sicily, Italy). Geomorphology. https://doi.org/10.1016/j.geomorph.2014.09.020.
Crino, S., Brown, D.E., 2007. Global optimization with multivariate adaptive regression splines. IEEE Trans. Syst. Man Cybern. B Cybern. https://doi.org/10.1109/TSMCB.2006.883430.
Das Gupta, A., 2008. Implication of environmental flows in river basin management. Phys. Chem. Earth Parts A/B/C 33, 298–303. https://doi.org/10.1016/j.pce.2008.02.004.
Deo, R.C., Kisi, O., Singh, V.P., 2017. Drought forecasting in eastern Australia using multivariate adaptive regression spline, least square support vector machine and M5Tree model. Atmos. Res. https://doi.org/10.1016/j.atmosres.2016.10.004.
Deo, R.C., Samui, P., 2017. Forecasting evaporative loss by least-square support-vector regression and evaluation with genetic programming, gaussian process, and minimax probability machine regression: case study of brisbane city. J. Hydrol. Eng. 22, 05017003. https://doi.org/10.1061/(ASCE)HE.1943-5584.0001506.
Deo, R.C., Tiwari, M.K., Adamowski, J.F., Quilty, J.M., 2016. Forecasting effective drought index using a wavelet extreme learning machine (W-ELM) model. Stoch. Environ. Res. Risk Assess. 1–30. https://doi.org/10.1007/s00477-016-1265-z.
Diop, L., Bodian, A., Djaman, K., Yaseen, Z.M., Deo, R.C., El-shafie, A., Brown, L.C., 2018. The influence of climatic inputs on stream-flow pattern forecasting: case study of Upper Senegal River. Environ. Earth Sci. 77, 182. https://doi.org/10.1007/s12665-018-7376-8.
Emamgolizadeh, S., Bateni, S.M., Shahsavani, D., Ashrafi, T., Ghorbani, H., 2015. Estimation of soil cation exchange capacity using Genetic Expression Programming (GEP) and Multivariate Adaptive Regression Splines (MARS). J. Hydrol. 529, 1590–1600. https://doi.org/10.1016/j.jhydrol.2015.08.025.
Friedman, J.H., 1991. Multivariate adaptive regression splines. Ann. Statist. 19, 1–67. https://doi.org/10.1214/aos/1176347963.
Friedman, J.H., Roosen, C.B., 1995. Statistical Methods in Medical Research. statistical methods in medical research 197–217. doi: 10.1177/096228029500400303.
Friedman, J.H., Roosen, C.B., 1995b. An introduction to multivariate adaptive regression splines. Stat. Methods Med. Res. https://doi.org/10.1177/096228029500400303.
García Nieto, P.J., Álvarez Antón, J.C., 2014. Nonlinear air quality modeling using multivariate adaptive regression splines in Gijón urban area (Northern Spain) at local scale. Appl. Math. Comput. https://doi.org/10.1016/j.amc.2014.02.096.
Ghorbani, M.A., Khatibi, R., Karimi, V., Yaseen, Z.M., Zounemat-Kermani, M., 2018. Learning from multiple models using artificial intelligence to improve model prediction accuracies: application to river flows. Water Resour. Manage. https://doi.org/10.1007/s11269-018-2038-x.
Haghiabi, A.H., 2016. Prediction of longitudinal dispersion coefficient using multivariate adaptive regression splines. J. Earth Syst. Sci. https://doi.org/10.1007/s12040-016-0708-8.
Heddam, S., Kisi, O., 2018. Modelling daily dissolved oxygen concentration using least square support vector machine, multivariate adaptive regression splines and M5 model tree. J. Hydrol. https://doi.org/10.1016/j.jhydrol.2018.02.061.
Ji, Z., Wang, B., Deng, S., You, Z., 2014. Predicting dynamic deformation of retaining structure by LSSVR-based time series method. Neurocomputing 137, 165–172. https://doi.org/10.1016/j.neucom.2013.03.073.
Kagoda, P.A., Ndiritu, J., Ntuli, C., Mwaka, B., 2010. Application of radial basis function neural networks to short-term streamflow forecasting. Phys. Chem. Earth 35, 571–581. https://doi.org/10.1016/j.pce.2010.07.021.
Kayadelen, C., Günaydın, O., Fener, M., Demir, A., Özvan, A., Mollahasani, A., Alavi, A.H., Gandomi, A.H., Rashed, A., Mollahasani, A., Bazaz, J.B., Bowles, J.E., Samui, P., Kim, D., 1992. Engineering properties of soils and their measurement. KSCE J. Civ. Eng. https://doi.org/10.1080/1064119X.2014.954655.
Kember, G., Flower, A.C., Holubeshen, J., 1993. Forecasting river flow using nonlinear dynamics. Stochastic Hydrol. Hydraulics 7, 205–212. https://doi.org/10.1007/BF01585599.
Keshtegar, B., Allawi, M.F., Afan, H.A., El-Shafie, A., 2016. Optimized river stream-flow forecasting model utilizing high-order response surface method. Water Resour. Manage. 30, 3899–3914. https://doi.org/10.1007/s11269-016-1397-4.
Keshtegar, B., Mert, C., Kisi, O., 2018. Comparison of four heuristic regression techniques in solar radiation modeling: Kriging method vs RSM, MARS and M5 model tree. Renew. Sustain. Energy Rev. https://doi.org/10.1016/j.rser.2017.07.054.
Kisi, O., 2015. Pan evaporation modeling using least square support vector machine, multivariate adaptive regression splines and M5 model tree. J. Hydrol. 528, 312–320. https://doi.org/10.1016/j.jhydrol.2015.06.052.
Kisi, O., 2010. Wavelet regression model for short-term streamflow forecasting. J. Hydrol. 389, 344–353. https://doi.org/10.1016/j.jhydrol.2010.06.013.
Kisi, O., Parmar, K.S., 2016. Application of least square support vector machine and multivariate adaptive regression spline models in long term prediction of river water pollution. J. Hydrol. 534, 104–112. https://doi.org/10.1016/j.jhydrol.2015.12.014.
Kumar, D., Pandey, A., Sharma, N., Flügel, W.-A., 2016. Daily suspended sediment simulation using machine learning approach. CATENA 138, 77–90. https://doi.org/10.1016/j.catena.2015.11.013.
Labat, D., 2005. Recent advances in wavelet analyses: part 1. A review of concepts. J. Hydrol. 314, 275–288. https://doi.org/10.1016/j.jhydrol.2005.04.003.
Lall, U., Sangoyomi, T., Abarbanel, H.D.I., 1996. Nonlinear dynamics of the Great Salt Lake: nonparametric short-term forecasting. Water Resour. Res. https://doi.org/10.1029/95WR03402.
Macklin, M.G., Lewin, J., 2015. The rivers of civilization. Quat. Sci. Rev. 114, 228–244. https://doi.org/10.1016/j.quascirev.2015.02.004.
Maier, H.R., Kapelan, Z., Kasprzyk, J., Kollat, J., Matott, L.S., Cunha, M.C., Dandy, G.C., Gibbs, M.S., Keedwell, E., Marchi, A., Ostfeld, A., Savic, D., Solomatine, D.P., Vrugt, J.A., Zecchin, A.C., Minsker, B.S., Barbour, E.J., Kuczera, G., Pasha, F., Castelletti, A., Giuliani, M., Reed, P.M., 2014. Evolutionary algorithms and other metaheuristics in water resources: current status, research challenges and future directions. Environ. Modell. Software 62, 271–299. https://doi.org/10.1016/j.envsoft.2014.09.013.
Mandal, A., Das, S., Abraham, A., 2011. A differential evolution based memetic algorithm for workload optimization in power generation plants, in: Proceedings of the 2011 11th International Conference on Hybrid Intelligent Systems, HIS 2011. doi: 10.1109/HIS.2011.6122117.
Mehdizadeh, S., Behmanesh, J., Khalili, K., 2017. Using MARS, SVM, GEP and empirical equations for estimation of monthly mean reference evapotranspiration. Comput. Electron. Agric. 139, 103–114. https://doi.org/10.1016/j.compag.2017.05.002.
Moriasi, D.N., Arnold, J.G., Van Liew, M.W., Bingner, R.L., Harmel, R.D., Veith, T.L., 2007. Model evaluation guidelines for systematic quantification of accuracy in watershed simulations. Trans. ASABE 50, 885–900. https://doi.org/10.13031/2013.23153.
Mouatadid, S., Raj, N., Deo, R.C., Adamowski, J.F., 2018. Input selection and data-driven model performance optimization to predict the Standardized Precipitation and Evaporation Index in a drought-prone region. Atmos. Res. 212, 130–149. https://doi.org/10.1016/j.atmosres.2018.05.012.


Piotrowski, A.P., Napiorkowski, J.J., 2011. Optimizing neural networks for river flow forecasting – Evolutionary Computation methods versus the Levenberg-Marquardt approach. J. Hydrol. 407, 12–27. https://doi.org/10.1016/j.jhydrol.2011.06.019.
Rezaie-balf, M., Naganna, S.R., Ghaemi, A., Deka, P.C., 2017. Wavelet coupled MARS and M5 model tree approaches for groundwater level forecasting. J. Hydrol. https://doi.org/10.1016/j.jhydrol.2017.08.006.
Sekulic, S., Kowalski, B.R., 1992. Mars: A Tutorial 6, 199–216.
Sephton, P., 2001. Forecasting recessions: can we do better on MARS? Review 83, 39–49.
Shiri, J., Kisi, O., 2010. Short-term and long-term streamflow forecasting using a wavelet and neuro-fuzzy conjunction model. J. Hydrol. 394, 486–493. https://doi.org/10.1016/j.jhydrol.2010.10.008.
Suykens, J.A.K., Vandewalle, J., 1999. Least squares support vector machine classifiers. Neural Process. Lett. 9, 293–300. https://doi.org/10.1023/A:1018628609742.
Tao, H., Diop, L., Bodian, A., Djaman, K., Ndiaye, P.M., Yaseen, Z.M., 2018. Reference evapotranspiration prediction using hybridized fuzzy model with firefly algorithm: Regional case study in Burkina Faso. Agric. Water Manage.
Tilmant, A., Lettany, J., Kelman, R., 2007. Hydrological risk assessment in the Euphrates-Tigris river basin: a stochastic dual dynamic programming approach. Water Int. 32, 294–309. https://doi.org/10.1080/02508060708692208.
Tong, S.L., Cui, C.F., Bai, Y.L., Zhu, W.J., Yu-Sun, En-Hua, 2016. Application of multivariate adaptive regression spline models in long term prediction of river water pollution. Taiwan Water Conservancy. https://doi.org/10.1016/j.jhydrol.2015.12.014.
Tongal, H., Booij, M.J., 2018. Simulation and forecasting of streamflows using machine learning models coupled with base flow separation. J. Hydrol. https://doi.org/10.1016/j.jhydrol.2018.07.004.
Vapnik, V., 1995. The Nature of statistical Learning Theory, Data Mining and Knowledge Discovery. Springer-Verlag New York, Inc, New York, NY, USA.
Villarini, G., Smith, J.A., Serinaldi, F., Ntelekos, A.A., 2011. Analyses of seasonal and annual maximum daily discharge records for central Europe. J. Hydrol. 399, 299–312. https://doi.org/10.1016/j.jhydrol.2011.01.007.
Wang, Liang-Jie, Guo, Min, Sawada, Kazuhide, Lin, Jie, Zhang, Jinchi, 2015. Landslide susceptibility mapping in Mizunami City, Japan: a comparison between logistic regression, bivariate statistical analysis and multivariate adaptive regression spline models. CATENA 135, 271–282. https://doi.org/10.1016/j.catena.2015.08.007.
Wang, X., Park, T., Carriere, K.C., 2010. Variable selection via combined penalization for high-dimensional data analysis. Comput. Stat. Data Anal. 54, 2230–2243. https://doi.org/10.1016/j.csda.2010.03.026.
Xie, G., Wang, S., Zhao, Y., Lai, K.K., 2013. Hybrid approaches based on LSSVR model for container throughput forecasting: a comparative study. Appl. Soft Comput. 13, 2232–2241. https://doi.org/10.1016/j.asoc.2013.02.002.
Yaseen, Z., Kisi, O., Demir, V., 2016a. Enhancing long-term streamflow forecasting and predicting using periodicity data component: application of artificial intelligence. Water Resour. Manage. https://doi.org/10.1007/s11269-016-1408-5.
Yaseen, Z.M., Awadh, S.M., Sharafati, A., Shahid, S., 2018a. Complementary data-intelligence model for river flow simulation. J. Hydrol. 567, 180–190. https://doi.org/10.1016/j.jhydrol.2018.10.020.
Yaseen, Z.M., El-shafie, A., Jaafar, O., Afan, H.A., Sayl, K.N., 2015. Artificial intelligence based models for stream-flow forecasting: 2000–2015. J. Hydrol. 530, 829–844. https://doi.org/10.1016/j.jhydrol.2015.10.038.
Yaseen, Z.M., Fu, M., Wang, C., Hanna, W., Wan, M., Deo, R.C., El-shafie, A., 2018b. Application of the Hybrid Artificial Neural Network Coupled with Rolling Mechanism and Grey Model Algorithms for Streamflow Forecasting Over Multiple Time Horizons. doi: 10.1007/s11269-018-1909-5.
Yaseen, Z.M., Jaafar, O., Deo, R.C., Kisi, O., Adamowski, J., Quilty, J., El-shafie, A., 2016b. Stream-flow forecasting using extreme learning machines: a case study in a semi-arid region in Iraq. J. Hydrol. https://doi.org/10.1016/j.jhydrol.2016.09.035.
Yaseen, Z.M., Sulaiman, S.O., Deo, R.C., Chau, K.-W., 2018c. An enhanced extreme learning machine model for river flow forecasting: state-of-the-art, practical applications in water resource engineering area and future research direction. J. Hydrol.
Zafar, A., Shah, S., Khalid, R., Hussain, S.M., Rahim, H., Javaid, N., 2017. A metaheuristic home energy management system, in: Proceedings - 31st IEEE International Conference on Advanced Information Networking and Applications Workshops, WAINA 2017. doi: 10.1109/WAINA.2017.118.
Zhang, W., Goh, A.T.C., 2016. Multivariate adaptive regression splines and neural network models for prediction of pile drivability. Geosci. Front. 7, 45–52. https://doi.org/10.1016/j.gsf.2014.10.003.
Zhang, W., Goh, A.T.C., Zhang, Y., Chen, Y., Xiao, Y., 2015. Assessment of soil liquefaction based on capacity energy concept and multivariate adaptive regression splines. Eng. Geol. https://doi.org/10.1016/j.enggeo.2015.01.009.
Zhang, X., Zhou, J., Wang, C., Li, C., Song, L., 2012. Multi-class support vector machine optimized by inter-cluster distance and self-adaptive deferential evolution. Appl. Math. Comput. https://doi.org/10.1016/j.amc.2011.10.063.
Zhou, Y., Leung, H., 2007. Predicting object-oriented software maintainability using multivariate adaptive regression splines. J. Syst. Softw. https://doi.org/10.1016/j.jss.2006.10.049.
