Estimating 2-year flood flows using the generalized structure of the Group Method of Data Handling

Estimating 2-year flood flows using the generalized structure of the Group Method of Data Handling

Journal of Hydrology 575 (2019) 671–689 Contents lists available at ScienceDirect Journal of Hydrology journal homepage: www.elsevier.com/locate/jhy...

8MB Sizes 0 Downloads 34 Views

Journal of Hydrology 575 (2019) 671–689

Contents lists available at ScienceDirect

Journal of Hydrology journal homepage: www.elsevier.com/locate/jhydrol

Research papers

Estimating 2-year flood flows using the generalized structure of the Group Method of Data Handling Rachel Waltona, Andrew Binnsa, Hossein Bonakdarib, Isa Ebtehajc, Bahram Gharabaghid,

T



a

School of Engineering, University of Guelph, Guelph, Ontario N1G 2W1, Canada Department of Civil Engineering, Razi University, Kermanshah, Iran c Environmental Research Center, Razi University, Kermanshah, Iran d Water Resources Engineering, Room 2417, Thornbrough Building, School of Engineering, University of Guelph, 50 Stone Road East, Guelph, Ontario N1G 2W1, Canada b

ARTICLE INFO

ABSTRACT

This manuscript was handled by A. Bardossy, Editor-in-Chief, with the assistance of Jozsef Szilagyi, Associate Editor

The lack of stream gauging data worldwide has inspired an abundance of methods attempting to predict the complex natural process of streamflow. A set of traditional regression equations have been developed by the U.S. Geological Survey for the state of Iowa. However, several inconsistencies were noted, including irregular region definition and varying input parameters between regions. To overcome the common problems faced in modelling, and to combat the irregularities in the regression equations, a new Group Method of Data Handling (GMDH) algorithm has been proposed using a combination of the classical combinatorial GMDH and multilayer GMDH to predict the 2-year peak flow. Data from 365 stream gauges located in the state of Iowa were used in the model development. The new algorithm was compared to the two classical algorithms and the traditional linear regression equations developed by the U.S. Geological Survey. The new algorithm which requires fewer input parameters without relying on regionalization was found to be the best method. The developed algorithm is based on a more robust hydrologic theory and therefore is a reliable method for predicting the 2-year peak flow for ungauged basins. The two main hypotheses inspired by the past regression equations and suggested in this study were that regionalization is not necessary and that using fewer input parameters to avoid redundancy still produces a robust prediction method. This study has developed a novel and reliable prediction method for peak flow prediction at ungauged basins while advancing insight into the most influential variables governing the magnitude of the flow.

Keywords: Group Method of Data Handling (GMDH) Floods Rivers Ungauged basins Streamflow

1. Introduction The hydrologic cycle is a complex natural system, which consists of many equally complex sub-systems. Streamflow is one sub system, and knowledge of streamflow is necessary for watershed planning and design. It can be measured in the field; however, long-term data are required to make appropriate decisions and assess risks associated with certain magnitudes of flows. Long-term, consistent stream gauging data is sparse across the world. Decades of modelling efforts have been put forth attempting to quantify the principal factors governing streamflow to develop an accurate prediction method (Hrachowitz et al., 2013). Streamflow can be reported in several ways; one common method is the return period, for example, the 2-year return period flow (Q2). The return period categorizes the probability associated with a certain magnitude of flow being equalled or exceeded in any given year (USGS, 2016). To statistically determine the return period flows for a basin

either long-term data or an accurate model for estimation are needed. Many modelling techniques have been used for evaluating return period peak flows with some success, such as linear regression, spatial proximity, and region of influence (Ahn and Palmer, 2016; Haddad and Rahman, 2012; Merz and Blöschl, 2005; Razavi and Coulibaly, 2013). When a linear regression is used for peak flow prediction purposes basin characteristics are related to the desired peak flow quantile by statistically determining the optimal variable weighting. This can be carried out using a variety of linear regression procedures. Regardless of the procedure, the general assumptions made are that the model is linear and each input is independent, has equal variance and is normally distributed (Su et al., 2012). Violations to these assumptions may introduce some error into the linear regression model. However, the extent of the error is unknown (Su et al., 2012). Considering the complexity of the inherent non-linearity present in natural systems, many complications occur when modelling attempts

⁎ Corresponding author at: Water Resources Engineering, Room 2417, Thornbrough Building, School of Engineering, University of Guelph, 50 Stone Road East, Guelph, Ontario N1G 2W1, Canada. E-mail addresses: [email protected] (R. Walton), [email protected] (A. Binns), [email protected] (B. Gharabaghi).

https://doi.org/10.1016/j.jhydrol.2019.05.068 Received 29 January 2019; Received in revised form 18 April 2019; Accepted 20 May 2019 Available online 23 May 2019 0022-1694/ © 2019 Elsevier B.V. All rights reserved.

Journal of Hydrology 575 (2019) 671–689

R. Walton, et al.

being made. Soft computation methods have been frequently used throughout recent decade that can produce efficient and more accurate method for solving complex problems in the water resources field (Tayfur, 2012, Ebtehaj et al., 2019). One concern, regarding the application some of the artificial approaches is that they are not able to produce an explicit expression between input and output variables. In water resources engineering an expression can help acquire better insight into a problem for practical application. The Group Method of Data Handling (GMDH) method is a powerful subset of artificial intelligence (AI) methods for modeling complex water resources multi-parameter problems GMDH is utilized in diverse problems, such as trap efficiency of detention dams (Parsaie et al., 2017) soil cumulative infiltration (Rahmati, 2017), rainfall–runoff modeling (Zhang et al., 2013), lake level variation (Zaji et al., 2018), and simulating river characteristics (Najafzadeh and Tafarojnoruz, 2016; Shaghaghi et al., 2018). The main goal of this study is to evaluate the three different GMDHbased methods on the stream gauging data from Iowa and determine if these methods can overcome the limitations of the original regression equations developed for predicting the 2-year peak flow. Sensitivity analysis results of GMDH-based equation for each input variable were carried out to investigate the effect of hydrological parameters of the basin on 2-year return flow prediction.

parameters due to a criterion value after some iterations’ layer and calculation time while arranging the models. At that point, full arranging techniques proceed for the chosen set of best variables until progress is negligible. This offers the probability to set considerably more information factors at the input and to save successful factors between layers to discover the ideal model. Generally, modeling a nonlinear problem with combinational GMDH algorithm consists of five steps, as shown in Fig. 1. The data sample for training and testing are defined in the first step. In the second step, different layers are presented to describe the complexity of models. The optimal models are formed in the third step, and the optimal model is selected in the fourth step. Finally, the additional model definition is done by discriminating criteria during the fifth step. 2.1.2. Multilayer GMDH algorithm (MGA) The classical structure of the GMDH method (Ivakhnenko, 1971, 1995) is a self-organized and unidirectional network for solving complex nonlinear problems. The GMDH consists of several layers, and each layer consists of several neurons. All neurons have the same structure, two inputs, and one output. Each neuron consists of five weights, and one bias establishes the processing operation between the input and output data according to the following equation:

yik = N (xi , xi ) = bk + w1k x i + w2k x i + w3k x i2 + w4k x i2 + w5k x i x i

2. Materials and methods

where N is the number of input variables, K = 1,2,3,…,Cm2, , {1, 2, 3, ...,m} and m is a number of neurons in the previous layer. Weights are calculated based on the least square error method and then embedded in specified and fixed values within each neuron. The obvious feature of this network is that the previous stage or the previous layer's neurons are the factor and generator of the production of new neurons in the number of Cm2 = m(m − 1)/2, and among the produced neurons, certain deletions to avoid network divergence (Farlow, 1984). Neurons that remain for the continuation and expansion of the network may be omitted to form the network convergence form because they are not linked to the last layer’s neuron, which is referred to as inactive neurons. The criterion for the selection and removal of a set of neurons in a layer is the ratio of the sum of squared errors (rj2) between the actual output values (yi) and the output of the neuron jth (yij*) as follows:

2.1. Group Method of Data Handling Ivakhnenko (1968) presented a technique for constructing a higherorder polynomial known as Group Method of Data Handling, to overcome the dead-end problem of equation complexity and linear dependency present in standard regression equations. Two examples of algorithms used in nonlinear modeling problems are combinatorial and multilayer GMDH algorithms. 2.1.1. Combinatorial GMDH algorithm The combinatorial GMDH algorithm (CGA) is the main single-layer algorithm for GMDH. The flowchart of the CGA is presented in Fig. 1. The matrix of input dataset comprises of N points of observed samples related to a set of M parameters. These samples are categorized into two groups; training subsamples (NA) and testing subsamples (NB). The training sub-samples are employed to generate the optimal model structure by minimizing the following fitness function:

1 AR (s ) = NB

NB

(yi

yi (B )) 2

min

i=1

r j2 =

(1)

i = 1, 2, ...,M

j = 1, 2, ...,M

(yi

yij )2

N y2 i=1 i

m

y = a0 +

{1, 2, 3, ...,Cm2 }

j

m

m

ai x i + i=1

(5)

m

m

m

aij x i xj + i=1 j=1

aijk x i xj xk + ... i=1 j=1 k =1

(6)

The structure for the neurons is given in a summary form of two second order variables as follows:

yi = f (x ip , x iq) = a 0 + a1 x ip + a2 x iq + a3 x ip x iq + a4 x ip2 + a5 xiq2

(2)

(7)

The function f has six coefficients that are unknown, which, for all pairings dependent on system {(x ip , x iq), i = 1, 2, ...,N } , yields the desired output {(yi ), i = 1, 2, ...,N } . The following statement is minimized based on the least square error rule:

Non-linear individuals are taken as novel input parameters in the training data sub-sample. The progress of the yield variable is indicated by the user. All models in the next layer are arranged in the following form:

y = a0 + a1 x i + a2 xj ,

N i=1

where m is the number of selected neurons in the previous layer. The mapping between input and output variables by this type of neural network is represented by the Volterra linear function as follows:

The k-fold cross-validation criteria, which considers all information in data samples by a random selection of training and testing subsamples (for instance; k = 5), is used as the fitness function. A comprehensive inquiry is performed on models classed into gatherings of equivalent complexity, allowing one to plan the comprehensive search termination rule. For instance, the primary layer could use the data held on each section of the sample and a full inquiry is connected to every conceivable model of the form:

y = a0 + a1 x i ,

i (4)

= 1, 2, ...N

N

(3)

E = Min

[(f (xki , xkj ) k=1

The models are assessed for consistency with the measured samples, continuing until the criterion is satisfied. Due to the constraint inflicted by the time of computation period, it is recommended to extend

yi )2]

(8)

Accordingly, an equation system with six unknown parameters and N equations is solved. 672

Journal of Hydrology 575 (2019) 671–689

R. Walton, et al.

Fig. 1. The flowchart of combinatorial GMDH algorithm.

2.1.3. Generalized structure of GMDH algorithm (GSGA) In the previous sub-sections, two different algorithms were introduced for modeling nonlinear systems using the GMDH method: CGA and MGA. Both algorithms present certain disadvantages, which can be improved upon using new encoding. The benefits of single layer self-organizing systems are shown in their simplicity and capacity to perform through the arrangement of the model structure. However, as the quantity of terms expands so does the number of the indispensable observations which limit them from poorly characterized frameworks portrayed by a low quantity of accessible observations, and incomplete input vectors — the main shortcomings of the CGA are related to the computational capability for modeling highly complicated models. The calculation time is long because of the total sorting out strategy, which constrains the quantity of power and terms in the last model. Another limitation in modeling nonlinear problems using the MGA method is that only two variables are considered as inputs for each neuron. In addition, the structure of MGA is presented as a second-order polynomial (Eq. (7)). In classical multilayer GMDH techniques, the input variables of each neuron are selected from adjacent neurons. In this study, to overcome the disadvantages while capitalizing on the numerous advantages of both CGA and MGA strategies, a new GMDH-based method was encoded in MATLAB software, deemed the generalized structure of GMDH algorithm (GSGA). The proposed method overcomes the limitations present in both classical algorithms. In the GSGA method, both MGA and CGA are considered, and the

y1 = a 0 + a1 x1p + a2 x1q + a3 x1p x1q + a4 x12p + a5 x12q y2 = a 0 + a1 x2p + a2 x2q + a3 x 22p x2q +

a4 x 22p

+

a5 x 22q

2 2 yN = a0 + a1 xNp + a2 xNq + a3 xNp xNq + a4 xNp + a5 xNq

(9)

The above-mentioned equation system is presented in the following matrix form:

Aa = Y

(10)

where

a = {a0 , a1, a2 , a3 , a4 , a5 }T

(11)

Y = {y1 , y2 , y3 , ...,aN }

(12)

1 x1p x1q x1p x1q x12p x12q A=

1 x2p x2q x2p x2q x 22p x 22q . . . .. . . . . . .. . . 2 2 1 xNp xNq xNp xNq xNp xNq

(13)

The least square method from multiple regression analysis produces the normal equations solution as follows:

a = (AT A) 1AT Y

(14) 673

Journal of Hydrology 575 (2019) 671–689

R. Walton, et al.

optimal model is selected based on the Akaike Information Criterion (AIC) (Ebtehaj et al., 2014) as follows:

AIC = N × log

1 N

Through the regression analysis carried out by Eash et al. (2013) it was determined the best prediction variables were basin drainage area (DA), the maximum 24-hour precipitation that occurs on average once every 10 years (I24H10Y), the constant of channel maintenance (CCM), the percent of DA within the Des Moines Lobe landform region (DESMOIN), the measure of basin shape (BSHAPE) and the average saturated hydraulic conductivity of the soil (KSATSSUR). DA is the most frequently included parameter in streamflow prediction methods, and it has a high correlation with peak flow values (Ahn and Palmer, 2016; Atieh et al., 2015, 2017; Aziz et al., 2014). It is positively correlated with Q2: for a larger value of DA, a higher Q2 is expected (Eash et al., 2013). A precipitation variable is typically included in prediction methods; however, the most common variable is the mean annual precipitation (MAP) (Curran et al., 2016; Eash et al., 2013; Olson and Veilleux, 2014). MAP is commonly included because it is measured at many locations and easily obtained. Iowa uses I24H10Y instead as a predictor variable, which may give a better indication of higher flow events compared to MAP. The variation in I24H10Y is shown in Fig. 4a). The difference in I24H10Y across the state is only 32 mm, demonstrating a homogenous representation for this variable. CMM represents the drainage density of the basin and is calculated by dividing the drainage area by the total length of the streams in the basin (Eash et al., 2013). A basin with fewer streams per DA will result in a higher CCM. Conversely, a basin with more streams per DA will have a lower CCM. CCM was found to be negatively correlated with Q2 (Eash et al., 2013). Iowa is the only state found to incorporate this variable in their peak flow regression equations, and it was only used in the equation for region 1 (Table 1). DESMOIN represents the percent of the basin in the Des Moines Lobe. During the last glaciation in North America approximately 12,000 years ago a massive ice sheet encroached onto the mid continental USA (Prior, 1991). This developed the Des Moines landform, which still distinctly represents the glacial activity that occurred in Iowa. Prior to agriculture activity, it was estimated that 44% of the Des Moines Region was wetlands (Miller et al., 2009) which has created poorly drained soils in the region (Prior, 1991). Most of the natural lakes in Iowa are in this landform region (Prior, 1991). The distinct landscape represented by the Des Moines Lobe makes it a logical variable to include in the prediction of peak flows in Iowa. Eash et al. (2013) found that DESMOIN was negatively correlated with the peak flow. Therefore areas with higher DESMOIN have lower peak flows. BSHAPE is the ratio of basin length squared over the drainage area of the basin. Eash et al. (2013) found that BSHAPE was negatively correlated with peak flow. This means that basins with a larger length to width ratio typically have lower peak flows. The longer flow path will increase the travel time to the outlet, increasing opportunities for losses through abstractions and therefore resulting in lower peak flow. The larger length to width ratio may also provide more opportunities for runoff to concentrate along the flow path, a decreasing volume that reaches the outlet. KSATSSUR represents the soil pores ability to transmit water when saturated (Eash et al., 2013). The lower the value of KSATSSUR, the lower the infiltration ability, therefore the higher the runoff. KSATSSUR is negatively correlated with runoff, which in theory will contribute to lower peak flow. KSATTSUR ranges from 2.7 μm/s to 32.1 μm/s which has significant spatial variation across the state, shown in Fig. 4b). The northeast corner of Iowa has the highest KSATTSUR which should indicate areas of lower flow. South Iowa has the lowest KSATTSUR indicating that this area will generate higher peak flows. Comparing both maps in Fig. 4, the areas with the lowest KSATTSUR in the south receive relatively higher I24H10Y. If I24H10Y is the best precipitation indicator of the 2-year peak flow and KSATTSUR is an influential predictor of the soil conditions than the south of Iowa should be receiving the highest flows in the state. Both KSATTSUR and I24H10Y exhibit similar spatial trends across Iowa;. The correlation coefficient was

N

(yi

yi )2 + 2 × k

i=1

(15)

where N is the number of samples, y and y are the observed and predicted samples and k is the number of modifiable variables. For example, using MGA with two inputs and one neuron (Eq. (7)), k is equal to 6. The defined structure in GSGA is not as constant as in Eq. (7). It could be a neuron with more than two inputs, a higher order polynomial, or it could be selected from the inputs of a neuron from other layers. The main structure defining the CGA algorithm for the GSGA method is as follows: i) In the first structure, all possible multiplied pairs are considered as the model structure (xi × xj). ii) In addition to the terms considered in the second state (xi × xj), all possible squares will be considered (xi × xj, xj2). iii) The difference of this structure compared to the previous is all possible divided pairs are used instead of all possible squares (xi × xj, xi/xj). iv) The fourth structure is the “Custom polynomial.”.” In this state, the optimum polynomial is gained based on the defined values as following: - Maximum power of a variable; - Minimum power of a variable; - Maximum total power in a term; and - A maximum number of a variable in a term. The main defined structure in the MGA algorithm is as follows: i) Linear (a0 + a1·xi + a2·xj); ii) Polynomial (a0 + a1·xi + a2·xj + a3·xi·xj); iii) Quadratic polynomial (a0 + a1·xi + a2·xj + a3·xi·xj + a4·xi2 2 + a5·xj ); and iv) Custom: use of custom polynomial function by defined values as those presented in the fourth structure of the CGA. The following example defines a different structure. The maximum number of power, maximum total power and maximum number in a term is equal to 2, where the minimum power is equal to zero. This is a special kind of GSGA which yields Eq. (7). By increasing the maximum power and maximum total power to 4, and decreasing the maximum number in a term, the following equation is developed:

yi = f (x ip , x iq) = a0 + a1 x ip + a2 x ip2 + a3 xip3 + a4 x ip4 + a5 x iq + a6 xiq2 + a7 xiq3 + a8 x iq4 (16) The flowchart of the GSGA is presented in Fig. 2. 2.2. Data collection The data used to develop the original regression equations and descriptions of the methods followed for the data collection in Iowa is publicly available in the “Methods for Estimating Annual ExceedanceProbability Discharges for Streams in Iowa, Based on Data through Water Year 2010” report by Eash et al. (2013). This data set was used in this study to apply the three GMDH methods to compare to the original regression equations. 365 stream gauges from Iowa and surrounding states, with their corresponding basin characteristics and peak flow data, were collected. The location of each stream gauge is shown in Fig. 3. The Q2 for each stream gauge was determined through statistical analysis by Eash et al. (2013). 674

Journal of Hydrology 575 (2019) 671–689

R. Walton, et al.

Fig. 2. The flowchart of generalized structure of GMDH algorithm (GSGA).

calculated to be −0.53 showing that they are negatively correlated. Both should be considered as variables to predict Q2. The range of each variable collected for analysis is reported in Table 2.

input variables. The three input variables were chosen based on the perceived relevance and the high visual variability across the state. The first step in GMDH-based modeling is to categorize all samples into training sub-samples and testing sub-samples. The user-defined parameters are selected next. They are the maximum power of a variable, minimum power of a variable, maximum total power in a term and a maximum number of variables in a term. These parameters are changed throughout the modeling process to find the best model. Three different GMDH algorithms are employed in this study. The three algorithms are CGA, MGA, and GSGA. The GSGA is a combination of CGA and MGA, modified to overcome the limitations present in both CGA and MGA. Therefore the final GSGA model is likely to have a similar structure to the CGA or MGA models. The structure of the proposed equation by MGA is shown in Fig. 5. From Fig. 5, the optimum structure used adjacent and non-adjacent layers in the neurons of the second and fourth layers. The best explicit equation for Q2 prediction derived from CGA, MGA, GSGA, and GSGA with only three variables, are presented in the following boxes.

3. Development of GMDH-based equations In this study, four different GMDH-based methods were evaluated to determine if a more reliable prediction of Q2 could be produced compared to the previously developed regression equations. The same independent variables used for the regression equations are used as the model inputs for the first three GMDH- based methods, CGA, MGA, and GSGA: DA, DESMOIN, I24H10Y, KSATSSUR, BSHAPE, CCM. The following model has been developed based on the independent variables:

Q2 = f (DA, DESMOIN, I 24H10Y, KSATSSUR, BSHAPE, CCM)

(17)

The fourth model used the GSGA method using different input variables: DA, I24H10Y, and KSATSSUR. The fourth model was applied to determine if a comparable results could be produced using fewer

675

Journal of Hydrology 575 (2019) 671–689

R. Walton, et al.

Fig. 3. Location of stream gauges and relative location of region boundaries in Iowa and surrounding states used by the USGS in developing the USGS regression equations and used in this study for developing the GMDH models.

Box 1 . The optimal equation for Q2 (CGA).

676

Journal of Hydrology 575 (2019) 671–689

R. Walton, et al.

Box 2 . The optimal equation for Q2 (MGA).

677

Journal of Hydrology 575 (2019) 671–689

R. Walton, et al.

Box 3 . The optimal equation for Q2 (GSGA).

678

Journal of Hydrology 575 (2019) 671–689

R. Walton, et al.

Box 4 . The optimal equation for Q2 (GSGA with DA, I24H10Y and KSATSSUR as input parameters).

4. Performance evaluation criteria The capability of the proposed GMDH-based method for predicting the 2-year peak flow at ungauged basins is evaluated using two absolute indices; correlation coefficient (R), mean absolute error (MAE) and root mean square error (RMSE) and one relative index; mean absolute relative error (MARE), as shown in Eqs. (18)–(21).

R=

MAE =

(

n i=1

n i=1

(Oi

MARE =

O¯ ) 2

n i=1

(Pi

P¯ ) 2

|Oi

1 n

(19)

Pi ) 2

i=1

n i=1

|Oi

,

ENS

1

Q o, i

(22)

The U.S. Geological Survey (USGS) has developed an extensive set of regression equations for predicting return period peak flows for most of the states in the USA, including the state of Iowa. To develop the regression equations for Iowa a set of 394 stream gauges were used with the gauges being divided into three homogenous regions (Eash et al., 2013). Both ordinary least squares (OLS) and general least squares regression (GLS) analysis were employed as regression techniques for equation development. OLS was used to develop the preliminary set of equations by determining the best set of variable combinations. GLS was used to refine and develop the final set of equations because this method takes into consideration the length and variance in streamflow records. Therefore, gauges with varying record length can be used with confidence in the same analysis (Eash et al., 2013). The 2-year peak flow regression equations developed for the three defined regions in Iowa are in Table 1. Each equation performs very well for the data it was developed upon. However, the equations are non-transferable to other regions or states. The three equations have several similarities which have inspired the goal of the present study, to develop one model to predict peak flows for Iowa without using regionalization while maintaining similar accuracy. Fig. 3 shows the relative locations of the region boundaries as determined by Eash et al. (2013). In certain locations, the three regions are very close together with an approximate separation difference of only 50 km. However, the average difference in calculated flow rates using the three different equations is significant. Each equation was

(20)

Pi | Oi

__ 2

5. Results and discussion

(18)

n

(Oi

Qo, i

Qf , i ) 2

5.1. USGS regression equations for the state of Iowa

Pi|

i=1

1 n

(Qo, i

where Qo,i and Qf,i are observed and predicted discharge values and is the mean of the observed discharge value.

2

P¯ ) )

N i=1 N i=1

n

1 n

RMSE =

O¯ )(Pi

(Oi

ENS = 1

(21)

where Oi and Pi are the observed and predicted values (respectively), O¯ and P¯ are the means of observed and predicted values (respectively), and n is the number of samples. The MAE and RMSE indicate the goodness-of-fit due to the sign of the deviations of the observed data from the predicted peak flow values. It should be noted that the RMSE index is suited for normally distributed predicted errors (Chai and Draxler, 2014) and is not appropriate for all prediction models. The MAE index is employed to assess all differences between predicted and observed values irrespective of the sign (Yaseen et al., 2017). The MARE considers the relative error of the predicted values related to the observed values. The R value is bound in a range of [−11] which designates the covariance in the observed peak flows. MAE, RMSE, and R are insusceptible to outliers and the proportional and additive difference among the observations and predictions. For instance, a shift in all predicted values can prompt mistaken conclusions if the model is assessed exclusively based on R. The precision of proposed GMDH-based equations is determined using the Nash-Sutcliffe coefficient (ENS) (Nash and Sutcliffe, 1970): 679

Journal of Hydrology 575 (2019) 671–689

R. Walton, et al.

applied to the entire data set, which revealed an average difference between regions 1 and 2 of 32.7 m3/s, regions 2 and 3 of 25.8 m3/s, and regions 1 and 3 of 45.9 m3/s. Many stream gauges are located very close to the regions boarder which emphasizes the error that will occur if the wrong equation is applied. Each equation uses a different combination of input variables; the only constant variable between the equations is DA. DA is incorporated in regions 1 and 3 in a similar fashion; however, in region 2, it is part of the equation in a more convoluted manner. Since DA is independent of location, the form of DA in region 2 is unexpected. According to the regression equations, CCM, DESMOIN, and BSHAPE are unique properties of regions 1, 2 and 3, respectively. Both CCM and BSHAPE are functions of DA. CCM is also a function of the total length of streams in the basin. CCM has a small variation of 3.35 km2/km throughout the entire state. Region 1 is the only region that incorporates this variable,

where the variation between minimum and maximum values is only 3 km2/km. Regions 2 and 3 still exhibit variation in CCM of 1.04 km2/ km and 0.66 km2/km, respectively. However, it is not included in the equation. BSHAPE ranges from 0.55 km to 22.4 km, exhibiting more variation than CCM, but no logical conclusion can be drawn to explain why such a basin specific variable is only significant in region 3. DESMOIN is a variable that is specific to Iowa. The majority of region 1 is within the DESMOIN landform region (Eash et al., 2013). However, this variable is only incorporated in region 2. Region 2 has a few stream gauges that border region 1. Therefore some basins in region 2 cover a portion of the DESMOIN landform. It is interesting that DESMOIN is not included in region 1. Eash et al. (2013) explain that CCM is used to represent the effects of glacial activity in region 1. Therefore it is interesting that CCM is not used in region 2 as well. The decision to include CCM, DESMOIN, and BSHAPE in only one region each pose the

Fig. 4. a) Variation of I24H10Y across Iowa and b) variation of KSATSSUR across Iowa. Both figures display the relative region boundary locations as prescribed by Eash et al. (2013). 680

Journal of Hydrology 575 (2019) 671–689

R. Walton, et al.

3 exhibits the most spatial variability in KSATSSUR as it covers some of the highest values in the northeast corner and the lowest values in the southern region of the state. Region 2 does exhibit spatial variation in KSATSSUR however it is not included in the equation. KSATSSUR is a property of the soil, and it has been found that the conditions of the soil affect the magnitude of the peak flow experienced in a basin (Grillakis et al., 2016). The significant spatial variation and the impact that soil properties can have on the magnitude of the peak flow confirm that KSATSSUR is an important variable to include. The regional delineation does not align with the spatial variation in the chosen input variables, which strengthens the hypothesis that regionalization is not necessary for prediction.

Table 1 The 2-year peak flow regression equations developed by the USGS for the state of Iowa. Region

# of Gauges

Regression Equation

1

91

2

176

Q2 = DA0.675 10

3

127

Q2 = 10 ( Q2 =

0.355 + 0.601 × I 24H 10Y 0.595 × CCM 0.55

49 + 51.2 × DA0.005 ) × (DESMOIN + 1) 0.069

DA0.600

× 10 2.43

0.009 × KSATSSUR 0.017 × BSHAPE )

Note: In Table 1, Q2 is the 2-year return flow in ft3/s, DA is the basin drainage area in mi2, I24H10Y is the maximum 24-hour precipitation that happens on average once every 10 years in inches, CCM is the constant of channel maintenance in mi2/mi, DESMOIN is the percent of DA within the Des Moines Lobe landform region, BSHAPE is the measure of basin shape in mi and KSATSSUR is the average saturated hydraulic conductivity of the soil in µm/s.

5.2. Performance evaluation of GMDH-based equation Fig. 6 shows the performance of the fourth GMDH based model in predicting Q2 following the input combination in Eq. (17). The MGA and GSGA models perform well in predicting the maximum value of the Q2 while the CGA model has weaker prediction capabilities. The MGA model has weaker performance with very low values of Q2 resulting in high errors. Therefore, caution should be taken using MGA models if it is known that Q2 is low. In the different samples all proposed models both under- and over-predict Q2 values. Fig. 6 also displays the error distribution of each of the models. The CGA model exhibits a right skew distribution where the MGA exhibits an extreme left skew distribution of absolute error. The GSGA model’s error is closest to a normal distribution in comparison to the other models. The quantitative comparison between the various six inputs GMDH methods presented in Table 3 shows that the lowest and highest relative error values are related to GSGA (MARE = 0.85) and MGA (MARE = 2.14), respectively. GSGA also has the best overall RMSE while MGA has the worst RMSE. In addition, the absolute error index of the GSGA method (RMSE = 1501.13) is significantly better than the two classic methods; CGA (RMSE = 1546.78) and MGA (RMSE = 1623.92). The smallest AIC index, which simultaneously considers the accuracy and complexity of the model, also proves superior to the CGA method (AIC = 1191.33). It should be noted that the low level of accuracy related to CGA model in all relative and absolute errors results in lower confidence when using this model. It is concluded that the proposed method in this study (GSGA) has been able to increase the accuracy of Q2 prediction by solving the weaknesses associated with CGA and MGA, such as the lack of use of neurons of nonadjacent layers, the limited number of parameters per layer and the limitation of maximum power associated with different terms.

Table 2 Minimum, maximum and average values for each input parameter used in modelling.

2

DA (km ) DESMOIN (%) I24H10Y (mm) KSATSSUR (mm/d) BSHAPE (km) CCM (km2/km)

Minimum Value

Maximum Value

Average Value

0.13 0 90.9 164 0.55 0.25

20156.9 100 127.3 2903 22.4 3.6

1100.6 21.6 111.9 899 6.6 0.93

hypothesis that they may not be necessary to include in prediction at all. I24H10Y is the precipitation variable incorporated in the regression equation, but it is only applied to the equation for region 1. Precipitation is the driving force of any flooding event, and therefore some representation of the precipitation should be used in a prediction equation. Fig. 4a) displays the spatial variation in I24H10Y across Iowa. Although I24H10Y only varies from 95 mm to 127 mm, that still represents a 29% difference across the state. The highest value of I24H10Y is in the southeast corner of the state. Conversely, the lowest I24H10Y is in the northwest corner of the state. Disregarding I24H10Y in regions 2 and 3 would be more logical if the regional delineation followed the spatial variation in I24H10Y. Region 3 covers portions of both the highest and lowest I24H10Y. Region 1 has the smallest variation in I24H10Y and is the only region to incorporate the variable. KSATSSUR is only incorporated in the equation for region 3. Region

Fig. 5. The optimum structure of MGA. 681

Journal of Hydrology 575 (2019) 671–689

R. Walton, et al.

Fig. 6. Scatter plots and normal distribution of absolute error related to the different GMDH-based methods used for predicting Q2.

682

Journal of Hydrology 575 (2019) 671–689

R. Walton, et al.

Table 3 Statistical indices for GMDH-based equations. Method

CGA_6_Input

MGA_6_Input

GSGA_6_Input

USGS_6_Input

GSGA_3_Input

R RMSE MARE AIC ENS

0.95 1546.78 1.34 1191.33 0.88

0.94 1623.92 2.14 1371.07 0.88

0.95 1501.13 0.85 1282.57 0.89

0.87 1682 0.43 – 0.88

0.95 1567.9 1.35 1168.3 0.88

Fig. 7. The box-plot of absolute error for the three different GMDH-based equations.

Recognizing that the GSGA method outperformed the classical GMDH-based equations, this method was further refined to include only three relevant input variables: DA, I24H10Y, and KSATSSUR. The three parameter GSGA model has a relative error (MARE = 1.35) that is higher than the original GSGA using six inputs (MARE = 0.85). The absolute indices (R = 0.95, RMSE = 1567.9) are comparable to the original three GMDH-based methods but are better than the USGS regression equation results (R = 0.87, RMSE = 1682). The ENS of all four models are very similar (CGA = 0.88, MGA = 0.88, GSGA (six parameters) = 0.89, USGS = 0.88, GSGA (three parameters) = 0.95). This shows that using more input variables does not increase the predictive power of the model. Using DA, I24H10Y, and KSATSSUR as inputs maintain prediction accuracy. In addition, AIC results show that GSGA_3_Input model with AIC = 1168.3 has greater accuracy compared to what USGS_6_Input has with AIC = 1282.57. From a logistical standpoint, this model is easier to apply as it requires only three inputs and regionalization was eliminated. This model is more coherent

compared to the regression equations. It is imperative in model development to have a strong understanding of the incorporation and structure of each input from a hydrological perspective to ensure that the trends will remain true for ungauged basins. The main advantage to the three parameter GSGA method is that it does not require regionalization to apply the equation. Regardless of geographical location with Iowa, the method can be applied. However, to apply the USGS regression equations, first the region must be Table 4 Uncertainty analysis for different GMDH-based equations.

683

Method

MIPE

SDIPE

WUB

95% PEI

CGA_6_Input MGA_6_Input GSGA_6_Input GSGA_3_Input

8.84 21.33 −0.73 7.46 × 10−11

1548.88 1626.01 1503.19 1569.98

± 159.43 ± 167.37 ± 154.73 ± 161.61

(−150.59, (−146.04, (−155.46, (−161.61,

168.27) 188.69) 153.99) 161.61)

Journal of Hydrology 575 (2019) 671–689

R. Walton, et al.

Fig. 8. The sensitivity analysis results of GMDH-based equation to each input variables (CGA).

determined. Boxplots relating the absolute error were developed to (1) further investigate the Q2 prediction performance of the proposed models, and (2) to evaluate the error distribution, as shown in Fig. 7. The highest relative error occurred using the MGA method. The distribution of the error presented in the boxplots represents the performance of the proposed method in this study at maximum Q2 rates. The maximum value associated with GSGA is lower than both of CGA and MGA and is approximately half that of the MGA method. In addition, the lowest average absolute error is related to the GSGA method. The boxplots provide additional strength to the conclusion that the GSGA method, which simultaneously uses the two structures presented in classical methods and modifies these structures to overcome the constraints of these methods, has improved the accuracy of the peak flow prediction.

IPE = Pj MIPE =

(23)

Oj 1 n

n

IPEj

(24)

j=1

n

SDIPE =

(IPEj j=1

MIPE )2 / n

1

(25)

where P and O are the predicted, and the observed values (respectively), and n is the number of samples. A negative or positive value of MIPE demonstrates the forecasting model underestimating or overestimating the observed samples, respectively. The confidence band is defined around the estimated values of an error using the Wilson score method without continuity correction (Ebtehaj et al., 2017). Considering ± 1.96 SDIPE produced a roughly 95% confidence bound. The results of the uncertainty analysis, which shows the 95% prediction interval error (PEI), the width of uncertainty band (WUB), SDIPE and IPE are presented in Table 4. The MIPE value for CGA, MGA and the three input GSGA are positive while the GSGA MIPE is negative. In fact, predicting Q2 using GSGA provides an overall underestimation while the other two methods provide an overestimation of Q2. The MIPE for the three parameters GSGA is low, demonstrating less uncertainty with the overestimations. The WUB for GSGA (WUB = ± 154.73) is the lowest value among CGA

5.3. Uncertainty analysis A quantitative evaluation of the uncertainties associated with the estimation of Q2 is provided using developed GSGA versus the classical CGA and MGA. To calculate the uncertainty of the developed models, the individual prediction error (IPE); mean of IPE (MIPE) and standard deviation of IPE (SDIPE), should be computed. The IPE, MIPE, and SDIPE are defined as: 684

Journal of Hydrology 575 (2019) 671–689

R. Walton, et al.

Fig. 9. The sensitivity analysis results of GMDH-based equation to each input variables (MGA).

(WUB = ± 159.43), three input GSGA (WUB = ± 161.61) and (MGA (WUB = ± 167.37). The lowest 95% PEI is related to GSGA method.

using the partial derivative of the proposed equation related to each input variables (∂T/∂xi; i = 1, 2, …, n, where n is the number of input variables). The lower or higher value of the PDSA implies a lower or higher sensitivity of the proposed model to the target variable. A positive value of PDSA denotes that an increase in the desired variable results in an increase of the target variable. Conversely, a negative PDSA value means that an increase in the variable results in a decrease of the target variable. The PDSA results from CGA, MGA, GSGA and the three input GSGA

5.4. Partial derivative sensitivity analysis (PDSA) To investigate the variation in trends of Q2 prediction inflicted by each input variable a partial derivative sensitivity analysis (PDSA) was applied (Shaghaghi et al., 2017; Ebtehaj et al., 2018). In this approach, the sensitivity of the target variable to each input variable is studied 685

Journal of Hydrology 575 (2019) 671–689

R. Walton, et al.

models relative to the input variables are provided in Figs. 8–11 respectively. The variation in trends for the CGA, MGA and GSGA are relatively similar so the sensitivity values in a different range as often positive as they are negative. It should be noted that the negative values in CGA and MGA are much higher than the positive values. It is shown that an extreme reduction of the dependent variable Q2 is caused by an increase of the independent variable BSHAPE. Comparison of three models indicates that the GSGA and MGA models have the highest and

lowest sensitivity to BSHAPE parameter, respectively. The sensitivity shown in the GSGA model is intuitive from a hydrological perspective as a higher BSHAPE indicates basins with a larger length to width ratio, which results in lower Q2 values. The variation in Q2 caused by CCM for CGA is constantly positive. Therefore, an increase in CCM results in a slight change in Q2 using the CGA model. The sensitivity of the MGA and GSGA models in relation to CCM has no clear trend. The sensitivity sign for these models is as often

Fig. 10. The sensitivity analysis results of GMDH-based equation to each input variables (GSGA).

686

Journal of Hydrology 575 (2019) 671–689

R. Walton, et al.

Fig. 10. (continued)

Fig. 11. The sensitivity analysis results of GMDH-based equation to each input variable (GSGA) using only DA, I24H10Y and KSATSSUR as input variables.

positive as they are negative. For CCM values less than one, an increase in CCM values leads to a greater decrease in the estimation of Q2 by MGA model. It should be noted that the sensitivity value of the GSGA model in relation CCM is greater than CGA and MGA models. The sensitivity values when DA is changed is positive most of the time for all four models. This was expected, as DA is positively correlated with Q2. By increasing the value of DA, the sensitivity decreases; as such, the sensitivity is sometimes negative for the MGA model and

the three input GSGA model. Overall a slight change in DA results in an increase in the Q2 estimation calculated by all three GMDH-based models. Like the two previous parameters, BSHAPE and CCM, the highest sensitivity values are related to the GSGA model. The PDSA results show that the MGA and GSGA models do not have a definite trend compared to changes in the DESMOIN parameter. The only noteworthy trend is that the sensitivity of the MGA and GSGA models to DESMOIN is negative at both minimum and maximum 687

Journal of Hydrology 575 (2019) 671–689

R. Walton, et al.

values. The difference between the CGA model and these two models is that at the maximum values the sensitivity is positive. Precipitation is the causal force driving any flooding event, and it was expected that Q2 would be sensitive to changes in I24H10Y. The sensitivity of the CGA to the I24H10Y parameter is very high compared to the other parameters, and the sensitivity sign is positive for all samples. Therefore, a slight change in the I24H10Y results in a significant change in Q2 estimated by CGA. The sensitivity sign for I24H10Y is positive most of the time, indicating that an increase in the variable leads to an increase in Q2. No distinct trend is apparent for GSGA. Therefore, it is difficult to draw any conclusions from this model. Unlike CGA and GSGA where no clear and permanent change is observed, the trend of the MGA equation related to I24H10Y has a second order variation. The three input GSGA model exhibits high sensitivity to I24H10Y. The trend in sensitivity has both an increasing and a decreasing trend as I24H10Y increases, again making it difficult to draw conclusions about the effect of this parameter on this model. The variation of GSGA in relation to KSATSSUR is negative most of the time but does not exhibit a clear trend. The negative sensitivity sign is logical, as a lower value of KSATSSUR indicates a lower infiltration capacity which in turn should cause an increase in the peak flow. The three input GSGA model has a similar trend in sensitivity. However, it has more positive sensitivity values. The variation of CGA and MGA in relation to KSATSSUR do not have a clear trend or a consistent sensitivity sign. The sensitivity analysis is an important aspect of evaluating model performance. The sensitivity analysis provides further evidence of the enhanced prediction performance provided by the GSGA method. The sensitivity trends can be explained by hydrologic theory. Even if the model can produce accurate results, if the trends exhibited by the model oppose known theories or cannot be explained in a logical manner, then it is likely the model will not produce accurate results at ungauged locations.

were very sensitive to changes in I24H10Y. The sensitivity was mostly positive, which indicates that an increase in I24H10Y will increase the magnitude of Q2 in the model. From the sensitivity analysis, some unclear trends were noted using CCM, BSHAPE, and DESMOIN providing further evidence to eliminate them as prediction variables. Using both GSGA models, the trends in sensitivity were the most logical based on hydrologic theory. This study has shown that one reliable prediction method can be employed to predict the 2-year peak flow in Iowa to a high degree of accuracy and is supported by the context of hydrologic perspectives. Declaration of Competing Interest None. Acknowledgements The authors would like to acknowledge the financial support of the Discovery Grant Program, Trust Fund #400675, Natural Sciences and Engineering Research Council of Canada (NSERC) for funding this research. References Ahn, K.H., Palmer, R., 2016. Regional flood frequency analysis using spatial proximity and basin characteristics: quantile regression vs. Parameter regression technique. J. Hydrol. 540, 515–526. https://doi.org/10.1016/j.jhydrol.2016.06.047. Atieh, M., Gharabaghi, B., Rudra, R., 2015. Entropy-based neural networks model for flow duration curves at ungauged sites. J. Hydrol. 529, 1007–1020. https://doi.org/10. 1016/j.jhydrol.2015.08.068. Atieh, M., Taylor, G., Sattar, A.M., Gharabaghi, B., 2017. Prediction of flow duration curves for ungauged basins. J. Hydrol. 545, 383–394. Aziz, K., Rahman, A., Fang, G., Shrestha, S., 2014. Application of artificial neural networks in regional flood frequency analysis: a case study for Australia. Stoch. Env. Res. Risk A 28 (3), 541–554. https://doi.org/10.1007/s00477-013-0771-5. Chai, T., Draxler, R.R., 2014. Root mean square error (RMSE) or mean absolute error (MAE)? – Arguments against avoiding RMSE in the literature. Geosci. Model Dev. 7, 1247–1250. https://doi.org/10.5194/gmd-7-1247-2014. Curran, J.H., Barth, N.A., Veilleux, A.G., Ourso, R.T., 2016. Estimating flood magnitude and frequency at gaged and ungaged sites on streams in Alaska and conterminous basins in Canada, based on data through water year 2012. U.S. Geological Survey Scientific Investigations Report 2016–5024. Eash, D.A., Barnes, K.K., Veilleux, A.G., 2013. Methods for Estimating Annual Exceedance-Probability Discharges for Streams in Iowa, Based on Data through Water Year 2010. U.S. Geological Survey Scientific Investigations Report 2013 – 5086, 63. Ebtehaj, I., Bonakdari, H., Sharifi, A., 2014. Design criteria for sediment transport in sewers based on self-cleansing concept. J. Zhejiang Univ-Sc A. 15 (11), 914–924. https://doi.org/10.1631/jzus.A1300135. Ebtehaj, I., Bonakdari, H., Moradi, F., Gharabaghi, B., Khozani, Z.S., 2018. An integrated framework of Extreme Learning Machines for predicting scour at pile groups in clear water condition. Coast. Eng. 135, 1–15. https://doi.org/10.1016/j.coastaleng.2017. 12.012. Ebtehaj, I., Sattar, A.M., Bonakdari, H., Zaji, A.H., 2017. Prediction of scour depth around bridge piers using self-adaptive extreme learning machine. J. Hydroinform. 19 (2), 207–224. https://doi.org/10.2166/hydro.2016.025. Ebtehaj, I., Bonakdari, H., Gharabaghi, B., 2019. Closure to “An integrated framework of Extreme Learning Machines for predicting scour at pile groups in clear water condition by Ebtehaj, I., Bonakdari, H., Moradi, F., Gharabaghi, B., Khozani, Z.S”. Coast. Eng. 147, 135–137. https://doi.org/10.1016/j.coastaleng.2019.02.011. Farlow, S.J., 1984. Self-organizing Methods in Modeling, GMDH Type Algorithms. New York and Basel, Marcel Dekker, Inc. Grillakis, M.G., Koutroulis, A.G., Komma, J., Tsanis, I.K., Wagner, W., Blöschl, G., 2016. Initial soil moisture effects on flash flood generation – A comparison between basins of contrasting hydro-climatic conditions. J. Hydrol. 541, 206–217. https://doi.org/ 10.1016/j.jhydrol.2016.03.007. Haddad, K., Rahman, A., 2012. Regional flood frequency analysis in eastern australia: bayesian gls regression-based methods within fixed region and roi framework – quantile regression vs. Parameter regression technique. J. Hydrol. 430–431, 142–161. https://doi.org/10.1016/j.jhydrol.2012.02.012. Hrachowitz, M., Savenije, H.H.G., Blöschl, G., McDonnell, J.J., Sivapalan, M., Pomeroy, J.W., Arheimer, B., Blume, T., Clark, M.P., Ehret, U., Fenicia, F., 2013. A decade of Predictions in Ungauged Basins (PUB) – a review. Hydrolog. Sci. J. 58 (6), 1198–1255. https://doi.org/10.1080/02626667.2013.803183. Ivakhnenko, A.G., Ivakhnenko, G.A., 1995. The review of problems solvable by algorithms of the group method of data handling (GMDH). S. Mach. Perc. 5 (4), 527–535. Ivakhnenko, A.G., 1968. The group method of data handling; a rival of the method of stochastic approximation. Soviet Automatic Control 13 (3), 43–55. Ivakhnenko, A.G., 1971. Polynomial theory of complex system. IEEE Trans. Syst. Man.

6. Conclusions A novel GMDH-based algorithm, coined GSGA, for predicting the 2year peak flow at streams in Iowa has been developed. The GSGA is a combination of two classical GMDH-based algorithms, CGA and MGA. By developing a new encoding structure to simultaneously evaluate the two methods the disadvantages when they are employed separately are overcome. Data from 365 stream gauges located in the state of Iowa were collected for model development. Traditional regression equations developed by the USGS for Iowa were also tested on the data set to compare to the GMDH-based algorithms. The state was divided into three separate regions, where each region was assigned a unique equation for prediction. The different structure and input variables, as well as the questionable regions defined, insinuates that error may occur at ungauged sites. Having three different equations creates a lack of cohesion between predictions and errors near the regional boundaries. Three GMDH-based equations and the traditional USGS regression equations rely on six input parameters: DA, I24H10Y, KSATSSUR, BSHAPE, CCM, and DESMOIN. The inconsistent inclusion of the six variables and in the traditional USGS regression equations led to a third GSGA equation using only three input variables. This model achieved an R-value of 0.89 and an ENS of 0.89 which is very comparable to the original three GMDH-based equations using six variables. This method shows that using more variables is not necessary and regionalization can be avoided. Uncertainty analysis and a partial derivative sensitivity analysis were performed for the four GMDH-based methods. The smallest WUB and 95% confidence limits occurred using the GSGA method. The sensitivity analysis revealed interesting trends between the predicted Q2 and the selected input variables, for example, most of the models 688

Journal of Hydrology 575 (2019) 671–689

R. Walton, et al. Cybern. SMC-1 364–378. https://doi.org/10.1109/TSMC.1971.4308320. Merz, R., Blöschl, G., 2005. Flood frequency regionalisation – spatial proximity vs. catchment attributes. J. Hydrol. 302, 283–306. https://doi.org/10.1016/j.jhydrol. 2004.07.018. Miller, B.A., Crumpton, W.G., Van der Valk, A.G., 2009. Spatial distribution of historical wetland classes on the des moines lobe. Iowa. Wetlands 29 (4), 1146–1152. https:// doi.org/10.1672/08-158.1. Najafzadeh, M., Tafarojnoruz, A., 2016. Evaluation of neuro-fuzzy GMDH-based particle swarm optimization to predict longitudinal dispersion coefficient in rivers. Environ. Earth Sci. 75 (2), 157. Nash, J.E., Sutcliffe, J.V., 1970. River flow forecasting through conceptual models Part IA discussion of principals. J. Hydrol. 10, 282–290. https://doi.org/10.1016/00221694(70)90255-6. Olson, S.A., Veilleux, A.G., 2014. Estimation of Flood Discharges at Selected Annual Exceedance Probabilities for Unregulated, Rural Streams in Vermont. U.S. Geological Survey Scientific Investigations Report 2014-5078. Parsaie, A., Azamathulla, H.M., Haghiabi, A.H., 2017. Physical and numerical modeling of performance of detention dams. J Hydrol. https://doi.org/10.1016/j.jhydrol.2017. 01.018. Prior, J.C., 1991. Landforms of Iowa. University of Iowa Press, Iowa City. Rahmati, M., 2017. Reliable and accurate point-based prediction of cumulative infiltration using soil readily available characteristics: a comparison between GMDH, ANN, and MLR. J. Hydrol. 551, 81–91. Razavi, T., Coulibaly, P., 2013. Streamflow prediction in ungauged basins: review of regionalization methods. J. Hydrol. Eng. 18 (8), 958–975. https://doi.org/10.1061/

(ASCE)HE.1943-5584.0000690. Shaghaghi, S., Bonakdari, H., Gholami, A., Ebtehaj, I., Zeinolabedini, M., 2017. Comparative analysis of GMDH neural network based on genetic algorithm and particle swarm optimization in stable channel design. Appl. Math. Comput. 313, 271–286. https://doi.org/10.1016/j.amc.2017.06.012. Shaghaghi, S., Bonakdari, H., Gholami, A., Kisi, O., Shiri, J., Binns, A.D., Gharabaghi, B., 2018. Stable alluvial channel design using evolutionary neural networks. J. Hydrol. 566, 770–782. https://doi.org/10.1016/j.jhydrol.2018.09.057. Su, X., Yan, X., Tsai, C.L., 2012. Linear regression. Wiley Interdisciplinary Reviews. Computation. Stat. 4 (3), 275–294. https://doi.org/10.1002/wics.1198. Tayfur, G., 2012. Soft Computing in Water Resources Engineering: Artificial Neural Networks, Fuzzy Logic and Genetic Algorithms. WIT Press, Southamton, UK ISBN:978-1845646363. USGS., 2016. Floods: Recurrence intervals and the 100-year floods (USGS). Retrieved from https://water.usgs.gov/edu/100yearflood.html. Yaseen, Z.M., Ebtehaj, I., Bonakdari, H., Deo, R.C., Mehr, A.D., Mohtar, W.H.M.W., Diop, L., El-shafie, A., Singh, V.P., 2017. Novel approach for streamflow forecasting using a hybrid ANFIS-FFA model. J. Hydrol. 554C, 263–276. https://doi.org/10.1016/j. jhydrol.2017.09.007. Zaji, A.H., Bonakdari, H., Gharabaghi, B., 2018. Reservoir water level forecasting using group method of data handling. Acta Geophys. 66 (4), 717–730. Zhang, H., Liu, X., Cai, E., Huang, G., Ding, C., 2013. Integration of dynamic rainfall data with environmental factors to forecast debris flow using an improved GMDH model. Comput Geosci. 56, 23–31.

689