Creating a non-linear total sediment load formula using polynomial best subset regression model

Journal of Hydrology 539 (2016) 662–673


Davut Okcu a,⇑, Ali Osman Pektas b, Ali Uyumaz c

a Bahcesehir University, 34353 Besiktas, Istanbul, Turkey
b Bahcesehir University, Civil Engineering Department, 34353 Besiktas, Istanbul, Turkey
c Istanbul Technical University, Department of Hydraulics and Water Resources, Maslak, Istanbul, Turkey

Article info

Article history:
Received 4 February 2016
Received in revised form 28 April 2016
Accepted 30 April 2016
Available online 2 June 2016
This manuscript was handled by Andras Bardossy, Editor-in-Chief, with the assistance of Sheng Yue, Associate Editor

Keywords: Sediment transport; River hydrology; Polynomial best subset regression (PBSR); Bed material load; Suspended sediment; Bedload sediment

Summary

The aim of this study is to derive a new total sediment load formula that is more accurate and has fewer application constraints than the well-known formulae in the literature. The five best-known stream-power-based sediment formulae approved by ASCE are used for benchmarking on a wide range of datasets that include both field and flume (lab) observations. The dimensionless parameters of these widely used formulae are used as inputs in a new regression approach, called polynomial best subset regression (PBSR) analysis. The aim of the PBSR analysis is to fit and test all possible combinations of the input variables and to select the best subset. All input variables, together with their second and third powers, are included in the regression to test the possible relations between the explanatory variables and the dependent variable. The best subset is selected with a multistep approach that depends on the significance values and on the degree of multicollinearity of the inputs. The new formula is compared with the others on a holdout dataset, and detailed performance investigations are conducted for the field and lab datasets within this holdout data. Different goodness-of-fit statistics are used because they represent different perspectives of model accuracy. After detailed comparisons, the most accurate equation that is applicable to both flume and river data is identified. On the field dataset in particular, the proposed formula outperforms the benchmark formulations. © 2016 Elsevier B.V. All rights reserved.

⇑ Corresponding author (D. Okcu).
http://dx.doi.org/10.1016/j.jhydrol.2016.04.069
0022-1694/© 2016 Elsevier B.V. All rights reserved.

1. Introduction

Sediment transport is one of the most powerful agents shaping the river environment. It alters river morphology through long-term degradation and aggradation of channel beds via erosion and deposition. Such changes can directly affect the level of the river during flooding. A change in the morphology of a river can also threaten channel stability and create local scour around hydraulic installations. In addition, an accurate prediction of the total sediment load is a key factor in managing sedimentation in reservoirs. For these reasons, sediment transport prediction has been one of the most studied issues in different disciplines for almost a hundred years.

The total sediment load includes the wash load and the bed-material load. The bed-material load consists of bed load and suspended load. Generally, two approaches are available for predicting the bed-material load in a river. One is to estimate the bed load and the suspended load in separate calculations, on the grounds that the hydrodynamics of each mode of sediment transport is different. The methods developed by Einstein (1950), van Rijn (1984), and Toffaleti (1969) follow this approach. The other approach is to estimate the bed-material load directly without dividing the transport into two modes (e.g. Engelund and Hansen, 1967; Brownlie, 1981; Ackers and White, 1973; Karim and Kennedy, 1990; Choi and Lee, 2015). This approach is simpler and is sometimes preferred because, in reality, the two modes of sediment transport cannot easily be distinguished from one another.

In practice, the choice of approach is constrained by the available data, the engineering purpose and the required precision of the study. If data availability is not a constraint, the appropriate approach can be selected by using the Shields-Parker diagram. The Shields-Parker diagram (for more details see García, 2008, pp. 60–65) shows that in gravel bed rivers, bed material is transported mainly as bed load. In this diagram the critical condition for suspension is plotted as an additional curve, which is derived from the ratio of the shear velocity to the sediment fall velocity (see also Niño and García, 1998; Lopez and Garcia, 2001). In sand bed rivers, on the other hand, suspension and bed load transport of bed material coexist, particularly at high flows. The decision whether to investigate the bed load and the suspended load separately or together can therefore depend on whether the river bed is gravel or sand. Although some empirical equations have been suggested for determining the river bed type (e.g. García et al., 2000), this decision is somewhat complicated, and including the bed type in the formulation (model) could add extra uncertainty to the applied numerical model. In this study, the second approach is used to predict the total sediment load.

Flume and field conditions are very different from each other, and it is difficult to find a model that performs well in both situations. Some studies have therefore focused only on flume data (Smart, 1984; Damgaard et al., 1997; Ackers and White, 1973). Dogan et al. (2009) investigated whether sediment transport in natural alluvial channels can be predicted from observations at the laboratory scale. Tayfur et al. (2013) and Pektas (2015) used exploratory analyses such as principal component analysis and cluster analysis to identify the significant non-dimensional parameters of sediment transport. In recent studies, machine-learning-based models such as neural networks, fuzzy logic and support vector machines are the most frequently used in sediment modeling (e.g. Pektas and Dogan, 2013; Cigizoglu, 2002). Kisi and Cigizoglu (2007) sought to improve neural network performance in suspended sediment estimation. However, machine learning models have a black-box nature, so very little information can be gained about the inside of the model, and most of these models are not suitable for generating a formula.

Regression models are therefore still popular. Sinnakaudan et al. (2006) developed a total bed material formula by using a multiple linear regression model. The same authors later addressed sediment transport in high-gradient rivers using regression models (Sinnakaudan et al., 2010). Neter et al. (1989) discussed the use of all-possible-subset regression (best subset regression) in conjunction with stepwise regression. Howard et al. (2010) used best subset regression in their rainfall-runoff response models. Loomis et al. (2012) developed a new calibration that used a best subsets regression model on lake sediment. Recently, Lacombe et al. (2014) used a best subset regression model for stream flow prediction. However, no sediment transport study has so far used a best subset regression model.

In the present study the best subset regression technique is modified and used to find the optimum input combination that retains the nonlinear relationships. The aim is to obtain the most parsimonious and most accurate model for predicting the total bed material concentration while accounting for these nonlinear relationships. The newly generated formula is then compared with the well-known and widely used traditional formulas of the literature. All the benchmark formulations are referenced by ASCE (2008): the Yang (1979), Karim (1998), Engelund and Hansen (1967), Ackers and White (1973), and Molinas and Wu (2001) formulas. The Yang (1979) and Karim (1998) formulas are regression based.

2. Total bed material load sediment transport formulas

The literature contains many sediment transport formulas with different specifications, and many studies have attempted to find the best performing formulas for determining the total sediment load in rivers. Alonso (1980) compared eight formulas using both flume and field data and concluded that Yang's (1973), Ackers and White's (1973), Engelund and Hansen's (1967), and Laursen's (1958) formulas are all reliable. Brownlie (1981) compared 14 formulas using a compendium of sediment transport data from laboratory and field records. He concluded that Brownlie's (1981), Ackers and White's (1973), and Engelund and Hansen's (1967) formulas are acceptable. Woo and Yoo (1991) carried out extensive performance tests with 10 selected sediment transport formulas and found that Engelund and Hansen's (1967), Ackers and White's (1973), and van Rijn's (1984) formulas are more reliable than the others. Nakato (1990) tested 11 total sediment load formulas using field data. Wu and Wang (2003) tested Engelund and Hansen's (1967), Ackers and White's (1973), Yang's (1979), and Wu et al.'s (2000) formulas and found that the performance of all of these formulas is comparable when uniform sediment is considered. García (2008) recommended six total sediment load formulas, namely Engelund and Hansen's (1967), Brownlie's (1981), Karim and Kennedy's (1983), Ackers and White's (1973), Yang's (1973), and Molinas and Wu's (2001) formulas. Recently, Yang et al. (2009) compared the predictive performance of neural networks and selected sediment formulas.

In this study, five total sediment load formulas are used: Engelund and Hansen's (1967), Ackers and White's (1973), Yang's (1979), Karim's (1998), and Molinas and Wu's (2001) formulas. Hereafter, they are referred to as the EH, AW, YANG, KARIM, and MW formulas, respectively. Although the data used for the development of some of these formulas include gravel, the formulas are designed for use in sand-bed rivers. Herein, the total sediment load formulas estimate either the total sediment load per unit width (qt) or the total bed-material concentration in parts per million by weight (flux-based mass concentration) (C), which are related by

$$q_t = \frac{1}{G_s}\,\frac{C}{1 - C}\,q_w \tag{1}$$

where q_w is the water discharge per unit width and G_s is the specific gravity of the sediment. Each formulation has its own constraints, and these constraints are applied in the comparison part.

2.1. The Engelund and Hansen Formula (EH-1967)

The Engelund and Hansen (1967) relation is a semi-empirical equation based on energy concepts. It is derived for sandy streams. This relation was developed from a small set of laboratory data (ASCE, 2008).

$$\frac{q_t}{\sqrt{(G_s - 1)\,g\,d_{50}^{3}}} = 0.05\,\frac{1}{C_f}\,(\tau_*)^{2.5} \tag{2}$$

where q_t is the total sediment load per unit width, G_s − 1 is the submerged specific gravity, τ_* is the dimensionless Shields stress, and C_f is the friction coefficient (C_f = g r S / U^2). The equation can be written in another form:

$$C = 0.05\,\frac{G_s}{G_s - 1}\;\frac{U S}{\sqrt{(G_s - 1)\,g\,d_{50}}}\;\sqrt{\frac{r S}{(G_s - 1)\,d_{50}}} \tag{3}$$

where C is the flux-based mass concentration, d_50 is the median particle diameter, G_s is the specific gravity of the sediment, U is the water velocity, S is the slope, r is the hydraulic radius, and g is the gravitational acceleration.
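As an illustration, Eqs. (1) and (3) can be evaluated with a few lines of Python. This is a minimal sketch and not part of the original study: the function names, the default G_s = 2.65 and the sample hydraulic values are assumptions chosen for illustration, and C is treated as a mass fraction (multiply by 10^6 for ppm).

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)

def eh_concentration(U, S, r, d50, Gs=2.65):
    """Flux-based mass concentration (as a fraction) from Eq. (3).

    U: mean velocity (m/s), S: slope (m/m), r: hydraulic radius (m),
    d50: median grain size (m). Gs = 2.65 is an assumed default.
    """
    stream_power_term = U * S / math.sqrt((Gs - 1.0) * G * d50)
    shields = r * S / ((Gs - 1.0) * d50)  # dimensionless Shields stress
    return 0.05 * Gs / (Gs - 1.0) * stream_power_term * math.sqrt(shields)

def load_per_unit_width(C, qw, Gs=2.65):
    """Total sediment load per unit width from Eq. (1); C is a mass fraction."""
    return C * qw / (Gs * (1.0 - C))

# Hypothetical sand-bed reach: 1 m/s, slope 0.001, 1 m hydraulic radius, 0.3 mm sand
C = eh_concentration(U=1.0, S=0.001, r=1.0, d50=0.0003)
qt = load_per_unit_width(C, qw=1.0 * 1.0)  # qw = U * depth for a wide channel
print(f"C = {C * 1e6:.0f} ppm, qt = {qt:.3e} m^2/s")
```

Because Eq. (3) is linear in U at fixed slope and hydraulic radius, doubling the velocity doubles the predicted concentration, which is a quick sanity check on any implementation.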

The Engelund and Hansen equation (EH) is applicable to:

$$\begin{cases} d_{50} \ge 0.15\ \text{mm} \\ Re \ge 12 \\ \text{Gradation}\ (\sigma_s) \le 2 \end{cases}$$

Implicitly, the EH formula can be written as a function of dimensionless parameters:

$$C = f\left(\frac{U^{2}}{g r S},\ \frac{r S}{(G_s - 1)\,d_{50}}\right)$$

2.2. The Ackers and White Formula (AW-1973)

Ackers and White (1973) developed one of the most popular formulas of total bed material transport. This relation is based on Bagnold's stream power concept. Ackers and White used dimensional analysis together with consideration of the physical characteristics of the sediment-laden flow. Ackers and White's method (AW) is based on almost 1000 flume experiments, which were carried out under steady state with uniform or near-uniform sediments (ASCE, 2008).

AW's method is applicable to: 0.04 mm ≤ d_50 ≤ 4.94 mm

$$C = c\,G_s\,\frac{d_{50}}{r}\left(\frac{U}{U_*}\right)^{n}\left(\frac{F_{gr}}{A_{aw}} - 1\right)^{m} \tag{4}$$

$$F_{gr} = \frac{\sqrt{\tau_o/\rho}}{\sqrt{g\,d_{50}\,(G_s - 1)}} \tag{5}$$

$$D_{gr} = \frac{\sqrt[3]{(G_s - 1)\,g\,d_{50}^{3}}}{\nu^{2/3}} \tag{6}$$

where c, n, A_aw and m are coefficients that vary with D_gr, the dimensionless grain size; ν is the kinematic viscosity of water; F_gr is the Ackers-White mobility number; U_* is the shear velocity; and the other symbols are as defined before.

For D_gr > 60.0 (or approximately d_50 > 2.5 mm):

$$n = 0,\qquad m = 1.5,\qquad A_{aw} = 0.17,\qquad c = 0.025$$

For 1.0 < D_gr ≤ 60.0 (or approximately 0.04 mm < d_50 ≤ 2.5 mm):

$$n = 1 - 0.56 \log D_{gr}$$

$$m = 1.34 + \frac{9.66}{D_{gr}}$$

$$A_{aw} = 0.14 + \frac{0.23}{(D_{gr})^{0.5}}$$

$$\log c = 2.86 \log D_{gr} - (\log D_{gr})^{2} - 3.53$$

Implicitly, the AW formula can be written as a function of dimensionless parameters:

$$C = f\left(G_s,\ \frac{d_{50}}{r},\ \frac{U}{U_*},\ \frac{F_{gr}}{A_{aw}}\right)$$

2.3. The Yang Formula (YANG-1979)

This relation was developed from a dimensional analysis of stream power and other relevant variables. The Yang equation is obtained from multiple regression analysis using 463 laboratory flume observations (Choi and Lee, 2015). The equation can be written as:

$$\log C = 5.435 - 0.286 \log\frac{w d_{50}}{\nu} - 0.457 \log\frac{U_*}{w} + \left(1.799 - 0.409 \log\frac{w d_{50}}{\nu} - 0.314 \log\frac{U_*}{w}\right)\log\left(\frac{U S}{w} - \frac{U_c S}{w}\right) \tag{7}$$

where C is the total bed-material concentration in parts per million by weight; w is the sediment particle fall velocity in water; U is the average water velocity; US is the unit stream power; U_c is the critical average flow velocity at incipient motion; and the other symbols are as defined before.

The Yang equation (YANG) is applicable to:

$$\begin{cases} 0.063\ \text{mm} \le d_{50} \le 2\ \text{mm} \\ C \ge 100\ \text{ppm} \end{cases}$$

Implicitly, the YANG formula can be written as a function of dimensionless parameters:

$$C = f\left(\frac{U S}{w},\ \frac{U_*}{w},\ \frac{w d_{50}}{\nu},\ \frac{U^{3}}{g H w}\right)$$

2.4. The Karim Formula (KARIM-1998)

Karim and Kennedy (1990) obtained the total volumetric sediment discharge per unit width from nonlinear regression using a sample of 339 river flows and 608 flume flows (ASCE, 2008). More recently, Karim (1998) proposed a simpler power relation for the sediment transport equation using the same data sets employed in the Karim-Kennedy analysis. Karim (1998) applied his equation to laboratory and field data having non-uniform sediments by dividing the sediment into size fractions. Karim's (1998) formula predicts the dimensionless total sediment load per unit width:

$$\frac{q_t}{\sqrt{(G_s - 1)\,g\,d_{50}^{3}}} = 0.00139\left(\frac{U}{\sqrt{(G_s - 1)\,g\,d_{50}}}\right)^{2.97}\left(\frac{U_*}{w}\right)^{1.47} \tag{8}$$

where w is the sediment particle fall velocity in water, U is the average water velocity, U_* is the shear velocity, and the other symbols are as defined before. Implicitly, the KARIM formula can be written as a function of dimensionless parameters:

$$C = f\left(\frac{U}{\sqrt{(G_s - 1)\,g\,d_{50}}},\ \frac{U_*}{w}\right)$$

2.5. The Molinas and Wu Formula (MW-2001)

This empirical relation is based on Velikanov's gravitational power theory, which assumes that the power available in flowing water is equal to the sum of the power required to overcome flow resistance and the power required to keep sediment in suspension against gravitational forces. Molinas and Wu (2001) stated that the predictors of Engelund-Hansen, Ackers and White, and Yang have been developed with lab experiments representative of shallow flows and cannot be applied to large rivers. Motivated by the need for a total bed-material load predictor applicable to large sand-bed rivers, Molinas and Wu (MW) used stream power and energy concept considerations together with data from large rivers (e.g. Amazon, Atchafalaya, Mississippi, Red River).

$$C = \frac{1430\,(0.86 + \sqrt{\psi})\,\psi^{1.5}}{0.016 + \psi} \tag{9}$$

where ψ is the stream power, which is defined by

$$\psi = \frac{U^{3}}{(G_s - 1)\,g\,H\,w\,\left(\log(H/d_{50})\right)^{2}} \tag{10}$$

where H is the water depth and the other symbols are as defined before. Most parameters in this empirical equation can be measured and/or estimated in the field, making it a useful formulation for practical use in large sand-bed rivers. One advantage of this approximation is that the energy slope (S) does not have to be measured directly, which is always a challenge in large alluvial rivers.

The MW equation is applicable to:

$$\begin{cases} 0.4\ \text{mm} \le d_{50} \le 1.35\ \text{mm} \\ 0.003\ \text{m}^{3}/\text{s} \le Q_w \le 0.64\ \text{m}^{3}/\text{s} \\ 0.03\ \text{m} \le H \le 0.37\ \text{m} \\ 4\ \text{ppm} \le C \le 39310\ \text{ppm} \end{cases}$$

Implicitly, the MW formula can be written as a function of dimensionless parameters:

$$C = f\left(\frac{H}{d_{50}},\ \frac{U^{3}}{g\,H\,w}\right)$$
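The closed-form parts of the benchmark formulas in this section are straightforward to code. The sketch below is illustrative only and makes several assumptions that are not stated in the paper: the function names, the defaults G_s = 2.65 and ν = 10^-6 m^2/s, base-10 logarithms throughout, the coefficient 0.56 of the original Ackers and White (1973) relation in the expression for n, and a result in ppm by weight for the Molinas and Wu relation. The Yang formula is omitted because it needs a critical velocity U_c whose computation is not given here.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)

def dimensionless_grain_size(d50, nu=1.0e-6, Gs=2.65):
    """D_gr of Eq. (6); d50 in m, nu in m^2/s (assumed defaults)."""
    return d50 * ((Gs - 1.0) * G / nu ** 2) ** (1.0 / 3.0)

def aw_coefficients(d_gr):
    """Ackers-White coefficients (n, m, A_aw, c) as functions of D_gr."""
    if d_gr > 60.0:
        return 0.0, 1.5, 0.17, 0.025
    if d_gr <= 1.0:
        raise ValueError("D_gr must be greater than 1")
    log_d = math.log10(d_gr)
    n = 1.0 - 0.56 * log_d
    m = 1.34 + 9.66 / d_gr
    a_aw = 0.14 + 0.23 / math.sqrt(d_gr)
    c = 10.0 ** (2.86 * log_d - log_d ** 2 - 3.53)
    return n, m, a_aw, c

def karim_dimensionless_load(U, u_star, w, d50, Gs=2.65):
    """Left-hand side of Eq. (8): qt / sqrt((Gs - 1) g d50^3)."""
    particle_froude = U / math.sqrt((Gs - 1.0) * G * d50)
    return 0.00139 * particle_froude ** 2.97 * (u_star / w) ** 1.47

def molinas_wu_concentration(U, H, w, d50, Gs=2.65):
    """Eqs. (9) and (10); the result is assumed to be in ppm by weight."""
    psi = U ** 3 / ((Gs - 1.0) * G * H * w * math.log10(H / d50) ** 2)
    return 1430.0 * (0.86 + math.sqrt(psi)) * psi ** 1.5 / (0.016 + psi)

# The transitional coefficients should meet the coarse-sediment constants at D_gr = 60
print(aw_coefficients(60.0))
print(aw_coefficients(60.1))
```

Evaluating the transitional expressions at D_gr = 60 gives values very close to n = 0, m = 1.5, A_aw = 0.17 and c = 0.025, so the two coefficient sets join continuously at the boundary.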

3. Data and method

For this purpose, the database assembled by Brownlie (1981b), which represents a wide spectrum of the considered problem, is utilized. A wide range of samples is compiled from both flume experiments and field observations. Table A (in Appendix) shows the reference studies of the compiled dataset. Table 1 presents descriptive statistics of the river/lab dimensions and some sediment parameters. Many of the dimensionless parameters are derived from these attributes. As seen in Table 1, the ranges of river width, depth and discharge are very large for the field data, whereas the ranges of sediment concentration (C_ppm) and d50_mm are very large for the flume data. The flume sediment concentration has a wide range, and its standard deviation is approximately twice that of the field data. Within the total dataset, the maximum observed sediment concentration is 12,900 ppm, and 90% of the data are smaller than 1939.2 ppm, indicating that there are many extreme values within the last 10 percent. Before the model development process, the dataset is divided into two sets: training (70%) and validation (30%). The validation part is not used in model development. The models are compared on the validation part, considering the partition into lab data (44%) and field data (56%). The dataset contains 2100 records in total, consisting of lab data (927) and field data (1173).

To satisfy the normality assumption of regression models, the dependent variable is transformed by the natural logarithm. The histogram of the sediment concentration is presented in Fig. 1. As shown in the figure, after transformation the probability density function fits the normal curve closely, indicating that the variable is suitable for use in regression models. After data partition, best subset regression models are modified and used to generate a new formula. Neter et al. (1985) used possible-subset regression with stepwise regression. The authors pointed out a limitation of the stepwise regression search approach: it presumes that there is a single "best" subset of X variables and seeks to identify it, whereas there is often no unique "best" subset. Therefore, for a large number of inputs, the best subset solution might give the most parsimonious model, provided that the model comparison criterion is sensitive to the number of inputs.

4. Polynomial best subset regression model (PBSR)

The best subset regression approach is adopted to fit and test (significance, F-test from ANOVA) all possible combinations of the input variables in a regression equation and to select the best solution. The new approach is called the polynomial best subset regression model (PBSR). The aim of the polynomial design is to investigate whether there is any nonlinear relationship between the model inputs and the output by creating the powers of the attributes up to the nth and using them in the model, while considering the correlation between them. In this study n is taken as 3. Ten non-dimensional variables are used as a starting point. These variables have been selected from the non-dimensional parameters of the five benchmark formulas from the literature. These initial parameters can be expressed as:

$$C = f\left(S,\ \frac{H}{d_{50}},\ \frac{U^{3}}{g\,H\,w},\ \frac{U_* d_{50}}{\nu},\ \frac{H S}{(G_s - 1)\,d_{50}},\ \frac{U_*}{w},\ \frac{w}{U},\ \frac{U}{\sqrt{(G_s - 1)\,g\,d_{50}}},\ \frac{w d_{50}}{\nu},\ \frac{U S}{w}\right) \tag{11}$$

All the variables are included in the analysis with their first, second and third powers; therefore 30 parameters are used in the best subset selection process. Although using the powers of the parameters as regression inputs increases the nonlinear estimation capability of the model, it also aggravates the collinearity problem in the developed models.
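The idea of expanding the inputs to their first three powers and then searching subsets while screening for collinearity can be sketched as follows. This is a simplified illustration in plain Python, not the algorithm used in the study: the exhaustive search over small subsets, the adjusted R² criterion combined with a variance inflation factor (VIF) limit of 10, the helper names and the synthetic data are all assumptions made for the example.

```python
import itertools
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def r_squared(cols, y):
    """R^2 of an ordinary least squares fit of y on cols plus an intercept."""
    X = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(X, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def vif(cols, j):
    """Variance inflation factor of column j: 1 / (1 - R_j^2)."""
    others = [c for i, c in enumerate(cols) if i != j]
    if not others:
        return 1.0
    return 1.0 / max(1.0 - r_squared(others, cols[j]), 1e-12)

def polynomial_expand(base, max_power=3):
    """Map each input to its powers 1..max_power (10 inputs give 30 columns)."""
    return {f"{name}^{p}": [v ** p for v in vals]
            for name, vals in base.items() for p in range(1, max_power + 1)}

def best_subset(features, y, max_size=3, vif_limit=10.0):
    """Exhaustive search: best adjusted R^2 among subsets passing the VIF screen."""
    n = len(y)
    best = (float("-inf"), ())
    for size in range(1, max_size + 1):
        for subset in itertools.combinations(list(features), size):
            cols = [features[s] for s in subset]
            if any(vif(cols, j) > vif_limit for j in range(size)):
                continue  # skip strongly collinear input sets
            r2 = r_squared(cols, y)
            adj = r2 - (1.0 - r2) * size / (n - size - 1)
            if adj > best[0]:
                best = (adj, subset)
    return best

# Synthetic example with a known nonlinear structure (hypothetical data)
random.seed(0)
n_obs = 200
base = {"x1": [random.uniform(0.5, 2.0) for _ in range(n_obs)],
        "x2": [random.uniform(0.5, 2.0) for _ in range(n_obs)]}
y = [1.0 + 2.0 * a + 0.5 * b ** 3 + random.gauss(0.0, 0.05)
     for a, b in zip(base["x1"], base["x2"])]
features = polynomial_expand(base)
adj_r2, chosen = best_subset(features, y)
print(chosen, round(adj_r2, 4))
```

The powers of one variable are strongly correlated with each other, so the VIF screen tends to reject subsets that contain several powers of the same input. An exhaustive search like this is only feasible for a handful of columns; with 30 candidate columns the number of subsets grows to 2^30.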

Table 1. Descriptive statistics of raw dataset.

Variable    Valid N  Mean     Median   Min.     Max.     Range    Perc. 10%  Perc. 90%  Std. Dev.  Skewness  Kurtosis
Field
Q (m3/s)    1173     1505     134.78   0.00     28826    28826    2.86       4899       3824       4.00      18.52
B (m)       1173     181.01   82.60    0.35     1109     1109     13.72      491        235        2.23      4.90
H (m)       1173     3.001    1.703    0.034    17.282   17.2     0.316      9.327      3.607      1.697     2.012
S           1173     0.685    0.720    0.006    6.690    6.684    0.042      1.500      0.685      2.156     11.59
d50 (mm)    1173     0.740    0.323    0.083    3.400    3.317    0.161      2.204      0.821      1.402     0.312
C (ppm)     1173     522.5    200.24   5.61     5830     5824     44.00      1420       847.7      3.14      11.25
Lab
Q (m3/s)    927      0.13     0.05     0.00     2.08     2.07     0.01       0.36       0.19       3.79      24.60
B (m)       927      1.12     0.91     0.31     2.44     2.13     0.49       2.44       0.64       1.08      0.04
H (m)       927      0.147    0.125    0.032    0.585    0.553    0.064      0.287      0.083      1.174     1.521
S           927      0.002    0.002    0.000    0.011    0.011    0.001      0.003      0.001      2.214     7.065
d50 (mm)    927      0.416    0.375    0.100    1.500    1.400    0.150      0.930      0.263      1.368     2.191
C (ppm)     927      875.76   249.60   2.90     12900    12897    26.00      2480       1552       3.57      16.66
Total
Q (m3/s)    2100     840.50   3.84     0.00     28826    28826    0.02       2129       2953       5.47      35.35
B (m)       2100     101.60   19.20    0.31     1109     1109     0.70       390        197        3.05      10.13
H (m)       2100     1.7408   0.3290   0.0323   17.28    17.2     0.076      6.233      3.046      2.521     5.96
S           2100     0.3832   0.0438   0.0002   6.6900   6.690    0.001      1.290      0.614      2.655     13.99
d50 (mm)    2100     0.5969   0.3435   0.0830   3.4000   3.317    0.150      2.204      0.658      2.065     3.11
C (ppm)     2100     678.44   213.05   2.90     12900    12897    33.91      1829       1223       4.04      23.21


Fig. 1. Histogram of total sediment concentration: (a) raw data, (b) transformed data.
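The effect of the logarithmic transformation and the 70/30 partition can be illustrated with a short sketch. The concentrations below are synthetic log-normal values generated only for demonstration, and the random shuffling, the seeds and the helper names are assumptions; the paper does not state how the records were assigned to the two sets.

```python
import math
import random

def skewness(values):
    """Sample skewness (third standardized moment)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def train_validation_split(records, train_fraction=0.7, seed=42):
    """Shuffle a copy of the records and cut it into training and validation parts."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

# Synthetic, strongly right-skewed concentrations (hypothetical, not the study data)
rng = random.Random(1)
c_ppm = [math.exp(rng.gauss(5.4, 1.3)) for _ in range(2100)]
ln_c = [math.log(c) for c in c_ppm]  # natural-log transform of the dependent variable

train, valid = train_validation_split(ln_c)
print(f"skewness raw = {skewness(c_ppm):.2f}, transformed = {skewness(ln_c):.2f}")
print(len(train), len(valid))
```

For data of this kind the skewness of the transformed values is close to zero, which is the behaviour the histogram comparison is meant to show.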

[Fig. 2 chart data: absolute t-values of the estimated coefficients: P = 73.36, L = 72.68, J = 18.43, R = 8.86; dashed reference line at p = 0.05.]

Fig. 2. Pareto chart of inputs, PBSR model.

The number of possible sub-models increases very rapidly as the number of effects in the whole model increases, and so does the amount of computation required to perform all-possible-subset regression. For thirty parameters, up to 2^30 (1,073,741,824) models could potentially be developed. Before the permutation process of the algorithm is initiated, a constraint is applied to reduce the number of candidate models. The well-known collinearity diagnostic, the variance inflation factor (VIF), is used to eliminate models initially (VIF > 3000). After this elimination, 53,009,101 different input combinations are tried and compared with each other. In this process, the adjusted R2 is used to select the most parsimonious and most accurate models. The selected models could, however, still contain many collinear parameters. To overcome this problem, the algorithm is modified to select the most accurate model while suppressing the selection of input parameters that cause multicollinearity. The algorithm is adapted to eliminate the input sets that include parameters with very high VIF values (> 10) by optimizing the delta parameter that is used for the sweeping operator in the computation of the inverse matrix. Different values of the sweeping operator are tried between 10 and 1. The adjusted R2 statistic is suitable for best model selection since it includes a penalty term that penalizes additional input parameters when models are compared.

Multicollinearity (also collinearity) is a phenomenon in which two or more predictor variables in a multiple regression model are highly correlated, meaning that one can be linearly predicted from the others with a substantial degree of accuracy. In this situation the coefficient estimates of the multiple regression may change erratically in response to small changes in the model or the data (Belsley et al., 1980). Assessing the magnitude of the multicollinearity problem is not easy; however, some statistical diagnostics exist. In this study the well-known collinearity diagnostic variance inflation factor (VIF) is used. The VIF of a predictor equals 1/(1 − R_j^2), where R_j^2 is the proportion of the variance in that predictor that can be explained by the other predictors. There is no common rule for the threshold values of VIF. As a rule of thumb, VIF > 10 indicates a serious multicollinearity problem (O'Brien, 2007). The output formula of the PBSR model is:

$$C = 34.45\,\frac{P^{3.239}\,J^{0.005}}{L^{0.066}\,R^{0.146}} \tag{12}$$

P, J, L and R are dimensionless numbers which are defined as:

$$\begin{cases} P = \dfrac{U}{\sqrt{(G_s - 1)\,g\,d_{50}}} \\[2mm] \ln J = (\ln S)^{3} \ \Rightarrow\ J = e^{(\ln S)^{3}} \\[2mm] \ln L = \left(\ln \dfrac{H}{d_{50}}\right)^{2} \ \Rightarrow\ L = e^{(\ln (H/d_{50}))^{2}} \\[2mm] R = \dfrac{U_* d_{50}}{\nu} \end{cases}$$

where C is the total bed-material concentration in parts per million by weight; U is the water velocity; S is the energy slope; G_s − 1 is the submerged specific gravity; d_50 is the median particle diameter; H is the water depth; U_* is the shear velocity (U_* = \sqrt{g r S}); r is the hydraulic radius; ν is the kinematic viscosity of water; and g is the gravitational acceleration. P is a kind of dimensionless particle parameter which is used in many modeling studies (e.g. Tayfur et al., 2013; Pektas, 2015).

Because the proposed formula is generated from a data-based analysis, its range of use is constrained to the limits of the dataset. The formula can therefore be used for rivers with a slope between 0.0002 and 6.69, a particle diameter d_50 between 0.083 and 3.4 mm, and a concentration between 2.9 and 12,900 ppm.

Fig. 2 presents the Pareto chart of the regression coefficients. The Pareto chart shows the absolute t-values of the estimated coefficients, so it is useful for understanding the relative importance of the parameters. As shown in the figure, the most important variables of the model are P and L. The dimensionless parameter R is the least significant parameter of the model, but it is still beyond the vertical dashed line that indicates the minimum magnitude of statistically significant parameter estimates.
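A numerical evaluation of the proposed formula may help when checking an implementation. The sketch below assumes that Eq. (12) is read with P and J in the numerator and L and R in the denominator, which is the reading that yields concentrations inside the range of the dataset; it also assumes S in m/m, natural logarithms in J and L, a wide channel (r ≈ H) when no hydraulic radius is given, and default values G_s = 2.65 and ν = 10^-6 m^2/s. The function name and the sample inputs are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)

def pbsr_concentration(U, S, H, d50, r=None, nu=1.0e-6, Gs=2.65):
    """Total bed-material concentration (ppm) from Eq. (12), under the stated assumptions."""
    r = H if r is None else r           # wide-channel assumption: hydraulic radius ~ depth
    u_star = math.sqrt(G * r * S)       # shear velocity
    ln_P = math.log(U / math.sqrt((Gs - 1.0) * G * d50))
    ln_J = math.log(S) ** 3
    ln_L = math.log(H / d50) ** 2
    ln_R = math.log(u_star * d50 / nu)
    # Work in log space to avoid overflow in the intermediate powers
    ln_C = (math.log(34.45) + 3.239 * ln_P + 0.005 * ln_J
            - 0.066 * ln_L - 0.146 * ln_R)
    return math.exp(ln_C)

# Hypothetical sand-bed reach: 1 m/s, slope 0.001, 1 m depth, 0.3 mm sand
c_example = pbsr_concentration(U=1.0, S=0.001, H=1.0, d50=0.0003)
print(f"C = {c_example:.0f} ppm")
```

Working with the logarithms of P, J, L and R is convenient because J and L are defined through exponentials of powers of logarithms and can take extreme values when computed directly.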

5. Model evaluation criteria

In the present study the correlation coefficient (r), the Nash–Sutcliffe efficiency (NSE), the logarithmic transformation variable (e) and the adjusted R2 statistic are used to compare model accuracies. Additionally, the error statistics mean absolute error (MAE) and percent bias (PBIAS) are used to examine the error perspective. Each statistic has a different perspective and focuses on one side of the evaluation. The well-known Pearson correlation coefficient (r) describes the degree of collinearity between predicted and measured data. The coefficient r, which ranges from −1 to 1, is a measure of the degree of linear relationship. If r = 0, no linear relationship exists. The relative error (RE) is the average of the absolute error divided by the absolute magnitude of the exact value.

$$RE = \frac{1}{N}\sum_{i}^{N}\left|\frac{Y_i^{obs} - Y_i^{pred}}{Y_i^{obs}}\right| \tag{13}$$

The Nash–Sutcliffe efficiency (NSE) is a normalized statistic that determines the relative magnitude of the residual variance ("noise") compared to the measured data variance ("information") (Nash and Sutcliffe, 1970). NSE indicates how well the plot of observed versus simulated data fits the 1:1 line (Moriasi et al., 2007). NSE is computed as shown in Eq. (14).

$$NSE = 1 - \left[\frac{\sum_{i}^{N}\left(Y_i^{obs} - Y_i^{pred}\right)^{2}}{\sum_{i}^{N}\left(Y_i^{obs} - Y^{mean}\right)^{2}}\right] \tag{14}$$

The logarithmic transformation variable (e) is the average of the logarithm of the ratio of predicted to observed data.

$$e = \frac{1}{N}\sum_{i}^{N}\log\frac{Y_i^{pred}}{Y_i^{obs}} \tag{15}$$

The value of e is centered on zero, is symmetrical in under- or over prediction, and is approximately normally distributed (Parker et al., 2007). If the predicted and measured data are in complete agreement, then the average of ‘‘e” will be 0. Values of Average e < 0 are indicative of under prediction; values > 0 are indicative of over prediction.
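Eqs. (13) to (15) translate directly into code. The following minimal sketch uses plain Python lists; the function names are illustrative, and the base-10 logarithm in e is an assumption, since the base is not specified above.

```python
import math

def relative_error(obs, pred):
    """Mean relative error, Eq. (13)."""
    return sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)

def nash_sutcliffe(obs, pred):
    """Nash-Sutcliffe efficiency, Eq. (14)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def log_ratio_e(obs, pred):
    """Average logarithm of the predicted/observed ratio, Eq. (15) (base 10 assumed)."""
    return sum(math.log10(p / o) for o, p in zip(obs, pred)) / len(obs)

obs = [100.0, 200.0, 400.0]
doubled = [2.0 * o for o in obs]          # systematic overprediction by a factor of two
mean_only = [sum(obs) / len(obs)] * 3     # a model that always predicts the observed mean

print(relative_error(obs, doubled), log_ratio_e(obs, doubled))
print(nash_sutcliffe(obs, obs), nash_sutcliffe(obs, mean_only))
```

A perfect prediction gives RE = 0, NSE = 1 and e = 0, while a model that always returns the observed mean gives NSE = 0, which is why negative NSE values indicate a model that is worse than the mean of the observations.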

As shown in Fig. 3, there is no clear pattern or trend in the residual-predicted charts for either the training or the validation dataset. Therefore it can be concluded that the model is specified correctly.

Fig. 4. (a) and (b) Prediction-observation scatter plots of the proposed model (validation dataset). Axes: observation (ppm) versus PBSR prediction (ppm); R² = 0.85 in both panels.

The adjusted R2 is suitable to compare the models that have different input numbers since it has a penalty term which uses the observation number (case number, N) and the input variable number (p). The use of an adjusted R2 is an attempt to take account of the phenomenon of the R2 automatically and spuriously increasing when extra explanatory variables are added to the model. The adjusted R2 is the modification of the well-known R2 by adding the penalty terms as shown in formula;

$$\text{Adjusted } R^{2} = R^{2} - (1 - R^{2})\,\frac{p}{N - p - 1} \tag{16}$$

Negative values can occur when the model contains terms that do not help to predict the response. Percent bias (PBIAS) measures the average tendency of the predicted data to be larger or smaller than their observed counterparts. The optimal value of PBIAS is 0.0, with low-magnitude values indicating accurate model prediction. Positive values indicate model underestimation bias, and negative values indicate model overestimation bias (Gupta et al., 1999). PBIAS is calculated with Eq. (17):

$$PBIAS = \frac{\sum_{i}^{N}\left(Y_i^{obs} - Y_i^{pred}\right)\times 100}{\sum_{i}^{N} Y_i^{obs}} \tag{17}$$
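The adjusted R² of Eq. (16) and the percent bias of Eq. (17) can be computed as in the sketch below. The function names and the sample numbers are illustrative assumptions and are not results of the study.

```python
def adjusted_r2(r2, n_obs, n_inputs):
    """Adjusted R^2, Eq. (16): R^2 penalized by the number of inputs p."""
    return r2 - (1.0 - r2) * n_inputs / (n_obs - n_inputs - 1)

def pbias(obs, pred):
    """Percent bias, Eq. (17); negative values indicate overestimation."""
    return 100.0 * sum(o - p for o, p in zip(obs, pred)) / sum(obs)

# The penalty is small for many observations and few inputs ...
print(adjusted_r2(0.85, n_obs=630, n_inputs=4))
# ... and can push the statistic below zero for a weak model with many inputs
print(adjusted_r2(0.0, n_obs=12, n_inputs=5))

obs = [100.0, 200.0, 400.0]
print(pbias(obs, [2.0 * o for o in obs]))   # predictions twice the observations
```

With predictions that are uniformly twice the observations, the percent bias is −100, in line with the sign convention that negative values indicate overestimation.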

6. Results and discussion

Fig. 3. Residual- Predicted scatter graph of PBSR model.

Using the model fit statistics described above, the newly proposed equation (Eq. (12)) and the benchmark model outputs are compared on the validation dataset. The restrictions of each formula are applied in the comparison. Additionally, the predictive capabilities of the models for field and lab data are investigated. The scatter plots of PBSR predictions against the observed values of the validation dataset (Fig. 4a and b) indicate that the model shows no bias towards underestimation or overestimation. In Fig. 4b the small values (< 2000 ppm) are enlarged for better visual inspection.

668

D. Okcu et al. / Journal of Hydrology 539 (2016) 662–673

remedy for this situation lies in understanding the relative weights of the lab, field and total data cases for MW. Only 19 cases in the field validation dataset satisfy the MW restrictions, whereas 112 cases in the lab dataset do, giving 131 validation cases in total. MW performs very poorly on field data, probably because Molinas and Wu (2001) derived the formula from a dataset dominated by lab observations. Because the field cases are relatively few (19/131 = 14%), the correlation coefficient of the total validation data (0.894) is hardly lowered by the poor correlation on the field validation data (0.280). In other words, because the lab cases are relatively numerous (112/131 = 86%), their correlation coefficient (0.904) dominates the overall correlation coefficient (0.89).

The other example of an apparent conflict concerns the KARIM statistics. As seen in Fig. 7(a), the KARIM formula produces two clusters of outputs (+ for field observations, o for lab observations). If the fit statistics are calculated within each cluster, both correlations are good (R2 = 0.87 for lab, R2 = 0.71 for field). When a single regression line is fitted to all the data, however, the best line has to balance the squared errors of both clusters and therefore passes between them. This discrepancy is shown clearly when double logarithmic scales are used (see Fig. 7(b), solid line).

As understood from Table 2, the field data performances of EH, AW and YANG are very poor. The NSE statistics of these three are all negative and the correlation coefficients are almost zero. Their mean relative errors are huge compared to the other models (PBSR, KARIM and MW). For field data the best two models are PBSR and KARIM. The correlation coefficients and the Adjusted R2 values of PBSR and KARIM are close to each other. Surprisingly, the NSE value of KARIM is negative and large in magnitude, which means that the model is worse than the baseline model that always predicts the observed mean. As seen in Fig. 8a, KARIM strongly overestimates most field data, but the variations of its predictions are synchronous with the observations. This is why its correlation coefficient and Adjusted R2 value are high, while NSE and some other model comparison criteria expose the problem. The PBIAS value is very large and negative (-1470), indicating that the model overestimates, and the "e" value (1.32) is positive and close to the "e" value of AW (1.41). After these detailed examinations it can be concluded that, for the field dataset, the proposed PBSR formula outperforms all the others. Even when the restrictions of each model are considered in the calculations, all benchmark models perform poorly on field data. The scatter diagrams of EH, AW and YANG are presented in Fig. 9a–c.
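The pooling effect described above for the KARIM formula can be reproduced with a few made-up numbers. The following is a minimal Python sketch, not the authors' computation: all observation and prediction values are hypothetical, and `pearson` is a helper written only for this illustration. It shows two groups that each correlate well with the observations but give a low correlation once they are merged, because one group lies far above the y = x line.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical observed concentrations (ppm), reused for both groups.
obs = [100.0, 200.0, 300.0, 400.0, 500.0]
# Hypothetical "lab-like" predictions scattered around the y = x line.
lab_pred = [110.0, 190.0, 310.0, 390.0, 520.0]
# Hypothetical "field-like" predictions: correct trend, roughly 20 times too large.
field_pred = [2100.0, 3900.0, 6200.0, 7800.0, 10300.0]

r_lab = pearson(obs, lab_pred)        # correlation within the lab-like group
r_field = pearson(obs, field_pred)    # correlation within the field-like group
r_pooled = pearson(obs + obs, lab_pred + field_pred)  # both groups merged

print(f"lab: {r_lab:.3f}  field: {r_field:.3f}  pooled: {r_pooled:.3f}")
```

In this toy case each group follows its own straight line closely, so the within-group correlations are high, while the pooled correlation is much lower because a single line cannot follow both clusters.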

The model performance statistics determined for the whole dataset and for the validation dataset are given in Table 2. As seen in the table, the EH, AW and YANG formulas perform poorly by all criteria. The large negative NSE values indicate that these three models are worse than the mean-based baseline model. The scatter plots of these models are given in Fig. 5a–c. The scale of the predictions is limited to 500,000 for visual inspection, but as understood from the Maximum Absolute Error statistics, these models produce very large and inaccurate predictions. The MW and PBSR models give high correlation coefficients, and the KARIM formula is relatively good. Dual scatter plots (Fig. 6a and b) compare the predictions of each model and of the proposed model (PBSR) against the observational data. The output of KARIM has a two-way spread like the others (EH, AW and YANG): a sharp linear spread with very high prediction values near the y axis, and a linear spread around the y = x line. According to e and PBIAS, the PBSR model tends to underestimate while MW tends to overestimate. For the mixed validation data (field + lab) the best two models are PBSR and MW. When these two are compared on all of the performance indices, the proposed PBSR model outperforms MW, which is relatively successful.

The models are next compared by their predictive capacity on the field data within the validation set. Table 3 presents the statistics determined for the field data. In the comparison only validation data are used and the specific restrictions of the models are considered. If the three tables (Tables 2–4) are examined together, the results seem inconsistent. For example, the correlation coefficient of MW changes drastically across datasets: 0.895 (Table 2, validation part of the whole dataset), 0.280 (Table 3, validation part of the field dataset) and 0.904 (Table 4, validation part of the lab dataset). Likewise, the correlation coefficient of KARIM changes sharply when the label changes between field and lab: 0.36 (Table 2, validation part of the whole dataset), 0.84 (Table 3, validation part of the field dataset) and 0.93 (Table 4, validation part of the lab dataset). It might be thought that these sharp changes are caused by the limitations of the formulas, but in the modeling phase of the study all the constraints of the formulas (models) are taken into consideration. The models are constructed using only the data cases that satisfy all the constraints of the relevant formulas. In fact, the

Table 2 Model comparison statistics for whole dataset.

| Statistic | PBSR | KARIM | MW | EH | AW | YANG |
|---|---|---|---|---|---|---|
| Validation | | | | | | |
| Correlation | 0.92 | 0.36 | 0.90 | 0.01 | 0.02 | 0.02 |
| Relative error mean | 0.54 | 14.02 | 4.59 | 2775 | 20651 | 167 |
| e | 0.01 | 0.81 | 0.53 | 1.99 | 0.88 | 1.30 |
| NSE | 0.82 | 43.76 | 0.71 | 238564 | 3976660606 | 4834 |
| PBIAS | 0.68 | 556.80 | 9.49 | 35666 | 543058 | 5795 |
| Adjusted R2 | 0.846 | 0.119 | 0.810 | 0.014 | 0.008 | 0.012 |
| Max AE | 3989 | 53607 | 1726 | 988310 | 2208065954 | 831894 |
| Median AE | 91 | 1703 | 393 | 637769 | 691 | 45673 |
| Mean AE | 287 | 4464 | 448 | 466259 | 4159439 | 66418 |
| Total | | | | | | |
| Correlation | 0.90 | 0.40 | 0.87 | 0.08 | 0.01 | 0.06 |
| Relative error mean | 0.56 | 14.38 | 4.86 | 2674 | 12981 | 170 |
| e | 0.00 | 0.83 | 0.58 | 1.97 | 0.88 | 1.31 |
| NSE | 0.75 | 63.57 | 0.55 | 329285 | 1789071853 | 6458 |
| PBIAS | 0.08 | 649.69 | 13.09 | 40550 | 304912 | 6561 |
| Adjusted R2 | 0.81 | 0.16 | 0.75 | 0.00 | 0.00 | 0.00 |
| Max AE | 4682 | 68379 | 2500 | 993358 | 2208065954 | 934573 |
| Median AE | 84 | 1704 | 401 | 607423 | 608 | 46114 |
| Mean AE | 275 | 4568 | 453 | 459752 | 2078584 | 68508 |
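Two of the statistic families listed in Table 2 can be sketched in a few lines of Python. This is an illustrative sketch only: the exact definitions used in the paper are given in a section not reproduced here, so the formulas below are assumptions. PBIAS is taken as the commonly used form 100 · Σ(obs − pred) / Σ obs, under which negative values indicate overestimation, and the absolute-error statistics are taken as the maximum, median and mean of |obs − pred|. The function names and the sample values are hypothetical.

```python
from statistics import mean, median

def pbias(obs, pred):
    """Percent bias, assumed here as 100 * sum(obs - pred) / sum(obs).

    With this sign convention a negative value means overestimation.
    """
    return 100.0 * sum(o - p for o, p in zip(obs, pred)) / sum(obs)

def absolute_error_summary(obs, pred):
    """Maximum, median and mean absolute error of the predictions."""
    errors = [abs(o - p) for o, p in zip(obs, pred)]
    return {
        "max_ae": max(errors),
        "median_ae": median(errors),
        "mean_ae": mean(errors),
    }

# Hypothetical observed and predicted concentrations (ppm).
obs = [100.0, 200.0, 300.0, 400.0]
pred = [120.0, 180.0, 330.0, 410.0]

bias = pbias(obs, pred)
summary = absolute_error_summary(obs, pred)
print(bias, summary)
```

Reporting the median next to the maximum and the mean helps to show whether a large mean absolute error comes from a few extreme predictions or from a general offset.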


[Figure: three panels (a)–(c); x axis Observation_C (ppm), y axis Prediction_C (ppm).]

Fig. 5. Scatter plots of predictions in validation dataset: a- EH model, b- AW model, c- YANG model.

[Figure: two panels; x axis Observation_C (ppm), y axis Prediction_C (ppm). Panel (a): KARIM (R² = 0.13) and PBSR (R² = 0.85). Panel (b): MW (R² = 0.82) and PBSR (R² = 0.85).]

Fig. 6. Dual scatter plots of predictions for validation dataset. a- KARIM, PBSR b- MW, PBSR.

Table 3 Model comparison statistics for Field data within dataset.

| Statistic | PBSR | KARIM | MW | EH | AW | YANG |
|---|---|---|---|---|---|---|
| Validation (Field) | | | | | | |
| Correlation | 0.86 | 0.84 | 0.28 | 0.30 | 0.02 | 0.41 |
| Relative error mean | 0.54 | 24.05 | 2.97 | 4661.66 | 37043.14 | 302.42 |
| e | 0.01 | 1.32 | 0.56 | 3.27 | 1.41 | 2.26 |
| NSE | 0.51 | 217.29 | 0.54 | 616961 | 19429835346 | 23136 |
| PBIAS | 13.31 | 1470.48 | 4.67 | 90709 | 1414486 | 14240 |
| Adjusted R2 | 0.74 | 0.71 | 0.25 | 0.07 | 0.01 | 0.15 |
| Max AE | 3989.49 | 53607.19 | 567.07 | 988310 | 2208065954 | 831894 |
| Median AE | 81.71 | 3665.14 | 453.08 | 857888 | 6350 | 95710 |
| Mean AE | 250.96 | 7756.09 | 426.42 | 784586 | 7460815 | 117597 |
| Total (Field) | | | | | | |
| Correlation | 0.84 | 0.83 | 0.15 | 0.33 | 0.02 | 0.39 |
| Relative error mean | 0.56 | 24.56 | 3.14 | 4669.82 | 23013.14 | 306.63 |
| e | 0.01 | 1.33 | 0.58 | 3.33 | 1.38 | 2.28 |
| NSE | 0.38 | 239.26 | 0.59 | 690445 | 6656825053 | 23430 |
| PBIAS | 13.71 | 1521.46 | 4.06 | 91558 | 702894 | 14433 |
| Adjusted R2 | 0.71 | 0.69 | 0.07 | 0.10 | 0.00 | 0.15 |
| Max AE | 4682 | 68379 | 586 | 993358 | 2208065954 | 934573 |
| Median AE | 83 | 3742 | 473 | 863572 | 5306 | 98149 |
| Mean AE | 269 | 7950 | 437 | 794840 | 3672649 | 121848 |
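The KARIM column of Table 3 combines a high correlation with a poor NSE. A small Python sketch can show why the two statistics may disagree. It is an illustration under stated assumptions, not the authors' calculation: NSE is taken in its standard Nash-Sutcliffe form, 1 − Σ(obs − pred)² / Σ(obs − mean(obs))², the data are hypothetical, and the factor of 20 is an arbitrary choice standing in for a strong systematic overestimation.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def nse(obs, pred):
    """Nash-Sutcliffe efficiency; values below 0 are worse than predicting the observed mean."""
    obs_mean = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - obs_mean) ** 2 for o in obs)
    return 1.0 - sse / sst

# Hypothetical observations (ppm) and a hypothetical model that follows the
# observed variation exactly but predicts values 20 times too large.
obs = [100.0, 200.0, 300.0, 400.0, 500.0]
pred = [20.0 * o for o in obs]

r = pearson(obs, pred)
efficiency = nse(obs, pred)
print(f"correlation = {r:.3f}, NSE = {efficiency:.1f}")
```

Correlation is insensitive to a constant scale factor, whereas NSE penalizes every squared deviation from the observations, so a model that is synchronous with the data but strongly biased scores well on the first and badly on the second.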

[Figure: two panels; (a) linear axes, (b) double logarithmic axes. Points labelled FIELD and LAB. Trend lines: FIELD R² = 0.7114, LAB R² = 0.868, all data R² = 0.1261.]

Fig. 7. a- Validation part scatter graph of KARIM formula, b- double logarithmic form.

[Figure: two panels; x axis Observation_C (ppm), y axis Prediction_C (ppm). Panel (a): KARIM (R² = 0.71) and PBSR (R² = 0.74). Panel (b): MW (R² = 0.08) and PBSR (R² = 0.74).]

Fig. 8. Dual scatter plots of predictions for field dataset. a- KARIM, PBSR b- MW, PBSR.

[Figure: three panels (a)–(c); x axis Observation_C (ppm), y axis Prediction_C (ppm).]

Fig. 9. Scatter plots of field data in validation dataset: a- EH model, b- AW model, c- YANG model.

As explained earlier, many studies were conducted on flume datasets, and the corresponding formulas were derived from lab data. For this reason, the performances of the models are also compared on the flume dataset within the validation partition. Table 4 presents the fit statistics determined for this partition. For the flume dataset, PBSR, KARIM and MW perform better than the others (EH, AW, YANG), and there is a large gap between these two groups of three in all comparison statistics. The scatter plots of the latter group are shown in Fig. 10a–c. Because there are very extreme predictions, the y axis is limited to a relatively small value (200,000). Judging by the correlation coefficients and Adjusted R2 values, these three models (EH, AW, YANG) seem to capture almost none of the variation in the observed lab data. This first impression may not be entirely correct, however, because these statistics (like many others) are very sensitive to extreme values. For example, after the restrictions of the EH formula are applied to the validation dataset, there remain only

[Figure: three panels (a)–(c); x axis Observation_C (ppm), y axis Prediction_C (ppm), y axis limited to 200,000.]

Fig. 10. Scatter plots of Lab data in validation dataset: a- EH model, b- AW model, c- YANG model.

Table 4 Model comparison statistics for Lab data within dataset.

| Statistic | PBSR | KARIM | MW | EH | AW | YANG |
|---|---|---|---|---|---|---|
| Validation (Lab) | | | | | | |
| Correlation | 0.95 | 0.93 | 0.90 | 0.03 | 0.07 | 0.005 |
| Relative error mean | 0.55 | 1.49 | 4.87 | 174.30 | 25.93 | 14.06 |
| e | 0.03 | 0.18 | 0.53 | 0.22 | 0.22 | 0.21 |
| NSE | 0.89 | 0.84 | 0.76 | 8866 | 561 | 229 |
| PBIAS | 9.40 | 12.03 | 12.48 | 1398 | 529 | 538 |
| Adjusted R2 | 0.90 | 0.87 | 0.81 | 0.03 | 0.01 | 0.03 |
| Max AE | 3107 | 6643 | 1726 | 953505 | 665133 | 172068 |
| Median AE | 118 | 152 | 373 | 154 | 185 | 205 |
| Mean AE | 333 | 352 | 452 | 27259 | 5708 | 8416 |
| Total (Lab) | | | | | | |
| Correlation | 0.94 | 0.94 | 0.87 | 0.02 | 0.01 | 0.02 |
| Relative error mean | 0.56 | 1.51 | 5.09 | 116.76 | 147.47 | 15.59 |
| e | 0.02 | 0.20 | 0.58 | 0.23 | 0.23 | 0.21 |
| NSE | 0.88 | 0.86 | 0.64 | 16714 | 296192 | 352 |
| PBIAS | 10.20 | 8.46 | 19.91 | 2041 | 4455 | 619 |
| Adjusted R2 | 0.89 | 0.88 | 0.75 | 0.01 | 0.01 | 0.01 |
| Max AE | 3162 | 6643 | 2500 | 986248 | 24762913 | 270496 |
| Median AE | 89 | 124 | 378 | 110 | 147 | 181 |
| Mean AE | 282 | 288 | 455 | 30403 | 39503 | 8094 |

[Figure: two panels; x axis Observation_C (ppm), y axis Prediction_C (ppm). Panel (a): KARIM (R² = 0.868) and PBSR (R² = 0.90). Panel (b): MW (R² = 0.82) and PBSR (R² = 0.90).]

Fig. 11. Dual scatter plots of predictions for lab dataset. a- KARIM, PBSR b- MW, PBSR.


154 data points, and for only 7 of them the EH model produces extremely large values. Because of this, the correlation coefficient and the Adjusted R2 value approach zero. If these 7 data points are removed from the dataset, the correlation coefficient rises from 0.03 to 0.87. Yet these 7 points fully comply with the restrictions of the formula; all attributes of these 7 cases lie within its restrictive boundaries (see Section 2.1). Similar situations exist for all methods, and reorganizing the dataset could raise some fit statistics. For this reason, it is advisable to use several different fit statistics together, which may give relatively unbiased comparisons. A further inference from this kind of lack of fit is that data-based models depend strongly on the dataset from which they are generated. If possible, an ensemble-type modeling technique such as v-fold cross validation should be used to obtain more robust models. For most hydrological modeling studies this solution is not applicable, because it requires proper data mining tools and detailed knowledge of data mining. An alternative but less effective way is to increase, where possible, the range of the dataset and the number of data points in order to improve generalizability (see Table 1). As seen in Table 1, the ranges of river width, depth and discharge are very wide.

The performance statistics of PBSR, KARIM and MW are close to each other. For all fit statistics of Table 4, the ranking is PBSR, KARIM and MW. The dual scatter graphs of the models are given in Fig. 11. As seen from the figures, the KARIM and MW formulas deviate strongly for large values of sediment concentration.
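The sensitivity of the correlation coefficient to a handful of extreme predictions, as described for the EH formula, can be illustrated with synthetic numbers. The sketch below is hypothetical throughout: it builds 154 made-up observations, a made-up model that is within ±10% of them, and then replaces 7 predictions with an arbitrary extreme value of 900,000. Only the counts 154 and 7 are taken from the text; the resulting correlations are not meant to reproduce the reported 0.03 and 0.87.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

n_points = 154
# Hypothetical observations (ppm): evenly spaced from 50 to 7700.
obs = [50.0 + 50.0 * i for i in range(n_points)]
# Hypothetical model: alternately 10% too low and 10% too high.
pred = [o * (1.1 if i % 2 else 0.9) for i, o in enumerate(obs)]

# Replace 7 predictions, spread over the range, with an extreme value.
outlier_idx = {3, 25, 47, 69, 91, 113, 135}
pred_with_outliers = [900000.0 if i in outlier_idx else p
                      for i, p in enumerate(pred)]

# Correlation with all 154 points, then without the 7 extreme ones.
r_all = pearson(obs, pred_with_outliers)
keep = [i for i in range(n_points) if i not in outlier_idx]
r_clean = pearson([obs[i] for i in keep],
                  [pred_with_outliers[i] for i in keep])

print(f"all points: {r_all:.3f}, without the 7 extremes: {r_clean:.3f}")
```

Fewer than 5% of the points are enough to push the correlation towards zero here, which is why a single statistic should not be read in isolation.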

7. Conclusion

In this study, 10 non-dimensional sediment transport parameters are used together with their second and third powers to capture nonlinear relationships. Because the additional inputs are generated as powers of the same parameters, the input dataset is highly inter-correlated. The best subset regression model is therefore modified to overcome the multicollinearity problem, and a new sediment transport formula (Eq. (12)), suitable for both river and flume data, is obtained after a trial-and-error process covering approximately six hundred million models. The new formula is compared with five well-known sediment formulae from the literature and outperforms them in terms of model fit statistics. While some models are relatively good for a partial dataset (flume or field), all benchmark models perform poorly on the dataset that includes both field and flume data. The present study indicates that using a mixed dataset (flume + field), and increasing the range of the dataset and the number of data points, yields a better equation that can predict sediment accurately. The parameters Particle Reynolds number (R), slope of the stream (S), ratio of river depth to sediment particle size (L), relative roughness (H/d50) and Particle parameter (P) have greater influence when predicting the sediment rate. The formula can be used for rivers with a slope between 0.0002 and 6.69, a particle diameter d50 of 0.083–3.4 mm and a concentration of 2.9–12,900 ppm. In further work, the proposed formula could be tested on a wider-ranging dataset and modified accordingly. The methodology of this study (modification of best subset regression) could be used in further studies to generate new formulas, especially when the input dataset is highly inter-correlated.

Acknowledgments

This study is based on the PhD thesis, 'Developing A New Total Sediment Transport Formula By Using River Parameters', of the first author under the supervision of the third author at Istanbul Technical University (ITU).

Appendix A

See Table A.

Table A Compiled data and reference studies.

| No. | River Code | Place | Number of Data | Reference |
|---|---|---|---|---|
| 1 | ACP | Field | 112 | Data of Mahmood (Brownlie (1981b)) |
| 2 | AMC | Field | 8 | American Canal Data of Simons (1957) |
| 3 | ATC | Field | 61 | Atchafalaya River Data of Toffaleti (1968) |
| 4 | CHP | Field | 27 | India Canal Data of Chaudry (Brownlie (1981b)) |
| 5 | COL | Field | 93 | Colorado River Data of the U.S. Bureau of Reclamation (Brownlie (1981b)) |
| 6 | HII | Field | 24 | Hii River Data of Shinohara and Tsubaki (Brownlie (1981b)) |
| 7 | MID | Field | 37 | Middle Loup River Data of Hubbell and Matejka (1959) |
| 8 | MIS | Field | 160 | Mississippi River Data of Toffaleti (Brownlie (1981b)) |
| 9 | MOU | Field | 84 | Mountain Creek Data of Einstein (1944) |
| 10 | NED | Field | 49 | Rio Magdalena and Canal del Dique Data of NEDECO (Brownlie (1981b)) |
| 11 | NIO | Field | 36 | Niobrara River Data of Colby and Hembree (1955) |
| 12 | POR | Field | 219 | Portugal River Data of Da Cunha (Brownlie (1981b)) |
| 13 | RED | Field | 24 | Red River Data of Toffaleti (Brownlie (1981b)) |
| 14 | RGC | Field | 6 | Rio Grande Conveyance Channel Data of Culbertson (Brownlie (1981b)) |
| 15 | RGR | Field | 201 | Rio Grande Data of Nordin and Beverage (Brownlie (1981b)) |
| 16 | RIO | Field | 31 | Rio Grande Near Bernalillo Data Given by Toffaleti (Brownlie (1981b)) |
| 17 | TRI | Field | 1 | Trinity River Data of Knott (Brownlie (1981b)) |
| 1 | BAL | Lab | 14 | Barton and Lin (1955) |
| 2 | BEN | Lab | 2 | Government of West Bengal (Brownlie (1981b)) |
| 3 | CHY | Lab | 7 | Chyn (Brownlie (1981b)) |
| 4 | COS | Lab | 19 | Costello (1974) |
| 5 | DAV | Lab | 67 | Davies (1971) |
| 6 | EPA | Lab | 14 | E. Pakistan Water and Power (Brownlie (1981b)) |
| 7 | EPB | Lab | 9 | Gov. of E. Pakistan (Brownlie (1981b)) |
| 8 | FRA | Lab | 11 | Franco (Brownlie (1981b)) |
| 9 | GIL | Lab | 35 | Gilbert (Brownlie (1981b)) |
| 10 | GKA | Lab | 23 | Gilbert (Brownlie (1981b)) |
| 11 | GKB | Lab | 15 | Gilbert (Brownlie (1981b)) |
| 12 | GUY | Lab | 125 | Guy et al. (1966) |
| 13 | JOR | Lab | 7 | Jorissen (1938) |
| 14 | KEN | Lab | 4 | Kennedy (Brownlie (1981b)) |
| 15 | KNB | Lab | 4 | Kennedy and Brooks (1965) |
| 16 | LAU | Lab | 5 | Laursen (1958) |
| 17 | MCD | Lab | 8 | MacDougall (1933) |
| 18 | MPR | Lab | 16 | Meyer-Peter and Muller (Brownlie (1981b)) |
| 19 | MUT | Lab | 11 | Mutter (1971) |
| 20 | NOR | Lab | 23 | Nordin (1976) |
| 21 | OBR | Lab | 33 | O'Brien (Brownlie (1981b)) |
| 22 | OJK | Lab | 13 | Onishi (Brownlie (1981b)) |
| 23 | PRA | Lab | 28 | Pratt (Brownlie (1981b)) |
| 24 | SIN | Lab | 58 | Singh (1960) |
| 25 | STE | Lab | 17 | Stein (1965) |
| 26 | STR | Lab | 12 | Straub (Brownlie (1981b)) |
| 27 | TAY | Lab | 9 | Taylor (1971) |
| 28 | VAB | Lab | 7 | Vanoni and Brooks (1957) |
| 29 | VAH | Lab | 5 | Vanoni and Hwang (Brownlie (1981b)) |
| 30 | WLM | Lab | 4 | Williams (Brownlie (1981b)) |
| 31 | WLS | Lab | 65 | Willis et al. (1972) |
| 32 | WSA | Lab | 197 | US Waterways Exp. Sta. (Brownlie (1981b)) |
| 33 | WSB | Lab | 36 | US Waterways Exp. Sta. (Brownlie (1981b)) |
| 34 | WSS | Lab | 13 | US Waterways Exp. Sta. (Brownlie (1981b)) |
| 35 | ZNA | Lab | 11 | Znamenskaya (1963) |


References

Ackers, P., White, W.R., 1973. Sediment transport: new approach and analysis. ASCE J. Hydraul. Eng. 99 (11), 2041–2060.
Alonso, C.V., 1980. Selecting a formula to estimate sediment transport capacity in non-vegetated channels. CREAMS: A Field-Scale Model for Chemicals, Runoff, and Erosion from Agricultural Management Systems, Conservation Research Report Number 25, May 1980, pp. 426–439, 3 Tab, 39 Ref, 1 Append.
Barton, J.R., Lin, P.N., 1955. A study of the sediment transport in alluvial streams. Civil Engineering Department Report.
Belsley, D.A., Kuh, E., Welsch, R.E., 1980. Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. Wiley, New York, ISBN 0-471-05856-4.
Brownlie, W.R., 1981b. Compilation of Alluvial Channel Data: Laboratory and Field. Report No. KH-R-43B, W.M. Keck Laboratory of Hydraulics and Water Resources, Division of Engineering and Applied Science, California Institute of Technology, Pasadena, California, 309pp.
Brownlie, W.R., 1981. Prediction of flow depth and sediment discharge in open channels. Rep. No. KH-R-43B, Lab. of Hydraulic Research, California Institute of Technology, Pasadena, Calif.
Cigizoğlu, H.K., 2002. Suspended sediment estimation for rivers using artificial neural networks and sediment rating curves. Turk. J. Eng. Environ. Sci. 26 (1), 27–36.
Choi, S.U., Lee, J., 2015. Prediction of total sediment load in sand-bed rivers in Korea using lateral distribution method. J. Am. Water Resour. Assoc. 51 (1), 214–225.
Colby, B.R., Hembree, C.H., 1955. Computations of total sediment discharge, Niobrara River near Cody, Nebraska (No. 1357). US Geological Survey; for sale by USGPO.
Costello, W.R., 1974. Development of bed configurations in coarse sand: Experimental Sedimentology Laboratory, Dept. of Earth and Planetary Sciences.
Damgaard, J.S., Whitehouse, R.J., Soulsby, R.L., 1997. Bed-load sediment transport on steep longitudinal slopes. J. Hydraul. Eng. 123 (12), 1130–1138.
Davies, T.R., 1971. Summary of Experimental Data for Flume Tests Over Fine Sand. University of Southampton, Department of Civil Engineering.
Dogan, E., Tripathi, S., Lyn, D.A., Govindaraju, R.S., 2009. From flumes to rivers: can sediment transport in natural alluvial channels be predicted from observations at the laboratory scale? Water Resources Research, vol. 114, W08433, doi: http://dx.doi.org/10.1029/2008WR007637.
Engelund, F., Hansen, E., 1967. A Monograph on Sediment Transport in Alluvial Streams. Teknisk Forlag Technical Press, Copenhagen, Denmark, p. 62.
Einstein, H.A., 1944. Bed-load transportation in Mountain Creek.
Einstein, H.A., 1950. The Bedload Function for Sediment Transportation in Open Channel Flows. In: Soil Conservation Service Technical Bulletin, No. 1026. U.S. Department of Agriculture.
García, M.H., Laursen, E.M., Michel, C., Buffington, J.M., 2000. The legend of AF Shields. J. Hydraul. Eng. 126 (9), 718–723.
García, M., 2008. Sediment Transport and Morphodynamics. Sedimentation Engineering, pp. 21–163. doi: http://dx.doi.org/10.1061/9780784408148.ch02.
Gupta, H.V., Sorooshian, S., Yapo, P.O., 1999. Status of automatic calibration for hydrologic models: comparison with multilevel expert calibration. J. Hydrol. Eng. 4 (2), 135–143.
Guy, H.P., Simons, D.B., Richardson, E.V., 1966. Summary of alluvial channel data from flume experiments, 1956–1961 (No. 462-I).
Howard, A.J., Bonell, M., Gilmour, D., Cassells, D., 2010. Is rainfall intensity significant in the rainfall–runoff process within tropical rainforests of northeast Queensland? The Hewlett regression analyses revisited. Hydrol. Process. 24 (18), 2520–2537.
Hubbell, D.W., Matejka, D.Q., 1959. Investigations of Sediment Transportation, Middle Loup River at Dunning, Nebraska: With Application of Data from Turbulence Flume (No. 1476). Geological Survey (US).
Jorissen, A.L., 1938. Étude expérimentale du transport solide des cours d'eau. Rev. Univ. Mines 14, 269–282.
Karim, M.F., Kennedy, J.F., 1990. Menu of coupled velocity and sediment-discharge relations for rivers. J. Hydraul. Eng. 116 (8), 978–996.
Karim, M.F., Kennedy, J.F., 1983. Computer-based Predictors for Sediment Discharge and Friction Factor of Alluvial Streams. University of Iowa, Report No, Iowa Institute of Hydraulic Research, p. 242.
Karim, F., 1998. Bed material discharge prediction for nonuniform bed sediments. J. Hydraul. Eng. 124 (6), 597–604.
Kennedy, J.F., Brooks, N.H., 1965. Laboratory study of an alluvial stream at constant discharge. In: Proc. Federal Inter-Agency Sedimentation Conf., pp. 320–330.
Kisi, O., Cigizoglu, H.K., 2007. Comparison of different ANN techniques in river flow prediction. Civil Eng. Environ. Syst. 24 (3), 211–231.
Lacombe, G., Douangsavanh, S., Vogel, R.M., McCartney, M., Chemin, Y., Rebelo, L.M., Sotoukee, T., 2014. Multivariate power-law models for streamflow prediction in the Mekong Basin. J. Hydrol.: Reg. Stud. 2, 35–48.
Laursen, E.M., 1958. The total sediment load of streams. J. Hydraul. Div. 84 (1), 1–36.


Loomis, S.E., Russell, J.M., Ladd, B., Street-Perrott, F.A., Damste, J.S.S., 2012. Calibration and application of the branched GDGT temperature proxy on East African lake sediments. Earth Planet. Sci. Lett. 357, 277–288.
Lopez, F., Garcia, M.H., 2001. Risk of sediment erosion and suspension in turbulent flows. J. Hydraul. Eng. 127 (3), 231–235.
MacDougall, C.H., 1933. Bed-sediment transportation in open channels. Eos, Trans. Am. Geophys. Union 14 (1), 491–495.
Molinas, A., Wu, B., 2001. Transport of sediment in large sand-bed rivers. J. Hydraul. Res. 39 (2), 135–146.
Moriasi, D.N., Arnold, J.G., Van Liew, M.W., Bingner, R.L., Harmel, R.D., Veith, T.L., 2007. Model evaluation guidelines for systematic quantification of accuracy in watershed simulations. Trans. ASABE 50 (3), 885–900.
Mutter, D.G., 1971. A flume study of alluvial bed configurations (Doctoral dissertation, Department of Civil Engineering, University of Alberta).
Nakato, T., 1990. Test of selected sediment-transport formulas. J. Hydraul. Eng. 116 (3), 362–379.
Nash, J., Sutcliffe, J.V., 1970. River flow forecasting through conceptual models part I - A discussion of principles. J. Hydrol. 10 (3), 282–290.
Neter, J., Wasserman, W., Kutner, M.H., 1985. Applied Linear Statistical Models, second ed. Richard D. Irwin Inc., Homewood, IL.
Neter, J., Wasserman, W., Kutner, M.H., 1989. Applied linear regression models.
Niño, Y., García, M., 1998. Using Lagrangian particle saltation observations for bedload sediment transport modelling. Hydrol. Process. 12 (8), 1197–1218.
Nordin, C.F., 1976. Flume studies with fine and coarse sands (No. 76–762). US Geological Survey.
O'Brien, R.M., 2007. A caution regarding rules of thumb for variance inflation factors. Qual. Quant. 41, 673–690.
Parker, R., Arnold, J.G., Barrett, M., Burns, L., Carrubba, L., Neitsch, S.L., Srinivasan, R., 2007. Evaluation of three watershed-scale pesticide environmental transport and fate models. J. Am. Water Res. Assoc. 43 (6), 1424–1443.
Pektaş, A.O., 2015. Determining the essential parameters of bed load and suspended sediment load. Int. J. Global Warming 8 (3), 335–359.
Pektaş, A.O., Doğan, E., 2013. Prediction of bed load via suspended sediment load using soft computing methods.
Simons, D.B., 1957. Theory and design of stable channels in alluvial materials. CER, pp. 57–17.
Singh, B.R., 1960. Study of critical velocity of stick-slip sliding. J. Eng. Ind. 82 (4), 393–398.
Sinnakaudan, S., Ab Ghani, A., Ahmad, M.S.S., Zakaria, N.A., 2006. Multiple linear regression model for total bed material load prediction. J. Hydraul. Eng. 132 (5), 521–528.
Sinnakaudan, S.K., Sulaiman, M.S., Teoh, S.H., 2010. Total bed material load equation for high gradient rivers. J. Hydro-environ. Res. 4 (3), 243–251.
Smart, G.M., 1984. Sediment transport formula for steep channels. J. Hydraul. Eng.
Stein, R.A., 1965. Laboratory studies of total load and apparent bed load. J. Geophys. Res. 70 (8), 1831–1842.
Tayfur, G., Karimi, Y., Singh, V.P., 2013. Principal component analysis in conjunction with data driven methods for sediment load prediction. Water Resour. Manage. 27, 2541–2554. http://dx.doi.org/10.1007/s11269-013-0302-7.
Taylor, B.D., 1971. Temperature effects in alluvial streams.
Toffaleti, F.B., 1968. A Procedure for Computation of the Total River Sand Discharge and Detailed Distribution, Bed to Surface (No. TR-5). Committee on Channel Stabilization (Army).
Toffaleti, F.B., 1969. Definitive computations of sand discharge in rivers. J. Hydraul. Div., ASCE 95 (HY1), 225–246.
Van Rijn, L.C., 1984. Sediment transport. Part I: Bed load transport. J. Hydraul. Eng. 110 (10), 1431–1456.
Vanoni, V.A., Brooks, N.H., 1957. Laboratory studies of the roughness and suspended load of alluvial streams.
Willis, J.C., Coleman, N.L., Ellis, W.M., 1972. Laboratory study of transport of fine sand. J. Hydraul. Div. 98 (3), 489–501.
Woo, H., Yoo, K., 1991. Discussion on "test of selected sediment-transport formulas". J. Hydraul. Eng., ASCE 117 (9), 1233–1234.
Wu, W., Wang, S.S., Jia, Y., 2000. Nonuniform sediment transport in alluvial rivers. J. Hydraul. Res. 38 (6), 427–434.
Wu, W., Wang, S.S.Y., 2003. Selection and Evaluation of Nonuniform Sediment Transport Formulas for River Modeling. The XXX IAHR Congress, Thessaloniki, Greece.
Yang, C.T., 1973. Incipient motion and sediment transport. J. Hydraul. Div., ASCE 99 (HY10), 1679–1704.
Yang, C.T., 1979. Unit stream power equations for total load. J. Hydrol. 40 (1), 123–138.
Yang, C.T., Marsooli, R., Aalami, M.T., 2009. Evaluation of total load sediment transport using ANN. Int. J. Sedim. Res. 24 (3), 274–286.
Znamenskaya, N.S., 1963. Experimental study of the dune movement of sediment. Trans. State Hydrol. Inst. (Trudy GGI) 108.