Modelling of soil permeability using different data driven algorithms based on physical properties of soil




Journal Pre-proofs
Research papers
Vijay Kumar Singh, Devendra Kumar, P.S. Kashyap, Pramod Kumar Singh, Akhilesh Kumar, Sudhir Kumar Singh

PII: S0022-1694(19)30958-8
DOI: https://doi.org/10.1016/j.jhydrol.2019.124223
Reference: HYDROL 124223

To appear in: Journal of Hydrology

Received Date: 20 February 2019
Revised Date: 5 October 2019
Accepted Date: 9 October 2019

Please cite this article as: Singh, V.K., Kumar, D., Kashyap, P.S., Singh, P.K., Kumar, A., Singh, S.K., Modelling of soil permeability using different data driven algorithms based on physical properties of soil, Journal of Hydrology (2019), doi: https://doi.org/10.1016/j.jhydrol.2019.124223

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2019 Published by Elsevier B.V.

Vijay Kumar Singh1 (Research Scholar), Devendra Kumar2 (Professor), P.S. Kashyap3 (Professor), Pramod Kumar Singh4 (Professor), Akhilesh Kumar5 (Professor), Sudhir Kumar Singh6 (Assistant Professor)

1,2,3,5 Department of Soil and Water Conservation Engineering, Govind Ballabh Pant University of Agriculture and Technology, Pantnagar, Uttarakhand, India – 263145
4 Department of Irrigation and Drainage Engineering, Govind Ballabh Pant University of Agriculture and Technology, Pantnagar, Uttarakhand, India – 263145
6 K. Banerjee Centre of Atmospheric and Ocean Studies, University of Allahabad, Prayagraj-211002, India

Abstract: Soil permeability is an important parameter for assessing infiltration, runoff, groundwater and drainage, and for the design of structures. In this research, five data driven algorithms, namely Multilayer Perceptron (MLP), Co-Active Neuro-Fuzzy Inference System (CANFIS), Support Vector Machine (SVM), Decision Tree (DT) and Random Forest (RF), and their wavelet-coupled versions (W-MLP, W-CANFIS, W-SVM, W-DT and W-RF) were used to predict soil permeability from the physical properties of soil. The most reliable input vectors were identified with the Gamma Test (GT), which selected sand, silt, clay and organic content (OC) as inputs. The potential of the data driven algorithms was evaluated with several statistical indices during model development and validation. The wavelet-based algorithms (W-MLP, W-CANFIS, W-SVM, W-DT and W-RF) simulated soil permeability better than the non-wavelet algorithms (MLP, CANFIS, SVM, DT and RF), and among all wavelet and non-wavelet algorithms the W-RF algorithm had the highest accuracy and efficiency. Sensitivity analysis gave the order clay > silt > sand > OC > bulk density (BD) > particle density (PD) for the influence of each parameter on soil permeability predicted by the data driven algorithms.
Keywords: Soil permeability, Sensitivity, Wavelet, MLP, CANFIS, SVM, Decision Tree, Random Forest

1. Introduction
Permeability is one of the most important engineering properties of soils and is used to understand infiltration, runoff, drainage and settlement processes (Mishra et al., 2010). The management of water resources, drinking water supply, groundwater storage and surface water storage is also strongly affected by the seepage properties of soil. Seepage is directly related to the permeability (K) of the soil medium, as described by Darcy's law (Darcy, 1856). Permeability is a basic soil property describing the hydraulic behaviour of unsaturated soils, and it depends on the properties of both water and soil (Dong et al., 2018). The permeability of different soil types, such as silty, clayey, loamy and sandy soils, varies widely, and it is difficult to determine because of variations in mineral composition and the complex texture of soil (Dong et al., 2018). Permeability is arguably the most important hydrogeological parameter, as it controls groundwater flow and the migration of contaminants below the ground surface, particularly in aquifer systems and soils (Li, 2014). Its variability creates problems in the construction of drainage structures, canals, check dams and other civil and soil and water conservation structures. Soil permeability may be determined in the field or in the laboratory; however, field measurement is complex, generally expensive, labour-intensive and time-consuming (Vienken and Dietrich, 2011; Fereshte, 2014). A number of methods have been developed to estimate soil permeability (McKinlay and Safiullah, 1980; Lapierre et al., 1990; Vereecken et al., 1990; Alyamani and Sen, 1993; Fredlund et al., 1994; Pape et al., 1999; Nelson, 2005; Shahnazari and Vahabikashi, 2011), but these methods require complex parameters, rely on restrictive assumptions and have limited accuracy. Therefore, indirect techniques have been developed that simulate soil permeability from basic soil properties, such as soil texture (sand, silt and clay), particle density, bulk density and total carbon content, using data driven algorithms (Akbulut, 2005; Donohue and Wensrich, 2008; Ghanbarian-Alavijeh et al., 2010; Nosrati et al., 2012; Emami et al., 2012; Sihag et al., 2017; Sihag, 2018). Data driven algorithms need fewer restrictive assumptions and have shown higher simulation accuracy than other models (Chapuis, 2012). This motivated the authors to predict soil permeability using data driven algorithms.
In recent studies, researchers have reported applications of data driven algorithms such as Multilayer Perceptron (MLP), Co-Active Neuro-Fuzzy Inference System (CANFIS), Support Vector Machine (SVM), Decision Tree (DT), Self-Organizing Map (SOM) and Random Forest (RF) in fields such as rainfall prediction (Singh et al., 2015; 2017a), runoff prediction (Noori et al., 2011; Rezaeianzadeh et al., 2013; Singh, 2017; Singh et al., 2016b; 2017b; Kumar et al., 2019), evaporation estimation (Saran et al., 2017; Kisi and Alizamir, 2018; Rezaie-Balf et al., 2018; Shiri, 2018), sediment prediction (Mirbagheri et al., 2010; Kisi and Shiri, 2012; Singh et al., 2016c; 2018; Yaseen and Kisi, 2018; Kumar et al., 2019) and soil temperature estimation (Kisi et al., 2015; 2017; Singh et al., 2018). After reviewing the potential of these algorithms, the authors decided to apply the MLP, SVM, CANFIS, RF and DT algorithms, and their wavelet-transformed versions, to predict soil permeability. Some basic questions arise when pre-processing the input vectors: how many inputs should be used, and which input-output relationship gives the best model? The gamma test (GT) identifies meaningful relationships between input and output vectors for non-linear modelling before a model is developed; the input-output combination with the lowest gamma statistic and standard error (SE) is selected (Noori et al., 2012). General details of the gamma test are given by Remesan et al. (2008) and Lafdani et al. (2013), and it is one of the most popular techniques for identifying the governing input-output relationship (Singh et al., 2018a; b). Inputs that are weakly related to the output, and large numbers of inputs, increase model complexity and the risk of overfitting (Kumar et al., 2019). The gamma test was therefore applied to select the governing input vectors, reducing model complexity and the chance of overfitting.
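A minimal sketch of the gamma test, written in Python with NumPy, is given here to illustrate the input-selection procedure. It follows the usual formulation of the test: for each sample, the k-th nearest neighbours in input space (k = 1, …, p) give a mean squared input distance δ(k) and half the mean squared output difference γ(k), and the intercept Γ of the straight line fitted to these points estimates the variance of the output that no smooth function of the inputs can explain. The number of neighbours (`p_max = 10`), the definition of SE as the standard error of the fitted intercept, and the helper names `gamma_test` and `apply_mask` are assumptions for illustration, not the software used by the authors. A mask string such as `111111` marks which inputs (for example sand, silt, clay, PD, BD and OC) are included.

```python
import numpy as np


def apply_mask(X, mask):
    """Keep the columns of X whose character in `mask` is '1' (e.g. '111101')."""
    keep = np.array([c == "1" for c in mask])
    return np.asarray(X, dtype=float)[:, keep]


def gamma_test(X, y, p_max=10):
    """Return (gamma_stat, se, v_ratio) for output y given the input columns of X.

    gamma_stat is the intercept of the line fitted to (delta(k), gamma(k)),
    k = 1..p_max. se is taken here as the standard error of that intercept
    (an assumed definition); v_ratio = gamma_stat / Var(y).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Squared Euclidean distances between all pairs of input vectors
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)               # a point is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :p_max]     # k-th nearest neighbours, k = 1..p_max
    rows = np.arange(n)[:, None]
    delta = d2[rows, nn].mean(axis=0)                         # delta(k)
    gamma = 0.5 * ((y[nn] - y[:, None]) ** 2).mean(axis=0)    # gamma(k)
    # Least-squares line gamma = slope * delta + gamma_stat
    slope, gamma_stat = np.polyfit(delta, gamma, 1)
    resid = gamma - (slope * delta + gamma_stat)
    s2 = (resid ** 2).sum() / (p_max - 2)
    se = np.sqrt(s2 * (1.0 / p_max
                       + delta.mean() ** 2 / ((delta - delta.mean()) ** 2).sum()))
    return gamma_stat, se, gamma_stat / y.var()
```

Inputs measured in different units (percentages, g/cm³) would normally be scaled to comparable ranges before the distances are computed; comparing Γ and SE across masks then corresponds to the masking comparison reported for the gamma test.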
The objectives of the present study were: (a) to select a specific input-output vector relationship based on the gamma test; (b) to develop techniques for predicting soil permeability from soil texture (sand, silt and clay), particle density, bulk density and total carbon content using different data driven algorithms; and (c) to rank the input vectors for soil permeability prediction through sensitivity analysis.

2. Material and methods
2.1 Field work and lab experiments
A total of 180 soil samples were collected with an auger from a depth of 0-30 cm at different locations in the study area. Physicochemical properties, including grain size, density, permeability and total organic content (TOC), were determined in the Irrigation and Drainage Engineering and Soil and Water Conservation Engineering Laboratories, G. B. Pant University of Agriculture and Technology, Pantnagar, India. Particle size was analysed by the hydrometer method (Bouyoucos, 1962) in the Soil and Water Conservation Engineering laboratory. Total organic content was evaluated by chemical titration (Walkley and Black, 1934) in the water quality laboratory of the Department of Irrigation and Drainage Engineering. Particle density was determined by the liquid pycnometer method, with water as the liquid, and bulk density by the core (volumetric cylinder) method in the Soil and Water Conservation Engineering laboratory, both following the standard procedures of the laboratory manual (Singh, 2001). Disturbed soil samples were used to determine soil permeability. Details of the experimental data (texture, bulk density, particle density, organic carbon and permeability) are presented in Table 1. Soil permeability (K) was determined by the constant head permeability test, using the apparatus shown in Fig. 1, and the coefficient of permeability was computed as

$$K = \frac{Q \times L}{A \, H \, T} \qquad (1)$$

where K = soil permeability (cm/sec), Q = volume of water discharged through the sample during time T (cm³), A = cross-sectional area of the soil sample (cm²), L = length of the soil sample (cm), T = time (sec) and H = head difference (cm).
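Eq. (1) can be checked with a short calculation. The sketch below takes Q as the volume of water collected during time T, so that K comes out in cm/sec, and converts the result to cm/hr, the unit used in the results. The function names and the readings are hypothetical illustrative values, not data from this study.

```python
def constant_head_k(volume_cm3, length_cm, area_cm2, head_cm, time_s):
    """Coefficient of permeability K (cm/sec) from Eq. (1): K = Q L / (A H T)."""
    return volume_cm3 * length_cm / (area_cm2 * head_cm * time_s)


def cm_per_s_to_cm_per_hr(k_cm_s):
    """Convert a permeability from cm/sec to cm/hr (3600 s per hour)."""
    return k_cm_s * 3600.0


# Illustrative readings only: 500 cm³ collected in 600 s through a 10 cm long
# sample of 50 cm² cross-section under a 20 cm constant head.
k = constant_head_k(500.0, 10.0, 50.0, 20.0, 600.0)
print(f"K = {k:.5f} cm/sec = {cm_per_s_to_cm_per_hr(k):.1f} cm/hr")
```

For these readings the script prints `K = 0.00833 cm/sec = 30.0 cm/hr`, which lies above the 1.03 to 18.39 cm/hr range observed in this study simply because the numbers are invented.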

Fig: 1 Constant head permeability test diagram
Table: 1 Statistical information of data set

2.2 Data driven algorithms
2.2.1 MLP algorithm
The MLP is a supervised learning method whose ability to approximate complex relationships between input and output depends on several hyper-parameters (Ayyadevara, 2018). The hyper-parameters considered during MLP development were the learning rate, momentum, number of hidden layers, number of iterations and number of neurons. The MLP networks consisted of three layers: input, hidden and output. The input layer holds the independent variables used to simulate the output layer (Kumar et al., 2019), and the hidden layer transforms the input variables into a simulated output. The activation (transfer) function converts the input signal of a neuron into its output signal. In this study, the hyperbolic tangent transfer function was used because it is marginally faster and performs better than the logistic sigmoid transfer function (Maier and Dandy, 1998). The learning algorithm adjusts the weights and biases to minimize the overall error of the MLP (Kisi, 2007), and the Levenberg–Marquardt algorithm is highly effective at minimizing this error (Rezaeian et al., 2010). The hyper-parameters used to develop an efficient model were a learning rate of 0.2, momentum of 0.1, 1000 iterations, one hidden layer and 2 to 40 neurons (Table 2). Further background on the MLP is given by Kisi (2007), Rezaeian et al. (2010) and Singh et al. (2016a). The MLP and CANFIS models were developed using NeuroSolutions 5.0.
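To make the role of these hyper-parameters concrete, the sketch below trains a one-hidden-layer MLP with tanh hidden units and a linear output by full-batch gradient descent with momentum, using the learning rate (0.2), momentum (0.1) and number of iterations (1000) listed above. It is a NumPy illustration only, not the NeuroSolutions implementation used in the study; the weight initialisation, the plain gradient-descent update (instead of Levenberg–Marquardt) and the function name `train_mlp` are assumptions.

```python
import numpy as np


def train_mlp(X, y, n_hidden=8, lr=0.2, momentum=0.1, epochs=1000, seed=0):
    """Train a one-hidden-layer MLP (tanh hidden units, linear output).

    Full-batch gradient descent with momentum on 0.5 * mean squared error.
    X (n_samples, n_inputs) and y (n_samples,) are assumed to be pre-scaled,
    e.g. to [-1, 1]. Returns a predict(X_new) function.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, n_hidden))
    b1 = rng.uniform(-1.0, 1.0, size=n_hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=(n_hidden, 1))
    b2 = np.zeros(1)
    params = [W1, b1, W2, b2]
    velocity = [np.zeros_like(p) for p in params]
    t = np.asarray(y, dtype=float).reshape(-1, 1)

    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)              # hidden layer outputs
        err = H @ W2 + b2 - t                 # linear output minus target
        dH = (err @ W2.T) * (1.0 - H ** 2)    # back-propagate through tanh
        grads = [X.T @ dH / n, dH.mean(axis=0), H.T @ err / n, err.mean(axis=0)]
        for p, v, g in zip(params, velocity, grads):
            v *= momentum                     # keep a fraction of the previous step
            v -= lr * g                       # add the new gradient step
            p += v                            # update weights/biases in place

    def predict(X_new):
        return (np.tanh(X_new @ W1 + b1) @ W2 + b2).ravel()

    return predict
```

In a search like the one described above, `n_hidden` would be varied over the 2 to 40 range and the network with the lowest validation error retained.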


2.2.2 CANFIS algorithm
The CANFIS algorithm combines a fuzzy inference system with an artificial neural network and is fast and accurate (Zareabyaneh et al., 2016). Tsukamoto and TSK fuzzy structures are generally used for CANFIS model development in NeuroSolutions 5.0. In this investigation, the Takagi-Sugeno-Kang (TSK) fuzzy structure (Takagi and Sugeno, 1985) was preferred, with Gaussian membership functions (MFs) for each input neuron; these choices were based on past investigations and the recommendations of other researchers (Aytek, 2008; Wang et al., 2017; Pradhan et al., 2018). The number of MFs (2 to 6), threshold (0.001) and number of iterations (1000) used to develop the CANFIS model are given in Table 2. Basic concepts of CANFIS are described in several studies (Tabari et al., 2012; Aziz et al., 2013; Pradhan et al., 2018; Singh et al., 2018b).
2.2.3 SVM algorithm
SVM is a supervised learning technique applied to prediction, pattern recognition, regression and classification problems, and it is used more frequently than many other machine learning techniques. SVM is grounded in the structural risk minimization principle of statistical learning theory (Smola et al., 2007), and its basic theory (Vapnik, 1998) was developed for solving regression and classification problems. In this study, a radial basis function (RBF) kernel was used to develop the SVM model, and the model hyper-parameters were selected through a large number of trials (Table 2). Details of the technique can be found in the literature (Kisi and Cimen, 2010; Chen et al., 2010;


Kisi, 2015; Barzegar et al., 2017; Yu et al., 2017). R 3.5.1 (via RStudio) was used to develop the SVM, RF and DT models.
2.2.4 DT algorithm
A decision tree is a supervised, data driven learning technique that predicts a target by learning decision rules from the data, and it is used for regression, classification and prediction problems. It builds a model in the form of a tree structure by partitioning a data set into progressively smaller subsets. The tree starts from the root node (the first parent), and each node can be split into left and right child nodes; these nodes can be split further and become parents of their own children, so the decision tree is built incrementally. The result is a tree with decision nodes and leaf nodes. A decision node has at least two branches, and a leaf node represents a final prediction. The topmost decision node, which corresponds to the best predictor, is called the root node (Lee et al., 2013; Fakhari and Moghadam, 2013; Nasridinov et al., 2013). The parameters used in the decision tree algorithm are presented in Table 2. Similar studies have been conducted by various researchers (Harb et al., 2009; Loh, 2014; Fakhari and Moghadam, 2013; Czajkowski and Kretowski, 2016; Nagalla et al., 2017).
2.2.5 RF algorithm
The RF algorithm was first presented by Leo Breiman in 2001 (Breiman, 2001). It is a flexible, easily applied machine learning algorithm with simple hyper-parameter optimization that reduces overfitting during model development, and it can be applied to classification, regression and prediction problems. A further advantage of RF is that the relative importance of each feature in the prediction is easy to determine. In regression, each tree predicts numerical values instead of the class labels used by a random forest classifier (Karimi et al., 2018). The RF model is built by fitting individual trees to bootstrap samples of the data (bagging). The bias of the bagged trees is the same as that of a single tree, so the gain from bagging comes from variance reduction, which increases as the correlation between trees decreases (Hastie et al., 2009). Fully grown trees are not pruned back, which is one of the main advantages of the random forest model over other tree methods (Quinlan, 1992); Pal and Mather (2003) reported that the choice of pruning method, rather than the attribute selection measure, affects the performance of tree-based methods. The parameters used to build the random forest model are given in Table 2. The fundamental theory of random forest can be found in Chen et al. (2017), Singh et al. (2017c) and Shiri (2018).
2.2.6 Wavelet transform
The wavelet transform is an efficient and powerful mathematical tool that gives a time-frequency representation of the data. Modelling permeability (hydraulic conductivity) from the original, undecomposed data makes the internal mechanism difficult to understand; decomposition separates the data into components at different frequencies (resolution levels). In the present study, the Haar wavelet (Haar, 1910) was used for the wavelet transform. Wavelet algorithms are well established in the literature (Si and Zeleke, 2005; Goyal, 2014; Hu and Si, 2016). The original discrete time series C_0(t) can be decomposed by the à trous algorithm as

$$C_r(t) = \sum_{\ell=-\infty}^{+\infty} h(\ell)\, C_{r-1}\left(t + 2^{r}\ell\right), \qquad r = 1, 2, \ldots \qquad (2)$$

$$W_r(t) = C_{r-1}(t) - C_r(t), \qquad r = 1, 2, \ldots \qquad (3)$$

where h(ℓ) is the discrete low-pass filter, and C_r(t) and W_r(t) (r = 1, 2, …) are the sub-series of scale coefficients and wavelet coefficients, respectively, at the r-th resolution level. In the current research, the five data driven algorithms and their wavelet models were applied to predict soil permeability, and their performance was assessed using RMSE, WI, NSE, PI and CC computed from the observed and simulated soil permeability during the training and testing periods. The results of the different algorithms (MLP, CANFIS, SVM, DT and RF) and wavelet models (W-MLP, W-CANFIS, W-SVM, W-DT and W-RF) were also analysed qualitatively with scatter plots and time series plots of observed and predicted soil permeability during the testing period. The gamma test was applied to identify the most dominant input vectors for predicting soil permeability, and sensitivity was analysed by increasing and decreasing the value of each input vector. The flow chart of model development is presented in Fig. 2.
Table: 2 The hyper-parameters of algorithms used for model development
Fig: 2 Flow chart of model development
2.3 Model performance evaluation
Model performance was evaluated with several statistical indicators, namely the Nash-Sutcliffe Efficiency (NSE), Performance Index (PI), Root Mean Square Error (RMSE), Willmott's Index of Agreement (WI) and Correlation Coefficient (CC). Details of these indicators can be found in Gandomi and Roke (2015), Hyndman and Koehler (2006), Nash and Sutcliffe (1970), Singh (2018a) and Willmott et al. (2012).
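The five indicators can be computed as in the following NumPy sketch. NSE, RMSE and CC follow their standard definitions. WI is implemented as Willmott's original index of agreement, which differs from the refined index of Willmott et al. (2012), and PI is taken as the RMSE divided by the observed mean and by (1 + CC). Both of these definitions, and the function name `evaluation_metrics`, are assumptions for illustration and may not match the exact formulas used by the authors.

```python
import numpy as np


def evaluation_metrics(obs, pred):
    """Return RMSE, NSE, WI, CC and PI for observed and predicted values."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    o_mean = obs.mean()
    sq_err = np.sum((pred - obs) ** 2)
    rmse = np.sqrt(sq_err / len(obs))
    # Nash-Sutcliffe efficiency: 1 - SSE / variance of observations about their mean
    nse = 1.0 - sq_err / np.sum((obs - o_mean) ** 2)
    # Willmott's original index of agreement d (assumed form, see lead-in)
    wi = 1.0 - sq_err / np.sum((np.abs(pred - o_mean) + np.abs(obs - o_mean)) ** 2)
    cc = np.corrcoef(obs, pred)[0, 1]      # Pearson correlation coefficient
    # Assumed performance index: relative RMSE divided by (1 + CC)
    pi = (rmse / o_mean) / (1.0 + cc)
    return {"RMSE": rmse, "NSE": nse, "WI": wi, "CC": cc, "PI": pi}
```

For example, observations `[1, 2, 3, 4]` predicted as `[2, 3, 4, 5]` give RMSE = 1, NSE = 0.2, WI = 0.84, CC = 1 and PI = 0.2: a constant bias leaves the correlation perfect while the other indicators register the error, which is why several indicators are reported together.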

3. Results and Discussion

3.1 Gamma test
The gamma test results are given in Table 3. When sand, silt or clay was masked from the input vector, the gamma and SE values deviated strongly from those with no masking (111111). Masking PD, BD or OC also caused deviations, but smaller ones than masking sand, silt or clay. Therefore, the sand, silt and clay inputs affected the soil permeability output more than the other inputs (PD, BD and OC). Clay content strongly affected soil permeability through a non-linear relationship, whereas particle density had the least effect. After analysing twenty-two models, Model-20, with sand, silt, clay and OC as governing inputs, was selected because it had the minimum gamma (0.05614) and SE (0.00424) values. The gamma and SE values of all models are presented as a histogram in Fig. 3.
Fig: 3 Histogram plot between models and values of gamma and SE
Table: 3 Result of gamma test for input variable selection
3.2 Wavelet and non-wavelet data driven algorithms
The results of the MLP, CANFIS, SVM, DT and RF algorithms were analysed in two phases (Table 4): a training (model development) phase and a testing phase for validating the developed models. Model performance was judged by high values of WI (close to 1 for good and close to 0 for poor performance), NSE (close to 1 for good and close to 0 for poor), CC (close to +1 for good and close to -1 for poor) and low values of RMSE (close to 0 for good, with larger values towards +∞ for poor) and PI (close to 0 for good, with larger values towards +∞ for poor) during both phases. The RF model gave NSE = 0.827, WI = 0.950, CC = 0.910, RMSE = 2.789 cm/hr and PI = 0.212 in the training phase, and NSE = 0.849, WI = 0.956, CC = 0.922, RMSE = 2.304 cm/hr and PI = 0.165 in the testing phase. Comparing the MLP, CANFIS, SVM, DT and RF results gave the ranking RF > DT > SVM > CANFIS > MLP for soil permeability prediction, although the performance of all models was satisfactory. The RF and DT models had higher WI, NSE and CC and lower RMSE and PI during training and testing than the MLP, CANFIS and SVM models, so the random forest and decision tree models outperformed the other models for soil permeability prediction. In the current study, the wavelet transform (WT) was coupled with the MLP, CANFIS, SVM, RF and DT algorithms, and the results of the wavelet-based models (W-MLP, W-CANFIS, W-SVM, W-DT and W-RF) are presented in Table 4. The W-RF algorithm gave the best values of NSE (0.880), WI (0.970), CC (0.942), RMSE (1.725 cm/hr) and PI (0.144) during model calibration, and the best values of NSE (0.900), WI (0.972), CC (0.950), RMSE (1.530 cm/hr) and PI (0.108) during validation. Based on the statistical indices of all models in the training and testing phases, all models were capable of predicting soil permeability, and the W-DT and W-RF algorithms outperformed the W-MLP, W-CANFIS and W-SVM models. The results of data driven algorithms depend mainly on their hyper-parameters, and defining appropriate hyper-parameter values in data driven modelling is difficult. In the current study, this was done by trial and error, which may explain the lower performance of the MLP, CANFIS and SVM algorithms compared with the RF and DT models.
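The wavelet-coupled models rely on the à trous decomposition of Eqs. (2) and (3) in Section 2.2.6. The sketch below implements those two equations with an assumed Haar low-pass filter h(0) = h(1) = 1/2 (h(ℓ) = 0 otherwise) and an assumed boundary rule that clamps indices past the end of the series to the last sample; the function name `a_trous_haar` is hypothetical. Because of Eq. (3) the detail series telescope, so the final approximation plus all details reproduces the original series exactly.

```python
import numpy as np


def a_trous_haar(x, levels):
    """Decompose a series into detail series W_1..W_R and approximation C_R.

    Implements Eqs. (2)-(3) with an assumed Haar low-pass filter
    h(0) = h(1) = 1/2 (all other h(l) = 0), so that
        C_r(t) = [C_{r-1}(t) + C_{r-1}(t + 2**r)] / 2,
        W_r(t) = C_{r-1}(t) - C_r(t).
    Indices beyond the last sample are clamped to it (assumed boundary rule).
    """
    c_prev = np.asarray(x, dtype=float)   # C_0(t): the original series
    n = len(c_prev)
    t = np.arange(n)
    details = []
    for r in range(1, levels + 1):
        shifted = c_prev[np.minimum(t + 2 ** r, n - 1)]  # C_{r-1}(t + 2^r)
        c_r = 0.5 * (c_prev + shifted)                   # Eq. (2)
        details.append(c_prev - c_r)                     # Eq. (3)
        c_prev = c_r
    return details, c_prev


series = np.random.default_rng(1).uniform(1.0, 18.0, size=32)  # synthetic values
W, C = a_trous_haar(series, levels=3)
print(np.allclose(sum(W) + C, series))  # True: C_0 = C_R + W_1 + ... + W_R
```

One common way to couple such a decomposition with a learner, assumed here rather than taken from the study, is to replace each selected input (sand, silt, clay and OC) by its sub-series W_1, …, W_R and C_R before training the MLP, CANFIS, SVM, DT or RF model.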
The performance of all developed models may be improved by optimizing the hyper-parameters of the algorithms with metaheuristic optimization techniques. Coupling the wavelet transform improved the soil permeability predictions of every data driven algorithm, so the wavelet-based algorithms performed better than the corresponding stand-alone data driven algorithms. The developed models were also assessed qualitatively through visual interpretation of time series plots, scatter plots and Taylor diagrams. Time series and scatter plots of observed versus predicted soil permeability during the testing phase (Figs. 4 and 5) show that the MLP and CANFIS predictions deviated more from the observed soil permeability than those of the SVM, DT and RF models. The Taylor diagrams (Fig. 6), which relate the standard deviation and correlation of each model's simulated values to those of the observations, show that the RF model had a better correlation than the other models.
Fig: 4 Scatter and time series diagrams of non-wavelet based algorithms
Fig: 5 Scatter and time series diagrams of wavelet-based algorithms
Fig: 6 Taylor diagrams of non-wavelet and wavelet-based algorithms during testing period
Table: 4 Results of different data driven algorithms
3.3 Sensitivity investigation of different variables
Sensitivity analysis can be used to evaluate the gamma test input selection and the influence of each independent variable in the regression problem. The best model, random forest, was used to measure sensitivity by increasing and decreasing each parameter value by 10 %, and sensitivity was expressed as relative sensitivity (RS) (Singh et al., 2018a). The sensitivity results of the developed model are given in Table 5. Bulk density and particle density had RS values of 1.481 and -1.294 in training and -0.487 and -0.648 in testing. The RS values for the soil texture parameters clay, sand and silt were 23.957, 17.894 and 18.177, respectively, during model development, and -18.487, 15.012 and 16.254 during validation. Comparing all variables, soil texture (clay, sand and silt) was the most sensitive parameter for predicting soil permeability with data driven algorithms, and clay had the highest influence on soil permeability. The rank of sensitive parameters was therefore clay > silt > sand > OC > BD > PD for soil permeability prediction by the data driven algorithms. Observed soil permeability ranged from 1.03 to 18.39 cm/hr. Soil permeability depends on many parameters, such as the percentages of sand, silt, clay and organic carbon, soil salinity, soil compaction, water viscosity and pore size distribution. Bulk and particle density depend on soil texture, and most of the soil samples in this study were sandy, sandy loam or loam, so the variation in bulk and particle density was small. This may explain the weak effect of bulk and particle density on soil permeability.
Table: 5 Sensitivity results of different parameters
4. Summary and Conclusions
This research was conducted in three parts: the first was the collection of soil samples from different field locations, the second was the laboratory analysis of the soil samples, and the third was the analysis and modelling of the data.
In the third part, five non-wavelet data driven algorithms (MLP, CANFIS, SVM, DT and RF) and their wavelet counterparts (W-MLP, W-CANFIS, W-SVM, W-DT and W-RF) were applied to simulate soil permeability from various physical properties of soil. The gamma test showed that sand, silt, clay and OC form the appropriate input combination among sand, silt, clay, BD, PD and OC for soil permeability prediction, and these parameters can be used in future studies. RF was the best performing algorithm, with high accuracy and efficiency, in both its wavelet and non-wavelet forms. Sensitivity results indicated that soil texture (the proportions of sand, silt and clay) was more sensitive than OC, BD and PD, and the RF model is recommended for soil permeability prediction in future work. Most hydraulic structures are built on soil; if the soil beneath them is permeable, water may seep through it and cause piping, which reduces the capacity of the soil to support structural loads. This research may therefore be applied to the design of canals, check dams, dams and drainage systems, and to the assessment of surface water and groundwater for planning and management, including groundwater recharge from canals and monsoon rainfall. In future research, the accuracy and efficiency of the developed models may be increased by optimizing the hyper-parameters of the algorithms.
Acknowledgement: The authors thank TEQIP-III for financial support.
Conflict of Interest: None.
References


Akbulut, S., 2005. Artificial neural networks for predicting the hydraulic conductivity of coarse-grained soils. Eurasian Soil Sci. 38(4), 392–398.
Alyamani, M.S., Sen, Z., 1993. Determination of hydraulic conductivity from grain-size distribution curves. Ground Water 31(4), 551–555.
Aytek, A., 2008. Co-active neurofuzzy inference system for evapotranspiration modeling. Soft Comput. (A Fusion of Foundations, Methodologies and Applications) 13(7), 691–700.
Ayyadevara, V.K., 2018. Pro Machine Learning Algorithms. https://doi.org/10.1007/978-14842-3564-5_7
Aziz, K., Rahman, A., Shamseldin, A.Y., Shoaib, M., 2013. Co-active neuro fuzzy inference system for regional flood estimation in Australia. J. Hydrol. Environ. Res. 1(1), 11–20.
Barzegar, R., Moghaddam, A.A., Adamowski, J., Fijani, E., 2017. Comparison of machine learning models for predicting fluoride contamination in groundwater. Stoch. Environ. Res. Risk Assess. 31(10), 2705–2718.
Bouyoucos, G.J., 1962. Hydrometer method improved for making particle size analysis of soils. Agron. J. 54, 464.
Breiman, L., 2001. Random Forests. Machine Learning 45(1), 5–32.
Chapuis, R.P., 2012. Predicting the saturated hydraulic conductivity of soils: a review. Bull. Eng. Geol. Environ. 71, 401–434. https://doi.org/10.1007/s10064-012-0418-7
Chen, G., Long, T., Xiong, J., Bai, Y., 2017. Multiple random forests modelling for urban water consumption forecasting. Water Resour. Manage. 31, 4715–4729.
Chen, S.T., Yu, P.S., Tang, Y.H., 2010. Statistical downscaling of daily precipitation using support vector machines and multivariate analysis. J. Hydrol. 385(1–4), 13–22.
Czajkowski, M., Kretowski, M., 2016. The role of decision tree representation in regression problems – An evolutionary perspective. Appl. Soft Comput. 48, 458–475.
Darcy, H., 1856. Les fontaines publiques de la ville de Dijon. Victor Dalmont, Paris.
Dong, S., Guo, Y., Xiong, Y., 2018. Method for Quick Prediction of Hydraulic Conductivity and Soil-Water Retention of Unsaturated Soils. J. Transport. Res. Board. https://doi.org/10.1177/0361198118798486
Donohue, T.J., Wensrich, C.M., 2008. The prediction of permeability with the aid of computer simulations. Part. Sci. Technol. 26, 97–108.
Emami, H.S., Horafa, M., Neyshabouri, M.R., 2012. Evaluation of hydraulic conductivity at inflection point of soil moisture characteristic curve as a matching point for some soil unsaturated hydraulic conductivity models. JWSS Isfahan Univ. Technol. 16(59), 169–182. http://jstnar.iut.ac.ir/article-1-2206-en.html
Fakhari, A., Moghadam, A.M.E., 2013. Combination of classification and regression in decision tree for multi-labeling image annotation and retrieval. Appl. Soft Comput. 13(2), 1292–1302.
Fereshte, F.H., 2014. Evaluation of artificial neural network and regression PTFs in estimating some soil hydraulic parameters. Proenviron Promediu 7(17), 10–20.
Fredlund, M.D., Xing, A.Q., Huang, S.Y., 1994. Predicting the permeability function for unsaturated soils using the soil-water characteristic curve. Can. Geotech. J. 31(4), 533–546.
Ghanbarian-Alavijeh, B., Liaghat, A.M., Sohrabi, S., 2010. Estimating saturated hydraulic conductivity from soil physical properties using neural network model. World Acad. Sci. Eng. Technol. 62, 131–136.
Goyal, M.K., 2014. Monthly rainfall prediction using wavelet regression and neural network: an analysis of 1901–2002 data, Assam, India. Theor. Appl. Climatol. 118, 25–34.
Haar, A., 1910. Zur Theorie der orthogonalen Funktionensysteme. Mathematische Annalen 69(3), 331–371. https://doi.org/10.1007/BF01456326
Harb, R., Yan, X., Radwan, E., Su, X., 2009. Exploring precrash maneuvers using classification trees and random forests. Accident Analysis & Prevention 41(1), 98–107.
Hastie, T., Tibshirani, R., Friedman, J., 2009. The Elements of Statistical Learning. Springer, New York.
Hu, W., Si, B.C., 2016. Technical note: multiple wavelet coherence for untangling scale-specific and localized multivariate relationships in geosciences. Hydrol. Earth Syst. Sci. 20(8), 3183–3191.
Karimi, S., Sadraddini, A.A., Nazemi, A.H., Xu, T., Fakherifard, A., 2018. Generalizability of gene expression programming and random forest methodologies in estimating cropland and grassland leaf area index. Comput. Electron. Agric. 144, 232–240.
Kisi, O., 2007. Streamflow forecasting using different artificial neural network algorithms. ASCE J. Hydrol. Eng. 12(5), 532–539.
Kisi, O., Shiri, J., 2012. River suspended sediment estimation by climatic variables implication: comparative study among soft computing techniques. Comput. Geosci. 43, 73–82. https://doi.org/10.1016/j.cageo.2012.02.007
Kisi, O., 2015. Pan evaporation modeling using least square support vector machine, multivariate adaptive regression splines and M5 model tree. Journal of Hydrology. https://doi.org/10.1016/j.jhydrol.2015.06.052
Kisi, O., Alizamir, M., 2018. Modelling reference evapotranspiration using a new wavelet conjunction heuristic method: Wavelet extreme learning machine vs wavelet neural networks. Agricultural and Forest Meteorology 263, 41–48. https://doi.org/10.1016/j.agrformet.2018.08.007
Kisi, O., Cimen, M., 2010. A wavelet-support vector machine conjunction model for monthly streamflow forecasting. Journal of Hydrology. https://doi.org/10.1016/j.jhydrol.2010.12.041
Kisi, O., Sanikhani, H., Cobaner, M., 2017. Soil temperature modeling at different depths using neuro-fuzzy, neural network, and genetic programming techniques. Theor. Appl. Climatol. 129(3–4), 833–848. https://doi.org/10.1007/s00704-016-1810-1
Kisi, O., Tombul, M., Kermani, M.Z., 2015. Modeling soil temperatures at different depths by using three different neural computing techniques. Theor. Appl. Climatol. 121, 377–387.
Kumar, A., Kumar, P., Singh, V.K., 2018. Evaluating Different Machine Learning Models for Runoff and Suspended Sediment Simulation. Water Resources Management. https://doi.org/10.1007/s11269-018-2178-z
Lafdani, E.K., Moghaddam, N.A., Ahmadi, A., 2013. Daily suspended sediment load prediction using artificial neural networks and support vector machines. J. Hydrol. 478, 50–62. https://doi.org/10.1016/j.jhydrol.2012.11.048
Lapierre, C., Leroeuil, S., Locat, S., 1990. Mercury intrusion and permeability of Louiseville clay. Can. Geotech. J. 27, 761–773.
Lee, D.L., Deng, L.Y., Lin, K.H., et al., 2013. Using Decision Tree Analysis for Personality to Decisions of the National Skills Competition Participants. ITC, Lecture Notes in Electrical Engineering 253. https://doi.org/10.1007/978-94-007-6996-0_56
Li, P., 2014. Book Review: "Effective Parameters of Hydrogeological Models" may lead readers safely through the deep waters of uncertainty. Environ. Process. 1, 187–192. https://doi.org/10.1007/s40710-014-0008-8
Loh, W., 2014. Fifty years of classification and regression trees. Int. Stat. Rev. 83(3), 329–348.
Maier, H.R., Dandy, G.C., 1998. The effect of internal parameters and geometry on the performance of back-propagation neural networks: an empirical study. Environ. Model. Softw. 13(2), 193–209.
McKinlay, D.G., Safiullah, A.M.M., 1980. Pore size distribution and permeability of silty clays. ASCE J. Geotech. Eng. Div. 106(GT10), 1165–1168.
Mirbagheri, S.A., Nourani, V., Rajaee, T., Alikhani, A., 2010. Neuro-fuzzy models employing wavelet analysis for suspended sediment concentration prediction in rivers. Hydrol. Sci. J. 55(7), 1175–1189. https://doi.org/10.1080/02626667.2010.508871
Mishra, A., Ohtsubo, M., Loretta L., Higashi, T., 2010. Prediction of compressibility and hydraulic conductivity of soil-bentonite mixture. International Journal of Geotechnical Engineering 4(3), 417–424. https://doi.org/10.3328/IJGE.2010.04.03.417-424
Nagalla, R., Pothuganti, P., Pawar, D.S., 2017. Analyzing gap acceptance behavior at unsignalized intersections using support vector machines, decision tree and random forests. Procedia Computer Science 109C, 474–481.
Nasridinov, A., Sun-Young, I., Park, Y.H., 2013. A Decision Tree-Based Classification Model for Crime Prediction. ITC, Lecture Notes in Electrical Engineering 253. https://doi.org/10.1007/978-94-007-6996-0_56
Nelson, P.H., 2005. Permeability, porosity, and pore-throat size: a three-dimensional perspective. Petrophysics 46(6), 452–455.
Noori, R., Karbassi, A.R., Moghaddamnia, A., Han, D., Zokaei-Ashtiani, M.H., Farokhnia, A., Ghafari Gousheh, M., 2011. Assessment of input variables determination on the SVM model performance using PCA, Gamma test and forward selection techniques for monthly stream flow prediction. J. Hydrol. 401, 177–189.
Nosrati, K.F., Movahedi, N.S., Hezarjaribi, A., Roshani, G.A., Dehghani, A.A., 2012. Using artificial neural networks to estimate saturated hydraulic conductivity from easily available soil properties. Electron. J. Soil Manag. Sustain. Prod. 2(1), 95–110.
Pal, M., Mather, P.M., 2003. An assessment of the effectiveness of decision tree methods for land cover classification. Remote Sens. Environ. 86(4), 554–565.
Pape, H., Clauser, C., Iffland, J., 1999. Permeability prediction based on fractal pore space geometry. Geophysics 64(5), 1447–1460.
Pradhan, S., Kumar, S., Kumar, Y., Sharma, H.C., 2018. Assessment of groundwater utilization status and prediction of water table depth using different heuristic models in an Indian interbasin. https://doi.org/10.1007/s00500-018-3580-4
Quinlan, J.R., 1992. Learning with continuous classes. Proceedings of Australian Joint Conference on Artificial Intelligence. World Scientific Press, Singapore, pp. 343–348.
Remesan, R., Shamim, M.A., Han, D., 2008. Model input data selection using gamma test for daily solar radiation estimation. Hydrol. Process. 22, 4301–4309.
Rezaeianzadeh, M., Amin, S., Khalili, D., Singh, V.P., 2010. Daily outflow prediction by multi-layer perceptron with logistic sigmoid and tangent sigmoid activation functions. Water Resour. Manag. 24(11), 2673–2688.
Rezaeianzadeh, M., Tabari, H., Yazdi, A.A., Isik, S., Kalin, K., 2013. Flood flow forecasting using ANN, ANFIS and regression models. Neural Comput. & Applic. DOI 10.1007/00521013-14436.
Rezaie-Balf, M., Kisi, O., Chua, L., 2018. Application of ensemble empirical mode decomposition based on machine learning methodologies in forecasting monthly pan evaporation. Hydrology Research. https://doi.org/10.2166/nh.2018.050
Richard A. B., 2016. Encyclopedia of Machine Learning and Data Mining. ISBN 978-1-48997687-1.
Saran, B., Kashyap, P.S., Singh, B.P., Singh, V.K., Vivekanand, 2017. Daily Pan Evaporation Modeling in Hilly Region of Uttarakhand Using Artificial Neural Network. Indian Journal of Ecology 44(3), 467–473.
Shahnazari, M.R., Vahabikashi, A., 2011. Permeability prediction of porous media with variable porosity by investigation of Stokes flow over multi-particles. J. Porous Media 14(3), 243–250. https://doi.org/10.1615/JPorMedia.v14.i3.40
Shiri, J., 2018. Improving the performance of the mass transfer-based reference evapotranspiration estimation approaches through a coupled wavelet random forest methodology. Journal of Hydrology. https://doi.org/10.1016/j.jhydrol.2018.04.042
Si, B.C., Zeleke, T.B., 2005. Wavelet coherency analysis to relate saturated hydraulic properties to soil physical properties. Water Resour. Res. 41(11).
Sihag, P., Tiwari, N.K., Ranjan, S., 2017. Prediction of unsaturated hydraulic conductivity using adaptive neuro-fuzzy inference system (ANFIS). ISH J. Hydraul. Eng. https://doi.org/10.1080/09715010.2017.13818 6
Sihag, P., 2018. Prediction of unsaturated hydraulic conductivity using fuzzy logic and artificial neural network. Modeling Earth Systems and Environment. https://doi.org/10.1007/s40808-018-0434-0
Singh, S., 2001. Field and laboratory manual for soil science. Kalyani Publishers, New Delhi, India, pp. 4–6.
Singh, B.P., Kumar, P., Singh, V.K., Yadav, M., 2015. Rainfall prediction through artificial neural networks (ANNs): a soft computing approach. Progressive Research: An International Journal 10(3), 1167–1173.
Singh, V.K., Kumar, D., Kashyap, P.K., Kisi, O., 2018b. Simulation of suspended sediment based on gamma test, heuristic, and regression-based techniques. Environ. Earth Sci. https://doi.org/10.1007/s12665-018-7892-6
Singh, V.K., Kumar, P., Singh, B.P., 2016a. Rainfall-runoff Modeling using Artificial Neural Networks (ANNs) and Multiple Linear Regression (MLR) Techniques. Indian Journal of Ecology 43(2), 436–442.
Singh, V.K., Kumar, P., Singh, B.P., Malik, A., 2016b. A comparative study of adaptive neuro fuzzy inference system (ANFIS) and multiple linear regression (MLR) for rainfall-runoff modelling. Int. J. Sci. Nat. 7(4), 714–723.

Singh, V.K., Singh, B.P., Kisi, O., Kushwaha, D.P., 2018a. Spatial and multi-depth temporal soil temperature assessment by assimilating satellite imagery, artificial intelligence and regression-based models in arid area. Comput. Electron. Agric. https://doi.org/10.1016/j.compag.2018.04.019
Singh, V.K., Singh, B.P., Kumar, A., Vivekanand, 2017b. A comparative study of artificial intelligence and conventional techniques for rainfall-runoff modeling. https://doi.org/10.15740/HAS/IJAE/10.2/441-449
Singh, V.K., Singh, B.P., Vivekanand, 2016c. Basin suspended sediment prediction using soft computing and conventional approaches in India. Int. J. Sci. Nat. 7(2), 459–468.
Singh, B., Singh, P., Singh, K., 2017c. Modelling of impact of water quality on infiltration rate of soil by random forest regression. Model. Earth Syst. Environ. DOI 10.1007/s40808-0170347-3
Singh, B.P., Kumar, P., Srivastava, T., Singh, V.K., 2017a. Estimation of Monsoon Season Rainfall and Sensitivity Analysis Using Artificial Neural Networks. Indian Journal of Ecology 44(Special Issue-5), 317–322.
Singh, V.K., 2017. Soft computing based rainfall runoff modelling. LAP LAMBERT Academic Publishing, Mauritius. ISBN 987-3-330-35008-3.
Smola, A., Vishwanathan, S.V.N., Le, Q., 2007. Bundle methods for machine learning. In: Koller, D., Singer, Y. (Eds.), Advances in Neural Information Processing Systems, vol. 20. MIT Press, Cambridge, MA.
Tabari, H., Talaee, P.H., Abghari, H., 2012. Utility of co-active neuro fuzzy inference system for pan evaporation modeling in comparison with multi-layer perceptron. Meteorol. Atmos. Phys. 116, 147–154.

Takagi, T., Sugeno, M., 1985. Fuzzy identification of systems and its applications to modeling and control. IEEE Trans. Syst. 15(1), 116–132.
Vapnik, V., 1998. Statistical learning theory. John Wiley, New York.
Vereecken, H., Maes, J., Feyen, J., 1990. Estimating unsaturated hydraulic conductivity from easily measured soil properties. Soil Sci. 149, 1–12.
Vienken, T., Dietrich, P., 2011. Field evaluation of methods for determining hydraulic conductivity from grain size data. J. Hydrol. 400, 58–71.
Walkley, A.J., Black, I.A., 1934. Estimation of soil organic carbon by the chromic acid titration method. Soil Sci. 37, 29–38.
Wang, L., Kisi, O., Zounemat-Kermani, M., Li, H., 2017. Pan evaporation modeling using six different heuristic computing methods in different climates of China. J. Hydrol. 544, 407–427.
Wang, D., Ding, J., 2003. Wavelet network model and its application to the prediction of hydrology. Nat. Sci. 1(1), 67–71.
Yaseen, Z.M., Kisi, O., 2018. The potential of hybrid evolutionary fuzzy intelligence model for suspended sediment concentration prediction. Catena 174, 11–23. https://doi.org/10.1016/j.catena.2018.10.047
Yu, P.-S., Yang, T.-C., Chen, S.-Y., Kuo, C.-M., Tseng, H.-W., 2017. Comparison of random forests and support vector machine for real-time radar-derived rainfall forecasting. J. Hydrol. 552, 92–104.
Zareabyaneh, H., Bayat-Varkeshi, M., Golmohammadi, G., Mohammadi, K., 2016. Soil temperature estimation using an artificial neural network and co-active neuro-fuzzy inference system in two different climates. Arab. J. Geosci. 9(377), 1–10.

List of Figures

Fig. 1

[Flowchart: input data (Sand, Silt, Clay, BD, PD, TOC) and output data (permeability of soil); Wavelet (Haar); Gamma Test; selected inputs C3, W1, W2, W3 and Sand, Silt, Clay, TOC; models MLP, CANFIS, SVM, RF, DT and WMLP, WCANFIS, WSVM, WRF, WDT; predicted output (K) compared with statistical indices.]

Fig. 2

Fig. 3

Fig. 4

Fig. 5

Fig. 6
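The flowchart lists a Haar wavelet stage whose outputs are labelled C3, W1, W2 and W3. One plausible reading, assumed here and not confirmed in this section, is a three-level discrete Haar transform that yields the level-3 approximation (C3) and the detail coefficients of levels 1 to 3 (W1 to W3). The sketch below implements the orthonormal Haar transform in plain Python; the helper names and the toy series are hypothetical, and how the paper orders soil samples into a series is not stated here.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(x):
    """One level of the orthonormal Haar transform: returns (approximation, detail)."""
    if len(x) % 2:
        raise ValueError("series length must be even at every level")
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_decompose(x, levels=3):
    """Return (approximation at `levels`, [detail level 1, ..., detail level `levels`])."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


def haar_reconstruct(approx, details):
    """Invert haar_decompose, working from the coarsest level back to the original."""
    x = list(approx)
    for detail in reversed(details):
        finer = []
        for a, d in zip(x, detail):
            finer.extend([(a + d) / SQRT2, (a - d) / SQRT2])
        x = finer
    return x


# Hypothetical mapping to the flowchart labels: C3 = level-3 approximation, W1..W3 = details.
series = [4, 6, 10, 12, 8, 6, 5, 5]  # toy values, not data from the paper
c3, (w1, w2, w3) = haar_decompose(series, levels=3)
```

The series length must be divisible by 2 raised to the number of levels; real implementations usually pad or extend the signal at the boundaries instead of raising an error.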

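List of Tables

Table 1. Descriptive statistics for all data, the training set and the testing set (BD = bulk density, PD = particle density, OC = organic carbon, K = permeability)

| Variable | Data set | Mean | Median | Standard deviation | Kurtosis | Skewness | Minimum | Maximum | Number of samples |
|---|---|---|---|---|---|---|---|---|---|
| Clay (%) | All data | 17.01 | 16.59 | 7.29 | 7.37 | 1.74 | 1.96 | 57.24 | 172 |
| Clay (%) | Training | 17.25 | 17.13 | 5.40 | 0.05 | 0.34 | 1.96 | 35.22 | 122 |
| Clay (%) | Testing | 16.42 | 12.94 | 10.64 | 5.47 | 2.03 | 6.68 | 57.24 | 50 |
| Silt (%) | All data | 24.55 | 25.30 | 8.49 | 0.33 | -0.03 | 0.00 | 48.01 | 172 |
| Silt (%) | Training | 27.49 | 26.85 | 7.27 | 0.89 | 0.15 | 0.00 | 48.01 | 122 |
| Silt (%) | Testing | 17.36 | 15.75 | 6.82 | 0.12 | -0.11 | 2.65 | 29.18 | 50 |
| Sand (%) | All data | 58.44 | 57.79 | 12.49 | 1.29 | 0.00 | 14.20 | 97.66 | 172 |
| Sand (%) | Training | 55.25 | 55.30 | 9.49 | 0.21 | 0.06 | 34.20 | 97.66 | 122 |
| Sand (%) | Testing | 66.22 | 67.30 | 15.33 | 3.22 | -1.13 | 14.20 | 91.66 | 50 |
| OC | All data | 2.36 | 0.88 | 2.93 | 0.71 | 1.53 | 0.10 | 9.90 | 172 |
| OC | Training | 0.83 | 0.76 | 0.38 | 1.47 | 1.16 | 0.10 | 1.95 | 122 |
| OC | Testing | 6.08 | 7.05 | 3.11 | -1.00 | -0.62 | 0.59 | 9.90 | 50 |
| BD (g/cm³) | All data | 1.35 | 1.36 | 0.07 | 2.05 | -0.13 | 1.12 | 1.64 | 172 |
| BD (g/cm³) | Training | 1.36 | 1.36 | 0.05 | 0.65 | -0.42 | 1.21 | 1.65 | 122 |
| BD (g/cm³) | Testing | 1.34 | 1.34 | 0.10 | 0.56 | 0.26 | 1.12 | 1.61 | 50 |
| PD (g/cm³) | All data | 2.52 | 2.54 | 0.12 | 0.17 | -0.94 | 2.21 | 2.65 | 172 |
| PD (g/cm³) | Training | 2.49 | 2.53 | 0.10 | 0.36 | -1.11 | 2.24 | 2.65 | 122 |
| PD (g/cm³) | Testing | 2.59 | 2.65 | 0.13 | 3.26 | -2.12 | 2.21 | 2.65 | 50 |
| K (cm/hr) | All data | 6.49 | 6.23 | 3.87 | 0.43 | 0.74 | 1.03 | 18.39 | 172 |
| K (cm/hr) | Training | 6.16 | 5.78 | 3.81 | 1.02 | 0.93 | 1.03 | 18.39 | 122 |
| K (cm/hr) | Testing | 7.28 | 7.17 | 3.95 | -0.28 | 0.35 | 1.14 | 17.76 | 50 |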
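Table 1 has the layout of a standard spreadsheet descriptive-statistics summary. The sketch below computes the same columns for one variable, assuming the sample standard deviation, the adjusted Fisher-Pearson skewness and the sample excess kurtosis (the formulas behind the common spreadsheet SKEW and KURT functions); which estimators the paper actually used is not stated in this section, and `describe` is a hypothetical helper.

```python
import statistics


def describe(values):
    """Table 1 style summary; needs at least 4 values (the kurtosis formula divides by n - 3)."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation, divisor n - 1
    z = [(v - mean) / sd for v in values]
    # Adjusted Fisher-Pearson skewness (spreadsheet SKEW); 0 for symmetric data.
    skew = n / ((n - 1) * (n - 2)) * sum(t ** 3 for t in z)
    # Sample excess kurtosis (spreadsheet KURT); about 0 for normally distributed data.
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(t ** 4 for t in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return {"mean": mean, "median": statistics.median(values), "sd": sd,
            "kurtosis": kurt, "skewness": skew,
            "min": min(values), "max": max(values), "n": n}
```

Because the kurtosis is reported as excess kurtosis under this assumption, negative values such as -1.00 for OC in the testing set simply indicate a flatter-than-normal distribution.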

Table 2. Parameters of the algorithms

| Model | Parameters of the algorithms |
|---|---|
| MLP | Learning rate = 0.2, momentum = 0.1, number of neurons = 29, iterations = 1000, hidden layers = 1 |
| CANFIS | Membership functions (MFs) = Gaussian, number of MFs = 3, fuzzy model = TSK, threshold = 0.001, iterations = 1000 |
| Random Forest | Number of trees = 300, mtry = 8 |
| Decision Tree | Size of tree = 6 |
| SVM | SVM type = regression, kernel function = radial, cost = 9, gamma = 0.25 |

Table 3. Gamma test results for different input combinations (mask digits in the order Sand, Silt, Clay, PD, BD, OC; 1 = input included, 0 = input excluded)

| Model | Mask (combination) | Input vector | Gamma | SE |
|---|---|---|---|---|
| M-1 | 111111 | Sand, Silt, Clay, PD, BD, OC | 0.05818 | 0.00459 |
| M-2 | 011111 | Without Sand | 0.06914 | 0.00615 |
| M-3 | 101111 | Without Silt | 0.07245 | 0.00684 |
| M-4 | 110111 | Without Clay | 0.09656 | 0.01485 |
| M-5 | 111011 | Without PD | 0.06014 | 0.00478 |
| M-6 | 111101 | Without BD | 0.06489 | 0.00457 |
| M-7 | 111110 | Without OC | 0.06748 | 0.00601 |
| M-8 | 001111 | Without Sand and Silt | 0.14570 | 0.17921 |
| M-9 | 010111 | Without Sand and Clay | 0.12489 | 0.15421 |
| M-10 | 011011 | Without Sand and PD | 0.09578 | 0.01784 |
| M-11 | 011101 | Without Sand and BD | 0.09145 | 0.01472 |
| M-12 | 011110 | Without Sand and OC | 0.08467 | 0.01157 |
| M-13 | 100111 | Without Clay and Silt | 0.16481 | 0.18759 |
| M-14 | 110011 | Without Clay and PD | 0.08941 | 0.01984 |
| M-15 | 110101 | Without Clay and BD | 0.08815 | 0.01847 |
| M-16 | 110110 | Without Clay and OC | 0.09145 | 0.02154 |
| M-17 | 101011 | Without Silt and PD | 0.07515 | 0.01255 |
| M-18 | 101101 | Without Silt and BD | 0.07145 | 0.01149 |
| M-19 | 101110 | Without Silt and OC | 0.07234 | 0.01189 |
| M-20 | 111001 | Without PD and BD | 0.05614 | 0.00424 |
| M-21 | 111010 | Without PD and OC | 0.06440 | 0.00548 |
| M-22 | 111100 | Without BD and OC | 0.06348 | 0.05297 |
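The Gamma test behind Table 3 estimates, from the data alone, the variance of the part of the output that no smooth function of the chosen inputs can explain, so a lower Gamma value suggests a more informative input combination. Below is a minimal brute-force sketch that assumes Euclidean distances on unscaled inputs and the p nearest neighbours of every sample; `gamma_test` and `apply_mask` are hypothetical helpers rather than the software used in the paper, the SE column of Table 3 is not computed, and the mask digit order is inferred from the Input vector column.

```python
INPUTS = ("Sand", "Silt", "Clay", "PD", "BD", "OC")  # mask digit order, inferred from Table 3


def apply_mask(X, mask):
    """Keep the columns whose mask digit is '1'; e.g. '011111' drops Sand."""
    keep = [i for i, bit in enumerate(mask) if bit == "1"]
    return [[row[i] for i in keep] for row in X]


def gamma_test(X, y, p=10):
    """Brute-force Gamma test: returns (Gamma, A) from the fit gamma(k) = Gamma + A * delta(k)."""
    n = len(X)
    if not 1 < p < n:
        raise ValueError("need 1 < p < number of samples")
    delta = [0.0] * p  # mean squared input distance to the k-th nearest neighbour
    gamma = [0.0] * p  # half the mean squared output difference to that neighbour
    for i in range(n):
        neighbours = sorted(
            (sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j)
            for j in range(n) if j != i
        )
        for k, (dist2, j) in enumerate(neighbours[:p]):
            delta[k] += dist2 / n
            gamma[k] += 0.5 * (y[j] - y[i]) ** 2 / n
    # Least-squares line through the p points (delta[k], gamma[k]); Gamma is its intercept.
    md, mg = sum(delta) / p, sum(gamma) / p
    sxx = sum((d - md) ** 2 for d in delta)
    sxy = sum((d - md) * (g - mg) for d, g in zip(delta, gamma))
    slope = sxy / sxx
    return mg - slope * md, slope
```

For a combination such as M-2 one would call `gamma_test(apply_mask(X, "011111"), k_values)`. In practice the inputs are usually normalised first, since otherwise variables with large ranges (sand, silt) dominate the distances.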

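Table 4. Performance of the models during training and testing

| Model | Training NSE | Training WI | Training RMSE (cm/hr) | Training PI | Training CC | Testing NSE | Testing WI | Testing RMSE (cm/hr) | Testing PI | Testing CC |
|---|---|---|---|---|---|---|---|---|---|---|
| MLP | 0.772 | 0.935 | 3.288 | 0.284 | 0.879 | 0.724 | 0.914 | 4.208 | 0.312 | 0.851 |
| CANFIS | 0.652 | 0.896 | 5.009 | 0.448 | 0.813 | 0.739 | 0.926 | 3.974 | 0.293 | 0.870 |
| SVM | 0.781 | 0.934 | 3.158 | 0.272 | 0.884 | 0.803 | 0.945 | 3.006 | 0.218 | 0.896 |
| DT | 0.776 | 0.939 | 3.234 | 0.278 | 0.885 | 0.847 | 0.959 | 2.338 | 0.167 | 0.922 |
| RF | 0.827 | 0.950 | 2.489 | 0.212 | 0.910 | 0.849 | 0.956 | 2.304 | 0.165 | 0.922 |
| W-MLP | 0.755 | 0.927 | 3.533 | 0.307 | 0.869 | 0.781 | 0.936 | 3.334 | 0.243 | 0.885 |
| W-CANFIS | 0.846 | 0.958 | 2.215 | 0.187 | 0.920 | 0.792 | 0.938 | 3.174 | 0.231 | 0.891 |
| W-SVM | 0.806 | 0.945 | 2.789 | 0.238 | 0.898 | 0.824 | 0.952 | 2.679 | 0.193 | 0.909 |
| W-DT | 0.841 | 0.956 | 2.295 | 0.194 | 0.917 | 0.889 | 0.969 | 1.699 | 0.120 | 0.943 |
| W-RF | 0.880 | 0.970 | 1.725 | 0.144 | 0.942 | 0.900 | 0.972 | 1.530 | 0.108 | 0.950 |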
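The five indices in Table 4 can be computed from paired observed and predicted permeability values. NSE (Nash-Sutcliffe efficiency), WI (Willmott index of agreement), RMSE and CC (taken here to be the Pearson correlation coefficient) use their standard definitions. The PI formula below is inferred rather than quoted: RMSE divided by the observed mean, then by (1 + CC), reproduces the PI column of Table 4 from the tabulated RMSE and CC and the mean K of Table 1 (e.g. MLP training: (3.288 / 6.16) / (1 + 0.879) ≈ 0.284). The function name is hypothetical.

```python
import math


def performance_indices(obs, pred):
    """NSE, WI, RMSE, PI and CC for paired observed and predicted values."""
    n = len(obs)
    mo = sum(obs) / n
    mp = sum(pred) / n
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mo) ** 2 for o in obs)
    rmse = math.sqrt(sse / n)
    nse = 1 - sse / sst  # Nash-Sutcliffe efficiency
    # Willmott index of agreement
    wi = 1 - sse / sum((abs(p - mo) + abs(o - mo)) ** 2 for o, p in zip(obs, pred))
    # Pearson correlation coefficient
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    cc = cov / math.sqrt(sst * sum((p - mp) ** 2 for p in pred))
    # Performance index as inferred from Table 4: RMSE over the observed mean, divided by (1 + CC)
    pi = (rmse / mo) / (1 + cc)
    return {"NSE": nse, "WI": wi, "RMSE": rmse, "PI": pi, "CC": cc}
```

Note that a constant offset leaves CC at 1 while NSE, WI, RMSE and PI all penalise it, which is why the table reports several indices rather than correlation alone.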

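Table 5. Sensitivity analysis: each input variable decreased and increased by 10%

| Variable value | Training NSE | Training WI | Training PI | Testing NSE | Testing WI | Testing PI |
|---|---|---|---|---|---|---|
| Clay (-10%) | 0.684 | 0.758 | 0.491 | 0.771 | 0.867 | 0.389 |
| Clay | 0.880 | 0.970 | 0.144 | 0.900 | 0.972 | 0.108 |
| Clay (+10%) | 0.672 | 0.749 | 0.517 | 0.789 | 0.849 | 0.010 |
| Sand (-10%) | 0.784 | 0.841 | 0.394 | 0.829 | 0.919 | 0.365 |
| Sand | 0.880 | 0.970 | 0.144 | 0.900 | 0.972 | 0.108 |
| Sand (+10%) | 0.779 | 0.838 | 0.401 | 0.831 | 0.914 | 0.355 |
| Silt (-10%) | 0.769 | 0.834 | 0.415 | 0.786 | 0.83 | 0.409 |
| Silt | 0.880 | 0.970 | 0.144 | 0.900 | 0.972 | 0.108 |
| Silt (+10%) | 0.768 | 0.837 | 0.428 | 0.798 | 0.839 | 0.419 |
| BD (-10%) | 0.861 | 0.962 | 0.198 | 0.873 | 0.964 | 0.157 |
| BD | 0.880 | 0.970 | 0.144 | 0.900 | 0.972 | 0.108 |
| BD (+10%) | 0.859 | 0.960 | 0.208 | 0.864 | 0.951 | 0.169 |
| PD (-10%) | 0.872 | 0.965 | 0.197 | 0.891 | 0.968 | 0.128 |
| PD | 0.880 | 0.970 | 0.144 | 0.900 | 0.972 | 0.108 |
| PD (+10%) | 0.875 | 0.966 | 0.194 | 0.895 | 0.969 | 0.121 |
| OC (-10%) | 0.815 | 0.914 | 0.345 | 0.834 | 0.901 | 0.334 |
| OC | 0.880 | 0.970 | 0.144 | 0.900 | 0.972 | 0.108 |
| OC (+10%) | 0.834 | 0.924 | 0.319 | 0.851 | 0.9174 | 0.324 |

| Variable | Training RS | Testing RS |
|---|---|---|
| Clay | -23.957 | -18.487 |
| Sand | 17.894 | 15.012 |
| Silt | 18.177 | 16.254 |
| BD | -1.294 | -0.648 |
| PD | 1.481 | -0.487 |
| OC | 11.648 | 5.089 |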
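Table 5 follows a one-at-a-time scheme: a single input is scaled by 0.9 or 1.1 while the others stay at their observed values, and the fitted model is re-scored. The unperturbed rows (NSE 0.880/0.900, WI 0.970/0.972, PI 0.144/0.108) coincide with the W-RF entries in Table 4. The sketch below shows the perturbation loop with NSE as the score; `model` stands for any fitted predictor, the toy model and data are hypothetical, and the RS index is not reproduced because its definition is not given in this section.

```python
def nse(obs, pred):
    """Nash-Sutcliffe efficiency."""
    mo = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    return 1 - sse / sum((o - mo) ** 2 for o in obs)


def perturb(X, col, factor):
    """Copy of X with input column `col` multiplied by `factor` (e.g. 0.9 or 1.1)."""
    return [[v * factor if j == col else v for j, v in enumerate(row)] for row in X]


def one_at_a_time(model, X, obs, names, delta=0.10):
    """For each input, NSE with that input scaled by (1 - delta), 1 and (1 + delta)."""
    results = {}
    for col, name in enumerate(names):
        results[name] = tuple(
            nse(obs, [model(row) for row in perturb(X, col, factor)])
            for factor in (1 - delta, 1.0, 1 + delta)
        )
    return results


def toy_model(row):
    """Hypothetical stand-in for a fitted model: K = 2 * a + b."""
    return 2 * row[0] + row[1]


X = [[1, 1], [2, 1], [3, 2]]
obs = [toy_model(row) for row in X]
print(one_at_a_time(toy_model, X, obs, ["a", "b"]))
```

The larger the drop in NSE when an input is shifted by 10%, the more the model relies on that input, which is the sense in which clay ranks as the most sensitive variable in Table 5.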

Highlights
1. The W-RF algorithm has the highest accuracy and efficiency for predicting soil permeability from physical properties.
2. Sensitivity analysis shows that clay > sand > OC > BD > PD were the parameters to which predicted soil permeability was most sensitive.
3. RF was the best-performing algorithm for permeability prediction in both the wavelet-coupled and the non-wavelet configurations.