Groundwater aquifer potential modeling using an ensemble multi-adoptive boosting logistic regression technique


Journal of Hydrology 579 (2019) 124172

Contents lists available at ScienceDirect

Journal of Hydrology journal homepage: www.elsevier.com/locate/jhydrol

Research papers

Groundwater aquifer potential modeling using an ensemble multi-adoptive boosting logistic regression technique

Hossein Mojaddadi Rizeei a, Biswajeet Pradhan a,b,⁎, Maryam Adel Saharkhiz a, Saro Lee c,d,⁎

a Centre for Advanced Modelling and Geospatial Information Systems (CAMGIS), Faculty of Engineering and IT, University of Technology Sydney, NSW 2007, Australia
b Department of Energy and Mineral Resources Engineering, Sejong University, Choongmu-gwan, 209 Neungdong-ro, Gwangjin-gu, Seoul, 05006, Republic of Korea
c Geoscience Platform Division, Korea Institute of Geoscience and Mineral Resources (KIGAM), Gajeong-dong 30, Yuseong-gu, Daejeon 305-350, Republic of Korea
d Korea University of Science and Technology, 217 Gajeong-ro Yuseong-gu, Daejeon 34113, Republic of Korea

ARTICLE INFO

This manuscript was handled by G. Syme, Editor-in-Chief

ABSTRACT

Machine learning and data-driven models have achieved a favorable reputation in the field of advanced geospatial modeling, particularly for models of groundwater aquifer potential over large areas. Such models built using standalone machine learning techniques retain some uncertainty, including errors associated with the modeling process, sampling approach, and input hyper-parameters. Some of these techniques cannot be applied in data-scarce regions because high bias and variance can lead to oversimplification. Therefore, in the current study, we developed and validated a novel ensemble multi-adaptive boosting logistic regression (MABLR) model for groundwater aquifer potential mapping. This model was validated in a large area of the Gyeongsangbuk-do basin in South Korea and the results were compared to those of different types of machine learning models including multilayer perceptron (MLP), logistic regression (LR), and support vector machine (SVM) models. A forward stepwise LR technique was implemented to assess the importance of contributing morphological factors; we found 15 factors that contributed significantly: topographic wetness index (TWI), topographic roughness index (TRI), stream power index (SPI), topographic position index (TPI), multi-resolution valley bottom flatness (MVBF), slope, aspect, slope length (LS), distance from the river, distance from the fault, profile curvature, plane curvature, altitude, land use/land cover (LULC), and geology. We optimized the MABLR model using a fuzzy logic supervised (FLS) approach with 184 iterations and then validated the results using accuracy assessment metrics including the κ coefficient, root-mean-square error (RMSE), receiver operating characteristics (ROC), and the precision-recall curve (PRC). Our model had superior predictive performance among the models tested, with higher overall goodness-of-fit and validation values according to the κ coefficient (0.819 and 0.781, respectively), ROC (0.917 and 0.838), and PRC (0.931 and 0.872). Our experimental results demonstrate that MABLR is more effective at reducing bias and variance error than other constituent machine learning methods.

Keywords: Machine learning; Groundwater aquifer potential; Multi-adaptive-boosting-logistic-regression; GIS; Optimization

⁎ Corresponding authors at: Centre for Advanced Modelling and Geospatial Information Systems (CAMGIS), Faculty of Engineering and IT, University of Technology Sydney, NSW 2007, Australia (B. Pradhan); Geoscience Platform Division, Korea Institute of Geoscience and Mineral Resources (KIGAM), Gajeong-dong 30, Yuseong-gu, Daejeon 305-350, Republic of Korea (S. Lee).
E-mail addresses: [email protected] (B. Pradhan), [email protected] (S. Lee).
https://doi.org/10.1016/j.jhydrol.2019.124172
Received 3 July 2019; Received in revised form 30 August 2019; Accepted 23 September 2019; Available online 23 September 2019
0022-1694/© 2019 Elsevier B.V. All rights reserved.

1. Introduction

Groundwater is among the most valuable natural resources due to its vital importance in industrial, residential, and agricultural applications. As a non-renewable natural resource, groundwater quality affects the vulnerability of soil to pollution, drinking water quality, temperature modulation, environmental sensitivity, and local climate change (Manap et al., 2013). One third of the global population depends on groundwater for their daily needs (Oh et al., 2011). The development of groundwater is a key issue for the storage of fresh drinking water (Jothibasu and Anbazhagan, 2016). Therefore, it is important to investigate the behavior and characteristics of groundwater. Groundwater transmissivity is determined by several factors including geological, physiographical, and morphological parameters, hydrological conditions, and climate variation (Kumar et al., 2015), and the availability and activity of groundwater can be affected by topography, lithology, geological structure, slope, and many other factors (Oh et al., 2011). The creation of a comprehensive model that can effectively consider all possible contributing factors for groundwater mapping is essential.

Hydrogeological lab tests, sample drilling, and geospatial models are tools often used for mapping groundwater. Although these methods provide detailed recognition of subsurface hydrogeological structures (Helaly, 2017), they can be time-consuming and costly (Nampak et al., 2014). The development of geographic information systems (GIS), statistical techniques, machine learning models, and remote sensing data has led to advances in groundwater potential analyses (Yin et al., 2018). GIS and remote sensing technology have been used as spatial research tools in numerous environmental applications including hydrological studies and natural hazard risk assessments (Mojaddadi et al., 2017; Rizeei et al., 2018a,b; Rizeei et al., 2016, 2018c). Recently, machine learning and data mining methods have been implemented in many groundwater studies due to their ability to recognize patterns within inventory datasets and nonlinear relationships between parameters (Naghibi et al., 2018). Numerous forms of GIS-based and machine learning models have been applied to groundwater potential mapping, including multicriteria decision analysis (Kaliraj et al., 2014; Pradhan, 2010), frequency ratio (Guru et al., 2017; Oh et al., 2011; Rahmati et al., 2016), Dempster-Shafer theory (Rahmati and Melesse, 2016), weights-of-evidence modelling (Corsini et al., 2009; Ghorbani Nejad et al., 2017), Self-learning Random Forests (Sameen et al., 2018), logistic regression (Ozdemir, 2011; Rizeei et al., 2018a), decision tree (Chenini et al., 2010), evidential belief function (Mogaji et al., 2015), the logistic model tree (Rahmati et al., 2018), certainty factor (Razandi et al., 2015), analytical hierarchy process (Adiat et al., 2012; Yin et al., 2018), the statistical index (Falah et al., 2017), multivariate adaptive regression spline (Zabihi et al., 2016), index of entropy (Al-Abadi and Shahid, 2015), boosted regression tree (Naghibi et al., 2016), multivariate adaptive regression splines (Rahmati et al., 2019), artificial neural network model (Corsini et al., 2009), and aquifer sustainability factor (Smith et al., 2010).

In most cases, statistical and machine learning models perform well; however, if the training sample size is inadequate, these models tend to oversimplify reality. Different sources of uncertainties related to groundwater modeling can include the modeling process, input parameters, and sampling approach (Refsgaard et al., 2007). The ensemble evidential belief function (Mohammady et al., 2012) and tree-based model were proposed to create the groundwater potential map (Naghibi et al., 2019). The development of ensemble models has allowed the integration of a base-learner approach with a prime algorithm to achieve more robust models that can be applied over large study areas, where data coverage can be inconsistent (Naghibi et al., 2017). However, the application of hybrid models should be explored for different regions to determine the optimum model in terms of accuracy, robustness, overfitting, and sensitivity to scarce data (Rahmati et al., 2018).

To reduce these modeling uncertainties, we coupled a multi-adaptive boosting hybrid model (MultiAdaBoosting), a decision-committee technique that combines adaptive boosting (AdaBoost) with wagging, with logistic regression (LR), a robust model with strict expectations prior to training (Pradhan, 2010), to develop the ensemble MABLR model. Although MultiAdaBoosting is a powerful adaptive boosting classifier that can handle multiple classes in both basic and complex recognition problems, it can be sensitive to outliers in the dataset, which are very common in the groundwater domain, as well as to over-fitting (Naghibi et al., 2016). Therefore, we integrated LR with MultiAdaBoosting to overcome the model over-fitting and outlier sensitivity problems by producing a highly certain ensemble classifier with less dependency on modification of hyper-parameters or settings. Our main goal was to evaluate the ability of MABLR to assess the morphological parameters of groundwater aquifer potential zones in South Korea and compare its results to those of support vector machine (SVM), multilayer perceptron (MLP), and standalone LR models. Specifically, we optimized the contributing factors for LR groundwater modeling; spatially modeled groundwater aquifer potential using the ensemble MABLR machine learning model and compared the results to those of other constituent machine learning models including MLP, SVM, and LR; assessed the strength of machine learning model hyper-parameters using a fuzzy logic supervised (FLS) approach; and compared the performance of MABLR and other machine learning models.

The study site was Uiseong County in South Korea, which covers an area of about 1175.2 km² (Fig. 1). The county comprises 831.0 km² (70.7%) of forest land, 214.6 km² (18.2%) of farmland, and 32.6 km² (2.7%) of rivers. The yearly average temperature is 11 °C, and the region is cold and dry with very little precipitation owing to its setting as an inland basin between the Taebaek and Sobaek mountain ranges. It rains on an average of 92 days annually. The average precipitation is 960 mm, which is low compared to the Korean mean precipitation of 1250 mm (http://www.usc.go.kr/eng/About_Uiseong/Introduction/Location).

Fig. 1. Location of the study area.
We identified 169 rock aquifers within the study area and recorded specific capacity and transmissivity information for each well in the region based on field surveys. Rock aquifer data indicate that the maximum range for aquifer groundwater is about 955.41 m³/h in winter, declining to a minimum of 0.0005 m³/h in summer. Well inventory points were randomly separated into two subsets: 70% (118 wells) for model training and 30% (51 wells) for model testing. Based on the transmissivity (T) characteristics of each individual well, the well inventory was divided into two groups, productive (yield > 40 m³/h) and unproductive (yield < 40 m³/h), according to the criteria of Sameen et al. (2018). To create an effective well inventory for use in the machine learning models, productive and unproductive samples were assigned values of 1 and 0, respectively. Fig. 1 shows the locations of the groundwater wells within the study area. We examined the effects of 12 contributing morphological factors. These factors were extracted using the ArcGIS 10.6 software in raster format at a spatial resolution of 10 m × 10 m and statistically analyzed using Waikato Environment for Knowledge Analysis (Weka) v. 3.9.2. Topographic indices were derived from a digital elevation model that was originally surveyed as a 1:5000-scale topographic map by the Korean National Geographic Information Institute.

We developed and calibrated the MABLR model to map groundwater potential in the basin through the following steps. First, significant contributing factors were selected. Then we modeled groundwater aquifer potential using the calibrated MABLR model and compared the results to those of other well-known machine learning models. Finally, we evaluated the model results using the κ coefficient, root-mean-square error (RMSE), receiver operating characteristics (ROC) curve, and the precision-recall curve (PRC) (Fig. 2).
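The well labeling and random 70/30 split described above can be illustrated with a minimal sketch. The 40 m³/h threshold, the 0/1 coding, and the inventory size of 169 come from the text; the synthetic yields, the function names `label_wells` and `split_wells`, and the random seeds are hypothetical and stand in for the surveyed data and the software actually used.

```python
import random

YIELD_THRESHOLD = 40.0  # m3/h, productivity threshold stated in the text


def label_wells(yields):
    """Assign 1 to productive wells (yield > threshold) and 0 otherwise."""
    return [1 if y > YIELD_THRESHOLD else 0 for y in yields]


def split_wells(records, train_fraction=0.7, seed=0):
    """Randomly split well records into training and testing subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # the seed is an arbitrary assumption
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical inventory: 169 synthetic yields, not the surveyed values.
rng = random.Random(42)
yields = [rng.uniform(0.0, 100.0) for _ in range(169)]
wells = list(zip(yields, label_wells(yields)))
train, test = split_wells(wells)
print(len(train), len(test))
```

With 169 records, a 70% share rounds to 118 training wells and leaves 51 for testing, which matches the counts reported in the text.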

Fig. 2. The overall flowchart of this study.

2. Materials and methods

2.1. Groundwater conditioning parameters

We performed probability analyses to examine the correlation among conditioning factors that can influence groundwater potential model results (Tehrany et al., 2013) and groundwater aquifer productivity. This requires training-related parameters (Pradhan and Lee, 2010; Aghdam et al., 2016; Hong et al., 2017; Rizeei et al., 2018a), which can affect model precision. We selected the following groundwater conditioning factors for their potential contribution to the groundwater model: topographic wetness index (TWI), topographic roughness index (TRI), stream power index (SPI), specific catchment area (SCA), topographic position index (TPI), multiresolution valley bottom flatness (MVBF), multiresolution ridge top flatness (MRTF), the convergence index (CI), Melton ruggedness number (MRN), slope, aspect, slope length (SL), distance from the river, distance from the fault, profile curvature, plane curvature, altitude, land use/land cover (LULC), soil, and geology. To determine which contributing parameters were significantly correlated with groundwater aquifer productivity, a forward stepwise LR was applied using the Weka software. The LR assessed the degree of functional correlation among all contributing parameters and spring locations, which affect aquifer expansion (Hosmer et al., 2013; Ozdemir, 2011). Effective contributing parameters were defined as those with P < 0.05 (Rahmati et al., 2018); a total of 15 contributing parameters were identified and retained in the model: TWI, TRI, SPI, TPI, MVBF, slope, aspect, SL, distance from the river, distance from fault, profile curvature, plane curvature, altitude, LULC, and geology (Fig. 3).

Fig. 3. Significant contributing factors to groundwater modelling.

Elevation is among the most significant parameters used in groundwater analyses; groundwater aquifer potential in highly elevated areas approaches zero (Botzen et al., 2013). Surface runoff flows from highly elevated areas toward lower regions; consequently, groundwater potential is higher in low-altitude or flat terrains. Slope and aspect are topographical factors with important applications as hydrology parameters due to their effects on runoff accumulation and the velocity of excess rainfall (Rizeei et al., 2017). An increase in slope decreases the amount of time available for surface infiltration, increasing the amount of water entering drainage networks that will later be retrievable from groundwater aquifers. Aspect can also be an influential parameter, particularly in hilly areas. In this region of Korea, north- and east-facing slopes are exposed to long durations of solar radiation, and their vegetation cover is not as dense as that of west- and south-facing slopes. Raindrops therefore cannot readily penetrate the bare soil, where pores are more likely to be blocked by intense rainfall owing to insufficient vegetation cover. As a result, groundwater is more probable on west- or south-facing slopes than on east- or north-facing ones. Plan and profile curvature also contribute significantly to physical groundwater models; these parameters consist of raster data ranging from negative (concave) to positive (convex) values and must be classified before becoming model input. Pixels with a value of zero are assigned to flat regions. TPI indicates the position of each cell, and is calculated as follows (De Reu et al., 2013; Guisan et al., 1999):

$$\mathrm{TPI} = \frac{E_{pixel}}{E_{surrounding}} \tag{1}$$

where $E_{pixel}$ is the altitude of the cell and $E_{surrounding}$ is the mean altitude of the neighboring pixels. High TPI values indicate upper slopes, while low TPI values indicate lower slopes, where the potential of the groundwater aquifer is high. The MVBF index links the size and flatness of valley bottoms, which was incorporated into the algorithm by reducing the slope threshold. A value of zero specifies erosional terrain with a lower possibility of a groundwater aquifer, while values above 1 indicate areas of deposition with a more productive groundwater aquifer. The MVBF index reflects the valley bottom characteristics of flatness and lowness. Flatness is measured using the inverse of the slope, and lowness is measured using ranking elevation with respect to a circular surrounding area. These two measures, both scaled from 0 to 1, are combined by multiplication and can be interpreted as fuzzy set membership functions (Gallant and Dowling, 2003; Kaufmann, 1975). LS is a combination of slope gradient (S) and slope length (L). We adopted an extensively used method for calculating LS, as follows:

$$\mathrm{LS} = \left(\frac{A_s}{22.13}\right)^{0.4} \left(\frac{\sin\beta}{0.0896}\right)^{1.3} \tag{2}$$

where $A_s$ is the accumulated flow of the unit stream power theory, which considers sediments and water, and $\beta$ is the slope in degrees.
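As a small worked illustration of the LS calculation, the sketch below reads Eq. (2) as LS = (A_s/22.13)^0.4 (sin β/0.0896)^1.3 and converts the slope from degrees to radians before taking the sine. The function name `ls_factor` and the example cell values are hypothetical; the study itself derived the factor rasters in ArcGIS.

```python
import math


def ls_factor(a_s, slope_deg):
    """Slope-length factor LS as read from Eq. (2); slope is given in degrees."""
    beta = math.radians(slope_deg)  # math.sin expects radians
    return (a_s / 22.13) ** 0.4 * (math.sin(beta) / 0.0896) ** 1.3


# Hypothetical cells: the same accumulated flow on a gentle and a steep slope.
gentle = ls_factor(100.0, 5.0)
steep = ls_factor(100.0, 20.0)
print(gentle < steep)
```

By construction, a cell with A_s = 22.13 and sin β = 0.0896 has LS = 1, so the two constants act as reference values for the flow and slope terms.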

In general, a low LS value is more likely to indicate a productive groundwater aquifer. SPI and TWI are water-related parameters calculated as follows (Gokceoglu et al., 2005):

$$\mathrm{SPI} = A_s \tan\beta, \tag{3}$$

$$\mathrm{TWI} = \ln\left(\frac{A_s}{\tan\beta}\right), \tag{4}$$

where $A_s$ is the catchment area or flow accumulation (m² m⁻¹) and $\beta$ is the local slope gradient measured in degrees. SPI indicates the erosive power of water flow. TWI represents the effects of topography on runoff generation and the amount of flow accumulation at any location within the river catchment (Gokceoglu et al., 2005). The accuracy of a topographic index can be estimated with respect to grid spacing and terrain roughness by comparing the relationship between the topographic index surface and reference data. TRI is another morphological parameter widely used in groundwater analyses; it is calculated in this study as follows:

$$\mathrm{TRI} = \sqrt{\mathrm{Abs}\left(\max{}^{2} - \min{}^{2}\right)}, \tag{5}$$

where max and min represent the largest and smallest values of the cells in a nine-cell rectangular neighborhood of altitude values. LULC types are also primary factors that strongly contribute to groundwater potential modeling. A detailed understanding of LULCs is highly significant for environmental and natural hazard studies (Rizeei et al., 2016). Lithology and geology are also important parameters used to detect sensitive groundwater aquifer areas. Soil type directly affects the drainage process via characteristics such as texture, permeability degree, and structure. Lithological information regarding the permeability of rocks is also required. The study area contained rocks from 142 different lithology classes. Variation in the factors contributing to the behavior and activity of groundwater causes ambiguity in the overlaying process. Therefore, all factors were normalized to a common scale in the feature raster before overlaying (Youssef et al., 2015; Mojaddadi et al., 2017; Fanos and Pradhan, 2019).

2.2. Model optimization

Hyper-parameters affect the quality and robustness of machine learning models and must, therefore, be selected to achieve the highest model performance (Liao et al., 2012). Once all significant hyper-parameters were selected, domain values were assigned for each individual hyper-parameter. These values indicate the range of probable values for each parameter. Because the optimal value for single hyper-parameters should be coordinated with other hyper-parameters, finding the most effective domain value of hyper-parameters for a model is a time-consuming procedure without optimization systems (Woo et al., 2007). In this study, we specified six classes for each domain to cover its effective range. Following domain selection, we applied an FLS technique to optimize the hyper-parameters (Zhang et al., 2010). The FLS optimized the hyper-parameters of the MABLR, SVM, MLP, and LR models by assigning an optimal predictive value for all involved hyper-parameters within their domains to limit the degree of redundancy between them. Values of hyper-parameters that were exceptionally associated with the model and with low inter-correlations were selected on the basis of discrepancy evaluation (Tong and Murray, 2012). The FLS evaluated hyper-parameters through a search that ran iteratively from a random vertex and calculated the ideal value within the available domain. After all runs were assessed, the optimal hyper-parameter configuration was selected within 184 iterations according to evaluation metrics. The optimal hyper-parameters for all proposed models are summarized in Table 1.

Table 1 The optimal value for hyper-parameters of the models by the FbSP technique.

Model   Hyper-parameter            Optimal value
MABLR   Weight threshold           120
        Seed                       1
        Number of sub committees   4
        Batch size                 100
MLP     Seed                       5
        Momentum                   0.8
        Learning rate              0.9
        Hidden layer attribute     9.5
        Hidden layer class         2
SVM     Kernel function            Poly-kernel
        Gamma in kernel            0.3
        C value                    0.25
        Penalty parameter          100
LR      Batch size                 150
        Ridge value                1.2e−9

2.3. Theory of the LR, SVM, MLP, and MABLR models

LR is a widely used multivariate statistical model that can be applied to continuous or discrete data of any distribution or raster format (Lee and Sambath, 2006a,b). It was proposed by McFadden (1974) to measure probability of occurrence depending on contributing parameters. LR can be used to evaluate relationships between a binary dependent variable and nominal or scalar independent variables (Shirzadi et al., 2012).

SVM was designed on the basis of statistical learning theory to minimize operational uncertainty (Yao et al., 2008). This process converts nonlinear structures into linear structures according to hyperplane creation (Tehrany et al., 2014). A separating hyperplane is created for the original space with n coordinates among points within two different categories (Marjanović et al., 2011). The hyperplane separates training datasets based on a kernel function of the SVM. Support vectors are the training points that lie closest to the ideal hyperplane. The goal of the SVM model is to recognize the ideal separating hyperplane range.

The MLP algorithm is a feed-forward artificial neural network (NN) that uses nodes linked by input signals and numeric weights to produce layers that receive, process, and display output (Harun et al., 2010). Back-propagation is applied to reduce errors accumulated via the repetitive approach. NNs have successfully been utilized in remote sensing applications. Limitations of the MLP model include high computational costs and overlearning (Mia and Dhar, 2016).

MultiAdaBoosting merges AdaBoost with wagging to produce a decision-committee model (Webb, 2000; Bui et al., 2016) that reduces both variance and bias. Although it cannot be applied for committees of < 10 members, MultiAdaBoosting exhibits greater error reduction than all other comparable committee algorithms (Kotsiantis et al., 2007). In comparison, MABLR uses LR for classifier-based learning to generate decision committees with less error than either wagging or MultiAdaBoosting, even for a large cross-section of datasets (Webb, 2000). MABLR is more efficient than MultiAdaBoosting due to its matching parallel execution algorithms. The steps of MABLR implementation are shown in Fig. 4. All classifiers determined by wagging are independent of one another, which permits parallel computation at the sub-committee level of the MultiAdaBoosting model. MABLR improves error reduction compared to other approaches, including bagging decision trees, wagging, and MultiAdaBoosting, particularly at ε_t < 10, when variance is amplified, thus reducing the frequency at which the central tendency is created and therefore reducing its ability to contribute to uncertainty.

2.4. Evaluation methods

The following evaluation metrics were applied to assess the accuracy of groundwater potential models: RMSE, κ coefficient, ROC, and PRC. The κ coefficient measures the overall accuracy of the model from the correctly assigned samples on the diagonal of the error matrix built from the full dataset (Ridd and Liu, 1998). The κ coefficient is calculated as follows:

$$K = \frac{M \sum_{i=1}^{r} x_{ii} - \sum_{i=1}^{r} x_{i+}\, x_{+i}}{M^{2} - \sum_{i=1}^{r} x_{i+}\, x_{+i}} \tag{6}$$

where r is the total number of rows in the error matrix, $x_{ii}$ is the number of observations in row i and column i, $x_{i+}$ and $x_{+i}$ are the marginal totals of row i and column i, and M is the total number of observations.

ROC curves are designed to evaluate and visualize the performance of an analytical model; they plot sensitivity, or the true positive rate (TP), associated with a decision threshold on the y-axis against 1 − specificity, or the false positive rate (FP), on the x-axis (Fawcett, 2006), thus representing the positive and negative probability, respectively, that a pixel is classified correctly. The area under the ROC curve estimates the overall accuracy of the model (Nampak et al., 2014; Pradhan, 2010). However, evaluation of the model solely by visual interpretation of ROC can be misleading; thus, the precision-recall curve (PRC) is a complementary evaluation metric that is useful for imbalanced datasets. The PRC shows the relationship between the positive predictive value (PPV), or precision, and sensitivity for all possible pixels, both of which can be calculated from TP and FP. The PRC graph is plotted with recall, or sensitivity, on the x-axis and precision on the y-axis. Each point on the PRC graph thus represents a selected cut-off. A perfect model will have a ROC and PRC of 1, whereas a value approaching 0 indicates an inaccurate model.

RMSE is used to evaluate differences between the observed sample values and predicted model values. RMSE is the square root of the second sample moment, that is, the quadratic mean of the deviations between observed and predicted values (Hyndman and Koehler, 2006). RMSE was calculated as follows:

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} \left(X_{test,i} - X_{train,i}\right)^{2}}{n}} \tag{7}$$

where $X_{test,i}$ and $X_{train,i}$ are the testing and training values at i, and n is the number of samples.

3. Results and discussion

3.1. Groundwater potential mapping

Groundwater aquifer potential was modeled using four machine learning techniques: MABLR, MLP, SVM, and LR; maps based on these models are shown in Fig. 5. We focused mainly on the development of the MABLR ensemble model because this study is the first to implement it to determine groundwater probability; therefore, the optimization processes are discussed in detail. The models were assessed according to four accuracy metrics. Groundwater potential aquifer maps were created on the basis of predicted probability ranging from 0 to 1, where 0 indicates no probable pixels and 1 indicates 100% probability of occurrence. To create thematic zoning maps, which are more easily understood by end-users and decision makers, we used the quantile technique in the GIS platform to reclassify the probability index into five classes: very high, high, moderate, low, and very low. High potential areas were located in low-elevation zones near riverbanks, and low potential areas were found in high-elevation areas with steep slopes. These findings were common among all groundwater potential aquifer maps; clearly, riparian areas and some upstream areas are expected to have the potential for groundwater yield.
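The quantile reclassification into five potential classes can be sketched as follows. This is only an illustration of equal-count class breaks, assuming each class should hold roughly the same number of cells; the function names and the synthetic probability values are hypothetical, and the GIS platform's own quantile routine may treat ties and break values differently.

```python
import random
from bisect import bisect_left

LABELS = ["very low", "low", "moderate", "high", "very high"]


def quantile_breaks(values, n_classes=5):
    """Upper break values so that each class holds about the same number of cells.

    Assumes len(values) >= n_classes.
    """
    ordered = sorted(values)
    n = len(ordered)
    # Break k is the largest value in the k-th equal-count slice;
    # the last class is open-ended and needs no break.
    return [ordered[(k * n) // n_classes - 1] for k in range(1, n_classes)]


def classify(value, breaks):
    """Map a probability to its class; values equal to a break fall in the lower class."""
    return LABELS[bisect_left(breaks, value)]


# Hypothetical probability index for 1000 cells, not the modeled raster.
rng = random.Random(0)
probabilities = [rng.random() for _ in range(1000)]
breaks = quantile_breaks(probabilities)
zones = [classify(p, breaks) for p in probabilities]
print({label: zones.count(label) for label in LABELS})
```

Because the breaks are taken from the data themselves, the classes are relative: a cell labelled "very high" ranks in the top fifth of the mapped probabilities rather than exceeding a fixed probability value.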

Fig. 4. Steps of MABLR implementation.
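To make the idea behind these steps concrete, the following is a simplified, standard-library sketch of a MultiBoost-style committee (AdaBoost weight updates within sub-committees, wagging-style random re-weighting between them) that uses a weighted logistic regression as the base learner. It is not the authors' Weka configuration: the gradient-descent fit, the committee size of 12, the exponential re-weighting draws, the choice to skip a member whose weighted error exceeds 0.5, and the toy data are all assumptions; only the seed of 1 and the four sub-committees echo Table 1.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_weighted_lr(X, y, w, epochs=200, rate=0.5):
    """Weighted logistic regression by batch gradient descent.

    Returns the coefficients followed by the intercept.
    """
    d = len(X[0])
    beta = [0.0] * (d + 1)
    total = sum(w)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi, wi in zip(X, y, w):
            p = sigmoid(sum(b * v for b, v in zip(beta, xi)) + beta[d])
            for j in range(d):
                grad[j] += wi * (p - yi) * xi[j]
            grad[d] += wi * (p - yi)
        beta = [b - rate * g / total for b, g in zip(beta, grad)]
    return beta


def predict_lr(beta, xi):
    z = sum(b * v for b, v in zip(beta, xi)) + beta[-1]
    return 1 if z >= 0 else 0


def wagging_weights(n, rng):
    """Random instance weights (exponential draws), rescaled to sum to n."""
    w = [-math.log(1.0 - rng.random()) for _ in range(n)]
    s = sum(w)
    return [wi * n / s for wi in w]


def multiboost_lr(X, y, n_members=12, n_subcommittees=4, seed=1):
    rng = random.Random(seed)
    n = len(X)
    size = math.ceil(n_members / n_subcommittees)  # members per sub-committee
    w = [1.0] * n
    committee = []
    for t in range(n_members):
        if t > 0 and t % size == 0:
            w = wagging_weights(n, rng)  # wagging step: start a new sub-committee
        beta = fit_weighted_lr(X, y, w)
        wrong = [predict_lr(beta, xi) != yi for xi, yi in zip(X, y)]
        eps = sum(wi for wi, bad in zip(w, wrong) if bad) / sum(w)
        if eps > 0.5:
            w = wagging_weights(n, rng)  # simplification: skip this member
            continue
        if eps == 0.0:
            vote = math.log(1e10)  # cap the vote of a perfect member
            w = wagging_weights(n, rng)
        else:
            vote = math.log((1.0 - eps) / eps)
            # AdaBoost step: up-weight misclassified samples, down-weight the rest.
            w = [max(wi / (2.0 * eps) if bad else wi / (2.0 * (1.0 - eps)), 1e-8)
                 for wi, bad in zip(w, wrong)]
        committee.append((beta, vote))
    return committee


def predict(committee, xi):
    votes = [0.0, 0.0]
    for beta, vote in committee:
        votes[predict_lr(beta, xi)] += vote
    return 1 if votes[1] >= votes[0] else 0


# Hypothetical one-factor toy data: class 1 for positive factor values.
X = [[-2.0 + 0.1 * i] for i in range(10)] + [[1.0 + 0.1 * i] for i in range(10)]
y = [0] * 10 + [1] * 10
committee = multiboost_lr(X, y)
accuracy = sum(predict(committee, xi) == yi for xi, yi in zip(X, y)) / len(y)
print(len(committee), accuracy)
```

The members inside one sub-committee depend on each other through the boosting weights, whereas separate sub-committees start from independent random weights, which is the property that makes sub-committees candidates for parallel training.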


Fig. 5. Groundwater aquifer potential maps calculated by a) LR, b) MLP, c) SVM, and d) MABLR models.

In particular, MABLR results indicated that locations with the highest groundwater aquifer potential were mainly situated in the western and southwestern regions of the study area (Fig. 5). By contrast, very low groundwater potential was assigned to eastern and northwestern regions of the study area. Among a total of 59 productive wells, 47 were assigned to very high and high groundwater aquifer potential zones, indicating the high precision of the MABLR model.

The MABLR model was used to extract the degree of contribution of each factor (Fig. 6). The most effective parameter in determining groundwater potential was SPI, with a weight of 4.844. Variation in SPI can directly increase or decrease groundwater potential. Plane curvature and MVBF were the second and third most influential factors, with weights of 4.315 and 4.240, respectively. These factors significantly affected runoff behavior and further delineated areas of groundwater concentration. Other hydrological and morphological factors including TPI, TRI, and SL also contributed greatly to groundwater potential zones, with weights of 3.076, 3.039, and 2.537, respectively. Altitude, index, TWI, and profile curvature made moderate contributions (weights: 1.189, 0.996, 0.955, and 0.481, respectively). However, MABLR defined slope, aspect, geology, distance from fault, and LULC as the least influential factors for this study area, with weights < 0.2.

Fig. 6. The assigned weightage to each contributing factor by the MABLR model.

Most of the artificial intelligence-based models contain multiple hyper-parameters that must be precisely defined to achieve ideal results; their interactions must also be considered for model optimization (Dehnavi et al., 2015; Zare et al., 2013). The trial-and-error methods used by machine learning models to retrieve optimal hyper-parameter values are time-consuming and can introduce errors, particularly if the number of hyper-parameters exceeds four and the range of their domains is very wide (Mojaddadi Rizeei et al., 2019). Hence, automating this selection process is useful for decreasing computational time and increasing the accuracy of the final output by considering the full range of possible interactions among hyper-parameters. We adopted FLS optimization, which refines the parameter configuration at each iteration until convergence; however, increasing the number of iterations does not necessarily result in enhanced configuration. FLS determined optimal hyper-parameter values within 184 iterations. Table 1 lists the feasible and optimal hyper-parameters of the implemented models examined in this study, as determined by the FLS approach. The MABLR model was calibrated using four hyper-parameters including weight threshold, seed number, the number of subcommittees, and batch size. The well-calibrated MABLR ensemble model achieved greater bias and error reduction than MultiAdaBoost, particularly at small committee sizes.

3.2. Evaluation of the groundwater potential models

The models examined in this study were evaluated by RMSE, κ coefficient, ROC, and PRC, which reflect the efficiency, accuracy, and validity of the resulting groundwater potential aquifer maps. The greatest difference between ROC and PRC is that the ROC graph produces a greater number of true negative results (Table 2). The assessment was divided into two parts: goodness-of-fit (success) and validation (prediction). A training sample of well locations, which represented 70% of the total inventory, was used to assess the success of the model, and the testing sample included the remaining 30% of data not used during the modeling process. All metrics indicated a considerable correlation between the model results and observed data; however, the trained data were not used in the validation process. The success rate (goodness-of-fit) and prediction rate (validation) results are shown in Table 2. The RMSE results indicated that MABLR had lower error than all other models, with values of 0.2483 and 0.3003 for goodness-of-fit and validation assessment, respectively. The predictive performance of MABLR was also greater for goodness-of-fit and validation assessment in terms of κ coefficient (0.8191 and 0.7814, respectively), ROC (0.917 and 0.838), and PRC (0.931 and 0.872) (Fig. 7). By reducing bias and variance through its integration with LR, the ensemble MABLR reduced the negative effects of outliers and sampling patterns more than the other implemented models. It also showed the best capability to minimize overfitting, a common issue among ensemble models, owing to the optimized hyper-parameters of the MultiAdaBoosting; its accuracy remained stable from goodness-of-fit to validation.
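For readers who want to reproduce this kind of assessment, the sketch below computes the κ coefficient from an error matrix following Eq. (6), the RMSE between observed and predicted values following Eq. (7), and the area under the ROC curve using the pairwise-ranking formulation. The function names, the 2 × 2 error matrix, and the label and score vectors are hypothetical examples and are not taken from Table 2.

```python
import math


def kappa(matrix):
    """Kappa coefficient of a square error matrix (rows and columns are classes)."""
    r = len(matrix)
    m = sum(sum(row) for row in matrix)                    # total observations
    diagonal = sum(matrix[i][i] for i in range(r))         # correctly assigned
    row_totals = [sum(matrix[i]) for i in range(r)]
    col_totals = [sum(matrix[i][j] for i in range(r)) for j in range(r)]
    chance = sum(row_totals[i] * col_totals[i] for i in range(r))
    return (m * diagonal - chance) / (m * m - chance)


def rmse(observed, predicted):
    """Root-mean-square error between paired observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def roc_auc(labels, scores):
    """Area under the ROC curve as the share of correctly ranked (1, 0) pairs."""
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(positives) * len(negatives))


# Hypothetical values for illustration only.
error_matrix = [[20, 5], [10, 15]]
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(kappa(error_matrix), rmse(labels, scores), roc_auc(labels, scores))
```

In the hypothetical matrix, 35 of 50 samples lie on the diagonal (70% agreement) while 50% agreement is expected by chance, so κ is 0.4; three of the four productive/unproductive score pairs are ranked correctly, so the area under the ROC curve is 0.75.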

Fig. 7. The area under the ROC graph for goodness-of-fit and validation level.
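The per-model comparison reported in Table 2 can be cross-checked with a short script. The sketch below transcribes the Table 2 values into a dictionary and ranks the four models on each metric, treating RMSE as lower-is-better and the other metrics as higher-is-better. The variable and function names are hypothetical; only the numbers come from Table 2.

```python
# Values transcribed from Table 2 (goodness-of-fit and validation).
TABLE_2 = {
    "goodness_of_fit": {
        "RMSE":  {"MABLR": 0.2483, "MLP": 0.2385, "SVM": 0.3119, "LR": 0.2937},
        "kappa": {"MABLR": 0.8191, "MLP": 0.6954, "SVM": 0.6853, "LR": 0.5569},
        "ROC":   {"MABLR": 0.917,  "MLP": 0.843,  "SVM": 0.834,  "LR": 0.822},
        "PRC":   {"MABLR": 0.931,  "MLP": 0.924,  "SVM": 0.883,  "LR": 0.8685},
    },
    "validation": {
        "RMSE":  {"MABLR": 0.3003, "MLP": 0.435,  "SVM": 0.3217, "LR": 0.4704},
        "kappa": {"MABLR": 0.7814, "MLP": 0.6697, "SVM": 0.6801, "LR": 0.5401},
        "ROC":   {"MABLR": 0.838,  "MLP": 0.823,  "SVM": 0.813,  "LR": 0.745},
        "PRC":   {"MABLR": 0.872,  "MLP": 0.851,  "SVM": 0.791,  "LR": 0.8116},
    },
}

LOWER_IS_BETTER = {"RMSE"}  # every other metric is higher-is-better

def rank_models(scores, metric):
    """Return model names ordered from best to worst for one metric."""
    return sorted(scores, key=scores.get, reverse=metric not in LOWER_IS_BETTER)

rankings = {
    (phase, metric): rank_models(scores, metric)
    for phase, metrics in TABLE_2.items()
    for metric, scores in metrics.items()
}

for (phase, metric), order in rankings.items():
    print(f"{phase:16s} {metric:6s} " + " > ".join(order))
```

Ranking the tabulated values this way places MABLR first on every validation metric, while the goodness-of-fit RMSE listed for MLP (0.2385) is slightly lower than the MABLR value (0.2483).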

The MLP results for goodness-of-fit included a κ coefficient of 0.6954, ROC of 0.843, and PRC of 0.924. Slightly lower values were obtained for validation: 0.6697, 0.823, and 0.851, respectively. The RMSE values for the MLP model were 0.2385 for goodness-of-fit and 0.435 for validation, the latter ranking third among the four models. The SVM model yielded κ coefficient, ROC, and PRC values of 0.6853, 0.834, and 0.883 for goodness-of-fit and 0.6801, 0.813, and 0.791 for validation, respectively. SVM ranked third on most accuracy assessment metrics. The SVM RMSE values indicated the highest error in terms of success rate (0.3119) and the second lowest error in terms of validation rate (0.3217). LR showed the lowest overall accuracy among all models, with κ coefficient, ROC, and PRC values of 0.5569, 0.822, and 0.8685, respectively, in terms of goodness-of-fit and 0.5401, 0.745, and 0.8116 in terms of validation. RMSE values indicated that LR had a slightly higher success rate (0.2937) than SVM but the worst validation performance among all models (0.4704). In general, all models examined in this study had an acceptable amount of uncertainty and high goodness-of-fit. The well-calibrated ensemble MABLR model exhibited the highest performance for modeling groundwater aquifer potential.

Table 2 The results of goodness-of-fit and validation evaluation of all the applied models.

| Model | Goodness-of-fit RMSE | Goodness-of-fit κ coefficient | Goodness-of-fit ROC | Goodness-of-fit PRC | Validation RMSE | Validation κ coefficient | Validation ROC | Validation PRC |
| MABLR | 0.2483 | 0.8191 | 0.917 | 0.931 | 0.3003 | 0.7814 | 0.838 | 0.872 |
| MLP | 0.2385 | 0.6954 | 0.843 | 0.924 | 0.435 | 0.6697 | 0.823 | 0.851 |
| SVM | 0.3119 | 0.6853 | 0.834 | 0.883 | 0.3217 | 0.6801 | 0.813 | 0.791 |
| LR | 0.2937 | 0.5569 | 0.822 | 0.8685 | 0.4704 | 0.5401 | 0.745 | 0.8116 |

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This research was supported by the Basic Research Project of the Korea Institute of Geoscience and Mineral Resources (KIGAM) and Science and Technology Internationalization Project (NRF2016K1A3A1A09915721) funded by the Ministry of Science and ICT. The research is supported by the Centre for Advanced Modelling and Geospatial Information Systems (CAMGIS), University of Technology Sydney under grant numbers: 323930, 321740.2232335; 321740.2232424 and 321740.2232357. The English in this document has been checked by at least two professional editors, both native speakers of English. For a certificate, please see: http://www.textcheck.com/certificate/189N3i.

4. Conclusion

Sustainable groundwater aquifer management requires precise modeling to accurately and reliably simulate conditions in nature. Modeling groundwater aquifer potential is a delicate process involving the estimation of several morphological and hydrological parameters. Several techniques have been proposed for groundwater potential mapping; however, not all can be applied in data-scarce regions where bias and variance are high, as they tend toward oversimplification. Although MultiAdaBoosting is a powerful adaptive boosting classifier that can handle multiple classes even in complex recognition problems, it is sensitive to outliers, which are very common in groundwater datasets, and it is prone to overfitting. Therefore, we proposed the ensemble MABLR, which reduces bias and variance, and applied it to the Gyeongsangbuk-do basin of South Korea. Integrating MultiAdaBoosting with the LR function reduced sensitivity to outliers and to the training distribution, which tangibly reduced overfitting and lessened the dependence on hyper-parameter tuning. Several contributing factors were assessed using a dataset of specific capacity and transmissivity for 169 well locations. Initially, we applied a forward stepwise LR algorithm to identify 15 significantly contributing morphological factors: TWI, TRI, SPI, TPI, MRVBF, slope, aspect, slope length (LS), distance from the river, distance from fault, profile curvature, plan curvature, altitude, LULC, and geology. We then developed a new robust ensemble method, coupling LR with the MultiAdaBoosting technique to construct the MABLR model, which showed higher performance than other well-known machine learning methods including MLP, SVM, and standalone LR. We applied FLS to successfully retrieve optimal hyper-parameter values for the implemented models.
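The idea of coupling boosting with LR, as summarized above, can be illustrated with a minimal sketch. The code below implements plain discrete AdaBoost with a sample-weighted logistic regression as the base learner. It is a simplification under stated assumptions and not the MABLR implementation: it omits the wagging and subcommittee resets of MultiBoosting (Webb, 2000), uses none of the tuned hyper-parameters, and all function names, the toy data, the step size, and the epoch count are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_weighted_lr(X, y, w, epochs=200, step=0.5):
    """Fit LR (labels 0/1) by gradient descent on the sample-weighted log-loss."""
    coef, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        g_coef, g_bias = [0.0] * len(coef), 0.0
        for xi, yi, wi in zip(X, y, w):
            p = sigmoid(bias + sum(c * v for c, v in zip(coef, xi)))
            resid = wi * (p - yi)  # weighted gradient of the log-loss w.r.t. z
            g_bias += resid
            g_coef = [g + resid * v for g, v in zip(g_coef, xi)]
        bias -= step * g_bias
        coef = [c - step * g for c, g in zip(coef, g_coef)]
    return coef, bias

def lr_label(model, xi):
    """Hard 0/1 label from one fitted LR model."""
    coef, bias = model
    return 1 if bias + sum(c * v for c, v in zip(coef, xi)) >= 0 else 0

def boost_lr(X, y, rounds=10):
    """Discrete AdaBoost: each round refits LR on the reweighted samples."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        model = fit_weighted_lr(X, y, w)
        miss = [lr_label(model, xi) != yi for xi, yi in zip(X, y)]
        err = sum(wi for wi, m in zip(w, miss) if m)
        if err >= 0.5:
            break  # base learner is no better than chance
        alpha = 0.5 * math.log((1.0 - err) / max(err, 1e-10))
        ensemble.append((alpha, model))
        if err == 0.0:
            break  # nothing left to reweight
        # Increase the weight of misclassified samples, decrease the rest.
        w = [wi * math.exp(alpha if m else -alpha) for wi, m in zip(w, miss)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, xi):
    """Weighted vote of the LR committee members."""
    vote = sum(a * (1 if lr_label(m, xi) == 1 else -1) for a, m in ensemble)
    return 1 if vote >= 0 else 0

# Hypothetical, linearly separable toy data (two standardized factors).
X = [(-1.0, -1.0), (-1.0, -0.5), (-0.5, -1.0), (-0.8, -0.7),
     (1.0, 1.0), (1.0, 0.5), (0.5, 1.0), (0.8, 0.7)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

ensemble = boost_lr(X, y)
accuracy = sum(predict(ensemble, xi) == yi for xi, yi in zip(X, y)) / len(y)
print("committee size:", len(ensemble), "training accuracy:", accuracy)
```

On this separable toy set a single LR learner already fits the data, so boosting stops after the first round; the reweighting step only comes into play when a round misclassifies some samples, which is the situation in which outliers receive growing weights in standard boosting.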
The model results showed that MABLR had the best accuracy and efficiency based on evaluation by RMSE, κ coefficient, ROC, and PRC. The most influential contributing factors were identified as SPI, plan curvature, and MRVBF. Visual interpretation of high groundwater aquifer potential areas showed that they were located in low-elevation zones near riverbanks, whereas low potential areas were located in high-elevation areas with steep slopes. Our results will be valuable for evaluating groundwater studies and for successive model development to further reduce uncertainties and consider the morphological factors that influence the precision of groundwater potential modeling. The main limitation of this research was the moderate spatial resolution of the contributing factors, which reduced the quality of groundwater mapping. We therefore suggest using data with 1-m spatial resolution to improve the precision of the final map. Because the proposed model can handle scarce input data, we also recommend testing it in other probabilistic applications, such as landslide modeling, in which inventory datasets are small. However, the proposed model should be implemented in multiple regions to test its transferability and reliability before it can be applied to assess the vulnerability of wells.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.jhydrol.2019.124172.

References

Adiat, K., Nawawi, M., Abdullah, K., 2012. Assessing the accuracy of GIS-based elementary multi criteria decision analysis as a spatial prediction tool–a case of predicting potential zones of sustainable groundwater resources. J. Hydrol. 440, 75–89.
Aghdam, I.N., Varzandeh, M.H.M., Pradhan, B., 2016. Landslide susceptibility mapping using an ensemble statistical index (Wi) and adaptive neuro-fuzzy inference system (ANFIS) model at Alborz Mountains (Iran). Environ. Earth Sci. 75 (7), 553. https://doi.org/10.1007/s12665-015-5233-6.
Al-Abadi, A.M., Shahid, S., 2015. A comparison between index of entropy and catastrophe theory methods for mapping groundwater potential in an arid region. Environ. Monit. Assess. 187 (9), 576.
Bui, D.T., Ho, T.-C., Pradhan, B., Pham, B.-T., Nhu, V.-H., Revhaug, I., 2016. GIS-based modeling of rainfall-induced landslides using data mining-based functional trees classifier with AdaBoost, Bagging, and MultiBoost ensemble frameworks. Environ. Earth Sci. 75 (14), 1101. https://doi.org/10.1007/s12665-016-5919-4.
Botzen, W., Aerts, J., Van den Bergh, J., 2013. Individual preferences for reducing flood risk to near zero through elevation. Mitig. Adapt. Strat. Gl. 18 (2), 229–244.
Chenini, I., Mammou, A.B., El May, M., 2010. Groundwater recharge zone mapping using GIS-based multi-criteria analysis: a case study in Central Tunisia (Maknassy Basin). Water Resour. Manage. 24 (5), 921–939.
Corsini, A., Cervi, F., Ronchetti, F., 2009. Weight of evidence and artificial neural networks for potential groundwater spring mapping: an application to the Mt. Modino area (Northern Apennines, Italy). Geomorphology 111 (1–2), 79–87.
De Reu, J., Bourgeois, J., Bats, M., Zwertvaegher, A., Gelorini, V., De Smedt, P., Chu, W., Antrop, M., Maeyer, P.D., Finke, P., Meivenne, M.V., Verniers, J., Crombe, P., 2013. Application of the topographic position index to heterogeneous landscapes. Geomorphology 186, 39–49.
Dehnavi, A., Aghdam, I.N., Pradhan, B., Varzandeh, M.H.M., 2015. A new hybrid model using step-wise weight assessment ratio analysis (SWARA) technique and adaptive neuro-fuzzy inference system (ANFIS) for regional landslide hazard assessment in Iran. Catena 135, 122–148. https://doi.org/10.1016/j.catena.2015.07.020.
Falah, F., Ghorbani Nejad, S., Rahmati, O., Daneshfar, M., Zeinivand, H., 2017. Applicability of generalized additive model in groundwater potential modelling and comparison its performance by bivariate statistical methods. Geocarto Int. 32 (10), 1069–1089.
Fanos, A.M., Pradhan, B., 2019. A spatial ensemble model for rockfall source identification from high resolution LiDAR data and GIS. IEEE Access. 7, 74570–74585. https://doi.org/10.1109/ACCESS.2019.2919977.
Fawcett, T., 2006. An introduction to ROC analysis. Pattern Recogn. Lett. 27 (8), 861–874.
Gallant, J.C., Dowling, T.I., 2003. A multiresolution index of valley bottom flatness for mapping depositional areas. Water Resour. Res. 39 (12).
Ghorbani Nejad, S., Falah, F., Daneshfar, M., Haghizadeh, A., Rahmati, O., 2017. Delineation of groundwater potential zones using remote sensing and GIS-based data-driven models. Geocarto Int. 32 (2), 167–187.
Gokceoglu, C., Sonmez, H., Nefeslioglu, H.A., Duman, T.Y., Can, T., 2005. The 17 March 2005 Kuzulu landslide (Sivas, Turkey) and landslide-susceptibility map of its near vicinity. Eng. Geol. 81 (1), 65–83.
Guisan, A., Weiss, S.B., Weiss, A.D., 1999. GLM versus CCA spatial modeling of plant species distribution. Plant Ecol. 143 (1), 107–122.
Guru, B., Seshan, K., Bera, S., 2017. Frequency ratio model for groundwater potential mapping and its sustainable management in cold desert, India. J. King Saud Univ. Sci. 29 (3), 333–347.
Harun, N., Dlay, S.S., Woo, W.L., 2010. Performance of keystroke biometrics authentication system using multilayer perceptron neural network (MLP NN), Communication Systems Networks and Digital Signal Processing (CSNDSP), 2010 7th International Symposium on. IEEE. pp. 711–714.
Helaly, A.S., 2017. Assessment of groundwater potentiality using geophysical techniques in Wadi Allaqi basin, Eastern Desert, Egypt-Case study. NRIAG J. Astron. Geophys. 6 (2), 408–421.
Hong, H., Liu, J., Zhu, A.X., Shahabi, H., Pham, B.T., Chen, W., Pradhan, B., Tien Bui, D., 2017. A novel hybrid integration model using support vector machines and random subspace for weather-triggered landslide susceptibility assessment in the Wuning area (China). Environ. Earth. Sci. 76, 652. https://doi.org/10.1007/s12665-017-6981-2.
Hosmer Jr, D.W., Lemeshow, S., Sturdivant, R.X., 2013. Applied Logistic Regression. John Wiley & Sons, pp. 398.
Hyndman, R.J., Koehler, A.B., 2006. Another look at measures of forecast accuracy. Int. J. Forecast. 22 (4), 679–688.
Jothibasu, A., Anbazhagan, S., 2016. Modeling groundwater probability index in Ponnaiyar River basin of South India using analytic hierarchy process. Model. Earth Syst. Environ. 2 (3), 109.
Kaliraj, S., Chandrasekar, N., Magesh, N., 2014. Identification of potential groundwater recharge zones in Vaigai upper basin, Tamil Nadu, using GIS-based analytical hierarchical process (AHP) technique. Arab. J. Geosci. 7 (4), 1385–1401.
Kaufmann, A., 1975. Introduction to the Theory of Fuzzy Subsets. Academic Pr, pp. 2.
Kotsiantis, S.B., Zaharakis, I., Pintelas, P., 2007. Supervised machine learning: a review of classification techniques. Emerg. Artificial Intell. Appl. Comput. Eng. 160, 3–24.
Kumar, P., Bansod, B.K., Debnath, S.K., Thakur, P.K., Ghanshyam, C., 2015. Index-based groundwater vulnerability mapping models using hydrogeological settings: a critical evaluation. Environ. Impact Assess. 51, 38–49.
Lee, S., Sambath, T., 2006a. Landslide susceptibility mapping in the Damrei Romel area, Cambodia using frequency ratio and logistic regression models. Environ. Geol. 50 (6), 847–855.
Lee, S., Sambath, T., 2006b. Landslide susceptibility mapping in the Damrei Romel area, Cambodia using frequency ratio and logistic regression models. Environ. Geol. 50, 847–855.
Liao, S.-H., Chu, P.-H., Hsiao, P.-Y., 2012. Data mining techniques and applications–A decade review from 2000 to 2011. Expert Syst. Appl. 39 (12), 11303–11311.
Manap, M.A., Sulaiman, W.N.A., Ramli, M.F., Pradhan, B., Surip, N., 2013. A knowledge-driven GIS modeling technique for groundwater potential mapping at the Upper Langat Basin, Malaysia. Arab. J. Geosci. 6 (5), 1621–1637.
Marjanović, M., Kovačević, M., Bajat, B., Voženílek, V., 2011. Landslide susceptibility assessment using SVM machine learning algorithm. Eng. Geol. 123 (3), 225–234.
McFadden, D., 1974. Conditional logit analysis of qualitative choice behavior. In: Zarembka, P. (Ed.), Frontiers in Econometrics. Academic Press, New York, pp. 105–142.
Mia, M., Dhar, N.R., 2016. Prediction of surface roughness in hard turning under high pressure coolant using Artificial Neural Network. Measurement 92, 464–474.
Mogaji, K., Lim, H., Abdullah, K., 2015. Regional prediction of groundwater potential mapping in a multifaceted geology terrain using GIS-based Dempster-Shafer model. Arab. J. Geosci. 8 (5), 3235–3258.
Mojaddadi, H., Pradhan, B., Nampak, H., Ahmad, N., Ghazali, A.H.B., 2017. Ensemble machine-learning-based geospatial approach for flood risk assessment using multisensor remote-sensing data and GIS. Geomat. Nat. Haz. Risk. 8 (2), 1080–1102. https://doi.org/10.1080/19475705.2017.1294113.
Mojaddadi Rizeei, H., Pradhan, B., Saharkhiz, M.A., 2019. Urban object extraction using Dempster Shafer feature-based image analysis from worldview-3 satellite imagery. Int. J. Remote Sens. 40 (3), 1092–1119.
Mohammady, M., Pourghasemi, H.R., Pradhan, B., 2012. Landslide susceptibility mapping at Golestan Province, Iran: a comparison between frequency ratio, Dempster-Shafer, and weights-of-evidence models. J. Asian Earth Sci. 61 (15), 221–236. https://doi.org/10.1016/j.jseaes.2012.10.005.
Naghibi, S.A., Moghaddam, D.D., Kalantar, B., Pradhan, B., Kisi, O., 2017. A comparative assessment of GIS-based data mining models and a novel ensemble model in groundwater well potential mapping. J. Hydrol. 548, 471–483. https://doi.org/10.1016/j.jhydrol.2017.03.020.
Naghibi, S.A., Pourghasemi, H.R., Dixon, B., 2016. GIS-based groundwater potential mapping using boosted regression tree, classification and regression tree, and random forest machine learning models in Iran. Environ. Monit. Assess. 188 (1), 44.
Naghibi, S.A., Dolatkordestani, M., Rezaei, A., Amouzegari, P., Heravi, M.T., Kalantar, B., Pradhan, B., 2019. Application of rotation forest with decision trees as base classifier and a novel ensemble model in spatial modeling of groundwater potential. Environ. Monit. Assess. 191 (4), 248.
Naghibi, S., Vafakhah, M., Hashemi, H., Pradhan, B., Alavi, S., 2018. Groundwater augmentation through the site selection of floodwater spreading using a data mining approach (case study: Mashhad plain, Iran). Water 10 (10), 1405.
Nampak, H., Pradhan, B., Manap, M.A., 2014. Application of GIS based data driven evidential belief function model to predict groundwater potential zonation. J. Hydrol. 513, 283–300.
Oh, H.-J., Kim, Y.-S., Choi, J.-K., Park, E., Lee, S., 2011. GIS mapping of regional probabilistic groundwater potential in the area of Pohang City, Korea. J. Hydrol. 399 (3–4), 158–172.
Ozdemir, A., 2011. GIS-based groundwater spring potential mapping in the Sultan Mountains (Konya, Turkey) using frequency ratio, weights of evidence and logistic regression methods and their comparison. J. Hydrol. 411 (3–4), 290–308.
Pradhan, B., 2010. Remote sensing and GIS-based landslide hazard analysis and cross-validation using multivariate logistic regression model on three test areas in Malaysia. Adv. Space Res. 45 (10), 1244–1256.
Pradhan, B., Lee, S., 2010. Regional landslide susceptibility analysis using back-propagation neural network model at Cameron Highland, Malaysia. Landslides 7 (1), 13–30. https://doi.org/10.1007/s10346-009-0183-2.
Rahmati, O., Melesse, A.M., 2016. Application of Dempster-Shafer theory, spatial analysis and remote sensing for groundwater potentiality and nitrate pollution analysis in the semi-arid region of Khuzestan, Iran. Sci. Total Environ. 568, 1110–1123.
Rahmati, O., Moghaddam, D.D., Moosavi, V., Kalantari, Z., Samadi, M., Lee, S., Tien Bui, D., 2019. An automated python language-based tool for creating absence samples in groundwater potential mapping. Remote Sens. 11 (11), 1375.
Rahmati, O., Naghibi, S.A., Shahabi, H., Bui, D.T., Pradhan, B., Azareh, A., Melesse, A.M., 2018. Groundwater spring potential modelling: comprising the capability and robustness of three different modeling approaches. J. Hydrol. 565, 248–261.
Rahmati, O., Pourghasemi, H.R., Melesse, A.M., 2016. Application of GIS-based data driven random forest and maximum entropy models for groundwater potential mapping: a case study at Mehran Region, Iran. Catena 137, 360–372.
Razandi, Y., Pourghasemi, H.R., Neisani, N.S., Rahmati, O., 2015. Application of analytical hierarchy process, frequency ratio, and certainty factor models for groundwater potential mapping using GIS. Earth Sci. Inform. 8 (4), 867–883.
Refsgaard, J.C., van der Sluijs, J.P., Højberg, A.L., Vanrolleghem, P.A., 2007. Uncertainty in the environmental modelling process–a framework and guidance. Environ. Model. Softw. 22 (11), 1543–1556.
Ridd, M.K., Liu, J., 1998. A comparison of four algorithms for change detection in an urban environment. Remote Sens. Environ. 63 (2), 95–100.
Rizeei, H.M., Azeez, O.S., Pradhan, B., Khamees, H.H., 2018a. Assessment of groundwater nitrate contamination hazard in a semi-arid region by using integrated parametric IPNOA and data-driven logistic regression models. Environ. Monit. Assess. 190 (11), 633.
Rizeei, H.M., Pradhan, B., Saharkhiz, M.A., 2017. Surface Runoff Estimation and Prediction Regarding LULC and Climate Dynamics Using Coupled LTM, Optimized ARIMA and Distributed-GIS-Based SCS-CN Models at Tropical Region, GCEC 2017. Springer, pp. 1103–1126.
Rizeei, H.M., Pradhan, B., Saharkhiz, M.A., 2018b. An integrated fluvial and flash pluvial model using 2D high-resolution sub-grid and particle swarm optimization-based random forest approaches in GIS. Complex Intell. Syst. 1–20.
Rizeei, H.M., Pradhan, B., Saharkhiz, M.A., 2018c. Surface runoff prediction regarding LULC and climate dynamics using coupled LTM, optimized ARIMA, and GIS-based SCS-CN models in tropical region. Arab. J. Geosci. 11 (3), 53.
Rizeei, H.M., Saharkhiz, M.A., Pradhan, B., Ahmad, N., 2016. Soil erosion prediction based on land cover dynamics at the Semenyih watershed in Malaysia using LTM and USLE models. Geocarto Int. 31 (10), 1158–1177.
Sameen, M.I., Pradhan, B., Lee, S., 2018. Self-learning random forests model for mapping groundwater yield in data-scarce areas. Nat. Resour. Res. 1–19.
Shirzadi, A., Saro, L., Joo, O.H., Chapi, K., 2012. A GIS-based logistic regression model in rock-fall susceptibility mapping along a mountainous road: salavat Abad case study, Kurdistan, Iran. Nat. Hazards. 64, 1639–1656.
Smith, A.J., Walker, G., Turner, J., 2010. Aquifer Sustainability Factor: A Review of Previous Estimates. International Association of Hydrogeologists (AIH) and the Geological Society of Australia (GSA), pp. EP104589.
Tehrany, M.S., Pradhan, B., Jebur, M.N., 2013. Spatial prediction of flood susceptible areas using rule based decision tree (DT) and a novel ensemble bivariate and multivariate statistical models in GIS. J. Hydrol. 504, 69–79.
Tehrany, M.S., Pradhan, B., Jebur, M.N., 2014. Flood susceptibility mapping using a novel ensemble weights-of-evidence and support vector machine models in GIS. J. Hydrol. 512, 332–343.
Tong, D., Murray, A.T., 2012. Spatial optimization in geography. Ann. Assoc. Am. Geogr 102 (6), 1290–1309.
Webb, G.I., 2000. Multiboosting: a technique for combining boosting and wagging. Mach. Learn. 40 (2), 159–196.
Woo, M.W., Daud, W.R.W., Tasirin, S.M., Talib, M.Z.M., 2007. Optimization of the spray drying operating parameters—A quick trial-and-error method. Dry Technol. 25 (10), 1741–1747.
Yao, X., Tham, L., Dai, F., 2008. Landslide susceptibility mapping based on support vector machine: a case study on natural slopes of Hong Kong, China. Geomorphology 101 (4), 572–582.
Yin, H., Shi, Y., Niu, H., Xie, D., Wei, J., Lefticariu, L., Xu, S., 2018. A GIS-based model of potential groundwater yield zonation for a sandstone aquifer in the Juye Coalfield, Shangdong, China. J. Hydrol. 557, 434–447.
Youssef, A.M., Al-Kathery, M., Pradhan, B., 2015. Landslide susceptibility mapping at Al-Hasher area, Jizan (Saudi Arabia) using GIS-based frequency ratio and index of entropy models. Geosci. J. 19 (1), 113–134. https://doi.org/10.1007/s12303-014-0032-8.
Zabihi, M., Pourghasemi, H.R., Pourtaghi, Z.S., Behzadfar, M., 2016. GIS-based multivariate adaptive regression spline and random forest models for groundwater potential mapping in Iran. Environ. Earth Sci. 75 (8), 665.
Zare, M., Pourghasemi, H.R., Vafakhah, M., Pradhan, B., 2013. Landslide susceptibility mapping at Vaz Watershed (Iran) using an artificial neural network model: a comparison between multilayer perceptron (MLP) and radial basic function (RBF) algorithms. Arab. J. Geosci. 6 (8), 2873–2888.
Zhang, Y., Maxwell, T., Tong, H., Dey, V., 2010. Development of a supervised software tool for automated determination of optimal segmentation parameters for ecognition. ISPRS TC VII Symposium – 100 Years ISPRS, Vienna, Austria.
