Journal Pre-proof Integrated machine learning methods with resampling algorithms for flood susceptibility prediction
Esmaeel Dodangeh, Bahram Choubin, Ahmad Najafi Eigdir, Narjes Nabipour, Mehdi Panahi, Shahaboddin Shamshirband, Amir Mosavi
PII: S0048-9697(19)35978-9
DOI: https://doi.org/10.1016/j.scitotenv.2019.135983
Reference: STOTEN 135983
To appear in: Science of the Total Environment
Received date: 6 October 2019
Revised date: 5 December 2019
Accepted date: 5 December 2019
Please cite this article as: E. Dodangeh, B. Choubin, A.N. Eigdir, et al., Integrated machine learning methods with resampling algorithms for flood susceptibility prediction, Science of the Total Environment (2019), https://doi.org/10.1016/j.scitotenv.2019.135983
This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
© 2019 Published by Elsevier.
Integrated machine learning methods with resampling algorithms for flood susceptibility prediction
Esmaeel Dodangeh a, Bahram Choubin b, Ahmad Najafi Eigdir b, Narjes Nabipour c, Mehdi Panahi d, Shahaboddin Shamshirband e,f,*, Amir Mosavi g,h
a Department of Watershed Management, Sari Agricultural Sciences and Natural Resources University, P.O. Box 737, Sari, Iran
b Soil Conservation and Watershed Management Research Department, West Azarbaijan Agricultural and Natural Resources Research and Education Center, AREEO, Urmia, Iran
c Institute of Research and Development, Duy Tan University, Da Nang 550000, Vietnam
d Department of Geophysics, Young Researchers and Elites Club, North Tehran Branch, Islamic Azad University, Tehran, Iran
e Department for Management of Science and Technology Development, Ton Duc Thang University, Ho Chi Minh City, Vietnam
f Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam
g Kalman Kando Faculty of Electrical Engineering, Obuda University, Budapest, Hungary
h School of the Built Environment, Oxford Brookes University, Oxford OX30BP, UK
* Corresponding author, Email: [email protected]
Abstract
Flood susceptibility projections that rely on standalone models, with one-time train-test data splitting for model calibration, yield biased results. This study proposes novel integrative flood susceptibility prediction models based on multi-time resampling approaches, namely the random subsampling (RS) and bootstrapping (BT) algorithms, integrated with machine learning models: generalized additive model (GAM), boosted regression tree (BRT) and multivariate adaptive regression splines (MARS). The RS and BT algorithms provided 10 runs of data resampling for learning and validation of the models. The mean of the 10 runs of predictions was then used to produce the flood susceptibility maps (FSM). This methodology was applied to Ardabil Province, on the coastal margins of the Caspian Sea, which has faced destructive floods. The area under the curve (AUC) of the receiver operating characteristic (ROC), the true skill statistic (TSS) and the correlation coefficient (COR) were used to evaluate the predictive accuracy of the proposed models. Results demonstrated that the resampling algorithms improved the performance of the standalone GAM, MARS and BRT models. Results also revealed that the standalone models performed better with the BT algorithm than with the RS algorithm. The BT-GAM model attained the best performance in terms of statistical measures (AUC = 0.98, TSS = 0.93, COR = 0.91), followed by the BT-MARS model (AUC = 0.97, TSS = 0.91, COR = 0.91) and the BT-BRT model (AUC = 0.95, TSS = 0.79, COR = 0.79). The proposed models outperformed the benchmark models, namely the standalone GAM, MARS and BRT models, the multilayer perceptron (MLP) and the support vector machine (SVM). Given the strong performance of the proposed models over a large-scale area, promising results can be expected from these models in other regions.
Keywords: Resampling approach; Random subsampling; Bootstrapping; Flood susceptibility; Machine learning
1. Introduction
Floods are among the most devastating natural disasters; they occur on large scales and their post-disaster effects persist for a long time. The Global Assessment Report on Disaster Risk Reduction (UNISDR, 2015) indicated that flood mortality risk has increased in the Middle East and North Africa (MNA) countries due to population growth (Haghizadeh et al., 2017) and human manipulation of nature through deforestation and rangeland degradation. The flood mortality rate in MNA countries has increased by 11% compared to the last century (UNISDR, 2015; Ghomian and Yousefian, 2017). During the past 30 years, more than 80% of natural disaster events occurred in 9 countries: Afghanistan, Pakistan, Iran, Sudan, Somalia, Algeria, Morocco, Yemen and Egypt (UNISDR, 2011). The causes of flooding vary in different parts of the world. For example, the Queensland floods are river floods that occur when the soil cannot absorb the water, so that the excess water flows through the river channels and causes flooding (Chanson et al., 2014). In Bangladesh, monsoon rains are the main cause of flooding, which occurs as a result of the movement of hurricanes from the sea to the coast (Brammer, 1990). Coastal floods in Europe are caused by Atlantic storms that push water to the coast; if this process is accompanied by intense tides, it can cause devastating floods (UNISDR, 2015).
While two-thirds of Iran struggles with water scarcity (Rahmati et al., 2018), floods have been one of the most destructive natural disasters in the country over the past few years (Vaghefi et al., 2019). Most flooding events in Iran are human-induced: they occur as a result of human interference with nature through changes to flood routes, land use change, and urban development along dry river beds without the creation of new routes for floodwater to pass (Jamali et al., 2015). In the past two years alone, 112 people have died in flooding events and 45,000 homes have been completely destroyed. The flood events have also damaged municipal facilities, agricultural lands, and roads (Habibian, 2018). Although flood prevention is not entirely feasible, flood hazard zoning to determine the most flood-vulnerable areas, together with the adoption of flood-proofing practices, can noticeably reduce flood damage.
Physically-based hydrologic models such as the Hydrologic Engineering Center-Hydrologic Modeling System (HEC-HMS) (Feldman, 2000), the Soil and Water Assessment Tool (SWAT) (Arnold et al., 1998), IHACRES (Croke et al., 2005), and the HSPF model (Bicknell et al., 1997) have been used in flood modeling studies. However, the use of such models requires extensive field measurements and exhaustive parameterization (Fenicia et al., 2008). Moreover, they provide only at-site estimates of flood hazard, based on the local streamflow data recorded at hydrometric gauging stations, and thus are not suitable for flood assessment at regional scales (Li et al., 2011; Tien Bui et al., 2016).
Flood susceptibility mapping (FSM) with the support of geographic information systems (GIS) makes it possible to predict future flood occurrence and so mitigate the resulting human and socio-economic losses (Wang et al., 2019). GIS techniques have made a significant contribution to flood susceptibility modeling studies by providing geostatistical tools for handling large amounts of spatial data (Tien Bui et al., 2016; Wang et al., 2019). Various statistical and data-driven techniques, along with GIS techniques, have been proposed and applied in the literature for identifying flood-susceptible areas. Commonly used approaches include the analytical hierarchy process (AHP) (Chen et al., 2011; Kazakis et al., 2015; Luu et al., 2018; Tang et al., 2018), frequency ratio (FR) (Lee et al., 2012; Rahmati et al., 2016; Samanta et al., 2018; Siahkamari et al., 2018; Tehrany et al., 2015a) and weights-of-evidence (Rahmati et al., 2016). However, these approaches have drawbacks for flood susceptibility map (FSM) production. For example, the results of the AHP model are subject to uncertainties due to ambiguous judgments (Miles and Snow, 1984), and the FR method is greatly dependent on the sample size (Sajedi-Hosseini et al., 2018b).
To cope with these issues, machine learning (ML) approaches including artificial neural networks (ANNs), adaptive neuro-fuzzy inference systems (ANFIS), genetic algorithms (GA), support vector machines (SVM), and tree-based models have been proposed and extended for flood susceptibility prediction (Bui et al., 2019, 2018b, 2018a; Chapi et al., 2017; Khosravi et al., 2019, 2018; Kia et al., 2012; Seckin et al., 2013; Tehrany et al., 2015b, 2013; Wang et al., 2019; Zhao et al., 2018). In this study, we used three ML models that have rarely been applied to flood susceptibility modeling: multivariate adaptive regression splines (MARS), boosted regression trees (BRT) and the generalized additive model (GAM). Some studies indicated a superior performance of the MARS model over other artificial intelligence (AI) models such as ANN, SVM, ANFIS and the M5 model tree (Mosavi et al., 2018; Rezaie-balf et al., 2017). Boosted regression trees (BRT) are robust ensemble models that combine the advantages of regression tree models and the boosting method (Elith et al., 2008). Generalized additive models (GAMs), which have a high potential for fitting complex and non-linear environmental data (Mitchell Lyons, 2018), combine the benefits of generalized linear models (GLMs) and additive models and are also considered in this study.
For any predictive model, two sets of data are considered. The first set is used for learning the model, and the second set, which is statistically independent (Araújo et al., 2005), is used for evaluating it. Data splitting is commonly used for partitioning the data into learning and validation subsets. Most investigations using ML models for flood susceptibility prediction, in either standalone or hybrid versions, relied on one-time learning and validation phases for producing the flood susceptibility maps. However, this could bias the parameter estimation and subsequently the FSM predictions of the models used. To alleviate this issue, the current study employed two resampling algorithms, bootstrapping (BT) and random subsampling (RS) (Picard and Cook, 1984; Politis et al., 1999; Steven Abney, 2002; Wu, 1986; Hastie et al., 2009), along with ML models for FSM prediction. The BT and RS algorithms sample the original data with and without replacement, respectively, using samples randomly drawn in B iterations. The RS algorithm randomly splits the data into training and validation sets B times without replacement. The BT algorithm draws a sample of the same length as the original data as a training set for learning the models; the data that do not contribute to model learning are used for validation. The performance of ML models depends strongly on the data used for model learning. Using resampling algorithms combined with ML models as integrative models (for multi-time model learning) draws on more of the information in the data and thus yields less biased results.
The present study proposes novel integrative models that couple resampling algorithms with ML models, in order to provide more reliable flood susceptibility predictions than a one-time train-test split, with a case study in Ardabil Province near the coastal margins of the Caspian Sea. The main objectives of this research are: (i) to evaluate the performance of three rarely used ML models, namely the generalized additive model (GAM), boosted regression tree (BRT), and multivariate adaptive regression splines (MARS), for flood susceptibility prediction, (ii) to explore the effects of resampling methods (bootstrapping, BT, and random subsampling, RS) on the performance of ML models by comparing the standalone ML models (GAM, BRT and MARS) with the new integrative models, hereafter called BT-GAM, BT-BRT, BT-MARS, RS-GAM, RS-BRT and RS-MARS, and (iii) to predict the flood susceptibility map using the new integrative models in order to identify the most susceptible areas over the study region.
2. Materials and methods
2.1. Study area
The study area is Ardabil Province, located in northwest Iran. The district, covering an area of about 17,953 km2 (1% of the area of the country), is bounded by the Talesh Mountains in the east and the Azerbaijan Plateau in the west (Tavoosi and Delara, 2010) (Fig. 1). Around 75% of this area has mountainous topography, with a maximum elevation of 4,811 m above mean sea level at Sabalan Mountain. As a consequence of its complex topography and numerous lakes, the region has very diverse climates: arid and semi-arid in the north, and Mediterranean and semi-humid in the south of the district. Based on analysis of historical data for 1988-2016, the mean annual rainfall is about 480 mm (Iran Meteorological Organization, 2019). Rainfall varies spatially between 222 mm in the north and 1810 mm in the southeast of the district. The mean air temperature in the study region varies between 7.9 and 15.2 °C among the climate stations (Tavoosi and Delara, 2010).
Due to its proximity to the coastal margins of the Caspian Sea, the study region is exposed to destructive floods. According to the Ardabil Province crisis management organization (2016), more than 37% of disasters in Ardabil Province are related to floods. Over the past three years, the average flood damage was estimated at around $900,000 per year. The total losses of the last flood event in February 2019 reached $370,000 (Samani, 2019).
Fig. 1 SOMEWHERE HERE
2.2. Data used
2.2.1. Flood inventory map
The flood inventory map, a map showing the locations of historical floods, is a prerequisite for spatial modeling of flood susceptibility (Tehrany et al., 2014). It shows the past records of flood occurrence in an area and can be used to estimate future flood events by analyzing the relationships between past flood events and their environmental conditioning factors. In the current research, the flood inventory map was constructed from ground control points at 147 historical flood locations, identified from existing reports and field surveys (Fig. 1). An equal number of 147 locations was selected as non-flood points (Fig. 1).
2.2.2. Preparation of geospatial database (Flood influencing factors)
Although there is no universal guideline for selecting flood influencing factors (Azareh et al., 2019; Hosseini et al., 2020), nine variables were identified according to the literature (Khosravi et al., 2016a; Rahmati et al., 2016; Choubin et al., 2019b) and the available data: elevation, slope, aspect, curvature, distance to stream, rainfall, normalized difference vegetation index (NDVI), land use, and lithology. The spatial variability of these factors over the study region is illustrated in Fig. 2. All input variables were converted to raster format with a 30 × 30 m spatial resolution in the ArcGIS interface.
2.2.2.1. Elevation
Elevation is one of the most influential parameters in the flood inundation of an area (Tehrany et al., 2014; Choubin et al., 2019b). It has been frequently used as a relief indicator at large scales (Rahmati et al., 2018). There is an inverse relationship between elevation and flooding. Li et al. (2011) indicated that regions with lower elevation are more prone to flooding. Cea and Bladé (2015) pointed out that flooding at lower elevations is due to water flowing down from higher elevations. The elevation map was prepared from a digital elevation model (DEM) with 30 × 30 m resolution in the ArcGIS interface (Fig. 2a).
2.2.2.2. Slope
Slope is a physiographic characteristic that greatly affects runoff volume and velocity, both of which increase with the slope gradient (Khosravi et al., 2016b; Tien Bui et al., 2018a). As the slope gradient increases, the infiltration rate decreases and a large amount of runoff enters the drainage network (Tehrany et al., 2015a). Slope in the region ranges from 0 to 88 degrees (Fig. 2b).
2.2.2.3. Aspect
Aspect is also important in flood studies (Choubin et al., 2019b). Sunny slopes are less humid and less likely to be flooded. Yates et al. (2002) showed that the hydrologic response unit is highly influenced by slope aspect. Rahmati et al. (2016) also demonstrated that soil moisture content and local climatic conditions are influenced by slope aspect. In this study, aspect was classified into nine categories, as illustrated in Fig. 2c.
2.2.2.4. Curvature
Curvature is also an important flood conditioning factor that affects heterogeneity and hyporheic exchange (Cardenas et al., 2004). Flat and concave areas are more prone to flooding (Tehrany et al., 2014, 2015a). In the present study, curvature was calculated from the DEM in ArcGIS (Fig. 2d).
2.2.2.5. Distance to stream
The smaller the distance from a river or stream, the greater the risk of flooding. Predick and Turner (2008) emphasized that proximity to the river is a serious factor in flooding. Tien Bui et al. (2018) and Darabi et al. (2019) also observed that a great number of floods occurred in areas adjacent to the river. As illustrated in Fig. 2e, the distance from the river ranges between 0 and 9560 m.
2.2.2.6. Rainfall
Rainfall is a key influencing factor in flood susceptibility mapping and is widely discussed in the literature (Tehrany et al., 2015a; Tien Bui et al., 2017). In this research, the mean annual rainfall map was calculated from long-term precipitation data (1988-2016) at 32 rain gauge stations using the inverse distance weighting (IDW) method in ArcGIS. Rainfall data were obtained from the Iran Meteorological Organization (IRIMO) and the Iranian Department of Water Resources Management Company (IDWRMC). The annual rainfall varies between 222 and 1810 mm, increasing from the northwest towards the southeastern coastal regions of the province (Fig. 2f).
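The inverse distance weighting step can be illustrated with a short sketch. It is not the ArcGIS routine used in the study: the function name `idw_interpolate`, the exponent `power=2` and the coordinates in the usage note are assumptions made only for illustration. Each query cell receives a weighted mean of the station values, with weights proportional to 1/distance^power.

```python
import numpy as np

def idw_interpolate(station_xy, station_values, query_xy, power=2.0):
    """Inverse distance weighting: weighted mean of station values,
    with weights 1 / distance**power (power=2 is an assumed default)."""
    station_xy = np.asarray(station_xy, dtype=float)
    station_values = np.asarray(station_values, dtype=float)
    query_xy = np.asarray(query_xy, dtype=float)

    # Pairwise distances, shape (n_query, n_station)
    diff = query_xy[:, None, :] - station_xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    # Avoid division by zero where a query coincides with a station
    exact = dist == 0.0
    safe_dist = np.where(exact, 1.0, dist)
    weights = 1.0 / safe_dist ** power
    estimates = (weights * station_values).sum(axis=1) / weights.sum(axis=1)

    # A query located exactly on a station takes that station's value
    has_exact = exact.any(axis=1)
    estimates[has_exact] = station_values[exact.argmax(axis=1)[has_exact]]
    return estimates
```

For example, with two hypothetical gauges at (0, 0) and (10, 0) recording 200 mm and 400 mm, `idw_interpolate([[0, 0], [10, 0]], [200, 400], [[5, 0]])` gives the midpoint the average of the two values, because both stations are equally distant.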
2.2.2.7. Normalized difference vegetation index (NDVI)
The NDVI is a simple graphic indicator that is used in remote sensing analyses for the assessment of vegetation attributes in a region (Sajedi-Hosseini et al., 2018a). As there is an inverse relationship between vegetation density and flooding (Tehrany et al., 2013; Kumar and Acharya, 2016), this index can be useful in preparing the flood susceptibility map. The NDVI map of the study region (Fig. 2g) was calculated and mapped using the following equation (Tucker and Seller, 1986):
$$\mathrm{NDVI} = \frac{NIR - R}{NIR + R} \qquad (1)$$
where NIR is the near-infrared portion of the electromagnetic spectrum (0.76-0.90 μm) and R is the red portion of the electromagnetic spectrum (0.63-0.69 μm).
2.2.2.8. Land use
Land use is one of the most important hydrologic variables; it directly affects runoff volume and velocity (Yalcin et al., 2011) and sediment transport (Benito et al., 2010). Several investigations have emphasized the importance of the land use pattern for flooding (Benito et al., 2010; Beckers et al., 2013; García Ruiz et al., 2008; Karlsson et al., 2017; Komolafe et al., 2018). The land use map in this study, for the year 2012, was obtained from the IDWRMC (Fig. 2h).
2.2.2.9. Lithology
Lithology also has a significant influence on runoff volume and speed, because it controls the infiltration rate and sediment transport. Sediment transport, which is one of the flood-related components, is affected by the erodibility of the geological formations. Lee et al. (2012) indicated that different geology units have different susceptibility to flooding. Lithology also affects channel shape during flood events (Reneau, 2000; Heitmuller et al., 2015). The geology map of the study region, obtained from the IDWRMC, consists of 31 geology units (Fig. 2i).
Fig. 2 SOMEWHERE HERE
2.3. Machine learning models
2.3.1. Generalized additive model (GAM)
GAMs, also known as “wiggly models” (Mitchell Lyons, 2018), were introduced by Hastie and Tibshirani (1990) as an integration of the generalized linear model (GLM) with additive models. In a GAM, the linear relationship between the dependent and independent variables is replaced by non-linear smooth functions (Jones and Wrigley, 1995). GAMs use an additive approach in which a suitable functional form is selected based on the data, without prior knowledge of the functional aspects of the model (Jones and Almond, 1992). Environmental data rarely follow simple linear models and are usually best explained by GAMs. GAMs were originally developed to deliver the benefits of GLMs and additive models in a single model (Hastie and Tibshirani 1999). They are nonparametric extensions of GLMs, and their main advantage over GLMs is the capability of fitting complex and non-linear relationships. The response variable (Y) in GAMs does not necessarily follow the normal distribution; it can be fitted by a variety of distributions such as the Poisson or binomial distributions. A link function (g) is used to connect the response variable with the predictors to generate the predictions:
$$g(E(Y)) = \beta_0 + f_1(x_1) + f_2(x_2) + \dots + f_m(x_m) \qquad (1)$$
where E(Y) is the expected value of the response variable, β0 is the intercept, and f1(x1),…,fm(xm) are smooth functions of the predictors x1,…,xm. Floods are complex natural phenomena, and the relationship between flood occurrence and the environmental conditioning factors is nonlinear. Owing to their predictive power and ability to model non-linear relationships, GAMs are appropriate models for flood susceptibility prediction and are thus considered in this study.
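How the link function connects an additive predictor to a probability can be shown with a small sketch. It assumes a binomial response with a logit link and uses fixed, hand-written smooth functions; in a fitted GAM these functions would be estimated from the data. The names `gam_predict` and `example_smooths`, and the shapes of the two smooths, are hypothetical.

```python
import numpy as np

def inverse_logit(eta):
    """Inverse of the logit link: maps the additive predictor to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-eta))

def gam_predict(beta0, smooths, X):
    """Return E(Y) = g^{-1}(beta0 + f_1(x_1) + ... + f_m(x_m)) for a logit link.

    smooths is a list of callables, one per column of X.
    """
    X = np.asarray(X, dtype=float)
    eta = np.full(X.shape[0], float(beta0))
    for j, f in enumerate(smooths):
        eta += f(X[:, j])          # add the contribution of predictor j
    return inverse_logit(eta)

# Hypothetical smooths (not fitted): susceptibility falls with elevation (km)
# and decays with distance to stream (km).
example_smooths = [
    lambda elevation_km: -1.5 * elevation_km,
    lambda distance_km: 2.0 * np.exp(-distance_km),
]
```

With `beta0 = 0`, a cell at zero elevation and zero distance to stream has an additive predictor of 2, so `gam_predict(0.0, example_smooths, [[0.0, 0.0]])` returns the inverse logit of 2, roughly 0.88.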
2.3.2. Multivariate adaptive regression splines (MARS)
Friedman (1991) proposed the MARS model as a flexible method for analyzing multi-dimensional experimental data. As a nonparametric model, MARS has great potential for modeling complex nonlinear processes (Friedman, 1991; Zhang and Goh, 2016) and for producing simple, easier-to-interpret piecewise linear models (Zhang and Goh, 2016). MARS modeling follows a “divide and conquer” strategy: it partitions the training data into subdomains and fits separate piecewise linear models (splines) with varied gradients (Zhang and Goh, 2016). Splines, also called basis functions (BFs), are connected through a network of knots that are randomly positioned within the range of the input variables. MARS models are built in a two-step process. In the forward stepwise step, MARS generates the BFs and places candidate knots to capture the interactions between all available variables; the knot locations are optimized by the adaptive regression algorithm. In the backward step, surplus BFs are pruned to eliminate those that contribute least. The general structure of the MARS models is as follows:
$$y = \beta_0 + \sum_{m=1}^{M} \beta_m \lambda_m(X) \qquad (2)$$
where y represents the dependent variable, β0 represents the constant coefficient, M represents the number of BFs, λm represents the m-th basis function with βm as its corresponding coefficient, and X represents the predictors (flood conditioning factors in the current study). MARS models make no assumptions regarding the data distributions or the nonlinear associations between the predictors and the dependent variable (Kisi et al., 2019), which makes them capable of modeling complicated nonlinear processes such as flood occurrence. More technical information about the MARS models can be found in Hastie et al. (2009) and Put et al. (2004).
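The additive structure of a MARS model can be sketched with hinge-shaped basis functions, the piecewise linear form commonly used in MARS. This is an evaluation-only illustration: the knots and coefficients are supplied by hand rather than found by the forward and backward steps, and the names `hinge` and `mars_predict` as well as the term encoding are assumptions.

```python
import numpy as np

def hinge(x, knot, direction=1):
    """Hinge basis: max(0, x - knot) for direction=+1, max(0, knot - x) for -1."""
    return np.maximum(0.0, direction * (np.asarray(x, dtype=float) - knot))

def mars_predict(beta0, terms, X):
    """Evaluate y = beta0 + sum_m beta_m * lambda_m(X).

    Each term is (beta_m, factors); factors is a list of
    (column, knot, direction) tuples whose hinges are multiplied,
    so a term with two factors represents an interaction.
    """
    X = np.asarray(X, dtype=float)
    y = np.full(X.shape[0], float(beta0))
    for beta_m, factors in terms:
        basis = np.ones(X.shape[0])
        for column, knot, direction in factors:
            basis *= hinge(X[:, column], knot, direction)
        y += beta_m * basis
    return y
```

For instance, `mars_predict(1.0, [(2.0, [(0, 3.0, 1)]), (-1.0, [(0, 3.0, -1)])], [[5.0], [1.0], [3.0]])` describes a line that changes gradient at the knot x = 3 and evaluates to 5, -1 and 1 at the three points.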
2.3.3. Boosted regression trees (BRT)
Freund and Schapire (1996) first introduced boosted regression trees (BRT) as a powerful data mining algorithm for continuous and categorical input variables. BRT are ensemble models that combine the advantages of regression tree models and the boosting method (Elith et al., 2008). Constructing multiple regression models, instead of fitting a single simple model, improves the predictive performance of BRT models (Schapire, 2003). Two algorithms are used for handling the nonlinear relationships between the response variable and the predictors: i) regression trees map out the associations between the response variable and the predictors through a recursive binary split procedure, and ii) boosting combines many simple tree models to increase model performance. In BRT models, tree models are fitted successively, with each subsequent tree fitted to the residuals of the preceding model (Knoll et al., 2019). Despite numerous advantages, such as accepting input variables of different types and insensitivity to outliers, BRT models produce biased results when data records are too short (Jin et al., 2018). The resampling approaches described in the following section are adopted in the present study to improve the predictive performance of this model.
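The idea of fitting each new tree to the residuals of the preceding model can be illustrated with a deliberately small sketch: one predictor, squared-error loss, and one-split trees (stumps). It is not the gbm implementation used in the study; the function names, the number of trees and the learning rate are assumptions, and the predictor is assumed to take at least two distinct values.

```python
import numpy as np

def fit_stump(x, residual):
    """Find the single binary split of x that minimizes squared error."""
    best = None
    for threshold in np.unique(x)[:-1]:      # candidate split points
        left = x <= threshold
        left_mean = residual[left].mean()
        right_mean = residual[~left].mean()
        sse = ((residual[left] - left_mean) ** 2).sum() \
            + ((residual[~left] - right_mean) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def boost(x, y, n_trees=50, learning_rate=0.1):
    """Fit stumps one after another, each on the current residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    base = y.mean()
    prediction = np.full(y.shape, base)
    stumps = []
    for _ in range(n_trees):
        residual = y - prediction            # what the ensemble still misses
        threshold, left_mean, right_mean = fit_stump(x, residual)
        prediction += learning_rate * np.where(x <= threshold, left_mean, right_mean)
        stumps.append((threshold, left_mean, right_mean))
    return base, learning_rate, stumps

def boost_predict(model, x):
    """Sum the shrunken contributions of all stumps."""
    base, learning_rate, stumps = model
    x = np.asarray(x, dtype=float)
    prediction = np.full(x.shape, base)
    for threshold, left_mean, right_mean in stumps:
        prediction += learning_rate * np.where(x <= threshold, left_mean, right_mean)
    return prediction
```

The learning rate shrinks each tree's contribution, so many small corrections accumulate instead of one tree dominating the fit.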
2.4. Resampling approaches
Any predictive machine learning model needs to have its parameters tuned before it is used to make predictions. The more representative a sample is of the population, the less biased the parameters of the fitted model and its predictions are. This matters greatly when handling spatial data with a high degree of variability, such as flood susceptibility. There are several categories of effective resampling techniques, such as jackknife resampling, cross-validation, nonparametric bootstrapping and random subsampling (Fox, 2002). The current study focuses on the last two methods. These algorithms can be applied with any loss function and any nonlinear modeling approach (Hastie et al., 2009).
2.4.1. Bootstrapping algorithm
The bootstrapping algorithm introduced by Efron (1979) is a statistical inference method that builds a sampling distribution by resampling the original data (Fox, 2002). The basic idea of this algorithm is to treat the sample data as a population from which several samples can be drawn. Let Z = (z1, z2, …, zN), where zi = (xi, yi), with xi denoting the explanatory variables (flood conditioning factors) and yi denoting the response variable (flood susceptibility rate in this study). The bootstrapping algorithm randomly draws independent samples with replacement from the original training data Z, each drawn sample having the same length as the original data set. Bootstrapping is conducted B times to produce B bootstrap samples, preserving the stochastic characteristics of the original data Z. The models are then fitted separately to each of the B bootstrap samples, producing B flood susceptibility maps. The mean of the flood susceptibility predictions over the B runs for each model was taken as the flood susceptibility map.
2.4.2. Random Subsampling
Random subsampling randomly splits the data into train and test portions and repeats the process in B iterations. In contrast to the bootstrapping algorithm, the random subsampling method resamples the data without replacement, and thus the original data, rather than bootstrap samples, are used for model calibration. In each iteration, one part of the data is used for learning the models and the rest is used for model validation. In this way, a wide range of associations between the predictors (flood conditioning factors) and the response variable (flood occurrence) can be captured by the models.
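The two resampling schemes differ only in how the training indices are drawn, which a short sketch makes concrete. The function names, the fixed seed and the 70% training fraction for random subsampling are assumptions; the text does not state the split ratio. For bootstrapping, the validation set consists of the points that were never drawn (the out-of-bag points).

```python
import numpy as np

def bootstrap_split(n, rng):
    """Draw n training indices WITH replacement; unused points validate."""
    train = rng.integers(0, n, size=n)
    validation = np.setdiff1d(np.arange(n), train)
    return train, validation

def random_subsample_split(n, rng, train_fraction=0.7):
    """Split indices WITHOUT replacement (train_fraction is an assumed value)."""
    permuted = rng.permutation(n)
    n_train = int(round(train_fraction * n))
    return permuted[:n_train], permuted[n_train:]

def resample(n, runs=10, method="bootstrap", seed=0):
    """Return a list of (train, validation) index pairs, one per run."""
    rng = np.random.default_rng(seed)
    split = bootstrap_split if method == "bootstrap" else random_subsample_split
    return [split(n, rng) for _ in range(runs)]
```

With the 294 inventory points (147 flood and 147 non-flood), `resample(294, runs=10, method="bootstrap")` yields ten training sets of length 294 that contain repeated points, whereas `method="subsample"` yields ten training sets with no repeats.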
2.5. Modeling development of proposed integrative models
The present study aimed to introduce novel integrative intelligent models, based on ensemble machine learning and resampling algorithms, for improving flood susceptibility predictions over large scales. A GIS database including nine flood conditioning factors was mapped in the ArcGIS 10.3 interface. The flood inventory map was also prepared by locating the 147 ground control points of historical floods. All spatial GIS layers were converted to ASCII format (Mackenzie, 1980) so that they could be read by the R software (R Development Core Team, 2016). The proposed models were programmed using the sdm (Naimi and Araújo, 2016), biomod2 (Thuiller et al., 2013), raster (Hijmans et al., 2017), gbm (Ridgeway, 2013), and earth (Milborrow, 2019) packages. For the construction of the models, random selection was used to divide the points into training and testing subsets based on the bootstrapping and random subsampling methods. Given the large scale of the study area and the increased sample size, and in order to save time, a total of B = 10 runs was selected for each resampling approach. This means that each machine learning model is fitted to the spatial data over 10 runs and produces 10 flood susceptibility prediction maps. The mean of the predictions over the 10 runs was then mapped as the final flood susceptibility map for each model. For better visual discrimination of flood susceptibility between different regions, the natural break classification method (Poli and Sterlacchini, 2007) was used to group pixels with similar susceptibility values. This classification algorithm specifies the class breaks by minimizing the within-class differences and maximizing the between-class differences (Choubin et al., 2019b). A flowchart of the stages followed for developing the proposed models is illustrated in Fig. 3.
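The averaging and classification steps described above can be sketched as follows. The study carried these steps out in R and ArcGIS; this is only an illustration on small arrays. The function names are hypothetical, and the class breaks are passed in as given values rather than computed by the natural break method.

```python
import numpy as np

def mean_susceptibility(run_maps):
    """Average B prediction rasters cell by cell (all must share one shape)."""
    stack = np.stack([np.asarray(m, dtype=float) for m in run_maps], axis=0)
    return stack.mean(axis=0)

def classify(susceptibility, breaks):
    """Assign each cell a class index from ascending break values.

    The breaks would come from a natural break classification;
    here they are supplied by the caller (hypothetical values).
    """
    return np.digitize(susceptibility, np.asarray(breaks, dtype=float))
```

For two hypothetical runs `[[0.2, 0.8]]` and `[[0.4, 0.6]]`, the mean map is `[[0.3, 0.7]]`, and with breaks `[0.25, 0.5, 0.75]` the two cells fall into classes 1 and 2.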
Fig. 3 SOMEWHERE HERE
Jo
2.6. Accuracy assessment of proposed integrative models
Flood predictive models need to be assessed for performance before they are used for prediction. This study used several statistical accuracy measures to assess the proposed models: the receiver operating characteristic (ROC) curve, the area under the ROC curve (AUC), the true skill statistic (TSS, also known as the Hanssen-Kuipers discriminant), and the correlation coefficient (COR). Tien Bui et al. (2016) indicated that RMSE is sensitive to outliers and to large values of the investigated variables; alternative statistical measures and graphical assessment were therefore also used for performance evaluation. This study also used the TSS as an alternative to the kappa statistic because, unlike kappa, it is not affected by the prevalence and size of the validation data sets (Allouche et al., 2006).

\[ \mathrm{COR} = \frac{\sum_{i=1}^{n} (FOI_i - \overline{FOI})(FSI_i - \overline{FSI})}{\sqrt{\sum_{i=1}^{n} (FOI_i - \overline{FOI})^2 \, \sum_{i=1}^{n} (FSI_i - \overline{FSI})^2}} \qquad (3) \]

\[ \mathrm{TSS} = \frac{ad - bc}{(a + c)(b + d)} \qquad (4) \]

In the above equations, n denotes the total sample size; FOI denotes the flood occurrence index (occurrence = 1, non-occurrence = 0); FSI denotes the flood susceptibility index produced by the models; an overbar denotes the mean over the n samples; a is the number of flood occurrence points correctly classified by the models as flooded pixels; b is the number of non-occurrence points incorrectly classified as flooded pixels; c is the number of flood occurrence points incorrectly classified as non-flooded pixels; and d is the number of non-occurrence points correctly classified as non-flooded pixels.
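As a worked illustration of the two measures, the sketch below computes COR and TSS from paired FOI and FSI values. It assumes the standard Pearson form for COR and TSS = sensitivity + specificity - 1 (Allouche et al., 2006), written with the counts a, b, c, d defined above. The 0.5 cut-off used to turn FSI values into flooded/non-flooded labels and all function names are assumptions made for the example; the text does not state which threshold the authors used.

```python
import math


def cor(foi, fsi):
    """Pearson correlation between flood occurrence (0/1) and susceptibility."""
    n = len(foi)
    mean_o = sum(foi) / n
    mean_s = sum(fsi) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(foi, fsi))
    var_o = sum((o - mean_o) ** 2 for o in foi)
    var_s = sum((s - mean_s) ** 2 for s in fsi)
    return cov / math.sqrt(var_o * var_s)


def confusion_counts(foi, fsi, threshold=0.5):
    """Return (a, b, c, d); the 0.5 threshold is an assumption of this sketch."""
    a = b = c = d = 0
    for o, s in zip(foi, fsi):
        predicted_flood = s >= threshold
        if o == 1 and predicted_flood:
            a += 1          # flood point classified as flooded
        elif o == 0 and predicted_flood:
            b += 1          # non-flood point classified as flooded
        elif o == 1:
            c += 1          # flood point classified as non-flooded
        else:
            d += 1          # non-flood point classified as non-flooded
    return a, b, c, d


def tss(a, b, c, d):
    """True skill statistic: sensitivity + specificity - 1."""
    return a / (a + c) + d / (b + d) - 1


# Toy validation set: three flood points and three non-flood points.
foi = [1, 1, 1, 0, 0, 0]
fsi = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1]
a, b, c, d = confusion_counts(foi, fsi)
print("COR =", round(cor(foi, fsi), 3))
print("TSS =", round(tss(a, b, c, d), 3))
```

Algebraically, a/(a + c) + d/(b + d) - 1 equals (ad - bc)/((a + c)(b + d)), so either form can be used.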
The receiver operating characteristic (ROC) method was used to assess the models. The ROC curve has frequently been applied to evaluate the accuracy of spatial prediction models for flood susceptibility (Chen et al., 2011; Khosravi et al., 2016a, b; Tien Bui et al., 2016; Lee et al., 2017; Hong et al., 2018; Tien Bui et al., 2018a; Choubin et al., 2019b). The ROC curve is a two-dimensional curve that plots the false-positive rate (1 - specificity) on the x-axis against the true-positive rate (sensitivity) on the y-axis. Sensitivity is defined as the proportion of flooded pixels correctly identified as flooded, and specificity as the proportion of non-flooded pixels correctly identified as non-flooded (Hong et al., 2018). The area under the ROC curve (AUC) is used for the quantitative assessment of the developed integrative models. The AUC ranges between 0 and 1, where 0.5 corresponds to a non-informative (random) model and 1 to perfect discrimination (Evans et al., 2005). The higher the AUC, the better the model (Fawcett, 2006). AUC values < 0.6 indicate poor performance, 0.6 - 0.7 moderate performance, 0.7 - 0.8 good performance, and > 0.8 very good performance.
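The construction of the ROC curve and the AUC rating can be illustrated with a short sketch. It sweeps a threshold over the distinct susceptibility values, collects (false-positive rate, true-positive rate) pairs, and integrates them with the trapezoidal rule. The function names are hypothetical, and how values falling exactly on 0.6, 0.7, or 0.8 are rated is a choice made here, since the text does not specify it.

```python
def roc_points(labels, scores):
    """(FPR, TPR) pairs from sweeping the threshold over the distinct scores."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under the piecewise-linear ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def rate_auc(value):
    """Performance classes from the text; boundary handling is an assumption."""
    if value < 0.6:
        return "poor"
    if value < 0.7:
        return "moderate"
    if value <= 0.8:
        return "good"
    return "very good"


# Toy validation set: 1 = flooded, 0 = non-flooded.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1]
curve = roc_points(labels, scores)
area = auc_trapezoid(curve)
print(round(area, 3), rate_auc(area))
```

For this toy set, eight of the nine flooded/non-flooded pairs are ranked correctly, so the AUC is 8/9.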
3. Results and discussion
3.1. Flood susceptibility prediction using the developed integrative models
The steps outlined in Fig. 3 were followed to predict the flood susceptibility for the whole study domain. For this purpose, the spatial variation of the flood locations in relation to their conditioning factors was modeled using the developed models. Given that the representational accuracy of the training data strongly affects model performance and simulations (Zhou and Wu, 2011), this study used the RS and BT algorithms so that all of the data contribute to the learning and validation process. Both algorithms were run B = 10 times to resample the training and validation data sets. In each run of the integrative models with the random subsampling algorithm, 70% of the data were randomly sampled and used to train the models, and the trained models were then validated using the remaining 30% of the data. In the bootstrapping algorithm, a sample of the same size as the original data is drawn with replacement, so that approximately 63.2% of the data are selected as training data and the remaining 36.8%, which are not selected during sampling, are used for model validation. The mean of the flood susceptibility values over the B = 10 runs was transferred to GIS software to construct the flood susceptibility maps. Fig. 4 displays the flood susceptibility predictions over the study domain. According to all model predictions, the coastal areas in the southeast of the study region are the most susceptible to flooding and fall into the very high susceptibility class (Fig. 4). Fig. 2(g) shows higher NDVI values for these regions, for which lower flood susceptibility would be expected (Tehrany et al., 2013). Khosravi et al. (2016a) and Tien Bui et al. (2016) also highlighted NDVI as a major flood conditioning factor. However, the higher NDVI values in these regions are mainly due to the large amounts of long-lasting rainfall caused by proximity to the Caspian Sea (Fig. 2f), which diminishes the role of vegetation attributes in this region. Fig. 4 also shows that areas along river margins are more susceptible to flooding and are classified within the very high flood susceptibility class. This result agrees with the findings of other studies, such as Yang et al. (2018), Darabi et al. (2019), and Choubin et al. (2019b), which demonstrated the higher flood susceptibility of the regions around rivers.
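The two resampling schemes described in this section can be sketched as follows. The example is a hedged illustration using only index lists: `random_subsample` and `bootstrap_split` are hypothetical helper names, the sample size of 5000 points is made up, and the seed is fixed only for reproducibility. The bootstrap share of about 63.2% follows from the probability that a given point is never drawn in n draws, (1 - 1/n)^n, which approaches 1/e (about 0.368) for large n.

```python
import random


def random_subsample(n, train_fraction, rng):
    """RS: shuffle the indices once and cut them into train / validation."""
    idx = list(range(n))
    rng.shuffle(idx)
    k = round(train_fraction * n)
    return idx[:k], idx[k:]


def bootstrap_split(n, rng):
    """BT: draw n indices with replacement; unseen indices validate the model."""
    train = [rng.randrange(n) for _ in range(n)]
    drawn = set(train)
    validation = [i for i in range(n) if i not in drawn]
    return train, validation


rng = random.Random(42)   # fixed seed, chosen only for reproducibility
n_points, B = 5000, 10    # n_points is a hypothetical sample size

rs_train, rs_val = random_subsample(n_points, 0.7, rng)

unique_fractions = []
for _ in range(B):
    bt_train, bt_val = bootstrap_split(n_points, rng)
    unique_fractions.append(len(set(bt_train)) / n_points)
mean_unique = sum(unique_fractions) / B

print(f"RS: {len(rs_train)} train / {len(rs_val)} validation")
print(f"BT: mean share of distinct training points = {mean_unique:.3f}")
```

In a full workflow each split would be used to fit one model and predict one susceptibility map, and the B maps would then be averaged.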
Fig. 4 SOMEWHERE HERE
3.2. Model validation and comparison
Fig. 4 shows that all of the investigated models yield similar results in terms of the geographical extent of the flood-susceptible areas. According to the model simulations, the coastal areas along the southeastern margin of the Caspian Sea and the river margins are the areas most susceptible to flooding. To validate the models, the historical flood test points were overlaid on the flood susceptibility prediction maps. As Fig. 5 illustrates, the historical flood points coincide with the high and very high susceptibility areas around the rivers and coastal areas. For further evaluation, the developed models were compared with conventional benchmark models, namely MLP and SVM, and with the standalone GAM, MARS, and BRT models. Table 1 provides the performance evaluation measures for the investigated models using the test data. The COR values express the correlation between the FOI (occurrence = 1, non-occurrence = 0) and the FSI values predicted by the ML models. Higher COR values indicate closer agreement between the FOI and FSI and hence higher model efficiency. The results indicated that the proposed models based on the BT algorithm attained the best performance, with AUC values between 0.95 and 0.98, COR between 0.79 and 0.91, and TSS between 0.83 and 0.94, outperforming the MLP and SVM models and the standalone GAM, MARS, and BRT benchmark models (Table 1).
Table 1 SOMEWHERE HERE
The standalone models also attained good results with the RS algorithm, with AUC values between 0.92 and 0.96, COR between 0.73 and 0.86, and TSS between 0.77 and 0.89. The results demonstrated that the BT and RS algorithms improved the performance of the standalone models. All of the standalone models performed better with the BT algorithm than with the RS algorithm.
Fig. 5 displays the ROC curves for the developed models using the training and test flood points. The ROC curves on the graphs belong to the individual runs, and the thicker curves are the mean over the B = 10 runs. As is clear, the mean ROC curves are closer to the upper left corner than the single-run ROC curves, which confirms the improved performance of the models. For a model with a high degree of agreement between the number of flood points and the frequency of cells with very high susceptibility, the slope of the ROC curve increases and, as a result, the curve approaches the upper left corner (i.e., the AUC approaches 1). The results showed that the proposed models perform very well in spatial flood susceptibility prediction. The superiority of the BT algorithm over the RS algorithm is also clear in these figures. For a better understanding of model efficiency, it is also important to determine the area percentage of each flood susceptibility class produced by the model. The smaller the area of the high and very high susceptibility classes, the more efficient the model. The BT-MARS model produced a lower area percentage in the very high susceptibility class than the alternative models. Table 2 gives the area percentage of each susceptibility class produced by the BT-GAM, BT-BRT, and BT-MARS models. The results indicated that the BT-MARS and BT-GAM models assigned a lower area percentage to the very high susceptibility class (26 and 32%, respectively) than the BT-BRT model (46%). The number of flood points falling within each flood susceptibility class is also a good indicator of model adequacy. Table 2 also presents the number of historical flood points located within each flood susceptibility class. Here too, BT-MARS and BT-GAM performed better than BT-BRT, placing the fewest flood points in the very low susceptibility class and the most in the very high susceptibility class. For the BT-MARS and BT-GAM models, respectively, 1 (0.7%) and 1 (0.7%) flood points fall within the very low susceptibility class, whereas 134 (91%) and 133 (90.5%) flood points fall within the very high susceptibility class. For the BT-BRT model, 8 flood points (5%) fall within the very low susceptibility class and 80 flood points (54%) within the very high susceptibility class.
Fig. 5 SOMEWHERE HERE
Table 2 SOMEWHERE HERE
3.3. Sensitivity analysis of the response variable
Appropriate selection of the flood conditioning factors is crucial for reliable flood susceptibility prediction (Kourgialas and Karatzas, 2011). The role of the flood conditioning factors varies from region to region, which suggests that it is a function of the properties of the sample data. To test this claim, the present study used the BT algorithm with the GAM model to model flood susceptibility using different samples over B = 10 simulation runs. In each run, a sample was randomly drawn with replacement from the original data and used to analyze the associations between the flood conditioning factors and flood occurrence using the GAM model, based on the AUC and COR criteria. The importance of the flood conditioning factors was assessed using two statistics, the decrease in correlation (DC) and the decrease in AUC (DAUC) (Choubin et al., 2019a), for the different simulation runs. As Fig. 6 shows, different variables are identified as the most influential factors in each run, with varying degrees of importance. For example, distance to river and rainfall are the main influencing factors in run 6, while NDVI and elevation had a more pronounced role in run 7. Overall, across the 10 runs, the most important variables were NDVI, distance to river, elevation, lithology, and rainfall (Fig. 6). To determine the most influential factors, the mean of the variable importance over the B = 10 runs was calculated. Fig. 7 displays the average importance of the variables over the different runs for each model. As this figure illustrates, NDVI, distance to river, elevation, and lithology are the main factors influencing flood occurrence over the study domain. These results are in accordance with Khosravi et al. (2018), Bui et al. (2018), Choubin et al. (2019b), and Darabi et al. (2019).
Fig. 6 SOMEWHERE HERE
Fig. 7 SOMEWHERE HERE
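The idea of measuring factor importance as a decrease in AUC (DAUC) and a decrease in correlation (DC), averaged over bootstrap runs, can be sketched as below. Several points are assumptions of this sketch rather than details given in the text: importance is obtained by shuffling one factor at a time and recording the drop in each score, the fitted GAM is replaced by a fixed stand-in scoring function (`toy_model`) that is not refitted in each run, and the two-factor data set is synthetic. All function names are hypothetical.

```python
import math
import random


def auc(labels, scores):
    """AUC as the share of (flood, non-flood) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cor(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def importance(model, rows, labels, rng):
    """(DAUC, DC) per factor: score drop after shuffling that factor."""
    base = [model(r) for r in rows]
    base_auc, base_cor = auc(labels, base), cor(labels, base)
    result = []
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        pred = [model(r) for r in shuffled]
        result.append((base_auc - auc(labels, pred),
                       base_cor - cor(labels, pred)))
    return result


def mean_importance(model, rows, labels, n_runs, rng):
    """Average (DAUC, DC) over n_runs bootstrap samples of the data."""
    n = len(rows)
    totals = [[0.0, 0.0] for _ in rows[0]]
    for _ in range(n_runs):
        pick = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        run = importance(model, [rows[i] for i in pick],
                         [labels[i] for i in pick], rng)
        for t, (dauc, dc) in zip(totals, run):
            t[0] += dauc
            t[1] += dc
    return [(t[0] / n_runs, t[1] / n_runs) for t in totals]


# Synthetic data: only factor 0 drives flood occurrence.
rng = random.Random(0)
rows = [[rng.random(), rng.random()] for _ in range(200)]
labels = [1 if r[0] > 0.5 else 0 for r in rows]
toy_model = lambda r: r[0]   # stand-in for a fitted susceptibility model
result = mean_importance(toy_model, rows, labels, n_runs=10, rng=rng)
```

In this synthetic case the first factor shows a large mean DAUC and DC while the second shows none, because the stand-in model ignores it. With a model refitted on each bootstrap sample, the per-run rankings would vary in the way the text describes for Fig. 6.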
3.4. Justification of the proposed models and future works
As discussed above, flood susceptibility prediction depends on the appropriate selection of the flood conditioning factors, and both are a function of the sample data points used for modeling. Previous investigations of flood susceptibility modeling have used one-time data splitting for modeling and prediction. Given the increased nonlinearity between the flood influencing factors and flood occurrence over large areas, biased flood susceptibility predictions would be expected when a one-time split into training and validation sets is applied. This is because limited information is used to relate the flood influencing factors to flood occurrence, which yields biased flood susceptibility predictions. The proposed models based on the BT and RS algorithms supply more information by training the models multiple times on different sample points. With these models, a wide range of nonlinear associations between the predictors and the predictand is captured, leading to unbiased predictions. Because of limited processing speed, B = 10 resampling runs were used in this study. Although the results of this study were promising, for areas with limited hydrological data and increased nonlinearity between the predictors and flood occurrence, the number of bootstrap samples should be increased to an adequate number.
4. Conclusions
This study proposed novel integrative models based on the bootstrapping (BT) and random subsampling (RS) algorithms, combined with machine learning models, for unbiased flood susceptibility prediction. Three machine learning models, namely MARS, GAM, and BRT, were combined with the two resampling algorithms (BT and RS), yielding six new integrative models. The results indicated that employing resampling approaches improved the performance of the machine learning models. The BT algorithm performed better than the RS algorithm in terms of the performance evaluation measures. The BT-GAM and BT-MARS models produced competitive results on these measures; however, the BT-MARS model assigned a lower area percentage to the very high susceptibility class. Sensitivity analysis of the response variable to the flood conditioning factors identified different conditioning factors, with varying degrees of importance, in each simulation run, which justifies the use of the proposed models for flood susceptibility modeling. Given the extent of the study area, the models presented in this study provided good results on a large scale, and their results could therefore be considered in large-scale planning for the development of urban areas.
References
Allouche, O., Tsoar, A., Kadmon, R., 2006. Assessing the accuracy of species distribution models: Prevalence, kappa and the true skill statistic (TSS). J. Appl. Ecol. https://doi.org/10.1111/j.1365-2664.2006.01214.x
Araújo, M.B., Pearson, R.G., Thuiller, W., Erhard, M., 2005. Validation of species-climate impact models under climate change. Glob. Chang. Biol. https://doi.org/10.1111/j.1365-2486.2005.01000.x
Arnold, J.G., Srinivasan, R., Muttiah, R.S., Williams, J.R., 1998. Large area hydrologic modeling and assessment part I: Model development. J. Am. Water Resour. Assoc. https://doi.org/10.1111/j.1752-1688.1998.tb05961.x
Azareh, A., Sardooi, E.R., Choubin, B., Barkhori, S., Shahdadi, A., Adamowski, J., Shamshirband, S., 2019. Incorporating multi-criteria decision-making and fuzzy-value functions for flood susceptibility assessment. Geocarto International. https://doi.org/10.1080/10106049.2019.1695958
Beckers, A., Dewals, B., Erpicum, S., Dujardin, S., Detrembleur, S., Teller, J., Pirotton, M., Archambeau, P., 2013. Contribution of land use changes to future flood damage along the river Meuse in the Walloon region. Nat. Hazards Earth Syst. Sci. https://doi.org/10.5194/nhess-13-2301-2013
Benito, G., Rico, M., Sánchez-Moya, Y., Sopeña, A., Thorndycraft, V.R., Barriendos, M., 2010. The impact of late Holocene climatic variability and land use change on the flood hydrology of the Guadalentín River, southeast Spain. Glob. Planet. Change. https://doi.org/10.1016/j.gloplacha.2009.11.007
Bicknell, B.R., Imhoff, J.C., Kittle Jr., J.L., Donigan Jr., A.S., Johanson, R.C., 1997. Hydrological Simulation Program--Fortran, User’s manual for version 11: U.S. Environmental Protection Agency, National Exposure Research Laboratory, Athens, Ga., EPA/600/R-97/080, 755 p. https://doi.org/EPA/600/R-97/080
Brammer, H., 1990. Floods in Bangladesh: Geographical Background to the 1987 and 1988 Floods. Geogr. J. https://doi.org/10.2307/635431
Bui, D.T., Khosravi, K., Li, S., Shahabi, H., Panahi, M., Singh, V.P., Chapi, K., Shirzadi, A., Panahi, S., Chen, W., Bin Ahmad, B., 2018a. New hybrids of ANFIS with several optimization algorithms for flood susceptibility modeling. Water (Switzerland). https://doi.org/10.3390/w10091210
Bui, D.T., Panahi, M., Shahabi, H., Singh, V.P., Shirzadi, A., Chapi, K., Khosravi, K., Chen, W., Panahi, S., Li, S., Ahmad, B. Bin, 2018b. Novel Hybrid Evolutionary Algorithms for Spatial Prediction of Floods. Sci. Rep. https://doi.org/10.1038/s41598-018-33755-7
Bui, D.T., Tsangaratos, P., Ngo, P.T.T., Pham, T.D., Pham, B.T., 2019. Flash flood susceptibility modeling using an optimized fuzzy rule based feature selection technique and tree based ensemble methods. Sci. Total Environ. https://doi.org/10.1016/j.scitotenv.2019.02.422
Cardenas, M.B., Wilson, J.L., Zlotnik, V.A., 2004. Impact of heterogeneity, bed forms, and stream curvature on subchannel hyporheic exchange. Water Resour. Res. https://doi.org/10.1029/2004WR003008
Cea, L., Bladé, E., 2015. A simple and efficient unstructured finite volume scheme for solving the shallow water equations in overland flow applications. Water Resour. Res. https://doi.org/10.1002/2014WR016547
Chanson, H., Brown, R., McIntosh, D., 2014. Human body stability in floodwaters: The 2011 flood in Brisbane CBD, in: ISHS 2014 - Hydraulic Structures and Society - Engineering Challenges and Extremes: Proceedings of the 5th IAHR International Symposium on Hydraulic Structures. https://doi.org/10.14264/uql.2014.48
Chapi, K., Singh, V.P., Shirzadi, A., Shahabi, H., Bui, D.T., Pham, B.T., Khosravi, K., 2017. A novel hybrid artificial intelligence approach for flood susceptibility assessment. Environ. Model. Softw. https://doi.org/10.1016/j.envsoft.2017.06.012
Chen, Y.R., Yeh, C.H., Yu, B., 2011. Integrated application of the analytic hierarchy process and the geographic information system for flood risk assessment and flood plain management in Taiwan. Nat. Hazards. https://doi.org/10.1007/s11069-011-9831-7
Choubin, B., Moradi, E., Golshan, M., Adamowski, J., Sajedi-Hosseini, F., Mosavi, A., 2019. An ensemble prediction of flood susceptibility using multivariate discriminant analysis, classification and regression trees, and support vector machines. Sci. Total Environ. https://doi.org/10.1016/j.scitotenv.2018.10.064
Croke, B., Andrews, F., Jakeman, A., 2005. Redesign of the IHACRES rainfall-runoff model. Proc.
Darabi, H., Choubin, B., Rahmati, O., Torabi Haghighi, A., Pradhan, B., Kløve, B., 2019. Urban flood risk mapping using the GARP and QUEST models: A comparative study of machine learning techniques. J. Hydrol. https://doi.org/10.1016/j.jhydrol.2018.12.002
Efron, B., 1979. Bootstrap Methods: Another Look at the Jackknife. Ann. Stat. https://doi.org/10.1214/aos/1176344552
Elith, J., Leathwick, J.R., Hastie, T., 2008. A working guide to boosted regression trees. J. Anim. Ecol. https://doi.org/10.1111/j.1365-2656.2008.01390.x
Evans, R., Horstman, C., Conzemius, M., 2005. Accuracy and optimization of force platform gait analysis in Labradors with cranial cruciate disease evaluated at a walking gait. Vet. Surg. https://doi.org/10.1111/j.1532-950X.2005.00067.x
Fawcett, T., 2006. An introduction to ROC analysis. Pattern Recognit. Lett. https://doi.org/10.1016/j.patrec.2005.10.010
Feldman, A., 2000. Hydrologic modeling system HEC-HMS, Technical Reference Manual. Tech. Ref. Man. https://doi.org/CDP-74B
Fenicia, F., Savenije, H.H.G., Matgen, P., Pfister, L., 2008. Understanding catchment behavior through stepwise model concept improvement. Water Resour. Res. https://doi.org/10.1029/2006WR005563
Fox, J., 2002. Bootstrapping Regression Models. Ann. Stat. https://doi.org/10.1214/aos/1176345638
Freund, Y., Schapire, R.R.E., 1996. Experiments with a New Boosting Algorithm. Int. Conf. Mach. Learn. https://doi.org/10.1.1.133.1040
Friedman, J., 1991. Multivariate adaptive regression splines (with discussion). Ann. Stat.
García Ruiz, P.J., Ignacio, Á.S., Pensado, B.A., García, A.C., Frech, F.A., López, M.Á., González, J.A., Octavio, J.B., Burguera Hernández, J.A., Garriga, M.C., Blanco, D.C., García, B.C., Cordero, M.C., Peña, J.C., Ibáñez, A.E., Onisalde, A.G., Giménez-Roldán, S., Ibáñez, P.G., Vara, J.H., Alonso, R.I., Jiménez Jiménez, F.J., Krupinski, J., Bojarsky, J.K., Ramírez, I.L., García, E.L., Martínez-Castrillo, J.C., González, D.M., Rodríguez, F.M., Rivera, P.M., Fargas, E.M., Inchausti, J.O., Romero, J.O., Plana, J.O., Vallejo, P.O., Sedano, B.P., de Colosía Rama, V.P., López-Fraile, I.P., Comes, A.P., Periz, V.P., Rodríguez Oroz, M.C., García, D.S., Pérez, P.S., Muñoz, J.S., Gamo, J.V., Merino, C.V., Serra, F.V., Velázquez Pérez, J.M., Baña, R.Y., Capdepon, I.Z., 2008. Efficacy of long-term continuous subcutaneous apomorphine infusion in advanced Parkinson’s disease with motor fluctuations: A multicenter study. Mov. Disord. https://doi.org/10.1002/mds.22063
Ghomian, Z., Yousefian, S., 2017. Natural Disasters in the Middle-East and North Africa With a Focus on Iran: 1900 to 2015. Heal. Emergencies Disasters Q. https://doi.org/10.18869/nrip.hdq.2.2.53
Guzzetti, F., Mondini, A.C., Cardinali, M., Fiorucci, F., Santangelo, M., Chang, K.T., 2012. Landslide inventory maps: New tools for an old problem. Earth-Science Rev. https://doi.org/10.1016/j.earscirev.2012.02.001
Habibian, F., 2018. Increased number of floods in Iran [WWW Document]. Econ. news database.
Haghizadeh, A., Siahkamari, S., Haghiabi, A.H., Rahmati, O., 2017. Forecasting flood-prone areas using Shannon’s entropy model. J. Earth Syst. Sci. https://doi.org/10.1007/s12040-017-0819-x
Hastie, T., Tibshirani, R., 1999. Generalized additive models. Chapman & Hall/CRC, Boca Raton, Fla.; London.
Hastie, T., Tibshirani, R., Friedman, J., 2009. Elements of Statistical Learning 2nd ed. Elements. https://doi.org/10.1007/978-0-387-84858-7
Heitmuller, F.T., Hudson, P.F., Asquith, W.H., 2015. Lithologic and hydrologic controls of mixed alluvial-bedrock channels in flood-prone fluvial systems: Bankfull and macrochannels in the Llano River watershed, central Texas, USA. Geomorphology. https://doi.org/10.1016/j.geomorph.2014.12.033
Hijmans, R.J., van Etter, J., Cheng, J., Mattiuzzi, M., Summer, M., Greenberg, J.A., Lamigueiro, O.P., Bevan, A., Racine, E.B., Shortridge, A., Ghosh, A., 2017. Geographic Data Analysis and Modeling. R CRAN Proj.
Hong, H., Panahi, M., Shirzadi, A., Ma, T., Liu, J., Zhu, A.X., Chen, W., Kougias, I., Kazakis, N., 2018. Flood susceptibility assessment in Hengfeng area coupling adaptive neuro-fuzzy inference system with genetic algorithm and differential evolution. Sci. Total Environ. https://doi.org/10.1016/j.scitotenv.2017.10.114
Hosseini, F.S., Choubin, B., Mosavi, A., Nabipour, N., Shamshirband, S., Darabi, H. and Haghighi, A.T., 2019. Flash-flood hazard assessment using Ensembles and Bayesian-based machine learning models: application of the simulated annealing feature selection method. Science of The Total Environment, p.135161.
Jamali, M., Moghimi, E., Jafarpour, Z., Kardavani, P., 2015. Spatial analysis of the geomorphological hazards of urban development in Shiraz dry river. J. Spat. Anal. Environ. Hazards 3, 51–61.
Jin, X., Wang, S., Yu, N., Zou, H., An, J., Zhang, Yuling, Wang, J., Zhang, Yulong, 2018. Spatial predictions of the permanent wilting point in arid and semi-arid regions of Northeast China. J. Hydrol. https://doi.org/10.1016/j.jhydrol.2018.07.038
Jones, K. and Almond, S., 1992. Moving out of the linear rut: the possibilities of generalized additive models. Transactions of the Institute of British Geographers, pp.434-447.
Jones, K. and Wrigley, N., 1995. Generalized additive models, graphical diagnostics, and logistic regression. Geographical Analysis, 27(1), pp.1-18.
Karlsson, C.S.J., Kalantari, Z., Mörtberg, U., Olofsson, B., Lyon, S.W., 2017. Natural Hazard Susceptibility Assessment for Road Planning Using Spatial Multi-Criteria Analysis. Environ. Manage. https://doi.org/10.1007/s00267-017-0912-6
Kazakis, N., Kougias, I., Patsialis, T., 2015. Assessment of flood hazard areas at a regional scale using an index-based approach and Analytical Hierarchy Process: Application in Rhodope-Evros region, Greece. Sci. Total Environ. https://doi.org/10.1016/j.scitotenv.2015.08.055
Khosravi, K., Nohani, E., Maroufinia, E., Pourghasemi, H.R., 2016a. A GIS-based flood susceptibility assessment and its mapping in Iran: a comparison between frequency ratio and weights-of-evidence bivariate statistical models with multi-criteria decision-making technique. Nat. Hazards. https://doi.org/10.1007/s11069-016-2357-2
Khosravi, K., Pham, B.T., Chapi, K., Shirzadi, A., Shahabi, H., Revhaug, I., Prakash, I., Tien Bui, D., 2018. A comparative assessment of decision trees algorithms for flash flood susceptibility modeling at Haraz watershed, northern Iran. Sci. Total Environ. https://doi.org/10.1016/j.scitotenv.2018.01.266
Khosravi, K., Pourghasemi, H.R., Chapi, K., Bahri, M., 2016b. Flash flood susceptibility analysis and its mapping using different bivariate models in Iran: a comparison between Shannon’s entropy, statistical index, and weighting factor models. Environ. Monit. Assess. https://doi.org/10.1007/s10661-016-5665-9
Khosravi, K., Shahabi, H., Pham, B.T., Adamowski, J., Shirzadi, A., Pradhan, B., Dou, J., Ly, H.B., Gróf, G., Ho, H.L., Hong, H., Chapi, K., Prakash, I., 2019. A comparative assessment of flood susceptibility modeling using Multi-Criteria Decision-Making Analysis and Machine Learning Methods. J. Hydrol. https://doi.org/10.1016/j.jhydrol.2019.03.073
Kia, M.B., Pirasteh, S., Pradhan, B., Mahmud, A.R., Sulaiman, W.N.A., Moradi, A., 2012. An artificial neural network model for flood simulation using GIS: Johor River Basin, Malaysia. Environ. Earth Sci. https://doi.org/10.1007/s12665-011-1504-z
Knoll, L., Breuer, L., Bach, M., 2019. Large scale prediction of groundwater nitrate concentrations from spatial data using machine learning. Sci. Total Environ. https://doi.org/10.1016/j.scitotenv.2019.03.045
Komolafe, A.A., Herath, S., Avtar, R., 2018. Methodology to Assess Potential Flood Damages in Urban Areas under the Influence of Climate Change. Nat. Hazards Rev. https://doi.org/10.1061/(asce)nh.1527-6996.0000278
Kourgialas, N.N., Karatzas, G.P., 2011. Flood management and a GIS modelling method to assess flood-hazard areas—a case study. Hydrol. Sci. J. https://doi.org/10.1080/02626667.2011.555836
Kumar, R., Acharya, P., 2016. Flood hazard and risk assessment of 2014 floods in Kashmir Valley: a space-based multisensor approach. Nat. Hazards. https://doi.org/10.1007/s11069-016-2428-4
Lee, M.J., Kang, J.E., Jeon, S., 2012. Application of frequency ratio model and validation for predictive flooded area susceptibility mapping using GIS, in: International Geoscience and Remote Sensing Symposium (IGARSS). https://doi.org/10.1109/IGARSS.2012.6351414
Lee, Sunmin, Kim, J.C., Jung, H.S., Lee, M.J., Lee, Saro, 2017. Spatial prediction of flood susceptibility using random-forest and boosted-tree models in Seoul metropolitan city, Korea. Geomatics, Nat. Hazards Risk. https://doi.org/10.1080/19475705.2017.1308971
Li, X.H., Zhang, Q., Shao, M., Li, Y.L., 2011. A Comparison of Parameter Estimation for Distributed Hydrological Modelling Using Automatic and Manual Methods. Adv. Mater. Res. https://doi.org/10.4028/www.scientific.net/amr.356-360.2372
Luu, C., Von Meding, J., Kanjanabootra, S., 2018. Assessing flood hazard using flood marks and analytic hierarchy process approach: a case study for the 2013 flood event in Quang Nam, Vietnam. Nat. Hazards. https://doi.org/10.1007/s11069-017-3083-0
Mackenzie, C.E., 1980. Coded character sets: history and development. Addison-Wesley, Reading, Mass.
Milborrow, S., 2019. earth: Multivariate Adaptive Regression Splines.
Miles, R.E., Snow, C.C., 1984. Designing strategic human resources systems. Organ. Dyn. https://doi.org/10.1016/0090-2616(84)90030-5
Mitchell Lyons, 2018. Generalised additive models (GAMs): an introduction [WWW Document]. Environ. Comput. URL http://environmentalcomputing.net/intro-to-gams/
Mosavi, A., Ozturk, P., Chau, K.W., 2018. Flood prediction using machine learning models: Literature review. Water (Switzerland). https://doi.org/10.3390/w10111536
Naimi, B., Araújo, M.B., 2016. Sdm: A reproducible and extensible R platform for species distribution modelling. Ecography (Cop.). https://doi.org/10.1111/ecog.01881
Picard, R.R., Cook, R.D., 1984. Cross-Validation of Regression Models. J. Am. Stat. Assoc. 79, 575–583. https://doi.org/10.1080/01621459.1984.10478083
Poli, S., Sterlacchini, S., 2007. Landslide representation strategies in susceptibility studies using weights-of-evidence modeling technique. Nat. Resour. Res. https://doi.org/10.1007/s11053-007-9043-8
Politis, D.N., Romano, J.P., Wolf, M., 1999. Subsampling. Springer, New York, NY.
Predick, K.I., Turner, M.G., 2008. Landscape configuration and flood frequency influence invasive shrubs in floodplain forests of the Wisconsin River (USA). J. Ecol. https://doi.org/10.1111/j.1365-2745.2007.01329.x
Put, R., Xu, Q.S., Massart, D.L. and Vander Heyden, Y., 2004. Multivariate adaptive regression splines (MARS) in chromatographic quantitative structure–retention relationship studies. Journal of Chromatography A, 1055(1-2), pp.11-19.
R Development Core Team, 2016. R: A language and environment for statistical computing. R Found. Stat. Comput. https://doi.org/10.1017/CBO9781107415324.004
Rahmati, O., Kornejady, A., Samadi, M., Nobre, A.D., Melesse, A.M., 2018. Development of an automated GIS tool for reproducing the HAND terrain model. Environ. Model. Softw. https://doi.org/10.1016/j.envsoft.2018.01.004
Rahmati, O., Pourghasemi, H.R., Zeinivand, H., 2016. Flood susceptibility mapping using frequency ratio and weights-of-evidence models in the Golastan Province, Iran. Geocarto Int. https://doi.org/10.1080/10106049.2015.1041559
Reneau, S.L., 2000. Stream incision and terrace development in Frijoles Canyon, Bandelier National Monument, New Mexico, and the influence of lithology and climate. Geomorphology. https://doi.org/10.1016/S0169-555X(99)00094-X
Rezaie-balf, M., Naganna, S.R., Ghaemi, A., Deka, P.C., 2017. Wavelet coupled MARS and M5 Model Tree approaches for groundwater level forecasting. J. Hydrol. https://doi.org/10.1016/j.jhydrol.2017.08.006
Journal Pre-proof Ridgeway, G., 2013. gbm: Generalized Boosted Regression Models. R Packag. version 1.6-3.1. Sajedi‐ Hosseini, F., Choubin, B., Solaimani, K., Cerdà, A. and Kavian, A., 2018a. Spatial prediction of soil erosion susceptibility using a fuzzy analytical network process: Application of the fuzzy decision making trial and evaluation laboratory approach. Land degradation & development, 29(9), pp.3092-3103. Sajedi-Hosseini, F., Malekian, A., Choubin, B., Rahmati, O., Cipullo, S., Coulon, F., Pradhan,
contamination.
Sci.
-p
https://doi.org/10.1016/j.scitotenv.2018.07.054
Total
Environ.
ro
groundwater
of
B., 2018b. A novel machine learning-based approach for the risk assessment of nitrate
Samani, S., 2019. Allocation of 151 billion rials to compensate for flood damage to two [WWW
Document].
Iran.
re
provinces
student’s
News
Agency.
URL
lP
https://www.isna.ir/news/97121306705
na
Samanta, R.K., Bhunia, G.S., Shit, P.K., Pourghasemi, H.R., 2018. Flood susceptibility mapping using geospatial frequency ratio technique: a case study of Subarnarekha River Basin, India.
ur
Model. Earth Syst. Environ. https://doi.org/10.1007/s40808-018-0427-z
Jo
Schapire, R.E., 2003. The Boosting Approach to Machine Learning: An Overview. https://doi.org/10.1007/978-0-387-21579-2_9
Seckin, N., Cobaner, M., Yurtal, R., Haktanir, T., 2013. Comparison of Artificial Neural Network Methods with L-moments for Estimating Flood Flow at Ungauged Sites: The Case of East Mediterranean River Basin, Turkey. Water Resour. Manag. https://doi.org/10.1007/s11269-013-0278-3
Siahkamari, S., Haghizadeh, A., Zeinivand, H., Tahmasebipour, N., Rahmati, O., 2018. Spatial prediction of flood-susceptible areas using frequency ratio and maximum entropy models. Geocarto Int. https://doi.org/10.1080/10106049.2017.1316780
Steven Abney, 2002. Bootstrapping, in: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL). AT&T Laboratories – Research, 180 Park Avenue, Florham Park, NJ, USA, 07932, Philadelphia, pp. 360–367.
Tang, Z., Zhang, H., Yi, S., Xiao, Y., 2018. Assessment of flood susceptible areas using spatially explicit, probabilistic multi-criteria decision analysis. J. Hydrol. https://doi.org/10.1016/j.jhydrol.2018.01.033
Tavoosi, T., Delara, G., 2010. Climate Classification of Ardebil Province. Nivar 34, 47–52.
Tehrany, M.S., Pradhan, B., Jebur, M.N., 2013. Spatial prediction of flood susceptible areas using rule based decision tree (DT) and a novel ensemble bivariate and multivariate statistical models in GIS. J. Hydrol. https://doi.org/10.1016/j.jhydrol.2013.09.034
Tehrany, M.S., Pradhan, B., Jebur, M.N., 2014. Flood susceptibility mapping using a novel ensemble weights-of-evidence and support vector machine models in GIS. J. Hydrol. https://doi.org/10.1016/j.jhydrol.2014.03.008
Tehrany, M.S., Pradhan, B., Jebur, M.N., 2015a. Flood susceptibility analysis and its verification using a novel ensemble support vector machine and frequency ratio method. Stoch. Environ. Res. Risk Assess. https://doi.org/10.1007/s00477-015-1021-9
Tehrany, M.S., Pradhan, B., Mansor, S., Ahmad, N., 2015b. Flood susceptibility assessment using GIS-based support vector machine model with different kernel types. Catena. https://doi.org/10.1016/j.catena.2014.10.017
Thuiller, W., Georges, D., Engler, R., 2013. biomod2: Ensemble platform for species distribution modeling. R Packag. version. https://doi.org/10.1017/CBO9781107415324.004
Tien Bui, D., Bui, Q.T., Nguyen, Q.P., Pradhan, B., Nampak, H., Trinh, P.T., 2017. A hybrid artificial intelligence approach using GIS-based neural-fuzzy inference system and particle swarm optimization for forest fire susceptibility modeling at a tropical area. Agric. For. Meteorol. https://doi.org/10.1016/j.agrformet.2016.11.002
Tien Bui, D., Pradhan, B., Nampak, H., Bui, Q.T., Tran, Q.A., Nguyen, Q.P., 2016. Hybrid artificial intelligence approach based on neural fuzzy inference model and metaheuristic optimization for flood susceptibility modeling in a high-frequency tropical cyclone area using GIS. J. Hydrol. https://doi.org/10.1016/j.jhydrol.2016.06.027
UNISDR, 2011. Global Assessment Report on Disaster Risk Reduction. International Strategy for Disaster Reduction (ISDR).
UNISDR, 2015. Global Assessment Report on Disaster Risk Reduction. International Strategy for Disaster Reduction (ISDR). https://doi.org/9789211320282
Vaghefi, S.A., Keykhai, M., Jahanbakhshi, F., Sheikholeslami, J., Ahmadi, A., Yang, H., Abbaspour, K.C., 2019. The future of extreme climate in Iran. Sci. Rep. 9, 1464. https://doi.org/10.1038/s41598-018-38071-8
Wang, Y., Hong, H., Chen, W., Li, S., Panahi, M., Khosravi, K., Shirzadi, A., Shahabi, H., Panahi, S., Costache, R., 2019. Flood susceptibility mapping in Dingnan County (China) using adaptive neuro-fuzzy inference system with biogeography based optimization and imperialistic competitive algorithm. J. Environ. Manage. 247, 712–729. https://doi.org/10.1016/j.jenvman.2019.06.102
Wu, C.F.J., 1986. Jackknife, Bootstrap and Other Resampling Methods in Regression Analysis. Ann. Stat. 14, 1261–1295. https://doi.org/10.1214/aos/1176350142
Yalcin, A., Reis, S., Aydinoglu, A.C., Yomralioglu, T., 2011. A GIS-based comparative study of frequency ratio, analytical hierarchy process, bivariate statistics and logistics regression methods for landslide susceptibility mapping in Trabzon, NE Turkey. Catena. https://doi.org/10.1016/j.catena.2011.01.014
Yang, W., Xu, K., Lian, J., Ma, C., Bin, L., 2018. Integrated flood vulnerability assessment approach based on TOPSIS and Shannon entropy methods. Ecol. Indic. https://doi.org/10.1016/j.ecolind.2018.02.015
Yates, D.N., Warner, T.T., Leavesley, G.H., 2002. Prediction of a Flash Flood in Complex Terrain. Part II: A Comparison of Flood Discharge Simulations Using Rainfall Input from Radar, a Dynamic Model, and an Automated Algorithmic System. J. Appl. Meteorol. https://doi.org/10.1175/1520-0450(2000)039<0815:poaffi>2.0.co;2
Zhang, W., Goh, A.T.C., 2016. Multivariate adaptive regression splines and neural network models for prediction of pile drivability. Geosci. Front. https://doi.org/10.1016/j.gsf.2014.10.003
Zhao, G., Pang, B., Xu, Z., Yue, J., Tu, T., 2018. Mapping flood susceptibility in mountainous areas on a national scale in China. Sci. Total Environ. https://doi.org/10.1016/j.scitotenv.2017.10.037
Zhou, Y., Wu, Y., 2011. Analyses on influence of training data set to neural network supervised learning performance. Adv. Intell. Soft Comput. 106, 19–25.
Declaration of interests
☒ The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Figures:
Fig. 1. Location of the Ardabil Province in northwest Iran
Fig. 2. Flood conditioning factors used in this study
Fig. 3. Flowchart of methodology
[Flowchart: the flood conditioning factors (altitude, slope, aspect, curvature, river, rainfall, NDVI, land use, lithology) and the flood inventory map (flood and non-flood points) are resampled by the RS and BT algorithms (B = 10 runs) into training and validation datasets; the GAM, MARS, and BRT models are calibrated until the stopping condition is met; the flood susceptibility is averaged over the B = 10 runs to give the flood susceptibility map.]
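The resampling-and-averaging loop in the flowchart can be sketched in a few lines of Python. This is an illustration only, not the authors' implementation: the function names, the 70/30 split used for random subsampling, the use of never-drawn points as the bootstrap validation set, and the toy nearest-centroid scorer that stands in for GAM, MARS, or BRT are all assumptions made for the example.

```python
import random
from statistics import mean


def bootstrap_split(n, rng):
    """BT: draw n indices with replacement; indices never drawn validate (assumption)."""
    train = [rng.randrange(n) for _ in range(n)]
    drawn = set(train)
    valid = [i for i in range(n) if i not in drawn]
    return train, valid


def subsample_split(n, rng, train_fraction=0.7):
    """RS: shuffle once and split without replacement (0.7 is an assumed fraction)."""
    idx = list(range(n))
    rng.shuffle(idx)
    cut = int(round(train_fraction * n))
    return idx[:cut], idx[cut:]


def fit_centroid(xs, ys):
    """Hypothetical stand-in for GAM/MARS/BRT: score by relative distance to class means."""
    flood = [x for x, y in zip(xs, ys) if y == 1]
    dry = [x for x, y in zip(xs, ys) if y == 0]
    if not flood or not dry:  # degenerate resample: fall back to flood prevalence
        prevalence = len(flood) / len(xs)
        return lambda x: prevalence
    m1, m0 = mean(flood), mean(dry)

    def predict(x):
        d1, d0 = abs(x - m1), abs(x - m0)
        return 0.5 if d1 + d0 == 0 else d0 / (d0 + d1)

    return predict


def averaged_susceptibility(xs, ys, grid, fit, split, runs=10, seed=0):
    """Fit one model per resampling run and average the predictions cell by cell."""
    rng = random.Random(seed)
    per_run = []
    for _ in range(runs):
        train, _valid = split(len(xs), rng)
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        per_run.append([model(cell) for cell in grid])
    return [mean(values) for values in zip(*per_run)]


# Toy data: one conditioning factor, with flood points (label 1) at high values.
xs = [i / 19 for i in range(20)]
ys = [0] * 10 + [1] * 10
grid = [0.0, 0.5, 1.0]
bt_map = averaged_susceptibility(xs, ys, grid, fit_centroid, bootstrap_split)
rs_map = averaged_susceptibility(xs, ys, grid, fit_centroid, subsample_split)
```

Passing the split function as an argument keeps the two resampling schemes interchangeable, so the same loop yields the BT and RS variants of any base model.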
Fig. 4. Flood susceptibility predictions in the Ardabil province using the proposed integrative models.
Fig. 5. ROC curves of the developed integrative models for learning and validation phases, with the thin curves representing the ROC of the individual runs and the thicker curves representing the mean ROC over the B = 10 runs.
[Panels: RS-GAM, BT-GAM, RS-MARS, BT-MARS, RS-BRT, BT-BRT; x-axis: 1-Specificity (false positive rate); y-axis: Sensitivity (true positive rate); each panel reports the mean training and test AUC.]
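A mean ROC curve over several runs, like the thicker curves in Fig. 5, can be obtained by evaluating each run's curve on a shared grid of false positive rates and averaging the sensitivities. The sketch below assumes this vertical-averaging scheme; the plotting software used for the figure may average differently, and the function names and the two toy runs are hypothetical.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists with one point per distinct score threshold."""
    pairs = sorted(zip(scores, y_true), reverse=True)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume all points tied at this score before emitting a ROC point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


def tpr_at(fpr, tpr, x):
    """Interpolate sensitivity at false positive rate x; vertical steps take the top value."""
    best = 0.0
    for k in range(1, len(fpr)):
        x0, x1 = fpr[k - 1], fpr[k]
        if x0 <= x <= x1:
            if x1 == x0:
                y = max(tpr[k - 1], tpr[k])
            else:
                y = tpr[k - 1] + (tpr[k] - tpr[k - 1]) * (x - x0) / (x1 - x0)
            best = max(best, y)
    return best


def mean_roc(runs, grid_size=11):
    """Average the per-run ROC curves on a common false-positive-rate grid."""
    grid = [k / (grid_size - 1) for k in range(grid_size)]
    curves = [roc_points(y, s) for y, s in runs]
    mean_tpr = [sum(tpr_at(f, t, x) for f, t in curves) / len(curves) for x in grid]
    return grid, mean_tpr


# Two toy runs (observed labels, predicted susceptibility), purely illustrative.
runs = [
    ([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]),
    ([1, 0, 1, 0], [0.9, 0.8, 0.2, 0.1]),
]
grid, mean_tpr = mean_roc(runs, grid_size=5)
```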
Fig. 6. Sensitivity analysis of the response variable (flood) to the various conditioning factors for individual runs of the BT-GAM model
Fig. 7. Sensitivity analysis of the response variable (flood) to the various conditioning factors using bootstrapping (BT) and random subsampling (RS) algorithms.
Tables:
Table 1 Accuracy assessment of the integrative models, comparison with standalone benchmark models

Model        AUC     COR     TSS
Standalone benchmark models
GAM          0.92    0.82    0.84
MARS         0.95    0.86    0.89
BRT          0.93    0.74    0.82
MLP          0.90    0.64    0.71
SVM          0.94    0.78    0.78
RS integrative models
RS-GAM       0.95    0.86    0.87
RS-MARS      0.96    0.86    0.89
RS-BRT       0.92    0.73    0.77
BT integrative models
BT-GAM       0.98    0.91    0.93
BT-MARS      0.97    0.91    0.94
BT-BRT       0.95    0.79    0.83
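The three scores in Table 1 can be computed from observed flood labels and predicted susceptibilities as sketched below. The definitions are assumptions for illustration: AUC is computed as the probability that a flood point outscores a non-flood point, COR is taken to be the Pearson correlation between labels and predictions, and TSS (sensitivity + specificity - 1) is evaluated at a fixed threshold of 0.5, which may differ from the threshold rule used in the study. The sample data are hypothetical.

```python
from math import sqrt


def auc(y_true, scores):
    """Share of (flood, non-flood) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cor(y_true, scores):
    """Pearson correlation between observed labels and predicted scores (assumed definition)."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    mean_s = sum(scores) / n
    cov = sum((y - mean_y) * (s - mean_s) for y, s in zip(y_true, scores))
    var_y = sum((y - mean_y) ** 2 for y in y_true)
    var_s = sum((s - mean_s) ** 2 for s in scores)
    return cov / sqrt(var_y * var_s)


def tss(y_true, scores, threshold=0.5):
    """True skill statistic at an assumed fixed threshold."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn) + tn / (tn + fp) - 1.0


# Hypothetical validation points: 1 = flood, 0 = non-flood.
observed = [1, 1, 1, 0, 0, 0]
predicted = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
scores = {
    "AUC": auc(observed, predicted),
    "COR": cor(observed, predicted),
    "TSS": tss(observed, predicted),
}
```

For a resampled model, these functions would be applied to each run's validation set and the results averaged over the B runs.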
Table 2 Area percent and number of floods that occurred within the susceptibility classes produced by the models

                      BT-GAM                              BT-BRT                              BT-MARS
Susceptibility rate   Area percent    Number of floods    Area percent    Number of floods    Area percent    Number of floods
Very low              191209 (40%)    1 (0.7%)            155901 (8%)     8 (5%)              1953529 (41%)   1 (0.7%)
Low                   581374 (12%)    2 (1.36%)           162310 (39%)    25 (17%)            644932 (13%)    1 (0.7%)
Moderate              380406 (8%)     2 (1.36%)           277526 (15%)    9 (6%)              464488 (10%)    3 (2%)
High                  379499 (8%)     10 (6%)             422092 (22%)    25 (17%)            452812 (10%)    9 (5.5%)
Very high             148599 (32%)    133 (90.5%)         857621 (46%)    80 (54%)            1223601 (26%)   134 (91%)
[Schematic overview: flood and non-flood points and the conditioning factors (elevation, slope, aspect, curvature, distance to river, rainfall, NDVI) feed the resampling algorithms and the machine learning models, which yield the flood susceptibility map and the sensitivity analysis.]
Highlights:
Novel integrative models are proposed for flood susceptibility prediction.
Resampling algorithms were integrated with machine learning models.
Bootstrapping and subsampling algorithms improved the performance of the models.
Machine learning models performed best with the bootstrapping algorithm.