Integrated machine learning methods with resampling algorithms for flood susceptibility prediction

Journal Pre-proof

Esmaeel Dodangeh, Bahram Choubin, Ahmad Najafi Eigdir, Narjes Nabipour, Mehdi Panahi, Shahaboddin Shamshirband, Amir Mosavi

PII: S0048-9697(19)35978-9

DOI: https://doi.org/10.1016/j.scitotenv.2019.135983

Reference: STOTEN 135983

To appear in: Science of the Total Environment

Received date: 6 October 2019

Revised date: 5 December 2019

Accepted date: 5 December 2019

Please cite this article as: E. Dodangeh, B. Choubin, A.N. Eigdir, et al., Integrated machine learning methods with resampling algorithms for flood susceptibility prediction, Science of the Total Environment (2019), https://doi.org/10.1016/j.scitotenv.2019.135983

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version will undergo additional copyediting, typesetting and review before it is published in its final form, but we are providing this version to give early visibility of the article. Please note that, during the production process, errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

© 2019 Published by Elsevier.

Esmaeel Dodangeh (a), Bahram Choubin (b), Ahmad Najafi Eigdir (b), Narjes Nabipour (c), Mehdi Panahi (d), Shahaboddin Shamshirband (e, f, *), Amir Mosavi (g, h)

(a) Department of Watershed Management, Sari Agricultural Sciences and Natural Resources University, P.O. Box 737, Sari, Iran

(b) Soil Conservation and Watershed Management Research Department, West Azarbaijan Agricultural and Natural Resources Research and Education Center, AREEO, Urmia, Iran

(c) Institute of Research and Development, Duy Tan University, Da Nang 550000, Vietnam

(d) Department of Geophysics, Young Researchers and Elites Club, North Tehran Branch, Islamic Azad University, Tehran, Iran

(e) Department for Management of Science and Technology Development, Ton Duc Thang University, Ho Chi Minh City, Vietnam

(f) Faculty of Information Technology, Ton Duc Thang University, Ho Chi Minh City, Vietnam

(g) Kalman Kando Faculty of Electrical Engineering, Obuda University, Budapest, Hungary

(h) School of the Built Environment, Oxford Brookes University, Oxford OX30BP, UK

* Corresponding author, Email: [email protected]

Abstract

Flood susceptibility projections that rely on standalone models, with one-time train-test data splitting for model calibration, yield biased results. This study proposed novel integrative flood susceptibility prediction models based on multi-time resampling approaches, namely the random subsampling (RS) and bootstrapping (BT) algorithms, integrated with machine learning models: the generalized additive model (GAM), boosted regression trees (BRT) and multivariate adaptive regression splines (MARS). The RS and BT algorithms provided 10 runs of data resampling for learning and validation of the models. The mean of the 10 runs of predictions was then used to produce the flood susceptibility maps (FSM). This methodology was applied to Ardabil Province, on the coastal margins of the Caspian Sea, which has faced destructive floods. The area under the curve (AUC) of the receiver operating characteristic (ROC), the true skill statistic (TSS) and the correlation coefficient (COR) were used to evaluate the predictive accuracy of the proposed models. The results demonstrated that the resampling algorithms improved the performance of the standalone GAM, MARS and BRT models. The results also revealed that the standalone models performed better with the BT algorithm than with the RS algorithm. The BT-GAM model attained the best performance in terms of statistical measures (AUC = 0.98, TSS = 0.93, COR = 0.91), followed by the BT-MARS (AUC = 0.97, TSS = 0.91, COR = 0.91) and BT-BRT models (AUC = 0.95, TSS = 0.79, COR = 0.79). The proposed models outperformed benchmark models such as the standalone GAM, MARS, BRT, multilayer perceptron (MLP) and support vector machine (SVM). Given the strong performance of the proposed models over a large-scale area, promising results can be expected from these models for other regions.

Keywords: Resampling approach; Random subsampling; Bootstrapping; Flood susceptibility; Machine learning

1. Introduction

Floods are among the most devastating natural disasters; they occur on large scales and their post-disaster effects persist for a long time. The Global Assessment Report on Disaster Risk Reduction (UNISDR, 2015) indicated that flood mortality risk has increased in the Middle East and North Africa (MNA) countries due to population growth (Haghizadeh et al., 2017) and human manipulation of nature through forest cutting and rangeland degradation. The flood mortality rate in MNA countries has increased by 11% compared to the last century (UNISDR, 2015; Ghomian and Yousefian, 2017). During the past 30 years, more than 80% of natural disaster events occurred in 9 countries: Afghanistan, Pakistan, Iran, Sudan, Somalia, Algeria, Morocco, Yemen and Egypt (UNISDR, 2011). The causes of flooding vary in different parts of the world. For example, the Queensland floods are river floods that occur when the soil is unable to absorb water, so that the excess water flows through the river channels and causes flooding (Chanson et al., 2014). In Bangladesh, monsoon rains are the main cause of flooding, which occurs as a result of the movement of hurricanes from the sea to the coast (Brammer, 1990). The cause of coastal floods in Europe is the Atlantic storms that push water to the coast. If this process is accompanied by intense tides, it can cause devastating floods (UNISDR, 2015).

While two-thirds of Iran struggles with water scarcity (Rahmati et al., 2018), floods have been one of the most destructive natural disasters in the country over the past few years (Vaghefi et al., 2019). Most of the flooding events in Iran are human-induced and occur as a result of human interference with nature through changes in flood routes, land use change and urban development along dry river beds without the creation of new routes for floodwater to pass (Jamali et al., 2015). In the past two years alone, 112 people have died in flooding events and 45,000 homes have been completely destroyed. The flood events have also damaged municipal facilities, agricultural lands, and roads (Habibian, 2018). Although flood prevention is not entirely feasible, flood hazard zoning to determine the most flood-vulnerable areas, together with the adoption of flood proofing practices, can noticeably reduce flood damage.

Physically-based hydrologic models such as the Hydrologic Engineering Center-Hydrologic Modeling System (HEC-HMS) (Feldman, 2000), the Soil and Water Assessment Tool (SWAT) (Arnold et al., 1998), IHACRES (Croke et al., 2005), and the HSPF model (Bicknell et al., 1997) have been used in flood modeling studies. However, the use of such models requires extensive field measurements and exhaustive parameterization (Fenicia et al., 2008). Moreover, they provide only at-site estimates of the flood hazard using the local streamflow data recorded at hydrometric gauging stations, and thus they are not suitable for flood assessment at regional scales (Li et al., 2011; Tien Bui et al., 2016).

Flood susceptibility mapping (FSM) with the support of geographic information systems (GIS) makes it possible to predict future flood occurrence and so mitigate human and socio-economic losses (Wang et al., 2019). GIS techniques have made a significant contribution to flood susceptibility modeling studies by providing geostatistical tools for handling large amounts of spatial data (Tien Bui et al., 2016; Wang et al., 2019). Various statistical and data-driven techniques, along with GIS techniques, have been proposed in the literature and applied to identify flood-susceptible areas. Among the commonly used approaches are the analytical hierarchy process (AHP) (Chen et al., 2011; Kazakis et al., 2015; Luu et al., 2018; Tang et al., 2018), frequency ratio (FR) (Lee et al., 2012; Rahmati et al., 2016; Samanta et al., 2018; Siahkamari et al., 2018; Tehrany et al., 2015a) and weights-of-evidence (Rahmati et al., 2016). However, there are drawbacks to the use of these approaches for flood susceptibility map (FSM) production. For example, the results of the AHP model are subject to uncertainties due to ambiguous judgments (Miles and Snow, 1984), and the FR method is greatly dependent on the sample size (Sajedi-Hosseini et al., 2018b).

To cope with these issues, machine learning (ML) approaches including artificial neural networks (ANNs), adaptive neuro-fuzzy inference systems (ANFIS), genetic algorithms (GA), support vector machines (SVM), and tree-based models have been proposed and extended for flood susceptibility prediction (Bui et al., 2019, 2018b, 2018a; Chapi et al., 2017; Khosravi et al., 2019, 2018; Kia et al., 2012; Seckin et al., 2013; Tehrany et al., 2015b, 2013; Wang et al., 2019; Zhao et al., 2018). In this study, we used three rarely used ML models, namely multivariate adaptive regression splines (MARS), boosted regression trees (BRT) and the generalized additive model (GAM), for flood susceptibility modeling. Some studies have indicated a superior performance of the MARS model over other artificial intelligence (AI) models such as ANN, SVM, ANFIS and the M5 model tree (Mosavi et al., 2018; Rezaie-balf et al., 2017). Boosted regression trees (BRT) are robust ensemble models which combine the advantages of regression tree models and the boosting method (Elith et al., 2008). Generalized additive models, which have a high potential for fitting complex and non-linear environmental data (Mitchell Lyons, 2018), combine the benefits of generalized linear models (GLMs) and additive models and are also considered in this study.

For the use of any predictive model, two sets of data are considered. The first set is used for training the model and the second set, which is statistically independent (Araújo et al., 2005), is used for evaluation of the model. Data splitting is commonly used for partitioning the data into learning and validation subsets. Most of the investigations using ML models for flood susceptibility prediction, either in standalone or hybrid versions, relied on one-time learning and validation phases for producing the flood susceptibility maps. However, this could have introduced bias into the parameter estimation and subsequently into the FSM predictions of the models used. To alleviate this issue, the current study employed resampling algorithms, namely bootstrapping (BT) and random subsampling (RS) (Picard and Cook, 1984; Politis et al., 1999; Steven Abney, 2002; Wu, 1986; Hastie et al., 2009), along with ML models for FSM prediction. The BT and RS algorithms perform sampling with and without replacement of the original data, respectively, using samples randomly drawn in B iterations. The RS algorithm randomly splits the data into training and validation sets B times without replacement of the original data. The BT algorithm draws a sample of equal length to the original data as a training set for learning the models. The data that do not contribute to model learning are used for validation of the models. The performance of ML models depends strongly on the data used for model learning. The use of resampling algorithms combined with the ML models as integrative models (for multi-time model learning) draws on more of the information in the data and thus yields less biased results.

The present study proposed novel integrative models by coupling the resampling algorithms with ML models (to provide more reliable flood susceptibility predictions than a one-time train-test split of the data set), with a case study in Ardabil Province near the coastal margins of the Caspian Sea. The main objectives of this research are therefore: (i) to evaluate the performance of three rarely used ML models, namely the generalized additive model (GAM), boosted regression trees (BRT), and multivariate adaptive regression splines (MARS), for flood susceptibility prediction; (ii) to explore the effects of resampling methods (bootstrapping, BT, and random subsampling, RS) on the performance of ML models by comparing the standalone ML models (GAM, BRT and MARS) with the new integrative models, hereafter called BT-GAM, BT-BRT, BT-MARS, RS-GAM, RS-BRT and RS-MARS; and (iii) to predict the flood susceptibility map using the new integrative models in order to identify the most susceptible areas over the study region.

2. Materials and methods

2.1. Study area

The study area is Ardabil Province, located in northwest Iran. The district, covering an area of about 17,953 km² (1% of the area of the country), is bounded by the Talesh Mountains in the east and the Azerbaijan Plateau in the west (Tavoosi and Delara, 2010) (Fig. 1). Around 75% of this area has mountainous topography, with a maximum elevation of 4,811 m above mean sea level at Sabalan Mountain. As a consequence of the complex topography and numerous lakes, the region has very diverse climates, such as arid and semi-arid in the north and Mediterranean and semi-humid in the south of the district. Based on analyses of historical data for 1988-2016, the mean annual rainfall is about 480 mm (Iran Meteorological Organization, 2019). The spatial distribution of rainfall varies between 222 mm in the north and 1810 mm in the southeast of the district. The mean air temperature in the study region varies between 7.9° and 15.2°, depending on the climate station (Tavoosi and Delara, 2010).

Due to the proximity of the study region to the coastal margins of the Caspian Sea, it is exposed to destructive floods. According to the Ardabil Province crisis management organization (2016), more than 37% of disasters in Ardabil Province are related to floods. Over the past three years, the average flood damage was estimated at around $900,000 per year. The total losses of the last flood event, in February 2019, reached $370,000 (Samani, 2019).

Fig. 1 SOMEWHERE HERE

2.2. Data used

2.2.1. Flood inventory map

The flood inventory map, a map showing the locations of historical floods, is a prerequisite for spatial modeling of flood susceptibility (Tehrany et al., 2014). It shows the past records of flood occurrence in an area and can be used to estimate future flood events by analyzing the relationships between past flood events and their environmental conditioning factors. In the current research, the flood inventory map was constructed from ground control points of historical flood data at 147 locations, obtained from existing reports and field surveys (Fig. 1). An equal number of 147 locations was selected as non-flood points (Fig. 1).

2.2.2. Preparation of geospatial database (flood influencing factors)

Although there is no universal guideline for selecting flood influencing factors (Azareh et al., 2019; Hosseini et al., 2020), several flood influencing variables such as elevation, slope, aspect, curvature, distance to stream, rainfall, normalized difference vegetation index (NDVI), land use, and lithology were identified according to the literature (Khosravi et al., 2016a; Rahmati et al., 2016; Choubin et al., 2019b) and based on the available data. The spatial variability of the identified factors over the study region is illustrated in Fig. 2. All of the input variables were converted to raster format with a 30 × 30 m spatial resolution in the ArcGIS interface.

2.2.2.1. Elevation

Elevation is one of the most influential parameters in the flood inundation of an area (Tehrany et al., 2014; Choubin et al., 2019b). It has frequently been used as a relief indicator at large scales (Rahmati et al., 2018). There is an inverse relationship between the elevation and flooding of a region. Li et al. (2011) indicated that regions with lower elevation are more prone to flooding. Cea and Bladé (2015) pointed out that flooding at lower elevations is due to water flowing down from higher elevations. The elevation map was prepared from the digital elevation model (DEM) with 30 × 30 m resolution in the ArcGIS interface (Fig. 2a).

2.2.2.2. Slope

Slope is a physiographic characteristic that greatly affects the runoff volume and velocity. The runoff volume and velocity increase with increasing slope gradient (Khosravi et al., 2016b; Tien Bui et al., 2018a). As the slope gradient increases, the infiltration rate decreases and a large amount of runoff enters the drainage network (Tehrany et al., 2015a). The slope in the region varies from 0 to 88 degrees (Fig. 2b).

2.2.2.3. Aspect

The aspect is also important in flood studies (Choubin et al., 2019b). Sunny slopes are less humid and less likely to be flooded. Yates et al. (2002) showed that the hydrologic response unit is highly influenced by the slope aspect. Rahmati et al. (2016) also demonstrated that soil moisture content and local climatic conditions are influenced by slope aspect. In this study, the aspect was classified into nine categories, as illustrated in Fig. 2c.

2.2.2.4. Curvature

Curvature is also an important flood conditioning factor that affects heterogeneity and hyporheic exchange (Cardenas et al., 2004). Flat and concave areas are more prone to flooding (Tehrany et al., 2014, 2015a). In the present study, curvature was calculated from the DEM in ArcGIS (Fig. 2d).

2.2.2.5. Distance to stream

The smaller the distance from a river or stream, the greater the risk of flooding. Predick and Turner (2008) emphasized that proximity to the river is an important factor for flooding. Tien Bui et al. (2018) and Darabi et al. (2019) also observed that a great number of floods occurred in areas adjacent to rivers. As illustrated in Fig. 2e, the distance from the river ranges between 0 and 9560 m.

2.2.2.6. Rainfall

Rainfall is a key influencing factor in flood susceptibility mapping and is widely noted in the literature (Tehrany et al., 2015a; Tien Bui et al., 2017). In this research, the mean annual rainfall map was calculated from long-term precipitation data (1988-2016) from 32 rain gauge stations using the inverse distance weighting (IDW) method in ArcGIS. Rainfall data were obtained from the Iran Meteorological Organization (IRIMO) and the Iranian Department of Water Resources Management Company (IDWRMC). The annual rainfall varies between 222 and 1810 mm from the northwest towards the southeastern coastal regions of the province (Fig. 2f).
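The inverse distance weighting step used for the rainfall map can be illustrated with a minimal Python sketch. It is not the ArcGIS implementation: the function name `idw_interpolate`, the exponent `power=2.0`, the use of all stations for every query point, and the two toy stations are assumptions made for illustration only.

```python
import numpy as np

def idw_interpolate(station_xy, station_values, query_xy, power=2.0):
    """Inverse distance weighting: each estimate is a weighted mean of the
    station values, with weights proportional to 1 / distance**power.
    The exponent (power=2.0) is an assumed setting, not taken from the study."""
    station_xy = np.asarray(station_xy, dtype=float)
    station_values = np.asarray(station_values, dtype=float)
    query_xy = np.asarray(query_xy, dtype=float)

    # Pairwise Euclidean distances, shape (n_query, n_stations).
    diff = query_xy[:, None, :] - station_xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    estimates = np.empty(len(query_xy))
    for i, d in enumerate(dist):
        exact = d == 0.0
        if exact.any():
            # A query point that coincides with a station takes its value.
            estimates[i] = station_values[exact][0]
        else:
            weights = 1.0 / d ** power
            estimates[i] = np.sum(weights * station_values) / np.sum(weights)
    return estimates

# Two hypothetical stations carrying the minimum and maximum rainfall (mm).
stations = [(0.0, 0.0), (10.0, 0.0)]
rain_mm = [222.0, 1810.0]
print(idw_interpolate(stations, rain_mm, [(5.0, 0.0), (0.0, 0.0)]))
```

A point midway between two stations receives the plain average of their values, and the estimate approaches a station's value as the point approaches that station.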

2.2.2.7. Normalized difference vegetation index (NDVI)

The NDVI is a simple graphical indicator that is used in remote sensing analyses for the assessment of vegetation attributes in a region (Sajedi-Hosseini et al., 2018a). As there is an inverse relationship between vegetation density and flooding (Tehrany et al., 2013; Kumar and Acharya, 2016), this index can be useful in preparing the flood susceptibility map. The NDVI map of the study region (Fig. 2g) was calculated and mapped using the following equation (Tucker and Seller, 1986):

$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{R}}{\mathrm{NIR} + \mathrm{R}} \tag{1}$$

where NIR is the near-infrared portion of the electromagnetic spectrum (0.76-0.90 μm) and R is the red portion of the electromagnetic spectrum (0.63-0.69 μm).

2.2.2.8. Land use

Land use is one of the most important hydrologic variables that directly affect runoff volume and velocity (Yalcin et al., 2011) and sediment transport (Benito et al., 2010). Several investigations have emphasized the importance of the land use pattern for flooding (Benito et al., 2010; Beckers et al., 2013; García Ruiz et al., 2008; Karlsson et al., 2017; Komolafe et al., 2018). The land use map in this study was obtained from the IDWRMC for the year 2012 (Fig. 2h).
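For the NDVI defined in Section 2.2.2.7, a per-pixel calculation might look like the following Python sketch. The function name `ndvi`, the handling of pixels where both bands are zero (returned as NaN), and the sample reflectance values are assumptions for illustration, not part of the original workflow.

```python
import numpy as np

def ndvi(nir, red):
    """Per-pixel NDVI = (NIR - R) / (NIR + R).
    Pixels where NIR + R is zero are returned as NaN (an assumed convention)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    result = np.full(total.shape, np.nan)
    valid = total != 0
    result[valid] = (nir[valid] - red[valid]) / total[valid]
    return result

# Hypothetical reflectance values for three pixels.
print(ndvi([0.5, 0.2, 0.0], [0.1, 0.2, 0.0]))
```

Values close to 1 indicate dense vegetation, and values near zero or below indicate sparse or no vegetation.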

2.2.2.9. Lithology

Lithology also has a significant role in runoff volume and speed by controlling the infiltration rate and sediment transport. Sediment transport, which is one of the flood-related components, is affected by the erodibility of geological formations. Lee et al. (2012) indicated that different geological units have different susceptibility to flooding. Lithology also affects the channel shape during temporal floods (Reneau, 2000; Heitmuller et al., 2015). The geology map of the study region, which was obtained from IDWRMC, consists of 31 geological units (Fig. 2i).

Fig. 2 SOMEWHERE HERE

2.3. Machine learning models

2.3.1. Generalized additive model (GAM)

GAMs, known as “wiggly models” (Mitchell Lyons, 2018), were advanced by Hastie and Tibshirani (1990) as an integration of the generalized linear model (GLM) with additive models. In a GAM, the linear relationship between the dependent and independent variables is replaced by non-linear smooths (Jones and Wrigley, 1995). GAM uses an additive approach in which a suitable functional form is selected based on the data, without prior knowledge of the functional aspects of the model (Jones and Almond, 1992). Environmental data rarely follow simple linear models and are usually best explained by GAMs. GAMs were initially developed to deliver the benefits of GLMs and additive models simultaneously in a single model (Hastie and Tibshirani 1999). GAMs are nonparametric extensions of GLMs, and their main advantage over GLMs is the capability of fitting complex and non-linear relationships. The response variable (Y) in GAMs does not necessarily follow the normal distribution and can be fitted by a variety of distributions such as the Poisson or binomial distributions. A link function (g) is used to connect the response variable with the predictors to generate the predictions:

$$g(E(Y)) = \beta_0 + f_1(x_1) + f_2(x_2) + \dots + f_m(x_m) \tag{1}$$

where E(Y) is the expected value of the response variable, β0 is the intercept, and f1(x1), …, fm(xm) are smooth functions of the predictors x1, …, xm. In essence, floods are complex natural phenomena and there is a nonlinear relationship between flood occurrence and environmental conditioning factors. Owing to their predictive power and ability to model non-linear relationships, GAMs are appropriate models for flood susceptibility prediction and are thus considered in this study.
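To make the role of the link function concrete, the sketch below evaluates an additive predictor and maps it to a probability. It assumes a binomial response with a logit link, which the text does not specify, and the two hand-written smooth functions and their coefficients are purely hypothetical stand-ins for smooths that a real GAM would estimate from data.

```python
import math

def gam_probability(x, intercept, smooths):
    """Evaluate g^{-1}(beta0 + sum_j f_j(x_j)) for a logit link g (assumed)."""
    if len(x) != len(smooths):
        raise ValueError("one smooth function is needed per predictor")
    eta = intercept + sum(f(value) for f, value in zip(smooths, x))
    # Inverse logit turns the additive predictor into a probability.
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical smooths for two predictors: elevation (m) and distance to stream (m).
smooths = [
    lambda elevation: -0.001 * (elevation - 1500.0),
    lambda distance: 2.0 * math.exp(-distance / 500.0) - 1.0,
]

print(gam_probability([1200.0, 100.0], intercept=0.0, smooths=smooths))
print(gam_probability([2500.0, 5000.0], intercept=0.0, smooths=smooths))
```

With these illustrative smooths, a low-lying point close to a stream gets a higher predicted probability than a high, distant one, which mirrors the qualitative relationships described for elevation and distance to stream.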

2.3.2. Multivariate adaptive regression splines (MARS)

Friedman (1991) proposed the MARS model as a flexible method for analyzing multi-dimensional experimental data. As a nonparametric model, MARS has great potential for modeling complex nonlinear processes (Friedman, 1991; Zhang and Goh, 2016) and producing simple, easier-to-interpret piecewise linear models (Zhang and Goh, 2016). MARS modeling follows a “divide and conquer” strategy by partitioning the training data into subdomains and fitting separate piecewise linear models (splines) with varied gradients (Zhang and Goh, 2016). Splines, also called basis functions (BFs), are connected through a network of knots that are randomly positioned within the range of the input variables. MARS models are built in a two-step process. In the forward stepwise step, MARS generates the BFs and places candidate knots to capture the interactions between all available variables. The knot locations are optimized by an adaptive regression algorithm. In the backward step, surplus BFs are pruned to eliminate those that contribute least. The general structure of the MARS models is as follows:

$$y = \beta_0 + \sum_{m=1}^{M} \beta_m \lambda_m(X) \tag{2}$$

where y represents the dependent variable, β0 represents the constant coefficient, M represents the number of BFs, λm represents the m-th basis function with βm as its corresponding coefficient, and X represents the predictors (flood conditioning factors in the current study). MARS models make no assumptions regarding the data distributions or the form of the nonlinear associations between the predictors and the dependent variable (Kisi et al., 2019), which makes them capable of modeling complicated nonlinear processes such as flood occurrence. More technical information about MARS models can be found in Hastie et al. (2009) and Put et al. (2004).
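The piecewise linear basis functions of MARS are hinge functions of the form max(0, x - t) and max(0, t - x), where t is a knot. The following Python sketch evaluates a model with the structure of Eq. (2) for a single predictor. The names `hinge` and `mars_predict`, the knot location, and the coefficients are hypothetical, and the forward and backward fitting steps are not implemented here.

```python
import numpy as np

def hinge(x, knot, direction=1):
    """Hinge basis function: max(0, direction * (x - knot))."""
    return np.maximum(0.0, direction * (np.asarray(x, dtype=float) - knot))

def mars_predict(x, beta0, terms):
    """Evaluate y = beta0 + sum_m beta_m * lambda_m(x) for one predictor.
    `terms` is a list of (coefficient, knot, direction) tuples."""
    x = np.asarray(x, dtype=float)
    y = np.full(x.shape, float(beta0))
    for coefficient, knot, direction in terms:
        y = y + coefficient * hinge(x, knot, direction)
    return y

# Hypothetical model: one knot at x = 5 with a different gradient on each side.
terms = [(2.0, 5.0, 1), (0.5, 5.0, -1)]
print(mars_predict([0.0, 5.0, 10.0], beta0=1.0, terms=terms))
```

Each pair of mirrored hinge functions lets the fitted gradient change at the knot, which is how the model captures nonlinearity with linear pieces.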

2.3.3. Boosted regression trees (BRT)

Freund and Schapire (1996) first introduced boosted regression trees (BRT) as a powerful data mining algorithm for continuous and categorical input variables. BRT are ensemble models which combine the advantages of regression tree models and the boosting method (Elith et al., 2008). Constructing multiple regression models instead of fitting a single simple model improves predictive performance in BRT models (Schapire, 2003). Two algorithms are used for handling the nonlinear relationships between the response variable and predictors: (i) regression trees map out the associations between the response variable and predictors through a recursive binary splitting procedure, and (ii) boosting combines many simple tree models to increase model performance. In BRT models, tree models are fitted successively, with each subsequent tree fitted to the residuals of the preceding model (Knoll et al., 2019). Despite numerous advantages, such as accepting input variables of different types and insensitivity to outliers, BRT models produce biased results when the data record is too short (Jin et al., 2018). The resampling approaches described in the following section are adopted in the present study to improve the predictive performance of this model.
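The idea of fitting each new tree to the residuals of the preceding model can be sketched in a few lines of Python. This is a simplified illustration, not the gbm algorithm: it assumes a squared-error loss, a single predictor, trees of depth one (stumps), and arbitrary values for `n_trees` and `learning_rate`.

```python
import numpy as np

def fit_stump(x, residual):
    """Find the single split on x that minimizes the squared error of the residuals."""
    best = None
    for threshold in np.unique(x)[:-1]:
        left = x <= threshold
        left_mean = residual[left].mean()
        right_mean = residual[~left].mean()
        sse = ((residual[left] - left_mean) ** 2).sum() \
            + ((residual[~left] - right_mean) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return np.where(x <= threshold, left_mean, right_mean)

def boost(x, y, n_trees=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = y.mean()
    prediction = np.full(len(y), base)
    stumps = []
    for _ in range(n_trees):
        stump = fit_stump(x, y - prediction)
        prediction = prediction + learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, stumps, prediction

# Toy data: the response switches from 0 to 1 at x = 5.
x = np.arange(10.0)
y = (x >= 5).astype(float)
base, stumps, fitted = boost(x, y)
print(np.round(fitted, 3))
```

The learning rate shrinks the contribution of each tree, so many small corrections accumulate into the final fit.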

2.4. Resampling approaches

Any predictive machine learning model needs its parameters tuned before it is used to make predictions. The more representative a sample is of the population, the less biased the fitted model parameters and predictions are. This matters particularly when handling spatial data with a high degree of variability, such as flood susceptibility. There are several effective resampling techniques, such as jackknife resampling, cross-validation, nonparametric bootstrapping and random subsampling (Fox, 2002). The last two methods are the focus of the current study. These algorithms can be applied with any loss function and any nonlinear modeling approach (Hastie et al., 2009).

2.4.1. Bootstrapping algorithm

The bootstrapping algorithm introduced by Efron (1979) is a statistical inference method which builds a sampling distribution by resampling the original data (Fox, 2002). The basic idea of this algorithm is to treat the sample data as a population from which several samples can be drawn. Let Z = (z1, z2, …, zN), where zi = (xi, yi), with xi denoting the explanatory variables (flood conditioning factors) and yi denoting the response variable (flood susceptibility in this study). The bootstrapping algorithm randomly draws independent samples with replacement from the original training data Z, with each drawn sample having the same length as the original data set. Bootstrapping is conducted B times to produce B bootstrap samples, preserving the stochastic characteristics of the original data Z. The models are then fitted to each of the B bootstrap samples separately, producing B flood susceptibility maps. The mean of the flood susceptibility predictions over the B runs for each model was taken as the flood susceptibility map.

2.4.2. Random subsampling

Random subsampling randomly splits the data into train and test portions and repeats the process in B iterations. In contrast to the bootstrapping algorithm, the random subsampling method resamples the data without replacement, and thus the original data, rather than bootstrap samples, are used for model calibration. In each iteration, a part of the data is used for training the models and the rest of the data is used for model validation. In this way, a wide range of associations between the predictors (flood conditioning factors) and the response variable (flood occurrence) can be captured by the models.
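The difference between the two resampling schemes can be sketched as index generation in Python. The function names, the random seed, and in particular the 70/30 split used for random subsampling are assumptions; the text does not state the train-test ratio. The 294 points correspond to the 147 flood and 147 non-flood locations.

```python
import numpy as np

def bootstrap_split(n, rng):
    """Draw n training indices with replacement; validate on the points never drawn."""
    train = rng.integers(0, n, size=n)
    test = np.setdiff1d(np.arange(n), train)
    return train, test

def random_subsample_split(n, rng, train_fraction=0.7):
    """Split the indices once without replacement (train_fraction is an assumed value)."""
    permutation = rng.permutation(n)
    n_train = int(round(train_fraction * n))
    return permutation[:n_train], permutation[n_train:]

rng = np.random.default_rng(0)
n_points = 294  # 147 flood + 147 non-flood locations
for run in range(10):  # B = 10 runs
    bt_train, bt_test = bootstrap_split(n_points, rng)
    rs_train, rs_test = random_subsample_split(n_points, rng)
    print(run, len(np.unique(bt_train)), len(bt_test), len(rs_train), len(rs_test))
```

A bootstrap training set has the same length as the original data but contains repeated points, so its validation set is whatever was left out; a random subsample contains each point at most once.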

2.5. Modeling development of proposed integrative models

The present study aimed to introduce novel integrative intelligent models based on ensemble machine learning and resampling algorithms for improving flood susceptibility predictions over large scales. A GIS database including nine flood conditioning factors was mapped in the ArcGIS 10.3 interface. The flood inventory map was also prepared by locating the 147 ground control points of historical floods. All layers of the spatial GIS database were converted to ASCII format (Mackenzie, 1980) to be readable by the R software (R Development Core Team, 2016). The proposed models were programmed using the sdm (Naimi and Araújo, 2016), biomod2 (Thuiller et al., 2013), raster (Hijmans et al., 2017), gbm (Ridgeway, 2013), and earth (Milborrow, 2019) packages. For the construction of the models, the points were randomly divided into training and testing subsets using the bootstrapping and random subsampling methods. Because of the large scale of the study area and the large sample size, and to save computation time, a total of B = 10 runs was selected for each resampling approach. This means that each of the machine learning models was fitted to the spatial data over the 10 runs and produced 10 flood susceptibility prediction maps. Then, the mean of the predictions over the 10 runs was mapped as the final flood susceptibility map for each model. For better visual discrimination of the flood susceptibilities between different regions, the natural break classification method (Poli and Sterlacchini, 2007) was used to group pixels with similar susceptibility values into the same classes. This classification algorithm specifies the class breaks by minimizing the within-class differences and maximizing the between-class differences (Choubin et al., 2019b). A flowchart of the stages followed in developing the proposed models is shown in Fig. 3.

Fig. 3 SOMEWHERE HERE
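The averaging of B = 10 prediction maps described in this section can be sketched as follows. The study used GAM, MARS and BRT models fitted in R; here a clipped least-squares linear model is used only as a simple stand-in so that the example stays self-contained, and the synthetic data, function names and seeds are all hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Stand-in model (NOT one of the study's models): least-squares fit with intercept."""
    design = np.column_stack([np.ones(len(X)), X])
    coefficients, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefficients

def predict_linear(coefficients, X):
    design = np.column_stack([np.ones(len(X)), X])
    # Clip so that the output can be read as a susceptibility index in [0, 1].
    return np.clip(design @ coefficients, 0.0, 1.0)

def bootstrap_mean_map(X, y, X_grid, n_runs=10, seed=0):
    """Fit the model on n_runs bootstrap samples and average the predicted maps."""
    rng = np.random.default_rng(seed)
    maps = []
    for _ in range(n_runs):
        idx = rng.integers(0, len(y), size=len(y))  # sample with replacement
        coefficients = fit_linear(X[idx], y[idx])
        maps.append(predict_linear(coefficients, X_grid))
    return np.mean(maps, axis=0)

# Synthetic example: 294 points with 9 conditioning factors, 5 pixels to predict.
rng = np.random.default_rng(1)
X = rng.normal(size=(294, 9))
y = (X[:, 0] + 0.5 * rng.normal(size=294) < 0).astype(float)
X_grid = rng.normal(size=(5, 9))
print(bootstrap_mean_map(X, y, X_grid))
```

Replacing the bootstrap index draw with a split made without replacement would give the random subsampling variant of the same averaging scheme.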

Jo

2.6. Accuracy assessment of proposed integrative models

Flood predictive models need to be assessed for performance before they are used for prediction. This study used several statistical accuracy measures, namely the receiver operating characteristic (ROC) curve, the area under the ROC curve (AUC), the true skill statistic (TSS, also known as the Hanssen-Kuipers discriminant), and the correlation coefficient (COR), to assess the efficiency of the proposed models. Tien Bui et al. (2016) indicated that RMSE is sensitive to outliers and to larger values of the investigated variables; thus, alternative statistical measures and graphical assessment are also used for performance evaluation of the models. This study also used the TSS as an alternative to the kappa statistic because, unlike kappa, the TSS is not affected by the prevalence and size of the validation data sets (Allouche et al., 2006).

(3)

(4)

In the above equations, n denotes the total sample size, FOI denotes the flood occurrence index (occurrence = 1, non-occurrence = 0), FSI denotes the flood susceptibility index produced by the models, a denotes the number of flood occurrence points correctly classified by the models as flooded pixels, b denotes the number of non-occurrence points incorrectly classified as flooded pixels, c denotes the number of flood occurrence points incorrectly classified as non-flooded pixels, and d denotes the number of non-occurrence points correctly classified as non-flooded pixels.
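As a hedged illustration of the quantities defined above, the sketch below uses standard definitions, which may differ in notation from Eqs. (3) and (4): the TSS is taken as sensitivity + specificity − 1 (Allouche et al., 2006), computed from a, b, c, and d, and COR is taken as the Pearson correlation between FOI and FSI. The 0.5 classification threshold, the function names, and the sample values are assumptions made for the example.

```python
import math

def confusion_counts(foi, fsi, threshold=0.5):
    """Return (a, b, c, d): hits, false alarms, misses, correct negatives.
    The 0.5 threshold is an assumption for this example."""
    a = b = c = d = 0
    for observed, predicted in zip(foi, fsi):
        flooded = predicted >= threshold
        if observed == 1 and flooded:
            a += 1
        elif observed == 0 and flooded:
            b += 1
        elif observed == 1:
            c += 1
        else:
            d += 1
    return a, b, c, d

def tss(a, b, c, d):
    """True skill statistic = sensitivity + specificity - 1."""
    return a / (a + c) + d / (b + d) - 1.0

def cor(foi, fsi):
    """Pearson correlation between observed FOI and predicted FSI."""
    n = len(foi)
    mean_o = sum(foi) / n
    mean_p = sum(fsi) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(foi, fsi))
    var_o = sum((o - mean_o) ** 2 for o in foi)
    var_p = sum((p - mean_p) ** 2 for p in fsi)
    return cov / math.sqrt(var_o * var_p)

# Hypothetical validation sample: 4 flood points and 4 non-flood points.
foi = [1, 1, 1, 1, 0, 0, 0, 0]
fsi = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
a, b, c, d = confusion_counts(foi, fsi)
tss_value = tss(a, b, c, d)
cor_value = cor(foi, fsi)
```

With these values one flood point is missed and one non-flood point is a false alarm, so sensitivity and specificity are both 0.75 and the TSS is 0.5.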

The receiver operating characteristic (ROC) method was implemented for the assessment of the models. The ROC curve has frequently been applied to evaluate the accuracy of spatial prediction models in flood susceptibility modeling (Chen et al., 2011; Khosravi et al., 2016a, b; Tien Bui et al., 2016; Lee et al., 2017; Hong et al., 2018; Tien Bui et al., 2018a; Choubin et al., 2019b). The ROC curve is a two-dimensional curve with the false-positive rate (1 − specificity) on the x-axis and the true-positive rate (sensitivity) on the y-axis. Sensitivity is defined as the proportion of flooded pixels correctly identified as flooded, and specificity as the proportion of non-flooded pixels correctly identified as non-flooded (Hong et al., 2018). The area under the ROC curve (AUC) is used for the quantitative assessment of the developed integrative models. AUC values range between 0 and 1, where 0.5 corresponds to a non-informative (random) model and 1 to a perfect model (Evans et al., 2005). The higher the AUC value, the better the model (Fawcett, 2006). AUC values < 0.6 indicate poor performance, 0.6–0.7 moderate performance, 0.7–0.8 good performance, and > 0.8 very good performance of the model.
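The construction of the ROC curve and the AUC classes listed above can be sketched in a few lines. This is an illustrative Python version, not the routine used in the study; the function names and sample values are hypothetical, and the handling of values that fall exactly on a class boundary (0.6, 0.7, 0.8) is an assumption, since the text does not specify it.

```python
def roc_points(labels, scores):
    """ROC points (false-positive rate, true-positive rate), one per distinct threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Trapezoidal area under the ROC curve."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

def performance_class(auc_value):
    """Map an AUC value to the qualitative classes used in the text.
    Boundary handling is an assumption."""
    if auc_value < 0.6:
        return "poor"
    if auc_value < 0.7:
        return "moderate"
    if auc_value <= 0.8:
        return "good"
    return "very good"

# Hypothetical test sample: 1 = flooded, 0 = non-flooded.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
curve = roc_points(labels, scores)
auc_value = auc(curve)
label = performance_class(auc_value)
```

Here 15 of the 16 flooded/non-flooded pairs are ranked correctly, which gives an AUC of 0.9375 and places the toy model in the "very good" class.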

3. Results and discussion

3.1. Flood susceptibility prediction using the developed integrative models

The steps outlined in Fig. 3 were followed to predict the flood susceptibility rate for the whole study domain. For this purpose, the spatial variation of the flood locations in relation to their conditioning factors was modeled using the developed models. Given that the representational accuracy of the training data strongly affects model performance and simulations (Zhou and Wu, 2011), this study used the RS and BT algorithms so that all of the data take part in the learning and validation process. The RS and BT algorithms were used for B = 10 runs of resampling of the training and validation data sets. In each run of the integrative models with the random subsampling algorithm, 70% of the data were randomly sampled and used to train the models, and the trained models were then validated using the remaining 30% of the data. In the bootstrapping algorithm, a sample of the same size as the original data is drawn with replacement, so that approximately 63.2% of the distinct data points are selected as training data, and the remaining 36.8%, which are not selected during the sampling, are used for model validation. The mean of the flood susceptibility rates over the B = 10 runs was transferred to GIS software to construct the flood susceptibility maps. Fig. 4 displays the flood susceptibility predictions over the study domain. Based on all model predictions, coastal areas in the southeast of the study region are the most susceptible to flooding and are categorized into the very high susceptibility class (Fig. 4). Fig. 2(g) shows higher NDVI values for these regions, and thus lower flood susceptibility would be expected for these areas (Tehrany et al., 2013). Khosravi et al. (2016a) and Tien Bui et al. (2016) also indicated the role of NDVI as a major flood conditioning factor. However, the higher NDVI values in these regions are mainly due to the large amounts of long-lasting rainfall caused by the proximity to the Caspian Sea (Fig. 2f), which diminishes the role of vegetation attributes in this region. Fig. 4 also shows that the areas along river margins are more susceptible to flooding and are classified within the very high flood susceptibility class. This result agrees with the findings of other studies, such as Yang et al. (2018), Darabi et al. (2019), and Choubin et al. (2019b), which demonstrated the higher flood susceptibility of the regions around rivers.
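The two resampling schemes described in this section can be sketched as follows. This is a minimal Python illustration, not the R implementation used in the study; the function names are hypothetical, and the sample size of 147 is borrowed from the flood inventory only to give the example a concrete scale. The last line evaluates 1 − (1 − 1/n)^n, the expected share of distinct points drawn in one bootstrap sample, which tends to about 63.2% as n grows.

```python
import random

def random_subsample_split(n, train_fraction=0.7, rng=random):
    """Random subsampling: shuffle the indices and cut them into train/validation parts."""
    indices = list(range(n))
    rng.shuffle(indices)
    cut = int(round(train_fraction * n))
    return indices[:cut], indices[cut:]

def bootstrap_split(n, rng=random):
    """Bootstrapping: draw n indices with replacement; indices never drawn
    (out-of-bag) form the validation set."""
    train = [rng.randrange(n) for _ in range(n)]
    drawn = set(train)
    validation = [i for i in range(n) if i not in drawn]
    return train, validation

rng = random.Random(42)   # fixed seed so the example is repeatable
n_points = 147            # illustrative sample size
B = 10                    # number of resampling runs, as in the text

rs_train, rs_validation = random_subsample_split(n_points, 0.7, rng)

oob_fractions = []
for _ in range(B):
    bt_train, bt_validation = bootstrap_split(n_points, rng)
    oob_fractions.append(len(bt_validation) / n_points)

# Expected share of distinct points that appear in one bootstrap sample.
expected_unique = 1 - (1 - 1 / n_points) ** n_points
```

Each bootstrap run therefore trains on roughly 63% of the distinct points and validates on the rest, whereas random subsampling fixes the 70/30 proportion in advance.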

Fig. 4 SOMEWHERE HERE

3.2. Model validation and comparison

Fig. 4 shows that all of the investigated models yield similar results in terms of the geographical extent of the flood-susceptible areas. Based on the model simulations, the coastal areas on the southeast margins of the Caspian Sea and the river margins are the areas most susceptible to flooding. For validation of the models, historical flood test points were mapped on the flood susceptibility prediction maps. As Fig. 5 illustrates, the historical flood points match the high and very high susceptibility areas around the rivers and coastal areas. For further evaluation of the developed models, they were compared with conventional models such as MLP and SVM and with the standalone GAM, MARS, and BRT models. Table 1 provides the performance evaluation measures for the investigated models using the test data. COR values express the correlation between the FOI (occurrence = 1 and non-occurrence = 0) and the FSI values predicted by the ML models. Higher COR values indicate closer agreement between the FOI and FSI and higher efficiency of the models. The results indicated that the proposed models based on the BT algorithm attained the best performance, with AUC values between 0.95 and 0.98, COR values between 0.79 and 0.91, and TSS values between 0.83 and 0.94, outperforming the MLP and SVM models and the standalone GAM, MARS, and BRT benchmark models (Table 1).

Table 1 SOMEWHERE HERE

The standalone models also attained good results with the RS algorithm, with AUC values between 0.92 and 0.96, COR values between 0.73 and 0.86, and TSS values between 0.77 and 0.89. The results demonstrated that both the BT and RS algorithms improved the performance of the standalone models. All of the standalone models performed better with the BT algorithm than with the RS algorithm.

Fig. 5 displays the ROC curves for the developed models using the training and test flood points. The thinner ROC curves on the graphs belong to the individual runs, and the thicker curves are the mean values over the B = 10 runs. As is clear, the mean ROC curves are closer to the upper left corner than the single-run ROC curves, which confirms the improved performance of the models. For a model with a high degree of agreement between the number of flood points and the frequency of cells with very high susceptibility rates, the slope of the ROC curve increases and, as a result, the curve gets closer to the upper left corner (i.e., closer to AUC = 1). The results showed that the proposed models perform very well in spatial flood susceptibility prediction. The superiority of the BT algorithm over the RS algorithm is also clear in these figures. For a better understanding of model efficiency, it is also important to determine the area percentage of each flood susceptibility class reproduced by the model. The smaller the area of the high and very high susceptibility classes, the more efficient the model is. The BT-MARS model reproduced a lower area percentage for the very high susceptibility class than the alternative models. Table 2 outlines the area percentage of each susceptibility class reproduced by the BT-GAM, BT-BRT, and BT-MARS models. The results indicated that the BT-MARS and BT-GAM models reproduced a lower area percentage for the very high susceptibility class (26% and 32%, respectively) than the BT-BRT model (46%). The number of flood points falling within each flood susceptibility class is also a good indicator of model adequacy. Table 2 presents the number of historical flood points located within the flood susceptibility classes. Here too, BT-MARS and BT-GAM performed better than BT-BRT, placing the fewest flood points in the very low susceptibility class and the most in the very high susceptibility class. For the BT-MARS and BT-GAM models, respectively, 1 (0.7%) and 1 (0.7%) flood points fall within the very low susceptibility class, whereas 134 (91%) and 133 (90.5%) flood points fall within the very high susceptibility class. For the BT-BRT model, 8 flood points (5%) fall within the very low susceptibility class and 80 flood points (54%) fall within the very high susceptibility class.

Fig. 5 SOMEWHERE HERE

Table 2 SOMEWHERE HERE
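The percentages quoted above can be checked with a short calculation. The sketch assumes that all 147 historical flood points form the denominator, which is consistent with the reported values but is not stated explicitly in this passage; the variable names are illustrative.

```python
TOTAL_FLOOD_POINTS = 147  # historical flood points reported in the text

# (points in the very low class, points in the very high class), as quoted above
counts = {"BT-MARS": (1, 134), "BT-GAM": (1, 133), "BT-BRT": (8, 80)}

def percent(count, total=TOTAL_FLOOD_POINTS):
    """Share of flood points in a class, in percent, rounded to one decimal."""
    return round(100.0 * count / total, 1)

shares = {model: (percent(low), percent(high)) for model, (low, high) in counts.items()}
```

Under this assumption the shares are 0.7% and 91.2% for BT-MARS, 0.7% and 90.5% for BT-GAM, and 5.4% and 54.4% for BT-BRT, in line with the rounded figures in the text.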

3.3. Sensitivity analysis of the response variable

Appropriate selection of the flood conditioning factors is crucial for reliable flood susceptibility predictions (Kourgialas and Karatzas, 2011). The role of flood conditioning factors varies from region to region, and it can be expected to depend on the properties of the sample data. To examine this, the present study used the BT algorithm together with the GAM model to model flood susceptibility using different samples over B = 10 simulation runs. In each run, a sample was randomly drawn with replacement from the original data and was used to analyze the associations between the flood conditioning factors and flood occurrence using the GAM model, based on the AUC and COR criteria. The importance of the flood conditioning factors was investigated using two statistics, the decrease in correlation (DC) and the decrease in AUC (DAUC) (Choubin et al., 2019a), for the different simulation runs. As Fig. 6 shows, different variables are identified as the most influential factors in each run, with varying degrees of importance. For example, distance to river and rainfall are the main influencing factors in run 6, while NDVI and elevation had a more pronounced role in run 7. Overall, during the 10 runs, the most important variables were NDVI, distance to river, elevation, lithology, and rainfall (Fig. 6). To determine the most influential factors, the mean of the variable importance over the B = 10 runs was calculated. Fig. 7 displays the average importance of the variables over the different runs for each model. As illustrated in this figure, NDVI, distance to river, elevation, and lithology are the main factors influencing flood occurrence over the study domain. These results are in accordance with Khosravi et al. (2018), Bui et al. (2018), Choubin et al. (2019b), and Darabi et al. (2019).

Fig. 6 SOMEWHERE HERE

Fig. 7 SOMEWHERE HERE
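The run-wise importance analysis can be illustrated with a small sketch. It assumes, as a simplification, that DC and DAUC are the drops in COR and AUC observed after randomly permuting one conditioning factor; the exact definition follows Choubin et al. (2019a) and may differ. The `model` function is a hypothetical stand-in for a fitted GAM, the data are synthetic, and each run here only repeats the permutation rather than refitting on a new bootstrap sample.

```python
import math
import random

def auc_rank(labels, scores):
    """AUC as the share of (flood, non-flood) pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def importance(model, X, y, column, rng):
    """Permute one predictor and return (DC, DAUC) under the assumed definition."""
    base = [model(row) for row in X]
    shuffled = [row[column] for row in X]
    rng.shuffle(shuffled)
    X_perm = [row[:column] + [v] + row[column + 1:] for row, v in zip(X, shuffled)]
    perm = [model(row) for row in X_perm]
    dc = pearson(y, base) - pearson(y, perm)
    dauc = auc_rank(y, base) - auc_rank(y, perm)
    return dc, dauc

# Synthetic data: predictor 0 drives flooding, predictor 1 is irrelevant.
X = [[i / 39, ((i * 7) % 40) / 39] for i in range(40)]
y = [1 if row[0] > 0.5 else 0 for row in X]

def model(row):
    """Hypothetical fitted model that relies on predictor 0 only."""
    return row[0]

rng = random.Random(1)
B = 10
mean_dc, mean_dauc = [], []
for column in range(2):
    runs = [importance(model, X, y, column, rng) for _ in range(B)]
    mean_dc.append(sum(dc for dc, _ in runs) / B)
    mean_dauc.append(sum(dauc for _, dauc in runs) / B)
```

Permuting the irrelevant predictor leaves the predictions unchanged, so its mean DC and DAUC are exactly zero, while permuting the informative predictor can only lower COR and AUC in this construction.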

3.4. Justification of the proposed models and future works

As discussed above, flood susceptibility prediction depends on the appropriate selection of the flood conditioning factors, and both are a function of the sample data points used for modeling. Previous investigations of flood susceptibility modeling have used one-time data splitting for modeling and prediction. Given the increased nonlinearity between the flood influencing factors and flood occurrence over large areas, biased flood susceptibility predictions would be expected when a single split into training and validation sets is applied. This is because only limited information is used to relate the flood influencing factors to flood occurrence. The proposed models based on the BT and RS algorithms provide sufficient information for repeated model learning using different sample points. With these models, a wide range of nonlinear associations between the predictors and the predictand is captured, leading to unbiased predictions. Because of limited processing speed, B = 10 resampling runs were used in this study. Although the results of this study were promising, for areas with limited hydrological data and increased nonlinearity between the predictors and flood occurrence, the number of bootstrap samples should be increased to an adequate number.

4. Conclusions

This study proposed novel integrative models based on the bootstrapping (BT) and random subsampling (RS) algorithms, combined with machine learning models, for unbiased flood susceptibility predictions. Three machine learning models, namely MARS, GAM, and BRT, were combined with the two resampling algorithms (BT and RS), yielding six new integrative models. The results indicated that employing resampling approaches improved the performance of the machine learning models. The BT algorithm performed better than the RS algorithm in terms of the performance evaluation measures. The BT-GAM and BT-MARS models had competitive results in terms of these measures; however, the BT-MARS model reproduced a lower area percentage for the very high susceptibility class. Sensitivity analysis of the response variable to the flood conditioning factors identified different conditioning factors, with varying degrees of importance, in each simulation run, which justifies the use of the proposed models for flood susceptibility modeling. Given the extent of the study area, the models presented in this study provided good results on a large scale, and their results could therefore be considered in large-scale planning for the development of urban areas.

References

Allouche, O., Tsoar, A., Kadmon, R., 2006. Assessing the accuracy of species distribution models: Prevalence, kappa and the true skill statistic (TSS). J. Appl. Ecol. https://doi.org/10.1111/j.1365-2664.2006.01214.x

Araújo, M.B., Pearson, R.G., Thuiller, W., Erhard, M., 2005. Validation of species-climate impact models under climate change. Glob. Chang. Biol. https://doi.org/10.1111/j.1365-2486.2005.01000.x

Arnold, J.G., Srinivasan, R., Muttiah, R.S., Williams, J.R., 1998. Large area hydrologic modeling and assessment part I: Model development. J. Am. Water Resour. Assoc. https://doi.org/10.1111/j.1752-1688.1998.tb05961.x

Azareh, A., Sardooi, E.R., Choubin, B., Barkhori, S., Shahdadi, A., Adamowski, J., Shamshirband, S., 2019. Incorporating multi-criteria decision-making and fuzzy-value functions for flood susceptibility assessment. Geocarto Int. https://doi.org/10.1080/10106049.2019.1695958

Beckers, A., Dewals, B., Erpicum, S., Dujardin, S., Detrembleur, S., Teller, J., Pirotton, M., Archambeau, P., 2013. Contribution of land use changes to future flood damage along the river Meuse in the Walloon region. Nat. Hazards Earth Syst. Sci. https://doi.org/10.5194/nhess-13-2301-2013

Benito, G., Rico, M., Sánchez-Moya, Y., Sopeña, A., Thorndycraft, V.R., Barriendos, M., 2010. The impact of late Holocene climatic variability and land use change on the flood hydrology of the Guadalentín River, southeast Spain. Glob. Planet. Change. https://doi.org/10.1016/j.gloplacha.2009.11.007

Bicknell, B.R., Imhoff, J.C., Kittle Jr., J.L., Donigan Jr., A.S., Johanson, R.C., 1997. Hydrological Simulation Program--Fortran, User’s manual for version 11: U.S. Environmental Protection Agency, National Exposure Research Laboratory, Athens, Ga., EPA/600/R-97/080, 755 p. https://doi.org/EPA/600/R-97/080

Brammer, H., 1990. Floods in Bangladesh: Geographical Background to the 1987 and 1988 Floods. Geogr. J. https://doi.org/10.2307/635431

Bui, D.T., Khosravi, K., Li, S., Shahabi, H., Panahi, M., Singh, V.P., Chapi, K., Shirzadi, A., Panahi, S., Chen, W., Bin Ahmad, B., 2018a. New hybrids of ANFIS with several optimization algorithms for flood susceptibility modeling. Water (Switzerland). https://doi.org/10.3390/w10091210

Bui, D.T., Panahi, M., Shahabi, H., Singh, V.P., Shirzadi, A., Chapi, K., Khosravi, K., Chen, W., Panahi, S., Li, S., Ahmad, B. Bin, 2018b. Novel Hybrid Evolutionary Algorithms for Spatial Prediction of Floods. Sci. Rep. https://doi.org/10.1038/s41598-018-33755-7

Bui, D.T., Tsangaratos, P., Ngo, P.T.T., Pham, T.D., Pham, B.T., 2019. Flash flood susceptibility modeling using an optimized fuzzy rule based feature selection technique and tree based ensemble methods. Sci. Total Environ. https://doi.org/10.1016/j.scitotenv.2019.02.422

Cardenas, M.B., Wilson, J.L., Zlotnik, V.A., 2004. Impact of heterogeneity, bed forms, and stream curvature on subchannel hyporheic exchange. Water Resour. Res. https://doi.org/10.1029/2004WR003008

Cea, L., Bladé, E., 2015. A simple and efficient unstructured finite volume scheme for solving the shallow water equations in overland flow applications. Water Resour. Res. https://doi.org/10.1002/2014WR016547

Chanson, H., Brown, R., McIntosh, D., 2014. Human body stability in floodwaters: The 2011 flood in Brisbane CBD, in: ISHS 2014 - Hydraulic Structures and Society - Engineering Challenges and Extremes: Proceedings of the 5th IAHR International Symposium on Hydraulic Structures. https://doi.org/10.14264/uql.2014.48

Chapi, K., Singh, V.P., Shirzadi, A., Shahabi, H., Bui, D.T., Pham, B.T., Khosravi, K., 2017. A novel hybrid artificial intelligence approach for flood susceptibility assessment. Environ. Model. Softw. https://doi.org/10.1016/j.envsoft.2017.06.012

Chen, Y.R., Yeh, C.H., Yu, B., 2011. Integrated application of the analytic hierarchy process and the geographic information system for flood risk assessment and flood plain management in Taiwan. Nat. Hazards. https://doi.org/10.1007/s11069-011-9831-7

Choubin, B., Moradi, E., Golshan, M., Adamowski, J., Sajedi-Hosseini, F., Mosavi, A., 2019. An ensemble prediction of flood susceptibility using multivariate discriminant analysis, classification and regression trees, and support vector machines. Sci. Total Environ. https://doi.org/10.1016/j.scitotenv.2018.10.064

Croke, B., Andrews, F., Jakeman, A., 2005. Redesign of the IHACRES rainfall-runoff model. Proc.

Darabi, H., Choubin, B., Rahmati, O., Torabi Haghighi, A., Pradhan, B., Kløve, B., 2019. Urban flood risk mapping using the GARP and QUEST models: A comparative study of machine learning techniques. J. Hydrol. https://doi.org/10.1016/j.jhydrol.2018.12.002

Efron, B., 1979. Bootstrap Methods: Another Look at the Jackknife. Ann. Stat. https://doi.org/10.1214/aos/1176344552

Elith, J., Leathwick, J.R., Hastie, T., 2008. A working guide to boosted regression trees. J. Anim. Ecol. https://doi.org/10.1111/j.1365-2656.2008.01390.x

Evans, R., Horstman, C., Conzemius, M., 2005. Accuracy and optimization of force platform gait analysis in Labradors with cranial cruciate disease evaluated at a walking gait. Vet. Surg. https://doi.org/10.1111/j.1532-950X.2005.00067.x

Fawcett, T., 2006. An introduction to ROC analysis. Pattern Recognit. Lett. https://doi.org/10.1016/j.patrec.2005.10.010

Feldman, A., 2000. Hydrologic modeling system HEC-HMS, Technical Reference Manual. Tech. Ref. Man. https://doi.org/CDP-74B

Fenicia, F., Savenije, H.H.G., Matgen, P., Pfister, L., 2008. Understanding catchment behavior through stepwise model concept improvement. Water Resour. Res. https://doi.org/10.1029/2006WR005563

Fox, J., 2002. Bootstrapping Regression Models. Ann. Stat. https://doi.org/10.1214/aos/1176345638

Freund, Y., Schapire, R.R.E., 1996. Experiments with a New Boosting Algorithm. Int. Conf. Mach. Learn. https://doi.org/10.1.1.133.1040

Friedman, J., 1991. Multivariate adaptive regression splines (with discussion). Ann. Stat.

García Ruiz, P.J., Ignacio, Á.S., Pensado, B.A., García, A.C., Frech, F.A., López, M.Á., González, J.A., Octavio, J.B., Burguera Hernández, J.A., Garriga, M.C., Blanco, D.C., García, B.C., Cordero, M.C., Peña, J.C., Ibáñez, A.E., Onisalde, A.G., Giménez-Roldán, S., Ibáñez, P.G., Vara, J.H., Alonso, R.I., Jiménez Jiménez, F.J., Krupinski, J., Bojarsky, J.K., Ramírez, I.L., García, E.L., Martínez-Castrillo, J.C., González, D.M., Rodríguez, F.M., Rivera, P.M., Fargas, E.M., Inchausti, J.O., Romero, J.O., Plana, J.O., Vallejo, P.O., Sedano, B.P., de Colosía Rama, V.P., López-Fraile, I.P., Comes, A.P., Periz, V.P., Rodríguez Oroz, M.C., García, D.S., Pérez, P.S., Muñoz, J.S., Gamo, J.V., Merino, C.V., Serra, F.V., Velázquez Pérez, J.M., Baña, R.Y., Capdepon, I.Z., 2008. Efficacy of long-term continuous subcutaneous apomorphine infusion in advanced Parkinson’s disease with motor fluctuations: A multicenter study. Mov. Disord. https://doi.org/10.1002/mds.22063

Ghomian, Z., Yousefian, S., 2017. Natural Disasters in the Middle-East and North Africa With a Focus on Iran: 1900 to 2015. Heal. Emergencies Disasters Q. https://doi.org/10.18869/nrip.hdq.2.2.53

Guzzetti, F., Mondini, A.C., Cardinali, M., Fiorucci, F., Santangelo, M., Chang, K.T., 2012. Landslide inventory maps: New tools for an old problem. Earth-Science Rev. https://doi.org/10.1016/j.earscirev.2012.02.001

Habibian, F., 2018. Increased number of floods in Iran [WWW Document]. Econ. news database.

Haghizadeh, A., Siahkamari, S., Haghiabi, A.H., Rahmati, O., 2017. Forecasting flood-prone areas using Shannon’s entropy model. J. Earth Syst. Sci. https://doi.org/10.1007/s12040-017-0819-x

Hastie, T., Tibshirani, R., 1999. Generalized additive models. Chapman & Hall/CRC, Boca Raton, Fla.; London.

Hastie, T., Tibshirani, R., Friedman, J., 2009. Elements of Statistical Learning 2nd ed. Elements. https://doi.org/10.1007/978-0-387-84858-7

Heitmuller, F.T., Hudson, P.F., Asquith, W.H., 2015. Lithologic and hydrologic controls of mixed alluvial-bedrock channels in flood-prone fluvial systems: Bankfull and macrochannels in the Llano River watershed, central Texas, USA. Geomorphology. https://doi.org/10.1016/j.geomorph.2014.12.033

Hijmans, R.J., van Etten, J., Cheng, J., Mattiuzzi, M., Sumner, M., Greenberg, J.A., Lamigueiro, O.P., Bevan, A., Racine, E.B., Shortridge, A., Ghosh, A., 2017. Geographic Data Analysis and Modeling. R CRAN Proj.

Hong, H., Panahi, M., Shirzadi, A., Ma, T., Liu, J., Zhu, A.X., Chen, W., Kougias, I., Kazakis, N., 2018. Flood susceptibility assessment in Hengfeng area coupling adaptive neuro-fuzzy inference system with genetic algorithm and differential evolution. Sci. Total Environ. https://doi.org/10.1016/j.scitotenv.2017.10.114

Hosseini, F.S., Choubin, B., Mosavi, A., Nabipour, N., Shamshirband, S., Darabi, H., Haghighi, A.T., 2019. Flash-flood hazard assessment using Ensembles and Bayesian-based machine learning models: application of the simulated annealing feature selection method. Sci. Total Environ., 135161.

Jamali, M., Moghimi, E., Jafarpour, Z., Kardavani, P., 2015. Spatial analysis of the geomorphological hazards of urban development in Shiraz dry river. J. Spat. Anal. Environ. Hazards 3, 51–61.

Jin, X., Wang, S., Yu, N., Zou, H., An, J., Zhang, Yuling, Wang, J., Zhang, Yulong, 2018. Spatial predictions of the permanent wilting point in arid and semi-arid regions of Northeast China. J. Hydrol. https://doi.org/10.1016/j.jhydrol.2018.07.038

Jones, K., Almond, S., 1992. Moving out of the linear rut: the possibilities of generalized additive models. Transactions of the Institute of British Geographers, 434–447.

Jones, K., Wrigley, N., 1995. Generalized additive models, graphical diagnostics, and logistic regression. Geographical Analysis 27(1), 1–18.

Karlsson, C.S.J., Kalantari, Z., Mörtberg, U., Olofsson, B., Lyon, S.W., 2017. Natural Hazard Susceptibility Assessment for Road Planning Using Spatial Multi-Criteria Analysis. Environ. Manage. https://doi.org/10.1007/s00267-017-0912-6

Kazakis, N., Kougias, I., Patsialis, T., 2015. Assessment of flood hazard areas at a regional scale using an index-based approach and Analytical Hierarchy Process: Application in Rhodope-Evros region, Greece. Sci. Total Environ. https://doi.org/10.1016/j.scitotenv.2015.08.055

Khosravi, K., Nohani, E., Maroufinia, E., Pourghasemi, H.R., 2016a. A GIS-based flood susceptibility assessment and its mapping in Iran: a comparison between frequency ratio and weights-of-evidence bivariate statistical models with multi-criteria decision-making technique. Nat. Hazards. https://doi.org/10.1007/s11069-016-2357-2

Khosravi, K., Pham, B.T., Chapi, K., Shirzadi, A., Shahabi, H., Revhaug, I., Prakash, I., Tien Bui, D., 2018. A comparative assessment of decision trees algorithms for flash flood susceptibility modeling at Haraz watershed, northern Iran. Sci. Total Environ. https://doi.org/10.1016/j.scitotenv.2018.01.266

Khosravi, K., Pourghasemi, H.R., Chapi, K., Bahri, M., 2016b. Flash flood susceptibility analysis and its mapping using different bivariate models in Iran: a comparison between Shannon’s entropy, statistical index, and weighting factor models. Environ. Monit. Assess. https://doi.org/10.1007/s10661-016-5665-9

Khosravi, K., Shahabi, H., Pham, B.T., Adamowski, J., Shirzadi, A., Pradhan, B., Dou, J., Ly, H.B., Gróf, G., Ho, H.L., Hong, H., Chapi, K., Prakash, I., 2019. A comparative assessment of flood susceptibility modeling using Multi-Criteria Decision-Making Analysis and Machine Learning Methods. J. Hydrol. https://doi.org/10.1016/j.jhydrol.2019.03.073

Kia, M.B., Pirasteh, S., Pradhan, B., Mahmud, A.R., Sulaiman, W.N.A., Moradi, A., 2012. An artificial neural network model for flood simulation using GIS: Johor River Basin, Malaysia. Environ. Earth Sci. https://doi.org/10.1007/s12665-011-1504-z

Knoll, L., Breuer, L., Bach, M., 2019. Large scale prediction of groundwater nitrate concentrations from spatial data using machine learning. Sci. Total Environ. https://doi.org/10.1016/j.scitotenv.2019.03.045

Komolafe, A.A., Herath, S., Avtar, R., 2018. Methodology to Assess Potential Flood Damages in Urban Areas under the Influence of Climate Change. Nat. Hazards Rev. https://doi.org/10.1061/(asce)nh.1527-6996.0000278

Kourgialas, N.N., Karatzas, G.P., 2011. Flood management and a GIS modelling method to assess flood-hazard areas–a case study. Hydrol. Sci. J. https://doi.org/10.1080/02626667.2011.555836

Kumar, R., Acharya, P., 2016. Flood hazard and risk assessment of 2014 floods in Kashmir Valley: a space-based multisensor approach. Nat. Hazards. https://doi.org/10.1007/s11069-016-2428-4

Lee, M.J., Kang, J.E., Jeon, S., 2012. Application of frequency ratio model and validation for predictive flooded area susceptibility mapping using GIS, in: International Geoscience and Remote Sensing Symposium (IGARSS). https://doi.org/10.1109/IGARSS.2012.6351414

Lee, Sunmin, Kim, J.C., Jung, H.S., Lee, M.J., Lee, Saro, 2017. Spatial prediction of flood susceptibility using random-forest and boosted-tree models in Seoul metropolitan city, Korea. Geomatics, Nat. Hazards Risk. https://doi.org/10.1080/19475705.2017.1308971

Li, X.H., Zhang, Q., Shao, M., Li, Y.L., 2011. A Comparison of Parameter Estimation for Distributed Hydrological Modelling Using Automatic and Manual Methods. Adv. Mater. Res. https://doi.org/10.4028/www.scientific.net/amr.356-360.2372

Luu, C., Von Meding, J., Kanjanabootra, S., 2018. Assessing flood hazard using flood marks and analytic hierarchy process approach: a case study for the 2013 flood event in Quang Nam, Vietnam. Nat. Hazards. https://doi.org/10.1007/s11069-017-3083-0

Mackenzie, C.E., 1980. Coded character sets: history and development. Addison-Wesley, Reading, Mass.

Milborrow, S., 2019. earth: Multivariate Adaptive Regression Splines.

Miles, R.E., Snow, C.C., 1984. Designing strategic human resources systems. Organ. Dyn. https://doi.org/10.1016/0090-2616(84)90030-5

Mitchell Lyons, 2018. Generalised additive models (GAMs): an introduction [WWW Document]. Environ. Comput. URL http://environmentalcomputing.net/intro-to-gams/

Mosavi, A., Ozturk, P., Chau, K.W., 2018. Flood prediction using machine learning models: Literature review. Water (Switzerland). https://doi.org/10.3390/w10111536

Naimi, B., Araújo, M.B., 2016. sdm: A reproducible and extensible R platform for species distribution modelling. Ecography (Cop.). https://doi.org/10.1111/ecog.01881

Picard, R.R., Cook, R.D., 1984. Cross-Validation of Regression Models. J. Am. Stat. Assoc. 79, 575–583. https://doi.org/10.1080/01621459.1984.10478083

Poli, S., Sterlacchini, S., 2007. Landslide representation strategies in susceptibility studies using weights-of-evidence modeling technique. Nat. Resour. Res. https://doi.org/10.1007/s11053-007-9043-8

Politis, D.N., Romano, J.P., Wolf, M., 1999. Subsampling. Springer, New York, NY.

Predick, K.I., Turner, M.G., 2008. Landscape configuration and flood frequency influence invasive shrubs in floodplain forests of the Wisconsin River (USA). J. Ecol. https://doi.org/10.1111/j.1365-2745.2007.01329.x

Put, R., Xu, Q.S., Massart, D.L., Vander Heyden, Y., 2004. Multivariate adaptive regression splines (MARS) in chromatographic quantitative structure–retention relationship studies. Journal of Chromatography A 1055(1-2), 11–19.

R Development Core Team, 2016. R: A language and environment for statistical computing. R Found. Stat. Comput. https://doi.org/10.1017/CBO9781107415324.004

Rahmati, O., Kornejady, A., Samadi, M., Nobre, A.D., Melesse, A.M., 2018. Development of an automated GIS tool for reproducing the HAND terrain model. Environ. Model. Softw. https://doi.org/10.1016/j.envsoft.2018.01.004

Rahmati, O., Pourghasemi, H.R., Zeinivand, H., 2016. Flood susceptibility mapping using frequency ratio and weights-of-evidence models in the Golastan Province, Iran. Geocarto Int. https://doi.org/10.1080/10106049.2015.1041559

Reneau, S.L., 2000. Stream incision and terrace development in Frijoles Canyon, Bandelier National Monument, New Mexico, and the influence of lithology and climate. Geomorphology. https://doi.org/10.1016/S0169-555X(99)00094-X

Rezaie-balf, M., Naganna, S.R., Ghaemi, A., Deka, P.C., 2017. Wavelet coupled MARS and M5 Model Tree approaches for groundwater level forecasting. J. Hydrol. https://doi.org/10.1016/j.jhydrol.2017.08.006

Journal Pre-proof Ridgeway, G., 2013. gbm: Generalized Boosted Regression Models. R Packag. version 1.6-3.1. Sajedi‐ Hosseini, F., Choubin, B., Solaimani, K., Cerdà, A. and Kavian, A., 2018a. Spatial prediction of soil erosion susceptibility using a fuzzy analytical network process: Application of the fuzzy decision making trial and evaluation laboratory approach. Land degradation & development, 29(9), pp.3092-3103. Sajedi-Hosseini, F., Malekian, A., Choubin, B., Rahmati, O., Cipullo, S., Coulon, F., Pradhan,

contamination.

Sci.

-p

https://doi.org/10.1016/j.scitotenv.2018.07.054

Total

Environ.

ro

groundwater

of

B., 2018b. A novel machine learning-based approach for the risk assessment of nitrate

Samani, S., 2019. Allocation of 151 billion rials to compensate for flood damage to two provinces [WWW Document]. Iran. student’s News Agency. URL https://www.isna.ir/news/97121306705

Samanta, R.K., Bhunia, G.S., Shit, P.K., Pourghasemi, H.R., 2018. Flood susceptibility mapping using geospatial frequency ratio technique: a case study of Subarnarekha River Basin, India. Model. Earth Syst. Environ. https://doi.org/10.1007/s40808-018-0427-z

Schapire, R.E., 2003. The Boosting Approach to Machine Learning: An Overview. https://doi.org/10.1007/978-0-387-21579-2_9

Seckin, N., Cobaner, M., Yurtal, R., Haktanir, T., 2013. Comparison of Artificial Neural Network Methods with L-moments for Estimating Flood Flow at Ungauged Sites: The Case of East Mediterranean River Basin, Turkey. Water Resour. Manag. https://doi.org/10.1007/s11269-013-0278-3

Siahkamari, S., Haghizadeh, A., Zeinivand, H., Tahmasebipour, N., Rahmati, O., 2018. Spatial prediction of flood-susceptible areas using frequency ratio and maximum entropy models. Geocarto Int. https://doi.org/10.1080/10106049.2017.1316780

Abney, S., 2002. Bootstrapping, in: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL). Philadelphia, pp. 360–367.

Tang, Z., Zhang, H., Yi, S., Xiao, Y., 2018. Assessment of flood susceptible areas using spatially explicit, probabilistic multi-criteria decision analysis. J. Hydrol. https://doi.org/10.1016/j.jhydrol.2018.01.033

Tavoosi, T., Delara, G., 2010. Climate Classification of Ardebil Province. Nivar 34, 47–52.

Tehrany, M.S., Pradhan, B., Jebur, M.N., 2013. Spatial prediction of flood susceptible areas using rule based decision tree (DT) and a novel ensemble bivariate and multivariate statistical models in GIS. J. Hydrol. https://doi.org/10.1016/j.jhydrol.2013.09.034

Tehrany, M.S., Pradhan, B., Jebur, M.N., 2014. Flood susceptibility mapping using a novel ensemble weights-of-evidence and support vector machine models in GIS. J. Hydrol. https://doi.org/10.1016/j.jhydrol.2014.03.008

Tehrany, M.S., Pradhan, B., Jebur, M.N., 2015a. Flood susceptibility analysis and its verification using a novel ensemble support vector machine and frequency ratio method. Stoch. Environ. Res. Risk Assess. https://doi.org/10.1007/s00477-015-1021-9

Tehrany, M.S., Pradhan, B., Mansor, S., Ahmad, N., 2015b. Flood susceptibility assessment using GIS-based support vector machine model with different kernel types. Catena. https://doi.org/10.1016/j.catena.2014.10.017

Thuiller, W., Georges, D., Engler, R., 2013. biomod2: Ensemble platform for species distribution modeling. R Packag. version. https://doi.org/10.1017/CBO9781107415324.004

Tien Bui, D., Bui, Q.T., Nguyen, Q.P., Pradhan, B., Nampak, H., Trinh, P.T., 2017. A hybrid artificial intelligence approach using GIS-based neural-fuzzy inference system and particle swarm optimization for forest fire susceptibility modeling at a tropical area. Agric. For. Meteorol. https://doi.org/10.1016/j.agrformet.2016.11.002

Tien Bui, D., Pradhan, B., Nampak, H., Bui, Q.T., Tran, Q.A., Nguyen, Q.P., 2016. Hybrid artificial intelligence approach based on neural fuzzy inference model and metaheuristic optimization for flood susceptibility modeling in a high-frequency tropical cyclone area using GIS. J. Hydrol. https://doi.org/10.1016/j.jhydrol.2016.06.027

UNISDR, 2011. Global Assessment Report on Disaster Risk Reduction. International Strategy for Disaster Reduction (ISDR).

UNISDR, 2015. Global Assessment Report on Disaster Risk Reduction. International Strategy for Disaster Reduction (ISDR). https://doi.org/9789211320282

Vaghefi, S.A., Keykhai, M., Jahanbakhshi, F., Sheikholeslami, J., Ahmadi, A., Yang, H., Abbaspour, K.C., 2019. The future of extreme climate in Iran. Sci. Rep. 9, 1464. https://doi.org/10.1038/s41598-018-38071-8

Wang, Y., Hong, H., Chen, W., Li, S., Panahi, M., Khosravi, K., Shirzadi, A., Shahabi, H., Panahi, S., Costache, R., 2019. Flood susceptibility mapping in Dingnan County (China) using adaptive neuro-fuzzy inference system with biogeography based optimization and imperialistic competitive algorithm. J. Environ. Manage. 247, 712–729. https://doi.org/10.1016/j.jenvman.2019.06.102

Wu, C.F.J., 1986. Jackknife, Bootstrap and Other Resampling Methods in Regression Analysis. Ann. Stat. 14, 1261–1295. https://doi.org/10.1214/aos/1176350142

Yalcin, A., Reis, S., Aydinoglu, A.C., Yomralioglu, T., 2011. A GIS-based comparative study of frequency ratio, analytical hierarchy process, bivariate statistics and logistics regression methods for landslide susceptibility mapping in Trabzon, NE Turkey. Catena. https://doi.org/10.1016/j.catena.2011.01.014

Yang, W., Xu, K., Lian, J., Ma, C., Bin, L., 2018. Integrated flood vulnerability assessment approach based on TOPSIS and Shannon entropy methods. Ecol. Indic. https://doi.org/10.1016/j.ecolind.2018.02.015

Yates, D.N., Warner, T.T., Leavesley, G.H., 2002. Prediction of a Flash Flood in Complex Terrain. Part II: A Comparison of Flood Discharge Simulations Using Rainfall Input from Radar, a Dynamic Model, and an Automated Algorithmic System. J. Appl. Meteorol.

https://doi.org/10.1175/1520-0450(2000)039<0815:poaffi>2.0.co;2

Zhang, W., Goh, A.T.C., 2016. Multivariate adaptive regression splines and neural network models for prediction of pile drivability. Geosci. Front. https://doi.org/10.1016/j.gsf.2014.10.003

Zhao, G., Pang, B., Xu, Z., Yue, J., Tu, T., 2018. Mapping flood susceptibility in mountainous areas on a national scale in China. Sci. Total Environ. https://doi.org/10.1016/j.scitotenv.2017.10.037

Zhou, Y., Wu, Y., 2011. Analyses on influence of training data set to neural network supervised learning performance. Adv. Intell. Soft Comput. 106, 19–25.

Declaration of interests

☒ The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


Figures:

Fig. 1. Location of the Ardabil Province in northwest Iran


Fig. 2. Flood conditioning factors used in this study

Fig. 3. Flowchart of methodology

[Flowchart: the flood inventory map (flood and non-flood points) and the flood conditioning factors (altitude, slope, aspect, curvature, river, rainfall, NDVI, land use, lithology) are resampled by the RS and BT algorithms (B = 10 runs) into training and validation datasets; the GAM, MARS and BRT models are calibrated until the stopping condition is met (YES/NO); the flood susceptibility is averaged over the B = 10 runs to produce the flood susceptibility map.]
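The resampling workflow summarized in the Fig. 3 flowchart (resample, fit, average over B = 10 runs) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the learner is a plain logistic regression standing in for GAM, MARS and BRT, the data are synthetic, and the 70% training fraction for random subsampling as well as all function names are hypothetical choices made for the example.

```python
import numpy as np

def bootstrap_split(n, rng):
    """Bootstrap (BT): draw n indices with replacement; rows never drawn validate."""
    train = rng.integers(0, n, size=n)
    test = np.setdiff1d(np.arange(n), train)
    return train, test

def subsample_split(n, rng, train_frac=0.7):
    """Random subsampling (RS): training subset drawn without replacement.
    train_frac=0.7 is an assumed value for this sketch."""
    perm = rng.permutation(n)
    k = int(round(train_frac * n))
    return perm[:k], perm[k:]

def fit_logistic(X, y, steps=500, lr=0.1):
    """Stand-in learner (gradient-descent logistic regression), NOT GAM/MARS/BRT."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict(w, X):
    """Predicted flood probability for each row of X."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def ensemble_susceptibility(X, y, X_map, split_fn, B=10, seed=0):
    """Fit one model per resampled training set and average the B predictions."""
    rng = np.random.default_rng(seed)
    runs = []
    for _ in range(B):
        train, _test = split_fn(len(y), rng)
        w = fit_logistic(X[train], y[train])
        runs.append(predict(w, X_map))
    return np.mean(runs, axis=0)

# Synthetic demo data (hypothetical): two conditioning factors, binary flood label.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = (X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(float)
X_map = rng.normal(size=(50, 2))  # stand-in for the grid cells to be mapped

bt_map = ensemble_susceptibility(X, y, X_map, bootstrap_split)
rs_map = ensemble_susceptibility(X, y, X_map, subsample_split)
```

The two split functions differ only in whether indices are drawn with or without replacement, so the same averaging loop serves both the BT and RS variants.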

Fig. 4. Flood susceptibility predictions in the Ardabil province using the proposed integrative models.

Fig. 5. ROC curves of the developed integrative models for learning and validation phases, with the thin curves representing the ROC of the individual runs and the thicker curves representing the mean ROC over the B = 10 runs.

[Panels: RS-GAM, RS-MARS, RS-BRT, BT-GAM, BT-MARS, BT-BRT. Axes: 1-Specificity (false positive rate) and Sensitivity (true positive rate). Each panel reports the mean AUC for the training and test data.]

Fig. 6. Sensitivity analysis of the response variable (flood) to the various conditioning factors for individual runs of the BT-GAM model


Fig. 7. Sensitivity analysis of the response variable (flood) to the various conditioning factors using bootstrapping (BT) and random subsampling (RS) algorithms.


Tables:

Table 1 Accuracy assessment of the integrative models, comparison with standalone benchmark models

Model        AUC    COR    TSS
Standalone benchmark models
GAM          0.92   0.82   0.84
MARS         0.95   0.86   0.89
BRT          0.93   0.74   0.82
MLP          0.90   0.64   0.71
SVM          0.94   0.78   0.78
RS integrative models
RS-GAM       0.95   0.86   0.87
RS-MARS      0.96   0.86   0.89
RS-BRT       0.92   0.73   0.77
BT integrative models
BT-GAM       0.98   0.91   0.93
BT-MARS      0.97   0.91   0.94
BT-BRT       0.95   0.79   0.83
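The three accuracy measures in Table 1 can be illustrated with a short sketch. It assumes the usual definitions: AUC as the probability that a flood point scores higher than a non-flood point, COR as the Pearson correlation between the 0/1 observations and the predicted probabilities, and TSS as sensitivity + specificity - 1. The 0.5 cutoff for TSS, the function names and the toy data are assumptions made for this example and may differ from the settings used in the study.

```python
import numpy as np

def auc(y_true, score):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity."""
    y_true = np.asarray(y_true, dtype=bool)
    score = np.asarray(score, dtype=float)
    pos, neg = score[y_true], score[~y_true]
    # Fraction of (flood, non-flood) pairs ranked correctly; ties count one half.
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def cor(y_true, score):
    """Pearson correlation between 0/1 observations and predicted probabilities."""
    return float(np.corrcoef(np.asarray(y_true, float), np.asarray(score, float))[0, 1])

def tss(y_true, score, threshold=0.5):
    """True skill statistic = sensitivity + specificity - 1 at an assumed cutoff."""
    y_true = np.asarray(y_true, dtype=bool)
    pred = np.asarray(score, dtype=float) >= threshold
    sensitivity = (pred & y_true).sum() / y_true.sum()
    specificity = (~pred & ~y_true).sum() / (~y_true).sum()
    return float(sensitivity + specificity - 1.0)

# Toy validation set (hypothetical): 1 = flood point, 0 = non-flood point.
y_obs = [1, 1, 1, 0, 0, 0]
p_hat = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]

print(auc(y_obs, p_hat), cor(y_obs, p_hat), tss(y_obs, p_hat))
```

On the toy data, eight of the nine flood/non-flood pairs are ranked correctly, so the AUC is 8/9, and two of three points in each class are classified correctly at the 0.5 cutoff, so the TSS is 1/3.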

Table 2 Area percent and the number of floods that occurred within the susceptibility classes reproduced by the models

Susceptibility rate  BT-GAM                            BT-BRT                            BT-MARS
                     Area percent    Number of floods  Area percent    Number of floods  Area percent    Number of floods
Very low             191209 (40%)    1 (0.7%)          155901 (8%)     8 (5%)            1953529 (41%)   1 (0.7%)
Low                  581374 (12%)    2 (1.36%)         162310 (39%)    25 (17%)          644932 (13%)    1 (0.7%)
Moderate             380406 (8%)     2 (1.36%)         277526 (15%)    9 (6%)            464488 (10%)    3 (2%)
High                 379499 (8%)     10 (6%)           422092 (22%)    25 (17%)          452812 (10%)    9 (5.5%)
Very high            148599 (32%)    133 (90.5%)       857621 (46%)    80 (54%)          1223601 (26%)   134 (91%)

[Graphical abstract elements: flood and non-flood points; conditioning factors (elevation, slope, aspect, curvature, distance to river, rainfall, NDVI); resampling algorithms; machine learning models; sensitivity analysis; flood susceptibility map.]

Highlights:

Novel integrative models proposed for flood susceptibility predictions.



Resampling algorithms were integrated with machine learning models.



Bootstrapping and subsampling algorithms improved the performance of the models.



Machine learning models performed best with the bootstrapping algorithm.