Novel forecasting approaches using combination of machine learning and statistical models for flood susceptibility mapping


Journal of Environmental Management 217 (2018) 1–11

Contents lists available at ScienceDirect

Journal of Environmental Management journal homepage: www.elsevier.com/locate/jenvman

Research article

Hossein Shafizadeh-Moghadam a,*, Roozbeh Valavi b, Himan Shahabi c, Kamran Chapi d, Ataollah Shirzadi d

a Department of GIS and Remote Sensing, Tarbiat Modares University, Tehran, Iran
b School of BioSciences, University of Melbourne, Parkville, VIC, 3010, Australia
c Department of Geomorphology, Faculty of Natural Resources, University of Kurdistan, Sanandaj, Iran
d Department of Rangeland and Watershed Management, Faculty of Natural Resources, University of Kurdistan, Sanandaj, Iran

Article info

Article history: Received 20 January 2018; Received in revised form 13 March 2018; Accepted 20 March 2018

Keywords: Flood susceptibility mapping; Haraz watershed; Ensemble forecasting; Machine learning; Background sampling

Abstract

In this research, eight individual machine learning and statistical models are implemented and compared, and based on their results, seven ensemble models for flood susceptibility assessment are introduced. The individual models included artificial neural networks, classification and regression trees, flexible discriminant analysis, generalized linear model, generalized additive model, boosted regression trees, multivariate adaptive regression splines, and maximum entropy. The ensemble models were Ensemble Model committee averaging (EMca), Ensemble Model confidence interval Inferior (EMciInf), Ensemble Model confidence interval Superior (EMciSup), Ensemble Model to estimate the coefficient of variation (EMcv), Ensemble Model to estimate the mean (EMmean), Ensemble Model to estimate the median (EMmedian), and Ensemble Model based on weighted mean (EMwmean). The data set covered 201 flood events in the Haraz watershed (Mazandaran province, Iran) and 10,000 randomly selected non-occurrence points. Among the individual models, boosted regression trees achieved the highest Area Under the Receiver Operating Characteristic curve (AUROC; 0.975) and the generalized linear model the lowest (0.642). The proposed EMmedian achieved the highest value among all models (0.976). Despite the outstanding performance of some models, variability among the predictions of the individual models was considerable. Therefore, to reduce uncertainty and to create more generalizable, more stable, and less sensitive models, ensemble forecasting approaches, and in particular EMmedian, are recommended for flood susceptibility assessment. © 2018 Elsevier Ltd. All rights reserved.

* Corresponding author.
https://doi.org/10.1016/j.jenvman.2018.03.089
0301-4797/© 2018 Elsevier Ltd. All rights reserved.

1. Introduction

A flood is the overflow of river water from its natural bed, which inundates nearby land (Shirzadi et al., 2017) and causes severe damage to human life and property (Lee et al., 2012; Merkuryeva et al., 2015; Rahmati and Pourghasemi, 2017; Tehrany et al., 2015). Spatial prediction of this environmental disaster is therefore crucial, because failure to identify the flood-prone areas of a watershed may increase its devastating effects. Flood occurrence is affected by several factors such as land use, distance to river, drainage network, soil type, altitude and slope (e.g., Rahmati and Pourghasemi, 2017; Tehrany et al., 2015; Zhao et al., 2018); thus, understanding and quantifying the influence of each of these factors on flood occurrence is essential. Thanks to advances in remote sensing, Geographic Information Systems (GIS), Machine Learning (ML), and statistical models, creating more accurate flood susceptibility maps is now feasible (e.g., Chapi et al., 2017; Tehrany et al., 2013; Zhao et al., 2018). However, preparing these maps successfully requires a thorough understanding of the process of flood occurrence, identification of the flood-related factors, knowledge of how each factor affects flood occurrence, and proper model selection and/or model development. Flood susceptibility mapping has been conducted for various areas at basin and national scales (Zhao et al., 2018), using different algorithms such as Logistic Regression (LR; Tehrany et al., 2013), Artificial Neural Networks (ANNs; Zhao et al., 2018), Frequency
Ratio (FR; Cao et al., 2016), Boosted Regression Trees (BRT; Rahmati and Pourghasemi, 2017), Generalized Linear Model (GLM; Chapi et al., 2017), Support Vector Machine (SVM; Tehrany et al., 2015), and Random Forest (RF; Chapi et al., 2017; Rahmati and Pourghasemi, 2017). Recently, Hong et al. (2018) proposed a novel approach to create a flood susceptibility map for Poyang County, China, using fuzzy weight of evidence (fuzzy-WofE) and data mining methods. Termeh et al. (2018) mapped flood hazard areas by integrating adaptive neuro-fuzzy inference systems (ANFIS) with different metaheuristic algorithms such as ant colony optimization (ACO), genetic algorithm (GA), and particle swarm optimization (PSO). To determine areas of flood exposure, Tang et al. (2018) combined the probabilistic and the local ordered weighted averaging (OWA) methods via Monte Carlo simulation to account for the uncertainty associated with the weights of the selected factors, the spatial heterogeneity of preferences, and the analysts. Some of these methods, such as LR, have lower predictive ability but can explain the role of the contributing factors. Others, such as BRT, offer both high predictive power and the ability to diagnose the role of the factors underlying flood occurrence. A third group, such as ANN, has great predictive ability but cannot explicitly identify the role of the factors influencing floods.

Flood susceptibility mapping faces several challenges. First, adopting a proper method for selecting non-occurrence flood points is difficult, because several methods exist and each produces different results. Another issue is model selection, since there is no universal consensus on a single model and each model has its own pros and cons.
Each model may show different accuracies depending on its assumptions about data distribution, its sensitivity to extreme values, internal factors, and the primary purpose for which it was developed. Combining individual models could generate more generalizable, less sensitive and more accurate results (Araújo and New, 2007; Buston and Elith, 2011). One approach for boosting the accuracy of individual models is ensemble forecasting. Recently, some ensemble forecasting approaches were proposed for flood susceptibility mapping (Tehrany et al., 2013, 2014); however, they were based on a few simple or weighted averaging models, whereas ensemble forecasting works reasonably well only when several methods are combined. Moreover, previous studies neither mapped the uncertainty of the individual models nor explored the spatial response of each influencing factor on flood occurrence.

Many torrential floods have recently occurred in northern Iran due to high population density, the lack of legal supervision of compact and intensive construction around rivers, and enormous deforestation. Hence, preparing a flood susceptibility map as a management tool to identify flood-prone areas is essential to prevent construction in those areas and to protect natural resources (Tehrany et al., 2013). Based on the above-mentioned concerns, the primary aim of this paper is to generate response curves showing the spatial behavior of the different factors affecting flood occurrence. Then, the performance of eight well-established ML and statistical models, namely ANNs, CART, Flexible Discriminant Analysis (FDA), GAM, BRT, GLM, MARS, and MaxEnt, is evaluated. Finally, ensemble forecasting approaches that combine the results of the individual models to create more generalizable and more stable ones are discussed.

2. Materials and methods

2.1. Study area

The Haraz watershed, which is located in the north of Iran, was
selected as the case study. This area has been highly affected by floods, particularly during the last decades. The region extends between longitudes 51°43′ and 52°36′ E and latitudes 35°45′ and 36°22′ N. The majority of the Haraz watershed is located in Mazandaran province and the remainder covers parts of Tehran, Semnan and Golestan provinces (Fig. 1). The basin consists of rivers that originate in the central Alborz mountain chain and discharge into the Caspian Sea. The Haraz River is the main stream running through the watershed. The watershed covers an area of about 4014 km², including mountains, hills, rivers, and streams. The temperature falls below −25 °C in winter and reaches over 36.5 °C in summer (Chapi et al., 2017). The altitude of the region ranges between 328 and 5595 m above sea level (Khosravi et al., 2018), and the region receives a mean annual rainfall of ~500 mm (Pourghasemi et al., 2012). Rainfall records from the Iranian Meteorological Department show that most of the rainfall occurs in January, February, March, and October. Grassland is the dominant land cover of the region, covering over 92% of the watershed. The remaining 8% includes forest lands, woodlands, barren lands, water bodies and residential areas.

2.2. Geospatial database preparation

Based on the literature (e.g., Cao et al., 2016; Chapi et al., 2017; Rahmati and Pourghasemi, 2017; Tehrany et al., 2015; Zhao et al., 2018) and on local characteristics, eleven main factors that have significantly affected the occurrence of floods in the study area were identified. The conditioning factors include slope degree, curvature, elevation, topographic wetness index (TWI), stream power index (SPI), distance to river, river density, land use, Normalized Difference Vegetation Index (NDVI), rainfall, and lithology. A Digital Elevation Model (DEM) with 20 m spatial resolution for the study area was obtained from the Mazandaran Regional Water Authority (MRWA).
Since most of the factors were derived from the DEM layer, the spatial resolution of all factors was set identical to that of the DEM. Moreover, the selected cell size maintains detail and produces high-precision outputs (e.g., Chapi et al., 2017; Hong et al., 2018; Tehrany et al., 2015). Based on the DEM, slope degree, curvature, elevation, TWI, SPI, distance to river, and river density maps were extracted using ArcGIS 10 (Fig. 2).

Slope degree is related to water infiltration and flow velocity: a higher slope angle is associated with a higher water velocity and a lower rate of infiltration (Khosravi et al., 2016). Therefore, the lower areas are more exposed to floods. In the present study, the slope angle varies from 0° to 66.8°.

Curvature is considered in most of the literature as one of the conditioning factors influencing flooding in basins (Tehrany et al., 2013). A positive value represents a convex surface whereas a negative value indicates a concave surface at a given pixel (Chapi et al., 2017). In our study area, curvature varied from −7.93 to 9.83.

Elevation is defined as the height above sea level and is related to climatic conditions. Hence, it is an important conditioning factor in the flooding of a basin (Tehrany et al., 2015). In this study, it varies from 328 to 5595 m.

Another factor is TWI, one of the most influential factors in flood occurrence (Chapi et al., 2017). TWI is the logarithm of the ratio between the specific basin area and the slope (Gallant, 2000), and was calculated as (Beven and Kirkby, 1979):

$$\mathrm{TWI} = \ln\left(\frac{a}{\tan \beta}\right) \qquad (1)$$

where a is the cumulative upslope area draining through a point (per unit contour length) and β is the slope angle at the same point. The TWI ranged between 1.9 and 11.5.

Fig. 1. Haraz watershed and its location in Iran.

The SPI (Tehrany et al., 2014), another important factor, was extracted from the DEM and computed as (Moore and Wilson, 1992):

$$\mathrm{SPI} = A_s \tan \beta \qquad (2)$$

where A_s is the specific basin area and β is the local slope gradient (in degrees). The SPI in the study area ranged from 0 to 6,900,000.

Distance to river, another flood conditioning factor, is directly related to proximity to the main stream in the catchment area (Tehrany et al., 2015). In this study, the river network was extracted from topographical maps of the Haraz watershed and then divided into several classes, with distances ranging from 0 to 4352 m.

Another factor was river density, defined as the total length of streams and rivers divided by the total area of the drainage basin, which indicates how well or poorly a watershed is drained (Oikonomidis et al., 2015). The river density values of the study area ranged from 0 to 2.08 m/m².

Land use is considered one of the most important factors in the flooding of a basin, as it affects infiltration and runoff through the nature of the surface materials (Kia et al., 2012). The land use map of the study area was extracted from Landsat 8 Operational Land Imager (OLI) satellite images using a neural network algorithm in ENVI 5.1 software (Exelis Visual Information Solutions, Boulder, Colorado). The resulting map was classified into seven categories: water bodies, forest, barren land, residential areas, woodland, range land and crop land.

The NDVI, another factor, is a valuable index for assessing vegetation cover and its effects on flooding in a basin. This index generally ranges between −1 and +1. In the study area, the NDVI values ranged from −0.68 to 0.72.
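Equations (1) and (2) can be evaluated cell by cell once the upslope area and slope layers exist. The following Python sketch is only an illustration under stated assumptions: the study derived these layers in ArcGIS 10, so the function names `twi` and `spi`, the sample cell values, and the `min_slope_deg` clamp that keeps flat cells from dividing by zero are all hypothetical and not taken from the paper.

```python
import math


def twi(upslope_area, slope_deg, min_slope_deg=0.1):
    """Topographic wetness index, Eq. (1): ln(a / tan(beta)).

    upslope_area is the upslope area per unit contour length (a > 0).
    min_slope_deg is an assumed clamp so that flat cells (slope = 0)
    do not cause a division by zero.
    """
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(upslope_area / math.tan(beta))


def spi(specific_area, slope_deg):
    """Stream power index, Eq. (2): A_s * tan(beta), with beta in degrees."""
    return specific_area * math.tan(math.radians(slope_deg))


# Hypothetical cells: (upslope area per unit contour length, slope in degrees)
cells = [(50.0, 2.0), (500.0, 10.0), (5000.0, 45.0)]
for area, slope in cells:
    print(f"a={area:8.1f}  slope={slope:5.1f}  "
          f"TWI={twi(area, slope):6.2f}  SPI={spi(area, slope):9.2f}")
```

For a fixed slope, a larger contributing area raises both indices, while a steeper slope lowers TWI and raises SPI, which is why the two layers carry different information even though both come from the same DEM.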

Rainfall was also considered, as it is an important hydrologic process for recharging aquifers and flooding basins (Knebl et al., 2005). As rainfall increases, the potential for flood occurrence increases correspondingly (El Alfy, 2016; Wang et al., 2013). The rainfall map was constructed by ordinary kriging interpolation using the mean annual rainfall data of 17 rain gauges over a period of 20 years (1991–2011). Rainfall in the study area ranged between 187.7 mm and 740.5 mm.

Lithology, another important flood conditioning factor, is related to both soil porosity and the water permeability of aquifers (Tehrany et al., 2014), which in turn influence flooding in basins. The lithology map was extracted from the geological map at 1:100,000 scale obtained from the Geological Survey & Mineral Exploitation of Iran (GSI). It has six units: Jurassic, Triassic, Permian, Quaternary, Tertiary, and Cretaceous (Table 1).

Fig. 2. Flood conditioning factors: a) Curvature, b) Elevation, c) Land cover, d) Lithology, e) NDVI, f) Rainfall, g) River density, h) Distance to river, i) Slope, j) SPI and k) TWI.

Table 1. Data layers and their ranges applied in the flood modelling in this study.

Conditioning factor | Range/class types | Source
River density | 0–2.08 | DEM
River distance | 0–4352 (m) | DEM
Slope | 0–66.8 (%) | DEM
TWI | 1.9–11.5 | DEM
Land cover | Water bodies, barren land, forest, residential areas, woodland, range land, crop land | Landsat 8
Lithology | Jurassic, Triassic, Permian, Quaternary, Tertiary, Cretaceous | Geological Survey of Iran
Plan curvature | −7.93 to 9.83 | DEM
Elevation | 328–5595 | DEM
NDVI | −0.68 to 0.72 | Landsat 8
Rainfall | 187.7–740.5 (mm) | Rain gauge statistics
SPI | 0–6,900,000 | DEM

2.3. Individual models

2.3.1. Artificial neural networks (ANN)

An ANN consists of an interconnected set of elements called neurons arranged in a number of layers (Rahmati and Pourghasemi, 2017; Zhang and Goh, 2016), including input, hidden and output layers. In this study, 10 nodes, identical to the number of explanatory variables, were set for each of the input and hidden layers, while one node was used for the output layer, coded 1 for places where a flood was recorded and 0 for non-occurrence. Different algorithms exist to train an ANN model (that is, to determine its weights), with Back Propagation (BP) being one of the most popular. Therefore, a BP-based ANN was used to approximate the non-linear relationships between flood occurrence and the explanatory variables (Zhang and Goh, 2016). BP selects the initial weights randomly, then compares the calculated values with the observed values; their difference is summarized and reported as
mean squared error (Pijanowski et al., 2002). Based on a generalized delta rule (Rumelhart et al., 1986), the weights are then adjusted so that the total error is distributed among the various neurons in the network (Pijanowski et al., 2002). This process is repeated iteratively until the error levels off at a low value.

2.3.2. Classification and regression trees (CART)

CART is a tree-based model used for both classification and regression tasks (Breiman et al., 1984). It is a nonlinear technique that operates on a set of binary decision rules extracted from a number of randomly selected predictors (Breiman et al., 1984). It recursively partitions the explanatory variables to find the ranges that minimize the residual sum of squares or the misclassification rate. Splitting is controlled by the Gini index, which measures the impurity of a data partition through misclassification rates or sums of squared errors (Araújo et al., 2011). CART has a straightforward structure, so its generated rules can be easily understood.

2.3.3. Flexible discriminant analysis (FDA)

FDA is a method that performs Linear Discriminant Analysis (LDA) on derived responses using linear regression (Friedman et al., 2001). LDA tries to find a projection hyperplane that minimizes the within-class variance and maximizes the distance between the projected means of the classes (Xanthopoulos et al., 2013). The LDA optimization problem can be cast as minimizing the average squared residual; in FDA, the linear regression is replaced by a nonparametric regression, which yields a nonparametric and flexible alternative to LDA (Friedman et al., 2001). FDA is also a better classifier than LDA for several reasons (Hastie et al., 2009). The main reason is that LDA has a linear decision boundary, which may not appropriately separate the classes (flood vs. non-flood).
Many natural phenomena, such as flood susceptibility, can be non-linear. FDA addresses this issue (and others mentioned in the reference) by generalising LDA to be more flexible. Thus, we used FDA instead of LDA to capture the complex and nonlinear relations in flood susceptibility mapping.

2.3.4. Generalized linear model (GLM)

The GLM with a logistic link function (logistic regression) has been widely used for modelling binary variables or presence-absence data. In this paper, logistic regression (LR) was used to associate flood occurrence with the underlying environmental driving forces. Apart from creating a susceptibility map that shows the relative likelihood of a cell being flooded, the magnitude of the driving forces can also be obtained. Note that the coefficients obtained from a GLM are not interpreted in the same way as those of linear models, because the relationship between the variables is not a straight line (James et al., 2013). The model handles both categorical and continuous variables well. Further, a response curve can be created for each flood conditioning factor.

2.3.5. Generalized additive model (GAM)

In flood susceptibility mapping, the relationship between flood and environmental factors is often complex. GAM is a nonparametric extension of GLM (using kernel or spline smoothers); the main difference between them is the ability of GAM to model nonlinear relationships. GAM does not require a normally distributed dependent (response) variable, and the prediction is generated from an additive combination of functions of the predictor variables, connected to the dependent variable through a link function (Hastie and Tibshirani, 1990). Each factor in a GAM can take a linear or nonlinear form (Goetz et al., 2011). GAMs have become popular due to their predictive power, flexibility and interpretability (Zhang and Batterman, 2010).
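To make the remark in Section 2.3.4 about coefficient interpretation concrete, the sketch below fits a one-predictor logistic GLM in plain Python. It is a hedged illustration, not the authors' implementation (the study was coded in R): the toy distance and flood values are invented, the names `fit_logistic` and `response` are hypothetical, and plain gradient ascent stands in for the iteratively reweighted least squares that GLM software normally uses.

```python
import math


def sigmoid(z):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.1, n_iter=5000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1*x) by gradient ascent on the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)  # observed minus fitted probability
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Invented toy data: distance to river (km); 1 = flood point, 0 = background point.
distance_km = [0.02, 0.05, 0.10, 0.15, 0.30, 0.80, 0.20, 0.60, 1.00, 1.50, 2.50, 4.00]
flood = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

b0, b1 = fit_logistic(distance_km, flood)
odds_ratio = math.exp(b1)  # multiplicative change in the odds of flooding per extra km


def response(x):
    """Response curve of the fitted model for a given distance."""
    return sigmoid(b0 + b1 * x)


print(f"intercept={b0:.3f}  slope={b1:.3f}  odds ratio per km={odds_ratio:.3f}")
for d in (0.05, 0.5, 2.0):
    print(f"distance={d:4.2f} km -> P(flood)={response(d):.3f}")
```

The slope is a change in log-odds, so its effect on the probability depends on where along the curve it is evaluated; plotting `response` over the range of a factor gives the kind of response curve mentioned in the text.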


2.3.6. Boosted regression trees (BRT)

BRT is an ensemble method for fitting statistical models that combines two algorithms: regression trees and boosting (Elith et al., 2008). Key characteristics of a BRT model are that it handles explanatory variables of different types, missing data and outliers, is insensitive to the data distribution, deals with complex nonlinear relations, and accounts for interactions among the explanatory variables (Elith et al., 2008). BRT resembles the random forest algorithm in that both are constructed from a large number of trees and overcome the limitations of a single-tree model. Two parameters need to be set: the ‘learning rate’, which determines the contribution of each tree to the growing model, and the ‘tree complexity’, which controls whether interactions are fitted (Buston and Elith, 2011). These parameters were tuned using cross-validation. Furthermore, the effect of multiple predictors on flood occurrence was investigated using BRT. In doing so, the relative influence of each predictor was assessed based on how often the predictor was selected for splitting and on the model improvement resulting from those splits.

2.3.7. Multivariate adaptive regression splines (MARS)

Introduced by Friedman (1991), MARS is a flexible regression method used to model nonlinear relations. It shows high potential in handling complex and high-dimensional data (Friedman, 1991). It is a nonparametric model composed of a series of piecewise linear segments (splines) of differing gradients (Zhang and Goh, 2013). The required basis functions and their parameters are determined automatically from the data (Friedman, 1991). MARS is fitted in a two-step process: the forward step creates basis functions and detects potential knots, and the backward step prunes them. Together, these two steps strike a balance between the number of functions and knots and prevent overfitting.
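The piecewise linear segments that make up a MARS model are built from hinge basis functions of the form max(0, x − t) and max(0, t − x), where t is a knot. The short Python sketch below only evaluates such a model; it does not perform the forward search or the backward pruning. The function names, the knot at 8 and the coefficients are hypothetical values chosen for illustration, not results from the study.

```python
def hinge(x, knot, direction=1):
    """Hinge basis function: max(0, x - knot) for direction=+1, max(0, knot - x) for -1."""
    return max(0.0, direction * (x - knot))


def mars_predict(x, intercept, terms):
    """Evaluate a one-predictor MARS-style model: intercept + sum(coef * hinge)."""
    return intercept + sum(coef * hinge(x, knot, direction)
                           for coef, knot, direction in terms)


# Hypothetical model with one knot at x = 8: constant response below the knot,
# then a linear decline above it. Each term is (coefficient, knot, direction).
terms = [(-0.05, 8.0, 1)]
for x in (0.0, 8.0, 12.0, 20.0):
    print(x, round(mars_predict(x, 0.9, terms), 3))
```

Adding a second term with the opposite direction, or a further knot, changes the gradient on the other side of the knot, which is how the segments of differing gradients arise.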
The end points of the segments are called knots, that is, values of a variable that mark a change point along the range of a predictor (Leathwick et al., 2006). MARS generates basis functions by searching in a stepwise manner, and the knot locations are determined by an adaptive regression algorithm (Zhang and Goh, 2013). It makes no specific assumption about the data distribution or relations, a useful property for complex spatial data such as floods.

2.3.8. Maximum entropy (MaxEnt)

MaxEnt is a method used for species distribution modelling (SDM) from presence-only species records (Elith et al., 2011). Available since 2004, MaxEnt has been widely employed for modelling species distributions, and its predictive performance is usually competitive with that of the highest performing methods (Elith et al., 2006). SDM estimates the relationship between species records at specific sites and the environmental and/or spatial characteristics of those sites (Franklin, 2010). The same approach was adopted in this study to assess the influence of environmental factors on flood occurrence. A total of 10,000 background points were randomly selected from the raster layers to represent the landscape available for flooding in the presence-background modelling.

2.4. Ensemble forecasting

Many studies have relied on a single statistical or ML technique; however, there is no single best modelling technique for spatial prediction. In addition, one major source of uncertainty in spatial prediction is the choice of modelling technique, as each model responds to geographic range and sample size differently (Meller et al., 2014). Nevertheless, combination of
multiple techniques, which is called ensemble forecasting or modelling, produces more robust and more stable results than single techniques (Araújo and New, 2007; Burnham and Anderson, 2003). Ensemble modelling has been successfully applied to the spatial prediction of plant and animal species distributions, where it outperforms single modelling techniques and provides more reliable predictions (Araújo et al., 2005; Breiner et al., 2017; Marmion et al., 2009). In this paper, ensemble forecasting creates an output from the results of multiple models so that the errors arising from each model are evened out and balanced. Seven different ensemble forecasting approaches available in the R package 'biomod2' (Thuiller et al., 2009, 2016) were implemented:
i) EMca, Ensemble Model committee averaging, which linearly combines the model outputs and is identical to arithmetic averaging;
ii) EMciInf, Ensemble Model confidence interval Inferior values, in which the inferior (lower) values of the confidence interval maps of the individual models are taken;
iii) EMciSup, Ensemble Model confidence interval Superior values, in which, in contrast to EMciInf, the superior (upper) values of the confidence interval maps of the individual models are taken;
iv) EMcv, Ensemble Model to estimate the coefficient of variation across predictions;
v) EMmean, Ensemble Model to estimate the mean probabilities across predictions;
vi) EMmedian, Ensemble Model to estimate the median probabilities across predictions;

vii) EMwmean, Ensemble Model to estimate the weighted mean probabilities across predictions. In this approach, the probability map of each model is weighted by the Area Under the Receiver Operating Characteristic curve (AUROC) value obtained for that model.

All modelling was coded in the R programming language (R Development Core Team, 2017).

3. Performance evaluation

When a model is implemented to estimate the spatial likelihood of flooding in a given region, its performance should be tested and validated. In this study, the Relative Operating Characteristic (ROC) curve was used to assess the performance of the individual and ensemble models. It is fair to say that the ROC has been used as a standard tool in most geohazard modelling studies (Pham et al., 2016b; Shahabi and Hashim, 2015). The ROC curve is drawn in a two-dimensional space in which the Y-axis shows sensitivity and the X-axis shows 1 − specificity. Sensitivity is the proportion of flood pixels correctly classified as flood, while specificity is the proportion of non-flood pixels correctly classified as non-flood (Bui et al., 2016). The Area Under the ROC curve (AUROC) is a quantitative metric commonly used to evaluate the performance of flood models (Pham et al., 2016a). An AUROC of 0.5 indicates that the model performs like a random model, and values greater than 0.5 and up to 1 indicate increasingly higher performance (Shirzadi et al., 2017). For further description of the ROC, see Pontius and Parmentier (2014).

4. Results and discussion

4.1. Influence of spatial factors on the relative likelihood of flood occurrence

The relationship between the 11 conditioning factors and flood occurrence was assessed using the BRT model. Fig. 3 shows this relationship: the X-axis represents the distribution of pixel values of each conditioning factor and the Y-axis denotes the relative likelihood of flood occurrence. These results imply that distance from river is the most influential conditioning factor for flood susceptibility, followed by NDVI, elevation, rainfall, slope, land cover, SPI, river density, TWI, lithology and curvature. This is partly in agreement with the findings of Chapi et al. (2017), although they reported slope as the most influential factor.

Fig. 3. The relationship between conditioning factors and the probability of flood occurrence. Land cover codes correspond to 1: water bodies, 2: barren land, 3: forest, 4: residential areas, 5: woodland, 6: range land, 7: crop land. Lithology codes correspond to 1: Jurassic, 2: Triassic, 3: Permian, 4: Quaternary, 5: Tertiary, 6: Cretaceous.

In the curvature factor (Fig. 4a), the probability of flood
occurrence increases between −1 and 0. This range of curvature corresponds to flat zones. The result indicates that curvature values less than −1, which represent convex slopes (negative values of curvature), have a lower and roughly constant probability of flood occurrence (~0.4) than concave slopes (curvature values above 0, or positive values of curvature), where the flood probability increases to 0.6. The general response of flood occurrence to curvature confirms the results reported by Cao et al. (2016), Chapi et al. (2017) and Khosravi et al. (2016).

As elevation (Fig. 4b) increases, the probability of flooding decreases. The relationship between elevation and the relative likelihood of flood occurrence is negative up to 2700 m, meaning that the chance of flood occurrence is higher in lowlands than at elevations above 2700 m. The highest probability of flood occurrence (0.8) occurs at elevations around 300 m. Our result is in line with the findings of Hong et al. (2017).

Fig. 4. Flood susceptibility maps extracted from individual models for the Haraz watershed.

The role of land cover (Fig. 4c) in flood occurrence in the study area was also quite remarkable. Amongst the land cover classes,
water bodies and barren lands displayed the highest and lowest impacts on flooding, with probabilities of flood occurrence close to 0.7 and 0.5, respectively. Naturally, barren lands are more vulnerable to flooding, because there is no vegetation to slow down the water flow and increase the penetration rate. On the other hand, as expected, forest showed the lowest impact on flooding among all land cover classes.

In terms of lithology (Fig. 4d), areas with highly resistant rocks or highly permeable subsoil materials have a lower drainage density as well as a lower probability of flood occurrence (Srivastava et al., 2014). In the study area, although Tertiary formations have a higher probability of flood occurrence (0.6), the other lithological formations showed almost the same importance for flooding (~0.6).

NDVI (Fig. 4e) reflects the density of vegetation cover, although various interpretations can be inferred depending on the purpose of the study. NDVI affects floods in different ways under different conditions. Higher NDVI and dense vegetation reduce and slow water flow (Turoğlu and Dölek, 2011). Dense vegetation gives the water time to penetrate into the ground, resulting in a decrease in water volume and a lower probability of a flood event (Turoğlu and Dölek, 2011). In this study, the effect of NDVI on flood incidence is considerable. Fig. 3 shows that the lower the NDVI value, the higher the probability of flood occurrence. Additionally, a higher probability of flood incidence (~0.9) in our study area was obtained for NDVI values below 0.5. In short, the response curve shows that less vegetated areas are more at risk of flooding.

Previous studies have confirmed that the higher the rainfall (Fig. 4f), the higher the probability of flood incidence (Todini et al., 2004).
In this study, the rainfalls up to 400 mm had the most important role in flooding, while rainfalls over 400 mm showed a constant effect. The rainfalls approximately around 220 mm are associated with the highest chance of flood occurrence (0.8) in our study area. Another factor which can directly affect the probability of flood occurrence is river density (Fig. 4g) (Chapi et al., 2017). In this study, river densities between 0.8 and 2 m/km2 were more causative in flooding. The higher the river density (>0.8), the higher the probability of flood incidence will be. Also, river density approximately around 1.1 m/km2 played the greatest role in flood occurrence in the Haraz watershed with the highest probability (~0.61). Since most of floods occur due to overflowing of water from

river banks, distance to river (Fig. 4h) is a critical factor for spatial prediction of flood prone areas in a watershed (Chapi et al., 2017). Therefore, the areas closer to rivers demonstrate rapid response to rainstorms and, therefore, are more exposed to flooding (Butler et al., 2006). In this study, the results indicated that distance to rivers up to 150 m was the most influential distance to flood occurrence. The lower the distance to the river (~<20 m), the higher the probability of flood occurrence (~0.9) will be. Our findings confirm the results obtained by Hong et al. (2017), where the class of 0e100 m from the river showed a higher association in relation to flood occurrence. The Slope (Fig. 4i) response curve shows that the lower the slope angle, the higher the probability of flood occurrence will be, a finding that is consistent with that of other studies (e.g., Rahmati and Pourghasemi, 2017; Tehrany et al., 2014). It implies that the chance of flood occurrence increases in flat areas. Fig. 4 shows that the most probable slope range for the flood occurrence falls between 0 and 20 , so that the probability of flood occurrence decreases after 20 and it drops and levels off for the slopes steeper than about 20 . In particular, the likelihood of flood occurrence at falt areas and slopes up to 8 is extremely high while it falls sharply at slopes of more than 8 . The SPI and TWI as two effective hydrological factors can influence the spatial variation of floods. SPI (Fig. 4j) represents soil water content and erosion power of floods to flow downwards in a watershed (Cao et al., 2016). The lower SPI is associated with the stronger power of flood occurrence. The areas with ability of flow lu accumulation are specified with the lower values of SPI (Turog € lek, 2011). 
In this regard, the SPI close to zero value has and Do higher probability of flood occurrence (>0.5) indicating the most floods in the study area have occurred in places with lower values of SPI (Beven, 2011). have reported that the TWI (Fig. 4k) implies saturated condition as well as water accumulation in a watershed. Therefore, the areas of a watershed identified by higher values of TWI are more exposed to flood inundation. In this study, the TWI showed a fixed role with almost no spatial variability in the occurrence of the flood. 4.2. Flood susceptibility mapping using individual and ensemble models In the current study, all individual models were first implemented and then, based on the results of individual models, the

Fig. 5. Flood susceptibility maps extracted from ensembles models in the Haraz watershed.

H. Shafizadeh-Moghadam et al. / Journal of Environmental Management 217 (2018) 1e11

9

Fig. 6. Comparison specify the prioritization of performance of individual and ensemble models.

ensemble forecasting approaches were derived. The result of both model categories was a flood susceptibility map. Figs. 5 and 6 show the flood susceptibility maps generated by the individual and ensemble models. According to the response curve obtained by the BRT model, there is a chance of flood occurrence only in the vicinity of 500 m of the rivers. Therefore, in order to model the likelihood of a flood event, a 500 m buffer zone was considered. With regard to the scale of the map, it is difficult to visualize the details in a 500 m buffer zone when displaying the entire map. As a result, an area in which the density of rivers was considerable was selected and magnified (Figs. 4 and 5). As seen in Figs. 4 and 5, the output of all prediction maps vary between 0 and 1. The values near to zero and 1 indicate the lowest and highest likelihood of flood occurrence respectively. As the figures show, individual models show roughly different patterns as compared to ensemble ones, a problem that the Standard Deviation (S.D) map also confirms. The map of standard deviation was created by running the standard deviation among the individual models and showed a remarkable variability among the prediction of individual models, indicating that the model operates well in some and in some places it produces a high error. This indicates the sensitivity of individual models while the ensemble models show a more uniform pattern excluding the EMcv.
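As an illustration only, the per-cell combination of individual model outputs described above can be sketched in Python with NumPy. This is not the biomod2 implementation used in the study: the function name `ensemble_summaries`, the fixed 0.5 committee threshold, the AUROC-proportional weights, and the use of the sample standard deviation are all assumptions made for the sketch, and the confidence-interval ensembles (EMciInf, EMciSup) are omitted. The toy predictions are invented; only the three AUROC weights are taken from Table 2.

```python
import numpy as np

def ensemble_summaries(preds, aucs, threshold=0.5):
    """Combine individual model outputs cell by cell (hypothetical helper).

    preds: array (n_models, n_cells) of susceptibility values in [0, 1].
    aucs:  array (n_models,) used as weights for the weighted mean (assumption).
    """
    preds = np.asarray(preds, dtype=float)
    weights = np.asarray(aucs, dtype=float)
    mean = preds.mean(axis=0)
    # Spread among the individual models: the basis of the S.D (uncertainty) map.
    sd = preds.std(axis=0, ddof=1)
    return {
        "EMmean": mean,
        "EMmedian": np.median(preds, axis=0),
        "EMwmean": np.average(preds, axis=0, weights=weights),
        # Committee averaging: share of models voting "flood" at the assumed threshold.
        "EMca": (preds >= threshold).mean(axis=0),
        "SD": sd,
        # Coefficient of variation; set to 0 where the mean is 0 to avoid division by zero.
        "EMcv": np.divide(sd, mean, out=np.zeros_like(sd), where=mean > 0),
    }

# Invented predictions of three models for four grid cells.
preds = np.array([
    [0.9, 0.2, 0.6, 0.1],
    [0.8, 0.3, 0.4, 0.1],
    [0.7, 0.1, 0.9, 0.1],
])
aucs = np.array([0.975, 0.971, 0.920])  # BRT, MaxEnt and ANN values from Table 2

maps = ensemble_summaries(preds, aucs)
for name, values in maps.items():
    print(name, np.round(values, 3))
```

In this toy case the third cell has the largest spread among the models, which is the kind of local disagreement that the S.D map is meant to expose, while the median and mean stay within the range of the individual predictions.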

4.3. Performance of individual and ensemble models

In this study, eight individual models (ANN, CART, FDA, GLM, GAM, BRT, MARS and MaxEnt) and seven ensemble models (EMca, EMciInf, EMciSup, EMcv, EMmean, EMmedian and EMwmean) were calibrated and implemented for flood susceptibility mapping. The AUROC, S.D and significance level (Sig.) were used to assess the performance of these models. The results are given in Tables 2 and 3. The AUROC values of both individual and ensemble models were statistically significant, with Sig. equal to 0.000 (less than 0.05). For ease of interpretation, the performance of individual and ensemble models was overlaid on a single ROC plot (Fig. 6a and b). The best performance among the individual models belonged to the BRT (AUROC = 0.975). It was followed by the GLM and MaxEnt (AUROC = 0.971), the GAM model (AUROC = 0.962), the MARS model (AUROC = 0.941), the ANN model (AUROC = 0.920), the FDA model (AUROC = 0.822), and the CART model (AUROC = 0.643). Among the ensemble models, the EMmedian had the highest performance (AUROC = 0.976), followed by the EMmean and EMwmean models (AUROC = 0.974), the EMciSup model (AUROC = 0.972), the EMca model (AUROC = 0.959), the EMciInf model (AUROC = 0.870), and the EMcv model (AUROC = 0.733).

Table 2
AUROC, standard deviation (S.D) and significance level (Sig.) of individual models.

Model     AUC     S.D     Sig.    Asymptotic 95% confidence interval
                                  Lower bound    Upper bound
ANN       0.920   0.027   0.000   0.867          0.973
CART      0.643   0.047   0.000   0.551          0.735
FDA       0.822   0.040   0.000   0.743          0.901
GLM       0.971   0.005   0.000   0.960          0.981
GAM       0.962   0.014   0.000   0.934          0.990
BRT       0.975   0.007   0.000   0.961          0.989
MARS      0.941   0.016   0.000   0.909          0.973
MaxEnt    0.971   0.006   0.000   0.959          0.982

Table 3
AUROC, standard deviation (S.D) and significance level (Sig.) of ensemble models.

Model     AUC     S.D     Sig.    Asymptotic 95% confidence interval
                                  Lower bound    Upper bound
EMca      0.959   0.013   0.000   0.933          0.985
EMciInf   0.870   0.035   0.000   0.802          0.938
EMciSup   0.972   0.006   0.000   0.961          0.983
EMcv      0.733   0.042   0.000   0.650          0.816
EMmean    0.974   0.005   0.000   0.963          0.984
EMmedian  0.976   0.005   0.000   0.966          0.986
EMwmean   0.974   0.005   0.000   0.963          0.984

4.4. Comparison of individual and ensemble forecasting

The prediction power of the best individual and ensemble models was compared graphically using the ROC curve (Fig. 7).
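To make the ROC-based comparison concrete, the sketch below computes ROC points and the AUROC for a small invented sample, and then rebuilds the asymptotic 95% intervals from the AUC and S.D columns of Table 2 as AUC ± 1.96 × S.D. It is an illustration, not the software used in the study: the function names are hypothetical, tied scores are not grouped when tracing the curve, and treating the reported S.D as the standard error of the AUC is an assumption, although it reproduces the tabulated bounds to within rounding.

```python
import numpy as np

def roc_points(labels, scores):
    """Return (FPR, TPR) arrays of the ROC curve; assumes no tied scores."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    hits = labels[np.argsort(-scores)]  # labels ordered from highest to lowest score
    tpr = np.concatenate(([0.0], np.cumsum(hits) / hits.sum()))
    fpr = np.concatenate(([0.0], np.cumsum(~hits) / (~hits).sum()))
    return fpr, tpr

def auroc(labels, scores):
    """Area under the ROC curve by the trapezoidal rule."""
    fpr, tpr = roc_points(labels, scores)
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

def asymptotic_ci(auc, se, z=1.96):
    """Normal-approximation interval; 'se' is the S.D column (assumption)."""
    return auc - z * se, auc + z * se

# Invented sample: 1 = flood point, 0 = non-occurrence point.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
demo_auc = auroc(labels, scores)  # 11 of the 12 flood/non-flood pairs are ranked correctly

# (AUC, S.D, lower bound, upper bound) as reported in Table 2.
table2 = {
    "ANN": (0.920, 0.027, 0.867, 0.973),
    "CART": (0.643, 0.047, 0.551, 0.735),
    "FDA": (0.822, 0.040, 0.743, 0.901),
    "GLM": (0.971, 0.005, 0.960, 0.981),
    "GAM": (0.962, 0.014, 0.934, 0.990),
    "BRT": (0.975, 0.007, 0.961, 0.989),
    "MARS": (0.941, 0.016, 0.909, 0.973),
    "MaxEnt": (0.971, 0.006, 0.959, 0.982),
}
rebuilt = {name: asymptotic_ci(auc, sd) for name, (auc, sd, _, _) in table2.items()}
for name, (low, high) in rebuilt.items():
    print(f"{name}: {low:.3f} to {high:.3f}")
```

The rebuilt bounds agree with Table 2 to about 0.001, the residual coming from the three-decimal rounding of the reported values.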

The highest area under the ROC curve among individual models was obtained by the BRT (AUC = 0.975), while the EMmedian model achieved the highest AUC (0.976) among the ensemble models. As shown, there is no notable difference between the two models. BRTs can be considered an advanced form of regression, although they belong to ML techniques (Friedman et al., 2000; Schapire, 2003). BRT combines regression trees and boosting and, in contrast to regression models that output a single predictive model, fits numerous simple models and combines them for prediction (Buston and Elith, 2011). BRT can handle different types of variables, including numeric, binary, and categorical variables, as well as missing values and non-independent data. Further, it automatically models the interactions between predictors (Elith et al., 2008). In contrast to the other individual models used in this research, the BRT is based on a boosting approach that aims to improve the final prediction and accuracy. Boosting generates a more robust final model and allows curvilinear functions to be modelled (Elith et al., 2008).

Fig. 7. Comparison of the prediction power between the best individual and ensemble models for flood susceptibility mapping in the Haraz watershed.

5. Lessons learned from the application of six flood susceptibility models

Flood susceptibility mapping can be implemented using a variety of models. The spatial patterns of the maps predicted by different modelling techniques might differ, although their AUROC values might be similar. Therefore, selecting one model as “the best” for flood susceptibility mapping and decision making is quite challenging. Model averaging, or ensemble forecasting, is a technique that combines the results of different individual models into a more robust single model (Araújo and New, 2007). This approach is especially powerful when applying the calibrated model to new regions since it maps the main trend among individual modelling techniques (e.g. median and mean) and captures overall variations, hence reducing uncertainty in the spatial prediction (Guisan et al., 2017). There are several ways to combine the individual models (Marmion et al., 2009); however, there is no guarantee that ensemble models always outperform individual models or substantially boost their AUC values. The AUC value should be interpreted as the power of a model to discriminate occurrence from non-occurrence. Important aspects to consider are generalization ability and the creation of a model that is more stable and less prone to overfitting, matters that are substantially addressed by ensemble forecasting. Furthermore, there is no assurance that the best individual model will still produce the highest accuracy when the input data or future conditions change.

6. Conclusion

To alleviate the devastating effects of floods, creating accurate flood susceptibility maps is essential. In this study, eight individual ML and statistical techniques for mapping flood susceptibility were implemented and cross-compared. The integration of the individual models then produced seven ensemble forecasting approaches, leading to the application of 15 different approaches for mapping flood susceptibility. Our uncertainty map, obtained by computing the standard deviation among the individual models, showed considerable variability among their predictions, indicating that each model predicts well in some places and less accurately in others. Model evaluation showed that the BRT model had the highest prediction power among individual models, whereas the EMmedian was the most accurate of the ensemble models, although the difference between the AUROC values of the BRT and EMmedian models was not significant. On the other hand, investigation of the spatial influence of each factor on flood occurrence identified distance to river, NDVI, elevation, and rainfall as the most effective factors. In conclusion, ensemble models are suggested for flood mapping due to their more stable results, greater generalization ability, and higher prediction accuracy. Nevertheless, there is limited literature on the application of ensemble forecasting approaches in flood mapping, and thus development of other ensemble frameworks such as model stacking is strongly recommended.

Acknowledgment

Authors wish to acknowledge the financial support of the Iran National Science Foundation through the research project No 96004000. Roozbeh Valavi is supported by an Australian Government Research Training Program Scholarship and a Rowden White Scholarship. The authors appreciate José Lahoz-Monfort for his valuable comments on the modelling framework. Also, the authors would like to acknowledge the anonymous reviewers and editor for their helpful comments on a previous version of the manuscript.

References

Araújo, M., Rivas, T., Giraldez, E., Taboada, J., 2011. Use of machine learning techniques to analyse the risk associated with mine sludge deposits. Math. Comput. Model. 54, 1823–1828.
Araújo, M.B., New, M., 2007. Ensemble forecasting of species distributions. Trends Ecol. Evol. 22, 42–47.
Araújo, M.B., Whittaker, R.J., Ladle, R.J., Erhard, M., 2005. Reducing uncertainty in projections of extinction risk from climate change. Glob. Ecol. Biogeogr. 14, 529–538.
Beven, K., Kirkby, M.J., 1979. A physically based, variable contributing area model of basin hydrology/Un modèle à base physique de zone d'appel variable de l'hydrologie du bassin versant. Hydrological Sci. J. 24, 43–69.
Beven, K.J., 2011. Rainfall-runoff Modelling: the Primer. John Wiley & Sons.
Breiman, L., Friedman, J., Stone, C.J., Olshen, R.A., 1984. Classification and Regression Trees. CRC press, New York.
Breiner, F.T., Nobis, M.P., Bergamini, A., Guisan, A., 2017. Optimizing ensembles of small models for predicting the distribution of species with few occurrences. Methods Ecol. Evol.
Bui, D.T., Tuan, T.A., Klempe, H., Pradhan, B., Revhaug, I., 2016. Spatial prediction models for shallow landslide hazards: a comparative assessment of the efficacy of support vector machines, artificial neural networks, kernel logistic regression, and logistic model tree. Landslides 13, 361–378.
Burnham, K.P., Anderson, D.R., 2003. Model Selection and Multimodel Inference: a Practical Information-theoretic Approach, second ed. Springer Science & Business Media.
Buston, P.M., Elith, J., 2011. Determinants of reproductive success in dominant pairs of clownfish: a boosted regression tree analysis. J. Animal Ecol. 80, 528–538.
Butler, D., Kokkalidou, A., Makropoulos, C., 2006. Supporting the siting of new urban developments for integrated urban water resource management. Integr. Urban Water Resour. Manag. 19–34.
Cao, C., Xu, P., Wang, Y., Chen, J., Zheng, L., Niu, C., 2016. Flash flood hazard susceptibility mapping using frequency ratio and statistical index methods in coalmine subsidence areas. Sustainability 8, 948.
Chapi, K., Singh, V.P., Shirzadi, A., Shahabi, H., Bui, D.T., Pham, B.T., Khosravi, K., 2017. A novel hybrid artificial intelligence approach for flood susceptibility assessment. Environ. Model. Softw. 95, 229–245.
El Alfy, M., 2016. Assessing the impact of arid area urbanization on flash floods using GIS, remote sensing, and HEC-HMS rainfall–runoff modeling. Hydrology Res. 47, 1142–1160.
Elith, J., Graham, C.H., Anderson, R.P., Dudík, M., Ferrier, S., Guisan, A., J Hijmans, R., Huettmann, F.R., Leathwick, J., Lehmann, A., Li, J., G Lohmann, L., Overton, J.M.M., Peterson, A.T., Phillips, S.J., Richardson, K., Scachetti-Pereira, R., Schapire, R.E., Soberón, J., Williams, S., Wisz, M.S., Zimmermann, N.E., 2006. Novel methods improve prediction of species' distributions from occurrence data. Ecography 29, 129–151.
Elith, J., Leathwick, J.R., Hastie, T., 2008. A working guide to boosted regression trees. J. Animal Ecol. 77, 802–813.
Elith, J., Phillips, S.J., Hastie, T., Dudík, M., Chee, Y.E., Yates, C.J., 2011. A statistical explanation of MaxEnt for ecologists. Divers. Distrib. 17, 43–57.
Franklin, J., 2010. Mapping Species Distributions: Spatial Inference and Prediction. Cambridge University Press.
Friedman, J., Hastie, T., Tibshirani, R., 2000. Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). Ann. Stat. 28, 337–407.
Friedman, J., Hastie, T., Tibshirani, R., 2001. The Elements of Statistical Learning. Springer series in statistics, New York.
Friedman, J.H., 1991. Multivariate adaptive regression splines. Ann. Stat. 1–67.
Gallant, J.C., 2000. Terrain Analysis: Principles and Applications. John Wiley & Sons.
Goetz, J.N., Guthrie, R.H., Brenning, A., 2011. Integrating physical and empirical landslide susceptibility models using generalized additive models. Geomorphology 129, 376–386.
Guisan, A., Thuiller, W., Zimmermann, N.E., 2017. Habitat Suitability and Distribution Models: with Applications in R. Cambridge University Press.
Hastie, T., Tibshirani, R., 1990. Generalized Additive Models. Wiley Online Library.
Hastie, T., Tibshirani, R., Friedman, J., 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, second ed. Springer series in statistics, New York.
Hong, H., Panahi, M., Shirzadi, A., Ma, T., Liu, J., Zhu, A.-X., Chen, W., Kougias, I., Kazakis, N., 2017. Flood susceptibility assessment in Hengfeng area coupling adaptive neuro-fuzzy inference system with genetic algorithm and differential evolution. Sci. Total Environ.
Hong, H., Tsangaratos, P., Ilia, I., Liu, J., Zhu, A.-X., Chen, W., 2018. Application of fuzzy weight of evidence and data mining techniques in construction of flood susceptibility map of Poyang County, China. Sci. Total Environ. 625, 575–588.
James, G., Witten, D., Hastie, T., Tibshirani, R., 2013. An Introduction to Statistical Learning. Springer.
Khosravi, K., Nohani, E., Maroufinia, E., Pourghasemi, H.R., 2016. A GIS-based flood susceptibility assessment and its mapping in Iran: a comparison between frequency ratio and weights-of-evidence bivariate statistical models with multicriteria decision-making technique. Nat. Hazards 83, 947–987.
Khosravi, K., Pham, B.T., Chapi, K., Shirzadi, A., Shahabi, H., Revhaug, I., Prakash, I., Bui, D.T., 2018. A comparative assessment of decision trees algorithms for flash flood susceptibility modeling at Haraz watershed, Northern Iran. Sci. Total Environ. 627, 744–755.
Kia, M.B., Pirasteh, S., Pradhan, B., Mahmud, A.R., Sulaiman, W.N.A., Moradi, A., 2012. An artificial neural network model for flood simulation using GIS: Johor River Basin, Malaysia. Environ. Earth Sci. 67, 251–264.
Knebl, M., Yang, Z.-L., Hutchison, K., Maidment, D., 2005. Regional scale flood modeling using NEXRAD rainfall, GIS, and HEC-HMS/RAS: a case study for the San Antonio River Basin Summer 2002 storm event. J. Environ. Manag. 75, 325–336.
Leathwick, J., Elith, J., Hastie, T., 2006. Comparative performance of generalized additive models and multivariate adaptive regression splines for statistical modelling of species distributions. Ecol. Model. 199, 188–196.
Lee, M.-J., Kang, J.-e., Jeon, S., 2012. Application of frequency ratio model and validation for predictive flooded area susceptibility mapping using GIS. In: Geoscience and Remote Sensing Symposium (IGARSS), 2012 IEEE International. IEEE, pp. 895–898.
Marmion, M., Parviainen, M., Luoto, M., Heikkinen, R.K., Thuiller, W., 2009. Evaluation of consensus methods in predictive species distribution modelling. Divers. Distrib. 15, 59–69.
Meller, L., Cabeza, M., Pironon, S., Barbet-Massin, M., Maiorano, L., Georges, D., Thuiller, W., 2014. Ensemble distribution models in conservation prioritization: from consensus predictions to consensus reserve networks. Divers. Distrib. 20, 309–321.
Merkuryeva, G., Merkuryev, Y., Sokolov, B.V., Potryasaev, S., Zelentsov, V.A., Lektauers, A., 2015. Advanced river flood monitoring, modelling and forecasting. J. Comput. Sci. 10, 77–85.
Moore, I.D., Wilson, J.P., 1992. Length-slope factors for the revised universal soil loss equation: simplified method of estimation. J. Soil Water Conserv. 47, 423–428.
Oikonomidis, D., Dimogianni, S., Kazakis, N., Voudouris, K., 2015. A GIS/Remote Sensing-based methodology for groundwater potentiality assessment in Tirnavos area, Greece. J. Hydrology 525, 197–208.
Pham, B.T., Bui, D.T., Prakash, I., Dholakia, M., 2016a. Rotation forest fuzzy rule-based classifier ensemble for spatial prediction of landslides using GIS. Nat. Hazards 83, 97–127.
Pham, B.T., Pradhan, B., Bui, D.T., Prakash, I., Dholakia, M., 2016b. A comparative study of different machine learning methods for landslide susceptibility assessment: a case study of Uttarakhand area (India). Environ. Model. Softw. 84, 240–250.
Pijanowski, B.C., Brown, D.G., Shellito, B.A., Manik, G.A., 2002. Using neural networks and GIS to forecast land use changes: a land transformation model. Comput. Environ. Urban Syst. 26, 553–575.
Pontius, R.G., Parmentier, B., 2014. Recommendations for using the relative operating characteristic (ROC). Landsc. Ecol. 29, 367–382.
Pourghasemi, H.R., Pradhan, B., Gokceoglu, C., 2012. Application of fuzzy logic and analytical hierarchy process (AHP) to landslide susceptibility mapping at Haraz watershed, Iran. Nat. Hazards 63, 965–996.
R Development Core Team, 2017. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
Rahmati, O., Pourghasemi, H.R., 2017. Identification of critical flood prone areas in data-scarce and ungauged regions: a comparison of three data mining models. Water Resour. Manag. 31, 1473–1487.
Rumelhart, D.E., Hinton, G.E., Williams, R.J., 1986. Learning representations by back-propagating errors. Nature 323, 533.
Schapire, R.E., 2003. The Boosting Approach to Machine Learning: an Overview, Nonlinear Estimation and Classification. Springer, pp. 149–171.
Shahabi, H., Hashim, M., 2015. Landslide susceptibility mapping using GIS-based statistical models and Remote sensing data in tropical environment. Sci. Rep. 5, 9899.
Shirzadi, A., Bui, D.T., Pham, B.T., Solaimani, K., Chapi, K., Kavian, A., Shahabi, H., Revhaug, I., 2017. Shallow landslide susceptibility assessment using a novel hybrid intelligence approach. Environ. Earth Sci. 76, 60.
Srivastava, P.K., Han, D., Rico-Ramirez, M.A., Islam, T., 2014. Sensitivity and uncertainty analysis of mesoscale model downscaled hydro-meteorological variables for discharge prediction. Hydrol. Process. 28, 4419–4432.
Tang, Z., Zhang, H., Yi, S., Xiao, Y., 2018. Assessment of flood susceptible areas using spatially explicit, probabilistic multi-criteria decision analysis. J. Hydrol.
Termeh, S.V.R., Kornejady, A., Pourghasemi, H.R., Keesstra, S., 2018. Flood susceptibility mapping using novel ensembles of adaptive neuro fuzzy inference system and metaheuristic algorithms. Sci. Total Environ. 615, 438–451.
Tehrany, M.S., Pradhan, B., Jebur, M.N., 2013. Spatial prediction of flood susceptible areas using rule based decision tree (DT) and a novel ensemble bivariate and multivariate statistical models in GIS. J. Hydrology 504, 69–79.
Tehrany, M.S., Pradhan, B., Jebur, M.N., 2014. Flood susceptibility mapping using a novel ensemble weights-of-evidence and support vector machine models in GIS. J. Hydrology 512, 332–343.
Tehrany, M.S., Pradhan, B., Mansor, S., Ahmad, N., 2015. Flood susceptibility assessment using GIS-based support vector machine model with different kernel types. Catena 125, 91–101.
Thuiller, W., Georges, D., Engler, R., Breiner, F., 2016. biomod2: Ensemble Platform for Species Distribution Modeling. R Package Version 3.3-7. https://CRAN.R-project.org/package=biomod2.
Thuiller, W., Lafourcade, B., Engler, R., Araújo, M.B., 2009. BIOMOD – a platform for ensemble forecasting of species distributions. Ecography 32, 369–373.
Todini, F., De Filippis, T., De Chiara, G., Maracchi, G., Martina, M., Todini, E., 2004. Using a GIS approach to asses flood hazard at national scale. In: Proceedings of the European Geosciences Union, 1st General Assembly: 25–30 April 2004; Nice, France.
Turoğlu, H., Dölek, İ., 2011. Floods and their likely impacts on ecological environment in Bolaman River basin (Ordu, Turkey). Res. J. Agric. Sci. 43, 167–173.
Wang, L.-C., Behling, H., Lee, T.-Q., Li, H.-C., Huh, C.-A., Shiau, L.-J., Chen, S.-H., Wu, J.T., 2013. Increased precipitation during the Little Ice Age in northern Taiwan inferred from diatoms and geochemistry in a sediment core from a subalpine lake. J. Paleolimnol. 49, 619–631.
Xanthopoulos, P., Pardalos, P.M., Trafalis, T.B., 2013. Linear Discriminant Analysis, Robust Data Mining. Springer, pp. 27–33.
Zhang, K., Batterman, S., 2010. Near-road air pollutant concentrations of CO and PM2.5: a comparison of MOBILE6.2/CALINE4 and generalized additive models. Atmos. Environ. 44, 1740–1748.
Zhang, W., Goh, A.T., 2016. Multivariate adaptive regression splines and neural network models for prediction of pile drivability. Geosci. Front. 7, 45–52.
Zhang, W., Goh, A.T.C., 2013. Multivariate adaptive regression splines for analysis of geotechnical engineering systems. Comput. Geotechnics 48, 82–95.
Zhao, G., Pang, B., Xu, Z., Yue, J., Tu, T., 2018. Mapping flood susceptibility in mountainous areas on a national scale in China. Sci. Total Environ. 615, 1133–1142.