An ensemble prediction of flood susceptibility using multivariate discriminant analysis, classification and regression trees, and support vector machines


Science of the Total Environment 651 (2019) 2087–2096



Contents lists available at ScienceDirect

Science of the Total Environment journal homepage: www.elsevier.com/locate/scitotenv

Bahram Choubin a,b,⁎, Ehsan Moradi b, Mohammad Golshan a, Jan Adamowski c, Farzaneh Sajedi-Hosseini b, Amir Mosavi d,e

a Department of Watershed Management, Sari Agricultural Sciences and Natural Resources University, Sari, Iran
b Department of Reclamation of Arid and Mountainous Regions, University of Tehran, Karaj, Iran
c Department of Bioresource Engineering, Faculty of Agricultural and Environmental Sciences, Macdonald Campus, McGill University, Canada
d Institute of Automation, Kalman Kando Faculty of Electrical Engineering, Obuda University, Budapest, Hungary
e Institute of Advanced Studies Koszeg, IASK, Koszeg, Hungary

Highlights

• Ensemble machine learning (ML) predicting flood susceptibility
• Contribution of models with accuracy values above 80% in ensembling process
• Area under curve (AUC) for the models ranges from 0.83 to 0.89.
• ML allows quick priority of prone areas for the remediation of floods.

Article info

Article history:
Received 3 July 2018
Received in revised form 1 October 2018
Accepted 5 October 2018
Available online 6 October 2018
Editor: Ashantha Goonetilleke

Keywords: Ensemble approach; Flood susceptibility; Support vector regression; Multivariate discriminant analysis; Classification and regression trees

Abstract

Floods, as a catastrophic phenomenon, have a profound impact on ecosystems and human life. Modeling flood susceptibility in watersheds and reducing the damage caused by flooding is an important component of environmental and water management. The current study employs two new algorithms for the first time in flood susceptibility analysis, namely multivariate discriminant analysis (MDA) and classification and regression trees (CART), together with a widely used algorithm, the support vector machine (SVM), to create a flood susceptibility map using an ensemble modeling approach. A flood susceptibility map was developed using these models along with a flood inventory map and flood conditioning factors (altitude, slope, aspect, curvature, distance from river, topographic wetness index, drainage density, soil depth, soil hydrological groups, land use, and lithology). The case study area was the Khiyav-Chai watershed in Iran. To ensure a more accurate ensemble model, this study proposed a framework for flood susceptibility assessment in which only models with an accuracy of >80% were permitted in the ensemble. The relative importance of the factors was determined using the Jackknife test. Results indicated that the MDA model had the highest predictive accuracy (89%), followed by the SVM (88%) and CART (83%) models. Sensitivity analysis showed that slope percent, drainage density, and distance from river were the most important factors in flood susceptibility mapping. The ensemble modeling approach indicated that residential areas at the outlet of the watershed were very susceptible to flooding, and that these areas should, therefore, be prioritized for the prevention and remediation of floods.

© 2018 Elsevier B.V. All rights reserved.

⁎ Corresponding author at: Department of Watershed Management, Sari Agricultural Sciences and Natural Resources University, Sari, Iran. E-mail address: [email protected] (B. Choubin).

https://doi.org/10.1016/j.scitotenv.2018.10.064
0048-9697

1. Introduction

Climate change is contributing to an increase in abnormal weather events such as flooding, which is one of the most devastating natural disasters (Hirabayashi et al., 2013; CRED and UNISDR, 2015). Flooding can have a profound impact on ecosystems and human life (Alexander et al., 2011) and can result in environmental damage and economic loss to residential areas, agriculture and water resources (Lee et al., 2015; Klaus et al., 2016; Wu et al., 2017). Population increase and the rapid growth of urban development and riverside construction have increased the risk of flooding in many areas (Ahmadisharaf et al., 2016). Floods generally begin suddenly and increase rapidly over several hours (Borga et al., 2008). They occur most commonly in the spring due to severe rainfall and/or snowmelt (Furl et al., 2018). In addition, morphological changes to river channels, caused by human or natural interference, may sometimes alter river flow and cause flooding (Yousefi et al., 2018).

Iran has faced many flood events in different regions due to large climatic and rainfall variations (Khosravi et al., 2016a, 2016b; Norouzi and Taslimi, 2012). The locations most vulnerable to flooding are areas with high population density and agricultural lands with marginal drainage, as well as river networks where runoff from rainfall events has concentrated (Dankers et al., 2014). Modeling flood susceptibility in watersheds and reducing the damage caused by flooding is an important component of environmental and water management (Siahkamari et al., 2017).

Various methods have been used to identify and evaluate flood susceptible areas. For instance, some studies used multi-criteria decision analysis (MCDA) methods, such as the analytic hierarchy process (AHP), to assess flood susceptibility (Yahaya et al., 2010; Chen et al., 2011; Luu et al., 2018; Tang et al., 2018). These methods are based on expert knowledge, which may be skewed by ambiguous judgments and uncertainty (Miles and Snow, 1984). Other studies used statistical methods such as frequency ratio (FR) and logistic regression for mapping flood susceptibility and hazardous areas (Rahmati et al., 2016; Siahkamari et al., 2017; Samanta et al., 2018). These methods predict a response variable from its relationships with various explanatory parameters and depend greatly on the size of the datasets (McLay et al., 2001). Physically based models such as HEC-RAS and MIKE11 are also used for studying floods (e.g., Gharbi et al., 2016) but require a large amount of data, substantial processing power, and often use parameters that demand extensive calculation time (Shrestha et al., 2013; Hong et al., 2018b). Newer methods include machine learning techniques such as random forest (RF), artificial neural networks (ANN) and support vector machines (SVM), which can identify areas prone to flooding and produce flood susceptibility maps (Lee et al., 2017; Zhao et al., 2018; Termeh et al., 2018).

Each of these approaches exhibits certain weaknesses in flood susceptibility mapping that can be mitigated through an ensemble modeling approach. Ensemble modeling is the process of combining the predictions of single models into an integrated model to increase prediction accuracy (Rokach, 2010). The main objective of this study was to apply an ensemble approach to generate a flood susceptibility map. To achieve this, SVM, a universal machine learning method, was applied in conjunction with two other robust machine learning methods (i.e., MDA and CART) that have not been used to date for the spatial modeling of floods. The main strength of SVM is its excellent generalization, but it is difficult to capture important modeling variables using this method. The most important and critical modeling variables can be captured with MDA (Elith et al., 2008; Sajedi-Hosseini et al., 2018b). The CART model has several capabilities, such as the use of many trees for process description and modeling, insensitivity to data distribution and to the existence of data outliers, and the ability to easily account for both categorical and numerical variables in the modeling process, such as land use and slope, respectively (Sutton, 2004; Choubin et al., 2018). Ensemble modeling, a synthesis of the individual model predictions, was employed in the current study to address the limitations of the SVM, MDA, and CART models in the context of flood susceptibility mapping (Lee et al., 2012b; Sajedi-Hosseini et al., 2018b).

Accordingly, the objectives of the current research were to: i) explore the capability of the CART, MDA and SVM methods to predict flood susceptibility, ii) compare the results of the new algorithms used in this study (i.e., MDA and CART) with a commonly used algorithm (i.e., SVM) to predict flood susceptible areas, iii) evaluate the importance of flood conditioning factors in producing flood susceptibility maps, and iv) apply an ensemble modeling approach to generate a flood susceptibility map.

The current paper first provides a description of the study area, the Khiyav-Chai watershed in Iran. Next, in the context of flood susceptibility prediction and assessment, the Methodology section describes the preparation of a flood inventory map and the selection of the appropriate flood conditioning factors, and introduces the tools and techniques. Subsequently, the prediction results from the different applied models are presented and discussed. Finally, some conclusions from the current research are drawn.

2. Materials and methods

2.1. Study area

The Khiyav-Chai watershed is located in Ardabil Province in northwestern Iran. This watershed is an important tourist area that generates economic and ecological benefits for both the government and the local people (Porkhial et al., 2008).
The watershed area is 126 km² and lies between 47°38′34″ and 47°48′18″ east and between 38°12′30″ and 38°23′51″ north (Fig. 1). This area has a rugged topography, with elevation ranging from 1372 m at the river outlet to 4353 m upstream. The climate is semi-arid with an average annual precipitation of 368 mm (2000–2016) and an average temperature of 10 °C (Moghaddam, 2006). The average humidity in the Khiyav-Chai watershed is about 58%. The lowest humidity occurs in July at 13% and the highest in May at 85%. The mean annual discharge at the watershed outlet (PolSoltani hydrometric station) was about 1.8 m³ s⁻¹ from 1970 to 2016.

Fig. 1. Location of the Khiyav-Chai watershed in Iran.

2.2. Methodology

The flowchart of the methodology used in this study is shown in Fig. 2. It includes (i) preparing a flood inventory map, (ii) determining the appropriate flood conditioning factors, (iii) modeling the flood susceptibility map using the CART, SVM and MDA algorithms, (iv) performing an accuracy assessment of the models, (v) implementing a sensitivity analysis of the flood conditioning factors, and (vi) producing a flood susceptibility map using an ensemble modeling approach.

Fig. 2. Conceptual flowchart of the methodology.

2.2.1. Flood inventory map

The flood inventory map is a basic map for flood susceptibility assessment (Tehrany et al., 2014; Rahmati et al., 2016). Accurate analysis of flood susceptibility requires a precise flood inventory map that shows the locations of flood occurrence. In the current study, 51 flood location points, recorded between 2010 and 2017, were obtained from the regional water agency of Ardabil Province (Fig. 1). According to documents from the agency, flooding at these recorded points has caused serious damage to transportation infrastructure, residential areas, natural ecosystems, etc. The maximum and average discharges of the floods in the region are recorded at 381 and 140 m³ s⁻¹, respectively. In addition to the flooded points, an equal number of data points, i.e., 51, were considered as non-flooded locations (Fig. 1). The values 1 and 0 were then assigned to indicate the presence and absence of flooding, respectively, over the area.

In this study, special attention was paid to including all points that experienced flooding. Although increasing the number of data points may help improve the quality of modeling (Tien Bui and Hoang, 2017), the available flooded points (51) are adequate given the relatively small area of the watershed (about 126 km²). Moreover, in many cases researchers have used fewer flood points for even larger areas. For instance, Termeh et al. (2018) considered a total of 51 flood locations for an area of 5737 km² in the Jahrom Basin. Using the SDM package in R, the data points were assigned to model calibration (70%) and validation (30%) via the random partition method. The random partition method is a data-splitting technique that repeatedly splits the data at random into training and testing sets without replacement (Naimi and Araújo, 2016).

2.2.2. Flood conditioning factors

Using information from previous studies (e.g., Tehrany et al., 2014; Rahmati et al., 2016; Khosravi et al., 2016a) and the data available in the study area, 11 effective flood susceptibility parameters were identified: altitude, slope, aspect, curvature, distance from river, topographic wetness index (TWI), drainage density, soil depth, soil hydrological groups (SHG), land use, and lithology.
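The 70/30 random partition described in Section 2.2.1 can be illustrated with a minimal sketch. It is written in plain Python rather than with the SDM package in R used in the study, and the function name `random_partition`, the seed, and the point identifiers are assumptions made only for illustration.

```python
import random


def random_partition(points, train_fraction=0.7, seed=0):
    """Shuffle labelled points and split them without replacement."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    shuffled = points[:]               # copy, so the input list is untouched
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical inventory: 51 flooded (label 1) and 51 non-flooded (label 0)
# points, identified here only by an index.
inventory = [(i, 1) for i in range(51)] + [(i, 0) for i in range(51, 102)]
train, test = random_partition(inventory)
```

Calling `random_partition` again with a different `seed` gives another split, which is one way to mimic the repeated splitting that the method involves.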

2.2.2.1. Altitude. Altitude is one of the most important factors affecting flooding (Tehrany et al., 2014; Bui et al., 2016). In general, there is an inverse relationship between flooding and elevation; flood frequency increases with decreasing elevation, with the result that lower elevations are more susceptible to flooding (Khosravi et al., 2016a). A digital elevation model (DEM) with 30 × 30 m pixel size was created from the topographic map (1:25000) in the ArcGIS environment (Fig. 3a).

Fig. 3. Flood conditioning factors used to prepare the flood susceptibility map: a) altitude, b) slope percent, c) aspect, d) curvature, e) distance from river, f) topographic wetness index (TWI), g) drainage density, h) soil depth, i) soil hydrological groups (SHG), j) land use and k) lithology. (For interpretation of the references to color in this figure, the reader is referred to the web version of this article.)

2.2.2.2. Slope. Slope gradient is an important physiographic characteristic that is directly related to flooding (Meraj et al., 2015; Bui et al., 2016; Khosravi et al., 2016b). Slope contributes directly to surface runoff velocity and vertical percolation and thus affects flood susceptibility (Rahmati et al., 2016). The slope percent map was generated from the DEM layer in ArcGIS. In the study area, the slope was between 0 and 177.5%, indicating rugged topography (Fig. 3b).

2.2.2.3. Aspect. The slope aspect is defined as the direction of the maximum slope of the terrain surface, and most flood studies have considered it an important flood susceptibility parameter (Regmi et al., 2014). The aspect map was prepared in ArcGIS from the DEM (Fig. 3c).

2.2.2.4. Curvature. Curvature was determined to be another influential conditioning parameter and was extracted from the DEM in ArcGIS (Fig. 3d). The curvature consists of three categories: concave, convex, and flat surface. It is a factor in runoff flow and can be useful in detecting flood susceptibility (Lee et al., 2012a).

2.2.2.5. Distance from river. Rivers are the main pathways for flood discharge, and areas near rivers are susceptible to flooding (Opperman et al., 2009). In this study, distance from river was calculated using the Euclidean tool in ArcGIS. This distance ranged between 0 and 2618 m (Fig. 3e).

2.2.2.6. Topographic wetness index (TWI). This parameter was presented by Beven and Kirkby (1979) and indicates the spatial variations of wetness in a watershed (Rahmati et al., 2016). In other words, this index expresses the amount of water accumulation in each pixel of the watershed area (Gokceoglu et al., 2005), and is calculated as:

$$\mathrm{TWI} = \ln\left(\frac{A_s}{\tan\beta}\right) \tag{1}$$

where $A_s$ and $\beta$ are the specific catchment area (m² m⁻¹) and the slope gradient (in degrees), respectively. TWI was calculated in the SAGA GIS environment (Fig. 3f).

2.2.2.7. Drainage density. When rainfall occurs in a watershed, the drainage density affects peak flows. Regional floods are often related to peak discharge value and drainage area (Ayalew and Krajewski, 2017; Sajedi-Hosseini et al., 2018a); therefore, this parameter has an important influence on flood susceptibility. Poor drainage systems often result in river overflow and continuous flooding in an area (Komolafe et al., 2018). The drainage density was derived using the line density tool in ArcGIS (Fig. 3g).

2.2.2.8. Soil depth. Soil depth was considered as the depth of the soil layer from the ground surface to bedrock. In some areas with low soil depth, especially upstream, runoff generation is higher. The soil depth in the study area was between 69 and 140 cm (Fig. 3h), with values obtained from the Iranian Department of Water Resources (IDWR, 2015).

2.2.2.9. Soil hydrological groups (SHG). Soil hydrological groups indicate soil quality based on a minimum water infiltration rate (USDA, 1986). Soils are classified into four hydrological groups: A, B, C, and D. Soils in group A have the minimum runoff potential, while soils in group D have the maximum runoff potential (Gittleman et al., 2017). In the study area, group A covered only a small extent and group C covered >60% (Fig. 3i). The map of soil hydrological groups was obtained from the Iranian Department of Water Resources (IDWR, 2015).

2.2.2.10. Land use. Land use has a significant role in runoff speed, interception, infiltration, and evapotranspiration (Yalcin et al., 2011). The land use map is essential for determining areas susceptible to flooding (Karlsson et al., 2017; Komolafe et al., 2018). The land use map of the study area was obtained from the IRWA and included residential, agriculture, dry farming, woodland, orchard and rangeland areas (Fig. 3j).

2.2.2.11. Lithology. Lithology has a significant influence on hydrological processes in a watershed (Ward and Robinson, 2000). Different lithology units have different susceptibilities to flooding (Lee et al., 2012a). Lithology units in the study area included rhyolitic to rhyodacitic tuff (Qdt), andesite to basaltic volcanic (Qabv), and andesitic subvolcanic bodies (Qasv) (Fig. 3k).

2.2.3. Flood susceptibility modeling

In this study, the flood susceptibility map was created using three machine learning models (CART, MDA, and SVM).

2.2.3.1. Classification and regression trees (CART). Breiman et al. (1984) introduced the CART method. CART is a non-parametric method that can be used to build prediction relationships from input data (Choubin et al., 2018). Among the different types of decision tree (DT) models, such as Quick, Unbiased, and Efficient Statistic Tree (QUEST), CART, and Chi-squared Automatic Interaction Detection (CHAID), only CHAID has been used for flood susceptibility assessment, by Tehrany et al. (2013). In the current study, the authors used the CART model because this method has been shown to perform well for processes with nonlinear structures and significant internal heterogeneity (Ji et al., 2013). It can be used to partition the dependent variable into a series of classes based on a required level of internal homogeneity and, for each collection of variables, to construct a simple model.
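The partitioning idea behind CART can be illustrated with a minimal sketch of a single split. It searches one numeric factor for the threshold that minimizes the summed squared residuals of the two resulting groups; for a 0/1 response this ranks candidate splits in the same order as the Gini index. The function names and the slope and flood values are hypothetical, and this is not the authors' implementation, which would grow a full tree over all conditioning factors.

```python
def sse(values):
    """Sum of squared residuals around the mean of a group."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total_sse) of the split x <= threshold with minimal SSE."""
    pairs = sorted(zip(xs, ys))
    best = (None, sse(ys))             # start from the unsplit node
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue                   # no threshold fits between equal values
        threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        total = sse(left) + sse(right)
        if total < best[1]:
            best = (threshold, total)
    return best


# Hypothetical slope percent values and flood labels (1 = flood, 0 = non-flood).
slope = [2, 4, 5, 30, 45, 60]
flood = [1, 1, 1, 0, 0, 0]
threshold, error = best_split(slope, flood)
```

A full tree would apply the same search recursively to each resulting group and to every factor, stopping when the groups reach the required level of homogeneity.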
The classification component of CART follows the squared residuals minimization algorithm (Timofeev, 1984). Further details about the CART model are given by Timofeev (1984), Breiman et al. (1984), and Choubin et al. (2018).

2.2.3.2. Multivariate discriminant analysis (MDA). MDA is an extension of Linear Discriminant Analysis (LDA). In LDA, an observation is assigned to the nearest group, where the distance is calculated from the normal distribution of the parameters, and it is assumed that the variability and correlation among the parameters are equal in each class (Lombardo et al., 2006). In MDA, multiple normal distributions are used within each class. MDA derives the linear combinations as follows (Hair et al., 1998):

$$Y = W_1 X_1 + W_2 X_2 + \dots + W_n X_n \tag{2}$$

where $Y$ is a discriminant score, $W_i$ ($i = 1, 2, 3, \dots, n$) are discriminant weights, and $X_i$ ($i = 1, 2, 3, \dots, n$) are independent variables. More details of the MDA model are given in Hair et al. (1998).

2.2.3.3. Support vector machine (SVM). The SVM is a universal machine learning method based on a set of linear indicator functions that is used for function estimation (Vapnik, 2013). A kernel function is used to transform the data in SVM. Using the training data, SVM maps the original data into a high dimensional feature space where a hyperplane is constructed (Kavzoglu et al., 2014). The optimal linear hyperplane is used to separate the transformed data into two classes, non-flood susceptibility and flood susceptibility {0, 1}. The capability of the SVM depends on a suitable kernel function; options include the polynomial kernel (PL), sigmoid kernel (SIG), radial basis function (RBF) and linear kernel (LN). According to several studies (Tien Bui et al., 2012; Hong et al., 2018b), RBF outperforms other kernels in a flood prediction context. As such, the RBF function was used in the current study.

2.2.3.4. Ensemble modeling. The ensemble modeling approach, which combines the predictions of the individual models, often results in better classification than individual models (Lee et al., 2012b; Pourghasemi et al., 2017). The individual models' predictions can be synthesized using weighted or unweighted averaging (Naimi and Araújo, 2016). In this study, the ensemble model was assembled using weighted averaging based on AUC statistics (Eq. (3)):

$$EM = \frac{\sum_{i=1}^{n} \left(AUC_i \times M_i\right)}{\sum_{i=1}^{n} AUC_i} \tag{3}$$

where $EM$ is the ensemble model, and $AUC_i$ is the AUC value of the $i$th single model ($M_i$) (Pourghasemi et al., 2017).

To ensure a more accurate ensemble model, this study proposed a framework for flood susceptibility assessment in which only models with an accuracy of >80% were permitted in the ensemble. The area under the receiver operating characteristic curve (AUC) is widely used to express prediction accuracy in binary (two-class) classification problems. According to Yesilnacar (2005), an AUC value >0.8 indicates very good model prediction accuracy. In addition, Sajedi-Hosseini et al. (2018b) posit that a model with an AUC above 80% is appropriate to contribute to the ensembling process. As such, the current study applied this criterion in flood susceptibility assessment for the first time. Since the training and testing data sets were partitioned randomly, the models yielded different results for each run. Therefore, the model calibration was repeated many times until the desired AUC value (>80%) was achieved.

Fig. 4. Flood susceptibility maps: (a) classification and regression trees (CART), (b) support vector machine (SVM), (c) multivariate discriminant analysis (MDA), and (d) ensemble. (For interpretation of the references to color in this figure, the reader is referred to the web version of this article.)

2.2.4. Accuracy assessment

In flood classification, each pixel belongs to either a positive (flood) or a negative (non-flood) class. Positive and negative pixels that are correctly classified are known as true positives and true negatives. Conversely, pixels that are erroneously classified as positive or negative are known as false positives and false negatives (Schumann et al., 2014). The AUC value summarizes the accuracy of a two-class classifier in a receiver operating characteristic (ROC) graph. The AUC value ranges from 0 to 1: a value of 0.5 corresponds to a diagnostic test that cannot distinguish between flood and non-flood, and a value of 1 reflects a true positive rate of 1 with a false positive rate of 0 (Evans et al., 2005). An AUC value <0.6 indicates weak modeling accuracy, 0.6–0.7 moderate accuracy, 0.7–0.8 good accuracy, and >0.8 very good accuracy (Yesilnacar, 2005). In this study, the ROC curves were prepared with the pROC package (Robin et al., 2011) in the R platform to assess the models' accuracy. In addition to the AUC value, the mean square error was used to compare model results.
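The accuracy measures and the weighting in Eq. (3) can be illustrated with a minimal sketch. It computes the AUC as the share of flood/non-flood pairs that are ranked correctly, the mean square error, and an AUC-weighted average restricted to models above a threshold. The study itself used the pROC package in R; the function names, the labels, the model scores, and the relaxed `min_auc=0.7` in the example call are assumptions for illustration only.

```python
def auc(labels, scores):
    """Rank-based AUC: chance that a flood point outscores a non-flood point."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # A tie between a flood and a non-flood score counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mse(labels, scores):
    """Mean square error between observed 0/1 labels and predicted scores."""
    return sum((y - s) ** 2 for y, s in zip(labels, scores)) / len(labels)


def ensemble(predictions, aucs, min_auc=0.8):
    """AUC-weighted average of model outputs (Eq. 3), keeping AUC > min_auc."""
    kept = [(a, p) for a, p in zip(aucs, predictions) if a > min_auc]
    total = sum(a for a, _ in kept)
    n_points = len(kept[0][1])
    return [sum(a * p[i] for a, p in kept) / total for i in range(n_points)]


# Hypothetical validation labels and model scores (not data from the study).
labels = [1, 1, 0, 0]
model_a = [0.9, 0.4, 0.5, 0.1]
model_b = [0.8, 0.7, 0.3, 0.2]
auc_a, auc_b = auc(labels, model_a), auc(labels, model_b)
combined = ensemble([model_a, model_b], [auc_a, auc_b], min_auc=0.7)
```

The sketch assumes that at least one model passes the threshold; with the default `min_auc=0.8`, the first toy model would be dropped and the ensemble would reduce to the second.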

2.2.5. Sensitivity analysis

The current study employed the Jackknife test (Efron, 1982; Park, 2015) to assess the sensitivity of the flood conditioning factors. Most of the statistical coefficients based on AUC rely on the Jackknife test, which is accepted to have a high capability for a broad range of practical problems (Bandos et al., 2017). The authors used the percentage of relative decrease (PRD) of the AUC to assess factor contribution as follows (Park, 2015):

$$PRD_i = 100 \times \frac{AUC_{all} - AUC_i}{AUC_{all}} \tag{4}$$

where $AUC_{all}$ is the AUC value computed from the prediction using all factors, and $AUC_i$ and $PRD_i$ are, respectively, the AUC value and the percentage of relative decrease of AUC when the $i$th factor has been removed from the prediction process.

3. Results and discussion

3.1. Flood susceptibility mapping results

Flood susceptibility maps were developed using the effective flood susceptibility parameters and the CART, SVM and MDA methods (Fig. 4). First, a flood susceptibility map was produced by entering the flood conditioning factors into each of the models. Then a new flood susceptibility map, the ensemble, was prepared by synthesizing the SVM, MDA and CART models (Fig. 4d). Based on the natural break classification scheme (Poli and Sterlacchini, 2007; Khosravi et al., 2016b), the maps created from each model were categorized into low, moderate, high and very high flood susceptibility (Fig. 4). This classification method groups data based on their inherent characteristics and identifies class breaks that minimize the within-class differences and maximize the between-class differences (De Smith et al., 2007).

The flood susceptibility maps showed that the areas around the river and the major water pathways were the most vulnerable to flooding and were classified in the very high susceptibility category. This result was consistent with the findings of Yang et al. (2018) in Hainan, China. The land use map of the study area (Fig. 3j) indicated that a residential area (red region in the north of the study area) at the outlet of the Khiyav-Chai watershed was located in the very high flood susceptibility category (Fig. 4). This suggests a point for intervention to minimize future flood damage. This is consistent with a pattern described by Yang et al. (2018), who reported that annually about 181,000 people around the world are affected by flood damage in the areas around rivers, which are mostly urban and residential.

Table 1 The area (km²) identified as belonging to each flood susceptibility class for each model.

| Flood susceptibility class | CART | SVM | MDA | Ensemble |
|---|---|---|---|---|
| Low | 51.1 | 55.7 | 80.6 | 56.9 |
| Moderate | 38.8 | 38.5 | 20.6 | 36.6 |
| High | 22.3 | 20.1 | 14.6 | 18.8 |
| Very high | 13.8 | 11.7 | 10.1 | 13.7 |

3.2. Flood locations and susceptibility classes analysis

The map generated by the MDA model showed the low susceptibility class covering the greatest area (80.6 km²; shown in dark green) and the very high susceptibility class covering the smallest area (10.1 km²; shown in red) (Fig. 4c; Table 1). Among the models, the CART map had the largest area in the very high susceptibility class (13.8 km²) and the smallest area in the low susceptibility class (51.1 km²) (Fig. 4a; Table 1). According to the SVM and ensemble models, respectively, 11.7 and 13.7 km² of the total area was placed in the very high susceptibility class, and 55.7 and 56.9 km² in the low susceptibility class (Table 1). The combined area of the high and very high susceptibility classes is 36.1, 32.5, 31.8, and 24.7 km² for the CART, ensemble, SVM, and MDA models, respectively (Table 1).

Table 2 indicates the number of flood occurrences in each susceptibility class. The ensemble model identified the greatest number of flood locations (49 flood location points) in the very high susceptibility class, followed by SVM, MDA, and CART with 48, 44, and 41 flood location points, respectively. It should be noted that the MDA model placed three flood points in the moderate class. This is because the area in the high and very high susceptibility classes is smaller for the MDA model than for the other models. The analysis found that the ensemble model outperformed the other models in terms of flood location identification (Table 2).

Table 2 The number of flood occurrences within each flood susceptibility class for each model.

| Flood susceptibility class | CART | SVM | MDA | Ensemble |
|---|---|---|---|---|
| Low | 0 | 0 | 0 | 0 |
| Moderate | 0 | 0 | 3 | 0 |
| High | 10 | 3 | 4 | 2 |
| Very high | 41 | 48 | 44 | 49 |

Fig. 5. The receiver operating characteristic (ROC) curves of flood susceptibility maps.

Table 3 The mean square error (MSE) of the models.

| CART | SVM | MDA | Ensemble |
|---|---|---|---|
| 0.18 | 0.12 | 0.12 | 0.10 |
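The percentage of relative decrease in Eq. (4) of Section 2.2.5 can be illustrated with a minimal sketch. In a full Jackknife test each model would be refitted with one conditioning factor left out to obtain the reduced AUC; here those AUC values are simply supplied as inputs. The function name `prd` and all AUC values are hypothetical and are not results from the study.

```python
def prd(auc_all, auc_without):
    """Percentage of relative decrease of AUC (Eq. 4) for each removed factor."""
    return {
        factor: 100.0 * (auc_all - auc_i) / auc_all
        for factor, auc_i in auc_without.items()
    }


# Hypothetical AUC with all factors, and with one factor removed at a time.
auc_all = 0.90
auc_without = {"slope": 0.72, "drainage density": 0.81, "aspect": 0.90}

scores = prd(auc_all, auc_without)
# Factors ordered from most to least influential on the AUC.
ranking = sorted(scores, key=scores.get, reverse=True)
```

A factor whose removal leaves the AUC unchanged receives a PRD of zero, so the ranking places it last.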

3.3. Validation of the flood susceptibility maps

The models' predictions were validated using the area under the ROC curve (AUC; Fig. 5) and the mean square error (MSE; Table 3). The ROC curve assesses the prediction accuracy of each model by comparing predicted values with observations, plotting the true positive rate against the false positive rate across classification thresholds. The ROC results indicated that the ensemble model had the highest predictive accuracy (AUC = 0.91), followed by the MDA (AUC = 0.89), SVM (AUC = 0.88), and CART (AUC = 0.83) models. According to the MSE values (Table 3), the ensemble model had the lowest error (MSE = 0.10) and the CART model had the highest (MSE = 0.18); the SVM and MDA models had the same MSE of 0.12. Taken together, the AUC and MSE values (Fig. 5 and Table 3) showed that the ensemble model outperformed the individual models. This result is in accordance with Tehrany et al. (2017) and supports the argument that the ensemble method is accurate, efficient, and reasonable for use in flood susceptibility assessment.

3.4. Sensitivity analysis results

Choosing appropriate conditioning parameters is very important in flood susceptibility modeling (Fernandez and Lutz, 2010; Kourgialas and Karatzas, 2012). In this study, the sensitivity of the flood susceptibility models to the 11 conditioning factors was investigated using the Jackknife test. The Jackknife test used in this study employs a partial-derivative calculation (Skinner and Rao, 1996), which is well suited to sensitivity analysis because of its simplicity and because it reduces the estimator bias that can arise when applying more complex frameworks. Fig. 6 shows the relative importance of the flood susceptibility factors in the study area. Slope percent was assessed as the most important factor, followed by drainage density and distance from river. These results are consistent with those of Termeh et al. (2018) and Hong et al. (2018a), who indicated that slope percent is the most important factor for flooded locations. As can be seen in Fig. 6, slope percent had the highest PRD value across all models (i.e., 19, 10, 18, and 16 for the CART, SVM, MDA, and the ensemble
model, respectively). In the CART model, the PRD values of slope, drainage density, and distance from river were about 19, 12, and 6, respectively, while all other factors were <5 (Fig. 6a). In the SVM model, the PRD values of all conditioning factors except slope percent were <5 (Fig. 6b). In the MDA model, the PRD values of slope, drainage density, and distance from river were about 18, 7, and 6, respectively, whereas all other factors were <5 (Fig. 6c). According to the ensemble model, slope percent, drainage density, and distance from river (with PRD values of 16, 7, and 5, respectively) were the most important factors in flood susceptibility mapping (Fig. 6d).

Fig. 6. Sensitivity analysis results from the Jackknife test: (a) classification and regression trees (CART), (b) support vector machine (SVM), (c) multivariate discriminant analysis (MDA), and (d) ensemble. Each panel lists the conditioning factors (slope, drainage density, distance from river, altitude, TWI, lithology, curvature, land use, HSG, soil depth, and aspect) against a horizontal axis labelled "PDR" running from 0 to 20.

4. Conclusion

This study employed, for the first time, two new algorithms (MDA and CART) together with a widely used algorithm (SVM) to produce a flood susceptibility map using an ensemble modeling approach. Sensitivity analysis indicated that slope percent, drainage density, and distance from river were the most important factors in flood susceptibility mapping. The main limitation of this study was the shortage of recorded flood location points. The authors' assessment suggests that it may be valuable to carry out field surveys and to gather information from local stakeholders to improve flood location records, especially in developing countries with information deficits. The ensemble modeling approach used in the current study indicated that a residential area at the outlet of the Khiyav-Chai watershed is very highly susceptible to flooding and should be a management priority for flood prevention and remediation. The methodology and factors used in this research could be applied to other regions to support the management, control, and reduction of flood damage in areas susceptible to flooding.

Acknowledgements

This research was partially funded by an NSERC Discovery and Accelerate Grant held by Jan Adamowski. We thank Chelsea Scheske of McGill for editorial work on the paper.
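As an illustration of the measures reported in Sections 3.3 and 3.4, the following Python sketch computes the AUC, the MSE, and a leave-one-factor-out (Jackknife-style) importance score. It is a minimal sketch under stated assumptions, not the authors' implementation: the importance score is assumed here to be the percentage of relative decrease in AUC when a factor is omitted, the `centroid_scores` model is a hypothetical stand-in for CART, SVM, or MDA, and the six sample points are invented for demonstration. Scores are evaluated on the same points used to fit the stand-in model, purely for brevity.

```python
from statistics import mean


def auc(labels, scores):
    """Area under the ROC curve, computed as the share of (flood, non-flood)
    pairs that the scores rank correctly; tied scores count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mse(labels, predictions):
    """Mean square error between observed 0/1 labels and predicted values."""
    return mean((y - p) ** 2 for y, p in zip(labels, predictions))


def centroid_scores(rows, labels, keep):
    """Hypothetical stand-in model (not CART, SVM, or MDA): weight each kept
    factor by the difference between its flood and non-flood class means."""
    weights = []
    for j in keep:
        flood = mean(r[j] for r, y in zip(rows, labels) if y == 1)
        dry = mean(r[j] for r, y in zip(rows, labels) if y == 0)
        weights.append(flood - dry)
    return [sum(w * r[j] for w, j in zip(weights, keep)) for r in rows]


def jackknife_drop(rows, labels, names):
    """Assumed importance measure: percentage of relative decrease in AUC
    when each factor is left out in turn and the model is refitted."""
    all_idx = list(range(len(names)))
    full = auc(labels, centroid_scores(rows, labels, all_idx))
    drops = {}
    for j, name in enumerate(names):
        keep = [k for k in all_idx if k != j]
        reduced = auc(labels, centroid_scores(rows, labels, keep))
        drops[name] = 100.0 * (full - reduced) / full
    return full, drops


# Invented, normalized sample points (not data from the study).
names = ["slope", "drainage_density", "aspect"]
rows = [
    (0.1, 0.8, 0.5), (0.2, 0.4, 0.1), (0.3, 0.9, 0.9),  # flood points
    (0.7, 0.6, 0.5), (0.8, 0.2, 0.1), (0.9, 0.3, 0.9),  # non-flood points
]
labels = [1, 1, 1, 0, 0, 0]

full_auc, drops = jackknife_drop(rows, labels, names)
for name in names:
    print(f"{name}: {drops[name]:.2f}")
```

In this toy setting, omitting slope lowers the AUC because drainage density alone does not separate the two classes, whereas omitting aspect changes nothing; a real application would refit the actual models and score them on held-out validation points.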
References

Ahmadisharaf, E., Tajrishy, M., Alamdari, N., 2016. Integrating flood hazard into site selection of detention basins using spatial multi-criteria decision-making. J. Environ. Plann. Manag. 59 (8), 1397–1417.

Alexander, M., Viavattene, C., Faulkner, H., Priest, S., 2011. A GIS-based Flood Risk Assessment Tool: Supporting Flood Incident Management at the Local Scale.

Ayalew, T.B., Krajewski, W.F., 2017. Effect of river network geometry on flood frequency: a tale of two watersheds in Iowa. J. Hydrol. Eng. 22 (8), 06017004.

Bandos, A.I., Guo, B., Gur, D., 2017. Jackknife variance of the partial area under the empirical receiver operating characteristic curve. Stat. Methods Med. Res. 26 (2), 528–541.

Beven, K.J., Kirkby, M.J., 1979. A physically based, variable contributing area model of basin hydrology/Un modèle à base physique de zone d'appel variable de l'hydrologie du bassin versant. Hydrol. Sci. J. 24 (1), 43–69.

Borga, M., Gaume, E., Creutin, J.D., Marchi, L., 2008. Surveying flash floods: gauging the ungauged extremes. Hydrol. Process. 22, 3883–3885. https://doi.org/10.1002/hyp.7111.

Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J., 1984. Classification and regression trees. The Wadsworth Statistics/Probability Series. Chapman and Hall, New York.

Bui, D.T., Pradhan, B., Nampak, H., Bui, Q.T., Tran, Q.A., Nguyen, Q.P., 2016. Hybrid artificial intelligence approach based on neural fuzzy inference model and metaheuristic optimization for flood susceptibility modeling in a high-frequency tropical cyclone area using GIS. J. Hydrol. 540, 317–330.

Chen, Y.R., Yeh, C.H., Yu, B., 2011. Integrated application of the analytic hierarchy process and the geographic information system for flood risk assessment and flood plain management in Taiwan. Nat. Hazards 59 (3), 1261–1276.

Choubin, B., Zehtabian, G., Azareh, A., Rafiei-Sardooi, E., Sajedi-Hosseini, F., Kişi, Ö., 2018. Precipitation forecasting using classification and regression trees (CART) model: a comparative study of different approaches. Environ. Earth Sci. 77 (8), 314.

CRED, UNISDR, 2015. The Human Cost of Weather-related Disasters. United Nations, Geneva, pp. 1995–2015.

Dankers, R., Arnell, N.W., Clark, D.B., Falloon, P.D., Fekete, B.M., Gosling, S.N., Heinke, J., Kim, H., Masaki, Y., Satoh, Y., Stacke, T., Wada, Y., Wisser, D., 2014. First look at changes in flood hazard in the inter-sectoral impact model intercomparison project ensemble. Proc. Natl. Acad. Sci. 111, 3257–3261.

De Smith, M.J., Goodchild, M.F., Longley, M.A., 2007. Geospatial Analysis: A Comprehensive Guide to Principles, Techniques and Software Tools. Matador Press, Leicester.

Efron, B., 1982. The Jackknife, the Bootstrap, and Other Resampling Plans. vol. 38. SIAM.
Elith, J., Leathwick, J.R., Hastie, T., 2008. A working guide to boosted regression trees. J. Anim. Ecol. 77 (4), 802–813.

Evans, R., Horstman, C., Conzemius, M., 2005. Accuracy and optimization of force platform gait analysis in Labradors with cranial cruciate disease evaluated at a walking gait. Vet. Surg. 34 (5), 445–449.

Fernandez, D.S., Lutz, M.A., 2010. Urban flood hazard zoning in Tucuman Province, Argentina, using GIS and multicriteria decision analysis. Eng. Geol. 111 (1), 90–98.

Furl, C., Sharif, H., Zeitler, J.W., El Hassan, A., Joseph, J., 2018. Hydrometeorology of the catastrophic Blanco river flood in South Texas, May 2015. J. Hydrol. Reg. Stud. 15, 90–104.

Gharbi, M., Soualmia, A., Dartus, D., Masbernat, L., 2016. Comparison of 1D and 2D hydraulic models for floods simulation on the Medjerda River in Tunisia. J. Mater. Environ. Sci. 7, 3017–3026.

Gittleman, M., Farmer, C.J., Kremer, P., McPhearson, T., 2017. Estimating stormwater runoff for community gardens in New York City. Urban Ecosyst. 20 (1), 129–139.

Gokceoglu, C., Sonmez, H., Nefeslioglu, H.A., Duman, T.Y., Can, T., 2005. The 17 March 2005 Kuzulu landslide (Sivas, Turkey) and landslide-susceptibility map of its near vicinity. Eng. Geol. 81, 65–83.

Hair, J.F., Black, W.C., Babin, B.J., Anderson, R.E., Tatham, R.L., 1998. Multivariate Data Analysis. 5(3). Prentice Hall, Upper Saddle River, NJ, pp. 207–219.

Hirabayashi, Y., Mahendran, R., Koirala, S., Konoshima, L., Yamazaki, D., Watanabe, S., Kim, H., Kanae, S., 2013. Global flood risk under climate change. Nat. Clim. Chang. 3 (9), 816.

Hong, H., Panahi, M., Shirzadi, A., Ma, T., Liu, J., Zhu, A.X., Kazakis, N., 2018a. Flood susceptibility assessment in Hengfeng area coupling adaptive neuro-fuzzy inference system with genetic algorithm and differential evolution. Sci. Total Environ. 1124–1141.

Hong, H., Tsangaratos, P., Ilia, I., Liu, J., Zhu, A.X., Chen, W., 2018b. Application of fuzzy weight of evidence and data mining techniques in construction of flood susceptibility map of Poyang County, China. Sci. Total Environ. 625, 575–588.

Iranian Department of Water Resource (IDWR), 2015. Report of Natural Resources Management.

Ji, Z., Li, N., Xie, W., Wu, J., Zhou, Y., 2013. Comprehensive assessment of flood risk using the classification and regression tree method. Stoch. Env. Res. Risk A. 27 (8), 1815–1828.

Karlsson, C.S., Kalantari, Z., Mörtberg, U., Olofsson, B., Lyon, S.W., 2017. Natural hazard susceptibility assessment for road planning using spatial multi-criteria analysis. Environ. Manag. 60 (5), 823–851.

Kavzoglu, T., Sahin, E.K., Colkesen, I., 2014. Landslide susceptibility mapping using GIS-based multi-criteria decision analysis, support vector machines, and logistic regression. Landslides 11 (3), 425–439.

Khosravi, K., Nohani, E., Maroufinia, E., Pourghasemi, H.R., 2016a. A GIS-based flood susceptibility assessment and its mapping in Iran: a comparison between frequency ratio and weights-of-evidence bivariate statistical models with multi-criteria decision-making technique. Nat. Hazards 83 (2), 947–987.

Khosravi, K., Pourghasemi, H.R., Chapi, K., Bahri, M., 2016b. Flash flood susceptibility analysis and its mapping using different bivariate models in Iran: a comparison between Shannon's entropy, statistical index, and weighting factor models. Environ. Monit. Assess. 188 (12), 656.

Klaus, S., Kreibich, H., Merz, B., Kuhlmann, B., Schröter, K., 2016. Large-scale, seasonal flood risk analysis for agricultural crops in Germany. Environ. Earth Sci. 75, 1289.

Komolafe, A.A., Herath, S., Avtar, R., 2018. Methodology to assess potential flood damages in urban areas under the influence of climate change. Nat. Hazards Rev. 19 (2), 05018001.

Kourgialas, N.N., Karatzas, G.P., 2012. Flood management and a GIS modeling method to assess flood hazard areas – a case study. Hydrol. Sci. J. 56 (2), 212–225.

Lee, M.J., Kang, J.E., Jeon, S., 2012a. Application of frequency ratio model and validation for predictive flooded area susceptibility mapping using GIS. Geoscience and Remote Sensing Symposium (IGARSS), 2012 IEEE International, pp. 895–898.

Lee, M.J., Choi, J.W., Oh, H.J., Won, J.S., Park, I., Lee, S., 2012b. Ensemble-based landslide susceptibility maps in Jinbu area, Korea. Environ. Earth Sci. 67, 23–37.

Lee, M.J., Kang, J.E., Kim, G., 2015. Application of fuzzy combination operators to flood vulnerability assessments in Seoul, Korea. Geocarto Int. 30, 1052–1075.

Lee, S., Kim, J.C., Jung, H.S., Lee, M.J., Lee, S., 2017. Spatial prediction of flood susceptibility using random-forest and boosted-tree models in Seoul Metropolitan City, Korea. Geomatics Nat. Hazards Risk 8 (2), 1185–1203.

Lombardo, F., Obach, R.S., DiCapua, F.M., Bakken, G.A., Lu, J., Potter, D.M., Zhang, Y., 2006. A hybrid mixture discriminant analysis–random forest computational model for the prediction of volume of distribution of drugs in human. J. Med. Chem. 49 (7), 2262–2267.

Luu, C., Von Meding, J., Kanjanabootra, S., 2018. Assessing flood hazard using flood marks and analytic hierarchy process approach: a case study for the 2013 flood event in Quang Nam, Vietnam. Nat. Hazards 90 (3), 1031–1050.

McLay, C.D.A., Dragten, R., Sparling, G., Selvarajah, N., 2001. Predicting groundwater nitrate concentrations in a region of mixed agricultural land use: a comparison of three approaches. Environ. Pollut. 115 (2), 191–204.

Meraj, G., Romshoo, S.A., Yousuf, A.R., Altaf, S., Altaf, F., 2015. Assessing the influence of watershed characteristics on the flood vulnerability of Jhelum basin in Kashmir Himalaya. Nat. Hazards 77 (1), 153–175.

Miles, R.E., Snow, C.C., 1984. Designing strategic human resource systems. Org. Dyn. 13, 36–52. https://doi.org/10.1016/0090-2616(84)90030-5.

Moghaddam, A.R., 2006. A Conceptual Design of a Geothermal Combined Cycle and Comparison With a Single-flash Power Plant for Well NWS-4 (Sabalan, Iran).

Naimi, B., Araújo, M.B., 2016. Sdm: a reproducible and extensible R platform for species distribution modelling. Ecography 39, 368–375.

Norouzi, G., Taslimi, M., 2012. The impact of flood damages on production of Iran's Agricultural Sector. Middle East. J. Sci. Res. 12, 921–926.
Opperman, J.J., Galloway, G.E., Fargione, J., Mount, J.F., Richter, B.D., Secchi, S., 2009. Sustainable floodplains through large-scale reconnection to rivers. Science 326 (5959), 1487–1488.

Park, N.W., 2015. Using maximum entropy modeling for landslide susceptibility mapping with multiple geoenvironmental data sets. Environ. Earth Sci. 73 (3), 937–949.

Poli, S., Sterlacchini, S., 2007. Landslide representation strategies in susceptibility studies using weights-of-evidence modeling technique. Nat. Resour. Res. 16 (2), 121–134.

Porkhial, S., Shabihkhani, R., Oladnia, S., Moridi, A., 2008. Environmental Monitoring of Air, Soil and Surface Water Resources; A Case Study on Meshkinshahr Geothermal Field Development.

Pourghasemi, H.R., Yousefi, S., Kornejady, A., Cerdà, A., 2017. Performance assessment of individual and ensemble data-mining techniques for gully erosion modeling. Sci. Total Environ. 609, 764–775.

Rahmati, O., Pourghasemi, H.R., Zeinivand, H., 2016. Flood susceptibility mapping using frequency ratio and weights-of-evidence models in the Golastan Province, Iran. Geocarto Int. 31 (1), 42–70.

Regmi, A.D., Devkota, K.C., Yoshida, K., Pradhan, B., Pourghasemi, H.R., Kumamoto, T., Akgun, A., 2014. Application of frequency ratio, statistical index, and weights-of-evidence models and their comparison in landslide susceptibility mapping in Central Nepal Himalaya. Arab. J. Geosci. 7 (2), 725–742.

Robin, X., Turck, N., Hainard, A., Tiberti, N., Lisacek, F., Sanchez, J.C., Müller, M., 2011. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinforma. 12 (1), 77.

Rokach, L., 2010. Ensemble-based classifiers. Artif. Intell. Rev. 33 (1-2), 1–39.

Sajedi-Hosseini, F., Choubin, B., Solaimani, K., Cerdà, A., Kavian, A., 2018a. Spatial prediction of soil erosion susceptibility using FANP: application of the Fuzzy DEMATEL approach. Land Degrad. Dev. https://doi.org/10.1002/ldr.3058.

Sajedi-Hosseini, F., Malekian, A., Choubin, B., Rahmati, O., Cipullo, S., Coulon, F., Pradhan, B., 2018b. A novel machine learning-based approach for the risk assessment of nitrate groundwater contamination. Sci. Total Environ. 644, 954–962. https://doi.org/10.1016/j.scitotenv.2018.07.054.

Samanta, R.K., Bhunia, G.S., Shit, P.K., Pourghasemi, H.R., 2018. Flood susceptibility mapping using geospatial frequency ratio technique: a case study of Subarnarekha River basin, India. Model. Earth Syst. Environ. 1–14.

Schumann, G.P., Vernieuwe, H., De Baets, B., Verhoest, N.E.C., 2014. ROC-based calibration of flood inundation models. Hydrol. Process. 28 (22), 5495–5502.

Shrestha, N.K., Leta, O.T., De Fraine, B., Van Griensven, A., Bauwens, W., 2013. OpenMI-based integrated sediment transport modelling of the river Zenne, Belgium. Environ. Model Softw. 47, 193–206.

Siahkamari, S., Haghizadeh, A., Zeinivand, H., Tahmasebipour, N., Rahmati, O., 2017. Spatial prediction of flood-susceptible areas using frequency ratio and maximum entropy models. Geocarto Int. 1–15.

Skinner, C.J., Rao, J.N.K., 1996. Estimation in dual frame survey with complex design. J. Am. Stat. Assoc. 91, 349–356.

Sutton, C.D., 2004. Classification and regression trees, bagging, and boosting. Handb. Stat. 24, 303–329. https://doi.org/10.1016/S0169-7161(04)24011-1.

Tang, Z., Zhang, H., Yi, S., Xiao, Y., 2018. Assessment of flood susceptible areas using spatially explicit, probabilistic multi-criteria decision analysis. J. Hydrol. 558, 144–158.
Tehrany, M.S., Pradhan, B., Jebur, M.N., 2013. Spatial prediction of flood susceptible areas using rule based decision tree (DT) and a novel ensemble bivariate and multivariate statistical models in GIS. J. Hydrol. 504, 69–79.

Tehrany, M.S., Pradhan, B., Jebur, M.N., 2014. Flood susceptibility mapping using a novel ensemble weights-of-evidence and support vector machine models in GIS. J. Hydrol. 512, 332–343.

Tehrany, M.S., Shabani, F., Neamah Jebur, M., Hong, H., Chen, W., Xie, X., 2017. GIS-based spatial prediction of flood prone areas using standalone frequency ratio, logistic regression, weight of evidence and their ensemble techniques. Geomatics Nat. Hazards Risk 8 (2), 1538–1561.

Termeh, S.V.R., Kornejady, A., Pourghasemi, H.R., Keesstra, S., 2018. Flood susceptibility mapping using novel ensembles of adaptive neuro fuzzy inference system and metaheuristic algorithms. Sci. Total Environ. 615, 438–451.

Tien Bui, D., Hoang, N.D., 2017. A Bayesian framework based on a Gaussian mixture model and radial-basis-function Fisher discriminant analysis (BayGmmKda V1.1) for spatial prediction of floods. Geosci. Model Dev. 10, 3391–3409.

Tien Bui, D., Pradhan, B., Lofman, O., Revhaug, I., 2012. Landslide susceptibility assessment in Vietnam using support vector machines, decision tree, and Naive Bayes models. Math. Probl. Eng. 2012, 1–27.

Timofeev, R., 1984. Classification and Regression Trees (CART) theory and applications. Wadsworth Stat. Ser. x, 358.

USDA, S.C.S., 1986. Urban hydrology for small watersheds. Technical Release. 55, pp. 2–6.

Vapnik, V., 2013. The Nature of Statistical Learning Theory. Springer Science & Business Media.

Ward, R.C., Robinson, M., 2000. Principles of Hydrology. 4th edn. McGraw-Hill, Maidenhead.

Wu, F., Sun, Y., Sun, Z., Wu, S., Zhang, Q., 2017. Assessing agricultural system vulnerability to floods: a hybrid approach using emergy and a landscape fragmentation index. Ecol. Indic. https://doi.org/10.1016/j.ecolind.2017.10.050 (in press).

Yahaya, S., Ahmad, N., Abdalla, R.F., 2010. Multicriteria analysis for flood vulnerable areas in Hadejia-Jama'are River basin, Nigeria. Eur. J. Sci. Res. 42 (1), 71–83.

Yalcin, A., Reis, S., Aydinoglu, A.C., Yomralioglu, T., 2011. A GIS-based comparative study of frequency ratio, analytical hierarchy process, bivariate statistics and logistics regression methods for landslide susceptibility mapping in Trabzon, NE Turkey. Catena 85 (3), 274–287.

Yang, W., Xu, K., Lian, J., Ma, C., Bin, L., 2018. Integrated flood vulnerability assessment approach based on TOPSIS and Shannon entropy methods. Ecol. Indic. 89, 269–280.

Yesilnacar, E.K., 2005. The Application of Computational Intelligence to Landslide Susceptibility Mapping in Turkey. University of Melbourne, Department, p. 200.

Yousefi, S., Mirzaee, S., Keesstra, S., Surian, N., Pourghasemi, H.R., Zakizadeh, H.R., Tabibian, S., 2018. Effects of an extreme flood on river morphology (case study: Karoon River, Iran). Geomorphology 304, 30–39.

Zhao, G., Pang, B., Xu, Z., Yue, J., Tu, T., 2018. Mapping flood susceptibility in mountainous areas on a national scale in China. Sci. Total Environ. 615, 1133–1142.