A hybrid machine learning ensemble approach based on a Radial Basis Function neural network and Rotation Forest for landslide susceptibility modeling: A case study in the Himalayan area, India


Author’s Accepted Manuscript A hybrid machine learning ensemble approach based on a Radial Basis Function neural network and Rotation Forest for landslide susceptibility modeling: a case study in the Himalayan area, India Binh Thai Pham, Ataollah Shirzadi, Dieu Tien Bui, Indra Prakash, M.B Dholakia www.elsevier.com/locate/ijsrc

PII: S1001-6279(16)30132-9
DOI: https://doi.org/10.1016/j.ijsrc.2017.09.008
Reference: IJSRC146

To appear in: International Journal of Sediment Research Received date: 5 December 2016 Revised date: 17 May 2017 Accepted date: 25 September 2017 Cite this article as: Binh Thai Pham, Ataollah Shirzadi, Dieu Tien Bui, Indra Prakash and M.B Dholakia, A hybrid machine learning ensemble approach based on a Radial Basis Function neural network and Rotation Forest for landslide susceptibility modeling: a case study in the Himalayan area, India, International Journal of Sediment Research, https://doi.org/10.1016/j.ijsrc.2017.09.008 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting galley proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Binh Thai Pham a,b,*, Ataollah Shirzadi c, Dieu Tien Bui d, Indra Prakash e, M.B. Dholakia f

a Department of Civil Engineering, Gujarat Technological University, Nr. Visat Three Roads, Visat - Gandhinagar Highway, Chandkheda, Ahmedabad - 382424, Gujarat, India.
b Department of Geotechnical Engineering, University of Transport Technology, 54 Trieu Khuc, Thanh Xuan, Ha Noi, Viet Nam.
c Department of Rangeland and Watershed Management, College of Natural Resources, University of Kurdistan, Sanandaj, Iran.
d Geographic Information System Group, Department of Business and IT, University College of Southeast Norway, Gullbringvegen 36, N-3800 Bø i Telemark, Norway.
e Department of Science & Technology, Bhaskarcharya Institute for Space Applications and Geo-Informatics (BISAG), Government of Gujarat, Gandhinagar, India.
f Department of Civil Engineering, LDCE, Gujarat Technological University, Ahmedabad - 380015, Gujarat, India.

* Corresponding author: [email protected] or [email protected] (Binh Thai Pham)

Acknowledgement Authors are thankful to the Director, Bhaskarcharya Institute for Space Applications and Geo-Informatics (BISAG), Department of Science & Technology, Government of Gujarat, Gandhinagar, Gujarat, India for providing facilities to carry out this research work. Disclosure statement No potential conflict of interest was reported by the authors.

Abstract

In this paper, a hybrid machine learning ensemble approach, namely the Rotation Forest based Radial Basis Function (RFRBF) neural network, is proposed for spatial prediction of landslides in part of the Himalayan area (India). The proposed approach is an integration of the Radial Basis Function (RBF) neural network classifier and the Rotation Forest ensemble, which are state-of-the-art machine learning algorithms for classification problems. For this purpose, a spatial database of the study area was established that consists of 930 landslide locations and fifteen influencing parameters (slope angle, road density, curvature, land use, distance to road, plan curvature, lineament density, distance to lineaments, rainfall, distance to river, profile curvature, elevation, slope aspect, river density, and soil type). Using the database, training and validation datasets were generated for constructing and validating the model. Performance of the model was assessed using the Receiver Operating Characteristic (ROC) curve, the area under the ROC curve (AUC), statistical analysis methods, and the Chi square test. In addition, Logistic Regression (LR), Multi-layer Perceptron Neural Networks (MLP Neural Nets), Naïve Bayes (NB), and the hybrid model of Rotation Forest and Decision Trees (RFDT) were selected for comparison. The results show that the proposed RFRBF model has the highest prediction capability in comparison to the other models (LR, MLP Neural Nets, NB, and RFDT); therefore, the proposed RFRBF model is promising and should be used as an alternative technique for landslide susceptibility modeling.

Keywords: Landslide; GIS; Rotation Forest; Radial Basis Function Neural Network; India

1. Introduction

Landslide susceptibility maps depict areas where landslides are likely to occur based on local terrain conditions and the past spatial distribution of landslides (Ilia et al., 2010; Wang et al., 2015). Identification of potentially landslide-prone areas is important and helps to minimize potential damage through land use planning and slope engineering management (Dai et al., 2002; Varnes, 1984; Wang et al., 2005). Although landslide susceptibility assessment and mapping have been widely investigated over the last decade with the aim of determining susceptible areas, landslides still occur and cause huge loss of life and damage to property and the environment (Ilia & Tsangaratos, 2016; Shirzadi et al., 2017; Tsangaratos & Benardos, 2014). In addition, in recent years, the number of landslide occurrences has increased due to climate change (Gariano & Guzzetti, 2016; Kalsnes et al., 2016), growing population, urbanization, development of settlements, and deforestation (Fuller et al., 2016; Haque et al., 2016; Saito et al., 2017; Tien Bui et al., 2016a). Therefore, it is necessary to predict landslide-prone areas as accurately as possible, and for this purpose, investigation of new frameworks and techniques for landslide modeling is highly necessary.

During recent decades, a number of different methods were proposed for landslide modeling (Yilmaz, 2010), including heuristic, deterministic (engineering approach), and probabilistic (non-deterministic or data-driven) methods. Heuristic methods rely on expert judgment to assign weights to each parameter for landslide analysis and are therefore considered subjective (Günther et al., 2013). Deterministic methods are based on mathematical models of the physical mechanisms that control slope stability (Kramer, 1996). However, deterministic methods are limited by the difficulty of collecting geotechnical and hydrogeological data over a large area (Terlien et al., 1995). Probabilistic methods are based on the spatial distribution of past and current landslides in conjunction with a set of influencing parameters for predicting future landslides (Ohlmacher & Davis, 2003b). Among these methods, probabilistic methods are now used more widely for landslide susceptibility assessment at the regional scale (Gokceoglu et al., 2005; Hoang & Tien Bui, 2016; Ilia et al., 2015; Pham et al., 2016c; Youssef et al., 2016).

Machine learning methods are among the probabilistic methods; they give better results for landslide prediction and have been applied efficiently in recent years (Dickson & Perry, 2016; Pham et al., 2016c; Tien Bui et al., 2016b; Vasu & Lee, 2016). Methods such as artificial neural networks (Melchiorre et al., 2008; Pham et al., 2015b; Poudyal et al., 2010), logistic regression (Conoscenti et al., 2014; Ozdemir, 2011; van den Eeckhaut et al., 2006b), fuzzy logic (Pourghasemi et al., 2012; Pradhan, 2011), decision trees (Alkhasawneh et al., 2014; Tsangaratos & Ilia, 2016b; Yeon et al., 2010), Naïve Bayes (Pham et al., 2015b; Tsangaratos & Ilia, 2016a), and support vector machines (Marjanović et al., 2011; Pham et al., 2016b; Pourghasemi et al., 2013) are considered the most popular machine learning methods applied for landslide prediction with high accuracy. However, recent approaches for landslide modeling show that the prediction of landslide susceptibility could be enhanced with the use of hybrid machine learning techniques (Althuwaynee et al., 2014b; Tien Bui et al., 2016b; Vasu & Lee, 2016). Therefore, new hybrid machine learning methods for landslide susceptibility modeling should be explored further.

In the present study, this gap in the literature is partially filled by proposing a new hybrid machine learning approach for landslide susceptibility modeling with a case study for part of the Himalayan area (India). The proposed approach relies on an integration of the Radial Basis Function neural networks (RBF Neural Nets) and the Rotation Forest ensemble, and is named the Rotation Forest based RBF (RFRBF) Neural Network. According to current literature, such a hybrid approach has not been explored for landslide modeling. To confirm the usability of the proposed approach, several popular models, namely Logistic Regression (LR), Multi-layer Perceptron Neural Networks (MLP Neural Nets), Naïve Bayes (NB), and the hybrid model of Rotation Forest and Decision Trees (RFDT), were selected for comparison. The data preparation was done using ArcGIS 10.2 software, whereas the modeling process was done using Weka 3.8 software. In addition, the modeling results were transferred to a GIS format to open in ArcGIS 10.2 using a script developed by the authors in the Matlab environment.

2. Background of the methods used

2.1. Radial Basis Function neural network

The Radial Basis Function Neural Network (RBF Neural Nets) is a neural network that consists of three layers: the input layer, a hidden layer, and the output layer (Orr, 1996). The input layer broadcasts the coordinates of the input vector to each of the units in the hidden layer. Each unit in the hidden layer then produces an activation based on the associated RBF (Orr, 1996). Finally, each unit in the output layer computes a linear combination of the activations of the hidden units. In the classification case, the output of the RBF Neural Nets for the input pattern $x$ is given as (Du & Swamy, 2014):

$$f_i(x) = \sum_{k=1}^{m} w_{ki}\,\phi\left(\lVert x - a_k \rVert\right) \qquad (1)$$

where $m$ is the number of computing units, $w_{ki}$ are the connecting weights, $a_k$ are the RBF centers or prototypes, and the function $\phi(\cdot)$ is selected as a Gaussian function (Du & Swamy, 2014). To select the initial hidden unit centers of the RBF Neural Nets, a k-means function is applied to the training dataset in an unsupervised manner. Moreover, the initial value of all variance parameters in the network is set to the maximum squared Euclidean distance between any pair of cluster centers.

2.2. Rotation Forest Ensemble

A Rotation Forest ensemble, which uses machine learning algorithms, can be used to improve the prediction accuracy of weak individual classifiers (Liu & Huang, 2008). Classifier ensembles are generally more accurate than a single base classifier (Ozcift & Gulten, 2011). These algorithms are first trained using the samples (conditioning parameters). In the test phase, the algorithms are then applied to classify new samples (Li & Zhou, 2007). Basically, the capability of an algorithm to predict the correct status (occurrence or non-occurrence of landslides) is defined as the "success" of the algorithm in the analysis of the dataset.

In the Rotation Forest algorithm (Rodriguez et al., 2006), $X$ is the training dataset in the form of an $N \times n$ matrix ($N$ is the number of rows, i.e., samples, and $n$ is the number of columns, i.e., features), $Y$ is the vector of corresponding class labels, which take values from the set of classes $\{\omega_1, \omega_2\}$, and $F$ is the feature set of the dataset, which is to be partitioned randomly into $K$ subsets. The decision trees in a Rotation Forest are denoted by $\{D_1, \ldots, D_L\}$. $L$ and $K$ are the two parameters that should be determined. The ensemble is built by applying a given base learning algorithm to different training datasets (Zhang & Zhang, 2008). In order to construct the training dataset for classifier $D_i$, the following steps are required:

Step 1: $F$ is randomly split into $K$ feature subsets, with each subset having $M = n/K$ features.

Step 2: If $F_{ij}$ is the $j$th subset of features used to train classifier $D_i$ and $X_{ij}$ denotes the dataset $X$ restricted to the features in $F_{ij}$, then a nonempty random subset $X'_{ij}$ is drawn from $X_{ij}$; this is a new training set, 75% of the size of the original training dataset, formed using the bootstrap method. A linear transformation is then applied to $X'_{ij}$ to produce the coefficients of matrix $C_{ij}$. The coefficients of $C_{ij}$ are $a_{ij}^{(1)}, \ldots, a_{ij}^{(M_j)}$, each of size $M \times 1$.

Step 3: A sparse rotation matrix, $R_i$, is created as follows:

$$R_i = \begin{bmatrix} a_{i1}^{(1)}, \ldots, a_{i1}^{(M_1)} & [0] & \cdots & [0] \\ [0] & a_{i2}^{(1)}, \ldots, a_{i2}^{(M_2)} & \cdots & [0] \\ \vdots & \vdots & \ddots & \vdots \\ [0] & [0] & \cdots & a_{iK}^{(1)}, \ldots, a_{iK}^{(M_K)} \end{bmatrix} \qquad (2)$$

The columns of $R_i$ are then rearranged according to the original feature set. The rearranged rotation matrix is denoted by $R_i^a$, so that $X R_i^a$ is the transformed training set for classifier $D_i$. Ultimately, all classifiers are trained in a parallel manner (Rodriguez et al., 2006).
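Steps 1 and 3 above can be illustrated with a minimal sketch. It is not the Weka implementation used in this study: the function names are hypothetical, the per-subset coefficient matrices are supplied by the caller instead of being computed by PCA on a bootstrap sample (Step 2 is omitted), and the rearrangement is done implicitly by writing each coefficient into the row of its original feature index, which assumes samples are row vectors.

```python
import random


def split_features(n_features, k, seed=0):
    """Step 1: randomly split feature indices into k subsets of M = n/k features."""
    if n_features % k != 0:
        raise ValueError("this sketch assumes n is divisible by K")
    rng = random.Random(seed)
    indices = list(range(n_features))
    rng.shuffle(indices)
    m = n_features // k
    return [indices[j * m:(j + 1) * m] for j in range(k)]


def build_rotation_matrix(subsets, coeffs, n_features):
    """Step 3: place each subset's coefficient block on the diagonal.

    coeffs[j][r][c] is the weight of the r-th feature of subsets[j] in the
    c-th component of that subset (assumed to come from PCA in practice).
    Rows are indexed by original feature position, so the result already
    plays the role of the rearranged matrix.
    """
    rotation = [[0.0] * n_features for _ in range(n_features)]
    col = 0
    for features, block in zip(subsets, coeffs):
        for c in range(len(block[0])):
            for r, feature in enumerate(features):
                rotation[feature][col] = block[r][c]
            col += 1
    return rotation


def rotate_sample(x, rotation):
    """Transform one sample (row vector) by the rotation matrix."""
    n_cols = len(rotation[0])
    return [sum(x[r] * rotation[r][c] for r in range(len(x))) for c in range(n_cols)]
```

With identity blocks the transformation only permutes the features, which is a quick way to check the bookkeeping before plugging in real PCA coefficients.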





In the classification phase, for a given test sample $x$, let $d_{ij}(x R_i^a)$ be the probability assigned by classifier $D_i$ to the hypothesis that $x$ belongs to class $\omega_j$. The confidence of a class is then computed by the average combination method (Eq. 3):

$$\mu_j(x) = \frac{1}{L} \sum_{i=1}^{L} d_{ij}\left(x R_i^a\right), \quad j = 1, \ldots, c \qquad (3)$$

where $\mu_j(x)$ is the confidence of class $\omega_j$ by the average combination method (or the prediction for $x$), $d_{ij}(x R_i^a)$ is the probability assigned by the classifier $D_i$ to the hypothesis that $x$ comes from class $\omega_j$, and $c$ is the number of classes (Rodriguez et al., 2006). Principal Component Analysis (PCA) rotation matrices are computed for each of the $K$ subsets of input variables (Ayerdi & Graña, 2014) to preserve the variability of the information in the data. The new features for the base classifier are formed by means of $K$ axis rotations. The Rotation Forest algorithm tries to enhance the diversity and accuracy of the ensemble.

2.3. Model assessment

2.3.1. Statistical analysis methods

In order to statistically evaluate the performance of the landslide models, accuracy, kappa index ($k$), and Root Mean Square Error (RMSE) criteria can be used (Althuwaynee et al., 2014a; Tien Bui et al., 2012b). These statistical indexes can be calculated from the four possible outcomes in a cross table: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). TP is the number of pixels that are correctly classified as landslides, and FP is the number of pixels that are incorrectly classified as landslides (Pham et al., 2016b). TN is the number of pixels that are correctly classified as non-landslides, whereas FN is the number of pixels that are incorrectly classified as non-landslides (Tsangaratos & Benardos, 2014). The aforementioned statistical indexes can be calculated from the following equations:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (4)$$

$$k = \frac{P_c - P_{exp}}{1 - P_{exp}} \qquad (5)$$

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(EO_i - MO_i\right)^2} \qquad (6)$$

where $P_c$ is defined as the observed agreement, $P_{exp}$ is defined as the expected agreement, $n$ is the total number of samples in the dataset, $EO_i$ is the estimated value of the $i$th observation, and $MO_i$ is the measured value of the $i$th observation (Pham et al., 2016c).
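As a minimal sketch of Eqs. 4 to 6, the three indexes can be computed from the cross-table counts as follows. The expected agreement is computed here from the marginal totals in the usual Cohen's kappa manner; the source does not spell out this formula, so it is an assumption, as are the function names.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Eq. 4: share of correctly classified pixels."""
    return (tp + tn) / (tp + tn + fp + fn)


def kappa(tp, tn, fp, fn):
    """Eq. 5, with P_exp taken from the marginal totals (assumed, Cohen's kappa)."""
    n = tp + tn + fp + fn
    p_c = (tp + tn) / n
    p_exp = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (p_c - p_exp) / (1.0 - p_exp)


def rmse(estimated, measured):
    """Eq. 6: root mean square error between estimated and measured values."""
    if len(estimated) != len(measured):
        raise ValueError("both sequences must have the same length")
    n = len(estimated)
    return math.sqrt(sum((e - m) ** 2 for e, m in zip(estimated, measured)) / n)
```

For example, hypothetical counts of TP = 40, TN = 45, FP = 5, and FN = 10 give an accuracy of 0.85 and a kappa index of 0.70.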

2.3.2. Receiver Operating Characteristic (ROC) curve method

Past research has shown that the ROC curve is a standard and useful tool to determine the quality of deterministic and probabilistic models (Swets, 1988). The ROC curve provides a comprehensive and visually attractive way to summarize the accuracy of predictions (Tien Bui et al., 2016c). It is widely applicable regardless of the source of predictions. In the ROC curve, the "sensitivity" of the model is plotted on the Y axis against "100-specificity" on the X axis (Tsangaratos & Ilia, 2015). The area under the ROC curve (AUC) shows the ability of a model to predict the correct occurrence or non-occurrence of landslide events. The higher the AUC value, the better the prediction capability of the model. According to Yesilnacar and Topal (2005), the quantitative–qualitative relation between the AUC and prediction accuracy can be classified as follows: 0.5–0.6 (poor), 0.6–0.7 (average), 0.7–0.8 (good), 0.8–0.9 (very good), and 0.9–1 (excellent).

2.3.3. Chi square test

The Chi square test is a popular statistical method which is often used for comparison when the difference among models is small (Pham et al., 2016e). The test starts from the null hypothesis that the models are not statistically different from one another. A significance level of 0.05 is selected, and then the actual significance level ($p$) and the Chi-square value ($\chi^2$) are calculated (Sarkar & Kanungo, 2004). The null hypothesis is checked by comparing the values of $p$ and $\chi^2$ with the standard values of 0.05 and 3.841 (the critical value for one degree of freedom), respectively (Wilson & Hilferty, 1931). If the actual significance level is smaller than 0.05 ($p < 0.05$) and the Chi-square value is higher than 3.841, then the null hypothesis is rejected, which means the difference between the models is statistically significant (Lewis & Burke, 1949).
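The AUC described in Section 2.3.2 can be sketched without plotting the curve, using the equivalence between the area under the ROC curve and the probability that a landslide sample receives a higher score than a non-landslide sample. This is an illustrative calculation, not the Weka routine used in the study; the function names are hypothetical, and the lower-inclusive treatment of the class boundaries of Yesilnacar and Topal (2005) is an assumption.

```python
def auc(labels, scores):
    """AUC by pairwise ranking: landslide (1) versus non-landslide (0) scores.

    Ties between a landslide and a non-landslide score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_category(value):
    """Qualitative class of an AUC value (bin edges assumed lower-inclusive)."""
    if value >= 0.9:
        return "excellent"
    if value >= 0.8:
        return "very good"
    if value >= 0.7:
        return "good"
    if value >= 0.6:
        return "average"
    if value >= 0.5:
        return "poor"
    return "below the listed range"
```

The pairwise loop is quadratic in the number of samples, which is acceptable for datasets of a few thousand pixels; a rank-based formulation would be preferable for larger rasters.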

3. A hybrid machine learning ensemble approach based on RBF neural network and Rotation Forest for landslide prediction in part of the Himalayan area (India)

3.1. Methodological flow chart

The methodology for spatial prediction of landslides using the hybrid machine learning ensemble approach of the RBF neural network and Rotation Forest, known as Rotation Forest based RBF (RFRBF) Neural Nets, consists of five main steps (Fig. 1): (1) collecting landslide data, (2) generating datasets, (3) processing models, (4) validating landslide models, and (5) generating a landslide susceptibility map.

(1) Collecting landslide data: In the present study, historical and current landslide locations were collected from a part of the Himalayan area (India) by analyzing Google Earth images and integrating this information with available field data. Fifteen landslide influencing parameters were considered for analysis based on the geo-environmental characteristics of the study area and the mechanisms of past and current landslides.

(2) Generating datasets: Datasets (training and validation) for building and validating the models were generated using landslide data collected from the study area. Landslide locations were divided randomly into two parts: one part, consisting of 70% of the landslide locations, was used to generate a training dataset, and the other part, consisting of 30%, was used to generate a validation dataset. Landslide locations were then combined with landslide influencing parameters to generate the final datasets for learning the models.

(3) Processing models: The training dataset was used as input data in the modeling process. Firstly, the Rotation Forest ensemble was applied to optimize the training dataset to generate the optimal input data for the next classification step. The sub-training datasets that were generated and optimized were then used to classify landslide and non-landslide classes for spatial prediction of landslides using the RBF Neural Nets classifier. Finally, the RFRBF model was constructed based on the integration of both the Rotation Forest ensemble and the RBF Neural Nets classifier for landslide susceptibility assessment in the study area.

(4) Validating landslide models: The predictive capability of the hybrid RFRBF model was validated using quantitative methods such as the ROC curve method, statistical analysis methods, and the Chi square test.

(5) Generating landslide susceptibility map: Using the results from learning with the RFRBF model, a landslide susceptibility map was generated for landslide hazard management of the study area.

Fig. 1 Flow chart for landslide spatial prediction (Note: LR is Logistic Regression, RBF Neural Nets is Radial Basis Function Neural Networks, MLP Neural Nets is Multi-layer Perceptron Neural Networks, NB is Naïve Bayes, and RFDT is the hybrid model of Rotation Forest and Decision Trees)

3.2. Description of the study area

The study area covers an area of about 132,547 m2, between longitudes 78°37′40″E and 79°00′50″E and latitudes 30°23′15″N and 30°03′58″N, in part of the Himalayan area of India (Fig. 2). The area is situated in the subtropical monsoon region. Annual rainfall ranges from 200 to 1000 mm with a mean annual rainfall of 463 mm, the temperature varies from sub-zero to 45 °C, and the relative humidity varies from 25 to 85%.

Fig. 2 Location map of the study area with landslide and non-landslide data

Topographically, the area belongs to the mountainous region (96%), with many high mountain ranges and deep valleys. The elevation varies from 450 to 2738 m with an average elevation of about 1373 m above standard sea level. The general elevation in 75% of the study area varies from 900 to 1900 m. About 85% of hill slopes range from 20 to 40 degrees, whereas a few areas have steep slopes of up to 70 degrees. Geologically, three main types of rocks are present in the study area: metamorphic rocks, igneous rocks, and sedimentary rocks. Among these, metamorphic rocks such as phyllite and quartzite belonging to the Jaunsar group are predominant in the area. Mainly silt and loamy soils along with debris materials are present in the area. Loamy soils occupy about 75.9% of the area. Different types of loamy soils are present in the study area: coarse loam or sandy loam, fine loam or fine sandy loam, skeletal loam, mixed loam (loamy soil with about 10% humus), and fine silt clay loam. The loamy soil is composed mostly of sand (40%), silt (40%), and clay (20%). The skeletal loam consists of very deep, well drained soils on river terraces developed on slopes ranging from 0 to 2%. Soil classification is based on the United States Department of Agriculture (USDA) soil taxonomy classification (Fig. 4g).

3.3. Data collection and analysis

3.3.1. Landslide and non-landslide data

The landslide inventory map of this area was constructed based on interpretation of Google Earth images in Google Earth Pro 7.0. A total of 930 landslide locations were identified and mapped (Fig. 2). Landslide locations were then selected randomly for field verification. In addition, these locations were validated against historical reports. Three types of landslides were classified in this area, namely rotational (70 landslide locations), translational (730 landslide locations), and debris flow (130 landslide locations).
Landslides in this area occur mainly during heavy rainfall in the monsoon season along road cuts and rivers (Fig. 3). Landslide locations were first converted into raster data at a 20 x 20 m grid size, and then divided randomly into two parts. One part includes 651 landslide pixels (70%) used to make the training data. The other part consists of 279 landslide pixels (30%) used to build the validation data.

In landslide susceptibility modeling, landslide prediction is considered a binary classification problem in which both landslide and non-landslide locations are taken into account for generating the datasets. In the present study, non-landslide locations were selected from non-landslide areas which were identified on the basis of the analysis of the geomorphology of the study area (Fig. 2). Non-landslide locations were also converted into raster data with a 20 x 20 m grid size. A total of 651 non-landslide pixels were used for generating the training dataset, whereas 279 other non-landslide pixels were used for creating the validation dataset.
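The random 70/30 division described above can be sketched as follows. The function name, the fixed seed, and the use of integer identifiers in place of actual landslide pixels are assumptions made for illustration; the study itself prepared the data in ArcGIS.

```python
import random


def split_locations(locations, train_fraction=0.7, seed=42):
    """Shuffle the locations and split them into training and validation parts."""
    rng = random.Random(seed)
    shuffled = list(locations)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# 930 hypothetical landslide pixel identifiers.
landslide_ids = range(930)
train_ids, validation_ids = split_locations(landslide_ids)
print(len(train_ids), len(validation_ids))  # 651 279
```

Applying the same function to an equal number of non-landslide pixels yields the balanced training (651 + 651) and validation (279 + 279) datasets described in this section.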

Fig. 3 Photos of landslides along roads and rivers in the study area (source: http://www.portal.gsi.gov.in/)

3.3.2. Landslide influencing parameters

Landslide influencing parameters in the present study were extracted mainly from the Aster Global DEM, Landsat images, and other available thematic maps (http://www.ahec.org.in/wfw/maps.htm), which were collated in a GIS environment for the analysis. In the study area, fifteen parameters (slope angle, road density, curvature, land use, distance to road, plan curvature, lineament density, distance to lineaments, rainfall, distance to river, profile curvature, elevation, slope aspect, river density, and soil type) were selected as influencing parameters for landslide occurrences, based on the analysis of the nature of landslide occurrences and the characteristics of environmental conditions.

More specifically, slope angle is one of the most significant influencing parameters for landslide occurrences (Pham et al., 2015a, 2016f). Gentle slopes usually have a lower frequency of landslide occurrences than steep slopes (Dai et al., 2001). Slope aspect is defined as the direction of maximum slope of the terrain surface in degrees clockwise from 0 to 360, where 0, 90, 180, and 270 degrees are north, east, south, and west facing, respectively (Kavzoglu et al., 2015). Slope aspect plays a significant role in landslide occurrence because exposure to sunlight and rainfall affects the weathering process (Anbalagan et al., 2015). Curvature is defined as the rate of change of the slope angle or aspect of the terrain surface (Nefeslioglu et al., 2010). The curvature parameter is affected by riverbank erosion, and, thus, it is important for landslide analysis. Curvature values are negative, zero, and positive for concave, flat, and convex surfaces, respectively (Nefeslioglu et al., 2008). Plan curvature is the horizontal element of the terrain surface which represents the local relief of topography (Ayalew & Yamagishi, 2005). This parameter influences the landslide process through the divergence and convergence of the flow direction on a hill slope (Oh & Pradhan, 2011). Profile curvature is defined as the vertical form of the topography of the terrain surface in the direction of the maximum slope (Ayalew & Yamagishi, 2005); it describes the complexities of the terrain and thus affects landslide occurrences.

Soil type is highly related to geologic units which have a significant relation with landslide occurrences (Ohlmacher & Davis, 2003a). For example, surface and subsurface runoff and infiltration processes are dependent on soil texture, porosity, and permeability (Mogaji et al., 2015). Land cover is related to the level of vegetation covering on the earth surface which may also affect occurrences of landslides. Vegetation also helps in preventing soil erosion (Varnes, 1984) as roots of vegetation play the role of anchors to maintain stability of slopes (Pham et al., 2016f). Rainfall is one of the most important parameters affecting landslide occurrence as it affects the properties of soil and rocks especially weathered rocks on slopes (Lee et al., 2004; Pham et al., 2016d). A rainfall map was constructed based on meteorological data that were collected over 30 years from 1984 to 2014 according to the Climate Forecast System Reanalysis (CFSR) in the Global Weather data for SWAT (NCEP, 2014) using the spline interpolation method (Kawamura et al., 1992). Distance to lineaments and lineament density are known as influencing parameters for landslide occurrences as lineaments are defined as fractures, faults, geomorphologic ridges, and other tectonic and topographic structures which affect continuity of rocks and soils masses (Ayalew & Yamagishi, 2005). Distance to roads and road density are usually taken into account for landslide analysis as road networks create discontinuities in the slope material mass (Yalcin et al., 2011). Distance to rivers and river density are also known as influencing parameters for landslide occurrences as river networks affect the surface runoff and ground water infiltration (Pham et al., 2015b). Landslide susceptibility assessment was done by establishing the spatial relation between past and current landslide events and landslide influencing parameters. 
Therefore, maps of influencing parameters were generated and reclassified into different classes (Table 1 and Fig. 4). Reclassification of these parameters for generating the maps is based on an analysis of the frequency of landslides in the study area in conjunction with an analysis of the nature of landslide occurrences (Tsangaratos et al., 2016c; Varnes, 1978, 1984). In addition, it is also based on the analysis of geo-environmental characteristics of the study area in relation with landslide occurrences (Pham et al., 2016f).


Fig. 4 Landslide influencing parameters maps: (a) slope angle map, (b) slope aspect map, (c) elevation map, (d) curvature map, (e) plan curvature map, (f) profile curvature map, (g) soil type map, (h) land cover map, (i) rainfall map, (j) distance to lineaments map, (k) distance to roads map, (l) distance to rivers map, (m) lineament density map, (n) road density map, and (o) river density map (Pham et al., 2016f)

Table 1 Landslide influencing parameters and their classes

No. | Landslide influencing parameters | Classes
1 | Slope angle (°) | (1) 0 - 10; (2) 10 - 20; (3) 20 - 30; (4) 30 - 40; (5) 40 - 50; (6) 50 - 60; (7) > 60
2 | Slope aspect | (1) flat; (2) north; (3) northeast; (4) east; (5) southeast; (6) south; (7) southwest; (8) west; (9) northwest
3 | Elevation (m) | (1) 0 - 700; (2) 700 - 900; (3) 900 - 1100; (4) 1100 - 1300; (5) 1300 - 1500; (6) 1500 - 1700; (7) 1700 - 1900; (8) 1900 - 2100; (9) 2100 - 2300; (10) > 2300
4 | Curvature | (1) high concave [(-18.25) - (-2)]; (2) concave [(-2) - (-0.05)]; (3) flat [(-0.05) - (0.05)]; (4) convex [0.05 - 2]; (5) high convex [2 - 24.25]
5 | Plan curvature | (1) [(-8.6) - (-1.1)]; (2) [(-1.1) - (-0.3)]; (3) [(-0.3) - 0.3]; (4) [0.3 - 1]; (5) [1 - 8.5]
6 | Profile curvature | (1) [(-16.3) - (-1.4)]; (2) [(-1.4) - (-0.4)]; (3) [(-0.4) - (0.3)]; (4) [0.3 - 1.2]; (5) [1.2 - 4]
7 | Soil type | (1) coarse loam; (2) fine loam; (3) skeletal loam; (4) mixed loam; (5) fine silt
8 | Land cover | (1) dense forest; (2) open forest; (3) scrub land; (4) non forest
9 | Rainfall (mm) | (1) < 300; (2) 300 - 400; (3) 400 - 500; (4) 500 - 600; (5) 600 - 700; (6) 700 - 800; (7) > 800
10 | Distance to lineaments (m) | (1) 0 - 100; (2) 100 - 200; (3) 200 - 300; (4) 300 - 400; (5) 400 - 500; (6) 500 - 600; (7) 600 - 700; (8) > 700
11 | Distance to roads (m) | (1) 0 - 50; (2) 50 - 100; (3) 100 - 150; (4) 150 - 200; (5) 200 - 250; (6) > 250
12 | Distance to rivers (m) | (1) 0 - 50; (2) 50 - 100; (3) 100 - 150; (4) 150 - 200; (5) 200 - 250; (6) > 250
13 | Lineament density (km/km2) | (1) very low (0 - 0.673); (2) low (0.673 - 0.939); (3) moderate (0.939 - 1.184); (4) high (1.184 - 1.469); (5) very high (1.469 - 2.602)
14 | Road density (km/km2) | (1) very low (0 - 0.05); (2) low (0.05 - 0.154); (3) moderate (0.154 - 0.268); (4) high (0.268 - 0.423); (5) very high (0.423 - 1.268)
15 | River density (km/km2) | (1) very low (0 - 0.086); (2) low (0.086 - 0.589); (3) moderate (0.589 - 1.049); (4) high (1.049 - 1.623); (5) very high (1.623 - 3.663)
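The reclassification of a continuous parameter into the class codes of Table 1 can be sketched with a sorted list of class boundaries. The break values below are taken from the slope angle and distance to roads rows of Table 1; the function name and the rule that a value equal to a boundary falls into the higher class are assumptions, since the table does not say which side is inclusive.

```python
from bisect import bisect_right

# Upper boundaries of all classes except the last open-ended one (from Table 1).
SLOPE_ANGLE_BREAKS = [10, 20, 30, 40, 50, 60]      # degrees
DISTANCE_TO_ROADS_BREAKS = [50, 100, 150, 200, 250]  # metres


def reclassify(value, breaks):
    """Return the 1-based class code of a value for sorted class boundaries.

    A value equal to a boundary is assigned to the higher class (assumed).
    """
    return bisect_right(breaks, value) + 1


print(reclassify(25, SLOPE_ANGLE_BREAKS))         # 3, i.e. the 20 - 30 class
print(reclassify(300, DISTANCE_TO_ROADS_BREAKS))  # 6, i.e. the > 250 class
```

The same helper applies to every numeric row of Table 1 once its boundaries are listed; categorical parameters such as soil type and land cover need a direct lookup instead.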

3.4. Results and analysis 3.4.1. Training the RFRBF model Using the training dataset, the RFRBF model was trained and constructed for landslide susceptibility assessment. In the training process of the RBF Neural Nets classifier, number of seed, maximum iteration, and minimum standard deviation were selected as 1, -1, and 0.1, respectively. Meanwhile, the number of clusters was changed to achieve the best value with the highest AUC value corresponding with the highest predictive capability of the model. The results of optimization of the calculation parameters of the RBF Neural Nets model are shown in Fig.5. The result of the analysis shows that 14 clusters have the highest value of AUC (87.8 %). Therefore, the value of 14 was determined as the number of clusters option in the RBF Neural Nets model.
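To make the role of the number of clusters concrete, the following minimal sketch evaluates an RBF network as in Eq. 1, with one Gaussian unit per cluster center, and derives the initial variance from the centers as described in Section 2.1. It is an illustration under stated assumptions, not the Weka classifier used in the study: the centers and weights are supplied by the caller rather than learned, the exact Gaussian form exp(-r²/(2·variance)) is assumed, and all names are hypothetical.

```python
import math


def initial_variance(centers):
    """Maximum squared Euclidean distance between any pair of cluster centers."""
    return max(
        math.dist(a, b) ** 2
        for i, a in enumerate(centers)
        for b in centers[i + 1:]
    )


def rbf_outputs(x, centers, weights, variance):
    """Eq. 1: each output is a weighted sum of Gaussian activations.

    weights[k][i] connects hidden unit k (center k) to output unit i.
    """
    activations = [
        math.exp(-(math.dist(x, a) ** 2) / (2.0 * variance)) for a in centers
    ]
    n_outputs = len(weights[0])
    return [
        sum(weights[k][i] * activations[k] for k in range(len(centers)))
        for i in range(n_outputs)
    ]
```

In this picture, changing the number of clusters changes the number of hidden units m in Eq. 1, which is the quantity tuned against the AUC in this section.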

Fig. 5 The performance of the RBF Neural Nets model using different numbers of clusters

The training process for the proposed RFRBF model started by optimizing the training dataset using the Rotation Forest ensemble method. The maximum and minimum numbers of groups and the seed were set to 3 and 1, respectively. PCA was selected as the projection filter, and RBF Neural Nets was used as the base classifier. The number of iterations was varied from 1 to 20 to find the best model based on the AUC value (Fig. 6). The results show that 6 iterations give the highest AUC (89.1%); thus, 6 iterations were used to build the RFRBF model for landslide spatial prediction in this study.

Fig. 6 The performance of the RFRBF model using different numbers of iterations

3.4.2. Validation of the RFRBF model

A detailed accuracy assessment was done for the RFRBF model using both the training and validation datasets (Table 2). The results show that the predictive accuracy values for the training and validation datasets are 85.9% and 82.8%, respectively. The RMSE values are 0.326 and 0.362 for the training and validation datasets, respectively. The Kappa index values for the training and validation datasets are 0.718 and 0.656, respectively.

Table 2 Performance of the RFRBF model using the training and validation datasets

No  Parameter     Training dataset  Validation dataset
1   Accuracy (%)  85.9              82.8
2   Kappa         0.718             0.656
3   RMSE          0.326             0.362
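The relation between the accuracy and Kappa values in Table 2 can be illustrated with a minimal sketch of Cohen's Kappa computed from a binary confusion matrix. The counts below are hypothetical (the confusion matrix itself is not reported here); they were chosen, assuming equal numbers of landslide and non-landslide samples, so that the accuracy matches the reported validation value of 82.8%.

```python
def cohen_kappa(tp, fp, fn, tn):
    """Cohen's Kappa from the four cells of a binary confusion matrix."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n  # observed agreement, i.e. overall accuracy
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical counts (assumption): 500 landslide and 500 non-landslide samples.
tp, fp, fn, tn = 414, 86, 86, 414
accuracy = (tp + tn) / (tp + fp + fn + tn)
kappa = cohen_kappa(tp, fp, fn, tn)
print(round(accuracy, 3), round(kappa, 3))  # prints: 0.828 0.656
```

Under this balanced-sample assumption the chance agreement is 0.5, so Kappa reduces to 2 × accuracy − 1, which is consistent with both rows of Table 2 (0.718 for 85.9% and 0.656 for 82.8%).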

The performance of the RFRBF model was validated using the ROC curve method (Figs. 7 and 8). The results of the analysis of the ROC curve show that the AUC values of the hybrid RFRBF model are very high for both the training dataset (0.929) and the validation dataset (0.891).
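As a rough illustration of what the AUC measures, the sketch below computes it as the fraction of (landslide, non-landslide) pairs in which the landslide sample receives the higher susceptibility score, with ties counted as one half. The function name and the scores are hypothetical and serve only to show the calculation; this is not the software used in the study.

```python
def auc_from_scores(pos_scores, neg_scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical susceptibility scores (assumption, for illustration only).
landslide = [0.9, 0.8, 0.4]
non_landslide = [0.7, 0.3, 0.2]
auc = auc_from_scores(landslide, non_landslide)
print(round(auc, 3))
```

An AUC of 1 would mean every landslide sample outranks every non-landslide sample, while 0.5 corresponds to random ranking.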

Fig. 7 Analysis of the ROC curve of the RFRBF model using the training dataset

Fig. 8 Analysis of the ROC curve of the RFRBF model using the validation dataset

3.4.3. Generating landslide susceptibility map

The landslide susceptibility map was developed using the RFRBF model (Fig. 9). In this study, the Landslide Susceptibility Indexes (LSI) were reclassified into five relative susceptibility classes (very low, low, moderate, high, and very high) using the geometrical interval method (Pham et al., 2016f). Predicted susceptibility zones and landslide density (Eq. 7) were calculated from the raster cells of the landslide susceptibility map to evaluate how well the zonation coincides with existing landslides (Fig. 9 and Table 3).

\[
\text{Landslide density} = \frac{LP / SLP}{CP / SCP} \tag{7}
\]

where LP is the number of landslide pixels in each class, SLP is the total number of landslide pixels, CP is the number of pixels in each class, and SCP is the total number of class pixels (Pham et al., 2016f).
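The landslide density calculation can be checked with a short sketch that applies Eq. 7 to the pixel counts reported in Table 3. The variable and function names are illustrative choices, not part of the original workflow.

```python
# Pixel counts per susceptibility class, as reported in Table 3.
class_pixels = {
    "very low": 727556,
    "low": 418185,
    "moderate": 656077,
    "high": 685596,
    "very high": 817104,
}
landslide_pixels = {
    "very low": 17,
    "low": 18,
    "moderate": 31,
    "high": 62,
    "very high": 802,
}


def landslide_density(lp, slp, cp, scp):
    """Eq. 7: share of landslide pixels divided by share of class pixels."""
    return (lp / slp) / (cp / scp)


slp = sum(landslide_pixels.values())  # total landslide pixels
scp = sum(class_pixels.values())      # total class pixels
density = {
    name: round(landslide_density(landslide_pixels[name], slp, class_pixels[name], scp), 2)
    for name in class_pixels
}
print(density)
```

A density above 1 means a class holds a larger share of the landslide pixels than of the map area; only the very high class exceeds 1 here.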

Fig. 9 Landslide Susceptibility Map (LSM) using the RFRBF model in this study

Results of the landslide density analysis show that most landslide pixels were observed in the very high class, where the landslide density value is the highest (3.49). In addition, the spatial distribution of the susceptibility classes on the slope map was analyzed (Fig. 10). Results show that the high and very high classes were observed mostly for slope angles of 20 – 30 degrees (72.82%) and 30 – 40 degrees (58.55%).

Table 3 Landslide density values observed on landslide susceptibility map

No  Class      LSI intervals  Number of class pixels  Number of landslide pixels  % Class pixels  % Landslide pixels  Landslide density
1   Very low   0.031 - 0.109  727556                  17                          22.02           1.83                0.08
2   Low        0.109 - 0.140  418185                  18                          12.65           1.94                0.15
3   Moderate   0.140 - 0.219  656077                  31                          19.85           3.33                0.17
4   High       0.219 - 0.421  685596                  62                          20.75           6.67                0.32
5   Very high  0.421 - 0.940  817104                  802                         24.73           86.24               3.49
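The reclassification of LSI values into the five classes of Table 3 can be sketched as a simple lookup on the class breaks. The break values are taken from the LSI intervals in Table 3; assigning a value that falls exactly on a break to the upper class is an assumption made here, since the table does not state how boundaries are handled.

```python
from bisect import bisect_right

# Upper breaks between classes, from the LSI intervals in Table 3.
BREAKS = [0.109, 0.140, 0.219, 0.421]
CLASSES = ["very low", "low", "moderate", "high", "very high"]


def susceptibility_class(lsi):
    """Map an LSI value to its susceptibility class.

    Assumption: a value equal to a break belongs to the upper class.
    """
    return CLASSES[bisect_right(BREAKS, lsi)]


for value in (0.05, 0.2, 0.9):
    print(value, susceptibility_class(value))
```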

Fig. 10 Analysis of the spatial distribution of landslide susceptibility classes on slope angle map

4. Comparison of the RFRBF model with other methods and techniques

The usability of the proposed model was assessed through a comparison with models derived from benchmark methods: Logistic Regression (LR), Multilayer Perceptron Neural Network (MLP Neural Nets), Naïve Bayes (NB), and the hybrid model of Rotation Forest and Decision Trees (RFDT).

LR, a popular method for landslide spatial prediction (Lee & Pradhan, 2007; van den Eeckhaut et al., 2006a), uses the logit function (the natural logarithm of the odds) to analyze the relation between the presence or absence of landslides (output classes) and a set of influencing parameters (input variables) (Ayalew & Yamagishi, 2005).

MLP Neural Nets have been applied widely for landslide susceptibility assessment (Ermini et al., 2005; Gomez & Kavzoglu, 2005). The MLP Neural Nets use a back-propagation algorithm, loosely modeled on the function of a human brain, to learn the knowledge stored in weighted connections (Pradhan & Lee, 2010). To build the MLP Neural Nets model, the structure of the model was determined using the method proposed in Tien Bui et al. (2016d, e). Accordingly, the logistic sigmoid was selected as the activation function, whereas the learning rate, momentum, and number of training iterations were set to 0.3, 0.2, and 500, respectively. As a result, a network with 14 inputs, one neuron in the hidden layer, and one output was used.

NB, which is a popular technique used in landslide modeling (Tien Bui et al., 2012a; Tsangaratos & Ilia, 2016a), was also selected for comparison. NB is based on Bayes' theorem and estimates the posterior probability of each class (landslide or non-landslide) given a set of influencing parameters in the study area.

RFDT is a hybrid machine learning ensemble method which uses Decision Trees (DT) as a base classifier. The DT, which is a popular classification tree method (Nefeslioglu et al., 2010), uses decision rules to predict the output classes (landslide and non-landslide) from a set of input variables (influencing parameters). The DT is often selected as the base classifier in the traditional Rotation Forest ensemble method because of its sensitivity to rotation of the feature axes (Ozcift & Gulten, 2011). In the present study, DT was also selected as a base classifier for comparison with the RBF Neural Nets base classifier within the Rotation Forest ensemble. In training the RFDT model, the number of folds for training the DT classifier and the number of iterations for training the RFDT model were set to 3 and 15, respectively, to get the best performance.
The prediction capabilities of the four benchmark models using the validation dataset are listed in Tables 4 and 5 and shown in Fig. 11. The results of the statistical analysis show that the RFRBF model has the highest accuracy (0.828), followed by the RFDT model (0.827), the LR model (0.826), the MLP Neural Nets (0.823), and the NB model (0.804). Similarly, the RFRBF model has the highest Kappa value (0.656), followed by the RFDT model (0.653), the LR model (0.652), the MLP Neural Nets (0.646), and the NB model (0.608). In contrast, the RFRBF model has the lowest RMSE value (0.362), followed by the RFDT model (0.363), the LR model (0.367), the MLP Neural Nets (0.368), and the NB model (0.392). The results of the ROC curve analysis show that the RFRBF model also has the highest AUC (0.891), followed by the RFDT model (0.888), the LR model (0.885), the MLP Neural Nets (0.874), and the NB model (0.870). Results of the Chi-square test show that the Chi-square values of the comparison pairs between the RFRBF model and the other models are higher than the standard value of 3.841, and the p values are smaller than the threshold value of 0.05 (Table 5).

Table 4 The performance of other benchmark landslide models

No  Statistical indexes  RFDT   LR     MLP Neural Nets  NB
1   Accuracy             0.827  0.826  0.823            0.804
2   Kappa                0.653  0.652  0.646            0.608
3   RMSE                 0.363  0.367  0.368            0.392

Fig. 11 The analysis of the ROC curve for the other benchmark landslide models using the validation dataset

Table 5 Chi-square test results

No  Comparative pairs          Chi-square value (χ²)  p value   Significance
1   RFRBF vs RFDT              449.782                < 0.0001  Yes
2   RFRBF vs LR                463.689                < 0.0001  Yes
3   RFRBF vs MLP Neural Nets   436.560                < 0.0001  Yes
4   RFRBF vs RBF Neural Nets   378.297                < 0.0001  Yes
5   RFRBF vs NB                399.532                < 0.0001  Yes
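The significance decision in Table 5 can be reproduced with a small sketch, assuming the test has one degree of freedom. That assumption is inferred from the standard value of 3.841, which is the 0.05 critical value of the Chi-square distribution with one degree of freedom; the degrees of freedom are not stated explicitly in the text. For one degree of freedom the upper-tail p value equals erfc(sqrt(χ²/2)), so only the Python standard library is needed.

```python
import math


def chi2_p_value_df1(chi2):
    """Upper-tail p value of a Chi-square statistic with one degree of freedom (assumed)."""
    return math.erfc(math.sqrt(chi2 / 2.0))


CRITICAL_VALUE = 3.841  # standard value quoted in the text (alpha = 0.05)

# Chi-square values as reported in Table 5.
reported = {
    "RFRBF vs RFDT": 449.782,
    "RFRBF vs LR": 463.689,
    "RFRBF vs MLP Neural Nets": 436.560,
    "RFRBF vs RBF Neural Nets": 378.297,
    "RFRBF vs NB": 399.532,
}

for pair, value in reported.items():
    p = chi2_p_value_df1(value)
    print(pair, value > CRITICAL_VALUE, p < 0.0001)
```

Every reported value lies far above the critical value, so each comparison is flagged as significant, in line with the last column of Table 5.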

5. Discussion

Recently, many algorithms have been proposed for landslide susceptibility modeling. In landslide studies, machine learning algorithms have been reported to be more accurate than conventional statistical techniques (Abedi et al., 2012). Moreover, ensemble methods are known as good machine learning techniques for combining statistical methods to achieve better landslide susceptibility assessment (Althuwaynee et al., 2014b; Jebur et al., 2015; Umar et al., 2014). In the present study, the Rotation Forest ensemble, which is known as one of the best ensemble techniques, was integrated with the RBF Neural Nets classifier to generate the hybrid RFRBF model for better landslide spatial prediction in part of the Himalayan area (India).

The results for the study area show that the RFRBF model has good predictive capability for landslide susceptibility assessment, better than that of the other benchmark landslide models (RFDT, LR, MLP Neural Nets, and NB), with an AUC of 0.891 compared with 0.870 to 0.888. The better predictive capability of the RFRBF model is attributed to its use of the Rotation Forest ensemble, which can improve the performance of weak individual classifiers (Pham et al., 2016f; Rodriguez et al., 2006). Moreover, the input data used for classification in the RFRBF model were optimized by the Rotation Forest ensemble, which improved the predictive capability of the hybrid model (Pham et al., 2016a). In addition, RBF Neural Nets, which was used as the classifier in the hybrid RFRBF model, also improved the predictive capability because it has several advantages over other methods: (i) it is easier to train than other neural networks such as MLP Neural Nets, (ii) it enhances the stability of the training process because it has a strong tolerance for input noise, and (iii) it responds well to missing patterns (Yu et al., 2011).

Hong et al. (2015) and Zare et al. (2013) compared RBF Neural Nets and MLP Neural Nets for landslide susceptibility assessment and stated that the MLP Neural Nets model outperforms the RBF Neural Nets model. However, the results of the present study show that the RFRBF model, which uses the RBF Neural Nets model as a base classifier in the ensemble framework, performs better than the MLP Neural Nets model. This finding means that the Rotation Forest ensemble in the hybrid framework has significantly improved the prediction capability of a single RBF Neural Nets classifier for landslide spatial prediction.

Even though the hybrid RFRBF model was applied successfully and efficiently for landslide spatial prediction in the present study, it should be noted that the model is only applicable to binary classification problems. In addition, to obtain the best performance from this model, the input parameters used in the modeling process, such as the number of clusters and the number of iterations, must be optimized.

The landslide susceptibility map was constructed using the hybrid model by reclassifying the LSI based on the geometrical interval method. It is generally assumed that the highest landslide frequencies occur in the very high and high susceptibility zones, and that low landslide frequencies occur in the very low and low susceptibility zones (Chauhan et al., 2010). Similar results were observed in the present study: the landslide susceptibility map of the study area indicates that the density of landslide pixels increases from the very low class to the very high class. Thus, the landslide susceptibility map produced with the hybrid machine learning ensemble approach of the RFRBF would be useful for landslide hazard prediction. In addition, the spatial distribution of susceptibility classes on the slope map indicates that slope angles of 20 – 40 degrees are more susceptible to landslide occurrence, which is consistent with the study carried out by Pham et al. (2016c).

6. Conclusions

In this study, a hybrid machine learning ensemble approach (RFRBF) based on RBF Neural Nets and the Rotation Forest ensemble was proposed with the aim of improving the overall performance of landslide susceptibility modeling, with a case study in the Himalayan area (India). The prediction capability of the hybrid model was validated using the ROC curve method, statistical analysis methods, and the Chi-square test. In addition, four popular models, namely RFDT, LR, MLP Neural Nets, and NB, were built so that their prediction accuracy could be compared with that of the RFRBF model. The results show that the proposed RFRBF model has high prediction capability for landslide susceptibility assessment. Compared with the other benchmark models, the proposed model performs better; therefore, the RFRBF model is a promising technique which can be used for landslide susceptibility mapping in other mountainous areas of the world prone to landslide hazards. It can also be concluded that the Rotation Forest ensemble is an efficient technique for improving the predictive capability of weak individual classifiers.


References

Abedi, M., Norouzi, G.-H., & Bahroudi, A. (2012). Support vector machine for multi-classification of mineral prospectivity areas. Computers & Geosciences, 46, 272-283.
Alkhasawneh, M., Sh Ngah, U.K., Tay, L.T., Isa, N.A.M., & Al-Batah, M.S. (2014). Modeling and testing landslide hazard using decision tree. Journal of Applied Mathematics, 2014.
Althuwaynee, O.F., Pradhan, B., Park, H.-J., & Lee, J.H. (2014a). A novel ensemble bivariate statistical evidential belief function with knowledge-based analytical hierarchy process and multivariate statistical logistic regression for landslide susceptibility mapping. Catena, 114(0), 21-36.
Althuwaynee, O.F., Pradhan, B., Park, H.-J., & Lee, J.H. (2014b). A novel ensemble decision tree-based CHi-squared Automatic Interaction Detection (CHAID) and multivariate logistic regression models in landslide susceptibility mapping. Landslides, 11(6), 1063-1078.
Anbalagan, R., Kumar, R., Lakshmanan, K., Parida, S., & Neethu, S. (2015). Landslide hazard zonation mapping using frequency ratio and fuzzy logic approach, a case study of Lachung Valley, Sikkim. Geoenvironmental Disasters, 2(1), 1.
Ayalew, L., & Yamagishi, H. (2005). The application of GIS-based logistic regression for landslide susceptibility mapping in the Kakuda-Yahiko Mountains, central Japan. Geomorphology, 65(1), 15-31.
Ayerdi, B., & Graña, M. (2014). Hybrid extreme rotation forest. Neural Networks, 52, 33-42.
Chauhan, S., Sharma, M., Arora, M., & Gupta, N. (2010). Landslide susceptibility zonation through ratings derived from artificial neural network. International Journal of Applied Earth Observation and Geoinformation, 12(5), 340-350.
Conoscenti, C., Angileri, S., Cappadonia, C., Rotigliano, E., Agnesi, V., & Märker, M. (2014). Gully erosion susceptibility assessment by means of GIS-based logistic regression: A case of Sicily (Italy). Geomorphology, 204(0), 399-411.
Dai, F., Lee, C., Li, J., & Xu, Z. (2001). Assessment of landslide susceptibility on the natural terrain of Lantau Island, Hong Kong. Environmental Geology, 40(3), 381-391.
Dai, F.C., Lee, C.F., & Ngai, Y.Y. (2002). Landslide risk assessment and management: an overview. Engineering Geology, 64, 65–87.
Dickson, M.E., & Perry, G.L. (2016). Identifying the controls on coastal cliff landslides using machine-learning approaches. Environmental Modelling & Software, 76, 117-127.
Du, K.-L., & Swamy, M. (2014). Radial basis function networks, Neural Networks and Statistical Learning (pp. 299-335). London: Springer.
Ermini, L., Catani, F., & Casagli, N. (2005). Artificial neural networks applied to landslide susceptibility assessment. Geomorphology, 66(1), 327-343.
Fuller, I.C., Riedler, R.A., Bell, R., Marden, M., & Glade, T. (2016). Landslide-driven erosion and slope–channel coupling in steep, forested terrain, Ruahine Ranges, New Zealand, 1946–2011. Catena, 142, 252-268.
Gariano, S.L., & Guzzetti, F. (2016). Landslides in a changing climate. Earth-Science Reviews, 162, 227-252.
Gokceoglu, C., Sonmez, H., Nefeslioglu, H.A., Duman, T.Y., & Can, T. (2005). The 17 March 2005 Kuzulu landslide (Sivas, Turkey) and landslide-susceptibility map of its near vicinity. Engineering Geology, 81(1), 65-83.
Gomez, H., & Kavzoglu, T. (2005). Assessment of shallow landslide susceptibility using artificial neural networks in Jabonosa River basin, Venezuela. Engineering Geology, 78(1), 11-27.


Günther, A., Reichenbach, P., Malet, J.-P., van den Eeckhaut, M., Hervás, J., Dashwood, C., & Guzzetti, F. (2013). Tier-based approaches for landslide susceptibility assessment in Europe. Landslides, 10(5), 529-546.
Haque, U., Blum, P., da Silva, P.F., Andersen, P., Pilz, J., Chalov, S.R., … Poyiadji, E. (2016). Fatal landslides in Europe. Landslides, 13(6), 1545-1554.
Hoang, N.-D., & Tien Bui, D. (2016). A novel relevance vector machine classifier with cuckoo search optimization for spatial prediction of landslides. Journal of Computing in Civil Engineering, 30(5), 1-10.
Hong, H., Xu, C., Revhaug, I., & Bui, D.T. (2015). Spatial prediction of landslide hazard at the Yihuang area (China): A comparative study on the predictive ability of backpropagation multi-layer perceptron neural networks and radial basic function neural networks. In R. S. Claudia, B. M. C. Carla & M. Ld. M. Paulo (Eds.), Cartography-Maps Connecting the World (pp. 175-188). Switzerland: Springer.
Ilia, I., Koumantakis, I., Rozos, D., Koukis, G., & Tsangaratos, P. (2015). A geographical information system (GIS) based probabilistic certainty factor approach in assessing landslide susceptibility: the case study of Kimi, Euboea, Greece. In G. Lollino, D. Glordan, G. Crosta, J. Coromimas, R. Azzam, J. Wasowski, & N. Sciarra (Eds.), Engineering Geology for Society and Territory-Volume 2 (pp. 1199-1204). Cham: Springer.
Ilia, I., & Tsangaratos, P. (2016). Applying weight of evidence method and sensitivity analysis to produce a landslide susceptibility map. Landslides, 13(2), 379-397.
Ilia, I., Tsangaratos, P., Koumantakis, I., & Rozos, D. (2010). Application of a Bayesian approach in GIS-based model for evaluating landslide susceptibility. Case study Kimi area, Euboea, Greece. Bulletin of the Geological Society of Greece, 3, 1590-1600.
Jebur, M.N., Pradhan, B., & Tehrany, M.S. (2015). Manifestation of LiDAR-derived parameters in the spatial prediction of landslides using novel ensemble evidential belief functions and support vector machine models in GIS. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 8(2), 674-690.
Kalsnes, B., Nadim, F., Hermanns, R., Hygen, H., Petkovic, G., Dolva, B., … Høgvold, D. (2016). Landslide risk management in Norway. In K. H. S. Lacasse & L. Picarelli (Eds.), Slope Safety Preparedness for Impact of Climate Change (pp. 215-252). Boca Raton, FL: CRC Press.
Kavzoglu, T., Sahin, E.K., & Colkesen, I. (2015). An assessment of multivariate and bivariate approaches in landslide susceptibility mapping: A case study of Duzkoy district. Natural Hazards, 76(1), 471-496.
Kawamura, H., Sasaki, T., & Otsuki, T. (1992). Spline interpolation method (Google Patents).
Kramer, S.L. (1996). Geotechnical earthquake engineering. Delhi: Pearson Education India.
Lee, S., & Pradhan, B. (2007). Landslide hazard mapping at Selangor, Malaysia using frequency ratio and logistic regression models. Landslides, 4(1), 33-41.
Lee, S., Ryu, J.-H., Won, J.-S., & Park, H.-J. (2004). Determination and application of the weights for landslide susceptibility mapping using an artificial neural network. Engineering Geology, 71(3), 289-302.
Lewis, D., & Burke, C.J. (1949). The use and misuse of the chi-square test. Psychological Bulletin, 46(6), 433.
Li, M., & Zhou, Z.-H. (2007). Improve computer-aided diagnosis with machine learning techniques using undiagnosed samples. IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, 37(6), 1088-1098.
Liu, K.-H., & Huang, D.-S. (2008). Cancer classification using rotation forest. Computers in Biology and Medicine, 38(5), 601-610.

Marjanović, M., Kovačević, M., Bajat, B., & Voženílek, V. (2011). Landslide susceptibility assessment using SVM machine learning algorithm. Engineering Geology, 123(3), 225-234.
Melchiorre, C., Matteucci, M., Azzoni, A., & Zanchi, A. (2008). Artificial neural networks and cluster analysis in landslide susceptibility zonation. Geomorphology, 94(3-4), 379-400.
Mogaji, K., Lim, H., & Abdullah, K. (2015). Regional prediction of groundwater potential mapping in a multifaceted geology terrain using GIS-based Dempster–Shafer model. Arabian Journal of Geosciences, 8(5), 3235-3258.
National Centers for Environmental Prediction (NCEP). (2014). Global weather data for SWAT. Retrieved from http://globalweather.tamu.edu/home.
Nefeslioglu, H., Sezer, E., Gokceoglu, C., Bozkir, A., & Duman, T. (2010). Assessment of landslide susceptibility by decision trees in the metropolitan area of Istanbul, Turkey. Mathematical Problems in Engineering, 2010.
Nefeslioglu, H.A., Gokceoglu, C., & Sonmez, H. (2008). An assessment on the use of logistic regression and artificial neural networks with different sampling strategies for the preparation of landslide susceptibility maps. Engineering Geology, 97(3-4), 171-191.
Oh, H.-J., & Pradhan, B. (2011). Application of a neuro-fuzzy model to landslide-susceptibility mapping for shallow landslides in a tropical hilly area. Computers & Geosciences, 37(9), 1264-1276.
Ohlmacher, G.C., & Davis, J.C. (2003a). Using multiple logistic regression and GIS technology to predict landslide hazard in northeast Kansas, USA. Engineering Geology, 69(3–4), 331-343.
Ohlmacher, G.C., & Davis, J.C. (2003b). Using multiple logistic regression and GIS technology to predict landslide hazard in northeast Kansas, USA. Engineering Geology, 69(3), 331-343.
Orr, M.J. (1996). Introduction to radial basis function networks (Technical Report). Center for Cognitive Science, University of Edinburgh.
Ozcift, A., & Gulten, A. (2011). Classifier ensemble construction with rotation forest to improve medical diagnosis performance of machine learning algorithms. Computer Methods and Programs in Biomedicine, 104(3), 443-451.
Ozdemir, A. (2011). Using a binary logistic regression method and GIS for evaluating and mapping the groundwater spring potential in the Sultan Mountains (Aksehir, Turkey). Journal of Hydrology, 405(1–2), 123-136.
Pham, B.T., Bui, D.T., Dholakia, M.B., Prakash, I., Pham, H.V., Mehmood, K., & Le, H.Q. (2016a). A novel ensemble classifier of rotation forest and Naïve Bayer for landslide susceptibility assessment at the Luc Yen district, Yen Bai Province (Viet Nam) using GIS. Geomatics, Natural Hazards and Risk, 1-23.
Pham, B.T., Bui, D.T., Prakash, I., & Dholakia, M. (2016b). Evaluation of predictive ability of support vector machines and naive Bayes trees methods for spatial prediction of landslides in Uttarakhand state (India) using GIS. Journal of Geomatics, 10(1), 71-79.
Pham, B.T., Pradhan, B., Tien Bui, D., Prakash, I., & Dholakia, M.B. (2016c). A comparative study of different machine learning methods for landslide susceptibility assessment: A case study of Uttarakhand area (India). Environmental Modelling & Software, 84, 240-250.
Pham, B.T., Tien Bui, D., Dholakia, M.B., Prakash, I., & Pham, H.V. (2016d). A comparative study of least square support vector machines and multiclass alternating decision trees for spatial prediction of rainfall-induced landslides in a tropical cyclones area. Geotechnical and Geological Engineering, 34(1), 1-18.
Pham, B.T., Tien Bui, D., Indra, P., & Dholakia, M.B. (2015a). Landslide susceptibility assessment at a part of Uttarakhand Himalaya, India using GIS–based statistical approach of frequency ratio method. International Journal of Engineering Research & Technology, 4(11), 338-344.

Pham, B.T., Tien Bui, D., Pourghasemi, H.R., Indra, P., & Dholakia, M.B. (2015b). Landslide susceptibility assesssment in the Uttarakhand area (India) using GIS: A comparison study of prediction capability of naïve bayes, multilayer perceptron neural networks, and functional trees methods. Theoretical and Applied Climatology, 122(3-4), 1-19.
Pham, B.T., Tien Bui, D., Prakash, I., & Dholakia, M.B. (2016e). Hybrid integration of multilayer perceptron neural networks and machine learning ensembles for landslide susceptibility assessment at Himalayan area (India) using GIS. Catena, 149(1), 52-63.
Pham, B.T., Tien Bui, D., Prakash, I., & Dholakia, M.B. (2016f). Rotation forest fuzzy rule-based classifier ensemble for spatial prediction of landslides using GIS. Natural Hazards, 83(1), 1-31.
Poudyal, C.P., Chang, C., Oh, H.-J., & Lee, S. (2010). Landslide susceptibility maps comparing frequency ratio and artificial neural networks: a case study from the Nepal Himalaya. Environmental Earth Sciences, 61(5), 1049-1064.
Pourghasemi, H.R., Jirandeh, A.G., Pradhan, B., Xu, C., & Gokceoglu, C. (2013). Landslide susceptibility mapping using support vector machine and GIS at the Golestan Province, Iran. Journal of Earth System Science, 2, 349–369.
Pourghasemi, H.R., Pradhan, B., & Gokceoglu, C. (2012). Application of fuzzy logic and analytical hierarchy process (AHP) to landslide susceptibility mapping at Haraz watershed, Iran. Natural Hazards, 63(2), 965-996.
Pradhan, B. (2011). Manifestation of an advanced fuzzy logic model coupled with geo-information techniques to landslide susceptibility mapping and their comparison with logistic regression modelling. Environmental and Ecological Statistics, 18(3), 471-493.
Pradhan, B., & Lee, S. (2010). Landslide susceptibility assessment and factor effect analysis: backpropagation artificial neural networks and their comparison with frequency ratio and bivariate logistic regression modelling. Environmental Modelling & Software, 25(6), 747-759.
Rodriguez, J.J., Kuncheva, L.I., & Alonso, C.J. (2006). Rotation Forest: A new classifier ensemble method. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10), 1619-1630.
Saito, H., Murakami, W., Daimaru, H., & Oguchi, T. (2017). Effect of forest clear-cutting on landslide occurrences: Analysis of rainfall thresholds at Mt. Ichifusa, Japan. Geomorphology, 276, 1-7.
Sarkar, S., & Kanungo, D. (2004). An integrated approach for landslide susceptibility mapping using remote sensing and GIS. Photogrammetric Engineering & Remote Sensing, 70(5), 617-625.
Shirzadi, A., Bui, D.T., Pham, B.T., Solaimani, K., Chapi, K., Kavian, A., … Revhaug, I. (2017). Shallow landslide susceptibility assessment using a novel hybrid intelligence approach. Environmental Earth Sciences, 76(2), 60.
Swets, J.A. (1988). Measuring the accuracy of diagnostic systems. Science, 240(4857), 1285-1293.
Terlien, M.T., van Westen, C.J., & van Asch, T.W. (1995). Deterministic modelling in GIS-based landslide hazard assessment. In A. Carrara & F. Guzzetti (Eds.), Geographical Information Systems in Assessing Natural Hazards (pp. 57-77). Netherlands: Springer.
Tien Bui, D., Anh Tuan, T., Hoang, N.-D., Quoc Thanh, N., Nguyen, B.D., Van Liem, N., & Pradhan, B. (2016a). Spatial prediction of rainfall-induced landslides for the Lao Cai area (Vietnam) using a novel hybrid intelligent approach of least squares support vector machines inference model and artificial bee colony optimization. Landslides, 14, 447-458.
Tien Bui, D., Ho, T.-C., Pradhan, B., Pham, B.-T., Nhu, V.-H., & Revhaug, I. (2016b). GIS-based modeling of rainfall-induced landslides using data mining based functional trees classifier with AdaBoost, bagging, and multiBoost ensemble frameworks. Environmental Earth Sciences, 75(14), 1101.
Tien Bui, D., Pham, B.T., Nguyen, Q.P., & Hoang, N.-D. (2016c). Spatial prediction of rainfall-induced shallow landslides using hybrid integration approach of least-squares support vector machines and differential evolution optimization: a case study in central Vietnam. International Journal of Digital Earth, 9(11), 1-21.
Tien Bui, D., Pradhan, B., Lofman, O., & Revhaug, I. (2012a). Landslide susceptibility assessment in Vietnam using support vector machines, decision tree, and naive Bayes models. Mathematical Problems in Engineering, 2012, 1-26.
Tien Bui, D., Pradhan, B., Lofman, O., Revhaug, I., & Dick, O.B. (2012b). Application of support vector machines in landslide susceptibility assessment for the Hoa Binh province (Vietnam) with kernel functions analysis. International Environmental Modelling and Software Society (iEMSs), 226.
Tien Bui, D., Pradhan, B., Nampak, H., Quang Bui, T., Tran, Q.-A., & Nguyen, Q.P. (2016d). Hybrid artificial intelligence approach based on neural fuzzy inference model and metaheuristic optimization for flood susceptibility modelling in a high-frequency tropical cyclone area using GIS. Journal of Hydrology, 540, 317-330.
Tien Bui, D., Tuan, T.A., Klempe, H., Pradhan, B., & Revhaug, I. (2016e). Spatial prediction models for shallow landslide hazards: A comparative assessment of the efficacy of support vector machines, artificial neural networks, kernel logistic regression, and logistic model tree. Landslides, 13, 361-378.
Tsangaratos, P., & Benardos, A. (2014). Estimating landslide susceptibility through a artificial neural network classifier. Natural Hazards, 74(3), 1489-1516.
Tsangaratos, P., & Ilia, I. (2015). Landslide susceptibility mapping using a modified decision tree classifier in the Xanthi Perfection, Greece. Landslides, 1-16.
Tsangaratos, P., & Ilia, I. (2016a). Comparison of a logistic regression and Naïve Bayes classifier in landslide susceptibility assessments: The influence of models complexity and training dataset size. Catena, 145, 164-179.
Tsangaratos, P., & Ilia, I. (2016b). Landslide susceptibility mapping using a modified decision tree classifier in the Xanthi Perfection, Greece. Landslides, 13(2), 305-320.
Tsangaratos, P., Ilia, I., Hong, H., Chen, W., & Xu, C. (2016c). Applying information theory and GIS-based quantitative methods to produce landslide susceptibility maps in Nancheng County, China. Landslides, 1-21.
Umar, Z., Pradhan, B., Ahmad, A., Jebur, M.N., & Tehrany, M.S. (2014). Earthquake induced landslide susceptibility mapping using an integrated ensemble frequency ratio and logistic regression models in West Sumatera Province, Indonesia. Catena, 118, 124-135.
van den Eeckhaut, M., Vanwalleghem, T., Poesen, J., Govers, G., Verstraeten, G., & Vandekerckhove, L. (2006a). Prediction of landslide susceptibility using rare events logistic regression: a case-study in the Flemish Ardennes (Belgium). Geomorphology, 76(3), 392-410.
van den Eeckhaut, M., Vanwalleghem, T., Poesen, J., Govers, G., Verstraeten, G., & Vandekerckhove, L. (2006b). Prediction of landslide susceptibility using rare events logistic regression: A case-study in the Flemish Ardennes (Belgium). Geomorphology, 76(3-4), 392-410.
Varnes, D.J. (1978). Slope movement types and processes. In R. L. Schuster and R. J. Krizek (Eds.), Landslides - Analysis and control, National Academy of Sciences Transportation Research Board Special Report No, 176, 12-23.
Varnes, D.J. (1984). Landslide Hazard Zonation: A Review of Principles and Practice, UNESCO, Paris.
Vasu, N.N., & Lee, S.-R. (2016). A hybrid feature selection algorithm integrating an extreme learning machine for landslide susceptibility modeling of Mt. Woomyeon, South Korea. Geomorphology, 263, 50-70.
Wang, H.B., Liu, G.J., Xu, W.Y., & Wang, G.H. (2005). GIS-based landslide hazard assessment: an overview. Progress in Physical Geography, 29(4), 548-567.

Wang, L.J., Guo, M., Sawada, K., Lin, J., & Zhang, J. (2015). Landslide susceptibility mapping in Mizunami City, Japan: A comparison between logistic regression, bivariate statistical analysis and multivariate adaptive regression spline models. Catena, 135, 271-282.
Wilson, E.B., & Hilferty, M.M. (1931). The distribution of chi-square. Proceedings of the National Academy of Sciences, 17(12), 684-688.
Yalcin, A., Reis, S., Aydinoglu, A.C., & Yomralioglu, T. (2011). A GIS-based comparative study of frequency ratio, analytical hierarchy process, bivariate statistics and logistics regression methods for landslide susceptibility mapping in Trabzon, NE Turkey. Catena, 85(3), 274-287.
Yeon, Y.-K., Han, J.-G., & Ryu, K.H. (2010). Landslide susceptibility mapping in Injae, Korea, using a decision tree. Engineering Geology, 116(3-4), 274-283.
Yesilnacar, E., & Topal, T. (2005). Landslide susceptibility mapping: A comparison of logistic regression and neural networks methods in a medium scale study, Hendek region (Turkey). Engineering Geology, 79(3), 251-266.
Yilmaz, I. (2010). Comparison of landslide susceptibility mapping methodologies for Koyulhisar, Turkey: Conditional probability, logistic regression, artificial neural networks, and support vector machine. Environmental Earth Sciences, 61(4), 821-836.
Youssef, A.M., Pourghasemi, H.R., Pourtaghi, Z.S., & Al-Katheeri, M.M. (2016). Landslide susceptibility mapping using random forest, boosted regression tree, classification and regression tree, and general linear models and comparison of their performance at Wadi Tayyah basin, Asir Region, Saudi Arabia. Landslides, 13, 839-856.
Yu, H., Xie, T., Paszczynski, S., & Wilamowski, B.M. (2011). Advantages of radial basis function networks for dynamic system design. IEEE Transactions on Industrial Electronics, 58(12), 5438-5450.
Zare, M., Pourghasemi, H.R., Vafakhah, M., & Pradhan, B. (2013). Landslide susceptibility mapping at Vaz Watershed (Iran) using an artificial neural network model: A comparison between multilayer perceptron (MLP) and radial basic function (RBF) algorithms. Arabian Journal of Geosciences, 6(8), 2873-2888.
Zhang, C.X., & Zhang, J.S. (2008). RotBoost: A technique for combining rotation forest and AdaBoost. Pattern Recognition Letters, 29(10), 1524-1536.
