Advanced Engineering Informatics 42 (2019) 100978
Advanced Engineering Informatics journal homepage: www.elsevier.com/locate/aei
Full length article
Spatial prediction of shallow landslide using Bat algorithm optimized machine learning approach: A case study in Lang Son Province, Vietnam
Dieu Tien Bui a,b, Nhat-Duc Hoang c,⁎, Hieu Nguyen d, Xuan-Linh Tran e
a Geographic Information Science Research Group, Ton Duc Thang University, Ho Chi Minh City, Viet Nam
b Faculty of Environment and Labour Safety, Ton Duc Thang University, Ho Chi Minh City, Viet Nam
c Faculty of Civil Engineering, Institute of Research and Development, Duy Tan University, P809 - 03 Quang Trung, Danang, Viet Nam
d Institute of Research and Development, Duy Tan University, P809 - 03 Quang Trung, Danang, Viet Nam
e Institute of Research and Development, Duy Tan University, P809 - 03 Quang Trung, Danang, Viet Nam
ARTICLE INFO
Keywords: Shallow landslide; Least Squares Support Vector Classification; Bat algorithm; GIS; Susceptibility map
ABSTRACT
This study develops a machine learning method that hybridizes the Least Squares Support Vector Classification (LSSVC) and Bat Algorithm (BA), named as BA-LSSVC, for spatial prediction of shallow landslide. To construct and verify the hybrid method, a Geographic Information System (GIS) database for the study area of Lang Son province (Vietnam) has been employed. LSSVC is used to separate data samples in the GIS database into two categories of non-landslide (negative class) and landslide (positive class). The BA metaheuristic is employed to assist the LSSVC model selection process by fine-tuning its hyper-parameters: the regularization coefficient and the kernel function parameter. Experimental results point out that the hybrid BA-LSSVC can help to achieve a desired prediction with an accuracy rate of more than 90%. The performance of BA-LSSVC is also better than those of benchmark methods, including the Convolutional Neural Network, Relevance Vector Machine, Artificial Neural Network, and Logistic Regression. Hence, the newly developed model is a capable tool to assist local authority in landslide hazard mitigation and management.
⁎ Corresponding author.
E-mail addresses: [email protected] (D. Tien Bui), [email protected] (N.-D. Hoang), [email protected] (H. Nguyen), [email protected] (X.-L. Tran).
https://doi.org/10.1016/j.aei.2019.100978
Received 21 March 2019; Received in revised form 1 July 2019; Accepted 12 August 2019
1474-0346/ © 2019 Elsevier Ltd. All rights reserved.

1. Introduction

A landslide, also called a landslip, is the movement of a mass of rock, debris or earth down a slope under the effect of gravity. Landslides happen when the gravitational and shear stresses within a slope exceed the shear strength of the slope-forming materials. Many factors can trigger landslides, including earthquakes, rainfall, increased hydrostatic pressure in cracks and fractures, mining, deforestation, cultivation and construction. As Vietnam has a monsoon-influenced tropical climate and experiences almost no earthquakes, rainfall is recognized as the main cause of landslides there. According to a recent report, thousands of people worldwide die each year due to 105 landslides [76]. The economic damage of landslides is also substantial and has been widely reported in previous studies [83]. In Vietnam, as reported by the Vietnam Meteorological and Hydrological Administration [80], from 2000 to 2018 there were 300 flash floods and landslides, which caused 943 casualties and economic damage of tens of thousands of billions of Vietnamese Dong (billions of dollars). It is therefore very important to understand the causes and mechanisms of landslides and, especially, to be able to predict and mitigate their future occurrence.

Landslides have been studied academically by geologists since as early as the beginning of the nineteenth century. Nowadays, apart from works reporting or investigating historical events, there are studies on the causes and behavior of landslides and a growing number of studies on landslide prediction and prevention. One of the most important research topics in the last category is landslide susceptibility mapping [33,90]. According to Brabb [9], landslide susceptibility is the likelihood of a landslide occurring in an area on the basis of local terrain conditions. Mathematically, it can be defined as the probability of spatial occurrence of slope failures, given a set of geo-environmental conditions [29]. A landslide susceptibility map is very important for preventing catastrophic losses and developing sustainable land use, because it identifies risky areas that can be affected by future landslides. Based on such a map, landslide mitigation and prevention can be achieved by building landslide defense structures or by moving local communities and industrial facilities from endangered areas to safer ones. As a matter of fact, developing a susceptibility map is considered the most important step in landslide hazard management [14].

In addition, over the last thirty years GIS, a computer-assisted system for the capture, storage, retrieval, analysis and display of spatial data, has become an integral part of many data-driven landslide susceptibility models and other natural hazard mapping models, as reported in previous works [38,66] and in the review by Chacón et al. [12]. In these models, a historical landslide inventory and a set of landslide conditioning factors are combined to establish the GIS database. Statistically-based models (classical statistical or modern machine learning models) are then employed to build susceptibility maps.

Many statistically-based models have been successfully developed for landslide susceptibility mapping. According to the review by Reichenbach et al. [55], among 565 articles published from January 1983 to June 2016 on statistically-based landslide susceptibility models, classical statistical models account for nearly 80% of all occurrences, notably logistic regression (LR) (18.5%), data overlay (10.7%), multi-criteria decision evaluation (MCDE) (10.6%), and index-based models such as weight-of-evidence (8.2%). Other classical statistical models include step-wise weight ratio assessment (SWARA) [20] and logistic regression [2,27]. Machine learning models were not introduced for landslide susceptibility until the early 2000s, but they have gained more and more attention due to their superior capability in handling multiple variables and nonlinearity. Various machine learning approaches, including artificial neural networks [53,82], neuro-fuzzy models [13,54], support vector machines [1,40,41,71], fuzzy instance-based learning [67], decision trees [4,32], and Bayes' net [15], have been shown to be capable alternatives for constructing landslide susceptibility maps at a regional scale.
However, investigating alternative methods remains important because of the complex nature of the landslide phenomenon, which involves many governing factors, and because of the unique characteristics of each study area: a model that is suitable for one region may not be suitable for another. Evidence of this is clearly demonstrated in previous studies [50,81]. Recently, Zhou et al. [91] carried out a comparative study in which the support vector machine (SVM), neural network, and logistic regression were applied to landslide susceptibility modeling; the study revealed that SVM delivers the most accurate predictive outcome. Kadavi et al. [39] constructed landslide susceptibility maps using various ensemble-based machine learning models, including the AdaBoost, LogitBoost, Multiclass Classifier, and Bagging models, for a case study in South Korea. A metaheuristic-optimized adaptive neuro-fuzzy inference system with grey wolf optimizer and biogeography-based optimization algorithms was proposed in [36] for spatial prediction of landslide susceptibility. Recent works on landslide spatial modeling show an increasing trend of applying machine learning methods to the problem of interest [6,33].

Support Vector Machine (SVM) is a powerful machine learning method for spatial modeling of natural hazards, including floods [56,64], forest fires [43], and landslides [33,40,49]. One reason for the successful application of SVM in modeling remote sensing and GIS data is that the method is based on the principle of structural risk minimization [79]; this means that SVM is less prone to overfitting than other approaches such as neural networks [8]. Other advantages that make SVM suitable for spatial prediction of shallow landslides include a high capability for nonlinear data modeling and resilience to noise [25].
Nevertheless, the model construction phase of SVM requires solving a quadratic programming problem, which can be difficult for large GIS data sets. To resolve this issue, Suykens et al. [61] put forward an improved version of SVM called the Least Squares Support Vector Machine (LSSVM). The LSSVM used for pattern classification tasks is also called Least Squares Support Vector Classification (LSSVC). The LSSVM inherits the advantages of the SVM. Moreover, its model establishment phase only requires solving a linear system instead of the aforementioned quadratic programming problem necessitated by the SVM. The capability of LSSVM has been demonstrated in various successful applications [16,26,84].

In addition, training an LSSVM model for data classification requires a proper selection of its hyper-parameters: the regularization constant and the kernel function parameter [21,71]. The task of selecting them is called the model selection problem [8]. This task is important because the hyper-parameters can greatly affect the learning process and strongly influence the model's prediction capability. If they are not set properly, LSSVC-based models may be inaccurate due to either overfitting or underfitting [62,78]. Selecting these hyper-parameters is by no means easy because they must be searched in continuous domains, so there are infinitely many combinations of them. An exhaustive search for the most suitable values of the regularization constant and the kernel function parameter is therefore infeasible. Thus, metaheuristic approaches can be employed to optimize the LSSVC-based shallow landslide prediction model by identifying an appropriate set of model hyper-parameters.

In this paper, the authors focus on developing a hybrid machine learning model based on the LSSVC and the Bat Algorithm (BA) to build a shallow landslide susceptibility map for the study area of Lang Son province, Vietnam. In LSSVC, nonlinear data is mapped by kernel functions into a high-dimensional feature space in which it becomes linearly separable [61]. BA, described in [89], is a nature-inspired metaheuristic for coping with complex global optimization problems [58,65,75,85].
However, the capability of this metaheuristic for optimizing LSSVC-based shallow landslide models has not yet been investigated. Hence, the current study is an attempt to fill this gap in the literature. Another contribution of the proposed model is that, since the model selection phase of the LSSVC is carried out by the BA, the constructed shallow landslide prediction model can be employed and adapted automatically without human effort and intervention.

The rest of the paper is organized as follows. The next section presents general information on the study area, Lang Son province (Vietnam). The third section reviews the computational methods employed for model construction. The fourth section describes the proposed BA-optimized LSSVC used for landslide susceptibility mapping, and the fifth section reports the experimental results. The last section summarizes the study with several concluding remarks.

2. The study area and the landslide inventory

2.1. General description of the study area

The study area, Lang Son city, is located between longitudes 106°41′34″ E and 106°48′32″ E and between latitudes 21°49′43″ N and 21°57′13″ N. It covers about 101.3 km² in Lang Son province in northeastern Vietnam (see Fig. 1). This area was selected as the case study for several reasons. First, the province has complex geographical conditions; mountains and forests occupy 80% of its area. Second, the capital of the province (also named Lang Son) has two international border crossings; thus, it is a strategically important town in terms of trade and culture. Third, due to rapid economic development in the study area during the last two decades, the expansion of infrastructure and settlements into the mountainous regions has disturbed the natural geographical conditions.
This fact, coupled with severe deforestation, has led to a potential increase in landslide occurrences in recent years [69]. Moreover, studying the factors governing landslides in this area can help local authorities generalize the landslide circumstances to neighboring provinces in northern Vietnam that have similar geographic and climatic conditions.

The average elevation of the study area is 325.6 m above standard sea level, while the lowest and highest elevations are 214 m and 800 m, respectively. The annual average temperature in the area varies from 17 °C to 22 °C. July and January are the months with the highest and lowest average monthly temperatures of 27.5 °C and 12.5 °C, respectively. In addition, the climate is very humid, with an annual average relative humidity between 80% and 85%. The humidity usually reaches its peak in August and its lowest value in December. Precipitation in the area is affected by monsoon winds (seasonally reversing winds) [10]. It is most abundant in the rainy season from May to September, and the average annual rainfall varies from 1200 mm to 1600 mm. Historical records indicate that landslides are common in the area and that heavy rainfall is their main cause.

Fig. 1. Location of the study area and landslide inventory.

2.2. Landslide inventory map

In order to develop a landslide susceptibility map for the study area, a landslide inventory map is constructed and landslide conditioning factors are collected. The landslide inventory map is built from information on past landslide events. The selected landslide conditioning factors, which are known for their influence on the likelihood of landslide occurrence, cover a variety of aspects including geomorphological, geological, hydrological, and climatic factors. In this study, we are only interested in shallow soil slides and debris flows triggered by rainfall because no earthquake-induced landslides have been documented in the area. In addition, a few rock fall events were removed from the inventory due to their rarity, insignificant damage and different mechanism.

Three different sources are used to build the landslide inventory map of the area for the last 15 years. Aerial photos with a spatial resolution of about 1 m, obtained from the previous work of Tien Bui et al. [70], and field survey data are used to determine the locations of landslides that occurred before 2003. Information on landslides in 2006 and 2009 is extracted from [63,74]. Finally, Nguyen et al. [47] provide the locations of some recent landslides. In total, 101 landslide locations covering 3455 pixels are identified and registered in the inventory map (see Fig. 1). They are divided into two groups: the first group, containing 69 locations (2410 pixels), is employed for model training and the second group, with 32 locations (1045 pixels), is used for model verification. In other words, approximately 70% of the data serves as the training set and 30% as the verification set. In order to have a complete data set, data samples belonging to non-landslide locations are randomly sampled from the GIS database.

2.3. Landslide conditioning factors

Landslide prediction models are based on the assumption that the factors that triggered landslides do not change over time [31]. Therefore, choosing the landslide conditioning factors is very important in developing an accurate landslide susceptibility map [37]. In this study, based on a review of previous works on landslide spatial modeling [11,60,69,71] and on the data sources available for the study area, we consider fourteen landslide conditioning factors: slope angle, slope length, slope aspect, curvature, elevation, topographic wetness index (TWI), stream power index (SPI), sediment transport index (STI), valley depth, toposhade, lithology, land use, soil type, and distance to faults. Herein, the first ten factors belong to the geo-morphometrical group while the last four belong to the geo-environmental group.

In order to obtain the geo-morphometrical factors, a digital elevation model (DEM) was first generated using the National Topographic Maps at a scale of 1:5000 for Lang Son city. The spatial resolution of the DEM is 5 m × 5 m. A raster resolution of 20 m was used for all the conditioning factors in the landslide modeling process. From the constructed DEM, all the geo-morphometrical factors were extracted using the software package ArcGIS 10.2. Continuous values of these factors (except slope aspect) were reclassified into classes using the Jenks Natural Break optimization method [48] available in ArcGIS 10.2, as suggested and explained by Hung et al. [34]. Fig. 2 shows all fourteen conditioning factors and their categories.

Fig. 2. Landslide conditioning factors used in the study area: (a) Slope; (b) Slope length; (c) Aspect; (d) Curvature; (e) Elevation; (f) TWI; (g) SPI; (h) STI; (i) Valley depth; (j) Topo-shape; (k) Land use (RA: Residential Area; PTL: Protective Forest Land; PDL: Productive Forest Land; PL: Paddy Land; BL: Barren Land; PCL: Perennial Crop Land; WSL: Water Surface Land); (l) Soil type (FA: Ferralic Acrisols; DG: Dystric Gleysols; PA: Plinthic Acrisols; WS: Water Surface; DF: Dystric Fluvisols; EF: Eutric Fluvisols; RF: Rhodic Ferralsols; RMS: Rocky Mountain Surface); (m) Lithology; (n) Distance to faults.
3. The employed computational methods

3.1. Least Squares Support Vector Classification (LSSVC)

Proposed by Suykens and Vandewalle [62], LSSVC is an effective machine learning method for pattern recognition. It can be viewed as a least squares version of the standard Support Vector Machine (SVM). Similar to the standard SVM, the LSSVC relies on a nonlinear mapping function $\varphi(x_k)$ to cope with nonlinear classification tasks. Here, the original input space consists of the 14 shallow landslide conditioning factors. The data in the original input space is first mapped to a high-dimensional feature space in which a linear decision surface can be constructed. Notably, only the dot product of nonlinear mapping functions needs to be computed during the model training and prediction phases. This dot product is the kernel function $K(\cdot,\cdot)$ [28]. The LSSVC algorithm adapts the optimal separating hyperplane, or decision boundary, by adjusting the parameters of its normal vector [61]. The data classification process of the LSSVC is illustrated in Fig. 3; herein, the method is employed to generalize a decision boundary that separates the input space into two distinct domains: "landslide" and "non-landslide".

Fig. 3. The LSSVC based data classification process. (Panels: the original input space (X1, X2) with the nonlinear decision boundary, and the high-dimensional feature space reached by the kernel mapping Φ(x) with the LSSVC based decision boundary; classes: landslide and non-landslide.)

Important advantages of the LSSVC include its good generalization capability and fast computation. The learning phase of LSSVC is based on the principle of Structural Risk Minimization [79], which balances the model's complexity and data fitting accuracy. Moreover, the model construction phase can be converted to solving a system of linear equations instead of the quadratic programming problem necessitated by the SVM. Good performance of the LSSVC has accordingly been reported in geoscience and other engineering disciplines [59,68,71,92,93]. The LSSVM learning phase can be stated as the following constrained optimization problem [61]:

Minimize $J_p(w, e) = \frac{1}{2} w^T w + \gamma \frac{1}{2} \sum_{k=1}^{N} e_k^2$    (1)

Subject to $y_k (w^T \varphi(x_k) + b) = 1 - e_k, \quad k = 1, \ldots, N$    (2)

where $w \in R^n$ denotes the normal vector to the classification hyperplane; $b \in R$ is the bias; $e_k \in R$ represents an error variable; and $\gamma > 0$ denotes a regularization constant. In order to solve (1) and (2), we construct the following Lagrangian:

$L(w, b, e; \alpha) = J_p(w, e) - \sum_{k=1}^{N} \alpha_k \{ y_k (w^T \varphi(x_k) + b) - 1 + e_k \}$    (3)

where $\alpha_k$ represents a Lagrange multiplier and $\varphi(x_k)$ denotes a nonlinear mapping function. By employing the KKT conditions, the conditions for optimality are stated as follows:

$\partial L / \partial w = 0 \;\rightarrow\; w = \sum_{k=1}^{N} \alpha_k y_k \varphi(x_k)$
$\partial L / \partial b = 0 \;\rightarrow\; \sum_{k=1}^{N} \alpha_k y_k = 0$
$\partial L / \partial e_k = 0 \;\rightarrow\; \alpha_k = \gamma e_k, \quad k = 1, \ldots, N$
$\partial L / \partial \alpha_k = 0 \;\rightarrow\; y_k (w^T \varphi(x_k) + b) - 1 + e_k = 0, \quad k = 1, \ldots, N$    (4)

By eliminating $e$ and $w$, the following linear system can be attained:

$\begin{bmatrix} 0 & y^T \\ y & \Omega + \gamma^{-1} I \end{bmatrix} \begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ 1_v \end{bmatrix}$    (5)

where $y = [y_1; \ldots; y_N]$, $1_v = [1; \ldots; 1]$, and $\alpha = [\alpha_1; \ldots; \alpha_N]$. In addition, the kernel function is applied as follows:

$\Omega_{kl} = y_k y_l \varphi(x_k)^T \varphi(x_l) = y_k y_l K(x_k, x_l)$    (6)

Finally, the LSSVC model can be described compactly as follows:

$y(x) = \mathrm{sign}\left( \sum_{k=1}^{N} \alpha_k y_k K(x, x_k) + b \right)$    (7)

where $\alpha_k$ and $b$ are the solution to the aforementioned linear system. Notably, the kernel function that is often utilized is the Radial Basis Function (RBF) kernel [30]:

$K(x_k, x_l) = \exp\left( -\frac{\| x_k - x_l \|^2}{2 \sigma^2} \right)$    (8)

where $\sigma$ is the kernel function parameter.

3.2. Bat algorithm (BA)

As shown in the model description of the LSSVC, two tuning parameters $(\gamma, \sigma)$ must be determined properly. The regularization parameter $\gamma$ affects the penalty imposed on the model for prediction errors, and the kernel function parameter $\sigma$ influences the smoothness of the classification boundary. In order to determine the pair of parameters $(\gamma, \sigma)$ of LSSVC, the Bat algorithm is utilized in this study. The algorithm was first introduced in [86]; see also [57,88,89]. It is based on the echolocation behavior of microbats, with varying pulse emission rates and loudness. The Bat algorithm has been successfully applied to various real-life problems [18,22,42,57].

The main idea of the Bat algorithm can be summarized as follows. In order to minimize the cost function $F(x)$, a colony of $N$ virtual bats is considered. The solution is searched for at the locations $x_i$ of these bats. During the run of the algorithm, these locations change constantly as the bats fly around randomly with velocity $v_i$. While flying, the bats emit short sound pulses and "listen" to the echoes that bounce back to "see" their three-dimensional surroundings and detect prey and obstacles. As they search for and find their prey, the bats change the frequency $f_i$, loudness $A_i$ and emission rate $r_i$ of their sound pulses. The search can be intensified by a local random walk. The algorithm stops when certain criteria on the number of iterations or on the reduction of the cost function from the previous iteration are met. The details of the algorithm are given in Fig. 4.

Fig. 4. Pseudo code of the Bat algorithm.

More specifically, the new location, velocity, and sound pulse frequency of the ith bat at time step t are updated as follows.
$f_i = f_{min} + (f_{max} - f_{min}) \beta$    (9)

where $f_{min}$ and $f_{max}$ are the lower and upper bounds of the frequency, respectively, and $\beta \in [0, 1]$ is a random number drawn from a uniform distribution.

$v_i^t = v_i^{t-1} + (x_i^{t-1} - x_*) f_i$    (10)

$x_i^t = x_i^{t-1} + v_i^t$    (11)

where $x_*$ is the current best global solution. This is the exploration phase. Once a solution is selected among the current best ones, a local search is started for the exploitation phase. A new solution for each bat is generated locally using a local random walk:

$x_{new} = x_{old} + \varepsilon A^t$    (12)

where $\varepsilon$ is a random number and $A^t$ is the average loudness of all the bats at time step t. As the loudness usually decreases and the rate of pulse emission usually increases when a bat approaches its prey, they are updated from iteration to iteration as follows:

$A_i^t = \alpha A_i^{t-1}, \quad r_i^t = r_i^0 (1 - \exp(-\gamma t))$    (13)

where $\alpha$ and $\gamma$ are constants of the Bat algorithm (not to be confused with the Lagrange multipliers and the regularization constant of Section 3.1).

4. The proposed Bat algorithm optimized Least Squares Support Vector Classification for spatial prediction of shallow landslide

Since LSSVC is a supervised machine learning method, a GIS database must be established before a model for shallow landslide prediction can be constructed. The GIS database created for the study is shown in Fig. 5. The locations of past shallow landslides and the landslide influencing factors (slope angle, slope length, slope aspect, curvature, elevation, topographic wetness index, stream power index, sediment transport index, valley depth, toposhade, lithology, land use, soil type, and distance to faults) are processed and incorporated into a unified database. The data used in this study has been acquired and processed with the ArcGIS and IDRISI Selva software packages. Based on these software packages and the newly constructed hybrid machine learning tool, the susceptibility indices of all the pixels of the map of the study area can be calculated.

Fig. 5. The established GIS database.

Moreover, for establishing the shallow landslide prediction model, data on 6908 pixels within the map of the study area has been collected. Within this dataset, 3453 pixels are labeled with shallow landslide occurrences. As mentioned earlier, since LSSVC is a supervised learning approach, each pixel within the map is labeled as either "non-landslide" (negative class) or "landslide" (positive class). Additionally, the original data set has been separated into two subsets: a training set (70%) used for model training and a testing set (30%) used for model verification. To facilitate the landslide modeling process, all influencing factors have been normalized and converted from categorical classes into continuous values within the range of 0.01 to 0.99 by the frequency ratio method [10].

Based on the created data set, the model that combines the BA metaheuristic and the LSSVC machine learning model can be trained and used for shallow landslide susceptibility mapping. LSSVC is employed to establish a decision boundary that separates data instances associated with landslide occurrence from data instances labeled as non-landslide. In addition, because the LSSVC training phase needs a proper selection of the regularization coefficient and the kernel function parameter, the BA metaheuristic is used to determine these hyper-parameters in a data-driven manner. The operational flow of the hybrid model, named BA-LSSVC, is presented in Fig. 6. The integrated BA-LSSVC model has been programmed in MATLAB by the authors with the help of the source codes provided by De Brabanter et al. [19] and Yang [87].
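The authors' MATLAB implementation is not reproduced in the paper. As an illustration only, the LSSVC training step of Section 3.1 (the linear system (5) with the RBF kernel (8)) and the classifier (7) can be sketched in Python with NumPy. The function names, the toy two-feature data, and the hyper-parameter values below are assumptions made for this sketch; they are not taken from the study.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """RBF kernel of Eq. (8): exp(-||a - b||^2 / (2 * sigma^2)) for all row pairs."""
    sq_dist = (np.sum(A**2, axis=1)[:, None]
               + np.sum(B**2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))

def lssvc_train(X, y, gamma, sigma):
    """Solve the linear system of Eq. (5) for the bias b and the multipliers alpha."""
    n = X.shape[0]
    omega = np.outer(y, y) * rbf_kernel(X, X, sigma)   # Eq. (6)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y                                       # first row:    [0, y^T]
    A[1:, 0] = y                                       # first column: [0; y]
    A[1:, 1:] = omega + np.eye(n) / gamma              # Omega + I / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))          # [0; 1_v]
    solution = np.linalg.solve(A, rhs)
    return solution[0], solution[1:]                   # b, alpha

def lssvc_predict(X_train, y_train, alpha, b, sigma, X_new):
    """Classifier of Eq. (7): sign(sum_k alpha_k y_k K(x, x_k) + b)."""
    K = rbf_kernel(X_new, X_train, sigma)
    return np.sign(K @ (alpha * y_train) + b)

# Toy data (hypothetical): two well-separated clusters in [0, 1]^2,
# labelled +1 ("landslide") and -1 ("non-landslide").
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.8, 0.05, size=(20, 2)),
               rng.normal(0.2, 0.05, size=(20, 2))])
y = np.concatenate([np.ones(20), -np.ones(20)])

# Hyper-parameter values chosen only for this toy example.
b, alpha = lssvc_train(X, y, gamma=10.0, sigma=0.5)
train_pred = lssvc_predict(X, y, alpha, b, sigma=0.5, X_new=X)
new_pred = lssvc_predict(X, y, alpha, b, sigma=0.5,
                         X_new=np.array([[0.8, 0.8], [0.2, 0.2]]))
```

Training reduces to a single call to a dense linear solver, which is the practical difference from the quadratic program of the standard SVM. For data sets of several thousand pixels, a dense (N+1) × (N+1) system is still manageable, but memory grows quadratically with N.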
Fig. 6. The hybrid framework of Bat Algorithm and Least Squares Support Vector Classification for spatial modeling of shallow landslide. (Flowchart: the original dataset is split into training samples (70%) and testing samples (30%); the LSSVC tuning parameters are initialized; in each iteration the Bat Algorithm handles M candidate models, each with an LSSVC training and an LSSVC prediction step; the cost function is computed and the stopping condition is verified; the optimized prediction model is then applied to the testing set for landslide susceptibility prediction.)
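To make the search loop of Fig. 6 concrete, the Bat algorithm updates of Section 3.2 can be sketched as follows. This is a minimal Python sketch, not the authors' MATLAB code: the frequency bounds, the values of the constants alpha and gamma, the interval [-1, 1] for the random walk, the walk being centred on the current best solution, and the greedy acceptance rule are all assumptions borrowed from common Bat algorithm implementations, and the quadratic test function is purely illustrative.

```python
import numpy as np

def bat_algorithm(cost, lower, upper, n_bats=20, n_iter=100,
                  f_min=0.0, f_max=2.0, alpha=0.9, gamma=0.9, seed=0):
    """Minimize cost(x) over the box [lower, upper] with a basic Bat algorithm."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    # Random initial positions inside the search range (same form as Eq. (14)).
    x = lower + rng.random((n_bats, dim)) * (upper - lower)
    v = np.zeros((n_bats, dim))
    loud = np.ones(n_bats)        # loudness A_i (assumed to start at 1)
    r0 = rng.random(n_bats)       # initial pulse rates r_i^0 (assumed random)
    rate = np.zeros(n_bats)       # current pulse rates r_i
    fit = np.array([cost(xi) for xi in x])
    best = x[np.argmin(fit)].copy()
    best_fit = float(fit.min())
    for t in range(1, n_iter + 1):
        for i in range(n_bats):
            f_i = f_min + (f_max - f_min) * rng.random()     # Eq. (9)
            v[i] = v[i] + (x[i] - best) * f_i                # Eq. (10)
            cand = x[i] + v[i]                               # Eq. (11)
            if rng.random() > rate[i]:
                # Eq. (12): local random walk scaled by the average loudness,
                # centred on the current best solution (an assumption).
                cand = best + rng.uniform(-1.0, 1.0, dim) * loud.mean()
            cand = np.clip(cand, lower, upper)               # stay inside the range
            c = float(cost(cand))
            if c <= fit[i] and rng.random() < loud[i]:
                x[i], fit[i] = cand, c
                loud[i] *= alpha                             # Eq. (13), loudness
                rate[i] = r0[i] * (1.0 - np.exp(-gamma * t)) # Eq. (13), pulse rate
            if c < best_fit:
                best, best_fit = cand.copy(), c
    return best, best_fit

# Illustrative cost function with its minimum at (1, -2).
target = np.array([1.0, -2.0])

def sphere(p):
    return float(np.sum((p - target) ** 2))

best, best_fit = bat_algorithm(sphere, lower=[-5.0, -5.0], upper=[5.0, 5.0])
```

In the hybrid model, `cost` would be the cross-validation cost function of the LSSVC evaluated at a candidate pair of hyper-parameters, and the bounds would be the search range of the two hyper-parameters.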
In the preparation phase, the parameters of the BA metaheuristic, including the number of optimization iterations (NI), the population size (SP), and the range of the search variables (RV), must be selected. Based on several trial runs, these parameters have been set as follows: NI = 100, SP = 20, and RV = [0.01, 1000]. The hyper-parameters of an LSSVC model (the regularization coefficient and the kernel function parameter) used for spatial prediction of shallow landslide are then created randomly within this range using the following formula:

$Par_{LSSVC,i} = LB_i + RN \times (UB_i - LB_i), \quad i = 1, 2$    (14)

where $Par_{LSSVC,i}$ is the ith hyper-parameter of an LSSVC model, $RN$ is a uniform random number generated within the range of 0 and 1, and $LB_i = 0.01$ and $UB_i = 1000$ represent the lower and upper bounds of the hyper-parameters, respectively.

Moreover, in order to construct an LSSVC model that balances predictive accuracy and model complexity, K-fold cross validation with K = 5 is employed to compute the objective function of the BA optimizer. The training dataset is randomly divided into five folds, each consisting of 20% of the original training set. Using the set of hyper-parameters (the regularization coefficient and the kernel function parameter) encoded in each BA solution, five LSSVC-based shallow landslide prediction models are trained; in each training run, four folds are used for model construction and the remaining fold is used to verify model generalization. The cost function ($f_{CF}$) used to guide the BA's members to better regions of the search space is expressed as follows:

$f_{CF} = \frac{1}{K} \sum_{k=1}^{K} \frac{2}{PPV_k + NPV_k}$    (15)

where K = 5 is the number of data folds. PPV and NPV denote the Positive Predictive Value and the Negative Predictive Value, respectively. These two terms are used to quantify the prediction accuracy of a model constructed with a set of hyper-parameters. The PPV (also called Precision) and NPV are calculated using the following equations [5]:

$PPV = \frac{TP}{TP + FP}$    (16)

$NPV = \frac{TN}{TN + FN}$    (17)

where TP, TN, FP, and FN represent the true positive, true negative, false positive, and false negative values, respectively. Notably, the model fitness value can be simply computed as the inverse of the cost function value. In addition, besides PPV and NPV, the following indices can also be used for appraising the BA-LSSVC performance [35]:

$\text{Classification Accuracy Rate (CAR)} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\%$    (18)

$\text{Recall} = \frac{TP}{TP + FN}$    (19)

$\text{F1 Score} = \frac{2TP}{2TP + FP + FN}$    (20)

Moreover, the Receiver Operating Characteristic (ROC) curve can also be employed to evaluate the predictive power of a classification model [77]. The global performance of a shallow landslide model can be quantified by the area under the ROC curve, denoted as AUC. In general, AUC is a performance measurement index of a model at various threshold points. It expresses the capability of a model to distinguish data instances with different class labels. The higher the AUC is, the better the prediction model performs.

When the cost function ($f_{CF}$) for each bat in the population has been calculated, the BA performs its searching process to explore and exploit the search space and identify solutions associated with better LSSVC models. Using the selection operator, which favors the solutions having smaller cost function values, the locations of all bats are updated iteratively. The BA optimization process repeats its operations until the iteration number reaches the maximum value. At this point, an appropriate set of hyper-parameters has been found, and the optimized LSSVC model can be used for spatial prediction of landslide susceptibility in the study region.
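The performance indices of Eqs. (16) to (20) and the cross-validation cost function of Eq. (15) are simple functions of the confusion counts, and can be sketched as below. This is a minimal Python sketch; the function names and the confusion counts in the example are made up for illustration and are not results from the study. The sketch assumes that every fold contains at least one positive and one negative prediction, so that no denominator is zero.

```python
import numpy as np

def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for labels +1 (landslide) and -1 (non-landslide)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == -1) & (y_pred == -1)))
    fp = int(np.sum((y_true == -1) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == -1)))
    return tp, tn, fp, fn

def metrics(tp, tn, fp, fn):
    """Performance indices of Eqs. (16)-(20)."""
    return {
        "PPV": tp / (tp + fp),                            # Eq. (16), precision
        "NPV": tn / (tn + fn),                            # Eq. (17)
        "CAR": 100.0 * (tp + tn) / (tp + tn + fp + fn),   # Eq. (18), in percent
        "Recall": tp / (tp + fn),                         # Eq. (19)
        "F1": 2 * tp / (2 * tp + fp + fn),                # Eq. (20)
    }

def cv_cost(fold_counts):
    """Eq. (15): average over the K folds of 2 / (PPV_k + NPV_k)."""
    terms = []
    for tp, tn, fp, fn in fold_counts:
        m = metrics(tp, tn, fp, fn)
        terms.append(2.0 / (m["PPV"] + m["NPV"]))
    return sum(terms) / len(terms)

# Hypothetical confusion counts for one validation fold, repeated for K = 5 folds.
example = metrics(tp=90, tn=85, fp=15, fn=10)
cost = cv_cost([(90, 85, 15, 10)] * 5)
perfect = cv_cost([(50, 50, 0, 0)])   # a perfect classifier gives the minimum cost of 1
```

Because PPV and NPV both lie between 0 and 1, the cost function is never smaller than 1, and it reaches 1 only when every fold is classified without error; minimizing it therefore pushes both predictive values up together.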
Table 1. Testing performance of BA-LSSVC with different values of bat number (N).

| Model parameter and prediction performance | N = 20 | N = 30 | N = 40 | N = 50 |
|---|---|---|---|---|
| Regularization coefficient | 11.30 | 12.66 | 15.61 | 13.27 |
| Kernel function parameter | 7.68 | 8.69 | 9.64 | 8.73 |
| CAR (%) | 86.74 | 86.69 | 87.08 | 86.64 |
| Precision | 0.85 | 0.85 | 0.85 | 0.85 |
| Recall | 0.89 | 0.90 | 0.91 | 0.90 |
| NPV | 0.88 | 0.89 | 0.90 | 0.89 |
| F1 | 0.87 | 0.87 | 0.88 | 0.87 |
| AUC | 0.92 | 0.92 | 0.92 | 0.92 |
5. Experimental result and comparison

This section reports the experimental results of the hybrid BA-LSSVC model used for spatial prediction of shallow landslide susceptibility. As mentioned earlier, to construct and verify the model, the whole dataset has been randomly divided into a training set (70%) and a testing set (30%). Using the training data set, the model optimization process carried out by BA is demonstrated in Fig. 7, which records the population's cost function corresponding to different values of bat number (N). The model prediction performance with different values of N is provided in Table 1. The most desired prediction accuracy is found with N = 40. Thus, after 100 iterations, the best hyper-parameters of the LSSVC model have been determined as follows: the regularization coefficient is 15.61 and the kernel function parameter is 9.64. The performance of the BA-LSSVC in the training phase is as follows: CAR = 91.22%, Precision = 0.87, Recall = 0.97, NPV = 0.97, F1 score = 0.92, and AUC = 0.97. The indices obtained in the testing phase are as follows: CAR = 87.08%, Precision = 0.85, Recall = 0.91, NPV = 0.90, F1 score = 0.88, and AUC = 0.92. ROC curves of BA-LSSVC in the training and testing phases are illustrated in Fig. 8.

Fig. 7. Optimization process of Bat Algorithm with different number of bats (a) N = 20, (b) N = 30, (c) N = 40, and (d) N = 50.

Fig. 8. ROCs of the proposed BA-LSSVC model.

Moreover, to demonstrate the good prediction performance of the newly constructed BA-LSSVC landslide prediction model, the LSSVC without the BA optimization, the Convolutional Neural Network (CNN), the Relevance Vector Machine (RVM), the Backpropagation Artificial Neural Network (BPANN), and Logistic Regression (LR) are employed as benchmark models. The machine learning methods of LSSVC, RVM, and BPANN have been successfully employed in landslide modeling as reported in previous studies [2,29,40,44,53,70,90]. In addition, CNN [23] is a deep learning based classifier which can be applied for shallow landslide prediction. The LSSVC model has been constructed in MATLAB with the help of the source codes provided by De Brabanter et al. [19]; the grid search algorithm [30] is employed for determining a suitable set of the LSSVC hyper-parameters. The BPANN and CNN models are implemented via MATLAB's Statistics and Machine Learning Toolbox and Neural Network Toolbox [7,45]. Via several trial-and-error runs, the appropriate configuration of the CNN model is as follows: the number of convolutional layers = 3; each convolutional layer has 49 neurons; the filter sizes used in the convolutional layer and pooling layer are both 2 × 2. The BPANN model requires the selection of the number of neurons in its hidden layer, the learning rate, and the number of training epochs. Using several trial runs, the appropriate configuration of the employed BPANN model is as follows: the number of neurons = 11; the learning rate = 0.1; the number of training epochs = 1000. In addition, the RVM model [72] is constructed with the source codes provided by Tipping [73]; its basis width hyper-parameter has been determined to be 5.6 via a trial-and-error process. Meanwhile, the LR model has been constructed with the Newton-Raphson algorithm [3,24] with the number of training epochs = 300.

As reported in Table 2, BA-LSSVC attains the highest CAR in the testing phase (CAR = 87.08% and AUC = 0.92) when compared with LSSVC (CAR = 85.93% and AUC = 0.92), CNN (CAR = 86.38% and AUC = 0.92), RVM (CAR = 83.53% and AUC = 0.92), BPANN (CAR = 83.58% and AUC = 0.89), and LR (CAR = 83.01% and AUC = 0.83); its AUC equals those of LSSVC, CNN, and RVM and exceeds those of BPANN and LR.

Furthermore, to alleviate the bias due to randomness in data selection, a repeated sampling of the data set with 20 runs has also been performed. In each run, the training (70%) and testing (30%) data sets are drawn randomly from the original dataset. The average outcome of the model performance is obtained from this repeated sampling process. The experimental results of the proposed BA-LSSVC and the benchmark models are reported in Table 3, including the mean and standard deviation (Std) of the measured indices. Fig. 9 provides the average ROCs of BA-LSSVC obtained from the repeated data sampling process. As observed from this table, the average testing performance of BA-LSSVC (CAR = 90.44% and AUC = 0.96) is higher than those of the LSSVC (CAR = 88.78% and AUC = 0.92), CNN (CAR = 86.86% and AUC = 0.93), RVM (CAR = 88.69% and AUC = 0.93), BPANN (CAR = 86.61% and AUC = 0.91), and LR (CAR = 86.65% and AUC = 0.87). This result indicates that the newly developed BA-LSSVC is well suited for predicting landslide susceptibility in the study area.

In addition, to evaluate the sensitivity of the BA-LSSVC model performance to the prediction variables, this study has employed the Fourier amplitude sensitivity test (FAST). FAST, proposed in McRae et al. [46], is a variance-based sensitivity analysis method which can calculate the effects of the variables on the model output of the landslide susceptibility. This study relies on a toolbox developed by Pianosi et al. [51,52] to implement the FAST method. Based on the FAST approach, the variation of the landslide prediction result is decomposed into partial variances of the input factors by means of a Fourier transformation. The impact of input features on the BA-LSSVC output is quantified by the first-order sensitivity index (FOSI) [30,51].

Results of the sensitivity analysis are demonstrated in Fig. 10 with FOSI values of the slope angle (X1), slope length (X2), slope aspect (X3), curvature (X4), elevation (X5), topographic wetness index (X6), stream power index (X7), sediment transport index (X8), valley depth (X9), toposhade (X10), lithology (X11), land use (X12), soil type (X13), and distance to faults (X14). As can be seen from this figure, X3 (FOSI = 17.02%) has the largest impact on the BA-LSSVC prediction outcomes, followed by X1 (FOSI = 8.89%), X5 (FOSI = 8.12%), X14 (FOSI = 6.88%), X6 (FOSI = 6.71%), X8 (FOSI = 6.49%), and so on. Thus, the slope aspect (X3) imposes the strongest effect on the model prediction. This can be explained by the fact that slope aspect significantly influences both the physical and biotic characteristics of the slope. Moreover, based on the analysis outcomes, all influencing factors have certain impacts on the BA-LSSVC's landslide prediction.

Table 2. Model performances obtained from the training and testing phases.
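
| Phase | Indices | BA-LSSVC | LSSVC | CNN | RVM | BPANN | LR |
|---|---|---|---|---|---|---|---|
| Training | CAR (%) | 91.22 | 92.53 | 86.25 | 92.57 | 88.48 | 88.38 |
| Training | Precision | 0.87 | 0.89 | 0.79 | 0.90 | 0.84 | 0.85 |
| Training | Recall | 0.97 | 0.97 | 0.99 | 0.96 | 0.95 | 0.93 |
| Training | NPV | 0.97 | 0.97 | 0.99 | 0.95 | 0.95 | 0.92 |
| Training | F1 | 0.92 | 0.93 | 0.88 | 0.93 | 0.89 | 0.89 |
| Training | AUC | 0.97 | 0.98 | 0.93 | 0.97 | 0.93 | 0.88 |
| Testing | CAR (%) | 87.08 | 85.93 | 86.38 | 83.53 | 83.58 | 83.01 |
| Testing | Precision | 0.84 | 0.85 | 0.79 | 0.86 | 0.82 | 0.83 |
| Testing | Recall | 0.91 | 0.87 | 0.99 | 0.80 | 0.86 | 0.83 |
| Testing | NPV | 0.90 | 0.87 | 0.99 | 0.81 | 0.85 | 0.83 |
| Testing | F1 | 0.88 | 0.86 | 0.88 | 0.83 | 0.84 | 0.83 |
| Testing | AUC | 0.92 | 0.92 | 0.92 | 0.92 | 0.89 | 0.83 |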
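The first-order sensitivity index used in this section is the share of the output variance explained by one input factor alone, Var(E[Y | X_i]) / Var(Y). The sketch below does not implement FAST or the toolbox used in the study; it estimates the same quantity with a much simpler binning estimator, swapped in purely for illustration. The function name `first_order_index`, the number of bins, and the two-factor test model are assumptions.

```python
import random


def first_order_index(x_col, y, n_bins=50):
    """Estimate Var(E[Y | X_i]) / Var(Y) using equal-count bins of X_i."""
    n = len(y)
    order = sorted(range(n), key=lambda k: x_col[k])
    y_mean = sum(y) / n
    total_var = sum((val - y_mean) ** 2 for val in y) / n
    size = n // n_bins
    between = 0.0
    for b in range(n_bins):
        stop = (b + 1) * size if b < n_bins - 1 else n
        idx = order[b * size:stop]
        bin_mean = sum(y[k] for k in idx) / len(idx)
        # Weighted squared deviation of the conditional mean from the global mean.
        between += len(idx) * (bin_mean - y_mean) ** 2
    return between / n / total_var


# Hypothetical two-factor model, for illustration only: Y = 2*X1 + X2,
# with X1 and X2 independent and uniform on [0, 1].
rng = random.Random(0)
x1 = [rng.random() for _ in range(20000)]
x2 = [rng.random() for _ in range(20000)]
y = [2.0 * a + b for a, b in zip(x1, x2)]
s1 = first_order_index(x1, y)   # analytical value: 4/5
s2 = first_order_index(x2, y)   # analytical value: 1/5
```

For this additive test model the two indices sum to one; for a trained classifier with interacting inputs they generally sum to less than one, the remainder being attributable to interactions.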
Table 3. Model performances obtained from the repeated random data sampling process. Each cell reports Mean (Std).

| Phase | Indices | BA-LSSVC | LSSVC | CNN | RVM | BPANN | LR |
|---|---|---|---|---|---|---|---|
| Training | CAR (%) | 90.87 (0.21) | 89.05 (0.32) | 87.59 (1.75) | 91.76 (0.37) | 86.79 (0.96) | 86.74 (0.63) |
| Training | Precision | 0.86 (0.00) | 0.85 (0.00) | 0.84 (0.03) | 0.88 (0.00) | 0.83 (0.01) | 0.83 (0.01) |
| Training | Recall | 0.97 (0.00) | 0.95 (0.00) | 0.92 (0.05) | 0.96 (0.00) | 0.92 (0.02) | 0.92 (0.02) |
| Training | NPV | 0.97 (0.00) | 0.94 (0.00) | 0.92 (0.05) | 0.96 (0.00) | 0.91 (0.02) | 0.91 (0.02) |
| Training | F1 | 0.91 (0.00) | 0.90 (0.00) | 0.88 (0.02) | 0.92 (0.00) | 0.87 (0.01) | 0.87 (0.01) |
| Training | AUC | 0.96 (0.00) | 0.92 (0.00) | 0.94 (0.02) | 0.95 (0.03) | 0.91 (0.00) | 0.87 (0.01) |
| Testing | CAR (%) | 90.44 (0.52) | 88.78 (0.68) | 86.86 (1.87) | 88.69 (1.67) | 86.61 (1.41) | 86.65 (0.77) |
| Testing | Precision | 0.86 (0.01) | 0.85 (0.01) | 0.83 (0.03) | 0.86 (0.02) | 0.83 (0.01) | 0.84 (0.01) |
| Testing | Recall | 0.97 (0.00) | 0.94 (0.01) | 0.92 (0.05) | 0.93 (0.00) | 0.92 (0.03) | 0.91 (0.02) |
| Testing | NPV | 0.96 (0.00) | 0.93 (0.01) | 0.92 (0.05) | 0.92 (0.01) | 0.91 (0.03) | 0.91 (0.02) |
| Testing | F1 | 0.91 (0.00) | 0.89 (0.01) | 0.87 (0.02) | 0.89 (0.01) | 0.87 (0.01) | 0.87 (0.01) |
| Testing | AUC | 0.96 (0.00) | 0.92 (0.01) | 0.93 (0.02) | 0.93 (0.03) | 0.91 (0.01) | 0.87 (0.01) |
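The repeated sampling protocol behind Table 3 (20 runs, each with a fresh random 70/30 split, summarised by mean and standard deviation) can be sketched as follows. The nearest-centroid classifier and the synthetic two-class data are hypothetical stand-ins for the actual models and GIS dataset, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
import random
import statistics


def nearest_centroid_accuracy(train, test):
    """Hypothetical stand-in model: assign each sample to the closer class centroid."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, y in train if y == label]
        centroids[label] = [statistics.fmean(col) for col in zip(*rows)]

    def predict(x):
        dist = {c: sum((a - b) ** 2 for a, b in zip(x, m))
                for c, m in centroids.items()}
        return min(dist, key=dist.get)

    hits = sum(predict(x) == y for x, y in test)
    return 100.0 * hits / len(test)   # classification accuracy rate in percent


def repeated_holdout(data, evaluate, n_runs=20, train_share=0.7, seed=0):
    """Repeat a random train/test split n_runs times; return mean, std, and all scores."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_runs):
        shuffled = data[:]
        rng.shuffle(shuffled)
        cut = int(round(train_share * len(shuffled)))
        scores.append(evaluate(shuffled[:cut], shuffled[cut:]))
    return statistics.fmean(scores), statistics.stdev(scores), scores


# Synthetic two-class data, for illustration only.
rng = random.Random(1)
data = ([([rng.gauss(0, 1), rng.gauss(0, 1)], 0) for _ in range(100)]
        + [([rng.gauss(2, 1), rng.gauss(2, 1)], 1) for _ in range(100)])
mean_car, std_car, scores = repeated_holdout(data, nearest_centroid_accuracy)
```

Reporting the standard deviation next to the mean shows how much a single split can move the score, which is why a single 70/30 partition is complemented by repeated sampling.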
Because BA-LSSVC has achieved a satisfactory predictive result with the GIS database collected from the study area of Lang Son province, this hybrid machine learning model is used to establish a landslide susceptibility map for this region (Fig. 11). In this map, the landslide susceptibility index (LSI) ranges from 0.001 to 0.999. Notably, an LSI of 0.001 corresponds to the state of non-landslide, whereas an LSI of 0.999 corresponds to the state of high risk of landslide. To verify the accuracy and usefulness of the constructed susceptibility map, the landslide inventory map, which includes locations of the past landslide events, has been overlaid with the newly created map. The graphic curve [17] has been plotted with the percentage of the landslide pixels on the y-axis against the percentage of pixels of the susceptibility classes, which have been arranged from high to low susceptibility index. As observed from the graphic curve, most of the actual landslide pixels are located in the high and very high classes, whereas very few actual landslide pixels are found in the low and very low classes. These results demonstrate the usefulness and reliability of the landslide susceptibility map constructed by the proposed BA-LSSVC model.
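A minimal sketch of such a validation curve is given below: pixels are ranked from high to low LSI, and the cumulative percentage of landslide pixels is tracked against the percentage of map pixels covered. The function name `ranked_landslide_curve` and the six-pixel example are hypothetical; a real map would use one LSI value and one inventory flag per raster cell.

```python
def ranked_landslide_curve(lsi, is_landslide):
    """Cumulative % of landslide pixels versus % of map pixels, ranked from high to low LSI."""
    order = sorted(range(len(lsi)), key=lambda k: lsi[k], reverse=True)
    total_slides = sum(is_landslide)
    area_pct, slide_pct, found = [], [], 0
    for rank, k in enumerate(order, start=1):
        found += is_landslide[k]
        area_pct.append(100.0 * rank / len(lsi))
        slide_pct.append(100.0 * found / total_slides)
    return area_pct, slide_pct


# Hypothetical six-pixel map: LSI values and inventory flags (1 = landslide).
lsi = [0.95, 0.10, 0.80, 0.30, 0.60, 0.05]
flags = [1, 0, 1, 0, 0, 0]
area_pct, slide_pct = ranked_landslide_curve(lsi, flags)
# slide_pct is [50.0, 100.0, 100.0, 100.0, 100.0, 100.0]
```

In this toy case both landslide pixels fall within the top third of the ranked map, which is the pattern a reliable susceptibility map is expected to show.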
Fig. 9. The average ROCs of BA-LSSVC obtained from the repeated data sampling process.

Fig. 10. The first-order sensitivity index (FOSI) of landslide influencing factors.

Fig. 11. Landslide susceptibility map.

6. Conclusion

Spatial mapping of shallow landslide susceptibility is crucial for disaster prevention/mitigation and land use planning. The study area of Lang Son province has been heavily damaged by shallow landslides in recent years. Therefore, constructing a landslide susceptibility map is highly useful for the local authority to better deal with this natural disaster. To develop a capable landslide prediction tool, this study has integrated the LSSVC and BA algorithms. LSSVC is employed as a machine learning method for distinguishing locations associated with high risk of landslide occurrence. In addition, the BA metaheuristic is employed for optimizing the LSSVC model construction phase. To train and verify the proposed hybrid model, a GIS data set containing information of 101 historical landslide locations and fourteen landslide influencing factors has been used. Experimental results show that BA-LSSVC with CAR = 90.44% and AUC = 0.96 is a capable tool for shallow landslide susceptibility mapping. The performance of the hybrid method is better than those of the CNN, RVM, BPANN, and LR prediction approaches. Since the model training phase of BA-LSSVC is automatically accomplished with the help of the BA metaheuristic, the hybrid model can be used and updated by decision-makers without much domain knowledge about machine learning and programming.

Future extensions of the current work may include the investigation of other advanced machine learning and metaheuristic methods to enhance the performance of landslide spatial modeling, as well as applying the current model for hazard management in other study areas. In addition, the current model constructs the GIS database by randomly sampling data points in non-landslide locations. Future works may investigate the performance of the landslide prediction model in the case that non-landslide data points are sampled from locations having characteristics similar to those of landslide ones. Moreover, the generalization capability of the proposed BA-LSSVC should be evaluated with more GIS datasets related to spatial prediction of landslides.
Declaration of Competing Interest

The authors confirm that there are no conflicts of interest regarding the publication of this study.

Acknowledgements

This research was funded by the Geographic Information Science Research group, Ton Duc Thang University, Ho Chi Minh city, Vietnam.

References

[1] M. Ada, B.T. San, Comparison of machine-learning techniques for landslide susceptibility mapping using two-level random sampling (2LRS) in Alakir catchment area, Antalya, Turkey, Nat. Hazards 90 (2018) 237–263, https://doi.org/10.1007/s11069-017-3043-8.
[2] A. Aditian, T. Kubota, Y. Shinohara, Comparison of GIS-based landslide susceptibility models using frequency ratio, logistic regression, and artificial neural network in a tertiary region of Ambon, Indonesia, Geomorphology 318 (2018) 101–111, https://doi.org/10.1016/j.geomorph.2018.06.006.
[3] A. Agresti, Foundations of Linear and Generalized Linear Models, Wiley Series in Probability and Statistics, John Wiley & Sons Inc, Hoboken, New Jersey, 2015.
[4] M.S. Alkhasawneh, U.K. Ngah, L.T. Tay, N.A. Mat Isa, M.S. Al-Batah, Modeling and testing landslide hazard using decision tree, J. Appl. Math. 2014 (2014) 9, https://doi.org/10.1155/2014/929768.
[5] D.G. Altman, J.M. Bland, Statistics Notes: Diagnostic tests 2: predictive values, BMJ 309 (1994) 102, https://doi.org/10.1136/bmj.309.6947.102.
[6] A. Arabameri, B. Pradhan, K. Rezaei, C.-W. Lee, Assessment of landslide susceptibility using statistical- and artificial intelligence-based FR–RF integrated model and multiresolution DEMs, Remote Sens. 11 (2019) 999.
[7] M.H. Beale, M.T. Hagan, H.B. Demuth, Neural network toolbox user's guide, The MathWorks, Inc., 2018, https://www.mathworks.com/help/pdf_doc/nnet/nnet_ug.pdf (Last accessed 04/28/2018).
[8] C.M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), Springer, 2011 (April 6, 2011), ISBN-10: 0387310738.
[9] E.E. Brabb, Innovative approaches to landslide hazard and risk mapping, Proceedings of IVth International Conference and Field Workshop in Landslides, Tokyo, Japan, Japan Landslide Society, 1985, pp. 1–6.
[10] D.T. Bui, B. Pradhan, I. Revhaug, D.B. Nguyen, H.V. Pham, Q.N. Bui, A novel hybrid evidential belief function-based fuzzy logic model in spatial prediction of rainfall-induced shallow landslides in the Lang Son city area (Vietnam), Geomatics, Natural Hazards Risk 6 (2015) 243–271, https://doi.org/10.1080/19475705.2013.843206.
[11] D.T. Bui, T.A. Tuan, H. Klempe, B. Pradhan, I. Revhaug, Spatial prediction models for shallow landslide hazards: a comparative assessment of the efficacy of support vector machines, artificial neural networks, kernel logistic regression, and logistic model tree, Landslides 13 (2016) 361–378.
[12] J. Chacón, C. Irigaray, T. Fernández, R. El Hamdouni, Engineering geology maps: landslides and geographical information systems, Bull. Eng. Geol. Environ. 65 (2006) 341–411, https://doi.org/10.1007/s10064-006-0064-z.
[13] W. Chen, M. Panahi, H.R. Pourghasemi, Performance evaluation of GIS-based new ensemble data mining techniques of adaptive neuro-fuzzy inference system (ANFIS) with genetic algorithm (GA), differential evolution (DE), and particle swarm optimization (PSO) for landslide spatial modelling, CATENA 157 (2017) 310–324, https://doi.org/10.1016/j.catena.2017.05.034.
[14] W. Chen, et al., Applying population-based evolutionary algorithms and a neuro-fuzzy system for modeling landslide susceptibility, CATENA 172 (2019) 212–231, https://doi.org/10.1016/j.catena.2018.08.025.
[15] W. Chen, et al., Landslide susceptibility modelling using GIS-based machine learning techniques for Chongren County, Jiangxi Province, China, Sci. Total Environ. 626 (2018) 1121–1135, https://doi.org/10.1016/j.scitotenv.2018.01.124.
[16] M.-Y. Cheng, D. Prayogo, Y.-W. Wu, Prediction of permanent deformation in asphalt pavements using a novel symbiotic organisms search–least squares support vector regression, Neural Comput. Appl. (2018), https://doi.org/10.1007/s00521-018-3426-0.
[17] C.-J.F. Chung, A.G. Fabbri, C.J. Van Westen, Multivariate regression analysis for landslide hazard zonation, Geographical Information Systems in Assessing Natural Hazards, Springer, 1995, pp. 107–133.
[18] N.R. da Silva, P. Van der Weeën, B. De Baets, O.M. Bruno, Improved texture image classification through the use of a corrosion-inspired cellular automaton, Neurocomputing 149 (2015) 1560–1572, https://doi.org/10.1016/j.neucom.2014.08.036.
[19] K. De Brabanter, et al., LS-SVMlab Toolbox User's Guide version 1.8, 2010.
[20] A. Dehnavi, I.N. Aghdam, B. Pradhan, M.H. Morshed Varzandeh, A new hybrid model using step-wise weight assessment ratio analysis (SWARA) technique and adaptive neuro-fuzzy inference system (ANFIS) for regional landslide hazard assessment in Iran, CATENA 135 (2015) 122–148, https://doi.org/10.1016/j.catena.2015.07.020.
[21] S. Etedali, N. Mollayi, Cuckoo Search-Based Least Squares Support Vector Machine Models for Optimum Tuning of Tuned Mass Dampers, Int. J. Struct. Stab. Dyn. 18 (2018) 1850028, https://doi.org/10.1142/s0219455418500281.
[22] M.R. Gamarra Acosta, J.C. Vélez Díaz, N. Schettini Castro, An innovative image-processing model for rust detection using Perlin Noise to simulate oxide textures, Corros. Sci. 88 (2014) 141–151, https://doi.org/10.1016/j.corsci.2014.07.027.
[23] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning (Adaptive Computation and Machine Learning series), The MIT Press, 2016, ISBN-10: 0262035618.
[24] M. Gormley, Logistic Regression, 2016, https://www.cs.cmu.edu/~mgormley/courses/10701-f16/slides/lecture5.pdf (Last Access 12/12/2018).
[25] L.H. Hamel, Knowledge Discovery with Support Vector Machines, John Wiley & Sons Inc, 2009, ISBN: 978-0-470-37192-3.
[26] S. Heddam, O. Kisi, Modelling daily dissolved oxygen concentration using least square support vector machine, multivariate adaptive regression splines and M5 model tree, J. Hydrol. 559 (2018) 499–509, https://doi.org/10.1016/j.jhydrol.2018.02.061.
[27] H. Hemasinghe, R.S.S. Rangali, N.L. Deshapriya, L. Samarakoon, Landslide susceptibility mapping using logistic regression model (a case study in Badulla District, Sri Lanka), Procedia Eng. 212 (2018) 1046–1053, https://doi.org/10.1016/j.proeng.2018.01.135.
[28] N.-D. Hoang, Classification of asphalt pavement cracks using Laplacian pyramid-based image processing and a hybrid computational approach, Comput. Intelligence Neurosci. 2018 (2018) 16, https://doi.org/10.1155/2018/1312787.
[29] N.-D. Hoang, D.T. Bui, A novel relevance vector machine classifier with cuckoo search optimization for spatial prediction of landslides, J. Comput. Civ. Eng. 30 (2016) 04016001, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000557.
[30] N.-D. Hoang, D.T. Bui, Predicting earthquake-induced soil liquefaction based on a hybridization of kernel Fisher discriminant analysis and a least squares support vector machine: a multi-dataset study, Bull. Eng. Geol. Environ. 77 (2018) 191–204, https://doi.org/10.1007/s10064-016-0924-0.
[31] N.-D. Hoang, D. Tien-Bui, A novel relevance vector machine classifier with cuckoo search optimization for spatial prediction of landslides, J. Comput. Civ. Eng. 30 (2016) 04016001, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000557.
[32] H. Hong, et al., Landslide susceptibility mapping using J48 Decision Tree with AdaBoost, Bagging and Rotation Forest ensembles in the Guangchang area (China), CATENA 163 (2018) 399–413, https://doi.org/10.1016/j.catena.2018.01.005.
[33] Y. Huang, L. Zhao, Review on landslide susceptibility mapping using support vector machines, CATENA 165 (2018) 520–529, https://doi.org/10.1016/j.catena.2018.03.003.
[34] L.Q. Hung, N.T.H. Van, D.M. Duc, P. Van Son, N.H. Khanh, L.T. Binh, Landslide susceptibility mapping by combining the analytical hierarchy process and weighted linear combination methods: a case study in the upper Lo River catchment (Vietnam), Landslides 13 (2016) 1285–1301.
[35] D. Itzhak, I. Dinstein, T. Zilberberg, Pitting corrosion evaluation by computer image processing, Corros. Sci. 21 (1981) 17–22, https://doi.org/10.1016/0010-938X(81)90059-7.
[36] A. Jaafari, M. Panahi, B.T. Pham, H. Shahabi, D.T. Bui, F. Rezaie, S. Lee, Meta optimization of an adaptive neuro-fuzzy inference system with grey wolf optimizer and biogeography-based optimization algorithms for spatial prediction of landslide susceptibility, CATENA 175 (2019) 430–445, https://doi.org/10.1016/j.catena.2018.12.033.
[37] M.N. Jebur, B. Pradhan, M.S. Tehrany, Using ALOS PALSAR derived high-resolution DInSAR to detect slow-moving landslides in tropical forest: Cameron Highlands, Malaysia, Geomat., Nat. Hazards Risk 6 (2015) 741–759.
[38] M. Juliev, M. Mergili, I. Mondal, B. Nurtaev, A. Pulatov, J. Hübl, Comparative analysis of statistical methods for landslide susceptibility mapping in the Bostanlik District, Uzbekistan, Sci. Total Environ. 653 (2019) 801–814, https://doi.org/10.1016/j.scitotenv.2018.10.431.
[39] P.R. Kadavi, C.-W. Lee, S. Lee, Application of ensemble-based machine learning models to landslide susceptibility mapping, Remote Sens. 10 (2018) 1252.
[40] B. Kalantar, B. Pradhan, S.A. Naghibi, A. Motevalli, S. Mansor, Assessment of the effects of training data selection on the landslide susceptibility mapping: a comparison between support vector machine (SVM), logistic regression (LR) and artificial neural networks (ANN), Geomatics, Nat. Hazards Risk 9 (2018) 49–69, https://doi.org/10.1080/19475705.2017.1407368.
[41] T. Kavzoglu, I. Colkesen, E.K. Sahin, Machine learning techniques in landslide susceptibility mapping: a survey and a case study, in: S.P. Pradhan, V. Vishal, T.N. Singh (Eds.), Landslides: Theory, Practice and Modelling, Springer International Publishing, Cham, 2019, pp. 283–301, https://doi.org/10.1007/978-3-319-77377-3_13.
[42] K.K. Kuok, S.M. Kueh, P.C. Chiu, Bat optimisation neural networks for rainfall forecasting: case study for Kuching city, J. Water Climate Change (2018), https://doi.org/10.2166/wcc.2018.136.
[43] F. Lafarge, X. Descombes, J. Zerubia, Textural kernel for SVM classification in remote sensing: application to forest fire detection and urban area extraction, in: IEEE International Conference on Image Processing 2005, 14 Sept. 2005, pp. III-1096, https://doi.org/10.1109/ICIP.2005.1530587.
[44] L. Lombardo, P.M. Mai, Presenting logistic regression-based landslide susceptibility results, Eng. Geol. 244 (2018) 14–24, https://doi.org/10.1016/j.enggeo.2018.07.019.
[45] MathWorks, Statistics and Machine Learning Toolbox User's Guide, The MathWorks, Inc., 2017, https://www.mathworks.com/help/pdf_doc/stats/stats.pdf (Last accessed 04/28/2018).
[46] G.J. McRae, J.W. Tilden, J.H. Seinfeld, Global sensitivity analysis—a computational implementation of the Fourier Amplitude Sensitivity Test (FAST), Comput. Chem. Eng. 6 (1982) 15–25, https://doi.org/10.1016/0098-1354(82)80003-3.
[47] Q.-K. Nguyen, D. Tien Bui, N.-D. Hoang, P.T. Trinh, V.-H. Nguyen, I. Yilmaz, A novel hybrid approach based on instance based learning classifier and rotation forest ensemble for spatial prediction of rainfall-induced shallow landslides using GIS, Sustainability 9 (2017) 813.
[48] M.A. North, A method for implementing a statistically significant number of data classes in the Jenks algorithm, in: 2009 Sixth International Conference on Fuzzy Systems and Knowledge Discovery, 14-16 Aug. 2009, pp. 35–38, https://doi.org/10.1109/FSKD.2009.319.
[49] B.T. Pham, A. Jaafari, I. Prakash, D.T. Bui, A novel hybrid intelligent model of support vector machines and the MultiBoost ensemble for landslide susceptibility modeling, Bull. Eng. Geol. Environ. (2018), https://doi.org/10.1007/s10064-018-1281-y.
[50] B.T. Pham, D. Tien Bui, I. Prakash, Landslide susceptibility assessment using bagging ensemble based alternating decision trees, logistic regression and J48 decision trees methods: a comparative study, Geotech. Geol. Eng. 35 (2017) 2597–2611, https://doi.org/10.1007/s10706-017-0264-2.
[51] F. Pianosi, F. Sarrazin, T. Wagener, A Matlab toolbox for Global Sensitivity Analysis, Environ. Modell. Software 70 (2015) 80–85, https://doi.org/10.1016/j.envsoft.2015.04.009.
[52] F. Pianosi, F. Sarrazin, T. Wagener, SAFE Toolbox, 2019, https://www.safetoolbox.info/about-us/ (Last Access Date: 02/18/2019).
[53] C. Polykretis, C. Chalkias, Comparison and evaluation of landslide susceptibility maps obtained from weight of evidence, logistic regression, and artificial neural network models, Nat. Hazards (2018), https://doi.org/10.1007/s11069-018-3299-7.
[54] C. Polykretis, C. Chalkias, M. Ferentinou, Adaptive neuro-fuzzy inference system (ANFIS) modeling for landslide susceptibility assessment in a Mediterranean hilly area, Bull. Eng. Geol. Environ. (2017), https://doi.org/10.1007/s10064-017-1125-1.
[55] P. Reichenbach, M. Rossi, B.D. Malamud, M. Mihir, F. Guzzetti, A review of statistically-based landslide susceptibility models, Earth Sci. Rev. 180 (2018) 60–91, https://doi.org/10.1016/j.earscirev.2018.03.001.
[56] S. Sachdeva, T. Bhatia, A.K. Verma, Flood susceptibility mapping using GIS-based support vector machine and particle swarm optimization: A case study in Uttarakhand (India), in: 2017 8th International Conference on Computing, Communication and Networking Technologies (ICCCNT), 3-5 July 2017, pp. 1–7, https://doi.org/10.1109/ICCCNT.2017.8204182.
[57] S.C. Satapathy, N. SriMadhavaRaja, V. Rajinikanth, A.S. Ashour, N. Dey, Multi-level image thresholding using Otsu and chaotic bat algorithm, Neural Comput. Appl. 29 (2018) 1285–1307, https://doi.org/10.1007/s00521-016-2645-5.
[58] G.I. Sayed, M. Soliman, A.E. Hassanien, Modified optimal foraging algorithm for parameters optimization of support vector machine, in: The International Conference on Advanced Machine Learning Technologies and Applications (AMLTA2018), Springer International Publishing, Cham, 2018, pp. 23–32.
[59] A. Shabri, Suhartono, Streamflow forecasting using least-squares support vector machines, Hydrol. Sci. J. 57 (2012) 1275–1293, https://doi.org/10.1080/02626667.2012.714468.
[60] H. Shahabi, M. Hashim, Landslide susceptibility mapping using GIS-based statistical models and Remote sensing data in tropical environment, Sci. Reports 5 (2015), https://doi.org/10.1038/srep09899.
[61] J. Suykens, J.V. Gestel, J.D. Brabanter, B.D. Moor, J. Vandewalle, Least Squares Support Vector Machines, World Scientific Publishing Co., Pte. Ltd., 2002, ISBN-13: 978-9812381514.
[62] J.A.K. Suykens, J. Vandewalle, Least squares support vector machine classifiers, Neural Process. Lett. 9 (1999) 293–300, https://doi.org/10.1023/a:1018628609742.
[63] V. Tam, et al., Geohazard investigation in some key areas of the northern mountainous area of Vietnam for the planning of socio-economic development, Vietnam Institute of Geosciences and Mineral Resources, Hanoi 83 (2006) 56–62.
[64] M.S. Tehrany, B. Pradhan, S. Mansor, N. Ahmad, Flood susceptibility assessment using GIS-based support vector machine model with different kernel types, CATENA 125 (2015) 91–101, https://doi.org/10.1016/j.catena.2014.10.017.
[65] A. Tharwat, A.E. Hassanien, B.E. Elnaghi, A BA-based algorithm for parameter optimization of Support Vector Machine, Pattern Recognition Lett. 93 (2017) 13–22, https://doi.org/10.1016/j.patrec.2016.10.007.
[66] D. Tien Bui, H.V. Le, N.-D. Hoang, GIS-based spatial prediction of tropical forest fire danger using a new hybrid machine learning method, Ecol. Inf. 48 (2018) 104–116, https://doi.org/10.1016/j.ecoinf.2018.08.008.
[67] D. Tien Bui, Q.P. Nguyen, N.-D. Hoang, H. Klempe, A novel fuzzy K-nearest neighbor inference model with differential evolution for spatial prediction of rainfall-induced shallow landslides in a tropical hilly area using GIS, Landslides 14 (2017) 1–17, https://doi.org/10.1007/s10346-016-0708-4.
[68] D. Tien Bui, B.T. Pham, Q.P. Nguyen, N.-D. Hoang, Spatial prediction of rainfall-induced shallow landslides using hybrid integration approach of Least-Squares Support Vector Machines and differential evolution optimization: a case study in Central Vietnam, Int. J. Digital Earth 9 (2016) 1077–1097, https://doi.org/10.1080/17538947.2016.1169561.
[69] D. Tien Bui, B. Pradhan, I. Revhaug, Tran C. Trung, A comparative assessment between the application of fuzzy unordered rules induction algorithm and J48 decision tree models in spatial prediction of shallow landslides at Lang Son City, Vietnam, in: P.K. Srivastava, S. Mukherjee, M. Gupta, T. Islam (Eds.), Remote Sensing Applications in Environmental Research, Springer International Publishing, Cham, 2014, pp. 87–111, https://doi.org/10.1007/978-3-319-05906-8_6.
[70] D. Tien Bui, et al., A novel integrated approach of relevance vector machine optimized by imperialist competitive algorithm for spatial modeling of shallow landslides, Remote Sens. 10 (2018) 1538.
[71] D. Tien Bui, T.A. Tuan, N.-D. Hoang, N.Q. Thanh, D.B. Nguyen, N. Van Liem, B. Pradhan, Spatial prediction of rainfall-induced landslides for the Lao Cai area (Vietnam) using a hybrid intelligent approach of least squares support vector machines inference model and artificial bee colony optimization, Landslides 14 (2017) 447–458, https://doi.org/10.1007/s10346-016-0711-9.
[72] M.E. Tipping, Sparse Bayesian learning and the relevance vector machine, J. Mach. Learn. Res. 1 (2001) 211–244.
[73] M.E. Tipping, SparseBayes software release for Matlab, 2009, http://www.miketipping.com/downloads.htm (Last Access Date 10/1/2018).
[74] P. Truong, T. Nghi, P. Phuc, H. Quyet, N.V. The, Geological mapping and mineral resource investigation at 1:50 000 scale for Lang Son area, Northern Geological Mapping Division, Hanoi, 2009.
[75] E. Tuba, M. Tuba, D. Simian, Adjusted bat algorithm for tuning of support vector machine parameters, in: 2016 IEEE Congress on Evolutionary Computation (CEC), 24-29 July 2016, pp. 2225–2232, https://doi.org/10.1109/CEC.2016.7744063.
[76] USGS, United States Geological Survey, How many deaths result from landslides each year?, 2019, https://www.usgs.gov/faqs/how-many-deaths-result-landslides-each-year (accessed date 10/01/2019).
[77] A.R. van Erkel, P.M.T. Pattynama, Receiver operating characteristic (ROC) analysis: Basic principles and applications in radiology, Eur. J. Radiol. 27 (1998) 88–94, https://doi.org/10.1016/S0720-048X(97)00157-5.
[78] T. van Gestel, et al., Benchmarking least squares support vector machine classifiers, Mach. Learn. 54 (2004) 5–32, https://doi.org/10.1023/B:MACH.0000008082.80494.e0.
[79] V.N. Vapnik, Statistical Learning Theory, John Wiley & Sons Inc, 1998, ISBN-10: 0471030031.
[80] VMHA, Giải pháp giảm thiệt hại từ lũ, sạt lở đất [Solutions for reducing damage from floods and landslides], Vietnam Meteorological and Hydrological Administration, 2018, http://www.kttvqg.gov.vn/tin-tuc/9766/Giai-phap-giam-thiet-hai-tu-lu,-sat-lo-dat.html (Access Date: 02/25/2019).
[81] L.-J. Wang, M. Guo, K. Sawada, J. Lin, J. Zhang, A comparative study of landslide susceptibility maps using logistic regression, frequency ratio, decision tree, weights of evidence and artificial neural network, Geosci. J. 20 (2016) 117–136, https://doi.org/10.1007/s12303-015-0026-1.
[82] Q. Wang, W. Li, M. Xing, Y. Wu, Y. Pei, D. Yang, H. Bai, Landslide susceptibility mapping at Gongliu county, China using artificial neural network and weight of evidence models, Geosci. J. 20 (2016) 705–718, https://doi.org/10.1007/s12303-016-0003-3.
[83] M.G. Winter, B. Shearer, D. Palmer, D. Peeling, C. Harmer, J. Sharpe, The economic impact of landslides and floods on the road network, Procedia Eng. 143 (2016) 1425–1434, https://doi.org/10.1016/j.proeng.2016.06.168.
[84] Y.-H. Wu, H. Shen, Grey-related least squares support vector machine optimization model and its application in predicting natural gas consumption demand, J. Comput. Appl. Math. 338 (2018) 212–220, https://doi.org/10.1016/j.cam.2018.01.033.
[85] B. Xing, R. Gan, G. Liu, Z. Liu, J. Zhang, Y. Ren, Monthly mean streamflow prediction based on bat algorithm-support vector machine, J. Hydrol. Eng. 21 (2016) 04015057, https://doi.org/10.1061/(ASCE)HE.1943-5584.0001269.
[86] X.-S. Yang, A new metaheuristic bat-inspired algorithm, in: J.R. González, D.A. Pelta, C. Cruz, G. Terrazas, N. Krasnogor (Eds.), Nature Inspired Cooperative Strategies for Optimization (NICSO 2010), Springer, Berlin, Heidelberg, 2010, pp. 65–74, https://doi.org/10.1007/978-3-642-12538-6_6.
[87] X.-S. Yang, Nature-Inspired Optimization Algorithms, Elsevier, 2014.
[88] X.-S. Yang, X. He, Bat algorithm: literature review and applications, Int. J. Bio-Inspired Comput. 5 (2013) 141–149, https://doi.org/10.1504/ijbic.2013.055093.
[89] X.S. Yang, A. Hossein Gandomi, Bat algorithm: a novel approach for global engineering optimization, Eng. Comput. 29 (2012) 464–483, https://doi.org/10.1108/02644401211235834.
[90] J.L. Zêzere, S. Pereira, R. Melo, S.C. Oliveira, R.A.C. Garcia, Mapping landslide susceptibility using data-driven methods, Sci. Total Environ. 589 (2017) 250–267, https://doi.org/10.1016/j.scitotenv.2017.02.188.
[91] C. Zhou, K. Yin, Y. Cao, B. Ahmed, Y. Li, F. Catani, H.R. Pourghasemi, Landslide susceptibility modeling applying machine learning methods: A case study from Longju in the Three Gorges Reservoir area, China, Comput. Geosci. 112 (2018) 23–37, https://doi.org/10.1016/j.cageo.2017.11.019.
[92] L.Y. Zhou, F.P. Shan, K. Shimizu, T. Imoto, H. Lateh, K.S. Peng, A comparative study of slope failure prediction using logistic regression, support vector machine and least square support vector machine models, in: AIP Conference Proceedings 1870 (2017) 060012, https://doi.org/10.1063/1.4995939.
[93] X. Zhu, S.-Q. Ma, Q. Xu, W.-D. Liu, A WD-GA-LSSVM model for rainfall-triggered landslide displacement prediction, J. Mountain Sci. 15 (2018) 156–166, https://doi.org/10.1007/s11629-016-4245-3.