Evaluating machine learning and statistical prediction techniques for landslide susceptibility modeling

Computers & Geosciences 81 (2015) 1–11
Journal homepage: www.elsevier.com/locate/cageo

J.N. Goetz a,b,d,*, A. Brenning d,b, H. Petschko c, P. Leopold a

a Health and Environment Department, AIT-Austrian Institute of Technology GmbH, Konrad-Lorenz-Straße 24, 3430 Tulln, Austria
b Department of Geography and Environmental Management, University of Waterloo, 200 University Avenue West, Waterloo, Ontario, Canada N2L 3G1
c Department of Geography and Regional Research, University of Vienna, Universitätsstraße 7, A-1010 Vienna, Austria
d Department of Geography, Friedrich Schiller University Jena, Loebdergraben 32, 07743 Jena, Germany

Article history: Received 6 June 2014; received in revised form 14 January 2015; accepted 16 April 2015; available online 20 April 2015

Abstract

Statistical and, more recently, machine learning prediction methods have been gaining popularity in the field of landslide susceptibility modeling. In particular, these data-driven approaches show promise for mapping landslide-prone areas in large regions that may lack the geotechnical data required for physically-based methods. Currently, there is no single best method for empirical susceptibility modeling. This study therefore presents a comparison of traditional statistical and novel machine learning models applied to regional-scale landslide susceptibility modeling. These methods were evaluated by spatial k-fold cross-validation estimation of the predictive performance, by assessment of variable importance for gaining insights into model behavior, and by the appearance of the prediction (i.e., susceptibility) map. The modeling techniques applied were logistic regression (GLM), generalized additive models (GAM), weights of evidence (WOE), the support vector machine (SVM), random forest classification (RF), and bootstrap-aggregated classification trees (bundling) with penalized discriminant analysis (BPLDA). These modeling methods were tested in three areas of the province of Lower Austria, Austria, which are characterized by different geological and morphological settings. Random forest and bundling classification techniques had the overall best predictive performances. However, the performances of the modeling techniques were, for the most part, not significantly different from each other; depending on the area of interest, differences in the overall median estimated area under the receiver operating characteristic curve (AUROC) ranged from 2.9 to 8.9 percentage points (pp), and differences in the overall median estimated true positive rate (TPR) at a 10% false positive rate (FPR) ranged from 11 to 15 pp. The relative importance of each predictor generally differed between the modeling methods. However, slope angle, surface roughness and plan curvature were consistently highly ranked variables. The prediction methods that create splits in the predictors (RF, BPLDA and WOE) resulted in heterogeneous prediction maps full of spatial artifacts. In contrast, the GAM, GLM and SVM produced smooth prediction surfaces. Overall, it is suggested that the framework of this model evaluation approach can be applied to assist in the selection of a suitable landslide susceptibility modeling technique.

© 2015 Elsevier Ltd. All rights reserved.

Keywords: Statistical and machine learning techniques; Landslide susceptibility modeling; Spatial cross-validation; Variable importance

1. Introduction

Mitigating the impacts of landslides remains a great challenge for land-use planners and policy makers. Landslide susceptibility models, which are used to derive maps of locations prone to landslides, can support and enhance spatial planning decisions focused on reducing landslide hazards. Currently, there is a vast

* Corresponding author at: Department of Geography, Friedrich Schiller University Jena, Loebdergraben 32, 07743 Jena, Germany. E-mail address: [email protected] (J.N. Goetz).

http://dx.doi.org/10.1016/j.cageo.2015.04.007
0098-3004/© 2015 Elsevier Ltd. All rights reserved.

selection of quantitative methods applied for spatial modeling and prediction of landslide susceptibility (Chung and Fabbri, 1999; Guzzetti et al., 1999; Dai et al., 2002; van Westen et al., 2003; Brenning, 2005; Goetz et al., 2011; Pradhan, 2013). Quantitative methods for modeling landslide susceptibility can be generalized into physically-based and statistical approaches (Soeters and van Westen, 1996; van Westen et al., 1997). This study focuses on statistical and machine learning techniques, which have become common approaches for modeling landslide susceptibility over large regions (van Westen et al., 1997; Brenning, 2005; Petschko et al., 2014; Micheletti et al., 2014). The basic assumption of the empirical approach is that future

landslides are likely to occur under conditions similar to those of past landslides (Varnes, 1984). Typically, a range of predictors (i.e., independent variables) is used to represent landslide preparatory conditions (van Westen et al., 2008). The exact relationship of the predictors to the response (i.e., landslide presence/absence) is not always well known a priori. In most cases, the predictors are proxies for conditions and processes that are difficult to measure across large regions (Pachauri and Pant, 1992; Guzzetti et al., 1999; Goetz et al., 2011). The susceptibility model output is a prediction surface or map that spatially represents the distribution of predicted values, usually as probabilities distributed across grid cells. Deciding which modeling method is most suitable for a particular application is challenging. Numerous comparisons of susceptibility modeling methods have been conducted, yet no single best method has been identified (Brenning, 2005; Yesilnacar and Topal, 2005; Lee et al., 2007; Yilmaz, 2009, 2010; Yalcin et al., 2011; Goetz et al., 2011; Pradhan, 2013). The search for the optimal susceptibility modeling method is a complicated one and should not consider model accuracy alone. Robustness to sampling variation and adequacy in describing processes associated with landslides are also crucial model properties (Frattini et al., 2010). The simplest approach to selecting an optimal model for prediction is to compare the error rates estimated from cross-validation, where the modeling method with the lowest error estimate is selected as the best one to use. This assessment of the prediction performance is also viewed as essential for a model to have any practical or scientific significance (Chung and Fabbri, 2003; Guzzetti et al., 2006). There are a variety of measures to assess the accuracy of landslide susceptibility models.
Common ones are derived from success rate curves, prediction rate curves (Chung and Fabbri, 2003) or receiver operating characteristic (ROC) curves (Brenning, 2005; Beguería, 2006; Gorsevski et al., 2006; Frattini et al., 2010). It is necessary to carefully select a suitable performance measure. Ideally, this measure should communicate performance in the context of the model application (Brenning, 2012a). Performance should also be assessed using test sets that are independent of the training set used to build the prediction model, resampling-based estimation methods such as cross-validation being the state of the art (Brenning, 2012a): cross-validation utilizes the entire dataset for training and testing the model. The ability to communicate model behavior is a desirable quality for landslide susceptibility models (Brenning, 2012a). Generally speaking, users feel more comfortable in the practical application of a model if they understand how the model works. The ability of a model to adequately describe the system behavior can be assessed by determining how well the predictors represent the processes associated with landslides (Frattini et al., 2010). In statistical methods, this is relatively straightforward compared to machine learning models. The model coefficients from generalized linear models can be used to evaluate the relative importance of landslide predictors (Dai and Lee, 2002; Ayalew and Yamagishi, 2005). Variable importance has also been estimated for regression models by observing the relative frequencies of variable selection when an automatic stepwise variable selection method is applied and tested with cross-validation (Brenning, 2009; Goetz et al., 2011; Petschko et al., 2014). In contrast, the internal mechanisms defining the representation of the response by the predictors are difficult to interpret in machine learning models because of their 'black box' nature. Micheletti et al. (2014) demonstrated how some feature selection properties of different machine learning techniques can be implemented to assess the relative importance of variables for landslide susceptibility modeling. However, since their approach applied feature selection methods specific to each machine learning technique, comparing variable importance with other modeling

techniques can be challenging. A standardized approach for comparing the relative variable importance of different statistical and machine learning modeling techniques for geospatial problems was demonstrated by Brenning et al. (2012). They assessed variable importance using internal estimates of changes in error rates obtained by randomly permuting predictors in out-of-bag samples (Breiman, 2001; Strobl et al., 2007). There are many criteria that can be considered for model selection in the context of landslide susceptibility (Brenning, 2012a). This study focuses on one particular aspect, the predictive performance. Therefore, a rigorous assessment of prediction performance is performed on various statistical and machine learning techniques in an attempt to determine the 'best' predictive model. The modeling techniques include logistic regression (GLM), generalized additive models (GAM), weights of evidence (WOE), the support vector machine (SVM), random forest classification (RF), and bootstrap-aggregated classification trees (bundling) with penalized discriminant analysis (BPLDA). The importance of predictor variables in each model is also analyzed to demonstrate how a standard measure of variable importance can be applied to communicate and compare model behavior, even when a model is considered a 'black box'. The main objective of this paper is to demonstrate an approach for making a rigorous comparison of landslide susceptibility models for the purpose of spatial prediction.

2. Materials and methods

2.1. Study area

Multiple areas of interest (AOIs) were selected to observe model behavior under different landslide conditions. The modeling techniques were tested on AOIs that were each 50 km² and within the province of Lower Austria (Fig. 1). The Molasse AOI (Fig. 1a) is located in a relatively low-lying basin (< 300 m a.s.l.). It mainly consists of sand and clay sediments, sandstones, claystones, and marls. These bedrock materials can be covered by Quaternary gravels and eolian sediments (loess). Deep-seated and shallow landslides occur in this area. The Austroalpine AOI (Fig. 1b), which includes the Upper Austroalpine lithology units, is made up of predominantly steep terrain and has the highest elevations in Lower Austria (1000–2000 m a.s.l.). The lithology is dominated by limestone and dolomite rock, with some interbedded strata of claystone and marl. Landslides in the Austroalpine area are typically shallow. Generally, the Flysch AOI (Fig. 1c) is very susceptible to landslide activity (Petschko et al., 2014). This low mountain region has exceptionally undulating terrain and consists of sedimentary rocks that are made up of layers of sandstones, marls and claystones. The main triggers for landslides in Lower Austria are intense rainfall and rapid snowmelt events (Schwenk, 1992; Schweigl and Hervás, 2009). For more details on the lithology and geology of Lower Austria, please refer to Gottschling (2006) and Wessely (2006).

2.2. Landslide inventory

The landslides in this analysis are a subset of an inventory for Lower Austria previously published by Petschko et al. (2014), which consists of mapped initiation areas for deep-seated and shallow landslides. These landslides were mapped in a geographic information system (GIS) using topographic derivatives (e.g., hillshade and slope angle maps) of an airborne laser scanning (ALS) digital terrain model (DTM) with a 1 m × 1 m spatial resolution, which was acquired between 2006 and 2009. The general procedure for mapping these landslides was similar to Schulz (2004, 2007). This inventory consists of points (single grid cells)


Fig. 1. A lithological map of Lower Austria indicating the AOIs (black rectangles) and landslide inventories.

that mark the landslide scarp. Only one point was mapped per landslide to provide equal treatment of large and small landslide occurrences, to reduce the effect of spatial autocorrelation between observations, to increase mapping effectiveness and to avoid uncertainties in mapping landslide boundaries (Petschko et al., 2014). Further details on the landslide inventory and its quality can be found in Petschko et al. (2013, 2014). The numbers of landslides for the three AOIs used in this study are 261 in the Molasse area, 193 in the Austroalpine area and 285 in the Flysch area.

2.3. Predictor variables

Terrain analysis commonly forms the basis of quantitative landslide modeling (Dai and Lee, 2002; Gorsevski et al., 2006; Van den Eeckhaut et al., 2006; Muenchow et al., 2012; Petschko et al., 2014). Terrain attributes derived from elevation models function as surrogates for surface processes and geophysical site conditions, simplifying complex geomorphological relationships into meaningful predictors (Pachauri and Pant, 1992; Moore et al., 1991; Lineback Gritzner et al., 2001). Goetz et al. (2011) demonstrated that empirical models based on terrain attributes alone could outperform traditional physically-based models such as SHALSTAB and the factor of safety. The terrain attributes slope angle, elevation, profile curvature, plan curvature, catchment area (C. area), catchment height (C. height), convergence index (Conv. Ind.), topographic wetness

index (TWI; Beven and Kirkby, 1979), slope aspect and surface roughness (SDS) were used as predictors in this study. These were derived from a 10 m × 10 m spatial resolution ALS-DTM, which is also the resolution at which the modeling in this analysis was performed. All except surface roughness were processed from the DTM with SAGA GIS (Conrad, 2006). Catchment area, which was extremely skewed, was logarithmically transformed. Slope aspect was separated into 'east-exposedness' (sine transformation) and 'north-exposedness' (cosine transformation; Brenning and Trombotto, 2006). Surface roughness can be used as a variable that may distinguish between types of landslide activity (Glenn et al., 2006). For this study, we define surface roughness as the variation of slope angle at a given scale (Atkinson and Massari, 1998). It was calculated as the standard deviation of slope (SDS) in a 3 × 3 moving window. The predictor variables are summarized in Table 1. Predictor variables sensitive to temporal changes, e.g. land use and rainfall, were not included in the analysis because the landslide inventory does not record when the landslides were triggered (Petschko et al., 2014); therefore, a meaningful empirical relationship to such temporally sensitive predictors cannot be established. In addition to the terrain attributes, lithological units (map scale 1:200,000; Schnabel, 2002) were included as indicator variables. The strong generalization at this map scale is likely to reduce the contribution of lithology to the predictive performance.
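To make the predictor transformations above concrete, the following NumPy sketch derives the transformed predictors from raw grids. It is an illustration only (the study used SAGA GIS); the function name and the simple windowed loop are our own, and the sign conventions for the aspect components are assumptions that may differ from Table 1.

```python
import numpy as np

def transform_predictors(catchment_area, aspect_deg, slope_deg):
    """Derive the transformed predictors of Section 2.3 from raw grids.

    catchment_area : upslope contributing area per cell (m^2), skewed
    aspect_deg     : slope aspect in degrees clockwise from north
    slope_deg      : 2-D slope angle grid in degrees
    """
    # Logarithmic transform of the extremely skewed catchment area
    log_carea = np.log10(catchment_area)

    # Split aspect into two linear components (Brenning and Trombotto, 2006);
    # the signs used here are assumptions, not taken from the paper
    north_exposedness = np.cos(np.radians(aspect_deg))
    east_exposedness = np.sin(np.radians(aspect_deg))

    # Surface roughness: standard deviation of slope (SDS) in a 3 x 3 window
    padded = np.pad(np.asarray(slope_deg, dtype=float), 1, mode="edge")
    sds = np.empty(np.shape(slope_deg), dtype=float)
    for i in range(sds.shape[0]):
        for j in range(sds.shape[1]):
            sds[i, j] = padded[i:i + 3, j:j + 3].std()

    return log_carea, north_exposedness, east_exposedness, sds
```

A constant-slope grid, for instance, yields SDS = 0 everywhere, while rough terrain raises SDS locally.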


Table 1. Descriptive statistics (median and interquartile range, in parentheses) of the predictor variables in each AOI and the sample size n of landslide points. Values are given as landslide/non-landslide points.

Variable                     Flysch (n = 285)            Molasse (n = 261)           Austroalpine (n = 193)
Slope angle (deg)            21 (9)/15 (8)               15 (7)/7 (5)                28 (8)/27 (12)
Elevation (m)                507 (95)/541 (124)          291 (28)/310 (47)           537 (167)/666 (185)
Profile curv. (10^-3 m^-1)   -4.8 (21.4)/0.5 (6.8)       0.4 (18.5)/0.0 (2.8)        -9.7 (23.7)/-0.6 (9.1)
Plan curv. (10^-3 m^-1)      -6.1 (19.5)/0.4 (6.0)       -2.1 (12.5)/0.2 (3.2)       -11.6 (24.6)/0.7 (9.1)
C. area (log10 m^2)          3.4 (0.4)/3.2 (0.5)         3.0 (0.4)/3.0 (0.5)         3.3 (0.5)/3.0 (0.5)
C. height (m)                32 (24)/21 (22)             9 (8)/8 (10)                45 (37)/30 (40)
Conv. Ind.                   -5.1 (18.7)/1.3 (25.4)      -3.2 (23.1)/2.3 (28.5)      -5.5 (18.2)/4.2 (25.2)
TWI                          8.5 (1.3)/8.6 (1.5)         8.5 (1.5)/9.4 (2.3)         8.2 (1.1)/7.7 (1.4)
South(+1)–north(-1)          -0.45 (1.24)/-0.11 (1.50)   0.65 (1.33)/0.40 (1.49)     -0.25 (1.3)/0.23 (1.6)
East(+1)–west(-1)            0.19 (1.14)/-0.03 (1.22)    -0.15 (1.02)/-0.01 (1.21)   0.50 (0.95)/0.09 (1.27)
SDS                          2.5 (1.9)/1.5 (1.5)         2.8 (2.6)/0.86 (1.4)        2.4 (2.1)/1.6 (1.9)

2.4. Susceptibility modeling techniques

Six statistical and machine learning techniques are compared in this study: a generalized linear model (GLM) with stepwise variable selection (Hosmer and Lemeshow, 2000); the generalized additive model with stepwise variable selection (GAM; Hastie and Tibshirani, 1990); weights of evidence (WOE; Bonham-Carter, 1994); the support vector machine (SVM; Moguerza and Muñoz, 2006); random forest (RF; Breiman, 2001); and bootstrap-aggregated classification trees (bundling) with penalized linear discriminant analysis (BPLDA; Hastie et al., 1995; Hothorn and Lausen, 2005). While each of these methods could potentially be used with a variety of settings and procedures for model selection, we chose configurations that we consider typical of most of their applications. The GLM with a logistic link function, or logistic regression, is the most common statistical technique for the prediction of landslide susceptibility (Brenning, 2005). It was first applied to landslide susceptibility modeling by Atkinson and Massari (1998) and Guzzetti et al. (1999). In contrast, generalized additive models (GAM) have only recently been applied to landslide susceptibility mapping (Goetz et al., 2011; Muenchow et al., 2012). A GAM is a semi-parametric nonlinear extension of a GLM (Hastie and Tibshirani, 1990). In our application of the GLM and GAM we use forward–backward stepwise variable selection based on the Akaike Information Criterion (AIC; Akaike, 1974). Weights of evidence (WOE) is a non-linear statistical technique based on the log-linear form of the Bayesian probability model (Bonham-Carter, 1994). An early application of WOE to landslide susceptibility modeling was completed by Lee et al. (2002).
The weights, which represent the statistical relationship of a predictor variable to the response, are defined as the natural logarithm of the relative risk, i.e. the ratio of the conditional probabilities of the presence versus absence of a response given a set of discrete, categorical variables. The conditional probabilities are calculated using Bayes' theorem. This technique requires continuous variables to be classified into a set of categories. We use an automatic approach that 'bins' the data into quartiles, which is believed to provide a reasonable trade-off between model flexibility (more bins) and sufficient data availability for estimating the weights. The WOE method assumes that the predictor variables are conditionally independent (Bonham-Carter, 1994). Instead of statistically testing these assumptions (Bonham-Carter, 1994; Agterberg and Cheng, 2002), which is rarely done in practical applications and would require additional questionable assumptions such as spatial independence, we use this technique as a purely predictive method and interpret the predicted 'probabilities' only as a relative index of susceptibility (Neuhäuser and Terhorst, 2007).
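As an illustration of the quartile-binning WOE variant described above, the following sketch computes positive weights for a single continuous predictor. This is a minimal reading of the method under our own naming; the study's WOE implementation may differ in detail.

```python
import numpy as np

def woe_weights(x, y, n_bins=4):
    """Positive weights of evidence for one continuous predictor.

    x : predictor values at the training points
    y : 1 = landslide presence, 0 = absence
    The predictor is binned into quartiles; for each bin B the weight is
    W+ = ln[ P(B | presence) / P(B | absence) ], a log relative risk.
    """
    edges = np.quantile(x, [0.25, 0.5, 0.75])
    bins = np.digitize(x, edges)  # quartile index 0..3 for each point
    y = np.asarray(y)
    weights = {}
    for b in range(n_bins):
        in_bin = bins == b
        p_presence = in_bin[y == 1].mean()  # P(B | landslide)
        p_absence = in_bin[y == 0].mean()   # P(B | no landslide)
        weights[b] = np.log(p_presence / p_absence)
    return edges, weights
```

Bins in which landslides are over-represented receive positive weights; summing a cell's weights over all predictors yields its relative susceptibility index.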

The support vector machine (SVM) is a machine learning technique based on the discrimination of classes in a high-dimensional feature space that is generated through nonlinear transformations of the predictors (Vapnik, 1998). In this high-dimensional space, a decision hyperplane is computed to separate the prediction classes. Brenning (2005) demonstrated the potential use of the SVM for susceptibility modeling. Our implementation of the SVM used the default parameter settings of the R package e1071: the regularization parameter was C = 1, and the kernel bandwidth was γ = 1/p, where p is the number of predictors. SVM parameter tuning was not performed since this process does not necessarily improve model performance and may result in poorly defined, or highly variable, optimal parameters when comparing different cross-validation repetitions (Brenning et al., 2009, 2012). Classification trees are a nonlinear technique for predicting a response using a set of binary decision rules that determine class assignment based on the predictors (Breiman et al., 1984). Random forest (RF) is an ensemble technique that utilizes many classification trees (a 'forest') to stabilize the model predictions (Breiman, 2001). These trees are fitted to resamples of the observations that are selected randomly with replacement (bootstrap resampling). Each decision of a tree is furthermore based on a random subset of the predictors. The class assignment is determined by majority voting among all trees, and the proportion of trees in the ensemble that predict landslide presence can be used as an index of landslide susceptibility. RF was recently applied to landslide susceptibility modeling by Ließ et al. (2011). Bundling is another ensemble classification tree technique (Hothorn and Lausen, 2005). Like RF, bundling uses a bootstrap-aggregation approach; however, an ancillary classifier is trained on the part of the training set that is not included in the bootstrap resample.
This ancillary classifier is then used as an additional predictor in constructing a tree. In this study we use penalized linear discriminant analysis (PLDA) as the ancillary classifier within the bundling approach (BPLDA). PLDA is a discriminant analysis technique designed for high-dimensional data and correlated predictors. It avoids overfitting by applying smoothing constraints to the coefficients of the predictor variables (Hastie et al., 1995). Bundling with PLDA has not yet been applied to landslide susceptibility modeling; however, Brenning (2009) utilized it in another geomorphological classification context, and bundling with other ancillary classifiers was included in a model comparison by Brenning (2005).

2.4.1. Assessing prediction performance

The performances of the susceptibility models were estimated with a repeated k-fold spatial cross-validation approach (Brenning et al., 2012). This approach is similar to k-fold cross-validation,


where the data is randomly partitioned into k disjoint sets, and one set at a time is used for model testing while the combined remaining k − 1 sets are used for model training (e.g., James et al., 2013). However, instead of partitioning the dataset into k random subsets, spatial cross-validation splits the data into spatially disjoint sub-areas. In this study, these partitions were formed with the k-means clustering algorithm (Ruß and Brenning, 2010). The estimation of model performance was repeated 20 times using k = 5 spatial cross-validation folds. Each model training and testing was based on the commonly applied 1:1 sampling strategy of presence to absence of landslide initiation (Heckmann et al., 2014). Other sampling strategies have been investigated by Heckmann et al. (2014) and Regmi et al. (2014) for landslide susceptibility modeling using logistic regression; however, both conclude that the sampling ratio of presence to absence of landslides does not significantly influence the prediction accuracies. The performance measure estimated with spatial cross-validation was the area under the receiver operating characteristic (ROC) curve (AUROC). The ROC curve plots all possible true positive rates (TPR; sensitivity) against the corresponding false positive rates (FPR; 1 − specificity). AUROC values close to 50% indicate no discrimination, while an AUROC close to 100% indicates perfect discrimination between the binary prediction classes. The AUROC is independent of a specific decision threshold (Zweig and Campbell, 1993; Beguería, 2006). In addition, to assess the ability of the models to predict the occurrence of landslide initiation while correctly classifying most non-landslide locations as stable, the TPR at a low FPR of 10% (i.e., a high specificity of 90%) was measured (Brenning, 2012a). The differences in model performance were tested for statistical significance to show that they are not merely the result of random variability.
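A minimal sketch of the spatial partitioning and of the threshold-independent AUROC may help clarify the procedure. The k-means here is a basic Lloyd's algorithm standing in for the partitioning of Ruß and Brenning (2010), and the AUROC uses its Mann–Whitney interpretation; all names are illustrative, not the sperrorest implementation.

```python
import numpy as np

def kmeans_partition(coords, k, n_iter=50, seed=42):
    """Partition sample coordinates into k spatially disjoint folds with
    a basic Lloyd's k-means on the point coordinates."""
    coords = np.asarray(coords, dtype=float)
    rng = np.random.default_rng(seed)
    centers = coords[rng.choice(len(coords), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest cluster center
        dist = np.linalg.norm(coords[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Move each center to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centers[j] = coords[labels == j].mean(axis=0)
    return labels

def auroc(pos_scores, neg_scores):
    """AUROC via its Mann-Whitney interpretation: the probability that a
    random presence point scores higher than a random absence point
    (ties count one half)."""
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    wins = (pos > neg).sum() + 0.5 * (pos == neg).sum()
    return wins / (pos.size * neg.size)
```

In the study's setup, each of the k = 5 spatial folds would in turn serve as the test set, with the remaining folds used for training, and the whole procedure repeated 20 times.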
The statistical comparison of the AUROC and TPR at 10% FPR performances of the different susceptibility models was based on the non-parametric Kruskal–Wallis test for systematic differences among a set of variables. The Wilcoxon–Mann–Whitney rank sum test was then applied to detect pairwise differences in model performance. An adjustment for multiple comparisons was applied using the Benjamini–Hochberg procedure to control the false discovery rate, i.e. the expected proportion of falsely rejected null hypotheses among all rejected null hypotheses (FDR; Benjamini and Hochberg, 1995; Brenning, 2009). The modeling and statistical analysis was conducted entirely in R, a free software environment for statistical computing (version 3.0.0; R Development Core Team, 2013), with the contributed packages 'sperrorest' (Brenning, 2012b), 'e1071' (Dimitriadou et al., 2007; Chang and Lin, 2011), 'gam' (Hastie, 2009), 'ipred' (Peters and Hothorn, 2009), 'mda' (Hastie and Tibshirani, 2009), 'randomForest' (Breiman and Cutler, 2012), 'raster' (Hijmans and van Etten, 2013), and 'ROCR' (Sing et al., 2009).

2.4.2. Estimating which predictors are important

A permutation-based variable accuracy importance approach, which computes how much a performance measure deteriorates when an individual variable is randomly permuted (i.e., 'messed up'; Strobl et al., 2007), was used to assess variable importance across all model types. As applied in a geomorphological classification study by Brenning et al. (2012), we measured variable importance in a spatial cross-validation context. This permutation-based spatial variable importance (SVI) measure was calculated as the change in median AUROC estimated by spatial cross-validation. In this approach, one predictor variable was permuted ten times for each test partition, for a total of 1000 permutations per predictor, and the AUROC of the prediction for each permutation was measured and compared to the unperturbed predictive performance.
The SVI was standardized for each model by dividing it by the highest value obtained for an individual predictor.
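The permutation step on a single test partition can be sketched as follows. Here `model_auroc` is a hypothetical callable wrapping a fitted model's AUROC on a test set, and the final division mirrors the standardization by the highest per-predictor value described above.

```python
import numpy as np

def permutation_importance(model_auroc, X_test, y_test, n_perm=10, seed=1):
    """Permutation-based variable importance on one test partition.

    model_auroc : callable(X, y) -> AUROC of the fitted model on (X, y)
                  (hypothetical wrapper around any of the six models)
    Each predictor column is shuffled n_perm times; the mean drop in
    AUROC relative to the unperturbed performance is its importance.
    """
    rng = np.random.default_rng(seed)
    base = model_auroc(X_test, y_test)
    drops = np.zeros(X_test.shape[1])
    for j in range(X_test.shape[1]):
        for _ in range(n_perm):
            X_perm = X_test.copy()
            rng.shuffle(X_perm[:, j])  # 'mess up' predictor j only
            drops[j] += (base - model_auroc(X_perm, y_test)) / n_perm
    # Standardize so the most important predictor scores 1 (Section 2.4.2)
    return drops / drops.max()
```

A predictor whose shuffling barely changes the AUROC is unimportant to that model, regardless of whether the model itself is interpretable.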


The SVI values therefore range from 0 to 1, with 0 indicating low and 1 indicating high relative importance.

2.4.3. Mapping landslide susceptibility

Maps of the model predictions were produced for each AOI and modeling technique to facilitate a visual comparison of the model outputs. Since the models were fitted using a 1:1 sampling ratio of landslide to non-landslide points, the predicted unadjusted probabilities should be interpreted as relative scores (see Petschko et al., 2014 for possible adjustments). Since such models are typically presented as levels of susceptibility, the probabilities were classified into five classes representing the relative potential for landslide initiation. An approach based on an equal percentage of overall area for each class was applied in this study to facilitate the visual comparison of the output predictions (Chung and Fabbri, 2003). The probabilities were classified based on the 50th, 75th, 90th and 95th percentiles of the prediction values.
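The percentile-based classification of the prediction values can be sketched in a few lines (illustrative names; the study's maps were produced in R):

```python
import numpy as np

def classify_susceptibility(pred):
    """Classify relative susceptibility scores into five map classes using
    the 50th, 75th, 90th and 95th percentiles (Section 2.4.3)."""
    breaks = np.percentile(pred, [50, 75, 90, 95])
    # Class 1 = lowest 50% of the area, ..., class 5 = top 5%
    return np.digitize(pred, breaks) + 1
```

Because the breaks are percentiles of the predictions themselves, each class covers a fixed share of the mapped area, which makes maps from different models visually comparable.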

3. Results

3.1. The predictive performances

The variation in median AUROC performance between the models was relatively low for the Flysch and Molasse AOIs: differences in median AUROC between models were only up to 2.9 and 3.6 percentage points (pp; Table 2), respectively. In contrast, differences in median AUROC between models in the Austroalpine AOI were up to 8.9 pp. The tree-based ensemble techniques, RF and BPLDA, had the highest model performance based on median AUROC values for all three AOIs (Figs. 2 and 3, Table 2). RF was overall the highest-ranked model based on median AUROC and median TPR at 10% FPR. It achieved the highest median AUROC for the Flysch (86.3%) and Molasse (93.0%) areas, and the lowest AUROC IQR for all AOIs. RF also had the highest median TPR at 10% FPR for all AOIs: Flysch 64.3%, Molasse 79.1% and Austroalpine 56.5%. The model performance of BPLDA was similar to RF; it always achieved a top-three estimate of median AUROC and median TPR at 10% FPR for all AOIs, usually just behind RF. The performance ranks of WOE, SVM, and GAM were much more variable than those of RF and BPLDA. GAM had the second-highest median TPR at 10% FPR in the Flysch area (61.5%); however, it had the second-lowest median TPR at 10% FPR for the remaining AOIs. WOE usually performed in the middle of the pack, just behind SVM. The lowest model performance was consistently achieved by the GLM, which had the lowest estimated median AUROC and the highest IQR values of all models in the AOIs. Overall, the differences in AUROC performance were not only small, but in many cases not statistically significant (Table 2). Pairwise comparisons of the AUROC differences showed, in all areas, no significant systematic differences between RF and BPLDA or between SVM and WOE. The GLM AUROC performance, in contrast, was always significantly lower than that of the other models in all AOIs.

3.2. A ranking of predictor importance

The ranking of relative variable importance was substantially different for all modeling techniques and AOIs. However, there was some consistency in the set of highest-ranked predictors for each area (Table 3). Slope angle, surface roughness (SDS) and plan curvature were the only variables ranked in the top five based on maximum SVI in all of the AOIs (SVI > 0.28). The highest-ranked predictor based on maximum SVI was always slope angle (SVI = 1.00). The most consistently lower-ranked variables


Table 2. Model performance estimated with 20-repeated 5-fold spatial cross-validation. The median summarizes the central tendency of the estimated performance and the interquartile range (IQR) its spread. The systematic difference (Δ, in percentage points, pp) and the associated significance of model performance (AUROC and TPR at 10% FPR) were measured for pairwise comparisons; each Δ and p-value compares a model with the model ranked immediately below it. The significance of systematic differences is based on the adjusted p-values (controlling the FDR) from the Wilcoxon–Mann–Whitney tests.

AUROC, median (IQR) in %:

FLYSCH
Model   Median (IQR)   Δ (pp)   p-value
GLM     83.4 (6.4)     -        -
SVM     84.6 (6.0)     +1.3     0.030*
WOE     84.9 (6.0)     +0.3     0.468
GAM     85.1 (5.4)     +0.2     0.155
BPLDA   85.3 (5.2)     +0.1     0.471
RF      86.3 (4.1)     +1.0     0.094·

MOLASSE
Model   Median (IQR)   Δ (pp)   p-value
GLM     89.4 (7.7)     -        -
GAM     91.9 (7.3)     +2.5     < 0.001***
WOE     92.1 (7.0)     +0.2     0.358
SVM     92.3 (6.9)     +0.3     0.900
BPLDA   92.5 (5.7)     +0.1     0.809
RF      93.0 (4.9)     +0.5     0.707

AUSTROALPINE
Model   Median (IQR)   Δ (pp)   p-value
GLM     74.7 (12.1)    -        -
GAM     76.6 (9.9)     +2.0     0.092·
WOE     80.1 (9.0)     +3.5     0.015*
SVM     80.4 (9.2)     +0.3     0.878
RF      83.5 (5.5)     +3.2     < 0.001***
BPLDA   83.6 (6.8)     +0.0     0.977

TPR at 10% FPR, median (IQR) in %:

FLYSCH
Model   Median (IQR)   Δ (pp)   p-value
GLM     53.2 (25.8)    -        -
WOE     54.4 (22.5)    +1.2     0.025*
SVM     58.9 (18.8)    +4.6     0.012*
BPLDA   59.2 (12.5)    +0.3     0.840
GAM     61.5 (25.3)    +2.2     0.980
RF      64.3 (18.7)    +2.8     0.034*

MOLASSE
Model   Median (IQR)   Δ (pp)   p-value
GLM     64.1 (28.3)    -        -
GAM     75.2 (29.9)    +11.1    0.012*
SVM     76.5 (33.1)    +1.3     0.911
BPLDA   76.7 (25.3)    +0.2     0.747
WOE     77.8 (35.7)    +1.1     0.911
RF      79.1 (19.8)    +1.3     0.747

AUSTROALPINE
Model   Median (IQR)   Δ (pp)   p-value
GLM     43.9 (26.1)    -        -
GAM     46.6 (22.0)    +2.7     0.499
WOE     50.0 (20.7)    +3.4     0.103
SVM     50.0 (23.5)    +0.0     0.464
BPLDA   55.9 (18.7)    +5.9     0.008**
RF      56.5 (16.2)    +0.6     0.926

Significance codes for adjusted p-values: p < 0.001 "***", p < 0.01 "**", p < 0.05 "*", p < 0.1 "·", p > 0.1 " ".

(SVI ≤ 0.15) were the convergence index (Conv. Ind.), south–north slope aspect and catchment height (C. height). Slope angle (max. SVI = 1.00), catchment area (C. area; 1.00) and plan curvature (0.83) were the highest ranked variables in the Flysch area. In the Molasse area, slope angle (1.00), surface roughness (0.75) and elevation (0.60) were the highest ranked variables; in the Austroalpine area they were slope angle (1.00), profile curvature (1.00) and surface roughness (1.00).

There was no pattern in how SVI was distributed for each modeling technique across the three AOIs. However, SVI was distributed more uniformly in the AOIs with lower AUROC values. The Austroalpine area, which had the lowest AUROC performances (mean median-AUROC = 79.8), had the largest number of variables with an SVI > 0.15 (mean = 6). The Flysch area had a mean median-AUROC of 84.9 and a mean of 5 variables with an SVI > 0.15. The Molasse area, with the highest AUROC performances (mean median-AUROC = 91.9), generally had the smallest number of variables with an SVI > 0.15 (mean = 4).

Correlations between predictor variables were examined with Spearman's rank correlation coefficient (ρSp). In each AOI, catchment area was strongly correlated with catchment height, convergence index and TWI (0.59 ≤ |ρSp| ≤ 0.85); plan and profile curvature were moderately to strongly correlated (0.59 ≤ |ρSp| ≤ 0.85); and catchment height, convergence index and topographic wetness were moderately correlated (0.41 ≤ |ρSp| ≤ 0.68). Slope angle was only moderately correlated with other predictors, TWI (ρSp = −0.65) and SDS (ρSp = 0.66), in the Molasse AOI.
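The correlation screening described above can be sketched with synthetic data. The variable names and the generating relationships below are invented stand-ins for the real terrain attributes; scipy's `spearmanr` computes the pairwise ρSp matrix:

```python
import numpy as np
from scipy.stats import spearmanr

# synthetic stand-ins for three terrain attributes; the monotone
# relationships below are hypothetical, for illustration only
rng = np.random.default_rng(1)
n = 2000
slope = rng.uniform(0.0, 45.0, n)                      # slope angle (degrees)
twi = 12.0 - 0.2 * slope + rng.normal(0.0, 1.0, n)     # wetter where flatter
c_area = np.exp(0.3 * twi + rng.normal(0.0, 0.3, n))   # larger catchments where wetter

X = np.column_stack([slope, twi, c_area])
rho, _ = spearmanr(X)   # 3 x 3 matrix of pairwise rank correlations
# rho[0, 1] is the slope-TWI coefficient (negative here, as in the Molasse AOI)
```

Because Spearman's coefficient depends only on ranks, the monotone exponential transform of `c_area` does not weaken its correlation with TWI.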

Fig. 2. Box-and-whisker plot of area under the receiver operating characteristic curve (AUROC %) estimated for each prediction technique applied to landslide susceptibility modeling in different AOIs.
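The pairwise significance assessment summarized in Table 2 (Wilcoxon–Mann–Whitney tests on cross-validated AUROC distributions, with Benjamini–Hochberg control of the FDR) can be sketched as follows. The fold-level AUROC arrays are hypothetical; the study's tests were run on spatial cross-validation estimates:

```python
import numpy as np
from scipy.stats import mannwhitneyu

def bh_adjust(pvals):
    """Benjamini-Hochberg step-up adjustment to control the FDR."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity, working down from the largest p-value
    adj = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.clip(adj, 0.0, 1.0)
    return out

def pairwise_auroc_tests(cv_auroc):
    """cv_auroc maps model name -> array of cross-validated AUROC values;
    returns FDR-adjusted two-sided p-values for every model pair."""
    names = sorted(cv_auroc)
    pairs, pvals = [], []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            _, p = mannwhitneyu(cv_auroc[a], cv_auroc[b],
                                alternative='two-sided')
            pairs.append((a, b))
            pvals.append(p)
    return dict(zip(pairs, bh_adjust(pvals)))
```

With 6 models there are 15 pairwise tests per AOI and performance measure, which is why the multiple-testing adjustment matters.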


Fig. 3. Box-and-whisker plot of true positive rates (TPR %) estimated for each prediction technique at a 10% false positive rate (FPR %) applied to landslide susceptibility modeling in different AOIs.
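The two performance measures plotted in Figs. 2 and 3 can be computed directly from predicted scores and observed presence/absence labels. A minimal numpy-only sketch (trapezoidal AUROC, and the highest TPR attainable while the FPR stays at or below 10%):

```python
import numpy as np

def roc_curve_points(scores, labels):
    """Return (fpr, tpr) arrays for decreasing score thresholds."""
    order = np.argsort(-scores)
    labels = labels[order]
    tps = np.cumsum(labels)        # true positives accepted at each cutoff
    fps = np.cumsum(1 - labels)    # false positives accepted at each cutoff
    tpr = tps / labels.sum()
    fpr = fps / (1 - labels).sum()
    return np.concatenate(([0.0], fpr)), np.concatenate(([0.0], tpr))

def auroc(scores, labels):
    """Area under the ROC curve by the trapezoidal rule."""
    fpr, tpr = roc_curve_points(scores, labels)
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

def tpr_at_fpr(scores, labels, fpr_max=0.10):
    """Highest TPR attainable while keeping FPR <= fpr_max."""
    fpr, tpr = roc_curve_points(scores, labels)
    return float(tpr[fpr <= fpr_max].max())
```

A perfectly separating score yields AUROC = 1.0; a score with one discordant pair out of four comparisons yields 0.75.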

3.3. Comparing susceptibility map appearances

Modeling techniques whose predictions are a continuous function of the predictors (GLM, GAM, SVM) produced much smoother prediction surfaces in the landslide susceptibility maps (Fig. 4). WOE, RF and BPLDA, in contrast, produced more heterogeneous prediction surfaces. In particular, the RF and BPLDA models showed more spatial artifacts than the other techniques, making the prediction surface appear rather noisy. Abrupt changes in the prediction surface related to categorical predictors (i.e., lithology) were not present in the maps.
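Rendering classified maps such as Fig. 4 requires binning the continuous susceptibility values into discrete classes. The paper does not state which classification scheme was used; one common convention, sketched here, is quantile (equal-area) breaks:

```python
import numpy as np

def classify_susceptibility(prob, n_classes=4):
    """Quantile (equal-area) susceptibility classes, 1 = lowest."""
    breaks = np.quantile(prob, np.linspace(0.0, 1.0, n_classes + 1)[1:-1])
    return np.digitize(prob, breaks) + 1

# synthetic predicted probabilities standing in for a model's output grid
rng = np.random.default_rng(0)
prob = rng.beta(2.0, 5.0, 10_000)
classes = classify_susceptibility(prob)
# each of the four classes covers ~25% of the cells by construction
```

Quantile breaks make maps from different models visually comparable because every class occupies the same share of the study area, regardless of how the raw scores are scaled.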

Table 3. Spatial variable importance (SVI) based on median AUROC values from spatial cross-validation. The values are standardized relative to the most important predictor variable for each model.

FLYSCH

| Rank | Variable | Max. SVI | GAM | GLM | WOE | SVM | RF | BPLDA |
|---|---|---|---|---|---|---|---|---|
| 1 | Slope angle | 1.00 | 0.74 | 0.68 | 1.00 | 1.00 | 1.00 | 1.00 |
| 2 | C. area | 1.00 | 1.00 | 1.00 | 0.11 | 0.46 | 0.19 | 0.52 |
| 3 | Plan curv. | 0.83 | 0.28 | 0.33 | 0.56 | 0.40 | 0.83 | 0.67 |
| 4 | SDS | 0.70 | 0.52 | 0.25 | 0.61 | 0.70 | 0.69 | 0.51 |
| 5 | TWI | 0.61 | 0.40 | 0.61 | 0.04 | 0.08 | 0.12 | 0.12 |
| 6 | Profile curv. | 0.31 | 0.02 | 0.02 | 0.28 | 0.07 | 0.31 | 0.21 |
| 7 | East–west | 0.17 | 0.00 | −0.01 | 0.00 | 0.15 | 0.14 | 0.17 |
| 8 | Elevation | 0.10 | 0.08 | 0.10 | 0.05 | 0.08 | 0.03 | 0.09 |
| 9 | South–north | 0.09 | 0.09 | 0.08 | 0.05 | −0.02 | −0.01 | 0.00 |
| 10 | Conv. Ind. | 0.06 | 0.02 | 0.00 | −0.07 | 0.01 | 0.06 | 0.05 |
| 11 | C. height | 0.04 | 0.03 | 0.00 | 0.04 | −0.02 | 0.04 | −0.01 |
| 12 | Lithology | 0.03 | 0.00 | 0.00 | 0.03 | 0.00 | 0.00 | −0.01 |

MOLASSE

| Rank | Variable | Max. SVI | GAM | GLM | WOE | SVM | RF | BPLDA |
|---|---|---|---|---|---|---|---|---|
| 1 | Slope angle | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 2 | SDS | 0.75 | 0.05 | 0.02 | 0.75 | 0.17 | 0.17 | 0.12 |
| 3 | Elevation | 0.60 | 0.25 | 0.17 | 0.60 | 0.25 | 0.26 | 0.31 |
| 4 | TWI | 0.49 | 0.13 | 0.42 | 0.49 | 0.29 | 0.11 | 0.14 |
| 5 | Plan curv. | 0.28 | 0.01 | 0.02 | 0.28 | 0.05 | 0.07 | 0.03 |
| 6 | Profile curv. | 0.26 | 0.01 | 0.00 | 0.26 | 0.02 | 0.03 | 0.08 |
| 7 | Lithology | 0.20 | 0.04 | 0.06 | 0.20 | 0.05 | 0.02 | 0.04 |
| 8 | C. height | 0.06 | 0.00 | 0.00 | 0.06 | 0.02 | 0.01 | 0.02 |
| 9 | Conv. Ind. | 0.04 | 0.00 | 0.02 | 0.02 | 0.04 | 0.01 | 0.00 |
| 10 | South–north | 0.03 | 0.00 | 0.00 | 0.03 | 0.03 | 0.01 | 0.02 |
| 11 | C. area | 0.02 | 0.02 | 0.02 | 0.00 | 0.00 | 0.01 | 0.01 |
| 12 | East–west | 0.00 | 0.00 | 0.00 | −0.01 | −0.01 | 0.00 | 0.00 |

AUSTROALPINE

| Rank | Variable | Max. SVI | GAM | GLM | WOE | SVM | RF | BPLDA |
|---|---|---|---|---|---|---|---|---|
| 1 | Slope angle | 1.00 | 1.00 | 1.00 | 0.43 | 0.79 | 0.39 | 0.48 |
| 2 | Profile curv. | 1.00 | 0.03 | 0.00 | 1.00 | 0.49 | 1.00 | 1.00 |
| 3 | SDS | 1.00 | 0.38 | 0.15 | 0.44 | 1.00 | 0.45 | 0.26 |
| 4 | Plan curv. | 0.86 | 0.78 | 0.86 | 0.78 | 0.77 | 0.80 | 0.61 |
| 5 | Elevation | 0.53 | 0.39 | 0.53 | 0.14 | 0.51 | 0.22 | 0.20 |
| 6 | C. area | 0.48 | 0.41 | 0.43 | −0.02 | 0.48 | 0.10 | 0.17 |
| 7 | TWI | 0.29 | 0.22 | 0.00 | −0.05 | 0.29 | 0.11 | 0.07 |
| 8 | East–west | 0.18 | 0.11 | 0.13 | 0.18 | 0.04 | −0.01 | 0.00 |
| 9 | Conv. Ind. | 0.15 | 0.04 | 0.01 | 0.04 | 0.12 | 0.15 | 0.08 |
| 10 | South–north | 0.09 | 0.02 | 0.09 | −0.04 | 0.04 | 0.03 | 0.06 |
| 11 | C. height | 0.01 | 0.00 | 0.01 | −0.05 | −0.16 | −0.10 | −0.07 |
| 12 | Lithology | 0.00 | −0.04 | −0.01 | −0.03 | −0.04 | 0.00 | −0.01 |

The three highest SVI values for each model and area are printed in boldface in the original table.
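The SVI values in Table 3 are based on the decrease in cross-validated AUROC when a predictor's values are permuted, rescaled so that each model's most important predictor scores 1.00. A minimal, non-spatial sketch of that permutation step follows; the `predict` function, the data, and the single-split setup are hypothetical stand-ins (the study computed the drops within spatial cross-validation folds):

```python
import numpy as np

def auroc(scores, labels):
    """AUROC via the rank-sum (Mann-Whitney U) identity; assumes no ties."""
    ranks = scores.argsort().argsort() + 1          # 1-based ordinal ranks
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    u = ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)

def permutation_svi(predict, X, y, rng, n_rep=10):
    """Mean AUROC drop per predictor under permutation, rescaled relative
    to the most important predictor (cf. the SVI values of Table 3)."""
    base = auroc(predict(X), y)
    drops = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_rep):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])    # break column j only
            drops[j] += base - auroc(predict(Xp), y)
    drops /= n_rep
    return drops / drops.max()

# hypothetical example: a fixed linear scoring function standing in for a
# fitted model; column 2 is pure noise and the model ignores it
rng = np.random.default_rng(3)
X = rng.normal(size=(500, 3))
y = (X[:, 0] + 0.3 * X[:, 1] + rng.normal(0.0, 0.5, 500) > 0).astype(int)
predict = lambda Z: Z[:, 0] + 0.3 * Z[:, 1]
svi = permutation_svi(predict, X, y, rng)
```

Permuting a predictor the model ignores leaves the scores unchanged, so its AUROC drop, and hence its SVI, is exactly zero; this is why weakly used variables cluster near 0.00 in Table 3.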


Fig. 4. An example of classified landslide susceptibility maps for each prediction technique in the Austroalpine AOI.

4. Discussion

4.1. Evaluating prediction performance

With all of the available prediction techniques and methods for model selection, it is critical to remember that, in reality, no single model in a suite of competing models is correct (Elith et al., 2002). Therefore, we can only base our criteria and decision to select a model on the specific scientific goals of the study (Elith and Leathwick, 2009). In this paper, the spatial prediction ability of empirically based landslide susceptibility models was investigated using typical model configurations. Clearly, as is the case for any complex algorithm, different model or data configurations may lead to different results, and it is beyond the scope of this study to evaluate these possible differences. Among the factors that may influence model performance are, in particular, feature selection procedures (Nguyen and de la Torre, 2010), model setup (e.g., SVM kernel choice; Moguerza and Muñoz, 2006), sampling design (Mathur and Foody, 2008) and preprocessing of predictors (Xu et al., 2014).

Apart from the GLM, or logistic regression, the AUROC performances of the prediction techniques appear to be very similar. Previous comparisons of prediction techniques also found no, or only small, significant differences in the prediction of landslide susceptibility (Yilmaz, 2009, 2010; Goetz et al., 2011). Consequently, we cannot declare which model is best solely on the


performance measures unless they show significant and practical differences. In general, many machine learning prediction techniques have a similar ability to represent complex nonlinear relationships and higher-order interactions. This can result in statistically insignificant differences when comparing their performances (Han and Kamber, 2006). The small differences in AUROC performance can nevertheless have practical significance, depending on the prediction requirements of a particular application (Beguería, 2006). In the case of landslide risk management, a higher detection rate of landslides in areas where the detection of the absence of landslides is already high can translate into an improved model prediction for high-risk areas (Goetz et al., 2011; Brenning, 2012a). The TPR at 10% FPR is an example of an ROC cutoff measure that can be used to assess such a specific prediction requirement (Goetz et al., 2011). For example, the difference between the best and worst median AUROC performances was 2.9–8.9 pp, depending on the AOI (Table 2). These differences in median AUROC corresponded to an increase of 11–15 pp in the TPR at 10% FPR; that is, 21–29% more landslides were detected in potential high-risk areas. The TPR at 10% FPR is only one such measure, and it was chosen for presentation in this paper. The ROC curve can be used to weight the assessment towards either a high accuracy of predicting the presence of landslides or the identification of the most stable areas (Gorsevski et al., 2006). More balanced decisions regarding class determination can also be made with the ROC curve, since the specific accuracies of various cut-off combinations can be evaluated with it; the predictive costs of the susceptibility classes can thus be clearly defined (Beguería, 2006; Gorsevski et al., 2006).

In addition to evaluating the prediction performance with a specific accuracy measure, Guzzetti et al. (2006) suggested that the model that is least sensitive to variation in performance under different sampling conditions should be preferred. We accounted for variation in model performance by reporting the interquartile range (IQR) of the cross-validation results, where lower IQR values indicate more robust model performance. Therefore, when comparing the two highest performing prediction techniques, random forest (RF) and bundling with penalized linear discriminant analysis (BPLDA), we can suggest that random forest was the better technique for landslide initiation prediction based on its lower IQR.

4.2. Considering other model criteria

Additional criteria can also be used to support decisions in model selection. This study focused on assessing the performance of models for spatial prediction. However, in addition to spatial prediction performance, the ability to interpret the model for statistical inference (i.e., spatial analysis) to gain insight into landslide distribution characteristics may also be important (Brenning, 2012a). In that situation a statistically valid model is critical, and good predictive properties are less important (Brenning, 2005, 2012a). Interpretation is more complex for machine learning algorithms, which are generally considered 'black-box' models (Elith and Leathwick, 2009). The GLM and GAM, and with some limitations also WOE, are prediction techniques that provide easily interpretable results that can shed light on landslide conditioning factors (Lee et al., 2002; Brenning, 2005; Regmi et al., 2010; Goetz et al., 2011). In particular, the integration of physically motivated predictors or model components can facilitate the identification of possible causal mechanisms (Goetz et al., 2011). The machine learning techniques, such as SVM, RF and BPLDA, have been developed especially for prediction. They have the advantage of automatically detecting interactions between predictors; thus, their prediction accuracy typically exceeds that of more conventional techniques when complex interactions are present (Elith et al., 2006).


In addition to careful validation of model performance, there are also qualitative features that may be important to the end user. For example, models with similar prediction performance do not necessarily produce similar prediction surfaces (Sterlacchini et al., 2011). The geographic representation of susceptibility levels may affect the way the mapped results are interpreted. For example, isolated grid cells of a heterogeneous prediction surface that are predicted as being possibly unstable may affect planning decisions (also for the surrounding stable terrain). It is generally easier to clearly delineate hazardous zones for planning applications when the prediction surface is smooth. In practice, the appearance of the prediction surface can also influence the end user's perception of the method, including their trust in the model. Heterogeneous prediction surfaces are sometimes misinterpreted by end users as indicating a poor prediction of landslide susceptibility, which is not necessarily true. In this study, the RF, which does not produce a smooth prediction surface and is more prone to spatial artifacts (Brenning, 2005, 2012a), was for the most part a better predictor than models that produced much smoother prediction surfaces, such as the GLM, GAM and SVM. It is also important to mention that abrupt changes in the prediction surface can occur with any of the methods investigated in this paper when a categorical variable, such as lithology, has a strong effect in the model.

4.3. Comparing the importance of predictor variables

Model behavior was roughly gauged using a standardized approach for comparing the relative importance of the predictors of each prediction technique. In this paper, it was observed with the SVI measure that the importance of the predictors, which here are based primarily on geomorphic conditions, differed for each AOI. This relationship should be expected, because each AOI is associated with generally unique geomorphological conditions. Petschko et al. (2014) made a similar observation when analyzing variable-selection frequencies of different GAMs applied individually to lithology units in Lower Austria; the importance of variables differed for each lithology unit. It has been well established that local site conditions play an important role in the prediction of landslide susceptibility (van Westen et al., 2003; Sidle and Ochiai, 2006; Lee et al., 2007; Blahut et al., 2010). The dissimilar ranks of predictor variables between the sites provide additional quantitative evidence of this relationship.

The ranking of variable importance was irregular when comparing prediction techniques. Yet the highest ranked variables were generally consistent. Slope angle, surface roughness and plan curvature were the variables most commonly ranked high in terms of relative importance in all AOIs (Table 3). The available geologic data were only important for prediction using WOE in the Molasse area, which had the most heterogeneity in lithology. The general lack of importance of geology may be attributed to regression dilution bias related to the thematic coarseness of the lithology data (Carroll et al., 2012); more detailed geologic data, in terms of scale, may be required for the size of the AOIs. Regmi et al. (2010) suggested that only a handful of variables may be sufficient for the prediction of landslide susceptibility. The uneven distribution of the SVI values observed in this study may indicate just that.

The different ranking of variables between prediction techniques should be expected. Yesilnacar and Topal (2005) made a similar observation when comparing a GLM to artificial neural networks (ANN), a machine learning technique, for landslide susceptibility modeling. Although the prediction techniques had similar performances, they are unique in their individual approaches to model construction and to establishing the relevant relationships between the predictor variables and landslide initiation. Understanding these differences is essential to select a


suitable prediction technique for a specific study goal (Brenning, 2009, 2012a).

5. Conclusions

Our study demonstrates that there was generally little difference in prediction performance between statistical and machine learning landslide susceptibility modeling techniques applied to the AOIs in Lower Austria. This result underlines that, even when conducting model comparisons with a clear objective such as prediction performance, understanding the abilities and limitations of each technique remains critical for model selection. In terms of pure prediction performance, the RF and BPLDA modeling techniques were the best. The most interpretable and visually appealing method (i.e., with a smooth prediction surface) was the GAM, which performed significantly better than the GLM. The SVM also had a smooth prediction surface, but is generally difficult to interpret. However, the SVM, RF and BPLDA may be particularly useful for high-dimensional prediction problems where a large number of highly correlated predictor variables are present. Overall, it is recommended that model evaluation be tied closely to the goals of the study. The framework of this paper is designed to assist in the evaluation of susceptibility modeling techniques and to support a user's decision on which method is most suitable for a particular application.

Acknowledgements

The data for this research were provided by the Provincial Government of Lower Austria through the MoNOE project (Method development for landslide susceptibility maps for Lower Austria). The authors are grateful for the contributions of students from the Department of Geography and Regional Research, University of Vienna, under the supervision of Dr. Rainer Bell and Dr. Thomas Glade, and of colleagues in the Health and Environment Department of the Austrian Institute of Technology GmbH (AIT), to the construction of the landslide inventory. R. Cabrera implemented the WOE method during a stay at the AIT that was partly supported by a Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant (No. 355764-2013) awarded to A. Brenning. We would also like to express our gratitude to the anonymous reviewers for their constructive comments, which helped to improve the paper.

References

Agterberg, F.P., Cheng, Q., 2002. Conditional independence test for weights-of-evidence modeling. Nat. Resour. Res. 11 (4), 249–255.
Akaike, H., 1974. A new look at the statistical model identification. IEEE Trans. Autom. Control 19, 716–723.
Atkinson, P.M., Massari, R., 1998. Generalised linear modelling of susceptibility to landsliding in the central Apennines, Italy. Comput. Geosci. 24 (4), 373–385.
Ayalew, L., Yamagishi, H., 2005. The application of GIS-based logistic regression for landslide susceptibility mapping in the Kakuda-Yahiko mountains, Central Japan. Geomorphology 65 (1), 15–31.
Beguería, S., 2006. Validation and evaluation of predictive models in hazard assessment and risk management. Nat. Hazards 37 (3), 315–329.
Benjamini, Y., Hochberg, Y., 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B (Methodol.) 57 (1), 289–300.
Beven, K.J., Kirkby, M.J., 1979. A physically based, variable contributing area model of basin hydrology. Hydrol. Sci. Bull. 24, 43–69.
Blahut, J., van Westen, C.J., Sterlacchini, S., 2010. Analysis of landslide inventories for accurate prediction of debris-flow source areas. Geomorphology 119 (1), 36–51.
Bonham-Carter, G., 1994. Geographic Information Systems for Geoscientists: Modelling with GIS. Computer Methods in Geosciences, vol. 13. Pergamon Press, p. 398.

Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J., 1984. Classification and Regression Trees. CRC Press, Wadsworth.
Breiman, L., 2001. Random forests. Mach. Learn. 45 (1), 5–32.
Breiman, L., Cutler, C., 2012. Breiman and Cutler's random forests for classification and regression (randomForest). R package version 4.6-7, R port by A. Liaw and M. Wiener.
Brenning, A., 2005. Spatial prediction models for landslide hazards: review, comparison and evaluation. Nat. Hazards Earth Syst. Sci. 5, 853–862.
Brenning, A., Trombotto, D., 2006. Logistic regression modeling of rock glacier and glacier distribution: topographic and climatic controls in the semi-arid Andes. Geomorphology 81 (1), 141–154.
Brenning, A., 2009. Benchmarking classifiers to optimally integrate terrain analysis and multispectral remote sensing in automatic rock glacier detection. Remote Sens. Environ. 113, 239–247.
Brenning, A., 2012a. Improved spatial analysis and prediction of landslide susceptibility: practical recommendations. In: Eberhardt, E., Froese, C., Turner, A.K., Leroueil, S. (Eds.), Landslides and Engineered Slopes: Protecting Society through Improved Understanding. Proceedings of the 11th International and 2nd North American Symposium on Landslides and Engineered Slopes, vol. 1, Banff, Canada, 3–8 June 2012. CRC Press/Balkema, Leiden, The Netherlands, pp. 789–794.
Brenning, A., 2012b. Spatial cross-validation and bootstrap for the assessment of prediction rules in remote sensing: the R package 'sperrorest'. IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 23–27 July 2012, pp. 5372–5375.
Brenning, A., Long, S., Fieguth, P., 2012. Detecting rock glacier flow structures using Gabor filters and IKONOS imagery. Remote Sens. Environ. 125, 227–237.
Carrara, A., Guzzetti, F., Cardinali, M., Reichenbach, P., 1999. Use of GIS technology in the prediction and monitoring of landslide hazard. Nat. Hazards 20 (2–3), 117–135.
Carroll, R.J., Ruppert, D., Stefanski, L.A., Crainiceanu, C.M., 2012. Measurement Error in Nonlinear Models: A Modern Perspective, 2nd edn. CRC Press, New York, p. 488.
Chang, C.C., Lin, C.J., 2011. LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. (TIST) 2 (3), 1–27.
Chung, C.J.F., Fabbri, A.G., 1999. Probabilistic prediction models for landslide hazard mapping. Photogramm. Eng. Remote Sens. 65 (12), 1389–1399.
Chung, C.J.F., Fabbri, A.G., 2003. Validation of spatial prediction models for landslide hazard mapping. Nat. Hazards 30 (3), 451–472.
Conrad, O., 2006. SAGA – program structure and current state implementation. In: Böhner, J., McCloy, K.R., Strobl, J. (Eds.), SAGA – Analysis and Modelling Applications, vol. 115. Göttinger Geographische Abhandlungen, pp. 39–52.
Dai, F.C., Lee, C.F., 2002. Landslide characteristics and slope instability modeling using GIS, Lantau Island, Hong Kong. Geomorphology 42, 213–228.
Dai, F.C., Lee, C.F., Ngai, Y.Y., 2002. Landslide risk assessment and management: an overview. Eng. Geol. 64 (1), 65–87.
Dimitriadou, E., Hornik, K., Leisch, F., Meyer, D., Weingessel, A., 2007. e1071: Misc Functions of the Department of Statistics (e1071), TU Wien. R package version 1.6-1.
Elith, J., Burgman, M.A., Regan, H., 2002. Mapping epistemic uncertainties and vague concepts in predictions of species distribution. Ecol. Modell. 157 (2), 313–329.
Elith, J., et al., 2006. Novel methods improve prediction of species' distributions from occurrence data. Ecography 29, 129–151.
Elith, J., Leathwick, J.R., 2009. Species distribution models: ecological explanation and prediction across space and time. Annu. Rev. Ecol. Evol. Syst. 40, 677–697.
Frattini, P., Crosta, G., Carrara, A., 2010. Techniques for evaluating the performance of landslide susceptibility models. Eng. Geol. 111, 62–72.
Glenn, N.F., Streutker, D.R., Chadwick, D.J., Thackray, G.D., Dorsch, S.J., 2006. Analysis of LiDAR-derived topographic information for characterizing and differentiating landslide morphology and activity. Geomorphology 73 (1), 131–148.
Goetz, J.N., Guthrie, R.H., Brenning, A., 2011. Integrating physical and empirical landslide susceptibility models using generalized additive models. Geomorphology 129 (3), 376–386.
Gottschling, P., 2006. Massenbewegungen. In: Geologie der Bundesländer – Niederösterreich, vol. 2. Geologische Bundesanstalt, Wien, pp. 335–340.
Gorsevski, P.V., Gessler, P.E., Foltz, R.B., Elliot, W.J., 2006. Spatial prediction of landslide hazard using logistic regression and ROC analysis. Trans. GIS 10 (3), 395–415.
Guzzetti, F., Reichenbach, P., Ardizzone, F., Cardinali, M., Galli, M., 2006. Estimating the quality of landslide susceptibility models. Geomorphology 81 (1), 166–184.
Han, J., Kamber, M., 2006. Data Mining: Concepts and Techniques, 2nd edn. Morgan Kaufmann Publishers, San Francisco, p. 743.
Hastie, T.J., Buja, A., Tibshirani, R., 1995. Penalized discriminant analysis. Ann. Stat. 23 (1), 73–102.
Hastie, T.J., Tibshirani, R., 1990. Generalized Additive Models. Chapman & Hall, London, p. 352.
Hastie, T., 2009. GAM: Generalized Additive Models. R package version 1.08.
Hastie, T.J., Tibshirani, R., 2009. MDA: Mixture and Flexible Discriminant Analysis. R package version 0.4-2, R port by F. Leisch, K. Hornik and B.D. Ripley. http://cran.r-project.org/package=mda.
Heckmann, T., Gregg, K., Gregg, A., Becht, M., 2014. Sample size matters: investigating the effect of sample size on a logistic regression susceptibility model for debris flows. Nat. Hazards Earth Syst. Sci. 14, 259–278.
Hijmans, R.J., van Etten, J., 2013. Raster: Geographic data analysis and modeling (raster). R package version 2.1-25.
Hosmer, D.W., Lemeshow, S., 2000. Applied Logistic Regression, 2nd edn. John Wiley & Sons, New York, p. 373.


Hothorn, T., Lausen, B., 2005. Bundling classifiers by bagging trees. Comput. Stat. Data Anal. 49 (4), 1068–1078.
James, G., Witten, D., Hastie, T., Tibshirani, R., 2013. An Introduction to Statistical Learning. Springer, New York, p. 441.
Lee, S., Choi, J., Min, K., 2002. Landslide susceptibility analysis and verification using the Bayesian probability model. Environ. Geol. 43 (1–2), 120–131.
Lee, S., Ryu, J.H., Kim, I.S., 2007. Landslide susceptibility analysis and its verification using likelihood ratio, logistic regression, and artificial neural network models: case study of Youngin, Korea. Landslides 4 (4), 327–338.
Ließ, M., Glaser, B., Huwe, B., 2011. Functional soil-landscape modelling to estimate slope stability in a steep Andean mountain forest region. Geomorphology 132 (3), 287–299.
Lineback Gritzner, M., Marcus, W.A., Aspinall, R., Custer, S.G., 2001. Assessing landslide potential using GIS, soil wetness modeling and topographic attributes, Payette River, Idaho. Geomorphology 37 (1), 149–165.
Mathur, A., Foody, G.M., 2008. Crop classification by a support vector machine with intelligently selected training data for an operational application. Int. J. Remote Sens. 29 (8), 2227–2240.
Micheletti, N., Foresti, L., Robert, S., Leuenberger, M., Pedrazzini, A., Jaboyedoff, M., Kanevski, M., 2014. Machine learning feature selection methods for landslide susceptibility mapping. Math. Geosci. 46, 33–57. http://dx.doi.org/10.1007/s11004-013-9511-0.
Moguerza, J.M., Muñoz, A., 2006. Support vector machines with applications. Stat. Sci. 21 (3), 322–336. http://dx.doi.org/10.1214/088342306000000493.
Moore, I.D., Grayson, R.B., Ladson, A.R., 1991. Digital terrain modelling: a review of hydrological, geomorphological, and biological applications. Hydrol. Process. 5 (1), 3–30.
Muenchow, J., Brenning, A., Richter, M., 2012. Geomorphic process rates of landslides along a humidity gradient in the tropical Andes. Geomorphology 139–140, 271–284.
Neuhäuser, B., Terhorst, B., 2007. Landslide susceptibility assessment using "weights-of-evidence" applied to a study area at the Jurassic escarpment (SW-Germany). Geomorphology 86 (1), 12–24.
Nguyen, M.H., de la Torre, F., 2010. Optimal feature selection for support vector machines. Pattern Recognit. 43, 584–591.
Pachauri, A.K., Pant, M., 1992. Landslide hazard mapping based on geological attributes. Eng. Geol. 32 (1), 81–100.
Peters, A., Hothorn, T., 2009. Ipred: Improved predictors. R package version 0.9-1.
Petschko, H., Bell, R., Leopold, P., Heiss, G., Glade, T., 2013. Landslide inventories for reliable susceptibility maps. In: Margottini, C., Canuti, P., Sassa, K. (Eds.), Landslide Science and Practice, vol. 1: Landslide Inventory and Susceptibility and Hazard Zoning. Springer.
Petschko, H., Brenning, A., Bell, R., Goetz, J., Glade, T., 2014. Assessing the quality of landslide susceptibility maps – case study Lower Austria. Nat. Hazards Earth Syst. Sci. 14, 95–118. http://dx.doi.org/10.5194/nhess-14-95-2014.
Pradhan, B., 2013. A comparative study on the predictive ability of the decision tree, support vector machine and neuro-fuzzy models in landslide susceptibility mapping using GIS. Comput. Geosci. 51, 350–365.
R Development Core Team, 2003. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
Regmi, N.R., Giardino, J.R., Vitek, J.D., 2010. Modeling susceptibility to landslides using the weight of evidence approach: Western Colorado, USA. Geomorphology 115 (1), 172–187.
Regmi, N.R., Giardino, J.R., McDonald, E., Vitek, J.D., 2014. A comparison of logistic regression-based models of susceptibility to landslides in western Colorado, USA. Landslides 11, 247–262.
Ruß, G., Brenning, A., 2010. Data mining in precision agriculture: management of spatial information. Lect. Notes Comput. Sci. 6178, 350–359. http://dx.doi.org/10.1007/978-3-642-14049-5_36.
Schnabel, W., 2002. Geologische Karte von Niederösterreich 1:200000. Wien, Austria (in German).
Schweigl, J., Hervás, J., 2009. Landslide Mapping in Austria. JRC Scientific and Technical Reports, European Commission Joint Research Centre, Institute for Environment and Sustainability, Italy. Available at: http://eusoils.jrc.ec.europa.eu/ESDB_Archive/eusoils_docs/other/EUR23785EN.pdf (last access: 01.03.11).
Schwenk, H., 1992. Massenbewegungen in Niederösterreich 1953–1990. In: Jahrbuch der Geologischen Bundesanstalt, vol. 135. Geologische Bundesanstalt, Wien, pp. 597–660.
Schulz, W.H., 2004. Landslides mapped using LIDAR imagery, Seattle, Washington. U.S. Geological Survey Open-File Report 2004-1396, 11 pp., 1 plate.
Schulz, W.H., 2007. Landslide susceptibility revealed by LIDAR imagery and historical records, Seattle, Washington. Eng. Geol. 89, 67–87.
Sidle, R.C., Ochiai, H., 2006. Landslides: Processes, Prediction, and Land Use. American Geophysical Union, Washington, DC, p. 312.
Sing, T., Sander, O., Beerenwinkel, N., Lengauer, T., 2009. ROCR: Visualizing the Performance of Scoring Classifiers. R package version 1.0-4. http://cran.r-project.org/package=ROCR.
Soeters, R., van Westen, C.J., 1996. Slope instability recognition, analysis, and zonation. In: Landslides: Investigation and Mitigation, Chapter 8. Transportation Research Board Special Report 247.
Sterlacchini, S., Ballabio, C., Blahut, J., Masetti, M., Sorichetta, A., 2011. Spatial agreement of predicted patterns in landslide susceptibility maps. Geomorphology 125 (1), 51–61.
Strobl, C., Boulesteix, A.-L., Zeileis, A., Hothorn, T., 2007. Bias in random forest variable importance measures: illustrations, sources, and a solution. BMC Bioinf. 8, 25. http://dx.doi.org/10.1186/1471-2105-8-25.
Van den Eeckhaut, M., Vanwalleghem, T., Poesen, J., Govers, G., Verstraeten, G., Vandekerckhove, L., 2006. Prediction of landslide susceptibility using rare events logistic regression: a case-study in the Flemish Ardennes (Belgium). Geomorphology 76, 392–410.
van Westen, C.J., Rengers, N., Terlien, M.T.J., Soeters, R., 1997. Prediction of the occurrence of slope instability phenomena through GIS-based hazard zonation. Geol. Rundsch. 86 (2), 404–414.
van Westen, C.J., Rengers, N., Soeters, R., 2003. Use of geomorphological information in indirect landslide susceptibility assessment. Nat. Hazards 30 (3), 399–419.
van Westen, C.J., Castellanos, E., Kuriakose, S.L., 2008. Spatial data for landslide susceptibility, hazard, and vulnerability assessment: an overview. Eng. Geol. 102 (3), 112–131.
Vapnik, V., 1998. Statistical Learning Theory. John Wiley & Sons, New York, p. 736.
Varnes, D.J., 1984. Landslide Hazard Zonation: A Review of Principles and Practice. Natural Hazards No. 3, IAEG Commission on Landslides and other Mass-Movements, UNESCO, Paris.
Wessely, G., 2006. Geologie der Österreichischen Bundesländer: Niederösterreich. Geologische Bundesanstalt, Vienna, p. 416.
Xu, L., Li, J., Brenning, A., 2014. A comparative study of different classification techniques for marine-oil spill identification using RADARSAT-1 imagery. Remote Sens. Environ. 141, 14–23.
Yalcin, A., Reis, S., Aydinoglu, A.C., Yomralioglu, T., 2011. A GIS-based comparative study of frequency ratio, analytical hierarchy process, bivariate statistics and logistic regression methods for landslide susceptibility mapping in Trabzon, NE Turkey. Catena 85, 2011.
Yesilnacar, E., Topal, T., 2005. Landslide susceptibility mapping: a comparison of logistic regression and neural networks methods in a medium scale study, Hendek region (Turkey). Eng. Geol. 79 (3), 251–266.
Yilmaz, I., 2009. Landslide susceptibility mapping using frequency ratio, logistic regression, artificial neural networks and their comparison: a case study from Kat landslides (Tokat–Turkey). Comput. Geosci. 35, 1125–1138.
Yilmaz, I., 2010. Comparison of landslide susceptibility mapping methodologies for Koyulhisar, Turkey: conditional probability, logistic regression, artificial neural networks, and support vector machine. Environ. Earth Sci. 61 (4), 821–836.
Zweig, M.H., Campbell, G., 1993. Receiver-operating characteristic (ROC) plots. Clin. Chem. 39, 561–577.