Engineering Geology 110 (2009) 11–20
Contents lists available at ScienceDirect
Engineering Geology j o u r n a l h o m e p a g e : w w w. e l s ev i e r. c o m / l o c a t e / e n g g e o
A GIS-based landslide susceptibility evaluation using bivariate and multivariate statistical analyses A. Nandi a,⁎, A. Shakoor b a b
Department of Geosciences, East Tennessee State University, Johnson City, Tennessee 37614, USA Department of Geology, Kent State University, Kent, Ohio 44242, USA
a r t i c l e
i n f o
Article history: Received 12 May 2008 Received in revised form 29 April 2009 Accepted 15 October 2009 Available online 25 October 2009 Keywords: Landslide susceptibility Bivariate analysis Logistic regression analysis GIS ROC curve Cuyahoga River watershed
a b s t r a c t Bivariate and multivariate statistical analyses were used to predict the spatial distribution of landslides in the Cuyahoga River watershed, northeastern Ohio, U.S.A. The relationship between landslides and various instability factors contributing to their occurrence was evaluated using a Geographic Information System (GIS) based investigation. A landslide inventory map was prepared using landslide locations identified from aerial photographs, field checks, and existing literature. Instability factors such as slope angle, soil type, soil erodibility, soil liquidity index, landcover pattern, precipitation, and proximity to stream, responsible for the occurrence of landslides, were imported as raster data layers in ArcGIS, and ranked using a numerical scale corresponding to the physical conditions of the region. In order to investigate the role of each instability factor in controlling the spatial distribution of landslides, both bivariate and multivariate models were used to analyze the digital dataset. The logistic regression approach was used in the multivariate model analysis. Both models helped produce landslide susceptibility maps and the suitability of each model was evaluated by the area under the curve method, and by comparing the maps with the known landslide locations. The multivariate logistic regression model was found to be the better model in predicting landslide susceptibility of this area. The logistic regression model produced a landslide susceptibility map at a scale of 1:24,000 that classified susceptibility into four categories: low, moderate, high, and very high. The results also indicated that slope angle, proximity to stream, soil erodibility, and soil type were statistically significant in controlling the slope movement. © 2009 Elsevier B.V. All rights reserved.
1. Introduction Assessing landslide related hazard with only limited background information and data is a constant challenge for engineers, geologists, planners, landowners, developers, insurance companies, and government entities (Van Westen, 1993; Aleotti and Chowdhury, 1999; Glade and Crozier, 2005). Landslide hazard assessment is performed in various steps. The first step is identifying and evaluating landslideprone areas, and constructing a landslide inventory map for future use. Landslide inventory mapping is the systematic mapping of existing landslides in a region using different techniques such as field survey, air photo interpretation, and literature search for historical landslide records (Wieczorek, 1984). A landslide inventory map provides the spatial distribution of locations of existing landslides. The second step in the hazard assessment program consists of preparing a landslide susceptibility map showing the “likelihood that a phenomenon will occur in an area on the basis of the local terrain conditions” (Soeters and Van Westen, 1996). The landslide susceptibility map is
⁎ Corresponding author. Tel.: +1 423 439 6086. E-mail address:
[email protected] (A. Nandi). 0013-7952/$ – see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.enggeo.2009.10.001
based on distribution of past landslides, slope steepness, type of soil/ bedrock, structure, hydrology, and other pertinent data. The unique combination of several geospatial input datasets, as stated above, creates a susceptibility map representing different classes of susceptibility: low, moderate, high, and very high (Soeters and Van Westen, 1996; Brabb et al., 1998; Guzzetti et al., 1999; Lee and Min, 2001). Preparing a realistic landslide susceptibility map of an area is difficult but several attempts have been made to minimize subjectivity and error in regional scale mapping (1:50,000 to 1:24,000) (Brabb et al., 1998; Greco et al., 2007; Ruff and Czurda, 2008). Geographical Information Systems (GIS) technology and rapid development of computing capacity have raised great expectations as a potential means of coping with large landslide related datasets (Carrara et al., 1990; Yiping and Beighley, 2007). According to Aleotti and Chowdhury (1999), landslide hazard assessment can be performed using two general approaches: the field-based, qualitative approach, or the data-driven quantitative approach. Within the quantitative approach, statistical models, using bivariate or multivariate techniques for landslide susceptibility analysis, are widespread (Dai and Lee, 2002; Donati and Turrini, 2002). A synthesis of the available methods, and their applicability and limitations, can be found in a wide range of literature (Brabb et al.,
12
A. Nandi, A. Shakoor / Engineering Geology 110 (2009) 11–20
1972; Carrara et al., 1977; Carrara et al., 1990; Van Westen, 2004; Suzen and Doyuran, 2004; Giraud and Shaw, 2007; Thiery et al., 2007; Greco et al., 2007). There are several ways to use bivariate and multivariate statistics to assess landslide susceptibility of an area. The aim of this study was to use two widely-accepted models, one from bivariate and the other from multivariate approaches, and evaluate their performances. Several instability factor variables are used in the present bivariate approach; the influence of each variable on the occurrence of landslide is evaluated independently and the variables are combined in the form of a unique equation (Agterberg and Cheng, 2002; Thiart et al., 2003; Conoscenti et al., 2008). Among the range of available multivariate approaches, logistic regression was found to be the most suitable approach for the present study. In this analysis, spatial distribution of landslides is assessed on the basis of interaction of only statistically significant instability factor data; insignificant data are excluded from consideration. Additionally, logistic regression analysis is free of data distribution issues and can handle a variety of datasets, such as continuous, categorical, and binary, common types of instability factor data used in landslide studies (Dai et al., 2001; Lee and Min, 2001; Suzen and Doyuran, 2004; Lulseged and Hiromitsu, 2004; Lee, 2005; Yesilnacar and Topal, 2005; Lee and Sambath, 2006; Nefeslioglu et al., 2008). Both bivariate and multivariate approaches strongly depend upon the quality and quantity of the data collected and much less on any subjective evaluation. These data-driven techniques are robust, yet the preparation of landslide susceptibility maps can be delayed, as collecting adequate landslide related information in an area requires prolonged effort (Van Westen et al., 2006). On the other hand, limited or unrepresentative data may give an unreasonable result. An important objective of this study was to evaluate the applicability of both bivariate and multivariate techniques, using only limited data collected on landslides from a portion of the study area referred to as the ‘training area’. The statistical techniques were first tested on the
training area and then extended to the entire study area. In order to evaluate the performance of the proposed techniques, landslide locations from the whole study area were surveyed and mapped. This study suggests that thorough sampling from a smaller, representative area can produce realistic results with less time and effort. The study was performed in the Cuyahoga River watershed, northeastern United States, adjacent to the Great Lakes. The Cuyahoga River watershed is well-known for landslide hazards; however, there are very few landslide hazard maps with only limited spatial coverage available for the area. In the existing maps, landslide boundaries were delineated subjectively after the slides had occurred. There is an urgent need to prepare landslide susceptibility maps of the area to help predict the future landslides. 2. The study area Native Americans named the river “Cuyahoga” meaning “crooked river”. The Cuyahoga River is approximately 160 km long, and the watershed drains 2105 km2 of land in Geauga, Portage, Summit, and Cuyahoga counties, located in northeast Ohio (Fig. 1). The U-shaped Cuyahoga River collects all the surface water that enters the basin as precipitation in Geauga and Portage counties and sends it south, then west to Summit County, and finally north to Cuyahoga County, collecting additional water, and ultimately ends at Lake Erie. The typical annual precipitation (rain and snowfall) in the area is about 92.70 cm to 102.67 cm, although a higher amount of localized precipitation is very common and is known as ‘lake effect precipitation’. The Cuyahoga River watershed is located in the Allegheny Plateau section of the Appalachian Plateaus Province. The bedrock in this region consists of siliciclastic sedimentary rocks of Paleozoic age, specifically late Devonian, Mississippian, and Pennsylvanian (Szabo, 1987). A rock record of about 270 million years spanning most of the Permian Period, the Mesozoic Era, and almost all of the Cenozoic Era is missing. This missing rock record is explained by uplift of Ohio above sea level, which
Fig. 1. Location map of the study area.
A. Nandi, A. Shakoor / Engineering Geology 110 (2009) 11–20
13
led to erosion (instead of deposition) being the dominant force. During the Pleistocene Epoch, northeast Ohio was covered with glaciers which, upon melting, left behind a thick heterogeneous regional blanket of clay, silt, sand, gravel, and boulders. These glacial soils, brought from Canada to the north, include a variety of rock types including igneous, metamorphic, and sedimentary. The melt water from glaciers reworked this material, resulting in extensive silt and clay deposits of lacustrine origin. Rapid down cutting of the Pleistocene deposits by the Cuyahoga River and its tributaries results in steep-sided valleys which are highly susceptible to landsliding. The topography of the area is generally hummocky with elevations ranging from 160 m to 395 m and slopes varying between flat and 70°. Rotational slides constitute the principal geologic hazard in the Cuyahoga River watershed (Brabb et al., 1998). They are characterized by the movement of a mass of soil along a single or several internal curved slip planes (Fig. 2). The upper part (crown or head) consists of one or more transversely oriented zones of rupture (scarps) that form a stair-step pattern of displaced blocks. The upper surfaces of these blocks are rotated backward (reverse slopes), forming depressions along which water may accumulate, creating small ponds or swampy areas. Trees on these rotated blocks may be inclined upslope. The lower, downslope end (toe) of a rotational slide is a fan-shaped, bulging mass of material characterized by radial ridges and cracks (Gardner, 1972). Apart from rotational slides, some translational slides and earth flows are present in the study area but they are not considered in the present study. 3. Methodology and data used Detecting instability factors responsible for landslides requires a comprehensive knowledge of the local geologic conditions affecting landslides in the study area (Guzzetti et al., 1999). The instability factors used in this study included slope angle, soil geology and erodibility, proximity to streams, precipitation, landcover patterns, and soil properties (Wieczorek, 1984; Soeters and Van Westen, 1996). Landslides were identified from terrain features such as morphology, vegetation, and drainage conditions of the slope faces using colored, digital aerial photographs of 0.6-m pixel resolution (Fig. 3a, and b), and they were verified by field investigation. Historical landslide locations (about 30-year record) were also mapped using records from the Ohio Department of Transportation and published reports of landslides in northeast Ohio (Gardner, 1972; Jones and Shakoor, 1989). A landslide
Fig. 2. A rotational slide along the Chagrin River Boulevard, Cuyahoga County.
Fig. 3. Samples of rotational slide (a), and barren scarp face (b) identified from aerial photographs.
inventory map (Fig. 4) for historical and recent landslides in the region was produced using GIS. The landslides in the Cuyahoga River watershed vary from small scale (few meters) to large scale (few thousand meters). The small scale questionable landslides (less than 10 m), which can be considered as soil creep, were discarded from the landslide inventory map. The literature review revealed that there are several approaches to defining the extent of landslides for the inventory maps. The approaches include: unique condition area (all landslide area with depletion and accumulation zones), pre-landslide slope facets, landslide rupture zone, geomorphologic terrain units, seed cell approach, and area representing pre-failure condition (Guzzetti et al., 1999; Remomdo et al., 2003; Van Den Eeckhaut et al., 2006; Nefeslioglu et al., 2008). It should be noted that landslide inventory maps constructed using these different methods are completely different and have different meanings (Nefeslioglu et al., 2008). In this study area, the landslides range from small to large scale, and delineating the depletion and accumulation zone boundaries using aerial photographs are not always feasible, especially in heavily wooded areas and in the cases of old landslides. Therefore, the authors have followed the ‘seed cell approach’ proposed by Suzen and Doyuran (2004). This approach considers that the best undisturbed morphological conditions (conditions before a landslide occurs) would be extracted from the vicinity of
14
A. Nandi, A. Shakoor / Engineering Geology 110 (2009) 11–20
Fig. 4. Landslide inventory map of the Cuyahoga River watershed with marked boundaries of the training and test areas.
the actual landslide; this can be achieved by adding a buffer zone to the landsides. Studies have commonly used a buffer zone of 100 to 150 m (Suzen and Doyuran, 2004); however, in this study, a 50-meter buffer zone is chosen so that very limited to no pixels fall beyond the prelandslide area. For the purpose of this research, the entire watershed was divided into two areas: training area and test area. A 1416 km2 area of the entire watershed (located in Summit and Cuyahoga counties) was used as the ‘training area’ and the remaining watershed (689 km2) situated in Geauga and Portage counties was designated as the ‘test area’. The term ‘study area’, as used in this paper, includes both the training area and the test area. In the landslide inventory map, 170 landslides (covering 147,500 pixels) were identified in the training area, and are shown as solid triangular points in Fig. 4. In the remaining test area, 104 landslides (covering 85,320 pixels) were identified and are represented by solid circular points (Fig. 4). The training area landslides were used in both bivariate and the multivariate analyses and the results of the analyses were then implemented over the entire study area, including the test area. The landslides in the test area were used solely for verification of the analysis results. The instability factors (slope angle, soil geology and erodibility, proximity to streams, precipitation, landcover patterns, and soil
liquidity index) were inserted as various raster data layers in GIS that correspond to the USGS 1:24,000 scale (Table 1). A 10-meter Digital Elevation Model (DEM) in 7.5-minute topographic quadrangle map was imported in ArcGIS, and slope angle data of the watershed were calculated (Fig. 5a). The slope angle map was prepared in degrees ranging from flat to 70°. The shallow landslides in the study area occur in the residual/colluvial soils overlying bedrock. The soil mapping units were obtained from the U.S. Department of Agriculture, Natural Resource Conservation Service and were designated into specific types of soil using the United States Department of Agriculture textural classification in terms of unique characteristics including texture, cohesion, slope, and erosion class. The soils were classified as clay, silt, sand, and organic soil while the open quarries were designated as gravel pits (Fig. 5c). For determining soil erodibility (average annual soil loss), the universal soil loss equation was used to estimate soil loss in kg/km2/ year. The parameters to calculate soil erodibility include erosivity index from annual cumulative rainfall energy and intensity, cohesive character of the soil and its resistance to transportation, topographic factor, and crop management factor. The average soil loss in the area ranges from approximately 2.27 × 106 to 2.27× 107 kg/km2/year. The proximity to the streams map was created using a 400-meter buffer zone around the Cuyahoga River and its tributaries (Fig. 5b). Field visits
Table 1 Instability factor data and maps used in the study. Parameter
Source
Data type
Unit
Range
Slope angle Proximity to stream Erodible soil Soil type Liquidity index Precipitation Landcover
Digital Elevation Model Stream map of watershed Ohio Department of Natural Resources, annual soil loss data U.S. Department of Agriculture Soil type data U.S. Department of Agriculture, engineering properties of soil Water and Climate Center of the Natural Resources Conservation Service Landsat Thematic Mapper
Continuous Continuous Continuous Categorical Continuous Continuous Categorical
Degrees Meters kg/km2/year – Unitless (ratio) Centimeters –
0–70 0–4000 2.2E+06–2.2E+07 – − 2.2 to 0.70 92.7–102.67 –
A. Nandi, A. Shakoor / Engineering Geology 110 (2009) 11–20
15
Fig. 5. Classified spatial factor data used in the bivariate and logistic regression analyses: (a) slope angle, (b) streams, (c) soil type, (d) landcover.
revealed that the landslides are concentrated in the erosional banks of the streams; however, due to the sinuous nature of Cuyahoga River and its tributaries, any sensitivity analysis of potential instability along erosional banks of the entire watershed was beyond the scope of this research. Therefore, the proximity to streams map, with a 400-meter buffer zone, was used as an input map to indicate where most of the slope movement was taking place. Data for the landcover maps were extracted from the 1994 Landsat Thematic Mapper Data. This data layer was classified into the general landcover categories of urban, agriculture/open urban areas, shrub/scrub, wooded, open water, non-forested wetlands, and barren. Literature review (Mack, 2004) shows that the landcover of the Cuyahoga River watershed have changed drastically due to widespread deforestation in the late 1980s, but reforestation
started taking place in the late 1990s. Assuming that the number of landslides might increase due to deforestation, the landcover map of 1994 was used in the study purposefully, although more recent landcover maps were available for the area. Also, the historical landslides used in the study have dated as old as 30 years; thus, the landcover map from 1994 better represents the timeframe of the average landslide events (Fig. 5d). A 30-year annual average precipitation dataset, published by the Water and Climate Center of the Natural Resources Conservation Service, for the climatological period of 1961– 1990, was interpolated in ArcGIS. As the recorded historical landslides are as old as 30 years, 30-year annual average precipitation data were used to fit the same range. In the watershed there was a wide range (92.70–102.67 cm) of spatial variation in precipitation data which is the
16
A. Nandi, A. Shakoor / Engineering Geology 110 (2009) 11–20
result of ‘lake effect precipitation’. This phenomenon is very common in the northeastern part of the Cuyahoga River watershed, situated downwind of Lake Erie (one of the Great Lakes) (Heidorn, 2005). Precipitation is not inherent of the ground condition; however a regular annual precipitation cycle in Ohio has the long standing effect of saturating the underground soil material. This paper is concerned with spatial distribution of landslides in the Cuyahoga River watershed; therefore, the time series (temporal) relationship of landslides with precipitation was not evaluated, and a 30-year average annual precipitation was treated as one of the input factors. Soils that contain fine materials exhibit the properties of plasticity and cohesion, depending upon the silt and clay content and the amount of water content. As the water content (Wn) increases, clay or silt becomes softer, changing from solid to plastic state, until it cannot retain its shape and, hence, is considered a liquid. The natural water content in the soil at the threshold between semi-solid and plastic states is called the plastic limit (PL) and between plastic and liquid states is called the liquid limit (LL). The Wn and plasticity index (LL–PL) data for soil coverage in the watershed were obtained from the U.S. Department of Agriculture, Natural Resource Conservation Service. The relative value of Wn, in relation to LL and PL, indicates whether the remolded soil is likely to behave as a liquid, plastic, or brittle material. That relationship is expressed by the liquidity index (LI), LI = ðWn–PLÞ = ðLL–PLÞ
ð1Þ
A LI value greater than 1.0 indicates that Wn is greater than the LL and that the behavior of the sediment would approximate that of a liquid upon shearing. In contrast, a value of 0 to 1 indicates that the soil would behave plastically when sheared, and a LI value of less than 0 indicates that the soil would behave as a brittle material when sheared. The LI values for the Cuyahoga River watershed soils range from −2.2 to 0.70. A grid based soil liquidity index map was prepared in Raster calculator, ArcGIS using information from soil Wn, LL, and PL. All data layers were georeferenced to the Universal Transverse Mercator (UTM) projection system and oriented to North American Datum (NAD) of 1927 (NAD27). The scale of each data layer was chosen to be 1:24,000, the same as the USGS topographic quadrangle maps. The geospatial data layers were then used for bivariate and multivariate analyses. 4. Susceptibility analysis and mapping The landslide susceptibility analysis was performed on the training area of Cuyahoga River watershed using bivariate and multivariate techniques. In each analysis, landslide instability factors (slope angle, soil type, soil erodibility, soil liquidity index, precipitation, landcover, and proximity to the stream) were considered to be independent variables responsible for landslides (dependent variable). In the first stage of the analysis a comprehensive data matrix was prepared where the rows and columns represented the independent and dependent variables, respectively. For the purpose of unbiased sample representation, both the presence (1) and the absence (0) of landslides were used as dependent variables. A total of 170 landslides, covering 147,500 pixels were identified as landslide bearing cells (1). The same number of pixels representing the absence of landslides (0) was selected randomly. Therefore, each sample point had its representative binary dependent variable in terms of 1 (presence of landslide) or 0 (absence of landslide) as well as the information of the independent variables. 4.1. Bivariate statistical analysis In the bivariate analysis, effect of each of the seven independent variables (slope angle, soil type, soil erodibility, soil liquidity index, precipitation, landcover, and proximity to the stream) on landslide
susceptibility was investigated assuming that the independent variables were not inter-related. Landslide susceptibility was estimated separately by exploring the relationship between each of the seven independent variables and the frequency of landslide occurrence. Landslide frequency was calculated for each variable in the training area using histogram analysis (Table 2). Based on the frequency distribution of landslides, with respect to individual factors affecting the landslides in the training area, a numerical ranking system ranging
Table 2 Numerical rating values (reclassification rank) of different instability factors. Instability factors
Class
% area covered by Landslide in parameter class
Landslide frequency %
Reclassification rank
Slope angle (degrees)
0–7.0 7.1–14.0 14.1–21.0 21.1–28.0 28.1–35.0 35.1–42.0 42.1–49.0 49.1–56.0 56.1–63.0 63.1–70.0 0–400 401–800 801–1200 1201–1600 1601–2000 2001–2400 2401–2800 2801–3200 3201–3600 3601–4000 2.2 E+06–4.4E+06 4.5E+06–6.7E+06 6.8E+06–9.0E+06 9.1E+06–1.1E+07 1.2E+07–1.3E+07 1.4E+07–1.6E+07 1.7E+07–1.8E+07 1.9E+07–2.0E+07 2.1E+07–2.2E+07 N2.2E+07 Sand Silt Clay Organic soil Gravel b− 2.2 −2.2 to −1.80 − 1.79 to − 1.40 − 1.39 to −1.0 − 1.09 to − 0.70 − 0.69 to −0.30 − 0.29 to 0.10 0.09 to 0.30 0.31 to 0.70 N.70 b 92.70 92.7–93.97 93.98–95.25 95.26–96.52 96.53–97.79 97.80–99.06 99.07–100.33 100.34–101.60 101.61–102.67 N102.67 Urban Agricultural Shrub Wooded Wetlands
0.02 0.11 0.13 0.20 0.24 0.30 0.10 0.05 0.04 0.01 0.35 0.25 0.13 0.11 0.08 0.07 0.07 0.06 0.05 0.02 0.00 0.01 0.04 0.08 0.13 0.12 0.16 0.18 0.25 0.23 0.25 0.43 0.48 0.02 0.01 0.08 0.04 0.06 0.14 0.10 0.12 0.13 0.17 0.25 0.11 0.05 0.06 0.11 0.08 0.23 0.13 0.19 0.07 0.16 0.12 0.23 0.32 0.10 0.49 0.06
2 9 11 17 20 25 8 4 3 1 29 21 11 9 7 6 6 5 4 2 0 1 3 7 11 10 13 15 21 19 21 36 40 2 1 7 3 5 12 8 10 11 14 21 9 4 5 9 7 19 11 16 6 13 10 19 27 8 41 5
0.2 0.6 0.7 0.8 0.9 1.0 0.5 0.4 0.3 0.1 1.0 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.1 0.2 0.3 0.4 0.6 0.5 0.7 0.8 1.0 0.9 0.6 0.8 1.0 0.4 0.2 0.3 0.1 0.2 0.8 0.4 0.6 1.0 0.9 0.7 0.5 0.1 0.2 0.5 0.4 1.0 0.7 0.9 0.3 0.8 0.6 0.6 0.8 0.4 1.0 0.2
Proximity to stream (meters)
Soil erodibility (kg/km2/year)
Soil type
Liquidity index
Precipitation (cm)
Landcover
A. Nandi, A. Shakoor / Engineering Geology 110 (2009) 11–20
from 0.0 to 1.0 was created. Each factor map was divided into ten equal interval classes. Classes in which the landslides were found to be dominant were assigned the highest number (1.0) in the numerical scale (Table 2). Two categorical data layers, landcover and soil type, were divided into five classes, and each class was assigned a weight value. Increments of 0.2 were used for the sake of uniformity with other continuous factor layer data. Each class value was replaced with a new value, expressing the level of landslide susceptibility (Table 2). This information was transferred in ArcGIS to create seven different reclassified, grid maps with the numerically-ranked data in 10 m by 10 m pixels. Bivariate statistical analysis was performed on the numerically-ranked factor grid data and the landslide susceptibility map was prepared using the following equation: Landslide susceptibility =
1
n
n
∑
i=1
Xi
=Xi max
ð2Þ
where, Xi (i = 1 to 8) are the numerically-ranked categories of individual geospatial factor data layers, Xi max is the highest rank for a particular factor layer, and n is the number of factors. Landslide susceptibility was calculated by adding the ratios of Xi and Xi max. Assuming that similar slope-instability-related conditions prevail in the entire watershed, the analysis results from the training area were extrapolated beyond the training area to encompass the entire test area. All reclassified data layers for the training area as well as the test area (entire watershed) were used to calculate landslide susceptibility in grid format in ArcGIS, using Eq. 2, and each grid was assigned a number, ranging from 0.1 to 1.0, depending upon the level of susceptibility.
17
occurrence. The susceptibility maps are classified in different categories from low, medium, high, and very high (Guzzetti et al., 1999; Lee and Min, 2001; Ohlmacher and Davis, 2003). In determining which procedure to use for classifying landslide susceptibility, the authors considered five classification systems discussed in previous literature and attempted to select the system most suitable for the present study. The classification systems are: expert opinions, binary, natural breaks, standard deviations, and equal interval. Expert-based classification is used in literature (Guzzetti et al., 1999) but the scheme is subjective, and not useful for data oriented analysis. Binary classification uses a threshold of 0.5 to distinguish the two extreme cases where landslide has and has not occurred (Dai and Lee, 2002). This method cannot separate less susceptible areas from the areas more highly susceptible to landsliding. The use of natural breaks is good when there are obvious jumps in the data (Ruff and Czurda, 2008); however, this is not the case in this study. The standard deviation method has certain merits as it uses the average data to generate the mean class intervals, and uses one standard deviation (+ and −) to add a range within the class (Suzen and Doyuran, 2004). In this study the mean class intervals of landslide susceptibility from bivariate model are very different from those of the logistic regression, and could not be used for the sake of comparison. The equal interval method has its own limitation where it can emphasize one class relative to the other (Ayalew and Yamagishi, 2005). However, in this study, our focus was to compare the two statistical models, hence equal interval method was found to be suitable (Yesilnacar and Topal, 2005; Nefeslioglu et al., 2008). The landslide susceptibility, calculated from bivariate and logistic regression models, ranged from 0.0 to 1.0 and was classified in equal intervals of low: 0.00–0.25, medium: 0.26–0.50, high: 0.51–0.75, and very high: 0.76–1.00 (Fig. 6).
4.2. Multivariate statistical analysis 5. Results of statistical analyses In the multivariate statistical model, unlike the bivariate model, all instability factor layers (independent variables) responsible for landslides were treated together and their interactions helped to determine the future probability of landslides in the Cuyahoga River watershed. The multivariate model also evaluated the relative contribution of each variable, putting more emphasis (relative weight) on the variables known to contribute to landslide occurrence. Logistic regression analysis was chosen as the appropriate multivariate technique because of the nature of data used in this study. Most data cannot meet the typical requirements of many parametric statistical tests for normally distributed data (Suzen and Doyuran, 2004). The independent variables in this analysis include non-linear, non-normally distributed, categorical data (landcover and soil type) and continuous data (slope angle, soil liquidity index, soil erodibility, proximity to streams, and precipitation data) and the dependent variable is binary, dichotomous (presence of landslide or absence of landslide) in nature. Detailed descriptions of the logistic regression technique can be found in Hosmer and Lemeshow (2000) and Kleinbaum and Klein (2002). Logistic regression analysis was performed on the training area (Fig. 4) of the Cuyahoga River watershed using a forward stepwise regression method. In the forward stepwise process, independent variables were entered in each step if their probability of remaining in the model was greater than 0.05 (the default value in this study). Results were then transferred in Raster Calculator of ArcGIS for mathematical computation of logistic regression equations, and for developing probabilistic landslide susceptibility maps (pixel size 10 m × 10 m) with probability (P) of landslide occurrence ranging from 0 to 1.
Data from training area of the watershed, used in the bivariate analysis, indicated that landslide frequency increases gradually with
4.3. Susceptibility maps The purpose of using bivariate and logistic regression models is to develop a landslide susceptibility map indicating locations where the probabilities of landslide occurrence fall within the range of 0.0 to 1.0. Numbers closer to 1 better indicate the possibility of landslides
Fig. 6. Landslide susceptibility map of the Cuyahoga River watershed using logistic regression model, the landslides of the test area are overlaid on the map.
18
A. Nandi, A. Shakoor / Engineering Geology 110 (2009) 11–20
an increase in slope angle until the range of 35.1–42.0° and then decreases beyond that range (Table 2). A majority of landslides occur closer to streams, as undercutting of stream banks plays a major role in initiation of landslides. Precipitation versus landslide distribution did not show any particular trend, the highest concentration of landslides occurred in the 96.53–97.79 cm range. Most landslides occur in soils with liquidity index values ranging from −0.29 to 0.10. Liquidity index values between 0 and 1 indicate that the soil behaves as a plastic material when sheared. Even in the case of dry soils (negative liquidity index values), as the water infiltrates rapidly upon heavy rainfall, the soil tends to behave like a plastic or liquid material, resulting in slides. Rotational slides were concentrated in the zone of very-high soil erodibility (1.8 × 107 to 2.0 × 107 kg/km2/year soil loss region) and landslide activity decreases with a decrease in soil erodibility. Landslides are more dominant in the silty and clayey soils than in sands and gravels, and most of them are concentrated in wooded (forested) areas, followed by agricultural land, and then urban areas. The correlation between wooded area and presence of landslides is apparently unusual, although field visits have confirmed that (a) many landslides in the watershed have taken place in wooded areas on steeper slopes and (b) historical landslides in the study area are also presently covered with vegetation. The landcover map was purposely derived from 1994 Landsat images; however, the authors believe that the landslide frequency distribution might show a different trend if more recent landcover maps were used, as the landcover pattern is constantly changing due to urbanization. Using the input factors as independent variables and the presence/ absence of landslides as a dependent variable, logistic regression was performed on the training area of the Cuyahoga River watershed, using the forward stepwise method. Tests for ‘goodness-of-fit’ of the logistic regression model analysis were performed using Maximum Likelihood Estimation (MLE). According to the MLE estimation, smaller −2 log likelihood (−2LL) values mean that the model was a better fit to the data (Table 3) (Hosmer and Lemeshow, 2000). The −2LL values in the training area decreased from 87.894 in step 1 to 24.842 in step 6. Higher Cox and Snell and Nagelkerke R2 values also indicated a better fit of the data with the model (Hosmer and Lemeshow, 2000). The Wald test was used to assess the statistical significance of the coefficient of each independent variable incorporated in the model. The slope angle variable incorporated in the first step of the model indicated the highest Wald value (49.913), suggesting that it is the most significant variable in the analysis (Table 4). In building the final logistic regression model, as other individual variables were added, the Wald value of slope angle decreased from 49.913 in the first step to 9.021 in the final step. Here the significance of individual variables decreased while the collective significance of the model increased. The Wald values revealed that slope angle is statistically the most significant factor in landslide occurrence, followed by proximity to stream, soil erodibility, and soil type, respectively. Thus, the overall result indicated that slope angle, proximity to stream, soil type, and soil erodibility are statistically significant slope instability factors that contributed to landslides in the training area of the Cuyahoga River watershed. The analysis results from the training area were extrapolated into the designated test area of the watershed. Slope angle, proximity to stream,
Table 3 Maximum likelihood estimation (MLE) using −2LL, Cox and Snell and Nagelkerke R2 values. Steps
−2 LL
Cox and Snell R2
Negelkerke R2
1 2 3 4 5 6 (final)
87.894 59.621 41.453 35.713 24.842 27.344
0.663 0.682 0.702 0.718 0.720 0.726
0.856 0.891 0.934 0.955 0.963 0.968
Table 4 Logistic regression coefficients (B0 and Bi), and Wald statistics.
Step 1 Step 2
Step 3
Step 4
Step 5
Step 6
Slope angle Constant Slope angle Soil erodibility Constant Slope angle Soil erodibility Soil type Constant Slope angle Landcover Soil erodibility Soil type Constant Slope angle Proximity to stream Landcover Soil erodibility Soil type Constant Slope angle Proximity to stream Soil erodibility Soil type Constant (B0)
Coefficient (Bi)
Standard error (S.E.)
Wald statistics
Significance (Sig.)
0.340 −6.257 0.410 2.201 −11.309 0.467 2.784 3.573 − 16.949 0.546 −0.991 3.912 4.794 − 17.070 0.763 − 0.023 − 0.761 3.678 4.919 − 20.162 0.697 − 0.032 3.202 5.141 − 20.760
0.051 0.886 0.078 0.756 1.793 0.102 0.767 1.120 3.421 0.188 0.412 0.829 1.283 4.459 0.247 0.035 0.589 1.231 1.702 6.498 0.302 0.043 0.987 1.024 5.930
49.913 48.312 36.506 17.624 35.342 21.514 12.921 11.758 24.631 18.591 6.012 12.163 12.359 17.406 9.209 5.223 1.729 9.09 8.365 11.033 9.021 5.170 8.923 7.701 11.092
0.00 0.00 0.00 0.01 0.00 0.01 0.00 0.12 0.00 0.00 0.02 0.00 0.01 0.00 0.00 0.02 0.16 0.00 0.01 0.00 0.00 0.01 0.00 0.01 0.00
soil erodibility, and soil type data layers of the training area as well as test area data were imported into ArcGIS. Based on logit of f(x), the Bi and B0 coefficient values obtained from the final output of the logistic regression analysis (Table 4) were applied on the raster layers for the entire study area, using Eq. 3, to produce a landslide susceptibility map of the Cuyahoga River watershed (Fig. 6). z = −20:760 + 0:697 Slope Angle–0:032 Proximity to Stream
ð3Þ
+ 3:202 Soil Erodibility + 5:141 Soil Type
6. Model verification and comparison of statistical analyses The accuracy of bivariate and logistic regression analyses in modeling landslide susceptibility in the Cuyahoga River Watershed was evaluated by calculating relative operating characteristics (ROC) (Lee, 2005; Fawcett, 2006) and percentage of known landslides in various susceptibility categories. In the ROC method, the area under the ROC curve (AUC) values, ranging from 0.5 to 1.0, are used to evaluate the accuracy of the model. The AUC defines the quality of the probabilistic model by describing its ability to reliably predict the occurrence or non-occurrence of an event (Yesilnacar and Topal, 2005). The ideal model shows an AUC value close to 1.0, whereas a value close to 0.5 indicates inaccuracy in the model (Fawcett, 2006). In order to apply the ROC method to the study area, a concise and representative dataset was prepared using randomly selected pixels from landslide and non-landslide locations in the training area of the watershed. The AUC value of ROC curve for bivariate analysis was found to be 0.59, with an estimated standard error of 0.11 (Fig. 7). The AUC value for logistic regression was calculated as 0.81, with an estimated standard error of 0.07 (Fig. 8). These results indicate that bivariate analysis is a poor estimator, whereas logistic regression is a good estimator of landslide susceptibility in the Cuyahoga River watershed. Moreover, the ROC curve for logistic regression had a greater steepness in the first part of the curve (with a lower false positive fraction), indicating its greater predictive capability (Chung and Fabbri, 2003; Remondo et al., 2003).
A. Nandi, A. Shakoor / Engineering Geology 110 (2009) 11–20
19
Fig. 9. Percentages of known landslides falling into the different landslide susceptibility categories, the Cuyahoga River watershed, as determined by the bivariate analysis and multivariate analysis (logistic regression model), respectively. Fig. 7. ROC curve evaluation for bivariate analysis.
The landslide susceptibility analysis results were also verified using 104 landslides (covering 85,320 pixels) in the test area of the watershed that were kept aside from the training area model analysis. These 104 landslide locations were overlaid on the landslide susceptibility maps produced by bivariate and logistic regression analysis, and the percentage of pixels of existing landslides within different susceptibility classes was calculated. The bivariate analysis identified that most of the landslides in the test area were concentrated in medium to high susceptibility classes, whereas the logistic regression suggested that 54% of the landslides were concentrated in the very high landslide susceptibility category (Fig. 9). The logistic regression appeared to be reliable inasmuch as the model best corresponded to the actual ground truth in the Cuyahoga River watershed. 7. Discussion Landslide susceptibility maps can help engineers, contractors, landuse planners, etc. minimize landslide hazards if the landslide
susceptibility analyses are performed effectively. In this study, bivariate and multivariate (logistic regression) approaches were applied to generate landslide susceptibility maps, using the same instability factor database. The bivariate approach computes the frequency of landslides with respect to each input factor separately, and the final susceptibility map is a simple combination of all the factors irrespective of their relative significance in causing landslides in a particular region. An innovative, and logical technique that can assign weighting values to the more significant instability factors, and exclude the factors that have less or no effect on landslide activity, can refine the bivariate analysis method. The multivariate statistical analysis (logistic regression) method computes weight-based combinations of significant factors and excludes insignificant factors from consideration, providing more reliable results. Application of a logistic regression model to landslide susceptibility analysis is appropriate for the present study. Due to the strength of the flexible, non-linear nature of the logistic regression model, this method has the advantage of being able to deal with a wide range of data. In addition, the nonparametric nature of this model provides the ability to analyze instability factors that are nonsymmetrical and show skewed distributions, typical in the natural environment. The overall result could be improved by using additional landslides in the model analysis. Landslide susceptibility assessment over a relatively large area requires inclusion of a large number of landslides, using aerial photographs, field mapping, and historical landslide information. This long-term process involves not only supplementary funding and manpower but also requires awareness among civilians. These added resources can enrich the comprehensive instability factor database of the area and will guarantee a more precise landslide susceptibility map. Further improvement of both methods can be accomplished by incorporating additional factors like rainfall and snow index, major wind direction, land use, and more detailed information about the hydrology and hydrogeology (micro watershed delineation, depth to groundwater table) of the area. Due to increasing awareness of landslide hazards, similar studies are being undertaken all over the world. For this reason, it is necessary to establish a standard method for creating landslide susceptibility maps, using modern, computer-based technology. 8. Conclusion
Fig. 8. ROC curve evaluation for logistic regression analysis.
Preparing the landslide susceptibility map was a major step forward in landslide hazard mitigation in the Cuyahoga River watershed. Two different methodologies, bivariate analysis and logistic regression
20
A. Nandi, A. Shakoor / Engineering Geology 110 (2009) 11–20
analysis, were used to define spatial distribution of landslide susceptibilities in the watershed on a regional scale of 1:24,000. The study was conducted by means of pixel-based analysis in ArcGIS. The size of each pixel is 10 m × 10 m, square grid. The accuracy of each method was tested using ROC curves and by comparing the landslide susceptibility maps with known landslide locations. The map produced by the logistic regression method had better representativeness than the map produced using the bivariate method. The bivariate analysis is a rudimentary approach and decline in its performance is because less relevant factors are also considered in the model analysis. The logistic regression method indicates that slope angle, proximity to streams, soil type, and soil erodibility constitute the most significant factors in promoting slope movement in the study area. In addition to preparing a realistic landslide susceptibility map of the watershed, a second objective of the work was to assess, whether landside susceptibility of an area can be predicted using limited sampling locations. The entire study area extends over 2105 km2 of which a 1416 km2 area was chosen as training area. Landslide susceptibility analyses were performed in the training area and were extrapolated into the test area of the watershed, assuming that similar slope-instability-related conditions prevailed in the entire watershed. Logistic regression analysis has yielded a satisfactory result. With regard to such a study and to obtain reasonable results, landslide related information and every independent factor that might have direct bearing to landsliding should be thoroughly investigated. Above all, application of logistic regression model for landslide susceptibility analysis is appropriate for the present study. Results of this analysis provided information that led to significant improvement in understanding the reasons for spatial distribution of landslides in the watershed. The landslide susceptibility maps produced in this study can serve as a reference for planners and engineers in slope management and land-use planning. In order to determine the exact extent of slope instability, areas within high and very high landslide susceptibility categories require more site specific studies by engineering geologists before commencing development. References Agterberg, F.P., Cheng, Q., 2002. Conditional independence test for weights of evidence modeling. Natural Resources Research 11, 249–255. Aleotti, P., Chowdhury, R., 1999. Landslide hazard assessment: summary review and new perspectives. Bulletin of Engineering Geology and the Environment 58 (1), 21–44. Ayalew, L., Yamagishi, H., 2005. The application of GIS-based logistic regression for landslide susceptibility mapping in the Kakuda–Yahiko mountains, Central Japan. Geomorphology 65, 15–31. Brabb, E.E., Pampeyan, E.H., Bonilla, M.G., 1972. Landslide susceptibility in San Mateo County, California, Misc. Field Studies Map MF360 (scale 1:62,500). United States Geological Survey, Reston, Virginia. Brabb, E., Colgan, J.P., Best, T., 1998. Map showing inventory and regional susceptibility for Holocene debris flows and related fast-moving landslides in the conterminous United States. U.S Department of the Interior. USGS 1–5. Carrara, A., Pugliese, C.E., Merenda, L., 1977. Computer based data bank and statistical analysis of slope instability phenomena. Zeitschrift fur Geomorphologie N.F 21 (2), 187–222. Carrara, A., Cardinalli, M., Detti, R., Guzzetti, F., Pasqui, V., Reichenbach, P., 1990. Geographical information systems and multivariate models in landslide hazard evaluation. ALPS 90-Alpine Landslide Practical Seminar, Sixth International Conference and Field Workshop on Landslides. Universita degli Studi de Milano, Milan, Italy, pp. 17–28. Chung, C.F., Fabbri, A.G., 2003. Validation of spatial prediction models for landslide hazard mapping. Natural Hazards 30, 1052–1068. Conoscenti, C., Ciprioano, D.M., Rotigliano, E., 2008. GIS analysis to assess landslide susceptibility in a fluvial basin of NW Sicily (Italy). Geomorphology 94, 325–339. Dai, F.C., Lee, C.F., 2002. Landslide characteristics and slope instability modeling using GIS, Lantau Island, Hong Kong. Geomorphology 42, 213–228. Dai, F.C., Lee, C.F., Xu, Z.W., 2001. Assessment of landslide susceptibility on the natural terrain of Lantau Island, Hong Kong. Environmental Geology 40 (3), 381–391. Donati, L., Turrini, M.C., 2002. An objective method to rank the importance of the factors predisposing to landslides with the GIS methodology: application to an area of the Apennines (Valnerina; Perugia, Italy). Engineering Geology 63, 277–289. Fawcett, T., 2006. An introduction to ROC analysis. Pattern Recognition Letters 27, 861–874.
Gardner, G.,1972, A Regional Study of Landsliding in the Lower Cuyahoga River Valley, Ohio, M.S. Thesis, Kent State University, Kent, Ohio, USA, 345 p. Giraud, R.E., Shaw, L.M., 2007. Landslide Susceptibility Map of Utah Report, MAP 228DM. Utah Geological Survey Publication. 11 p. Glade, T., Crozier, M.J., 2005. A review of scale dependency in landslide hazard and risk analysis. In: Glade, T., Anderson, M., Crozier, M.J. (Eds.), Landslide Hazard and Risk. John Wiley and Sons, Chichester, pp. 75–138. Greco, R., Sorriso-Valvo, M., Catalano, E., 2007. Logistic regression analysis in the evaluation of mass movement susceptibility: The Aspromonte case study, Calabria, Italy. Engineering Geology 89, 47–66. Guzzetti, F., Carrara, A., Cardinalli, M., Reichenbach, P., 1999. Landslide hazard evaluation: a review of current techniques and their application in a multi-case study, central Italy. Geomorphology 31, 181–216. Heidorn, K., 2005. And now the weather. Fifth Hose Books, Canada. Hosmer, D.W., Lemeshow, S., 2000. Applied logistic regression. John Wiley and Sons, New York. 375 pp. Jones, M.L., Shakoor, A., 1989. Some landslide hazards in northern Summit County, Ohio, with special reference to selected landslide locations. Bulletin of the Association of Engineering Geologists 26, 351–368. Kleinbaum, D.G., Klein, M., 2002. Logistic regression: a self learning text, 2nd ed. SpringerVerlag, New York. 513 pp. Lee, S., 2005. Application of logistic regression model and its validation for landslide susceptibility mapping using GIS and remote sensing data. International Journal of Remote Sensing 26 (7), 1477–1491. Lee, S., Min, K., 2001. Statistical analysis of landslide susceptibility at Yongin, Korea. Environmental Geology 40, 1095–1113. Lee, S., Sambath, T., 2006. Landslide susceptibility mapping in the Damrei Romel area, Cambodia using frequency ratio and logistic regression models. Environmental Geology 50, 847–855. Lulseged, A., Hiromitsu, Y., 2004. The application of GIS-based logistic regression for landslide susceptibility mapping in the Kakuda–Yahiko Mountains, Central Japan. Geomorphology 65 (1–2), 15–31 1–2. Mack, J.J., 2004. Integrated Wetland Assessment Program. Part 9: field manual for the vegetation index of biotic integrity for wetlands. Ohio EPA Technical Report WET/ 20049. Ohio Environmental Protection Agency. Division of Surface Water, Wetland Ecology Group, Columbus, Ohio. Nefeslioglu, H.A., Gokceoglu, C., Sonmez, H., 2008. An assessment on the use of logistic regression and artificial neural netwoks with different sampling strategies for the preparation of landslide susceptibility maps. Engineering Geology 97, 171–191. Ohlmacher, C.G., Davis, C.J., 2003. Using multiple regression and GIS technology to predict landslide hazard in northeast Kansas, USA. Engineering Geology. 69, 331–343. Remomdo, J., Gonzalez-Diez, A., Teran, J.R.D., Cendrero, A., 2003. Landslide susceptibility models utilizing spatial data analysis techniques. A case study from the lower Deba valley, Guipuzcoa (Spain). Natural Hazards 30, 267–279. Ruff, M., Czurda, K., 2008. Landslide susceptibility analysis with a heuristic approach in the Eastern Alps (Vorarlberg, Austria). Geomorphology 94, 314–324. Soeters, R., Van Westen, C.J., 1996. Slope instability recognition, analysis and zonation. In: Turner, A.K, Schuster, R.L. (Eds.), Landslides: Investigation and Mitigation, Special Report 247. Transportation Research Board National Research Council, Washington D.C, pp. 129–177. Suzen, M.L., Doyuran, V., 2004. Data driven bivariate landslide susceptibility assessment using geographical information systems: a method and application to Asarsuyu Catchment, Turkey. Engineering Geology 71 (3–4), 303–321. Szabo, J.P., 1987. Wisconsinan stratigraphy of the Cuyahoga Valley in the Erie Basin, Northeastern Ohio. Canadian Journal of Earth Science 24, 279–290. Thiart, C., Bonham-Carter, G.F., Agterberg, F.P., 2003. Conditional independence in weights of evidence: application of an improved test. IAMG, International Association for Mathematical Geology, Portsmouth, UK. Thiery, Y., Malet, J.P., Sterlacchini, S., Puissant, A., Maquaire, O., 2007. Landslide susceptibility assessment by bivariate methods at large scales: application to a complex mountainous environment. Geomorphology 92, 38–59. Van Den Eeckhaut, M., Vanwalleghem, T., Poesen, J., Govers, G., Verstraeten, G., Vanderkerckhove, L., 2006. Prediction of landslide susceptibility using rare events logistic regression: a case-study in Flemish Ardennes (Belgium). Geomorphology 76, 392–410. Van Westen, C.J., 1993. Application of Geographic Information Systems to landslide hazard zonation. ITC Publication No. 15, International Institute for Aerospace Survey and Earth Sciences (ITC), Netherlands, p. 245. Van Westen, C.J., 2004. Geo-information tools for landslide risk assessment: an overview of recent developments. In: Lacerda, W.A., Ehrlich, M., Fountoura, S.A.B., Sayao, A.S.F. (Eds.), Landslides Evaluation and Stabilization. Balkema, Rotterdam, pp. 39–56. Van Westen, C.J., Van Asch, T.W.J., Soeters, R., 2006. Landslide hazard and risk zonation: why is it still so difficult? Bulletin of Engineering Geology and the Environment 65, 167–184. Wieczorek, G.F., 1984. Preparing a detailed landslide-inventory map for hazard evaluation and reduction. Bulletin of the Association of Engineering Geologist 21, 337–342. Yesilnacar, E., Topal, T., 2005. Landslide susceptibility mapping: a comparison of logistic regression and neural networks methods in a medium scale study, Hendek region (Turkey). Engineering Geology 79, 251–266. Yiping, H.R., Beighley, E., 2007. GIS-based regional landslide susceptibility mapping: a case study in southern California. Earth Surface Processes and Landforms 33 (3), 380–393.