A comparison between Bayes discriminant analysis and logistic regression for prediction of debris flow in southwest Sichuan, China

Geomorphology 201 (2013) 45–51 Contents lists available at ScienceDirect Geomorphology journal homepage: www.elsevier.com/locate/geomorph A compari...


Geomorphology 201 (2013) 45–51

Contents lists available at ScienceDirect

Geomorphology journal homepage: www.elsevier.com/locate/geomorph

A comparison between Bayes discriminant analysis and logistic regression for prediction of debris flow in southwest Sichuan, China

Wenbo Xu a,⁎, Shaocai Jing a, Wenjuan Yu a, Zhaoxian Wang a, Guoping Zhang b, Jianxi Huang c

a School of Resources and Environment, University of Electronic Science and Technology of China, 611731 Chengdu, Sichuan Province, China
b Public Weather Service Center of China Meteorological Administration, 100081 Beijing, China
c College of Information and Electrical Engineering, China Agricultural University, 100083 Beijing, China

Article info

Article history: Received 19 January 2013; received in revised form 4 June 2013; accepted 7 June 2013; available online 15 June 2013

Keywords: Debris flow; Rainfall and environmental factors; Prior probability; Bayes discriminant analysis; Logistic regression

Abstract

This study takes Panzhihua and Liangshan Yi Autonomous Prefecture, the areas of Sichuan Province at high risk of debris flow, as the study areas. Using rainfall and environmental factors as predictors, and based on different combinations of debris flow prior probabilities, the prediction of debris flows by two statistical methods, logistic regression (LR) and Bayes discriminant analysis (BDA), was compared. The comprehensive analysis shows that (a) with mid-range prior probabilities, the overall prediction accuracy of BDA is higher than that of LR; (b) with equal and extreme prior probabilities, the overall prediction accuracy of LR is higher than that of BDA; (c) regional prediction models based on rainfall factors only perform worse than those that also include environmental factors, and the prediction accuracies of occurrence and nonoccurrence of debris flows change in opposite directions, which serves as supplementary information. © 2013 Elsevier B.V. All rights reserved.

⁎ Corresponding author. Tel./fax: +86 28 61831571. E-mail address: [email protected] (W. Xu).
0169-555X/$ – see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.geomorph.2013.06.003

1. Introduction

Debris flow is a sudden natural disaster that occurs specifically in mountain areas. With its strong carrying, lashing, and burying abilities and its intense destructiveness, it has become a huge threat to human life and property and an obstacle to economic development (Ma, 2010). Debris flows result from the interaction of geology, topography, geomorphology, hydrology, weather, and other natural factors, which can be divided into two groups: rainfall, which directly triggers debris flow, and environmental factors, which are the basic conditions for its occurrence. This disaster has attracted unprecedented attention worldwide, and many researchers are carrying out relevant research, mainly focusing on its prediction.

In earlier debris flow prediction, most models were built by investigating the relationship between rainfall and debris flow on the basis of processed rainfall data (Kenneth, 1987; Tan and Han, 1992; Chen et al., 2007; Shieh et al., 2009). With deeper research into debris flow prediction and the development of data acquisition technologies, environmental factors have received more attention. These environmental factors are comprehensively analyzed in order to conduct debris flow susceptibility evaluation and risk zoning (Lee, 2005; Liu, 2006; Pradhan and Lee, 2007; Pradhan, 2010). Together with rainfall factors, they are also used as independent predictor variables for debris flow forecasting; that is, the forecasting model is established with both kinds of parameters (Jomelli et al., 2003; Ohlmacher and Davis, 2003; Rupert et al., 2008). Therefore, considering environmental factors in the prediction of debris flow is necessary.

Recent debris flow prediction focuses on two approaches: mechanical prediction based on the disaster formation mechanism (Liu, 2002) and quantitative prediction based on mathematical statistics. Quantitative prediction models are usually applied in research on regional debris flow, and can be divided into probabilistic prediction and deterministic prediction according to the form of their results. Probabilistic prediction is represented by the logistic regression (LR) model, which has been used to build models based on combinations of various rainfall and environmental factors (Ohlmacher and Davis, 2003; Rupert et al., 2008). Deterministic prediction is represented by Bayes discriminant analysis (BDA); Spiegelhalter (1986) applied the Bayes formula to build spatial forecasting models at an early stage, and the method was later used for the prediction of debris flows and landslides (Leclerc, 1994; Graf et al., 2009).

The multivariate statistical methods BDA and LR are widely used for data analysis in event classification. Many researchers have used the two classification methods in various practical fields (Maja et al., 2004; Alkarkhi and Easa, 2008), especially in health sciences and clinical psychology (Payne et al., 1998; Udris et al., 2001). LR is a form of regression that uses the logit transformation, in which the probability of an outcome is divided by the probability of its absence, to predict the probabilities of group membership in relation to several variables (Worth and Cronin, 2003). Bayes discriminant analysis is derived from linear discriminant analysis (LDA), which distinguishes new samples and classifies them into known groups (Fan and Mei, 2002). The two methods rest on different basic ideas: BDA is usually used to model linear relationships between the independent and dependent variables under the assumptions of multivariate normality and equal covariance, while LR models a nonlinear relationship and makes no such assumptions (Lei and Koehly, 2003). In general, BDA gives better results when these assumptions are met, and LR is more suitable otherwise (Efron, 1975; Harrell and Lee, 1985). However, the choice between the two methods depends more on the actual field of statistical application than on whether the assumptions are satisfied. In practice, the assumptions are nearly always violated, so repeated simulation experiments are necessary to find the more suitable method (Maja et al., 2004). In addition, prior probabilities, which are the proportions of group members in the populations, also affect the classification results of BDA and LR. For instance, Fan and Wang (1999) and Lei and Koehly (2003) compared the classification error rates of LDA and LR by Monte Carlo simulation under different prior probabilities in the binary case. Consequently, both methods are applicable to debris flow prediction and worthy of study.

In view of the previous research, modeling based on the occurrence mechanism has low feasibility, and mathematical statistical models still occupy an important position in the prediction of regional debris flow. Nevertheless, BDA is used less often than LR for debris flow prediction, and comparisons between the two are rarer still. Building on both methods, the main objective of this study is to compare the performance of BDA and LR in predicting debris flow under different combinations of debris flow prior probabilities, using historical debris flow data for the period 1981–2000 together with rainfall and environmental background data.

2. Study areas and data source

2.1. Study areas

According to the susceptibility of debris flows in Sichuan Province (Xu et al., 2013), Panzhihua and Liangshan Yi Autonomous Prefecture belong to the debris-flow-prone areas. They are located in the southwest of Sichuan, between longitudes 100°15′ E and 103°53′ E and latitudes 26°03′ N and 29°27′ N (Fig. 1). Distinct dry and wet seasons concentrate the rainfall. Elevation differs significantly between the west and the east, and the complex topography is mainly mountainous. The geological structure is also complicated, with staggered fault zones and seismic belts. Hence, debris flows break out easily in these regions.

Fig. 1. Map of distribution of Liangshan Yi Autonomous Prefecture and Panzhihua Administrative Region, debris flow (1981–2000) and DEM.

2.2. Data sources

Firstly, the debris flow records for the period 1981–2000 were extracted from the China Institute of Geo-Environmental Monitoring; they contain the event time and accurate position information such as latitude and longitude. Each record uses one day as the event unit, and the count was 129. Building the models requires samples of both occurrence and nonoccurrence of debris flow. Because of the limitation of economic and natural conditions, the precipitation records of the meteorological stations nearest to the disaster points were chosen as the records in which debris flow did not happen.

Secondly, the daily rainfall data were collected from the China Meteorological Data Sharing Service System, and the count is 54. In the study areas, these data came from 24 meteorological stations during the period from 1981 to 2000. The two rainfall predictors were the intraday rainfall, obtained by spatial interpolation, and the 10-day previous effective rainfall.

Thirdly, the environmental factors must be selected with full regard to the actual situation of the study areas and to data availability, reliability, and economy. For large-scale study areas such as provinces or cities, following Soeters and van Westen (1996), elevation, slope, aspect, flow accumulation, vegetation coverage, soil types, and land use were selected as the environmental background predictors. In this study, the environmental data comprise: topographic data (elevation, slope, and aspect) and hydrological data (flow accumulation), both derived from the SRTM-DEM with a spatial resolution of 90 × 90 m; vegetation coverage, represented by the NDVI extracted from the AVHRR data product for the period 1981–2000; land use data, from the Sichuan Environmental Monitoring Centre at a scale of 1:100,000; and soil type data, from the soil thematic image sublibrary of China at a scale of 1:1,000,000. The qualitative data were then converted into quantitative data.

3. Methods

The computational process is shown in Fig. 2. First, the relevant predictors were preprocessed: daily rainfall was obtained by the co-kriging spatial interpolation method, which takes the environmental factors into account, and the environmental factor data layers were unified in projection and quantified.
Then the predicting factors were divided into two groups: rainfall-only factors, and rainfall plus environmental factors. Second, the training samples were prepared by eliminating samples with missing data and multicollinear samples. Following Fan and Wang (1999) and Lei and Koehly (2003), three combinations of the prior probabilities of occurrence and nonoccurrence of debris flow were considered: equal prior probabilities (0.5:0.5), mid-range prior probabilities (0.67:0.33 or 0.75:0.25), and extreme prior probabilities (0.9:0.1). The LR and BDA models were built under each of these prior probability combinations. Finally, the results were assessed by model tests, accuracy comparisons, and receiver operating characteristic (ROC) curves.

3.1. Spatial interpolation

Because debris flows mostly occur in remote areas with complex terrain and few meteorological stations, daily rainfall data are difficult to obtain. In practice, therefore, spatial interpolation methods are applied to the stations in adjacent areas to obtain the daily rainfall data (Tao et al., 2009). Co-kriging, one such method, introduces relevant environmental factors such as elevation, slope, and aspect. It is based on variogram theory and structural analysis, performs unbiased optimal estimation of regionalized variables in a limited area, and is a major part of geostatistics. The intraday rainfall at the disaster sites was extracted from this interpolation. The effective antecedent rainfall was then calculated by a widely used formula (Senoo, 1985):

$$R_a = \sum_{i=1}^{n} K R_i \qquad (1)$$

where R_a is the effective rainfall of the previous n days, R_i is the rainfall on the i-th day before the debris flow occurs, and K is the attenuation coefficient. The value of K is usually empirical and lacks practical relevance. Here the attenuation coefficient is calculated by Eqs. (2) and (3) (proposed by Xu et al., 2012), which give the effective rainfall coefficients for short-time heavy rainfall and long-time light rainfall, respectively; x is the number of days, and f(x) is the attenuation coefficient of the x-th day.

$$f(x) = 50.8351 \exp(-1.0093x) \qquad (2)$$

$$f(x) = 30.6558 \exp(-0.4131x) \qquad (3)$$
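As a rough illustration, Eqs. (1)–(3) can be sketched in Python as below. The sketch assumes that the attenuation coefficient K applied to day i is f(i), which the text implies but does not state outright. The function names and the `scale` parameter are hypothetical; `scale` is included only because the text does not say whether f(x) is a fraction or a percentage.

```python
import math

# Coefficients (a, b) of f(x) = a * exp(-b * x), as printed in Eqs. (2) and (3).
SHORT_HEAVY = (50.8351, 1.0093)  # short-time heavy rainfall
LONG_LIGHT = (30.6558, 0.4131)   # long-time light rainfall


def attenuation(x, coeffs):
    """Attenuation coefficient f(x) of the x-th previous day."""
    a, b = coeffs
    return a * math.exp(-b * x)


def effective_rainfall(previous_rain, coeffs, scale=1.0):
    """Eq. (1): attenuated sum of rainfall over the previous n days.

    previous_rain[0] is the rainfall one day before the event,
    previous_rain[1] two days before, and so on.
    scale is a hypothetical unit factor (for example 0.01 if f(x)
    turns out to be a percentage); it is not part of the source.
    """
    return sum(
        scale * attenuation(i, coeffs) * rain
        for i, rain in enumerate(previous_rain, start=1)
    )
```

With a 10-element `previous_rain` list this gives the 10-day previous effective rainfall used as a predictor; rainfall from more distant days contributes less because f(x) decreases with x.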

3.2. Bayes discriminant analysis

The general form of the linear discriminant function is

$$Y = C_1 X_1 + C_2 X_2 + \cdots + C_m X_m + C_0 \qquad (4)$$

where Y is the discriminant index, which may be a probability, a coordinate value, or a score value in different situations; X_1, X_2, ⋯, X_m are the variables of the research object; C_0 is the constant; and C_1, C_2, ⋯, C_m are the relevant discriminant coefficients. Bayes discriminant analysis (BDA) is a commonly used classification method based on the ideas of linear discriminant analysis: taking the prior probability into account, the Bayes formula is used to build a discriminant function for each category in accordance with certain criteria. The Bayes discriminant function is as follows (Zheng et al., 2009):

$$
\begin{cases}
P(j \mid X) = \dfrac{q_j f_j(X)}{\sum_{i=1}^{k} q_i f_i(X)}, & j = 1, \cdots, k \\[2ex]
f_j(X) = (2\pi)^{-p/2} \left|\Sigma_{(j)}\right|^{-1/2} \exp\left[-\dfrac{1}{2}\left(X-\mu_{(j)}\right)' \Sigma_{(j)}^{-1} \left(X-\mu_{(j)}\right)\right]
\end{cases} \qquad (5)
$$

$$q_j f_j(X) \rightarrow \max \qquad (6)$$

$$P(j \mid X) = \ln q_j + C_{0(j)} + C_{j(j)} X \qquad (7)$$

Fig. 2. Flow chart of computational process. [Flow chart steps: daily rainfall data set and environment background data set; spatial interpolation and quantitative processing; intraday rainfall and 10-day previous effective rainfall; different combinations of debris flow prior probabilities; Bayes discriminant analysis and logistic regression; model test; accuracy comparison.]


Table 1 Wilks' λ statistics (a).

| Prior probability | Functions | Wilks' λ | df | Sig. |
|---|---|---|---|---|
| 0.5:0.5 | F_BDA1 | 0.724 | 9 | 1.020E-009 |
| 0.5:0.5 | F_BDA2 | 0.952 | 2 | 0.008 |
| 0.67:0.33 | F_BDA1 | 0.669 | 9 | 1.966E-012 |
| 0.67:0.33 | F_BDA2 | 0.824 | 2 | 1.126E-008 |
| 0.75:0.25 | F_BDA1 | 0.741 | 9 | 1.397E-007 |
| 0.75:0.25 | F_BDA2 | 0.887 | 2 | 4.096E-005 |
| 0.9:0.1 | F_BDA1 | 0.885 | 9 | 0.007 |
| 0.9:0.1 | F_BDA2 | 0.941 | 2 | 0.003 |

(a) BDA1 is a Bayes discriminant analysis model with rainfall and environment as predictors; BDA2 is a Bayes discriminant analysis model with rainfall as predictors; df is the degree of freedom, that is, the number of independent variables in the model.

Table 3 Prediction accuracy of equal prior probability (a).

| Prior probability | Functions | P_Overall | P_large-group | P_small-group |
|---|---|---|---|---|
| 0.5:0.5 | F_BDA1 | 67.5% | 62.9% | 72.2% |
| 0.5:0.5 | F_LR1 | 75.8% | 75.3% | 76.3% |
| 0.5:0.5 | F_BDA2 | 56.2% | 68.0% | 44.3% |
| 0.5:0.5 | F_LR2 | 56.7% | 68.0% | 45.4% |

where q_j = n_j/n, C_{0(j)} = −(1/2)(μ_{(j)})^T Σ_{(j)}^{-1} μ_{(j)}, C_{j(j)} = Σ_{(j)}^{-1} μ_{(j)}, P is the posterior probability, f_j(X) is the probability density function, q_j is the prior probability of the j-th population, n is the total sample size, n_j is the sample size of the j-th population, μ_{(j)} is the mean vector of the j-th population, Σ_{(j)} is the covariance matrix of the j-th population, and p is the number of predictor variables. The goal of BDA is to distinguish the presence or absence of debris flow, a binary event (Y is a binary outcome variable: Y = 1 indicates that the debris-flow event occurs, and Y = 0 that it does not), according to a set of independent predictor variables such as daily rainfall, topography, geomorphology, and so on. The classification rule of BDA used in this study is the maximum posterior probability rule: X_i is classified into the presence of debris flow if P(Y = 1|X_i) > P(Y = 0|X_i); otherwise, X_i is classified into the absence of debris flow.
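The scoring and classification rule above can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses a single pooled covariance matrix for all groups (the equal-covariance assumption of the linear form), and the names `fit_bda` and `predict_bda` are hypothetical. Priors may be passed explicitly, as in the study's 0.5:0.5 to 0.9:0.1 combinations, or default to the sample proportions n_j/n.

```python
import numpy as np


def fit_bda(X, y, priors=None):
    """Return {group: (constant, coefficients)} of the linear Bayes scores."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = [int(c) for c in np.unique(y)]
    n = X.shape[0]
    means = {c: X[y == c].mean(axis=0) for c in classes}
    # Pooled within-group covariance (assumption: equal covariance).
    scatter = sum(
        (X[y == c] - means[c]).T @ (X[y == c] - means[c]) for c in classes
    )
    cov_inv = np.linalg.inv(scatter / (n - len(classes)))
    if priors is None:
        priors = {c: float(np.mean(y == c)) for c in classes}  # q_j = n_j / n
    model = {}
    for c in classes:
        coef = cov_inv @ means[c]                               # C_j(j)
        const = np.log(priors[c]) - 0.5 * means[c] @ coef       # ln q_j + C_0(j)
        model[c] = (const, coef)
    return model


def predict_bda(model, X):
    """Assign each row of X to the group with the largest score."""
    X = np.asarray(X, dtype=float)
    classes = sorted(model)
    scores = np.column_stack([model[c][0] + X @ model[c][1] for c in classes])
    return np.array(classes)[scores.argmax(axis=1)]
```

Because the prior enters only through the ln q_j term, changing the priors shifts the decision boundary toward the group with the smaller prior, so borderline samples are assigned to the group with the larger prior.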

3.3. Logistic regression

From the above description of BDA, BDA is part of the general linear model (Fan, 1996), but LR is not, because it models a nonlinear probabilistic function (Neter et al., 1989). Given a binary outcome variable Y and a set of independent predictor variables, the probability of an event is calculated by the logistic function:

$$\operatorname{logit}(P) = \ln\frac{P}{1-P} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m \qquad (8)$$

$$P = \frac{e^{B}}{1 + e^{B}} \qquad (9)$$

$$B = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_m X_m \qquad (10)$$
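Eqs. (8)–(10), together with the 0.5 cut-off that this study uses for classification, can be sketched in plain Python. The function names are hypothetical and any coefficient values passed in are placeholders, not the fitted values of the study, which are not reported here.

```python
import math


def linear_predictor(beta0, betas, x):
    """Eq. (10): B = beta0 + beta1*x1 + ... + betam*xm."""
    return beta0 + sum(b * xi for b, xi in zip(betas, x))


def probability(B):
    """Eq. (9): P = e^B / (1 + e^B), written to avoid overflow for large |B|."""
    if B >= 0:
        return 1.0 / (1.0 + math.exp(-B))
    e = math.exp(B)
    return e / (1.0 + e)


def logit(p):
    """Eq. (8): the log-odds ln(P / (1 - P)), the inverse of probability()."""
    return math.log(p / (1.0 - p))


def classify(beta0, betas, x, threshold=0.5):
    """Return 1 (debris flow) if P(Y = 1 | x) exceeds the threshold, else 0."""
    return 1 if probability(linear_predictor(beta0, betas, x)) > threshold else 0
```

Since P > 0.5 exactly when B > 0, the 0.5 cut-off is equivalent to checking the sign of the linear predictor.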

where P is the probability of an event occurrence; X_1, X_2, ⋯, X_m are independent variables; β_0 is the constant of the equation; and β_1, β_2, ⋯, β_m are the coefficients of the independent variables. Maximum likelihood estimation is usually used to estimate these parameters.

As discussed earlier, the classification criterion of BDA is the maximum posterior probability rule. The classification rule of LR in the present study is as follows: if $P(Y = 1 \mid X_i) = e^{B}/(1 + e^{B}) > 0.5$, then X_i is classified into the presence of debris flow; otherwise, X_i is classified into the absence of debris flow.

4. Results and discussion

4.1. Prediction model results

4.1.1. BDA model test

In practical application, the regional debris flow data do not fulfill the assumptions of multivariate normal distribution and equal covariance. Hence, Wilks' λ statistics were chosen to test the validity of the discriminant functions of BDA. A smaller Wilks' λ indicates that the variables have a more significant impact on the model, and a significance value (Sig.) < 0.05 indicates that the discriminant functions are effective for discriminant analysis. The Wilks' λ statistics are shown in Table 1. In this table, the Wilks' λ values of BDA with rainfall and environmental factors as predictors are smaller than those of BDA with rainfall factors only, indicating that the variables of the former have a more significant influence than those of the latter; all Sig. values are < 0.05, so both sets of Bayes discriminant functions are effective.

4.1.2. LR model test

The goodness of fit of the logistic regression model is usually assessed by the coefficients of determination R² (Cox and Snell R² (1968) and Nagelkerke R² (1991)) or by the Hosmer–Lemeshow statistic (1989). The former reflect the percentage of variation in the dependent variable explained by all the independent variables in the model; the closer the value is to 1, the better. The latter is a significance test of the regression model: a significance value > 0.05 indicates that no significant difference exists between the observed frequency and the expected frequency obtained from the predicted probability, namely that the regression model is satisfactory. The logistic regression statistics are shown in Table 2.

Table 2 Logistic regression model statistics (a).

| Prior probability | Functions | Hosmer and Lemeshow Sig. | Cox and Snell R² | Nagelkerke R² |
|---|---|---|---|---|
| 0.5:0.5 | F_LR1 | 0.305 | 0.369 | 0.491 |
| 0.5:0.5 | F_LR2 | 0.007 | 0.050 | 0.066 |
| 0.67:0.33 | F_LR1 | 0.944 | 0.445 | 0.620 |
| 0.67:0.33 | F_LR2 | 0.348 | 0.260 | 0.362 |
| 0.75:0.25 | F_LR1 | 0.983 | 0.369 | 0.548 |
| 0.75:0.25 | F_LR2 | 0.572 | 0.183 | 0.272 |
| 0.9:0.1 | F_LR1 | 0.899 | 0.202 | 0.413 |
| 0.9:0.1 | F_LR2 | 0.768 | 0.105 | 0.215 |

(a) LR1 is a logistic regression model with rainfall and environment as predictors; LR2 is a logistic regression model with rainfall as predictors.

Table 3, note (a): Overall, large group, and small group denote the overall prediction accuracy and the prediction accuracies of occurrence and nonoccurrence of debris flow in the samples, respectively.

Table 4 Prediction accuracy of mid-range prior probabilities (a).

| Prior probability | Functions | P_Overall | P_large-group | P_small-group |
|---|---|---|---|---|
| 0.67:0.33 | F_BDA1 | 83.9% | 95.3% | 60.3% |
| 0.67:0.33 | F_LR1 | 80.7% | 88.7% | 65.1% |
| 0.67:0.33 | F_BDA2 | 74.5% | 96.9% | 28.6% |
| 0.67:0.33 | F_LR2 | 69.8% | 84.5% | 39.7% |
| 0.75:0.25 | F_BDA1 | 83.6% | 96.9% | 42.9% |
| 0.75:0.25 | F_LR1 | 83.6% | 92.2% | 57.1% |
| 0.75:0.25 | F_BDA2 | 78.9% | 99.2% | 16.7% |
| 0.75:0.25 | F_LR2 | 78.4% | 97.7% | 19.0% |

(a) Overall, large group, and small group denote the overall prediction accuracy and the prediction accuracies of occurrence and nonoccurrence of debris flow in the samples, respectively.

Table 5 Prediction accuracy of extreme prior probability (a).

| Prior probability | Functions | P_Overall | P_large-group | P_small-group |
|---|---|---|---|---|
| 0.9:0.1 | F_BDA1 | 90.5% | 100.0% | 10.0% |
| 0.9:0.1 | F_LR1 | 91.1% | 98.8% | 25.0% |
| 0.9:0.1 | F_BDA2 | 90.0% | 100.0% | 5.0% |
| 0.9:0.1 | F_LR2 | 90.5% | 100.0% | 5.3% |

(a) Overall, large group, and small group denote the overall prediction accuracy and the prediction accuracies of occurrence and nonoccurrence of debris flow in the samples, respectively.
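The three accuracy measures named in the table footnote can be computed from observed and predicted labels along the following lines. This is an illustrative sketch: the function name is hypothetical and the labels in the example are made up, not the study's samples.

```python
def group_accuracies(y_true, y_pred):
    """Return overall, occurrence (Y = 1) and nonoccurrence (Y = 0) accuracy.

    Assumes both groups are present in y_true; an empty group would
    raise ZeroDivisionError.
    """
    def hit_rate(rows):
        return sum(t == p for t, p in rows) / len(rows)

    pairs = list(zip(y_true, y_pred))
    return {
        "overall": hit_rate(pairs),
        "occurrence": hit_rate([r for r in pairs if r[0] == 1]),
        "nonoccurrence": hit_rate([r for r in pairs if r[0] == 0]),
    }


# Made-up labels for illustration only.
example = group_accuracies([1, 1, 1, 0], [1, 1, 0, 1])
```

The overall accuracy is the average of the two group accuracies weighted by group size. For a balanced sample it is their simple mean, which is consistent with the F_BDA1 row for priors 0.5:0.5: (62.9% + 72.2%) / 2 is about 67.5%.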

Table 2 shows that, in the goodness-of-fit test of Hosmer and Lemeshow (1989), the significance value of the rainfall-only model with priors 0.5:0.5 is < 0.05, whereas the significance values of the remaining models are > 0.05, indicating that those models fit well. The values of the two coefficients of determination R² also show that the LR models with both rainfall and environmental factors as predictors fit comparatively better.
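For reference, the two coefficients of determination can be computed from the log-likelihoods of the intercept-only (null) model and the fitted model using the standard definitions. The sketch below is not taken from the paper: the function names are hypothetical, and the log-likelihood values would come from whatever software fitted the model.

```python
import math


def cox_snell_r2(ll_null, ll_model, n):
    """Cox and Snell R2 = 1 - exp(2 * (LL_null - LL_model) / n)."""
    return 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)


def nagelkerke_r2(ll_null, ll_model, n):
    """Nagelkerke R2: Cox and Snell R2 rescaled by its maximum attainable value."""
    max_r2 = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell_r2(ll_null, ll_model, n) / max_r2
```

Because the Cox and Snell R² cannot reach 1 for a binary outcome, the Nagelkerke R² is never smaller than it. With balanced groups the null log-likelihood is n ln 0.5, so the ceiling is 0.75.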


4.1.3. Comparison between BDA and LR

Owing to sample size limitations, the prediction accuracy tests were performed on the respective training samples of the models. The overall prediction accuracy is given priority, supplemented by the prediction accuracies of occurrence and nonoccurrence of debris flow (large group and small group). The results for the training samples are shown in Tables 3, 4, and 5, and the corresponding receiver operating characteristic (ROC) plots of the training data were produced to make the model capacities clear (Figs. 3, 4, and 5).

According to Table 3, with equal prior probabilities, the overall prediction accuracy of BDA is 8.3 percentage points lower than that of LR with rainfall and environment as predictors, and 0.5 percentage points lower than that of LR with rainfall only as predictors. In addition to the overall prediction accuracies, the area under the ROC curve was obtained (Fig. 3).

According to Table 4, with mid-range prior probabilities, the overall prediction accuracy of BDA based on rainfall and environmental factors is 3.2 percentage points higher than that of LR for priors 0.67:0.33 and equal to it for priors 0.75:0.25; based on rainfall factors only, it is 4.7 percentage points higher (priors 0.67:0.33) and 0.5 percentage points higher (priors 0.75:0.25). In addition to the overall prediction accuracies, the area under the ROC curve was obtained (Fig. 4).

Table 5 shows that, with the extreme prior probability, the overall prediction accuracy of BDA is 0.6 percentage points lower than that of LR with rainfall and environmental factors, and 0.5 percentage points lower than that of LR with rainfall factors only. In addition to the overall prediction accuracies, the area under the ROC curve was obtained (Fig. 5).

The P_large-group and P_small-group columns of the three tables show that, as the prior probabilities become more unequal, the prediction accuracies of the occurrence of debris flows gradually increase, while those of the nonoccurrence of debris flows gradually decrease.

4.2. Discussion

Fig. 3. ROC curve and AUC evaluation with respect to equal prior probability.

The above results show that BDA and LR perform differently under the various combinations of prior probability and predictors in the study area. In general, the more predicting factors a model contains, the higher its accuracy. Therefore, the prediction models that introduce environmental factors in addition to rainfall show varying degrees of improvement in prediction accuracy.

Fig. 4. ROC curve and AUC evaluation with respect to mid-range prior probabilities: (a) is 0.67:0.33; (b) is 0.75:0.25.

Fig. 5. ROC curve and AUC evaluation with respect to extreme prior probability.

As far as the two models are concerned, certain conditions add to the uncertainty of their results. What they share is that both Bayes discriminant analysis and logistic regression need independent variables to build the prediction models. The difference is that Bayes discriminant analysis requires certain assumptions about the variables. These assumptions are difficult to meet completely in practice, but that does not necessarily mean that Bayes discriminant analysis is poorer than logistic regression; conversely, meeting all the assumptions does not mean that Bayes discriminant analysis is necessarily better. The choice of method should be analyzed and discussed at length according to the actual situation. In this study, none of the assumptions of Bayes discriminant analysis are met; however, according to the Wilks' λ statistics, the Bayes discriminant functions can still be used effectively for forecasting. The major disadvantage of the logistic regression model is its parameter estimation method, maximum likelihood estimation, which requires sufficiently large samples for the iterative calculation to yield stable model parameters.

In this study, the prior probabilities have some effect on the relationship equations of BDA and LR. With equal and extreme prior probabilities, the relations among the predictors tend toward the nonlinear model (LR). With mid-range prior probabilities, the relations are approximately linear (BDA) for priors of 0.67 and 0.33; for priors of 0.75 and 0.25, BDA and LR differ little, and these priors may lie on the boundary between mid-range and extreme priors. Hence, the prior probabilities affect the prediction accuracies. The accuracies for occurrence and nonoccurrence of debris flow change in opposite directions and their difference grows, which serves as supplementary information for model evaluation. In addition, the quality and robustness of the selected sample data are both very important, and errors in data processing as well as systematic errors all affect the accuracy of the results.

5. Conclusions

For the Panzhihua and Liangshan Yi Autonomous Prefecture areas, this study compared Bayes discriminant analysis and logistic regression for predicting debris flow with different priors and different predictors. Taken together, the results indicate that Bayes discriminant analysis predicts well with mid-range prior probabilities, while logistic regression predicts well with equal and extreme prior probabilities, whether based on rainfall factors alone or on the combination of rainfall and environmental factors. This paper provides a reference on the statistical models of Bayes discriminant analysis and logistic regression for researchers interested in debris flow prediction.

Acknowledgments

This study is supported by the National Natural Science Foundation of China (Grant Nos. 40971016 and 11102124), the important National Science and Technology Specific Projects of China (No. 2012ZX10004-901001), the Program for New Century Excellent Talents in University, Ministry of Education of China (NCET-10-0604), the Provincial Key Science and Technology Research and Development Program of Sichuan, China (2013SZ0002), and the Fundamental Research Funds for the Central Universities (Grant Nos. ZYGX2012J152, A03008023401011, and A03008023401021).

References

Alkarkhi, A.F.M., Easa, A.M., 2008. Comparing discriminant analysis and logistic regression model as statistical assessment tools of arsenic and heavy metal contents in cockles. Journal of Sustainable Development 1 (2), 102–106. Chen, C.Y., Lin, L.Y., Yu, F.C., Lee, C.S., Tseng, C.C., Wang, A.H., Cheng, K.W., 2007. Improving debris ﬂow monitoring in Taiwan by using high-resolution rainfall products from QPESUMS. Natural Hazards 40 (2), 447–461. Cox, D.R., Snell, E.J., 1968. A general deﬁnition of residuals. Journal of the Royal Statistical Society. Series B (Methodological) 40 (2), 248–275. Efron, B., 1975. The sufﬁciency of logistic regression compared to normal discriminant analysis. Journal of the American Statistical Association 70 (352), 892–898. Fan, X., 1996. Canonical correlation analysis as a general analytical model. In: Thompson, B. (Ed.), Advances in Social Science Methodology. JAI Press, Greenwich, CT, pp. 71–94. Fan, J.C., Mei, C.L., 2002. Data Analysis. Science Press, Beijing (in Chinese). Fan, X.T., Wang, L., 1999. Comparing linear discriminant function with logistic regression for the two-group classiﬁcation problem. The Journal of Experimental Education 67 (3), 265–286. Graf, C., Stoffel, M., Grêt-Regamey, A., 2009. Enhancing debris ﬂow modeling parameters integrating Bayesian networks. Geophysical Research Abstracts 11, 10725. Harrell, F.E., Lee, K.L., 1985. A comparison of the discrimination of discriminant analysis and logistic regression under multivariate normality. In: Se, P.K. (Ed.), Biostatistics: Statistics in Biomedical, Public Health and Environment Sciences. North-Holland, Amsterdam, pp. 333–343. Hosmer, D.W., Lemeshow, S., 1989. Applied Logistic Regression. Wiley, New York. Jomelli, V., Brunstein, D., Chochillon, C., Pech, P., 2003. Hillslope debris ﬂows frequency since the beginning of the 20th century in the Massif des Ecrins (French Alps). In: Rickenmann, D., Chen, C. 
(Eds.), Debris Flows Hazards Mitigation: Mechanics, Prediction and Assessment. Millpress, Rotterdam, pp. 127–137. Kenneth, A., 1987. Debris ﬂow during intense rainfall in Snoedonia, north Wales: a preliminary survey. Earth Surface Processes and Landforms 12, 561–566. Leclerc, Y., 1994. The Design of FM: Data Integrating Package for Zoning Natural Hazards in the Developing Countries. (M.E. Des. Thesis) Environmental Science, Faculty of Environmental Design, University of Calgary Alberta, Canada. Lee, S., 2005. Application of logistic regression model and its validation for landslide susceptibility mapping using GIS and remote sensing data. Journal of Remote Sensing 26, 1477–1491. Lei, P.W., Koehly, L.M., 2003. Linear discriminant analysis versus logistic regression: a comparison of classiﬁcation errors in the two-group case. The Journal of Experimental Education 72 (1), 25–49. Liu, X.L., 2002. An overview of foreign debris ﬂow mechanism models. Journal of Catastrophology 17 (4), 1–6 (in Chinese). Liu, X.L., 2006. Site-speciﬁc vulnerability assessment for debris ﬂows: two case studies. Journal of Mountain Science 3 (1), 20–27 (in Chinese). Ma, D.T., 2010. Some suggestions on controlling catastrophic debris ﬂows on Aug. 8th, 2010 in Zhouqu, Gansu. Journal of Mountain Science 28 (5), 635–640 (in Chinese). Maja, P., Mateja, B., Sandra, T., 2004. Comparison of logistic regression and linear discriminant analysis: a simulation study. Metodološki zvezki 1 (1), 143–161. Nagelkerke, N.J.D., 1991. A note on a general deﬁnition of the coefﬁcient of determination. Biometrika 78 (3), 691–692. Neter, J., Wasserman, W., Kutner, M.H., 1989. Applied Linear Regression Models, 2nd ed. Irwin, Boston, MA. Ohlmacher, G.C., Davis, J.C., 2003. Using multiple logistic regression and GIS technology to predict landslide hazard in northeast Kansas, USA. Engineering Geology 69 (3–4), 331–343. 
Payne, S.L., Lyketsos, C.G., Steele, C., Baker, L., Galik, E., Kopunek, S., Steinberg, M., Warren, A., 1998. Relationship of cognitive and functional impairment to depressive feature in Alzheimer's disease and other dementias. The Journal of Neuropsychiatry and Clinical Neurosciences 10 (4), 440–447. Pradhan, B., 2010. Remote sensing and GIS-based landslide hazard analysis and crossvalidation using multivariate logistic regression model on three test areas in Malaysia. Advance in Space Research 45, 1244–1256. Pradhan, B., Lee, S., 2007. Utilization of optical remote sensing data and GIS tools for regional landslide hazard analysis by using an artiﬁcial neural network model at Selangor, Malaysia. Earth Science Frontiers 14 (6), 143–152.

Rupert, M., Cannon, S.H., Gartner, J.E., Michael, J.A., Helsel, D.R., 2008. Using Logistic Regression to Predict the Probability of Debris Flows in Areas Burned by Wildfires, Southern California, 2003–2006. U.S. Geological Survey. Senoo, K., 1985. Rainfall indexes for debris flow warning evacuating program. Shin-Sabo 38, 16–21. Shieh, C.L., Chien, Y.S., Tsai, Y.J., 2009. Variability in rainfall threshold for debris flow after the Chi-Chi earthquake in central Taiwan, China. International Journal of Sediment Research 24 (2), 177–188. Soeters, R., van Westen, C.J., 1996. Slope instability recognition, analysis, and zonation. In: Turner, A.K., Schuster, R.L. (Eds.), Landslides, Investigation and Mitigation. Transportation Research Board, National Research Council, Special Report. National Academy Press, Washington, DC, pp. 129–177. Spiegelhalter, D.J., 1986. A statistical view of uncertainty in expert systems. In: Gale, W.A. (Ed.), Artificial Intelligence and Statistics. Addison-Wesley Publ. Co., Reading, MA, pp. 17–55. Tan, W.P., Han, Q.Y., 1992. Study on regional critical rainfall indices of debris flow in Sichuan Province. Journal of Catastrophology 7 (2), 37–42 (in Chinese).

Tao, T., Chocat, B., Lui, S., Xin, K., 2009. Uncertainty analysis of interpolation methods in rainfall spatial distribution — case of a small catchment in Lyon. Journal of Water Resource and Protection 2, 136–144. Udris, E.M., Au, D.H., Mcdonell, M.B., Chen, L., Martin, D.C., Tierney, W.M., Fihn, S.D., 2001. Comparing methods to identify general internal medicine clinic patients with chronic heart failure. American Heart Journal 142, 465–471. Worth, A.P., Cronin, M.T., 2003. The use of discriminant analysis, logistic regression and classiﬁcation tree analysis in the development of classiﬁcation models for human health effects. Journal of Molecular Structure (Theochem) 622 (1), 97–111. Xu, W.B., Yu, W.J., Zhang, G.P., 2012. Prediction method of debris ﬂow by logistic model with two types of rainfall: a case study in Sichuan, China. Natural Hazards 62 (2), 733–744. Xu, W.B., Yu, W.J., Jing, S.C., Zhang, G.P., Huang, J.X., 2013. Debris ﬂow susceptibility assessment by GIS and information value model in a large-scale region, Sichuan Province (China). Natural Hazards. http://dx.doi.org/10.1007/s11069-012-0414-z. Zheng, G.Q., Zhang, H.J., Liu, T., Wu, J.D., Hou, X.F., Ye, Z.H., 2009. Prediction model of ﬂush ﬂood and debris ﬂow in Miyun County based on Bayes discriminatory analysis. Bulletin of Soil and Water Conservation 29 (1), 83–87 (in Chinese).
