Knowledge-Based Systems 26 (2012) 61–68
Two credit scoring models based on dual strategy ensemble trees

Gang Wang a,b,*, Jian Ma c, Lihua Huang d, Kaiquan Xu c,e

a School of Management, Hefei University of Technology, Hefei, Anhui 230009, PR China
b Key Laboratory of Process Optimization and Intelligent Decision-making, Ministry of Education, Hefei, Anhui, PR China
c Department of Information Systems, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
d School of Management, Fudan University, Shanghai 200433, PR China
e Department of Electronic Commerce, School of Business, Nanjing University, Nanjing, Jiangsu 210093, PR China
Article info

Article history:
Received 14 August 2009
Received in revised form 28 June 2011
Accepted 28 June 2011
Available online 13 July 2011

Keywords:
Credit scoring
Ensemble learning
Bagging
Random subspace
Decision tree
Abstract

The decision tree (DT) is one of the most popular classification algorithms in data mining and machine learning. However, the performance of DT-based credit scoring models is often poorer than that of other techniques, mainly for two reasons: in the credit scoring setting, DT is easily affected by (1) noisy data and (2) redundant attributes. In this study, we propose two dual strategy ensemble trees, RS-Bagging DT and Bagging-RS DT, which combine two ensemble strategies, bagging and the random subspace method, to reduce the influence of noisy data and redundant attributes and to achieve higher classification accuracy. Two real-world credit datasets are used to demonstrate the effectiveness and feasibility of the proposed methods. Experimental results reveal that the single DT obtains the lowest average accuracy among five single classifiers, the others being Logistic Regression Analysis (LRA), Linear Discriminant Analysis (LDA), the Multi-layer Perceptron (MLP) and the Radial Basis Function Network (RBFN). Moreover, RS-Bagging DT and Bagging-RS DT obtain better results than the five single classifiers and four popular ensemble classifiers, i.e., Bagging DT, Random Subspace DT, Random Forest and Rotation Forest. The results show that RS-Bagging DT and Bagging-RS DT can be used as alternative techniques for credit scoring.

© 2011 Elsevier B.V. All rights reserved.
1. Introduction

The recent world financial tsunami has aroused unprecedented attention to credit risk among financial institutions. A good credit risk assessment method helps financial institutions grant loans to creditworthy applicants, thus increasing profits, and deny credit to non-creditworthy applicants, thus decreasing losses. In recent years, credit scoring has become one of the primary ways for financial institutions to assess credit risk, improve cash flow, reduce possible risks and make managerial decisions [1,2]. The accuracy of credit scoring is critical to financial institutions' profitability: even a 1% improvement in the accuracy of recognizing applicants with bad credit can avert great losses for a financial institution [3].

Credit scoring was originally performed subjectively according to personal experience, and later on the basis of the 5Cs: the character of the consumer, the capital, the collateral, the capacity and the economic conditions. However, with the tremendous increase in the number of applicants, it is impossible to conduct this work manually.
Two categories of automatic credit scoring techniques, statistical techniques and Artificial Intelligence (AI) techniques, have been examined in prior studies [4]. Several statistical techniques have been widely applied to build credit scoring models, such as Linear Discriminant Analysis (LDA) [5,6], Logistic Regression Analysis (LRA) [7,8] and Multivariate Adaptive Regression Splines (MARS) [9]. However, the problem with applying these statistical techniques to credit scoring is that some assumptions, such as multivariate normality of the independent variables, are frequently violated in practice, which makes them theoretically invalid for finite samples [4]. In recent years, many studies have demonstrated that AI techniques, such as the Artificial Neural Network (ANN) [8,10], the decision tree (DT) [11,12], Case-Based Reasoning (CBR) [13,14] and the Support Vector Machine (SVM) [2,15,16], can serve as alternative methods for credit scoring. In contrast with statistical techniques, AI techniques do not assume particular data distributions; they extract knowledge automatically from the training samples. According to previous studies, AI techniques are superior to statistical techniques in dealing with credit scoring problems, especially for nonlinear pattern classification [4]. Among the AI techniques, DT is widely used for three reasons: first, owing to its intuitive representation, the resulting classification model
is easy for humans to assimilate [17,18]. Second, DT is non-parametric: DT construction algorithms make no assumptions about the underlying distribution and are thus especially suitable for exploratory knowledge discovery. Third, DT can be constructed relatively fast compared with other techniques [18,19]. Despite these merits, DT is rarely used as a credit scoring model because its classification accuracy is lower than that of other techniques and it is easily affected by noisy data and redundant attributes [20–22]. In this study, we propose two dual strategy ensemble trees, RS-Bagging DT and Bagging-RS DT, based on two ensemble strategies, bagging and the random subspace method, to reduce the influence of noisy data and redundant attributes and to achieve higher classification accuracy.

Ensemble learning is a machine learning paradigm in which multiple learners are trained to solve the same problem [23]. In contrast to ordinary machine learning approaches, which try to learn one hypothesis from the training data, ensemble learning constructs a set of hypotheses and combines them [24]. To reduce the influence of noisy data and redundant attributes on the accuracy of DT, we introduce two ensemble strategies, bagging and random subspace. First, since prior studies have shown that bagging performs better than other ensemble methods, e.g., boosting, in situations with substantial noise [25,26], we adopt bagging to reduce the influence of noisy data on DT. Second, since the random subspace method has been found to work well when redundant information is dispersed across all the features [27,28], we adopt random subspace to reduce the influence of redundant attributes on DT. Because the data can be processed in different orders, i.e., first reducing the noise with the bagging strategy and then the redundant attributes with the random subspace strategy, or first the redundant attributes with random subspace and then the noise with bagging, there are two dual strategy ensemble trees: RS-Bagging DT and Bagging-RS DT.

For testing and illustration purposes, two open credit datasets are used to verify the effectiveness of the two proposed ensemble methods. The experimental results reveal that DT obtains the lowest average accuracy among five single classifiers, the others being LRA, LDA, the Multi-layer Perceptron (MLP) and the Radial Basis Function Network (RBFN). In addition, RS-Bagging DT and Bagging-RS DT obtain better results than the five single classifiers and four popular ensemble classifiers, i.e., Bagging DT, Random Subspace DT, Random Forest and Rotation Forest.
All these results illustrate that RS-Bagging DT and Bagging-RS DT can be used as alternative techniques for credit scoring.

The remainder of the paper is organized as follows. Section 2 presents the background on DT, bagging and the random subspace method. Section 3 proposes the two algorithms, RS-Bagging DT and Bagging-RS DT, based on bagging and random subspace, for credit scoring. Section 4 presents the details of the experimental design. Section 5 reports the experimental results. Based on the observations and results of these experiments, Section 6 draws conclusions and outlines future research directions.

2. Background

2.1. Decision tree

A decision tree is a tree-like structure (Fig. 1) that divides a set of input samples into several smaller sets based on characteristics of their attributes [17,29]. Unlike conventional statistical classifiers, which use all available features simultaneously and make a single membership decision for each case, the DT uses a multi-stage, sequential approach to the problem of label assignment: the labeling process is a chain of simple decisions based on the results of sequential tests rather than a single, complex decision. Sets of decision sequences form the branches of the DT, with tests applied at the nodes. DT construction involves the recursive partitioning of a set of training data, which is split into increasingly homogeneous subsets on the basis of tests applied to one or more of the attribute values [29]; these tests are represented by nodes. A univariate DT applies a test to a single attribute at a time, whereas a multivariate DT uses one or more attributes simultaneously. Labels are assigned to terminal (leaf) nodes by means of an allocation strategy, such as majority voting. In this study, we choose the widely used C4.5 algorithm as the base learner.

2.2. Bagging

Breiman's bagging, short for bootstrap aggregating, is one of the earliest ensemble learning algorithms [30]. It is also one of the most intuitive and simplest to implement, with surprisingly good performance. Diversity in bagging is obtained by using bootstrapped replicas of the training dataset: different training data subsets are randomly drawn, with replacement, from the entire training dataset. Each subset is used to train a different base learner of the same type, and the base learners' predictions are combined by majority vote. Simple as it is, this strategy can reduce variance when combined with the base learner generation strategy. The pseudo-code for the bagging algorithm is given in Fig. 2.
Fig. 1. An example of decision tree.
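Since the figure itself is not reproduced here, the following minimal Python sketch illustrates the kind of univariate tree this section describes. The data are hypothetical, and scikit-learn grows a CART-style tree rather than C4.5, so this is only an approximation of the paper's base learner.

import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
# Hypothetical applicant attributes (not the paper's datasets).
X = rng.normal(size=(200, 3))
y = (X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Each internal node tests a single attribute; leaf nodes carry class labels.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["age", "income", "years_employed"]))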
Fig. 2. The bagging algorithm.
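The pseudo-code of Fig. 2 is likewise not reproduced here. As a stand-in, here is a minimal sketch of the same procedure using scikit-learn's BaggingClassifier (an assumption on our part; the paper itself uses WEKA's Bagging module):

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Bagging: each tree is trained on a bootstrap replica (drawn with
# replacement) of the full training set; prediction is by majority vote.
bagging_dt = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # named base_estimator in scikit-learn < 1.2
    n_estimators=50,   # ensemble size (the paper tries 10, 50, 100 and 150)
    max_samples=1.0,   # each replica has the size of the training set
    bootstrap=True,    # sample instances with replacement
    random_state=0,
)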
Fig. 3. The random subspace algorithm.
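Likewise, a minimal sketch of the random subspace method of Fig. 3 under the same assumptions: every tree sees all training instances but only a random fraction k of the attributes.

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Random subspace: keep all instances, subsample the feature set.
random_subspace_dt = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=50,
    max_samples=1.0,
    bootstrap=False,           # use every training instance ...
    max_features=0.7,          # ... but only a fraction k of the attributes
    bootstrap_features=False,  # draw attributes without replacement
    random_state=0,
)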
2.3. Random subspace

The random subspace method is an ensemble construction technique proposed by Ho [27]. In the random subspace method the training dataset is also modified, as in bagging, but the modification is performed in the feature space rather than in the example space. The pseudo-code for the random subspace algorithm is given in Fig. 3. The random subspace method may benefit both from using random subspaces to construct the classifiers and from aggregating the classifiers: when the dataset has many redundant attributes, one may obtain better classifiers in random subspaces than in the original feature space [27], and the combined decision of such classifiers may be superior to that of a single classifier constructed on the original training dataset in the complete feature space.

3. Two dual strategy ensemble trees: RS-Bagging DT and Bagging-RS DT for credit scoring

Like many other applications, credit scoring suffers from the concurrent negative effects of noise and redundant attributes.
The noise corrupts the training data, and the redundant attributes prevent the classifier from picking relevant attributes when building the model; both may reduce classification accuracy. This is especially true for DT: because it uses a "divide and conquer" strategy and chooses the most promising attribute to split on at each point, redundant attributes and noise can cause overfitting, instability and poor accuracy [29,30]. All of this makes the performance of DT-based credit scoring models relatively poorer than that of other techniques, so DT is seldom used as a credit scoring model. The greedy nature of DT makes it over-sensitive to redundant attributes and noise, and hence especially unstable: a minor change in one split close to the root will change the whole subtree below.

Since bagging trains classifiers on bootstrap samples of the training data and combines the results by majority voting, it is less influenced by noise than a single classifier; we therefore introduce the bagging strategy to obtain better performance than a single DT. At the same time, since each DT is trained on a bootstrap sample whose distribution is similar to that of the whole sample, the DTs in a bagging ensemble have relatively high classification accuracy. However, for Bagging DT, the only factor encouraging diversity between DTs is the variation of instances across the training subsets. Although the DTs used in Bagging DT are sensitive to small changes in the data, bootstrap sampling appears to lead to ensembles of low diversity compared with other ensemble strategies; as a result, Bagging DT requires larger ensemble sizes to perform well [31]. Moreover, because Bagging DT draws bootstrap samples from the training data, it cannot resolve the problem of redundant attributes.
Fig. 4. The RS-Bagging DT algorithm.
Fig. 5. The Bagging-RS DT algorithm.
As discussed in Section 1, the random subspace method has been found to work well when redundant information is dispersed across all the features [27,28]; we therefore introduce the random subspace strategy into Bagging DT to enhance its diversity and obtain better performance. Because the data can be processed in different orders, i.e., first reducing the noise and then the redundant attributes, or first the redundant attributes and then the noise, there are two dual strategy ensemble trees: RS-Bagging DT and Bagging-RS DT. Since RS-Bagging DT and Bagging-RS DT can generate more diversified classifiers than bagging or random subspace alone, they may outperform both. The algorithms are described in Figs. 4 and 5.
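Since the pseudo-code lives in Figs. 4 and 5, which are not reproduced here, the following sketch rests on one plausible reading of the two names: RS-Bagging DT draws random subspaces first and bags trees within each subspace, while Bagging-RS DT draws bootstrap samples first and builds random-subspace trees on each sample. The nesting order is our assumption; in scikit-learn the two variants can be composed directly from the building blocks of Section 2.

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

def rs_bagging_dt(n_outer=10, n_inner=10, k=0.7, seed=0):
    # Outer layer: random subspaces; inner layer: bagged trees per subspace.
    inner = BaggingClassifier(DecisionTreeClassifier(), n_estimators=n_inner,
                              bootstrap=True, random_state=seed)
    return BaggingClassifier(inner, n_estimators=n_outer, bootstrap=False,
                             max_features=k, random_state=seed)

def bagging_rs_dt(n_outer=10, n_inner=10, k=0.7, seed=0):
    # Outer layer: bootstrap samples; inner layer: random-subspace trees.
    inner = BaggingClassifier(DecisionTreeClassifier(), n_estimators=n_inner,
                              bootstrap=False, max_features=k, random_state=seed)
    return BaggingClassifier(inner, n_estimators=n_outer, bootstrap=True,
                             random_state=seed)

Either way, the overall ensemble contains n_outer x n_inner trees, which is consistent with the observation above that the dual strategies generate more diversified classifiers than bagging or random subspace alone.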
4. Experimental setup

4.1. Real world credit datasets

Two real-world datasets are used to evaluate the performance of the two ensemble methods: the Australian and the German credit datasets.
Table 1
The characteristics of the two datasets used in the experiment.

Dataset             Total cases    Good/bad cases    No. of attributes
Australian credit   690            307/383           14
German credit       1000           700/300           20
Table 2
Confusion matrix for credit scoring.

                                   Actual positive (non-risk)    Actual negative (risk)
Test result positive (non-risk)    True positive (TP)            False positive (FP)
Test result negative (risk)        False negative (FN)           True negative (TN)
These two datasets are available from the UCI machine learning repository [32] and have been widely used in credit scoring research. The Australian dataset includes 307 good customers and 383 bad customers; each record contains 6 nominal and 8 numeric attributes. The German dataset consists of 700 good customers and 300 bad customers; it contains 20 attributes, of which 7 are continuous and 13 categorical. A summary of the characteristics of the two datasets is reported in Table 1.
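As a convenience for reproduction, the German dataset is also mirrored on OpenML under the name "credit-g" (the mirror name is an assumption on our part; the study itself uses the UCI copies), so its characteristics can be checked quickly in Python:

from sklearn.datasets import fetch_openml

# German credit data: expect 1000 cases, 20 attributes, 700 good / 300 bad.
X, y = fetch_openml(name="credit-g", version=1, as_frame=True, return_X_y=True)
print(X.shape)           # (1000, 20)
print(y.value_counts())  # good: 700, bad: 300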
4.2. Performance evaluation

The evaluation criteria of our experiments are adopted from established standard measures in the field of credit scoring: average accuracy, type I error and type II error. Each measure has its merits and limitations, so we use a combination of these measures, rather than a single one, to assess the performance of the credit scoring models. The measures can be explained with respect to the confusion matrix shown in Table 2. Formally, they are defined as follows:

Average accuracy = (TP + TN) / (TP + FP + FN + TN)    (1)

Type I error = FN / (TP + FN)    (2)

Type II error = FP / (TN + FP)    (3)
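As an illustration (not part of the original study), Eqs. (1)-(3) translate directly into Python; the counts follow the layout of Table 2 and the example figures below are hypothetical.

def credit_scoring_metrics(tp, fp, fn, tn):
    # Eq. (1): overall fraction of correctly classified applicants.
    average_accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Eq. (2): good (non-risk) applicants wrongly classified as bad.
    type_i_error = fn / (tp + fn)
    # Eq. (3): bad (risk) applicants wrongly classified as good.
    type_ii_error = fp / (tn + fp)
    return average_accuracy, type_i_error, type_ii_error

# Hypothetical counts for a 690-case dataset.
print(credit_scoring_metrics(tp=280, fp=43, fn=27, tn=340))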
5. Experimental results

The experiments described in this section were performed on a PC with a 3.00 GHz Intel Core Duo CPU and 4 GB of RAM, running the Windows XP operating system. The data mining toolkit WEKA (Waikato Environment for Knowledge Analysis) version 3.6.0 was used for the experiments. WEKA is an open-source toolkit that consists of a collection of machine learning algorithms for solving data mining problems [33].

In this study, we compared the performance of 11 methods: LRA, LDA, MLP, RBFN, DT, Bagging DT, Random Subspace DT, Random Forest, Rotation Forest, RS-Bagging DT and Bagging-RS DT. For the implementation of DT, we chose the J48 module (WEKA's own version of C4.5). For the implementations of Bagging DT, Random Subspace DT, Random Forest and Rotation Forest, we chose the Bagging, RandomSubSpace, RandomForest and RotationForest modules, respectively. RS-Bagging DT and Bagging-RS DT were implemented in Eclipse using the WEKA package (weka.jar). Except where stated otherwise, all the default parameters in WEKA were used. On each dataset, four ensemble sizes were tried for each compared ensemble algorithm: 10, 50, 100 and 150. Moreover, five subspace rates were tested, with the value of k set to 0.5, 0.6, 0.7, 0.8 and 0.9.

To minimize the influence of the variability of the training set, 10 times 10-fold cross-validation was performed on the Australian and German credit datasets. In detail, each credit dataset is partitioned into 10 subsets with similar sizes and distributions. The union of nine subsets is used as the training set while the remaining subset is used as the test set; this is repeated 10 times so that every subset is used as the test set once, and the average test result is taken as the result of the 10-fold cross-validation. The whole process is then repeated 10 times with random partitions into 10 subsets, and the average results over these partitions are recorded.
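The study runs entirely in WEKA; purely as an illustrative stand-in, the evaluation protocol just described can be sketched in Python with scikit-learn (the OpenML name "credit-g" and the nested construction of Bagging-RS DT are assumptions carried over from the earlier sketches):

import pandas as pd
from sklearn.datasets import fetch_openml
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# German credit data; one-hot encode the categorical attributes for the trees.
X, y = fetch_openml(name="credit-g", version=1, as_frame=True, return_X_y=True)
X = pd.get_dummies(X)

# Bagging-RS DT: bagging outside, random subspace (k = 0.7) inside.
inner = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10,
                          bootstrap=False, max_features=0.7, random_state=0)
model = BaggingClassifier(inner, n_estimators=10, bootstrap=True, random_state=0)

# 10 repetitions of stratified 10-fold cross-validation, as described above.
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print("average accuracy over 10x10 CV: %.4f" % scores.mean())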
Table 3 presents the three performance indicators, i.e., average accuracy, type I error and type II error, for the different methods. Note that the results of LRA, LDA, MLP and RBFN are taken from the original literature [8]. For Bagging DT, Random Subspace DT, Random Forest, Rotation Forest, RS-Bagging DT and Bagging-RS DT, we report the maximal average accuracy over the different parameter settings, together with the corresponding type I and type II errors. The experimental results for the other ensemble sizes and subspace rates are presented later in this section.

Table 3
Results of different methods (first three result columns: Australian credit dataset; last three: German credit dataset).

Methods               Avg. acc. (%)  Type I (%)  Type II (%)  Avg. acc. (%)  Type I (%)  Type II (%)
LRA (West, 2000)      87.25          11.07       14.09        76.30          11.86       51.33
LDA (West, 2000)      85.96          7.82        19.06        72.60          27.71       26.67
MLP (West, 2000)      85.84          15.40       13.26        73.28          13.52       57.53
RBFN (West, 2000)     87.14          13.15       12.74        74.60          13.47       52.99
DT                    84.39          18.00       13.70        72.10          17.06       53.20
Bagging DT            86.38          14.15       13.19        76.45          11.85       50.83
Random Subspace DT    86.93          18.18       8.97         76.12          6.44        64.60
Random Forest         86.89          13.75       12.60        77.05          9.52        54.28
Rotation Forest       86.55          13.17       13.69        77.00          9.39        54.78
RS-Bagging DT         88.17          19.44       7.52         78.36          5.98        58.56
Bagging-RS DT         88.01          15.60       9.00         78.52          7.19        55.44

It is evident from Table 3 that RS-Bagging DT has the highest average accuracy, 88.17%, on the Australian credit dataset, followed by Bagging-RS DT with 88.01%, LRA with 87.25% and RBFN with 87.14%. Note that, as expected, DT has the lowest average accuracy among the five single classifiers, i.e., LRA (87.25%), LDA (85.96%), MLP (85.84%), RBFN (87.14%) and DT (84.39%).

The results for the German credit dataset exhibit some of the same patterns. Bagging-RS DT has the highest average accuracy, 78.52%, followed closely by RS-Bagging DT (78.36%); Random Forest (77.05%) and Rotation Forest (77.00%) are comparable in terms of average accuracy. As expected, DT again has the lowest average accuracy among the five single classifiers, i.e., LRA (76.30%), LDA (72.60%), MLP (73.28%), RBFN (74.60%) and DT (72.10%).

The reason RS-Bagging DT and Bagging-RS DT achieve higher average accuracy is that they reduce the type I error or the type II error.
Fig. 6. Sensitivity analysis of classification performance on the Australian credit dataset (panels (a)-(d): ensemble sizes 10, 50, 100 and 150).
To illustrate this point, Figs. 6 and 7 give detailed results for different ensemble sizes and subspace rates. As shown in Fig. 6, RS-Bagging DT and Bagging-RS DT obtain higher average accuracy than DT, Random Forest, Rotation Forest, Bagging DT and Random Subspace DT for every ensemble size. RS-Bagging DT and Bagging-RS DT reach their best average accuracies (88.17% and 88.01%) when the ensemble size is 150 and the subspace rate is 0.7. As for the other two performance indicators, for a given ensemble size the type I errors of Random Subspace DT, RS-Bagging DT and Bagging-RS DT decrease with increasing subspace rate, while their type II errors increase.

As shown in Fig. 7, RS-Bagging DT and Bagging-RS DT also obtain higher average accuracy on the German credit dataset. RS-Bagging DT reaches its best average accuracy (78.36%) when the ensemble size is 150 and the subspace rate is 0.6, and Bagging-RS DT reaches its best result (78.52%) when the ensemble size is 100 and the subspace rate is 0.7. Note that, unlike on the Australian credit dataset, for a given ensemble size the type I errors of RS-Bagging DT and Bagging-RS DT increase with increasing subspace rate while their type II errors decrease. The main reason is that bad cases predominate in the Australian credit dataset whereas good cases predominate in the German credit dataset, and the type I error relates to the positive (good) cases while the type II error relates to the negative (bad) cases.

It is interesting that the type II error of Random Subspace DT increases more quickly, and its type I error decreases more slowly, than those of RS-Bagging DT and Bagging-RS DT on the Australian credit dataset, while the opposite holds on the German credit dataset. Moreover, since the type I and type II errors of Random Subspace DT are larger in absolute value than those of RS-Bagging DT and Bagging-RS DT, the average accuracy of Random Subspace DT is lower than that of RS-Bagging DT and Bagging-RS DT. The main reason may be that Random Subspace DT can reduce the influence of redundant attributes but does little against noisy data.
Fig. 7. Sensitivity analysis of classification performance on the German credit dataset (panels (a)-(d): ensemble sizes 10, 50, 100 and 150).
For the other three popular ensemble methods, Bagging DT, Random Forest and Rotation Forest, the type I and type II errors are smaller than those of the single DT, so their average accuracies are better. Note that Random Forest and Rotation Forest obtain better results than Bagging DT, in line with prior studies [31,34], but worse results than RS-Bagging DT and Bagging-RS DT. The main reason may be that, building on bagging, Random Forest uses a random selection of features to split each node and Rotation Forest generates classifier ensembles based on feature extraction; by introducing these mechanisms into bagging, diversity is promoted among the base learners, so Random Forest and Rotation Forest outperform Bagging DT. However, as discussed above, Random Forest and Rotation Forest can only reduce the influence of redundant attributes, so they obtain worse results than RS-Bagging DT and Bagging-RS DT.

According to the above experimental results, we can draw the following conclusions:

(1) The single DT, as discussed before, is easily affected by noisy data and redundant attributes, and its classification accuracy is lower than that of other techniques. In our experiments, the average accuracy of DT is the lowest among the five popular single classifiers, i.e., DT, LRA, LDA, MLP and RBFN. Moreover, even a one percent increase in scoring accuracy would avert great losses for financial institutions. Thus the single DT on its own is not suitable for credit scoring.

(2) The four popular ensemble methods, Bagging DT, Random Subspace DT, Random Forest and Rotation Forest, obtain better results than the single DT in our experiments. These ensemble methods introduce specific mechanisms to reduce the influence of noise or redundant attributes. For example, Random Subspace DT introduces random selection of attributes to reduce the influence of redundant attributes, and Bagging DT combines classifiers built on randomly generated training sets to reduce the variance and the influence of noise. The experimental results show that all these mechanisms are effective on the real-world credit datasets.

(3) RS-Bagging DT and Bagging-RS DT combine the mechanisms of Bagging DT and Random Subspace DT to reduce the influence of noise and redundant attributes.
From a theoretical perspective, RS-Bagging DT and Bagging-RS DT introduce random selection of both training instances and attributes to increase the diversity of the base learners. In our experiments, the empirical results show that RS-Bagging DT and Bagging-RS DT are more accurate than Bagging DT and Random Subspace DT. They can therefore be used as alternative techniques for credit scoring.

6. Conclusions and future directions

Credit scoring has become a very important task as financial institutions have to decide whether to grant credit to consumers who submit applications, and even a one percent increase in scoring accuracy would avert great losses for financial institutions. To date, many credit scoring models have been developed based on traditional statistical techniques or AI techniques. The traditional statistical techniques perform favorably only when their essential assumptions are satisfied; in contrast, AI techniques do not require knowledge of the underlying relationships between input and output variables. Although DT is one of the most popular classification algorithms in data mining and machine learning, it is seldom used for credit scoring because the performance of DT-based credit scoring models is often poorer than that of other techniques, mainly because DT is easily affected by noisy data and redundant attributes. In this research, we introduce two ensemble strategies, bagging and random subspace, and propose two dual strategy ensemble trees, Bagging-RS DT and RS-Bagging DT, to reduce the influence of noisy data and redundant attributes on the accuracy of DT. The experimental results show that RS-Bagging DT and Bagging-RS DT can be used as alternative techniques for credit scoring.

Several future research directions emerge from this study. First, larger datasets for experiments and applications, particularly with more exploration of credit rating data structures, should be collected to further validate the conclusions of the study. Second, a major limitation of ensemble learning methods is the lack of interpretability of the results, i.e., the knowledge learned by ensembles is difficult for humans to understand; improving the interpretability of ensembles is therefore another important yet largely understudied research direction.

Acknowledgments

This work is partially supported by the National Natural Science Foundation of China (No. 71071045), the National High-tech R&D Program of China (863 Program) (No. 2009AA043403) and the Doctoral Special Fund of Hefei University of Technology (No. 2010HGBZ0607). The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.

References

[1] B.A. Yang, L.X. Li, Q.H. Xie, J. Xu, Development of a KBS for managing bank loan risk, Knowledge-Based Systems 14 (5–6) (2001) 299–302.
[2] C.L. Huang, M.C. Chen, C.J. Wang, Credit scoring with a data mining approach based on support vector machines, Expert Systems with Applications 33 (4) (2007) 847–856.
[3] D.J. Hand, W.E. Henley, Statistical classification methods in consumer credit scoring: a review, Journal of the Royal Statistical Society: Series A (Statistics in Society) 160 (3) (1997) 523–541.
[4] Z. Huang, H. Chen, C.J. Hsu, W.H. Chen, S.S. Wu, Credit rating analysis with support vector machines and neural networks: a market comparative study, Decision Support Systems 37 (4) (2004) 543–558.
[5] A.K. Reichert, C.C. Cho, G.M. Wagner, An examination of the conceptual issues involved in developing credit-scoring models, Journal of Business and Economic Statistics 1 (2) (1983) 101–114.
[6] G. Karels, A. Prakash, Multivariate normality and forecasting of business bankruptcy, Journal of Business Finance & Accounting 14 (4) (1987) 573–593.
[7] L.C. Thomas, A survey of credit and behavioral scoring: forecasting financial risks of lending to customers, International Journal of Forecasting 16 (2) (2000) 149–172.
[8] D. West, Neural network credit scoring models, Computers and Operations Research 27 (11–12) (2000) 1131–1152.
[9] J.H. Friedman, Multivariate adaptive regression splines, The Annals of Statistics 19 (1) (1991) 1–141.
[10] V. Desai, J. Crook, G. Overstreet, A comparison of neural networks and linear scoring models in the credit union environment, European Journal of Operational Research 95 (1) (1996) 24–37.
[11] P. Makowski, Credit scoring branches out, Credit World 74 (2) (1985) 30–37.
[12] C. Hung, J.H. Chen, A selective ensemble based on expected probabilities for bankruptcy prediction, Expert Systems with Applications 36 (3) (2009) 5297–5303.
[13] R. Wheeler, S. Aitken, Multiple algorithms for fraud detection, Knowledge-Based Systems 13 (2–3) (2000) 93–99.
[14] K.S. Shin, I. Han, A case-based approach using inductive indexing for corporate bond rating, Decision Support Systems 32 (1) (2001) 41–52.
[15] B. Baesens, T. Van Gestel, S. Viaene, M. Stepanova, J. Suykens, J. Vanthienen, Benchmarking state-of-the-art classification algorithms for credit scoring, Journal of the Operational Research Society 54 (6) (2003) 1082–1088.
[16] K.B. Schebesch, R. Stecking, Support vector machines for classifying and describing credit applicants: detecting typical and critical regions, Journal of the Operational Research Society 56 (9) (2005) 1082–1088.
[17] L. Breiman, J.H. Friedman, R.A. Olshen, C.J. Stone, Classification and Regression Trees, Wadsworth, Belmont, 1984.
[18] M. Mehta, R. Agrawal, J. Rissanen, SLIQ: a fast scalable classifier for data mining, in: Proceedings of the Fifth International Conference on Extending Database Technology, Avignon, France, 1996.
[19] T.-S. Lim, W.-Y. Loh, Y.-S. Shih, An empirical comparison of decision trees and other classification methods, Technical Report 979, Department of Statistics, University of Wisconsin, Madison, 1997.
[20] D.F. Zhang, S. Leung, Z.M. Ye, A decision tree scoring model based on genetic algorithm and K-means algorithm, in: 3rd International Conference on Convergence and Hybrid Information Technology, 2008, pp. 1043–1047.
[21] X.Y. Zhou, D.F. Zhang, Y. Jiang, A new credit scoring method based on rough sets and decision tree, in: T. Washio et al. (Eds.), PAKDD 2008, LNAI 5012, pp. 1081–1089.
[22] C.F. Tsai, Feature selection in bankruptcy prediction, Knowledge-Based Systems 22 (2) (2009) 120–127.
[23] R. Polikar, Ensemble based systems in decision making, IEEE Circuits and Systems Magazine 6 (3) (2006) 21–45.
[24] Z.H. Zhou, Ensemble, in: L. Liu, T. Özsu (Eds.), Encyclopedia of Database Systems, Springer, Berlin, 2009.
[25] D. Opitz, R. Maclin, Popular ensemble methods: an empirical study, Journal of Artificial Intelligence Research 11 (1999) 169–198.
[26] T.G. Dietterich, An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization, Machine Learning 40 (2) (2000) 139–157.
[27] T.K. Ho, The random subspace method for constructing decision forests, IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (8) (1998) 832–844.
[28] R. Bryll, R. Gutierrez-Osuna, F. Quek, Attribute bagging: improving accuracy of classifier ensembles by using random feature subsets, Pattern Recognition 36 (6) (2003) 1291–1302.
[29] J.R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, 1993.
[30] L. Breiman, Bagging predictors, Machine Learning 24 (2) (1996) 123–140.
[31] J.J. Rodriguez, L.I. Kuncheva, C.J. Alonso, Rotation forest: a new classifier ensemble method, IEEE Transactions on Pattern Analysis and Machine Intelligence 28 (10) (2006) 1619–1630.
[32] A. Asuncion, D.J. Newman, UCI Machine Learning Repository, University of California, School of Information and Computer Science, Irvine, CA, 2007.
[33] I.H. Witten, E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann Publishers, Boston, 2005.
[34] L. Breiman, Random forests, Machine Learning 45 (1) (2001) 5–32.