A hybrid approach of DEA, rough set and support vector machines for business failure prediction


Expert Systems with Applications 37 (2010) 1535–1541


Ching-Chiang Yeh (a), Der-Jang Chi (b,*), Ming-Fu Hsu (c)

(a) Department of Business Administration, National Taipei College of Business, Taipei, Taiwan, ROC
(b) Department of Accounting, Chinese Culture University, Taipei 11114, Taiwan, ROC
(c) Department of International Business Studies, National Chi Nan University, Nantou County, Taiwan, ROC


Keywords: Business failure; Financial ratios; DEA; Rough set; Support vector machines

Abstract

The prediction of business failure is an important and challenging issue that has served as the impetus for many academic studies over the past three decades. While the efficiency of a corporation's management is generally acknowledged to be a key contributor to corporate bankruptcy, it is usually excluded from early prediction models. The objective of this study is to use efficiency as a predictive variable in a proposed novel model that integrates rough set theory (RST) with the support vector machine (SVM) technique to increase the accuracy of business failure prediction. In the proposed method (RST–SVM), data envelopment analysis (DEA) is employed as a tool to evaluate input/output efficiency. Furthermore, the RST approach, which reduces the redundant attributes in a multi-attribute information table so that the number of independent variables is reduced with no information loss, is utilized as a preprocessor to improve the business failure prediction capability of SVM. The effectiveness of the methodology was verified by experiments comparing it with a back-propagation neural network (BPN) hybrid approach (RST–BPN). The results show that DEA does provide valuable information in business failure prediction and that the proposed RST–SVM model provides better classification results than the RST–BPN model, whether only financial ratios are considered or both financial ratios and DEA are included.

1. Introduction

The prediction of business failure is an important and challenging issue that has served as the impetus for many academic studies over the past three decades (Altman, 1968; Beaver, 1966; Bryant, 1997; Ohlson, 1980). Business failure is a general term and, according to a widespread definition, is the situation in which a firm cannot pay lenders, preferred stock shareholders, suppliers, etc., a bill is overdrawn, or the firm is bankrupt according to the law (Ahn, Cho, & Kim, 2000). Widely identified causes and symptoms of business failure include poor management, autocratic leadership and difficulties in operating successfully in the market. As the world's economy has experienced severe challenges during the past decade, more and more companies, large or small, face bankruptcy. Thus, accurate business failure prediction models have drawn serious attention from both researchers and practitioners aiming to provide timely warning signals


for better investment and government decisions. Many useful techniques have been investigated in comparative studies reported in several review articles (Altman, 1984; Dimitras, Zanakis, & Zopounidis, 1996; Jones, 1987; Keasey & Watson, 1991; Scott, 1981; Zavgren, 1983) in order to solve the problems involved in the evaluation process. Recently, Kumar and Ravi (2007) gave a complete review of methods used for the prediction of business failure and of new trends in this area. Basically, business failure prediction models use appropriate independent variables to "predict" whether a company is healthy or bankrupt. Therefore, business failure prediction problems fall within the scope of the more general and widely discussed discrimination and classification problems (Johnson & Wichern, 2002). However, when these well-established techniques are applied to business failure prediction, two main issues arise. First, after Beaver (1966) and Altman (1968) used the financial ratios methodology in conducting business failure predictions, most studies considered only financial ratios as independent (input) variables. Financial ratios, originating in a corporation's financial statements, can reflect the characteristics of a



corporation from various aspects to a certain extent. However, while the efficiency of a corporation's management is generally acknowledged to be a key contributor to corporate bankruptcy (Gestel et al., 2006; Seballos & Thomson, 1990; Secrist, 1938), it is usually excluded from early prediction models. Therefore, in this study, we argue that efficiency, which reflects the status of a corporation's management, is a decisive factor affecting the predictive capability of business failure models. A typical efficiency measurement, for example research on operational efficiency (the most widely studied efficiency issue), treats the resources of a corporation as inputs (e.g., personnel, technology, space) and some measurable form of the services provided as outputs (e.g., the number of accounts serviced or loans and other transactions processed). However, it is hard to evaluate the efficiency of a corporation directly from its financial statements. Data envelopment analysis (DEA) offers useful insights here: by incorporating multiple inputs and outputs, DEA is able to provide measures of the efficiency of a corporation.

Secondly, early studies of business failure prediction mainly used statistical techniques such as univariate statistical methods, multiple discriminant analysis (MDA), linear probability models, and logit and probit analysis for business classification problems (Altman, 1968; Altman, Haldeman, & Narayanan, 1977; Collins & Green, 1972). These conventional statistical methods, however, rest on restrictive assumptions such as linearity, normality and independence among the predictor or input variables. Considering that violations of these assumptions frequently occur with financial data (Deakin, 1972), the methods can be limited in their effectiveness and validity. Artificial intelligence approaches that are less vulnerable to these assumptions, such as inductive learning and neural networks (NN), offer alternative methodologies for classification problems to which traditional statistical methods have long been applied. NN have been shown to have better predictive capability than MDA and logistic regression in business failure prediction problems (Coleman, Graettinger, & Lawrence, 1991; Rahimian et al., 1993; Salchengerger et al., 1992; Sharda & Wilson, 1996; Tam & Kiang, 1992; Wilson & Sharda, 1994; Zhang, Hu, Patuwo, & Indro, 1999). Recently, support vector machines (SVM), developed by Vapnik (1995), have gained popularity due to many attractive features and excellent generalization performance on a wide range of problems. SVM also embody the structural risk minimization (SRM) principle, which has been shown to be superior to the traditional empirical risk minimization (ERM) principle employed by conventional neural networks. Min and Lee (2005) demonstrated that SVM outperform NN, MDA and logistic regression in business failure prediction. At the same time, it has been argued that variable selection, also called feature selection, is a fundamental problem that has a significant impact on the prediction accuracy of such models. Many methods have been developed to prepare the data inputs, because good classification performance with SVM requires careful preparation of the inputs to the classifier.
It is therefore not surprising that much research has been done on dimensionality reduction (Dash & Liu, 1997; Kira & Rendell, 1992; Langley, 1994). A technique that can reduce dimensionality using the information contained within the data set while preserving the meaning of the features is clearly desirable. Rough set theory (RST) can be used as such a tool to discover data dependencies and reduce the number of attributes contained in a dataset by purely structural methods (Pawlak, 1991), and it has been applied successfully to real-world classification problems (Ahn et al., 2000; Siegel, de Korvin, & Omer, 1993; Slowinski & Zopounidis, 1995).

The objective of this study is to use efficiency as a predictive variable and to propose a novel model that integrates RST with the SVM technique, named RST–SVM, to increase the accuracy of business failure prediction. The RST approach reduces the redundant attributes in a multi-attribute information table, so that the number of independent variables is reduced with no information loss, and is utilized as a preprocessor to improve the business failure prediction capability of SVM. In the first stage, RST is used for variable selection because of its reliability in identifying significant independent variables. In the second stage, the significant independent variables obtained from RST are used as inputs of the SVM models. The results can then be compared to see whether the model including the efficiency variable gives better classification accuracy. In the proposed method, DEA is employed as a tool to evaluate input/output efficiency. The effectiveness of the methodology was verified by experiments comparing it with a back-propagation neural network (BPN) hybrid approach (RST–BPN).

This paper is organized as follows. Section 2 gives a brief review of the DEA model used to evaluate the efficiency of a corporation. Section 3 describes the classification techniques relevant to this paper, RST and SVM, respectively. In Section 4, the proposed RST-based data preprocessing algorithm and the hybrid models are described. In Section 5, we analyze and compare the results of each model. Finally, discussion and conclusions are provided in Section 6.

2. Using DEA for evaluating efficiencies

Data envelopment analysis (DEA) is an evaluation tool for decision-making units (DMUs); it solves many decision-making problems by integrating multiple inputs and outputs simultaneously. DEA is a non-parametric data analytic technique that is extensively used by various research communities (e.g., Hong, Ha, Shin, Park, & Kim, 1999; Seol, Choi, Park, & Park, 2007; Sohn & Moon, 2004). The basic ideas behind DEA date back to Farrell (1957), but the recent series of discussions started with the article by Charnes, Cooper, and Rhodes (1978). We give only the salient features of DEA here; more detailed information can be obtained elsewhere (Banker, Charnes, & Cooper, 1984; Charnes, Cooper, Lewin, & Seiford, 1993). The DEA ratio form, proposed by Charnes, Cooper and Rhodes (CCR) (1978), is designed to measure the relative efficiency or productivity of a specific DMU_k. The DEA formulation is given as follows. Suppose that there is a set of n DMUs to be analyzed, each of which uses m common inputs and s common outputs. Let k (k = 1, ..., n) denote the DMU whose relative efficiency or productivity is to be maximized.

\text{Maximize} \quad h_k = \frac{\sum_{r=1}^{s} u_{rk} Y_{rk}}{\sum_{i=1}^{m} v_{ik} X_{ik}}

\text{Subject to} \quad \frac{\sum_{r=1}^{s} u_{rk} Y_{rj}}{\sum_{i=1}^{m} v_{ik} X_{ij}} \le 1, \quad j = 1, 2, \ldots, n \qquad (1)

u_{rk}, v_{ik} \ge 0, \quad i = 1, 2, \ldots, m, \quad r = 1, 2, \ldots, s

where u_{rk} is the variable weight given to the rth output of the kth DMU, v_{ik} is the variable weight given to the ith input of the kth DMU, u_{rk} and v_{ik} are decision variables determining the relative efficiency of DMU_k, Y_{rj} is the rth output of the jth DMU, and X_{ij} is the ith input of the jth DMU. It is also assumed that all Y_{rj} and X_{ij} are positive. h_k is the efficiency score and is less than or equal to 1. When the efficiency score h_k equals 1, DMU_k lies on the efficient frontier.
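As an illustration of how the ratio model (1) can be solved in practice, the sketch below uses the standard Charnes–Cooper linearization (maximize the weighted output of DMU_k with its weighted input normalized to 1) and a generic LP solver. The five-DMU toy data, the use of scipy, and the helper name ccr_efficiency are illustrative assumptions rather than part of the paper, which applies the output-oriented variant of the CCR model discussed next.

```python
# Hedged sketch: CCR efficiency via the Charnes-Cooper linearization of Eq. (1).
# The toy data below are illustrative, not the paper's dataset (R&D expense,
# R&D designers, patents & trademarks as inputs; gross profit, market share as outputs).
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(inputs, outputs, k):
    """Relative efficiency h_k of DMU k (rows = DMUs)."""
    n, m = inputs.shape
    _, s = outputs.shape
    # Decision variables: [u_1..u_s, v_1..v_m], all >= 0 (linprog's default bounds).
    c = np.concatenate([-outputs[k], np.zeros(m)])             # maximize u'Y_k
    A_eq = np.concatenate([np.zeros(s), inputs[k]])[None, :]   # v'X_k = 1
    b_eq = [1.0]
    A_ub = np.hstack([outputs, -inputs])                       # u'Y_j - v'X_j <= 0 for all j
    b_ub = np.zeros(n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq)
    return -res.fun

# Toy example: 5 DMUs, 3 inputs, 2 outputs (values are made up).
rng = np.random.default_rng(0)
X = rng.uniform(1, 10, size=(5, 3))   # inputs
Y = rng.uniform(1, 10, size=(5, 2))   # outputs
print([round(ccr_efficiency(X, Y, k), 3) for k in range(5)])
# Each score lies in (0, 1]; a score of 1 marks the efficient frontier.
```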



There are two types of CCR models. One version is the input-oriented model, which minimizes the inputs; the other is the output-oriented model, which maximizes the outputs. In this paper, we apply the output-oriented CCR model since we focus on maximizing the multiple outputs.

3. Rough sets and support vector machines

3.1. Basic concepts of rough sets

Rough set theory (RST), introduced by Pawlak in the early 1980s (Pawlak, 1991), is a machine-learning method that has proved to be a powerful tool for handling uncertainty and has been applied to data reduction, rule extraction, data mining and granularity computation. Here, we present only the basic ideas of RST that are relevant to the present work. By an information system we understand the 4-tuple S = (U, A, V, f), where U is a finite set of objects, called the universe, A is a finite set of attributes, V = \bigcup_{a \in A} V_a is the domain of the attributes, and f : U \times A \to V is an information function such that f(x, a) \in V_a for all a \in A and all x \in U. In classification problems, an information system is also seen as a decision table, assuming that A = C \cup D and C \cap D = \emptyset, where C is a set of condition attributes and D is a set of decision attributes. Let S = (U, A, V, f) be an information system; every P \subseteq A generates an indiscernibility relation IND(P) on U, defined as follows:

IND(P) = \{(x, y) \in U \times U : f(x, a) = f(y, a), \ \forall a \in P\} \qquad (2)

U / IND(P) = \{C_1, C_2, \ldots, C_k\} is the partition of U induced by P, and every C_i is an equivalence class. For any x \in U, the equivalence class of x with respect to U / IND(P) is defined as follows:

[x]_{U/IND(P)} = \{y \in U : f(y, a) = f(x, a), \ \forall a \in P\} \qquad (3)

Let P \subseteq A and X \subseteq U. The P-lower approximation of X (denoted by P_{*}(X)) and the P-upper approximation of X (denoted by P^{*}(X)) are defined as follows:

P_{*}(X) = \{y \in U : [y]_{U/IND(P)} \subseteq X\} \qquad (4)

P^{*}(X) = \{y \in U : [y]_{U/IND(P)} \cap X \neq \emptyset\}

where P_{*}(X) is the set of all objects from U that can be classified with certainty as elements of X using the set of attributes P, and P^{*}(X) is the set of objects of U that can possibly be classified as elements of X using the set of attributes P.

Let P, Q \subseteq A. The positive region of the classification U / IND(Q) with respect to the set of attributes P, or in short the P-positive region of Q, is defined as POS_P(Q) = \bigcup_{X \in U/IND(Q)} P_{*}(X). POS_P(Q) contains the objects in U that can be classified into one class of the classification U / IND(Q) using the attributes P. The dependency of Q on P is defined as

\gamma_P(Q) = \mathrm{card}(POS_P(Q)) / \mathrm{card}(U) \qquad (5)

An attribute a is said to be dispensable in P with respect to Q if \gamma_P(Q) = \gamma_{P \setminus \{a\}}(Q); otherwise a is indispensable in P with respect to Q. Let S = (U, A, V, f) be a decision table. The set of attributes P (P \subseteq C) is a reduct of the condition attributes C if it satisfies the following conditions:

\gamma_P(D) = \gamma_C(D), \qquad \gamma_P(D) \neq \gamma_{P'}(D) \ \ \forall P' \subset P \qquad (6)

A reduct of the condition attributes C is thus a subset that can discern decision classes with the same accuracy as C, and none of the attributes in the reduct can be eliminated without decreasing its discernibility capability (Pawlak, 2002).

Obviously, reduction is a feature subset selection process in which the selected feature subset not only retains the representational power of the data but also has minimal redundancy, so RST-based dimensionality reduction yields a good feature subset. Several RST-based reduction and feature selection algorithms have been proposed: consistency of data (Mi, Wei-Zhi, & Wen-Xiu, 2004; Pawlak, 1991), dependency of attributes (Wang, Hu, & Yang, 2002), mutual information (Skowron & Rauszer, 1992), the discernibility matrix (Jue & Duo-Qian, 1998) and genetic algorithms have all been employed to find reducts of an information system (Moradi, Grzymala-Busse, & Roberts, 1998). These techniques have been applied to text classification (Swiniarski & Hargis, 2001), face recognition (Liu & Setiono, 1998), texture analysis (Swiniarski & Skowron, 2003) and process monitoring (Dubois & Prade, 1992). An extensive review of RST-based feature selection is given in Thangavel and Pethalakshmi (2009).

3.2. Support vector machines

Support vector machines (SVM) are based on statistical learning theory; they realize the theory of the VC (Vapnik–Chervonenkis) dimension and the structural risk minimization (SRM) principle. The underlying idea can be described simply: search for an optimal hyperplane that satisfies the classification requirement, and then maximize the margin of separation on either side of that hyperplane while maintaining classification accuracy. According to the theory, separable data can thereby be classified effectively. A brief introduction to SVM follows.

Suppose we are given a set of training data x_i \in R^n (i = 1, 2, \ldots, n) with the desired output y_i \in \{+1, -1\} corresponding to the two classes, and suppose there exists a separating hyperplane w \cdot x + b = 0 (w is the weight vector and b the bias). To ensure that all training data can be classified correctly, the margin of separation (2 / \|w\|) must be maximized. In the linearly separable case, the linear SVM finds the optimal separating hyperplane by solving the following optimization problem:

\text{Minimize} \quad \phi(w) = \frac{1}{2} w^T w \qquad (7)

\text{Subject to} \quad y_i (x_i \cdot w + b) \ge 1, \quad i = 1, 2, \ldots, n \qquad (8)

The solution to the above optimization problem can be converted into its dual problem. The nonnegative Lagrange multipliers \alpha_i are found by solving the following optimization problem:

\text{Maximize} \quad Q(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j x_i^T x_j \qquad (9)

\text{Subject to} \quad \sum_{i=1}^{n} \alpha_i y_i = 0, \quad \alpha_i \ge 0, \quad i = 1, 2, \ldots, n \qquad (10)

The training data with nonzero multipliers are the support vectors. If \alpha_i^{*} are the optimal Lagrange multipliers, the optimal weight vector is

w^{*} = \sum_{i=1}^{n} \alpha_i^{*} y_i x_i \qquad (11)

and the optimal bias is

b^{*} = y_j - \sum_{i=1}^{n} \alpha_i^{*} y_i x_i^T x_j \qquad (12)

Then the optimal classification function is

f(x) = \mathrm{sgn}\{(w^{*} \cdot x) + b^{*}\} \qquad (13)



The above discussion is restricted to the case in which the training data are separable. To generalize the problem to the non-separable case, slack variables \xi_i \ge 0, i = 1, 2, \ldots, n, are introduced to relax the constraints in (8). The objective then becomes

\text{Minimize} \quad \phi(w, \xi) = \frac{1}{2} w^T w + C \sum_{i=1}^{n} \xi_i \qquad (14)

\text{Subject to} \quad y_i (w^T x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, 2, \ldots, n \qquad (15)

where C is a nonnegative parameter chosen by the user. Solving this problem is similar to the linearly separable case, but the dual constraints become

\sum_{i=1}^{n} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C, \quad i = 1, 2, \ldots, n \qquad (16)

For non-linearly separable data, the data can be mapped into a high-dimensional feature space by a nonlinear mapping in which the optimal hyperplane can be sought. The linear classification after mapping is performed by selecting an appropriate inner-product kernel that satisfies Mercer's condition. The problem is then converted into searching for the nonnegative Lagrange multipliers \{\alpha_i\}_{i=1}^{n} by solving the following optimization problem (Gold & Sollish, 2005; Sinalingam & Pandia, 2005; Zhu & Zhang, 2003):

\text{Maximize} \quad Q(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \qquad (17)

\text{Subject to} \quad \sum_{i=1}^{n} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C, \quad i = 1, 2, \ldots, n \qquad (18)

Hence, the final classification function is

f(x) = \mathrm{sgn}\left\{ \sum_{i=1}^{n} \alpha_i^{*} y_i K(x_i, x) + b^{*} \right\} \qquad (19)

A commonly used kernel function is the RBF kernel:

K(x, x') = \exp\left( -\frac{\|x - x'\|^2}{2\sigma^2} \right) \qquad (20)
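For concreteness, a minimal sketch of the RBF kernel of Eq. (20) is given below; the toy points and the bandwidth value sigma are assumptions for illustration only.

```python
# Minimal sketch of the RBF (Gaussian) kernel of Eq. (20); toy data only.
import numpy as np

def rbf_kernel(X1, X2, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||x_i - x'_j||^2 / (2 sigma^2))."""
    sq_dists = (np.sum(X1**2, axis=1)[:, None]
                + np.sum(X2**2, axis=1)[None, :]
                - 2.0 * X1 @ X2.T)
    return np.exp(-sq_dists / (2.0 * sigma**2))

X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
print(rbf_kernel(X, X, sigma=1.0))   # 3x3 symmetric matrix with ones on the diagonal
```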

Financial applications of SVM typically focus on pattern matching, classification and forecasting. Härdle, Moro, and Schäfer (2003) employed SVM to predict bankruptcy and compared it with NN, MDA and learning vector quantization (LVQ) (Fan & Palaniswami, 2000); SVM obtained the best results, followed by NN, then LVQ and finally MDA. Van Gestel et al. (2003) also reported experiments with least squares SVM, a modified version of SVM, and showed significantly better results in business failure prediction than the classical techniques.

4. Research data and experiments

The objective of this study is to use efficiency as a predictive variable and to propose a novel model, RST–SVM, to increase the accuracy of business failure prediction. To test whether the efficiency variable is helpful in business failure prediction, our approach is based on the following rationale: with financial ratios already included as independent variables, we test whether the inclusion of the DEA score provides extra information that improves the classification accuracy of the prediction model. As we would also like to see whether RST can be a good supporting tool for deciding the input variables of the SVM prediction model, the study explores the performance of business failure prediction with the proposed RST–SVM model. In the first stage, RST is used for variable selection because of its reliability in identifying significant independent variables. In the second stage, the significant independent variables obtained from RST are used as inputs of the SVM models. The results can then be compared to see whether the model including DEA gives better classification accuracy. Finally, to verify the applicability of the methodology, we also designed an RST–BPN model as the benchmark.

The research data are provided by the Taiwan Stock Exchange (TSE) and the Taiwan Economic Journal (TEJ) database and consist of information and electronics manufacturing firms that filed for bankruptcy from 2005 to 2007. The sampling criterion was that a company's stock had been announced as "Traded" or "Terminated"; in other words, the company may have been cited for (1) a credit crisis, (2) a net operating loss, (3) failing to pay debts, or (4) violating regulations. Each failed firm was paired with healthy firms by (1) industry, (2) products, (3) capitalization and (4) value of assets. The matched sample comprised 114 firms, including 38 failed firms and 76 healthy firms. After deleting variables with missing values and drawing on previous research, experience from past decisions, and the domain knowledge of financial experts in the industry, 18 attributes remained (17 financial ratios plus the DEA efficiency score), together with a binary decision class (healthy or unhealthy, coded 1 and 2, respectively).

For the DEA, informative input and output variables must be selected. Generally, the input variables for a corporation are capital, liabilities, human resources, technology, etc., and the output variables are commonly profit and sales. Therefore, in this paper we selected R&D expense, R&D designers and the number of patents and trademarks as the input variables for DEA, and gross profit and market share as the output variables. To pick out the significant independent variables that are informative and closely related to the corporate condition, the RST-based application RSES (a collection of algorithms and data structures for rough set computations developed at the Group of Logic, Institute of Mathematics, University of Warsaw, Poland), and in particular its genetic algorithm (Komorowski, Øhrn, & Skowron, 2002), was used. The selected variables are shown in Table 1, and these eight variables are taken as the inputs of the SVM and BPN classifiers.

Table 1
Definition of variables.

Variable   Description
X1         Working capital/total assets
X2         Total debt/total assets
X3         Net income/total assets
X4         Current assets/total assets
X5         Current assets/sales
X6         Net income/(total assets − total liabilities)
X7         Accounts receivable turnover
X8         DEA

5. Results and analysis

5.1. Two-stage hybrid model integrating RST and SVM

After the RST analysis was finished and the holdout sample was separated into two groups, we tested whether DEA is helpful in business failure prediction. The hybrid model proposed in this paper is composed of RST and SVM, and we tested two versions of it: model I, an RST–SVM model that uses only financial ratios as independent variables, and model II, an RST–SVM model that includes both financial ratios and DEA.
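As a toy illustration of the first stage of this hybrid model, the sketch below computes the dependency measure gamma_P(D) of Eq. (5), the quantity that reduct searches such as RSES's genetic algorithm try to preserve while discarding attributes. The six-object decision table and the exhaustive subset scan are illustrative assumptions, not the procedure or data actually used in the paper.

```python
# Toy sketch of the RST dependency measure gamma_P(D) of Eq. (5).
# The 6-object decision table is an illustrative assumption.
from itertools import combinations

# Each row: (condition attribute values, decision class)
table = [((1, 0, 1), 1), ((1, 0, 0), 1), ((0, 1, 1), 2),
         ((0, 1, 0), 2), ((1, 1, 1), 1), ((1, 1, 1), 2)]

def gamma(attr_idx):
    """Dependency of the decision on the attribute subset attr_idx."""
    blocks = {}                                   # indiscernibility classes -> decisions seen
    for conds, dec in table:
        key = tuple(conds[i] for i in attr_idx)
        blocks.setdefault(key, set()).add(dec)
    # Positive region: objects whose block maps to exactly one decision class.
    pos = sum(1 for conds, _ in table
              if len(blocks[tuple(conds[i] for i in attr_idx)]) == 1)
    return pos / len(table)

full = gamma((0, 1, 2))
for r in range(1, 3):
    for subset in combinations(range(3), r):
        if gamma(subset) == full:
            print("reduct candidate:", subset, "gamma =", full)
# Here the pair (0, 1) preserves the full dependency, i.e. attribute 2 can be
# dropped with no information loss.
```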



For the SVM, we applied the LIBSVM program, downloaded from http://www.csie.ntu.edu.tw/~cjlin/libsvm/, to construct the classification model and chose the Gaussian RBF as the kernel function. There are two parameters associated with the RBF kernel, C and \sigma, so some kind of parameter-selection procedure has to be carried out. Hsu, Chang, and Lin (2004) proposed a "grid search" over C and \sigma with m-fold cross-validation on the training data. The goal of this procedure is to identify the optimal C and \sigma so that the classifier can accurately predict unseen data. In m-fold cross-validation, the training set is first divided into m subsets of equal size; each subset is then tested in turn using the classifier trained on the remaining (m − 1) subsets. Thus, each instance of the whole training set is predicted once, and the cross-validation accuracy is the percentage of data that are correctly classified. The cross-validation procedure helps prevent overfitting. In this paper, we performed 5-fold cross-validation over the parameter grids C \in \{2^0, 2^1, \ldots, 2^7\} and \sigma \in \{2^{-3}, 2^{-2}, \ldots, 2^{3}\}. After conducting the grid search on the training data, the confusion matrices obtained with the two hybrid models are summarized in Tables 2 and 3, respectively. From the results in Tables 2 and 3, we observe that the average correct classification rate is 83.33% for the model considering only financial ratios and 86.84% for the model considering both financial ratios and DEA. Given the improved correct classification rate of the model considering both financial ratios and DEA, DEA appears to be helpful in improving the classification accuracy of the prediction model.
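The following sketch mirrors the parameter-selection procedure just described: an RBF-kernel SVM tuned by a grid search over C in {2^0, ..., 2^7} and sigma in {2^-3, ..., 2^3} with 5-fold cross-validation. scikit-learn's SVC (which wraps LIBSVM) is used here as a stand-in for the LIBSVM command-line tools, and the synthetic 114-sample, 8-feature dataset is an assumption rather than the matched firm sample.

```python
# Hedged sketch of the grid search with 5-fold cross-validation described above.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import make_classification

# Synthetic stand-in for the 114-firm, 8-variable sample (illustrative only).
X, y = make_classification(n_samples=114, n_features=8, random_state=0)

sigmas = 2.0 ** np.arange(-3, 4)
param_grid = {
    "C": 2.0 ** np.arange(0, 8),
    "gamma": 1.0 / (2.0 * sigmas ** 2),   # RBF form exp(-||x - x'||^2 / (2 sigma^2))
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 4))
```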

Table 2
RST–SVM model (model I) classification results with only financial ratios.

Actual class      Classified class
                  1 (Healthy)     2 (Unhealthy)
1 (Healthy)       64 (84.21%)     12 (15.79%)
2 (Unhealthy)     7 (18.42%)      31 (81.58%)

Average correct classification rate: 83.33%.

Table 3
RST–SVM model (model II) classification results with both financial ratios and DEA.

Actual class      Classified class
                  1 (Healthy)     2 (Unhealthy)
1 (Healthy)       65 (85.53%)     11 (14.47%)
2 (Unhealthy)     4 (10.53%)      34 (89.47%)

Average correct classification rate: 86.84%.

5.2. Two-stage hybrid model integrating RST and BPN

Since Vellido, Lisboa, and Vaughan (1999) pointed out that around 80% of business applications using neural networks employ the BPN training algorithm, we use the popular BPN as the benchmark for verifying the applicability of SVM. As recommended by Cybenko (1989) and Hornik et al. (1989), a network structure with one hidden layer is sufficient to model any complex system with any desired accuracy, so the designed network model has only one hidden layer. In this study, we tested two hybrid models: hybrid model III, an RST–BPN model that uses only financial ratios as independent variables, and hybrid model IV, an RST–BPN model that includes both financial ratios and DEA. After comparing the prediction results on the testing sample for different combinations of hidden nodes and learning rates, the network structures (input layer, hidden layer, output layer) were 7-9-1 and 8-9-1 for models III and IV, respectively. We used the sigmoid function for activation and the Levenberg–Marquardt algorithm for learning; the BPN was executed with the MATLAB NN toolbox. The prediction results on the testing sample (the confusion matrices) for the two hybrid prediction models are summarized in Tables 4 and 5, respectively. From the results in Tables 4 and 5, we observe that the average correct classification rate is 78.95% for the model including only financial ratios and 82.46% for the model incorporating both financial ratios and DEA. Again, from the improved correct classification rate of the model considering both financial ratios and DEA shown in Table 6, we can conclude that DEA provides extra information beyond the financial ratios that improves the classification accuracy of the prediction model.

Table 4
RST–BPN model (model III) classification results with only financial ratios.

Actual class      Classified class
                  1 (Healthy)     2 (Unhealthy)
1 (Healthy)       63 (82.89%)     13 (17.11%)
2 (Unhealthy)     11 (28.95%)     27 (71.05%)

Average correct classification rate: 78.95%.

Table 5
RST–BPN model (model IV) classification results with both financial ratios and DEA.

Actual class      Classified class
                  1 (Healthy)     2 (Unhealthy)
1 (Healthy)       63 (82.89%)     13 (17.11%)
2 (Unhealthy)     7 (18.42%)      31 (81.58%)

Average correct classification rate: 82.46%.

Table 6
Predictive accuracies of the constructed models.

Model                   Accuracy (%)                  Average accuracy
                        (1-1)         (2-2)
RST–SVM (model I)       84.21         81.58           84.21
RST–SVM (model II)      85.53         89.47           86.84
RST–BPN (model III)     82.89         71.05           78.95
RST–BPN (model IV)      82.89         81.58           82.95

Table 7
Type I and Type II errors of the constructed models.

Model                   Performance assessment (%)
                        Type I error      Type II error
RST–SVM (model I)       15.79             18.42
RST–SVM (model II)      14.47             10.53
RST–BPN (model III)     17.11             28.95
RST–BPN (model IV)      17.11             18.42

5.3. Results compared with Type I and Type II errors of the constructed models

It is well known that, in order to evaluate the overall classification capability of the designed business failure prediction models,



the misclassifications also have to be taken into account. A Type I error means that a healthy company is misclassified as an unhealthy company; a Type II error means that an unhealthy company is misclassified as a healthy one. In order to evaluate the overall classification capability, Table 7 summarizes the Type I and Type II errors of the constructed models when considering only financial ratios and when considering both financial ratios and DEA. From the results in Table 7, we find that the RST–SVM model has the lowest Type I and Type II errors in comparison with the RST–BPN model. Hence, we can conclude that the RST–SVM model not only has the best classification rate but also has the lowest Type I and Type II errors.

Comparing the results in Table 7, several conclusions can be drawn. First, the models including both financial ratios and DEA provide better classification results than the corresponding models using only financial ratios, which implies that DEA does provide valuable information for predicting business failure. Secondly, the RST–SVM model provides better classification results than the RST–BPN model, whether only financial ratios are considered or both financial ratios and DEA are included. Hence, we believe the proposed RST–SVM model is a better alternative, since it also exhibits the capability of identifying important independent variables, which may provide valuable information for further diagnostic purposes.
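To make the error definitions concrete, the short sketch below recomputes the Type I and Type II rates of Table 7 from the model I confusion matrix in Table 2 (76 healthy and 38 failed firms); the helper function is illustrative and not taken from the paper.

```python
# Sketch of the Type I / Type II error rates in Table 7, reproduced from the
# model I confusion matrix of Table 2.  The helper is illustrative only.
def error_rates(confusion):
    """confusion[actual][predicted] counts; index 0 = healthy, 1 = unhealthy."""
    healthy_total = sum(confusion[0])
    unhealthy_total = sum(confusion[1])
    type1 = confusion[0][1] / healthy_total     # healthy firm flagged as unhealthy
    type2 = confusion[1][0] / unhealthy_total   # failed firm flagged as healthy
    return type1, type2

model_1 = [[64, 12],   # actual healthy: 64 correct, 12 misclassified
           [7, 31]]    # actual unhealthy: 7 misclassified, 31 correct
t1, t2 = error_rates(model_1)
print(f"Type I = {t1:.2%}, Type II = {t2:.2%}")   # 15.79%, 18.42%
```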

6. Discussion and conclusions

6.1. Discussion

The prediction of business failure is an important and challenging issue that has served as the impetus for many academic studies over the past three decades. While the efficiency of a corporation's management is generally acknowledged to be a key contributor to corporate bankruptcy, it is usually excluded from early prediction models. The objective of this study was to use efficiency as a predictive variable and to propose a novel model, RST–SVM, to increase the accuracy of business failure prediction. To verify the applicability of the methodology, we also designed a neural-network-based hybrid approach as the benchmark, and the proposed RST–SVM model was applied to a dataset of bankruptcies in Taiwan.

First, this study found that most prior studies adopted only financial ratios as independent variables. While the efficiency of a corporation's management is generally acknowledged to be a key contributor to corporate bankruptcy (Gestel et al., 2006; Seballos & Thomson, 1990; Secrist, 1938), it is usually excluded from early prediction models. Therefore, we believe that efficiency, which reflects the status of a corporation's management, is a decisive factor affecting the predictive capability of business failure models. As an efficiency evaluation technique, DEA is useful especially when there are various selection criteria and measurement units. Thus, we use the DEA score as the efficiency variable of a corporation in the business failure prediction models. Secondly, most financial ratios do not satisfy the normality assumption of multivariate statistical models such as MDA and the logistic regression model; accordingly, these statistical prediction techniques exhibited the worst predictive accuracy and the largest errors of all models tested herein. Thirdly, artificially intelligent models (SVM and NN) are more accurate in predicting business failure than the multivariate statistical models. By minimizing the sum of the empirical risk and the complexity of the hypothesis space, SVM gives good generalization performance on many business failure prediction problems. For SVM to classify well, however, the preparation of the data inputs for the classifier needs special treatment to guarantee good performance. One reason is that raw experimental data with many variables cannot

be fed directly into the classifier, because doing so decreases the classifier's performance. Finally, RST has become very popular among scientists worldwide and is now one of the most developed techniques in intelligent data analysis. The RST reduction technique is applied to find all reducts of the data, i.e., the minimal subsets of attributes that are associated with the class label for classification. For these reasons, we conclude that the proposed RST–SVM model outperformed the other business failure models. Additionally, the results of this work demonstrate that the predictive accuracy of the RST–SVM model in forecasting business failure is significantly increased by DEA.

6.2. Conclusions

To verify the feasibility of the proposed RST–SVM model, business failure prediction tasks were performed using public companies that filed for bankruptcy between 2005 and 2007 in Taiwan. The contribution of this study can be summarized as follows. First, DEA does provide valuable information for business failure prediction. Secondly, the proposed RST–SVM model provides better classification results than RST–BPN, whether only financial ratios are considered or both financial ratios and DEA are included. Hence, the RST–SVM model is an efficient alternative. These findings justify the presumption that the RST–SVM model is a better alternative for conducting business failure prediction tasks. Moreover, the RST–SVM model not only has better classification accuracies but also has the lowest Type I and Type II errors. Thus, the forecasting technique (RST–SVM) can provide investment guidance for investors and government.

References

Ahn, B. S., Cho, S. S., & Kim, C. Y. (2000). The integrated methodology of rough set theory and artificial neural network for business failure prediction. Expert Systems with Applications, 18, 65–74.
Altman, E. I. (1968). Financial ratios, discriminant analysis, and the prediction of corporate bankruptcy. Journal of Finance, 23(4), 589–609.
Altman, E. I. (1984). The success of business failure prediction models: An international survey. Journal of Banking and Finance, 8(2), 171–198.
Altman, E. I., Haldeman, R. G., & Narayanan, P. (1977). Zeta analysis. Journal of Banking and Finance, June, 29–51.
Banker, R. D., Charnes, A., & Cooper, W. W. (1984). Some models for estimating technical and scale inefficiencies in data envelopment analysis. Management Science, 30(1), 1078–1092.
Beaver, W. H. (1966). Financial ratios as predictors of failure, empirical research in accounting: Selected studies. Supplement to the Journal of Accounting Research, 4, 179–199.
Bryant, S. M. (1997). A case-based reasoning approach to bankruptcy prediction modeling. Intelligent Systems in Accounting, Finance and Management, 6, 195–214.
Charnes, A., Cooper, W. W., & Rhodes, E. (1978). Measuring the efficiency of decision making units. European Journal of Operational Research, 2, 429–444.
Charnes, A., Cooper, W. W., Lewin, A. Y., & Seiford, L. M. (Eds.). (1993). Data envelopment analysis: Theory, methodology and applications. Boston: Kluwer.
Coleman, K. G., Graettinger, T. J., & Lawrence, W. F. (1991). Neural networks for bankruptcy prediction: The power to solve financial problems. AI Review, 48–50.
Collins, R. A., & Green, R. D. (1972). Statistical methods for bankruptcy forecasting. Journal of Economics and Business, 32, 349–354.
Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2, 303–314.
Dash, M., & Liu, H. (1997). Feature selection for classification. Intelligent Data Analysis, 1(3), 131–156.
Deakin, E. B. (1972). A discriminant analysis of predictors of business failure. Journal of Accounting Research, 10(1), 167–179.
Dimitras, A. I., Zanakis, S. H., & Zopounidis, C. (1996). A survey of business failures with an emphasis on prediction methods and industrial applications. European Journal of Operational Research, 90, 487–513.
Dubois, D., & Prade, H. (1992). Putting rough sets and fuzzy sets together. In R. Slowinski (Ed.), Intelligent decision support: Handbook of applications and advances of the rough set theory (pp. 203–232). Dordrecht: Kluwer Academic.
Fan, A., & Palaniswami, M. (2000). Selecting bankruptcy predictors using a support vector machine approach. In Proceedings of the IEEE-INNS-ENNS international joint conference on neural networks (Vol. 6, pp. 354–359).
Farrell, M. J. (1957). The measurement of productive efficiency. Journal of the Royal Statistical Society, Series A (General), 120, 253–289.

Gestel, T. V., Baesens, B., Suykens, J., Poel, D. V., Baestaens, D. E., & Willekens, M. (2006). Bayesian kernel based classification for financial distress detection. European Journal of Operational Research, 172, 979–1003.
Gold, C., & Sollish, P. (2005). Model selection for support vector machine classification. Neurocomputing, 55, 221–249.
Härdle, W., Moro, R., & Schäfer, D. (2003). Predicting corporate bankruptcy with support vector machines. Working paper. Humboldt University and the German Institute for Economic Research.
Hong, H., Ha, S., Shin, C., Park, S., & Kim, S. (1999). Evaluating the efficiency of system integration projects using data envelopment analysis (DEA) and machine learning. Expert Systems with Applications, 16, 283–296.
Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2, 336–359.
Hsu, C. W., Chang, C. C., & Lin, C. J. (2004). A practical guide to support vector classification. Technical report, Department of Computer Science and Information Engineering, National Taiwan University.
Johnson, R. A., & Wichern, D. W. (2002). Applied multivariate statistical analysis (5th ed.). Upper Saddle River, NJ: Prentice-Hall.
Jones, F. L. (1987). Current techniques in bankruptcy prediction. Journal of Accounting Literature, 6, 131–164.
Jue, W., & Duo-Qian, M. (1998). Analysis on attribute reduction strategies of rough set. Journal of Computer Science and Technology, 13(2), 189–193.
Keasey, K., & Watson, R. (1991). Financial distress prediction models: A review of their usefulness. British Journal of Management, 2, 89–102.
Kira, K., & Rendell, L. A. (1992). The feature selection problem: Traditional methods and a new algorithm. In Proceedings of the ninth national conference on artificial intelligence (pp. 129–134).
Komorowski, K., Øhrn, A., & Skowron, A. (2002). The ROSETTA rough set software system. In W. Klösgen & J. Zytkow (Eds.), Handbook of data mining and knowledge discovery. Oxford University Press.
Kumar, P. R., & Ravi, V. (2007). Bankruptcy prediction in banks and firms via statistical and intelligent techniques – A review. European Journal of Operational Research, 180(1), 1–28.
Langley, P. (1994). Selection of relevant features in machine learning. In Proceedings of the AAAI fall symposium on relevance (pp. 1–5).
Liu, H., & Setiono, R. (1998). Some issues on scalable feature selection. Expert Systems with Applications, 15, 333–339.
Mi, J.-S., Wei-Zhi, W., & Wen-Xiu, Z. (2004). Approaches to knowledge reduction based on variable precision rough set model. Information Sciences, 159(3–4), 255–272.
Min, J. H., & Lee, Y. C. (2005). Bankruptcy prediction using support vector machine with optimal choice of kernel function parameters. Expert Systems with Applications, 28, 603–614.
Moradi, H., Grzymala-Busse, J. W., & Roberts, J. A. (1998). Entropy of English text: Experiments with humans and a machine learning system based on rough sets. Information Sciences, 104(1–2), 31–47.
Ohlson, J. A. (1980). Financial ratios and the probabilistic prediction of bankruptcy. Journal of Accounting Research, 109–131.
Pawlak, Z. (1991). Rough sets: Theoretical aspects of reasoning about data. Dordrecht: Kluwer Academic Publishing.
Pawlak, Z. (2002). Rough sets and intelligent data analysis. Information Sciences, 11, 1–12.
Rahimian, E., Singh, S., Thammachote, T., & Virmani, R. (1993). Bankruptcy prediction by neural networks. In R. Trippi & E. Turban (Eds.), Neural networks in finance and investing: Using artificial intelligence to improve real-world performance (pp. 159–176). Chicago: Probus Publishing.


Salchengerger, L. M., Cinar, E. M., & Lash, N. A. (1992). Neural networks: A new tool for predicting thrift failures. Decision Sciences, 23, 899–916.
Scott, J. (1981). The probability of bankruptcy: A comparison of empirical predictions and theoretical models. Journal of Banking and Finance, 5, 317–344.
Seballos, L. D., & Thomson, J. B. (1990). Understanding causes of commercial bank failures in the 1980s. Economic Commentary, Federal Reserve Bank of Cleveland, September.
Secrist, H. (1938). National bank failures and non-failures: An autopsy and diagnosis. Bloomington, IN: Principia Press.
Seol, H., Choi, J., Park, G., & Park, Y. (2007). A framework for benchmarking service process using data envelopment analysis and decision tree. Expert Systems with Applications, 32(2), 432–440.
Sharda, R., & Wilson, R. L. (1996). Neural network experiments in business-failure forecasting: Predictive performance measurement issues. International Journal of Computational Intelligence and Organizations, 1(2), 107–117.
Siegel, P. H., de Korvin, A., & Omer, K. (1993). Detection of irregularities by auditors: A rough set approach. Indian Journal of Accounting, 44–56.
Sinalingam, D. M., & Pandia, N. (2005). Minimal classification method with error-correcting codes for multiclass recognition. International Journal of Pattern Recognition and Artificial Intelligence, 5, 663–680.
Skowron, A., & Rauszer, C. (1992). The discernibility matrices and functions in information systems. In Intelligent decision support: Handbook of applications and advances of the rough set theory (pp. 331–362).
Slowinski, R., & Zopounidis, C. (1995). Application of the rough set approach to evaluation of bankruptcy risk. International Journal of Intelligent Systems in Accounting, Finance and Management, 4, 27–41.
Sohn, S., & Moon, T. (2004). Decision tree based on data envelopment analysis for effective technology commercialization. Expert Systems with Applications, 26(2), 279–284.
Swiniarski, R. W., & Hargis, L. (2001). Rough sets as a front end of neural-networks texture classifiers. Neurocomputing, 36, 85–102.
Swiniarski, R. W., & Skowron, A. (2003). Rough set methods in feature selection and recognition. Pattern Recognition Letters, 24(6), 833–849.
Tam, K. Y., & Kiang, M. (1992). Managerial applications of neural networks: The case of bank failure predictions. Management Science, 38(7), 926–947.
Thangavel, K., & Pethalakshmi, A. (2009). Dimensionality reduction based on rough set theory: A review. Applied Soft Computing, 9(1), 1–12.
Van Gestel, T., Baesens, B., Suykens, J., Espinoza, M., Baestaens, D. E., Vanthienen, J., et al. (2003). Bankruptcy prediction with least squares support vector machine classifiers. In Proceedings of the IEEE international conference on computational intelligence for financial engineering, Hong Kong (pp. 1–8).
Vapnik, V. (1995). The nature of statistical learning theory. New York: Springer.
Vellido, A., Lisboa, P. J. G., & Vaughan, J. (1999). Neural networks in business: A survey of applications (1992–1998). Expert Systems with Applications, 17, 51–70.
Wang, G., Hu, H., & Yang, D. (2002). Decision table reduction based on conditional information entropy. Chinese Journal of Computers, 25(7), 1–8.
Wilson, R. L., & Sharda, R. (1994). Bankruptcy prediction using neural networks. Decision Support Systems, 11, 545–557.
Zavgren, C. V. (1983). The prediction of corporate failure: The state of the art. Journal of Financial Literature, 2, 1–37.
Zhang, G., Hu, M. Y., Patuwo, B. E., & Indro, D. C. (1999). Artificial neural networks in bankruptcy prediction: General framework and cross-validation analysis. European Journal of Operational Research, 116, 16–32.
Zhu, Y. S., & Zhang, Y. Y. (2003). The study on some problems of support vector classifier. Computer Engineering and Applications, 13, 38–66.