Expert Systems with Applications 37 (2010) 1535–1541
A hybrid approach of DEA, rough set and support vector machines for business failure prediction

Ching-Chiang Yeh a, Der-Jang Chi b,*, Ming-Fu Hsu c

a Department of Business Administration, National Taipei College of Business, Taipei, Taiwan, ROC
b Department of Accounting, Chinese Culture University, Taipei 11114, Taiwan, ROC
c Department of International Business Studies, National Chi Nan University, Nantou County, Taiwan, ROC
Keywords: Business failure; Financial ratios; DEA; Rough set; Support vector machines
Abstract

The prediction of business failure is an important and challenging issue that has served as the impetus for many academic studies over the past three decades. Although the efficiency of a corporation's management is generally acknowledged to be a key contributor to a corporation's bankruptcy, it is usually excluded from early prediction models. The objective of this study is to use efficiency as a predictive variable in a proposed novel model that integrates rough set theory (RST) with the support vector machine (SVM) technique to increase the accuracy of business failure prediction. In the proposed method (RST–SVM), data envelopment analysis (DEA) is employed as a tool to evaluate the input/output efficiency. Furthermore, the RST approach reduces the redundant attributes in the multi-attribute information table, so that the number of independent variables is reduced with no information loss; RST is thus utilized as a preprocessor to improve the business failure prediction capability of SVM. The effectiveness of the methodology was verified by experiments comparing the back-propagation neural network (BPN) approach with the corresponding hybrid approach (RST–BPN). The results show that DEA does provide valuable information in business failure prediction and that the proposed RST–SVM model provides better classification results than the RST–BPN model, whether considering only financial ratios or both financial ratios and DEA.
© 2009 Elsevier Ltd. All rights reserved.
1. Introduction

The prediction of business failure is an important and challenging issue that has served as the impetus for many academic studies over the past three decades (Altman, 1968; Beaver, 1966; Bryant, 1997; Ohlson, 1980). Business failure is a general term and, according to a widespread definition, is the situation in which a firm cannot pay lenders, preferred stock shareholders, suppliers, etc., a bill is overdrawn, or the firm is bankrupt according to the law (Ahn, Cho, & Kim, 2000). Widely identified causes and symptoms of business failure include poor management, autocratic leadership and difficulties in operating successfully in the market. As the world's economy has experienced severe challenges during the past decade, more and more companies, whether large or small, are facing the prospect of filing for bankruptcy. Thus, accurate business failure prediction models have drawn serious attention from both researchers and practitioners aiming to provide timely warning signals
for better investment and government decisions. Many useful techniques have been investigated in comparative studies reported in several review articles (Altman, 1984; Dimitras, Zanakis, & Zopounidis, 1996; Jones, 1987; Keasey & Watson, 1991; Scott, 1981; Zavgren, 1983) in order to solve the problems involved in the evaluation process. Recently, Kumar and Ravi (2007) gave a complete review of methods used for the prediction of business failure and of new trends in this area. Basically, business failure prediction models use appropriate independent variables to "predict" whether a company is healthy or bankrupt. Therefore, business failure prediction problems fall within the scope of the more general and widely discussed discrimination and classification problems (Johnson & Wichern, 2002). However, while these well-established techniques are available for business failure prediction problems and applications, two main problems arise.
First, after Beaver (1966) and Altman (1968) used the financial ratios methodology in conducting business failure predictions, most subsequent studies considered only financial ratios as independent (input) variables. Although financial ratios, which originate in a corporation's financial statements, can reflect some characteristics of a
corporation from various aspects to a certain extent, the efficiency of a corporation's management, which is generally acknowledged to be a key contributor to a corporation's bankruptcy (Gestel et al., 2006; Seballos & Thomson, 1990; Secrist, 1938), is usually excluded from early prediction models. Therefore, in this study we argue that efficiency, which reflects the status of a corporation's management, is a decisive factor affecting the predictive capability of business failure models. In a typical efficiency measurement, for example, research on operational efficiency (the most widely studied efficiency issue) treats the resources of a corporation as inputs (e.g., personnel, technology, space) and some measurable form of the services provided as outputs (e.g., the number of accounts serviced or loans and other transactions processed). However, it is hard to evaluate the efficiency of a corporation directly from its financial statements. Data envelopment analysis (DEA), by incorporating multiple inputs and outputs simultaneously, can provide such measures of a corporation's efficiency.
Secondly, early studies of business failure prediction mainly used statistical techniques such as univariate methods, multiple discriminant analysis (MDA), linear probability models, and logit and probit analysis for business classification problems (Altman, 1968; Altman, Haldeman, & Narayanan, 1977; Collins & Green, 1972). These conventional statistical methods, however, rely on restrictive assumptions such as linearity, normality and independence among the predictor or input variables. Considering that violations of these assumptions frequently occur with financial data (Deakin, 1972), such methods have limited effectiveness and validity. Artificial intelligence approaches that are less vulnerable to these assumptions, such as inductive learning and neural networks (NN), can be alternative methodologies for classification problems to which traditional statistical methods have long been applied. NN have been shown to have better predictive capability than MDA and logistic regression in business failure prediction problems (Coleman, Graettinger, & Lawrence, 1991; Rahimian et al., 1993; Salchenberger, Cinar, & Lash, 1992; Sharda & Wilson, 1996; Tam & Kiang, 1992; Wilson & Sharda, 1994; Zhang, Hu, Patuwo, & Indro, 1999). Recently, support vector machines (SVM), developed by Vapnik (1995), have gained popularity due to many attractive features and excellent generalization performance on a wide range of problems. SVM also embody the structural risk minimization (SRM) principle, which has been shown to be superior to the traditional empirical risk minimization (ERM) principle employed by conventional neural networks. Min and Lee (2005) demonstrated that SVM outperform NN, MDA and logistic regression in business failure prediction. Nevertheless, variable selection, also called feature selection, is a fundamental problem that has a significant impact on the prediction accuracy of such models, and many methods have been developed to prepare the data inputs. For SVM to classify well, the data inputs to the classifier need special treatment to guarantee good performance.
It is therefore not surprising that much research has been done on dimensionality reduction (Dash & Liu, 1997; Kira & Rendell, 1992; Langley, 1994). A technique that can reduce dimensionality using information contained within the data set while preserving the meaning of the features is clearly desirable. Rough set theory (RST) can be used as such a tool to discover data dependencies and reduce the number of attributes contained in a data set by purely structural methods (Pawlak, 1991), and it has been successfully applied to real-world classification problems (Ahn et al., 2000; Siegel, de Korvin, & Omer, 1993; Slowinski & Zopounidis, 1995).
The objective of this study is to use efficiency as a predictive variable and to propose a novel model that integrates RST with the SVM technique, named RST–SVM, to increase the accuracy of business failure prediction. The RST approach reduces the redundant attributes in the multi-attribute information table, so that the number of independent variables is reduced with no information loss; RST is thus used as a preprocessor to improve the business failure prediction capability of SVM. In the first stage, RST is selected for variable selection because of its reliability in obtaining the significant independent variables. In the second stage, the significant independent variables obtained from RST are used as inputs of the SVM models. The results can then be compared to see whether the model including the efficiency variable gives better classification accuracy. In the proposed method, DEA is employed as a tool to evaluate the input/output efficiency. The effectiveness of the methodology was verified by experiments comparing the back-propagation neural network (BPN) approach with the corresponding hybrid approach (RST–BPN).
This paper is organized as follows. Section 2 gives a brief review of the DEA model used to evaluate the efficiency of a corporation. Section 3 describes the classification techniques relevant to this paper: RST and SVM. In Section 4, the proposed data preprocessing algorithm by RST and the hybrid models are described. In Section 5, we analyze and compare the results of each model. Finally, discussion and conclusions are provided in Section 6.

2. Using DEA for evaluating the efficiencies

Data envelopment analysis (DEA) is an evaluation tool for decision making units (DMUs); it solves many decision-making problems by integrating multiple inputs and outputs simultaneously. DEA is a non-parametric data analytic technique that is extensively used by various research communities (e.g., Hong, Ha, Shin, Park, & Kim, 1999; Seol, Choi, Park, & Park, 2007; Sohn & Moon, 2004). The basic ideas behind DEA date back to Farrell (1957), but the recent series of discussions started with the article by Charnes, Cooper, and Rhodes (1978). We give only the salient features of DEA here; more detailed information can be obtained elsewhere (Banker, Charnes, & Cooper, 1984; Charnes, Cooper, Lewin, & Seiford, 1993).
The DEA ratio form, proposed by Charnes, Cooper and Rhodes (CCR) (1978), is designed to measure the relative efficiency or productivity of a specific DMU k. The DEA formulation is given as follows. Suppose that there is a set of n DMUs to be analyzed, each of which uses m common inputs and s common outputs. Let k (k = 1, ..., n) denote the DMU whose relative efficiency or productivity is to be maximized.
\[
\text{Maximize}\quad h_k = \frac{\sum_{r=1}^{s} u_{rk} Y_{rk}}{\sum_{i=1}^{m} v_{ik} X_{ik}}
\qquad
\text{Subject to}\quad \frac{\sum_{r=1}^{s} u_{r} Y_{rj}}{\sum_{i=1}^{m} v_{i} X_{ij}} \le 1,\quad j = 1, 2, \ldots, n;
\qquad u_r, v_i \ge 0,\quad i = 1, 2, \ldots, m,\; r = 1, 2, \ldots, s
\tag{1}
\]

where u_rk is the variable weight given to the rth output of the kth DMU, v_ik is the variable weight given to the ith input of the kth DMU, u_rk and v_ik are the decision variables determining the relative efficiency of DMU k, Y_rj is the rth output of the jth DMU, and X_ij is the ith input of the jth DMU. It is also assumed that all Y_rj and X_ij are positive. h_k is the efficiency score and is less than or equal to 1. When the efficiency score h_k equals 1, DMU k is said to lie on the efficient frontier.
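As a minimal sketch (not part of the original study), the fractional program (1) is commonly linearized with the Charnes–Cooper transformation (fix the weighted input of the DMU under evaluation to 1 and maximize its weighted output) and solved as a linear program. The function name, toy data and SciPy usage below are illustrative assumptions; the output-oriented variant adopted later in this paper is obtained analogously.

```python
# Sketch of the CCR multiplier model of Eq. (1), linearized with the usual
# Charnes-Cooper transformation. Toy data and names are illustrative only.
import numpy as np
from scipy.optimize import linprog

def ccr_efficiency(X, Y, k):
    """Efficiency score h_k for DMU k.

    X : (n_dmus, m) input matrix, Y : (n_dmus, s) output matrix.
    Decision variables are the output weights u (length s) and input weights v (length m).
    """
    n, m = X.shape
    s = Y.shape[1]
    # Objective: maximize sum_r u_r * Y_rk  ->  minimize its negative.
    c = np.concatenate([-Y[k], np.zeros(m)])
    # Ratio constraints for every DMU j: sum_r u_r Y_rj - sum_i v_i X_ij <= 0.
    A_ub = np.hstack([Y, -X])
    b_ub = np.zeros(n)
    # Normalization: sum_i v_i * X_ik = 1.
    A_eq = np.concatenate([np.zeros(s), X[k]]).reshape(1, -1)
    b_eq = [1.0]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (s + m), method="highs")
    return -res.fun  # h_k <= 1; h_k = 1 means DMU k lies on the efficient frontier

# Toy example: 5 DMUs, 3 inputs and 2 outputs (hypothetical values).
rng = np.random.default_rng(0)
X = rng.uniform(1, 10, size=(5, 3))
Y = rng.uniform(1, 10, size=(5, 2))
print(np.round([ccr_efficiency(X, Y, k) for k in range(len(X))], 3))
```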
There are two types of CCR models: the input-oriented model, which minimizes the inputs, and the output-oriented model, which maximizes the outputs. In this paper we apply the output-oriented CCR model, since we focus on maximizing the multiple outputs.

3. Rough sets and support vector machines

3.1. Basic concepts of rough sets

Rough set theory (RST) is a machine-learning method introduced by Pawlak in the early 1980s (Pawlak, 1991). It has proved to be a powerful tool for handling uncertainty and has been applied to data reduction, rule extraction, data mining and granularity computation. Here we illustrate only the basic ideas of RST that are relevant to the present work.
By an information system we understand the 4-tuple S = (U, A, V, f), where U is a finite set of objects, called the universe, A is a finite set of attributes, V = ∪_{a∈A} V_a is the domain of the attributes, and f : U × A → V is an information function such that f(x, a) ∈ V_a for every a ∈ A and x ∈ U. In classification problems, an information system is also seen as a decision table, assuming that A = C ∪ D and C ∩ D = ∅, where C is a set of condition attributes and D is a set of decision attributes. Let S = (U, A, V, f) be an information system; every P ⊆ A generates an indiscernibility relation IND(P) on U, which is defined as follows:
\[
IND(P) = \{(x, y) \in U \times U : f(x, a) = f(y, a),\ \forall a \in P\}
\tag{2}
\]

U/IND(P) = {C_1, C_2, ..., C_k} is the partition of U by P, and every C_i is an equivalence class. For every x ∈ U, the equivalence class of x in relation to U/IND(P) is defined as follows:

\[
[x]_{U/IND(P)} = \{y \in U : f(y, a) = f(x, a),\ \forall a \in P\}
\tag{3}
\]

Let P ⊆ A and X ⊆ U. The P-lower approximation of X (denoted by P_*(X)) and the P-upper approximation of X (denoted by P^*(X)) are defined as follows:

\[
P_*(X) = \{y \in U : [y]_{U/IND(P)} \subseteq X\}, \qquad
P^*(X) = \{y \in U : [y]_{U/IND(P)} \cap X \neq \emptyset\}
\tag{4}
\]

where P_*(X) is the set of all objects of U that can certainly be classified as elements of X employing the set of attributes P, and P^*(X) is the set of objects of U that can possibly be classified as elements of X using the set of attributes P. Let P, Q ⊆ A. The positive region of the classification U/IND(Q) with respect to the set of attributes P, or in short the P-positive region of Q, is defined as POS_P(Q) = ∪_{X ∈ U/IND(Q)} P_*(X). POS_P(Q) contains the objects in U that can be classified into one class of the classification U/IND(Q) by the attributes P. The dependency of Q on P is defined as

\[
\gamma_P(Q) = \frac{card(POS_P(Q))}{card(U)}
\tag{5}
\]

An attribute a is said to be dispensable in P with respect to Q if γ_P(Q) = γ_{P−{a}}(Q); otherwise a is an indispensable attribute in P with respect to Q. Let S = (U, A, V, f) be a decision table. A set of attributes P (P ⊆ C) is a reduct of the condition attributes C if it satisfies the following conditions:

\[
\gamma_P(D) = \gamma_C(D), \qquad \gamma_P(D) \neq \gamma_{P'}(D)\ \ \forall P' \subset P
\tag{6}
\]

A reduct of the condition attributes C is thus a subset that can discern the decision classes with the same accuracy as C, and none of the attributes in the reduct can be eliminated without decreasing its discernibility capability (Pawlak, 2002).
Obviously, reduction is a feature subset selection process in which the selected feature subset not only retains the representational power but also has minimal redundancy; RST-based dimensionality reduction therefore yields a good feature subset. Several RST-based reduction and feature selection algorithms have been proposed. Consistency of data (Mi, Wei-Zhi, & Wen-Xiu, 2004; Pawlak, 1991), dependency of attributes (Wang, Hu, & Yang, 2002), mutual information (Skowron & Rauszer, 1992), the discernibility matrix (Jue & Duo-Qian, 1998) and genetic algorithms have been employed to find reducts of an information system (Moradi, Grzymala-Busse, & Roberts, 1998). These techniques have been applied to text classification (Swiniarski & Hargis, 2001), face recognition (Liu & Setiono, 1998), texture analysis (Swiniarski & Skowron, 2003) and process monitoring (Dubois & Prade, 1992). An extensive review of RST-based feature selection is given in Thangavel and Pethalakshmi (2009).
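To make these definitions concrete, the following is a minimal illustrative sketch of the dependency degree of Eq. (5) and a brute-force search for the reducts of Eq. (6) on a toy decision table. The attribute names and data are hypothetical, and practical applications would use dedicated software such as the RSES system mentioned in Section 4.

```python
# Sketch of Eqs. (2)-(6): indiscernibility classes, positive region, dependency
# degree and a brute-force reduct search on a toy decision table.
from itertools import combinations

def partition(rows, attrs):
    """U/IND(P): group object indices by their values on the attributes P."""
    blocks = {}
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)
        blocks.setdefault(key, set()).add(i)
    return list(blocks.values())

def dependency(rows, cond, dec):
    """gamma_P(D) = card(POS_P(D)) / card(U), as in Eq. (5)."""
    dec_blocks = partition(rows, dec)
    pos = set()
    for block in partition(rows, cond):
        # A P-equivalence class belongs to the positive region if it is
        # contained in a single decision class (its lower approximation).
        if any(block <= d for d in dec_blocks):
            pos |= block
    return len(pos) / len(rows)

def reducts(rows, cond, dec):
    """All minimal subsets P of C with gamma_P(D) = gamma_C(D), as in Eq. (6)."""
    full = dependency(rows, cond, dec)
    found = []
    for r in range(1, len(cond) + 1):
        for subset in combinations(cond, r):
            if dependency(rows, list(subset), dec) == full and \
               not any(set(p) <= set(subset) for p in found):
                found.append(subset)
    return found

# Toy decision table: three condition attributes, one decision attribute.
table = [
    {"x1": "high", "x2": "low",  "x3": "yes", "d": "healthy"},
    {"x1": "high", "x2": "low",  "x3": "no",  "d": "healthy"},
    {"x1": "low",  "x2": "high", "x3": "yes", "d": "failed"},
    {"x1": "low",  "x2": "low",  "x3": "no",  "d": "failed"},
]
print(dependency(table, ["x1", "x2", "x3"], ["d"]))  # 1.0 -> consistent table
print(reducts(table, ["x1", "x2", "x3"], ["d"]))     # [('x1',)]
```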
3.2. Support vector machines

The support vector machine (SVM) is based on statistical learning theory. It realizes the theory of the VC (Vapnik–Chervonenkis) dimension and the principle of structural risk minimization (SRM). The whole theory can be described simply as follows: search for an optimal hyperplane satisfying the classification requirement, then use a certain algorithm to maximize the margin of separation on either side of the optimal hyperplane while ensuring the accuracy of correct classification. According to the theory, separable data can thus be classified effectively. A brief introduction to SVM follows.
Suppose we are given a set of training data x_i ∈ R^n (i = 1, 2, ..., n) with the desired output y_i ∈ {+1, −1} corresponding to the two classes, and suppose there exists a separating hyperplane with the target function w·x_i + b = 0 (w represents the weight vector and b represents the bias). To ensure that all training data can be classified, we must maximize the margin of separation (2/‖w‖). Then, in the case of linear separation, the linear SVM for the optimal separating hyperplane leads to the following optimization problem:

\[
\text{Minimize}\quad \phi(w) = \frac{1}{2} w^{T} w
\tag{7}
\]
\[
\text{Subject to}\quad y_i (x_i \cdot w + b) \ge 1, \quad i = 1, 2, \ldots, n
\tag{8}
\]

The solution to the above optimization problem can be converted into its dual problem. We can search for the nonnegative Lagrange multipliers by solving the following optimization problem:

\[
\text{Maximize}\quad Q(a) = \sum_{i=1}^{n} a_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j y_i y_j x_i^{T} x_j
\tag{9}
\]
\[
\text{Subject to}\quad \sum_{i=1}^{n} a_i y_i = 0, \quad a_i \ge 0, \quad i = 1, 2, \ldots, n
\tag{10}
\]

The training data corresponding to nonzero multipliers are the support vectors. Let a_i^* denote the optimal Lagrange multipliers; the optimal weight vector is

\[
w^{*} = \sum_{i=1}^{n} a_i^{*} y_i x_i
\tag{11}
\]

and the optimal bias is

\[
b^{*} = y_j - \sum_{i=1}^{n} a_i^{*} y_i x_i^{T} x_j
\tag{12}
\]

Then, the optimal classification function is

\[
f(x) = \mathrm{sgn}\{(w^{*} \cdot x) + b^{*}\}
\tag{13}
\]
The above discussion is restricted to the case in which the training data are separable. To generalize the problem to the non-separable case, slack variables ε_i ≥ 0, i = 1, 2, ..., n are introduced to relax the constraints in (8). The optimization problem becomes

\[
\text{Minimize}\quad \phi(w, \varepsilon) = \frac{1}{2} w^{T} w + C \sum_{i=1}^{n} \varepsilon_i
\tag{14}
\]
\[
\text{Subject to}\quad y_i (w^{T} x_i + b) \ge 1 - \varepsilon_i, \quad \varepsilon_i \ge 0, \quad i = 1, 2, \ldots, n
\tag{15}
\]

where C is a nonnegative parameter chosen by the user. Solving this problem is similar to the linearly separable case, but the dual constraints become

\[
\sum_{i=1}^{n} a_i y_i = 0, \quad 0 \le a_i \le C, \quad i = 1, 2, \ldots, n
\tag{16}
\]

For non-linearly separable data, the data can be mapped into a high-dimensional feature space with a nonlinear mapping, in which the optimal hyperplane is then sought. The linear classification after mapping is performed by selecting an appropriate inner-product kernel that satisfies Mercer's condition. The problem is then converted into searching for the nonnegative Lagrange multipliers {a_i}_{i=1}^{n} by solving the following optimization problem (Gold & Sollish, 2005; Sinalingam & Pandia, 2005; Zhu & Zhang, 2003):

\[
\text{Maximize}\quad Q(a) = \sum_{i=1}^{n} a_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j y_i y_j K(x_i, x_j)
\tag{17}
\]
\[
\text{Subject to}\quad \sum_{i=1}^{n} a_i y_i = 0, \quad 0 \le a_i \le C, \quad i = 1, 2, \ldots, n
\tag{18}
\]

Hence, the final classification function is

\[
f(x) = \mathrm{sgn}\left\{\sum_{i=1}^{n} a_i^{*} y_i K(x_i, x) + b^{*}\right\}
\tag{19}
\]

The most commonly used kernel function is the RBF kernel:

\[
K(x, x') = \exp\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right)
\tag{20}
\]
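As a small illustration (not part of the original paper), the sketch below evaluates the RBF kernel of Eq. (20) directly and checks it against scikit-learn's LIBSVM-style parameterization, where gamma = 1/(2σ²); it also verifies that a fitted SVC reproduces the decision function of Eq. (19) from its support vectors, dual coefficients and bias. All data are toy values.

```python
# The RBF kernel of Eq. (20) and the decision function of Eq. (19),
# checked against scikit-learn's LIBSVM wrapper. Toy data only.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

def rbf(x, x_prime, sigma):
    """K(x, x') = exp(-||x - x'||^2 / (2 * sigma^2)), i.e. Eq. (20)."""
    return np.exp(-np.sum((x - x_prime) ** 2) / (2.0 * sigma ** 2))

x, xp, sigma = np.array([1.0, 2.0]), np.array([2.0, 0.0]), 1.5
gamma = 1.0 / (2.0 * sigma ** 2)   # LIBSVM / scikit-learn parameterization
print(rbf(x, xp, sigma))
print(rbf_kernel(x.reshape(1, -1), xp.reshape(1, -1), gamma=gamma)[0, 0])  # same value

# Eq. (19): f(z) = sgn(sum_i a_i* y_i K(x_i, z) + b*), recovered from a fitted SVC.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([1, 1, -1, -1])
clf = SVC(kernel="rbf", C=1.0, gamma=gamma).fit(X, y)
z = np.array([[0.2, 0.3]])
manual = clf.dual_coef_ @ rbf_kernel(clf.support_vectors_, z, gamma=gamma) + clf.intercept_
print(np.sign(manual), clf.predict(z))  # the sign of the manual sum matches the prediction
```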
Financial applications of SVM typically focus on pattern matching, classification and forecasting. Härdle, Moro, and Schäfer (2003) employed SVM to predict bankruptcy and compared it with NN, MDA and learning vector quantization (LVQ) (Fan & Palaniswami, 2000); SVM obtained the best results, followed by NN, LVQ and MDA. Van Gestel et al. (2003) also reported experiments with the least squares SVM, a modified version of SVM, and showed significantly better results in business failure prediction than the classical techniques.

4. Research data and experiments

The objective of this study is to use efficiency as a predictive variable and to propose a novel model, RST–SVM, to increase the accuracy of business failure prediction. To test whether the efficiency variable is helpful in business failure prediction, our approach is based on the following rationale: with financial ratios already included as independent variables, we test whether the inclusion of DEA provides extra information that improves the classification accuracy of the prediction model. As we also want to see whether RST can be a good supporting tool for deciding the input variables of the SVM prediction model, the objective of the proposed study is to explore the performance of business failure prediction by the proposed RST–SVM model. In the first stage, RST is selected for variable selection because of its reliability in obtaining the significant independent variables. In the second stage of
the study, the significant independent variables obtained from RST are used as inputs to the SVM models. The results can then be compared to see whether the model including DEA gives better classification accuracy. Finally, to verify the applicability of the methodology, we also designed the RST–BPN model as a benchmark.
The research data are provided by the Taiwan Stock Exchange (TSE) and the Taiwan Economic Journal (TEJ) database and consist of information and electronic manufacturing firms that filed for bankruptcy from 2005 to 2007. The sampling criterion was that a company had been announced as one whose stock needed to be "Traded" or "Terminated". In other words, the company may have been cited as (1) having a credit crisis, (2) having a net operating loss, (3) failing to pay debts, or (4) violating regulations. Each failed firm was paired with healthy firms by (1) industry, (2) products, (3) capitalization and (4) value of assets. The matched sample contained 114 firms, including 38 failed firms and 76 healthy firms. After deleting variables with missing values and drawing on previous research, experience from past decisions, and the domain knowledge of financial experts in the industry, 18 attributes were retained (17 financial ratios plus DEA), together with a binary decision class (healthy or unhealthy, coded 1 and 2, respectively).
For the DEA, informative input and output variables should be selected. Generally, the input variables for a corporation are capital, liabilities, human resources, technology, etc., and the output variables are commonly profit and sales. Therefore, in this paper we selected R&D expense, R&D designers and the number of patents and trademarks as the input variables for DEA, and gross profit and market share as the output variables. To pick out the significant independent variables that are informative and closely related to the corporate condition, this study used RSES, an RST-based collection of algorithms and data structures for rough set computations developed at the Group of Logic, Institute of Mathematics, University of Warsaw, Poland, and in particular its genetic algorithm (Komorowski, Øhrn, & Skowron, 2002). The selected variables are shown in Table 1, and these eight variables are taken as the inputs to the SVM and BPN classifiers.

5. Results and analysis

5.1. Two-stage hybrid model integrating RST and SVM

After the RST analysis was finished and the holdout sample was separated into two groups, we tested whether DEA is helpful in business failure prediction. The hybrid model proposed in this paper is composed of RST and SVM applied to the two groups. In this study, we tested two possible hybrid models: the RST–SVM model that uses only financial ratios as independent variables (model I), and the RST–SVM model that includes both financial ratios and DEA (model II).
Table 1
Definition of variables.

Variable    Description
X1          Working capital/total assets
X2          Total debt/total assets
X3          Net income/total assets
X4          Current assets/total assets
X5          Current assets/sales
X6          Net income/(total assets − total liabilities)
X7          Accounts receivable turnover
X8          DEA
For the SVM, we applied the LIBSVM program, downloaded from http://www.csie.ntu.edu.tw/~cjlin/libsvm/, to construct the classification model, and chose the Gaussian RBF as the kernel function. There are two parameters associated with the RBF kernel: C and σ. Some kind of parameter-selection procedure therefore has to be performed. Hsu, Chang, and Lin (2004) proposed a "grid search" on C and σ with m-fold cross-validation on the training data. The goal of this procedure is to identify the optimal C and σ so that the classifier can accurately predict unseen data. In m-fold cross-validation, the training set is first divided into m subsets of equal size. Sequentially, one subset is tested using the classifier trained on the remaining (m − 1) subsets. Thus, each instance of the whole training set is predicted once, and the cross-validation accuracy is the percentage of data that are correctly classified. The cross-validation procedure can prevent the overfitting problem. In this paper, we performed 5-fold cross-validation to choose the proper parameters from C ∈ {2^0, 2^1, ..., 2^7} and σ ∈ {2^−3, 2^−2, ..., 2^3}. After conducting the grid search on the training data, the resulting confusion matrices of the two hybrid models are summarized in Tables 2 and 3, respectively. From the results in Tables 2 and 3, we can observe that the average correct classification rate is 83.33% for the model considering only financial ratios and 86.84% for the model considering both financial ratios and DEA. From the improved correct classification rate of the model considering both financial ratios and DEA, DEA appears to be helpful in improving the classification accuracy of the prediction model.
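The following sketch shows how such a grid search with 5-fold cross-validation could be reproduced with scikit-learn's SVC (a wrapper around LIBSVM); the σ grid is mapped to LIBSVM's gamma = 1/(2σ²). The standardization step, the data matrix, the labels and the train/test split are placeholders and assumptions, not the study's actual sample.

```python
# Grid search over C in {2^0,...,2^7} and sigma in {2^-3,...,2^3} with
# 5-fold cross-validation, using scikit-learn's LIBSVM-based SVC.
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(114, 8))          # placeholder for the 114 matched firms, 8 inputs
y = rng.choice([1, 2], size=114)       # placeholder decision classes (1 = healthy)

param_grid = {
    "svc__C": [2.0 ** k for k in range(0, 8)],
    "svc__gamma": [1.0 / (2.0 * (2.0 ** k) ** 2) for k in range(-3, 4)],  # gamma = 1/(2*sigma^2)
}
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)   # best (C, gamma) by 5-fold CV
print(search.score(X_test, y_test))              # holdout classification rate
```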
Table 2
RST–SVM model (model I) classification results with only financial ratios.

Actual class      Classified class
                  1 (Healthy)     2 (Unhealthy)
1 (Healthy)       64 (84.21%)     12 (15.79%)
2 (Unhealthy)     7 (18.42%)      31 (81.58%)

Average correct classification rate: 83.33%.

Table 3
RST–SVM model (model II) classification results with both financial ratios and DEA.

Actual class      Classified class
                  1 (Healthy)     2 (Unhealthy)
1 (Healthy)       65 (85.53%)     11 (14.47%)
2 (Unhealthy)     4 (10.53%)      34 (89.47%)

Average correct classification rate: 86.84%.

5.2. Two-stage hybrid model integrating RST and BPN

Since Vellido, Lisboa, and Vaughan (1999) pointed out that around 80% of business applications using neural networks use the BPN training algorithm, we use the popular BPN as the benchmark for verifying the applicability of SVM. As recommended by Cybenko (1989) and Hornik et al. (1989), a network structure with one hidden layer is sufficient to model any complex system to any desired accuracy, so the designed network model has only one hidden layer. In this study, we tested two possible hybrid models: the RST–BPN model that uses only financial ratios as independent variables (hybrid model III), and the RST–BPN model that includes both financial ratios and DEA (hybrid model IV). After comparing the prediction results of the testing sample with different combinations of hidden nodes and learning rates, the network structures were 7-9-1 and 8-9-1 (input layer, hidden layer, output layer) for models III and IV, respectively. We used the sigmoid function for activation and the Levenberg–Marquardt algorithm for learning. The BPN models were executed with the MATLAB NN toolbox. The prediction results of the testing sample (the confusion matrices) of the two hybrid prediction models are summarized in Tables 4 and 5, respectively. From the results in Tables 4 and 5, we can observe that the average correct classification rate is 78.95% for the model including only financial ratios and 82.46% for the model incorporating both financial ratios and DEA. Again, from the improved correct classification rate of the model considering both financial ratios and DEA (Table 6), we can conclude that DEA provides extra information beyond the financial ratios that improves the classification accuracy of the prediction model.
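As a rough, hedged analogue of this setup (the paper itself trained the BPN with the Levenberg–Marquardt algorithm in the MATLAB NN toolbox, for which scikit-learn offers no equivalent solver), the sketch below builds an 8-9-1 network with a logistic (sigmoid) activation, using 'lbfgs' as a stand-in solver and placeholder data.

```python
# Rough analogue of the 8-9-1 back-propagation network (model IV). Scikit-learn
# has no Levenberg-Marquardt solver, so 'lbfgs' is used as a stand-in here.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(2)
X = rng.normal(size=(114, 8))      # placeholder for the eight RST-selected inputs (Table 1)
y = rng.choice([1, 2], size=114)   # placeholder labels: 1 = healthy, 2 = unhealthy

bpn = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(9,),   # one hidden layer with 9 nodes (8-9-1 topology)
                  activation="logistic",     # sigmoid activation, as in the paper
                  solver="lbfgs",            # stand-in for Levenberg-Marquardt
                  max_iter=2000,
                  random_state=0),
)
bpn.fit(X, y)
print(bpn.score(X, y))   # training classification rate on the placeholder data
```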
Table 4
RST–BPN model (model III) classification results with only financial ratios.

Actual class      Classified class
                  1 (Healthy)     2 (Unhealthy)
1 (Healthy)       63 (82.89%)     13 (17.11%)
2 (Unhealthy)     11 (28.95%)     27 (71.05%)

Average correct classification rate: 78.95%.

Table 5
RST–BPN model (model IV) classification results with both financial ratios and DEA.

Actual class      Classified class
                  1 (Healthy)     2 (Unhealthy)
1 (Healthy)       63 (82.89%)     13 (17.11%)
2 (Unhealthy)     7 (18.42%)      31 (81.58%)

Average correct classification rate: 82.46%.

Table 6
Predictive accuracies of the constructed models.

Model                  Accuracy (%)
                       (1-1)     (2-2)     Average accuracy
RST–SVM (model I)      84.21     81.58     83.33
RST–SVM (model II)     85.53     89.47     86.84
RST–BPN (model III)    82.89     71.05     78.95
RST–BPN (model IV)     82.89     81.58     82.46

Table 7
Type I and Type II errors of the constructed models.

Model                  Performance assessment (%)
                       Type I error     Type II error
RST–SVM (model I)      15.79            18.42
RST–SVM (model II)     14.47            10.53
RST–BPN (model III)    17.11            28.95
RST–BPN (model IV)     17.11            18.42

5.3. Results compared with Type I and Type II errors of the constructed models

It is well known that, in order to evaluate the overall classification capability of the designed business failure prediction models,
the misclassifications also have to be taken into account. A Type I error means that a healthy company is misclassified as an unhealthy company; a Type II error means that an unhealthy company is misclassified as a healthy one. In order to evaluate the overall classification capability, Table 7 summarizes the Type I and Type II errors of the constructed models when considering only financial ratios and when considering both financial ratios and DEA. From the results in Table 7, we find that the RST–SVM model has lower Type I and Type II errors than the RST–BPN model. Hence we can conclude that the RST–SVM model not only has the best classification rate, but also has the lowest Type I and Type II errors.
Comparing the results in Table 7, several conclusions can be drawn. First, the models including both financial ratios and DEA provide better classification results than the corresponding models using only financial ratios. This implies that DEA does provide valuable information for predicting business failure. Secondly, the constructed RST–SVM model provides better classification results than the RST–BPN model, whether considering only financial ratios or both financial ratios and DEA. Hence, we believe the proposed RST–SVM model should be the better alternative, since it exhibits the capability of identifying important independent variables, which may provide valuable information for further diagnostic purposes.
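For completeness, the short sketch below recomputes these performance measures from a confusion matrix, using the model I counts reported in Table 2; it is an illustration of the definitions above, not code from the study.

```python
# Type I / Type II error rates and the average correct classification rate,
# computed from the model I (RST-SVM, financial ratios only) counts of Table 2.
# Rows = actual class, columns = classified class, order: healthy, unhealthy.
import numpy as np

cm = np.array([[64, 12],    # actual healthy: 64 correct, 12 classified as unhealthy
               [7,  31]])   # actual unhealthy: 7 classified as healthy, 31 correct

type_i = cm[0, 1] / cm[0].sum()     # healthy misclassified as unhealthy
type_ii = cm[1, 0] / cm[1].sum()    # unhealthy misclassified as healthy
accuracy = np.trace(cm) / cm.sum()  # average correct classification rate

print(f"Type I error:  {type_i:.2%}")    # 15.79%
print(f"Type II error: {type_ii:.2%}")   # 18.42%
print(f"Accuracy:      {accuracy:.2%}")  # 83.33%
```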
6. Discussion and conclusions

6.1. Discussion

The prediction of business failure is an important and challenging issue that has served as the impetus for many academic studies over the past three decades. Although the efficiency of a corporation's management is generally acknowledged to be a key contributor to a corporation's bankruptcy, it is usually excluded from early prediction models. The objective of this study is to use efficiency as a predictive variable and to propose a novel model, RST–SVM, to increase the accuracy of business failure prediction. To verify the applicability of the methodology, we also designed a neural network counterpart of the hybrid approach as the benchmark, and the proposed RST–SVM model was applied to a data set of bankruptcies in Taiwan.
First, this study found that most prior studies adopted only financial ratios as independent variables, even though the efficiency of a corporation's management is generally acknowledged to be a key contributor to a corporation's bankruptcy (Gestel et al., 2006; Seballos & Thomson, 1990; Secrist, 1938). Therefore, we believe that efficiency, which reflects the status of a corporation's management, is a decisive factor affecting the predictive capability of business failure models. As an efficiency evaluation technique, DEA is useful especially when there are various selection criteria and measurement units. Thus, we use DEA as the variable representing the efficiency of a corporation in the business failure prediction models. Secondly, most financial ratios do not satisfy the normality assumption of multivariate statistical models such as MDA and the logistic regression model; such statistical prediction techniques therefore exhibit limited predictive accuracy and large errors. Thirdly, artificial intelligence models (SVM and NN) are more accurate in predicting business failure than multivariate statistical models. By minimizing the sum of the empirical risk and the complexity of the hypothesis space, SVM gives good generalization performance on many business failure prediction problems. For SVM to classify well, the data inputs to the classifier need special treatment: raw data with many variables cannot be fed directly into the classifier, because doing so degrades its performance. Finally, RST has become very popular among scientists worldwide and is now one of the most developed techniques in intelligent data analysis. The RST reduction technique is applied to find the reducts of the data, i.e., the minimal subsets of attributes that are associated with a class label for classification. For these reasons, we conclude that the proposed RST–SVM model outperformed the other business failure models. Additionally, the results of this work demonstrate that the predictive accuracy of the RST–SVM model in forecasting business failure is significantly increased by DEA.

6.2. Conclusions

To verify the feasibility of the proposed RST–SVM model, business failure prediction tasks were performed using public companies that filed for bankruptcy between 2005 and 2007 in Taiwan. The contributions of this study can be summarized as follows. First, DEA does provide valuable information in business failure prediction. Secondly, the proposed RST–SVM model provides better classification results than RST–BPN, whether considering only financial ratios or both financial ratios and DEA. Hence, the RST–SVM model should be an efficient alternative. These findings justify the presumption that the RST–SVM model is a better alternative for conducting business failure prediction tasks. Besides, the RST–SVM model not only has better classification accuracies, but also has the lowest Type I and Type II errors. Thus, the forecasting technique (RST–SVM) can provide guidance for investment decisions by investors and government.

References

Ahn, B. S., Cho, S. S., & Kim, C. Y. (2000). The integrated methodology of rough set theory and artificial neural network for business failure prediction. Expert Systems with Applications, 18, 65–74.
Altman, E. I. (1968). Financial ratios, discriminant analysis, and the prediction of corporate bankruptcy. Journal of Finance, 23(4), 589–609.
Altman, E. I. (1984). The success of business failure prediction models: An international survey. Journal of Banking and Finance, 8(2), 171–198.
Altman, E. I., Haldeman, R. G., & Narayanan, P. (1977). Zeta analysis. Journal of Banking and Finance, June, 29–51.
Banker, R. D., Charnes, A., & Cooper, W. W. (1984). Some models for estimating technical and scale inefficiencies in data envelopment analysis. Management Science, 30(1), 1078–1092.
Beaver, W. H. (1966). Financial ratios as predictors of failure, empirical research in accounting: Selected studies. Supplement to the Journal of Accounting Research, 4, 179–199.
Bryant, S. M. (1997). A case-based reasoning approach to bankruptcy prediction modeling. Intelligent Systems in Accounting, Finance and Management, 6, 195–214.
Charnes, A., Cooper, W. W., & Rhodes, E. (1978). Measuring the efficiency of decision making units. European Journal of Operational Research, 2, 429–444.
Charnes, A., Cooper, W. W., Lewin, A. Y., & Seiford, L. M. (Eds.). (1993). Data envelopment analysis: Theory, methodology and applications. Boston: Kluwer.
Coleman, K. G., Graettinger, T. J., & Lawrence, W. F. (1991). Neural networks for bankruptcy prediction: The power to solve financial problems. AI Review, 48–50.
Collins, R. A., & Green, R. D. (1972). Statistical methods for bankruptcy forecasting. Journal of Economics and Business, 32, 349–354.
Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2, 303–314.
Dash, M., & Liu, H. (1997). Feature selection for classification. Intelligent Data Analysis, 1(3), 131–156.
Deakin, E. B. (1972). A discriminant analysis of predictors of business failure. Journal of Accounting Research, 10(1), 167–179.
Dimitras, A. I., Zanakis, S. H., & Zopounidis, C. (1996). A survey of business failures with an emphasis on prediction methods and industrial applications. European Journal of Operational Research, 90, 487–513.
Dubois, D., & Prade, H. (1992). Putting rough sets and fuzzy sets together. In R. Slowinski (Ed.), Intelligent decision support: Handbook of applications and advances of the rough set theory (pp. 203–232). Dordrecht: Kluwer Academic.
Fan, A., & Palaniswami, M. (2000). Selecting bankruptcy predictors using a support vector machine approach. In Proceedings of the IEEE-INNS-ENNS international joint conference on neural networks (Vol. 6, pp. 354–359).
Farrell, M. J. (1957). The measurement of productive efficiency. Journal of the Royal Statistical Society, Series A (General), 120, 253–289.
Gestel, T. V., Baesens, B., Suykens, J., Poel, D. V., Baestaens, D. E., & Willekens, M. (2006). Bayesian kernel based classification for financial distress detection. European Journal of Operational Research, 172, 979–1003.
Gold, C., & Sollish, P. (2005). Model selection for support vector machine classification. Neurocomputing, 55, 221–249.
Härdle, W., Moro, R., & Schäfer, D. (2003). Predicting corporate bankruptcy with support vector machines. Working paper, Humboldt University and the German Institute for Economic Research.
Hong, H., Ha, S., Shin, C., Park, S., & Kim, S. (1999). Evaluating the efficiency of system integration projects using data envelopment analysis (DEA) and machine learning. Expert Systems with Applications, 16, 283–296.
Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2, 336–359.
Hsu, C. W., Chang, C. C., & Lin, C. J. (2004). A practical guide to support vector classification. Technical Report, Department of Computer Science and Information Engineering, National Taiwan University.
Johnson, R. A., & Wichern, D. W. (2002). Applied multivariate statistical analysis (5th ed.). Upper Saddle River, NJ: Prentice-Hall.
Jones, F. L. (1987). Current techniques in bankruptcy prediction. Journal of Accounting Literature, 6, 131–164.
Jue, W., & Duo-Qian, M. (1998). Analysis on attribute reduction strategies of rough set. Journal of Computer Science and Technology, 13(2), 189–193.
Keasey, K., & Watson, R. (1991). Financial distress prediction models: A review of their usefulness. British Journal of Management, 2, 89–102.
Kira, K., & Rendell, L. A. (1992). The feature selection problem: Traditional methods and a new algorithm. In Proceedings of the ninth national conference on artificial intelligence (pp. 129–134).
Komorowski, K., Øhrn, A., & Skowron, A. (2002). The ROSETTA rough set software system. In W. Klösgen & J. Zytkow (Eds.), Handbook of data mining and knowledge discovery. Oxford University Press.
Kumar, P. R., & Ravi, V. (2007). Bankruptcy prediction in banks and firms via statistical and intelligent techniques: A review. European Journal of Operational Research, 180(1), 1–28.
Langley, P. (1994). Selection of relevant features in machine learning. In Proceedings of the AAAI fall symposium on relevance (pp. 1–5).
Liu, H., & Setiono, R. (1998). Some issues on scalable feature selection. Expert Systems with Applications, 15, 333–339.
Mi, J.-S., Wei-Zhi, W., & Wen-Xiu, Z. (2004). Approaches to knowledge reduction based on variable precision rough set model. Information Sciences, 159(3–4), 255–272.
Min, J. H., & Lee, Y. C. (2005). Bankruptcy prediction using support vector machine with optimal choice of kernel function parameters. Expert Systems with Applications, 28, 603–614.
Moradi, H., Grzymala-Busse, J. W., & Roberts, J. A. (1998). Entropy of English text: Experiments with humans and a machine learning system based on rough sets. Information Sciences, 104(1–2), 31–47.
Ohlson, J. A. (1980). Financial ratios and the probabilistic prediction of bankruptcy. Journal of Accounting Research, 109–131.
Pawlak, Z. (1991). Rough sets: Theoretical aspects of reasoning about data. Dordrecht: Kluwer Academic Publishing.
Pawlak, Z. (2002). Rough sets and intelligent data analysis. Information Sciences, 11, 1–12.
Rahimian, E., Singh, S., Thammachote, T., & Virmani, R. (1993). Bankruptcy prediction by neural networks. In E. Trippi & E. Turban (Eds.), Neural networks in finance and investing: Using artificial intelligence to improve real-world performance (pp. 159–176). Chicago: Probus Publishing.
Salchenberger, L. M., Cinar, E. M., & Lash, N. A. (1992). Neural networks: A new tool for predicting thrift failures. Decision Sciences, 23, 899–916.
Scott, J. (1981). The probability of bankruptcy: A comparison of empirical predictions and theoretical models. Journal of Banking and Finance, 5, 317–344.
Seballos, L. D., & Thomson, J. B. (1990). Understanding causes of commercial bank failures in the 1980s. Economic Commentary, Federal Reserve Bank of Cleveland, September.
Secrist, H. (1938). National bank failures and non-failures: An autopsy and diagnosis. Bloomington, IN: Principia Press.
Seol, H., Choi, J., Park, G., & Park, Y. (2007). A framework for benchmarking service process using data envelopment analysis and decision tree. Expert Systems with Applications, 32(2), 432–440.
Sharda, R., & Wilson, R. L. (1996). Neural networks experiments in business-failure forecasting: Predictive performance measurement issues. International Journal of Computational Intelligence and Organizations, 1(2), 107–117.
Siegel, P. H., de Korvin, A., & Omer, K. (1993). Detection of irregularities by auditors: A rough set approach. Indian Journal of Accounting, 44–56.
Sinalingam, D. M., & Pandia, N. (2005). Minmal classification method with error correlation codes for multiclass recognization. International Journal of Pattern Recognition and Artificial Intelligence, 5, 663–680.
Skowron, A., & Rauszer, C. (1992). The discernibility matrices and functions in information systems. In Intelligent decision support: Handbook of applications and advances of rough set theory (pp. 331–362).
Slowinski, R., & Zopounidis, C. (1995). Application of the rough set approach to evaluation of bankruptcy risk. International Journal of Intelligent Systems in Accounting, Finance and Management, 4, 27–41.
Sohn, S., & Moon, T. (2004). Decision tree based on data envelopment analysis for effective technology commercialization. Expert Systems with Applications, 26(2), 279–284.
Swiniarski, R. W., & Hargis, L. (2001). Rough sets as a front end of neural networks texture classifier. Neurocomputing, 36, 85–102.
Swiniarski, R. W., & Skowron, A. (2003). Rough set methods in feature selection and recognition. Pattern Recognition Letters, 24(6), 833–849.
Tam, K. Y., & Kiang, M. (1992). Managerial applications of neural networks: The case of bank failure predictions. Management Science, 38(7), 926–947.
Thangavel, K., & Pethalakshmi, A. (2009). Dimensionality reduction based on rough set theory: A review. Applied Soft Computing, 9(1), 1–12.
Van Gestel, T., Baesens, B., Suykens, J., Espinoza, M., Baestaens, D. E., Vanthienen, J., et al. (2003). Bankruptcy prediction with least squares support vector machine classifiers. In Proceedings of the IEEE international conference on computational intelligence for financial engineering, Hong Kong (pp. 1–8).
Vapnik, V. (1995). The nature of statistical learning theory. New York: Springer.
Vellido, A., Lisboa, P. J. G., & Vaughan, J. (1999). Neural networks in business: A survey of applications (1992–1998). Expert Systems with Applications, 17, 51–70.
Wang, G., Hu, H., & Yang, D. (2002). Decision table reduction based on conditional information entropy. Chinese Journal of Computers, 25(7), 1–8.
Wilson, R. L., & Sharda, R. (1994). Bankruptcy prediction using neural networks. Decision Support Systems, 11, 545–557.
Zavgren, C. V. (1983). The prediction of corporate failure: The state of the art. Journal of Financial Literature, 2, 1–37.
Zhang, G., Hu, M. Y., Patuwo, B. E., & Indro, D. C. (1999). Artificial neural networks in bankruptcy predictions: General framework and cross-validation analysis. European Journal of Operational Research, 116, 16–32.
Zhu, Y. S., & Zhang, Y. Y. (2003). The study on some problems of support vector classifier. Computer Engineering and Applications, 13, 38–66.