Expert Systems with Applications 36 (2009) 403–410
doi:10.1016/j.eswa.2007.09.060
An integrative model with subject weight based on neural network learning for bankruptcy prediction

Sungbin Cho*, Jinhwa Kim, Jae Kwon Bae
School of Business, Sogang University, #1 Sinsu-dong, Mapo-gu, Seoul 121-742, Republic of Korea

*Corresponding author. Tel.: +82 2 705 8715; fax: +82 2 705 8519. E-mail address: [email protected] (S. Cho).

Abstract

This study proposes an integration strategy for efficiently combining currently-in-use statistical and artificial intelligence techniques. In particular, by combining multiple discriminant analysis, logistic regression, neural networks, and decision tree induction, we introduce an integrative model with subject weight based on neural network learning for bankruptcy prediction. The strength of the proposed model stems from differentiating the weights of the source methods for each subject in the testing data set. That is, the relative weights form an N × I matrix, where N denotes the number of subjects and I denotes the number of source methods. Experiments using real-world financial data indicate that the proposed model can marginally increase prediction accuracy compared to the source methods. The integration strategy can be useful for a dichotomous classification problem like bankruptcy prediction, since prediction can be improved by taking advantage of existing and newly emerging techniques in the future.

© 2007 Elsevier Ltd. All rights reserved.

Keywords: Bankruptcy prediction; Integrative prediction model; Subject weight learning; Method-data fitness

1. Introduction

Evaluating the financial health of a firm and assessing its default risk have been of great interest to many stakeholders such as creditors, investors, and government. Statistics from Korea give an idea of the importance of the corporate bankruptcy problem: the total balance of bank loans to corporations was approximately $279 billion as of January 2006 (Bank of Korea, 2006), and, more seriously, an average of 285 firms declared bankruptcy each month in 2005. Since the business environment is becoming increasingly uncertain and competitive, bankruptcy can happen to any organization.

Over the last six decades, a number of studies have been conducted by scholars and practitioners in business administration in order to predict corporate bankruptcy. Even a slight improvement in assessing credit risk helps reduce the creditor's risk and leads to a considerable amount of savings for an economy. Accurate credit assessment usually brings many benefits, such as cost reduction in credit analysis, better monitoring, and an increased debt collection rate.

Banks have traditionally used internal rating systems for assessing credit risk. These scoring systems include quantitative as well as qualitative aspects such as reputation (Treacy & Carey, 2000). As a more systematic way, modeling strategies have been either statistical or machine-learning-based approaches, which take into account quantitative factors only. Multiple discriminant analysis was arguably the pioneering statistical approach to exploring the factors affecting bankruptcy, but it has some limitations due to the rigid assumptions generally required in statistics. The logistic regression model became popular as an alternative technique due to its relaxed assumptions.


Since artificial intelligence techniques began to be applied to business problems in the 1990s, these techniques have become ever more sophisticated and refined. According to our literature review, many studies have reported that artificial intelligence techniques often perform better than the traditional statistical methods.

With the purpose of developing an even more evolved approach, this study attempts to combine statistical methods with artificial intelligence techniques. We select multiple discriminant analysis and logistic regression among the statistical methods, and two artificial intelligence (AI) techniques, neural networks and rule induction. By combining these methods, we introduce an integrative model with subject weight. The most challenging problem is how to develop an integration strategy that takes advantage of the strength of each method, which usually varies from data set to data set. In other words, how to determine the relative weight of these methods for each record is of most interest to us. Using financial data on 1,800 firms from the Korea Credit Guarantee Fund, this study develops integration strategies and evaluates their performance with respect to bankruptcy prediction.

2. Literature review

Bankruptcy forecasting has long been of great interest to both scholars and practitioners. Bankruptcy prediction is basically a dichotomous decision: a firm is either bankrupt or not. Most statistical and AI methods estimate the probability of bankruptcy, and if this probability is greater than the cutoff value (0.5 in most cases), the prediction is bankrupt.

The earliest studies on bankruptcy prediction were pioneered by those who applied statistical approaches to empirical data. Beaver (1966) used financial ratios with respect to profitability, liquidity, and solvency to predict a firm's failure to pay its financial obligations upon maturity, such as bankruptcy, bond default, and an overdrawn bank account. His research essentially developed a cutoff threshold for each financial ratio in order to classify firms into two groups. After these traditional ratio analyses, statistical linear models began to be applied to the problem of corporate bankruptcy prediction. Altman (1968) employed multiple discriminant analysis, computing an individual firm's discriminant score from a set of financial and economic ratios. The model's performance was superior up to two years before bankruptcy and then deteriorated substantially. Ohlson (1980) introduced logistic regression with a sigmoid function to the bankruptcy prediction problem. Compared to multiple discriminant analysis, the logistic regression model was easier to understand, since the logistic score, taking a value between 0 and 1, is interpretable in a probabilistic way.

In parallel with the rapid advancement of computer technology, data mining techniques have been developed and widely applied to management problems since the 1990s. Data mining techniques are very useful for discovering unknown or non-linear patterns in a massive data set. Among these techniques, artificial neural networks have been used most frequently. The main interest of a considerable amount of research was to explore the possibility of making more accurate predictions than the traditional statistical methods (Boritz & Kennedy, 1995; Coates & Fant, 1992; Klersey & Dugan, 1995; Pompe & Feelders, 1997). Desai, Crook, and Overstreet (1996) and Fletcher and Goss (1993) showed that neural network models predict bankruptcy more accurately than linear discriminant analysis and logistic regression. Unlike the early neural network approaches that relied on the backpropagation algorithm, later work experimented with more diverse algorithms. Charalambous, Charitou, and Kaourou (2000) introduced advanced algorithms such as learning vector quantization and feedforward networks and compared their forecasting performance to that of logistic regression and backpropagation-based neural networks. Besides neural networks, decision tree induction and case-based reasoning have also been applied to the bankruptcy forecasting problem (Bryant, 1997; Curram & Mingers, 1994).

In the 2000s, much work has focused on new evolutionary algorithms in the context of neural networks for binary classification problems like bankruptcy prediction (Atiya, 2001; Baek & Cho, 2003; Nasir, John, & Bennett, 2000). Anandarajan, Lee, and Anandarajan (2001) developed a genetic algorithm-based neural network model for financially stressed firms and examined its misclassification cost compared to those of backpropagation-based neural networks and multiple discriminant analysis. Since numerous studies have already examined various models for the efficient prognosis of bankruptcy, it seems very difficult to improve forecasting accuracy even marginally beyond the existing approaches. One main stream of such effort jointly uses genetic algorithms with neural networks: Abdelwahed and Amir (2005) and Chen and Huang (2003) showed promising results in terms of forecasting accuracy and adaptability. Pendharkar (2005) proposed a threshold-varying neural network approach with a genetic algorithm and compared its performance to traditional backpropagation-based neural networks and discriminant analysis. Lee, Booth, and Alam (2005) investigated the impact of data set size in the neural network approach; they also computed how much forecasting accuracy must be tolerated if an unsupervised approach like the Kohonen self-organizing feature map is adopted instead of accuracy-based supervised neural network approaches. Apart from the development of the learning algorithm itself, Pai, Annapoorani, and Pai (2004) applied principal component analysis to bypass one of the practical constraints of neural networks, namely that only a moderate number of financial ratios can be selected as input variables. Their experiments showed that models with a cumulative database could forecast better than the same models with an isolated database.

Table 1 summarizes the findings of previous studies on bankruptcy prediction.


Table 1. Comparison of former study results

| Former studies | Models with better results | Compared models |
|---|---|---|
| Anandarajan et al. (2001), Pendharkar (2005) | ANN with genetic algorithms | MDA, ANN with backpropagation |
| Bryant (1997) | LOGIT | CBR |
| Charalambous et al. (2000) | ANN with learning vector quantization | LOGIT, ANN with backpropagation |
| Curram and Mingers (1994) | ANN, MDA | RULE |
| Desai et al. (1996) | LOGIT, ANN with multilayer | MDA, modular ANN |
| Fletcher and Goss (1993), Zhang et al. (1999) | ANN | LOGIT |
| Olmeda and Fernandez (1997) | Simple voting of models | ANN, C4.5, LOGIT, MARS, MDA |
| Pompe and Feelders (1997) | No superior model | ANN, MDA, RULE |

For the remainder of this paper, we use the following abbreviations: ANN for artificial neural networks, CBR for case-based reasoning, LOGIT for logistic regression, MARS for multivariate adaptive regression splines, MDA for multivariate discriminant analysis, and RULE for rule induction by decision trees. The fact is that it is very difficult, and in some sense meaningless, to draw a general conclusion that a particular model outperforms the others. This varying model-data fitness motivates the present study: it may be worthwhile to develop an integrative strategy that combines existing and newly emerging models, which might result in better overall performance.

3. Modeling procedures

3.1. Source methods

In order to develop an integrative prediction model, we use the following four methods as source methods: multiple discriminant analysis, logistic regression, neural networks, and rule induction. After being trained on the training data set, these source methods each provide a prediction about whether a given firm in the testing data set is likely to go bankrupt. Let us briefly review the four source methods.

Multiple discriminant analysis has been widely applied to explain a categorical dependent variable by metric independent variables. Given a priori defined groups, a variate that best distinguishes between the groups is derived. The discriminant score is computed as a linear combination of the independent variables, each multiplied by its discriminant weight. The weights of the variate are chosen to maximize the between-group variance relative to the within-group variance. Note that discriminant analysis works best under multivariate normality of the independent variables and equal variance-covariance matrices across groups.

If these assumptions are not tenable, logistic regression is a more appropriate alternative. Compared to discriminant analysis, logistic regression is preferred for two reasons: first, it remains applicable when the rigid assumptions required by discriminant analysis are not met; second, it can incorporate non-linear effects. The logit equation essentially estimates the probability of belonging to a certain group, given a set of independent variables, via the logistic function.

The study of artificial neural networks began by mimicking the neurophysiology of the human brain. Since their introduction to data mining, neural networks have been applied to many socio-economic problems. Neural networks provide good predictions for complex problems in which the variables are related to each other in a non-linear way. One distinct feature of neural networks compared to traditional statistical methods is the existence of hidden layers: hidden nodes are functions of the independent variables, and the dependent variables are functions of the hidden nodes. To determine the proper number of hidden layers and hidden nodes, a preliminary test is usually performed. Given the weights determined on a training data set, a testing data set is used to evaluate the model.

Rule induction refers to the rules derived by a decision tree algorithm. The data are split into many partitions so as to increase purity, the degree to which the dependent variable within a partition belongs to a single class. The rules applied to split the data are called the induced rules. Rule induction is a non-parametric method and is suitable for capturing interaction effects or non-linearity. In many cases, decision trees are also used to interpret the results obtained by neural networks.
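As a concrete illustration, here is a minimal sketch of training the four source methods, assuming scikit-learn as the implementation (the paper does not specify a library, and the hyperparameters below are placeholders rather than the authors' settings):

```python
# A sketch (not the authors' code) of fitting the four source methods with
# scikit-learn. X_train holds the 11 standardized financial ratios and
# y_train is 1 for bankrupt, 0 for healthy; hyperparameters are illustrative.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

source_methods = {
    "MDA": LinearDiscriminantAnalysis(),
    "LOGIT": LogisticRegression(max_iter=1000),
    "ANN": MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0),
    "RULE": DecisionTreeClassifier(max_depth=5, random_state=0),
}

def fit_sources(X_train, y_train):
    """Train each source method and return its Pr(Y = 1) on the training set."""
    prob_train = {}
    for name, model in source_methods.items():
        model.fit(X_train, y_train)
        prob_train[name] = model.predict_proba(X_train)[:, 1]  # column 1 = Pr(Y=1)
    return prob_train
```

Note that scikit-learn's discriminant analysis already yields a probability, whereas the paper's MDA produces a raw discriminant score that must be rescaled (Step 1 of Section 3.3).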


3.2. Simple voting model

For the purpose of comparison with the model proposed in this paper, we introduce a simple voting approach. The Simple Voting Model integrates the predicted values of the four source methods by counting the votes for each classification, bankrupt or healthy. In doing so, there can be a conflict when two source methods predict bankrupt while the other two predict healthy. To handle this even-voting situation, the weight of each source method must be differentiated; we use the correct classification ratio obtained on the training data set as the weight. The terms are defined as follows:

$$X_s \;=\; \text{independent variables}, \qquad Y = \begin{cases} 1, & \text{if bankrupt} \\ 0, & \text{if healthy} \end{cases}$$

$$\hat{Y}_i = \begin{cases} 1, & \text{if source method } i \text{ predicts "bankrupt"} \\ 0, & \text{if source method } i \text{ predicts "healthy"} \end{cases}$$

where i = MDA, LOGIT, ANN, and RULE, and $RC_i$ denotes the correct classification ratio of source method i on the training data set. The prediction is made by

$$\hat{Y} = \begin{cases} \text{bankrupt}, & \text{if } \sum_{i:\,\hat{Y}_i = 1} RC_i > \sum_{i:\,\hat{Y}_i = 0} RC_i \\ \text{healthy}, & \text{otherwise} \end{cases}$$

For example, suppose the correct classification ratios are RC_MDA = 0.7852, RC_LOGIT = 0.7988, RC_ANN = 0.8012, and RC_RULE = 0.7654, and the source methods predict, for a particular subject in the testing data set, $\hat{Y}_{MDA} = 1$, $\hat{Y}_{LOGIT} = 1$, $\hat{Y}_{ANN} = 0$, and $\hat{Y}_{RULE} = 0$. Then the Simple Voting Model predicts "bankrupt" for this subject because $\sum_{i:\,\hat{Y}_i=1} RC_i = 1.5840$ exceeds $\sum_{i:\,\hat{Y}_i=0} RC_i = 1.5666$.
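In code, the voting rule is a one-liner; the sketch below (our illustration, reusing the RC values from the example above) makes the tie-breaking role of the weights explicit:

```python
# Simple Voting Model: each source method votes with a weight equal to its
# training-set correct classification ratio RC_i.
def simple_vote(preds, rc):
    """preds and rc map a method name to its 0/1 prediction and its RC_i."""
    bankrupt_weight = sum(rc[i] for i in preds if preds[i] == 1)
    healthy_weight = sum(rc[i] for i in preds if preds[i] == 0)
    return 1 if bankrupt_weight > healthy_weight else 0

rc = {"MDA": 0.7852, "LOGIT": 0.7988, "ANN": 0.8012, "RULE": 0.7654}
preds = {"MDA": 1, "LOGIT": 1, "ANN": 0, "RULE": 0}
print(simple_vote(preds, rc))  # 1 ("bankrupt"): 1.5840 > 1.5666
```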

3.3. Integrative model with subject weight

It is a common situation that a particular prediction method fits quite well for some subjects in a data set but not for others. For example, the neural networks might work better than the logistic regression for one group of subjects, while the logistic regression is better for another group. Note that the four source methods are based on totally different error minimization algorithms. The key idea of this paper is that if we learn about this method-data fitness, we can create an integrative prediction algorithm that outperforms the individual prediction algorithms.

A well-known ensemble approach is the neural network ensemble, whose strength comes from perturbing the training data sets: by applying a number of training data sets with various hidden layers and nodes, the ensemble approach improves forecasting accuracy (Freund & Schapire, 1996; Hansen & Salamon, 1990; Heskes, 1997; Igelnik, Pao, LeClair, & Shen, 1999). Our model differs from ensemble approaches in that we learn from totally different algorithms, not from a single method applied to many data sets. As illustrated in Fig. 1, the modeling procedures of the Integrative Model with Subject Weight are as follows.

[Fig. 1 is a flow diagram: the training data set feeds the four source methods (MDA, LOGIT, ANN, RULE), each producing a prediction Ŷ_i and a probability Pr(Y=1)_i, from which the indicators I_i and training weights W_TR(i) are built; neural network learning then maps each subject of the testing data set to weights W_TE(MDA), W_TE(LOGIT), W_TE(ANN), W_TE(RULE), which are integrated with the test predictions Ŷ_i into the final prediction Ŷ.]

Fig. 1. Modeling procedures of the integrative model with subject weight.


3.3.1. Step 1

By applying the source methods to the training data set, both the predicted value and the probability of being bankrupt are obtained. All the source methods except the MDA directly produce the estimated probability of being bankrupt, Pr(Y = 1)_i, where i = MDA, LOGIT, ANN, and RULE. The MDA generates a discriminant score that is not confined to values between zero and one: if the discriminant score is less than zero, the predicted value is "bankrupt", and otherwise it is "healthy"; the smaller the discriminant score, the more likely the firm is to be bankrupt. To be comparable with the other source methods, the discriminant score is transformed into an estimated probability of being bankrupt using the following equation:

$$\Pr(Y=1)_{MDA} = \frac{S_{\max} - S}{S_{\max} - S_{\min}}$$

where S is the discriminant score and $S_{\min}$ and $S_{\max}$ are its minimum and maximum.

3.3.2. Step 2

For the training data set, an indicator variable is created whose value equals 1 if the prediction is correct:

$$I_i = \begin{cases} 1, & \text{if } Y = \hat{Y}_i \\ 0, & \text{otherwise} \end{cases}$$

where i = MDA, LOGIT, ANN, and RULE. The indicator variable can be interpreted as a reward/penalty function for the corresponding subject: for each and every subject, a reward of 1 is assigned if the source method predicts correctly, and a penalty of 0 otherwise.

3.3.3. Step 3

A cutoff value of 0.5 is used for the dichotomous classification: if the estimated probability of bankruptcy is greater than 0.50, the prediction is "bankrupt"; otherwise it is "healthy". An extreme probability, close to either 0 or 1, represents high confidence in the prediction. For example, probabilities of 0.95 and 0.55 both lead to a "bankrupt" classification, but 0.95 predicts "bankrupt" with higher confidence than 0.55 does. Similarly, although probabilities of 0.05 and 0.45 both lead to a "healthy" classification, 0.05 predicts with higher confidence than 0.45 does. As is well known, a dichotomous decision based on numerical results is highly likely to be incorrect near the cutoff boundary. To cope with this problem, we create a numerical confidence for the prediction: if the estimated probability is greater than 0.5, the confidence equals the estimated probability itself; otherwise, the confidence is one minus the estimated probability. For example, estimated bankruptcy probabilities of 0.95 and 0.05 both have a confidence of 0.95. For each subject in the training data set, the relative weight of a source method is then the sum of the indicator variable and the confidence associated with the estimated probability:

$$W_{TR(i)} = \begin{cases} I_i + \Pr(Y=1)_i, & \text{if } \Pr(Y=1)_i > 0.50 \\ I_i + \{1 - \Pr(Y=1)_i\}, & \text{otherwise} \end{cases}$$

where i = MDA, LOGIT, ANN, and RULE. The role of the indicator variable $I_i$ is to further amplify the confidence based on the method-data fitness.
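Steps 1–3 reduce to a few lines of array arithmetic. The sketch below is our own rendering of them (function names are ours; NumPy is assumed):

```python
import numpy as np

def mda_score_to_prob(scores):
    """Step 1: rescale MDA discriminant scores to [0, 1]. A smaller score
    means more likely bankrupt, hence (S_max - S) in the numerator."""
    s_min, s_max = scores.min(), scores.max()
    return (s_max - scores) / (s_max - s_min)

def training_weight(y_true, prob_bankrupt, cutoff=0.5):
    """Steps 2-3: W_TR(i) = indicator of a correct prediction plus the
    confidence attached to the estimated probability; values lie in [0.5, 2]."""
    y_hat = (prob_bankrupt > cutoff).astype(int)        # dichotomous decision
    indicator = (y_hat == y_true).astype(int)           # reward 1 / penalty 0
    confidence = np.where(prob_bankrupt > cutoff,
                          prob_bankrupt, 1.0 - prob_bankrupt)
    return indicator + confidence
```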


An intuition for how the weights work can be gained by comparing the following four cases. First, consider the case where I_i = 1 and Pr(Y = 1)_i = 0.95: the prediction "bankrupt" is correct with a high level of confidence, so the weight of the source method for the given subject must be high, 1.95 in this case. Second, consider the case where I_i = 1 and Pr(Y = 1)_i = 0.05: the prediction "healthy" is correct with a high level of confidence, 0.95, so a high weight of 1.95 is again assigned. Third, if I_i = 0 and Pr(Y = 1)_i = 0.95, then we incorrectly predict "bankrupt"; the high confidence should at least be reflected in the weight despite the incorrect prediction, so the weight is 0.95. Fourth, if I_i = 0 and Pr(Y = 1)_i = 0.55, the prediction "bankrupt" is incorrect with a low level of confidence; the weight of 0.55 reflects a poor prediction made with low confidence.

3.3.4. Step 4

A neural network is now trained to predict the relative weights of the source methods from the independent variables (see Fig. 2). Once trained on the training data set, the model yields the relative weight of each source method, W_TE(i), for every single subject in the testing data set. What must be stressed about this learning step is that since each subject has different values of the independent variables, the relative weights of the source methods differ subject by subject.

3.3.5. Step 5

The final step is to predict "bankrupt" or "healthy" by integrating the relative weights of the source methods for the testing data set:

$$\hat{Y} = \begin{cases} \text{bankrupt}, & \text{if } \sum_{i:\,\hat{Y}_i = 1} W_{TE(i)} > \sum_{i:\,\hat{Y}_i = 0} W_{TE(i)} \\ \text{healthy}, & \text{otherwise} \end{cases}$$

Suppose we obtained the following predicted values for a particular subject in the testing data set using the four source methods: $\hat{Y}_{MDA} = 1$, $\hat{Y}_{LOGIT} = 1$, $\hat{Y}_{ANN} = 0$, and $\hat{Y}_{RULE} = 0$.

[Fig. 2 is a network diagram: an input layer with nodes X1, X2, ..., X11, two hidden layers with nodes H1,1 ... H1,k and H2,1 ... H2,l, and an output layer with the four weights W_MDA, W_LOGIT, W_ANN, W_RULE.]

Fig. 2. Neural networks for training the relative weight.


If the neural network learning system yields the relative weights W_TE(MDA) = 1.40, W_TE(LOGIT) = 0.85, W_TE(ANN) = 0.67, and W_TE(RULE) = 1.50, then the proposed model will predict "bankrupt" for the corresponding subject, since $\sum_{i:\,\hat{Y}_i=1} W_{TE(i)} = 2.25$ exceeds $\sum_{i:\,\hat{Y}_i=0} W_{TE(i)} = 2.17$.
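A compact sketch of Steps 4 and 5 follows, under the assumption that scikit-learn's multi-output MLPRegressor stands in for the weight-learning network (the paper does not name an implementation, and the architecture below is illustrative):

```python
from sklearn.neural_network import MLPRegressor

def learn_subject_weights(X_train, w_train, X_test):
    """Step 4: w_train is the n-by-4 matrix of W_TR(i) values (one column per
    source method); the network predicts W_TE(i) for every test subject."""
    net = MLPRegressor(hidden_layer_sizes=(10, 10), max_iter=5000, random_state=0)
    net.fit(X_train, w_train)
    return net.predict(X_test)  # shape (n_test, 4)

def integrate(w_te_row, preds_row):
    """Step 5: weighted vote of the four source predictions for one subject."""
    bankrupt = sum(w for w, p in zip(w_te_row, preds_row) if p == 1)
    healthy = sum(w for w, p in zip(w_te_row, preds_row) if p == 0)
    return 1 if bankrupt > healthy else 0

# The worked example above: weights (1.40, 0.85, 0.67, 1.50) with predictions
# (1, 1, 0, 0) give 2.25 vs. 2.17, hence "bankrupt".
print(integrate([1.40, 0.85, 0.67, 1.50], [1, 1, 0, 0]))  # 1
```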

4. Experiment and evaluation

4.1. Data description and input variable selection

For the experiment, we used yearly financial data collected by the Korea Credit Guarantee Fund. The corporations used in the analysis belong to the manufacturing industry and have asset sizes from $1 million to $7 million. The data consist of 900 bankrupt corporations and 900 healthy (non-bankrupt) corporations from the fiscal years 1999-2002. The variables were standardized after the elimination of outliers. Out of 83 variables in total, 54 were selected by a t-test as a preliminary screening, and then 11 variables were finally selected by a stepwise logistic regression. The variables finally selected are interest expenses to sales, profit to sales, operating profit to sales, ordinary profit to total capital, current liabilities to total capital, growth rate of tangible assets, turnover of managerial assets, net financing cost, net working capital to total capital, growth rate of current assets, and ordinary income to net worth (see Table 2).

Table 2. List of financial variables selected

| Variable | Definition |
|---|---|
| Interest expenses to sales | (interest expenses / sales) × 100 |
| Profit to sales | (profit / sales) × 100 |
| Operating profit to sales | (operating profit / sales) × 100 |
| Ordinary profit to total capital | (ordinary profit / total capital) × 100 |
| Current liabilities to total capital | (current liabilities / total capital) × 100 |
| Growth rate of tangible assets | (tangible assets at end of year / tangible assets at beginning of year × 100) − 100 |
| Turnover of managerial assets | sales / {total assets − (construction in progress + investment assets)} |
| Net financing cost | interest expenses − interest incomes |
| Net working capital to total capital | {(current assets − current liabilities) / total capital} × 100 |
| Growth rate of current assets | (current assets at end of year / current assets at beginning of year × 100) − 100 |
| Ordinary income to net worth | (ordinary income / net worth) × 100 |
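The two-stage screening just described can be sketched as follows. This is our reconstruction: the paper does not state the significance level or the exact stepwise criterion, so the alpha value and the forward-selection score below are assumptions:

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def ttest_filter(X, y, alpha=0.05):
    """Stage 1: keep variables whose means differ significantly between the
    bankrupt (y == 1) and healthy (y == 0) groups."""
    return [j for j in range(X.shape[1])
            if ttest_ind(X[y == 1, j], X[y == 0, j]).pvalue < alpha]

def forward_stepwise(X, y, n_vars=11):
    """Stage 2: greedy forward selection with a logistic regression, scored by
    cross-validated accuracy (a stand-in for the stepwise procedure)."""
    selected = []
    while len(selected) < n_vars:
        remaining = [j for j in range(X.shape[1]) if j not in selected]
        scores = {j: cross_val_score(LogisticRegression(max_iter=1000),
                                     X[:, selected + [j]], y, cv=5).mean()
                  for j in remaining}
        selected.append(max(scores, key=scores.get))
    return selected
```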

4.2. Experiment results

A five-fold cross validation is employed to enhance the generalizability of the test results (Zhang, Hu, Patuwo, & Indro, 1999). The experiment results are summarized in Table 3, where SVM stands for the Simple Voting Model and IMSW stands for the Integrative Model with Subject Weight. Prediction accuracy is measured in three ways: the correct classification ratio (RC), the false positive ratio (FP), which refers to incorrectly predicting "bankrupt", and the false negative ratio (FN), which refers to incorrectly predicting "healthy".

There are a few findings from the experiments. First, with respect to overall prediction accuracy, the IMSW gives the best performance: its mean correct classification ratio is 78.92%, greater than that of any other model. Regarding misclassification, the LOGIT gives the lowest false positive ratio, 11.79%, whereas the ANN gives the lowest false negative ratio, 7.61%.

Table 3. Comparison of the prediction accuracy for the testing data set

| Model | Measure | Set 1 | Set 2 | Set 3 | Set 4 | Set 5 | Mean | Standard deviation |
|---|---|---|---|---|---|---|---|---|
| MDA | RC | 79.36 | 78.83 | 78.04 | 76.85 | 77.65 | 78.15 | 0.98 |
| | FP | 12.69 | 13.35 | 12.69 | 14.28 | 12.56 | 13.11 | 0.72 |
| | FN | 7.93 | 7.80 | 9.25 | 8.86 | 9.78 | 8.72 | 0.85 |
| LOGIT | RC | 80.15 | 78.30 | 78.04 | 76.19 | 77.51 | 78.04 | 1.43 |
| | FP | 11.24 | 12.16 | 11.50 | 12.69 | 11.37 | 11.79 | 0.61 |
| | FN | 8.59 | 9.52 | 10.44 | 11.11 | 11.11 | 10.15 | 1.09 |
| ANN | RC | 79.49 | 78.57 | 77.64 | 75.79 | 78.57 | 78.01 | 1.40 |
| | FP | 14.41 | 14.15 | 15.74 | 14.55 | 12.96 | 14.36 | 0.99 |
| | FN | 6.08 | 7.27 | 6.61 | 9.65 | 8.46 | 7.61 | 1.44 |
| RULE | RC | 74.86 | 70.76 | 72.08 | 71.96 | 72.22 | 72.38 | 1.51 |
| | FP | 16.00 | 21.82 | 15.87 | 16.13 | 12.83 | 16.53 | 3.26 |
| | FN | 9.12 | 7.40 | 12.03 | 11.90 | 14.94 | 11.08 | 3.92 |
| SVM | RC | 80.15 | 78.96 | 77.64 | 75.66 | 77.65 | 78.01 | 1.68 |
| | FP | 13.49 | 14.02 | 12.96 | 14.41 | 12.69 | 13.51 | 0.72 |
| | FN | 6.34 | 7.01 | 9.39 | 9.92 | 9.65 | 8.46 | 1.66 |
| IMSW | RC | 80.69 | 79.23 | 78.17 | 77.65 | 78.84 | 78.92 | 1.16 |
| | FP | 12.96 | 13.88 | 12.43 | 13.22 | 11.77 | 12.85 | 0.80 |
| | FN | 6.34 | 6.87 | 9.39 | 9.12 | 9.39 | 8.22 | 1.49 |
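The three measures reported in Table 3 are all percentages of the test subjects; a short sketch of how they are computed (our formulation):

```python
import numpy as np

def accuracy_measures(y_true, y_pred):
    """Return (RC, FP, FN) as percentages of all test subjects; the three
    values sum to 100, as in each column of Table 3."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = len(y_true)
    rc = 100.0 * np.mean(y_true == y_pred)                  # correct classifications
    fp = 100.0 * np.sum((y_pred == 1) & (y_true == 0)) / n  # incorrectly "bankrupt"
    fn = 100.0 * np.sum((y_pred == 0) & (y_true == 1)) / n  # incorrectly "healthy"
    return rc, fp, fn
```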


[Fig. 3 is a line chart: the horizontal axis is the data set (1-5), the vertical axis is the difference in RC (roughly -6 to 6), with one line for each source method (MDA, LOGIT, ANN, RULE).]

Fig. 3. Difference in the correct classification ratios between the training data set and the testing data set: for the comparison of the source methods.

In contrast to the findings of Olmeda and Fernandez (1997), the experiment results of this study reveal that the simple voting algorithm does not consistently outperform individual prediction methods such as the MDA and the ANN.

Second, variability does not seem to be problematic here, since all models produce fairly stable results. The RULE has a relatively high standard deviation in terms of FP and FN. This tendency can also be seen in the fluctuation of prediction accuracy between the training data set and the testing data set. Fig. 3 depicts the difference in the correct classification ratios (i.e., the RC of the training data set minus the RC of the testing data set). Unlike the other three source methods, for which the difference looks somewhat random, the RULE shows a consistent decrease in prediction accuracy when the trained model is applied to the testing data set. This may be due to a weakness of the decision tree algorithm: because it treats continuous variables in a categorical way, decision trees can have a fairly large forecasting error around the partition boundaries.

Third, it appears that all six models evaluated, except the RULE, perform only slightly better or worse than one another in prediction accuracy. Conversely, this may be evidence of how hard it is to develop a new approach that predicts better than the currently-in-use approaches. The proposed model nevertheless seems fruitful, although it improves the overall forecast only marginally, because it suggests a way of integrating fairly different prediction algorithms based on subject-adjusted learning.

5. Conclusion

To better handle a dichotomous decision-making problem, this study proposed an integrative prediction approach that combines statistical methods with artificial intelligence techniques.


In particular, multiple discriminant analysis, logistic regression, neural networks, and rule induction are used. By combining these four source methods on the basis of a subject-adjusted learning algorithm using neural networks, prediction can be improved, although only to a marginal degree, in our analysis.

Clearly, no definite assertion can be made regarding the supremacy of our model over the others from the experimental results of this study alone; more research is definitely needed to confirm our findings. However, given the exploratory nature of this study, it can be said that this study contributes an integration strategy that efficiently extracts and learns the method-data goodness-of-fit, which usually varies subject by subject, and then determines the relative weights of the methods differently for every single subject in the testing data set.

Future studies might enhance the prediction accuracy along two avenues. First, it may be worthwhile to incorporate newly emerging techniques such as support vector machines; from a pool of various source methods, the most efficient ones can be selected based on each one's performance in a preliminary test. Second, a more sophisticated learning algorithm should be developed. As discussed before, it is not easy to develop a novel approach that consistently outperforms the currently-in-use approaches; accordingly, it may be more practical to integrate existing and newly emerging approaches and to adjust the relative weight of each approach case by case. The extent to which prediction becomes more accurate will depend on how sensitive the integrative learning algorithm can be to the characteristics of the data at hand.

Acknowledgement

This research was supported by the Sogang University Research Grants in 2004.

References

Abdelwahed, T., & Amir, E. M. (2005). New evolutionary bankruptcy forecasting model based on genetic algorithms and neural networks. In The 17th IEEE international conference on tools with artificial intelligence (pp. 241-245).
Altman, E. I. (1968). Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. Journal of Finance, 23(4), 589-609.
Anandarajan, M., Lee, P., & Anandarajan, A. (2001). Bankruptcy prediction of financially stressed firms: An examination of the predictive accuracy of artificial neural networks. International Journal of Intelligent Systems in Accounting, Finance and Management, 10, 69-81.
Atiya, A. F. (2001). Bankruptcy prediction for credit risk using neural networks: A survey and new results. IEEE Transactions on Neural Networks, 12(4), 929-935.
Baek, J., & Cho, S. (2003). Bankruptcy prediction for credit risk using an auto-associative neural network in Korean firms. In IEEE international conference on computational intelligence for financial engineering (pp. 25-29).
Bank of Korea. (2006). The Bank of Korea: Statistics database.


Beaver, W. H. (1966). Financial ratios as predictors of failure. Empirical research in accounting: Selected studies. Journal of Accounting Research, 4(1), 71-111.
Boritz, J. E., & Kennedy, D. B. (1995). Predicting corporate failure using a neural network approach. Intelligent Systems in Accounting, Finance, and Management, 4, 95-111.
Bryant, S. M. (1997). A case-based reasoning approach to bankruptcy prediction modeling. Intelligent Systems in Accounting, Finance and Management, 6, 195-214.
Charalambous, C., Charitou, A., & Kaourou, F. (2000). Comparative analysis of artificial neural network models: Application in bankruptcy prediction. Annals of Operations Research, 99, 403-425.
Chen, M. C., & Huang, S. H. (2003). Credit scoring and rejected instances reassigning through evolutionary computation techniques. Expert Systems with Applications, 24, 433-441.
Coates, P. K., & Fant, L. F. (1992). A neural network approach to forecasting financial distress. Journal of Business Forecasting, 3(4), 8-12.
Curram, S. P., & Mingers, J. (1994). Neural networks, decision tree induction and discriminant analysis: An empirical comparison. Journal of the Operational Research Society, 45(4), 440-450.
Desai, V. S., Crook, J. N., & Overstreet, G. A. (1996). A comparison of neural networks and linear scoring models in the credit union environment. European Journal of Operational Research, 95, 24-37.
Fletcher, D., & Goss, E. (1993). Forecasting with neural networks. Information and Management, 24, 159-167.
Freund, Y., & Schapire, R. E. (1996). Experiments with a new boosting algorithm. In Proceedings of the 13th international conference on machine learning (pp. 148-156). Morgan Kaufmann.
Hansen, L. K., & Salamon, P. (1990). Neural network ensembles. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(10), 993-1001.
Heskes, T. (1997). Balancing between bagging and bumping. In Advances in neural information processing systems (Vol. 9, pp. 466-472). Cambridge: MIT Press.
Igelnik, B., Pao, Y., LeClair, S. R., & Shen, C. Y. (1999). The ensemble approach to neural-network learning and generalization. IEEE Transactions on Neural Networks, 10(1), 19-30.
Klersey, G. F., & Dugan, M. T. (1995). Substantial doubt: Using artificial neural networks to evaluate going concern. Advances in Accounting Information Systems, 9, 267-273. Greenwich, CT: JAI Press.
Lee, K., Booth, D., & Alam, P. (2005). A comparison of supervised and unsupervised neural networks in predicting bankruptcy of Korean firms. Expert Systems with Applications, 29, 1-16.
Nasir, M. L., John, R. I., & Bennett, S. C. (2000). Predicting corporate bankruptcy using modular neural networks. In IEEE international conference on computational intelligence for financial engineering (pp. 86-91).
Ohlson, J. (1980). Financial ratios and the probabilistic prediction of bankruptcy. Journal of Accounting Research, 18(1), 109-131.
Olmeda, I., & Fernandez, E. (1997). Hybrid classifiers for financial multicriteria decision making: The case of bankruptcy prediction. Computational Economics, 10, 317-335.
Pai, G. A. R., Annapoorani, R., & Pai, G. A. V. (2004). Performance analysis of a statistical and an evolutionary neural network based classifier for the prediction of industrial bankruptcy. In Conference on cybernetics and intelligent systems, Singapore (pp. 1033-1038).
Pendharkar, P. C. (2005). A threshold-varying artificial neural network approach for classification and its application to bankruptcy prediction problem. Computers and Operations Research, 32, 2561-2582.
Pompe, P. P. M., & Feelders, A. J. (1997). Using machine learning, neural networks, and statistics to predict corporate bankruptcy. Microcomputers in Civil Engineering, 12, 267-276.
Treacy, W., & Carey, M. (2000). Credit risk rating at large US banks. Journal of Banking and Finance, 24, 167-201.
Zhang, G., Hu, M. Y., Patuwo, B. E., & Indro, D. C. (1999). Artificial neural networks in bankruptcy prediction: General framework and cross-validation analysis. European Journal of Operational Research, 116, 16-32.

Igelnik, B., Pao, Y., LeClair, S. R., & Shen, C. Y. (1999). The ensemble approach to neural-network learning and generalization. IEEE Transactions in Neural Networks, 10(1), 19–30. Klersey, G. F., & Dugan, M. T. (1995). Substantial doubt: Using artificial neural networks to evaluate going concern. Advances in Accounting Information Systems, Vol. 9 (pp. 267–273). Greenwich, CT: JAI Press. Lee, K., Booth, D., & Alam, P. (2005). A comparison of supervised and unsupervised neural networks in predicting bankruptcy of Korean firms. Expert Systems with Applications, 29, 1–16. Nasir, M. L., John, R. I., & Bennett, S. C. (2000). Predicting corporate bankruptcy using modular neural networks. In IEEE International Conference on Computational Intelligence for Financial Engineering (pp. 86–91). Ohlson, J. (1980). Financial ratios and the probabilistic prediction of bankruptcy. Journal of Accounting Research, 18(1), 109–131. Olmeda, I., & Fernandez, E. (1997). Hybrid classifiers for financial multicriteria decision making: The case of bankruptcy prediction. Computational Economics, 10, 317–335. Pai, G. A. R., Annapoorani, R., & Pai, G. A. V. (2004). Performance analysis of a statistical and an evolutionary neural network based classifier for the prediction of industrial bankruptcy. In Conference on Cybernetics and Intelligent Systems, Singapore (pp. 1033–1038). Pendharkar, P. C. (2005). A threshold-varying artificial neural network approach for classification and its application to bankruptcy prediction problem. Computers and Operations Research, 32, 2561–2582. Pompe, P. P. M., & Feelders, A. J. (1997). Using machine learning, neural networks, and statistics to predict corporate bankruptcy. Microcomputers in Civil Engineering, 12, 267–276. Treacy, W., & Carey, M. (2000). Credit risk rating at large US banks. Journal of Banking and Finance, 24, 167–201. Zhang, G., Hu, M. Y., Patuwo, B. E., & Indro, D. C. (1999). Artificial neural networks in bankruptcy prediction: General framework and cross-validation analysis. European Journal of Operational Research, 116, 16–32.