Computers & Operations Research 38 (2011) 409–419
Hybridizing principles of TOPSIS with case-based reasoning for business failure prediction
Hui Li a,b, Hojjat Adeli b,1, Jie Sun a, Jian-Guang Han c
a School of Economics and Management, Zhejiang Normal University, P.O. Box 62, YingBinDaDao 688, Jinhua, Zhejiang 321004, PR China
b College of Engineering, Ohio State University, 470 Hitchcock Hall, 2070 Neil Avenue, Columbus, OH 43210, USA
c School of Economics and Management, Harbin Institute of Technology, 13 FaYuanJie Street, Harbin, Heilongjiang 150001, PR China
Article info
Available online 25 June 2010

Abstract
Case-based reasoning (CBR) solves many real-world problems under the assumption that similar observations have similar outputs. As an implementation of this assumption, and inspired by the technique for order preference by similarity to ideal solution (TOPSIS), this paper proposes a new type of multiple criteria CBR method for binary business failure prediction (BFP) with similarities to positive and negative ideal cases (SPNIC). Assuming that the binary prediction of business failure generates two results, i.e., failure and non-failure, we set the principle of this forecasting method, termed SPNIC-based CBR, as follows: a new observation should have the same output as the positive or negative ideal case to which it is more similar. From the perspective of CBR, the SPNIC-based CBR forecasting method consists of R4 processes: retrieving positive and negative ideal cases, reusing solutions of ideal cases to forecast, retaining cases, and reconstructing the case base. As a demonstration, we applied this method to forecast business failure in China with three data representations of a previously collected dataset from a normal economic environment and one representation of a recently collected dataset from a financial crisis environment. The results indicate that this new CBR forecasting method produces significantly better short-term discriminative capability than the comparative methods, except for the support vector machine, in the normal economic environment; by contrast, it does not produce acceptable performance in the financial crisis environment. Further topics about this method are discussed. © 2010 Elsevier Ltd. All rights reserved.
Keywords: TOPSIS; Multiple criteria case-based reasoning; Business failure prediction; Similarities to positive and negative ideal cases
Corresponding author. Tel.: +86 158 8899 3616.
1 Abba G. Lichtenstein Professor.
0305-0548/$ - see front matter © 2010 Elsevier Ltd. All rights reserved. doi:10.1016/j.cor.2010.06.008

1. Introduction

One of the most important activities in management is performance measurement. Employers, government officials, investors, creditors, and bankers need effective tools to help them distinguish companies with good performance from those that perform badly. Investing or working in companies with development potential reduces risk. If a company goes bankrupt or falls into financial distress, the company is in business failure. Research on business failure prediction (BFP) can provide tools for identifying companies with good performance [1–9]. Nowadays, companies, their suppliers, and their customer companies form supply chains in order to survive and become more profitable in global competition. Companies must therefore make their supply chains more competitive by cooperating with trustworthy companies with good performance. For companies, it is sensible to use BFP as a pre-diagnosis tool for supply chain trust diagnosis, because companies with high performance should be trusted and engaged in order to build a competitive supply chain with minimal risk. If a company is predicted to fail, it does not deserve trust from other companies and should not be chosen as part of a supply chain unless it adopts measures to improve its performance. Other criteria can be further used when making the final decision in supplier evaluation, e.g., quality, delivery, price/cost, manufacturing capability, service, management, technology, research and development, flexibility, reputation, relationship, risk, safety, and environment [10]. BFP can be used as a tool to check the credit risk of companies. The results and principles of such a tool must be easily interpretable. For example, an investor can employ BFP to find out whether or not a company deserves his or her investment. BFP can also be used by companies as a tool for supply chain trust diagnosis. Case-based reasoning (CBR) is an artificial intelligence approach which simulates the human problem-solving mechanism of solving new problems by recalling
similar experiences. It is based on the assumption that similar observations have similar outputs. Owing to its ease of computation and ease of understanding, the technique for order preference by similarity to ideal solution (TOPSIS), a leading decision-aiding technique, has been widely used to solve decision problems by modeling the preferences of human beings. Derived from the concept of a displaced ideal point, from which the compromise solution should have the shortest distance, TOPSIS is a useful technique developed in the area of multi-criteria decision making. Under the assumption that the current problem may have several alternative solutions, this method ranks the alternatives by comparing their distances to the positive and negative ideal solutions, and then generates one preferred solution from among them. The preferred alternative is the one that is more similar to the positive ideal solution, and more dissimilar to the negative ideal solution, than the other alternatives. The internal mechanism is easily interpretable and explainable. Although TOPSIS is not designed to solve forecasting problems, comparing distances to the positive and negative ideal solutions can be regarded as a specific implementation of the idea that similar observations have similar outputs. If an alternative is similar to the positive ideal solution, how well it can solve the current problem is correlated with how similar it is to that solution. Inspired by the idea of TOPSIS, we investigate and apply a new type of CBR method to solve the problem of BFP. By revising the principles of TOPSIS under the assumption that similar observations have similar outputs, this research proposes a new CBR forecasting method for BFP based on similarities to positive and negative ideal cases (SPNIC).
This method belongs to the family of multiple criteria CBR (MCCBR), which refers to hybrid methods that combine multiple criteria decision-aiding techniques with CBR. The forecasting method retains ease of interpretation and explanation, and serves as an effective alternative predictive tool for BFP. This paper is organized as follows. Section 2 introduces the basic concepts of CBR and TOPSIS, with a discussion of the significance of this research. The principles and procedures of the new CBR method, i.e., the SPNIC-based BFP method, are proposed in Section 3. In Section 4, the new forecasting method is applied to predict the business failure of Chinese companies with three data representations of a dataset from a normal economic environment and one dataset from a financial crisis environment. Finally, conclusions and research limitations are discussed in Section 5.
2. Concepts of case-based reasoning and TOPSIS

2.1. Case-based reasoning (CBR)

Case-based reasoning is a process of solving new problems by integrating the solutions of similar past experiences. The old problems are called cases. As the name indicates, the output of CBR is based on past cases. For example, a judge can create case law by collecting specific cases. In the future, other judges can refer to that case law if a similar case arises. The earliest contribution to CBR can be traced to Schank [11]. CBR has been viewed as a plausible high-level model of cognitive processing [12]. CBR is implemented as an R4 model [13]. The four R's in the R4 model represent retrieval, reuse, revise, and retain. These four words mean retrieving similar previously experienced cases from the case base, reusing the solutions of similar cases to solve the current problem, revising or adapting the generated solutions to make them suitable for new situations, and retaining the new solution once it has been validated.
CBR can be used in forecasting problems such as BFP [14], credit scoring, and others. However, new types of CBR need to be constructed to make both BFP and CBR more applicable. In these types of problems, the case reuse and revision steps are difficult to distinguish. Case revision is not needed in basic CBR-based BFP, since the labels of experienced cases are integrated to make predictions. These predictions should not be revised unless other evidence, such as an auditing judgment from a Certified Public Accountant, indicates that they are incorrect. Thus, three of the R's, i.e., retrieval, reuse, and retain, are useful in CBR-based BFP if all cases are represented correctly. Case retrieval is the most important step in CBR-based forecasting. Other stages either serve case retrieval or are products of this stage. The most commonly investigated retrieval technique is k nearest neighbors (kNN), a technique that involves developing a similarity metric to measure similarity among cases. In this algorithm, the cases retrieved are those that are more similar to the current case than the other cases. Since feature selection is a specific case of feature weighting, all features are commonly weighted equally after feature selection. Consider two cases, a and b. The similarity between them can be calculated as follows [38]:

$$\mathrm{Sim}(c_a, c_b) = \frac{1}{1 + \beta\,\mathrm{EucDis}(c_a, c_b)} = \frac{1}{1 + \beta \sqrt{\sum_{i=1}^{n} w_i (x_{a,i} - x_{b,i})^2}} \tag{1}$$

where $c_a$ and $c_b$ denote the two cases, $\mathrm{EucDis}(\cdot)$ denotes the Euclidean distance between them, $x_{a,i}$ and $x_{b,i}$ denote the values of the ith feature of cases a and b, $w_i$ is the weight of the ith feature, satisfying $\sum_i w_i = 1$, and $\beta$ is a tuning parameter, which is commonly set to 1 to simplify the problem.

CBR has been employed in a number of different applications, such as intelligent decision support for pathology ordering [15], classification [16], personnel rostering [17,18], forecasting of book returns for wholesalers [19], course timetabling problems [20], steel bridge engineering [21], and context-aware comparative shopping [22].
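The similarity of Eq. (1) and the kNN retrieval it supports can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `case_similarity` and `retrieve_k_nearest`, the representation of a case as a `(features, label)` pair, and the default of equal weights are all assumptions made for the example.

```python
import math

def case_similarity(case_a, case_b, weights=None, beta=1.0):
    """Similarity of Eq. (1): 1 / (1 + beta * weighted Euclidean distance)."""
    if len(case_a) != len(case_b):
        raise ValueError("both cases must be described by the same features")
    n = len(case_a)
    if weights is None:
        # Assumption: equal weights that sum to 1, as after feature selection.
        weights = [1.0 / n] * n
    squared = sum(w * (a - b) ** 2 for w, a, b in zip(weights, case_a, case_b))
    return 1.0 / (1.0 + beta * math.sqrt(squared))

def retrieve_k_nearest(target, case_base, k, weights=None, beta=1.0):
    """Return the k (features, label) pairs most similar to the target case."""
    ranked = sorted(
        case_base,
        key=lambda case: case_similarity(target, case[0], weights, beta),
        reverse=True,
    )
    return ranked[:k]
```

Identical cases receive a similarity of 1, and the similarity decays toward 0 as the weighted distance grows, so ranking by this value is equivalent to ranking by distance.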
2.2. TOPSIS

TOPSIS is reported to have the fewest rank reversals among eight popular multiple criteria decision-making methods [23]; together with its ease of computation, this means that the method is easily interpretable, explainable, and usable. Current research on TOPSIS chiefly focuses on making it more applicable to decision-aiding problems. For example, Shih [24] used TOPSIS as a reference tool when exploiting incremental analysis to overcome the common drawbacks of ratio scales, i.e., invariance under positive similarity transformations, utilized in various multi-criteria decision-making methods. TOPSIS, which was proposed by Hwang and Yoon [25], has become one of the chief multiple criteria decision-aiding techniques. It chooses one optimal solution for a real-world problem from several alternative solutions. All alternatives can be used to solve the current problem. This is different from CBR, where only the most similar experienced cases can be used to solve the current problem. The concern is to find a preferred solution that is superior to all other alternatives. In attempting to find the optimal solution, TOPSIS operates under the assumption that the most preferred alternative should have the shortest distance from the positive ideal solution and the largest distance from the negative ideal solution. The so-called positive ideal solution refers to a solution that could perfectly solve the current problem, while the so-called negative ideal solution refers to the worst possible solution to the
current problem. These two types of solutions normally do not exist in the real world. We can assume that each feature is monotonically increasing in preference, i.e., larger values are better. Suppose the current problem has n alternatives, and each alternative has m features. Let $x_{j,i}$ denote the value of the ith feature of the jth alternative. There are five steps in TOPSIS according to Hwang and Yoon [25]:

(1) Normalize the decision matrix using the following formula:

$$v_{j,i} = w_i \frac{x_{j,i}}{\sqrt{\sum_{k=1}^{n} x_{k,i}^2}} \tag{2}$$

where $w_i$ is the weight of the ith feature, $x_{j,i}$ and $x_{k,i}$ denote the values of the jth and kth alternatives on the ith feature, and $v_{j,i}$ denotes the standardized value of the jth alternative on the ith feature.

(2) Calculate the positive and negative ideal solutions using the following formula:

$$\begin{aligned} A^+ &= \{v_{+,1}, v_{+,2}, \ldots, v_{+,m}\} = \{(\max_j v_{j,i} \mid i)\} \\ A^- &= \{v_{-,1}, v_{-,2}, \ldots, v_{-,m}\} = \{(\min_j v_{j,i} \mid i)\} \end{aligned} \tag{3}$$

where $A^+$ and $A^-$, respectively, denote the positive ideal solution and the negative ideal solution, and m denotes the number of features.

(3) Compute the distances between each alternative and the two ideal solutions using the following formula:

$$S_j^+ = \sqrt{\sum_{i=1}^{m} (v_{j,i} - v_{+,i})^2}, \qquad S_j^- = \sqrt{\sum_{i=1}^{m} (v_{j,i} - v_{-,i})^2} \tag{4}$$

where $S_j^+$ and $S_j^-$, respectively, express the distance between the jth alternative and the positive ideal solution, and the distance between the jth alternative and the negative ideal solution.

(4) Calculate the relative closeness to the ideal solution using the following formula:

$$C_j = \frac{S_j^+}{S_j^+ + S_j^-} \tag{5}$$

where $C_j \in (0,1]$ represents the degree of closeness.

(5) Rank the alternatives and choose the most preferred one.

For more detailed information about TOPSIS, please refer to the works of Opricovic and Tzeng [26], Wang et al. [27], Chamodrakas et al. [28], Deng et al. [29], Ashtiani et al. [30], Ic and Yurdakul [31], and Ahi et al. [32].

2.3. Significance of this research

Besides its significance for BFP, this research is significant in the following respects:
In terms of CBR, we present a new type of CBR forecasting method. A drawback of classical CBR is that determining the number of nearest neighbors is a difficult task, which is commonly solved through trial and error. The new type of CBR based on SPNIC calculates how similar the current problem is to the positive ideal case and the negative ideal case. These two ideal cases are calculated from the case base. Thus, one does not need to determine the optimal number of nearest neighbors. The solutions of the ideal cases are used to solve the current problem.

In terms of TOPSIS, we extend the applicable area of the TOPSIS idea. Traditionally, TOPSIS has been proposed, applied, enhanced, and known only in the area of decision making. By integrating the basic idea of TOPSIS with CBR, we propose a new type of multiple criteria CBR forecasting method which can be applied to other areas where TOPSIS has seldom been applied successfully as a quantitative tool, e.g., BFP and credit scoring. This extension demonstrates that the idea of TOPSIS is also suitable for other areas.
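The five TOPSIS steps listed in Section 2.2 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not a reference implementation: the names `topsis_closeness` and `rank_alternatives` are hypothetical, every feature is treated as benefit-type (larger is better), and no feature column is assumed to be all zeros or identical across every alternative. The closeness follows Eq. (5) as printed, with $S_j^+$ in the numerator, so a smaller value indicates an alternative nearer the positive ideal solution; many presentations of TOPSIS instead place $S_j^-$ in the numerator, which reverses the ordering.

```python
import math

def topsis_closeness(matrix, weights):
    """matrix[j][i] is the value of feature i for alternative j."""
    n, m = len(matrix), len(matrix[0])
    # Step 1: weighted vector normalization, Eq. (2).
    norms = [math.sqrt(sum(matrix[k][i] ** 2 for k in range(n))) for i in range(m)]
    v = [[weights[i] * matrix[j][i] / norms[i] for i in range(m)] for j in range(n)]
    # Step 2: positive and negative ideal solutions, Eq. (3).
    a_pos = [max(v[j][i] for j in range(n)) for i in range(m)]
    a_neg = [min(v[j][i] for j in range(n)) for i in range(m)]
    # Step 3: Euclidean distances to both ideal solutions, Eq. (4).
    s_pos = [math.dist(v[j], a_pos) for j in range(n)]
    s_neg = [math.dist(v[j], a_neg) for j in range(n)]
    # Step 4: closeness exactly as written in Eq. (5).
    return [sp / (sp + sn) for sp, sn in zip(s_pos, s_neg)]

def rank_alternatives(closeness):
    """Step 5: indices ordered from nearest to farthest from the positive ideal.

    Assumption: with Eq. (5) as printed, a smaller value ranks first.
    """
    return sorted(range(len(closeness)), key=lambda j: closeness[j])
```

With three alternatives whose features all increase together, the dominating alternative coincides with the positive ideal solution and is ranked first, while the dominated one coincides with the negative ideal solution and is ranked last.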
3. The new CBR forecasting method based on SPNIC

CBR can be used as a case-based forecasting method for BFP. A knowledge-based forecasting method should be constructed on specific datasets. When CBR is used as a tool to forecast business failure, samples of healthy and failed companies are used to create the case base. When the business state of the current company is to be predicted, the most similar cases are retrieved from the case base. First, similarities between the current company case and historical company cases are computed. Then, the labels of the retrieved historical cases are integrated by majority voting, so that the solutions of historical cases are reused in making the prediction. Case revision is not used in making the prediction. The retain step is then employed to add the new case to the case base.

Traditionally, TOPSIS is not viewed as a quantitative method for BFP. As a preference-based approach, it can be used as a decision-aiding technique to handle experts' preferences regarding the business states of companies. A decision-maker should be involved to express his or her preference regarding the prediction for the current company. The preference of a single decision-maker is biased; thus, a group of decision-makers should be introduced. Sun and Li [6] employed group decision-making for BFP according to the preferences of experts. This approach is qualitative. In this research, we attempt to transfer the principles of TOPSIS, together with CBR, into a new quantitative method for BFP. This method can be used when datasets for BFP are available. Collecting datasets for BFP is sometimes more practical than finding a group of decision-makers with the necessary expertise.

3.1. Concept of the SPNIC-based CBR forecasting method

Assuming that the binary prediction of business failure generates two results, failure and non-failure, we set the principle of this CBR forecasting method as follows: a new observation (sample) should have the same output as the positive or negative ideal case to which it is more similar. More specifically, an observation should be predicted as failure if it is more similar to the positive reference point, and as non-failure if it is more similar to the negative reference point. This quantitative forecasting method consists of five processes: data normalization, distinguishing positive samples from negative ones, identifying positive and negative ideal points, calculating similarities to the reference points, and making predictions.

3.2. Usage of SPNIC-based CBR forecasting method

CBR-based research is mostly founded on the idea of retrieving the experienced cases most similar to the current one. There are several different implementations of this idea, such as kNN and Structured Query Language (SQL), among others. We can also regard the usage of CBR as a means to rank
experienced cases according to their similarities to the current problem. Notice that (1) the current problem should be represented with the same features used to represent experienced cases, (2) the representation of the current problem must be used to rank experienced cases, (3) the number of experienced cases can be very large, e.g., several hundred or several thousand, and (4) only the most similar cases, not all experienced cases, should be used to generate the solution of the current problem. When using CBR to solve real-world problems, the current problem and historical cases should be represented with the same features. This requirement is the fundamental assumption of CBR theory, and it allows new cases to be learned as time passes. Taking the CBR method built around kNN as an example, the similarity between a pair of cases is defined on the basis of feature differences, which cannot be obtained if the current problem and historical cases are represented with different features. If SQL is used in CBR, the current problem and historical cases should also be represented with the same features; otherwise, the system is not able to retrieve historical cases from the database. In general, CBR ranks experienced cases with reference to the representation of the current problem. Please refer to Fig. 1 for a clearer understanding of the usage of CBR. As in CBR research, some studies attempt to revise the TOPSIS algorithm in order to improve its decision performance or enable it to work more broadly. However, almost all studies assume that the preferred alternative should not only have the shortest distance from the positive ideal solution but also have the largest distance from the negative ideal solution. The usage of TOPSIS involves ranking alternative solutions of the current problem and finding the most preferred one.
Notice the following: (1) the current problem is represented by the criteria that should be considered in ranking alternatives, whereas alternatives are represented by the preference values of decision makers for each alternative under those criteria, (2) the representation of the current problem cannot and should not be used in the process of ranking alternatives, as the current problem in TOPSIS is represented descriptively rather than quantitatively, (3) the number of alternative solutions should be very small, e.g., no more than 10 alternatives, since decision makers have difficulty in handling a very large number of comparisons between pairs of alternatives; in fact, the candidate solutions for the current problem are limited, which also means that the number of alternatives cannot be large, and (4) all the alternatives can be used to solve the current problem, and what is needed is a preferred solution. Please refer to Fig. 2 for a clearer understanding of the usage of TOPSIS. By integrating the usage of TOPSIS into CBR, the usage of the new SPNIC-based CBR forecasting method is as follows: to implement the principle that similar observations have similar outputs, the new CBR forecasting method attempts to solve the current problem by using the solutions of the positive ideal case and the negative ideal case to make a prediction. This means that (1) the current problem should be represented by the same features that are used to represent experienced cases, (2) the representation of the current problem must be used to compute whether it is more similar to the positive ideal case or the negative ideal case, which amounts to a binary ranking, (3) the number of experienced cases can be very large, and (4) only the positive ideal case and the negative ideal case are used in prediction. Please refer to Fig. 3 for a clearer understanding of the usage of the SPNIC-based CBR forecasting method. TOPSIS is limited in the number of alternatives it can handle in preference-based decision-making. Because it serves as a tool to transfer the preferences of decision-makers into quantitative values, and it is hard for decision-makers to rank more than ten alternatives precisely, the number of alternatives must stay small. However, if preferences of decision-makers are not involved and TOPSIS is used to rank quantitative alternatives, computing toolboxes such as Excel or Matlab can be used to implement it.
We inject the principles of TOPSIS, namely, identifying positive and negative ideal points, into CBR to generate a new hybrid CBR. Preferences of decision-makers are not used in this hybrid approach. Thus, the proposed approach is able to handle hundreds of samples. When CBR is used to deal with quantitative problems, the criteria obtained from expertise sometimes differ from the criteria obtained from statistical techniques. The reason may be as follows. Criteria from expertise are derived from human judgment or from purely theoretical analysis, and their assumptions may not be met by some specific problems. Criteria from statistical techniques are derived from analyzing specific datasets, and they may be more effective for specific problems. Correlated features will introduce redundancy and double counting. For features generated from expertise, the problem of correlated features should be handled if feature weighting is not used; otherwise, one can set the weight of a correlated feature to zero. For features generated from statistical techniques, this problem should also be handled according to the assumptions of the predictive model in use. However, if two weakly correlated features are selected by statistical techniques, this indicates that the features are more important than some other features. Thus, these features should be weighted more heavily. Another means of handling correlated features is to extract principal components from them.

Fig. 1. The usage of CBR.
Fig. 2. The usage of TOPSIS.
Fig. 3. The usage of SPNIC-based CBR forecasting method.
Fig. 4. The R4 model of SPNIC-based CBR forecasting method.

3.3. R4 model of the SPNIC-based CBR forecasting method
The R4 model of classical CBR helps the method to be easily understood, interpreted, and explained by humans. When the basic ideas of TOPSIS are combined with CBR, the classical R4 model can be expressed as in Fig. 4.

In the binary predictive problem of BFP, cases are represented by financial ratios calculated from the public statements of companies, such as the balance sheet, income statement, and cash flow statement, among others. Solutions of cases are represented by {failure, non-failure} or {+1, −1}, according to whether or not a company goes bankrupt or falls into financial distress.

After case representation, the positive and negative ideal cases should be calculated and retrieved from the case base. We assume that cases have two types of solutions, i.e., failure and non-failure, and we attempt to identify companies that will possibly fail in the future by reusing the solutions of similar cases. Thus, failed cases are considered positive and non-failed cases are considered negative. Under this assumption, we can calculate and retrieve the positive and negative ideal cases. Notice that if there are more than two types of solutions, the SPNIC-based CBR forecasting method should be revised to handle this condition, which needs to be researched further in the future. We focus on binary forecasting in this research since this problem is fundamental. After the positive and negative ideal cases are retrieved, SPNIC-based CBR can make a prediction for the current problem by reusing the solutions of the two ideal cases. In order to reuse the solutions of the ideal cases, similarities between the current problem and the two ideal cases are calculated. Then a mechanism is employed to determine which ideal case the current problem is more similar to. Finally, the new case is retained in the case base. The two ideal cases are therefore very important in making a prediction, and their quality is critical. It is vital to employ some approach to reconstruct the case base in order to generate the ideal cases effectively.
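The retrieve-and-reuse flow described above can be sketched in Python, borrowing the min-max normalization and the $1/(1 + \beta D)$ similarity that Section 3.4 specifies. This is a simplified sketch under several assumptions, not the authors' algorithm: the function names are hypothetical, all feature weights are taken as 1, the caller supplies a `benefit` flag per feature, and the decision rule is a plain comparison of the two similarities (ties go to −1), whereas the paper's own rule in Eq. (14) adds further conditions that are not reproduced here.

```python
import math

def min_max_normalize(case_base, target):
    """Scale each feature to [0, 1] over the case base together with the target."""
    rows = case_base + [target]
    m = len(target)
    lows = [min(r[i] for r in rows) for i in range(m)]
    highs = [max(r[i] for r in rows) for i in range(m)]

    def scale(row):
        # A constant feature carries no information; map it to 0 (assumption).
        return [
            (row[i] - lows[i]) / (highs[i] - lows[i]) if highs[i] > lows[i] else 0.0
            for i in range(m)
        ]

    return [scale(r) for r in case_base], scale(target)

def ideal_cases(norm_base, benefit):
    """Retrieve step: build the positive and negative ideal cases from the case base."""
    m = len(benefit)
    cols = [[r[i] for r in norm_base] for i in range(m)]
    a_pos = [max(cols[i]) if benefit[i] else min(cols[i]) for i in range(m)]
    a_neg = [min(cols[i]) if benefit[i] else max(cols[i]) for i in range(m)]
    return a_pos, a_neg

def spnic_predict(case_base, target, benefit, beta=1.0):
    """Reuse step: return (label, s_pos, s_neg), with label +1 for L(A+), -1 for L(A-)."""
    norm_base, norm_target = min_max_normalize(case_base, target)
    a_pos, a_neg = ideal_cases(norm_base, benefit)
    s_pos = 1.0 / (1.0 + beta * math.dist(norm_target, a_pos))
    s_neg = 1.0 / (1.0 + beta * math.dist(norm_target, a_neg))
    # Simplifying assumption: compare similarities directly.
    label = 1 if s_pos > s_neg else -1
    return label, s_pos, s_neg
```

The retain step would then amount to appending the target case to `case_base` once its true label is known. Because only two ideal cases are compared, no neighborhood size k has to be chosen, which is the practical difference from kNN-based retrieval.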
In case retrieval, the most similar cases are retrieved. Then, the labels of the most similar cases are integrated by majority voting to generate a consensus decision. Commonly, all available historical cases are used to build the case base and are used in case retrieval. However, only the retrieved similar cases are used to make predictions. In the specific problem of BFP, historical cases of healthy and failed companies should be collected first. Then, they are used to build the case base. The retain step refers to case learning as time passes. New healthy or failed companies are added to the case base to implement it. Thus, the cases in the case base are important in determining the final results produced by the system. In the area of CBR, some research has begun to focus on how to maintain the case base. It is valuable to conduct further research on case base management in the area of BFP. Several issues should be taken into consideration when conducting this research. The characteristics of each sector of the economy are different. As a result, the so-called significant ratios are different.
3.4. Specific model of the new CBR forecasting method

The model of the new CBR forecasting method aims to implement the R4 model to make predictions. Assume that the case base consists of n experienced cases, and m features are used to represent them. Assume that the current problem is composed of one case, and m features are utilized to represent it. Let $x_{0,i}$ denote the value of the target case on the ith feature. The model is composed of the following steps:

Step 1: Normalize the current case and the case base. In order to normalize the feature values of each case into the range [0,1], and thus reduce the sensitivity of CBR to large feature values, the following transformation is applied to each $x_{j,i}$:

$$r_{j,i} = w_i \frac{x_{j,i} - \min_j(x_{j,i})}{\max_j(x_{j,i}) - \min_j(x_{j,i})}, \quad j = 0,1,\ldots,n; \; i = 1,2,\ldots,m \tag{6}$$

where $\max_j(x_{j,i})$ and $\min_j(x_{j,i})$, respectively, represent the maximum and minimum values of the ith feature among all cases, $w_i$ is the weight of the ith feature, and $r_{j,i} \in [0,1]$. The weight values can be set with a feature weighting approach, e.g., the analytic hierarchy process (AHP). We suggest employing a feature selection approach, such as stepwise MDA, to choose significant features to represent cases. Feature selection is a specific type of feature weighting: it puts zero weights on non-selected variables and the same weight on selected ones. Thus, after feature selection, one can put equal importance on each feature to simplify the problem. Additional feature weighting approaches, such as AHP or genetic algorithms, are helpful in improving the performance of the predictive model.

Step 2: Determine the positive ideal case and the negative ideal case. These two ideal cases are calculated from the experienced cases in the case base. Commonly, cases are represented by benefit features and cost features. For a benefit feature, the higher the feature value, the better it is for the company. For a cost feature, the smaller the feature value, the better it is for the company. Cost features can be transformed into benefit features. The two ideal cases can be determined as follows according to the principles of TOPSIS:

$$\begin{aligned} A^+ &= \{v_{+,1}, v_{+,2}, \ldots, v_{+,m}\} = \{(\max_j v_{j,i} \mid i \in F_b), (\min_j v_{j,i} \mid i \in F_c)\} \\ A^- &= \{v_{-,1}, v_{-,2}, \ldots, v_{-,m}\} = \{(\min_j v_{j,i} \mid i \in F_b), (\max_j v_{j,i} \mid i \in F_c)\}, \quad j = 1,2,\ldots,n; \; i = 1,2,\ldots,m \end{aligned} \tag{7}$$

where $v_{j,i}$ denotes the normalized value of the jth case on the ith feature, and $F_b$ and $F_c$, respectively, denote the benefit features and the cost features. The labels of the two ideal cases are expressed as follows:

$$L(A^+) = +1, \qquad L(A^-) = -1 \tag{8}$$

Step 3: Obtain the distances between the current case and each ideal case. This is a key step in calculating case similarities. Similarity can be simply defined as 1/distance. In classical CBR, the Euclidean distance is used to measure the distance between two cases by integrating their feature distances. We therefore employ the Euclidean distance in the new hybrid method, as classical CBR does. The formulas are as follows:

$$d_{0,i}^{\pm} = |v_{0,i} - v_{\pm,i}| \tag{9}$$

$$D_0^+ = \sqrt{\sum_{i=1}^{m} (d_{0,i}^+)^2} \tag{10}$$

$$D_0^- = \sqrt{\sum_{i=1}^{m} (d_{0,i}^-)^2} \tag{11}$$

This step is different from formula (4) of TOPSIS, where distances are computed between the alternatives and each ideal point. In SPNIC-based CBR, the distances are computed between the current case and each ideal case.

Step 4: Obtain the similarities between the current case and each ideal case. The Euclidean distances between the current case and each ideal case can be transformed into similarities by the following formulas, with which the similarities between the target case and the two ideal cases can be calculated:

$$S_0^+ = \frac{1}{1 + \beta D_0^+} \tag{12}$$

$$S_0^- = \frac{1}{1 + \beta D_0^-} \tag{13}$$

where $\beta$ is a tuning parameter, which is commonly set to 1. This parameter can be used to tune the influence of the distance values on similarity. As two cases move farther apart, a larger value of the parameter yields a smaller similarity between them. By setting the parameter to 1, the distance can be transformed into a similarity smoothly.

The reason why TOPSIS is not applied directly to the problem of BFP, and SPNIC-based CBR is constructed for this problem instead, is as follows. In the original TOPSIS, the assumption is that the solution most similar to the positive ideal point is preferred. This means that the positive ideal solution is the best solution, and the negative ideal point is the worst solution. However, in SPNIC-based CBR, the assumption is that the current problem is predicted by the label of the positive ideal solution or the negative ideal solution. The positive ideal solution is not better than the negative ideal solution. The original procedure of TOPSIS computes the relative closeness to the ideal solution, which indicates how similar an alternative is to the positive ideal point. Thus, it does not fit the assumption of CBR. In the ranking process, no similarity between the current problem and an alternative is computed. This means that TOPSIS cannot directly generate a prediction for the current problem. The result produced by TOPSIS is the best-ranked alternative according to the positive and negative ideal points. Taking the problem of BFP as an example, the so-called positive and negative ideal cases are computed inside the case base. The current problem is not involved in representing the two ideal cases. If the principles of TOPSIS are continuously
H. Li et al. / Computers & Operations Research 38 (2011) 409–419
415
within a more small range of [yo 0.01, yo + 0.01] with the step of 0.001. Optimization algorithms, e.g., genetic algorithm, are helpful in determining the value. The integration of optimization algorithms with SPNIC-based CBR is an interesting topic which should be further investigated. 3.5. Algorithm of the SPNIC-based CBR forecasting method According to the presentation of step-by-step model in the last section, the pseudo-code of the SPNIC-based CBR forecasting method is illustrated as follows: Fig. 5. The relationship among the current case, positive ideal case, and negative ideal case.
employed, business state of the target company cannot be predicted without linking the current problem to case base. If the current case is more similar to the positive ideal case, it should be ranked positively. If the current case is more similar to the negative ideal case, it should be ranked negatively. Besides our approach of constructing a predictive method, there must be some other procedures that can revise TOPSIS to be a predictive method. This topic might be valuable to be further investigated to make TOPSIS more useful. Step 5: Make the prediction. The positive ideal case and the negative ideal case are ranked based on which are more similar to the current case. However, there are three conditions about relationship among the current case, positive ideal case, and negative ideal case, which is illustrated in Fig. 5 by assuming that a case is represented by two ratios. If the current case is in part I, it is more similar to the positive ideal case. Thus, the solution of the positive ideal case should be used to solve the current problem. On the contrary, if the current case is in part III, it is more similar to the negative ideal case. Thus, solution of the negative ideal case should be used to solve the current problem. If the current case is in distinct II, the two þ similarities, that is, S0,j , and S 0,j , should be compared to find which ideal case the current problem is more similar to. Thus, the prediction can be made via the following formulas: 8 þ þ > < LðA Þ if S0 4 S0 and D0 4D þ þ þ Lðx0 Þ ¼ LðA Þ if S0 o S0 and D0 4 Dþ ð14Þ > : Lu ðx Þ otherwise 0 and Dþ ¼
qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ðv þ ,i v,i Þ2
Lu ðx0 Þ ¼ signðS0þ S 0 yÞ
ð15Þ ð16Þ
where sign( ) is a function which only returns +1 or 1, and y is a turning parameter, which is in the range of [ 0.1, +0.1]. The turning parameter determines where the current problem is. More specifically, when y is set as 0, the predictive model is constructed as the middle of positive and negative ideal cases in Euclidean space. When y is set as a value that is larger than 0, the predictive model is constructed between the middle of the two ideal cases and the negative ideal case. On the contrary, when y is set as a value that is smaller than 0, the predictive model in constructed between the middle of the two ideal cases and the positive ideal case. This parameter is critical. Grid-search technique is useful to set the value of this parameter. It can be firstly searched in the range of [ 0.1: 0.1] with the step of 0.01. When an optimal value, yo, is found, it should be further searched
SPNIC-based CBR (CC, CB, L, θ)
Input:
  CC: the current case, that is, the current problem
  CB: the case base of experienced cases
  L: label of business state {+1, −1}
  θ: tuning parameter
Process:
  //initialize the case base by case representation and normalization
  CB ← Initialize (CB);
  //distinguish positive cases from negative cases
  {CBp, CBn} ← Disting (CB);
  //retrieve the positive ideal case from positive cases and the negative ideal case from negative cases
  {(Idp, Lp), (Idn, Ln)} ← RetIdCase (CBp, CBn, L);
  //calculate the distance between the positive ideal case and the negative ideal case
  {Disp,n} ← GetDis (Idp, Idn);
  //calculate the distances between the current case and each of the two ideal cases
  {Disp, Disn} ← GetDis (Idp, Idn, CC);
  //transfer distances to similarities
  {Simp, Simn} ← Transfer (Disp, Disn);
  //make prediction based on Eq. (14)
  LCC ← Predict (Disp, Disn, Disp,n, Simp, Simn, Lp, Ln, θ);
Output:
  CC.label ∈ {−1, +1};

4. Empirical research

4.1. Design

This research presents a new type of CBR forecasting method. The empirical work investigates whether the new CBR forecasting method can be used to forecast business failure in China. Data were collected from the Shenzhen Stock Exchange and the Shanghai Stock Exchange. The initial dataset consists of 135 pairs of positive and negative samples from the years 2000–2005, represented by the 30 commonly used financial ratios of listed companies noted in Table 1. However, not all companies publish values for all 30 ratios. Thus, samples with at least one missing value on the financial ratios were deleted. Finally, we obtained 82 positive and 71 negative samples. These samples were randomly split into two groups (70%, 30%) for model construction and assessment. This split was repeated 200 times, so each predictive model produces 200 prediction results. As a result, a series of independent results were generated for statistical analysis.
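As a concrete reading of the pseudo-code in Section 3.5 and of the two-stage grid search described under Step 5, the following Python sketch implements the prediction rule and the parameter search. It is a sketch under stated assumptions rather than the authors' implementation: the distance-to-similarity transfer is assumed to be 1/(1 + βd), the labels of the positive and negative ideal cases are assumed to be +1 and −1, sign(0) is assumed to return +1, and classification accuracy on the given cases is assumed as the search criterion. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two cases."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity(distance: float, beta: float = 1.0) -> float:
    # ASSUMPTION: stand-in for the distance-to-similarity transfer;
    # a larger beta gives a smaller similarity at the same distance.
    return 1.0 / (1.0 + beta * distance)


def spnic_predict(case: Sequence[float], ideal_pos: Sequence[float],
                  ideal_neg: Sequence[float], theta: float = 0.0,
                  beta: float = 1.0) -> int:
    """Return +1 (label assumed for the positive ideal case) or -1."""
    d_pos = euclidean(case, ideal_pos)
    d_neg = euclidean(case, ideal_neg)
    d_ideals = euclidean(ideal_pos, ideal_neg)
    s_pos, s_neg = similarity(d_pos, beta), similarity(d_neg, beta)
    if s_pos > s_neg and d_neg > d_ideals:   # part I: beyond the positive ideal case
        return +1
    if s_pos < s_neg and d_pos > d_ideals:   # part III: beyond the negative ideal case
        return -1
    # part II: compare the two similarities, shifted by theta
    return 1 if s_pos - s_neg - theta >= 0 else -1


def accuracy(cases: Sequence[Sequence[float]], labels: Sequence[int],
             ideal_pos: Sequence[float], ideal_neg: Sequence[float],
             theta: float) -> float:
    """Share of cases whose predicted label equals the known label."""
    hits = sum(spnic_predict(c, ideal_pos, ideal_neg, theta) == y
               for c, y in zip(cases, labels))
    return hits / len(cases)


def grid_search_theta(cases: Sequence[Sequence[float]], labels: Sequence[int],
                      ideal_pos: Sequence[float],
                      ideal_neg: Sequence[float]) -> float:
    """Coarse search on [-0.1, 0.1] (step 0.01), then a fine search
    on [theta_o - 0.01, theta_o + 0.01] (step 0.001)."""
    def best(grid: List[float]) -> float:
        # max() keeps the first of several equally accurate candidates
        return max(grid, key=lambda t: accuracy(cases, labels,
                                                ideal_pos, ideal_neg, t))

    coarse = [round(-0.1 + 0.01 * k, 2) for k in range(21)]
    theta_o = best(coarse)
    fine = [round(theta_o - 0.01 + 0.001 * k, 3) for k in range(21)]
    return best(fine)
```

In practice the search would be run on the cases kept for model construction in each random split, and the selected value would then be applied to the cases held out for assessment.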
The paired-samples t-test is a useful technique for finding whether there are significant differences among the compared predictive models. To investigate the performance of the new CBR forecasting method, we used three different case representations: case representation using features selected by t-test, case representation using features selected by stepwise multivariate discriminant analysis (MDA), and case representation using features selected by
Table 1
Case representations using three types of features.

No.  Name  Case representation: Stepwise MDA features / Stepwise logit features / t-test features
1   Gross income/sales
2   Net income/sales
3   EBIT/total assets
4   Net profit/total assets
5   Net profit/current assets
6   Net profit/fixed assets
7   Profit margin
8   Net profit/equity
9   Account receivable turnover
10  Inventory turnover
11  Account payable turnover
12  Total assets turnover
13  Current assets turnover
14  Fixed assets turnover
15  Current ratio
16  Cash/current liability
17  Asset–liability ratio
18  Equity/debt ratio
19  Liability/tangible net asset
20  Liability/equity market value
21  Interest coverage ratio
22  Growth rate of primary business
23  Growth rate of total assets
24  Current assets/total assets
25  Fixed assets/total assets
26  Equity/fixed assets
27  Current liability/total liability
28  Earnings per share
29  Net assets per share
30  Cash flow per share
X X X X X X X X
X X
X
X
X X X X X X X X X X
X
X
X
X X
X X X X
stepwise logistic regression (logit). The three case representations are listed in Table 1. Note that several different methods are available for identifying key financial ratios for BFP. Identifying key BFP ratios according to domain knowledge is one of them. The pioneering research of Altman [39] selected five financial ratios, namely, working capital to total assets, retained earnings to total assets, earnings before interest and taxes to total assets, market value of equity to book value of total debt, and sales to total assets. They were claimed to describe the attributed economic characteristics. Back et al. [33], Hillegeist et al. [34], and McKee [40] found that not all of the five ratios used in Altman [39] were significant from the perspective of statistics and machine learning. Thus, either of the two types of feature selection approach might be useful. A growing number of studies, e.g., Min et al. [35], Lin and McClean [36], and Jo et al. [14], employ feature selection approaches proposed from the perspective of statistics or machine learning. These approaches select key ratios by assessing the performance of variables on a given function. Please refer to the special issue on variable and feature selection of the Journal of Machine Learning Research [37] for a detailed introduction. The commonly used types of feature selection approach were employed in this research. The initial 30 ratios are composed of profitability ratios, activity ratios, liability ratios, growth ratios, structure ratios, and per-share items and yields.

As benchmarks, we employed forecasting methods of the same type, that is, methods that are easily understandable, interpretable, and explainable in the area of BFP. The three classical statistical methods of MDA, logit, and probit were taken into consideration. They construct models by estimating coefficients of multiple variables
from the data. It was also necessary to investigate whether the new type of CBR method is superior to classical CBR. Thus, classical CBR with kNN and CBR with a decision tree at its core were employed as benchmarks. For CBR with kNN, the number of nearest neighbors was set to seven. For CBR with decision tree, standard tree building and pruning algorithms were employed. For the new CBR forecasting method, the parameter θ was searched with the grid-search technique. Finally, the optimal value of θ was set as 0.055, 0.060, and 0.006 for the three different representations.

4.2. Results and discussion

Predictive performances of SPNIC-based CBR, MDA, logit, probit, CBR with kNN, and CBR with decision tree on the three case representations are listed in Tables 2 and 3, where the accuracy and standard deviation (SD) are computed over the 200 randomly generated datasets for each representation. In terms of absolute predictive accuracy on MDA features, the proposed new type of CBR method produced the best predictive performance (89.69%) among all methods. This is better than the performances of MDA, logit, probit, CBR with kNN, and CBR with decision tree by 1.30, 2.31, 2.58, 0.89, and 2.29 percentage points, respectively. Expressed as a proportion of the error rate of the new CBR, this means that MDA, logit, probit, CBR with kNN, and CBR with decision tree produced worse performance than the new CBR by 12.61%, 22.41%, 25.24%, 8.63%, and 22.21%, respectively. This is a significant difference in predictive performance. The new CBR also generated the best predictive accuracy on logit features (88.27%). This result is better than those of MDA, logit, probit, CBR with kNN, and CBR with decision tree by 1.37, 2.98, 3.05, 0.04, and 1.44 percentage points, respectively. This means that the new CBR achieves the same rank of performance as CBR with kNN, and both outperform the other compared methods.
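The proportion figures quoted in terms of error rate appear to divide the accuracy gap by the error rate of the new CBR; for MDA on MDA features this gives 1.30/(100 − 89.69) = 1.30/10.31 ≈ 12.61%. The short Python sketch below reproduces this arithmetic for two of the benchmarks. The function name is hypothetical and the formula is our reading of the reported numbers, not a definition taken from the text.

```python
def relative_error_gap(acc_new: float, acc_other: float) -> float:
    """Accuracy gap (percentage points) as a percentage of the new method's error rate."""
    return 100.0 * (acc_new - acc_other) / (100.0 - acc_new)


# Mean accuracies (%) on MDA features, as quoted in the text.
spnic_accuracy = 89.69
gap_vs_mda = relative_error_gap(spnic_accuracy, 88.39)
gap_vs_knn = relative_error_gap(spnic_accuracy, 88.80)
print(f"MDA: {gap_vs_mda:.2f}%  CBR with kNN: {gap_vs_knn:.2f}%")
```

The same function applies to the logit and t-test representations once the corresponding mean accuracy of the new CBR is used as the reference.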
Table 2
Results of mean accuracies on the three datasets (%).

Method                      MDA features   Logit features   t-test features
Benchmarks
  MDA                       88.39          86.90            84.67
  Logit                     87.38          85.29            80.52
  Probit                    87.11          85.22            80.78
  CBR with kNN              88.80          88.23            87.99
  CBR with decision tree    87.40          86.83            86.70
Proposed method
  SPNIC-based CBR           89.69          88.27            86.84

Table 3
Results of SD on the three datasets.

Method                      MDA features   Logit features   t-test features
Benchmarks
  MDA                       3.57           4.64             5.04
  Logit                     4.48           4.72             5.62
  Probit                    4.48           4.76             5.73
  CBR with kNN              3.84           4.15             3.79
  CBR with decision tree    4.20           4.48             4.53
Proposed method
  SPNIC-based CBR           4.41           4.44             5.43

In terms of proportion analysis on error rate, MDA, logit, probit, and CBR with decision tree are inferior to
SPNIC-based CBR by 11.68%, 25.41%, 26.00%, and 12.28%, respectively. This result indicates a significant difference in predictive performance. For the dataset represented by t-test features, CBR with kNN achieved the best predictive accuracy (87.99%) and outperformed the new CBR (86.84%). The new type of CBR is in the same rank as CBR with decision tree (86.70%). When SPNIC-based CBR is compared with the classical statistical methods, we find that the new type of CBR outperformed MDA, logit, and probit by 2.17, 6.32, and 6.06 percentage points in absolute terms, and by 16.49%, 48.02%, and 46.05% as a proportion of its error rate. This result indicates a significant difference in predictive performance. Integrating the results from the three datasets, we find that the new CBR is at least in the same rank as CBR with kNN, and outperforms CBR with decision tree, MDA, logit, and probit in Chinese short-term BFP. Notice that the new type of CBR achieved the best predictive accuracy among all results (89.69% on MDA features). This result suggests that SPNIC-based CBR can make more precise predictions than CBR with kNN. This provides some evidence that we have fulfilled our aim of providing an effective alternative forecasting method that is accurate, easily interpretable, understandable, and explainable. To find out whether there are statistically significant differences in predictive performance between the new CBR and each compared method, the paired-samples t-test was employed. The null hypotheses are listed as follows.
H1: There is no significant difference in the predictive performance between the new CBR and MDA.
H2: There is no significant difference in the predictive performance between the new CBR and logit.
H3: There is no significant difference in the predictive performance between the new CBR and probit.
H4: There is no significant difference in the predictive performance between the new CBR and CBR with kNN.
H5: There is no significant difference in the predictive performance between the new CBR and CBR with decision tree.
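To make the test behind H1–H5 concrete, the following Python sketch computes the paired-samples t statistic from two accuracy series obtained on the same random splits. It uses only the standard library and leaves the p-value lookup to a statistics package or a t table; the function name and the toy accuracy values are hypothetical and are not taken from the experiment.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def paired_t_statistic(acc_a: Sequence[float],
                       acc_b: Sequence[float]) -> Tuple[float, int]:
    """Return (t statistic, degrees of freedom) for paired accuracy series.

    Each position must hold the accuracies of the two methods on the same split.
    """
    if len(acc_a) != len(acc_b) or len(acc_a) < 2:
        raise ValueError("need two equally long series with at least two pairs")
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    n = len(diffs)
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


# Hypothetical accuracies (%) of two methods on four shared splits.
t_value, dof = paired_t_statistic([90.0, 88.0, 91.0, 89.0],
                                  [88.0, 87.0, 89.0, 88.0])
```

With 200 repeated splits per representation, each hypothesis would be tested on 200 paired differences, that is, with 199 degrees of freedom.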
The results of the significance tests are listed in Table 4, where "***" means the difference is significant at the 1% level, and "–" means there is no significant difference. From Table 4, we find that H1, H2, and H3 are each rejected three times, which means that the differences between the new CBR and the three statistical methods of MDA, logit, and probit are statistically significant. Combining this with the findings from the absolute predictive accuracies, we can conclude that the new type of CBR is significantly superior to MDA, logit, and probit for Chinese short-term BFP. H4 is rejected twice and accepted once, which means that CBR with kNN and the new CBR each win once and tie once. This finding means that SPNIC-based CBR can provide at least the same performance as CBR with kNN. Considering that the new type of CBR produced the highest predictive accuracy, it is useful in forecasting the short-term business failure of Chinese companies. H5 is rejected twice and accepted once, which means that the new CBR can provide superior performance to CBR with decision tree. In general, the empirical research provides some evidence that the TOPSIS idea can be integrated with CBR to produce a new type of forecasting method, and that this new CBR can provide effective performance in Chinese short-term BFP.

4.3. Further investigation with a new dataset and some other benchmarks

When MCDA techniques are used to solve decision-making problems, the recommendations given by the mathematical models of different MCDA methods differ. Can classification approaches such as ELECTRE-based CBR and PROMETHEE-based CBR produce acceptable predictive performance, as SPNIC-based CBR does? This needs to be further investigated. Among the algorithms not employed in the above experiment, the neural network should be regarded as a benchmark.
Support vector machine (SVM) is viewed as a specific variation of neural network, and it overcomes some shortcomings of traditional neural network algorithms. Thus, predictive performances of SPNIC-based CBR and SVM are to be compared. We conducted an experiment on employing the three types of hybrid CBR and SVM
Table 4
Results of significance tests on the three datasets.

Representation    Model                       Mean    t and p             Significance level (%)   Hypothesis
MDA features      Proposed method
                    SPNIC-based CBR           89.69
                  Benchmarks
                    MDA                       88.39   5.153(0.000)***     1                        Reject H1
                    Logit                     87.38   7.301(0.000)***     1                        Reject H2
                    Probit                    87.11   8.131(0.000)***     1                        Reject H3
                    CBR with kNN              88.80   3.270(0.001)***     1                        Reject H4
                    CBR with decision tree    87.40   7.711(0.000)***     1                        Reject H5
Logit features    Proposed method
                    SPNIC-based CBR           88.27
                  Benchmarks
                    MDA                       86.90   4.167(0.000)***     1                        Reject H1
                    Logit                     85.29   8.915(0.000)***     1                        Reject H2
                    Probit                    85.22   9.053(0.000)***     1                        Reject H3
                    CBR with kNN              88.23   0.114(0.910)        –                        Accept H4
                    CBR with decision tree    86.83   4.527(0.000)***     1                        Reject H5
t-test features   Proposed method
                    SPNIC-based CBR           86.84
                  Benchmarks
                    MDA                       84.67   6.110(0.000)***     1                        Reject H1
                    Logit                     80.52   15.91(0.000)***     1                        Reject H2
                    Probit                    80.78   14.98(0.000)***     1                        Reject H3
                    CBR with kNN              87.99   4.273(0.000)***     1                        Reject H4
                    CBR with decision tree    86.70   0.451(0.653)        –                        Accept H5
Table 5
Results of predictive CBR approaches based on decision techniques and SVM with the best representation of the dataset (%).

Method                    Minimum accuracy   Maximum accuracy   Mean accuracy   Median accuracy   SD
Benchmarks
  ELECTRE-based CBR       77.78              95.56              88.82           88.89             4.46
  PROMETHEE-based CBR     80.00              95.56              88.86           88.89             3.51
  SVM                     77.78              95.56              89.20           88.89             3.68
Proposed method
  SPNIC-based CBR         75.56              97.78              88.82           88.89             3.92
Table 6
Results of predictive methods on the new dataset (%).

Method                 Min     Max    Mean    Median   SD
MDA                    73.68   100    94.95   94.74    5.59
Logit                  84.21   100    96.89   97.37    3.67
Probit                 84.21   100    96.79   94.74    3.58
CBR with kNN           73.68   100    90.79   89.47    6.41
CBR with DT            89.47   100    95.89   94.74    2.95
CBR with ELECTRE       78.95   100    94.68   94.74    5.21
CBR with PROMETHEE     78.95   100    93.74   94.74    5.48
SVM                    84.21   100    97.89   100      3.89
SPNIC-based CBR        42.11   100    84.21   84.21    10.92
for BFP with MDA features. The experiment was repeated one hundred times. The results of ELECTRE-based CBR, PROMETHEE-based CBR, SPNIC-based CBR, and SVM are provided in Table 5. From Table 5 we find that all three MCCBR approaches produce acceptable performance, at 88.82 ± 4.46%, 88.86 ± 3.51%, and 88.82 ± 3.92%, respectively. This result indicates that predictive approaches based on multi-criteria methods can produce similar results after optimization. Thus, group decision or ensemble learning is valuable for integrating potential predictive methods. Meanwhile, SVM produced the best predictive performance among the four methods. The underlying theory of SVM helps it map source data into a high-dimensional space, where a more effective model can be constructed.

From the above research, the experiment provides some supporting evidence that the proposed approach, which integrates TOPSIS with CBR, is capable of generating acceptable predictive performance on Chinese BFP. The dataset is composed of Chinese listed companies sampled over the years 2000–2005. Thus, a comparison between the proposed approach and the benchmark algorithms on a new dataset is necessary. Moreover, in the following years, the world, including China, stepped gradually into a financial crisis. Hillegeist et al. [34] pointed out that changes in the economic environment change the association between accounting variables and the probability of business failure. This association changes even in a normal economic environment, and all the more so in a financial crisis. Thus, the following investigation aims to find out whether the proposed approach still belongs to the most effective predictive methods on a dataset from the period of financial crisis, or whether the association between accounting variables and the probability of business failure changes. As Altman [39] did, we focused on the manufacturing industry when collecting the new dataset.
The same four ratios selected by stepwise MDA were used to represent the data, which consist of 65 samples (29 failure companies and 36 non-failure companies). All the above approaches, i.e., MDA, logit, probit, CBR with kNN, CBR with decision tree, ELECTRE-based CBR, PROMETHEE-based CBR, SVM, and SPNIC-based CBR, were used. The predictive results are presented in Table 6. From Table 6 we find that the most useful method for the new dataset is SVM, and the new CBR ranks last. In terms of mean accuracy, the new CBR > CBR with kNN > MDA > CBR with DT > logit > probit on the initial dataset, whereas SVM > logit > probit > CBR with DT > MDA > CBR with ELECTRE > CBR with PROMETHEE > CBR with kNN > the new CBR on the new dataset. This result indicates that the association between accounting ratios and the probability of business failure changes with the economic environment, especially with the change from a normal economic environment to a financial crisis environment. Thus, business failure prediction in a normal economic environment and business failure prediction in a financial crisis environment should be researched separately. However, SVM performed the best on both datasets, from the normal economic environment and from the financial crisis environment. The theory of SVM helps it produce high performance on small datasets and adapt to changes in the association between accounting ratios and the probability of business failure. In summary, the proposed method can produce acceptable predictive performance in Chinese short-term BFP with a dataset from a normal economic environment. Its ease of computation and interpretation can make it more useful in industrial use. However, in a financial crisis environment, other methods should be employed to estimate the association between accounting variables and the probability of business failure.
5. Conclusion and limitations

This research concludes that the new type of CBR forecasting method is a viable alternative method for BFP. It can help humans control the risk involved in their decisions by providing a prediction of companies' performance. Traditionally, TOPSIS has been proposed, applied, revised, and known in the area of decision making. We therefore extend the applicable range of TOPSIS by combining it with CBR. Both CBR and TOPSIS assume that similar observations have similar outputs. This assumption is commonly employed by humans to solve daily problems. Thus, the new type of CBR holds the advantage of ease of interpretation and explanation. Furthermore, this method can produce acceptable predictive performance in a normal economic environment. These two characteristics will make CBR more applicable. The concept, usage and the R4 model, implementable model, and algorithm of the SPNIC-based CBR were discussed in this research, which can help users understand the new method.

This study has the following limitations. First, aside from applying the new CBR to forecast the short-term BFP of Chinese companies, we also forecasted Chinese medium-term and long-term BFP by using the new type of CBR. However, it produced unacceptable performance. The reason may be that the other two types of datasets contain less information and more outliers than the short-term BFP dataset. Thus, complex non-linear methods should be used to tackle the medium-term and long-term BFP of Chinese companies. On the other hand, the new CBR may be more useful for linear or non-complex non-linear separable problems in a normal economic environment. It might not be a good choice for business failure prediction in a financial crisis environment. Second, we mainly focused on forecasting business failure of Chinese companies. Thus, it would be valuable to conduct studies focusing on BFP in western countries. Finally, BFP problems differ from personal credit scoring problems in that continuous features are mainly used in BFP, whereas categorical features are commonly used in personal credit scoring. It is therefore valuable to extend the new CBR to personal credit scoring problems by employing techniques that can transform categorical features into continuous features.
Acknowledgements

This research is partially supported by the National Natural Science Foundation of China (no. 70801055) and the Zhejiang Provincial Natural Science Foundation of China (no. Y6090392). The authors gratefully thank the two anonymous referees for their useful comments and the editors for their work. The comments greatly helped us to improve the quality of the paper.

References

[1] McKee TE. Rough sets bankruptcy prediction models versus auditor signaling rates. Journal of Forecasting 2003;22(8):569–86.
[2] Bose I. Deciding the financial health of dot-coms using rough sets. Information & Management 2006;43(7):835–46.
[3] Sun L, Shenoy PP. Using Bayesian networks for bankruptcy prediction: some methodological issues. European Journal of Operational Research 2007;180(2):738–53.
[4] Nam CW, Kim TS, Park NJ, et al. Bankruptcy prediction using a discrete-time duration model incorporating temporal and macroeconomic dependencies. Journal of Forecasting 2008;27(6):493–506.
[5] Lin R, Wan Y, Wu C, et al. Developing a business failure prediction model via RST, GRA and CBR. Expert Systems with Applications 2009;36(2):1593–600.
[6] Sun J, Li H. Financial distress early warning based on group decision making. Computers and Operations Research 2009;36(3):885–906.
[7] Psillaki M, Tsolas IE, Margaritis D. Evaluation of credit risk based on firm performance. European Journal of Operational Research 2010;201:873–81.
[8] Garcia F, Guijarro F, Moya I. A goal programming approach to estimating performance weights for ranking firms. Computers & Operations Research 2010;37(9):1597–609.
[9] Yeh C, Chi D, Hsu M. A hybrid approach of DEA, rough set and support vector machines for business failure prediction. Expert Systems with Applications 2010;37(2):1535–41.
[10] Ho W, Xu X, Dey P. Multi-criteria decision making approaches for supplier evaluation and selection: a literature review. European Journal of Operational Research 2010;202(1):16–24.
[11] Schank R. Dynamic memory: a theory of reminding and learning in computers and people. New York: Cambridge University Press; 1982.
[12] Kolodner JL. Case-based reasoning. San Francisco: Morgan Kaufmann; 1993.
[13] Aamodt A, Plaza E. Case-based reasoning: foundational issues, methodological variations, and system approach. AI Communications 1994;7(1):39–59.
[14] Jo H, Han I, Lee H. Bankruptcy prediction using case-based reasoning, neural network and discriminant analysis for bankruptcy prediction. Expert Systems with Applications 1997;13(2):97–108.
[15] Zhuang ZY, Churilov L, Burstein F, et al. Combining data mining and case-based reasoning for intelligent decision support for pathology ordering by general practitioners. European Journal of Operational Research 2009;3:662–75.
[16] Liu CH, Chen LS, Hsu CC. An association-based case reduction technique for case-based reasoning. Information Sciences 2008;178(17):3347–55.
[17] Beddoe G, Petrovic S. Selecting and weighting features using a genetic algorithm in a case-based reasoning approach to personnel rostering. European Journal of Operational Research 2006;175(2):649–71.
[18] Beddoe G, Petrovic S. Enhancing case-based reasoning for personnel rostering with selected tabu search concepts. Journal of the Operational Research Society 2007;58(12):1586–98.
[19] Chang PC, Lai CY, Lai KR. A hybrid system by evolving case-based reasoning with genetic algorithm in wholesaler's returning book forecasting. Decision Support Systems 2006;42(3):1715–29.
[20] Burke EK, MacCarthy BL, Petrovic S, et al. Multiple-retrieval case-based reasoning for course timetabling problems. Journal of the Operational Research Society 2006;57(2):148–62.
[21] Waheed A, Adeli H. Case-based reasoning in steel bridge engineering. Knowledge-Based Systems 2005;18(1):37–46.
[22] Kwon OB, Sadeh N. Applying case-based reasoning and multi-agent intelligent system to context-aware comparative shopping. Decision Support Systems 2004;37(2):199–213.
[23] Zanakis SH, Solomon A, Wishart N, et al. Multi-attribute decision making: a simulation comparison of selection methods. European Journal of Operational Research 1998;107:505–29.
[24] Shih HS. Incremental analysis for MCDM with an application to group TOPSIS. European Journal of Operational Research 2008;186:720–34.
[25] Hwang CL, Yoon K. Multiple attribute decision making. Berlin: Springer; 1981.
[26] Opricovic S, Tzeng GH. Compromise solution by MCDM methods: a comparative analysis of VIKOR and TOPSIS. European Journal of Operational Research 2004;156:445–55.
[27] Wang JW, Cheng CH, Huang KC. Fuzzy hierarchical TOPSIS for supplier selection. Applied Soft Computing 2009;9:377–86.
[28] Chamodrakas I, Alexopoulou N, Martakos D. Customer evaluation for order acceptance using a novel class of fuzzy methods based on TOPSIS. Expert Systems with Applications 2009;36:7409–15.
[29] Deng HP, Yeh CH, Willis RJ. Inter-company comparison using modified TOPSIS with objective weights. Computers and Operations Research 2000;27:963–73.
[30] Ashtiani B, Haghighirad F, Makui A, et al. Extension of fuzzy TOPSIS method based on interval-valued fuzzy sets. Applied Soft Computing 2009;9:457–61.
[31] Ic YT, Yurdakul M. Development of a quick credibility scoring decision support system using fuzzy TOPSIS. Expert Systems with Applications 2010;37(1):567–74.
[32] Ahi A, Aryanezhad MB, Ashtiani B, et al. A novel approach to determine cell formation, intracellular machine layout and cell layout in the CMS problem based on TOPSIS method. Computers and Operations Research 2009;36(5):1478–96.
[33] Back B, Laitinen T, Sere K, et al. Choosing bankruptcy predictors using discriminant analysis, logit analysis, and genetic algorithms. Turku Centre for Computer Science. Technical report no. 40, 1996.
[34] Hillegeist SA, Keating EK, Cram DP, et al. Assessing the probability of bankruptcy. Review of Accounting Studies 2004;9(1):5–34.
[35] Min S, Lee J, Han I. Hybrid genetic algorithms and support vector machines for bankruptcy prediction. Expert Systems with Applications 2006;31:652–60.
[36] Lin F, McClean S. A data mining approach to the prediction of corporate failure. Knowledge-Based Systems 2001;14(3–4):189–95.
[37] Guyon I, Elisseeff A. An introduction to variable and feature selection. Journal of Machine Learning Research 2003;3:1157–82.
[38] Pal S, Shiu S. Foundations of soft case-based reasoning. New Jersey: Wiley; 2004.
[39] Altman E. Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. Journal of Finance 1968;23:589–609.
[40] McKee TE. Altman's 1968 Bankruptcy prediction model revisited via genetic programming: new wine from old bottle or a better fermentation process. Journal of Emerging Technologies in Accounting 2007;4:87–101.