Expert Systems with Applications 38 (2011) 10210–10217
Expert Systems with Applications journal homepage: www.elsevier.com/locate/eswa
Financial health prediction models using artificial neural networks, genetic algorithm and multivariate discriminant analysis: Iranian evidence
F. Mokhatab Rafiei a,*, S.M. Manzari b, S. Bostanian b
a Department of Industrial Engineering, Isfahan University of Technology, Isfahan 84156-83111, Iran
b Department of Industrial Engineering, Tarbiat Modares University, Tehran, Iran
Article info
Keywords: Financial health prediction; Financial ratios; Artificial neural networks; Genetic algorithm; Discriminant analysis; Iranian company
Abstract
The purpose of this study is to design a model to predict the financial health of companies. Financial ratios of 180 manufacturing companies listed on the Tehran Stock Exchange for one fiscal year (ended March 21, 2008) are used. Three models, based on artificial neural networks (ANN), a genetic algorithm (GA) and multiple discriminant analysis (MDA), are used to separate bankrupt from non-bankrupt companies. The ANN model achieved accuracy rates of 98.6% and 96.3% on the training and holdout samples, respectively. To evaluate the reliability of this model, the data were also examined with the genetic algorithm and multiple discriminant analysis methods. The GA model attained only 92.5% and 91.5% accuracy, and MDA reached 80.6% and 79.9%, on the training and holdout samples, respectively.
© 2011 Elsevier Ltd. All rights reserved.
1. Introduction
Corporate bankruptcy and failure are unwanted phenomena and a persistent problem. Prediction of corporate financial distress has long been a subject of the corporate finance literature. Business failures in general have serious negative economic and social consequences and should therefore be treated seriously. The impact of bankruptcy is often studied by bank economists and government officials. The economic cost of business failures is significant: it affects the national economy as a whole, influencing the trends of major economic indicators such as sales, exports and production. Furthermore, every company, to a greater or lesser degree and in different ways, influences and is influenced by various stakeholders such as investors, customers, employees and suppliers. Hence the suppliers of capital (investors and creditors), as well as management and employees, are severely affected by business failures. Bankruptcy prediction models can also help a manager track a company's performance over a number of years and identify important trends. The models may not tell the manager specifically what is wrong, but they should encourage managers to identify problems and take effective action to minimize the incidence of failure. A predictive model may also warn an auditor of a company's vulnerability. In addition, lenders may use predictive models to help assess the likelihood of a company defaulting on its
* Corresponding author. Tel.: +98 311 391 5508; fax: +98 311 391 5526.
E-mail addresses: [email protected], [email protected] (F. Mokhatab Rafiei).
0957-4174/$ - see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.eswa.2011.02.082
loan. Regulatory agencies, likewise, are concerned with whether a monitored company is in danger of failing.
In Iran generally, and among companies listed on the Tehran Stock Exchange (TSE) in particular, many companies suffer from weak financial health; in other words, they are bankrupt. For example, some companies cannot pay their liabilities, and their inadequate returns do not cover their expenses. Such companies therefore fall under trade act 141, according to which a company whose retained losses exceed 50% of its capital is classified as bankrupt. If this financial weakness is detected earlier, the chance of recovery through simple remedies is higher. For this reason the term "weak financial health" is preferred to "bankruptcy", and the two terms are used interchangeably in this paper.
Prediction of corporate financial health from financial data is a well-known research topic. Beaver (1966) was one of the first researchers to study bankruptcy prediction; he examined the predictive ability of 14 financial ratios using a sample of 158 bankrupt and non-bankrupt companies. Altman (1968, 1983) extended Beaver's study, using MDA to assign companies to predefined classes. In Altman's model, bankruptcy could be explained by a combination of five financial ratios selected from an original list of 22. Classification based on the resulting Z score had a predictive power of 96% one year prior to bankruptcy. These statistical methods, however, rest on restrictive assumptions such as linearity, normality and independence among the predictor (input) variables, and financial data often violate these assumptions (Deakin, 1972). Hence these methods may be limited in their effectiveness
F. Mokhatab Rafiei et al. / Expert Systems with Applications 38 (2011) 10210–10217
and validity. A number of recent studies have shown that artificial intelligence approaches are less dependent on these assumptions. Among these approaches, ANNs can serve as an alternative to the conventional statistical methods that have long been used for classification problems. ANNs are capable of recognizing and representing non-linear relations in a data set, and they have therefore been studied widely in financial applications, including bankruptcy prediction (Barniv, Agarwal, & Leach, 1997; Bell, 1997; Shin, Shin, & Han, 1998; Zhang, Hu, Patuwo, & Indro, 1999). ANNs differ fundamentally from parametric statistical models. In a parametric statistical model, the user must specify the form of the functional relationship (for example, linear or logistic) between the dependent and independent variables. Once the functional form has been assumed, optimization methods are used to find the set of parameters that minimizes a measure of error. ANNs with at least one hidden layer, by contrast, can use the data to develop an internal representation of the relationships between variables, so that prior assumptions about the underlying parameter distributions are not essential. As a result, ANNs may be expected to give better results when the relationship between the variables does not fit the assumed model (Salchenberger, Cinar, & Lash, 1992).
Odom and Sharda (1990) were the first to use ANNs for bankruptcy prediction. Their model had five input variables, similar to those of Altman's study (Altman, 1968), one hidden layer with five nodes, and one output node. Their research sample consisted of 65 bankrupt and 64 non-bankrupt companies from the period 1975–1982. Of these, 74 companies (38 bankrupt and 36 non-bankrupt) formed the training set, and the remaining 55 companies (27 bankrupt and 28 non-bankrupt) formed the holdout sample.
An MDA was carried out on the same training set as a benchmark. The ANN correctly classified 81.81% of the holdout sample, whereas MDA attained only 74.28%. Tam and Kiang (1992) compared the performance of an ANN model with a linear discriminant model using commercial bank failure data covering 59 failed and 59 non-failed banks over the period 1985–1987. Han, Jo, and Shin (1997) compared an ANN model with MDA; the ANN model had the highest accuracy (78.7%) on the given data sets.
The genetic algorithm (GA) is an innovative artificial intelligence approach capable of solving complex problems with almost no human supervision. It applies the principle of survival of the fittest to select the best solution to the problem under consideration (Davis, 1991; Goldberg, 1989; Holland, 1975). An empirical application was conducted by Varetto (1998), who used a GA to analyze the insolvency risk of 3840 Italian industrial companies. Varetto compared a GA with linear discriminant analysis, a traditional statistical method for bankruptcy classification and prediction. He concluded that genetic algorithms are a very effective instrument for insolvency diagnosis, although the results obtained with linear discriminant analysis were superior to those obtained with the GA. Varetto notes that "the results of the GA were obtained in less time and with more limited contributions from the financial analyst than the linear discriminant analysis". Shin and Lee (2002) proposed a GA approach to bankruptcy prediction modeling that extracts rules that are easy for users to understand; the five rules they extracted predicted bankrupt companies with 80.8% accuracy. Kim and Han (2003) also used a GA for bankruptcy prediction and found that the rules discovered by the GA were superior to those found by neural networks.
Most bankruptcy studies in Iran have examined the ability of widespread statistical methods to predict company bankruptcy, and most of them have used multivariate statistical methods. The first study (Rasoolzadeh, 2001) asked whether Altman's model is useful for predicting bankruptcy in Iran; using financial information on textile companies for 1996–1999, it predicted bankrupt textile companies with 75% accuracy. A more recent study (Sohrabi Eraghi, 2008) applied MDA to 21 ratios for all 460 listed companies and claims 80% accuracy in predicting bankrupt companies. Etemadi, Anvary Rostamy, and Farajzadeh Dehkordi (2009) developed a genetic programming model for bankruptcy prediction and applied it to classify 144 bankrupt and non-bankrupt Iranian firms listed on the Tehran Stock Exchange (TSE); they report 94% accuracy on the training set and 90% on the holdout set. Bostanian (2008) used a GA to classify Iranian companies as bankrupt or non-bankrupt; her model attained only 92.5% and 91.5% accuracy.
The purpose of this article is twofold: first, to conduct a new bankruptcy prediction study of Iranian companies; second, to propose a model based on artificial neural networks for predicting company bankruptcy. These two purposes motivated this research.
The paper is organized as follows. After this introduction, fundamental issues of neural networks are presented in Section 2. Sections 3–5 review the theory of genetic algorithms, multiple discriminant analysis and principal component analysis. Section 6 is devoted to the experimental design, including the selection of variables and the sample. Model development is discussed in Section 7. Finally, Section 8 concludes the paper.
2. Neural networks
An artificial neural network is loosely modeled on the human brain: like the brain, it consists of neurons and the links between them.
A learning model may be supervised, unsupervised or a mixture of the two; it governs how training data are presented to the neural network. A number of parameters must be set in advance when designing a neural network, among them the number of layers, the number of neurons per layer and the number of training iterations. Some of the more important parameters in terms of training and network capacity are the number of hidden neurons, the learning rate and the momentum parameter (Nazzal, El-Emary, & Najim, 2008).
The multilayer perceptron (MLP) neural network is built from simple components, which are described first. A single-input neuron is shown in Fig. 1. The scalar input p is multiplied by the scalar weight w to form wp, which is sent to the adder. The other input, 1, is multiplied by a bias b and also passed to the adder. The adder output n, often called the net input, goes into a transfer (activation) function f, which produces the scalar neuron output $a = f(wp + b)$. Both w and b are adjustable scalar parameters of the neuron. Typically the transfer function is chosen by the designer, and the parameters w and b are then adjusted by a learning rule so that the neuron's input/output relationship meets a specific goal. The transfer function in Fig. 1 may be a linear or nonlinear function of n; a particular transfer function is chosen to satisfy some requirement of the problem that the neuron is attempting to solve.
Fig. 1. Single input neuron.
One of the most commonly used functions is the log-sigmoid transfer function, shown in Fig. 2 (Nazzal et al., 2008); it is usually used in multilayer networks trained with the backpropagation algorithm.
Fig. 2. Log-sigmoid transfer function.
A neuron usually has more than one input. Frequently, however, one neuron, even with many inputs, is not sufficient, and several neurons operating in parallel are required; such a group is called a "layer". Consider a network with two layers, the type used in this paper for bankruptcy prediction. In this network each layer has its own weight matrix w, its own bias vector b, a net input vector n and an output vector a. A typical two-layer network is shown in Fig. 3. Mathematically, the network can be written as:
$$Y = b_2 + w_2\, S(b_1 + w_1 P) \qquad (1)$$
where $P$ is the input vector, $Y$ is the final output and $S$ is the transfer function of the hidden layer (the output layer in Eq. (1) is linear). $b_i$ and $w_i$ are the network parameters of the $i$th layer, its bias vector and input weights, respectively; their dimensions give the number of neurons in that layer.
The property of primary significance for a neural network is its ability to learn from its environment and to improve its performance through learning. The learning process corresponds to the estimation process in statistics. In supervised learning, a set of input–output vector pairs is presented to the network, and a learning algorithm adjusts the connection weights until the output vector of the network approximates the target vector as closely as possible. To do this, an error function such as the sum-of-squares error is defined. Although the sum-of-squares error is the most commonly used error function in MLP applications, many other well-founded choices exist, depending on the nature of the training data and the target output of the network. Many learning algorithms are used in neural networks, including conjugate gradient learning (Zhang, 2000), the Levenberg–Marquardt learning process (Singh, Gupta, & Gupta, 2007) and Bayesian learning (Zhang, 2009).
3. Genetic algorithm
GAs are stochastic search methods, inspired by natural genetics, that can search large and complex spaces. A GA proceeds in four stages: initialization, selection, crossover and mutation (Davis, 1991). In the initialization stage, a population of chromosomes with genetic structures is scattered randomly over the solution space. Each chromosome is then evaluated by a user-defined fitness function, whose job is to measure the performance of the chromosome; choosing the fitness function is the most important step in real applications. From the initial population of chromosomes, a new population is generated using three genetic operators, reproduction, crossover and mutation, which are modeled on their biological equivalents. Members of the population are chosen for the new population with probabilities proportional to their fitness. Pairs of chromosomes in the new population are selected at random to exchange genetic material (their bits) in a mating operation called crossover, which produces two new chromosomes that replace the parents. Randomly chosen bits in the offspring are then flipped; this is called mutation. The new population generated by these operators replaces the old one. As the population evolves, it contains more and more highly fit chromosomes. When a convergence criterion is reached, such as no significant further increase in the average fitness of the population, the best chromosome produced is decoded into the point in the search space that it represents.
4. Multiple discriminant analysis
Multiple discriminant analysis is a multivariate statistical method used to classify observations on the basis of two or more independent variables (Johnson & Wichern, 2002); that is, it determines to which group an observation belongs.
MDA aims to maximize the likelihood of correct classification: it constructs linear combinations of the independent variables, called discriminant functions or classification functions, to separate observations into predetermined groups. A discriminant function has the form of Eq. (2):
Fig. 3. A sample two layer network.
$$Z = W_1 X_1 + W_2 X_2 + \cdots + W_n X_n \qquad (2)$$
where $Z$ is the discriminant score, $W_i$ is the discriminant weight for variable $i$ and $X_i$ is variable $i$. The discriminant function is constructed to maximize the ratio of the between-groups sum of squares to the within-groups sum of squares. The first discriminant function is calculated to separate the groups as well as possible; the second is calculated to best separate the groups while being uncorrelated with the first, and so on, up to the maximum number of discriminant functions, $\min(g - 1, p)$, where $g$ is the number of groups and $p$ the number of independent variables.
5. Principal component analysis
Principal component analysis is a multivariate statistical method used to reduce the dimension of a data set (Jolliffe, 2002). Dimension reduction has several advantages: in low dimension the data are easier to interpret, and their behavior can be inspected graphically in the space of the most important components. The most important components are those that express, or contain, the largest share of the variance of the data set. Although p variables are needed to define a problem, most of their information can frequently be represented by k components (k < p), so the reduced components carry almost the same amount of information as the original variables. In this way the problem dimension is reduced from p × n to k × n, where k < p and n is the number of observations. Principal components are specific linear combinations of the original variables, computed from the variance–covariance matrix of those variables, and they are ranked by importance: the first principal component is the one with maximum variance, the second is the one with the next largest variance that is uncorrelated with the first, and so on, so the components are extracted in decreasing order of importance. Components that add little to the cumulative explained variance can be eliminated, which achieves the reduction in data dimension.
6. Experimental design
Constructing a model to classify a population depends on many parameters that are independent of the model used. This section discusses the selection of the population. The main goal of this research is to use financial ratios to predict the financial health of listed Iranian companies. The ratios introduced by previous researchers were first gathered, and the ratios used to construct the models were then selected following experts' guidance. This study by no means claims that the best ratios have been selected for discriminating bankrupt from non-bankrupt companies; its goal is rather to find ratios that have the potential to send relevant signals about the future financial status of the companies.
6.1. Population and sample
The population of this study is the manufacturing companies listed on the Tehran Stock Exchange (TSE) for the fiscal year ended March 21, 2008; they were chosen because their financial information is available. There are 461 companies listed on the TSE in 37 industry groups, of which 412 are manufacturing and 49 are non-manufacturing companies. Manufacturing companies outnumber the other listed companies and, because of their extensive activities, are more likely to be granted loans. A sample of 180 companies was chosen for this research. On the Tehran Stock Exchange, the criterion for a company's exit from the capital market is trade act 141 of the commercial law, under which a company whose retained losses exceed 50% of its capital is classified as bankrupt. Fifty-eight companies are bankrupt under this law; the non-bankrupt companies were selected randomly from the remaining list. Financial ratios of these 180 manufacturing companies for the fiscal year ended March 21, 2008 were used, and the data required to calculate the ratios were gathered from the companies' balance sheets and income statements.
6.2. Variables selection
The first step in designing a classification model is the selection of variables, and selecting the right variables plays a critical role in building the model. A few points are important: first, the number of variables should be appropriate; second, each variable must be computable. The dependent variable, which classifies companies into two distinct groups, bankrupt and non-bankrupt, is defined by trade act 141: as mentioned before, a company is bankrupt under this act if its retained losses exceed 50% of its capital. Most bankruptcy studies use an initial set of variables that was either chosen arbitrarily or selected on the basis of previous studies. Arbitrary selection has been used because there is no fundamental theory to help researchers select the correct ratios. Selection based on previous studies is not free of problems either, since these commonly used ratios are exposed to misrepresentation and may be reported incorrectly by the companies themselves (Beaver, 1966). In this research, a set of 17 independent variables (ratios) was selected on the basis of previous studies and finalized after interviews with experts. The author tried to choose ratios that match the financial condition of Iranian companies, since many ratios used in studies outside Iran may not be appropriate indicators of financial health for Iranian companies.
Ratios based on cash flows are among these (Sohrabi Eraghi, 2008). Table 1 shows the ratios used in this research; detailed definitions of these ratios are given in Peterson (1999).
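The seventeen ratios in Table 1 are plain quotients of balance-sheet and income-statement items, and together they form the input vector X = (x1, ..., x17) used by the classifiers. The sketch below shows one way to compute them for a single company. The `Statements` container and its field names are hypothetical, and three definitions are assumptions rather than details given in the paper: "market value" is taken as the market value of equity, the quick ratio uses current assets minus inventory, and average daily sales are annual sales divided by 365.

```python
from dataclasses import dataclass


@dataclass
class Statements:
    """Hypothetical balance-sheet and income-statement items for one company."""
    current_assets: float
    current_liabilities: float
    inventory: float
    accounts_receivable: float
    fixed_assets: float
    total_assets: float
    total_liabilities: float
    owners_equity: float
    market_value: float        # assumption: market value of equity
    sales: float
    cost_of_goods_sold: float
    gross_profit: float
    operating_income: float
    interest_expenses: float
    net_income: float


def ratios(s: Statements, days_per_year: int = 365) -> dict:
    """Return the ratios X1..X17 in the order of Table 1."""
    daily_sales = s.sales / days_per_year            # assumed definition
    quick_assets = s.current_assets - s.inventory    # assumed definition
    values = [
        (s.current_assets - s.current_liabilities) / s.total_assets,  # X1 working capital/TA
        s.operating_income / s.total_assets,          # X2
        s.market_value / s.total_assets,              # X3
        s.sales / s.total_assets,                     # X4
        s.owners_equity / s.total_assets,             # X5
        s.net_income / s.total_assets,                # X6
        s.current_assets / s.current_liabilities,     # X7 current ratio
        quick_assets / s.current_liabilities,         # X8 quick ratio
        s.gross_profit / s.sales,                     # X9
        s.sales / s.inventory,                        # X10
        s.sales / s.fixed_assets,                     # X11
        s.net_income / s.sales,                       # X12
        s.sales / s.total_liabilities,                # X13
        s.net_income / s.owners_equity,               # X14
        s.accounts_receivable / daily_sales,          # X15
        s.operating_income / s.interest_expenses,     # X16
        s.cost_of_goods_sold / s.inventory,           # X17
    ]
    return {f"X{i}": v for i, v in enumerate(values, start=1)}
```

Returning a dictionary keyed by the Table 1 labels keeps the mapping between code and table explicit; stacking the values for all companies row by row gives the companies × ratios matrix that the classifiers take as input.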
Table 1. Ratios used to build the initial variable set.
Number   Ratio
X1       Working capital/total assets
X2       Operating income/total assets
X3       Market value/total assets
X4       Sales/total assets
X5       Owners' equity/total assets
X6       Net income/total assets
X7       Current ratio
X8       Quick ratio
X9       Gross profit/sales
X10      Sales/inventory
X11      Sales/fixed assets
X12      Net income/sales
X13      Sales/total liabilities
X14      Net income/owners' equity
X15      Accounts receivable/average daily sales
X16      Operating income/interest expenses
X17      Cost of goods sold/inventory
7. Classification models development
7.1. Classification based on neural network
Binary classification (Min & Jeong, 2009) is a method of classifying the members of a given set of objects into two groups in order to find
a model that predicts the class of any new entry. It is a two-stage process:
(1) Learning: a classification model is built by analyzing, or learning from, the training data set. That is, given input vectors $X = (x_1, x_2, \ldots, x_n)$ together with the label of the class each vector belongs to, the model learns the relationship between the vectors and their classes.
(2) Classification: the model built in the previous stage is used to classify new data.
Classification is a crucial step in analyzing the financial status of companies, and bankruptcy prediction is a binary classification problem. The factors affecting a company's financial health are its financial ratios, which form the input vector $X = (x_1, x_2, \ldots, x_n)$, where $x_i$, $i \in [1, n]$, is the $i$th factor. The goal is to find the bankruptcy label y for any input vector X, where y = 1 denotes a non-bankrupt and y = 0 a bankrupt company. Several learning algorithms, such as the Levenberg–Marquardt algorithm, conjugate gradient and Bayesian learning, have been proposed for this task.
7.1.1. Data pre-processing
Data pre-processing is a useful step before classification. In general, pre-processing includes three stages: data transformation, data cleaning and data selection (Han & Kamber, 2001). Data cleaning removes inconsistent data from the training set, and a common approach to data transformation is normalization, which rescales all factors into a known range, for example [0, 1]. Normalization prevents inputs with large values from swamping the effect of inputs with small values. There are two main normalization methods, linear and stochastic (Nguyen & Chan, 2004):
(1) Linear normalization into the range [a, b], given by Eq. (3):
$$x_n = \frac{(b - a)(x - x_{\min})}{x_{\max} - x_{\min}} + a \qquad (3)$$
(2) Stochastic normalization, $x_n = (x - \bar{x})/\sigma$, where $\bar{x}$ and $\sigma$ are the mean and standard deviation, respectively.
Both methods were tested in this study; linear normalization gave higher precision and was selected. The log-sigmoid transfer function (Nazzal et al., 2008) was chosen among the available transfer functions; it takes an input that may have any value between minus and plus infinity and squashes the output into the range 0 to 1. Because the outputs of a classification model generally lie in the range 0 to 1 but are not necessarily exactly 0 or 1, a tolerance limit is required for classification; after many tests, a tolerance limit of 0.65 was chosen.
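The pre-processing choices above and the two-layer network of Eq. (1) can be illustrated with a short NumPy sketch. It applies linear normalization as in Eq. (3) and the stochastic (mean/standard-deviation) alternative, passes the normalized ratios through a log-sigmoid hidden layer and a linear output layer, and turns the output into a class label using the 0.65 tolerance limit. Several points are assumptions rather than details from the paper: the weights are random placeholders, not a trained network (training with Levenberg–Marquardt or another algorithm is not shown); the hidden layer is given 38 neurons by reading the value 38 in Table 2 as a neuron count; and the tolerance limit is applied as a plain threshold, with outputs of at least 0.65 labelled 1 (non-bankrupt) and all others 0 (bankrupt), because the text does not say exactly how the limit is used.

```python
import numpy as np


def minmax_normalize(X, a=0.0, b=1.0):
    """Linear normalization of each column (ratio) into [a, b], as in Eq. (3)."""
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)  # avoid 0/0 for constant columns
    return (b - a) * (X - x_min) / span + a


def zscore_normalize(X):
    """Stochastic normalization: subtract each column's mean, divide by its std."""
    std = X.std(axis=0)
    return (X - X.mean(axis=0)) / np.where(std > 0, std, 1.0)


def logsig(n):
    """Log-sigmoid transfer function: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-n))


def forward(P, w1, b1, w2, b2):
    """Eq. (1): Y = b2 + w2 S(b1 + w1 P), with S = log-sigmoid and a linear output.

    P holds one company per column (n_inputs x n_companies).
    """
    return b2 + w2 @ logsig(b1 + w1 @ P)


def classify(Y, threshold=0.65):
    """Map outputs to labels: 1 = non-bankrupt, 0 = bankrupt (assumed threshold rule)."""
    return (Y >= threshold).astype(int)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ratios = rng.normal(size=(10, 17))           # hypothetical: 10 companies x 17 ratios
    P = minmax_normalize(ratios).T               # 17 x 10, one company per column
    n_hidden = 38                                # assumed reading of Table 2
    w1 = rng.normal(scale=0.1, size=(n_hidden, 17))
    b1 = np.zeros((n_hidden, 1))
    w2 = rng.normal(scale=0.1, size=(1, n_hidden))
    b2 = np.zeros((1, 1))
    # Untrained placeholder weights, so these labels carry no meaning.
    print(classify(forward(P, w1, b1, w2, b2)))
```

Keeping companies in columns matches the matrix form of Eq. (1), where $w_1$ multiplies the input vector $P$ from the left; the bias vectors then broadcast across all companies at once.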
7.1.2. Selection of validation set
When the training set for each class is large, a neural network may memorize the outputs for the corresponding inputs instead of learning the relationship between the variables. This phenomenon, called overfitting, reduces generalization in the test stage and results in a high classification error. For this reason, part of the data set is held out as a validation set, and the network's power of generalization on the validation set is monitored during learning. When learning on the training set keeps improving but generalization on the validation set gets worse, overfitting is occurring and learning must stop. Many tests with different combinations of training, validation and test set sizes were therefore performed, and the combination that reached the minimum model error was selected (Wang, Yang, Shi, & Wang, 2008). This combination assigns 80% of the data to the training set, 15% to the test set and 5% to the validation set.
7.1.3. Structure of classification model
A neural network structure is used for classification. Neural networks can estimate complex, non-linear relations between input and output vectors quickly and with high accuracy. The inputs of the current classification model are the financial ratios discussed in the preceding sections. The structure of the network employed is given in Table 2.
Table 2. Structure of the neural network used for classification.
Type of network                          Explanation
Number of layers                         2
Number of neurons in the hidden layer    38
Transfer function for hidden layer       Log-sigmoid
Transfer function for output layer       Linear
Learning algorithms                      Steepest descent, conjugate gradient, Levenberg–Marquardt, Bayesian
Pre-processing                           Transfer of inputs to between minus and plus one; output squashed into the range 0–1 (Fig. 2)
% of training, test and validation set   80, 15, 5
Number of inputs                         17
After extensive tests comparing the performance of four widely used learning algorithms (steepest descent, conjugate gradient, Levenberg–Marquardt and Bayesian), the best learning algorithm was selected. The results of using neural networks for classification are shown in Table 3.
7.1.4. Results of classification based on neural networks
As explained before, the neural network model classified the training set into bankrupt and non-bankrupt companies with a total precision of 98.6%: 96.6% of the non-bankrupt and 100% of the bankrupt companies were classified correctly. To further evaluate the classification ability of the model, a test set of 27 companies that were not used to construct the model was used. The results show that 95% of the non-bankrupt and 100% of the bankrupt companies were classified correctly, which suggests that the model can be used to classify companies outside the current sample and that it is not biased toward either class. The complete results are shown in Table 4.
7.2. Classification based on genetic algorithm
Although ANNs have been used successfully to classify organizations in terms of solvency, their degree of generalization is limited by the lack of knowledge of how a conclusion is reached. Because of this,
Table 3. Accuracy of classification with the neural networks tested.
Type of artificial neural network                                  Classification precision (%)
Feedforward with conjugate gradient descent learning process       65.46
Feedforward with Levenberg–Marquardt learning process              97.84
Feedforward with Bayesian learning process                         89.92
Recursive dynamic with Levenberg–Marquardt learning process        87.76
Recursive dynamic with Bayesian learning process                   82.73
Table 4. The best result reached by ANN.
                                        Predicted group by ANN(a)
Set        Row      Solvency status(a)  0        1        Total    Total precision (%)
Training   Number   0                   56       0        55       98.6
Training   Number   1                   3        86       89
Training   %        0                   100      0        100
Training   %        1                   3.37     96.6     100
Test       Number   0                   7        0        7        96.3
Test       Number   1                   1        19       20
Test       %        0                   100      0        100
Test       %        1                   5        95       100
a 0: bankrupt, 1: non-bankrupt.
and also to find out whether the ANN is superior to other methods, the author decided to try another approach. The concept behind the GA is not new: it is based on survival of the fittest and evolution. Starting with a candidate population of solutions, the population is evolved toward the best set of solutions. It was therefore decided to apply a GA to the data, using the same ratios as in the ANN approach. In most studies the data set is divided into two sets: a training set for building the model and a test set for measuring its performance. The main point of this division is that each set retains enough data. The literature shows that the training and test sets usually contain different numbers of members; divisions in the proportions 60%/40% or 75%/25% are common. The point is to keep the same proportion of members of each class in each set, which for the current study means the same proportion of bankrupt to non-bankrupt companies. Therefore, before the genetic algorithm was applied, the data set was divided into two sets: 140 companies in the training set and the remaining 40 companies in the validation set.
7.2.1. Decision on initial parameters
In this part the number of variables, the number of rules in each string, the number of chromosomes and the sample size are decided. Each string contains four rules, so the final solution consists of four rules. These rules specify ranges of financial ratios that discriminate bankrupt from non-bankrupt companies. The initial population contains 100 chromosomes. There are 140 companies in the training set and 40 companies in the test set. Two relational operators, (<) and (>), are used in the algorithm; they are coded as 0 for (<) and 1 for (>).
7.2.2. Chromosome structure
The design of a good coding system affects the success of the model.
In the algorithm used in this research, the codes are strings. Each chromosome (string) contains four rules, and each rule contains three genes: the name of the variable (Var), the operator (Op, < or >) and the variable value (x). The structure of a chromosome can be seen in Fig. 4. All three genes are generated randomly: the variable is selected at random, and the variable value is a random number between the lowest and highest values of that variable in the data set. Rules were limited to four of the seventeen ratios. A rule set takes the form: if var1 < x1, and var2 > x2, ..., and varN < xN, then the company is bankrupt.

7.2.3. Initial population
The GA maintains a population of strings chosen at random. This initialization allows the GA to explore the range of all possible solutions, and this tends to favor the most likely solutions. Generally, the population size is set according to the size of the problem, i.e. a bigger population for a larger problem. The common view is that a larger population takes longer to settle on a solution but is more likely to find a global optimum because of its more diverse gene pool. We use 100 strings in the initial population. Defining a fitness function is always application-specific. In this study, the objective of the system is to find a rule that yields the highest hit ratio when the rules are fired across the companies; thus, we define the fitness function as the hit ratio of the rule. The mutation rate is 0.08 for this experiment. Two stopping conditions were used: the algorithm stops if the best chromosome does not change over 30 trials or if the total number of trials reaches its maximum of 400.

7.2.4. Fitness function
The fitness function used in this research was defined as the number of successes in correctly classifying bankrupt and non-bankrupt companies. An appropriate fitness function is crucial for reliable prediction. For this research, the fitness function must use a ratio that combines correct and incorrect predictions.
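The encoding and search procedure of Sections 7.2.1 to 7.2.3 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation, and several details are our assumptions: each company is a hypothetical `(ratios, is_bankrupt)` pair with ratios indexed from 0; the mutation rate of 0.08 is read as the probability of replacing a rule with a fresh random one; a "trial" is read as one generation; and truncation selection with rule-level uniform crossover is our choice, since the text names neither operator. The hit ratio of Section 7.2.3 is the default fitness; any other function with the same `(chromosome, data)` signature can be passed through the `fitness` argument.

```python
import random

# Hypothetical settings mirroring Sections 7.2.1-7.2.3 (names are ours, not the paper's).
N_RULES = 4            # rules per chromosome
POP_SIZE = 100         # strings in the initial population
MUTATION_RATE = 0.08   # assumed: per-rule replacement probability
MAX_GENERATIONS = 400  # hard cap on trials
PATIENCE = 30          # stop if the best fitness has not improved for 30 trials
LESS, GREATER = 0, 1   # operator coding: 0 for "<", 1 for ">"


def stratified_split(data, train_fraction, rng):
    """Split (ratios, is_bankrupt) pairs so both sets keep the class proportions."""
    train, test = [], []
    for label in (True, False):
        group = [d for d in data if d[1] == label]
        rng.shuffle(group)
        k = round(len(group) * train_fraction)
        train += group[:k]
        test += group[k:]
    return train, test


def random_rule(bounds, rng):
    """Gene triple (Var, Op, x); x is drawn between the ratio's min and max in the data."""
    var = rng.randrange(len(bounds))
    lo, hi = bounds[var]
    return (var, rng.choice((LESS, GREATER)), rng.uniform(lo, hi))


def predicts_bankrupt(chromosome, ratios):
    """A company is flagged bankrupt only if every rule in the chromosome holds."""
    return all(ratios[v] < x if op == LESS else ratios[v] > x for v, op, x in chromosome)


def hit_ratio(chromosome, data):
    """Share of companies classified correctly (the Section 7.2.3 fitness)."""
    return sum(predicts_bankrupt(chromosome, r) == y for r, y in data) / len(data)


def evolve(data, fitness=hit_ratio, seed=0):
    """Return (best chromosome, its fitness, number of trials run)."""
    rng = random.Random(seed)
    n_vars = len(data[0][0])
    bounds = [(min(r[v] for r, _ in data), max(r[v] for r, _ in data)) for v in range(n_vars)]
    population = [[random_rule(bounds, rng) for _ in range(N_RULES)] for _ in range(POP_SIZE)]
    best, best_fit, stale = None, -1.0, 0
    for trial in range(1, MAX_GENERATIONS + 1):
        ranked = sorted(population, key=lambda c: fitness(c, data), reverse=True)
        top_fit = fitness(ranked[0], data)
        if top_fit > best_fit:
            best, best_fit, stale = ranked[0], top_fit, 0
        else:
            stale += 1
            if stale >= PATIENCE:  # best chromosome unchanged for 30 trials
                break
        parents = ranked[: POP_SIZE // 2]  # assumed truncation selection, keeps the elite
        children = []
        while len(children) < POP_SIZE - len(parents):
            a, b = rng.choice(parents), rng.choice(parents)
            # Assumed uniform crossover: each rule comes from either parent.
            child = [ra if rng.random() < 0.5 else rb for ra, rb in zip(a, b)]
            # Mutation: replace a rule by a new random rule with probability MUTATION_RATE.
            child = [random_rule(bounds, rng) if rng.random() < MUTATION_RATE else g for g in child]
            children.append(child)
        population = parents + children
    return best, best_fit, trial
```

With the study's 180 companies, `stratified_split(data, 140 / 180, rng)` would give a division of roughly 140/40 with matching class proportions, up to rounding within each class.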
After a thorough search and evaluation of many alternatives, the fitness function introduced in Bostanian (2008) was selected:
$$\text{Fitness} = \frac{\left(1 + \frac{Tp}{Tp + Fp}\right)\left(1 + \frac{Fn}{Fn + Tn}\right)}{4} \tag{4}$$
where true positive (Tp) is the number of cases in which the rule predicts that the company is bankrupt and it is; false positive (Fp) is the number of cases in which the rule predicts that the company is bankrupt and it is not; true negative (Tn) is the number of cases in which the rule predicts that the company is non-bankrupt and it is; and false negative (Fn) is the number of cases in which the rule predicts that the company is non-bankrupt and it is not. Adding 1 to each ratio prevents the fitness from being zero, and dividing by 4 keeps the fitness function between zero and one.

7.2.5. Rule extraction using GAs
After 218 runs the fitness no longer changed, so the first stopping condition was met. The chromosome containing the four rules can be seen in Fig. 5. The final result of this stage for predicting bankruptcy is therefore: if X1 < 0.02, X2 < 0.11, X5 < 0.17, and X16 < 1.12, then the company is bankrupt; otherwise it is non-bankrupt. X1 is the ratio of working capital to total assets, X2 is the ratio of operating income to total assets, X5 is the ratio of owners’ equity to total assets, and X16 is the ratio of operating income to interest expenses.

Fig. 4. The structure of the chromosome used in this study.
Fig. 5. Extracted rules with GA.

7.2.6. Results of classification based on genetic algorithm
The proposed genetic algorithm classified the companies in the training set into bankrupt and non-bankrupt groups with 92% accuracy, i.e. 129 correct predictions out of 140 companies. The model classified bankrupt companies with 91.5% accuracy (43 correct predictions out of 47 bankrupt companies) and non-bankrupt companies with 92.5% accuracy (86 correct predictions out of 93 non-bankrupt companies). For validation, 40 companies were held out; almost 91% of the bankrupt companies and 90% of the non-bankrupt companies in this test set were correctly classified. Although the ANN outperforms the GA, this result still confirms that the proposed GA model is an acceptable method for classifying bankrupt and non-bankrupt companies and can be used successfully as a prediction model. Table 5 shows the classification result of the GA model.

7.3. Classification based on multiple discriminant analysis
For further evaluation of the results attained by ANN and GA, two prediction models based on MDA are also proposed. They are built in two distinct steps: one uses all 17 ratios and the other uses only the 4 ratios arrived at by the GA.

7.3.1. MDA with four ratios from GA
In this attempt, the four ratios obtained from the GA (X1, X2, X5 and X16) were used as inputs for discriminant analysis. Due to the nature of MDA, there is no need to separate the data into a training set and a test set, since the software itself makes such a separation internally. Running MDA yielded the following model.
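Table 6
MDA classification utilizing the four ratios found by GA.

| Set | Measure | Solvency status^a | Predicted by MDA: 0 | Predicted by MDA: 1 | Total | Total precision % |
|---|---|---|---|---|---|---|
| Training set | Number | 0 | 15 | 33 | 48 | 76.4 |
| | | 1 | 0 | 92 | 92 | |
| | % | 0 | 31.2 | 68.8 | 100.0 | |
| | | 1 | 0 | 100.0 | 100.0 | |
| Test set | Number | 0 | 13 | 35 | 48 | 74.3 |
| | | 1 | 1 | 91 | 92 | |
| | % | 0 | 27.1 | 72.9 | 100.0 | |
| | | 1 | 1.1 | 98.9 | 100.0 | |

^a 0: bankrupt, 1: non-bankrupt.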
$$Z = 0.94X_1 + 0.85X_2 + 0.99X_5 + 0.1X_{16}$$

where X1 is working capital/total assets, X2 is operating income/total assets, X5 is owners’ equity/total assets, X16 is operating income/interest expenses, and Z is the total index. If Z < 0.088, the company is in a disaster mode and close to bankruptcy. If Z > 0.46, the company is in the non-bankrupt group. For values of Z between 0.088 and 0.46, the model cannot predict the financial health of the company. Table 6 shows the results found by MDA.

7.3.2. MDA with all ratios
After implementing this model, only three ratios were selected for discrimination: operating income/total assets, net income/total assets, and net income/sales. The model correctly predicts 80.6% of the training set (initial set) and 79.9% of the test set (holdout set).

7.3.3. Model implementation with ratios from utilizing PCA
As mentioned before, PCA is a statistical data-reduction method. Because 17 ratios were used in this research, correlations among them could affect the result. Hence PCA was used as a data-reduction technique to reduce this possible effect. After applying PCA, four ratios were selected: working capital to total assets, sales to inventory, accounts receivable to average daily sales, and market value to total assets. Using MDA with these four ratios, only 52% of the companies were classified correctly, a much weaker result than the one obtained with all ratios.
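Since the GA rule of Section 7.2.5 and the discriminant function of Section 7.3.1 use the same four ratios, both can be applied to one company side by side. The following Python sketch copies the thresholds and weights from the text and takes the lower MDA cut-off as 0.088, the only value that leaves a consistent undetermined band below the upper cut-off of 0.46. The function names, dictionary keys and the example company are hypothetical illustrations, not data or code from the study.

```python
# Four-ratio classifiers reported in Sections 7.2.5 (GA) and 7.3.1 (MDA).
# Keys follow the paper's notation: X1 = working capital / total assets,
# X2 = operating income / total assets, X5 = owners' equity / total assets,
# X16 = operating income / interest expenses.
GA_THRESHOLDS = {"X1": 0.02, "X2": 0.11, "X5": 0.17, "X16": 1.12}
MDA_WEIGHTS = {"X1": 0.94, "X2": 0.85, "X5": 0.99, "X16": 0.1}
LOWER_CUTOFF, UPPER_CUTOFF = 0.088, 0.46


def ga_rule(ratios):
    """Bankrupt only if all four ratios fall below their GA thresholds."""
    if all(ratios[name] < limit for name, limit in GA_THRESHOLDS.items()):
        return "bankrupt"
    return "non-bankrupt"


def mda_z(ratios):
    """Discriminant score Z = 0.94 X1 + 0.85 X2 + 0.99 X5 + 0.1 X16."""
    return sum(w * ratios[name] for name, w in MDA_WEIGHTS.items())


def mda_zone(ratios):
    """Three-zone reading of Z; the middle band is left unclassified."""
    z = mda_z(ratios)
    if z < LOWER_CUTOFF:
        return "bankrupt"
    if z > UPPER_CUTOFF:
        return "non-bankrupt"
    return "undetermined"


# Hypothetical company, not taken from the study's data.
company = {"X1": 0.0, "X2": 0.0, "X5": 0.1, "X16": 0.5}
print(ga_rule(company), mda_zone(company))  # bankrupt undetermined
```

For this hypothetical company the two models disagree: the GA rule flags it as bankrupt, while its Z score of about 0.149 falls in the band where the MDA model makes no prediction.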
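Table 5
Classification result of the proposed GA model.

| Set | Measure | Solvency status^a | Predicted by GA: 0 | Predicted by GA: 1 | Total | Total precision % |
|---|---|---|---|---|---|---|
| Training set | Number | 0 | 43 | 4 | 47 | 92.14 |
| | | 1 | 7 | 86 | 93 | |
| | % | 0 | 91.5 | 8.5 | 100.0 | |
| | | 1 | 7.5 | 92.5 | 100.0 | |
| Test set | Number | 0 | 10 | 1 | 11 | 90 |
| | | 1 | 3 | 26 | 29 | |
| | % | 0 | 91 | 9 | 100.0 | |
| | | 1 | 10 | 90 | 100.0 | |

^a 0: bankrupt, 1: non-bankrupt.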
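The percentages in Table 5 and the fitness of Eq. (4) are all functions of the same four confusion counts. The Python sketch below, a hypothetical helper class not taken from the paper, recomputes the table's figures from those counts. Class 0 (bankrupt) is treated as the positive class, so Tp is the number of bankrupt companies the rule flags as bankrupt. `fitness_eq4` follows Eq. (4) term by term; returning 0 for an empty denominator is our assumption, since the text does not cover that case.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts for a classifier whose positive prediction is 'bankrupt' (class 0)."""
    tp: int  # predicted bankrupt, actually bankrupt
    fp: int  # predicted bankrupt, actually non-bankrupt
    tn: int  # predicted non-bankrupt, actually non-bankrupt
    fn: int  # predicted non-bankrupt, actually bankrupt

    def total_precision(self) -> float:
        """Overall share of correct classifications in percent (the tables' 'total precision')."""
        return 100 * (self.tp + self.tn) / (self.tp + self.fp + self.tn + self.fn)

    def bankrupt_accuracy(self) -> float:
        """Percentage of bankrupt companies classified as bankrupt."""
        return 100 * self.tp / (self.tp + self.fn)

    def non_bankrupt_accuracy(self) -> float:
        """Percentage of non-bankrupt companies classified as non-bankrupt."""
        return 100 * self.tn / (self.tn + self.fp)

    def fitness_eq4(self) -> float:
        """Eq. (4); an empty denominator yields a ratio of 0 (our assumption)."""
        def ratio(num: int, den: int) -> float:
            return num / den if den else 0.0
        return (1 + ratio(self.tp, self.tp + self.fp)) * (1 + ratio(self.fn, self.fn + self.tn)) / 4


# Counts read from Table 5 (GA model).
ga_train = Confusion(tp=43, fp=7, tn=86, fn=4)
ga_test = Confusion(tp=10, fp=3, tn=26, fn=1)

for name, cm in (("training", ga_train), ("test", ga_test)):
    print(name, round(cm.total_precision(), 2),
          round(cm.bankrupt_accuracy(), 1), round(cm.non_bankrupt_accuracy(), 1))
# training 92.14 91.5 92.5
# test 90.0 90.9 89.7
```

The rounded figures agree with Table 5, which reports the test-set class accuracies as 91 and 90. Plugging in the Table 6 counts for the training set, `Confusion(tp=15, fp=0, tn=92, fn=33)`, gives the 76.4% total precision reported there.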
8. Comparison of the results and conclusion
Bankruptcy and failure of companies is an unwanted phenomenon and always an important problem. Prediction of corporate financial distress has long been an object of study in the corporate finance literature. Business failures in general have serious negative economic and social consequences and should therefore be treated seriously. Bankruptcy prediction models also help a manager keep track of a company’s performance over a number of years and identify important trends. The models may not tell the manager specifically what is wrong, but they should encourage managers to identify problems and take effective action to minimize the incidence of failure. A predictive model may warn an auditor of a company’s vulnerability. In addition, lenders may use predictive models to help assess the likelihood of a company defaulting on its loan, and regulatory agencies are concerned with whether a monitored company is in danger of failing. In this study three methods were used: ANN, GA and MDA. The results show that the ANN model outperforms the other two methods. GA is also a powerful method for selecting rules that are based on financial ratios and are more
understandable for management. The superiority of this method over MDA is obvious. Table 7 shows the accuracy reached by each method.

Unavailability of information and difficulty in accessing the required data are the main limitations of this research. The small number of companies listed on the TSE was a barrier to selecting companies from the same industry. Choosing all available bankrupt companies, because of their small number, is another drawback of this research. Using the ANN and GA models when deciding whether to accept companies for listing will help the TSE evaluate the financial health of these companies more accurately. The results obtained with these models also allow banks and financial institutions to make safer decisions when lending money and assigning funds to their customers.

Table 7
The result.

| Method | Total precision %, training set | Total precision %, test set |
|---|---|---|
| ANN | 98.11 | 96.22 |
| GA | 92.5 | 91.5 |
| MDA with four ratios | 76.3 | 74.1 |
| MDA with all ratios | 80.6 | 79.9 |

References
Altman, E. (1983). Corporate financial distress: A complete guide to predicting, avoiding and dealing with bankruptcy. New York: Wiley.
Altman, E. I. (1968). Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. The Journal of Finance, 23(4), 589–609.
Barniv, R., Agarwal, A., & Leach, R. (1997). Predicting the outcome following bankruptcy filing: A three-state classification using neural networks. Intelligent Systems in Accounting, Finance and Management, 6, 177–194.
Beaver, W. H. (1966). Financial ratios as predictors of failure. Journal of Accounting Research, Empirical Research in Accounting: Selected Studies: Supplement, 4, 71–111.
Bell, T. (1997). Neural nets or the logit model? A comparison of each model’s ability to predict commercial bank failures. Intelligent Systems in Accounting, Finance and Management, 6, 249–264.
Bostanian, S. (2008). A genetic model for bankruptcy prediction. MS thesis, Tarbiat Modares University, Iran.
Davis, L. (1991). Handbook of genetic algorithms. New York: Van Nostrand Reinhold.
Deakin, E. B. (1972). A discriminant analysis of predictors of business failure. Journal of Accounting Research, 10(1), 167–179.
Etemadi, H., Anvary Rostamy, A. A., & Farajzadeh Dehkordi, H. (2009). A genetic programming model for bankruptcy prediction: Empirical evidence from Iran. Expert Systems with Applications, 36(2 Part 2), 3199–3207.
Goldberg, D. E. (1989). Genetic algorithms in search, optimization and machine learning. Reading, MA: Addison-Wesley.
Han, I., Jo, H., & Shin, K. S. (1997). The hybrid systems for credit rating. Journal of the Korean Operations Research and Management Science Society, 22(3), 163–173.
Han, J. W., & Kamber, M. (2001). Data mining: Concepts and techniques. San Francisco: Morgan Kaufmann.
Holland, J. H. (1975). Adaptation in natural and artificial systems. Ann Arbor: The University of Michigan Press.
Johnson, R. A., & Wichern, D. W. (2002). Applied multivariate statistical analysis (5th ed.). Prentice Hall.
Jolliffe, I. T. (2002). Principal component analysis (2nd ed.). Springer series in statistics. New York: Springer.
Kim, M. J., & Han, I. (2003). The discovery of experts’ decision rules from qualitative bankruptcy data using genetic algorithms. Expert Systems with Applications, 25(4), 637–646.
Min, J. H., & Jeong, C. (2009). A binary classification method for bankruptcy prediction. Expert Systems with Applications, 36(3), 5256–5263.
Nazzal, J. M., El-Emary, I. M., & Najim, S. A. (2008). Multilayer perceptron neural network (MLPs) for analyzing the properties of Jordan oil shale. World Applied Sciences Journal, 5(5), 546–552.
Nguyen, H. H., & Chan, C. W. (2004). A comparison of data preprocessing strategies for neural network modeling of oil production prediction. In Third IEEE international conference on Cognitive informatics (ICCI’04) (pp. 199–207).
Odom, M., & Sharda, R. (1990). A neural networks model for bankruptcy prediction. In Proceedings of the IEEE international conference on neural network (Vol. 2, pp. 163–168).
Peterson, P. (1999). Financial ratio analysis. Faculty of Finance, Florida Atlantic University.
Rasoolzadeh, M. (2001). Application of Altman’s model for bankruptcy prediction of textile listed companies. Tadbeer, 12(120), 105–107.
Salchenberger, L., Cinar, E., & Lash, N. (1992). Neural networks: A new tool for predicting thrift failures. Decision Sciences, 23, 899–916.
Shin, K., Shin, T., & Han, I. (1998). Neuro-genetic approach for bankruptcy prediction: A comparison to back-propagation algorithms. In Proceedings of the International Conference of the Korea Society of Management Information Systems (pp. 585–597). Seoul, South Korea.
Shin, K. S., & Lee, Y.-J. (2002). A genetic algorithm application in bankruptcy prediction modeling. Expert Systems with Applications, 23(3), 321–328.
Singh, V., Gupta, I., & Gupta, H. O. (2007). ANN-based estimator for distillation using Levenberg–Marquardt approach. Engineering Applications of Artificial Intelligence, 20, 249–259.
Sohrabi Eraghi, M. (2008). A model for prediction financial health of Iranian companies. PhD thesis, Allameh Tabatabaei University, Department of Accounting.
Tam, K., & Kiang, M. (1992). Managerial applications of neural networks: The case of bank failure predictions. Management Science, 38(7), 926–947.
Varetto, F. (1998). Genetic algorithm applications in the analysis of insolvency risk. Journal of Banking and Finance, 22, 1421–1439.
Wang, K., Yang, J., Shi, G., & Wang, Q. (2008). An expanded training set based validation method to avoid over fitting for neural network classifier. In Fourth international conference on natural computation (Vol. 3, pp. 83–87).
Zhang, G., Hu, Y. M., Patuwo, E. B., & Indro, C. D. (1999). Artificial neural networks in bankruptcy prediction: General framework and cross validation analysis. European Journal of Operational Research, 116, 16–32.
Zhang, G. P. (2000). Neural networks for classification: A survey. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 30(4).
Zhang, J. (2009). Interweave neural networks with evolutionary algorithms, cellular computing, Bayesian learning and ensemble learning. In International conference on business intelligence and financial engineering, Beijing, China. ISBN: 978-07695-3705-4.