Comparison of country risk models: hybrid neural networks, logit models, discriminant analysis and cluster techniques


Expert Systems with Applications 28 (2005) 137–148 www.elsevier.com/locate/eswa

Juliana Yim*, Heather Mitchell
School of Economics and Finance, RMIT University, 239 Bourke Street, Melbourne, Vic. 3000, Australia

Abstract

This paper looks at the ability of a relatively new technique, hybrid ANNs, to predict country risk rating. These models are compared with traditional statistical techniques and conventional ANN models. The performance of hierarchical cluster analysis and another type of ANN, the self-organizing map, was also investigated as a possible method for country risk analysis with visual effects. The results indicate that hybrid neural networks outperform all other models. This suggests that for researchers, policymakers and others interested in early warning systems, hybrid networks may be a useful tool for country risk analysis.

© 2004 Published by Elsevier Ltd.

JEL classification: G2

Keywords: Hybrid neural networks; Kohonen networks; Country risk; Early warning systems

1. Introduction

The last two decades have seen several developing countries around the world in external debt crises (Sachs, 1996). After the oil crises, the loans provided by official institutions such as the World Bank and the IMF were not sufficient to cover the large foreign debt of these economies. As a result, emerging economies sharply increased their demand for commercial bank loans. Because of the large amount of debt payment rescheduling in the early 1980s, commercial banks, policymakers and researchers are now more interested in evaluating country risk.

This study investigated whether hybrid artificial neural networks can outperform traditional classification models, such as discriminant analysis (DA), the logit model and the probit model, as well as ordinary neural networks, for predicting country risk rating. The hybrid neural network is a relatively new technique. This type of network is formed by integrating the variables selected by the statistical models and the outputs of statistical models with those of an ordinary neural network to create hybrid models that might be more accurate than either of the techniques used separately. The performance of hierarchical cluster analysis and another type of ANN, the self-organizing map (SOM), was also investigated as a possible method for country risk analysis with visual effects.

Prior studies generally test and evaluate country risk models using three popular standard statistical techniques: DA, logit or probit models, and cluster analysis. Frank and Cline (1971) and Grinols (1976) employed DA; Feder and Just (1977) used probit and logit models. There are few comparative studies that have investigated the predictive power of statistical models for country risk rating. Saini and Bates (1978) compared logit models and DA models and found that the results from both models were very similar. Schmidt (1984), however, compared DA, logit models and cluster analysis and found that the logit model was superior to DA and cluster analysis.

Logit, probit and DA models require assumptions such as normality of the data and independence of the predictors. In particular, DA assumes that the covariance matrices for the failed and non-failed groups are the same. When the data do not satisfy these assumptions, both logit and DA provide non-optimal solutions (Altman, 1968; Ohlson, 1980). On the other hand, non-parametric and non-linear models, such as artificial neural networks (ANNs), do not rely on these assumptions, which are often adopted to make traditional statistical methods tractable.

Most of the papers that have been published on the comparison of ANNs and statistical models for country risk prediction indicated that ANNs outperformed statistical models. Cosset and Roy (1994) applied ANN and logit models to a sample of 76 countries during the period 1983–1985. Predictions of both types of model were then compared for 1986. The selected variables were reserves per imports, net foreign debt per exports, GNP per capita, current account/GNP, propensity to invest, export variability, export growth rate and political instability. The results show that the ANN was more accurate in predicting country risk rating than logistic regression. The substantially better results from the ANN models confirm their potential for more accurate economic and financial forecasting.

Cooper (1999) explores the success of artificial neural networks relative to DA, logit and probit analyses in identifying countries likely to seek a rescheduling of their sovereign debt. The sample consisted of 70 countries, 22 of which rescheduled their debt during 1983. The variables selected were the average annual increase in real GNP per capita over 1960–1982, the external debt due within one year as a percentage of the exports of goods and services, the interest on total external debt plus amortization on long-term debt as a percentage of the exports of goods and services, and international reserves divided by total imports. The results show that DA, logit, probit and ANN correctly classified approximately 80, 85, 85 and 90% of the cases, respectively.

According to the studies discussed above, the neural network approach has proven itself to be at least as good as, and in most cases superior to, traditional statistical models for predicting country risk. Again, the idea of this study is to combine statistical models with ANNs in order to obtain better predictions. Han, Kwon, and Lee (1996) introduced hybrid neural networks, which combine neural network models with other statistical or artificial intelligence models, and found that hybrid neural network models are very powerful for bankruptcy prediction. Markham and Ragsdale (1995) combined output estimated by DA with an ANN and concluded that the hybrid network performed better than the DA and the ANN used individually. The general conclusion of the above literature is that neural networks have potential as a forecasting tool and that their integration with other statistical techniques might improve their overall performance. The authors are not aware of any study which has used hybrid ANNs for predicting country risk rating.

This paper is organized as follows. Section 2 introduces the hybrid neural networks and Kohonen networks. Section 3 describes the data. Section 4 presents the results of the estimated models. Section 5 presents a comparison of all models. Section 6 presents concluding comments.

* Corresponding author. Tel.: +61 405662120; fax: +61 99255986. E-mail address: [email protected] (J. Yim).
0957-4174/$ - see front matter © 2004 Published by Elsevier Ltd. doi:10.1016/j.eswa.2004.08.005

2. Artificial neural network models

2.1. Hybrid neural networks

This section introduces a method, the hybrid neural network, to be applied to classification problems. It integrates the variables selected by the statistical models and the outputs of statistical models with those of an ordinary neural network to create hybrid models that might be more accurate than either of the techniques considered separately. The ordinary ANN used in this study is a multilayer perceptron network (MLP), trained by a gradient descent algorithm called backpropagation¹ (Rumelhart, McClelland, & PDP Group, 1986). The MLP is the most common type of formulation and is used for problems that involve supervised learning.

We will consider two different approaches to hybrid models. The first approach is to use statistical models to select the variables to be used as inputs to the ANN. The second is to use output from a statistical model, such as an estimated probability, as an input to a neural net. We decided to combine ANNs and statistical models because ANNs have problems when dealing with large numbers of variables. These are the time taken to select the model and the possibility of overfitting. By combining statistical models with ANNs we can reduce the problems in the following ways:

• Using statistical models to preselect variables reduces the risk of overfitting and also reduces the time taken to select the model.
• Using output from a statistical model as input to an ANN efficiently condenses information.

To denote the various hybrid models we use the following notation. If a statistical model is used as a preprocessor for selecting the variables, we add its name to ANN. So ANN–DA is a hybrid artificial neural network that used discriminant analysis to select the input variables. When we wish to indicate that the probability from a statistical model is used as an input, we put a 'P' before its name before adding it to ANN. The two effects can also be used in combination: ANN–Logit–PDA is a hybrid ANN which used a logit model to preselect the variables and has the probability from a DA model as an additional input.

¹ Details of the backpropagation method are given in the Appendix.

2.2. Kohonen networks

The Kohonen network (Kohonen, 1990) is a relatively new clustering technique, which aims to group a set of input patterns into a number of unknown groups such that the observations in each group possess similar characteristics. The end result is a map which allows the visualization and interpretation of clusters. These types of networks are trained using unsupervised competitive learning². During the training process the network has no information about the desired outputs, and the training process is based on the competition between the output units.

This network consists of two layers: the input layer and the output layer. The input layer presents an input pattern to all the output neurons. Each neuron has a synaptic weight vector which gives each element of the input vector a weight. When the Kohonen map is initialised, a random value is assigned to each weight of the vector of each neuron. The patterns, randomly ordered, are presented over and over again to the network. One presentation of all patterns is called an epoch. In the first stage, the neighbourhood size was set to a relatively high initial value (10) to achieve global ordering. The learning rate function type was chosen as linear, which means that during learning the learning rate decreases linearly from its initial value to 0.01.

Once the training is completed, the weights are frozen and the network is ready to be tested. When a new observation is introduced, each neuron calculates the distance between the input vector and its weight vector. The neuron whose distance is smallest is the one that identifies this pattern. On the map, neurons that identify similar input patterns will appear close to one another (Martin and Serrano Cinca, 1994). This study used the Teruel program prepared by Mendrano and Martin (2001).

² Details of the Kohonen method are given in the Appendix.
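The two hybridisation approaches of Section 2.1 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation (the study used commercial software): the logistic transfer function, the omission of bias terms, all weights and all input values are hypothetical, and the function names `hybrid_inputs` and `mlp_forward` are introduced here only for the example.

```python
import math


def sigmoid(a):
    """Logistic transfer function (an assumed choice for f)."""
    return 1.0 / (1.0 + math.exp(-a))


def hybrid_inputs(selected, stat_probability=None):
    """Build the input vector of a hybrid network.

    selected: indicator values pre-selected by a statistical model
        (first approach, e.g. the 'Logit' in ANN-Logit).
    stat_probability: optional probability estimated by a statistical
        model, appended as an extra input (second approach, e.g. the
        'PDA' in ANN-Logit-PDA).
    """
    x = list(selected)
    if stat_probability is not None:
        x.append(stat_probability)
    return x


def mlp_forward(x, hidden_weights, output_weights):
    """One hidden layer and one output neuron; bias terms are omitted."""
    hidden = [sigmoid(sum(v * xi for v, xi in zip(row, x)))
              for row in hidden_weights]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))


# Hypothetical standardised indicators chosen by a logit model, plus a
# hypothetical DA probability: the inputs of an ANN-Logit-PDA network.
x = hybrid_inputs([0.2, -0.1, 0.4, -0.3], stat_probability=0.85)
V = [[0.5, -0.4, 0.3, -0.6, 1.2],    # hypothetical weights, hidden neuron 1
     [-0.2, 0.1, -0.5, 0.7, -0.9]]   # hypothetical weights, hidden neuron 2
W = [1.5, -1.1]                      # hypothetical hidden-to-output weights
p_high_risk = mlp_forward(x, V, W)   # network output, a value in (0, 1)
```

Passing `stat_probability=None` gives the first approach alone (for example ANN–DA), while passing every available indicator together with a probability corresponds to a network such as ANN–PDA.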

3. Description of the data

The data was collected from a report produced by Standard and Poor's Ratings Direct, available from the following web site: www.standardandpoors.com/ratingsdirect. The sample consisted of 20 high risk countries and 32 low risk countries for 2002. The holdout sample consisted of five high risk countries and 15 low risk countries. A full listing of all the countries used is given in Table A3 of the appendix.

Following Cosset and Roy (1991, 1994), the objective of this study is to predict the country risk rating produced by Euromoney. This is done because it is difficult to find information on rescheduling events, as this is sometimes kept confidential. The credit rating score varies from 0 to 100. In this study, countries with a score lower than or equal to 50 were classified as high risk countries and those with a score above 50 were considered low risk countries.

The economic indicators tested were derived from Standard and Poor's Ratings Direct and the political indicator was collected from the Euromoney publication (March 2003). Thirty-three indicators were considered when setting up the failure prediction models. As can be observed in Table 1, the data is classified in five categories: economic, balance of payments, external debt, government and political risk. Table A1 of Appendix A shows the descriptive statistics of the indicators. The Jarque–Bera statistic indicates that most of the indicators are non-normal.

Table 1
Indicators for country risk rating
Categories: economic; general government; balance of payments; external; political risk.

| Indicator | Code |
|---|---|
| GDP per capita (US$) | GDPP |
| Savings/GDP (%) | SAVG |
| Investment/GDP (%) | INVG |
| Real GDP (% change) | RGDP |
| Real investment (% change) | RINV |
| Unemployment (% of workforce) | UNEM |
| Consumer price index (% change) | CPI |
| Net debt/GDP (%) | NDEG |
| Debt net of government deposits/GDP (%) | DGDG |
| Gross debt/GDP (%) | GDG |
| Surplus (deficit)/GDP (%) | SURG |
| Primary balance/GDP (%) | PRBG |
| Revenue/GDP (%) | REVG |
| Spending/GDP (%) | SPEG |
| Interest/GDP (%) | INTG |
| Current account receipts/GDP (%) | CABG |
| Real exports (% change) | REXP |
| Current account balance/exports (%) | CABE |
| Current account balance/current account receipts (%) | CABC |
| Net borrowing/current account receipts (%) | NBOE |
| Reserves/imports (months) | RESI |
| Gross financing gap (% of reserves) | GFP |
| Net FDI/GDP (%) | NFDI |
| Net external liabilities/exports (%) | NELE |
| Gross external debt/exports (%) | GEDE |
| Net external debt/exports (%) | TEDE |
| Narrow net external debt/exports (%) | NNED |
| Net public-sector external debt/exports (%) | NPED |
| Net investment payments/exports (%) | NIPE |
| Net interest payments/exports (%) | NINT |
| Political risk (t−1) | POLR |

4. Empirical investigation: predicting country risk rating

4.1. Discriminant analysis

First we will estimate an optimum discriminant function and determine whether or not it is statistically significant.

Several different models³ were estimated using the stepwise procedure for selecting the best variables. Further details of the stepwise estimation are reported in Table A2. As can be observed in Table 2, the best discriminant function was found to contain two variables: gross external debt/exports (%) and political risk from the previous year. Table 2 shows that only the two selected variables are significant. Based on the Jarque–Bera statistics in Table A1, the variable political risk (t−1) is normal but the variable gross external debt/exports (%) is not normal. The value of Box's M test (see Norusis, 1990 for further details) is M = 4.427, p-value = 0.239 using the F approximation, so the assumption of equal group covariance matrices is satisfied. This means that the tests of model adequacy which follow should be reasonably reliable. The discriminant function for the best model is significant and displays a canonical correlation of 0.841, showing that 70.7% of the variance in the dependent variable can be accounted for by this model. The model produced a Wilks' lambda of 0.293, which indicates that the selected discriminant function is significant.

³ Estimated using the software package SPSS 10.0.

Table 2
The best discriminant model

| Step | Variable | Lambda | F | Sig. |
|---|---|---|---|---|
| 1 | Political risk (t−1) | 0.382 | 80.893 | 5.025 × 10^−12 |
| 2 | Gross external debt/exports (%) | 0.293 | 59.147 | 8.602 × 10^−14 |

Z = 3.259 + 0.005 GEDE − 0.299 POLR

For this study, the critical cutting score is zero, so a country is classified as a low risk case if its discriminant score is positive and as a high risk case if its discriminant score is negative. Only two countries were misclassified, Kazakhstan and Panama. Both had an estimated probability of being a high risk country of more than 0.7. The overall success rate of the model was 96.2%. More specifically, the success rate of predicting high risk cases was 100% and that of low risk cases, 94%. In the holdout sample the best DA model correctly classified 80% of the high risk cases and 100% of the low risk cases. So the model produced good results.

4.2. Logit and probit models

This section analyses the predictive ability of logit and probit models for identifying low risk and high risk countries. Several different models⁴ were estimated by experimentation. The variables used are CPI (average % change), current account balance/GDP (%), reserves/imports (months) and political risk from the previous year. The final logistic function is⁵:

$$\Pr(\text{failure}) = \frac{1}{1 + e^{-Z_i}}$$

$$Z_i = \underset{(0.0191)}{0.76}\,\mathrm{CPI} - \underset{(0.0553)}{0.92}\,\mathrm{CABE} + \underset{(0.0702)}{0.54}\,\mathrm{RESI} - \underset{(0.0354)}{0.88}\,\mathrm{POLR}_{t-1}$$

The final probit function is:

$$\Pr(\text{failure}) = \int_{-\infty}^{X_i\beta} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right) dz$$

$$X_i\beta = \underset{(0.0123)}{0.43}\,\mathrm{CPI} - \underset{(0.0539)}{0.52}\,\mathrm{CABE} + \underset{(0.0705)}{0.30}\,\mathrm{RESI} - \underset{(0.0299)}{0.49}\,\mathrm{POLR}_{t-1}$$

The Hosmer and Lemeshow (1989) goodness-of-fit test for the logit and probit models has a p-value above 0.8, which is greater than 0.05, implying that the models' estimates fit the data at an acceptable level. For the logit and probit models, the countries were classified as high risk if the probability of being a high risk country exceeded a cutoff point of 0.5. The model misclassified Belize with a probability of 0.12, Kazakhstan with a probability of 0.73 and Peru with a probability of 0.39.

The overall success rate of the model was 94%. More specifically, the success rate of predicting high risk cases was 97% and that of low risk cases, 90%. In the holdout sample the best logit model correctly classified 60% of the high risk cases and 100% of the low risk cases. So the results from the logit and probit models were inferior to DA.

⁴ Estimated using the software package SPSS 10.0.
⁵ Figures in brackets are p-values.
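As a worked illustration of the logit function reported in Section 4.2, the short Python sketch below evaluates the score, converts it to a probability and applies the 0.5 cutoff. The coefficients are those of the final logistic function; the two sets of indicator values are hypothetical and are not taken from the data set, and the names `logit_score`, `failure_probability` and `classify` are introduced only for this example.

```python
import math

# Coefficients of the final logistic function (no intercept is reported).
LOGIT_COEFFICIENTS = {"CPI": 0.76, "CABE": -0.92, "RESI": 0.54, "POLR": -0.88}
CUTOFF = 0.5  # classification cutoff used for the logit and probit models


def logit_score(indicators):
    """Linear score Z_i for one country."""
    return sum(coef * indicators[name]
               for name, coef in LOGIT_COEFFICIENTS.items())


def failure_probability(indicators):
    """Pr(failure) = 1 / (1 + exp(-Z_i))."""
    return 1.0 / (1.0 + math.exp(-logit_score(indicators)))


def classify(indicators):
    """High risk if the estimated probability exceeds the cutoff."""
    return "high risk" if failure_probability(indicators) > CUTOFF else "low risk"


# Hypothetical indicator values, for illustration only.
country_a = {"CPI": 14.0, "CABE": -4.0, "RESI": 4.0, "POLR": 8.0}
country_b = {"CPI": 2.0, "CABE": 2.0, "RESI": 5.0, "POLR": 17.0}

z_a = logit_score(country_a)   # 10.64 + 3.68 + 2.16 - 7.04 = 9.44
z_b = logit_score(country_b)   # 1.52 - 1.84 + 2.70 - 14.96 = -12.58
label_a = classify(country_a)  # "high risk"
label_b = classify(country_b)  # "low risk"
```

Because the political risk coefficient is negative and large relative to the others, a higher political risk score from the previous year pulls the estimated probability of failure down sharply in this sketch.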

4.3. ANN models

This section analyses the predictive ability of ANNs and hybrid ANNs for predicting country risk rating. From all ANN models⁶ tried, the best specification consisted of six variables in the input layer and one hidden layer with two neurons. The selected input variables were CPI (average % change), current account balance/GDP (%), reserves/imports (months), gross external debt/exports (%) and political risk (t−1). Two neurons were selected in the hidden layer by experimentation. Learning rates and momentum for each of the models ranged from 0.5 to 0.7. The countries were classified as high risk if the probability exceeded a cutoff point of 0.5. Only two countries were misclassified, Kazakhstan and Panama. Both had an estimated probability of being a high risk country of more than 0.7. The success rate of predicting high risk cases was 100% and that of low risk cases, 94%. In the holdout sample the best ANN model correctly classified 80% of the high risk cases and 100% of the low risk cases. So the model is as good as DA.

⁶ Estimated using the software package Neuroshell 2.

4.4. Hybrid ANN

The final topologies chosen for each of the hybrid networks are given in Table A4 of the appendix. Table 3 indicates that Belize was misclassified only by ANN–DA–Plogit and ANN–Plogit. Uruguay and Croatia were misclassified only by ANN–Logit and ANN–Logit–Plogit, respectively. The hybrid models that misclassified the fewest high risk countries were the ANN–Logit–Plogit, ANN–DA, ANN–DA–PDA, ANN–Logit–PDA and ANN–PDA networks. For these, the success rate of predicting high risk countries was 100% and that of low risk countries, 94%. The hybrid model that misclassified the fewest low risk countries was ANN–Plogit. Its success rate of predicting high risk cases was 95% and that of low risk cases, 100%.

The probit models were not used for the hybrid models because the estimated probit models used the same variables as the logit models. Also, both models misclassified the same countries and the estimated probabilities of rescheduling were almost the same. This is not unexpected, as results from these two models are unlikely to differ for the sample size used here (Maddala, 1983). So it was decided to consider only the probability of failure from the logit models to build the hybrid networks.

In the holdout sample the best hybrid ANN model, ANN–Logit–Plogit, correctly classified 100% of the high risk cases and 100% of the low risk cases. So the model is superior to the statistical models and the ordinary ANN.

Table 3
Probability of rescheduling for misclassified countries

| Country | Status | ANN–Logit | ANN–DA | ANN–Logit–Plogit | ANN–DA–PDA | ANN–Logit–PDA | ANN–DA–Plogit | ANN–Plogit | ANN–PDA |
|---|---|---|---|---|---|---|---|---|---|
| Belize | High risk | 0.90 | 0.92 | 0.83 | 0.84 | 0.83 | 0.43* | 0.43* | 0.80 |
| Uruguay | High risk | 0.06* | 0.81 | 0.78 | 0.79 | 0.75 | 1.00 | 1.00 | 0.77 |
| Croatia | Low risk | 0.17 | 0.12 | 0.60* | 0.09 | 0.06 | 0.13 | 0.13 | 0.08 |
| Kazakhstan | Low risk | 0.95* | 0.71* | 0.97* | 0.71* | 0.75* | 0.62* | 0.46 | 0.70* |
| Panama | Low risk | 0.11 | 0.97* | 0.12 | 0.85* | 0.83* | 0.53* | 0.49 | 0.81* |

* Figures marked with an asterisk represent a misclassification at the 0.5 cutoff.

4.5. Cluster techniques

Cluster analysis and Kohonen maps support the results obtained by DA, logit, probit and ANNs in identifying high risk and low risk countries, by showing the distinct areas of high risk cases and low risk cases graphically. The methods chosen also offer a way to visualize the results. By using these methods we can overcome the problems associated with finding the appropriate distribution and the functional form of the economic and political indicators.

4.5.1. Hierarchical cluster analysis

For the hierarchical procedure, Ward's algorithm applied to squared Euclidean distances between 52 countries produced the dendrogram in Fig. 1. The codes for each country can be observed in Table A3 of Appendix A. The variables selected for composing the best model were CPI (average % change), current account balance/GDP (%), reserves/imports (months) and political risk in the previous period. The best variable was political risk in the previous period. The idea is that those indicators should be relatively homogeneous among the countries that belong to the same cluster.

The dendrogram in Fig. 1 may be separated into two clusters: (1) high risk countries and (2) low risk countries. Uruguay was the only misclassified country. It was classified as a low risk country instead of a high risk country. The model correctly classified 94% of the high risk countries and 100% of the low risk countries. For the holdout sample, the model misclassified one high risk country, Jordan. The model correctly classified 80% of the high risk countries and 100% of the low risk countries⁷.

Fig. 1. Dendrogram of country risk.

Fig. 2. Solvency map for country risk.

Fig. 3. Weight map for country risk.
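The hierarchical procedure of Section 4.5.1 can be sketched in a few lines of Python. This is a naive illustration of Ward's criterion on squared Euclidean distances, not the SPSS routine used in the study: at each step it merges the two clusters whose union gives the smallest increase in the within-cluster sum of squares. The six two-indicator profiles are hypothetical, and the function names are introduced only for this example.

```python
def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((u - v) ** 2 for u, v in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_clusters(data, n_clusters):
    """Agglomerate observations until n_clusters remain; returns index lists."""
    clusters = [[i] for i in range(len(data))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                cost = ward_cost([data[i] for i in clusters[a]],
                                 [data[i] for i in clusters[b]])
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] += clusters[b]  # merge the cheapest pair
        del clusters[b]             # b > a, so index a is unaffected
    return [sorted(c) for c in clusters]


# Hypothetical standardised profiles for six countries (two indicators each).
profiles = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
            [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
groups = ward_clusters(profiles, 2)
```

Cutting the hierarchy at two clusters, as done with the dendrogram in the paper, corresponds to calling `ward_clusters` with `n_clusters=2`; recording the merge costs at each step would give the heights needed to draw a dendrogram.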

⁷ The cluster classification in the holdout sample for each country can be found in Table 11.5.A of the appendix of this chapter on the CD-ROM.

4.5.2. Kohonen networks

The number of training iterations was set to 13,000 and the optimal learning rate was found to be 0.05. During both phases the neighbourhood gradually decreases from its initial value to one. The initial neighbourhood size was set to 8. Fig. 2 shows that the model misclassified two of the low risk countries, Costa Rica and Kazakhstan. For the holdout sample the model misclassified one of the high risk countries, Jordan, and one of the low risk countries, Egypt.

From this map the relationships between the input variables can easily be seen, allowing us to delimit several regions. Comparing Fig. 2 (location of low risk and high risk countries) and Fig. 3 (distribution of synaptic weights) shows that, of the variables composing the best model, CPI (average % change), current account balance/GDP (%) and reserves/imports (months) were not able to discriminate between high risk and low risk cases. The good predictor of country risk was political risk (t−1). The model misclassified one low risk case. The model correctly classified 100% of the high risk countries and 93.7% of the low risk countries. For the holdout sample the model correctly classified 80% of the high risk countries and 93.3% of the low risk countries.

Therefore, the results from the cluster analysis and Kohonen maps were very similar, but they misclassified different countries. The Kohonen maps can provide more information in an intuitive way by means of the weight maps. From these maps we found that political risk is the best indicator in predicting country risk.

5. Comparison of the models

According to Table 4, the results from the ANN were very similar to those from the DA model. The performance of the ANN was the same when the hybridization with DA and the logit model was considered. The results from cluster analysis were as good as those from Kohonen maps. The results from the cluster techniques were inferior to DA and the hybrid ANNs. The ANN–Logit–Plogit network was the best model. This model correctly classified all countries. The test for the statistical significance of the hit rate strongly rejects the null hypothesis that the model predicts no better than a non-informative model at the 1% level of significance for all the hybrid models (for full details of this test see Franses, 2000). Therefore, the holdout test results for hybrid ANNs are very encouraging.

Table 4
Classification accuracy from the best models

| Best model | In-sample: low risk countries correctly classified (%) | In-sample: high risk countries correctly classified (%) | Holdout sample: low risk countries correctly classified (%) | Holdout sample: high risk countries correctly classified (%) |
|---|---|---|---|---|
| DA | 94.0 | 100.0 | 100.0 | 80.0 |
| Logit | 90.0 | 97.0 | 100.0 | 60.0 |
| ANN | 90.0 | 97.0 | 100.0 | 80.0 |
| ANN–Logit–Plogit | 94.0 | 100.0 | 100.0 | 100.0 |
| Cluster | 94.0 | 100.0 | 100.0 | 80.0 |
| Kohonen | 94.0 | 100.0 | 93.0 | 80.0 |

6. Conclusions

This study investigated whether two artificial neural networks, multilayer perceptron and hybrid networks, can outperform traditional statistical models for predicting country risk rating. Cluster techniques were also used for making predictions with visual effects. The in-sample results indicate that, except for the logit and probit models, DA and ANNs worked very well for predicting country risk in the holdout sample. The hybrid ANN, ANN–Logit–Plogit, however, produced the best results in the holdout sample. The Kohonen and cluster analyses worked very well, but they were also inferior to the best hybrid ANN. This supports the conclusion that for researchers, policymakers and others interested in early warning systems, hybrid networks would be useful.

Acknowledgements

The authors would like to thank Michael McKenzie for helpful comments.

Appendix A A.1. Backpropagation algorithm The first phase of the backpropagation algorithm consists of repeatedly presenting the network with examples of input and expected output. Suppose that the qth neuron of the hidden layer receives the activation signal, Hq, given by X Hq Z vqj xj ; j

where xj is the signal to the input neuron j and vqj is the weight of the connection between input neuron j and the hidden neuron q. This activation signal is then transformed by a transfer function f in the hidden layer to give the output hq Z f ðHq Þ:

144

J. Yim, H. Mitchell / Expert Systems with Applications 28 (2005) 137–148

Table A1 Descriptive statistics Indicator

Rescheduling countries

GDP per capita (US$) Savings/GDP (%) Investment/GDP (%) Real GDP (% change) Real investment (% change) Unemployment (% of workforce) Consumer price index (% change) Net debt/GDP (%) Debt net of government deposits/GDP (%) Gross debt/GDP (%) Surplus (deficit)/GDP (%) Primary balance/GDP (% Revenue/GDP (%) Spending/GDP (%) Interest/GDP (%) Current account receipts/GDP (%) Real exports (% change) Current account balance/exports (%) Current account balance/current account receipts (%) Net borrowing/ Current account receipts (%) Reserves/imports (months) Gross financing gap (% of reserves) Net FDI/GDP (%) Net external liabilities/exports (%) Gross external debt/exports (%) Net external debt/exports (%) Narrow net external debt/exports (%) Net public-sector external debt/ exports (%) Net investment payments/exports (%) Net interest payments/exports (%) Political risktK1

Non-rescheduling countries

Mean

Std Dev.

Skewness

Kurtosis

2037.45 18.11 20.39 1.95 5.28 10.78 14.00 69.95 71.20

1397.67 5.55 5.29 3.25 10.15 4.90 18.00 34.97 34.60

0.49 K0.12 0.13 K2.65 K0.27 1.32 2.46 1.07 1.08

2.06 1.99 1.98 10.31 4.46 4.28 8.26 4.27 4.41

1.53 0.82 0.83 68.02 1.81 5.74 43.21 5.17 5.57

76.80 K4.62 1.23 25.85 30.25 5.95 38.15 3.21 K4.36 K20.36

36.52 4.78 3.26 5.71 8.23 6.00 17.28 3.89 9.38 56.24

0.95 K1.88 0.59 K0.37 0.23 1.86 0.33 K0.87 K3.24 K3.90

3.94 5.88 4.67 2.07 2.57 5.57 1.82 5.29 13.93 16.89

3.75 18.76 3.49 1.18 0.33 17.08 1.53 6.21 134.62 211.64

50.19 K1.43 1.63 33.19 34.66 2.68 57.44 0.50 1.97 1.58

12.53

55.94

4.03

17.52

229.73

K2.91

4.07 176.15 2.41 225.74 222.50 109.80 141.55 98.45

1.80 131.76 2.37 140.77 115.32 88.82 73.14 59.83

0.78 0.90 0.88 0.99 0.82 K1.11 0.23 0.23

3.40 2.83 3.53 3.01 3.41 4.63 2.28 2.10

2.15 2.72 2.81 3.08 2.41 6.28 0.60 0.85

4.84 114.80 2.19 50.29 140.13 9.19 36.09 K9.78

10.19 6.90 8.41

8.51 6.36 2.63

0.50 0.60 0.50

2.51 2.63 3.59

1.02 1.33 1.11

2.51 2.17 17.44

The output neuron i now receives the activation signal, Oi, from the hidden nodes given by X Oi Z wiq hq ;

JB

Mean

Std Dev.

Skewness

Kurtosis

JB

2.25 2.76 4.99 2.79 22.62 2.68 29.10 6.68 6.97

4.43 1.97 6.45 0.16 559.26 0.86 1055.53 37.50 21.30

32.09 1.59 4.99 1.57 4.71 1.48 12.38 0.13 10.40 0.06 1.81 0.32 31.45 2.23 20.23 K4.57 9.13 1.69 14.40 0.92

6.16 6.01 5.69 2.00 1.69 2.08 10.53 23.45 5.34 4.18

26.87 25.14 20.61 1.41 2.30 1.64 102.18 585.49 22.52 6.38

9.83 K0.79

4.40

5.70

2.57 2.37 1.71 K0.40 1.77 K1.02 K0.63 K1.92

11.61 9.57 6.61 2.50 5.78 6.16 5.87 8.44

134.07 68.28 31.89 1.17 26.92 18.94 13.09 59.08

8.86 K1.72 3.57 K0.39 3.97 0.54

6.87 5.95 2.31

35.72 11.63 2.19

12342.90 12220.71 0.85 24.03 8.29 0.61 22.45 7.14 K0.51 2.99 1.75 K0.13 K0.16 20.09 K4.41 8.28 4.76 0.41 K0.10 18.37 K5.25 10.45 65.89 K1.97 38.74 37.69 0.42

4.21 191.73 3.38 99.82 117.86 107.35 111.78 85.46

used by the backpropagation is given by Dwiq Z Kg

vEðwÞ ; vwiq

q

where wiq is the weight of the connection between hidden neuron q and output neuron i. This is transformed again to give the output signal oi Z f ðOi:Þ This is then compared with the desired, or actual value of the output neuron, and the function of squared errors for each node, which is to be minimized, is given by X EðwÞ Z 12 ðdi K oi Þ2 ; i

In the second phase the weights are modified to reduce the squared error. The change in weights, Dwiq,

where 0!g!1 is the learning rate. Using the chain rule, it can easily be shown that Dwiq Z Kg

vE voi vOi Z gðdi K oi Þf 0 ðOi Þhq Z gdoi hq ; voi vOi vwiq

where doi is the error signal of neuron i and is given by doi Z ðdi K oi Þf 0 ðOi Þ: To avoid oscillation at large g, the change in the weight is made dependent on the past weight change by adding a momentum term Dwiq ðt C 1Þ Z gdoi hq C aDwiq ðtÞ;

where $\alpha$ is a constant chosen by the operator. Similarly, it can be shown that the change in the weight between the hidden neuron $q$ and the input neuron $j$, $\Delta v_{qj}$, is given by

$$\Delta v_{qj} = \gamma\, \delta_{hq}\, x_j,$$

where $\delta_{hq}$ is the error signal of neuron $q$ and is given by

$$\delta_{hq} = f'(H_q) \sum_i \delta_{oi}\, w_{iq}.$$

Table A2. Stepwise procedure for DA

Variables in the analysis

| Step | Variable | F to Remove | Min. D Squared |
|---|---|---|---|
| 1 | Political risk | 80.893 | |
| 2 | Political risk | 100.072 | 0.497 |
| 2 | Gross external debt/exports (%) | 14.905 | 6.573 |

Variables not in the analysis (F to Enter and Min. D Squared at each step; "in" marks a variable that has already entered the analysis)

| Variable | F to Enter (step 0) | Min. D Squared (step 0) | F to Enter (step 1) | Min. D Squared (step 1) | F to Enter (step 2) | Min. D Squared (step 2) |
|---|---|---|---|---|---|---|
| Real GDP (% change) | 2.278 | 0.185 | 5.042 | 7.667 | 0.627 | 9.989 |
| CPI (average % change) | 7.362 | 0.598 | 1.268 | 6.848 | 0.551 | 9.967 |
| Debt (% of GDP), gross | 7.612 | 0.618 | 2.553 | 7.127 | 0.008 | 9.810 |
| Fiscal performance (% of GDP), surplus (deficit) | 5.174 | 0.420 | 0.024 | 6.578 | 1.045 | 10.109 |
| Fiscal performance (% of GDP), revenue | 6.166 | 0.501 | 1.139 | 6.820 | 3.122 | 10.709 |
| Fiscal performance (% of GDP), spending | 2.573 | 0.209 | 0.944 | 6.777 | 1.560 | 10.258 |
| Current account receipts/GDP (%) | 6.301 | 0.512 | 3.303 | 7.289 | 0.000 | 9.808 |
| Current account balance/GDP (%) | 5.797 | 0.471 | 0.571 | 6.696 | 0.795 | 10.037 |
| Current account balance/current account receipts (%) | 4.452 | 0.362 | 1.056 | 6.802 | 0.862 | 10.057 |
| Reserves/imports (months) | 0.598 | 0.049 | 0.526 | 6.687 | 0.017 | 9.812 |
| Gross external debt/exports (%) | 6.112 | 0.497 | 14.905 | 9.807 | in | in |
| Net external debt/exports (%) | 12.284 | 0.998 | 3.064 | 7.238 | 0.065 | 9.826 |
| Narrow net external debt/exports (%) | 13.996 | 1.137 | 8.605 | 8.440 | 0.382 | 9.918 |
| Net public-sector external debt/exports (%) | 24.483 | 1.989 | 4.882 | 7.632 | 0.156 | 9.852 |
| Net investment payments/exports (%) | 9.516 | 0.773 | 0.365 | 6.652 | 0.007 | 9.809 |
| Political risk | 80.893 | 6.573 | in | in | in | in |
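The output-layer and hidden-layer weight updates described in this appendix can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the logistic activation (so that $f'(z) = f(z)(1 - f(z))$), the function name `train_step`, the absence of bias terms and the toy 2×2×1 network are all assumptions made here. The default learning rate of 0.5 and momentum of 0.7 are the values reported in Table A4.

```python
import math


def sigmoid(z):
    """Logistic activation (assumed here; the appendix only writes f)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_step(x, d, v, w, dv_prev, dw_prev, gamma=0.5, alpha=0.7):
    """One backpropagation update with momentum for a single pattern.

    x: inputs x_j; d: targets d_i
    v[q][j]: input-to-hidden weights; w[i][q]: hidden-to-output weights
    dv_prev, dw_prev: previous weight changes, same shapes as v and w
    Returns the new weights, the weight changes and the outputs o_i
    computed before the update.
    """
    # Forward pass: h_q = f(H_q) and o_i = f(O_i).
    h = [sigmoid(sum(v_q[j] * x[j] for j in range(len(x)))) for v_q in v]
    o = [sigmoid(sum(w_i[q] * h[q] for q in range(len(h)))) for w_i in w]

    # Output error signals: delta_oi = (d_i - o_i) f'(O_i).
    delta_o = [(d[i] - o[i]) * o[i] * (1.0 - o[i]) for i in range(len(o))]
    # Hidden error signals: delta_hq = f'(H_q) * sum_i delta_oi * w_iq.
    delta_h = [
        h[q] * (1.0 - h[q]) * sum(delta_o[i] * w[i][q] for i in range(len(o)))
        for q in range(len(h))
    ]

    # Weight changes: gradient term plus momentum on the previous change.
    dw = [[gamma * delta_o[i] * h[q] + alpha * dw_prev[i][q]
           for q in range(len(h))] for i in range(len(o))]
    dv = [[gamma * delta_h[q] * x[j] + alpha * dv_prev[q][j]
           for j in range(len(x))] for q in range(len(h))]

    w_new = [[w[i][q] + dw[i][q] for q in range(len(h))] for i in range(len(o))]
    v_new = [[v[q][j] + dv[q][j] for j in range(len(x))] for q in range(len(h))]
    return v_new, w_new, dv, dw, o


# Toy 2x2x1 network with arbitrary starting weights (illustrative values only).
v = [[0.1, -0.2], [0.3, 0.4]]
w = [[0.2, -0.1]]
dv = [[0.0, 0.0], [0.0, 0.0]]
dw = [[0.0, 0.0]]
for _ in range(50):
    v, w, dv, dw, o = train_step([1.0, 0.5], [1.0], v, w, dv, dw)
```

Repeating the step on a single pattern with target 1 moves the network output toward that target. In practice the updates would be applied over all training patterns, and the stopping rule is a separate choice that this sketch does not address.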

As with the output-layer weights, a momentum term can be used here to prevent oscillation.

A.2. Algorithm for Kohonen network

A basic competitive learning network has one layer of input neurons and one layer of output neurons. Let $m$ be the number of neurons in the input layer, so that an input pattern is

$$x = (x_1, x_2, \ldots, x_m).$$

Assuming that there are $n$ nodes in the output layer, the weight vector of a neuron $i$ can be written as

$$w_i = (w_{i1}, w_{i2}, \ldots, w_{im}),$$

where $w_{ij}$ is the weight associated with the neuron $i$ and the input neuron $j$. The weight vectors are initialised to random values, which are adjusted as the training progresses. The weight matrix is used to propagate the network input values to the map neurons. Standardised values of the input vectors, $X_j$, are calculated as follows:

$$X_j = \frac{\mathrm{input}(j,k) - \mathrm{average}(i)}{D_{\max}},$$

where average(i) is the arithmetic average of the input $i$ and $D_{\max}$ is the range of the input $i$. The same procedure is used for the weights. The Manhattan distance is used as the similarity measure:

$$d(x, w_i) = \sum_j |x_j - w_{ij}|.$$

An observation is presented and each neuron computes the distance between the values of the variables of this pattern and the values of its synaptic weights. The neuron whose distance is smallest is the winner neuron. This neuron adjusts its weights to move towards the input pattern, so each neuron in the output layer learns to recognize a specific type of input pattern. After the best-matching neuron is selected, its weight vector is adjusted according to the following formula:

$$W_{ij}(t+1) = W_{ij}(t) + \gamma_t \left( X_j(t) - W_{ij}(t) \right),$$

where the learning coefficient $\gamma_t$ is a slowly decreasing function of time, to guarantee convergence (Tables A1–A4).
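The competitive learning procedure above can be sketched as follows. This is a minimal illustration under stated assumptions, not the software used in the paper: the function names, the random initialisation interval, the decay schedule for $\gamma_t$ and the number of epochs are hypothetical choices, and the standardisation of the weights mentioned in the text is omitted for brevity. Only the winning neuron is updated, as in the description above.

```python
import random


def standardise(data):
    """Scale each variable as (value - mean) / range.

    data is a list of observations, each a list of m input values.
    Every variable is assumed to vary across observations; a constant
    variable would have zero range and cause a division by zero.
    """
    m = len(data[0])
    means = [sum(row[j] for row in data) / len(data) for j in range(m)]
    ranges = [max(row[j] for row in data) - min(row[j] for row in data)
              for j in range(m)]
    return [[(row[j] - means[j]) / ranges[j] for j in range(m)] for row in data]


def manhattan(x, w):
    """Manhattan distance d(x, w_i) = sum_j |x_j - w_ij|."""
    return sum(abs(xj - wj) for xj, wj in zip(x, w))


def winner(x, weights):
    """Index of the output neuron whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda i: manhattan(x, weights[i]))


def update(w, x, gamma_t):
    """Move a weight vector towards x: w + gamma_t * (x - w)."""
    return [wj + gamma_t * (xj - wj) for wj, xj in zip(w, x)]


def train(data, n_outputs, epochs=20, gamma0=0.5, seed=0):
    """Winner-only competitive learning with a decreasing learning coefficient."""
    rng = random.Random(seed)
    m = len(data[0])
    # Random initial weights (the interval is an arbitrary assumption).
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(m)]
               for _ in range(n_outputs)]
    t = 0
    for _ in range(epochs):
        for x in data:
            gamma_t = gamma0 / (1.0 + 0.1 * t)  # hypothetical decay schedule
            i = winner(x, weights)
            weights[i] = update(weights[i], x, gamma_t)
            t += 1
    return weights


# Illustrative data only: three observations of two variables.
observations = standardise([[0.0, 10.0], [2.0, 30.0], [4.0, 20.0]])
trained = train(observations, n_outputs=2)
```

After training, `winner(x, trained)` assigns an observation to its closest output neuron, which is how observations end up grouped on the map.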

Table A3. Probability of being a high-risk or low-risk country: in-sample and holdout sample

| Code | Country | Status | pda | Plogit | Pprobit | ANN | ANN-logit | ANN-DA | ANN-logit-plogit | ANN-DA-pda | ANN-logit-pda | ANN-DA-plogit | ANN-plogit | ANN-pda |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AR | Argentina | 1 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.98 | 0.87 | 0.85 | 1.00 | 1.00 | 0.83 |
| BZ | Belize | 1 | 0.91 | 0.12 | 0.14 | 0.89 | 0.90 | 0.92 | 0.83 | 0.84 | 0.83 | 0.43 | 0.43 | 0.80 |
| BO | Bolivia | 1 | 1.00 | 0.95 | 0.96 | 0.99 | 1.00 | 1.00 | 0.98 | 0.87 | 0.84 | 1.00 | 1.00 | 0.82 |
| BR | Brazil | 1 | 0.98 | 0.53 | 0.54 | 0.92 | 0.88 | 0.98 | 0.96 | 0.86 | 0.84 | 0.99 | 0.98 | 0.82 |
| CO | Colombia | 1 | 0.97 | 0.92 | 0.92 | 0.96 | 0.97 | 0.97 | 0.97 | 0.85 | 0.84 | 1.00 | 1.00 | 0.81 |
| EC | Ecuador | 1 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.98 | 0.87 | 0.85 | 1.00 | 1.00 | 0.82 |
| ID | Indonesia | 1 | 0.99 | 0.62 | 0.62 | 1.00 | 1.00 | 0.99 | 0.98 | 0.86 | 0.84 | 1.00 | 1.00 | 0.82 |
| JM | Jamaica | 1 | 0.93 | 0.99 | 1.00 | 0.95 | 1.00 | 0.93 | 0.98 | 0.84 | 0.84 | 1.00 | 0.99 | 0.80 |
| LB | Lebanon | 1 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.98 | 0.87 | 0.85 | 1.00 | 1.00 | 0.83 |
| MN | Mongolia | 1 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.98 | 0.86 | 0.84 | 1.00 | 1.00 | 0.82 |
| PK | Pakistan | 1 | 1.00 | 0.84 | 0.84 | 1.00 | 1.00 | 1.00 | 0.98 | 0.87 | 0.85 | 1.00 | 1.00 | 0.82 |
| PG | Papua New Guinea | 1 | 0.82 | 0.97 | 0.98 | 0.86 | 0.97 | 0.82 | 0.98 | 0.80 | 0.81 | 0.97 | 0.89 | 0.77 |
| PY | Paraguay | 1 | 0.91 | 0.97 | 0.98 | 0.93 | 0.98 | 0.91 | 0.98 | 0.84 | 0.83 | 1.00 | 0.97 | 0.80 |
| PE | Peru | 1 | 0.97 | 0.39 | 0.41 | 0.96 | 0.95 | 0.97 | 0.94 | 0.85 | 0.84 | 0.77 | 0.90 | 0.81 |
| RO | Romania | 1 | 0.86 | 1.00 | 1.00 | 0.95 | 1.00 | 0.87 | 0.98 | 0.82 | 0.83 | 0.99 | 0.93 | 0.79 |
| RU | Russia | 1 | 0.95 | 0.95 | 0.96 | 0.97 | 0.98 | 0.95 | 0.98 | 0.85 | 0.84 | 1.00 | 1.00 | 0.81 |
| SE | Senegal | 1 | 0.99 | 0.80 | 0.79 | 0.99 | 1.00 | 0.99 | 0.98 | 0.86 | 0.84 | 1.00 | 1.00 | 0.82 |
| TR | Turkey | 1 | 0.99 | 1.00 | 1.00 | 1.00 | 1.00 | 0.99 | 0.98 | 0.86 | 0.85 | 1.00 | 1.00 | 0.82 |
| UA | Ukraine | 1 | 1.00 | 0.89 | 0.89 | 1.00 | 1.00 | 1.00 | 0.98 | 0.86 | 0.85 | 1.00 | 1.00 | 0.82 |
| UY | Uruguay | 1 | 0.80 | 0.75 | 0.76 | 0.78 | 0.06 | 0.81 | 0.78 | 0.79 | 0.75 | 1.00 | 1.00 | 0.77 |
| BB | Barbados | 0 | 0.05 | 0.00 | 0.00 | 0.03 | 0.06 | 0.03 | 0.03 | 0.07 | 0.04 | 0.05 | 0.07 | 0.04 |
| BW | Botswana | 0 | 0.01 | 0.44 | 0.45 | 0.11 | 0.11 | 0.00 | 0.11 | 0.06 | 0.04 | 0.02 | 0.09 | 0.03 |
| CL | Chile | 0 | 0.01 | 0.00 | 0.00 | 0.00 | 0.03 | 0.00 | 0.00 | 0.06 | 0.03 | 0.01 | 0.04 | 0.04 |
| CN | China | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.03 | 0.00 | 0.00 | 0.06 | 0.03 | 0.01 | 0.03 | 0.03 |
| CR | Costa Rica | 0 | 0.03 | 0.43 | 0.47 | 0.02 | 0.06 | 0.02 | 0.47 | 0.07 | 0.04 | 0.05 | 0.05 | 0.04 |
| HR | Croatia | 0 | 0.16 | 0.32 | 0.35 | 0.11 | 0.17 | 0.12 | 0.60 | 0.09 | 0.06 | 0.13 | 0.13 | 0.08 |
| CZ | Czech Republic | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.03 | 0.00 | 0.00 | 0.06 | 0.03 | 0.00 | 0.02 | 0.03 |
| DK | Denmark | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.06 | 0.03 | 0.00 | 0.02 | 0.03 |
| SV | El Salvador | 0 | 0.37 | 0.01 | 0.01 | 0.28 | 0.22 | 0.32 | 0.17 | 0.20 | 0.17 | 0.19 | 0.24 | 0.23 |
| EE | Estonia | 0 | 0.00 | 0.01 | 0.00 | 0.00 | 0.03 | 0.00 | 0.00 | 0.06 | 0.03 | 0.00 | 0.02 | 0.03 |
| GE | Germany | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.06 | 0.03 | 0.00 | 0.02 | 0.03 |
| IS | Iceland | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.06 | 0.03 | 0.01 | 0.04 | 0.04 |
| IN | India | 0 | 0.06 | 0.01 | 0.00 | 0.03 | 0.05 | 0.04 | 0.01 | 0.07 | 0.04 | 0.06 | 0.09 | 0.05 |
| IL | Israel | 0 | 0.02 | 0.00 | 0.00 | 0.00 | 0.03 | 0.00 | 0.00 | 0.06 | 0.03 | 0.02 | 0.04 | 0.04 |
| JP | Japan | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.06 | 0.03 | 0.00 | 0.02 | 0.03 |
| KZ | Kazakhstan | 0 | 0.72 | 0.73 | 0.73 | 0.72 | 0.95 | 0.71 | 0.97 | 0.71 | 0.75 | 0.62 | 0.46 | 0.70 |
| KW | Kuwait | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.06 | 0.03 | 0.00 | 0.02 | 0.03 |
| LV | Latvia | 0 | 0.03 | 0.01 | 0.01 | 0.00 | 0.05 | 0.01 | 0.00 | 0.07 | 0.04 | 0.04 | 0.07 | 0.04 |
| MA | Morocco | 0 | 0.10 | 0.00 | 0.00 | 0.06 | 0.05 | 0.07 | 0.00 | 0.07 | 0.04 | 0.08 | 0.13 | 0.05 |
| NO | Norway | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.06 | 0.03 | 0.00 | 0.02 | 0.03 |
| OM | Oman | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.03 | 0.00 | 0.00 | 0.06 | 0.03 | 0.01 | 0.03 | 0.03 |
| PA | Panama | 0 | 0.97 | 0.00 | 0.00 | 0.83 | 0.11 | 0.97 | 0.12 | 0.85 | 0.83 | 0.53 | 0.49 | 0.81 |
| PH | Philippines | 0 | 0.18 | 0.00 | 0.00 | 0.11 | 0.08 | 0.13 | 0.06 | 0.09 | 0.06 | 0.12 | 0.15 | 0.08 |
| QA | Qatar | 0 | 0.01 | 0.00 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.06 | 0.03 | 0.01 | 0.03 | 0.03 |
| SG | Singapore | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.06 | 0.03 | 0.00 | 0.02 | 0.03 |
| SK | Slovak Republic | 0 | 0.01 | 0.50 | 0.50 | 0.00 | 0.05 | 0.00 | 0.30 | 0.06 | 0.04 | 0.03 | 0.04 | 0.04 |
| TW | Taiwan | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.06 | 0.03 | 0.00 | 0.02 | 0.03 |
| TH | Thailand | 0 | 0.01 | 0.00 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.06 | 0.03 | 0.02 | 0.03 | 0.03 |
| TT | Trinidad and Tobago | 0 | 0.04 | 0.00 | 0.00 | 0.03 | 0.06 | 0.02 | 0.03 | 0.07 | 0.04 | 0.04 | 0.06 | 0.04 |
| TN | Tunisia | 0 | 0.04 | 0.00 | 0.00 | 0.01 | 0.05 | 0.02 | 0.01 | 0.07 | 0.04 | 0.05 | 0.08 | 0.04 |
| UK | United Kingdom | 0 | 0.01 | 0.00 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.06 | 0.03 | 0.02 | 0.15 | 0.04 |
| US | United States | 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.06 | 0.03 | 0.00 | 0.04 | 0.03 |
| BG | Bulgaria | 1 | 0.54 | 0.97 | 0.98 | 0.65 | 0.88 | 0.59 | 0.97 | 0.45 | 0.51 | 0.92 | 0.87 | 0.51 |
| DO | Dominican Republic | 1 | 0.45 | 0.10 | 0.12 | 0.23 | 0.69 | 0.27 | 0.69 | 0.27 | 0.31 | 0.17 | 0.16 | 0.32 |
| GT | Guatemala | 1 | 0.61 | 0.77 | 0.77 | 0.80 | 0.96 | 0.78 | 0.97 | 0.59 | 0.64 | 0.78 | 0.69 | 0.60 |
| JO | Jordan | 1 | 0.64 | 0.04 | 0.03 | 0.86 | 0.86 | 0.86 | 0.64 | 0.65 | 0.66 | 0.38 | 0.40 | 0.65 |
| VE | Venezuela | 1 | 0.70 | 1.00 | 1.00 | 0.97 | 0.99 | 0.93 | 0.98 | 0.73 | 0.75 | 1.00 | 1.00 | 0.71 |
| ZA | South Africa | 0 | 0.45 | 0.01 | 0.01 | 0.28 | 0.32 | 0.31 | 0.26 | 0.28 | 0.29 | 0.18 | 0.21 | 0.33 |
| EG | Egypt | 0 | 0.37 | 0.05 | 0.05 | 0.14 | 0.10 | 0.14 | 0.06 | 0.18 | 0.16 | 0.12 | 0.17 | 0.21 |
| AU | Australia | 0 | 0.05 | 0.00 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.06 | 0.03 | 0.00 | 0.03 | 0.04 |
| CA | Canada | 0 | 0.02 | 0.00 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.06 | 0.03 | 0.00 | 0.02 | 0.03 |
| CY | Cyprus | 0 | 0.27 | 0.00 | 0.00 | 0.18 | 0.02 | 0.08 | 0.00 | 0.11 | 0.05 | 0.10 | 0.32 | 0.13 |
| HK | Hong Kong | 0 | 0.11 | 0.00 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.07 | 0.04 | 0.01 | 0.03 | 0.04 |
| HU | Hungary | 0 | 0.08 | 0.00 | 0.00 | 0.00 | 0.03 | 0.00 | 0.00 | 0.06 | 0.04 | 0.00 | 0.02 | 0.04 |
| KR | Korea | 0 | 0.11 | 0.00 | 0.00 | 0.00 | 0.03 | 0.00 | 0.00 | 0.07 | 0.04 | 0.01 | 0.03 | 0.04 |
| LT | Lithuania | 0 | 0.31 | 0.03 | 0.03 | 0.04 | 0.10 | 0.05 | 0.07 | 0.13 | 0.11 | 0.07 | 0.09 | 0.14 |
| MY | Malaysia | 0 | 0.17 | 0.00 | 0.00 | 0.00 | 0.03 | 0.00 | 0.00 | 0.08 | 0.05 | 0.02 | 0.03 | 0.06 |
| MX | Mexico | 0 | 0.18 | 0.00 | 0.00 | 0.00 | 0.03 | 0.00 | 0.00 | 0.08 | 0.05 | 0.02 | 0.05 | 0.07 |
| NZ | New Zealand | 0 | 0.08 | 0.00 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.06 | 0.03 | 0.00 | 0.03 | 0.04 |
| PL | Poland | 0 | 0.17 | 0.01 | 0.00 | 0.00 | 0.03 | 0.00 | 0.00 | 0.08 | 0.05 | 0.02 | 0.05 | 0.06 |
| SE | Sweden | 0 | 0.02 | 0.00 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.06 | 0.03 | 0.00 | 0.02 | 0.03 |
| SZ | Switzerland | 0 | 0.04 | 0.00 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.06 | 0.03 | 0.00 | 0.02 | 0.03 |

Table A4. Topology of the best ANN models

| Topology | ANN | ANN-DA | ANN-Logit | ANN-Logit-Plogit | ANN-Logit-PDA | ANN-DA-PDA | ANN-DA-Plogit | ANN-PDA | ANN-Plogit |
|---|---|---|---|---|---|---|---|---|---|
| Best network | 5×2×1 | 2×2×1 | 4×2×1 | 5×2×1 | 3×2×1 | 3×2×1 | 3×2×1 | 6×3×1 | 6×3×1 |
| Learning rate | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 | 0.5 |
| Momentum | 0.7 | 0.7 | 0.7 | 0.7 | 0.7 | 0.7 | 0.7 | 0.7 | 0.7 |

References

Altman, E. (1968). Discriminant analysis and prediction of corporate bankruptcy. Journal of Finance, 589–609.

Cooper, J. C. B. (1999). Artificial neural networks versus multivariate statistics: An application from economics. Journal of Applied Statistics, 26(8), 909–921.

Cosset, J. C., & Roy, J. (1991). The determinants of country risk ratings. Journal of International Business Studies, 22(1), 135–142.

Cosset, J. C., & Roy, J. (1994). Predicting country risk ratings using artificial neural networks. Advances in Artificial Intelligence in Economics, Finance, and Management, Vol. 1, pp. 141–157.

Feder, G., & Just, R. E. (1977). A study of debt servicing capacity applying logit analysis. Journal of Development Economics, 4.

Frank, C. R., Jr., & Cline, W. R. (1971). Measurement of debt servicing capacity: An application of discriminant analysis. Journal of International Economics, 1, 327–344.

Grinols, E. (1976). International debt rescheduling and discrimination using financial variables. Washington, DC: US Treasury Department, mimeo.

Han, I., Kwon, Y., & Lee, K. C. (1996). Hybrid neural network models for bankruptcy predictions. Decision Support Systems, 18, 63–72.

Hosmer, D. W., & Lemeshow, S. (1989). Applied logistic regression. New York: Wiley.

Kohonen, T. (1990). The self-organising map. Proceedings of the IEEE, 78(9), 1464–1480.

Markham, I., & Ragsdale, C. (1995). Combining neural networks and statistical predictions to solve the classification problem in discriminant analysis. Decision Sciences, 26(2), 229–241.

Norusis, M. J. (1990). SPSS advanced statistics user's guide. SPSS.

Ohlson, J. (1980). Financial ratios and the probabilistic prediction of bankruptcy. Journal of Accounting Research, 18, 109–131.

Rumelhart, D., McClelland, J., & PDP Group (1986). Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 1: Foundations. Cambridge, MA: MIT Press.

Sachs, J. D. (1996). A framework for action to resolve debt problems of the heavily indebted poor countries. Washington: Development Committee of the World Bank and International Monetary Fund.

Saini, K., & Bates, P. (1978). Statistical techniques for determining debt-service capacity for developing countries. Federal Reserve Bank of New York Research Paper, no. 7818.

Schmidt, R. (1984). Early warning of debt rescheduling. Journal of Banking and Finance, 8, 357–370.

Martín, B., & Serrano Cinca, C. (1994). Self organizing neural networks: The financial state of Spanish companies. In N. A. Refenes (Ed.), Neural Networks in the Capital Markets. John Wiley & Sons.

Medrano, N., & Martín, B. (2001). Manual de uso del programa Teruel, una red neuronal autoorganizada [online]. 5campus.com, Inteligencia Artificial. http://www.5campus.com/leccion/teruel.

Franses, P. H. (2000). A test for the hit rate in binary response models. International Journal of Market Research, 42(2), 239–245.

Maddala, G. S. (1983). Limited-dependent and qualitative variables in econometrics. Cambridge, UK: Cambridge University Press.