A nonlinear subspace multiple kernel learning for financial distress prediction of Chinese listed companies







Xiangrong Zhang a,b,*, Longying Hu b

a School of Management, Harbin Institute of Technology, Harbin, China
b School of Management, Heilongjiang Institute of Technology, Harbin, China
* Corresponding author at: School of Management, Harbin Institute of Technology, 92 West Dazhi Street, Harbin 150001, China. Tel.: +86 159 451 82234.



Article history: Received 5 September 2015; Received in revised form 13 November 2015; Accepted 20 November 2015.

Abstract. Financial distress prediction (FDP) is of great importance for managers, creditors and investors, who need to take correct measures to reduce losses. Many quantitative methods have recently been proposed to develop empirical models for FDP. In this paper, a nonlinear subspace multiple kernel learning (MKL) method is proposed for the task of FDP. A key question is how the basis kernels can be well exploited to measure the similarity between samples when an MKL strategy is used for FDP. In the proposed MKL method, a divide-and-conquer strategy is adopted to learn the weights of the basis kernels and the optimal predictor for FDP separately. The optimal weights of the basis kernels in the linear combination are derived by solving a nonlinear form of a maximum eigenvalue problem instead of a complicated multiple-kernel optimization. A support vector machine (SVM) is then used to generate an optimal predictor with the optimally linearly combined kernel. In the experiments, the proposed method is compared with other FDP methods on normal and ST Chinese listed companies over the period 2006-2013 to demonstrate its prediction performance, which proves superior to the state-of-the-art predictors compared in the experiments.

Keywords: Financial distress prediction; Multiple kernel learning; Listed company; Subspace learning

1. Introduction

As the world's second largest economy, China has given great impetus to the global economic recovery, but its enterprise management mechanisms remain backward, including the study of listed companies' financial distress. An accurate judgment made before financial distress arises helps reduce property losses both for the country and for the companies themselves: a company can then take effective measures to stop the deterioration and bring its business back on track. Financial distress prediction (FDP) applies suitable methods to detect this situation by analyzing the report data of enterprises before distress arises [1-3]. In the early stages, methods such as univariate analysis [1], multiple discriminant analysis (MDA) [2] and the logistic regression algorithm (Logit) [3] were developed for FDP. With the development of artificial intelligence, methods such as neural networks (NNs) [4,5] and the support vector machine (SVM) [5-7] have also been used for FDP. In recent years, combinations of multiple classifiers have also been presented to overcome the limited explanatory ability of single classifiers, such as the bagging

and AdaBoost methods [9]. Most research focuses on the model learning for FDP but ignores the importance of financial ratio selection. Although there already exist state-of-the-art feature selection methods that can be used for ratio selection, such as principal component analysis (PCA) [10], linear discriminant analysis (LDA) [11], kernel PCA [12] and kernel LDA [13], these methods are not well suited to FDP and lack sufficient interpretability. Recently, the SVM has shown excellent nonlinear generalization ability on high-dimensional, small-sample evaluation problems and can reach higher prediction accuracy through the kernel method. The SVM has been applied to the FDP task and has demonstrated good performance [7,8]. The conventional SVM uses only a single kernel, such as a Gaussian kernel with fixed parameters, to measure the similarity of samples from the same class or from different classes. In recent years, the limitation of the single-kernel SVM has gradually been recognized, which motivates researchers to develop a new kind of kernel learning method called multiple kernel learning (MKL). MKL methods based on the SVM framework can perform better by effectively using a composite kernel to increase the adaptive capacity [14-18]. Essentially, multiple basis kernels with different forms, or with the same form but different parameters, provide an enhanced ability to measure sample similarity. Integrating the basis kernels results in better generalization





capability and better classification performance. In our previous work, a two-step multiple kernel regression (MKR) was proposed for macroeconomic data forecasting of China [19]. The existing MKL algorithms demonstrate better performance than the conventional SVM for forecasting and FDP. However, how to better exploit the potential of the basis kernels for measuring sample similarity is still an open topic.

In this paper, a nonlinear subspace multiple kernel learning (MKL) method is proposed for the task of FDP. In the proposed MKL method, a divide-and-conquer strategy is adopted to separately learn the weights of the basis kernels and the optimal predictor for FDP. The optimal weights of the basis kernels in the linear combination are derived by solving a nonlinear form of a maximum eigenvalue problem instead of a complicated multiple-kernel optimization. A support vector machine (SVM) is then used to form an optimal predictor with the optimally linearly combined kernel.

The main contribution of this paper can be summarized as follows. In the existing MKL algorithms, there are two main ways to learn a linear combination of the basis kernels: one directly solves a complicated optimization problem that simultaneously optimizes the weights and the final classification results; the other adopts linear subspace methods to learn the optimal linear combination of the basis kernels. Compared with the existing state of the art, the main contribution of this work is to adopt a more effective nonlinear subspace method to obtain a combined kernel with an excellent ability to learn from samples.

The rest of the paper is organized as follows. Section 2 briefly reviews kernel learning and MKL and presents the proposed MKL method for the task of FDP. Section 3 gives a detailed description of the test data, i.e., the financial ratio data of Chinese listed companies, and an analysis of the experimental results. The last section provides the conclusion.

2. The proposed MKL method

In the proposed MKL method, the basis kernels are first generated as Gaussian kernels with different bandwidth parameters. The optimal weights of the basis kernels in the linear combination are then learned in a subspace-learning manner; in other words, the search for the optimal weights is converted into a subspace learning problem. To solve this problem, an eigenvalue decomposition is performed on the basis kernels in a Reproducing Kernel Hilbert Space (RKHS), and the eigenvector corresponding to the maximum eigenvalue gives exactly the optimal weights of the basis kernels.

2.1. Conventional MKL

Given a training set $\{(x_i, y_i)\}_{i=1}^{N}$ for binary classification, with $y \in \{+1, -1\}$, the dual optimization problem of the conventional SVM can be written as

$$\max_{\alpha}\; L(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$$
$$\text{s.t.}\quad \sum_{i=1}^{N} \alpha_i y_i = 0, \quad \alpha_i \in [0, C],\ i = 1, 2, \cdots, N \tag{1}$$

where the $\alpha_i$ are Lagrange multipliers; if $\alpha_i$ is nonzero, the corresponding $x_i$ is called a support vector and determines the decision hyperplane. In Eq. (1), $K$ is the kernel matrix, with entries $(K)_{i,j} = k(x_i, x_j)$. As for the form of the kernel mapping, this paper considers the Gaussian radial basis function (RBF) kernel, which is widely used in various tasks of signal processing and pattern recognition:

$$k(x, z) = \exp\left(-\frac{\|x - z\|^2}{2\sigma^2}\right) \tag{2}$$

The Gaussian kernel is the most representative kernel, with merits such as translation invariance. In the proposed MKL method, Gaussian kernels with different bandwidths are fixed as the predefined basis kernels. As far as MKL is concerned, $K$ can be substituted by a convex linear combination of predefined basis kernels with different kernel forms or different kernel parameters:

$$K = \sum_{m=1}^{M} d_m K_m \tag{3}$$

where $\{K_m\}_{m=1}^{M}$ is a set of positive semidefinite (PSD) basis kernel matrices with bounded trace, and the $d_m$ ($d_m \geq 0$, $\sum_{m=1}^{M} d_m = 1$) are the corresponding weights.

By integrating Eq. (3) into Eq. (1), the dual problem of MKL under the optimization routine of the SVM can be represented as

$$\max_{\alpha}\; L(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \sum_{m=1}^{M} d_m K_m(x_i, x_j)$$
$$\text{s.t.}\quad \sum_{i=1}^{N} \alpha_i y_i = 0, \quad \alpha_i \in [0, C],\ i = 1, 2, \cdots, N, \quad d_m \geq 0, \quad \sum_{m=1}^{M} d_m = 1. \tag{4}$$

By means of the optimization routine of the conventional SVM, MKL thus refers to the simultaneous optimization of the kernel combination and the learning performance.
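To make Eqs. (2) and (3) concrete, the following minimal sketch builds the Gaussian basis kernel matrices for a set of bandwidths and forms their convex combination. The helper names and the example bandwidths are illustrative, not part of the original paper.

```python
import numpy as np
from scipy.spatial.distance import cdist

def gaussian_kernel(X, Z, sigma):
    """Gaussian (RBF) kernel of Eq. (2): k(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    return np.exp(-cdist(X, Z, "sqeuclidean") / (2.0 * sigma ** 2))

def basis_kernels(X, sigmas):
    """Basis kernels K_1, ..., K_M of Eq. (3): one Gaussian kernel per bandwidth."""
    return np.stack([gaussian_kernel(X, X, s) for s in sigmas])

def combine_kernels(Ks, d):
    """Convex combination K = sum_m d_m K_m with d_m >= 0 and sum_m d_m = 1."""
    d = np.asarray(d, dtype=float)
    assert np.all(d >= 0) and np.isclose(d.sum(), 1.0)
    return np.tensordot(d, Ks, axes=1)

# Toy usage with uniform weights; Sections 2.2-2.3 learn the weights instead.
X = np.random.randn(6, 4)                       # 6 samples, 4 financial ratios
Ks = basis_kernels(X, sigmas=[0.5, 1.0, 2.0])   # 3 basis kernels
K = combine_kernels(Ks, np.ones(3) / 3)         # combined 6 x 6 kernel matrix
```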

2.2. Subspace learning in MKL

Given the Gram matrices of the $M$ basis kernels, $\mathcal{K}_0 = \{K_m : m = 1, 2, \cdots, M;\ K_m \in \mathbb{R}^{N \times N}\}$, Eq. (3) can be reformulated as

$$K = d^{\mathrm{T}} \mathcal{K} \tag{5}$$

where $d \geq 0$, $\sum_{i=1}^{M} d_i = 1$ and $\mathcal{K} = [K_1\ K_2\ \cdots\ K_M]^{\mathrm{T}}$.

It is easy to see that Eq. (5) is a typical form of subspace projection. We therefore build the loss function

$$L(U, Z) = \|\mathcal{K} - UZ\|_F^2 \tag{6}$$

where $U$ is the projection matrix, $Z$ is the projected matrix in the linear subspace spanned by the columns of $U$, and $\|\cdot\|_F$ denotes the Frobenius norm of a matrix. According to the projection theorem, the minimum of the loss function $L(U, Z)$ is determined by $Z = U^{\mathrm{T}} \mathcal{K}$. Minimizing $L(U, Z)$ can furthermore be converted into the following dual problem under a rank-one constraint:

$$\arg\max_U\; U^{\mathrm{T}} \Sigma U = \arg\max_U\; \|U^{\mathrm{T}} \mathcal{K}\|_F \quad \text{s.t.}\ U^{\mathrm{T}} U = 1 \tag{7}$$



where $\Sigma$ equals $\mathcal{K}\mathcal{K}^{\mathrm{T}}$. The dual problem in Eq. (7) can be solved by singular value decomposition (SVD) or by eigenvalue decomposition. The variance of $Z$ is maximized by searching for $U^*$; note that only the maximum-variance projection vector is taken into account. We thereby obtain the projection vector $U^* = u_1 = [u_{11}, u_{12}, \cdots, u_{1M}]^{\mathrm{T}}$, which is exactly the optimal weight vector for the basis kernels obtained by maximizing their variance, i.e., $d^* = u_1$. As a result, referring to Eq. (3), the finally optimal kernel learned from the basis kernels is

$$K^* = u_1^{\mathrm{T}} \mathcal{K} = \sum_{m=1}^{M} u_{1m} K_m = \sum_{m=1}^{M} d_m^* K_m \tag{8}$$
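As an illustration of Eqs. (5)-(8), a minimal sketch of this linear subspace step follows: each basis kernel is flattened into a row of $\mathcal{K}$, and the eigenvector of $\Sigma = \mathcal{K}\mathcal{K}^{\mathrm{T}}$ with the largest eigenvalue is taken as the weight vector. Since an eigenvector is defined only up to sign and scale, the sign fix and renormalization to a convex combination at the end are our assumptions, not steps stated in the paper.

```python
import numpy as np

def linear_subspace_weights(Ks):
    """LS-MKL weights (Eqs. (5)-(8)): leading eigenvector of Sigma = K K^T,
    where row m of K is the vectorized basis kernel matrix K_m."""
    M = Ks.shape[0]
    Kmat = Ks.reshape(M, -1)             # M x N^2, one row per basis kernel
    Sigma = Kmat @ Kmat.T                # M x M matrix K K^T
    _, eigvecs = np.linalg.eigh(Sigma)   # eigenvalues in ascending order
    u1 = eigvecs[:, -1]                  # eigenvector of the maximum eigenvalue
    if u1.sum() < 0:                     # assumption: fix the sign, then
        u1 = -u1                         # renormalize to sum to one, matching
    u1 = np.clip(u1, 0.0, None)          # the constraint on d in Eq. (3)
    return u1 / u1.sum()
```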

2.3. Nonlinear subspace learning from basis kernels

In our previous work, a linear subspace MKL algorithm was proposed in which an eigenvalue decomposition is performed in the original data space composed of the basis kernel matrices; that method learns the optimal weights linearly, via a maximum-variance projection. Here, a nonlinear subspace learning is used instead of the linear subspace method. In the nonlinear subspace learning, the basis kernel matrices are first converted into vectors, treated as 'samples', and mapped into a higher-dimensional feature space via a given kernel function.

Eq. (5) is a typical subspace learning problem. The vector form of a kernel matrix is obtained by vectorization, denoted $\mathrm{Vec}$:

$$q_m = \mathrm{Vec}(K_m) \tag{9}$$

where $K_m \in \mathbb{R}^{N \times N}$, $m = 1, 2, \cdots, M$, is a basis kernel matrix and $q_m \in \mathbb{R}^{N^2}$ is the column vector composed by stacking the rows of $K_m$. Through vectorization, each $q_m$ can be treated as a sample in the original space.

We introduce a kernel mapping function $\varphi: q_m \in \mathbb{R}^{N^2} \mapsto \varphi(q_m) \in \mathcal{F}$. Using this mapping, the samples $\{q_m,\ m = 1, 2, \cdots, M\}$ are mapped from the original space into the feature space $\mathcal{F}$, where their covariance matrix is

$$\Sigma_\varphi = \frac{1}{m} \sum_{j=1}^{m} \varphi(q_j)\, \varphi(q_j)^{\mathrm{T}} \tag{10}$$

In the feature space, the subspace projection of Eq. (5) can be converted into the form

$$\mathcal{L}(U_\varphi, Z_\varphi) = \|\mathcal{K}_\varphi - U_\varphi Z_\varphi\|_F^2 \tag{11}$$

where $\mathcal{K}_\varphi = [q_1\ q_2\ \cdots\ q_M]^{\mathrm{T}}$. Accordingly, when the variance of the samples $\{q_m\}$ is maximized, the solution of the subspace projection problem of Eq. (11) equals the eigenvector corresponding to the maximum eigenvalue of the covariance matrix of $\{q_m\}$. Minimizing $\mathcal{L}(U_\varphi, Z_\varphi)$ in Eq. (11) can thus be converted into the following dual problem under a rank-one constraint:

$$U_\varphi^* = \arg\max_{U_\varphi} \|U_\varphi^{\mathrm{T}} \mathcal{K}_\varphi\|_F \quad \text{s.t.}\ U_\varphi^{\mathrm{T}} U_\varphi = 1 \tag{12}$$

where $\Sigma_\varphi$ equals $\mathcal{K}_\varphi \mathcal{K}_\varphi^{\mathrm{T}}$. The maximization problem of Eq. (12) can be solved by SVD or by eigenvalue decomposition.

Assume that $V$ is an eigenvector of the sample covariance matrix $\Sigma_\varphi$ in the high-dimensional feature space. Since $V$ is obtained by eigenvalue decomposition in the feature space, it belongs to the span of the mapped samples, i.e., $V \in \mathrm{span}\{\varphi(x_1), \varphi(x_2), \cdots, \varphi(x_m)\}$, where the $x_i$ here denote the vectorized samples $q_i$. So we have the linear expression

$$V = \sum_{i=1}^{m} \alpha_i \varphi(x_i) \tag{13}$$

Accordingly, the eigenvalue decomposition of the covariance matrix $\Sigma_\varphi$ can be expressed as

$$\lambda V = \Sigma_\varphi V \tag{14}$$

Furthermore, Eq. (14) can be converted into the form

$$\lambda \bigl(\varphi(x) \cdot V\bigr) = \varphi(x) \cdot \Sigma_\varphi V \tag{15}$$

For all $m$ eigenvalues, Eq. (15) can be rewritten as

$$\lambda \bigl(\varphi(x) \cdot V_k\bigr) = \varphi(x) \cdot \Sigma_\varphi V_k, \quad k = 1, 2, \cdots, m, \quad V_k = \sum_{i=1}^{m} \alpha_i^k \varphi(x_i) \tag{16}$$

In the feature space, we introduce the following new kernel inner product:

$$V_k \cdot \varphi(x) = \sum_{i=1}^{m} \alpha_i^k \bigl(\varphi(x_i) \cdot \varphi(x)\bigr) = \sum_{i=1}^{m} \alpha_i^k K_\varphi(x_i, x) \tag{17}$$

Using Eq. (17), Eq. (16) can be rewritten as

$$\lambda \alpha^k = K_\varphi \alpha^k, \quad \alpha^k = [\alpha_1^k, \alpha_2^k, \cdots, \alpha_m^k]^{\mathrm{T}} \tag{18}$$

Before the eigenvalue decomposition, the new kernel matrix should be centered. With $\varphi_i = \varphi(x_i)$ and $\varphi_i^c = \varphi_i - \frac{1}{N}\sum_k \varphi_k$,

$$K_{i,j}^c = \langle \varphi_i^c, \varphi_j^c \rangle = \Bigl(\varphi_i - \frac{1}{N}\sum_k \varphi_k\Bigr)^{\mathrm{T}}\Bigl(\varphi_j - \frac{1}{N}\sum_l \varphi_l\Bigr) = K_{i,j} - \frac{1}{N}\sum_l K_{il} - \frac{1}{N}\sum_k K_{kj} + \frac{1}{N^2}\sum_k\sum_l K_{kl} \tag{19}$$

In matrix form, the centered kernel matrix can be calculated as

$$K_\varphi^c = K_\varphi - 1_m K_\varphi - K_\varphi 1_m + 1_m K_\varphi 1_m \tag{20}$$

where $1_m$ is the $m \times m$ matrix whose entries all equal $1/m$. The weight vectors $\alpha^k$ ($k = 1, 2, \cdots, m$) are exactly the eigenvectors obtained by performing the eigenvalue decomposition on the centered matrix $K_\varphi^c$. The eigenvector $\alpha^1$ corresponding to the maximum eigenvalue is the optimal projection vector of Eq. (11), i.e., $U_\varphi^* = \alpha^1$; this vector is exactly the optimal weight vector for the linear combination of the basis kernels.
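A minimal sketch of this nonlinear weight-learning step (Eqs. (9), (17), (19) and (20)) is given below: the basis kernels are vectorized, a kernel matrix between these 'samples' is computed, centered as in Eq. (20), and its leading eigenvector is taken as the weight vector. The Gaussian choice for the mapping $\varphi$, its bandwidth, and the sign/normalization handling are our assumptions.

```python
import numpy as np

def nonlinear_subspace_weights(Ks, sigma=1.0):
    """NS-MKL weights: kernel eigen-decomposition over vectorized basis kernels."""
    M = Ks.shape[0]
    Q = Ks.reshape(M, -1)                     # q_m = Vec(K_m), Eq. (9)
    sq = np.sum(Q ** 2, axis=1)               # Gaussian kernel between the q_m,
    K_phi = np.exp(-(sq[:, None] + sq[None, :] - 2.0 * Q @ Q.T)
                   / (2.0 * sigma ** 2))      # K_phi(q_i, q_j), Eq. (17)
    one = np.full((M, M), 1.0 / M)            # the 1_m matrix of Eq. (20)
    Kc = K_phi - one @ K_phi - K_phi @ one + one @ K_phi @ one  # centering
    _, eigvecs = np.linalg.eigh(Kc)
    a1 = eigvecs[:, -1]                       # alpha^1: max-eigenvalue eigenvector
    if a1.sum() < 0:                          # assumption: fix sign and renormalize
        a1 = -a1                              # so the weights form a convex
    a1 = np.clip(a1, 0.0, None)               # combination as in Eq. (3)
    return a1 / a1.sum()
```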

Thus, a nonlinear subspace multiple-kernel machine (NS-MKL for short) is constructed and trained. The proposed NS-MKL is summarized below.

Algorithm: NS-MKL
Input: training samples and labels $\{(x_i, y_i)\}_{i=1}^{N}$
a) Predefine a set of basis kernels $\mathcal{K} = \{K = \sum_{m=1}^{M} d_m K_m : d_m \geq 0,\ \mathrm{trace}(K) = c,\ K \succeq 0\}$
b) Compute the basis kernel matrices $\{K_m\}$ on the training samples
c) Vectorize the basis kernel matrices $\{K_m\}$ and obtain the mapped samples $\{q_m\}$ using a given kernel mapping $\varphi$
d) Compute the covariance matrix $\Sigma_\varphi$ of $\{q_m\}$ and perform the eigenvalue decomposition on $\Sigma_\varphi$
e) Take the eigenvector associated with the maximum eigenvalue as the optimal weight vector for the linear combination of the basis kernels, and form the optimally combined kernel
f) Construct and train an SVM classifier with the combined kernel
g) Predict the financial situation of the testing samples
Output: Normal or ST
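As an illustration of steps a)-g), the following end-to-end sketch reuses the helper routines sketched above and trains an SVM on the precomputed combined kernel. It is written under our assumptions (Gaussian basis kernels; scikit-learn's SVC with a precomputed kernel), not as the authors' original implementation.

```python
import numpy as np
from sklearn.svm import SVC

def ns_mkl_fit_predict(X_train, y_train, X_test, sigmas, C=1.0):
    """NS-MKL pipeline, steps a)-g): basis kernels -> weights -> SVM -> prediction."""
    Ks_train = basis_kernels(X_train, sigmas)         # steps a), b)
    d = nonlinear_subspace_weights(Ks_train)          # steps c), d), e)
    K_train = combine_kernels(Ks_train, d)
    clf = SVC(C=C, kernel="precomputed")              # step f)
    clf.fit(K_train, y_train)
    # Step g): cross-kernel between test and training samples, same weights.
    Ks_test = np.stack([gaussian_kernel(X_test, X_train, s) for s in sigmas])
    K_test = np.tensordot(d, Ks_test, axes=1)
    return clf.predict(K_test)                        # e.g. +1 = Normal, -1 = ST
```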

3. Experiments

3.1. Dataset description

To validate the financial distress prediction performance of the proposed nonlinear subspace multiple kernel learning for Chinese listed companies, a real dataset collected from Chinese listed companies is adopted in the experiments. The data selection principles cover the diversity of company types, time continuity, a proportionate number of samples of each type, and the diversity of financial ratios. Following these principles, 203 normal companies and 203 ST companies were selected over the period 2006-2013, and 20 different original financial ratios were collected for predicting the present financial situation. The original data are summarized in Table 1.

Table 1. Financial distress data for the experiments.

Year | Normal data set | ST data set | Ratios
2006 | 39 | 39 | 20
2007 | 47 | 47 | 20
2008 | 20 | 20 | 20
2009 | 18 | 18 | 20
2010 | 30 | 30 | 20
2011 | 11 | 11 | 20
2012 | 24 | 24 | 20
2013 | 14 | 14 | 20

3.2. Experimental design

To validate the effectiveness of the proposed nonlinear subspace multiple kernel learning (NS-MKL) method, several state-of-the-art FDP methods and financial ratio selection methods were included for comparison: multiple discriminant analysis (MDA), the Logit algorithm, k-NN, neural networks (NNs) and the single-kernel SVM. In each experiment, the same training and test samples were used for all FDP methods, for a fair comparison. The linear subspace MKL developed for FDP in our previous work is also compared and is denoted by LS-MKL.

For the MKL algorithms, Gaussian kernels with different bandwidths were adopted; the different scales help the kernels better capture the similarity between samples at several scale levels. The bandwidth was varied from 0.05 to 2 with step 0.05, so 40 basis kernels were predefined. Much research in machine learning has shown that this bandwidth range is reasonable for various applications to normalized data [16,17]. Each experiment was repeated 10 times, and the average prediction accuracy was used for evaluation. The effectiveness of the proposed method was tested under different conditions, i.e., with different year-prediction models and different ratios between normal and ST samples (1:1, 2:1).

3.3. Experimental results and analysis

3.3.1. Prediction: T-1 and T-2 models, fixed training samples

Financial data of normal companies and ST companies collected in different years differ greatly. Conventional prediction methods generally perform well on data from a single year, but not across all years; that is, their generalization ability is not good enough for FDP. Therefore, different year-prediction models were adopted. Furthermore, the serious imbalance between normal and ST company samples observed in practice was taken into account by fixing different ratios between normal and ST company samples.

In the first experiment, the ratio between normal and ST company samples for training was fixed to 1:1. Forty percent of the samples were randomly selected as training samples, and the rest were used as test samples. The T-1 and T-2 models were used with the different prediction methods for comparison. Tables 2 and 3 show the prediction results of the compared methods and of the proposed NS-MKL method. From both tables, it can be found that the average prediction accuracy from 2006 to 2013 of the proposed NS-MKL method is better than that of the others; namely, the proposed NS-MKL always achieves the best prediction accuracy among those methods under the different normal/ST ratios and for both the T-1 and T-2 models.
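A compact sketch of this protocol (the bandwidth grid of 40 Gaussian kernels, repeated random 40%/60% splits, and mean±std accuracy) is given below. The stratified split is our way of keeping the normal/ST ratio fixed across splits; the paper does not specify its splitting routine.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

sigmas = np.arange(0.05, 2.0001, 0.05)   # 40 bandwidths: 0.05, 0.10, ..., 2.00

def evaluate(X, y, n_repeats=10, train_size=0.4, seed=0):
    """Repeat random 40%/60% splits and report mean and std of accuracy."""
    accs = []
    for r in range(n_repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=train_size, stratify=y, random_state=seed + r)
        y_hat = ns_mkl_fit_predict(X_tr, y_tr, X_te, sigmas)  # defined above
        accs.append(accuracy_score(y_te, y_hat))
    return float(np.mean(accs)), float(np.std(accs))
```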

Table 2. Prediction accuracy (%, mean±std) with T-1 and T-2 models (different years, normal/ST ratio 1:1, 40% of samples for training).

T-1 model:
Year | MDA | LOGIT | KNN | NNs | SVM | LS-MKL | NS-MKL
2006 | 83.70±7.13 | 78.48±6.86 | 89.78±5.03 | 74.35±9.90 | 91.09±4.20 | 92.83±3.25 | 94.13±3.43
2007 | 70.00±8.67 | 84.11±5.35 | 84.82±4.06 | 75.00±7.94 | 86.43±4.14 | 86.61±3.50 | 87.14±3.45
2008 | 71.67±8.74 | 62.50±4.81 | 80.00±5.12 | 65.83±7.30 | 83.75±5.36 | 84.17±5.49 | 85.00±4.57
2009 | 70.50±9.26 | 59.50±10.35 | 69.00±7.38 | 65.50±7.62 | 79.00±5.58 | 78.50±7.48 | 77.50±7.50
2010 | 87.78±6.17 | 67.50±10.86 | 85.83±5.47 | 76.67±10.49 | 89.17±8.01 | 88.89±5.55 | 91.11±4.68
2011 | 80.00±10.15 | 71.67±11.76 | 79.17±11.45 | 65.00±12.92 | 89.17±6.86 | 90.83±5.30 | 91.67±4.98
2012 | 85.71±5.32 | 61.43±9.34 | 87.50±4.53 | 81.07±9.79 | 93.21±2.64 | 93.21±3.25 | 92.86±4.12
2013 | 81.25±10.21 | 76.88±10.64 | 82.50±7.10 | 69.38±7.48 | 80.00±6.45 | 81.25±5.89 | 83.75±5.27
Avg. | 78.83±8.01 | 70.26±8.89 | 82.33±6.34 | 71.60±9.89 | 86.48±5.34 | 87.04±5.30 | 87.90±5.12

T-2 model:
Year | MDA | LOGIT | KNN | NNs | SVM | LS-MKL | NS-MKL
2006 | 80.22±11.25 | 70.65±11.92 | 84.78±5.25 | 76.30±9.62 | 80.22±5.80 | 83.26±7.41 | 86.96±8.34
2007 | 74.46±9.78 | 68.57±8.96 | 80.36±3.04 | 69.46±5.48 | 70.36±8.17 | 76.61±7.36 | 81.96±4.08
2008 | 59.17±8.29 | 59.17±10.17 | 71.67±9.17 | 64.17±9.25 | 66.67±8.33 | 68.33±9.04 | 68.33±7.66
2009 | 56.50±11.56 | 49.50±18.02 | 64.50±12.57 | 60.00±5.77 | 62.00±11.83 | 63.50±7.09 | 68.50±11.80
2010 | 69.72±8.33 | 56.11±6.78 | 73.89±5.89 | 67.50±6.42 | 78.89±6.57 | 80.28±5.77 | 79.72±3.72
2011 | 78.33±8.96 | 73.33±19.16 | 82.50±4.73 | 71.67±11.25 | 82.50±6.15 | 81.67±5.27 | 82.50±4.73
2012 | 74.29±8.38 | 53.57±10.51 | 74.64±5.94 | 65.71±8.28 | 69.29±7.93 | 74.29±6.90 | 76.43±7.18
2013 | 74.38±10.40 | 61.25±16.08 | 76.88±8.36 | 75.00±9.77 | 78.13±6.07 | 81.25±10.21 | 83.75±11.49
Avg. | 70.88±9.62 | 61.52±12.70 | 76.16±6.87 | 68.73±8.23 | 73.51±7.61 | 76.15±7.38 | 78.52±7.37




Table 3. Prediction accuracy (%, mean±std) with T-1 and T-2 models (different years, normal/ST ratio 2:1, 40% of samples for training).

T-1 model:
Year | MDA | LOGIT | KNN | NNs | SVM | LS-MKL | NS-MKL
2006 | 92.57±3.37 | 66.00±11.46 | 89.14±5.18 | 77.71±8.17 | 92.86±4.70 | 93.14±3.35 | 93.43±4.48
2007 | 81.67±4.35 | 67.14±6.13 | 84.29±5.41 | 75.95±6.49 | 85.71±2.85 | 85.48±3.17 | 86.43±2.26
2008 | 80.56±7.52 | 74.44±17.01 | 75.56±9.15 | 72.78±9.96 | 77.78±9.15 | 81.11±5.86 | 81.67±7.88
2009 | 70.67±9.00 | 76.00±14.47 | 80.67±4.92 | 74.67±9.84 | 83.33±8.78 | 84.00±9.53 | 85.33±7.86
2010 | 84.07±8.91 | 67.78±11.98 | 85.93±5.47 | 74.81±4.88 | 90.00±7.24 | 91.11±5.80 | 92.96±5.91
2011 | 85.56±5.37 | 72.22±16.77 | 82.22±15.89 | 70.00±15.76 | 90.00±9.73 | 90.00±9.73 | 90.00±8.20
2012 | 82.38±9.80 | 67.14±11.32 | 82.38±6.37 | 72.38±10.95 | 86.19±6.35 | 85.71±7.86 | 88.10±5.70
2013 | 65.00±15.11 | 70.00±22.64 | 76.67±9.46 | 63.33±14.80 | 80.00±8.61 | 81.67±8.05 | 80.00±8.96
Avg. | 80.31±7.93 | 70.09±13.97 | 82.11±7.73 | 72.70±10.11 | 85.73±7.18 | 86.53±6.67 | 87.24±6.40

T-2 model:
Year | MDA | LOGIT | KNN | NNs | SVM | LS-MKL | NS-MKL
2006 | 83.43±5.52 | 69.14±5.99 | 88.57±4.47 | 71.14±16.35 | 82.86±5.71 | 88.29±4.35 | 89.14±4.82
2007 | 78.33±8.43 | 66.90±8.94 | 78.81±8.58 | 77.14±5.29 | 79.05±5.36 | 81.67±6.74 | 84.76±6.27
2008 | 74.44±5.37 | 61.11±12.28 | 69.44±7.52 | 70.56±17.58 | 73.89±5.27 | 75.56±5.97 | 77.78±4.54
2009 | 68.00±15.65 | 62.67±10.04 | 83.33±10.54 | 62.67±14.81 | 78.67±8.20 | 80.00±8.31 | 80.00±8.31
2010 | 72.96±5.25 | 63.70±13.84 | 75.19±3.51 | 69.26±16.93 | 79.63±6.36 | 81.11±4.77 | 82.59±4.95
2011 | 73.33±15.00 | 72.22±21.75 | 84.44±7.77 | 63.33±15.76 | 86.67±10.21 | 88.89±7.41 | 88.89±9.07
2012 | 67.14±8.23 | 53.81±8.41 | 70.48±8.92 | 55.24±11.92 | 67.62±11.62 | 67.62±10.24 | 69.52±6.02
2013 | 82.50±7.30 | 75.00±14.16 | 85.83±11.15 | 64.17±14.19 | 85.00±12.30 | 88.33±11.25 | 87.50±11.28
Avg. | 75.01±8.84 | 65.56±11.93 | 79.51±7.81 | 66.68±14.10 | 79.17±8.13 | 81.43±7.38 | 82.52±6.91

Table 4. Prediction accuracy (%, mean±std) with the T-1 model and different numbers of training samples, under the matching ratio 1:1.

Training samples | MDA | LOGIT | KNN | NNs | SVM | LS-MKL | NS-MKL
10 | 82.80±5.11 | 56.92±9.49 | 84.87±3.19 | 73.60±8.01 | 85.44±5.88 | 88.46±4.20 | 89.69±3.11
20 | 83.74±5.49 | 75.16±7.15 | 85.41±2.00 | 76.26±11.58 | 86.72±4.15 | 87.94±2.88 | 89.79±2.29
30 | 83.76±8.73 | 82.57±3.18 | 86.73±2.31 | 74.19±8.30 | 88.18±3.68 | 89.87±2.96 | 91.39±1.87
40 | 86.60±5.61 | 83.22±3.41 | 86.53±2.07 | 78.53±9.93 | 87.55±3.85 | 88.98±1.96 | 91.26±1.69
50 | 89.18±2.61 | 85.72±3.85 | 87.61±1.38 | 79.18±7.93 | 87.94±3.26 | 90.54±2.47 | 92.72±1.79
60 | 81.57±13.79 | 88.57±2.21 | 87.17±1.89 | 79.34±7.69 | 90.70±1.70 | 91.70±1.16 | 92.84±1.12

Table 5. Prediction accuracy (%, mean±std) with the T-1 model and different numbers of training samples, under the matching ratio 2:1.

Training samples | MDA | LOGIT | KNN | NNs | SVM | LS-MKL | NS-MKL
10 | 77.72±12.40 | 59.12±9.94 | 86.21±4.97 | 71.30±7.35 | 86.04±4.22 | 89.84±3.15 | 91.79±2.28
20 | 85.51±4.65 | 81.13±5.88 | 86.94±2.60 | 77.51±13.30 | 88.94±3.41 | 90.70±2.27 | 92.15±2.02
30 | 85.80±5.33 | 82.45±5.04 | 87.43±2.99 | 82.65±7.74 | 88.16±3.86 | 90.22±3.15 | 92.53±2.18
40 | 88.44±3.65 | 84.27±5.89 | 87.38±2.90 | 74.98±15.59 | 88.49±2.63 | 90.87±1.85 | 92.22±1.27
50 | 88.54±3.41 | 89.61±1.98 | 89.32±1.76 | 77.80±14.40 | 90.39±2.12 | 92.37±2.13 | 93.80±1.89
60 | 89.14±8.16 | 89.41±3.67 | 87.95±3.97 | 78.27±11.14 | 90.59±3.52 | 91.43±2.75 | 92.59±2.50

Table 6. Prediction accuracy (%, mean±std) with the T-2 model and different numbers of training samples, under the matching ratio 1:1.

Training samples | MDA | LOGIT | KNN | NNs | SVM | LS-MKL | NS-MKL
10 | 71.19±8.25 | 52.69±1.94 | 76.76±5.12 | 65.96±4.07 | 69.77±2.81 | 76.16±2.98 | 79.88±1.94
20 | 77.60±5.57 | 69.81±3.74 | 79.23±8.91 | 65.90±5.84 | 75.16±3.93 | 78.92±3.85 | 81.86±3.07
30 | 70.95±5.79 | 71.50±2.10 | 76.02±9.74 | 70.72±6.57 | 75.79±3.42 | 77.01±3.57 | 81.36±3.03
40 | 78.65±4.73 | 77.42±3.09 | 79.71±8.50 | 71.20±2.69 | 79.25±4.22 | 80.36±3.67 | 82.77±3.42
50 | 76.54±3.30 | 76.34±2.63 | 80.52±9.53 | 75.10±2.28 | 79.51±3.62 | 80.30±4.34 | 81.93±1.59
60 | 73.71±4.31 | 80.52±3.75 | 80.57±6.95 | 73.46±3.18 | 80.89±3.59 | 81.24±3.90 | 84.13±1.82




Table 7. Prediction accuracy (%, mean±std) with the T-2 model and different numbers of training samples, under the matching ratio 2:1.

Training samples | MDA | LOGIT | KNN | NNs | SVM | LS-MKL | NS-MKL
10 | 75.72±8.41 | 49.61±6.27 | 76.28±7.33 | 72.67±6.23 | 75.44±6.21 | 78.23±7.52 | 80.88±7.39
20 | 82.34±5.90 | 68.08±6.91 | 80.91±5.41 | 74.00±12.54 | 78.08±6.87 | 82.28±5.94 | 85.06±4.29
30 | 79.10±5.09 | 70.24±7.11 | 79.51±3.67 | 68.45±18.05 | 77.31±5.01 | 80.76±3.69 | 84.69±3.33
40 | 77.56±8.16 | 78.44±4.82 | 81.42±4.53 | 69.07±18.26 | 80.27±2.71 | 81.22±6.21 | 85.60±3.60
50 | 77.61±18.32 | 79.95±3.77 | 80.68±2.66 | 73.46±14.29 | 82.29±5.44 | 80.32±3.96 | 85.66±2.97
60 | 82.81±10.10 | 81.57±3.32 | 82.59±2.81 | 83.41±4.12 | 83.78±3.62 | 83.97±4.20 | 87.19±2.55

Table 8. Prediction accuracy (%, mean±std) with T-1 and T-2 models for different industries (normal/ST ratio 1:1, sampling ratio 0.4).

T-1 model:
Industry | MDA | LOGIT | KNN | NNs | SVM | LS-MKL | NS-MKL
C1 | 91.82±11.70 | 85.45±7.67 | 93.55±4.69 | 80.00±17.30 | 93.55±4.69 | 94.55±4.69 | 90.91±7.42
C2 | 52.41±0.58 | 84.45±3.51 | 87.08±1.89 | 80.15±8.13 | 87.01±2.92 | 89.20±1.81 | 90.80±1.89
C3 | 85.00±5.27 | 62.86±14.60 | 82.14±8.42 | 69.29±15.08 | 83.57±10.13 | 83.57±10.13 | 87.86±5.88
C4 | 78.75±12.22 | 72.50±15.08 | 82.50±13.11 | 72.50±14.19 | 90.00±8.44 | 88.13±9.06 | 86.25±10.94
C5 | 75.52±7.35 | 59.31±7.41 | 86.21±8.90 | 73.45±11.61 | 87.24±6.91 | 87.93±5.92 | 90.00±5.00
Avg. | 76.70±7.42 | 72.91±9.65 | 86.30±7.40 | 75.08±13.21 | 88.27±6.62 | 88.68±6.32 | 89.16±6.23

T-2 model:
Industry | MDA | LOGIT | KNN | NNs | SVM | LS-MKL | NS-MKL
C1 | 85.45±10.67 | 85.45±14.97 | 87.27±8.78 | 79.09±11.38 | 86.36±9.36 | 87.27±9.82 | 88.18±9.63
C2 | 62.99±3.89 | 78.98±5.60 | 74.16±4.36 | 74.31±9.49 | 76.20±3.85 | 78.10±4.56 | 82.77±2.41
C3 | 75.00±10.24 | 67.14±14.36 | 78.57±6.73 | 70.00±12.05 | 73.57±10.13 | 80.71±9.79 | 79.29±9.19
C4 | 60.63±6.62 | 50.63±12.31 | 58.13±11.80 | 49.38±8.04 | 56.25±11.02 | 59.38±10.40 | 59.38±10.72
C5 | 82.07±6.25 | 52.76±10.54 | 81.38±5.44 | 73.79±13.02 | 83.79±5.64 | 82.76±4.60 | 82.76±3.25
Avg. | 73.23±7.54 | 66.99±11.56 | 75.90±7.42 | 69.31±10.80 | 75.23±8.06 | 77.64±7.83 | 78.48±7.04

3.3.2. Prediction: T-1 and T-2 models, varying training samples

In the second experiment, prediction was performed with the T-1 and T-2 models but different numbers of training samples. The training samples were randomly selected, and their number was varied from 10 to 60 for normal and for ST companies. The ratios between normal and ST company samples were again fixed to 1:1 and 2:1. The prediction results with the T-1 model and the different matching ratios are given in Tables 4 and 5. From both tables, it can be found that the proposed NS-MKL method outperforms the others in terms of prediction accuracy as the number of training samples increases. For both the 1:1 and 2:1 normal/ST ratios, the proposed prediction method keeps its superiority with the T-1 model, which indicates that the prediction performance of the proposed NS-MKL is stable. Tables 6 and 7 provide the prediction results with the T-2 model, for the ratios 1:1 and 2:1, respectively. The same conclusion as with the T-1 model can be drawn; the experiments with the T-2 model also demonstrate the effectiveness of the proposed method for the FDP task.

3.3.3. Prediction: T-1 and T-2 models, different industries

In the third experiment, the prediction performance of the proposed method was tested by industry. The financial data of five different industries were selected from the original data: mining, manufacturing, energy, retail and real estate, denoted by C1, C2, C3, C4 and C5. The sampling ratio of the training samples was again fixed to 40 percent, and the ratio between normal and ST samples to 1:1. Table 8 shows the prediction results. From Table 8, it can be found that the proposed NS-MKL method performs best for C2, C3 and C5, and that its average prediction performance over the five industries is better than that of the others for both the T-1 and T-2 models.

4. Conclusions

In this paper, a nonlinear subspace multiple kernel learning method is proposed for FDP. In the proposed method, the predefined basis kernels are first converted from matrix form into vector form. A kernel-based nonlinear subspace method is then adopted to learn an 'optimal' combined kernel from the vector-form basis kernels; the optimally combined kernel represents the predefined basis kernels in the sense of maximizing variance in the feature space. Experiments were conducted to verify the effectiveness of the proposed method with respect to T-1 and T-2 models, varying numbers of training samples, and different industries. The results show that the proposed NS-MKL method outperforms the state-of-the-art methods in terms of prediction performance in all cases.

Acknowledgement

This work was supported by the Humanities and Social Sciences Project of the Heilongjiang Province Education Department under Grant 12532303.

References

[1] W. Beaver, Financial ratios as predictors of failure, J. Account. Res. 4 (1966) 71–111.
[2] E.I. Altman, Financial ratios, discriminant analysis and the prediction of corporate bankruptcy, J. Financ. 23 (1968) 589–609.


[3] J.A. Ohlson, Financial ratios and the probabilistic prediction of bankruptcy, J. Account. Res. 18 (1) (1980) 109–131.
[4] S.-C. Carlos, Self-organizing neural networks for financial diagnosis, Decis. Support Syst. 17 (1996) 227–238.
[5] S. Cho, J. Kim, J.K. Bae, An integrative model with subject weight based on neural network learning for bankruptcy prediction, Expert Syst. Appl. 36 (2009) 403–410.
[6] C. Cortes, V. Vapnik, Support-vector networks, Mach. Learn. 20 (1995) 273–297.
[7] L.-H. Chen, H.-D. Hsiao, Feature selection to diagnose a business crisis by using a real GA-based support vector machine: an empirical study, Expert Syst. Appl. 35 (2008) 1145–1155.
[8] Y. Ding, X. Song, Y. Zen, Forecasting financial condition of Chinese listed companies based on support vector machine, Expert Syst. Appl. 34 (2008) 3081–3089.
[9] J.H. Friedman, P. Hall, On bagging and nonlinear estimation, J. Stat. Plan. Inference 137 (2007) 669–683.
[10] J. Sun, M.Y. Jia, H. Li, AdaBoost ensemble for financial distress prediction: an empirical comparison with data from Chinese listed companies, Expert Syst. Appl. 38 (2011) 9305–9312.
[11] D.M. Blei, A.Y. Ng, M.I. Jordan, Latent Dirichlet allocation, J. Mach. Learn. Res. 3 (2003) 993–1022.
[12] B. Schölkopf, A. Smola, K.-R. Müller, Kernel principal component analysis, Lect. Notes Comput. Sci. 1327 (1997) 583–588.
[13] G. Baudat, F. Anouar, Generalized discriminant analysis using a kernel approach, Neural Comput. 12 (10) (2000) 2385–2404.
[14] M. Gönen, E. Alpaydın, Multiple kernel learning algorithms, J. Mach. Learn. Res. 12 (2011) 2211–2268.
[15] A. Rakotomamonjy, F.R. Bach, S. Canu, Y. Grandvalet, SimpleMKL, J. Mach. Learn. Res. 9 (2008) 2491–2521.
[16] Y. Gu, C. Wang, D. You, et al., Representative multiple-kernel learning for classification of hyperspectral imagery, IEEE Trans. Geosci. Remote Sens. 50 (7) (2012) 2852–2865.
[17] Y. Gu, G. Gao, D. Zuo, D. You, Model selection and classification with multiple kernel learning for hyperspectral images via sparsity, IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 7 (6) (2014) 2119–2130.
[18] N. Subrahmanya, Y.C. Shin, Sparse multiple kernel learning for signal processing applications, IEEE Trans. Pattern Anal. Mach. Intell. 32 (5) (2010) 788–798.


[19] X. Zhang, L. Hu, Z. Wang, Multiple kernel support vector regression for economic forecasting, in: Proceedings of the 17th International Conference on Management Science & Engineering, Melbourne, Australia, November 24–26, 2010, pp. 129–134.

Xiangrong Zhang was born in Heilongjiang, China, in 1979. She received her bachelor's degree from Heilongjiang University in 1999 and her master's degree from Harbin Engineering University in 2006. She is currently working toward the Ph.D. degree in the School of Management, Harbin Institute of Technology, China. Her research interests include technical innovation, data analysis, and forecasting.

Longying Hu was born in Harbin, China, in 1960. He received the Ph.D. degree in technical economics and management in 2000. He is currently a professor in the School of Management, Harbin Institute of Technology, China. His research interests include technical innovation, technical economics, and strategy alliance.
