Accepted Manuscript

A new approach for reject inference in credit scoring using kernel-free fuzzy quadratic surface support vector machines

Ye Tian, Ziyang Yong, Jian Luo

PII: S1568-4946(18)30481-2
DOI: https://doi.org/10.1016/j.asoc.2018.08.021
Reference: ASOC 5052
To appear in: Applied Soft Computing Journal
Received date: 18 November 2017
Revised date: 16 August 2018
Accepted date: 19 August 2018

Please cite this article as: Y. Tian, et al., A new approach for reject inference in credit scoring using kernel-free fuzzy quadratic surface support vector machines, Applied Soft Computing Journal (2018), https://doi.org/10.1016/j.asoc.2018.08.021
A New Approach for Reject Inference in Credit Scoring Using Kernel-free Fuzzy Quadratic Surface Support Vector Machines

Ye Tian^a, Ziyang Yong^b, Jian Luo^c,*

a School of Business Administration and Collaborative Innovation Center of Financial Security, Southwestern University of Finance and Economics, Chengdu, 611130, China
b School of Business Administration, Southwestern University of Finance and Economics, Chengdu, 611130, China
c School of Management Science and Engineering, Dongbei University of Finance and Economics, Dalian, 116025, China
Abstract

Credit scoring models have offered benefits to lenders and borrowers for many years. In practice, however, these models are normally built on a sample of accepted applicants and fail to consider the remaining rejected applicants. This may cause a sample bias, which is an important statistical issue, especially in online lending, where a large proportion of requests are rejected. Reject inference is a method for inferring how rejected applicants would have behaved had they been granted loans, and for incorporating this information when rebuilding a more accurate credit scoring system. Motivated by the good performance of SVM models in this area, this paper proposes a new approach based on the state-of-the-art kernel-free fuzzy quadratic surface SVM model. Our method not only classifies as well as recent state-of-the-art methods, but also avoids two major difficulties of classical SVM models: searching for a proper kernel function and solving computationally demanding formulations. In addition, this paper is the first to mitigate the adverse effect of outliers in credit scoring. Moreover, we use two real-world loan data sets to compare our method with several benchmark methods. One of the data sets is particularly valuable for the study of reject inference, because the outcomes of the rejected applicants are partially known. The numerical results strongly demonstrate the superiority of the proposed method in applicability, accuracy and efficiency.

Keywords: Reject Inference, Credit Scoring, Kernel-free Quadratic Surface SVM, Outlier Detection, Online Lending
* Corresponding author.
Email address: [email protected] (Jian Luo)

Preprint submitted to Applied Soft Computing, August 16, 2018
1. Introduction

Credit scoring is an automatic method for modeling the potential risk of credit applicants [1]. It employs various statistical and machine learning techniques on historical data to generate a credit score. Due to its good performance in risk control, credit scoring has largely aided the growth of the loan market. However, since lenders regularly have only biased sample information, such models may not accurately and completely reflect the true situation of the whole applicant population. In order to improve the predictive power, researchers have considered incorporating information from the rejected borrowers into the models. Thus, reject inference has been used to assess the default risk of rejected applicants when their actual statuses are unavailable. Due to its importance, reject inference has drawn much attention from both academia and industry [2, 3, 4].

In recent years, thanks to the rapid development of the Internet, lenders can earn higher returns while borrowers can borrow money at lower interest rates. Therefore, online lending has achieved great success in the loan markets of many countries. Normally, the only way for a lender to learn an applicant's attributes is from his/her online request [5]. Based on this limited information, lenders may set high thresholds and reject many "unqualified" borrowers to prevent large losses. This abnormally large proportion of rejections actually provides a good opportunity to investigate the problem of reject inference, which is of great importance to both lenders and applicants.

The main contributions of this manuscript include the following three aspects. First, we propose a new method using the state-of-the-art kernel-free Fuzzy Quadratic Surface Support Vector Machine (FQSSVM) to infer the statuses of the rejected applicants.
As a popular machine learning technique, the Support Vector Machine (SVM) and its semi-supervised extension (S3VM) have achieved great success in credit scoring in recent years [6, 7], and the reject inference methods based on them also perform well and show great potential [8, 9]. It is worth pointing out that a kernel function is always necessary in a traditional SVM model to handle nonlinearity in the data set. However, there is no universal rule for automatically choosing a suitable kernel for a given data set. Therefore, finding a proper kernel function and its corresponding parameters is a hard task for users and limits the applicability of the method. Besides, since a semi-supervised SVM introduces binary-integer variables into the model, the corresponding problem becomes a mixed integer quadratic programming (MIQP) problem which is generally difficult to solve. Thus, the latest method in [8] is unable to deal with some large-sized problems, and its approximate algorithm [10] may not reach the "real" optimal solution. In order to overcome these drawbacks, a kernel-free FQSSVM model is incorporated into our new approach. The new model does not need a kernel function and has a convex structure which is easy to solve.

Second, this paper is the first attempt to solve the "outlier" problem in credit assessment. Here, the "outliers" are a small portion of "bad" applicants with excellent attributes.
Usually, these applicants would perform well and should not be considered bad ones in any classification. However, due to some unexpected and inevitable factors, they may default on their loans. Hence, if these applicants are not handled properly, their presence may largely affect the classification accuracy in the future. To the best of our knowledge, this issue has not been addressed in the literature. In this paper, we design a novel way to lessen the adverse effect of those potential outliers.

Third, we use one precious data set in which the statuses of the rejected applicants are partially known. Since most previous works have no information on the rejected applicants, this data set is very valuable for the study of reject inference.

The remainder of the paper is structured as follows. In Section 2, we review some classical inference studies in credit scoring. In Section 3, the benchmark Logit, K-NN, SVM and S3VM models are briefly reviewed and the new FQSSVM model is introduced. Section 4 then shows in detail how we prepare the data sets, after which we propose our new method in Section 5. We conduct several numerical experiments to compare our method with other methods in Section 6. Concluding remarks and future research directions are given in Section 7.

2. Literature review

Ideally, credit models should be built on the complete information of all applicants. However, in practice we only have records of the accepted applicants and, in most cases, little information about the repayment performance of rejected applicants. This sample bias not only results in a possible bias in the parameter estimation, but also leads to unstable predictive performance on the application population [11]. Note that the sample bias in reject inference is essentially a missing data problem related to two types of missingness in the context of credit scoring, namely Missing Not at Random (MNAR) and Missing at Random (MAR) [12].
To infer the performances of the rejected applicants and mitigate the bias, different techniques have been developed according to these types of missingness. We summarize some typical works here. MNAR means the class of the outcome depends on both the features of applicants and some unobserved variables [13]. In other words, MNAR indicates that the loan officer has the right to change the decision according to his/her overall impression of an applicant. For example, Germany forbids lenders from merely using credit models in the judgement; instead, human investigation must be incorporated in the final decision. Due to the difficulty caused by the unobserved variables, very few empirical studies have investigated MNAR reject inference problems. Bücker et al. [3] found that the parameters would be significantly altered and the predictions would be improved by adding the information of the rejected group to the
predictive model. This result indicates that ignoring the rejected group is not appropriate from either a statistical or an economic point of view.

MAR means the missingness is random but can be fully accounted for by observed variables. In other words, the decision making is based on the features of applicants together with some criterion. It is worth pointing out that most rejected cases belong to MAR in credit scoring practice, and our paper also focuses on this MAR reject inference problem. Note that if the probability distributions of the features of both accepted and rejected applicants are known, the Expectation-Maximization (EM) logistic model can be applied to do the reject inference [14]. Traditionally, the selection mechanism is the Accept-Reject (AR) decision and the outcome mechanism is the Good-Bad (GB) decision. The AR decision is based on a set of features X_AR while the GB decision depends on another set of features X_GB. If X_AR is a subset of X_GB, then the extrapolation method can be applied [15]. If X_AR is not a subset of X_GB, which means no single set of features determines both the AR and GB decisions, then the augmentation method can be used [16]. On the corporate credit side, Kim and Sohn [17] found that combining samples of the accepts and the rejects could improve the predictive accuracy. Aside from extrapolation and augmentation, reweighting [3] and reclassification [18] are two other techniques in reject inference. Besides, Bayesian theory was combined with reject inference to improve the discriminant performance under the MNAR assumption [19]. Survival analysis is another popular credit scoring method in recent years. Sohn and Shin [4] first considered the time to late repayment as a measurement of default; the class of the rejected applicants depends on the distribution of the survival time in their model.
Banasik and Crook [20] also tested survival analysis on multi-period consumer loan data over 40 months. However, their empirical results indicate that this method has a negative bearing on reject inference. As pointed out in [21], reject inference can be effective only if the corresponding methods are based either on extrapolating the acceptance model to the rejected applicants or on the distribution of the rejected applicants. Otherwise, supplementary information such as the actual statuses of the rejected applicants should be collected from other sources, as in [15, 17, 19, 20]; such supplementary information is usually very hard to obtain. Apart from these statistical methods, machine learning techniques have become a major stream in both consumer and corporate credit modeling [22]. Neural networks, genetic algorithms and their extensions have been compared to conventional statistical methods [11, 23, 24]. Recently, due to its impressive performance as a classifier, the Support Vector Machine (SVM) and its extensions have been widely applied to credit assessment in the literature [6, 25]. Particularly, Maldonado and Paredes [9] iteratively applied a linear
Table 1: Summary of the reject inference literature

Author                            | Data type  | Status of rejects | No. of accepts | No. of rejects | Reject inference approach  | Method
Joanes (1993)                     | Artificial | Unknown           | 75             | 12             | Reclassification           | Logistic
Feelders (1999)                   | Artificial | Unknown           | Varying        | Varying        | EM                         | QDA, Logistic
Banasik et al. (2003)             | Consumer   | Known             | 8168           | 4040           | Augmentation               | Logistic, Probit
Crook & Banasik (2004)            | Consumer   | Known             | 8168           | 4040           | Reweighting, Extrapolation | Logistic
Verstraeten & Van den Poel (2004) | Consumer   | Partially known   | 38,048         | 6306           | Augmentation               | Logistic
Banasik & Crook (2005)            | Consumer   | Known             | 8168           | 4040           | Augmentation               | Logistic
Sohn & Shin (2006)                | Consumer   | Unknown           | 759            | 10             | Reclassification           | Survival analysis
Banasik & Crook (2007)            | Consumer   | Known             | 8168           | 4040           | Augmentation               | Logistic, Probit
Kim & Sohn (2007)                 | Corporate  | Known             | 4298           | 689            | Extrapolation              | Bivariate Probit
Banasik & Crook (2010)            | Consumer   | Known             | 147,179        | Varying        | Augmentation               | Survival analysis
Maldonado & Paredes (2010)        | Consumer   | Known             | 800            | 200            | Extrapolation              | SVM
Chen & Astebro (2012)             | Corporate  | Known             | 4589           | Varying        | Bound and Collapse         | Bayesian
Bücker et al. (2013)              | Consumer   | Unknown           | 3984           | 5667           | Reweighting                | Logistic
Anderson & Hardin (2013)          | Consumer   | Unknown           | 3000           | 1500           | Augmentation, EM           | Logistic
Li et al. (2017)                  | Consumer   | Unknown           | 53,630         | 536,738        | Extrapolation              | S3VM
surface SVM model to perform reject inference. Furthermore, in order to incorporate the information of the rejects, Li et al. [8] proposed a semi-supervised SVM (S3VM) model to infer the class of the rejected applicants. In summary, we give a chronological overview of the typical reject inference studies to date in Table 1.

3. Methodology

We first introduce some notation used in this paper. R denotes the set of real numbers, R^n denotes the n-dimensional vector space and R^{n×n} denotes the space of n × n matrices. For a vector x ∈ R^n, x_i denotes the ith component of x and ‖x‖ denotes the 2-norm of x. For a matrix A ∈ R^{n×n}, A_{ij} denotes the element in the ith row and jth column of A.

Several quantitative methods have been developed for credit scoring. Among them, logistic regression and K-nearest neighbor (K-NN) are the most commonly used and well-known methods. Interested readers can refer to [1, 26] for detailed information.

3.1. Supervised and semi-supervised SVM

Recently, another popular and effective method for credit risk management is the SVM, a classification technique from the area of machine learning [27]. SVM models can be divided into three categories according to the labeling situation: supervised SVM (all training points are labeled), semi-supervised or transductive SVM (some training points are labeled while others are unlabeled) and unsupervised SVM (all training points are unlabeled). It is worth noting that classical SVM models with "linear" hyperplanes cannot perform well on a data set with nonlinear structure. Hence, researchers proposed mapping the data points into a higher-dimensional feature space via a nonlinear mapping function ψ(x): R^m → R^d, d > m, and then separating the points by a hyperplane w^T ψ(x) + b = 0
in the new space [28]. Since the mapping function is hard to handle directly, researchers introduced the kernel function in the dual problem. Note that there are many choices for the kernel function [29].

Given a training data set S = {(x^1, y_1), ..., (x^n, y_n)}, where y_i = −1 means default (bad) and y_i = 1 means good. If all the training points are labeled, i.e., y_i is known for each i, then the supervised SVM model with a mapping function can be written as follows:

$$\min \ \frac{1}{2}\|w\|^2 + M\sum_{i=1}^{n}\xi_i^2 \qquad (1)$$
$$\text{s.t.} \quad y_i\left(w^T\psi(x^i) + b\right) \ge 1 - \xi_i, \quad i = 1,\ldots,n,$$

where ξ = (ξ_1, ..., ξ_n) ∈ R^n is a vector of slack variables measuring the misclassification error of each point, and M > 0 is the penalty on the misclassification error in the objective function.

If part of the training points are labeled while the remaining points are unlabeled, the task becomes a semi-supervised learning problem. The attributes of the unlabeled points still contain important information about their positions. Hence, in order to improve the classification accuracy, semi-supervised SVM combines the labeled and unlabeled samples to capture the structure of the data. The main idea of semi-supervised SVM is to maximize the margin between the labeled points of the two classes in the presence of the unlabeled points, by keeping the boundary traversing low-density regions while respecting the labels in the input space [30]. Let S and S̄ denote the index subsets of the labeled and unlabeled training points, respectively, with S ∩ S̄ = ∅ and S ∪ S̄ = {1, ..., n}. Then the semi-supervised SVM model with a mapping function can be written as follows:

$$\min \ \frac{1}{2}\|w\|^2 + M\sum_{i=1}^{n}\xi_i^2 \qquad (2)$$
$$\text{s.t.} \quad y_i\left(w^T\psi(x^i) + b\right) \ge 1 - \xi_i, \quad i = 1,\ldots,n,$$
$$\qquad\ y_i \in \{-1, 1\}, \quad i \in \bar{S}.$$

A detailed discussion of S3VM issues can be found in [30] and [31].
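As a concrete illustration of the kernel approach in model (1) — and of the tuning burden it creates — the sketch below fits an off-the-shelf RBF-kernel SVM to a toy nonlinear data set. Note this is our own minimal example: scikit-learn's SVC penalizes linear rather than squared slacks, and the data and parameter values are illustrative choices.

```python
import numpy as np
from sklearn.svm import SVC

# Two noisy concentric rings: a pattern no linear hyperplane can separate,
# but an RBF kernel handles easily.
rng = np.random.default_rng(0)
n = 200
angles = rng.uniform(0.0, 2.0 * np.pi, n)
radii = np.where(np.arange(n) < n // 2, 1.0, 3.0) + rng.normal(0.0, 0.1, n)
X = np.c_[radii * np.cos(angles), radii * np.sin(angles)]
y = np.where(np.arange(n) < n // 2, 1, -1)   # inner ring "good", outer ring "bad"

# The kernel type and its parameters (C, gamma) must be chosen by the user,
# which is exactly the search burden described in the text.
clf = SVC(kernel="rbf", C=10.0, gamma="scale").fit(X, y)
train_acc = clf.score(X, y)
```

Swapping `kernel="rbf"` for `kernel="linear"` on the same data illustrates the failure of a linear hyperplane on nonlinear structure.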
Notice that the performance of these kernel SVM models depends largely on the choice of the kernel function and its parameters. However, there is no rule for deciding which setting is appropriate for a given data set. Therefore, searching for the optimal setting is a time-consuming and formidable task for the users [32]. Besides, the unlabeled variables y_i, i ∈ S̄, lead to a mixed integer programming problem which is difficult to solve in general. Thus, the S3VM model is unable to handle some large-sized problems [30].

3.2. Kernel-free fuzzy quadratic surface SVM

The major contribution of this paper is to develop a new approach for reject inference based on the kernel-free FQSSVM model. This new model skips the searching process
in the classical SVM or S3VM models. Therefore, it saves users' effort and is more applicable, even for inexperienced starters. Besides, instead of picking a kernel function from several alternatives, FQSSVM directly utilizes a quadratic surface for separation. Since any nonlinear surface can be approximated by a quadratic surface via Taylor expansion, FQSSVM is a more general model than SVM with a kernel [33].

First, we introduce the basic FQSSVM model. It aims to find a quadratic surface

$$g(x) \equiv \frac{1}{2}x^T W x + b^T x + c = 0,$$

where

$$W = W^T = \begin{pmatrix} W_{11} & W_{12} & \cdots & W_{1m} \\ W_{12} & W_{22} & \cdots & W_{2m} \\ \vdots & & \ddots & \vdots \\ W_{1m} & W_{2m} & \cdots & W_{mm} \end{pmatrix} \in \mathbb{R}^{m\times m}, \quad b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix} \in \mathbb{R}^m, \quad c \in \mathbb{R},$$

that separates the n training points {x^i, i = 1, ..., n} into two classes in the original space. Let η_i denote the fuzzy weight of training point i, which represents its importance in the model. Then the FQSSVM model can be stated as follows [33]:

$$\min \ \sum_{i=1}^{n}\eta_i\left\|W x^i + b\right\|_2^2 + M\sum_{i=1}^{n}\eta_i\xi_i^2 \qquad \text{(FQSSVM)}$$
$$\text{s.t.} \quad y_i\left(\frac{1}{2}(x^i)^T W x^i + b^T x^i + c\right) \ge 1 - \xi_i, \quad i = 1,\ldots,n.$$
Similarly, ξ_i is a slack variable measuring the misclassification error of point x^i and M > 0 is a penalty constant on the marginal error. This FQSSVM model can be simplified by the following steps. First, let u be the vector formed by the (m²+m)/2 elements of the upper triangular part of the matrix W, i.e.,

$$u \triangleq \left(W_{11}, W_{12}, W_{13}, \ldots, W_{1m}, W_{22}, W_{23}, \ldots, W_{mm}\right)^T \in \mathbb{R}^{\frac{m^2+m}{2}}.$$

Then we construct an m × (m²+m)/2 matrix M^i for each training point x^i = [x^i_1, x^i_2, ..., x^i_m]^T ∈ R^m as follows. For the jth row of M^i, j = 1, ..., m, check the elements of u one by one: if the pth element of u is W_{jk} or W_{kj} for some k = 1, 2, ..., m, then assign the pth element of the jth row of M^i to be x^i_k; otherwise, assign it to be 0. Afterwards, let

$$H^i \triangleq \left[M^i,\ I_m\right] \in \mathbb{R}^{m\times\left(\frac{m^2+m}{2}+m\right)}, \quad i = 1,\ldots,n, \qquad G \triangleq \sum_{i=1}^{n}\eta_i (H^i)^T H^i \in \mathbb{R}^{\frac{m^2+3m}{2}\times\frac{m^2+3m}{2}},$$

where I_m is the m-dimensional identity matrix. Also define the vector of variables

$$z = \begin{pmatrix} u \\ b \end{pmatrix} \in \mathbb{R}^{\frac{m^2+3m}{2}}$$

and the vector

$$s^i = \left[\tfrac{1}{2}x^i_1 x^i_1, \ldots, x^i_1 x^i_m,\ \tfrac{1}{2}x^i_2 x^i_2, \ldots, x^i_2 x^i_m,\ \ldots,\ \tfrac{1}{2}x^i_{m-1}x^i_{m-1},\ x^i_{m-1}x^i_m,\ \tfrac{1}{2}x^i_m x^i_m,\ x^i_1, x^i_2, \ldots, x^i_m\right] \in \mathbb{R}^{\frac{m^2+3m}{2}}.$$
Finally, the FQSSVM model can be reformulated as follows:

$$\min \ z^T G z + M\sum_{i=1}^{n}\eta_i\xi_i^2 \qquad \text{(FQSSVM}_1\text{)}$$
$$\text{s.t.} \quad y_i\left((s^i)^T z + c\right) \ge 1 - \xi_i, \quad i = 1,\ldots,n, \qquad (z, c) \in \mathbb{R}^{\frac{m^2+3m}{2}+1}.$$
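The reformulated problem (FQSSVM1) is a convex quadratic program. Because the slacks are squared and unconstrained in sign, they can be eliminated (at the optimum ξ_i = max(0, 1 − y_i((s^i)^T z + c))), leaving a smooth convex objective that even a generic quasi-Newton routine can minimize. The sketch below is our own construction for m = 2, using scipy rather than the cvx/SeDuMi solvers cited in the text; all names and data are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

# For m = 2, z = (W11, W12, W22, b1, b2) and
# s^i = [x1^2/2, x1*x2, x2^2/2, x1, x2] as in the reformulation above.
def s_vec(x):
    return np.array([0.5 * x[0] ** 2, x[0] * x[1], 0.5 * x[1] ** 2, x[0], x[1]])

def H_mat(x):
    # H^i = [M^i, I_2], chosen so that H^i z = W x + b.
    return np.array([[x[0], x[1], 0.0, 1.0, 0.0],
                     [0.0, x[0], x[1], 0.0, 1.0]])

# Synthetic data separable by a circle, with a margin around the boundary.
rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, (200, 2))
X = X[np.abs((X ** 2).sum(axis=1) - 1.0) > 0.3]
y = np.where((X ** 2).sum(axis=1) < 1.0, 1.0, -1.0)
n = len(X)

eta = np.ones(n)                        # fuzzy weights (all equal in this sketch)
PEN = 10.0                              # penalty M on the squared slacks
S = np.vstack([s_vec(x) for x in X])
Hs = np.vstack([H_mat(x) for x in X])   # stacked H^i blocks, shape (2n, 5)
row_w = np.repeat(eta, 2)

def objective(theta):
    z, c = theta[:5], theta[5]
    fit = row_w @ (Hs @ z) ** 2         # sum_i eta_i ||H^i z||^2 = z^T G z
    slack = np.maximum(0.0, 1.0 - y * (S @ z + c))
    return fit + PEN * eta @ slack ** 2

res = minimize(objective, np.zeros(6), method="BFGS")
z_opt, c_opt = res.x[:5], res.x[5]
acc = (np.sign(S @ z_opt + c_opt) == y).mean()
```

The point of the sketch is the structure, not the solver: any QP solver applied to (FQSSVM1) directly would recover the same quadratic surface.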
Note that, since G = Σ_{i=1}^{n} η_i (H^i)^T H^i and η_i ≥ 0 for i = 1, ..., n, G is a positive semidefinite matrix. Therefore, problem (FQSSVM1) is a convex quadratic programming problem with linear constraints, which can be solved efficiently by many solvers such as cvx [34] and SeDuMi 1.3 [35].

Remark 1. FQSSVM directly generates a nonlinear (quadratic) separation surface in the original space. This is totally different from using a quadratic kernel. Moreover, the reformulation process can be executed very quickly by the computer, and due to the convex structure of FQSSVM, the model is able to handle some real large-sized problems. Above all, the kernel-free FQSSVM model is a new trend in the machine learning area; it is a state-of-the-art technique that has demonstrated great power in some real applications.

4. Data and variables

In our numerical experiments, we use two real-world data sets. One is the public data set from Lending Club in the United States and the other is the loan data set from an online financial company in China.

Lending Club is one of the largest online credit marketplaces offering P2P lending. Based on the public data provided by Lending Club up to September 2015, we extracted 36-month loans issued from January 2009 to September 2012 (the actual good/bad outcomes of those accepted loans are known in September 2015). After excluding some records with obvious errors, this test data set includes 536,738 rejected loans and 53,630 accepted loans, of which 6,443 defaulted. Following the same approach as [8], we categorize all regions into four groups with
Table 2: Descriptive statistics of different variables in the Lending Club data

                              Accept                                    Reject
Variable            Min    Max     Median  Mean    S.D.   | Min   Max     Median  Mean    S.D.
Loan amount         1000   35,000  10,000  11,490  6,969  | 1000  35,000  10,000  14,900  11,057
Fico score          662    848     697     705.7   33.01  | 385   850     652     642.7   65.05
Debt to income      0      34.99   15.77   15.91   7.40   | 0     419.4   18.42   23.95   28.32
Employment length   0      10      5       5.52    3.53   | 0     10      0       0.7696  2.27
State 1                                    0.12           |                       0.12
State 2                                    0.51           |                       0.46
State 3                                    0.35           |                       0.40
reference to the default rates. Three dummies are then used to denote the first three groups, while the last group (states DC, WV, WY and KS) is left as the reference set. Any applicant falling in the corresponding group of states is given a value 1, and 0 otherwise. Descriptive statistics of the five variables common to the accepted and rejected classes are shown in Table 2. Notice that only the average values of the state variables are calculated.

The other real-world loan data set comes from a Chinese online financial company called Huijin¹. This company was founded in Chongqing in 2013 and has developed very quickly in the area of small online loans. Huijin has several lines of business, such as car loans, decoration loans, mobile phone loans, electrical appliance loans and cash loans. In our numerical tests, we use the mobile phone loan data from 2014 to the present. After excluding some accepted loans with indefinite results (recently accepted loans), this data set includes 93,886 requests: 35,667 accepted and 58,219 rejected². The Huijin data set has two main advantages: it has many more common variables, and it contains known outcomes for some of the rejected applicants. The latter is due to the company's changing strategies. At the beginning, a conservative strategy was applied and only a small portion of applicants were accepted. After a period of time, a riskier strategy was executed, and some applicants rejected in the first stage were accepted in the second stage. If we focus on the reject inference problem in the first stage, then the real repayment behaviors of some applicants who were rejected in the first stage are known to us.
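As an aside, the Lending Club state-grouping described above (four groups by default rate, three dummies, last group as reference) can be sketched with pandas; the column names and toy records below are our own, not the actual Lending Club schema.

```python
import pandas as pd

# Hypothetical loan records; "state" and "default" are illustrative columns.
df = pd.DataFrame({
    "state":   ["CA", "NY", "KS", "WV", "TX", "CA", "NY", "KS"],
    "default": [1,    0,    0,    1,    0,    0,    1,    0],
})

# Rank states by observed default rate and cut the ranks into four groups.
rates = df.groupby("state")["default"].mean()
groups = pd.qcut(rates.rank(method="first"), 4, labels=[1, 2, 3, 4])
df["state_group"] = df["state"].map(groups)

# Three dummies for the first three groups; group 4 is left as the reference set.
dummies = pd.get_dummies(df["state_group"], prefix="grp").iloc[:, :-1]
df = pd.concat([df, dummies], axis=1)
```

On the real data the cut points would be chosen from the observed default rates rather than equal-frequency rank quartiles; the encoding step itself is the same.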
It is worth pointing out that assuming the repayment behavior of an applicant who was rejected in stage 1 and accepted in stage 2 is the same as if the applicant had been accepted in stage 1 is a somewhat bold assumption, which may suffer from an endogeneity problem. However, compared with most benchmark data sets, whose rejected statuses are completely unknown, the Huijin data set undoubtedly has

¹ http://www.5dhj.com
² Interested readers can contact us to ask for the data, but a confidentiality agreement with the Huijin company must be signed to restrict any commercial use.
a much higher value for studying reject inference. In the Huijin data set, according to the company's criterion, an applicant more than 60 days overdue is considered a bad one. Note that we focus on the reject inference for all the applicants in the first stage. Therefore, we exactly know the good/default labels of the accepted applicants, and we know some of the good/default labels of the rejected applicants, namely those who were accepted in the second stage. The applicants who were rejected in both stages are labeled as bad ones in the model. In total, we have 45,278 requests in the first stage, with 14,492 accepted and 30,786 rejected. Among the accepted applicants, there are 1,647 defaults. Moreover, among the rejected applicants, 4,890 were accepted in the second stage, with 1,027 defaults.

Now we explain how we handle the variables in the Huijin data. For each binary variable, such as sex, marriage, property and car, we use a dummy variable; the detailed coding is shown in Table 3. For the variables with discrete states but without any logical ordering, such as education, address and working company, we first transform these variables into several categories according to some rules; the categories of these features are shown explicitly in Table 4. Then, for each category, we use weight of evidence (WOE) coding. For each attribute, let WOE_i denote the WOE value of its ith category:

$$\mathrm{WOE}_i = \ln\left(\left(\frac{g_i}{g}\right)\left(\frac{b}{b_i}\right)\right), \qquad (3)$$

where g_i and b_i are the numbers of good and bad applicants in the ith category or small interval, and g and b are the total numbers of good and bad applicants in the sample, respectively. After that, we can calculate the information value (IV) of each variable as follows:

$$\mathrm{IV} = \sum_i\left[\left(\frac{g_i}{g} - \frac{b_i}{b}\right)\times\ln\left(\frac{g_i b}{g b_i}\right)\right] = \sum_i\left[\left(\frac{g_i}{g} - \frac{b_i}{b}\right)\times \mathrm{WOE}_i\right]. \qquad (4)$$
Note that the bigger the IV, the more discriminative power the corresponding variable has. As usual, the variables with IV bigger than 0.02 are chosen for the final model. After executing this process, only one variable (Address) is excluded from the final model. In total, there are 14 variables, and a correlation test among them indicates no significant correlations. Descriptive statistics of the continuous and dummy variables are shown in Table 5; for the dummy variables, only average values are calculated. It is worth pointing out that the WOE encoding is first fitted on the training data set and then applied to the new data set in each test. In addition, we independently normalize each feature by linearly scaling it into the range (−1, 1). This prevents inputs with smaller values from being overwhelmed by inputs with larger values, and hence helps to reduce prediction errors [36].
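The WOE coding of Eq. (3), the IV of Eq. (4) and the linear feature scaling can be sketched as follows; the toy category labels, outcomes and numeric values are our own illustration, not the Huijin data.

```python
import numpy as np
import pandas as pd

# Toy category labels and good/bad outcomes (1 = good, 0 = bad).
data = pd.DataFrame({
    "education": ["E1", "E1", "E2", "E2", "E2", "E3", "E3", "E3"],
    "good":      [1,    0,    1,    0,    0,    1,    1,    0],
})

g = data["good"].sum()                  # total goods in the sample
b = (1 - data["good"]).sum()            # total bads in the sample
stats = data.groupby("education")["good"].agg(gi="sum", n="count")
stats["bi"] = stats["n"] - stats["gi"]

# WOE_i = ln((g_i/g) * (b/b_i));  IV = sum_i (g_i/g - b_i/b) * WOE_i
stats["woe"] = np.log((stats["gi"] / g) * (b / stats["bi"]))
iv = ((stats["gi"] / g - stats["bi"] / b) * stats["woe"]).sum()

# Independent linear scaling of a numeric feature onto [-1, 1]:
x = np.array([500.0, 2700.0, 12500.0])
scaled = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0
```

In practice the counts g_i, b_i, g and b (and the min/max used for scaling) must come from the training data only, as the text notes, to avoid leaking information from the test set.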
Table 3: The states of the dummy variables and their corresponding meanings for the binary variables in the Huijin data

Dummy value | Sex    | Marriage        | Property | Car
1           | Male   | Married         | Own      | Own
0           | Female | Single&Divorced | No       | No
Table 4: The categories of some multi-valued discrete variables and the corresponding classification rules in the Huijin data

Feature (rule)                  | Categories
Education (based on degree)     | E1: Middle&High school; E2: Junior college; E3: Undergraduate; E4: Graduate
Address (based on location)     | A1: Northeastern; A2: Northwestern; A3: Southwestern; A4: Southeastern; A5: Middle
Working company (based on size) | WC1: no; WC2: small; WC3: middle; WC4: large
Table 5: Descriptive statistics of some continuous and dummy variables in the Huijin data

                                  Accept                                  Reject
Variable                  Min   Max     Median  Mean   S.D.  | Min   Max     Median  Mean   S.D.
Age                       16    48      26      31.52  6.47  | 16    56      23      27.05  8.28
Number of children        0     3       1       1.13   0.46  | 0     5       1       1.34   0.71
Monthly income            1800  15,000  3800    4570   1842  | 400   7500    3100    3620   791
Employment length         0     14      4       5.16   3.07  | 0     9       1       1.97   1.85
Loan amount               500   12,500  2700    3117   1466  | 500   14,500  2900    3682   2018
First repay percentage(%) 20    50      30      33.68  8.39  | 10    30      20      22.47  5.72
Number of loan periods    3     24      12      15.47  8.36  | 6     24      12      18.02  9.33
Value of desired phone    1500  9000    2900    3015   1178  | 1200  9000    3200    3759   1550
Sex                                             0.41         |                       0.46
Marriage                                        0.39         |                       0.33
Property                                        0.18         |                       0.09
Car                                             0.26         |                       0.14
5. New approach

In this section, we propose a totally new approach using the FQSSVM model to handle the reject inference problem. The approach consists of the following four steps.

5.1. First step: initial separation

Let A and R denote the index sets of the accepted and rejected applicants, respectively. First, we use A and R to obtain the initial separation, which corresponds to the original criterion used by the lender to decide whether an applicant should be accepted. For each point, we indiscriminately assign an equal weight η̄. Suppose the bad/good ratio among the training points is γ₁; then for the good points we let η_i = γ₁η̄, while for the bad points we keep η_i = η̄. In this way, we deal with the class imbalance that is commonly seen in machine learning problems. After that, we use the FQSSVM model to find the separation surface g(x) and, for each point x^i, calculate the distance d_i between x^i and g(x).

5.2. Second step: outlier detection

Now we focus on the outlier issue, which can largely affect the prediction accuracy of the model. Here, an "outlier" is an accepted applicant who accidentally becomes a defaulter due to some unexpected factor, such as an accident, disease or something else. This phenomenon is commonly seen in the credit risk area and is inevitable. However, these accidental factors actually have very small probabilities of recurring. Thus, applicants with similar conditions should not be considered bad ones in the model and rejected in the future. We use a simple example in Figure 1 to show the bias of the optimal separation when the "outliers" are not eliminated. The blue solid circle and green dotted circle are the real borders of the good and bad classes, respectively. We randomly generate some training points in these two circles, denoted by small blue circles and small green diamonds. The "outliers" among the good applicants are represented by red diamonds.
As shown, the black line is the perfect separating hyperplane, while the pink and cyan lines denote the separating hyperplanes generated with and without the influence of these outliers, respectively. It is obvious that eliminating outliers can significantly improve the accuracy of the classification. To the best of our knowledge, our paper is the first work to handle this important issue in reject inference.

Let G and B denote the index sets of the good and bad applicants, respectively. Notice that the rejected applicants are temporarily considered bad ones here. Specifically, let AB denote the index set of the accepted applicants who are actually bad ones. Then we have A = G ∪ AB and B = R ∪ AB. The Euclidean distance between two points x^i and x^j in the feature space is d(x^i, x^j) = ‖x^i − x^j‖. Let |·| denote the cardinality of a set, and let β be an adjustable parameter used as the radius of the neighborhood. We then discern which points in AB are outliers. For a training point x^i, i ∈ AB, we check the number of
Figure 1: Bias of the optimal separation under the influence of the outliers
homogeneous points and the number of total points in its neighborhood. The degree of affinity of this point is then given by the ratio of these two numbers,

\[ \rho(x^i) = \frac{|\{x^j \mid d(x^i, x^j) \le \beta,\ y_j = -1\}|}{|\{x^j \mid d(x^i, x^j) \le \beta\}|}. \]  (5)
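The neighborhood screening of Eq. (5) can be sketched in Python as follows. This is a minimal illustration, not the paper's MATLAB code; the radius β and the threshold T on ρ (introduced below) are the tuning parameters of the text, and all function names are our own.

```python
import numpy as np

def affinity(X, y, i, beta):
    """Degree of affinity rho(x^i) from Eq. (5): among the points within
    distance beta of x^i (including x^i itself), the fraction labeled bad."""
    d = np.linalg.norm(X - X[i], axis=1)
    neighborhood = d <= beta
    bad = neighborhood & (y == -1)
    return bad.sum() / neighborhood.sum()

def detect_outliers(X, y, AB, beta, T):
    """Indices in AB (accepted-but-actually-bad applicants) flagged as
    outliers: rho(x^i) < T means the neighborhood is dominated by goods."""
    return [i for i in AB if affinity(X, y, i, beta) < T]

# toy example: a bad point surrounded by good ones, plus an isolated bad point
X = np.array([[0., 0.], [0.1, 0.], [0., 0.1], [-0.1, 0.], [5., 5.]])
y = np.array([-1, 1, 1, 1, -1])
outliers = detect_outliers(X, y, AB=[0, 4], beta=0.5, T=0.5)
```

Only the bad point embedded in the good cluster is removed; the isolated bad point keeps its label.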
Let T be a threshold value. For each point x^i, i ∈ A_B, if ρ(x^i) < T, the point has a high probability of being an outlier: since its neighborhood is full of good applicants, there is no reason to classify a similar applicant as bad in the future. We therefore delete these outliers from A_B. It is worth pointing out that, if outliers exist among the training points, this process largely eliminates their adverse effects and significantly improves the accuracy of the final classification.

5.3. Third step: fuzzy weights

Different fuzzy weights influence the classifier differently, so it is important to assign a proper weight to each training point [37, 38]. In this paper, the membership of each point considers not only its distance to the class center but also the relative density in its neighborhood. As is well known, the points near the separating surface matter more than other points for an accurate classification, especially the bad applicants among the accepted ones: the set A_B is the only available and reliable information that indicates the real boundary between the two classes. Therefore, we assign larger weights to those points. On the other hand, a large
proportion of the potential good applicants (currently rejected ones) may hide near the separating surface yet away from the new set of bad applicants. Hence, in order to retain some of these potential good applicants in the final classification, we also decrease the corresponding weights of those points in the new FQSSVM model. Besides, it is hard to identify other potential good applicants that lie far from the separating surface, because it would be too risky to consider them good. Altogether, we design a weighting scheme that carefully reflects the structure of the reject inference problem. Compared with the S3VM method, this new scheme not only captures the features of reject inference and performs as well as S3VM, but also largely improves the solving efficiency; the numerical results in Section 6 demonstrate this clearly.

We now describe how to execute this weighting scheme. First, we calculate the weights of all training points. Let l_G = |G| and l_B = |B| denote the numbers of points in the positive and negative classes, respectively. The centers of the two classes are given by

\[ c_G = \frac{1}{l_G} \sum_{i \in G} x^i, \qquad c_B = \frac{1}{l_B} \sum_{i \in B} x^i. \]  (6)
Besides, the radii of the two classes are defined as

\[ r_G = \max_{i \in G} \|x^i - c_G\|, \qquad r_B = \max_{i \in B} \|x^i - c_B\|. \]  (7)

Then we define the degree of membership of a point x^i by

\[ \mu(x^i) = \begin{cases} 1 - \dfrac{\|x^i - c_G\|}{r_G + \delta}, & i \in G; \\[2mm] 1 - \dfrac{\|x^i - c_B\|}{r_B + \delta}, & i \in B, \end{cases} \]  (8)
where δ is an adjustable parameter. For a training point x^i, we also check the number of heterogeneous points and the number of total points in its neighborhood. The degree of hesitation of this point is then given by the ratio of these two numbers, i.e.,

\[ \varsigma(x^i) = \frac{|\{x^j \mid d(x^i, x^j) \le \beta,\ y_j \ne y_i\}|}{|\{x^j \mid d(x^i, x^j) \le \beta\}|}. \]  (9)
The final degree of membership of a point x^i can then be calculated by

\[ \nu(x^i) = \mu(x^i)\,\bigl(1 - \varsigma(x^i)\bigr). \]  (10)
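The membership computation of Eqs. (6)–(10) can be sketched as follows. This is a minimal Python illustration under the convention y_i ∈ {+1, −1}; the function name and array layout are our own, not the paper's.

```python
import numpy as np

def fuzzy_membership(X, y, beta, delta):
    """Final membership nu(x^i) = mu(x^i) * (1 - varsigma(x^i)), Eqs. (6)-(10).
    mu: distance-to-class-center membership; varsigma: fraction of
    heterogeneous points in the beta-neighborhood (degree of hesitation)."""
    G, B = (y == 1), (y == -1)
    cG, cB = X[G].mean(axis=0), X[B].mean(axis=0)          # Eq. (6)
    rG = np.linalg.norm(X[G] - cG, axis=1).max()           # Eq. (7)
    rB = np.linalg.norm(X[B] - cB, axis=1).max()
    nu = np.empty(len(X))
    for i, xi in enumerate(X):
        if y[i] == 1:                                      # Eq. (8)
            mu = 1.0 - np.linalg.norm(xi - cG) / (rG + delta)
        else:
            mu = 1.0 - np.linalg.norm(xi - cB) / (rB + delta)
        d = np.linalg.norm(X - xi, axis=1)
        nbhd = d <= beta
        varsigma = (nbhd & (y != y[i])).sum() / nbhd.sum() # Eq. (9)
        nu[i] = mu * (1.0 - varsigma)                      # Eq. (10)
    return nu

# toy example: two well-separated clusters
X = np.array([[0., 0.], [1., 0.], [0., 1.], [10., 10.], [11., 10.], [10., 11.]])
y = np.array([1, 1, 1, -1, -1, -1])
nu = fuzzy_membership(X, y, beta=2.0, delta=0.5)
```

Points closer to their class center receive higher membership, and points whose neighborhood mixes both classes are discounted by the hesitation term.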
Let l_N = |A_B| denote the number of points in the index set A_B. The center of this class is c_N = (1/l_N) Σ_{i∈A_B} x^i. Let D_1 and D_2 be two threshold distances. Then the index set that may contain a large portion of potential good applicants among the rejected ones can be denoted as R_G = {i | d(x^i, c_N) > D_1, d_i < D_2, i ∈ B}. For any i ∈ R_G, let η̄_i = d_i ν(x^i); for the remaining i, let η̄_i = d_i^{1+δ} ν(x^i). In this way, each training point's weight reasonably combines its final membership and its distance to the separating surface. Finally, suppose the updated bad/good ratio among the training points is γ₂; then for the good points let η_i = γ₂ η̄_i, while for the bad points keep η_i = η̄_i.

Remark 2. An alternative approach is to iteratively apply the model to extrapolate the rejected applicants one by one [9]. However, this approach shows no superiority in predictive performance and, on the contrary, is very time-consuming, especially for large-sized problems. We compare this scheme with ours in the numerical test section.

5.4. Fourth step: kernel-free FQSSVM

We now have the weights η_i of all training points and can use the FQSSVM model to generate a new separating surface ḡ(x) ≡ ½ xᵀW̄x + b̄ᵀx + c̄ = 0. After that, for each point x^i, the sign of ḡ(x^i) tells us into which class this applicant should be discriminated.

6. Numerical experiments and results

6.1. Experiment design

We compare our new method with benchmark methods in the area of reject inference: logistic regression, K-NN, supervised SVM and semi-supervised SVM. Moreover, we compare two versions of our approach, one with the outliers eliminated by our method and one without this process. Besides, we check the sensitivities of the parameters used in our model. Finally, we also compare some typical fuzzy weights with our fuzzy weight in the FQSSVM model.
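The decision rule of the fourth step (Section 5.4) — classifying an applicant by the sign of the learned quadratic surface ḡ(x) = ½xᵀW̄x + b̄ᵀx + c̄ — can be sketched as follows. This is an illustrative Python version with hypothetical parameter values; the actual W̄, b̄, c̄ are produced by solving the FQSSVM model.

```python
import numpy as np

def quadratic_surface_score(x, W, b, c):
    """Evaluate g(x) = 0.5 * x^T W x + b^T x + c for a learned
    quadratic separating surface (W symmetric)."""
    return 0.5 * x @ W @ x + b @ x + c

def classify(x, W, b, c):
    """Sign of g(x) decides the class of applicant x (+1 vs -1)."""
    return 1 if quadratic_surface_score(x, W, b, c) >= 0 else -1

# hypothetical surface: W = 2I, b = 0, c = -1 gives g(x) = ||x||^2 - 1,
# i.e. the unit circle as decision boundary
W = np.array([[2., 0.], [0., 2.]])
b = np.zeros(2)
c = -1.0
inside = classify(np.array([0., 0.]), W, b, c)
outside = classify(np.array([2., 0.]), W, b, c)
```

No kernel function is needed: the quadratic surface is represented directly by (W, b, c).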
Following the standard steps proposed in [14], the numerical experiments are conducted as follows:

Step 1: Sample equal numbers of accepts and rejects in each test, and randomly choose 70% as the training set and 30% as the testing set.

Step 2: Respectively build the Logit, K-NN, supervised SVM and FQSSVM models using the accepted sample with labels, and the S3VM model using the accepted sample with labels together with the rejected sample without labels.
Step 3: Respectively apply the classification rules derived in Step 2 to the test set and compare the performances of the different methods.

To check the robustness of the methods, every experiment is repeated 50 times and the reported numerical results are the averages over these runs. Moreover, we fix the sample size at 5,000 in Step 1 (1,000 for S3VM). To eliminate the influence of sample differences, the sample points are chosen randomly each time.

Remark 3. The 70/30 split in Step 1 is a traditional way to generate training and testing points. Besides, a fixed number of training points makes it easy to measure the average performance of each method over the repeated tests. Due to the mixed-integer structure hidden in the S3VM model, its current approximation is limited to small-sized problems; we therefore restrict the sample size to 1,000 for S3VM in the numerical tests.

6.2. Measurement of model performance

As is traditional, classification accuracy is used to evaluate the performance of the different models [39]; the confusion matrix is a good tool to gauge the classification accuracy or level of misclassification. Moreover, computational time is another measure of model performance, because model efficiency is extremely important for large-sized problems in the current age of big data. Besides, the Receiver Operating Characteristic (ROC) curve also provides a good measure of discriminant power [11]: it plots the true positive rate (sensitivity) against the false positive rate (fall-out) at various threshold settings.

6.3. Results

For the FQSSVM model, following the classical intuitionistic fuzzy set method, we set the adjustable parameter δ = min{r_G, r_B}/5 and the neighborhood radius β = 0.1, i.e., one tenth of the domain distance (all feature values have been scaled into the range (−1, 1)). Moreover, we set the threshold value T = 0.1.
It is worth pointing out that our approach is quite robust to this threshold value (see Table 12). For the S3VM and SVM models, we compare two traditional kernel functions (the Gaussian kernel and the linear kernel) and choose the better one. As noted, for the Gaussian kernel K(x^i, x^j) = exp(−‖x^i − x^j‖²/(2σ²)), the parameter σ is set to the median of all pairwise distances. For the K-NN method, we use the Euclidean distance to measure the degree of similarity. Grid search is then used to find the optimal penalty value M for all SVM models. Besides, the accuracy criterion for all models is set to 10⁻⁴. All computational tests in this paper are carried out in MATLAB 7.9.0 on a PC equipped with an Intel Core i5 CPU at 3.3 GHz and 4 GB of usable RAM. Moreover, the solvers CVX and SeDuMi are used in the experiments.
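The median heuristic for σ mentioned above can be sketched as follows. This is an illustrative Python version (the paper's experiments themselves use MATLAB); the function names are our own.

```python
import numpy as np

def median_sigma(X):
    """Median heuristic: sigma = median of the Euclidean distances
    between all pairs of distinct training points."""
    n = len(X)
    dists = [np.linalg.norm(X[i] - X[j])
             for i in range(n) for j in range(i + 1, n)]
    return np.median(dists)

def gaussian_kernel(xi, xj, sigma):
    """K(x^i, x^j) = exp(-||x^i - x^j||^2 / (2 sigma^2))."""
    return np.exp(-np.linalg.norm(xi - xj) ** 2 / (2.0 * sigma ** 2))

# toy example: three points with pairwise distances 1, 1, sqrt(2)
X = np.array([[0., 0.], [1., 0.], [0., 1.]])
sigma = median_sigma(X)
```

The median of all pairwise distances gives a data-driven bandwidth without any tuning loop, which is why it is a common default for Gaussian-kernel SVMs.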
The numerical results for the different methods are summarized in Tables 6 and 7. To provide detailed information, we report the accuracy for the accepted, the rejected and both groups, together with the Type I and Type II errors; the model with the highest accuracy is highlighted in bold. FQSSVM clearly achieves the best accuracy and leads the other methods by a significant margin on both data sets. These results strongly demonstrate the power of our new method in improving classification accuracy and in providing very useful information for reject inference. We then examine the Type I and Type II errors of the different methods on both data sets. FQSSVM evidently beats Logit and SVM in controlling both error types. Moreover, although FQSSVM has a slightly higher Type I error than S3VM, it performs much better than S3VM in controlling the Type II error. Therefore, by eliminating the adverse effect of "outliers" and using an advanced weighting scheme, FQSSVM indeed decreases the classification errors among both accepted and rejected applicants. More importantly, since the loss caused by a bad applicant is much higher than the profit from a good one, the Type II error is more crucial than the Type I error for decision making. It is worth pointing out that, compared with the other benchmark methods, FQSSVM not only has a good Type I error but also a much better Type II error in all cases; our new method is thus more practical for reject inference. Next, we check the efficiency of these methods. The average computational times (in seconds) are reported in Table 8; note that the sample size for S3VM is 1,000 while that for the other methods is 5,000. The results show that FQSSVM is far more efficient than SVM and S3VM. Considering its performance, FQSSVM is also a much better choice than Logit and K-NN, although its computational time is somewhat longer.
In summary, compared with the state-of-the-art SVM and S3VM methods, our approach is much easier to implement and solve; the proposed method is therefore more suitable for practitioners and has greater potential in real-world applications. Moreover, we compare the discriminant power of these methods by their average AUC values, shown in Table 9. The improvement in AUC for FQSSVM over the other methods is evident, so FQSSVM clearly has the best discriminant power as a reject inference technique. What is more, since S3VM is the most advanced reject inference method in the literature, we conduct a more comprehensive comparison between it and our new method. Besides, as mentioned in Remark 2, we also compare the commonly used approach that iteratively extrapolates the unknown rejected applicants one by one; note that we also apply FQSSVM as the base model in this iterative approach, which we denote Iter. In this experiment, we use training sets of various sizes and set the maximal
Table 6: Overall accuracy results for Lending Club data and Huijin data

Overall accuracy           Logit   K-NN    SVM     S3VM    FQSSVM
Lending Club
  2009   Accepted          0.727   0.684   0.751   0.832   0.867
         Rejected          0.762   0.749   0.909   0.931   0.953
         Both              0.747   0.725   0.841   0.888   0.916
  2010   Accepted          0.755   0.671   0.738   0.854   0.907
         Rejected          0.797   0.788   0.935   0.964   0.941
         Both              0.773   0.740   0.862   0.903   0.919
  2011   Accepted          0.824   0.716   0.763   0.848   0.892
         Rejected          0.833   0.810   0.952   0.924   0.943
         Both              0.827   0.784   0.887   0.896   0.918
  2012   Accepted          0.773   0.691   0.728   0.796   0.866
         Rejected          0.790   0.748   0.942   0.907   0.939
         Both              0.782   0.723   0.856   0.873   0.905
  All    Accepted          0.770   0.709   0.745   0.833   0.883
         Rejected          0.796   0.767   0.934   0.932   0.944
         Both              0.782   0.735   0.862   0.890   0.915
Huijin
  All    Accepted          0.775   0.722   0.826   0.871   0.890
         Rejected          0.684   0.659   0.730   0.747   0.798
         Both              0.735   0.694   0.788   0.819   0.863
Table 7: Type I & II errors for Lending Club data and Huijin data

Type I errors              Logit   K-NN    SVM     S3VM    FQSSVM
Lending Club
  2009   Accepted          0.184   0.265   0.226   0.046   0.051
         Rejected          0.207   0.244   0.082   0.040   0.027
         Both              0.186   0.254   0.130   0.043   0.035
  2010   Accepted          0.169   0.279   0.306   0.038   0.069
         Rejected          0.241   0.193   0.072   0.028   0.037
         Both              0.195   0.247   0.164   0.033   0.056
  2011   Accepted          0.117   0.251   0.260   0.038   0.085
         Rejected          0.270   0.173   0.051   0.034   0.083
         Both              0.173   0.196   0.165   0.036   0.084
  2012   Accepted          0.172   0.274   0.329   0.032   0.102
         Rejected          0.230   0.235   0.054   0.045   0.038
         Both              0.206   0.256   0.168   0.039   0.072
  All    Accepted          0.161   0.270   0.280   0.039   0.074
         Rejected          0.237   0.211   0.055   0.037   0.046
         Both              0.190   0.242   0.157   0.038   0.062
Huijin
  All    Accepted          0.195   0.228   0.187   0.063   0.168
         Rejected          0.374   0.305   0.149   0.107   0.261
         Both              0.286   0.262   0.170   0.083   0.215

Type II errors             Logit   K-NN    SVM     S3VM    FQSSVM
Lending Club
  2009   Accepted          0.803   0.688   0.294   0.861   0.208
         Rejected          0.264   0.309   0.126   0.075   0.056
         Both              0.357   0.362   0.194   0.235   0.118
  2010   Accepted          0.742   0.647   0.223   0.911   0.171
         Rejected          0.185   0.266   0.091   0.058   0.093
         Both              0.281   0.303   0.178   0.215   0.147
  2011   Accepted          0.703   0.477   0.212   0.924   0.145
         Rejected          0.152   0.240   0.038   0.055   0.034
         Both              0.274   0.285   0.166   0.231   0.102
  2012   Accepted          0.641   0.531   0.226   0.852   0.172
         Rejected          0.201   0.280   0.103   0.232   0.085
         Both              0.311   0.341   0.183   0.275   0.148
  All    Accepted          0.722   0.482   0.239   0.887   0.174
         Rejected          0.201   0.259   0.075   0.105   0.082
         Both              0.306   0.297   0.180   0.239   0.129
Huijin
  All    Accepted          0.352   0.369   0.180   0.628   0.085
         Rejected          0.255   0.517   0.329   0.435   0.148
         Both              0.305   0.431   0.266   0.522   0.117
Table 8: Computational times (in seconds) for Lending Club data and Huijin data

                  Logit   K-NN    SVM     S3VM    FQSSVM
Lending Club
  2009            24.67   53.31   175.7   885.4   37.56
  2010            19.52   48.44   171.8   792.8   38.11
  2011            20.32   50.25   173.1   915.6   35.29
  2012            21.18   51.18   169.4   876.6   41.03
  Average         21.42   50.80   172.5   867.6   37.00
Huijin
  All             36.84   68.27   224.7   1463    113.6

Table 9: AUC results for Lending Club data and Huijin data

                  Logit   K-NN    SVM     S3VM    FQSSVM
Lending Club
  2009            0.819   0.790   0.853   0.866   0.884
  2010            0.838   0.785   0.875   0.882   0.901
  2011            0.874   0.826   0.879   0.890   0.907
  2012            0.845   0.808   0.836   0.851   0.873
  Average         0.844   0.802   0.861   0.872   0.891
Huijin
  All             0.776   0.753   0.822   0.839   0.865
runtime to 10,000 seconds. The corresponding results are shown in Table 10; only the overall accuracy is reported for each method. From the table, we can easily see that FQSSVM not only achieves the best accuracy and AUC but also has the highest efficiency in all cases. In particular, for the large-sized problems S3VM and Iter cannot finish within a reasonable computational time, whereas FQSSVM solves them very quickly. Furthermore, since we design an elaborate weighting scheme in our method, a comparative analysis including alternative weighting schemes is needed to show the added value of the proposed one. In this test, we consider the following two alternatives: 1. the basic form, which assigns the same weight to each point, i.e., η_i = 1, denoted Basic; 2. the traditional fuzzy weighting scheme in [40], denoted FSVM. It is worth pointing out that all these schemes use the same FQSSVM model but with different weights, and the sample size is fixed at 5,000. The corresponding results are shown in Table 11; since the differences in computational time between the weighting schemes are quite small, we do not report them. The results show that, compared with the two alternative weighting schemes, our new one indeed digs deeper into the structure of reject inference and captures more effective features; it therefore achieves the best accuracy and AUC. Finally, we verify the effectiveness of the outlier detection. Note that T = 0 indicates
Table 10: Comparison between S3VM, Iter and FQSSVM for Huijin data

Size                      200     500     1000    2000    5000    10000
Accuracy     S3VM         0.757   0.770   0.796   0.756   0.738   0.725
             Iter         0.766   0.801   0.818   0.827   0.829   0.826
             FQSSVM       0.789   0.827   0.845   0.853   0.861   0.866
Times (s)    S3VM         122.1   465.7   1459    10000   10000   10000
             Iter         104.3   251.2   773.1   1682    4715    10000
             FQSSVM       8.83    13.31   20.72   49.89   120.5   481.2
AUC          S3VM         0.792   0.824   0.775   0.760   0.751   0.739
             Iter         0.781   0.811   0.825   0.848   0.857   0.853
             FQSSVM       0.807   0.831   0.841   0.858   0.874   0.870
Table 11: Different weighting schemes for Huijin data

                      Basic   FSVM    FQSSVM
Accuracy results      0.801   0.822   0.860
AUC results           0.826   0.843   0.872
that no training point is detected as an outlier; in that case, the detection process is not executed. Besides, we also check the sensitivities of the parameters used in our model. The corresponding results are shown in Table 12. Since the number of outliers varies with the sample selection, in addition to the average values we also report the worst case (the run with the largest number of outliers) to explicitly show the importance of outlier detection in improving the performance. The sample size is again fixed at 5,000. From the results, we can see that the outlier detection is very effective in eliminating the adverse effect of outliers, especially when their number can significantly bias the original structure of the data. Considering the power and convenience of this process, it is worth adding it as a pre-processing step in reject inference. Moreover, the results also indicate that the approach is quite robust to the parameter selection, so users have great flexibility in applying this method.

Table 12: Effectiveness of the outlier detection and sensitivities of the parameters for Huijin data

Cases     Results     T=0     T=0.05   T=0.1    T=0.1    T=0.1    T=0.15   T=0.2
                              β=0.05   β=0.05   β=0.1    β=0.15   β=0.15   β=0.2
Average   Accuracy    0.851   0.859    0.862    0.866    0.865    0.862    0.860
          AUC         0.857   0.880    0.884    0.887    0.885    0.884    0.881
Worst     Accuracy    0.781   0.822    0.831    0.844    0.846    0.849    0.851
          AUC         0.819   0.837    0.851    0.862    0.866    0.870    0.871
7. Conclusion

As an important and difficult problem in credit scoring, reject inference has long attracted attention. This paper proposes a new approach to the reject inference problem based on the kernel-free FQSSVM model. Unlike the classical logit and KNN methods, which use only the known information of the accepted borrowers, our new method can dig out valuable information hidden in the rejected applicants. Moreover, compared with the benchmark SVM methods and their extensions, our method has four advantages. First, an outlier detection process is designed to eliminate the adverse effect of outliers. Second, our new weighting scheme captures the features of reject inference more accurately. Third, the kernel-free formulation avoids the time-consuming kernel search of traditional SVM models. Fourth, the convex structure of the FQSSVM model guarantees high efficiency in implementation. Furthermore, two real-world customer loan data sets are used in the experiments to measure the performance of the different methods. In particular, one of the data sets contains partially known outcomes of the rejected applicants, which is very valuable for the study of reject inference. The corresponding experimental results strongly demonstrate the superiority of the proposed method in applicability, accuracy and efficiency; the FQSSVM method thus shows great potential as an effective reject inference technique in real applications. Finally, it is worth noting that this method has been successfully deployed at the Huijin company for several months: although more loan applications have been approved under our model, the corresponding default rate has remained at a normal level, so the method indeed brings an economic benefit to the company.
Besides reject inference in credit scoring, this method can also be applied in other real-world settings where a selection mechanism operates and the real performances of the rejected cases are unknown; for example, graduate schools could apply it to improve how they enroll good students. For future study, we can explore three directions. First, for large-sized problems, we can follow the idea of random forests: separate the data into small subsets, then build and integrate the individual models. Second, we may introduce other new SVM models to reject inference, such as parallel-surface SVM and robust fuzzy SVM. Third, we can consider combining the existing well-known methods (logit, KNN, SVM models and so forth) into an ensemble learner for reject inference.

Acknowledgment

Tian's research has been supported by the National Natural Science Foundation of China Grants #11401485 and #71331004. Jian Luo's research has been supported by the National
Natural Science Foundation of China Grant #71701035.

References

[1] S. Sohn, D. Kim, J. Yoon, Technology credit scoring model with fuzzy logistic regression. Appl. Soft Comput. 43 (2016) 150-158.
[2] K. Bijak, L. Thomas, Does segmentation always improve model performance in credit scoring? Expert Syst. Appl. 39 (2012) 2433-2442.
[3] M. Bücker, M. van Kampen, W. Kramer, Reject inference in consumer credit scoring with nonignorable missing data. J. Bank. Financ. 37 (2013) 1040-1045.
[4] S. Sohn, H. Shin, Reject inference in credit operations based on survival analysis. Expert Syst. Appl. 31 (2006) 26-29.
[5] R. Iyer, A. Khwaja, E. Luttmer, K. Shue, Screening peers softly: Inferring the quality of small borrowers. Manage. Sci. (2016) forthcoming.
[6] T. Bellotti, J. Crook, Support vector machines for credit scoring and discovery of significant features. Expert Syst. Appl. 36 (2009) 3302-3308.
[7] P. Konar, P. Chattopadhyay, Bearing fault detection of induction motor using wavelet and support vector machines (SVMs). Appl. Soft Comput. 11 (2011) 4203-4211.
[8] Z. Li, Y. Tian, K. Li, F. Zhou, W. Yang, Reject inference in credit scoring using semi-supervised support vector machines. Expert Syst. Appl. 74 (2017) 105-114.
[9] S. Maldonado, G. Paredes, A semi-supervised approach for reject inference in credit scoring using SVMs. In P. Perner (Ed.), Advances in Data Mining. Applications and Theoretical Aspects (Vol. 6171, pp. 558-571). Springer Berlin Heidelberg (2010).
[10] Y. Tian, J. Luo, A new branch-and-bound approach to semi-supervised support vector machine. Soft Comput. 1 (2010) 1-10.
[11] J. Crook, D. Edelman, L. Thomas, Recent developments in consumer credit risk assessment. Eur. J. Oper. Res. 183 (2007) 1447-1465.
[12] A. Feelders, Credit scoring and reject inference with mixture models. Intell. Syst. Account. Financ. Manag. 8 (1999) 271-279.
[13] J. Heckman, Sample selection bias as a specification error. Econometrica 47 (1979) 153-161.
[14] B. Anderson, J. Hardin, Modified logistic regression using the EM algorithm for reject inference. Int. J. Data Anal. Tech. Strat. 5 (2013) 359-373.
[15] J. Crook, J. Banasik, Does reject inference really improve the performance of application scoring models? J. Bank. Financ. 28 (2004) 857-874.
[16] J. Banasik, J. Crook, Credit scoring, augmentation and lean models. J. Oper. Res. Soc. 56 (2005) 1072-1081.
[17] Y. Kim, S. Sohn, Technology scoring model considering rejected applicants and effect of reject inference. J. Oper. Res. Soc. 58 (2007) 1341-1347.
[18] D. Joanes, Reject inference applied to logistic regression for credit scoring. IMA J. Manage. Math. 5 (1993) 35-43.
[19] G. Chen, T. Astebro, Bound and collapse Bayesian reject inference for credit scoring. J. Oper. Res. Soc. 63 (2012) 1374-1387.
[20] J. Banasik, J. Crook, Reject inference in survival analysis by augmentation. J. Oper. Res. Soc. 61 (2010) 473-485.
[21] D. Hand, W. Henley, Can reject inference ever work? IMA J. Manage. Math. 5 (1993) 45-55.
[22] S. Lessmann, B. Baesens, H. Seow, L. Thomas, Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research. Eur. J. Oper. Res. 247 (2015) 124-136.
[23] S. Chang, T. Yeh, An artificial immune classifier for credit scoring. Appl. Soft Comput. 12 (2012) 611-618.
[24] K. Ravi, V. Ravi, Bankruptcy prediction in banks and firms via statistical and intelligent techniques - A review. Eur. J. Oper. Res. 180 (2007) 1-28.
[25] Y. Tian, M. Sun, Z. Deng, J. Luo, Y. Li, A new fuzzy set and nonkernel SVM approach for mislabeled binary classification with applications. IEEE T. Fuzzy Syst. 25 (2017) 1536-1545.
[26] L. Feng, The hybrid credit scoring strategies based on KNN classifier. Sixth Int. Conf. Fuzzy Syst. Knowl. Disc., IEEE Comput. Soc. (2009) 330-334.
[27] V. Vapnik, Statistical Learning Theory. (1998) New York: Wiley-Interscience.
[28] B. Schölkopf, A. Smola, Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. (2002) Cambridge: MIT Press.
[29] I. Steinwart, Consistency of support vector machines and other regularized kernel machines. IEEE T. Inform. Theory 51 (2005) 128-142.
[30] O. Chapelle, V. Sindhwani, S. Keerthi, Optimization techniques for semi-supervised support vector machines. J. Mach. Learn. Res. 9 (2008) 203-233.
[31] X. Zhu, A. Goldberg, Introduction to semi-supervised learning. Syn. Lec. Artif. Intell. Mach. Learn. 3 (2009) 1-130.
[32] S. Wu, V. Pham, T. Nguyen, Two-phase optimization for support vectors and parameter selection of support vector machines: two class classification. Appl. Soft Comput. 59 (2017) 129-142.
[33] J. Luo, S.-C. Fang, Y. Bai, Z. Deng, Fuzzy quadratic surface support vector machine based on Fisher discriminant analysis. J. Ind. Manage. Opt. 12 (2016) 357-373.
[34] M. Grant, S. Boyd, CVX: Matlab software for disciplined convex programming, version 1.2, http://cvxr.com/cvx. (2010)
[35] J. Sturm, SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones. Optim. Meth. Soft. 11&12 (1999) 625-653.
[36] J. Min, Y.-C. Lee, A practical approach to credit scoring. Expert Syst. Appl. 35 (2008) 1762-1770.
[37] R. Batuwita, V. Palade, FSVM-CIL: Fuzzy support vector machines for class imbalance learning. IEEE T. Fuzzy Syst. 18 (2010) 558-571.
[38] R. Sevakula, N. Verma, Compounding general purpose membership functions for fuzzy support vector machine under noisy environment. IEEE T. Fuzzy Syst. 25 (2017) 1446-1459.
[39] N. Siddiqi, Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring (Vol. 3). (2012) John Wiley & Sons.
[40] X. Jiang, Y. Zhang, J. Lv, Fuzzy SVM with a new fuzzy membership function. Neural Comput. Appl. 15 (2006) 268-276.
Highlights:
Kernel-free FQSSVM is used in the approach to reject inference.
The approach eliminates the bad effect of outliers in credit scoring.
The approach can handle large-sized problems efficiently.
The approach achieves the best accuracy in prediction.
The real-world data contains valuable information about rejected applicants.