Reject inference in consumer credit scoring with nonignorable missing data


Journal of Banking & Finance 37 (2013) 1040–1045


Journal of Banking & Finance journal homepage: www.elsevier.com/locate/jbf

Michael Bücker, Maarten van Kampen, Walter Krämer (corresponding author)

Department of Statistics, University of Dortmund, D-44221 Dortmund, Germany

Article info

Article history: Received 30 June 2011; accepted 8 November 2012; available online 17 November 2012

JEL classification: C25, C58, G21

Abstract

We generalize an empirical likelihood approach to deal with missing data to a model of consumer credit scoring. An application to recent consumer credit data shows that our procedure yields parameter estimates which are significantly different (both statistically and economically) from the case where customers who were refused credit are ignored. This has obvious implications for commercial banks as it shows that refused customers should not be ignored when developing scorecards for the retail business. We also show that forecasts of defaults derived from the method proposed in this paper improve upon the standard ones when refused customers do not enter the estimation data set. © 2012 Elsevier B.V. All rights reserved.

Keywords: Credit scoring; Reject inference; Logistic regression

Financial support from Deutsche Forschungsgemeinschaft (SFB 823) is gratefully acknowledged. We thank Jing Qin, Matthias Arnold and an anonymous referee for helpful discussions and comments.

Corresponding author: W. Krämer. Tel.: +49 231 755 3125; fax: +49 231 755 5284.

0378-4266/$ - see front matter © 2012 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.jbankfin.2012.11.002

1. Introduction

Statistical models for predicting defaults in the consumer credit industry and elsewhere suffer from the non-availability of default information for customers who were denied credit in the first place (Hand and Henley, 1993; Crook and Banasik, 2004, among many others). This is known as the reject inference problem; it affects the estimation of the model parameters in the same way as the non-availability of high-probability rainy days would affect the parameter estimates of a meteorological model for predicting rain.

This non-availability does not matter if observations are missing at random (MAR) in the sense of Rubin (1976). Missing at random means that the probability of default, given all the exogenous variables of the model, is the same whether an applicant is granted a credit or not (or in the meteorological example: if the probability of rain, given a set of relevant regressors, is the same for days observed and unobserved). In applications, this can reasonably be assumed if creditors base their decision on the same statistical model (or a preliminary version thereof) which is to be estimated. However, such procedures are illegal in many countries. In Germany, for instance, the federal data privacy act explicitly forbids banks to grant consumer credit solely on the basis of a statistical model; there must be some human judgement involved as well (for instance to determine whether applicants conceal relevant information, see Fees et al. (2011)). This means that loan officers have both the right and the duty to override a statistical model if they think this is warranted by extra information. Among applicants with otherwise identical sets of explanatory variables, some may therefore be granted a credit and some may not. Technically speaking, the probability of being granted a credit, given the observed regressors, is not the same as the probability of being granted a credit, given the observed regressors and future default information. Whenever human judgement adds any information on future defaults, these probabilities will differ. This implies that data are missing not at random (MNAR) in the Rubin (1976) sense.

The present paper adds to previous approaches that take credit decision processes into account when estimating models of default (see e.g. Boyes et al. (1989) or Marshall et al. (2010)) by proposing a new way to cope with this problem. It is based on Qin et al. (2002), who show how to reweight observations in the light of missing data, given a parametric model for the missings, using empirical likelihood (Owen, 2001). It compares favorably to other techniques that have been suggested in the literature to mitigate the effects of missing data in the credit scoring business in that we are able to analytically derive the limiting distribution of the resulting estimator. Most prominent among established methods are extrapolation, reweighting, or simultaneous bivariate probit modeling of acceptance and default along the lines of Boyes et al. (1989). Extrapolation means assigning a default status also to the rejects, based on the same model that is fitted to the accepted cases only, and then


reestimating the model. Reweighting is based on the preliminary estimation of a model for acceptance, using both accepts and rejects, and a subsequent redistribution of all cases into classes with varying percentages of defaults. All accepts are then reweighted according to the proportion of defaults in their respective class. See Crook and Banasik (2004) for a survey and a discussion of the pros and cons of the various approaches.

We do not want to add to this comparison literature here, as this would greatly expand the scope of our paper. Rather, we would like to introduce a new competitor and derive and illustrate its properties. In particular, in the context of a logistic regression model for defaults, we suggest an alternative reweighting scheme and show analytically that it delivers consistent and asymptotically normal parameter estimates even when credit decisions and defaults are still correlated, given all regressors. We also investigate the relationship between the severity of the missing data problem and the improvement provided by our new estimator and show that there is a monotone relationship between the two.

When applied to a recent data set of almost 10,000 individuals requesting credit with a major German bank, our approach yields parameter estimates which are significantly different from standard ones both in a statistical and in an economic sense. This shows that ignoring the missing data problem has the potential to mislead credit granting decisions in practice and is therefore also relevant for practitioners: whenever the credit granting process is a mixture of formal scoring and informal judgment by credit officers, the parameter estimates of the scoring model may be biased and the default predictions derived from them may be inaccurate, with obvious implications for the profit of banks and financial institutions.

We also show by Monte Carlo experiments that default forecasts derived from our new estimator indeed improve upon default forecasts obtained from standard maximum likelihood estimators of the model parameters.

2. An alternative way of reweighting observations in the presence of rejects

We consider $N$ applicants for a credit, $n$ of whom are granted a credit and $N - n$ are not. Default is coded by a dichotomous variable $Y$, where $Y_i = 0$ in case of default and $Y_i = 1$ in case of no default. We assume that $Y_i$ depends on a set of $k$ regressors which we collect together in a $(k \times 1)$ vector $X$. Potential errors of measurement concerning $X$ are neglected; see however Fees et al. (2011). We also assume that the dependence of $Y$ on $X$ can be described by a logistic regression model

$$P(Y_i = 1 \mid X_i = x_i; \beta) = \frac{\exp(\beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i} + \ldots + \beta_k x_{k,i})}{1 + \exp(\beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i} + \ldots + \beta_k x_{k,i})} \tag{1}$$

$(i = 1, \ldots, N)$. This is still by far the most popular statistical model entertained in this context, see e.g. Thomas (2000), Jacobson and Roszbach (2003) or Crook and Banasik (2004). (For a rather different approach, see Khandani et al. (2010).) The primary difference to a conventional logistic regression is that not all $N$ outcomes are observed. Let $R_i = 1$ if credit is granted and $R_i = 0$ if credit is denied. Without loss of generality, we assume that $R_i = 1$ for the first $n$ data points and $R_i = 0$ for the remaining ones. From a statistical point of view, the problem is that ignoring all data beyond $n$ produces inconsistent ML-estimates for the model (1) whenever data are missing not at random in the sense that

$$P(R = 1 \mid X, Y) \neq P(R = 1 \mid X). \tag{2}$$

We now show, following Qin et al. (2002), how this inconsistency can be removed. To that purpose, let $F(y, x)$ be the joint distribution function of $(Y, X)$ (no parametric model is needed for this), let

$$w(y, x, \theta) := P(R = 1 \mid Y, X, \theta)$$

be some parametric model for observability (sometimes also called accept-reject-model; see Crook and Banasik (2004)), let $W := P(R = 1)$, and consider the following semiparametric likelihood for $\theta$, $W$, and $F$:

$$L_n(\theta, W, F) = \left[ \prod_{i=1}^{n} w(y_i, x_i, \theta)\, dF(y_i, x_i) \right] \times (1 - W)^{N-n}. \tag{3}$$

This function is maximized under the constraints

$$p_i \geq 0, \quad \sum_{i=1}^{n} p_i = 1, \quad \sum_{i=1}^{n} p_i (x_i - \mu_X) = 0, \quad \text{and} \quad \sum_{i=1}^{n} p_i \left[ w(y_i, x_i, \theta) - W \right] = 0, \tag{4}$$

where $p_i = dF(y_i, x_i) = F(y_i, x_i) - F(y_i^-, x_i^-)$, i.e. $p_i$ is the increase in the joint distribution function at $(y_i, x_i)$, and $\mu_X$ is either the known expectation or the empirical mean of $X$. By introducing Lagrange multipliers and profiling for all values of $p_i$, it is seen that

$$p_i = \frac{1}{n \left[ 1 + \lambda_1^\top (x_i - \mu_X) + \lambda_2 \left( w(y_i, x_i, \theta) - W \right) \right]},$$

where $\lambda_1$ and $\lambda_2$ are Lagrange multipliers. Substituting $p_i$ into (3) results in a profile likelihood that can be maximized numerically. Qin et al. (2002, Theorem 1) show that under mild regularity conditions, the resulting empirical likelihood estimates for $\theta$ and $W$ are consistent and asymptotically normal.

Here we are interested in the plug-in estimate $\hat{p}_i$ of $p_i$ in order to reweight the likelihood derived from (1). Doing this, we obtain

$$L_n^I(\beta) = \prod_{i=1}^{n} \hat{p}_i\, f(y_i \mid x_i, \beta), \tag{5}$$

where $f(y_i \mid x_i, \beta) = \left[ P(Y_i = 1 \mid X_i = x_i; \beta) \right]^{y_i} \times \left[ 1 - P(Y_i = 1 \mid X_i = x_i; \beta) \right]^{1 - y_i}$. The full likelihood function is then given by

$$L_n^I(\beta) = \prod_{i=1}^{n} \frac{1}{n \left[ 1 + \hat{\lambda}_1^\top (x_i - \hat{\mu}_X) + \hat{\lambda}_2 \left( w(y_i, x_i, \hat{\theta}) - \hat{W} \right) \right]} \times \left( \frac{\exp(x_i^\top \beta)}{1 + \exp(x_i^\top \beta)} \right)^{y_i} \times \left( 1 - \frac{\exp(x_i^\top \beta)}{1 + \exp(x_i^\top \beta)} \right)^{1 - y_i}. \tag{6}$$
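To make the profiled weights concrete, the following Python sketch evaluates $p_i$ for given multipliers. The function name `el_weights` and its arguments are illustrative choices, not part of the paper; in practice $\lambda_1$, $\lambda_2$, $\theta$ and $W$ would come from maximizing the profile likelihood, a step that is not shown here.

```python
import numpy as np

def el_weights(x, w, lam1, lam2, W, mu_x):
    """Profiled weights p_i = 1 / (n [1 + lam1'(x_i - mu_x) + lam2 (w_i - W)]).

    x    : (n, k) regressors of the accepted applicants
    w    : (n,) acceptance probabilities w(y_i, x_i, theta)
    lam1 : (k,) Lagrange multiplier for the mean constraint
    lam2 : scalar Lagrange multiplier for the acceptance-rate constraint
    """
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float).reshape(len(w), -1)
    denom = 1.0 + (x - mu_x) @ np.atleast_1d(lam1) + lam2 * (w - W)
    if np.any(denom <= 0):
        # p_i >= 0 is one of the constraints in (4)
        raise ValueError("multipliers imply a non-positive weight")
    return 1.0 / (len(w) * denom)

# Hypothetical toy inputs: with both multipliers at zero,
# the weights collapse to the uniform value 1/n.
x = np.array([[0.5], [-1.0], [2.0], [0.0]])
w = np.array([0.3, 0.6, 0.8, 0.5])
p_uniform = el_weights(x, w, lam1=np.zeros(1), lam2=0.0,
                       W=0.55, mu_x=x.mean(axis=0))
```

The zero-multiplier case is a convenient sanity check: it corresponds to the situation in which the constraints in (4) are already satisfied by equal weights.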

The conventional ML-estimator $\hat{\beta}$ which ignores all missings is the solution to (6) without the weights $\hat{p}_i$. Our main theoretical result is that maximizing (6) yields a consistent and asymptotically normal estimator $\tilde{\beta}$ for $\beta$ even in the case of (2), i.e. when missingness cannot be ignored.

Theorem 1. Under mild regularity conditions to be specified in the appendix, the modified ML-estimator $\tilde{\beta}$ is weakly consistent and

$$\sqrt{N} (\tilde{\beta} - \beta_0) \xrightarrow{d} N(0, V),$$

where $\beta_0$ denotes the true value of $\beta$.

The proof of this theorem and the description of the limiting covariance matrix $V$ are in the appendix. Table 1 provides some finite sample Monte Carlo evidence for $N = 10{,}000$, a common sample size in consumer credit scoring applications. We consider the case of a single regressor, i.e., $k = 1$, with iid values $X_i \sim N(0, 4)$ and

$$P(Y_i = 1 \mid X_i = x_i; \beta) = \frac{\exp(\beta_0 + \beta_1 x_i)}{1 + \exp(\beta_0 + \beta_1 x_i)} \quad (i = 1, \ldots, N).$$

The observability of $Y$ is governed by

$$w(y_i, x_i, \theta) = \frac{\exp(\theta_0 + \theta_1 y_i)}{1 + \exp(\theta_0 + \theta_1 y_i)} \quad (i = 1, \ldots, N). \tag{7}$$
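The simulation design can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' code: the 4 in $N(0, 4)$ is read as a variance, the parameter values $\beta_0 = 2$, $\beta_1 = 1$, $\theta_1 = 1$ are those reported for Table 1, and the helper names `expit` and `simulate` are hypothetical.

```python
import numpy as np

def expit(z):
    """Logistic function exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + np.exp(-z))

def simulate(N, beta0, beta1, theta0, theta1, rng):
    """Draw one sample from the single-regressor design with observability (7)."""
    x = rng.normal(0.0, 2.0, size=N)                  # variance 4, i.e. sd 2 (assumption)
    y = rng.binomial(1, expit(beta0 + beta1 * x))     # outcome model
    r = rng.binomial(1, expit(theta0 + theta1 * y))   # r = 1: outcome observed
    return x, y, r

rng = np.random.default_rng(0)
missing_share = {}
for theta0 in (-2, -1, 0, 1, 2):
    x, y, r = simulate(10_000, 2.0, 1.0, theta0, 1.0, rng)
    missing_share[theta0] = 1.0 - r.mean()            # share of unobserved outcomes
```

Under these assumptions the simulated shares of missing outcomes fall as $\theta_0$ rises and should be of the same order as the percentages listed in the header of Table 1.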


In the table, we keep $\beta_0$, $\beta_1$ and $\theta_1$ fixed at 2, 1, and 1, respectively, and report the empirical bias and the empirical mean square error for various values of the crucial parameter $\theta_0$, which determines the proportion of missing data (the larger $\theta_0$, the smaller the proportion of missing $y$'s). 1000 runs are performed for each parameter combination.

Our experiments show that, for this particular setup of regressors and parameters, the inconsistency of the standard estimator $\hat{\beta}$ manifests itself mainly via a considerable bias in the estimate for the intercept $\beta_0$. This bias decreases as the percentage of missings decreases, but is still substantial for a percentage of missings as low as 6.4%. As for the slope coefficient $\beta_1$, the biases of the new and the standard estimator are negligible and almost equal to each other. In terms of variance, the standard estimator does even better than the estimator suggested here (but not enough to endanger the latter's MSE-superiority). This superiority in terms of variance of the standard estimator is not surprising, as our new procedure adds additional variability to the estimates. This extra variability reduces the bias but increases the variance, with a resulting net decrease of the MSE.

The fact that the gain in efficiency of our new approach manifests itself mainly via a decrease in the bias of the estimated intercept is an artifact of the particular experimental set-up used for Table 1; in other experiments, where the missingness also depends on the covariates and not only on $y$, efficiency gains spread over all parameter estimates. Also, the observability model used for Table 1 is responsible for the (almost) unbiasedness of the standard estimate of the slope coefficient; for other regressors, there is a bias in the estimated slope coefficient as well. Detailed tables are available from the authors upon request.

A minor drawback of the proposed estimation method is the non-identifiability of the model parameters in the case of too many covariates in the missing data process. More precisely, the parameter $\theta$ of the missing data process must not have length larger than $k + 1$. This is a technical condition which ensures that the number of free parameters does not exceed the number of estimating equations in (4); it will be inconsequential for most empirical applications.
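The intercept bias of the conventional estimator can be reproduced with a short Python sketch. When observability depends on $y$ only, Bayes' rule shifts the log-odds among accepted applicants by $\log[w(1)/w(0)]$, which affects the intercept but not the slope. The sketch below is not the authors' procedure: it replaces the estimated weights $n\hat{p}_i$ by oracle weights proportional to $1/w(y_i, x_i, \theta_0)$ (their probability limit up to the constant $W_0$, according to the appendix), it lets the weights multiply the score contributions as in $\Psi_n$ of Appendix A, and it again reads the 4 in $N(0, 4)$ as a variance. The function names are hypothetical.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_weighted_logit(x, y, weights, iters=50):
    """Solve sum_i weights_i * (y_i - expit(z_i'b)) * z_i = 0 by Newton's method."""
    Z = np.column_stack([np.ones_like(x), x])        # intercept and single regressor
    b = np.zeros(2)
    for _ in range(iters):
        p = expit(Z @ b)
        score = Z.T @ (weights * (y - p))
        hess = (Z * (weights * p * (1.0 - p))[:, None]).T @ Z
        step = np.linalg.solve(hess, score)
        b = b + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return b

rng = np.random.default_rng(1)
N, beta, theta = 10_000, (2.0, 1.0), (0.0, 1.0)      # theta0 = 0 column of Table 1
x = rng.normal(0.0, 2.0, size=N)
y = rng.binomial(1, expit(beta[0] + beta[1] * x))
w = expit(theta[0] + theta[1] * y)                   # true acceptance probabilities
r = rng.binomial(1, w).astype(bool)                  # True: outcome observed

xo, yo, wo = x[r], y[r], w[r]
b_cc = fit_weighted_logit(xo, yo, np.ones(xo.size))  # conventional: accepts only, no weights
b_ipw = fit_weighted_logit(xo, yo, 1.0 / wo)         # oracle weights, proportional to W0 / w_i
shift = np.log(expit(theta[0] + theta[1]) / expit(theta[0]))  # log-odds shift from Bayes' rule
```

In this setup the unweighted intercept should sit roughly `shift` above the true value of 2, while the weighted intercept should be close to it; both slopes should be near 1. A full implementation would estimate $\theta$, $W$ and the multipliers by empirical likelihood instead of using the true acceptance probabilities.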

3. Improved consumer credit scoring in practice

Next we analyze 9651 credit histories provided to us by a major German bank. For 3984 clients the repayment status is known; all other clients have been denied credit, so information on potential repayment is missing. The lending institution holds information about various covariates for all applicants, collected together in Table 2. The table and various descriptive analyses show that our data conform to patterns observed elsewhere (see e.g. Jacobson and Roszbach (2003) or Blöchlinger and Leippold (2006)). Fig. 1 for instance shows that civil servants have a small risk of both rejection and default while self-employment comes with a high propensity for both default and rejection by lending institutions. Fig. 2 indicates that existing clients are far less likely to be denied credit by the bank, although the probability of default is almost identical to that of newcomers. And Fig. 3 shows that repayment frequency and acceptance by the bank are above average in the age group 28–46. For certain variables the category "other" comprises observations with missing or undetermined values.

Next, we fit a logistic regression model to this dataset. This is still the most popular technique in credit scoring since it is easy to implement, to understand and to interpret (Thomas, 2000). Unlike e.g. Jacobson and Roszbach (2003), we use all available regressors as we have enough degrees of freedom to accommodate also insignificant ones. Also, we do not want to let statistical significance take too prominent a role as compared to economic significance, see Krämer (2011). The variable "time at present address" turns out to be independent of the missingness and can thus be discarded as a covariate for the model for $w$ in (7). Hence we are able to estimate the weights $\hat{p}_i$. By means of these weights we compute the new estimator for $\beta$. In addition, Table 3 reports the standard parameter estimates of our logistic regression model. Both estimators confirm what has been observed elsewhere (e.g. Jacobson and Roszbach (2003), Crook and Banasik (2004) or Blöchlinger and Leippold (2006)): the risk of default is higher, ceteris paribus, for singles or persons who are separated or divorced. Also, it increases with the number of loans outstanding or the Schufa-score.

On the other hand, the results show that the new approach leads to significantly different estimates for some parameters, both in a statistical and in an economic sense. For some variables, even the sign of the estimate is reversed. For instance, the effect of working experience is negative if estimated conventionally and positive if estimated by our new method. Similarly, the existence of a co-signer has a positive effect if estimated conventionally, but turns negative with our new estimator. A possible explanation for this reversal is the requirement of a co-signer only for clients with a high risk of default.

We also perform a Hausman test for nonignorability of the rejects as described in Bücker et al. (2012). The resulting $\chi^2_{41}$-distributed test statistic is 478.367 ($p < 0.001$), so the null hypothesis of ignorability of the rejects is indeed rejected, justifying our new approach. The goodness of fit of both models can be compared by McFadden's $R^2$. The conventional model yields $R^2_{McF} = 0.103$ and for the new model we have $R^2_{McF} = 0.341$, so again there is evidence that our approach improves upon the standard one.

4. Comparing default predictions

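Table 1. Bias and mean square error of new parameter estimates $\tilde{\beta}$ and conventional parameter estimates $\hat{\beta}$ (each multiplied by 1000). Columns: $\theta_0$ (resulting percentage of missings in parentheses).

| | -2 (76.5) | -1 (55.2) | 0 (32.1) | 1 (15.3) | 2 (6.4) |
|---|---|---|---|---|---|
| bias($\hat{\beta}_0$) | 818.276 | 618.419 | 380.993 | 187.864 | 77.567 |
| bias($\tilde{\beta}_0$) | 3.200 | 2.432 | 2.108 | 1.275 | 0.202 |
| bias($\hat{\beta}_1$) | 3.757 | 0.568 | 0.356 | 0.127 | 0.543 |
| bias($\tilde{\beta}_1$) | 4.291 | 0.786 | 0.322 | 0.151 | 0.538 |
| var($\hat{\beta}_0$) | 11.883 | 5.406 | 2.778 | 2.149 | 1.834 |
| var($\tilde{\beta}_0$) | 21.521 | 8.537 | 4.089 | 2.475 | 1.988 |
| var($\hat{\beta}_1$) | 3.476 | 1.459 | 0.906 | 0.664 | 0.599 |
| var($\tilde{\beta}_1$) | 3.645 | 1.492 | 0.919 | 0.666 | 0.600 |
| MSE($\hat{\beta}$) | 684.933 | 389.301 | 148.836 | 38.102 | 8.448 |
| MSE($\tilde{\beta}$) | 25.169 | 10.026 | 5.007 | 3.140 | 2.586 |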

Next we examine via some additional Monte Carlo simulations whether the new estimator can improve upon conventional default predictions in the logistic regression context. We assume that the dependent variable $Y_i$ is $B(1, \pi_i)$-distributed, where

$$\pi_i = \frac{\exp(\beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i})}{1 + \exp(\beta_0 + \beta_1 x_{1,i} + \beta_2 x_{2,i})} \quad (i = 1, \ldots, N)$$

and the regressors are cross-sectionally independent and iid $N(1, 1)$. The observability of $Y$ is governed by

$$w(y_i, x_i, \theta) = \frac{\exp(\theta_0 + \theta_1 y_i + \theta_2 x_{1,i})}{1 + \exp(\theta_0 + \theta_1 y_i + \theta_2 x_{1,i})} \quad (i = 1, \ldots, N).$$

We set $N = 10{,}000$, $\beta_0 = 2$, $\beta_1 = \beta_2 = 1$, $\theta_0 = 3$, $\theta_1 = 3$ and $\theta_2 = 1$, and use 200 simulation runs. In each run we simulate one sample to estimate our parameters and another sample to calculate the predictions given the estimated parameters. We estimate the parameters using the observed sample (i.e. all observations with $R = 1$ in

Table 2. Variables used in the analysis.

| Variable | Definition |
|---|---|
| Age | Age of the applicant in years |
| Marital status | Categorical variable. Takes values married, widowed, single, cohabitee, divorced, and separated |
| Children | Categorical variable. Takes values no child, one child, two children, and more than two children |
| Occupation | Categorical variable. Takes values civil servant, skilled worker, self-employed, pensioner, and other |
| Working experience | Working experience of the applicant in months |
| Household income | Household income in Euro per month |
| Co-signer | Dummy. Takes value 1 if a co-signer is present and 0 otherwise |
| Purchasing power | Categorical variable. Purchasing power of the applicant's residential area. Takes values very high, medium, low, and other |
| Loans | Categorical variable. Number of outstanding loans to be repaid by the applicant. Takes values 1 loan, 2 loans, 3 loans, 4 or more loans, no loan, and other |
| Schufa | Categorical variable. An applicant's score with Schufa, the leading German credit bureau. Takes values A–P with increasing default probability, aggregated to classes A–C, D–E, F–J, K–M, and P |
| New customer | Dummy. Takes value 1 if an applicant is a new customer and 0 if the applicant is an existing customer |
| Accommodation type | Categorical variable. Takes values mixed use building, single family home, two-flat, 3–5 flats, 6–10 flats, 11–14 flats, 15–20 flats, more than 20 flats, and other |
| Time at present address | Months an applicant has been living at the present address |

[Figure 1: bar chart by occupation (civil servant, employee, skilled worker, self-employed, pensioner, other); left axis: reject ratio; right axis: default ratio.]

Fig. 1. Relative frequencies of rejection (dark bars, left axis) and default (light bars, right axis) for different occupations. The ordinates are scaled in such a way that the dashed line represents overall means for both rejections and defaults.

[Figure 2: bar chart for existing and new clients; left axis: reject ratio; right axis: default ratio.]

Fig. 2. Relative frequencies of rejection (dark bars, left axis) and default (light bars, right axis) for existing and new clients. The axes are scaled in such a way that the dashed line represents both overall means.

[Figure 3: two panels; horizontal axis: age; vertical axis: density.]

Fig. 3. Histogram and kernel density estimate of the variable age for all clients (solid line), rejected clients (dotted line) and bad clients (dashed line).

the first sample), but calculate the predictions using the complete second sample. Given the default probabilities, we compare the area under the receiver operating characteristic curve (AUROC) of our new estimator and the conventional one. Fig. 4 displays the boxplots of the AUROC. It shows that the new estimator indeed leads to better default predictions on average than the standard one.
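For readers who want to reproduce this kind of comparison, the AUROC can be computed directly from its pairwise definition: the share of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half. The Python sketch below is a generic illustration, not the authors' code; the function name `auroc` is a hypothetical choice.

```python
import numpy as np

def auroc(y, score):
    """Area under the ROC curve via pairwise comparison of scores.

    y     : binary labels (1 = positive class)
    score : predicted probabilities or any monotone score
    """
    y = np.asarray(y).astype(bool)
    score = np.asarray(score, dtype=float)
    pos, neg = score[y], score[~y]
    diff = pos[:, None] - neg[None, :]       # all positive-negative pairs
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return wins / (pos.size * neg.size)

example = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise matrix needs memory proportional to the product of the two class sizes; for large samples a rank-based formula would be the more economical choice.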


Table 3. Conventional parameter estimates $\hat{\beta}$ and new parameter estimates $\tilde{\beta}$ of the logistic regression model for creditworthiness. For categorical variables the reference class is given in parentheses.

| Variable | Category | $\hat{\beta}$ | sd | $\tilde{\beta}$ | sd |
|---|---|---|---|---|---|
| Intercept | | 5.726 | 0.930 | 4.638 | 0.983 |
| Age | | 0.032 | 0.013 | 0.053 | 0.012 |
| Marital status (Married) | Widowed | 0.264 | 0.873 | 0.997 | 0.806 |
| | Single | 0.299 | 0.269 | 0.585 | 0.287 |
| | Cohabitee | 1.632 | 1.030 | 3.134 | 1.178 |
| | Divorced | 0.121 | 0.336 | 0.053 | 0.370 |
| | Separated | 0.663 | 0.424 | 1.028 | 0.505 |
| Children (No child) | 1 Child | 0.007 | 0.271 | 0.240 | 0.333 |
| | 2 Children | 0.450 | 0.293 | 1.033 | 0.450 |
| | >2 Children | 0.124 | 0.517 | 0.754 | 0.325 |
| Occupation (Civil servant) | Skilled worker | 0.826 | 0.447 | 0.997 | 0.431 |
| | Unskilled worker | 1.336 | 0.468 | 1.788 | 0.301 |
| | Self-employed | 1.958 | 0.578 | 3.296 | 0.618 |
| | Pensioner | 0.635 | 1.321 | 3.459 | 1.229 |
| | Other | 0.621 | 0.876 | 1.433 | 1.240 |
| Working experience | | 0.003 | 0.001 | 0.002 | 0.001 |
| Household income | | 0.000 | 0.000 | 0.001 | 0.000 |
| Co-signer (Not available) | Available | 0.102 | 0.284 | 0.240 | 0.267 |
| Purchasing power (Very high) | High | 0.188 | 0.327 | 0.178 | 0.391 |
| | Medium | 0.219 | 0.311 | 0.941 | 0.348 |
| | Low | 0.129 | 0.353 | 0.067 | 0.414 |
| | Other | 0.669 | 0.401 | 2.248 | 0.385 |
| Credits (1 Credit) | 2 Credits | 0.119 | 0.259 | 0.058 | 0.340 |
| | 3 Credits | 0.113 | 0.312 | 0.222 | 0.450 |
| | 4 or more | 0.365 | 0.290 | 1.165 | 0.306 |
| | No credit | 0.294 | 0.239 | 1.360 | 0.287 |
| | Other | 0.800 | 0.353 | 1.574 | 0.403 |
| Schufa score (A–E) | F–J | 0.830 | 0.205 | 0.922 | 0.229 |
| | K–O | 1.548 | 0.271 | 2.741 | 0.295 |
| | P | 4.202 | 0.615 | 7.086 | 1.541 |
| | Other | 0.804 | 0.477 | 2.143 | 0.393 |
| New customer (No) | Yes | 0.068 | 0.329 | 0.091 | 0.500 |
| Accommodation type (Mixed use) | Single family | 0.340 | 0.496 | 0.959 | 0.741 |
| | Two-flat | 0.165 | 0.506 | 0.394 | 0.691 |
| | 3–5 Flats | 0.331 | 0.503 | 0.572 | 0.701 |
| | 6–10 Flats | 0.069 | 0.481 | 0.682 | 0.772 |
| | 11–14 Flats | 0.124 | 0.543 | 0.875 | 0.770 |
| | 15–20 Flats | 0.165 | 0.633 | 0.229 | 0.719 |
| | >20 Flats | 0.334 | 0.604 | 1.551 | 0.953 |
| | Other | 0.256 | 0.525 | 2.686 | 0.612 |
| Time at present address | | 0.001 | 0.001 | 0.001 | 0.001 |


We also compare our results to results obtained using the bivariate probit model of Boyes et al. (1989). We estimate the parameters of this model using the weighted exogenous sample maximum likelihood estimator. To determine the weights, we have to calculate different sample and population frequencies (see Boyes et al. (1989)). We obtain the population frequencies by simulating a single sample of one million observations. Fig. 4 shows that our results are again significantly better. This is not really surprising, however, as our estimator is based on the true data generating process in contrast to the bivariate probit model. Also, our technique sometimes runs into numerical problems. This happens, for example, if we simulate observations from a bivariate probit model. In this case, we were not always able to calculate the weights $\hat{p}_i$. A similar problem occurred in smaller samples.

5. Discussion

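[Figure 4: boxplots of the AUROC (axis range approximately 0.72 to 0.80) for the conventional estimator, the new estimator, and the bivariate probit model.]

Fig. 4. Boxplots of the AUROC of the logistic regression model estimated with the conventional and our new estimator.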

The technique developed here can be applied to more general parametric models of the dependency of $Y$ on $X$. Additional applications can be imagined, such as clinical studies with nonrespondents or non-response in opinion polls. In the credit scoring context, the usefulness of our approach depends upon the extent to which credit officers override formal scorecards among applicants that are used to estimate the scorecard model. If this happens very often, rejects are not missing at random anymore and default forecasts derived from the estimated model can be improved by the method introduced above.

Appendix A. Proof of Theorem 1

We only sketch the main idea here. For details see Bücker and Krämer (2011). Let

$$\Psi_n(\beta) := \frac{1}{n} \sum_{i=1}^{n} n \hat{p}_i \frac{\partial \ln f(y_i \mid x_i, \beta)}{\partial \beta}$$

be the derivative of the log of the likelihood (5). In addition, let

$$\psi_i^n(\beta) := n \hat{p}_i \frac{\partial \ln f(y_i \mid x_i, \beta)}{\partial \beta}$$

and

$$s_i(\beta) := \frac{\partial \ln f(y_i \mid x_i, \beta)}{\partial \beta}.$$

From Qin et al. (2002) it can be seen that

$$n \hat{p}_i \xrightarrow{p} \frac{W_0}{w(y_i, x_i, \theta_0)}.$$

Let $E$ be the expectation with respect to $F(y, x)$ and $E_C$ the expectation with respect to the conditional distribution $w(y, x, \theta_0)\, dF(y, x) / W_0$, where $\theta_0$ and $W_0$ represent the true values of $\theta$ and $W$, respectively. Then it is easily verified that

$$E_C \left[ \frac{W_0}{w(y_i, x_i, \theta_0)} \frac{\partial \ln f(y_i \mid x_i, \beta)}{\partial \beta} \right] = E \left[ \frac{\partial \ln f(y_i \mid x_i, \beta)}{\partial \beta} \right].$$

Similar to Qin et al. (2002), let

$$\gamma = \lambda_1 (1 - W), \quad \eta = (\theta^\top, W, \gamma^\top)^\top, \quad \eta_0 = (\theta_0^\top, W_0, 0^\top)^\top, \quad \alpha_N = \frac{N}{n} - \frac{1}{W_0}.$$

Then

$$n p_i = \frac{1 - W}{1 - \frac{W}{W_0} + \frac{1 - W_0}{W_0} w(y_i, x_i, \theta) + \gamma^\top (x_i - \mu_X) + \alpha_N \left( w(y_i, x_i, \theta) - W \right)}.$$

Defining

$$\Xi_n(\beta, \eta, \alpha_N) := \frac{1}{n} \sum_{i=1}^{n} \xi_i(\beta, \eta, \alpha_N), \tag{A.1}$$

where

$$\xi_i(\beta, \eta, \alpha_N) := s_i(\beta) \frac{1 - W}{1 - \frac{W}{W_0} + \frac{1 - W_0}{W_0} w(y_i, x_i, \theta) + \gamma^\top (x_i - \mu_X) + \alpha_N \left( w(y_i, x_i, \theta) - W \right)},$$

we have

$$\Xi_n(\beta, \hat{\eta}, \alpha_N) = \Psi_n(\beta).$$

Also, let $\Xi_n^j(\beta, \eta, \alpha_N)$ denote the $j$th component of $\Xi_n(\beta, \eta, \alpha_N)$. For the proof of Theorem 1, we impose the following assumptions (which look rather technical but are easily verified in many applications, see Bücker and Krämer (2011)):

(A1) $\beta$ is from some compact subset of $\mathbb{R}^{k+1}$,
(A2) the marginal distribution of $X$ must not depend on $\beta$,
(A3) $\psi^n(\beta)$ is asymptotically uniformly integrable,
(A4) $E_C\, \psi_i^n(\beta)$ is equicontinuous,
(A5) $\psi^n(\beta)$ is continuous in $\beta$ for almost all $y$, $x$,
(A6) $\exists\, d(x, y)$ with $E_C(d(X, Y)) < \infty$ and $|\psi^n(\beta)| \leq d(x, y)$ $\forall \beta$,
(A7) the operations of integration with respect to $y$, $x$ and differentiation with respect to $\beta$ can be interchanged,
(A8) $E\left( \partial \ln f(y_i \mid x_i, \beta) / \partial \beta \right)$ has a unique root,
(A9) $\Psi_n(\hat{\beta}) = o_p(1)$,
(A10) $\frac{\partial^2 \Xi_n^j(\beta_j, \eta_j, \alpha_n)}{\partial \eta^\top \partial \eta}$, $\frac{\partial^2 \Xi_n^j(\beta_j, \eta_j, \alpha_n)}{\partial \beta^\top \partial \beta}$ and $\frac{\partial^2 \Xi_n^j(\beta_j, \eta_j, \alpha_n)}{(\partial \alpha_N)^2}$ exist and are bounded by an integrable function $\forall j$,
(A11) $E\, \frac{\partial s(\beta_0)}{\partial \beta^\top}$ exists and is invertible.

Now consistency of the M-estimator $\hat{\beta}$ follows from the fact that under the additional conditions (A1)–(A9) $\Psi_n(\beta)$ converges uniformly in probability to $\Psi(\beta) := E(s_i(\beta))$ with unique root $\beta_0$. From the uniform integrability of $\psi^n(\beta)$ and the equicontinuity of $E_C(\psi^n(\beta))$ it follows that

$$\sup_\beta \left\| E_C\, \psi_i^n(\beta) - E(s_i(\beta)) \right\| \xrightarrow{p} 0.$$

Also, by the uniform law of large numbers,

$$\sup_\beta \left\| \frac{1}{n} \sum_{i=1}^{n} \left[ \psi_i^n(\beta) - E_C\, \psi_i^n(\beta) \right] \right\| \xrightarrow{p} 0.$$

Finally, the consistency of $\hat{\beta}$ follows from the fact that

$$\sup_\beta \left| \Psi_n(\beta) - \Psi(\beta) \right| = \sup_\beta \left\| \frac{1}{n} \sum_{i=1}^{n} \psi_i^n(\beta) - \frac{1}{n} \sum_{i=1}^{n} E_C\, \psi_i^n(\beta) + \frac{1}{n} \sum_{i=1}^{n} E_C\, \psi_i^n(\beta) - E(s_i(\beta)) \right\|$$

$$\leq \sup_\beta \left\| \frac{1}{n} \sum_{i=1}^{n} \psi_i^n(\beta) - \frac{1}{n} \sum_{i=1}^{n} E_C\, \psi_i^n(\beta) \right\| + \sup_\beta \left\| \frac{1}{n} \sum_{i=1}^{n} E_C\, \psi_i^n(\beta) - E(s_i(\beta)) \right\| \xrightarrow{p} 0,$$

where the second term converges as it contains a Cesàro mean. The normality of the estimator can be derived by a componentwise Taylor expansion of (A.1) similar to the proof of Theorem 2 in Qin et al. (2002). Details are again available in Bücker and Krämer (2011).

References

Blöchlinger, A., Leippold, M., 2006. Economic benefit of powerful credit scoring. Journal of Banking & Finance 30, 851–873.
Boyes, W., Hoffman, D., Low, S., 1989. An econometric analysis of the bank credit scoring problem. Journal of Econometrics 40, 3–14.
Bücker, M., Arnold, M., Krämer, W., 2012. A Hausman test for non-ignorability. Economics Letters 114, 23–25.
Bücker, M., Krämer, W., 2011. Reject inference in consumer credit scoring with nonignorable missing data. Tech. Rep. 1/2011, SFB 823, TU Dortmund.
Crook, J., Banasik, J., 2004. Does reject inference really improve the performance of application scoring models? Journal of Banking & Finance 28, 857–874.
Fees, E., Schieble, M., Walzl, M., 2011. Why it pays to conceal: on the optimal timing of acquiring verifiable information. German Economic Review 12, 100–123.
Hand, D.J., Henley, W.E., 1993. Can reject inference ever work? IMA Journal of Mathematics Applied in Business & Industry 5, 45–55.
Jacobson, T., Roszbach, K., 2003. Bank lending policy, credit scoring and value-at-risk. Journal of Banking & Finance 27, 615–633.
Khandani, A., Kim, A., Lo, A., 2010. Consumer credit risk models via machine-learning algorithms. Journal of Banking & Finance 34, 2767–2787.
Krämer, W., 2011. The cult of statistical significance – what economists should and should not do to make their data talk. Schmollers Jahrbuch 131, 455–468.
Marshall, A., Tang, L., Milne, A., 2010. Variable reduction, sample selection bias and bank retail credit scoring. Journal of Empirical Finance 17, 501–512.
Owen, A.B., 2001. Empirical Likelihood. Chapman & Hall/CRC, Boca Raton.
Qin, J., Leung, D., Shao, J., 2002. Estimation with survey data under nonignorable nonresponse or informative sampling. Journal of the American Statistical Association 97, 193–200.
Rubin, D.B., 1976. Inference and missing data. Biometrika 63, 581–592.
Thomas, L.C., 2000. A survey of credit and behavioural scoring: forecasting financial risk of lending to consumers. International Journal of Forecasting 16, 149–172.