Exposure at default modeling – A theoretical and empirical assessment of estimation approaches and parameter choice


Accepted Manuscript

Exposure at Default Modeling – A Theoretical and Empirical Assessment of Estimation Approaches and Parameter Choice

Marc Gürtler, Martin Thomas Hibbeln, Piet Usselmann

PII: S0378-4266(17)30054-7
DOI: 10.1016/j.jbankfin.2017.03.004
Reference: JBF 5107

To appear in: Journal of Banking and Finance

Received date: 14 January 2016
Revised date: 17 February 2017
Accepted date: 3 March 2017

Please cite this article as: Marc Gürtler, Martin Thomas Hibbeln, Piet Usselmann, Exposure at Default Modeling – A Theoretical and Empirical Assessment of Estimation Approaches and Parameter Choice, Journal of Banking and Finance (2017), doi: 10.1016/j.jbankfin.2017.03.004

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


Exposure at Default Modeling – A Theoretical and Empirical Assessment of Estimation Approaches and Parameter Choice

Marc Gürtler (a), Martin Thomas Hibbeln (b), Piet Usselmann (a)

(a) University of Braunschweig - Institute of Technology, Germany
(b) University of Duisburg-Essen, Germany

Abstract

Estimating the credit risk parameter exposure at default is important for banks from an internal risk management and a regulatory perspective. Several approaches are common in the literature and in practice. We theoretically and empirically analyze how the exposure at default should be modeled to obtain accurate estimates of the expected loss. Our empirical analysis is based on a large and unique dataset from a retail portfolio of a European bank. We demonstrate that some approaches can lead to substantially biased estimates of the expected loss and show that the generalized cohort approach is advantageous. Moreover, using in- and out-of-sample analyses, we empirically demonstrate that using the credit conversion factor is preferable to the loan equivalent factor, exposure at default factor, and direct exposure at default estimation to achieve high estimation accuracy.


Keywords: Credit risk, checking accounts, exposure at default, credit conversion factor, probability of default


JEL classification: G21, G28



Corresponding author: Piet Usselmann; Technische Universität Braunschweig; Abt-Jerusalem-Str. 7, 38106 Braunschweig, Germany; Phone: +49 531 391 2894; E-mail: [email protected].


1. Introduction

When estimating the risk related to a credit product, banks typically model the risk parameters probability of default (PD), loss given default (LGD), and exposure at default (EaD) (or the credit conversion factor (CCF)) separately.¹ In contrast to the credit risk parameters PD and LGD, few papers have theoretically or empirically analyzed the modeling of EaD or CCF, although modeling these parameters is important for banks from an internal risk management and a regulatory perspective. On the one hand, CCF is needed to fulfill regulatory requirements regarding the internal ratings-based (IRB) approach of Basel III. Furthermore, CCF modeling can also be required for estimating expected loss over lifetime, which is mandated by the new International Financial Reporting Standard 9 (IFRS 9). On the other hand, an unbiased EaD estimation is important from an internal risk management perspective regarding risk-based pricing, limit managing, or economic capital calculations to control risk and to obtain an advantage over competitors.

¹ These risk parameters can be defined as follows: The PD is the probability of default of a counterparty, typically over a one-year period. The LGD is the (expected) ratio of the loss on an exposure due to the default of a counterparty to the amount outstanding at default. The EaD is the (expected) amount outstanding at default. The CCF is the proportion of the currently undrawn amount of the commitment (the open limit) that is expected to be drawn down at default. The expected loss EL can be determined as the product of PD, LGD and EaD. For an overview of credit risk measurement see Bluhm et al. (2010).

In this context, products with time-varying exposure are of particular interest. Whereas most previous studies regarding CCF address products for corporate customers, we analyze retail checking accounts with lines of credit, which is a widespread product in many countries. In 2013, more than 98 million checking accounts existed in Germany and more than 99% of the main payment instruments (measured by frequency of transactions) were credit transfers, direct debits, electronic cards, and credit cards. Similar numbers can be observed for most countries that participate in the Committee on Payments and Market Infrastructures (CPMI) (Bank for International Settlements 2014). In September 2015, checking account customers in Germany made use of nearly 36 billion euros in overdraft facilities (Deutsche Bundesbank 2015). Both numbers show the high qualitative and quantitative relevance of this product.

In our theoretical analysis, we discuss three common modeling approaches for CCF, namely the fixed-horizon, the variable-horizon, and the (standard) cohort approach. Particularly, we demonstrate the conditions under which the resulting CCF and EL estimates are unbiased. In addition, we propose a generalized cohort approach, which can yield unbiased CCF estimates even if these assumptions are not fulfilled. Based on these CCF estimates, the EaD and, ultimately, the EL is calculated, such that a biased CCF generally results in biased EL estimates.

In our empirical assessment of EaD, we not only consider an estimation based on CCF because the literature suggests various parameters for estimating the EaD: the CCF, the loan equivalent factor (LEQ), the exposure at default factor (EaDF), or direct estimates of EaD. Also in banking practice, there is no consensus about the parameter on which EaD estimates should be based (Bank for International Settlements 2016). For this reason, we first analyze which of these parameters is superior for EaD modeling based on in- and out-of-sample estimations. In this context, we use k-fold cross-validation techniques and transform the resulting estimates to an identical basis to allow for an appropriate comparison of these different parameters. We evaluate the different models based on several predictive accuracy measures. We find that it is beneficial to estimate the CCF instead of the alternative parameters to achieve high accuracy. Furthermore, our empirical results confirm the relevance of the bias discussed in the theoretical analysis. In particular, a combination of the common PD estimate based on a variable-horizon approach and the CCF estimate based on a fixed-horizon approach can lead to an overestimation of the expected loss. However, from a regulatory perspective, this estimation is less problematic because the estimate is rather conservative. Finally, we can relax some assumptions of our theoretical model. We find that if the CCF is estimated conditional on some independent variables x instead of using simple historical averages, the predictions are not only improved because we account for the variance in the CCF estimates but also because of a substantial decline in the abovementioned bias.

Our paper contributes to the banking and finance literature in several ways. First, we provide the first systematic and extensive overview of the existing literature on CCF modeling. Second, we present the assumptions under which the different approaches for modeling EaD can be transferred into one another and consistent EaD or CCF estimates can be obtained, whereas the previous literature on EaD or CCF modeling primarily addresses the advantages and disadvantages of these approaches from a practical perspective. Moreover, we propose a new general cohort approach to achieve unbiased estimates. Third, based on a large data set, we empirically identify which of the various parameters (CCF, LEQ, EaDF, EaD) should be used for EaD modeling. Furthermore, we demonstrate the existence of relevant interactions between modeling PD and EaD.

The remainder of the paper is organized as follows. In Section 2, we provide a systematic overview of the existing literature on CCF modeling. In Section 3, we describe our model and present our theoretical results. In Section 4, we report the empirical analysis, and Section 5 concludes.

2. Literature Review

Despite the high relevance of modeling EaD or CCF, the literature on this topic is rather scarce. We present the first systematic and extensive overview of the literature addressing this topic. Table 1 summarizes our review of the literature.

The existing studies consider different periods and countries, but most analyses are based on US data. Regardless of the product, the average realized CCF values are typically between 30% and 60%. Hence, most borrowers do not fully use their limit in the event of default. However, we find evidence that the CCF value strongly depends on the product, data, and empirical strategy used. Most studies on CCF address products for corporate customers whereas few focus on retail customers. Moreover, the previous literature on retail customers focuses primarily on modeling CCF (EaD) for credit cards. Most studies discuss which factors might influence CCF values in a univariate or multivariate setting (e.g., Araten and Jacobs 2001, Jiménez et al. 2008, Qi 2009, Jacobs 2010, Zhao et al. 2011, and Leow and Crook 2016). Especially for corporate customers, the most relevant factors are time-to-default and borrower risk. For example, Agarwal et al. (2006) find that a decrease in credit quality results in a significant increase in credit line utilization.

There are also some other streams of CCF-related literature. Some studies directly model exposure at default instead of CCF (e.g., Leow and Crook 2016), whereas others model the exposure (which is not necessarily conditional on default) or credit line usage and credit line usage at default (e.g., Jiménez et al. 2009, Sufi 2009, Hibbeln et al. 2015, or Hon and Bellotti 2016). Models regarding LEQ or EaDF are discussed, e.g., in Jacobs (2010), Moral (2011), or Leow and Crook (2016). The studies by Hon and Bellotti (2016) and Tong et al. (2016) discuss a variety of other sophisticated statistical regression models for modeling EaD, CCF, LEQ or EaDF.

Another stream of the literature discusses different theoretical topics in EaD/CCF modeling. Moral (2011) analyzes various methods for estimating EaD. Bag and Jacobs (2011) compare the empirical results of selected papers, and Hahn and Reitz (2011) present possible approaches for estimating exposure. Bag and Jacobs (2012) present an algorithm-based method to determine EaD. Finally, the Bank for International Settlements (2016) published the findings of a survey in 2014 among 37 banks from 17 countries. They find widely varying realized CCFs in banks due to different estimation approaches (i.e. the fixed-horizon, variable-horizon, or standard cohort approach), estimators other than the mean, or data cleaning processes. Moreover, they notice that EaD estimates within banks are based on different risk parameters, where the EaDF and CCF approach are most common.

3. Theoretical analysis of CCF forecasts

3.1. Variable description

To define the CCF in a manner consistent with regulatory guidelines, we first need to introduce a model environment. Let $B_t$ denote the balance at time $t$ of the account under consideration, and let $L_t$ stand for the limit advised at time $t$. In addition, $\delta_t \in \{0, 1\}$ defines an indicator variable that describes a default of the owner of the account for different reasons at time $t$ if and only if $\delta_t = 1$. Examples of such reasons are, first, exogenous signals from an external rating agency or from a credit bureau or defaults of other credit products by the same customer.² Second, a default occurs at time $t$ after being past due for more than 90 days ($B_\tau < -L_\tau$ for $\tau \in [t-90, t]$).³ Against this background, we define the exposure at time $t$ as $e_t := -\min\{B_t, 0\}$ and the default time as $d := \min\{t \mid \delta_t = 1\}$. On this basis we are able to define the exposure at default ($EaD_{t,T}$) at a future time $T$ from the perspective of time $t$ as the expected exposure at default time $d = T$, where the expectation value $E_t$ is determined at time $t < T$, i.e., $EaD_{t,T} := E_t(e_d \mid d = T)$.

CE

On the basis of the EaD and for the purposes of Basel III and EU Regulation No. 575, the conversion factor is defined as “the ratio of the currently undrawn amount of a

AC

commitment that could be drawn and that would therefore be outstanding at default to the currently undrawn amount of the commitment (the extent of the commitment being 2

It is common bank practice in retail business and in line with Basel III and EU Regulation No. 575 to assign

default status to the specific account and not to the client. 3

Usually, the negative balance has to stay below the authorized limit. An overdraft of an account can, however,

occur by using the checking account offline, interest on debit balances charged by the bank, or a manual approval by a loan officer.

6

ACCEPTED MANUSCRIPT determined by the advised limit, unless the unadvised limit is higher)” (EU Regulation No. 575).4 Hence, we define the expected credit conversion factor (CCFt,T) for a default of a credit product at time T from the perspective of time t as

 EaDt ,T  et ,  CCFt ,T   Lt  et 0, 

if Lt  et  0,

(1)

else.

CR IP T

To determine the expected loss (EL) of a checking account, two further risk parameters are relevant, the PD and the LGD. On the one hand, we define $PD_{t,T} = P_t(d = T)$ as the probability of default at time $T$ from the perspective of time $t$.⁵ On the other hand, we consider $PD^{cum}_{t,T} = P_t(d \in \{t+1, \ldots, T\})$ as the probability of default in the period between $t+1$ and $T$. Furthermore, we define the share of the exposure that is lost if the borrower defaults as

$$
\lambda_t := \begin{cases} \text{loss ratio of the exposure}, & \text{if } d = t, \\ 0, & \text{else.} \end{cases} \tag{2}
$$

Consequently, $\lambda_t \cdot e_t$ stands for the absolute loss at time $t$, and the expected loss given default is defined as $LGD_{t,T} := E_t(\lambda_d \mid d = T)$.

3.2. The model

From the perspective of both risk management and banking regulation (with respect to Basel III), the EL is of particular importance. In this subsection, we theoretically demonstrate how CCF influences EL and which methods of CCF determination are adequate to obtain an unbiased EL-estimation method.

⁴ This definition is only valid for accounts where a part of the credit line is currently undrawn, which is the standard state. In case that the current exposure exceeds the limit ($e_t \geq L_t$) it is consequential to define CCF as 0. This is common in practice and literature (e.g. Bank for International Settlements 2016 or Leow and Crook 2016).

⁵ $P_t$ denotes the probability from the perspective of time $t$.

Assuming that $e_d \mid (d = \tau)$ and $\lambda_d \mid (d = \tau)$ are independent for all $\tau > t$, the EL for potential defaults in the period between $t$ and $T$ can be determined as follows

$$
\begin{aligned}
EL_{t,T} &= \sum_{\tau=t+1}^{T} E_t(\lambda_d \cdot e_d \mid d = \tau) \cdot P_t(d = \tau) \\
&= \sum_{\tau=t+1}^{T} E_t(\lambda_d \mid d = \tau) \cdot E_t(e_d \mid d = \tau) \cdot P_t(d = \tau) \\
&= \sum_{\tau=t+1}^{T} LGD_{t,\tau} \cdot EaD_{t,\tau} \cdot PD_{t,\tau}.
\end{aligned} \tag{3}
$$

For simplification, we assume $LGD_{t,\tau_1} = LGD_{t,\tau_2} =: LGD_t$ for all $\tau_1, \tau_2$, i.e., the share of the exposure that is lost if the borrower defaults is independent of the time of default.⁶ Under this assumption the EL simplifies to

$$
EL_{t,T} = LGD_t \cdot \sum_{\tau=t+1}^{T} EaD_{t,\tau} \cdot PD_{t,\tau} = LGD_t \cdot \sum_{\tau=t+1}^{T} (e_t + CCF_{t,\tau} \cdot (L_t - e_t)) \cdot PD_{t,\tau}. \tag{4}
$$
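As a numerical illustration of equation (4), the following minimal Python sketch evaluates the expected loss for one account. The function name `expected_loss`, its argument names, and all input values are hypothetical and chosen only for illustration; they do not stem from the dataset used in this paper.

```python
from typing import Sequence


def expected_loss(e_t: float, limit_t: float, lgd_t: float,
                  ccf: Sequence[float], pd_marginal: Sequence[float]) -> float:
    """Equation (4): LGD_t * sum over tau of (e_t + CCF_{t,tau} * (L_t - e_t)) * PD_{t,tau}.

    ccf[k] and pd_marginal[k] both refer to the default time tau = t + 1 + k.
    """
    if len(ccf) != len(pd_marginal):
        raise ValueError("ccf and pd_marginal must cover the same default times")
    open_limit = limit_t - e_t
    ead_weighted = sum((e_t + c * open_limit) * p for c, p in zip(ccf, pd_marginal))
    return lgd_t * ead_weighted


# Hypothetical account: exposure 400, limit 1,000, LGD 50%, three future months.
el = expected_loss(400.0, 1000.0, 0.5,
                   ccf=[0.2, 0.4, 0.6], pd_marginal=[0.01, 0.01, 0.02])
```

With these illustrative inputs the EaD values for the three default times are 520, 640, and 760, so that the sketch yields an expected loss of about 13.4.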

We call this approach the “generalized cohort approach” because it can be applied at an arbitrary point in time $t$. However, in practice, the parameters $CCF_{t,\tau}$ are typically estimated on the basis of a specific point in time, e.g., $t$ = “January 1st,” and these estimations are applied to arbitrary points in time $t$. This so-called “cohort approach” (below, we use the designation “standard cohort approach” to avoid confusion with the “generalized cohort approach”) implicitly assumes that two CCF values coincide if the corresponding residual lifetimes $\tau - t$ are the same, i.e., $CCF_{t,\tau} = CCF_{t+c,\tau+c}$ for an arbitrary $c$. In the empirical section, we analyze this assumption. However, if this assumption proves false, the generalized cohort approach (4) should be applied for each considered time $t$. Independent of the empirical result, in the following, we demonstrate the implications of this and further assumptions. Specifically, we use the following assumptions:

(A1) $CCF_{t,\tau} = CCF_{T+t-\tau,T}$ (for all $t = 1, \ldots, T-1$; $t < \tau \leq T$),

(A2) $PD_{t,\tau} = PD_{T+t-\tau,T}$ (for all $t = 1, \ldots, T-1$; $t < \tau \leq T$),

(A3) $E_t(e_d \mid d = T) = E(e_d \mid d = T)$ is independent of $t$,

(A4) $e_t < L_t$ (for all $t = 1, \ldots, T-1$),

(A5) $\dfrac{\partial e_t / \partial t}{E(e_d \mid d = T) - e_t} > -\dfrac{\partial L_t / \partial t}{L_t - E(e_d \mid d = T)}$.

⁶ Discussions with our data provider confirmed the validity of this assumption for our dataset.

The first two assumptions concern the independence of both the $CCF_{t,\tau}$ and $PD_{t,\tau}$ of the abovementioned residual lifetime of an account. According to the assumption (A3), the determination of an expectation value is independent of the point in time $t$. Assumption (A4) requires that the limit is not fully drawn at $t$. Assumption (A5) requires that the “velocity” of a potential limit reduction by the bank is relatively lower than the “velocity” of a potential exposure increase of the account owner. This assumption is satisfied if, e.g., the exposure is strictly increasing and the limit is constant. On the basis of assumptions (A1) and (A2), the EL can be modified

$$
\begin{aligned}
EL_{t,T} &= LGD_t \cdot \sum_{\tau=t+1}^{T} (CCF_{t,\tau} \cdot (L_t - e_t) + e_t) \cdot PD_{t,\tau} \\
&= LGD_t \cdot \sum_{\tau=t+1}^{T} (CCF_{T+t-\tau,T} \cdot (L_t - e_t) + e_t) \cdot PD_{T+t-\tau,T} \\
&= LGD_t \cdot \sum_{\tau=t}^{T-1} (CCF_{\tau,T} \cdot (L_t - e_t) + e_t) \cdot PD_{\tau,T}.
\end{aligned} \tag{5}
$$

Particularly, the CCF does not have to be calculated for the period $[t, \tau]$ but for $[\tau, T]$. This approach is called the “variable-horizon approach”. Again, it should be emphasized that this approach requires that assumptions (A1) and (A2) be valid, which is tested in the empirical section. Furthermore, we present a proposition that reveals the potential bias of an additional approach, the fixed-horizon approach, that is also used in practice.⁷

Proposition
Let assumptions (A1) – (A4) be valid. Then, the following statements result:
1) $CCF_{t,T}$ is strictly decreasing in $t$.
2) $EL_{t,T} < LGD_t \cdot (CCF_{t,T} \cdot (L_t - e_t) + e_t) \cdot PD^{cum}_{t,T}$.

⁷ The proof is presented in the Appendix.

The right-hand side of part 2) is called the “fixed-horizon approach” to estimate the EL. Consequently, this approach overestimates the EL. However, the fixed-horizon approach could still be used for regulatory purposes because it can be interpreted as a conservative approach.
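The overestimation stated in part 2) of the proposition can be illustrated with a small Python sketch. It compares the last line of equation (5) with the fixed-horizon expression for a hypothetical account. All function names and numbers are assumptions made for illustration, and the cumulative PD is taken as the sum of the PD values over the reference points, which presupposes assumption (A2).

```python
def el_variable_horizon(e_t, limit_t, lgd_t, ccf_to_T, pd_to_T):
    """Last line of equation (5).

    ccf_to_T[k] stands for CCF_{t+k,T} and pd_to_T[k] for PD_{t+k,T},
    with k = 0, ..., T - t - 1.
    """
    open_limit = limit_t - e_t
    return lgd_t * sum((c * open_limit + e_t) * p
                       for c, p in zip(ccf_to_T, pd_to_T))


def el_fixed_horizon(e_t, limit_t, lgd_t, ccf_t_T, pd_cum):
    """Right-hand side of part 2) of the proposition."""
    return lgd_t * (ccf_t_T * (limit_t - e_t) + e_t) * pd_cum


# Hypothetical inputs: CCF_{tau,T} falls as the reference point tau approaches T.
ccf_to_T = [0.8, 0.6, 0.4, 0.2]
pd_to_T = [0.005, 0.005, 0.005, 0.005]

el_var = el_variable_horizon(400.0, 1000.0, 0.5, ccf_to_T, pd_to_T)
el_fix = el_fixed_horizon(400.0, 1000.0, 0.5, ccf_to_T[0], sum(pd_to_T))
```

In this example the variable-horizon value is about 7.0 and the fixed-horizon value about 8.8, because the fixed-horizon expression applies the largest CCF to the entire cumulative PD.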

3.3. Overview of CCF approaches

In practice, the four approaches discussed in the previous section differ in the (ex post) calculation of parameters regarding the reference point t. By using the fixed-horizon approach, the exposure at default is linked to the balance and limit at the date of reference (reference point t) exactly one year prior to default (fixed time). The variable-horizon approach is a generalization of the fixed-horizon approach. The year before default is subdivided into several time windows with several dates of reference (e.g., each month). The exposure at default is linked to the balance and limit of each date of reference. In contrast, the observation period for the standard cohort approach consists of general dates of reference (e.g., the first month of each year) by subdividing the period into one-year time windows. The exposure at default is linked to the balance and limit at the starting point of the corresponding time window. The proposed generalized cohort approach, however, can be applied at an arbitrary point in time t. The observation period for the generalized cohort approach consists of general dates of reference. In contrast to the standard cohort approach, each month of the year is a reference point. With respect to (A1), we estimate means for each month, and if these means were default weighted, the result would be identical to the variable-horizon approach mean.

For each of these CCF approaches, a simple (ex ante) forecast could be implemented by calculating (e.g., at product level) the historical (weighted) average CCF values. Moreover, based on each of these approaches, more sophisticated CCF regression models can be developed to estimate individual CCF forecasts. Implementing the generalized cohort approach means that not one but multiple estimators will be calculated, e.g., based on regression models referring to a specific reference month.
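The differences in the choice of reference points can be made concrete with a short Python sketch. It assumes that time is measured as a running month index (0 for January of the first year), that a default in month T is linked to reference points within the preceding 12 months, and that cohort windows are half-open intervals (s, s + 12]. The function names are hypothetical.

```python
from typing import Dict, List


def fixed_horizon_refs(default_month: int, horizon: int = 12) -> List[int]:
    """Single reference point exactly one horizon before default."""
    return [default_month - horizon]


def variable_horizon_refs(default_month: int, horizon: int = 12) -> List[int]:
    """Every month within the horizon before default is a reference point."""
    return list(range(default_month - horizon, default_month))


def cohort_ref(default_month: int, start_offset: int = 0, horizon: int = 12) -> int:
    """Start s of the window (s, s + horizon] that contains the default,
    where window starts satisfy s % horizon == start_offset."""
    return default_month - 1 - ((default_month - 1 - start_offset) % horizon)


def generalized_cohort_refs(default_month: int, horizon: int = 12) -> Dict[int, int]:
    """One cohort reference point per possible starting month of the year."""
    return {m: cohort_ref(default_month, m, horizon) for m in range(horizon)}


default_month = 30  # hypothetical default in July of the third year
print(fixed_horizon_refs(default_month))  # [18]
print(cohort_ref(default_month))          # 24, i.e., January of the third year
same_pairs = (sorted(generalized_cohort_refs(default_month).values())
              == variable_horizon_refs(default_month))
print(same_pairs)                         # True
```

Under these assumptions, the generalized cohort approach links each default to the same reference months as the variable-horizon approach, but keeps them separated by starting month, so that one estimator per reference month can be calculated.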

4. Empirical analysis

4.1. Variable description

The CCF is defined according to (1). We can transform the CCF into EaD estimates as follows

$$
EaD_{t,T} = e_t + CCF_{t,T} \cdot (L_t - e_t). \tag{6}
$$

Alternatively to the CCF, the estimation of EaD can be based on the LEQ, which is defined as

$$
LEQ_{t,T} := \begin{cases} \dfrac{EaD_{t,T}}{e_t}, & \text{if } e_t > 0, \\ 0, & \text{else.} \end{cases} \tag{7}
$$

The LEQ can also be transformed into EaD and CCF

$$
EaD_{t,T} = e_t \cdot LEQ_{t,T} \quad \text{and} \quad CCF_{t,T} = \begin{cases} \dfrac{e_t \cdot (LEQ_{t,T} - 1)}{L_t - e_t}, & \text{if } L_t - e_t > 0, \\ 0, & \text{else.} \end{cases} \tag{8}
$$

Finally, we can define the EaDF as

$$
EaDF_{t,T} := \begin{cases} \dfrac{EaD_{t,T}}{L_t}, & \text{if } L_t > 0, \\ 0, & \text{else.} \end{cases} \tag{9}
$$

The transformation into EaD and CCF is given by

$$
EaD_{t,T} = L_t \cdot EaDF_{t,T} \quad \text{and} \quad CCF_{t,T} = \begin{cases} \dfrac{EaDF_{t,T} \cdot L_t - e_t}{L_t - e_t}, & \text{if } L_t - e_t > 0, \\ 0, & \text{else.} \end{cases} \tag{10}
$$
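The transformations in equations (1) and (6) to (10) can be sketched in Python as follows. The function names and the example account are hypothetical, and the case distinctions assume that a ratio is used only if the respective denominator is strictly positive.

```python
def ccf_from_ead(ead, e_t, limit_t):
    """Equation (1)."""
    open_limit = limit_t - e_t
    return (ead - e_t) / open_limit if open_limit > 0 else 0.0


def leq_from_ead(ead, e_t):
    """Equation (7)."""
    return ead / e_t if e_t > 0 else 0.0


def eadf_from_ead(ead, limit_t):
    """Equation (9)."""
    return ead / limit_t if limit_t > 0 else 0.0


def ead_from_ccf(ccf, e_t, limit_t):
    """Equation (6)."""
    return e_t + ccf * (limit_t - e_t)


def ead_from_leq(leq, e_t):
    """Equation (8), EaD part."""
    return e_t * leq


def ead_from_eadf(eadf, limit_t):
    """Equation (10), EaD part."""
    return limit_t * eadf


def ccf_from_leq(leq, e_t, limit_t):
    """Equation (8), CCF part."""
    open_limit = limit_t - e_t
    return e_t * (leq - 1.0) / open_limit if open_limit > 0 else 0.0


def ccf_from_eadf(eadf, e_t, limit_t):
    """Equation (10), CCF part."""
    open_limit = limit_t - e_t
    return (eadf * limit_t - e_t) / open_limit if open_limit > 0 else 0.0


# Hypothetical account: exposure 400 at the reference point, limit 1,000, EaD 850.
e_t, limit_t, ead = 400.0, 1000.0, 850.0
ccf = ccf_from_ead(ead, e_t, limit_t)   # 0.75
leq = leq_from_ead(ead, e_t)            # 2.125
eadf = eadf_from_ead(ead, limit_t)      # 0.85
```

For this account, each of the three parameters leads back to the same EaD of 850 and to the same CCF of 0.75, which is the property used later to compare the models on an identical basis.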

It is important to note that our definition of CCF is in accordance with the regulatory framework and with Taplin et al. (2007) or Valvonis (2008). Some other studies, however, denominate our CCF variable as LEQ (e.g., Qi 2009, Jacobs 2010, or Leow and Crook 2016) or EaDF (e.g., Yang and Tkachenko 2012).

4.2. Data description

Our dataset stems from a large, privately owned European bank. This dataset represents a unique panel of checking accounts with lines of credit consisting of 2,798,491 account-month observations with 2,623 defaults between 2007 and 2014 from 61,371 customers. Checking accounts are a widespread product in many countries. This product is typically used for receipts and expenses and can have a positive or negative balance. Borrowers obtain an initial line of credit of €1,000, which is unsecured and without any expiration date.⁸ The default definitions used by the bank (“90 days past due” or the expectation that the customer will not repay all his obligations) comply with Basel III and EU Regulation No. 575. Summary statistics of our data are given in Panel A of Table 2.

⁸ See e.g. Jiménez et al. (2009) for the impact of collateral and maturity on EaD of corporate credit lines.

(Insert Table 2 about here)

For the given dataset, we calculate the parameters CCF, LEQ, EaDF, and EaD. Due to some extreme observations and to avoid instability in parameters, we winsorized the training as well as the validation dataset for CCF, LEQ, and EaDF at the 5th and 95th percentiles, which is common in the literature (e.g., Qi 2009, Jacobs 2010, or Leow and Crook 2016). Summary statistics of the parameters are given in Panel B of Table 2, and the frequency distributions of realized CCF, LEQ, EaDF, and EaD are given in Figure 2.⁹
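A winsorization at the 5th and 95th percentiles can be sketched as follows. This is only one possible implementation: it assumes linearly interpolated percentiles computed from the sample itself, which is not specified in the text, and the function names and example values are hypothetical.

```python
def percentile(sorted_values, pct):
    """Linearly interpolated percentile of an ascending list (pct in [0, 100])."""
    position = (len(sorted_values) - 1) * pct / 100.0
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1.0 - weight) + sorted_values[upper] * weight


def winsorize(values, lower_pct=5.0, upper_pct=95.0):
    """Replace values below/above the given percentiles by the percentile values."""
    ordered = sorted(values)
    low = percentile(ordered, lower_pct)
    high = percentile(ordered, upper_pct)
    return [min(max(v, low), high) for v in values]


# Hypothetical realized CCFs with one extreme value in each tail.
realized_ccf = [-5.0] + [k / 20 for k in range(19)] + [6.0]
winsorized = winsorize(realized_ccf)
```

In this example the extreme values -5.0 and 6.0 are replaced by the 5th and 95th percentile (0.0 and 0.9), whereas all other observations remain unchanged.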

As seen in Table 2, the mean of realized CCF is 72.95%. Hence, on average, approximately 73% of the open limit at the reference point t (up to 12 months prior to default) will be drawn at default in T. This number slightly exceeds the mean values reported in previous CCF studies concerning credit cards. A possible explanation is that checking accounts are typically more important for customers than are credit card accounts because the former are used, e.g., for the customers’ monthly salary. For this reason, a default will often occur only if the customer has already drawn a high percentage of the outstanding limit. Panel A of Figure 2 indicates that many customers do not use their open limit at all (CCF = 0) or use their entire limit (CCF = 1) in default. In this context, it is important to note that approximately half of the observations with CCF = 0 stem from customers without an open limit at reference point t. Values between zero and one are nearly uniformly distributed. Some negative CCF values and some CCF values greater than one can also be observed.

The mean of realized LEQ is 1.23. Therefore, on average, 123% of the credit used at the reference point is used at default. More than 4,700 customers have no negative balance at the reference point and, consequently, an LEQ of zero (cf. equation 7). The other customers with LEQ = 0 have an exposure at default of zero. Furthermore, many customers have almost no change in balance between the reference point and default (LEQ = 1).

The distribution of realized EaDF is also bimodal with many low values (EaDF = 0 if the exposure at default is zero) and many high values (EaDF = 1 if the exposure at default equals the limit at the reference point). On average, 85% of the limit at the reference point is used at default.

The mean of realized EaD is approximately €1,393, which is above the initial limit. However, customers can ask for an increase of their initial limit of €1,000, which leads to a mean limit of €1,658. Note that approximately 3,000 observations have an EaD of zero. Most of these observations have a positive balance at default, leading to zero exposure for the bank.

⁹ We further analyzed accounts with extreme CCF observations and found that these accounts have on average a significantly smaller open limit and higher exposure at the time of CCF calculation, which explains the extreme values. This is similar to Qi (2009), who, as consequence, excludes accounts with an open limit below $50. When repeating all analyses with unwinsorized values for the validation subset, we find very low performance measures, which is a consequence of extreme values. Nevertheless, the other results are quite robust. Furthermore, the economic relevance of accounts with such a low open limit is very small, as they account for only 2.0% of the total open limit.

4.3. Empirical strategy

As presented in Section 2, the estimation of EaD is based on different parameters in the literature, namely CCF, LEQ, EaDF, or a direct estimation of EaD. Using in- and out-of-sample estimations, we empirically determine which definition of such parameters is beneficial for EaD modeling to achieve high accuracy.

Furthermore, various approaches are commonly used in practice and discussed in the literature to model CCF or EaD (e.g., CEBS 2006, Valvonis 2008, Moral 2011, or Bank for International Settlements 2016). We discussed these approaches in Section 3 in the theoretical analysis of CCF, namely the fixed-horizon, the variable-horizon, the standard cohort, and the generalized cohort approach. Subsequently, we empirically analyze which approach is beneficial and in which situations the resulting estimate of the EL is likely to be biased.

Predictive accuracy is measured with six different criteria: the out-of-sample coefficient of determination ($R^2$), relative absolute error (RAE), root mean squared error (RMSE), mean absolute error (MAE), absolute error (ABS), and relative error (REL). The out-of-sample coefficient of determination $R^2$ is defined as

$$
R^2 = 1 - \frac{\frac{1}{n} \sum_i |y_i - \hat{y}_i|^2}{\frac{1}{n} \sum_i |y_i - \bar{y}_{train}|^2}, \tag{11}
$$

where $y_i$ and $\hat{y}_i$ are the realized and the forecasted values for account $i$ and $\bar{y}_{train}$ is the historical average value of the training data.¹⁰ Similarly, the RAE is defined as

$$
RAE = \frac{\frac{1}{n} \sum_i |y_i - \hat{y}_i|}{\frac{1}{n} \sum_i |y_i - \bar{y}_{train}|} \cdot 100. \tag{12}
$$

Note that if values for RAE are lower than 100 and values for $R^2$ are greater than zero, then the predictive performance of the model is better than the historical average value. The RMSE is defined as

$$
RMSE = \sqrt{\frac{1}{n} \sum_i (y_i - \hat{y}_i)^2}, \tag{13}
$$

and the MAE is defined as

$$
MAE = \frac{1}{n} \sum_i |y_i - \hat{y}_i|. \tag{14}
$$

Smaller values of RMSE and MAE imply better accuracy, with zero being the lower bound. The ABS is defined as

$$
ABS = \frac{1}{n} \sum_i (\hat{y}_i - y_i), \tag{15}
$$

and the REL is defined as

$$
REL = \frac{\sum_i (\hat{y}_i - y_i)}{\sum_i y_i}. \tag{16}
$$

The best achievable value for ABS and REL is zero. Values greater than zero indicate an overestimation of the real value, and values lower than zero imply an underestimation.
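The six measures in equations (11) to (16) can be computed along the following lines. The sketch is written in plain Python; the function name `accuracy_measures` and the example values are hypothetical, and the sketch assumes that all denominators are non-zero.

```python
from math import sqrt


def accuracy_measures(y, y_hat, y_train_mean):
    """Return R2, RAE, RMSE, MAE, ABS, and REL for realized values y,
    forecasts y_hat, and the historical average of the training data."""
    n = len(y)
    errors = [f - a for a, f in zip(y, y_hat)]  # forecast minus realization
    sq_model = sum(e * e for e in errors) / n
    sq_naive = sum((a - y_train_mean) ** 2 for a in y) / n
    abs_model = sum(abs(e) for e in errors) / n
    abs_naive = sum(abs(a - y_train_mean) for a in y) / n
    return {
        "R2": 1.0 - sq_model / sq_naive,      # equation (11)
        "RAE": abs_model / abs_naive * 100.0,  # equation (12)
        "RMSE": sqrt(sq_model),                # equation (13)
        "MAE": abs_model,                      # equation (14)
        "ABS": sum(errors) / n,                # equation (15)
        "REL": sum(errors) / sum(y),           # equation (16)
    }


# Hypothetical realized and forecasted EaD values, training average 250.
y = [100.0, 200.0, 300.0, 400.0]
y_hat = [110.0, 190.0, 320.0, 380.0]
m = accuracy_measures(y, y_hat, y_train_mean=250.0)
```

In this example the forecast errors cancel out on average, so ABS and REL are zero although MAE (15) and RMSE (about 15.8) are not. This illustrates why the signed measures are reported in addition to the unsigned ones.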

¹⁰ See also Campbell and Thompson (2008) or Gürtler and Hibbeln (2013) for applications of this out-of-sample R² statistic.

For evaluation, we use a 10-fold cross-validation to avoid overfitting and to obtain more robust results (see, e.g., Bastos 2010, Qi and Zhao 2011, or Hartmann-Wendels et al. 2014). For this purpose, we randomly divide the sample into 10 subsamples. Then, we use nine subsamples to build the model (in-sample) and one subsample to test the model (out-of-sample) and calculate our predictive accuracy measures. We repeat this 10 times such that each subsample represents exactly once the test sample. The obtained (10) values for the measures are combined to obtain one value for each measure for each 10-fold cross-validation. This method is repeated 1,000 times with different randomly generated subsamples.¹¹
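A repeated k-fold cross-validation along these lines can be sketched as follows. The sketch deviates from the analysis in this paper in several respects, all of which are assumptions made for brevity: it uses a single regressor with a closed-form OLS fit instead of the full variable set, it pools the out-of-sample errors of the k folds into one RMSE per repetition as one possible way of combining the fold results, and it uses synthetic data. All names are hypothetical.

```python
import random


def fit_ols(x, y):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def repeated_kfold_rmse(x, y, k=10, repetitions=1000, seed=0):
    """Out-of-sample RMSE for each repetition of a random k-fold split."""
    rng = random.Random(seed)
    n = len(x)
    results = []
    for _ in range(repetitions):
        order = list(range(n))
        rng.shuffle(order)
        folds = [order[i::k] for i in range(k)]  # k disjoint test samples
        squared_errors = []
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            intercept, slope = fit_ols([x[i] for i in train_idx],
                                       [y[i] for i in train_idx])
            squared_errors.extend((y[i] - (intercept + slope * x[i])) ** 2
                                  for i in test_idx)
        results.append((sum(squared_errors) / n) ** 0.5)
    return results


# Synthetic illustration (not the paper's data): CCF falls with usage, plus noise.
rng = random.Random(1)
usage = [rng.random() for _ in range(200)]
ccf = [0.9 - 0.5 * u + rng.gauss(0.0, 0.1) for u in usage]
rmse_per_repetition = repeated_kfold_rmse(usage, ccf, k=10, repetitions=20)
mean_rmse = sum(rmse_per_repetition) / len(rmse_per_repetition)
```

The number of repetitions is reduced to 20 in the example call to keep the runtime short; the default of 1,000 corresponds to the number of repetitions stated above.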

AN US

For estimation, we use common independent variables regarding default risk, account activity, or behavioral variables on a monthly basis, relationship variables, and client controls (see, e.g., Qi 2009, Leow and Crook 2016, or Hon and Bellotti 2016). Based on some directly observable variables such as balance or limit, we also derive ratios or dummy variables such

As an additional robustness check, we have repeated the analyses with an out-of-time validation. We find that

PT

11

ED

M

as usage or high usage (usage greater than 95%) (cf. Qi 2009).12 In our baseline analysis, we

the results remain largely unchanged.

We derived the relevant variables from the literature. For estimation, we use the same variables for all models

CE

12

to have identical conditions for each model, because our focus is not on finding the best statistical model.

AC

Instead, we analyze on which parameter (EaD, CCF, LEQ, or EaDF) a model should be based in order to optimally estimate EaD. We use the following variables: undrawn amount, drawn amount, usage, usage greater than 95%, rating, difference between monthly cash inflows and outflows, difference between the maximum and minimum exposure in each month (both as a percentage of the external limit), average number of bounced debits (return debit notes), percentage of days with a negative balance, percentage of days with overdrafts (last three variables in the preceding 12 months), and length of the relationship in months. Customer controls are customer’s age, gender, job, marital status, number of children, nationality, online versus offline banking, and academic degrees.

16

ACCEPTED MANUSCRIPT use linear regression (OLS) for calculation. As a robustness test, we also implement mixture regression models.
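The repeated cross-validation described above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the single-regressor OLS, the synthetic data, and the choice of averaging the fold values are all assumptions, since the text only states that the 10 values are "combined".

```python
import random

def fit_ols(xs, ys):
    """Fit y = a + b*x by ordinary least squares (single regressor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def rmse(ys, preds):
    """Root mean squared error of the predictions."""
    return (sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)) ** 0.5

def repeated_kfold_rmse(xs, ys, k=10, repeats=1000, seed=0):
    """Mean out-of-sample RMSE over `repeats` random k-fold cross-validations."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    run_values = []
    for _ in range(repeats):
        rng.shuffle(indices)                       # new random subsamples per run
        folds = [indices[i::k] for i in range(k)]  # each index is held out once
        fold_values = []
        for test in folds:
            held_out = set(test)
            train = [i for i in indices if i not in held_out]
            a, b = fit_ols([xs[i] for i in train], [ys[i] for i in train])
            preds = [a + b * xs[i] for i in test]
            fold_values.append(rmse([ys[i] for i in test], preds))
        # Assumption: the k fold values are combined by a simple average.
        run_values.append(sum(fold_values) / k)
    return sum(run_values) / repeats

# Hypothetical synthetic data; the paper uses 1,000 repetitions on bank data.
demo_rng = random.Random(42)
demo_x = [demo_rng.random() for _ in range(200)]
demo_y = [2.0 + 3.0 * x + demo_rng.gauss(0.0, 0.5) for x in demo_x]
demo_rmse = repeated_kfold_rmse(demo_x, demo_y, k=10, repeats=20)
```

Drawing new random folds in every repetition is what makes the averaged measure less sensitive to one particular split, which is the stated purpose of the 1,000 repetitions.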

4.4. Comparison of different parameters

As shown in Section 4.1, the estimation of EaD can be based on different risk parameters, which can easily be transformed into the EaD. We estimate the risk parameters CCF, LEQ, EaDF, and EaD at variable time horizons. For comparison, we additionally implement the historical average of CCF and EaD, namely CCF-mean and EaD-mean. To be able to compare the accuracy of the different models, we have to transform the estimates into an identical basis, for example into EaD estimates or CCF estimates. Specifically, we transform the estimated values of CCF, LEQ, EaDF, and CCF-mean into EaD estimates (EaD level). Similarly, we transform the estimated values of LEQ, EaDF, EaD, and EaD-mean into CCF estimates (CCF level). In Table 3, we display our performance measures after this transformation into EaD estimates.
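As a hedged illustration of the transformation between the CCF level and the EaD level, the sketch below assumes that equation (6) has the form EaD = e_t + CCF · (L_t − e_t), as the EL formula in the Appendix suggests. The function names and account values are hypothetical; the transformations for LEQ and EaDF are omitted because their definitions are given in Section 4.1, which is not reproduced here.

```python
def ead_from_ccf(ccf, drawn, limit):
    """EaD level: drawn amount plus the CCF share of the undrawn amount."""
    return drawn + ccf * (limit - drawn)

def ccf_from_ead(ead, drawn, limit):
    """CCF level: share of the undrawn amount that is drawn until default."""
    undrawn = limit - drawn
    if undrawn == 0:
        # The CCF is not defined for fully drawn lines (drawn == limit).
        raise ValueError("CCF is undefined when the limit is fully drawn")
    return (ead - drawn) / undrawn

# Hypothetical accounts as (drawn amount, limit) pairs.
accounts = [(200.0, 1000.0), (800.0, 1000.0), (1500.0, 3000.0)]

# A single historical mean CCF (0.730 is the mean realized CCF in Table 2)
# still yields account-specific EaD estimates after the transformation.
ccf_mean = 0.73
ead_estimates = [ead_from_ccf(ccf_mean, drawn, limit) for drawn, limit in accounts]
```

Because the drawn amount and limit differ across accounts, a constant CCF translates into varying EaD estimates, whereas a constant EaD-mean assigns the same value to every account.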

By definition, the R², ABS, and REL measures for EaD-mean equal zero, and similarly, the RAE equals 100, when calculated in-sample. Typically, predictive accuracy is lower out-of-sample than in-sample. However, we find that the out-of-sample accuracy is nearly identical to the in-sample accuracy for most measures, which indicates that there is no problem regarding overfitting the data. Interestingly, the CCF-mean provides a good prediction of EaD (in-sample and out-of-sample), which means that CCF-mean performs substantially better than EaD-mean in explaining the variance of EaD. This effect appears because the transformation of the CCF-mean into EaD values by equation (6) leads to variance in the estimates. Nevertheless, regarding the absolute error, the accuracy of CCF-mean is rather low. The risk parameters CCF and EaD exhibit the best overall performance. In particular, for R², RAE, RMSE, and MAE, we find only small differences between the two parameters. Interestingly, the absolute error for CCF is also very good and comparable to the absolute error for EaD. The parameter EaDF and to an even greater extent the LEQ have rather low performance in explaining the EaD. Specifically, the absolute error and relative error are very high compared to CCF and EaD. In summary, if the EaD is the parameter of interest, the estimate should be based on either CCF or EaD for modeling and forecasting.
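The statement that the in-sample R², ABS, and REL of a mean-only predictor are zero while its RAE equals 100 can be checked with a small sketch. The formulas below are assumptions based on common textbook definitions (the paper defines its measures in a section not reproduced here): ABS is taken as the mean signed error (prediction minus realization) and REL as ABS divided by the mean realization, which appears consistent with the ratios of ABS to the mean EaD reported in Tables 2 and 3. The data values are hypothetical.

```python
def accuracy_measures(y_true, y_pred):
    """Assumed definitions of R2, RAE, RMSE, MAE, ABS, and REL."""
    n = len(y_true)
    y_bar = sum(y_true) / n
    errors = [p - y for y, p in zip(y_true, y_pred)]   # prediction minus realization
    sse = sum(e * e for e in errors)                   # squared errors of the model
    sst = sum((y - y_bar) ** 2 for y in y_true)        # squared errors of the mean
    sae = sum(abs(e) for e in errors)                  # absolute errors of the model
    sad = sum(abs(y - y_bar) for y in y_true)          # absolute errors of the mean
    abs_err = sum(errors) / n
    return {
        "R2": 1 - sse / sst,
        "RAE": 100 * sae / sad,
        "RMSE": (sse / n) ** 0.5,
        "MAE": sae / n,
        "ABS": abs_err,
        "REL": abs_err / y_bar,
    }

# Hypothetical realized EaD values and a predictor that always returns their mean.
realized = [271.0, 945.0, 2031.0, 1394.0, 600.0]
mean_only = [sum(realized) / len(realized)] * len(realized)
mean_measures = accuracy_measures(realized, mean_only)
```

Under these definitions the mean-only benchmark is the reference point for R² and RAE, so any model with R² above zero or RAE below 100 improves on the historical average.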

Next, we analyze the predictive accuracy measures evaluated at the CCF level (see Table 4). Again, predictive accuracy is generally slightly lower out-of-sample than in-sample, and the R², ABS, and REL for CCF-mean are zero (the RAE equals 100) in-sample. This finding also (approximately) holds for the out-of-sample prediction. Remarkably, only for CCF do the R² and RAE indicate good accuracy. For LEQ, EaDF, EaD, and EaD-mean, we observe low performance. Furthermore, the other performance measures indicate poor performance of parameters that are transformed into CCF; only the CCF model results in a good predictive power if evaluated at the CCF level. Regarding the absolute error, no parameter except CCF (and EaDF) is suitable for forecasting CCF.

As mentioned before, there exists a stream of literature discussing the wide variety of other sophisticated statistical regression models for modeling EaD, CCF, LEQ, or EaDF (e.g., Hon and Bellotti 2016, Leow and Crook 2016, or Tong et al. 2016). We do not focus on finding the best statistical model, and further work may shed more light on the best EaD modeling approach. Instead, we analyze on which parameter (EaD, CCF, LEQ, or EaDF) a model should be based to optimally estimate EaD. However, as a robustness test we additionally implement several of the proposed sophisticated statistical regression models. More specifically, we perform (unreported) analyses based on the zero-adjusted gamma model for EaD (see Tong et al. 2016), a mixed model (degenerate term plus a normal distribution) for CCF (see Hon and Bellotti 2016), a mixed model (degenerate term plus a Weibull distribution) for LEQ (see Hon and Bellotti 2016), and a mixed model (degenerate term plus a Weibull distribution) for EaDF (see Hon and Bellotti 2016). We find that the overall results regarding the influence of parameter choice on the EaD estimate remain largely unchanged.13 To sum up, we show that if the EaD is estimated directly and subsequently transformed into CCF predictions, the accuracy at the CCF level is rather low. The same holds for LEQ (and EaDF). However, we find that CCF can not only be used to achieve accurate CCF predictions, but these forecasts can also be used to derive EaD predictions that are of similar accuracy to estimates that directly focus on EaD. Thus, our results suggest that EaD forecasts should be based on CCF models instead of applying LEQ, EaDF, or direct EaD models.

4.5. Comparison of different estimation approaches evaluated at the CCF level

As a next step, we repeat the analysis of Table 4 with a focus on the different CCF estimation approaches and evaluate them at CCF level. Specifically, we calculate the in-sample and out-of-sample performance based on 1,000 random 10-fold cross-validations for fixed CCF, variable CCF, standard cohort CCF, and generalized cohort CCF. Regardless of the estimation approach, we find widely comparable performance measures. However, the R² for fixed CCF is particularly high (and the RAE is low) because the estimation is based solely on defaults in exactly 12 months. Note, however, that we compare the results of the estimated fixed CCF with the realized fixed CCF. Thus, it should not be concluded that estimating the fixed CCF is superior to the other approaches but only that the accuracy of this model is high if the target is to estimate the CCF for a fixed horizon. We will show in Section 4.6 that the EL can be substantially biased if the estimation is based on the fixed-horizon approach. Moreover, the results in Table 5 confirm that the CCF depends strongly on the reference month. Especially for the absolute error, the minimum and the maximum of cohort CCF differ substantially, ranging from an underestimation of 6% to an overestimation of 4%. As expected, the mean value of standard cohort CCF, calculated as the mean CCF for all possible reference periods, is similar to the generalized cohort CCF.

In the previous sections, we discussed the accuracy of estimates at the CCF and EaD level. However, the ultimate parameter of interest is the EL. In our next analysis, we empirically investigate which approach is beneficial for estimating the EL and in which situations the resulting estimate of the EL is likely to be biased.

13 Results are available from the authors upon request. For estimation, we again use the same variables for all models to have identical conditions for each model. Thus, the variable selection is not tuned specifically to each subcomponent of the mixture models because our focus is to find which parameter a model should be based on for optimally estimating EaD, and not to find the best statistical prediction model for the EaD. We note that, for a given parameter choice, it might be meaningful to put additional emphasis on finding the best statistical model, e.g., as in Hon and Bellotti (2016) or Tong et al. (2016) regarding mixture models, which should therefore be seen as an important complementary analysis.

4.6. Comparison of different estimation approaches evaluated at the EL level

In Table 6, we first present different historical means for CCF, EaD, and PD regarding the standard cohort, the generalized cohort, the variable-horizon, and the fixed-horizon approach.14 We find that the mean CCF based on the fixed-horizon approach is much higher than the CCF based on the standard cohort or generalized cohort/variable-horizon approach, which is consistent with our theoretical results in Section 3.2. In Figure 3, the mean CCF is presented for all 12 months prior to default, which shows that our theoretical result that the CCF is strictly decreasing in t and, thus, that a fixed CCF leads to an overestimation of risk, holds for our empirical dataset. However, this bias does not apply to the EaD because the realized value of EaD does not depend on a reference point t.

14 Note that the presented results for the generalized cohort and variable-horizon approach are identical because historical means for CCF, EaD, and PD are displayed. Differences between the two approaches arise when EL estimates are calculated: For the generalized cohort approach, for each month, a separate CCF mean value is assigned to an individual contract, whereas for the variable-horizon approach, one general CCF mean value is assigned to all contracts.

The historically estimated CCF values computed on the basis of the standard cohort approach vary substantially, meaning that there is high dependence on the chosen reference month. This means that assumption (A1) “the CCF for a given residual lifetime does not depend on the reference month” seems not to be fulfilled.15 Consequently, the resulting EL estimates also depend on the reference month. As expected, the mean standard cohort CCF is nearly equal to the variable/generalized cohort CCF.16
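The bootstrap check of assumption (A1) mentioned in footnote 15 can be sketched roughly as follows. The footnote only states that one twelfth of the observations is resampled with replacement 10,000 times and that the mean CCF is calculated; the percentile construction of the 95% interval, the function name, and the synthetic CCF values below are assumptions made for illustration.

```python
import random

def bootstrap_mean_ci(values, fraction=1 / 12, runs=10_000, level=0.95, seed=0):
    """Bootstrap the mean of a resampled subset and return (mean, (lower, upper))."""
    rng = random.Random(seed)
    size = max(1, round(len(values) * fraction))
    # Each run draws `size` observations with replacement and records their mean.
    means = sorted(sum(rng.choices(values, k=size)) / size for _ in range(runs))
    # Assumption: percentile interval taken from the sorted bootstrap means.
    lower = means[round((1 - level) / 2 * (runs - 1))]
    upper = means[round((1 + level) / 2 * (runs - 1))]
    return sum(means) / runs, (lower, upper)

# Hypothetical CCF observations drawn with the mean and standard deviation of Table 2.
data_rng = random.Random(1)
ccf_values = [data_rng.gauss(0.73, 1.67) for _ in range(1200)]
center, (lo, hi) = bootstrap_mean_ci(ccf_values, runs=2000)
```

An interval of this kind gives a benchmark for how much a mean CCF computed from one twelfth of the data would vary by chance alone; cohort means falling outside it would point to a dependence on the reference month.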

Notably, the estimated PDs based on the cohort approach differ only slightly. This indicates that the PD estimation seems to be nearly independent of the reference month. As the fixed-horizon approach only considers defaults in exactly 12 months, the value for PD is approximately 12 times smaller than the corresponding value based on the variable approach because the observations refer to monthly data.

These results, in combination with our previous results presented in Section 4.4, imply that the estimates of the EL can be biased if different time horizons are used for parameter estimation. Based on the data, we now calculate the magnitude of this bias. Furthermore, we show how the CCF (in combination with the PD) should be modeled to achieve consistent estimates of the EL. For this purpose, we first calculate the simple historical estimators for CCF and PD, in both cases based on the four discussed estimation approaches (standard cohort, generalized cohort, variable and fixed approach). For each combination of these estimators, we calculate the performance of the resulting EL estimate. As a next step, we perform similar analyses but implement regression approaches (probit regressions for the PD and linear regressions for the CCF) instead of the historical estimators to evaluate the performance of each combination of the four discussed approaches.

In Table 7, we report the results for different estimates of the EL calculated from CCF and PD estimates that are based on the historical average using the four discussed approaches (cf. Table 6). For the cohort approach, we present the minimum, mean, and maximum of the 12 different reference months. For simplification, we assume LGD = 100%, i.e., in the event of default, the entire exposure is lost. As expected, the mean CCF based on the fixed-horizon approach in combination with mean variable PD leads to a substantial overestimation of the EL of generally more than 49%. Furthermore, we find considerable variation for the standard cohort CCF, which confirms our previous findings. However, when comparing the mean value of the standard cohort CCFs with the mean variable CCF and the mean generalized CCF, we find that the estimates are nearly identical.17 One possible problem is that the resulting EL values appear to be biased for all of these approaches, with an overestimation of the EL of approximately 28%. Considering a portfolio of 100,000 checking accounts, the overestimation of EL, for example when using “PD-mean: generalized cohort,” ranges from approximately €457,000 (“CCF-mean: variable”) to €794,350 (“CCF-mean: fixed”). The reason for this overestimation of the EL is that the mean CCF is calculated on the basis of observable values for defaulted customers, and this CCF is also assigned to non-defaulted customers, for which the CCF is not observable. This is problematic because the CCF is typically lower if the PD is low (see Hibbeln et al. 2015 for a similar finding regarding credit line usage). As a consequence, the CCF values assigned to non-defaulting customers, who represent the majority of the dataset, are on average too high, leading to an overestimation of EL.

15 We also tested assumption (A1) with bootstrapping. We resample one 12th of observations (with replacement) 10,000 times and calculate the mean of CCF. The mean value of all 10,000 runs is comparable to the variable CCF in Table 6 (0.7460). The 95% confidence interval is [0.6758, 0.8193]. This indicates that assumption (A1) is not fulfilled if formulated unconditional on x.

16 If values based on the standard cohort approach were default weighted, the results would be identical to those of the generalized cohort/variable-horizon approach.

17 As discussed in Table 6, the CCF mean for the generalized cohort and the variable-horizon approach are identical. However, at the EL level, the values differ slightly due to a not exactly uniform distribution of defaults over the year.
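The mechanism behind this overestimation can be illustrated with a toy portfolio. The sketch below is not the paper's calculation: the two segments, their PDs, CCFs, drawn amounts, and limits are hypothetical, and LGD is set to 100% as in the text. The historical mean CCF is computed from defaulted accounts only (that is, default weighted) and then assigned to every account.

```python
def expected_loss(pd, ccf, drawn, limit, lgd=1.0):
    """EL of one account: PD * LGD * (drawn + CCF * undrawn)."""
    return pd * lgd * (drawn + ccf * (limit - drawn))

# Hypothetical segments: (number of accounts, 12-month PD, true CCF, drawn, limit).
segments = [
    (9000, 0.005, 0.45, 500.0, 3000.0),   # low risk, large undrawn amount
    (1000, 0.080, 0.75, 1500.0, 2000.0),  # high risk, small undrawn amount
]

# Benchmark: every account is evaluated with its own (true) CCF.
true_el = sum(n * expected_loss(pd, ccf, e, lim) for n, pd, ccf, e, lim in segments)

# Historical mean CCF: only defaulted accounts have an observable CCF,
# so the mean is weighted by the expected number of defaults per segment.
expected_defaults = [n * pd for n, pd, _, _, _ in segments]
ccf_hist = sum(d * seg[2] for d, seg in zip(expected_defaults, segments)) / sum(expected_defaults)

# Naive estimate: the historical mean CCF is assigned to all accounts.
naive_el = sum(n * expected_loss(pd, ccf_hist, e, lim) for n, pd, _, e, lim in segments)
overestimation = naive_el / true_el - 1
```

In this hypothetical setup the naive estimate exceeds the benchmark by roughly 8%, because low-risk accounts receive a CCF that is too high for their large undrawn amounts. The size of the effect, and whether it appears at all, depends entirely on the assumed inputs.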

To analyze this further, we next implement regression models for the various CCF and PD approaches. As the expected CCF values are derived from regression models instead of historical averages, the resulting estimates include information on several explanatory variables, including the default risk of an account. Thus, the abovementioned problem that CCF values are estimated based on defaulted accounts and applied to non-defaulted accounts does not necessarily lead to the previously found bias.18 Moreover, it is straightforward to formulate the proposition in Section 3.2 conditional on some arbitrary explanatory variables x; in this case, it is sufficient to formulate assumption (A1) conditional on x as well, $\mathbb{E}(CCF_{t,\tau} \mid x) = \mathbb{E}(CCF_{T+t,T+\tau} \mid x)$, which is a substantially weaker assumption. Thus, even if the CCF depends on the reference month, this is not problematic as long as this dependence can be explained by some explanatory variables x that are considered in the estimation. For these reasons, it is likely that the biases found in the analyses above would be less pronounced or even vanish if we estimated CCF (and the EL) based on regression models instead of on historical averages.

18 Note that including, for example, default risk as an explanatory variable in the regression models results in lower CCF values for non-defaulted accounts. The means of forecasted CCF values of non-defaulted accounts for the variable, standard cohort, and generalized cohort approach are 0.478, 0.482 (mean), and 0.478, whereas the means of forecasted CCF values of defaulted accounts for the variable, standard cohort, and generalized cohort approach are 0.745, 0.739 (mean), and 0.745.

Thus, similar to Table 7, we report different estimates of the EL in Table 8, but the estimates are now based on CCF regressions to allow controlling for variables, e.g., reflecting the default risk. We find that our proposed generalized cohort approach performs best in estimating EL in the next 12 months, but the results are very similar to those based on the variable CCF, and even the fixed and the standard cohort CCF exhibit good accuracy. This confirms that assumption (A1) is rather uncritical if it is formulated conditional on x.

In summary, if a simple historical average is calculated to estimate the CCF, then the results are likely to overestimate the EL because the (high) CCF of accounts with high default risk is assigned to accounts with low default risk. Moreover, the resulting CCF and EL estimates highly depend on the chosen approach (fixed, variable, standard cohort, or generalized cohort). As assumption (A1) is not necessarily fulfilled, the outcomes are likely to be biased, which is particularly problematic for the fixed and the standard cohort approach. If, however, the CCF estimates are based on regression models and include variables reflecting the default risk, then the choice of the CCF approach has a substantially smaller impact on the resulting bias of both CCF and EL.

5. Conclusion

In this paper, we identify and investigate several theoretical and empirical issues regarding EaD and CCF modeling. We apply our empirical analyses to a unique dataset of a large European bank, consisting of 2,798,491 monthly observations from 61,371 customers during the period 2007–2014. First, we propose the generalized cohort approach and discuss its advantages over the three other common approaches: the fixed-horizon, variable-horizon, and (standard) cohort approach. Specifically, the generalized cohort approach is beneficial for a consistent modeling of EL. We demonstrate the assumptions under which two other approaches (variable-horizon and standard cohort) can also be used, but the fixed-horizon approach continues to be (positively) biased. From a regulatory perspective, however, this is rather unproblematic because the approach is more conservative. In contrast, from an internal risk management perspective and regarding the new International Financial Reporting Standard (IFRS) 9, this could be problematic because consistent estimation is necessary.

Furthermore, in our empirical analyses, we use in- and out-of-sample estimations to show which parameter is beneficial for EaD modeling to achieve high accuracy. We find that CCF should be applied instead of LEQ, EaDF, or direct EaD estimates. In particular, a transformation of EaD estimation into CCF produces rather low performance, whereas a transformed CCF estimation outperforms direct EaD predictions on most predictive accuracy measures.

Finally, we empirically demonstrate that there are interactions between modeling PD and EaD that can lead to substantially biased estimates of the EL. Regarding the (standard) cohort approach, we show that the result depends substantially on the chosen reference month, whereas our proposed generalized cohort approach avoids this problem.

Appendix – Proof of the Proposition

1) Using $e_t = e(t)$, $L_t = L(t)$, $a := E_t(e_d \mid d = T) = E(e_d \mid d = T)$ and applying the definition of CCF for $e_t < L_t$ (see assumption A4), it follows that

\[
\begin{aligned}
\frac{\partial\, CCF_{t,T}}{\partial t}
&= \frac{\partial}{\partial t}\left(\frac{a - e(t)}{L(t) - e(t)}\right) \\
&= \frac{-e'(t)\,(L(t) - e(t)) - (a - e(t))\,(L'(t) - e'(t))}{(L(t) - e(t))^2} \\
&= \frac{-e'(t)\,L(t) - a\,(L'(t) - e'(t)) + e(t)\,L'(t)}{(L(t) - e(t))^2} \\
&= \frac{(a - L(t))\,e'(t) - (a - e(t))\,L'(t)}{(L(t) - e(t))^2} < 0 \\
&\Leftrightarrow\; \frac{a - L(t)}{a - e(t)} < \frac{L'(t)}{e'(t)}
\;\Leftrightarrow\; \frac{e'(t)}{a - e(t)} > \frac{-L'(t)}{L(t) - a}.
\end{aligned}
\]

The latter inequality is fulfilled as a result of assumption (A5).

2)

\[
\begin{aligned}
EL_{t,T}
&= LGD_t \cdot \sum_{\tau=t}^{T-1} \left(e_t + CCF_{\tau,T} \cdot (L_t - e_t)\right) \cdot PD_{\tau,T} \\
&\leq LGD_t \cdot \sum_{\tau=t}^{T-1} \left(e_t + CCF_{t,T} \cdot (L_t - e_t)\right) \cdot PD_{\tau,T} \\
&= LGD_t \cdot \left(e_t + CCF_{t,T} \cdot (L_t - e_t)\right) \cdot \sum_{\tau=t}^{T-1} PD_{\tau,T} \\
&= LGD_t \cdot \left(e_t + CCF_{t,T} \cdot (L_t - e_t)\right) \cdot PD^{cum}_{t,T}.
\end{aligned}
\]

The above inequality results from the application of part 1) to each of the CCF values in the sum.
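A small numerical check can make part 2) more tangible. The sketch below evaluates both sides of the inequality for one hypothetical account; the decreasing CCF path, the constant monthly PDs, and the function names are assumptions chosen only for illustration.

```python
def el_with_cc_path(lgd, drawn, limit, ccfs, pds):
    """Left-hand side: each period's PD is combined with its own CCF."""
    return lgd * sum((drawn + ccf * (limit - drawn)) * pd for ccf, pd in zip(ccfs, pds))

def el_with_fixed_ccf(lgd, drawn, limit, ccf_t, pds):
    """Right-hand side: the CCF at the reference date times the cumulative PD."""
    return lgd * (drawn + ccf_t * (limit - drawn)) * sum(pds)

# Hypothetical inputs: the CCF decreases from 0.90 to 0.35 over 12 months
# (in line with part 1), and each month has the same default probability.
ccf_path = [0.90 - 0.05 * month for month in range(12)]
monthly_pds = [0.002] * 12

el_path = el_with_cc_path(1.0, 500.0, 2000.0, ccf_path, monthly_pds)
el_fixed = el_with_fixed_ccf(1.0, 500.0, 2000.0, ccf_path[0], monthly_pds)
```

With these inputs the fixed-CCF value (about 44.4) exceeds the path-based value (about 34.5), which is the direction of the bound stated in the proof.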


References

Agarwal, S., B.W. Ambrose, and C. Liu. 2006. Credit Lines and Credit Utilization. Journal of Money, Credit, and Banking 38, 1-22.

Araten, A., and M. Jacobs Jr. 2001. Loan Equivalents for Revolving Credits and Advised Lines. The RMA Journal. May. 34-39.

Asarnow, E., and J. Marker. 1995. Historical Performance of the U.S. Corporate Loan Market 1988-1993. Commercial Lending Review 10, 13-32.

Bag, P., and M. Jacobs Jr. 2012. Parsimonious Exposure-at-Default Modeling for Unfunded Loan Commitments. Journal of Risk Finance 13, 77-94.

Banerjee, P., and J.J. Canals-Cerdá. 2012. Credit Risk Analysis of Credit Card Portfolios under Economic Stress Conditions. Working Paper, Federal Reserve Bank of Philadelphia, No. 12-18.

Bank for International Settlements. 2014. Statistics on Payment, Clearing and Settlement Systems in the CPMI Countries.

Bank for International Settlements. 2016. Regulatory Consistency Assessment Programme (RCAP) – Analysis of Risk-Weighted Assets for Credit Risk in the Banking Book.

Bastos, J.A. 2010. Forecasting Bank Loans Loss-Given-Default. Journal of Banking and Finance 34, 2510-2517.

Bluhm, C., L. Overbeck, and C. Wagner. 2010. Introduction to Credit Risk Modeling, 2nd Ed. CRC Press, Boca Raton.

Campbell, J.Y., and S.B. Thompson. 2008. Predicting Excess Stock Returns Out of Sample: Can Anything Beat the Historical Average? Review of Financial Studies 21, 1509-1531.

Committee of European Banking Supervisors. 2006. Guidelines on the Implementation, Validation and Assessment of Advanced Measurement (AMA) and Internal Ratings Based (IRB) Approaches.

Deutsche Bundesbank. 2015. Zinsstatistik. November.

Gürtler, M., and M. Hibbeln. 2013. Improvements in Loss Given Default Forecasts for Bank Loans. Journal of Banking and Finance 37, 2354-2366.

Hahn, R., and S. Reitz. 2011. Possibilities of Estimating Exposures. In: Engelmann, B., and R. Rauhmeier (Ed.), The Basel II Risk Parameters, 2nd Ed. Springer, Berlin.

Hartmann-Wendels, T., P. Miller, and E. Töws. 2014. Loss Given Default for Leasing: Parametric and Nonparametric Estimations. Journal of Banking and Finance 40, 364-375.

Hibbeln, M., L. Norden, P. Usselmann, and M. Gürtler. 2015. Informational Synergies in Consumer Credit. Working Paper. January 2015.

Hon, P.S., and T. Bellotti. 2016. Models and Forecasts of Credit Card Balance. European Journal of Operational Research 249, 498-505.

Jacobs Jr, M. 2010. An Empirical Study of Exposure at Default. Journal of Advanced Studies in Finance 1, 32-59.

Jacobs Jr, M., and P. Bag. 2011. What do We Know About Exposure at Default on Contingent Credit Lines? – A Survey of the Literature, Empirical Analysis and Models. Working Paper, April 2011.

Jiménez, G., J.A. Lopez, and J. Saurina. 2008. Calibrating Exposure at Default for Corporate Credit Lines. Journal of Risk Management in Financial Institutions 2, 121-129.

Jiménez, G., J.A. Lopez, and J. Saurina. 2009. Empirical Analysis of Corporate Credit Lines. Review of Financial Studies 22, 5069-5098.

Kim, M.-J. 2008. Stress EAD: Experience of 2003 Korea Credit Card Distress. Journal of Economic Research 13, 73-102.

Leow, M., and J. Crook. 2016. A New Mixture Model for the Estimation of Credit Card Exposure at Default. European Journal of Operational Research 249, 487-497.

Moral, G. 2011. EAD Estimates for Facilities with Explicit Limits. In: Engelmann, B., and R. Rauhmeier (Ed.), The Basel II Risk Parameters, 2nd Ed. Springer, Berlin.

Qi, M. 2009. Exposure at Default of Unsecured Credit Cards. Working Paper, OCC Economics Working Paper 2009-2.

Qi, M., and X. Zhao. 2011. Comparison of Modeling Methods for Loss Given Default. Journal of Banking and Finance 35, 2842-2855.

Sufi, A. 2009. Bank Lines of Credit in Corporate Finance - An Empirical Analysis. Review of Financial Studies 22, 1057-1088.

Taplin, R., H.M. To, and J. Hee. 2007. Modeling Exposure at Default, Credit Conversion Factors and the Basel II Accord. Journal of Credit Risk 3, 75-84.

Tong, E.N.C., C. Mues, I. Brown, and L.C. Thomas. 2016. Exposure at Default Models with and without the Credit Conversion Factor. European Journal of Operational Research 252, 910-920.

Valvonis, V. 2008. Estimating EAD for Retail Exposures for Basel II Purposes. Journal of Credit Risk 4, 79-109.

Yang, B.H., and M. Tkachenko. 2012. Modeling Exposure at Default and Loss Given Default: Empirical Approaches and Technical Implementation. Journal of Credit Risk 8, 81-102.

Zhao, J.Y., D.W. Dwyer, and J. Zhang. 2011. Usage and Exposures at Default of Corporate Credit Lines: An Empirical Study. Moody’s Analytics. December. 1-19.

Figure 1: Different estimation approaches of CCF/EaD

Panel A displays the fixed-horizon approach. The exposure at default (black) is linked to the balance and limit at the date of reference (gray) exactly one year prior to default (fixed time). Panel B displays the variable-horizon approach. It is a generalization of the fixed-horizon approach. The year before default is subdivided into several time windows with several dates of reference (e.g., each month). The exposure at default is linked to the balance and limit of each date of reference. Panel C displays the standard cohort approach. The observation period consists of general dates of reference (e.g., the first month of each year) by subdividing the full observation period into one-year time windows. The exposure at default is linked to the balance and limit at the starting point of the corresponding time window. Panel D displays the generalized cohort approach. The observation period consists of general dates of reference, but in contrast to the standard cohort approach, each month of the year is used as a date of reference. In the upper half, the date of reference A (e.g., the first month of each year) is displayed. In the lower half, the date of reference B (e.g., the second month of each year) is displayed, leading to different CCF/EaD estimates depending on the relevant date of reference. Again, the exposure at default is linked to the balance and limit at each date of reference. A similar graphical illustration of the first three approaches can be found in Valvonis (2008).

Panel A: Fixed-horizon approach (timeline with the date of reference, the one-year fixed horizon, and the date of default)
Panel B: Variable-horizon approach (timeline with several dates of reference within a maximum one-year horizon and the date of default)
Panel C: Standard cohort approach (timeline with the date of reference, one-year time windows, and the date of default)
Panel D: Generalized cohort approach (two timelines, for dates of reference A and B, each with one-year time windows and the date of default)

Figure 2: Frequency distribution of realized CCF, LEQ, EaDF, and EaD

This figure reports frequency distributions of the realized (winsorized) dependent variables credit conversion factor (CCF), loan equivalent factor (LEQ), exposure at default factor (EaDF), and exposure at default (EaD).

Panel A: Frequency distribution of realized CCF (horizontal axis: CCF; vertical axis: frequency)
Panel B: Frequency distribution of realized LEQ (horizontal axis: LEQ; vertical axis: frequency)
Panel C: Frequency distribution of realized EaDF (horizontal axis: EaDF; vertical axis: frequency)
Panel D: Frequency distribution of realized EaD (horizontal axis: EaD; vertical axis: frequency)

Figure 3: CCF for different reference periods

The figure presents the mean CCF and the weighted mean CCF (weighted by the open limit) calculated for different reference periods between 12 months and one month prior to default.

(Horizontal axis: months to default, 1 to 12; vertical axis: CCF; series: CCF-mean and CCF-weighted-mean)

Table 1: Empirical studies concerning the credit conversion factor

This table summarizes the CCF/EaD literature regarding mean CCF values and various characteristics of the analyzed data. The country abbreviation AU stands for Australia, CA for Canada, ES for Spain, EU for European Union, KR for South Korea, UK for the United Kingdom, and US for the United States. Agarwal et al. (2006), Banerjee and Canals-Cerdá (2012), Leow and Crook (2016), and Yang and Tkachenko (2012) do not report explicit CCF values. Observations denotes the number of observations in the study, which can either be based on the full sample (f.s.) or on the defaulted sample (d.s.) after filtering the data. For studies with n.a., the number of observations was not available. For more detailed values of the survey by the Bank for International Settlements (2016) see ibid. p. 29.

| Authors | Country | Data | Years | CCF | Type | Observations | Borrowers |
|---|---|---|---|---|---|---|---|
| Agarwal et al. (2006) | US | Bank | 98-01 | n.a. | Retail, home equity line | 34,384 (f.s.) | 34,384 |
| Araten/Jacobs (2001) | US | Bank | 95-00 | 43% | Corporate, revolving credit | 1,021 (d.s.) | 399 |
| Asarnow/Marker (1995) | US | Capital market | 88-93 | 60% | Corporate, loan | n.a. (f.s.) | 575 (facilities) |
| Bag/Jacobs (2012) | US | Capital market | 2008 | 52.5% | Corporate, contingent credit line | n.a. (d.s.) | 26 |
| Banerjee/Canals-Cerdá (2012) | US | Credit bureau | 05-10 | n.a. | Retail, credit card | 22,265,000 (f.s.) | 2,539,000 |
| Bank for International Settlements (2016) | 17 countries | Bank | n.a. | 62%/69% | Corporate/Retail | n.a. | n.a. |
| Jacobs (2010) | US | Capital market | 85-07 | 63.7% | Corporate, lines of credit | 3,886 (d.s.) | 683 |
| Jiménez et al. (2009) | ES | Credit bureau | 84-05 | 59.6% | Corporate, credit line | 2,078,434 (f.s.) | 368,977 |
| Jiménez et al. (2008) | ES | Credit bureau | 84-05 | 59.6% | Corporate, credit line | n.a. (d.s.) | 4,094 |
| Kim (2008) | KR | Bank | 01-05 | 39% | Corporate, credit card | 197,810 (d.s.) | 197,810 |
| Leow/Crook (2016) | UK | Bank | 01-10 | n.a. | Retail, credit card | n.a. (d.s.) | 80,552 |
| Qi (2009) | US | Credit bureau | 99-06 | 166% | Retail, credit card | 152,657 (d.s.) | 152,657 |
| Taplin et al. (2007) | AU | Bank | n.a. | 25% | Corporate, credit card | n.a. | n.a. |
| Tong et al. (2016) | UK | Bank | 01-04 | 51.5% | Retail, credit card | 10,271 (d.s.) | 10,271 |
| Valvonis (2008) | EU | Bank | 05-06 | 32.4% | Retail, credit card | n.a. (d.s.) | 3,332 |
| Valvonis (2008) | EU | Bank | 05-06 | 48.0% | Corporate, credit card | n.a. (d.s.) | 44 |
| Yang/Tkachenko (2012) | CA | Bank | n.a. | n.a. | Corporate | n.a. (d.s.) | 500 |
| Zhao et al. (2011) | US | Bank | 00-10 | 49% | Corporate, credit line | n.a. (d.s.) | 7,653 |
| Zhao et al. (2011) | US | Capital market | 87-09 | 39% | Corporate, credit line | n.a. (d.s.) | 286 |

Table 2: Summary statistics

This table reports summary statistics of our data. Panel A reports the number of account-month observations and the frequency of default events. Panel B provides summary statistics of our (winsorized) variables of interest: credit conversion factor (CCF), loan equivalent factor (LEQ), exposure at default factor (EaDF), and exposure at default (EaD).

Panel A: Number of account-month observations and default events

| | Checking account |
|---|---|
| Number of account-month observations | 2,798,491 |
| Number of account-months with default in the subsequent 12 months | 27,402 |
| Number of defaults | 2,623 |

Panel B: Summary statistics of realized dependent variables

| | Mean | Std. Dev. | q25 | q50 | q75 |
|---|---|---|---|---|---|
| CCF | 0.730 | 1.674 | 0.000 | 0.084 | 1.138 |
| LEQ | 1.232 | 1.168 | 0.497 | 1.056 | 1.429 |
| EaDF | 0.849 | 0.551 | 0.373 | 0.997 | 1.125 |
| EaD | 1,394 € | 1,567 € | 271 € | 945 € | 2,031 € |

Table 3: Comparison of different parameters evaluated at the EaD level

We calculated the in-sample and out-of-sample performance based on the means of 1,000 random 10-fold cross-validations using OLS regressions (for CCF, LEQ, EaDF, and EaD) or historical averages (CCF-mean and EaD-mean). Panel A reports the in-sample performance measures of coefficient of determination (R²), relative absolute error (RAE), root mean squared error (RMSE), mean absolute error (MAE), absolute error (ABS), and the relative error (REL). The displayed values are at the EaD level, meaning that we transformed CCF, LEQ, EaDF, and CCF-mean into EaD estimates. We report standard deviations in parentheses. The best values are highlighted in bold. Panel B reports the corresponding out-of-sample performance measures.

Panel A: In-sample predictive accuracy

            R²         RAE        RMSE       MAE        ABS        REL
CCF         0.622      45.71      963.4      520.4      0.335      0.000
            (0.000)    (0.002)    (0.042)    (0.000)    (0.000)    (0.000)
LEQ         0.433      53.32      1,179      607.0      -141.3     -0.101
            (0.000)    (0.002)    (0.109)    (0.021)    (0.006)    (0.000)
EaDF        0.588      50.04      1,005      569.6      -63.81     -0.046
            (0.000)    (0.001)    (0.021)    (0.009)    (0.000)    (0.000)
EaD         0.618      48.90      968.6      556.7      0.000      0.000
            (0.000)    (0.001)    (0.014)    (0.011)    (0.000)    (0.000)
CCF-mean    0.549      52.51      1,052      597.7      163.8      0.118
            (0.000)    (0.000)    (0.007)    (0.000)    (0.002)    (0.000)
EaD-mean    0.000      100.0      1,567      1,138      0.000      0.000
            (0.000)    (0.000)    (0.005)    (0.002)    (0.000)    (0.000)

Panel B: Out-of-sample predictive accuracy

            R²         RAE        RMSE       MAE        ABS        REL
CCF         0.620      45.78      963.9      521.1      0.327      0.000
            (0.001)    (0.016)    (0.579)    (0.167)    (0.177)    (0.000)
LEQ         0.432      53.39      1,178      607.8      -141.3     -0.101
            (0.000)    (0.017)    (1.548)    (0.168)    (0.176)    (0.000)
EaDF        0.587      50.12      1,006      570.6      -63.88     -0.046
            (0.001)    (0.011)    (0.541)    (0.104)    (0.097)    (0.000)
EaD         0.616      49.00      969.9      557.8      -0.054     0.000
            (0.001)    (0.012)    (0.430)    (0.115)    (0.099)    (0.000)
CCF-mean    0.548      52.51      1,052      597.8      163.8      0.118
            (0.001)    (0.008)    (0.314)    (0.026)    (0.020)    (0.000)
EaD-mean    0.000      100.0      1,566      1,138      0.000      0.001
            (0.000)    (0.000)    (0.322)    (0.025)    (0.001)    (0.000)
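The six performance measures named in the caption of Table 3 can be sketched as follows. The formulas are assumptions inferred from the reported figures rather than definitions quoted from the paper: RAE is taken as the absolute error relative to that of a mean predictor (in percent), ABS as the mean signed error (prediction minus realization, the sign convention being a guess), and REL as ABS divided by the mean realization, which matches, for example, 163.8 / 1,394 ≈ 0.118 for CCF-mean. The function name `performance_measures` is hypothetical.

```python
import math

def performance_measures(actual, predicted):
    """R2, RAE, RMSE, MAE, ABS, REL under the assumed definitions above."""
    n = len(actual)
    mean_actual = sum(actual) / n
    errors = [p - a for a, p in zip(actual, predicted)]  # assumed sign convention
    sse = sum(e * e for e in errors)                      # squared prediction errors
    sst = sum((a - mean_actual) ** 2 for a in actual)     # total sum of squares
    sae = sum(abs(e) for e in errors)                     # absolute prediction errors
    sad = sum(abs(a - mean_actual) for a in actual)       # absolute errors of mean predictor
    bias = sum(errors) / n
    return {
        "R2": 1.0 - sse / sst,
        "RAE": 100.0 * sae / sad,
        "RMSE": math.sqrt(sse / n),
        "MAE": sae / n,
        "ABS": bias,
        "REL": bias / mean_actual,
    }
```

Under these definitions, a predictor that always returns the sample mean yields R² = 0, RAE = 100, and ABS = 0, which is the pattern reported for the EaD-mean benchmark.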

Table 4: Comparison of different parameters evaluated at the CCF level

We calculated the in-sample and out-of-sample performance based on the means of 1,000 random 10-fold cross-validations using OLS regressions (for CCF, LEQ, EaDF, and EaD) or historical averages (CCF-mean and EaD-mean). Panel A reports the in-sample performance measures of coefficient of determination (R²), relative absolute error (RAE), root mean squared error (RMSE), mean absolute error (MAE), absolute error (ABS), and the relative error (REL). The displayed values are at the CCF level, meaning that we transformed LEQ, EaDF, EaD, and EaD-mean into CCF estimates. We report standard deviations in parentheses. The best values are highlighted in bold. Panel B reports the corresponding out-of-sample performance measures.

Panel A: In-sample predictive accuracy

            R²                 RAE         RMSE       MAE        ABS        REL
CCF         0.193              78.56       1.504      0.907      0.000      0.000
            (0.000)            (0.001)     (0.000)    (0.000)    (0.000)    (0.000)
LEQ         -9,132             549.1       477.3      11.39      2.498      2.417
            (9,221)            (1.089)     (2.733)    (0.023)    (0.022)    (0.021)
EaDF        -13,837            464.8       195.3      5.367      0.028      0.038
            (115.62)           (0.802)     (1.158)    (0.009)    (0.009)    (0.013)
EaD         -1,904,017         7,241       2,304      83.61      80.13      109.9
            (6,053.13)         (7.833)     (1.758)    (0.090)    (0.090)    (0.124)
CCF-mean    0.000              100.0       1.674      1.155      0.000      0.000
            (0.000)            (0.000)     (0.000)    (0.000)    (0.000)    (0.000)
EaD-mean    -9,108,667         1,747       5,050      201.7      182.1      249.6
            (943,113)          (0.893)     (0.922)    (0.008)    (0.008)    (0.021)

Panel B: Out-of-sample predictive accuracy

            R²                 RAE         RMSE       MAE        ABS        REL
CCF         0.189              78.74       1.507      0.909      0.000      0.002
            (0.000)            (0.017)     (0.000)    (0.000)    (0.000)    (0.001)
LEQ         -9,501             553.2       323.1      11.50      2.419      2.343
            (1,136,440)        (11.55)     (23.42)    (0.209)    (0.200)    (0.325)
EaDF        -14,256            468.1       130.3      5.408      -0.000     0.005
            (1,151)            (7.642)     (8.578)    (0.085)    (0.084)    (0.134)
EaD         -1,919,390         7,259       2,187      83.81      80.29      110.2
            (51,143)           (86.64)     (61.26)    (0.986)    (0.984)    (1.568)
CCF-mean    0.000              100.0       1.674      1.155      0.000      0.002
            (0.000)            (0.000)     (0.000)    (0.000)    (0.000)    (0.001)
EaD-mean    -9,115,929         17,470      4,851      201.8      182.1      249.9
            (3,777,000,000)    (49.464)    (111.3)    (0.069)    (0.069)    (1.809)


Table 5: Comparison of different approaches for CCF estimation at the CCF level

Panel A: In-sample predictive accuracy CCF: standard cohort

R2 0.175 0.179 0.184

RAE 78.41 79.68 82.03

RMSE 1.512 1.517 1.522

MAE 0.915 0.920 0.929

ABS -0.061 -0.001 0.040

REL -0.083 -0.001 0.054

(0.000)

(0.013)

(0.000)

(0.000)

(0.000)

(0.000)

CCF: generalized cohort
CCF: variable
CCF: fixed

(0.000)

(0.020)

0.206

78.52

(0.000)

(0.004)

0.193

78.56

(0.000)

(0.001)

0.941

5.727

(0.000)

(0.001)

Panel B: Out-of-sample predictive accuracy

(0.000)

(0.000)

(0.000)

(0.000)

1.491

0.907

0.000

0.000

(0.000)

(0.000)

(0.000)

(0.000)

1.504

0.907

0.000

0.000

(0.000)

(0.000)

(0.000)

(0.000)

1.617

1.080

0.000

0.000

(0.000)

(0.000)

(0.000)

(0.000)

RAE 78.58 79.87 82.22

RMSE 1.515 1.521 1.526

MAE 0.917 0.923 0.931

ABS -0.061 -0.001 0.040

REL -0.082 0.000 0.056

(0.001)

(0.043)

(0.001)

(0.001)

(0.001)

(0.001)

CCF: generalized cohort
CCF: variable
CCF: fixed

(0.001)

(0.054)

0.159

80.86

(0.001)

(0.059)

0.189

78.74

(0.000)

(0.017)

0.937

(0.001)


(0.001)

(0.001)

(0.001)

1.535

0.934

-0.001

0.000

(0.001)

(0.001)

(0.001)

(0.001)

1.507

0.909

0.000

0.002

(0.000)

(0.000)

(0.000)

(0.001)

5.923

1.669

1.117

-0.002

0.000

(0.019)

(0.007)

(0.004)

(0.004)

(0.000)


(0.001)


R2 0.171 0.175 0.180


CCF: standard cohort


We calculated the in-sample and out-of-sample performance based on the means of 1,000 random 10-fold cross-validations using OLS regressions. Panel A reports the in-sample performance measures of coefficient of determination (R2), relative absolute error (RAE), root mean squared error (RMSE), mean absolute error (MAE), absolute error (ABS), and the relative error (REL). The displayed values are at the CCF level using different estimation approaches. For the standard cohort approach we present the minimum, mean, and maximum of the 12 different reference months. We report standard deviations in parentheses. Panel B reports the corresponding out-of-sample performance measures.
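The evaluation design described in the caption (means over repeated random 10-fold cross-validations of OLS regressions) can be illustrated with the sketch below. It is a simplification under stated assumptions: a single regressor instead of the paper's covariates, RMSE as the only measure, in-sample errors averaged over folds, and out-of-sample errors pooled over all held-out observations. How the paper aggregates across folds is not stated in this section, and all function names are hypothetical.

```python
import random

def fit_ols(x, y):
    """Closed-form OLS for y = a + b * x (single regressor for brevity)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def rmse(a, b, x, y):
    """Root mean squared error of the fitted line on (x, y)."""
    return (sum((a + b * xi - yi) ** 2 for xi, yi in zip(x, y)) / len(x)) ** 0.5

def repeated_kfold_rmse(x, y, k=10, repetitions=1000, seed=0):
    """Mean in-sample and out-of-sample RMSE over repeated random k-fold CV."""
    rng = random.Random(seed)
    idx = list(range(len(x)))
    in_sample, out_sample = [], []
    for _ in range(repetitions):
        rng.shuffle(idx)                          # new random partition each repetition
        folds = [idx[i::k] for i in range(k)]
        fold_in, sq_err_out = [], 0.0
        for fold in folds:
            held = set(fold)
            train = [i for i in idx if i not in held]
            xt, yt = [x[i] for i in train], [y[i] for i in train]
            a, b = fit_ols(xt, yt)
            fold_in.append(rmse(a, b, xt, yt))    # error on the training folds
            sq_err_out += sum((a + b * x[i] - y[i]) ** 2 for i in fold)
        in_sample.append(sum(fold_in) / k)
        out_sample.append((sq_err_out / len(x)) ** 0.5)
    return sum(in_sample) / repetitions, sum(out_sample) / repetitions
```

The standard deviations reported in parentheses would correspond to the dispersion of the per-repetition values (`in_sample` and `out_sample`) rather than to their means.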


Table 6: Summary statistics for CCF, EaD, and PD

The table displays historically estimated means for CCF, EaD, and PD using different estimation approaches. For the standard cohort approach we present the minimum, mean, and maximum of the 12 different reference months.

        Standard cohort                    Generalized cohort    Variable    Fixed
        min        mean       max
CCF     0.648      0.746      0.824        0.745                 0.745       0.880
EaD     1,379 €    1,407 €    1,440 €      1,407 €               1,407 €     1,410 €
PD      0.0110     0.0115     0.0119       0.0115                0.0115      0.0008

Table 7: Comparison of different approaches for CCF estimation at the EL level based on historical averages

The table displays different estimates of the EL calculated from CCF and PD estimates that are based on different approaches. For estimation of CCF and PD, we implemented the historical averages based on the different approaches (standard cohort, generalized cohort, variable-horizon, and fixed-horizon approach). For the cohort approach, we present the minimum, mean, and maximum of the 12 different reference months. Without loss of generality, we assume that LGD = 1.

PD-mean: standard cohort (min, mean, max)

CCF-mean                       R²        RAE       RMSE      MAE       ABS       REL
Standard cohort       min      -0.000    104.3     225.6     33.27     1.385     0.086
                      mean     0.002     114.4     225.9     36.50     4.643     0.288
                      max      0.005     123.1     226.1     39.28     7.449     0.462
Generalized cohort    min      0.002     111.6     225.8     35.60     3.726     0.231
                      mean     0.002     114.3     225.8     36.46     4.600     0.285
                      max      0.002     116.7     225.9     37.25     5.402     0.335
Variable              min      0.002     111.6     225.8     35.61     3.733     0.232
                      mean     0.002     114.3     225.8     36.47     4.607     0.286
                      max      0.002     116.8     225.9     37.25     5.410     0.336
Fixed                 min      -0.002    121.7     226.2     38.82     6.962     0.432
                      mean     -0.002    124.8     226.3     39.82     7.978     0.495
                      max      -0.001    127.7     226.3     40.74     8.911     0.553

PD-mean: generalized cohort

CCF-mean                       R²        RAE       RMSE      MAE       ABS       REL
Standard cohort       min      0.000     106.6     225.6     34.00     2.130     0.132
                      mean     0.002     114.2     225.8     36.45     4.587     0.285
                      max      0.005     120.3     226.1     38.39     6.538     0.406
Generalized cohort             0.002     114.2     225.8     36.44     4.584     0.284
Variable                       0.002     114.2     225.8     36.44     4.578     0.284
Fixed                          -0.002    124.7     226.3     39.79     7.944     0.493

PD-mean: variable

CCF-mean                       R²        RAE       RMSE      MAE       ABS       REL
Standard cohort       min      0.000     106.56    225.6     34.00     2.128     0.132
                      mean     0.002     114.2     225.8     36.44     4.584     0.284
                      max      0.005     120.3     226.1     38.39     6.535     0.406
Generalized cohort             0.002     114.2     225.8     36.43     4.568     0.283
Variable                       0.002     114.2     225.8     36.43     4.575     0.284
Fixed                          -0.002    124.7     226.3     39.78     7.940     0.493

PD-mean: fixed

CCF-mean                       R²        RAE       RMSE      MAE       ABS       REL
Standard cohort       min      -0.004    54.32     226.5     17.33     -14.88    -0.923
                      mean     -0.004    54.84     226.5     17.50     -14.71    -0.913
                      max      -0.004    55.25     226.5     17.63     -14.58    -0.905
Generalized cohort             -0.004    54.84     226.5     17.50     -14.71    -0.913
Variable                       -0.004    54.84     226.5     17.50     -14.71    -0.913
Fixed                          -0.004    55.55     226.5     17.72     -14.48    -0.899
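As stated in the caption of Table 7, the EL estimates combine a CCF estimate and a PD estimate under the assumption LGD = 1. A minimal sketch of that combination for a single account follows. It assumes the common definition CCF = (EaD - drawn) / (limit - drawn), which is not restated in this section, and the account data (drawn amount, limit) as well as the function names are purely hypothetical.

```python
def ead_from_ccf(ccf, drawn, limit):
    """EaD implied by a CCF estimate, assuming CCF = (EaD - drawn) / (limit - drawn)."""
    return drawn + ccf * (limit - drawn)

def expected_loss(pd_hat, ccf_hat, drawn, limit, lgd=1.0):
    """EL = PD * LGD * EaD, with LGD = 1 by default as in the table captions."""
    return pd_hat * lgd * ead_from_ccf(ccf_hat, drawn, limit)

# Hypothetical account: 600 EUR drawn on a 1,000 EUR limit, combined with the
# generalized cohort means of Table 6 (CCF = 0.745, PD = 0.0115).
el_example = expected_loss(pd_hat=0.0115, ccf_hat=0.745, drawn=600.0, limit=1000.0)
```

For this hypothetical account the implied EaD is about 898 EUR and the EL about 10.33 EUR. Because EL is a product, a bias in either the CCF or the PD estimate carries over to the EL.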

Table 8: Comparison of different approaches for CCF estimation at the EL level based on regression models

The table displays different estimates of the EL calculated from CCF and PD estimates that are based on different approaches. For estimation of CCF and PD, we implemented OLS regressions and probit regressions, respectively, based on the different approaches (standard cohort, generalized cohort, variable-horizon, and fixed-horizon approach). For the cohort approach, we present the minimum, mean, and maximum of the 12 different reference months. Without loss of generality, we assume that LGD = 1. The best values for EL estimates regarding defaults within one year are highlighted in bold.

PD: standard cohort (min, mean, max)

CCF                            R²        RAE       RMSE      MAE       ABS       REL
Standard cohort       min      0.098     85.07     214.0     27.14     -1.300    -0.081
                      mean     0.101     87.56     214.4     27.94     -0.406    -0.025
                      max      0.104     89.40     214.7     28.52     0.332     0.021
Generalized cohort    min      0.100     86.31     214.2     27.54     -0.861    -0.053
                      mean     0.101     87.55     214.4     27.94     -0.408    -0.025
                      max      0.103     88.65     214.5     28.29     0.079     0.005
Variable              min      0.100     86.20     214.1     27.50     -0.895    -0.056
                      mean     0.101     87.45     214.3     27.90     -0.440    -0.027
                      max      0.103     88.54     214.5     28.25     0.041     0.003
Fixed                 min      0.098     89.01     214.4     28.40     0.089     0.006
                      mean     0.099     90.33     214.6     28.82     0.568     0.035
                      max      0.101     91.76     214.7     29.28     1.169     0.073

PD: generalized cohort

CCF                            R²        RAE       RMSE      MAE       ABS       REL
Standard cohort       min      0.100     86.31     214.1     27.54     -0.837    -0.052
                      mean     0.102     87.48     214.3     27.91     -0.422    -0.026
                      max      0.103     88.41     214.2     28.21     -0.106    -0.007
Generalized cohort             0.102     87.46     214.3     27.91     -0.430    -0.027
Variable                       0.102     87.37     214.2     27.88     -0.457    -0.029
Fixed                          0.100     90.24     214.5     28.79     0.550     0.034

PD: variable

CCF                            R²        RAE       RMSE      MAE       ABS       REL
Standard cohort       min      0.100     86.32     214.2     27.54     -0.850    -0.053
                      mean     0.101     87.50     214.3     27.92     -0.433    -0.027
                      max      0.102     88.43     214.5     28.21     -0.117    -0.007
Generalized cohort             0.101     87.48     214.3     27.91     -0.438    -0.027
Variable                       0.102     87.39     214.3     27.88     -0.469    -0.029
Fixed                          0.099     90.27     214.6     28.80     0.543     0.034

PD: fixed

CCF                            R²        RAE       RMSE      MAE       ABS       REL
Standard cohort       min      0.006     52.97     225.4     16.90     -15.12    -0.938
                      mean     0.006     53.04     225.4     16.93     -15.09    -0.936
                      max      0.006     53.12     225.5     16.95     -15.07    -0.935
Generalized cohort             0.006     53.04     225.4     16.93     -15.09    -0.936
Variable                       0.006     53.03     225.4     16.92     -15.09    -0.937
Fixed                          0.006     53.25     225.4     16.99     -15.02    -0.932