Accepted Manuscript
Exposure at Default Modeling – A Theoretical and Empirical Assessment of Estimation Approaches and Parameter Choice
Marc Gürtler, Martin Thomas Hibbeln, Piet Usselmann
PII: S0378-4266(17)30054-7
DOI: 10.1016/j.jbankfin.2017.03.004
Reference: JBF 5107
To appear in: Journal of Banking and Finance
Received date: 14 January 2016
Revised date: 17 February 2017
Accepted date: 3 March 2017
Please cite this article as: Marc Gürtler, Martin Thomas Hibbeln, Piet Usselmann, Exposure at Default Modeling – A Theoretical and Empirical Assessment of Estimation Approaches and Parameter Choice, Journal of Banking and Finance (2017), doi: 10.1016/j.jbankfin.2017.03.004
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Exposure at Default Modeling – A Theoretical and Empirical Assessment of Estimation Approaches and Parameter Choice
Marc Gürtler a, Martin Thomas Hibbeln b, Piet Usselmann a
a University of Braunschweig - Institute of Technology, Germany
b University of Duisburg-Essen, Germany
Abstract
Estimating the credit risk parameter exposure at default is important for banks from an internal risk management and a regulatory perspective. Several approaches are common in the literature and in practice. We theoretically and empirically analyze how the exposure at default should be modeled to obtain accurate estimates of the expected loss. Our empirical analysis is based on a large and unique dataset from a retail portfolio of a European bank. We demonstrate that some approaches can lead to substantially biased estimates of the expected loss and show that the generalized cohort approach is advantageous. Moreover, using in- and out-of-sample analyses, we empirically demonstrate that using the credit conversion factor is preferable to the loan equivalent factor, exposure at default factor, and direct exposure at default estimation to achieve high estimation accuracy.
Keywords: Credit risk, checking accounts, exposure at default, credit conversion factor, probability of default
JEL classification: G21, G28
Corresponding author: Piet Usselmann; Technische Universität Braunschweig; Abt-Jerusalem-Str. 7, 38106 Braunschweig, Germany; Phone: +49 531 391 2894; E-mail: [email protected].
1. Introduction
When estimating the risk related to a credit product, banks typically model the risk parameters probability of default (PD), loss given default (LGD), and exposure at default (EaD) (or the credit conversion factor (CCF)) separately.1 In contrast to the credit risk parameters PD and
LGD, few papers have theoretically or empirically analyzed the modeling of EaD or CCF, although modeling these parameters is important for banks from an internal risk management and a regulatory perspective. On the one hand, CCF is needed to fulfill regulatory requirements regarding the internal ratings-based (IRB) approach of Basel III. Furthermore,
CCF modeling can also be required for estimating the lifetime expected loss, which is mandated by the new International Financial Reporting Standard 9 (IFRS 9). On the other hand, an unbiased EaD estimation is important from an internal risk management perspective regarding risk-based pricing, limit management, or economic capital calculations to control risk
and to obtain an advantage over competitors.
In this context, products with time-varying exposure are of particular interest. Whereas most previous studies regarding CCF address products for corporate customers, we analyze
retail checking accounts with lines of credit, which are a widespread product in many countries. In 2013, more than 98 million checking accounts existed in Germany and more than 99% of
the main payment instruments (measured by frequency of transactions) were credit transfers,
direct debits, electronic cards, and credit cards. Similar numbers can be observed for most
countries that participate in the Committee on Payments and Market Infrastructures (CPMI) (Bank for International Settlements 2014). In September 2015, checking account customers in Germany made use of nearly 36 billion euros in overdraft facilities (Deutsche Bundesbank 2015). Both numbers show the high qualitative and quantitative relevance of this product.

1 These risk parameters can be defined as follows: The PD is the probability of default of a counterparty, typically over a one-year period. The LGD is the (expected) ratio of the loss on an exposure due to the default of a counterparty to the amount outstanding at default. The EaD is the (expected) amount outstanding at default. The CCF is the proportion of the currently undrawn amount of the commitment (the open limit) that is expected to be drawn down at default. The expected loss EL can be determined as the product of PD, LGD and EaD. For an overview of credit risk measurement see Bluhm et al. (2010).

In our theoretical analysis, we discuss three common modeling approaches for CCF,
namely the fixed-horizon, the variable-horizon, and the (standard) cohort approach. Particularly, we demonstrate the conditions under which the resulting CCF and EL estimates are unbiased. In addition, we propose a generalized cohort approach, which can yield unbiased CCF estimates even if these assumptions are not fulfilled. Based on these CCF
estimates, the EaD and, ultimately, the EL are calculated, such that a biased CCF generally results in biased EL estimates.
In our empirical assessment of EaD, we not only consider an estimation based on CCF because the literature suggests various parameters for estimating the EaD: the CCF, the loan
equivalent factor (LEQ), the exposure at default factor (EaDF), or direct estimates of EaD.
Also in banking practice, there is no consensus about the parameter on which EaD estimates should be based (Bank for International Settlements 2016). For this reason, we first analyze
which of these parameters is superior for EaD modeling based on in- and out-of-sample estimations. In this context, we use k-fold cross-validation techniques and transform the
resulting estimates to an identical basis to allow for an appropriate comparison of these
different parameters. We evaluate the different models based on several predictive accuracy measures. We find that it is beneficial to estimate the CCF instead of the alternative parameters to achieve high accuracy. Furthermore, our empirical results confirm the relevance of the bias discussed in the theoretical analysis. In particular, a combination of the common PD estimate based on a variable-horizon approach and the CCF estimate based on a fixed-horizon approach can lead to an overestimation of the expected loss. However, from a regulatory perspective, this estimation is less problematic because the estimate is rather conservative. Finally, we can relax some assumptions of our theoretical model. We find that if the CCF is estimated conditional on some independent variables x instead of using simple historical averages, the predictions are not only improved because we account for the variance in the CCF estimates but also because of a substantial decline in the abovementioned bias.

Our paper contributes to the banking and finance literature in several ways. First, we
provide the first systematic and extensive overview of the existing literature on CCF modeling. Second, we present the assumptions under which the different approaches for modeling EaD can be transferred into one another and consistent EaD or CCF estimates can be obtained, whereas the previous literature on EaD or CCF modeling primarily addresses the
advantages and disadvantages of these approaches from a practical perspective. Moreover, we propose a new generalized cohort approach to achieve unbiased estimates. Third, based on a large dataset, we empirically identify which of the various parameters (CCF, LEQ, EaDF, EaD) should be used for EaD modeling. Furthermore, we demonstrate the existence of relevant
interactions between modeling PD and EaD.
The remainder of the paper is organized as follows. In Section 2, we provide a systematic overview of the existing literature on CCF modeling. In Section 3, we describe our
model and present our theoretical results. In Section 4, we report the empirical analysis, and
Section 5 concludes.
2. Literature Review
Despite the high relevance of modeling EaD or CCF, the literature on this topic is rather scarce. We present the first systematic and extensive overview of the literature addressing this topic. Table 1 summarizes our review of the literature.
The existing studies consider different periods and countries, but most analyses are based on US data. Regardless of the product, the average realized CCF values are typically between 30% and 60%. Hence, most borrowers do not fully use their limit in the event of default. However, we find evidence that the CCF value strongly depends on the product, data, and empirical strategy used. Most studies on CCF address products for corporate customers whereas few focus on retail customers. Moreover, the previous literature on retail customers focuses primarily on modeling CCF (EaD) for credit cards. Most studies discuss which factors
might influence CCF values in a univariate or multivariate setting (e.g., Araten and Jacobs 2001, Jiménez et al. 2008, Qi 2009, Jacobs 2010, Zhao et al. 2011, and Leow and Crook 2016). Especially for corporate customers, the most relevant factors are time-to-default and borrower risk. For example, Agarwal et al. (2006) find that a decrease in credit quality results
in a significant increase in credit line utilization.
There are also some other streams of CCF-related literature. Some studies directly model exposure at default instead of CCF (e.g., Leow and Crook 2016), whereas others model the exposure (which is not necessarily conditional on default) or credit line usage and credit
line usage at default (e.g., Jiménez et al. 2009, Sufi 2009, Hibbeln et al. 2015, or Hon and
Bellotti 2016). Models regarding LEQ or EaDF are discussed, e.g., in Jacobs (2010), Moral (2011), or Leow and Crook (2016). The studies by Hon and Bellotti (2016) and Tong et al.
(2016) discuss a variety of other sophisticated statistical regression models for modeling EaD, CCF, LEQ or EaDF.
Another stream of the literature discusses different theoretical topics in EaD/CCF
modeling. Moral (2011) analyzes various methods for estimating EaD. Bag and Jacobs (2011) compare the empirical results of selected papers, and Hahn and Reitz (2011) present possible approaches for estimating exposure. Bag and Jacobs (2012) present an algorithm-based method to determine EaD. Finally, the Bank for International Settlements (2016) published the findings of a survey in 2014 among 37 banks from 17 countries. They find widely varying realized CCFs in banks due to different estimation approaches (i.e., the fixed-horizon, variable-horizon, or standard cohort approach), estimators other than the mean, or data cleaning processes. Moreover, they notice that EaD estimates within banks are based on different risk parameters, where the EaDF and CCF approaches are most common.
3. Theoretical analysis of CCF forecasts
3.1. Variable description
To define the CCF in a manner consistent with regulatory guidelines, we first need to introduce a model environment. Let $B_t$ denote the balance at time t of the account under consideration, and let $L_t$ stand for the limit advised at time t. In addition, $\delta_t \in \{0, 1\}$ defines an indicator variable that describes a default of the owner of the account for different reasons at time t if and only if $\delta_t = 1$. Examples of such reasons are, first, exogenous signals from an external rating agency or from a credit bureau or defaults of other credit products by the same customer.2 Second, a default occurs at time t after being past due for more than 90 days ($B_\tau < -L_\tau$ for $\tau \in [t-90, t]$).3 Against this background, we define the exposure at time t as $e_t := -\min\{B_t, 0\}$ and the default time as $d := \min\{t \mid \delta_t = 1\}$. On this basis we are able to define the exposure at default ($EaD_{t,T}$) at a future time T from the perspective of time t as the expected exposure at default time d = T, where the expectation value $E_t$ is determined at time t < T, i.e., $EaD_{t,T} := E_t(e_d \mid d = T)$.
2 It is common bank practice in retail business and in line with Basel III and EU Regulation No. 575 to assign default status to the specific account and not to the client.
3 Usually, the negative balance has to stay below the authorized limit. An overdraft of an account can, however, occur by using the checking account offline, interest on debit balances charged by the bank, or a manual approval by a loan officer.

On the basis of the EaD and for the purposes of Basel III and EU Regulation No. 575, the conversion factor is defined as “the ratio of the currently undrawn amount of a commitment that could be drawn and that would therefore be outstanding at default to the currently undrawn amount of the commitment (the extent of the commitment being determined by the advised limit, unless the unadvised limit is higher)” (EU Regulation No. 575).4 Hence, we define the expected credit conversion factor ($CCF_{t,T}$) for a default of a credit product at time T from the perspective of time t as
$$CCF_{t,T} = \begin{cases} \dfrac{EaD_{t,T} - e_t}{L_t - e_t}, & \text{if } L_t - e_t > 0, \\ 0, & \text{else.} \end{cases} \tag{1}$$
To determine the expected loss (EL) of a checking account, two further risk parameters are relevant, the PD and the LGD. On the one hand, we define $PD_{t,T} = P_t(d = T)$ as the probability of default at time T from the perspective of time t.5 On the other hand, we consider $PD^{cum}_{t,T} = P_t(d \in \{t+1, \ldots, T\})$ as the probability of default in the period between t+1 and T. Furthermore, we define the share of the exposure that is lost if the borrower defaults as

$$\lambda_t := \begin{cases} \text{loss ratio of the exposure}, & \text{if } d = t, \\ 0, & \text{else.} \end{cases} \tag{2}$$

Consequently, $\lambda_t \cdot e_t$ stands for the absolute loss at time t, and the expected loss given default is defined as $LGD_{t,T} = E_t(\lambda_d \mid d = T)$.
3.2. The model
From the perspective of both risk management and banking regulation (with respect to Basel
III), the EL is of particular importance. In this subsection, we theoretically demonstrate how
CCF influences EL and which methods of CCF determination are adequate to obtain an unbiased EL-estimation method.
4 This definition is only valid for accounts where a part of the credit line is currently undrawn, which is the standard state. If the current exposure reaches or exceeds the limit ($e_t \geq L_t$), it is consistent to define the CCF as 0. This is common in practice and in the literature (e.g., Bank for International Settlements 2016 or Leow and Crook 2016).
5 $P_t$ denotes the probability from the perspective of time t.
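Assuming that $e_d \mid (d = \tau)$ and $\lambda_d \mid (d = \tau)$ are independent for all $\tau > t$, the EL for potential defaults in the period between t and T can be determined as follows

$$\begin{aligned}
EL_{t,T} &= E_t\Big(\sum_{\tau=t+1}^{T} \lambda_\tau \cdot e_\tau\Big) = \sum_{\tau=t+1}^{T} E_t(\lambda_d \cdot e_d \mid d = \tau) \cdot P_t(d = \tau) \\
&= \sum_{\tau=t+1}^{T} E_t(\lambda_d \mid d = \tau) \cdot E_t(e_d \mid d = \tau) \cdot P_t(d = \tau) \\
&= \sum_{\tau=t+1}^{T} LGD_{t,\tau} \cdot EaD_{t,\tau} \cdot PD_{t,\tau}.
\end{aligned} \tag{3}$$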
For simplification, we assume $LGD_{t,\tau_1} = LGD_{t,\tau_2} =: LGD_t$ for all $\tau_1, \tau_2$, i.e., the share of the exposure that is lost if the borrower defaults is independent of the time of default.6 Under this assumption the EL simplifies to

$$EL_{t,T} = LGD_t \cdot \sum_{\tau=t+1}^{T} EaD_{t,\tau} \cdot PD_{t,\tau} = LGD_t \cdot \sum_{\tau=t+1}^{T} \big(e_t + CCF_{t,\tau} \cdot (L_t - e_t)\big) \cdot PD_{t,\tau}. \tag{4}$$
We call this approach the “generalized cohort approach” because it can be applied at an arbitrary point in time t. However, in practice, the parameters $CCF_{t,\tau}$ are typically estimated on the basis of a specific point in time, e.g., t = “January 1st,” and these estimations are applied to arbitrary points in time t. This so-called “cohort approach” (below, we use the designation “standard cohort approach” to avoid confusion with the “generalized cohort approach”) implicitly assumes that two CCF values coincide if the corresponding residual lifetimes $\tau - t$ are the same, i.e., $CCF_{t,\tau} = CCF_{t+c,\tau+c}$ for an arbitrary c. In the empirical section, we analyze this assumption. However, if this assumption proves false, the generalized cohort
approach (4) should be applied for each considered time t. Independent of the empirical result, in the following, we demonstrate the implications of this and further assumptions. Specifically, we use the following assumptions:
(A1) $CCF_{t,\tau} = CCF_{T-\tau+t,T}$ (for all $t = 1, \ldots, T-1$; $t < \tau \leq T$),
(A2) $PD_{t,\tau} = PD_{T-\tau+t,T}$ (for all $t = 1, \ldots, T-1$; $t < \tau \leq T$),
(A3) $E_t(e_d \mid d = T) = E(e_d \mid d = T)$ is independent of t,
(A4) $e_t < L_t$ (for all $t = 1, \ldots, T-1$),
(A5) $\dfrac{\partial e_t / \partial t}{E(e_d \mid d = T) - e_t} > -\dfrac{\partial L_t / \partial t}{L_t - E(e_d \mid d = T)}$.

6 Discussions with our data provider confirmed the validity of this assumption for our dataset.
The first two assumptions concern the independence of both the $CCF_{t,\tau}$ and $PD_{t,\tau}$ of the
abovementioned residual lifetime of an account. According to the assumption (A3), the determination of an expectation value is independent of the point in time t. Assumption (A4) requires that the limit is not fully drawn at t. Assumption (A5) requires that the “velocity” of a potential limit reduction by the bank is relatively lower than the “velocity” of a potential
exposure increase of the account owner. This assumption is satisfied if, e.g., the exposure is strictly increasing and the limit is constant. On the basis of assumptions (A1) and (A2), the EL can be modified
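$$\begin{aligned}
EL_{t,T} &= LGD_t \cdot \sum_{\tau=t+1}^{T} \big(CCF_{t,\tau} \cdot (L_t - e_t) + e_t\big) \cdot PD_{t,\tau} \\
&= LGD_t \cdot \sum_{\tau=t+1}^{T} \big(CCF_{T-\tau+t,T} \cdot (L_t - e_t) + e_t\big) \cdot PD_{T-\tau+t,T} \\
&= LGD_t \cdot \sum_{\tau=t}^{T-1} \big(CCF_{\tau,T} \cdot (L_t - e_t) + e_t\big) \cdot PD_{\tau,T}.
\end{aligned} \tag{5}$$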
Particularly, the CCF does not have to be calculated for the period [t, τ] but for [τ, T]. This
approach is called the “variable-horizon approach”. Again, it should be emphasized that this approach requires that assumptions (A1) and (A2) be valid, which is tested in the empirical
section. Furthermore, we present a proposition that reveals the potential bias of an additional approach – the fixed-horizon approach – that is also used in practice.7
Proposition
Let assumptions (A1) – (A4) be valid. Then, the following statements result:
1) $CCF_{t,T}$ is strictly decreasing in t.
2) $EL_{t,T} < LGD_t \cdot \big(CCF_{t,T} \cdot (L_t - e_t) + e_t\big) \cdot PD^{cum}_{t,T}$.

7 The proof is presented in the Appendix.

The right-hand side of part 2) is called the “fixed-horizon approach” to estimate the EL. Consequently, this approach overestimates the EL. However, the fixed-horizon approach
could still be used for regulatory purposes because it can be interpreted as a conservative approach.
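
The size of this overestimation can be illustrated numerically. The following Python sketch evaluates the variable-horizon sum in (5) and the fixed-horizon expression on the right-hand side of part 2) for a single account. The function names and all numbers (limit, exposure, LGD, the monthly CCF and PD profiles) are hypothetical and are chosen only so that the CCF decreases in the reference time, as in part 1); they are not taken from the dataset.

```python
def el_variable_horizon(lgd, exposure, limit, ccf_to_T, pd_to_T):
    """Equation (5): LGD_t * sum over tau of (CCF_{tau,T} * (L_t - e_t) + e_t) * PD_{tau,T}.

    ccf_to_T[i] and pd_to_T[i] hold CCF_{tau,T} and PD_{tau,T} for tau = t + i.
    """
    if len(ccf_to_T) != len(pd_to_T):
        raise ValueError("CCF and PD profiles must have the same length")
    open_limit = limit - exposure
    return lgd * sum((ccf * open_limit + exposure) * pd
                     for ccf, pd in zip(ccf_to_T, pd_to_T))


def el_fixed_horizon(lgd, exposure, limit, ccf_t_T, pd_cum):
    """Right-hand side of part 2): LGD_t * (CCF_{t,T} * (L_t - e_t) + e_t) * PD^cum_{t,T}."""
    return lgd * (ccf_t_T * (limit - exposure) + exposure) * pd_cum


# Hypothetical account: limit 1,000, exposure 400, LGD 50%, horizon of 12 months.
ccf_profile = [0.90 - 0.05 * i for i in range(12)]  # strictly decreasing in tau
pd_profile = [0.002] * 12                           # flat monthly default probability
pd_cum = sum(pd_profile)                            # cumulative PD implied by (A2)

el_var = el_variable_horizon(0.5, 400.0, 1000.0, ccf_profile, pd_profile)
el_fix = el_fixed_horizon(0.5, 400.0, 1000.0, ccf_profile[0], pd_cum)
print(f"variable horizon: {el_var:.2f}, fixed horizon: {el_fix:.2f}")
```

With these illustrative inputs, the fixed-horizon value (about 11.28) exceeds the variable-horizon value (about 9.30), because the fixed-horizon expression applies the largest CCF of the profile to the entire cumulative PD.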
3.3. Overview of CCF approaches
In practice, the four approaches discussed in the previous section differ in the (ex post) calculation of parameters regarding the reference point t. By using the fixed-horizon approach, the exposure at default is linked to the balance and limit at the date of reference (reference point t) exactly one year prior to default (fixed time). The variable-horizon
approach is a generalization of the fixed-horizon approach. The year before default is
subdivided into several time windows with several dates of reference (e.g., each month). The exposure at default is linked to the balance and limit of each date of reference. In contrast, the
observation period for the standard cohort approach consists of general dates of reference (e.g., the first month of each year) by subdividing the period into one-year time windows. The
exposure at default is linked to the balance and limit at the starting point of the corresponding
time window. The proposed generalized cohort approach, however, can be applied at an arbitrary point in time t. The observation period for the generalized cohort approach consists of general dates of reference. In contrast to the standard cohort approach, each month of the year is a reference point. With respect to (A1), we estimate means for each month, and if these means were default weighted, the result would be identical to the variable-horizon approach mean.
For each of these CCF approaches, a simple (ex ante) forecast could be implemented by calculating (e.g., at product level) the historical (weighted) average CCF values. Moreover, based on each of these approaches, more sophisticated CCF regression models can be developed to estimate individual CCF forecasts. Implementing the generalized cohort approach means that not one but multiple estimators will be calculated, e.g., based on
regression models referring to a specific reference month.
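
How the four approaches select reference points for a given default can be sketched in Python as follows. This is only an illustration under stated assumptions: months are represented by integer indices with index 0 taken to be a January, the variable-horizon approach is assumed to use each of the 12 months before default, and the names `reference_points` and `group_by_reference_month` are hypothetical.

```python
from collections import defaultdict

HORIZON = 12  # months in the one-year window


def reference_points(default_month, approach, cohort_start=0):
    """Return the reference months t linked to a default in month d.

    Months are integer indices; index 0 is assumed to be a January.
    """
    if not 0 <= cohort_start < HORIZON:
        raise ValueError("cohort_start must lie in 0..11")
    d = default_month
    if approach == "fixed":
        # exactly one year before default
        return [d - HORIZON]
    if approach in ("variable", "generalized_cohort"):
        # every month of the year before default
        return list(range(d - HORIZON, d))
    if approach == "standard_cohort":
        # start of the one-year window (t, t + 12] that contains d
        t = d - 1
        while t % HORIZON != cohort_start:
            t -= 1
        return [t]
    raise ValueError(f"unknown approach: {approach}")


def group_by_reference_month(points):
    """Generalized cohort: one estimator per calendar month of the reference point."""
    groups = defaultdict(list)
    for t in points:
        groups[t % HORIZON].append(t)
    return dict(groups)


d = 17  # hypothetical default in June of the second year
for name in ("fixed", "variable", "standard_cohort", "generalized_cohort"):
    print(name, reference_points(d, name))
```

In this sketch the generalized cohort approach uses the same reference points as the variable-horizon approach but keeps them separated by calendar month, which mirrors the statement that default-weighted monthly means would reproduce the variable-horizon mean.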
4. Empirical analysis
4.1. Variable description
The CCF is defined according to (1). We can transform the CCF into EaD estimates as follows
$$EaD_{t,T} = e_t + CCF_{t,T} \cdot (L_t - e_t). \tag{6}$$
As an alternative to the CCF, the estimation of EaD can be based on the LEQ, which is defined as

$$LEQ_{t,T} = \begin{cases} \dfrac{EaD_{t,T}}{e_t}, & \text{if } e_t > 0, \\ 0, & \text{else.} \end{cases} \tag{7}$$
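The LEQ can also be transformed into EaD and CCF

$$EaD_{t,T} = e_t \cdot LEQ_{t,T}, \quad \text{and} \quad CCF_{t,T} = \begin{cases} \dfrac{e_t \cdot (LEQ_{t,T} - 1)}{L_t - e_t}, & \text{if } L_t - e_t > 0, \\ 0, & \text{else.} \end{cases} \tag{8}$$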
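Finally, we can define the EaDF as

$$EaDF_{t,T} = \begin{cases} \dfrac{EaD_{t,T}}{L_t}, & \text{if } L_t > 0, \\ 0, & \text{else.} \end{cases} \tag{9}$$

The transformation into EaD and CCF is given by

$$EaD_{t,T} = L_t \cdot EaDF_{t,T}, \quad \text{and} \quad CCF_{t,T} = \begin{cases} \dfrac{EaDF_{t,T} \cdot L_t - e_t}{L_t - e_t}, & \text{if } L_t - e_t > 0, \\ 0, & \text{else.} \end{cases} \tag{10}$$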
It is important to note that our definition of CCF is in accordance with the regulatory framework and with Taplin et al. (2007) or Valvonis (2008). Some other studies, however, denominate our CCF variable as LEQ (e.g., Qi 2009, Jacobs 2010, or Leow and Crook 2016)
or EaDF (e.g., Yang and Tkachenko 2012).
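
The definitions (1), (7), and (9) and the transformations (6), (8), and (10) can be collected in a short Python sketch. The function names are hypothetical, and the example account (exposure 400, limit 1,000, EaD 880) is invented for illustration only.

```python
def realized_parameters(ead, exposure, limit):
    """Realized CCF, LEQ, and EaDF according to equations (1), (7), and (9)."""
    open_limit = limit - exposure
    return {
        "CCF": (ead - exposure) / open_limit if open_limit > 0 else 0.0,
        "LEQ": ead / exposure if exposure > 0 else 0.0,
        "EaDF": ead / limit if limit > 0 else 0.0,
    }


def ead_from(parameter, value, exposure, limit):
    """Transform a parameter estimate into an EaD estimate (equations (6), (8), (10))."""
    if parameter == "CCF":
        return exposure + value * (limit - exposure)
    if parameter == "LEQ":
        return exposure * value
    if parameter == "EaDF":
        return limit * value
    if parameter == "EaD":
        return value
    raise ValueError(f"unknown parameter: {parameter}")


def ccf_from(parameter, value, exposure, limit):
    """Transform a parameter estimate into a CCF estimate (equations (1), (8), (10))."""
    open_limit = limit - exposure
    if open_limit <= 0:
        return 0.0  # CCF is defined as 0 when there is no open limit
    return (ead_from(parameter, value, exposure, limit) - exposure) / open_limit


# Hypothetical account: exposure 400 and limit 1,000 at the reference point, EaD 880.
params = realized_parameters(ead=880.0, exposure=400.0, limit=1000.0)
for name, value in params.items():
    print(name, round(value, 4),
          "-> EaD", round(ead_from(name, value, 400.0, 1000.0), 2),
          "-> CCF", round(ccf_from(name, value, 400.0, 1000.0), 4))
```

For an account with an open limit and positive exposure, all three parameters map back to the same EaD and CCF; they differ only in the cases where a definition falls back to zero.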
4.2. Data description
Our dataset stems from a large, privately owned European bank. This dataset represents a unique panel of checking accounts with lines of credit consisting of 2,798,491 account-month observations with 2,623 defaults between 2007 and 2014 from 61,371 customers. Checking accounts are a widespread product in many countries. This product is
typically used for receipts and expenses and can have a positive or negative balance. Borrowers obtain an initial line of credit of €1,000, which is unsecured and without any expiration date.8 The default definitions used by the bank – “90 days past due” or the expectation that the customer will not repay all his obligations – comply with Basel III and EU Regulation No. 575. Summary statistics of our data are given in Panel A of Table 2.

(Insert Table 2 about here)

For the given dataset, we calculate the parameters CCF, LEQ, EaDF, and EaD. Due to
some extreme observations and to avoid instability in parameters, we winsorize the training as well as the validation dataset for CCF, LEQ, and EaDF at the 5th and 95th percentiles, which is common in the literature (e.g., Qi 2009, Jacobs 2010, or Leow and Crook 2016). Summary statistics of the parameters are given in Panel B of Table 2, and the frequency distributions of realized CCF, LEQ, EaDF, and EaD are given in Figure 2.9

8 See, e.g., Jiménez et al. (2009) for the impact of collateral and maturity on the EaD of corporate credit lines.
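
Winsorizing at the 5th and 95th percentiles can be sketched in Python as follows. The percentile convention (linear interpolation between order statistics) and the function names are assumptions made for this illustration; the exact convention used for the dataset is not stated.

```python
import math


def percentile(values, p):
    """p-th percentile using linear interpolation between order statistics
    (an assumed convention; other conventions give slightly different cut-offs)."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * p / 100.0
    lower, upper = math.floor(pos), math.ceil(pos)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])


def winsorize(values, lower_pct=5.0, upper_pct=95.0):
    """Replace values below/above the chosen percentiles by the percentile values."""
    lo = percentile(values, lower_pct)
    hi = percentile(values, upper_pct)
    return [min(max(v, lo), hi) for v in values]


# Hypothetical realized CCF values with two extreme observations.
ccf_values = [-12.0, 0.0, 0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0, 1.0, 25.0]
print(winsorize(ccf_values))
```

Winsorizing keeps every observation in the sample and only caps the tails, which is why accounts with a very small open limit still contribute to the estimates.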
As seen in Table 2, the mean of realized CCF is 72.95%. Hence, on average, approximately 73% of the open limit at the reference point t (up to 12 months prior to default) will be drawn
at default in T. This number slightly exceeds the mean values reported in previous CCF studies concerning credit cards. A possible explanation is that checking accounts are typically more important for customers than are credit card accounts because the former are used, e.g., for the customers’ monthly salary. For this reason, a default will often occur only if the
customer has already drawn a high percentage of the outstanding limit. Panel A of Figure 2 indicates that many customers do not use their open limit at all (CCF = 0) or use their entire limit (CCF = 1) in default. In this context, it is important to note that approximately half of the observations with CCF = 0 stem from customers without an open limit at reference point t.
Values between zero and one are nearly uniformly distributed. Some negative CCF values and
some CCF values greater than one can also be observed. The mean of realized LEQ is 1.23. Therefore, on average, 123% of the credit used at
the reference point is used at default. More than 4,700 customers have no negative balance at the reference point and, consequently, an LEQ of zero (cf. equation 7). The other customers
with LEQ = 0 have an exposure at default of zero. Furthermore, many customers have almost
no change in balance between the reference point and default (LEQ = 1).

9 We further analyzed accounts with extreme CCF observations and found that these accounts have on average a significantly smaller open limit and higher exposure at the time of CCF calculation, which explains the extreme values. This is similar to Qi (2009), who, as a consequence, excludes accounts with an open limit below $50. When repeating all analyses with unwinsorized values for the validation subset, we find very low performance measures, which is a consequence of extreme values. Nevertheless, the other results are quite robust. Furthermore, the economic relevance of accounts with such a low open limit is very small, as they account for only 2.0% of the total open limit.

The distribution of realized EaDF is also bimodal with many low values (EaDF = 0 if the exposure at default is zero) and many high values (EaDF = 1 if the exposure at default equals the limit at the reference point). On average, 85% of the limit at the reference point is used at default. The mean of realized EaD is approximately €1,393, which is above the initial limit. However, customers can ask for an increase of their initial limit of €1,000, which leads to a
mean limit of €1,658. Note that approximately 3,000 observations have an EaD of zero. Most of these observations have a positive balance at default, leading to zero exposure for the bank.
4.3. Empirical strategy
As presented in Section 2, the estimation of EaD is based on different parameters in the literature, namely CCF, LEQ, EaDF, or a direct estimation of EaD. Using in- and out-of-sample estimations, we empirically determine which definition of such parameters is
beneficial for EaD modeling to achieve high accuracy.
Furthermore, various approaches are commonly used in practice and discussed in the literature to model CCF or EaD (e.g., CEBS 2006, Valvonis 2008, Moral 2011, or Bank for
International Settlements 2016). We discussed these approaches in Section 3 in the theoretical analysis of CCF, namely the fixed-horizon, the variable-horizon, the standard cohort, and the
generalized cohort approach. Subsequently, we empirically analyze which approach is
beneficial and in which situations the resulting estimate of the EL is likely to be biased. Predictive accuracy is measured with six different criteria: the out-of-sample
coefficient of determination ($R^2$), relative absolute error (RAE), root mean squared error (RMSE), mean absolute error (MAE), absolute error (ABS), and relative error (REL). The out-of-sample coefficient of determination $R^2$ is defined as

$$R^2 = 1 - \frac{\frac{1}{n} \sum_i |y_i - \hat{y}_i|^2}{\frac{1}{n} \sum_i |y_i - \bar{y}_{train}|^2}, \tag{11}$$

where $y_i$ and $\hat{y}_i$ are the realized and the forecasted values for account i and $\bar{y}_{train}$ is the historical average value of the training data.10 Similarly, the RAE is defined as

$$RAE = \frac{\frac{1}{n} \sum_i |y_i - \hat{y}_i|}{\frac{1}{n} \sum_i |y_i - \bar{y}_{train}|} \cdot 100. \tag{12}$$

Note that if values for RAE are lower than 100 and values for $R^2$ are greater than zero, then the predictive performance of the model is better than the historical average value. The RMSE is defined as

$$RMSE = \sqrt{\frac{1}{n} \sum_i (y_i - \hat{y}_i)^2}, \tag{13}$$

and the MAE is defined as

$$MAE = \frac{1}{n} \sum_i |y_i - \hat{y}_i|. \tag{14}$$

Smaller values of RMSE and MAE imply better accuracy, with zero being the lower bound. The ABS is defined as

$$ABS = \frac{1}{n} \sum_i (\hat{y}_i - y_i), \tag{15}$$

and the REL is defined as

$$REL = \frac{\sum_i (\hat{y}_i - y_i)}{\sum_i y_i}. \tag{16}$$

The best achievable value for ABS and REL is zero. Values greater than zero indicate an overestimation of the real value, and values lower than zero imply an underestimation.
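
The six measures in (11) to (16) translate directly into code. The following Python sketch is only an illustration; the function name `accuracy_measures` and the three example accounts are hypothetical.

```python
import math


def accuracy_measures(y, y_hat, y_train_mean):
    """Predictive accuracy measures (11)-(16) for realized values y and forecasts y_hat.

    y_train_mean is the historical average of the training data. The measures are
    undefined if all y equal y_train_mean (R2, RAE) or if sum(y) is zero (REL).
    """
    n = len(y)
    if n == 0 or n != len(y_hat):
        raise ValueError("y and y_hat must be non-empty and of equal length")
    errors = [forecast - realized for realized, forecast in zip(y, y_hat)]
    sum_sq = sum(e * e for e in errors)
    sum_abs = sum(abs(e) for e in errors)
    base_sq = sum((realized - y_train_mean) ** 2 for realized in y)
    base_abs = sum(abs(realized - y_train_mean) for realized in y)
    return {
        "R2": 1.0 - sum_sq / base_sq,      # the 1/n factors in (11) cancel
        "RAE": sum_abs / base_abs * 100.0,  # the 1/n factors in (12) cancel
        "RMSE": math.sqrt(sum_sq / n),
        "MAE": sum_abs / n,
        "ABS": sum(errors) / n,             # positive values indicate overestimation
        "REL": sum(errors) / sum(y),
    }


# Hypothetical realized and forecasted EaD values for three accounts.
measures = accuracy_measures(y=[100.0, 200.0, 300.0],
                             y_hat=[110.0, 190.0, 330.0],
                             y_train_mean=150.0)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

Because ABS and REL keep the sign of the forecast error, they capture systematic over- or underestimation, whereas the other four measures only capture the size of the errors.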
10 See also Campbell and Thompson (2008) or Gürtler and Hibbeln (2013) for applications of this out-of-sample $R^2$ statistic.

For evaluation, we use a 10-fold cross-validation to avoid overfitting and to obtain more robust results (see, e.g., Bastos 2010, Qi and Zhao 2011, or Hartmann-Wendels et al. 2014). For this purpose, we randomly divide the sample into 10 subsamples. Then, we use nine subsamples to build the model (in-sample) and one subsample to test the model (out-of-sample) and calculate our predictive accuracy measures. We repeat this 10 times such that each subsample serves as the test sample exactly once. The obtained (10) values for the measures are combined to obtain one value for each measure for each 10-fold cross-validation. This method is repeated 1,000 times with different randomly generated subsamples.11
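
The repeated cross-validation procedure can be sketched in Python as follows. Several choices here are assumptions made for the illustration: the model is a simple historical average (in the spirit of a mean benchmark rather than a regression on covariates), the measure is the MAE, and the 10 fold values are combined by averaging, since the combination rule is not specified. The function names are hypothetical.

```python
import random


def k_fold_indices(n, k, rng):
    """Randomly split the indices 0..n-1 into k disjoint subsamples."""
    indices = list(range(n))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]


def repeated_cv_mae(y, k=10, repeats=1000, seed=0):
    """Out-of-sample MAE of a historical-average forecast, one value per repetition."""
    if k < 2 or len(y) < k:
        raise ValueError("need k >= 2 and at least k observations")
    rng = random.Random(seed)
    results = []
    for _ in range(repeats):
        fold_scores = []
        for test_idx in k_fold_indices(len(y), k, rng):
            held_out = set(test_idx)
            # build the model on the remaining k-1 subsamples (in-sample)
            train = [v for i, v in enumerate(y) if i not in held_out]
            forecast = sum(train) / len(train)
            # evaluate on the held-out subsample (out-of-sample)
            fold_scores.append(
                sum(abs(y[i] - forecast) for i in test_idx) / len(test_idx))
        # combining the k fold values by averaging is an assumption
        results.append(sum(fold_scores) / k)
    return results


# Hypothetical realized CCF values; a small number of repetitions keeps the demo fast.
sample = [((7 * i) % 11) / 10.0 for i in range(50)]
scores = repeated_cv_mae(sample, k=10, repeats=5)
print([round(s, 4) for s in scores])
```

Replacing the historical average by a fitted regression model only changes the two lines that build the model and produce the forecast; the splitting and repetition logic stays the same.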
For estimation, we use common independent variables regarding default risk, account activity, or behavioral variables on a monthly basis, relationship variables, and client controls (see, e.g., Qi 2009, Leow and Crook 2016, or Hon and Bellotti 2016). Based on some directly observable variables such as balance or limit, we also derive ratios or dummy variables such
as usage or high usage (usage greater than 95%) (cf. Qi 2009).12 In our baseline analysis, we use linear regression (OLS) for calculation. As a robustness test, we also implement mixture regression models.

11 As an additional robustness check, we have repeated the analyses with an out-of-time validation. We find that the results remain largely unchanged.

12 We derived the relevant variables from the literature. For estimation, we use the same variables for all models to have identical conditions for each model, because our focus is not on finding the best statistical model. Instead, we analyze on which parameter (EaD, CCF, LEQ, or EaDF) a model should be based in order to optimally estimate EaD. We use the following variables: undrawn amount, drawn amount, usage, usage greater than 95%, rating, difference between monthly cash inflows and outflows, difference between the maximum and minimum exposure in each month (both as a percentage of the external limit), average number of bounced debits (return debit notes), percentage of days with a negative balance, percentage of days with overdrafts (last three variables in the preceding 12 months), and length of the relationship in months. Customer controls are customer’s age, gender, job, marital status, number of children, nationality, online versus offline banking, and academic degrees.

4.4. Comparison of different parameters
As shown in Section 4.1, the estimation of EaD can be based on different risk parameters, which can easily be transformed into the EaD. We estimate the risk parameters CCF, LEQ,
EaDF, and EaD at variable time horizons. For comparison, we additionally implement the historical average of CCF and EaD, namely CCF-mean and EaD-mean. To be able to compare the accuracy of the different models, we have to transform the estimates into an identical
basis, for example into EaD estimates or CCF estimates. Specifically, we transform the estimated values of CCF, LEQ, EaDF, and CCF-mean into EaD estimates (EaD level). Similarly, we transform the estimated values of LEQ, EaDF, EaD, and EaD-mean into CCF estimates (CCF level). In Table 3, we display our performance measures after this
ED
M
transformation into EaD estimates.
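The transformation between the CCF level and the EaD level can be sketched as follows. This is a minimal illustration, assuming the relation EaD = e_t + CCF · (L_t − e_t) that also appears in the EL formula of the Appendix; the function names and the three example accounts are hypothetical, and only the CCF value of 0.73 is taken from Table 2. The transformations for LEQ and EaDF are analogous and not shown.

```python
def ead_from_ccf(ccf, drawn, limit):
    """EaD level: drawn amount plus the CCF share of the undrawn amount."""
    return drawn + ccf * (limit - drawn)


def ccf_from_ead(ead, drawn, limit):
    """CCF level: share of the undrawn amount that is drawn until default."""
    undrawn = limit - drawn
    if undrawn == 0:
        raise ValueError("CCF is undefined when the limit is fully drawn")
    return (ead - drawn) / undrawn


# Hypothetical accounts as (drawn amount e_t, limit L_t) in euros.
accounts = [(200.0, 1000.0), (800.0, 1000.0), (0.0, 500.0)]
ccf_mean = 0.73  # mean realized CCF reported in Table 2

# One constant CCF still yields account-specific EaD estimates.
ead_estimates = [ead_from_ccf(ccf_mean, e, L) for e, L in accounts]

# Transforming back recovers the CCF for every account.
ccf_roundtrip = [ccf_from_ead(ead, e, L)
                 for ead, (e, L) in zip(ead_estimates, accounts)]
```

Note that a single CCF value produces different EaD estimates across accounts because the drawn amount and the limit differ, whereas a single EaD value is constant by construction.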
By definition, the R², ABS, and REL measures for EaD-mean equal zero, and similarly, the RAE equals 100, when calculated in-sample. Typically, out-of-sample predictive accuracy is lower than in-sample accuracy. However, we find that the out-of-sample accuracy is nearly identical for most measures, which indicates that there is no problem regarding overfitting the data. Interestingly, the CCF-mean provides a good prediction of EaD (in-sample and out-of-sample), which means that CCF-mean performs substantially better than EaD-mean in explaining the variance of EaD. This effect appears because the transformation of the CCF-mean into EaD values by equation (6) leads to variance in the estimates. Nevertheless, regarding the absolute error, the accuracy of CCF-mean is rather low. The risk parameters CCF and EaD exhibit the best overall performance. In particular, for R², RAE, RMSE, and MAE, we find only small differences between the two parameters. Interestingly, the absolute error for CCF is also very good and comparable to the absolute error for EaD. The parameter EaDF and to an even greater extent the LEQ have rather low performance in explaining the EaD. Specifically, the absolute error and relative error are very high compared to CCF and EaD. In summary, if the EaD is the parameter of interest, the estimate should be based on either CCF or EaD for modeling and forecasting.
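To make the performance measures concrete, the following sketch computes all six of them for a vector of realized and predicted values. The formulas are assumptions based on conventional definitions, chosen so that a historical-mean predictor yields R² = 0, RAE = 100, and ABS = REL = 0 in-sample, as stated above; in particular, ABS is taken as the mean signed error and REL as ABS divided by the mean realized value, which is in line with Table 3 (for example, −141.3 / 1,394 ≈ −0.101 for LEQ). The function name and the example data are hypothetical.

```python
from math import sqrt
from statistics import fmean


def accuracy_measures(actual, predicted):
    """Return R2, RAE, RMSE, MAE, ABS, and REL (assumed definitions)."""
    n = len(actual)
    mean_actual = fmean(actual)
    errors = [p - a for a, p in zip(actual, predicted)]  # positive = overestimation
    sse = sum(err ** 2 for err in errors)
    sst = sum((a - mean_actual) ** 2 for a in actual)
    sae = sum(abs(err) for err in errors)
    sae_mean = sum(abs(a - mean_actual) for a in actual)
    mean_error = fmean(errors)
    return {
        "R2": 1.0 - sse / sst,
        "RAE": 100.0 * sae / sae_mean,   # relative to the mean predictor, in percent
        "RMSE": sqrt(sse / n),
        "MAE": sae / n,
        "ABS": mean_error,
        "REL": mean_error / mean_actual,
    }


# Hypothetical realized EaD values and the in-sample mean as the prediction.
realized = [100.0, 400.0, 900.0, 1600.0]
benchmark = accuracy_measures(realized, [fmean(realized)] * len(realized))
```

Under these definitions, the mean predictor serves as the reference point: any model with an RAE below 100 has a smaller total absolute error than the historical average.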
Next, we analyze the predictive accuracy measures evaluated at the CCF level (see Table 4). Again, out-of-sample predictive accuracy is generally slightly lower than in-sample accuracy, and the R², ABS, and REL for CCF-mean are zero (the RAE equals 100) in-sample. This finding also (approximately) holds for the out-of-sample prediction. Remarkably, only for CCF do the R² and RAE indicate good accuracy. For LEQ, EaDF, EaD, and EaD-mean, we observe low performance. Furthermore, the other performance measures indicate poor performance of parameters that are transformed into CCF; only the CCF model results in a good predictive power if evaluated at the CCF level. Regarding the absolute error, no parameter except CCF (and EaDF) is qualified to forecast CCF.

As mentioned before, there exists a stream of literature discussing the wide variety of other sophisticated statistical regression models for modeling EaD, CCF, LEQ, or EaDF (e.g. Hon and Bellotti 2016, Leow and Crook 2016, or Tong et al. 2016). We do not focus on finding the best statistical model, and further work may shed more light on the best EaD modeling approach. Instead, we analyze on which parameter (EaD, CCF, LEQ, or EaDF) a model should be based to optimally estimate EaD. However, as a robustness test we additionally implement several of the proposed sophisticated statistical regression models. More specifically, we perform (unreported) analyses based on the zero-adjusted gamma model for EaD (see Tong et al. 2016), a mixed-model (degenerate term plus a normal distribution) for CCF (see Hon and Bellotti, 2016), a mixed-model (degenerate term plus a Weibull distribution) for LEQ (see Hon and Bellotti, 2016), and a mixed-model (degenerate term plus a Weibull distribution) for EaDF (see Hon and Bellotti, 2016). We find that the overall results regarding the influence of parameter choice on the EaD estimate remain largely unchanged.13 To sum up, we show that if the EaD is estimated directly and subsequently transformed into CCF predictions, the accuracy at the CCF level is rather low. The same holds for LEQ (and EaDF). However, we find that CCF can not only be used to achieve accurate CCF predictions, but these forecasts can also be used to derive EaD predictions that are of similar accuracy to estimates that directly focus on EaD. Thus, our results suggest that EaD forecasts should be based on CCF models instead of applying LEQ, EaDF, or direct EaD models.
13 Results are available from the authors upon request. For estimation, we again use the same variables for all models to have identical conditions for each model. Thus, the variable selection is not tuned specifically to each subcomponent of the mixture models because our focus is to find which parameter a model should be based on for optimally estimating EaD, and not to find the best statistical prediction model for the EaD. We note that, for a given parameter choice, it might be meaningful to put additional emphasis on finding the best statistical model, e.g. as in Hon and Bellotti (2016) or Tong et al. (2016) regarding mixture models, which should therefore be seen as an important complementary analysis.

4.5. Comparison of different estimation approaches evaluated at the CCF level

As a next step, we repeat the analysis of Table 4 with a focus on the different CCF estimation approaches and evaluate them at CCF level. Specifically, we calculate the in-sample and out-of-sample performance based on 1,000 random 10-fold cross-validations for fixed CCF, variable CCF, standard cohort CCF, and generalized cohort CCF. Regardless of the estimation approach, we find widely comparable performance measures. However, the R² for fixed CCF is particularly high (and the RAE is low) because the estimation is based solely on defaults in exactly 12 months. Note, however, that we compare the results of the estimated fixed CCF with the realized fixed CCF. Thus, it should not be concluded that estimating the fixed CCF is superior to the other approaches but only that the accuracy of this model is high if the target is to estimate the CCF for a fixed horizon. We will show in Section 4.6 that the EL can be substantially biased if the estimation is based on the fixed-horizon approach. Moreover, the results in Table 5 confirm that the CCF depends strongly on the reference month. Especially for the absolute error, the minimum and the maximum of cohort CCF differ substantially, ranging from an underestimation of 6% to an overestimation of 4%. As expected, the mean value of standard cohort CCF, calculated as the mean CCF for all possible reference periods, is similar to the generalized cohort CCF.
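The repeated random 10-fold cross-validation used for the in-sample and out-of-sample measures can be sketched as follows. This is a simplified illustration and not the estimation code of the paper: it uses a single hypothetical regressor (usage) instead of the full variable set, synthetic data, the RMSE as the only measure, and 20 instead of 1,000 repetitions to keep the run time short. All function names and parameter values are assumptions.

```python
import random
from statistics import fmean


def fit_ols(x, y):
    """Closed-form OLS for y = b0 + b1 * x with one regressor."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def rmse(x, y, b0, b1):
    return fmean([(b0 + b1 * xi - yi) ** 2 for xi, yi in zip(x, y)]) ** 0.5


def repeated_kfold(x, y, k=10, repeats=20, seed=0):
    """Mean in-sample and out-of-sample RMSE over repeated random k-fold splits."""
    rng = random.Random(seed)
    in_scores, out_scores = [], []
    for _ in range(repeats):
        idx = list(range(len(x)))
        rng.shuffle(idx)                      # new random partition per repetition
        folds = [idx[i::k] for i in range(k)]
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in idx if i not in held_out]
            x_train = [x[i] for i in train_idx]
            y_train = [y[i] for i in train_idx]
            b0, b1 = fit_ols(x_train, y_train)
            in_scores.append(rmse(x_train, y_train, b0, b1))
            out_scores.append(rmse([x[i] for i in test_idx],
                                   [y[i] for i in test_idx], b0, b1))
    return fmean(in_scores), fmean(out_scores)


# Synthetic data: CCF falls with usage, plus noise (hypothetical relationship).
rng = random.Random(42)
usage = [rng.random() for _ in range(200)]
ccf = [1.0 - 0.8 * u + rng.gauss(0.0, 0.3) for u in usage]
in_rmse, out_rmse = repeated_kfold(usage, ccf, k=10, repeats=20, seed=1)
```

A small gap between the two averages corresponds to the observation above that in-sample and out-of-sample accuracy are nearly identical, which speaks against overfitting.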
In the previous sections, we discussed the accuracy of estimates at the CCF and EaD level. However, the ultimate parameter of interest is the EL. In our next analysis, we empirically investigate which approach is beneficial for estimating the EL and in which situations the resulting estimate of the EL is likely to be biased.
4.6. Comparison of different estimation approaches evaluated at the EL level

In Table 6, we first present different historical means for CCF, EaD, and PD regarding the standard cohort, the generalized cohort, the variable-horizon, and the fixed-horizon approach.14 We find that the mean CCF based on the fixed-horizon approach is much higher than the CCF based on the standard cohort or generalized cohort/variable-horizon approach, which is consistent with our theoretical results in Section 3.2. In Figure 3, the mean CCF is presented for all 12 months prior to default, which shows that our theoretical result that the CCF is strictly decreasing in t and, thus, that a fixed CCF leads to an overestimation of risk, holds for our empirical dataset. However, this bias does not apply to the EaD because the realized value of EaD does not depend on a reference point t.

14 Note that the presented results for the generalized cohort and variable-horizon approach are identical because historical means for CCF, EaD, and PD are displayed. Differences between the two approaches arise when EL estimates are calculated: For the generalized cohort approach, for each month, a separate CCF mean value is assigned to an individual contract, whereas for the variable-horizon approach, one general CCF mean value is assigned to all contracts.
The historically estimated CCF values computed on the basis of the standard cohort approach vary substantially, meaning that there is high dependence on the chosen reference month. This means that assumption (A1) “the CCF for a given residual lifetime does not depend on the reference month” seems not to be fulfilled.15 Consequently, the resulting EL estimates also depend on the reference month. As expected, the mean standard cohort CCF is nearly equal to the variable/generalized cohort CCF.16
Notably, the estimated PDs based on the cohort approach differ only slightly. This indicates that the PD estimation seems to be nearly independent of the reference month. As the fixed-horizon approach only considers defaults in exactly 12 months, the value for PD is approximately 12 times smaller than the corresponding value based on the variable approach because the observations refer to monthly data.
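The factor of approximately 12 between the two PD values follows from simple counting, which the sketch below illustrates on a small synthetic panel. The panel (100 accounts, 36 months, three defaults) and the function name are hypothetical; the counting rule is an assumption that mirrors the description above: the variable approach flags every account-month with a default within the next 12 months, the fixed approach only the account-month exactly 12 months before default.

```python
def pd_estimates(default_month, n_accounts, n_months, horizon=12):
    """Return (variable-horizon PD, fixed-horizon PD) from account-month counts.

    default_month maps an account id to the month index of its default;
    accounts that are not in the mapping never default.
    """
    observations = variable_hits = fixed_hits = 0
    for account in range(n_accounts):
        d = default_month.get(account)
        months_observed = n_months if d is None else d
        for t in range(months_observed):
            observations += 1
            if d is None:
                continue
            months_to_default = d - t
            if months_to_default <= horizon:
                variable_hits += 1   # default within the next 12 months
            if months_to_default == horizon:
                fixed_hits += 1      # default in exactly 12 months
    return variable_hits / observations, fixed_hits / observations


# Hypothetical panel: three accounts default after at least 12 observed months.
defaults = {0: 24, 1: 30, 2: 18}
pd_variable, pd_fixed = pd_estimates(defaults, n_accounts=100, n_months=36)
ratio = pd_variable / pd_fixed
```

In this example, every default contributes 12 account-months to the variable count and one to the fixed count, so the ratio is 12. An account that defaults after fewer than 12 observed months contributes fewer account-months to the variable count and none to the fixed count, which is one reason why the factor holds only approximately in real data.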
These results, in combination with our previous results presented in Section 4.4, imply that the estimates of the EL can be biased if different time horizons are used for parameter estimation. Based on the data, we now calculate the magnitude of this bias. Furthermore, we show how the CCF (in combination with the PD) should be modeled to achieve consistent estimates of the EL. For this purpose, we first calculate the simple historical estimators for CCF and PD, in both cases based on the four discussed estimation approaches (standard cohort, generalized cohort, variable and fixed approach). For each combination of these estimators, we calculate the performance of the resulting EL estimate. As a next step, we perform similar analyses but implement regression approaches (probit regressions for the PD and linear regressions for the CCF) instead of the historical estimators to evaluate the performance of each combination of the four discussed approaches.

In Table 7, we report the results for different estimates of the EL calculated from CCF and PD estimates that are based on the historical average using the four discussed approaches (cf. Table 6). For the cohort approach, we present the minimum, mean, and maximum of the 12 different reference months. For simplification, we assume an LGD of 100%, i.e., in the event of default, the entire exposure is lost. As expected, the mean CCF based on the fixed-horizon approach in combination with mean variable PD leads to a substantial overestimation of the EL of generally more than 49%. Furthermore, we find considerable variation for the standard cohort CCF, which confirms our previous findings. However, when comparing the mean value of the standard cohort CCFs with the mean variable CCF and the mean generalized CCF, we find that the estimates are nearly identical.17 One possible problem is that the resulting EL values appear to be biased for all of these approaches, with an overestimation of the EL of approximately 28%. Considering a portfolio of 100,000 checking accounts, the overestimation of EL, for example when using “PD-mean: generalized cohort,” ranges from approximately €457,000 (“CCF-mean: variable”) to €794,350 (“CCF-mean: fixed”). The reason for this overestimation of the EL is that the mean CCF is calculated on the basis of observable values for defaulted customers, and this CCF is also assigned to non-defaulted customers, for which the CCF is not observable. This is problematic because the CCF is typically lower if the PD is low (see Hibbeln et al. 2015 for a similar finding regarding credit line usage). As a consequence, the CCF values assigned to non-defaulting customers, who represent the majority of the dataset, are on average too high, leading to an overestimation of EL.

15 We also tested assumption (A1) with bootstrapping. We resample one 12th of observations (with replacement) 10,000 times and calculate the mean of CCF. The mean value of all 10,000 runs is comparable to the variable CCF in Table 6 (0.7460). The 95% confidence interval is [0.6758, 0.8193]. This indicates that assumption (A1) is not fulfilled if formulated unconditional on x.

16 If values based on the standard cohort approach were default weighted, the results would be identical to those of the generalized cohort/variable-horizon approach.

17 As discussed in Table 6, the CCF means for the generalized cohort and the variable-horizon approach are identical. However, at the EL level, the values differ slightly due to a not exactly uniform distribution of defaults over the year.
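The bootstrap test of assumption (A1) described in footnote 15 can be sketched as follows. The procedure (resampling one twelfth of the observations with replacement and taking the mean CCF in each run) follows the footnote; the percentile-based confidence interval, the function name, and the synthetic CCF values are assumptions. The synthetic data merely borrow the mean and standard deviation of the realized CCF from Table 2 and do not reproduce the distribution in the dataset.

```python
import random
from statistics import fmean


def bootstrap_mean_ci(values, fraction=1 / 12, runs=10_000, level=0.95, seed=0):
    """Bootstrap the mean of a resampled fraction of the data.

    Returns the mean over all runs and a percentile confidence interval.
    """
    rng = random.Random(seed)
    sample_size = max(1, round(len(values) * fraction))
    means = sorted(fmean(rng.choices(values, k=sample_size)) for _ in range(runs))
    lower = means[int((1 - level) / 2 * runs)]
    upper = means[int((1 + level) / 2 * runs) - 1]
    return fmean(means), lower, upper


# Synthetic realized CCFs (hypothetical; normal draws with Table 2 mean and std. dev.).
rng = random.Random(7)
realized_ccf = [rng.gauss(0.73, 1.674) for _ in range(2400)]
center, lower, upper = bootstrap_mean_ci(realized_ccf, runs=2000, seed=1)
```

Cohort means for individual reference months can then be compared with the interval [lower, upper]: values outside the interval indicate a dependence on the reference month that is not explained by sampling variation alone.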
To analyze this further, we next implement regression models for the various CCF and PD approaches. As the expected CCF values are derived from regression models instead of historical averages, the resulting estimates include information on several explanatory variables, including the default risk of an account. Thus, the abovementioned problem that CCF values are estimated based on defaulted accounts and applied to non-defaulted accounts does not necessarily lead to the previously found bias.18 Moreover, it is straightforward to formulate the proposition in Section 3.2 conditional on some arbitrary explanatory variables x; in this case, it is sufficient to formulate assumption (A1) conditional on x as well, $E(CCF_{t,\tau} \mid x) = E(CCF_{T+t-\tau,T} \mid x)$, which is a substantially weaker assumption. Thus, even if the CCF depends on the reference month, this is not problematic as long as this dependence can be explained by some explanatory variables x that are considered in the estimation. For these reasons, it is likely that the biases found in the analyses above would be less pronounced or even vanish if we estimated CCF (and the EL) based on regression models instead of on historical averages.

18 Note that including, for example, default risk as an explanatory variable in the regression models results in lower CCF values for non-defaulted accounts. The means of forecasted CCF values of non-defaulted accounts for the variable, standard cohort, and generalized cohort approach are 0.478, 0.482 (mean), and 0.478, whereas the means of forecasted CCF values of defaulted accounts for the variable, standard cohort, and generalized cohort approach are 0.745, 0.739 (mean), and 0.745.
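The mechanism behind the overestimation of the EL can be illustrated with a stylized two-segment portfolio. All numbers below are hypothetical and chosen only for illustration; they assume that low-risk accounts have a lower CCF and a larger undrawn amount than high-risk accounts. The sketch compares the EL based on segment-specific CCFs with the EL obtained when the mean CCF of the (expected) defaulted accounts is assigned to every account, using an LGD of 100% and the same PDs in both cases so that only the CCF assignment differs.

```python
# Hypothetical portfolio segments (all values are illustrative assumptions).
segments = [
    {"name": "low risk", "n": 9000, "pd": 0.005, "ccf": 0.45,
     "drawn": 200.0, "limit": 2000.0},
    {"name": "high risk", "n": 1000, "pd": 0.100, "ccf": 0.75,
     "drawn": 1200.0, "limit": 1500.0},
]
LGD = 1.0  # entire exposure is lost in the event of default


def ead(ccf, drawn, limit):
    return drawn + ccf * (limit - drawn)


def expected_loss(ccf_for_segment):
    """Portfolio EL = sum over accounts of PD * LGD * EaD."""
    return sum(s["n"] * s["pd"] * LGD
               * ead(ccf_for_segment(s), s["drawn"], s["limit"])
               for s in segments)


# Historical mean CCF: observable only for defaulted accounts, so it is
# dominated by the high-risk segment.
expected_defaults = sum(s["n"] * s["pd"] for s in segments)
ccf_hist_mean = sum(s["n"] * s["pd"] * s["ccf"] for s in segments) / expected_defaults

el_conditional = expected_loss(lambda s: s["ccf"])      # CCF conditional on risk
el_hist_mean = expected_loss(lambda s: ccf_hist_mean)   # one CCF for all accounts
overestimation = el_hist_mean / el_conditional - 1.0
```

With these assumed numbers, assigning the historical mean CCF to all accounts overstates the EL by roughly 7%, driven by the low-risk segment with its large undrawn amounts. The segment-specific CCFs stand in for a regression that conditions on default risk; the size and even the sign of the effect depend on the assumed portfolio composition.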
Thus, similar to Table 7, we report different estimates of the EL in Table 8, but the estimates are now based on CCF regressions to allow controlling for variables, e.g., reflecting the default risk. We find that our proposed generalized cohort approach performs best in estimating EL in the next 12 months, but the results are very similar to those based on the variable CCF, and even the fixed and the standard cohort CCF exhibit good accuracy. This confirms that assumption (A1) is rather uncritical if it is formulated conditional on x.
In summary, if a simple historical average is calculated to estimate the CCF, then the results are likely to overestimate the EL because the (high) CCF of accounts with high default risk is assigned to accounts with low default risk. Moreover, the resulting CCF and EL estimates highly depend on the chosen approach (fixed, variable, standard cohort, or generalized cohort). As assumption (A1) is not necessarily fulfilled, the outcomes are likely to be biased, which is particularly problematic for the fixed and the standard cohort approach. If, however, the CCF estimates are based on regression models and include variables reflecting the default risk, then the choice of the CCF approach has a substantially smaller impact on the resulting bias of both CCF and EL.

5. Conclusion
In this paper, we identify and investigate several theoretical and empirical issues regarding EaD and CCF modeling. We apply our empirical analyses to a unique dataset of a large European bank, consisting of 2,798,491 monthly observations from 61,371 customers during the period 2007–2014. First, we propose the generalized cohort approach and discuss its advantages over the three other common approaches: the fixed-horizon, variable-horizon, and (standard) cohort approach. Specifically, the generalized cohort approach is beneficial for a consistent modeling of EL. We demonstrate the assumptions under which two other approaches (variable-horizon and standard cohort) can also be used, but the fixed-horizon approach continues to be (positively) biased. From a regulatory perspective, however, this is rather unproblematic because the approach is more conservative. In contrast, from an internal risk management perspective and regarding the new International Financial Reporting Standard (IFRS) 9, this could be problematic because consistent estimation is necessary.

Furthermore, in our empirical analyses, we use in- and out-of-sample estimations to show which parameter is beneficial for EaD modeling to achieve high accuracy. We find that CCF should be applied instead of LEQ, EaDF, or direct EaD estimates. In particular, a transformation of EaD estimation into CCF produces rather low performance, whereas a transformed CCF estimation outperforms direct EaD predictions on most predictive accuracy measures.

Finally, we empirically demonstrate that there are interactions between modeling PD and EaD that can lead to substantially biased estimates of the EL. Regarding the (standard) cohort approach, we show that the result depends substantially on the chosen reference month, whereas our proposed generalized cohort approach avoids this problem.
Appendix – Proof of the Proposition

1) Using $e_t = e(t)$, $L_t = L(t)$, $a := E_t(e_d \mid d = T) = E(e_d \mid d = T)$ and applying the definition of CCF for $e_t < L_t$ (see assumption A4), it follows that

\[
\begin{aligned}
\frac{\partial CCF_{t,T}}{\partial t}
&= \frac{\partial}{\partial t}\,\frac{a - e(t)}{L(t) - e(t)} \\
&= \frac{-e'(t)\,(L(t) - e(t)) - (a - e(t))\,(L'(t) - e'(t))}{(L(t) - e(t))^2} \\
&= \frac{-e'(t)\,L(t) - a\,(L'(t) - e'(t)) + e(t)\,L'(t)}{(L(t) - e(t))^2} \\
&= \frac{(a - L(t))\,e'(t) - (a - e(t))\,L'(t)}{(L(t) - e(t))^2} < 0 \\
&\Leftrightarrow\; \frac{a - L(t)}{a - e(t)} < \frac{L'(t)}{e'(t)}
\;\Leftrightarrow\; e'(t) > -L'(t) \cdot \frac{a - e(t)}{L(t) - a}.
\end{aligned}
\]

The latter inequality is fulfilled as a result of assumption (A5).

2)

\[
\begin{aligned}
EL_{t,T}
&= \sum_{\tau=t}^{T-1} LGD_t \,\bigl(e_t + CCF_{\tau,T}\,(L_t - e_t)\bigr)\, PD_{\tau,T} \\
&\le \sum_{\tau=t}^{T-1} LGD_t \,\bigl(e_t + CCF_{t,T}\,(L_t - e_t)\bigr)\, PD_{\tau,T} \\
&= LGD_t \,\bigl(e_t + CCF_{t,T}\,(L_t - e_t)\bigr) \sum_{\tau=t}^{T-1} PD_{\tau,T} \\
&= LGD_t \,\bigl(e_t + CCF_{t,T}\,(L_t - e_t)\bigr)\, PD^{cum}_{t,T}.
\end{aligned}
\]

The above inequality results from the application of part 1) to each of the CCF values in the sum.
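Both parts of the proof can be checked numerically for a simple special case. The sketch below assumes a linearly increasing drawn amount e(t), a constant limit L, and a fixed expected exposure at default a with e(t) < a < L; these functional forms and all numbers (including the constant monthly PD) are hypothetical and serve only as an illustration of the statements, not as a replacement for the general argument.

```python
def ccf(t, a=900.0, limit=1000.0, e0=300.0, growth=40.0):
    """CCF_{t,T} = (a - e(t)) / (L(t) - e(t)) with e(t) = e0 + growth * t, constant L."""
    drawn = e0 + growth * t
    return (a - drawn) / (limit - drawn)


T = 12
path = [ccf(t) for t in range(T)]  # reference months t = 0, ..., T-1

# Part 1: the CCF decreases as the reference month approaches default.
strictly_decreasing = all(x > y for x, y in zip(path, path[1:]))

# Part 2: exposure and limit are fixed at t = 0; only the CCF varies with tau.
e_t, L_t, lgd, monthly_pd = 300.0, 1000.0, 1.0, 0.001
el_sum = sum(lgd * (e_t + ccf(tau) * (L_t - e_t)) * monthly_pd for tau in range(T))
el_bound = lgd * (e_t + ccf(0) * (L_t - e_t)) * (T * monthly_pd)
```

In this parametrization, the CCF falls from about 0.86 at t = 0 to about 0.62 at t = 11, and the EL based on month-specific CCF values stays below the EL that applies the CCF of the earliest reference month to the cumulative PD.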
References

Agarwal, S., B.W. Ambrose, and C. Liu. 2006. Credit Lines and Credit Utilization. Journal of Money, Credit, and Banking 38, 1-22.
Araten, A., and M. Jacobs Jr. 2001. Loan Equivalents for Revolving Credits and Advised Lines. The RMA Journal. May. 34-39.
Asarnow, E., and J. Marker. 1995. Historical Performance of the U.S. Corporate Loan Market 1988-1993. Commercial Lending Review 10, 13-32.
Bag, P., and M. Jacobs Jr. 2012. Parsimonious Exposure-at-Default Modeling for Unfunded Loan Commitments. Journal of Risk Finance 13, 77-94.
Banerjee, P., and J.J. Canals-Cerdá. 2012. Credit Risk Analysis of Credit Card Portfolios under Economic Stress Conditions. Working Paper, Federal Reserve Bank of Philadelphia, No. 12-18.
Bank for International Settlements. 2014. Statistics on Payment, Clearing and Settlement Systems in the CPMI Countries.
Bank for International Settlements. 2016. Regulatory Consistency Assessment Programme (RCAP) – Analysis of Risk-Weighted Assets for Credit Risk in the Banking Book.
Bastos, J.A. 2010. Forecasting Bank Loans Loss-Given-Default. Journal of Banking and Finance 34, 2510-2517.
Bluhm, C., L. Overbeck, and C. Wagner. 2010. Introduction to Credit Risk Modeling, 2nd Ed. CRC Press, Boca Raton.
Campbell, J.Y., and S.B. Thompson. 2008. Predicting Excess Stock Returns Out of Sample: Can Anything Beat the Historical Average? Review of Financial Studies 21, 1509–1531.
Committee of European Banking Supervisors. 2006. Guidelines on the Implementation, Validation and Assessment of Advanced Measurement (AMA) and Internal Ratings Based (IRB) Approaches.
Deutsche Bundesbank. 2015. Zinsstatistik. November.
Gürtler, M., and M. Hibbeln. 2013. Improvements in Loss Given Default Forecasts for Bank Loans. Journal of Banking and Finance 37, 2354-2366.
Hahn, R., and S. Reitz. 2011. Possibilities of Estimating Exposures. In: Engelmann, B., and R. Rauhmeier (Ed.), The Basel II Risk Parameters, 2nd Ed. Springer, Berlin.
Hartmann-Wendels, T., P. Miller, and E. Töws. 2014. Loss Given Default for Leasing: Parametric and Nonparametric Estimations. Journal of Banking and Finance 40, 364-375.
Hibbeln, M., L. Norden, P. Usselmann, and M. Gürtler. 2015. Informational Synergies in Consumer Credit. Working Paper. January 2015.
Hon, P.S., and T. Bellotti. 2016. Models and Forecasts of Credit Card Balance. European Journal of Operational Research 249, 498-505.
Jacobs Jr, M. 2010. An Empirical Study of Exposure at Default. Journal of Advanced Studies in Finance 1, 32-59.
Jacobs Jr, M., and P. Bag. 2011. What do We Know About Exposure at Default on Contingent Credit Lines? – A Survey of the Literature, Empirical Analysis and Models. Working Paper, April 2011.
Jiménez, G., J.A. Lopez, and J. Saurina. 2008. Calibrating Exposure at Default for Corporate Credit Lines. Journal of Risk Management in Financial Institutions 2, 121-129.
Jiménez, G., J.A. Lopez, and J. Saurina. 2009. Empirical Analysis of Corporate Credit Lines. Review of Financial Studies 22, 5069-5098.
Kim, M.-J. 2008. Stress EAD: Experience of 2003 Korea Credit Card Distress. Journal of Economic Research 13, 73-102.
Leow, M., and J. Crook. 2016. A New Mixture Model for the Estimation of Credit Card Exposure at Default. European Journal of Operational Research 249, 487-497.
Moral, G. 2011. EAD Estimates for Facilities with Explicit Limits. In: Engelmann, B., and R. Rauhmeier (Ed.), The Basel II Risk Parameters, 2nd Ed. Springer, Berlin.
Qi, M. 2009. Exposure at Default of Unsecured Credit Cards. Working Paper, OCC Economics Working Paper 2009-2.
Qi, M., and X. Zhao. 2011. Comparison of Modeling Methods for Loss Given Default. Journal of Banking and Finance 35, 2842-2855.
Sufi, A. 2009. Bank Lines of Credit in Corporate Finance - An Empirical Analysis. Review of Financial Studies 22, 1057-1088.
Taplin, R., H.M. To, and J. Hee. 2007. Modeling Exposure at Default, Credit Conversion Factors and the Basel II Accord. Journal of Credit Risk 3, 75-84.
Tong, E.N.C., C. Mues, I. Brown, and L.C. Thomas. 2016. Exposure at Default Models with and without the Credit Conversion Factor. European Journal of Operational Research 252, 910-920.
Valvonis, V. 2008. Estimating EAD for Retail Exposures for Basel II Purposes. Journal of Credit Risk 4, 79-109.
Yang, B.H., and M. Tkachenko. 2012. Modeling Exposure at Default and Loss Given Default: Empirical Approaches and Technical Implementation. Journal of Credit Risk 8, 81-102.
Zhao, J.Y., D.W. Dwyer, and J. Zhang. 2011. Usage and Exposures at Default of Corporate Credit Lines: An Empirical Study. Moody’s Analytics. December. 1-19.
Figure 1: Different estimation approaches of CCF/EaD

Panel A displays the fixed-horizon approach. The exposure at default (black) is linked to the balance and limit at the date of reference (gray) exactly one year prior to default (fixed time). Panel B displays the variable-horizon approach. It is a generalization of the fixed-horizon approach. The year before default is subdivided into several time windows with several dates of reference (e.g., each month). The exposure at default is linked to the balance and limit of each date of reference. Panel C displays the standard cohort approach. The observation period consists of general dates of reference (e.g., the first month of each year) by subdividing the full observation period into one-year time windows. The exposure at default is linked to the balance and limit at the starting point of the corresponding time window. Panel D displays the generalized cohort approach. The observation period consists of general dates of reference, but in contrast to the standard cohort approach, each month of the year is used as a date of reference. In the upper half, the date of reference A (e.g., the first month of each year) is displayed. In the lower half, the date of reference B (e.g., the second month of each year) is displayed, leading to different CCF/EaD estimates depending on the relevant date of reference. Again, the exposure at default is linked to the balance and limit at each date of reference. A similar graphical illustration of the first three approaches can be found in Valvonis (2008).

[Figure not reproduced. Panel A: Fixed-horizon approach (date of reference, one-year fixed horizon, date of default). Panel B: Variable-horizon approach (dates of reference, max. one-year horizon, date of default). Panel C: Standard cohort approach (date of reference, one-year time window, date of default). Panel D: Generalized cohort approach (dates of reference A and B, one-year time windows, dates of default). Each panel is drawn along a time axis.]
Figure 2: Frequency distribution of realized CCF, LEQ, EaDF, and EaD

This figure reports frequency distributions of the realized (winsorized) dependent variables credit conversion factor (CCF), loan equivalent factor (LEQ), exposure at default factor (EaDF), and exposure at default (EaD).

[Figure not reproduced. Panel A: Frequency distribution of realized CCF. Panel B: Frequency distribution of realized LEQ. Panel C: Frequency distribution of realized EaDF. Panel D: Frequency distribution of realized EaD. In each panel, the horizontal axis shows the respective variable and the vertical axis shows the frequency.]
Figure 3: CCF for different reference periods

The figure presents the mean CCF and the weighted mean CCF (weighted by the open limit) calculated for different reference periods between 12 months and one month prior to default.

[Figure not reproduced. Horizontal axis: months to default (1 to 12). Vertical axis: CCF. Series: CCF-mean and CCF-weighted-mean.]
Table 1: Empirical studies concerning the credit conversion factor

This table summarizes the CCF/EaD literature regarding mean CCF values and various characteristics of the analyzed data. The country abbreviation AU stands for Australia, CA for Canada, ES for Spain, EU for European Union, KR for South Korea, UK for the United Kingdom, and US for the United States. Agarwal et al. (2006), Banerjee and Canals-Cerdá (2012), Leow and Crook (2016), and Yang and Tkachenko (2012) do not report explicit CCF values. Observations denotes the number of observations in the study, which can either be based on the full sample (f.s.) or on the defaulted sample (d.s.) after filtering the data. For studies with n.a., the number of observations was not available. For more detailed values of the survey by the Bank for International Settlements (2016) see ibid. p. 29.

Authors | Country | Data | Years | CCF | Type | Observations | Borrowers
Agarwal et al. (2006) | US | Bank | 98-01 | n.a. | Retail, Home equity line | 34,384 (f.s.) | 34,384
Araten/Jacobs (2001) | US | Bank | 95-00 | 43% | Corporate, Revolving credit | 1,021 (d.s.) | 399
Asarnow/Marker (1995) | US | Capital market | 88-93 | 60% | Corporate, Loan | n.a. (f.s.) | 575 (facilities)
Bag/Jacobs (2012) | US | Capital market | 2008 | 52.5% | Corporate, Contingent credit line | n.a. (d.s.) | 26
Banerjee/Canals-Cerdá (2012) | US | Credit bureau | 05-10 | n.a. | Retail, Credit card | 22,265,000 (f.s.) | 2,539,000
Bank for International Settlements (2016) | 17 countries | Bank | n.a. | 62%/69% | Corporate/Retail | n.a. | n.a.
Jacobs (2010) | US | Capital market | 85-07 | 63.7% | Corporate, Lines of credit | 3,886 (d.s.) | 683
Jiménez et al. (2009) | ES | Credit bureau | 84-05 | 59.6% | Corporate, Credit line | 2,078,434 (f.s.) | 368,977
Jiménez et al. (2008) | ES | Credit bureau | 84-05 | 59.6% | Corporate, Credit line | n.a. (d.s.) | 4,094
Kim (2008) | KR | Bank | 01-05 | 39% | Corporate, Credit card | 197,810 (d.s.) | 197,810
Leow/Crook (2016) | UK | Bank | 01-10 | n.a. | Retail, Credit card | n.a. (d.s.) | 80,552
Qi (2009) | US | Credit bureau | 99-06 | 166% | Retail, Credit card | 152,657 (d.s.) | 152,657
Taplin et al. (2007) | AU | Bank | n.a. | 25% | Corporate, Credit card | n.a. | n.a.
Tong et al. (2016) | UK | Bank | 01-04 | 51.5% | Retail, Credit card | 10,271 (d.s.) | 10,271
Valvonis (2008) | EU | Bank | 05-06 | 32.4% | Retail, Credit card | n.a. (d.s.) | 3,332
Valvonis (2008) | EU | Bank | 05-06 | 48.0% | Corporate, Credit card | n.a. (d.s.) | 44
Yang/Tkachenko (2012) | CA | Bank | n.a. | n.a. | Corporate | n.a. (d.s.) | 500
Zhao et al. (2011) | US | Bank | 00-10 | 49% | Corporate, Credit line | n.a. (d.s.) | 7,653
Zhao et al. (2011) | US | Capital market | 87-09 | 39% | Corporate, Credit line | n.a. (d.s.) | 286
Table 2: Summary statistics

This table reports summary statistics of our data. Panel A reports the number of account-month observations and the frequency of default events. Panel B provides summary statistics of our (winsorized) variables of interest: credit conversion factor (CCF), loan equivalent factor (LEQ), exposure at default factor (EaDF), and exposure at default (EaD).

Panel A: Number of account-month observations and default events

Measure | Checking account
Number of account-month observations | 2,798,491
Number of account-months with default in the subsequent 12 months | 27,402
Number of defaults | 2,623

Panel B: Summary statistics of realized dependent variables

Variable | Mean | Std. Dev. | q25 | q50 | q75
CCF | 0.730 | 1.674 | 0.000 | 0.084 | 1.138
LEQ | 1.232 | 1.168 | 0.497 | 1.056 | 1.429
EaDF | 0.849 | 0.551 | 0.373 | 0.997 | 1.125
EaD | 1,394 € | 1,567 € | 271 € | 945 € | 2,031 €
Table 3: Comparison of different parameters evaluated at the EaD level

We calculated the in-sample and out-of-sample performance based on the means of 1,000 random 10-fold cross-validations using OLS regressions (for CCF, LEQ, EaDF, and EaD) or historical averages (CCF-mean and EaD-mean). Panel A reports the in-sample performance measures of coefficient of determination (R²), relative absolute error (RAE), root mean squared error (RMSE), mean absolute error (MAE), absolute error (ABS), and the relative error (REL). The displayed values are at the EaD level, meaning that we transformed CCF, LEQ, EaDF, and CCF-mean into EaD estimates. We report standard deviations in parentheses. The best values are highlighted in bold. Panel B reports the corresponding out-of-sample performance measures.

Panel A: In-sample predictive accuracy

Parameter | R² | RAE | RMSE | MAE | ABS | REL
CCF | 0.622 (0.000) | 45.71 (0.002) | 963.4 (0.042) | 520.4 (0.000) | 0.335 (0.000) | 0.000 (0.000)
LEQ | 0.433 (0.000) | 53.32 (0.002) | 1,179 (0.109) | 607.0 (0.021) | -141.3 (0.006) | -0.101 (0.000)
EaDF | 0.588 (0.000) | 50.04 (0.001) | 1,005 (0.021) | 569.6 (0.009) | -63.81 (0.000) | -0.046 (0.000)
EaD | 0.618 (0.000) | 48.90 (0.001) | 968.6 (0.014) | 556.7 (0.011) | 0.000 (0.000) | 0.000 (0.000)
CCF-mean | 0.549 (0.000) | 52.51 (0.000) | 1,052 (0.007) | 597.7 (0.000) | 163.8 (0.002) | 0.118 (0.000)
EaD-mean | 0.000 (0.000) | 100.0 (0.000) | 1,567 (0.005) | 1,138 (0.002) | 0.000 (0.000) | 0.000 (0.000)

Panel B: Out-of-sample predictive accuracy

Parameter | R² | RAE | RMSE | MAE | ABS | REL
CCF | 0.620 (0.001) | 45.78 (0.016) | 963.9 (0.579) | 521.1 (0.167) | 0.327 (0.177) | 0.000 (0.000)
LEQ | 0.432 (0.000) | 53.39 (0.017) | 1,178 (1.548) | 607.8 (0.168) | -141.3 (0.176) | -0.101 (0.000)
EaDF | 0.587 (0.001) | 50.12 (0.011) | 1,006 (0.541) | 570.6 (0.104) | -63.88 (0.097) | -0.046 (0.000)
EaD | 0.616 (0.001) | 49.00 (0.012) | 969.9 (0.430) | 557.8 (0.115) | -0.054 (0.099) | 0.000 (0.000)
CCF-mean | 0.548 (0.001) | 52.51 (0.008) | 1,052 (0.314) | 597.8 (0.026) | 163.8 (0.020) | 0.118 (0.000)
EaD-mean | 0.000 (0.000) | 100.0 (0.000) | 1,566 (0.322) | 1,138 (0.025) | 0.000 (0.001) | 0.001 (0.000)
Table 4: Comparison of different parameters evaluated at the CCF level

We calculated the in-sample and out-of-sample performance based on the means of 1,000 random 10-fold cross-validations using OLS regressions (for CCF, LEQ, EaDF, and EaD) or historical averages (CCF-mean and EaD-mean). Panel A reports the in-sample performance measures of coefficient of determination (R²), relative absolute error (RAE), root mean squared error (RMSE), mean absolute error (MAE), absolute error (ABS), and the relative error (REL). The displayed values are at the CCF level, meaning that we transformed LEQ, EaDF, EaD, and EaD-mean into CCF estimates. We report standard deviations in parentheses. The best values are highlighted in bold. Panel B reports the corresponding out-of-sample performance measures.

Panel A: In-sample predictive accuracy

Parameter | R² | RAE | RMSE | MAE | ABS | REL
CCF | 0.193 (0.000) | 78.56 (0.001) | 1.504 (0.000) | 0.907 (0.000) | 0.000 (0.000) | 0.000 (0.000)
LEQ | -9,132 (9,221) | 549.1 (1.089) | 477.3 (2.733) | 11.39 (0.023) | 2.498 (0.022) | 2.417 (0.021)
EaDF | -13,837 (115.62) | 464.8 (0.802) | 195.3 (1.158) | 5.367 (0.009) | 0.028 (0.009) | 0.038 (0.013)
EaD | -1,904,017 (6,053.13) | 7,241 (7.833) | 2,304 (1.758) | 83.61 (0.090) | 80.13 (0.090) | 109.9 (0.124)
CCF-mean | 0.000 (0.000) | 100.0 (0.000) | 1.674 (0.000) | 1.155 (0.000) | 0.000 (0.000) | 0.000 (0.000)
EaD-mean | -9,108,667 (943,113) | 1,747 (0.893) | 5,050 (0.922) | 201.7 (0.008) | 182.1 (0.008) | 249.6 (0.021)

Panel B: Out-of-sample predictive accuracy

Parameter | R² | RAE | RMSE | MAE | ABS | REL
CCF | 0.189 (0.000) | 78.74 (0.017) | 1.507 (0.000) | 0.909 (0.000) | 0.000 (0.000) | 0.002 (0.001)
LEQ | -9,501 (1,136,440) | 553.2 (11.55) | 323.1 (23.42) | 11.50 (0.209) | 2.419 (0.200) | 2.343 (0.325)
EaDF | -14,256 (1,151) | 468.1 (7.642) | 130.3 (8.578) | 5.408 (0.085) | -0.000 (0.084) | 0.005 (0.134)
EaD | -1,919,390 (51,143) | 7,259 (86.64) | 2,187 (61.26) | 83.81 (0.986) | 80.29 (0.984) | 110.2 (1.568)
CCF-mean | 0.000 (0.000) | 100.0 (0.000) | 1.674 (0.000) | 1.155 (0.000) | 0.000 (0.000) | 0.002 (0.001)
EaD-mean | -9,115,929 (3,777,000,000) | 17,470 (49.464) | 4,851 (111.3) | 201.8 (0.069) | 182.1 (0.069) | 249.9 (1.809)
Table 5: Comparison of different approaches for CCF estimation at the CCF level
Panel A: In-sample predictive accuracy CCF: standard cohort
R2 0.175 0.179 0.184
RAE 78.41 79.68 82.03
RMSE 1.512 1.517 1.522
MAE 0.915 0.920 0.929
ABS -0.061 -0.001 0.040
REL -0.083 -0.001 0.054
(0.000)
(0.013)
(0.000)
(0.000)
(0.000)
(0.000)
CCF: generalized cohort CCF: variable CCF: fixed
(0.000)
(0.020)
0.206
78.52
(0.000)
(0.004)
0.193
78.56
(0.000)
(0.001)
0.941
5.727
(0.000)
(0.001)
Panel B: Out-of-sample predictive accuracy
(0.000)
(0.000)
(0.000)
(0.000)
1.491
0.907
0.000
0.000
(0.000)
(0.000)
(0.000)
(0.000)
1.504
0.907
0.000
0.000
(0.000)
(0.000)
(0.000)
(0.000)
1.617
1.080
0.000
0.000
(0.000)
(0.000)
(0.000)
(0.000)
RAE 78.58 79.87 82.22
RMSE 1.515 1.521 1.526
MAE 0.917 0.923 0.931
ABS -0.061 -0.001 0.040
REL -0.082 0.000 0.056
(0.001)
(0.043)
(0.001)
(0.001)
(0.001)
(0.001)
CCF: generalized cohort CCF: variable CCF: fixed
(0.001)
(0.054)
0.159
80.86
(0.001)
(0.059)
0.189
78.74
(0.000)
(0.017)
0.937
(0.001)
(0.001)
(0.001)
(0.001)
1.535
0.934
-0.001
0.000
(0.001)
(0.001)
(0.001)
(0.001)
1.507
0.909
0.000
0.002
(0.000)
(0.000)
(0.000)
(0.001)
5.923
1.669
1.117
-0.002
0.000
(0.019)
(0.007)
(0.004)
(0.004)
(0.000)
(0.001)
R2 0.171 0.175 0.180
CCF: standard cohort
We calculated the in-sample and out-of-sample performance based on the means of 1,000 random 10-fold cross-validations using OLS regressions. Panel A reports the in-sample performance measures of coefficient of determination (R2), relative absolute error (RAE), root mean squared error (RMSE), mean absolute error (MAE), absolute error (ABS), and the relative error (REL). The displayed values are at the CCF level using different estimation approaches. For the standard cohort approach we present the minimum, mean, and maximum of the 12 different reference months. We report standard deviations in parentheses. Panel B reports the corresponding out-of-sample performance measures.
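The six performance measures named in the table notes can be illustrated with a minimal sketch. The formulas below are assumptions rather than definitions taken from the notes: RAE is expressed in percent relative to a benchmark that always predicts the sample mean, ABS is taken as the mean of estimate minus realization, and REL as ABS divided by the mean realization. The function name `performance_measures` and the input values are hypothetical.

```python
import math


def performance_measures(actual, predicted):
    """Return R2, RAE (in percent), RMSE, MAE, ABS, and REL for paired values.

    Assumed conventions (not stated in the table notes): errors are
    estimate minus realization, and REL scales ABS by the mean realization.
    """
    n = len(actual)
    mean_actual = sum(actual) / n
    errors = [p - a for a, p in zip(actual, predicted)]
    sse = sum(e * e for e in errors)                     # squared errors of the model
    sst = sum((a - mean_actual) ** 2 for a in actual)    # squared errors of the mean benchmark
    sae = sum(abs(e) for e in errors)                    # absolute errors of the model
    sat = sum(abs(a - mean_actual) for a in actual)      # absolute errors of the mean benchmark
    abs_err = sum(errors) / n
    return {
        "R2": 1.0 - sse / sst,
        "RAE": 100.0 * sae / sat,
        "RMSE": math.sqrt(sse / n),
        "MAE": sae / n,
        "ABS": abs_err,
        "REL": abs_err / mean_actual,
    }


# Hypothetical realized CCFs, for illustration only (not data from the paper).
realized = [0.2, 0.9, 1.4, 0.5, 1.0]
# Historical-average benchmark: every observation receives the sample mean.
benchmark = [sum(realized) / len(realized)] * len(realized)
measures = performance_measures(realized, benchmark)
print(measures)
```

Under these assumed definitions, an in-sample historical average yields an R2 of zero and an RAE of 100, which is the pattern of the CCF-mean rows in Table 4.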
Table 6: Summary statistics for CCF, EaD, and PD
The table displays historically estimated means for CCF, EaD, and PD using different estimation approaches. For the standard cohort approach we present the minimum, mean, and maximum of the 12 different reference months.

       Standard cohort                 Generalized cohort   Variable   Fixed
       min       mean      max
CCF    0.648     0.746     0.824       0.745                0.745      0.880
EaD    1,379 €   1,407 €   1,440 €     1,407 €              1,407 €    1,410 €
PD     0.0110    0.0115    0.0119      0.0115               0.0115     0.0008
Table 7: Comparison of different approaches for CCF-estimation at EL-level based on historical averages
The table displays different estimates of the EL calculated from CCF and PD estimates that are based on different approaches. For estimation of CCF and PD, we implemented the historical averages based on the different approaches (standard cohort, generalized cohort, variable-, and fixed-horizon approach). For the cohort approach, we present the minimum, mean, and maximum of the 12 different reference months. Without loss of generality, we assume that LGD = 1.

CCF-mean            PD-mean                   R2       RAE      RMSE     MAE      ABS      REL
Standard cohort     Standard cohort     min   -0.000   104.3    225.6    33.27    1.385    0.086
                                        mean  0.002    114.4    225.9    36.50    4.643    0.288
                                        max   0.005    123.1    226.1    39.28    7.449    0.462
Standard cohort     Generalized cohort  min   0.000    106.6    225.6    34.00    2.130    0.132
                                        mean  0.002    114.2    225.8    36.45    4.587    0.285
                                        max   0.005    120.3    226.1    38.39    6.538    0.406
Standard cohort     Variable            min   0.000    106.56   225.6    34.00    2.128    0.132
                                        mean  0.002    114.2    225.8    36.44    4.584    0.284
                                        max   0.005    120.3    226.1    38.39    6.535    0.406
Standard cohort     Fixed               min   -0.004   54.32    226.5    17.33    -14.88   -0.923
                                        mean  -0.004   54.840   226.5    17.50    -14.71   -0.913
                                        max   -0.004   55.25    226.5    17.63    -14.58   -0.905
Generalized cohort  Standard cohort     min   0.002    111.6    225.8    35.60    3.726    0.231
                                        mean  0.002    114.3    225.8    36.46    4.600    0.285
                                        max   0.002    116.7    225.9    37.25    5.402    0.335
Generalized cohort  Generalized cohort        0.002    114.2    225.8    36.44    4.584    0.284
Generalized cohort  Variable                  0.002    114.2    225.8    36.43    4.568    0.283
Generalized cohort  Fixed                     -0.004   54.84    226.5    17.50    -14.71   -0.913
Variable            Standard cohort     min   0.002    111.6    225.8    35.61    3.733    0.232
                                        mean  0.002    114.3    225.8    36.47    4.607    0.286
                                        max   0.002    116.8    225.9    37.25    5.410    0.336
Variable            Generalized cohort        0.002    114.2    225.8    36.44    4.578    0.284
Variable            Variable                  0.002    114.2    225.8    36.43    4.575    0.284
Variable            Fixed                     -0.004   54.84    226.5    17.50    -14.71   -0.913
Fixed               Standard cohort     min   -0.002   121.7    226.2    38.82    6.962    0.432
                                        mean  -0.002   124.8    226.3    39.82    7.978    0.495
                                        max   -0.001   127.7    226.3    40.74    8.911    0.553
Fixed               Generalized cohort        -0.002   124.7    226.3    39.79    7.944    0.493
Fixed               Variable                  -0.002   124.7    226.3    39.78    7.940    0.493
Fixed               Fixed                     -0.004   55.55    226.5    17.72    -14.48   -0.899
Table 8: Comparison of different approaches for CCF estimation at the EL level based on regression models
The table displays different estimates of the EL calculated from CCF and PD estimates that are based on different approaches. For estimation of CCF and PD, we implemented OLS regressions and probit regressions, respectively, based on the different approaches (standard cohort, generalized cohort, variable-, and fixed-horizon approach). For the cohort approach, we present the minimum, mean, and maximum of the 12 different reference months. Without loss of generality, we assume that LGD = 1. The best values for EL estimates regarding defaults within one year are highlighted in bold.

CCF                 PD                        R2       RAE      RMSE     MAE      ABS      REL
Standard cohort     Standard cohort     min   0.098    85.07    214.0    27.14    -1.300   -0.081
                                        mean  0.101    87.56    214.4    27.94    -0.406   -0.025
                                        max   0.104    89.40    214.7    28.52    0.332    0.021
Standard cohort     Generalized cohort  min   0.100    86.31    214.1    27.54    -0.837   -0.052
                                        mean  0.102    87.48    214.3    27.91    -0.422   -0.026
                                        max   0.103    88.41    214.2    28.21    -0.106   -0.007
Standard cohort     Variable            min   0.100    86.32    214.2    27.54    -0.850   -0.053
                                        mean  0.101    87.50    214.3    27.92    -0.433   -0.027
                                        max   0.102    88.43    214.5    28.21    -0.117   -0.007
Standard cohort     Fixed               min   0.006    52.97    225.4    16.90    -15.12   -0.938
                                        mean  0.006    53.04    225.4    16.93    -15.09   -0.936
                                        max   0.006    53.12    225.5    16.95    -15.07   -0.935
Generalized cohort  Standard cohort     min   0.100    86.31    214.2    27.54    -0.861   -0.053
                                        mean  0.101    87.55    214.4    27.94    -0.408   -0.025
                                        max   0.103    88.65    214.5    28.29    0.079    0.005
Generalized cohort  Generalized cohort        0.102    87.46    214.3    27.91    -0.430   -0.027
Generalized cohort  Variable                  0.101    87.48    214.3    27.91    -0.438   -0.027
Generalized cohort  Fixed                     0.006    53.04    225.4    16.93    -15.09   -0.936
Variable            Standard cohort     min   0.100    86.20    214.1    27.50    -0.895   -0.056
                                        mean  0.101    87.45    214.3    27.90    -0.440   -0.027
                                        max   0.103    88.54    214.5    28.25    0.041    0.003
Variable            Generalized cohort        0.102    87.37    214.2    27.88    -0.457   -0.029
Variable            Variable                  0.102    87.39    214.3    27.88    -0.469   -0.029
Variable            Fixed                     0.006    53.03    225.4    16.92    -15.09   -0.937
Fixed               Standard cohort     min   0.098    89.01    214.4    28.40    0.089    0.006
                                        mean  0.099    90.33    214.6    28.82    0.568    0.035
                                        max   0.101    91.76    214.7    29.28    1.169    0.073
Fixed               Generalized cohort        0.100    90.24    214.5    28.79    0.550    0.034
Fixed               Variable                  0.099    90.27    214.6    28.80    0.543    0.034
Fixed               Fixed                     0.006    53.25    225.4    16.99    -15.02   -0.932
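The EL estimates compared in Tables 7 and 8 combine a CCF estimate with a PD estimate under LGD = 1. A minimal sketch of that combination for a single account follows. It assumes the common convention that the CCF converts the undrawn part of the limit into exposure, EaD = drawn + CCF × (limit − drawn), and that EL = PD × LGD × EaD. The function names, the drawn amount, and the limit are hypothetical; only the CCF and PD means are taken from Table 6.

```python
def ead_from_ccf(drawn, limit, ccf):
    """EaD implied by a CCF: drawn amount plus the converted share of the undrawn limit."""
    return drawn + ccf * (limit - drawn)


def expected_loss(pd, ead, lgd=1.0):
    """EL = PD * LGD * EaD, with LGD defaulting to 1 as assumed in Tables 7 and 8."""
    return pd * lgd * ead


# Hypothetical account: the drawn amount and the limit are made up for illustration.
drawn, limit = 1000.0, 2000.0
# Historical averages of the generalized cohort approach as reported in Table 6.
ccf_mean, pd_mean = 0.745, 0.0115

ead = ead_from_ccf(drawn, limit, ccf_mean)
el = expected_loss(pd_mean, ead)
print(f"EaD = {ead:.2f}, EL = {el:.4f}")
```

Replacing `pd_mean` with the fixed-horizon value of 0.0008 from Table 6 shrinks the EL estimate by more than 90 percent, in line with the strongly negative REL values in the fixed-horizon PD rows.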