Evaluating measures of adverse financial conditions




G Model

ARTICLE IN PRESS

JFS-449; No. of Pages 16

Journal of Financial Stability xxx (2016) xxx–xxx

Contents lists available at ScienceDirect

Journal of Financial Stability journal homepage: www.elsevier.com/locate/jfstabil

Evaluating measures of adverse financial conditions☆

Mikhail V. Oet a,b,∗, Dieter Gramlich c, Peter Sarlin d,e

a Weatherhead School of Management, Case Western Reserve University, 11119 Bellflower Road, Cleveland, OH 44106, United States
b Federal Reserve Bank of Cleveland, 1455 E 6th St., Cleveland, OH 44114, United States
c Baden-Wuerttemberg Cooperative State University, Marienstraße 20, 89518 Heidenheim, Germany
d Department of Economics, Hanken School of Economics, Arkadiankatu 7, FI-00100 Helsinki, Finland
e RiskLab Finland at Arcada University of Applied Sciences, Jan-Magnus Janssons Plats 1, FI-00560 Helsinki, Finland

Article info

Article history:
Received 5 November 2014
Received in revised form 15 June 2016
Accepted 29 June 2016
Available online xxx

JEL classifications: G01, G18, E32, E37

Keywords: Measures of systemic conditions; Evaluation of information quality; Signal extraction approach; Pervasiveness; Persistence; Severity; Noise-to-signal ratio; Relative usefulness; Information value

Abstract

Timely identification and anticipation of adverse conditions in the financial system are critical for macroprudential policy. However, there is no consensus on how to evaluate the quality of systemic measures. This paper provides a framework to compare measures of systemic conditions. We illustrate the proposed tests with a case study of US measures from 1976 to 2013. We find that measures which include information from multiple markets improve identification of critical system states. However, tested measures show limited capacity to anticipate critical episodes.

© 2016 Elsevier B.V. All rights reserved.

1. Introduction

☆ The authors thank Alistair Milne, Joseph Haubrich, Kalle Lyytinen, Lucia Alessi, Agostino Capponi, Myong-Hun Chang, Corinne Coen, John M. Dooley, Stephen J. Ong, and the anonymous referees for constructive feedback. The authors are grateful to Monica Reusser for editorial suggestions. The authors also thank the participants of the Conference on Data Standards, Information and Financial Stability (Loughborough University, April 11–12, 2014), the 13th INFINITI Conference on International Finance (Ljubljana, June 8–9, 2015), and the 5th International Conference of the Financial Engineering and Banking Society (Nantes, June 11–13, 2015) for helpful comments and discussion. The views expressed are those of the authors and are not to be considered as the views of the Federal Reserve Bank of Cleveland or the Federal Reserve System.
∗ Corresponding author at: Weatherhead School of Management, Case Western Reserve University, 11119 Bellflower Road, Cleveland, OH 44106, United States.
E-mail addresses: [email protected], [email protected] (M.V. Oet), [email protected] (D. Gramlich), peter@risklab.fi (P. Sarlin).

The complexity of the financial system continues to challenge supervisors and policymakers.1 Such challenges include not only concerns about the safety and soundness of individual institutions, but also systemwide risks. Policymakers agree that control of systemwide risks must identify changes in the system (examples of financial system transformations are highlighted in Fig. 1). Implementations of dynamic macroprudential policy have been

1 The concept of the economy as a complex and adaptive system was pioneered by Holland (1975, 1988) in his work on adaptive nonlinear networks. Brock and Hommes (1997, 1998) study financial markets as adaptive belief systems. Hommes (2001) extends this approach to markets as nonlinear adaptive evolutionary systems. See Arthur (1995) and Farmer and Lo (1999) for an analysis of heterogeneity in financial markets, Hollingsworth et al. (2005) for the socioeconomic implications and Judge (2012) for analysis of complexity caused by the fragmentation of financial markets.

http://dx.doi.org/10.1016/j.jfs.2016.06.008 1572-3089/© 2016 Elsevier B.V. All rights reserved.

Please cite this article in press as: Oet, M.V., et al., Evaluating measures of adverse financial conditions. J. Financial Stability (2016), http://dx.doi.org/10.1016/j.jfs.2016.06.008


Fig. 1. Percentage of total financial assets held by each financial sector: US data, 1952–2013. Source: Board of Governors of the Federal Reserve System (2014).

suggested by the Bank of England (BoE, 2011) and the IMF (Lim et al., 2011).2 To efficiently implement prudential policies, regulators need measures that are able to identify and anticipate adverse conditions in the financial system (Borio, 2003). The problem we confront in this study is the evaluation of these measures.

Multiple coincident and early-warning measures are available to assess systemwide risks.3 A substantial research effort has also focused on the problem of evaluating early-warning measures (Edison, 2003; Davis and Karim, 2008; Drehmann and Juselius, 2014; Holopainen and Sarlin, 2015). However, few papers address the practical issue of evaluating coincident measures and how they might be used by policymakers.4

The following questions are addressed in this paper: First, how can the suitability of systemic measures for policy be assessed? Second, what are the empirical findings from such evaluation?

This paper proposes a methodology to evaluate both coincident and early-warning measures of systemic conditions. We then apply this methodology in a case study of US data from 1976 to 2013. We show how the strength and consistency of association between volatility and alternative measures of US financial conditions varies. However, few of the measures considered provide reliable early-warning out of sample. Hence the available US data appears more suitable for monitoring adverse conditions than for anticipating them.

The paper is organized as follows. Section 2 traces the development of evaluation methods for the binary classification problem across the literature. Section 3 proposes three methodological contributions to support the assessment of systemic condition

2 Similarly, the Basel Accords continually enhance the flexibility of banking regulation to keep pace with financial system changes.
3 These include measures intended to continually monitor the cyclical buildup of widespread imbalances, as well as early-warning indicators of exuberance, excessive change, and misalignments. Overviews are given by Davis and Karim (2008), Gramlich et al. (2010), Babecký et al. (2013), and Holopainen and Sarlin (2015).
4 Kliesen et al. (2012) survey the composition of available coincident measures. Gallegati (2014) applies wavelet analysis to compare the early-warning properties of several coincident measures.

measures. First, multidimensional signal extraction enables the search for optimal systemic measurement. Second, the classification of system states is improved by considering the severity, persistence, and pervasiveness of volatility. Third, an information value statistic is used to assess the quality of systemic measures across a diverse range of system states. Section 4 applies the proposed evaluation framework to US systemic measures from 1976 to 2013. In this case study, we confirm that measures based on multiple markets identify critical states better than more narrowly constructed alternatives. In addition, we find that considerations of level and change in system conditions are relevant to policymakers’ decisions. Section 5 concludes with a discussion of this study’s implications.

2. Literature review

The literature offers many measures of financial system conditions. Coincident measures seek to identify current system conditions. Early-warning measures seek to anticipate potentially adverse conditions.

Coincident measures include financial condition indexes (FCIs) and financial stress indexes (FSIs). FCIs assess the impact of deviations of asset prices from long-term trends (Bordo et al., 2000; Swiston, 2008; inter alia). The notion of FSIs varies widely from systemic excitation (Korinek, 2011) to measurement of the demand-supply imbalance for financial goods (Borio and Lowe, 2002; Lo Duca and Peltonen, 2013), to force exerted on economic agents by changing expectations (Illing and Liu, 2006).

There is little consensus on the choice of these measures. This is particularly evident in coincident measures, where both policy goals and conceptual definitions vary widely. Policy goals include inter alia identification of adverse conditions (Carlson et al., 2012), differentiation from cyclical activity (Hatzius et al., 2010; Brave and Butters, 2012), guiding monetary policy (Hakkio and Keeton, 2009), and detection of system instability (Holló et al., 2012).
Early-warning measures (EWMs) include macroeconomic and institutional indicators of exuberance, excessive changes, and overall build-up of imbalances. Macroeconomic EWMs detail the systemwide imbalances which lead the financial cycle toward


crises (Kaminsky et al., 1998; Borio and Drehmann, 2009 inter alia). Institutional EWMs specify leading institutional imbalances to explain the build-up of macroeconomic stress (Hanschel and Monnin, 2005; Dridi et al., 2012; Oet et al., 2013). Recent studies in economics (Rosser, 2013; Elsner et al., 2014), finance (Mantegna and Stanley, 1999), and social sciences (Dopfer and Potts, 2007; Saviotti and Metcalfe, 1991; Scott and Davis, 2015; Young, 2001) agree that the notions of critical states, crises, instability, and distress are multidimensional concepts.5 Despite this theoretical consensus, empirical measurement of financial system conditions continues to be treated by means of binary classification (e.g., Kaminsky et al., 1998; Edison, 2003). Recent evaluation studies have followed the suggestions of Brock et al. (2003) and Leeper and Sargent (2003) to improve traditional binary classification with a policymakers’ loss function (Alessi and Detken, 2011; Sarlin, 2013) and the robust analysis of the measurement uncertainty (Holopainen and Sarlin, 2015). However, the gap between the multidimensional systemic conditions and the binary classification of their measures remains to be addressed with a more general classification approach. In any classification, the measures are calibrated against observable critical states. Under binary classification, a unidimensional event variable is constructed. Typically, this involves either signal extraction (Alessi and Detken, 2011), discrete binary choice methods (Berg and Pattillo, 1999; Andreou et al., 2009; Gerdesmeier et al., 2010), or a range of machine learning methods (Holopainen and Sarlin, 2015). Signaling extracts a binary event data point whenever a single reference time series exceeds a predetermined threshold. For example, Kaminsky et al. (1998) define currency crises to occur when their market-pressure index exceeds its mean by more than three standard deviations. 
Lo Duca and Peltonen (2013) identify systemic events as those in which their financial stress index is above the 90th country-specific percentile. By contrast, discrete choice models apply thresholds to extract an estimated probability, as Holopainen and Sarlin (2015) do with machine learning.

In prior studies, the event-variable construction has been operationalized by multiple means. Commonly, a dataset of observed crises is used (Holló et al., 2012; Laeven and Valencia, 2013). To remedy the problem of very few crisis observations in a particular system,6 researchers have also used recessions (Hatzius et al., 2010), surveys (Illing and Liu, 2006), policy interventions (Carlson et al., 2012), episodes of distress (Oet et al., 2013), volatility-based binary benchmarks (Oet et al., 2015b), and news (Rönnqvist and Sarlin, 2015).

Traditionally, binary classifiers are evaluated using a two-way contingency table. This table classifies predictions by counts of the four possible outcomes: true positive, false positive (a Type II error), false negative (a Type I error), and true negative (e.g., Kaminsky and Reinhart, 1999). Kaminsky et al. (1998) and Borio and Lowe (2002) summarize this analysis with a noise-to-signal ratio (NTSR), which is the ratio of the Type II error rate to one minus the Type I error rate. An NTSR lower than one indicates the measure is beneficial (Kaminsky et al., 1998).

Recent contributions introduce policymakers’ loss function as a parameterized modification of the NTSR statistic. The loss-function parameters reveal policymakers’ loss aversion to misclassification (Demirgüç-Kunt and Detragiache, 2000; Alessi and

5 Interested readers are also referred to seminal contributions by Simon (1957, 1962, 1979, 1991), Thompson (1967), Levins (1968), and Mohr (1982).
6 For example, the list of banking crises proposed by Laeven and Valencia (2013) finds only two US episodes between 1970 and 2013: the savings and loan crisis in 1988 and the financial crisis starting in 2007. The approach can also be criticized for its focus on systemic banking crises. This inherently misses critical disturbances in the broader financial system.


Detken, 2011, 2014) and the costs of these errors (Bussiere and Fratzscher, 2008; Sarlin, 2013).7 We complete our review of the development of evaluation methods for financial system conditions by noting some relevant developments in information science, empirical macroeconomics, and empirical finance. Collectively, these developments extend the means to evaluate measures of multidimensional financial system conditions. Recent developments in information science show that partitioning the binary event variable into a multidimensional set of critical states enables the search for an optimal measurement algorithm. This literature is founded on analyzing information as a measurable quantity that differentiates one series of signals from another (Shannon and Weaver, 1949). Information signals are transmitted by channels, generally with noise, and subject to maximum channel capacity. Thus, as additional information channels are introduced, the number of information signals (e.g., about critical states in the system) can be expected to increase (Shannon, 1949). Wolpert and Macready (1995, 1997) show that it is impossible to differentiate among alternative algorithms that search for and optimize this type of signal function (e.g., crises or their costs). These findings are known as no-free-lunch (NFL) theorems. Subsequent mathematical research (Streeter, 2003; Igel and Toussaint, 2005) proves that NFL does not hold when the probability distribution of the signal function is variant. To generalize the evaluation of classifiers, the information-science literature introduces the information value (IV) statistic (Kullback, 1959). IV is designed to assess the quality of classification across the series by dividing it into a series of bins and quantifying the noise-to-signal properties of each. 
The IV statistic has been used in multinomial discrete choice models, such as credit risk scorecards, where a variety of information channels with borrower characteristics serve as classification inputs (Siddiqi, 2006).

Empirical findings in finance and macroeconomics indicate that the volatility of financial markets is continually changing in severity, persistence, and pervasiveness. The macroeconomic literature contributes to the measurement of systemic conditions by building on its longstanding interest in the properties of business and financial cycles, from the classic studies of Wallis and Moore (1941), Moore (1954, 1967), and Zarnowitz (1985) to the recent contribution of Stock and Watson (2002). This stream of literature describes the cyclical properties of measures along the dimensions of severity, persistence over time, and pervasiveness across underlying system partitions (Banerji, 1999).8

Empirical-finance findings in market-volatility clustering (Fama and French, 1989; Schwert, 1989; Shiller, 1989) are also relevant to the modeling of system conditions. Fama and French (1989, 23) find that spread-based patterns of “common stocks and long-term bonds contain a term or maturity premium that has a clear business-cycle pattern (low near peaks, high near troughs).” They also find that spreads “contain a risk premium that is related to longer-term aspects of business conditions. The variation through time in this premium is stronger for low-grade bonds than for high-grade bonds and stronger for stocks than for bonds.” Shiller (1989, 1) finds that “Financial market prices, prices of stocks, bonds, foreign exchange, and other investment assets, have shown striking changes in volatility through time. For each of these kinds of assets there are years when prices show enormous unpredictable movements from day to day or month to month, and there are years of

7 The current literature debates which of the two approaches to policymakers’ loss function should be preferred. See Section 4 for empirical findings and Section 5 for commentary.
8 Alternatively, these cyclical properties are studied as depth, duration, and diffusion (e.g., Moore, 1967).


stable, uneventful markets.” As a result, policymakers are particularly interested in monitoring market volatility patterns for changes in the underlying systemic conditions (Oet et al., 2015a). In this context, the policy literature recognizes that policy actions may be considered both from level and change perspectives (Carlstrom and Fuerst, 2014). Under the level view, policymakers consider how close the system conditions are to those under which policy action is warranted. Under the change view, policymakers may also consider how fast these system conditions are changing.

3. Methodology

This section presents a methodological framework for evaluating measures of financial system conditions. We begin by introducing evaluations as a classification problem in the policy context. We then discuss how designated tests may help macroprudential regulators select between competing measures of financial conditions. Finally, we discuss how standard time-series methods are used to forecast coincident series.

3.1. Classification problem

Under binary classification, evaluation involves a comparison of the measures of systemic conditions $S_{i,t}^{x}$ with the critical states of the system $C_{t}^{x}$. We compare $S_{i,t}^{x}$ and $C_{t}^{x}$ in levels and differenced ($x \in \{\mathrm{lev}, \mathrm{diff}\}$) to determine the duration and onset of each critical episode. When a crisis occurs, policy is either implemented to the benefit of the system (true positive, TP) or not implemented with a detrimental effect (false negative, FN). If a crisis does not occur, policymakers can implement an unnecessary and burdensome policy (false positive, FP) or efficiently abstain from implementing policy (true negative, TN). We assume that the costs $c_{FN}$ and $c_{FP}$ are non-negative, while $c_{TP}$ and $c_{TN}$ are non-positive costs (Elkan, 2001; Sarlin, 2013). The cost of not implementing policy in times of crisis is $c_1 = c_{FN} - c_{TP}$, and the cost of implementing policy when there is no crisis is $c_2 = c_{FP} - c_{TN}$. We denote by $P_1 = (TP_i + FN_i)/(TP_i + FP_i + FN_i + TN_i)$ the unconditional probability that there is a crisis and $P_2 = 1 - P_1$ the unconditional probability that no crisis will occur.

The use of binary classification as an approach to evaluate systemic measures can, however, be criticized. First, it is conceptually inappropriate to evaluate measures of a continuous phenomenon with binary classification, as it involves a loss of information. Second, the no free lunch theorems of statistical inference (Wolpert and Macready, 1995, 1997) state that when the probability distribution of objective function (here critical states) is invariant for all alternative measurement algorithms, the computational cost of inference is the same for all alternative measures. From this standpoint, variability in the critical states is desirable as it supports optimal measurement of the critical states.

Our proposed method adds variance to the distribution of critical states by introducing additional system partitions with distinct volatility patterns and considering their cyclical properties. This approach follows literature in empirical finance (Fama and French, 1989; Schwert, 1989; Shiller, 1989) and empirical macroeconomics (Moore, 1954, 1967; Stock and Watson, 2002). Using this approach in our case study is only possible, however, under the strong maintained assumption that the optimally weighted signal of a crisis is a true representation of the underlying crisis state. This is because we have only signal data and no independent continuous measure of crisis observation.

Because we wish to generalize the problem of classification, we propose the extension of the invariant probability distribution of critical states into a more informative set of critical states of the markets. We generalize the evaluation of classifiers with the information value (IV) statistic (Kullback, 1959) by assessing the quality of classification across the series using a division into a series of bins and quantifying the noise-to-signal properties of each. Moreover, we allow the costs of different types of errors to vary as we make use of the so-called Usefulness measures.9 Accordingly, we compare alternative measures through a three-step procedure which involves (1) defining a multidimensional measure of systemic conditions, (2) comparing identification properties, and (3) comparing early-warning properties.

3.2. Multidimensional signaling

In the first step, we transform the one-dimensional volatility series $v_{i,t}$ into multidimensional patterns of market signals of different severity $\tau_{C}^{x}$, persistence $l_{C}^{x}$, and pervasiveness $w_{C}^{x}$.10 Therefore, the construction of $C_{t}^{x}$ captures several characteristics of crisis. We first determine whether to focus on an extreme level or a rapid change in volatility $v_{i,t}$ ($x \in \{\mathrm{lev}, \mathrm{diff}\}$). The former will identify the period of crisis while the latter is more useful for detecting the onset of a crisis. The severity of systemic conditions required is given by $\tau_{C}^{x}$ in Eqs. (2) and (3). To identify systemic crises we look for persistence over $l_{C}^{x}$ consecutive periods in one market or pervasiveness across $w_{C}^{x}$ of the $k$ financial system markets in Eq. (4).

$$\delta_i\left(v_{i,t}\right) = \frac{v_{i,t} - \bar{v}_i}{\sigma_i} \tag{1}$$

$$I^{\mathrm{lev}}\left(v_{i,t}, \tau_{C}^{\mathrm{lev}}\right) = \begin{cases} 1 & \text{if } \delta_i\left(v_{i,t}\right) > \tau_{C}^{\mathrm{lev}} \\ 0 & \text{else} \end{cases} \tag{2}$$

$$I^{\mathrm{diff}}\left(v_{i,t}, \tau_{C}^{\mathrm{diff}}\right) = \begin{cases} 1 & \text{if } \delta_i\left(v_{i,t}\right) - \delta_i\left(v_{i,t-1}\right) > \tau_{C}^{\mathrm{diff}} \\ 0 & \text{else} \end{cases} \tag{3}$$

$$C_{t}^{x} = \begin{cases} 1 & \text{if } \displaystyle\sum_{i=1}^{k} \prod_{j=0}^{l_{C}^{x}-1} I^{x}\left(v_{i,t-j}, \tau_{C}^{x}\right) \geq 1 \ \text{ or } \ \sum_{i=1}^{k} I^{x}\left(v_{i,t}, \tau_{C}^{x}\right) \geq w_{C}^{x} \\ 0 & \text{else} \end{cases} \tag{4}$$
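The construction of critical states from market volatility can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`standardize`, `indicator`, `critical_states`, `hamming`), the toy data, and the choice of a population standard deviation are all hypothetical, and the standardization step assumes the imbalance is measured in standard deviations from the series mean.

```python
from statistics import mean, pstdev


def standardize(series):
    """Imbalance in the spirit of Eq. (1): distance from the mean in standard deviations.

    Assumes the series is not constant (otherwise the standard deviation is zero).
    """
    m, s = mean(series), pstdev(series)
    return [(v - m) / s for v in series]


def indicator(z, tau, mode="lev"):
    """Eqs. (2)-(3): 1 where the level, or the one-period change, exceeds tau."""
    if mode == "lev":
        return [1 if x > tau else 0 for x in z]
    # The first period has no predecessor, so it is never flagged in "diff" mode.
    return [0] + [1 if z[t] - z[t - 1] > tau else 0 for t in range(1, len(z))]


def critical_states(markets, tau, persistence, pervasiveness, mode="lev"):
    """Eq. (4): critical if any market is flagged for `persistence` consecutive
    periods, or if at least `pervasiveness` markets are flagged in period t."""
    flags = [indicator(standardize(v), tau, mode) for v in markets]
    states = []
    for t in range(len(markets[0])):
        persistent = t + 1 >= persistence and any(
            all(f[t - j] for j in range(persistence)) for f in flags
        )
        pervasive = sum(f[t] for f in flags) >= pervasiveness
        states.append(1 if persistent or pervasive else 0)
    return states


def hamming(a, b):
    """Number of periods in which two equal-length binary series differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical toy data: three markets, eight periods of volatility.
toy_markets = [
    [0, 0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 0, 0],
]
toy_reference = [0, 0, 1, 1, 0, 1, 0, 0]  # hypothetical observed crisis series
toy_states = critical_states(toy_markets, tau=1.0, persistence=2, pervasiveness=2)
toy_distance = hamming(toy_states, toy_reference)
```

In a calibration exercise, one would loop over candidate values of `tau`, `persistence`, and `pervasiveness` and keep the combination with the smallest Hamming distance to the reference series.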

The settings for severity $\tau_{C}^{x}$, persistence $l_{C}^{x}$, and pervasiveness $w_{C}^{x}$ are chosen to maximize the alignment with an observable one-dimensional crisis series (e.g., the observed policy interventions in the US case study of Section 4). Optimal matching can be accomplished by standard nonparametric statistics such as NTSR and Hamming distance (Hamming, 1950).11

3.3. Comparison of identification properties

In the second step, we compare the identification properties for alternative measures of systemic conditions against critical states, using three statistics: noise-to-signal ratio (NTSR), relative usefulness $U_R(\mu)$, and information value (IV). In the setting described by Eqs. (1)–(4), $C_{t}^{x}$ encodes the occurrence and absence of crisis events. The measures $S_{i,t}^{x}$ support macroprudential regulators’ policy decisions by providing distinct perspectives of the financial system. We determine how closely the conditions described by $S_{i,t}^{x}$ match a history of critical states $C_{t}^{x}$ by using NTSR, IV, and $U_R$ collectively.

The NTSR for measure $i$ is defined as $NTSR_i = T2_i / (1 - T1_i)$. The Type I error indicates the proportion of crisis observations which are falsely classified as noncrisis ($T1_i = FN_i/(TP_i + FN_i) = P(S_{i,t}^{x} = 0 \mid C_{t}^{x} = 1)$). Type II error refers to the proportion of noncrisis periods where a crisis was mistakenly signaled ($T2_i = FP_i/(FP_i + TN_i) = P(S_{i,t}^{x} = 1 \mid C_{t}^{x} = 0)$).

The IV statistic has been proposed when choosing between several regressors (Kullback, 1959; Siddiqi, 2006). Observations $S_{i,t}^{x}$ are sorted and distributed into bins $J_j$ ($j = 1, 2, \ldots, k$) delimited by the $k - 1$ quantiles of measure $i$. IV looks at the balance between the fraction of good and bad predictions in each bin ($\mathrm{good}_j(S_{i,t}^{x})$ and $\mathrm{bad}_j(S_{i,t}^{x})$ respectively) following Eqs. (5)–(7).

$$\mathrm{good}_j\left(S_{i,t}^{x}\right) = \sum_{t \in J_j} \left[ S_{i,t}^{x}\, C_{t}^{x} + \left(1 - S_{i,t}^{x}\right)\left(1 - C_{t}^{x}\right) \right] / |J_j| \tag{5}$$

$$\mathrm{bad}_j\left(S_{i,t}^{x}\right) = \sum_{t \in J_j} \left[ \left(1 - S_{i,t}^{x}\right) C_{t}^{x} + S_{i,t}^{x} \left(1 - C_{t}^{x}\right) \right] / |J_j| \tag{6}$$

$$IV_i = \sum_{j=1}^{k} \left[ \mathrm{good}_j\left(S_{i,t}^{x}\right) - \mathrm{bad}_j\left(S_{i,t}^{x}\right) \right] \ln \frac{\mathrm{good}_j\left(S_{i,t}^{x}\right)}{\mathrm{bad}_j\left(S_{i,t}^{x}\right)} \tag{7}$$

If the measure of conditions contains no relevant information, we would expect to see the same proportion of good and bad predictions in each bin, leading to an information value of zero. As $\mathrm{good}_j$ or $\mathrm{bad}_j$ approach zero, $IV_i$ becomes very large. This makes IV a somewhat unstable metric. We select the number of bins $k$ in order to minimize the number of measures for which the IV becomes undefined. Siddiqi (2006) provides a guide whereby an IV of less than 0.1 is weak, IV from 0.1 to 0.3 is average, and IV from 0.3 to 0.5 is strong.

The IV and NTSR metrics do not consider the policymakers’ cost of misidentifying critical outcomes. This offers an opportunity for improvement. Sarlin (2013) defines the Absolute and Relative Usefulness of measure $i$ according to Eqs. (8) and (9), respectively. $L(\mu_i)$ represents the policymaker’s loss function (Sarlin, 2013). The variable $\mu_i$ combines the costs $c_{TN}$, $c_{TP}$, $c_{FN}$, and $c_{FP}$ into a single parameter. Specifically, $\mu_i = c_1/(c_1 + c_2)$ is the fraction of total costs incurred when the policymaker does not implement policy and a crisis occurs. This construction highlights measures that minimize $L(\mu_i)$.

$$U_a(\mu_i) = \min\left(P_1 \mu_i,\; P_2 (1 - \mu_i)\right) - L(\mu_i) \tag{8}$$

$$U_R(\mu_i) = \frac{U_a(\mu_i)}{\min\left(P_1 \mu_i,\; P_2 (1 - \mu_i)\right)} \tag{9}$$

$$L(\mu_i) = \mu_i\, T1_i\, P_1 + (1 - \mu_i)\, T2_i\, P_2 \tag{10}$$

To utilize the above tests, we derive the signals $S_{i,t}^{x}$ from continuous measures of systemic conditions $d_{i,t}$ following Eq. (11) (leveraging the notation from Eqs. (1)–(3)). Crisis signals are generated when the imbalance $\delta_i(d_{i,t})$ for series $i$ (or the differenced imbalance) is greater than the threshold $\tau_{i}^{\mathrm{lev}}$ (respectively, $\tau_{i}^{\mathrm{diff}}$).12 The thresholds $\tau_{i}^{\mathrm{lev}}$ and $\tau_{i}^{\mathrm{diff}}$ may be chosen based upon NTSR, IV, or $U_R(\mu_i)$ to provide the best match between $S_{i,t}^{x}$ and $C_{t}^{x}$.

$$S_{i,t}^{x} = I^{x}\left(d_{i,t}, \tau_{i}^{x}\right) \tag{11}$$

Kaminsky et al. (1998) select the thresholds for each indicator in their study ($\tau_{i}^{\mathrm{lev}}$ and $\tau_{i}^{\mathrm{diff}}$) to minimize the NTSR. However, it is generally possible to achieve an NTSR of zero by eliminating Type II error. A sufficiently high threshold will eliminate Type II error by achieving $FP_i = 0$. This approach will work so long as the Type I error is less than one, which requires only the correct identification of a single critical observation ($TP_i > 0$). Therefore, thresholds chosen to minimize the NTSR will focus on Type II error at the expense of Type I error.

To achieve a satisfactory IV, the measure $S_{i,t}^{x}$ must avoid excessive errors in each bin. Both Type I and Type II errors cause deviation from an IV of 0.5 (which we consider to be optimal). Therefore, selecting the thresholds $\tau_{i}^{x}$ to minimize deviations of IV from 0.5 introduces less bias.

$$\min_{\tau_{i}^{x}} \left| IV_i - 0.5 \right| \tag{12}$$

An alternative which incorporates the cost of Type I and Type II error is to select $\mu_i$ and thresholds $\tau_{i}^{x}$ to maximize the Relative Usefulness. Varying $\mu_i$ will determine which measures $S_{i,t}^{x}$ have value for differing relative costs of Type I error versus Type II error, assuming that the policymaker is risk-neutral.

$$\max_{\tau_{i}^{x},\, \mu_i} U_R(\mu_i) \tag{13}$$

9 This extension avoids the NFL problem and makes meaningful the search for an optimal systemic measurement algorithm.
10 The business cycle literature has historically referred to these three dimensions as depth, duration, and diffusion (Moore, 1954, 1967; Zarnowitz, 1985).
11 Hamming distance measures the number of observations that are different between two time series of equal length.
12 Each threshold $\tau_{i}^{x}$ is used to transform the continuous measure of systemic conditions $d_{i,t}$ into a binary series $S_{i,t}^{x}$ in Eq. (11). These thresholds $\tau_{i}^{x}$ are found by optimizing IV or $U_R$ for series $i$ in Eqs. (12) or (13), respectively. Thresholds $\tau_{i}^{x}$ are distinct from the thresholds $\tau_{C}^{x}$ that specify $C_{t}^{x}$ in Eqs. (1)–(4).
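The three identification statistics and the usefulness-based threshold choice can be illustrated with a short Python sketch. It is a hedged illustration rather than the authors' code: all function names and the toy data are hypothetical, bins are formed with equal counts from the sorted measure, and the IV function returns `None` when a bin contains no good or no bad predictions, which is one possible way to handle the undefined case the text describes.

```python
import math


def error_rates(signal, crisis):
    """Return (T1, T2, P1): missed-crisis rate, false-alarm rate, crisis frequency.

    Assumes both crisis and non-crisis periods occur in `crisis`.
    """
    pairs = list(zip(signal, crisis))
    tp = sum(s == 1 and c == 1 for s, c in pairs)
    fp = sum(s == 1 and c == 0 for s, c in pairs)
    fn = sum(s == 0 and c == 1 for s, c in pairs)
    tn = sum(s == 0 and c == 0 for s, c in pairs)
    return fn / (tp + fn), fp / (fp + tn), (tp + fn) / len(pairs)


def ntsr(signal, crisis):
    """Noise-to-signal ratio T2 / (1 - T1); undefined if every crisis is missed."""
    t1, t2, _ = error_rates(signal, crisis)
    return t2 / (1 - t1)


def relative_usefulness(signal, crisis, mu):
    """U_R(mu) = U_a(mu) / min(mu * P1, (1 - mu) * P2), for 0 < mu < 1."""
    t1, t2, p1 = error_rates(signal, crisis)
    p2 = 1 - p1
    loss = mu * t1 * p1 + (1 - mu) * t2 * p2
    baseline = min(mu * p1, (1 - mu) * p2)
    return (baseline - loss) / baseline


def information_value(signal, crisis, measure, k):
    """IV over k equal-count bins of the sorted measure (requires k <= len(measure)).

    Returns None if a bin has no good or no bad predictions (log undefined).
    """
    order = sorted(range(len(measure)), key=lambda t: measure[t])
    n = len(order)
    iv = 0.0
    for j in range(k):
        members = order[j * n // k:(j + 1) * n // k]
        good = sum(signal[t] == crisis[t] for t in members) / len(members)
        bad = 1 - good
        if good == 0 or bad == 0:
            return None
        iv += (good - bad) * math.log(good / bad)
    return iv


def to_signal(measure, tau):
    """Binary signal: 1 where the measure exceeds the threshold tau."""
    return [1 if d > tau else 0 for d in measure]


def best_threshold(measure, crisis, mu, candidates):
    """Candidate threshold with the highest relative usefulness."""
    return max(
        candidates,
        key=lambda tau: relative_usefulness(to_signal(measure, tau), crisis, mu),
    )


# Hypothetical toy data: ten periods, crises in periods 4 and 5.
toy_crisis = [0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
toy_measure = [0.1, 0.2, 0.9, 0.3, 1.5, 1.2, 0.4, 0.2, 0.1, 0.3]
```

On the toy data, a threshold of 1.3 yields an NTSR of zero while missing one of the two crisis periods, whereas relative usefulness with `mu = 0.5` prefers the threshold of 1.0 that classifies every period correctly. This mirrors the concern that minimizing NTSR alone favors Type II error at the expense of Type I error.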

The NTSR considers both Type I and II errors but it is potentially subject to manipulation. The IV looks for consistent identification across bins. The Usefulness test considers the cost of Type I versus Type II errors. These tests allow us to compare $S_{i,t}^{x}$ against $C_{t}^{x}$ from different and complementary perspectives. Therefore, when optimizing a single test, we consider the remaining perspectives for supplementary evaluation.

3.4. Comparison of early-warning properties

In the third step, we compare early-warning properties of the alternative measures by using standard methods of time-series analysis (Box and Jenkins, 1970; Johansen, 1995). These time-series tests go beyond the above classification approach in predicting a window of prior adverse conditions.

Coincident measures can provide useful information for the purpose of disclosure and fast-acting policies. However, the deployment of slower policies generally requires more time (Borio, 2003). Therefore, the evaluation of the information quality of systemic measures also calls for testing their ability to provide an early warning of adverse systemic developments. A relevant question is whether policymakers can produce useful near-term forecasts of systemic conditions using only coincident measures. To this end, the evaluation of early-warning properties can consider the autoregressive properties of coincident measures.

The individual measures are analyzed using the Box and Jenkins (1970) methodology. For each measure, several variations of the autoregressive integrated moving-average (ARIMA) (p,d,q) model are tested, given by Eq. (14):

$$\Delta^{d} y_{i,t} = a + \sum_{j=1}^{p} b_j\, \Delta^{d} y_{i,t-j} + \varepsilon_{i,t} + \sum_{j=1}^{q} c_j\, \varepsilon_{i,t-j} \tag{14}$$
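As a rough illustration of Eq. (14), the sketch below handles only a deliberately reduced special case, ARIMA(1, d, 0) with d equal to 0 or 1, fitted by ordinary least squares instead of the maximum-likelihood estimation usual in Box-Jenkins practice. It also compares an AR(1) against a constant-only model using a Gaussian Bayesian information criterion. Every function name here is hypothetical, and a real application would more plausibly rely on a dedicated time-series library.

```python
import math


def difference(y, d=1):
    """Apply the difference operator d times."""
    for _ in range(d):
        y = [y[t] - y[t - 1] for t in range(1, len(y))]
    return y


def fit_ar1(z):
    """Least-squares fit of z_t = a + b * z_{t-1} + e_t; returns (a, b, residuals)."""
    x, y = z[:-1], z[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b, [yi - (a + b * xi) for xi, yi in zip(x, y)]


def bic(resid, n_params):
    """Gaussian BIC up to an additive constant: n * ln(RSS / n) + k * ln(n).

    Requires a nonzero residual sum of squares.
    """
    n = len(resid)
    rss = sum(e * e for e in resid)
    return n * math.log(rss / n) + n_params * math.log(n)


def select_p(z):
    """Return 1 if AR(1) has a lower BIC than a constant-only model, else 0.

    Both models are scored on the same observations z[1:].
    """
    target = z[1:]
    mean_only = [v - sum(target) / len(target) for v in target]
    _, _, ar1_resid = fit_ar1(z)
    return 1 if bic(ar1_resid, 2) < bic(mean_only, 1) else 0


def forecast_next(y, d=0):
    """One-step-ahead forecast of y from an AR(1) on the d-times differenced
    series, for d = 0 or d = 1 only."""
    z = difference(y, d)
    a, b, _ = fit_ar1(z)
    step = a + b * z[-1]
    return step if d == 0 else y[-1] + step
```

The BIC comparison is made within a fixed order of differencing because information criteria computed on differently differenced series are not directly comparable.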

The difference operator $\Delta^{d} y_{i,t}$ yields the time series $y_{i,t}$ differenced $d$ times. Where appropriate, a generalized autoregressive conditional heteroskedasticity (GARCH) (p,q) methodology can be implemented to account for heteroskedasticity. The final model is selected based on the properties of the residuals (stationarity, heteroskedasticity, autocorrelation, and partial autocorrelation) and the Bayesian information criterion (BIC).

The Johansen (1995) method is applied to test the properties of specific data and to select a vector error-correction (VEC) model, following Eq. (15). This approach allows consideration of assorted perspectives of financial system conditions provided by individual

Please cite this article in press as: Oet, M.V., et al., Evaluating measures of adverse financial conditions. J. Financial Stability (2016), http://dx.doi.org/10.1016/j.jfs.2016.06.008

G Model

ARTICLE IN PRESS

JFS-449; No. of Pages 16

M.V. Oet et al. / Journal of Financial Stability xxx (2016) xxx–xxx

6

Table 1 Summary statistics for the stress series between June 2000 and February of 2014. Name

Start date (frequency)

Source

Panel 1—Systemic stress measures (FSIs) Cleveland Financial Stress Index (CFSI) CFSI: Credit subindex CFSI: Real estate subindex CFSI: Funding subindex CFSI: Equity subindex CFSI: Foreign exchange subindex CFSI: Securitization subindex National Financial Conditions Index—Chicago (NFCI) NFCI: Nonfinancial leverage subindex NFCI: Leverage subindex NFCI: Credit subindex NFCI: Risk subindex Adjusted NFCI Kansas City Financial Stress Index St. Louis Financial Stress Index Bloomberg Financial Conditions Index Bloomberg Financial Conditions Index Plus Goldman Sachs Financial Conditions Index Goldman Sachs Financial Conditions Index with Oil

1/19/1970(D) 1/19/1970(D) 1/19/1970(D) 1/19/1970(D) 1/19/1970(D) 1/19/1970(D) 1/19/1970(D) 1/5/1973(W) 1/5/1973(W) 1/5/1973(W) 1/5/1973(W) 1/5/1973(W) 1/5/1973(W) 2/1/1990(M) 12/31/1993(W) 1/2/1990(D) 1/2/1990(D) 1/2/1981(D) 4/6/1983(D)

Authors Authors Authors Authors Authors Authors Authors FRED FRED FRED FRED FRED FRED FRED FRED Bloomberg Bloomberg Bloomberg Bloomberg

Panel 2—Economic activity measures (FCIs) Chicago Fed National Activity Index (CFNAI) CFNAI: Diffusion index CFNAI: Personal consumption and housing CFNAI: Employment, unemployment, and hours CFNAI: Production and income CFNAI: Sales, orders, and inventories Philadelphia’s leading index for the US

3/1/1967(M) 3/1/1967(M) 3/1/1967(M) 3/1/1967(M) 3/1/1967(M) 3/1/1967(M) 1/31/1982(M)

FRED FRED FRED FRED FRED FRED FRED

Panel 3—Early-warning measures (EWMs) Kamakura’s Troubled Company Index (over 1%) Kamakura’s Troubled Company Index (over 5%) Kamakura’s Troubled Company Index (over 10%) Kamakura’s Troubled Company Index (over 20%) SRISK from V-Lab

1/30/1990(M) 1/30/1990(M) 1/30/1990(M) 1/30/1990(M) 6/2/2000(D)

Kamakura Kamakura Kamakura Kamakura NYU V-Lab

measures and provides insight into a mechanism for the development of critical systemic episodes.

$$\Delta Y_t = A\,(B Y_{t-1} + c) + \sum_{i=1}^{k} B_i \Delta Y_{t-i} + \varepsilon_t \qquad (15)$$

This approach can be applied to a collection of coincident measures $y_{i,t}$, where i = 1, ..., n and $Y_t$ is the n × 1 vector of coincident measures. Here c is an n × 1 constant vector; A, B, and $B_i$ are n × n matrices; and k is the number of lags considered for each stress measure. The number of lagged terms to incorporate is selected using the BIC.
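As a rough illustration of how Eq. (15) produces a one-step-ahead value, the sketch below evaluates the error-correction recursion for a two-measure system in plain Python. It assumes the usual error-correction form, in which the left-hand side and the lagged terms are first differences of $Y_t$. The matrices `A`, `B`, `B_lags` and the vector `c` hold made-up numbers rather than estimates from the case study, and all function names are hypothetical.

```python
def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def vec_add(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(parts) for parts in zip(*vectors)]


def vec_one_step(history, A, B, c, B_lags):
    """One-step-ahead level forecast from a VEC recursion (noise set to zero).

    history : past observation vectors, oldest first
    A, B    : n x n matrices of the error-correction term
    c       : constant vector inside the error-correction term
    B_lags  : list of k short-run matrices B_1..B_k
    """
    k = len(B_lags)
    if len(history) < k + 1:
        raise ValueError("need at least k + 1 past observations")
    y_prev = history[-1]
    # Error-correction term A (B Y_{t-1} + c).
    delta = mat_vec(A, vec_add(mat_vec(B, y_prev), c))
    # Short-run terms B_i (Y_{t-i} - Y_{t-i-1}) for i = 1..k.
    for i, B_i in enumerate(B_lags, start=1):
        diff = [a - b for a, b in zip(history[-i], history[-i - 1])]
        delta = vec_add(delta, mat_vec(B_i, diff))
    # Forecast level = last observed level + forecast change.
    return vec_add(y_prev, delta)


# Illustrative (made-up) inputs for n = 2 measures and k = 1 lag.
history = [[1.0, 2.0], [1.5, 2.5]]
A = [[-0.5, 0.0], [0.25, 0.0]]
B = [[1.0, -1.0], [0.0, 0.0]]
c = [0.5, 0.0]
B_lags = [[[0.5, 0.0], [0.0, 0.5]]]
forecast = vec_one_step(history, A, B, c, B_lags)
```

In practice the matrices would come from a Johansen-type estimation and the lag count k from the BIC; the sketch only shows how the estimated terms combine into a forecast.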

4. Case study: measures of US systemic conditions (1976–2014)

4.1. Data and sampling

In this section, we describe the dataset we use for the case study of US systemic conditions from 1976 to 2014 and our sampling strategy. The dataset consists of three types of data: the one-dimensional series of policymakers’ financial system interventions (intervention data), the critical states indicator $C^x_t$ (signal data), and the measures of US financial system conditions $S^x_{i,t}$ (measures data).

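4.1.1. Intervention data

We assemble a binary time series of policymakers’ interventions from prior studies (Bordo et al., 2015; FRBNY, 2013; Babecký et al., 2013; Laeven and Valencia, 2013; Carlson et al., 2012; Kaminsky and Reinhart, 1999).13 These include foreign exchange interventions in the 1970s and 1980s, episodes of dramatic change in monetary policy, and regulatory interventions into the financial system (e.g., forbearance, stress testing, deregulation, policy changes applied during the recent global financial crisis). These observed interventions inform the specification of $C^x_t$ based upon a collection of volatility measures $v_{i,t}$ following Eqs. (1)–(4).14

Table 1 (continued) Summary statistics.

| Name | Minimum | Maximum | Mean | Variance | Skewness | Kurtosis |
|---|---|---|---|---|---|---|
| Panel 1: Systemic stress measures (FSIs) | | | | | | |
| Cleveland Financial Stress Index (CFSI) | 26.35 | 80.67 | 48.60 | 12.90 | 0.41 | −0.75 |
| CFSI: Credit subindex | 5.09 | 17.11 | 9.77 | 2.58 | 0.29 | −0.38 |
| CFSI: Real estate subindex | 0.47 | 10.48 | 4.48 | 2.93 | 0.62 | −0.72 |
| CFSI: Funding subindex | 1.14 | 13.98 | 5.25 | 2.62 | 1.37 | 1.93 |
| CFSI: Equity subindex | 4.95 | 28.76 | 16.26 | 7.04 | 0.14 | −1.35 |
| CFSI: Foreign exchange subindex | 1.71 | 13.48 | 7.48 | 2.42 | −0.20 | −0.08 |
| CFSI: Securitization subindex | 1.47 | 10.60 | 5.36 | 2.11 | 0.65 | 0.01 |
| National Financial Conditions Index, Chicago (NFCI) | −0.88 | 2.71 | −0.28 | 0.65 | 2.52 | 7.28 |
| NFCI: Nonfinancial leverage subindex | −1.74 | 2.59 | 0.13 | 1.33 | 0.37 | −0.97 |
| NFCI: Leverage subindex | −1.80 | 3.69 | −0.10 | 0.97 | 1.46 | 3.45 |
| NFCI: Credit subindex | −0.86 | 2.53 | −0.17 | 0.64 | 2.00 | 4.88 |
| NFCI: Risk subindex | −0.86 | 2.73 | −0.31 | 0.64 | 2.67 | 8.20 |
| Adjusted NFCI | −1.30 | 4.26 | −0.20 | 0.77 | 2.34 | 7.87 |
| Kansas City Financial Stress Index | −0.96 | 5.68 | 0.25 | 1.20 | 2.44 | 7.00 |
| St. Louis Financial Stress Index | −1.52 | 5.15 | −0.04 | 1.13 | 1.91 | 5.30 |
| Bloomberg Financial Conditions Index | −10.29 | 1.26 | −0.62 | 1.65 | −2.70 | 10.52 |
| Bloomberg Financial Conditions Index Plus | −8.97 | 1.95 | −0.19 | 1.84 | −1.95 | 5.28 |
| Goldman Sachs Financial Conditions Index | 98.94 | 103.84 | 100.11 | 0.97 | 1.63 | 2.63 |
| Goldman Sachs Financial Conditions Index with Oil | 98.08 | 103.36 | 100.11 | 0.88 | 0.89 | 2.35 |
| Panel 2: Economic activity measures (FCIs) | | | | | | |
| Chicago Fed National Activity Index (CFNAI) | −4.44 | 0.91 | −0.37 | 0.85 | −2.29 | 6.35 |
| CFNAI: Diffusion index | −0.87 | 0.47 | −0.12 | 0.33 | −0.70 | −0.34 |
| CFNAI: Personal consumption and housing | −0.40 | 0.19 | −0.08 | 0.17 | −0.28 | −1.36 |
| CFNAI: Employment, unemployment, and hours | −1.63 | 0.35 | −0.17 | 0.37 | −1.84 | 3.98 |
| CFNAI: Production and income | −1.81 | 0.54 | −0.09 | 0.34 | −2.00 | 6.53 |
| CFNAI: Sales, orders, and inventories | −0.66 | 0.36 | −0.03 | 0.16 | −1.52 | 3.55 |
| Philadelphia’s leading index for the US | −2.67 | 1.91 | 0.85 | 0.96 | −1.86 | 3.68 |
| Panel 3: Early-warning measures (EWMs) | | | | | | |
| Kamakura’s Troubled Company Index (over 1%) | 5.66 | 42.06 | 16.43 | 8.97 | 0.92 | −0.28 |
| Kamakura’s Troubled Company Index (over 5%) | 0.70 | 18.09 | 5.15 | 4.40 | 1.18 | 0.24 |
| Kamakura’s Troubled Company Index (over 10%) | 0.15 | 9.90 | 2.37 | 2.53 | 1.39 | 0.79 |
| Kamakura’s Troubled Company Index (over 20%) | 0.02 | 4.70 | 0.83 | 1.10 | 1.68 | 1.89 |
| SRISK from V-Lab | 24,225.81 | 916,457.07 | 286,418.80 | 244,535.94 | 0.69 | −0.77 |

4.1.2. Measures data

Our dataset includes 32 published measures of US financial conditions $d_{i,t}$ at various frequencies. These continuous series $d_{i,t}$ are transformed into binary signals of systemic conditions $S^x_{i,t}$ following Eq. (11). Each measure belongs to one of three groups: FSIs, FCIs, and EWMs (Table 1). The first group focuses on measures of financial system stress. These typically incorporate variables describing core markets and functions of the financial system, which are then aggregated using a variety of weighting methodologies. The main sample includes six measures describing the National Financial Conditions Index (NFCI) and seven measures describing the Cleveland Financial Stress Index (CFSI). A second group of measures (FCIs) examines the state of financial system conditions through a set of relevant macroeconomic activities.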

13 Intervention data are available from the authors upon request.

14 We extract signals of volatility $v_{i,t}$ for six markets (following Oet et al., 2015a) using data from Datastream. For the equity, foreign exchange, and real estate markets we calculate the rolling 30-day standard deviation of daily returns on the S&P 500 (S&PCOMP), the trade-weighted dollar index (US$CWMN), and a real estate index (RLESTUS), respectively. For the funding, credit, and securitization markets we calculate the rolling 90-day standard deviation of yields on three-month Treasury bills (FRTBS3M), corporate bonds (LHCCORP), and mortgage-backed securities (LHMNBCK).
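The rolling volatility described in footnote 14 can be sketched in a few lines of Python. This is a minimal illustration on made-up prices: the function names, the use of simple rather than log returns, and the sample (n − 1) standard deviation are assumptions, since the footnote does not specify them.

```python
from statistics import stdev


def simple_returns(prices):
    """Period-over-period simple returns of a price series."""
    return [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]


def rolling_std(values, window):
    """Sample standard deviation over each trailing window of `window` points.

    Returns one value per complete window, so the output has
    len(values) - window + 1 entries (empty if the series is too short).
    """
    if window < 2:
        raise ValueError("window must contain at least two observations")
    return [stdev(values[i - window:i]) for i in range(window, len(values) + 1)]


# Hypothetical daily index levels (illustration only, not Datastream data).
prices = [100.0, 101.0, 99.0, 102.0, 103.0, 101.5, 104.0, 104.5]
returns = simple_returns(prices)        # seven daily returns
vol = rolling_std(returns, window=3)    # 3-day window here; footnote 14 uses 30 or 90 days
```

For the funding, credit, and securitization markets the same `rolling_std` helper would be applied to yield levels directly, with a 90-day window.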


Table 2 Comparison of interventions and multidimensional market signals under level perspective.

Given the level perspective, we consider a combination of pervasiveness $w_C^{lev}$, persistence $l_C^{lev}$, and severity that optimizes the alignment of $C_t^{lev}$ against the observed interventions. The correspondence between $C_t^{lev}$ and interventions is measured through the Hamming distance (defined in Section 3.2) and NTSR (defined in Section 3.3). Hamming % is the Hamming distance as a fraction of all monthly observations in the dataset. The table is organized in ascending order of persistence from one to six markets. Light shaded rows highlight that persistence of five markets minimizes the Hamming distance, Hamming %, and NTSR. The dark shaded row emphasizes the optimal calibration.

The main sample includes seven measures describing the Chicago Fed National Activity Index (CFNAI). EWMs describe system conditions through market expectations of systemic risk. By contrast, select EWMs become available starting in 2000 and are only considered in our robustness sample.

4.1.3. Sampling

We analyze the dataset in two samples. The main sample maximizes the available length of the sample starting in 1976, while the second, robustness, sample tests the broadest population of measures available starting in 2000. Accordingly, the main sample (July 1976–February 2014) consists of 20 monthly, 13 weekly, and 7 daily series. The robustness sample (June 2000–February 2014) consists of 32 monthly, 23 weekly, and 16 daily series. The main sample encompasses at least five full economic cycles (following the NBER delineation of recession periods),15 including several well-recognized critical episodes. Our sampling strategy balances the trade-off between the number of critical observations in the main sample and the number of indicators in the robustness sample. Thus, the main sample enhances insight into the identification and early-warning properties of available measures. The robustness sample enhances the cross-sectional comparison between various types of measures for a limited number of joint observations. Fig. 2 presents a subsample of comparisons of several systemic measures with the multidimensional signals (following Oet et al., 2015a) of critical states under both $x \in \{lev, diff\}$ perspectives.

15 See http://www.nber.org/cycles/cyclesmain.html.

4.2. Results

This section empirically assesses US measures of adverse conditions. First, we compare measures of systemic conditions against the benchmark using the full sample between 1976 and 2014 (Tables 2–7, Fig. 3).16 Second, we test the early-warning effectiveness of the measures (Tables 8 and 9, Fig. 4).

16 Similar results for the robustness sample between 2000 and 2014 are provided in Appendix A (Supplementary).

Table 3 Comparison of interventions and multi-dimensional market signals under difference perspective.

Given the difference perspective, we consider a combination of pervasiveness $w_C^{diff}$, persistence $l_C^{diff}$, and severity that optimizes the alignment of $C_t^{diff}$ against the observed interventions. The correspondence between $C_t^{diff}$ and interventions is measured through the Hamming distance (defined in Section 3.2) and NTSR (defined in Section 3.3). Hamming % is the Hamming distance as a fraction of all monthly observations in the dataset. The table is organized in ascending order of persistence from one to six markets. The shaded row highlights that persistence of one market minimizes the Hamming distance, Hamming %, and NTSR.

Table 4 Comparison of coincident measures’ ability to signal stress when selecting the level threshold based on IV.

Panel 1 uses data between July 1976 and February 2014. Panel 2 uses data between 6/12/1976 and 3/22/2014. Panel 3 uses data between 6/3/1976 and 3/28/2014. The number of observations where $S^x_{i,t}$ correctly indicates crisis (true positive, TP), erroneously indicates crisis (false positive, FP), correctly indicates the absence of crisis (true negative, TN), and incorrectly indicates the absence of crisis (false negative, FN) is provided above. The NTSR, IV, and $U_R$ metrics (defined in Section 3.3) describe the balance between accurate and flawed signals $S^x_{i,t}$ (shaded columns). A superior measure of systemic conditions $S^x_{i,t}$ would demonstrate an NTSR close to zero, an IV close to 0.5, and a $U_R$ near 1.

Table 5 Comparison of coincident measures’ ability to signal stress when selecting the difference threshold based on IV.

Panel 1 uses data between July 1976 and February 2014. Panel 2 uses data between 6/12/1976 and 3/22/2014. Panel 3 uses data between 6/3/1976 and 3/28/2014. The number of observations where $S^x_{i,t}$ correctly indicates crisis (true positive, TP), erroneously indicates crisis (false positive, FP), correctly indicates the absence of crisis (true negative, TN), and incorrectly indicates the absence of crisis (false negative, FN) is provided above. The NTSR, IV, and $U_R$ metrics (defined in Section 3.3) describe the balance between accurate and flawed signals $S^x_{i,t}$ (shaded columns). A superior measure of systemic conditions $S^x_{i,t}$ would demonstrate an NTSR close to zero, an IV close to 0.5, and a $U_R$ near 1.

Fig. 2. Multidimensional signal compared to several measures of critical states. The figure shows the Cleveland Financial Stress Index (CFSI), National Financial Conditions Index (NFCI), and Chicago Fed National Activity Index (CFNAI) at monthly frequency starting in July 1976. The designated continuous measures of financial conditions $d_{i,t}$ (CFSI, NFCI, and CFNAI) are compared against the shaded critical states $C^x_t$ viewed from the level (Panel A) and difference (Panel B) perspectives.

4.2.1. Findings from multidimensional signaling

Fig. 3 compares the set of US policymakers’ financial system interventions (shown as brown vertical bars) against the set of extracted market volatility signals with particular severity, persistence, and pervasiveness (shown as gray vertical bars). Tables 2 and 3 show the results of matching the extracted signals of critical states against actual interventions. We compare the extracted signals with the set of financial system interventions at a monthly frequency under two policymakers’ perspectives: the level of system conditions and their changes. We find that the rules by which the signals optimally match the interventions vary with these perspectives.

Considering the level of system conditions, the best match is produced at a severity threshold of 0.6 standard deviations and a persistence $l_C^{lev}$ of five months or a pervasiveness $w_C^{lev}$ of two markets. In other words, historically, at the severity level above the designated threshold (of 0.6 standard deviations), policymakers have intervened when this level of volatility was signaled in one market for five consecutive months, or when it was signaled across two markets simultaneously.

Another way that policymakers consider whether or not to intervene in a critical state is to consider the changes in system conditions. They consider whether spikes in the market or economic events transpiring quickly are sufficiently large to warrant intervention. In these circumstances, they are less concerned with the level of conditions and more concerned with the dramatic changes in these conditions. To study the classification from this perspective, we consider how changes in the observed policy interventions can be best matched by changes in severity, persistence, and pervasiveness. In our case study, the best match to critical states occurs from a dramatic (severity threshold of 0.6 standard deviations) and rapid (persistence $l_C^{diff}$ of one period) change in a single market (pervasiveness $w_C^{diff}$ of one market). As the results show, the change-based matching signal is quite noisy (Fig. 3, Panel B; Table 3) and inferior to the consideration of levels of system conditions (Fig. 3, Panel A; Table 2). This matching pattern still produces a substantial amount of noise and is suboptimal by comparison with an intervention rule based on observation of volatility levels.

Our results suggest policymakers obtain a superior identification of the onset and duration of critical states by considering the level of system conditions. The consideration of changes in these conditions may be justified when policymakers are particularly worried about dramatic changes in system conditions and ask whether intervention is warranted. However, the consideration of changes produces a very noisy set of signals of critical states. Overall, we find that US policymakers get a better match between intervention and volatility when they consider the pattern in the level of volatility rather than the pattern in the change of volatility. Put differently, based on the known set of US financial system interventions, we find that US policymakers appear less sensitive to rapid changes in volatility than they are to the pattern of sustained volatility, whether persistent across time or pervasive across markets.
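The level-perspective rule described above (a severity threshold combined with persistence in one market or pervasiveness across markets) and its comparison with interventions through the Hamming distance can be sketched as follows. This is a simplified illustration rather than the calibration procedure of Eqs. (1)–(4): the function names, the toy data, and the way persistence is counted (a run of consecutive exceedances ending at the current period) are assumptions.

```python
def critical_states(vol, severity=0.6, persistence=5, pervasiveness=2):
    """Flag critical periods from standardized volatility series.

    vol maps market name -> list of standardized volatilities (equal lengths).
    A period is flagged if at least `pervasiveness` markets exceed `severity`
    at once, or if any single market has exceeded it for `persistence`
    consecutive periods ending at the current one.
    """
    n = len(next(iter(vol.values())))
    run = {m: 0 for m in vol}          # current exceedance streak per market
    states = []
    for t in range(n):
        exceed = 0
        persistent = False
        for m, series in vol.items():
            if series[t] > severity:
                run[m] += 1
                exceed += 1
            else:
                run[m] = 0
            if run[m] >= persistence:
                persistent = True
        states.append(1 if exceed >= pervasiveness or persistent else 0)
    return states


def hamming(a, b):
    """Number of positions at which two equal-length binary series differ."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Toy example with two markets and eight months (made-up numbers).
vol = {
    "equity":  [0.1, 0.7, 0.8, 0.9, 0.7, 0.7, 0.2, 0.1],
    "funding": [0.0, 0.1, 0.9, 0.2, 0.1, 0.3, 0.1, 0.0],
}
signal = critical_states(vol)
interventions = [0, 0, 1, 0, 0, 1, 1, 0]
distance = hamming(signal, interventions)
share = distance / len(signal)         # "Hamming %" expressed as a fraction
```

Searching over a grid of `severity`, `persistence`, and `pervasiveness` values and keeping the combination with the smallest `distance` would mimic the matching exercise reported in Tables 2 and 3.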

4.2.2. Findings from a comparison of identification properties

Tables 4 and 5 report the comparative signaling results for the tested coincident measures, where the thresholds for the level and difference perspectives are selected to optimize the IV following Eq. (12). These tables should therefore display measures with comparable IV metrics, so we evaluate the comparative advantage of these measures in terms of the Type I (T1) error rate, Type II (T2) error rate, NTSR, and $U_R$ metrics. Table 4 displays comparative metrics when signals of crisis are based upon the level of imbalances in the volatility and stress time series. Almost every measure of stress produces an NTSR below unity at every frequency, indicating varying degrees of benefit from their use. These results indicate that the NFCI and the CFSI produce the highest $U_R$ metrics and very low NTSRs, modestly surpassing the CFNAI. In Table 5 we analyze the differenced imbalances in an effort to focus on the onset of crises instead of their duration. There is substantial variation in the comparative advantage of each measure. Moreover, the three tests (NTSR, IV, and $U_R$) often provide conflicting direction.

We determine the level and difference thresholds based upon maximization of the $U_R$ metric following Eq. (13) in Tables 6 and 7, respectively. In Table 6, all measures consistently achieve high $U_R$ metrics and low NTSR. Interestingly, determining the difference threshold based upon Eq. (13) yields unstable results across frequencies in Table 7. In both tables the IV is generally weak.
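To make the error-rate metrics in Tables 4–7 concrete, the following sketch computes them from the four confusion-matrix counts. The formulas are the conventional ones from the signal-extraction literature (noise-to-signal ratio as in Kaminsky and Reinhart, 1999; relative usefulness as in Sarlin, 2013) and are assumed here to correspond to the definitions in Section 3.3. The function name, the parameter name `mu` for the policymaker's preference, and the counts are illustrative.

```python
def signal_metrics(tp, fp, tn, fn, mu=0.7):
    """Error rates, noise-to-signal ratio, and relative usefulness.

    tp, fp, tn, fn : confusion-matrix counts of the binary signal
    mu             : weight on missing a crisis (Type I) versus a false alarm
    Assumes the sample contains crisis and tranquil periods and at least
    one true positive, so that no denominator is zero.
    """
    n = tp + fp + tn + fn
    t1 = fn / (tp + fn)            # share of crisis periods missed
    t2 = fp / (fp + tn)            # share of tranquil periods falsely flagged
    ntsr = t2 / (1.0 - t1)         # false-alarm rate over hit rate
    p1 = (tp + fn) / n             # unconditional crisis frequency
    p2 = 1.0 - p1
    loss = mu * t1 * p1 + (1.0 - mu) * t2 * p2
    # Loss of the better naive rule: always signal or never signal.
    baseline = min(mu * p1, (1.0 - mu) * p2)
    return {"T1": t1, "T2": t2, "NTSR": ntsr, "U_R": (baseline - loss) / baseline}


# Made-up counts: 100 periods, 20 of them crisis periods.
m = signal_metrics(tp=16, fp=8, tn=72, fn=4, mu=0.7)
```

Under this definition a signal that is no better than the best naive rule scores zero or below, and a perfect signal scores one, which matches the reading that a superior measure has a usefulness near 1.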

Some results are common to all approaches. First, the $U_R$ of most series to risk-neutral policymakers is maximized when Type I error is more costly than Type II error, since the preference parameter equals 0.7. Typically it is assumed that the cost of not implementing policy in the case of a crisis outstrips the cost of implementing policy in the case of no crisis (a preference parameter above 0.5). Therefore, the results indicate that tested measures are in large part conducive to policymakers’ needs. This is also in line with previous findings on relative costs between errors (Sarlin, 2013; Betz et al., 2014). Second, we notice a general lack of stability (across frequencies) and agreement (between tests) when adopting a differenced perspective. Third, it is interesting to note that the CFSI, the NFCI, and the CFNAI (for which the components are also available) modestly outperform their components. This holds consistently true when we use the level perspective. This supports the use of composite methodologies and is consistent with the financial system’s property of hierarchical composition and decomposability (Simon, 1962).

4.2.3. Findings from a comparison of early-warning properties

We focus on autoregressive properties of the US coincident systemic measures. To apply the Box-Jenkins methodology, we difference the standardized coincident measures to achieve weak-form stationarity. We find that, based upon autocorrelation and partial autocorrelation evidence, there is no support for an autoregressive or moving-average structure in the data. These results appear to support the idea that systemic conditions, as viewed by these measures individually, display characteristics of a random walk between January 2002 and June 2007. Therefore, we omit the estimation results.
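The autocorrelation check mentioned above can be illustrated with a short sketch: difference the series, compute the sample autocorrelations, and compare them with the approximate 95% white-noise band of ±1.96/√n. The simulated random walk, the seed, and the function names are assumptions for illustration. Partial autocorrelations, which the Box-Jenkins procedure also inspects, are omitted here.

```python
import math
import random


def difference(series, d=1):
    """Apply the first-difference operator d times."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def sample_acf(series, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(v * v for v in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(1, max_lag + 1)]


# Simulated random walk standing in for a standardized coincident measure.
rng = random.Random(42)
walk = [0.0]
for _ in range(400):
    walk.append(walk[-1] + rng.gauss(0.0, 1.0))

acf_level = sample_acf(walk, max_lag=5)
acf_diff = sample_acf(difference(walk, d=1), max_lag=5)
band = 1.96 / math.sqrt(len(walk) - 1)   # approximate 95% band for white noise
```

For a random walk the level autocorrelations stay close to one, while the autocorrelations of the differenced series should mostly fall inside `band`, which is the pattern that leaves no autoregressive or moving-average structure to model.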


Table 6 Comparison of coincident measures’ ability to signal stress when selecting the level threshold based on $U_R$.

Panel 1 uses data between July 1976 and February 2014. Panel 2 uses data between 6/12/1976 and 3/22/2014. Panel 3 uses data between 6/3/1976 and 3/28/2014. The number of observations where $S^x_{i,t}$ correctly indicates crisis (true positive, TP), erroneously indicates crisis (false positive, FP), correctly indicates the absence of crisis (true negative, TN), and incorrectly indicates the absence of crisis (false negative, FN) is provided above. The NTSR, IV, and $U_R$ metrics (defined in Section 3.3) describe the balance between accurate and flawed signals $S^x_{i,t}$ (shaded columns). A superior measure of systemic conditions $S^x_{i,t}$ would demonstrate an NTSR close to zero, an IV close to 0.5, and a $U_R$ near 1.

We also pursue an exploratory examination of the potential for a process through which financial conditions perceived by a collection of measures may develop into stress observed by another set of measures. The VEC forecasts are presented visually in Fig. 4, using an initial estimation sample from March 1992 to February 2004. Each forecast is produced by estimating the parameters on all data observable at that date and calculating the one-period-ahead forecast, repeated between March 2004 and February 2014. A comparison of the accuracy of the out-of-sample forecasts is available in Table 8. We achieve the most accurate out-of-sample forecasts for the KCFSI (Kansas City Financial Stress Index), while the CFSI’s forecast is the least accurate. From these forecasts we generate signals of crisis where the level threshold for each measure maximizes the in-sample $U_R$ metric following Eq. (13). We are interested in whether there is a difference between in-sample and out-of-sample $U_R$ (Table 9). By definition, forecasts with a uniform prediction of crisis or no crisis have a maximum $U_R$ of zero. We find the NTSR, IV, and $U_R$ do not exhibit a great deal of stability between in-sample and out-of-sample results. Moreover, the relative rankings of these measures change. Most of the measures perform better (superior NTSR and $U_R$) out-of-sample than in-sample. This may be partially attributable to the prominent crisis in the out-of-sample time period (Fig. 4).

5. Conclusion: implications and limitations

Multiple measures for systemic conditions have been developed. A major challenge is to assess their quality. Despite evidence of the multidimensional complexity of system conditions, current approaches evaluate measurement quality by methods of binary classification. The main contributions of this study are the improvements to the evaluation framework for systemic conditions and their application to a case study of existing measures. The framework is based on two incremental enhancements to prior methods. First, the classification of system states is enhanced by considering the criticality of market-volatility signals as evidenced by their severity, persistence, and pervasiveness. This enhancement integrates findings from two streams of the literature: findings on patterns of volatility in financial markets from empirical finance and findings on cyclical properties from empirical macroeconomics. Second, we apply the information-theoretic statistic of information value to assess the quality of multinomial classification across the predicted outcome distribution for each measure.

We apply this evaluation framework to the case of US systemic measures from 1976 to 2013. For this dataset, we partition the US system into six markets and assess both coincident signaling quality and early-warning performance. There are three principal findings. First, considering both levels and change in the severity, persistence, and pervasiveness of market volatility is helpful in identifying the critical states of the US financial system. Second, the association of market volatility with critical states varies with the level and change in conditions. An efficient identification of critical states is achieved at the severity level of 0.6 standard deviations and persistence of five periods or pervasiveness across two markets. Third, measures based on multiple markets improve the identification of critical states.


Table 7 Comparison of coincident measures’ ability to signal stress when selecting the difference threshold based on $U_R$.

Panel 1 uses data between July 1976 and February 2014. Panel 2 uses data between 6/12/1976 and 3/22/2014. Panel 3 uses data between 6/3/1976 and 3/28/2014. The number of observations where $S^x_{i,t}$ correctly indicates crisis (true positive, TP), erroneously indicates crisis (false positive, FP), correctly indicates the absence of crisis (true negative, TN), and incorrectly indicates the absence of crisis (false negative, FN) is provided above. The NTSR, IV, and $U_R$ metrics (defined in Section 3.3) describe the balance between accurate and flawed signals $S^x_{i,t}$ (shaded columns). A superior measure of systemic conditions $S^x_{i,t}$ would demonstrate an NTSR close to zero, an IV close to 0.5, and a $U_R$ near 1.

Table 8 Accuracy of VEC forecasts.

| Name | RMSE | MAE | MAPE |
|---|---|---|---|
| Cleveland Financial Stress Index | 0.8740 | 0.6854 | 1.6044 |
| Kansas City Financial Stress Index | 0.3138 | 0.1910 | 0.4738 |
| National Financial Conditions Index | 0.38989 | 0.2606 | 0.8708 |
| Chicago Fed National Activity Index | 0.5919 | 0.4341 | 1.9686 |
| Philadelphia’s Leading Index for the US | 0.4913 | 0.2996 | 1.1928 |
| Kamakura’s Troubled Company Index | 0.4317 | 0.29791 | 0.7431 |
| Bloomberg’s Financial Conditions Index | 0.6544 | 0.40194 | 2.5977 |
| Goldman Sachs Financial Conditions Index | 0.6162 | 0.4034 | 4.1441 |
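The out-of-sample exercise behind Table 8 (re-estimate on all data available so far, forecast one period ahead, repeat) and the three accuracy metrics can be sketched as below. The forecasting rule is a deliberately naive stand-in (the last observed value), not the VEC model, and the series, the split point, and the function names are hypothetical. MAPE is returned as a fraction; the scaling used in Table 8 is not stated in the text.

```python
import math


def expanding_one_step(series, first_forecast, fit_and_predict):
    """Forecast series[t] from series[:t] for every t >= first_forecast."""
    return [fit_and_predict(series[:t]) for t in range(first_forecast, len(series))]


def accuracy(actual, forecast):
    """Root mean squared, mean absolute, and mean absolute percentage errors."""
    errors = [f - a for a, f in zip(actual, forecast)]
    n = len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined where an actual value is zero; such points are skipped.
    pct = [abs(e / a) for e, a in zip(errors, actual) if a != 0]
    mape = sum(pct) / len(pct)
    return rmse, mae, mape


def naive_last_value(history):
    """Placeholder model: predict that the next value equals the last one."""
    return history[-1]


series = [1.0, 2.0, 4.0, 4.0, 2.0, 1.0]     # made-up index values
forecasts = expanding_one_step(series, 2, naive_last_value)
rmse, mae, mape = accuracy(series[2:], forecasts)
```

Replacing `naive_last_value` with a function that fits a model to `history` and returns its one-step prediction reproduces the expanding-window design without changing the evaluation code.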

5.1. Implications

The conceptual implications of this study suggest that the appropriateness of systemic measures to macroprudential policy hinges on their capacity to identify and anticipate adverse systemic conditions. Candidate measures must reveal these two types of information in a timely manner and across the relevant individual markets. Decomposable coincident measures tend to carry enhanced systemic information by reflecting conditions from a variety of system aspects. Our case-study findings indicate that policymakers’ intervention decision is less uncertain when they consider signals from levels rather than changes in systemic conditions. This is evidenced by increased stability of the classification metrics (IV, NTSR, and $U_R$) across frequencies. Analyzing the level of systemic conditions also allows a direct study of the beginning and end of each episode, which is not possible with the difference perspective.

Our case study also provides interesting empirical information in the context of the preferred quantification of the policymakers’ loss function. We find that the cost-based relative $U_R$ metric has three attractive features. First, it has a straightforward scaling interpretation for which higher $U_R$ is better. Second, it is stable across all selections of the preference parameter in (0, 1). Third, it incorporates the necessary and intuitive aspect of policy cost versus benefit, represented by the preference parameter. At the same time, $U_R$ should not be considered by itself, but it remains a convenient and accessible metric to use for an introductory comparison of systemic measures alongside the NTSR and IV statistics.


Fig. 3. Optimal matching of interventions and multi-dimensional market signals. We use the following series from Datastream to construct $v_{i,t}$ (and $C^x_t$): S&PCOMP (equity), US$CWMN (foreign exchange), RLESTUS (real estate), FRTBS3M (funding), LHCCORP (credit), and LHMNBCK (securitization). The designated continuous volatility measures $v_{i,t}$ are compared against the shaded critical states $C^x_t$ (gray) and observed interventions (brown), viewed from the level (Panel A) and difference (Panel B) perspectives. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

5.2. Limitations

Policymakers rely on their ability to project systemic conditions to enable the implementation of policies that take time to affect the financial system. Our analysis of US coincident measures using the Box-Jenkins ARIMA methodology indicates that these measures do not (after necessary transformation) possess sufficient structure individually. As an alternative, the VEC methodology is used with mixed results on a longer sample to determine whether systemic-condition measures allow insight into a mechanism for the development of critical systemic episodes. The results show significant cointegration, which indicates that there are long-run relationships between several of the coincident measures. In addition, some of the forecasts exhibit moderately stable positive $U_R$ out-of-sample, which is attractive to policymakers. However, the one-period forecast horizon severely limits the application for policy implementation. This is a topic that requires further study using methods capable of producing robust, dynamic, and actionable forecasts.

A basic problem in identifying and analyzing systemic conditions is that they may arise from unprecedented patterns. This is particularly challenging to policymakers since a complex financial system adapts to change. Thus, past historical patterns under which critical states of the system emerge may change over time. Alternatively, assessing systemic conditions on the basis of policymakers’ loss function may lead to unrealistic assumptions. A particular challenge in applying early-warning projections for macroprudential policy is that the policy itself leads to feedbacks and unanticipated dynamics. Such amplification of the system’s adaptive response to macroprudential policy must be considered a major challenge of the policy itself. A further question, therefore, is to what extent policy should restrict itself to ex-post responses to the transformation of markets or direct itself ex-ante to influence the system’s adaptation.

Fig. 4. VEC forecast results. Forecast is shown with a red dashed line; realized systemic conditions are shown with a solid blue line. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Table 9 In-sample versus out-of-sample association.

Panel 1 uses data from March 1990 to February 2004. Panel 2 uses data from March 2004 to February 2014. The number of observations where $S^x_{i,t}$ correctly indicates crisis (true positive, TP), erroneously indicates crisis (false positive, FP), correctly indicates the absence of crisis (true negative, TN), and incorrectly indicates the absence of crisis (false negative, FN) is provided above. The NTSR, IV, and $U_R$ metrics (defined in Section 3.3) describe the balance between accurate and flawed signals $S^x_{i,t}$ (shaded columns). A superior measure of systemic conditions $S^x_{i,t}$ would demonstrate an NTSR close to zero, an IV close to 0.5, and a $U_R$ near 1.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.jfs.2016.06.008.

References

Alessi, L., Detken, C., 2011. Quasi real time early warning indicators for costly asset price boom/bust cycles: a role for global liquidity. Eur. J. Political Econ. 27 (3), 520–533.
Alessi, L., Detken, C., 2014. On policymakers’ loss functions and the evaluation of early warning systems: comment. Econ. Lett. 124 (3), 338–340.
Andreou, I., Dufrénot, G., Sand-Zantman, A., Zdzienicka-Durand, A., 2009. A forewarning indicator system for financial crises: the case of six Central and Eastern European countries. J. Econ. Integr. 24 (1), 87–115.
Arthur, W.B., 1995. Complexity in economic and financial markets. Complexity 1 (1), 20–25.
Babecký, J., Havránek, T., Matějů, J., Rusnák, M., Šmídková, K., Vašíček, B., 2013. Leading indicators of crisis incidence: evidence from developed countries. J. Int. Money Finance 35 (June), 1–19.
Banerji, A., 1999. The three Ps: simple tools for monitoring economic cycles. Bus. Econ. 34 (4), 72–77.
Berg, A., Pattillo, C., 1999. Predicting currency crises: the indicators approach and an alternative. J. Int. Money Finance 18 (4), 561–586.
Betz, F., Oprica, S., Peltonen, T., Sarlin, P., 2014. Predicting distress in European banks. J. Bank. Finance 45 (August), 225–241.
Board of Governors of the Federal Reserve System, 2014. Financial Accounts of the United States, Z1 Report, Washington.
BoE, 2011. Instruments of Macroprudential Policy. Bank of England (Discussion Paper, December).
Bordo, M.D., Dueker, M., Wheelock, D., 2000. Aggregate Price Shocks and Financial Instability: An Historical Analysis. Federal Reserve Bank of St. Louis Working Paper No. 005B. Federal Reserve Bank of St. Louis, St. Louis, MO, USA.
Bordo, M.D., Humpage, O.F., Schwartz, A.J., 2015. Strained Relations: US Foreign-exchange Operations and Monetary Policy in the Twentieth Century. University of Chicago Press.
Borio, C.E., 2003. Towards a macroprudential framework for financial supervision and regulation. CESifo Econ. Stud. 49 (2), 181–216.
Borio, C.E., Drehmann, M., 2009. Assessing the risk of banking crises – revisited. BIS Q. Rev. (March).
Borio, C.E., Lowe, P., 2002. Assessing the risk of banking crises. BIS Q. Rev. (December), 43–54.
Box, G.E., Jenkins, G.M., 1970. Time Series Analysis: Forecasting and Control. John Wiley & Sons, Hoboken.
Brave, S., Butters, A., 2012. Diagnosing the financial system: financial conditions and financial stress. Int. J. Cent. Bank. 8 (2), 191–239.
Brock, W.A., Hommes, C.H., 1997. A rational route to randomness. Econom.: J. Econom. Soc. 65 (5), 1059–1095.
Brock, W.A., Hommes, C.H., 1998. Heterogeneous beliefs and routes to chaos in a simple asset-pricing model. J. Econ. Dyn. Control 22 (8–9), 1235–1274.
Brock, W.A., Durlauf, S.N., West, K.D., 2003. Policy evaluation in uncertain economic environments. Brook. Pap. Econ. Activ. 2003 (1), 235–301.
Bussiere, M., Fratzscher, M., 2008. Low probability, high impact: policy making and extreme events. J. Policy Model. 30 (1), 111–121.
Carlson, M., Lewis, K., Nelson, W., 2012. Using Policy Intervention to Identify Financial Stress. Federal Reserve Board Working Paper No. 2012-02. Federal Reserve, Washington DC, USA.
Carlstrom, C., Fuerst, T., 2014. Adding double inertia to Taylor rules to improve accuracy. Econ. Comment. (May).
Davis, E.P., Karim, D., 2008. Comparing early warning systems for banking crises. J. Financ. Stab. 4 (2), 89–120.
Demirgüç-Kunt, A., Detragiache, E., 2000. Monitoring banking sector fragility: a multivariate logit approach. World Bank Econ. Rev. 14 (2), 287–307.
Dopfer, K., Potts, J., 2007. The General Theory of Economic Evolution. Routledge.
Drehmann, M., Juselius, M., 2014. Evaluating early warning indicators of banking crises: satisfying policy requirements. Int. J. Forecast. 30 (3), 759–780.
Dridi, A., El Ghourabi, M., Limam, M., 2012. On monitoring financial stress index with extreme value theory. Quant. Finance 12 (3), 329–339.
Edison, H.J., 2003. Do indicators of financial crises work? An evaluation of an early warning system. Int. J. Finance Econ. 8 (1), 11–53.
Elkan, C., 2001. The foundations of cost-sensitive learning. Proceedings of the 17th International Joint Conference on Artificial Intelligence vol. 2, 973–978.
Elsner, W., Heinrich, T., Schwardt, H., 2014. The Microeconomics of Complex Economies: Evolutionary, Institutional, Neoclassical, and Complexity Perspectives. Academic Press, pp. 307–363.
Fama, E.F., French, K.R., 1989. Business conditions and expected returns on stocks and bonds. J. Financ. Econ. 25 (1), 23–49.
Farmer, J.D., Lo, A.W., 1999. Frontiers of finance: evolution and efficient markets. Proc. Natl. Acad. Sci. 96 (18), 9991–9992.
Federal Reserve Bank of New York (FRBNY), 2013. Crisis Timeline, Retrieved from https://www.newyorkfed.org/medialibrary/media/research/global_economy/Crisis_Timeline.pdf (accessed 01.05.16.).
Gallegati, M., 2014. Early warning signals of financial stress: a wavelet-based composite indicators approach. In: Advances in Non-linear Economic Modeling. Springer, Berlin/Heidelberg, pp. 115–138.
Gramlich, D., Oet, M.V., Miller, G., Ong, S.J., 2010. Early warning systems for systemic banking risk: critical review and modeling implications. Banks Bank Syst. 5 (2), 199–211.
Gerdesmeier, D., Reimers, H.E., Roffia, B., 2010. Asset price misalignments and the role of money and credit. Int. Finance 13 (3), 377–407.
Hakkio, C.S., Keeton, W.R., 2009. Financial stress: what is it, how can it be measured, and why does it matter? Econ. Rev.: Fed. Reserve Bank Kans. City 94 (2), 5.
Hamming, R.W., 1950. Error detecting and error correcting codes. Bell Syst. Tech. J. 29 (2), 147–160.
Hatzius, J., Hooper, P., Mishkin, F.S., Schoenholtz, K.L., Watson, M.W., 2010. Financial Conditions Indexes: A Fresh Look after the Financial Crisis, NBER Working Paper No. 16150. National Bureau of Economic Research (NBER), Cambridge, MA, USA.
Hanschel, E., Monnin, P., 2005. Measuring and forecasting stress in the banking sector: evidence from Switzerland. BIS Pap. 22, 431–449.
Holland, J.H., 1975. Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor, MI.


Holland, J.H., 1988. The global economy as an adaptive process. In: Anderson, P.W., Arrow, K.J., Pines, D. (Eds.), The Economy as an Evolving Complex System. Addison-Wesley, Reading, MA, pp. 117–124.
Hollingsworth, J.R., Müller, K.H., Hollingsworth, E.J., 2005. Advancing Socio-economics: An Institutionalist Perspective. Rowman & Littlefield Publishers, Inc.
Holló, D., Kremer, M., Lo Duca, M., 2012. CISS – A Composite Indicator of Systemic Stress in the Financial System. European Central Bank Working Paper No. 1426. European Central Bank (ECB), Frankfurt, Germany.
Holopainen, M., Sarlin, P., 2015. Toward Robust Early-warning Models: A Horse Race, Ensembles and Model Uncertainty. Bank of Finland (Research Discussion Paper (6)).
Hommes, C.H., 2001. Financial markets as nonlinear adaptive evolutionary systems. Quant. Finance 1 (1), 149–167.
Igel, C., Toussaint, M., 2005. A no-free-lunch theorem for non-uniform distributions of target functions. J. Math. Model. Algorithms 3 (4), 313–322.
Illing, M., Liu, Y., 2006. Measuring financial stress in a developed country: an application to Canada. J. Financ. Stab. 2 (3), 243–265.
Johansen, S., 1995. Likelihood-based Inference in Cointegrated Vector Autoregressive Models. Oxford University Press, Oxford.
Judge, K., 2012. Fragmentation nodes: a study in financial innovation, complexity, and systemic risk. Stanf. Law Rev. 64 (3), 657–725.
Kaminsky, G., Lizondo, S., Reinhart, C.M., 1998. Leading Indicators of Currency Crises. International Monetary Fund, pp. 1–48 (Staff Papers).
Kaminsky, G.L., Reinhart, C.M., 1999. The twin crises: the causes of banking and balance-of-payments problems. Am. Econ. Rev. 89 (3), 473–500.
Kliesen, K.L., Owyang, M.T., Vermann, K.E., 2012. Disentangling diverse measures: a survey of financial stress indexes. Fed. Reserve Bank St. Louis Rev. 94 (5), 369–398.
Korinek, A., 2011. Systemic Risk-taking: Amplification Effects, Externalities, and Regulatory Responses. Networks Financial Institute Working Paper No. 2011-WP-13. Networks Financial Institute, Terre Haute, IN, USA.
Kullback, S., 1959. Information Theory and Statistics. John Wiley and Sons, New York.
Laeven, L., Valencia, F., 2013. Systemic banking crises database. IMF Econ. Rev. 61 (2), 225–270.
Leeper, E.M., Sargent, T.J., 2003. Comments and discussion. Brook. Pap. Econ. Activ. 2003 (1), 302–313.
Levins, R., 1968. Evolution in Changing Environments: Some Theoretical Explorations. Princeton University Press.
Lim, C., Bhattacharya, R., Columba, F., Costa, A., Otani, A., Wu, X., 2011. Macroprudential Policy: An Organizing Framework. Background Paper. IMF Mimeo (March).
Lo Duca, M., Peltonen, T.A., 2013. Assessing systemic risks and predicting systemic events. J. Bank. Finance 37 (7), 2183–2195.
Mantegna, R.N., Stanley, H.E., 1999. Introduction to Econophysics: Correlations and Complexity in Finance. Cambridge University Press, Cambridge.
Mohr, L.B., 1982. Explaining Organizational Behavior. Jossey-Bass, San Francisco.
Moore, G.H., 1954. Analyzing business cycles. Am. Stat. 8 (2), 13–19.
Moore, G.H., 1967. What is a recession? Am. Stat. 21 (4), 16–19.
Oet, M.V., Bianco, T., Gramlich, D., Ong, S.J., 2013. SAFE: an early warning system for systemic banking risk. J. Bank. Finance 37 (11), 4510–4533.
Oet, M.V., Dooley, J.M., Janosko, A.C., Gramlich, D., Ong, S.J., 2015a. Supervising system stress in multiple markets. Risks 3 (3), 365–389.
Oet, M.V., Dooley, J.M., Ong, S.J., 2015b. The financial stress index: identification of systemic risk conditions. Risks 3 (3), 420–444.
Rönnqvist, S., Sarlin, P., 2015. Detect & describe: deep learning of bank stress in the news. arXiv preprint arXiv:1507.07870.
Rosser, J.B., 2013. From Catastrophe to Chaos: A General Theory of Economic Discontinuities. Springer Science & Business Media, Dordrecht.
Sarlin, P., 2013. On policymakers’ loss functions and the evaluation of early warning systems. Econ. Lett. 119 (1), 1–7.
Saviotti, P., Metcalfe, J.S., 1991. Evolutionary Theories of Economic and Technological Change: Present Status and Future Prospects. Harwood Academic Publishers.
Schwert, G.W., 1989. Why does stock market volatility change over time? J. Finance 44 (5), 1115–1153.
Scott, W.R., Davis, G.F., 2015. Organizations and Organizing: Rational, Natural and Open Systems Perspectives. Routledge, London.
Shannon, C.E., Weaver, W., 1949. The Mathematical Theory of Communication. University of Illinois Press, Urbana.
Shannon, C.E., 1949. Communication in the presence of noise. Proc. IRE 37 (1), 10–21.
Shiller, R.J., 1989. Causes of changing financial market volatility. In: Financial Market Volatility. Federal Reserve Bank of Kansas City, pp. 1–32.
Siddiqi, N., 2006. Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring. SAS Institute, pp. 79–83.
Simon, H.A., 1957. Models of Man, Social and Rational: Mathematical Essays on Rational Human Behavior in a Social Setting. John Wiley, New York.
Simon, H.A., 1962. The architecture of complexity. Proc. Am. Philos. Soc. 106 (6), 467–482.
Simon, H.A., 1979. Rational decision making in business organizations. Am. Econ. Rev. 69 (4), 493–513.
Simon, H.A., 1991. Organizations and markets. J. Econ. Perspect. 5 (2), 25–44.
Streeter, M.J., 2003. Two broad classes of functions for which a no free lunch result does not hold. In: Genetic and Evolutionary Computation – GECCO 2003. Springer, Berlin/Heidelberg, pp. 1418–1430.
Stock, J.H., Watson, M.W., 2002. Macroeconomic forecasting using diffusion indexes. J. Bus. Econ. Stat. 20 (2), 147–162.
Swiston, A., 2008. A U.S. Financial Conditions Index. International Monetary Fund Working Paper No. 16. International Monetary Fund, Washington DC, USA.
Thompson, J.D., 1967. Organizations in Action: Social Science Bases of Administrative Theory. Transaction Publishers.
Wallis, W.A., Moore, G.H., 1941. A significance test for time series analysis. J. Am. Stat. Assoc. 36 (215), 401–409.
Wolpert, D.H., Macready, W.G., 1995. No Free Lunch Theorems for Search, Technical Report SFI-TR-95-02-010 vol. 10. Santa Fe Institute.
Wolpert, D.H., Macready, W.G., 1997. No free lunch theorems for optimization. IEEE Trans. Evol. Comput. 1 (1), 67–82.
Young, H.P., 2001. Individual Strategy and Social Structure: An Evolutionary Theory of Institutions. Princeton University Press, Princeton.
Zarnowitz, V., 1985. Recent work on business cycles in historical perspective: review of theories and evidence. J. Econ. Lit. 23 (6), 523–580.
