Journal of Policy Modeling 28 (2006) 223–234

Probability forecasting and central bank accountability

Gabriel Casillas-Olvera a,b, David A. Bessler c,*

a Banco de México, Av. 5 de Mayo No. 6, Col. Centro, 06059 México, DF, Mexico
b Department of Economics, Instituto Tecnológico Autónomo de México (ITAM), México, DF, México
c Department of Agricultural Economics, Texas A&M University, Mail Stop 2124, College Station, TX 77845, USA

Received 11 July 2005; received in revised form 20 September 2005; accepted 10 October 2005 Available online 5 December 2005

Abstract

The paper studies probability forecasts of inflation and GDP by monetary authorities. Such forecasts can contribute to central bank transparency and reputation building. Problems with principal and agent make the usual argument for using scoring rules to motivate probability forecasts confused; however, their use to evaluate forecasts remains valid. Public comparison of forecasting results with a "shadow" committee is helpful to promote reputation building and thus serves the motivational role. The Brier score and its Yates-partition of the Bank of England's forecasts are compared with those of a group of non-bank experts.

© 2005 Society for Policy Modeling. Published by Elsevier Inc. All rights reserved.

JEL classification: E58; C8

Keywords: Central banks; Accountability; Probability forecasting; Brier score; Yates' partition

If you twist my arm, you can make me give a single number as a guess about next year's GNP. But you will have to twist hard. My scientific conscience would feel more comfortable giving you my subjective probability distribution for all the values of GNP.
Paul A. Samuelson (1965), p. 278.

1. Introduction

For years, the conduct of monetary policy by central bankers has been a mystery to the general public. Central bankers built reputations making decisions in environments of confidentiality.

* Corresponding author. Tel.: +1 409 845 2116; fax: +1 409 862 1563. E-mail address: [email protected] (D.A. Bessler).

0161-8938/$ – see front matter © 2005 Society for Policy Modeling. Published by Elsevier Inc. All rights reserved. doi:10.1016/j.jpolmod.2005.10.004


Arguments supporting a higher degree of transparency have recently persuaded monetary authorities to be more open about policymaking decisions, to the point that some make their forecasts of key variables public. Intensifying the public's response to monetary policy changes is among the potential gains of increased transparency (Svensson, 1997; Woodford, 2003). The Bank of England (BoE) is one of the few central banks that actually publish inflation forecasts.[1] The Monetary Policy Committee (MPC) of the BoE has been issuing density forecasts of inflation, also called "fan charts," on a quarterly basis in its Inflation Report since August 1997. It has been issuing output growth forecasts since November 1997. In addition, the BoE has published probabilistic forecasts of these two "key" variables from a quarterly survey of undisclosed external forecasters, averaging their responses for each range of the probability distribution.

In this paper, we evaluate the probability forecasts of the MPC and those of the group of undisclosed external forecasters using the Brier score and its partition, the latter originally suggested by Yates (1982). Our purpose is to demonstrate that ex post evaluations of the probability forecasts of both the MPC and an alternative "shadow" committee offer valuable information that is not available from reports on the MPC alone.[2] A humorous (slightly edited) epigraph, summarizing a conversation between person "A" and person "B" in Granger and Newbold (1986, p. 265), illustrates our suggestion well: "A: How is your spouse? B: Compared to what?" Comparing the central bank's probability forecasts with those of a competent but "shadow" expert will help induce forecasting "soundness" through reputation building and learning. Analyzing both forecasters' predictive performance appeals to the forecast-competition argument suggested in the Granger and Newbold quote.

Recognizing the incentive-compatible feature of the Brier score, we considered (and later ruled out) using the Brier score in the context of a contract between the government and the central bank in the spirit of Persson and Tabellini (1993, 1999, 2000) and Walsh (1995, 1998). Because of ambiguities, discussed in McCallum (1999) and Blinder (1998), that present themselves in central banking, this possibility was abandoned.[3] Among these ambiguities is determining whether it is the principal (Parliament or Congress) or the agent (central bank) who has more incentive to try to boost real output in the short run by creating "surprise inflation."

Clements (2004) also calculates the Brier score of the MPC forecasts. This paper differs from his in that we apply the Yates decomposition to extract meaningful information about the forecaster's beliefs. We find that the MPC is upwardly biased, placing larger probabilities on the high-inflation state, preventing the less conservative members of the Committee from gaining approval for interest rate cuts. These results are consistent with Pagan (2003), Wallis (2003, 2004) and Clements (2004). The Yates-partition shows that the MPC forecasts of inflation do not sort or discriminate between events that occur and events that do not occur as well as the "shadow" forecasters do. On the other hand, the MPC's forecasts of GDP sort (distinguish between events that ultimately obtain and events that do not) about as well as the "shadow" committee's.

[1] Hatch (2001) provides an insightful introduction to the Bank of England's modeling and forecasting. For detailed information on the construction of fan charts, see Britton, Fisher, and Whitley (1998).
[2] Here we use the undisclosed experts as a "shadow committee" because these forecasts are available. We do not necessarily argue for or against the selection of this particular set of individuals. If our suggestion is to be used as policy, selection of the "shadow committee" will need to be given more consideration than we have given it in this particular case.
[3] Connecting Walsh's (1995, 1998) and Persson and Tabellini's (1993, 1999, 2000) contracting approach to our proposal would have required following it literally; consequently, we did not give it practical consideration. This should not be interpreted as discounting the importance of their contributions to the assessment of modern monetary policy issues.


The remainder of the paper is divided into three sections. Section 2 provides an overview of probabilistic forecasting concepts. Section 3 presents empirical results using the evaluation methods on the density forecasts of the MPC and an external group of forecasters on UK inflation and GDP. Section 4 concludes the paper.

2. Probability forecasting

The probability forecasts studied in this paper fall under the heading of "prequential analysis" as introduced by Dawid (1984). Prequential analysis refers to the study of sequential probability forecasting. Let X_t, t = 1, ..., N, be a K × 1 vector time series of realized values x_t = (x_{1,t}, ..., x_{K,t}), with K defined (discrete) possible outcomes. Assume that at time N, given the observed values x_t, t = 1, ..., N, the forecaster issues a set of probability distributions P_{N,m} = (P_{N+j} | j = 1, ..., m) for future (unknown) quantities x_{N+j}, j = 1, ..., m. A relationship P which links a selected P_{N,m} with each value of N and any possible set of outcomes x_t, t = N + 1, ..., N + m, is defined as a prequential forecasting system (PFS) (Dawid, 1984). Although we could think of probabilities arising from computer-based models, this could also apply to subjective probability judgments (Kling & Bessler, 1989).

2.1. Probability forecast evaluation

There are three aspects to consider in judging a probability forecast as good or bad: coherence, calibration and expertise (Winkler, 1986). Coherence refers to the a priori condition that the assessed probabilities obey the rules of the probability calculus. In our study of the MPC forecasts, the events forecasted are mutually exclusive and exhaustive, giving us the coherence condition that the probabilities sum to one (de Finetti, 1974). Calibration is the ability to match the ex post relative frequency of all events with the associated ex ante forecasted probability distribution: if a forecaster (a model or a person) issues the probability 0.20 one hundred times (the probability of 0.2 is issued for one hundred events), we should expect to see, after the fact, 20 of the events actually occur for a well-calibrated forecaster. Finally, expertise, or the ability to show resolution, refers to the ability of the assessor to distinguish (ex ante) between events that ultimately occur and events that do not. Both calibration and expertise (resolution) can be measured with the Brier scoring rule and its partition (more on this below).

Bessler and Ruffley (2004) point out a potential problem with using only calibration-based forecast evaluation methods. Calibration does not measure the ability of the forecaster (model or person) to sort or distinguish between events that actually occur and events that do not. A forecaster could be perfectly calibrated but, at the same time, not particularly informative, in the sense of being unable to sort or signal events into groups: those that occur versus those that do not. The Yates-partition of the Brier score provides information on such sorting.
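To make the calibration idea concrete, a minimal sketch in Python of a calibration check is given below. It bins forecasts by issued probability and compares each bin's mean forecast with the observed relative frequency. The function name and bin choice are our own illustrative assumptions, not part of the original paper.

```python
import numpy as np

def calibration_table(p, d, bins=np.linspace(0.0, 1.0, 11)):
    """Compare ex ante probabilities with ex post relative frequencies.

    p : issued probabilities for a binary event, one per occasion
    d : outcome indices (1 if the event occurred, 0 otherwise)
    For a well-calibrated forecaster, occasions assigned probability
    near 0.2 should see the event occur about 20% of the time.
    """
    p, d = np.asarray(p, float), np.asarray(d, float)
    rows = []
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (p >= lo) & (p < hi)
        if mask.any():
            # (bin, mean issued probability, observed frequency, count)
            rows.append(((lo, hi), p[mask].mean(), d[mask].mean(), int(mask.sum())))
    return rows
```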
2.2. The quadratic probability rule and the Brier score

The quadratic rule was first introduced by Brier (1950) in the context of weather forecasting. The mean probability score or Brier score is a variant of the quadratic scoring rule. The Brier score belongs to a set of rules called proper rules or, in modern contract-theory terminology, incentive-compatible forecasting scores. This means that these rules encourage honesty (Osband, 1989; Winkler, 1986). There is a rich history of the use of quadratic scoring rules to motivate and evaluate subjective probabilities, with strong foundations in both theoretical (de Finetti, 1937, 1965, 1974; Savage, 1971) and experimental (Nelson & Bessler, 1989) work.

Let the probabilistic forecast of an event k's occurrence be denoted by p and let d be the outcome index for event k: d = 1 if k occurs and d = 0 if it does not. The quadratic probability score (PS) for a single forecast is

PS(p, d) = (p − d)^2,  0 ≤ PS ≤ 1  (1)

PS ranges between 0 and 1. A score of 0 means that the forecaster predicted the events perfectly; a forecaster who performed badly gets a 1. The mean probability score or Brier score (PS) is the average of the single-forecast probability score (1) over N occasions, indexed by t = 1, ..., N:

PS(p, d) = (1/N) Σ_{t=1}^{N} (p_t − d_t)^2  (2)

Here, the notation is the same as above. The Brier mean probability score can also be expressed for a more-than-two-event case.

2.3. Yates' decomposition of the Brier score

Yates (1982, 1988) emphasizes the covariance between the reported forecasts and the outcomes as "the heart of forecasting". We now show Yates' decomposition of the Brier score (PS). Yates (1982, 1988) decomposed the Brier score into several modules, providing further analysis of resolution. Yates' so-called covariance decomposition is:4

PS = Bias^2 + Scatter + Minimum variance of the probability forecast (p) + Variance of the outcome index (d) − 2 × Covariance between p and d  (3)

Bias is also called calibration-in-the-large, or mean probability judgment. It quantifies whether the probability forecasts are on average too low or too high, and is a measure of miscalibration of the probability assessments. A bias of 0 indicates that the forecaster perfectly matched the mean forecast to the relative frequency of the outcome index (i.e., the forecaster is perfectly calibrated). Scatter can be interpreted as an index of the general excess variability (or noise) contained in the forecaster's probability statements; accordingly, the minimum variance of the probability forecast equals the overall forecast variance whenever there is no scatter about the conditional means of the probabilities for events that occur and events that do not. If the variance of the outcome index is completely exogenous to the forecaster's judgments, the appraiser has to minimize the scatter and the minimum variance of the probability forecast and maximize the covariance between the forecast and the outcome index. This covariance measures the responsiveness of the forecaster to information related to the event's occurrence, while the scatter indexes the forecaster's responsiveness to information not related to the event's occurrence.
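As a concrete check of decomposition (3), here is a minimal two-event sketch in Python. The function name and the illustrative data at the end are our own assumptions; the paper itself applies the multiple-event analogue of the decomposition.

```python
import numpy as np

def yates_partition(p, d):
    """Brier score (2) and its Yates covariance decomposition (3).

    p : issued probabilities for a binary event, one per occasion
    d : outcome indices (1 if the event occurred, 0 otherwise)
    """
    p, d = np.asarray(p, float), np.asarray(d, float)
    n = len(p)
    brier = np.mean((p - d) ** 2)                        # Eq. (2)
    bias2 = (p.mean() - d.mean()) ** 2                   # squared calibration-in-the-large
    var_d = np.var(d)                                    # variance of the outcome index
    cov_pd = np.mean((p - p.mean()) * (d - d.mean()))    # responsiveness to the event
    # Scatter: outcome-conditional forecast variance (noise in the judgments)
    scatter = (np.var(p[d == 1]) * (d == 1).sum() +
               np.var(p[d == 0]) * (d == 0).sum()) / n
    min_var_p = np.var(p) - scatter                      # minimum forecast variance
    # Identity (3) holds exactly with these population moments
    assert np.isclose(brier, bias2 + scatter + min_var_p + var_d - 2 * cov_pd)
    return {"Brier": brier, "Bias^2": bias2, "Scatter": scatter,
            "MinVar(p)": min_var_p, "Var(d)": var_d, "Cov(p,d)": cov_pd}

# Illustrative (made-up) forecasts and outcomes, not the paper's data:
print(yates_partition(p=[0.8, 0.3, 0.6, 0.2, 0.7], d=[1, 0, 1, 0, 0]))
```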

4 We deliberately departed from the original notation (Yates, 1982) for consistency within the present paper.


Fig. 1. Monetary Policy Committee fan chart of inflation.

3. Evaluation of the Bank of England's fan charts

The Monetary Policy Committee took responsibility for publishing an Inflation Report on a quarterly basis when the BoE was given operational independence in 1997. The MPC meets every first Wednesday and Thursday following the first Monday of each month, and it issues its forecasts at the February, May, August and November meetings. The MPC has been publishing its inflation forecasts as fan charts since August 1997 and its output growth forecasts since November 1997. These forecasts are conditional on the assumption that interest rates remain constant at the level the MPC decided. The Committee reports its projections for one-quarter up to eight-quarters ahead on mutually exclusive and exhaustive partitions of the inflation and output growth (GDP growth) space.

Recently, Wallis (2003, 2004) and Clements (2004) have performed calibration-based analyses of the MPC's 1-year-ahead inflation density forecasts. They agree that the MPC overestimated future uncertainty, making its probabilistic inflation forecasts "fan out" too rapidly. Further, they suggest the existence of bias, stressing that the MPC has placed too much probability in the upper ranges of the forecasted distribution. While Wallis (2004) compares the MPC inflation forecasts with forecasts issued by the National Institute of Economic and Social Research (NIESR),5 Wallis (2003) and Clements (2004) do not offer comparisons with other forecasters.

Fig. 1 depicts historical inflation in the retail price index excluding mortgage interest payments (RPIX). As the graph fans out, it portrays the probability of various outcomes for future inflation. These projections are based on constant nominal interest rates at 4% (Bank of England, May 2003). The BoE's inflation target was based on the RPIX until December 2003, when the MPC changed the target to inflation based on the CPI.

5 The National Institute of Economic and Social Research has issued one-quarter and 1-year-ahead probabilistic forecasts for inflation and output growth in the National Institute Economic Review since October 1996.


We assess the MPC's and the external forecasters' density forecasts issued from February 1998 through May 2001, covering inflation and the GDP growth rate for the first quarter of 2000 to the second quarter of 2003. Although the MPC started reporting its inflation forecasts in density form in August 1997, it did not report GDP until November 1997. Moreover, the real GDP growth density forecast for November 1997 is reported only as a fan chart and not as an explicit table.

The Brier score is said to work only for unconditional forecast evaluations (Clements & Smith, 2000). Clements (2004) and Wallis (2003, 2004) argue that the 1-year-ahead forecasts can be treated as unconditional forecasts, since interest rates do not have a large impact on inflation in the short run. However, due to the lack of consistent 1-year-ahead forecasts issued by the external surveyed forecasters and published in the Inflation Report, we focus our analysis on the 2-year-ahead forecasts. Clements and Smith (2000) conclude that evaluating conditional forecasts with evaluation techniques designed for unconditional forecasts is not a major drawback if the data set is sufficiently small;6 exploring better methods becomes worthwhile as the focus of the analysis shifts to larger data sets. On the other hand, evaluating the 2-year density forecasts is important, since longer forecasting horizons are usually related to the establishment of central bank credibility. The BoE has issued fan charts conditional on a constant interest rate. Since February 1997, the MPC has also published fan charts conditional on market expectations of the interest rate, extracting information from the implied volatilities in the market for options on government bonds with different maturities (Bahra, 1997).

3.1. Probability scores, Brier scores and Yates' partitions of the density forecasts of the UK

Here we calculate the multiple-event probability scores (PSM) for each quarter's forecast of inflation and the output growth rate, for both the MPC and the BoE's surveyed forecasters. The latter group will be our "shadow" committee. While we do not necessarily endorse this particular committee, as we have no information on how its members were selected, it does provide a set of probability forecasts to compare with those of the MPC. Clearly, much thought needs to go into the selection of a "shadow" committee if this proposal is adopted as policy.

Figs. 2 and 3 illustrate the PSMs obtained for the MPC's and the other forecasters' assessments of inflation and output growth. At first sight, the upward trend in both graphs caught our attention.7 Learning-by-doing theories suggest a hypothesis: experience in forecasting is gained as time goes by, so better scores should be achieved. We do not find support for this idea, though admittedly the analysis is performed on very few observations. On the other hand, both series follow the same trend, so it is not clear that this is an issue. An oddity that becomes apparent is the spike in the forecast for the second quarter of 2002, published in May 2000. While this appears in both the MPC's and the surveyed forecasters' PSMs, it is clearly more dramatic for the other forecasters.
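For concreteness, a minimal sketch in Python of the multiple-event probability score for a single forecast occasion follows. The function name and the five-bin example are illustrative assumptions of ours, not the MPC's data.

```python
import numpy as np

def psm(probs, outcome):
    """Multiple-event Brier probability score for one forecast occasion.

    probs   : forecast probabilities over K mutually exclusive and
              exhaustive bins (they must sum to one -- coherence)
    outcome : index of the bin that actually obtained
    The score lies in [0, 2]: 0 for a perfect forecast, 2 for the worst.
    """
    p = np.asarray(probs, float)
    d = np.zeros_like(p)
    d[outcome] = 1.0                      # outcome index vector
    return float(np.sum((p - d) ** 2))

# A hypothetical five-bin inflation forecast, not the MPC's numbers:
print(psm([0.05, 0.20, 0.40, 0.25, 0.10], outcome=2))  # -> 0.475
```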

6 An idea not followed here for evaluating conditional probability forecasts is de Finetti's (1974) notion of called-off bets. Bessler and Kling (1989) apply it to the evaluation of probability forecasts on one variable, say y, where the conditional is defined in terms of an interval on another variable, say x1 < x < x2. There they study the calibration properties of probabilistic forecasts of y only for those cases where x falls into the conditional interval; other probabilities, where the conditional does not obtain, are not evaluated.
7 Recall that the multiple-event probability score ranges from 0 to 2. An appraiser with perfect forecasts attains a 0; the worst possible score is 2.


Fig. 2. Multiple-event Brier probability score of the MPC and “Other” forecasters’ inflation forecast. Notes: PSM is the multiple-event Brier probability score, MPC the Monetary Policy Committee and “Others” are the “shadow” committee forecasters.

The over-responsiveness shown by the surveyed forecasters might be explained by the fact that the MPC changes its surveyed sources ("shadow" committee members) from time to time without notice. Changes in the MPC itself may also generate responses by the "shadow" committee. Such changes may generate "outliers," as the other forecasters may attribute a dominant role to new MPC members, each of whom has a history preceding committee service. Members' reputations as interest rate "doves" or "hawks" may be reflected in the "shadow" forecasters' probabilities, especially in the early periods of the forecasting experience (see David, 2003, for discussion of the 2000–2002 period).

The PSM analysis also shows that the "shadow" committee performs better than the MPC. In terms of the Brier score for the inflation forecasts, the "shadow" committee does a better job than the MPC, obtaining 0.59 against the MPC's 0.70. A completely different situation emerges when we turn to the output growth forecasts.

Fig. 3. Multiple-event Brier probability score for the MPC and “Other” forecasters’ real GDP growth forecast. Notes: PSM is the multiple-event Brier probability score, MPC the Monetary Policy Committee and “Others” are the “shadow” committee forecasters.


Table 1
Brier mean probability scores and their Yates-partitions on the MPC and "shadow" committee forecasts

Inflation
                              2000 Q1–2001 Q1 (5 obs.)    2001 Q2–2003 Q2 (9 obs.)    Overall period (14 obs.)
                              MPC        Others           MPC        Others           MPC        Others
Brier score                   0.6256     0.5136           0.7555     0.6466           0.7091     0.5991
Variance of d                 0.0000     0.0000           0.5679     0.5679           0.3651     0.3651
Minimum variance of p         0.0000     0.0000           0.0000     0.0000           0.0000     0.0000
Scatter                       0.0071     0.0031           0.0043     0.0009           0.0053     0.0017
Bias^2                        0.6185     0.5105           0.1763     0.0770           0.3342     0.2318
Covariance between p and d    0.0000     0.0000           −0.0035    −0.0004          −0.0022    −0.0002

GDP growth rate
                              2000 Q1–2001 Q1 (5 obs.)    2001 Q2–2003 Q2 (9 obs.)    Overall period (14 obs.)
                              MPC        Others           MPC        Others           MPC        Others
Brier score                   0.7210     0.8212           0.7083     0.6548           0.7128     0.7142
Variance of d                 0.4800     0.4800           0.4938     0.4938           0.4889     0.4889
Minimum variance of p         0.0002     0.0003           0.0001     0.0001           0.0002     0.0002
Scatter                       0.0037     0.0024           0.0061     0.0036           0.0052     0.0032
Bias^2                        0.2371     0.3145           0.2110     0.1613           0.2203     0.2160
Covariance between p and d    0.0000     −0.0120          0.0014     0.0020           0.0009     −0.0030

Combined forecasts
                              2000 Q1–2001 Q1 (10 obs.)   2001 Q2–2003 Q2 (18 obs.)   Overall period (28 obs.)
                              MPC        Others           MPC        Others           MPC        Others
Brier score                   0.6733     0.6674           0.7319     0.6507           0.7110     0.6567
Variance of d                 0.2400     0.2400           0.5309     0.5309           0.4270     0.4270
Minimum variance of p         0.0001     0.0002           0.0001     0.0000           0.0001     0.0001
Scatter                       0.0054     0.0027           0.0052     0.0022           0.0053     0.0024
Bias^2                        0.4278     0.4125           0.1937     0.1192           0.2773     0.2239
Covariance between p and d    0.0000     −0.0060          −0.0011    0.0008           −0.0007    −0.0016

Notes: MPC is the Bank of England Monetary Policy Committee. "Others" are the "shadow" committee of forecasters. The five numbers below the Brier score are the components of its Yates-decomposition. To avoid ambiguities, the covariance between p and d is reported instead of −2 × Cov(p, d).

Both appraisers perform a similar job there, with scores almost matching at 0.71. Combining both the inflation and the output growth forecasts, we observe that the MPC's Brier score worsens from period one to period two; this does not happen to the "shadow" committee. These numerical results resemble what we have already seen in Figs. 2 and 3. The difference here is that we are able to ask why the two forecasters have different scores on the inflation predictions. To answer the question, consider the bias and covariance components in the first panel of Table 1. Even though both forecasters show biases in their assessments, the MPC acquires the larger bias, 0.33, compared with 0.23 for the other forecasters. In addition, the MPC's covariance between its forecasts and the outcome index is −0.0022, more negative than the "shadow" forecasters' almost non-existent covariance of −0.0002. When the covariance term is negative, the appraiser would do better to have no covariance at all, since the covariance enters the Brier score with a negative sign, so a negative value raises the score.
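As a worked consistency check of decomposition (3), take the overall-period inflation column for the MPC in Table 1: 0.3342 + 0.0053 + 0.0000 + 0.3651 − 2 × (−0.0022) = 0.7090, which matches the reported Brier score of 0.7091 up to rounding.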


Bias can be interpreted as "wishful thinking." It can enter through direct qualitative corrections to the forecasts, or be embedded in the models' assumptions about how the economy works. Mankiw (1998) approaches this concern as follows: "Wishful thinking is one reason that monetary policy has historically been excessively inflationary . . .. To my mind, wishful thinking is as worrisome a problem for monetary policy as time-inconsistency." In our case, however, the "wishful thinking" comes from the inflation forecast rather than from the output growth predictions. Thus, we do not find evidence of bias towards exploiting the output-inflation trade-off. On the contrary, our results support the idea that the "wishful thinking" is directed towards a lower inflation level. This explanation gains weight if we consider the popular accounts attributing the peak in Fig. 2 to the rejection of dovish committee members, owing to a supposed fear of inflation on the MPC's side (see David, 2003).

Turning to another set of observations, the MPC consistently shows a larger scatter than the surveyed appraisers. Even at the overall level, that is, combining the forecasts, the most emphatic difference between the two forecasters' Yates-partition components is the scatter. An intuitive interpretation of the relationship between the total forecast variance, its components (minimum forecast variance and scatter) and the covariance between the probability forecasts and the outcome index follows. The covariance between p and d measures the responsiveness of the forecaster to information related to the event's occurrence, and the scatter indexes the forecaster's responsiveness to information not related to the event's occurrence. A large value of scatter suggests that either the forecaster is aware of exogenous shocks and wants to take them into account at a qualitative level, i.e., outside the models, or he/she simply wants to hedge the results. This view is reinforced by the fact that both forecasters attain a minimum variance of their forecasts very close to 0, which suggests that both the MPC and the surveyed forecasters need to do a better job of selecting the variables relevant for forecasting and the causal structure among them. This takes on special meaning in the case of inflation, since it is the key variable for the Bank of England.

A graphical approach illustrating each forecaster's ability to discriminate between events that occur and events that do not occur is given in Fig. 4, which shows the covariance graphs of the probability judgments of the MPC and the "other" forecasters for inflation and real GDP growth. Each sub-graph tells a story about the forecasters' ability to sort events ex ante into groups. Events that ultimately do not obtain are denoted by a 0 on the x-axis and events that ultimately obtain by a 1. Associated with these two extremes are the issued probabilities of the forecasters. We want a forecaster's issued probabilities associated with the 0 value on the x-axis to be at or very near 0 and the forecasts associated with the 1 value to be at or very near 1.0. We observe that both the MPC's and the other forecasters' ability to sort is modest, as both groups assign very low probabilities to the events that actually occurred. Here, the 45° line represents the ideal forecaster (perfect foresight). The dashed lines are the regression lines of the probability forecasts on the outcome indices; as a dashed line approaches the 45° line, the forecaster gets closer to perfection on both the calibration and resolution criteria. The bias of the MPC's inflation forecasts can be seen clearly when comparing the upper-left graph (the MPC's covariance graph) with the upper-right graph (the other forecasters'). The MPC's dashed line is flatter, with a slope of 0.09, compared with 0.16 for the other forecasters' dashed line. In the case of the output growth forecasts, in spite of the flatness of both dashed lines, the two have similar slopes (0.11 and 0.12).
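The quantities behind such a covariance graph are simple to compute; below is a minimal sketch in Python, with a function name of our own invention. The slope of the regression of p on d equals Cov(p, d)/Var(d), which reduces to p̄1 − p̄0.

```python
import numpy as np

def covariance_graph_stats(p, d):
    """Summary statistics behind a Yates covariance graph.

    Returns the mean forecast when the event did not occur (p0), the
    mean forecast when it did occur (p1), and the slope of the
    regression of the forecasts on the outcome index. A perfect
    forecaster has p0 = 0, p1 = 1 and slope 1 (the 45-degree line).
    """
    p, d = np.asarray(p, float), np.asarray(d, float)
    p0, p1 = p[d == 0].mean(), p[d == 1].mean()
    slope = np.cov(p, d, bias=True)[0, 1] / np.var(d)  # equals p1 - p0
    return p0, p1, slope
```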


Fig. 4. Covariance graphs for the MPC and "Other" forecasters' probability judgments on inflation and real GDP growth. Notes: MPC is the Monetary Policy Committee, "Others" the "shadow" committee forecasters, p̄0 and p̄1 are the mean probabilities when the outcome did not occur and when it did occur, respectively, and θ is the slope.

4. Conclusions

The Monetary Policy Committee of the Bank of England has been issuing quarterly probabilistic forecasts of inflation since 1997. This paper considers two issues related to probability forecasts by a central bank: motivation and evaluation. Since problems with explicit monetary payoffs, rewards or penalties tied to the probability forecast and subsequent realization appear to be non-trivial in the context of the central bank's forecasting problem, we suggest that reputation building be promoted via comparison of the forecasting results of the monetary committee with those from a group of monetary experts (a "shadow" committee). Through explicit competition between the two groups, and through learning and adaptation, the central bank can evolve toward a more informative and transparent monetary policy.


To promote accountability, the choice of forecast evaluation method becomes an issue. While some studies use techniques designed to evaluate point forecasts for the study of the MPC forecasts (e.g., Pagan, 2003), others have applied calibration-based evaluation methods (Clements, 2004; Wallis, 2003, 2004). Although calibration procedures are more appropriate than point-forecast techniques, given the probabilistic nature of the published forecasts, calibration fails to take into account the forecaster's ability to sort between the events that occurred and the events that did not.

This paper suggests the use of the Brier score and its Yates-partition to evaluate probability forecasts. We suggest that public reporting of forecaster performance with these methods can help alleviate the central bank's accountability problem and, potentially, bolster monetary policy's stabilization features. The Brier score encompasses both the calibration and the resolution of the forecast. We argue that it is important to evaluate a central bank in terms of its accuracy in matching probabilities with ex post relative frequencies (calibration) and in terms of resolution (sorting). The Yates-partition of the Brier score allows analysts to study the ability of forecasters to sort events, ex ante, into groups: those that ultimately obtain versus those that do not.

Utilizing these methods to evaluate the MPC's and the surveyed forecasters' ("shadow" committee's) inflation and output growth rate forecasts, we found two substantive results. First, the MPC and the other forecasters have shown a large responsiveness to information not related to the forecasted variable. Second, our results suggest the MPC has somewhat hedged its forecasts of inflation (or engaged in "wishful thinking"). The MPC could be hedging its inflation forecasts and/or influencing them with "wishful" thoughts against a high (perhaps even moderate) inflation outcome.

In this paper, we used the forecasts of other economic forecasters available from the Bank of England as our "shadow" committee of alternative forecasters. Our use of this set of forecasts was for illustration purposes only. In practice, we suggest that membership of the "shadow" committee be given careful consideration. In particular, our set of other forecasters does not necessarily share the same information set as the MPC; important conditioning information held by the MPC was not necessarily held by our set of other forecasters. The "shadow" committee should be made aware of major policy conditionals so that the subsequent comparison of probabilistic forecasting performance is credible.

Acknowledgements

The authors would like to thank Leonardo Auernheimer for his helpful comments. G. Casillas-Olvera acknowledges support from CONACYT and Banco de México, and would like to thank Javier Duclaud for his insightful comments at early stages of the paper.

References

Bahra, B. (1997). Implied risk-neutral probability density functions from option prices: Theory and application. Bank of England Working Paper No. 66. London: Bank of England.
Bank of England (2003). Bank's response to the Pagan Report. Bank of England Quarterly Bulletin, 30–32.
Bessler, D. A., & Kling, J. L. (1989). The forecast and policy analysis. American Journal of Agricultural Economics, 79, 503–507.
Bessler, D. A., & Ruffley, R. (2004). Prequential analysis of stock market returns. Applied Economics, 36, 399–412.
Blinder, A. S. (1998). Central banking in theory and practice. Cambridge: MIT Press.
Brier, G. W. (1950). Verification of forecasts expressed in terms of probability. Monthly Weather Review, 78, 1–3.


Britton, E., Fisher, P., & Whitley, J. (1998). The Inflation Report projections: Understanding the fan chart. Bank of England Quarterly Bulletin, (February), 30–37.
Clements, M. P. (2004). Evaluating the Bank of England density forecasts of inflation. Economic Journal, 114, 844–866.
Clements, M. P., & Smith, J. (2000). Evaluating the forecast densities of linear and non-linear models: Applications to output growth and unemployment. Journal of Forecasting, 19, 255–276.
David, D. (2003). Who's who at the MPC. BBC News World Edition, Business Section, February 18th.
Dawid, A. P. (1984). Statistical theory: A prequential approach. Journal of the Royal Statistical Society, 147, 278–297.
de Finetti, B. (1937). La prévision: Ses lois logiques, ses sources subjectives. Annales de l'Institut Henri Poincaré, 7, 1–67.
de Finetti, B. (1965). Methods for discriminating levels of partial knowledge concerning a test item. British Journal of Mathematical and Statistical Psychology, 18(Part I), 87–123.
de Finetti, B. (1974). Theory of probability: A critical introductory treatment. London: John Wiley and Sons.
Granger, C. W. J., & Newbold, P. (1986). Forecasting economic time series (2nd ed.). London: Academic Press.
Hatch, N. (2001). Modeling and forecasting at the Bank of England. In D. F. Hendry & N. R. Ericsson (Eds.), Understanding economic forecasts (pp. 124–148). Cambridge: MIT Press.
Kling, J. L., & Bessler, D. A. (1989). Calibration-based predictive distributions: An application of prequential analysis to interest rates, money, prices, and output. Journal of Business, 62, 477–499.
Mankiw, N. G. (1998). In R. M. Solow & J. B. Taylor (Eds.), Inflation, unemployment and monetary policy (pp. 72–78). Cambridge: MIT Press.
McCallum, B. (1999). Issues in the design of monetary policy rules. In J. B. Taylor & M. Woodford (Eds.), Handbook of macroeconomics: Vol. 1C (pp. 1483–1530). Amsterdam: Elsevier Science Publishers B.V.
Nelson, R. G., & Bessler, D. A. (1989). Subjective probabilities and scoring rules: Experimental evidence. American Journal of Agricultural Economics, 71, 363–369.
Osband, K. (1989). Optimal forecasting incentives. Journal of Political Economy, 97, 1091–1112.
Pagan, A. R. (2003). Report on modeling and forecasting at the Bank of England. Bank of England Quarterly Bulletin, (Spring), 60–88.
Persson, T., & Tabellini, G. (1993). Designing institutions for monetary stability. Carnegie–Rochester Conference Series on Public Policy, 39, 53–84.
Persson, T., & Tabellini, G. (1999). Political economics and macroeconomic policy. In J. B. Taylor & M. Woodford (Eds.), Handbook of macroeconomics: Vol. 1C (pp. 1397–1482). Amsterdam: Elsevier Science Publishers B.V.
Persson, T., & Tabellini, G. (2000). Political economics: Explaining economic policy. Cambridge, MA: MIT Press.
Samuelson, P. A. (1965). Economic forecasting and science. Michigan Quarterly Review, 4, 274–280.
Savage, L. (1971). Elicitation of personal probabilities and expectations. Journal of the American Statistical Association, 66, 783–801.
Svensson, L. E. O. (1997). Inflation forecast targeting: Implementing and monitoring inflation targets. European Economic Review, 41, 1111–1146.
Wallis, K. F. (2003). Chi-squared tests of interval and density forecasts, and the Bank of England's fan charts. International Journal of Forecasting, 19, 165–175.
Wallis, K. F. (2004). An assessment of Bank of England and National Institute inflation forecast uncertainties. Mimeo, University of Warwick, January.
Walsh, C. E. (1995). Central bank independence and the short-run output-inflation tradeoff in the E.C. In B. Eichengreen, J. Frieden, & J. von Hagen (Eds.), Monetary and fiscal policy in an integrated Europe (pp. 12–37). Berlin: Springer-Verlag.
Walsh, C. E. (1998). Monetary theory and policy. Cambridge: MIT Press.
Winkler, R. L. (1986). On good probability appraisers. In P. Goel & A. Zellner (Eds.), Bayesian inference and decision techniques (pp. 265–278). Amsterdam: Elsevier Science Publishers B.V.
Woodford, M. (2003). Interest and prices: Foundations of a theory of monetary policy. Princeton: Princeton University Press.
Yates, J. F. (1982). External correspondence: Decompositions of the mean probability score. Organizational Behavior and Human Performance, 30, 132–156.
Yates, J. F. (1988). Analyzing the accuracy of probability judgments for multiple events: An extension of the covariance decomposition. Organizational Behavior and Human Decision Processes, 41, 281–299.