
Forecasting inflation and GDP growth using heuristic optimisation of information criteria and variable reduction methods

George Kapetanios (a), Massimiliano Marcellino (b,c,d), Fotis Papailias (e,f,*)

(a) School of Economics and Finance, Queen Mary, University of London, UK
(b) Department of Economics, Bocconi University, Italy
(c) Innocenzo Gasparini Institute for Economic Research (IGIER), Italy
(d) Center for Economic and Policy Research (CEPR), UK
(e) Queen's University Management School, Queen's University Belfast, UK
(f) quantf research (www.quantf.com)

* Correspondence to: Queen's University Management School, Queen's University Belfast, Riddel Hall, 185 Stranmillis Road, BT9 5EE, Northern Ireland, UK. Tel.: +44 2890974667; fax: +44 2890974201. E-mail addresses: [email protected], [email protected] (F. Papailias). http://dx.doi.org/10.1016/j.csda.2015.02.017

Article history: Received 5 February 2014; Received in revised form 26 February 2015; Accepted 27 February 2015.

Keywords: Heuristic optimisation; Information criteria; Unbalanced datasets; Forecasting; Inflation; GDP; Principal components; Partial least squares; Bayesian shrinkage regression

Abstract

Forecasting macroeconomic variables using many predictors is considered. Model selection and model reduction approaches are compared. Model selection includes heuristic optimisation of information criteria using: simulated annealing, genetic algorithms, MC3 and sequential testing. Model reduction employs the methods of principal components, partial least squares and Bayesian shrinkage regression. The problem of unbalanced datasets is discussed and potential solutions are suggested. An out-of-sample forecasting exercise provides evidence that these methods are useful in predicting the growth rates of quarterly GDP and monthly inflation.

1. Introduction

Selecting proper forecasting methods for macroeconomic variables has long been a major topic of debate among researchers, academics and economic analysts. During the last decade, attention has focused on the development of methods that cope with a large, and possibly unbalanced, set of predictors. This literature can be divided into two main approaches: (i) variable selection and (ii) variable reduction. In the former, the aim is to identify the specific predictors with the highest information content for the target variable; in the latter, the large information set is summarised into a smaller number of efficient predictors. Several methods have been proposed within each approach.

Sequential testing, which belongs to the variable selection approach, has been analysed by Krolzig and Hendry (2001) and is associated with what is often described as the "general-to-specific" methodology. Starting from a general statistical model



that captures the dynamics of the data, the model is reduced by sequentially testing the significance of the regressors, in order to achieve parsimony while retaining accuracy. We implement Sequential Testing (ST) along the lines of Hoover and Perez (1999).

An alternative method for variable selection is based on the use of information criteria. While conceptually simple, this method is computationally very demanding when applied to a large dataset. To tackle this issue, Kapetanios (2006) uses heuristic methods for the optimisation of information criteria. We apply similar heuristic algorithms to select the combination (subset) of all regressors that returns the minimum of a given information criterion. We use the Akaike (1974) Information Criterion (AIC) and the Schwarz (1978) Bayesian Information Criterion (BIC); results using the Hannan and Quinn (1979) Information Criterion are omitted as they are qualitatively similar to BIC. In terms of heuristic optimisation algorithms, we consider: (i) Simulated Annealing (SA), (ii) the Genetic Algorithm (GA), and (iii) MC3. Recently, Buchen and Wohlrabe (2011) evaluated the boosting method on the Stock and Watson (2006) dataset, providing evidence that alternatives of this kind need further exploration. The above techniques have been analysed by Acosta-González and Fernández-Rodríguez (2007), Alcock and Burrage (2004), Baragona et al. (2004), Brooks et al. (2003), Jacobson and Yücesan (2004), Jerrell and Campione (2001), Maringer (2005) and Gilli et al. (2011), among others. However, their forecasting performance has not been extensively covered. Therefore, we perform a forecasting exercise comparing sequential testing and variable selection based on non-standard approaches to the optimisation of information criteria. We adopt the "direct" forecasting approach, which can be more robust in the presence of model misspecification (see Marcellino et al., 2006 for a detailed discussion), and rank the models and methods according to their Root Mean Square Forecast Error (RMSFE). An AR model, which typically produces good and robust forecasts for several macroeconomic variables, acts as the benchmark.

The forecasting comparison also includes variable reduction methods which have often been used in the recent forecasting literature. Specifically, we evaluate: (i) Principal Components (PC), (ii) Partial Least Squares (PLS), and (iii) Bayesian Shrinkage Regression (BR). In order to choose the number of factors and the shrinkage parameter, we use a grid search to optimise the Root Mean Squared Error in a cross-validation set.

We focus on forecasting quarterly GDP growth and monthly inflation in the euro area, two key macroeconomic indicators, based on a large set of 195 monthly variables extracted from the Eurostat Principal European Economic Indicators (PEEIs) Dataset. We also discuss how to handle the mixed quarterly/monthly frequency in the case of GDP forecasting. The forecasting exercise investigates: (i) the combination of heuristic method(s) and information criteria that is most accurate; (ii) the top performers within the variable reduction approaches; (iii) the relative ranking of variable selection and variable reduction methods.
Overall, our findings indicate that variable selection based on heuristic optimisation of information criteria often outperforms variable reduction methods (and the AR benchmark), matched only in some cases by the Bayesian Shrinkage Regression using LASSO regressions. As expected, AIC results in a larger set of predictors than BIC and, due to the lack of parsimony, often deteriorates the forecasting performance. From an economic point of view, the selected regressors are also reasonable. Specifically, labour market variables (wages and salaries, the employment index and the unemployment rate) are most frequently selected by the heuristic approaches to forecast HICP inflation, together with its own lags. Interest rates, inflation, money supply and other monetary variables are used in most cases to forecast the GDP growth rate.

The rest of the paper is organised as follows. Section 2 briefly discusses the various methods and algorithms used in our empirical evaluation. Section 3 describes the forecast evaluation, the data, and how to cope with the data unbalancedness problem. Section 4 discusses the forecast results. Section 5 summarises our main findings and conclusions.

2. Variable selection and variable reduction

Consider the following regression model:

$$y_t = \alpha + \beta^{0\prime} x^0_t + \epsilon_t, \quad t = 1, \ldots, T, \qquad (1)$$

where $x^0_t$ is a $k$-dimensional vector of stationary predetermined variables and the superscript 0 denotes the true regression model. Let the set of all available variables at time $t$ be represented by the $N$-dimensional vector $x_t = (x_{1,t}, \ldots, x_{N,t})'$, with $N$ much larger than $k$ and the set of variables in $x^0_t$ contained in $x_t$. The aim of the analysis is to determine $x^0_t$ starting from $x_t$. Formally, let $I = (I_1, \ldots, I_N)'$ denote a vector of zeros and ones, which we refer to as a string, and let $I^0$ be the string for which $I^0_i = 1$ if $x_{i,t}$ is an element of $x^0_t$ and zero otherwise. We wish to determine $I^0$ starting from $I$. In small samples $I^0$ may not represent the best-fitting model for the data at hand, so we cannot base the selection on goodness of fit alone; instead, information criteria should be used to select the relevant variables in Eq. (1). The generic form of an information criterion is

$$IC(I) = -2L(I) + C_T(I), \qquad (2)$$

where $L(I)$ is the log-likelihood of the model associated with string $I$ and $C_T(I)$ is the penalty term related to the same string. The two penalty terms we use are $2\tilde{m}(I)$ and $\ln(T)\tilde{m}(I)$, corresponding to AIC and BIC respectively, where $\tilde{m}(I)$ is the number of free parameters in the model resulting from string $I$.
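To make the criterion operational, the sketch below (our illustration, not the authors' code; all names are ours) evaluates Eq. (2) for a candidate string using OLS, which coincides with Gaussian maximum likelihood under spherical errors.

```python
import numpy as np

def information_criterion(y, X, I, criterion="BIC"):
    """Evaluate Eq. (2) for a candidate string I (0/1 array of length N).

    Estimation is by OLS; m counts the free parameters
    (intercept, selected regressors and the error variance).
    """
    T = len(y)
    Z = np.column_stack([np.ones(T), X[:, I.astype(bool)]])  # intercept + selected columns
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    sigma2 = resid @ resid / T                                # ML estimate of error variance
    loglik = -0.5 * T * (np.log(2.0 * np.pi * sigma2) + 1.0)  # concentrated log-likelihood
    m = Z.shape[1] + 1                                        # m~(I), including sigma^2
    penalty = 2 * m if criterion == "AIC" else np.log(T) * m  # Eq. (2): AIC or BIC penalty
    return -2.0 * loglik + penalty
```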


It is straightforward, under relatively weak conditions on $x_{j,t}$ and $\epsilon_{j,t}$ and using the results of, say, Sin and White (1996), to show that the string which minimises $IC(\cdot)$ converges to $I^0$ with probability approaching one as $T \to \infty$, as long as (i) $C_T(I) \to \infty$ and (ii) $C_T(I)/T \to 0$. More specifically, assuming the models are estimated by Gaussian or pseudo maximum likelihood (which, in the simplest case of spherical errors, is equivalent to OLS), the main assumptions needed for the results of Sin and White (1996) to hold are: (i) measurability, continuity and twice differentiability of the log-likelihood function, together with a standard identifiability assumption; (ii) a uniform weak law of large numbers for the log-likelihood of each observation and its second derivative; and (iii) a central limit theorem for the first derivative of the log-likelihood of each observation. Conditions (ii) and (iii) can be obtained by assuming, e.g., that the $x_{j,t}$ are weakly dependent (say, near epoch dependent) processes and the $\epsilon_{j,t}$ are martingale difference processes. Hence, consistency of model selection follows straightforwardly in our context, as long as the penalty-related conditions hold. In particular, BIC consistently estimates the true model in the sense of Sin and White (1996), but AIC is inconsistent, as its $C_T$ remains bounded as $T \to \infty$.

For small-dimensional $x_t$, evaluating the information criteria for all strings may be feasible, as, e.g., in AR lag order selection. Lag selection is made even easier by the natural ordering of the variables, but in more general cases (e.g., regression models) such an ordering may not be available. Moreover, as soon as $N$ exceeds, say, 10 or 15, evaluating all strings is not feasible: since $I$ is a binary sequence, there exist $2^N$ strings to be evaluated. For example, when $N = 50$, and optimistically assuming that 100,000 strings can be evaluated per second, we would still need about 357 years to evaluate them all. A solution that overcomes this problem has recently been proposed by Gatu and Kontoghiorghes (2006). In addition, we face a discrete minimisation problem, so many standard minimisation algorithms cannot be applied.

To overcome these difficulties, we consider heuristic optimisation approaches: (i) simulated annealing (SA), (ii) the genetic algorithm (GA), and (iii) MC3. These approaches are employed here in their "standard" form; for detailed information see Brooks et al. (2003), Brüggemann et al. (2003), Gilli et al. (2011), Goffe et al. (1994), Hajek (1998), Hartl and Belew (1990), Kapetanios (2006), Krolzig and Hendry (2001), Maringer (2005), Morinaka et al. (2001) and Fernandez et al. (2001), among others. As mentioned in the Introduction, we also consider variable selection by means of sequential testing (ST); see Hendry (1995, 1997) and Hoover and Perez (1999), among others, for more information.
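As an illustration of how such a heuristic explores the string space, the following sketch implements a minimal simulated annealing search for the BIC-minimising string, reusing information_criterion from the previous sketch; the cooling schedule and iteration counts are simplifying assumptions rather than the exact setup of the Appendix.

```python
import numpy as np

def simulated_annealing_bic(y, X, n_iter=5000, temp0=10.0, seed=0):
    """Minimise BIC over binary strings by simulated annealing (a sketch).

    One entry of the string is flipped per iteration; a worsening move is
    accepted with probability exp(-delta / temp) under geometric cooling.
    """
    rng = np.random.default_rng(seed)
    N = X.shape[1]
    I = rng.integers(0, 2, size=N)                  # random starting string
    current = information_criterion(y, X, I)
    best_I, best = I.copy(), current
    for b in range(n_iter):
        temp = temp0 * 0.999 ** b                   # geometric cooling schedule
        J = I.copy()
        J[rng.integers(N)] ^= 1                     # flip one randomly chosen entry
        candidate = information_criterion(y, X, J)
        delta = candidate - current
        if delta < 0 or rng.random() < np.exp(-delta / temp):
            I, current = J, candidate               # accept the move
        if current < best:
            best_I, best = I.copy(), current        # track the best string seen
    return best_I, best
```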

In terms of variable reduction approaches, factor methods have been at the forefront of developments in forecasting with large datasets and in fact started this literature; see, e.g., the influential work of Stock and Watson (2002a). The assumption is that the co-movements across the $N$ variables in $x_t$ can be captured by a small number $r$ of unobserved factors, grouped in the vector $F_t = (F_{1,t}, \ldots, F_{r,t})'$. Formally,

$$\tilde{x}_t = \Lambda' F_t + e_t, \qquad (3)$$

where $\tilde{x}_t$ may be equal to $x_t$ or may involve other variables (e.g., lags and leads of $x_t$), and $\Lambda$ is an $r \times N$ matrix of parameters describing how the individual indicators relate to each of the $r$ factors, which we call the loadings. In Eq. (3), $e_t$ is a vector of zero-mean I(0) errors representing the idiosyncratic component of each variable. It is important that the number of factors is small, so that the potentially many explanatory variables in Eq. (1) can be replaced by the few factors $F_t$. To extract (or estimate) the common factors we consider: (i) Principal Components (PC) and (ii) Partial Least Squares (PLS). A third alternative for reducing the dimensionality of $x_t$ is Bayesian Shrinkage Regression (BR), for which we evaluate both Ridge and LASSO regressions. More information regarding the theoretical features of these methods and examples of their application can be found in De Mol et al. (2008), Forni et al. (2000, 2005), Helland (1988, 1990), Stock and Watson (2002a,b) and Wold (1982), among others.
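For Eq. (3), a principal components estimate of the factors can be sketched as follows (our illustration; as noted in the Appendix, the regressors are standardised before extraction).

```python
import numpy as np

def pc_factors(X, r):
    """Estimate r factors from a T x N panel by principal components (Eq. (3)).

    The panel is standardised to zero mean and unit variance; the factor
    estimates are the first r principal components, and the loadings are
    the corresponding right singular vectors.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # zero mean, unit variance
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    F = U[:, :r] * s[:r]                       # T x r matrix of factor estimates
    Lam = Vt[:r, :]                            # r x N loading matrix (Lambda in Eq. (3))
    return F, Lam
```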

In order to choose the number of factors to be used in the PC and PLS methods, and the shrinkage parameter for BR, we employ a grid-search cross-validation method. The function optimised in the cross-validation exercise is the Root Mean Squared Forecast Error defined in Eq. (6) in the next section, and the exercise is set up in a similar fashion to Baillie et al. (2014). A maximum of 30 factors for PC and PLS, and 5N shrinkage parameter values (with step 0.1) for BR, are considered.

In all cases, we present results in terms of the RMSFE relative to an AR(1) benchmark, so that values smaller than one imply that a method beats the benchmark; the best method is the one with the smallest RMSFE.
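The grid-search cross-validation can be sketched as below (our illustration); forecast_fn and the validation size are placeholders for whichever of PC, PLS or BR is being tuned.

```python
import numpy as np

def grid_search_cv(y, X, grid, forecast_fn, n_val=36):
    """Return the tuning value minimising the RMSFE over a validation set.

    forecast_fn(y_train, X_train, X_val, param) is a placeholder that must
    return forecasts of the validation observations for a given tuning
    value (number of factors for PC/PLS, shrinkage parameter for BR).
    """
    T = len(y)
    y_tr, y_val = y[: T - n_val], y[T - n_val:]
    X_tr, X_val = X[: T - n_val], X[T - n_val:]
    best_param, best_rmsfe = None, np.inf
    for param in grid:
        fcst = forecast_fn(y_tr, X_tr, X_val, param)
        rmsfe = np.sqrt(np.mean((y_val - fcst) ** 2))  # cross-validation RMSFE
        if rmsfe < best_rmsfe:
            best_param, best_rmsfe = param, rmsfe
    return best_param, best_rmsfe
```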

3. Forecasting exercise and data description

3.1. Structure of the forecasting exercise

Assume that $x^0_t$ has been obtained using one of the methods previously described. In the next step, we use $x^0_t$ in combination with the direct approach to predict future values of the variable of interest, $y_{t+h}$. The forecasts are given by:

$$\hat{y}^f_{t+h} = \hat{\beta}'_h x^0_t, \qquad (4)$$

where $\hat{\beta}_h$ is obtained by regressing $y_t$ on $x^0_{t-h}$, and $h$ denotes the forecast horizon, with $h = 1, 2, \ldots, H$.


A summary of the (pseudo) out-of-sample forecasting algorithm follows.

1. For $h = 1, 2, \ldots, H$, use an initial sample of $T_1$ observations ($T_1 = T - Eval - h$), where $Eval$ is the size of the evaluation sample.
2. With any method described in the previous section, obtain $x^0_t$, $t = 1, 2, \ldots, T_1$.
3. Regress $y_t$ on $x^0_{t-h}$ (for each $h$) and obtain $\hat{\beta}_h$.
4. Calculate the forecasts $\hat{y}^f_{t+h}$ (for each $h$) using $x^0_t$ and $\hat{\beta}_h$, obtaining $\hat{y}^f_{t+1}, \ldots, \hat{y}^f_{t+H}$.
5. Repeat the whole procedure, increasing the initial sample from $T_1$ to $T_l = T_{l-1} + 1$, until $T_l = T - h$.

At the end of the iterations we have gathered $Eval$ forecasts for each forecast horizon $h$. The forecast errors are calculated as

$$\hat{e}^f_{t+h} = y_{t+h} - \hat{y}^f_{t+h}, \qquad (5)$$

and are used to compute the Root Mean Squared Forecast Error (RMSFE) for a given model, defined as

$$RMSFE_h = \sqrt{\frac{1}{Eval} \sum_{j=1}^{Eval} \left(\hat{e}^f_{t+h,j}\right)^2}. \qquad (6)$$
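Steps 1-5 and Eqs. (5)-(6) can be combined into a single loop, as sketched below for one horizon h (our illustration); select_fn stands for any of the selection or reduction methods above and is a placeholder.

```python
import numpy as np

def recursive_rmsfe(y, X, h, n_eval, select_fn):
    """Pseudo out-of-sample RMSFE at horizon h (Eqs. (5)-(6), a sketch).

    In each round, select_fn chooses regressor columns on the estimation
    sample, beta_h comes from the direct regression of y_t on x_{t-h},
    one h-step forecast is formed from the latest regressors, and the
    estimation sample is then extended by one observation.
    """
    T = len(y)
    errors = []
    for T1 in range(T - n_eval - h, T - h):           # T1 grows by one each round
        cols = select_fn(y[:T1], X[:T1])              # indices of selected regressors
        Z = np.column_stack([np.ones(T1 - h), X[: T1 - h, cols]])
        beta, *_ = np.linalg.lstsq(Z, y[h:T1], rcond=None)
        x_last = np.concatenate(([1.0], X[T1 - 1, cols]))
        errors.append(y[T1 - 1 + h] - x_last @ beta)  # forecast error, Eq. (5)
    return np.sqrt(np.mean(np.square(errors)))        # RMSFE, Eq. (6)
```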

3.2. Data

Our dependent variables are the growth rates of:
• Monthly HICP (EA-16) (source: Eurostat),
• Quarterly GDP (EA-16), seasonally adjusted (source: ECB).

We calculate the growth rate using the log transformation:

$$g_t(y_t) = \ln\left(\frac{y_t}{y_{t-1}}\right). \qquad (7)$$

The seasonal adjustment is made by the ECB and the series are provided already transformed. The sample spans 1996-Q2 to 2008-Q4 for the GDP series in levels and, consequently, 1996-Q3 to 2008-Q4 for its quarter-on-quarter growth rate. The HICP series spans January 1996 to February 2009 in levels and February 1996 to February 2009 for the month-on-month inflation. We have available 195 monthly predictors (source: Eurostat, ECB), spanning January 1996 to February 2009. The dataset is the same as in Foroni and Marcellino (2014) and contains a large universe of variables that are potentially useful instruments for forecasting key macroeconomic variables in the euro area (see Table 1 for the mnemonics). Furthermore, in the spirit of Stock and Watson (2002a), we have transformed the series for stationarity using first differences or log differences as appropriate (see Table 2 for the transformations, noting that some of the variables remain unchanged).

An obvious problem that arises when using quarterly regressands and monthly regressors is how to cope with the frequency mismatch. We assess three different approaches (sketched in code at the end of this subsection):

Tr1 Take the quarterly average of each monthly indicator.
Tr2 Take the observations in the last month of each quarter only.
Tr3 Split each variable into three indicators, containing the observations for, respectively, the first, second and third months of each quarter. This approach is in the spirit of the U-MIDAS regressions introduced in Foroni et al. (2015). It is less restrictive than the other methods, but it further increases the number of predictors, which becomes N = 3 × 195 = 585.

In the case of the monthly HICP inflation, we start the recursive forecast exercise using 110 observations for the first estimation sample and 36 observations (3 years) for out-of-sample evaluation. We also repeat the exercise using a first in-sample size of 85 observations, which allows for a 60-observation (5-year) out-of-sample evaluation period. Regarding the quarterly GDP growth, we start the forecasting algorithm using the first 33 observations and then perform the forecast exercise recursively over 12 evaluation periods (i.e., 3 years of out-of-sample data). We set h = 1, 2, ..., 12 for inflation and h = 1, 2, ..., 6 for GDP growth.

Finally, we point out that in our context one could also produce monthly updates of the quarterly GDP growth forecasts. This exercise is considered in Bulligan et al. (2014); their results are qualitatively similar to ours, confirming the relevance of variable selection.
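The three monthly-to-quarterly alignments can be written compactly, as in the sketch below (ours), which assumes the monthly panel starts at the first month of a quarter.

```python
import numpy as np

def monthly_to_quarterly(Xm, scheme="Tr1"):
    """Align a monthly T_m x N panel with quarterly data (Tr1/Tr2/Tr3).

    Rows are months grouped in complete quarters, starting at the first
    month of a quarter.
    """
    T_m, N = Xm.shape
    Q = T_m // 3
    Xq = Xm[: 3 * Q].reshape(Q, 3, N)   # quarter x month-in-quarter x variable
    if scheme == "Tr1":
        return Xq.mean(axis=1)          # quarterly average of each indicator
    if scheme == "Tr2":
        return Xq[:, 2, :]              # last month of each quarter only
    if scheme == "Tr3":
        return Xq.reshape(Q, 3 * N)     # U-MIDAS: three indicators per variable
    raise ValueError("scheme must be 'Tr1', 'Tr2' or 'Tr3'")
```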

Table 1
Set of predictors (mnemonics).

1 CP-HI00XEFU; 2 CP-HI00XEF; 3 CP-HI00XES; 4 CP-HI00XE; 5 CP-HI00XTB
6 CP-HI00; 7 CP-HI01; 8 CP-HI02; 9 CP-HI03; 10 CP-HI04
11 CP-HI05; 12 CP-HI06; 13 CP-HI07; 14 CP-HI08; 15 CP-HI09
16 CP-HI10; 17 CP-HI11; 18 CP-HI12; 19 CP-HIE; 20 CP-HIF
21 B-D-IS-PPI; 22 B-E36-IS-PPI; 23 B-C-D-IS-PPI; 24 B-IS-PPI; 25 B-TO-E36-IS-PPI
26 C-IS-PPI; 27 C-ORD-IS-PPI; 28 D-IS-PPI; 29 E36-IS-PPI; 30 MIG-CAG-IS-PPI
31 MIG-COG-IS-PPI; 32 MIG-DCOG-IS-PPI; 33 MIG-ING-IS-PPI; 34 MIG-NDCOG-IS-PPI; 35 MIG-NRG-IS-PPI
36 D35-E36-IS-IMPR; 37 IS-WSI-F; 38 B-D-IS-WSI; 39 B-E36-IS-WSI; 40 B-C-D-IS-WSI
41 B-IS-WSI; 42 B-TO-E36-IS-WSI; 43 C-IS-WSI; 44 D35-E36-IS-WSI; 45 D-IS-WSI
46 E36-IS-WSI; 47 MIG-CAG-IS-WSI; 48 MIG-COG-IS-WSI; 49 MIG-DCOG-IS-WSI; 50 MIG-ING-IS-WSI
51 MIG-NDCOG-IS-WSI; 52 MIG-NRG-IS-WSI; 53 BS-CCI-BAL; 54 BS-CEME-BAL; 55 BS-COB-BAL
56 BS-CPE-BAL; 57 BS-CTA-BAL; 58 BS-BCI; 59 BS-CSMCI; 60 BS-FS-LY
61 BS-FS-NY; 62 BS-GES-LY; 63 BS-GES-NY; 64 BS-MP-NY; 65 BS-MP-PR
66 BS-PT-LY; 67 BS-PT-NY; 68 BS-SFSH; 69 BS-SV-NY; 70 BS-SV-PR
71 BS-UE-NY; 72 BS-ICI; 73 BS-IEME; 74 BS-IEOB; 75 BS-IOB
76 BS-IPE; 77 BS-IPT; 78 BS-ISFP; 79 BS-ISPE; 80 BS-RAS
81 BS-RCI; 82 BS-REBS; 83 BS-REM; 84 BS-ROP; 85 BS-RPBS
86 BS-SABC; 87 BS-SAEM; 88 BS-SARM; 89 BS-SCI; 90 BS-SERM
91 BS-CSMCI-B; 92 BS-ESI-I; 93 BS-ICI-BAL; 94 BS-RCI-BAL; 95 BS-SCI-BAL
96 RT-LM-UN-T-GT25; 97 RT-LM-UN-T-LE25; 98 RT-LM-UN-T-TOT; 99 1000-PERS-LM-UN-T-GT25; 100 1000-PERS-LM-UN-T-LE25
101 1000-PERS-LM-UN-T-TOT; 102 IS-IP; 103 IS-IP-F-CC1; 104 IS-IP-F-CC2; 105 IS-IP-F
106 B-D-IS-IP; 107 B-C-IS-IP; 108 B-IS-IP; 109 C-IS-IP; 110 C-ORD-IS-IP
111 D-IS-IP; 112 MIG-CAG-IS-IP; 113 MIG-COG-IS-IP; 114 MIG-DCOG-IS-IP; 115 MIG-ING-IS-IP
116 MIG-NDCOG-IS-IP; 117 IS-EPI-F; 118 B-D-IS-EPI; 119 B-E36-IS-EPI; 120 B-C-IS-EPI
121 B-IS-EPI; 122 B-TO-E36-IS-EPI; 123 C-IS-EPI; 124 D35-E36-IS-EPI; 125 D-IS-EPI
126 E36-IS-EPI; 127 E-IS-EPI; 128 MIG-CAG-IS-EPI; 129 MIG-COG-IS-EPI; 130 MIG-DCOG-IS-EPI
131 MIG-ING-IS-EPI; 132 MIG-NDCOG-IS-EPI; 133 MIG-NRG-IS-EPI; 134 G45-IS-EPI; 135 B-C-IS-ITD
136 B-C-IS-ITND; 137 B-C-IS-ITT; 138 C-IS-ITD; 139 C-IS-ITND; 140 C-IS-ITT
141 C-ORD-IS-ITD; 142 C-ORD-IS-ITND; 143 C-ORD-IS-ITT; 144 MIG-CAG-IS-ITD; 145 MIG-CAG-IS-ITND
146 MIG-CAG-IS-ITT; 147 MIG-COG-IS-ITD; 148 MIG-COG-IS-ITND; 149 MIG-COG-IS-ITT; 150 MIG-DCOG-IS-ITD
151 MIG-DCOG-IS-ITND; 152 MIG-DCOG-IS-ITT; 153 MIG-ING-IS-ITD; 154 MIG-ING-IS-ITND; 155 MIG-ING-IS-ITT
156 MIG-NDCOG-IS-ITD; 157 MIG-NDCOG-IS-ITT; 158 IS-PEI-F-CC11-X-CC113; 159 IS-IPI-F-CC11-X-CC113; 160 IS-HWI-F
161 C-ORD-IS-IO; 162 C-ORD-X-C30-IS-IO; 163 IS-CAR; 164 FOOD-IS-DIT; 165 IS-DIT
166 NFOOD-IS-DIT; 167 NFOOD-X-G473-IS-DIT; 168 X-G473-IS-DIT; 169 M1; 170 M2
171 M3; 172 3MI-RT; 173 LTGBY-RT; 174 EXA-RT-USD; 175 EXA-RT-JPY
176 EXA-RT-GBP; 177 BDSHRPRCF; 178 DJES50I; 179 EMECB2Y; 180 EMECB3Y
181 EMECB5Y; 182 EMECB7Y; 183 EMGBOND; 184 FIBOR1Y; 185 FIBOR3M
186 FIBOR6M; 187 BDWU1032R; 188 BDWU0022R; 189 BDEBDBSIA; 190 BDECBXDGA
191 BDECBXDMA; 192 BDECBXNGA; 193 BDECBXNOA; 194 BDECBXOLA; 195 BDECBXLIA


Table 2
Transformations applied to the predictors (by variable number).

1-52: FirstDiff., Logs
53-95: NoChange
96-98: FirstDiff.
99-101: NoChange
102-171: FirstDiff., Logs
172-176: FirstDiff.
177-178: FirstDiff., Logs
179-188: FirstDiff.
189-195: FirstDiff., Logs

3.3. Handling unbalanced datasets

Some of the 195 predictors we consider are not available for the full sample period: they either start at a later date, present scattered missing observations, or are unavailable at the very end of the sample. The effect of missing observations is twofold.

First, we need to adapt the estimation methods. There are model-specific ways to handle missing observations; for example, if a factor model is used, the assumed factor structure can be exploited both for the estimation of the unobserved factors and for the interpolation of missing observations using the factor estimates, which can be implemented via an EM algorithm, as discussed in the literature. However, it is useful to have a generic (non-model-specific) device. We note that most estimation methods rely on sample moments of the form $\frac{1}{T}\sum_{t=1}^{T} f(x_t)$; for example, the $(i,j)$-th element of $X'X$ in the Bayesian Regression is given by $\frac{1}{T}\sum_{t=1}^{T} x_{jt} x_{it}$. Therefore, as a generic device for handling missing observations in parameter estimation, we suggest using $\frac{1}{T}\sum_{t=1,\, t \in I_x}^{T} f(x_t)$, where $I_x$ denotes the set of time indices for which an observation of $x_t$ exists. This enables the estimation of the moments needed for parameter estimation.

The second effect of missing observations arises when end-of-sample observations for a predictor $x_t$ are not available although they are needed for forecasting. In this case, we suggest as a generic solution the use of simple AR models to forecast the missing value(s) of $x_t$; this approach works well in a related context in Kuzin et al. (2011) and Marcellino and Schumacher (2010).

Mixed sampling frequency can be considered as a special case of missing observations, in which some observations of the variables collected at the lower frequency are systematically missing; this case was considered in the previous subsection.

4. Discussion of empirical results

In this section we discuss the outcome of the forecasting exercise for the monthly HICP inflation and the quarterly GDP growth. All figures are reported in terms of the RMSFE of each method relative to the AR(1) benchmark; specifically, we report the ratio $RMSFE_{Alternative}/RMSFE_{Benchmark}$. The AR(1) benchmark serves as a point of reference and allows comparisons across the different approaches. To keep the results clear, we report Sequential Testing at the 1% level of statistical significance (which reduces the number of variables to which ST is applied and improves performance by dropping insignificant regressors) and the heuristic approaches that use BIC in the selection process. Next, we take a closer look at the variable selection methods and examine how the optimal solutions change over 30 repetitions (reporting the case of HICP with 36 evaluation periods). Finally, we discuss the variables selected by the most successful heuristic approaches and by the Bayesian Shrinkage Regression using LASSO regressions, which sheds some extra light on the economic meaning of the "best" predictors.

In all cases, we have included the first lag of the dependent variable in the set of predictors. In the grid-search cross-validation used in the variable reduction methods, we use a training set of 36 observations for the HICP inflation and 12 observations for GDP growth; in other words, we find the factor/shrinkage parameter value that provided the smallest RMSFE over the past 3 years and use it for the out-of-sample forecasting exercise.

Table 3
Forecasting HICP: RMSFE relative to the AR(1) benchmark.

Eval = 36
Method \ h        1       3       6       12      Average   SD
ST1             0.860   0.797   0.971   1.136    0.941    0.149
SABIC           0.890   0.867   0.996   1.240    0.998    0.171
GABIC           0.899   0.861   0.999   1.335    1.024    0.216
MC3BIC          0.862   0.837   1.018   1.288    1.001    0.207
PC-CrV          0.979   0.957   1.150   1.408    1.124    0.208
PLS-CrV         0.975   0.955   1.134   1.401    1.116    0.206
BR-CrV          1.071   1.009   1.096   1.354    1.132    0.152
BR-LASSO-CrV    0.894   0.829   0.943   1.202    0.967    0.164

Eval = 60
Method \ h        1       3       6       12      Average   SD
ST1             0.827   0.837   0.961   1.303    0.982    0.223
SABIC           0.882   0.873   1.173   1.335    1.066    0.227
GABIC           0.916   0.855   1.177   1.582    1.132    0.330
MC3BIC          0.950   0.901   1.054   1.260    1.041    0.159
PC-CrV          0.973   0.975   1.151   1.519    1.155    0.257
PLS-CrV         0.979   0.962   1.120   1.478    1.135    0.239
BR-CrV          1.104   1.039   1.082   1.553    1.195    0.240
BR-LASSO-CrV    0.891   0.861   0.915   1.184    0.963    0.149

Notation: Sequential Testing at the 1% significance level (ST1), Simulated Annealing with BIC (SABIC), Genetic Algorithm with BIC (GABIC), MC3 with BIC (MC3BIC), Principal Components with the number of factors selected by grid-search cross-validation (PC-CrV), Partial Least Squares with the number of factors selected by grid-search cross-validation (PLS-CrV), Bayesian Shrinkage Regression using Ridge regressions with the shrinkage parameter selected by grid-search cross-validation (BR-CrV), and Bayesian Shrinkage Regression using LASSO regressions with the shrinkage parameter selected by grid-search cross-validation (BR-LASSO-CrV).

4.1. Forecasting HICP inflation

The results of the HICP inflation forecasting are reported in Table 3 for h = 1, 3, 6, 12 steps ahead. The table is divided into two panels: the left panel reports the relative RMSFE of each method in a forecast exercise with 36 evaluation rounds, while the right panel reports the same results for an exercise with 60 evaluation rounds.

Starting with the 36-evaluation-period exercise, for h = 1 ST1 and MC3BIC are the best methods, with relative RMSFEs of 0.860 and 0.862 respectively. The other heuristic approaches also beat the AR(1) benchmark and the variable reduction methods, among which only BR-LASSO-CrV returns a relative RMSFE below one (0.894). This is also evident in Fig. 1, which plots the forecasts of all methods (including the AR(1) benchmark) against the actual values, after translating the growth forecasts into levels to make the comparison more visible.

Fig. 1. Forecasting HICP, h = 1.

Next, for h = 3 the best two methods are ST1 and BR-LASSO-CrV, with relative RMSFEs of 0.797 and 0.829 respectively. SABIC, GABIC and MC3BIC still perform better than the benchmark and the variable reduction methods of Principal Components and Partial Least Squares. The same qualitative findings hold for h = 6, where the top performers are again ST1 and BR-LASSO-CrV. However, at the longer horizon h = 12, none of the methods outperforms the simple AR(1).

For the 60-evaluation-period exercise, ST1 is again the top performer for h = 1, 3, 6, with SABIC, GABIC and BR-LASSO-CrV the second best methods for h = 1, h = 3 and h = 6, respectively.

4.2. Convergence of heuristics

An issue which often arises when using heuristic optimisation is convergence to the optimal solution. To investigate this, Fig. 2 reports boxplots of the optimal fitness values for BIC over 30 repetitions across the 36 out-of-sample periods. All three heuristic approaches follow the same uptrend across time. Specifically, after the fifth out-of-sample period, SABIC provides solutions with closer convergence and a small number of extreme values; GABIC is also accurate in terms of convergence, though its boxes are slightly larger; and MC3BIC produces more extreme values towards the end of the out-of-sample evaluation period. In general, the heuristic approaches, as employed here and for the purposes of the forecasting exercise, do not suffer from convergence problems.

Fig. 2. Forecasting HICP: convergence of the heuristic algorithms.

4.3. Forecasting the GDP growth rate

Before discussing the GDP results in detail, it is worth noting that the available data are very limited, allowing for an initial in-sample size of just 33 observations. This is not ideal, but we can still reach some useful conclusions.
Table 4 is divided into three panels: the top panel reports the results using the quarterly average transformation (Tr1), the middle panel the results using only the last month of each quarter (Tr2), and the bottom panel the results using the U-MIDAS approach in which each monthly predictor is split into three variables containing the first, second and third months of each quarter (Tr3).

The first important fact emerging from Table 4 is the superior performance of BR-LASSO-CrV across all three transformations for h = 1, h = 2 and h = 4: it attains an average relative RMSFE of 0.667, 0.704 and 0.669 for Tr1, Tr2 and Tr3 respectively. We also observe a common pattern in the ranking of the methods: BR-LASSO-CrV is better for short-term forecasting (up to 4 quarters), while PC-CrV and PLS-CrV are better for longer-term forecasts (h = 6).

In detail, using Tr1 we see that ST1 outperforms the other methods (apart from BR-LASSO-CrV) for h = 1. For h = 2, BR-LASSO-CrV and BR-CrV are the top two performing methods, followed by ST1 and GABIC. For h = 4 we again have BR-LASSO-CrV and BR-CrV as the top performers, with relative RMSFEs of 0.585 and 0.780 respectively. In the last case of h = 6 quarters ahead, PC-CrV and PLS-CrV provide better forecasts, with relative RMSFEs of 0.988 and 0.937 respectively. These results are illustrated in Fig. 3.

Fig. 3. Forecasting GDP using Tr1, h = 1.

Using Tr2 we again see the superiority of BR-LASSO-CrV, with an average RMSFE of 0.704; the next best method across all forecast horizons is MC3BIC, with an average RMSFE of 0.860. Similarly, for Tr3, BR-LASSO-CrV and BR-CrV are followed by MC3BIC, with average RMSFEs of 0.669, 0.675 and 0.999 respectively.

4.4. Selected variables

Having established the usefulness of the heuristic variable selection approaches and of the Bayesian Shrinkage Regression using LASSO regressions, we now examine the underlying selected variables. In Table 5 we report the top twenty variables selected by the three heuristics and by BR-LASSO-CrV across the evaluation periods. The table is divided into four panels: (i) the top left panel concerns HICP inflation forecasting using 36 evaluation periods, (ii) the top right panel GDP growth forecasting using Tr1, (iii) the bottom left panel GDP growth forecasting using Tr2, and (iv) the bottom right panel GDP growth forecasting using Tr3. In each panel we present the variable categories, the variable numbers, the number of times each variable is selected by the different methods and, finally, the total number of selections.

4.4.1. HICP inflation

Starting with the HICP, we see in the top left panel of Table 5 that the most selected variables fall into the following categories: (i) inflation-related variables; (ii) labour variables, including employment, unemployment, employment and unemployment expectations, and wages and salaries; (iii) building activity variables; and (iv) financial variables, including total liabilities and non-equity holdings.



The above variables are not selected in the same manner by each method. Specifically, ST1 selects the HICP-All Items variable 33 times out of the 36 out-of-sample evaluation rounds (92% of the time), with employment, unemployment, wages and salaries, and building activity selected between 33% and 53% of the time. GABIC focuses more on the top three variables, which fall into two categories: HICP-All Items (50% of the time) and the number of persons employed (33% of the time). MC3BIC follows a pattern very similar to ST1, selecting the HICP-All Items variable 83% of the time and wages and salaries 44% of the time.


Table 4
Forecasting GDP: RMSFE relative to the AR(1) benchmark.

Tr1
Method \ h        1       2       4       6       Average   SD
ST1             0.791   0.768   0.991   0.995    0.886    0.124
SABIC           3.362   2.843   1.015   1.560    2.195    1.092
GABIC           1.191   0.872   0.946   1.375    1.096    0.231
MC3BIC          0.976   0.931   1.080   1.163    1.038    0.104
PC-CrV          1.506   1.345   1.075   0.988    1.228    0.240
PLS-CrV         1.675   1.426   1.065   0.937    1.276    0.337
BR-CrV          0.963   0.756   0.780   1.334    0.958    0.267
BR-LASSO-CrV    0.535   0.530   0.585   1.018    0.667    0.235

Tr2
Method \ h        1       2       4       6       Average   SD
ST1             1.001   0.648   0.953   2.593    1.299    0.877
SABIC           0.936   0.944   1.109   1.443    1.108    0.237
GABIC           0.794   0.893   2.993   1.080    1.440    1.042
MC3BIC          0.857   0.830   0.735   1.018    0.860    0.117
PC-CrV          1.484   1.314   1.060   0.987    1.211    0.230
PLS-CrV         1.612   1.421   1.067   0.922    1.256    0.317
BR-CrV          1.151   0.843   0.625   1.157    0.944    0.258
BR-LASSO-CrV    0.631   0.570   0.601   1.012    0.704    0.207

Tr3
Method \ h        1       2       4       6       Average   SD
ST1             0.977   0.802   0.900   1.452    1.033    0.288
SABIC           2.091   2.056   1.251   1.089    1.622    0.526
GABIC           0.802   1.399   1.155   1.061    1.104    0.247
MC3BIC          0.922   1.073   0.733   1.268    0.999    0.227
PC-CrV          1.689   1.416   1.055   0.977    1.284    0.331
PLS-CrV         1.505   1.371   1.042   0.952    1.217    0.263
BR-CrV          0.707   0.557   0.568   0.869    0.675    0.146
BR-LASSO-CrV    0.565   0.538   0.578   0.997    0.669    0.219

On the other hand, BR-LASSO-CrV tends to select financial variables most of the time. Specifically, total liabilities is selected 100% of the time and non-equity holdings 86% of the time, followed by inflationary variables. In general, all methods (on average) use inflation-related and labour market variables as their main forecasting instruments. This selection highlights the forecasting power of: (i) the "autoregressive" nature of the series, and (ii) a potential Phillips curve relationship.

4.4.2. GDP growth rate

Moving to GDP growth forecasting, the results for the three transformations are in the top right, bottom left and bottom right panels of Table 5. The top selected variables (across all transformations and methods) now fall into the following categories: (i) inflation-related variables; (ii) financial market variables, including interest rates, exchange rates, debt securities, deposits and stock price indexes; (iii) production variables; (iv) money supply; and (v) unemployment variables. These findings confirm the well-known ability of the yield curve to predict recessions, along with the impact of labour market variables on GDP. Another notable fact is the effect of the German economy on the euro area's GDP: using the Tr3 transformation, the DAX index is selected 100% of the time (12 out of 12 evaluation rounds) by the BR-LASSO-CrV approach.

5. Summary and conclusions

The issue of forecasting key macroeconomic variables is approached using indicators from a large, unbalanced dataset. The predictors are selected using heuristic optimisation techniques and variable reduction methods, such as factor analysis, in order to reduce the computational burden. Overall, our work indicates that these methods are worth considering, as some of the results are promising. We focus on forecasting euro area HICP inflation and the GDP growth rate, finding that sequential testing at the 1% significance level and MC3 using BIC provide more accurate forecasts than the other heuristic methods, while the Bayesian Shrinkage Regression using LASSO regressions is the best method in the GDP growth forecasting exercise. As discussed, it is also interesting that the variable selection methods and the Bayesian Shrinkage Regression using LASSO regressions tend to choose variables with strong economic links to the dependent variables, which explains their successful performance.

Table 5
Most frequently selected variables.

[Table 5 contains four panels: HICP inflation (Eval = 36, top left), GDP growth with Tr1 (Eval = 12, top right), GDP growth with Tr2 (Eval = 12, bottom left) and GDP growth with Tr3 (Eval = 12, bottom right). Each panel lists the twenty most frequently selected variables (category and variable number) together with the number of times each is selected by SABIC, GABIC, MC3BIC and BR-LASSO-CrV, and the total number of selections. For HICP, the most frequently selected variables include HICP-All Items, # Persons Employed, Gross Wages & Salaries, Employment and Unemployment Expectations, Building Activity, Total Liabilities (EUR) and Non-Equity Holdings. For GDP, they include HICP sub-indices, the 2-, 3- and 5-Year Yields, 3-Month Interest Rates, the money aggregates M1-M3, Yields on All Bank Bonds, Euro Area Deposits, Debt Securities Issued, the Stoxx 50 and the DAX, Production, Turnover and New Orders indices, New Cars, and Unemployment variables.]


Acknowledgements

The authors would like to thank the Editor, the Associate Editor, three anonymous referees and the participants of (i) the CREATES Seminar Series and (ii) the Department of Economics, University of Ioannina Seminar Series for their helpful comments, which improved the quality of this paper. Any remaining errors are our own. Additional results are available upon request.

Appendix. Parameters setup and normalisation

For the simulated annealing and genetic algorithms we use the same (default) values as in Kapetanios (2006): in the simulated annealing h = 1, Bυ = 500, Bs = 5000, T0 = 10; in the genetic algorithm m = 200, Bg = 200, pc = 0.6 and pm = 0.1. We allow the maximum counter of convergence iterations to be 10 and 500 times. In all heuristics we have used the data as is; however, in PC and PLS we have normalised the regressors to zero-mean, unit-variance series. For the Bayesian Shrinkage Regression with LASSO we use the iterative Landweber algorithm with the same values as in De Mol et al. (2008).

References

Acosta-González, E., Fernández-Rodríguez, F., 2007. Model selection via genetic algorithms illustrated with cross-country growth data. Empir. Econom. 33, 313–337.
Akaike, H., 1974. A new look at the statistical model identification. IEEE Trans. Automat. Control 19, 716–723.
Alcock, J., Burrage, K., 2004. A genetic estimation algorithm for parameters of stochastic ordinary differential equations. Comput. Statist. Data Anal. 42 (2), 255–275.
Baillie, R.T., Kapetanios, G., Papailias, F., 2014. Bandwidth selection by cross-validation for forecasting long memory financial time series. J. Empir. Financ. 29, 129–143.
Baragona, R., Battaglia, F., Cucina, D., 2004. Fitting piecewise linear threshold autoregressive models by means of genetic algorithms. Comput. Statist. Data Anal. 47, 277–295.
Brooks, S.P., Friel, N., King, R., 2003. Classical model selection via simulated annealing. J. Roy. Statist. Soc. Ser. B 65, 503–520.
Brüggemann, R., Krolzig, H.M., Lütkepohl, H., 2003. Comparison of Model Reduction Methods for VAR Processes. Technical Report 2003-W13. Nuffield College, University of Oxford.


Buchen, T., Wohlrabe, K., 2011. Forecasting with many predictors: is boosting a viable alternative? Econom. Lett. 113, 16–18.
Bulligan, G., Marcellino, M., Venditti, F., 2014. Forecasting economic activity with targeted predictors. Int. J. Forecast. 31, 188–206.
De Mol, C., Giannone, D., Reichlin, L., 2008. Forecasting with a large number of predictors: is Bayesian regression a valid alternative to principal components? J. Econometrics 146, 318–328.
Fernandez, C., Ley, E., Steel, M.F.J., 2001. Benchmark priors for Bayesian model averaging. J. Econometrics 100, 381–427.
Forni, M., Hallin, M., Lippi, M., Reichlin, L., 2000. The generalised factor model: identification and estimation. Rev. Econ. Stat. 82, 540–554.
Forni, M., Hallin, M., Lippi, M., Reichlin, L., 2005. The generalised factor model: one-sided estimation and forecasting. J. Amer. Statist. Assoc. 100 (471), 830–840.
Foroni, C., Marcellino, M., 2014. A comparison of mixed frequency approaches for modelling euro area macroeconomic variables. Int. J. Forecast. 30, 554–568.
Foroni, C., Marcellino, M., Schumacher, C., 2015. Unrestricted mixed data sampling (U-MIDAS): MIDAS regressions with unrestricted lag polynomials. J. Roy. Statist. Soc. Ser. A 178, 57–82.
Gatu, C., Kontoghiorghes, E.J., 2006. Branch-and-bound algorithms for computing the best subset regression models. J. Comput. Graph. Statist. 15, 139–156.
Gilli, M., Maringer, D., Schumann, E., 2011. Numerical Methods and Optimization in Finance. Academic Press.
Goffe, W.L., Ferrier, G.D., Rogers, J., 1994. Global optimisation of statistical functions with simulated annealing. J. Econometrics 60 (1), 65–99.
Hajek, B., 1998. Cooling schedules for optimal annealing. Math. Oper. Res. 13 (2), 311–331.
Hannan, E.J., Quinn, B.G., 1979. The determination of the order of an autoregression. J. Roy. Statist. Soc. Ser. B 41, 190–195.
Hartl, H.R.F., Belew, R.K., 1990. A Global Convergence Proof for a Class of Genetic Algorithms. Technical Report. Technical University of Vienna.
Helland, I.S., 1988. On the structure of partial least squares regression. Comm. Statist. Simulation Comput. 17, 581–607.
Helland, I.S., 1990. Partial least squares regression and statistical models. J. Statist. 17, 91–114.
Hendry, D.F., 1995. Dynamic Econometrics. Oxford University Press.
Hendry, D.F., 1997. On congruent econometric relations: a comment. In: Carnegie-Rochester Conference Series on Public Policy, vol. 47, pp. 163–190.
Hoover, K.D., Perez, S.J., 1999. Data mining reconsidered: encompassing and the general-to-specific approach to specification search. Econom. J. 2, 167–191.
Jacobson, S.H., Yücesan, E., 2004. Global optimization performance measures for generalized hill climbing algorithms. J. Global Optim. 29, 173–190.
Jerrell, M.E., Campione, W.A., 2001. Global optimization of econometric functions. J. Global Optim. 20 (3–4), 273–295.
Kapetanios, G., 2006. Variable selection in regression models using non-standard optimisation of information criteria. Comput. Statist. Data Anal. 52 (1), 4–15.
Krolzig, H.M., Hendry, D.F., 2001. Computer automation of general-to-specific model selection procedures. J. Econ. Dynam. Control 25 (6–7), 831–886.
Kuzin, V., Marcellino, M., Schumacher, C., 2011. MIDAS vs. mixed-frequency VAR: nowcasting GDP in the euro area. Int. J. Forecast. 27, 529–542.
Marcellino, M., Schumacher, C., 2010. Factor-MIDAS for now- and forecasting with ragged-edge data: a model comparison for German GDP. Oxf. Bull. Econ. Stat. 72, 518–550.
Marcellino, M., Stock, J., Watson, M., 2006. A comparison of direct and iterated multistep AR methods for forecasting macroeconomic time series. J. Econometrics 135, 499–526.
Maringer, D., 2005. Portfolio Management with Heuristic Optimization. Springer, Dordrecht.
Morinaka, Y., Yoshikawa, M., Amagasa, T., 2001. The L-index: an indexing structure for efficient subsequence matching in time sequence databases. In: Proceedings of the Fifth Pacific-Asia Conference on Knowledge Discovery and Data Mining.
Schwarz, G., 1978. Estimating the dimension of a model. Ann. Statist. 6, 461–464.
Sin, C.Y., White, H., 1996. Information criteria for selecting possibly misspecified parametric models. J. Econometrics 71 (1–2), 207–225.
Stock, J., Watson, M., 2002a. Forecasting using principal components from a large number of predictors. J. Amer. Statist. Assoc. 97, 1167–1179.
Stock, J., Watson, M., 2002b. Macroeconomic forecasting using diffusion indexes. J. Bus. Econom. Statist. 20, 147–162.
Stock, J., Watson, M., 2006. Forecasting with many predictors. In: Elliott, G., Granger, C., Timmermann, A. (Eds.), Handbook of Economic Forecasting. North-Holland, Amsterdam, pp. 546–550.
Wold, H., 1982. Soft modeling: the basic design and some extensions. In: Jöreskog, K.G., et al. (Eds.), Systems Under Indirect Observation, Part 2. North-Holland, Amsterdam, pp. 1–54.