Model selection criteria for the leads-and-lags cointegrating regression

Journal of Econometrics 169 (2012) 224–238

In Choi a,∗, Eiji Kurozumi b

a Department of Economics, Sogang University, #1 Shinsu-dong, Mapo-gu, Seoul, 121-742, Republic of Korea
b Department of Economics, Hitotsubashi University, 2-1 Naka, Kunitachi, Tokyo, Japan

Article history: Available online 21 January 2012

Keywords: Cointegration; Leads-and-lags regression; AIC; Corrected AIC; BIC; Cp

Abstract

In this paper, Mallows' (1973) Cp criterion, Akaike's (1973) AIC, Hurvich and Tsai's (1989) corrected AIC and the BIC of Akaike (1978) and Schwarz (1978) are derived for the leads-and-lags cointegrating regression. Deriving model selection criteria for the leads-and-lags regression is a nontrivial task since the true model is of infinite dimension. This paper justifies using the conventional formulas of those model selection criteria for the leads-and-lags cointegrating regression. The numbers of leads and lags can be selected in scientific ways using the model selection criteria. Simulation results regarding the bias and mean squared error of the long-run coefficient estimates are reported. It is found that the model selection criteria are successful in reducing bias and mean squared error relative to the conventional, fixed selection rules. Among the model selection criteria, the BIC appears to be most successful in reducing MSE, and Cp in reducing bias. We also observe that, in most cases, the selection rules without the restriction that the numbers of the leads and lags be the same have an advantage over those with it.

1. Introduction

Several methods have been proposed for efficient estimation of cointegrating relations. Phillips and Hansen (1990) and Park (1992) use semiparametric approaches to derive efficient estimators that have a mixture normal distribution in the limit. Saikkonen (1991), Phillips and Loretan (1991) and Stock and Watson (1993) use the regression augmented with the leads and lags of the regressors' first differences, yielding estimators as efficient as those based on the semiparametric approach. This is called leads-and-lags regression or dynamic OLS regression. This method was also used for cointegrating smooth-transition regression by Saikkonen and Choi (2004). Johansen (1988) uses vector autoregression to derive the maximum likelihood estimator of cointegrating spaces under the assumption of a normal distribution. In addition, Pesaran and Shin (1999) use an autoregressive distributed lag modeling approach for the inference on cointegrating vectors. Finite-sample properties of the aforementioned methods are studied by Hargreaves (1994) and Panopoulou and Pittis (2004), among others.

✩ This paper was presented in a conference honoring the 60th birthday of Professor Peter C.B. Phillips that was organized by Robert Mariano, Zhijie Xiao and Jun Yu and held in Singapore in July, 2008. The authors are thankful to the seminar participants for their valuable comments. Kurozumi’s research was partially supported by the Ministry of Education, Culture, Sports, Science and Technology under Grants-in-Aid No. 18730142. ∗ Corresponding author. E-mail address: [email protected] (I. Choi).

0304-4076/$ – see front matter © 2012 Elsevier B.V. All rights reserved. doi:10.1016/j.jeconom.2012.01.021

This paper focuses on the leads-and-lags regression among the methods mentioned above. Though it has been used intensively in empirical applications¹ due to its optimal property and simplicity, the practical question of how to select the numbers of the leads and lags has not been resolved yet. Most empirical studies use arbitrary numbers of leads and lags, and empirical results can differ depending on this choice. Furthermore, a restriction that the numbers of leads and lags be the same is often imposed out of convenience. This situation is certainly undesirable from empirical viewpoints and indicates a need for methods that select the numbers of leads and lags in nonarbitrary ways.

The main purpose of this paper is to propose methods for the selection of the numbers of leads and lags in cointegrating regressions. More specifically, we will derive model selection criteria for the leads-and-lags regression so that the numbers of leads and lags can be chosen scientifically. These will make the leads-and-lags regression more useful for empirical applications. Deriving model selection criteria for the leads-and-lags regression is a nontrivial task since the true model is of infinite dimension. Most model selection criteria in time series analysis are derived assuming that the true model is contained in a set of candidate

1 Examples are Ball (2001) for the money-demand equation; Bentzen (2004) for the rebound effect in energy consumption; Caballero (1994) for the elasticity of the US capital-output ratio to the cost of capital; Hai et al. (1997) for spot and forward exchange rate regressions; Hussein (1998) for the Feldstein–Horioka puzzle; Masih and Masih (1996) for elasticity estimates of coal demand for China; Weber (1995) for estimates of Okun’s coefficient; Wu et al. (1996) for current account deficits; Zivot (2000) for the forward rate unbiasedness hypothesis.


models, though there are such exceptions as Shibata (1980, 1981) and Hurvich and Tsai (1991). Shibata (1980) considers a model selection problem for an infinite-order autoregressive process and proposes an optimal model selection criterion based on the minimization of prediction errors. Shibata (1981) uses the same idea as in Shibata (1980) for a linear regression model where the number of regressors is infinite or increases with the sample size. Hurvich and Tsai (1991) consider a bias-corrected Akaike information criterion (AIC) for the infinite-order autoregressive model using a frequency-domain approach for the approximation of the variance–covariance matrices for the true and approximating models. Notably, the resulting formula from this study is different from that for the finite-order autoregressive model (cf. Hurvich and Tsai, 1989). By contrast, we will show that all the model selection criteria that we derive for the leads-and-lags regression are the same as those for the case of a finite-dimensional true model.

In this paper, we will consider four model selection criteria: Mallows' (1973) Cp criterion, Akaike's (1973) AIC, Hurvich and Tsai's (1989) corrected AIC, and the Bayesian information criterion (BIC) of Akaike (1978) and Schwarz (1978). These methods are the most common in practice,² though there are many other methods available as documented by Rao and Wu (2001).³ We will also report extensive simulation results that compare the model selection criteria using bias and mean squared error (MSE) of the long-run coefficient estimate as benchmarks. The simulation results show that the model selection criteria are successful in reducing bias and MSE relative to the fixed selection rules. The BIC appears to be most successful in reducing MSE, and Cp in reducing bias. We also observe that, in most cases, the selection rules without the restriction that the numbers of the leads and lags be the same have an advantage over those with it.
A recent paper related to the current one is Kejriwal and Perron (2008). Since such selection rules as the AIC and BIC yield logarithmic rates of increase of the chosen numbers of leads and lags, they do not satisfy the upper bound condition for leads-and-lags regression (condition (9) in Section 2). However, Kejriwal and Perron (2008) show that the condition can be weakened without bringing changes to the asymptotic mixture normality of the long-run coefficient estimates. The weakened condition is satisfied by the AIC and BIC so that their use is justified in practice. Kejriwal and Perron (2008) use the AIC and BIC without considering their appropriateness for the leads-and-lags regression. This paper establishes rigorously that using them and others is proper from the viewpoint of model selection.

This paper is organized as follows. Section 2 briefly explains cointegrating leads-and-lags regression. Section 3 derives model selection criteria for leads-and-lags regression. Section 4 reports simulation results that compare the performance of the model selection criteria in finite samples. Section 5 summarizes and concludes. All the proofs are contained in Appendices.

A few words on our notation. Weak convergence is denoted by ⇒ and all limits are taken as T → ∞. The largest integer not exceeding x is denoted by [x]. For an arbitrary matrix A, ∥A∥ = [tr(A′A)]^{1/2} and ∥A∥₁ = sup{∥Ax∥ : ∥x∥ ≤ 1}. When applied to matrices, the inequality signs > and ≥ mean the usual ordering of positive definite and semidefinite matrices, respectively. Last, for a matrix A, P_A = A(A′A)^{−1}A′ and M_A = I − P_A.
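As a quick numerical check of the projection notation just defined (a sketch for illustration only, not part of the paper):

```python
import numpy as np

# Illustrate P_A = A(A'A)^{-1}A' and M_A = I - P_A from the notation above.
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 2))          # any full-column-rank matrix
P = A @ np.linalg.inv(A.T @ A) @ A.T
M = np.eye(6) - P

assert np.allclose(P @ P, P)         # P_A is idempotent
assert np.allclose(M @ A, 0.0)       # M_A annihilates the columns of A
assert np.isclose(np.trace(P), 2.0)  # trace = rank = number of columns of A
```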



2 The Google citation numbers for these articles are 1060, 4546, 741, 205 and 5720, respectively, as of August 23, 2008. 3 Unfortunately, this review article focuses on the statistics literature only and neglects contributions to the subject of model selection that appeared in the econometrics literature. Some of the important works neglected there are Phillips and Ploberger (1994, 1996). However, since the Phillips–Ploberger criterion assumes that the true model is contained in a set of candidate models (see Section 3 of Phillips and Ploberger, 1994), it is inapplicable to our problem.


2. Leads-and-lags regression

This section briefly introduces the leads-and-lags regression of Saikkonen (1991)⁴ and some required assumptions. Consider the cointegrating regression model

y_t = µ + β′x_t + u_t,  (t = 1, 2, . . . , T),  (1)

where x_t (p × 1) is an I(1) regressor vector and u_t a zero-mean stationary error term. The main purpose of the leads-and-lags regression is to estimate the cointegrating vector β efficiently such that it has a mixture normal distribution in the limit. Leads and lags will augment the regression model (1) for this purpose. For the regressors and error terms, we assume that w_t = (Δx_t′, u_t)′ = (v_t′, u_t)′ satisfies conditions for the multivariate invariance principle such that

T^{−1/2} Σ_{t=1}^{[Tr]} w_t ⇒ B(r),  r ∈ (0, 1],  (2)

where B(r) is a vector Brownian motion with a positive-definite variance–covariance matrix

Ω = [Ω_vv, ω_vu; ω_uv, ω_uu].

More primitive conditions for this are available in the literature (cf., e.g., Phillips and Durlauf, 1986). Furthermore, the summability condition

Σ_{j=−∞}^{∞} ∥E(w_t w_{t+j}′)∥ < ∞  (3)

needs to be satisfied. This implies that the process w_t has a continuous spectral density matrix f_ww(λ), which we assume to satisfy

f_ww(λ) ≥ ε I_{p+1},  ε > 0.  (4)

This assumption means that the spectral density matrix f_ww(λ) is bounded away from zero. Last, denoting the fourth-order cumulants of w_t as κ_ijkl, we also require a technical assumption:

Σ_{m1,m2,m3=−∞}^{∞} |κ_ijkl(m1, m2, m3)| < ∞.

Note that all of these assumptions are taken from Saikkonen (1991). Under conditions (3) and (4), the error term u_t can be expressed as

u_t = Σ_{j=−∞}^{∞} π_j′v_{t−j} + e_t,  (5)

where e_t is a zero-mean stationary process such that E(e_t v_{t−j}′) = 0 for all j = 0, ±1, . . . , and

Σ_{j=−∞}^{∞} ∥π_j∥ < ∞.

As is well known, the long-run variance of the process e_t can be expressed as ω_e² = ω_uu − ω_uv Ω_vv^{−1} ω_vu. Using Eq. (5), we can write model (1) as

y_t = µ + β′x_t + Σ_{j=−∞}^{∞} π_j′Δx_{t−j} + e_t,  (6)

4 See also Phillips and Loretan (1991) and Stock and Watson (1993).


where Δ signifies the difference operator. Truncating the infinite sum in model (6) at K_U and K_L, we obtain

y_t = µ + β′x_t + Σ_{j=−K_L}^{K_U} π_j′Δx_{t−j} + e_{Kt},  (t = K_U + 2, K_U + 3, . . . , T − K_L),
    = θ_K′z_{Kt} + e_{Kt},  (7)

where θ_K = [µ, β′, π_{−K_L}′, . . . , π_{K_U}′]′, z_{Kt} = [1, x_t′, Δx_{t+K_L}′, . . . , Δx_{t−K_U}′]′ and

e_{Kt} = e_t + Σ_{j>K_U, j<−K_L} π_j′v_{t−j} = e_t + V_{t,K}.

In regression model (7), leads and lags are used as additional regressors. Saikkonen (1991) uses a common value for K_U and K_L for simplicity, but the results he obtained apply to the current case with some minor changes in notation. The numbers of leads and lags should be large enough to make the effect of truncation negligible, but should not be too large because this will bring inefficiency in estimating the coefficient vector β. Conditions on K_U and K_L that provide asymptotic mixture normality of the OLS estimator of β and asymptotic normality of the OLS estimator of [π_{−K_L}′, . . . , π_{K_U}′]′ are

K_U³/T → 0  and  K_L³/T → 0,  (8)

and

T^{1/2} Σ_{j>K_U, j<−K_L} ∥π_j∥ → 0.  (9)

In fact, Saikkonen (1991) did not derive asymptotic normality of the OLS estimator of [π_{−K_L}′, . . . , π_{K_U}′]′, but this can be done using the same methods as for Theorem 4 of Lewis and Reinsel (1985) (see also Berk, 1974). Conditions (8) and (9) are sufficient to derive the asymptotic distributions of the OLS estimators for model (7), but do not provide practical guidance in selecting K_U and K_L in finite samples. The next section will consider various methods for their optimal selection.

3. Methods for selecting K_U and K_L

This section considers various procedures for selecting K_U and K_L. These are basically model selection procedures that have often been used for regressions and time-series analysis. For the derivations of these procedures, we assume that the conditions of the leads-and-lags regression in Section 2 hold. For later use, write model (7) in obvious matrix notation as y = Z_K θ_K + e_K with e_K = V_K + e. The OLS estimator of the parameter vector θ_K using model (7) is denoted by θ̂_K. In addition, model (6) is written in vector notation as y = τ + e, where e is the vector of errors.

3.1. Cp criterion

The Cp criterion of Mallows (1973) is an estimator of the expected squared sum of forecast errors. Assume that E(e_t | Z_K, V_K) = 0 for any K_U and K_L. Then the forecast error for the Cp criterion is defined by f_t = ŷ_t − E(y_t | Z_K, V_K) = θ̂_K′z_{Kt} − θ_K′z_{Kt} − V_{t,K}. This measures the distance between the fitted value ŷ_t and the conditional expectation of y_t. The expected squared sum of the forecast errors standardized by σ_e² (= E(e_t²)) is

Δ_K = (1/σ_e²) Σ_{t=K_U+2}^{T−K_L} E(f_t² | Z_K, V_K)
    = (1/σ_e²) E[(θ̂_K − θ_K)′Z_K′Z_K(θ̂_K − θ_K) | Z_K, V_K] + (1/σ_e²) Σ_{t=K_U+2}^{T−K_L} V_{t,K}² − (2/σ_e²) Σ_{t=K_U+2}^{T−K_L} E[(θ̂_K − θ_K)′z_{Kt}V_{t,K} | Z_K, V_K]
    = A + B − C, say.

Since (1/σ_e²)(θ̂_K − θ_K)′Z_K′Z_K(θ̂_K − θ_K) ⇒ χ²(p(K_L + K_U + 2) + 1) (cf. Theorem 4.1 of Saikkonen, 1991, and Theorem 4 of Lewis and Reinsel, 1985), A is approximated by p(K_L + K_U + 2) + 1. Using the relation E(e_{Kt}²) = σ_e² + E(V_{t,K}²), we approximate B by (1/σ_e²) Σ_{t=K_U+2}^{T−K_L} ê_{t,K}² − (T − K_U − K_L − 1). The third term C is approximated by zero since θ̂_K − θ_K converges to zero in probability.

Let K_{U,max} and K_{L,max} be the maximum values used for selecting K_U and K_L, respectively. We estimate σ_e² by σ̂_e² = (T − K_{U,max} − K_{L,max} − 1)^{−1} Σ_{t=K_{U,max}+2}^{T−K_{L,max}} ê_{t,Kmax}², where ê_{t,Kmax} denotes the regression residual using K_{U,max} and K_{L,max} for K_U and K_L, respectively. Then, using the aforementioned approximations, the Cp criterion that approximates Δ_K is defined by

Cp = (1/σ̂_e²) Σ_{t=K_U+2}^{T−K_L} ê_{t,K}² + (p + 1)(K_L + K_U + 2) − T.
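As a concrete illustration, the truncated regression (7) and a grid search minimizing the Cp criterion can be sketched as follows. This is a minimal numpy sketch, not the authors' GAUSS code; `dols_resid` and `select_by_cp` are our own hypothetical names.

```python
import numpy as np

def dols_resid(y, x, k_u, k_l):
    """Residuals of the leads-and-lags regression (7): y_t on
    [1, x_t, dx_{t+k_l}, ..., dx_{t-k_u}] for t = k_u+2, ..., T-k_l (1-based)."""
    T, p = x.shape
    dx = np.diff(x, axis=0)                       # dx[i] = x_{i+1} - x_i (0-based)
    Z = np.array([np.concatenate([[1.0], x[t]] +
                                 [dx[t - j - 1] for j in range(-k_l, k_u + 1)])
                  for t in range(k_u + 1, T - k_l)])
    Y = y[k_u + 1:T - k_l]
    theta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    return Y - Z @ theta

def select_by_cp(y, x, k_u_max, k_l_max):
    """Choose (K_U, K_L) minimizing Cp = RSS/sigma2_hat + (p+1)(K_L+K_U+2) - T,
    with sigma2_hat taken from the largest candidate model, as in the text."""
    T, p = x.shape
    e_max = dols_resid(y, x, k_u_max, k_l_max)
    sigma2 = e_max @ e_max / (T - k_u_max - k_l_max - 1)
    best = None
    for k_u in range(k_u_max + 1):
        for k_l in range(k_l_max + 1):
            e = dols_resid(y, x, k_u, k_l)
            cp = e @ e / sigma2 + (p + 1) * (k_l + k_u + 2) - T
            if best is None or cp < best[0]:
                best = (cp, k_u, k_l)
    return best  # (Cp value, K_U, K_L)
```

Because `sigma2` is estimated once from the largest candidate model, the criterion values are comparable across all (K_U, K_L) pairs on the grid.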

In practice, we choose K_U and K_L so that the Cp criterion is minimized. Note that this requires preselecting K_{U,max} and K_{L,max}.

3.2. Akaike information criterion

The AIC of Akaike (1973) is an estimator of the expected Kullback–Leibler information measure and is often used for regression and time-series models. In deriving the AIC and its variants, it is usually assumed⁵ that candidate models include the true model (cf. Akaike, 1973; Hurvich and Tsai, 1989), though it does not have to be so because the Kullback–Leibler information measure simply indicates how far apart the true and any candidate models are. Since the true model of the current study (Eq. (6)) involves an infinite number of parameters, candidate models cannot include the true model. However, it is still possible to derive the AIC using a general formula.⁶ Assume that the error terms of the candidate model (7) follow an i.i.d. normal distribution with mean 0 and variance σ_{eK}² conditional on Z_K, and denote the conditional log-likelihood function of the candidate model by l(θ_K, σ_{eK}²) = l(δ_K). Then, letting n = T − K_U − K_L − 1, the general formula can be written as

n ln((Σ_{t=K_U+2}^{T−K_L} ê_{t,K}²)/n) + 2 tr(J(δ_o)I(δ_o)^{−1}),

5 A notable exception is Hurvich and Tsai (1991), which considers a bias-corrected AIC for the autoregressive model of infinite order.
6 The general formula assumes T^{1/2} asymptotics, but it can straightforwardly be extended to the current case because the leads-and-lags coefficient estimators for the nonstationary regressors have a mixture-normal distribution in the limit. The general formula is related to Takeuchi's (1976) information criterion. See Chapter 7, Section 2 of Burnham and Anderson (2002) for further details on this.

where δ_o = [θ_o′, σ_o²]′ minimizes the Kullback–Leibler information measure, J(δ_o) = E[(∂l(δ_K)/∂δ_K)(∂l(δ_K)/∂δ_K′)]|_{δ_K=δ_o} and I(δ_o) = E[−∂²l(δ_K)/∂δ_K∂δ_K′]|_{δ_K=δ_o}. Since tr(J(δ_o)I(δ_o)^{−1}) = (σ_e²/σ_o²)p(K_U + K_L + 2) + 1 + (2σ_o² − σ_e²)/σ_o² as shown in Appendix A, and σ_o² →p σ_e² as shown in Lemma B.2 in Appendix B, the AIC is defined by

AIC = n ln((Σ_{t=K_U+2}^{T−K_L} ê_{t,K}²)/n) + 2(p(K_U + K_L + 2) + 2).

Notably, this is the same as the usual AIC that assumes that candidate models include the true model. However, notice that the sample size n depends on the chosen model unlike in conventional regression models.
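As a concrete illustration, the AIC can be computed directly from the residual sum of squares of model (7); note that the effective sample size n = T − K_U − K_L − 1 varies with the candidate model (a minimal sketch in our own notation, not the authors' code):

```python
import numpy as np

def aic(rss, T, p, k_u, k_l):
    """AIC = n ln(RSS/n) + 2(p(K_U + K_L + 2) + 2), with model-dependent n."""
    n = T - k_u - k_l - 1
    return n * np.log(rss / n) + 2.0 * (p * (k_u + k_l + 2) + 2)
```

Because n changes with (K_U, K_L), the residual sums of squares of different candidate models are computed over different effective samples; the criterion is still minimized over the (K_U, K_L) grid in the usual way.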

3.3. Corrected Akaike information criterion

Hurvich and Tsai's (1989) corrected AIC is also an estimator of the Kullback–Leibler information measure. First, it calculates the Kullback–Leibler information measure using unknown parameter values. Next, the unknown parameter values are replaced with the maximum likelihood estimators. These steps provide the corrected AIC. In our application, letting E_F(·) be an expectation operator using the true model (6) and assuming E_F(e_t² | Z_∞) = σ_e², where Z_∞ denotes {z_{Kt}}_{t=−∞}^{∞}, the Kullback–Leibler information measure using unknown parameter values is

−2E_F(l(δ_K)) = E_F[n ln(σ_{eK}²) + (y − Z_K θ_K)′(y − Z_K θ_K)/σ_{eK}² | Z_∞]
 = E_F[n ln(σ_{eK}²) | Z_∞] + (τ − Z_K θ_K)′(τ − Z_K θ_K)/σ_{eK}² + nσ_e²/σ_{eK}²
 = f(θ_K, σ_{eK}²), say.

Replacing θ_K and σ_{eK}² with the corresponding maximum likelihood estimators, we obtain

f(θ̂_K, σ̂_{eK}²) = E_F[n ln(σ̂_{eK}²)] + (τ − Z_K θ̂_K)′(τ − Z_K θ̂_K)/σ̂_{eK}² + nσ_e²/σ̂_{eK}²,  (10)

where σ̂_{eK}² = (1/n) Σ_{t=K_U+2}^{T−K_L} ê_{t,K}². The corrected AIC is an approximation to f(θ̂_K, σ̂_{eK}²).

As shown in Lemma B.3 in Appendix B, the second and third terms in relation (10) follow (nm/(n − m)) F(m, n − m) and n²/χ²(n − m), respectively, under a normality assumption when T is large. Thus, using the mean values of the second and third terms in relation (10), we approximate f(θ̂_K, σ̂_{eK}²) by

AICC = n ln((Σ_{t=K_U+2}^{T−K_L} ê_{t,K}²)/n) + nm/(n − m − 2) + n²/(n − m − 2).

This is the corrected AIC. Notice that this is exactly the same as the corrected AIC of Hurvich and Tsai (1989), which assumes that candidate models include the true model. By contrast, Hurvich and Tsai's (1991) corrected AIC derived for the AR(∞) model is different from that of the current work.

Comparison of the AIC and the corrected AIC reveals that the major difference between them lies at which stage the maximum likelihood estimators are plugged into the Kullback–Leibler information measure. The AIC calculates the Kullback–Leibler information measure using the maximum likelihood estimators from the beginning, but the corrected AIC calculates the Kullback–Leibler information measure using unknown parameter values and then uses the maximum likelihood estimators for the computed Kullback–Leibler information measure.

3.4. Bayesian information criterion

The BIC of Akaike (1978) and Schwarz (1978) is an approximation to a transformation of the Bayesian posterior probability of a candidate model. Unlike the AIC, it does not require the probability density of the true model. Therefore, the true model of infinite dimension as in this paper does not require any separate treatment and the usual formula,

BIC = n ln((Σ_{t=K_U+2}^{T−K_L} ê_{t,K}²)/n) + (p(K_U + K_L + 2) + 2) ln(n),

can be used. Because we never use the true model in the leads-and-lags regression, consistency of the BIC cannot be an issue.
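The corrected AIC and the BIC can be sketched in the same way (our own sketch, not the authors' code; we take m to be the number of regressors of model (7), i.e. m = p(K_U + K_L + 2) + 1 including the intercept, which is our reading of the notation):

```python
import numpy as np

def aicc(rss, T, p, k_u, k_l):
    """Corrected AIC: n ln(RSS/n) + nm/(n-m-2) + n^2/(n-m-2)."""
    n = T - k_u - k_l - 1
    m = p * (k_u + k_l + 2) + 1   # assumed: regressor count incl. intercept
    return n * np.log(rss / n) + (n * m + n * n) / (n - m - 2)

def bic(rss, T, p, k_u, k_l):
    """BIC: n ln(RSS/n) + (p(K_U + K_L + 2) + 2) ln(n)."""
    n = T - k_u - k_l - 1
    return n * np.log(rss / n) + (p * (k_u + k_l + 2) + 2) * np.log(n)
```

As with Cp and the AIC, each criterion is evaluated over the (K_U, K_L) grid and the minimizing pair is selected.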

4. Simulation results

The ultimate purpose of the leads-and-lags regression is to estimate the long-run coefficient β precisely. This section investigates how the model selection criteria of the previous section perform in relation to the estimation of the long-run coefficient β. To this end, we consider the data generating process

y_t = µ + βx_t + u_t,
x_t = x_{t−1} + v_t,  (11)

where x_t is a scalar unit root process with x_0 = 0. We set µ = 1 and β = 1 throughout the simulations and set the sample size at 100 or 300. The error term w_t = (v_t, u_t)′ is generated from a VARMA(1,1) process:

w_t = Aw_{t−1} + ε_t − Θε_{t−1},  (12)

where

A = [a11, 0; 0, a22],  Θ = [θ11, 0; 0, θ22],  ε_t = (ε_1t, ε_2t)′ ∼ i.i.d.(0, Σ),  Σ = [1, σ12; σ12, 1],

and w_0 = ε_0 = 0. The parameters a_ii and θ_ii (i = 1, 2) are related to the strength of the serial correlation in w_t, whereas σ12 signifies contemporaneous correlation. In the simulations, the parameters a_ii and θ_ii take values from {0, 0.4, 0.8} while σ12 is equal to either 0.4 or 0.8. We consider two distributions for ε_t: a standard normal distribution and a log-normal distribution. Since the AICC is based on the assumption that the error terms are normally distributed, a nonnormal distribution of ε_t implies that the correction for the AICC does not make much sense. In order to check the robustness of the AICC to nonnormal error terms, we try a log-normal distribution for the error terms. We also note that all the selection criteria depend on the maximum numbers of the leads and lags. We will use K4 = [4 × (T/100)^{1/4}] and K12 = [12 × (T/100)^{1/4}] commonly for K_{U,max} and K_{L,max} in the simulations. All computations are carried out using the GAUSS matrix language with 10,000 replications.
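The data generating process (11)–(12) can be sketched as follows (normal innovations only; `simulate_dgp` is our own name, and the paper's GAUSS implementation and the log-normal case are not reproduced here):

```python
import numpy as np

def simulate_dgp(T, a11, a22, th11, th22, s12, seed=0):
    """Generate (y_t, x_t) from (11)-(12): x_t = x_{t-1} + v_t, y_t = 1 + x_t + u_t,
    with w_t = (v_t, u_t)' following the VARMA(1,1) process (12)."""
    rng = np.random.default_rng(seed)
    A = np.diag([a11, a22])
    Th = np.diag([th11, th22])
    S = np.array([[1.0, s12], [s12, 1.0]])
    eps = rng.multivariate_normal(np.zeros(2), S, size=T + 1)
    eps[0] = 0.0                          # eps_0 = 0
    w = np.zeros((T + 1, 2))              # w_0 = 0
    for t in range(1, T + 1):
        w[t] = A @ w[t - 1] + eps[t] - Th @ eps[t - 1]
    v, u = w[1:, 0], w[1:, 1]
    x = np.cumsum(v)                      # unit-root regressor, x_0 = 0
    y = 1.0 + 1.0 * x + u                 # mu = beta = 1
    return y, x
```

The maximum orders can then be set as, e.g., `int(4 * (T / 100) ** 0.25)` for K4 and `int(12 * (T / 100) ** 0.25)` for K12.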


Table 1 Bias of the estimates. Part I: a11 = θ11 = 0 (bias × 10, T = 100, εt : normal)

σ12 = 0.4 (a11 , a22 ) (θ11 , θ22 ) (Kmax = K4 ) Cp AIC AICc BIC Cp (KL = KU ) AIC (KL = KU ) AICc (KL = KU ) BIC (KL = KU ) KL = KU = Kmax (Kmax = K12 ) Cp AIC AICc BIC Cp (KL = KU ) AIC (KL = KU ) AICc (KL = KU ) BIC (KL = KU ) KL = KU = Kmax KL = KU = 1 KL = KU = 2 KL = KU = 3

(0, 0.4) (0, 0)

σ12 = 0.8 (0, 0.8) (0, 0)

(0, 0.4) (0, 0)

σ12 = 0.4 (0, 0.8) (0, 0)

(0, 0) (0, 0.4)

σ12 = 0.8 (0, 0) (0, 0.8)

(0, 0) (0, 0.4)

(0, 0) (0, 0.8)

σ12 = 0.4

σ12 = 0.8

(0, 0.8) (0, 0.4)

(0, 0.8) (0, 0.4)

0.0082 0.0304 0.0364 0.0777 0.0106 0.0482 0.0591 0.1077 −0.0038

0.3433 0.3835 0.4023 0.5010 0.3548 0.4357 0.4664 0.5957 0.3045

0.0135 0.0258 0.0312 0.0594 0.0155 0.0393 0.0491 0.0935 0.0050

0.6369 0.6389 0.6392 0.6665 0.6474 0.6722 0.6916 0.8024 0.6376

−0.0197 −0.0235 −0.0467 −0.0116 −0.0306 −0.0366 −0.0684 −0.0030

−0.0051 −0.0115 −0.0149 −0.0402 −0.0069 −0.0250 −0.0310 −0.0876 −0.0013

−0.0019 −0.0026 −0.0026 −0.0029 −0.0017 −0.0029 −0.0027 −0.0087 −0.0013

−0.0018 −0.0021 −0.0020 −0.0012 −0.0015 −0.0017 −0.0011 −0.0018 −0.0006

0.1745 0.2055 0.2165 0.2768 0.1790 0.2328 0.2524 0.3142 0.1499

0.3243 0.3347 0.3391 0.3872 0.3323 0.3644 0.3857 0.4946 0.3178

−0.0155

0.0564 0.1515 0.2610 0.4242 0.0658 0.2016 0.3810 0.5693 0.0315

−0.0028

0.0220 0.0283 0.0763 −0.0206 0.0409 0.0556 0.1073 −0.0227

0.0063 0.0261 0.0577 −0.0073 0.0233 0.0462 0.0933 −0.0126

0.1664 0.2386 0.3068 0.4229 0.1637 0.2858 0.4619 0.6541 0.1373

−0.0017 −0.0153 −0.0237 −0.0463 −0.0066 −0.0275 −0.0355 −0.0684 −0.0083

−0.0016 −0.0092 −0.0163 −0.0398 −0.0043 −0.0231 −0.0301 −0.0876 −0.0022

−0.0049 −0.0021 −0.0018 −0.0029 −0.0053 −0.0017 −0.0017 −0.0087 −0.0062

−0.0018 −0.0014 −0.0013 −0.0012 −0.0009 −0.0011 −0.0004 −0.0017 −0.0032

0.0270 0.1066 0.1591 0.2624 0.0268 0.1401 0.2344 0.3111 0.0085

0.0825 0.1473 0.1984 0.3107 0.0823 0.1868 0.3099 0.4694 0.0641

0.0480 0.0148 0.0009

0.5513 0.4524 0.3703

0.1086 0.0425 0.0143

1.1267 0.9314 0.7690

−0.0043 −0.0026 −0.0026

−0.0027 −0.0001 −0.0001

−0.0022 −0.0013 −0.0017

−0.0017 −0.0001 −0.0002

0.2727 0.2237 0.1826

0.5620 0.4644 0.3829

0.0084

Part II: a11 ≠ 0 and/or θ11 ≠ 0 (bias × 10, T = 100, εt: normal)

σ12 = 0.4 (a11 , a22 ) (θ11 , θ22 ) (Kmax = K4 ) Cp AIC AICc BIC Cp (KL = KU ) AIC(KL = KU ) AICc (KL = KU ) BIC (KL = KU ) KL = KU = Kmax (Kmax = K12 ) Cp AIC AICc BIC Cp (KL = KU ) AIC (KL = KU ) AICc (KL = KU ) BIC (KL = KU ) KL = KU = Kmax KL = KU = 1 KL = KU = 2 KL = KU = 3

σ12 = 0.4

σ12 = 0.8

(0.4, 0.4) (0, 0)

(0.8, 0.8) (0, 0)

(0.4, 0.4) (0, 0)

σ12 = 0.8 (0.8, 0.8) (0, 0)

σ12 = 0.4 (0, 0) (0.4, 0.4)

(0, 0) (0.8, 0.8)

(0, 0) (0.4, 0.4)

(0, 0) (0.8, 0.8)

(0.8, 0.8) (0.4, 0.4)

(0.8, 0.8) (0.4, 0.4)

−0.0059 −0.0056 −0.0059 −0.0049 −0.0060 −0.0077 −0.0061 −0.0049 −0.0051

−0.0065 −0.0071 −0.0062 −0.0056 −0.0066 −0.0056 −0.0061 −0.0051 −0.0051

−0.0024 −0.0022 −0.0019 −0.0016 −0.0031 −0.0024 −0.0024 −0.0021 −0.0014

−0.0005 −0.0015 −0.0010 −0.0008 −0.0004 −0.0014 −0.0015 −0.0012 −0.0005

−0.0048 −0.0048 −0.0046 −0.0046 −0.0065 −0.0050 −0.0063 −0.0048 −0.0041

−0.0094 −0.0045 −0.0059 −0.0041 −0.0114 −0.0018 −0.0033 −0.0071 −0.0044

−0.0031 −0.0044 −0.0039 −0.0023 −0.0031 −0.0048 −0.0036 −0.0027 −0.0028

−0.0027 −0.0040 −0.0037 −0.0041 −0.0052 −0.0003 −0.0010 −0.0057 −0.0055

−0.0070 −0.0067 −0.0060 −0.0050 −0.0066 −0.0059 −0.0059 −0.0047 −0.0051

−0.0012 −0.0013 −0.0013 −0.0013 −0.0010 −0.0017 −0.0016 −0.0015 −0.0008

−0.0137 −0.0058 −0.0069 −0.0054 −0.0149 −0.0074 −0.0073 −0.0048 −0.0149

−0.0121 −0.0094 −0.0084 −0.0077 −0.0119 −0.0132 −0.0068 −0.0047 −0.0127

−0.0054 −0.0075 −0.0022 −0.0012 −0.0069 −0.0040 −0.0023 −0.0021 −0.0086

−0.0033 −0.0036 −0.0029 −0.0031 −0.0032 −0.0058 −0.0002 −0.0013 −0.0053

0.0026 0.0000 −0.0067 −0.0040 −0.0094 −0.0017 −0.0060 −0.0048 −0.0138

0.0095 0.0051 −0.0048 −0.0027 −0.0054 0.0070 −0.0013 −0.0073 −0.0122

−0.0072 −0.0054 −0.0035 −0.0023 −0.0086 −0.0037 −0.0031 −0.0027 −0.0093

−0.0095 −0.0037 −0.0042 −0.0032 −0.0096 −0.0006

−0.0123 −0.0102 −0.0059 −0.0050 −0.0113 −0.0124 −0.0059 −0.0050 −0.0131

−0.0047 −0.0067 −0.0026 −0.0007 −0.0035 −0.0065 −0.0016 −0.0011 −0.0060

−0.0054 −0.0053 −0.0054

−0.0050 −0.0052 −0.0054

−0.0023 −0.0024 −0.0027

−0.0014 −0.0014 −0.0012

−0.0071 −0.0043 −0.0046

−0.0111

−0.0035 −0.0022 −0.0034

−0.0058

−0.0052 −0.0053 −0.0054

−0.0016 −0.0016 −0.0016

Before reporting our simulation results, we note that the leads of Δx_t = v_t do not have to be included in the augmented regression model (7) when a11 = θ11 = 0. In this case, v_t becomes equal to ε_1t and

u_t = (1 − a22 L)^{−1}(1 − θ22 L)ε_2t = φ(L)σ12 v_t + φ(L)ε_{2·1,t},  (13)

where φ(L) = (1 − a22 L)^{−1}(1 − θ22 L) with L being the lag operator and ε_{2·1,t} = ε_2t − σ12 ε_1t. Since the first term of (13) includes v_{t−j} = Δx_{t−j} only for j ≥ 0, while ε_{2·1,t} is independent of v_s = ε_1s for all s and t, we can see that u_t can be expressed as in Eq. (5) without the leads of Δx_t.

Tables 1–4 present empirical bias of the estimate of β while Tables 5–8 report empirical MSE. In each table, Cp, AIC, AICc and BIC correspond to the information criteria derived in this paper without the restriction of K_U = K_L, while Cp(K_U = K_L), AIC(K_U = K_L), AICc(K_U = K_L) and BIC(K_U = K_L) are the information criteria with the restriction. Part I of each table deals with the case where the leads of Δx_t are not required (i.e., a11 = θ11 = 0). Part II handles the case where either a11 ≠ 0 or θ11 ≠ 0 or both. Note that a different scale is used in each table depending on the sample size and the distribution of ε_t. For the purpose of comparison, we also select the leads and lags by the fixed rules K_U = K_L = 1, 2, 3, K4 and K12. Although it is often the case that K_U is conventionally set equal to K_L in practice, we do not have to use the same numbers for the leads and lags. Especially when a11 = θ11 = 0, we do not have to include the leads, and then the selection rules without the restriction of K_U = K_L are expected to have an advantage over those with K_U = K_L. We first summarize the simulation results regarding the bias.


Table 2 Bias of the estimates. Part I: a11 = θ11 = 0 (bias × 10², T = 100, εt: log-normal)

σ12 = 0.4 (a11 , a22 ) (θ11 , θ22 ) (Kmax = K4 ) Cp AIC AICc BIC Cp (KL = KU ) AIC (KL = KU ) AICc (KL = KU ) BIC (KL = KU ) KL = KU = Kmax (Kmax = K12 ) Cp AIC AICc BIC Cp (KL = KU ) AIC (KL = KU ) AICc (KL = KU ) BIC (KL = KU ) KL = KU = Kmax KL = KU = 1 KL = KU = 2 KL = KU = 3

σ12 = 0.8

(0, 0.4) (0, 0)

(0, 0.8) (0, 0)

(0, 0.4) (0, 0)

0.0078 0.0172 0.0205 0.0345 0.0127 0.0257 0.0293 0.0440 −0.0018

0.4234 0.4444 0.4544 0.5515 0.4528 0.5120 0.5443 0.6975 0.4226

0.0093 0.0132 0.0142 0.0202 0.0089 0.0151 0.0173 0.0275 0.0024

−0.0083 0.0045 0.0123 0.0316 −0.0085 0.0135 0.0236 0.0418 −0.0078

0.1096 0.1780 0.2424 0.4053 0.1071 0.2280 0.3859 0.6072 0.0862

0.0244 0.0093 0.0030

0.7429 0.6158 0.5114

σ12 = 0.4 (0, 0.8) (0, 0)

σ12 = 0.8

σ12 = 0.4

σ12 = 0.8

(0, 0.8) (0, 0.4)

(0, 0.8) (0, 0.4)

(0, 0) (0, 0.4)

(0, 0) (0, 0.8)

(0, 0) (0, 0.4)

(0, 0) (0, 0.8)

0.4088 0.4078 0.4079 0.4169 0.4189 0.4290 0.4366 0.4800 0.4158

−0.0064 −0.0085 −0.0096 −0.0138 −0.0044 −0.0101 −0.0108 −0.0185 −0.0023

−0.0047 −0.0078 −0.0084 −0.0158 −0.0032 −0.0091 −0.0110 −0.0243 −0.0017

0.0007 0.0009 0.0005 −0.0006 0.0007 0.0000 −0.0001 −0.0032 −0.0002

0.0006 0.0007 0.0006 0.0007 0.0001 0.0005 0.0005 −0.0001 −0.0005

0.2250 0.2527 0.2633 0.3372 0.2436 0.2978 0.3200 0.3921 0.2099

0.2103 0.2147 0.2169 0.2368 0.2162 0.2317 0.2409 0.2883 0.2079

0.0001 0.0113 0.0125 0.0180 −0.0044 0.0108 0.0161 0.0275 −0.0144

0.0994 0.1416 0.1819 0.2384 0.0938 0.1517 0.2560 0.3448 0.0644

−0.0013 −0.0060 −0.0083 −0.0131 −0.0006 −0.0060 −0.0090 −0.0173

−0.0008 −0.0097 −0.0113 −0.0178

−0.0008

0.0000

0.0036

0.0020 0.0023 0.0015 0.0016 0.0001 0.0018 0.0019 0.0002 0.0004

0.0593 0.1198 0.1722 0.2894 0.0591 0.1612 0.2542 0.3610 0.0413

0.0575 0.1022 0.1309 0.1771 0.0494 0.1097 0.1808 0.2553 0.0278

0.0285 0.0128 0.0064

0.7383 0.6101 0.5048

−0.0012 −0.0012 −0.0005

−0.0009 −0.0010 −0.0001

−0.0006 −0.0001 0.0005

0.3707 0.3073 0.2553

0.3690 0.3054 0.2530

0.0013

−0.0089 −0.0134 −0.0250

0.0010 −0.0003 −0.0008 −0.0012 −0.0001 0.0001 −0.0031 −0.0042

−0.0004 0.0003 0.0009

Part II: a11 ≠ 0 and/or θ11 ≠ 0 (bias × 10², T = 100, εt: log-normal)

σ12 = 0.4 (a11 , a22 ) (θ11 , θ22 ) (Kmax = K4 ) Cp AIC AICc BIC Cp (KL = KU ) AIC (KL = KU ) AICc (KL = KU ) BIC (KL = KU ) KL = KU = Kmax (Kmax = K12 ) Cp AIC AICc BIC Cp (KL = KU ) AIC (KL = KU ) AICc (KL = KU ) BIC (KL = KU ) KL = KU = Kmax KL = KU = 1 KL = KU = 2 KL = KU = 3

σ12 = 0.4

σ12 = 0.8

(0.8, 0.8) (0, 0)

(0.4, 0.4) (0, 0)

(0.8, 0.8) (0, 0)

(0, 0) (0.4, 0.4)

(0, 0) (0.8, 0.8)

(0, 0) (0.4, 0.4)

(0, 0) (0.8, 0.8)

(0.8, 0.8) (0.4, 0.4)

(0.8, 0.8) (0.4, 0.4)

0.0003 0.0026 0.0034 0.0055 0.0014 0.0040 0.0041 0.0067 0.0021

0.0287 0.0321 0.0334 0.0420 0.0301 0.0360 0.0383 0.0480 0.0263

−0.0004 −0.0010 −0.0008 −0.0016 −0.0001 −0.0004 −0.0005 −0.0018 −0.0000

−0.0071 −0.0079 −0.0085 −0.0105 −0.0067 −0.0080 −0.0088 −0.0111 −0.0063

−0.0059 −0.0082 −0.0085 −0.0093 −0.0041 −0.0068 −0.0078 −0.0097 −0.0031

−0.0300 −0.0437 −0.0473 −0.0669 −0.0250 −0.0458 −0.0502 −0.0697 −0.0068

0.0012 0.0019 0.0019 0.0016 0.0014 0.0022 0.0018 0.0013 0.0000

0.0039 0.0088 0.0084 0.0115 0.0045 0.0076 0.0087 0.0131 −0.0161

0.0260 0.0297 0.0310 0.0399 0.0280 0.0346 0.0366 0.0442 0.0232

−0.0064 −0.0074 −0.0076 −0.0096 −0.0062 −0.0077 −0.0082 −0.0103 −0.0056

−0.0062 −0.0024 −0.0004

−0.0022

−0.0049 −0.0038 −0.0050 −0.0087 −0.0056 −0.0040 −0.0077 −0.0104 −0.0083

−0.0018 −0.0058 −0.0085 −0.0092

−0.0115 −0.0487 −0.0606 −0.0760 −0.0031 −0.0446 −0.0602 −0.0729

−0.0009

0.0030 0.0108 0.0137 0.0160 −0.0001 0.0106 0.0134 0.0134 0.0011

−0.0009

0.0002 0.0018 0.0062 −0.0051

0.0006 0.0073 0.0124 0.0297 0.0015 0.0117 0.0228 0.0405 0.0026

0.0043 0.0137 0.0318 0.0014 0.0118 0.0266 0.0388 0.0024

−0.0033 −0.0025 −0.0049 −0.0075 −0.0042 −0.0045 −0.0062 −0.0091 −0.0085

0.0039 0.0012 −0.0003

0.0461 0.0384 0.0320

−0.0009

−0.0105 −0.0084 −0.0071

−0.0022 −0.0023 −0.0003

−0.0027 −0.0016

0.0411 0.0343 0.0286

−0.0095 −0.0076 −0.0063

(0.4, 0.4) (0, 0)

0.0048

−0.0069

σ12 = 0.8

0.0001 −0.0005 −0.0012 −0.0034 0.0005 −0.0003 −0.0014 −0.0090 0.0004 0.0007

σ12 = 0.4

(b-i) Cp without the restriction of KU = KL tends to be most successful in reducing bias. Especially when there are high serial correlations in the data (see the last two columns in each table), Cp performs much better than the other selection criteria and the fixed selection rules. (b-ii) When a11 = θ11 = 0, the absolute value of the bias without the restriction of KU = KL is smaller than that with KU = KL in most cases. This is to be expected because the leads of Δxt are not required in this case for the augmented regression (7). (b-iii) When a11 = θ11 = 0, the bias resulting from the use of Cp is smallest, whereas the BIC leads to the most biased estimates among the four selection criteria.
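The selection exercise behind these comparisons can be sketched as follows: for each candidate pair (KL, KU) up to Kmax, fit the augmented leads-and-lags regression by OLS and pick the pair minimizing each criterion computed from the residual variance. This is a purely illustrative univariate sketch using the conventional formulas the paper justifies; all function and variable names are ours, and the AICc penalty shown is the standard Hurvich–Tsai small-sample form rather than the paper's exact parameter count.

```python
import numpy as np

def select_leads_lags(y, x, k_max=4):
    """Choose (K_L, K_U) for y_t = mu + beta*x_t + sum_{j=-K_U}^{K_L} pi_j dx_{t-j} + u_t
    by minimizing Cp, AIC, AICc, and BIC over a common sample (illustrative sketch)."""
    T = len(y)
    d = np.empty(T)
    d[0] = np.nan
    d[1:] = np.diff(x)                          # d[t] = x_t - x_{t-1}
    t = np.arange(k_max + 1, T - k_max)         # common sample for all candidates
    n = len(t)

    def fit(kl, ku):
        # regressors: intercept, x_t, and dx_{t-j} for j = -ku, ..., kl
        cols = [np.ones(n), x[t]] + [d[t - j] for j in range(-ku, kl + 1)]
        Z = np.column_stack(cols)
        beta = np.linalg.lstsq(Z, y[t], rcond=None)[0]
        rss = float(np.sum((y[t] - Z @ beta) ** 2))
        return rss, Z.shape[1]

    rss_full, _ = fit(k_max, k_max)
    s2_full = rss_full / n                      # error-variance estimate from the largest model
    best = {name: (np.inf, None) for name in ("Cp", "AIC", "AICc", "BIC")}
    for kl in range(k_max + 1):
        for ku in range(k_max + 1):
            rss, k = fit(kl, ku)
            crit = {
                "Cp":   rss / s2_full - n + 2 * k,                       # Mallows (1973)
                "AIC":  n * np.log(rss / n) + 2 * k,                     # Akaike (1973)
                "AICc": n * np.log(rss / n) + 2 * k
                        + 2 * k * (k + 1) / (n - k - 1),                 # Hurvich-Tsai form
                "BIC":  n * np.log(rss / n) + k * np.log(n),             # Schwarz (1978)
            }
            for name, v in crit.items():
                if v < best[name][0]:
                    best[name] = (v, (kl, ku))
    return {name: pair for name, (v, pair) in best.items()}
```

Restricting the search to kl == ku in the double loop gives the KL = KU variants compared in the tables.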


(b-iv) The AIC tends to perform slightly better than the AICc, especially when a11 = θ11 = 0, but the differences are marginal. (b-v) There are a few cases where the model selection criteria with the restriction KU = KL result in smaller bias, but the differences between the restricted and unrestricted cases are relatively small. In most cases, the model selection criteria without the restriction KU = KL perform better than those with it. (b-vi) The fixed selection rules sometimes result in large biases. In most cases, they are dominated by one of the model selection criteria. (b-vii) The log-normal distribution does not bring any noticeable changes in evaluating the selection rules. In particular,

230

I. Choi, E. Kurozumi / Journal of Econometrics 169 (2012) 224–238

Table 3 Bias of the estimates. Part I: a11 = θ11 = 0 (bias × 10, T = 300, εt : normal)

[Table 3 entries (Part I above; Part II: a11 ≠ 0 and/or θ11 ≠ 0, bias × 10, T = 300, εt: normal) were scrambled in extraction and are omitted. Rows are the selection rules Cp, AIC, AICc, BIC with and without KL = KU, KL = KU = Kmax for Kmax = K4 and K12, and fixed KL = KU = 1, 2, 3; columns are the (a11, a22), (θ11, θ22), σ12 combinations.]

performance of the AICc does not change much with the log-normal distribution. (b-viii) As sample size grows, bias decreases as expected, but no qualitative differences in the ranking of the selection rules are observed as the sample size increases.

Regarding the MSE, the simulation results are summarized as follows. (m-i) The BIC tends to perform best in almost all the cases, and the AICc tends to follow. The MSE with Cp tends to be largest in most cases. Overall, however, the differences in MSE among the four selection criteria are relatively small. (m-ii) In most cases, the MSE without the restriction of KU = KL is smaller than that with KU = KL, though there are exceptions, especially in Part II of the tables; the differences between the restricted and unrestricted cases are quite small. (m-iii) The AICc tends to perform slightly better than the AIC. But their performance cannot be too different because they are related by AICc = AIC + 2(m + 1)(m + 2)/(n − m − 2), where n = T − KU − KL − 1 and m = p(KU + KL + 2) + 1 (cf. Hurvich and Tsai, 1989). (m-iv) In most cases, the fixed selection rules are dominated by one of the model selection criteria. (m-v) Overall, the MSE with Kmax = K12 tends to be larger than that with Kmax = K4. (m-vi) The log-normal distribution does not bring any noticeable changes in evaluating the selection rules. In particular,


Table 4 Bias of the estimates. Part I: a11 = θ11 = 0 (bias × 10³, T = 300, εt: log-normal)

[Table 4 entries (Part I above; Part II: a11 ≠ 0 and/or θ11 ≠ 0, bias × 10³, T = 300, εt: log-normal) were scrambled in extraction and are omitted. Row and column layout as in Table 3.]

performance of the AICc does not change much with the log-normal distribution. (m-vii) As sample size grows, MSE decreases as expected, but no qualitative differences in the ranking of the selection rules are observed as the sample size increases. We infer from the simulation results that the model selection criteria are successful in reducing bias and MSE relative to the fixed selection rules. The BIC appears to be most successful in reducing MSE, and Cp in reducing bias. We also observe that the selection rules without the restriction KL = KU have an advantage over those with KU = KL in most cases. For practitioners, therefore, we recommend using the selection rules of this paper without the restriction of KU = KL. In addition to the bias and MSE, we also investigate coverage rates of 95% confidence intervals. Table 9 reports the results in the case of T = 100 with normal errors.7 In general, the coverage rates are smaller than 0.95 when wt has an AR structure, while they tend to be larger than 0.95 when wt is an MA process. The coverage rates with BIC and AICc tend to be greater than those with Cp and AIC, but the differences are minor. We can also see that the coverage rates are closer to 0.95 when the maximum leads-and-lags lengths are set at K4 than at K12. As far as the coverage rates are concerned, it seems that the maximum leads-and-lags lengths should not be set too large and that BIC and AICc tend to result in better coverage rates than the other criteria.
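The coverage computation described here reduces to one check per replication: form β̂ ± z·se and record whether the true long-run coefficient lies inside the interval. A minimal sketch (function and variable names are ours):

```python
import numpy as np

def coverage_rate(beta_hat, se, beta_true, z=1.96):
    """Empirical coverage of nominal 95% intervals beta_hat +/- z*se
    across simulation replications (z = 1.96 for the normal approximation)."""
    beta_hat = np.asarray(beta_hat, dtype=float)
    se = np.asarray(se, dtype=float)
    inside = (beta_hat - z * se <= beta_true) & (beta_true <= beta_hat + z * se)
    return inside.mean()   # fraction of replications whose interval covers beta_true
```

Values below the nominal 0.95 (as under AR errors here) indicate intervals that are too narrow; values above it indicate conservative intervals.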

7 Other cases were also studied, but we do not report the results in order to save space. They can be obtained from the authors upon request.


Table 5 MSE of the estimates. Part I: a11 = θ11 = 0 (MSE × 10, T = 100, εt : normal)

[Table 5 entries (Part I above; Part II: a11 ≠ 0 and/or θ11 ≠ 0, MSE × 10, T = 100, εt: normal) were scrambled in extraction and are omitted. Row and column layout as in Table 3.]

The data generating process we have considered so far is a VARMA process, for which the coefficients {πj} in Eq. (5) decay to zero exponentially. Following a suggestion of a referee, we investigate the finite-sample properties of the information criteria when {πj} decrease more slowly. To this end, we consider (11) with

ut = Σ_{j=−500}^{500} πj vt−j + et, where πj = 1/j^α,  (14)

where vt ∼ i.i.d. N(0, 1), et ∼ i.i.d. N(0, 1), and {vt} and {et} are independent. We set α = 1.1, 1.5, 2 and 3, for which the summability condition Σ_{j=−∞}^{∞} ∥πj∥ < ∞ is satisfied. For the data generating process (14), the overall properties of the information criteria we have observed are found to be preserved with a few exceptions.8 One of the exceptions is that the Cp criterion does not necessarily work better than the others in reducing the bias. Moreover, neither the bias nor the MSE necessarily decreases for α = 1.1 even when the sample size increases. In general, however, we confirm again that the information criteria are useful in reducing the bias and MSE.

5. Conclusion and further remarks
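As a concrete, purely illustrative check, an error process with the slowly decaying weights of (14) can be simulated along the following lines. The convention π0 = 1 and πj = 1/|j|^α for j ≠ 0 is our assumption: the displayed formula writes πj = 1/j^α without spelling out the treatment of nonpositive j.

```python
import numpy as np

def simulate_u(T, alpha, J=500, seed=0):
    """Draw u_t = sum_{j=-J}^{J} pi_j v_{t-j} + e_t with polynomially decaying
    weights. Assumed convention: pi_0 = 1, pi_j = 1/|j|**alpha for j != 0."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(T + 2 * J)   # J past and J future draws for the window
    e = rng.standard_normal(T)
    j = np.arange(-J, J + 1)
    pi = np.ones(2 * J + 1)
    nz = j != 0
    pi[nz] = 1.0 / np.abs(j[nz]).astype(float) ** alpha   # summable when alpha > 1
    # u[t] is the dot product of pi with the window v_{t-j}, j = -J..J;
    # v index t+J is the "current" observation for time t.
    u = np.array([pi @ v[tJ - j] for tJ in range(J, J + T)]) + e
    return u
```

For α = 1.1 the weights decay only slightly faster than 1/j, which is what makes the truncation at a finite lead/lag length hard for the criteria in this experiment.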

We have derived Mallows’ (1973) Cp criterion, Akaike’s (1973) AIC, Hurvich and Tsai’s (1989) corrected AIC, and the BIC of Akaike

8 Relevant tables are suppressed to save space and can be obtained from the authors upon request.


Table 6 MSE of the estimates. Part I: a11 = θ11 = 0 (MSE × 10³, T = 100, εt: log-normal)

[Table 6 entries (Part I above; Part II: a11 ≠ 0 and/or θ11 ≠ 0, MSE × 10³, T = 100, εt: log-normal) were scrambled in extraction and are omitted. Row and column layout as in Table 3.]

(1978) and Schwarz (1978) for the leads-and-lags cointegrating regression. Our results justify using conventional formulas of those model selection criteria for the leads-and-lags cointegrating regression. These model selection criteria allow us to choose the numbers of leads and lags in scientific ways. Simulation results regarding the bias and mean squared error of the long-run coefficient estimates are also reported. The model selection criteria are shown to be successful in reducing bias and mean squared error relative to the conventional, fixed selection rules. Among them, the BIC appears to be most successful in reducing MSE, and Cp in reducing bias. We also observe that the selection rules without the restriction that the numbers of the leads and lags be the same have an advantage over those with it in most cases. The model selection

criteria in this paper were derived for linear regression, but we note that they can also be used for the nonlinear leads-and-lags regression of Saikkonen and Choi (2004). The ultimate purpose of the model selection criteria for the leads-and-lags cointegrating regression is to estimate the long-run slope coefficient efficiently. Though it was shown through simulations that they improve on the fixed selection rules in terms of bias and mean squared error, a better rule that directly minimizes the mean squared error (or another efficiency measure) of the long-run slope coefficient estimate may exist. How such a rule, if it exists, relates to the model selection criteria of this paper is a question worth investigating.


Table 7 MSE of the estimates. Part I: a11 = θ11 = 0 (MSE × 10², T = 300, εt: normal)

[Table 7 entries (Part I above; Part II: a11 ≠ 0 and/or θ11 ≠ 0, MSE × 10², T = 300, εt: normal) were scrambled in extraction and are omitted. Row and column layout as in Table 3.]

Appendix A. Calculation of tr{J(δo)I(δo)⁻¹}

Ignoring a constant, the Kullback–Leibler information measure is written as

−EF(l(δK)) = EF[(n/2) ln(σ²eK) + (y − ZKθK)′(y − ZKθK)/(2σ²eK) | Z∞]
= (n/2) ln(σ²eK) + (τ − ZKθK)′(τ − ZKθK)/(2σ²eK) + nσ²e/(2σ²eK).

This is minimized by θo = (Z′KZK)⁻¹Z′Kτ and σ²o = (τ − ZKθo)′(τ − ZKθo)/n + σ²e.

Since

∂l(δK)/∂δK = [(1/σ²eK)(Z′Ky − Z′KZKθK), −n/(2σ²eK) + (y − ZKθK)′(y − ZKθK)/(2σ⁴eK)]′,

Z′Ky − Z′KZKθo = Z′Ke, and y − ZKθo = MZKτ + e, we may write

E[(∂l(δK)/∂δK)(∂l(δK)/∂δ′K)]|θ=θo = [A11, A12; A′12, A22],
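The closed-form minimizer (the least-squares coefficient θo on τ, with σ²o equal to the average squared fitted deviation plus σ²e) can be sanity-checked numerically: random perturbations of (θo, σ²o) should never lower the expected negative log-likelihood. A small illustrative check with arbitrary simulated ZK and τ (all names ours):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, sigma_e2 = 50, 3, 1.0
Z = rng.standard_normal((n, k))      # stand-in for Z_K
tau = rng.standard_normal(n)         # stand-in for the conditional mean tau

def neg_expected_loglik(theta, s2):
    # (n/2) ln(s2) + (tau - Z theta)'(tau - Z theta)/(2 s2) + n sigma_e^2/(2 s2)
    r = tau - Z @ theta
    return 0.5 * n * np.log(s2) + (r @ r) / (2 * s2) + n * sigma_e2 / (2 * s2)

theta_o = np.linalg.solve(Z.T @ Z, Z.T @ tau)   # (Z'Z)^{-1} Z' tau
r_o = tau - Z @ theta_o
s2_o = (r_o @ r_o) / n + sigma_e2               # claimed variance minimizer

base = neg_expected_loglik(theta_o, s2_o)
for _ in range(200):                             # perturbations never beat the minimizer
    th = theta_o + 0.5 * rng.standard_normal(k)
    s2 = s2_o * np.exp(0.5 * rng.standard_normal())
    assert neg_expected_loglik(th, s2) >= base - 1e-9
```

The check works because, for fixed θ, the criterion is minimized over s2 at s2 = [(τ − Zθ)′(τ − Zθ) + nσ²e]/n, and the resulting profile is increasing in the residual sum of squares, which least squares minimizes.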


Table 8 MSE of the estimates. Part I: a11 = θ11 = 0 (MSE × 10⁵, T = 300, εt: log-normal)

[Table 8 entries (Part I above; beginning of Part II: a11 ≠ 0 and/or θ11 ≠ 0, MSE × 10⁵, T = 300, εt: log-normal) were scrambled in extraction and are omitted. Row and column layout as in Table 3.]

(0, 0) (0.4, 0.4)

σ12 = 0.4 (0, 0) (0.8, 0.8)

σ12 = 0.8 (0, 0) (0.4, 0.4)

(0, 0) (0.8, 0.8)

(0.8, 0.8) (0.4, 0.4)

(0.8, 0.8) (0.4, 0.4)

(Kmax = K4 ) Cp AIC AICc BIC Cp (KL = KU ) AIC (KL = KU ) AICc (KL = KU ) BIC (KL = KU ) KL = KU = Kmax (Kmax = K12 ) Cp AIC AICc BIC Cp (KL = KU ) AIC (KL = KU ) AICc (KL = KU ) BIC (KL = KU ) KL = KU = Kmax

0.0369 0.0356 0.0355 0.0346 0.0368 0.0353 0.0353 0.0347 0.0411

0.0389 0.0375 0.0374 0.0353 0.0385 0.0368 0.0367 0.0350 0.0406

0.0133 0.0128 0.0128 0.0124 0.0133 0.0128 0.0127 0.0125 0.0148

0.0137 0.0132 0.0132 0.0125 0.0136 0.0130 0.0129 0.0124 0.0145

0.0368 0.0356 0.0356 0.0351 0.0369 0.0357 0.0357 0.0354 0.0424

0.0506 0.0479 0.0477 0.0450 0.0512 0.0478 0.0477 0.0448 0.0607

0.0133 0.0129 0.0129 0.0126 0.0133 0.0129 0.0129 0.0127 0.0152

0.0184 0.0174 0.0173 0.0160 0.0185 0.0172 0.0170 0.0159 0.0221

0.0380 0.0364 0.0363 0.0346 0.0376 0.0358 0.0357 0.0345 0.0404

0.0134 0.0129 0.0129 0.0123 0.0134 0.0128 0.0127 0.0123 0.0145

0.0420 0.0358 0.0353 0.0337 0.0424 0.0359 0.0354 0.0343 0.0571

0.0484 0.0423 0.0413 0.0349 0.0476 0.0403 0.0390 0.0348 0.0561

0.0154 0.0131 0.0129 0.0122 0.0156 0.0130 0.0128 0.0124 0.0205

0.0170 0.0148 0.0144 0.0125 0.0170 0.0142 0.0137 0.0124 0.0201

0.0396 0.0353 0.0351 0.0343 0.0415 0.0359 0.0355 0.0350 0.0595

0.0588 0.0485 0.0478 0.0430 0.0642 0.0508 0.0482 0.0434 0.0940

0.0147 0.0128 0.0127 0.0124 0.0153 0.0130 0.0129 0.0127 0.0216

0.0215 0.0176 0.0174 0.0155 0.0232 0.0172 0.0170 0.0158 0.0348

0.0462 0.0392 0.0383 0.0341 0.0455 0.0380 0.0369 0.0344 0.0558

0.0163 0.0139 0.0137 0.0122 0.0164 0.0134 0.0132 0.0123 0.0200

KL = KU = 1 KL = KU = 2 KL = KU = 3

0.0363 0.0375 0.0387

0.0360 0.0371 0.0382

0.0130 0.0134 0.0139

0.0128 0.0132 0.0137

0.0374 0.0386 0.0399

0.0537 0.0547 0.0573

0.0134 0.0137 0.0143

0.0193 0.0196 0.0205

0.0357 0.0368 0.0380

0.0127 0.0131 0.0136

where

\[
A_{11} = \frac{1}{\sigma_o^4} Z_K' e e' Z_K, \qquad
A_{12} = \frac{1}{\sigma_o^2} Z_K' e \left\{ -\frac{n}{2\sigma_o^2} + \frac{1}{2\sigma_o^4}\big(\tau' M_{Z_K}\tau + e'e + 2e' M_{Z_K}\tau\big) \right\},
\]
\[
A_{22} = \left\{ -\frac{n}{2\sigma_o^2} + \frac{1}{2\sigma_o^4}\big(\tau' M_{Z_K}\tau + e'e + 2e' M_{Z_K}\tau\big) \right\}^2.
\]

Thus,

\[
J(\delta_o) = \begin{pmatrix}
\dfrac{\sigma_e^2}{\sigma_o^4} Z_K' Z_K & 0 \\[2ex]
0 & \dfrac{n\sigma_e^4}{2\sigma_o^8} + \dfrac{\sigma_e^2}{\sigma_o^8}\,\tau' M_{Z_K}\tau
\end{pmatrix}. \tag{A.1}
\]

Moreover, using

\[
\frac{\partial^2 l(\delta_K)}{\partial \delta_K \partial \delta_K'}
= \begin{pmatrix}
-\dfrac{1}{\sigma_{eK}^2} Z_K' Z_K & -\dfrac{1}{\sigma_{eK}^4}\big(Z_K' y - Z_K' Z_K \theta_K\big) \\[2ex]
-\dfrac{1}{\sigma_{eK}^4}\big(y' Z_K - \theta_K' Z_K' Z_K\big) & \dfrac{n}{2\sigma_{eK}^4} - \dfrac{1}{\sigma_{eK}^6}(y - Z_K\theta_K)'(y - Z_K\theta_K)
\end{pmatrix},
\]

we find

\[
I(\delta_o) = \begin{pmatrix}
\dfrac{1}{\sigma_o^2} Z_K' Z_K & 0 \\[2ex]
0 & -\dfrac{n}{2\sigma_o^4} + \dfrac{n\sigma_e^2}{\sigma_o^6} + \dfrac{\tau' M_{Z_K}\tau}{\sigma_o^6}
\end{pmatrix}. \tag{A.2}
\]
(A.2)


Table 9. Coverage rate of the 95% confidence interval (T = 100, ε_t: normal). Part I: a11 = θ11 = 0; Part II: a11 ≠ 0 and/or θ11 ≠ 0. Rows: the selection rules Cp, AIC, AICc and BIC, their restricted (KL = KU) versions, and the fixed rules KL = KU = Kmax (for Kmax = K4 and Kmax = K12) and KL = KU = 1, 2, 3. Columns: the parameter settings (a11, a22) and (θ11, θ22) over {0, 0.4, 0.8}, each with σ12 = 0.4 and σ12 = 0.8. [Numerical entries omitted: the extracted layout does not preserve the row–column alignment.]

Using (A.1), (A.2) and the relation \( \tau' M_{Z_K} \tau = n(\sigma_o^2 - \sigma_e^2) \), we obtain

\[
\operatorname{tr}\big\{J(\delta_o) I(\delta_o)^{-1}\big\}
= \frac{\sigma_e^2}{\sigma_o^2} \operatorname{tr}
\begin{pmatrix} I_{p(K_U+K_L+2)+1} & 0 \\[1ex] 0 & \dfrac{2\sigma_o^2 - \sigma_e^2}{\sigma_o^2} \end{pmatrix}
= \frac{\sigma_e^2}{\sigma_o^2} \left\{ p(K_U + K_L + 2) + 1 + \frac{2\sigma_o^2 - \sigma_e^2}{\sigma_o^2} \right\}.
\]

Appendix B. Proofs

The following lemma is required for the derivation of the corrected AIC in Section 3.

Lemma B.1. (i) \( V' M_{Z_K} V / (n - m) = o_p(1) \). (ii) \( V' M_{Z_K} e / (n - m) = o_p(1) \).

Proof. (i) Letting \( V = [V_{K_L+2,K}, \ldots, V_{T-K_U,K}]' \), write \( V' M_{Z_K} V = V'V - V' Z_K (Z_K' Z_K)^{-1} Z_K' V \). Since

\[
\mathrm{E}\big(V_{t,K}^2\big)
= \mathrm{E}\Big(\sum_{j>K_U,\,j<-K_L} \pi_j' v_{t-j}\Big)^2
= \sum_{j>K_U,\,j<-K_L} \mathrm{E}\big(\pi_j' v_{t-j} v_{t-j}' \pi_j\big)
+ \sum_{j>K_U,\,j<-K_L}\ \sum_{\substack{l>K_U,\,l<-K_L \\ l \neq j}} \mathrm{E}\big(\pi_j' v_{t-j} v_{t-l}' \pi_l\big)
\]
\[
\le \sum_{j>K_U,\,j<-K_L} \pi_j' \Omega_{vv} \pi_j
+ \sum_{j>K_U,\,j<-K_L}\ \sum_{\substack{l>K_U,\,l<-K_L \\ l \neq j}} \sqrt{\pi_j' \Omega_{vv} \pi_j\ \pi_l' \Omega_{vv} \pi_l}
\le l_{vv} \sum_{j} \lVert \pi_j \rVert^2 + l_{vv} \sum_{j}\sum_{l \neq j} \lVert \pi_j \rVert \lVert \pi_l \rVert
= l_{vv} \Big(\sum_{j>K_U,\,j<-K_L} \lVert \pi_j \rVert\Big)^2,
\]

where \( l_{vv} \) is the maximum eigenvalue of \( \Omega_{vv} \) and the Cauchy–Schwarz inequality is used for the first inequality. Assumption (9) implies \( \mathrm{E}(V_{t,K}^2) = o(T^{-1}) \). Thus, \( V'V = o_p(1) \).

Next, let \( D_T = \mathrm{diag}\big[n^{-1/2}, n^{-1} I_p, n^{-1/2} I_p, \ldots, n^{-1/2} I_p\big] \) and \( R_1 = \mathrm{diag}\big[1, n^{-2} \sum_{t=K_U+2}^{T-K_L} x_t x_t', \Gamma\big] \), where \( \Gamma = \mathrm{E}\big[(v_{t+K_L}', \ldots, v_{t-K_U}')'(v_{t+K_L}', \ldots, v_{t-K_U}')\big] \). Then, we have the following inequality for the second term of \( V' M_{Z_K} V \):

\[
\big| V' Z_K D_T (D_T Z_K' Z_K D_T)^{-1} D_T Z_K' V \big|
\le \lVert V' Z_K D_T \rVert\, \lVert R_1^{-1} \rVert\, \lVert D_T Z_K' V \rVert
+ \lVert V' Z_K D_T \rVert\, \lVert D_T (Z_K' Z_K)^{-1} D_T - R_1^{-1} \rVert\, \lVert D_T Z_K' V \rVert,
\]

where Lemma B.1 of Saikkonen (1991) is used for the inequality. As shown in Saikkonen (1991),⁹ \( \lVert V' Z_K D_T \rVert = o_p(K^{1/2}) \), \( \lVert R_1^{-1} \rVert = O_p(1) \) and \( \lVert D_T (Z_K' Z_K)^{-1} D_T - R_1^{-1} \rVert = O_p(K/T^{1/2}) \), where \( K/K_L, K/K_U = O(1) \). Therefore, \( V' Z_K (Z_K' Z_K)^{-1} Z_K' V = o_p(K) \), which gives the stated result along with Assumption (8).

(ii) Write \( V' M_{Z_K} e = V'e - V' Z_K (Z_K' Z_K)^{-1} Z_K' e \). Then, \( V'e = o_p(\sqrt{n}) \) because

\[
\mathrm{E}\big|V'e\big| \le \mathrm{E}\big(\lVert V \rVert \lVert e \rVert\big)
\le \sqrt{\mathrm{E}(V'V)\, \mathrm{E}(e'e)} = o(1) \times O(\sqrt{n}) = o(\sqrt{n}).
\]

Since \( \lVert D_T Z_K' e \rVert = O_p(K^{1/2}) \), as shown in Saikkonen (1991), \( V' Z_K (Z_K' Z_K)^{-1} Z_K' e = o_p(K) \). Thus, the stated result follows. □

⁹ Saikkonen (1991) does not consider an intercept term in his linear regression model, but extending his results to the model with an intercept term is straightforward.

Lemma B.2. \( \sigma_o^2 \xrightarrow{p} \sigma_e^2 \).

Proof. Since \( \sigma_o^2 = (\tau - Z_K\theta_o)'(\tau - Z_K\theta_o)/n + \sigma_e^2 \) and \( (\tau - Z_K\theta_o)'(\tau - Z_K\theta_o)/n = V' M_{Z_K} V / n \xrightarrow{p} 0 \) by Lemma B.1(i), we obtain the result. □

Lemma B.3. Assume \( e_t \sim \text{i.i.d. } N(0, \sigma_e^2) \). When T is large, (i) \( (\tau - Z_K\hat\theta_K)'(\tau - Z_K\hat\theta_K)/\hat\sigma_{eK}^2 \) follows \( \frac{nm}{n-m} F(m, n-m) \); (ii) \( n\sigma_e^2/\hat\sigma_{eK}^2 \) follows \( n^2/\chi^2(n-m) \).

Proof. (i) Since \( \tau - Z_K\hat\theta_K = \tau - P_{Z_K} y = M_{Z_K}\tau - P_{Z_K} e \), the second term in relation (10) is written as

\[
\frac{(\tau - Z_K\hat\theta_K)'(\tau - Z_K\hat\theta_K)}{\hat\sigma_{eK}^2}
= \frac{e' P_{Z_K} e}{\hat\sigma_{eK}^2} + \frac{\tau' M_{Z_K} \tau}{\hat\sigma_{eK}^2} \tag{B.3}
\]
\[
= \frac{nm}{n-m} \left( \frac{e' P_{Z_K} e}{\sigma_e^2\, m} \right) \Big/ \left( \frac{n \hat\sigma_{eK}^2}{\sigma_e^2 (n-m)} \right) + \frac{\tau' M_{Z_K} \tau}{\hat\sigma_{eK}^2}, \tag{B.4}
\]

where \( m = p(K_U + K_L + 2) + 1 \). For the first term in relation (B.4), note first that \( e' P_{Z_K} e / \sigma_e^2 \overset{d}{=} \chi^2(m) \) and

\[
\frac{n \hat\sigma_{eK}^2}{\sigma_e^2 (n-m)}
= \frac{e' M_{Z_K} e}{\sigma_e^2 (n-m)} + \frac{V' M_{Z_K} V}{\sigma_e^2 (n-m)} + \frac{2 V' M_{Z_K} e}{\sigma_e^2 (n-m)}
= \frac{e' M_{Z_K} e}{\sigma_e^2 (n-m)} + o_p(1)
\overset{d}{=} \chi^2(n-m)/(n-m) + o_p(1). \tag{B.5}
\]

Note that the second equality of relation (B.5) holds due to Lemma B.1. Since \( e' P_{Z_K} e / \sigma_e^2 \) and \( e' M_{Z_K} e / \sigma_e^2 \) are independent, the first term in (B.4) is approximately distributed as \( \frac{nm}{n-m} F(m, n-m) \). For the second term in (B.4), note that relation (B.3) yields \( \tau' M_{Z_K} \tau = V' M_{Z_K} V \le V'V = o_p(1) \) and that \( \hat\sigma_{eK}^2 = O_p(1) \). Thus, the second term is \( o_p(1) \). This completes the proof.

(ii) Relation (B.5) implies that \( n\sigma_e^2 / \hat\sigma_{eK}^2 \) is distributed as \( n^2/\chi^2(n-m) \). □
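Lemma B.3 underlies the conventional criterion formulas that the paper justifies for the leads-and-lags regression. As an illustrative sketch of the selection step (the toy DGP, variable names and trimming scheme below are our own choices, not the paper's simulation design), each candidate pair (K_L, K_U) is fitted on a common sample, the conventional Cp, AIC, AICc and BIC are evaluated with m = p(K_U + K_L + 2) + 1 fitted parameters, and the minimizing pair is retained:

```python
import math
import random

def ols_ssr(y, X):
    """Sum of squared residuals from regressing y on the columns of X,
    solved via the normal equations with Gauss-Jordan elimination."""
    k = len(X[0])
    a = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [u - f * w for u, w in zip(a[r], a[c])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return sum((yi - sum(b * xij for b, xij in zip(beta, r))) ** 2
               for yi, r in zip(y, X))

def design(y, x, dx, KL, KU, trim):
    """Leads-and-lags design on the common sample t = trim..T-trim-1:
    an intercept, x_t, and dx_{t-j} for j = -KL, ..., KU."""
    T = len(y)
    idx = range(trim, T - trim)
    X = [[1.0, x[t]] + [dx[t - j] for j in range(-KL, KU + 1)] for t in idx]
    return [y[t] for t in idx], X

random.seed(42)
T, Kmax = 120, 3
dx = [random.gauss(0.0, 1.0) for _ in range(T)]          # Delta x_t
x = [sum(dx[:t + 1]) for t in range(T)]                  # x_t is I(1)
y = [1.0 + 2.0 * x[t] + random.gauss(0.0, 1.0) for t in range(T)]

# sigma-tilde^2 for Mallows' Cp comes from the largest candidate model.
ys, Xs = design(y, x, dx, Kmax, Kmax, Kmax)
s2_full = ols_ssr(ys, Xs) / (len(ys) - len(Xs[0]))

crit = {}
for KL in range(Kmax + 1):
    for KU in range(Kmax + 1):
        ys, Xs = design(y, x, dx, KL, KU, Kmax)
        n, m = len(ys), len(Xs[0])     # m = p(KU + KL + 2) + 1 with p = 1
        ssr = ols_ssr(ys, Xs)
        s2 = ssr / n
        crit[(KL, KU)] = {
            "Cp": ssr / s2_full - n + 2 * m,
            "AIC": n * math.log(s2) + 2 * m,
            "AICc": n * math.log(s2) + n * (n + m) / (n - m - 2),
            "BIC": n * math.log(s2) + m * math.log(n),
        }

best = {name: min(crit, key=lambda pair: crit[pair][name])
        for name in ("Cp", "AIC", "AICc", "BIC")}
print(best)
```

With p = 1 the parameter count is m = K_L + K_U + 3; fitting every candidate on the common sample t = Kmax, ..., T − Kmax − 1 keeps the criteria comparable across pairs, and dropping the restriction K_L = K_U simply enlarges the grid searched above.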



References

Akaike, H., 1973. Information theory and an extension of the maximum likelihood principle. In: Petrov, B.N., Csáki, F. (Eds.), 2nd International Symposium on Information Theory. Akadémiai Kiadó, Budapest, pp. 267–281.
Akaike, H., 1978. A Bayesian analysis of the minimum AIC procedure. Annals of the Institute of Statistical Mathematics 30, 9–14.
Ball, L., 2001. Another look at long-run money demand. Journal of Monetary Economics 47, 31–44.
Bentzen, J., 2004. Estimating the rebound effect in US manufacturing energy consumption. Energy Economics 26, 123–134.
Berk, K.N., 1974. Consistent autoregressive spectral estimates. The Annals of Statistics 2, 489–502.
Burnham, K.P., Anderson, D.R., 2002. Model Selection and Multi-Model Inference: A Practical Information-Theoretic Approach, second ed. Springer-Verlag.
Caballero, R.J., 1994. Small sample bias and adjustment costs. The Review of Economics and Statistics 76, 52–58.
Hai, W., Mark, N.C., Wu, Y., 1997. Understanding spot and forward exchange rate regressions. Journal of Applied Econometrics 12, 715–734.
Hargreaves, C.P., 1994. A review of methods of estimating cointegrating relationships. In: Hargreaves, C.P. (Ed.), Nonstationary Time Series Analysis and Cointegration. Oxford University Press, pp. 87–132.
Hurvich, C.M., Tsai, C., 1989. Regression and time series model selection in small samples. Biometrika 76, 297–307.
Hurvich, C.M., Tsai, C., 1991. Bias of the corrected AIC criterion for underfitted regression and time series models. Biometrika 78, 499–509.
Hussein, K.A., 1998. International capital mobility in OECD countries: the Feldstein–Horioka ‘puzzle’ revisited. Economics Letters 59, 237–242.
Johansen, S., 1988. Statistical analysis of cointegrating vectors. Journal of Economic Dynamics and Control 12, 231–254.
Kejriwal, M., Perron, P., 2008. Data dependent rule for selection of the number of leads and lags in the dynamic OLS cointegrating regression. Econometric Theory 24, 1425–1441.
Lewis, R., Reinsel, G.C., 1985. Prediction of multivariate time series by autoregressive model fitting. Journal of Multivariate Analysis 16, 393–411.
Mallows, C.L., 1973. Some comments on Cp. Technometrics 15, 661–675.


Masih, R., Masih, A.M.M., 1996. Stock–Watson dynamic OLS (DOLS) and error-correction modelling approaches to estimating long- and short-run elasticities in a demand function: new evidence and methodological implications from an application to the demand for coal in mainland China. Energy Economics 18, 315–334.
Panopoulou, E., Pittis, N., 2004. A comparison of autoregressive distributed lag and dynamic OLS cointegration estimators in the case of a serially correlated cointegration error. Econometrics Journal 7, 545–617.
Park, J.Y., 1992. Canonical cointegrating regressions. Econometrica 60, 119–143.
Pesaran, H.M., Shin, Y., 1999. An autoregressive distributed lag modelling approach to cointegration analysis. In: Strom, S. (Ed.), Econometrics and Economic Theory in the Twentieth Century: The Ragnar Frisch Centennial Symposium. Cambridge University Press, Cambridge, UK, pp. 371–413.
Phillips, P.C.B., Durlauf, S.N., 1986. Multiple time series regression with integrated processes. The Review of Economic Studies 53, 473–495.
Phillips, P.C.B., Hansen, B.E., 1990. Statistical inference in instrumental variables regression with I(1) processes. The Review of Economic Studies 57, 99–125.
Phillips, P.C.B., Loretan, M., 1991. Estimating long-run economic equilibria. The Review of Economic Studies 58, 407–436.
Phillips, P.C.B., Ploberger, W., 1994. Posterior odds testing for a unit root with data-based model selection. Econometric Theory 10, 774–808.
Phillips, P.C.B., Ploberger, W., 1996. An asymptotic theory of Bayesian inference for time series. Econometrica 64, 381–412.
Rao, C.R., Wu, Y., 2001. On model selection. In: Lahiri, P. (Ed.), Model Selection. Institute of Mathematical Statistics, Beachwood, Ohio, pp. 1–57.
Saikkonen, P., 1991. Asymptotically efficient estimation of cointegration regressions. Econometric Theory 7, 1–21.
Saikkonen, P., Choi, I., 2004. Cointegrating smooth transition regressions. Econometric Theory 20, 301–340.
Schwarz, G., 1978. Estimating the dimension of a model. The Annals of Statistics 6, 461–464.
Shibata, R., 1980. Asymptotically efficient selection of the order of the model for estimating parameters of a linear process. The Annals of Statistics 8, 147–164.
Shibata, R., 1981. An optimal selection of regression variables. Biometrika 68, 45–54.
Stock, J.H., Watson, M.W., 1993. A simple estimator of cointegrating vectors in higher order integrated systems. Econometrica 61, 783–820.
Takeuchi, K., 1976. Distribution of informational statistics and a criterion of model fitting. Suri-Kagaku (Mathematical Sciences) 153, 12–18.
Weber, C.E., 1995. Cyclical output, cyclical unemployment, and Okun's coefficient: a new approach. Journal of Applied Econometrics 10, 433–445.
Wu, J., Fountas, S., Chen, S., 1996. Testing for the sustainability of the current account deficit in two industrial countries. Economics Letters 52, 193–198.
Zivot, E., 2000. Cointegration and forward and spot exchange rate regressions. Journal of International Money and Finance 19, 785–812.