Economics Letters 87 (2005) 301 – 306 www.elsevier.com/locate/econbase
Spurious nonlinear regressions in econometrics
Young-Sook Lee (a), Tae-Hwan Kim (b,c,*), Paul Newbold (c)
(a) Department of Economics and Finance, Business School, University of Durham, UK
(b) Department of Economics, Yonsei University, 134 Shinchon-dong, Seodaemun-gu, 120-749 Seoul, South Korea
(c) School of Economics, University of Nottingham, UK
Received 4 May 2004; accepted 14 October 2004 Available online 15 April 2005
Abstract
In this paper we consider the situation where two independent random walks are used in various frequently employed nonlinear test and estimation procedures. We show analytically and by simulation that all of the procedures considered wrongly indicate that (i) the two independent random walks have a significant nonlinear relationship, and (ii) the spurious nonlinear relationship becomes stronger as the sample size approaches infinity.
© 2005 Elsevier B.V. All rights reserved.
Keywords: Spurious nonlinearity; Random walk; Nonlinear tests
JEL classification: C12; C13; C22
* Corresponding author. Mailing address: 134 Shinchon-dong, Seodaemun-gu, Seoul 120-749, South Korea. Tel.: +82 2 2123 5461; fax: +82 2 313 5331. E-mail address: [email protected] (T.-H. Kim).
0165-1765/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.econlet.2004.10.016

1. Introduction
The seminal study of Granger and Newbold (1974) showed that when two independent random walks are used in a linear regression, one tends to find a significant relationship between the variables. This phenomenon is termed "spurious regression". Granger and Newbold discovered this phenomenon by undertaking a Monte Carlo experiment, and Phillips (1986) later provided a rigorous and elegant proof. The results in the two papers served as a springboard for a long series of subsequent investigations of the phenomenon for different types of regression and different types of data generating process. While Granger and Newbold (1974) and Phillips (1986) used driftless random walks, Entorf (1997) analyzed two independent random walks with non-zero drifts. It is now well understood that the spurious regression phenomenon also occurs for a wider class of time series: I(2) processes (Haldrup, 1994), nonstationary fractionally integrated processes (Marmol, 1998) and stochastic unit root processes (Granger and Swanson, 1997).

All these studies, however, rely on linear regression and therefore demonstrate the existence of "spurious linear regression". To the best of our knowledge, no previous paper investigates the possibility of a spurious nonlinear relationship between two independent variables. This paper investigates the finite sample and large sample behavior of some popular nonlinear tests when they are employed to test whether a nonlinear relationship exists between two independent random walks. Our investigation concerns the case where there is no nonlinear cointegration. If there is such a relationship between two integrated series, then estimation and inference can be carried out using the methodology developed by Park and Phillips (2001).
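The linear spurious regression phenomenon described above can be illustrated with a small simulation. The following is only a minimal sketch, not the design used by Granger and Newbold (1974): the function names, the Gaussian innovations, the number of replications and the two-sided 1.96 cutoff are all assumptions made for illustration.

```python
import numpy as np


def slope_t_stat(y, x):
    """Conventional OLS t-statistic for b in y_t = a + b * x_t + u_t."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - 2)          # usual error variance estimate
    cov = s2 * np.linalg.inv(X.T @ X)          # usual OLS covariance matrix
    return beta[1] / np.sqrt(cov[1, 1])


def spurious_rejection_rate(T, reps=500, seed=0):
    """Share of replications in which |t| > 1.96 for two independent random walks."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        # Independent driftless random walks (Gaussian steps are an assumption).
        y = np.cumsum(rng.standard_normal(T))
        x = np.cumsum(rng.standard_normal(T))
        if abs(slope_t_stat(y, x)) > 1.96:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    for T in (50, 200):
        print(T, spurious_rejection_rate(T))
```

If the conventional t-test were valid here, the printed proportions would be close to 0.05; in this setting they should instead be far larger.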
2. Nonlinear tests
We consider two independent random walks $y_t$ and $x_t$ generated from the data generating process (DGP):

$$y_t = y_{t-1} + \varepsilon_{yt}, \qquad x_t = x_{t-1} + \varepsilon_{xt}, \tag{1}$$

where $t = 1, 2, \ldots, T$ and the error terms $\varepsilon_{yt}$ and $\varepsilon_{xt}$ are assumed to satisfy the following conditions: (i) $\varepsilon_{yt}$ is i.i.d. $(0, \sigma_y^2)$, (ii) $\varepsilon_{xt}$ is i.i.d. $(0, \sigma_x^2)$ and (iii) $\varepsilon_{yt}$ and $\varepsilon_{xt}$ are independent.

To investigate a possible nonlinear relationship between $y_t$ and $x_t$, a researcher might conduct various nonlinear tests. Most nonlinear tests are based on the model

$$y_t = \tilde{x}_t'\theta + g(x_t, \alpha, \beta) + \varepsilon_t,$$

where $\tilde{x}_t = (1, x_t)'$ and $\theta = (\theta_0, \theta_1)'$. The function $g(x_t, \alpha, \beta)$ captures a possible nonlinear contribution of $x_t$, and $\alpha$, $\beta$ are parameters governing the nonlinear behavior of that function. The nonlinear function $g$ is assumed to satisfy $g(x_t, \alpha, \beta = 0) = 0$, so that the natural null hypothesis is $H_0: \beta = 0$.

We include in our study five popular nonlinear tests (RESET test, McLeod and Li test, Keenan test, neural network test, White's information matrix test) and one newly developed nonlinear test by Hamilton (2000). All six tests are based, as a preliminary step, on the OLS regression of $y_t$ on $\tilde{x}_t$. For our subsequent discussion, we define $\hat{\theta}$ to be the OLS estimator from that regression, $\hat{e}_t$ to be the residuals ($\hat{e}_t = y_t - \tilde{x}_t'\hat{\theta}$) and $f_t$ to be the fitted value ($f_t = \tilde{x}_t'\hat{\theta}$). In what follows, we briefly discuss some nonlinear
tests. For the tests not described here, readers are referred to Lee et al. (1993) for a detailed discussion.

The RESET test exploits the idea that if there is no nonlinearity, then no nonlinear transformation of $f_t$ should be useful in explaining $y_t$. The test is based on the following auxiliary regression:

$$\hat{e}_t = \tilde{x}_t'\tilde{\theta} + \sum_{k=1}^{r} \hat{\alpha}_k f_t^{k+1} + \tilde{v}_t \tag{2}$$

for some $r \ge 1$. The null hypothesis is $H_0: \alpha_1 = \ldots = \alpha_r = 0$. The RESET statistic is given by $\text{RESET} = TR^2$, where $R^2$ is the uncentered $R^2$ from Eq. (2). It is shown that $\text{RESET} \xrightarrow{d} \chi^2(r)$. When $r$ is large, the terms $f_t^{k+1}$ can be highly correlated. To avoid the multicollinearity problem, the $r^*$ ($< r$) largest principal components are used.

The Keenan test is based on Tukey's nonadditivity test using $f_t^2$, as in the RESET test. The procedure to obtain the statistic is as follows: (i) regress $y_t$ on $\tilde{x}_t$ to obtain $f_t$ and $\hat{e}_t$, (ii) regress $f_t^2$ on $\tilde{x}_t$ and calculate the residuals, denoted by $\hat{\xi}_t$, and (iii) regress $\hat{e}_t$ on $\hat{\xi}_t$ and obtain the residuals $\hat{v}_t$. The Keenan test asks whether the squared fitted value $f_t^2$ has any additional forecasting ability for $y_t$. The test statistic is

$$\text{Keenan} = (T-4)\,\hat{e}'\hat{\xi}\,(\hat{\xi}'\hat{\xi})^{-1}\,\hat{\xi}'\hat{e} \,/\, \hat{v}'\hat{v} \xrightarrow{d} \chi^2(1),$$

where $\hat{e}$, $\hat{\xi}$, $\hat{v}$ are $T \times 1$ vectors of $\hat{e}_t$, $\hat{\xi}_t$, $\hat{v}_t$, respectively.

The flexible nonlinear test in Hamilton (2000) is based on $y_t = \tilde{x}_t'\theta + \lambda m(g x_t) + \varepsilon_t$, where $m(\cdot)$ is an unknown nonlinear function. Naturally, the null hypothesis is $H_0: \lambda = 0$ and its test statistic is given by

$$\text{Hamilton} = \frac{\left[\hat{e}'H\hat{e} - \hat{\sigma}^2\,\mathrm{tr}(MHM)\right]^2}{\hat{\sigma}^4\, 2\,\mathrm{tr}\left\{\left[MHM - (T-k-1)^{-1} M\,\mathrm{tr}(MHM)\right]^2\right\}} \xrightarrow{d} \chi^2(1),$$

where $M = I_T - X(X'X)^{-1}X'$, $X$ is the $T \times 2$ matrix made of $\tilde{x}_t$, and $H$ is a $T \times T$ matrix whose $(i, j)$th element $H_{ij}$ is given by

$$H_{ij} = \begin{cases} 1 - \dfrac{h}{2}\,|x_i - x_j| & \text{if } \dfrac{h}{2}\,|x_i - x_j| \le 1, \\ 0 & \text{otherwise,} \end{cases}$$

with $h = 2\left[T^{-1}\sum_{t=1}^{T}(x_t - \bar{x})^2\right]^{-1/2}$, where $\bar{x} = T^{-1}\sum_{t=1}^{T} x_t$. The Hamilton test is different from the previous five tests in that the user does not need to specify the nonlinear function $m(\cdot)$, and this is why the method is called "flexible".
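The RESET and Keenan procedures described above can be sketched in a few lines of NumPy. This is an illustrative sketch only: it covers the RESET test with r = 1 (no principal components step), the function names are hypothetical, and the random walks in the usage example use Gaussian steps as an assumption.

```python
import numpy as np


def ols_resid(v, X):
    """Residuals from the OLS regression of v on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, v, rcond=None)
    return v - X @ beta


def reset_and_keenan(y, x):
    """Return (RESET with r = 1, Keenan) for the regression of y on (1, x)."""
    T = len(y)
    X = np.column_stack([np.ones(T), x])        # rows are x_tilde_t = (1, x_t)'
    e_hat = ols_resid(y, X)                     # step (i): residuals
    f = y - e_hat                               # step (i): fitted values f_t

    # RESET, r = 1: regress e_hat on (x_tilde_t, f_t^2), use the uncentered R^2.
    Z = np.column_stack([X, f ** 2])
    v_tilde = ols_resid(e_hat, Z)
    r2_uncentered = 1.0 - (v_tilde @ v_tilde) / (e_hat @ e_hat)
    reset = T * r2_uncentered

    # Keenan: step (ii) residualize f_t^2 on x_tilde_t, step (iii) regress e_hat on it.
    xi_hat = ols_resid(f ** 2, X)
    explained = (e_hat @ xi_hat) ** 2 / (xi_hat @ xi_hat)   # e'xi (xi'xi)^{-1} xi'e
    v_hat_ss = e_hat @ e_hat - explained                    # v_hat' v_hat
    keenan = (T - 4) * explained / v_hat_ss
    return reset, keenan


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.cumsum(rng.standard_normal(200))     # two independent random walks
    x = np.cumsum(rng.standard_normal(200))
    print(reset_and_keenan(y, x))
```

Under the null of linearity, both statistics would be compared with the 5% critical value of the chi-square distribution with one degree of freedom, approximately 3.84. Because $\hat{e}$ is orthogonal to $\tilde{x}_t$ by construction, the two statistics in this sketch are linked by Keenan $= (T-4)R^2/(1-R^2)$, with $R^2$ the uncentered $R^2$ of the RESET regression.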
3. Spurious nonlinearity
We first present some simulation evidence for spurious nonlinearity. Two independent random walks $y_t$ and $x_t$ are generated through Eq. (1) and then the six tests in Section 2 are employed to test whether $y_t$ is nonlinearly related to $x_t$ for various sample sizes T = 50, 100, 500, 1000 and 10,000. In our simulation, we use r = 5 (r* = 2) for the RESET test, m = 20 for the McLeod and Li test and q = 10 (q* = 2)¹ for the neural network test. The simulation results are reported in Table 1.

Table 1
Proportion of rejections of nonlinear tests at nominal 0.05 level

T        RESET  McLeod  Keenan  Neural  White1  White2  White3  Hamilton
50       0.40   0.67    0.38    0.38    1.00    1.00    1.00    0.51
100      0.63   0.99    0.55    0.56    1.00    1.00    1.00    0.73
500      0.89   1.00    0.80    0.80    1.00    1.00    1.00    0.98
1000     0.96   1.00    0.85    0.85    1.00    1.00    1.00    1.00
10,000   1.00   1.00    0.95    0.96    1.00    1.00    1.00    1.00

¹ In the McLeod and Li test, m is the number of autocorrelation terms included in the test statistic. For the neural network test, q is the number of nonlinear terms in the auxiliary regression and q* is the number of largest principal components used to avoid the multicollinearity problem when q is large.

It is striking to note that even with T as low as 50, the rejection rates of all six tests are considerably larger than the nominal level of 5%. All White tests reject the null of linearity in favor of nonlinearity almost 100% of the time, displaying the most severe spurious nonlinearity. They are followed by the McLeod (67%) and Hamilton (51%) tests. The RESET (40%), Keenan (38%) and Neural (38%) tests are the most resistant among the six tests considered. As the sample size increases, the rejection rates for the White, McLeod and Hamilton tests quickly converge to one, while the rate of convergence is much slower for the other tests. For example, for the Keenan and Neural tests, rejection rates are still only 95% and 96% when T is 10,000. We have conducted an additional simulation for these tests and confirmed that the rejection rate does converge to one eventually. The simulation results suggest that none of the six test statistics converges in distribution to the intended chi-square distributions shown in Section 2, and in fact they tend to diverge to infinity.

Given this strong simulation evidence, we now provide a theoretical explanation for this spurious nonlinear regression phenomenon. In particular, we investigate the asymptotic behavior of the RESET and Keenan tests, since these two are not only among the nonlinear tests most frequently used by applied economists, but they also render the theoretical analysis tractable. Theorem 1 below shows the limiting distributions of the RESET and Keenan tests when scaled by $T^{-1}$.

Theorem 1. Suppose $y_t$ and $x_t$ are generated by Eq. (1) and assumption 1 holds. When r = 1 for the RESET test, we have

$$T^{-1}\,\text{RESET} \Rightarrow \frac{\Psi_{ef}^2}{\Psi_{ff}\Psi_{ee}}, \qquad T^{-1}\,\text{Keenan} \Rightarrow \frac{\Psi_{ef}^2}{\Psi_{ff}\Psi_{ee} - \Psi_{ef}^2},$$
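where

$$\Psi_{ef} = \sigma_y^3\zeta(2\omega - \pi)\left\{\int_0^1 W(r)V(r)\,dr - \int_0^1 W(r)\,dr\int_0^1 V(r)\,dr\right\} + \sigma_y^3\zeta^2\left\{\int_0^1 W(r)^2V(r)\,dr - \int_0^1 W(r)^2\,dr\int_0^1 V(r)\,dr\right\} - \sigma_y^3\zeta^3\left\{\int_0^1 W(r)^3\,dr - \int_0^1 W(r)\,dr\int_0^1 W(r)^2\,dr\right\} - \sigma_y^3\zeta^2(2\omega - \pi)\left[\int_0^1 W(r)^2\,dr - \left\{\int_0^1 W(r)\,dr\right\}^2\right],$$

$$\Psi_{ff} = \sigma_y^4\zeta^2(2\omega - \pi)^2\left[\int_0^1 W(r)^2\,dr - \left\{\int_0^1 W(r)\,dr\right\}^2\right] + 2\sigma_y^4\zeta^3(2\omega - \pi)\left\{\int_0^1 W(r)^3\,dr - \int_0^1 W(r)\,dr\int_0^1 W(r)^2\,dr\right\} + \sigma_y^4\zeta^4\left[\int_0^1 W(r)^4\,dr - \left\{\int_0^1 W(r)^2\,dr\right\}^2\right],$$

$$\Psi_{ee} = \sigma_y^2\left[\int_0^1 V(r)^2\,dr - \left\{\int_0^1 V(r)\,dr\right\}^2\right] - \sigma_y^2\zeta^2\left[\int_0^1 W(r)^2\,dr - \left\{\int_0^1 W(r)\,dr\right\}^2\right],$$

$$\zeta = \frac{\int_0^1 W(r)V(r)\,dr - \int_0^1 W(r)\,dr\int_0^1 V(r)\,dr}{\int_0^1 W(r)^2\,dr - \left\{\int_0^1 W(r)\,dr\right\}^2},$$

$$\omega = \int_0^1 V(r)\,dr - \zeta\int_0^1 W(r)\,dr,$$

$$\pi = 2\omega + \zeta\,\frac{\int_0^1 W(r)^3\,dr - \int_0^1 W(r)\,dr\int_0^1 W(r)^2\,dr}{\int_0^1 W(r)^2\,dr - \left\{\int_0^1 W(r)\,dr\right\}^2},$$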
and $W(r)$ and $V(r)$ are independent Wiener processes on $C[0, 1]$.

The proof of Theorem 1 is straightforward but tedious; it is a direct application of standard unit root asymptotics. A detailed proof of this result is provided in Lee et al. (2004).²

² The discussion paper can be downloaded from: http://www.swan.ac.uk/economics/dpapers/2004/0404.pdf.

The limiting distributions are not chi-square distributions. Because $T^{-1}$RESET and $T^{-1}$Keenan converge to nondegenerate limits, the unscaled statistics grow at rate T. This confirms our observation from the simulation that the statistics diverge to infinity, rejecting the null of linearity in favor of spurious nonlinearity. A large size distortion of the RESET test has been reported in the literature, by means of Monte Carlo simulation, when $y_t$ and $x_t$ are stationary but highly autocorrelated (see Porter and Kashyap, 1984; Leung and Yu, 2001). Hence, our analytic and Monte Carlo results on the RESET test can be regarded as an extension of that earlier work to the extreme case of unit-root persistence.
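The divergence result can be checked informally with a small Monte Carlo experiment. The sketch below is an illustration under stated assumptions and is not the simulation design behind Table 1: it uses the RESET test with r = 1 (Table 1 uses r = 5 with r* = 2), Gaussian innovations, a modest number of replications, and hypothetical function names. It reports the rejection frequency at the nominal 5% level together with the median of $T^{-1}$RESET, which Theorem 1 suggests should not shrink toward zero as T grows.

```python
import numpy as np

CHI2_1_CRIT_5PCT = 3.841   # 5% critical value of the chi-square(1) distribution


def residualize(v, X):
    """Residuals from the OLS regression of v on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, v, rcond=None)
    return v - X @ beta


def reset_r1(y, x):
    """RESET statistic T * R^2 with r = 1, using the uncentered R^2."""
    T = len(y)
    X = np.column_stack([np.ones(T), x])
    e_hat = residualize(y, X)            # residuals from y on (1, x)
    f = y - e_hat                        # fitted values
    xi_hat = residualize(f ** 2, X)      # f^2 with (1, x) partialled out
    # e_hat is orthogonal to (1, x), so the uncentered R^2 of the auxiliary
    # regression equals the squared uncentered correlation of e_hat and xi_hat.
    r2 = (e_hat @ xi_hat) ** 2 / ((xi_hat @ xi_hat) * (e_hat @ e_hat))
    return T * r2


def simulate(T, reps=500, seed=0):
    """Return (rejection rate at 5%, median of RESET / T) over reps replications."""
    rng = np.random.default_rng(seed)
    stats = np.empty(reps)
    for i in range(reps):
        y = np.cumsum(rng.standard_normal(T))   # independent random walks, Eq. (1)
        x = np.cumsum(rng.standard_normal(T))
        stats[i] = reset_r1(y, x)
    return float(np.mean(stats > CHI2_1_CRIT_5PCT)), float(np.median(stats / T))


if __name__ == "__main__":
    for T in (50, 100, 500):
        rate, scaled_median = simulate(T)
        print(T, rate, scaled_median)
```

The printed rejection rates are not expected to match the RESET column of Table 1 exactly, since the test configuration differs, but they should rise with T in the same way.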
4. Summary
While most papers in the spurious regression literature focus exclusively on the linear regression framework, in this paper we present both small sample and large sample evidence showing that "spurious nonlinear regressions" can occur when two independent random walks are used in nonlinear estimation and testing procedures. Thus, this paper shows that the spurious regression phenomenon can occur in practice more frequently and more widely than previously perceived. We conclude by emphasizing that, when interpreting test results in favor of nonlinearity, applied economists should weigh the evidence against the persistence structure of the time series under investigation.
References
Entorf, H., 1997. Random walks with drifts: nonsense regression and spurious fixed-effect estimation. Journal of Econometrics 80, 287–296.
Granger, C.W.J., Newbold, P., 1974. Spurious regressions in econometrics. Journal of Econometrics 2, 111–120.
Granger, C.W.J., Swanson, N., 1997. An introduction to stochastic unit-root processes. Journal of Econometrics 80, 35–62.
Haldrup, N., 1994. The asymptotics of single-equation cointegration regressions with I(1) and I(2) variables. Journal of Econometrics 63, 153–181.
Hamilton, J.D., 2000. A parametric approach to flexible nonlinear inference. Econometrica 69, 801–812.
Lee, T.-H., White, H., Granger, C.W.J., 1993. Testing for neglected nonlinearity in time series models. Journal of Econometrics 56, 269–290.
Lee, Y.-S., Kim, T.-H., Newbold, P., 2004. Spurious nonlinear regressions in econometrics. Discussion paper, University of Wales Swansea.
Leung, S.F., Yu, S., 2001. The sensitivity of the RESET tests to disturbance autocorrelation in regression analysis. Empirical Economics 26, 721–726.
Marmol, F., 1998. Spurious regression theory with nonstationary fractionally integrated processes. Journal of Econometrics 84, 233–250.
Park, J.Y., Phillips, P.C.B., 2001. Nonlinear regressions with integrated time series. Econometrica 69, 117–161.
Phillips, P.C.B., 1986. Understanding spurious regressions in econometrics. Journal of Econometrics 33, 311–340.
Porter, R.D., Kashyap, A.K., 1984. Autocorrelation and the sensitivity of RESET. Economics Letters 14, 229–233.