Economics Letters 29 (1989) 225-230 North-Holland
A NOTE ON THE DISTRIBUTION OF THE LEAST SQUARES ESTIMATOR OF A RANDOM WALK WITH DRIFT *
Svend HYLLEBERG
University of Aarhus, DK-8000 Aarhus, Denmark
Grayham E. MIZON
University of Southampton, Southampton SO9 5NH, UK
Received 29 August 1988 Accepted 10 November 1988
It is shown that the application of the result that the Dickey-Fuller ‘t’ obtained from a regression with an intercept is asymptotically normal if the DGP is a random walk with drift may be of little use in small samples unless the drift is enormous. In fact the Dickey-Fuller distribution may give a better approximation in many cases.
1. Introduction

One of the more popular tests for the existence of a unit root in a time series is due to Dickey and Fuller; see Fuller (1976). To test the hypothesis $H_0: \rho = 1$ against $H_1: \rho < 1$ in the model $x_t = \rho x_{t-1} + \epsilon_t$, $\epsilon_t \sim \mathrm{nid}(0, \sigma_\epsilon^2)$, they suggest the use of the ‘t’ statistic on $\pi$ in the auxiliary regression $\Delta x_t = \pi x_{t-1} + \epsilon_t$, with rejection of the hypothesis of a unit root $\rho = 1$ (i.e., $\pi = \rho - 1 = 0$) if the ‘t’ value is sufficiently negative. As the statistic has a non-standard distribution under the null, Dickey and Fuller compute and tabulate the critical values using Monte Carlo simulations. They also tabulate the distribution in the case where the auxiliary regression contains an intercept but the null is as above.

In the latter case it is correctly argued in West (n.d.) and Dolado and Wilkinson (1988) that if the data generating process contains an intercept, i.e., has the form $x_t = \rho x_{t-1} + \mu + \epsilon_t$, $\epsilon_t \sim \mathrm{nid}(0, \sigma_\epsilon^2)$, $\rho = 1$ and $\mu \neq 0$, the ‘t’ statistic in the auxiliary regression $\Delta x_t = \pi x_{t-1} + m + e_t$ is asymptotically normal. See also Nankervis and Savin (1987). Consequently, it is recommended that the standard normal tables be used instead of the Dickey-Fuller tables, as use of the latter will result in too few rejections of the unit root hypothesis asymptotically.

However, we will argue that although the asymptotic results are correct, the use of the standard normal critical values in small samples may lead to rejection in too many cases, as the small sample distribution of the Dickey-Fuller ‘t’ is far from the standard normal in cases where one would consider testing for unit roots. In fact one may be better off using the Dickey-Fuller tables.

* The Monte Carlo simulations were carried out in Gauss, while the regressions were run in PC-Give. The plots were made in ISISMA.
Grants from the Research Foundation of the University of Aarhus and from the Danish Social Science Research Council are gratefully acknowledged, together with help and assistance from Kirsten Stentoft and Lars Stampe Villadsen.

0165-1765/89/$3.50 © 1989, Elsevier Science Publishers B.V. (North-Holland)
S. Hylleberg, G.E. Mizon / Distribution of least squares estimator
2. The small sample distribution

Consider the DGP for $t = 1, 2, \ldots, T$,

$x_t = x_{t-1} + \mu + \epsilon_t$, $\epsilon_t \sim \mathrm{nid}(0, \sigma_\epsilon^2)$,    (1)

which implies

$x_t = \mu t + \sum_{i=1}^{t} \epsilon_i$,

where $x_0 = 0$ for simplicity. The series $x_t$ then consists of a deterministic trend component $\mu t$ and a stochastic trend component $\sum_{i=1}^{t} \epsilon_i$. A series generated by (1) is called integrated of order 1, $I(1)$, by Engle and Granger (1987), or more precisely integrated of order 1 at the zero frequency, $I_0(1)$ (see Hylleberg, Engle, Granger and Yoo (1988)), as the series becomes stationary by one pass through the zero frequency filter $1 - B$.
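The decomposition of the random walk with drift into a deterministic and a stochastic trend can be checked numerically. The following is a minimal sketch, not the authors' Gauss code: the function name `simulate_random_walk_with_drift`, the seed, and the chosen values of $T$ and $\mu$ are illustrative assumptions.

```python
import numpy as np


def simulate_random_walk_with_drift(T, mu, sigma=1.0, seed=0):
    """Generate x_1..x_T from x_t = x_{t-1} + mu + eps_t with x_0 = 0.

    Hypothetical helper for illustration; returns the series and its shocks.
    """
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=T)
    x = np.empty(T)
    previous = 0.0                      # x_0 = 0, as assumed in the text
    for t in range(T):
        previous = previous + mu + eps[t]
        x[t] = previous
    return x, eps


T, mu = 48, 0.25                        # illustrative values
x, eps = simulate_random_walk_with_drift(T, mu)

t = np.arange(1, T + 1)
deterministic_trend = mu * t            # mu * t
stochastic_trend = np.cumsum(eps)       # sum of the shocks up to time t

# The recursion and the closed form agree up to rounding error.
print(np.allclose(x, deterministic_trend + stochastic_trend))
```

Building the series by recursion and then comparing it with the closed form keeps the two representations independent, so the final check is not circular.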
Table 1
Small sample distributions of Dickey and Fuller's ‘t’ based on the auxiliary regression $\Delta x_t = \pi x_{t-1} + m + e_t$. Data generating process $x_t = x_{t-1} + \mu + \epsilon_t$, $\epsilon_t \sim \mathrm{nid}(0, 1)$, $t = 1, 2, \ldots, T$. 18000 Monte Carlo replications.

                                        Fractiles
μ                               T       0.01     0.05     0.10     0.50     0.90     0.95     0.99
0.000 (Dickey-Fuller dist.)     48     -3.34    -2.81    -2.52    -1.52    -0.36    -0.02     0.75
                               100     -3.39    -2.85    -2.56    -1.56    -0.41    -0.04     0.62
                               200     -3.41    -2.91    -2.59    -1.57    -0.44    -0.08     0.59
                               400     -3.50    -2.88    -2.58    -1.58    -0.42    -0.06     0.63
0.001                           48     -3.32    -2.74    -2.44    -1.52    -0.40    -0.02     0.62
                               100     -3.53    -2.93    -2.63    -1.58    -0.47    -0.09     0.55
                               200     -3.40    -2.84    -2.55    -1.56    -0.43    -0.06     0.64
                               400     -3.42    -2.86    -2.58    -1.57    -0.45    -0.07     0.57
0.010                           48     -3.50    -2.92    -2.63    -1.59    -0.42    -0.04     0.67
                               100     -3.41    -2.82    -2.53    -1.54    -0.42    -0.05     0.63
                               200     -3.47    -2.91    -2.60    -1.56    -0.44    -0.07     0.55
                               400     -3.42    -2.88    -2.57    -1.56    -0.42    -0.04     0.65
0.100                           48     -3.35    -2.75    -2.47    -1.48    -0.28     0.11     0.83
                               100     -3.29    -2.67    -2.34    -1.27    -0.02     0.37     1.06
                               200     -3.27    -2.70    -2.38    -1.25     0.04     0.41     1.13
                               400     -3.02    -2.43    -2.14    -0.90     0.39     0.74     1.36
0.250                           48     -3.09    -2.52    -2.23    -1.08     0.18     0.54     1.18
                               100     -2.91    -2.36    -2.05    -0.83     0.44     0.79     1.46
                               200     -2.78    -2.12    -1.79    -0.56     0.69     1.03     1.69
                               400     -2.62    -1.95    -1.59    -0.35     0.89     1.28     1.95
0.500                           48     -2.51    -2.02    -1.72    -0.59     0.62     0.96     1.62
                               100     -2.65    -1.98    -1.62    -0.42     0.81     1.17     1.86
                               200     -2.63    -1.94    -1.61    -0.37     0.89     1.27     1.92
                               400     -2.43    -1.78    -1.42    -0.14     1.11     1.49     2.15
1.000                           48     -2.40    -1.80    -1.49    -0.33     0.89     1.22     1.85
                               100     -2.45    -1.75    -1.43    -0.21     1.03     1.36     2.03
                               200     -2.38    -1.74    -1.39    -0.15     1.12     1.48     2.16
                               400     -2.35    -1.68    -1.32    -0.06     1.20     1.56     2.21
10.000                          48     -2.45    -1.82    -1.47    -0.28     0.90     1.25     1.85
                               100     -2.42    -1.73    -1.39    -0.16     1.06     1.41     2.09
                               200     -2.34    -1.60    -1.24     0.02     1.26     1.62     2.31
                               400     -2.50    -1.71    -1.34    -0.08     1.16     1.53     2.21
Normal (0,1)                           -2.326   -1.645   -1.282    0.0      1.282    1.645    2.326
As the sample variability of the deterministic trend is of order $O_p(T^3)$ while the sampling variability of the stochastic trend is $O_p(T^2)$, the former dominates the latter asymptotically. This is seen by considering the series $z_t = \mu t$, where $E(\sum_{t=1}^{T} z_t^2) = \mu^2 T(T+1)(2T+1)/6$, while for the series $y_t = \sum_{i=1}^{t} \epsilon_i$, $\epsilon_i \sim \mathrm{nid}(0, \sigma_\epsilon^2)$, we get $E(\sum_{t=1}^{T} y_t^2) = \sigma_\epsilon^2 T(T+1)/2$. This implies that $x_t$ behaves like a deterministic trend as $T \to \infty$. From Theil (1971) we know that the asymptotic normality results of least squares hold whenever the variables are linear trends, and the asymptotic normality of $\hat{\pi}$ in the auxiliary regression $\Delta x_t = \pi x_{t-1} + m + e_t$ follows. However, from this argument it is also immediately clear that for a given sample size $T$ the influence of the term $\mu t$ relative to the term $\sum_{i=1}^{t} \epsilon_i$ depends on the relative size of $\mu$ and $\sigma_\epsilon$.

Table 1 contains fractile values from the small sample distributions of the ‘t’ statistics in the auxiliary regression $\Delta x_t = \pi x_{t-1} + m + e_t$, estimated by Monte Carlo experiments with $\epsilon_t \sim \mathrm{nid}(0, 1)$, i.e., $\sigma_\epsilon^2 = 1$, and for $\mu$ = 0, 0.001, 0.01, 0.1, 0.25, 0.5, 1.0, 10.0 and $T$ = 48, 100, 200, 400. From the results presented in the table it is obvious that the small sample distribution is far from the standard normal in the upper part of the table, where $\mu$ is small. For large values of $\mu$ (and $T$) the normal gives a reasonably close approximation.

Of course, which range of values of $\mu$ is the most likely in actual time series is debatable. If one applied the often recommended procedure of looking at plots of the time series in the cases presented in table 1, one would probably never think of running unit root tests when $\mu$ was 1 or 10 times the standard error $\sigma_\epsilon$. Also notice that if the series are in logs, a given value of $\mu$ corresponds to a deterministic growth in the series of roughly $100\mu$ percent per period.
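A Monte Carlo experiment of the kind summarized in table 1 can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' Gauss program: the function name `df_t_statistics`, the use of $x_0 = 0$ as the first lagged value (so that the regression has $T$ observations), the seed, and the reduced number of replications (2000 rather than 18000) are all choices made here. Simulated fractiles will therefore differ from the tabulated ones by sampling error.

```python
import numpy as np

FRACTILES = [0.01, 0.05, 0.10, 0.50, 0.90, 0.95, 0.99]


def df_t_statistics(T, mu, n_rep=2000, sigma=1.0, seed=0):
    """Simulate 't' ratios on pi in  dx_t = pi * x_{t-1} + m + e_t.

    The data follow x_t = x_{t-1} + mu + eps_t with x_0 = 0 (an assumption
    of this sketch), so each regression uses the T pairs (dx_t, x_{t-1}).
    """
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=(n_rep, T))
    dx = mu + eps                                    # dx_t = x_t - x_{t-1}
    x = np.cumsum(dx, axis=1)                        # x_1, ..., x_T
    x_lag = np.hstack([np.zeros((n_rep, 1)), x[:, :-1]])  # x_0, ..., x_{T-1}

    # Including an intercept is equivalent to demeaning both variables.
    xl = x_lag - x_lag.mean(axis=1, keepdims=True)
    dy = dx - dx.mean(axis=1, keepdims=True)
    sxx = (xl ** 2).sum(axis=1)
    pi_hat = (xl * dy).sum(axis=1) / sxx
    resid = dy - pi_hat[:, None] * xl
    s2 = (resid ** 2).sum(axis=1) / (T - 2)          # two estimated parameters
    return pi_hat / np.sqrt(s2 / sxx)


# Fractiles for a few (mu, T) combinations taken from the design of table 1.
for mu in (0.0, 0.25, 10.0):
    for T in (48, 400):
        q = np.quantile(df_t_statistics(T, mu), FRACTILES)
        print(f"mu={mu:<5} T={T:<4}", np.round(q, 2))
```

The replications are vectorized across rows so that no per-replication regression call is needed; raising `n_rep` towards 18000 tightens the fractile estimates at a proportional cost in memory and time.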
3. An example

Let us consider the following Danish time series as examples. The first is $GDP_t$ in 1929 prices. The second is $M1_t$, the third the unemployment rate $u_t$, and the fourth and last is the long-term interest rate $r_t$. The yearly observations from 1921 to 1970 are found in the CLEO data bank; see Kærgaard (1987). Instead of using the raw series we have transformed them so that we consider $\log[GDP_t]$, $\log[M1_t]$, $\log[u_t/(1 - u_t)]$ and $\log[1 + r_t]$. These series are depicted in fig. 1. From the figure it is seen that all the series may have $I(1)$ characteristics or even $I(2)$, i.e., their first differences may be $I(1)$; see fig. 2, which depicts the first difference of $\log[M1_t]$. We begin by testing the hypothesis of $I(2)$ series against the series being $I(1)$ and then continue by testing a hypothesis of $I(1)$ against that of the series being $I(0)$.

In order to check whether the first difference or the level contains a drift, $\Delta^2 y_t$ and $\Delta y_t$, where $y_t = \log[GDP_t]$, $\log[M1_t]$, $\log[u_t/(1 - u_t)]$ or $\log[1 + r_t]$, are regressed on a constant. The results, shown in table 2, indicate non-significant intercepts in $\Delta^2 y_t$ and intercepts significantly different from zero in $\Delta y_t$, except for $y_t = \log[u_t/(1 - u_t)]$. The estimated ratios of $\mu$ to $\sigma_\epsilon$ are 0.72, 0.77, and 0.25 for $\Delta \log[GDP_t]$, $\Delta \log[M1_t]$, and $\Delta \log[1 + r_t]$, respectively. According to the asymptotic results one should therefore apply the standard normal to the DF statistic when testing the hypothesis $H_0: y_t \sim I(1)$ for these three series. However, the results printed in table 2 imply that the null hypothesis cannot be rejected against the stationary alternative irrespective of which distribution is applied, i.e., the standard normal, the Dickey-Fuller distribution or the distributions given in table 1.

An interesting case arises when we consider the tests for $y_t$ being $I(2)$.
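The drift check described above, regressing a differenced series on a constant, reduces to a sample mean and a residual standard error. The sketch below illustrates the arithmetic on a synthetic series; the CLEO data are not reproduced here, and the function name `drift_summary`, the seed, and the parameter values 0.03 and 0.045 are hypothetical. As a consistency check on the formula, with 49 first differences a residual standard error of 0.0438 gives a standard error of the mean of 0.0438/7 ≈ 0.0063, which agrees with the values reported for $\Delta \log[GDP_t]$ in table 2.

```python
import numpy as np


def drift_summary(series):
    """Regress the first difference of `series` on a constant.

    OLS on a constant alone gives the sample mean as the intercept and the
    sample standard deviation (n - 1 in the denominator) as the residual
    standard error. Hypothetical helper for illustration.
    """
    d = np.diff(np.asarray(series, dtype=float))
    n = d.size
    mu_hat = d.mean()
    sigma_hat = d.std(ddof=1)
    se_mu = sigma_hat / np.sqrt(n)
    return {
        "mu": mu_hat,
        "se": se_mu,
        "sigma": sigma_hat,
        "t": mu_hat / se_mu,
        "ratio": mu_hat / sigma_hat,   # the mu / sigma ratio discussed in the text
    }


# Synthetic stand-in for a log series with 50 yearly observations (assumed values).
rng = np.random.default_rng(1)
y = np.cumsum(0.03 + 0.045 * rng.normal(size=50))

level_drift = drift_summary(y)             # drift in the first difference of y
diff_drift = drift_summary(np.diff(y))     # drift in the second difference of y
print({k: round(v, 4) for k, v in level_drift.items()})
print({k: round(v, 4) for k, v in diff_drift.items()})
```

Passing `np.diff(y)` to the same helper gives the regression of the second difference on a constant, so one function covers both rows of the drift check.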
If we apply the results presented earlier, that no intercept should be in the regression, the Dickey-Fuller test and the Sargan-Bhargava test based on the Durbin-Watson statistics clearly reject the hypothesis of two unit roots for $\log[GDP_t]$, $\log[u_t/(1 - u_t)]$ and $\log[1 + r_t]$, while two unit
Fig. 1. The logarithms of $GDP_t$, $M1_t$, $u_t/(1 - u_t)$ and $(1 + r_t)$ in Denmark 1921-1970. [Figure: time-series plots of the transformed series, horizontal axis 1920-1970.]
roots in $\log[M1_t]$ cannot be rejected at the 5% level using the Dickey-Fuller tables for the no intercept case, where the 0.05 fractile is -1.95; see Fuller (1976). The same result is obtained if we apply an intercept in the auxiliary regression and use the fractiles in the first block of table 1, or, if point estimates of $\mu$ (and $\sigma_\epsilon$) from the regression $\Delta^2 y_t = \mu + \epsilon_t$ were to be trusted ($\mu/\sigma_\epsilon = 0.04$), if we used
Fig. 2. The change in $\log[M1_t]$ in Denmark 1921-1970. [Figure: time-series plot, horizontal axis 1920-1970.]

Table 2
Tests for unit roots in selected Danish macroeconomic time series, 1921-1970. a

                                               log[GDP_t]         log[M1_t]          log[u_t/(1-u_t)]   log[1+r_t]
Δ²y_t = μ + ε_t    μ (s.e.)                    -0.0004 (0.0084)    0.0014 (0.0058)   -0.0049 (0.0621)    0.0003 (0.0008)
                   σ_ε                          0.0579             0.0399             0.4299             0.0055
Δy_t = μ + ε_t     μ (s.e.)                     0.0317 (0.0063)    0.0474 (0.0087)   -0.0427 (0.0436)    0.001 (0.0006)
                   σ_ε                          0.0438             0.0610             0.3050             0.0044
H0: π = 0,         DW                           1.71               0.42               1.95               1.54
y_t ~ I(2)         DF, intercept (m ≠ 0)       -5.89              -2.60              -6.60              -5.32
                   DF, no intercept (m = 0)    -4.36              -1.84              -5.02              -4.67
H0: π = 0,         DW                           0.02               0.01               0.14               0.09
y_t ~ I(1)         DF, intercept (m ≠ 0)        0.71               0.12              -0.80               2.06
                   DF, no intercept (m = 0)     4.93               1.91               0.42               2.23

a The numbers in parentheses are standard errors. DW gives the Durbin-Watson statistic for the first difference of the series when testing $H_0: y_t \sim I(2)$ and for the level $y_t$ when testing the hypothesis $H_0: y_t \sim I(1)$. DF gives the ‘t’ ratio on $\pi$ from the auxiliary regression $\Delta^2 y_t = \pi \Delta y_{t-1} + m + e_t$ and $\Delta y_t = \pi y_{t-1} + m + e_t$, respectively, in both cases with and without $m$ fixed at zero. The Dickey-Fuller auxiliary regression may be augmented by lagged values of the left-hand side variable in order to whiten the errors, but this was only done for the test of $I(1)$ for $\log[M1_t]$, where one lag was applied.
the blocks for $\mu = 0.01$ and $\mu = 0.10$. However, application of the standard normal distribution would imply rejection of the $I(2)$ hypothesis for $\log[M1_t]$.

4. Conclusion

It has been shown that the result that the Dickey-Fuller ‘t’ obtained from a regression with an intercept is asymptotically normal if the data generating process is a random walk with drift must be applied with care in small samples, where the Dickey-Fuller distribution may be a better approximation to the actual small sample distribution. This is especially true when the drift is not enormous.
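The practical consequence for the $\log[M1_t]$ example in section 3 comes down to comparing one ‘t’ ratio with different 0.05 fractiles. The sketch below uses the values as read from table 2 (-2.60 with intercept, -1.84 without), the 0.05 fractile of the first block of table 1 for $T$ = 48 (-2.81), and the fractile -1.95 quoted from Fuller (1976). The choice of the $T$ = 48 row and the helper name `rejects_unit_root` are assumptions made for illustration.

```python
def rejects_unit_root(t_stat, critical_value):
    """One-sided test: reject the unit root when the 't' ratio is below the critical value."""
    return t_stat < critical_value


# DF 't' ratios for the I(2) test on log[M1_t], as read from table 2.
DF_WITH_INTERCEPT = -2.60
DF_NO_INTERCEPT = -1.84

# 0.05 fractiles quoted in the text and in table 1.
CRIT_NORMAL = -1.645           # standard normal
CRIT_DF_INTERCEPT = -2.81      # table 1, mu = 0, T = 48 (assumed relevant row)
CRIT_DF_NO_INTERCEPT = -1.95   # Fuller (1976), no intercept case

print(rejects_unit_root(DF_WITH_INTERCEPT, CRIT_NORMAL))          # True
print(rejects_unit_root(DF_WITH_INTERCEPT, CRIT_DF_INTERCEPT))    # False
print(rejects_unit_root(DF_NO_INTERCEPT, CRIT_DF_NO_INTERCEPT))   # False
```

Only the standard normal fractile leads to rejection, which is the disagreement between the asymptotic and the small sample approximations that the note draws attention to.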
References

Dolado, J.J. and T. Wilkinson, 1988, Cointegration: A survey of recent developments, Mimeo. (Institute of Economics and Statistics, Oxford University, Oxford).
Engle, R.F. and C.W.J. Granger, 1987, Co-integration and error correction: Representation, estimation and testing, Econometrica 55, 251-276.
Fuller, W.A., 1976, Introduction to statistical time series (Wiley, New York).
Hylleberg, S., R.F. Engle, C.W.J. Granger and B.S. Yoo, 1988, Seasonal integration and co-integration, Discussion paper (University of California, San Diego, CA).
Kærgaard, N., 1987, Nogle lange tidsserier for den danske økonomiske udvikling, Mimeo. (University of Copenhagen, Copenhagen).
Nankervis, J.C. and N.E. Savin, 1987, Finite sample distributions of t and F statistics in an AR(1) model with an exogenous variable, Econometric Theory 3, 387-408.
Theil, H., 1971, Principles of econometrics (Wiley, New York).
West, K.D., n.d., Asymptotic normality when regressors have a unit root, Econometrica, forthcoming.