Journal of Econometrics 13 (1980) 139-157. © North-Holland Publishing Company

PREDICTORS FOR THE FIRST-ORDER AUTOREGRESSIVE PROCESS*

Wayne A. FULLER and David P. HASZA

Iowa State University, Ames, IA 50011, USA

Received July 1978, final version received December 1979
The error made in predicting a first-order autoregressive process with unknown parameters is investigated. It is shown that the least squares predictor is unbiased for symmetric error distributions. Alternative predictors for stationary and non-stationary processes are studied using the Monte Carlo method. The ordinary least squares statistics perform reasonably well for one period predictions with samples as small as ten for both stationary and non-stationary processes. It is demonstrated that there is a considerable loss in efficiency when outdated estimators are used to construct predictors.
1. Introduction
1. Introduction

Let the first-order autoregressive process $\{Y_t;\ t = 0, 1, \ldots\}$ be defined by

$$Y_t = \alpha_0 + \alpha_1 Y_{t-1} + e_t, \qquad t = 1, 2, \ldots, \qquad Y_t = Y_0, \qquad t = 0, \tag{1}$$

where $\{e_t;\ t = 0, 1, \ldots\}$ is a sequence of independent identically distributed random variables [$\{e_t\}$ is IID$(0, \sigma^2)$]. The values of $\alpha_0$ and $\alpha_1$, and the form of $Y_0$, determine the nature of the time series. If $|\alpha_1| < 1$ and

$$Y_0 = (1 - \alpha_1)^{-1}\alpha_0 + (1 - \alpha_1^2)^{-1/2} e_0, \tag{2}$$

the time series is covariance stationary. If the $e_t$ are normally distributed and (2) holds, the time series is a normal strictly stationary time series. If $\alpha_1 = 1$ the process is sometimes called a random walk. If $\alpha_0 \neq 0$ and $\alpha_1 = 1$ the random walk is said to display 'drift'. If $|\alpha_1| > 1$ the process is called explosive.

*This research was partly supported by Joint Statistical Agreements J.S.A. 78-30 and J.S.A. 79-10 with the U.S. Bureau of the Census.
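As an illustration of the three regimes of model (1), the following short sketch (ours, not part of the original paper; Python with NumPy is an assumed environment) simulates the process for stationary, unit root, and explosive values of $\alpha_1$:

```python
import numpy as np

def simulate_ar1(alpha0, alpha1, n, rng, stationary_start=False):
    """Generate (Y_0, ..., Y_n) from model (1) with N(0,1) errors.

    If stationary_start is True (requires |alpha1| < 1), Y_0 is drawn
    from the stationary distribution as in eq. (2); otherwise Y_0 = 0.
    """
    e = rng.standard_normal(n + 1)
    y = np.empty(n + 1)
    if stationary_start:
        y[0] = alpha0 / (1.0 - alpha1) + e[0] / np.sqrt(1.0 - alpha1**2)
    else:
        y[0] = 0.0
    for t in range(1, n + 1):
        y[t] = alpha0 + alpha1 * y[t - 1] + e[t]
    return y

rng = np.random.default_rng(0)
for a1 in (0.5, 1.0, 1.05):  # stationary, random walk, explosive
    y = simulate_ar1(0.0, a1, 19, rng, stationary_start=abs(a1) < 1)
    print(a1, y[-1])
```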
Given that $(Y_0, Y_1, \ldots, Y_n)$ are observed, the estimation of the parameters $(\alpha_0, \alpha_1, \sigma^2)$ has been extensively studied under the assumption that $|\alpha_1| < 1$. See, for example, Kendall and Stuart (1966, ch. 48), Anderson (1971, ch. 6), Fuller (1976, ch. 8), and references cited by these authors. While estimation of the parameters of the model with $|\alpha_1| \geq 1$ has been less extensively studied, considerable information on the large sample distribution of the estimators is available. See White (1958), Anderson (1959), Rao (1961), Dickey and Fuller (1979), Fuller (1976, ch. 8), Hasza (1977), and references cited by these authors.

A number of Monte Carlo studies of the estimators of $\alpha_1$ given that $|\alpha_1| < 1$ have been conducted. Examples include Cobas (1966), Thornber (1967), Orcutt and Winokur (1969), Salem (1971), and Gonedes and Roberts (1977). In part, these studies show that the bias in the least squares estimator $\hat\alpha_1$ is approximately $n^{-1}(-1 - 3\alpha_1)$ and that the Monte Carlo variance of $\hat\alpha_1$ for large positive $\alpha_1$ is greater than large sample theory suggests.

A somewhat less extensive literature is available on the properties of predictions for the first-order process. Davisson (1965) obtained the mean square error through terms of order $n^{-1}$ for one period prediction of the zero mean normal stationary autoregressive process. Fuller and Hasza (1978) derived the mean square error of prediction through terms of order $n^{-1}$ for processes with $|\alpha_1| < 1$, $|\alpha_1| = 1$, and $|\alpha_1| > 1$. They also demonstrated that the ordinary regression estimator of the variance of a prediction was applicable in large samples for all three cases. Orcutt and Winokur (1969) and Gonedes and Roberts (1977) have conducted Monte Carlo studies of the prediction error. Gonedes and Roberts considered $|\alpha_1| < 1$, but Orcutt and Winokur included $\alpha_1 = 1$ in their study. Phillips (1979) developed an Edgeworth-type approximation for the distribution of the prediction error conditional upon $Y_n$ for the model with $\alpha_0 = 0$.

In this note we investigate the properties of predictors of the next observation(s) in a realization of the process (1). We consider the three cases $|\alpha_1| < 1$, $|\alpha_1| = 1$, and $|\alpha_1| > 1$. The Monte Carlo mean square errors of the predictor errors are in reasonable agreement with the theoretically developed approximations. The Monte Carlo mean square errors are slightly larger than the theoretical approximations for $|\alpha_1|$ close to, but less than, one. The distribution of the 'regression t-statistic' for the one period predictor error is close to that suggested by the theoretical approximation. The variance of the 't-statistic' is much larger than the theoretical approximation for predictions more than one period ahead made with small samples for processes with large positive $\alpha_1$.

2. Theoretical results

In this section we give some properties of predictors constructed for the autoregressive process.
The first result establishes that the expected value of the predictor error is zero for a wide class of processes, including stationary normal processes. We note that Malinvaud (1970, p. 355) stated that the predictor for the zero mean first-order process was unbiased for symmetric error distributions.

Theorem 1. Let model (1) hold with $|\alpha_1| < 1$. Let $Y_0$ be a (possibly degenerate) random variable symmetrically distributed about the mean $\mu = (1 - \alpha_1)^{-1}\alpha_0$ and with finite variance. Let $\{e_t\}$ be a sequence of IID$(0, \sigma^2)$ random variables with a symmetric distribution. Let $\{e_t\}$ be independent of $Y_0$. Define

$$\hat Y_{n+s} = \hat\alpha_0 + \hat\alpha_1 Y_n, \qquad s = 1,$$
$$\hat Y_{n+s} = \hat\alpha_0 + \hat\alpha_1 \hat Y_{n+s-1}, \qquad s = 2, 3, \ldots, \tag{3}$$

where $\hat\alpha_1$ is the least squares coefficient from the regression of $Y_t$ on $(1, Y_{t-1})$, $t = 1, 2, \ldots, n$, and

$$\hat\alpha_0 = n^{-1}\Big[\sum_{t=1}^n Y_t - \hat\alpha_1 \sum_{t=1}^n Y_{t-1}\Big].$$

Let the sample size $n$ and the distribution of $e_t$ be such that

$$E\{|\hat\alpha_1^s Y_t|\} < \infty$$

for $t = 0, 1, \ldots, n$, and $s$ a positive integer. Then

$$E\{Y_{n+s} - \hat Y_{n+s}\} = 0.$$

Proof. The predictor computed for $X_t = Y_t + A$ is $\hat X_{n+s} = \hat Y_{n+s} + A$ for any real number $A$. Therefore we assume, with no loss of generality, that $\mu = 0$. The predictor error is

$$Y_{n+s} - \hat Y_{n+s} = \sum_{j=0}^{s-1} \alpha_1^j e_{n+s-j} - \hat\alpha_0 \sum_{j=0}^{s-1} \hat\alpha_1^j + (\alpha_1^s - \hat\alpha_1^s) Y_n,$$

and, as $E\{e_t\} = 0$ for all $t$, we need only evaluate the expectation of the last two terms. Let $\psi_Y = \{Y_0, Y_1, Y_2, \ldots, Y_n\}$ denote a sample realization. Because $\hat\alpha_1$ is an even function of $\psi_Y$, $\hat\alpha_1$ computed from $\psi_Y$ is equal to that computed from $\psi_Y^*$, where $\psi_Y^* = \{-Y_0, -Y_1, -Y_2, \ldots, -Y_n\}$. Likewise $\hat\alpha_0$ is an odd function of $\psi_Y$. Therefore $\hat Y_{n+s}$ computed from $\psi_Y$ is the negative of that computed from $\psi_Y^*$, and

$$-\hat\alpha_0 \sum_{j=0}^{s-1} \hat\alpha_1^j + (\alpha_1^s - \hat\alpha_1^s) Y_n$$

for the sample $\psi_Y$ is the negative of that for the sample $\psi_Y^*$. The result follows because $Y_{n+s} - \hat Y_{n+s}$ has a symmetric distribution. □

If $\alpha_0 = 0$ the predictor is unbiased for both stationary and non-stationary processes.
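A minimal sketch of the least squares estimators and the iterated predictor (3), written by us for illustration (the function names are our own):

```python
import numpy as np

def ols_ar1(y):
    """Least squares regression of Y_t on (1, Y_{t-1}), t = 1, ..., n."""
    x, z = y[:-1], y[1:]                     # Y_{t-1} and Y_t
    n = len(x)
    a1 = np.sum((x - x.mean()) * (z - z.mean())) / np.sum((x - x.mean())**2)
    a0 = (z.sum() - a1 * x.sum()) / n        # the form given in Theorem 1
    return a0, a1

def predict(y, s):
    """s period ahead predictor of eq. (3), iterated from Y_n."""
    a0, a1 = ols_ar1(y)
    yhat = y[-1]
    for _ in range(s):
        yhat = a0 + a1 * yhat
    return yhat
```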
Corollary 1.1. Let model (1) hold with $\alpha_0 = 0$. Let $Y_0$ be a (possibly degenerate) random variable with symmetric distribution, zero mean, and finite variance. Let $\{e_t\}$ be a sequence of IID$(0, \sigma^2)$ random variables with symmetric distribution and $\{e_t\}$ independent of $Y_0$. Let $\hat Y_{n+s}$ be defined as in Theorem 1. Let the sample size $n$ and the distribution of $e_t$ be such that

$$E\{|\hat\alpha_1^s Y_t|\} < \infty$$

for $t = 0, 1, \ldots, n$, where $s$ is a positive integer. Then

$$E\{Y_{n+s} - \hat Y_{n+s}\} = 0.$$

Proof. The proof rests upon the symmetry of the distribution of $e_t$ and parallels that of Theorem 1. □

If $\alpha_1$ is one, the predictor is unbiased for all values of $Y_0$.
Corollary 1.2. Let model (1) hold with $\alpha_0 = 0$ and $\alpha_1 = 1$. Let $\{e_t\}$ be a sequence of IID$(0, \sigma^2)$ random variables with symmetric distribution and $\{e_t\}$ independent of $Y_0$. Let $\hat Y_{n+s}$ be as defined in Theorem 1. Let the sample size $n$ and the distribution of $e_t$ be such that

$$E\{|\hat\alpha_1^s Y_t|\} < \infty$$

for $t = 0, 1, \ldots, n$, and $s$ a positive integer. Then

$$E\{Y_{n+s} - \hat Y_{n+s} \mid Y_0\} = 0.$$

Proof. We have

$$Y_t = Y_0 + X_t, \qquad \text{where} \quad X_t = \sum_{j=1}^t e_j.$$

It follows that

$$\hat Y_{n+s} = Y_0 + \hat X_{n+s},$$

where $\hat X_{n+s}$ is defined by (3) with $X_t$ replacing $Y_t$. The predictor is unbiased for $X_{n+s}$ by Corollary 1.1. □

It is also of interest that the 'regression t-statistic' one would use to construct a confidence interval for the prediction has zero expectation.

Corollary 1.3. Let the assumptions of Theorem 1, of Corollary 1.1, or of Corollary 1.2 hold. Let

$$\hat t_{n+s} = \Big\{\hat\sigma^2\Big(\sum_{j=0}^{s-1} \hat\alpha_1^{2j} + a_{0s}^2 C_{00} + 2 a_{0s} a_{1s} C_{01} + a_{1s}^2 C_{11}\Big)\Big\}^{-1/2}(Y_{n+s} - \hat Y_{n+s}),$$

where

$$\hat\sigma^2 = (n-2)^{-1} \sum_{t=1}^n (Y_t - \hat\alpha_0 - \hat\alpha_1 Y_{t-1})^2,$$

$$C_{00} = D^{-1}\sum_{t=1}^n Y_{t-1}^2, \qquad C_{01} = -D^{-1}\sum_{t=1}^n Y_{t-1}, \qquad C_{11} = D^{-1} n,$$

$$D = n\sum_{t=1}^n Y_{t-1}^2 - \Big(\sum_{t=1}^n Y_{t-1}\Big)^2,$$

$a_{0s}$ is the partial derivative of $\hat Y_{n+s}$ with respect to $\alpha_0$ evaluated at $(\hat\alpha_0, \hat\alpha_1)$, and $a_{1s}$ is the partial derivative of $\hat Y_{n+s}$ with respect to $\alpha_1$ evaluated at $(\hat\alpha_0, \hat\alpha_1)$. Then

$$E\{\hat t_{n+s}\} = 0.$$

Proof. It has been established that $\hat Y_{n+s}$ and $\hat\alpha_0$ are odd functions of the sample while $\hat\alpha_1$ is an even function. It is clear that $\hat\sigma^2$, $D$, $C_{00}$, and $C_{11}$ are even functions while $C_{01}$ is an odd function. The partial derivative $a_{0s}$ is an even function of the sample, while $a_{1s}$ is an odd function of the sample. The result follows by the symmetry argument of Theorem 1. □
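The statistic of Corollary 1.3 can be computed directly from its definition. The sketch below is our reading of the formulas (not code from the paper); the partial derivatives follow from differentiating $\hat Y_{n+s} = \hat\alpha_0\sum_{j=0}^{s-1}\hat\alpha_1^j + \hat\alpha_1^s Y_n$:

```python
import numpy as np

def t_statistic(y, y_future, s):
    """Regression 't-statistic' of Corollary 1.3 for the s period predictor.

    y holds the observed series (Y_0, ..., Y_n); y_future is the realized Y_{n+s}.
    """
    x, z = y[:-1], y[1:]
    n = len(x)
    a1 = np.sum((x - x.mean()) * (z - z.mean())) / np.sum((x - x.mean())**2)
    a0 = (z.sum() - a1 * x.sum()) / n
    sigma2 = np.sum((z - a0 - a1 * x)**2) / (n - 2)

    D = n * np.sum(x**2) - x.sum()**2
    C00, C01, C11 = np.sum(x**2) / D, -x.sum() / D, n / D

    # partial derivatives of the predictor with respect to alpha_0 and alpha_1
    a0s = sum(a1**j for j in range(s))
    a1s = a0 * sum(j * a1**(j - 1) for j in range(1, s)) + s * a1**(s - 1) * y[-1]

    yhat = y[-1]
    for _ in range(s):
        yhat = a0 + a1 * yhat

    var_hat = sigma2 * (sum(a1**(2 * j) for j in range(s))
                        + a0s**2 * C00 + 2.0 * a0s * a1s * C01 + a1s**2 * C11)
    return (y_future - yhat) / np.sqrt(var_hat)
```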
Theorem 1 and the corollaries were presented for the first-order process, but they extend immediately to higher-order processes. Theorems 2 and 3 follow from the results of Fuller and Hasza (1978) and are presented without proof.
Theorem 2. Let model (1) hold with $|\alpha_1| < 1$. Let $\{e_t\}$ be a sequence of normal independent $(0, \sigma^2)$ random variables and let $Y_0$ be defined by (2). Let $\hat Y_{n+s}$ be defined by (3). Then

$$E\{(Y_{n+s} - \hat Y_{n+s})^2\} = \sigma^2\Big\{\sum_{j=0}^{s-1} \alpha_1^{2j} + n^{-1} s^2 \alpha_1^{2(s-1)} + n^{-1}\big[(1 - \alpha_1)^{-1}(1 - \alpha_1^s)\big]^2\Big\} + O(n^{-3/2}).$$

Two interesting special cases are associated with Theorem 2. The mean square error of the one period prediction error is approximately $\sigma^2(1 + 2n^{-1})$. The limiting value of the prediction mean square error, as $s$ becomes large, is the variance of the process plus the variance of the estimated mean.
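The approximation of Theorem 2 is easily evaluated numerically; the following sketch (our illustration, not the authors' code) reproduces, for example, the $1 + 2n^{-1}$ one period value:

```python
def theorem2_mse(alpha1, n, s):
    """Order n^{-1} approximation of Theorem 2 to the s period prediction
    mean square error, in units of sigma^2 (requires |alpha1| < 1)."""
    lead = sum(alpha1**(2 * j) for j in range(s))   # known-parameter MSE
    est = (s**2 * alpha1**(2 * (s - 1))
           + ((1.0 - alpha1**s) / (1.0 - alpha1))**2) / n
    return lead + est

print(theorem2_mse(0.5, 19, 1))    # 1 + 2/19 = 1.105...
print(theorem2_mse(-0.9, 19, 3))   # about 2.820, a value quoted in section 3
```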
Theorem 3. Let model (1) hold with $|\alpha_1| = 1$ and $Y_0$ fixed. Let $\{e_t\}$ be a sequence of IID$(0, \sigma^2)$ random variables. Then

$$E\{(Y_{n+1} - \hat Y_{n+1})^2\} \doteq \sigma^2(1 + 3n^{-1}) \qquad \text{if} \quad \alpha_1 = \pm 1, \quad \alpha_0 = 0,$$
$$E\{(Y_{n+1} - \hat Y_{n+1})^2\} \doteq \sigma^2(1 + 4n^{-1}) \qquad \text{if} \quad \alpha_1 = 1, \quad \alpha_0 \neq 0,$$

where the approximation arises from the deletion of higher-order terms and from the numerical evaluation of the order $n^{-1}$ term.

Theorem 4 gives the order $n^{-1}$ approximation to the prediction error for the explosive process:

Theorem 4. Let model (1) hold with $|\alpha_1| > 1$. Let $\{e_t\}$ be a sequence of IID$(0, \sigma^2)$ random variables. Then

$$Y_{n+1} - \hat Y_{n+1} = u_{n+1} + O_p(n^{-3/2}), \tag{4}$$

where

$$u_{n+1} = e_{n+1} - \big[(\alpha_1^2 - 1) - n^{-1}\alpha_1(\alpha_1 + 1)\big] L_n - \alpha_1 \bar e_n,$$

$$L_n = \sum_{t=1}^n \alpha_1^{-(n-t+1)} e_t, \qquad \bar e_n = n^{-1}\sum_{t=1}^n e_t.$$

Proof. We have

$$Y_{n+1} - \hat Y_{n+1} = e_{n+1} - (\hat\alpha_0 - \alpha_0) - (\hat\alpha_1 - \alpha_1) Y_n,$$

and the least squares statistics may be expanded about

$$X = Y_0 + \alpha_0(\alpha_1 - 1)^{-1} + \sum_{j=1}^n \alpha_1^{-j} e_j,$$

for which $Y_n = \alpha_1^n X + O_p(1)$. We have $\bar e_n = O_p(n^{-1/2})$ and

$$\hat\alpha_0 - \alpha_0 = \bar e_n - n^{-1}(\alpha_1 + 1) L_n + O_p(n^{-3/2}).$$

Therefore

$$Y_{n+1} - \hat Y_{n+1} = e_{n+1} - (\alpha_1^2 - 1)\big\{L_n + [\bar e_n - n^{-1}(\alpha_1 + 1) L_n](\alpha_1 + 1)^{-1}\big\} - \bar e_n + n^{-1}(\alpha_1 + 1) L_n + O_p(n^{-3/2})$$
$$= e_{n+1} - L_n\big[(\alpha_1^2 - 1) - n^{-1}\alpha_1(\alpha_1 + 1)\big] - \alpha_1 \bar e_n + O_p(n^{-3/2}).$$

We have

$$E\{\bar e_n^2\} = n^{-1}\sigma^2,$$

$$E\{L_n^2\} = \sum_{t=1}^n \alpha_1^{-2(n-t+1)}\sigma^2 = (\alpha_1^2 - 1)^{-1}\sigma^2 + O(|\alpha_1|^{-2n}),$$

$$E\{\bar e_n L_n\} = n^{-1}(\alpha_1 - 1)^{-1}\sigma^2 + O(n^{-1}|\alpha_1|^{-n}).$$

Therefore,

$$\sigma^{-2} E\{u_{n+1}^2\} = 1 + (\alpha_1^2 - 1)^{-1}\big[(\alpha_1^2 - 1) - n^{-1}\alpha_1(\alpha_1 + 1)\big]^2 + \alpha_1^2 n^{-1} + 2\alpha_1 n^{-1}(\alpha_1 - 1)^{-1}\big[(\alpha_1^2 - 1) - n^{-1}\alpha_1(\alpha_1 + 1)\big] + O(n^{-1}|\alpha_1|^{-n})$$
$$= \alpha_1^2(1 + n^{-1}) - n^{-2}\alpha_1^2(\alpha_1 - 1)^{-1}(\alpha_1 + 1) + O(n^{-1}|\alpha_1|^{-n}). \qquad \square$$
The explosive case is interesting in that estimation of the parameters produces an increase in the leading term of the prediction mean square error. Thus the order one term with estimated $\alpha_1$ is $\alpha_1^2\sigma^2$ instead of the value $\sigma^2$ associated with known $\alpha_1$. The order one term in the mean square error of prediction for $s$ periods ahead is

$$\sigma^2\big\{s^2 \alpha_1^{2(s-1)}(\alpha_1^2 - 1) + (\alpha_1^{2s} - 1)(\alpha_1^2 - 1)^{-1}\big\}. \tag{5}$$

Fuller and Hasza (1978) have shown that the ordinary regression estimator of the variance of the predictor given in the denominator of $\hat t_{n+s}$ is an appropriate estimator of the mean square error of the autoregressive predictor. They demonstrated that the order $n^{-1}$ term of the least squares estimator was estimating the order $n^{-1}$ term of the mean square error for several models. It follows that, in large samples, $\hat t_{n+s}$ can be used to set confidence intervals for predictors irrespective of the magnitude of $\alpha_1$ when the $e_t$ are assumed to have a normal distribution.
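A quick numerical reading of (5) (a sketch of ours, not from the paper): at $s = 1$ the expression reduces to $\alpha_1^2\sigma^2$, the increase over $\sigma^2$ noted above.

```python
def explosive_order_one_mse(alpha1, s):
    """Order one term of the s period prediction MSE for |alpha1| > 1,
    eq. (5), in units of sigma^2; the second summand alone is the
    known-parameter MSE."""
    a2 = alpha1**2
    return s**2 * alpha1**(2 * (s - 1)) * (a2 - 1.0) + (a2**s - 1.0) / (a2 - 1.0)

print(explosive_order_one_mse(1.05, 1))  # 1.1025 = 1.05**2, vs 1.0 with alpha_1 known
```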
3. Monte Carlo study

In this section we report the results of a Monte Carlo study of the prediction error for the first-order autoregressive process. To simulate the random variables a sequence of NID(0,1) random variables was generated using the program SUPER DUPER from McGill University [Marsaglia et al. (1976)]. For stationary processes the first observation was generated as

$$Y_0 = (1 - \alpha_1^2)^{-1/2} e_0,$$

and the remaining observations of the sample by

$$Y_t = \alpha_1 Y_{t-1} + e_t, \qquad t = 1, 2, \ldots, n.$$

For the non-stationary processes $Y_0$ was set equal to zero. The error in predicting $Y_{n+1}$ given $Y_n, Y_{n-1}, \ldots, Y_1, Y_0$ can be written as

$$Y_{n+1} - \hat Y_{n+1} = e_{n+1} + (\alpha_0 - \hat\alpha_0) + (\alpha_1 - \hat\alpha_1) Y_n,$$

and

$$E\{(Y_{n+1} - \hat Y_{n+1})^2\} = E\{e_{n+1}^2\} + E\{[(\alpha_0 - \hat\alpha_0) + (\alpha_1 - \hat\alpha_1) Y_n]^2\}.$$

Therefore, it is only necessary to simulate the distribution of $(\alpha_0 - \hat\alpha_0) + (\alpha_1 - \hat\alpha_1) Y_n$ to obtain an estimate of the mean square error of the one period prediction. Similar expressions hold for two and three period predictions.
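The design is straightforward to re-create. The sketch below is ours (a modern generator stands in for SUPER DUPER, so the numbers only approximate the tables); it simulates the estimation part of the error and adds $E\{e_{n+1}^2\} = 1$ analytically, as described above:

```python
import numpy as np

def mc_mse_one_period(alpha1, T, reps=5000, seed=0):
    """Monte Carlo MSE of the one period least squares predictor for the
    alpha_0 = 0 designs of section 3; T observations, n = T - 1 regressions."""
    rng = np.random.default_rng(seed)
    part = np.empty(reps)
    for r in range(reps):
        e = rng.standard_normal(T)
        y = np.empty(T)
        y[0] = e[0] / np.sqrt(1.0 - alpha1**2) if abs(alpha1) < 1 else 0.0
        for t in range(1, T):
            y[t] = alpha1 * y[t - 1] + e[t]
        x, z = y[:-1], y[1:]
        n = len(x)
        a1 = np.sum((x - x.mean()) * (z - z.mean())) / np.sum((x - x.mean())**2)
        a0 = (z.sum() - a1 * x.sum()) / n
        # only the estimation part (alpha_0 - a0) + (alpha_1 - a1) Y_n is simulated
        part[r] = (0.0 - a0) + (alpha1 - a1) * y[-1]
    return 1.0 + np.mean(part**2)

print(mc_mse_one_period(0.5, 20))  # compare with the 1.13 entry of table 1
```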
Table 1 contains the Monte Carlo variances of the error made in predicting the first-order process one, two, and three periods in the future. The predictor is the least squares predictor defined in (3). Because the normal distribution is symmetric the predictors are unbiased and the mean square error of prediction is equal to the variance. The entries in the table are the Monte Carlo variances estimated from 5000 samples. Each parameter-sample size configuration is constructed from an independent set of N(0,1) random variables. That is, the three entries for $s = 1, 2$, and 3 for $\alpha_1 = -1$ and $T = 10$ were constructed from one set of samples, while the three entries for $\alpha_1 = -0.9$ and $T = 10$ were constructed from an independent set of samples. We follow the notation of Orcutt and Winokur, letting $T$ denote the total number of observations available and $n = T - 1$ denote the number of observations used in the regression.

The large sample theory gives $1 + 2n^{-1}$, $1 + 3n^{-1}$, and $\alpha_1^2(1 + n^{-1})$ as the mean square error of one period prediction for $|\alpha_1| < 1$, $|\alpha_1| = 1$, and $|\alpha_1| > 1$, respectively. Generally, for small $n$, the Monte Carlo mean square errors are slightly larger than the theoretical approximations. The agreement between the Monte Carlo results and the theoretical approximation is quite good for $|\alpha_1| = 1$ for all sample sizes. The theoretical variances have a discontinuity at $|\alpha_1| = 1$, while the Monte Carlo variances are much smoother in the vicinity of $|\alpha_1| = 1$. It seems that, for small $n$ and $\alpha_1 \in [-1, 1]$, the largest mean square error of one period prediction occurs for $\alpha_1$ slightly less than 1. The large sample approximations to the variances of the three period prediction error for $n = 19$ and $\alpha_1 = -0.9$, $-0.5$, 0, 0.5, 0.7, 0.95, and 0.99 are 2.820, 1.372, 1.053, 1.503, 1.959, 3.531, and 3.860, respectively. Thus the approximation of Theorem 2 is quite close throughout the range of $\alpha_1$ values with $|\alpha_1| < 1$, though the Monte Carlo results are slightly below the theoretical approximations for $\alpha_1$ close to one. Because, for $|\alpha_1| > 1$, the limiting distribution of $\alpha_1^n(\hat\alpha_1 - \alpha_1)$ is a multiple of a Cauchy random variable, we might expect the distribution of $(\hat\alpha_1^s - \alpha_1^s) Y_n$ to display heavy tails. This seems to be the situation, because the Monte Carlo variance of prediction for $s = 3$ and $\alpha_1 = 1.05$ is larger for $n = 59$ than for $n = 19$. The variance of the prediction error for large positive $\alpha_1$ is much below what one might anticipate on the basis of the mean square error of $\hat\alpha_1$.
Table 1
Monte Carlo mean square error of least squares predictor error (5000 samples).

                     T = 10              T = 20              T = 60              T = ∞
Parameter α₁     s=1   s=2   s=3     s=1   s=2   s=3     s=1   s=2   s=3     s=1   s=2   s=3
 -1.00           1.37  3.05  6.07    1.16  2.40  3.99    1.05  2.13  3.31    1.00  2.00  3.00
 -0.90           1.28  2.33  3.77    1.12  1.99  2.85    1.04  1.87  2.58    1.00  1.81  2.47
 -0.50           1.26  1.48  1.66    1.11  1.32  1.39    1.03  1.27  1.33    1.00  1.25  1.31
  0.00           1.27  1.18  1.17    1.11  1.06  1.06    1.04  1.02  1.02    1.00  1.00  1.00
  0.20           1.29  1.29  1.34    1.12  1.13  1.13    1.03  1.07  1.07    1.00  1.04  1.04
  0.50           1.35  1.71  2.03    1.13  1.42  1.51    1.04  1.30  1.37    1.00  1.25  1.31
  0.70           1.36  2.17  2.94    1.14  1.77  2.14    1.04  1.57  1.85    1.00  1.49  1.73
  0.90           1.38  2.72  4.40    1.18  2.26  3.24    1.05  1.96  2.73    1.00  1.81  2.47
  0.95           1.36  2.75  4.47    1.16  2.36  3.53    1.05  2.08  3.05    1.00  1.90  2.72
  0.99           1.31  2.70  4.46    1.16  2.43  3.77    1.05  2.17  3.32    1.00  1.98  2.94
  1.00           1.32  2.81  4.86    1.16  2.44  3.79    1.05  2.17  3.34    1.00  2.00  3.00
  1.02           1.35  3.02  5.58    1.15  2.50  4.02    1.07  2.30  3.68    1.04  2.21  3.52
  1.05           1.34  3.05  5.55    1.19  2.73  4.65    1.16  2.78  4.97    1.10  2.56  4.44
For example, with $\alpha_1 = 0.9$ and $n = 19$ the Monte Carlo mean square error of $\hat\alpha_1$ is 0.082 and the variance of $Y_n$ is 5.263, while the Monte Carlo variance of the one period prediction error is 1.176. To better understand the result, recall that the variance of the estimation portion of the one period prediction error is

$$E\{[(\alpha_0 - \hat\alpha_0) + (\alpha_1 - \hat\alpha_1) Y_n]^2\}.$$

In fact $|\hat\alpha_1 - \alpha_1|$ is generally smaller for large $|Y_n|$, and, because $\hat\alpha_0 = n^{-1}\sum_{t=1}^n (Y_t - \hat\alpha_1 Y_{t-1})$, the error $(\hat\alpha_0 - \alpha_0)$ is negatively correlated with $(\hat\alpha_1 - \alpha_1) Y_n$.

The estimated standard errors for the mean square errors of table 1 were computed using the common moment formula,

$$\hat V\{\overline{\mathrm{MSE}}\} = (5000)^{-1}(4999)^{-1} \sum_{j=1}^{5000} (\mathrm{MSE}_j - \overline{\mathrm{MSE}})^2,$$

where $\overline{\mathrm{MSE}}$ is the entry in the table and $\mathrm{MSE}_j$ is the squared error computed for the $j$th sample. The estimated standard errors for $T = 10$ were about 0.012, 0.041, and 0.15 for one, two, and three period predictions, respectively. The estimated standard errors for $T = 60$ were about 0.0015, 0.0041, and 0.011 for one, two, and three period predictions, respectively.

The sample percentiles for the distribution of the 't-statistic' defined in Corollary 1.3 are given in table 2. For this simulation $Y_{n+s}$ was generated and the t-statistic constructed from the definition. For one period prediction the percentiles are in reasonable agreement with those of Student's t with $n - 2$ degrees of freedom. The percentiles are greater than those of Student's t for the larger values of $\alpha_1$ and less than those of Student's t otherwise. The smallest percentiles occur when $\alpha_1$ is near zero. The percentiles for three period predictions, given in table 3, deviate considerably from those of Student's t for the smaller sample sizes. The percentiles for $\alpha_1$ near zero are less than those of Student's t while the percentiles for $\alpha_1$ of large absolute value are greater than those of Student's t. For large positive $\alpha_1$ the percentiles are much larger than those of Student's t. The agreement between the theoretical and observed percentiles definitely improves as the sample size is increased. The estimated standard errors were about 0.04 for the estimated five percent points and about 0.08 for the estimated one percent points of table 2.
Table 2
Sample percentiles of regression 't-statistic' for one period prediction (5000 samples).
Probability of a larger value, sign ignored.

                   T = 10          T = 20          T = 60
Parameter α₁     0.05   0.01     0.05   0.01     0.05   0.01
 -1.00           2.31   3.45     2.12   2.92     1.98   2.60
 -0.90           2.32   3.42     2.10   2.89     1.91   2.60
 -0.50           2.29   3.29     2.07   2.84     2.01   2.55
  0.00           2.24   3.33     2.07   2.92     2.00   2.70
  0.20           2.40   3.52     2.05   2.80     2.00   2.71
  0.50           2.39   3.50     2.11   2.84     2.00   2.67
  0.70           2.34   3.41     2.13   2.86     2.00   2.66
  0.90           2.48   3.54     2.17   2.91     2.03   2.73
  0.95           2.50   3.68     2.20   3.01     2.01   2.70
  0.99           2.53   3.79     2.17   2.93     2.04   2.65
  1.00           2.55   3.79     2.20   3.04     2.05   2.74
  1.02           2.51   3.83     2.22   2.93     2.07   2.76
  1.05           2.54   3.56     2.23   2.91     2.06   2.75

t with (n-2) d.f. 2.365  3.499    2.110  2.898    2.002  2.665
The estimated standard errors were about 0.06 for the estimated five percent points and about 0.13 for the estimated one percent points of table 3.

Tables 1 and 2 were constructed under the assumption that no information on the magnitude of $(\alpha_0, \alpha_1)$ was available. If one has knowledge that $|\alpha_1| \leq 1$ and that $\alpha_0 = 0$ if $|\alpha_1| = 1$, it is reasonable to use the estimators

$$\bar\alpha_1 = -1 \quad \text{if} \quad \hat\alpha_1 \leq -1,$$
$$\phantom{\bar\alpha_1} = 1 \quad \text{if} \quad \hat\alpha_1 \geq 1,$$
$$\phantom{\bar\alpha_1} = \hat\alpha_1 \quad \text{otherwise},$$

and

$$\bar\alpha_0 = 0 \quad \text{if} \quad |\bar\alpha_1| = 1,$$
$$\phantom{\bar\alpha_0} = n^{-1}\Big[\sum_{t=1}^n Y_t - \bar\alpha_1 \sum_{t=1}^n Y_{t-1}\Big] \quad \text{otherwise},$$

to construct the predictor

$$\tilde Y_{n+1} = \bar\alpha_0 + \bar\alpha_1 Y_n.$$
Table 3
Sample percentiles of regression 't-statistic' for three period prediction (5000 samples).
Probability of a larger value, sign ignored.

                   T = 10          T = 20          T = 60
Parameter α₁     0.05   0.01     0.05   0.01     0.05   0.01
 -1.00           2.61   4.28     2.30   3.29     2.05   2.85
 -0.90           2.56   3.92     2.22   3.22     2.06   2.76
 -0.50           2.24   3.27     2.06   2.82     2.02   2.68
  0.00           2.17   3.02     2.01   2.85     1.98   2.64
  0.20           2.35   3.39     2.11   2.85     2.00   2.62
  0.50           2.60   3.72     2.19   3.08     2.07   2.77
  0.70           2.90   4.35     2.42   3.40     2.08   2.87
  0.90           3.34   4.94     2.69   3.90     2.16   2.97
  0.95           3.41   5.09     2.75   4.03     2.22   3.00
  0.99           3.56   5.50     2.72   3.85     2.25   3.16
  1.00           3.54   5.65     2.76   3.98     2.22   3.04
  1.02           3.57   5.69     2.75   4.00     2.27   3.04
  1.05           3.66   5.89     2.84   4.21     2.18   2.97

t with (n-2) d.f. 2.365  3.499    2.110  2.898    2.002  2.665
We also consider the predictor

$$Y^*_{n+1,(i)} = \alpha^*_{0(i)} + \alpha^*_{1(i)} Y_n,$$

where

$$\alpha^*_{1(i)} = 1 \quad \text{if} \quad (n-4)^{-1}[(n-i)\hat\alpha_1 + 1] \geq 1,$$
$$\phantom{\alpha^*_{1(i)}} = -1 \quad \text{if} \quad (n-4)^{-1}[(n-i)\hat\alpha_1 + 1] \leq -1,$$
$$\phantom{\alpha^*_{1(i)}} = (n-4)^{-1}[(n-i)\hat\alpha_1 + 1] \quad \text{otherwise},$$

and

$$\alpha^*_{0(i)} = 0 \quad \text{if} \quad |\alpha^*_{1(i)}| = 1,$$
$$\phantom{\alpha^*_{0(i)}} = n^{-1}\sum_{t=1}^n (Y_t - \alpha^*_{1(i)} Y_{t-1}) \quad \text{otherwise}.$$
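For concreteness, a sketch of the truncated and modified estimators (our code; the zero intercept convention at $|\alpha^*_{1(i)}| = 1$ is our reading of the definition above):

```python
import numpy as np

def _ols(y):
    x, z = y[:-1], y[1:]
    n = len(x)
    a1 = np.sum((x - x.mean()) * (z - z.mean())) / np.sum((x - x.mean())**2)
    return x, z, n, a1

def truncated_estimators(y):
    """(alpha0_bar, alpha1_bar): least squares with alpha1 truncated to [-1, 1]."""
    x, z, n, a1 = _ols(y)
    a1_bar = min(1.0, max(-1.0, a1))
    a0_bar = 0.0 if abs(a1_bar) == 1.0 else (z.sum() - a1_bar * x.sum()) / n
    return a0_bar, a1_bar

def star_estimators(y, i):
    """(alpha0*(i), alpha1*(i)): the modified estimator, truncated the same way."""
    x, z, n, a1 = _ols(y)
    a1_star = ((n - i) * a1 + 1.0) / (n - 4)
    a1_star = min(1.0, max(-1.0, a1_star))
    a0_star = 0.0 if abs(a1_star) == 1.0 else np.sum(z - a1_star * x) / n
    return a0_star, a1_star
```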
The estimator $\alpha^*_{1(i)}$, with no truncation, was investigated by Orcutt and Winokur (1969). Gonedes and Roberts (1977) suggested a predictor based upon the first differences of the observations; we denote this predictor by $\hat Y_{n+1,\delta}$.
We also calculated the maximum likelihood estimator under the assumption that the observations form part of a realization of a stationary normal first-order autoregressive process. We denote the predictor constructed from the maximum likelihood estimator by $\hat Y_{n+1,ml}$.

In table 4 we compare the mean square errors of predictor errors for the alternative estimators with $T = 20$. All estimators for a particular sample size-parameter configuration were computed from the same set of samples. As in table 1, an independent set of samples was generated for each sample size-parameter configuration. The ranking of the estimators is essentially the same for other sample sizes. As one would expect, the predictor error of $\tilde Y_{n+1}$ is smaller than that of $\hat Y_{n+1}$ for parameter values close to the boundary of the parameter space. Comparing tables 1 and 4 we see that the reduction in predictor error variance associated with truncation of $\hat\alpha_1$ is greater for $\alpha_1 = -1$ than for $\alpha_1 = 1$. The maximum likelihood predictor is marginally superior to the truncated least squares predictor for $\alpha_1$ in the interior of the parameter space ($-0.9 \leq \alpha_1 \leq 0.9$) for both $s = 1$ and $s = 3$. The truncated least squares estimator is superior at $\alpha_1 = \pm 1$. The predictions based upon $\alpha^*_{1(1)}$ and $\alpha^*_{1(3)}$ are superior to the truncated least squares predictor for $\alpha_1$ close to one, but are inferior for some smaller positive values of $\alpha_1$. One interesting result is the fact that the predictor $\hat Y_{n+1,\delta}$ is dominated by the predictor $Y^*_{n+1,(1)}$. That is, there is no sample size-parameter configuration for which the mean square error of $\hat Y_{n+1,\delta} - Y_{n+1}$ is less than the mean square error of $Y^*_{n+1,(1)} - Y_{n+1}$. This result does not hold for $s = 3$. For the longer prediction period $\hat Y_{n+3,\delta}$ is superior to $Y^*_{n+3,(3)}$ for $\alpha_1 \geq 0.9$.

When comparing alternative estimators, Gonedes and Roberts (1977) chose as a criterion the mean of the squares of the 20 one period prediction errors for the next 20 observations, where the predictions were constructed without updating the estimated parameters used to form the predictions. Thus, for $n = 19$, Gonedes and Roberts constructed the 20 predictions

$$\hat Y_{20+i} = \hat\alpha_{0,(19)} + \hat\alpha_{1,(19)} Y_{19+i}, \qquad i = 0, 1, \ldots, 19,$$

where the subscripts on the estimated parameters indicate that they were constructed from the observations $(Y_0, Y_1, \ldots, Y_{19})$. The Gonedes-Roberts criterion was the expected value of the average of the squares of the associated 20 prediction errors.
Table 4
Monte Carlo mean square error of alternative predictors for T = 20 (5000 samples).

                        One period predictors                      Three period predictors
Parameter α₁     Ỹ     Ŷml   Y*(1)  Y*(3)  Ŷδ          Ỹ     Ŷml   Y*(1)  Y*(3)  Ŷδ
 -1.00           1.08  1.13  1.04   1.08   2.26        3.25  3.50  3.10   3.27   5.60
 -0.90           1.10  1.09  1.09   1.11   2.15        2.75  2.67  2.84   2.76   4.01
 -0.50           1.11  1.10  1.13   1.11   1.86        1.39  1.38  1.45   1.39   1.91
  0.00           1.11  1.11  1.13   1.12   1.60        1.06  1.05  1.06   1.06   1.60
  0.20           1.12  1.11  1.14   1.12   1.48        1.13  1.12  1.15   1.14   1.78
  0.50           1.13  1.12  1.16   1.13   1.33        1.51  1.48  1.65   1.57   2.21
  0.70           1.14  1.12  1.15   1.13   1.22        2.10  2.01  2.31   2.22   2.58
  0.90           1.16  1.14  1.10   1.12   1.11        3.11  2.94  2.99   3.07   2.94
  0.95           1.15  1.14  1.07   1.09   1.09        3.35  3.26  3.10   3.20   3.02
  0.99           1.13  1.14  1.05   1.07   1.06        3.49  3.53  3.15   3.27   3.08
  1.00           1.13  1.15  1.04   1.07   1.06        3.50  3.62  3.16   3.27   3.09
Table 5
Monte Carlo mean square error of one period prediction error by distance between estimation sample and prediction (T = 20).

                              Estimation sample is:
Parameter α₁     Current   One period   Two periods   Five periods   Twenty periods
                           old          old           old            old
 -1.00           1.08      1.09         1.13          1.18           1.54
 -0.90           1.10      1.11         1.12          1.14           1.16
 -0.50           1.11      1.10         1.10          1.10           1.10
  0.00           1.11      1.11         1.11          1.11           1.11
  0.20           1.12      1.12         1.12          1.12           1.12
  0.50           1.13      1.15         1.16          1.17           1.17
  0.70           1.14      1.18         1.21          1.25           1.26
  0.90           1.16      1.24         1.31          1.48           1.78
  0.95           1.15      1.24         1.32          1.55           2.22
  0.99           1.13      1.22         1.32          1.58           2.81
  1.00           1.13      1.23         1.33          1.62           3.08
Table 5 has been constructed to illustrate the effect of not using the most recent data in computing predictions. The first column of the table is the Monte Carlo variance of the one period prediction error when the prediction is constructed from the most recent observations. This column is the same as the column for $\tilde Y_{n+1}$ and $n = 19$ of table 4. The second column is the variance of the prediction error obtained if the sample estimates based on $(Y_0, Y_1, \ldots, Y_{19})$ are used to construct a one period prediction for $Y_{21}$. That is, the second column gives the Monte Carlo variance of $Y_{21} - \tilde Y_{21}$, where $\tilde Y_{21} = \bar\alpha_0 + \bar\alpha_1 Y_{20}$ and $(\bar\alpha_0, \bar\alpha_1)$ are computed from $(Y_0, Y_1, \ldots, Y_{19})$. The third column is the variance of the one period prediction error obtained by using an estimator of $(\alpha_0, \alpha_1)$ that is two periods old, etc.

The effect of using outdated parameter estimates in constructing predictions varies by parameter value. For $\alpha_1 = -1.0$ there is an increase of about 40% in the mean square error as one moves from current estimates to estimates that are 20 periods old. For $|\alpha_1| \leq 0.5$ the Monte Carlo variances of the prediction error for predictions based on current data are similar to those based on ancient data. For large positive $\alpha_1$ there is a considerable increase in variance associated with the failure to use the current data in constructing the one period prediction. This is less surprising when one observes that for $|\alpha_1| = 1$ the mean square error of one period prediction increases without bound as the distance between the estimation sample (of fixed size $n$) and the prediction data increases. If $\alpha_1 = 0.99$, the Monte Carlo mean square error of the Gonedes-Roberts prediction error for the prediction of $Y_{39}$ is 2.81.
If one used the observations $(Y_0, Y_1, \ldots, Y_{38})$ to construct $(\hat\alpha_{0,(38)}, \hat\alpha_{1,(38)})$, the variance of the one period prediction error for $Y_{39}$ would be about 1.08.
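The table 5 experiment can be mimicked as follows (our sketch; the truncated estimators and the $\alpha_0 = 0$ designs are assumed, matching the definition of the first column):

```python
import numpy as np

def outdated_mse(alpha1, T=20, lag=5, reps=5000, seed=1):
    """MSE of a one period prediction made with estimates `lag` periods old:
    estimate from (Y_0, ..., Y_{T-1}), then predict Y_{T+lag} from Y_{T+lag-1}."""
    rng = np.random.default_rng(seed)
    sq = np.empty(reps)
    for r in range(reps):
        e = rng.standard_normal(T + lag + 1)
        y = np.empty(T + lag + 1)
        y[0] = e[0] / np.sqrt(1.0 - alpha1**2) if abs(alpha1) < 1 else 0.0
        for t in range(1, T + lag + 1):
            y[t] = alpha1 * y[t - 1] + e[t]
        x, z = y[:T - 1], y[1:T]            # regression uses Y_0, ..., Y_{T-1}
        n = len(x)
        a1 = np.sum((x - x.mean()) * (z - z.mean())) / np.sum((x - x.mean())**2)
        a1 = min(1.0, max(-1.0, a1))
        a0 = 0.0 if abs(a1) == 1.0 else (z.sum() - a1 * x.sum()) / n
        sq[r] = (y[T + lag] - (a0 + a1 * y[T + lag - 1]))**2
    return sq.mean()

print(outdated_mse(0.9))  # compare with the five periods old column of table 5
```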
4. Conclusions

This study, as well as that of Orcutt and Winokur (1969), shows that the small sample behavior of predictors of the first-order autoregressive process is in better agreement with large sample theory than is the small sample behavior of estimators of the autoregressive parameter. For samples as small as $T = 10$ and autoregressive parameter $\alpha_1 \in [-1.0, 1.05]$ the maximum increase in the variance of the one period least squares predictor error over that for known $\alpha_1$ was less than 40% (38% for $\alpha_1 = -1.0$). At $T = 20$ the maximum increase was 17%. At $T = 10$ the maximum deviation between the order $n^{-1}$ large sample theoretical approximation and the Monte Carlo mean square error was 13% (for $\alpha_1 = 0.90$). At $T = 20$ the maximum deviation between the theoretical and Monte Carlo mean square error for one period prediction was 7% (for $\alpha_1 = 0.90$).

The percentiles of the regression 't-statistic' for one period prediction agreed well with those of Student's t with $n - 2$ degrees of freedom. The regression 't-statistic' for one period prediction has the property that its limiting distribution for the three cases $|\alpha_1| < 1$, $|\alpha_1| = 1$, and $|\alpha_1| > 1$ is that of a normal (0,1) random variable. The behavior of the 't-statistic' was less satisfactory for longer period predictions. A sample size of 60 was required before the distribution of the regression 't-statistic' for the three period predictor was in reasonable agreement with Student's t.

Our study demonstrated the importance of using the most current data in constructing predictions. The loss in efficiency associated with the use of outdated data is greatest for large positive $\alpha_1$. If $T = 20$ and $\alpha_1 = 0.95$ the use of estimates one period old results in an eight percent increase in predictor variance relative to the use of estimates based on current data. If $\alpha_1 = 1$, $T = 20$, and the estimation sample is five periods old, the loss in efficiency is about 43 percent.

It is possible to construct predictors that perform better than the least squares predictor for some values of $\alpha_1$. The predictor constructed from the maximum likelihood estimators of $\alpha_0$ and $\alpha_1$ is superior to the truncated least squares estimator except for $|\alpha_1|$ close to one. All alternative predictors studied were inferior to the least squares predictor for some values of $\alpha_1$. The one period predictor based upon differences suggested by Gonedes and Roberts (1977) was dominated by another estimator and is not recommended.
References

Anderson, T.W., 1959, On asymptotic distributions of estimates of parameters of stochastic difference equations, Annals of Mathematical Statistics 30, 676-687.
Anderson, T.W., 1971, The statistical analysis of time series (Wiley, New York).
Cobas, J.B., 1966, Monte Carlo results for estimation in a stable Markov time series, Journal of the Royal Statistical Society A129, no. 1, 110-116.
Davisson, L.D., 1965, The prediction error of stationary Gaussian time series of unknown covariance, IEEE Transactions on Information Theory 11, 527-532.
Dickey, D.A. and W.A. Fuller, 1979, Distribution of the estimators for autoregressive time series with a unit root, Journal of the American Statistical Association 74, 427-431.
Fuller, W.A., 1976, Introduction to statistical time series (Wiley, New York).
Fuller, W.A. and D.P. Hasza, 1978, Properties of prediction for autoregressive time series, Report to the U.S. Bureau of the Census (Department of Statistics, Iowa State University, Ames, IA).
Gonedes, N.J. and H.V. Roberts, 1977, Differencing of random walks and near random walks, Journal of Econometrics 6, 289-308.
Hasza, D.P., 1977, Estimation in nonstationary time series, Ph.D. dissertation (Iowa State University, Ames, IA).
Kendall, M.G. and A. Stuart, 1966, The advanced theory of statistics, Vol. 3 (Hafner, New York).
Malinvaud, E., 1970, Statistical methods of econometrics (North-Holland, Amsterdam).
Marsaglia, G., K. Ananthanarayanan and N.J. Paul, 1976, Improvements on fast methods for generating normal random variables, Information Processing Letters 5, no. 2, 27-30.
Orcutt, G.H. and H.S. Winokur, 1969, First order autoregression: Inference, estimation, and prediction, Econometrica 37, 1-14.
Phillips, P.C.B., 1979, The sampling distribution of forecasts from a first-order autoregression, Journal of Econometrics 9, 241-262.
Rao, M.M., 1961, Consistency and limit distributions of estimators of parameters in explosive stochastic difference equations, Annals of Mathematical Statistics 32, 195-218.
Salem, A.S., 1971, Investigation of alternative estimators of the parameters of autoregressive processes, Unpublished M.S. thesis (Iowa State University, Ames, IA).
Thornber, H., 1967, Finite sample Monte Carlo studies: An autoregressive illustration, Journal of the American Statistical Association 62, 801-819.
White, J.S., 1958, The limiting distribution of the serial correlation coefficient in the explosive case, Annals of Mathematical Statistics 29, 1188-1197.