
Journal of Econometrics 13 (1980) 139-157. © North-Holland Publishing Company

PREDICTORS FOR THE FIRST-ORDER AUTOREGRESSIVE PROCESS*

Wayne A. FULLER and David P. HASZA

Iowa State University, Ames, IA 50011, USA

Received July 1978, final version received December 1979

*This research was partly supported by Joint Statistical Agreements J.S.A. 78-30 and J.S.A. 79-10 with the U.S. Bureau of the Census.

The error made in predicting a first-order autoregressive process with unknown parameters is investigated. It is shown that the least squares predictor is unbiased for symmetric error distributions. Alternative predictors for stationary and non-stationary processes are studied using the Monte Carlo method. The ordinary least squares statistics perform reasonably well for one period predictions with samples as small as ten for both stationary and non-stationary processes. It is demonstrated that there is a considerable loss in efficiency when outdated estimators are used to construct predictors.

1. Introduction

Let the first-order autoregressive process {Y_t; t = 0, 1, ...} be defined by

    Y_t = α₀ + α₁ Y_{t-1} + e_t,    t = 1, 2, ...,
        = Y₀,                       t = 0,                              (1)

where {e_t; t = 0, 1, ...} is a sequence of independent identically distributed random variables [{e_t} is IID(0, σ²)]. The values of α₀ and α₁, and the form of Y₀, determine the nature of the time series. If |α₁| < 1 and

    Y₀ = (1 − α₁)⁻¹ α₀ + (1 − α₁²)^{-1/2} e₀,                          (2)

the time series is covariance stationary. If the e_t are normally distributed and (2) holds, the time series is a normal strictly stationary time series. If α₁ = 1 the process is sometimes called a random walk. If α₀ ≠ 0 and α₁ = 1 the random walk is said to display 'drift'. If |α₁| > 1 the process is called explosive.
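The three regimes are easy to illustrate by simulation. The following sketch is ours, not part of the original paper; it assumes normal errors with σ² = 1, uses NumPy, and the function name is our own. The stationary start corresponds to (2).

```python
import numpy as np

def simulate_ar1(alpha0, alpha1, n, rng, stationary_start=False):
    """Simulate Y_0, ..., Y_n from model (1): Y_t = alpha0 + alpha1*Y_{t-1} + e_t,
    with e_t iid N(0, 1).  If stationary_start is True (requires |alpha1| < 1),
    Y_0 is drawn as in (2) so that the series is stationary."""
    e = rng.standard_normal(n + 1)
    y = np.empty(n + 1)
    if stationary_start:
        # Y_0 = (1 - alpha1)^{-1} alpha0 + (1 - alpha1^2)^{-1/2} e_0, as in (2)
        y[0] = alpha0 / (1.0 - alpha1) + e[0] / np.sqrt(1.0 - alpha1 ** 2)
    else:
        y[0] = 0.0  # fixed start, as used below for the non-stationary cases
    for t in range(1, n + 1):
        y[t] = alpha0 + alpha1 * y[t - 1] + e[t]
    return y

rng = np.random.default_rng(0)
stationary  = simulate_ar1(0.0, 0.90, 19, rng, stationary_start=True)
random_walk = simulate_ar1(0.0, 1.00, 19, rng)   # alpha1 = 1: random walk
explosive   = simulate_ar1(0.0, 1.05, 19, rng)   # |alpha1| > 1: explosive
```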


Given that (Y₀, Y₁, ..., Y_n) are observed, the estimation of the parameters (α₀, α₁, σ²) has been extensively studied under the assumption that |α₁| < 1. See, for example, Kendall and Stuart (1966, ch. 48), Anderson (1971, ch. 6), Fuller (1976, ch. 8), and references cited by these authors. While estimation of the parameters of the model with |α₁| ≥ 1 has been less extensively studied, considerable information on the large sample distribution of the estimators is available. See White (1958), Anderson (1959), Rao (1961), Dickey and Fuller (1979), Fuller (1976, ch. 8), Hasza (1977), and references cited by these authors.

A number of Monte Carlo studies of the estimators of α₁ given that |α₁| < 1 have been conducted. Examples include Cobas (1966), Thornber (1967), Orcutt and Winokur (1969), Salem (1971), and Gonedes and Roberts (1977). In part, these studies show that the bias in the least squares estimator α̂₁ is approximately n⁻¹(−1 − 3α₁) and that the Monte Carlo variance of α̂₁ for large positive α₁ is greater than large sample theory suggests.

Somewhat less extensive literature is available on the properties of predictions for the first-order process. Davisson (1965) obtained the mean square error through terms of order n⁻¹ for one period prediction of the zero mean normal stationary autoregressive process. Fuller and Hasza (1978) derived the mean square error of prediction through terms of order n⁻¹ for processes with |α₁| < 1, |α₁| = 1, and |α₁| > 1. They also demonstrated that the ordinary regression estimator of the variance of a prediction was applicable in large samples for all three cases. Orcutt and Winokur (1969) and Gonedes and Roberts (1977) have conducted Monte Carlo studies of the prediction error. Gonedes and Roberts considered |α₁| < 1, but Orcutt and Winokur included α₁ = 1 in their study. Phillips (1979) developed an Edgeworth-type approximation for the distribution of the prediction error conditional upon Y_n for the model with α₀ = 0.

In this note we investigate the properties of predictors of the next observation(s) in a realization of the process (1). We consider the three cases |α₁| < 1, |α₁| = 1, and |α₁| > 1. The Monte Carlo mean square errors of the predictor errors are in reasonable agreement with the theoretically developed approximations. The Monte Carlo mean square errors are slightly larger than the theoretical approximations for |α₁| close to, but less than, one. The distribution of the 'regression t-statistic' for the one period predictor error is close to that suggested by the theoretical approximation. The variance of the 't-statistic' is much larger than the theoretical approximation for predictions more than one period ahead made with small samples for processes with large positive α₁.

2. Theoretical results

In this section we give some properties of predictors constructed for the autoregressive process. The first result establishes that the expected value of the predictor error is zero for a wide class of processes, including stationary normal processes. We note that Malinvaud (1970, p. 355) stated that the predictor for the zero mean first-order process was unbiased for symmetric error distributions.

Theorem 1. Let model (1) hold with |α₁| < 1. Let Y₀ be a (possibly degenerate) random variable symmetrically distributed about the mean μ = (1 − α₁)⁻¹α₀ and with finite variance. Let {e_t} be a sequence of IID(0, σ²) random variables with a symmetric distribution. Let {e_t} be independent of Y₀. Define

    Ŷ_{n+s} = α̂₀ + α̂₁ Y_n,          s = 1,
            = α̂₀ + α̂₁ Ŷ_{n+s-1},     s = 2, 3, ...,                   (3)

where α̂₁ is the least squares estimator of α₁ obtained from the regression of Y_t on Y_{t-1} and

    α̂₀ = n⁻¹ [Σ_{t=1}^n Y_t − α̂₁ Σ_{t=1}^n Y_{t-1}].

Let the sample size n and the distribution of e_t be such that E{|Ŷ_{t+s}|} < ∞ for t = 0, 1, ..., n, and s a positive integer. Then

    E{Y_{n+s} − Ŷ_{n+s}} = 0.

Proof. The predictor computed for X_t = Y_t + A is X̂_{n+s} = Ŷ_{n+s} + A for any real number A. Therefore we assume, with no loss of generality, that μ = 0. The predictor error is

    Y_{n+s} − Ŷ_{n+s} = Σ_{j=0}^{s-1} α₁^j e_{n+s-j} − α̂₀ Σ_{j=0}^{s-1} α̂₁^j + (α₁^s − α̂₁^s) Y_n,

and, as E{e_t} = 0 for all t, we need only evaluate

    E{−α̂₀ Σ_{j=0}^{s-1} α̂₁^j + (α₁^s − α̂₁^s) Y_n}.

Let ψ_Y = {Y₀, Y₁, Y₂, ..., Y_n} denote a sample realization and let ψ_Y* = {−Y₀, −Y₁, −Y₂, ..., −Y_n}. Because α̂₁ is an even function of ψ_Y, the α̂₁ computed from ψ_Y is equal to that computed from ψ_Y*. Likewise α̂₀ is an odd function of ψ_Y. Therefore the Ŷ_{n+s} computed from ψ_Y is the negative of that computed from ψ_Y*, and the quantity

    −α̂₀ Σ_{j=0}^{s-1} α̂₁^j + (α₁^s − α̂₁^s) Y_n

for the sample ψ_Y is the negative of that for the sample ψ_Y*. The result follows because Y_{n+s} − Ŷ_{n+s} has a symmetric distribution. □

If α₀ = 0 the predictor is unbiased for both non-stationary and stationary processes.
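As a concrete illustration of the estimators and of the recursion (3), the following sketch is our own (the names `ls_estimates` and `predict` are ours, not the paper's); it computes (α̂₀, α̂₁) from a sample (Y₀, ..., Y_n) and iterates the predictor s periods ahead.

```python
import numpy as np

def ls_estimates(y):
    """Least squares estimates of (alpha0, alpha1) from the regression of
    Y_t on Y_{t-1}, t = 1, ..., n, where y = (Y_0, ..., Y_n)."""
    yt, ylag = y[1:], y[:-1]
    n = len(yt)
    a1 = (np.sum((yt - yt.mean()) * (ylag - ylag.mean()))
          / np.sum((ylag - ylag.mean()) ** 2))
    a0 = (np.sum(yt) - a1 * np.sum(ylag)) / n   # as in Theorem 1
    return a0, a1

def predict(y, s):
    """s-period least squares predictor (3): Yhat_{n+1} = a0 + a1*Y_n,
    and Yhat_{n+j} = a0 + a1*Yhat_{n+j-1} for j = 2, ..., s."""
    a0, a1 = ls_estimates(y)
    yhat = y[-1]
    for _ in range(s):
        yhat = a0 + a1 * yhat
    return yhat
```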

Corollary 1.1. Let model (1) hold with α₀ = 0. Let Y₀ be a (possibly degenerate) random variable with symmetric distribution, zero mean, and finite variance. Let {e_t} be a sequence of IID(0, σ²) random variables with symmetric distribution and {e_t} independent of Y₀. Let Ŷ_{n+s} be defined as in Theorem 1. Let the sample size n and the distribution of e_t be such that E{|Ŷ_{t+s}|} < ∞ for t = 0, 1, ..., n, where s is a positive integer. Then

    E{Y_{n+s} − Ŷ_{n+s}} = 0.

Proof. The proof rests upon the symmetry of the distribution of e_t and parallels that of Theorem 1. □

If α₁ is one, the predictor is unbiased for all values of Y₀.

Corollary 1.2. Let model (1) hold with α₀ = 0 and α₁ = 1. Let {e_t} be a sequence of IID(0, σ²) random variables with symmetric distribution and {e_t} independent of Y₀. Let Ŷ_{n+s} be as defined in Theorem 1. Let the sample size n and the distribution of e_t be such that E{|Ŷ_{t+s}|} < ∞ for t = 0, 1, ..., n, and s a positive integer. Then

    E{Y_{n+s} − Ŷ_{n+s} | Y₀} = 0.

Proof. We have Y_t = Y₀ + X_t, where

    X_t = Σ_{j=1}^t e_j.

It follows that

    Y_{n+s} − Ŷ_{n+s} = X_{n+s} − X̂_{n+s},

where X̂_{n+s} is defined by (3) with X_t replacing Y_t. The predictor is unbiased for X_{n+s} by Corollary 1.1. □

It is also of interest that the 'regression t-statistic' one would use to construct a confidence interval for the prediction has zero expectation.

Corollary 1.3. Let the assumptions of Theorem 1, of Corollary 1.1, or of Corollary 1.2 hold. Let

    t̂_{n+s} = (Y_{n+s} − Ŷ_{n+s}) {σ̂² [Σ_{j=0}^{s-1} α̂₁^{2j} + a²_{0s} C₀₀ + 2 a_{0s} a_{1s} C₀₁ + a²_{1s} C₁₁]}^{-1/2},

where

    σ̂² = (n − 2)⁻¹ Σ_{t=1}^n (Y_t − α̂₀ − α̂₁ Y_{t-1})²,

    C₀₀ = D⁻¹ Σ_{t=1}^n Y²_{t-1},    C₀₁ = −D⁻¹ Σ_{t=1}^n Y_{t-1},    C₁₁ = D⁻¹ n,

    D = n Σ_{t=1}^n Y²_{t-1} − (Σ_{t=1}^n Y_{t-1})²,

a_{0s} is the partial derivative of Ŷ_{n+s} with respect to α₀ evaluated at (α̂₀, α̂₁), and a_{1s} is the partial derivative of Ŷ_{n+s} with respect to α₁ evaluated at (α̂₀, α̂₁). Then

    E{t̂_{n+s}} = 0.
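The statistic of Corollary 1.3 can be assembled directly from these definitions. The sketch below is ours, reusing the hypothetical `ls_estimates` and `predict` from the earlier sketch; it computes σ̂², D, the C's, and the two partial derivatives for a given horizon s.

```python
import numpy as np

def prediction_t_statistic(y, s, y_future):
    """Regression 't-statistic' of Corollary 1.3 for the s-period prediction,
    given the realized future value y_future = Y_{n+s}."""
    a0, a1 = ls_estimates(y)
    yt, ylag = y[1:], y[:-1]
    n = len(yt)
    resid = yt - a0 - a1 * ylag
    sigma2 = np.sum(resid ** 2) / (n - 2)
    D = n * np.sum(ylag ** 2) - np.sum(ylag) ** 2
    C00, C01, C11 = np.sum(ylag ** 2) / D, -np.sum(ylag) / D, n / D
    # partial derivatives of Yhat_{n+s} with respect to alpha0 and alpha1,
    # evaluated at the estimates
    a0s = sum(a1 ** j for j in range(s))
    a1s = a0 * sum(j * a1 ** (j - 1) for j in range(1, s)) + s * a1 ** (s - 1) * y[-1]
    var_hat = sigma2 * (sum(a1 ** (2 * j) for j in range(s))
                        + a0s ** 2 * C00 + 2 * a0s * a1s * C01 + a1s ** 2 * C11)
    return (y_future - predict(y, s)) / np.sqrt(var_hat)
```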

Proof. It has been established that Ŷ_{n+s} and α̂₀ are odd functions of the sample while α̂₁ is an even function. It is clear that σ̂², D, C₀₀, C₁₁, and Σ_{j=0}^{s-1} α̂₁^{2j} are even functions while C₀₁ is an odd function. The partial derivative

    a_{0s} = Σ_{j=0}^{s-1} α̂₁^j

is an even function of the sample, while

    a_{1s} = α̂₀ Σ_{j=1}^{s-1} j α̂₁^{j-1} + s α̂₁^{s-1} Y_n

is an odd function of the sample. The result follows by the symmetry argument of Theorem 1. □

Theorem 1 and the corollaries were presented for the first-order process, but they extend immediately to higher-order processes. Theorems 2 and 3 follow from the results of Fuller and Hasza (1978) and are presented without proof.

Theorem 2. Let model (1) hold with |α₁| < 1. Let {e_t} be a sequence of normal independent (0, σ²) random variables and let Y₀ be defined by (2). Let Ŷ_{n+s} be defined by (3). Then

    E{(Y_{n+s} − Ŷ_{n+s})²} = σ² [Σ_{j=0}^{s-1} α₁^{2j} + n⁻¹ s² α₁^{2(s-1)} + n⁻¹ (Σ_{j=0}^{s-1} α₁^j)²] + O(n^{-3/2}).

Two interesting special cases are associated with Theorem 2. The mean square error of the one period prediction error is approximately σ²(1 + 2n⁻¹). The limiting value of the prediction mean square error, as s becomes large, is the variance of the process plus the variance of the estimated mean.
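The approximation of Theorem 2 is simple to evaluate numerically; the following sketch is ours. For s = 1 it reduces to σ²(1 + 2n⁻¹).

```python
def mse_approx_stationary(alpha1, n, s, sigma2=1.0):
    """Order n^{-1} approximation of Theorem 2 to the s-period prediction MSE."""
    base = sum(alpha1 ** (2 * j) for j in range(s))        # known-parameter term
    est1 = s ** 2 * alpha1 ** (2 * (s - 1))                # slope-estimation term
    est2 = sum(alpha1 ** j for j in range(s)) ** 2         # intercept (mean) term
    return sigma2 * (base + (est1 + est2) / n)

# For n = 19, s = 3, alpha1 = 0.5 this gives about 1.503, matching the
# three period values quoted in section 3.
```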

Theorem 3. Let model (1) hold with |α₁| = 1 and Y₀ fixed. Let {e_t} be a sequence of IID(0, σ²) random variables. Then

    E{(Y_{n+1} − Ŷ_{n+1})²} ≈ σ²(1 + 3n⁻¹)    if α₁ = ±1, α₀ = 0,
                            ≈ σ²(1 + 4n⁻¹)    if α₁ = 1,  α₀ ≠ 0,

where the approximation arises from the deletion of higher-order terms and from the numerical evaluation of the order n⁻¹ term.

Theorem 4 gives the order n⁻¹ approximation to the prediction error for the explosive process:

Theorem 4. Let model (1) hold with |α₁| > 1. Let {e_t} be a sequence of IID(0, σ²) random variables. Then

    Y_{n+1} − Ŷ_{n+1} = u_{n+1} + O_p(n^{-3/2}),                       (4)

where

    u_{n+1} = e_{n+1} − [(α₁² − 1) − n⁻¹ α₁(α₁ + 1)] L_n − α₁ ē_n,

    L_n = Σ_{t=1}^n α₁^{-(n-t+1)} e_t,

and ē_n = n⁻¹ Σ_{t=1}^n e_t.

Proof. We have

    Y_{n+1} − Ŷ_{n+1} = e_{n+1} − (α̂₀ − α₀) − (α̂₁ − α₁) Y_n.

Now

    Σ_{t=1}^n Y²_{t-1} − n⁻¹ (Σ_{t=1}^n Y_{t-1})² = α₁^{2n} X² [(α₁² − 1)⁻¹ − n⁻¹(α₁ − 1)⁻²] + O_p(|α₁|^n),

where

    X = Y₀ + α₀(α₁ − 1)⁻¹ + Σ_{j=1}^n α₁^{-j} e_j,

and similarly

    Σ_{t=1}^n Y_{t-1} e_t − n⁻¹ Σ_{t=1}^n Y_{t-1} Σ_{t=1}^n e_t = α₁^n X [L_n − (α₁ − 1)⁻¹ ē_n] + O_p(n),

so that

    α̂₁ − α₁ = α₁^{-n} X⁻¹ [L_n − (α₁ − 1)⁻¹ ē_n] [(α₁² − 1)⁻¹ − n⁻¹(α₁ − 1)⁻²]⁻¹ + O_p(n |α₁|^{-2n}).

We have ē_n = O_p(n^{-1/2}) and

    α̂₀ − α₀ = ē_n − n⁻¹(α₁ + 1) L_n + O_p(n^{-3/2}).

Therefore

    Y_{n+1} − Ŷ_{n+1} = e_{n+1} − (α₁² − 1){L_n + [ē_n − n⁻¹(α₁ + 1)L_n](α₁ − 1)⁻¹} + ē_n − n⁻¹(α₁ + 1)L_n + O_p(n^{-3/2})
                      = e_{n+1} − L_n[(α₁² − 1) − n⁻¹ α₁(α₁ + 1)] − α₁ ē_n + O_p(n^{-3/2}).

We have

    E{ē_n²} = n⁻¹ σ²,

    E{L_n²} = Σ_{t=1}^n α₁^{-2(n-t+1)} σ² = (α₁² − 1)⁻¹ σ² + O(|α₁|^{-2n}),

    E{L_n ē_n} = n⁻¹(α₁ − 1)⁻¹ σ² + O(n⁻¹ |α₁|^{-n}).

Therefore,

    σ⁻² E{u²_{n+1}} = 1 + (α₁² − 1)⁻¹ [(α₁² − 1) − n⁻¹ α₁(α₁ + 1)]² + α₁² n⁻¹
                      + 2 α₁ n⁻¹ (α₁ − 1)⁻¹ [(α₁² − 1) − n⁻¹ α₁(α₁ + 1)] + O(n⁻¹ |α₁|^{-n})
                    = α₁²(1 + n⁻¹) − n⁻² α₁² (α₁ − 1)⁻¹ (α₁ + 1) + O(n⁻¹ |α₁|^{-n}). □

The explosive case is interesting in that estimation of the parameters produces an increase in the leading term of the prediction mean square error. Thus the order one term with estimated α₁ is α₁²σ² instead of the value σ² associated with known α₁. The order one term in the mean square error of prediction for s periods ahead is

    σ² {s² α₁^{2(s-1)} (α₁² − 1) + (α₁^{2s} − 1)(α₁² − 1)⁻¹}.           (5)
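Expression (5) is likewise easy to evaluate; in our sketch below the second term is the mean square error with known α₁, so the first term can be read as the order one penalty for estimating the parameters. For s = 1 the expression reduces to σ²α₁².

```python
def explosive_order_one_term(alpha1, s, sigma2=1.0):
    """Order one term (5) of the s-period prediction MSE for |alpha1| > 1
    with estimated parameters."""
    penalty = s ** 2 * alpha1 ** (2 * (s - 1)) * (alpha1 ** 2 - 1)
    known   = (alpha1 ** (2 * s) - 1) / (alpha1 ** 2 - 1)
    return sigma2 * (penalty + known)
```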

Fuller and Hasza (1978) have shown that the ordinary regression estimator of the variance of the predictor given in the denominator of t̂_{n+s} is an appropriate estimator of the mean square error of the autoregressive predictor. They demonstrated that the order n⁻¹ term of the least squares estimator was estimating the order n⁻¹ term of the mean square error for several models. It follows that, in large samples, t̂_{n+s} can be used to set confidence intervals for predictors irrespective of the magnitude of α₁ when the e_t are assumed to have a normal distribution.

3. Monte Carlo study

In this section we report the results of a Monte Carlo study of the prediction error for the first-order autoregressive process. To simulate the random variables a sequence of NID(0, 1) random variables was generated using the program SUPER DUPER from McGill University [Marsaglia et al. (1976)]. For stationary processes the first observation was generated as

    Y₀ = (1 − α₁²)^{-1/2} e₀,

and the remaining observations of the sample by

    Y_t = α₁ Y_{t-1} + e_t,    t = 1, 2, ..., n.

For the non-stationary processes Y₀ was set equal to zero. The error in predicting Y_{n+1} given Y_n, Y_{n-1}, ..., Y₁, Y₀ can be written as

    Y_{n+1} − Ŷ_{n+1} = e_{n+1} + (α₀ − α̂₀) + (α₁ − α̂₁) Y_n,

and

    E{(Y_{n+1} − Ŷ_{n+1})²} = E{e²_{n+1}} + E{[(α₀ − α̂₀) + (α₁ − α̂₁) Y_n]²}.

Therefore, it is only necessary to simulate the distribution of

    (α₀ − α̂₀) + (α₁ − α̂₁) Y_n

to obtain an estimate of the mean square error of the one period prediction error. Similar expressions hold for two and three period predictions.

to obtain an estimate of the mean square error of the one period prediction error. Similar expressions hold for two and three period predictions.


Table 1 contains the Monte Carlo variances of the error made in predicting the first-order process one, two, and three periods in the future. The predictor is the least squares predictor defined in (3). Because the normal distribution is symmetric the predictors are unbiased and the mean square error of prediction is equal to the variance. The entries in the table are the Monte Carlo variances estimated from 5000 samples. Each parameter-sample size configuration is constructed from an independent set of N(0, 1) random variables. That is, the three entries for s = 1, 2, and 3 for α₁ = −1 and T = 10 were constructed from one set of samples, while the three entries for α₁ = −0.9 and T = 10 were constructed from an independent set of samples. We follow the notation of Orcutt and Winokur, letting T denote the total number of observations available and n = T − 1 denote the number of observations used in the regression.

The large sample theory gives 1 + 2n⁻¹, 1 + 3n⁻¹, and α₁²(1 + n⁻¹) as the mean square error of one period prediction for |α₁| < 1, |α₁| = 1, and |α₁| > 1, respectively. Generally, for small n, the Monte Carlo mean square errors are slightly larger than the theoretical approximations. The agreement between the Monte Carlo results and the theoretical approximation is quite good for |α₁| = 1 for all sample sizes. The theoretical variances have a discontinuity at |α₁| = 1, while the Monte Carlo variances are much smoother in the vicinity of |α₁| = 1. It seems that, for small n and α₁ ∈ [−1, 1], the largest mean square error of one period prediction occurs for α₁ slightly less than 1.

The large sample approximations to the variances of the three period prediction error for n = 19 and α₁ = −0.9, −0.5, 0, 0.5, 0.7, 0.95, and 0.99 are 2.820, 1.372, 1.053, 1.503, 1.959, 3.531, and 3.860, respectively. Thus the approximation of Theorem 2 is quite close throughout the range of α₁ values with |α₁| < 1, although the Monte Carlo results are slightly below the theoretical approximations for |α₁| close to one. Because, for |α₁| > 1, the limiting distribution of α₁^n(α̂₁ − α₁) is a multiple of a Cauchy random variable, we might expect the distribution of (α̂₁^s − α₁^s)Y_n to display heavy tails. This seems to be the situation, because the Monte Carlo variance of prediction for s = 3 and α₁ = 1.05 is larger for n = 59 than for n = 19.

The variance of the prediction error for large positive α₁ is much below what one might anticipate on the basis of the mean square error of α̂₁.

Table 1
Monte Carlo mean square error of least squares predictor error (5000 samples).

             T = 10              T = 20              T = 60              T = ∞
α₁        s=1   s=2   s=3     s=1   s=2   s=3     s=1   s=2   s=3     s=1   s=2   s=3
-1.00     1.37  3.05  6.07    1.16  2.40  3.99    1.05  2.13  3.31    1.00  2.00  3.00
-0.90     1.28  2.33  3.77    1.12  1.99  2.85    1.04  1.87  2.58    1.00  1.81  2.47
-0.50     1.26  1.48  1.66    1.11  1.32  1.39    1.03  1.27  1.33    1.00  1.25  1.31
 0.00     1.27  1.18  1.17    1.11  1.06  1.06    1.04  1.02  1.02    1.00  1.00  1.00
 0.20     1.29  1.29  1.34    1.12  1.13  1.13    1.03  1.07  1.07    1.00  1.04  1.04
 0.50     1.35  1.71  2.03    1.13  1.42  1.51    1.04  1.30  1.37    1.00  1.25  1.31
 0.70     1.36  2.17  2.94    1.14  1.77  2.14    1.04  1.57  1.85    1.00  1.49  1.73
 0.90     1.38  2.72  4.40    1.18  2.26  3.24    1.05  1.96  2.73    1.00  1.81  2.47
 0.95     1.36  2.75  4.47    1.16  2.36  3.53    1.05  2.08  3.05    1.00  1.90  2.72
 0.99     1.31  2.70  4.46    1.16  2.43  3.77    1.05  2.17  3.32    1.00  1.98  2.94
 1.00     1.32  2.81  4.86    1.16  2.44  3.79    1.05  2.17  3.34    1.00  2.00  3.00
 1.02     1.35  3.02  5.58    1.15  2.50  4.02    1.07  2.30  3.68    1.04  2.21  3.52
 1.05     1.34  3.05  5.55    1.19  2.73  4.65    1.16  2.78  4.97    1.10  2.56  4.44


For example, with α₁ = 0.9 and n = 19 the Monte Carlo mean square error of α̂₁ is 0.082 and the variance of Y_n is 5.263, while the Monte Carlo variance of the one period prediction error is 1.176. To better understand the result, recall that the variance of the estimation portion of the one period prediction error is

    E{[(α₀ − α̂₀) + (α₁ − α̂₁) Y_n]²}.

In fact |α̂₁ − α₁| is generally smaller for large |Y_n|, and

    α̂₀ = n⁻¹ Σ_{t=1}^n (Y_t − α̂₁ Y_{t-1})

is negatively correlated with (α̂₁ − α₁)Y_n.

The estimated standard errors for the mean square errors of table 1 were computed using the common moment formula

    V̂{MSE} = (5000)⁻¹ (4999)⁻¹ Σ_{j=1}^{5000} (MSE_j − MSE)²,

where MSE is the entry in the table and MSE_j is the squared error computed for the jth sample. The estimated standard errors for T = 10 were about 0.012, 0.041, and 0.15 for one, two, and three period predictions, respectively. The estimated standard errors for T = 60 were about 0.0015, 0.0041, and 0.011 for one, two, and three period predictions, respectively.

The sample percentiles for the distribution of the 't-statistic' defined in Corollary 1.3 are given in table 2. For this simulation Y_{n+s} was generated and the t-statistic constructed from the definition. For one period prediction the percentiles are in reasonable agreement with those of Student's t with n−2 degrees of freedom. The percentiles are greater than those of Student's t for the larger values of α₁ and less than those of Student's t otherwise. The smallest percentiles occur when α₁ is near zero. The percentiles for three period predictions, given in table 3, deviate considerably from those of Student's t for the smaller sample sizes. The percentiles for α₁ near zero are less than those of Student's t while the percentiles for α₁ of large absolute value are greater than those of Student's t. For large positive α₁ the percentiles are much larger than those of Student's t. The agreement between the theoretical and observed percentiles definitely improves as the sample size is increased. The estimated standard errors were about 0.04 for the estimated five percent points and about 0.08 for the estimated one percent points of table 2.
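The percentile estimates of tables 2 and 3 can be reproduced in outline as follows. This sketch is ours, reusing `simulate_ar1` and `prediction_t_statistic` from the earlier sketches; since the probabilities are two-sided with sign ignored, the 0.05 and 0.01 points are the 0.95 and 0.995 quantiles of |t̂_{n+s}|.

```python
import numpy as np

def t_percentiles(alpha1, T, s, probs=(0.95, 0.995), n_samples=5000, seed=0):
    """Monte Carlo percentiles of |t_{n+s}| (alpha0 = 0, sigma^2 = 1)."""
    rng = np.random.default_rng(seed)
    n = T - 1
    ts = np.empty(n_samples)
    for i in range(n_samples):
        y = simulate_ar1(0.0, alpha1, n + s, rng,
                         stationary_start=abs(alpha1) < 1)
        # estimate from the first n+1 observations; Y_{n+s} is the realized future
        ts[i] = prediction_t_statistic(y[:n + 1], s, y[n + s])
    return np.quantile(np.abs(ts), probs)
```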

Table 2
Sample percentiles of regression 't-statistic' for one period prediction (5000 samples). Probability of a larger value, sign ignored.

             T = 10          T = 20          T = 60
α₁         0.05    0.01    0.05    0.01    0.05    0.01
-1.00      2.31    3.45    2.12    2.92    1.98    2.60
-0.90      2.32    3.42    2.10    2.89    1.91    2.60
-0.50      2.29    3.29    2.07    2.84    2.01    2.55
 0.00      2.24    3.33    2.07    2.92    2.00    2.70
 0.20      2.40    3.52    2.05    2.80    2.00    2.71
 0.50      2.39    3.50    2.11    2.84    2.00    2.67
 0.70      2.34    3.41    2.13    2.86    2.00    2.66
 0.90      2.48    3.54    2.17    2.91    2.03    2.73
 0.95      2.50    3.68    2.20    3.01    2.01    2.70
 0.99      2.53    3.79    2.17    2.93    2.04    2.65
 1.00      2.55    3.79    2.20    3.04    2.05    2.74
 1.02      2.51    3.83    2.22    2.93    2.07    2.76
 1.05      2.54    3.56    2.23    2.91    2.06    2.75

t with
(n−2) d.f. 2.365   3.499   2.110   2.898   2.002   2.665

The estimated standard errors were about 0.06 for the estimated five percent points and about 0.13 for the estimated one percent points of table 3.

Tables 1 and 2 were constructed under the assumption that no information on the magnitude of (α₀, α₁) was available. If one has knowledge that |α₁| ≤ 1 and that α₀ = 0 if |α₁| = 1, it is reasonable to use the estimators

    α̃₁ = −1     if α̂₁ ≤ −1,
       = 1      if α̂₁ ≥ 1,
       = α̂₁     otherwise,

    α̃₀ = 0                                               if |α̃₁| = 1,
       = n⁻¹ [Σ_{t=1}^n Y_t − α̃₁ Σ_{t=1}^n Y_{t-1}]     otherwise,

to construct the predictor

    Ỹ_{n+1} = α̃₀ + α̃₁ Y_n.

We also consider the predictor

    Y*_{n+1,(i)} = α₀*(i) + α₁*(i) Y_n,
Table Sample

percentiles

of regression

‘t-statistic’

T= 10

3

for three period

prediction

T=20 Probability

(5000 samples). T=60

of a larger

value, sign ignored

Parameter

0.05

0.01

0.05

0.01

0.05

0.01

-1.00 - 0.90 -0.50 0.00 0.20 0.50 0.70 0.90 0.95 0.99 1.00 1.02 1.05

2.61 2.56 2.24 2.17 2.35 2.60 2.90 3.34 3.41 3.56 3.54 3.57 3.66

4.28 3.92 3.27 3.02 3.39 3.72 4.35 4.94 5.09 5.50 5.65 5.69 5.89

2.30 2.22 2.06 2.01 2.11 2.19 2.42 2.69 2.75 2.72 2.76 2.75 2.84

3.29 3.22 2.82 2.85 2.85 3.08 3.40 3.90 4.03 3.85 3.98 4.00 4.21

2.05 2.06 2.02 1.98 2.00 2.07 2.08 2.16 2.22 2.25 2.22 2.27 2.18

2.85 2.76 2.68 2.64 2.62 2.77 2.87 2.97 3.00 3.16 3.04 3.04 2.97

t with (n-2) d.f.

2.365

3.499

2.110

2.898

2.002

2.665

where ay(i, = 1 zz- 1 = (n--4)-’

=n

-’

i

[(n-i)&,

+ l]

(k;-U:(i,x_I)

if

(n-4)-‘[(n-i)oi,+l]zl,

if

(,7-4)-‘[(n-i)&,+1]5-1,

otherwise,

otherwise.

t=1

The estimator ~r:~~, with no truncation Winokur (1969). Gonedes and Roberts (1977) suggested

where

was

investigated

the predictor

by Orcutt

and

WA. Fuller and D.P. Husza, Predictorsforprst-order

AR processes

153

We also calculated the maximum likelihood estimator under the assumption that the observations form part of a realization of a stationary We denote the predictor normal first-order autoregressive process. constructed from the maximum likelihood estimator by yn+ I,m,. In table 4 we compare the mean square errors of predictor errors for the alternative estimators with T=20. All estimators for a particular sample sizeparameter configuration were computed from the same set of samples. AS in table 1, an independent set of samples was generated for each sample sizeparameter configuration. The ranking of the estimators is essentially the same for other sample sizes. As one would expect, the predictor error of ?,,+ 1 values close to the boundary of is smaller than that of p,+ , for parameter the parameter space. Comparing tables 1 and 4 we see that the reduction in predictor error variance associated with truncation of oi, is greater for CI,= -1 than for a,=l. The maximum likelihood predictor is marginally superior to the truncated least squares predictor for CI~ in the interior of the parameter space (-0.9 scr, 50.9) for both s = 1 add s = 3. The truncated least squares estimator is superior at u1 = k 1. The predictions based upon ET(~, and CY~(~,are superior to the truncated least squares predictor for c(~ close to one, but are inferior for some smaller positive values of cI,. One interesting result is the fact that the estimator t+,,, is dominated by the estimator Y,*+ , , t1,. That is, there is no sample size-parameter configuration for which the mean square error of pn+ 1.6- Y,, , is less than the mean square error of Y,*+ , ,C1J- Y. + 1. This result does not hold for s= 3. For the longer prediction period t+,,, is superior to Y,*+,,(,, for CI~~0.9. When comparing alternative estimators, Gonedes and Roberts (1977) chose as a criterion the mean of the squares of the 20 one period prediction errors for the next 20 observations, where the predictions were constructed without updating the estimated parameters used to form the predictions. Thus for n = 19, Gonedes and Roberts constructed the 20 predictions

where the subscripts on the estimated parameters constructed from observations ( YO,Y,, . . ., Y,,). criterion was the expected value of

indicate that they were The Gonedes-Roberts

Table 5 has been constructed to illustrate the effect of not using the most recent data in computing predictions. The first column of the table is the

Table 4

y:+,,,3, 1.08 1.11 1.11 1.12 1.12 1.13 1.13 1.12 1.09 1.07 1.07

O+, 1.08 1.10 1.11 1.11 1.12 1.13 1.14 1.16 1.15 1.13 1.13

t+,.,I

1.13 1.09 1.10 1.11 1.11 1.12 1.12 1.14 1.14 1.14 1.15

Parameter

-1.00 - 0.90 - 0.50 0.00 0.20 0.50 0.70 0.90 0.95 0.99 1.00

One period predictor

1.07 1.05 1.04

1.10

1.14 1.16 1.15

1.13

1.04 1.09 1.13

y~+l,(l,

1.09 1.06 1.06

1.11

1.48 1.33 1.22

1.60

2.26 2.15 1.86

1+,.,

3.26 3.53 3.62

2.94

1.12 1.48 2.01

1.05

3.50 2.67 1.38

t+,.,,

3.35 3.49 3.50

3.11

1.13 1.51 2.10

1.06

3.25 2.75 1.39

Z+,

Three period predictor

3.20 3.27 3.27

3.07

1.14 1;57 2.22

1.06

3.27 2.76 1.39

y.*+,.,a,

Monte Carlo mean square error of alternative predictors for t = 20 (5000 samples).

3.10 3.15 3.16

2.99

1.15 1.65 2.31

1.06

3.10 2.84 1.45

3.02 3.08 3.09

2.94

1.78 2.21 2.58

1.60

5.60 4.01 1.91

y:+,,u, t+3,,


Table 5
Monte Carlo mean square error of one period prediction error by distance between estimation sample and prediction sample (T = 20).

          Estimate is:
α₁        Current   One period   Two periods   Five periods   Twenty periods
                    old          old           old            old
-1.00     1.08      1.09         1.13          1.18           1.54
-0.90     1.10      1.11         1.12          1.14           1.16
-0.50     1.11      1.10         1.10          1.10           1.10
 0.00     1.11      1.11         1.11          1.11           1.11
 0.20     1.12      1.12         1.12          1.12           1.12
 0.50     1.13      1.15         1.16          1.17           1.17
 0.70     1.14      1.18         1.21          1.25           1.26
 0.90     1.16      1.24         1.31          1.48           1.78
 0.95     1.15      1.24         1.32          1.55           2.22
 0.99     1.13      1.22         1.32          1.58           2.81
 1.00     1.13      1.23         1.33          1.62           3.08

The first column of the table is the Monte Carlo variance of the one period prediction error when the prediction is constructed from the most recent observations. This column is the same as the column for Ỹ_{n+1} and n = 19 of table 4. The second column is the variance of the prediction error obtained if the sample estimates based on (Y₀, Y₁, ..., Y₁₉) are used to construct a one period prediction for Y₂₁. That is, the second column gives the Monte Carlo variance of Y₂₁ − Ỹ₂₁,(19), where

    Ỹ₂₁,(19) = α̃₀,(19) + α̃₁,(19) Y₂₀.

The third column is the variance of the one period prediction error obtained by using an estimator of (α₀, α₁) that is two periods old, etc.

The effect of using outdated parameter estimates in constructing predictions varies by parameter value. For α₁ = −1.0 there is an increase of about 40% in the mean square error as one moves from current estimates to estimates that are 20 periods old. For |α₁| ≤ 0.5 the Monte Carlo variances of the prediction error for predictions based on current data are similar to those based on ancient data. For large positive α₁ there is a considerable increase in variance associated with the failure to use the current data in constructing the one period prediction. This is less surprising when one observes that for |α₁| = 1 the mean square error of one period prediction increases without bound as the distance between the estimation sample (of fixed size n) and the prediction data increases. If α₁ = 0.99, the Monte Carlo mean square error of the Gonedes-Roberts prediction error for the prediction of Y₄₀ is 2.81. If one used the observations (Y₀, Y₁, ..., Y₃₈) to construct (α̃₀,(38), α̃₁,(38)), the variance of the one period prediction error for Y₃₉ would be about 1.08.

4. Conclusions

This study, as well as that of Orcutt and Winokur (1969), shows that the small sample behavior of predictors of the first-order autoregressive process is in better agreement with large sample theory than is the small sample behavior of estimators of the autoregressive parameter. For samples as small as T = 10 and autoregressive parameter α₁ ∈ [−1.0, 1.05] the maximum increase in the variance of the one period least squares predictor error over that for known α₁ was less than 40% (38% for α₁ = −1.0). At T = 20 the maximum increase was 17%. At T = 10 the maximum deviation between the order n⁻¹ large sample theoretical approximation and the Monte Carlo mean square error was 13% (for α₁ = 0.90). At T = 20 the maximum deviation between the theoretical and Monte Carlo mean square error for one period prediction was 7% (for α₁ = 0.90).

The percentiles of the regression 't-statistic' for one period prediction agreed well with those of Student's t with n−2 degrees of freedom. The regression 't-statistic' for one period prediction has the property that its limiting distribution for the three cases |α₁| < 1, |α₁| = 1, and |α₁| > 1 is that of a (0, 1) random variable. The behavior of the 't-statistic' was less satisfactory for longer period predictions. A sample size of 60 was required before the distribution of the regression 't-statistic' for the three period predictor was in reasonable agreement with Student's t.

Our study demonstrated the importance of using the most current data in constructing predictions. The loss in efficiency associated with the use of outdated data is greatest for large positive α₁. If T = 20 and α₁ = 0.95 the use of estimates one period old results in an eight percent increase in predictor variance relative to the use of estimates based on current data. If α₁ = 1, T = 20, and the estimation sample is five periods old the loss in efficiency is about 43 percent.

It is possible to construct predictors that perform better than the least squares predictor for some values of α₁. The predictor constructed from the maximum likelihood estimators of α₀ and α₁ is superior to the truncated least squares estimator except for |α₁| close to one. All alternative predictors studied were inferior to the least squares predictor for some values of α₁. The one period predictor based upon differences suggested by Gonedes and Roberts (1977) was dominated by another estimator and is not recommended.


References

Anderson, T.W., 1959, On asymptotic distributions of estimates of parameters of stochastic difference equations, Annals of Mathematical Statistics 30, 676-687.
Anderson, T.W., 1971, The statistical analysis of time series (Wiley, New York).
Cobas, J.B., 1966, Monte Carlo results for estimation in a stable Markov time series, Journal of the Royal Statistical Society A129, no. 1, 110-116.
Davisson, L.D., 1965, The prediction error of stationary Gaussian time series of unknown covariance, IEEE Transactions on Information Theory 11, 527-532.
Dickey, D.A. and W.A. Fuller, 1979, Distribution of the estimators for autoregressive time series with a unit root, Journal of the American Statistical Association 74, 427-431.
Fuller, W.A., 1976, Introduction to statistical time series (Wiley, New York).
Fuller, W.A. and D.P. Hasza, 1978, Properties of prediction for autoregressive time series, Report to the U.S. Bureau of the Census (Department of Statistics, Iowa State University, Ames, IA).
Gonedes, N.J. and H.V. Roberts, 1977, Differencing of random walks and near random walks, Journal of Econometrics 6, 289-308.
Hasza, D.P., 1977, Estimation in nonstationary time series, Ph.D. dissertation (Iowa State University, Ames, IA).
Kendall, M.G. and A. Stuart, 1966, The advanced theory of statistics, Vol. 3 (Hafner, New York).
Malinvaud, E., 1970, Statistical methods of econometrics (North-Holland, Amsterdam).
Marsaglia, G., K. Ananthanarayanan and N.J. Paul, 1976, Improvements on fast methods for generating normal random variables, Information Processing Letters 5, no. 2, 27-30.
Orcutt, G.H. and H.S. Winokur, 1969, First order autoregression: Inference, estimation, and prediction, Econometrica 37, 1-14.
Phillips, P.C.B., 1979, The sampling distribution of forecasts from a first-order autoregression, Journal of Econometrics 9, 241-262.
Rao, M.M., 1961, Consistency and limit distributions of estimators of parameters in explosive stochastic difference equations, Annals of Mathematical Statistics 32, 195-218.
Salem, A.S., 1971, Investigation of alternative estimators of the parameters of autoregressive processes, Unpublished M.S. thesis (Iowa State University, Ames, IA).
Thornber, H., 1967, Finite sample Monte Carlo studies: An autoregressive illustration, Journal of the American Statistical Association 62, 801-819.
White, J.S., 1958, The limiting distribution of the serial correlation coefficient in the explosive case, Annals of Mathematical Statistics 29, 1188-1197.