Time Series: Nonstationary Distributions and Unit Roots

1. Introduction

A stationary time series is one whose mean and variance are constant and whose covariance between observations depends only on their separation in time. From a statistics point of view, the distinction between stationary and nonstationary series is important because of its implications for the distributions of estimators. From an economics point of view, the distinction is important because certain economic theories imply nonstationarity and can be tested by looking for it in the data. There are questions about the impact of unanticipated shocks to an economic system. Do they produce permanent or transitory shifts in economic variables? This too boils down to a stationarity question. Diebold and Nerlove (1990) provide a long list of references in this area.

2. The Role of Nonstationary Processes and Unit Roots

When data are taken over time, a natural way to forecast future observations is to model the current observation Y_t as a linear combination of previous observations Y_{t-j}. Deviations from some deterministic function, such as a long run mean µ, are often the basis for prediction. Such a prediction is written as

  Ŷ_t − µ = α_1(Y_{t-1} − µ) + α_2(Y_{t-2} − µ) + … + α_p(Y_{t-p} − µ)

For the usual autoregressive model of order p,

  Y_t − µ = α_1(Y_{t-1} − µ) + α_2(Y_{t-2} − µ) + … + α_p(Y_{t-p} − µ) + e_t

this predictor is the conditional expected value of Y_t given the past. Here e_t is white noise, that is, a sequence of i.i.d. (0, σ²) variates. One must estimate the α_i and µ parameters as well as the number p of lags to use. Early work by Mann and Wald (1943) established the asymptotic normality of parameters estimated by least squares regression of Y_t on Y_{t-1}, …, Y_{t-p} for covariance stationary time series. A series is covariance stationary if it has a constant mean and the covariance between Y_t and Y_{t+h} is a function only of |h|, denoted by γ(h) and called the autocovariance function. For an autoregressive series as defined above, covariance stationarity is ensured if the roots of the characteristic equation

  1 − α_1 m − α_2 m² − … − α_p m^p = 0

are all greater than 1 in magnitude. In that case the covariance γ(h) is bounded by some constant multiple of (1/|m_min|)^|h|, where m_min is the root of the characteristic equation with minimum (closest to 1) magnitude. For example, if p = 2 lags, with Y_99 = 150 and Y_100 = 120, then using

  (µ, α_1, α_2) = (100, 0.9, −0.2)

the one-step-ahead forecast is Ŷ_101 = 108, computed as

  100 + 0.9(120 − 100) − 0.2(150 − 100) = 0.3(100) + 0.9(120) − 0.2(150)

while (µ, α_1, α_2) = (µ, 1.2, −0.2) gives the forecast

  Ŷ_101 = µ + 1.2(120 − µ) − 0.2(150 − µ) = 1.2(120) − 0.2(150) = 114

regardless of the value of µ. The characteristic equations factor as (1 − 0.4m)(1 − 0.5m) = 0 in the first case and (1 − 0.2m)(1 − m) = 0 in the second. The unit root m = 1 in the second case is the reason that the forecast does not involve µ. A unit root series is nonstationary, having no tendency for mean reversion.

The bounding of the autocovariance by an exponentially decaying function in the stationary case makes the data sequence close enough to an uncorrelated sequence that standard results like asymptotic normality of regression estimates will hold. While convenient from a statistical theory point of view, the exponentially decreasing covariance seems inconsistent with many observed data sets. Furthermore, exponential decay of the correlations implies that forecasts h steps into the future will approach the series mean at a rate bounded by (1/|m_min|)^|h|. Such quick reversion to the mean may at times seem unrealistic.
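The two forecasts above can be checked in a few lines. The sketch below (Python with numpy; the function is my own illustration, with the numbers taken from the worked example) evaluates the AR(2) one-step-ahead forecast for both parameter sets and confirms that µ drops out in the unit root case.

```python
import numpy as np  # imported for consistency with later sketches

def ar2_forecast(y_lag2, y_lag1, mu, a1, a2):
    """One-step-ahead AR(2) forecast:
    Yhat_t = mu + a1*(Y_{t-1} - mu) + a2*(Y_{t-2} - mu)."""
    return mu + a1 * (y_lag1 - mu) + a2 * (y_lag2 - mu)

# Stationary case from the text: (mu, a1, a2) = (100, 0.9, -0.2)
f_stat = ar2_forecast(150.0, 120.0, mu=100.0, a1=0.9, a2=-0.2)    # -> 108.0

# Unit root case: (mu, a1, a2) = (mu, 1.2, -0.2); mu drops out because
# the coefficient of mu is 1 - 1.2 + 0.2 = 0
f_unit_a = ar2_forecast(150.0, 120.0, mu=100.0, a1=1.2, a2=-0.2)  # -> 114.0
f_unit_b = ar2_forecast(150.0, 120.0, mu=-555.0, a1=1.2, a2=-0.2) # -> 114.0
```

Changing µ moves the stationary forecast but leaves the unit root forecast at 114, exactly as the factored characteristic equations predict.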
As a motivating example, suppose the price of a stock were represented as

  Y_t − µ = α(Y_{t-1} − µ) + e_t

an autoregressive process of order p = 1. The deviation of the current price from the historic mean should hold no information about future prices. If it did, the pattern would be known to intelligent market participants who would in turn bid the price up or down in such a way as to nullify the effect of the information. What does this say about α? It eliminates |α| < 1, for then the prediction of Y_{t+h}, namely µ + α^h(Y_t − µ), would move toward the historic mean, thus implying useful information in the knowledge of Y_t − µ. Assuming that such knowledge is not informative suggests α = 1, a testable hypothesis. For |α| < 1, the series can be represented as a convergent infinite sum

  Y_t − µ = Σ_{j=0}^∞ α^j e_{t-j}

in which the effect of the unanticipated 'shocks' e_{t-j} in the distant past is negligible. However, for α = 1 the recursive model becomes Y_t = Y_{t-1} + e_t, a simple random walk in which the mean µ has no apparent role and the only thing moving the stock price is the unanticipated (white noise) shock e_t. When α = 1, Y_t becomes Y_t = Y_1 + e_2 + e_3 + … + e_t and the shocks e_t have a permanent effect. Each Y_t is Y_1 plus a sum of e's, so the deviation of Y_t from the average will not involve Y_1, nor will the residuals from any regression that includes an intercept. The assumptions on Y_1 will be irrelevant for any statistic computable from a regression with an intercept column. In a more general context, whenever the characteristic equation has a unit root, the term µ drops out of the model, the forecasts do not revert to a historic mean, the forecast error variance grows without bound, and the distributions of the least squares regression estimates are not all normal even in the limit.

Just as the reported Dow Jones average is usually followed by its difference, the 'up or down points' on the evening news, series with unit roots are often differenced prior to analysis in order to produce stationary series with nice statistical properties. The nonstationary series

  Y_t − µ = 1.2(Y_{t-1} − µ) − 0.2(Y_{t-2} − µ) + e_t

with a bit of algebra, is re-expressed as

  Y_t − Y_{t-1} = 0.2(Y_{t-1} − Y_{t-2}) + e_t

This is a stationary autoregressive process in the differenced series Y_t − Y_{t-1}. The parameter 0.2 can be estimated by regressing the difference on its lag. In general,

  Y_t − µ = α_1(Y_{t-1} − µ) + α_2(Y_{t-2} − µ) + e_t

becomes

  Y_t − Y_{t-1} = −(1 − α_1 − α_2)(Y_{t-1} − µ) − α_2(Y_{t-1} − Y_{t-2}) + e_t

The coefficient of the lag level term (Y_{t-1} − µ) is seen to be the negative of the characteristic polynomial 1 − α_1 m − α_2 m² evaluated at m = 1. This coefficient is 0 if and only if there is a unit root and is negative if the series is stationary. The expression above gives a general method for testing the hypothesis of a unit root in autoregressions. Regress Y_t − Y_{t-1} on Y_{t-1} and enough lags of Y_t − Y_{t-1} to reproduce the autoregressive order of the series, then test to see if the coefficient of Y_{t-1} is 0. Of course, to do that one must have at hand the null distribution of the estimated coefficient and/or its t statistic. The work of Mann and Wald does not apply to a nonstationary series and, in fact, the limit distributions are not normal. This does not imply that the regression t statistic is an inappropriate test. It does imply that a new distribution will need to be tabulated for it.
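The recipe just described — regress the first difference on the lagged level and enough lagged differences — can be sketched with ordinary least squares. This is an illustrative simulation, not a complete test: the sample size, single lagged difference, and seed are arbitrary choices of mine, and judging the estimated coefficient still requires the nonstandard tables discussed in the next section.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a pure random walk (a true unit root), then set up the
# regression of the difference on the lagged level and one lagged
# difference.
n = 500
y = np.cumsum(rng.normal(size=n))     # Y_t = Y_{t-1} + e_t

dy = np.diff(y)                       # Y_t - Y_{t-1}
X = np.column_stack([
    np.ones(n - 2),                   # intercept
    y[1:-1],                          # lagged level Y_{t-1}
    dy[:-1],                          # lagged difference Y_{t-1} - Y_{t-2}
])
coef, *_ = np.linalg.lstsq(X, dy[1:], rcond=None)
rho_hat = coef[1]                     # coefficient on the lagged level
```

Under the unit root null, `rho_hat` converges to 0 at rate 1/n, and its t statistic must be referred to the Dickey–Fuller tables rather than normal critical values.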
3. Distributions for Unit Root Tests

Starting very simply, assume Y_t = αY_{t-1} + e_t, or equivalently Y_t − Y_{t-1} = (α − 1)Y_{t-1} + e_t, and α = 1 is to be tested. If α = 1, the regression of Y_t − Y_{t-1} on Y_{t-1} for t = 2, 3, …, n gives regression coefficient

  α̂ − 1 = [Σ_{t=2}^n Y_{t-1} e_t] / [Σ_{t=2}^n Y²_{t-1}]

(because Y_t − Y_{t-1} = e_t). Dickey and Fuller (1979) show that this statistic converges to 0 at rate 1/n. The normalized bias statistic

  n(α̂ − 1) = [(1/(nσ²)) Σ_{t=2}^n Y_{t-1} e_t] / [(1/(n²σ²)) Σ_{t=2}^n Y²_{t-1}]

is of order 1. For stationary series, √n(α̂ − α) is the O_p(1) normalization. Here the numerator and denominator quadratic forms have each been normalized to be O_p(1). Dickey and Fuller derive a representation for the denominator using its eigenvalues. The limit is an infinite weighted sum of squared N(0,1) variables. They derive the joint representation for the numerator and denominator in terms of these N(0,1) variables and a method for simulating from the limit distribution. Empirical percentiles appear in Fuller (1996). Matching of the large n and limit percentiles provides a check on the theory and computer simulations.

Table 1
Limit critical values for unit root tests (τ, τ_µ, τ_τ) compared to normal

  Regress Y_t − Y_{t-1} on:                      1 percent   5 percent
  Y_{t-1} plus lagged differences (τ)              −2.58       −1.95
  1, Y_{t-1} plus lagged differences (τ_µ)         −3.42       −2.86
  1, t, Y_{t-1} plus lagged differences (τ_τ)      −3.96       −3.41
  (normal values)                                  −2.326      −1.645

An alternative approach to the distribution of this statistic was suggested by White (1958, 1959) and shown in detail by Chan and Wei (1987) and by Phillips and his students in a series of papers; see for example Phillips and Perron (1988). This mathematically elegant approach expresses the limit distribution of the normalized bias statistic n(α̂ − 1) in terms of a Wiener process W(t) as
  (1/2)(W²(1) − 1) / ∫₀¹ W²(t) dt

Billingsley (1968) gives the general theory for this type of convergence. The denominator and numerator are both random variables, and the limit distribution is the same regardless of whether a quadratic form decomposition or a Wiener process representation is written, so the tables from Fuller (1996) are used either way. The associated studentized statistic, called τ rather than t by Dickey and Fuller to emphasize its nonstandard distribution, converges to

  (1/2)(W²(1) − 1) / [∫₀¹ W²(t) dt]^{1/2}

in the Wiener process representation and has a similar quadratic form representation. An exceptionally nice feature of τ is that its limit distribution under the null hypothesis is the same for higher order autoregressive processes.
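Both statistics are easy to simulate under the null. The sketch below (numpy only; sample size, replication count, and seed are arbitrary choices of mine) draws random walks with α = 1, computes the normalized bias n(α̂ − 1) and the studentized τ from the no-intercept regression, and compares the empirical 5 percent quantile of τ with the tabulated −1.95 and the normal −1.645 from Table 1.

```python
import numpy as np

rng = np.random.default_rng(7)

def df_stats(y):
    """Normalized bias n*(alpha_hat - 1) and studentized tau from the
    regression of Y_t - Y_{t-1} on Y_{t-1} (no intercept)."""
    x, dy = y[:-1], np.diff(y)
    rho = np.sum(x * dy) / np.sum(x * x)          # alpha_hat - 1
    resid = dy - rho * x
    s2 = resid @ resid / (len(dy) - 1)
    tau = rho / np.sqrt(s2 / np.sum(x * x))
    return len(y) * rho, tau

# Replications under the alpha = 1 null
draws = np.array([df_stats(np.cumsum(rng.normal(size=250)))
                  for _ in range(3000)])
bias, tau = draws[:, 0], draws[:, 1]

q05 = np.quantile(tau, 0.05)         # should sit near the tabulated -1.95
over_reject = np.mean(tau < -1.645)  # using the normal cutoff rejects
                                     # well above the nominal 0.05
```

The skewed, nonnormal shape of both empirical distributions is exactly why new tables had to be computed.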
4. Beyond Autoregression

For purely autoregressive processes, it has been seen that regression of the first difference on the lagged level and enough lagged differences gives a unit root test. The test on the lagged level term in the presence of lagged differences has become known as the augmented Dickey–Fuller test, or ADF. Said and Dickey (1985) show that in mixed models (ARIMA models involving both lags of Y_t and e_t) direct nonlinear estimation of the parameters results in test statistics having the same limiting distributions as discussed
above. Said and Dickey (1984) show that if one ignores the true mixed structure and simply fits a long enough autoregression, the asymptotics again give the previously discussed τ test. This has been quite reassuring for practitioners, who often just use autoregressive representations at this stage of analysis. Hall (1994) studied methods for determining the correct number of lagged differences to include based on the data. Overfitting then testing to omit unnecessary lagged differences appeared to be a good way to prepare a model for unit root testing. With that in mind, a nice feature of unit root models in the ADF form is that the regression t tests for the lagged difference terms have the usual standard normal limiting distributions, as in Mann and Wald (1943). For higher order mixed models, Phillips and Perron (1988) regress the difference on the lagged level as though a lag one model is appropriate. They then adjust the test statistics based on the autocorrelation function of the differenced series which, of course, must be estimated. Decisions on how to estimate these autocorrelations and how many of them to use in the adjusting process must be made, just as a decision on how many lagged differences to use must be made in the ADF test. The method uses the same critical values as the ADF test. All methods are justified by asymptotics. The finite sample power and adherence to the claimed significance level will differ. Schwert (1989) provides a comparison.
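The overfit-then-trim idea can be sketched as follows. This is my own minimal illustration, not Hall's procedure in full: the data are a pure random walk (so every lagged difference is actually unnecessary), the starting lag count of four is arbitrary, and the ±1.96 cutoff comes from the standard normal, which the text notes is the right reference distribution for the lagged-difference coefficients.

```python
import numpy as np

rng = np.random.default_rng(6)

def adf_design(y, k):
    """Regressors for the ADF regression of the first difference on an
    intercept, the lagged level, and k lagged differences."""
    dy = np.diff(y)
    rows = len(dy) - k
    cols = [np.ones(rows), y[k:-1]]
    for j in range(1, k + 1):
        cols.append(dy[k - j:len(dy) - j])
    return np.column_stack(cols), dy[k:]

def t_stats(X, z):
    """Ordinary least squares t statistics for each column of X."""
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ beta
    s2 = resid @ resid / (len(z) - X.shape[1])
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta / se

# Overfit with k = 4 lagged differences, then inspect which of their
# t statistics clear the usual normal-theory hurdle of about 2.
y = np.cumsum(rng.normal(size=400))
X, z = adf_design(y, k=4)
t = t_stats(X, z)
lagged_diff_t = t[2:]                 # t statistics for the 4 lagged differences
keep = np.abs(lagged_diff_t) > 1.96   # candidates to retain before testing
```

Only the lagged-difference terms are screened this way; the t statistic on the lagged level (`t[1]`) must still be judged against the τ_µ column of Table 1.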
5. Models with Deterministic Parts

In a usual regression problem, under the null hypothesis the t test for a regression coefficient has a t distribution in a properly specified model regardless of what other terms are in the model. Unfortunately, the same is not true of τ in unit root testing. For example, a lag 1 model with a trend and autocorrelation might be posed as

  (Y_t − β_0 − β_1 t) = α(Y_{t-1} − β_0 − β_1(t − 1)) + e_t

or equivalently in two parts as Y_t = β_0 + β_1 t + Z_t, where Z_t = αZ_{t-1} + e_t. Now if |α| < 1 we have stationary errors around a trend, or trend stationarity as it is known in the econometric literature. On the other hand, if α = 1 our model becomes

  Y_t = Y_{t-1} + β_1 + e_t

known as a random walk with drift β_1. In order to estimate the model well under both the null and alternative hypotheses, one regresses Y_t − Y_{t-1} on Y_{t-1}, 1, and t, then tests the coefficient of Y_{t-1} using its studentized statistic τ_τ. Even if the process is a simple random walk, this test has neither a limit normal distribution nor the limit obtained when regressing Y_t − Y_{t-1} on just Y_{t-1}. The limit distributions for both the normalized bias and studentized statistics change when an intercept and/or trend are added to the regression. The 1 percent and 5 percent critical values from the limit distribution of the studentized test statistic for cases in which one regresses the first difference on a lagged level and lagged differences (τ) are given in Table 1, along with the corresponding critical values when an intercept (τ_µ) or both an intercept and time (τ_τ) are included. These are tabulated under the unit root null hypothesis. The familiar normal distribution critical values are also shown for comparison. Clearly the naive use of normal or t critical values in this time series setting will result in far too many spurious declarations of stationarity, even in extremely large samples. From the table it is seen that the problem worsens with the fitting of deterministic mean and trend components. Models containing 1 (and t) are invariant to nonzero means (and deterministic trends) in the data. That is, the calculated test statistics used in rows 2 and 3 will remain exactly the same if an arbitrary constant is added to the data, while that in the third row will be unchanged by the addition of an arbitrary linear trend. In discussing trends, it is important to recall that under the α = 1 null hypothesis, a trend is induced by a nonzero β_1 in the model Y_t = Y_{t-1} + β_1 + e_t.
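The regression just described can be sketched directly. In this illustration (my own choices of drift, sample size, and seed) the data generator is the α = 1 case, a random walk with drift, and the τ_τ statistic is formed from the regression on the lagged level, an intercept, and time.

```python
import numpy as np

rng = np.random.default_rng(4)

# Random walk with drift: Y_t = Y_{t-1} + 0.5 + e_t
n = 300
y = np.cumsum(0.5 + rng.normal(size=n))

dy = np.diff(y)                      # Y_t - Y_{t-1}
X = np.column_stack([
    y[:-1],                          # lagged level Y_{t-1}
    np.ones(n - 1),                  # intercept
    np.arange(1, n),                 # time trend t
])
coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
resid = dy - X @ coef
s2 = resid @ resid / (len(dy) - X.shape[1])
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
tau_tau = coef[0] / se[0]
# tau_tau is referred to the -3.41 (5 percent) entry of Table 1,
# not to the normal -1.645.
```

Because the regression contains both 1 and t, adding an arbitrary constant or linear trend to `y` leaves `tau_tau` unchanged, matching the invariance property noted above.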
6. Beyond Regression

In an attempt to increase the power of unit root tests, several alternative estimation methods have been tried. Dickey et al. (1984) introduce a symmetric estimator which combines the regression of Y_t on past Ys with a regression of Y_t on future Ys. Fuller (1996) discusses this and a weighted symmetric estimator. Elliott et al. (1996) base a test statistic on the following clever idea. Pick a point in the alternative that depends on n, say α = 1 − 7/n. Now find the point optimal test of α = 1 vs. the simple alternative α = 1 − 7/n. Counting on the continuity of the power curve, that test should give good power over a range of alternative points near that for which it is point optimal.

The estimator that maximizes the stationary normal likelihood function performs slightly better than least squares in the stationary case. Looking at this likelihood function as simply an objective function to be maximized, Gonzalez-Farias (1992) and Gonzalez-Farias and Dickey (1999) compute its maximum for a unit root process, defining what she calls the unconditional maximum likelihood estimator. She gets representations for it, including cases with intercepts and trends in the model. In a comparison study by Pantula et al. (1994), this estimator and Fuller's weighted symmetric estimator have very similar behavior and seem to be the best performers of the tests mentioned here. All tests described in this section require new tables of critical values, which the corresponding authors provide.

7. Related Topics

Unit roots can arise in multivariate time series, that is, in cases where a vector of observations is recorded at each time point. In such cases, there may be some linear combinations of the vectors that form stationary time series and other linear combinations that are nonstationary. These results appear under the names 'co-integration analysis' and 'reduced rank regression.' Another recent area of research is 'trend breaks.' There might be one mean during the first period of data collection and a second one later. How do unit root tests perform if the break is undiscovered? Can we discover the break point? Is it possible that series we thought were nonstationary are in fact stationary around two or three different means? Seasonal unit roots and methods of decomposing them are also currently popular topics.

See also: Time Series: Advanced Methods; Time Series: General

Bibliography
Ahn S K, Reinsel G C 1990 Estimation for partially nonstationary multivariate autoregressive models. Journal of the American Statistical Association 85: 813–23
Billingsley P 1968 Convergence of Probability Measures. Wiley, New York
Chan N H, Wei C Z 1987 Limiting distributions of least squares estimates of unstable autoregressive processes. Annals of Statistics 16: 367–401
Dickey D A, Fuller W A 1979 Distribution of the estimators for autoregressive time series with a unit root. Journal of the American Statistical Association 74: 427–31
Dickey D A, Hasza D P, Fuller W A 1984 Testing for unit roots in seasonal time series. Journal of the American Statistical Association 79: 355–67
Diebold F X, Nerlove M 1990 Unit roots in economic time series: a selective survey. Advances in Econometrics 8: 3–69
Elliott G, Rothenberg T J, Stock J H 1996 Efficient tests for an autoregressive unit root. Econometrica 64: 813–36
Fuller W A 1996 Introduction to Statistical Time Series. Wiley, New York
Gonzalez-Farias G 1992 A new unit root test for autoregressive time series. Ph.D. thesis, North Carolina State University
Gonzalez-Farias G, Dickey D A 1999 Unit root tests: an unconditional maximum likelihood approach. Boletin de la Sociedad Matematica Mexicana 5: 199–221
Hall A 1994 Testing for a unit root in time series with pretest data-based model selection. Journal of Business and Economic Statistics 12: 461–70
Mann H B, Wald A 1943 On the statistical treatment of linear stochastic difference equations. Econometrica 11: 173–220
Pantula S G, Gonzalez-Farias G, Fuller W A 1994 A comparison of unit root test criteria. Journal of Business and Economic Statistics 12: 449–59
Phillips P C B, Perron P 1988 Testing for a unit root in time series regression. Biometrika 75: 335–46
Said S E, Dickey D A 1984 Testing for unit roots in autoregressive moving average models of unknown order. Biometrika 71: 599–607
Said S E, Dickey D A 1985 Hypothesis testing in ARIMA(p,1,q) models. Journal of the American Statistical Association 80: 369–74
Schwert G W 1989 Tests for unit roots: a Monte Carlo investigation. Journal of Business and Economic Statistics 7: 147–59
White J S 1958 The limiting distribution of the serial correlation coefficient in the explosive case. Annals of Mathematical Statistics 29: 1188–97
White J S 1959 The limiting distribution of the serial correlation coefficient in the explosive case II. Annals of Mathematical Statistics 30: 831–4
D. A. Dickey
Time Series Quasi-experiment

Campbell and Stanley (1963) and Campbell (1963) recommend the time series quasi-experiment for assessing the causal effect of a discrete intervention on a time series. In the simplest case of this design, a discrete intervention breaks a time series into pre- and post-intervention segments of N_pre and N_post observations. For pre- and post-intervention means µ_pre and µ_post, analysis of the quasi-experiment tests the null hypothesis

  H_0: ω = 0 where ω = µ_post − µ_pre   (1)

Rejecting H_0, the alternative hypothesis H_A attributes ω to the intervention. The validity of H_A can be challenged on the two grounds that Cook and Campbell (1979) call 'threats to internal validity' and 'threats to statistical conclusion validity.' Whereas the threats to internal validity apply to quasi-experiments generally, the threats to statistical conclusion validity, which arise because the time series observations are sampled sequentially, distinguish this design from other quasi-experiments. Whereas the threats to internal validity are controlled by design, a statistical model must control the threats to statistical conclusion validity.
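The estimand in Eqn. (1) can be illustrated with a toy computation. The pre/post series, the level shift of 2.0, and the white noise errors below are invented for the sketch; the naive two-sample t statistic shown is exactly the analysis that the threats to statistical conclusion validity call into question, since it treats sequentially sampled observations as independent.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical intervention data: 60 pre and 60 post observations with
# a level shift of 2.0 at the intervention (values are illustrative).
pre = 10.0 + rng.normal(size=60)
post = 12.0 + rng.normal(size=60)

omega_hat = post.mean() - pre.mean()   # estimate of omega = mu_post - mu_pre

# Naive two-sample t statistic, valid only if the observations are
# independent; serial correlation in a real time series distorts its
# nominal significance level.
se_naive = np.sqrt(pre.var(ddof=1) / pre.size + post.var(ddof=1) / post.size)
t_naive = omega_hat / se_naive
```

With autocorrelated errors the proper analysis models the error process explicitly, which is the role of the statistical model mentioned in the text.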
1. Internal Validity

Although in principle all eight of the threats to internal validity discussed by Campbell and Stanley (1963) apply to the time series quasi-experiment, four threats are inherently plausible. The threats of history and maturation are plausible due to the design's relatively long time frame and to the nature of long time series. The threats of instrumentation and regression are plausible due to the nature of planned interventions; for unplanned interventions, a special case that Campbell (1969) calls the 'natural experiment,' neither threat is plausible. Hennigan et al. (1982) use the time series in Fig. 1 to document an increase in property crime following the introduction of commercial television. In 34 'early' and 34 'late' cities, television broadcasting begins in 1950 and 1954, respectively. If the two time series are considered separately, history is a plausible threat to the internal validity of both effects. The 'late' effect may be due to an economic recession that began in 1955, for example, and the 'early' effect may be an artifact of the Korean War mobilization. As husbands were called away to military service and wives joined the workforce, homes were left undefended against property criminals. Although history is a plausible threat to the internal validity of each effect, however, it is implausible for the joint effect. The other three common threats to internal validity are less robust to standard designs. Fig. 2 shows monthly burglaries for Tucson before, during, and after an intervention (McCleary et al. 1982). Prior to 1979, burglaries were investigated by uniformed police officers. When this task was taken over by detectives, burglaries dropped abruptly. Two years later, when the task was transferred back to uniformed officers, burglaries returned abruptly to pre-1979 levels. While the on-off component of this design rules out the threat of history, the other threats are still plausible.
In fact, this effect is entirely explained by differences in the way detectives and uniformed officers keep records; it is an instrumentation artifact. Because interventions often affect the way in which a target phenomenon is measured, instrumentation is always a plausible threat to the evaluation of planned interventions. Regression threats are also associated with planned interventions but for different reasons. Following a 1955 traffic 'crackdown' in Connecticut, for example, highway fatalities dropped significantly. Examining a long time series, however, Campbell and Ross (1968) show that the reduction was largely due to an uncontrolled regression threat. The 'crackdown' was implemented in 1955 because traffic fatalities in
International Encyclopedia of the Social & Behavioral Sciences
ISBN: 0-08-043076-7