Economics Letters 64 (1999) 13–16
Testing for a break at an unknown change-point: a test with known size in small samples

Clinton A. Greene*
Department of Economics, University of Missouri, St. Louis, MO 63121, USA

Received 15 September 1998; accepted 4 March 1999
Abstract

A stability test is presented which does not commit to a particular break-point and allows for multiple testing, as in the max-F test of Andrews (1993). Unlike the max-F, this new test has known size in small samples. © 1999 Elsevier Science S.A. All rights reserved.

Keywords: Chow; Forecast; Max-F; Multiple; Stability

JEL classification: C12; C22
*Tel.: +1-314-516-5565; fax: +1-314-516-5352. E-mail address: [email protected] (C.A. Greene)

1. Introduction

This paper introduces a simple stability test which I call the min-P test. Like the max-F (or sup-F) test of Andrews (1993) and Hansen (1992), this new test is valid in a multiple-testing context and is useful when the date of the possible break is unknown. As in Harvey's (1976) test, the min-P test is based on a series of individual forecast Chow tests in which the forecast periods do not overlap. Unlike the max-F test, this new test has the great advantage of known size in small samples. In contrast to the discrete nature of Harvey's test statistic, the min-P statistic and its cumulative distribution are continuous. Harvey's test and the min-P test proposed here rely on classical assumptions for the errors of the model: specifically, the errors are assumed to be independent, identically distributed and normal.

In applied time series work the breakpoint F-test for stability is almost always invalid. The traditional analysis of variance assumes a single test in a unique, previously unstudied sample. A sample extended to include new observations over time is not unique; further testing after a finding of instability amounts to pre-testing, and the distributions of subsequent test statistics are not independent
of the previous finding. If breakpoint F-statistics were independent and we applied a test size of 5%, then for a stable model we would expect to find individually significant statistics at about 5% of the possible break-point dates, so finding that 5% of the breaks are individually significant is quite consistent with stability. But the problem is not actually this simple, because multiple breakpoint tests are not independent.

Chu (1990), Hansen (1992) and Andrews (1993) propose the maximum of breakpoint F-statistics, calculated at almost every possible break, as a test which avoids these problems. The test essentially formalizes and allows for the joint research process that occurs when 100 researchers apply the standard test a few times at arbitrary points or at points suggested by previous investigations. As an additional payoff, the max-F allows one to test for stability when the date of the possible break is unknown.

A limitation of the break-point max-F test is that the distribution of the test statistic is known only asymptotically. It is extremely difficult to reject using the asymptotic critical values. For instance, for a model employing four weakly dependent (non-trending) regressors the 5% critical value (from Andrews (1993)) is 16.45.¹ One response for applied work could be to bootstrap the critical values. But the extent to which bootstrapping improves on the asymptotic values in autoregressive models is still an active and unsettled area of research. And of course bootstrapping is a time-consuming approach.

The min-P test presented here is based on Harvey's (1976) results on the properties of forecast Chow tests.
Harvey's test statistic is discrete, so its cumulative distribution (and hence its probability value) is a step function, and different test sizes can map to the same critical value.² Harvey's test also requires one to commit to evaluating the significance of each individual F-statistic at a fixed nominal size, so a very low-probability F-statistic receives no more weight than one of marginally significant size. The min-P statistic developed here has the advantage of continuity in addition to the advantage of known size in small samples, and it does not require committing to a fixed size in evaluating the individual F-statistics. This new test is a simple extension of Harvey's work (which has been neglected), and so the next section discusses his approach to the problem of multiple testing of stability.
2. Harvey's approach to evaluating multiple tests

Harvey (1976; 1990, pp. 181–184, p. 288) shows that if the model errors are independent then a series of forecast Chow tests (and the recursive residuals) are independent. Here it will suffice to note that a forecast is based on all past information (model errors are not autocorrelated), and so the forecast error contains only information not in the previous data. For independence of the series of Chow tests the forecast periods must not overlap. This condition is met in one-step-ahead tests. In multi-period forecasts it is met if the terminal date of the sample used in each forecast is incremented by the number of forecast periods at each step; thus for four-quarter forecasts only three sets of forecasts would be made over a period of 3 years.

Under the null of stability (and no error autocorrelation) the outcome of the individual forecast tests is then a set of Bernoulli trials. Under the classical assumptions the probability of an individual rejection is simply the test size, as evaluated via the standard F-distribution. Harvey's test is conducted by counting the number of individual rejections and calculating the probability of at least this many rejections given the number of independent trials or forecasts. This calculation requires that the nominal size of each individual test be equal and known. So in addition to independence, the test requires identically and normally distributed errors. Thus, the size of Harvey's test can be calculated in small samples. But the discrete nature of the test statistic implies the mapping from test size to critical values is not one-to-one. For instance, if 48 individual tests are conducted at an individual (nominal) test size of 10%, then both the 5 and 10% critical values equal nine rejections.

¹ As suggested by Andrews (1993), it is now standard to calculate the F-statistics for all breakpoints in the middle 70% of the sample.
² For instance, consider 95 one-step-ahead forecast Chow tests evaluated at an individual nominal size of 5%. Harvey's test will accept stability if eight individual rejections are tallied and will reject stability for a tally of nine, whether the tally is evaluated for an overall test size of 5 or 10%.
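The counting test above amounts to a binomial tail-probability calculation. The following is a minimal sketch, not the author's code; the function names are mine, and the inputs are assumed to be the number of non-overlapping forecast Chow tests, the common nominal size of each, and the observed tally of rejections.

```python
from math import comb

def binomial_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p): the probability of at least k
    rejections among n independent Chow tests of nominal size p."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def harvey_critical_value(n, p, alpha):
    """Smallest rejection tally k whose tail probability is <= alpha,
    i.e. the critical value of the counting test at overall size alpha."""
    for k in range(n + 1):
        if binomial_tail(n, p, k) <= alpha:
            return k
    return n + 1  # stability can never be rejected at this alpha

# Example from the text: 48 tests at a nominal 10% size.
tail_at_nine = binomial_tail(48, 0.10, 9)
```

Because `harvey_critical_value` returns an integer tally, nearby values of `alpha` can map to the same critical value, which is exactly the step-function behavior the min-P test removes.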
3. The forecast min-P test

A max-F version of Harvey's approach is not practical using a fixed starting date, because the distribution (degrees of freedom) of each F-statistic differs as the baseline period for the forecast is incremented. There are two possible approaches to this problem. One is to advance the starting date and ending date as a moving window (along with the forecasting date or period) so the degrees of freedom of the F-distribution are the same for every individual test. The second is to allow for the differing distributions by considering the nominal P values of each of the individual F-statistics and to use the minimum P value in a manner similar to the max-F. For the moving-window approach it is equivalent to consider either the max-F or the min-P, so only the min-P test is discussed here.

Suppose n forecast Chow tests are conducted for a model meeting the conditions described above, and the nominal P values for each test are recorded as a set x_1, …, x_n. The probability that one or more of these P values is less than a given value x is 1 minus the probability that all the P values are greater than x:

P(x_i ≤ x for any i = 1, …, n) = 1 − P(x_i > x for all i).   (1)

For an individual trial i, the probability of drawing an F-statistic with individual P value x_i ≤ x is simply x:

P(x_i ≤ x) = x.   (2)

In other words, the probability of rejection for a test size of 5% is 5%, and the probability of not rejecting is 95%, or 1 − x. As in Harvey (1976), each forecast is an independent trial. Thus, the probability that there are no rejections among the n trials or forecast tests is simply the product of the individual probabilities:

P(x_i > x for all i) = ∏_{i=1,…,n} (1 − x) = (1 − x)^n.   (3)

Substitution of the probability of no rejections, Eq. (3), into Eq. (1) then yields

P(x_i ≤ x for any i) = 1 − P(x_i > x for all i) = 1 − (1 − x)^n.   (4)
Letting x = min(x_1, …, x_n), the right-hand side of Eq. (4) gives the probability value of the min-P among the n P values. Or, for test size α, the critical value x* for the min-P statistic is

x* = 1 − (1 − α)^(1/n).   (5)
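Eqs. (4) and (5) are trivial to compute. The sketch below is mine, not the author's; it assumes the caller supplies the nominal P values from individually valid F-tests on non-overlapping forecast periods, as required above.

```python
def min_p_pvalue(pvalues):
    """Overall probability value of the min-P statistic, Eq. (4):
    the chance that the smallest of n independent uniform nominal
    P values would be at least this small under the null of stability."""
    n = len(pvalues)
    return 1 - (1 - min(pvalues)) ** n

def min_p_critical_value(alpha, n):
    """Critical value x* for the min-P statistic at test size alpha, Eq. (5)."""
    return 1 - (1 - alpha) ** (1 / n)

# Example: 20 non-overlapping forecast Chow tests whose smallest
# nominal P value is 0.004; individually striking, but the overall
# probability value is 1 - 0.996**20, roughly 0.077.
overall = min_p_pvalue([0.004] + [0.5] * 19)
```

Note how the multiple-testing correction works: a nominal P value of 0.004 that would reject decisively in a single test is only marginal once 20 looks at the data are accounted for.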
4. Conclusion

The derivation of the test size for the min-P test is extremely simple. Unlike Hansen's (1992) version of the max-F, the min-P test cannot be applied to models which employ integrated variables. But under the null of cointegration, the stability of error-correction models can be evaluated via the min-P test. And the known size of the min-P test in small samples is a tremendous advantage over other tests which control for multiple testing or pre-testing. The test can be conducted using one-step-ahead Chow tests, n-step-ahead Chow tests (for a fixed n), and using a baseline period which is sequentially extended in length or is shifted (with constant length) as a moving window.
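The known-size claim can be checked directly by simulation. Under the null, the nominal P values of the non-overlapping forecast tests are independent uniforms, so the rejection rate at the Eq. (5) critical value should match the chosen size exactly. This Monte Carlo sketch (my own, using only the standard library) illustrates that:

```python
import random

def min_p_critical_value(alpha, n):
    """Critical value x* from Eq. (5)."""
    return 1 - (1 - alpha) ** (1 / n)

def simulated_size(alpha=0.05, n=20, reps=20000, seed=12345):
    """Empirical rejection rate of the min-P test under the null,
    where the n nominal P values are i.i.d. uniform on (0, 1)."""
    rng = random.Random(seed)
    xstar = min_p_critical_value(alpha, n)
    rejects = sum(
        min(rng.random() for _ in range(n)) <= xstar
        for _ in range(reps)
    )
    return rejects / reps
```

With 20 000 replications the empirical rate should sit within sampling error of the nominal 5%, for any small n; no asymptotic approximation or bootstrap is involved.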
Acknowledgements

This research was supported in part by University of Missouri Research Board Grant S-3-40607.
References

Andrews, D.W.K., 1993. Tests for parameter instability and structural change with unknown change point. Econometrica 61 (4), 821–856.
Chu, C.-S.J., 1990. Test for Parameter Constancy in Stationary and Nonstationary Regression Models. Mimeo, UCSD.
Hansen, B.E., 1992. Tests for parameter instability in regressions with I(1) processes. Journal of Business and Economic Statistics 10 (3), 321–336.
Harvey, A.C., 1976. An alternative proof and generalization of a test for structural change. The American Statistician 30 (3), 122–123.
Harvey, A.C., 1990. The Econometric Analysis of Time Series, 2nd ed. MIT Press, Cambridge, MA.