Economics Letters 76 (2002) 429–436 www.elsevier.com/locate/econbase
Using bootstrap methods to obtain nonnormality robust Chow prediction tests

Leslie G. Godfrey (a), Chris D. Orme (b,*)

(a) Department of Economics and Related Studies, University of York, Heslington, York YO10 5DD, UK
(b) School of Economic Studies, University of Manchester, Manchester M13 9PL, UK

Received 28 July 2001; received in revised form 21 February 2002; accepted 12 March 2002
Abstract

This paper emphasizes the sensitivity to nonnormality of the standard Chow test for predictive failure. Based on well established asymptotic arguments, a simple double bootstrap procedure is proposed, evaluated and found to be robust to nonnormality. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Bootstrap; Chow tests; Regression models

JEL classification: C12; C52
1. Introduction

Chow (1960) considered the problem in which, having estimated a linear regression with $k$ coefficients using $n_1$ observations, it is desired to test whether $n_2$ other observations belong to the same population. He identified two cases: ($n_1 > k$, $n_2 > k$); and ($n_1 > k$, $n_2 \le k$). The analysis of covariance test is available for the first case. For the second case, Chow gave a test of prediction errors and it is this procedure that is examined in this paper.

The purpose is to draw attention to the following: (i) the strong auxiliary assumption of normality that underpins the popular Chow test for predictive failure; (ii) the impact on this test of departures from normality; and (iii) the potential for using bootstrap methods to derive a more robust inference. The recent work of Godfrey and Orme (2000) on the application of single bootstrap methods is extended by examining the use of prediction error tests based on a double bootstrap approach. It is argued that while, unlike the standard Chow procedure, the single bootstrap method proposed by Godfrey and Orme (2000) allows asymptotically valid inference in the presence of nonnormality, it is likely to give poorer control over finite sample significance levels than a double bootstrap (adjusted p-value) approach.

The plan of the paper is as follows. Section 2 contains a brief description of the model and the standard Chow test in the presence of nonnormality. Section 3 contains discussions of single and double bootstrap methods, with comments on reducing computational cost. Section 4 contains Monte Carlo evidence that emphasizes the pitfalls of using conventional critical values and illustrates the more secure basis for inference afforded by bootstrap methods. Some concluding remarks are made in Section 5.

* Corresponding author. Tel.: +44-161-275-4856; fax: +44-161-275-4928. E-mail address: [email protected] (C.D. Orme).
0165-1765/02/$ – see front matter. PII: S0165-1765(02)00088-5
2. The model and Chow's test

Suppose that a regression model links a dependent variable to $k$ strictly exogenous explanatory variables, and that $n$ observations are available. It is convenient to partition the observation vector for the dependent variable as $y = (y_1', y_2')'$, where $y_1$ is $(n_1 \times 1)$ and $y_2$ is $(n_2 \times 1)$, with $n = n_1 + n_2$. Similarly, the $(n \times k)$ data matrix for the regressors is partitioned as $X = (X_1', X_2')'$, conformably with $y$. It is assumed that $n_1 > k$ and $n_2 \le k$. It is also assumed that the first $n_1$ observations are generated by the linear regression model

$$y_1 = X_1 \beta + u_1, \tag{1}$$
where $\mathrm{rank}(X_1) = k < n_1$, and the elements of $u_1$ are independently and identically distributed (iid) with zero mean and variance $\sigma^2$, having a common cumulative distribution function (cdf) denoted by $G$. It is convenient, but not essential, to assume that an element of $\beta$ corresponds to an intercept. The OLS estimate of $\beta$ from (1) is

$$\hat\beta(n_1) = (X_1' X_1)^{-1} X_1' y_1, \tag{2}$$
and $\hat\beta(n_1)$, which is unbiased for $\beta$, can be used to predict $y_2$, given $X_2$. This yields a vector of prediction errors:

$$e_2 = y_2 - X_2 \hat\beta(n_1). \tag{3}$$

If the same model applies to all observations, then

$$y = X\beta + u, \tag{4}$$
with the $n$ elements of $u$ being independent drawings from the distribution characterized by $G$ and where $u = (u_1', u_2')'$ conformably with $y$. Under (4), $E(y_2) = X_2\beta$, implying that $E(e_2) = 0$. A check for predictive failure can, therefore, be carried out by testing the joint significance of the $n_2$ elements of $e_2$ in (3). Chow's (1960) statistic is often used for this purpose and is given by

$$T = \frac{[RSS(n) - RSS(n_1)]/n_2}{RSS(n_1)/(n_1 - k)}, \tag{5}$$
where $RSS(n_1)$ and $RSS(n)$ are the residual sums of squares from the OLS estimation of (1) and (4), respectively. Chow shows that if, in addition to (4) being valid, the iid elements of $u$ are normally distributed, $T$ is distributed as $F(n_2, n_1 - k)$, with significantly large values being interpreted as signals that the null hypothesis is inconsistent with the data.

Comparison of Chow's test statistic with critical values from the $F(n_2, n_1 - k)$ distribution has long been common practice in empirical work. There is, however, no good reason to suppose that error terms are normally distributed. Consequently, it is important to examine the effects of nonnormality on the conventional Chow test.

Godfrey and Orme (2000) use asymptotic analysis and Monte Carlo experiments to investigate the properties of the usual F-test of the significance of $T$ when the same mean function applies to all observations but the errors have a nonnormal distribution. In their asymptotic analysis, Godfrey and Orme examine behaviour as $n_1 \to \infty$ with $k$ and $n_2 \le k$ fixed. This approach is used to obtain approximations that are intended to be useful when: (i) $n_1$ is quite large compared with both $k$ and $n_2$; and (ii) $n_2$ is not greater than $k$ (which is regarded as fixed in standard asymptotic analysis). Under the assumptions of Godfrey and Orme (2000), the standard Chow test statistic is not asymptotically robust to nonnormality, having an asymptotic distribution that depends on the cdf $G$.¹ Godfrey and Orme also report Monte Carlo evidence that illustrates how standard practice can produce rejection rates which are markedly different from the nominal values in the presence of nonnormality.
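As an illustration of the statistic in (5), the following minimal sketch computes $T$ from the two residual sums of squares. The function names `rss` and `chow_prediction_stat` and the simulated data are assumptions made for the example, not part of the paper; the first `n1` rows of `y` and `X` are taken to be the estimation sample.

```python
import numpy as np

def rss(y, X):
    """Residual sum of squares from an OLS regression of y on X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)

def chow_prediction_stat(y, X, n1):
    """Chow prediction error statistic T of (5); rows 0..n1-1 form the first sample."""
    n, k = X.shape
    n2 = n - n1
    rss_n = rss(y, X)              # RSS(n): all n observations
    rss_n1 = rss(y[:n1], X[:n1])   # RSS(n1): first n1 observations only
    return ((rss_n - rss_n1) / n2) / (rss_n1 / (n1 - k))

# Hypothetical data: intercept plus five regressors, all coefficients zero.
rng = np.random.default_rng(0)
n, n1, k = 30, 24, 6
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = rng.normal(size=n)
T = chow_prediction_stat(y, X, n1)
print(T)
```

Under normal errors this value would be referred to the $F(n_2, n_1 - k)$ distribution; the sketch deliberately stops at the statistic, since the point of the paper is that the $F$ reference distribution is not reliable under nonnormality.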
3. Bootstrap methods

Although it is not, in general, asymptotically valid (as $n_1 \to \infty$ with $k$ and $n_2 \le k$ fixed) to take critical values for $T$ from the $F(n_2, n_1 - k)$ distribution, these critical values can be estimated consistently without recourse to analytical work based upon spuriously precise assumptions about the error distribution. The methods of estimation proposed in this paper involve the use of the bootstrap, as discussed by Beran (1988). In Beran's terminology, Chow's test statistic $T$ is not asymptotically pivotal because it has an asymptotic distribution, when $E(e_2) = 0$, that is not independent of the cdf $G$ (see Godfrey and Orme, 2000, Section 2.1).² In this case, and under suitable regularity conditions, the application of a single (nonparametric) bootstrap produces a test that has an error in its significance level of the same order in $n$ as the correct asymptotic test. Exploiting this result, Godfrey and Orme (2000) propose a single bootstrap method for carrying out prediction error tests in the presence of nonnormality that consists of the following stages.

¹ The prediction error test could be used when $n_1$ and $n_2$ are both larger than $k$, rather than the analysis of covariance test. The referee has pointed out that asymptotic analysis of robustness to nonnormality could then be studied in two scenarios: $n_1$ and $n_2$ both tend to infinity with $n_2 = O(n_1)$; and $n_1$ and $n_2$ both tend to infinity with $n_2 = o(n_1)$. This analysis will be the subject of future research.

² Breusch's (1980) results imply that this asymptotic distribution does not depend on the values of $\beta$ and $\sigma^2$ when the null hypothesis is true. However, in order to be asymptotically pivotal, a statistic must have an asymptotic null distribution that is independent of $\beta$, $\sigma^2$ and $G$.

(SB1) Estimate under the null hypothesis by applying OLS to (4) to obtain the sample values of $T$, $\hat\beta(n) = (X'X)^{-1}X'y$, and the associated residual vector $\hat u$ having elements $\hat u_t$, $t = 1, \ldots, n$.

(SB2) Generate $B_1$ bootstrap samples of size $n$, using the scheme

$$y_i^* = X\hat\beta(n) + u_i^*, \quad i = 1, \ldots, B_1, \tag{6}$$
where $u_i^* = (u_{i1}^*, \ldots, u_{in}^*)'$ is derived by simple random sampling with replacement from the empirical distribution function

$$\hat G: \text{probability } \frac{1}{n} \text{ on } \hat u_t, \quad t = 1, \ldots, n.$$

(Since the bootstrap data generation process must mimic the model under the null hypothesis, the elements of $u_i^*$ must have expected values equal to zero. If the model does not include an intercept term, the OLS residuals in $\hat G$ should be centred by subtracting their sample mean.) OLS estimations using the bootstrap samples provide values of Chow's statistic, parameter estimates, and residuals, which are denoted by $T_i^*$, $\hat\beta_i^*(n)$, and $\hat u_i^* = (\hat u_{i1}^*, \ldots, \hat u_{in}^*)'$, respectively, $i = 1, \ldots, B_1$.

(SB3) The $B_1$ values of $T_i^*$ could be ordered to obtain an estimate of the critical value for the desired significance level $\alpha$ (see, for example, Horowitz, 1994). However, a more flexible (and for later purposes, more convenient) approach is to estimate the p-value for the observed test statistic $T$ from stage (SB1) as

$$p^*(T) = \sum_{i=1}^{B_1} I(T_i^* > T)/B_1,$$
where $I(\cdot)$ is the indicator function that equals unity when its argument is true and is zero otherwise. The null hypothesis that the same mean function applies to all data is rejected if $p^*(T) < \alpha$.

Godfrey and Orme (2000) find that the above single bootstrap method gives much better control over finite sample significance levels than the standard F-test when the strong auxiliary assumption of normality is relaxed. However, the results of Beran (1988) indicate that there is scope for further improvement under standard regularity conditions. More precisely, the single bootstrap gives an error in rejection probability relative to $\alpha$ that is at most $O(n^{-1/2})$ in this case, since the test statistic is not asymptotically pivotal. However, in general, when applied to an asymptotically pivotal test statistic, the bootstrap will reduce this discrepancy to be at most $O(n^{-1})$. In view of this, the double bootstrap approach is motivated by viewing $p^*(T)$, rather than $T$, as the test statistic because the former is asymptotically pivotal, having an asymptotic null distribution that is uniform between zero and unity (see Davison and Hinkley, 1997, Section 4.5). Consequently, the error in rejection probability for a test in which $p^*(T)$ is bootstrapped is of a smaller order in $n$ than that for the method proposed by Godfrey and Orme (2000).

Following Davison and Hinkley (1997), a double bootstrap test for predictive failure can be implemented by an 'adjusted p-value' approach using the following stages.

(DB1) Same as stage (SB1) above.

(DB2) Same as stage (SB2) above.

(DB3) For each of the first-level bootstrap pairs $(\hat\beta_i^*(n), \hat u_i^*)$, $i = 1, \ldots, B_1$, generate $B_2$ second-level bootstrap samples of size $n$ according to

$$y_{ij}^{**} = X\hat\beta_i^*(n) + u_{ij}^{**}, \quad j = 1, \ldots, B_2, \tag{7}$$
where the $n$ elements of $u_{ij}^{**}$ are obtained by simple random sampling with replacement from

$$\hat G_i: \text{probability } \frac{1}{n} \text{ on } \hat u_{it}^*, \quad t = 1, \ldots, n.$$

(As with (SB2), the residuals must be recentred to have zero mean if $\beta$ does not contain an intercept.)

(DB4) Let the Chow statistics derived from the second-level bootstrap data $y_{ij}^{**}$ be denoted by $T_{ij}^{**}$, $i = 1, \ldots, B_1$ and $j = 1, \ldots, B_2$. Calculate the $B_1$ p-values defined by

$$p_i^{**}(T_i^*) = \sum_{j=1}^{B_2} I(T_{ij}^{**} > T_i^*)/B_2, \quad i = 1, \ldots, B_1.$$
These terms can now be used to gain an asymptotic refinement over the single bootstrap rejection rule 'Reject null hypothesis if $p^*(T) < \alpha$'. An adjusted p-value is calculated as

$$p_{adj}(T) = \sum_{i=1}^{B_1} I(p_i^{**}(T_i^*) \le p^*(T))/B_1, \tag{8}$$
and the null hypothesis is rejected if $p_{adj}(T) < \alpha$.

At first sight, the computational cost of the adjusted p-value approach might seem very large, requiring $B_1 \times B_2$ OLS estimations for the calculation of a single test. However, it is possible to reduce this cost to only seconds of waiting time on a modern computer.

First, the only regressions that need to be estimated are those involving the genuine data, i.e. those of economic interest. For the calculation of Chow test statistics corresponding to bootstrap data, it is useful to note that, when (4) is valid, $RSS(n) = u'u - u'Pu$ and $RSS(n_1) = u_1'u_1 - u_1'P_1u_1$, so that $T$ of (5) can be rewritten as

$$T = \frac{[u_2'u_2 - u'Pu + u_1'P_1u_1]/n_2}{[u_1'u_1 - u_1'P_1u_1]/(n_1 - k)} = f(u; P, P_1), \tag{9}$$

where $P = X(X'X)^{-1}X'$ and $P_1 = X_1(X_1'X_1)^{-1}X_1'$. The projection matrices $P$ and $P_1$ in (9) need only be calculated once, outside the bootstrap loops, because they are fixed for all bootstrap samples. In a first-level bootstrap, i.e. stage (DB2) above, having selected $u_i^* = (u_{i1}^{*\prime}, u_{i2}^{*\prime})'$ from $\hat G$, $T_i^*$ can be calculated as $f(u_i^*; P, P_1)$ using (9). The residual vector $\hat u_i^*$ that would be obtained from an OLS regression using first-level bootstrap data $y_i^*$ can instead be calculated more efficiently, without any additional matrix inversion, using $\hat u_i^* = u_i^* - Pu_i^*$. The elements of the vector $\hat u_i^*$ are required for drawing the second-level errors $u_{ij}^{**}$ from $\hat G_i$ in stage (DB3), but having selected $u_{ij}^{**}$ the corresponding Chow statistic $T_{ij}^{**}$ can be evaluated as $f(u_{ij}^{**}; P, P_1)$ using (9).

Second, it is not always necessary to carry out all $B_2$ second-level bootstraps. It is clear from (8) that second-level bootstrapping can stop as soon as the value of the indicator function is known. For example, with $p^*(T) = 0.04$ and $B_2 = 100$, simulation can stop for any $i$ as soon as five values of the $T_{ij}^{**}$ are found to be greater than $T_i^*$, because $p_i^{**}(T_i^*)$ must then be greater than $p^*(T)$, implying that $I(p_i^{**}(T_i^*) \le p^*(T)) = 0$. Horowitz et al. (1999) give a full discussion of stopping rules for double bootstrap tests and report impressive computational savings in their application.
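Stages (SB1) to (SB3) and (DB1) to (DB4), together with the two computational devices just described, can be sketched as follows. This is an illustrative sketch rather than the authors' program: the function names, the seed handling and the simulated data are assumptions, and only the simple stopping rule from the text is used (the second-level loop for replication $i$ stops once $p_i^{**}(T_i^*) > p^*(T)$ is certain). The comparison is carried out in integers to avoid rounding problems at ties.

```python
import numpy as np

def chow_from_errors(u, P, P1, n1, k):
    """Chow statistic via (9): f(u; P, P1)."""
    u1, u2 = u[:n1], u[n1:]
    n2 = u.size - n1
    num = (u2 @ u2 - u @ P @ u + u1 @ P1 @ u1) / n2
    den = (u1 @ u1 - u1 @ P1 @ u1) / (n1 - k)
    return num / den

def double_bootstrap_chow(y, X, n1, B1=500, B2=100, seed=0):
    """Return (T, single bootstrap p-value, adjusted p-value)."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    X1 = X[:n1]
    # Projection matrices are computed once, outside all bootstrap loops.
    P = X @ np.linalg.solve(X.T @ X, X.T)
    P1 = X1 @ np.linalg.solve(X1.T @ X1, X1.T)
    M = np.eye(n) - P

    # (SB1)/(DB1): statistic and residuals from the genuine data.
    T = chow_from_errors(y, P, P1, n1, k)
    u_hat = M @ y
    u_hat = u_hat - u_hat.mean()   # recentring; no effect if X has an intercept

    # (SB2)/(DB2): first-level statistics and residuals, no new regressions.
    T_star = np.empty(B1)
    resid_star = np.empty((B1, n))
    for i in range(B1):
        u_star = rng.choice(u_hat, size=n, replace=True)
        T_star[i] = chow_from_errors(u_star, P, P1, n1, k)
        resid_star[i] = M @ u_star           # u*_i - P u*_i

    # (SB3): single bootstrap p-value p*(T) = m / B1.
    m = int(np.sum(T_star > T))
    p_single = m / B1

    # (DB3)/(DB4): second level with early stopping.
    hits = 0
    for i in range(B1):
        r = resid_star[i] - resid_star[i].mean()
        count = 0
        for _ in range(B2):
            u_2star = rng.choice(r, size=n, replace=True)
            if chow_from_errors(u_2star, P, P1, n1, k) > T_star[i]:
                count += 1
                if count * B1 > m * B2:      # p_i** > p*(T) is now certain
                    break
        hits += count * B1 <= m * B2         # I(p_i** <= p*(T))
    return T, p_single, hits / B1            # (8)

# Hypothetical data: intercept plus five regressors, zero coefficients, t(5) errors.
rng = np.random.default_rng(1)
n, n1, k = 30, 24, 6
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = rng.standard_t(df=5, size=n)
T, p_single, p_adj = double_bootstrap_chow(y, X, n1, B1=199, B2=50)
print(T, p_single, p_adj)
```

The null hypothesis would be rejected at level $\alpha$ when the returned adjusted p-value is below $\alpha$. Storing the first-level residuals before starting the second level is a design choice of this sketch: the stopping rule needs $p^*(T)$, which is only known once all $B_1$ first-level statistics are available.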
4. Monte Carlo experiments and results
4.1. Design

The purpose of this small study is to investigate the efficacy of the double bootstrap procedure (for obtaining asymptotically valid critical values) relative to the single bootstrap and standard Chow procedures under various assumptions about the distribution of the error terms. Following Godfrey and Orme (2000), the regression model employed is

$$y_t = \sum_{j=1}^{6} x_{tj}\beta_j + u_t, \tag{10}$$
in which: $x_{t1} = 1$; $x_{t2}$ is drawn from a uniform distribution with parameters 1 and 31; $x_{t3}$ is drawn from a log-normal distribution with $\ln(x_{t3}) \sim N(3,1)$. Unlike $x_{t2}$ and $x_{t3}$, the remaining regressors are serially correlated with $x_{t4} = 0.9x_{t-1,4} + v_{t4}$, $x_{t5} = 0.6x_{t-1,5} + v_{t5}$, $x_{t6} = 0.3x_{t-1,6} + v_{t6}$, with $v_{ts}$ being independently normally distributed, such that $E[x_{ts}] = 0$ and $\mathrm{var}[x_{ts}] = 1$, for $s = 4, 5, 6$. All regression coefficients $\beta_j$ are set equal to zero and the error terms, $u_t$, of (10) are iid (0,1) in all experiments, which involves no loss of generality (Breusch, 1980).

Since sensitivity to nonnormality is the primary focus, the disturbances $u_t$ are obtained by standardizing pseudo-random variables drawn from several distributions. These disturbance distributions are: normal, Student's $t$ with five degrees of freedom, uniform over the unit interval, chi-square with two degrees of freedom and log-normal. Previous research indicates that this range of distributions should provide a reasonable guide to how poor (oversized or undersized) any procedure might be in a practical situation.

Finally, the experiments involve three combinations of $(n, n_2)$: $n = 30, 50, 80$, all with $n_2 = 6$. The reason for this choice of $n_2$ is that the Chow prediction test is usually applied when $n_2 \le k$ and Godfrey and Orme (2000) find that, given $n$, the performance of the single bootstrap procedure deteriorates as $n_2$ increases. Thus, in order to give a stringent check for tests based upon (10), attention is restricted to the case of $n_2 = k = 6$. For the standard Chow test, critical values are obtained from an $F(6, n_1 - 6)$ distribution.
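The design above can be sketched in code as follows. Several details are assumptions of this sketch because the text does not state them: the autoregressive regressors are started from a stationary $N(0,1)$ draw, the innovation variance is set to $1 - \rho^2$ so that $\mathrm{var}[x_{ts}] = 1$, standardization uses population rather than sample moments, and the log-normal errors are taken to be $\exp(z)$ with $z \sim N(0,1)$. The function and label names are likewise hypothetical.

```python
import numpy as np

def make_regressors(n, rng):
    """n x 6 regressor matrix following the description of (10)."""
    x2 = rng.uniform(1.0, 31.0, size=n)
    x3 = rng.lognormal(mean=3.0, sigma=1.0, size=n)   # ln(x3) ~ N(3, 1)
    ar = np.empty((n, 3))
    for s, rho in enumerate((0.9, 0.6, 0.3)):
        x_prev = rng.normal()                 # assumed stationary start
        sd_v = np.sqrt(1.0 - rho**2)          # gives var[x] = 1
        for t in range(n):
            x_prev = rho * x_prev + sd_v * rng.normal()
            ar[t, s] = x_prev
    return np.column_stack([np.ones(n), x2, x3, ar])

def make_errors(name, n, rng):
    """Draws standardized to mean 0 and variance 1 using population moments."""
    if name == "normal":
        return rng.normal(size=n)
    if name == "t5":                          # var of t(5) is 5/3
        return rng.standard_t(5, size=n) / np.sqrt(5.0 / 3.0)
    if name == "uniform":                     # U(0,1): mean 1/2, var 1/12
        return (rng.uniform(size=n) - 0.5) * np.sqrt(12.0)
    if name == "chi2_2":                      # chi-square(2): mean 2, var 4
        return (rng.chisquare(2, size=n) - 2.0) / 2.0
    if name == "lognormal":                   # exp(N(0,1)): mean e^0.5, var (e-1)e
        mean = np.exp(0.5)
        sd = np.sqrt((np.e - 1.0) * np.e)
        return (rng.lognormal(0.0, 1.0, size=n) - mean) / sd
    raise ValueError(f"unknown error distribution: {name}")

rng = np.random.default_rng(0)
X = make_regressors(30, rng)          # n = 30, so n1 = 24 and n2 = 6
y = make_errors("chi2_2", 30, rng)    # all beta_j = 0, so y_t = u_t
```

One Monte Carlo replication then consists of drawing `y`, applying each test to `(y, X)` with `n1 = n - 6`, and recording whether the null hypothesis is rejected.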
4.2. Results

Results were obtained for three nominal significance levels of 1, 5 and 10%, using 5000 replications of sample data with $B_1 = 500$ and $B_2 = 100$. The main features of the results relating to the standard Chow test and the single bootstrap are as given by Godfrey and Orme (2000). The conventional Chow prediction error test cannot be relied upon when the errors are not normal; and employing the single bootstrap to obtain asymptotically valid critical values, under any of the nonnormal error distributions, offers substantial improvement (with estimated significance levels being much closer to their nominal values). However, for the small extra computing cost, the double bootstrap affords further and useful improvement.

Adopting the test procedure described by Godfrey and Orme (2000, p. 75), for each of the nominal significance levels across all error distributions, the null hypothesis that the true significance level is within 0.5% of the nominal level cannot be rejected for the double bootstrap procedure when $n = 50$ or 80. This is not the case for the single bootstrap procedure, which is sensitive to asymmetry of the error distribution, performing relatively poorly under the chi-square and log-normal distributions. The superiority of the double bootstrap is also observed when $n = 30$. These features are illustrated by the results for the nominal significance level of 5%, which are presented in Table 1.

Table 1
Percentage rejection rates of prediction error tests with nominal size of 5%

$(n_1, n_2)$             (24,6)    (44,6)    (74,6)

(a) Errors derived from a normal distribution
F critical value          5.58      4.94      5.02
Single bootstrap          5.76      5.36      5.42
Double bootstrap          5.44      5.16      4.80

(b) Errors derived from a t(5) distribution
F critical value          8.90      8.54      8.40
Single bootstrap          7.76      6.54      5.56
Double bootstrap          6.22      5.10      4.62

(c) Errors derived from a uniform distribution
F critical value          2.42      1.60      0.86
Single bootstrap          4.02      3.92      4.02
Double bootstrap          4.28      4.38      4.50

(d) Errors derived from a $\chi^2(2)$ distribution
F critical value         10.88     10.92     11.60
Single bootstrap          8.62      7.50      7.62
Double bootstrap          6.48      5.42      5.96

(e) Errors derived from a log-normal distribution
F critical value         15.24     13.18     12.92
Single bootstrap         11.30      8.32      7.12
Double bootstrap          7.54      5.56      5.58
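When reading estimated rejection rates such as those in Table 1, it may help to keep the simulation error in mind. The short sketch below is not the test procedure of Godfrey and Orme (2000, p. 75), which is not reproduced here; it only evaluates the usual binomial standard error of a rejection frequency based on 5000 independent replications, with `mc_standard_error` being a name chosen for this example.

```python
import math

def mc_standard_error(rate, replications):
    """Binomial standard error of an estimated rejection rate, in percentage points."""
    return 100.0 * math.sqrt(rate * (1.0 - rate) / replications)

for alpha in (0.01, 0.05, 0.10):
    print(f"nominal {alpha:.2f}: s.e. {mc_standard_error(alpha, 5000):.2f} points")
```

At the 5% level this gives a standard error of about 0.31 percentage points, so differences of a few tenths of a point between entries in the table are within the range of simulation noise.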
5. Conclusions

The standard Chow test for predictive failure is widely used in applied work when the number of prediction errors is no larger than the number of regressors, but it is not robust to nonnormality of the regression model errors. The current paper emphasizes this sensitivity and, based on well established asymptotic arguments, proposes a simple double bootstrap procedure designed to be asymptotically robust to nonnormality. In this context, the double bootstrap is superior to the single bootstrap, previously analysed by Godfrey and Orme (2000), in that it provides an error in the rejection probability which is of a smaller order of magnitude (in the sample size) than that yielded by the single bootstrap. A small Monte Carlo study supports this view by illustrating that inference based on the double bootstrap is more reliable than that based on the single bootstrap. Furthermore, this improvement can be obtained at little extra computing cost over the single bootstrap since various stopping rules can be applied at the level of the second bootstrap.
Acknowledgements

We are grateful to the referee for helpful comments that improved the exposition of this paper.
References

Beran, R., 1988. Prepivoting test statistics: a bootstrap view of asymptotic refinements. Journal of the American Statistical Association 83, 687–697.
Breusch, T.S., 1980. Useful invariance results for generalized regression models. Journal of Econometrics 13, 327–340.
Chow, G., 1960. Tests of equality between sets of coefficients in two linear regressions. Econometrica 28, 591–605.
Davison, A.C., Hinkley, D.V., 1997. Bootstrap Methods and their Application. Cambridge University Press, Cambridge.
Godfrey, L.G., Orme, C.D., 2000. Controlling the significance levels of prediction error tests for linear regression models. Econometrics Journal 3, 66–83.
Horowitz, J., 1994. Bootstrap-based critical values for the information-matrix test. Journal of Econometrics 61, 395–411.
Horowitz, J.L., Lobato, I.N., Nankervis, J.C., Savin, N.E., 1999. Bootstrapping the Box–Pierce Q Test: A Robust Test of Uncorrelatedness. University of Iowa.