Modified Wald tests under nonregular conditions


Journal of Econometrics 78 (1997) 315-332

Helmut Lütkepohl*, Maike M. Burda

Institut für Statistik und Ökonometrie, Humboldt-Universität zu Berlin, Spandauer Str. 1, 10178 Berlin, Germany

Received January 1994; final version received April 1996

Abstract

Under nonregular conditions Wald tests are known to have incorrect size, even asymptotically, in part of the parameter space. Modifications are discussed which ensure an asymptotic $\chi^2$-distribution of the Wald statistic under $H_0$. As an example, Wald tests for multi-step causality are considered. A variable $y_t$ is $h$-step causal for another variable $x_t$ if the information in $y_t$ helps to improve the $j$-step forecasts of $x_t$ for some $j = 1, 2, \ldots, h$. If more than two variables are involved and are generated by a finite order vector autoregressive (VAR) process, this type of multi-step noncausality implies a set of highly nonlinear restrictions on the VAR coefficient matrices. For this type of nonlinear restrictions standard Wald tests fail to have limiting $\chi^2$-distributions in general.

Key words: Nonlinear restrictions; Wald test; Vector autoregressive processes; Multiple time series; Causality

JEL classification: C12; C32

* Corresponding author. The authors thank Halbert White for suggesting the procedure following Proposition 2 as well as Jürgen Wolters, two referees, an associate editor and A. Ronald Gallant for various comments that have resulted in substantial improvements of the paper. Financial support by the Deutsche Forschungsgemeinschaft, Sonderforschungsbereich 373, is gratefully acknowledged.

0304-4076/97/$17.00 © 1997 Elsevier Science S.A. All rights reserved. PII S0304-4076(96)00015-2

1. Introduction

Suppose it is desired to test the null hypothesis $H_0: r(\theta) = 0$ for some parameter vector $\theta$. In this case a Wald test is easy to use when a parameter estimator is available with a nonsingular asymptotic normal distribution and the matrix of first-order partial derivatives of the restrictions, $\partial r(\theta)/\partial\theta'$, is of full rank for all $\theta$ in the parameter space. Under these regularity conditions, the Wald statistic has an asymptotic $\chi^2$-distribution under $H_0$. The popularity of Wald tests is due to the simplicity of the basic concept and the fact that the two aforementioned conditions are often satisfied in practice. Moreover, in some situations Wald tests have the same optimality properties as likelihood ratio tests. It has been pointed out, however, that in some other situations the power properties of the Wald test are quite poor (e.g. Gregory and Veall, 1985; Breusch and Schmidt, 1988).

Another problem arises from the fact that occasionally parameter estimators have a singular asymptotic distribution for some parameter values and the matrix of partial derivatives $\partial r(\theta)/\partial\theta'$ may have reduced rank. In that situation, the standard Wald statistic may not have its usual asymptotic $\chi^2$-distribution under the null hypothesis. This problem arises for instance in VAR (vector autoregressive) processes. If the process is stationary, the multivariate LS estimator of the coefficients has a nonsingular asymptotic distribution, whereas the distribution becomes singular if some variables are integrated or cointegrated. The matrix of first-order partial derivatives $\partial r(\theta)/\partial\theta'$ is likely to have reduced rank over part of the parameter space if the function $r(\theta)$ involves products of the elements of $\theta$. Such functions are relatively common in practice. For instance, impulse responses and related quantities of interest in a VAR analysis involve products of the VAR coefficients. Similarly, such functions come up in analyzing multi-step causality in VAR models (see Dufour and Renault, 1993; Bruneau and Nicolai, 1993) or Granger causality in VARMA models (see Lütkepohl, 1991, Sect. 6.7.1; Boudjellabah et al., 1992).

In this article we propose two simple modifications to Wald tests that result in asymptotic $\chi^2$-distributions of the Wald statistic under the null. The first proposal overcomes the singularity problem by simply adding random noise to the restrictions, whereas the second modification amounts to estimating the rank of the approximate covariance matrix and then modifying the Wald statistic accordingly.
The paper is structured as follows. In the next section the modified Wald tests are presented. In Sect. 3 testing for multi-step causality is discussed as a special case and simulations are used in Sect. 4 to explore the small sample implications of the asymptotic results of the earlier sections. Conclusions are given in Sect. 5.

2. Wald tests

Suppose the parameters of interest are contained in the $n$-dimensional vector $\theta$ and suppose further that we wish to test the null hypothesis

$$H_0: r(\theta) = 0 \quad \text{against} \quad H_1: r(\theta) \neq 0, \tag{2.1}$$

where $r: \mathbb{R}^n \to \mathbb{R}^J$ is a continuously differentiable function. Given an asymptotically normal estimator $\hat\theta$,

$$\sqrt{T}(\hat\theta - \theta) \xrightarrow{d} N(0, \Sigma_\theta), \tag{2.2}$$


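the usual Wald statistic is

$$\lambda_{Wald} = T\, r(\hat\theta)' \left( \frac{\partial \hat r}{\partial\theta'}\, \hat\Sigma_\theta\, \frac{\partial \hat r'}{\partial\theta} \right)^{-1} r(\hat\theta), \tag{2.3}$$

provided the inverse exists. Here $T$ is the sample size and $\partial\hat r/\partial\theta'$ and $\hat\Sigma_\theta$ are consistent estimators of $\partial r/\partial\theta'$ and $\Sigma_\theta$, respectively. If $H_0$ is true and

$$\Sigma_{r(\theta)} = \frac{\partial r}{\partial\theta'}\, \Sigma_\theta\, \frac{\partial r'}{\partial\theta}$$

is nonsingular, $\lambda_{Wald}$ has an asymptotic $\chi^2(J)$-distribution. This is not necessarily true, however, if $\Sigma_{r(\theta)}$ is singular. The latter may hold if $\Sigma_\theta$ is singular or if $\mathrm{rk}(\partial r/\partial\theta') < J$. In that case,

$$\frac{\partial \hat r}{\partial\theta'}\, \hat\Sigma_\theta\, \frac{\partial \hat r'}{\partial\theta}$$

may still be nonsingular and the Wald statistic may not have an asymptotic $\chi^2$-distribution (see Andrews, 1987). The problem may be viewed as follows. The asymptotic normality of $\hat\theta$ in (2.2) implies that, under $H_0$,

$$\sqrt{T}\, r(\hat\theta) \xrightarrow{d} N(0, \Sigma_{r(\theta)}). \tag{2.4}$$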

Thus, if $\Sigma_{r(\theta)}$ is singular, some components of $r(\hat\theta)$ or linear combinations of them converge at a rate higher than $\sqrt{T}$. In other words, for our purposes they are estimated 'too efficiently'. This suggests that one solution to the problem is to make them slightly less efficient. A simple way to do so is to add random noise to the estimator $r(\hat\theta)$. The following proposition states precisely how this device can be used to obtain a Wald statistic with asymptotic $\chi^2(J)$-distribution.

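Proposition 1. Suppose the estimator $\hat\theta$ satisfies (2.2) and $H_0$ in (2.1) is true. Let $\Sigma_w$ be a positive semidefinite matrix such that $\Sigma_{r(\theta)} + \Sigma_w$ is positive definite and let $w \sim N(0, \Sigma_w)$ be a $(J \times 1)$ random vector which is independent of $\hat\theta$. Moreover, $\hat\Sigma_{r(\theta)}$ is a nonsingular consistent estimator of $\Sigma_{r(\theta)}$. Then the modified Wald statistic satisfies

$$\lambda^{mod}_{Wald} = T \left( r(\hat\theta) + \frac{w}{\sqrt{T}} \right)' \left( \hat\Sigma_{r(\theta)} + \Sigma_w \right)^{-1} \left( r(\hat\theta) + \frac{w}{\sqrt{T}} \right) \xrightarrow{d} \chi^2(J).$$

Proof. The proposition follows from (2.4) and basic asymptotic results. []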
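The statistic in Proposition 1 can be sketched numerically as follows. This is only a minimal illustration, assuming NumPy is available; the function name `modified_wald` and the example inputs are hypothetical and not part of the paper.

```python
import numpy as np

def modified_wald(r_hat, sigma_r_hat, sigma_w, T, rng):
    """Noise-augmented Wald statistic in the spirit of Proposition 1.

    r_hat       : restrictions evaluated at the estimate, shape (J,)
    sigma_r_hat : estimated covariance of sqrt(T) * r_hat, shape (J, J)
    sigma_w     : covariance of the added noise (hypothetical user choice)
    """
    r_hat = np.asarray(r_hat, dtype=float)
    J = r_hat.shape[0]
    # Draw w ~ N(0, sigma_w), independently of the parameter estimator.
    w = rng.multivariate_normal(np.zeros(J), sigma_w)
    v = r_hat + w / np.sqrt(T)
    # Solve the linear system rather than forming the inverse explicitly.
    return float(T * v @ np.linalg.solve(sigma_r_hat + sigma_w, v))

rng = np.random.default_rng(0)
# A singular (rank one) covariance estimate becomes invertible once sigma_w is added.
sigma_r_hat = np.array([[1.0, 1.0], [1.0, 1.0]])
sigma_w = 0.1 * np.eye(2)
stat = modified_wald([0.02, 0.02], sigma_r_hat, sigma_w, T=1000, rng=rng)
```

With `sigma_w` set to a zero matrix and a nonsingular `sigma_r_hat`, the function reduces to the standard Wald statistic in (2.3).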

Although the modification of the Wald statistic is likely to result in a loss in power, it is still consistent against fixed alternatives and has local power against alternatives

$$H_1: r(\theta) = \delta/\sqrt{T}, \tag{2.5}$$

where $\delta \neq 0$ is a $(J \times 1)$ vector. To see this, note that under $H_1$ the modified Wald statistic has an asymptotic noncentral $\chi^2$-distribution,

$$\lambda^{mod}_{Wald} \xrightarrow{d} \chi^2\!\left(J,\; \delta'(\Sigma_{r(\theta)} + \Sigma_w)^{-1}\delta\right). \tag{2.6}$$
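To see how the added noise enters the noncentrality parameter in (2.6), the following sketch evaluates $\delta'(\Sigma_{r(\theta)} + \Sigma_w)^{-1}\delta$ for a few noise levels. The function name `noncentrality` and the numerical values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def noncentrality(delta, sigma_r, sigma_w):
    """Noncentrality parameter delta' (sigma_r + sigma_w)^{-1} delta from (2.6)."""
    delta = np.asarray(delta, dtype=float)
    return float(delta @ np.linalg.solve(sigma_r + sigma_w, delta))

sigma_r = np.eye(2)            # hypothetical regular case
delta = np.array([1.0, 1.0])   # hypothetical local alternative
# Larger noise covariances shrink the noncentrality parameter and hence local power.
values = [noncentrality(delta, sigma_r, lam * np.eye(2)) for lam in (0.0, 0.1, 10.0)]
```

In this toy setting the parameter equals $2/(1+\lambda)$ for noise covariance $\lambda I_2$, so it falls monotonically as more noise is added.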

Obviously the power will depend on $\Sigma_w$ and hence on $w$. It is easy to choose some $w$ that remedies the problem under $H_0$. Any normal vector $w \sim N(0, \Sigma_w)$ with positive definite covariance matrix $\Sigma_w$ results in a nonsingular matrix $\Sigma_{r(\theta)} + \Sigma_w$. On the other hand, the choice of $\Sigma_w$ clearly has an impact on the noncentrality parameter and, thus, may severely reduce the power of the test. The choice of $\Sigma_w$ and hence $w$ should therefore be based on a priori knowledge of $\Sigma_{r(\theta)}$ and may even be data dependent. A detailed example will be given in the next section.

The idea to reduce the efficiency of parameter estimators to make Wald tests well-behaved has also been utilized in studying cointegrated VAR processes. For instance, Toda and Yamamoto (1995) and Dolado and Lütkepohl (1996) propose to estimate additional, redundant coefficient matrices and thereby obtain slightly less efficient estimates. Yamamoto (1994) also uses additional random noise in the cointegration framework.

An alternative cure to the problem results from the fact that the inverse in (2.3) may be replaced by a generalized inverse and the asymptotic $\chi^2$-distribution of the Wald statistic is maintained if the estimator $\hat\Sigma_{r(\theta)}$ has the same rank as $\Sigma_{r(\theta)}$. More precisely, if $\mathrm{rk}(\hat\Sigma_{r(\theta)}) = \mathrm{rk}(\Sigma_{r(\theta)}) = J_1$,

$$\lambda^{+}_{Wald} = T\, r(\hat\theta)'\, \hat\Sigma^{+}_{r(\theta)}\, r(\hat\theta) \xrightarrow{d} \chi^2(J_1) \tag{2.7}$$

if $H_0$ is true (Andrews, 1987). Here $B^+$ denotes the Moore-Penrose generalized inverse of a matrix $B$. Of course, using some estimator $\hat\Sigma_{r(\theta)}$, the rank condition is not likely to be satisfied in practice. However, it may be possible to overcome the problem of an unknown asymptotic distribution of the Wald statistic by constructing or finding a suitable reduced rank estimator of $\Sigma_{r(\theta)}$. The following proposition provides the basis for accomplishing this. The idea is to use only the principal components associated with the largest eigenvalues of the estimated covariance matrix.

Proposition 2. Suppose the estimator $\hat\theta$ satisfies (2.2) and $H_0$ in (2.1) is true. Let $\hat\Sigma_{r(\theta)}$ be a consistent estimator of $\Sigma_{r(\theta)}$ with eigenvalues $\hat\lambda_1 \geq \cdots \geq \hat\lambda_J$ and $\hat V$ an orthogonal matrix such that $\hat\Sigma_{r(\theta)} = \hat V \hat\Lambda \hat V'$, where $\hat\Lambda = \mathrm{diag}(\hat\lambda_1, \ldots, \hat\lambda_J)$. For some $c > 0$, define $\hat J_c$ to be the number of $\hat\lambda_i > c$ and let $\hat\Lambda_c = \mathrm{diag}(\hat\lambda_1, \ldots, \hat\lambda_{\hat J_c}, 0, \ldots, 0)$. Then, if $c$ is smaller than the largest eigenvalue of $\Sigma_{r(\theta)}$,

$$\lambda^{+}_{Wald} = T\, r(\hat\theta)'\, \hat V \hat\Lambda_c^{+} \hat V'\, r(\hat\theta) \xrightarrow{d} \chi^2(J_c),$$

where $J_c$ is the number of eigenvalues of $\Sigma_{r(\theta)}$ greater than $c$.

Proof. Let $\lambda_1 \geq \cdots \geq \lambda_J$ be the eigenvalues of $\Sigma_{r(\theta)}$ and let $V$ be an orthogonal matrix such that $\Sigma_{r(\theta)} = V \Lambda V'$, where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_J)$. Moreover, define $\Lambda_c = \mathrm{diag}(\lambda_1, \ldots, \lambda_{J_c}, 0, \ldots, 0)$ and note that $\Lambda_c^{+} = \mathrm{diag}(\lambda_1^{-1}, \ldots, \lambda_{J_c}^{-1}, 0, \ldots, 0)$. Consistency of $\hat\Sigma_{r(\theta)}$ implies $\mathrm{plim}\, \hat V = V$, $\mathrm{plim}\, \hat\Lambda_c = \Lambda_c$ and hence $\mathrm{plim}\, \hat\Lambda_c^{+} = \Lambda_c^{+}$. It follows that

$$\mathrm{plim}\, \hat V \hat\Lambda_c^{+} \hat V' = V \Lambda_c^{+} V'. \tag{2.8}$$

Denoting the first $(J \times J_c)$ submatrix of $V$ by $V_c$ with corresponding estimator $\hat V_c$, we have from (2.2) that

$$\sqrt{T}\, V_c' \left( r(\hat\theta) - r(\theta) \right) \xrightarrow{d} N(0, \mathrm{diag}(\lambda_1, \ldots, \lambda_{J_c})).$$

Hence, for $r(\theta) = 0$,

$$T\, r(\hat\theta)'\, V_c\, \mathrm{diag}(\lambda_1^{-1}, \ldots, \lambda_{J_c}^{-1})\, V_c'\, r(\hat\theta) = T\, r(\hat\theta)'\, V \Lambda_c^{+} V'\, r(\hat\theta) \xrightarrow{d} \chi^2(J_c).$$

Thus, the proposition follows from (2.8) and a standard limiting result for $\chi^2$ statistics. []

In Proposition 2 all eigenvalues of $\hat\Sigma_{r(\theta)}$ below the threshold $c$ are replaced by zero. Thereby, some restrictions are eliminated. This will not necessarily result in a loss in power. Depending on the alternative, this may even lead to improved power of the test. From the above considerations it is easily seen that under local alternatives

$$H_1: r(\theta) = \delta/\sqrt{T}$$

the Wald statistic has an asymptotic noncentral $\chi^2$-distribution,

$$\lambda^{+}_{Wald} \xrightarrow{d} \chi^2(J_c, \gamma), \tag{2.9}$$

with noncentrality parameter $\gamma = \delta' V \Lambda_c^{+} V' \delta$. If $J_c < J$, the modified test has fewer degrees of freedom than the standard Wald test and hence may have improved power, in particular if superfluous restrictions are removed. This procedure is similar to one proposed by Gallant (1977) and Gallant and Tauchen (1989) for taking care of, e.g., parameters unidentified under $H_0$ and redundant restrictions in nonlinear models. In that context Gallant (1977) finds in a small sample Monte Carlo investigation that power advantages may in fact result from taking into account fewer restrictions.
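The eigenvalue truncation behind Proposition 2 can be sketched as follows. This is a minimal illustration under the assumption that NumPy is available; the function name `wald_plus` and the example numbers are hypothetical.

```python
import numpy as np

def wald_plus(r_hat, sigma_r_hat, T, c):
    """Generalized-inverse Wald statistic: eigenvalues <= c are treated as zero.

    Returns the statistic and the estimated rank (degrees of freedom).
    """
    r_hat = np.asarray(r_hat, dtype=float)
    # eigh is appropriate for a symmetric matrix; eigenvalues come in ascending order.
    eigvals, eigvecs = np.linalg.eigh(np.asarray(sigma_r_hat, dtype=float))
    keep = eigvals > c
    rank = int(keep.sum())
    # Project the restrictions on the retained eigenvectors and weight
    # each component by the inverse of its eigenvalue.
    scores = eigvecs[:, keep].T @ r_hat
    stat = float(T * np.sum(scores**2 / eigvals[keep]))
    return stat, rank

# Hypothetical example: one eigenvalue is essentially zero and is dropped.
stat, rank = wald_plus([0.1, 0.001], np.diag([2.0, 1e-6]), T=100, c=0.01)
```

The returned rank is the number of degrees of freedom to use for the $\chi^2$ critical value; the statistic itself only involves the retained principal components.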


On the other hand, since $V \Lambda_c^{+} V'$ is not positive definite but just semidefinite, in our case the asymptotic power may be equal to the size of the test even if $\delta \neq 0$, i.e., the noncentrality parameter $\gamma$ may be zero. This contrasts with the result for $\lambda^{mod}_{Wald}$ given in (2.6), which implies that the noncentrality parameter of $\lambda^{mod}_{Wald}$ is always nonzero if $\delta \neq 0$. Hence, in such a situation $\lambda^{mod}_{Wald}$ will result in a test more powerful than $\lambda^{+}_{Wald}$.

To ensure that $\lambda^{+}_{Wald}$ includes all restrictions which correspond to nonzero eigenvalues of $\Sigma_{r(\theta)}$, the threshold $c$ may be chosen dependent on the sample size $T$. If $c \to 0$ as $T \to \infty$, all nonzero eigenvalues of $\Sigma_{r(\theta)}$ will eventually be greater than $c$. On the other hand, $c$ must not decline too fast with growing $T$, to make sure that the estimates of the true zero eigenvalues are still smaller than the threshold and are set to zero at least in large samples. For instance, if the estimation errors of the eigenvalues of $\hat\Sigma_{r(\theta)}$ are $O_p(T^{-1/2})$ (that is, the $\sqrt{T}(\hat\lambda_i - \lambda_i)$ converge in distribution), $c$ should go to zero somewhat slower than $1/\sqrt{T}$. For example, $c = T^{-1/3}$ would be a possible choice. Such a procedure is similar to a pretest where a null hypothesis $H_0: \lambda_i = 0$ is rejected if $\hat\lambda_i > c$. Thus, the procedure has the properties of a pretesting sequence. To assess these properties it would be necessary to know the asymptotic properties of the estimated eigenvalues. A pretesting procedure of a slightly different type has in fact been proposed by Boudjellabah et al. (1992) for the case of testing for Granger causality in VARMA models.

Of course, given the problems associated with Wald tests, one may want to consider other procedures such as likelihood ratio (LR) tests. It should be noted, however, that these tests are also beset with problems in some of the situations which we have in mind. For example, one difficulty may be the computation of the restricted estimates in a highly nonlinear and potentially nonidentified situation. The possible lack of identification in some parts of the parameter space also requires nonstandard asymptotics for deriving the properties of the LR tests.

In the next section we will discuss in detail an example where testing problems occur and we will show how they can be solved with the methods proposed in the foregoing. There are also other examples that can be handled in an analogous way. As mentioned earlier, testing restrictions on the levels parameters of a cointegrated VAR process could be an alternative example. We emphasize, however, that the modifications of the Wald tests considered here provide general solutions to the singularity problem. In special situations, it may, of course, be preferable to search for more efficient solutions. In any case, the solutions discussed here may be helpful in clarifying the structure of the problem.

3. Testing noncausality restrictions

Suppose the vector time series $x_t$, $y_t$, $z_t$ with dimensions $K_x$, $K_y$, $K_z$, respectively, are jointly generated by a VAR($p$) process

$$\begin{bmatrix} x_t \\ y_t \\ z_t \end{bmatrix} = \sum_{i=1}^{p} A_i \begin{bmatrix} x_{t-i} \\ y_{t-i} \\ z_{t-i} \end{bmatrix} + \begin{bmatrix} u_{x,t} \\ u_{y,t} \\ u_{z,t} \end{bmatrix}, \tag{3.1}$$

where

$$A_i = \begin{bmatrix} A_{xx,i} & A_{xy,i} & A_{xz,i} \\ A_{yx,i} & A_{yy,i} & A_{yz,i} \\ A_{zx,i} & A_{zy,i} & A_{zz,i} \end{bmatrix}, \quad i = 1, \ldots, p,$$

with $A_{kl,i}$ having dimension $(K_k \times K_l)$, and $u_t = (u_{x,t}', u_{y,t}', u_{z,t}')'$ is zero mean white noise with nonsingular covariance matrix $\Sigma_u$. Let $x_t(h|w)$ be the optimum (minimum MSE) forecast of $x_t$, $h$ periods ahead, based on the information $\{w_s \mid s \leq t\}$. Then we say that $y_t$ is $h$-step noncausal for $x_t$,

$$y_t \overset{(h)}{\nrightarrow} x_t \quad \text{iff} \quad x_t(j|x,z) = x_t(j|x,y,z) \quad \text{for all } j = 1, \ldots, h. \tag{3.2}$$

The case of primary interest in the following is when $h = \infty$. It is well-known that

$$y_t \overset{(1)}{\nrightarrow} x_t \quad \text{iff} \quad A_{xy,i} = 0, \quad i = 1, \ldots, p. \tag{3.3}$$

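Defining $A = [A_1, \ldots, A_p]$ and $\alpha = \mathrm{vec}(A)$, where vec is the usual column stacking operator, the restrictions can be written as $R\alpha = 0$ for a suitable matrix $R$. Moreover, defining the $(Kp \times Kp)$ matrix

$$\mathbf{A} = \begin{bmatrix} A \\ I_{K(p-1)} \quad 0 \end{bmatrix},$$

where $K = K_x + K_y + K_z$ with $I_k$ being the $(k \times k)$ identity matrix, $A^{(j)} = J \mathbf{A}^j$, where $J = [I_K : 0 : \cdots : 0]$ is $(K \times Kp)$, and $\alpha^{(j)} = \mathrm{vec}(A^{(j)})$, it can be shown that

$$y_t \overset{(h)}{\nrightarrow} x_t \quad \text{iff} \quad R\alpha^{(j)} = 0 \quad \text{for } j = 1, \ldots, h. \tag{3.4}$$

Furthermore, Dufour and Renault (1993, Proposition 4.1) show that

$$y_t \overset{(\infty)}{\nrightarrow} x_t \quad \text{iff} \quad R\alpha^{(j)} = 0 \quad \text{for } j = 1, \ldots, pK_z + 1. \tag{3.5}$$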
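The quantities $R\alpha^{(j)}$ in (3.4) can be computed from the companion matrix as in the following sketch. It assumes NumPy; the helper names `companion` and `noncausality_restrictions` and the index arguments are hypothetical, and the restrictions are returned in row-major order rather than in vec order, which does not affect whether they are zero.

```python
import numpy as np

def companion(A_list):
    """Stack A_1, ..., A_p into the (Kp x Kp) companion matrix."""
    K = A_list[0].shape[0]
    p = len(A_list)
    top = np.hstack(A_list)
    if p == 1:
        return top
    bottom = np.hstack([np.eye(K * (p - 1)), np.zeros((K * (p - 1), K))])
    return np.vstack([top, bottom])

def noncausality_restrictions(A_list, x_idx, y_idx, h):
    """Collect the xy-blocks of A^(j) = J * companion^j for j = 1, ..., h."""
    K = A_list[0].shape[0]
    p = len(A_list)
    comp = companion(A_list)
    # Columns of A^(j) that multiply lagged y variables.
    cols = [i * K + k for i in range(p) for k in y_idx]
    power = np.eye(K * p)
    out = []
    for _ in range(h):
        power = power @ comp
        A_j = power[:K, :]  # first K rows, i.e. J * companion^j
        out.append(A_j[np.ix_(x_idx, cols)].ravel())
    return np.concatenate(out)

# Hypothetical VAR(1): y does not enter the x equation directly,
# but affects x after two steps through z.
A1 = np.array([[0.3, 0.0, 0.5],
               [0.0, 0.3, 0.5],
               [0.0, 0.5, 0.3]])
res = noncausality_restrictions([A1], x_idx=[0], y_idx=[1], h=2)
```

In this example the first restriction is zero while the second equals 0.25, so $y_t$ is 1-step noncausal but not 2-step noncausal for $x_t$.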


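To see that these restrictions may cause problems for a Wald test, consider a three-dimensional VAR(1) process:

$$\begin{bmatrix} x_t \\ y_t \\ z_t \end{bmatrix} = A_1 \begin{bmatrix} x_{t-1} \\ y_{t-1} \\ z_{t-1} \end{bmatrix} + u_t = \begin{bmatrix} \alpha_{xx} & \alpha_{xy} & \alpha_{xz} \\ \alpha_{yx} & \alpha_{yy} & \alpha_{yz} \\ \alpha_{zx} & \alpha_{zy} & \alpha_{zz} \end{bmatrix} \begin{bmatrix} x_{t-1} \\ y_{t-1} \\ z_{t-1} \end{bmatrix} + \begin{bmatrix} u_{x,t} \\ u_{y,t} \\ u_{z,t} \end{bmatrix}. \tag{3.6}$$

Here, a test of $\infty$-step noncausality from $y_t$ to $x_t$,

$$y_t \overset{(\infty)}{\nrightarrow} x_t,$$

requires testing $h = pK_z + 1 = 2$ restrictions on the coefficient vector $\alpha$ which are of the following form:

$$r(\alpha) = \begin{bmatrix} \alpha_{xy} \\ \alpha_{xx}\alpha_{xy} + \alpha_{xy}\alpha_{yy} + \alpha_{xz}\alpha_{zy} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}. \tag{3.7}$$

These restrictions are fulfilled in the following cases:

$$\alpha_{xy} = \alpha_{xz} = 0, \quad \alpha_{zy} \neq 0, \tag{3.8}$$

$$\alpha_{xy} = \alpha_{zy} = 0, \quad \alpha_{xz} \neq 0, \tag{3.9}$$

$$\alpha_{xy} = \alpha_{xz} = \alpha_{zy} = 0. \tag{3.10}$$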
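The restrictions in (3.7) and the rank of their matrix of first-order partial derivatives can be checked numerically for the cases (3.8) to (3.10). The sketch below assumes NumPy and the column-stacking order $\alpha = \mathrm{vec}(A_1)$; the function names `r` and `jacobian` and the coefficient values are illustrative choices.

```python
import numpy as np

def r(alpha):
    """Restrictions (3.7); alpha = vec(A_1), stacked column by column."""
    A = np.asarray(alpha, dtype=float).reshape(3, 3, order="F")
    a_xx, a_xy, a_xz = A[0]
    a_yy, a_zy = A[1, 1], A[2, 1]
    return np.array([a_xy, a_xx * a_xy + a_xy * a_yy + a_xz * a_zy])

def jacobian(alpha):
    """Analytic derivative of r with respect to alpha', shape (2, 9)."""
    A = np.asarray(alpha, dtype=float).reshape(3, 3, order="F")
    a_xx, a_xy, a_xz = A[0]
    a_yy, a_zy = A[1, 1], A[2, 1]
    jac = np.zeros((2, 9))
    jac[0, 3] = 1.0           # d a_xy / d a_xy
    jac[1, 0] = a_xy          # derivative w.r.t. a_xx
    jac[1, 3] = a_xx + a_yy   # derivative w.r.t. a_xy
    jac[1, 4] = a_xy          # derivative w.r.t. a_yy
    jac[1, 5] = a_xz          # derivative w.r.t. a_zy
    jac[1, 6] = a_zy          # derivative w.r.t. a_xz
    return jac

# Case (3.9): a_xy = a_zy = 0 and a_xz != 0 (hypothetical values).
A_regular = np.array([[0.3, 0.0, 0.5], [0.0, 0.3, 0.5], [0.0, 0.0, 0.3]])
# Case (3.10): a_xy = a_xz = a_zy = 0.
A_singular = np.array([[0.3, 0.0, 0.0], [0.0, 0.3, 0.5], [0.0, 0.0, 0.3]])

rank_regular = np.linalg.matrix_rank(jacobian(A_regular.flatten(order="F")))
rank_singular = np.linalg.matrix_rank(jacobian(A_singular.flatten(order="F")))
```

Both parameter points satisfy the null hypothesis, but only the first yields a full-rank derivative matrix; at the second point the two rows are proportional.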

Note that the matrix of first-order partial derivatives of the function $r(\alpha)$,

$$\frac{\partial r}{\partial\alpha'} = \begin{bmatrix} 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\ \alpha_{xy} & 0 & 0 & \alpha_{xx} + \alpha_{yy} & \alpha_{xy} & \alpha_{xz} & \alpha_{zy} & 0 & 0 \end{bmatrix}, \tag{3.11}$$

has reduced rank if (3.10) holds. Hence, the standard Wald statistic will not have its asymptotic $\chi^2$-distribution if (3.10) is true under $H_0$. The modifications discussed in Sect. 2 can be used to obtain Wald statistics with asymptotic $\chi^2$-distributions under $H_0$. To derive a suitable covariance matrix $\Sigma_w$ for $\lambda^{mod}_{Wald}$, we proceed as follows. Let

$$\alpha(h) = \begin{bmatrix} \alpha \\ \alpha^{(2)} \\ \vdots \\ \alpha^{(h)} \end{bmatrix} \tag{3.12}$$

and $\hat\alpha(h)$ the corresponding estimator based on the multivariate LS estimator of $\alpha$. The hypotheses of interest are

$$H_0: (I_h \otimes R)\,\alpha(h) = 0 \quad \text{against} \quad H_1: (I_h \otimes R)\,\alpha(h) \neq 0, \tag{3.13}$$


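where $h = K_z p + 1$. Furthermore we define the $(hpK_xK_y \times hpK_xK_y)$ matrix

$$\Sigma_w(h) = \begin{bmatrix} 0 & 0 \\ 0 & I_{h-1} \otimes \mathrm{diag}(R\Sigma_\alpha R') \end{bmatrix}.$$

For some small $\lambda > 0$ let $w_\lambda^{(h)} \sim N(0, \lambda\Sigma_w(h))$ be independent of $\hat\alpha$. Moreover, note that $\partial\alpha(h)/\partial\alpha'$ consists of submatrices $\partial\alpha/\partial\alpha' = I_{K^2p}$ and

$$\frac{\partial\alpha^{(j)}}{\partial\alpha'} = (I_{Kp} \otimes J)\,\frac{\partial\,\mathrm{vec}(\mathbf{A}^j)}{\partial\alpha'} = (I_{Kp} \otimes J) \left[ \sum_{i=0}^{j-1} (\mathbf{A}')^{j-1-i} \otimes \mathbf{A}^i \right] \frac{\partial\,\mathrm{vec}(\mathbf{A})}{\partial\alpha'} = \sum_{i=0}^{j-1} (\mathbf{A}')^{j-1-i} \otimes J\mathbf{A}^i J', \quad j = 2, \ldots, h.$$

With this notation¹ and these results we get the following corollary of Proposition 1.

Corollary 1. Let $\hat\Sigma_\alpha(h)$ be the usual consistent estimator of $\Sigma_\alpha(h)$, that is,

$$\hat\Sigma_\alpha(h) = \frac{\partial\hat\alpha(h)}{\partial\alpha'}\, \hat\Sigma_\alpha\, \frac{\partial\hat\alpha(h)'}{\partial\alpha}.$$

Suppose $\hat w_\lambda^{(h)} \sim N(0, \lambda\hat\Sigma_w(h))$ is generated just as $w_\lambda^{(h)}$ but with covariance matrix

$$\hat\Sigma_w(h) = \begin{bmatrix} 0 & 0 \\ 0 & I_{h-1} \otimes \mathrm{diag}(R\hat\Sigma_\alpha R') \end{bmatrix}.$$

Then

$$\lambda^{mod}_{Wald} = T \left( (I_h \otimes R)\hat\alpha(h) + \frac{\hat w_\lambda^{(h)}}{\sqrt{T}} \right)' \left[ (I_h \otimes R)\hat\Sigma_\alpha(h)(I_h \otimes R') + \lambda\hat\Sigma_w(h) \right]^{-1} \left( (I_h \otimes R)\hat\alpha(h) + \frac{\hat w_\lambda^{(h)}}{\sqrt{T}} \right) \xrightarrow{d} \chi^2(hpK_xK_y)$$

under $H_0$.

¹ Note that in terms of the notation used in the foregoing sections $\theta = \alpha$, $r(\theta) = (I_h \otimes R)\alpha(h)$, $\Sigma_\theta = \Sigma_\alpha$ and

$$\Sigma_{r(\theta)} = \frac{\partial r(\theta)}{\partial\theta'}\, \Sigma_\theta\, \frac{\partial r(\theta)'}{\partial\theta} = (I_h \otimes R)\, \frac{\partial\alpha(h)}{\partial\alpha'}\, \Sigma_\alpha\, \frac{\partial\alpha(h)'}{\partial\alpha}\, (I_h \otimes R').$$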


Proof. Since $\mathrm{plim}\, \hat\Sigma_w(h) = \Sigma_w(h)$ we have $\hat w_\lambda^{(h)} \xrightarrow{d} w_\lambda^{(h)} \sim N(0, \lambda\Sigma_w(h))$ which is independent of $\hat\alpha$ by assumption. Moreover, the nonsingularity of $(I_h \otimes R)\Sigma_\alpha(h)(I_h \otimes R') + \lambda\Sigma_w(h)$ follows because $\Sigma_\alpha$ and hence $R\Sigma_\alpha R'$, the upper left-hand $(pK_xK_y \times pK_xK_y)$ block of $(I_h \otimes R)\Sigma_\alpha(h)(I_h \otimes R')$, is nonsingular. Furthermore the lower $((h-1)pK_xK_y \times (h-1)pK_xK_y)$ diagonal block of $\lambda\Sigma_w(h)$ is a diagonal matrix with positive diagonal elements which is also nonsingular. Thus the corollary follows from Proposition 1. []

In this corollary the efficiency of the estimator $\hat\alpha(h)$ is reduced by adding noise to some of its components. Since this is likely to result in a loss in power of the test, especially if too much noise is added in relation to the estimated variance, the amount of noise (the variance of the noise) is linked to the variance of the estimator by making it dependent on the variability of the estimator through $\hat\Sigma_w(h)$ and by choosing $\lambda$ close to zero. In fact, by using a sufficiently small $\lambda$ the investigator can in principle make the loss in efficiency arbitrarily small. In the simulation study reported in the next section we will explore the impact of different $\lambda$ values and we will also compare $\lambda^{mod}_{Wald}$ to $\lambda^{+}_{Wald}$. For the present example, the precise structure of the latter statistic follows easily from Proposition 2.
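The derivative $\partial\alpha^{(j)}/\partial\alpha' = \sum_{i=0}^{j-1} (\mathbf{A}')^{j-1-i} \otimes J\mathbf{A}^iJ'$ used to build $\hat\Sigma_\alpha(h)$ can be checked against finite differences. The sketch below assumes NumPy; all function names are hypothetical, and the random coefficient matrix serves only as a test input.

```python
import numpy as np

def companion_from_A(A):
    """Companion matrix for A = [A_1, ..., A_p] of shape (K, Kp)."""
    K, Kp = A.shape
    if Kp == K:
        return A.copy()
    lower = np.hstack([np.eye(Kp - K), np.zeros((Kp - K, K))])
    return np.vstack([A, lower])

def alpha_j(A, j):
    """alpha^(j) = vec(J * companion^j), stacked column by column."""
    K = A.shape[0]
    return np.linalg.matrix_power(companion_from_A(A), j)[:K, :].flatten(order="F")

def d_alpha_j(A, j):
    """Analytic derivative: sum_i (companion')^(j-1-i) kron (J companion^i J')."""
    K, Kp = A.shape
    comp = companion_from_A(A)
    total = np.zeros((K * Kp, K * Kp))
    for i in range(j):
        left = np.linalg.matrix_power(comp.T, j - 1 - i)
        right = np.linalg.matrix_power(comp, i)[:K, :K]  # J companion^i J'
        total += np.kron(left, right)
    return total

def numeric_jacobian(A, j, eps=1e-6):
    """Central finite-difference approximation of d alpha^(j) / d alpha'."""
    K, Kp = A.shape
    a = A.flatten(order="F")
    cols = []
    for k in range(a.size):
        up, down = a.copy(), a.copy()
        up[k] += eps
        down[k] -= eps
        f_up = alpha_j(up.reshape(K, Kp, order="F"), j)
        f_down = alpha_j(down.reshape(K, Kp, order="F"), j)
        cols.append((f_up - f_down) / (2 * eps))
    return np.column_stack(cols)

rng = np.random.default_rng(1)
A = 0.3 * rng.standard_normal((2, 4))  # hypothetical bivariate VAR(2)
max_gap = np.max(np.abs(d_alpha_j(A, 3) - numeric_jacobian(A, 3)))
```

For $j = 1$ the formula collapses to the identity matrix, in line with $\partial\alpha/\partial\alpha' = I_{K^2p}$.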

4. A simulation study

Although the modified Wald statistic $\lambda^{mod}_{Wald}$ has the limiting $\chi^2$-distribution given in Proposition 1 for any $\lambda > 0$ used in $w_\lambda^{(h)}$, the choice of this quantity is likely to be important for the small sample properties of the test. Therefore a Monte Carlo experiment was conducted in order to investigate small sample size and power of this modified Wald test for different values of $\lambda$. A $\lambda$ value of 0 and hence the standard Wald test is also included for comparison purposes. If the matrix of first-order partial derivatives of the restrictions, $\partial r/\partial\theta'$, has reduced rank, the standard Wald statistic no longer has a limiting $\chi^2$-distribution (see (2.3)) whereas the modified Wald statistic still has. Therefore, in that case we would expect the modified test to have an actual (empirical) size closer to the nominal (theoretical) one than the standard test. If on the other hand $\partial r/\partial\theta'$ has full rank, we expect both tests to have actual sizes close to the nominal ones. For small $\lambda$, the actual sizes and powers obtained with both tests should not differ much.

We will also compare $\lambda^{mod}_{Wald}$ to $\lambda^{+}_{Wald}$. For the latter test the choice of $c$ is critical. We have used different values of $c$. Note that a value of $c$ which converges at a rate higher than $T^{-1/2}$ ensures in principle that eventually all linearly independent restrictions are included in the test.

The objective of the following simulation study is threefold: first, we want to investigate whether in small samples the modified tests may overcome the difficulties of the standard test in the problematic case where $\partial r/\partial\theta'$ does not have full rank. Second, we would like to explore the impact of $\lambda$ on the size and power properties of a test based on $\lambda^{mod}_{Wald}$. Third, we would like to get an idea about the relative merits of the two alternative modifications of Wald tests for a particular example.

Throughout this section the following simulation design is used. We consider a three-dimensional VAR(1) process of the form described in (3.6). For simplicity the coefficient matrix $A_1$ is chosen upper triangular (i.e., $\alpha_{yx} = \alpha_{zx} = \alpha_{zy} = 0$). Thus there is no causal link from $x_t$ to $y_t$ and $z_t$ and from $y_t$ to $z_t$. On the other hand, causality from $y_t$ to $x_t$ is governed by the coefficients $\alpha_{xy}$, $\alpha_{xz}$ and $\alpha_{zy}$, the latter of which is zero in this case. Details will be discussed later when the specific values of the remaining coefficients used in the simulations are given. Because of the triangular form of the coefficient matrix, the elements on the main diagonal ($\alpha_{xx}$, $\alpha_{yy}$, $\alpha_{zz}$) are the eigenvalues of $A_1$. Consequently these elements can be used to control the distance of the process from the nonstationarity region. Stationarity is ensured when the diagonal elements are less than one in absolute value.

For a given coefficient matrix $A_1$, trivariate time series of length $T$ are generated by first drawing randomly $T + B$ observations for the trivariate residual series from a standard normal distribution, that is, $u_t \sim N(0, I_3)$. Here, $T$ denotes the sample size and $B$ the number of presample values. Different values of $T$ will be considered while $B = 100$ is used throughout the experiments. In the next step, a set of time series $x_t$, $y_t$ and $z_t$ is generated successively according to the data generation process in (3.6) with starting values of 0. The first $B$ presample values are then cut off to eliminate the starting-up effects. Although the process mean is zero in the data generation process, we use an intercept term in the estimation of the VAR(1) process.

We first explore the actual rejection frequencies of the tests under the null hypothesis of $\infty$-step noncausality from $y_t$ to $x_t$,

$$y_t \overset{(\infty)}{\nrightarrow} x_t.$$

The restrictions are given in (3.7). Since we have chosen $\alpha_{zy} = 0$, the restrictions specified in the null hypothesis are fulfilled when $\alpha_{xy} = 0$. The remaining potentially nonzero VAR coefficients are chosen as follows:

$$\alpha_{xx} = \alpha_{yy} = \alpha_{zz} \in \{-0.9, -0.3, 0.3, 0.9\}, \quad \alpha_{yz} = 0.5, \quad \alpha_{xz} \in \{0, 0.5\}. \tag{4.1}$$

The values of the diagonal elements of $A_1$ are such that processes close to the nonstationary region ($\alpha_{ii} = -0.9, 0.9$) as well as processes well inside the stationary region ($\alpha_{ii} = -0.3, 0.3$) are included. The coefficient $\alpha_{yz}$ is of little interest in our study and is therefore arbitrarily set at 0.5. Simulations with other values of $\alpha_{yz}$ indicated that the results are not sensitive with respect to the choice of this parameter. Therefore we do not vary $\alpha_{yz}$ in the following discussion.
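The data generation and estimation steps described above can be sketched as follows. The original computations were carried out in GAUSS; this Python version with NumPy is only an illustrative assumption, and the function names `simulate_var1` and `ls_var1` are hypothetical.

```python
import numpy as np

def simulate_var1(A1, T, B=100, rng=None):
    """Generate T observations from a VAR(1) with N(0, I) innovations.

    The recursion starts at zero and the first B generated values are dropped.
    """
    rng = np.random.default_rng() if rng is None else rng
    K = A1.shape[0]
    u = rng.standard_normal((T + B, K))
    y = np.zeros((T + B + 1, K))  # y[0] is the zero starting value
    for t in range(T + B):
        y[t + 1] = A1 @ y[t] + u[t]
    return y[B + 1:]              # discard starting value and presample

def ls_var1(y):
    """Multivariate LS for a VAR(1) with intercept; returns (intercept, A1_hat)."""
    Z = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    coef, *_ = np.linalg.lstsq(Z, y[1:], rcond=None)
    return coef[0], coef[1:].T

# One design point from (4.1): diagonal 0.3, alpha_yz = 0.5, alpha_xz = 0.5.
A1 = np.array([[0.3, 0.0, 0.5],
               [0.0, 0.3, 0.5],
               [0.0, 0.0, 0.3]])
data = simulate_var1(A1, T=5000, rng=np.random.default_rng(42))
intercept_hat, A1_hat = ls_var1(data)
```

Note that the first observation of the returned sample serves as the initial lag in `ls_var1`, so the regression uses one observation fewer than the simulated series contains.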


Given these parameter values, $\alpha_{xz}$ determines whether the standard Wald test has its asymptotic $\chi^2$-distribution or not: the matrix of first-order partial derivatives of the function $r(\alpha)$ has full rank for $\alpha_{xz} \neq 0$ and reduced rank for $\alpha_{xz} = 0$ (see (3.11)). Hence, the latter parameterization represents the critical case where the standard Wald test does not have its usual limiting $\chi^2$-distribution (see (3.10)).

With this experimental design, 1000 replications of the experiment are performed. In each case the VAR(1) processes are estimated and the modified Wald statistics $\lambda^{mod}_{Wald}$ and $\lambda^{+}_{Wald}$ are computed and compared to the 95% quantile of a $\chi^2$-distribution to determine the relative rejection frequencies. The modified Wald statistic $\lambda^{mod}_{Wald}$ is computed as given in Proposition 1 with $\lambda$ values 0, 0.01, 0.1 and 10. Note that for a value $\lambda = 0$ the modified Wald statistic reduces to the standard one. For the computation of the modified Wald statistic $\lambda^{+}_{Wald}$ different threshold values $c_1 = \hat\lambda_1 T^{-1/3}$ and $c_2 = \hat\lambda_1 T^{-1/2}$ are used. Other values produced similar results and are therefore not reported. Note that $c_1$ goes to zero somewhat slower than $1/\sqrt{T}$. Normalization with respect to the largest eigenvalue $\hat\lambda_1$ of $\hat\Lambda$ (see Proposition 2) corrects for the scaling of the data. Since the matrix of first-order partial derivatives $\partial r/\partial\alpha'$ in (3.11) has at least rank 1 and at most rank 2, only the second (smaller) eigenvalue $\hat\lambda_2$ is set to zero in the simulation whenever the condition $\hat\lambda_2 < c$ is met.² For comparison purposes results are included for a modified Wald statistic $\lambda^{+}_{Wald}$ with the rank of the generalized inverse always set to one. All calculations have been carried out with GAUSS 3.2 for workstations. Pseudo standard normal random numbers have been generated for the residuals (see (3.6)) and for $\hat w_\lambda^{(h)}$ (see Corollary 1). Notice that for all scenarios the same set of 1000 residual series has been used to ease comparability of the results.

Some results are given in Tables 1 and 2. As to the notation, note that a value $\lambda = 0$ in conjunction with the modified Wald statistic $\lambda^{mod}_{Wald}$ denotes the standard Wald statistic. A value '1' signifies that in the computation of the modified Wald statistic $\lambda^{+}_{Wald}$ the rank of the generalized inverse has always been set to one whereas $c_1$ ($c_2$) signifies that the rank has been determined in a pretest with threshold value $c_1$ ($c_2$).

In the right-hand panels of Tables 1 and 2, $\alpha_{xz} \neq 0$ and hence the standard Wald statistic has an asymptotic $\chi^2(2)$-distribution whereas in the left-hand panels of both tables $\alpha_{xz} = 0$ and hence the standard Wald statistic fails to have its usual limiting $\chi^2$-distribution. For $T = 1000$ the actual distributions of the test statistics should be close to the asymptotic ones. This is clearly seen in the right-hand panel of Table 1 where all the relative rejection frequencies are close to 5%. Notice that the standard error of a 5% rejection probability estimated from 1000 independent replications of the experiment is $\sqrt{0.05 \times 0.95/1000} \approx 0.0069$.
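The Monte Carlo standard error quoted above follows from the binomial variance of a rejection frequency. A short sketch using only the Python standard library (the function name `mc_standard_error` is hypothetical):

```python
import math

def mc_standard_error(p, n_rep):
    """Standard error of a rejection frequency estimated from n_rep independent replications."""
    return math.sqrt(p * (1.0 - p) / n_rep)

se = mc_standard_error(0.05, 1000)
# Two-standard-error band around the nominal 5% level.
band = (0.05 - 2.0 * se, 0.05 + 2.0 * se)
```

Under these assumptions, empirical rejection frequencies between roughly 3.6% and 6.4% are within two standard errors of the nominal 5% level.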

² Note that when testing for multi-step causality in a stationary VAR model of order $p$ and $K_x, K_y, K_z \geq 1$ the minimal rank of $\partial r/\partial\theta'$ will equal $pK_xK_y$.


Table 1
Relative rejection frequencies (in %) of Wald and modified Wald tests, nominal significance level 5%, T = 1000

                       α_xz = 0                                        α_xz = 0.5
        λ^mod_Wald (λ)           λ^+_Wald (rk)        λ^mod_Wald (λ)           λ^+_Wald (rk)
α_ii    0     0.01  0.1   10     1     c1    c2       0     0.01  0.1   10     1     c1    c2
-0.9    1.5   4.5   4.7   4.7    4.6   4.6   4.6      6.2   6.3   5.5   5.7    5.3   5.3   5.3
-0.3    1.5   3.7   4.9   5.2    4.8   4.8   4.8      4.1   4.8   5.2   5.4    4.7   3.8   4.1
 0.3    1.9   3.5   5.3   5.3    5.2   5.2   5.2      5.5   5.3   5.5   4.5    4.7   5.6   5.5
 0.9    1.5   4.4   4.6   4.7    3.6   3.6   3.6      6.3   6.7   6.2   5.5    6.7   6.7   6.7

Table 2
Relative rejection frequencies (in %) of Wald and modified Wald tests, nominal significance level 5%, T = 100

                       α_xz = 0                                        α_xz = 0.5
        λ^mod_Wald (λ)           λ^+_Wald (rk)        λ^mod_Wald (λ)           λ^+_Wald (rk)
α_ii    0     0.01  0.1   10     1     c1    c2       0     0.01  0.1   10     1     c1    c2
-0.9    5.7   5.9   7.0   7.2    10.4  10.4  10.4     10.3  10.7  9.4   6.8    10.5  10.5  10.5
-0.3    1.9   1.9   3.2   3.7    5.1   5.1   5.1      4.5   4.3   4.0   3.8    6.1   5.5   4.7
 0.3    1.5   1.6   2.7   4.1    5.0   5.0   5.0      4.0   3.6   3.7   3.9    5.5   5.8   4.1
 0.9    5.2   5.2   6.8   7.8    10.5  10.5  10.5     13.1  12.3  11.4  8.8    11.2  11.2  11.2

In the left-hand panel Ctxz= 0. Now, the relative rejection frequencies of the standard Wald statistic are much lower than 5%, reflecting the fact that the standard Wald test does not have a limiting ;(2(2)-distribution in that case. In contrast, the relative rejection frequencies of the modified Wald tests are much closer to the nominal significance level of 5%. For instance, the relative rejection frequencies of the modified Wald tests "~Wald ~mod increase already for a value of 2 as small as 0.01. Moreover, the size of the test does not seem to be very sensitive to the choice of 2. However, a value 2 >~0.1 seems more suitable since the actual sizes then all fall inside a 2-standard-deviation confidence interval around 0.05. + For ct= = 0 the modified Wald statistics 2Wald all show the same relative rejection frequencies which means that the true rank one is estimated correctly with both threshold values cl and c2. For a sample as small as T = 100 the relative rejection frequencies depend on how close the process is to the nonstationary region. For the modified Wald

328

H. Liitkepohl, M.M. Burda / Journal of Econometrics 78 (1997) 315-332

statistic ]rnod "~Wald there is a tendency to underreject for ccxz= 0 and very small 2 values. Also, in small samples this statistic behaves more sensitive with respect to different 2 values. For eigenvalues 0.9,-0.9 all tests show a clear tendency towards overrejection. Although it is risky to draw general conclusions from a limited Monte Carlo experiment of the present type one result is obvious: The actual size of the standard Wald test can be substantially different from the intended theoretical size if the test statistic is used in conjunction with the usual ~(2 critical values, whereas the modified tests result in a clear improvement in this respect. Obviously, this depends to some extent on 2 and, hence, on the amount of noise added. Clearly, in smaU samples, adding very little noise (2 small) does not change the test ~mod statistic sufficiently to obtain a full size correction of the test bascd on "~Wald, + whereas "~Wa|ddoes better in this rcspcct. Also, in the present examplc, one m a y argue that thc standard test is conscrvativc and thcrefore rcduccd power is the only consequence of using it in unmodificd form. Therefore, using a modified test which has also reduced powcr m a y not result in any advantage. This argument can always bc used if the standard Wald test is conservative. Unfortunatcly, it cannot be shown in gcncral that the standard test is conservative and in particular we arc unablc to show such a result for thc presently considcrcd causality hypothescs. W c will considcr the power of the tcsts ncxt. For that purposc wc choosc ~xy= ~ with values of ~ which result in a reasonable range of powers. Again, a valuc 2 = 0 permits comparison with thc standard Wald tcst. All calculations arc bascd on time scrics of length T = 1000 because this way wc hope to gct results close to the asymptotic ones which arc not distorted by small sample deviations of the typc observed in Table 2 for smallcr T valucs. 
Results are presented in Table 3 for negative eigenvalues $\alpha_{ii} = -0.3, -0.9$. Note that for positive eigenvalues, results did not change significantly. For $\alpha_{xz} = 0$ and $\alpha_{xy} = \delta$, we consider the power for alternatives close to a singularity point. While the asymptotic properties of the standard Wald statistic are unknown in this case, the modified Wald statistic $\lambda^{\mathrm{mod}}_{\mathrm{Wald}}$ follows asymptotically a noncentral $\chi^2(2)$-distribution (see (2.6)) and the modified Wald statistic $\lambda^{+}_{\mathrm{Wald}}$ has a limiting noncentral $\chi^2(1)$-distribution if the true rank (= 1) is estimated correctly (see (2.9)). Therefore we expect the modified Wald tests to have superior power in this case. This is clearly seen in the left-hand panel of Table 3. At least in those cases where the power is generally low, results are in favor of the modified Wald tests. For larger $\delta$ the power of $\lambda^{\mathrm{mod}}_{\mathrm{Wald}}$ is about the same as that of the standard test whereas $\lambda^{+}_{\mathrm{Wald}}$ performs generally slightly better. In the right-hand panel of Table 3 alternatives close to a regularity point are considered. In this case all Wald statistics follow asymptotically a noncentral $\chi^2$-distribution. One might expect the modified Wald test $\lambda^{\mathrm{mod}}_{\mathrm{Wald}}$ to perform slightly worse than the standard Wald statistic due to a smaller noncentrality parameter which results from adding noise. However, this is not reflected in the simulation results. Moreover, the modified Wald statistic $\lambda^{+}_{\mathrm{Wald}}$ performs slightly better in


Table 3
Power of Wald and modified Wald tests, nominal significance level 5%, T = 1000

$\alpha_{xx} = \alpha_{yy} = \alpha_{zz} = -0.3$

Left-hand panel: $\alpha_{xz} = 0$

            $\lambda^{\mathrm{mod}}_{\mathrm{Wald}}$ ($\lambda$)               $\lambda^{+}_{\mathrm{Wald}}$
$\delta$    0       0.01    0.1     10      rk = 1  $c_1$   $c_2$
0           1.5     3.7     4.9     5.2     4.8     4.8     4.8
0.00316     1.7     4.2     5.5     5.7     5.6     5.6     5.6
0.00948     2.6     5.4     6.6     6.8     7.4     7.4     7.4
0.0158      4.2     7.1     8.9     9.6     10.1    10.1    10.1
0.0316      12.2    15.6    17.4    18.1    23.5    23.5    23.5
0.0632      50.9    55.8    59.0    59.2    70.0    70.0    70.0
0.1264      99.0    99.2    99.5    99.7    99.7    99.7    99.7

Right-hand panel: $\alpha_{xz} = 0.5$

            $\lambda^{\mathrm{mod}}_{\mathrm{Wald}}$ ($\lambda$)               $\lambda^{+}_{\mathrm{Wald}}$
$\delta$    0       0.01    0.1     10      rk = 1  $c_1$   $c_2$
0           4.1     4.8     5.2     5.4     4.7     3.8     4.1
0.00316     4.7     5.1     5.5     5.8     5.0     4.4     4.7
0.00948     5.7     5.8     6.5     7.1     6.0     5.4     5.7
0.0158      7.7     7.5     9.0     8.6     8.8     7.6     7.7
0.0316      18.3    18.1    17.4    18.0    21.8    18.7    18.3
0.0632      56.4    56.2    55.8    56.1    64.8    57.4    56.4
0.1264      99.3    99.2    99.3    99.3    99.7    99.3    99.3

$\alpha_{xx} = \alpha_{yy} = \alpha_{zz} = -0.9$

Left-hand panel: $\alpha_{xz} = 0$

            $\lambda^{\mathrm{mod}}_{\mathrm{Wald}}$ ($\lambda$)               $\lambda^{+}_{\mathrm{Wald}}$
$\delta$    0       0.01    0.1     10      rk = 1  $c_1$   $c_2$
0           1.5     4.5     4.7     4.7     4.6     4.6     4.6
0.00316     4.6     8.1     8.3     8.4     10.7    10.7    10.7
0.00632     15.1    20.4    21.1    20.9    25.7    25.7    25.7
0.00948     35.2    37.5    37.9    37.6    47.0    47.0    47.0
0.01264     58.4    59.7    59.7    60.0    70.6    70.6    70.6
0.0158      78.9    79.5    79.1    79.4    87.8    87.8    87.8
0.0316      100.0   100.0   100.0   100.0   100.0   100.0   100.0

Right-hand panel: $\alpha_{xz} = 0.5$

            $\lambda^{\mathrm{mod}}_{\mathrm{Wald}}$ ($\lambda$)               $\lambda^{+}_{\mathrm{Wald}}$
$\delta$    0       0.01    0.1     10      rk = 1  $c_1$   $c_2$
0           6.2     6.3     5.5     5.7     5.3     5.3     5.3
0.00316     7.3     6.8     6.3     6.6     6.8     6.8     6.8
0.00632     10.0    9.8     9.3     9.0     9.7     9.7     9.7
0.00948     14.7    14.8    14.0    13.1    15.8    15.8    15.8
0.01264     21.2    21.2    20.2    19.9    25.2    25.2    25.2
0.0158      30.8    31.2    30.8    31.1    37.3    37.3    37.3
0.0316      91.9    92.0    92.2    91.8    94.2    94.2    94.2

terms of power when the rank is permanently underestimated. This illustrates that a test with fewer degrees of freedom may have superior power properties relative to a competitor with more degrees of freedom (see also the finding of Gallant (1977) mentioned in Sect. 2). Note that the alternative considered here is of the form $r(\alpha) = \begin{bmatrix} \delta \\ (\alpha_{xx} + \alpha_{yy})\delta \end{bmatrix}$ (see Eq. (3.7)). Since $\alpha_{xy}$, the first element checked in $H_0$, is not contaminated by noise in the test based on $\lambda^{\mathrm{mod}}_{\mathrm{Wald}}$, one may expect that the good power properties of this


Table 4
Power of Wald and modified Wald tests, nominal significance level 5%, T = 1000

            $\lambda^{\mathrm{mod}}_{\mathrm{Wald}}$ ($\lambda$)               $\lambda^{+}_{\mathrm{Wald}}$
$\delta$    0       0.01    0.1     10      rk = 1  $c_1$   $c_2$
-0.0158     9.2     9.8     8.9     5.6     6.8     7.0     9.2
0           3.9     4.6     5.0     5.4     6.7     6.4     3.9
0.0158      8.5     8.0     5.7     5.8     6.2     6.9     8.5
0.0316      23.5    22.4    15.2    6.3     6.4     9.4     23.5
0.0632      76.2    71.6    45.4    6.8     6.7     15.2    76.2
0.1264      100.0   100.0   98.0    8.4     11.1    32.4    100.0

modified test are partly due to violation of the first restriction in (3.7). Moreover, the good power properties of $\lambda^{+}_{\mathrm{Wald}}$ may be due to the fact that whenever the rank is set to one, more weight is given to the first restriction. Hence, underestimating the rank has not reduced the power since the first restriction was always violated. The standard Wald test may be expected to have clear power advantages when only the second restriction in (3.7) is violated. To investigate this possibility we have also generated 1000 realizations of a VAR(1) process of sample size T = 1000 with coefficient matrix

84

o3 o0.3o ~ 0.7 0.5

0.4

with $\alpha_{xz} = \delta$ and $\delta = -0.0158, 0, 0.0158, 0.0316, 0.0632, 0.1264$. Stationarity of the process is guaranteed for all $\delta$ values. Here again, the alternatives are close to a regularity point so that the standard and the modified Wald statistics have asymptotic noncentral $\chi^2$-distributions. For this process only the second restriction in (3.7) is violated if $\alpha_{xz} \neq 0$ while the first restriction is always fulfilled. Thus, this process is clearly favorable to the standard procedure which maintains its usual asymptotic properties for the processes considered now. In contrast, since the second restriction is contaminated by noise in the test based on $\lambda^{\mathrm{mod}}_{\mathrm{Wald}}$, this inefficiency should result in lower power. Also, we now expect the modified Wald statistic $\lambda^{+}_{\mathrm{Wald}}$ to have reduced power in those cases where the rank is set to one since then $\lambda^{+}_{\mathrm{Wald}}$ takes into account only the first restriction which is not violated. Hence, we expect the modified Wald statistic $\lambda^{+}_{\mathrm{Wald}}$ to have asymptotic power equal to 5% in this case. This is clearly seen in Table 4. Taking into account a two-standard-deviation interval for the estimated rejection frequencies, the relative rejection frequency of the modified Wald statistic $\lambda^{+}_{\mathrm{Wald}}$ comes close to 5% in most cases if the rank is permanently underestimated (rk = 1) whereas $\lambda^{+}_{\mathrm{Wald}}$ trivially performs as well as the standard Wald statistic if


the rank is estimated correctly (threshold value $c_2$). The power of the modified Wald test $\lambda^{\mathrm{mod}}_{\mathrm{Wald}}$ now decreases in contrast to that of the standard Wald test for increasing $\lambda$ values. As can be seen from (2.6), for increasing $\lambda$ values and hence increasing size of $\lambda\Sigma_w$ relative to the covariance matrix $\Sigma_r(\theta)$, the noncentrality parameter is driven down to zero. Of course, the last scenario is an extreme case which may be of limited importance from a practical point of view. However, the simulation results show that both tests $\lambda^{\mathrm{mod}}_{\mathrm{Wald}}$ and $\lambda^{+}_{\mathrm{Wald}}$ may depend strongly on the structure of the true underlying data generation mechanism. Moreover, the modified test $\lambda^{\mathrm{mod}}_{\mathrm{Wald}}$ may in principle have reduced power relative to the standard procedure, depending on the size of the parameter $\lambda$, whereas for the modified Wald test $\lambda^{+}_{\mathrm{Wald}}$ it is crucial to determine the critical value $c$ adequately. Though theoretical considerations would argue in favor of a threshold value which converges at a lower rate than $1/\sqrt{T}$, in the simulation experiment a value $c = O(T^{-1/2})$ gave the best results. Overall, the simulations show that for the present example the gains from using the modified tests are limited whereas the potential losses may be quite substantial. Although this is partly a consequence of our special simulation design the latter problem is a more general one: It is always possible to add so much noise that $\lambda^{\mathrm{mod}}_{\mathrm{Wald}}$ is completely dominated by the noise and thereby loses all its power. This is apparent for $\lambda = 10$ in Table 4. Moreover, $\lambda^{+}_{\mathrm{Wald}}$ has no or very little power for certain alternatives (see rk = 1 in Table 4). In the latter case it may be possible to overcome this problem by a clever choice of the threshold value or by developing a suitable pretesting procedure. In the context of causality testing such a procedure will be more difficult to design for systems with higher order $p$ and more variables because the rank may then take any value between $pK_xK_y$ and $hpK_xK_y$.
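The loss of power for large $\lambda$ can be illustrated numerically. The following Python sketch assumes, in the spirit of the description of (2.6) above, that the noncentrality parameter has the form $d'(\Sigma_r + \lambda\Sigma_w)^{-1}d$, where $d$ stands for the scaled deviation of the restrictions from zero. This expression, the restriction to diagonal covariance matrices, the function names and the numerical values are assumptions made for illustration only. The power of a noncentral $\chi^2(2)$ test is evaluated with the standard Poisson-mixture representation of the noncentral $\chi^2$-distribution.

```python
import math

# Exact 5% critical value of chi-square(2): P(X > c) = exp(-c / 2) = 0.05.
CRIT_CHI2_2 = -2.0 * math.log(0.05)


def chi2_sf_even(c, k):
    """P(chi-square with 2k degrees of freedom > c), closed form for even df."""
    term, total = 1.0, 0.0
    for i in range(k):
        total += term
        term *= (c / 2.0) / (i + 1)
    return math.exp(-c / 2.0) * total


def ncx2_sf_df2(c, nc, terms=200):
    """P(noncentral chi-square(2) with noncentrality nc > c).

    Uses the Poisson mixture of central chi-square(2 + 2j) distributions;
    intended for moderate nc only (exp(-nc / 2) must not underflow).
    """
    weight = math.exp(-nc / 2.0)  # Poisson weight for j = 0, mean nc / 2
    total = 0.0
    for j in range(terms):
        total += weight * chi2_sf_even(c, j + 1)
        weight *= (nc / 2.0) / (j + 1)
    return total


def noncentrality(d, sigma_r_diag, lam, sigma_w_diag):
    """Assumed noncentrality d' (Sigma_r + lam * Sigma_w)^{-1} d, diagonal case."""
    return sum(di * di / (s + lam * w)
               for di, s, w in zip(d, sigma_r_diag, sigma_w_diag))


d = (0.0, 3.0)             # hypothetical: only the second restriction is violated
sigma_r_diag = (1.0, 1.0)  # hypothetical diagonal of Sigma_r
sigma_w_diag = (0.0, 1.0)  # noise on the second restriction only
powers = {lam: ncx2_sf_df2(CRIT_CHI2_2,
                           noncentrality(d, sigma_r_diag, lam, sigma_w_diag))
          for lam in (0.0, 0.01, 0.1, 10.0)}
```

Under these assumptions the entries of `powers` decrease monotonically in $\lambda$ and approach the nominal level of 5% as $\lambda$ grows, which is qualitatively the pattern seen for $\lambda^{\mathrm{mod}}_{\mathrm{Wald}}$ in Table 4.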
For $\lambda^{\mathrm{mod}}_{\mathrm{Wald}}$ a suitable choice of the $\Sigma_w$ matrix may help. In the present example one could, for instance, try different $\lambda$ values and reject $H_0$ if $\lambda^{\mathrm{mod}}_{\mathrm{Wald}}$ exceeds a prespecified critical value for one of them. In other cases, it may be possible to develop data dependent procedures for choosing $\Sigma_w$. As mentioned earlier, the present study lays out a general framework. How well it works will depend on the particular problem under consideration. In any case, the way the noise is added and the choice of the threshold values for $\lambda^{+}_{\mathrm{Wald}}$ have to be decided upon in the context of a particular problem. No general procedure for that purpose can be suggested here. It is also not possible to give general guidelines as to which of the two modifications is generally more suitable. They both have relative merits and neither one is uniformly more powerful. As we have mentioned in Sect. 2, there are alternatives where $\lambda^{+}_{\mathrm{Wald}}$ has very little or no power whereas $\lambda^{\mathrm{mod}}_{\mathrm{Wald}}$ always has some power. Results in Table 4 show that $\lambda^{\mathrm{mod}}_{\mathrm{Wald}}$ may indeed be more powerful than $\lambda^{+}_{\mathrm{Wald}}$ whereas the results in Table 3 illustrate a situation where the reverse is true. Of course, in practice both tests could be performed. If one of them rejects $H_0$ this decision should be respected by the investigator.
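The dependence of $\lambda^{+}_{\mathrm{Wald}}$ on the threshold value can be sketched in a few lines of Python. Since (2.9) is not reproduced in this section, the construction below is an assumption: eigenvalues of the estimated covariance matrix of the restrictions that do not exceed a threshold $c$ are dropped, the remaining ones define a generalized inverse, and the number of retained eigenvalues serves as the rank and as the degrees of freedom. All names and numbers are hypothetical.

```python
import math

# 5% critical values of the chi-square distribution with 1 and 2 degrees of freedom.
CRIT_5PCT = {1: 3.841458820694124, 2: 5.991464547107979}


def sym_eig2(m):
    """Eigenvalues and orthonormal eigenvectors of a symmetric 2x2 matrix."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    theta = 0.5 * math.atan2(2.0 * b, a - d)  # Jacobi rotation angle
    c, s = math.cos(theta), math.sin(theta)
    l1 = a * c * c + 2.0 * b * c * s + d * s * s
    l2 = a * s * s - 2.0 * b * c * s + d * c * c
    return [(l1, (c, s)), (l2, (-s, c))]


def ginv_wald(r_hat, sigma_r, T, c_threshold):
    """Generalized-inverse Wald statistic with thresholded rank (assumed form).

    Returns the statistic and the estimated rank (degrees of freedom).
    """
    stat, rank = 0.0, 0
    for eigval, vec in sym_eig2(sigma_r):
        if eigval > c_threshold:  # keep only eigenvalues above the threshold
            proj = vec[0] * r_hat[0] + vec[1] * r_hat[1]
            stat += proj * proj / eigval
            rank += 1
    return T * stat, rank


def rejects(stat, rank):
    """Reject H0 at the 5% level using chi-square(rank) critical values."""
    return rank > 0 and stat > CRIT_5PCT[rank]


# Hypothetical inputs: only the second restriction is violated, and the
# second eigenvalue of the covariance matrix is small.
sigma_r = [[1.0, 0.0], [0.0, 1e-4]]
r_hat = [0.0, 0.05]
stat_low, rank_low = ginv_wald(r_hat, sigma_r, 1000, 1e-3)    # rank set to one
stat_full, rank_full = ginv_wald(r_hat, sigma_r, 1000, 1e-5)  # full rank kept
```

With the larger threshold the small eigenvalue is discarded, the violated second restriction receives no weight and the test cannot reject; with the smaller threshold both restrictions enter and the statistic is large. This mirrors the contrast between the rk = 1 and $c_2$ columns of Table 4 only qualitatively.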


5. Conclusions

In this paper we have proposed two modified Wald statistics $\lambda^{\mathrm{mod}}_{\mathrm{Wald}}$ and $\lambda^{+}_{\mathrm{Wald}}$ which maintain their asymptotic $\chi^2$-distributions in those cases where due to nonregularity conditions the standard Wald statistic fails to have its limiting $\chi^2$-distribution. Computation of both modified Wald statistics is illustrated by reference to testing for multi-step causality. Here, nonregularities may arise due to highly nonlinear restrictions on the VAR coefficient matrices. In a small simulation experiment we find that in the case where the standard Wald statistic has its usual limiting $\chi^2$-distribution the modified tests do not perform worse than the standard Wald test in terms of size. The modified tests do perform better in the critical case where the standard Wald statistic has unknown limiting distribution. In this case the modified procedures are also seen to be potentially more powerful. Generally, both suggested procedures have their relative merits and disadvantages. They both require subjective choices by the investigator that will have an impact on the performance of the modified tests. Currently we are unable to give general advice on how to make these choices in a way that results in superior properties of the procedures. We are aware that this problem may limit the usefulness of our procedures. We feel, however, that they may still be helpful for a better understanding of Wald tests in nonstandard situations.

References

Andrews, D.W.K., 1987, Asymptotic results for generalized Wald tests, Econometric Theory 3, 348-358.
Boudjellaba, H., J.-M. Dufour and R. Roy, 1992, Testing causality between two vectors in multivariate ARMA models, Journal of the American Statistical Association 87, 1082-1090.
Breusch, T.S. and P. Schmidt, 1988, Alternative forms of the Wald test: How long is a piece of string? Communications in Statistics, Theory and Methods 17, 2789-2795.
Bruneau, C. and J.-P. Nicolai, 1992, Probabilistic foundations of causal analysis in a stationary vectorial autoregressive model, Working Paper (CREST-ENSAE, Paris).
Dolado, J. and H. Lütkepohl, 1996, Making Wald tests work for cointegrated VAR systems, Econometric Reviews 15, 369-386.
Dufour, J.-M. and E. Renault, 1993, Short-run and long-run causality in time series, Working Paper (CRDE, Université de Montréal, and GREMAQ and IDEI, Toulouse).
Gallant, A.R., 1977, Testing a nonlinear regression specification: A nonregular case, Journal of the American Statistical Association 72, 523-530.
Gallant, A.R. and G. Tauchen, 1989, Seminonparametric estimation of conditionally constrained heterogeneous processes: Asset pricing applications, Econometrica 57, 1091-1120.
Gregory, A.W. and M.R. Veall, 1985, Formulating Wald tests of nonlinear restrictions, Econometrica 53, 1465-1468.
Lütkepohl, H., 1991, Introduction to multiple time series analysis (Springer, Berlin).
Toda, H.Y. and T. Yamamoto, 1995, Statistical inference in vector autoregressions with possibly integrated processes, Journal of Econometrics 66, 225-250.
Yamamoto, T., 1994, A simple approach to the statistical inference in linear time series models which may have some unit roots, Working Paper (Hitotsubashi University, Tokyo).