
Economics Letters 12 (1983) 269-275
North-Holland

SMALL SAMPLE PROPERTIES OF ALTERNATIVE FORMS OF THE LAGRANGE MULTIPLIER TEST *

Russell DAVIDSON and James G. MacKINNON

Queen's University, Kingston, Ont., Canada K7L 3N6

Received 3 November 1982

We argue that alternative forms of the LM test may perform very differently in small samples. One common variant often yields misleading inferences, while an alternative variant, which we have recently proposed, performs much better.

1. Introduction

In recent years a number of papers have shown how several variants of the Lagrange Multiplier, or 'score', test may be used in econometrics; see, among many others, Godfrey (1978), Breusch and Pagan (1980) and Godfrey and Wickens (1981). These LM tests are appealing because they require estimation only under the null hypothesis, and because they can often be calculated very easily using artificial ordinary least squares regressions. In most cases several variants of the LM statistic may be computed; these are of course asymptotically equivalent, but will differ in finite samples. In this note we point out that alternative variants of the LM test may have very different distributional properties in small samples, and argue that one variant in particular should be avoided when the sample size is small. This argument is supported by evidence from a sampling experiment.

* This research was supported, in part, by grants from the Social Sciences and Humanities Research Council of Canada, and from the School of Graduate Studies, Queen's University.

0165-1765/83/$3.00 © 1983, Elsevier Science Publishers B.V. (North-Holland)

R. Davidson, J.G. MacKinnon / Small samples of alternative forms of LM test

2. The OPG form of the LM test

Consider initially a very general case, where θ denotes a vector of k unknown parameters, l_t(θ) denotes the contribution to the loglikelihood of the tth observation, and L(θ) denotes the loglikelihood function. Of course, l_t(θ) and L(θ) depend on data, for which n observations are available, but this dependence will be suppressed for notational convenience. By definition,

L(\theta) = \sum_{t=1}^{n} l_t(\theta). \qquad (1)

Let g(θ) denote the gradient of L(θ), considered as a k × 1 column vector, and let G(θ) denote the n × k matrix whose typical element is

G_{ti}(\theta) = \partial l_t(\theta) / \partial \theta_i, \qquad (2)

so that

g_i(\theta) = \sum_{t=1}^{n} G_{ti}(\theta). \qquad (3)

Let θ̂ denote the restricted maximum likelihood estimate of θ, subject to r distinct restrictions. Then the LM test statistic may be written as

(1/n)\, g(\hat\theta)' \hat I^{-1} g(\hat\theta), \qquad (4)

where Î denotes any k × k matrix which consistently estimates 1/n times the information matrix. Under standard regularity conditions, this test statistic is asymptotically distributed as Chi-squared with r degrees of freedom; see Breusch and Pagan (1980). Alternative forms of the LM test differ in their choice of Î. One possibility is simply to use minus 1/n times the Hessian of the loglikelihood function, evaluated at θ̂. However, this is often time-consuming to obtain analytically, and inconvenient to work with once obtained. A simpler approach is to use

(1/n)\, G(\hat\theta)' G(\hat\theta) \qquad (5)

in the role of Î. It is well known that this expression consistently estimates 1/n times the information matrix; its use in estimation and inference has been advocated by Berndt, Hall, Hall and Hausman (1974).


Let us denote g(θ̂) by ĝ, G(θ̂) by Ĝ, and a vector of n ones by ι. Then the LM statistic (4), using (5) for Î, is simply

\hat g' (\hat G' \hat G)^{-1} \hat g = \iota' \hat G (\hat G' \hat G)^{-1} \hat G' \iota, \qquad (6)

where the equality follows immediately from (3). Now it is easy to see that the expression on the right-hand side of (6) is simply the explained sum of squares from the artificial linear regression

\iota = \hat G b + u, \qquad (7)

in which a vector of ones is regressed on Ĝ. Since the total sum of squares of this regression is n, the LM statistic may readily be calculated as n minus the sum of squared residuals.

The version of the LM test just described will be referred to as the 'OPG' variant, where OPG stands for 'Outer Product of the Gradient'. This variant can be used in a wide variety of circumstances, and is very easy to compute. However, there is reason to believe that its small-sample properties may not be good. Remember that the information matrix itself depends only on θ, and on any conditioning variables, but not on the data for the dependent variable(s). Ideally, then, an estimator of the information matrix should depend on the data only through θ̂. The fact that ML estimators are asymptotically efficient, and that functions of ML estimators are themselves ML, will then guarantee the efficiency of such an estimator. The reason that efficiency is important here is that the asymptotic Chi-squared distribution of (4) depends on Î being a non-stochastic matrix; the greater the randomness of Î in small samples, the less closely we would expect the small-sample distribution of (4) to approximate its asymptotic one. Now Ĝ'Ĝ clearly depends on the data to a substantial degree. Consider, for example, the classical linear regression model: y = Xβ + u, u ~ N(0, σ²I). The ijth element of (5) will be

(1/n)\, \hat\sigma^{-4} \sum_{t=1}^{n} \hat u_t^2 x_{ti} x_{tj}. \qquad (8)

In contrast, the corresponding element of the inverse of the OLS covariance matrix estimate will be

(1/n)\, \hat\sigma^{-2} \sum_{t=1}^{n} x_{ti} x_{tj}, \qquad (9)


which depends on the data only through σ̂. Thus, we would expect the OPG form of the LM test to be less well-behaved in small samples than alternative forms based on (9).
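As an illustration (ours, not from the paper), the OPG calculation can be sketched numerically for a simple case: a scalar regression y_t = βx_t + u_t with normal errors, testing β = 0. The variable names and the use of NumPy are assumptions of this sketch. It verifies that n minus the sum of squared residuals from regressing ι on Ĝ equals ι'Ĝ(Ĝ'Ĝ)^{-1}Ĝ'ι.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
y = rng.normal(size=n)             # data generated under H0: beta = 0

# Restricted ML estimate under H0 (beta = 0): sigma^2 = mean of y^2
sig2 = np.mean(y**2)

# Matrix G of per-observation score contributions, evaluated at the
# restricted MLE; columns correspond to (beta, sigma^2)
G = np.column_stack([
    y * x / sig2,                   # dl_t / dbeta
    (y**2 - sig2) / (2 * sig2**2),  # dl_t / dsigma^2
])

iota = np.ones(n)
# OPG form of the LM statistic: explained sum of squares from the
# artificial regression of iota on G, i.e. n minus the SSR
b, *_ = np.linalg.lstsq(G, iota, rcond=None)
ssr = np.sum((iota - G @ b)**2)
lm_opg = n - ssr

# The same statistic in closed form: iota'G (G'G)^{-1} G'iota
g = G.T @ iota
lm_direct = g @ np.linalg.solve(G.T @ G, g)
print(lm_opg, lm_direct)            # the two agree up to rounding
```

Since only β is restricted here, the statistic is asymptotically Chi-squared with one degree of freedom under the null.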

3. The DLR variant of the LM test

In the case of linear, or non-linear, regression models, computationally convenient alternatives to the OPG variant of the LM test are readily available; see, for example, Godfrey (1978). In other cases, however, that has not been so. Thus a new variant of the LM test, recently suggested by Davidson and MacKinnon (1981a), may be useful. This variant applies to quite a general class of models, and can be computed very easily by means of an artificial linear regression. The class of models dealt with by Davidson and MacKinnon (1981a) may be written as

f_t(y_t, \theta) = \varepsilon_t, \qquad \varepsilon_t \sim N(0, 1), \qquad (10)

where y_t is the tth observation on a dependent variable, θ is a vector of parameters to be estimated, and f_t(·) is a suitably continuous function which may depend on exogenous variables and/or lagged values of y. The restriction that the errors have a variance of unity is inconsequential, since the actual parameter(s) determining the variance are subsumed in θ. A great many multivariate as well as univariate models may be written in the form of (10). For this model, the contribution to the loglikelihood of the tth observation is

l_t(\theta) = -\tfrac{1}{2}\log(2\pi) - \tfrac{1}{2} f_t(\theta)^2 + \log|f_t'(\theta)|, \qquad (11)

where f_t'(·) denotes the derivative of f_t with respect to y_t. The derivative of (11) with respect to θ_i is

G_{ti}(\theta) = -f_t(\theta) F_{ti}(\theta) + J_{ti}(\theta), \qquad (12)

where F_{ti}(θ) and J_{ti}(θ) denote respectively the derivatives of f_t(θ) and log|f_t'(θ)| with respect to θ_i. In Davidson and MacKinnon (1981a), it was proved that the matrix (1/n)(F̂'F̂ + Ĵ'Ĵ) consistently estimates 1/n times the information matrix; here F̂ and Ĵ denote the matrices whose elements are F_{ti}(θ̂) and J_{ti}(θ̂). Using this estimate for Î, and remembering from (12) that the gradient is ĝ = −F̂'f̂ + Ĵ'ι, where f̂ denotes the n-vector with typical element f_t(θ̂), we obtain the LM statistic

\hat g' (\hat F' \hat F + \hat J' \hat J)^{-1} \hat g. \qquad (13)

This expression is simply the explained sum of squares from the artificial linear regression

\begin{pmatrix} \hat f \\ \iota \end{pmatrix} = \begin{pmatrix} -\hat F \\ \hat J \end{pmatrix} b + u. \qquad (14)

This artificial linear regression has 2n 'observations'. Thus we shall refer to (13) as the 'DLR', for 'Double Length Regression', variant of the LM statistic. Since (1/n)(Ĝ'Ĝ) contains many stochastic terms that do not appear in (1/n)(F̂'F̂ + Ĵ'Ĵ), we would expect the latter to provide a more efficient estimate of 1/n times the information matrix, so that the DLR variant should be better behaved than the OPG variant in small samples.
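A corresponding numerical sketch (ours, not the paper's) of the DLR calculation, for the simple case f_t(y_t, θ) = (y_t − θ₁)/θ₂ — that is, y_t ~ N(θ₁, θ₂²) — with the restriction θ₁ = 0. All names are assumptions of the sketch. It checks that the explained sum of squares from the double-length regression equals ĝ'(F̂'F̂ + Ĵ'Ĵ)^{-1}ĝ.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
y = rng.normal(size=n)

# Model f_t(y_t, theta) = (y_t - theta1)/theta2, so y_t ~ N(theta1, theta2^2).
# Restricted ML estimates under H0: theta1 = 0
t1 = 0.0
t2 = np.sqrt(np.mean(y**2))

f = (y - t1) / t2                                  # n-vector of f_t(theta_hat)
# F: derivatives of f_t with respect to (theta1, theta2)
F = np.column_stack([-np.ones(n) / t2, -(y - t1) / t2**2])
# J: derivatives of log|f_t'| = -log(theta2) with respect to (theta1, theta2)
J = np.column_stack([np.zeros(n), -np.ones(n) / t2])

# Double-length regression: regress (f ; iota) on (-F ; J), 2n 'observations'
ydl = np.concatenate([f, np.ones(n)])
Xdl = np.vstack([-F, J])
b, *_ = np.linalg.lstsq(Xdl, ydl, rcond=None)
lm_dlr = ydl @ ydl - np.sum((ydl - Xdl @ b)**2)    # explained sum of squares

# Same statistic in closed form: g'(F'F + J'J)^{-1} g with g = -F'f + J'iota
g = -F.T @ f + J.T @ np.ones(n)
lm_direct = g @ np.linalg.solve(F.T @ F + J.T @ J, g)
print(lm_dlr, lm_direct)
```

Note that the score with respect to θ₂ vanishes at the restricted MLE, so the statistic is driven by the score with respect to the restricted parameter θ₁, as expected.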

4. Results of a sampling experiment

Godfrey and Wickens (1981) proposed using the OPG form of the LM test to test the specification of linear and loglinear regression models against a general alternative based on the Box-Cox transformation. In Davidson and MacKinnon (1981b), we showed how the DLR form of the test could be used for the same purpose, and presented the results of a sampling experiment in which the small-sample performance of the two tests was compared. In this section we briefly summarize those results.

The data were generated by a model with two exogenous variables and a constant term, which was either linear or loglinear. The exogenous variables were genuine economic time series, and the number of observations was normally either 50 or 100; in the latter case the X's were repeated once, to ensure that X'X/n did not vary with the sample size. Since the tests involved only one degree of freedom, the LM test statistics were transformed into statistics which should be asymptotically N(0, 1); this was done by taking the square root of the statistic, and multiplying by minus one if the sign of the coefficient on the regressor corresponding to the Box-Cox parameter was negative. The results in table 1 dramatically confirm the conjectures about

Table 1
Performance of DLR and OPG variants. a

[The body of the table is rotated in the original and its rows cannot be reliably realigned from the scan. Columns: true model (Linear or Log); sample size (50, 100, or 200); sigma; test (OPG or DLR); estimated rejection probabilities at nominal 10%, 5%, and 1% levels; and the standard deviation of the test statistic. Across the nine experiments, OPG rejection frequencies at the nominal 10% level range from 0.134 to 0.190, while DLR frequencies range from 0.097 to 0.119; OPG standard deviations range from 1.09 to 1.29, while DLR standard deviations range from 0.99 to 1.06.]

a Figures in brackets are asymptotic t-statistics for the hypotheses that the observed rejection probabilities or standard deviations are consistent with the OPG or DLR test statistics being distributed as N(0, 1) under the null hypothesis.


small-sample performance which we made above. Even when the sample size is only 50, the observed distributions of the DLR test statistics are quite close to N(0, 1). In contrast, the observed distributions of the OPG test statistics are never very close to N(0, 1); their variances are too large, and they reject the null hypothesis too often. This is less true when sigma (the standard deviation of the error terms) is smaller, but not to any great extent (and the smaller values of sigma in our experiments correspond to models which fit very well). Even when the sample size is 200, we can easily reject the hypothesis that the OPG test statistic is distributed as N(0, 1). Further results (not presented here) suggest that the OPG variant also rejects the null hypothesis more often than the DLR variant when the null is false. Thus, on a size-corrected basis, the power of the two forms of test is very similar. But since using the OPG variant may easily lead to serious errors of inference, while our results suggest that such errors are unlikely to occur with the DLR variant, it is clear that the latter should be the procedure of choice.
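The transformation of the one-degree-of-freedom LM statistics into signed, asymptotically N(0, 1) statistics described above can be sketched as follows (the function name is ours):

```python
import numpy as np

def signed_root(lm_stat, coef):
    """Convert a one-degree-of-freedom LM statistic into an asymptotically
    N(0, 1) statistic: take the square root, and flip the sign if the
    coefficient on the regressor for the tested parameter is negative."""
    return np.sign(coef) * np.sqrt(lm_stat)

print(signed_root(4.0, -0.3))   # -2.0
```

A two-sided test based on this statistic against standard normal critical values is equivalent to the original one-degree-of-freedom Chi-squared test, but it also allows the direction of the departure to be inspected.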

References

Berndt, E.R., B.H. Hall, R.E. Hall and J.A. Hausman, 1974, Estimation and inference in nonlinear structural models, Annals of Economic and Social Measurement 3, 653-665.
Breusch, T.S. and A.R. Pagan, 1980, The Lagrange Multiplier test and its applications to model specification in econometrics, Review of Economic Studies 47, 239-253.
Davidson, R. and J.G. MacKinnon, 1981a, Model specification tests based on artificial linear regressions, Queen's Institute for Economic Research discussion paper no. 426.
Davidson, R. and J.G. MacKinnon, 1981b, Small sample properties of alternative forms of the Lagrange Multiplier test, Queen's Institute for Economic Research discussion paper no. 439.
Godfrey, L.G., 1978, Testing for higher order serial correlation in regression equations when the regressors include lagged dependent variables, Econometrica 46, 1303-1310.
Godfrey, L.G. and M.R. Wickens, 1981, Testing linear and log-linear regressions for functional form, Review of Economic Studies 48, 487-496.