Journal of Econometrics 26 (1984) 355-373. North-Holland

A STUDY OF SEVERAL NEW AND EXISTING TESTS FOR HETEROSCEDASTICITY IN THE GENERAL LINEAR MODEL*

Mukhtar M. ALI
Department of Economics, University of Kentucky, Lexington, KY 40506, USA

Carmelo GIACCOTTO
Department of Finance, University of Connecticut, Storrs, CT 06268, USA

Received March 1983, final version received February 1984
Several optimum non-parametric tests for heteroscedasticity are proposed and studied, along with the tests introduced in the literature, in terms of power and robustness properties. It is found that all tests are reasonably robust to the Ordinary Least Squares (OLS) residual estimates and to the number and character of the regressors. Only a few are robust to both the distributional and independence assumptions about the errors. The power of the tests can be improved with the OLS residual estimates, an increased sample size and increased variability of the regressors. It can be substantially reduced if the observations are not normally distributed, and may increase or decrease if the errors are dependent. Each test is optimum to detect a specific form of heteroscedasticity, and a serious power loss may occur if the underlying heteroscedasticity assumption in the data generation deviates from it.
1. Introduction

One of the standard assumptions of the classical linear regression model is that the errors are homoscedastic; i.e., all observations have equal variances. However, in many practical applications, it has been found that errors are heteroscedastic. This is especially true for cross-sectional data where significant variation exists in the size distribution of the units. Thus, Prais and Houthakker (1955) and Jorgenson (1965) have found the problem of heteroscedasticity in family budget studies utilizing observations on individuals with diverse incomes and family sizes. Meyer and Kuh (1957) have found a similar problem in analyzing investment behavior of firms of different sizes. It is well known that if errors are heteroscedastic, the ordinary least squares estimates of the regression coefficients are inefficient and the standard method of inference may produce misleading conclusions [see Johnston (1971), Malinvaud (1980), Goldberger (1964) and Goldfeld and Quandt (1972)]. Thus, detecting the possibility of heteroscedasticity is of utmost importance.

*We are grateful for a detailed report from an anonymous referee, which has helped enormously in our exposition.

0304-4076/84/$3.00 © 1984, Elsevier Science Publishers B.V. (North-Holland)
M.M. Ali and C. Giaccotto, Several tests for heteroscedasticity
Several tests for heteroscedasticity have been proposed in the literature. Among them are Goldfeld and Quandt (1965), Ramsey (1969), Theil (1971), Glejser (1969), Park (1966), Harvey and Phillips (1974), Hedayat and Robson (1970), Rutemiller and Bowers (1968), Szroeter (1978), Harrison and McCabe (1979), Harvey (1976), Breusch and Pagan (1979), White (1980) and Bickel (1978). In this paper, we propose a class of optimum non-parametric tests and study them along with the tests introduced in the literature in terms of power and robustness properties. In particular, we investigate power effects and robustness to alternative residual estimates, sample sizes, the number and character (trend, stationarity, variability) of the regressors, and the distributional and independence assumptions about the errors. Also, we investigate the power effects of various heteroscedasticity assumptions.

It is found that all tests are reasonably robust to the OLS residual estimates, and to the number and character of the regressors. Only a few are robust to both the distributional and independence assumptions about the errors. The tests tend to have more power with the OLS residual estimates, a larger sample size and increased variability of the regressors. The powers can be substantially reduced if the errors are not normally distributed. The power effects of correlated errors are uncertain. Each test is optimum to detect heteroscedasticity of a specific form, and serious power loss may occur if the underlying heteroscedasticity assumption in the data generation deviates from it.

In section 2, we introduce a class of non-parametric tests, and describe most of the tests proposed in the literature.¹ The methods of analysis are discussed in section 3, and the results of the investigation are reported in section 4. Some concluding remarks are in the final section 5.

2. Tests for heteroscedasticity

Consider the general linear model,

$$ y_t = x_t'\beta + u_t, \qquad t = 1, 2, \ldots, T, \tag{2.1} $$

where $y_t$ is the $t$th observation on the dependent variable $y$, $x_t' = (x_{t1}, \ldots, x_{tk})$ is a $1 \times k$ vector of $t$th observations on the $k$ independent variables, $\beta$ is a $k \times 1$ vector of regression coefficients, and $u_t$ is the error. It is assumed that $E(u_t) = 0$, $\mathrm{var}(u_t) = \sigma_t^2$, and the $u_t$'s are uncorrelated with each other. When the $\sigma_t^2$'s are not all equal, we have the case of heteroscedasticity. Under the null hypothesis of homoscedasticity, all the $\sigma_t^2$'s are equal to a constant.

¹ Several tests for heterogeneity have also been proposed in the context of the analysis of variance model. A comparative study of these tests has been reported by Conover, Johnson and Johnson (1981). These tests are not applicable in the framework of our regression model where only one observation is available to estimate a variance. Thus, they are not discussed here. However, some of our tests are adapted from these tests.

2.1. Rank tests for heteroscedasticity
The hypothesis of heteroscedasticity for the regression errors $u_t$ can be interpreted as a shift in location of the distribution of the $u_t^2$'s or a shift in scale of the distribution of the $u_t$'s themselves. It is well known that there exists a class of locally most powerful (LMP) non-parametric tests for the randomness of a sequence of random variables against shifts in location or scale of their distributions. To adapt these tests to the present situation, we consider the following two models of heteroscedasticity:

$$ (u_t/\sigma)^2 = 1 + \theta g(z_t'\alpha) + w_t, \tag{2.2} $$

$$ (u_t/\sigma) = [\exp\{\theta g(z_t'\alpha)\}]\, w_t, \qquad t = 1, 2, \ldots, T, \tag{2.3} $$
where the $w_t$'s are identically, independently distributed (i.i.d.) random variables each with a distribution function $F$ and the density function $f$; $z_t' = (z_{t1}, \ldots, z_{tm})$, $z_{t1} = 1$ for all $t$, and the $z_{ti}$'s ($i = 2, \ldots, m$) are known constants; they may be $x_{ti}$'s or some known functions of them; $\alpha' = (\alpha_1, \ldots, \alpha_m)$ are parameters assumed to be known; $\sigma$ and $\theta$ are unknown parameters and $g$ is a known function. Eqs. (2.2) and (2.3) imply, respectively, that $\sigma_t^2$ is proportional to $[1 + \theta g(z_t'\alpha)]$ and $\exp\{2\theta g(z_t'\alpha)\}$. Thus, in either case, $\sigma_t^2$ varies with the $z_t$'s. If $m = k$, $z_t = x_t$ and $\alpha = \beta$, $\sigma_t^2$ varies with $E(y_t) = x_t'\beta$ or some function of $E(y_t)$, whereas if $m = 2$, $z_{t2} = x_{t2}$, then $\sigma_t^2$ varies with the regressor $x_{t2}$ or some function of it.

In both formulations (2.2) and (2.3), to test for heteroscedasticity we test $H_0$: $\theta = 0$ against the alternatives $\theta > 0$, $\theta < 0$ or $\theta \neq 0$. Eqs. (2.2) and (2.3) represent respectively a location shift alternative for the random variables $u_t^2$ and a scale shift alternative for the random variables $u_t$. Under some regularity conditions on $F$ and $g_t = g(z_t'\alpha)$, the locally most powerful (LMP) rank test [for details see Hajek and Sidak (1967), henceforth referred to as HS] is based on the statistic
$$ RH_1 = \sum_{t=1}^{T} (g_t - \bar g)\, h_1(R_t, f), \tag{2.4} $$

in the case of (2.2), and

$$ RH_2 = \sum_{t=1}^{T} (g_t - \bar g)\, h_2(R_t, f), \qquad \bar g = \sum_{t=1}^{T} g_t / T, \tag{2.5} $$

in the case of (2.3).
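Here $R_t$ is the rank of $(u_t/\sigma)^2$ and $(u_t/\sigma)$ in the cases of (2.2) and (2.3), respectively, and, defining $x = F^{-1}(R_t/(T+1))$, $h_1(R_t, f) = -\partial \ln f(x)/\partial x$ and $h_2(R_t, f) = x\, h_1(R_t, f)$. Since the rank of $(u_t/\sigma)^2$ or $(u_t/\sigma)$ is independent of $\sigma$, these tests are free of $\sigma$. Under both null and alternative hypotheses, these statistics are asymptotically normally distributed with means and variances given in HS (p. 216). Specifically, under $H_0$, for $i = 1, 2$,

$$ E(RH_i) = 0, \qquad \mathrm{var}(RH_i) = \sigma_{h_i}^2 \sum_{t=1}^{T} (g_t - \bar g)^2, $$

where $\sigma_{h_i}^2$ is the variance of the scores $h_{it}$ about their mean,

$$ \bar h_i = \sum_{t=1}^{T} h_{it}/T \qquad \text{and} \qquad h_{it} = h_i(R_t, f). $$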
Thus, at least for large samples, we can test our hypotheses by referring the statistics to a normal distribution. It may be noted that for the alternatives $\theta > 0$ or $\theta < 0$, these statistics provide the LMP tests among all possible tests (HS, p. 249). One difficulty is that they involve the $u_t$'s, which are unobservable, and often the parameters $\alpha$ and hence the $g_t$'s are unknown. However, these tests are still applicable, at least in large samples, when these unknowns are replaced by their consistent estimates.

For either of the statistics $RH_1$ or $RH_2$ to be operational, one needs to specify $g_t = g(z_t'\alpha)$ as well as the distribution function $F$. The choice of $g_t$ is dictated by the type of heteroscedasticity suspected in the data, whereas the choice of $F$ is more arbitrary. As a precaution, one may specify more than one distribution. We have selected four well-known distributions: normal, logistic, double exponential and Cauchy. All the distributions are symmetric. Both the double exponential and Cauchy distributions are heavy-tailed. Moments of any order for the Cauchy do not exist. Our choice of the distributions is not meant to imply, in any sense, that the regression errors that occur in practice follow one of them. We would hope that the true distribution is close to one of our selections.² The statistic $RH_1$ is designated by NL, LL, DEL, and CL corresponding to the assumed distribution of normal, logistic, double exponential, and Cauchy. Similarly, the designations for the $RH_2$ statistics are NS, LS, DES, and CS.

² If the true distribution deviates from the chosen one, the tests are still valid but they are not the most powerful ones.
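As an illustration of how a statistic such as NL might be computed, the following minimal sketch evaluates $RH_1$ with normal scores, for which $h_1 = \Phi^{-1}(R_t/(T+1))$, and standardizes it by its null variance. The function names, the toy residuals and the use of the divisor $T - 1$ in the score variance are assumptions of this sketch (the divisor follows the usual variance of a linear rank statistic); ties in the ranks are not handled.

```python
from math import sqrt
from statistics import NormalDist, mean


def ranks(values):
    """Return the ranks 1..T of the values (ties are not handled in this sketch)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def nl_statistic(resid, g):
    """Location rank statistic RH_1 with normal scores and its standardized value."""
    T = len(resid)
    R = ranks([u * u for u in resid])  # ranks of the squared (estimated) errors
    h = [NormalDist().inv_cdf(r / (T + 1)) for r in R]  # normal scores h_1
    g_bar, h_bar = mean(g), mean(h)
    rh = sum((gt - g_bar) * ht for gt, ht in zip(g, h))
    # Assumption: score variance computed with divisor T - 1.
    var_h = sum((ht - h_bar) ** 2 for ht in h) / (T - 1)
    var_rh = var_h * sum((gt - g_bar) ** 2 for gt in g)
    return rh, rh / sqrt(var_rh)


# Hypothetical toy data: squared residuals grow with g, so RH_1 is positive.
resid = [0.1, -0.2, 0.3, -0.4, 0.5]
g = [1.0, 2.0, 3.0, 4.0, 5.0]
rh1, z = nl_statistic(resid, g)
```

The standardized value `z` would then be referred to the standard normal distribution; a large positive value points towards $\theta > 0$.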
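The respective $h_1(\cdot)$, from which the $h_2(\cdot)$ functions can be obtained, are $\Phi^{-1}(q_t)$, $R_t$, $\mathrm{sgn}(q_t - \tfrac{1}{2})$ and $Z/(1 + Z^2)$, where $q_t = R_t/(T+1)$; $\mathrm{sgn}(x) = 1$ if $x \geq 0$ and $= -1$ if $x < 0$ [...]

[...] one can introduce a class of tests for heteroscedasticity [...]

$$ \sigma_t = \sigma\{1 + \theta g(z_t'\alpha) + o(\theta)\}, \qquad t = 1, 2, \ldots, T, \tag{2.6} $$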
where $\sigma$, $\theta$, $g$, $z_t$ and $\alpha$ are defined in the previous section and $o(\theta)/\theta$ tends to zero as $\theta \to 0$, so that $\theta = 0$ corresponds to homoscedasticity. In Bickel (1978) it is assumed that $z_t'\alpha = x_t'\beta = E(y_t)$. The null hypothesis to be tested is $\theta = 0$ and the alternatives are $\theta > 0$, $\theta < 0$ or $\theta \neq 0$. For the alternatives $\theta > 0$ or $\theta < 0$, the LMP test is given by
$$ B = \sum_{t=1}^{T} (g_t - \bar g)\, b(u_t/\sigma) \Big/ [\ldots], \tag{2.7} $$

where

$$ b(x) = -x\, \partial \ln f(x)/\partial x, $$
and $f$ is the density function of the distribution of $(u_t/\sigma)$, which is assumed to exist. Under appropriate conditions [for details, see Bickel (1978)] on $f$, the distribution function $F$ of $u_t/\sigma$ and $g_t$, one can show that asymptotically $B$ is normal both under the null and alternative hypotheses. Specifically, under the null hypothesis, $E(B) = 0$ and $\mathrm{var}(B) = 1$. As in the case of the statistics $RH_1$ and $RH_2$, the statistic $B$ involves the $u_t$'s, which are unobservable, and often the parameter $\sigma$ and the $g_t$'s are unknown. However, this test is still applicable, at least in large samples, when these unknowns are replaced by their consistent estimates. Following Bickel (1978), $\sigma$ may be replaced by $\hat\sigma = \mathrm{median}\{|\hat u_1|, \ldots, |\hat u_T|\}/0.674$, where the $\hat u_t$'s are the OLS residuals. For the statistic $B$ to be operational, one needs to specify $g_t = g(z_t'\alpha)$ as well as the distribution function $F$. Again the choice of $g_t$ is dictated by the type of
heteroscedasticity suspected in the data, whereas the choice of F is more arbitrary. As in the case of the Rank tests, we have chosen the same four distributions: normal, logistic, double exponential and Cauchy. The four test statistics thus obtained are denoted correspondingly by NB, LB, DEB and CB.
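The robust scale estimate used for the Bickel statistics, the median absolute residual divided by 0.674, can be sketched in a few lines. The constant 0.674 is approximately the 0.75 quantile of the standard normal distribution, which makes the estimate consistent for $\sigma$ under normal errors. The function name and the toy residuals below are hypothetical.

```python
from statistics import median


def bickel_scale(resid):
    """Robust scale estimate: median of the absolute residuals divided by 0.674."""
    return median(abs(u) for u in resid) / 0.674


# Hypothetical residuals whose median absolute value is exactly 0.674.
sigma_hat = bickel_scale([-0.674, 0.674, 1.0, -0.3, 2.0])
```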
2.3. Tests by grouping

For these tests, we assume that the observations are arranged according to increasing variance as suspected in the data. The variances are often assumed to vary in relation to one of the regressors. Once the observations are ordered, $n$ of the regression errors $u_t$ are estimated; $n = T$ if the estimates are the OLS residuals, and $n = T - k$ if they are the BLUS [Theil (1971)] or Recursive residuals [Harvey and Phillips (1974)], because only $T - k$ of the BLUS or Recursive residuals can be obtained. Let $S_1$, $S_2$ and $S$ be the sums of squares of the first $n_1$, the last $n_2$ and all $n$ estimated $u_t$'s. Define three test statistics,
$$ GQ = (S_2/\lambda_2)/(S_1/\lambda_1), \tag{2.8} $$

$$ HM = S/S_1, \tag{2.9} $$

$$ RMS = n \ln(S/n) - \sum_{i=1}^{3} n_i \ln(S_i/n_i), \tag{2.10} $$

where $S = S_1 + S_2 + S_3$, $n = n_1 + n_2 + n_3$, and $\lambda_1 = n_1 - k$ and $\lambda_2 = n_2 - k$ if the $u_t$'s are estimated by the OLS residuals, and $\lambda_1 = n_1$ and $\lambda_2 = n_2$ if they are estimated by the BLUS or Recursive residuals. In the case of GQ, if the $u_t$'s are estimated by the OLS residuals, we follow Goldfeld and Quandt (1965) to obtain the first $n_1$ and the last $n_2$ residuals from two separate regressions based on the first $n_1$ and the last $n_2$ observations, respectively. In all other cases, HM or RMS, the residuals are obtained from one regression based on all $T$ observations.

GQ is the test proposed by Goldfeld and Quandt (1965), Theil (1971) and Harvey and Phillips (1974) if the $u_t$'s are estimated respectively by the OLS, BLUS and Recursive residuals. In each case, under the null hypothesis, GQ has an $F$ distribution with degrees of freedom $\lambda_2$, $\lambda_1$. If the $u_t$'s are estimated by the OLS residuals, HM is the test proposed by Harrison and McCabe (1979), and if they are estimated by the BLUS residuals, RMS is the test proposed by Ramsey (1969). Asymptotically, RMS is a $\chi^2$ variable with 2 degrees of freedom. In choosing $n_1$ and $n_2$, it has been suggested by Ramsey (1969) and Goldfeld and Quandt (1965) that both be set equal to one-third of $n$, whereas
Theil (1971) and Harrison and McCabe (1979) suggest them to be approximately one-half of $n$. In computing GQ, both $n_1$ and $n_2$ have been chosen to be $[T/3] + k$ if the $u_t$'s are estimated by the OLS residuals and $[T/3]$ if they are estimated by the BLUS or Recursive residuals, where $[x]$ is the largest integer not exceeding $x$. Irrespective of the residual estimates used, we have taken $n_1 = n_2 = [n/2]$ and $n_1 = n_2 = [n/3]$ in computing HM and RMS, respectively.

There is some arbitrariness in selecting a base of $k$ observations to obtain either the BLUS or the Recursive residuals. Theil (1971) suggested the middle $k$ observations as the base. We have followed this advice as closely as the data permitted. For example, if $T = 10$, $k = 3$, then the middle three observations have been chosen to be the fourth, fifth and sixth ones.

2.4. Glejser tests

Let $\hat u_t$ be an estimate of $u_t$ in the regression (2.1). Consider the following models:

$$ |\hat u_t| = z_t'\alpha + w_t, \tag{2.11} $$

$$ \hat u_t^2 = z_t'\alpha + w_t, \qquad t = 1, 2, \ldots, T, \tag{2.12} $$
where $z_t$ and $\alpha$ are as defined in section 2.1 except that $\alpha$ is unknown; the $w_t$'s are appropriately defined errors in these models. Then the standard $F$ statistic to test for the hypothesis $H_0$: $\alpha_2 = \cdots = \alpha_m = 0$ in either model can be used to test for heteroscedasticity. If $m = 2$, the standard statistic is $t$. We denote this statistic by GLO and GLM for the models (2.11) and (2.12), respectively. If the $\hat u_t$'s are the OLS residuals, then GLO is the original test statistic proposed by Glejser (1969) and GLM is the modified Glejser test statistic proposed by Goldfeld and Quandt (1972).

2.5. White test

Let $\hat u_t$ be an estimate of $u_t$ in the regression (2.1). Consider the following artificial regression:

$$ \hat u_t^2 = \alpha_0 + \sum_{i=1}^{k} \sum_{j=i}^{k} \alpha_s x_{ti} x_{tj} + \text{error}, \qquad t = 1, 2, \ldots, T, \tag{2.13} $$
where the $\alpha_s$'s are parameters to be estimated. Then the standard squared multiple correlation coefficient to test for $H_0$: $\alpha_1 = \cdots = \alpha_{k(k+1)/2} = 0$ can be used to test for heteroscedasticity. We denote this statistic by WH. Asymptotically,
$T \cdot WH$ is a chi-squared variable with $k(k+1)/2$ degrees of freedom. If the $\hat u_t$'s are the OLS residuals, then WH is the test proposed by White (1980).

2.6. Likelihood ratio tests

Let $\sigma_t^2$ in the model (2.1) be given by $\sigma_t^2 = \exp(z_t'\alpha)$, where $z_t$ and $\alpha$ are as defined in section 2.1, except that $\alpha$ is unknown. Assuming the $u_t$'s to be normally distributed, Harvey (1976) derived the likelihood ratio test for the hypothesis of homoscedasticity, $H_0$: $\alpha_2 = \cdots = \alpha_m = 0$, to be

$$ LR = T \ln(ESS/T) - \sum_{t=1}^{T} z_t'\hat\alpha, \tag{2.14} $$
where $ESS = \sum_{t=1}^{T} \hat u_t^2$, the $\hat u_t$'s are the OLS residuals in (2.1) and $\hat\alpha$ is the maximum likelihood estimate of $\alpha$. Asymptotically, LR is a $\chi^2$ variable with $(m - 1)$ degrees of freedom.

2.7. Lagrangian multiplier tests

Let $\sigma_t^2$ in the model (2.1) be given by $\sigma_t^2 = g(z_t'\alpha)$, where $g$ is continuous with continuous first derivatives and $z_t$ and $\alpha$ are as defined in section 2.1 except that $\alpha$ is unknown. Moreover, at $\alpha_2 = \cdots = \alpha_m = 0$, $g(z_t'\alpha) \neq 0$. Assuming the $u_t$'s to be normally distributed, Breusch and Pagan (1979) derived the Lagrangian Multiplier test for the hypothesis of homoscedasticity, $H_0$: $\alpha_2 = \cdots = \alpha_m = 0$. The resulting statistic, LM, is half of the regression sum of squares when $e_t$ is regressed on $z_t$, where $e_t = \hat u_t^2/\hat\sigma^2$, $\hat\sigma^2 = \sum_{t=1}^{T} \hat u_t^2/T$, and the $\hat u_t$'s are the OLS residuals from the regression model (2.1). Asymptotically, LM is a $\chi^2$ variable with $(m - 1)$ degrees of freedom.

3. Study objective and methodology

Our objective is to study the power and robustness properties of the various tests presented in the previous section. In particular, first, we investigate the effects of alternative residual estimates on the power and size of the individual tests. Observe that all the tests depend upon the estimated $u_t$'s. It is well known that the OLS residuals are correlated and heteroscedastic even when the $u_t$'s satisfy the ideal conditions. Therefore, they may not be suitable estimates
of the errors when testing for changes in variance. Moreover, the Rank tests, Bickel tests and tests like RMS, GLO and GLM require that the estimated $u_t$'s be uncorrelated for their valid application. Since the BLUS [Theil (1971)] or the Recursive residuals [Harvey and Phillips (1974)] are uncorrelated and homoscedastic, it seems that these residuals would be ideal estimates of the $u_t$'s.
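The statement that the OLS residuals are correlated and heteroscedastic even under ideal errors can be made explicit with a short derivation. The notation below is an assumption of this sketch and is not used elsewhere in the text: $X$ denotes the $T \times k$ matrix whose $t$th row is $x_t'$, $u$ the vector of errors with $E(uu') = \sigma^2 I_T$, and $M$ the residual-maker matrix.

$$ \hat u = y - X\hat\beta = M u, \qquad M = I_T - X(X'X)^{-1}X', $$

$$ E(\hat u \hat u') = M\, E(uu')\, M' = \sigma^2 M, \qquad \mathrm{var}(\hat u_t) = \sigma^2 \{1 - x_t'(X'X)^{-1}x_t\}. $$

Since $M$ is in general not diagonal and its diagonal elements differ across $t$, the $\hat u_t$'s are correlated and have unequal variances. Moreover, $M$ has rank $T - k$, which is consistent with only $T - k$ uncorrelated, homoscedastic residuals (BLUS or Recursive) being obtainable.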
Second, the quality of the $u_t$ estimates, which may affect the size and power of a test, is likely to depend on the sample size $T$, the number of regressors $k$, and regressor characteristics. For example, the efficiency of the OLS residuals as estimates of the $u_t$'s increases with $T$ or the variability of the regressors and decreases with $k$. We have studied both the size and power effects of $T$, $k$ and regressor characteristics.

Third, it is expected that a test derived with a specific assumption of heteroscedasticity will be most powerful to detect such heteroscedasticity. For example, the Bickel tests are expected to be most powerful when the true heteroscedasticity is given by (2.6). But their powers are open to question if the actual heteroscedasticity differs from this ideal specification.

Fourth, and last, we intend to shed some light on the robustness and power effects of these tests with respect to distributional and independence assumptions regarding the $u_t$'s.

Only the power of the GQ and HM tests can be computed analytically [see Harvey and Phillips (1974) and Harrison and McCabe (1979)], and that is only when the $u_t$'s are independent normal variables. Even then the power computations involve expensive numerical integration. In our experience, the accuracy obtained from these numerical methods can be matched by that of the power estimated by a Monte Carlo simulation at a cost which is often smaller. It may also be noted that the computation of the power of all the tests considered here depends not only on the specification of the alternative hypothesis of heteroscedasticity and the observations on the regressors, but also on the distribution of the $u_t$'s, which can be non-normal, and the correlation structure of the $u_t$'s. Thus, we resorted to Monte Carlo methods for power computations. All the powers were estimated using 1,000 replications.
For the Monte Carlo studies, the following five basic models of data generation were used:

Model I: $y_t = \beta_1 x_{t1} + u_t$,
Model II: $y_t = \beta_0 + \beta_1 x_{t1} + u_t$,
Model III: $y_t = \beta_0 + \beta_1 x_{t1} + \beta_2 x_{t2} + u_t$,
Model IV: $y_t = \beta_0 + \beta_1 x_{t1} + \beta_2 x_{t2} + \beta_3 x_{t3} + u_t$,
Model V: $y_t = \beta_0 + \beta_1 x_{t1} + \beta_2 x_{t2} + \beta_3 x_{t3} + \beta_4 x_{t4} + \beta_5 x_{t5} + u_t$,

for $t = 1, 2, \ldots, T$. In all cases, we have set $\beta_0 = \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 1$. The essential difference among these models is the number of regressors. The majority of the experiments were performed with Model II.
Six different data sets were used in specifying the regressors in the above models.
Data Set 1: A random sample of size $T$ from a uniform distribution in the range (-1.73, 1.73) and with mean = 0 and, approximately, variance = 1.

Data Set 2: A random sample of size $T$ from a uniform distribution in the range (-8.66, 8.66) and with mean = 0 and, approximately, variance = 25.

Data Set 3: $x_t$, $t = 1, 2, \ldots, T$, are generated from $x_t = \phi x_{t-1} + a_t$, where $x_0 = 2.5$, $\phi = 0.5$ and the $a_t$'s are i.i.d. normal with mean = 0 and variance = 1. The $x_t$'s are standardized by subtracting the sample mean and dividing by the sample standard deviation.

Data Set 4: Same as Data set 3, except that $\phi = 1.0$.

Data Set 5: Same as Data set 3, except that $\phi = 1.05$.

Data Set 6: $x_t$ ($t = 1, 2, \ldots, T$) is the income of the $t$th state in the study of the demand of radio reported in Rutemiller and Bowers (1968). The $x_t$'s are standardized as in Data set 3.
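The simulated regressors (Data sets 1 to 5) can be generated along the following lines. This is only a sketch: the function names, the seed and the use of the $n - 1$ divisor for the sample standard deviation are assumptions, and Data set 6 is omitted because it consists of observed data. Note that a uniform distribution on $(-a, a)$ has variance $a^2/3$, which gives about 0.998 for $a = 1.73$ and about 25.0 for $a = 8.66$.

```python
import random
from statistics import mean, stdev


def standardize(x):
    """Subtract the sample mean and divide by the sample standard deviation."""
    m, s = mean(x), stdev(x)  # assumption: n - 1 divisor
    return [(v - m) / s for v in x]


def uniform_regressor(T, half_range, rng):
    """Data sets 1 and 2: uniform draws on (-half_range, half_range)."""
    return [rng.uniform(-half_range, half_range) for _ in range(T)]


def ar1_regressor(T, phi, rng, x0=2.5):
    """Data sets 3 to 5: x_t = phi * x_{t-1} + a_t, then standardized."""
    x, prev = [], x0
    for _ in range(T):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return standardize(x)


rng = random.Random(0)  # hypothetical seed
data1 = uniform_regressor(25, 1.73, rng)
data2 = uniform_regressor(25, 8.66, rng)
data3 = ar1_regressor(25, 0.5, rng)
data4 = ar1_regressor(25, 1.0, rng)
data5 = ar1_regressor(25, 1.05, rng)
```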
Data sets 1, 2 and 3 are stationary, whereas Data sets 4, 5 and 6 are non-stationary in character. Both Data sets 1 and 2 are random and free from any trend (steady growth or decline). Because of the correlation structure, Data set 3 may display a weak trend when examined in small subsections of consecutive observations. Data set 4 is wandering and trendy in subsections. Both Data sets 5 and 6 are clearly trendy. All Data sets have mean = 0 and, with the exception of Data set 2, all have variance = 1. Data set 2 has a variance of twenty-five.

Six types of heteroscedasticity were experimented with:

$H_1$: $\sigma_t^2 = \sigma^2$;
$H_2$: $\sigma_t^2 = \sigma^2 |x_{t1}|$;
$H_3$: $\sigma_t^2 = \sigma^2 |E(y_t)|$;
$H_4$: $\sigma_t^2 = \sigma^2 x_{t1}^2$;
$H_5$: $\sigma_t^2 = \sigma^2 \{E(y_t)\}^2$;
$H_6$: $\sigma_t^2 = \sigma^2$ for $t \leq T/2$ and $= 2\sigma^2$ for $t > T/2$.

Throughout, $\sigma^2 = 1$. $H_1$ corresponds to the null hypothesis of homoscedasticity. $H_2, \ldots, H_6$ can be identified with a specification such as $\sigma_t^2 = \sigma^2 g(z_t'\alpha)$ of section 2 with an appropriate function $g$ and the vector of variables $z_t$. Note that in $H_6$, $g(z_t'\alpha) = z_t'\alpha$, $m = 2$, $\alpha_1 = \alpha_2 = 1$, and $z_{t2}$ is a dummy variable such that $z_{t2} = 0$ for $t \leq T/2$ and $= 1$ for $t > T/2$. $H_2$, $H_3$, $H_4$ and $H_5$ characterize respectively that the variance $\sigma_t^2$ increases with the regressor, the mean of the dependent variable, the square of the regressor and the square of the mean of the dependent variable. $H_6$ states that any observation in the second half of the sample has variance twice that of any observation in the first half.

Four distributions of the $u_t$'s were considered:
N: $u_t/\sigma_t$ is standard normal,

T3: $u_t/\sigma_t$ is $t/\sqrt{3}$, where $t$ has a $t$-distribution with degrees of freedom = 3; with this choice, the variance of $(u_t/\sigma_t)$ is unity,

C: $u_t/\sigma_t$ is Cauchy with location = 0 and scale parameter = 1,

LGN: $(u_t/\sigma_t) + 1.27$ is log-normal, its logarithm having mean = 0 and variance = 0.48; we have chosen the shift parameter of 1.27 and variance = 0.48 so that $E(u_t) = 0$ and variance $(u_t/\sigma_t) = 1$, approximately.
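The six variance specifications and the four standardized error distributions can be sketched as follows. The function names and argument conventions are hypothetical. The T3 draw uses the standard construction of a Student $t$ variable as a normal divided by the root of an independent chi-squared over its degrees of freedom, and the helper `lgn_moments` checks the shifted log-normal moments in closed form: the mean is $e^{0.24} - 1.27 \approx 0.001$ and the variance is $(e^{0.48} - 1)e^{0.48} \approx 0.996$.

```python
import math
import random


def sigma2(h, t, T, x_t1, mean_y, s2=1.0):
    """Error variance of observation t (1-based) under H1, ..., H6."""
    if h == 1:
        return s2
    if h == 2:
        return s2 * abs(x_t1)
    if h == 3:
        return s2 * abs(mean_y)
    if h == 4:
        return s2 * x_t1 ** 2
    if h == 5:
        return s2 * mean_y ** 2
    if h == 6:
        return s2 if t <= T / 2 else 2.0 * s2
    raise ValueError("h must be between 1 and 6")


def draw_standardized(dist, rng):
    """One draw of u_t / sigma_t under the distributions N, T3, C or LGN."""
    if dist == "N":
        return rng.gauss(0.0, 1.0)
    if dist == "T3":
        z = rng.gauss(0.0, 1.0)
        chi2 = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3))
        return (z / math.sqrt(chi2 / 3.0)) / math.sqrt(3.0)  # t_3 scaled to unit variance
    if dist == "C":
        return math.tan(math.pi * (rng.random() - 0.5))  # inverse-CDF draw
    if dist == "LGN":
        return math.exp(rng.gauss(0.0, math.sqrt(0.48))) - 1.27
    raise ValueError("unknown distribution")


def lgn_moments(shift=1.27, s2=0.48):
    """Closed-form mean and variance of the shifted log-normal error."""
    return math.exp(s2 / 2.0) - shift, (math.exp(s2) - 1.0) * math.exp(s2)


rng = random.Random(1)  # hypothetical seed
draws = {d: draw_standardized(d, rng) for d in ("N", "T3", "C", "LGN")}
lgn_mean, lgn_var = lgn_moments()
```

An error for observation $t$ would then be obtained as the square root of `sigma2(...)` times a standardized draw.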
The distributions N, T3 and C are all symmetric, whereas the distribution LGN is skewed. C has longer tails than T3, which in turn has longer tails than N has. All order moments of both N and LGN are finite, whereas only first- and second-order moments of T3 and no moment of C are finite.

For specifying the correlation structure of the $u_t$'s, we assumed that $v_t = \rho v_{t-1} + a_t$, $t = 1, 2, \ldots, T$, where $v_t = u_t/\sigma_t$, $|\rho| < 1$, $v_0$ is normal with mean = 0 and variance = 1, and the $a_t$'s are i.i.d. normal with mean = 0 and variance = $1 - \rho^2$. Thus, $E(u_t) = 0$, $\mathrm{var}(u_t) = \sigma_t^2$, and the correlation between $u_t$ and $u_{t+s}$ is $\rho^{|s|}$ for $s = 0, \pm 1, \pm 2, \ldots$. We considered five correlation structures: $\rho = -0.9, -0.5, 0, 0.5, 0.9$. When $\rho = 0$, the $u_t$'s are independent; they are positively correlated if $\rho = 0.5$ or 0.9 and negatively correlated if $\rho = -0.5$ or $-0.9$. $\rho = 0.5$ or $-0.5$ can be considered the case of moderate correlation and $\rho = 0.9$ or $-0.9$ that of extreme correlation.

Finally, we experimented with three sample sizes, $T = 10$, 25 and 40, and three residual estimates: OLS, Recursive and BLUS.

For the Rank and Bickel tests, the GLO, GLM, LR and LM tests and the GQ, HM and RMS tests to be operational we must specify, respectively, $g_t = g(z_t'\alpha)$, $z_t'\alpha$ and a variable to order the observations in increasing variance. Respective implications are that the variance $\sigma_t^2$ is changing in relation to $g_t$, $z_t'\alpha$ and the ordering variable. Historically, most tests of heteroscedasticity have been carried out under the assumption that the variance $\sigma_t^2$ is changing in relation to one of the regressors, $x_t$, or the mean of the dependent variable $y_t$. Thus, we experimented with three choices of $g_t$: $|x_{t1}|$, $|E(y_t)|$ and $\ln|E(y_t)|$; three choices of $z_t'\alpha$: $\alpha_1 + \alpha_2 z_{t2}$, with $z_{t2} = |x_{t1}|$, $\ln|x_{t1}|$ and $|E(y_t)|$; and two choices of the ordering variable: $|x_{t1}|$ and $|E(y_t)|$. Note that no distinction can be made between the ordering variables $|x_{t1}|$ and $\ln|x_{t1}|$ and between $|E(y_t)|$ and $\ln|E(y_t)|$.
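The moments claimed for the correlated-error scheme $v_t = \rho v_{t-1} + a_t$ used in this section can be verified in two steps. The derivation below is a sketch that relies only on the stated assumptions: $\mathrm{var}(v_0) = 1$, $\mathrm{var}(a_t) = 1 - \rho^2$, and $a_t$ independent of the past.

$$ \mathrm{var}(v_t) = \rho^2\, \mathrm{var}(v_{t-1}) + (1 - \rho^2) = 1 \quad \text{whenever } \mathrm{var}(v_{t-1}) = 1, $$

so that, by induction from $\mathrm{var}(v_0) = 1$, every $v_t$ has unit variance. Repeated substitution gives, for $s \geq 0$,

$$ v_{t+s} = \rho^{s} v_t + \sum_{j=0}^{s-1} \rho^{j} a_{t+s-j}, \qquad \mathrm{cov}(v_t, v_{t+s}) = \rho^{s}\, \mathrm{var}(v_t) = \rho^{s}. $$

Because $u_t = \sigma_t v_t$, it follows that $\mathrm{var}(u_t) = \sigma_t^2$ and that the correlation between $u_t$ and $u_{t+s}$ equals $\rho^{|s|}$, irrespective of the $\sigma_t$'s.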
The choices of $g_t$ imply that the variance $\sigma_t^2$ is increasing as a simple function of the regressor $x_{t1}$ or the mean of the dependent variable $E(y_t)$. Similar implications are evident for the choices of $z_t'\alpha$ or the ordering variables. Note that $E(y_t)$, which is unobservable, appears in some of these choices. In those cases we replaced $E(y_t)$ by its OLS estimate. It should be clear that each of the tests was applied more than once depending upon the choice of $g_t$, $z_t'\alpha$ and the ordering variable. Thus, each of the Bickel and Rank tests was applied three times: first, with $g_t = |x_{t1}|$; second, with $g_t = |E(y_t)|$; and third, with $g_t = \ln|E(y_t)|$. Each of the GLO, GLM, LR and LM tests was applied three times: first, with $z_{t2} = |x_{t1}|$; second, with $z_{t2} = \ln|x_{t1}|$; and third, with $z_{t2} = |E(y_t)|$. Finally, each of GQ, HM and RMS
tests was applied twice: first, with the ordering variable $= |x_{t1}|$ or $\ln|x_{t1}|$; and second, with the ordering variable $= |E(y_t)|$ or $\ln|E(y_t)|$. We differentiate the variety of tests so obtained by attaching a 1, 2 or 3 to each of the tests according to whether the first, second or third choice (if possible) of $g_t$, $z_t'\alpha$ or the ordering variable is made. Thus, for example, NL1, NL2 and NL3 stand for the NL test when $g_t$ is set equal to $|x_{t1}|$, $|E(y_t)|$ and $\ln|E(y_t)|$, respectively. In all we have 55 tests.

Eight experiments were performed with all the tests described above. These experiments can be identified with the specifications of one of the five models; the regressor variables $x_t$; the error variance $\sigma_t^2$; the distribution and correlation structure of $u_t$; alternative estimates of $u_t$; and the sample size $T$. Table 1 describes these experiments. The first one (column 1) is designed to investigate the effect of different residual estimates: OLS, Recursive, and BLUS. The second and third deal with the impact of the sample size $T$ and the number of regressors $k$, respectively. The fourth and fifth relate to types of heteroscedasticity (specifications $H_1$ through $H_6$), and type of data, trendy, stationary or highly variable (Data sets 1 through 6). The sixth and seventh allow us to investigate the effect of the distributional assumption (N, T3, C, LGN) both when the errors are estimated (experiment #6) and when they are known (experiment #7). Finally, experiment #8 deals with the impact of serial correlation.

Table 1
Description of experiments.

| Data description | Experiment 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| Model | II | II | I, II, III, IV or V | II | II | II | II | II |
| Sample size ($T$) | 25 | 10, 25 or 40 | 25 | 25 | 25 | 25 | 25 | 25 |
| Regressor (a) | Data set 6 | Data set 6 | Data set 6 | Data set 6 | Data set 1, 2, 3, 4, 5 or 6 | Data set 6 | Data set 6 | Data set 6 or 1 |
| Distribution of $u_t$'s | N | N | N | N | N | N, T3, C or LGN | N, T3, C or LGN | N |
| Correlation prop. of $u_t$'s | $\rho = 0$ | $\rho = 0$ | $\rho = 0$ | $\rho = 0$ | $\rho = 0$ | $\rho = 0$ | $\rho = 0$ | $\rho = -0.9$, $-0.5$, 0, 0.5 or 0.9 |
| $u_t$ estimates | OLS, Recursive or BLUS | OLS | OLS | OLS | OLS | OLS | TRUE $u_t$'s | OLS |

(a) This is a description of the data on the regressor variable $x_{t1}$ only. In experiment 3, where the regressors $x_{t2}$, $x_{t3}$, $x_{t4}$ and $x_{t5}$ are also in use, we have used Data sets 1, 3, 4 and 5, respectively, for these regressors.

Once the specifications of an experiment were made, they remained constant for all replications. Each replication consisted of generating $T$ of the $u_t$'s, from which the $y_t$'s were obtained, and of computing all the test statistics on the basis of the generated data. In all experiments, a one-sided (right) alternative was used for all the tests except those based on RMS, LR and LM, where two-sided alternatives were used.

The critical rejection regions of the tests were obtained on the basis of both the nominal and actual 5% level of significance. The nominal critical regions were based on the distributions that are known for these statistics. In most cases, these are the asymptotic distributions (see section 2 for details). For example, all the Rank tests are asymptotically normal. In the case of the HM test, only the distributions of an upper and a lower bound are available. Thus, we have two critical regions, one based on the distribution of the upper bound, and the other on that of the lower bound. The actual critical regions, for each of the tests, with the true size of 5% were obtained from the simulated null distributions.

For each replication, it was noted whether or not a particular statistic was in its nominal critical rejection region under $H_1$ and the actual critical region under alternatives other than $H_1$. The power of each test was estimated by the percentage of times, out of 1000 replications, the test statistic fell in its rejection region. Under $H_1$, which is the hypothesis of homoscedasticity, this percentage is the estimated actual level of significance for the nominal critical region. Under hypotheses other than $H_1$, this percentage is the estimated power of the test with the actual critical region.
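The procedure of obtaining an actual 5% critical region from a simulated null distribution and then estimating power under an alternative can be sketched for a single test. The example below is an illustration under stated assumptions rather than a reproduction of the experiments: it uses Model II, a regressor drawn as in Data set 1, normal independent errors, the alternative $H_4$, and a GQ-type statistic from two separate OLS regressions on the first and last $[T/3] + k$ observations ordered by $|x_{t1}|$. The seed and all function names are hypothetical.

```python
import random


def ols_rss(x, y):
    """Residual sum of squares from a simple regression of y on (1, x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))


def gq_statistic(x, y, n_sub):
    """GQ-type ratio from separate regressions on the first and last n_sub
    observations, after ordering by |x| (equal degrees of freedom cancel)."""
    order = sorted(range(len(x)), key=lambda i: abs(x[i]))
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    return ols_rss(xs[-n_sub:], ys[-n_sub:]) / ols_rss(xs[:n_sub], ys[:n_sub])


def simulate(x, sd, reps, rng, n_sub):
    """Simulate the statistic for Model II with beta_0 = beta_1 = 1."""
    stats = []
    for _ in range(reps):
        y = [1.0 + 1.0 * xt + s * rng.gauss(0.0, 1.0) for xt, s in zip(x, sd)]
        stats.append(gq_statistic(x, y, n_sub))
    return stats


rng = random.Random(12345)  # hypothetical seed
T, k, reps = 25, 2, 1000
x = [rng.uniform(-1.73, 1.73) for _ in range(T)]  # held fixed across replications
n_sub = T // 3 + k

# Actual critical value: approximate upper 5% point of the simulated null (H1).
null_stats = sorted(simulate(x, [1.0] * T, reps, rng, n_sub))
critical = null_stats[int(0.95 * reps)]

# Power under H4 (sigma_t = |x_t1|), using the simulated critical value.
alt_stats = simulate(x, [abs(xt) for xt in x], reps, rng, n_sub)
power = sum(s > critical for s in alt_stats) / reps
```

Using the simulated rather than the nominal critical value keeps the comparison of powers across tests on an equal footing, since every test is then held to the same actual size.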
4. Findings

For easy reference, we have divided the tests discussed in the previous section into various groups. Each group is assigned a number and a name. These are recorded in table 2. Designations of all groups, except possibly groups 6 and 7, are self-explanatory. The tests in group 6 are specifically devised to detect variance heterogeneity related to one of the regressors. This should be clear from the choice of $g_t$, $z_t'\alpha$ or the ordering variable on which these tests are based. Similarly, the tests in group 7 are expected to be most powerful in detecting changes in variance related to the mean of the dependent variable.

The robustness of a test is usually judged by the insensitiveness of its distribution under the null and alternative hypotheses to variations in the underlying assumptions of the test. Any variation in the null distribution affects the size of the test and may thereby invalidate its use. Variations in the
distribution under the alternative hypotheses affect the power of the test which may not invalidate the test but may make it undesirable in relation to other tests. We have examined both aspects of a test. To avoid confusion, in what follows, robustness of a test will be exclusively referred to the insensitiveness of its null distribution; i.e., size or level of significance, and variations in the distribution under alternative hypotheses will be described by the effect on its power. There are different ways of interpreting our results as there are many ways of defining what is a ‘good’ as opposed to a ‘bad’ test. Theoretically, if the true level of a test exceeds the nominal level, the test is non-robust (invalid). Making allowance for sampling errors, we define a test to be robust if its rejection rate under H, does not exceed the nominal rate of 5% by two standard errors. When the true rejection rate is 5%, the standard error of its estimate from 1000 replications is 0.7%. Thus, a test is declared non-robust if its estimated type I error exceeds 6.4%. We also define a test which is not robust to be moderately robust if its actual type I error does not exceed the nominal rate of 5% by 2.5%, i.e., if the estimated type I error does not exceed 9%. In examining the effect of the three alternative estimates of the u,‘s - OLS, Recursive and BLUS residuals - we found (experiment # 1) the following tests
(with type I error rate in parentheses) to be invalid:3,4

OLS: RMS1(8), RMS2(7);
Recursive: RMS2(8), GLO3(7), GLM1(7), GLM3(8), NS1(7), NS2(8), NB1(8), LS2(8), DES2(7), LB1(8), LB2(9), DEB2(7), CB2(7);
BLUS: RMS1(9), RMS2(9), GLM1(7), GLM3(8), NS1(7), NB1(8), NB2(8), LB1(7), LB2(7), DEB1(7), DEB2(7).

Thus, while only 2 of the 55 tests were invalid if the OLS residuals were used, 13 or 11 of the tests were invalid if the Recursive or the BLUS residuals were used. An examination of the powers revealed that the tests tended to have maximum power with the OLS residuals. Comparing, for each test, the estimated powers across the different u_t estimates, we found that out of 55 tests, the numbers of tests attaining their maximum were 40 and 45 under H, and H,, respectively when the OLS residuals were used. With minor exceptions, the power of a test was lowest with the Recursive residuals, and only in a few cases was BLUS preferred, but the gain was small. Thus, it seems a test is more likely to be valid and powerful if the OLS rather than the other residuals are used. This is surprising because of the theoretical objections to using the OLS residuals, but it is pleasing because of their computational advantage over the Recursive or BLUS residuals. In the remaining experiments, we used the OLS residuals as estimates of the u_t's.

By increasing the sample size to 40 or decreasing it to 10 (experiment #2) and increasing or decreasing the number of regressors (experiment #3), only the Ramsey tests (RMS1, RMS2) were found to be non-robust. If the regressor was stationary (Data set 1) as opposed to trendy (Data set 6) then, besides the Ramsey tests, the following were invalid (experiment #5): GQ1, GLO3, NS2, LL2, LS2, DEL2, DES2, DEB2, CL2, CL3 and CS2. However, all of them were moderately robust. As the variability of the regressor was increased (Data set 2), all the tests were robust. Thus, it seems the robustness of a test is insensitive to the sample size or the number of regressors but is sensitive to such regressor characteristics as trend, stationarity and variability. Both a trendy nature and high variability of the regressors may ensure robustness of a test.

Table 2
Test groupings and their designations.

Group no. and designation: Tests in the group
1. Parametric tests: GQ1, GQ2, HM1, HM2, RMS1, RMS2, GLO1, GLO2, GLO3, GLM1, GLM2, GLM3, LR1, LR2, LR3, LM1, LM2, LM3
2. Non-parametric tests: All the tests except those in group 1
3. Location rank tests: NL1, NL2, NL3, LL1, LL2, LL3, DEL1, DEL2, DEL3, CL1, CL2, CL3
4. Scale rank tests: NS1, NS2, NS3, LS1, LS2, LS3, DES1, DES2, DES3, CS1, CS2, CS3
5. Bickel tests: NB1, NB2, NB3, LB1, LB2, LB3, DEB1, DEB2, DEB3, CB1, CB2, CB3
6. Tests for variance varying with regressor: GQ1, HM1, RMS1, GLO1, GLO2, GLM1, GLM2, LR1, LR2, LM1, LM2, NL1, LL1, DEL1, CL1, NS1, LS1, DES1, CS1, NB1, LB1, DEB1, CB1
7. Tests for variance varying with E(y): All the tests except those in group 6
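The cutoffs of 6.4% and 9% used in this section follow from the binomial standard error of a rejection rate estimated from 1000 replications. The sketch below reproduces that arithmetic and shows one way a type I error could be estimated by simulation. Everything beyond the cutoffs is an assumption made for illustration: the test (a Spearman rank correlation between the absolute OLS residuals and the regressor, with a normal approximation), the model y = 1 + x + u, the trending regressor, n = 20, and all function names. It is not one of the 55 tests studied here, nor the paper's simulation design.

```python
import math
import random


def binomial_se(p, reps):
    """Standard error of a rejection rate p estimated from `reps` replications."""
    return math.sqrt(p * (1.0 - p) / reps)


def classify(rate_pct):
    """Apply the section's cutoffs (6.4% and 9%) to an estimated type I error."""
    if rate_pct <= 6.4:
        return "robust"
    if rate_pct <= 9.0:
        return "moderately robust"
    return "not even moderately robust"


def ols_residuals(x, y):
    """Residuals from a simple regression of y on a constant and x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    return [yi - intercept - slope * xi for xi, yi in zip(x, y)]


def ranks(values):
    """Ranks 1..n (ties are not expected with continuous data)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        out[i] = rank
    return out


def spearman_rejects(x, resid, z_crit=1.96):
    """Illustrative test: rank-correlate |residual| with x, normal approximation."""
    n = len(x)
    rx, ra = ranks(x), ranks([abs(e) for e in resid])
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ra))
    r_s = 1.0 - 6.0 * d2 / (n * (n * n - 1))
    return abs(r_s) * math.sqrt(n - 1) > z_crit


def estimate_size(n=20, reps=1000, seed=0):
    """Percent of rejections when the errors are homoscedastic N(0, 1)."""
    rng = random.Random(seed)
    x = [float(t) for t in range(1, n + 1)]  # hypothetical trending regressor
    rejections = 0
    for _ in range(reps):
        y = [1.0 + xi + rng.gauss(0.0, 1.0) for xi in x]
        rejections += spearman_rejects(x, ols_residuals(x, y))
    return 100.0 * rejections / reps


se_pct = 100.0 * binomial_se(0.05, 1000)
print(round(se_pct, 1))                         # 0.7
print(round(5.0 + 2.0 * round(se_pct, 1), 1))   # 6.4
rate = estimate_size()
print(rate, classify(rate))
```

The moderately robust cutoff arises in the same way: 5% + 2.5% + 2 x 0.7% = 8.9%, which the text rounds to 9%.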
When the u_t's were drawn from a skewed distribution, LGN (experiment #6), all the non-parametric tests except the scale rank tests (NS1, LS1, DES1) and, among the parametric tests, the Glejser (GLO1, ..., GLM3), the likelihood ratio (LR1, LR2) and the Lagrangian multiplier (LM1) tests were either robust or moderately robust. It was surprising that so many parametric tests were robust to the assumption of a skewed distribution. Barone-Adesi and Talwar's (1983) findings that GQ and Ramsey-type tests are invalid are consistent with ours. However, they also found that the GLO2-type test was invalid when the u_t's were from a chi-squared distribution with 4 degrees of freedom. This is especially surprising because the standard measure of skewness [third central moment/(standard deviation)^3] for such a chi-squared variable is half as much as that of our choice of LGN.

When the u_t's were from the distribution T3, which is symmetric but has finite moments only up to the order of 2, all the non-parametric tests except LS1 were robust or moderately robust. The estimated type I error for LS1 was 10%. The parametric tests which were not even moderately robust were (with rejection rates in parentheses): GQ1(14), GQ2(14), RMS1(32), RMS2(33), LM2(15) and LM3(10). This is consistent with Barone-Adesi and Talwar (1983), who also found GQ1 and RMS1-type tests to be invalid if the u_t's follow a t-distribution with four degrees of freedom.

When the error distribution was changed to C, which is symmetric with no finite moment and whose tails are seemingly longer than those of T3, only the four Glejser (GLO2, GLM1, GLM2, GLM3), the White (WH) and the five Bickel (NB1, NB2, NB3, LB3, DEB3) tests were at best moderately robust. Thus, none of the location or scale rank tests (groups 3 and 4) were even moderately robust. This is surprising because, by construction, the non-parametric tests should be robust to distributional assumptions. However, this non-robustness may be artificial because of the error in estimating the u_t's by the OLS residuals, especially where the error distribution is Cauchy. For in that case, the standard assumptions of finite mean and variance for the linear regression model are not tenable.

3 As the rejection rates under H_0 for HM1 and HM2 are based on their bounding distributions, no definite conclusions can be reached.
4 Details of the various rejection rates can be obtained on request.
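The skewness comparison made above can be checked with the closed-form skewness of the two families. The sketch below is only illustrative: the parameters of LGN are not stated in this section, so the value sigma = 1 is a hypothetical choice, and the function names are ours. For a chi-squared variable with 4 degrees of freedom the measure is sqrt(8/4), about 1.41, so the statement in the text would put the skewness of LGN near 2.8 if read literally.

```python
import math


def chi2_skewness(df):
    """Skewness of a chi-squared variable: sqrt(8 / df)."""
    return math.sqrt(8.0 / df)


def lognormal_skewness(sigma):
    """Skewness of exp(Z), Z ~ N(mu, sigma^2); it does not depend on mu."""
    w = math.exp(sigma ** 2)
    return (w + 2.0) * math.sqrt(w - 1.0)


def sample_skewness(values):
    """Third central moment divided by the cubed standard deviation."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


print(round(chi2_skewness(4), 3))         # 1.414
print(round(lognormal_skewness(1.0), 3))  # 6.185 (sigma = 1 is hypothetical)
```

The same measure has no finite population value for T3 or the Cauchy distribution C, since their third moments do not exist; `sample_skewness` can still be computed on any finite sample.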
To examine this possibility, we repeated the experiment (#7) with the u_t's taken to be the known generated values in the simulations. All the non-parametric tests were either robust or moderately robust. Thus, it seems that if the u_t's are from a symmetric distribution, increasing the efficiency of the u_t estimates may ensure the robustness of the non-parametric tests. However, at present, in the absence of a better alternative, we may have to use the OLS residuals and therefore the only distributionally robust tests seem to be the Glejser (GLO2, GLM1, GLM2, GLM3), the White (WH) and the Bickel (NB1, NB2, NB3, LB3, DEB3) tests. These tests are robust to both long-tailed and skewed distributions. In what follows they (in the order mentioned) will be referred to as TROB (ten robust) tests.

When the u_t's were autocorrelated (experiment #8) and the correlation was positive or moderately negative, all the tests, parametric or non-parametric, were robust. When the correlation was highly negative, there were 12 parametric and only 4 non-parametric tests that were at best moderately robust. Of the TROB tests, only GLO2 and GLM2 were robust. In this experiment, our regressor was trendy (Data set 6). When the regressor was changed to be stationary (Data set 1), every test, whether the autocorrelation was positive, negative, moderate or extreme, was either robust or moderately robust. Thus, it seems that if we use the OLS residuals as estimates of the u_t's and the regressors are stationary, the TROB tests are robust to distributional (skewed or symmetric) and dependence assumptions for the errors. Some of these tests may be invalid if the regressors are trendy and the (first-order) autocorrelation is extremely negative. As negative autocorrelation is in practice an exception rather than a rule, one may conclude that the TROB tests can be used validly for all practical purposes.

There were some exceptions, but in general we found that the test powers increased with sample size and with the trendy nature or variability of the regressors; decreased with the number of regressors and the deviations of the errors from the normal distribution; and were mostly erratic (increasing or decreasing) when the errors were autocorrelated. None of the tests was powerful in detecting heteroscedasticity specified by H,. The maximum power was 9%. The tests in group 6 and those in group 7 had respectively the most power under H, and H, and the next best power under H, and H,. This is not surprising because the specifications H, and H, and H, and H, are consistent, respectively, with the heteroscedasticity assumption on which the tests in group 6 and 7 are based.5 Moreover, the tests in group 6 had significantly less power to detect H, than H, and H, than H,. Similarly, the tests in group 7 had significantly less power under H, than H, and H, than H,.
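The autocorrelated errors of experiment #8 can be imitated with a first-order autoregressive scheme. The following is a minimal sketch under assumptions that are not taken from the text: normal innovations with unit variance, a draw from the stationary distribution for the first value, and the function names `ar1_errors` and `lag1_autocorr`.

```python
import math
import random


def ar1_errors(n, rho, rng, sigma=1.0):
    """Generate u_t = rho * u_{t-1} + e_t with e_t ~ N(0, sigma^2), |rho| < 1."""
    # Start from the stationary distribution so that every u_t has the same variance.
    u = [rng.gauss(0.0, sigma / math.sqrt(1.0 - rho ** 2))]
    for _ in range(n - 1):
        u.append(rho * u[-1] + rng.gauss(0.0, sigma))
    return u


def lag1_autocorr(u):
    """Sample first-order autocorrelation of a series."""
    n = len(u)
    m = sum(u) / n
    num = sum((u[t] - m) * (u[t - 1] - m) for t in range(1, n))
    den = sum((v - m) ** 2 for v in u)
    return num / den


rng = random.Random(1)
for rho in (-0.9, -0.5, 0.0, 0.5, 0.9):
    series = ar1_errors(20000, rho, rng)
    print(rho, round(lag1_autocorr(series), 2))
```

A long series is used here only to confirm that the generator reproduces the intended correlation; with samples as small as those in the experiments the sample autocorrelation is much noisier.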
As a typical example, the powers (in percentages) of GLM1 (a test in group 6) were 61, 33, 87 and 68 and those of GLM3 (a test in group 7) were 40, 55, 68 and 81 under H,, H,, H, and H,, respectively. Thus, it seems that none of the tests has any significant power to detect heteroscedasticity which deviates substantially from the underlying assumption of heteroscedasticity for the test. This is not consistent with the findings of Buse (1982) and Goldfeld and Quandt (1972). However, in both these studies the assumed and the true heteroscedasticity were of the same form and they may not be considered to differ substantially from each other.

The powers of the TROB tests when averaged over H,, H,, H, and H, were 56, 62, 62, 61, 64, 62, 61, 55, 52 and 51, respectively. These averages over H,, H, and the four distributional assumptions (N, T3, C, LGN) were 37, 31, 35, 29, 35, 31, 29, 27, 27 and 29, and over H,, H, and the five autocorrelations (ρ = -0.9, -0.5, 0, 0.5, 0.9) assumed for the errors were 44, 47, 47, 42, 43, 48, 43, 43, 37 and 37. As can be seen, there is very little difference in powers6 among these tests, except that the tests LB3 and DEB3 tend to have the minimum power. There is some evidence of a reduction in powers due to dependence in the errors, and it seems certain that the powers are reduced if the errors are not normally distributed.

5 Even though no specific assumption of heteroscedasticity was made to develop the WH test, one may infer from its construction that the test is designed to detect specifications like H,.
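The averaged powers reported above are easier to compare when paired with the test names. The sketch below tabulates them, assuming the ten TROB tests are in the order in which the text introduces them (four Glejser tests, WH, then five Bickel tests); the dictionary keys and variable names are ours.

```python
from statistics import mean

# TROB tests in the order mentioned in the text (assumed pairing with the figures).
TROB = ["GLO2", "GLM1", "GLM2", "GLM3", "WH", "NB1", "NB2", "NB3", "LB3", "DEB3"]

# Average powers (percent) as reported for the three sets of experiments.
AVERAGE_POWER = {
    "alternatives": [56, 62, 62, 61, 64, 62, 61, 55, 52, 51],
    "distributions": [37, 31, 35, 29, 35, 31, 29, 27, 27, 29],
    "autocorrelations": [44, 47, 47, 42, 43, 48, 43, 43, 37, 37],
}

# Mean of the three reported averages for each test.
overall = {
    name: mean(column[i] for column in AVERAGE_POWER.values())
    for i, name in enumerate(TROB)
}
lowest_two = sorted(overall, key=overall.get)[:2]

# Cross-check against the individual powers quoted for GLM1 and GLM3.
glm1_mean = mean([61, 33, 87, 68])
glm3_mean = mean([40, 55, 68, 81])

print(lowest_two)            # ['LB3', 'DEB3']
print(glm1_mean, glm3_mean)  # 62.25 61
```

Under this pairing the individual powers quoted for GLM1 and GLM3 average to 62.25 and 61, matching the second and fourth entries (62 and 61) of the first list, and LB3 and DEB3 come out lowest overall, as the text states.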
5. Concluding remarks
A major finding of this investigation is that a test, parametric or non-parametric, is likely to be valid and tends to have more power if the OLS rather than either the Recursive or the BLUS residuals are used as estimates of the u_t's. The higher the efficiency of the OLS residuals, the more likely a test will be robust and powerful. Thus, an increased variability of the regressors may ensure the validity and high power of a test, but it may be non-robust and lose power considerably if the u_t's are from an extremely long-tailed distribution. A future search for a valid and powerful test may concentrate on efficient and robust estimates of the u_t's. The parametric rather than the non-parametric tests tend to be invalid when the observations are from a skewed (LGN) or a moderately long-tailed (T3) distribution. All tests, especially the location and scale rank tests (groups 3 and 4), are non-robust to extremely long-tailed distributions such as the Cauchy. The tests are generally robust to autocorrelated errors if the regressors are stationary. If the regressors are trendy, it is more likely for a non-parametric than a parametric test to be invalid. The members of TROB include both parametric and non-parametric tests. They are likely to be robust to distributional and dependence assumptions of the errors. Generally, they are all powerful, but none of them has any significant power to detect heteroscedasticity characterized by changes in variance in isolated jumps. They seem to detect specific forms of heteroscedasticity. Their powers can be reduced if the errors are not normal or independent. A similarity among the tests in TROB may be noted. Except for the tests GLO2, LB3 and DEB3, every one of the rest is designed implicitly to detect a relationship between u_t^2 and a measure that is related to the heteroscedasticity. The tests that are of the same form but use |u_t| in place of u_t^2 are likely to be non-robust and lack power.7 This may be a clue to developing a better test. However, one may recommend any one of the TROB tests for practical use.

6 We recomputed the powers based on nominal critical regions and found little difference from those based on estimated actual critical regions. For example, the powers of these tests averaged over H,, H,, H, and H, were 52, 60, 52, 59, 61, 60, 57, 49, 47 and 47, respectively, those over H_2, H, and the four distributional assumptions (N, T3, C, LGN) were 35, 34, 29, 33, 31, 33, 28, 20, 29 and 31 and over H,, H, and the five autocorrelations (ρ = -0.9, -0.5, 0, 0.5, 0.9) assumed for the errors were 40, 42, 39, 42, 47, 42, 42, 36, 36 and 31.

References
Barone-Adesi, G. and P.P. Talwar, 1983, Market models and heteroscedasticity of residual security returns, Journal of Business and Economic Statistics 1, 163-168.
Bickel, P.J., 1978, Using residuals robustly I: Tests for heteroscedasticity, nonlinearity, Annals of Statistics 6, 266-291.
Breusch, T.S. and A.R. Pagan, 1979, A simple test for heteroscedasticity and random coefficient variation, Econometrica 47, 1287-1294.
Buse, A., 1982, Tests for additive heteroscedasticity: Goldfeld and Quandt revisited, Unpublished discussion paper (University of Alberta).
Conover, W.J., M.E. Johnson and M.M. Johnson, 1981, A comparative study of tests for homogeneity of variances, with applications to the outer continental shelf bidding data, Technometrics 23, 351-361.
7 All scale rank tests belong to this class.