Journal of Econometrics 20 (1982) 211-253. North-Holland Publishing Company
ASSESSING THE PRESENCE OF HARMFUL COLLINEARITY AND OTHER FORMS OF WEAK DATA THROUGH A TEST FOR SIGNAL-TO-NOISE

David A. BELSLEY*

Boston College, Chestnut Hill, MA 02167, USA
Massachusetts Institute of Technology, Cambridge, MA 02139, USA

Received September 1981, final version received April 1982
Data weaknesses (such as collinearity) reduce the quality of least-squares estimates by inflating parameter variances. Standard regression diagnostics and statistical tests of hypothesis are unable to indicate such variance inflation and hence cannot detect data weaknesses. In this paper, then, we consider a different means for determining the presence of weak data based on a test for signal-to-noise in which the size of the parameter variance (noise) is assessed relative to the magnitude of the parameter (signal). This test is combined with other collinearity diagnostics to provide a test for the presence of harmful collinearity and/or short data. The entire procedure is illustrated with an equation from the Michigan Quarterly Econometric Model. Tables of critical values for the test are provided in an appendix.
1. Introduction

It has long been known in economics and other disciplines plagued with non-experimental data that weak or uninformative data (such as collinear data) reduce the quality of the least-squares estimates b = (X'X)⁻¹X'y of the standard linear model y = Xβ + ε [y (n×1), X (n×p), β (p×1), Eε = 0 and Eεε' = σ²I] by inflating the parameter variances σ²_bk, i.e., the diagonal elements of the data-dependent variance-covariance matrix σ²(X'X)⁻¹.¹ For a given σ², then, it is reasonable to define the data matrix X as possessing a weak-data (uninformative-data) problem relative to the estimation of a parameter βk (assumed a priori or maintained to be non-zero) if it can be determined that the variance σ²_bk of bk is 'too large'. We adopt just such a definition here, and it becomes the purpose of this paper to suggest a criterion for assessing when this variance is too large.

*The author is indebted to Professors Gregory Chow, Jim Durbin, Saul Hymans, Edwin Kuh, John Tukey and Walter Vandaele for many helpful discussions relating to this research, which avowal is in no way intended to implicate those generous souls in any remaining errors. Thanks also to Aik Quek, Fatma Taskin, Stephen Swartz and Ana Aizcorbe for research assistance. Computation for this research was done mainly through TROLL at MIT, although thanks go also to the Boston College Computer Center. This research was financed in part through a Mellon Grant at Boston College.

¹These 'true' variances σ²_bk are to be distinguished from the estimated variances s²_bk based on s²(X'X)⁻¹, where s² = e'e/(n−p) and e is the residual vector.
Since data weaknesses apply to a specific data matrix, we view the OLS estimator and its variances σ²_bk as conditional on X. While this introductory discussion centers on a single parameter βk of the vector β, the tests developed subsequently are general.

Existing diagnostic and statistical tests cannot assess the size of the σ²_bk and thus assess the presence of data weaknesses. Collinearity diagnostics which are based on the X data alone, such as those given in Belsley, Kuh and Welsch (1980) (BKW) (and which play a role later), can signal the presence of collinear relations among the columns of X (and hence indicate the potential for data problems) but are devoid of information on σ². Thus, they cannot determine when the parameter variances σ²_bk are large or small. Likewise, traditional statistical tests of location (t's and F's) employ null hypotheses under which the resulting test statistics are centrally distributed (non-centrality parameter equal to zero) regardless of the parameter variances. Thus, they cannot provide further tests on the sizes of these variances.

A test for uninformative data, then, must add to existing tools the ability to test for the size of the parameter variances σ²_bk. Of course, whether σ²_bk is large or small is necessarily relative, and hence standard tests on the absolute magnitude of σ²_bk are not suitable to this need. Another common practice, however, measures the size of the variance (or standard deviation) relative to the size of the expected value. This suggests a test for the magnitude of the signal-to-noise (s/n) parameter of the OLS estimator bk of the kth regression parameter; namely,

τ ≡ βk/σ_bk.   (1)
The inverse of τ is often called the coefficient of variation [Wilks (1962)]. Given from prior considerations that βk ≠ 0, a test that τ is high (low) is also a test that σ_bk is relatively low (high) and, hence, indicative of the absence (presence) of weak or uninformative data.²

In section 3 of this paper, then, we develop a test for the level of s/n. In section 4 we propose a definition of an adequate level of s/n, which, combined with the above test, provides a test for uninformative or weak data. Section 5 discusses the relation of this test to conventional tests of hypothesis. Section 6 combines the tests developed here with the diagnostics of BKW to define and assess the presence of two related forms of uninformative data (or weak data): harmful collinearity and short data. Section 7 gives an example. Tables of critical values (mostly previously unpublished) for the test for adequate s/n are presented in appendix A. Appendix B summarizes the collinearity diagnostics of BKW. We turn next to several preliminary remarks.

²Choice of units or the parameterization of the rest of the model is not a concern here since the s/n ratio is seen rather generally to be invariant to such reparameterizations - or linear transformations of the data.
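The connection between the s/n parameter in (1) and the familiar t-statistic can be illustrated with a small simulation. The following sketch (all numeric values invented for the demonstration, not taken from the paper) shows the t-statistic acting as an estimator of τ = βk/σ_bk:

```python
# Sketch: in a one-regressor model, the t-statistic b/s_b estimates the
# signal-to-noise parameter tau = beta_k / sigma_{b_k}.  Values invented.
import numpy as np

rng = np.random.default_rng(0)
n, beta, sigma = 50, 2.0, 1.0
x = rng.normal(size=n)                       # fixed design, no intercept

t_stats = []
for _ in range(2000):
    y = beta * x + sigma * rng.normal(size=n)
    b = (x @ y) / (x @ x)                    # OLS estimate of beta
    resid = y - b * x
    s2 = resid @ resid / (n - 1)             # s^2 with n - p = 49 d.o.f.
    t_stats.append(b / np.sqrt(s2 / (x @ x)))

tau = beta / (sigma / np.sqrt(x @ x))        # true s/n:  beta_k / sigma_{b_k}
print(tau, np.mean(t_stats))                 # the average t is close to tau
```

Conditional on X, t is distributed as a non-central Student's t with non-centrality parameter τ, which is why its average across replications tracks τ.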
2. Preliminary remarks
(a) It is assumed above that it is maintained a priori that βk ≠ 0. This condition is important, for if H0: βk = 0 can be entertained, one cannot identify whether the rejection of a high τ indicates a relatively high σ²_bk or the truth of H0, and hence one cannot assess the presence of weak data. That βk ≠ 0 a priori occurs often in practice when it can be maintained that the variable Xk (the kth column of X) definitely belongs in the regression and is not subject to test (no matter how poorly estimated its parameter may be); income, for example, surely belongs in the consumption function.

The focus of this test for s/n, then, is distinctly different from that of conventional hypothesis testing. We do not test here whether a variable belongs in the equation (H0: βk = 0) or whether a parameter is a particular value (H0: βk = βk*). Rather we test whether the variance of a given estimate (which depends on the data) is sufficiently small relative to its expected value that we can rule out data weaknesses such as collinearity or short data (defined in section 4) as being underlying problems in need of consequent resolution (H1: |τ| > τ*).

(b) The s/n parameter τ bears a superficial resemblance to the usual t-statistic,

t = bk/s_bk,   (2)

but the two concepts are obviously quite different - one being a parameter, the other a random variable. Indeed, in general, t is distributed as a non-central Student's t with non-centrality parameter τ; that is, the s/n parameter τ is identified with the non-centrality parameter of the distribution of t.³ This fact again helps us to relate the test for s/n developed here to conventional tests. Both tests use the t-statistic (2) as a natural means for estimating τ. Under conventional tests (H0: βk = 0), however, τ = 0 for all σ_bk, i.e., t is centrally distributed regardless of σ_bk.⁴ Such tests cannot therefore provide further information on the relative size of σ_bk. By contrast, under the test proposed here, it is assumed that βk ≠ 0, so that the t-statistic becomes non-centrally distributed with τ ≠ 0. Thus a test that |τ| exceeds some chosen threshold τ* > 0 does indeed provide information on the magnitude of σ_bk relative to βk and hence on the presence of uninformative data.

(c) An immediate consequence of the preceding is that values of the t-statistic thought high for conventional tests of hypothesis need not be high

³In subsequent sections we shall deal more generally with F (t²-like) statistics that are distributed as the non-central F with non-centrality parameter τ².

⁴In general, t = (bk − βk*)/s_bk is distributed with a non-centrality parameter τ ≡ (βk − βk*)/σ_bk. Under H0: βk = βk*, τ = 0.
for tests on the presence of non-zero s/n and, hence, for the absence of data problems. A t of 4, for example, typically indicates its coefficient to be significantly different from zero. We shall see, however, that this same t of 4 may not suffice to accept a hypothesis that the s/n exceeds some reasonably chosen threshold level and hence to accept the hypothesis that σ_bk is sufficiently small relative to βk to rule out the presence of data weaknesses. Under these conditions the investigator is apprised of the presence of a data weakness and may feel that, despite the significance of the estimate, an analysis based on more or better conditioned data, or one incorporating prior information of some sort,⁵ is warranted before the results can be used in understanding a key structural parameter or providing some important policy recommendation.

(d) The test developed here for adequate s/n is useful as a diagnostic for data weaknesses, but cannot, by itself, distinguish between the two data weaknesses: harmful collinearity and short data. This limitation is overcome in section 6, where we combine the test for s/n with the collinearity diagnostics of Belsley, Kuh and Welsch (BKW), a brief summary of which is presented in appendix B. Inadequate s/n together with collinearity defines harmful collinearity, whereas inadequate s/n without collinearity defines short data. Since linear transformations of the data do not affect s/n but can reduce collinearity (as often occurs when two collinear variates X1 and X2 are replaced with X1 and X2 − X1), we see that such transformations cannot remove a data weakness, but only alter its form (the collinearity between X1 and X2 is replaced by the short data in X2 − X1). Harmful collinearity and short data, then, are two sides of the same coin, but in correcting for their effects, there is typically an advantage in knowing which type of problem is occurring relative to the particular parameterization being estimated.
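The collinearity-into-short-data trade described in (d) is easy to see numerically. A small sketch with invented data, using the BKW-style convention of scaling columns to unit length before computing the condition number:

```python
# Sketch (invented data): replacing collinear columns X1, X2 by X1 and
# X2 - X1 removes the collinearity but leaves a 'short' column, so the
# data weakness changes form rather than disappearing.
import numpy as np

def scaled_cond(M):
    # BKW-style conditioning: scale each column to unit length first
    return np.linalg.cond(M / np.linalg.norm(M, axis=0))

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)          # nearly collinear with x1

X  = np.column_stack([x1, x2])               # original parameterization
Xt = np.column_stack([x1, x2 - x1])          # linearly transformed data

print(scaled_cond(X))                        # large: strong collinearity
print(scaled_cond(Xt))                       # near 1: collinearity removed
print(np.linalg.norm(x2 - x1) / np.linalg.norm(x2))  # new column is short
```

The transformed matrix is well conditioned, but the difference column has a small norm relative to x2: the weakness has moved, not vanished.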
Thus, the two diagnostic procedures, the test for adequate s/n and the BKW collinearity diagnostics, strongly complement one another in analyzing weak-data problems, each having independent value. The test for adequate s/n can detect the presence of weak data but cannot determine its cause (collinearity or short data). The diagnostics of BKW, by contrast, cannot detect the presence of weak data, but can determine whether an already detected weak-data problem is due to collinearity or short data. Furthermore, they can determine the structure of a collinear relation and hence help to direct where new data or prior information can be most advantageously employed. While the collinearity diagnostics of BKW are introduced for this purpose, and, in

⁵On means for incorporating prior information and its ability to aid data conditioning, see Chapter 4 of Belsley, Kuh and Welsch (1980). A pure Bayesian might argue that prior information, if available, should be introduced at the outset. As a matter of practice, however, prior information is often included only after classical techniques have failed to provide estimates of a desired degree of precision.
the opinion of the author, are those best suited to this need, in practice any means for determining the presence of collinearity could be used in their stead.

(e) Finally, it can now be seen that the test provided here makes substantial headway in filling some of the needs implicit in the following remark from Smith and Campbell (1980, p. 77):

The essential problem with VIF and similar measures [of collinearity] is that they ignore the parameters while trying to assess the information given by the data. Clearly, an evaluation of the strength of the data depends on the scale and nature of the parameters. One cannot label a variance or a confidence interval (or, even worse, a part of the variance) as large or small without knowing what the parameter is and how much precision is required in the estimate of that parameter. In particular, a seemingly large variance may be quite satisfactory if the parameter is very large, ...
3. Signal-to-noise and a test for its magnitude

Consider the linear regression model

y = Xβ + ε,   (3)

where y is an n-vector, X is an n×p data matrix, β is a p-vector of unknown parameters and ε ~ N_n(0, σ²I). Our concern is with a subset of the elements of β. Hence, partition (3) as

y = X1β1 + X2β2 + ε,   (4)

where X1 is n×p1, X2 is n×p2, β1 is a p1-vector, β2 is a p2-vector and p1 + p2 = p. Partition the least-squares estimator b = (X'X)⁻¹X'y of β commensurately to give b = [b1 b2]. As is well known, the marginal distribution of the p2-vector b2 is

b2 ~ N_p2(β2, V(b2)),   (5)

where V(b2) is the variance-covariance matrix of b2 and, conditional on X, is

V(b2) = σ²(X2'M1X2)⁻¹,   (6)

where M1 ≡ I − X1(X1'X1)⁻¹X1', and (X1'X1)⁻¹ is assumed to exist.
3.1. Signal-to-noise

Let β2* be any p2-vector. We define the signal-to-noise of the LS estimator b2 of β2 relative to β2*, denoted τ², as

τ² ≡ (β2 − β2*)' V⁻¹(b2) (β2 − β2*).   (7)
The generality arising from the introduction of β2* will prove useful later on. For the moment it is simply any arbitrary point in the p2-dimensional parameter space, but eventually it will be identified with a hypothesized value for β2. Although it proves convenient to suppress the phrase 'relative to β2*' when referring to signal-to-noise unless context requires it, its presence should always be assumed. The special case where β2* = 0 is an important, but by no means isolated, instance. Furthermore, while the term signal-to-noise properly refers to the magnitude τ, we typically deal with τ² and find it convenient to call this magnitude by the same name. No confusion arises on this account.

The quadratic form in (7) appropriately generalizes the concept of signal-to-noise given in the introduction, clearly reducing to (1) in the case that p2 = 1, β2* = 0 and the vector b2 is simply bk, the kth element of b.⁶

3.2. Testing for signal-to-noise

We will say that the signal-to-noise τ² significantly exceeds some hypothesized value τ0² when we are able to reject the null hypothesis A0: τ² = τ0² in favor of the alternative A1: τ² > τ0² at some chosen test size α. The symbols A0 and A1 have been used here to avoid confusion with the usual tests of hypothesis on coefficient values; H0 and H1 will be reserved for this latter context.

In light of (5), we have [Rao (1973, pp. 181-192), Anderson (1958, p. 113)]

(b2 − β2*)' V⁻¹(b2) (b2 − β2*) ~ χ²_p2(λ),   (8)

a non-central chi-squared distribution with p2 degrees of freedom and non-centrality parameter

λ ≡ (β2 − β2*)' V⁻¹(b2) (β2 − β2*).   (9)

Further, we know that the sum of squared regression residuals s² ≡ e'e/(n−p), (e = y − Xb), obeys

(n−p)s²/σ² ~ χ²_{n−p},   (10)

⁶The notational ambiguity that arises here between the vector b2 and the element bk causes no trouble; numbered subscripts denote a vector while lettered subscripts denote an arbitrary element.
a central chi-squared distribution with n−p degrees of freedom, independent of (8). The ratio of (8) to (10), each divided by their respective degrees of freedom, produces [recalling (6)] the test statistic

φ² ≡ (b2 − β2*)'(X2'M1X2)(b2 − β2*)/(p2 s²) ~ F_{p2, n−p}(λ),   (11)

a non-central Fisher F with p2 and n−p degrees of freedom and non-centrality parameter λ from (9) [Rao (1973, p. 216), Anderson (1958, p. 114)]. Comparing (9) with (7) we recognize that

λ = τ²,   (12)

i.e., the non-centrality parameter relevant to the distribution of φ² is identically the s/n parameter τ². Hence, under A0 we have λ = τ0² and

φ² ~ F_{p2, n−p}(τ0²).   (13)

We therefore have the following test for A0: τ² = τ0² against A1: τ² > τ0². Adopt a test size α (such as 0.05 or 0.01) and calculate

F_α ≡ ₁₋αF_{p2, n−p}(τ0²),   (14)

the (1−α)-critical value for the non-central F with p2 and n−p degrees of freedom and non-centrality parameter τ0². If φ² ≤ F_α, accept A0; if φ² > F_α, reject A0 and accept A1. In accepting A1, we can also say that τ² is significantly greater than τ0² at significance level α.⁷

Two aspects of this test deserve comment. First, the test statistic φ² in (13) used to test A0 is the same as the F-statistic conventionally used to test the null hypothesis H0: β2 = β2* against the alternative H1: β2 ≠ β2*.⁸ Indeed, in the p2 = 1 case with β2* = 0, φ² is recognized to be the square of the standard t-statistic (2). Although the test statistic is the same, a fact that facilitates its calculation, it is clear that, in general, the tests are not. Testing for A0 is equivalent to testing for H0 only for τ0² = 0.⁹

⁷This test is related to that of Toro-Vizcarrondo and Wallace (1968) for determining when linear restrictions on β will reduce the MSE of the estimates. The τ0² for their test is ½. As relates to collinearity, their test determines when removal of a set of variates reduces variances by more than it increases bias. Our orientation is quite different, since we wish to define the presence of data weaknesses (of which collinearity is but one) and to determine when estimates of parameters known to be involved (and hence not to be removed) are beset with such problems. Furthermore, we wish to be able to set τ0² meaningfully for different levels of stringency, a procedure for which is introduced in the next section.

⁸See, for example, Goldberger (1964, pp. 173-176) or Theil (1971, pp. 139-141). This F-statistic is also known as Hotelling's (1931) T²-statistic - see Anderson (1958, ch. 5).

⁹This is seen by noting that, in addition to the fact that φ² is the usual F-statistic, (1) τ = 0 if and only if β2 = β2* and (2) under H0 (A0 with τ0² = 0), the F-distribution in (13) becomes a central F.
Second, if β2 ≠ β2*, a test that τ² is high is also interpretable as a test that the variance-covariance structure V(b2) of b2 is low relative to β2 − β2*. Similarly, an inability to reject A0 for at least a modest value of τ0² can be considered evidence for large or inflated variances of b2, an interpretation we exploit in section 6 in diagnosing the presence of harmful data problems.
4. Defining and testing for adequate/inadequate signal-to-noise

The test for s/n given in section 3.2 has one practical drawback: it requires knowledge of β2 and V(b2) to stipulate directly a value for τ0². Here we propose a practical and intuitively appealing definition for an adequate level of s/n that does not require β2 and V(b2). This measure is an increasing function of a single, selectable parameter γ ∈ [0, 1), and can be made stringent (γ chosen near unity) or relaxed (γ chosen small). This level of adequate s/n is combined with the previous test for s/n to produce a test for adequate/inadequate s/n.

4.1. Isodensity ellipsoids as a measure of adequate signal-to-noise
We begin by introducing the γ-isodensity ellipsoid¹⁰ for the least-squares estimator b2 of β2. For each 0 ≤ γ < 1, this p2-dimensional ellipsoid is the set of all b̃2 such that

(b̃2 − β2)' V⁻¹(b2) (b̃2 − β2) ≤ χ²_γ,   (15)

where χ²_γ is the γ-critical value for the central chi-squared distribution with p2 degrees of freedom. Fig. 1 depicts one member of this family of ellipses for the p2 = 2 case where b2 has two components bk and bj. Since b2 ~ N_p2(β2, V(b2)) implies

(b2 − β2)' V⁻¹(b2) (b2 − β2) ~ χ²_p2,   (16)

a central chi-squared distribution with p2 degrees of freedom, the γ-isodensity

¹⁰This concept, or closely related ones, occurs in the literature under a number of different names. Cramér (1946, sect. 21.10) calls one element of this family the ellipse of concentration, and, in a different context (sect. 21.6), calls the family of such curves ellipses of inertia. Malinvaud (1971, p. 154) borrows the name concentration ellipsoid. We have not used this name here, however, since it is too easily confused with the commonly occurring, but distinctly different, concept of the confidence ellipse employed in describing regions of confidence for a subset of parameters (see text). Wonnacott and Wonnacott (1979) call this identical family of ellipsoids the isoprobability ellipses, but this intuitively appealing term is a misnomer, since the probability associated with every outcome b2 is zero. Presumably they have isodensity in mind. Anderson (1958, p. 57) in fact names this family the ellipsoids of constant density, a wholly appropriate, but slightly awkward, name. The term isodensity ellipsoid seems a proper compromise.
Fig. 1. The γ-isodensity ellipsoid for b2.
ellipsoid is the ellipsoid of smallest volume that contains any particular least-squares outcome from b2 ~ N_p2(β2, V(b2)) with probability γ.¹¹ The γ-isodensity ellipsoids therefore define regions of likely and unlikely outcomes for the LS estimator b2 [given X and, hence, V(b2)]. In fig. 1, for example, if γ = 0.95, an outcome such as b2 = b̂2 that lies outside of the ellipsoid would be unlikely. This would be even more true if γ = 0.99 or 0.999. If, however, γ = 0.3, such a result would hardly be surprising. Thus, the family of isodensity ellipsoids provides a natural measure of the probabilistic distance between any point β2* and the true mean β2.¹²,¹³ This leads to the following definition:

Definition. The probabilistic distance between β2* and β2, relative to the least-squares estimator b2 ~ N_p2(β2, V(b2)), is the γ that determines the isodensity ellipsoid centered on β2 and including β2* on its boundary, i.e., the γ such that

(β2* − β2)' V⁻¹(b2) (β2* − β2) = χ²_γ.   (18)
¹¹These isodensity ellipsoids for the random vector b2 should not be confused with the superficially similar, but conceptually distinct, concept of a confidence ellipsoid for the parameter β2. These latter ellipsoids, defined for a particular LS estimate b2*, are the set of ξ ∈ R^p2 such that

(ξ − b2*)' S⁻¹(b2) (ξ − b2*) ≤ p2 · γF_{p2, n−p},   (17)

where S(b2) = s²(X2'M1X2)⁻¹ and γF_{p2, n−p} is the γ-critical level for the central F with p2 and n−p degrees of freedom. Confidence ellipsoids, centered on b2*, have the property that, conditional on X, they will bracket the true value β2 with probability γ. By contrast, the γ-isodensity ellipsoids, centered on the true parameter value β2, have the property that they contain any particular LS outcome, conditional on X, with probability γ.

¹²Although β2* here represents any point in the space R^p2, we shall eventually associate it with a hypothesized value for β2.

¹³In the p2 = 1 case this measure is simply the probability associated with the event that a least-squares outcome will be less than or equal to a multiple m ≡ |bk − βk*|/σ_bk standard errors away from the mean value βk.
Clearly γ = 0 if and only if β2 = β2*, and there is otherwise a one-to-one monotone increasing mapping between γ on [0, 1) and χ²_γ on [0, ∞).

This measure has immediate application, for we see from (7) that the left-hand side of (18) is τ², the s/n of b2 relative to β2*. The probabilistic distance, then, also provides a natural means for assessing the size of the s/n, τ², which does not require knowledge of β2 and V(b2). A level of s/n, τ², can be considered large if it corresponds to a large probabilistic separation of β2* from β2, that is, if it equals a value of χ²_γ for a γ chosen near unity, say 0.90 or 0.95 or 0.999. A weak level of s/n corresponds to little probabilistic separation and has magnitude χ²_γ for γ chosen small, say 0.75 or 0.5 or even 0.0. In fig. 2 we show how χ²_γ varies with γ for p2 = 1.
Fig. 2. χ²_γ as a function of γ for p2 = 1.
For any choice of γ ∈ [0, 1), then, call the magnitude

τ_A² ≡ χ²_γ,   (19)

the threshold of adequacy (at level γ), where, we recall, χ²_γ is the γ-critical value of the central chi-squared distribution with p2 degrees of freedom. The s/n, τ², of the least-squares estimator b2 of β2 relative to β2* will be said to be adequate at level γ if

τ² > τ_A².   (20)

A level τ² ≤ τ_A² will be said to be inadequate at level γ.
The threshold of adequacy, τ_A², may be set by the researcher according to need: stringently, by choosing γ near unity, or relaxed, by choosing smaller values for γ. Precisely which value is most useful for econometric practice is somewhat like choosing a size for a test of hypothesis, something that experience and empirical experimentation must answer. Different choices will clearly be appropriate to different applications. In the example in section 7 different choices of γ will be explored.
4.2. Testing for adequate/inadequate signal-to-noise

The test for s/n given in section 3.2 can be used to test for the adequacy condition (20) to produce the following test for adequate s/n:

1) Choose a level 0 ≤ γ < 1 defining adequacy and determining τ_A² ≡ χ²_γ, and consider the null hypothesis A0(γ): τ² = τ_A² versus the alternative A1(γ): τ² > τ_A², which is condition (20).

2) Choose a test size α (say 0.05 or 0.01) for the test for s/n of section 3.2 and calculate the critical value F_α from (14) for τ0² = τ_A² (see tables in appendix A).

3) Then φ² > F_α rejects A0(γ) in favor of A1(γ) and hence accepts the presence of adequate s/n, while φ² ≤ F_α accepts A0(γ), which will be interpreted to accept the presence of inadequate s/n.

Appendix A provides tables for the critical values F_α required in step 2) for values of α = 0.01, 0.05, 0.10, 0.25, for values of γ = 0.75, 0.90, 0.95, 0.99, 0.999, 0.9999, 0.99999, 0.999999, and for degrees of freedom relevant to most econometric applications.
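The three-step procedure can be sketched in code, with SciPy's chi-squared and non-central F quantiles standing in for the tables of appendix A; the regression inputs (φ², p2, n − p) are invented for illustration:

```python
# Sketch of the section 4.2 test for adequate s/n.  Inputs are invented;
# the chi2 and ncf quantiles come from SciPy rather than appendix A.
from scipy.stats import chi2, ncf

def adequate_sn(phi_sq, p2, dof, gamma=0.95, alpha=0.05):
    """True if s/n is adequate at level gamma, i.e., A0(gamma) is rejected."""
    tau_A_sq = chi2.ppf(gamma, p2)                   # threshold of adequacy (19)
    F_alpha = ncf.ppf(1 - alpha, p2, dof, tau_A_sq)  # critical value from (14)
    return phi_sq > F_alpha, F_alpha

# A t of 4 (phi^2 = 16) is conventionally 'significant' with 30 d.o.f.,
# yet it passes only a relaxed adequacy level, not a stringent one:
ok95, crit95 = adequate_sn(16.0, 1, 30, gamma=0.95)
ok999, crit999 = adequate_sn(16.0, 1, 30, gamma=0.999)
print(ok95, crit95)      # adequate at gamma = 0.95
print(ok999, crit999)    # inadequate at gamma = 0.999
```

This mirrors remark (c) of section 2: a t-statistic that comfortably rejects H0: βk = 0 need not establish adequate s/n at a stringent γ.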
5. Relating the test for adequate signal-to-noise to conventional tests of hypothesis

In this section we explore some aspects of the relation between the test for adequate signal-to-noise and the usual test of hypothesis H0: β2 = β2* vs. H1: β2 ≠ β2* (which will be interpreted to subsume tests of significance and confidence intervals).

As typically described in the textbooks, in testing H0: β2 = β2* vs. H1: β2 ≠ β2*, a test statistic is examined. If it falls in an acceptance region, H0 is accepted; otherwise H0 is rejected and H1 accepted. In practice, however, tests are not so straightforwardly conducted. Often the acceptance of H0 has deleterious repercussions as, for example, when a variable thought to be important is found to be insignificant. In such circumstances few practitioners would unquestioningly accept H0: β2 = 0 and toss X2 from the model, for the costs of such an action can be great. Economically, it has the cost of removing a theoretically important driving force from the model. Econometrically, it weighs the usually small gains that accrue to one added degree of freedom against introducing substantial misspecification bias. Rather, the practitioner would typically realize that
data problems can greatly inflate the acceptance region, rendering standard tests unpowerful and substantially broadening interval estimates.

The preceding highlights an interesting practical aspect of the use of conventional tests and the interpretation given their test statistics. Since data problems are known to cause inflated parameter variances and a loss of power, their suspected presence can raise suspicion about the quality of the estimated regression results and any tests based on them. But, in the absence of anything better, low values for the t-statistic are often used, somewhat informally and certainly circularly, as diagnostic evidence for the presence of such data problems, particularly when an important variate has an insignificant parameter estimate. Of course, conventional t-tests are not designed to serve such heavy duty, both providing a test of hypothesis on coefficient values and diagnosing problems due to data weaknesses. Whereas low t's may indicate weaknesses, high t's need not indicate their absence.

To serve this diagnostic need, a test for a loss of power or inflated variances is required. This task might be accomplished through the use of the power function, but this requires knowledge of β2 and V(b2). However, in testing H0 against H1 when H1 is true, ceteris paribus, a test based on the OLS estimator b2 of β2 will have greater power the greater is τ², the s/n of b2 relative to β2*.¹⁴ This suggests the relevance of assessing the magnitude of s/n as a diagnostic for the presence of inflated variances and data problems,¹⁵ and this is precisely what the test for adequate s/n is designed to achieve.

The test for adequate s/n, then, does not substitute for, nor is it intended to substitute for, conventional testing procedures. Rather, it complements these procedures, indicating when data problems may be causing inflated variances.

6. Diagnosing harmful data problems (collinearity and short data)
We have seen that a low value of s/n, τ², can indicate inflated variances which reduce the power of tests and render interval estimates broader than may be desirable. We have also indicated that many practitioners, when confronted with such symptoms, like 'low t's', often consider them as informal evidence of the presence of data problems. Here we show how this process can be given a proper basis through the test for adequate s/n. Specifically, given the unalterable structural parameters β2 and σ² and the presumption that H0: β2 = β2* is not true, τ² is small precisely when there are data problems either in the form of collinearity or of short data, a term we define as we proceed. Thus, under these circumstances, inadequate s/n directly reflects data weaknesses, and a test for inadequate s/n is diagnostic of these problems.

¹⁴To see this for the case p2 = 1, see Wilks (1962, pp. 396-397).

¹⁵In a different context, a related use of signal-to-noise, τ², is suggested by Anderson (1958, p. 115), who writes: 'Emma Lehmer (1944) has computed tables of φ [a transformation of τ²] for given significance level and given probability of a Type II error. Her tables can be used to see what value of τ² is needed to make the probability of acceptance of the null hypothesis significantly low when μ ≠ 0. For instance, if we want to be able to reject the hypothesis μ = 0 on the basis of a sample for a given μ and Σ, we may be able to choose N so that Nμ'Σ⁻¹μ = τ² is sufficiently large. Of course, the difficulty with these considerations is that we usually do not know exactly the values of μ and Σ (hence, τ²) for which we want the probability of rejecting a certain value.' Our use of the concept of adequate signal-to-noise as measured by the probabilistic distance given in section 4.1 is designed to circumvent this last-mentioned practical shortcoming.

6.1. The four causes of low signal-to-noise
We rewrite (7), the expression for τ², the signal-to-noise of the least-squares estimator b₂ of β₂ relative to β₂*, in a manner that shows how four elements come together to determine its value. We begin by repeating eq. (7):

τ² = (β₂ − β₂*)ᵀV⁻¹(b₂)(β₂ − β₂*),

and reformulate V⁻¹(b₂) as follows:

σ²V⁻¹(b₂) = X₂ᵀM₁X₂,   where M₁ ≡ I − X₁(X₁ᵀX₁)⁻¹X₁ᵀ,
          = X₂ᵀX₂ − X₂ᵀX̂₂,   where X̂₂ ≡ X₁(X₁ᵀX₁)⁻¹X₁ᵀX₂,
          = X₂ᵀX₂[I − (X₂ᵀX₂)⁻¹X̂₂ᵀX̂₂],   where we assume (X₂ᵀX₂)⁻¹ exists,
          = X₂ᵀX₂(I − A),   where A ≡ (X₂ᵀX₂)⁻¹X̂₂ᵀX̂₂.    (21)

The matrix A in (21) generalizes the (uncentered) multiple correlation coefficient. Indeed, if X₂ were a single column (p₂ = 1), then A is simply R₀² = X̂₂ᵀX̂₂/X₂ᵀX₂, the uncentered R² from a regression of X₂ on the remaining columns of X. More generally, if, at the one extreme, the columns of X₂ are orthogonal to those of X₁, then X̂₂ = 0 and A = 0; while if, at the other, the columns of X₂ are perfectly related to those of X₁, X̂₂ = X₂ and A = I. The greater the collinearity, then, the 'closer' is A to I.¹⁶ Now (7) becomes

τ² = (β₂ − β₂*)ᵀX₂ᵀX₂(I − A)(β₂ − β₂*)/σ².    (22)

¹⁶Several scalar measures of closeness could be used here: det(A), for example, or, better yet, ‖A‖, the spectral norm of A; see Wilkinson (1965), Stewart (1973), or BKW (1980).
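The equivalence between the direct form of τ² based on X₂ᵀM₁X₂ and the factored form X₂ᵀX₂(I − A) can be checked numerically. The following sketch uses made-up X₁, X₂, β₂ and σ² (all illustrative assumptions, not data from the paper):

```python
import numpy as np

# Illustrative data: X2 is made strongly collinear with X1 (all values assumed)
rng = np.random.default_rng(0)
n, p1, p2 = 50, 2, 2
X1 = rng.normal(size=(n, p1))
X2 = 0.9 * X1 @ rng.normal(size=(p1, p2)) + 0.1 * rng.normal(size=(n, p2))
beta2, beta2_star, sigma2 = np.array([1.0, -0.5]), np.zeros(p2), 1.0

# sigma^2 V^{-1}(b2) = X2' M1 X2, with M1 the projection off the column space of X1
M1 = np.eye(n) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)
tau2_direct = (beta2 - beta2_star) @ (X2.T @ M1 @ X2) @ (beta2 - beta2_star) / sigma2

# The same quantity via A = (X2'X2)^{-1} X2hat' X2hat, as in (21)
X2hat = X1 @ np.linalg.solve(X1.T @ X1, X1.T @ X2)
A = np.linalg.solve(X2.T @ X2, X2hat.T @ X2hat)
tau2_A = (beta2 - beta2_star) @ (X2.T @ X2) @ (np.eye(p2) - A) @ (beta2 - beta2_star) / sigma2

assert np.isclose(tau2_direct, tau2_A)
```

The two forms agree because X₂ᵀX₂A = X̂₂ᵀX̂₂ = X₂ᵀX̂₂, the part of X₂ᵀX₂ explained by X₁.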
Here we see how four separable elements determine a low value for τ². First, the closer is β₂* to β₂, ceteris paribus, the smaller τ². Second, the greater is the inherent noise, σ², ceteris paribus, the lower τ². Third, the greater the collinearity,¹⁷ ceteris paribus, the closer A tends towards I and the smaller becomes τ². Fourth, the shorter is the 'length' of X₂ (i.e., X₂ᵀX₂ as measured, for example, by the spectral norm), ceteris paribus, the lower τ² tends to be. However, simply rescaling X₂ (changing its units of measurement) to make it long cannot increase τ², since rescaling X₂ requires an inverse scaling for β₂, leaving τ² unchanged and violating the ceteris paribus condition. Indeed, we note that τ² has the desirable property of being invariant to linear transformations of X.

Now, the first two determinants of τ² deal with the unknown but unalterable structural parameters β₂ and σ², and the hypothesized value for β₂*, all of which are fixed for a given situation. If, then, H₀ is assumed untrue, τ² will be small precisely as there is a short set of explanatory variables X₂ and/or a strong degree of collinearity.

6.2. Determining harmful collinearity and short data
In light of the preceding discussion, it is natural to designate collinearity as being harmful when it can be simultaneously demonstrated that collinearity exists and inadequate signal-to-noise exists relative to some or all of the parameter estimates of the affected variates. The matrix A, introduced above, might seem to provide a reasonable basis for defining the presence of collinearity. Correlation-like measures, however, have several serious faults in this context. First, while they are near unity when there is very strong collinearity, and near zero when there is near orthogonality, no value has been determined which provides a meaningful dividing line between the presence and absence of collinearity. Second, A can be calculated only after the collinear relations among the columns of X have been discovered, and so it cannot be used to diagnose them. These and other problems, however, are overcome by the collinearity diagnostics described in Belsley, Kuh and Welsch (1980). The collinearity diagnostics from BKW allow one quickly to determine the number of near dependencies among the columns of X as well as the variates involved in each, and hence to partition the model as in (4).¹⁸ It is then straightforward to apply the test of section 4.2 to determine whether inadequate s/n has also resulted.

¹⁷On the meaning of this, see BKW (1980, ch. 3).

¹⁸This partitioning is, of course, not unique since, if there are p₂ relations among the p columns of X, there are many ways p₂ of the columns could be selected to regress on the remaining p₁ = p − p₂. This non-uniqueness is unimportant, however, since no matter how X₂ is picked, the essence of the ensuing argument continues to hold. A particularly effective means for selecting X₂ is described in section 3.4 of BKW (1980).
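The BKW diagnostics just described can be sketched compactly. The function name and the demo data below are illustrative assumptions; the computation (singular values of the column-equilibrated X, condition indexes, and the matrix of variance-decomposition proportions) follows BKW (1980):

```python
import numpy as np

def bkw_diagnostics(X):
    """Sketch of the BKW (1980) diagnostics: columns scaled to unit length
    (no centering), condition indexes from the singular values, and the
    matrix of variance-decomposition proportions."""
    Xs = X / np.linalg.norm(X, axis=0)
    _, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    cond_idx = s.max() / s                      # one condition index per singular value
    phi = (Vt.T / s) ** 2                       # phi[k, j] = v_kj^2 / mu_j^2
    pi = phi / phi.sum(axis=1, keepdims=True)   # pi[k, j]: share of var(b_k) tied to mu_j
    return cond_idx, pi

# Demo: the third column is a near dependency of the first two
rng = np.random.default_rng(1)
x, z = rng.normal(size=100), rng.normal(size=100)
X = np.column_stack([x, z, x + z + 1e-4 * rng.normal(size=100)])
cond_idx, pi = bkw_diagnostics(X)
```

A near dependency is signalled by a condition index above a threshold such as 30 together with two or more coefficients whose variance-decomposition proportions for that singular value exceed, say, 0.5.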
Fig. 3. The four situations formed by crossing the answers to two questions: Collinearity present? (determined via the diagnostics of BKW) and Inadequate signal-to-noise present? (determined by the test of section 4.2).
A procedure for determining harmful collinearity, then, results from the sequence of (1) the collinearity diagnostics of BKW, followed by (2) a test for adequate s/n for the b₂ that correspond to the X₂ indicated by the collinearity diagnostics. Harmful collinearity will be said to occur when both conditions prevail. More generally, we can examine the four possible outcomes of this sequence of tests given in fig. 3:
(1) In situation 1, everything seems right with the world; neither collinearity nor inadequate signal-to-noise is present. This situation is desirable both for structural estimation and for prediction.

(2) Situation 2 depicts non-harmful collinearity; that is, collinearity is present, but has not resulted in inadequate s/n. This is not to say that, ceteris paribus, better conditioned data wouldn't be nice to have (for this is always the case), but rather that the ill effects of collinearity have been mitigated by relatively small σ² and/or long X₂. This situation augurs well for the use of the model for prediction purposes, particularly, but not necessarily only, if the collinear relations continue into the prediction period.

(3) Situation 3 depicts the case where there is no collinearity in X, but inadequate signal-to-noise is nevertheless present. For the given β₂ ≠ β₂* and σ², this situation arises when the 'length' of X₂ is short. Thus, while data problems exist, collinearity is not the culprit, and we use this situation to define the presence of short data. It is of interest to note that, should inadequate s/n be shown to exist for the full set b relative to β* = 0 in this situation, the quality of information derivable from OLS would be poor not only for the estimation of individual parameters but also for estimates of linear combinations of the parameters (since there are no strong near dependencies among the columns of X). This latter set of circumstances, then, would be bad for either structural estimation or prediction.
(4) Situation 4, as already noted, defines harmful collinearity: the joint occurrence of collinearity and inadequate signal-to-noise. This situation is rather generally harmful to structural estimation, but not necessarily to prediction if the collinear relations extend into the prediction period.¹⁹ That is, inadequate signal-to-noise may occur relative to the subset b₂ but not to the full set of estimators b.
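The two-way classification of fig. 3 can be written out directly; the labels below are paraphrases of the four situations just described:

```python
def classify(collinear: bool, inadequate_sn: bool) -> str:
    """Map the outcomes of the two diagnostics onto the four situations of fig. 3."""
    if collinear and inadequate_sn:
        return "harmful collinearity"        # situation 4
    if collinear:
        return "non-harmful collinearity"    # situation 2
    if inadequate_sn:
        return "short data"                  # situation 3
    return "no data weakness indicated"      # situation 1
```

Harmful collinearity is thus diagnosed only when both component tests signal a problem; either symptom alone is classified differently.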
7. An example

We illustrate the foregoing techniques with the commercial paper rate equation from the Michigan Quarterly Econometric Model (MQEM) for the U.S. economy.²⁰ This equation has the form

CPR(T) = β₁ + β₂TB(T) + β₃TB(T−1) + β₄S1(T) + β₅S2(T) + β₆S3(T)
         + β₇D(T) + β₈INF(T) + β₉CPR(T−1) + ε,    (23)

where

CPR(T) = the commercial paper rate, percent per annum,
TB(T) = the 90-day treasury bill rate, percent per annum,
Si(T) = a seasonal dummy for quarter i = 1, 2, 3,
D(T) = a dummy for the first three quarters of 1974 [see Gardner and Hymans (1978)],
INF(T) = the rate of inflation, (p_t − p_{t−2})/p_{t−2}.
The data are quarterly, 1955.I to 1975.IV, giving 84 observations and 75 degrees of freedom. Although TB(T) and INF(T) are endogenous, this equation is nevertheless estimated in the MQEM by OLS. We also treat all the right-hand variables as predetermined. Estimation of (23) by OLS gives (standard errors in parentheses)

CPR(T) = 5.589 + 0.985 TB(T) − 0.395 TB(T−1) − 0.053 S1(T) + 0.082 S2(T)
        (1.325)  (0.041)        (0.095)         (0.037)        (0.037)

         + 0.057 S3(T) + 1.680 D(T) − 5.489 INF(T) + 0.511 CPR(T−1),    (24)
           (0.038)       (0.153)      (1.340)        (0.066)

SER = 0.189,  R² = 0.9926,  D.W. = 1.276,  κ(X) = 206.
¹⁹For an example employing these diagnostics to improve prediction when harmful collinearity does not extend into the forecast period, see Belsley (1981).

²⁰This model has been made available by Professors Hymans and Howrey to the Model Reliability Project based at MIT as a vehicle for collaborative research among eight universities.
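The quantities reported with (24), the coefficients, their standard errors, and the scaled condition number κ(X), can be computed along the following lines. This is a sketch only: `ols_with_kappa` is a hypothetical helper, and the demo uses synthetic data rather than the MQEM series:

```python
import numpy as np

def ols_with_kappa(X, y):
    """Hypothetical helper: OLS coefficients, standard errors, and the BKW
    scaled condition number kappa(X) of the kind reported with (24)."""
    n, p = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    s2 = e @ e / (n - p)                                   # s^2 = e'e/(n - p)
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))     # standard errors
    Xs = X / np.linalg.norm(X, axis=0)                     # unit column lengths, no centering
    sv = np.linalg.svd(Xs, compute_uv=False)
    return b, se, sv.max() / sv.min()                      # kappa(X)

# Illustrative synthetic data (not the MQEM series)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])
y = X @ np.array([1.0, 2.0, -3.0]) + 0.1 * rng.normal(size=200)
b, se, kappa = ols_with_kappa(X, y)
```

Note that κ(X) is computed from the column-equilibrated data, which is why the text calls 206 a "scaled" condition number.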
Following the procedure of section 6, we first apply the collinearity diagnostics of BKW to determine the linear near dependencies in the matrix of predetermined variables, and then test for inadequate signal-to-noise. A brief summary of the collinearity diagnostics of BKW is presented in appendix B.

7.1. The collinearity diagnostics

The large (scaled) condition number of 206 assures us that at least one strong linear near dependency exists among the columns of X. The number of near dependencies and the variates involved in each are revealed from the Π matrix of variance-decomposition proportions given in table 1. For this analysis we adopt a condition-index threshold of 30 and a variance-decomposition proportion threshold of 0.5. Two near dependencies have condition indexes (206 and 58) which exceed the threshold of 30. The first is dominant and certainly contains the variates CONST and INF(T). The second contains TB(T−1) and CPR(T−1), and possibly CONST and INF(T). The exact composition of these two near dependencies is made clear through the auxiliary regressions in table 2, obtained when CONST and TB(T−1) are regressed on the remaining variates. These results confirm a strong linear near dependency between CONST and INF(T), which also includes CPR(T−1) at a moderate level and D(T) weakly.²¹ The second near dependency involves TB(T−1) and CPR(T−1) strongly, TB(T) moderately, and has weak, indeed inconsequential, involvement of S1(T) and S3(T). Thus, the five variates CONST, TB(T), TB(T−1), INF(T) and CPR(T−1) are substantially involved in one or more linear near dependencies, while the variates S1(T), S2(T), S3(T) and D(T) are essentially free from such involvement.
7.2. Inadequate signal-to-noise

Unless some specific β* is indicated, s/n will most frequently be examined relative to the origin, β* = 0. We begin by examining each coefficient individually (p₂ = 1) and turn, when indicated, to joint tests of interest. Choices must be made for the test size α and for the parameter γ determining the level of adequacy. For purposes of illustration, we choose α = 0.05 and examine two different choices for γ of 0.90 and 0.999. A γ of 0.9 is a modest value, illustrating a choice that might be made if the regression analysis were of less importance. A γ of 0.999 is more stringent and illustrates a choice suitable for assessing the quality of parameter estimates that are

²¹Experience has shown that t-values of 3 and 4 are none too large in this purely descriptive use of linear regression.
[Table 1. Variance-decomposition proportions. Table 2. Auxiliary regressions.]
considered important. Here, the test will be more sensitive to data inadequacies and apprise the user more readily of their presence.

Table 3 summarizes the results for all parameter estimates individually (p₂ = 1) relative to the origin, β₂* = 0. We indicate in column 2 the results of the previous collinearity analysis. In column 3 the value calculated for φ̂² from (11) is reported.

Table 3
Summary of adequacy of signal-to-noise and possible data problems for each coefficient of the commercial paper rate equation (MQEM); p₂ = 1, β₂* = 0 and α = 0.05.

                      (2)                          (4) γ = 0.90, F₀.₀₅ = 11.3     (5) γ = 0.999, F₀.₀₅ = 26.0
(1) Variates and      Degraded due to    (3)       Inadequate    Possible data    Inadequate    Possible data
    parameters        collinearityᵃ      φ̂²        s/n?          problemᵇ         s/n?          problemᵇ

CONST    (β₁)         yes                11.9      no            none             yes           HC
TB(T)    (β₂)         yes                566.0     no            none             no            none
TB(T−1)  (β₃)         yes                17.3      no            none             yes           HC
S1(T)    (β₄)         no                 2.0       yes           SD               yes           SD
S2(T)    (β₅)         no                 4.9       yes           SD               yes           SD
S3(T)    (β₆)         no                 2.2       yes           SD               yes           SD
D(T)     (β₇)         no                 119.92    no            none             no            none
INF(T)   (β₈)         yes                16.78     no            none             yes           HC
CPR(T−1) (β₉)         yes                59.71     no            none             no            none

ᵃFrom section 6.2.
ᵇSD = possible short data, HC = harmful collinearity.
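The test applied in table 3 can be sketched as follows. φ̂² is the usual F-form statistic for H₀: β₂ = β₂*, but it is compared with a quantile of the non-central F whose non-centrality τ̄²ᵧ is fixed by the adequacy level γ (the critical values 11.3 and 26.0 above correspond to γ = 0.90 and 0.999 with α = 0.05 and n − p = 75). Since τ̄²ᵧ is derived elsewhere in the paper, the sketch takes it as a given input; with τ̄²ᵧ = 0 the procedure collapses to the conventional F test, which is what the demo line uses:

```python
import numpy as np
from scipy.stats import ncf

def sn_test(b2, cov_b2, tau2_bar, df_denom, beta2_star=None, alpha=0.05):
    """Sketch of the signal-to-noise test for a subset b2 of p2 coefficients.
    cov_b2 is the estimated covariance s^2 (X2' M1 X2)^{-1}; tau2_bar is the
    non-centrality fixed by the adequacy level gamma, taken here as given."""
    b2 = np.asarray(b2, dtype=float)
    p2 = b2.size
    d = b2 - (np.zeros(p2) if beta2_star is None else np.asarray(beta2_star))
    phi2 = d @ np.linalg.solve(np.asarray(cov_b2, dtype=float), d) / p2  # F-form statistic
    crit = ncf.ppf(1 - alpha, p2, df_denom, tau2_bar)  # (1-alpha) noncentral-F quantile
    return phi2, crit, phi2 > crit                     # True: s/n adequate

# Example: beta_2-hat for TB(T) from (24); tau2_bar = 0 gives the conventional F test
phi2, crit, adequate = sn_test([0.985], [[0.041**2]], tau2_bar=0.0, df_denom=75)
```

For a single coefficient (p₂ = 1) with β* = 0, φ̂² reduces to the squared t-ratio, which is how the column (3) entries of table 3 arise.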
7.3. Harmful collinearity

For γ = 0.90, we see from column (4) that each of the collinear variates has a φ̂² exceeding the critical value for F₀.₀₅ of 11.3 (taken from the tables in appendix A). At this level of adequacy, then, there is degrading but not harmful collinearity. For γ = 0.999, however, the critical value for F₀.₀₅ is 26.0, and there is harmful collinearity (HC) evidenced individually for the estimates of β₁(CONST), β₃(TB(T−1)), and β₈(INF(T)). The two other variates showing degradation, β₂(TB(T)) and β₉(CPR(T−1)), show no evidence of harm due to collinearity (relative to the origin). Thus, if the estimates of β₁, β₃ or β₈ are of importance to the analysis, the researcher is apprised that collinear relations have significantly reduced their quality and that efforts for increasing the information set upon which these estimates are based, either
better conditioned data or the introduction of prior information, would be worthwhile. This analysis also helps determine where prior information is most usefully applied. Since CONST and INF(T) are strongly collinear, prior information on β₁(CONST) or β₈(INF(T)) will increase the quality of both of these estimates even if directed at only one. Similarly, since CPR(T−1) is involved in both dependencies, prior information on β₉ can help the quality of the estimates of β₃(TB(T−1)) and β₁(CONST) even if the estimate of β₉ is not directly harmed by the presence of collinearity.

We can also investigate whether the estimates of the parameters of variates involved in collinear relations show jointly inadequate signal-to-noise. With two near dependencies, we take p₂ = 2. If we select X₂ to contain CONST and TB(T−1), as for the auxiliary regressions, the joint test on b₁ and b₃ (relative to the origin) has a φ̂² of 16.04. The critical values from the tables are approximately 8.32 for γ = 0.90 and 16.06 for γ = 0.999. At the weaker γ there is no evidence of jointly harmful collinearity, while at the stronger γ, there is. In this latter case, not only are the individual estimates of β₁ and β₃ of poor quality, but so also is the estimate of their joint influence.

This need not hold for all such joint tests. If one were interested in the joint adequacy of the estimates of β₁(CONST) and β₉(CPR(T−1)), the φ̂² for this joint test is 31.79, which exceeds the 16.06 critical value. In this case collinearity is not harmful (at γ = 0.999) to analysis based on the joint effects of b₁ and b₉, but is harmful to understanding their individual effects.

We can also consider tests relative to β₂*'s other than the origin. In a dynamic model of this sort with a lagged dependent variable, stability requires that |β₉| < 1. Given that the estimate b₉ of β₉ is seriously degraded by collinearity, we might wish to determine whether there is harmful collinearity relative to H₀: β₉ = 1.
If there is not, one would have great faith in the stability of the equation; if there is, one would, for example, be quite hesitant to accept H₀: β₉ = 1 on the basis of these data. The φ̂² relevant to this test is 54.79, greatly in excess of the critical value of 26.0 for γ = 0.999. Thus, collinearity is not harmful to b₉ in this context either.

7.4. Short data

From table 3, we note inadequate s/n for the estimates of the coefficients of the three seasonal dummies S1(T), S2(T) and S3(T), which are, however, effectively free from involvement in collinear relations. A researcher with no strong prior belief in this seasonal effect would easily discount the value of these estimates. Even a test for the joint adequacy of s/n for these three seasonal dummies (p₂ = 3) fails. The relevant φ̂² is 3.71 and
the appropriate critical value is approximately 7.0 for γ = 0.90 and 12.4 for γ = 0.999. In such circumstances one might simply test whether the three seasonal dummies were jointly significantly different from zero by a conventional test of significance, either removing the variates if it fails or simplifying their structure. The 0.05-critical value for this test (from standard F tables) is 2.73. The joint effect is therefore significant, but clearly not well determined and not suitable for assessing important analytical questions.

By way of contrast, a researcher with strong prior beliefs in the relevance of this seasonal effect would be disappointed. Not only are the estimates of low quality, but the data cannot be held responsible. Here, one might question the seasonal specification of (23), but if the linear structure is considered appropriate, only a much larger sample could further reduce these seasonal parameter variances. Short of such riches, the user must depend upon strongly formulated prior information if better quality estimates for β₄, β₅ and β₆ are required.

Finally, we note that the dummy D(T) is not involved in a collinear relation and that its parameter estimate possesses adequate s/n even for γ = 0.999. It must be concluded that the data are thoroughly suitable for providing a high quality estimate of β₇.

7.5. Summary

The data for a commercial paper rate equation are found to possess two strong collinear relations involving five of the nine columns of X. At a moderate value of γ = 0.90, no harmful collinearity is evident. However, at a more stringent choice of γ = 0.999, the estimates of β₁(CONST), β₃(TB(T−1)), and β₈(INF(T)) are found to be harmed by collinearity. The data are unsuited to providing estimates of adequate quality for these parameters. These data are suitable, however, to provide adequate quality estimates for the parameters β₂(TB(T)) and β₉(CPR(T−1)), the two other collinear variates.
For these estimates we have degrading but not harmful collinearity. This analysis also indicates where the introduction of prior information (or better data) would be most beneficial.

The seasonal structure is seen to have inadequate s/n, individually and jointly, even though the seasonal dummies are free from collinearity. Either there is no seasonal effect or its impact is small, unimportant and ill determined. By contrast, the dummy D(T) is not collinear and its parameter estimate possesses adequate s/n, even when γ = 0.999. The data, then, are admirably suited to providing a good quality estimate of β₇. Most of the above information would not be systematically available from the usual test procedures.
Appendix A: Critical values for ₁₋ₐF(p₂, n − p; τ̄²ᵧ)

In order to apply the test for adequate signal-to-noise given in section 4.2, critical values are required for

₁₋ₐF(p₂, n − p; τ̄²ᵧ),    (A.1)

i.e., the (1 − α)-critical value for the non-central F with p₂ and n − p degrees of freedom and non-centrality parameter τ̄²ᵧ. Here we recall:

γ = level chosen to define adequacy, e.g., 0.75, 0.90, 0.999,
α = test size for the test for signal-to-noise, e.g., 0.05 or 0.10,
n = number of observations,
p = number of explanatory (independent) variables (including the constant term),
p₂ = number of explanatory variates in the subset under test.

The following tables, grouped according to increasing values of γ within each α, provide values for (A.1) for γ = 0.75, 0.90, 0.95, 0.99, 0.999, 0.9999, 0.99999, and 0.999999; α = 0.01, 0.05, 0.10, 0.25; p₂ = 1(1)6(2)10; and n − p = 10(1)20(2)30(5)40(10)60, 100, 150, 300, and 1,000. These values were computed using a modified version of the non-central distribution programs given in Bargmann and Ghosh (1964).
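The tables were originally generated with the Bargmann and Ghosh (1964) programs; today the same values can be obtained from any routine for the non-central F distribution. A sketch for one row of such a table, taking the non-centrality τ̄²ᵧ as a given input, might look like:

```python
from scipy.stats import ncf

def critical_row(alpha, tau2_bar, df_denom, p2_list=(1, 2, 3, 4, 5, 6, 8, 10)):
    """One row of a critical-value table like A.1a: the (1 - alpha)-quantile of
    the non-central F(p2, df_denom; tau2_bar) for each numerator d.f. p2.
    tau2_bar, the non-centrality implied by gamma, is an assumed input here."""
    return [float(ncf.ppf(1 - alpha, p2, df_denom, tau2_bar)) for p2 in p2_list]

row = critical_row(0.25, 0.0, 10)  # alpha = 0.25, central case, n - p = 10
```

Looping such a row over the n − p values listed above reproduces the layout of the tables that follow; a larger τ̄²ᵧ shifts every entry upward.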
Table A.la z=o.25, Degrees
E I 5 b ;;i .c E E 4 i 4 8 t % z t F n
of freedom,
numerator
yzo.75
(pi)
1
2
3
4
5
6
8
10
10 Ii 12 13 14 15 16 17 I8 19 20 22 ;:
3.7706 3.7267 3.6343 3.6656 3.6412 3.6204 3.6022 3.5864 9.5723 3.5539 3.5488 3.5236 3.5137
28 30 35 40 50 60 a0 100 150 300 1000
3.5034 3.4890 3.4792 3.4593 3.4453 3.4253 3.4120 3.3955 3.3656 3.3727 3.3589 3.3498
3.8917 3.8381 3.7941 3.7574 3.7263 3.6997 3.6765 3.6563 3.6383 3.6224 3.6082 3.5838 3.5636 3.5466 3.5320 3.5195 3.4947 3.4763 3.4507 3.4337 3.4127 3.4001 3.3842 3.3671 3.3560
3.8077 3.7435 3.7017 3.6618 3.6280 3.5989 3.5737 3.5516 3.5322 3.5148 3.4392 3.4726 3.4505 3.4319 3.4161 3.4024 3.3753 3.3551 3.3270 3.3084 3.2854 3.2716 3 2544 3.2357 3.2231
3.7265 3.6657 3.6158 3.5740 3.5386 3.5082 3.4817 3.4585 3.4380 3.4198 3.4035 3.3754 3.3522 3.3326 3.3159 3.3015 3.2729 3.2516 3.2219 3.2023 3.1779 3.1633 3.1446 3.1250 3.1116
3.6602 3.5977 3.5464 3.5034 3.4668 3.4354 3.4081 3.3843 3.3631 3.3442 3.3273 3.2983 3.2742 3.2539 3.2366 3.2217 3.IYlY 3.1698 3.1389 3.1185 3.0930 3.0778 3.0582 3.0377 3.0236
3.6066 3.5429 3.4905 3.4466 3.4093 3.3771 3.3492 3.3247 3.3030 3.2836 3.2663 3.2365 3.2118 3.1910 3.1732 3.1578 3.lSll 3.1043 3.0725 3.0513 3.0250 3.0092 2.9888 2.9675 2.9529
3.5259 3.4605 3.4067 3.3615 3.3231 3.2899 3.2611 3.2357 3.2132 3.1932 3.1752 3.1443 3.1186 3.0969 3.0783 3.0623 5.3SUJ 3.0063 2.9729 2.9506 2.9229 2.9062 2.8846 2.8619 2.8464
3.4679 3.4015 3.3467 3.3007 3.2615 3.2277 3.1982 3.1722 3.1492 3.1287 3.1103 3.0735 3.0520 3.0297 3.0106 2.9940 _._P.LO"I 2.9361 2.9015 2.8783 2.8494 2.8320 2.8094 2.7057 2.7694
4.3178 4.2398 4.1755 4.1216 4.0758 4.0363 4.0020 3.9718 3.9451 3.9213 3.9000 3.8632 3.6328 3.8071 3.7851 1 ?CCI W.."". 3.7283 3.7000 3.6606 3.6344 3.6017 3.5822 3.5568 3.5303 3.5121
4.1199 4.0423 3.9782 3.9244 3.8786 3.8391 3.8047 3.7744 3.7476 3.7237 3.7022 3.6652 3.6345 3.6085 3.5863 3.5570 3.528; 3.4999 3.4598 3.4330 3.3996 3.3795 3.3534 3.3261 3.3074
3.9858 3.9085 3.0447 3.7910 3.7452 3.7057 3.6712 3.6409 3.6140 3.5900 3.5684 3.5312 3.5002 3.4740 3.4516 3.4122 3.3333 3.3641 3.3233 3.2960 3.2619 3.2413 3.2145 3.1865 3.1673
Table A.lb a=0.25, Degrees
of freedom,
numerator
y=O.90
(pz) 5
2
z
I
E "0 ;;i .E 2 E -0 g 4 8 t c= u
M !t t? n
10 11 12 13 14 1s 16 17 18
6.1402 6.0650 6:0033 5.9518 5.9082 5.6707 5.8382 5.8097 5.7846
:: 22 24
5.7622 5.7422 5.7079 5.6794 5.6555 5.6351 5.6!76 5.5827 5.5567 5.5208 5.4970 5.4674 5.4497 5.4270 5.4026 5.3863
',; 30 35 40 50 60 80 100 150 300 1000
5.3575 5.2790 5.2147 5.1610 5.1154 5.0763 5.0424 5.0!27 4.9865 4.9631 4.9423 4.9064 4.8767 4.8518 4.8305 4.8121 4.7756 4.7485 4.7110 4.6860 4.6551 4.6366 4.6133 4.5882 4.5715
4.9228 4.8441 4.7736 4.7256 4.6798 4.6405 4.6064 4.5765 4.5501 4.5265 4.5055 4.4632 4.4333 4.4141 4.3925 4.3740 4.3371 4.3096 4.2715 4.2462 4.2148 4.1960 4.1721 4.1470 4.1297
4.6491 4.5707 4.5062 4.4522 4.4064 4.3670 4.3327 4.3027 4.2762 4.2526 4.2314 4.1950 4.1648 4.1394 4.1177 4.0990 4.0618 4.0340 3.3354 3.9698 3.9380 3.9190 3.8944 3.8688 3.8511
4.4590 4.3807 4.3163 4.2624 4.2166 4.1771 4.1428 4.1127 4.0861 4.0624 4.0411 4.0046 3.9742 3.9487 3.9268 7 Onon _..___ 3.8704 3.8424 3.8034 3.7775 3.7453 3.7259 3.7008 3.6750 3.6569
234
D.A. Belsley, Assessing
the presence of data weaknesses Table A.lc
a = 0.25, Degrees
of freedom,
1 8.0016 7.8966 7.8105 7.7386 7.6776 7.6253 7.5800 7.5402 7.5052 7.4739 7.4460 7.?981 7.3564
10
E I s 5 2 ._ E e $ G4 8 t % h: E 2 0
11 12 13 14 15 16 I7 18 19 20 22
'2; 28 30 35 40 50 60 80 100 150 300 1000
7.3251 7.2966 7.2721 7.2234 7.1872 7.1370 7.1037 7.0624 7.0378 7.0067 6.9728 6.9503
numerator
2 6.4438 6.3458 6.2653 6.1961 6.1411 6.0922 6.0497 6.0125 5.9796 5.9504 5.9242 5.8792 5.8420 5.8108 5.7840 5.7610 5.7153 5.6812 5.6340 5.6027 5.5639 5.5407 5.5110 5.4799 5.4587
y = 0.95
(pz) 4
5
6
8
10
5.3018 5.2104 5.1353 5.0723 5.0189 4.9729 4.9329 4.8979 4.6669 4.8393 4.8146 4.7720 4.7367 4.7070 4.6816 4.6597 4.6161 4.5836 4.5383 4.5083 4.4709 4.4486 4.4196 4.3896 4.3688
5.0165 4.9269 4.8532 4.7914 4.7389 4.6937 4.6544 4.6198 4.5893 4.5620 4.5376 4.4956 4.4608 4.4314 4.4063 4.3846 4.3413 4.3091 4.2641 4.2342 4.1970 4.1748 4.1457 4.1157 4.0949
4.8092 4.7210 4.6483 4.5874 4.5356 4.4909 4.4520 4.4178 4.3876 4.3606 4.3364 4.2948 4.2602 4.2311 4.2061 4.1845 4.1415 4.1094 4.0645 4.0347 3.9975 3.9751 3.9463 3.9159 3.8952
4.52.11 4.4379 4.3668 4.3071 4.2562 4.2123 4.1740 4.1403 4.1105 4.0839 4.0600 4.0188 3.9845 3.9556 3.9308 3.9094 3.8655 3.8344 3.7895 3.7595 3.7221 3.6997 3.6704 3.6398 3.6188
4.3344 4.2496 4.1796 4.1207 4.0704 4.0270 3.9892 3.9559 3.9263 3.8999 T.8762 3.8352 3.8011 3.7723 3.7476 3.7262 3.6833 3.6511 3.6061 3.5759 3.5382 3.5155 3.4857 3.4546 3.4334
3 5.7252 5.6322 5.5550 5.4904 5.4356 5.3885 5.3476 5.3118 5.2801 5.2519 5.2266 5.1832 5.1472 5.1170 5.0911 5.0688 5.0245 4.9915 4.9456 4.9152 4.8774 4.8548 4.8259 4.7955 4.7747
Table A.ld r=0.25, Degrees
z I 5 "0 2 .s E 8 +i E^ 4 8 * % E E
10 I1 I2 13 I4 I5 16 I7 I8 I9 20 22
;; 28 30 40 35
z: 80 100 150 300 1000
y=O.99
of freedom,
numerator
1
2
3
4
5
6
8
10
12.4157 12.2330 12.0831 11.9578 11.8517 11.7605 11.6815 11.6122 11.5511 Il.4967 Il.4479 Il.3642 11.2950
8.9245 8.7791 8.6597 8.5598 8.4750 8.4022 8.3389 8.2836 8.2346 8.1909 8.1519 8.0291
7.5214 7.3918 7.2851 7.1958 7.1200 7.0547 6.9980 6.9483 6.9043 6.8651 6.8299 6.7696 6.7196
6.7387 6.6160 6.5187 6.4354 6.3646 6.3036 6.2506 6.2040 6.1628 6.1261 6.0931 6.0365 5.9895
7.9824 7.9424 7.9079 7.8394 7.7884 7.7176 7.6705 7.6122 7.5773 7.5323 7.4855 7.4532
6.6774 6.6414 6.6103 6.5484 6.5023 6.4380 6.3954 6.3424 6.3106 6.2693 6.2269 6.1973
5.9499 5.9160 5.8867 5.8284 5.7849 5.7242 5.6839 5.6336 5.6036 5.5647 5.5238 5.4959
6.2303 6.1156 6.0210 5.9417 5.8742 5.8161 5.7654 5.7209 5.6816 5.6464 5.6149 5.5606 5.5155 5.4775 5.4450 5.4169 5.3608 5.3189 5.2604 5.2215 5.1729 5.1438 5.1057 5.0665 5.0392
5.8695 5.7590 5.6679 5.5914 5.5263 5.4701 5.4212 5.3782 5.3401 5.3061 5.2755 5.2229 5.1792 5.1423 5.1108 5.0834 5.0288 4.9880 4.9310 4.8929 4.8455 4.8170 4.7799 4.7412 4.7147
5.3847 5.2801 5.1937 5.1211 5.0592 5.0057 4.9591 4.9181 4.8817 4.8492 4.8200 4.7696 4.7277 4.6973 4.6619 4.6356 4.5830 4.5435 4.4802 4.4513 4.4051 4.3772 4.3408 4.3029 4.2770
5.0693 4.9686 4.8854 4.8153 4.7555 4.7038 4.6587 4.6190 4.5837 4.5522 4.5238 4.4749 4.4341 4.3996 4.3700 4.3443 4.2928 4.2542 4.1999 4.1636 4.1180 4.0905 4.0543 4.0167 3.9908
11.2369 11.1672 11.1443 11.0593 IO.9961 10.9084 10.8501 10.7780 10.7349 10.6810 10.6228 10.5830
8.0647
(pz)
D.A. Belsley, Assessing the presence c$ data weaknesses
235
Table A.le 3 = 0.25. Degrees
E ti .c: E : 4
10 11 12 13 14 15 16 17 16 19 20 22
2f'
28 ;;
8 ck
30 35 40 50 60 80 100 150
^4 I :
% M E F CI
300 1000
of freedom,
numerator
‘; = 0.999
(pJ
1
2
3
4
18.6536 18.5471 18.2952 18.0845 17.9058 17.7522 17.6187 17.5017 17.3987 17.3068 17.2243 17.0827 16.9654 16.8668 16.7825 16.7098 16.5654 16.4579 16.3084 16.2093 16.0862 16.0126 15.9174 15.8195 15.7515
12.4209 12.2049 12.0272 11.8784 11.7520 11.6432 11.5486 11.4657 11.3923 11.3269 11.2683 11.1675 11.0841 11.0137 10.9535 10.9016 10.7982 lb.7212 10.6139 10.5426 10.4539 10.4009 10.3322 10.2606 10.2115
9.9995 9.8185 9.6634 9.5444 9.4382 9.3467 9.2670 9.1970 9.1351 9.0799 9.0304 8.9452 8.8745 8.8149 8.7639 8.7198 8.6320 8.5665 8.4750 8.4142 8.3384 8.2930 8.2336 8.1726 8.1302
6.6926 8.5309 6.3974 6.2855 8.1902 6.1081 8.0365 7.9737 7.9180 7.8683 7.8237 7.7469 7.6832 7.6294 7.5833 7.5435 7.4640 7.4045 7.3215 7.2662 7.1973 7.1559 7.1020 7.0460 7.0076
7.8619 7.7124 7.5891 7.4855 7.3973 7.3212 7.2548 7.1965 7.1448 7.0986 7.0572 6.9858 6.9264 6.8762 6.8333 6.7962 6.7219 6.6663 6.5886 6.5367 6.4720 6.4331 6.3821 6.3294 6.2932
7.2813 7.1406 7.0243 6.9266 6.8433 6.7714 6.7087 6.6536 6.6047 6.5610 6.5218 6.4541 6.3978 6.3502 6.3094 6.2741 6.2035 6.1505 6.0764 6.0269 5.9650 5.9278 5.8790 5.8283 5.7936
6.5141 6.3850 6.2782 6.1883 6.1117 6.0455 5.9876 5.9367 5.8915 5.8511 5.8147 5.7520 5.6997 5.6555 5.6176 5.5847 5.5168 5.4693 5.3998 5.3533 5.2950 5.2596 5.2136 5.1654 5.1323
6.0231 5.9016 5.8009 5.7162 5.6436 5.5812 5.5265 5.4783 5.4355 5.3972 5.3627 5.3031 5.2535 5.2114 5.1752 5.1439 5.0810 5.0336 4.9670 4.9223 4.8661 4.8321 4.7871 4.7406 4.7084
Table A.lf 1=0.25, Degrees
10 11 12 13 14 15 16 17 18 19 20 22 24 26 28 30 35 40 50 60 80 100
150 300 1000
of freedom,
numerator
y=O.9999 (pz)
1
2
3
4
5
6
8
10
25.3756 24.9361 24.5745 24.2717 24.0146 23.7933 23.6009 23.4321 23.2829 23.1499 23.C307 22.8257 22.6560 22.5130 22.3907 22.2851 22.0750 21.9184 21.7003 21.5554 21.3754 21.2676 21.1275 20.9825 20.8823
15.R874 15.5987 15.3609 15.1615 14.9920 14.8459 14.7187 14.6071 14.5063 14.4202 14.3411 14.2050 14.0921 13.9970 13.9155 13.8451 13.7049 !3.6000 13.4539 13.3567 13.2356 13.1630 13.0689 12.9704 12.9032
12.4224 12.1898 11.9980 11.8370 11.7000 11.5818 11.4789 1.3884 1 1.3083 1 1.2367 1.1725 1.0620 1 0.9701 0.8926 0.8262 10.7686 10.6542 10.5685 10.4487 10.3689 10.2694 10.2096 10.1313 10.0507 9.9952
10.5834 10.3809 10.2138 10.0733 9.9537 9.8505 9.7605 9.6814 9.6112 9.5486 9.4923 9.3954 9.3147 9.2466 9.1883 9.1378 9.0368 8.9612 8.8554 8.7848 8.6966 8.6436 8.5742 8.5022 8.4530
9.4278 9.2444 9.0929 8.9654 8.8569 8.7631 8.6813 8.6093 8.5455 8.4884 8.4372 8.3488 8.2752 8.2130 8.1597 8.1135 8.0212 7.9518 7.8548 7.7899 7.7088 7.6599 7.5958 7.5293 7.4837
8.6271 8.4570 8.3164 8.1981 8.0972 8.0100 7.9339 7.8669 7.8074 7.7543 7.7065 7.6241 7.5554 7.4973 7.4475 7.4043 7.3179 7.2530 7.1618 7.1009 7.0246 6.9785 6.9181 6.8552 6.8120
7.5790 7.4265 7.3003 7.1939 7.1032 7.0247
6.9150 6.7737 6.6567 6.5580 6.4738 6.4008 8.3370 6.2608 6.2308 A.1861 6.1458 6.0761 6.0180 5.9687 5.9263 5.8895 5.8156 5.7599 5.6814 5.6286 5.5621 5.5219 5.4683 5.4131 5.3746
6:9561 6.8958 6.8420 6.7940
6.750i 6.6762 6.6139 6.5612 6.5160 6.4767 6.3979 6.3387 6.25'>3 6.1994 6.1292 6.0868 6.0306 5.9725 5.9323
D.A. Belsley, Assessing the presence of data weaknesses
236
Table A.lg (x= 0.25, Degrees
^4 I Fr: 2 z .E E 2 4
10 11 12 13 14 15 16 17 18 19 20 22
@ 4 B k
24 26 28 30 35 "0 40 E 50 ix? 60 60 100 150 300 1000
of freedom,
numerator
y =0.99999
(p2)
1
2
3
4
5
31.9510 31.3730 30.8970 30.4977 30.1565 29.6661 29.6117 29.3683 29.1906 29.0143 28.8562 28.5841 28.3582 28.1660 28.0053 27.8644 27.5840 27.3747 27.0826 26.6884 26.6465 26.5016 26.3123 26.1172 25.9823
19.3383 18.9756 18.6765 18.4254 16.2117 16.0274 17.8666 17.7257 17.6007 17.4892 17.3890 17.2165 17.0732 16.9522 16.8487 16.7591 16.5802 16.4464 16.2595 16.1350 15.9796 15.8863 15.7647 15.6363 15.5518
14.6137 14.5293 14.2945 14.0972 13.9292 13.7841 13.6576 13.5464 13.4478 13.3597 13.2806 13.1442 13.0308 12.9350 12.8529 12.7818 12.6398 12.5333 12.3844 12.2850 12.1607 12.0860 11.9881 11.8867 11.8173
12.4372 12.1943 11.9935 11.8247 11.6808 11.5565 11.4481 11.3526 11.2680 11.1923 11.1243 11.0071 10.9094 10.8269 10.7562 10.6948 10.5722 10.4802 10.3511 lOi26.49 10.1569 10.0919 10.0056 9.9180 9.8573
10.9548 10.7379 10.5585 10.4076 10.2768 10.1676 10.0704 9.9849 9.9090 9.8412 9.7801 9.6748 9.5870 9.5128 9.4492 9.3939 9.2834 9.2003 9.0837 9.0057 8.9078 8.8488 8.7714 8.6906 8.6353
6
8
I0
9.9336 9.7347 9.5701 9.4316 9.3133 9.2111 9.1218 9.0431 8.9733 8.9108 8.8546 8.7576 8.6766 8.6081 8.5493 8.4983 8.3961 8.3192 8.2111 8.1387 8.0477 7.9928 7.9202 7.8452 7.7933
8.6053 6.4300 8.2849 8.1625 6.0580 7.9676 7.8805 7.6188 7.7569 7.7015 7.6516 7.5653 7.4933 7.4322 7.3798 7.3343 7.2428 7.1740 7.0769 7.0117 6.9296 6.8800 6.8139 6.7450 6.6984
7.7695 7.6093 7.4764 7.3644 7.2686 7.1856 7.1131 7.0490 6.9921 6.9411 6.8952 6.8158 6.7494 6.6931 6.6446 6.6025 6.5179 6.4540 6.3639 6.3032 6.2265 6.1602 6.1160 6.0541 6.0094
8
10
9.6054 9.4079 9.2442 9.1061 8.9882 8.8860 8.7966 6.7178 8.6478 6.5850 8.5265 8.4309 8.3491 8.2799 8.2203 6.1Ge6 8.0646 7.9862 7.8755 7.8010 7.7071 7.6503 7.5741 7.4960 7.4413
8.5988 6.4200 8.2717 6.1466 6.0396 7.9466 7.8657 7.7940 i.7303 7.6733 7.6216 7.5328 7.4583 7.3951 7.3407 7.293a 7.1982 7.1263 7.0246 6.9561 6.8694 6.8170 6.7464 6.6737 6.6226
Table A.lh 2 = 0.25, Degrees
of freedom,
numerator
y = 0.999999
(pz) 5
28 g,
38.5356 37.8161 37.2228 36.7247 36.3010 35.9356 35.6173 35.3375 35.0898 34.8688 34.6703 34.3265 34.0446 33.ao50
30 35 40 50 EO 60 100 150 300 1000
33.6001 33.4217 33.0684 32.8038 32.4340 32.1876 31.8804 31.6959 31.4545 31.2054 31.0345
10
5I
11 12 13 14
8 ;i
15 16 17
s
d E2 8 r= 6 z z
20 22
22.7763 22.3389 21.9777 21.6742 21.4157 21.1925 20.9979 20.8268 20.6751 20.5396 20.4179 20.2081 20.0336 19.8862 19.7599 19.6506 19.4320 19.2681 19.0389 16.8860 18.6948 18.5798 18.4291 18.2731 18.1663
17.1844 16.8481 16.5701 16.3363 16.1371 15.9649 15.8147 15.6825 15.5652 15.4604 15.3662 15.2037 15.0694 14.9540 14.8559 !a.7709 14.6009 14.4733 14.2943 14.1747 14.0249 13.9347 13.8164 13.6934 13.6091
14.2666 13.9834 13.7491 13.5520 13.3838 13.2384 13.1115 12.9097 12.9006 12.8119 12.7321 12.5944 12.4797 12.3626 12.2993 12.2271 12.0824 11.9738 11.8210 11.7189 11.5907 11.5135 11.4115 11.3060 11.2332
12.4559 12.2058 11.9988 11.6245 11.6758 11.5471 11.4347 11.3357 11.2478 11.1692 11.0984 10.9762 10.8742 10.7879 10.7138 *0.-95 10.5207 10.4238 10.2874 10.1960 10.0813 10.0119 9.9206 9.8255 9.7600
11.2137 10.9864 10.7982 10.6397 10.5043 10.3871 10.2848 10.1945 10.1144 10.0426 9.9781 9.8665 9.7734 9.6945 9.6267 9.ECT'O 9.4499 9.3610 9.2359 9.1519 9.0462 8.9624 8.8975 6.8100 8.7493
D.A. Belsley, Assessing the presence of data weaknesses
237
Table A.2a
α = 0.10, γ = 0.75. Degrees of freedom, numerator (p₂): columns 1, 2, 3, 4, 5, 6, 8, 10. Degrees of freedom, denominator (rows): 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 24, 26, 28, 30, 35, 40, 50, 60, 80, 100, 150, 300, 1000. Each line of figures below is one column of the table, read down the denominator entries.
7.3402 7.1331 7.0734 6.9743 6.8908 6.8195 6.7580 6.7043 6.6571 6.6752 6.5779 6.5140 6.4613 6.4172 6.3797 6.3474 6.2636 6.2364 6.1713 6.1282 6.0749 6.0433 6.0020 5.9577 5.9265
6.5638 6.4076 6.2762 6.1674 6.0758 5.9977 5.9303 5.E716 5.0!59 5.774: 5.7333 5.66?6 5.6062 5.5581 5.5172 5.cc22 5.4127 5.3613 5.2904 5.2436 5.1858 5.1516 5.1084 5.C619 5.0316
6.0599 5.9337 5.7938 5.6871 5.5932 5.5131 5.4439 5 3637 5.3307 5.2639 5.2419 5.liO3 5.1114 5.0620 5.0201 4.9840 4.9127 4.8EOO 4.7870 4.7390 4.6796 4.6443 4.5995 4.5524 4.5210
5.7996 5.6314 5.4948 5.3817 5.2865 5.2053 5.1352 5.0740 5.c203 4.9726 4.9501 4.8574 4.7975 4.7474 4.7048 4.6681 4.5956 4.5418 4.4675 4.4185 4.3579 4.3219 4.2750 4.2277 4.1946
5.5906 5.4211 5.2835 5.1694 5.0734 4.9914 4.9206 4.e569 4.@046 4.7564 4.7134 4.6398 4.5793 4.5265 4.4e53 4.4462 4.3746 4.3201 4.2446 4.1948 4.1331 4.0964 4.0467 3.9998 3.9666
5.4360 5.2656 5.1272 5.0125 4.9158 4.6333 4.7619 4.6997 4.6449 4.5963 4.5529 4.4786 4.4174 4.3661 4.3225 4.2849 4.2105 4.1552 4.0787 4.0281 3.9655 3.9282 3.6798 3.8298 3.7956
5.2207 5.0492 4.9098 4.7940 4.6565 4.6130 4.5409 4.4779 4.4224 4.3732 4.3291 4.2537 4.1915 4.1393 4.0949 4.0566 3.9CCG 3.9241 3.8457 3.7936 3.7255 3.6911 3.6411 3.5897 3.5542
5.0764 4.9043 4.7642 4.6476 4.5496 4.4656 4.3928 4.3293 4.2733 4.2235 4.1790 4.1027 4.0397 3.9668 3.9417 3.9028 3.3255 3.7679 3.6679 3.6349 3.5690 3.5295 3.4780 3.4250 3.3869
6.7266 6.5186 6.3479 6.2064 6.0872 5.9854 5.6975 5.8208 5.7532 5..6933 5.6398 5.5482 5.4728 5.4095 5.3557 5.3094 5.2176 5.1495 5.0551 4.9926 4.9156 4.8696 4.6098 4.7485 4.7061
6.4453 6.2384 6.0701 5.9306 5.8130 5.7125 5.6256 5.5497 5.4830 5.4237 5.3707 5.2801 5.2053 5.1427 5.0893 5.0434 4.9522 4.8845 4.7907 4.7266 4.6517 4.6058 4.5465 4.4847 4.4426
6.0595 5.8569 5.6970 5.5550 5.4395 5.3408 5.2553 5.1806 5.1146 5.0564 5.0041 4.9146 4.8407 4.7757 4.7256 4.6803 4.5696 4.5224 4.4289 4.3669 4.2900 4.2440 4.1842 4.1224 4.0802
5.8054 5.6058 5.4431 5.3080 5.1939 5.0963 5.0117 4.9378 4.8726 4.8147 4.7628 4.6740 4.6005 4.5388 4.4862 4.4408 4.3505 4.2832 4.1895 4.1274 4.0500 4.0037 3.9429 3.8807 3.8381
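Throughout this appendix the tables are laid out with the numerator degrees of freedom (p₂) across the columns and the denominator degrees of freedom down the rows. For a denominator value that falls between tabulated rows (say 45, between the 40 and 50 rows), linear interpolation gives a serviceable approximation. The sketch below assumes one table column has been keyed in by hand as a mapping from denominator df to critical value; the numeric entries are illustrative placeholders, not verified transcriptions of any table above.

```python
import bisect

# Hypothetical excerpt of one column of a critical-value table:
# {denominator df: critical value}. Values are illustrative
# placeholders, not verified transcriptions from the appendix.
TABLE = {30: 6.34, 35: 6.28, 40: 6.23, 50: 6.17, 60: 6.12}

def critical_value(df_denom, table=TABLE):
    """Linearly interpolate a critical value for a denominator df
    that falls between tabulated rows; clamp outside the range."""
    dfs = sorted(table)
    if df_denom <= dfs[0]:
        return table[dfs[0]]
    if df_denom >= dfs[-1]:
        return table[dfs[-1]]
    i = bisect.bisect_left(dfs, df_denom)
    lo, hi = dfs[i - 1], dfs[i]
    frac = (df_denom - lo) / (hi - lo)
    return table[lo] + frac * (table[hi] - table[lo])

print(critical_value(45))  # midway between the df = 40 and df = 50 rows
```

For F-type tables, interpolating in 1/df rather than df is often preferred, since the critical values flatten out as the degrees of freedom grow; the linear version is shown only for simplicity.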
Table A.2b
α = 0.10, γ = 0.90. Degrees of freedom, numerator (p₂): columns 1, 2, 3, 4, 5, 6, 8, 10. Degrees of freedom, denominator (rows): 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 24, 26, 28, 30, 35, 40, 50, 60, 80, 100, 150, 300, 1000. Each line of figures below is one column of the table, read down the denominator entries.
10.9327 10.6866 10.4867 10.3212 10.1818 10.0629 9.9602 9.8707 9.7921 9.7222 9.6599 9.5534 9.4657 9.3924 9.3301 9.2764 9.1702 9.0916 8.9831 8.9114 8.8229 0.7705 8.7037 8.6313 8.5835
6.7162 8.4865 8.3001 8.1457 8.0159 7.9052 7.8096 7.7263 7.6531 7.5862 7.5303 7.4313 7.3498 7.2816 7.2237 7.1738 7.07;3 7.0022 6.9014 6.13350 6.7529 6.7041 6.6419 6.5766 6.5330
7.7099 7.4898 7.3111 7.1631 7.0386 6.9323 6.8406 6.7606 6.6902 6.6279 6.5723 6.4772 6.3989 6.3333 6.2775 6.2296 6.1347 6.0644 5.9673 5.9032 5.8239 5.7768 5.7159 5.6536 5.6110
7.1211 6.9070 6.7330 6.5889 6.4676 6.3640 6.2745 6.1965 6.1279 6.0670 6.0127 5.9197 5.8433 5.7791 5.7246 5.6776 5.5047 5.5158 5.4205 5.3575 5.2797 5.2334 5.1731 5.1116 5.0694
Table A.2c
α = 0.10, γ = 0.95. Degrees of freedom, numerator (p₂): columns 1, 2, 3, 4, 5, 6, 8, 10. Degrees of freedom, denominator (rows): 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 24, 26, 28, 30, 35, 40, 50, 60, 80, 100, 150, 300, 1000. Each line of figures below is one column of the table, read down the denominator entries.
13.6654 13.3368 13.0700 12.8490 12.6631 12.5045 12.3676 12.2462 12.1433 12.0503 11.9672 ll.P?53 11.7065 11.6106 11.5276 11.4561 11.3148 11.2101 11.0656 10.9702 10.8524 10.7624 10.6948 10.6011 10.5356
10.2821 10.0005 9.7719 9.5827 9.4234 9.2876 9.1703 9.0661 6.9763 6.8966 6.6275 6.7059 6.6059 6.5222 6.4510 6.3898 8.2666 6.1769 6.0549 7.9732 7.0722 7.8122 7.7346 7.6553 7.6002
8.8579 8.5979 6.3667 8.2118 8.0647 7.9390 7.6305 7.7359 7.6527 7.5789 7.5130 7.4004 7.3077 7.2301 7.1640 7.1072 6.9947 6.9114 6.7960 6.7199 6.6259 6.5699 6.4969 6.4233 6.3722
6.0466 7.6014 7.6004 7.4339 7.2937 7.1739 7.0705 6.9602 6.9008 6.8304 6.7674 6.6598 6.5712 6.4969 6.4337 6.3792 6.2715 6.1915 6.0806 6.0077 5.9172 5.8633 5.7933 5.7212 5.6719
7.5180 7.2793 7.0651 6.9242 6.7866 6.6727 6.5725 6.4851 6.4062 6.3399 6.2786 6.1744 6.0684 6.0162 5.9548 5.9016 5.7970 5.7191 5.6111 5.5399 5.4515 5.3986 5.3300 5.2598 5.2110
7.1392 6.9067 6.7175 6.5605 6.4262 6.3151 6.2173 6.1319 6.0566 5.9699 5.9302 5.6280 5.7437 5.6730 5.6128 5.5609 5.4579 5.3814 5.2753 5.2051 5.1180 5.0660 4.9965 4.9290 4.8810
6.6261 6.4041 6.2217 6.0702 5.9424 5.6330 5.7384 5.6557 5.5627 5.5180 5.4600 5.3606 5.2768 5.2099 5.1513 5.1007 5.0001 4.9252 4.8211 4.7522 4.6665 4.6152 4.5482 4.4792 4.4323
6.2946 6.0763 5.6984 5.7505 5.6257 5.5186 5.4262 5.3452 5.2730 5.2103 5.1535 5.0561 4.9755 4.9076 4.6500 4.8001 4.7009 4.6270 4.5239 4.4555 4.3703 4.3192 4.2522 4.1633 4.1363
Table A.2d
α = 0.10, γ = 0.99. Degrees of freedom, numerator (p₂): columns 1, 2, 3, 4, 5, 6, 8, 10. Degrees of freedom, denominator (rows): 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 24, 26, 28, 30, 35, 40, 50, 60, 80, 100, 150, 300, 1000. Each line of figures below is one column of the table, read down the denominator entries.
20.0245 19.4893 19.0547 18.6949 18.3921 18.1338 17.9108 17.7164 17.5458 17.3944 17.2590 17.0279 16.8377 16.6784 16.5426 16.4264 16.1961 16.0255 15.7898 15.6344 15.4423 15.3280 15.1834 15.0308 14.9284
13.8173 13.4131 13.0848 12.8128 12.5839 12.3866 12.2198 12.0727 11.9433 11.6285 11.7260 11.5509 11.4C67 11 .2859 11.1631 11.0947 10.9196 10.7900 10.6105 10.4921 10.3456 10.2585 10.1451 10.0297 9.9504
11.4021 11.0509 10.7655 10.5269 10.3297 10.1595 10.0125 9.6642 9.7713 9.6712 9.5817 9.4287 9.3027 9.1970 9.1071 9.0297 6.8764 8.7627 8.6051 8.5011 8.3723 6.2955 8.1952 8.0935 8.0225
10.0779 9.7563 9.4946 9.2779 9.0952 6.9390 6.8040 6.6661 6.5624 6.4903 6.4080 6.2672 6.1511 8.0537 7.9708 7.8995 7.7576 7.6527 7.5069 7.4106 7.2912 7.2199 7.1272 7.0323 6.9666
9.2280 8.9257 8.6797 8.4756 8.3035 8.1564 8.0202 7.9180 7.8201 7.7332 7.6555 7.5225 7.4128 7.3207 7.2422 7.1746 7.0405 6.9407 6.8023 6.7107 6.5971 6.5292 6.4404 6.3498 6.2673
8.6301 8.3416 8.1066 7.9116 7.7470 7.6063 7.4645 7.3702 7.2844 7.2011 7.1267 6.9991 6.8938 6.8053 6.7300 6.6650 6.5380 6.4399 6.3066 6.2182 6.1064 6.0426 5.9572 5.8690 5.6090
7.8351 7.5650 7.3450 7.1621 7.0077 6.87'>6 6.7611 6.6610 6.5728 6.4913 6.4211 6.3038 6.2043 6.1207 6.0494 5.9878 5.8654 5.7742 5.6472 5.5629 5.4579 5.3950 5.3125 5.2278 5.1699
7.3233 7.0654 6.8551 6.6802 6.5324 6.4058 6.2961 6.2001 6.1154 6.0401 5.9726 5.8569 5.7611 5.6805 5.6116 5.5524 5.4341 5.3458 5.2226 5.1407 5.0385 4.9771 4.8962 4.8134 4.7563
Table A.2e i’= 0.999
x=0.10, Degrees
of freedom.
numerator
(I)~) IO
1 10 11 12 13 14 15 16 17 18 19 20 22 24 26 28 30 35 40 50 60 80 100 150
29.1567 28.3024 27.6086 27.0337 26.5498 26.1367 25.7800 25.4688 25.1951 24.9523 24.7356 24.3651 24.0600 23.8045 23.5871 23.4000 23.0296 22.7551 22.3754 22.1247 T1.8147 21.6302 21.3895 21.1446 20.9762
18.7491 18.1657 17.6915 17.2983 16.9672 16.6843 16.4398 16.2265 16.0387 15.8721 15.7233 15.4687 (5.2588 15.0829 14.9333 14.8042 14.5488 14.3592 14.0966 13.9231 13.7081 13.5801 13.4141 13.2420 13.1249
14.8871 14.4066 14.0157 13.6914 13.4101 13.1844 12.9825 12.8061 12.6508 12.5129 12.3896 12.1787 12.0047 11.8587 11.7345 11.6274 11.4149 11.2571 11.0382 10.8934 10.7138 10.6067 10.4663 10.3236 10.2249
12.8204 12.3956 12.0499 11.7619 11.5209 11.3139 11.1348 10.9784 10.8406 10.7182 10.6087 10.4213 10.2666 10.1367 10.0260 9.9306 9.7413 9.6503 9.4047 9.2751 9.1143 9.0182 8.8927 8.7638 8.6760
11.5147 11.1256 10.8086 10.5454 10.3233 10.1332 9.9687 9.8250 9.6983 9.5857 9.4850 9.3124 9.1699 9.0502 8.9481 8.8601 8.6852 8.5550 8.3739 8.2539 8.1047 8.0154 7.8965 7.7787 7.6967
10.6067 10.2425 9.9458 9.6992 9.4910 9.3128 9.1585 9.0236 8.9046 8.7989 8.7043 8.5421 8.4080 8.2954 8.1993 8.1164 7.9515 7.8207 7.6576 7.5441 7.4029 7.3162 7.2073 7.0933 7.0154
9.4137 9.08?6 8.8126 8.58UO 8.3983 8.2358 8.09'>0 7.9718 7.8630 7.76(;3 7.6798 7.5312 7.4003 7.3049 7.2166 7.1404 6.9886 6.87',3 6.7172 6.6121 6.4809 6.4022 6.2984 6.1922 6.1189
6.6550 8.3453 8.0925 7.8621 7.7042 7.5518 7.4196 7.3039 7.2017 7.1108 7.0293 6.8895 6.7736 6.6761 6.5927 6.5207 6.3771 6.2698 6.1198 6.0198 5.8948 5.8197 5.7201 5.6184 5.5478
Table A.2f
α = 0.10, γ = 0.9999. Degrees of freedom, numerator (p₂): columns 1, 2, 3, 4, 5, 6, 8, 10. Degrees of freedom, denominator (rows): 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 24, 26, 28, 30, 35, 40, 50, 60, 80, 100, 150, 300, 1000. Each line of figures below is one column of the table, read down the denominator entries.
15.4636 14.9375 14.5089 14.1529 13.8524 13.5952 13.3726 13.1780 13.0065 12.8540 12.7177 12.4839 12.2908 12.1285 11.9902 11.8708 11.6336 11.4569 11.2110 11.0480 10.6452 10.7237 10.5646 10.4013 10.2899
13.7019 13.2283 12.8423 12.5215 12.2506 12.0187 11.8176 11.6422 il.4872 11.3495 11.2263 11.0149 10.8401 10.6932 10.5679 10.4598 10.2446 10.0842 9.8607 9.7124 9.5276 9.4168 9.2714 9.1221 9.0201
12.4853 12.0482 11.6918 11.3954 11.1451 10.9306 10.7449 10.5823 10.4389 10.3114 10.1973 10.0014 9.8394 9.7031 9.5868 9.4864 9.2864 9.1373 8.9292 8.7909 8.6184 8.5149 8.3785 8.2390 8.1428
10.8907 10.5096 10.1970 9.9278 9.7044 9.5130 9.3470 9.2017 9.0734 8.9592 8.8570 8.6814 8.5360 8.4116 8.3091 8.2187 8.0386 7.9039 7.7158 7.5905 7.4338 7.3397 7.2146 7.0875 6.9991
9.8978 9.5393 9.2465 9.0027 8.7965 8.6196 6.4662 8.3318 8.2131 8.1075 5.0128 7.8501 1.7152 7.6015 7.5044 7.4203 7.2527 7.1272 6.9515 6.8342 6.6873 6.5990 6.4811 6.3609 6.2774
38.3270 37.1370 36.1697 35.3677 34.6922 34.1151 33.6164 33.1812 32.7982 32.4583 32.1547 31.6753 31.2071 30.8462 30.5429 30.2799 29.7586 29.3717 28.8358 28.4818 28.0434 27.7821 27.4399 27.0935 26.8520
23.6087 22.8431 22.2203 21.7035 21.2678 20.8954 20.5733 20.2921 20.0444 19.8246 19.6280 19.2915 19.0139 18.7810 18.5827 18.4118 18.0727 17.8206 17.4709 17.2396 16.9527 16.7813 16.5585 16.3293 16.1721
18.2781 17.6687 17.1726 16.7606 16.4132 16.1160 15.8588 15.6342 15.4362 15.2603 15.1031 14.8337 14.6112 14.4245 14.2654 14.1282 13.8557 13.6530 13.3712 13.1846 12.9528 12.8141 12.6325 12.4471 12.3200
Table A.2g
α = 0.10, γ = 0.99999. Degrees of freedom, numerator (p₂): columns 1, 2, 3, 4, 5, 6, 8, 10. Degrees of freedom, denominator (rows): 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 24, 26, 28, 30, 35, 40, 50, 60, 80, 100, 150, 300, 1000. Each line of figures below is one column of the table, read down the denominator entries.
47.5297 45.9934 44.7436 43.7065 42.8325 42.0852 41.4391 40.8749 40.3760 39.9369 39.5426 38.6676 38.3108 37.8436 37.4458 37.1031 36.4230 35.9174 35.2163 34.7525 34.1772 33.8337 33.3834 32.9262 32.6106
28.4299 27.4600 26.7066 26.0644 25.5227 25.0593 24.6583 24.3079 23.9992 23.7249 23.4796 23.0594 22.7124 22.4210 22.1728 21.9587 21.5334 21.2170 20.7771 20.4857 20.1237 19.9072 19.6235 19.3341 19.1357
21.6155 20.8774 20.2760 19.7763 19.3545 16.9935 18.6810 18.4078 18.1669 17.9528 17.7613 17.4329 17.1616 16.9335 16.7391 16.5714 16.2379 15.9894 15.6436 15.4141 15.1286 14.9577 14.7338 14.5035 14.3465
18.0490 17.4225 16.9118 16.4873 16.1288 15.8218 15.5558 15.3233 15.1161 14.9357 14.7725 14.4925 14.2609 14.0662 13.9001 13.7567 13.4714 13.2566 12.9621 12.7651 12.5196 12.3725 12.1785 11.9806 11.6442
15.8304 15.2736 14.8199 14.4423 14.1233 13.8501 13.6133 13.4061 13.2233 13.0607 12.91;2 12.6654 12.4587 12.2647 12.1363 12.0081 11.7528 11.5622 11.2963 11.1195 10.8988 10.7664 10.5919 10.4131 10.2899
14.3056 13.7972 13.3825 13.0374 12.7457 12.4958 12.2791 12.0895 11.9221 11.7731 11.6398 11.4108 11.2212 11.0616 10.9253 10.8076 10.5729 10.3976 10.1527 9.9697 9.7860 9.6636 9.5016 9.3362 9.2216
12.3277 11.6823 11.5187 11.2159 10.9599 10.7403 10.5499 10.3631 10.2358 10.1047 9.9872 9.7853 9.6160 9.4770 9.3565 9.2523 9.0444 6.8889 6.6712 6.5260 8.3441 6.2346 6.0669 7.9404 7.0372
11.0872 10.6816 IO.3503 10.0743 9.6406 9.6404 9.4665 9.3141 9.1795 9.0596 6.9521 6.7673 8.6140 8.4047 6.3741 8.2785 6.0874 7.9442 7.7435 7.6093 7.4409 7.3396 7.2042 7.0656 6.9690
Table A.2h
α = 0.10, γ = 0.999999. Degrees of freedom, numerator (p₂): columns 1, 2, 3, 4, 5, 6, 8, 10. Degrees of freedom, denominator (rows): 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 24, 26, 28, 30, 35, 40, 50, 60, 80, 100, 150, 300, 1000. Each line of figures below is one column of the table, read down the denominator entries.
56.7196 54.8313 53.2940 52.0175 50.9410 50.0203 49.2232 48.5270 47.9135 47.3665 46.8012 46.0455 45.3572 44.7784 44.2854 43.8602 43.0158 42.3876 41.5143 40.9363 40.2176 39.7881 39.2258 38.6504 38.2567
33.2229 32.0674 31.1623 30.393s 29.7447 29.1893 28.7085 28.2861 27.9174 27.5080 27.2932 26.7680 26.3704 26.0194 25.7202 25.4620 24.9487 24.&62 24.0337 23.6805 23.2409 22.9776 22.6316 22.2779 22.0357
24.9163 24.0515 23.3447 22.7572 22.2610 21.6360 21.4679 21.1459 20.8618 20.6093 20.3332 19.9955 19.6748 19.4051 19.1750 18.9763 16.5810 18.2863 17.8749 17.6018 17.2614 17.0572 16.7890 16.5133 16.3247
20.5965 19.8702 19.2779 18.7852 18.3689 18.0122 17.7031 17.4326 17.1939 16.9816 16.7915 16.4652 16.1952 15.9680 15.7740 15.6064 15.2727 15.0236 14.6758 14.4444 14.1556 13.9823 13.7533 13.5194 13.3573
17.9200 17.2813 16.7601 16.3264 15.9596 15.6456 15.3732 15.1346 14.9243 14.7370 14.5692 14.2812 14.0426 13.8416 13.6703 13.5220 13.2266 13.0056 12.6973 12.4916 12.2349 12.0806 11.8770 11.6675 11.5226
16.0870 15.5085 15.0362 14.6431 14.3107 14.0257 13.7705 13.5621 13.3710 13.2008 13.0484 12.7867 12.5697 12.3870 12.2309 12.0959 11.8267 11.6253 11.3437 11.1559 10.9209 10.7797 10.5913 10.3998 10.2668
13.7169 13.2185 12.8096 12.4693 12.1813 11.9342 11.7196 11.5319 11.3659 11.2181 11.0856 10.8579 10.6690 10.5097 10.3735 10.2557 10.0205 9.8442 9.5972 9.4323 9.2253 9.1010 8.9344 8.7646 8.6459
12.2403 11.7890 11.4202 11.1128 10.8526 10.6293 10.4354 10.2655 10.1153 9.9815 9.8615 9.6551 9.4837 9.3392 9.2155 9.1064 8.8944 8.7338 8.5065 8.3576 8.1679 8.0536 7.9008 7.7442 7.6340
Table A.3a
α = 0.05, γ = 0.75. Degrees of freedom, numerator (p₂): columns 1, 2, 3, 4, 5, 6, 8, 10. Degrees of freedom, denominator (rows): 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 24, 26, 28, 30, 35, 40, 50, 60, 80, 100, 150, 300, 1000. Each line of figures below is one column of the table, read down the denominator entries.
8.8716 8.5791 8.3436 8.1504 7.9687 7.8515 7.7337 7.6313 7.5417 7.4625 7.3921 7.2721 7.1738 7.0918 7.0224 6.9629 6.8455 6.7589 6.6396 6.5017 6.4656 6.4087 6.3361 6.2603 6.2113
10.4769 10.1907 9.9597 9.7695 9.6102 9.4748 9.3584 9.2572 9.lEP5 9.0899 5.3201 8.9012 8.8033 8.7217 8.6526 8.5929 8.4756 8.3692 8.2706 8.1926 8.0959 8.0383 7.9650 7.8843 7.8319
6.0565 7.7655 7.5314 7.3391 7.1785 7.0422 6.9251 6.6236 6.7346 6.6559 6.5861 6.4671 6.3695 6.2882 6.2193 6.1632 6.0439 5.9560 5.8399 5.7624 5.6671 5.6105 5.5375 5.4647 5.4139
7.5634 7.2730 7.0410 6.8497 6.6699 6.5543 6.4378 6.3367 6.2482 6.1699 6.1003 5.9618 5.8648 5.8038 5.7351 5.6763 5.%02 5.4747 5.3569 5.2796 5.1644 5.1280 5.0544 4.9810 4.9306
7.2298 6.9415 6.7095 6.51 go 6.3597 6.2245 6.1064 6.0076 5.9192 5.6412 5.7717 5.6534 5.5565 5.4756 5.4071 5.3482 5.7773 5.1467 5.0286 4.9514 4.6560 4.7995 4.7262 4.6515 4.6004
6.9875 6.7000 6.4688 6.2787 6.1198 5.9649 5.8689 5.7683 5.6600 5.6021 5.5326 5.4144 5.3175 5.2366 5.1680 5.1091 "_~0?9 4.9072 4.7090 4.7113 4.6155 4.5567 4.4851 4.4093 4.3579
6.6560 6.3699 6.1395 5.9501 5.7916 5.6569 5.5412 5.4406 5.3524 5.2744 5.2049 5.0865 4.9894 4.9082 4.8394 4.7802 a.F;R?a 4.5771 4.4579 4.3795 4.2626 4.2250 4.1503 4.0741 4.0215
6.4377 6.1525 5.9227 5.7336 5.5753 5.4408 5.3251 5.2245 5.1362 5.0581 4.9665 4.8698 4.7723 4.6908 4.6217 4.5622 d.4d4R 4.3576 4.2373 4.1580 4.0599 4.0015 3.9250 3.8478 3.7949
Table A.3b
α = 0.05, γ = 0.90. Degrees of freedom, numerator (p₂): columns 1, 2, 3, 4, 5, 6, 8, 10. Degrees of freedom, denominator (rows): 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 24, 26, 28, 30, 35, 40, 50, 60, 80, 100, 150, 300, 1000. Each line of figures below is one column of the table, read down the denominator entries.
15.0655 14.6063 14.2363 13.9319 13.6772 13.4609 13.2751 13.1136 12.9721 12.8469 12.7355 12.5458 12.3902 12.2605 12.1504 12.0559 11.8699 11.7325 11.5435 11.4194 11.2665 11.1760 11.0626 10.9423 10.8556
11.5634 11.1767 10.8496 10.5809 10.3562 10.1656 10.0019 9.8597 9.7352 9.6252 9.5274 9.3608 9.2242 9.1104 9.0139 6.9312 8.7661 0.6478 6.4824 6.3738 6.2402 6.1610 6.0591 7.9553 7.8832
10.0803 9.6995 9.3932 9.1417 8.9314 6.7530 8.5997 6.4667 6.3502 6.2472 6.1556 7.9996 7.8719 7.7653 7.6750 7.5975 7.4447 7.3321 7.1771 7.0753 6.9500 6.8757 6.7792 6.6612 6.6162
9.2193 8.8539 6.5601 6.3186 6.1168 7.9455 7.7983 7.6706 7.5586 7.4597 7.3716 7.2217 7.0968 6.9963 6.9094 6.8349 6.6879 6.5794 6.4299 6.3316 6.2108 6.1391 6.0452 5.9512 5.6864
6.6527 6.2978 8.0122 7.7775 7.5612 7.4147 7.2715 7.1472 7.0382 6.9419 6.8562 6.7101 6.5904 6.4904 6.4057 6.3330 6.1694 6.0634 5.9374 5.8414 5.7230 5.6529 5.5611 5.4669 5.4043
6.2477 7.9003 7.6208 7.3909 7.1987 7.0354 6.8951 6.7732 6.6663 6.5719 6.4877 6.3444 6.2266 6.1286 6.0453 5.9738 5.8326 5.7283 5.5844 5.4898 5.3730 5.3037 5.2133 5.1219 5.0566
7.7010 7.3641 7.0976 6.8696 6.6827 6.5240 6.3875 6.2686 6.1647 6.0726 5.9906 5.6507 5.7356 5.6396 5.5564 5.4604 5.3500 5.2477 5.1062 5.0131 4.6979 4.6294 4.7397 4.6492 4.5072
7.3445 7.0146 6.7468 6.5300 cf.3467 6.1909 6.0566 5.9402 5.8378 5.7473 5.6665 5.5268 5.4156 5.3209 5.2405 5.1714 5.0346 4.9332 4.7930 4.7005 4.5659 4.5176 4.4264 4.3371 4.2754
Table A.3c
α = 0.05, γ = 0.95. Degrees of freedom, numerator (p₂): columns 1, 2, 3, 4, 5, 6, 8, 10. Degrees of freedom, denominator (rows): 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 24, 26, 28, 30, 35, 40, 50, 60, 80, 100, 150, 300, 1000. Each line of figures below is one column of the table, read down the denominator entries.
9.6372 9.2353 8.9119 8.6460 8.4236 8.2348 8.0725 7.9315 7.8080 7.6987 7.6014 7.4357 7.2998 7.1863 7.0901 7.0074 6.8443 6.7238 6.5576 6.4484 6.3136 6.2336 6.1288 6.0235 5.9503
9.1120 8.7229 8.4096 8.1520 7.9365 7.7535 7.5961 7.4594 7.3394 7.2334 7.1390 6.9780 6.8460 6.7356 6.6421 6.5617 6.4029 6.2855 6.1236 6.0170 5.8853 5.8071 5.7050 5.6015 5.5310
8.4083 8.0366 7.7372 7.4908 7.2846 7.1093 6.9585 6.8274 6.7124 6.6106 6.5199 6.3652 6.2382 6.1320 6.0418 5.9643 5.8111 5.6977 5.5408 5.4375 5.3036 5.2335 5.1341 5.0327 4.9641
7.9523 7.5922 7.3019 7.0629 6.0627 6.6925 6.5460 6.4185 6.3066 6.2076 6.1193 5.9686 5.0447 5.7411 5.6530 5.5773 5.4274 5.3163 5.1624 5.0609 4.9350 4.8599 4.7611 4.6614 4.5931
11.7649 11.2596 10.8528 10.5182 10.2382 10.0004 9.7959 9.6182 9.4623 9.3245 9.2017 8.9924 8.8206 6.6771 8.5553 8.4508 8.2441 8.0913 7.8803 7.7414 7.5699 7.4678 7.3345 7.1995 7.1069
10.9656 10.4856 10.0990 9.7809 9.5146 9.2884 9.0938 8.9246 8.7762 8.6449 8.5279 8.3284 8.1646 8.0277 7.9115 7.8116 7.6142 7.4681 7.2662 7.1332 6.9687 6.8709 6.7429 6.6128 6.6248
9.9072 9.4610 9.1014 8.8054 6.5574 8.3466 8.1652 8.0073 7.86138 7.7462 7.6369 7.4504 7.2970 7.1688 7.0599 6.9662 6.7808 6.6434 6.4532 6.3277 6.1722 6.0795 5.9574 5.6344 5.7500
9.2289 8.8048 6.4628 6.1810 7.9449 7.7441 7.5711 7.4206 7.2884 7.1714 7.0670 6.8888 6.7421 6.6194 6.5150 6.4253 6.2474 6.1154 5.9324 5.8114 5.6612 5.5715 5.4536 5.3336 5.2511
18.5189 17.9193 17.4365 17.0396 16.7075 16.4256 16.1834 15.9733 15.7889 15.6260 15.4810 15.2341 15.0318 14.8630 14.7198 14.5971 14.3552 14.1767 13.9315 13.7702 13.5717 13.4540 13.3045 13.1520 13.0465
13.5498 13.0566 12.6600 12.3341 12.0616 11.8305 11.6319 11.4596 11.3086 11.1752 11.0564 10.8544 10.6889 10.5508 10.4337 10.3334 10.1354 9.9846 9.7887 9.6570 9.4947 9.3985 9.2740 9.1490 9.0613
11.5170 11.0704 10.7113 10.4162 10.1695 9.9601 9.7803 9.6242 9.4874 9.3665 9.2590 9.0759 8.9258 8.8005 8.6945 8.6034 8.4239 8.2915 8.1090 7.9893 7.8418 7.7544 7.6408 7.5255 7.4467
10.3777 9.9580 9.6204 9.3429 9.1109 8.9140 8.7448 8.5979 6.4691 8.3553 8.2539 8.0814 7.9399 7.8219 7.7218 7.6360 7.4665 7.3414 7.1690 7.0558 6.9161 6.8333 6.7261 6.6157 6.5408
Table A.3d
α = 0.05, γ = 0.99. Degrees of freedom, numerator (p₂): columns 1, 2, 3, 4, 5, 6, 8, 10. Degrees of freedom, denominator (rows): 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 24, 26, 28, 30, 35, 40, 50, 60, 80, 100, 150, 300, 1000. Each line of figures below is one column of the table, read down the denominator entries.
26.4997 25.5547 24.7944 24.1695 23.6470 23.2037 22.4922 22.8229 22.2025 21.9464 21.7186 21.3310 21.0131 20.7480 20.5234 20.3307 19.9507 19.6706 19.2850 19.0318 18.7201 16.5353 18.2980 10.0552 17.8911
17.9686 17.2733 16.7140 16.2543 15.8701 15.5440 15.2639 15.0207 14.8075 14.6192 14.4515 14.1661 13.9322 13.7370 13.5717 13.4297 13.1498 12.9433 12.6588 12.4721 12.2419 12.1055 11.9286 11.7480 11.6247
14.6904 14.0944 13.6149 13.2207 12.8912 12.6114 12.3710 12.1622 11.9792 11.8174 11.6734 11.4282 11.2270 11.0592 10.9169 10.7947 10.5536 10.3757 10.1304 9.9692 9.7704 9.6525 9.4986 9.3436 9.2354
12.9052 12.3643 11.9290 11.5710 11.2716 11.0174 10.7988 10.6089 10.4425 10.2953 10.1642 9.9409 9.7577 9.6047 9.4750 9.3636 9.1436 8.9810 6.7568 8.6093 8.4274 8.3192 8.1789 6.0360 7.9374
Table A.3e
α = 0.05, γ = 0.999. Degrees of freedom, numerator (p₂): columns 1, 2, 3, 4, 5, 6, 8, 10. Degrees of freedom, denominator (rows): 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 24, 26, 28, 30, 35, 40, 50, 60, 80, 100, 150, 300, 1000. Each line of figures below is one column of the table, read down the denominator entries.
37.8914 36.4217 35.2393 34.2676 33.4551 32.7656 32.1731 31.6587 31.2078 30.8094 30.4548 29.8510 29.3560 28.9429 28.5929 28.2924 27.7000 27.2628 26.6604 26.2649 25.7773 25.4884 25.1113 24.7320 24.4671
24.1073 23.1195 22.3246 21.6711 21.1245 20.6605 20.2617 19.9153 19.6116 19.3432 19.1042 16.6971 18.3631 18.0843 i7.a480 17.6450 17.2444 16.9485 16.5406 16.2724 15.9418 15.7453 15.4912 15.2313 15.0520
19.0230 la.2167 17.5677 17.0339 16.5873 16.2080 15.8819 15.5985 15.3500 15.1303 14.9346 14.6011 14.3274 14.0987 13.9048 13.7382 13.4092 13.1660 12.8303 12.6094 12.3367 12.1745 11.9624 11.7483 11.6013
16.3120 15.6037 15.0333 14.5641 14.1713 13.8377 13.5506 13.3012 13.0823 12.8888 12.7163 12.4223 12.1808 11.9791 11.8079 11.6607 11.3699 11.1548 10.8575 10.6617 10.4197 10.2757 10.0875 9.8963 9.7669
14.6038 13.9579 13.4375 13.0092 12.6506 12.3459 12.0837 11.8557 11.6556 11.4786 11.3209 11.0518 10.8307 10.6459 10.4890 10.3541 10.0874 9.8899 9.6167 9.4366 9.2137 9.0810 8.9077 6.7307 8.6108
13.4184 12.8161 12.3306 11.9310 11.5962 11.3117 11.0667 10.8537 10.6667 10.5012 10.3537 10.1020 9.8951 9.7221 9.5751 9.4487 9.1986 9.0132 8.7566 8.5873 8.3776 8.2526 8.0882 7.9221 7.8084
11.8646 11.3250 10.8807 10.5189 10.2156 9.9577 9.7356 9.5423 9.3725 9.2222 9.0691 8.6592 6.6709 8.5132 8.3792 6.2639 8.0355 7.8659 7.6309 7.4755 7.2825 7.1673 7.0155 6.6619 6.7557
10.8794 10.3717 9.9620 9.6244 9.3413 9.1004 8.8928 8.7121 8.5533 8.4127 8.2872 8.0728 7.8962 7.7483 7.6225 7.5142 7.2994 7.1398 6.9181 6.7713 6.5887 6.4796 6.3346 6.1885 6.0875
Table A.3f
α = 0.05, γ = 0.9999. Degrees of freedom, numerator (p₂): columns 1, 2, 3, 4, 5, 6, 8, 10. Degrees of freedom, denominator (rows): 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 24, 26, 28, 30, 35, 40, 50, 60, 80, 100, 150, 300, 1000. Each line of figures below is one column of the table, read down the denominator entries.
49.2900 47.2734 45.6505 44.3162 43.2002 42.2528 41.4385 40.7311 40.1110 39.5627 39.0747 38.2432 37.5612 36.9917 36.5090 36.0945 35.2763 34.6717 33.8383 33.2905 32.6147 32.2134 31.6892 31.1605 30.7944
30.1404 28.8571 27.8238 26.9739 26.2628 25.6588 25.1393 24.6880 24.2921 23.9421 23.6303 23.0988 22.6626 22.2982 21.9890 21.7235 21.1989 20.8110 20.2753 19.9227 19.4874 19.2285 18.8898 18.5486 16.3149
23.2300 22.2151 21.3978 20.7251 20.1621 19.6837 19.2722 18.9144 18.6005 18.3228 18.0754 17.6535 17.3070 17.0174 16.7716 16.5603 16.1426 15.8335 15.4062 15.1247 14.7765 14.5693 14.2984 14.0235 13.8358
19.5896 18.7176 18.0149 17.4365 16.9522 16.5405 16.1862 15.8780 15.6076 15.3683 15.1550 14.7912 14.4922 14.2421 14.0298 13.8473 13.4861 13.2186 12.8484 12.6043 12.3019 12.1219 11.8849 11.6462 11.4829
17.3150 16.5328 15.9023 15.3831 14.9482 14.5784 14.2600 13.,9831 13.7400 13.5248 13.3329 13.0055 12.7363 12.5110 12.3197 12.1551 11.8293 11.6877 11.2532 11.0323 10.7585 10.5952 10.3804 10.1630 10.0148
15.7463 15.0264 14.4460 13.9679 13.5672 13.2264 12.9330 12.6777 12.4535 12.2549 12.0779 11.7756 11.5270 11.3189 11.1420 10.9898 10.6884 10.4647 10.1547 9.9498 9.6955 9.5437 9.3436 9.1415 9.0017
8 13.7041 13.0659 12.5511 12.1268 11.7710 11.4682 11.2074 10.9603 10.7808 10.6041 10.4464 loI17;o 9.9553 9.7695 9.6115 9.4755 9.2057 9.0053 6.7269 6.5427 8.3135 8.1766 7.9948 7.8114 7.6847
10 12.4184 11.8321 11.3589 10.9687 10.6414 10.3628 10.1226 9.9134 9.7295 9.5666 9.4212 9.1726 8.9678 6.7961 8.6499 6.5241 8.2742 8.0683 7.8298 7.6584 7.4448 ;.317i 7.1472 6.9753 6.8559
Table A.3g
α = 0.05, γ = 0.99999. Degrees of freedom, numerator (p₂): columns 1, 2, 3, 4, 5, 6, 8, 10. Degrees of freedom, denominator (rows): 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 24, 26, 28, 30, 35, 40, 50, 60, 80, 100, 150, 300, 1000. Each line of figures below is one column of the table, read down the denominator entries.
60.7063 58.1285 56.0531 54.3459 52.9176 51.7044 50.6612 49.7546 48.9595 48.2564 47.6301 46.5527 45.6666 44.9546 44.3338 43.8004 42.7468 41.9677 40.6921 40.1642 39.3098 38.7899 38.1099 37.4229 36.9521
36.1169 34.5357 33.2619 32.2138 31.3363 30.5907 29.9492 29.3915 28.9022 28.4693 28.0636 27.4258 26.8854 26.4337 26.0504 25.7206 25.0693 24.5670 23.9202 23.4809 22.9373 22.6137 22.1917 21.7602 21.4664
27.3653 26.1427 25.1575 24.3464 23.6671 23.0896 22.5926 22.1604 21.7810 21.4452 21.1460 20.6353 20.2157 19.8646 19.5665 19.3102 18.8030 16.4271 17.9069 17.5636 17.1387 16.6653 16.5532 16.2163 15.9864
22.7922 21.7584 20.9249 20.2385 19.6635 19.1745 18.7534 16.3872 18.0656 17.7606 17.5270 17.0937 16.7374 16.4392 16.1858 15.9679 15.5363 15.2163 14.7729 14.4800 14.1168 13.9001 13.6154 13.3266 13.1275
19.9510 19.0350 18.2964 17.6679 17.1779 16.7441 16.3705 16.0454 15.7599 15.5070 15.2615 14.8964 14.5796 14.3143 14.0888 13.8948 13.5104 13.2250 12.8293 12.5677 12.2429 12.0486 11.7944 11.5341 11.3562
16.0002 17.1656 16.4924 15.9376 15.4725 15.0767 14.7356 14.4391 14.1763 13.9474 13.7414 13.3895 13.0999 12.8573 12.6510 12.4734 12.1214 11.6599 11.4969 11.2567 10.9580 10.7797 10.5429 10.3043 10.1395
15.4730 14.7444 14.1563 13.6714 13.2648 12.9166 12.6202 12.3603 12.1320 11.9296 11.7469 11.4402 11.1659 10.9727 10.7913 10.6351 10.3249 10.0943 9.7736 9.5610 9.2961 9.1360 8.9273 6.7145 0.5667
13.8903 13.2265 12.6941 12.2534 11.8635 11.5665 11.2970 ii.0603 10.6523 10.6679 13.5032 lo.2216 9.9695 9.7946 9.6290 9.4661 9.2023 8.9909 6.6966 8.5011 6.2572 8.1111 7.9166 7.7193 7.5616
20.2046 19.2574 18.4928 17.8625 17.3339 16.6840 16.4963 16.1586 15.6619 15.5990 15.3644 14.9635 14.6333 14.3565 14.1211 13.9163 lS.tJlOO 13.2169 12.8012 12.5257 12.1627 11.9779 11.7056 11.4302 11.2394
17.1944 16.3773 15.7177 15.1736 14.7171 14.3284 13.9932 13.7012 13.4446 13.2170 13.0139 12.6665 12.3003 12.1401 11.9358 11.7596 ll.‘i391 11.1493 10.7867 10.5460 10.2457 10.0659 9.6266 9.5641 9.4151
15.3169 14.5816 13.9077 13.4977 13.0664 12.7360 12.4338 12.1705 11.9389 11.7335 11.5500 11.2362 10.9774 10.7603 10.5753 10.4157 llj.\j4%5 9.8623 9.5326 9.3137 9.0396 6.6760 8.6565 8.4345 8.2786
Table A.3h
α = 0.05, γ = 0.999999. Degrees of freedom, numerator (p₂): columns 1, 2, 3, 4, 5, 6, 8, 10. Degrees of freedom, denominator (rows): 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 24, 26, 28, 30, 35, 40, 50, 60, 80, 100, 150, 300, 1000. Each line of figures below is one column of the table, read down the denominator entries.
72.0930 66.9467 66.4124 64.3269 62.5812 61.0977 59.8216 58.7122 57.7388 56.6776 56.1103 54.8019 53.7272 52.6286 52.0663 51.4110 50.1154 49.1564 47.6307 46.9573 45.8766 45.2336 44.3940 43.5374 42.9573
42.0527 40.1723 38.6566 37.4092 36.3642 35.4759 34.7113 34.0463 33.4627 32.9460 32.4556 31.6999 31.0541 30.5139 30.0551 29.6606 28.6799 26.3014 27.5006 26.9722 26.3175 25.9274 25.4151 24.8956 24.5410
31.4545 30.0246 28.8718 27.9223 27.1267 26.4501 25.6676 25.3608 24.9157 24.5217 24.1704 23.5707 23.0774 22.6646 22.3138 22.0121 21.4144 20.9711 20.3568 19.9510 19.4475 19.1470 16.7523 18.3506 16.0764
25.9456 24.7512 23.7680 22.9943 22.3291 21.7632 21.2758 20.8516 20.4790 20.1491 19.8547 19.3521 18.9366 16.5923 18.2979 18.0445 17.5424 17.1696 16.6525 16.3105 15.8857 15.6320 15.2958 14.9582 14.7234
22.5373 21.4692 20.6436 19.9470 19.3627 18.8656 16.4373 18.0644 17.7366 17.4466 17.1877 16.7454 16.3813 16.0763 15.8169 15.5935 rS.l;,, 14.6215 14.3646 14.0621 13.6859 13.4611 13.1647 12.8630 12.6548
Table A.4a
α = 0.01, γ = 0.75. Degrees of freedom, numerator (p₂): columns 1, 2, 3, 4, 5, 6, 8, 10. Degrees of freedom, denominator (rows): 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 24, 26, 28, 30, 35, 40, 50, 60, 80, 100, 150, 300, 1000. Each line of figures below is one column of the table, read down the denominator entries.
19.5875 18.7024 18.0020 17.4343 16.9652 16.5712 16.2357 15.9466 15.6350 15.4741 15.2785 14.9479 14.6792 14.4565 14.2691 14.1089 13.7353 13.5661 13.2532 13.0494 12.8003 12.6536 12.4638 12.2769 12.1161
15.4805 14.C67.2 14.0264 13.5090 13.0827 12.7256 12.4222 12.1614 11.9347 11.7360 11.56C3 11.2639 11.0235 10.8247 10.6575 10.5149 10.2362 lo.0328 9.7556 9.5756 9.3560 9.2269 9.0591 a.8955 a. 7780
13.5523 12.8768 12.2702 11.7796 ii.3758 11.0379 10.7511 10.5046 10.2'jO6 10.1031 9.9375 9.6582 9.4318 9.2447 9.0874 a.9534 6.65i5 8.5005 a.2404 a.0716 7.8657 7.7449 7.5877 7.4318 7.3286
12.5978 11.8484 11.2593 10.7847 10.3943 10.0678 9.7907 9.5527 9.3461 9.1651 9.0053 a.7359 a.5175 8.3371 a.1855 8.0563 7.2039 7.6199 7.3693 7.2067 7.0084 6.8919 6.7409 6.5891 6.4870
11.9034 11.1702 10.5942 10.1302 9.7487 9.4296 9.1569 8.9264 a.7246 a.5479 a.3918 6.1266 7.9154 7.7392 7.5912 7.4650 ;.iioi 7.0389 6.7942 6.6354 6.4416 6.3276 6.1794 6.0331 5.9313
11.4079 IO.E664 10.1197 9.6632 9.2860 8.9742 a.7080 a.4793 8.2809 a.1071 7.9536 7.6948 7.4851 7.3118 7.1662 7.0421 6.,S;i 6.6227 6.3819 6.2255 6.0347 5.9225 5.7765 5.6316 5.5338
10.7419 10.0364 9.4822 9.0359 8.6690 8.3621 a.1018 7.8782 7.6841 7.5140 7.3638 7.1105 6.9053 6.7356 6.5929 6.4713 6 233; 6:0602 5.8237 5.6701 5.4824 5.3720 5.2284 5.0853 4.9892
lo.3108 9.6157 9.0697 a .6299 8.2683 7.9658 7.7091 7.4866 7.2572 7.1294 6.9812 6.7313 6.5286 6.3609 6.2200 6.0998 6 . ;;;-; Y 5.6930 5.4587 5.3064 5.1200 5.0103 4.8668 4.7251 4.6297
Table A.4b
α = 0.01, γ = 0.90. Degrees of freedom, numerator (p₂): columns 1, 2, 3, 4, 5, 6, 8, 10. Degrees of freedom, denominator (rows): 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 24, 26, 28, 30, 35, 40, 50, 60, 80, 100, 150, 300, 1000. Each line of figures below is one column of the table, read down the denominator entries.
26.9514 25.6171 24.5637 23.7116 23.0087 22.4190 21.9176 21.4860 21.1107 20.7813 20.4901 19.0984 19.5390 19.2683 la.9901 18.7529 18.2686 17.9494 17.4867 17.1859 16.8187 16.6026 16.3223 16.0497 15.8728
19.7831 18.6861 17.8227 17.1260 16.5525 16.0722 15.6645 15.3140 15.0095 14.7427 14.5070 14.1093 13.7869 13.5202 13.2961 13.1051 12.7318 12.4593 12.0881 11.8472 11.5533 11.3805 11.1565 10.9333 10.7893
16.8442 15.8520 15.0718 14.4430 13.9257 13.4928 13.1255 12.8099 12.5360 12.2960 12.0939 11.7264 11.4367 11.1972 10.9960 lo.8245 10.4894 10.2449 9.9121 9.6961 9.4326 9.2778 9.0759 8.8774 8.7385
15.1996 14.2676 13.5352 12.9452 12.4600 12.0541 11.7097 11.4140 11.1572 10.9323 10.7336 10.3987 10.1274 9.9030 9.7146 9.5539 9.2402 9.0112 8.6994 8.4971 8.2502 8.1051 7.9165 7.7302 7.6001
14.1331 13.2408 12.5399 11.9753 11.5111 11.1228 10.7930 10.5105 10.2649 10.0498 9.8598 9.5394 9.2799 9.0653 8.8850 8.7313 8.4311 8.2120 7.9135 7.7198 7.4833 7.3443 7.1629 6.9839 6.9598
13.3786 12.5148 11.8364 11.2899 10.8406 10.4648 10.1459 9.8721 9.6344 9.4262 9.2423 8.9322 8.6808 8.4731 8.2984 8.1496 7.8587 7.6464 7.3571 7.1692 6.9397 6.8048 6.6287 6.4557 6.3374
12.3715 11.5463 10.8981 10.3760 9.9467 9.5876 9.2829 9.0212 8.7940 8.5948 8.4189 8.1223 7.8818 7.6829 7.5156 7.3730 7.0943 6.8906 6.6128 6.4323 6.2115 6.0816 5.9115 5.7439 5.6315
11.7222 10.9223 10.2939 9.7876 9.3714 9.0231 8.7275 8.4735 8.2530 8.0597 7.8889 7.6007 7.3670 7.1737 7.0110 6.8723 6.6009 6.4025 6.1316 5.9554 5.7396 5.6124 5.4463 5.2609 5.1702
D.A. Belsley, Assessing the presence of’ data weaknesses
246
Table A.4c
α = 0.01, γ = 0.95. Degrees of freedom, numerator (p₂): columns 1, 2, 3, 4, 5, 6, 8, 10. Degrees of freedom, denominator (rows): 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 22, 24, 26, 28, 30, 35, 40, 50, 60, 80, 100, 150, 300, 1000. Each line of figures below is one column of the table, read down the denominator entries.
32.4288 30.7419 29.4117 28.3368 27.4508 26.7082 26.0771 25.5342 25.0623 24.6484 24.2825 23.6649 23.1638 22.7492 22.4004 22.1029 21.5212 21.0965 20.5174 20.1412 19.6820 19.4118 19.0598 18.7181 18.4902
22.8796 21.571s 20.5424 19.7124 19.0293 18.4576 17.9722 17.5551 17.1928 16.8754 16.5950 16.1220 15.7366 15.4216 15.1551 14.9280 14.4843 14.1604 13.7194 13.4331 13.0838 12.8785 12.6131 12.3453 12.1672
19.0976 17.9471 17.0428 16.3139 15.7145 15.2130 14.7874 14.4218 14.104s 13.8264 13.5808 13.1667 12.8312 12.5530 12.3206 12.1220 11.7339 11.4506 11.0649 10.8146 10.5091 10.3297 10.0956 9.8649. 9.7045
17.0118 15.9501 15.1160 14.4440 13.8914 13.4292 13.0370 12.7002 12.4078 12.1517 11.9254 11.5440 11.2348 10.9793 10.7646 10.5816 10.2240 9.9631 9.6077 9.3769 9.0953 8.9298 8.7145 8.5006 8.3560
15.6703 14.6666 13.8782 13.2431 12.7209 12.2842 11.9136 11.5953 11.3191 11.0770 10.8632 10.5028 10.2106 9.9691 9.7661 9.5931 9.2550 9.0082 8.6720 8.4536 8.1869 8.0301 7.8258 7.6234 7.4846
14.7263 13.7639 13.0079 12.3990 11.8983 11.4796 11.1243 10.8191 10.5541 10.3220 10.1170 9.7712 9.4910 9.2592 9.0645 6.8984 6.5738 8.3368 8.0138 7.8039 7.5474 7.3965 7.2001 7.0046 6.6736
13.4719 12.5651 11.8528 11.2789 10.8071 10.4124 10.0774 9.7896 9.5397 9.3208 9.1273 8.8010 8.5364 8.3175 8.1335 7.9765 7.6696 7.4452 7.1391 6.9401 6.6965 6.5531 6.3657 6.1798 6.0555
12.6664 11.7958 11.1118 IO.5608 10.1076 9.7284 9.4066 9.1300 8.8898 8.6793 8.4932 8.1793 7.9246 7.7139 7.5366 7.3853 7.0893 6.8728 6.5771 6.3846 6.1488 6.0097 5.8274 5.6472
5.5246
Table A.4d
α = 0.01, γ = 0.99. Degrees of freedom, numerator (p₂) = 1, 2, 3, 4, 5, 6, 8, 10; degrees of freedom, denominator = 10 through 1000. [Critical values omitted.]
Table A.4e
α = 0.01, γ = 0.999. Degrees of freedom, numerator (p₂) = 1, 2, 3, 4, 5, 6, 8, 10; degrees of freedom, denominator = 10 through 1000. [Critical values omitted.]
Table A.4f a=O.Ol, Degrees
of freedom,
numerator
YEO.9999
(pz)
I 10 11 12 I3 14 15 16 17 18 19 20 22 24 26 28 30 35 40 50 60 80 100 150 300 1000
80.4834 75.4014 71.4079 68.1896 65.5430 63.3287 61.4495 59.8352 58.4337 57.2055 56.1206 54.2910 52.8078 51.5814 50.5504 49.6716 47.9539 46.6995 44.9900 43.8796 42.5231 41.7254 40.6840 39.6556 38.9500
48.7177 45.5372 43.0385 41.0252 39.3696 37.9846 36.8092 35.7994 34.9227 34.1544 33.4757 32.3310 31.4028 30.6352 29.9897 29.4395 28.3636 27.5776 26.5057 25.8089 24.9572 24.4559 23.8004 23.1535 22.7193
37.3103 34.8213 32.8660 31.2906 29.9950 28.9111 27.9912 27.2008 26.5145 25.9131 25.3817 24.4052 23.7582 23.1568 22.6510 22.2196 21.3760 20.7594 19.9180 19.3707 18.7010 18.3066 17.7915 17.2807 16.9400
31.3197 29.1968 27.5290 26.1851 25.0799 24.1552 23.3703 22.6958 22.1101 21.5967 21.1431 20.3777 19.7568 19.2430 18.8108 18.4422 17.7209 17.1934 16.4732 16.0044 15.4304 15.0920 14.6498 14.2104 13.9080
27.5054 25.6920 24.2044 23.0057 22.0197 21.1947 20.4943 19.8924 19.3697 18.9114 18.5064 17.8230 17.2684 16.8094 16.4232 16.0938 15.4488 14.9770 14.3323 13.9124 13.3978 13.0942 12.6981 12.3032 12.0324
Table A.4g
α = 0.01, γ = 0.99999. Degrees of freedom, numerator (p₂) = 1, 2, 3, 4, 5, 6, 8, 10; degrees of freedom, denominator = 10 through 1000. [Critical values omitted.]
Table A.4h
α = 0.01, γ = 0.999999. Degrees of freedom, numerator (p₂) = 1, 2, 3, 4, 5, 6, 8, 10; degrees of freedom, denominator = 10 through 1000. [Critical values omitted.]
Appendix B: A summary of the collinearity diagnostics of BKW

B.1. Determining the presence of collinearity
Many measures have been suggested for diagnosing the presence of collinearity, including det(XᵀX), corr(XᵀX), corr⁻¹(XᵀX), corr(b), and the variance-inflation factors. These measures are subject to significant shortcomings, discussed at length in chapter 3 of Belsley, Kuh and Welsch, that are overcome by the procedure described there and summarized here. Our object is to discover the presence of linear near dependencies among the columns of an n×p data matrix X. The basis of the diagnostic technique is the singular-value decomposition²² of X as

X = UDVᵀ,    (B.1)

where UᵀU = VᵀV = I_p and D is diagonal with non-negative diagonal elements μ_k, k = 1, ..., p, called the singular values of X. The singular-value decomposition of X is mathematically equivalent to the system of eigenvalues and eigenvectors of XᵀX but, for various reasons, is preferred in this context. For each exact linear dependency among the columns of X, some singular value would be zero. Hence, if X contained p − r exact linear dependencies (i.e., the rank of X were r), they could be displayed as
XV₂ = 0,    (B.2)
which comes from the relevant manipulation of (B.1),

X[V₁ V₂] = [U₁ U₂] [D₁₁ 0; 0 0],    (B.3)

where D₁₁ is r×r, V₁ is p×r, V₂ is p×(p−r), U₁ is n×r and U₂ is n×(p−r). When the collinear relations among the columns of X are not exact, but are near dependencies (having, for example, a high R² when one column is regressed on some of the others), there will be μ_k's that are small. The problem, of course, is to define what is meant by 'small'. This question is answered through the use of a generalization of the concept of the condition number of a matrix. It is shown in numerical analysis that the higher the condition number of a matrix A of a system of linear equations Az = c, the more sensitive is the solution z = A⁻¹c to small perturbations of the elements of A and c. This measure, denoted κ(A), is shown to be
κ(A) = μ_max/μ_min ≥ 1,    (B.4)

²²See Golub (1969), Golub and Reinsch (1970), and Hanson and Lawson (1969).
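The condition number κ(A) = μ_max/μ_min of (B.4) is computed directly from the singular values. A minimal numpy sketch (the 3×3 matrix is purely illustrative, not from the paper):

```python
import numpy as np

# Hypothetical full-rank matrix; any nonsingular A will do.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 10.0]])

mu = np.linalg.svd(A, compute_uv=False)   # singular values, in descending order
kappa = mu[0] / mu[-1]                    # eq. (B.4): kappa(A) >= 1 always

# A large kappa signals that the solution z of A z = c is sensitive
# to small perturbations in the elements of A and c.
print(kappa)
```

For a 2-norm condition number this coincides with `np.linalg.cond(A)`, which uses the same singular-value ratio.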
and in fact provides a bound on the sensitivity of the solution to small changes in A and c.²³ Hence, we learn that μ_max provides a yardstick against which smallness can be measured. This suggests defining a set of condition indexes,

η_k = μ_max/μ_k,    k = 1, ..., p,    (B.5)
whose high values will signal the presence of near dependencies among the columns of X. Experiments conducted in chapter 3 of BKW show the efficacy of the condition indexes in this context. There, various matrices are made to become systematically more ill-conditioned by constructing one or more columns of the matrix to differ from being linear combinations of other columns by an error term whose variance is made smaller and smaller. In this way the 'tightness' of the known linear near dependencies can be assessed through conventional correlations and multiple correlations and compared to the resulting condition indexes. The following properties emerge:

(i) The condition indexes do indeed signal the presence of near dependencies, in a manner that affords stable interpretation. η's of around 10 correspond to weak relations (R² of 0.4–0.6); η's of around 30 correspond to more moderate relations (R² of 0.9) that are more troublesome to econometric practice.

(ii) The tighter the underlying relation, the higher the condition index. Indeed, as the underlying correlations or R²'s increase along the progression 0.9, 0.99, 0.999, 0.9999, etc., the condition indexes increase roughly along the progression 3, 10, 30, 100, 300, 1000, etc. It is very important to note, however, that these are not equivalent pieces of information, for the condition indexes operate on the entire X matrix, treating all columns symmetrically, and allow one to determine not only the number of near dependencies but also the variables included in each; to find the R², however, one must have prior knowledge of the number of near dependencies and of the columns of X to be regressed on the others.

(iii) The condition indexes are highly successful in diagnosing simultaneous or coexisting near dependencies. Indeed, there will be one high condition index for each near dependency. The number of near dependencies is therefore readily determined by this analysis.
Furthermore, the condition indexes corresponding to any one near dependency will remain stable in magnitude while a coexisting near dependency is made tighter. Hence, relative magnitudes of coexisting near dependencies can be assessed.

²³See Wilkinson (1965) and Forsythe and Moler (1967).
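The behavior in properties (i) and (ii) — condition indexes rising as a constructed near dependency is tightened — can be sketched numerically. A minimal illustration with synthetic data (numpy assumed; this is not BKW's own experiment, only its shape):

```python
import numpy as np

def condition_indexes(X):
    """Condition indexes eta_k = mu_max/mu_k of eq. (B.5),
    after scaling columns of X to unit length as BKW recommend."""
    Xs = X / np.linalg.norm(X, axis=0)
    mu = np.linalg.svd(Xs, compute_uv=False)   # singular values, descending
    return mu[0] / mu

# Build x3 as a near dependency on x1 + x2, with the error variance
# shrunk step by step to tighten the relation.
rng = np.random.default_rng(0)
n = 100
x1, x2 = rng.normal(size=n), rng.normal(size=n)
for sigma in (1.0, 0.1, 0.01):
    x3 = x1 + x2 + sigma * rng.normal(size=n)
    eta = condition_indexes(np.column_stack([x1, x2, x3]))
    print(sigma, round(eta.max(), 1))          # max eta rises as sigma falls
```

With the looser relation the maximum index stays in single digits; by the tightest step it is well beyond the η ≈ 30 range the text associates with troublesome dependencies.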
B.2. Determining the structure of each near dependency
We turn now to determining which variates are involved in each near dependency and, hence, also to assessing the potential harm that these near dependencies may cause least-squares estimates. These tasks are accomplished through the use of a decomposition of the variance of the least-squares estimators. Applying (B.1) to the variance-covariance matrix of b gives

V(b) = σ²(XᵀX)⁻¹ = σ²VD⁻²Vᵀ,    (B.6)

and, for the kth component,

var(b_k) = σ² Σ_j v²_kj/μ²_j,    k = 1, ..., p,    (B.7)
where V = (v_kj). This expression, introduced in this context by Silvey (1969), is seen to decompose each parameter variance into components that are directly associated with the near dependencies. Small μ_j's produce a high component, and therefore a high proportion, of the variance of the estimated coefficients of those variates that are involved in the specific relation and which are thereby potentially harmed by it. The information from (B.7) is most usefully summarized in a Π matrix of variance-decomposition proportions. That is, define

φ_kj ≡ v²_kj/μ²_j  and  φ_k ≡ Σ_j φ_kj,    (B.8)

and form the variance-decomposition proportions

π_jk ≡ φ_kj/φ_k,    k, j = 1, ..., p.    (B.9)

Tell-tale patterns of high variance-decomposition proportions can be exhibited in a summary table of the form shown in table B.1.

Table B.1
Π matrix of variance-decomposition proportions
Since each high η is associated with a linear near dependency, one seeks to find cases where two or more variates (since it takes at least two to form a linear relation) have a high proportion of their variance associated with a single high condition index. It is shown in chapter 3 of BKW that values for these variance-decomposition proportions in excess of 0.5 are indeed effective in indicating those variates that are involved in specific near dependencies. Thus, high η's (>30 or 100) diagnose the presence of collinear relations, while high associated π's (>0.5) diagnose variate involvement. Some care is required, however, when a given variate is involved in several coexisting near dependencies, for its variance-decomposition proportions can then be distributed across the η's so that all are relatively small. In this instance, it is clearly the sum of the π's across the high η's that provides the relevant diagnostic information. A value in excess of 0.5 also works well in practice for this sum. Once the number, g, of collinear relations has been determined from the number of high η's and the variate involvement established, it is often desirable to select g of the columns of X to regress descriptively on the remaining p − g in order directly to display the collinear relations in a set of auxiliary regressions. A particularly effective means for selecting the columns of X for this purpose is described in section 3.4 of BKW.
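The diagnostic sequence just described — condition indexes (B.5), the Π matrix of (B.8)–(B.9), and the η > 30, π > 0.5 flagging rule — can be sketched as follows. A minimal numpy illustration on synthetic data, not BKW's own code; thresholds are the defaults suggested in the text:

```python
import numpy as np

def collinearity_diagnostics(X, eta_cut=30.0, pi_cut=0.5):
    """Condition indexes (B.5) and the Pi matrix of variance-decomposition
    proportions (B.8)-(B.9); flags variates with pi > pi_cut on each
    condition index exceeding eta_cut."""
    Xs = X / np.linalg.norm(X, axis=0)             # unit column lengths
    _, mu, Vt = np.linalg.svd(Xs, full_matrices=False)
    V = Vt.T                                       # V = (v_kj)
    eta = mu[0] / mu                               # condition indexes
    phi = V**2 / mu**2                             # phi_kj = v_kj^2 / mu_j^2
    Pi = phi / phi.sum(axis=1, keepdims=True)      # pi_jk = phi_kj / phi_k
    involved = {j: np.where(Pi[:, j] > pi_cut)[0].tolist()
                for j in range(len(eta)) if eta[j] >= eta_cut}
    return eta, Pi, involved

# Synthetic data with one tight near dependency: column 2 tracks column 0.
rng = np.random.default_rng(1)
x1, x2 = rng.normal(size=200), rng.normal(size=200)
x3 = x1 + 0.001 * rng.normal(size=200)
eta, Pi, involved = collinearity_diagnostics(np.column_stack([x1, x2, x3]))
print(involved)   # variates 0 and 2 load on the single high condition index
```

One high η appears (one near dependency), and only the two constructed variates carry variance-decomposition proportions above 0.5 on it, matching the "two or more variates on a single high index" pattern sought in table B.1.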
References

Anderson, T.W., 1958, An introduction to multivariate statistical analysis (Wiley, New York).
Bargmann, R.E. and S.P. Ghosh, 1964, Noncentral statistical distribution programs for a computer language, Report no. RC-1231 (IBM Watson Research Center, Yorktown Heights, NY).
Belsley, David A., 1980, The statistical problem associated with collinearity and a test for its presence, Technical report no. 6 (Center for Computational Research in Economics and Management Science, MIT, Cambridge, MA).
Belsley, David A., 1981, Forecasting and collinearity, Technical report no. 27 (Center for Computational Research in Economics and Management Science, MIT, Cambridge, MA); also Working paper no. 113 (Department of Economics, Boston College, Boston, MA); forthcoming in: Business and economic statistics section of the proceedings of the 1981 American Statistical Association meetings, Detroit.
Belsley, David A., Edwin Kuh and Roy E. Welsch, 1980, Regression diagnostics: Identifying influential data and sources of collinearity (Wiley, New York).
Cramér, H., 1946, Mathematical methods of statistics (Princeton University Press, Princeton, NJ).
Forsythe, G. and C. Moler, 1967, Computer solution of linear algebraic systems (Prentice-Hall, Englewood Cliffs, NJ).
Gardner, J.R. and S.H. Hymans, 1978, An econometric model of the U.S. monetary sector, RSQE research report (The University of Michigan, Ann Arbor, MI).
Goldberger, A.S., 1964, Econometric theory (Wiley, New York).
Golub, G.H., 1969, Matrix decompositions and statistical calculations, in: R.C. Milton and J.A. Nelder, eds., Statistical computation (Academic Press, New York) 365-397.
Golub, G.H. and C. Reinsch, 1970, Singular-value decomposition and least-squares solutions, Numerische Mathematik 14, 403-420.
Hanson, R.J. and C.L. Lawson, 1969, Extensions and applications of the Householder algorithm for solving least squares problems, Mathematics of Computation 23, 787-812.
Hotelling, H., 1931, The generalization of Student's ratio, Annals of Mathematical Statistics 2, 360-378.
Lawson, C.L. and R.J. Hanson, 1974, Solving least squares problems (Prentice-Hall, Englewood Cliffs, NJ).
Lehmer, E., 1944, Inverse tables of probabilities of errors of the second kind, Annals of Mathematical Statistics 15, 388-398.
Malinvaud, E., 1970, Statistical methods of econometrics, 2nd ed. (North-Holland, Amsterdam).
Rao, C.R., 1973, Linear statistical inference and its applications (Wiley, New York).
Silvey, S.D., 1969, Multicollinearity and imprecise estimation, Journal of the Royal Statistical Society B31, 539-552.
Smith, Gary and Frank Campbell, 1980, A critique of some ridge regression methods, Journal of the American Statistical Association 75, 74-103.
Stewart, G.W., 1973, Introduction to matrix computations (Academic Press, New York).
Theil, H., 1971, Principles of econometrics (Wiley, New York).
Toro-Vizcarrondo, Carlos and T.D. Wallace, 1968, A test of the mean square error criterion for restrictions in linear regression, Journal of the American Statistical Association 63, 558-572.
Wilkinson, J.H., 1965, The algebraic eigenvalue problem (Oxford University Press, Oxford).
Wilks, Samuel S., 1962, Mathematical statistics (Wiley, New York).
Wonnacott, R.J. and T.H. Wonnacott, 1979, Econometrics, 2nd ed. (Wiley, New York).