Journal of Econometrics 56 (1993) 141-168. North-Holland
Minimum chi-square estimation and tests for model selection*
This paper studies the problem of model selection based on Pearson chi-square type statistics. Such goodness-of-fit statistics have been considered by Moore (1978), and diagnostic tests based on them have recently been extended to general econometric models by Andrews (1988a, b). Here we consider another important use of these statistics. Specifically, we propose some convenient asymptotically standard normal tests for model selection based on chi-square type statistics. Following Vuong (1989), the null hypothesis is that the competing models are equally close to the data-generating process (DGP) vs. the alternative hypotheses that one model is closer to the DGP, where closeness of a model is measured according to the discrepancy implicit in the Pearson type statistic used. Our model selection tests have the desirable feature that neither model needs to be correctly specified nor nested in each other. As a prerequisite, we study the corresponding class of minimum chi-square estimators.
1. Introduction
The study of the Pearson (1900) chi-square statistic and its extensions has been the subject of considerable statistical research. See, e.g., Watson (1959) and Moore (1978, 1986) for comprehensive surveys. In view of the recent emphasis on specification testing, Pearson type statistics have attracted new interest due to their natural suitability as diagnostic tests. In econometric work, the use of Pearson chi-square statistics has initially been confined to qualitative choice models. See Nerlove and Press (1973), McFadden (1974), and Horowitz (1985) among others. More recently, Heckman (1984) and Tauchen (1985) have
Correspondence to: Weiren Wang, Department of Economics, University of Kentucky, Lexington, KY 40506-0034, USA.
*We owe a great intellectual debt to D.S. Moore whose work on chi-square type tests much influenced this research. A preliminary version of this paper was presented at the Econometric Society meeting in Atlanta, December 1989. We are grateful to F. Diebold, C. Hsiao, A. Weiss, and three anonymous referees for helpful comments.
0304-4076/93/$05.00 © 1993 Elsevier Science Publishers B.V. All rights reserved
extended their use to more general models, while Andrews (1988a, b) has provided the most general treatment of chi-square tests for econometric models, i.e., for statistical models with covariates. This paper considers another important use of Pearson type statistics. Because they constitute natural measures of goodness-of-fit, such statistics are frequently used for choosing among competing econometric models, the best model being the one with highest goodness-of-fit. Such a model selection procedure, however, does not take into account random variations inherent in the values of the statistics. A model is simply deemed best as long as its goodness-of-fit statistic is lower than those of its competitors, whether differences in goodness-of-fit statistics are small or large. The preceding remark suggests a need for taking into account the stochastic nature of these differences so as to assess their significance. The main purpose of this paper is to address this issue. Specifically, we shall propose some asymptotically standard normal tests for model selection based on Pearson type statistics. Following Vuong (1989), our tests are directional and test the null hypothesis that the competing models are equally close to the data-generating process (DGP) vs. the alternative hypotheses that one model is closer to the DGP, where closeness of a model is measured according to the discrepancy implicit in the Pearson type statistic used. Alternatively, in a nontesting framework, our results can be viewed as providing information on the strength of the statistical evidence in favor of the choice of a model based on its goodness-of-fit. The model selection approach considered in this paper differs from those of Akaike (1973, 1974) and Cox (1961, 1962) for nonnested hypotheses. It differs from the former for it sets the model selection problem within a hypothesis testing framework.
It differs from the latter because Cox's null hypothesis is that a model is correctly specified. Another difference is that the present approach is based on the discrepancy implicit in the Pearson type statistic used, while these other approaches as well as Vuong's (1989) tests for model selection rely on the Kullback-Leibler (1951) information criterion (KLIC). Which discrepancy or criterion is preferable depends on the user's loss function. Lastly, our model selection tests have the desirable feature that neither model needs to be correctly specified nor nested in each other. This contrasts with the more familiar use of chi-square statistics in the analysis of contingency tables where the competing models are nested and the larger model is assumed to be correctly specified [see, e.g., Goodman (1970), Haberman (1978), Bishop, Fienberg, and Holland (1975)]. The paper is organized as follows. Section 2 defines a class of Pearson type statistics and sets out the basic assumptions. Some general results are proved. Section 3 defines a class of minimum chi-square estimators and derives their asymptotic distributions whether or not the econometric model is correctly specified. The asymptotic properties of the resulting class of goodness-of-fit tests are established. Section 4 defines the model selection problem in the minimum
chi-square framework and proposes new and simple directional tests for model selection. Section 5 illustrates our results with the classical problem of choosing between an exponential model and a log-normal model. A limited Monte Carlo study is performed to assess the small sample properties of our tests. Section 6 discusses some extensions and concludes. All the proofs are collected in the appendix.
2. Basic notation and some general results
For simplicity, we consider models without covariates. In section 6, we shall briefly indicate how our results may extend to models with covariates based on ideas drawn from Andrews (1988a, b). Let $\{X_i, i = 1, 2, \ldots\}$ be a sequence of random vectors taking values in some complete separable metric space $X$. Our first assumption concerns the DGP.
Assumption A.1. $X_i$, $i = 1, 2, \ldots$, are independent and identically distributed (iid) with common true distribution $H$ on $(X, \sigma)$.
Assumption A.1 is more suitable to cross-section than time series data. Our results can be extended to more general DGPs such as those considered by White and Domowitz (1984). To construct chi-square test statistics, we partition $X$ into $M$ disjoint cells $E_1, E_2, \ldots, E_M$, i.e.,
$$X = \bigcup_{i=1}^{M} E_i \quad \text{and} \quad E_i \cap E_j = \emptyset \quad \text{if } i \neq j.$$
For simplicity, the range of the sample space $X$ is partitioned into fixed disjoint cells. The case of data-dependent (or random) cells is addressed in section 6. Let $h$ be the vector of true cell probabilities, i.e., $h = (h_1, h_2, \ldots, h_M)'$, where
$$h_i = \int_{E_i} dH(u) \quad \text{for } i = 1, 2, \ldots, M.$$
The cells $E_i$ are chosen so that we have:
Assumption A.2. $h_i > 0$ for $i = 1, 2, \ldots, M$.
We consider a parametric model $F_\theta = \{F(\cdot \mid \theta); \theta \in \Theta\}$ which may or may not contain the true distribution $H$, where $\Theta$ is a compact subset of a $k$-dimensional Euclidean space (assuming $k < M - 1$). If $F_\theta$ contains $H$, then there exists a $\theta_0 \in \Theta$ such that $F(x \mid \theta_0) = H(x)$ and the model $F_\theta$ is said to be correctly specified. For any $\theta \in \Theta$, let $p(\theta) = [p_1(\theta), p_2(\theta), \ldots, p_M(\theta)]'$, where
$$p_i(\theta) = \int_{E_i} dF(u \mid \theta),$$
denote the vector of expected cell probabilities. We assume:
Assumption A.3. $p_i(\theta) > 0$ and $p_i(\theta)$ is twice continuously differentiable for all $i = 1, 2, \ldots, M$.
Corresponding to the partition $E_1, E_2, \ldots, E_M$, we define the vector of observed cell frequencies $f = (f_1, f_2, \ldots, f_M)'$, where
$$f_i = \frac{1}{n} \sum_{j=1}^{n} 1_{E_i}(X_j)$$
and $1_{E_i}(X_j)$ is the indicator function taking value 1 if $X_j$ falls in cell $E_i$ or 0 if $X_j$ does not fall in cell $E_i$. Following Moore (1978), it is convenient to consider the $M$-dimensional vector
$$V_n(\theta) = \sqrt{n} \left( \ldots, \frac{f_i - p_i(\theta)}{\sqrt{p_i(\theta)}}, \ldots \right)',$$
which measures the difference between observed and expected cell frequencies. Regarding estimators and weight matrices used in forming Pearson type statistics, we make the next two assumptions.
Assumption A.4. Let $\theta_n$ be any estimator of $\theta$. For some $\theta_* \in \Theta$, $\theta_n$ satisfies $\sqrt{n}(\theta_n - \theta_*) = O_p(1)$.
Assumption A.5. $M_n$ is an $M \times M$ symmetric, possibly random, matrix satisfying
$$M_n = M_* + \frac{1}{\sqrt{n}} L_n,$$
where $L_n$ is a matrix whose elements are $O_p(1)$ and $M_*$ is a fixed symmetric positive definite matrix.
Assumption A.4 implies that $\theta_n$ is a consistent estimator of some value $\theta_*$. It is not overly restrictive since it is satisfied by common estimators such as maximum likelihood estimators, least squares estimators, and other extremum estimators, even when the model is misspecified. Assumption A.5 implies that $M_n$ is a consistent estimator of $M_*$, which is Moore's assumption [Moore (1984, thm. 5.3)]. As seen below, consistency of $M_n$ for $M_*$ is insufficient to characterize the asymptotic distribution of general chi-square statistics under possible misspecification of the model $F_\theta$. We now define a general class of Pearson type statistics.
Definition 1. A Pearson type statistic is of the form
$$Q_n(\theta_n) = V_n'(\theta_n) M_n V_n(\theta_n),$$
where $\theta_n$ and $M_n$ satisfy assumptions A.4 and A.5.
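As an illustration of Definition 1, the following Python sketch evaluates the quadratic form for three diagonal weight matrices: the identity (the Pearson statistic), diag(p_i/f_i) (the Modified Pearson statistic), and diag(p_i) (the Gauss statistic). The function names and the numerical inputs are hypothetical and chosen only for illustration; the observed frequencies are taken to be relative frequencies, which is an assumption consistent with the $\sqrt{n}$ scaling of $V_n$.

```python
import math

def v_vector(freqs, probs, n):
    """V_n: sqrt(n) * (f_i - p_i) / sqrt(p_i), one entry per cell."""
    return [math.sqrt(n) * (f - p) / math.sqrt(p) for f, p in zip(freqs, probs)]

def diagonal_quadratic_form(v, diagonal):
    """V' M V when M is diagonal with the given diagonal entries."""
    return sum(w * x * x for w, x in zip(diagonal, v))

def pearson_type_statistics(freqs, probs, n):
    """Pearson, modified Pearson, and Gauss statistics for one fitted model."""
    v = v_vector(freqs, probs, n)
    return {
        "pearson": diagonal_quadratic_form(v, [1.0] * len(v)),       # M_n = I_M
        "modified_pearson": diagonal_quadratic_form(
            v, [p / f for p, f in zip(probs, freqs)]),               # M_n = diag(p_i / f_i)
        "gauss": diagonal_quadratic_form(v, probs),                  # M_n = diag(p_i)
    }

# Hypothetical inputs: three cells and n = 100 observations.
stats = pearson_type_statistics([0.2, 0.3, 0.5], [0.25, 0.25, 0.5], 100)
```

With the identity weight the quadratic form reduces to the familiar n * sum (f_i - p_i)^2 / p_i, and with diag(p_i/f_i) it reduces to n * sum (f_i - p_i)^2 / f_i.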
This class of statistics has been extensively studied by Moore (1978, 1984, 1986). It contains some well-known chi-square statistics such as the Pearson statistic with $M_n = I_M$ (the identity matrix), the Modified Pearson statistic with $M_n = \mathrm{diag}(\ldots, p_i(\theta_n)/f_i, \ldots)$, and the Gauss statistic with $M_n = \mathrm{diag}(\ldots, p_i(\theta_n), \ldots)$ as special cases. The remainder of this section characterizes the asymptotic distributions of such statistics under general conditions. To do so we introduce the following vectors and matrices:
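$$b = \left( \ldots, \frac{h_i - p_{*i}}{\sqrt{p_{*i}}}, \ldots \right)', \qquad U_n = \sqrt{n} \left( \ldots, \frac{f_i - h_i}{\sqrt{h_i}}, \ldots \right)',$$
$$D_1 = \mathrm{diag}\left( \ldots, \frac{\sqrt{h_i}}{\sqrt{p_{*i}}}, \ldots \right), \qquad D_2 = \mathrm{diag}\left( \ldots, \frac{h_i + p_{*i}}{p_{*i}}, \ldots \right) = D_1^2 + I_M,$$
$$B = \mathrm{diag}\left( \ldots, \frac{1}{\sqrt{p_{*i}}}, \ldots \right) \frac{\partial p(\theta_*)}{\partial \theta'},$$
where $p_* = (p_{*1}, \ldots, p_{*M})'$ and $p_{*i} = p_i(\theta_*)$.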
Note that $U_n$ is a normalized sum of iid random vectors. Hence by the multivariate central limit theorem, it follows that
$$U_n \xrightarrow{D} N(0, \Sigma_h),$$
where $\Sigma_h = I_M - q_h q_h'$ and $q_h = (\sqrt{h_1}, \ldots, \sqrt{h_i}, \ldots, \sqrt{h_M})'$. This result holds even if the specified model doesn't contain the true DGP $H$. The next theorem uses this result to characterize the asymptotic distribution of the general chi-square statistic $Q_n(\theta_n)$ under general conditions. It provides the main property on which the results of this paper rest.
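The form of the limiting covariance matrix can be checked by a short calculation. The sketch below assumes that the $i$th element of $U_n$ is $\sqrt{n}(f_i - h_i)/\sqrt{h_i}$ with $f_i$ the relative frequency of cell $E_i$, and writes $\delta_{il}$ for the Kronecker delta (a symbol introduced here only for this calculation).

```latex
\begin{aligned}
\operatorname{Var}\bigl(1_{E_i}(X_j)\bigr) &= h_i (1 - h_i), \qquad
\operatorname{Cov}\bigl(1_{E_i}(X_j), 1_{E_l}(X_j)\bigr) = -h_i h_l \quad (i \neq l), \\
\operatorname{Cov}\bigl(U_{n,i}, U_{n,l}\bigr)
  &= \frac{\operatorname{Cov}\bigl(1_{E_i}(X_1), 1_{E_l}(X_1)\bigr)}{\sqrt{h_i h_l}}
   = \delta_{il} - \sqrt{h_i h_l}, \\
\text{so that}\quad \operatorname{Var}(U_n) &= I_M - q_h q_h' = \Sigma_h, \qquad
\Sigma_h q_h = q_h - q_h (q_h' q_h) = 0 \quad \text{since } q_h' q_h = \sum_{i=1}^{M} h_i = 1 .
\end{aligned}
```

In particular, $\Sigma_h$ is idempotent with rank $M - 1$, a property that is used repeatedly when the limiting distributions are derived.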
Theorem 2.1 can be used to characterize the asymptotic distribution of $Q_n(\theta_n)$ under various conditions. Note in particular that the random matrix $L_n$ of assumption A.5 appears in Theorem 2.1 always premultiplied by $b'$. Hence, unless $b = 0$, the asymptotic distribution of $Q_n(\theta_n)$ depends in general on $L_n$ and hence on how the limiting weighting matrix $M_*$ is estimated by $M_n$. Two special cases are important. The first case, which is standard, relates to goodness-of-fit testing.
Corollary 2.2. Under assumptions A.1-A.5, if $h = p_*$, then
$$Q_n(\theta_n) = [U_n - B\sqrt{n}(\theta_n - \theta_*)]' M_* [U_n - B\sqrt{n}(\theta_n - \theta_*)] + o_p(1).$$
In fact, this corollary holds as long as $M_n \to M_*$, so that assumption A.5 is not fully required. When $\theta_* = \theta_0$ so that $h = p(\theta_0)$, the result is well-known [see, e.g., Moore (1978, eq. (14))]. Corollary 2.2 is the basis for the specification tests discussed in section 3, where $\theta_n$ is the minimum chi-square estimator that minimizes the quadratic form $Q_n(\theta)$. The second case, which is of interest to us, requires the asymptotic distribution of $(1/\sqrt{n}) Q_n(\theta_n)$. Then assumption A.5 is fully required.
Corollary 2.3. Under assumptions A.1-A.5,
$$\frac{1}{\sqrt{n}} Q_n(\theta_n) = \sqrt{n}\, b' M_* b + b' L_n b + 2 b' M_* D_1 U_n - b' M_* D_2 B \sqrt{n}(\theta_n - \theta_*) + o_p(1).$$
Corollary 2.3 gives us the basic equation for constructing various tests for model selection in section 4. In this equation, $b' M_* b$ is the implicit discrepancy between the DGP and the specific member $F(\cdot \mid \theta_*)$ of the specified model $F_\theta$. The second term $b' L_n b$ is $O_p(1)$ since $L_n$ is $O_p(1)$ by assumption A.5. Hence this term is not negligible and contributes to the asymptotic distribution of $(1/\sqrt{n}) Q_n(\theta_n)$. Moore (1984, thm. 5.3) inadvertently ignored the term $b' L_n b$, and the condition $M_n \to M_*$ in probability under $H$ seems too weak for his stated result.
3. Minimum chi-square estimators and tests for goodness-of-fit
Hereafter, we focus on a subclass of the general class given by Definition 1. Namely, we consider the class of statistics $Q_n(\theta_n)$, where $\theta_n$ is the minimum chi-square estimator obtained by minimizing the quadratic form $Q_n(\theta)$. Our main motivation is that the statistic $Q_n(\theta_n)$ hence defined and divided by $n$ converges, as we shall see, to some value $Q(\theta_*)$ which can be viewed as the corresponding asymptotic discrepancy between the DGP and the specified parametric model $F_\theta$. As a prerequisite, we study here the asymptotic properties of the minimum chi-square estimators $\theta_n$. To do so, we specialize the weighting matrix $M_n$ in the quadratic form $Q_n(\theta)$ to be of the form $M(f, \theta)$. Hence, from now on, we shall assume that $Q_n(\theta) = V_n'(\theta) M(f, \theta) V_n(\theta)$. The matrix function $M(f, \theta)$ is assumed to satisfy:
Assumption A.6. Each element of the weighting matrix $M(f, \theta)$ is twice continuously differentiable in $(f, \theta) \in R^M \times \Theta$, and $M(h, \theta)$ is a positive definite matrix for any $\theta$.
The specification $M(f, \theta)$ is quite flexible and more general than the quadratic forms $V_n'(\theta) M_n V_n(\theta)$ where $M_n$ is independent of $\theta$. These latter quadratic forms have been considered by Moore (1978, 1984, 1986). Next, we define the discrepancy between $H$ and a member of the parametric family $F_\theta$ as $Q(\theta) = V'(\theta) M(h, \theta) V(\theta)$, where
$$V(\theta) = \left( \ldots, \frac{h_i - p_i(\theta)}{\sqrt{p_i(\theta)}}, \ldots \right)'.$$
Clearly, $Q(\theta) \geq 0$ and $Q(\theta) = 0$ only if $h = p(\theta)$. Thus $Q(\theta)$ measures the closeness from the true distribution $H$ to any particular member of the parametric family $\{F(\cdot \mid \theta); \theta \in \Theta\}$. Important special cases are $M(h, \theta) = I_M$, $M(h, \theta) = \mathrm{diag}(\ldots, p_i(\theta)/h_i, \ldots)$, and $M(h, \theta) = \mathrm{diag}(\ldots, p_i(\theta), \ldots)$, which correspond to the Pearson discrepancy, the modified Pearson discrepancy, and the Gauss discrepancy, respectively. Given a discrepancy associated with a particular weighting matrix $M(f, \theta)$, we define the corresponding minimum chi-square estimator:
Definition 2. Given $M(f, \theta)$, the corresponding minimum chi-square estimator $\theta_n$ is defined as
$$\theta_n = \arg\min_{\theta \in \Theta} Q_n(\theta) = \arg\min_{\theta \in \Theta} V_n'(\theta) M(f, \theta) V_n(\theta).$$
To study the asymptotic properties of this minimum chi-square estimator under possible misspecification, we define the pseudo-true parameter as the value of $\theta$ that minimizes the discrepancy $Q(\theta)$.
Definition 3. Given $M(f, \theta)$, the pseudo-true parameter $\theta_*$ associated with the discrepancy $Q(\theta)$ is defined as
$$\theta_* = \arg\min_{\theta \in \Theta} Q(\theta) = \arg\min_{\theta \in \Theta} V'(\theta) M(h, \theta) V(\theta).$$
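Definitions 2 and 3 can be made concrete with a small numerical sketch. It assumes, purely for illustration, the exponential model, the log-normal DGP, and the four cells used in the example of section 5, together with the Pearson weight $M = I_M$. The golden-section search, the bracket [0.5, 2.0], the sample size, and all function names are choices made here, not part of the paper.

```python
import math
import random
from statistics import NormalDist

CUTS = [0.0, 0.1, 1.0, 3.0, math.inf]   # cell boundaries borrowed from section 5

def exponential_cell_probs(beta):
    """p_i(beta): probability of (c_{i-1}, c_i] under an exponential with mean beta."""
    cdf = [1.0 if c == math.inf else 1.0 - math.exp(-c / beta) for c in CUTS]
    return [cdf[i + 1] - cdf[i] for i in range(len(CUTS) - 1)]

def lognormal_cell_probs(mu, sigma):
    """Cell probabilities under a log-normal with log-mean mu and log-sd sigma."""
    normal = NormalDist(mu, sigma)
    cdf = [0.0] + [normal.cdf(math.log(c)) for c in CUTS[1:-1]] + [1.0]
    return [cdf[i + 1] - cdf[i] for i in range(len(cdf) - 1)]

def pearson_discrepancy(target, probs):
    """sum_i (target_i - p_i)^2 / p_i, i.e. V'V with M = I_M."""
    return sum((t - p) ** 2 / p for t, p in zip(target, probs))

def golden_section_min(func, lo, hi, tol=1e-8):
    """Minimise a function assumed unimodal on the bracket [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - ratio * (b - a), a + ratio * (b - a)
    while b - a > tol:
        if func(c) < func(d):
            b, d = d, c
            c = b - ratio * (b - a)
        else:
            a, c = c, d
            d = a + ratio * (b - a)
    return (a + b) / 2.0

# Definition 3: pseudo-true parameter, using the population cell probabilities h.
h = lognormal_cell_probs(-0.3466, 0.8326)
beta_star = golden_section_min(
    lambda b: pearson_discrepancy(h, exponential_cell_probs(b)), 0.5, 2.0)

# Definition 2: minimum chi-square estimator, with observed frequencies f replacing h.
random.seed(0)
n = 5000
counts = [0] * (len(CUTS) - 1)
for _ in range(n):
    x = random.lognormvariate(-0.3466, 0.8326)
    for i in range(len(counts)):
        if CUTS[i] < x <= CUTS[i + 1]:
            counts[i] += 1
            break
f = [k / n for k in counts]
beta_n = golden_section_min(
    lambda b: pearson_discrepancy(f, exponential_cell_probs(b)), 0.5, 2.0)
```

Minimising the per-observation discrepancy in `f` gives the same minimiser as minimising $Q_n(\theta)$ itself, since the two differ only by the factor $n$. For a large sample, `beta_n` should lie close to `beta_star` even though the exponential model is misspecified here.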
See, e.g., Sawa (1978) and White (1982) in the ML framework. To establish the asymptotic properties of $\theta_n$, we make additional regularity conditions on $p(\theta)$ and $\Theta$ [see, e.g., Birch (1964)].
Assumption A.7. $\theta_*$ is an interior point of $\Theta$, and there is a $k$-dimensional neighborhood of $\theta_*$ completely contained in $\Theta$.
Assumption A.8. The Jacobian matrix $\partial p(\theta_*)/\partial \theta'$ is of full rank $k$.
Assumption A.9. $\theta_*$ is the unique minimizer of $Q(\theta)$.
Under assumptions A.1-A.3 and A.6-A.9, $Q_n(\theta)/n \to Q(\theta)$ in probability uniformly in $\theta$, as $n \to \infty$. Since $\theta_n$ minimizes $Q_n(\theta)$ and $\theta_*$ minimizes $Q(\theta)$, then $\theta_n \to \theta_*$ and $Q_n(\theta_n)/n \to Q(\theta_*)$ in probability. See Moore (1984, thm. 5.1). Regarding the asymptotic distribution of $\theta_n$, we define the following matrices:
@(,f; 0) = diag
I
. . ..T ;p;(H)’
B = B1 + Bl - 83 - B,/‘2,
where
.
84 = (Bhij),
“.
M(f; B)diag
i
... .
1 , Jpi(O)’
“’
’
and
where the (i, j) element
I
+
is
?
c$!\I*;$Y / I
The matrix function of $(f, \theta)$ and the matrices $D$ and $D_3$ defined above are $M \times M$; the matrices $\mathcal{B}$, $\mathcal{B}_1$, $\mathcal{B}_2$, $\mathcal{B}_3$, and $\mathcal{B}_4$ are $k \times M$; and the matrix $Q$ is $k \times k$, where $k$ and $M$ are the dimension of $\theta$ and the number of cells, respectively.
Lemma 3.1. Under assumptions A.1-A.3 and A.6-A.9,
$$\sqrt{n}(\theta_n - \theta_*) = Q^{-1} \mathcal{B} D U_n + o_p(1).$$
Recall that $U_n \xrightarrow{D} N(0, \Sigma_h)$. Thus Lemma 3.1 implies that $\sqrt{n}(\theta_n - \theta_*)$ has, as expected, an asymptotic normal distribution. Specifically,
$$\sqrt{n}(\theta_n - \theta_*) \xrightarrow{D} N(0, Q^{-1} \mathcal{B} D \Sigma_h D \mathcal{B}' Q^{-1}).$$
Since $\sqrt{n}(\theta_n - \theta_*)$ is bounded in probability, assumption A.4 is satisfied. In addition, given assumption A.6 the weighting matrix $M_n$, which is now $M(f, \theta_n)$, satisfies assumption A.5 as the next result shows.
and $L_n$ is $O_p(1)$.
For completeness, we first look at $Q_n(\theta_n)$ in the usual way, i.e., as a goodness-of-fit statistic. Recall that here $\theta_n$ minimizes the quadratic form $V_n'(\theta) M(f, \theta) V_n(\theta)$. Since $Q_n(\theta_n)/n$ is a consistent estimator of $Q(\theta_*)$, the implicit null hypothesis when using the statistic $Q_n(\theta_n)$ is
$$\tilde{H}_0: Q(\theta_*) = 0,$$
or equivalently,
$$\tilde{H}_0: h = p(\theta_*),$$
because $Q(\theta_*) = b' M_* b$ where $M_* = M(h, \theta_*)$ and assumption A.6 holds. In general, when $H$ is a continuous distribution on $X$, the implicit null hypothesis $\tilde{H}_0$ strictly contains the null hypothesis of correct specification:
$$H_0: H(\cdot) = F(\cdot \mid \theta_0) \quad \text{for some } \theta_0 \in \Theta.$$
Hence rejection of $\tilde{H}_0$ implies rejection of $H_0$ so that one can infer that the parametric model $F_\theta$ is misspecified. On the other hand, acceptance of $\tilde{H}_0$ is not sufficient to guarantee that the parametric model is correctly specified.
Theorem 3.3 (asymptotic distribution of $Q_n(\theta_n)$). Given assumptions A.1-A.3 and A.6-A.9:
(i) Under the implicit null hypothesis $\tilde{H}_0$,
$$Q_n(\theta_n) = U_n' [M_* - M_* B (B' M_* B)^{-1} B' M_*] U_n + o_p(1) \xrightarrow{D} \sum_{i=1}^{M} \lambda_i Z_i^2,$$
where $Z_i$, $i = 1, 2, \ldots, M$, are iid $N(0, 1)$, and $\lambda_i$ are the eigenvalues of
$$\Omega = \Sigma_h [M_* - M_* B (B' M_* B)^{-1} B' M_*] \Sigma_h.$$
(ii) Under the alternative to $\tilde{H}_0$, $Q_n(\theta_n) \to +\infty$.
Part (ii) of Theorem 3.3 implies that the test is consistent for any alternative to $\tilde{H}_0$. Part (i) shows that our general chi-square statistics $Q_n(\theta_n)$, where $\theta_n$ is the corresponding minimum chi-square estimator, are not necessarily asymptotically chi-square distributed under the null hypothesis of correct specification. The next corollary, however, characterizes conditions under which $Q_n(\theta_n)$ has the usual chi-square distribution under $\tilde{H}_0$.
Corollary 3.4. Given assumptions A.1-A.3 and A.6-A.9, the statistic $Q_n(\theta_n)$ converges to a chi-square distribution under its implicit null hypothesis $\tilde{H}_0$ if and only if the $M \times M$ matrix
$$\Omega = \Sigma_h [M_* - M_* B (B' M_* B)^{-1} B' M_*] \Sigma_h$$
is idempotent, in which case the number of degrees of freedom is $M - k - 1$.
An interesting feature is that all statistics $Q_n(\theta_n)$ have the same number of degrees of freedom, namely $M - k - 1$, whenever they are asymptotically chi-square distributed under their implicit null hypotheses. This result differs from previous ones which apply when other estimators such as the ML estimator on ungrouped data are used in forming the chi-square type statistics $Q_n(\theta_n)$. See Rao and Robson (1974) and Moore (1978) among others. Second, because $\Sigma_h$ is idempotent, a sufficient condition for $Q_n(\theta_n)$ to be chi-square distributed under $\tilde{H}_0$ is that the matrix in square brackets in the equation for $\Omega$ be a g-inverse of $\Sigma_h$. See Rao and Mitra (1971). For instance, since $q_h' B = 0$, it is easy to verify that this is the case when $M_* = I_M$, i.e., when the probability limit of the weight matrix $M_n = M(f, \theta_n)$ is the identity matrix. Well-known examples are $M_n = I_M$ and $M_n = \mathrm{diag}[\ldots, p_i(\theta_n)/f_i, \ldots]$, which lead to the Pearson and Modified Pearson statistics, respectively, although other less trivial examples can be constructed easily.
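The idempotency condition of Corollary 3.4 can be checked numerically in a simple case. The sketch below assumes $M_* = I_M$, a correctly specified exponential model with $\beta = 1$ (so that $h = p(\beta)$ and $k = 1$), the four cells of section 5, and a matrix $B$ whose single column has entries $(\partial p_i/\partial \beta)/\sqrt{p_i}$. That form of $B$ and all variable names are assumptions of this illustration.

```python
import math

CUTS = [0.0, 0.1, 1.0, 3.0, math.inf]   # illustrative cells (those of section 5)

def cdf(c, beta):
    """Exponential cdf with mean beta, with the limit at infinity handled explicitly."""
    return 1.0 if c == math.inf else 1.0 - math.exp(-c / beta)

def dcdf(c, beta):
    """Derivative of the exponential cdf with respect to beta."""
    if c == 0.0 or c == math.inf:
        return 0.0
    return -(c / beta ** 2) * math.exp(-c / beta)

def matmul(a, b):
    """Plain-Python product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

beta = 1.0
m = len(CUTS) - 1
p = [cdf(CUTS[i + 1], beta) - cdf(CUTS[i], beta) for i in range(m)]
dp = [dcdf(CUTS[i + 1], beta) - dcdf(CUTS[i], beta) for i in range(m)]

q = [math.sqrt(pi) for pi in p]                        # q_h when h = p(beta)
b_col = [d / math.sqrt(pi) for d, pi in zip(dp, p)]    # single column of B (k = 1)
bb = sum(x * x for x in b_col)                         # B'B, a scalar here

identity = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
sigma_h = [[identity[i][j] - q[i] * q[j] for j in range(m)] for i in range(m)]
bracket = [[identity[i][j] - b_col[i] * b_col[j] / bb for j in range(m)]
           for i in range(m)]                          # I - B (B'B)^{-1} B'
omega = matmul(matmul(sigma_h, bracket), sigma_h)

q_dot_b = sum(x * y for x, y in zip(q, b_col))         # should vanish: q_h' B = 0
trace = sum(omega[i][i] for i in range(m))             # equals M - k - 1 if idempotent
omega_sq = matmul(omega, omega)
max_dev = max(abs(omega_sq[i][j] - omega[i][j]) for i in range(m) for j in range(m))
```

Here the trace of the idempotent matrix gives the degrees of freedom, 4 - 1 - 1 = 2, in line with the corollary.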
4. Tests for model selection
As seen in the previous section, when one chooses a particular Pearson type statistic $Q_n(\theta_n)$ with $\theta_n$ being the corresponding minimum chi-square estimator, one actually evaluates the goodness-of-fit of the parametric model $F_\theta$ according to the implicit discrepancy $Q(\theta_*)$ between the true distribution $H$ and the specified model $F_\theta$. Thus it is natural to define the 'best' model among a collection of competing models to be the model that is closest to the true distribution according to the discrepancy $Q(\cdot) = V'(\cdot) M(h, \cdot) V(\cdot)$ implicit in the chosen Pearson type statistic. In this paper, we consider the problem of selecting between two models. Let $G_\gamma = \{G(\cdot \mid \gamma); \gamma \in \Gamma\}$ be a competing model, where $\Gamma$ is a $q$-dimensional parameter space in $R^q$. The competing model $G_\gamma$ can be nested, nonnested, or overlapping with $F_\theta$. In a similar way, we can define the minimum chi-square estimator $\gamma_n$, the pseudo-true parameter $\gamma_*$, the general chi-square statistic $Q_n(\gamma_n)$, and the implicit discrepancy $Q(\gamma_*)$ for the model $G_\gamma$.
Definition 4 (equivalent, better, and worse). Given the discrepancy $Q(\cdot)$ and two competing parametric models $F_\theta$ and $G_\gamma$:
$H_0^e$: $Q(\theta_*) = Q(\gamma_*)$ means that the two models are equivalent;
$H_f$: $Q(\theta_*) < Q(\gamma_*)$ means that $F_\theta$ is better than $G_\gamma$;
$H_g$: $Q(\theta_*) > Q(\gamma_*)$ means that $F_\theta$ is worse than $G_\gamma$.
Q.H. Vuong and W. Wang, Model selection tests
This definition does not require that either of the competing models be correctly specified. On the other hand, a correctly specified model must be at least as good as any other model because $Q(\theta_*) = 0$ while $Q(\gamma_*) \geq 0$. Note also that if $F_\theta$ is nested in $G_\gamma$ (say), then $H_f$ cannot occur. The indicator $Q(\theta_*) - Q(\gamma_*)$ is unknown. From the previous section, it can be consistently estimated by the difference $[Q_n(\theta_n) - Q_n(\gamma_n)]/n$. This difference converges to 0 under the null hypothesis $H_0^e$, but converges to a strictly negative or positive constant when $H_f$ or $H_g$ holds. These properties actually justify the use of $Q_n(\theta_n) - Q_n(\gamma_n)$ as a model selection indicator and the common procedure of selecting the model with highest goodness-of-fit. As argued in the introduction, however, it is important to take into account the random nature of the difference $Q_n(\theta_n) - Q_n(\gamma_n)$ so as to assess its significance. To do so we consider the asymptotic distribution of $[Q_n(\theta_n) - Q_n(\gamma_n)]/\sqrt{n}$ under $H_0^e$. Our major task is to propose some tests for model selection, i.e., for the null hypothesis $H_0^e$ against the alternatives $H_f$ or $H_g$. We use the next lemma which follows from Corollary 2.3, Lemma 3.1, and Lemma 3.2, with $\theta_n$ and $\gamma_n$ as the corresponding minimum chi-square estimators of $F_\theta$ and $G_\gamma$. Using the subscript $f$ for the vectors and matrices defined earlier when applied to $F_\theta$, we define the $1 \times M$ row vector
S; = cis + 2h;M,,D,,
+ (c ;s -
@&+.D,,B.dQj’&-R
where
cir =
aMf*
b;--
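ah
Similar notations apply to the model $G_\gamma$ with the subscript $f$ replaced by $g$.
Lemma 4.1. Given assumptions A.1-A.3 and A.6-A.9:
(i) For model $F_\theta$,
$$\frac{1}{\sqrt{n}} Q_n(\theta_n) = \sqrt{n}\, Q(\theta_*) + S_f' U_n + o_p(1).$$
(ii) For model $G_\gamma$,
$$\frac{1}{\sqrt{n}} Q_n(\gamma_n) = \sqrt{n}\, Q(\gamma_*) + S_g' U_n + o_p(1).$$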
Using this lemma, it is now straightforward to form some test statistics for model selection. We define
$$\sigma^2 = (S_f - S_g)' \Sigma_h (S_f - S_g),$$
which is the variance of $(S_f - S_g)' U_n$. This variance can be consistently estimated. Indeed, since $f \to h$, $p(\theta_n) \to p(\theta_*)$, and $M_{fn} = M(f, \theta_n) \to M_{f*}$, with similar properties for model $G_\gamma$, then $S_f$, $S_g$, and $\Sigma_h$ are consistently estimated by their sample analogues $\hat{S}_f$, $\hat{S}_g$, and $\hat{\Sigma}_h$. Hence $\sigma^2$ is consistently estimated by
$$\hat{\sigma}^2 = (\hat{S}_f - \hat{S}_g)' \hat{\Sigma}_h (\hat{S}_f - \hat{S}_g).$$
Next we define the model selection statistic
$$CM_n = \frac{1}{\sqrt{n}} \, \frac{Q_n(\theta_n) - Q_n(\gamma_n)}{\hat{\sigma}}.$$
Theorem 4.2 applies to any Pearson type discrepancies, not only to those for which the corresponding statistics are asymptotically chi-square distributed under correct specification. Thus it provides us with a wide variety of simple directional and asymptotically standard normal tests for model selection based on Pearson type discrepancies. Specifically, given any Pearson type discrepancy of the general form $Q(\cdot) = V'(\cdot) M(h, \cdot) V(\cdot)$, one first estimates each competing model by the corresponding minimum chi-square estimator to obtain the goodness-of-fit statistics $Q_n(\theta_n)$ and $Q_n(\gamma_n)$. Then to select among the competing models, one computes the difference between these Pearson type statistics suitably normalized by $\sqrt{n}\,\hat{\sigma}$ and chooses a critical value $c$ from the standard normal distribution for some significance level. If the value of the normalized difference $CM_n$ is greater than $c$, then one rejects the null hypothesis of equivalence and chooses $G_\gamma$ over $F_\theta$ according to the discrepancy $Q(\cdot)$. If the value of $CM_n$ is less than $-c$, then one selects $F_\theta$ over $G_\gamma$. Lastly, if the value of $CM_n$ falls between $-c$ and $c$, then one concludes that there is not sufficient evidence to discriminate between the competing models $F_\theta$ and $G_\gamma$. Parts (ii) and (iii) ensure consistency of the test.
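The three-way decision rule just described can be sketched in a few lines of Python. The function names, the labels "F" and "G", the numerical inputs, and the use of a two-sided critical value at the stated level are all assumptions of this illustration; the text only requires that $c$ be taken from the standard normal distribution.

```python
import math
from statistics import NormalDist

def cm_statistic(q_f, q_g, n, sigma_hat):
    """Normalized difference (Q_n(theta_n) - Q_n(gamma_n)) / (sqrt(n) * sigma_hat)."""
    return (q_f - q_g) / (math.sqrt(n) * sigma_hat)

def select_model(cm, level=0.05):
    """Return 'G', 'F', or 'undecided' (critical value choice is an assumption)."""
    c = NormalDist().inv_cdf(1.0 - level / 2.0)
    if cm > c:
        return "G"           # F fits significantly worse, so G is chosen
    if cm < -c:
        return "F"           # G fits significantly worse, so F is chosen
    return "undecided"       # not enough evidence to discriminate

def p_value(cm):
    """Two-sided asymptotic p-value under the N(0, 1) approximation."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(cm)))

# Hypothetical inputs, not taken from the paper's tables.
cm = cm_statistic(q_f=30.0, q_g=10.0, n=400, sigma_hat=0.5)
decision = select_model(cm)
```

Because a larger Pearson type statistic indicates a worse fit, a large positive value of the statistic favors the second model, which is why the sign convention above returns "G" in that case.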
An important feature of our procedure is that one no longer has to choose a model in every case. In particular, the competing models and data may be such that there is not enough evidence in favor of either model. In addition, the same numerical difference in goodness-of-fit may be statistically significant for some competing models while insignificant for others. Alternatively, if one does not want to carry out a formal test, the (asymptotic) p-value of the statistic $CM_n$ can be used to provide information on the strength of the evidence in favor of one model over the other. Theorem 4.2 requires that the variance $\sigma^2$ be nonzero. As in Vuong (1989), this is violated if one competing model is nested in the other or if the competing models are overlapping and both contain $H$. In the nested case, the statistic $CM_n$ converges in probability to zero. For, when $F_\theta$ is nested in $G_\gamma$ (say), it can be shown, using Corollary 2.3, Lemma 3.1, and Lemma 3.2, that the unnormalized difference $Q_n(\theta_n) - Q_n(\gamma_n)$ is asymptotically distributed as a weighted sum of independent chi-squares with positive weights under the null hypothesis of equivalence $H_0^e$. This latter result contains some well-known properties such as the asymptotic $\chi^2_{q-k}$ distribution of the Pearson and modified Pearson statistics under the maintained hypothesis that the larger model $G_\gamma$ is correctly specified. See, e.g., Goodman (1970), Haberman (1978), Bishop, Fienberg, and Holland (1975).
5. An example
To illustrate our results, we consider the problem of choosing between an exponential model and a log-normal model. This problem has a long history in the statistical literature. See, e.g., Cox (1962) and Atkinson (1970) among others. See also Pesaran (1987). The log-normal distribution is parameterized by $\alpha = (\alpha_1, \alpha_2)$ and has density
$$f(x; \alpha_1, \alpha_2) = \frac{1}{x (2\pi)^{1/2} \alpha_2} \exp\left( -\frac{(\log x - \alpha_1)^2}{2 \alpha_2^2} \right) \quad \text{for } x > 0,$$
and zero otherwise. The exponential distribution with parameter $\beta$ has density
$$g(x; \beta) = \frac{1}{\beta} \exp(-x/\beta) \quad \text{for } x > 0,$$
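and zero otherwise. We use the Pearson statistic. The real line is partitioned into $M$ intervals $\{(c_{i-1}, c_i), i = 1, \ldots, M\}$, where $c_0 = 0$ and $c_M = +\infty$. The corresponding minimum chi-square estimators of $\alpha = (\alpha_1, \alpha_2)$ and $\beta$ are
$$\hat{\alpha} = (\hat{\alpha}_1, \hat{\alpha}_2) = \arg\min_{\alpha \in R^2} Q_n(\alpha) = \arg\min_{\alpha \in R^2} n \sum_{i=1}^{M} \frac{(f_i - p_i(\alpha))^2}{p_i(\alpha)},$$
$$\hat{\beta} = \arg\min_{\beta \in R} Q_n(\beta) = \arg\min_{\beta \in R} n \sum_{i=1}^{M} \frac{(f_i - p_i(\beta))^2}{p_i(\beta)},$$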
where $p_i(\alpha)$ and $p_i(\beta)$ are the probabilities of the interval $(c_{i-1}, c_i)$ under $f(x; \alpha)$ and $g(x; \beta)$, respectively. Given $\hat{\alpha}$ and $\hat{\beta}$, we can readily compute the goodness-of-fit statistics $Q_n(\hat{\alpha})$ and $Q_n(\hat{\beta})$. To test the null hypothesis that both models are equally close to the true distribution vs. the alternative that one model is closer than the other, we compute $\hat{\sigma}^2 = (\hat{S}_f - \hat{S}_g)' \hat{\Sigma}_h (\hat{S}_f - \hat{S}_g)$ using $\hat{\Sigma}_h = I_M - \hat{q}_h \hat{q}_h'$, $\hat{q}_h = (\sqrt{f_1}, \ldots, \sqrt{f_M})'$, and the vectors $\hat{S}_f$ and $\hat{S}_g$ as
because $M_n = I_M$ so that $M_* = I_M$. For simplicity, we have omitted the subscripts $f$ and $g$ pertaining to the competing models in the above formula. Then we simply refer to the model selection test statistic
$$CM_n = \frac{Q_n(\hat{\alpha}) - Q_n(\hat{\beta})}{\sqrt{n}\, \hat{\sigma}}$$
and use the standard normal tables. To assess the accuracy of our asymptotic results and the performance of our procedure in small samples, we conduct a limited Monte Carlo study. We consider various sets of experiments in which the data are generated from a mixture of an exponential distribution and a log-normal distribution. The latter two distributions are calibrated so that they have the same population means and variances, namely one and one. Hence the DGP has density
$$h(\pi) = \pi \, \text{Exponential}(1) + (1 - \pi) \, \text{Log-normal}(-0.3466, 0.8326),$$
where $\pi$ is specific to each set of experiments. In each set, several random samples are drawn from this mixture. The sample size varies from 100 to 1000, and for each sample size, the number of replications is 1000. Throughout, the chosen partition has four cells defined by the values $c_0 = 0$, $c_1 = 0.1$, $c_2 = 1.0$, $c_3 = 3.0$, and $c_4 = +\infty$. Because the log-normal distribution
has two parameters, four is the minimum number of cells for which a perfect fit is not always achieved when fitting this distribution by minimum chi-square methods. Moreover, the log-normal and exponential densities greatly differ around the origin. This motivates the choice of $c_1 = 0.1$. The value $c_2$ is equal to the common population mean, while $c_3$ is two standard deviations away from the mean so as to control for large deviations. Values for $\pi$ are 0.0, 1.0, 0.523, 0.25, and 0.75. Although our proposed model selection procedure does not require that the DGP belongs to either of the competing models, we consider the two limiting cases $\pi = 0.0$ and $\pi = 1.0$ for they correspond to the null hypotheses under test by the Pearson chi-square statistics $Q_n(\hat{\alpha})$ and $Q_n(\hat{\beta})$. The value $\pi = 0.523$ is the value for which the log-normal and exponential families are approximately at equal distance to the mixture $h(\pi)$ according to the Pearson discrepancy with the above cells. Thus this set of experiments corresponds approximately to the null hypothesis of our proposed model selection test $CM_n$. Lastly, to investigate the cases where both competing models are misspecified but not at equal distance from the DGP, we consider the cases where $\pi = 0.25$ and $\pi = 0.75$. These cases correspond to a DGP which is log-normal or exponential but slightly contaminated by the other distribution. The results of our five sets of experiments are presented in tables 1-5. The first half of each table gives the average values of the minimum chi-square estimators $\hat{\alpha}$ and $\hat{\beta}$, the Pearson goodness-of-fit statistics $Q_n(\hat{\alpha})$ and $Q_n(\hat{\beta})$, and the model selection statistic $CM_n$ with its estimated variance $\hat{\sigma}^2$. The values in parentheses are standard errors. The second half of each table gives in percentage the number of times our proposed model selection procedure based on $CM_n$ favors the log-normal model, favors the exponential model, and is indecisive. The tests are conducted at the 5% nominal significance level.
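The calibration of the DGP can be reproduced with a few lines of Python. The sketch below solves for the log-normal parameters that give unit mean and unit variance and then draws from the mixture; the function names, the seed, and the choice of $\pi = 0.523$ and 1000 draws are illustrative assumptions rather than a description of the original simulation code.

```python
import math
import random

def calibrate_lognormal(mean, variance):
    """Log-mean and log-standard-deviation giving the requested mean and variance."""
    sigma2 = math.log(1.0 + variance / mean ** 2)
    mu = math.log(mean) - sigma2 / 2.0
    return mu, math.sqrt(sigma2)

def draw_mixture(pi, mu, sigma, rng):
    """One draw: Exponential(1) with probability pi, log-normal otherwise."""
    if rng.random() < pi:
        return rng.expovariate(1.0)
    return rng.lognormvariate(mu, sigma)

# Unit mean and unit variance, matching the Exponential(1) component.
mu, sigma = calibrate_lognormal(1.0, 1.0)

rng = random.Random(12345)   # arbitrary seed, for reproducibility of the sketch only
sample = [draw_mixture(0.523, mu, sigma, rng) for _ in range(1000)]
```

The calibrated values agree with the parameters quoted for the log-normal component, with the second parameter being the standard deviation of the logarithm, the square root of log 2.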
Each table also gives the number of times a procedure based on the pair of Pearson chi-square goodness-of-fit tests chooses the log-normal model, rejects both models, accepts both models, and chooses the exponential model. Again, the nominal significance level of each goodness-of-fit test is set at 5%. In the first two sets of experiments ($\pi = 0.0$ and $\pi = 1.0$) where one model is correctly specified, we use the labels 'correct' and 'incorrect' when a choice is made. Finally, in the case where $\pi = 0.523$, we give in addition the 2.5%, 5.0%, 95%, and 97.5% fractiles of the observed distribution of the $CM_n$ statistic. This allows a comparison with the asymptotic $N(0, 1)$ approximation under our null hypothesis of equivalence. The first halves of tables 1-5 confirm our asymptotic results. They all show that the minimum chi-square estimators $\hat{\alpha}_1$, $\hat{\alpha}_2$, and $\hat{\beta}$ converge rapidly to their pseudo-true values in the misspecified cases and to their true values in the correctly specified cases as the sample size increases. The estimated variance $\hat{\sigma}^2$ stabilizes quickly starting from sample size 250. With the exception of table 1 and table 2, respectively, $Q_n(\hat{\alpha})$ and $Q_n(\hat{\beta})$ increase approximately at the rate of $n$, as expected when the models are misspecified. In table 1, $Q_n(\hat{\alpha})$ converges to
D
= Log-normal
(-
0.3366.
IO0 ~ 0.3245 (0.1032)
0.X3261. I000 -- 0.3467 (0.0354)
o.x4so (0.0975)
0.x.37-1 (0.02X.3)
0.9h I x (0.1001J
0.9349 10.0335)
0.0x32 (0.07021
0.07 l-1 10.02391
NY3 (1.435)
0.Y7 (1.319)
I I .OY (‘.X34)
IOO.5 (X.472)
I.6 IO (3.3OYl
11.45 (3.103)
0.0” 0 I I .11”,, X7.X”,,
0.0” 0 0.0” II I OO.O”~,~ 0.0” 0 5.?“,, 0.0” 0 94.7” II
1, which is the mean of its asymptotic chi-square distribution with one (4 - 1 - 2) degree of freedom when the log-normal model is correctly specified. Similarly, in table 2, $Q_n(\hat{\beta})$ converges to 2, which is the mean of its asymptotic chi-square distribution with two (4 - 1 - 1) degrees of freedom when the exponential model is correct. With respect to our statistic $CM_n$, it diverges to $-\infty$ or $+\infty$ at the approximate rate of $\sqrt{n}$ except in table 3. In the latter case the $CM_n$ statistic converges, as expected, to zero, which is the mean of the asymptotic $N(0, 1)$ distribution under our null hypothesis of equivalence. Turning to the second halves of tables 1 and 2, we first note that the percentage of rejection of both models when using two Pearson chi-square tests converges approximately to 5%. This is due to the nominal significance level of 5% for these tests. As a consequence, a model selection procedure based on a pair of Pearson statistics cannot achieve 100% of correct choices when one model is correctly specified even when the sample size increases. In contrast, the model selection procedure based on our statistic $CM_n$ does not have this property, and the percentage of correct choices steadily increases with sample size, attaining 100% of correct choices starting from $n = 500$ when the log-normal is correct.
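The competing procedure based on a pair of Pearson goodness-of-fit tests can be sketched as follows. The degrees of freedom are those stated above (one for the log-normal fit and two for the exponential fit with four cells); the function names, the closed-form critical values for one and two degrees of freedom, and the example inputs are assumptions of this illustration.

```python
import math
from statistics import NormalDist

def chi2_critical(df, level=0.05):
    """Upper critical value of a chi-square distribution with 1 or 2 degrees of freedom."""
    if df == 1:
        # A chi-square with 1 df is the square of a standard normal.
        return NormalDist().inv_cdf(1.0 - level / 2.0) ** 2
    if df == 2:
        # A chi-square with 2 df has survival function exp(-x / 2).
        return -2.0 * math.log(level)
    raise ValueError("only df = 1 or df = 2 are handled in this sketch")

def pairwise_choice(q_lognormal, q_exponential, level=0.05):
    """Outcome of running the two goodness-of-fit tests side by side."""
    reject_lognormal = q_lognormal > chi2_critical(1, level)      # 4 - 1 - 2 = 1 df
    reject_exponential = q_exponential > chi2_critical(2, level)  # 4 - 1 - 1 = 2 df
    if reject_lognormal and reject_exponential:
        return "reject both"
    if not reject_lognormal and not reject_exponential:
        return "accept both"
    return "exponential" if reject_lognormal else "log-normal"
```

When one model is correctly specified, its own test still rejects with probability close to the nominal level, so the "reject both" outcome cannot vanish as the sample size grows; this is the mechanism behind the roughly 5% figure discussed above.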
Table 2
DGP = Exponential(1.0)

n                     100                250                500                1000

$\hat\gamma_1$        -0.5755 (0.1382)   -0.5569 (0.0822)   -0.5549 (0.0610)   -0.5499 (0.0423)
$\hat\gamma_2$        1.248 (0.0974)     1.232 (0.0644)     1.230 (0.0465)     1.226 (0.0336)
$\hat\beta$           0.9915 (0.1134)    0.9965 (0.0684)    0.9965 (0.0503)    0.9981 (0.0354)
$\hat\sigma^2$        0.252 (0.1124)     0.230 (0.0394)     0.223 (0.0217)     0.220 (0.0122)
$Q_n(\hat\gamma)$     6.47 (4.614)       14.64 (7.393)      28.43 (10.52)      55.99 (15.28)
$Q_n(\hat\beta)$      2.01 (2.168)       2.03 (2.147)       2.06 (2.140)       2.10 (2.232)
$CM_n$                0.9766 (1.009)     1.706 (1.041)      2.524 (1.049)      3.652 (1.086)

Model selection based on $CM_n$
Incorrect             0.2%               0.0%               0.0%               0.0%
Indecisive            82.6%              60.5%              31.1%              5.4%
Correct               17.2%              39.5%              68.9%              94.6%

Chi-square specification test
Incorrect             2.3%               1.5%               0.0%               0.0%
Reject both           3.0%               4.5%               6.3%               5.7%
Accept both           32.0%              1.1%               0.0%               0.0%
Correct               62.7%              93.2%              93.7%              94.3%
On the other hand, tables 1 and 2 also show that in small samples the model selection procedure based on a pair of chi-square statistics dominates our $CM_n$ procedure at the risk, however, of a larger percentage of incorrect decisions. The preceding comments for the second halves of tables 1 and 2 also apply to the second halves of tables 4 and 5. An important difference, however, is that the percentages of rejection of both models when using a pair of chi-square statistics no longer converge to 5%. This is so because both models are now incorrectly specified so that, as sample size increases, this percentage of rejection will approach 100%. As a consequence, the percentage of correct choices using a pair of chi-square statistics approaches 0%. In contrast, the percentage of correct choices using the $CM_n$ statistic steadily increases and ultimately converges to 100%. The second half of table 3 confirms, in small samples, the relative domination of the model selection procedure based on a pair of chi-square statistics over the $CM_n$ model selection procedure in percentages of correct decisions, at the risk, however, of a higher rate of incorrect decisions. Table 3 also confirms our asymptotic results. As sample size increases, the percentage of rejection of both
Table 3
DGP = 0.523 * Exponential(1.0) + 0.477 * Log-normal(-0.347, 0.833)

n                     100                250                500                1000

$\hat\gamma_1$        -0.4662 (0.1262)   -0.4735 (0.0746)   -0.4749 (0.0547)   -0.4790 (0.0399)
$\hat\gamma_2$        1.058 (0.0984)     1.064 (0.0597)     1.066 (0.0447)     1.0663 (0.0305)
$\hat\beta$           0.9867 (0.1094)    0.9814 (0.0660)    0.9803 (0.0466)    0.9767 (0.0349)
$\hat\sigma^2$        0.217 (0.0440)     0.225 (0.0200)     0.229 (0.0126)     0.2314 (0.0071)
$Q_n(\hat\gamma)$     3.14 (3.046)       6.56 (4.928)       12.41 (6.703)      24.27 (9.449)
$Q_n(\hat\beta)$      3.94 (2.790)       7.58 (4.109)       13.22 (5.903)      24.51 (7.572)
$CM_n$                -0.264 (1.213)     -0.167 (1.068)     -0.095 (1.059)     -0.0261 (0.9823)

Fractiles of $CM_n$
2.5% fractile         -2.870             -2.395             -2.177             -2.085
5.0% fractile         -2.354             -1.954             -1.889             -1.664
95.0% fractile        1.473              1.4634             1.6366             1.5235
97.5% fractile        1.8191             1.7586             1.8433             1.9775

Model selection based on $CM_n$
Favor log-n.          8.1%               4.9%               3.9%               3.0%
Equivalent            90.0%              93.5%              94.1%              94.4%
Favor exp.            1.9%               1.6%               2.0%               2.6%

Chi-square specification test
Favor log-n.          17.5%              31.1%              7.8%               0.2%
Reject both           3.0%               30.7%              81.9%              99.8%
Accept both           51.0%              3.9%               0.0%               0.0%
Favor exp.            28.5%              34.3%              10.3%              0.0%
models using a pair of chi-square statistics converges, as it should, to 100% because both models are misspecified. On the other hand, the percentage of times the models are deemed equivalent according to the $CM_n$ statistic converges to 95% (1 - the nominal size of our test). Moreover, the displayed fractiles show good agreement with the asymptotic $N(0, 1)$ distribution, especially for $n = 1000$. In small samples, however, the $CM_n$ test seems to reject the null hypothesis of equivalence too often. Hence the actual size of the test is somewhat higher than the nominal size of 5%. It also seems that the actual distribution of the $CM_n$ statistic is slightly skewed towards the log-normal model. Small-sample adjustments to $CM_n$ may improve the asymptotic $N(0, 1)$ approximation. Finally, we note that it is easier to select the log-normal model when this model is correct or closer to the DGP than to select the exponential model when
Q.H. Vuong and W. Wang, Model selection tests
Table 4
DGP = 0.25 * Exponential(1.0) + 0.75 * Log-normal(-0.347, 0.833)

n                     100                250                500                1000

$\hat\gamma_1$        -0.4150 (0.1118)   -0.4285 (0.0727)   -0.4209 (0.0538)   -0.4237 (0.0374)
$\hat\gamma_2$        0.9556 (0.0982)    0.967 (0.0584)     0.9616 (0.0429)    0.9604 (0.0288)
$\hat\beta$           0.9645 (0.0991)    0.9587 (0.0647)    0.9598 (0.0482)    0.9556 (0.0339)
$\hat\sigma^2$        0.1791 (0.0728)    0.1909 (0.0410)    0.1918 (0.0318)    0.1936 (0.0212)
$Q_n(\hat\gamma)$     1.64 (2.159)       3.03 (3.164)       5.22 (4.287)       9.38 (5.766)
$Q_n(\hat\beta)$      6.91 (3.183)       15.09 (4.915)      29.40 (7.170)      57.84 (9.283)
$CM_n$                -1.630 (1.738)     -1.909 (1.353)     -2.592 (1.345)     -3.553 (1.196)

Model selection based on $CM_n$
Favor log-n.          30.4%              44.3%              66.7%              91.5%
Equivalent            69.5%              55.7%              33.3%              8.5%
Favor exp.            0.1%               0.0%               0.0%               0.0%

Chi-square specification test
Favor log-n.          54.2%              69.8%              45.2%              16.6%
Reject both           4.1%               28.2%              54.8%              83.4%
Accept both           33.4%              0.2%               0.0%               0.0%
Favor exp.            8.3%               1.8%               0.0%               0.0%
the latter model is correct or closer to the DGP. This is because the log-normal model has one more parameter than the exponential model. For instance, in table 2, the $CM_n$ test detects the correct model only 17.2% of the time when $n = 100$. This percentage increases very gradually to 68.9% when the sample size reaches 500. A penalty for additional parameters may be incorporated into the test statistic $CM_n$ as a small-sample adjustment.
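The three-way outcome reported in the tables (favor one model, declare equivalence, favor the other) can be sketched in Python as follows. This is only an illustration: the normalization $CM_n = [Q_n(\hat\gamma) - Q_n(\hat\beta)] / (\sqrt{n}\,\hat\sigma)$ is an assumption that appears consistent with the table means, and the function names and labels are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def cm_statistic(q_lognormal, q_exponential, n, sigma2_hat):
    """Assumed form: (Q_n(gamma_hat) - Q_n(beta_hat)) / (sqrt(n) * sigma_hat)."""
    return (q_lognormal - q_exponential) / (sqrt(n) * sqrt(sigma2_hat))


def cm_decision(cm, level=0.05):
    """Two-sided comparison of CM_n with standard normal critical values."""
    crit = NormalDist().inv_cdf(1 - level / 2)
    if cm > crit:
        # The log-normal fit is significantly worse than the exponential fit.
        return "favor exponential"
    if cm < -crit:
        # The exponential fit is significantly worse than the log-normal fit.
        return "favor log-normal"
    return "equivalent"
```

Plugging in the table 2 means for $n = 1000$ (55.99, 2.10, and 0.220) gives a value of about 3.63, close to the reported mean of 3.652; the two need not coincide because a ratio of means is not the mean of the ratios.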
6. Extensions and conclusion

In this paper, we investigated the problems of model selection using Pearson type statistics. Specifically, we proposed some asymptotically standard normal tests for model selection based on Pearson type statistics that use the corresponding minimum chi-square estimators. Our tests are based on testing whether the competing models are equally close to the true distribution against the alternative hypotheses that one model is closer than the other where closeness of a model is measured according to the discrepancy implicit in the Pearson type statistics used.
I wo
100
30
0.5006 (0. I xx I
~ O.jl IO (0.0785)
0,s Ihh 10.04 I I1 l.l4Y (O.Oil
I. I MO (O.OYY71
I.142 (0.062
I .0001 (0. I 1001
0.9924 (O.OhhO)
O.YX87 (0.0354)
0.724’) (0.03751
0.2~3X (O.OI 7’))
0.7247 (0.0077)
4.56 (3.XSYI
IO.22 (6.21
2.43 (Z.lJY)
3.40 (2.7401
7.75 (3 5651
0.4336 ( 1.065)
0.9 I70 ( I .O’I 1
3.0464 (O.YX IO)
O.O’“a X3.?“<, 15 7”<,
0 0” 0 17.5%, 52.5” 0
I
II
)
3)
3x 36 II 1.73)
X 3 “~I, X.0”;, 6.4” I, 77.3” 0
Our work can be extended in several directions. One direction is to extend our results to econometric models. In these models, only the conditional distribution of the endogenous variables $y$ given the exogenous variables $z$ is specified to belong to a conditional parametric model $f(y \mid z; \theta)$. Because the marginal distribution of the exogenous variables is left unspecified, one cannot associate a joint distribution for the observed data $(y_i, z_i)$ to a given parameter value $\theta$. Hence, when the sample space $X = Y \times Z$ is partitioned into mutually disjoint cells, the expected probability in each cell cannot be evaluated. This expected probability can, however, be estimated consistently by substituting the empirical marginal distribution for the true marginal distribution of $z$ [see Andrews (1988a)], i.e., we can use

where $\nu(y)$ is some $\sigma$-finite measure on $Y$. Given these 'expected' cell frequencies, chi-square type statistics can be constructed and minimum chi-square estimators can be defined. The asymptotic standard normality of our proposed tests for model selection is expected to remain.

A second extension is to use random instead of fixed cells. Random cells arise when the boundaries of each cell $C_i$ depend on some unknown parameter vector $r$ that must be estimated. See, e.g., Andrews (1988b) for various examples. Random cells can be useful. For instance, with appropriate random cells, the asymptotic distribution of a Pearson type statistic may become independent of the true parameter $\theta_0$ under correct specification. See, e.g., Roy (1956) and Watson (1959). Recent work has shown that the asymptotic distributions of Pearson type statistics remain unchanged under correct specification with random cells as long as the cell boundaries converge in probability to a set of fixed boundaries. See Chibisov (1971), Moore and Spruill (1975), and Andrews (1988a). In view of this latter result, it is expected that our model selection test statistic will remain asymptotically normally distributed with the same asymptotic variance $\sigma^2$.

Finally, because the minimum chi-square estimator is in practice not readily obtained, alternative simpler estimators such as the ML estimator on ungrouped data are frequently used when forming the statistic $Q_n(\hat\theta_n)$. See Chernoff and Lehmann (1954), Moore and Spruill (1975), Moore (1977, 1978), and Andrews (1988a). Thus it will be useful to extend our current results to such situations, in particular by allowing a broad class of alternative estimators in forming goodness-of-fit and model selection statistics. This is pursued in Vuong and Wang (1991). Moreover, since different cell choices lead to different implicit null hypotheses, it will be interesting to examine the sensitivity of the various tests to different cell choices.
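For the first extension above, one form of the estimated cell probability that matches the verbal description (empirical marginal of $z$ substituted for the true one, integration over $y$ with respect to $\nu$) is sketched below. The symbol $\hat p_i(\theta)$, the indicator notation, and the cell index range $i = 1, \ldots, M$ are assumptions made for illustration and are not taken from the source.

```latex
\hat p_i(\theta)
  = \frac{1}{n} \sum_{j=1}^{n} \int_{Y}
    1\{(y, z_j) \in C_i\}\, f(y \mid z_j; \theta)\, d\nu(y),
  \qquad i = 1, \ldots, M .
```

Under this sketch, each observed $z_j$ contributes the conditional probability, under the model, that $(y, z_j)$ falls in cell $C_i$, and these contributions are averaged over the sample.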
Appendix
This is basically
a Taylor
1 pgj
expansion
of Q,(fI,,) around
1 =
- 2( q42
p:;2
Pm
-
P*i)
+
t"ptlL
where pi,, = pi(l),,), and
&=n
I,2 A~-___ - Pin (
x2
P*i
.1; - PinAp, + ~O,(l) 2py ‘” n >’
0,. Note that
’
. II Q,,(f),, 1= i
. II
Regrouping terms and substituting, Theorem 2.1 follows after tedious algebra.
This follows from Theorem 2. I when we notice that h = p* implies that /J = 0, A, = 0 and A3 = 0. D, = I.,,, and Ds3 = 0. Consequently.
This follows from Theorem o,,l I ) (small 0).
2.1. since both ,+I2and -3, divided
by ~ II become
Proof of Lemma 3.1

From the first-order condition of $Q_n(\hat\theta_n)$ and a Taylor expansion of $\partial Q_n(\hat\theta_n)/\partial\theta$ around $\theta_*$, we obtain

(A.1)
where $\bar\theta_n$ lies between $\hat\theta_n$ and $\theta_*$. We rewrite $Q_n(\theta)$ and $Q(\theta)$ as
PUW’~(.L @IL- ~(~11,
Q,(O) = KXO)M(,f; 0) v,(O) = nCf-Q(d) = Ch - p(Wfi(h,
O)Ch - P(~)l,
where G(,fi 0) = diag[ . . . , p; 1’2(O), . ..]M(.f. M(K, H) is similarly defined. Taking derivative
aQn(@ _ 2nww -= F M(,f, QCfaBi
B)diag[ , pi ‘12(0), . ..] and of Q,,(O) with respect to Oi:
PWI
I
+
Expanding
P(waMi;~ , O) Cf- PWI.
n[f--
fi(,f; (9) and afi(,f,
ti(J
0) =
fi(h,
B)/aOi around
cM afig
0) +
0)
ah,
j=
an;i(,f,0) afivb fl) aei = aOi +
1
(h, 6), we obtain
(f;. -
(A.3)
hj),
I
M aSi(h,q
i:l
(A.21
(J;. - hj).
aQ.ah. 1
J
Substituting (A.3) and (A.4) into (A.2) and after tedious following expression:
= - 2z$*J,(f.L
+
h) - 2%
2(h - p,)‘--‘f$
+(h-p,)‘C jE
M 1
I
1 aQn(o*)
$U-
,-1
algebra,
- p,)Jn(.fi
$(h
we get the
- hj)
J
h)
a32 --*(h aOi ahj
for all i = I, 2, . . , k. Stacking
$TG-=-
,i I
(A.4)
-
P*)Jn(.fi
all Oi together,
Z&/n(f-
-
lzj) +
o,(l),
we get
h) + o,(l) = - 2i7DUn + o,(l).
(A.3
Furthermore.
Substituting
we have
(A.5) and (A.6) into (A.l), we
= -
[2Q +
o,,( I )]
’[~
2i?DU,,+ o,,( I )]
$= Q^{-1}BDU_n + o_p(1)$.

Proof of Lemma 3.2

This is just the Taylor expansion of $M_n = M(f; \hat\theta_n)$ around $(h, \theta_*)$.
Note that $\Sigma_h$ is idempotent. Part (i) follows from Corollary 2.2, Lemma 3.1, the consistency of $M(f; \hat\theta_n)$ to $M(h, \theta_*)$, and Moore (1976, thm. 1). Part (ii) is immediate.
That $Q_n(\hat\theta_n)$ is asymptotically chi-square distributed under $H_0$ if and only if d is idempotent follows from Theorem 3.3, Rao and Mitra (1971, thm. 9.2.1), and the fact that $U_n \to N(0, \Sigma_h)$, where $\Sigma_h$ is idempotent. To complete the proof, it suffices to show

$$\mathrm{rank}\{[M_* - M_*B(B'M_*B)^{-1}B'M_*]\Sigma_h\} = M - k - 1. \qquad (A.7)$$

Let $Z = M_*^{1/2}B$, $Y = M_*^{1/2}\Sigma_h$, and $\Gamma = [I - Z(Z'Z)^{-1}Z']Y$. Since $M_*$ is positive definite by assumption A.6, eq. (A.7) is equivalent to $\mathrm{rank}\,\Gamma = M - k - 1$. We first show that $\mathrm{Ker}[I - Z(Z'Z)^{-1}Z'] \subset \mathrm{Im}[Y]$, where Ker[.] and Im[.] are the null space and image space of [.], respectively. Since $\Sigma_h = I - q_hq_h'$ and $\mathrm{Ker}[I - Z(Z'Z)^{-1}Z']$ is the vector space generated by the columns of $Z$, we have to show that $\forall \lambda \in R^k$, $\exists \mu \in R^M$ such that $Z\lambda = M_*^{1/2}(I - q_hq_h')\mu$. This is equivalent to $B\lambda = (I - q_hq_h')\mu$. Taking $\mu = B\lambda$, then $\mu - q_hq_h'\mu = B\lambda - q_hq_h'B\lambda = B\lambda$ because $q_h'B = 0$. Thus $\mathrm{Ker}[I - Z(Z'Z)^{-1}Z'] \subset \mathrm{Im}[Y]$. Now $\dim(\mathrm{Im}[Y]) = \mathrm{rank}[M_*^{1/2}(I - q_hq_h')] = \mathrm{rank}(I - q_hq_h') = M - 1$. It follows that $\mathrm{rank}(\Gamma) = \mathrm{rank}[M_*^{1/2}(I - q_hq_h')] - \dim(\mathrm{Ker}[I - Z(Z'Z)^{-1}Z']) = M - k - 1$, which establishes (A.7).
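The rank claim in (A.7) can be checked numerically. The following sketch uses NumPy and rests on stated assumptions rather than on the paper's own construction: the idempotent covariance is taken to be $I - qq'$ with $q'q = 1$, the matrix $B$ is a random full-column-rank matrix forced to satisfy $q'B = 0$, and the weight matrix is a random positive definite matrix. All variable names are hypothetical.

```python
import numpy as np


def rank_check(n_cells=6, k=2, seed=0):
    """Numerical rank of [M - M B (B'MB)^{-1} B'M] (I - qq') for random inputs."""
    rng = np.random.default_rng(seed)
    h = rng.dirichlet(np.ones(n_cells))   # cell probabilities summing to one
    q = np.sqrt(h)                        # unit-length vector, q'q = 1
    B = rng.standard_normal((n_cells, k))
    B -= np.outer(q, q @ B)               # project out q so that q'B = 0
    R = rng.standard_normal((n_cells, n_cells))
    M_star = R @ R.T + n_cells * np.eye(n_cells)   # positive definite weight
    sigma = np.eye(n_cells) - np.outer(q, q)       # idempotent, rank n_cells - 1
    A = M_star - M_star @ B @ np.linalg.solve(B.T @ M_star @ B, B.T @ M_star)
    return np.linalg.matrix_rank(A @ sigma)
```

With six cells and two parameters the computed rank should be 6 - 2 - 1 = 3, since the null space of the product is spanned by the columns of $B$ together with $q$.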
Proof of Lemma 4.1
Substituting $\sqrt{n}(\hat\theta_n - \theta_*)$ in Lemma 3.2 and $U_n$ in Lemma 3.3 into Corollary 2.4, the desired result follows.
Proof of Theorem 4.2
From Lemma 4.1,
Under
H,: h;.M,.,h,
= hiM,,h,,
which converges to $N(0, \sigma^2)$. Since $\hat\sigma \to \sigma$ a.s., $CM_n \xrightarrow{D} N(0, 1)$. This proves (i). Part (ii) follows easily.
References

Akaike, H., 1973, Information theory and an extension of the likelihood ratio principle, in: B.N. Petrov and F. Csaki, eds., Proceedings of the second international symposium of information theory (Akademiai Kiado, Budapest) 267-281.
Akaike, H., 1974, A new look at the statistical model identification, IEEE Transactions on Automatic Control AC-19, 716-723.
Amemiya, T., 1972, Bivariate probit analysis: Minimum chi-square methods, Journal of the American Statistical Association 69, 940-944.
Amemiya, T., 1976, The maximum likelihood, the minimum chi-square and the non-linear weighted least squares in the general qualitative regression model, Econometrica 45, 955-968.
Amemiya, T., 1985, Advanced econometrics (Harvard University Press, Cambridge, MA).
Andrews, D.W.K., 1988a, Chi-square diagnostic tests for econometric models: Theory, Econometrica 56, 1419-1453.
Andrews, D.W.K., 1988b, Chi-square diagnostic tests for econometric models: Introduction and applications, Journal of Econometrics 37, 135-156.
Atkinson, A.C., 1970, A method for discriminating between models, Journal of the Royal Statistical Society B 32, 323-353.
Bishop, Y.M.M., S.E. Fienberg, and P.W. Holland, 1975, Discrete multivariate analysis (MIT Press, Cambridge, MA).
Birch, M.W., 1964, A new proof of the Pearson-Fisher theorem, Annals of Mathematical Statistics 35, 817-824.
Chernoff, H. and E.L. Lehmann, 1954, The use of maximum likelihood estimates in $\chi^2$ tests for goodness of fit, Annals of Mathematical Statistics 25, 579-586.
Chibisov, D.M., 1971, Certain chi-square type tests for continuous distributions, Theory of Probability and Its Applications 16, 1-22.
Cox, D.R., 1961, Tests of separate families of hypotheses, Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability 1, 105-123.
Cox, D.R., 1962, Further results on tests of separate families of hypotheses, Journal of the Royal Statistical Society B 24, 406-424.
Fisher, R.A., 1924, The condition under which $\chi^2$ measures the discrepancy between observation and hypothesis, Journal of the Royal Statistical Society 87, 422-450.
Goodman, L.A., 1970, The multivariate analysis of qualitative data: Interactions among multiple classifications, Journal of the American Statistical Association 65, 226-256.
Haberman, S.J., 1978, Analysis of qualitative data (Academic Press, New York, NY).
Heckman, J.J., 1984, The $\chi^2$ goodness of fit statistic for models with parameters estimated from microdata, Econometrica 52, 1543-1547.
Horowitz, J.L., 1985, Testing probabilistic discrete choice models of travel demand by comparing predicted and observed aggregate choice shares, Transportation Research B 19, 17-38.
Kullback, S. and R.A. Leibler, 1951, On information and sufficiency, Annals of Mathematical Statistics 22, 79-86.
McFadden, D., 1974, Conditional logit analysis of qualitative choice behavior, in: P. Zarembka, ed., Frontiers in econometrics (Academic Press, New York, NY).
Moore, D.S., 1977, Generalized inverses, Wald's method and the construction of chi-squared tests of fit, Journal of the American Statistical Association 72, 131-137.
Moore, D.S., 1978, Chi-square tests, in: R.V. Hogg, ed., Studies in statistics, Vol. 19 (The Mathematical Association of America).
Moore, D.S., 1984, Measures of lack of fit from tests of chi-squared type, Journal of Statistical Planning and Inference 10, 151-166.
Moore, D.S., 1986, Tests of chi-squared type, in: R.B. D'Agostino and M.A. Stephens, eds., Goodness-of-fit techniques (Marcel Dekker, New York, NY) 63-95.
Moore, D.S. and M.C. Spruill, 1975, Unified large-sample theory of general chi-squared statistics for tests of fit, Annals of Statistics 3, 599-616.
Nerlove, M. and S.J. Press, 1980, Multivariate log-linear probability models in econometrics, Discussion paper no. 47 (Center for Statistics and Probability, Northwestern University, Evanston, IL).
Pearson, K., 1900, On the criterion that a given system of deviation from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling, The London, Edinburgh and Dublin Philosophical Magazine and Journal of Science 50, 157-175.
Pesaran, M.H., 1987, Global and partial non-nested hypotheses and asymptotic local power, Econometric Theory 3, 67-97.
Rao, C.R. and S.K. Mitra, 1971, Generalized inverse of matrices and its applications (Wiley, New York, NY).
Rao, K.C. and D.S. Robson, 1974, A chi-square statistic for goodness-of-fit tests within the exponential family, Communications in Statistics 3, 1139-1153.
Roy, A.R., 1956, On $\chi^2$ statistics with variable intervals, Technical report no. 1 (Stanford University, Department of Statistics, Stanford, CA).
Sawa, T., 1978, Information criteria for discriminating among alternative regression models, Econometrica 46, 1273-1291.
Tauchen, G.E., 1985, Diagnostic testing and evaluation of maximum likelihood models, Journal of Econometrics 30, 415-443.
Vuong, Q.H., 1989, Likelihood ratio tests for model selection and non-nested hypotheses, Econometrica 57, 257-306.
Vuong, Q.H. and W. Wang, 1991, Selecting estimated models using chi-square statistics, Discussion paper no. 9165 (Center for Economic Research, Tilburg University, The Netherlands).
Watson, G.S., 1959, Some recent results in chi-square goodness-of-fit tests, Biometrics 15, 440-468.
White, H., 1982, Maximum likelihood estimation of misspecified models, Econometrica 50, 1-25.
White, H. and I. Domowitz, 1984, Non-linear regression with dependent observations, Econometrica 52, 143-161.