Journal of Statistical Planning and Inference 139 (2009) 1012 -- 1022
Testing for and against a set of inequality constraints: The k-sample case

Hammou Elbarmi (a,*,1), Vicente Núñez-Antón (b), Dale L. Zimmerman (c)

a Department of Statistics and CIS, Baruch College, City University of New York, USA
b Departamento de Econometría y Estadística (E.A. III), Universidad del País Vasco, Spain
c Department of Statistics and Actuarial Science, University of Iowa, USA

* Corresponding author. E-mail address: [email protected] (H. Elbarmi).
1 Formerly El Barmi.
Article history: Received 29 June 2006; received in revised form 16 April 2008; accepted 15 June 2008; available online 25 June 2008.

Keywords: Chi-bar square; Inequality constraints; Lagrange multipliers; Likelihood ratio; Orthant probabilities

Abstract
In this paper we generalize the existing results and provide a test for testing for and against a set of inequality constraints placed upon any $k$ ($k \geq 1$) vectors of parameters corresponding to $k$ populations. The approach we take is the likelihood ratio approach. We show that the limiting distributions of the likelihood ratio tests are of chi-bar square type, and we provide the expressions of the weighting values. Examples are discussed to illustrate the theoretical results. © 2008 Elsevier B.V. All rights reserved.
1. Introduction

A commonly occurring problem in statistics is that of deciding, on the basis of a number of independent observations of some random variable whose distribution is indexed by a finite-dimensional parameter $\theta$, whether or not $\theta$ belongs to some proper subset $\Theta_0$ of the set of possible values of $\theta$. This problem has received considerable attention, and a popular method for deciding the issue is the likelihood ratio approach. When $\Theta_0 = \{\theta : h_j(\theta) = 0,\ j = 1, 2, \ldots, c\}$ for some known functions $h_j$, $j = 1, 2, \ldots, c$, Aitchison and Silvey (1958) showed that, under some regularity conditions, the likelihood ratio test statistic has, asymptotically, a chi-square distribution when testing $\theta \in \Theta_0$ against the unrestricted alternative. Their work was extended by El Barmi and Dykstra (1999) to testing $\theta \in \Theta_0$ against the alternative that $\theta \in \Theta_1 = \{\theta : h_j(\theta) \leq 0,\ j = 1, 2, \ldots, c\}$ and not in $\Theta_0$. In this case they showed that, under the null hypothesis, the test statistic has, asymptotically, a chi-bar square distribution, and they gave the expression of the weighting values. Similar results can be derived from those in Self and Liang (1987), who studied the asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. We note that Self and Liang considered large sample properties of the likelihood function when the true parameter value may be on the boundary of the parameter space. They obtained the asymptotic distributions of the maximum likelihood estimators and likelihood ratio test statistics by showing that this problem is asymptotically equivalent to that of estimating the restricted mean of a multivariate normal distribution and applying standard results of normal theory. Other tests equivalent to the likelihood ratio test for this situation are discussed in Chapter 4 of Silvapulle and Sen (2005). We also note that the problem of testing $\theta \in \Theta_1$ against the unrestricted alternative was considered by Wollan and Dykstra (1985).

In this paper we extend these results to the $k$-population case. Specifically, we assume that we have $k$ populations whose distribution functions $F_1, F_2, \ldots, F_k$ depend on $\theta_1, \theta_2, \ldots, \theta_k$, respectively, where $\theta_i \in \mathbb{R}^{r_i}$, $r_i \geq 1$.
We assume that $\{X_{i1}, X_{i2}, \ldots, X_{in_i}\}$, $i = 1, 2, \ldots, k$, are independent random samples from these populations and consider testing $H_0$ against $H_1 - H_0$ and $H_1$ against $H_2 - H_1$, where

$$H_0 : h_s(\theta_1, \ldots, \theta_k) = 0, \quad s = 1, 2, \ldots, c, \qquad (1.1)$$

$$H_1 : h_s(\theta_1, \ldots, \theta_k) \leq 0, \quad s = 1, 2, \ldots, c, \qquad (1.2)$$

and $H_2$ imposes no constraints on the parameter vectors. Here $c < r = \sum_{i=1}^{k} r_i$ and the function $h = (h_1, h_2, \ldots, h_c)$ is a well-behaved function from $\mathbb{R}^r$ into $\mathbb{R}^c$. A particular example of this concerns testing for equality of $k$ probability vectors corresponding to independent multinomials against the alternative that they are stochastically ordered. In this case $r_1 = r_2 = \cdots = r_k$; $\theta_1, \theta_2, \ldots, \theta_k$ are probability vectors and $h = (h_1^{(1)}, h_1^{(2)}, \ldots, h_1^{(k-1)}, h_2^{(1)}, h_2^{(2)}, \ldots, h_2^{(k-1)}, \ldots, h_{k-1}^{(1)}, h_{k-1}^{(2)}, \ldots, h_{k-1}^{(k-1)})$, where

$$h_i^{(j)}(\theta_1, \ldots, \theta_k) = \sum_{l=1}^{j} \theta_{i+1,l} - \sum_{l=1}^{j} \theta_{i,l} \quad \forall (i, j). \qquad (1.3)$$
The results here are therefore an extension of those obtained in the one-sample case to the $k$-sample case ($k \geq 1$). They represent a unified approach to testing for or against a set of inequality constraints placed upon $k \geq 1$ parameter vectors corresponding to $k$ distributions, on the basis of independent random samples with sample sizes that are not necessarily equal. The results here extend those obtained in the product multinomial case in El Barmi and Johnson (2006) and are based on the results in Aitchison and Silvey (1958).

The rest of the paper is organized as follows. In Section 2, we discuss some properties of the restricted maximum likelihood estimators, and in Section 3, we derive the limiting distributions of the test statistics of interest. Section 4 discusses some examples which illustrate the theoretical results. Some of the proofs are given in Appendix A. Throughout the paper we use $\nabla g(x, \theta)$ to denote the gradient of a function $g$ with respect to $\theta$ at $(x, \theta)$, and $\theta \in H$ to indicate that $\theta$ satisfies the constraints specified by the hypothesis $H$.

2. Estimation

Let $X_{i1}, X_{i2}, \ldots, X_{in_i}$ be a random sample of size $n_i$ from a population with distribution (density) function $F_i(\cdot, \theta_i)$ ($f_i(\cdot, \theta_i)$), $i = 1, 2, \ldots, k$. Let $\theta = (\theta_1^T, \theta_2^T, \ldots, \theta_k^T)^T$. Throughout the rest of the paper, we assume that $\lambda_i = \lim_{n \to \infty} n_i/n > 0$ for all $i$, where $n = \sum_{i=1}^{k} n_i$. The likelihood function is given by
$$L(\theta) = \prod_{i=1}^{k} L_i(\theta_i) = \prod_{i=1}^{k} \prod_{j=1}^{n_i} f_i(x_{ij}, \theta_i). \qquad (2.1)$$
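For concreteness — and purely as an illustrative assumption, since the paper does not tie (2.1) to any particular family — the following sketch evaluates the $k$-sample log-likelihood $\ln L(\theta) = \sum_i \sum_j \ln f_i(x_{ij}, \theta_i)$ for three hypothetical normal populations.

```python
# Sketch: the k-sample log-likelihood implied by (2.1), illustrated with
# k = 3 hypothetical normal populations (means theta_i, unit variance).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
theta = [0.0, 0.5, 1.0]                       # illustrative parameter values
sizes = [30, 50, 40]                          # unequal sample sizes n_i
samples = [rng.normal(t, 1.0, size=n) for t, n in zip(theta, sizes)]

# log L = sum over populations i and observations j of log f_i(x_ij, theta_i)
log_lik = sum(norm.logpdf(x, loc=t, scale=1.0).sum()
              for x, t in zip(samples, theta))
print(log_lik)
```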
In the discussion that follows we need to introduce various assumptions concerning $(F_1, F_2, \ldots, F_k)$ and $(h_1, h_2, \ldots, h_c)$. These assumptions are as follows:

1. For $i = 1, 2, \ldots, k$, the functions $\ln f_i(t, \theta_i)$ possess third-order partial derivatives in a neighborhood of $\theta_i^0$ and, if $\theta_i$ belongs to this neighborhood, then
$$\left| \frac{\partial \ln f_i(t, \theta_i)}{\partial \theta_{iu}} \right| < G_{1i}(t), \quad \left| \frac{\partial^2 \ln f_i(t, \theta_i)}{\partial \theta_{iu}\, \partial \theta_{iv}} \right| < G_{2i}(t), \quad \left| \frac{\partial^3 \ln f_i(t, \theta_i)}{\partial \theta_{iu}\, \partial \theta_{iv}\, \partial \theta_{iw}} \right| < G_{3i}(t)$$
for all $i$, $u$, $v$ and $w$, and $\int_{\mathbb{R}} G_{ji}(t)\, dF_i(t, \theta_i^0) < \infty$, $j = 1, 2, 3$.

2. For all $1 \leq s \leq c$, the function $h_s$ is convex and possesses second-order derivatives which are continuous in a neighborhood of $\theta^0$, and
$$\left| \frac{\partial^2 h_s(\theta_1, \theta_2, \ldots, \theta_k)}{\partial \theta_{iu}\, \partial \theta_{jv}} \right| < A,$$
a constant, for all $i$, $j$, $u$, $v$.
3. The matrix whose $(u, v)$th element is
$$\int_{\mathbb{R}} \frac{\partial}{\partial \theta_{iu}} \ln f_i(x, \theta_i^0)\, \frac{\partial}{\partial \theta_{iv}} \ln f_i(x, \theta_i^0)\, dF_i(x, \theta_i^0)$$
is nonsingular, and we denote its inverse by $B_i$.

4. Let $\tilde{B} = \mathrm{diag}[(1/\lambda_1)B_1, (1/\lambda_2)B_2, \ldots, (1/\lambda_k)B_k]$ and let $H_\theta$ be the $r \times c$ matrix $H_\theta = [\nabla h_1(\theta), \nabla h_2(\theta), \ldots, \nabla h_c(\theta)]$. We assume that $H \equiv H_{\theta^0}$ has full rank and define the matrices $P$, $Q$ and $R$ by
$$\begin{pmatrix} \tilde{B}^{-1} & -H \\ -H^T & 0 \end{pmatrix}^{-1} = \begin{pmatrix} P & Q \\ Q^T & R \end{pmatrix}. \qquad (2.2)$$

A direct computation of these matrices shows that, with $H_i$ denoting the $r_i \times c$ block of $H$ whose rows correspond to $\theta_i$,
$$R = -[H^T \tilde{B} H]^{-1} = -\left[ \sum_{i=1}^{k} \frac{1}{\lambda_i} H_i^T B_i H_i \right]^{-1},$$
$$Q = -\tilde{B} H [H^T \tilde{B} H]^{-1} = \begin{pmatrix} \dfrac{1}{\lambda_1} B_1 H_1 R \\ \dfrac{1}{\lambda_2} B_2 H_2 R \\ \vdots \\ \dfrac{1}{\lambda_k} B_k H_k R \end{pmatrix}$$
and $P = \tilde{B} - \tilde{B} H [H^T \tilde{B} H]^{-1} H^T \tilde{B} = (P_{ij})_{1 \leq i, j \leq k}$, where
$$P_{ii} = \frac{1}{\lambda_i} B_i + \frac{1}{\lambda_i^2} B_i H_i R H_i^T B_i, \qquad P_{ij} = P_{ji}^T = \frac{1}{\lambda_i \lambda_j} B_i H_i R H_j^T B_j.$$
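As a quick numerical sanity check of (2.2) — ours, not part of the paper — one can build a small toy instance, invert the bordered matrix directly, and compare with the closed forms above; all numerical values below are illustrative assumptions.

```python
# Sketch: verifying the block formulas for P, Q, R in (2.2) on a toy example
# (k = 2, r_1 = r_2 = 2, c = 1); B1, B2, H and the lambda_i are arbitrary.
import numpy as np

lam = np.array([0.4, 0.6])
B1 = np.array([[2.0, 0.5], [0.5, 1.0]])
B2 = np.array([[1.5, -0.2], [-0.2, 0.8]])
Btilde = np.block([[B1 / lam[0], np.zeros((2, 2))],
                   [np.zeros((2, 2)), B2 / lam[1]]])
H = np.array([[1.0], [0.0], [-1.0], [0.0]])    # r x c, full column rank

# invert the bordered matrix of (2.2) directly
top = np.hstack([np.linalg.inv(Btilde), -H])
bottom = np.hstack([-H.T, np.zeros((1, 1))])
M = np.linalg.inv(np.vstack([top, bottom]))
P, R = M[:4, :4], M[4:, 4:]

# closed forms stated in the text
R_closed = -np.linalg.inv(H.T @ Btilde @ H)
P_closed = Btilde - Btilde @ H @ np.linalg.inv(H.T @ Btilde @ H) @ H.T @ Btilde
print(np.allclose(R, R_closed), np.allclose(P, P_closed))   # True True
```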
If $\theta^0 = (\theta_1^{0T}, \ldots, \theta_k^{0T})^T \in H_0$ and if Assumptions 1–4 hold, it can be shown, using the same techniques as in Aitchison and Silvey (1958), that on a set in $\mathbb{R}^n$ whose probability measure with respect to $\prod_{i=1}^{k} \prod_{j=1}^{n_i} f_i(x_{ij}, \theta_i^0)$ tends to 1 as $n \to \infty$, the equations
$$\nabla \ln L(x, \theta) + H\lambda = 0, \qquad (2.3)$$
$$h_i(\theta) = 0, \quad i = 1, 2, \ldots, c, \qquad (2.4)$$
have a solution $(\hat{\theta}^{(0)T}, \hat{\lambda}^{(0)T})^T$, where $\hat{\lambda}^{(0)}$ is a vector of Lagrange multipliers and $\hat{\theta}^{(0)}$ maximizes $L(\theta)$ subject to $h_i(\theta) = 0$, $i = 1, 2, \ldots, c$. Moreover,
$$\sqrt{n}\,[\hat{\theta}^{(0)} - \theta^0] = \frac{1}{\sqrt{n}}\, P \nabla \ln L(\theta^0) + o_p(1) \qquad (2.5)$$
and
$$\frac{1}{\sqrt{n}}\,[\hat{\lambda}_1^{(0)}, \ldots, \hat{\lambda}_c^{(0)}]^T = \frac{1}{\sqrt{n}}\, Q^T \nabla \ln L(\theta^0) + o_p(1). \qquad (2.6)$$
Combining (2.5) and (2.6) gives the following theorem, which is a generalization of a result in Aitchison and Silvey (1958) for the one-sample case.

Theorem 2.1. Under $H_0$,
$$\sqrt{n}\left[ (\hat{\theta}^{(0)} - \theta^0)^T, \left(\tfrac{1}{n}\hat{\lambda}^{(0)}\right)^T \right]^T \xrightarrow{d} N(0, V), \qquad (2.7)$$
where
$$V = \begin{pmatrix} P & 0 \\ 0 & -R \end{pmatrix}. \qquad (2.8)$$
Proof. The proof follows from combining Eqs. (2.5) and (2.6), the fact that $(1/\sqrt{n})\nabla \ln L(\theta^0)$ converges in distribution to $N(0, \tilde{B}^{-1})$ under $H_0$, and the definition of $P$, $Q$ and $R$.

A consequence of Theorem 2.1 is that $-(1/n)\hat{\lambda}^{(0)T} R^{-1} \hat{\lambda}^{(0)}$ has, asymptotically, a chi-square distribution with $c$ degrees of freedom. It can be used to test $H_0$ against $H_2 - H_0$, as was shown in Aitchison and Silvey (1958) for the one-sample case.

3. Hypotheses testing

Next, we consider testing $H_0$ against $H_1 - H_0$ and $H_1$ against $H_2 - H_1$. If $\Lambda_{ij}$ denotes the likelihood ratio for testing $H_i$ against $H_j - H_i$, $(i, j) = (0, 1)$ or $(1, 2)$, the likelihood ratio approach rejects $H_i$ in favor of $H_j$ for large values of $\mathrm{LRT}_{ij} = -2 \ln \Lambda_{ij}$. In order to establish the distributions of the likelihood ratio test statistics for testing $H_0$ against $H_1 - H_0$ and $H_1$ against $H_2 - H_1$, we first consider testing $H_0$ against $H_{1:\Omega} - H_0$, where
$$H_{1:\Omega} : h_s(\theta_1, \ldots, \theta_k) = 0, \quad s \in \Omega, \qquad (3.1)$$
and $\Omega$ is a proper subset of $\{1, 2, \ldots, c\}$.

Under Assumptions 1–4, it can be shown, by using the same technique as in Aitchison and Silvey (1958), that the maximum likelihood estimator of $\theta$ under $H_{1:\Omega}$, $\hat{\theta}^{(0)}(\Omega) = (\hat{\theta}_1^{(0)T}(\Omega), \ldots, \hat{\theta}_k^{(0)T}(\Omega))^T$, exists. Moreover, if $\hat{\lambda}^{(0)}(\Omega)$ is the vector of Lagrange multipliers corresponding to maximizing (2.1) under $H_{1:\Omega}$, then
$$\sqrt{n}\,[\hat{\theta}^{(0)}(\Omega) - \theta^0] = \frac{1}{\sqrt{n}}\, P(\Omega) \nabla \ln L(\theta^0) + o_p(1) \qquad (3.2)$$
and
$$\frac{1}{\sqrt{n}}\,\hat{\lambda}(\Omega) = \frac{1}{\sqrt{n}}\, Q^T(\Omega) \nabla \ln L(\theta^0) + o_p(1), \qquad (3.3)$$
where $P(\Omega)$ and $Q(\Omega)$ are the values of $P$ and $Q$ in (2.2) when $H$ is replaced by $H(\Omega)$, the submatrix of $H$ whose column indices are in $\Omega$. The following result is known, and we include it for the development of the main theorem.

Theorem 3.1. Let $\mathrm{LRT}_{01:\Omega}$ denote the likelihood ratio test statistic for testing $H_0$ against $H_{1:\Omega} - H_0$. Under $H_0$ and for any $t$, we have
$$\lim_{n \to \infty} P(\mathrm{LRT}_{01:\Omega} \leq t) = P(\chi^2_{c - \mathrm{card}(\Omega)} \leq t), \qquad (3.4)$$
where $\mathrm{card}(\Omega)$ is used to denote the cardinality of $\Omega$.

In what follows, we consider testing $H_0$ against $H_1 - H_0$ and $H_1$ against $H_2 - H_1$. Let $\mathcal{C}$ denote the class of all subsets of $\{1, 2, \ldots, c\}$, and let $\mathrm{LRT}_{01}$ and $\mathrm{LRT}_{12}$ denote the log-likelihood ratio test statistics for testing $H_0$ against $H_1 - H_0$ and $H_1$ against $H_2 - H_1$, respectively. For a proper subset $\Omega$ of $\{1, 2, \ldots, c\}$, with complement $\tilde{\Omega}$, define
$$p_j(\theta^0) = \sum_{\Omega,\, \mathrm{card}(\Omega) = j} P(N(0, \Sigma_1(\Omega)) \geq 0)\, P(N(0, \Sigma_2(\Omega)) > 0),$$
where
$$\Sigma_1(\Omega) = [H^T(\Omega)\tilde{B}H(\Omega)]^{-1}$$
and
$$\Sigma_2(\Omega) = H^T(\tilde{\Omega})\tilde{B}H(\tilde{\Omega}) - H^T(\tilde{\Omega})\tilde{B}H(\Omega)\, \Sigma_1(\Omega)\, H^T(\Omega)\tilde{B}H(\tilde{\Omega}).$$
Let $p_0(\theta^0) = P(N(0, \Sigma_0) \geq 0)$ and $p_c(\theta^0) = P(N(0, \Sigma_c) \geq 0)$,
where $\Sigma_0 = \Sigma_c^{-1} = H^T \tilde{B} H$. The weights $p_j(\theta^0)$ are sums of products of normal orthant probabilities. In general, they do not exist in closed form (for more discussion see Kudô, 1963; Robertson et al., 1988). We use the algorithm by Genz (1992, 1993) to approximate these weights in Example 2 discussed in Section 4. The following theorem holds.
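Since these orthant probabilities have no closed form, each term must be computed numerically. Below is a minimal sketch — ours, not the authors' code — of how one such term, $P(N(0, \Sigma) \geq 0)$, can be approximated in Python; scipy's multivariate normal CDF is, to our knowledge, based on a Genz-type quasi-Monte Carlo integration, and the covariance matrix used here is purely illustrative.

```python
# Sketch: approximating an orthant probability P(N(0, Sigma) >= 0) of the
# kind appearing in the weights p_j; Sigma is a hypothetical covariance.
import numpy as np
from scipy.stats import multivariate_normal

Sigma = np.array([[1.0, 0.3, 0.1],
                  [0.3, 1.0, 0.3],
                  [0.1, 0.3, 1.0]])
# By symmetry of a centered normal, P(Z >= 0) = P(Z <= 0) = CDF at the origin.
orthant = multivariate_normal(mean=np.zeros(3), cov=Sigma).cdf(np.zeros(3))
print(orthant)
```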
Theorem 3.2. Under $H_0$ and for any $t_1$ and $t_2$, we have
$$\lim_{n \to \infty} P(\mathrm{LRT}_{01} \leq t_1, \mathrm{LRT}_{12} \leq t_2) = \sum_{j=0}^{c} p_j(\theta^0)\, P(\chi^2_{c-j} \leq t_1)\, P(\chi^2_j \leq t_2), \qquad (3.5)$$
where $\chi^2_0 \equiv 0$. As a consequence of this theorem, we have under $H_0$
$$\lim_{n \to \infty} P(\mathrm{LRT}_{01} \leq t) = \sum_{j=0}^{c} p_j(\theta^0)\, P(\chi^2_{c-j} \leq t) \qquad (3.6)$$
and
$$\lim_{n \to \infty} P(\mathrm{LRT}_{12} \leq t) = \sum_{j=0}^{c} p_j(\theta^0)\, P(\chi^2_j \leq t). \qquad (3.7)$$
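For reference, once the weights are available, evaluating the chi-bar square tail probability implied by (3.6) is straightforward; the sketch below assumes a hypothetical weight vector and is not tied to any particular example in the paper.

```python
# Sketch: tail probability (p-value) of the chi-bar square mixture in (3.6),
# P(LRT01 >= t) = sum_j p_j P(chi2_{c-j} >= t), with chi2_0 a point mass at 0.
from scipy.stats import chi2

def chibar_pvalue(t, p):
    """p[j] approximates p_j(theta^0), j = 0, ..., c; hypothetical values."""
    c = len(p) - 1
    return sum(pj * (chi2.sf(t, c - j) if c - j > 0 else float(t <= 0))
               for j, pj in enumerate(p))

print(chibar_pvalue(4.5, [0.2, 0.5, 0.3]))   # illustrative weights, c = 2
```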
It is the case that
$$\lim_{n \to \infty} P(\mathrm{LRT}_{01} \leq t) = \lim_{n \to \infty} \sum_{j=0}^{c} p_j(\hat{\theta}^0)\, P(\chi^2_{c-j} \leq t) \qquad (3.8)$$
almost surely if $\hat{\theta}^0$ is any consistent estimator of $\theta$ under $H_0$. The natural estimator of $\theta$ is its maximum likelihood estimator under $H_0$. We note that empirical evidence suggests that critical points and p-values obtained from
$$\sum_{j=0}^{c} p_j(\hat{\theta}^0)\, P(\chi^2_{c-j} \leq t) \qquad (3.9)$$
work well.

Remark. We note that the Wald, Rao and Lagrange multiplier tests discussed in Robertson et al. (1988, p. 320) also extend in a natural way to the $k$-sample case. It can be shown, using arguments similar to those in the proof of Theorem 3.2, that these test statistics have, asymptotically, chi-bar square distributions when $\theta^0$ satisfies $h_i(\theta^0) = 0$, $i = 1, 2, \ldots, c$.

4. Examples and a simulation study

4.1. Example 1: exponential families

Suppose that for $i = 1, 2, \ldots, k$, $X_{i1}, X_{i2}, \ldots, X_{in_i}$ is a random sample from a distribution $F_i$ whose probability density with respect to some sigma-finite measure on the real line is
$$f(x; \theta_i, \tau_i) = \exp\{p_1(\theta_i)\, p_2(\tau_i)\, S(x; \tau_i) + T(x; \tau_i) + q(\theta_i, \tau_i)\}\, I_S(x),$$
where $S \subset \mathbb{R}$, $-\infty < \theta_i < \infty$ and $\tau_i$ is a known nuisance parameter. Suppose that $p_1$ and $q(\cdot, \tau)$ both have continuous second derivatives with respect to $\theta$ for every $\tau$ and satisfy (see Robertson et al., 1988)
A1: $p_1'(\theta) > 0$ for all $\theta$ and $p_2(\tau) > 0$ for all $\tau$;
A2: $\partial q(\theta, \tau)/\partial \theta = -p_1'(\theta)\, p_2(\tau)\, \theta$ for all $\theta$ and $\tau$.

Suppose that we want to test $H_0 : \theta_1 = \theta_2 = \cdots = \theta_k$ against $H_1 - H_0$, where $H_1 : \theta_1 \geq \theta_2 \geq \cdots \geq \theta_k$, and $H_1$ against $H_2 - H_1$, where $H_2$ imposes no constraint on the $\theta_i$'s. If $h_i(\theta_1, \theta_2, \ldots, \theta_k) = \theta_{i+1} - \theta_i$, $i = 1, 2, \ldots, k - 1$, then $H_0$ and $H_1$ are
equivalent to $H_0 : h_i(\theta_1, \theta_2, \ldots, \theta_k) = 0$, $i = 1, 2, \ldots, k - 1$, and $H_1 : h_i(\theta_1, \theta_2, \ldots, \theta_k) \leq 0$, $i = 1, 2, \ldots, k - 1$. It is easy to show here that $\tilde{B} = \mathrm{diag}(1/w_1, 1/w_2, \ldots, 1/w_k)$, where $w_i = \lambda_i\, p_1'(\theta_i)\, p_2(\tau_i)$, $i = 1, 2, \ldots, k$, and
$$H^T \tilde{B} H = \begin{pmatrix}
\dfrac{1}{w_1} + \dfrac{1}{w_2} & -\dfrac{1}{w_2} & 0 & \cdots & 0 \\
-\dfrac{1}{w_2} & \dfrac{1}{w_2} + \dfrac{1}{w_3} & -\dfrac{1}{w_3} & \cdots & 0 \\
0 & -\dfrac{1}{w_3} & \dfrac{1}{w_3} + \dfrac{1}{w_4} & \ddots & \vdots \\
\vdots & \vdots & \ddots & \ddots & -\dfrac{1}{w_{k-1}} \\
0 & 0 & \cdots & -\dfrac{1}{w_{k-1}} & \dfrac{1}{w_{k-1}} + \dfrac{1}{w_k}
\end{pmatrix}.$$
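To see the tridiagonal structure concretely, the following sketch — an illustration of ours — builds $H$ and $\tilde{B}$ for $k = 4$ with arbitrary weights and prints $H^T \tilde{B} H$.

```python
# Sketch: the tridiagonal H^T Btilde H of Example 1 for k = 4, with
# hypothetical weights w_i standing in for lambda_i p_1'(theta_i) p_2(tau_i).
import numpy as np

w = np.array([0.8, 1.2, 0.5, 1.0])            # illustrative weights
k = len(w)
# columns of H are the gradients of h_i(theta) = theta_{i+1} - theta_i
H = np.zeros((k, k - 1))
for i in range(k - 1):
    H[i, i], H[i + 1, i] = -1.0, 1.0
Btilde = np.diag(1.0 / w)
print(H.T @ Btilde @ H)   # 1/w_i + 1/w_{i+1} on the diagonal, -1/w_{i+1} off it
```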
Therefore
$$p_0(\theta^0) = P(N(0, H^T \tilde{B}^0 H) \geq 0) = P(Y_2 - Y_1 > 0, Y_3 - Y_2 > 0, \ldots, Y_k - Y_{k-1} > 0),$$
where $Y_1, Y_2, \ldots, Y_k$ are independent zero-mean normal random variables with variances $1/w_1, 1/w_2, \ldots, 1/w_k$, respectively. This is the probability that the least squares projection of $(Y_1, Y_2, \ldots, Y_k)^T$ with weights given by $w = (w_1, w_2, \ldots, w_k)^T$ onto the cone $I = \{x \in \mathbb{R}^k : x_1 \leq x_2 \leq \cdots \leq x_k\}$ has exactly $k$ distinct levels. A direct computation of the other weights in (3.5) shows that $p_j(\theta^0) = P(k - j, k, w)$, where $P(l, k, w)$, also known as a level probability, is the probability that the same least squares projection has exactly $l$ distinct levels. Therefore we have the following result, which is slightly more general than the one provided by Robertson et al. (1988). Under $H_0$ and for any $t_1$ and $t_2$, we have
$$\lim_{n \to \infty} P(\mathrm{LRT}_{01} \leq t_1, \mathrm{LRT}_{12} \leq t_2) = \sum_{j=0}^{c} p_j(\theta^0)\, P(\chi^2_{c-j} \leq t_1)\, P(\chi^2_j \leq t_2) = \sum_{j=1}^{k} P(j, k, w)\, P(\chi^2_{j-1} \leq t_1)\, P(\chi^2_{k-j} \leq t_2),$$
where $\chi^2_0 \equiv 0$. In particular, under $H_0$,
$$\lim_{n \to \infty} P(\mathrm{LRT}_{01} \leq t) = \sum_{j=1}^{k} P(j, k, w)\, P(\chi^2_{j-1} \leq t)$$
and
$$\lim_{n \to \infty} P(\mathrm{LRT}_{12} \leq t) = \sum_{j=1}^{k} P(j, k, w)\, P(\chi^2_{k-j} \leq t).$$
For more discussion on this, see Robertson et al. (1988).
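The level probabilities $P(l, k, w)$ can also be approximated by direct Monte Carlo: simulate the normal vector, project it onto the ordered cone with the pool-adjacent-violators algorithm, and count distinct levels. The sketch below is our own illustration under assumed weights, not the algorithm used in the paper; with equal weights and $k = 3$ it reproduces the known values $1/3$, $1/2$, $1/6$.

```python
# Sketch: Monte Carlo approximation of the level probabilities P(l, k, w).
import numpy as np

def pava(y, w):
    """Weighted pool-adjacent-violators; returns the isotonic fit of y."""
    means, weights, sizes = [], [], []        # current blocks
    for yi, wi in zip(y, w):
        means.append(yi); weights.append(wi); sizes.append(1)
        # pool adjacent blocks while monotonicity is violated
        while len(means) > 1 and means[-2] > means[-1]:
            m2, w2, s2 = means.pop(), weights.pop(), sizes.pop()
            m1, w1, s1 = means.pop(), weights.pop(), sizes.pop()
            wt = w1 + w2
            means.append((w1 * m1 + w2 * m2) / wt)
            weights.append(wt); sizes.append(s1 + s2)
    return np.repeat(means, sizes)

def level_probs(w, n_rep=50_000, seed=0):
    rng = np.random.default_rng(seed)
    k = len(w)
    counts = np.zeros(k)
    sd = 1.0 / np.sqrt(np.asarray(w, float))  # Y_i ~ N(0, 1/w_i), independent
    for _ in range(n_rep):
        fit = pava(rng.normal(0.0, sd), w)
        levels = len(np.unique(np.round(fit, 12)))
        counts[levels - 1] += 1
    return counts / n_rep                     # entry l-1 estimates P(l, k, w)

print(level_probs([1.0, 1.0, 1.0]))           # approx [1/3, 1/2, 1/6]
```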
4.2. Example 2: testing against stochastic ordering

In Table 1, we present the results of a clinical trial regarding the outcome of patients who experienced trauma due to subarachnoid hemorrhage after receiving a treatment (Silvapulle and Sen, 2005; Agresti and Coull, 2002; El Barmi and Johnson, 2006). This study was designed to compare the effectiveness of four treatments (Placebo = 1, Low dose = 2, Medium dose = 3 and High dose = 4). The possible outcomes are Death = 1, Vegetative state = 2, Major disability = 3, Minor disability = 4 and Good recovery = 5. For $i = 1, 2, 3, 4$ and $j = 1, \ldots, 5$, let
$$\pi_{ij} = P(\text{Outcome} = j \mid \text{Treatment} = i).$$

Table 1
Results of a clinical trial comparing the effectiveness of varying levels of a treatment on patients who had suffered from a subarachnoid hemorrhage.

Treatment   | Death | Vegetative state | Major disability | Minor disability | Good recovery
Placebo     | 59    | 25               | 46               | 48               | 32
Low dose    | 48    | 21               | 44               | 47               | 30
Medium dose | 41    | 14               | 54               | 64               | 31
High dose   | 41    | 4                | 49               | 58               | 41
Table 2
Fitted values of the clinical trial data under the null hypothesis, which states that there is no difference between dosages.

Treatment   | Death | Vegetative state | Major disability | Minor disability | Good recovery
Placebo     | 50.20 | 16.82            | 50.73            | 57.03            | 35.22
Low dose    | 45.42 | 15.22            | 45.89            | 51.60            | 31.86
Medium dose | 48.77 | 16.34            | 49.28            | 55.40            | 34.21
High dose   | 46.61 | 15.62            | 47.10            | 52.96            | 32.70
Table 3
Fitted values of the clinical trial data under the alternative hypothesis, which states that the outcome distribution is stochastically ordered by the level of the dosage.

Treatment   | Death | Vegetative state | Major disability | Minor disability | Good recovery
Placebo     | 59.00 | 25.00            | 46.00            | 48.00            | 32.00
Low dose    | 48.23 | 21.10            | 44.21            | 47.23            | 29.22
Medium dose | 42.85 | 13.76            | 53.09            | 62.92            | 31.38
High dose   | 40.96 | 4.05             | 49.66            | 58.78            | 41.55
Suppose we want to test the null hypothesis of no treatment effect against a particular alternative that implies that higher doses are more effective. That is, suppose we want to test $H_0$ against $H_1 - H_0$, where
$$H_0 : \pi_{1j} = \pi_{2j} = \pi_{3j} = \pi_{4j}, \quad j = 1, \ldots, 4,$$
and
$$H_1 : \sum_{l=1}^{j} \pi_{4l} \leq \sum_{l=1}^{j} \pi_{3l} \leq \sum_{l=1}^{j} \pi_{2l} \leq \sum_{l=1}^{j} \pi_{1l}, \quad j = 1, \ldots, 4,$$
that is, $\theta_s$ is stochastically smaller than $\theta_{s+1}$, $s = 1, 2, 3$.

Let $\theta_i = (\pi_{i1}, \pi_{i2}, \pi_{i3}, \pi_{i4})^T$, $i = 1, 2, 3, 4$, and for $s = 1, 2, 3$ define
$$h_s^{(j)}(\theta_1, \ldots, \theta_4) = \sum_{l=1}^{j} \pi_{s+1,l} - \sum_{l=1}^{j} \pi_{s,l}, \quad j = 1, \ldots, 4.$$
It is easy to see that $H_0$ and $H_1$ can be expressed as
$$H_0 : h_s^{(j)}(\theta_1, \ldots, \theta_4) = 0, \quad j = 1, 2, \ldots, 4,\ s = 1, 2, 3,$$
and
$$H_1 : h_s^{(j)}(\theta_1, \ldots, \theta_4) \leq 0, \quad j = 1, 2, \ldots, 4,\ s = 1, 2, 3.$$
The fitted values under $H_0$ are easy to obtain and are given in Table 2. Under $H_1$, we obtained them using the iterative algorithm developed in El Barmi and Dykstra (1994, 1996); they are given in Table 3. The value of the test statistic for testing $H_0$ against $H_1 - H_0$ is 28.43, and the p-value based on (3.9) is 0.00028.

4.3. Simulation study

Next we conduct a small simulation study to investigate the finite sample performance of (3.6). Let $(X_{i1}, X_{i2}, X_{i3})^T$, $i = 1, 2$, be two independent multinomial vectors with parameters $(n_i, (\pi_{i1}, \pi_{i2}, \pi_{i3})^T)$, $i = 1, 2$, respectively. Consider testing $H_0$ against $H_1 - H_0$, where
$$H_0 : \pi_{1j} = \pi_{2j}, \quad j = 1, 2, 3, \qquad \text{and} \qquad H_1 : \sum_{i=1}^{j} \pi_{1i} \geq \sum_{i=1}^{j} \pi_{2i}, \quad j = 1, 2.$$
Let $w = (w_1, w_2, w_3)^T$ be positive weights and let $U_1, U_2, U_3$ be independent normal variables with zero means and variances $w_1^{-1}, w_2^{-1}, w_3^{-1}$, respectively. Let also $P(j, 3, w)$ be the probability that the least squares projection of $U = (U_1, U_2, U_3)^T$ onto $I = \{x \in \mathbb{R}^3 : x_1 \leq x_2 \leq x_3\}$ with weights given by $w$ has exactly $j$ distinct values. If $T_{01}$ is the log-likelihood ratio test statistic in this case, then, using Theorem 3.2, it can be shown that, if $\lim_{\min(n_1, n_2) \to \infty} n_i/(n_1 + n_2) > 0$, then
$$\lim_{n \to \infty} P(\mathrm{LRT}_{01} \leq t) = \sum_{j=1}^{3} P(j, 3, \theta^0)\, P(\chi^2_{3-j} \leq t)$$
under $H_0$ (see also Robertson et al., 1988). Here $\theta^0$ is the common value of the two probability vectors under $H_0$. In what follows, we assume that under $H_0$, $\theta^0 = (\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3})^T$, in which case $P(3, 3, \theta^0) = \tfrac{1}{6}$, $P(2, 3, \theta^0) = \tfrac{1}{2}$, $P(1, 3, \theta^0) = \tfrac{1}{3}$ and
$$\sum_{j=1}^{3} P(j, 3, \theta^0)\, P(\chi^2_{3-j} \geq 4.5773) = 0.05. \qquad (4.1)$$

Table 4
Power of the test based on the asymptotic distribution and finite sample sizes.

n1 | n2 | Size of T01 | Power of T01 | Power of T02
30 | 30 | 0.0562      | 0.3002       | 0.1976
30 | 50 | 0.0548      | 0.3660       | 0.2516
50 | 30 | 0.0536      | 0.3506       | 0.2438
50 | 50 | 0.0516      | 0.4294       | 0.3130
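For the record, the critical value 4.5773 in (4.1) can be reproduced by solving the mixture tail equation numerically; a minimal sketch of ours using scipy's chi-square survival function and a root finder:

```python
# Sketch: solving the chi-bar square critical value equation (4.1) for the
# equal-weights case P(1,3)=1/3, P(2,3)=1/2, P(3,3)=1/6.
from scipy.stats import chi2
from scipy.optimize import brentq

p = {1: 1/3, 2: 1/2, 3: 1/6}           # level probabilities P(j, 3, theta0)
def tail(c):
    # chi2 with 0 df is a point mass at 0, so the j = 3 term vanishes for c > 0
    return sum(p[j] * chi2.sf(c, 3 - j) for j in (1, 2)) - 0.05

print(brentq(tail, 0.1, 20.0))          # approximately 4.5773
```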
We estimate the power of the test based on the left-hand side of (4.1) for different sample sizes using 5000 replications at $\theta_1 = (\tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{4})^T$ and $\theta_2 = (\tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3})^T$. We also compute the power of $T_{02}$, the log-likelihood ratio test statistic for testing $H_0$ against $H_2 - H_0$, based on its asymptotic distribution. The results are given in Table 4. As can be seen from these results, the use of the asymptotic distribution in practice can result in a mildly anti-conservative test (size slightly greater than 0.05). The results also show that $T_{01}$ outperforms $T_{02}$ at all the sample sizes considered.

Acknowledgments

The authors thank the editor and a referee for helpful comments. Dr. Elbarmi's work was supported by the City University of New York through the PSC-CUNY. Dr. Núñez-Antón's work was supported by Ministerio Español de Educación y Ciencia, FEDER, Universidad del País Vasco (UPV/EHU) and Departamento de Educación del Gobierno Vasco (UPV/EHU Econometrics Research Group) under research Grants MTM2004-00341, MTM2007-60112 and IT-334-07.

Appendix A

To prove Theorem 3.2 we make use of the following lemmas. The proof of the first lemma can be found in El Barmi and Dykstra (1999).

Lemma A.1. Suppose $X$ has a multivariate normal distribution with zero mean vector and covariance matrix $I$, $P$ is an idempotent symmetric matrix of rank $r$, and $d_1, d_2, \ldots, d_k$ are $k$ vectors satisfying either $P d_i = 0$ or $P d_i = d_i$ for all $i$. Then the conditional distribution of $X^T P X$ given $d_i^T X \leq 0$, $i = 1, 2, \ldots, k$, is that of a chi-squared random variable with $r$ degrees of freedom.

Lemma A.2. Let $U \sim N(0, \tilde{B}^{-1})$ and, for any proper subset $\Omega$ of $\{1, 2, \ldots, c\}$, define
$$Y_1(\Omega) = [P(\Omega) - P]\,U, \quad Y_2(\Omega) = H^T(\tilde{\Omega}) P(\Omega)\, U, \quad Y_3(\Omega) = [-R(\Omega)]^{-1/2} Q^T(\Omega)\, U, \quad Y_4(\Omega) = Q^T(\Omega)\, U.$$
Then $(Y_1^T(\Omega), Y_2^T(\Omega), Y_3^T(\Omega), Y_4^T(\Omega))^T$ has a multivariate normal distribution with mean 0 and covariance matrix
$$\begin{pmatrix}
P(\Omega) - P & P(\Omega) H(\tilde{\Omega}) & 0 & 0 \\
H^T(\tilde{\Omega}) P(\Omega) & H^T(\tilde{\Omega}) P(\Omega) H(\tilde{\Omega}) & 0 & 0 \\
0 & 0 & I_{\mathrm{card}(\Omega)} & [-R(\Omega)]^{1/2} \\
0 & 0 & [-R(\Omega)]^{1/2} & -R(\Omega)
\end{pmatrix}.$$
Proof. The proof follows immediately from the following identities, obtained from the definitions of $P$ ($P(\Omega)$), $Q$ ($Q(\Omega)$) and $R$ ($R(\Omega)$):
(a) $\tilde{B}^{-1} P - H Q^T = I$; (b) $-H^T P = 0$; (c) $\tilde{B}^{-1} Q - H R = 0$; (d) $-H^T Q = I$;
(e) $\tilde{B}^{-1} P(\Omega) - H(\Omega) Q^T(\Omega) = I$; (f) $-H^T(\Omega) P(\Omega) = 0$; (g) $\tilde{B}^{-1} Q(\Omega) - H(\Omega) R(\Omega) = 0$; (h) $-H^T(\Omega) Q(\Omega) = I$.
We will only show how to obtain $\mathrm{cov}(Y_1(\Omega), Y_2(\Omega))$.
We have
$$\mathrm{cov}(Y_1(\Omega), Y_2(\Omega)) = [P(\Omega) - P]\, \mathrm{cov}(U, U)\, P(\Omega) H(\tilde{\Omega}) = [P(\Omega) - P]\, \tilde{B}^{-1} P(\Omega) H(\tilde{\Omega}) = P(\Omega) \tilde{B}^{-1} P(\Omega) H(\tilde{\Omega}) - P \tilde{B}^{-1} P(\Omega) H(\tilde{\Omega}).$$
Now $P(\Omega) \tilde{B}^{-1} P(\Omega) H(\tilde{\Omega}) = P(\Omega) H(\tilde{\Omega})$ by (e) and (f), and $P \tilde{B}^{-1} P(\Omega) H(\tilde{\Omega}) = 0$ by (e) and (b). Therefore $\mathrm{cov}(Y_1(\Omega), Y_2(\Omega)) = P(\Omega) H(\tilde{\Omega})$.
Proof of Theorem 3.1. Let $\hat{\theta}$ denote the unrestricted maximum likelihood estimator of $\theta$. Since
$$\sqrt{n}\,[\hat{\theta} - \theta^0] = \frac{1}{\sqrt{n}}\, \tilde{B} \nabla \ln L(\theta^0) + o_p(1), \qquad (A.1)$$
Eqs. (2.5) and (3.2) imply that
$$\sqrt{n}\,[\hat{\theta}^{(0)} - \hat{\theta}] = \frac{1}{\sqrt{n}}\, [P - \tilde{B}] \nabla \ln L(\theta^0) + o_p(1) \qquad (A.2)$$
and
$$\sqrt{n}\,[\hat{\theta}^{(0)}(\Omega) - \hat{\theta}] = \frac{1}{\sqrt{n}}\, [P(\Omega) - \tilde{B}] \nabla \ln L(\theta^0) + o_p(1). \qquad (A.3)$$
Applying Taylor's expansion of $\ln L(\hat{\theta}^{(0)})$ and $\ln L(\hat{\theta}^{(0)}(\Omega))$ around $\hat{\theta}$ under $H_0$, we find that
$$\ln L(\hat{\theta}^{(0)}) = \ln L(\hat{\theta}) - \tfrac{1}{2}\, n\, (\hat{\theta}^{(0)} - \hat{\theta})^T \tilde{B}^{-1} (\hat{\theta}^{(0)} - \hat{\theta}) + o_p(1),$$
$$\ln L(\hat{\theta}^{(0)}(\Omega)) = \ln L(\hat{\theta}) - \tfrac{1}{2}\, n\, (\hat{\theta}^{(0)}(\Omega) - \hat{\theta})^T \tilde{B}^{-1} (\hat{\theta}^{(0)}(\Omega) - \hat{\theta}) + o_p(1).$$
Therefore, we have
$$\begin{aligned}
\mathrm{LRT}_{01:\Omega} &= -2(\ln L(\hat{\theta}^{(0)}) - \ln L(\hat{\theta}^{(0)}(\Omega))) \\
&= n\, (\hat{\theta}^{(0)} - \hat{\theta})^T \tilde{B}^{-1} (\hat{\theta}^{(0)} - \hat{\theta}) - n\, (\hat{\theta}^{(0)}(\Omega) - \hat{\theta})^T \tilde{B}^{-1} (\hat{\theta}^{(0)}(\Omega) - \hat{\theta}) + o_p(1) \\
&= [n^{-1/2} \nabla \ln L(\theta^0)]^T (P - \tilde{B}) \tilde{B}^{-1} (P - \tilde{B}) [n^{-1/2} \nabla \ln L(\theta^0)] \\
&\quad - [n^{-1/2} \nabla \ln L(\theta^0)]^T (P(\Omega) - \tilde{B}) \tilde{B}^{-1} (P(\Omega) - \tilde{B}) [n^{-1/2} \nabla \ln L(\theta^0)] + o_p(1) \\
&= [n^{-1/2} \nabla \ln L(\theta^0)]^T (P(\Omega) - P) [n^{-1/2} \nabla \ln L(\theta^0)] + o_p(1), \qquad (A.4)
\end{aligned}$$
where the last equality follows from the identities in the proof of Lemma A.2. Since $n^{-1/2} \nabla \ln L(\theta^0)$ converges in distribution to $N(0, \tilde{B}^{-1})$ as $n$ goes to infinity, and since
$$(P(\Omega) - P) \tilde{B}^{-1} (P(\Omega) - P) \tilde{B}^{-1} = (P(\Omega) - P) \tilde{B}^{-1},$$
that is, $(P(\Omega) - P) \tilde{B}^{-1}$ is an idempotent matrix, $\mathrm{LRT}_{01:\Omega}$ converges in distribution to a chi-square random variable with $\mathrm{rank}(P(\Omega) - P)$ degrees of freedom. But
$$\begin{aligned}
\mathrm{rank}(P(\Omega) - P) &= \mathrm{rank}((P(\Omega) - P) \tilde{B}^{-1}) = \mathrm{trace}((P(\Omega) - P) \tilde{B}^{-1}) \\
&= \mathrm{trace}([\tilde{B} - \tilde{B} H(\Omega) [H^T(\Omega) \tilde{B} H(\Omega)]^{-1} H^T(\Omega) \tilde{B}]\, \tilde{B}^{-1}) - \mathrm{trace}([\tilde{B} - \tilde{B} H [H^T \tilde{B} H]^{-1} H^T \tilde{B}]\, \tilde{B}^{-1}) \\
&= (r - \mathrm{card}(\Omega)) - (r - c) = c - \mathrm{card}(\Omega),
\end{aligned}$$
and we have the desired conclusion.
Proof of Theorem 3.2. For any observed data set, $\hat{\theta}^{(1)}$ equals $\hat{\theta}^{(0)}(\Omega)$ for exactly one $\Omega$. Moreover, $\hat{\theta}^{(1)} = \hat{\theta}^{(0)}(\Omega)$ if and only if (Wollan and Dykstra, 1985)
$$h_s(\hat{\theta}^{(0)}(\Omega)) < 0 \quad \forall s \in \tilde{\Omega}, \qquad (A.5)$$
$$\hat{\lambda}_s(\Omega) > 0 \quad \forall s \in \Omega. \qquad (A.6)$$
Let $\mathrm{LRT}_{12:\Omega}$ be the likelihood ratio test statistic for testing $H_{1:\Omega}$ against $H_2 - H_{1:\Omega}$. It follows from Aitchison and Silvey (1958) and Silvey (1959) that
$$\mathrm{LRT}_{12:\Omega} = -\frac{1}{n}\, [\nabla \ln L(\theta^0)]^T Q(\Omega) [R(\Omega)]^{-1} Q^T(\Omega) \nabla \ln L(\theta^0) + o_p(1).$$
Using (2.5), (2.6) and (A.3)–(A.6), we have
$$\begin{aligned}
P(\mathrm{LRT}_{01} \leq t_1, \mathrm{LRT}_{12} \leq t_2) &= \sum_{\Omega \in \mathcal{C}} P(\mathrm{LRT}_{01} \leq t_1, \mathrm{LRT}_{12} \leq t_2, \hat{\theta}^{(1)} = \hat{\theta}^{(0)}(\Omega)) \\
&= \sum_{\Omega \in \mathcal{C}} P(\mathrm{LRT}_{01} \leq t_1, \mathrm{LRT}_{12} \leq t_2, h_s(\hat{\theta}^{(0)}(\Omega)) < 0\ \forall s \in \tilde{\Omega},\ \hat{\lambda}_s(\Omega) > 0\ \forall s \in \Omega) \\
&= \sum_{\Omega \in \mathcal{C}} P(\mathrm{LRT}_{01:\Omega} \leq t_1, \mathrm{LRT}_{12:\Omega} \leq t_2, h_s(\hat{\theta}^{(0)}(\Omega)) < 0\ \forall s \in \tilde{\Omega},\ \hat{\lambda}_s(\Omega) > 0\ \forall s \in \Omega) \\
&= \sum_{\Omega \in \mathcal{C}} P\big([n^{-1/2} \nabla \ln L(\theta^0)]^T (P(\Omega) - P) [n^{-1/2} \nabla \ln L(\theta^0)] + o_p(1) \leq t_1, \\
&\qquad -\tfrac{1}{n}\, [\nabla \ln L(\theta^0)]^T Q(\Omega) [R(\Omega)]^{-1} Q^T(\Omega) \nabla \ln L(\theta^0) + o_p(1) \leq t_2, \\
&\qquad n^{-1/2} H^T(\tilde{\Omega}) P(\Omega) \nabla \ln L(\theta^0) + o_p(1) < 0,\ n^{-1/2} Q^T(\Omega) \nabla \ln L(\theta^0) + o_p(1) > 0\big).
\end{aligned}$$
Since $n^{-1/2} \nabla \ln L(\theta^0)$ converges in distribution to $N(0, \tilde{B}^{-1})$ as $n$ goes to infinity,
$$\lim_{n \to \infty} P(\mathrm{LRT}_{01} \leq t_1, \mathrm{LRT}_{12} \leq t_2) = \sum_{\Omega \in \mathcal{C}} P(U^T (P(\Omega) - P) U \leq t_1,\ -U^T Q(\Omega) [R(\Omega)]^{-1} Q^T(\Omega) U \leq t_2,\ H^T(\tilde{\Omega}) P(\Omega) U < 0,\ Q^T(\Omega) U > 0),$$
where $U$ has a multivariate normal distribution with zero mean vector and covariance $\tilde{B}^{-1}$. Therefore
$$\begin{aligned}
\lim_{n \to \infty} P(\mathrm{LRT}_{01} \leq t_1, \mathrm{LRT}_{12} \leq t_2) &= \sum_{\Omega \in \mathcal{C}} P(Y_1^T(\Omega) Y_1(\Omega) \leq t_1,\ Y_3^T(\Omega) Y_3(\Omega) \leq t_2,\ Y_2(\Omega) < 0,\ Y_4(\Omega) > 0) \\
&= \sum_{\Omega \in \mathcal{C}} P(Y_1^T(\Omega) Y_1(\Omega) \leq t_1,\ Y_2(\Omega) < 0)\, P(Y_3^T(\Omega) Y_3(\Omega) \leq t_2,\ Y_4(\Omega) > 0).
\end{aligned}$$
Let $Z$ have a multivariate normal distribution with zero mean vector and covariance equal to the identity matrix $I$. Then
$$\begin{aligned}
P(Y_1^T(\Omega) Y_1(\Omega) \leq t_1,\ Y_2(\Omega) < 0) &= P(Z^T \tilde{B}^{-1/2} (P(\Omega) - P) \tilde{B}^{-1/2} Z \leq t_1,\ H^T(\tilde{\Omega}) P(\Omega) \tilde{B}^{-1/2} Z < 0) \\
&= P(Z^T \tilde{B}^{-1/2} (P(\Omega) - P) \tilde{B}^{-1/2} Z \leq t_1)\, P(H^T(\tilde{\Omega}) P(\Omega) \tilde{B}^{-1/2} Z < 0)
\end{aligned}$$
by Lemma A.1, since $\tilde{B}^{-1/2} (P(\Omega) - P) \tilde{B}^{-1/2}$ is idempotent and $\tilde{B}^{-1/2} (P(\Omega) - P) \tilde{B}^{-1/2}\, \tilde{B}^{-1/2} P(\Omega) H(\tilde{\Omega}) = \tilde{B}^{-1/2} P(\Omega) H(\tilde{\Omega})$. Also
$$P(Y_3^T(\Omega) Y_3(\Omega) \leq t_2,\ Y_4(\Omega) > 0) = P(-Z^T \tilde{B}^{-1/2} Q(\Omega) [R(\Omega)]^{-1} Q^T(\Omega) \tilde{B}^{-1/2} Z \leq t_2,\ Q^T(\Omega) \tilde{B}^{-1/2} Z > 0) = P(-Z^T \tilde{B}^{-1/2} Q(\Omega) [R(\Omega)]^{-1} Q^T(\Omega) \tilde{B}^{-1/2} Z \leq t_2)\, P(Q^T(\Omega) \tilde{B}^{-1/2} Z > 0) \qquad (A.7)$$
by Lemma A.1, since $-\tilde{B}^{-1/2} Q(\Omega) [R(\Omega)]^{-1} Q^T(\Omega) \tilde{B}^{-1/2}$ is idempotent and $-\tilde{B}^{-1/2} Q(\Omega) [R(\Omega)]^{-1} Q^T(\Omega) \tilde{B}^{-1/2}\, \tilde{B}^{-1/2} Q(\Omega) = \tilde{B}^{-1/2} Q(\Omega)$. Therefore
$$\begin{aligned}
&\sum_{\Omega \in \mathcal{C}} P(Z^T \tilde{B}^{-1/2} (P(\Omega) - P) \tilde{B}^{-1/2} Z \leq t_1,\ -Z^T \tilde{B}^{-1/2} Q(\Omega) [R(\Omega)]^{-1} Q^T(\Omega) \tilde{B}^{-1/2} Z \leq t_2,\ Q^T(\Omega) \tilde{B}^{-1/2} Z > 0,\ H^T(\tilde{\Omega}) P(\Omega) \tilde{B}^{-1/2} Z < 0) \\
&\quad = \sum_{\Omega \in \mathcal{C}} P(Z^T \tilde{B}^{-1/2} (P(\Omega) - P) \tilde{B}^{-1/2} Z \leq t_1)\, P(-Z^T \tilde{B}^{-1/2} Q(\Omega) [R(\Omega)]^{-1} Q^T(\Omega) \tilde{B}^{-1/2} Z \leq t_2)\, P(Q^T(\Omega) \tilde{B}^{-1/2} Z > 0)\, P(H^T(\tilde{\Omega}) P(\Omega) \tilde{B}^{-1/2} Z < 0).
\end{aligned}$$
Note that, by the proof of Theorem 3.1,
$$P(Z^T \tilde{B}^{-1/2} (P(\Omega) - P) \tilde{B}^{-1/2} Z \leq t_1) = P(\chi^2_{c - \mathrm{card}(\Omega)} \leq t_1).$$
Since $-\tilde{B}^{-1/2} Q(\Omega) [R(\Omega)]^{-1} Q^T(\Omega) \tilde{B}^{-1/2}$ is idempotent with rank equal to the cardinality of $\Omega$, we have
$$P(-Z^T \tilde{B}^{-1/2} Q(\Omega) [R(\Omega)]^{-1} Q^T(\Omega) \tilde{B}^{-1/2} Z \leq t_2) = P(\chi^2_{\mathrm{card}(\Omega)} \leq t_2).$$
Therefore
$$\lim_{n \to \infty} P(\mathrm{LRT}_{01} \leq t_1, \mathrm{LRT}_{12} \leq t_2) = \sum_{j=0}^{c} P(\chi^2_{c-j} \leq t_1)\, P(\chi^2_j \leq t_2) \sum_{\Omega \in \mathcal{C},\, \mathrm{card}(\Omega) = j} P(Q^T(\Omega) \tilde{B}^{-1/2} Z > 0)\, P(H^T(\tilde{\Omega}) P(\Omega) \tilde{B}^{-1/2} Z < 0).$$
Since $Q^T(\Omega) \tilde{B}^{-1} Q(\Omega) = -R(\Omega) = [H^T(\Omega) \tilde{B} H(\Omega)]^{-1}$,
$$P(Q^T(\Omega) \tilde{B}^{-1/2} Z > 0) = P(N(0, Q^T(\Omega) \tilde{B}^{-1} Q(\Omega)) > 0) = P(N(0, [H^T(\Omega) \tilde{B} H(\Omega)]^{-1}) > 0).$$
Also
$$P(H^T(\tilde{\Omega}) P(\Omega) \tilde{B}^{-1/2} Z < 0) = P(N(0, H^T(\tilde{\Omega}) P(\Omega) H(\tilde{\Omega})) < 0),$$
so that
$$\sum_{\Omega \in \mathcal{C},\, \mathrm{card}(\Omega) = j} P(Q^T(\Omega) \tilde{B}^{-1/2} Z > 0)\, P(H^T(\tilde{\Omega}) P(\Omega) \tilde{B}^{-1/2} Z < 0) = p_j(\theta^0),$$
and we have the desired result.

References

Agresti, A., Coull, B., 2002. The analysis of contingency tables under inequality constraints. J. Statist. Plann. Inference 107 (1–2), 45–73.
Aitchison, J., Silvey, S.D., 1958. Maximum likelihood estimation of parameters subject to restraints. Ann. Math. Statist. 29, 813–828.
El Barmi, H., Dykstra, R., 1994. Restricted multinomial maximum likelihood estimation based upon Fenchel duality. Statist. Probab. Lett. 21, 121–130.
El Barmi, H., Dykstra, R., 1996. Restricted product multinomial and product Poisson estimation based upon Fenchel duality. Statist. Probab. Lett. 29, 117–123.
El Barmi, H., Dykstra, R., 1999. Likelihood ratio test against a set of inequality constraints. J. Nonparametric Statist. 11, 233–250.
El Barmi, H., Johnson, M., 2006. A unified approach to testing for or against a set of inequality constraints: the product multinomial case. J. Multivariate Anal. 97, 1894–1912.
Genz, A., 1992. Numerical computation of multivariate normal probabilities. J. Comput. Graphical Statist. 1, 141–149.
Genz, A., 1993. Comparison of methods for the computation of multivariate normal probabilities. Comput. Sci. Statist. 25, 400–405.
Kudô, A., 1963. A multivariate analogue of the one-sided test. Biometrika 50, 403–418.
Robertson, T., Wright, F.T., Dykstra, R.L., 1988. Order Restricted Statistical Inference. Wiley, New York.
Self, S.G., Liang, K.Y., 1987. Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. J. Amer. Statist. Assoc. 82, 605–610.
Silvapulle, M.J., Sen, P.K., 2005. Constrained Statistical Inference. Wiley, New York.
Silvey, S.D., 1959. The Lagrangian multiplier test. Ann. Math. Statist. 30, 389–407.
Wollan, P., Dykstra, R., 1985. Conditional tests with an order restriction as a null hypothesis. In: Dykstra, R.L., Robertson, T., Wright, F.T. (Eds.), Advances in Order Restricted Inference. Springer, New York, pp. 279–295.