Journal of Statistical Planning and Inference 137 (2007) 3416 – 3429 www.elsevier.com/locate/jspi
On a multiple endpoints testing problem

Morris L. Eaton^a, Robb J. Muirhead^b,∗

a School of Statistics, University of Minnesota, 224 Church Street SE, Minneapolis, MN 55455, USA
b Statistical Research and Consulting Center, Pfizer Inc., MS6025-A5115, 50 Pequot Avenue, New London, CT 06320, USA
Available online 30 March 2007
Abstract

In a clinical trial comparing drug with placebo, where there are multiple primary endpoints, we consider testing problems in which an efficacious drug effect can be claimed only if statistical significance is demonstrated at the nominal level for all endpoints. Under the assumption that the data are multivariate normal, the multiple endpoint-testing problem is formulated. The usual testing procedure involves testing each endpoint separately at the same significance level using two-sample t-tests, and claiming drug efficacy only if each t-statistic is significant. In this paper we investigate properties of this procedure. We show that it is identical to both an intersection–union test and the likelihood ratio test. A simple expression for the p-value is given. The level and power function are studied; it is shown that the test may be conservative and that it is biased. Computable bounds for the power function are established.
© 2007 Elsevier B.V. All rights reserved.

MSC: 62H15; 62F03; 62P10

Keywords: Multiple endpoints testing; Likelihood ratio test; Intersection union test; Power function; Schur concavity
1. Introduction

There appears to be an increasing tendency for regulatory agencies to require that confirmatory clinical trials for new drugs demonstrate efficacy on more than one primary endpoint. Within the pharmaceutical industry, this is seen as problematic in areas where there is no consensus on what constitutes a meaningful "drug effect". For example, in migraine studies, the FDA now requires that a new drug show efficacy on four primary endpoints (pain, nausea, photosensitivity, phonosensitivity), and this implies that these four endpoints are of equal importance and interchangeable. The International Headache Society Clinical Trials Subcommittee (2000), however, in its guidelines for clinical trials involving drugs in migraine, says that the primary measure of efficacy should be pain, and does not suggest that the other three endpoints be "co-primary".

In a recent review paper, Sankoh et al. (2003) discuss some clinical reasons for multiple endpoints and how these vary across therapeutic areas. In addition, they give the following three possible clinical decision-making scenarios for a clinical trial with at least two primary endpoints. "A sponsor could propose to claim drug effect if acceptable clinical treatment effect sizes are realized and: (i) statistical significance is demonstrated at the prespecified nominal α-level for all primary endpoints, or (ii) statistical significance is demonstrated at the prespecified nominal α-level for the majority . . . of the primary endpoints, or (iii) statistical significance is demonstrated at the prespecified nominal α-level for one

∗ Corresponding author.
E-mail addresses:
[email protected] (M.L. Eaton), robb.j.muirhead@pfizer.com (R.J. Muirhead). 0378-3758/$ - see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.jspi.2007.03.021
or more of the primary endpoints." The majority of the discussion in the paper of Sankoh et al. (2003) is devoted to scenario (iii). We are concerned in this paper with scenario (i), as this relates to showing efficacy on two or more primary endpoints.

The paper is organized as follows. Section 2 introduces distributional assumptions and notation. We discuss the appropriate null and alternative hypotheses in Section 3. Section 4 focuses on an intersection–union test (IUT); we show that it is exactly the same as the procedure that tests each endpoint separately using two-sample t-tests, and derive a useful, simple expression for the p-value. We show that the test may be conservative and that it is biased. In Section 5 it is shown that the likelihood ratio test (LRT) is identical to the IUT. Some properties of the power function are derived in Section 6 and, in particular, a computable lower bound for the power is given. The discussion in Section 7 focuses on size, power, and sample size considerations, and on open problems. Many of the more technical proofs in the paper are given in the Appendix.

2. Statistical setting

In a clinical trial setting, suppose we have $n_1$ subjects on drug and $n_2$ subjects on placebo, and that there are $p$ primary endpoint variables, assumed to have a $p$-variate normal distribution, where $p \ge 2$. The measurements on the $n_1$ subjects on drug will be denoted $X_i^{(D)}$, $i = 1, \ldots, n_1$, and those for the $n_2$ subjects on placebo by $X_i^{(P)}$, $i = 1, \ldots, n_2$. Our distributional assumptions are that these vectors are all independent, with
$$X_1^{(D)}, \ldots, X_{n_1}^{(D)} \sim \text{i.i.d. } N_p(\mu^{(D)}, \Sigma) \quad\text{and}\quad X_1^{(P)}, \ldots, X_{n_2}^{(P)} \sim \text{i.i.d. } N_p(\mu^{(P)}, \Sigma).$$
The common covariance matrix $\Sigma$ is assumed unknown. We are interested in testing hypotheses about the difference in the mean vectors $\delta = \mu^{(D)} - \mu^{(P)}$. In particular, to claim efficacy on all $p$ endpoints, we need to be able to conclude that $\delta_i > 0$ for all $i = 1, \ldots, p$. In what follows in this paper, this will be the form of the alternative hypothesis.

In obvious notation, let $\bar X^{(D)}$, $\bar X^{(P)}$, $S^{(D)}$, $S^{(P)}$ be the two sample mean vectors and the two (unbiased) sample covariance matrices. Let $S$ denote the usual pooled matrix of sums of squares and cross products, $S = (n_1 - 1)S^{(D)} + (n_2 - 1)S^{(P)}$. By sufficiency and translation invariance, we need only consider the difference between the two sample mean vectors and $S$. Thus, let
$$Y = \sqrt{\frac{n_1 n_2}{n_1 + n_2}}\,(\bar X^{(D)} - \bar X^{(P)}).$$
Then $Y$ and $S$ are independent, with $Y \sim N_p(\mu, \Sigma)$ and $S \sim W_p(n, \Sigma)$ (Wishart), where $n = n_1 + n_2 - 2$ and
$$\mu \equiv \sqrt{\frac{n_1 n_2}{n_1 + n_2}}\,\delta = \sqrt{\frac{n_1 n_2}{n_1 + n_2}}\,(\mu^{(D)} - \mu^{(P)}).$$
The distributional notation follows that in Muirhead (1982).

3. Two testing problems

The discussion in Section 2 shows that the version of the multiple endpoint problem of concern in this paper can be described as follows. The observed data consist of a random vector $Y$ in $R^p$ with a multivariate normal distribution, $N_p(\mu, \Sigma)$, where the mean vector $\mu$ and the covariance matrix $\Sigma$ are both unknown. We assume that $\Sigma$ is in $S_p^+$, the set of all $p \times p$ positive definite symmetric matrices. In addition to $Y$, we also observe the $p \times p$ random matrix $S$ having a Wishart distribution $W_p(n, \Sigma)$ with $n \ge p$. To describe the testing problems under consideration, let $\Omega = \{\mu \in R^p \mid \mu_i > 0,\ i = 1, \ldots, p\}$. Our discussion is limited to the alternative hypothesis
$$H_A: (\mu, \Sigma) \in \Omega \times S_p^+. \tag{1}$$
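The reduction to the pair $(Y, S)$ can be sketched numerically. The following is a minimal illustration using numpy; the sample sizes, mean vectors, and equicorrelated covariance matrix are hypothetical choices for the sketch, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

p, n1, n2 = 4, 34, 35                       # illustrative dimension and sample sizes
mu_D = np.array([0.5, 0.4, 0.3, 0.3])       # hypothetical drug-arm mean vector
mu_P = np.zeros(p)                          # hypothetical placebo-arm mean vector
Sigma = 0.5 * np.eye(p) + 0.5               # hypothetical equicorrelated covariance

X_D = rng.multivariate_normal(mu_D, Sigma, size=n1)   # drug arm
X_P = rng.multivariate_normal(mu_P, Sigma, size=n2)   # placebo arm

# Sufficient statistics after the translation reduction:
#   Y = sqrt(n1*n2/(n1+n2)) * (Xbar_D - Xbar_P)  ~  N_p(mu, Sigma)
#   S = pooled SSCP matrix                        ~  W_p(n, Sigma), n = n1+n2-2
Y = np.sqrt(n1 * n2 / (n1 + n2)) * (X_D.mean(axis=0) - X_P.mean(axis=0))
S = (n1 - 1) * np.cov(X_D, rowvar=False) + (n2 - 1) * np.cov(X_P, rowvar=False)
n = n1 + n2 - 2

# Coordinatewise t statistics and the statistic T = min_i T_i used later
T_i = Y / np.sqrt(np.diag(S) / n)
T = T_i.min()
```

The pooled matrix $S$ here is formed exactly as in the text, as the sum of the two centered sums-of-squares-and-cross-products matrices.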
In the multiple endpoints context, $H_A$ asserts that each $\mu_i > 0$. One null hypothesis of interest is that $\mu$ is in the complement $\Omega^c$ of $\Omega$; that is, at least one coordinate of $\mu$ is less than or equal to zero. This we write as
$$H_0^{(1)}: (\mu, \Sigma) \in \Omega^c \times S_p^+. \tag{2}$$
To describe an interesting second null hypothesis, note that $\Omega$ is an open convex cone and the boundary of $\Omega$ is
$$\partial\Omega = \{\mu \in R^p \mid \mu_i \ge 0,\ i = 1, \ldots, p \text{ and } \mu_j = 0 \text{ for some } j\}.$$
Our second null hypothesis is
$$H_0^{(2)}: (\mu, \Sigma) \in (\partial\Omega) \times S_p^+. \tag{3}$$
Informally, the first null hypothesis is that $\min_{1 \le i \le p} \mu_i \le 0$; and the second null hypothesis is, given $\mu_i \ge 0$ for all $i$, that $\min_{1 \le i \le p} \mu_i = 0$. For future reference, note that
$$\bar\Omega = \{\mu \in R^p \mid \mu_i \ge 0,\ i = 1, \ldots, p\} \tag{4}$$
is the closure of $\Omega$. Let
$$F_j = \{\mu \in R^p \mid \mu \in \bar\Omega,\ \mu_j = 0\}. \tag{5}$$
Then $F_j$ is a "face" of the closed polyhedral cone $\bar\Omega$ and
$$\partial\Omega = \bigcup_{j=1}^p F_j. \tag{6}$$
These "faces" will arise in the discussion later.

4. The intersection–union test

For the problems considered here, the IUTs as described in Section 3 of Berger and Hsu (1996) are plausible candidate tests. To see this, first let
$$\Theta_i = \{(\mu, \Sigma) \mid \mu_i \le 0,\ \Sigma \in S_p^+\} \quad\text{for } i = 1, \ldots, p.$$
Then the null hypothesis $H_0^{(1)}$ and the alternative hypothesis $H_A$ are
$$H_0^{(1)}: (\mu, \Sigma) \in \bigcup_{i=1}^p \Theta_i \quad\text{and}\quad H_A: (\mu, \Sigma) \in \bigcap_{i=1}^p \Theta_i^c.$$
An obvious one-sided test for testing $\Theta_i$ versus $\Theta_i^c$ is the one-sided t-test that rejects for large values of
$$T_i = \frac{Y_i}{\sqrt{s_{ii}/n}}$$
for $i = 1, \ldots, p$, where $s_{ii}$ is the $i$th diagonal element of $S$. When $\mu_i = 0$, $T_i$ has a Student t-distribution with $n$ degrees of freedom. Let $c_\alpha$ denote the $(1 - \alpha)$-quantile of the $t_n$ distribution. Then the test (an IUT) that rejects $H_0^{(1)}$ if and only if
$$T_i \ge c_\alpha \quad\text{for all } i = 1, \ldots, p \tag{7}$$
has size equal to $\alpha$ for testing $H_0^{(1)}$ versus $H_A$ (see Theorem 2 in Berger and Hsu, 1996). (This is, in fact, the procedure in common usage, where each endpoint is tested separately at the same significance level using two-sample t-statistics, and drug efficacy is claimed only if each is significant.) Applying essentially the same argument shows that the test that
rejects when (7) holds also has size $\alpha$ for testing $H_0^{(2)}$ versus $H_A$. Note that, for either null hypothesis, the probability of rejecting the null when all the $\mu_i = 0$ and $\Sigma = I_p$ is $\alpha^p$, and this is strictly less than $\alpha$. This leads immediately to the following two observations. First, the test may be extremely conservative. For example, if $p = 4$ endpoints are each tested at level $\alpha = .05$, the IUT's probability of a Type I error, if in fact all the $\mu_i = 0$ and $\Sigma = I_p$, is $(.05)^4 = .00000625$. Secondly, because the power function of the test is a continuous function of the parameters, there are points in the alternative parameter space "close" to the point $(\mu = 0, \Sigma = I_p)$ for which the power is strictly less than $\alpha$; in other words, the test is biased.

The remainder of this paper concerns the IUT (which, in Section 5, is also shown to be the LRT). In all that follows, we assume $\alpha \in (0, \frac12)$, so $c_\alpha > 0$. Setting
$$T = \min_{1 \le i \le p} T_i, \tag{8}$$
the IUT rejects if $T \ge c_\alpha$, and the discussion above shows that this test has size equal to $\alpha$. Now, suppose the value $T = t_0$ is observed. The p-value of this observed value, for each null hypothesis, is
$$p_i(t_0) = \sup_{H_0^{(i)}} P\{T \ge t_0 \mid \mu, \Sigma\}, \quad i = 1, 2, \tag{9}$$
where the supremum is over all values of $(\mu, \Sigma)$ in $H_0^{(i)}$, $i = 1, 2$ (see Casella and Berger, 1990, p. 364). Theorem 1 below shows that (9) is just the upper tail probability of a Student t-distribution. To formulate this result, first partition the mean vector $\mu$ as
$$\mu = \begin{pmatrix} \mu_1 \\ \dot\mu \end{pmatrix}, \quad \mu_1 \in R^1,\ \dot\mu \in R^{p-1}.$$
In what follows, the notation $\dot\mu \to +\infty$ means that each coordinate of $\dot\mu$ converges to $+\infty$.

Theorem 1. For $t_0 \in R^1$,
$$p_i(t_0) = P\{T_1 \ge t_0 \mid \mu_1 = 0, \Sigma = I_p\}, \quad i = 1, 2. \tag{10}$$
That is, under both $H_0^{(1)}$ and $H_0^{(2)}$, the p-value in (9) is equal to the upper tail probability
$$P\{t_n^* \ge t_0\} = p_i(t_0), \quad i = 1, 2, \tag{11}$$
where $t_n^*$ denotes a Student-t random variable with $n$ degrees of freedom.

Proof. For $H_0^{(i)}$, $i = 1, 2$,
$$\sup_{H_0^{(i)}} P\{T \ge t_0 \mid \mu, \Sigma\} \ge \lim_{\dot\mu \to \infty} P\{T_1 \ge t_0, \ldots, T_p \ge t_0 \mid \mu_1 = 0, \dot\mu, \Sigma = I_p\} = P\{T_1 \ge t_0 \mid \mu_1 = 0, \Sigma = I_p\} = P\{t_n^* \ge t_0\}.$$
Further, under $H_0^{(i)}$, $i = 1, 2$, there is an index $j$ so that $\mu_j \le 0$, where $\mu_j$ is the $j$th coordinate of $\mu$. Then
$$P\{T \ge t_0 \mid \mu, \Sigma\} \le P\{T_j \ge t_0 \mid \mu, \Sigma\} = P\{T_j \ge t_0 \mid \mu_j, \sigma_{jj}\}, \tag{12}$$
where $\sigma_{jj}$ is the $(j, j)$ element of $\Sigma$. Since the non-central t distribution has monotone likelihood ratio and $\mu_j \le 0$, the right-hand expression in (12) is bounded above by
$$P\{T_j \ge t_0 \mid \mu_j = 0, \sigma_{jj}\}. \tag{13}$$
Because (13) does not depend on $\sigma_{jj}$, it is clear that (13) is equal to (11). Thus (11) is both an upper bound and a lower bound on $p_i(t_0)$, $i = 1, 2$, and the proof is complete.
Note that Theorem 1 tells us that the p-value of the IUT is, as one might expect, the largest of the p-values associated with the individual t-tests.

Example. Zhang et al. (1997) consider a clinical trial comparing drug with placebo in asthmatic subjects, with $p = 4$ primary endpoints: (1) forced expiratory volume in 1 s (FEV$_1$), in liters, (2) peak expiratory flow rate (PEFR), in liters per minute, (3) symptoms score (SS), and (4) β-agonist use, in puffs per day. The analysis in Zhang et al. (1997) uses percent change from baseline for FEV$_1$, and change from baseline for the other three endpoints. The sample sizes are $n_1 = 34$ (drug) and $n_2 = 35$ (placebo). The two-sample t-statistics comparing drug with placebo for each endpoint, and their corresponding (one-sided) p-values, are
$$T_1 = 3.00\ (p_1 = .0019), \quad T_2 = 2.75\ (p_2 = .0039), \quad T_3 = 2.25\ (p_3 = .0139), \quad T_4 = 2.13\ (p_4 = .0185).$$
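These p-values can be reproduced (up to rounding) from the t statistics alone. A short check using scipy, with $n = n_1 + n_2 - 2 = 67$ degrees of freedom:

```python
from scipy.stats import t

n = 34 + 35 - 2                     # pooled degrees of freedom = 67
t_stats = {"FEV1": 3.00, "PEFR": 2.75, "SS": 2.25, "beta-agonist": 2.13}

# One-sided upper-tail p-values for the individual t-tests
p_values = {name: t.sf(stat, n) for name, stat in t_stats.items()}

# Theorem 1: the IUT p-value is the upper tail probability at T = min_i T_i,
# i.e. the largest of the individual p-values.
iut_p = t.sf(min(t_stats.values()), n)

for name, pv in p_values.items():
    print(f"{name}: p = {pv:.4f}")
print(f"IUT p-value = {iut_p:.4f}")
```

Because `t.sf` is decreasing in its first argument, the equality between "upper tail at the minimum statistic" and "maximum individual p-value" holds automatically.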
Testing each endpoint at, for example, significance level $\alpha = .05$ supports the claim that the drug is efficacious. As noted above, this is the same as the IUT with size $\alpha = .05$, and (from Theorem 1) its p-value is equal to .0185, the largest of the individual p-values. It is shown in the next section that this test is also the LRT with size $\alpha = .05$.

5. The likelihood ratio test
In this section, we discuss the LRTs for both $H_0^{(1)}$ and $H_0^{(2)}$ versus $H_A$ and show that these are identical to the IUT described in Section 4. To this end, note that the likelihood function, given the data $(y, S)$, is proportional to
$$L(\mu, \Sigma) = |\Sigma|^{-(n+1)/2} \exp[-\tfrac12 (y - \mu)'\Sigma^{-1}(y - \mu) - \tfrac12 \operatorname{tr} \Sigma^{-1} S]. \tag{14}$$
For a fixed value of $\mu$, a standard calculation shows that the matrix
$$\hat\Sigma = \frac{1}{n+1}\,(S + (y - \mu)(y - \mu)') \tag{15}$$
maximizes $L(\mu, \Sigma)$ as $\Sigma$ ranges over $S_p^+$. Some routine algebra shows that $L(\mu, \hat\Sigma)$ is proportional to
$$L^*(\mu) = \frac{1}{[1 + (y - \mu)'S^{-1}(y - \mu)]^{(n+1)/2}}. \tag{16}$$
Therefore, for testing $H_0^{(1)}$ versus $H_A$, the LRT will reject $H_0^{(1)}$ if
$$\Lambda_1 = \frac{\sup_{\mu \in \Omega^c} L^*(\mu)}{\sup_{\mu \in R^p} L^*(\mu)} \tag{17}$$
is too small. The denominator in (17) is equal to 1, so the LRT rejects $H_0^{(1)}$ if and only if
$$D_1 = \inf_{\mu \in \Omega^c} (y - \mu)'S^{-1}(y - \mu) \tag{18}$$
is large enough.

Theorem 2. For $\alpha \in (0, \frac12)$, the size $\alpha$ LRT that rejects $H_0^{(1)}$ for large values of $D_1$ in (18) is equivalent to the size $\alpha$ IUT described in Section 4.

The essence of the proof of Theorem 2 involves establishing the identity
$$D_1^{1/2} = \max\left\{\frac{1}{\sqrt n}\min_{1 \le i \le p} T_i,\ 0\right\}. \tag{19}$$
Thus, when $T$ in (8) is positive, $D_1^{1/2} = T/\sqrt n$, so rejecting $H_0^{(1)}$ for large positive values of $T$ (the IUT) is identical to rejecting $H_0^{(1)}$ for large values of $D_1$ (the LRT). The details of the proof of Theorem 2 are given in Section A.1 of the Appendix.
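The identity (19) lends itself to a numerical spot check: minimize the quadratic form over each half-space $\{\mu_j \le 0\}$ (whose union is $\Omega^c$) and compare with the t-statistic side. The data values below are hypothetical, and scipy's general-purpose bounded optimizer stands in for the closed-form argument of the Appendix.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
p, n = 3, 20                               # illustrative dimensions
y = np.array([0.8, 0.5, 1.2])              # hypothetical data; all coordinates positive
A = rng.standard_normal((n, p))
S = A.T @ A + n * np.eye(p)                # a positive definite stand-in for the Wishart S
Sinv = np.linalg.inv(S)

def quad(mu):
    d = y - mu
    return d @ Sinv @ d

# D1 = inf over Omega^c = union over j of {mu_j <= 0}:
# minimize over each half-space and take the smallest value.
D1 = min(
    minimize(quad, np.zeros(p),
             bounds=[(None, 0) if i == j else (None, None) for i in range(p)]).fun
    for j in range(p)
)

# Right-hand side of (19): (1/sqrt(n)) * min_i T_i with T_i = y_i / sqrt(s_ii / n)
rhs = max(np.min(y / np.sqrt(np.diag(S))), 0.0)
print(np.sqrt(D1), rhs)
```

The two printed numbers should agree to optimizer tolerance, as (19) asserts.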
For testing the null hypothesis $H_0^{(2)}$ versus $H_A$, we simply repeat the earlier argument to show that the LRT rejects $H_0^{(2)}$ if and only if
$$\Lambda_2 = \Lambda_2(y) = \frac{\sup_{\mu \in \partial\Omega} L^*(\mu)}{\sup_{\mu \in \bar\Omega} L^*(\mu)} \tag{20}$$
is small enough, where $L^*$ is given in (16). The obvious identity $\partial\Omega = \partial\bar\Omega$ has been used in (20).

Theorem 3. For $\alpha \in (0, \frac12)$, the size $\alpha$ LRT of $H_0^{(2)}$ versus $H_A$ that rejects $H_0^{(2)}$ for small values of $\Lambda_2$ is equivalent to the size $\alpha$ IUT.

The proof of Theorem 3 is given in Section A.2 of the Appendix. The proofs of Theorems 2 and 3 are somewhat technical, but rely only on standard multivariate methods.

6. The power function
The IUT (equivalently LRT) that rejects if $T \ge c_\alpha$ has size $\alpha$ for both null hypotheses $H_0^{(i)}$, $i = 1, 2$. Thus,
$$\sup_{(\mu,\Sigma) \in H_0^{(1)}} P\{T \ge c_\alpha \mid \mu, \Sigma\} = \sup_{(\mu,\Sigma) \in H_0^{(2)}} P\{T \ge c_\alpha \mid \mu, \Sigma\} = \alpha. \tag{21}$$
For any $(\mu, \Sigma)$, the power function of the IUT is
$$\beta(\mu, \Sigma) = P\{T \ge c_\alpha \mid \mu, \Sigma\}. \tag{22}$$
In this section we provide some information about the behavior of $\beta$. Because the IUT (equivalently LRT) is invariant under positive scale changes of each coordinate, the power function has the property that
$$\beta(D\mu, D\Sigma D) = \beta(\mu, \Sigma) \tag{23}$$
for all $p \times p$ diagonal matrices $D$ whose diagonal elements, say $d_{11}, \ldots, d_{pp}$, are positive. Let the diagonal elements of $\Sigma$ be $\sigma_{11}, \ldots, \sigma_{pp}$ and pick $d_{ii} = 1/\sqrt{\sigma_{ii}}$, $i = 1, \ldots, p$. Then
$$R = D\Sigma D \tag{24}$$
is the correlation matrix,
$$\theta = D\mu \tag{25}$$
is the vector consisting of the non-centrality parameters of the statistics $T_1, \ldots, T_p$, and
$$\beta(\theta, R) = \beta(\mu, \Sigma). \tag{26}$$
It is clear that marginally, each $T_i$ has a non-central t-distribution with $n$ degrees of freedom and non-centrality parameter $\theta_i$, where $\theta$ is given in (25).

Theorem 4. If the covariance matrix $\Sigma$ is diagonal, then
$$\beta(\mu, \Sigma) = \beta(\theta, I_p) = \prod_{i=1}^p P\{T_i \ge c_\alpha \mid \theta_i\}. \tag{27}$$

Proof. When $\Sigma$ is diagonal, $T_1, \ldots, T_p$ are mutually independent, so (27) follows.
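Under a diagonal $\Sigma$, the power (27) is a simple product of non-central t upper-tail probabilities and is directly computable. A minimal sketch using scipy; the values of $\alpha$, $p$, and $n$ are illustrative assumptions.

```python
import numpy as np
from scipy.stats import nct, t

alpha, p, n = 0.05, 4, 67            # illustrative level, dimension, degrees of freedom
c = t.ppf(1 - alpha, n)              # critical value c_alpha

def power_diag(theta):
    """Power (27) of the IUT when Sigma is diagonal: a product of
    non-central t upper-tail probabilities, one per endpoint."""
    return float(np.prod(nct.sf(c, n, theta)))

print(power_diag(np.full(p, 2.5)))                     # balanced non-centralities
print(power_diag(np.array([10.0, 0.01, 0.01, 0.01])))  # unbalanced: power collapses
```

The second call illustrates the bias discussed in Section 4: with three coordinates near zero, the power is far below $\alpha$ even though the alternative holds.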
In many multiple endpoint problems involving drug efficacy, it is reasonable to assume that the elements of the covariance matrix are non-negative—that is, all the correlations between endpoints are non-negative. In this case, the following result provides a useful lower bound for the power function.
Theorem 5. Assume all the elements of the covariance matrix $\Sigma$ are non-negative. Then the power function of the IUT satisfies
$$\beta(\mu, \Sigma) \ge \beta(\theta, I_p) = \prod_{i=1}^p P\{T_i \ge c_\alpha \mid \theta_i\}. \tag{28}$$
The proof of the inequality (28) involves Slepian's Theorem (1962) coupled with an extension of Corollary 4.1 in Das Gupta et al. (1972). The details are given in Section A.3 of the Appendix.

The inequality (28) shows that the infimum of the power function over all positive definite $\Sigma$'s with non-negative elements is $\beta(\theta, I_p)$. Of course, given $\theta_1, \ldots, \theta_p$, $\beta(\theta, I_p)$ is easily computable for each value of $p$ and $n$. A natural question is how $\beta(\theta, I_p)$ behaves as a function of $\theta$, especially when the alternative hypothesis $H_A$ holds, that is, when $\theta_i > 0$ for $i = 1, \ldots, p$. Some information on the behavior of $\beta(\theta, I_p)$ is available via notions involving majorization and Schur concavity. A brief description of this follows. The reader is referred to the book of Marshall and Olkin (1979) for history and more detailed information.

Consider two vectors $x$ and $y$ in $R^p$ and let $x_{[1]} \ge x_{[2]} \ge \cdots \ge x_{[p]}$, $y_{[1]} \ge y_{[2]} \ge \cdots \ge y_{[p]}$ denote the ordered coordinates of $x$ and $y$, ordered from largest to smallest. Then $y$ majorizes $x$ if
$$\sum_{i=1}^r y_{[i]} \ge \sum_{i=1}^r x_{[i]} \quad\text{for } r = 1, \ldots, p - 1, \qquad \sum_{i=1}^p y_{[i]} = \sum_{i=1}^p x_{[i]}. \tag{29}$$
When (29) holds, we write $x \prec y$ and say that $x$ is majorized by $y$ or, equivalently, that $y$ majorizes $x$. The following two examples provide some benchmarks for majorization:
$$\left(\frac1p, \ldots, \frac1p\right) \prec \left(\frac1{p-1}, \ldots, \frac1{p-1}, 0\right) \prec \cdots \prec \left(\frac12, \frac12, 0, \ldots, 0\right) \prec (1, 0, \ldots, 0)$$
and
$$\left(\frac1p, \ldots, \frac1p\right) \prec (b_1, \ldots, b_p) \prec (1, 0, \ldots, 0)$$
for any $p$-vector whose coordinates are $b_1, \ldots, b_p$ with $b_i \ge 0$ and $\sum b_i = 1$.

A geometric characterization of majorization that is often helpful is the following. Let $G$ be the group of $p \times p$ permutation matrices. Given $y \in R^p$, let $C(y)$ be the convex hull of the set $\{gy \mid g \in G\}$. Then $x \prec y$ if and only if $x \in C(y)$. For a discussion of this and some related ideas, see Eaton (1987).

One useful application of majorization involves the derivation of inequalities. Given a symmetric (under permutations) convex subset $B \subseteq R^p$, a real-valued function $h$ defined on $B$ is called Schur concave if
$$h(x) \ge h(y) \quad\text{for all } x, y \in B \text{ with } x \prec y. \tag{30}$$
For example, if $B$ is the probability simplex in $R^p$ and
$$H(b) = -\sum_{i=1}^p b_i \log b_i, \quad b \in B,$$
is the entropy function, the fact that $H(\cdot)$ is Schur concave, coupled with the benchmark examples given above, provides useful inequalities for entropy. See Marshall and Olkin (1979) for more examples.
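The definitions above translate directly into code. The following small sketch checks the majorization condition (29) on the benchmark chain and illustrates the Schur concavity of entropy; the function names and tolerances are our own conventions, not the paper's.

```python
import numpy as np

def majorizes(y, x, tol=1e-12):
    """True if y majorizes x (x ≺ y): sums agree and partial sums of the
    decreasing rearrangement of y dominate those of x, as in (29)."""
    xs, ys = np.sort(x)[::-1], np.sort(y)[::-1]
    if abs(xs.sum() - ys.sum()) > tol:
        return False
    return bool(np.all(np.cumsum(ys)[:-1] >= np.cumsum(xs)[:-1] - tol))

def entropy(b):
    """Entropy H(b) = -sum b_i log b_i, with the convention 0 log 0 = 0."""
    b = np.asarray(b, dtype=float)
    nz = b[b > 0]
    return float(-np.sum(nz * np.log(nz)))

# Benchmark chain for p = 4, and the Schur concavity of entropy along it:
chain = [np.full(4, 0.25),
         np.array([1/3, 1/3, 1/3, 0.0]),
         np.array([0.5, 0.5, 0.0, 0.0]),
         np.array([1.0, 0.0, 0.0, 0.0])]
for x, y in zip(chain, chain[1:]):
    assert majorizes(y, x) and entropy(x) >= entropy(y)
```

Entropy decreases monotonically from $\log 4$ to $0$ along the chain, exactly as Schur concavity and the benchmark orderings predict.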
We now return to a discussion of
$$\beta(\theta, I_p) = \prod_{i=1}^p w(\theta_i), \tag{31}$$
where
$$w(\theta) = P\{T_i \ge c_\alpha \mid \theta\} \tag{32}$$
is the right tail probability of a non-central t random variable with $n$ degrees of freedom and non-centrality parameter $\theta$. Since the non-central t-distribution has a monotone likelihood ratio, $w(\cdot)$ is increasing in $\theta$, so $\beta(\theta, I_p)$ is increasing in each coordinate of $\theta$. The following theorem is proved in Section A.4 of the Appendix.

Theorem 6. The function $\beta(\theta, I_p)$ is a Schur concave function defined on $R^p$.

Theorem 6 implies that if $t = \sum_{i=1}^p \theta_i > 0$ is fixed, then the maximum of the power function $\beta(\theta, I_p)$ on the contour $\sum_{i=1}^p \theta_i = t$ is
$$m(t) = [w(t/p)]^p. \tag{33}$$
Of course, (33) provides an upper bound on how well (in terms of power) the IUT can do on the specified contour. It should be remarked that for $\theta_i \ge 0$, $i = 1, \ldots, p$, and $\sum_{i=1}^p \theta_i = t$ given, $\beta(\theta, I_p)$ is minimized at $(t, 0, \ldots, 0)$ and the minimum is $m_0(t) = \alpha^{p-1} w(t)$, which is smaller than $\alpha^{p-1}$. This reinforces an earlier observation about bias; that is, there are points in the alternative parameter space where the power is much smaller than the size $\alpha$ when $p \ge 3$.

7. Discussion

The results of this paper show that assessing multiple endpoint problems via IUTs, or equivalently via LRTs, is a rather delicate matter. When the data are assumed to be $p$-variate normal, the power function of the size $\alpha$ test is a function of $p(p+1)/2$ parameters. Tight lower bounds on the power function, which involve only the $p$ non-centrality parameters, are provided in Theorem 5. The use of the bounds in practice involves the specification of the two sample sizes $n_1$ and $n_2$, the differences $\delta_i$ of population means introduced in Section 2, and the population variances $\sigma_{11}, \ldots, \sigma_{pp}$. Only then are the non-centrality parameters
$$\theta_i = \sqrt{\frac{n_1 n_2}{n_1 + n_2}}\,\frac{\delta_i}{\sqrt{\sigma_{ii}}}, \quad i = 1, \ldots, p, \tag{34}$$
specified, so that the right-hand side of (28) can be calculated to provide a lower bound on the power function.

Assuming $n_1 = n_2 = m$ and $\Delta = \delta_i/\sqrt{\sigma_{ii}}$ for $i = 1, \ldots, p$, the $\theta_i$ in (34) become
$$\theta = \sqrt{\frac m2}\,\Delta \tag{35}$$
and the lower bound (28) becomes
$$(P\{T_1 \ge c_\alpha \mid \theta\})^p, \tag{36}$$
where $T_1$ is a non-central Student t variable with $2m - 2$ degrees of freedom and non-centrality parameter $\theta$ given by (35). Setting (36) equal to some pre-specified power yields a sample size $m$ necessary to achieve this. The sample size calculation would have to be done numerically.

In summary then, (28) provides a sharp lower bound for the power function of the level $\alpha$ IUT (LRT). This sharp lower bound is given in (31) as a product of right tail probabilities based on a non-central t random variable. This shows that the power function $\beta(\theta, I_p)$ is non-decreasing in each coordinate $\theta_i \ge 0$, $i = 1, \ldots, p$. Further, Theorem 6 shows that $\beta(\theta, I_p)$ is Schur concave in $\theta$. As a consequence, $\beta(\theta, I_p)$ is maximized (given the sum of the coordinates) when all the coordinates are equal. This describes what we know qualitatively about the power function (31). More detailed information about the power would require specification of the dimension, the sample sizes, and the non-centrality parameters; see, for example, (35) and (36) for a particular case.
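The numerical sample size calculation mentioned above can be sketched as a simple search over $m$, using (35) and (36). The function below is an illustration under the stated equal-arm, common-effect assumptions; the effect size `Delta` and power target are hypothetical inputs.

```python
from scipy.stats import nct, t

def min_sample_size(Delta, p, alpha=0.05, target=0.80, m_max=2000):
    """Smallest per-arm size m with (P{T1 >= c_alpha | theta})^p >= target,
    where theta = sqrt(m/2)*Delta and T1 is non-central t with 2m-2 df.
    Assumes equal arms (n1 = n2 = m) and a common standardized effect Delta."""
    for m in range(2, m_max + 1):
        n = 2 * m - 2
        c = t.ppf(1 - alpha, n)            # critical value c_alpha, as in (7)
        theta = (m / 2) ** 0.5 * Delta     # non-centrality (35)
        if nct.sf(c, n, theta) ** p >= target:   # lower bound (36)
            return m
    raise ValueError("target power not reached by m_max")

print(min_sample_size(Delta=0.5, p=4))
```

Because (36) is only a lower bound on the power, the resulting $m$ is conservative: the actual power at that sample size is at least the target whenever the correlations are non-negative.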
Open questions remain. For example, are better lower bounds than (28) available, perhaps involving the minimum correlation coefficient in the correlation matrix $R$? It might be the case that (26) is decreasing in each off-diagonal element of $R$ (an analog of Slepian's Theorem), but this is not known to us. Of course, this would actually be the case if $(T_1, \ldots, T_p)$ were jointly normal with correlation matrix $R$.

We now turn to the rather natural question: "Where in the parameter space is the power function equal to $\alpha$?" If we restrict attention to the null spaces specified by $H_0^{(1)}$ or $H_0^{(2)}$, we are then asking: "When is the probability of a Type I error actually equal to the intended significance level $\alpha$?" According to Theorem 3, for $(\mu, \Sigma)$ in either of the null spaces specified by $H_0^{(1)}$ or $H_0^{(2)}$, the value of the power function is bounded above by $\alpha$ and its supremum is equal to $\alpha$. Now, fix $\Sigma$ and set $\mu_1 = 0$, where $\mu_1$ is the first coordinate of the vector $\mu$. Using the notation adopted in Theorem 1, it is easy to show that
$$\lim_{\dot\mu \to \infty} P\{T \ge c_\alpha \mid \mu, \Sigma\} = P\{T_1 \ge c_\alpha \mid \mu_1 = 0\} = \alpha.$$
The same conclusion is valid with any $\mu_i = 0$, $\Sigma$ fixed, and the remaining $\mu_j$, $j \ne i$, converging to $+\infty$. This argument shows that the level $\alpha$ is achieved in the null hypotheses on the set where $\Sigma$ is fixed, one coordinate of $\mu$ is zero, and the remaining coordinates of $\mu$ are $+\infty$.

Finally, the IUT approach to the multiple endpoints problem investigated in this paper treats the endpoints symmetrically, or as of "equal importance". The results of this paper suggest that decision makers may well be better off (at least in terms of power) ordering the endpoints in terms of importance and then making decisions sequentially. Such problems could perhaps be formulated using decision theory as in Cohen and Sackrowitz (2005), thus suggesting step-up or step-down procedures. This has yet to be explored.

Acknowledgment

The authors would like to thank a referee for many helpful comments.

Appendix A.

In this technical appendix, proofs of Theorems 2, 3, 5 and 6 are given. The appendix is divided into four parts, labeled A.1 through A.4, dealing with each of these.

A.1. Discussion of Theorem 2

We first recall a well-known matrix identity. Let $A$ be a $p \times p$ positive definite matrix partitioned as
$$A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}$$
with $A_{11}$ and $A_{22}$ both square matrices. Also consider
$$u = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} \in R^p$$
partitioned conformably with $A$. Then (see Eaton, 1983, p. 182)
$$u'A^{-1}u = u_1'A_{11}^{-1}u_1 + (u_2 - A_{21}A_{11}^{-1}u_1)'A_{22.1}^{-1}(u_2 - A_{21}A_{11}^{-1}u_1) = u_2'A_{22}^{-1}u_2 + (u_1 - A_{12}A_{22}^{-1}u_2)'A_{11.2}^{-1}(u_1 - A_{12}A_{22}^{-1}u_2),$$
where $A_{11.2} = A_{11} - A_{12}A_{22}^{-1}A_{21}$ and $A_{22.1} = A_{22} - A_{21}A_{11}^{-1}A_{12}$.

Proof of Theorem 2. As noted in Section 5, it suffices to establish the identity (19). To this end, first write
$$S = B_S^{1/2} R B_S^{1/2}, \tag{A.1}$$
where $B_S$ is a $p \times p$ diagonal matrix with diagonal entries $s_{11}, \ldots, s_{pp}$ and $R$ is a (sample) correlation matrix with diagonal entries equal to one. Next, consider
$$z = B_S^{-1/2} y, \quad \nu = B_S^{-1/2} \mu,$$
which are both in $R^p$. In terms of these new variables,
$$D_1 = \inf_{\nu \in \Omega^c} (z - \nu)'R^{-1}(z - \nu) \tag{A.2}$$
and $T_i = \sqrt n\, z_i$, $i = 1, \ldots, p$. If $z \in \Omega^c$, then $D_1 = 0$ since $\hat\nu = z$ achieves the infimum in (A.2), and (19) holds for this case. Next, assume $z \in \Omega$ and assume for simplicity that
$$z_1 = \min_{1 \le i \le p} z_i. \tag{A.3}$$
We now show that, for $z \in \Omega$ and when (A.3) holds,
$$D_1^{1/2} = z_1. \tag{A.4}$$
Because the function $\nu \mapsto (z - \nu)'R^{-1}(z - \nu)$ is convex and achieves its infimum at $z \in \Omega$, the infimum in (A.2) is equal to
$$I \equiv \inf_{\nu \in \partial\Omega} (z - \nu)'R^{-1}(z - \nu).$$
In terms of the faces $F_j$ in (5), we have
$$I = \min_{1 \le j \le p}\ \inf_{\nu \in F_j} (z - \nu)'R^{-1}(z - \nu), \tag{A.5}$$
and with
$$z = \begin{pmatrix} z_1 \\ \dot z \end{pmatrix}, \quad \nu = \begin{pmatrix} 0 \\ \dot\nu \end{pmatrix},$$
the fact that $R$ is a correlation matrix yields
$$(z - \nu)'R^{-1}(z - \nu) = z_1^2 + (\dot z - \dot\nu - R_{21}z_1)'R_{22.1}^{-1}(\dot z - \dot\nu - R_{21}z_1).$$
Because (A.3) holds and the coordinates of the $(p-1) \times 1$ vector $R_{21}$ are all in $[-1, 1]$, the vector $\hat{\dot\nu} = \dot z - R_{21}z_1$ has non-negative coordinates. Thus the choice
$$\hat\nu = \begin{pmatrix} 0 \\ \hat{\dot\nu} \end{pmatrix} \in F_1$$
shows that
$$\inf_{\nu \in F_1} (z - \nu)'R^{-1}(z - \nu) = z_1^2.$$
But a similar argument shows that for $2 \le j \le p$,
$$\inf_{\nu \in F_j} (z - \nu)'R^{-1}(z - \nu) \ge z_j^2.$$
Thus, when (A.3) holds, $I = z_1^2$ and (A.4) is established.

A symmetric argument shows that if $z \in \Omega$ and $z_j = \min_{1 \le i \le p} z_i$, then $D_1^{1/2} = z_j$. Hence, for $z \in \Omega$,
$$D_1^{1/2} = \min_{1 \le i \le p} z_i = \frac{1}{\sqrt n}\min_{1 \le i \le p} T_i,$$
and the proof is complete.
A.2. Discussion of Theorem 3

Here we provide a proof of Theorem 3. The idea is to show that the LRT for $H_0^{(1)}$ versus $H_A$ is, in fact, the same as the LRT for $H_0^{(2)}$ versus $H_A$, and then invoke Theorem 2. Recall that the LRT for $H_0^{(2)}$ versus $H_A$ rejects if $\Lambda_2$ in (20) is small enough. The strict convexity of the function $\mu \mapsto (y - \mu)'S^{-1}(y - \mu)$ shows that, for $y \in \Omega^c$,
$$\inf_{\mu \in \bar\Omega} (y - \mu)'S^{-1}(y - \mu) = \inf_{\mu \in \partial\Omega} (y - \mu)'S^{-1}(y - \mu).$$
This immediately implies that $\Lambda_2(y) = 1$ for $y \in \Omega^c$. But for $y \in \Omega$, the supremum in the denominator in (20) is equal to one. Therefore, $\Lambda_2(y)$ is given by
$$\Lambda_2(y) = \begin{cases} 1 & \text{if } y \in \Omega^c, \\[2pt] \sup_{\mu \in \partial\Omega} L^*(\mu) & \text{if } y \in \Omega, \end{cases}$$
and the LRT rejects $H_0^{(2)}$ if $\Lambda_2(y)$ is too small. However, for $y \in \Omega$,
$$\sup_{\mu \in \partial\Omega} L^*(\mu) = \frac{1}{[1 + \inf_{\mu \in \partial\Omega} (y - \mu)'S^{-1}(y - \mu)]^{(n+1)/2}}.$$
Now, repeating the arguments used to prove Theorem 2 (see Section A.1 above), for $y \in \Omega$,
$$\inf_{\mu \in \partial\Omega} (y - \mu)'S^{-1}(y - \mu) = D_1,$$
where $D_1$ is given in (18). Because $D_1 > 0$ for $y \in \Omega$, we see that the LRT for $H_0^{(2)}$ versus $H_A$ rejects $H_0^{(2)}$ for small enough values of $\Lambda_2(y)$ given by
$$\Lambda_2(y) = \begin{cases} 1 & \text{if } y \in \Omega^c, \\[2pt] \dfrac{1}{(1 + D_1)^{(n+1)/2}} & \text{if } y \in \Omega. \end{cases}$$
Since $\alpha \in (0, \frac12)$, $c_\alpha > 0$ and this test is the same as the LRT for $H_0^{(1)}$ versus $H_A$. The proof of Theorem 3 is complete.

A.3. Discussion of Theorem 5

We first state a result that is a direct consequence of Slepian (1962) (see Tong, 1980, pp. 10–12 for a discussion). Suppose $Z_1, \ldots, Z_p$ have a joint normal distribution, $N_p(\mu, \Sigma_0)$, where $\Sigma_0$ is a correlation matrix, all of whose entries are non-negative. Then for all real numbers $a_1, \ldots, a_p$,
$$P\{Z_i \ge a_i \text{ for } i = 1, \ldots, p \mid \Sigma_0\} \ge P\{Z_i \ge a_i \text{ for } i = 1, \ldots, p \mid I_p\} = \prod_{i=1}^p P\{Z_i \ge a_i\}. \tag{A.6}$$
Actually, Slepian's result is stronger than (A.6) (the first probability is non-decreasing in each off-diagonal element of $\Sigma_0$; see Tong (1980) for a precise statement), but (A.6) is what we need for the proof of Theorem 5.

Next, we recall an inequality for a Wishart matrix. Suppose $S$ has a Wishart $W_p(n, \Sigma_1)$ distribution, $n \ge p$, with diagonal elements $s_{11}, \ldots, s_{pp}$. Then for all positive numbers $a_1, \ldots, a_p$,
$$P\{s_{11} \le a_1, \ldots, s_{pp} \le a_p \mid \Sigma_1\} \ge \prod_{i=1}^p P\{s_{ii} \le a_i \mid \sigma_{ii}\}, \tag{A.7}$$
where $\sigma_{11}, \ldots, \sigma_{pp}$ are the diagonal elements of the covariance matrix $\Sigma_1$. Inequality (A.7) is from Corollary 4.1 in Das Gupta et al. (1972). An immediate extension of (A.7) is the following.

Lemma A.1. Consider non-increasing functions $h_i: (0, \infty) \to [0, \infty)$, for $i = 1, \ldots, p$, and suppose $S \sim W_p(n, \Sigma_1)$ with $n \ge p$. Then
$$E_{\Sigma_1} \prod_{i=1}^p h_i(s_{ii}) \ge \prod_{i=1}^p E_{\sigma_{ii}} h_i(s_{ii}). \tag{A.8}$$

Proof. This is a direct consequence of (A.7). Because of (A.7), (A.8) holds when all the $h_i$ are indicators of sets of the form $[0, a_i]$, $i = 1, \ldots, p$. Also, both sides of (A.8) are linear in each $h_i$. Thus (A.8) holds for all $h_i$'s of the form
$$h_i(x) = \sum_{j=1}^m b_j I_{[0, a_{ij}]}(x),$$
where $b_j \ge 0$, $j = 1, \ldots, m$, and $a_{ij} \ge 0$. Now take limits.

Proof of Theorem 5. It must be shown that
$$\beta(\theta, R) \ge \beta(\theta, I_p) \tag{A.9}$$
because of the identity in (24). Recall that in (A.9), $R$ is the correlation matrix obtained from $\Sigma$ and, by assumption, all the elements of $R$ are non-negative. Now, consider variables $Y_1, \ldots, Y_p$ that are jointly normal $N_p(\theta, R)$ and an independent Wishart matrix $S$ that is $W_p(n, \Sigma)$, and let
$$\beta^*(\theta, R, \Sigma) = P\left\{\frac{Y_i}{\sqrt{s_{ii}/n}} \ge c_\alpha,\ i = 1, \ldots, p \,\middle|\, \theta, R, \Sigma\right\}. \tag{A.10}$$
Observe that
$$\beta(\theta, R) = \beta^*(\theta, R, R) \tag{A.11}$$
so $\beta(\theta, I_p) = \beta^*(\theta, I_p, I_p)$. Conditioning on $S$, (A.10) can be written
$$E_S\, P\left\{Y_i \ge c_\alpha \sqrt{\frac{s_{ii}}{n}},\ i = 1, \ldots, p \,\middle|\, S, \theta, R\right\}.$$
Applying (A.6) to the probability inside the expectation yields
$$\beta^*(\theta, R, \Sigma) \ge E_S \prod_{i=1}^p P\left\{Y_i \ge c_\alpha \sqrt{\frac{s_{ii}}{n}} \,\middle|\, S, \theta, I_p\right\}. \tag{A.12}$$
But for $\theta$ and $S$ fixed, the probability inside the product sign does not depend on $\Sigma$. Further,
$$h_i(s_{ii}) = P\left\{Y_i \ge c_\alpha \sqrt{\frac{s_{ii}}{n}} \,\middle|\, S, \theta, I_p\right\}$$
is a non-increasing function of $s_{ii}$ (and does not depend on $R$), so
$$\beta^*(\theta, R, \Sigma) \ge E_S \prod_{i=1}^p h_i(s_{ii}). \tag{A.13}$$
Now, applying (A.8) to the right-hand side of (A.13) and setting each $\sigma_{ii} = 1$ yields
$$\beta^*(\theta, R, \Sigma) \ge E_S \prod_{i=1}^p h_i(s_{ii}) \ge \beta^*(\theta, I_p, I_p).$$
Thus $\beta(\theta, R) = \beta^*(\theta, R, R) \ge \beta^*(\theta, I_p, I_p) = \beta(\theta, I_p)$, which is the assertion in Theorem 5.
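As a numerical sanity check (not part of the paper's argument), a Monte Carlo estimate of the power under an equicorrelated $R$ with non-negative correlations can be compared with the product bound (28). All parameter values below are hypothetical, and the Wishart draw is formed directly from normal rows.

```python
import numpy as np
from scipy.stats import nct, t

rng = np.random.default_rng(2)
p, n, alpha = 3, 30, 0.05              # hypothetical dimension, df, level
c = t.ppf(1 - alpha, n)
theta = np.array([1.5, 1.5, 1.5])      # hypothetical non-centrality vector
R = 0.5 * np.eye(p) + 0.5              # equicorrelated R with rho = 0.5 >= 0
L = np.linalg.cholesky(R)

reps, hits = 4000, 0
for _ in range(reps):
    Y = theta + L @ rng.standard_normal(p)    # Y ~ N_p(theta, R)
    G = rng.standard_normal((n, p)) @ L.T     # n independent rows of N_p(0, R)
    S = G.T @ G                               # S ~ W_p(n, R)
    T = np.min(Y / np.sqrt(np.diag(S) / n))   # T = min_i T_i
    hits += T >= c

mc_power = hits / reps
bound = float(np.prod(nct.sf(c, n, theta)))   # right-hand side of (28)
print(mc_power, bound)
```

Up to Monte Carlo error, the estimated power should sit above the product bound, as Theorem 5 guarantees for non-negative correlations.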
A.4. Discussion of Theorem 6

Theorem 6 is a consequence of Lemma A.2 below, whose proof relies on a result of Davidovič et al. (1969). Let $h_1$ and $h_2$ be two non-negative functions defined on $R^p$, and consider the convolution
$$(h_1 * h_2)(x) = \int_{R^p} h_1(y - x)\, h_2(y)\,dy, \tag{A.14}$$
which is assumed to be finite for each $x$. The result of Davidovič et al. (1969) asserts that if $h_1$ and $h_2$ are both log concave, then the convolution $h_1 * h_2$ is also log concave.

Lemma A.2. The function $w(\theta)$ in (32) is a log concave function of the real variable $\theta$.

Proof. Lemma A.2 asserts that the function
$$w(\theta) = P\{T_1 \ge c_\alpha \mid \theta\} \tag{A.15}$$
is log concave, where $T_1$ has a non-central t-distribution with $n$ degrees of freedom and non-centrality parameter $\theta$. Now, consider $Z \sim N(0, 1)$ and an independent $V$ which is chi-squared with $n$ degrees of freedom. Then $T_1$ has the same distribution as
$$\tilde T = \frac{Z + \theta}{\sqrt{V/n}}. \tag{A.16}$$
First observe that for $u \in R^1$,
$$h_1(u) = P\{Z \ge u\} = P\{Z - u \ge 0\} = \int I_{[0,\infty)}(z - u)\,\phi(z)\,dz, \tag{A.17}$$
where $\phi$ is the standard normal density. Since $I_{[0,\infty)}(\cdot)$ is log concave and the normal density is log concave, the result (A.14) of Davidovič et al. (1969) implies that $h_1$ is log concave. Next,
$$w(\theta) = P\left\{\frac{Z + \theta}{\sqrt{V/n}} \ge c_\alpha\right\} = E_V\, P\{Z \ge c_\alpha\sqrt{V/n} - \theta \mid V\} = E_V\, h_1(c_\alpha\sqrt{V/n} - \theta).$$
Now, $c_\alpha > 0$ and it is easy to show that the density function of the random variable $U = c_\alpha\sqrt{V/n}$, say $h_2(u)$, is log concave. Therefore
$$w(\theta) = \int_{-\infty}^{\infty} h_1(u - \theta)\, h_2(u)\,du$$
is log concave by a second application of (A.14). This completes the proof of Lemma A.2.

Proof of Theorem 6. Because $\beta$ has the product representation (31) and $w$ is log concave by Lemma A.2 above, the results in Chapter 3, Section E (pp. 73–74) of Marshall and Olkin (1979) show that $\beta$ is Schur concave.

References

Berger, R.L., Hsu, J., 1996. Bioequivalence trials, intersection–union tests and equivalence confidence sets. Statist. Sci. 11, 283–319.
Casella, G., Berger, R.L., 1990. Statistical Inference. Wadsworth, Pacific Grove, CA.
Cohen, A., Sackrowitz, H.B., 2005. Decision theory results for one-sided multiple comparison procedures. Ann. Statist. 33, 126–144.
Das Gupta, S., Eaton, M.L., Olkin, I., Perlman, M.D., Savage, L.J., Sobel, M., 1972. Inequalities on the probability content of convex regions for elliptically contoured distributions. In: LeCam, L.M., Neyman, J., Scott, E.L. (Eds.), Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, vol. 2. University of California Press, Berkeley, CA, pp. 241–265.
Davidovič, Ju.S., Korenbljum, B.I., Hacet, B.I., 1969. A property of logarithmically concave functions. Soviet Math. Dokl. 10 (2), 477–480 (English translation).
Eaton, M.L., 1983. Multivariate Statistics: A Vector Space Approach. Wiley, New York.
Eaton, M.L., 1987. Lectures on Topics in Probability Inequalities. Centrum voor Wiskunde en Informatica, Amsterdam.
International Headache Society Clinical Trials Subcommittee, 2000. Guidelines for Controlled Trials of Drugs in Migraine, second ed. Cephalalgia 20, 765–786.
Marshall, A.W., Olkin, I., 1979. Inequalities: Theory of Majorization and its Applications. Academic Press, New York.
Muirhead, R.J., 1982. Aspects of Multivariate Statistical Theory. Wiley, New York.
Sankoh, A.J., D'Agostino, R.B., Huque, M., 2003. Efficacy endpoint selection and multiplicity adjustment methods in clinical trials with inherent multiple endpoint issues. Statist. Med. 22, 3133–3150.
Slepian, D., 1962. The one-sided barrier problem for Gaussian noise. Bell System Tech. J. 41, 463–501.
Tong, Y.L., 1980. Probability Inequalities in Multivariate Distributions. Academic Press, New York.
Zhang, J., Quan, H., Ng, J., Stepanavage, M.E., 1997. Some statistical methods for multiple endpoints in clinical trials. Controlled Clinical Trials 18, 204–221.