A Holm-type procedure controlling the false discovery rate

Statistics & Probability Letters 77 (2007) 1756–1762 www.elsevier.com/locate/stapro

Yongchao Ge (a, corresponding author), Stuart C. Sealfon (a), Chi-Hong Tseng (b), Terence P. Speed (c, d)

(a) Department of Neurology, Mount Sinai School of Medicine, One Gustave L. Levy Place, Box 1137, New York, NY 10029, USA
(b) Division of Biostatistics, School of Medicine, New York University, USA
(c) Department of Statistics, University of California, Berkeley, USA
(d) Division of Bioinformatics, The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia

Received 21 September 2005; received in revised form 16 February 2007; accepted 10 April 2007. Available online 7 May 2007.

Abstract

Benjamini and Hochberg [1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57, 289–300] proposed a step-up procedure controlling the false discovery rate (FDR) under the assumption that test statistics are independent. Benjamini and Yekutieli [2001. The control of false discovery rate in multiple hypothesis testing under dependency. Ann. Statist. 29 (4) 1165–1188] further developed a conservative procedure without this assumption. This paper proposes a Holm-type procedure controlling the FDR. This simple step-down procedure can be more powerful than the Benjamini and Yekutieli (2001) procedure when the number of rejected hypotheses is smaller than $\ln(m)$, where $m$ is the total number of hypotheses. © 2007 Elsevier B.V. All rights reserved.

Keywords: Multiple testing; Holm; False discovery rate; Bonferroni; Dependent data

Corresponding author: Y. Ge. E-mail address: [email protected]. 0167-7152/$ - see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.spl.2007.04.019

1. Introduction

When $m$ hypotheses $H_1, \ldots, H_m$ associated with test statistics $T_1, \ldots, T_m$ are to be tested together, we can obtain the corresponding p-values $P_1, \ldots, P_m$. The traditional definition of the type I error rate for the multiple testing problem is the family-wise error rate (FWER), the probability of falsely rejecting at least one null hypothesis. The widely applied Bonferroni procedure leads to a very simple level-$\alpha$ FWER test by rejecting $H_i$ if $P_i \le \alpha/m$; $\alpha$ typically takes the value 0.05. Holm (1979) proposed a more powerful, still simple, procedure that ensures that the FWER is no greater than $\alpha$. We first order the p-values such that $P_{(1)} \le \cdots \le P_{(m)}$, and then define a critical value for each $i = 1, \ldots, m$:

$$c_{i,\mathrm{Holm}} = \frac{\alpha}{m - i + 1}.$$

Let $k_{\mathrm{Holm}}$ be the largest $k$ such that $P_{(1)} \le c_{1,\mathrm{Holm}}, \ldots, P_{(k)} \le c_{k,\mathrm{Holm}}$.


Under the Holm procedure, we reject the hypotheses $H_{(1)}, \ldots, H_{(k_{\mathrm{Holm}})}$. If $k_{\mathrm{Holm}}$ is not defined, then no hypothesis is rejected. The Holm procedure begins with the most significant p-value $P_{(1)}$ to see if $H_{(1)}$ can be rejected with the critical value $c_{1,\mathrm{Holm}}$, then compares the second most significant p-value $P_{(2)}$ with $c_{2,\mathrm{Holm}}$, and so on, until it reaches $P_{(k+1)}$, which cannot be rejected because $P_{(k+1)} > c_{k+1,\mathrm{Holm}}$. The Holm procedure is therefore a step-down procedure. It is currently the most powerful procedure controlling the FWER among those that do not need strong assumptions on the test statistics (independence or a parametric distribution) and that do not need resampling procedures such as those proposed in Westfall and Young (1993).

In many research applications, any FWER procedure is too conservative in that non-null $H_i$'s cannot be identified, especially when $m$ is large. In response to this problem, Benjamini and Hochberg (1995) considered using the proportion of falsely rejected null hypotheses among all rejected ones as an alternative to the FWER. As this proportion is an unobservable random variable, they defined the expectation of this proportion as the false discovery rate (FDR). Under the assumption that the null p-values are independent, Benjamini and Hochberg (1995) provided a procedure (hereafter called the BH procedure) whose FDR is no greater than level $\alpha$; we say that the BH procedure controls the FDR at level $\alpha$. In a later paper, without relying on the assumption that the null p-values are independent or positively dependent, Benjamini and Yekutieli (2001) developed a conservative step-up procedure (hereafter called the BY procedure) controlling the FDR. Benjamini and Yekutieli (2001) first defined the critical values

$$c_{i,\mathrm{BY}} = \frac{\alpha i}{m \sum_{h=1}^{m} 1/h}.$$

Let $k_{\mathrm{BY}}$ be the smallest $k$ such that $P_{(m)} > c_{m,\mathrm{BY}}, \ldots, P_{(k)} > c_{k,\mathrm{BY}}$, and then reject the hypotheses $H_{(1)}, \ldots, H_{(k_{\mathrm{BY}} - 1)}$ (no hypothesis is rejected when $k_{\mathrm{BY}} = 1$). If $k_{\mathrm{BY}}$ is not defined, that is, if $P_{(m)} \le c_{m,\mathrm{BY}}$, then all hypotheses $H_{(1)}, \ldots, H_{(m)}$ are rejected. The BY procedure is a step-up one because it begins with the least significant p-value $P_{(m)}$ to see if $H_{(m)}$ can be accepted with the critical value $c_{m,\mathrm{BY}}$, then moves to $P_{(m-1)}$, and so on, until it reaches $P_{(k-1)}$, which cannot be accepted any more because $P_{(k-1)} \le c_{k-1,\mathrm{BY}}$.

The BY procedure is very conservative: its critical values $c_{i,\mathrm{BY}}$ are even smaller than $c_{i,\mathrm{Holm}}$ when $i \le \ln(m)$. In applications where the number of rejected hypotheses is smaller than $\ln(m)$, the Holm procedure will be more powerful than the BY procedure. As with any FWER procedure, the Holm procedure always controls the FDR. There seems to be a need for a procedure that is more powerful than the Holm procedure when the goal is FDR control.

In this paper, we propose a new, still simple, step-down procedure that can control the FDR for dependent data. This new procedure is more powerful than the Holm procedure, and also more powerful than the BY procedure when at most $\ln(m)$ hypotheses are rejected. Our new procedure is a Holm-type procedure in being simple and step-down. Section 2 describes our new procedure and gives the proofs. Section 3 compares our new procedure with the BY procedure. The discussion is in Section 4.

2. A new step-down procedure

Our new step-down procedure defines the critical values

$$c_{i,\mathrm{new}} = \frac{\alpha m}{(m - i + 1)^2}. \tag{1}$$

Let $k_{\mathrm{new}}$ be the largest $k$ such that $P_{(1)} \le c_{1,\mathrm{new}}, \ldots, P_{(k)} \le c_{k,\mathrm{new}}$, and reject the hypotheses $H_{(1)}, \ldots, H_{(k_{\mathrm{new}})}$. If $k_{\mathrm{new}}$ is not defined, then no hypothesis is rejected. As with the Holm procedure, our new procedure is a step-down one.

Let us review some notation before introducing our theorems. Denote the complete set of hypotheses by $M$, and let $|M| = m$, where $|\cdot|$ is the cardinality of a set. Let $M_0$ be the subset of $M$ for which the null hypotheses are true, so the complement of $M_0$ is the set $M_1 = M \setminus M_0$ for which the null hypotheses are false. Define $m_0 = |M_0|$ and $m_1 = |M_1| = m - m_0$.
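The three sets of critical values (Holm, BY, and the new ones in Eq. (1)) and the step-down and step-up rules described above can be sketched in Python as follows. This is only an illustrative sketch: the function names and the example p-values are hypothetical and do not come from the paper.

```python
from math import fsum


def holm_critical_values(m, alpha=0.05):
    # c_{i,Holm} = alpha / (m - i + 1), for i = 1..m
    return [alpha / (m - i + 1) for i in range(1, m + 1)]


def new_critical_values(m, alpha=0.05):
    # Eq. (1): c_{i,new} = alpha * m / (m - i + 1)^2, for i = 1..m
    return [alpha * m / (m - i + 1) ** 2 for i in range(1, m + 1)]


def by_critical_values(m, alpha=0.05):
    # c_{i,BY} = alpha * i / (m * sum_{h=1}^{m} 1/h), for i = 1..m
    harmonic = fsum(1.0 / h for h in range(1, m + 1))
    return [alpha * i / (m * harmonic) for i in range(1, m + 1)]


def step_down(pvalues, crit):
    """Number of rejections: the largest k with P_(1) <= c_1, ..., P_(k) <= c_k."""
    k = 0
    for p, c in zip(sorted(pvalues), crit):
        if p > c:
            break
        k += 1
    return k


def step_up(pvalues, crit):
    """Number of rejections: the largest k with P_(k) <= c_k (0 if there is none)."""
    ordered = sorted(pvalues)
    for k in range(len(ordered), 0, -1):
        if ordered[k - 1] <= crit[k - 1]:
            return k
    return 0


# Hypothetical p-values, chosen only to illustrate the three rules (m = 10).
example_p = [0.0004, 0.0009, 0.007, 0.3, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
m = len(example_p)
n_holm = step_down(example_p, holm_critical_values(m))
n_new = step_down(example_p, new_critical_values(m))
n_by = step_up(example_p, by_critical_values(m))
print(n_holm, n_new, n_by)
```

For these made-up p-values the Holm and BY rules each reject two hypotheses, while the new step-down rule rejects three, because the third critical value is $0.05 \cdot 10 / 8^2 \approx 0.0078$ under Eq. (1) but only $0.05/8 = 0.00625$ under Holm.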


Let the p-value vector of the true null hypotheses be $P_{M_0} = \{P_i, i \in M_0\}$ and let the p-value vector of the false null hypotheses be $P_{M_1} = \{P_i, i \in M_1\}$. For a given rejection procedure, we denote the total number of rejected hypotheses by $R$, and the number of falsely rejected null hypotheses by $V$. Benjamini and Hochberg (1995) defined the FDR to be

$$\mathrm{FDR} = E(V/R). \tag{2}$$

In the above equation, $V/R$ is treated as zero if $R = 0$. In this paper, we do not consider the trivial case $m_0 = 0$, since the FDR of any multiple testing procedure will be zero when $m_0 = 0$. Next we will prove two theorems; the proofs use some results from the lemmas at the end of this section.

Theorem 1. Define non-decreasing critical values

$$c_i = \begin{cases} \dfrac{\alpha (m_0 + i - 1)}{m_0^2} & \text{for } i = 1, \ldots, m_1 + 1, \\ 1 & \text{for } i = m_1 + 2, \ldots, m. \end{cases} \tag{3}$$

If $P_{M_0}$ and $P_{M_1}$ are jointly independent, i.e.,

$$P(P_i \le x_i, i \in M_0;\ P_j \le y_j, j \in M_1) = P(P_i \le x_i, i \in M_0) \cdot P(P_j \le y_j, j \in M_1),$$

then the step-down procedure with these critical values $c_i$ controls the FDR at level $\alpha$.

Proof. Let $P_{(i),M_1}$ be the $i$th smallest p-value from the set $M_1$. Denote $B_0 = \{P_{(1),M_1} > c_1\}$. For $i = 1, \ldots, m_1 - 1$, let

$$B_i = \{P_{(1),M_1} \le c_1, \ldots, P_{(i),M_1} \le c_i,\ P_{(i+1),M_1} > c_{i+1}\},$$

and define $B_{m_1} = \{P_{(1),M_1} \le c_1, \ldots, P_{(m_1),M_1} \le c_{m_1}\}$. The events $B_0, \ldots, B_{m_1}$ form a mutually exclusive decomposition of the whole sample space. Note that in set $B_i$, $P_{(1),M_1}, \ldots, P_{(i),M_1}$ are not necessarily the smallest p-values from the whole set $M$, since some p-values from $M_0$ may be smaller than some p-values from $M_1$. However, Lemma 3 leads us to at least reject the false null hypotheses $H_{(1),M_1}, \ldots, H_{(i),M_1}$, and then we have

$$\frac{V}{R} \le \frac{V}{V + i} \le \frac{m_0}{m_0 + i}. \tag{4}$$

If we also erroneously reject at least one null hypothesis in set $M_0$, then Lemma 3 says

$$P_{(1),M_0} \le c_{i+1}. \tag{5}$$

Readers should note that the above statement does not claim that $P_{(1),M_0}$ will be the $(i+1)$th smallest p-value, but simply that Eq. (5) needs to be satisfied for $P_{(1),M_0}$ to be rejected. Therefore,

\begin{align}
\mathrm{FDR} &= \sum_{i=0}^{m_1} E\left(V/R \cdot I_{B_i}\right) \notag \\
&= \sum_{i=0}^{m_1} E\left(V/R \cdot I_{B_i} \cdot I_{\{P_{(1),M_0} \text{ is rejected}\}}\right) \notag \\
&\le \sum_{i=0}^{m_1} E\left(\frac{m_0}{m_0 + i} \cdot I_{B_i} \cdot I_{\{P_{(1),M_0} \le c_{i+1}\}}\right) \quad \text{(apply Eqs. (4) and (5))} \notag \\
&= \sum_{i=0}^{m_1} \frac{m_0}{m_0 + i} \cdot P\left(P_{(1),M_0} \le c_{i+1}\right) \cdot P(B_i) \quad \text{($P_{M_0}$ and $P_{M_1}$ are independent)} \notag \\
&\le \sum_{i=0}^{m_1} \frac{m_0}{m_0 + i} \cdot m_0 c_{i+1} \cdot P(B_i) \tag{6} \\
&\le \sum_{i=0}^{m_1} \alpha P(B_i) \notag \\
&= \alpha. \tag{7}
\end{align}

□
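The step from Eq. (6) to the bound by $\alpha$ rests on the fact that, with the critical values of Eq. (3), each weight $\frac{m_0}{m_0+i} \cdot m_0 c_{i+1}$ equals $\alpha$ for $i = 0, \ldots, m_1$. The short sketch below checks this identity in exact rational arithmetic for small $m$; the function names are hypothetical, and the check is an illustration rather than part of the proof.

```python
from fractions import Fraction


def theorem1_critical_values(m, m0, alpha):
    """Critical values of Eq. (3) as exact fractions (list index 0 holds c_1)."""
    m1 = m - m0
    crit = []
    for i in range(1, m + 1):
        if i <= m1 + 1:
            crit.append(alpha * Fraction(m0 + i - 1, m0 ** 2))
        else:
            crit.append(Fraction(1))
    return crit


def weight(m0, i, c_next):
    # The factor multiplying P(B_i) in Eq. (6): m0 / (m0 + i) * m0 * c_{i+1}
    return Fraction(m0, m0 + i) * m0 * c_next


def identity_holds(max_m, alpha):
    """Check that every weight equals alpha for all 1 <= m0 <= m <= max_m."""
    for m in range(1, max_m + 1):
        for m0 in range(1, m + 1):
            crit = theorem1_critical_values(m, m0, alpha)
            m1 = m - m0
            for i in range(0, m1 + 1):
                # crit[i] is c_{i+1} because the list is zero-indexed
                if weight(m0, i, crit[i]) != alpha:
                    return False
    return True


print(identity_holds(12, Fraction(1, 20)))
```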

In practice, we do not have any knowledge of $m_0$. The following theorem can be applied without any information on $m_0$.

Theorem 2. With the critical values defined in Eq. (1), our new step-down procedure controls the FDR at level $\alpha$ under the assumption that $P_{M_0}$ and $P_{M_1}$ are jointly independent.

Proof. We can rewrite the proof of Theorem 1 by replacing $c_i$ with $c_{i,\mathrm{new}}$. The events $B_{i,\mathrm{new}}$ can be redefined in terms of $c_{i,\mathrm{new}}$. Everything follows until Eq. (6). Eq. (6) becomes

\begin{align}
\mathrm{FDR} &\le \sum_{i=0}^{m_1} \frac{m_0}{m_0 + i} \cdot m_0 c_{i+1,\mathrm{new}} \cdot P(B_{i,\mathrm{new}}) \notag \\
&\le \sum_{i=0}^{m_1} \frac{m_0}{m_0 + i} \cdot m_0 c_{i+1} \cdot P(B_{i,\mathrm{new}}) \quad \text{($c_{i+1,\mathrm{new}} \le c_{i+1}$ by Lemma 1)} \notag \\
&\le \alpha. \tag{8}
\end{align}

□

This theorem uses critical values similar to those of the Holm procedure that controls the FWER, in which the critical value is $c_{i,\mathrm{Holm}} = \alpha/(m - i + 1)$. Since $c_{i,\mathrm{new}} = c_{i,\mathrm{Holm}} \cdot m/(m - i + 1)$, the new critical value is never smaller than the Holm one and is strictly greater for every $i \ge 2$. Our procedure therefore rejects at least as many hypotheses as the Holm procedure, and hence is more powerful. This result is not surprising, as our procedure aims to control the FDR while the Holm procedure aims to control the FWER, and it is known that the FWER is a much more stringent criterion than the FDR.

Lemma 1. Using the $c_i$ and $c_{i,\mathrm{new}}$ defined in Eqs. (3) and (1), respectively, we have $c_i \ge c_{i,\mathrm{new}}$ for $i = 1, \ldots, m_1 + 1$, which are the only indices used in the proof of Theorem 2.

Proof. If $i = 1, \ldots, m_1 + 1$, then $m_0 \le m - i + 1$, and so

$$\frac{m_0 + i - 1}{m_0} = 1 + \frac{i - 1}{m_0} \ge 1 + \frac{i - 1}{m - i + 1} = \frac{m}{m - i + 1}.$$

The two facts $m_0 \le m - i + 1$ and $(m_0 + i - 1)/m_0 \ge m/(m - i + 1)$ lead to

$$\frac{m_0 + i - 1}{m_0^2} \ge \frac{m}{(m - i + 1)^2}.$$

Using the definitions of $c_i$ and $c_{i,\mathrm{new}}$ gives the proof. □
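As a sanity check on the two comparisons above, the sketch below verifies in exact arithmetic, for small $m$, that $c_i \ge c_{i,\mathrm{new}}$ on the range $i = 1, \ldots, m_1 + 1$ covered by the proof of Lemma 1, and that $c_{i,\mathrm{new}} \ge c_{i,\mathrm{Holm}}$ with equality only at $i = 1$. The function names are hypothetical; a brute-force check over small cases illustrates the inequalities but does not replace the proof.

```python
from fractions import Fraction


def c_theorem1(i, m0, alpha):
    # First branch of Eq. (3), valid for i = 1..m1+1
    return alpha * Fraction(m0 + i - 1, m0 ** 2)


def c_new(i, m, alpha):
    # Eq. (1)
    return alpha * Fraction(m, (m - i + 1) ** 2)


def c_holm(i, m, alpha):
    return alpha * Fraction(1, m - i + 1)


def lemma1_holds(max_m, alpha):
    """c_i >= c_{i,new} for i = 1..m1+1 and all 1 <= m0 <= m <= max_m."""
    for m in range(1, max_m + 1):
        for m0 in range(1, m + 1):
            m1 = m - m0
            for i in range(1, m1 + 2):
                if c_theorem1(i, m0, alpha) < c_new(i, m, alpha):
                    return False
    return True


def new_dominates_holm(max_m, alpha):
    """c_{i,new} >= c_{i,Holm} for all i, with equality exactly when i = 1."""
    for m in range(1, max_m + 1):
        for i in range(1, m + 1):
            new, holm = c_new(i, m, alpha), c_holm(i, m, alpha)
            if new < holm or ((new == holm) != (i == 1)):
                return False
    return True


print(lemma1_holds(30, Fraction(1, 20)), new_dominates_holm(30, Fraction(1, 20)))
```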

Lemma 2. For non-decreasing critical values $c_1 \le \cdots \le c_m$, let $x_i$, $i = 1, \ldots, m_1$, and $y_i$, $i = 1, \ldots, m_0$, be real numbers. Let $x_{(i)}$ be the $i$th smallest of the $x$'s. Assume $x_{(1)} \le c_1, \ldots, x_{(K)} \le c_K$ and $x_{(K+1)} > c_{K+1}$. Let $z = (x_1, \ldots, x_{m_1}, y_1, \ldots, y_{m_0})$.
(a) For any $j$, if $z_{(j)} \le c_K$ then $z_{(j)} \le c_j$.
(b) Let $J$ be the integer such that $z_{(J)} = y_{(1)}$. If $z_{(j)} \le c_j$ for $j = 1, \ldots, J$, then $y_{(1)} \le c_{K+1}$.

Proof. For the first statement, if $j \le K$, then $x_{(j)} \le c_j$. Combining this with $z_{(j)} \le x_{(j)}$, we have

$$z_{(j)} \le x_{(j)} \le c_j. \tag{9}$$

If $j > K$, then the non-decreasing sequence of $c_i$'s implies $c_K \le c_j$, and so $z_{(j)} \le c_K \le c_j$.

We will prove the second statement by contradiction. Assume $y_{(1)} > c_{K+1}$. Then $x_{(1)} \le c_1, \ldots, x_{(K)} \le c_K$ implies that $J$ must be no smaller than $K + 1$ to have $z_{(J)} = y_{(1)}$. The fact that $z_{(K+1)} = \min(x_{(K+1)}, y_{(1)}) > c_{K+1}$ and the fact that $J \ge K + 1$ imply that the following statement is impossible:

$$z_{(j)} \le c_j \quad \text{for } j = 1, \ldots, J.$$

This contradiction proves that $y_{(1)} \le c_{K+1}$. □

If we state Lemma 2 as a step-down rejection problem, then we have:

Lemma 3. For non-decreasing critical values $c_1 \le \cdots \le c_m$, assume

$$P_{(1),M_1} \le c_1, \ldots, P_{(K),M_1} \le c_K \quad \text{and} \quad P_{(K+1),M_1} > c_{K+1}.$$

If we use a step-down rejection procedure with the critical values $c_i$, then:
(a) The hypotheses $H_{(1),M_1}, \ldots, H_{(K),M_1}$ must be rejected.
(b) If $H_{(1),M_0}$ is also rejected, then $P_{(1),M_0} \le c_{K+1}$.

Proof. Let $x = \{P_i : i \in M_1\}$ and $y = \{P_i : i \in M_0\}$. Applying Lemma 2, we have the proof. □
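Both parts of Lemma 3 can be spot-checked on randomly generated inputs. The sketch below draws arbitrary non-decreasing critical values and arbitrary p-values for $M_1$ and $M_0$, runs a step-down rule on the pooled values, and confirms that (a) and (b) hold in every draw. The function names, the sampling scheme, and the seed are assumptions made for illustration; a random check of this kind can only fail to find a counterexample, it does not prove the lemma.

```python
import random


def step_down_count(sorted_values, crit):
    """Largest k with sorted_values[0] <= crit[0], ..., sorted_values[k-1] <= crit[k-1]."""
    k = 0
    for v, c in zip(sorted_values, crit):
        if v > c:
            break
        k += 1
    return k


def check_lemma3_once(rng, m1, m0):
    """One random instance with m0 >= 1; returns True when (a) and (b) both hold."""
    m = m1 + m0
    crit = sorted(rng.random() for _ in range(m))      # non-decreasing c_1..c_m
    x = sorted(rng.random() ** 2 for _ in range(m1))   # p-values in M_1
    y = sorted(rng.random() for _ in range(m0))        # p-values in M_0
    big_k = step_down_count(x, crit)                   # K defined from the x's alone
    pooled = sorted([(p, "x") for p in x] + [(p, "y") for p in y])
    n_rejected = step_down_count([p for p, _ in pooled], crit)
    rejected = pooled[:n_rejected]
    # (a) the K smallest x's are rejected: at least K of the rejected values are x's
    part_a = sum(1 for _, label in rejected if label == "x") >= big_k
    # (b) if the smallest y is rejected, it is at most c_{K+1}, stored at crit[big_k]
    y1_rejected = any(label == "y" for _, label in rejected)
    part_b = (not y1_rejected) or (y[0] <= crit[big_k])
    return part_a and part_b


def check_lemma3(trials=2000, seed=0):
    rng = random.Random(seed)
    return all(
        check_lemma3_once(rng, rng.randint(0, 6), rng.randint(1, 6))
        for _ in range(trials)
    )


print(check_lemma3())
```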

3. Comparing the new procedure with the BY procedure

Our procedure is much less powerful than the BH procedure, as we do not employ any independence assumption among the true null hypotheses. It is thus more reasonable to compare this new procedure with the BY procedure, which controls the FDR without relying on the independence or positive dependence assumptions. Note that $\sum_{h=1}^{m} 1/h$ is approximately $\ln(m) + 1/2$. The ratio of the critical values of the new procedure to those of the BY procedure is

$$r_i = \frac{c_{i,\mathrm{new}}}{c_{i,\mathrm{BY}}} \approx \frac{\alpha \cdot m/(m - i)^2}{\alpha \cdot i/(m \ln(m))} = \frac{m^2 \ln(m)}{(m - i)^2 \cdot i}.$$

It is easy to see that $r_i > 1$ if $i \le \ln(m)$, and that $r_i$ drops below 1 once $i$ exceeds $\ln(m)$ by a moderate margin while remaining small relative to $m$. This shows that our procedure has advantages when only a small number of hypotheses are rejected. The procedure will be most useful in applications where most null hypotheses are expected to be true and only a small number are false. In the discussion above, we have not considered the step-up and step-down enforcement, which might complicate the comparison.

The power of the new procedure and of the BY procedure is also studied by simulations, which take the step-up and step-down enforcement into account. Let $X_i = \mu_i + \epsilon_i$ for $i = 1, \ldots, m$. The hypothesis $H_i$ is to test $\mu_i = 0$. We assume that $\mu_i = 0$ for $i \in M_0$ and $\mu_i = \Delta$ for $i \in M_1$, and that the $\epsilon_i$ are independently distributed as $N(0, 1)$. This very simplified scenario is used for illustration, to compare the BY procedure with the new procedure, although under this scenario the BH procedure is a much better choice. The p-value based on the two-sided Z-test can be computed as $P_i = 2 \cdot (1 - \Phi(|X_i|))$, where $\Phi(\cdot)$ is the cumulative distribution function of $N(0, 1)$. Table 1 lists the average numbers of hypotheses rejected by the BY procedure and by the new procedure under different combinations of $m$, $m_1$, and $\Delta$. This simulation study demonstrates that the new procedure is more powerful than the BY procedure when only a small number of false null hypotheses are present in the data ($m_1$ is small).
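The simulation setting just described can be sketched with the Python standard library alone. The code below is not the authors' simulation code: the function names, the seed, and the reduced number of replications (2,000 rather than 10,000) are assumptions made to keep the example small, and the two-sided p-value uses the identity $2(1 - \Phi(|x|)) = \mathrm{erfc}(|x|/\sqrt{2})$.

```python
import math
import random


def two_sided_p(x):
    # 2 * (1 - Phi(|x|)) written with the complementary error function
    return math.erfc(abs(x) / math.sqrt(2.0))


def new_crit(m, alpha):
    # Eq. (1): alpha * m / (m - i + 1)^2
    return [alpha * m / (m - i + 1) ** 2 for i in range(1, m + 1)]


def by_crit(m, alpha):
    # BY critical values: alpha * i / (m * sum_{h=1}^{m} 1/h)
    harmonic = math.fsum(1.0 / h for h in range(1, m + 1))
    return [alpha * i / (m * harmonic) for i in range(1, m + 1)]


def step_down(sorted_p, crit):
    """Largest k with P_(1) <= c_1, ..., P_(k) <= c_k."""
    k = 0
    for p, c in zip(sorted_p, crit):
        if p > c:
            break
        k += 1
    return k


def step_up(sorted_p, crit):
    """Largest k with P_(k) <= c_k (0 if there is none)."""
    for k in range(len(sorted_p), 0, -1):
        if sorted_p[k - 1] <= crit[k - 1]:
            return k
    return 0


def average_rejections(m, m1, delta, alpha=0.05, reps=2000, seed=0):
    """Monte Carlo averages of the number of rejections, returned as (BY, new)."""
    rng = random.Random(seed)
    c_new, c_by = new_crit(m, alpha), by_crit(m, alpha)
    means = [delta] * m1 + [0.0] * (m - m1)  # mu_i = Delta on M_1, 0 on M_0
    total_by = total_new = 0
    for _ in range(reps):
        p = sorted(two_sided_p(mu + rng.gauss(0.0, 1.0)) for mu in means)
        total_by += step_up(p, c_by)
        total_new += step_down(p, c_new)
    return total_by / reps, total_new / reps


if __name__ == "__main__":
    avg_by, avg_new = average_rejections(m=100, m1=1, delta=3.0)
    print(avg_by, avg_new)
```

With $m = 100$, $m_1 = 1$, and $\Delta = 3$, the two averages should be comparable to the corresponding Table 1 entries (0.19 for BY and 0.36 for the new procedure), up to Monte Carlo error, which is larger here because fewer replications are used.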


Table 1
The numerical comparison between the BY procedure and the new procedure under different combinations of $m$, $m_1$, and $\Delta$

m       m1      Δ       BY       New      New/BY
100     1       1       0.01     0.06     5.49
100     1       2       0.04     0.12     2.81
100     1       3       0.19     0.36     1.87
100     5       1       0.02     0.08     4.34
100     5       2       0.17     0.41     2.4
100     5       3       1.15     1.64     1.42
100     10      1       0.03     0.11     3.92
100     10      2       0.36     0.75     2.09
100     10      3       2.8      3.25     1.16
1000    1       1       0.01     0.06     4.15
1000    1       2       0.04     0.12     2.95
1000    1       3       0.2      0.37     1.84
1000    5       1       0.02     0.09     3.96
1000    5       2       0.17     0.39     2.31
1000    5       3       1.18     1.65     1.4
1000    10      1       0.03     0.11     3.93
1000    10      2       0.35     0.74     2.13
1000    10      3       2.8      3.25     1.16
1000    50      1       0.11     0.36     3.19
1000    50      2       3.12     3.64     1.17
1000    50      3       24.39    17.69    0.73
1000    100     1       0.24     0.66     2.8
1000    100     2       10.12    7.5      0.74
1000    100     3       59.42    43.68    0.74

At the FDR level $\alpha = 0.05$, the column BY and the column New give the average numbers of hypotheses rejected from 10,000 simulations.

4. Discussion

This paper gives a Holm-type procedure controlling the FDR for dependent data. The critical values of our new procedure are never smaller than those of the Holm procedure, and accordingly it is more powerful than the Holm procedure in controlling the FDR. This procedure is also more powerful than the BY procedure when the number of rejected hypotheses is smaller than $\ln(m)$, where $m$ is the total number of hypotheses. Our new procedure allows any unknown correlation structure on the test statistics within each subset ($M_0$ or $M_1$). In contrast to the BH procedure, our approach does not require the strong assumption that the null test statistics are independent or positively dependent. As with the BY procedure, this new procedure is much less powerful than the BH procedure, which requires independence or positive dependence assumptions on the test statistics.

Another research direction is to develop a resampling procedure for controlling the FDR that utilizes the dependence information of the test statistics, as has been done for controlling the FWER (Ge et al., 2003; Westfall and Young, 1993). Interested readers are referred to our working paper (Ge et al., 2007).

Our new procedure still depends on the assumption that $P_{M_0}$ and $P_{M_1}$ are jointly independent. Future research may find a simple step-down or step-up procedure which, like the BY procedure, does not rely on this assumption, but is more powerful.

Another interesting area for exploration results from Theorem 1, which implies that once the smallest $m_1 + 1$ p-values have been rejected, the remaining $m_0 - 1$ p-values are automatically rejected altogether. In order to make use of the procedure suggested by this theorem, one would need information on $m_0$. An open question for researchers is how to estimate $m_0$. One solution to this problem can be seen in Storey et al. (2004) under the assumption that the null p-values from $M_0$ are independent. For dependent data, it is a more challenging problem to find an estimate of $m_0$ and to make use of this estimate so that the FDR can still be controlled for finite samples.


Acknowledgments

We thank Carol Bodian, Sylvan Wallenstein and Xiaochun Li for their valuable comments on the paper. This work was supported by NIH Grants DK46943, U19 Al06231 and Contract HHSN 266200500021C. We thank the Editor for suggesting the addition of the numerical comparison in Table 1. After we submitted the paper, we became aware of the work by Romano and Shaikh (2006), whose Theorem 4.1 has considerable overlap with our Theorem 2. As far as we know, our Theorem 1 to address the control of the FDR is new.

References

Benjamini, Y., Hochberg, Y., 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57, 289–300.
Benjamini, Y., Yekutieli, D., 2001. The control of the false discovery rate in multiple hypothesis testing under dependency. Ann. Statist. 29 (4), 1165–1188.
Ge, Y., Dudoit, S., Speed, T.P., 2003. Resampling-based multiple testing for microarray data analysis. Test 12 (1), 1–44 (with discussion 44–77).
Ge, Y., Sealfon, S.C., Speed, T.P., 2007. Some step-down procedures controlling the false discovery rate under dependence. Manuscript.
Holm, S., 1979. A simple sequentially rejective multiple test procedure. Scand. J. Statist. 6, 65–70.
Romano, J.P., Shaikh, A.M., 2006. On stepdown control of the false discovery proportion. In: IMS Lecture Notes–Monograph Series: Second Lehmann Symposium–Optimality, vol. 49, pp. 33–50.
Storey, J.D., Taylor, J.E., Siegmund, D., 2004. Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. J. Roy. Statist. Soc. Ser. B 66, 187–205.
Westfall, P.H., Young, S.S., 1993. Resampling-Based Multiple Testing: Examples and Methods for p-value Adjustment. Wiley, New York.