
Journal of Statistical Planning and Inference 137 (2007) 3859–3870

FDR- and FWE-controlling methods using data-driven weights

Livio Finos^a, Luigi Salmaso^b,*
^a Center for Modelling Computing and Statistics, University of Ferrara, via N. Machiavelli 35, 44100 Ferrara, Italy
^b Department of Management and Engineering, University of Padova, Str. S. Nicola 3, 36100 Vicenza, Italy

Received 11 November 2005; received in revised form 3 April 2006; accepted 6 April 2007. Available online 4 May 2007.

Abstract

Weighted methods are an important feature of multiplicity control methods. The weights must usually be chosen a priori, on the basis of experimental hypotheses. Under some conditions, however, they can be chosen making use of information from the data (therefore a posteriori) while maintaining multiplicity control. In this paper we provide: (1) a review of weighted methods for familywise type I error rate (FWE) control (both parametric and nonparametric) and for false discovery rate (FDR) control; (2) a review of data-driven weighted methods for FWE control; (3) a new proposal for weighted FDR control with data-driven weights under independence among variables; (4) an extension of this proposal to any type of dependence; (5) a simulation study that assesses the performance of the procedure of point (4) under various conditions.
© 2007 Elsevier B.V. All rights reserved.

Keywords: A priori ordered hypotheses; FDR; FWE; Multiplicity control; Weighted procedures

1. Introduction

In problems dealing with thousands (and sometimes hundreds of thousands) of variables, the standard Type I error rate criterion used to evaluate tests becomes less important, as hundreds of "significances" might easily be, as a matter of fact, Type I errors. On the other hand, attempts to rigorously control the familywise type I error rate (FWE) are typically excessively conservative. For example, if the Bonferroni method is used to control the FWE with $m$ tests, then a test will have to be significant at the $\alpha/m$ level for it to be declared "real". The two complaints most often made about this method are that (i) the procedure's dependence on $m$ seems arbitrary, and, related to that, (ii) the method is exceedingly conservative for large values of $m$. Holm's (1979) step-down approach is a slight improvement on the simple Bonferroni method, but gains little in terms of power when $m$ is large. Holm's method rejects the hypothesis corresponding to the most significant (smallest p-value) test if the p-value is less than $\alpha/m$; if this hypothesis is rejected, then the second smallest p-value is compared to $\alpha/(m-1)$, and so on. It is usually called the Bonferroni–Holm Procedure (BHP). Benjamini and Hochberg's (1995) false discovery rate (FDR) controlling method has been proposed as an alternative to FWE-controlling methods. To a certain extent, FDR solves problems (i) and (ii), though it may become exceedingly conservative for large $m$. FDR-controlling procedures do not generally control the FWE; thus they allow some fraction of detected significances to be in error.



Weighted methods are useful when some hypotheses $H_i$ are deemed more important than others—e.g. in clinical trials the various patient end points might be ranked a priori, and the testing procedure designed to give more power to the more important hypotheses. The simplest weighted multiple testing procedure, discussed for example in Rosenthal and Rubin (1983), is to reject $H_i$ if $p_i \le w_i \alpha$, where the weights lie in the simplex $w_i \ge 0$, $\sum_i w_i = 1$, and $p_i$ is the p-value of the $i$th test, $i = 1, \ldots, m$. The choice of $w_i$ may be based purely on the a priori importance of the hypotheses or, to optimise power, based on prior information (Spjøtvoll, 1972; Westfall and Krishen, 2001). Giving greater weight to the more significant hypotheses is generally considered "data snooping", and such methods inflate (multivariate) type I error rates. However, when properly chosen, weights can be taken from the concurrent data set so as to improve power without compromising significance levels. In this paper, we consider a class of weighted FWE-controlling procedures and a class of weighted FDR-controlling procedures which exploit information from the data and, under some conditions, improve performance in terms of power. All proofs of the proposed new theorems are given in the Appendix.

2. Weighted FWE and FDR methods

In order to discuss the error rates we would like to control, let $R_i = 1$ if $H_i$ ($H_i$ being a true or false null hypothesis) is rejected and 0 otherwise, and let $V_i = 1$ if a true $H_i$ is (erroneously) rejected, and 0 otherwise. The FWE can be defined as $\mathrm{FWE} = P(\sum V_i > 0)$, which is the probability of making at least one erroneous rejection. The FDR, which is the expected proportion of erroneously rejected hypotheses among the rejected ones, can be defined as $\mathrm{FDR} = E(\sum V_i / \sum R_i)$, where $\sum V_i / \sum R_i$ is defined as 0 when $\sum R_i = 0$. It is easy to see that $\mathrm{FDR} \le \mathrm{FWE}$. Note also that when all hypotheses are true, $\mathrm{FDR} = \mathrm{FWE}$ (Benjamini and Hochberg, 1997; Benjamini and Yekutieli, 2001).

2.1. Weighted FWE methods

When we require FWE control, there are at least two weighted procedures based on Bonferroni's inequality to be considered. The first is Holm's (1979) weighted procedure (known as the Weighted Holm's Procedure—WHP); the second, proposed in Benjamini and Hochberg (1997), might be called the Weighted Benjamini–Hochberg Procedure (WBHP). Both of the above procedures control the unweighted FWE. The natural "weighted" extension of the FWE ($P[\sum V_i > 0]$), which we call WFWE, is $P[\sum w_i V_i > 0]$. These procedures control the WFWE as well, since the latter is equal to the former for any set of weights. We shall now briefly describe the two procedures.

2.1.1. Holm's (1979) weighted procedure

Let $P_i^* = P_i / w_i$ and order $P_{(1)}^* \le \cdots \le P_{(m)}^*$. Let $H_{(i)}^*$, $w_{(i)}^*$ correspond to $P_{(i)}^*$. Reject $H_{(i)}^*$ for $i = 1, 2, 3, \ldots$ as long as

$$P_{(i)}^* \le \frac{\alpha}{\sum_{k=i}^{m} w_{(k)}^*}.$$

Holm's unweighted procedure is the above procedure with equal weights, which all have to be equal to one. If the weighted procedure calls for the rejection of $H_{(i)}^*$ when $P_{(i)}^* \sum_{k=i}^{m} w_{(k)}^* = P_{(i)} (1 + \sum_{k=i+1}^{m} w_{(k)}^*/w_{(i)})$ is smaller than a constant $\alpha$, then a larger $w_{(i)}$ implies greater power for rejecting that hypothesis. The weights $w_{(i)}$ also come into play in the definition of the ordered vector $P_{(i)}^*$; therefore $w_{(i)}$ affects the probability of rejecting $H_{(i)}$ given that $H_{(j)}$, $\forall j < i$, has been rejected, and it also affects the order of the tested hypotheses.
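As an illustration, here is a minimal Python sketch of WHP. This is our own toy implementation, not code from the paper; the function name and inputs are hypothetical.

```python
# Illustrative sketch of the weighted Holm procedure (WHP) described above.
import numpy as np

def weighted_holm(pvals, weights, alpha=0.05):
    """Step-down WHP: order by P*_i = P_i / w_i and reject H*_(i) while
    P*_(i) <= alpha / (sum of weights of the not-yet-rejected hypotheses)."""
    p = np.asarray(pvals, dtype=float)
    w = np.asarray(weights, dtype=float)
    p_star = p / w                        # weight-adjusted p-values
    order = np.argsort(p_star)            # ordering induced by P*_i
    rejected = np.zeros(len(p), dtype=bool)
    remaining = w.sum()                   # sum_{k >= i} w*_(k) at the current step
    for i in order:
        if p_star[i] <= alpha / remaining:
            rejected[i] = True
            remaining -= w[i]
        else:
            break                         # step-down: stop at the first acceptance
    return rejected
```

With equal unit weights the loop reduces to the usual Bonferroni–Holm thresholds $\alpha/m, \alpha/(m-1), \ldots$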
2.1.2. Benjamini–Hochberg's (1997) weighted procedure

Benjamini and Hochberg (1997) discuss a procedure which controls the WFWE by means of the ordered $P_i$, $P_{(1)} \le \cdots \le P_{(m)}$. Let $H_{(i)}$, $w_{(i)}$ ($\sum_{i=1}^{m} w_{(i)} = m$) correspond to $P_{(i)}$. Reject $H_{(i)}$ for $i = 1, 2, 3, \ldots$ as long as

$$P_{(i)} \le \frac{w_{(i)}\,\alpha}{\sum_{k=i}^{m} w_{(k)}}.$$

Proof of the FWE control is found in Benjamini and Hochberg (1997, Theorem 2). Again, Holm's unweighted procedure is the same as the above procedure with equal weights, which all have to be equal to one.
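A sketch of WBHP in the same style (again our own illustrative code): the differences from WHP are that the raw p-values dictate the order and that each threshold is scaled by $w_{(i)}$.

```python
# Illustrative sketch of WBHP: raw p-values ordered, thresholds
# w_(i) * alpha / sum_{k >= i} w_(k). Our own toy code, not the paper's.
import numpy as np

def wbhp(pvals, weights, alpha=0.05):
    p = np.asarray(pvals, dtype=float)
    w = np.asarray(weights, dtype=float)
    order = np.argsort(p)                 # natural ordering of the raw p-values
    rejected = np.zeros(len(p), dtype=bool)
    remaining = w.sum()
    for i in order:
        if p[i] <= w[i] * alpha / remaining:
            rejected[i] = True
            remaining -= w[i]
        else:
            break
    return rejected
```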

Note that this procedure respects the natural ordering of the p-values (not corrected for multiplicity). Therefore, $w_i$ affects the probability of rejecting $H_i$ given that $H_j$, $\forall j < i$, has been rejected, but it does not affect the order of the tested hypotheses.

2.2. Weighted FDR methods

For ease of notation, assume that the first $m_0$ ($0 \le m_0 \le m$) hypotheses tested are in fact true and $m_1 = m - m_0$ are false. The FDR is

$$E(Q) = E\left(\frac{\sum_{i=1}^{m_0} V_i}{\sum_{i=1}^{m} R_i}\right), \qquad (1)$$

which is the expected proportion of falsely rejected hypotheses among the rejected ones. When weighting is desired, the FDR can be generalized as follows. Let $Q(w)$ be

$$Q(w) = \begin{cases} \dfrac{\sum_{i=1}^{m_0} w_i V_i}{\sum_{i=1}^{m} w_i R_i}, & \sum_{i=1}^{m} w_i R_i > 0, \\[4pt] 0 & \text{otherwise}; \end{cases} \qquad (2)$$

then the weighted FDR (WFDR) is defined as $E(Q(w))$. Note that if some of the weights are 0, and the others are all equal, then the WFDR is identical to the FDR for the problem restricted to testing the hypotheses with positive weights. Under the intersection null hypothesis that all tested hypotheses are true, $\mathrm{WFDR} = \mathrm{WFWE}$. It is again easy to show that $\mathrm{WFDR} \le \mathrm{WFWE}$, so a WFDR-controlling procedure is potentially more powerful than a WFWE-controlling procedure.

2.2.1. Weighted FDR methods (Benjamini and Hochberg, 1997) for independent variables

Consider the following procedure: let $k$ be the largest $j$ satisfying

$$P_{(j)} \le \frac{\sum_{i=1}^{j} w_{(i)}}{m}\, q, \qquad (3)$$

then reject $H_{(1)}, \ldots, H_{(k)}$ ($q$ being the desired proportion of false rejections).

Theorem 2.1 (Benjamini and Hochberg, 1997, Theorem 4). Procedure (3) always controls the FDR at level $\frac{\sum_{i \in M_0} w_i}{m}\, q$ ($M_0$ being the set of indexes that correspond to true null hypotheses) for independent test statistics.
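A minimal sketch of procedure (3) follows (our illustrative Python, not the authors' software; the weights are assumed to be normalized so that $\sum_i w_i = m$):

```python
# Illustrative sketch of the weighted step-up FDR procedure (3):
# reject the j smallest p-values for the largest j such that
# P_(j) <= (sum_{i<=j} w_(i) / m) * q.
import numpy as np

def weighted_fdr(pvals, weights, q=0.05):
    p = np.asarray(pvals, dtype=float)
    w = np.asarray(weights, dtype=float)
    m = len(p)
    order = np.argsort(p)
    thresholds = np.cumsum(w[order]) / m * q     # (sum_{i<=j} w_(i) / m) q
    below = np.nonzero(p[order] <= thresholds)[0]
    rejected = np.zeros(m, dtype=bool)
    if below.size > 0:                           # step-up: take the largest such j
        rejected[order[:below[-1] + 1]] = True
    return rejected
```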

In this procedure, the weights incorporated into the error rate are suitably accumulated to form the procedural weights. It is important to reject a hypothesis with a high weight, as it considerably increases the "weight" of the total discoveries. However, it also increases the weight of the errors. Essentially, we are incorporating the same weights into the loss from errors, $\sum w_i V_i$, and the gain from rejections, $\sum w_i R_i$.

2.2.2. Weighted FDR methods for dependent variables

Benjamini and Yekutieli (2001) provide a correction constant that guarantees FDR control under every type of dependence between variables. In this section, we extend previous results from the literature to the weighted case for dependent variables. Since this situation is not the main task of the paper, this subsection is self-contained with respect to the rest of the paper.

Theorem 2.2. When procedure (3) is performed with

$$\frac{q}{\sum_{j=1}^{m} \left(1 \big/ \sum_{h \le j} w_h\right)}$$

taking the place of $q$, it always controls the FDR at level $\frac{\sum_{i \in M_0} w_i}{m}\, q$ ($M_0$ being the set of indexes that correspond to true null hypotheses).

This result may increase the range of problems for which a powerful procedure with proven FDR control can be offered. Obviously the adjustment by $\sum_{j=1}^{m} (1/\sum_{h \le j} w_h)$ is very often unnecessary, and yields a procedure that is too conservative. Indeed, the uncorrected procedure controls the FDR at level $\left(\sum_{j=1}^{m} 1/\sum_{h \le j} w_h\right)\left(\frac{\sum_{h \in M_0} w_h}{m}\right) q$, so no correction whatsoever is necessary as long as $\left(\sum_{j=1}^{m} 1/\sum_{h \le j} w_h\right)\frac{\sum_{h \in M_0} w_h}{m} \le 1$. Given that in many real experimental cases the proportion of active variables is low, this condition is usually respected and the correction is therefore not necessary.
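In code, the adjustment of Theorem 2.2 amounts to a few lines. The sketch below is ours and makes one interpretive assumption: the cumulative sums $\sum_{h \le j} w_h$ are taken over the weights in increasing order, which is the conservative reading of the indexing.

```python
# Sketch of the Theorem 2.2 correction: q is replaced by
# q / sum_{j=1}^m (1 / sum_{h<=j} w_h). With unit weights this reduces to
# the familiar Benjamini-Yekutieli factor q / sum_{j=1}^m (1/j).
import numpy as np

def adjusted_q(weights, q=0.05):
    w = np.sort(np.asarray(weights, dtype=float))  # cumulate in increasing order (assumption)
    return q / np.sum(1.0 / np.cumsum(w))
```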


In the main theorem of their work, Benjamini and Yekutieli (2001, Theorem 1.2) prove that a procedure without this correction still controls the (unweighted) FDR in families with Positive Regression Dependency on each element from a Subset $M_0$, or PRDS on $M_0$. Recall that a set $D$ is called increasing if $x \in D$ and $y \ge x$ implies that $y \in D$ as well.

Property PRDS. For any increasing set $D$, and for each $i \in M_0$, $P(X \in D \mid X_i = x)$ is nondecreasing in $x$.

Hence, the structure of the dependency assumed may be different for the set of the true hypotheses with respect to the set of the false hypotheses. Background on these concepts is clearly presented in Eaton (1986), supplemented by Holland and Rosenbaum (1986). The PRDS property is a relaxed form of the positive regression dependency property (Sarkar, 1969) and of MTP2 (Sarkar, 1998) as well. Hence PRDS is a very weak condition of dependence between variables, one which often holds for real data. The control of the FDR for PRDS variables can easily be extended to the case of weighted FDR procedures as follows:

Theorem 2.3. If the joint distribution of the test statistics is PRDS on the subset $M_0$ of test statistics corresponding to true null hypotheses, the procedure given by (3) controls the FDR at level $\frac{\sum_{i \in M_0} w_i}{m}\, q$.

The proof is straightforward if one adapts the proof given in Benjamini and Yekutieli (2001) to the weighted procedure. In the same way we adapt Theorem 2.2 from Theorem 1.3 of the same authors. Hint: define $v$ as the sum of the weights of the true null hypotheses and $s$ as the sum of the weights of the false ones. In the same way, it is a trivial matter to prove the control for discrete and one-sided test statistics (see Benjamini and Yekutieli, 2001, Theorems 5.2 and 5.3).

3. Data-driven weighted procedures

We show a way to define data-driven weights that "gathers" information on the distribution of the hypotheses under the null or under the alternative. Afterwards, we show how these weights allow for FWE or FDR control for parametric and nonparametric tests.

3.1. Data-driven weights

We consider parametric and nonparametric versions of the one-sample and two-sample tests. Extension to $C$ independent samples is straightforward. The parametric one-sample case is characterized by a sample of $n$ i.i.d. multivariate normal observation vectors of dimension $m$,

$$x_j = \begin{pmatrix} x_{j1} \\ \vdots \\ x_{jm} \end{pmatrix} \sim N_m(\mu, \Sigma), \quad j = 1, \ldots, n, \qquad \Sigma = \begin{pmatrix} \sigma_{11} & \cdots & \sigma_{1m} \\ \vdots & \ddots & \vdots \\ \sigma_{m1} & \cdots & \sigma_{mm} \end{pmatrix}, \quad \mu = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_m \end{pmatrix}, \qquad (4)$$

and we wish to test the $m$ partial hypotheses $H_i: \mu_i = 0$ ($i = 1, \ldots, m$) under strong control of the FWE. The covariance matrix $\Sigma$ is arbitrary and positive semidefinite. In a nonparametric setting, we assume that the $n$ i.i.d. $m$-dimensional sample vectors $x_j$ ($j = 1, \ldots, n$) have a density $f(x)$ which is symmetric about the location vector $\mu = (\mu_1, \ldots, \mu_m)'$, i.e. $f(\mu + x) = f(\mu - x)$. Dependences among variables are not specified but are generally assumed to exist.


In the case of two independent samples from two $m$-dimensional normal populations with the same covariance matrix,

$$x_{jc} = \begin{pmatrix} x_{j1c} \\ \vdots \\ x_{jmc} \end{pmatrix} \sim N_m(\mu_c, \Sigma), \qquad c = 1, 2; \ j = 1, \ldots, n_c, \ n = n_1 + n_2, \qquad (5)$$

we consider the usual two-sample $t$ statistic in order to test the partial null hypotheses $H_i: \mu_{1i} = \mu_{2i}$ ($i = 1, \ldots, m$). In the nonparametric approach, we simply assume that under the null hypothesis the two samples have the same distribution: $f(x_{i1}) = f(x_{i2})$, $\forall i = 1, \ldots, m$.

One way to obtain information about the presence of possible alternative hypotheses is given by the use of the total variance calculated on the entire sample, without considering the classification into groups. Indeed, the total sum of squares $\mathrm{SST}_i$ ($i = 1, \ldots, m$) is given by the sum of the within-groups sum of squares $\mathrm{SSW}_i$ and the between-groups sum of squares $\mathrm{SSB}_i$ as follows:

$$\mathrm{SST}_i = \sum_{c=1,2} \sum_{j=1}^{n_c} (x_{ijc} - \bar{x}_{i\cdot})^2 = \sum_{c=1,2} \sum_{j=1}^{n_c} (x_{ijc} - \bar{x}_{ic})^2 + \sum_{c=1,2} n_c (\bar{x}_{ic} - \bar{x}_{i\cdot})^2 = \mathrm{SSW}_i + \mathrm{SSB}_i,$$

where $\bar{x}_{ic} = \sum_{j=1}^{n_c} x_{ijc}/n_c$ ($c = 1, 2$) and $\bar{x}_{i\cdot}$ indicates the overall mean for variable $i$. The mean total variance, estimated by $\mathrm{MST}_i = \mathrm{SST}_i/(n-1)$ ($i = 1, \ldots, m$), has expectation

$$E(\mathrm{MST}_i) = E\left(\frac{\mathrm{SSW}_i}{n-1}\right) + E\left(\frac{\mathrm{SSB}_i}{n-1}\right) = \sigma_{ii}\frac{n-2}{n-1} + \frac{\sigma_{ii}}{n-1} + \sum_{c=1,2} \frac{n_c(\mu_{ic} - \mu_{i\cdot})^2}{n-1} = \sigma_{ii} + \frac{n}{n-1}\left(\frac{\mu_{i1} - \mu_{i2}}{2}\right)^2 = \sigma_{ii} + \frac{n}{n-1}\left(\frac{\delta}{2}\right)^2,$$

where $\delta = \mu_{i1} - \mu_{i2}$. In this way we highlight that $\mathrm{SST}_i$ tends to be larger when variable $i$ is under $H_1$. This is especially true when $n$ is small. In order to understand how the total variance is related to the power of tests with different sample sizes, we keep the value of the $t$-statistic fixed for increasing $n$ (i.e. the same p-value when $\sigma_{ii}$ is known or the asymptotic approximation to the Gaussian distribution holds). Hence $\delta_n = \delta_1/\sqrt{n}$, the $t$-statistic being equal to $t = \delta_n \sqrt{n}/\sqrt{\sigma_{ii}} = \delta_1/\sqrt{\sigma_{ii}}$. Now, writing the expected value of MST as a function of $n$, we get

$$E(\mathrm{MST}_i) = \sigma_{ii} + \left(\frac{\delta_n}{2}\right)^2 \frac{n}{n-1} = \sigma_{ii} + \left(\frac{\delta_1}{2\sqrt{n}}\right)^2 \frac{n}{n-1} = \sigma_{ii} + \frac{(\delta_1/2)^2}{n-1},$$

which is a decreasing function of $n$.

Similar considerations can be made in the case of more than two independent samples, defining $S_i$ as the total variance of variable $i$. If we are considering a one-sample test, $S_i$ becomes $S_i = \sum_{j=1}^{n} x_{ij}^2/n$. Note that $S_i = \sum_{j=1}^{n} x_{ij}^2/n = \sum_{j=1}^{n} (x_{ij} - \bar{x}_{i\cdot})^2/n + \bar{x}_{i\cdot}^2$. Therefore, with this definition we are considering both the shift from zero and the error variability; using the total variability as in the two-sample case would not be sensitive to nonnull effects.

Hence, to define combining functions which give greater weight to the p-values with "large" $S_i$, it is sufficient to consider

$$w_i = f(S_i), \quad i = 1, \ldots, m, \qquad (6)$$

with $f$ a nondecreasing monotone function of $S_i$. One particularly interesting case is obtained when the weights are zero if $S_i$ is below a threshold $\eta \ge 0$:

$$w_i = \begin{cases} S_i & \text{if } S_i \ge \eta, \\ 0 & \text{otherwise} \end{cases} \quad (i = 1, \ldots, m). \qquad (7)$$

This is equivalent to excluding the corresponding variables from the analysis. Also note that for $\eta = 0$ the weights are equal to the total variance, i.e. $w_i = S_i$.
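The decomposition above is easy to verify numerically; the following self-contained check (our own illustrative Python) confirms $\mathrm{SST}_i = \mathrm{SSW}_i + \mathrm{SSB}_i$ on simulated two-group data for a single variable:

```python
# Numerical check of SST = SSW + SSB for one variable and two groups.
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(0.0, 1.0, size=20)          # group 1
x2 = rng.normal(1.0, 1.0, size=20)          # group 2 (shifted: under H1)
x = np.concatenate([x1, x2])
sst = np.sum((x - x.mean()) ** 2)
ssw = np.sum((x1 - x1.mean()) ** 2) + np.sum((x2 - x2.mean()) ** 2)
ssb = x1.size * (x1.mean() - x.mean()) ** 2 + x2.size * (x2.mean() - x.mean()) ** 2
assert np.isclose(sst, ssw + ssb)           # the decomposition holds exactly
```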


A choice less sensitive to the magnitude of $S_i$ is the following:

$$w_i = \begin{cases} 1 & \text{if } S_i \ge \eta, \\ 0 & \text{otherwise} \end{cases} \quad (i = 1, \ldots, m). \qquad (8)$$
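Putting definitions (6)–(8) together, a sketch of the weight computation might look as follows. This is our illustrative code: `eta` stands for the threshold $\eta$, the column-wise total variance plays the role of $S_i$ in the two-sample case (the one-sample case would use $\sum_j x_{ij}^2/n$ instead), and the final rescaling enforces the normalization $\sum_i w_i = m$ used by procedure (3).

```python
# Sketch of the data-driven weights (6)-(8) from the total variance S_i
# of each column of an n x m data matrix X (rows = observations).
import numpy as np

def total_variance_weights(X, eta=0.0, indicator=False):
    n, m = X.shape
    S = np.sum((X - X.mean(axis=0)) ** 2, axis=0) / (n - 1)  # total variance S_i
    if indicator:
        w = (S >= eta).astype(float)     # definition (8): 0/1 weights
    else:
        w = np.where(S >= eta, S, 0.0)   # definition (7); eta = 0 gives w_i = S_i
    assert w.sum() > 0, "threshold eta excluded every variable"
    return w * m / w.sum()               # normalize so that sum(w) = m
```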

We shall see the usefulness of (8) in the simulation study of Section 4.

3.2. FWE-controlling procedures with data-driven weights

3.2.1. Parametric approach

In the parametric setting, Läuter et al. (1996) noted that linear combination weight vectors may depend on the data through certain quadratic forms, and the resulting (ordinary) $t$-tests retain their significance levels. Thus, by choosing the vector of weights suitably, the same procedures may be used to give more weight to particular hypotheses selected by the data. Westfall et al. (2004) propose using WHP with

$$w_i = S_i^r \quad (r \ge 0), \quad i = 1, \ldots, m.$$

This class of functions has the peculiarity of including two well-known procedures as particular cases:

• if $r = 0$, it is equivalent to Bonferroni–Holm's procedure, and all weights are equal to one;
• if $r \to \infty$, it corresponds to Kropf and Läuter's (2002) procedure, which orders the tests based on their variance and sequentially tests the hypotheses of interest without corrections until the first accepted hypothesis is found.

In the two extreme cases, therefore, the influence of $S_i$ is different—in the first case $S_i$ is not considered, in the second it is very important, since it establishes the order of admission of the hypotheses independently of their significance.

By using the theory of spherically distributed matrices (Fang and Zhang, 1990), it is possible to state that under assumptions (4) and (5) the subset of variables corresponding to the true null hypotheses (denoted by $X_0$) is left spherically distributed, and the conditional distribution of $X_0$ for fixed $X_0'X_0$ is also left spherical. Hence, each of the columns of $X_0$ is also conditionally left spherically distributed, and the $t$ tests (one-sample or two-sample) exactly maintain their type I error (i.e. their p-values are uniformly distributed). Kropf and Hommel (2004) propose the use of the same kind of weights but with WBHP. Again, with $r = 0$ we have the usual unweighted Bonferroni–Holm method, but the procedure does not converge to Kropf and Läuter's parametric procedure (Kropf and Läuter, 2002) when $r \to \infty$. The FWE control again makes use of the theory of spherically distributed matrices.
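As a concrete sketch of this parametric pipeline (our own illustrative code, reusing the hypothetical `weighted_holm` function from Section 2.1.1), the one-sample version can be written in a few lines; `r` interpolates between Bonferroni–Holm ($r = 0$) and the Kropf–Läuter ordering ($r \to \infty$):

```python
# Sketch of WHP with data-driven weights w_i = S_i^r (one-sample case).
import numpy as np
from scipy import stats

def whp_variance_weighted(X, r=1.0, alpha=0.05):
    n, m = X.shape
    S = np.sum(X ** 2, axis=0) / n              # one-sample total variability S_i
    w = S ** r                                  # data-driven weights w_i = S_i^r
    _, p = stats.ttest_1samp(X, popmean=0.0)    # column-wise tests of H_i: mu_i = 0
    return weighted_holm(p, w, alpha=alpha)     # WHP is invariant to rescaling of w
```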


3.2.2. Nonparametric approach

For the one-sample problem, the (nonparametric) rank-based counterpart of WHP and WBHP uses the medians of the absolute values of the original observations instead of the total variances, and the p-values from the one-sample Wilcoxon tests instead of those from the $t$ tests. For the two-sample problem, it makes use of the empirical interquartile range of the pooled sample of $n = n_1 + n_2$ observations and the p-values from the two-sample Wilcoxon test (Wilcoxon–Mann–Whitney U test). Kropf et al. (2004) proved that WHP based on rank tests controls the FWE; Kropf and Hommel (2004) do the same for the WBHP. The proofs of the FWE control are very similar to those for the parametric counterpart.

A second nonparametric approach is offered by Finos and Salmaso (2006) and is based on permutation tests. The proof of the FWE control is based on the invariance principle: if $w_i$ is defined on the basis of a generic element of the orbit $X_{/X}$ (i.e. the permutation sample space), the procedure controls the FWE because the distribution of the test statistic is independent of the values of $w_i$. There is an interesting connection between the principle of invariance and the theory of left-spherical distributions: in both cases we obtain tests that, conditionally on the weights based on the variance, have a uniform distribution under the null hypothesis. Finos and Salmaso also show that, in the conditional framework, any statistic that evaluates the dispersion of the data depending on a generic element of the permutation space yields an unbiased multiple test. Possible choices for defining the weights include using functions of the coefficient of variation ($w_i = f(CV_i) = f(S_i/\bar{x}_{i\cdot})$) or functions of the first–third quartile range ($w_i = f(\mathrm{IQR}_i) = f(Q_{3i} - Q_{1i})$, where $Q_{3i}$ and $Q_{1i}$ are the third and first quartiles for the $i$th variable). This last choice seems to be particularly useful when there are heavy-tailed variables, giving little weight to the large values produced by the random errors and more weight to the fixed effects. Finally, these results can be extended from the class of Bonferroni-like combining functions to the broader class of nonparametric combining functions defined by Pesarin (2001).

3.3. FDR-controlling procedures with data-driven weights

In order to prove that the weights chosen as in (6) guarantee FDR control, we make use of the theory of spherically distributed matrices and follow the ideas provided in Benjamini and Hochberg (1997); we point out the steps of the proof where it is necessary to refer to the theory of left spherically distributed matrices to guarantee that the p-values corresponding to the true hypotheses $H_i$ follow a uniform distribution conditionally on $w_i$.

Lemma. For any $1 \le m_0 \le m$ independent p-values corresponding to the true null hypotheses, any set of $m_1 = m - m_0$ p-values corresponding to the false null hypotheses, any set of weights $w_i \ge 0$, $\sum_{i=1}^{m} w_i = m$, defined as in (6), and any constant $q$, the multiple testing procedure defined by (3) satisfies the inequality

$$E(Q(w) \mid P_{m_0+1} = p_1, \ldots, P_m = p_{m_1}) \le \frac{\sum_{i=1}^{m_0} w_i}{m}\, q. \qquad (9)$$

Theorem 3.1. If the test statistics are independent, then the procedure based on (3) and weights defined as in (6) controls the WFDR at level $q$.

The proof of the theorem is straightforward from the previous lemma. For dependent variables, WFDR control is proven by the following theorems.

Theorem 3.2. When procedure (3) with weights defined as in (6) is performed with $q \big/ \sum_{j=1}^{m} (1/\sum_{h \le j} w_h)$ replacing $q$, it always controls the FDR at a significance level less than or equal to $\frac{\sum_{i \in M_0} w_i}{m}\, q$ ($M_0$ being the set of indexes that correspond to true null hypotheses).

Theorem 3.3. If the joint distribution of the test statistics is PRDS on $M_0$, the procedure given by (3) controls the FDR at level $\frac{\sum_{i \in M_0} w_i}{m}\, q$ also when the weights are defined as in (6).

4. A simulation study

It has been noted that the classic procedures lose power when there is a high number of variables. For this reason, this simulation study focuses on cases with a high number of variables. We consider the following model for multivariate one-sample data. At first, the variances of each of the $m$ measurements are assumed to be independently generated as $\sigma_i^2 \sim \lambda \sigma_0 / \chi_\lambda^2$, $i = 1, \ldots, m$, where $\chi_\lambda^2$ denotes a chi-square distributed random variable with $\lambda$ degrees of freedom. The parameter $\sigma_0$ is a nuisance parameter reflecting overall scale; for convenience it is taken to be equal to 1 in the simulations. The parameter $\lambda$ is specified in each simulation; small $\lambda$ corresponds to large variance heterogeneity across variables, while $\lambda = \infty$ corresponds to variance homogeneity across variables; we take $\lambda = 5$, 10 and $\infty$ in the simulations. We also consider the average empirical ratio between the variance of the $m$ variances and the variance that would result in the case of homogeneity across all $m$ variables (i.e. $\lambda = \infty$), indicated by $\zeta$. This can be considered an indication of heterogeneity across the $m$ variables. Next, conditionally on $\sigma_i^2$, the effect sizes $\delta_{ji} = \mu_i/\sigma_i$, $\forall j = 1, \ldots, n$ (see (4)), are assumed to be drawn independently from a mixture of $N(0, \sigma_\delta^2)$ and single-point (0) distributions. The parameter $\sigma_\delta^2$ is specified in the simulations; a larger $\sigma_\delta^2$ denotes generally larger alternatives. We take $\sigma_\delta^2 = 1$ or 2 in our simulations. The mixing parameter is denoted by $\pi$, with $\pi = P(\delta_i \ne 0)$, and represents the proportion of variables under $H_1$; it varies from 0 to 0.5. Finally, conditionally on the means and variances, the observed data vector is assumed to come from an $m$-dimensional multivariate normal distribution with i.i.d. components.
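For reference, here is a sketch of this generating model together with the power measure used below. This is our own reconstruction: in particular, the exact form $\sigma_i^2 \sim \lambda \sigma_0 / \chi_\lambda^2$ is our reading of the design, and all function names are hypothetical.

```python
# Sketch of the one-sample simulation design of Section 4 and of the
# power measure |K1 intersect K2| / |K2|.
import numpy as np

def simulate_dataset(n=5, m=1000, lam=10.0, sigma0=1.0, sigma_delta2=1.0,
                     pi=0.1, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    if np.isfinite(lam):                       # heterogeneous variances
        var = lam * sigma0 / rng.chisquare(lam, size=m)
    else:                                      # lam = inf: homogeneous variances
        var = np.full(m, sigma0)
    active = rng.random(m) < pi                # variables under H1 (the set K2)
    delta = np.where(active, rng.normal(0.0, np.sqrt(sigma_delta2), size=m), 0.0)
    X = rng.normal(loc=delta * np.sqrt(var),   # mu_i = delta_i * sigma_i
                   scale=np.sqrt(var), size=(n, m))
    return X, active

def empirical_power(rejected, active):
    """Fraction of correctly rejected hypotheses, |K1 ∩ K2| / |K2|."""
    return rejected[active].mean() if active.any() else np.nan
```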

[Fig. 1 appears here: a grid of power curves for n = 5, with columns indexed by λ (panels labelled λ = 5, ∞, 1) and rows by the combinations of m (1000, 10 000) and σ_δ² (1, 2); the estimated heterogeneity index ζ is reported under each panel.]

Fig. 1. Simulation results for n = 5: power estimates for different values of π (abscissa axis), m and σ_δ² (along the rows), λ (along the columns). FDR = FDR-controlling procedure; WFDR = weighted FDR-controlling procedure; BHP = Bonferroni–Holm FWE-controlling procedure; WBHP = Benjamini–Hochberg weighted FWE-controlling procedure. Moreover, th = 2 means threshold η = 2 (see definition (8)) and w = S² means weight w equal to the total variance (see definition (7) with η = 0).

[Fig. 2 appears here: power curves for n = 10, with the same panel layout as Fig. 1.]

Fig. 2. Simulation results for n = 10: power estimates for different values of π (abscissa axis), m and σ_δ² (along the rows), λ (along the columns). FDR = FDR-controlling procedure; WFDR = weighted FDR-controlling procedure; BHP = Bonferroni–Holm FWE-controlling procedure; WBHP = Benjamini–Hochberg weighted FWE-controlling procedure. Moreover, th = 2 means threshold η = 2 (see definition (8)) and w = S² means weight w equal to the total variance (see definition (7) with η = 0).


Power is taken to be the average fraction of correctly rejected hypotheses in each simulation. Specifically, defining $K_1 = \{i \mid H_i \text{ is rejected}\}$ and $K_2 = \{i \mid \delta_i \ne 0\}$, we define $\mathrm{Power} = E(|K_1 \cap K_2|/|K_2| \mid |K_2| > 0)$. Figs. 1 and 2 show the power estimates with sample size ($n$) equal to 5 and 10, respectively, for different proportions of active variables ($\pi$), variances of effects ($\sigma_\delta^2$), variance heterogeneity ($\lambda$) and numbers of variables ($m$), using weights as defined in (6)–(8) (with $\eta = 2$) and 1000 simulated data sets. Among the competing procedures that control the FWE, we show only the WBHP, which generally proves to be more powerful than the WHP. A detailed report of the power estimates for all the presented procedures and various thresholds is available upon request to the corresponding author. A discussion of the results is given in the Conclusions section.

5. Conclusions

In this paper, we have discussed data-driven weighted procedures controlling the FWE. We also propose new data-driven weighted procedures controlling the FDR in the case of independence between variables and under positive regression dependency on the subset of variables corresponding to true null hypotheses (PRDS). These procedures provide good results as long as the variances of the $m$ variables are approximately uniform under $H_0$. As heterogeneity grows, the procedures (though remaining valid) become less powerful. However, the simulation study suggests they can also be applied in cases with a high incidence of nonhomogeneity. In particular, the solution proposed in (8), based on a threshold, seems to provide better results in the FWE-controlling WBHP procedure even when there is very high heterogeneity. This is particularly true for simulations with a high number of variables ($m = 10\,000$). This is due to the fact that the threshold value reduces the number of tested variables, which compensates for the loss of power characterizing the Bonferroni-like procedures in high-dimensional situations. In contrast, the proposal made in (6) seems to behave better in terms of power in the WFDR-controlling procedure even when the ratio $\zeta$ is large. As an example, consider the setting $n = 5$, $m = 10\,000$ and $\sigma_\delta^2 = 2$ (Fig. 1, plots on the bottom) and the case with 10% of active variables. The power of the FDR-controlling procedure is about 33%. When there is complete homogeneity, the power of the WFDR-controlling procedure proposed in (6) (w = S²) is very high (61%). Power hardly changes when the variance heterogeneity across variables is large; on the contrary, when the lack of homogeneity is huge, power decreases very rapidly. As a general guideline, the gain in power is higher for smaller sample sizes (see also the formal discussion of Section 3.1), whereas power seems to be independent of the number of variables involved.

Although the combination of low sample size and high number of variables used in the simulations could seem negligible in real applications, this setting is the most common one in brain imaging studies, where the sample size rarely exceeds 8–10 subjects and the number of variables involved is rarely less than 100 000. Microarray studies also often have similar sample sizes and numbers of variables.

The software has been developed in Matlab 7 (The MathWorks, Inc.) and is available upon request to the corresponding author.

Acknowledgement

The authors wish to thank two referees for helpful comments and suggestions.

Appendix

Proof of Theorem 2.2. Let $C_k^{(i)}$ denote the event in which the true null hypothesis $H_i$ is rejected along with $k - 1$ other (true and/or false) hypotheses. Now let

$$p_{ijk} = P\left(\left\{P_i \in \left[\frac{\sum_{h \le j-1} w_h}{m}\, q,\ \frac{\sum_{h \le j} w_h}{m}\, q\right]\right\} \cap C_k^{(i)}\right).$$

Note that for any given set of ordered weights $w_i$,

$$\sum_{k=1}^{m} p_{ijk} = P\left(\left\{P_i \in \left[\frac{\sum_{h \le j-1} w_h}{m}\, q,\ \frac{\sum_{h \le j} w_h}{m}\, q\right]\right\} \cap \left(\bigcup_k C_k^{(i)}\right)\right) \le \frac{q\, w_j}{m}. \qquad (10)$$

Note that, for each $i$, the $C_k^{(i)}$ are disjoint, so the FDR can be expressed as

$$\begin{aligned} E(Q(w)) &= \sum_{i \in M_0} \sum_{k=1}^{m} \frac{w_i}{\sum_{h \le k} w_h}\, P\left(\left\{P_i \le \frac{\sum_{h \le k} w_h}{m}\, q\right\} \cap C_k^{(i)}\right) = \sum_{i \in M_0} \sum_{k=1}^{m} \frac{w_i}{\sum_{h \le k} w_h} \sum_{j=1}^{k} p_{ijk} \\ &\le \sum_{i \in M_0} \sum_{j=1}^{m} \sum_{k=j}^{m} \frac{w_i}{\sum_{h \le j} w_h}\, p_{ijk} = \sum_{i \in M_0} w_i \sum_{j=1}^{m} \frac{1}{\sum_{h \le j} w_h} \sum_{k=j}^{m} p_{ijk} \\ &\le \sum_{i \in M_0} w_i \sum_{j=1}^{m} \frac{1}{\sum_{h \le j} w_h}\, \frac{q\, w_j}{m} \le \left(\sum_{j=1}^{m} \frac{1}{\sum_{h \le j} w_h}\right) \frac{\sum_{i \in M_0} w_i}{m}\, q. \end{aligned} \qquad (11)$$

□



Proof of the Lemma. When $m = 1$ the result is straightforward. Now assume that (9) holds for any $m' \le m$. If $m_0 = 0$, all null hypotheses are false, $Q(w) = 0$, and

$$E(Q(w) \mid P_1 = p_1, \ldots, P_{m+1} = p_{m+1}) = 0 \le \frac{\sum_{i=1}^{m_0} w_i}{m+1}\, q. \qquad (12)$$

If $m_0 > 0$: let $P_{(m_0)}$ denote the largest p-value among $P_1, \ldots, P_{m_0}$. Since these correspond to the true null hypotheses, $P_{(m_0)}$ is distributed as the largest of $m_0$ independent $U(0, 1)$ random variables (this is due to the fact that the conditional distribution of $X_0$ for fixed $X_0'X_0$ is left spherical), and $f_{P_{(m_0)}}(p) = m_0 p^{m_0 - 1}$ for $0 \le p \le 1$. Let us also define $j_0$ as the largest $j$, $m_0 + 1 \le j \le m + 1$, satisfying

$$p_j \le \frac{\sum_{i=1}^{j} w_i}{m+1}\, q. \qquad (13)$$

Let $p'$ denote the value of the right-hand side of (13) at $j_0$. Decomposing according to the value of $P_{(m_0)} = p$,

$$E(Q(w) \mid P_{m_0+1} = p_1, \ldots, P_{m+1} = p_{m_1}) = \int_0^{p'} E(Q(w) \mid P_{(m_0)} = p, P_{m_0+1} = p_1, \ldots, P_{m+1} = p_{m_1})\, f_{P_{(m_0)}}(p)\, dp \qquad (14)$$

$$\qquad + \int_{p'}^{1} E(Q(w) \mid P_{(m_0)} = p, P_{m_0+1} = p_1, \ldots, P_{m+1} = p_{m_1})\, f_{P_{(m_0)}}(p)\, dp. \qquad (15)$$

For $p \le p'$ all $j_0$ hypotheses are rejected, and

$$Q(w) = \frac{\sum_{i=1}^{m_0} w_i}{\sum_{i=1}^{j_0} w_i}. \qquad (16)$$

From (14) we get

$$\frac{\sum_{i=1}^{m_0} w_i}{\sum_{i=1}^{j_0} w_i}\, (p')^{m_0} = \frac{\sum_{i=1}^{m_0} w_i}{\sum_{i=1}^{j_0} w_i}\, \frac{\sum_{i=1}^{j_0} w_i}{m+1}\, q\, (p')^{m_0 - 1} = \frac{\sum_{i=1}^{m_0} w_i}{m+1}\, q\, (p')^{m_0 - 1}. \qquad (17)$$


In order to evaluate (15) we further condition on the index $i'$, $1 \le i' \le m_0$, of the p-value at which $P_{(m_0)}$ is achieved:

$$\int_{p'}^{1} E(Q(w) \mid P_{(m_0)} = p, P_{(m_0)} = P_{i'}, P_{m_0+1} = p_1, \ldots, P_{m+1} = p_{m_1})\, f_{P_{(m_0)}}(p)\, dp$$
$$= \frac{1}{m_0} \sum_{i'=1}^{m_0} \int_{p'}^{1} E(Q(w) \mid P_{(m_0)} = p, P_{(m_0)} = P_{i'}, P_{m_0+1} = p_1, \ldots, P_{m+1} = p_{m_1})\, f_{P_{(m_0)}}(p)\, dp. \qquad (18)$$

Let us consider both cases: when $p_j \le P_{(m_0)} = P_{i'} = p < p_{j+1}$ for $j > j_0$, or when $p' \le P_{(m_0)} = P_{i'} = p < p_{j_0+1}$. From the definition of $j_0$ and $p'$, no hypothesis can be rejected because of the value of $p, p_{j+1}, \ldots, p_{m+1}$. Therefore, when all hypotheses—true and false—are considered together and their p-values are ordered, a hypothesis $H_{(i)}$ can be rejected only if there exists a constant $k$, $i \le k \le j - 1$, for which

$$\frac{P_{(k)}}{p} \le \frac{\sum_{i=1}^{k} w_{(i)}}{(m+1)\, p}\, q. \qquad (19)$$

Equivalently, $H_{(i)}$ will be rejected if $k$ satisfies

$$\frac{P_{(k)}}{p} \le \frac{\sum_{i=1}^{k} w_{(i)}}{(m+1)\, p}\, q = \frac{\sum_{i=1}^{k} w_{(i)}^*}{j-1}\, \frac{\sum_{i=1}^{j-1} w_{(i)}}{(m+1)\, p}\, q, \qquad (20)$$

with

$$w_{(i)}^* = \frac{(j-1)\, w_{(i)}}{\sum_{i=1}^{j-1} w_{(i)}}. \qquad (21)$$

These weights satisfy $w_{(i)}^* \ge 0$ and $\sum_{i=1}^{j-1} w_i^* = j - 1$; since we conditioned on $P_{i'} = P_{(m_0)} = p$, the $m_0 - 1$ remaining $P_i/p$ are distributed as independent $U(0, 1)$ random variables (again, this is due to the left-spherical property of $X_0$ for fixed $X_0'X_0$); $p_i/p$ for $i = m_0 + 1, \ldots, j$ are values in the interval (0, 1) corresponding to false null hypotheses. Hence, using (3) to test the $j - 1 = m' \le m$ hypotheses, of which $m_0 - 1$ are true, is equivalent to using the procedure with the constant

$$\frac{\sum_{i=1}^{j-1} w_{(i)}}{(m+1)\, p}\, q$$

taking the role of $q$. Applying now the starting assumption that (9) holds for any $m' \le m$ (the induction hypothesis), we have that

$$E(Q(w) \mid P_{(m_0)} = p, P_{(m_0)} = P_{i'}, P_{m_0+1} = p_1, \ldots, P_{m+1} = p_{m_1}) \le \frac{\sum_{i=1, i \ne i'}^{m_0} w_i^*}{j-1}\, \frac{\sum_{i=1}^{j-1} w_{(i)}}{(m+1)\, p}\, q \qquad (22)$$

$$= \frac{\sum_{i=1}^{m_0} w_i - w_{i'}}{(m+1)\, p}\, q, \qquad (23)$$

where (23) is derived after replacing $w^*$ with its definition in (21). The upper bound in (23) depends on $p$, but not on the particular $j$ ($p_j < p < p_{j+1}$); recall that for $j_0$ the range for $p$ is $p' < p \le p_{j_0+1}$. The bound does depend on $i'$. Therefore, by integrating (23) over $(p', 1]$, while still conditioning on $i'$, we get

$$\int_{p'}^{1} \frac{\sum_{i=1}^{m_0} w_i - w_{i'}}{(m+1)\, p}\, q\, m_0\, p^{m_0 - 1}\, dp = \frac{m_0}{m_0 - 1}\, \frac{\sum_{i=1}^{m_0} w_i - w_{i'}}{m+1}\, q\, \left(1 - (p')^{m_0 - 1}\right). \qquad (24)$$


Averaging now over $i'$, from (18) and (24) we get

$$\int_{p'}^{1} E(Q(w) \mid P_{m_0+1} = p_1, \ldots, P_{m+1} = p_{m_1})\, f_{P_{(m_0)}}(p)\, dp = \frac{1}{m_0} \sum_{i'=1}^{m_0} \frac{m_0}{m_0 - 1}\, \frac{\sum_{i=1}^{m_0} w_i - w_{i'}}{m+1}\, q\, \left(1 - (p')^{m_0 - 1}\right) = \frac{\sum_{i=1}^{m_0} w_i}{m+1}\, q\, \left(1 - (p')^{m_0 - 1}\right). \qquad (25)$$

Finally, by adding (25) and (17) we get the desired inequality (12) for $m + 1$. □

Proof of Theorem 3.2. In order to prove that the procedure for dependent variables controls the WFDR, we simply need to note that the $P_i$, $i = 1, \ldots, m_0$, in (10) and (11) are under $H_0$, hence uniformly distributed conditionally on $w_i$. Indeed, $X_0$ is left spherically distributed, its conditional distribution for fixed $X_0'X_0$ is also left spherical, and each of the columns of $X_0$ is conditionally left spherically distributed as well; the result follows. □

Proof of Theorem 3.3. It directly follows from the fact that only the statistics under the null hypothesis have to be PRDS; therefore the same considerations as in the proof of Theorem 3.2 on the left spherically distributed variables hold. □

References

Benjamini, Y., Hochberg, Y., 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57, 289–300.
Benjamini, Y., Hochberg, Y., 1997. Multiple hypotheses testing with weights. Scand. J. Statist. 24, 407–418.
Benjamini, Y., Yekutieli, D., 2001. The control of the false discovery rate in multiple testing under dependency. Ann. Statist. 29 (4), 1165–1188.
Eaton, M.L., 1986. Lectures on topics in probability inequalities. CWI Tract 35.
Fang, K.T., Zhang, Y.T., 1990. Generalized Multivariate Analysis. Science Press, Beijing.
Finos, L., Salmaso, L., 2006. Weighted methods controlling the multiplicity when the number of variables is much higher than the number of observations. J. Nonparametric Statist. 18 (2), 245–261.
Holland, P.W., Rosenbaum, P.R., 1986. Conditional association and unidimensionality in monotone latent variable models. Ann. Statist. 14, 1523–1543.
Holm, S., 1979. A simple sequentially rejective multiple test procedure. Scand. J. Statist. 6, 65–70.
Kropf, S., Hommel, G., 2004. In: 7th Tartu Conference on Multivariate Statistics, 2003. Acta et Commentationes Universitatis Tartuensis de Mathematica 8 (special vol.), 169–177.
Kropf, S., Läuter, J., 2002. Multiple tests for different sets of variables using a data-driven ordering of hypotheses, with an application to gene expression data. Biometrical J. 44, 789–800.
Kropf, S., Läuter, J., Eszlinger, M., Krohn, K., Paschke, R., 2004. Nonparametric multiple test procedures with data-driven order of hypotheses and with weighted hypotheses. J. Statist. Plann. Inference 125, 31–47.
Läuter, J., Glimm, E., Kropf, S., 1996. New multivariate tests for data with an inherent structure. Biometrical J. 38, 5–23. Erratum: Biometrical J. 40, 1015.
Pesarin, F., 2001. Multivariate Permutation Tests: With Applications in Biostatistics. Wiley, Chichester.
Rosenthal, R., Rubin, D.B., 1983. Ensemble-adjusted p-values. Psychological Bulletin 94, 540–541.
Sarkar, S.K., 1998. Some probability inequalities for ordered MTP2 random variables: a proof of Simes' conjecture. Ann. Statist. 26 (2), 494–504.
Sarkar, T.K., 1969. Some lower bounds of reliability. Technical Report 124, Department of Operations Research and Statistics, Stanford University.
Spjøtvoll, E., 1972. On the optimality of some multiple comparison procedures. Ann. Math. Statist. 43, 398–411.
Westfall, P.H., Krishen, A., 2001. Optimally weighted, fixed sequence and gatekeeper multiple testing procedures. J. Statist. Plann. Inference 99, 25–40.
Westfall, P.H., Kropf, S., Finos, L., 2004. Weighted FWE-controlling methods in high-dimensional situations. In: Benjamini, Y., Bretz, F., Sarkar, S. (Eds.), Recent Developments in Multiple Comparison Procedures. Institute of Mathematical Statistics Lecture Notes—Monograph Series, vol. 47, pp. 143–154.
