A general approach to stepup multiple test procedures for free-combinations families



Journal of Statistical Planning and Inference 82 (1999) 35–54

www.elsevier.com/locate/jspi

Eugene Grechanovsky (a,*), Ilia Pinsker (b)

(a) Central Bureau of Statistics, Statistical Analysis Unit, 91130 Jerusalem, Israel
(b) Neve Yakov, 403/3 Jerusalem, Israel

Received 1 January 1997; accepted 16 July 1997

Abstract

Dunnett and Tamhane (1992, J. Amer. Statist. Assoc. 87, 162–170) proposed a stepup multiple test procedure for simultaneous testing of k hypotheses satisfying the free-combinations condition under the normal setup in the equicorrelated case and later extended it to the non-equicorrelated case (1995, Biometrics, 51, 217–227). However, our counterexample shows that the latter does not control the familywise error rate. We suggest a new approach to constructing stepup procedures which yields as particular cases the procedures of Hochberg (1988, Biometrika, 75, 800–802), Rom (1990, Biometrika, 77, 663–665) and Dunnett and Tamhane (1992), though not that of Dunnett and Tamhane (1995). Furthermore, using this approach we develop a stepup test procedure for the non-equicorrelated case that generalizes the Dunnett and Tamhane (1992) procedure but differs from the Dunnett and Tamhane (1995) one. A new test algorithm for the generalized procedure has no break rules and in general makes k tests associated with k nodes on the graph of subset intersection hypotheses whose positions are dictated by the observations. Under the normality assumption, we suggest approximations for sharp critical values and for p-values. The performance of the procedure is illustrated by examples. © 1999 Elsevier Science B.V. All rights reserved.

MSC: 62J15

Keywords: Stepup tests; General test algorithm; Approximate critical values; Procedural p-values; Comparative powers

1. Background, introduction, and summary

Multiple test and multiple comparison procedures generalize statistical tests in that they test families of hypotheses rather than single hypotheses. For decades, the main tools for dealing with multiplicity included the Bonferroni, Tukey and Scheffé tests. Recently, however, there have appeared a number of testing procedures variously called

* Corresponding author.



E. Grechanovsky, I. Pinsker / Journal of Statistical Planning and Inference 82 (1999) 35–54

‘stepwise’, ‘stepdown’, ‘stagewise’, ‘sequentially-rejective’ and ‘stepup’. They are considerably more complex and do not easily fit into a single theory. Usually, they are put into one of two categories, ‘stepdown’ and ‘stepup’, although it is not always clear whether the distinction lies at the conceptual, algorithmic, or purely computational level. The theory of stepdown procedures is well developed by now, while no comparable theory exists for stepup procedures, and each of them appears to be ad hoc.

In this work, we consider testing free-combinations families of hypotheses, i.e. hypotheses without logical interrelations. For such families, there exist at present four major stepup procedures: by Hochberg (1988), Rom (1990), Dunnett and Tamhane (1992) (DT92), and Dunnett and Tamhane (1995) (DT95). The simplest one is Hochberg's (1988). It orders the p-values p_(1) ≥ ⋯ ≥ p_(k) with corresponding hypotheses H_(1), …, H_(k), and tests the H_(i)'s sequentially starting from H_(1): if p_(i) > α/i, H_(i) is accepted, and the procedure goes on to test H_(i+1); if p_(i) < α/i, H_(i) is rejected, and all H_(j), j > i, are rejected without further testing. Simes's (1986) theorem for independent p-values is used to guarantee control of the familywise error rate (FWE). The same scheme is employed by Rom (1990) and by DT92, the difference being improvements in sharpness of the critical values, achieved at the price of increasing complexity of the computations. Being perfectly valid when the test statistics T_i related to the H_i are equicorrelated, this scheme runs into difficulties in more complex situations, for instance, when the statistics T_i are normal and non-equicorrelated (e.g., comparing treatments with a control in an imbalanced one-way design) and sharp critical values are desired. An attempt by DT95 to apply the scheme to such cases failed, since their procedure does not control the FWE (see Section 3.2 below).
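Hochberg's scheme described above is simple enough to sketch directly. The following fragment is a minimal illustration (not the authors' code), taking a vector of raw p-values as input:

```python
# Sketch of Hochberg's (1988) stepup scheme: order p-values decreasingly,
# accept H(i) while p(i) > alpha/i, and reject everything from the first
# crossing onward. Boundary ties (p(i) == alpha/i) have probability zero
# for continuous statistics; here they are treated as rejections.

def hochberg(pvalues, alpha=0.05):
    """Return a list of booleans: True = reject the corresponding hypothesis."""
    k = len(pvalues)
    # indices sorted by decreasing p-value: p(1) >= ... >= p(k)
    order = sorted(range(k), key=lambda i: pvalues[i], reverse=True)
    reject = [False] * k
    for step, idx in enumerate(order, start=1):
        if pvalues[idx] <= alpha / step:
            # reject H(step) and all hypotheses with smaller p-values, then stop
            for j in order[step - 1:]:
                reject[j] = True
            break
    return reject

print(hochberg([0.30, 0.02, 0.01], alpha=0.05))  # → [False, True, True]
```

In the example, 0.30 > 0.05 is accepted at the first step, but 0.02 ≤ 0.05/2 triggers rejection of the two remaining hypotheses without further testing.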
We will develop a new approach to the construction of stepup procedures that produces the procedures of Hochberg (1988), Rom (1990), and DT92 under appropriate assumptions, while enabling us to deal with more general situations as well. To begin, we find it convenient to picture a family of subset intersection hypotheses H_I = ∩_{i∈I} H_i as a graph (see an example in Fig. 1) in which intersection hypotheses are represented by nodes, implications by directed arcs, the overall hypothesis is at the top, and the original hypotheses (minihypotheses) sit at the bottom floor (Sonnemann, 1982). We define the grand critical path C as the sequence of nodes (I_k, …, I_r, …, I_1) going from the overall hypothesis down to the node I_(1) = {H_(1)}, consisting of the nodes I_r = {H_(1), H_(2), …, H_(r)}, r = k, k − 1, …, 1. A usual stepdown procedure, e.g. Holm's (1979), tests the grand critical path in a top-to-bottom manner, rejecting nodes one by one, and stops at the first acceptance, from which point everything downwards is accepted. In the same vein, the stepup procedure of Hochberg (1988) may be viewed as testing the grand critical path from below upwards, accepting nodes one by one and terminating at the first rejection, from which point everything upwards is rejected. Rom's (1990) and DT92's procedures may be similarly interpreted, albeit with different critical values. As noted above, this reversal of the stepdown test scheme does not seem to work properly when applied to more general cases. Incidentally, this point has been overlooked in a recent paper by Liu (1996).


Fig. 1. A graph for a free combinations family for k = 4.

Our solution to the problem consists of three parts. First, we construct new nominal tests ψ_I, and define a stepup procedure as a coherent procedure (Gabriel, 1969) with nominal tests ψ_I (Section 2.2). (For a node I, the test ψ_I rejects when at least one test φ_J on the "small" critical path of the node I rejects.) Second, we develop a general test algorithm equivalent to the thus defined stepup procedure (Section 3.3). Third, we suggest new approximations for critical values (Section 4) and a new algorithm for calculating p-values (Section 5). Section 6 presents simulation results for comparative powers of stepdown and stepup procedures, and Section 7 outlines possible extensions. This paper is based on Grechanovsky and Pinsker (1996), where more details and examples can be found.

2. Stepup procedures for free-combinations families

2.1. Families of hypotheses and stepwise test procedures

We adopt the standard normal theory linear model setup, and consider the following basic example. Assume that we are interested in simultaneously testing a set of hypotheses

H_i: θ_i = 0,  i = 1, …, k,  (1)

against one-sided alternatives θ_i > 0, where the θ_i are parameters. Given a vector of observations Y, we have unbiased least-squares estimators θ̂_1, …, θ̂_k with corr(θ̂_i, θ̂_j) = ρ_ij and var(θ̂_i) = σ², and an unbiased independent estimator S² of σ² such that S²/σ² has a (scaled) χ² distribution. Simultaneous (or multiple) testing implies here that the familywise error rate (FWE) is strongly controlled: the probability of rejecting any true hypothesis does not exceed α. One way to satisfy this requirement is to use the


Bonferroni procedure. Another way is to use a single-step α-exhaustive procedure which rejects any H_i if T_i > c, where T_i = θ̂_i/S and the critical value c satisfies the equation Pr_H{max_{1≤i≤k} T_i ≤ c} = 1 − α with H = ∩_{i=1}^k H_i. It is well known, however, that one can be better off using stepwise procedures rather than single-step ones.

To talk about stepwise procedures, we need some notation (mostly adopted from Sonnemann, 1982; Hochberg and Tamhane, 1987; Royen, 1989). We will call the H_i in (1) the minimal hypotheses or minihypotheses, and denote their family by H_min = {H_i: i ∈ I_min}. Often small letters will be used for minihypotheses: h_i, etc. We assume here that the family H_min is a free-combinations family; this means that for any I ⊆ I_min, the set of parameters for which all H_i, i ∈ I, are true and all H_j, j ∉ I, are false is nonempty (Holm, 1979; HT, p. 55). H_min constitutes a subfamily of a larger family of subset intersection hypotheses

H = {H_I: I ∈ I},  (2)

where

H_I = ∩_{i∈I} H_i,  I = {i_1, …, i_r} ⊆ I_min.  (3)

We assume that the family H contains all intersection hypotheses in (3); in general such families are called closed. A family of logically related hypotheses as in (2) can be conveniently pictured as a directed acyclic graph G in which hypotheses are represented by nodes, implications by directed arcs or arrows, the overall hypothesis H_O = ∩_{i∈I_min} H_i by the root node, and the minihypotheses sit at the bottom floor (Fig. 1 shows, for k = 4, all nodes and a few arrows by thin lines). We use H_I and I interchangeably; in particular, if H_I implies H_J, i.e. H_I ⊆ H_J, we write I ⇒ J (e.g. in Fig. 1, I_123 ⇒ I_13, I_12 ⇒ I_1, etc.). A multiple test procedure P maps a vector of observations Y into the set of accept/reject decisions for all H_I (often only decisions for H_min are of interest). Assume that with each hypothesis H_I in (3) there is associated a nominal test function φ_I(Y). In view of the coherence property (nominal acceptance of a hypothesis implies its procedural acceptance and also procedural acceptance of all the hypotheses implied by it), any procedure P is unambiguously defined by a nominal test family {H_I, φ_I: I ∈ I}, as all its procedural tests are functions of the nominal tests. Thus, coherent stepwise procedures embrace a two-component structure: the graph of hypotheses and their nominal tests.

The most well-developed class of coherent procedures is stepdown procedures (SD procedures), which are traditionally described via the stepdown testing scheme (cf. Hochberg and Tamhane, 1987, p. 53): a procedure P begins by testing the overall hypothesis and steps down through the hierarchy of implied hypotheses; if any H_I is test-accepted, all its implied hypotheses are procedurally accepted. However, this description runs together the definition of SD procedures per se, which is independent of the order of testing (Grechanovsky, 1993, Lemma 3.1), and a test algorithm, which does not belong to the procedure proper and can be chosen, say, on intuitive or computational grounds.


Apparently new animals began to appear in the literature from the Welsch (1977) paper on, under the name of stepup (SU) procedures (Hochberg, 1988; Rom, 1990; DT92). They ostensibly reverse the stepdown testing scheme. As Welsch (1977) put it: "The new (stepup) tests begin by examining the gaps between adjacent ordered sample means, then the three stretches, etc., until the range is reached. This reverses a procedure (stepdown) proposed by Ryan". Hochberg and Tamhane (1987, p. 53) divide stepwise procedures into two types, stepdown and stepup, and note the absence of a theory for SU procedures. A number of procedures which have appeared since then, although variously called, fit the description above (Hochberg, 1988; Rom, 1990; DT92). In the absence of a theory for SU procedures, however, there seem to be no easy answers to the following questions: Is there a general scheme for SU procedures like that above for SD ones? Are they substantially different, and if so, can they be grouped into a separate category? Our work suggests answers to these questions in the framework of free-combinations families, which hopefully can be extended to more complex situations.

2.2. Stepup tests and stepup procedures

To motivate our approach to SU procedures, consider a node I = {h_1, …, h_r} and the corresponding nominal (node) test φ_I. Order the observed values of the test statistics T_{i:I} related to I: T_{1:I} < T_{2:I} < ⋯ < T_{r:I}, with the corresponding minihypotheses h_{1:I}, h_{2:I}, …, h_{r:I}. Ties are ignored, as their probability is zero. T_{r:I} is called the I-critical statistic of the node I, and its observed value is denoted by t_I. In a usual SD procedure, φ_I = 1 if t_I > c_I, and φ_I = 0 otherwise, where the critical values are assumed to decrease downwards: (I ⇒ J) ⇒ (c_I ≥ c_J). In other words, the statistic T_{r:I} is used for making a reject/accept decision, and the tests φ_I are union–intersection (UI) tests.
We could try to improve our procedure by using at each I another test, better than φ_I. Consider the subgraph G_I implied by I: G_I = {J: I ⇒ J}. E.g. for a node I_123, its subgraph consists of the nodes I_123, I_12, I_13, I_23, I_1, I_2, I_3. Besides the node test φ_I, which rejects only when T_{r:I} exceeds c_I, consider a node test ψ_I which rejects when at least one T_{i:I} exceeds the critical values at all nodes J ∈ G_I in which it is the critical statistic, t_J = T_{i:I}. It transpires that the tests ψ_I, albeit non-UI tests, are often though not always better than the usual UI tests φ_I. We will show below that SU procedures are coherent procedures in which the tests ψ_I play the part of node tests.

To make the idea precise, let t_h be the observed value of a statistic T_h corresponding to a minihypothesis h, and R_{I,h} the subset of G_I for which t_h is the critical statistic: R_{I,h} = {J: J ∈ G_I, t_J = t_h}. First, we define component tests at a node I:

ψ_{I,h} = min_{J ∈ R_{I,h}} φ_J.  (4)

Second, we define a stepup node test at node I as

ψ_I = max_{h ∈ I} ψ_{I,h}.  (5)

Finally, a stepup procedure U is defined as a coherent procedure with node tests ψ_I.


It is easy to see that the min in formula (4) is attained at the node having s minihypotheses, where s is defined by h = h_{s:I}. This node has the maximal critical value on the set R_{I,h}; it will be called the maximal node for I and h. Define the critical path C_I for a node I as the sequence of nodes (I_r, I_{r−1}, …, I_1), where I_i = {h_{1:I}, …, h_{i:I}}, i = 1, …, r. If T_1 < T_3 < T_2, the critical path for the node I_123 is (I_123, I_13, I_1). Obviously, the critical path C_I is the set of maximal nodes for all h in I. Therefore, to obtain the decision of the test ψ_I in (5), it suffices to test the critical path C_I: ψ_I = max_{J ∈ C_I} φ_J.

A usual test algorithm in a SD procedure for a free-combinations family will test the grand critical path in top-to-bottom fashion until the first acceptance, and then will accept all the remaining hypotheses. We will call it the stepdown break (SDB) algorithm. Now a question might be asked: is it possible to obtain a test algorithm for a SU procedure by 'reversing' the SDB algorithm? This idea has been developed by Hochberg (1988), Rom (1990), DT92, and, for testing contrasts, by Welsch (1977). In Section 3.1 we prove that such a stepup break (SUB) algorithm can always be used in the 'symmetric' case when critical values are determined by node cardinalities. However, in the general 'non-symmetric' case (for instance, for imbalanced designs) it fails; in Section 3.2 we give relevant counterexamples, and in Section 3.3 we develop a general stepup (GSU) algorithm for 'non-symmetric' cases.

Now we introduce two major approaches to constructing critical values for the tests ψ_I defined above. Recall that to construct a single test ψ_I, one must define critical values for all φ_J, J ∈ G_I. The first approach is to use the node tests φ_J of an existing SD procedure, say P, as the tests needed in ψ_I. We will call such a procedure an inverted procedure (relative to P) and will denote it by P^U. Note that though the tests φ_J are UI tests, the tests ψ_I are not UI tests.
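The critical path C_I defined above depends only on the ordering of the observed statistics, so it can be computed directly. A minimal sketch, with hypothetical integer labels for minihypotheses and a dictionary mapping each label to its observed statistic:

```python
# Sketch: the critical path C_I of a node I is the sequence of nested prefix
# sets I_i = {h_(1:I), ..., h_(i:I)} obtained by sorting the minihypotheses in
# I by their observed statistics, listed from the node itself downwards.
# Representation (an assumption of this sketch): a node is a set of labels,
# stats maps label -> observed value of T.

def critical_path(node, stats):
    ordered = sorted(node, key=lambda h: stats[h])   # h_(1:I), ..., h_(r:I)
    # path from I_r = I down to I_1 = {h_(1:I)}
    return [frozenset(ordered[:i]) for i in range(len(ordered), 0, -1)]

stats = {1: 0.5, 2: 2.1, 3: 1.3}                     # T1 < T3 < T2, as in the text
print(critical_path({1, 2, 3}, stats))
```

With T_1 < T_3 < T_2 this reproduces the path (I_123, I_13, I_1) from the example in the text.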
This approach was used by Hochberg (1988) in inverting Holm's (1979) SD procedure to obtain a new SU procedure. What about FWE control in P^U? If P controls the FWE, this does not guarantee per se that P^U will control the FWE. For instance, if P is α-exhaustive, then P^U will not control the FWE, and one will have to enlarge the critical values in ψ_I to make P^U control the FWE. If P is conservative, however, chances are that its conservatism is sufficient for providing FWE control for P^U (e.g. Hochberg's, 1988, procedure). The second approach is to begin from scratch and build the tests ψ_I inductively from the bottom upwards in such a way that the resulting tests ψ_I have α-levels. This approach has been used by Rom (1990), DT92, and also by Welsch (1977) for testing contrasts. Numerical computations may present considerable problems here; cf. Welsch (1977), DT92, and Section 4 below.

2.3. p-values in stepwise procedures

Westfall and Young (1989, 1993), DT92, and Wright (1992) introduced p-values into stepwise testing. The adjusted, or procedural, p-value of a particular hypothesis H_I in a multiple procedure P is the maximal overall (or familywise) significance level that results in acceptance of that particular hypothesis H_I.


In a SD procedure P that is equivalent to its SDB algorithm testing the grand critical path C = (I_k, …, I_r, …, I_1), where I_r = {h_{1:I}, …, h_{r:I}}, r = 1, …, k, the p-value of a minihypothesis h_(r) corresponding to T_(r) is computed as p_(r) = max{p_{I_k}, p_{I_k−1}, …, p_{I_r}}, where p_{I_i} = Pr_{I_i}{T_{i:I_i} > t_{I_i}}, i = k, k − 1, …, r.

In stepup procedures, the situation is more complicated. Here we have two problems: calculating the node p-values p_I (which are of course different from those in SD procedures) and the p-values of the minihypotheses proper. First consider p_I. When a SU procedure is built inductively (see the second approach in Section 2.2 above) so that the ψ_I's are α-exhaustive tests, the p-values p_I can be defined as follows. At the bottom floor, we have p_i = 1 − F_i(t_i), where the F_i are the distribution functions of the statistics T_i and the t_i are the observed values of the T_i. For a node I at the rth floor, assume by induction that the distribution functions F_J(·) of the tests ψ_J for all J ∈ G_I besides I have already been computed. Also assume that for a given α, all stepup critical values u_J, J ∈ G_I, except u_I have been computed. Then we can find u_I in the test ψ_I such that Pr{ψ_I = 1} = α, and the relation F_I(u_I) = 1 − α for 0 < α < 1 defines F_I(·). Although monotonicity of the functions F_I(·) has not been demonstrated analytically, it should not appear a serious concern. In particular, computations in DT92 and our own indicate that those functions are very close to the corresponding distribution functions of the stepdown tests φ_I and get even closer as k increases. Now, if t_(r) is the observed value of T_{r:I}, we have

p_I = 1 − F_I(t_(r)).  (6)

Using the p_I's, one can compute the p-values of the minihypotheses. In the equicorrelated case, this has been done by DT92 and is described in Section 3.1. The general case is treated in Section 5, after the GSU algorithm has been developed in Section 3.3.

3. Test algorithms in stepup procedures

3.1. Symmetric case

Some preliminary definitions are in order. If I = {h_{1:I}, …, h_{r:I}}, a node Ī is called the closure of the node I if Ī contains all the minihypotheses in I and also all minihypotheses h_i whose statistics T_i lie between those corresponding to the minihypotheses in I. If T_1 < T_3 < T_2 < T_4, then the closure of the node I_12 is I_123, Ī_12 = I_123, and the node I_23 is the closure of itself, Ī_23 = I_23. A procedure P is called consonant (Gabriel, 1969) if, whenever P rejects a non-minimal hypothesis H_I, it also rejects at least one of its proper components H_J. A procedure P is called a UI procedure when it rejects any subset intersection hypothesis H_I = ∩_{i∈I} H_i if and only if it rejects at least one of its proper components H_j, j ∈ I. A procedure P is marginally monotone if the ordering of the procedural p-values of the minihypotheses corresponds to the ordering of the T_(i).
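The closure operation defined above can be sketched in a few lines; the labels and statistics below are hypothetical, chosen to match the ordering T_1 < T_3 < T_2 < T_4 used in the text's example:

```python
# Sketch of the closure of a node I: add every minihypothesis whose observed
# statistic lies between the smallest and largest statistics of I.
# Representation (an assumption): nodes are sets of labels, stats maps
# label -> observed value of T.

def closure(node, stats):
    lo = min(stats[h] for h in node)
    hi = max(stats[h] for h in node)
    return frozenset(h for h in stats if lo <= stats[h] <= hi)

stats = {1: 0.5, 3: 1.3, 2: 2.1, 4: 3.0}      # T1 < T3 < T2 < T4
print(closure({1, 2}, stats))                  # closure of I_12 is I_123
print(closure({2, 3}, stats))                  # I_23 is its own closure
```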


Proposition 3.1. If U is a SU procedure for testing a free-combinations family in which the critical values in the UI tests φ_I depend only on node cardinalities and decrease downwards, then:
(i) a node I is rejected by U iff its closure Ī is rejected by U;
(ii) a node I is rejected by U iff the I-critical minihypothesis is rejected by U;
(iii) U is consonant and is a UI procedure;
(iv) if a node I ∈ C is φ-rejected, then it is rejected by U, and all the nodes of the same cardinality are rejected by U;
(v) testing in U may be done by the SUB algorithm using the tests φ_I on the grand critical path C: if a current node is φ-accepted, it is ψ-accepted and is labeled 'accepted' by U; if it is φ-rejected, all its implying nodes in C (itself included) are ψ-rejected and are labeled 'rejected' by U, whereupon the testing stops;
(vi) the stepup p-values of the minihypotheses h_(r) are calculated as

p_(r) = min_{1 ≤ i ≤ r} p_{I_i},  r = 1, …, k,  (7)

with I_i ∈ C. Thus, U is marginally monotone.

The assertions (i)–(v) are implied by Theorems 2.2 and 2.3 in Grechanovsky (1994), so we omit the proof. Assertion (vi) follows from (ii) and (iv).

Note 3.1. 1. When critical values are not determined by cardinalities, Proposition 3.1 does not apply, and in fact assertions (i), (ii), (iv)–(vi) are not valid (a counterexample is given in the next section). 2. If monotonicity of critical values does not hold, one can enlarge the critical values to restore monotonicity and obtain a more conservative procedure.

Example 3.1. If the T_i's are independent, p-values are used as test statistics, and the φ_I's are Bonferroni tests having level α, we obtain Hochberg's (1988) SU procedure. Procedural stepup p-values of the minihypotheses are calculated from (7).

Example 3.2. Under the same assumptions, if the tests ψ_I are α-exhaustive as in Section 2.3, we obtain Rom's (1990) procedure.

Example 3.3. Under the normal setup in the equicorrelated case when the ψ_I are α-exhaustive tests, we obtain DT92's procedure. The p-values p_{I_r} and the p-values of the minihypotheses in (7) are identical to the p-values p̃⁰_(r) and p̃_(r), respectively, in DT92 (next-to-last equation in Section 5 with r = m). In the general 'non-symmetric' case, however, a p-value p_I in (6) is determined by F_I(·), which depends on all F_J(·), J ∈ G_I, so that here the p-values p_{I_r} are not identical to p⁰_(r), and the procedural p-values of the minihypotheses are not identical to the p_r in DT95, p. 224, Eq. (9) with r = m.
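For the Bonferroni-based setting of Example 3.1, the node p-value on the grand critical path at the node of cardinality i is min(1, i·p_(i)), so formula (7) reduces to a running minimum. A minimal sketch under that independence assumption (not general code for correlated statistics):

```python
# Formula (7): the stepup p-value of h_(r) is the minimum of the node
# p-values p_{I_i}, i <= r, along the grand critical path. With Bonferroni
# node tests (Example 3.1), p_{I_i} = min(1, i * p_(i)), where the raw
# p-values are ordered decreasingly: p_(1) >= ... >= p_(k).

def stepup_adjusted(pvalues):
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * len(pvalues)
    running_min = 1.0
    for step, idx in enumerate(order, start=1):
        node_p = min(1.0, step * pvalues[idx])   # Bonferroni node p-value
        running_min = min(running_min, node_p)   # formula (7)
        adjusted[idx] = running_min
    return adjusted

print(stepup_adjusted([0.30, 0.02, 0.01]))
```

Because the running minimum can only decrease along the path, the adjusted p-values are ordered like the raw ones, illustrating the marginal monotonicity asserted in (vi).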


Fig. 2. Example of non-equivalence of SU procedure and SUB algorithm.

3.2. Nonsymmetric case: the stepup break algorithm is not valid

All existing SU procedures, except that in DT95, are particular cases of the 'symmetric' case covered in Section 3.1 and use the SUB algorithm. As pointed out in Section 2.2, in the general non-symmetric case, when critical values are not determined by node cardinalities, the SUB algorithm is not equivalent to the SU procedure defined there. Here is the promised counterexample for k = 3 (Fig. 2).

Example 3.4. Under the normal setup, let the covariances be cov(T_1, T_2) > 0, cov(T_1, T_3) = 0, cov(T_2, T_3) = 0. The critical values and the observed values of the test statistics can be set to satisfy

T_1 < c_1 = c_2 = c_3 < c_12 < T_2 < T_3 < c_13, c_23 < c_123,

where c_{ij…} is the critical value at the node I = {ij…}. In Fig. 2, the first sign, + (acceptance) or − (rejection), to the right of the node circles shows the decision of φ_I, and the second sign the corresponding decision of ψ_I. Procedural acceptance is shown by hollow circles, procedural rejection by filled circles. The SU procedure's decisions for the minihypotheses are: accept h_1, reject h_2, accept h_3 (by coherence from node (13)). Now consider the SUB algorithm's performance. The grand critical path is ((123), (12), (1)). The SUB algorithm accepts h_1 at node (1), rejects h_2 at node (12), and rejects h_3 by the break rule without testing. The difference in the decisions stems from the fact that the SUB algorithm never tests the node (13) and bases its decision about it (rejection) on the test at node (12) (cf. (iv) in Proposition 3.1). Note that assertions (i), (ii), (iv)–(vi) of Proposition 3.1 are not valid here.

Now a question arises: could the SUB algorithm not be used after all, even though it does not correspond to the SU procedure as defined in Section 2.2? Example 3.5 below shows that it cannot be used in the general case, and Example 3.6 demonstrates that the SU procedure in DT95 based on it does not control the FWE.
Under the normal setup with a common variance, let U be a SU procedure with α-exhaustive tests ψ_I as defined in Section 2.2, U_B the procedure based on the SUB algorithm, and U_DT the procedure


in DT95. For k = 2, U_B is obviously equivalent to U, and under the null configuration θ_1 = θ_2 = 0 we have

Pr{FWE in U_B} = Pr{FWE in U} = Pr{reject h_1 or reject h_2}
= Pr{T_(1) > c_1 or T_(2) > c_12} = Pr{ψ_12 = 1} = α.  (8)

The following example shows that U_B does not control the FWE.

Example 3.5. For k = 3, let the test statistics T = (T_1, T_2, T_3)′ have the degenerate normal distribution T ∼ N_3(μ, Σ) with μ = (0, δ, 0)′, var(T_1) = var(T_2) = var(T_3) = 1, cov(T_1, T_2) = 1, and cov(T_1, T_3) = cov(T_2, T_3) = 0, which implies

T_2 = T_1 + δ.  (9)

Obviously, the α-exhaustive critical values satisfy the following equation:

c_1 = c_2 = c_3 = c_12 < c_13 = c_23 = c_123.  (10)

For any δ ≠ 0, the minihypothesis h_2 is false. First consider δ = +∞. Since here T_2 = ∞ and h_2 is always rejected, the problem reduces to a two-dimensional one, and by Eq. (8) applied to (T_1, T_3) we have

α = Pr_{δ=∞}{FWE in U_B} = Pr_{δ=∞}{T_(1) > c_1} + Pr_{δ=∞}{T_(1) < c_1 < c_13 < T_(2)}
= Pr_{δ=∞}{c_1 < T_1 < T_3} + Pr_{δ=∞}{c_1 = c_3 < T_3 < T_1}
+ Pr_{δ=∞}{T_1 < c_1 < c_13 < T_3} + Pr_{δ=∞}{T_3 < c_1 = c_3 < c_13 < T_1}.  (11)

Now consider 0 < δ < ∞. By using (9), (10) and the properties of U_B, we have

Pr_δ{FWE in U_B} = Pr_δ{c_1 < T_1 < T_3} + Pr_δ{c_1 = c_3 < T_3 < T_1}
+ Pr_δ{T_1 < c_1 < c_13 = c_123 < T_3} + Pr_δ{T_3 < c_1 = c_3 < c_13 < T_1}
+ Pr_δ{T_1 < c_1 = c_12 < T_2 < T_3 < c_13 = c_123}.  (12)

The first four terms in (12) do not depend on δ and are identical to the corresponding terms in (11), while the last term is positive. Therefore

α = Pr_{δ=∞}{FWE in U_B} < Pr_δ{FWE in U_B},


which implies that U_B does not control the FWE. Note the difference between U and U_B: the offending last term in (12) does not appear in Pr_δ{FWE in U}. By a continuity argument, the conclusion still holds for cov(T_1, T_2) less than but close to 1. The next example shows that the procedure U_DT in DT95 does not control the FWE.

Example 3.6. Consider the setup in Example 3.5. The critical values in U_B and U_DT are identical: at the first and second floors they are always identical, and at the top

floor they are identical due to condition (9). As the test algorithms in U_B and U_DT are identical (if U_DT is interpreted as testing the graph), U_DT is identical to U_B and does not control the FWE. By a continuity argument, U_DT does not control the FWE for cov(T_1, T_2) less than but close to 1.

3.3. General stepup algorithm

Now we describe a GSU algorithm to be used in the general non-symmetric case and prove its equivalence to the SU procedure defined in Section 2.2. This algorithm will test the nodes on a general test path of the free-combinations graph G. Let G_1 = G.

Step 1: Here the maximal node L_1 ∈ G_1 for h_(1) is L_1 = I_(1). If T_(1) > u_{L_1}, where u_{L_1} = d_{I_(1)} (the critical value in the SD procedure at I_(1)), then: (1) reject h_(1); (2) remove (reject) from the graph G_1 all nodes containing the minihypothesis h_(1), and denote the resulting graph by G_2. Otherwise: (1) accept h_(1); (2) set G_2 = G_1.

Step r (2 ≤ r ≤ k): The maximal node L_r ∈ G_r for h_(r) is L_r = {h_{i_1}, …, h_{i_s}, h_(r)}, where h_{i_1}, …, h_{i_s}, 0 ≤ s ≤ r − 1, are the minihypotheses accepted prior to step r. If T_(r) > u_{L_r}, where u_{L_r} is the stepup critical value of Section 2.3, then: (1) reject h_(r); (2) remove (reject) from the graph G_r all nodes containing the minihypothesis h_(r), and denote the resulting graph by G_{r+1}. Otherwise: (1) accept h_(r); (2) set G_{r+1} = G_r. If r = k, then stop; otherwise go to the next step.

After the GSU algorithm stops, all the nodes that have not been labelled 'rejected' are labelled 'accepted'. The 'p-value form' of the test at step r is p_{L_r} < α, where the p-value p_{L_r} is defined in (6). A numerical illustration is given at the end of Section 5.

The rule of total rejection (the break rule at the first step): assuming the critical values at the first floor are identical, u_1 = u_2 = ⋯ = u_k, if the GSU algorithm has rejected at the first step, it will reject everything. This rule should be incorporated into the GSU algorithm.
Except in cases when everything is rejected after a rejection at the first step, the GSU algorithm will test the general test path (L_1, L_2, …, L_k) consisting of maximal nodes. In general, it begins by testing the grand critical path of the graph G from bottom to top and tests it until the first rejection; then it removes the nodes containing the current minihypothesis, and continues testing the grand critical path of the reduced graph until


the next rejection, etc. Thus, the general test path consists of pieces of the grand critical paths of a sequence of steadily shrinking graphs, where a rejection signals a jump to the next smaller subgraph.

Lemma 3.1. The decisions about all the hypotheses made by the GSU algorithm are identical to the decisions of the SU procedure of Section 2.2.

The proof is given in the appendix. It follows that U is consonant, and since any consonant stepwise procedure is a UI procedure (Gabriel, 1969; Grechanovsky, 1993, Lemma 2.2), assertion (iii) of Proposition 3.1 is valid in the general case. If critical values depend on node cardinalities, then under an appropriate selection of critical values in the SU procedure, the GSU algorithm becomes equivalent to the algorithms of Hochberg (1988), Rom (1990), or DT92. In Section 3.2 above we showed that the procedure U_B based on the SUB algorithm may be liberal. We can obtain a simple conservative version of U if we use the node tests of U but make the algorithm test all the nodes on the grand critical path from bottom to top.
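The stepwise loop above can be sketched structurally. The critical-value function u(node) below is a hypothetical placeholder for the stepup values of Section 4, and the cardinality-based rule in the toy call is an assumption, so this illustrates only the control flow (maximal nodes built from previously accepted minihypotheses, plus the first-step break rule), not the numerical procedure:

```python
# Structural sketch of the GSU algorithm: test h_(1), ..., h_(k) in increasing
# order of their statistics; at step r, the maximal node L_r consists of the
# previously accepted minihypotheses plus h_(r). u(node) stands in for the
# stepup critical value at that node (hypothetical placeholder).

def gsu(stats, u, total_rejection_rule=True):
    order = sorted(stats, key=lambda h: stats[h])   # h_(1), ..., h_(k)
    accepted, rejected = [], []
    for r, h in enumerate(order):
        node = frozenset(accepted + [h])            # maximal node L_r
        if stats[h] > u(node):
            rejected.append(h)
            if total_rejection_rule and r == 0:
                # break rule: a rejection at the first step rejects everything
                return [], order
        else:
            accepted.append(h)
    return accepted, rejected

# Toy illustration with a cardinality-based critical-value rule (assumption):
u = lambda node: {1: 1.64, 2: 1.96, 3: 2.10}[len(node)]
print(gsu({'h1': 0.5, 'h2': 2.0, 'h3': 2.3}, u))
```

In the toy run, h1 is accepted at its singleton node, h2 is rejected at {h1, h2}, and h3 is then tested at the maximal node {h1, h3} of the reduced graph rather than skipped by a break rule, which is exactly the behavior that distinguishes the GSU algorithm from the SUB algorithm.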

4. Computation of critical values

In this section we consider possible approaches to the computation of the stepup critical values defined in Section 2.3. For the case of independent, possibly non-identically distributed observations, a somewhat conservative approach has been given by Hochberg (1988), and a slightly better version by Rom (1990). Now consider the normal setup with the additional assumption that the variance σ² is known. Without loss of generality, this assumption allows considerable simplifications in presentation and in computations. When corr(T_i, T_j) = ρ, DT92 proposed a method for computing stepup critical values by iteratively solving the system of equations

Pr{(T_(1), T_(2), …, T_(r)) ≤ (u_1, u_2, …, u_r)} = 1 − α  (13)

for r = 1, …, k.

Now we turn to a more general normal setup and consider the case when the covariance matrix is a λ-matrix: T = (T_1, T_2, …, T_k)′ ∼ N_k(O, Σ), where var(T_i) = σ_i², cov(T_i, T_j) = σ_i σ_j λ_i λ_j, −1 < λ_i < 1, i = 1, …, k. Here the DT92 method cannot be utilized, since nodes sitting at a common floor have different correlation matrices and hence different stepup critical values. Of the two major ways around the difficulty, Monte Carlo and approximations, we will develop approximations here. For a node I = {h_1, …, h_r}, an approximate critical value u_I = u_I(λ) can be computed as

u_I(λ) ≈ d_I(λ) + u_I(λ̄) − d_I(λ̄),  (14)


where d_I(λ) is the usual critical value in the test of node I associated with our λ-matrix, u_I(λ̄) is the stepup critical value for the equicorrelated matrix obtained by averaging the correlations, and d_I(λ̄) is the usual critical value computed for the averaged correlations. Now d_I(λ) and d_I(λ̄) can be computed by standard methods, and u_I(λ̄) by iteratively solving (13) as in DT92. Since the algorithm in DT92 might be too time-consuming, we can replace system (13) by the following system of equations:

Pr{T_(1) ≤ d_1} = 1 − α,
Pr{T_(2) ≤ d_2} = 1 − α,
...
Pr{T_(r−1) ≤ d_{r−1}} = 1 − α,
Pr{(T_(1), T_(2), ..., T_(r−1), T_(r)) ≤ (d_1, d_2, ..., d_{r−1}, u_I)} = 1 − α.   (15)

Contrary to system (13), this system can be solved without iterations, by first solving the first r − 1 equations and then substituting the obtained d_1, ..., d_{r−1} into the last equation. Since d_1 = u_1, d_2 < u_2, ..., d_{r−1} < u_{r−1}, the resulting value of u_I will be slightly bigger, i.e. conservative. Our experience shows that the error in u_I resulting from computational inaccuracies in the previous critical values is negligible, which validates approximation (15) and also (19) in the next section.

Example 4.1. For α = 0.05, k = 4, λ_1 = 0, λ_2 = λ_3 = √0.5, λ_4 = √0.9, the stepdown and stepup critical values are as follows.
Bottom floor: d_1 = d_2 = d_3 = d_4 = u_1 = u_2 = u_3 = u_4 = 1.6449.
Second floor: d_12 = 1.9545, u_12 = 1.9600; d_13 = 1.9545, u_13 = 1.9600; d_14 = 1.9545, u_14 = 1.9600; d_23 = 1.9163, u_23 = 1.9330; d_24 = 1.8846, u_24 = 1.9053; d_34 = 1.8846, u_34 = 1.9053.
Third floor: d_123 = 2.1009, u_123 = 2.1046; d_124 = 2.0832, u_124 = 2.0876; d_134 = 2.0832, u_134 = 2.0876; d_234 = 2.0279, u_234 = 2.0385.
Top floor: d_1234 = 2.1746, u_1234 = 2.1775.

Some checks of the accuracy of the proposed approximations have been carried out. First, we computed upper and lower bounds on the critical value u_I. Note that for a given α, the bigger the values u_J, J ∈ G_I, J ≠ I, the smaller the corresponding u_I. So the use of upper bounds on all u_J, J ∈ G_I, J ≠ I, allows us to calculate a lower bound on u_I, and vice versa. The bounds can be calculated recursively from bottom to top. Assuming that for all J ∈ G_I, J ≠ I, the upper bounds on u_J have been computed, we set u_i, i = 1, ..., r − 1, in Eq. (13) equal to the maximum of those bounds at the respective floor and then calculate the lower bound u_l on u_I by solving (13) as described above. Similar calculations produce the upper bound u_r on u_I.
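The non-iterative solution of system (15) can be sketched as follows. Since the joint distribution of the order statistics has no convenient closed form for a general correlation matrix, the probability in the last equation is estimated here by Monte Carlo with common random numbers (the text names Monte Carlo as one of the two ways around the difficulty); `solve_last_equation` is an illustrative name, and the stepdown values d_1, ..., d_{r−1} are assumed to have been computed already by standard methods.

```python
# Sketch: solve the last equation of system (15) for u_I by Monte Carlo.
# Common random numbers make g(u) a deterministic, monotone step function,
# so a simple root bracket suffices. Names and defaults are illustrative.
import numpy as np
from scipy.optimize import brentq

def solve_last_equation(R, d, alpha, n_mc=200_000, seed=1):
    """Solve Pr{T_(1)<=d_1,...,T_(r-1)<=d_{r-1}, T_(r)<=u} = 1 - alpha for u,
    where T ~ N(0, R) and d holds the stepdown values d_1..d_{r-1}."""
    r = R.shape[0]
    rng = np.random.default_rng(seed)
    T = rng.multivariate_normal(np.zeros(r), R, size=n_mc)
    T.sort(axis=1)                         # each row: order statistics T_(1)..T_(r)
    ok = (T[:, :r - 1] <= np.asarray(d)).all(axis=1)
    g = lambda u: np.mean(ok & (T[:, -1] <= u)) - (1.0 - alpha)
    return brentq(g, 0.0, 6.0)             # root of the last equation of (15)

# Check against a case with a known answer: for r = 2 independent statistics
# and d_1 = 1.6449, the exact solution is the 0.975-quantile, about 1.9600
# (this agrees with u_12 in Example 4.1, where rho_12 = lambda_1*lambda_2 = 0).
u2 = solve_last_equation(np.eye(2), d=[1.6449], alpha=0.05)
```

The independent check follows from Φ(u)² − (Φ(u) − 0.95)² = 0.95, which gives Φ(u) = 0.975; the Monte Carlo answer matches it to a few thousandths at this sample size.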
Our simulations for α = 0.05, 0 ≤ λ_i ≤ √0.5, and k = 3, 4, 5 showed that in all cases considered we had 0 < u_r − u_l < 2 × 10⁻³. Furthermore, for a number of λ-matrices with λ_i's ranging from 0 to √0.5, and k = 3, 4, 5, we computed the stepup critical values by approximation (14) and found them lying between u_l and u_r. When k increases, the difference u_I(λ̄) − d_I(λ̄) decreases (as can be seen, for example, from the tables in DT92), and d_I(λ) can be used as an approximate value of u_I(λ). As to the conservative approximation (15), we


calculated the differences between the two approximations for α = 0.05 under the assumption of equicorrelation, for a few ρ's between 0 and 0.5, and found that they do not exceed 10⁻³. As a result, the conservative approximation appears suitable for computations, at least in the indicated range of parameters.

5. p-values of minihypotheses

DT92 gave a method of computation of the p-values of minihypotheses for the equicorrelated case under the normal setup. Their method is closely connected to the SUB algorithm, which is not valid in the non-equicorrelated case (Section 3.2 above). For the same reason we cannot use the p-values in DT95. Below we suggest a new method of calculating p-values for the general case under the normal setup. Computation of the p_(r)'s is based on the GSU algorithm in Section 3.3 and on the following two lemmas.

Lemma 5.1. (i) p_(1) = p_{I_(1)}. (ii) Assume that for some r, 1 < r ≤ k, there have been computed the p-values

p_(1), p_(2), ..., p_(r−1);   (16)

then the p-value p_(r) is computed either as (a) p_(r) = p_I, where I ∈ R_r with R_r = {I: Φ_I = t_(r)}, Φ_I denoting the node test statistic, or as (b) p_(r) = p_(j), where p_(j), 1 ≤ j ≤ r − 1, is one of the p-values in (16). (Here p_(1) and p_I are defined as in (6).)

The proof is given in the appendix.

Lemma 5.2. Assume that p_(r) ≠ p_(i) for r > 1 and i = 1, ..., r − 1. Then p_(r) = p_I if and only if, for α = p_I, the GSU algorithm arrives at node I at step r.

The proof is given in the appendix. Although Lemmas 5.1 and 5.2 define the set of values which can be taken by the p-values, they do not give an explicit formula for computing p_(r) at step r; the algorithm described below therefore searches at each step the set of feasible values and checks which of them is the actual value of p_(r).

General method of computation of p-values. For r = 1, p_(1) = p_{I_(1)}, where I_(1) = {h_(1)}. For r > 1, assume that the p-values

p_(i), i = 1, ..., r − 1,   (17)

have already been computed. By Lemma 5.1, it suffices: (i) to check whether one of the p_(i)'s in (17) can be used as p_(r) (there may be fewer than r − 1 different values in (17) to check); (ii) if no p_(i) in (17) is acceptable, to consider all p_I's with I ∈ R_r. By Lemma 5.2, a node I provides the required p-value p_(r) if, for α = p_I, the GSU algorithm arrives at I at step r.


Algorithm for computation of p-values

Step 1. For r = 1, p_(1) = p_{I_(1)}.
Main step. For r > 1, do the following:
1. For I = {h_(1), ..., h_(r)} do:
   1.1. Set α = p_I;
   1.2. Run the GSU algorithm up to step r; if node I is reached at step r, then set p_(r) = p_I and go to the next r; otherwise, go to 2.
2. For i = r − 1, r − 2, ..., 1 do:
   2.1. If p_(i) = p_(j) where i < j ≤ r − 1, then decrease i;
   2.2. Set α⁻ = p_(i) − ε and α⁺ = p_(i) + ε, where ε = 10⁻⁴;
   2.3. If the GSU algorithm accepts h_(r) for α⁻ and rejects it for α⁺, then set p_(r) = p_(i) and go to the next r; otherwise, go to 3.
3. For all I ∈ R_r except I = {h_(1), ..., h_(r)} do:
   3.1. Set α = p_I;
   3.2. Run the GSU algorithm up to step r; if node I is reached at step r, then set p_(r) = p_I and go to the next r; otherwise, try another node I ∈ R_r.

It can be verified that in the equicorrelated case, the algorithm computes the p-values in DT92, p. 168 (Grechanovsky and Pinsker, 1996). In general, the p-values may not be marginally monotone (cf. the examples below). For calculating the required p-values p_I when the correlation matrix is a λ-matrix, we suggest an approximation similar to (14):

p_I(λ) = p_I^0(λ) + p_I(λ̄) − p_I^0(λ̄).   (18)

Here p_I^0(λ) is the usual p-value in the test of node I, p_I(λ̄) is the stepup p-value computed for the averaged correlations, and p_I^0(λ̄) is the usual p-value for the averaged correlations. The stepup p-value p_I(λ̄) can be computed either by the iterative algorithm in DT92, or from the following simplified system:

Pr{T_(1) ≤ d_1} = 1 − p_I^0,
Pr{T_(2) ≤ d_2} = 1 − p_I^0,
...
Pr{T_(r−1) ≤ d_{r−1}} = 1 − p_I^0,
Pr{T_(r) ≤ t_(r)} = 1 − p_I^0,
Pr{(T_(1), T_(2), ..., T_(r−1), T_(r)) ≤ (d_1, d_2, ..., d_{r−1}, t_(r))} = 1 − p_I.   (19)

Now we give examples of the computation of node p-values and procedural p-values of minihypotheses.

Example 5.1. Dimension: k = 3. Correlation matrix: λ_1 = λ_2 = √0.5, λ_3 = √0.3. Observed values: t_1 = 1.63, t_2 = 1.93, t_3 = 1.931. Node p-values: p_1 = 0.0515, p_2 = 0.0268,


p_3 = 0.0267, p_12 = 0.0500, p_13 = 0.0513, p_23 = 0.0513, p_123 = 0.0707. p-values of minihypotheses: p_(1) = p_1 = 0.0515, p_(2) = p_12 = 0.0500, p_(3) = p_13 = 0.0513.

Example 5.2. Dimension: k = 4. Correlation matrix: λ_1 = λ_2 = λ_3 = √0.5, λ_4 = 0. Observed values: t_1 = 1.63, t_2 = 1.91, t_3 = 2.1, t_4 = 2.101. Node p-values: p_1 = 0.0515, p_2 = 0.0280, p_3 = 0.0178, p_4 = 0.0178, p_12 = 0.0523, p_13 = 0.0337, p_14 = 0.0356, p_23 = 0.0337, p_24 = 0.0356, p_34 = 0.0356, p_123 = 0.0467, p_124 = 0.0506, p_134 = 0.0506, p_234 = 0.0506, p_1234 = 0.0635. p-values of minihypotheses: p_(1) = p_1 = 0.0515, p_(2) = p_1 = 0.0515, p_(3) = p_123 = 0.0467, p_(4) = p_124 = 0.0506.

Now we describe the performance of the GSU algorithm for the data in Example 5.2 above.

Example 5.3. For α = 0.05, the GSU algorithm tests the following nodes on the general test path: GTP = ({1}⁺, {1,2}⁺, {1,2,3}⁻, {1,2,4}⁺). At node {1} there is acceptance (+), and the GSU algorithm goes to node {1,2}, where there is acceptance too; the GSU algorithm then goes to node {1,2,3}, where there is rejection (−), and moves to node {1,2,4} on the same floor, where it accepts. See Fig. 1, where the general test path is shown by a thick line. It follows that the GSU algorithm accepts minihypotheses h_(1), h_(2) and h_(4) while rejecting h_(3). Such a decision could not be obtained in an equicorrelated case, as the SUB algorithm would have stopped testing after rejecting h_(3) and rejected h_(4) automatically. For another α, the general test path may be different. If the procedural p-values of minihypotheses have been computed, all the decisions can be obtained by comparing them to α, without explicit construction of the general test path.
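As an illustration of one ingredient of approximation (18), the usual p-value p_I^0(λ) = Pr{max_i T_i > t_(r)} for a node can be computed directly from the multivariate normal CDF once the λ-matrix correlations ρ_ij = λ_i λ_j are built. The sketch below uses scipy; `lambda_matrix` and `usual_p` are illustrative names, and the independent case serves as a sanity check, since there p^0 = 1 − Φ(t)^r exactly.

```python
# Sketch: the usual (single-step) node p-value p_I^0 under a lambda-matrix,
# via the multivariate normal CDF. Helper names are illustrative.
import numpy as np
from scipy.stats import multivariate_normal, norm

def lambda_matrix(lams):
    """Correlation matrix with product structure rho_ij = lam_i * lam_j."""
    R = np.outer(lams, lams)
    np.fill_diagonal(R, 1.0)
    return R

def usual_p(R, t):
    """p_I^0 = 1 - Pr{T_1 <= t, ..., T_r <= t} for T ~ N(0, R)."""
    r = R.shape[0]
    return 1.0 - multivariate_normal(mean=np.zeros(r), cov=R).cdf(np.full(r, t))

# Node {1,2,3} of Example 5.2: lambda_1 = lambda_2 = lambda_3 = sqrt(0.5),
# so all pairwise correlations equal 0.5; the maximal statistic is t_3 = 2.1.
R = lambda_matrix(np.sqrt([0.5, 0.5, 0.5]))
p0 = usual_p(R, 2.1)

# Sanity check in the independent case, where the answer is known in closed form:
assert abs(usual_p(np.eye(3), 1.96) - (1.0 - norm.cdf(1.96) ** 3)) < 1e-3
```

The remaining ingredients of (18), the stepup p-values for the averaged correlations, would then be obtained by the iterative DT92 algorithm or from system (19).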

6. Power

The general SU procedure was compared in terms of power with the SD procedure and also with two other versions of the SU procedure: the liberal one based on the SUB algorithm, and the conservative one described at the end of Section 3.3. In general, power in multiple testing is defined as the probability of rejecting at least t alternatives, with t ranging from 1 to the actual number of alternatives. The most useful definition appears to be related to rejection of all available alternatives. We looked at two versions of power: 'rejection of all alternatives' and 'rejection of at least one alternative'. In adopting these definitions of power we follow DT92 and are thus able to compare our results with theirs. We considered one-sided tests under the normal setup with various covariance λ-matrices. Since no closed formulas for power are available, we performed Monte Carlo simulations with 10,000 runs for a number of configurations of k, the λ's, and the alternatives. The results mainly agree with those in DT92, although the prevalence of the SU procedure over the SD procedure appears even more pronounced. Table 1 presents a summary of our calculations for the 'rejection of all alternatives' definition of power. For each combination of λ's and each number of alternatives, it


Table 1
Overview of comparative powers of SU and SD procedures (each entry gives wins of SD : draws : wins of SU)

                                       Number of alternatives
k   λ                                  1          2          3          4          5

3   (0, √0.5, √0.5)                    1:5:0      0:1:5      0:0:3
    (√0.1, √0.5, √0.5)                 1:5:0      0:3:3      0:0:3
    (√0.1, √0.3, √0.5)                 3:6:0      0:4:5      0:0:3
    (√0.3, √0.3, √0.5)                 3:3:0      0:4:2      0:0:3
    (√0.3, √0.5, √0.5)                 3:3:0      2:2:2      0:0:3
    (0.4, 0.9, 0.877)                  5:1:0      1:5:0      0:0:3
    Σ                                  16:23:0    3:19:17    0:0:18

4   (√0.1, √0.1, √0.1, √0.5)           0:6:0      0:3:3      0:1:5      0:0:3
    (√0.1, √0.1, √0.5, √0.5)           0:6:0      0:4:5      0:2:4      0:0:3
    (√0.1, √0.3, √0.3, √0.5)           0:9:0      0:8:4      0:2:4      0:0:3
    (√0.1, √0.5, √0.5, √0.5)           2:4:0      0:4:2      0:2:4      0:0:3
    (√0.1, √0.3, √0.5, √0.7)           0:6:0      1:5:3      0:3:6      0:0:3
    (√0.1, √0.1, √0.9, √0.9)           0:5:1      0:7:2      0:2:4      0:0:3
    Σ                                  2:36:1     1:31:19    0:12:27    0:0:18

5   (0, 0, √0.5, √0.5, √0.5)           0:6:0      0:2:4      0:0:6      0:1:5      0:0:3
    (√0.1, √0.1, √0.3, √0.3, √0.3)     0:6:0      0:5:1      0:0:6      0:0:6      0:0:3
    (√0.1, √0.1, √0.5, √0.5, √0.5)     0:6:0      0:5:1      0:0:6      0:0:6      0:0:3
    (√0.1, √0.3, √0.5, √0.7, √0.9)     2:13:0     0:4:5      0:4:2      0:3:6      0:0:3
    (0.1, 0.3, 0.5, 0.7, 0.9)          0:6:0      0:3:3      0:1:5      0:0:6      0:0:3
    (0.55, 0.6, 0.7, 0.8, 0.9)         2:4:0      2:4:0      0:3:3      0:0:6      0:0:3
    Σ                                  4:41:0     2:23:14    0:8:28     0:4:35     0:0:18

gives the number of times the SD procedure prevailed over the SU procedure, the number of draws, i.e. insignificant results, and the number of times the SU procedure outperformed the SD procedure. The Σ lines give the sums of those numbers for the respective k. As can be seen, the SU procedure often loses to the SD procedure for a single alternative, mostly wins for two alternatives, and practically always wins for more than two alternatives. Thus, there is a clear and growing tendency of the SU procedure to outperform the SD procedure for two and more alternatives. The differences between the SU and conservative SU procedures were almost always insignificant and so are not reported here. As to the 'rejection of at least one alternative' definition of power, the results were qualitatively similar, the actual number of wins of the SU procedure over the SD one being in general smaller by a factor of two to three.

7. Conclusions and possible extensions

In this work, we developed a general approach to SU procedures for free-combinations families that enables us to construct SU procedures in the general non-symmetric case (arising, for instance, in comparing treatments with a control in imbalanced one-way


designs). We replaced the usual nominal stepdown tests with new stepup node tests, and defined SU procedures as the coherent procedures having the new nominal tests. New approximations for critical values that provide α-levels in those tests allow us to considerably simplify the computations. We also developed a new test algorithm, equivalent to the SU procedure thus defined, that is no more complicated than the traditional SUB algorithm. It starts at the node containing the single maximal p-value and proceeds along the grand critical path just as the traditional algorithm does, but then switches off it at the first rejection. In symmetric cases, it reduces to the SUB algorithm and produces the algorithms of Hochberg (1988), Rom (1990), and DT92 under appropriate assumptions. Thus, three levels are clearly separated: the definitions of nominal tests and SU procedures, the computation of critical values guaranteeing the control of the FWE rate, and the development of test algorithms used for actually carrying out the testing. Besides conceptual clarity, this approach can be generalized for testing pairwise contrasts in families of blocks for balanced designs (Grechanovsky, 1994, 1995; Grechanovsky and Pinsker, 1997), and in closed families (research in progress). Furthermore, it allows us to dispense with some non-trivial ad hoc theorems on FWE control (Theorem 3.1 in DT92 and the corresponding theorem in Welsch (1977)), deriving it instead from basic principles (Peritz's theorem for closed families and Tukey–Ryan–Welsch levels (Hochberg and Tamhane, 1987, p. 69) for blocks). The use of the new procedural tests is not the only feasible way to improve on the usual SD procedures; an alternative solution has been suggested by Hommel (1988), who used Simes's tests as node tests. Computation of critical values in SU procedures is a difficult problem, so that even after the suggested simplifications the computational burden is much heavier than in SD procedures.
But precisely because stepwise procedures appear to be computer-intensive anyway, the computational differences between SU and SD procedures may not be that important after all; as long as the user has an appropriate program, say as an option in a statistical package, it should not matter that the computations are more intensive than usual, since the job is done by a computer. As to comparative power, the SU procedures have demonstrated their superiority over the SD ones in many, albeit not all, cases. The gains are mostly moderate but still non-negligible. There are a few extensions which have not been implemented yet; among them are two-sided tests and the use of Student distributions rather than normal ones when variances are unknown. Less trivial extensions might include the use of resampling, as done for instance in Westfall and Young (1993).

Appendix

Proof of Lemma 3.1. First we prove that if a node I is rejected by the GSU algorithm, then it will be rejected by U. Using coherence, it suffices to prove that it will be α-rejected. Assume that a node I has been rejected by the GSU algorithm at step r, r ≥ 1, on rejecting the minihypothesis h_(r). If I contains at least one minihypothesis h_(i), i < r, rejected by the GSU algorithm prior to step r, then I has been rejected


by the GSU algorithm at step i and, by induction, I is rejected by U. Therefore, assume that I contains some or all of the minihypotheses h_{i_q} in L_r accepted by the GSU algorithm prior to step r, the minihypothesis h_(r), and possibly some minihypotheses h_(j) with j > r. The node I′ obtained by removing from I all h_(j), j > r, is maximal for I and h_(r), and thus I′ ∈ C_I. Since L_r ⇒ I′, we have Φ_{I′} = T_(r) > u_{L_r} ≥ u_{I′}. Therefore, I′ is α-rejected, and hence I is α-rejected.

Now we prove the converse, i.e. that if a node I has been accepted by the GSU algorithm, it is also accepted by U. Such an I consists of minihypotheses accepted by the GSU algorithm: I = {h_(i_1), ..., h_(i_r), ..., h_(i_n)}, and its critical path is C_I = (I_n, ..., I_r, ..., I_1), where I_r = {h_(i_1), ..., h_(i_r)}. From the description of the GSU algorithm it follows that for all r, 1 ≤ r ≤ n, I_r belongs to the general test path, and acceptance of each h_(i_r) means that the test of I_r accepts. Therefore, I is α-accepted, and by coherence it is accepted by U.

Proof of Lemma 5.1. (i) is obvious. (ii) Select α⁻ and α⁺ satisfying

α⁻ < p_(r) < α⁺   (20)

and so close that no p-value p_I not equal to p_(r) lies between them. Consider the two general test paths t⁻ and t⁺ produced by the GSU algorithm for α⁻ and α⁺, respectively (both start from node L_1 = I_(1)).

1. First assume that t⁻ and t⁺ are identical at least up to the node L_r reached by the GSU algorithm at step r. As the GSU algorithm makes the decision about h_(r) at step r by checking whether p_{L_r} < α, and it is equivalent to U (Lemma 3.1), (20) gives α⁻ < p_{L_r} < α⁺, which by the selection of α⁻ and α⁺ implies p_(r) = p_{L_r}; here we have case (a).

2. Now assume that t⁻ and t⁺ split for the first time at a step l < r, at a node L_l. Then t⁻ corresponds to acceptance at L_l and t⁺ to rejection. Different decisions at L_l are possible only if α⁻ < p_{L_l} < α⁺, which by the selection of α⁻ and α⁺ implies p_(r) = p_{L_l}. Arguing for node L_l the same way as for node L_r in part 1 above, we obtain p_(l) = p_{L_l}, which gives p_(r) = p_(l); here we have case (b).

Proof of Lemma 5.2. (i) Part 'only if'. Selecting α⁻ and α⁺ as in (20), we have

α⁻ < α = p_I < α⁺.   (21)

Since p_(r) ≠ p_(i), the GTPs t⁻ and t⁺ cannot split prior to step r (cf. part 2 in the proof of Lemma 5.1). Now, as in part 1 of the proof of Lemma 5.1, we obtain p_(r) = p_{L_r}, whence L_r = I with probability 1.

(ii) Part 'if'. Select α⁻ and α⁺ satisfying (21) and close enough not to contain between them any p_J's and p_(i)'s different from p_I. Then the corresponding GTPs t⁻ and t⁺ part for the first time at step r: t⁻ accepts and t⁺ rejects. By the equivalence of the GSU algorithm and the SU procedure (Lemma 3.1), we have α⁻ < p_(r) < α⁺, which by the choice of α⁻ and α⁺ in (21) gives p_(r) = p_I. Note that the condition p_(r) ≠ p_(i) is not used in proving part (ii).


References

Dunnett, C.W., Tamhane, A.C., 1992. A step-up multiple test procedure. J. Amer. Statist. Assoc. 87, 162–170.
Dunnett, C.W., Tamhane, A.C., 1995. Step-up multiple testing of parameters with unequally correlated estimates. Biometrics 51, 217–227.
Gabriel, K.R., 1969. Simultaneous test procedures – some theory of multiple comparisons. Ann. Math. Statist. 40, 224–250.
Grechanovsky, E., 1993. Multiple comparisons, and a conditional approach in linear regression analysis. Unpublished Ph.D. Dissertation. Dept. of Statistics, Hebrew University, Jerusalem, Israel.
Grechanovsky, E., 1994. A stepdown theory for stepup procedures. Technical Report 94-1, Series in Applied Statistics. Dept. of Statistics, Tel Aviv University, Tel Aviv, Israel.
Grechanovsky, E., 1995. A stepdown theory for stepup test procedures. Paper presented at the 155th Annual Joint Statistical Meetings of the American Statistical Association, Orlando, Florida.
Grechanovsky, E., Pinsker, I., 1996. A general approach to stepup multiple test procedures for free-combinations families. Technical Report 96-1, Jerusalem, Israel, unpublished.
Grechanovsky, E., Pinsker, I., 1997. A general approach to stepup testing of pairwise contrasts in block families, submitted for publication.
Hochberg, Y., 1988. A sharper Bonferroni procedure for multiple tests of significance. Biometrika 75, 800–802.
Hochberg, Y., Tamhane, A.C., 1987. Multiple Comparison Procedures. Wiley, New York.
Holm, S., 1979. A simple sequentially rejective multiple test procedure. Scand. J. Statist. 6, 65–70.
Hommel, G., 1988. A stagewise rejective multiple test procedure based on a modified Bonferroni test. Biometrika 75, 383–386.
Liu, W., 1996. Multiple tests of a non-hierarchical finite family of hypotheses. J. Roy. Statist. Soc. B 58 (2), 455–461.
Rom, D., 1990. A sequentially rejective test procedure based on a modified Bonferroni inequality. Biometrika 77, 663–665.
Simes, R.J., 1986. An improved Bonferroni procedure for multiple tests of significance. Biometrika 73, 751–754.
Sonnemann, E., 1982. Allgemeine Lösungen multipler Testprobleme. EDV Med. Biol. 13 (4), 120–128.
Welsch, R.E., 1977. Stepwise multiple comparison procedures. J. Amer. Statist. Assoc. 72, 566–575.
Westfall, P.H., Young, S.S., 1989. p value adjustments for multiple tests in multivariate binomial models. J. Amer. Statist. Assoc. 84, 780–786.
Westfall, P.H., Young, S.S., 1993. Resampling-Based Multiple Testing. Wiley, New York.
Wright, S.P., 1992. Adjusted p-values for simultaneous inference. Biometrics 48, 1005–1013.