Computer Methods and Programs in Biomedicine, 29 (1989) 179-190
179
Elsevier COMMET 01003 Section II. Systems a n d p r o g r a m s
A program for the computation of power and determination of sample size in hierarchical experimental designs N . J e a n p r S t r e a n d R. K r a f t s i k Institute of Anatomy, Unioersity of Lausanne, 1005 Lausanne, Switzerland
We have devised a program that allows computation of the power of F-test, and hence determination of appropriate sample and subsample sizes, in the context of the one-way hierarchical analysis of variance with fixed effects. The power at a fixed alternative is an increasing function of the sample size and of the subsample size. The program makes it easy to obtain the power of F-test for a range of values of sample and subsample sizes, and therefore the appropriate sizes based on a desired power. The program can be used for the 'ordinary' case of the one-way analysis of variance, as well as for hierarchical analysis of variance with two stages of sampling. Examples are given of the practical use of the program. Power; Sample size; Hierarchical analysis of variance; Computer program; Two-stage sampling
1. Introduction O n e of the most widely used techniques for the statistical analysis of b i o m e d i c a l data w h e n the goal is c o m p a r i n g several p o p u l a t i o n s is the a n a l y sis of variance ( A N O V A ) . A universal p r o b l e m i n the p l a n n i n g of such investigations is that of deciding what constitutes a n a d e q u a t e sample. O f t e n samples will p r e s e n t a hierarchical structure. A m o r p h o m e t r i c investigation, for example, typically involves the choice of the n u m b e r of a n i m a l s followed b y the choice of the n u m b e r of observations per animal. O u r o w n interest in this type of p r o b l e m arose i n response to the need for statistical analysis a n d p l a n n i n g in investigations conc e m i n g m o r p h o m e t r i c properties of b r a i n cells, such as for i n s t a n c e the total l e n g t h of dendrites b e l o n g i n g to n e u r o n s of a p a r t i c u l a r type. O f t e n the goal of the investigation is to c o m p a r e the q u a n t i t y of interest or m e a s u r e m e n t across a n u m -
Correspondence: R. Kraftsik, Institute of Anatomy, rue du Bugnon 9, 1005 Lausanne, Switzerland.
ber of specified groups of animals. O n c e the groups are defined, the e x p e r i m e n t e r m u s t select a n u m ber of a n i m a l s from each group, a n d from each a n i m a l o n e or more cells o n which the q u a n t i t y of interest is measured. I n general we speak of the sample size or n u m b e r of p r i m a r y u n i t s i n s t e a d of animals, a n d of the s u b s a m p l e size or n u m b e r of s u b u n i t s i n s t e a d of cells; the statistical t e c h n i q u e for this p a r t i c u l a r s a m p l i n g scheme is called the hierarchical analysis of variance (the term nested analysis of variance is also c o m m o n ) * . T h e o r d i n a r y analysis of variance is i n c l u d e d as the special case when the s u b s a m p l e size is 1; it is treated explicitly i n o u r approach. It is a n i m p o r t a n t part of the statistical p l a n n i n g of the e x p e r i m e n t to help d e t e r m i n e r a t i o n ally, in our example, the n u m b e r of a n i m a l s to be
* The same term, hierarchical (or nested) analysis of variance, is sometimes used for a different purpose, the estimation of a parameter of a single population [1]. In that context, interest centers on the effect of the variabilities of the different stages of sampling on the variability of the estimate of the parameter. For morphometric applications see [2].
0169-2607/89/$03.50 © 1989 Elsevier Science Publishers B.V. (Biomedical Division)
180
selected in each group and the number of cells to be measured in each animal. Failure to take into account properly this component of planning m a y result in inability to demonstrate differences in the quantity of interest between groups (too few animals per group or cells per animal), or in an excessively costly or long experiment (too m a n y animals or cells per animal). It m a y also result in confusion at the analysis stage, when investigators believe wrongly that substituting more observations per animal instead of more animals equally increases the experiment's precision. The mathematical definition of power involves complicated formulas and leads to heavy computation. Traditionally the practical way was recourse to printed tables [3,4] or nomograms [4]. Both of these last two approaches are limited and laborious; thus the computer provides a valuable alternative. Computer programs that have been published have not addressed the sampling aspect for the hierarchical analysis of variance. In this paper we present a program for computation of the statistical power of the hierarchical as well as ordinary analysis of variance F-test. The program produces tables of power values useful for the determination of appropriate sample and subsampie sizes. The data required are mainly estimates of the variance between units and of the variance between subunits; also a value must be indicated for an important parameter, the smallest difference between populations that is deemed to be of interest. The program is interactive and allows flexibility in the manner in which parameter values and data are introduced. Besides its main goal in the planning of experiments, a subsidiary use of the program is for the purpose of checking the power of the F-test for an experiment already completed. This m a y be valuable particularly when the test did not discover statistically significant differences between the populations. After some theoretical notions in Section 2, Section 3 presents the mathematical basis for the computation. A few indications are given on different existing computational methods, in order to facilitate contact with the literature and not as a systematic comparison.
Section 4 describes the main segments and notations of the program. Several examples show the different possible uses of the program (Section 5). Finally, the main program is listed in the Appendix. References are given when published subroutines are used.
2. Power and the determination of the number of observations: theoretical basis
To introduce the hierarchical one-way analysis of variance design consider a study of the effect of age on the total length of dendrites of the nerve cells of a certain type (e.g., pyramidal cells) in the human visual cortex. As the visual cortex may contain hundreds of millions of the cells of interest, measurement of dendritic length must be restricted to a sample of cells. For clarity we will say that the h u m a n subjects selected in one age group form a sample while the cells selected from one individual form a subsampie; and the complete procedure is termed twostage sampling. In general items selected at the first stage are termed primary units or experimental units, those at the second stage secondary units or subunits. To the above situation corresponds the model Xijk = ll -~ ~ti q- fl(i)j q- e i j k ,
i=l,...a,
j = l,... b,
k=l,...r
where in general a is the number of treatments, b the number of experimental units, or primary units per treatment, and r the n u m b e r of subunits per primary unit; /~ is the general mean over all the treatments and oti the differential effect of the ith treatment, which is assumed to be non-random, the sum of the a,s being zero; fl
181
Source
Sums of Squares
df's
(SS)
Treatment
Units
SS a = br Z (~i..-x...) 2
SS b = r E E ( ~ i j . - ~ i . . ) 2
a-i
a(b-l)
M e a n Squares
Expected M e a n
F-
(MS)
Squares
statistic
MS a =
~
+ r ~
SSa/(a-I )
br ~ *
MS b ffi
o'2 + r ~
(EMS)
+
MSa/MS b
SSb/a(b-l)
Subunits
SS r = E E E (Xijk--fij.)2
ab(r-l) MS r ffi
o"2
SSr/ab (r- i)
Total
SS T = E E E (Xijk-~...)2
abr-i
(corr.) * O'a2 = Z
ct
~i/(a-1). Fig. 1. Analysis of variance table for the hierarchical design.
units are c o n s i d e r e d to c o n t r i b u t e o n l y to the v a r i a b i l i t y of the o b s e r v a t i o n s , so t h a t the terms fl(o; are a s s u m e d to follow i n d e p e n d e n t n o r m a l d i s t r i b u t i o n s with zero m e a n s a n d a c o m m o n varia n c e ob2. W e s u p p o s e as well that such ' i n d i v i d u a l effects' are i n d e p e n d e n t of the 'cell effects' ei; k. T h e analysis o f v a r i a n c e t a b l e for this m o d e l is i n d i c a t e d in Fig. 1. O t h e r details o n this design m a y b e f o u n d in [5] o r [6]. U n d e r the no-effect hypothesis, H 0 : 4 1 = • .- = aa = 0, the F - s t a t i s t i c M S a / M S b follows the F ( a 1, a ( b 1)) d i s t r i b u t i o n , H 0 b e i n g r e j e c t e d at level a w h e n e v e r M S ~ / M S b > ga 1, a(b-1),l-a, t h a t is w h e n the F - s t a t i s t i c exceeds the u p p e r ~t p o i n t o f the F - d i s t r i b u t i o n with a - 1 a n d a ( b - 1) degrees of freedom. T h e p o w e r 7r o f the test d e p e n d s o n the p a r t i c u l a r a l t e r n a t i v e O = (41 . . . . t~a), a n d o n a, b, r, ct, o 2 a n d o2. It t u r n s o u t t h a t ~r d o e s n o t d e p e n d i n d i v i d u a l l y o n each of the a i s b u t o n l y on the s u m of their s q u a r e s •¢t 2, a g l o b a l m e a s u r e o f the m a g n i t u d e of the a~s. F o r m a t h e m a t i c a l reasons this m u s t b e specified in the f o r m of the p a r a m e t e r 3 d e f i n e d as
where on2 = o~ + o ~ / r .
Under a particular alternative characterized by a n o n - z e r o value of 3, the F - s t a t i s t i c is d i s t r i b u t e d a c c o r d i n g to the n o n - c e n t r a l F - d i s t r i b u t i o n , den o t e d b y F ' ( a - 1, a ( b - 1), 8), where a - 1 a n d a(b-1) are degrees of f r e e d o m a n d 8 is also called the n o n - c e n t r a l i t y p a r a m e t e r of the distribution. T h u s the p o w e r of the F - t e s t for t~sting H 0, at an a l t e r n a t i v e 3, ~=P(MSa/MSb>Fa-I,a(b
1).l ~l ~)
is a f u n c t i o n of a, b, a a n d 8, where P ( E ] 8 ) d e n o t e s the p r o b a b i l i t y of a n event E u n d e r the d i s t r i b u t i o n F ' ( a - 1, a ( b - 1), 8). If a, b, r, a are fixed, ~r is an i n c r e a s i n g f u n c t i o n of 8 whose g r a p h is the p o w e r curve of the test. In p r a c t i c e it m a y b e difficult to assign n u m e r i cal values to 3 o w i n g to d i f f i c u l t y in i n t e r p r e t a t i o n of this p a r a m e t e r . It will b e f o u n d m o r e convenient to assign a value to the m i n i m u m d i f f e r e n c e
182
to be detected between the treatments' averages, thereafter denoted by za. This new parameter m a y be then related to the previous one b y the formula (see [5] p. 64): a = ( b / 4 ) A 2 / o 2.
POWER 10-
0B-
OS-
04"
I n order to use this formula (and the previous one defining 8) a f and 02 will have to be k n o w n f r o m previous experiments or estimated in a pilot experiment; in the latter case m e a n squares m a y be obtained f r o m the same A N O V A table as above but applied to the preliminary data, estimates being then s 2 -- MS r for o2, and s 2 = (MS b -MSr)/r for o 2. If in the previous experiment the n u m b e r of p r i m a r y units is n o t the same in all the treatment groups, but the n u m b e r of subunits is still the same for all p r i m a r y units, the variances m a y still be estimated b y means of a simple m o d ification. Let N be the total n u m b e r of observations (subunits) for the experiment, and let B be the total n u m b e r of p r i m a r y units. SS b and SSr m a y be c o m p u t e d according to the same formulas as above except that for each i, j runs f r o m I to the n u m b e r of p r i m a r y units in the ith group; let MS b = SSb/(B -- a ) and MS r = S S J ( N - B). T h e n 02 is again estimated by MS r and Ob2 b y ( M S b MS~)/r. (Note: it m a y h a p p e n occasionally that M S b is less than MS r. In such cases the estimate of 0 2 will be set equal to zero.) The sample and subsample sizes m a y be determined as follows. Let us assume given values of 0 2, of, a and a; a n d let A be the chosen value of the difference to be detected. The power is n o w a function of the sample size b and of the subsample size r, increasing with b o t h of them, and tending to 1 for fixed r and b tending to infinity. A simple strategy is then, for a fixed value of r, to increment b until a desired value for ~r is attained. A n illustration is given in Fig. 2 based on artificial data. The power m a y also be represented, for fixed sample size, as a function of subsample size (see Fig. 3 for an illustration). In general, therefore, there will be several pairs b, r for the same power, a m o n g which a choice m a y be m a d e according to cost, time or other considerations. T h e case o f the ordinary one-way analysis of variance m a y be obtained as the special case where
0~-
00'
~
[ ~
F ~
I l l 0 r ll2 I 1 ;
' ,6 I I~ 1210 I 212 ' 214 1 216 ' 218 1 3 ]
SAMPLE SIZE
( b )
Fig. 2. P o w e r of the F-test for a n h i e r a r c h i c a l a n a l y s i s of v a r i a n c e as a f u n c t i o n of s a m p l e size, for d i f f e r e n t s u b s a m p l e sizes.
S u b s a m p l e size: 4, 12, 20, 28 P a r a m e t e r values: a = 0.05, a = 5, A = 3, st, = 2, s r = 5.
the n u m b e r of observations r is equal to 1. The two error terms fltoJ and ~ijl a r e n o w c o n f o u n d e d and are therefore replaced b y a single term e~j = fl(oJ + e~Jl, leading to the familiar model Xij=~-Cliq-Eij
,
i=l,...a,
j=l,...b,
eij normal with m e a n zero and variance %2. N o t e the relationship between variances a f = 02 + a f = 02 (the symbol a f is a d o p t e d here for parallelism with traditional notation). The above formulas for
POWER lO
08
06
o4
02
o0
i ~ i ~ i / i i~ i 112 i 114 i1~
i?~ ~21e i z/ 1214 E~
SUBSAMPLE SIZE
i 2/ i 3/
( r )
Fig. 3. P o w e r of the F-test for a n h i e r a r c h i c a l a n a l y s i s of v a r i a n c e as a f u n c t i o n of s u b s a m p l e size, for different s a m p l e sizes. S a m p l e size: 4, 12, 20, 28 P a r a m e t e r values: i d e n t i c a l to those in Fig. 2.
183
Source
Sums of Squares
df's
(ss)
Treatment
SS a = n E (~i.-~..) 2
a-i
Mean Squares
Expected Mean
F-
(MS)
Squares (EMS)
statistic
MS a =
~
+ n ~*
MSa/MS e
SSa/(a-l)
Error
SS e = E E (xij-xi.)2
a(n-l)
MS e
=
ff
SSe/a(n-l)
Total
SS T = E E (xij-~..)2
an- I
(corr)
*
-
Fig. 4. A n a l y s i s of v a r i a n c e table for the o n e - w a y design.
3 and ~r remain valid, with o~z replaced b y o f. If o f is not k n o w n it m a y be estimated by means of a pilot experiment, the estimator being Se2 = MS e (see Fig. 4). However, if in the previous experim e n t the n u m b e r of observations is not the same in all the treatment groups, an estimate of oe m a y still be obtained with very little modification. I n d e e d let N be the total n u m b e r of observations in the previous experiment. Let the sum of squares for error SSe be c o m p u t e d according to the same f o r m u l a as above, and let M S e = S S e / ( N - a ) . T h e n Se2 = M S e is again an estimate of of. Alternatively it m a y be appropriate to fix Ea~, or Az, as a multiple of o f (e.g., A / o e = 1 or 2), thus avoiding the necessity of estimating o e.
3. Numerical methods
3.1. Methods used in the program 3.1.1. Obtaining the critical point of F-test Let ~" be the F-statistic for either type of the analyses of variance considered, and let p and q be the degrees of freedom. To obtain for given p, q and a the critical or upper a point Fp, q,1-a, we rely on the well-known relationship between the F-distribution and the incomplete beta function ratio I x (p. 187 in Ref.
[7], or chapter 24 in Ref. [8]). Based on this relationship, the critical point is given by G,q,~ - = (q/p)Bp,q,,_o/O- Bp,~,~ ~) where Be,q,~_~, the solution to the equation in x
Ix ( p/2, q/2) = 1 - a, is c o m p u t e d iteratively by N e w t o n ' s method. For this step we have used algorithms A C M 291, AS 63, and AS 6 4 / A S 109 f r o m Refs. [9], [10] and
Ill]. 3.1.2. Approximation of cdf of F" The m e t h o d in our p r o g r a m relies on an approximation to the cdf of F ' p r o p o s e d by Patnaik [12], which we have f o u n d simpler and faster than some of the most recent methods. Briefly, Patnaik's m e t h o d approximates the non-central F-distribution by an F-distribution with adjusted degrees of freedom and a scale factor. That is, if ~ has F ' ( p , q, 3) distribution, ~ / k has an approximate F ( m , q) distribution where
k=(p+28)/p, m=(p+23)Z/(p+43). The c o m p u t a t i o n of the power is thereby reduced to the c o m p u t a t i o n of a certain F integral; for this step we again use subroutine AS 63
184
[10]. F o r the a c c u r a c y of P a t n a i k ' s a p p r o x i m a t i o n , see [12] a n d [13].
3.2. Other approaches to the calculation of power 3.2.1. Tabulations T h e t y p e II e r r o r o f the test P ( . , ~ < Fp, q , l _ a [~), at a p a r t i c u l a r a l t e r n a t i v e 8, has b e e n t a b u l a t e d b y T a n g [3] for a = 0.01 a n d 0.05, 1 < p < 9, 2 < q < 30, q = 60, a n d q = ~ , in t e r m s o f B = Bp, q, 1-~ a n d o f the t r a n s f o r m @ o f 8 d e f i n e d b y = ~/28/(p + 1): ~ = 8 ( B , p, q, O) ( T a n g uses the n o t a t i o n E 2 for o u r B.) T a n g expresses the t y p e II e r r o r as an infinite series
fl = ~, ( 8 ~ e - * / i ! ) l B ( p / 2 + i, q/2). i=o T h e m e t h o d s of c o m p u t a t i o n in his p a p e r are: (1) direct use o f the series w i t h a finite n u m b e r o f terms. T h i s he i n d i c a t e s as p r a c t i c a l o n l y for s m a l l values o f 8 o r for large q; (2) a r e c u r r e n c e f o r m u l a for the p a r t i a l sums o f the infinite series. This p r o c e d u r e is l i m i t e d to the case where q is even. F o r o d d values recourse to i n t e r p o l a t i o n is necessary; (3) in the limiting case q = o0, a X 2 a p p r o x i m a t i o n is used. D i f f e r e n t m e t h o d s of c o m p u t a t i o n for /3, o r m o r e generally for the c d f F ' have b e e n given (see [14] a n d references therein).
3.2.2. Nomograms
c o m p u t e r s . A flow c h a r t of the p r o g r a m c a n b e f o u n d in Fig. 5. T h e p r o g r a m stores in the a r r a y NBIND the s a m p l e sizes a n d in the a r r a y NBOB the s u b s a m p l e sizes selected b y the user; p o w e r values a r e s t o r e d in the a r r a y APOW. T y p e II e r r o r is s t o r e d in v a r i a b l e ERR, a n d the p o w e r is c o m p u t e d as 1 ERR.
4.1. lnput parameters NBTREAT ALPHA IAOV ITREAT
IDAT
DELTA MSE $E TB TBMAX INT MSB MSR NBSR
TR
number of treatments - p r o b a b i l i t y of t y p e I e r r o r - flag for the t y p e of analysis o f variance - flag for choice b e t w e e n p l a n n i n g of e x p e r i m e n t a n d a single p o w e r computation - flag for choice b e t w e e n the i n p u t o f s u m s o f squares o r i n p u t of estimates - minimum difference between m e a n s to b e d e t e c t e d - error m e a n s q u a r e ( e s t i m a t e for e f ) - e s t i m a t e o f oe - lower l i m i t for the n u m b e r o f indiv i d u a l s in the s a m p l e - u p p e r limit for the n u m b e r of indiv i d u a l s in the s a m p l e - i n c r e m e n t of the n u m b e r of individuals - m e a n square b e t w e e n i n d i v i d u a l s - error m e a n s q u a r e ( e s t i m a t e for 02) - s u b s a m p l e size used for the e s t i m a tion o f a b in the p r o g r a m called ' d e g r e e s o f f r e e d o m for SB' - lower limit for the n u m b e r of o b servations in s u b s a m p l e - u p p e r limit for the n u m b e r of o b servations in s u b s a m p l e -increment o f the n u m b e r o f o b servations in the s u b s a m p l e
-
N o m o g r a m s are a v a i l a b l e allowing g r a p h i c a l det e r m i n a t i o n o f p o w e r for c e r t a i n ranges of p , q a n d ~ [4,15].
TRMAX
4. P r o g r a m
4.2. Output of the program
description
T h e p r o g r a m was w r i t t e n in F O R T R A N 77, o n a VAX 11/750 computer from Digital Equipment Company, and can be run on a broad range of
INT-S-E
T h e p r o g r a m p r i n t s the i n p u t values a n d a t a b l e o f p o w e r values for d i f f e r e n t s a m p l e a n d s u b s a m p l e sizes l i m i t e d b y the lower a n d u p p e r limit o f the
185
DEBUT )
(AOV)
=1
= 2
(HAOV)
YES
yNO
I COMPUTE NB-IND I YESt ~ N O "~
/~,,~-~-~
It
POWER J I T
NO
I'YES
YES
~ NO YES
( STOP ) ( Fig. 5. Flow chart of the main program.
STOP )
186
sample and subsample with the given step sizes.
*eeCALCULATION
W i t h fixed values for the m e a n squares several tables can be c o m p u t e d for several differences to be detected. It also prints the values: $ E - estimate of o e
NUMBER OF TREATMENTS
S B - estimate of o b S R - estimate of o r
VALUE OF ALPHA ORDINARY AOV PLANNING
?:
(EG:
(I) O R H I E R A R C H I C A L
DIFFERENCE
T O BE D E T E C T E D
V A L U E S O F MSR, M S B
M S B 7:
3
7:
T h e programs use three subroutines from [9], [10]
S I Z E 7:
I0
and [11]: - the real function At_0GAM to compute the natural logarithm of g a m m a function (algorithm A C M 291) [9]
MAXIMUM
SUBSAMPLE
S I Z E ?:
30
- the real f u n c t i o n XlNBTA to c o m p u t e the in-
S T E P S I Z E ( NB.
(NB. INDIV.)
INDIV.)
?:
?:
10
?:
20
3
i0
A0V
P O W E R of H I E R A R C H I C A L NBTRFAT-
A L P H A - 0 . 0 5 SR-
DIFFERENCE
?: I0
(NB. OES)
S A M P L E S I Z E (NB. INOIV.)
MAXIMUM SAMPLE SIZE
0.305 SB-
0,500
TO BE D E T E C T E D -
NUMBER OF OBSERVATIONS
IN S U B S A M P L E
...... I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
T h e program h a n d l e s a m a x i m u m of 50 rows and 50 c o l u m n s for the table of power values. This can be changed b y altering the d i m e n s i o n statement.
......
10
30
I ..........................................
i0 I
0.48
0.49
0.50
20
0.84
0.85
0.85
I
ANOTHER DIFFERENCE
5. Examples
20
6.000
0.469 N B S R -
4.4. Restrictions
I
1
F O R SB 7: 6
S T E P SIZE F O R T H E S U B S A N P L E
INDIV. I
SR, SB (2) 7:
1.413
SUBSAMPLE
[111.
2
1
.093
MINIMUM
- the real f u n c t i o n BETAIN to c o m p u t e the inc o m p l e t e beta function ratio (algorithm A S 63)
(2) 7:
.5
OF FREEDOM
verse of the incomplete beta function ratio (algorithm A S 6 4 / A S 109) [10]
A O V (2) ?:
(i) O R ESTIMATES
DEGREES
MINIMUM
.05
.01 O R .05) 7:
(I) O R F O W E R C A L C U L A T I O N
M S R ?:
4.3. Subroutines
OF POWER***
(YES-l,
ANOTHER CALCULATION
NO-O)
(I) O R
?:
0
E X I T (0) ?:
0
Fig. 6. Example program run.
5.1. Example of a program run A s s h o w n in Fig. 5 there are m a n y possibilities to use the program. W e present in Fig. 6 a run m a k i n g the calculation o f table of powers in the context of hierarchical analysis o f variance. T h e i n p u t values for the m e a n squares correspond to an application presented in detail in [16]. Different options can be c h o s e n by typing one o f the n u m b e r s in parentheses.
5.2. Examples of different uses of the program 5. 2.1. Ordinary analysis of variance (a) Let there be a = 3 treatments, a = 0.05. F r o m a previous experiment MSe = 0.093. A = 0.5. M i n i m u m s a m p l e size = 10, m a x i m u m s a m p l e size
= 20, step size = 2. The following values are obtained: Number of individuals: 10 Power:
12
0.9
14
0.95
0.98
16 0.99
18
20
1.0
1.0
(b) W i t h the same values o f a and a, s u p p o s e an estimate for o e is not available and that the value A / % = 1 is chosen. Let the m i n i m u m and maxim u m s a m p l e sizes be 5 and 20, with a step size o f 5. Result: Number of individuals:
5
Power:
0.22
10 0.45
15 0.65
20 0.80
187
5.2.2. Hierarchical analysis of variance (a) With the s a m e a and a, let the estimate of Ob, Or be s b = 0.469, s , - - 0 . 3 0 5 . Let A = 0.5. Let the m i n i m u m and m a x i m u m subsample sizes be 10 and 30 with step size 10, and the m i n i m u m and m a x i m u m sample sizes 10 and 20 with step size 10. The result is:
GENERAL LINEAR MODELS PROCEDURE
DEPENDENT VARIABLE:
SOURCE
DF
SUM OF SQUARES
MODEl
13
26 66329524
ERROR
70
6 52000000
83
33 18329524
CORRECTED
TOTAL
MODEl. F -
N u m b e r of observations:
10
20
30
0.49 0.85
0.50 0.85
Power N u m b e r of individuals:
10 20
0.48 0.84
CV
0.8035]6
7 6398
SOURCE
5.3. Example of calculation of mean squares with the statistical package S A S
MEAN SQUARE 2 05102271 009314286
22.02
R-SQUARE
PR > F ~ 0 800 ROOT MSE
DENS MEAN
0 30519315
3 99476190
DF
TYPE I SS
F VALUE
PR > F
2
11 12047607
59.70
0 GO01
INDIV(AGE)
ii
1554281917
1517
00001
SOURCE
DF
TYPE Ill SS
AGE
(b) The s a m e values as in 5.2.2(a) result if instead of s b, s r the m e a n squares f r o m a pilot experiment M S r = 0.093, MS b = 1.413 are entered, the degrees of f r e e d o m for M S r (subsample size in the pilot experiment) being N B S r = 6. (For the c o m p u t a t i o n of the m e a n squares see 5.3 below.)
DENS
AGE INDIV(AGE)
TESTS OF HYPOTHESES
SOURCE AGE
F VALUE
PR > F
2
1112047607
5920
00001
Ii
15.54281917
1517
0OOO]
USING THE TYPE III MS FOR INDIV(AGE)
DF
TYPE
2
A5 AN ERROR TERM
III SS
P VALUE
PR > F
11.12047607
3 94
0.0514
Fig. 8, Computer output from commands in Fig. 7.
5.3.1. Hierarchical analysis of variance T h e following SAS c o m m a n d s m a y be used to obtain an analysis of variance table containing the error m e a n square MS, and the s u m of squares SS b (Fig. 7). The resulting output is indicated in Fig. 8. The following quantities can be identified:
OPTIONS
LINESIZE-70 DATA
PAGESIZE-70;
DATA SYNDENS;
DENS ;
INPUT AGE
(1) MS, = 0.09314286 (2) degrees of f r e e d o m for SS b = 11 (3) SS b = 15.54281917 F r o m (2) and (3) w e derive MS b = ( 3 ) / ( 2 ) = 1.413. The last c o m m a n d line TEST H = . . . is not relevant to the planning situation but m a y be used w h e n the goal is to test for treatment effect in
1 INDIV
3
INPUT
DENS 5-8;
CARDS ;
GROUP
DENS;
CARDS;
I 1 3.19
i
25.6000
1 1 3.27
i
34.6500
1 1 3.85
i
26.7500
3 5 3.62
2
15.9000
2
15.0000
P R O C OLM; CLASS
AGE
INDIV;
PROC G I M ;
M O D E L DENS - A G E I N D I V ( A G E ) ;
CLASS GROUP;
TEST HmAGE E-INDIV(AGE);
MODEL DENS-GROUP ;
Fig. 7. SAS commands for calculation of mean squares in the hierarchical A N O V A .
Fig. 9. SAS commands for calculation of mean squares in the one-way A N O V A .
188 GENERAL LINEAR MODELS PROCEDURE
DEPENDENT VARIABLE: DENS
SOURCE
DF
SUM OF SQUARES
MEAN SQUARE
MODEL
i
402.16091234
402.16091234
ERROR
15
224.92970122
14.99531341
CORRECTED TOTAL
16
627.09061356
C.V.
ROOT MSE
DENS MEAN
3,87237826
28.21333529
26.82 PR>
same technique. W h i l e power w o u l d still b e relev a n t after a n e x p e r i m e n t is completed, it w o u l d be less so at the design stage since different n u m b e r s of i n d i v i d u a l s / o b s e r v a t i o n s typically are n o t p l a n n e d for. The core of the p r o g r a m , of course, is relevant to a n y test b a s e d o n the F - d i s t r i b u t i o n .
F
0.0001
Acknowledgements R-SQUARE
0.641312
13.7253
SOURCE
GROUP
SOURCE
DF
TYPE I SS
F VALUE
FN > F
1
402.16091234
26.82
0.0001
DF
TYPE III SS
F VALUE
PR > F
i
402,16091234
W e are grateful to Mrs. Ch. Vaclavik for typing, to Drs. P. Clarke a n d P. Melzer for criticism of a n early version, Dr. E. Welker for a useful discussion, a n d to Prof. H. V a n der Loos for his s u p p o r t and encouragement.
References GROUP
26.82
0.0001
Fig. 10. Computer output from commands in Fig. 9.
hierarchical designs, a n d produces the last lines of the o u t p u t . T h e original data used for this example were the same as i n the example o n n e u r o n a l densities, p a r t A [16].
5.3.2. Ordinary one-way analysis of variance MSe, as well as its square root, are given i n SAS a n d m a y be f o u n d i n other statistical packages also. The SAS c o m m a n d s i n Fig. 9 m a y be used. T h e o u t p u t from these c o m m a n d s is i n d i c a t e d in Fig. 10, from which MS e = 14.99531341 is obtained.
6. Final remarks T w o n a t u r a l extensions of the p r o g r a m presently envisaged are, first, for the case of more t h a n two stages, a n d second, for the case of m o r e t h a n one factor. O n the other h a n d p o w e r c o m p u t a t i o n i n the case of a n analysis of variance with different n u m b e r s of i n d i v i d u a l s i n the t r e a t m e n t groups or different n u m b e r s of o b s e r v a t i o n s per i n d i v i d u a l is n o t a m e n a b l e , for the present time at least, to the
[1] R.R. Sokal and J. Rohlf, Biometry (Freeman and Co., New York, 1969). [2] W.D. Kornegay, III and M.C. Poole, A computer program using a nested analysis of variance in morphometric sampling, Comput. Programs Biomed. 16 (1983) 155-160. [3] P.C. Tang, The power function of the analysis of variance tests with tables and illustrations of their use, Stat. Res. Mem. 2 (1938) 126-157. [4] E.S. Pearson and H.O. Hartley, Biometrika Tables for Statisticians (Cambridge University Press, Cambridge, 1954). [5] H. Scheff6, The Analysis of Variance (Wiley, New York, 1959). [6] O.J. Dunn and V.A. Clark, Applied Statistics: Analysis of Variance and Regression (Wiley, New York, 1987). [7] S.S. Wilks, Mathematical Statistics (Wiley, New York, 1962). [8] N.L. Johnson and S. Kotz, Continuous Univariate Distributions, Vol. 2 (Houghton Mifflin, Boston, MA, 1970). [9] M.C. Pike and I.D. Hill, Logarithm of gamma function, in: Applied Statistics Algorithms, eds. P. Griffiths and I.D. Hill, pp. 243-246 (Ellis Hotwood, Chichester, 1985). [10] K.L.' Majumder and G.P. Bhattacharjee, The incomplete beta integral, in: Applied Statistics Algorithms, eds. P. Griffiths and I.D. Hill, pp. 117-120 (Ellis Horwood, Chichester, 1985). [11] K.L. Majumder and G.P. Bhattacharjee, Inverse of the incomplete beta function ratio, in: Applied Statistics A1"gorithms, eds. P. Griffiths and I.D. Hill, pp. 121-125 (Ellis Horwood, Chichester, 1985). [12] P.B. Patnaik, The non-central X2- and F-distributions and their applications, Biometrika 36 (1949) 202-232. [13] M.L. Tiku, A note on approximating to the non-central F distribution, Biometrika 53 (1966) 606-610.
189 [14] C.-P. Han and J.L.G. Wang, Computation of noncentral F distributions with even denominator degrees of freedom, Commun. Star. Simul. Comput. 12 (1983) 1-9. [15] M. Fox, Charts of the power of the F-test, Ann. Math. Star. 27 (1956) 484-497. [16] G. Leuba, N. Jeanpr~tre, g. Kraftsik and J.-M. Fritschy, Sample size and statistical power: applications in morphometry of the nervous system, J. Neurosei. Methods (in press).
READ (*,*) TBMAX TXT-'STEP SIZE (NB. OBS) ?:' WRITE(*,*) TXT READ (*,*) INT NIT-(TBMAX-TB)/INT +i IF(NIT.LT.I) NIT-I ELSE NIT-I ENDIF C...
COMPUTE THE POWER FOR INCREASING N B O F
i01
P-(NBTREAT-I)/2. FI-2*P IIS-I DO IO0 IIT-I,NIT RNTB-TB+(IIT-I)*INT PHI-DELTASE*SQRT(RNTB/2/NBTREAT) 12kMBD-PHI*PHI*(FI+I) Q-NBTREAT* (RNTB -1 )/2. BETA- ALOGAM(P,IFAULT) + ALOGAM(Q,IFAULT) 9"X-XINBTA(P, Q, BETA, ALPHA, IFAULT) F2-2*Q FALF-F2/FI/(I~VX)*VX
Appendix PROGRAM LISTING
C,
CALCULATION OF POWER
C
19
TYPE *,' ' TYPE *,' *** CALCULATION OF POWER ***' TXT-'NIJMBER OF TREATMENTS ?:' WRITE(*,*) TXT READ(*,*) NBTREAT TXT-'VALUE OF ALPHA (EG: .01 OR .05) ?:' WRITE(*,*) TXT READ(*,*) ALPHA ALPHA-I.-ALPRA TXT-'OP~DINARY AOV (i) OR HIERARCHICAL AOV (2) ?:' WRITE(*,*) TXT READ(*,*) IAOV IF(IAOV.LT.I.OR.IAOV.GT.2) GO TO 19 GOTO (i0000,20000) IAOV
C..
ORDINARY ANALYSIS OF VARIANCE
i0000
CONTINUE TXT-'PLANNING (I) OR CALCULATION OF POWER (2) ?:' WRITE(*,*) TXT READ(*,*) ITREAT IF(ITREAT.NE.I.AND.ITRF.AT.NE.2) GO TO 22
22
C,..
25
INPUT OF DATA IF(ITREAT.EQ.I) THEN TXT-'VALUES OF MSE, DELTA (I) O R R A T I O SE/DELTA (2) ?:' WRITE(*,*) TXT READ(*,*) IDAT IF(IDAT.NE.I.AND.IDAT.NE.2) GO TO 25 IF(IDAT.EQ.I) THEN TXT-' MSE ?:' WRITE(*,*) TXT READ (*,*) MSE SE-SQRT(MSE) TXT-'DIFFERENCE TO BE DETECTED ?:' WRITE(*,*) TXT READ (*,*) DELTA DELTA_SE-DELTA/SE ELSE TXT-'RATIO DELTA/SE ? : ' WRITE(*,*) TXT READ (*,*) DELTA SE ENDIF ELSE T X T - ' N U ~ E R OF OBSERVATIONS ( > OR - 2) ?:' WRITE(*,*) TXT READ (*,*) TB TXT.' MSE 7:' WRITE(*,*) TXT READ (*,*) HSE SE-SQRT(MSE) TXT-'DIFFERENCE TO BE DETECTED ?:' WRITE(*,*) TXT READ (*,*) DELTA DELTA SE-DELTA/SE ENDIF IF(ITREAT.EQ.I) THEN TXT-'MINIMUM NUMBER OF OBSERVATIONS ( > OR - 2) ?:' WRITE(*,*) TXT READ(*,*) TB TXT-'MAXIMUM NUMBER OF OBSERVATIONS ?:' WRITE(*,*) TXT
- ALOGAM(P + Q,IFAULT)
Power aproximation algorithm (Patnaik) RK-(FI+LAMBD)/FI RR-(FI+LAMBD)*(FI+I~D)/(FI+2*LAM~D) XI-P~R/F2*FALF/KK/(RR/F2*FALF/RK+I)
REAL*4 LA/£BD,P,Q,BETA,BETAIN,ALPHA,XINBTA,ALOGAM REAL*4 NBTREAT,MSE,MSB,MSR,NBSR CHARACTER*80 TXT DIMENSION APOW(50,50),NBOB(50),NBIND(50) i
OBS
RR2-RR/2
I00
F22-F2/2 BI-ALOGAM(RR2,1F)+ALOGAM(F22,1F)-ALOGAM(RR2+F22,1F) ERR-BETAIN(XI,RR2,F22,BI,IFAULT) APOW(IIT,IIS)-I-ERR NBIND(IIT)-RNTB NBOB(IIS)-0 CONTINUE NB INT_S_E-I
ALPHALEV-I -AI2?HA WRITE(*, 6015) NBTREAT,ALPHALEV, SE 6015 FORMAT(IX,//,' POWER OF AOV F-TEST',/, + ................................. ' / , + ' NBTREAT-',FB.0,' ALPHA-',FB.2,' SE-',FI0.3) IF(IDAT.EQ.I) WRITE(*,6020) DELTA IF(IDAT.EQ.2) WRITE(*,6025) DELTA_SE 6020 FORMAT(IX,//,' DIFFERENCE TO BE DETECTED -',F8.3,/, + .................................. , ) 6025 FORMAT(IX,//,' RATIO DELTA/SE -',F8.3,/, + .................................. , ) WRITE(*,6005) 6005 FORMAT(//,' INDIV. I POWER',/, + ' - ..... I............... ') DO 150 IIT-I,NIT WRITE(*,6000) NBIND(IIT),(APOW(IIT,IIS),IIS-I,NBINT_SE) 150 CONTINUE TXT-'ANOTHER DIFFERENCE (YES-I, NO-O) ?:' %/RITE(*,*) TXT RFAD(*,*) IAUT IF(IAUT.EQ.I) THEN IF(ITREAT.EQ.I) THEN IF(IDAT.EQ.I) THEN TXT-'DIFFERENCE TO BE DETECTED 7"~ WRITE(*,*) TXT READ (*,*) DELTA DELTA SE-DELTA/SE GO TO I01 ELSE TXT-'RATIO DELTA/SE ?:' WRITE(*,*) TXT READ (*,*) DELTA SE GO TO I01 ENDIF ELSE TXT-'DIFFERENCE TO BE DETECTED ? WRITE(*,*) TXT READ (*,*) DELTA DELTA SE-DELTA/SE OO TO 101 ENDIF ELSE TXT-'ANOTNER CALCULATION (I) OR EXIT (0) ?:' WRITE(*,*) TXT READ(*,*) IAUT IF(IAUT.EQ.O) GO TO 30000 IF(IAUT.EQ.I) GO TO I ENDIF STOP C...
HIERARCHICAL ANALYSIS OF VARIANCE
20000
CONTINUE TXT-'PI2kNNING (i) OR POWER CALCULATION WRITE(*,*) TXT READ(*,*) ITREAT
28
(2) ?:'
190 IF(ITREAT.NE.I.AND.ITREAT.NE.2) GO TO 28 TXT-'DIFFERENCE TO BE DETECTED ?:' WRITE(*,*) TXT READ (*,*) DELTA IF(ITHEAT. EQ. I) THEN TXT-'VALUES OF MSR, MSB (i) OR ESTIMATES SR, SB (2) ?:' WRITE(*,*) TXT READ(*,*) IDAT IF(IDAT.NE.I.AND.IDAT.NE.2) GO TO 29 IF(IDAT. EQ. i) THEN TXT- ' MSR ? : ' WRITE(*,*) TXT READ (*,*) MSR SR-SQRT (MSR) TXT-' MSB ?:' WRITE (*,*) TXT READ(*,*) MSB TYPE *, 'DEGREES OF FREEDOM FOR SB ?:' READ(*,*) NBSR IF(MSB-MSR. LE.O) SB-O IF(MSB-MSR.GT. O) SB-SQRT ( (MSB -MSR)/NBSR) ELSE TXT-'ESTIMATE SB ?:' WRITE(*,*) TXT READ (*,*) SB TXT-'ESTIMATE SR ?:' WRITE(*,*) TXT READ (*,*) SR ENDIP ELSE TXT- ' MSR ? : ' WRITE(*,*) TXT READ (*,*) MSR SR-SQRT(MSR) TYPE *, 'NI/MBER OF OBSERVATIONS IN SUBSAMPLE ?: ' READ(*,*) NBSR TR-NBSR TXT-' MSB ?:' WRITE (*,*) TXT READ(*,*) MSB TXT-'SAMPLE SIZE ?:' WRITE(*,*) TXT READ(*,*) TB IF(MSB-MSR.LE. O) SB-O IF(MSB-MSR.GT. O) SB-SQRT ( (MSB -MSR )/NBSR ) ENDIF IF(ITREAT.EQ. i) THEN TXT-'MINIMITM SUBSAMPLE SIZE 7:' WRITE(*,*) TXT READ(*,*) TR TXT-'MAXIMUM SUBSAMPLE SIZE ?:' WRITE(*,*) TXT READ(*,*) THMAX TYPE *,'STEP SIZE FOR THE SUBSAMPLE (NB. OBS.)?:' READ(*,*) INT_S_E NB_INT_S E- (TRMAX -TR)/INT_S_E+ 1 TXT-'MINIMUM SAMPLE SIZE (NB. INDIV.) ?:' WRITE(*,*) TXT READ(*,*) TB TXT--'MAXIMUM SAMPLE SIZE (NB. INDIV.) ?: ' WRITE(*,*) TXT READ (*,*) TBMAX TXT-'STRP SIZE ( NB. INDIV.) ?:' WRITE(*,*) TXT READ ( * , * ) INT NIT-(TBMAX- TB)/INT +I IF(NIT.LT. i) NIT-1 ELSE NIT-1 NB INT_S E-I ENDIF
C... COMPUTE THE POWER FOR INCREASING NB. OF OBS. 201
P-(NBTREAT- i)/2. FI-2*P DO 200 IIT-I,NIT RNTB-TB+ (IIT - I)*INT DO 300 IIS-I,NB INT_S E RTR-TR+ (I I S -I )*INT_S_E S_ETA-SQRT (SB* S B+S R*SR/RTR ) PHI-DELTA/S_ETA*SQRT (RNTB/2/NBTREAT ) LAMED-PHI*PHI* (FI+I ) Q-NBTREAT* (RNTB -1 )/2. F2-2*Q BETA- ALOGAM(p, IFAULT) + ALOGAM(Q, IFAULT) VX-XINBTA(P, Q, BETA, ALPHA, IFAULT) FALP-P2/FI/( 1 -VX) *VX
C
Power approximation
algorithm
- ALOGAM(P + Q, IFAULT)
(P~tnaik)
RK- (FI+LAMBD)/FI RR- (FI+LAMBD) * (FI+LAMBD) / (FI+2*LAMBD) XI-RR/F2 *FALF/RK/( RR/F2 *PALF/RK+I ) RR2-RR/2 F22-F2/2 BI-ALOGAM (RR2, IF)+ALOGAM(F22, IF) -ALOGAM (RR2+F22, IF) ERR-BETAIN (Xl, RR2 ,F22 ,BI, IFAULT) APOW(IIT ,IIS)-I- ERR NBOB ( I I S )-RTR 300 CONTINUE NBIND(IIT)-RNTB 200 CONTINUE TYPE *,' ' IF(IDAT.EQ.I) WRITE(*,6050) NBTREAT,I-ALPHA,SR,SB,NBSR IF(IDAT.EQ.2) WRITE(*~6051) NBTREAT,I-ALPHA~SR~SB 6050 FORMAT(IX,//,' POWER of HIERARCHICAL AOV ',/~ + .................................. ' /, + ' NBTREAT-',F5,O,' ALPHA-',F5.2,' SR-',FIO.3,' SB-',FI0.3, + ' NBSR-' ,FI0.3) 6051 FORMAT(IX,//,' POWER of HIERARCHICAL AOV ',/, + .................................. ' / , +
' NBTRHAT-',F5.0,' ALPHA-',FS.2,' SR-',FI0.3,' SB-',FI0.3) WRITE(*,6020) DELTA WRITE(*,6010) (NBOB(IIS), IIS-I ,NB INT_S_E) DO aO0 IIT-I,NIT WRITE(*, 6000) NBINU(I IT) ,(APOW(IIT, IIS), IIS-I ,NB_INT_S_E) 400 CONTINUE TXT-'ANOTHER DIFFERENCE (YES-I, NO-O) ?:' WRITE(*,*) TXT READ(*,*) IAUT IF(IAUT. EQ. i) THEN TXT-'DIFFERENCE TO BE DETECTED ?:' WRITE(*,*) TXT R F ~ (*,*) DELTA GO TO 201 ELSE TXT-'ANOTHER CALCULATION (i) OR EXIT (O) ?;' WRITE(*,*) TXT READ(*,*) IAUT IF(IAUT.EQ.O) GO TO 30000 IF(IAUT.EQ.I) GO TO I ENDIF 6000 FORMAT(IX,I5,' I ',17(F7.2)) 6010 FORMAT(//' INDIV. [ N~ER OF OBSERVATIONS IN SUBSAMPLE' ,/, + ' - ..... 1 ........................................... + .............................
'/,
+ ( 6 X , ' I ',1717),/ + [ ....................................... + ............................. ,) 30000 CONTINUE END