Journal of Statistical Planning and Inference 136 (2006) 1166–1180
www.elsevier.com/locate/jspi
Sample size determination for 2-step studies with dichotomous response

Man-Lai Tang (a,∗), Nian-Sheng Tang (b), Vincent J. Carey (c)
a Department of Mathematics, Hong Kong Baptist University, Kowloon, Hong Kong
b Research Center of Applied Statistics, Yunnan University, Kunming 650091, PR China
c Channing Laboratory, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA

Received 18 November 2003; accepted 14 July 2004
Available online 28 October 2004
Abstract

In this article, we consider problems with correlated data that can be summarized in a 2 × 2 table with a structural zero in one of the off-diagonal cells. Data of this kind sometimes appear in infectious disease studies and two-step procedure studies. We propose two kinds of approximate sample size formulas, based on the rate ratio, for comparing the marginal and conditional probabilities in a correlated 2 × 2 table with structural zero. The first type of formula is derived to guarantee a pre-specified power of a hypothesis test at a certain significance level, while the second type is developed to bound the width of a confidence interval at a specified confidence level. Our empirical studies confirm that the sample size formulas based on the log-transformation and score tests outperform the one based on the Wald test. We illustrate our methodologies with a real example from a two-phase treatment study.
© 2004 Elsevier B.V. All rights reserved.

MSC: Primary 62P10; Secondary 62P99

Keywords: Asymptotic inference; Correlated binary data; Rate ratio; Score test; Structural zero
1. Introduction

In studies of repeated exposures to infectious agents, researchers may be interested in assessing the effect of the primary infection on the risk of developing a secondary infection
Table 1
Observed frequencies for two-stage treatment regimes for severely ill patients

                            Phase I
Phase II            No improvement    Improvement
No improvement            6               —
Improvement              21               43
(Agresti, 1990). Subjects who do not suffer a primary infection from a certain infectious disease cannot develop a secondary infection from the same disease. If the outcome of interest is binary (e.g., infected or not infected), the resultant data can usually be summarized in a correlated 2 × 2 table with a structural zero in one of the off-diagonal cells.

Johnson and May (1995) reported another scenario, in which a treatment regimen for a particular disease was administered to qualified patients in two stages. After the initial stage, researchers recorded the patient's status as either improved or not improved. Patients were initiated to the second phase, and asked to return for assessment, only if the initial treatment provided no improvement. Since the researchers recorded the patient's response to the second phase of the treatment only if the initial response was negative, they introduced a structural zero in the cell of the summary table that corresponds to a positive response to the first treatment and a negative response to the second treatment. The corresponding data for patients whose initial disease status was severe are reported in Table 1.

Let π₁₁ be the probability associated with lack of improvement after both phases of the treatment and π₁₊ be the marginal probability of no improvement after the first phase. In this case, π₁₁/π₁₊ represents the conditional probability that a patient shows no improvement in the second phase of the treatment, given that the same patient shows no improvement in the first phase. One has evidence to believe that the second phase of treatment is beneficial if the rate ratio π₁₁/π₁₊² (= (π₁₁/π₁₊)/π₁₊) is less than one (Johnson and May, 1995). Other examples include two-stage treatment studies (Johnson and May, 1995) and two-step tuberculosis testing studies (Tang and Tang, 2002, 2003).
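With the Table 1 counts, these quantities are quick to compute (a small sketch; the variable names are ours):

```python
# Table 1 counts: a = no improvement in both phases, b = improvement only in
# phase II, c = improvement in both phases (structural zero elsewhere).
a, b, c = 6, 21, 43
n = a + b + c

p11 = a / n            # estimate of pi_11
p1p = (a + b) / n      # estimate of pi_1+ (no improvement in phase I)
cond = p11 / p1p       # P(no improvement in phase II | none in phase I)
rr = cond / p1p        # rate ratio pi_11 / pi_1+^2

print(round(cond, 3), round(rr, 3))   # → 0.222 0.576
```

A rate ratio below one, as here, is the direction that suggests the second phase is beneficial.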
Statistical tools for analyzing correlated 2 × 2 tables with structural zero have been developed by Lui (1998, 2000), Tang and Tang (2002, 2003), and Tang et al. (2004). Currently, comparative measures under study include the risk/rate difference (RD) and the risk/rate ratio (RR). Lui (1998) investigated confidence interval estimators for the risk ratio based on the Wald test, the logarithmic transformation test and a test based on Fieller's theorem. Asymptotic confidence intervals obtained from the logarithmic transformation test were found by simulation to consistently outperform the other two interval procedures in terms of coverage probability and expected width. Lui (2000) then studied confidence interval estimators for the simple risk difference. Empirical study showed that the confidence interval estimator based on the likelihood ratio test consistently performed well in various situations. Tang and Tang (2003) revisited the work by Lui (2000) and derived the score statistic for testing the null hypothesis of non-zero risk difference. Their simulation study showed that the score-test-based and likelihood-ratio-test-based confidence interval estimators perform equally well in most small-sample settings. Most
importantly, the score statistic is well-defined (except for one scenario) in small-sample designs, while the likelihood ratio statistic can often be undefined for very sparse data sets. Tang et al. (2004) reported similar observations for the score-type test statistic for non-unity risk ratio. In addition, they found that confidence interval estimators based on the score-type statistic generally possess shorter expected confidence width than those based on the Wald and log-transformation statistics. However, none of the above cited work proposed sample size formulas for design purposes. In Section 2, based on the Wald, log-transformation and score test statistics for the risk ratio, we investigate two different approaches to sample size determination: the significance test approach and the confidence interval approach. The significance test approach produces sample size formulas that guarantee a pre-specified power of a hypothesis test at a certain significance level, while the confidence interval approach produces sample size formulas that bound the width of a confidence interval at a specified confidence level. We then investigate the accuracy of the proposed sample size formulae under different settings in Section 3. We demonstrate our methodologies with a real example in Section 4. A brief discussion is given in Section 5.
2. Approximate sample size formulas

Suppose a sample of n patients is recruited to a two-step treatment regimen for a particular disease. The regimen operates in the following manner. All patients are first classified according to whether they show improvement in the initial phase of the treatment. Only those patients who show no improvement in the initial phase are initiated to the second phase. By definition, patients who demonstrate improvement in the initial phase should show improvement in the second phase, and the study is discontinued for patients of this kind. As a result, only three possible outcomes can be observed, and the results can be summarized by the following incomplete 2 × 2 table (Table 2). Let 0 < π_ij < 1 (for (i, j) = (1, 1), (1, 2), and (2, 2)) denote the probability of the corresponding cell, with π₁₊ = π₁₁ + π₁₂ and π₁₊ + π₂₂ = 1. Here, a, b, and c are the corresponding cell frequencies and n = a + b + c. Lui (1998) adopted the ratio of the conditional risk of secondary infection, given the primary infection, to the risk of primary infection (RR = π₁₁/π₁₊²) to gauge the magnitude of the effect of the primary infection on the likelihood of developing the secondary infection.
Table 2
Two-phase treatment regimen

                            Second phase
Initial phase       No improvement    Improvement    Total
No improvement        a (π₁₁)           b (π₁₂)      a + b (π₁₊)
Improvement             —               c (π₂₂)      c (π₂₂)
In this article, we consider testing the following hypotheses:

    H₀: RR = δ₀  versus  H₁: RR ≠ δ₀,    (1)

where δ₀ is a specified quantity. The Wald statistic, the logarithmic transformation statistic (see Lui, 1998) and the score statistic (see Appendix for derivation) for testing H₀, and their associated confidence interval estimators for RR, are summarized as follows:

(i) Wald test statistic

    T_W(a, b; δ₀) = [na − δ₀(a + b)²] / √(na(n − a));

the corresponding asymptotic 100(1 − α)% test-based confidence interval for RR is given by

    [max{R̂R − z₁₋α/₂ σ̃₁/√n, 0}, R̂R + z₁₋α/₂ σ̃₁/√n],

where R̂R = π̃₁₁/π̃₁₊², σ̃₁² = π̃₁₁(1 − π̃₁₁)/π̃₁₊⁴, π̃₁₁ = a/n, π̃₁₊ = (a + b)/n, and z₁₋α/₂ is the 100(1 − α/2) percentile of the standard normal distribution.

(ii) Logarithmic transformation test statistic

    T_L(a, b; δ₀) = n^{1/2} [ln(na) − 2 ln(a + b) − ln(δ₀)] / √((n − a)/a);

the corresponding asymptotic 100(1 − α)% test-based confidence interval for RR is given by

    [exp{ln(R̂R) − z₁₋α/₂ σ̃₂/√n}, exp{ln(R̂R) + z₁₋α/₂ σ̃₂/√n}],

where σ̃₂² = (1 − π̃₁₁)/π̃₁₁, and R̂R, π̃₁₁ are as defined for the Wald test statistic.

(iii) Score test statistic

    T_S(a, b; δ₀) = [a − (a + b)δ₀π̂₁₊] / [(1 − δ₀π̂₁₊)π̂₁₊] · √((1 − δ₀π̂₁₊²)/(nδ₀)),

where π̂₁₊ = [B − √(B² − 4AC)]/(2A), with A = (n + a + b)δ₀, B = n + a + 2(a + b)δ₀, and C = 2a + b, (sometimes denoted as π̂₁₊(δ₀) to indicate its dependence on δ₀) is the maximum likelihood estimate of π₁₊ under the null hypothesis, which satisfies 0 ≤ π̂₁₊ ≤ min(1, 1/δ₀) (Tang et al., 2004). The confidence limits of the 100(1 − α)% score-test-based confidence interval for RR = π₁₁/π₁₊² can be shown to be the two roots (in δ) of the quadratic equation

    A₁δ² − B₁δ + C₁ = 0    (2)

with A₁ = (a + b)²π*₁₊²(1 − π*₁₁π*₁₊), B₁ = 2a(a + b)π*₁₊(1 − π*₁₁π*₁₊) + nχ²₁,α π*₁₊²(1 − π*₁₁)², C₁ = a²(1 − π*₁₁π*₁₊), where π*₁₁ = δ(π*₁₊)² and π*₁₊ = π̂₁₊(δ), and χ²₁,α is the upper α-percentile of the central χ² distribution with one degree of freedom. T_W, T_L and T_S are all asymptotically standard normal under H₀. It is noteworthy that T_W and T_L are undefined whenever a = 0 or a = n, while T_S is undefined only when a = b = 0. We also notice that Lui (1998) proposed another test statistic based on Fieller's theorem.
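The three statistics can be coded directly from these formulas (a sketch; the function names are ours, and the small-sample 0.5 cell adjustment used later in the empirical study is omitted here):

```python
from math import erf, log, sqrt

def norm_sf2(t):
    """Two-sided normal p-value, 2(1 - Phi(|t|))."""
    return 1.0 - erf(abs(t) / sqrt(2.0))

def pi1p_hat(a, b, n, d0):
    """Null MLE of pi_1+: smaller root of the quadratic in the text."""
    A = (n + a + b) * d0
    B = n + a + 2 * (a + b) * d0
    C = 2 * a + b
    return (B - sqrt(B * B - 4 * A * C)) / (2 * A)

def T_W(a, b, n, d0):   # Wald statistic; undefined for a = 0 or a = n
    return (n * a - d0 * (a + b) ** 2) / sqrt(n * a * (n - a))

def T_L(a, b, n, d0):   # log-transformation statistic; needs 0 < a < n
    return sqrt(n) * (log(n * a) - 2 * log(a + b) - log(d0)) / sqrt((n - a) / a)

def T_S(a, b, n, d0):   # score statistic; undefined only for a = b = 0
    p = pi1p_hat(a, b, n, d0)
    return (a - (a + b) * d0 * p) / ((1 - d0 * p) * p) * sqrt((1 - d0 * p * p) / (n * d0))

# Two-phase treatment data of Section 4: a = 6, b = 21, n = 70, delta_0 = 1
a, b, n, d0 = 6, 21, 70, 1.0
for T in (T_W, T_L, T_S):
    print(T.__name__, round(norm_sf2(T(a, b, n, d0)), 3))
# → T_W 0.059, T_L 0.158, T_S 0.111
```

These reproduce the two-sided p-values reported for the numerical example in Section 4.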
Since this statistic could frequently be undefined in small to moderate sample designs, we omit consideration of it.

We now consider two different approaches to approximate sample size calculation based on the above statistics. When an investigator takes a hypothesis testing approach to the RR analysis, (s)he may want to determine the necessary number of subjects to be recruited so that a given test attains a desired power level, say 1 − β, while controlling the type I error rate at the nominal level, say α. We will call this a power-controlled sample size determination. On the other hand, one may also want to calculate the necessary sample size such that the width of the resultant 100(1 − α)% test-based confidence interval for the RR does not exceed a pre-specified quantity, say 2ω. We will call this a CI-width-controlled sample size determination.

We first discuss the derivation of the power-controlled sample size for the Wald test statistic. Let f(x, y) = x − δ₀(x + y)² and Δ = π₁₁ − δ₀π₁₊². By noting that Δ̂ = f(a/n, b/n) = [na − δ₀(a + b)²]/n² and applying the delta method, we can easily show that

    E[Δ̂ | H₁] ≈ (δ₁ − δ₀)π₁₊²  and
    Var[Δ̂ | H₁] ≈ π₁₊²[δ₁ − 4δ₀(δ₁ − δ₀)π₁₊ − (δ₁ − 2δ₀)²π₁₊²]/n.

As n goes to infinity,

    T_W1 = [na − δ₀(a + b)² − n²(δ₁ − δ₀)π₁₊²] / [n^{3/2} π₁₊ √(δ₁ − 4δ₀(δ₁ − δ₀)π₁₊ − (δ₁ − 2δ₀)²π₁₊²)]

is asymptotically standard normal under the alternative hypothesis H₁: π₁₁/π₁₊² = δ₁ (δ₁ ≠ δ₀). The asymptotic power function of the Wald statistic T_W at nominal level α can then be shown to be

    Pr[|T_W| ≥ z₁₋α/₂ | RR = δ₁] ≈ 1 − Φ(u_W),

where

    u_W = [z₁₋α/₂ √(δ₁(1 − δ₁π₁₊²)) − √n (δ₁ − δ₀)π₁₊] / √(δ₁ − 4δ₀(δ₁ − δ₀)π₁₊ − (δ₁ − 2δ₀)²π₁₊²)

and Φ(·) is the standard normal distribution function. To achieve a power of 1 − β, we arrive at the approximation u_W ≈ −z₁₋β. Hence, the approximate sample size required for the Wald test statistic T_W to achieve the desired power 1 − β at RR = δ₁ is given by

    n_TW = { [z₁₋α/₂ √(δ₁(1 − δ₁π₁₊²)) + z₁₋β √(δ₁ − 4δ₀(δ₁ − δ₀)π₁₊ − (δ₁ − 2δ₀)²π₁₊²)] / [(δ₁ − δ₀)π₁₊] }².    (3)
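Formula (3) is straightforward to evaluate; a small sketch (function name is ours), using the design values of the Section 4 example as a check:

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_wald(pi1p, d0, d1, alpha=0.05, beta=0.20):
    """Power-controlled sample size (3) for the Wald statistic."""
    z = NormalDist().inv_cdf
    za, zb = z(1 - alpha / 2), z(1 - beta)
    v1 = d1 * (1 - d1 * pi1p ** 2)
    v2 = d1 - 4 * d0 * (d1 - d0) * pi1p - (d1 - 2 * d0) ** 2 * pi1p ** 2
    return ceil(((za * sqrt(v1) + zb * sqrt(v2)) / ((d1 - d0) * pi1p)) ** 2)

# pi_1+ = 0.386, d0 = 1.0, d1 = 0.576, power 0.90, alpha = 0.05
print(n_wald(0.386, 1.0, 0.576, alpha=0.05, beta=0.10))   # → 264
```

The result matches the value n_TW = 264 reported in Section 4.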
Similarly, the asymptotic power function of the logarithmic transformation statistic T_L at nominal level α can be shown to be

    Pr[|T_L| ≥ z₁₋α/₂ | RR = δ₁] ≈ 1 − Φ(u_L),

where u_L = z₁₋α/₂ − √(nπ₁₁) ln(δ₁/δ₀)/√(1 − π₁₁). Hence, the approximate sample size required for the logarithmic transformation test statistic T_L to achieve the desired power 1 − β at RR = δ₁ is given by

    n_TL = [ (z₁₋α/₂ + z₁₋β) / ln(δ₁/δ₀) ]² · (1 − δ₁π₁₊²)/(δ₁π₁₊²).    (4)

To derive the power-controlled sample size for the score test statistic T_S, let π̄₁₊ be the asymptotic limit of π̂₁₊ for sufficiently large n under the alternative hypothesis H₁: RR = π₁₁/π₁₊² = δ₁. It is not difficult to show that

    π̄₁₊ = [B₂ − √(B₂² − 4A₂C₂)] / (2A₂),

where A₂ = (1 + π₁₊)δ₀, B₂ = 1 + δ₁π₁₊² + 2π₁₊δ₀, and C₂ = π₁₊(1 + δ₁π₁₊). Then the asymptotic power of the score test at level α is

    Pr[T_S² ≥ χ²₁,α | H₁] = Pr[χ²₁(λ) ≥ χ²₁,α],

where χ²₁(λ) denotes the non-central χ² distribution with non-centrality parameter (see Appendix for derivation)

    λ = [(1 − δ₀π̄₁₊²)π₁₊²] / [δ₀π̄₁₊²(1 − δ₀π̄₁₊)²] · [(n − 1)(δ₁π₁₊ − δ₀π̄₁₊)² + δ₀²π̄₁₊²/π₁₊ + (1 − 2δ₀π̄₁₊)δ₁],

which is obtained from the asymptotic expectation of T_S² under H₁. It is noteworthy that the non-centrality parameter λ is linear in the desired sample size n. Hence, the approximate sample size n_TS required for the score test statistic T_S to achieve the desired power 1 − β at RR = δ₁ can be readily obtained from the following equation:

    χ²₁,₁₋β(λ) = χ²₁,α,    (5)

where χ²₁,₁₋β(λ) is the upper 1 − β percentile of the non-central χ² distribution with non-centrality parameter λ. In practice, we usually set δ₀ to 1.0. We note that π₁₊ is the probability of a patient showing no improvement in the initial phase of the treatment, and hence can easily be determined from previously available data sets based on the original one-step treatment regimen.

We next consider the derivation of the CI-width-controlled sample size. First, we note that the half width of the score-test-based confidence interval is given by

    ω = √(B₁² − 4A₁C₁) / (2A₁),
where A₁, B₁, C₁ are as defined for expression (2) above; that is,

    ω = √{(1 − π*₁₁)π*₁₊ [4na(a + b)(1 − π*₁₁π*₁₊)χ²₁,α + n²(χ²₁,α)²π*₁₊(1 − π*₁₁)²]} / [2(a + b)²π*₁₊(1 − π*₁₁π*₁₊)],

where π*₁₁, π*₁₊ are as defined for expression (2) above. In this case, the asymptotic limit of the right-hand side of the above equation can be expressed as

    ω = √{(1 − π̄*₁₁)π̄*₁₊ [4nπ₁₁π₁₊(1 − π̄*₁₁π̄*₁₊)χ²₁,α + (χ²₁,α)²π̄*₁₊(1 − π̄*₁₁)²]} / [2nπ₁₊²π̄*₁₊(1 − π̄*₁₁π̄*₁₊)],

where π̄*₁₁ = (π̄*₁₊)² and π̄*₁₊ is the asymptotic limit of π*₁₊ for large n and given π₁₊. In this case, we have

    π̄*₁₊ = [B₃ − √(B₃² − 4A₃C₃)] / (2A₃),

with A₃ = 1 + π₁₊, B₃ = 1 + π₁₊² + 2π₁₊, and C₃ = π₁₊(1 + π₁₊). Hence, the desired sample size that controls the half-100(1 − α)%-confidence-width at ω based on T_S is given by

    n_CS = [B₄ + √(B₄² + A₄C₄)] / (2A₄)    (6)

with A₄ = 4π₁₊⁴(π̄*₁₊)²(1 − π̄*₁₁π̄*₁₊)²ω², B₄ = π₁₁π₁₊(1 − π̄*₁₁π̄*₁₊)(1 − π̄*₁₁)²χ²₁,α π̄*₁₊, and C₄ = (χ²₁,α π̄*₁₊)²(1 − π̄*₁₁)⁴. For the Wald (T_W) and logarithmic transformation (T_L) test statistics, the desired sample sizes that control the half-100(1 − α)%-confidence-width at ω are respectively given by

    n_CW = z²₁₋α/₂ π₁₁(1 − π₁₁) / (ω²π₁₊⁴)           when π₁₁/π₁₊² ≥ ω,
    n_CW = z²₁₋α/₂ π₁₁(1 − π₁₁) / (2ωπ₁₊² − π₁₁)²    otherwise,    (7)

(the second branch accounting for truncation of the Wald lower limit at zero), and

    n_CL = z²₁₋α/₂ (1 − π₁₁)/π₁₁ · [log(ωπ₁₊²/π₁₁ + √(1 + ω²π₁₊⁴/π₁₁²))]⁻².    (8)
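Formulas (4), (5) and (8) can be evaluated with a short script (function names are ours). For (5), no special noncentral-χ² routine is needed: with one degree of freedom, Pr[χ²₁(λ) ≥ c] = 1 − Φ(√c − √λ) + Φ(−√c − √λ). The check values reproduce the Section 4 example:

```python
from math import ceil, log, sqrt
from statistics import NormalDist

PHI = NormalDist().cdf
CHI2_1_ALPHA = 3.841459  # upper 5% point of the central chi-square, 1 df

def n_logtrans(pi1p, d0, d1, alpha=0.05, beta=0.10):
    """Power-controlled sample size (4) for the log-transformation statistic."""
    z = NormalDist().inv_cdf
    pi11 = d1 * pi1p ** 2
    return ceil(((z(1 - alpha / 2) + z(1 - beta)) / log(d1 / d0)) ** 2
                * (1 - pi11) / pi11)

def pi1p_bar(pi1p, d0, d1):
    """Asymptotic limit of the null MLE of pi_1+ under H1 (Section 2)."""
    A = (1 + pi1p) * d0
    B = 1 + d1 * pi1p ** 2 + 2 * pi1p * d0
    C = pi1p * (1 + d1 * pi1p)
    return (B - sqrt(B * B - 4 * A * C)) / (2 * A)

def score_power(n, pi1p, d0, d1, c=CHI2_1_ALPHA):
    """Asymptotic power of the score test via the noncentrality lambda."""
    pb = pi1p_bar(pi1p, d0, d1)
    coef = (1 - d0 * pb ** 2) * pi1p ** 2 / (d0 * pb ** 2 * (1 - d0 * pb) ** 2)
    lam = coef * ((n - 1) * (d1 * pi1p - d0 * pb) ** 2
                  + d0 ** 2 * pb ** 2 / pi1p + (1 - 2 * d0 * pb) * d1)
    return 1 - PHI(sqrt(c) - sqrt(lam)) + PHI(-sqrt(c) - sqrt(lam))

def n_score(pi1p, d0, d1, beta=0.10):
    """Smallest n whose asymptotic score-test power reaches 1 - beta, eq. (5)."""
    n = 2
    while score_power(n, pi1p, d0, d1) < 1 - beta:
        n += 1
    return n

def n_cl(pi1p, d1, omega, alpha=0.05):
    """CI-width-controlled sample size (8) for the log-transformation interval."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    pi11 = d1 * pi1p ** 2
    r = omega * pi1p ** 2 / pi11          # omega / RR
    return ceil(z ** 2 * (1 - pi11) / pi11 / log(r + sqrt(1 + r * r)) ** 2)

print(n_logtrans(0.386, 1.0, 0.576),   # → 368
      n_score(0.386, 1.0, 0.576),      # → 255
      n_cl(0.386, 0.576, 0.5))         # → 67
```

These match the values n_TL = 368, n_TS = 255 and n_CL = 67 reported in Section 4.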
3. Empirical study

We first examine the accuracy of the three approximate power-controlled sample size formulas (i.e., (3), (4) and (5)). To do so, we have to examine the performance of the three sample size formulae at various values of π₁₊, δ₀, δ₁, 1 − β (i.e., the designated power) and α (i.e., the nominal level). Owing to space limitations, we report in Fig. 1 representative results for π₁₊ = 0.5; δ₀ = 0.9, 1.0, and 1.1; δ₁ = 0.5(0.1)1.5; 1 − β = 0.8; and α = 0.05. Here, we add 0.5 to each cell when a = 0 or b = 0 for T_W and T_L, and add 0.5 to each cell when a = b = 0 for T_S, in all calculations.
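The exact power and size computations used in this section enumerate the whole trinomial sample space. A sketch (helper names are ours; the Wald statistic and formula (3) serve as the illustration, with the 0.5 cell adjustment noted above):

```python
from math import ceil, exp, lgamma, log, sqrt
from statistics import NormalDist

def trinom_logpmf(a, b, n, p11, p12):
    """log Pr(a, b) for the trinomial cell counts (a, b, n - a - b)."""
    c = n - a - b
    return (lgamma(n + 1) - lgamma(a + 1) - lgamma(b + 1) - lgamma(c + 1)
            + a * log(p11) + b * log(p12) + c * log(1.0 - p11 - p12))

def wald_reject(a, b, n, d0, z):
    """|T_W| >= z, adding 0.5 to each cell when a = 0 or b = 0."""
    if a == 0 or b == 0:
        a, b, n = a + 0.5, b + 0.5, n + 1.5
    t = (n * a - d0 * (a + b) ** 2) / sqrt(n * a * (n - a))
    return abs(t) >= z

def exact_rejection_prob(n, pi1p, d_true, d0, alpha=0.05):
    """Exact Pr(reject H0: RR = d0) when the true rate ratio is d_true."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p11, p12 = d_true * pi1p ** 2, pi1p * (1 - d_true * pi1p)
    return sum(exp(trinom_logpmf(a, b, n, p11, p12))
               for a in range(n + 1) for b in range(n - a + 1)
               if wald_reject(a, b, n, d0, z))

# Design: pi_1+ = 0.5, d0 = 1.0, d1 = 0.6, power 0.8, alpha = 0.05.
pi1p, d0, d1 = 0.5, 1.0, 0.6
z = NormalDist().inv_cdf
v1 = d1 * (1 - d1 * pi1p ** 2)
v2 = d1 - 4 * d0 * (d1 - d0) * pi1p - (d1 - 2 * d0) ** 2 * pi1p ** 2
n = ceil(((z(0.975) * sqrt(v1) + z(0.80) * sqrt(v2)) / ((d1 - d0) * pi1p)) ** 2)

power = exact_rejection_prob(n, pi1p, d1, d0)   # exact power at d1
size = exact_rejection_prob(n, pi1p, d0, d0)    # exact size at d0
print(n, round(power, 3), round(size, 3))
```

The same enumeration with T_L or T_S in place of T_W yields the remaining curves of the kind shown in Figs. 2 and 3.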
Fig. 1. Plot of sample sizes/100 against δ₁, when δ₀ = 1.0, π₁₊ = 0.5, and α = 0.05.

Fig. 2. Plot of exact powers against δ₁, when δ₀ = 1.0, π₁₊ = 0.5, and α = 0.05.

Fig. 3. Plot of true sizes against δ₁, when δ₀ = 1.0, π₁₊ = 0.5, and α = 0.05.
Based on the sample sizes (i.e., n_TS, n_TW, and n_TL) reported in Fig. 1, we gauge the accuracy of these formulas by computing the respective exact powers at δ₁ and actual sizes at δ₀. For any test statistic, the exact power at δ₁ is computed as

    Σ_{(a,b)∈R} f(a, b | π₁₁, π₁₂; δ₁),

where (a, b) is the sampling point; (π₁₁, π₁₂) is determined by the alternative hypothesis (with RR = δ₁); f(a, b | π₁₁, π₁₂; δ₁) is the probability mass function of (a, b) under the alternative hypothesis; and R = {(a, b): |T_j(a, b; δ₀)| ≥ z₁₋α/₂} (j = S, W, L) is the critical region. For the calculation of the actual size, we simply replace the alternative by the null hypothesis in the above formula (i.e., δ₁ = δ₀). Here, we only plot the corresponding exact powers and true sizes (against δ₁ = 0.5(0.1)1.5) at δ₀ = 1.0 in Figs. 2 and 3, respectively. In general, all power-controlled sample size formulas (i.e., n_TS, n_TW, and n_TL) provide fairly accurate sample size estimates, in the sense that the exact power (see Fig. 2) based on the estimated sample size is usually quite close to the nominal power (e.g., 80% ± 5%). Nevertheless, actual sizes (see Fig. 3) vary considerably among the three test procedures. For the Wald
Fig. 4. (a) Plot of sample sizes/100 against ω, when δ = 0.3 and π₁₊ = 0.2. (b) Plot of sample sizes/100 against ω, when δ = 0.5 and π₁₊ = 0.2. (c) Plot of sample sizes/100 against ω, when δ = 0.8 and π₁₊ = 0.2.
test statistic, its actual size may exceed the pre-assigned nominal significance level. For example, the actual size (i.e., 6.03%) is 20 per cent greater than the designated nominal significance level (i.e., 5%) at π₁₊ = 0.5, δ₀ = 1.0, and δ₁ = 0.6. In this regard, we consider the Wald test an unstable test. On the other hand, the actual sizes of the log-transformation and score test statistics are seldom very far from the nominal significance level (e.g., within ±5% of the nominal significance level). For this reason, we consider the log-transformation and score tests reliable for testing the null hypothesis of non-unity risk ratio. Tang et al. (2004) compared the performance of various test-based confidence interval estimators in terms of coverage probability and expected confidence width. They reported that the coverage probabilities of both the log-transformation and score-based confidence interval estimators generally fluctuate around the desired confidence level (1 − α) in a very narrow neighborhood, while the Wald confidence interval estimator fails to maintain the confidence level for moderate sample sizes. Moreover, the score-based confidence interval estimator usually possesses significantly shorter expected confidence interval width. This phenomenon is particularly obvious when n and π₁₊ are small. To close this section, we plot in Fig. 4 some approximate confidence-width-controlled sample sizes (i.e., n_CS, n_CW,
and n_CL) for some values of ω, π₁₊, and δ. In this case, the confidence level is set to 90%. In view of the above reasons, we would generally recommend the use of n_CS and n_CL in practice.

4. Numerical example

We revisit the two-stage treatment study described in Section 1, whose data are reported in Table 1. That is, a = 6, b = 21, and c = 43, with n = 70. In this example, one may want to determine whether improvement exists after the second phase given no improvement after the first phase. To do so, we may consider the rate ratio RR between the probability of no improvement after the second phase given no improvement after the first phase and the probability of no improvement after the first phase. To answer the question posed, we test the following hypotheses at α = 0.05:

    H₀: RR = 1.0  versus  H₁: RR ≠ 1.0.
In this case, we have R̂R = 0.576, and the p-values corresponding to the Wald, log-transformation, and score test statistics are 0.059, 0.158, and 0.111, respectively. The respective 95% confidence intervals are [0.135, 1.017], [0.268, 1.238], and [0.254, 1.111]. Using any of the proposed test procedures, the data do not support the claim that the probability of no improvement after the second phase given no improvement after the first phase differs from the probability of no improvement after the first phase. However, the Wald p-value seems to suggest a marginal rejection of the null hypothesis. This raises the question of whether the present study has a sufficiently large sample size for the existing test procedures to detect a non-unity rate ratio at the 0.05 nominal level with power, say, 0.90. To answer this question, we set π₁₊ = 0.386, δ₀ = 1.0, δ₁ = 0.576, 1 − β = 0.90 and α = 0.05. In this case, we have n_TW = 264, n_TL = 368, and n_TS = 255. Here, the power-controlled sample size based on the score test statistic is significantly smaller than that based on the log-transformation statistic. Suppose an investigator would like to adopt the confidence interval approach and to guarantee that the half width of the resultant 95% test-based confidence interval for RR is no greater than ω = 0.5. In this case, we obtain n_CW = 62, n_CL = 67, and n_CS = 39. Again, the CI-width-controlled sample size based on the score test statistic is significantly smaller than that based on the log-transformation statistic.

5. Conclusion

In this paper, we provide accurate sample size formulas based on score-type statistics. These formulas are derived to guarantee either (i) a pre-specified power level at a given nominal level, or (ii) a bounded CI width at a specified confidence level. Asymptotic test procedures and confidence interval estimators based on score-type statistics have consistently been shown to perform well in the literature (Newcombe, 1998a,b; Tang et al., 2003).
In this paper, we observe similar findings for the risk ratio in an incomplete 2 × 2 table with structural zero. In our empirical study, we find that the asymptotic score-type and log-transformation procedures possess actual type I error rates close to the pre-specified nominal level, with higher powers. Moreover, Tang et al. (2004) reported that the asymptotic score-type confidence
interval estimator generally possesses shorter expected width, with coverage probability close to the pre-assigned confidence level. We would therefore highly recommend the sample size formulas based on the score-type statistic.

Although there is a well-known duality relating the type I error rate of a test and the coverage probability of the associated confidence interval, a similar duality cannot be established between the power of the test and the width of the corresponding confidence interval (Bristol, 1989). In practice, the analytical approach (i.e., hypothesis testing vs. confidence interval construction) adopted to analyze the data should be driven by the goal of the study (i.e., testing vs. estimation). We therefore urge practitioners to be alert to the consistency between the approach used to determine the sample size and the approach to be used to analyze the data.

In this article, we consider only sample size formulae based on asymptotic theory. In some applications, owing to the limited availability of potential subjects and/or budget, one may only be able to afford recruiting a small number of subjects. In these cases, exact power analyses and small-sample statistical procedures are usually recommended. Tang and Tang (2002) considered exact unconditional procedures for the risk ratio in a correlated 2 × 2 table with structural zero. They found that the exact unconditional procedures could always control the type I error at or below the pre-chosen nominal level, while the corresponding approximate unconditional procedures could be more powerful, with type I error rates close to the pre-assigned nominal level. Although their comparisons were based solely on the Wald and logarithmic transformation test statistics, similar conclusions should apply to the score test statistic considered in this paper.
Acknowledgements

The authors would like to thank the Editor, Professor Andrew Gelman, and the referee for many valuable comments and suggestions. The first author would like to thank Hoi-Sze Chow for her encouragement during the preparation of the revision. The work of the first author was supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region (project CUHK 4371/04M). The work of the second author was supported by The Mathematical Tianyuan Younger Fund of the National Natural Science Foundation of China (10226005) and The Natural Science Foundation of Yunnan Province.

Appendix A.

(1) Derivation of the score test statistic T_S(a, b). Let δ = π₁₁/π₁₊² − δ₀. The hypotheses in (1) can now be rewritten as

    H₀: δ = 0  versus  H₁: δ ≠ 0.
Thus, π₁₁ = π₁₊²(δ + δ₀), π₁₂ = π₁₊[1 − π₁₊(δ + δ₀)], and π₂₂ = 1 − π₁₊. Therefore, the log-likelihood for the observed vector (a, b, c) is given by

    l = (2a + b) ln(π₁₊) + a ln(δ + δ₀) + b ln(1 − π₁₊(δ + δ₀)) + c ln(1 − π₁₊) + D,
where D is a constant that does not depend on the parameters π₁₊ and δ. The first- and second-order derivatives of the log-likelihood are then given by

    ∂l/∂δ = a/(δ + δ₀) − bπ₁₊/[1 − π₁₊(δ + δ₀)],
    ∂l/∂π₁₊ = (2a + b)/π₁₊ − b(δ + δ₀)/[1 − π₁₊(δ + δ₀)] − c/(1 − π₁₊),
    ∂²l/∂δ² = −a/(δ + δ₀)² − bπ₁₊²/[1 − π₁₊(δ + δ₀)]²,
    ∂²l/∂π₁₊² = −(2a + b)/π₁₊² − b(δ + δ₀)²/[1 − π₁₊(δ + δ₀)]² − c/(1 − π₁₊)²,

and

    ∂²l/∂δ∂π₁₊ = −b/[1 − π₁₊(δ + δ₀)]².    (A.1)

In addition, the Fisher information matrix with respect to the vector (π₁₊, δ) under H₀ is given by

    I = [ n(1 − δ₀π₁₊²)/[π₁₊(1 − π₁₊)(1 − δ₀π₁₊)]    nπ₁₊/(1 − δ₀π₁₊)
          nπ₁₊/(1 − δ₀π₁₊)                           nπ₁₊²/[δ₀(1 − δ₀π₁₊)] ].

Therefore, the score test statistic for the above hypotheses is given by

    T_S(a, b) = (∂l/∂δ) · √((I⁻¹)₂₂), evaluated at π₁₊ = π̂₁₊, δ = 0,
where π̂₁₊ is the maximum likelihood estimate of π₁₊ under H₀ (see the derivation below) and (I⁻¹)₂₂ is the (2, 2) element of the inverse of I.

(2) Derivation of the maximum likelihood estimate of π₁₊ (i.e., π̂₁₊).

(I) b = 0. By solving the equation ∂l/∂π₁₊ = 0 in (A.1) subject to the condition 0 ≤ π̂₁₊ ≤ min(1, 1/δ₀), we readily obtain π̂₁₊ = min(1/δ₀, 2a/(n + a)).

(II) b ≠ 0. We will show that π̂₁₊ is the smaller root of the equation

    f(x) = (n + a + b)δ₀x² − [n + a + 2(a + b)δ₀]x + (2a + b) = 0;    (A.2)

that is, π̂₁₊ = [B − √(B² − 4AC)]/(2A) with A = (n + a + b)δ₀, B = n + a + 2(a + b)δ₀, and C = 2a + b.

Proof. Since π̂₁₊ satisfies ∂l/∂π₁₊ = 0 in (A.1), it is readily seen that π̂₁₊ is a root of equation (A.2). Since 0 ≤ π̂₁₊ ≤ min(1, 1/δ₀), the appropriate root of f(x) = 0 must
also satisfy

    0 ≤ x ≤ 1 for δ₀ ≤ 1,    0 ≤ x ≤ 1/δ₀ for δ₀ > 1.    (A.3)
Case 1. δ₀ = 1. In this case, equation (A.2) possesses two roots:

    x₁ = 1  and  x₂ = (2a + b)/(n + a + b).

However, the equation ∂l/∂π₁₊ = 0 in (A.1) has the single root x₂. It follows from (A.3) that π̂₁₊ is the smaller root of f(x) = 0.

Case 2. δ₀ ≠ 1 with a + b ≠ n. When δ₀ > 1 we have

    f(0) = 2a + b > 0,  f(1) = (n − a − b)(δ₀ − 1) > 0,  and  f(1/δ₀) = b(1 − δ₀)/δ₀ < 0,

and these indicate that the smaller root of equation (A.2) satisfies 0 < x < 1/δ₀. When δ₀ < 1, we have

    f(0) > 0  and  f(1) < 0,

and thus the smaller root of Eq. (A.2) satisfies 0 < x < 1. Therefore, when b > 0 and a + b ≠ n, π̂₁₊ is the smaller root regardless of the value of δ₀.

Case 3. δ₀ ≠ 1 with a + b = n. When δ₀ > 1, we have

    f(0) > 0,  f(1) = 0,  f((n + a)/(2nδ₀)) = 0,  f(1/δ₀) < 0,  and  (n + a)/(2nδ₀) < 1/δ₀ < 1,

and thus it follows from (A.3) that π̂₁₊ = (n + a)/(2nδ₀), which is the smaller root of Eq. (A.2). When (n + a)/(2n) < δ₀ < 1, we have

    f(0) = 2a + b > 0,  f(1) = 0,  f((n + a)/(2nδ₀)) = 0,  f(1/δ₀) > 0,  and  (n + a)/(2nδ₀) < 1 < 1/δ₀,

and thus it follows from (A.3) that π̂₁₊ = (n + a)/(2nδ₀), which is the smaller root of Eq. (A.2). When δ₀ < (n + a)/(2n), the two roots are

    x₁ = 1  and  x₂ = (n + a)/(2nδ₀) > 1.

However, the equation ∂l/∂π₁₊ = 0 in (A.1) has no root in (0, 1), and the log-likelihood takes its maximum at π̂₁₊ = x₁ = 1, which is again the smaller root of equation (A.2).

Case 4. a = 0 and b = 0. In this case, T_S(a, b) is undefined.
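The case analysis can be verified numerically: for a handful of (a, b, n, δ₀) configurations, the closed-form root should agree with a brute-force grid maximization of the log-likelihood at δ = 0 (a verification sketch; the function names and test configurations are ours):

```python
from math import log, sqrt

def loglik0(p, a, b, c, d0):
    """Log-likelihood at delta = 0, dropping terms free of pi_1+."""
    out = (2 * a + b) * log(p)
    if b:
        out += b * log(1 - p * d0)
    if c:
        out += c * log(1 - p)
    return out

def pi1p_mle(a, b, n, d0):
    """Case (I) closed form for b = 0, else the smaller root of (A.2)."""
    if b == 0:
        return min(1 / d0, 2 * a / (n + a))
    A, B, C = (n + a + b) * d0, n + a + 2 * (a + b) * d0, 2 * a + b
    return (B - sqrt(B * B - 4 * A * C)) / (2 * A)

worst = 0.0
for a, b, n, d0 in [(6, 21, 70, 1.0), (6, 21, 70, 1.3), (10, 5, 40, 0.8), (12, 0, 30, 1.5)]:
    c = n - a - b
    hi = min(1.0, 1 / d0)                      # MLE constrained to (0, min(1, 1/d0))
    grid = (hi * k / 20001 for k in range(1, 20001))
    best = max(grid, key=lambda p: loglik0(p, a, b, c, d0))
    worst = max(worst, abs(best - pi1p_mle(a, b, n, d0)))
print(round(worst, 5))   # grid argmax agrees with the closed form
```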
(3) Derivation of the non-centrality parameter λ.

    E(T_S² | H₁) = E[ (a − (a + b)δ₀π̂₁₊)²(1 − δ₀π̂₁₊²) / (nδ₀(1 − δ₀π̂₁₊)²π̂₁₊²) | H₁ ]
                 ≈ E[ (a − (a + b)δ₀π̄₁₊)²(1 − δ₀π̄₁₊²) / (nδ₀(1 − δ₀π̄₁₊)²π̄₁₊²) | H₁ ]
                 = (1 − δ₀π̄₁₊²) / [nδ₀(1 − δ₀π̄₁₊)²π̄₁₊²] · E[((1 − δ₀π̄₁₊)a − δ₀π̄₁₊b)² | H₁].

Here, π̄₁₊ is the asymptotic limit of π̂₁₊ for sufficiently large n under the alternative hypothesis H₁. Noting that

    E(a | H₁) = nπ₁₁ = nδ₁π₁₊²,
    E(b | H₁) = nπ₁₂ = nπ₁₊(1 − δ₁π₁₊),  and
    Cov(a, b | H₁) = −nπ₁₁π₁₂ = −nδ₁π₁₊³(1 − δ₁π₁₊),

we have

    E[(1 − δ₀π̄₁₊)a − δ₀π̄₁₊b | H₁] = nπ₁₊(δ₁π₁₊ − δ₀π̄₁₊)  and
    Var[(1 − δ₀π̄₁₊)a − δ₀π̄₁₊b | H₁] = nπ₁₊²[δ₁(1 − 2δ₀π̄₁₊) + δ₀²π̄₁₊²/π₁₊ − (δ₁π₁₊ − δ₀π̄₁₊)²].

Therefore, it follows from E(ξ²) = Var(ξ) + [E(ξ)]² that

    E(T_S² | H₁) ≈ (1 − δ₀π̄₁₊²)π₁₊² / [δ₀π̄₁₊²(1 − δ₀π̄₁₊)²] · [(n − 1)(δ₁π₁₊ − δ₀π̄₁₊)² + δ₀²π̄₁₊²/π₁₊ + (1 − 2δ₀π̄₁₊)δ₁].
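The derivation can be checked by simulation: draw trinomial samples under H₁, average T_S², and compare with the asymptotic expectation above (a verification sketch; the function names and design values are ours):

```python
import random
from math import sqrt

def pi1p_hat(a, b, n, d0):
    """Null MLE of pi_1+ (smaller root of the quadratic of Section 2)."""
    A, B, C = (n + a + b) * d0, n + a + 2 * (a + b) * d0, 2 * a + b
    return (B - sqrt(B * B - 4 * A * C)) / (2 * A)

def T_S2(a, b, n, d0):
    """Squared score statistic T_S^2."""
    p = pi1p_hat(a, b, n, d0)
    return ((a - (a + b) * d0 * p) ** 2 * (1 - d0 * p * p)
            / (n * d0 * ((1 - d0 * p) * p) ** 2))

def noncentrality(n, pi1p, d0, d1):
    """Asymptotic E(T_S^2 | H1), i.e. the noncentrality parameter lambda."""
    A, B, C = (1 + pi1p) * d0, 1 + d1 * pi1p ** 2 + 2 * pi1p * d0, pi1p * (1 + d1 * pi1p)
    pb = (B - sqrt(B * B - 4 * A * C)) / (2 * A)   # limit of pi1p_hat under H1
    coef = (1 - d0 * pb ** 2) * pi1p ** 2 / (d0 * pb ** 2 * (1 - d0 * pb) ** 2)
    return coef * ((n - 1) * (d1 * pi1p - d0 * pb) ** 2
                   + d0 ** 2 * pb ** 2 / pi1p + (1 - 2 * d0 * pb) * d1)

random.seed(1)
n, pi1p, d0, d1 = 200, 0.5, 1.0, 0.7
p11, p12 = d1 * pi1p ** 2, pi1p * (1 - d1 * pi1p)
reps, acc = 10000, 0.0
for _ in range(reps):
    a = b = 0
    for _ in range(n):          # one trinomial draw of size n
        u = random.random()
        if u < p11:
            a += 1
        elif u < p11 + p12:
            b += 1
    acc += T_S2(a, b, n, d0)

mc, asym = acc / reps, noncentrality(n, pi1p, d0, d1)
print(round(mc, 2), round(asym, 2))   # Monte Carlo mean vs asymptotic value
```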
References

Agresti, A., 1990. Categorical Data Analysis. Wiley, New York.
Bristol, D.R., 1989. Sample sizes for constructing confidence intervals and testing hypotheses. Statist. Med. 8, 803–811.
Johnson, W.D., May, W.L., 1995. Combining 2 × 2 tables that contain structural zeros. Statist. Med. 14, 1901–1911.
Lui, K.J., 1998. Interval estimation of risk ratio between the secondary infection given the primary infection and the primary infection. Biometrics 54, 706–711.
Lui, K.J., 2000. Confidence intervals of the simple difference between the proportions of a primary infection and a secondary infection, given the primary infection. Biomet. J. 42, 59–69.
Newcombe, R.G., 1998a. Two-sided confidence intervals for the single proportion: comparison of seven methods. Statist. Med. 17, 857–872.
Newcombe, R.G., 1998b. Interval estimation for the difference between independent proportions: comparison of eleven methods. Statist. Med. 17, 873–890.
Tang, M.L., Tang, N.S., Carey, V.J., 2004. Confidence interval for rate ratio in a 2 × 2 table with structural zero: an application in assessing false negative ratio when combining two diagnostic tests. Biometrics 60, 550–555.
Tang, N.S., Tang, M.L., 2002. Exact unconditional inference for risk ratio in a correlated 2 × 2 table with structural zero. Biometrics 58, 972–980. Tang, N.S., Tang, M.L., 2003. Statistical inference for risk difference in an incomplete correlated 2 × 2 table. Biomet. J. 45, 34–46. Tang, N.S., Tang, M.L., Chan, I.S.F., 2003. On tests of equivalence via non-unity relative risk for matched-pair design. Statist. Med. 22, 1217–1233.