Contemporary Clinical Trials 32 (2011) 140–146
Contemporary Clinical Trials, journal homepage: www.elsevier.com/locate/conclintrial
A type of sample size design in cancer clinical trials for response rate estimation
Junfeng Liu ⁎
Department of Biostatistics, School of Public Health, University of Medicine and Dentistry of New Jersey, 683 Hoes Lane West, Room 219, P.O. Box 9, Piscataway, NJ 08854, USA
Article info
Article history: Received 2 April 2010 Accepted 11 October 2010
Keywords: Admissible; Bayesian; Cost; Frequentist; Mean squared error; Sample size (design)
Abstract
During the early stage of cancer clinical trials, when it is not convenient to construct an explicit hypothesis test, a study of a new therapy often calls for estimation of the response rate (p) concurrently with or right before a typical phase II study. We consider a two-stage process in which the information acquired in Stage I (with a small sample size m) is used to recommend the sample size (n) of a Stage II study aiming at a more accurate estimate. Once a sample size design and a parameter estimation protocol are applied, we study the overall utility (cost-effectiveness), which combines the cost due to patient recruitment and treatment with the loss due to the mean squared error of parameter estimation. Two approaches are investigated: the posterior mixture method (a Bayesian approach) and the empirical variance method (a frequentist approach). We also discuss response rate estimation under a truncated parameter space using maximum likelihood estimation, with regard to sample size and mean squared error. The profiles of the p-specific expected sample size, mean squared error and risk under different approaches motivate us to introduce the concept of "admissible sample size (design)". © 2010 Elsevier Inc. All rights reserved.
⁎ Tel.: +1 732 235 4654; fax: +1 732 235 5464. E-mail address: [email protected].
1551-7144/$ – see front matter © 2010 Elsevier Inc. All rights reserved. doi:10.1016/j.cct.2010.10.005

1. Introduction
Once a phase I cancer clinical trial has chosen a dosage with consideration of efficacy and safety, statistical evaluation of the experimental treatment proceeds by means of hypothesis testing and estimation (the two major statistical inference tools) with regard to the response rate (p). Sometimes a typical phase IIA single-arm cancer clinical trial, the succeeding step that involves a hypothesis test, cannot be conveniently planned because accurate historical information on the standard therapy response rate (p0) and/or a clearly expected experimental response rate (p1) is lacking ([1]). After determining the dose to be used for the experimental therapy (e.g., in a phase I trial), oncologists are often interested in estimating p before formally launching a phase II study. Moreover, estimation of p may be done concurrently with a phase II study once patient enrollment is in progress. For instance, when the required sample sizes are comparable for the two tasks (estimation and hypothesis testing), properly distributing the patients recruited for estimation into two stages would make Simon's design ([2]) applicable. Parallel to hypothesis testing, we study a two-stage estimation procedure in which the information from Stage I is used to design the sample size for Stage II, where a more accurate estimate is expected.

On the other hand, planning clinical trials often involves a cost-effectiveness study. For instance, evaluation of a patient's response may require months of observation, patient accrual may be increasingly difficult, and the experimental drug may be highly expensive and/or toxic ([2,3]). This article studies methods for reaching a compromise between estimation accuracy and total trial cost.

The rest of the paper is organized as follows. Section 2 proposes two types of sample size design approaches from Bayesian and frequentist viewpoints; Section 3 summarizes the numerical results; Section 4 discusses the relationship between sample size and mean squared error using maximum likelihood estimation for the binomial distribution under a truncated parameter space; and Section 5 concludes with a discussion.

2. Sample size design for response rate estimation after a preliminary stage
A two-stage design (e.g., [2]) is frequently proposed for early stage clinical trials for reasons of bioethics and/or cost-effectiveness. Compared with the massive literature on hypothesis testing during the early stages of cancer clinical trials, only a limited number of articles have studied response rate estimation (e.g., [4–7]). Based on our recent consultation experience, we now discuss sample size design from an estimation perspective. After a preliminary pilot trial (Stage I) with a small sample size m, the collected information on p may not be sufficient for a conclusive evaluation of the experimental treatment. At this point, oncologists are interested in recruiting another group of patients for a more confirmatory assessment. The mean squared error (MSE), defined as the expected squared error loss in estimating a parameter θ with an estimator θ̂, i.e., $\mathrm{MSE} = E_\theta(\hat{\theta} - \theta)^2$, has been broadly used for estimator evaluation. Moreover, we let Xn denote the number of responders (a random variable which follows a binomial distribution) among the Stage II patients who are to be recruited and treated. The related trial cost is assumed to be nD, where n is the Stage II sample size and D is the trial cost per patient. For utilizing the Stage I information, there are two major approaches: Bayesian and frequentist. The former uses the available information (from Stage I) to refine the initially assumed continuous probability distribution of the model parameters (e.g., the response rate p), and the sample size design is based on optimization under this posterior mixture (the updated prior distribution for p).
The latter exploits the Stage I information to create an estimate of the objective function (i.e., trial cost plus estimation loss), which usually involves an estimated response rate (a single value) rather than a continuous probability distribution. We now study the associated details.

2.1. A Bayesian approach
We consider the following Stage II sample size (n) determination process, starting with a prior distribution for p ∈ [0, 1], e.g., π0(p) ∼ Beta(α, β), where Beta(α, β) denotes a Beta prior with probability density function $\propto p^{\alpha-1}(1-p)^{\beta-1}$, regulated by the shape parameters α and β. Assuming there are Xm (a random variable) responders among the m patients in Stage I, the updated prior distribution (i.e., the posterior distribution after observing Xm out of m) for Stage II becomes

$$\pi_1(p) = \mathrm{Beta}(\alpha + X_m,\ \beta + m - X_m) \propto p^{\alpha + X_m - 1}(1-p)^{\beta + m - X_m - 1}, \quad p \in [0, 1]. \tag{1}$$

In view of the uncertainty arising from Xm and the subjectivity of specifying the parameters α and β, we estimate the response rate by Xn/n, where Xn is the number of observed responders (a random variable which follows a binomial distribution) during Stage II.

(1) Given α, β and m, conditional on Xm, the Stage II sample size (n) determines the following expected risk (denoted as E(R(n))), which involves the estimation mean squared error under the posterior mixture:

$$\begin{aligned}
E(R(n)) &= nD + E_{\pi_1(p)} E_{X_n \mid n, p}\left(\frac{X_n}{n} - p\right)^2 \\
&= nD + \frac{\int_0^1 p(1-p)\, p^{\alpha + X_m - 1}(1-p)^{\beta + m - X_m - 1}\, dp}{n\, B(\alpha + X_m,\ \beta + m - X_m)} \\
&= nD + \frac{B(\alpha + X_m + 1,\ \beta + m - X_m + 1)}{n\, B(\alpha + X_m,\ \beta + m - X_m)} \\
&= nD + \frac{(\alpha + X_m)(\beta + m - X_m)}{n(\alpha + \beta + m + 1)(\alpha + \beta + m)},
\end{aligned} \tag{2}$$

where B(a, b) denotes the Beta function, i.e., $B(a, b) = \int_0^1 t^{a-1}(1-t)^{b-1}\, dt$; $E_{X_n \mid n, p}$ stands for the expectation of a function of Xn given n and p; and $E_{\pi_1(p)}$ stands for the expectation of a function of p under its distribution π1(p).

(2) Given D, α, β, Xm and m, the desirable Stage II sample size (denoted as n(Xm)) is the positive integer that minimizes E(R(n)) (Eq. (2)), i.e.,

$$n(X_m) = \arg\min_{n \ge 1}\left[ nD + \frac{(\alpha + X_m)(\beta + m - X_m)}{n(\alpha + \beta + m + 1)(\alpha + \beta + m)} \right]. \tag{3}$$

(3) For any p, the expected Stage II sample size (denoted as E(n)) under such a process is

$$E(n) = \sum_{X_m = 0}^{m} n(X_m)\, C_m^{X_m}\, p^{X_m}(1-p)^{m - X_m}, \tag{4}$$

where $C_k^j$ denotes the binomial coefficient $k!/(j!\,(k-j)!)$.

(4) For any p, the mean squared error (MSE) due to such a Stage II estimation process is

$$\mathrm{MSE}(p) = \sum_{X_m = 0}^{m} \sum_{X_n = 0}^{n(X_m)} C_m^{X_m} C_{n(X_m)}^{X_n}\, p^{X_m + X_n}(1-p)^{m - X_m + n(X_m) - X_n} \left(\frac{X_n}{n(X_m)} - p\right)^2. \tag{5}$$

(5) For any p, the risk (denoted as R(p)) involving trial cost and estimation loss due to such a Stage II estimation process is

$$R(p) = \sum_{X_m = 0}^{m} \sum_{X_n = 0}^{n(X_m)} \left[ n(X_m) D + \left(\frac{X_n}{n(X_m)} - p\right)^2 \right] C_m^{X_m} C_{n(X_m)}^{X_n}\, p^{X_m + X_n}(1-p)^{m - X_m + n(X_m) - X_n}. \tag{6}$$
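As an illustration of steps (1) through (5), the following is a minimal Python sketch, not the author's implementation. The function names (`bayes_stage2_n`, `binom_pmf`, `bayes_profile`) are hypothetical. The sketch relies on the fact that nD + K/n is convex in n, so the integer minimizer in Eq. (3) is the floor or the ceiling of √(K/D) (and at least 1), and it assumes α + β + m > 0.

```python
from math import ceil, comb, floor, sqrt


def bayes_stage2_n(x_m, m, D, alpha, beta):
    """Positive integer n minimizing n*D + K/n, as in Eq. (3)."""
    a, b = alpha + x_m, beta + m - x_m
    # K is the posterior mean of p(1-p), i.e. the last fraction in Eq. (2) times n
    K = a * b / ((a + b + 1) * (a + b))
    n0 = sqrt(K / D)  # continuous minimizer of the convex function n*D + K/n
    lo, hi = max(1, floor(n0)), max(1, ceil(n0))
    return lo if lo * D + K / lo <= hi * D + K / hi else hi


def binom_pmf(k, n, p):
    """Binomial probability C_n^k p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def bayes_profile(p, m, D, alpha, beta):
    """Return (E(n), MSE(p), R(p)) by direct summation of Eqs. (4)-(6)."""
    e_n = mse = 0.0
    for x_m in range(m + 1):
        w = binom_pmf(x_m, m, p)                  # probability of Stage I outcome
        n = bayes_stage2_n(x_m, m, D, alpha, beta)
        e_n += w * n                              # Eq. (4)
        mse += w * sum(binom_pmf(x_n, n, p) * (x_n / n - p) ** 2
                       for x_n in range(n + 1))   # Eq. (5)
    return e_n, mse, D * e_n + mse                # Eq. (6): cost plus MSE


if __name__ == "__main__":
    # Setting of Section 3 (m = 6, D = 0.0005) with a Beta(1,1) prior
    for p in (0.1, 0.3, 0.5):
        print(p, bayes_profile(p, m=6, D=0.0005, alpha=1, beta=1))
```

The inner sum over Xn equals p(1 − p)/n, the variance of Xn/n, so it could be replaced by that closed form; the explicit double sum is kept here only to mirror Eq. (5).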
2.2. A frequentist approach
When Xn/n is used as the Stage II estimator, the risk is R(n) = nD + p(1 − p)/n, where p(1 − p)/n is the variance of Xn/n. Since Xm/m can be used as a point estimator for p, i.e., p̂ = Xm/m, we discuss another sample size (n) design rule, which minimizes the empirical risk function $\hat{R}(n) = nD + \hat{p}(1 - \hat{p})/n$, a plug-in approximation to R(n). Given D, Xm and m, the recommended Stage II sample size (n) would be the positive integer that minimizes nD + Xm(m − Xm)/(nm²). However, we prefer an adjusted variance estimator along with the following updated empirical risk function

$$\hat{R}(n) = nD + \hat{p}(1 - \hat{p})/(n - n/m),$$

which is an unbiased estimator of the true risk R(n). Given D, Xm and m, the desirable Stage II sample size (n) is

$$n(X_m) = \arg\min_{n \ge 1}\left[ nD + \frac{X_m(m - X_m)}{n m (m - 1)} \right]. \tag{7}$$

All other considerations are similar to those (Eqs. (4)–(6)) in Section 2.1. Although the two expressions in the brackets (Eqs. (3) and (7)) are similar, no (α, β) value would make them identical for a given (Xm, m).

2.3. Some other methods
The Stage II estimator Xn/n in the preceding Bayesian approach (Section 2.1) may be replaced by the conditional posterior median (denoted as $p_{\mathrm{posmed}}$), which satisfies $\int_0^{p_{\mathrm{posmed}}} \pi_2(p)\, dp = 0.5$, where π2(p) is the posterior probability density function for p given (Xn, n, Xm, m, α, β), i.e.,

$$\pi_2(p) = \mathrm{Beta}(\alpha + X_m + X_n,\ \beta + m - X_m + n - X_n), \quad p \in [0, 1]. \tag{8}$$

Another Stage II estimator is the conditional posterior mean (denoted as $p_{\mathrm{posmea}}$) given (Xn, n, Xm, m, α, β), i.e.,

$$p_{\mathrm{posmea}} = (X_n + X_m + \alpha)/(n + m + \alpha + \beta) \quad \text{(utilizing Stages I + II)},$$

or

$$p_{\mathrm{posmea}} = (X_n + \alpha)/(n + \alpha + \beta) \quad \text{(utilizing Stage II)}.$$

Similar to Eq. (2), conditional on m, Xm and D, the expected risk function is calculated as $E(R(n)) = nD + E_{\pi_1(p)} E_{X_n \mid n, p}(p_{\mathrm{posmed}} - p)^2$ and $nD + E_{\pi_1(p)} E_{X_n \mid n, p}(p_{\mathrm{posmea}} - p)^2$, respectively. We will not investigate these alternatives in detail, since the computational sample size recommendation is more complicated without an explicit form (cf. Eqs. (3) and (7)) and may introduce more uncertainty through the regulating parameters α and β.

Definition I. (Admissible sample size design (SSD)): Given the model parameter (θ) and the protocol for obtaining the estimated parameter (θ̂), the risk function R(SSD, θ) depends on the sample size design (SSD) protocol. For a sample size design protocol $\mathrm{SSD}_a$, if there is no other protocol $\mathrm{SSD}^{*}$ such that

$$R(\mathrm{SSD}^{*}, \theta) \le R(\mathrm{SSD}_a, \theta)\ \ \forall \theta, \quad \text{and} \quad R(\mathrm{SSD}^{*}, \theta) < R(\mathrm{SSD}_a, \theta)\ \text{for at least one } \theta,$$

then $\mathrm{SSD}_a$ is an admissible SSD.

3. Numerical results
Under the setting (m = 6, D = 0.0005), the obtained expected sample size (E(n)) is within a normal range (≤ 30). We compare the two approaches introduced in Sections 2.1 (Bayesian approach) and 2.2 (frequentist approach) at each p ∈ [0, 1], where the Bayesian approach has two cases: priors Beta(0,0) and Beta(1,1). For p estimation using Stage II information (Xn, n), the p-specific MSE and risk calculations are based on Eqs. (5) and (6), respectively. The numerical results are given in Figs. 1 through 3.

1) In Fig. 1, the Bayesian approach (with prior Beta(0,0), solid line) has a uniformly smaller expected sample size than the frequentist approach (dotted line). However, the Bayesian approach has a uniformly larger mean squared error than the frequentist approach. The risks of the two approaches are comparable.

2) In Fig. 2, the Bayesian approach (with prior Beta(1,1), solid line) has a larger expected sample size than the frequentist approach (dotted line) when p moves away from 0.5 towards the two end points {0, 1}. However, the Bayesian approach has a uniformly smaller mean squared error than the frequentist approach and an almost uniformly smaller risk (except in tiny regions close to the two end points). In other words, the frequentist approach is almost inadmissible in this case.

3) In Fig. 3, the Bayesian approach with prior Beta(0,0) (solid line) has a uniformly smaller expected sample size than the Bayesian approach with prior Beta(1,1) (dotted line). However, the former has a uniformly larger mean squared error than the latter and an almost uniformly larger risk (except in tiny regions close to the two end points). In other words, the Bayesian approach induced by the Beta(0,0) prior is almost inadmissible in this case.

[Figure 1. Panels: E(n) [Bayesian=solid, Frequentist=dotted]; E(n) [Bayesian−Frequentist]; MSE [Bayesian=solid, Frequentist=dotted]; MSE [Bayesian−Frequentist]; Risk [Bayesian=solid, Frequentist=dotted]; Risk [Bayesian−Frequentist]. Axes: E(n), MSE, Risk, Frequency.]
Fig. 1. Under (m = 6, D = 0.0005), pair-wise differences (the expected sample size (E(n)) (Stage II), mean squared error (Stage II estimator) and risk (R) (Stage II estimator)) vs. response rate value (p) between Bayesian (α = β = 0) and frequentist approaches.
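The comparison between the Bayesian rule (Eq. (3)) and the frequentist rule (Eq. (7)) can be sketched numerically as follows. This is a minimal sketch under stated assumptions, not the code used to produce the figures. All function names are hypothetical, and the inner sum of Eq. (5) is replaced by its closed form p(1 − p)/n, the variance of Xn/n.

```python
from math import ceil, comb, floor, sqrt


def best_n(K, D):
    """Positive integer n minimizing the convex function n*D + K/n."""
    n0 = sqrt(K / D)
    lo, hi = max(1, floor(n0)), max(1, ceil(n0))
    return lo if lo * D + K / lo <= hi * D + K / hi else hi


def bayes_n(x_m, m, D, alpha, beta):
    """Stage II sample size from the posterior mixture rule, Eq. (3)."""
    a, b = alpha + x_m, beta + m - x_m
    return best_n(a * b / ((a + b + 1) * (a + b)), D)


def freq_n(x_m, m, D):
    """Stage II sample size from the adjusted empirical variance rule, Eq. (7)."""
    return best_n(x_m * (m - x_m) / (m * (m - 1)), D)


def profile(p, m, D, rule):
    """Return (E(n), MSE, risk) at response rate p for a rule x_m -> n."""
    e_n = mse = 0.0
    for x_m in range(m + 1):
        w = comb(m, x_m) * p**x_m * (1 - p) ** (m - x_m)
        n = rule(x_m)
        e_n += w * n
        mse += w * p * (1 - p) / n  # closed form of the inner sum in Eq. (5)
    return e_n, mse, D * e_n + mse


if __name__ == "__main__":
    m, D = 6, 0.0005
    for p in (0.1, 0.3, 0.5, 0.7, 0.9):
        bayes = profile(p, m, D, lambda x: bayes_n(x, m, D, 0, 0))
        freq = profile(p, m, D, lambda x: freq_n(x, m, D))
        # pair-wise differences (Bayesian minus frequentist), as in Fig. 1
        print(p, [b - f for b, f in zip(bayes, freq)])
```

Evaluating `profile` on a fine grid of p for two rules and checking whether one risk curve lies on or below the other everywhere is one way to examine admissibility in the sense of Definition I, restricted to the rules being compared.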
[Figure 2. Panels: E(n) [Bayesian=solid, Frequentist=dotted]; E(n) [Bayesian−Frequentist]; MSE [Bayesian=solid, Frequentist=dotted]; MSE [Bayesian−Frequentist]; Risk [Bayesian=solid, Frequentist=dotted]; Risk [Bayesian−Frequentist]. Axes: E(n), MSE, Risk, Frequency.]
Fig. 2. Under (m = 6, D = 0.0005), pair-wise differences (the expected sample size (E(n)) (Stage II), mean squared error (Stage II estimator) and risk (R) (Stage II estimator)) vs. response rate value (p) between Bayesian (α = β = 1) and frequentist approaches.

[Figure 3. Panels: E(n) [Beta(0,0)=solid, Beta(1,1)=dotted]; E(n) [Beta(0,0)−Beta(1,1)]; MSE [Beta(0,0)=solid, Beta(1,1)=dotted]; MSE [Beta(0,0)−Beta(1,1)]; Risk [Beta(0,0)=solid, Beta(1,1)=dotted]; Risk [Beta(0,0)−Beta(1,1)]. Axes: E(n), MSE, Risk, Frequency.]
Fig. 3. Under (m = 6, D = 0.0005), pair-wise differences (the expected sample size (E(n)) (Stage II), mean squared error (Stage II estimator) and risk (R) (Stage II estimator)) vs. response rate value (p) between Bayesian approaches (α = β = 0, α = β = 1).

4. A note on maximum likelihood estimation
Ref. [8] proposed a conjecture on maximum likelihood estimation, which expects the MSE to decrease monotonically as the sample size increases. We now discuss a left-truncation case in
a phase II clinical trial setting. A patient from the control group usually follows a well-studied Bernoulli distribution with a known treatment response probability a, and any patient from the treatment group independently follows a Bernoulli distribution with an unknown event probability p. Since the treatment is expected to induce a medical benefit, it is reasonable to impose left-truncation on the treatment group such that 0 ≤ a ≤ p ≤ 1. Ref. [9] discussed admissibility and minimaxity of parameter estimation for the exponential family under the truncated space {p | p ≥ a} and squared error loss (δ(X) − p)², where a is a given truncation point, δ(X) is the estimator given data X, and p is the unknown parameter. For the binomial model with size n and event probability p under the truncation {p | p ≥ a}, after observing X (0 ≤ X ≤ n) responders, [9] showed that the estimator
$$\delta(X) = \frac{X}{n} + \frac{a^{X}(1-a)^{n-X}}{n \int_a^1 p^{X-1}(1-p)^{n-X-1}\, dp}, \quad X = 0, 1, \ldots, n, \tag{9}$$

is minimax. The related maximum likelihood estimation is given by

$$\hat{p}_{\mathrm{MLE}} = \max\{a,\ X/n\} = \begin{cases} X/n, & \text{if } X/n \ge a, \\ a, & \text{if } X/n < a. \end{cases} \tag{10}$$

Conditional on sample size n and truncation point a, the mean squared error for $\hat{p}_{\mathrm{MLE}}$ is

$$\mathrm{MSE}(n, a, p) = (a - p)^2 \sum_{X = 0}^{\lfloor na \rfloor} C_n^{X} p^{X}(1-p)^{n-X} + \sum_{X = \lfloor na \rfloor + 1}^{n} C_n^{X} p^{X}(1-p)^{n-X}\left(X/n - p\right)^2,$$

where ⌊·⌋ is the floor function, i.e., the largest integer not exceeding the argument. Note that neither the estimator in Eq. (9) nor that in Eq. (10) is necessarily unbiased. Given a = 0.94, we plot "MSE(n, a, p) vs. n" for different values of p. Fig. 4 shows that some p values induce bell-shaped rather than monotonically decreasing curves.

Definition II. (Admissible sample size (SS)): Given the model parameter (θ) and the protocol for obtaining the estimated parameter (θ̂), the risk function R(n, θ) depends on the sample size (n). For sample size n = $n_a$, if there is no other sample size $n^{*}$ such that

$$R(n^{*}, \theta) \le R(n_a, \theta)\ \ \forall \theta, \quad \text{and} \quad R(n^{*}, \theta) < R(n_a, \theta)\ \text{for at least one } \theta,$$

then sample size $n_a$ is an admissible sample size. In the above example, in which the risk function involves only the MSE, sample size n1 dominates n2 on the region p ∈ [0.98, 0.99] whenever 0 < n1 < n2 ≤ 20. We may say that n2 is not an admissible sample size for this special p region.

[Figure 4. Panels: MSE vs. n (a=0.94, p=0.94); MSE vs. n (a=0.94, p=0.95); MSE vs. n (a=0.94, p=0.96); MSE vs. n (a=0.94, p=0.97); MSE vs. n (a=0.94, p=0.98); MSE vs. n (a=0.94, p=0.99). Axes: MSE against n, with n from 0 to 50.]
Fig. 4. Mean squared error vs. sample size (left-truncated response rate space).

Thus,
Shi's conjecture ([8]) is generally restricted to the original full parameter space.

5. Discussion
Cost-effectiveness has been under consideration for a long time. Ref. [10] (Chapter 7) discussed statistical decision methodologies based on fixed sample sizes and on sequential analysis, where sequential analysis is regarded as a difficult subject. We studied a two-stage process rather than a naive individual-based sequential procedure. As for parameter estimation, confidence intervals for parameters regulating the distribution of discrete responses may be unreliable when they are based simply on a normality assumption. Ref. [11] introduced methods for intra-class correlation confidence interval estimation in compound-Poisson sampling, and Ref. [12] discussed approaches to response rate confidence interval estimation in binomial sampling and highlighted the resulting chaotic properties. Based on the numerical results on the expected sample size and risk function in the present work, the differences among these diverse methods show smooth and predictable trends. One sample size design and parameter estimation protocol may dominate another uniformly or almost uniformly in terms of overall risk (trial cost plus mean squared error), expected sample size or estimation mean squared error. Our premise for sample size design and response rate estimation is based on the per-patient trial cost D (patient availability, treatment and recruitment cost, response evaluation time period, etc.) and the response rate estimation accuracy. The comparison results are sensitive to the specification of D. A clinical protocol on how to balance cost (D) against response rate estimation accuracy (MSE) may be implemented interactively and graphically, where a carefully chosen ratio between D and the unit of MSE suffices for conducting protocol selection.
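As a numerical companion to Section 4, the MSE of the truncated maximum likelihood estimator in Eq. (10) can be evaluated by direct summation over the binomial outcomes. The sketch below is illustrative only. The function name `mse_truncated_mle` is hypothetical, and the estimator is computed as max(a, X/n) for each X rather than by splitting the sum at ⌊na⌋, which gives the same value while avoiding floating-point issues in the floor.

```python
from math import comb


def mse_truncated_mle(n, a, p):
    """MSE of p_hat = max(a, X/n) for X ~ Binomial(n, p), by direct summation."""
    total = 0.0
    for x in range(n + 1):
        prob = comb(n, x) * p**x * (1 - p) ** (n - x)
        p_hat = max(a, x / n)  # truncated MLE, Eq. (10)
        total += prob * (p_hat - p) ** 2
    return total


if __name__ == "__main__":
    a = 0.94  # truncation point used for Fig. 4
    for p in (0.94, 0.96, 0.98):
        curve = [mse_truncated_mle(n, a, p) for n in (5, 10, 20, 50)]
        print(p, ["%.6f" % v for v in curve])
```

With a = 0 the truncation is inactive and the function reduces to p(1 − p)/n, which gives a simple sanity check of the summation.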
Acknowledgements
We are grateful to Mark Stein from the Medical Oncology Program at The Cancer Institute of New Jersey for a helpful discussion on early stage cancer clinical trials. We also thank Anastasios Tsiatis for his constructive comments on an earlier draft and two anonymous reviewers for their insightful suggestions, which greatly improved our presentation.
References
[1] Liu J, Lin Y, Shih WJ. On Simon's two-stage design on single-arm phase IIA cancer clinical trials under beta-binomial distribution. Stat Med 2010;29:1084–95.
[2] Simon R. Optimal two-stage designs for phase II clinical trials. Control Clin Trials 1989;10:1–10.
[3] Mander AP, Thompson SG. Two-stage designs optimal under the alternative hypothesis for phase II cancer clinical trials. Contemp Clin Trials 2010;31:572–8.
[4] Gehan EA. The determination of the number of patients required in a follow-up trial of a new chemotherapeutic agent. J Chron Dis 1961;13(4):346–53.
[5] Jennison C, Turnbull BW. Confidence intervals for a binomial parameter following a multi-stage test with application to MIL-STD 105D and medical trials. Technometrics 1983;25:49–58.
[6] Chang MN, O'Brien PC. Confidence intervals following group sequential tests. Control Clin Trials 1986;7:18–26.
[7] Koyama T, Chen H. Proper inference from Simon's two-stage designs. Stat Med 2008;27:3145–54.
[8] Shi N-Z. A conjecture on maximum likelihood estimation. Inst Math Stat Bull 2008;37(4):4.
[9] Katz MW. Admissible and minimax estimates of parameters in truncated spaces. Ann Math Stat 1961;32(1):136–42.
[10] Berger JO. Statistical decision theory and Bayesian analysis. 2nd ed. Springer-Verlag: New York; 1985.
[11] Lui K-J, Kuo L. Confidence limits for the intraclass correlation in compound-Poisson sampling. Biom J 1996;38(2):231–9.
[12] Brown LD, Cai TT, DasGupta A. Interval estimation for a binomial proportion. Stat Sci 2001;16(2):101–33.