Optimal and minimax three-stage designs for phase II oncology clinical trials


Available online at www.sciencedirect.com

Contemporary Clinical Trials 29 (2008) 32 – 41 www.elsevier.com/locate/conclintrial

Kun Chen a,⁎, Michael Shan b

a Global Biometric Science, Pharmaceutical Research Institute, Bristol Myers Squibb Company, 5 Research Parkway, Wallingford, CT 06492, USA
b Department of Global Biometry, Bayer Pharmaceutical Corporation, 400 Morgan Lane, West Haven, CT 06516, USA

Received 5 February 2007; accepted 24 April 2007

Abstract

The common objective of oncology phase II trials is to evaluate the anti-tumor activity of a new agent and to determine whether the new drug warrants further investigation. For cancer drugs that significantly shrink tumors, the response (CR and PR) rate is usually the primary endpoint, and the trial tests H0: P ≤ P0 vs H1: P ≥ P1, where P0 and P1 are the response rates that do not and do warrant further investigation, respectively, given the false-positive rate (α) and the false-negative rate (β). Several authors have developed multiple-stage designs, including two-stage and three-stage designs: for example, Simon's optimal two-stage design [Simon R. Optimal two-stage designs for phase II clinical trials. Control Clin Trials 1989;10:1–10], the optimal three-stage design of Ensign et al. with a restriction at the first stage [Ensign LG, Gehan EA, Kamen DS, Thall PF. An optimal three-stage design for phase II clinical trials. Stat Med 1994;13:1727–1736], and Chen's optimal three-stage design without any restriction [Chen TT. Optimal three-stage designs for phase II clinical trials. Stat Med 1997;16:2701–2711]. However, all of these designs terminate a trial early only for lack of activity of the study drug. Fleming's multiple-stage design [Fleming TR. One-sample multiple testing procedure for phase II clinical trials. Biometrics 1982;38:143–151] allows early stopping for either sufficient activity or lack of activity, but it does not attempt to optimize efficiency. We extend Chen's design [Chen TT. Optimal three-stage designs for phase II clinical trials. Stat Med 1997;16:2701–2711] and propose an optimal and a minimax design for three-stage cancer phase II trials that allow early stopping under both hypotheses. The optimal design minimizes the average sample number (ASN) under P = P0. The minimax design minimizes the maximal sample size (N) and then, given this value of N, minimizes the average sample number under P = P0.

© 2007 Elsevier Inc. All rights reserved.

Keywords: Two-stage design; Three-stage design; Optimal; Minimax; Average sample number; Maximal sample size

⁎ Corresponding author. Tel.: +1 203 677 5010; fax: +1 203 6777279. E-mail address: [email protected] (K. Chen).
1551-7144/$ - see front matter © 2007 Elsevier Inc. All rights reserved. doi:10.1016/j.cct.2007.04.008

1. Introduction

In cancer drug development, phase I trials determine the maximal tolerated dose, which is usually the recommended dose for phase II trials. Phase II trials are intended to assess the anti-tumor activity of a new drug. Single-arm trials without a control are commonly used in phase II for cytotoxic agents. Phase I trials usually include patients with different tumor types, whereas phase II trials enroll patients with one specific type of tumor who meet the inclusion criteria. Typically, tumor response rate is the primary endpoint for single-arm cancer phase II trials.

There are two standard sets of criteria for evaluating tumor response to treatment: the World Health Organization (WHO) criteria and the Response Evaluation Criteria in Solid Tumors (RECIST). The two differ in some respects. In particular, the WHO criteria measure tumor size in two dimensions (tumor area), whereas RECIST uses one dimension (longest diameter). For example, by the WHO criteria, Complete Response (CR) means disappearance of all known tumors. Partial Response (PR) means a ≥ 50% decrease in the sum of all target lesions with no new lesions. Stable Disease (SD) means a < 50% decrease and a < 25% increase in the sum of all target lesions. Progressive Disease (PD) means a ≥ 25% increase in the sum of all target lesions or the appearance of new lesions. Patients with CR or PR are considered 'responders' and patients with SD or PD are considered 'non-responders'. The response rate is the number of patients with CR or PR divided by the total number of patients.

Our objective in phase II trials is to determine whether a new drug is worth further investigation, for example in a large and more costly phase III trial. Both the desirable and the undesirable response rates are specified in the protocol. Given the false-positive (α) and false-negative (β) rates, the hypotheses H0: P ≤ P0 vs H1: P ≥ P1 are tested, where P0 is a true response rate that is not clinically meaningful and P1 is a true response rate that is sufficiently high to warrant further study.
The probability of rejecting a promising drug is required to be at most β when the alternative hypothesis is true, and the probability of accepting an inactive drug is required to be at most α under the null hypothesis. The three-stage trial design, which is the topic of this manuscript, is based on a one-sample binomial statistic whose success probability is the probability of response. A multi-stage design allows early stopping of a trial for lack of activity or for a promising effect. Early termination can save drug development time, which may reduce cost and bring efficacious treatments to patients sooner.

Multiple-stage designs, including two-stage and three-stage designs, have been developed by the following authors, among others. Simon's optimal two-stage design for phase II trials [1] is a very popular multiple-stage design for phase II oncology trials. Simon's design does not allow early acceptance of the drug; early termination occurs only when the drug has low activity. A two-stage design is operationally easier than a three-stage design because patient accrual may have to be suspended at the end of each stage, and accrual suspension may make later enrollment difficult. However, Simon's design does not allow early termination if there is a long run of failures at the start. Ensign et al. [2] propose a three-stage design that permits early stopping when a moderately long sequence of initial failures occurs. They place a constraint on the first stage: if none of the treated patients responds, the trial stops. If there are one or more responses in stage 1, the trial continues to stages 2 and 3 using the same stopping rules as in Simon's design. Chen's [3] designs extend both the optimal and the minimax two-stage designs to three-stage designs without any restriction at the first stage.
His extension reduces the average sample number when the treatment is ineffective by an average of 10% relative to the two-stage designs. Our optimal and minimax three-stage designs extend Chen's design to allow early stopping for either an effective or an ineffective drug. They give more opportunity to terminate the trial early based on the number of responses at the early stages: a sufficient number of early responses can speed the further development of an effective drug, and an insufficient number can end the trial of an ineffective one sooner.

2. Proposed three-stage designs

2.1. General notation

Let $n_i$ denote the number of patients enrolled at the $i$th stage, $i = 1, 2, 3$, and let $N_i$ denote the cumulative sample size at the $i$th stage, so that $N_1 = n_1$, $N_2 = n_1 + n_2$, and the total sample size is $N_3 = N = n_1 + n_2 + n_3$. Let $s_i$ denote the number of responses among the $n_i$ patients ($i = 1, 2, 3$), and let $S_g = \sum_{i=1}^{g} s_i$ denote the cumulative number of responses observed by the end of the $g$th stage ($g = 1, 2, 3$), where $S_g$ is a binomial random variable with parameters $(N_g, P)$. Let $a_i$ denote the acceptance points (of H0) and $r_i$ the rejection points (of H0), $i = 1, 2, 3$. The decision rules for stopping or continuing the trial are as follows.

At stage $g$ ($g = 1, 2$):
If $S_g \le a_g$, stop and reject H1;
If $S_g \ge r_g$, stop and reject H0;
If $a_g < S_g < r_g$, continue to stage $g + 1$.

At the final stage (stage 3), we either reject H0 or reject H1, which requires $a_3 = r_3 - 1$ [5].

We assume that a patient's response is a binary random variable with true success probability $P$. Denote the binomial probability mass function and cumulative distribution function (CDF) by $b(s; n, P)$ and $B(s; n, P)$ respectively, that is,

$$ b(s; n, P) = \binom{n}{s} P^{s} (1 - P)^{n - s}, \qquad B(s; n, P) = \sum_{x \le s} b(x; n, P), $$

with $b(s; n, P) = 0$ for $s < 0$ or $s > n$.

Define the probability of stopping and rejecting H1 at each stage as:

$$ \beta_1(P) = \Pr[S_1 \le a_1 \mid P] = B(a_1; n_1, P) = \sum_{s \le a_1} b(s; n_1, P) $$

$$ \beta_2(P) = \Pr[a_1 < S_1 < r_1,\ S_2 \le a_2 \mid P] = \sum_{s = a_1 + 1}^{\min[n_1,\, r_1 - 1]} b(s; n_1, P)\, B(a_2 - s; n_2, P) $$

$$ \beta_3(P) = \Pr[a_1 < S_1 < r_1,\ a_2 < S_2 < r_2,\ S_3 \le a_3 \mid P] = \sum_{s_1 = a_1 + 1}^{\min[n_1,\, r_1 - 1]} \ \sum_{s_2 = a_2 - s_1 + 1}^{\min[n_2,\, r_2 - s_1 - 1]} b(s_1; n_1, P)\, b(s_2; n_2, P)\, B(a_3 - s_1 - s_2; n_3, P) $$

Define the probability of stopping and rejecting H0 at each stage as:

$$ \alpha_1(P) = \Pr[S_1 \ge r_1 \mid P] = 1 - B(r_1 - 1; n_1, P) = 1 - \sum_{s \le r_1 - 1} b(s; n_1, P) $$

$$ \alpha_2(P) = \Pr[a_1 < S_1 < r_1,\ S_2 \ge r_2 \mid P] = \sum_{s = a_1 + 1}^{\min[n_1,\, r_1 - 1]} b(s; n_1, P)\, [1 - B(r_2 - s - 1; n_2, P)] $$

$$ \alpha_3(P) = \Pr[a_1 < S_1 < r_1,\ a_2 < S_2 < r_2,\ S_3 \ge r_3 \mid P] = \sum_{s_1 = a_1 + 1}^{\min[n_1,\, r_1 - 1]} \ \sum_{s_2 = a_2 - s_1 + 1}^{\min[n_2,\, r_2 - s_1 - 1]} b(s_1; n_1, P)\, b(s_2; n_2, P)\, [1 - B(r_3 - s_1 - s_2 - 1; n_3, P)] $$

Since these are disjoint events, the probability of rejecting H1, $\beta(P)$, or of rejecting H0, $\alpha(P)$, is the sum of the corresponding stage-wise probabilities, that is,

$$ \beta(P) = \Pr[\text{reject } H_1 \mid P] = \beta_1(P) + \beta_2(P) + \beta_3(P) $$

and

$$ \alpha(P) = \Pr[\text{reject } H_0 \mid P] = \alpha_1(P) + \alpha_2(P) + \alpha_3(P). $$

$\beta(P_1)$ and $\alpha(P_0)$ are the probabilities of making Type II and Type I errors, respectively. The probabilities of early termination (PET) by the end of the first and second stages are

$$ PET_1(P) = \alpha_1(P) + \beta_1(P) \quad \text{and} \quad PET_2(P) = [\alpha_1(P) + \beta_1(P)] + [\alpha_2(P) + \beta_2(P)], $$

respectively. The average sample number (ASN) is

$$ ASN(P) = n_1 + n_2\,[1 - PET_1(P)] + n_3\,[1 - PET_2(P)]. $$
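The stage-wise probabilities, PET and ASN defined above can be evaluated numerically. The following Python sketch is one possible implementation, not the authors' C program; the function names `pmf`, `cdf` and `operating_characteristics` are illustrative assumptions. It treats the acceptance and rejection points as cumulative response counts, as in the decision rules of Section 2.1.

```python
from math import comb


def pmf(s, n, p):
    """Binomial probability b(s; n, P); taken as zero outside 0..n."""
    if s < 0 or s > n:
        return 0.0
    return comb(n, s) * p**s * (1 - p) ** (n - s)


def cdf(s, n, p):
    """Binomial CDF B(s; n, P): sum of b(x; n, P) over x <= s."""
    return sum(pmf(x, n, p) for x in range(0, min(s, n) + 1))


def operating_characteristics(n, a, r, p):
    """Evaluate a three-stage, two-boundary design at response rate p.

    n = (n1, n2, n3): patients added at each stage
    a = (a1, a2, a3): cumulative acceptance points (reject H1 if S_g <= a_g)
    r = (r1, r2, r3): cumulative rejection points (reject H0 if S_g >= r_g)
    """
    n1, n2, n3 = n
    a1, a2, a3 = a
    r1, r2, r3 = r

    beta = [cdf(a1, n1, p), 0.0, 0.0]           # stop and reject H1
    alpha = [1 - cdf(r1 - 1, n1, p), 0.0, 0.0]  # stop and reject H0

    # s1 values for which the trial continues past stage 1
    for s1 in range(a1 + 1, min(n1, r1 - 1) + 1):
        p1 = pmf(s1, n1, p)
        beta[1] += p1 * cdf(a2 - s1, n2, p)
        alpha[1] += p1 * (1 - cdf(r2 - s1 - 1, n2, p))
        # s2 values for which the trial continues past stage 2
        for s2 in range(max(0, a2 - s1 + 1), min(n2, r2 - s1 - 1) + 1):
            p12 = p1 * pmf(s2, n2, p)
            beta[2] += p12 * cdf(a3 - s1 - s2, n3, p)
            alpha[2] += p12 * (1 - cdf(r3 - s1 - s2 - 1, n3, p))

    pet1 = alpha[0] + beta[0]
    pet2 = pet1 + alpha[1] + beta[1]
    asn = n1 + n2 * (1 - pet1) + n3 * (1 - pet2)
    return {"alpha": sum(alpha), "beta": sum(beta),
            "PET1": pet1, "PET2": pet2, "ASN": asn}


if __name__ == "__main__":
    # First row of Table 1: P0 = 0.05, P1 = 0.25, alpha = beta = 0.1
    print(operating_characteristics((10, 7, 10), (0, 1, 3), (2, 4, 4), 0.05))
```

Evaluated at P = 0.05 for the first design in Table 1, this sketch gives an ASN of about 13.14 and early-termination probabilities of about 0.68 and 0.91, in line with the tabulated values.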


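Table 1. Optimal three-stage designs for P1 − P0 = 0.20

| P0 | P1 | n1 | n2 | n3 | N | a1 | a2 | a3 | r1 | r2 | r3 | ASN(P0) | Stage 1 PET(P0) | Overall PET(P0) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.05 | 0.25 | 10 | 7 | 10 | 27 | 0 | 1 | 3 | 2 | 4 | 4 | 13.14 | 0.68 | 0.91 |
| 0.05 | 0.25 | 7 | 8 | 14 | 29 | 0 | 1 | 4 | 2 | 5 | 5 | 10.27 | 0.74 | 0.91 |
| 0.05 | 0.25 | 10 | 7 | 13 | 30 | 0 | 1 | 3 | 3 | 4 | 4 | 14.88 | 0.61 | 0.83 |
| 0.10 | 0.30 | 13 | 11 | 7 | 31 | 1 | 3 | 5 | 4 | 5 | 6 | 17.42 | 0.66 | 0.91 |
| 0.10 | 0.30 | 6 | 10 | 18 | 34 | 0 | 2 | 6 | 3 | 5 | 7 | 13.16 | 0.55 | 0.85 |
| 0.10 | 0.30 | 13 | 15 | 15 | 43 | 1 | 4 | 7 | 5 | 7 | 8 | 20.06 | 0.63 | 0.90 |
| 0.15 | 0.35 | 11 | 13 | 15 | 39 | 1 | 4 | 8 | 5 | 8 | 9 | 20.59 | 0.51 | 0.79 |
| 0.15 | 0.35 | 9 | 12 | 14 | 35 | 1 | 4 | 8 | 6 | 8 | 9 | 15.89 | 0.60 | 0.85 |
| 0.15 | 0.35 | 12 | 11 | 26 | 49 | 1 | 4 | 11 | 6 | 8 | 12 | 23.68 | 0.45 | 0.78 |
| 0.20 | 0.40 | 13 | 14 | 15 | 42 | 2 | 6 | 11 | 7 | 10 | 12 | 23.04 | 0.51 | 0.79 |
| 0.20 | 0.40 | 8 | 13 | 21 | 42 | 1 | 5 | 12 | 6 | 9 | 13 | 18.18 | 0.50 | 0.82 |
| 0.20 | 0.40 | 17 | 15 | 22 | 54 | 3 | 8 | 15 | 9 | 12 | 16 | 26.78 | 0.55 | 0.86 |
| 0.25 | 0.45 | 15 | 13 | 16 | 44 | 3 | 8 | 14 | 8 | 12 | 15 | 24.82 | 0.48 | 0.81 |
| 0.25 | 0.45 | 10 | 13 | 24 | 47 | 2 | 7 | 16 | 7 | 11 | 17 | 19.78 | 0.53 | 0.85 |
| 0.25 | 0.45 | 15 | 18 | 28 | 61 | 3 | 10 | 20 | 9 | 15 | 21 | 28.93 | 0.47 | 0.85 |
| 0.30 | 0.50 | 14 | 11 | 20 | 45 | 3 | 8 | 17 | 9 | 12 | 18 | 26.18 | 0.36 | 0.74 |
| 0.30 | 0.50 | 9 | 16 | 23 | 48 | 2 | 9 | 19 | 7 | 13 | 20 | 20.88 | 0.47 | 0.85 |
| 0.30 | 0.50 | 15 | 20 | 31 | 66 | 4 | 12 | 25 | 10 | 19 | 26 | 30.42 | 0.52 | 0.81 |
| 0.35 | 0.55 | 14 | 15 | 22 | 51 | 4 | 11 | 22 | 9 | 15 | 23 | 26.86 | 0.45 | 0.79 |
| 0.35 | 0.55 | 10 | 15 | 22 | 47 | 3 | 10 | 21 | 8 | 16 | 22 | 21.29 | 0.52 | 0.81 |
| 0.35 | 0.55 | 14 | 17 | 35 | 66 | 4 | 12 | 29 | 11 | 17 | 30 | 31.45 | 0.42 | 0.78 |
| 0.40 | 0.60 | 15 | 11 | 20 | 46 | 5 | 11 | 22 | 11 | 16 | 23 | 27.01 | 0.41 | 0.72 |
| 0.40 | 0.60 | 9 | 17 | 26 | 52 | 3 | 12 | 26 | 8 | 16 | 27 | 21.56 | 0.49 | 0.85 |
| 0.40 | 0.60 | 16 | 22 | 28 | 66 | 6 | 17 | 32 | 14 | 23 | 33 | 31.39 | 0.53 | 0.83 |
| 0.45 | 0.65 | 15 | 12 | 22 | 49 | 6 | 13 | 26 | 12 | 17 | 27 | 26.44 | 0.46 | 0.78 |
| 0.45 | 0.65 | 10 | 15 | 28 | 53 | 4 | 13 | 29 | 9 | 17 | 30 | 21.29 | 0.51 | 0.86 |
| 0.45 | 0.65 | 18 | 20 | 27 | 65 | 8 | 19 | 35 | 14 | 25 | 36 | 30.89 | 0.58 | 0.83 |
| 0.50 | 0.70 | 15 | 11 | 26 | 52 | 7 | 14 | 30 | 12 | 18 | 31 | 25.66 | 0.52 | 0.79 |
| 0.50 | 0.70 | 9 | 17 | 23 | 49 | 4 | 15 | 29 | 9 | 20 | 30 | 20.58 | 0.50 | 0.86 |
| 0.50 | 0.70 | 15 | 16 | 35 | 66 | 7 | 17 | 39 | 13 | 22 | 40 | 29.67 | 0.50 | 0.81 |
| 0.55 | 0.75 | 12 | 15 | 21 | 48 | 6 | 16 | 30 | 11 | 20 | 31 | 23.84 | 0.48 | 0.81 |
| 0.55 | 0.75 | 11 | 12 | 20 | 43 | 6 | 14 | 28 | 11 | 19 | 29 | 19.21 | 0.60 | 0.83 |
| 0.55 | 0.75 | 15 | 18 | 28 | 61 | 8 | 20 | 39 | 14 | 25 | 40 | 27.70 | 0.55 | 0.84 |
| 0.60 | 0.80 | 11 | 11 | 25 | 47 | 6 | 14 | 32 | 10 | 18 | 33 | 21.89 | 0.50 | 0.79 |
| 0.60 | 0.80 | 10 | 14 | 17 | 41 | 6 | 16 | 29 | 10 | 20 | 30 | 17.56 | 0.62 | 0.86 |
| 0.60 | 0.80 | 11 | 18 | 26 | 55 | 6 | 19 | 38 | 11 | 25 | 39 | 25.29 | 0.47 | 0.82 |
| 0.65 | 0.85 | 10 | 9 | 18 | 37 | 6 | 13 | 27 | 10 | 17 | 28 | 18.82 | 0.50 | 0.76 |
| 0.65 | 0.85 | 9 | 13 | 22 | 44 | 6 | 16 | 33 | 9 | 21 | 34 | 15.54 | 0.68 | 0.89 |
| 0.65 | 0.85 | 13 | 11 | 20 | 44 | 8 | 17 | 33 | 13 | 22 | 34 | 22.27 | 0.50 | 0.81 |
| 0.70 | 0.90 | 9 | 6 | 19 | 34 | 6 | 11 | 27 | 9 | 14 | 28 | 15.52 | 0.58 | 0.79 |
| 0.70 | 0.90 | 10 | 4 | 20 | 34 | 7 | 11 | 28 | 10 | 14 | 29 | 14.12 | 0.65 | 0.87 |
| 0.70 | 0.90 | 13 | 9 | 14 | 36 | 9 | 17 | 29 | 13 | 21 | 30 | 18.72 | 0.59 | 0.86 |
| 0.75 | 0.95 | 9 | 8 | 11 | 28 | 7 | 14 | 23 | 9 | 17 | 25 | 11.71 | 0.77 | 0.92 |
| 0.75 | 0.95 | 11 | 4 | 12 | 27 | 9 | 13 | 24 | 11 | 15 | 25 | 12.21 | 0.85 | 0.95 |
| 0.75 | 0.95 | 12 | 7 | 12 | 31 | 9 | 16 | 27 | 12 | 19 | 28 | 15.56 | 0.64 | 0.91 |

Note: The first, second and third rows of each (P0, P1) pair correspond to (α, β) = (0.1, 0.1), (0.05, 0.20), (0.05, 0.10), respectively. Columns n1 through r3 define the multiple testing procedure. ASN(P0), stage 1 PET(P0) and overall PET(P0) represent the average sample number, the probability of early termination at stage 1 and the overall probability of early termination under the true response rate P = P0.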

A design is valid if it meets the α and β requirements, that is, if α(P0) ≤ α and β(P1) ≤ β. Many valid designs exist for given α and β. The purpose of our paper is to search for the optimal design, which minimizes the average sample number under P = P0, and the minimax design, which minimizes the maximal sample size (N) and then, among designs with this value of N, minimizes the average sample number under P = P0.
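The two criteria can be written compactly as constrained minimization problems. The formulation below is a sketch in the notation of Section 2.1; the symbols $d$, $\mathcal{D}$ and $\mathcal{V}$ are introduced here for illustration only and are not part of the original notation.

$$ d = (n_1, n_2, n_3, a_1, a_2, a_3, r_1, r_2, r_3), \qquad \mathcal{V} = \{\, d \in \mathcal{D} : \alpha_d(P_0) \le \alpha,\ \beta_d(P_1) \le \beta \,\} $$

$$ d_{\text{optimal}} = \arg\min_{d \in \mathcal{V}} ASN_d(P_0) $$

$$ N^{*} = \min_{d \in \mathcal{V}} N(d), \qquad d_{\text{minimax}} = \arg\min_{d \in \mathcal{V},\ N(d) = N^{*}} ASN_d(P_0) $$

Here $\mathcal{D}$ stands for the set of candidate designs considered by the search, $\mathcal{V}$ for the valid designs among them, and $N(d) = n_1 + n_2 + n_3$.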


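Table 2. Optimal three-stage designs for P1 − P0 = 0.15

| P0 | P1 | n1 | n2 | n3 | N | a1 | a2 | a3 | r1 | r2 | r3 | ASN(P0) | Stage 1 PET(P0) | Overall PET(P0) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.05 | 0.20 | 13 | 9 | 15 | 37 | 0 | 1 | 3 | 3 | 4 | 4 | 20.61 | 0.54 | 0.77 |
| 0.05 | 0.20 | 9 | 17 | 13 | 39 | 0 | 2 | 4 | 3 | 4 | 5 | 16.06 | 0.64 | 0.93 |
| 0.05 | 0.20 | 14 | 9 | 19 | 42 | 0 | 1 | 4 | 4 | 5 | 5 | 23.87 | 0.49 | 0.72 |
| 0.10 | 0.25 | 17 | 12 | 20 | 49 | 1 | 3 | 7 | 5 | 7 | 8 | 28.07 | 0.50 | 0.74 |
| 0.10 | 0.25 | 13 | 17 | 21 | 51 | 1 | 4 | 8 | 6 | 7 | 9 | 21.85 | 0.62 | 0.88 |
| 0.10 | 0.25 | 17 | 18 | 29 | 64 | 1 | 4 | 10 | 6 | 8 | 11 | 32.40 | 0.49 | 0.79 |
| 0.15 | 0.30 | 19 | 19 | 21 | 59 | 2 | 6 | 12 | 7 | 10 | 13 | 34.40 | 0.46 | 0.76 |
| 0.15 | 0.30 | 15 | 18 | 29 | 62 | 2 | 6 | 13 | 8 | 11 | 14 | 26.91 | 0.60 | 0.83 |
| 0.15 | 0.30 | 23 | 22 | 37 | 82 | 3 | 8 | 17 | 9 | 13 | 18 | 39.68 | 0.54 | 0.82 |
| 0.20 | 0.35 | 23 | 23 | 25 | 71 | 4 | 10 | 18 | 9 | 15 | 19 | 39.18 | 0.53 | 0.79 |
| 0.20 | 0.35 | 17 | 20 | 31 | 68 | 3 | 9 | 18 | 10 | 15 | 19 | 30.95 | 0.55 | 0.84 |
| 0.20 | 0.35 | 27 | 24 | 45 | 96 | 5 | 12 | 25 | 12 | 17 | 26 | 45.34 | 0.54 | 0.84 |
| 0.25 | 0.40 | 26 | 25 | 28 | 79 | 6 | 14 | 24 | 12 | 19 | 25 | 43.24 | 0.53 | 0.80 |
| 0.25 | 0.40 | 17 | 27 | 42 | 86 | 4 | 13 | 27 | 10 | 18 | 28 | 34.15 | 0.58 | 0.86 |
| 0.25 | 0.40 | 29 | 33 | 47 | 109 | 7 | 18 | 34 | 16 | 23 | 35 | 50.09 | 0.56 | 0.86 |
| 0.30 | 0.45 | 24 | 22 | 43 | 89 | 6 | 15 | 32 | 12 | 20 | 33 | 45.97 | 0.42 | 0.79 |
| 0.30 | 0.45 | 18 | 23 | 43 | 84 | 5 | 14 | 31 | 13 | 20 | 32 | 36.40 | 0.53 | 0.82 |
| 0.30 | 0.45 | 31 | 31 | 48 | 110 | 9 | 21 | 40 | 17 | 28 | 41 | 53.31 | 0.55 | 0.83 |
| 0.35 | 0.50 | 30 | 25 | 31 | 86 | 10 | 21 | 35 | 17 | 26 | 36 | 47.73 | 0.52 | 0.81 |
| 0.35 | 0.50 | 18 | 28 | 42 | 88 | 6 | 18 | 37 | 13 | 25 | 38 | 37.98 | 0.55 | 0.82 |
| 0.35 | 0.50 | 32 | 35 | 48 | 115 | 11 | 26 | 48 | 23 | 32 | 49 | 55.49 | 0.55 | 0.84 |
| 0.40 | 0.55 | 27 | 26 | 47 | 100 | 10 | 23 | 46 | 16 | 28 | 47 | 48.51 | 0.49 | 0.82 |
| 0.40 | 0.55 | 19 | 25 | 49 | 93 | 7 | 20 | 44 | 14 | 25 | 45 | 38.89 | 0.49 | 0.85 |
| 0.40 | 0.55 | 27 | 35 | 57 | 119 | 10 | 27 | 56 | 22 | 33 | 57 | 56.72 | 0.46 | 0.81 |
| 0.45 | 0.60 | 28 | 28 | 36 | 92 | 12 | 27 | 47 | 18 | 33 | 48 | 48.35 | 0.52 | 0.81 |
| 0.45 | 0.60 | 17 | 25 | 48 | 90 | 7 | 21 | 47 | 16 | 27 | 48 | 38.53 | 0.47 | 0.83 |
| 0.45 | 0.60 | 30 | 35 | 51 | 116 | 13 | 32 | 60 | 22 | 39 | 61 | 56.15 | 0.50 | 0.83 |
| 0.50 | 0.65 | 22 | 29 | 40 | 91 | 10 | 27 | 51 | 17 | 32 | 52 | 47.01 | 0.42 | 0.79 |
| 0.50 | 0.65 | 17 | 25 | 45 | 87 | 8 | 23 | 50 | 16 | 29 | 51 | 37.55 | 0.50 | 0.82 |
| 0.50 | 0.65 | 27 | 36 | 55 | 118 | 13 | 34 | 67 | 21 | 41 | 68 | 54.85 | 0.50 | 0.82 |
| 0.55 | 0.70 | 25 | 20 | 40 | 85 | 13 | 26 | 52 | 19 | 32 | 53 | 44.81 | 0.48 | 0.76 |
| 0.55 | 0.70 | 14 | 25 | 39 | 78 | 7 | 23 | 49 | 14 | 31 | 50 | 36.11 | 0.45 | 0.78 |
| 0.55 | 0.70 | 28 | 31 | 51 | 110 | 15 | 35 | 68 | 23 | 42 | 69 | 52.11 | 0.52 | 0.82 |
| 0.60 | 0.75 | 20 | 23 | 34 | 77 | 11 | 27 | 51 | 18 | 32 | 52 | 41.67 | 0.41 | 0.76 |
| 0.60 | 0.75 | 17 | 20 | 40 | 77 | 10 | 24 | 52 | 17 | 30 | 53 | 33.23 | 0.55 | 0.82 |
| 0.60 | 0.75 | 27 | 29 | 47 | 103 | 16 | 36 | 69 | 25 | 42 | 70 | 48.40 | 0.54 | 0.83 |
| 0.65 | 0.80 | 21 | 17 | 35 | 73 | 13 | 26 | 52 | 20 | 30 | 53 | 37.25 | 0.46 | 0.80 |
| 0.65 | 0.80 | 13 | 20 | 37 | 70 | 8 | 23 | 51 | 13 | 29 | 52 | 29.99 | 0.50 | 0.81 |
| 0.65 | 0.80 | 24 | 24 | 36 | 84 | 15 | 33 | 61 | 24 | 39 | 62 | 44.22 | 0.47 | 0.79 |
| 0.70 | 0.85 | 18 | 16 | 25 | 59 | 12 | 25 | 45 | 18 | 29 | 46 | 31.81 | 0.47 | 0.79 |
| 0.70 | 0.85 | 13 | 17 | 30 | 60 | 9 | 22 | 47 | 13 | 29 | 48 | 26.27 | 0.59 | 0.79 |
| 0.70 | 0.85 | 19 | 24 | 40 | 83 | 13 | 32 | 64 | 19 | 37 | 65 | 37.25 | 0.53 | 0.83 |
| 0.75 | 0.90 | 15 | 13 | 27 | 55 | 11 | 22 | 45 | 15 | 25 | 46 | 25.46 | 0.55 | 0.83 |
| 0.75 | 0.90 | 13 | 18 | 24 | 55 | 10 | 25 | 46 | 13 | 31 | 47 | 21.35 | 0.69 | 0.88 |
| 0.75 | 0.90 | 15 | 20 | 38 | 73 | 11 | 28 | 60 | 15 | 33 | 61 | 29.83 | 0.55 | 0.85 |
| 0.80 | 0.95 | 13 | 5 | 16 | 34 | 10 | 15 | 30 | 13 | 18 | 31 | 18.75 | 0.55 | 0.78 |
| 0.80 | 0.95 | 15 | 16 | 14 | 45 | 13 | 27 | 40 | 15 | 31 | 41 | 17.76 | 0.87 | 0.95 |
| 0.80 | 0.95 | 16 | 15 | 21 | 52 | 13 | 27 | 46 | 16 | 30 | 47 | 22.45 | 0.68 | 0.92 |

Note: The first, second and third rows of each (P0, P1) pair correspond to (α, β) = (0.1, 0.1), (0.05, 0.20), (0.05, 0.10), respectively. Columns n1 through r3 define the multiple testing procedure. ASN(P0), stage 1 PET(P0) and overall PET(P0) represent the average sample number, the probability of early termination at stage 1 and the overall probability of early termination under the true response rate P = P0.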

In general, the optimal design and the minimax design are each unique. We wrote C programs that search exhaustively for the optimal and minimax designs. The programs are available upon request for finding the optimal and minimax designs under other choices of P0, P1, α and β.
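The selection step at the end of such a search can be sketched in a few lines of Python. This is an illustration of the two criteria only, not the authors' C program: the `Candidate` record and the function names are assumptions, and a real search would first enumerate every design and keep those that satisfy the α and β constraints. The two candidates below are the proposed designs listed in Table 5 for P0 = 0.1, P1 = 0.3 and α = β = 0.1.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """Summary of one valid design (hypothetical record type)."""
    label: str
    max_n: int      # maximal sample size N
    asn_p0: float   # average sample number under P = P0


def pick_optimal(valid):
    """Optimal criterion: smallest ASN(P0) among valid designs."""
    return min(valid, key=lambda d: d.asn_p0)


def pick_minimax(valid):
    """Minimax criterion: smallest N first, then smallest ASN(P0)."""
    return min(valid, key=lambda d: (d.max_n, d.asn_p0))


if __name__ == "__main__":
    valid = [
        Candidate("N = 31 design", 31, 17.42),
        Candidate("N = 25 design", 25, 18.62),
    ]
    print("optimal:", pick_optimal(valid).label)
    print("minimax:", pick_minimax(valid).label)
```

Sorting on the pair (N, ASN) expresses the two-step minimax rule directly: ties in N are broken by the average sample number.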


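Table 3. Minimax three-stage designs for P1 − P0 = 0.20

| P0 | P1 | n1 | n2 | n3 | N | a1 | a2 | a3 | r1 | r2 | r3 | ASN(P0) | Stage 1 PET(P0) | Overall PET(P0) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.05 | 0.25 | 13 | 7 | 0 | 20 | 0 | 2 | 2 | 3 | 3 | 3 | 16.24 | 0.54 | 1 |
| 0.05 | 0.25 | 9 | 7 | 1 | 17 | 0 | 1 | 2 | 3 | 3 | 3 | 11.65 | 0.64 | 0.88 |
| 0.05 | 0.25 | 13 | 6 | 6 | 25 | 0 | 1 | 3 | 3 | 4 | 4 | 16.97 | 0.54 | 0.80 |
| 0.10 | 0.30 | 11 | 6 | 8 | 25 | 0 | 1 | 4 | 4 | 5 | 5 | 18.62 | 0.33 | 0.55 |
| 0.10 | 0.30 | 8 | 12 | 5 | 25 | 0 | 3 | 5 | 4 | 5 | 6 | 15.17 | 0.44 | 0.92 |
| 0.10 | 0.30 | 16 | 6 | 11 | 33 | 1 | 2 | 6 | 5 | 6 | 7 | 22.26 | 0.53 | 0.69 |
| 0.15 | 0.35 | 13 | 11 | 8 | 32 | 1 | 4 | 7 | 6 | 7 | 8 | 21.20 | 0.41 | 0.79 |
| 0.15 | 0.35 | 11 | 9 | 8 | 28 | 1 | 3 | 7 | 6 | 7 | 8 | 17.78 | 0.49 | 0.72 |
| 0.15 | 0.35 | 16 | 11 | 11 | 38 | 1 | 4 | 9 | 7 | 10 | 10 | 27.75 | 0.29 | 0.64 |
| 0.20 | 0.40 | 15 | 10 | 11 | 36 | 2 | 5 | 10 | 7 | 9 | 11 | 24.13 | 0.42 | 0.70 |
| 0.20 | 0.40 | 10 | 9 | 14 | 33 | 1 | 4 | 10 | 6 | 9 | 11 | 19.63 | 0.38 | 0.71 |
| 0.20 | 0.40 | 16 | 12 | 17 | 45 | 2 | 6 | 13 | 9 | 12 | 14 | 28.83 | 0.35 | 0.70 |
| 0.25 | 0.45 | 14 | 13 | 12 | 39 | 2 | 7 | 13 | 8 | 11 | 14 | 26.63 | 0.29 | 0.71 |
| 0.25 | 0.45 | 14 | 12 | 10 | 36 | 3 | 7 | 13 | 8 | 12 | 14 | 22.07 | 0.53 | 0.76 |
| 0.25 | 0.45 | 23 | 10 | 16 | 49 | 5 | 9 | 17 | 13 | 14 | 18 | 32.46 | 0.47 | 0.74 |
| 0.30 | 0.50 | 19 | 11 | 9 | 39 | 4 | 9 | 15 | 11 | 14 | 16 | 29.97 | 0.29 | 0.65 |
| 0.30 | 0.50 | 11 | 8 | 20 | 39 | 2 | 6 | 16 | 9 | 11 | 17 | 22.72 | 0.31 | 0.69 |
| 0.30 | 0.50 | 17 | 15 | 21 | 53 | 4 | 10 | 21 | 13 | 16 | 22 | 32.69 | 0.39 | 0.69 |
| 0.35 | 0.55 | 17 | 18 | 7 | 42 | 5 | 14 | 18 | 12 | 18 | 19 | 28.53 | 0.42 | 0.84 |
| 0.35 | 0.55 | 14 | 13 | 12 | 39 | 4 | 11 | 18 | 11 | 16 | 19 | 23.67 | 0.42 | 0.82 |
| 0.35 | 0.55 | 23 | 18 | 12 | 53 | 7 | 15 | 24 | 16 | 21 | 25 | 37.06 | 0.41 | 0.71 |
| 0.40 | 0.60 | 21 | 8 | 12 | 41 | 7 | 12 | 20 | 14 | 17 | 21 | 29.92 | 0.36 | 0.68 |
| 0.40 | 0.60 | 19 | 7 | 13 | 39 | 7 | 12 | 20 | 18 | 21 | 21 | 25.13 | 0.49 | 0.80 |
| 0.40 | 0.60 | 22 | 15 | 17 | 54 | 8 | 16 | 27 | 16 | 23 | 28 | 34.49 | 0.46 | 0.75 |
| 0.45 | 0.65 | 16 | 11 | 14 | 41 | 6 | 12 | 22 | 14 | 18 | 23 | 28.29 | 0.37 | 0.61 |
| 0.45 | 0.65 | 15 | 9 | 15 | 39 | 6 | 12 | 22 | 15 | 20 | 23 | 23.40 | 0.45 | 0.77 |
| 0.45 | 0.65 | 19 | 15 | 20 | 54 | 7 | 16 | 30 | 15 | 22 | 31 | 35.34 | 0.32 | 0.69 |
| 0.50 | 0.70 | 17 | 9 | 13 | 39 | 7 | 13 | 23 | 14 | 19 | 24 | 28.18 | 0.32 | 0.61 |
| 0.50 | 0.70 | 17 | 12 | 8 | 37 | 8 | 16 | 23 | 15 | 21 | 24 | 24.59 | 0.50 | 0.80 |
| 0.50 | 0.70 | 16 | 20 | 17 | 53 | 7 | 19 | 32 | 15 | 25 | 33 | 32.43 | 0.40 | 0.74 |
| 0.55 | 0.75 | 23 | 12 | 3 | 38 | 13 | 22 | 24 | 19 | 25 | 25 | 27.57 | 0.64 | 0.91 |
| 0.55 | 0.75 | 15 | 8 | 13 | 36 | 8 | 14 | 24 | 15 | 20 | 25 | 21.28 | 0.55 | 0.80 |
| 0.55 | 0.75 | 22 | 13 | 14 | 49 | 12 | 21 | 32 | 20 | 28 | 33 | 30.46 | 0.57 | 0.80 |
| 0.60 | 0.80 | 16 | 6 | 13 | 35 | 9 | 14 | 24 | 15 | 20 | 25 | 22.74 | 0.48 | 0.72 |
| 0.60 | 0.80 | 15 | 17 | 2 | 34 | 9 | 23 | 24 | 15 | 25 | 25 | 21.91 | 0.60 | 0.97 |
| 0.60 | 0.80 | 20 | 10 | 15 | 45 | 11 | 19 | 32 | 18 | 24 | 33 | 29.89 | 0.41 | 0.74 |
| 0.65 | 0.85 | 12 | 6 | 13 | 31 | 7 | 12 | 23 | 12 | 16 | 24 | 19.55 | 0.42 | 0.69 |
| 0.65 | 0.85 | 15 | 9 | 6 | 30 | 10 | 18 | 23 | 15 | 22 | 24 | 18.73 | 0.65 | 0.90 |
| 0.65 | 0.85 | 18 | 19 | 3 | 40 | 11 | 28 | 30 | 18 | 30 | 31 | 28.52 | 0.45 | 0.97 |
| 0.70 | 0.90 | 12 | 8 | 5 | 25 | 8 | 15 | 20 | 12 | 18 | 21 | 16.75 | 0.52 | 0.82 |
| 0.70 | 0.90 | 13 | 10 | 3 | 26 | 9 | 19 | 21 | 13 | 21 | 22 | 17.72 | 0.59 | 0.96 |
| 0.70 | 0.90 | 17 | 15 | 2 | 34 | 12 | 26 | 27 | 17 | 28 | 28 | 22.86 | 0.61 | 0.97 |
| 0.75 | 0.95 | 11 | 4 | 6 | 21 | 8 | 12 | 18 | 11 | 15 | 19 | 13.83 | 0.59 | 0.80 |
| 0.75 | 0.95 | 12 | 4 | 7 | 23 | 10 | 14 | 20 | 12 | 16 | 21 | 12.79 | 0.87 | 0.96 |
| 0.75 | 0.95 | 15 | 7 | 5 | 27 | 12 | 19 | 23 | 15 | 21 | 24 | 16.77 | 0.78 | 0.96 |

Note: The first, second and third rows of each (P0, P1) pair correspond to (α, β) = (0.1, 0.1), (0.05, 0.20), (0.05, 0.10), respectively. Columns n1 through r3 define the multiple testing procedure. ASN(P0), stage 1 PET(P0) and overall PET(P0) represent the average sample number, the probability of early termination at stage 1 and the overall probability of early termination under the true response rate P = P0.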

Table 4. Minimax three-stage designs for P1 − P0 = 0.15

| P0 | P1 | n1 | n2 | n3 | N | a1 | a2 | a3 | r1 | r2 | r3 | ASN(P0) | Stage 1 PET(P0) | Overall PET(P0) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.05 | 0.20 | 17 | 6 | 9 | 32 | 0 | 1 | 3 | 3 | 4 | 4 | 22.45 | 0.47 | 0.75 |
| 0.05 | 0.20 | 12 | 10 | 5 | 27 | 0 | 1 | 3 | 3 | 4 | 4 | 17.52 | 0.56 | 0.78 |
| 0.05 | 0.20 | 22 | 3 | 13 | 38 | 0 | 1 | 4 | 4 | 5 | 5 | 28.29 | 0.35 | 0.67 |
| 0.10 | 0.25 | 22 | 11 | 7 | 40 | 1 | 3 | 6 | 6 | 7 | 7 | 31.54 | 0.36 | 0.65 |
| 0.10 | 0.25 | 16 | 12 | 12 | 40 | 1 | 3 | 7 | 5 | 7 | 8 | 24.46 | 0.53 | 0.76 |
| 0.10 | 0.25 | 21 | 15 | 19 | 55 | 1 | 4 | 9 | 7 | 8 | 10 | 35.22 | 0.37 | 0.75 |
| 0.15 | 0.30 | 21 | 18 | 14 | 53 | 2 | 6 | 11 | 8 | 10 | 12 | 36.06 | 0.38 | 0.72 |
| 0.15 | 0.30 | 19 | 15 | 14 | 48 | 2 | 6 | 11 | 9 | 10 | 12 | 30.16 | 0.44 | 0.80 |
| 0.15 | 0.30 | 35 | 22 | 37 | 82 | 3 | 8 | 17 | 9 | 13 | 18 | 39.68 | 0.54 | 0.82 |
| 0.20 | 0.35 | 27 | 17 | 14 | 58 | 4 | 9 | 15 | 12 | 15 | 16 | 42.77 | 0.35 | 0.66 |
| 0.20 | 0.35 | 25 | 13 | 15 | 53 | 4 | 8 | 15 | 12 | 15 | 16 | 37.26 | 0.42 | 0.68 |
| 0.20 | 0.35 | 31 | 21 | 25 | 77 | 5 | 11 | 21 | 15 | 17 | 22 | 51.11 | 0.39 | 0.71 |
| 0.25 | 0.40 | 33 | 18 | 13 | 64 | 7 | 13 | 20 | 15 | 19 | 21 | 48.03 | 0.40 | 0.67 |
| 0.25 | 0.40 | 30 | 12 | 18 | 60 | 7 | 12 | 20 | 17 | 20 | 21 | 39.80 | 0.51 | 0.78 |
| 0.25 | 0.40 | 46 | 23 | 14 | 83 | 11 | 19 | 27 | 21 | 25 | 28 | 60.22 | 0.51 | 0.78 |
| 0.30 | 0.45 | 29 | 22 | 18 | 69 | 7 | 16 | 25 | 17 | 24 | 26 | 49.76 | 0.32 | 0.67 |
| 0.30 | 0.45 | 29 | 13 | 23 | 65 | 8 | 14 | 25 | 19 | 24 | 26 | 41.39 | 0.48 | 0.76 |
| 0.30 | 0.45 | 46 | 27 | 15 | 88 | 12 | 25 | 33 | 25 | 33 | 34 | 66.25 | 0.35 | 0.83 |
| 0.35 | 0.50 | 36 | 19 | 17 | 72 | 11 | 20 | 30 | 21 | 26 | 31 | 53.29 | 0.36 | 0.70 |
| 0.35 | 0.50 | 33 | 15 | 18 | 66 | 10 | 18 | 29 | 22 | 28 | 30 | 47.90 | 0.36 | 0.71 |
| 0.35 | 0.50 | 38 | 22 | 34 | 94 | 12 | 22 | 40 | 25 | 31 | 41 | 61.84 | 0.40 | 0.69 |
| 0.40 | 0.55 | 30 | 18 | 25 | 73 | 10 | 19 | 34 | 21 | 29 | 35 | 53.63 | 0.29 | 0.56 |
| 0.40 | 0.55 | 33 | 30 | 7 | 70 | 13 | 30 | 34 | 22 | 35 | 35 | 47.05 | 0.55 | 0.93 |
| 0.40 | 0.55 | 55 | 15 | 24 | 94 | 21 | 29 | 45 | 34 | 38 | 46 | 71.16 | 0.45 | 0.67 |
| 0.45 | 0.60 | 41 | 28 | 5 | 74 | 18 | 35 | 38 | 28 | 39 | 39 | 55.20 | 0.51 | 0.90 |
| 0.45 | 0.60 | 27 | 22 | 21 | 70 | 11 | 23 | 38 | 20 | 30 | 39 | 46.22 | 0.41 | 0.71 |
| 0.45 | 0.60 | 51 | 39 | 5 | 95 | 22 | 47 | 50 | 35 | 51 | 51 | 72.62 | 0.45 | 0.95 |
| 0.50 | 0.65 | 35 | 19 | 18 | 72 | 16 | 28 | 41 | 26 | 34 | 42 | 52.17 | 0.37 | 0.71 |
| 0.50 | 0.65 | 36 | 26 | 6 | 68 | 18 | 36 | 40 | 29 | 41 | 41 | 47.70 | 0.57 | 0.93 |
| 0.50 | 0.65 | 40 | 26 | 27 | 93 | 18 | 34 | 54 | 29 | 43 | 55 | 66.63 | 0.32 | 0.67 |
| 0.55 | 0.70 | 43 | 17 | 8 | 68 | 23 | 36 | 42 | 34 | 42 | 43 | 53.16 | 0.48 | 0.83 |
| 0.55 | 0.70 | 33 | 31 | 2 | 66 | 18 | 41 | 42 | 27 | 43 | 43 | 47.07 | 0.55 | 0.98 |
| 0.55 | 0.70 | 46 | 38 | 5 | 89 | 25 | 52 | 56 | 37 | 57 | 57 | 64.51 | 0.52 | 0.93 |
| 0.60 | 0.75 | 31 | 15 | 18 | 64 | 17 | 28 | 43 | 27 | 34 | 44 | 47.15 | 0.34 | 0.65 |
| 0.60 | 0.75 | 32 | 26 | 3 | 61 | 19 | 40 | 42 | 28 | 43 | 43 | 44.12 | 0.54 | 0.96 |
| 0.60 | 0.75 | 46 | 29 | 9 | 84 | 28 | 50 | 57 | 39 | 57 | 58 | 58.31 | 0.60 | 0.91 |
| 0.65 | 0.80 | 30 | 27 | 2 | 59 | 19 | 41 | 42 | 26 | 43 | 43 | 43.59 | 0.50 | 0.96 |
| 0.65 | 0.80 | 23 | 20 | 12 | 55 | 14 | 30 | 41 | 23 | 35 | 42 | 36.95 | 0.41 | 0.81 |
| 0.65 | 0.80 | 37 | 24 | 14 | 75 | 23 | 42 | 55 | 33 | 48 | 56 | 53.76 | 0.42 | 0.80 |
| 0.70 | 0.85 | 24 | 23 | 4 | 51 | 16 | 36 | 39 | 23 | 39 | 40 | 37.29 | 0.44 | 0.91 |
| 0.70 | 0.85 | 18 | 16 | 15 | 49 | 12 | 25 | 39 | 18 | 30 | 40 | 30.01 | 0.47 | 0.77 |
| 0.70 | 0.85 | 34 | 30 | 3 | 67 | 24 | 50 | 52 | 31 | 53 | 53 | 46.21 | 0.60 | 0.96 |
| 0.75 | 0.90 | 22 | 11 | 7 | 40 | 16 | 26 | 33 | 21 | 30 | 34 | 29.00 | 0.50 | 0.79 |
| 0.75 | 0.90 | 21 | 18 | 2 | 41 | 16 | 33 | 34 | 21 | 35 | 35 | 27.63 | 0.63 | 0.97 |
| 0.75 | 0.90 | 22 | 11 | 21 | 54 | 16 | 26 | 45 | 22 | 32 | 46 | 32.63 | 0.48 | 0.76 |
| 0.80 | 0.95 | 16 | 4 | 11 | 31 | 13 | 17 | 27 | 16 | 20 | 28 | 19.26 | 0.68 | 0.82 |
| 0.80 | 0.95 | 17 | 12 | 6 | 35 | 14 | 26 | 31 | 17 | 29 | 32 | 20.68 | 0.71 | 0.96 |
| 0.80 | 0.95 | 21 | 14 | 6 | 41 | 17 | 31 | 36 | 21 | 33 | 37 | 26.29 | 0.64 | 0.96 |

Note: The first, second and third rows of each (P0, P1) pair correspond to (α, β) = (0.1, 0.1), (0.05, 0.20), (0.05, 0.10), respectively. Columns n1 through r3 define the multiple testing procedure. ASN(P0), stage 1 PET(P0) and overall PET(P0) represent the average sample number, the probability of early termination at stage 1 and the overall probability of early termination under the true response rate P = P0.

Table 5. Comparison of one-stage, two-stage and three-stage designs with α = 0.1 and β = 0.1 (P0 = 0.1, P1 = 0.3)

| Design | Type | n1 | n2 | n3 | N | a1 | a2 | a3 | r1 | r2 | r3 | ASN(P0) | Stage 1 PET(P0) | Overall PET(P0) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| One stage | | 25 | NA | NA | 25 | 4 | NA | NA | ∞ | ∞ | ∞ | 25 | NA | NA |
| Simon's two stage | Optimal | 12 | 23 | NA | 35 | 1 | 5 | NA | ∞ | ∞ | ∞ | 19.8 | 0.65 | 0.65 |
| Simon's two stage | Minimax | 16 | 9 | NA | 25 | 1 | 4 | NA | ∞ | ∞ | ∞ | 20.4 | 0.51 | 0.51 |
| Chen's three stage | Optimal | 10 | 9 | 7 | 26 | 0 | 2 | 4 | ∞ | ∞ | ∞ | 17.8 | 0.35 | 0.72 |
| Chen's three stage | Minimax | 12 | 4 | 9 | 25 | 0 | 1 | 4 | ∞ | ∞ | ∞ | 19.1 | 0.28 | 0.53 |
| Our three stage | Optimal | 13 | 11 | 7 | 31 | 1 | 3 | 5 | 4 | 5 | 6 | 17.42 | 0.66 | 0.91 |
| Our three stage | Minimax | 11 | 6 | 8 | 25 | 0 | 1 | 4 | 4 | 5 | 5 | 18.62 | 0.33 | 0.55 |

2.2. Optimal three-stage design

Tables 1 and 2 provide examples of optimal designs for different combinations of P0, P1, α and β. Table 1 is for trials with P1 − P0 = 0.20 and Table 2 is for trials with P1 − P0 = 0.15. In each table, the three rows for each (P0, P1) pair correspond to (α, β) = (0.1, 0.1), (0.05, 0.20) and (0.05, 0.1), respectively. The tables also give ASN(P0), stage 1 PET(P0) and overall PET(P0), which represent the average sample number, the probability of early termination at the first stage and the overall probability of early termination, respectively, when the true response rate is P0.

For example, the first row in Table 1 corresponds to a design with P0 = 0.05, P1 = 0.25, α = 0.1 and β = 0.1. At the first stage, ten patients are enrolled into the study. If none of these ten patients responds, the trial is terminated and we conclude that the new drug is not sufficiently active; if there are at least two responses at the first stage, the trial is terminated and we conclude that the new drug is promising and warrants further investigation; if there is exactly one response, the trial continues to the second stage. At the second stage, another seven patients (a total of 17 patients) are enrolled. If there are fewer than two responses among the 17 patients, the trial is terminated and we conclude that the new drug is not sufficiently active; if there are at least four responses among the 17 patients, the trial is terminated and we conclude that the new drug is sufficiently active for further investigation; if two or three responses are observed, the trial continues to the third and final stage. At the final stage, an additional 10 patients (a total of 27 patients) are enrolled. If there are fewer than four responses among the 27 patients, we conclude that the new drug is not active; otherwise, the new drug is considered active. The average sample number is 13.14 for a drug with a true response rate of 0.05. The probability of early termination at the first stage is 0.68 and the overall probability of early termination is 0.91, assuming the true response rate is 0.05.

Comparing Tables 1 and 2 with Tables I and II of Chen [3], our optimal design requires a slightly larger maximal sample size but a smaller average sample number in most cases. When P0 and P1 are close to one, such as P0 = 0.80 and P1 = 0.95, the average sample numbers are slightly larger than Chen's [3]. The probability of early termination at the first stage and the overall probability of early termination are higher than Chen's [3] because of the two-boundary design.

2.3. Minimax three-stage design

Tables 3 and 4 show the minimax designs for a variety of design parameters.
Table 3 applies to trials with P1 − P0 = 0.20 and Table 4 applies to trials with P1 − P0 = 0.15. In each table, the three rows for each (P0, P1) pair correspond to (α, β) = (0.1, 0.1), (0.05, 0.20) and (0.05, 0.1), respectively. The tables also give ASN(P0), stage 1 PET(P0) and overall PET(P0), which represent the average sample number, the probability of early termination at the first stage and the overall probability of early termination, respectively, when the true response rate is P0. Comparing Tables 3 and 4 with Tables III and IV of Chen [3], our minimax design requires a similar maximal sample size but a smaller average sample number in most cases. When P0 and P1 are close to one, such as P0 = 0.80 and P1 = 0.95, the average sample numbers are slightly larger than Chen's [3]. The probability of early termination at the first stage and the overall probability of early termination are higher than Chen's [3] because of the two-boundary design.

Table 6. Comparison of three-stage optimal designs with α = 0.05 and β = 0.1

| P0 | P1 | Type | n1 | n2 | n3 | N | a1 | a2 | a3 | r1 | r2 | r3 | ASN(P0) | Stage 1 PET(P0) | Overall PET(P0) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.05 | 0.20 | Our three stage | 14 | 9 | 19 | 42 | 0 | 1 | 4 | 4 | 5 | 5 | 23.87 | 0.49 | 0.72 |
| 0.05 | 0.20 | Chen | 14 | 15 | 14 | 43 | 0 | 2 | 4 | ∞ | ∞ | ∞ | 23.89 | 0.49 | 0.84 |
| 0.05 | 0.20 | Fleming | 15 | 15 | 10 | 40 | −1 | 2 | 4 | 4 | 5 | 5 | 31.6 | 0.01 | 0.83 |
| 0.10 | 0.30 | Our three stage | 13 | 15 | 15 | 43 | 1 | 4 | 7 | 5 | 7 | 8 | 20.06 | 0.63 | 0.90 |
| 0.10 | 0.30 | Chen | 13 | 10 | 22 | 45 | 1 | 3 | 7 | ∞ | ∞ | ∞ | 20.39 | 0.62 | 0.84 |
| 0.10 | 0.30 | Fleming | 15 | 10 | 10 | 35 | 0 | 3 | 6 | 5 | 6 | 7 | 24.78 | 0.22 | 0.80 |
| 0.20 | 0.40 | Our three stage | 17 | 15 | 22 | 54 | 3 | 8 | 15 | 9 | 12 | 16 | 26.78 | 0.55 | 0.86 |
| 0.20 | 0.40 | Chen | 17 | 13 | 20 | 50 | 3 | 7 | 14 | ∞ | ∞ | ∞ | 27.05 | 0.55 | 0.79 |
| 0.20 | 0.40 | Fleming | 20 | 15 | 15 | 50 | 2 | 9 | 15 | 10 | 13 | 16 | 33.8 | 0.21 | 0.87 |

Note: Columns n1 through r3 define the multiple testing procedure.


As shown in Tables 1–4, the optimal three-stage design minimizes only the average sample number and in most cases does not minimize the maximal sample size. The minimax three-stage design minimizes the maximal sample size and in most cases does not minimize the average sample number, although for that fixed maximal sample size it does minimize the average sample number. In cancer trials where accruing patients is very expensive and the accrual rate is low, the minimax design can be more attractive than the optimal design if the difference in average sample number is not too large.

3. Example

The same example as in Chen [3] is used here for illustration. For locally recurrent or metastatic head and neck cancer, the desirable target response rate is 30% and the undesirable response rate is 10%, with both α and β equal to 10%. Table 5 provides the specifications for the one-stage, two-stage and three-stage optimal and minimax designs. In Table 5, compared with the one-stage design, multiple-stage designs generally need a larger maximal sample size: the maximal sample size increases from 25 patients in the one-stage design to as many as 35 patients in Simon's optimal two-stage design. However, optimal two-stage and three-stage designs usually need a smaller average sample number. All optimal designs in Table 5 have an average sample number below 20 patients, compared with 25 patients in the one-stage design. Minimax designs minimize the maximal sample size and also minimize the average sample number for that maximal sample size. All minimax designs in Table 5 have the same maximal sample size as the one-stage design but a smaller average sample number. Compared with Simon's optimal two-stage design, the optimal three-stage designs in Table 5 generally need a smaller average sample number and, in most situations, a larger maximal sample size. However, the maximal sample size is similar in all three minimax designs.
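To make the use of such a design concrete, the following Python sketch applies the stopping rules of Section 2.1 to the proposed optimal three-stage design listed in Table 5 (acceptance points 1, 3, 5 and rejection points 4, 5, 6). The function name `decide` and the returned labels are illustrative assumptions, not part of the original paper.

```python
def decide(stage, cumulative_responses, a, r):
    """Apply the stopping rule at a given stage (1, 2 or 3).

    a and r hold the cumulative acceptance and rejection points;
    cumulative_responses is S_g, the total responses seen so far.
    """
    a_g, r_g = a[stage - 1], r[stage - 1]
    if cumulative_responses <= a_g:
        return "reject_H1"   # stop: drug not sufficiently active
    if cumulative_responses >= r_g:
        return "reject_H0"   # stop: drug promising
    if stage == 3:
        # With a3 = r3 - 1 this branch cannot be reached.
        raise ValueError("final stage must end in a decision")
    return "continue"


if __name__ == "__main__":
    # Proposed optimal design from Table 5 (P0 = 0.1, P1 = 0.3, alpha = beta = 0.1)
    a = (1, 3, 5)
    r = (4, 5, 6)
    print(decide(1, 2, a, r))   # 2 responses among the first 13 patients
    print(decide(2, 5, a, r))   # 5 responses among 24 patients
    print(decide(3, 5, a, r))   # 5 responses among all 31 patients
```

Because the boundaries are expressed as cumulative counts, the same function serves all three looks; only the stage index changes.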
Because it can stop the trial early for both effective and ineffective drugs, our optimal three-stage design has a higher probability of early termination than Simon's and Chen's optimal designs. Both Chen's and our minimax three-stage designs have smaller stopping probabilities at the first stage than Simon's minimax two-stage design.

4. Discussion

In the previous sections, we have provided the optimal and minimax three-stage designs with early stopping under both hypotheses for given α and β with P1 − P0 = 0.20 and P1 − P0 = 0.15. Table 6 compares three three-stage designs, namely our optimal design, Chen's optimal design [3] and Fleming's design [4], with α = 0.05 and β = 0.1. Our optimal design with early stopping under both hypotheses has a similar average sample number and a similar probability of early termination to Chen's optimal design, which stops early only under H0 [3]. Compared with Fleming's design [4], which is not optimized in the sense of minimizing the average sample number, both our optimal design and Chen's optimal design have a much smaller average sample number but a larger maximal sample size. The probabilities of early termination at stage one for our optimal design and Chen's optimal design are much larger than for Fleming's design [4]. However, the overall stopping probabilities are close for all three designs.

As mentioned in Chen's paper [3], the expected sample number decreases from two-stage to three-stage designs, but the decrease is not as great as that from one-stage to two-stage designs. He suggests that further extension to designs with more than three stages may not be useful. He also comments that the optimal design has the flavor of Pocock's group sequential design and the minimax design has the flavor of O'Brien and Fleming's group sequential design.
The most challenging aspect of moving from a one-stage to a multi-stage design is operational difficulty in clinical trial practice: enrollment has to be halted at each interim stage, and database cleaning and analysis may take a couple of months. A three-stage design adds more of this challenge than a two-stage design. However, with current technology such as an interactive voice response system (IVRS), it is now much easier to monitor and control a clinical trial. To speed up database cleaning and analysis and shorten the enrollment pause, the cleaning process should focus on the data most important to decision making. The shorter the enrollment pause, the better the integrity of the trial is preserved. Because this is a single-arm, open-label trial, it is also more convenient for sponsors to monitor the trial and make decisions.

The exact sample size specified at the design stage is not always attained because of the operational complexity of a trial. When the sponsor decides to halt enrollment for an interim analysis, communicating the decision to clinical sites may take some time, and a few more patients may be enrolled in the meantime if enrollment is fast. Sponsors must monitor the trial very closely, predict when the number of patients required by the design will be reached, and communicate with sites efficiently. Although the actual sample size may not be exactly the same as the one in the design, it should be as close as possible. Green and Dahlberg [6] propose a modification of the two-stage design to deal with this situation. A similar approach could be used for the three-stage design.

Neither Simon's two-stage design [1] nor Chen's three-stage design [3] recommends early termination when the response rate is very high. Even though there is no ethical reason to terminate a trial because of a beneficial treatment, sponsors may want to speed up the move from phase II to phase III based on early phase II results. However, one should decide to terminate the trial early and declare anti-tumor activity of the study treatment only when extreme evidence in favor of efficacy has been obtained at the early stages. As suggested by Fleming [4], the decision whether to allow early termination when early results favor the study treatment should be made on a study-by-study basis.

Acknowledgements

The authors wish to thank Dr. Zhengqing Li at Bristol Myers Squibb Co. for his constructive comments. The authors also wish to thank Hemant Virkar for his C program for Simon's optimal and minimax two-stage design.

References

[1] Simon R. Optimal two-stage designs for phase II clinical trials. Control Clin Trials 1989;10:1–10.
[2] Ensign LG, Gehan EA, Kamen DS, Thall PF. An optimal three-stage design for phase II clinical trials. Stat Med 1994;13:1727–36.
[3] Chen TT. Optimal three-stage designs for phase II clinical trials. Stat Med 1997;16:2701–11.
[4] Fleming TR. One sample multiple testing procedure for phase II clinical trials. Biometrics 1982;38:143–51.
[5] Schultz JR, Nichol FR, Elfring GL, Weed SD. Multiple-stage procedures for drug screening. Biometrics 1973;29:293–300.
[6] Green SJ, Dahlberg S. Planned versus attained design in phase II clinical trials. Stat Med 1992;11:853–62.