Performance of multinomial designs in comparison with response-based designs in non-randomized phase II trials of targeted cancer agents

Performance of multinomial designs in comparison with response-based designs in non-randomized phase II trials of targeted cancer agents

original articles 26. Negri T, Virdis E, Brich S et al. Functional mapping of receptor tyrosine kinases in myxoid liposarcoma. Clin Cancer Res 2010; 1...

109KB Sizes 0 Downloads 41 Views

original articles 26. Negri T, Virdis E, Brich S et al. Functional mapping of receptor tyrosine kinases in myxoid liposarcoma. Clin Cancer Res 2010; 16: 3581–3593. 27. Young H, Baum R, Cremerius U et al. Measurement of clinical and subclinical tumour response using [18F]-FDG/PET: review and 1999 EORTC

Annals of Oncology recommendations. EORTC PET Study Group. Eur J Cancer 1999; 35: 1773–1782. 28. Postel-Vinay S, Ashworth A. AXL and acquired resistance to EGFR inhibitors. Nat Genet 2012; 44: 835–836.

Annals of Oncology 24: 1936–1942, 2013 doi:10.1093/annonc/mdt122 Published online 3 April 2013

R. Jamal1*, R. A. Goodwin2, D. Tu3, W. Walsh3, D. Lacombe4 & E. A. Eisenhauer5 1

Department of Hematology-Oncology, Notre-Dame Hospital, CHUM, University of Montreal, Montreal; 2General Division, Ottawa Health Research Institute, Ottawa; NCIC Clinical Trials Group, Queen’s University, Kingston, Canada; 4European Organization for Research and Treatment of Cancer, Brussels, Belgium; 5Department of Oncology, Kingston General Hospital, Queen’s University Cancer Centre of Southeastern Ontario, Kingston, Canada 3

Received 19 October 2012; revised 25 January 2013; accepted 13 February 2013

Background: In phase II trials of cytotoxic agents, a multinomial phase II design incorporating early progression and response end points was shown to perform more efficiently than designs based only on response. We undertook a study to evaluate the performance of these designs in trials of targeted agents using the actual phase II data. Patients and methods: Using best response data from sequentially enrolled patients in 15 NCIC Clinical Trials Group and 7 European Organization for Research and Treatment of Cancer trials of targeted agents, we determined that trials would have been stopped at the end of stage I of accrual by applying rules generated by the multinomial and Fleming designs. Two variants of the multinomial design were studied: to stop accrual after stage I of enrolment, Variant A required either response or progression criteria to be met, whereas Variant B required that both response and progression criteria to be met. Results: Using early progression, null/alternate hypotheses of 60% and 40% (60/40), the multinomial A variant recommended early stopping more often than the Fleming design. In most of the cases, this recommendation was correct given the final trial outcome. In contrast, the multinomial B variant never led to recommendations for early stopping and changing progression hypotheses did not improve the performance of this design. Conclusions: The multinomial A design using 60/40 hypotheses carried out better than the Fleming design in appropriately stopping trials of inactive targeted agents early. The multinomial B design was not useful for early stopping decisions. The multinomial A design may be favored over response-based designs in phase II trials of targeted agents. Key words: clinical trials, multinomial design, phase II designs, targeted agents

introduction Single-agent phase II studies of new cancer agents have traditionally employed objective tumor response as the primary end point to guide decisions on the activity of the agent. Such trials are often non-randomized and use a multistage design such as those of Gehan, Fleming and Simon [1–4] to terminate the trial after the first stage of accrual if too few responses are seen. The stopping rule and sample size of the trial are determined by the null (Ho) and alternate hypotheses (Ha) established for response and the alpha (false-positive) and beta *Correspondence to: Dr R. Jamal, Department of Hematology-Oncology, Notre-Dame Hospital, CHUM, University of Montreal, 1560 Sherbrooke East, Montreal, QC, Canada H2L 4M1. Tel: +1-514-862-0966; Fax: +1-514-412-7803; E-mail: [email protected]

(false-negative) error rates. These parameters will vary based on the tumor type, the patient population and the type of drug being evaluated. Recently, early disease progression has been suggested to be an additional measure of inactivity in an innovative multinomial design [5]. This design presumes that a truly active treatment may be detected not only by evidence of response, but also by the observation of a low proportion of patients having early disease progression. Thus, information on both the number of observed responses and the number of early progression events is considered in decisions about early stopping and conclusions about activity. Zee et al. [5] defined early progression as a best overall response of progressive disease (PD), meaning that those patients had progressed at, or

© The Author 2013. Published by Oxford University Press on behalf of the European Society for Medical Oncology. All rights reserved. For permissions, please email: [email protected].

Downloaded from http://annonc.oxfordjournals.org/ at University of Ulster at Coleraine on April 11, 2015

Performance of multinomial designs in comparison with response-based designs in non-randomized phase II trials of targeted cancer agents

original articles

Annals of Oncology

materials and methods trial data Individual patient response results were obtained from 15 completed NCIC Clinical Trials Group (CTG) phase II trials of targeted agents (Table 1). All 15 trials were conducted using a standard two-stage Fleming design (Ho: proportion responding [ pr] ≤ 5% versus Ha: [ pr] ≥ 20%). For trials I and O, the hypotheses were Ho: [ pr] ≤ 10% versus Ha: [ pr] ≥ 30%. In addition, through a collaboration with the European Organization for Research and Treatment of Cancer (EORTC), we obtained data from seven phase II EORTC trials evaluating targeted agents (Table 2). Three of these trials used an objective response end point in the original design of the study and the rest of the trials used progression-free survival, but all had collected objective response information. Because several EORTC trials evaluated the same targeted agent in different tumor types, we compiled 13 treatment arms that were analyzed as individual trials (considered as trials 1–13 in Table 2). For each study, we obtained the date of patient enrolment and their best objective response to treatment on study.

Table 1. NCIC CTG phase II trials of single-agent targeted therapies Trials

NCIC CTG phase II single-agent targeted therapy trials

A B C D

ZD6474 in recurrent/relapsed multiple myeloma MG98 in renal cell carcinoma Flavopiridol in untreated metastatic malignant melanoma Flavopiridol in untreated metastatic or locally advanced soft tissue sarcoma Sunitinib in locally advanced or metastatic cervical carcinoma Sunitinib in relapsed or refractory diffuse or mediastinal large B-cell lymphoma CGP 64128A (ISIS 3521) in locally advanced or metastatic colorectal cancer CGP 69846A (ISIS 5132) in locally advanced or metastatic colorectal cancer CCI-779 in metastatic and/or locally advanced recurrent endometrial cancer BB-2516 in malignant melanoma and cutaneous metastases Sunitinib in advanced ovarian, primary fallopian tube or peritoneal cancer BAY 43-9006 in hormone refractory prostate cancer Flavopiridol in untreated or relapsed mantle cell lymphoma PS-341(bortezomib) in mantle cell lymphoma PS-341 (bortezomib) in Waldenström’s macroglobulinemia

E F G H I J K L M N O

In the Fleming two-stage design [2], null and alternate hypotheses for response are set and generate a stopping rule based only on response observations in the prespecified number of first stage patients. Because of our familiarity with this design, it was selected for evaluation in this project. The Simon two-stage design is based on similar principles and yields similar decision rules [3].

design parameters: hypotheses and error rates As summarized in Table 3, similar response null and alternate hypotheses were employed for all designs tested. The multinomial designs tested incorporated null and alternate hypotheses for progression that included a variety of paired values as presented in Table 3. The alpha and beta used for the derivation of the stopping rule were 0.1 for all cases considered. The primary comparisons were between Fleming design and the multinomial A and B designs using progression hypotheses of Ho 60% and Ha 40%. Following the results of these comparisons, we explored the impact on the conclusions of the study using the other progression hypotheses summarized in Table 3. Each design generated rules for stopping (accepting the null hypotheses) or acceptance of agent as interesting (accepting the alternate hypotheses) based on the observations of the number of responses and early progression events observed in a minimum number of accruals.

multinomial and Fleming

application of design rules to actual clinical trial data

The multinomial two-stage design has been previously described in detail [5]. It is based on generating a combination of hypotheses for response and progression identifying the values for the null and alternate hypotheses for both variables based on the rates of tumor response and progression deemed to be of interest in the particular tumor setting. Two variants are possible: in one (multinomial A), stopping at the end of stage I of accrual occurs if either response or progression criteria are met. In multinomial B, stopping at the end of stage I requires that both the response and progression criteria for stopping are met.

We applied the rules for stopping trials at the accrual point representing stage I of accrual and at the end of study (if data were available) using Fleming, multinomial A and B designs to the NCIC CTG and EORTC single-agent phase II trials. The best overall response to therapy (complete response, partial response, stable disease or PD) was compiled. For the purpose of the multinomial designs, early progression was considered to have occurred if the best response to therapy was PD. Sequential patient data were reviewed in each trial to tabulate a STOP/ GO decision at the end of stage 1 of accrual and a REJECT/ACCEPT

designs evaluated

Volume 24 | No. 7 | July 2013

doi:10.1093/annonc/mdt122 | 

Downloaded from http://annonc.oxfordjournals.org/ at University of Ulster at Coleraine on April 11, 2015

before, the first scheduled follow-up evaluation on the trial. Trials would stop early for high numbers of early progressions and too few or no responses. In a comparison of the performance of the multinomial stopping rule developed by Zee et al. [5] with the Fleming and Gehan designs in a series of phase II trials evaluating new cytotoxic agents, the multinomial design was found to be more efficient in recommending the early trial closure of what were eventually found to be ineffective agents [6]. In the last 10 years, the majority of new drugs entering cancer clinical trials have been molecular-targeted agents. Because of the differing mechanism of action of these agents, some argued that stabilization of disease (non-progression) might be a more relevant end point than objective response in phase II designs [7–10]. The multinomial design could theoretically temper the importance given to response in decision-making in such trials. Indeed, the Methodology for the Development of Innovative Cancer Therapies task force recently stated that a multinomial design should be considered for use in phase II studies of targeted drugs, particularly when responses rates are anticipated to be low [11]. To investigate the performance of the multinomial design in phase II trials of targeted agents in comparison with a response-based design, we applied both types of design to actual data from a series of phase II studies.

original articles

Annals of Oncology

Table 2. EORTC phase II trials of single-agent targeted therapies EORTC trial

Description of trials

Single-agent targeted therapy arm used for analysis

End point used in original trial

1

Phase II study of GW786034 in patients with relapsed or refractory soft tissue sarcoma Phase II study of STI 571 in advanced soft tissue sarcoma (STS) Phase II study of gefitinib (ZD1839) in locally advanced and/or metastatic synovial sarcoma expressing HER1/EGFR1 Multicenter parallel phase II trial of BI 2536 administered as 1-h IV infusion every 3 weeks in defined cohorts of patients with various solid tumors Phase II study of GW786034 in patients with relapsed or refractory soft tissue sarcoma Open-label phase II study on STI571 administered as a daily oral treatment in gliomas Randomized phase II of erlotinib versus temozolomide or BCNU in patients with recurrent glioblastoma multiforme Open-label phase II study on STI571 administered as a daily oral treatment in gliomas Open-label phase II study on STI571 administered as a daily oral treatment in gliomas Phase II study of GW786034 in patients with relapsed or refractory soft tissue sarcoma Phase II study of GW786034 in patients with relapsed or refractory soft tissue sarcoma Phase 2 study of imatinib in locally advanced and/or metastatic soft tissue sarcomas expressing the t(17;22)(q22;q13) translocation resulting in a COL1A1 /PDGF-beta fusion protein, i.e. DermatoFibroSarcoma Protuberans (DFSP) and Giant Cell Fibroblastoma (GCF) Phase II study of STI 571 in advanced soft tissue sarcoma (STS)

Adipocytic tumors

Progression-free survival Objective response Progression-free survival Objective response

2 3 4 5 6

8 9 10 11 12

13

Table 3. Design parameters: hypotheses tested

Fleming Multinomial A

Multinomial B

Response Ho [pr]

Ha [pr]

Progression Ho [pp]

Ha [pp]

5% 5% 5% 5% 5% 5% 5%

20% 20% 20% 20% 20% 20% 20%

– 60% 50% 40% 60% 50% 40%

– 40% 30% 20% 40% 30% 20%

decision on the regimen’s activity at the end of the study based on each design. A comparison of the performance of the designs was based on whether the recommendations to close accrual early were correct as determined by the final trial observations (when available; see example of process in Table 4 using trial J).

results NCIC CTG trials: performance of the Fleming versus multinomial A and B stopping rules When the stopping rules generated by the various designs were applied to 15 NCIC CTG phase II trials of targeted agents, the

 | Jamal et al.

– Leiomyosarcoma Anaplastic astrocytoma and recurrent low-grade astrocytoma – Glioblastoma multiforme Anaplastic oligodendroglioma and mixed oligoastrocytomas Synovial sarcoma Other types of eligible STS –

GIST (c-Kit positive)

Progression-free survival Objective response Progression-free survival Objective response Objective response Progression-free survival Progression-free survival Progression-free survival

Objective response

Fleming design recommended continuing seven trials (I–O) to stage 2 of accrual, while the multinomial A 60/40 design would have stopped all but three of these (M–O, Table 5). The recommendation at the end of the first stage was concordant in 11 of the 15 trials (in eight (A–H), recommendation was to stop and in three (M–O), recommendation was to continue accrual; Table 5). In the four trials with discordant recommendations (I–L), all would have been continued using the Fleming design and all stopped using multinomial A. Because these trials were actually conducted using a Fleming design, all seven trials also had second stage accrual data. Of these, only two were concluded to be positive by the Fleming design (N, O), whereas those same two trials plus an additional three studies (total 5, K–O) met criteria to consider the agent active by the multinomial A design—interestingly, this included two studies that the multinomial A had recommended for early stopping. In contrast, when the multinomial B 60/40 design parameters were evaluated, none of the 15 trials met criteria for stopping after the first stage of accrual. As summarized in Table 5, changing progression hypotheses to 50/30 and 40/20 had the effect of increasing the probability of early stopping for both multinomial designs such that, for the 40/20 hypotheses, all trials were stopped early in multinomial A and for multinomial B, 4 of the 15 trials were stopped early (B, F–H), all of which were in the same subset of trials stopped early by the original Fleming design.

Volume 24 | No. 7 | July 2013

Downloaded from http://annonc.oxfordjournals.org/ at University of Ulster at Coleraine on April 11, 2015

7

All STS with the exception of GIST –

original articles

Annals of Oncology

Table 4. Example of the application of each design (Fleming, multinomial A and multinomial B) at the end of each stage of accrual for trial J Designs

Stage 1 of accrual: GO/STOP

Application of rule to trial J (actual response and PD data)

Stage 2 of accrual: ACCEPT/REJECT

Application of rule to trial J (actual response and PD data)

Activity reported in the published trial

Fleming

STOP if 0/15 R

1/15 R → GO

ACCEPT if R ≥ 3/25

2/28 R → REJECT

Multinomial A 60/40

STOP if R ≤ 0/15 OR PD ≥ 4/15

1/15 R 12/15 PD → STOP

ACCEPT if any applies:

2/28 R 16/28PD → REJECT

Modest activity reported N/A

• • •

Multinomial B 60/40

STOP if R ≤ 0/14 1/15 R AND PD ≥ 11/14 12/15 PD → GO

ACCEPT if R ≥ 6/50 OR PD ≤ 13/50

2/28 R 16/28 PD → UKN

N/A

UKN: unknown: stage 2 sample size for multinomial B design exceeded the number of patients actually enrolled in some cases, so only rarely could activity be determined. R: response; PD: progressive disease.

Table 5. Recommendations of three designs (Fleming, multinomial A and multinomial B) for early termination of trials and assessment of activity at end of trial using sequential patient data in the NCIC CTG trials Trial Stage 1 R PD

Stage 2 R PD

STOP/GO at stage 1 accrual FL MNA MNA MNA 60/40 50/30 40/20

ACCEPT (+)/REJECT (X) at the end of trial Activity MNB MNB MNB FL MNA MNA MNA MNB MNB MNB reported 60/40 50/30 40/20 60/40 50/30 40/20 60/40 50/30 40/20 (published trial)

A B C D E F G H I J K L M N O

– – – – – – – – 1/27 2/28 1/30 1/27 3/30 13/29 7/27

STOP STOP STOP STOP STOP STOP STOP STOP GO GO GO GO GO GO GO

GO GO GO GO GO GO GO GO GO GO GO GO GO GO GO

0/15 0/15 0/15 0/15 0/18 0/15 0/15 0/15 1/15 1/15 1/15 1/15 3/16 6/16 4/15

8/15 9/15 8/15 7/15 3/18 6/15 11/15 10/15 6/15 12/15 4/15 8/15 2/16 3/16 1/15

– – – – – – – – 12/27 16/28 11/30 13/27 5/30 3/29 1/27

STOP STOP STOP STOP STOP STOP STOP STOP STOP STOP STOP STOP GO GO GO

STOP STOP STOP STOP STOP STOP STOP STOP STOP STOP STOP STOP STOP GO GO

STOP STOP STOP STOP STOP STOP STOP STOP STOP STOP STOP STOP STOP STOP STOP

EORTC trials: performance of the Fleming stopping rule versus the multinomial A and B The seven EORTC trials included 13 different arms of treatment, which were treated as individual trials for the purpose of analysis. Overall, the Fleming design recommended continuing seven trials (trials 7–13) to stage 2 of accrual, while the multinomial A 60/40 recommended only two of the trials continue (trials 12–13, Table 6). In comparing the early stopping recommendation between the two designs, the recommendation at the end of the first stage was concordant in 8 of the 13 trials: in six (trials 1–6), the recommendation was to stop and in two (trials 12 and 13), the recommendation was to continue accrual (Table 6). Similar to the NCIC CTG trial

Volume 24 | No. 7 | July 2013

GO GO GO GO GO GO GO GO GO GO GO GO GO GO GO

GO STOP GO GO GO STOP STOP STOP GO GO GO GO GO GO GO

X X X X X X X X X X X X X + +

X X X X X X X X X X + + + + +

X X X X X X X X X X X X + + +

X X X X X X X X X X X X + + +

UKN UKN UKN UKN UKN UKN UKN UKN UKN UKN UKN UKN UKN + +

UKN UKN UKN UKN UKN UKN UKN UKN UKN UKN UKN UKN UKN + +

UKN UKN UKN UKN UKN UKN UKN UKN UKN UKN UKN UKN UKN + +

Negative Negative Negative Negative Negative Negative Negative Negative Negative Negative Modest Limited Modest Positive Positive

experience, the five discordant recommendations (trials 7–11) were all to continue accrual using the Fleming design parameters, but to stop accrual using the multinomial A 60/40 design. Because the EORTC trials actually did continue to accrue, we can compare the final interpretation of results in those studies using both designs. Of the seven trials, moving that continued to stage 2 on the basis of Fleming design parameters, only three would have been concluded to be positive by the same design (trials 11–13), whereas those same three trials plus one additional study (total 4, trials 10–13) met the multinomial A 60/40 criteria to be considered positive. This included two studies (10 and 11) that the multinomial A design would have stopped early.

doi:10.1093/annonc/mdt122 | 

Downloaded from http://annonc.oxfordjournals.org/ at University of Ulster at Coleraine on April 11, 2015



R ≥ 1/30 AND ≤13/30 PD R ≥ 2/30 AND ≤14/30 PD R ≥ 3/30 AND ≤15/30 PD R ≥ 4/30

original articles

Annals of Oncology

Table 6. Recommendations of three designs (Fleming, multinomial A and multinomial B) for early termination of trials and assessment of activity at the end of trial using sequential patient data in the EORTC trials Stage 2 R PD

STOP/GO at stage 1 accrual FL MNA MNA MNA 60/40 50/30 40/20

ACCEPT (+) / REJECT (X) at the end of trial Activity MNB MNB MNB FL MNA MNA MNA MNB MNB MNB reported 60/40 50/30 40/20 60/40 50/30 40/20 60/40 50/30 40/20 (published trial)

1 2 3 4 5 6 7 8 9 10 11 12 13

– – 0/47 0/59 1/39 1/25 2/54 3/50 1/35 5/34 3/47 6/16 16/30

STOP STOP STOP STOP STOP STOP GO GO GO GO GO GO GO

STOP GO GO GO GO GO GO GO GO GO GO GO GO

0/15 0/15 0/15 0/15 0/15 0/15 1/15 2/15 1/15 2/15 3/15 6/14 8/15

14/15 10/15 8/15 8/15 10/15 11/15 13/15 9/15 7/15 7/15 5/15 1/14 2/15

– – 35/47 31/59 19/39 19/25 40/54 33/50 23/35 15/34 25/47 1/16 6/30

STOP STOP STOP STOP STOP STOP STOP STOP STOP STOP STOP GO GO

STOP STOP STOP STOP STOP STOP STOP STOP STOP GO GO GO GO

STOP STOP STOP STOP STOP STOP STOP STOP STOP STOP GO GO GO

Again, as in the NCIC CTG cohort, the stopping rule derived from the multinomial B 60/40 design never led to early trial termination. As summarized in Table 6, changing progression hypotheses to 50/30 and 40/20 had very little effect on increasing the probability of early stopping using multinomial A, while it increased the number of trials stopping early using multinomial B.

discussion Single-agent phase II trials in oncology have historically used a response-based primary end point. With the introduction of targeted molecular agents into the clinic, several reviews have reported on the need to consider the use of other primary end points in these trials, such as progression-free survival or the multinomial end point [7–13]. Limitations of the response rate as an endpoint have been well described in hepatocellular and renal cell carcinomas, where sorafenib, despite response rates of <10%, has led to significant prolongation of both progression-free survival and overall survival [8, 9, 14, 15]. A recent review of 19 targeted drugs being evaluated in a series of 89 single-agent phase II trials showed that objective response was the primary end point used to design 57% of the trials, while a multinomial end point or progression-free survival end point were each used in ∼10% of trials [13]. In an another review of 351 phase II studies on targeted agents, 7.4% and 75.9% of the phase II studies were conducted with a progressive disease and response end point, respectively [12]. Certainly, there remains controversy about what the primary end point should be in phase II trials evaluating targeted agents. There has been little in the way of evaluation of the performance of single-arm phase II designs using response or a multinomial end point using real data generated from actual phase II studies. We previously reported on this comparison within a series of cytotoxic agent phase II trials and have now applied the same approach to a series of targeted single-agent

 | Jamal et al.

STOP STOP GO GO STOP STOP GO GO GO GO GO GO GO

STOP STOP STOP STOP STOP GO GO GO GO GO GO GO GO

X X X X X X X X X X + + +

X X X X X X X X X + + + +

X X X X X X X X X X X + +

X X X X X X X X X X X + +

X UKN X X UKN UKN X X UKN UKN UKN + +

X UKN X X UKN UKN X X UKN + UKN + +

X X X X UKN UKN X X UKN + X + +

Negative Negative Negative Negative Interesting Negative Negative Negative Negative Interesting Interesting Positive Positive

NCIC CTG and EORTC studies. We have shown that a subset of nine trials continued using the Fleming design would have been stopped had the multinomial A design with progression hypotheses of 60% (null) and 40% (alternate), been used to conduct the studies. This is consistent with the fact that multinomial A is designed in such a way that either failure to achieve the response criteria or the progression criteria for continuing can lead to early termination. Thus, it is very sensitive to any evidence of the agent showing signals of failure. In the NCIC CTG data, the final outcome of the four discordant trials supported the multinomial A 60/40 design recommendations, since none showed particularly interesting drug activity at the end of accrual (trials I–L). Paradoxically, two of these studies ‘technically’ met the multinomial A level of interest at the end of accrual through the relatively low rates of early PD seen in the final dataset (trials K and L). Indeed, in the NCIC CTG cohort, all the trials that would have stopped at the first stage of accrual by multinomial A 60/40 were found to have no, limited, or at best modest activity at the end of the trial (Table 5). Observations in the EORTC studies paralleled those of the NCIC CTG studies, i.e. multinomial A 60/40 would have stopped early five trials that would have been continued by applying the Fleming derived stopping rules (trials 7–11, Table 6). In three of these trials, the final outcome of the trials suggested no activity and thus supported the multinomial A 60/40 design recommendations for early stopping (trials 7–9, Table 6). In the remaining two discordant trials, the published data concluded that there was some modest but interesting activity of the agent (trials 10 and 11). Trials 10 and 11 showed interesting activity of pazopanib in some types of soft tissue sarcoma (synovial sarcoma and other types of soft tissue sarcoma) at the end of the study, but according to the multinomial A design, both of these trials would have been stopped at the first stage of accrual [16]. Based on this interesting activity at the end of the phase II EORTC trial,

Volume 24 | No. 7 | July 2013

Downloaded from http://annonc.oxfordjournals.org/ at University of Ulster at Coleraine on April 11, 2015

Trial Stage 1 R PD

original articles

Annals of Oncology

Volume 24 | No. 7 | July 2013

development and in fact neither may do so based on subsequent phase III results. In this review, we evaluated two variants of the multinomial design, each with different hypotheses we thought to be reasonable in the clinical setting. However, in using the multinomial design in phase II trials, the hypotheses would need to be adapted to each clinical setting and should be based on the level of activity of the compound in that setting which would be deemed to be of interest. Related to this, most of the trials in both series were not enriched for a molecular target— only two EORTC trials had populations determined by molecular markers. In a target-enriched population where the target is confirmed to be relevant to activity by earlier studies, one would expect the bar to be set higher for a new drug than the typical null hypothesis level of a 5% response rate. The strength of this review is that it is focused on actual trial data from a range of targeted agents, which more closely reflect how various designs perform in reality. The failure of multinomial B to ever stop a trial when used with original 60/ 40 hypotheses suggests that it does not function as intended as a two-stage design. In conclusion, in both NCIC CTG and EORTC studies, the multinomial A and Fleming designs carried out well in rejecting obviously inactive agents. Certainly, response as end point is still widely used in both studies evaluating cytotoxic and targeted therapies but as shown in the NCIC CTG series, there were also clear cases where the multinomial A design carried out better in identifying inactive agents early in the study as supported by the final outcome of those trials. In the EORTC set, Fleming was again shown to be permissive compared with multinomial A. Interestingly, in both series of trials, the multinomial A design generally carried out best with the original hypotheses of Ho:pp ≥ 60% versus Ha:pp ≤ 40% for progression, while the multinomial B design was ineffective in identifying active agents early. Thus, in choosing between the Fleming versus multinomial A, one must choose the nature of the error one wants to make at the first stage of accrual. The multinomial A 60/40 design might lead to false-negative signals in activity (type II error). Because there is little chance that these agents will be further pursued in drug development, this could result in potentially active agents being permanently disregarded. However, in the context of restricted resources, stopping trials earlier when the signal for efficacy is very low could also be a reason to choose a design using dual end points instead of a pure response end point.

funding Drs Jamal and Goodwin carried out this work as part of a Drug Development Fellowship at the NCIC Clinical Trials Group funded by the Queen’s University Terry Fox Foundation Training Program in Transdisciplinary Cancer Research in partnership with the Canadian Institutes of Health Research (no grant number).

disclosure The authors have declared no conflicts of interest.

doi:10.1093/annonc/mdt122 | 

Downloaded from http://annonc.oxfordjournals.org/ at University of Ulster at Coleraine on April 11, 2015

pazopanib was evaluated recently in a randomized, doubleblind, placebo-controlled phase III trial that included leiomyosarcoma, other sarcoma and synovial sarcoma [16]. Pazopanib significantly increased progression-free survival compared with placebo, but no significant benefit in terms of overall survival was seen. The progression rate in this phase III trial was lower than the one documented in the EORTC phase II trial, but obviously the population and eligibility criteria were not the same making comparisons difficult [16, 17]. Had the EORTC phase II trial been conducted with a multinomial A design, both trials 10 and 11 would have closed after stage I of accrual and no further phase III trial would have been probably conducted. The only two trials given the go-ahead at the first stage of accrual in the EORTC cohort by the multinomial A design were trials 12 and 13 [18, 19]. Both of these trials were enriched for a molecular biomarker predicting response and thus, it is not surprising to document low progression rates and high response rates in trials 12 and 13. Imatinib has been approved in c-Kit-mutated GIST and DermatoFibroSarcoma Protuberans (DFSP) since then. In both NCIC CTG and EORTC series, the performance of multinomial A using lower progression hypotheses (50/30) was similar to multinomial A 60/40. By lowering the levels of early progression rates further to 40/20, multinomial A stopped all NCIC CTG trials early and most EORTC trials early. Thus, it seemed that multinomial A 40/20 could have the potential to miss active agents and is not recommended further. In contrast to the data from multinomial A 60/40, the multinomial B variant, which would continue accrual if either response or progression criteria showed favorable results, did not appear very useful as an approach to early termination of the trials of inactive agents, since all but one trial continued to stage 2 under its stopping parameters. Reducing progression hypotheses to 50/30 and 40/20 improved the performance of the multinomial B variant somewhat but not enough to consider it a useful approach. The findings of this study of trials of targeted agents are similar to those described in a comparison of the performance of the multinomial stopping rule (variant A) versus the Fleming rule in 23 NCIC CTG and EORTC studies of cytotoxic agents [6]. The authors concluded that the multinomial design carried out more efficiently in stopping ineffective agents early. This study has some limitations. Certainly, when assessing if the final outcome of the trial supported rejection of the drug by the multinomial A design in the first stage of accrual, we drew conclusions about the ‘appropriateness’ of the recommendations based on the conclusions of the published trial—which were often based on other measures (response or PFS). The Fleming design was chosen as a comparator in our series, because in the NCIC CTG cohort all 15 trials were initially designed using the Fleming two-stage design. Because the principles behind the Fleming and the Simon designs are similar, using the Fleming design, as a comparator made it is unlikely that other two-stage designs using response as an end point would have offered substantially different conclusions when compared with the multinomial design. Finally, our analysis assumes that one of either the Fleming or the multinomial will provide ‘correct’ recommendations for drug

original articles

Annals of Oncology

references 12.

13.

14. 15. 16.

17.

18.

19.

Annals of Oncology 24: 1942–1947, 2013 doi:10.1093/annonc/mdt073 Published online 13 March 2013

Non-inferiority cancer clinical trials: scope and purposes underlying their design R. P. Riechelmann*, A. Alex, L. Cruz, G. M. Bariani & P. M. Hoff Discipline of Radiology and Oncology, Universidade de Sao Paulo, Instituto do Câncer de Estado de São Paulo, Sao Paulo, Brazil

Received 7 December 2012; revised 26 January 2013; accepted 28 January 2013

Background: Non-inferiority clinical trials (NIFCTs) aim to demonstrate that the experimental therapy has advantages over the standard of care, with acceptable loss of efficacy. We evaluated the purposes underlying the selection of a non-inferiority design in oncology and the size of their non-inferiority margins (NIFm’s). Patients and methods: All NIFCTs of cancer-directed therapies and supportive care agents published in a 10-year period were eligible. Two investigators extracted the data and independently classified the trials by their purpose to choose a non-inferiority design. Results: Seventy-five were included: 43% received funds from industry, overall survival was the most common primary end point and 73% reported positive results. The most frequent purposes underlying the selection of a non-inferiority design were to test more conveniently administered schedules and/or less toxic treatments. In 13 (17%) trials, a clear purpose was not identified. Among the trials that reported a pre-specified NIFm, the median value was 12.5% (range 4%–25%) for trials with binary primary end points and Hazard Ratio of 1.25 (range 1.10–1.50) for trials that used time-

*Correspondence to: Dr R. P. Riechelmann, Discipline of Radiology and Oncology, Universidade de Sao Paulo, Instituto do Câncer do Estado de São Paulo, Av. Dr Arnaldo 251, 12° andar, 01246–000 São Paulo, SP, Brazil. Tel: +55-11-38933437; E-mail: [email protected]

© The Author 2013. Published by Oxford University Press on behalf of the European Society for Medical Oncology. All rights reserved. For permissions, please email: [email protected].

Downloaded from http://annonc.oxfordjournals.org/ at University of Ulster at Coleraine on April 11, 2015

1. Gehan EA. The determination of the number of patients required in a preliminary and a follow-up trial of a new chemotherapeutic agent. J Chron Dis 1961; 13: 346–353. 2. Fleming TR. One-sample multiple testing procedure for phase II clinical trials. Biometrics 1982; 38: 143–151. 3. Simon R. Optimal two-stage designs for phase II clinical trials. Control Clin Trials 1989; 10: 1–10. 4. Simon R. How large should a phase II trial of a new drug be? Cancer Treat Rep 1987; 71: 1079–1085. 5. Zee B, Melnychuk D, Dancey J et al. Multinomial phase II cancer trials incorporating response and early progression. J Biopharm Stat 1999; 9: 351–363. 6. Dent S, Zee B, Dancey J et al. Application of a new multinomial phase II stopping rule using response and early progression. J Clin Oncol 2001; 19: 785–791. 7. Dhani N, Tu D, Sargent DJ et al. Alternate endpoints for screening phase II studies. Clin Cancer Res 2009; 15: 1873–1882. 8. Rubenstein LV, Korn EL, Freidlin B et al. Design issues of randomized phase II trials and a proposal for phase II screening trials. J Clin Oncol 2005; 23: 7199–7206. 9. Karrison TG, Maitland ML, Stadler WM et al. Design of phase II cancer trials using a continuous endpoint of change in tumor size: application to a study of sorafenib and erlotinib in non-small cell lung cancer. J Natl Cancer Inst 2007; 99: 1455–1461. 10. Farley J, Rose PG. Trial design for evaluation of novel targeted therapies. Gynecol Oncol 2010; 116: 173–176. 11. Booth CM, Calvert AH, Giaconne G et al. Design and conduct of phase II studies of targeted anticancer therapy: recommendations from the task force on

methodology for the development of innovative cancer therapies (MDICT). Eur J Cancer 2008; 44: 25–28. Chan JK, Ueda SM, Sugiyama VE et al. Analysis of phase II studies on targeted agents and subsequent phase III trials: what are the predictors for success? J Clin Oncol 2008; 26: 1511–1518. El-Maraghi RH, Eisenhauer EA. Review of phase II trial designs used in studies of molecular targeted agents: outcomes and predictors for success in phase III. J Clin Oncol 2008; 26: 1346–1354. Escudier B, Eisen T, Stadler WM et al. Sorafenib in advanced clear-cell renal-cell carcinoma. N Engl J Med 2007; 356: 125–134. Llovet JM, Ricci S, Mazzaferro V et al. Sorafenib in advanced hepatocellular carcinoma. N Engl J Med 2008; 359: 378–390. Sleijfer S, Ray-Coquard I, Papai Z et al. Pazopanib, a multikinase angiogenesis inhibitor, in patients with relapsed or refractory advanced soft tissue sarcoma: a phase II study from the European Organisation for Research and Treatment of Cancer—Soft Tissue and Bone Sarcoma Group (EORTC Study 62043). J Clin Oncol 2009; 27: 3126–3132. Van der Graff WTA, Blay JY, Chawla SP et al. Pazopanib for metastatic soft-tissue sarcoma (PALETTE): a randomised, double-blind, placebo-controlled phase 3 trial. Lancet 2012; 379: 1879–1886. Rutkowski P, Van Glabbeke M, Rankin CJ et al. Imatinib mesylate in advanced Dermatofibrosarcoma Protuberans: pooled analysis of two phase II clinical trials. J Clin Oncol 2010; 28: 1772–1779. Verweij J, van Oosterom A, Blay JY et al. Imatinib mesylate is an active agent for gastrointestinal stromal tumours, but does not yield responses in other soft tissue sarcomas that are unselected for a molecular target: results from an EORTC Soft Tissue and Bone Sarcoma Group phase II study. Eur J Cancer 2003; 39: 2006–2011.