Do different therapies of AML produce different outcomes?


Leukemia Research Vol. 14, No. 3, pp. 207-208, 1990. Printed in Great Britain. 0145-2126/90 $3.00 + .00 Pergamon Press plc



EDITORIAL

Correspondence to: Dr R. P. Gale, Department of Medicine, Division of Hematology and Oncology, UCLA School of Medicine, Los Angeles, CA 90024-1678, U.S.A.

There has been substantial progress in treating acute myelogenous leukemia (AML) over the past 40 years. Intensive chemotherapy now produces remissions in more than 60 per cent of adults and an even higher proportion of children. There has been less dramatic progress in extending remissions or in increasing the proportion of long-term, leukemia-free survivors (cures). Median remission duration ranges from about 10 to 20 months in most studies. The proportion of long-term, leukemia-free survivors among persons achieving remission ranges between 15 and 40 per cent. In this annotation we discuss issues related to the analysis of data from treatment trials in AML.

One major challenge in evaluating reports of AML therapy is to determine whether a new therapy represents a significant improvement over current results. Advances are frequently claimed for uncontrolled trials. If all of these claims were correct, survival of persons with AML would exceed that of age-matched normals. In randomized trials, superiority of the investigational arm often seems to result from an unexpectedly poor outcome amongst controls rather than improvement from the new therapy. Another problem in evaluating data from AML trials is that different centers or study groups frequently report discordant results with identical or seemingly identical therapies. Yet another problem is how to analyze the impact of therapies used in persons already in remission. Examples include consolidation or intensification chemotherapy and bone marrow transplantation.

Our first point is that, despite a general awareness, many investigators underestimate the inherent limitations of statistical analyses of typical clinical trials. In general, two types of trials are reported in AML: single center studies with about 50 subjects and multicenter trials with about 300 subjects. Let us consider what conclusions can be drawn from trials of this size. In the simulations that follow we assume a complete remission rate of 70 per cent, a median remission duration of 12 months, and an actuarial 5-year relapse-free survival of 20 per cent. (Additional assumptions and the bases of these calculations are available from the authors upon request.) If we consider a trial with 50 subjects under the conditions we discuss, 35 will achieve remission. The 95 per cent confidence interval for this 70 per cent remission rate is 57-83 per cent. The median remission duration is 12 months with a 95 per cent confidence interval of 7 to 17 months. The 5-year leukemia-free survival is 20 per cent with a 95 per cent confidence interval of 7 to 33 per cent. In a trial of 300 subjects performed under these conditions, 70 per cent or 210 persons will achieve remission (95% CI, 65-75%), median remission duration is again 12 months (95% CI, 10-14 months), and 5-year leukemia-free survival is 20 per cent (95% CI, 15-25%).

What do these data imply? First, let us consider the small trial or a comparison of small trials. Here, the same therapy could result in median remissions as divergent as 7 to 17 months and long-term survival as divergent as 7 to 33 per cent. A second consideration relates to the situation where different treatments are compared. In the model we consider, it would be incorrect to assume that a treatment resulting in a 17 month median and a 33 per cent long-term survival was superior to one resulting in a 7 month median and 7 per cent long-term survival. A third consideration is that unless a new therapy were considerably more effective, it would be difficult to distinguish it from conventional treatment. Thus, in considering small trials, there will be substantial numbers of false positive and false negative reports. Comparable or less effective treatments may be identified as more effective, or more effective therapies may be discarded because they seem no better than conventional treatments.

The problems we discuss are not as great with larger trials. Nevertheless, there is a substantial range of possible results. For example, a near doubling of 5-year disease-free survival from 15 to 25 per cent could be missed or inaccurately ascribed to a comparably effective treatment, since it lies within the 95 per cent confidence interval of such a trial size. By this time most readers will realize that we selected these examples since results of a substantial proportion of treatment trials in AML fall within the 95 per cent confidence intervals discussed. However, to this inherent uncertainty we must add additional factors which further increase uncertainty. These include selection of subjects, patient declinations, withdrawals on-study, competing causes of treatment failure, persons lost to follow-up, protocol variations, and errors in data entry or analysis. The precise magnitude of these effects varies and is difficult to estimate. Nevertheless, this variability, added to the variability inherent in trials with small numbers of subjects, results in a range of outcomes that includes almost all reports of AML treatment. Thus, it can be argued that there is no convincing evidence of progress in recent years. More worrisome is the notion that even if a more effective treatment were developed, it might not be identified.

A second point we wish to consider is that as results of AML treatment improve, it is increasingly difficult to detect further improvement. This is because for a trial with a fixed number of subjects in which the initial outcome is poor, the 95 per cent confidence interval widens as results improve. For example, consider our trial of 50 subjects receiving remission induction chemotherapy. If 5 subjects were to achieve remission, the remission rate would be 10 per cent with a 95 per cent confidence interval of 2-18 per cent. However, if 25 subjects achieve remission, while the remission rate would now be 50 per cent, the 95 per cent confidence interval would broaden to 36-64 per cent. Our point is that as results of AML treatment improve, particularly in regard to the proportion of long-term disease-free survivors, it becomes increasingly difficult to detect improvement. Consequently, for trials involving identical numbers of subjects, a greater proportion of false negative and false positive trials will be reported.

The next and most complex issue is how to assess the efficacy of new treatments, particularly when used in persons already in remission. This includes post-remission chemotherapy (consolidation, intensification) and transplants (allogeneic, autologous). There are several important issues here. For example, what proportion of persons who achieve remission receive the new treatment? Also, how representative are these persons of the entire group of remitters? Would results in the group receiving the new treatment be the same if applied to all potential recipients, or is the group receiving the new treatment biologically distinct? An example might be if persons with resistant leukemia (and therefore the lowest likelihood of cure) relapsed before they could receive the new treatment whereas those with sensitive leukemia received it. These problems of analysis are clearly greatest when the interval from achieving remission to receiving the new treatment is long. They are further confounded when the center giving the new treatment is different from the center giving remission induction chemotherapy, since the former is aware not of the universe of persons achieving remission but only of those referred for the new treatment.

The major concept underlying the points we discuss is the need to accurately define the true denominator of an outcome, i.e.
we know how many persons are alive 5 years after receiving the new treatment, but how should and how will this be reported as a response rate: as a proportion of only those receiving the new treatment, as a proportion of all remitters, or as a proportion that is corrected for those persons already cured without receiving the new treatment. It can be argued that the technique of expressing outcome is not important so long as the investigators indicate the method used. However, comparisons are frequently made between different treatments, for example, between consolidation chemotherapy and bone marrow transplantation. Consequently, it is important to be certain that similar techniques are used or that the data are adjusted for these differences. The relevance of these concepts to analysis of studies of AML is clear: one cannot compare results of chemotherapy trials involving all persons achieving remission with transplant studies in which only a proportion of the eligible remitters receive the treatment.

We can illustrate these problems using a trial involving 150 subjects with AML, 100 of whom (N) achieve remission. Let us assume that 20 subjects (S2) would be alive without leukemia at 5 years even if the new treatment were not given. Next, we assume that all 100 subjects are potential candidates for a new treatment given at a specialized center. However, in the time preceding referral for the new treatment, 30 subjects (d1) relapse. Since the referral center is unaware of the initial 100 subjects, it regards the 70 subjects (S1 = N - d1) as the total study population. Finally, we assume that 30 (St) of the 70 subjects (S1) receiving the new treatment are alive without leukemia at 5 years. The question is how will and how should these data be reported.

First, let us deal with the 70 subjects (S1) receiving the new treatment. Since 20 (S2) were already cured before receiving the new treatment, the real denominator for assessing efficacy is S1 - S2 or 50 subjects. If 30 subjects (St) are leukemia-free at 5 years, efficacy of the new treatment can be expressed as (St - S2) ÷ (S1 - S2), i.e. 10 of 50 or 20 per cent. This is rather different from the 43 per cent (30 of 70) ordinarily reported. Next, we must consider adjusting the results so that they reflect the total study cohort, which was 100 (N) and not 70 subjects (S1). Thus, the 30 (St) alive without leukemia at 5 years represent a 30 per cent, not a 43 per cent, outcome. However, since 20 subjects (S2) were already cured before receiving the new treatment, the impact of the new treatment is 10 additional survivors, an absolute increment of 10 per cent. This increment would change relatively little even if the 30 subjects who relapsed early (d1) were to receive the new treatment, especially since the outcome of the new treatment in the poor prognosis group (d1) is likely to be inferior to that in the better prognosis group (S1).

These data indicate that with even a relatively simple new treatment, results as diverse as 10 per cent to 43 per cent can and will be reported. Similar scenarios can be drawn for other proposed or observed results of new therapeutic interventions. The key issues are to know the true denominator, i.e. how many subjects never receive the new treatment, and the cure rate with conventional therapy. Unfortunately, it is often not possible to know either of these factors and, even less often, both. Also, as we previously discussed, it will be extremely difficult to distinguish these different treatment outcomes in most clinical trials in AML because of overlap in confidence intervals and because confidence intervals widen with improving results.

There are no simple solutions to the problems of analysis of AML treatments we discuss. Several suggestions seem reasonable.
First, we should encourage increasing trial size and not publish results of small clinical trials unless the data are exceptional. Second, all data should include 95 per cent confidence intervals at the very least. Third, in order to improve the statistical power of a study to detect clinically significant improvements in outcome, the study population could be restricted to persons of highest risk, i.e. those expected to have the poorest long-term survival, who could thus benefit most from new treatment modalities. Finally, we should continue to encourage large randomized studies, or meta-analyses of small trials where large trials are not possible.

We wish to emphasize that our concern is not solely that ineffective or comparably effective AML therapies will be judged as superior. The greater danger is that new, effective therapies will be missed. It will be difficult to make further progress in treating AML unless we substantially modify the way most trials are conducted and reported.

ROBERT PETER GALE
Department of Medicine
Division of Hematology and Oncology
UCLA School of Medicine
Los Angeles, CA 90024-1678, U.S.A.

and

MARTIN L. LEE
Hyland Division
Baxter Healthcare Corporation
Glendale, CA 91202, U.S.A.
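The arithmetic behind two of the editorial's illustrations can be sketched in a few lines of Python. This is a hedged illustration, not the authors' method: the editorial says only that the bases of its calculations are available on request, so the sketch assumes the quoted intervals for proportions come from the simple normal approximation to the binomial (the Wald interval). The function names `wald_ci_percent` and `reported_outcomes` are hypothetical. Under that assumption the sketch gives the same rounded intervals as the text for the remission rates and 5-year leukemia-free survival, and it lists the four ways the referral-center example (N = 100, S2 = 20, d1 = 30, St = 30) could be reported. The intervals for median remission duration are not attempted, since they depend on assumptions the editorial does not state.

```python
from math import sqrt

Z_95 = 1.96  # two-sided 95 per cent normal quantile


def wald_ci_percent(successes, n, z=Z_95):
    """Normal-approximation CI for a proportion, as rounded percentages.

    Assumption: the editorial does not name its interval method; the Wald
    interval is used here only because it gives the same rounded figures.
    """
    p = successes / n
    half_width = z * sqrt(p * (1 - p) / n)
    return round(100 * (p - half_width)), round(100 * (p + half_width))


def reported_outcomes(n_remitters, cured_anyway, early_relapses, treated_survivors):
    """Four ways of reporting the same 5-year result (fractions, not per cent).

    Symbols follow the editorial: N, S2, d1 and St, with S1 = N - d1.
    """
    treated = n_remitters - early_relapses  # S1 = N - d1
    gained = treated_survivors - cured_anyway  # St - S2, survivors added by treatment
    return {
        "survivors / treated (St / S1)": treated_survivors / treated,
        "survivors / all remitters (St / N)": treated_survivors / n_remitters,
        "efficacy ((St - S2) / (S1 - S2))": gained / (treated - cured_anyway),
        "absolute increment ((St - S2) / N)": gained / n_remitters,
    }


if __name__ == "__main__":
    # Remission rate of 70 per cent in trials of 50 and 300 subjects.
    print(wald_ci_percent(35, 50), wald_ci_percent(210, 300))
    # 5-year leukemia-free survival of 20 per cent among 35 and 210 remitters.
    print(wald_ci_percent(7, 35), wald_ci_percent(42, 210))
    # Same trial size, improving results: 5 of 50 versus 25 of 50 remissions.
    print(wald_ci_percent(5, 50), wald_ci_percent(25, 50))
    # Referral-center example: N = 100, S2 = 20, d1 = 30, St = 30.
    for label, value in reported_outcomes(100, 20, 30, 30).items():
        print(f"{label}: {100 * value:.0f} per cent")
```

The Wald interval is known to behave poorly for small samples or for proportions near 0 or 1, so a real analysis would probably prefer an exact or Wilson interval. It is used here only because it appears to match the published figures.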