CORRESPONDENCE
improvement in survival as small as 15%. However, the level of power to detect this difference is 80% as opposed to 90%.2 The second issue that deserves comment is the calculation of sample size for the site-specific survival analysis. The authors state that separate analyses according to site were performed. Because the baseline survival rates for calculating the site-specific sample sizes were not reported in the published manuscript, survival rates of 65% for colon cancer and 60% for rectal cancer are assumed.3 The total sample size available to the study was 307 (325 − 18 withdrawals after randomization). Assuming the withdrawals were evenly split between sites, the total site-specific sample sizes available to the study were 229 (238 − 9) patients with colon cancer and 78 (87 − 9) patients with rectal cancer. Even if the authors were correct in selecting a one-tailed test, there is only sufficient power to detect a difference in survival as small as 15% for the colon site. Under the one-tailed test assumption, the required sample size for the colon analysis at 80% power is 226 (62 events). There is insufficient power to detect a difference of this size for rectal cancer under the one-tailed test assumption. At 80% power, the requisite sample size for the rectal analysis assuming a one-tailed test is 246 (80 events). Under the two-tailed test assumption that we espouse, the required sample sizes for colon and rectal cancer at 90% power are 382 (105 events) and 417 (135 events), respectively. At 80% power, the requisite sample sizes are 286 (78 events) and 312 (101 events). Thus, under the two-tailed test assumption, the sample size was inadequate to detect differences as small as 15% for either colon or rectal cancer. Differences as small as 20% can be detected under the two-tailed test assumption for the colon cancer analysis. The smallest difference that can be detected in the rectal cancer analysis under the two-tailed test assumption is 30%.
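The event counts quoted above can be approximated with a short calculation. The sketch below assumes Freedman's log-rank approximation with equal allocation, a significance level of 0.05, and a 15% improvement read as 15 absolute percentage points (65% to 80% for colon, 60% to 75% for rectum). None of these settings is stated explicitly in the letter, and the function names are illustrative only.

```python
import math
from statistics import NormalDist


def freedman_events(p_control, p_treated, alpha, power, two_tailed):
    """Total deaths required by Freedman's log-rank approximation (equal allocation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_tailed else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    # Hazard ratio implied by the two survival proportions under proportional hazards.
    theta = math.log(p_treated) / math.log(p_control)
    return (z_alpha + z_beta) ** 2 * ((1 + theta) / (1 - theta)) ** 2


def freedman_patients(events, p_control, p_treated):
    """Approximate total patients: events divided by the average probability of death."""
    return events / (1 - (p_control + p_treated) / 2)


# Assumed inputs: alpha = 0.05, baseline survival of 65% (colon) and 60% (rectum),
# and an absolute improvement of 15 percentage points in the intensive arm.
scenarios = {"colon": (0.65, 0.80), "rectum": (0.60, 0.75)}
for site, (p_c, p_t) in scenarios.items():
    for two_tailed, power in [(False, 0.80), (True, 0.90), (True, 0.80)]:
        d = math.ceil(freedman_events(p_c, p_t, 0.05, power, two_tailed))
        n = freedman_patients(d, p_c, p_t)
        tails = "two" if two_tailed else "one"
        print(f"{site:6s} {tails}-tailed power={power:.0%}: {d} events, about {n:.0f} patients")
```

Under these assumptions the rounded-up event counts agree with those in the text (62, 105, and 78 for colon; 80, 135, and 101 for rectum). The patient totals come out within a few patients of the quoted figures, which were presumably read from published tables rather than computed directly.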
Several issues unrelated to sample size also deserve mention. Surgical technique, a major factor in outcomes analysis, was not standardized. Surgeons with technically superior skills, often achieving better outcomes, may have treated more patients in the standard follow-up group. The amount of bias in the results because of lack of standardization is unclear. The second issue refers to the exclusion criteria. All patients with evidence of residual tumor were excluded by perioperative chest radiography, computed tomography (CT) of the liver, colonoscopy, hemoglobin tests, and liver function tests. It is unclear whether both an operative note and pathology report were required to determine that margins were negative and inadvertent tumor spillage did not occur. The third issue regards the investigation of abnormalities detected during surveillance. An isolated increase in CEA levels did not trigger further work-up. There may be little impact in the first 2 years after treatment when CEA is measured every 3 months, but in the next 5 years visits are scheduled only twice per year. Lack of work-up may result in loss of the window of opportunity to resect. Insufficient data were presented to determine if resectability at the next visit was adversely affected for patients with sustained increases. The fourth issue is a suggestion that quality of life be measured in future trials as has been done for breast cancer.4 Although survival may not differ between standard and intensive follow-up, a significant quality of life difference between the two may provide justification for intensive follow-up. The final issue is a request for clarification of the confidence interval in Table 3 for Dukes’ stage B. It reads 190–12.05. The correct confidence interval is assumed to be 1.90–12.05. We once again wish to congratulate the authors for their important contribution to this field. 
Although no survival benefit was detected, further prospective analysis with larger sample sizes should focus on both site-specific and particularly stage-specific analyses of follow-up
GASTROENTEROLOGY Vol. 115, No. 1
intensity. As yet unproven is whether subgroups of colorectal cancer patients after surgery benefit from yearly colonoscopy, CT of the liver, and chest radiography. It is also unclear whether the standard arm of the study by Schoemaker et al. improves survival compared with even less intensive follow-up. KATHERINE VIRGO FRANK JOHNSON Department of Surgery St. Louis University Health Sciences Center and Department of Veterans Affairs Medical Center St. Louis, Missouri RICCARDO AUDISIO General Surgery MultiMedica Milan, Italy 1. Schoemaker D, Black R, Giles L, et al. Yearly colonoscopy, liver CT, and chest radiography do not influence 5-year survival of colorectal cancer patients. Gastroenterology 1998;114:7–14. 2. Freedman LS. Tables of the number of patients required in clinical trials using the log rank test. Stat Med 1982;1:121–129. 3. Ries LAG, Kosary CL, Hankey BF, et al. SEER Cancer Statistics Review, 1973–1994, National Cancer Institute. NIH Publication No. 97-2789, Bethesda, MD, 1997. 4. GIVIO Investigators. Impact of follow-up testing on survival and health-related quality of life in breast cancer patients: a multicenter randomized controlled trial. JAMA 1994;271:1587–1592.
Surveillance After Colorectal Cancer: The Final Word? Dear Sir: The recent publication by Schoemaker et al.1 addressed an important question: whether patients with colorectal cancer benefit from intensive postoperative surveillance. Based on their data, the authors rightfully conclude that such a strategy does not affect overall survival. Such conclusions are in agreement with those of other randomized controlled trials,2–4 two of which had already been published in 1995 and which were not referred to in the otherwise thoughtful discussion. Although I fully agree that no one has ever proven that postoperative surveillance will benefit the average patient with colorectal cancer, I am somewhat disturbed by the fact that all four investigations showed a trend toward an improved 5-year survival rate in intensively surveilled patients. It is recognized that these differences between survival curves are small and that none of them reached statistical significance. However, considering the rather small patient samples, the possibility of a type 2 error cannot be excluded. The question can be raised whether a <5% improvement in survival at 5 years would be of clinical value even if statistical significance was ever proven. For example, if a potential clinical value is expressed in terms of dollars spent per added year of life, such costs will probably exceed the commonly cited benchmark value of $40,000 for patients in the United States. However, for some countries such as Germany with its exceedingly low reimbursements for medical services, we have recently estimated that such costs will be in the range of only $5000–10,000. Thus, if 5% of patients with colorectal cancer would benefit from intensive postoperative surveillance, the decision to enroll patients in such a program may remain a matter of geography. A final and perhaps even more important issue relates to whether all operated patients should be surveilled and which procedures should be performed.
It is difficult to understand why patients with Dukes’ A
July 1998
tumors who have a minimal chance for tumor recurrence and a normal life expectancy were enrolled in Schoemaker’s study. It would have been interesting to know whether any patient with this tumor stage had a recurrence and whether surveillance aided in its early detection. Furthermore, with regard to the type of investigations performed, it is surprising that the authors selected liver enzyme determinations and CTs for follow-up investigations. I am not aware of a single study that would have shown that isolated hepatic enzyme elevations (without a concomitant increase in CEA) are sensitive and specific enough to be of value in the early recognition of liver metastases. In addition, at least in Germany, abdominal ultrasonography costs only a fraction of CT and does not seem to be inferior in its accuracy to detect hepatic metastases.5 In summary, Schoemaker et al. convinced me that the average Australian patient with colorectal cancer gains little by cooperating with an intensive postoperative surveillance program as suggested by them. However, before I translate these findings in my daily practice, I still would like to see a large randomized controlled study that investigates patients at risk for cancer recurrence and uses a more rational follow-up program. VOLKER F. ECKARDT, M.D., Ph.D. Universitat Mainz Wiesbaden, Germany 1. Schoemaker D, Black R, Giles L, et al. Yearly colonoscopy, liver CT, and chest radiography do not influence 5-year survival of colorectal cancer patients. Gastroenterology 1998;114:7–14. 2. Mäkkelä J, Laitinen S, Kairaluoma MI. Five-year follow-up after radical resection for colorectal cancer: results of a prospective randomized trial. Arch Surg 1995;130:1062–1067. 3. Ohlsson B, Brehland U, Ekberg H, et al. Follow-up after curative surgery for colorectal carcinoma: randomized comparison with no follow-up. Dis Colon Rectum 1995;38:619–626. 4. Kjeldsen BJ, Kronburg O, Fenger C, et al.
A prospective randomized study of follow-up after radical surgery for colorectal cancer. Br J Surg 1997;84:666–669. 5. McGarrity TJ, Samuels T, Wilson FA. An analysis of imaging and liver function tests to detect hepatic neoplasia. Dig Dis Sci 1987;32: 1113–1117.
Follow-up After Curative Resection for Colorectal Cancer Dear Sir: We read with interest the study of Schoemaker et al.1 addressing the impact of yearly follow-up on the survival of patients after resection of colorectal cancer. Their prospective randomized trial of 325 patients having undergone curative resection provides important information on this common management issue. However, we have some concerns about the presentation of the results. The authors present data from the univariate Cox analysis that shows a strong trend toward increased survival in the patients undergoing intensive annual follow-up compared with patients receiving standard follow-up. In the univariate analysis, the point estimate for the hazard ratio for the intensive follow-up regimen is 0.69 (95% confidence interval, 0.47–1.04; P = 0.07). The investigators subsequently used multivariate Cox regression analyses to adjust for potential confounding variables based on the univariate analysis. The authors report that in the multivariate analysis, there was not a statistically significant association between the intervention strategy and overall survival. Unfortunately, they do not provide the adjusted point estimate for the association between follow-up schedule and survival. Because of the relatively few deaths and the numerous
potential confounding variables, it is possible that the point estimate remained near 0.69, but that the P value increased as a result of reduced statistical power in the multivariate analysis (i.e., there was no significant confounding of the univariate results). If the point estimate remained near 0.69, one would need to reconsider whether the more intensive follow-up regimen actually does offer a survival advantage after curative resection for colorectal cancer, but that the study failed to identify a statistically significant association due to issues of statistical power (i.e., a type 2 error). We hope that the authors would provide us with this data to allow the reader to better understand the influence of follow-up strategies on survival after curative resection for colorectal cancer. JAMES D. LEWIS, M.D. MICHAEL L. KOCHMAN, M.D., F.A.C.P. TIMOTHY C. HOOPS, M.D. Division of Gastroenterology Center for Clinical Epidemiology and Biostatistics University of Pennsylvania Health System Philadelphia, Pennsylvania 1. Schoemaker D, Black R, Giles L, et al. Yearly colonoscopy, liver CT, and chest radiography do not influence 5-year survival of colorectal cancer patients. Gastroenterology 1998;114:7–14.
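The univariate figures quoted in the preceding letter (hazard ratio 0.69; 95% confidence interval, 0.47–1.04; P = 0.07) can be checked for internal consistency. The sketch below assumes a Wald-type interval that is symmetric on the log scale, which is the usual form for Cox model output but is not confirmed by the letter. The function name is illustrative only.

```python
import math
from statistics import NormalDist


def p_from_ratio_ci(estimate, lower, upper, level=0.95):
    """Recover the log-scale standard error, z statistic, and two-sided P value
    from a ratio estimate and its confidence interval (Wald interval assumed)."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    # The interval spans 2 * z_crit standard errors on the log scale.
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(estimate) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return se, z, p


se, z, p = p_from_ratio_ci(0.69, 0.47, 1.04)
print(f"SE(log HR) = {se:.3f}, z = {z:.2f}, two-sided P = {p:.3f}")
```

Under that assumption the recovered P value rounds to 0.07, in line with the reported result. The same standard error illustrates the letter's point: an adjusted estimate near 0.69 with a somewhat wider interval would still be compatible with a real survival benefit.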
Reply. We thank the authors of letters to the editor plus those who have contacted us directly via e-mail for their interest in our study and for their comments on the article that appeared in GASTROENTEROLOGY. The design of our study had, from the outset, incorporated statistical issues raised in the letters. The criticism of the one-tailed test is valid. However, the sample size calculations were based on the assumption that early detection of recurrent or metastatic lesions would result in earlier treatment and therefore a higher survival rate. At the time we commenced the study, this view was reasonable, and so a one-tailed test was assumed in the sample size calculation. As shown by Virgo et al., adequate power was still obtained with this sample size even if a two-tailed log rank test was assumed. The main finding of this study was that the follow-up schedule made no difference in overall survival. The subsidiary analysis carried out for the rectal cancers is indeed hampered by insufficient power. The site-specific analyses were carried out to establish whether there were any inconsistencies between the overall and site-specific analyses, which was established for the colon site-specific analysis. In the rectal specific analysis, the univariate odds ratio for schedule was found to be 0.56 with an associated 95% confidence interval of 0.25–1.26. This is inconclusive but consistent with the null hypothesis of no effect of schedule. As an aside, the actual number of withdrawals was proportional to the number of patients in each group, rather than evenly distributed between the groups as assumed by Virgo et al. There were 11 withdrawals among the colon cancer patients and 7 withdrawals among the rectal cancer patients. Of these 7 withdrawals, 2 were from the standard schedule and 5 from the intensive arm of the trial. One further issue regarding the deaths in the two arms of the study relates to cause of death. 
In the standard follow-up group of patients, there were 16 cancer-related deaths, whereas in the intensive arm only 4. This difference contributed to the nonsignificant trend illustrated in Figure 1 of the article. The question of surgical technique and superior skills was raised. We find this a difficult criticism to address, other than to say that all of the surgery was performed or supervised by senior consultants working in university teaching hospitals. We would be fascinated to learn how one standardizes for superior surgical skills!