Analyzing the “Correct” Endpoint


Quality of life (QOL) endpoints can be used in a variety of ways to support hypotheses. This chapter provides guidelines for determining the appropriate outcome measure based on the goals of the study and gives examples of analyses of those measures from three randomized trials: 1) a multi-disciplinary intervention designed to increase QOL versus standard care in patients undergoing radiation treatment (Multi-D) [1]; 2) an efficacy study of Benefin shark cartilage versus placebo (Benefin) [2]; and 3) a cancer anemia study of epoetin alfa versus placebo (EPO) [3].

The Multi-D study required patients randomized to the multi-disciplinary intervention to participate in a series of eight psychosocial sessions while undergoing 4 weeks of radiation treatment. The sessions consisted of education and counseling with respect to emotional, spiritual, physical, and social concerns affecting QOL. QOL assessments were completed at baseline, 4 weeks, 8 weeks, and 27 weeks. The Benefin study assessed the effect of shark cartilage on advanced solid tumors. Treatment was administered in 28-day cycles; QOL was measured weekly during the first month and monthly thereafter. The EPO study was a phase III randomized double-blind study in which patients with anemia who were undergoing chemotherapy were given either EPO or placebo for 16 weeks. QOL assessments were completed monthly during treatment.

Curr Probl Cancer 2006;30:283-297. doi:10.1016/j.currproblcancer.2006.08.006
Curr Probl Cancer, November/December 2006

Deciding on Primary and Secondary Endpoints

In any clinical trial, there are myriad potential endpoints to explore and a variety of ways to measure them. The scientific method requires any investigation to focus on a limited subset of the most “important” endpoints. Once these are chosen, the investigator must decide which measurement will allow for a well-informed decision. For example, if the important endpoint is tumor response, there are several tumor characteristics, such as volume, shape, density, appearance, elasticity, or pliability, that could be used as measures of treatment success. Standard practice in phase II clinical trials with a tumor response endpoint is to use the Response Evaluation Criteria in Solid Tumors (RECIST) for measuring tumor shrinkage [4]. There is as yet no such agreement on endpoint selection for QOL [5]. Thus, the QOL endpoint must be chosen by judging what is clinically important or relevant, and it must be sensitive to the effect of the treatment [6,7]. QOL endpoints used in previous studies in similar settings should be reviewed to determine what is feasible to measure, which is typically the set of QOL domains that the study drug or intervention will most likely affect.

A clinical example is determining the best choice for measuring anorexia. Is the best endpoint an increase in weight or appetite, or even the lack of a decrease in weight or appetite? QOL values may increase or decrease according to the patient’s response to treatment. It is the decision of the investigator whether the increase or the decrease is important to measure. Some questions to consider when determining study endpoints are supplied in Table 1.

TABLE 1. Questions used to determine study goals
● What is likely to change?
● What would you like to see change?
● What is the most important improvement you expect for your patients?
● How should we measure the change in these patients?
● What are the possible side effects of this treatment?
● How long does it take to see an effect from this treatment?
● Are there any QOL issues that could be improved/worsened by this treatment?
● What would be sufficient evidence to change practice?

A variety of assessments that can be used to measure the chosen endpoint are described in “Choosing the ‘Correct’ Assessment Tool” in this issue. The choice of time points for assessing QOL is driven by expected study results, treatment or intervention duration, logistics, and the value of assessing short- or long-term effects. Choosing time points is addressed in “Optimal Timing for QOL Assessments” [8] in the first part of this monograph.

The primary endpoint of the Multi-D study was a comparison of the patients’ overall QOL at 4 weeks, with secondary endpoints of QOL scores at the other time points, changes from baseline at each time point, and categories of success or failure. The primary endpoint of the Benefin study was the survival benefit of Benefin over placebo, with secondary endpoints of toxicity and QOL as measured at each study time point and via area under the curve statistics. The primary QOL endpoint of the EPO study was change from baseline in overall QOL at the end of the study. Secondary endpoints included determining whether there was an increase in hemoglobin levels and a decrease in the number of needed transfusions.

Measuring and Analyzing QOL: Examples

This section describes seven common study goals and ways to measure each, details the statistical methods used, and offers examples. The examples use the Linear Analog Self Assessment (LASA) scales, which ask patients to rate QOL (overall or for a specified domain) on a 0 to 10 scale, where 0 indicates the worst it could be and 10 indicates the best it could be. Within the Multi-D study, we consider overall QOL (O-QOL) and spiritual well-being (SWB). Within the Benefin study, we consider O-QOL and physical well-being (PWB).

Throughout this section, the analysis of one endpoint may produce results that differ from the analysis of another because of study design and statistical power. The Multi-D study was powered to detect a difference in scores at a certain time point, whereas the Benefin study was powered to detect a shift in scores over time (i.e., a change from baseline). Therefore, re-analysis of the data according to endpoints not determined a priori may not preserve the results obtained from the planned analysis.

Goal 1: To Determine Patient QOL at a Specific Time Point

Analysis of assessment scores at a single time point is appropriate in this case. This endpoint is useful when trying to determine the immediate impact of the study treatment or intervention. For example, if treatment lasts 5 weeks, analysis of patient assessments at the end of the 5th week would provide evidence of study success. Clinically, this type of endpoint would be appropriate both in pharmacokinetic tissue sampling trials, where sampling directly after treatment gives an accurate measure of the treatment agent concentration, and in trials using patient-reported toxicity, where the data are collected within a specific time frame to reduce the chance of forgotten information. When using this endpoint, it is imperative that baseline values are similar between treatment arms.

In the Multi-D study, QOL was measured at the end of the 4 weeks in the hope of capturing the direct effect of the intervention. Summary statistics were compared between arms and tested via Student’s t-test. Patients on the intervention arm had significantly higher scores than those on the standard care arm (P = 0.047). Patient spirituality scores were also higher in the intervention arm at 1 month (P = 0.003) (Table 2, Fig 1). Applying this endpoint to the Benefin study and using data collected after the first cycle of treatment, patients receiving Benefin had lower overall QOL 1 month after the start of treatment than patients on placebo (P = 0.017). Thus, in both examples, measuring QOL shortly after the start of treatment identified an immediate benefit or deficit.

TABLE 2. Analysis of mean scores at a certain time point (1 month after treatment)

| Study | Assessment instrument | Intervention (n = 46) | Standard care (n = 54) | P value |
|---|---|---|---|---|
| Multi-D | Overall QOL | 72.8 | 64.1 | 0.047 |
| | Spiritual well-being | 84.0 | 73.0 | 0.003 |
| | | Benefin (n = 20) | Placebo (n = 20) | |
| Benefin | Overall QOL | 70.0 | 85.0 | 0.017 |
| | Physical well-being | 66.3 | 72.5 | 0.287 |

FIG. 1. Mean scores at 1 month of LASA questions.
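As a rough illustration of the single-time-point comparison, the sketch below computes the pooled-variance Student's t statistic and its degrees of freedom for two arms. The function name and the scores are hypothetical (they are not trial data), and the P value is left out because the Python standard library has no t distribution; a statistics package would supply it.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(arm_a, arm_b):
    """Return (t, df) for Student's two-sample t-test with pooled variance."""
    n_a, n_b = len(arm_a), len(arm_b)
    df = n_a + n_b - 2
    # The pooled variance weights each arm's sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(arm_a) + (n_b - 1) * variance(arm_b)) / df
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(arm_a) - mean(arm_b)) / std_err, df


# Hypothetical week-4 overall QOL scores on a 0-100 scale (illustration only).
intervention = [80, 70, 90, 60, 75]
standard_care = [60, 65, 70, 55, 50]
t_stat, df = pooled_t_statistic(intervention, standard_care)
```

A positive statistic here means the first arm has the higher mean score. The comparison is only meaningful if the arms were similar at baseline, as the text notes.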

Goal 2: To Detect an Increase (or Decrease) in QOL Scores Over Time

This goal may be optimal for studies in which both arms are expected to improve and it is important to detect which arm improves the most. For example, suppose a study involves two distinct treatments for insomnia, both of which have been shown to improve fatigue scores. It may be clinically meaningful to show which treatment produces the greater improvement because one may be more expensive, more invasive, or harder to administer than the other.

The change from baseline is calculated by subtracting the patient’s baseline score from the score at a particular time point. A positive change shows an increase in QOL, and a negative change shows a decrease in QOL. In a randomized two-arm trial, this change is compared between arms using Student’s t-test or the Wilcoxon rank-sum test to detect a statistically significant difference in means. In a single-arm trial, the one-sample t-test or Fisher sign test is used to determine whether the change is statistically significantly different from zero.

For the Multi-D and Benefin studies, changes from baseline were calculated for the four assessment scores. The Multi-D intervention group had increases in QOL, whereas patients in the standard care arm had decreases (Table 3). The Benefin arm had decreases in QOL for both scales. The between-arm difference in SWB was not statistically significant (P = 0.065). Changes from baseline in overall QOL are pictured in Figure 2. The plots show changes using the first two assessments for the Multi-D study and changes using the first 4 weeks of treatment in the Benefin study. The range of the values at the 1-month time point shows the variation between study arms and reflects the results reported in the table. The change from baseline may or may not give the same conclusion as measuring QOL at a specific time point because of study design and power considerations, as mentioned earlier.

TABLE 3. Analysis of mean change from baseline at 1 month

| Study | Assessment instrument | Intervention (n = 46) | Standard care (n = 54) | P value |
|---|---|---|---|---|
| Multi-D | Overall QOL | 3.3 | -8.9 | 0.009 |
| | Spiritual well-being | 5.4 | -2.1 | 0.065 |
| | | Benefin (n = 20) | Placebo (n = 19) | |
| Benefin | Overall QOL | -10.0 | 5.3 | 0.029 |
| | Physical well-being | -12.5 | 3.9 | 0.027 |
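The change-from-baseline calculation can be sketched as follows. The data layout (a list of baseline and follow-up pairs per arm) and the scores themselves are assumptions made for illustration; the resulting per-patient changes would then be compared between arms with a two-sample test.

```python
from statistics import mean


def change_from_baseline(baseline, follow_up):
    """Follow-up score minus baseline score; a positive value means QOL improved."""
    return follow_up - baseline


# Hypothetical (baseline, 1-month) score pairs on a 0-100 scale, keyed by arm.
scores = {
    "intervention": [(60, 70), (75, 80), (50, 45)],
    "standard care": [(70, 60), (65, 65), (80, 65)],
}

# One change per patient, then the mean change per arm.
mean_change = {
    arm: mean(change_from_baseline(b, f) for b, f in pairs)
    for arm, pairs in scores.items()
}
```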

FIG. 2. Change from baseline for overall QOL scores (one line per patient).

Goal 3: To Assess the Maximum Score or Maximum Change from Baseline

This endpoint may be of interest in a number of situations. One example is a patient who is receiving multiple cycles of treatment and can continue treatment as long as it is perceived to be beneficial. Here the time to the maximum treatment benefit is not as important as the maximum benefit itself. Another example is a study in which a large number of patients are expected to stop before the end of the study, causing missing values at later dates. In that case it would be inappropriate to assess QOL efficacy at a particular time point.

The endpoint is determined by selecting the maximum QOL score per patient over the course of the study or by selecting the largest (positive or negative) change from baseline. A Student’s t-test of the means of the maximum scores between arms failed to identify significant differences in either study (Table 4). In the Benefin study, the differences between arms remained significant when the maximum change from baseline was used for comparison.
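A minimal sketch of how the per-patient endpoint might be derived is shown below. It makes several assumptions that the text does not spell out: scores are stored in assessment order with the baseline first, a missed assessment is recorded as `None`, only post-baseline scores count toward the maximum, and "largest (positive or negative) change" is read as the change with the largest magnitude.

```python
def max_score(scores):
    """Highest post-baseline score; None entries are missed assessments."""
    observed = [s for s in scores[1:] if s is not None]
    return max(observed) if observed else None


def max_change_from_baseline(scores):
    """Post-baseline change with the largest magnitude, sign preserved."""
    baseline = scores[0]
    if baseline is None:
        return None
    changes = [s - baseline for s in scores[1:] if s is not None]
    return max(changes, key=abs) if changes else None


# Hypothetical patient: baseline 70, then 80, a missed visit, then 55.
patient = [70, 80, None, 55]
best = max_score(patient)                    # 80
largest = max_change_from_baseline(patient)  # -15 (the drop outweighs the gain of 10)
```

Because each patient contributes one value regardless of when it occurred, patients who stop early are still included as long as they have one post-baseline assessment.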

TABLE 4. Analysis of means of maximum scores/maximum changes from baseline

Mean maximum scores

| Study | Assessment instrument | Intervention (n = 49) | Standard care (n = 54) | P value |
|---|---|---|---|---|
| Multi-D | Overall QOL | 81.6 | 78.9 | 0.406 |
| | Spiritual well-being | 88.6 | 83.9 | 0.116 |
| | | Benefin (n = 32) | Placebo (n = 27) | |
| Benefin | Overall QOL | 76.6 | 86.1 | 0.104 |
| | Physical well-being | 71.1 | 78.7 | 0.208 |

Mean maximum change from baseline

| Study | Assessment instrument | Intervention (n = 49) | Standard care (n = 54) | P value |
|---|---|---|---|---|
| Multi-D | Overall QOL | 11.6 | 5.9 | 0.159 |
| | Spiritual well-being | 10.2 | 8.8 | 0.738 |
| | | Benefin (n = 32) | Placebo (n = 6) | |
| Benefin | Overall QOL | -3.1 | 11.5 | 0.003 |
| | Physical well-being | -5.5 | 14.4 | 0.001 |

Goal 4: To Assess QOL at the Last Known Assessment Score or Change from Baseline to the Last Known Assessment Score

This endpoint addresses the situation in which patients drop out of the study, become ineligible for study completion, complete the study at different rates, or die before completing all required assessments. For example, if diabetic patients are being monitored for weight loss, not all patients will need to lose the same amount of weight or will be dieting for the same amount of time. The endpoint could therefore be either the weight of the patient at study completion or the amount of weight lost at diet completion.

Using the last known assessment score provides a reasonable way to examine treatment efficacy when there are missing data. Student’s t-test or the Wilcoxon rank-sum test is used. Applying this endpoint to the Multi-D study, there are no significant results for overall QOL or SWB (Table 5). In the Benefin study, the changes from baseline to the last known assessment in overall QOL and PWB remained statistically significant. Figure 3 shows the changes reported in the table as a bar chart. These results tend to be similar to those obtained using the maximum scores (Goal 3). An analysis of maximum scores or last known values usually includes more patients than an analysis of scores at a fixed time point.
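The last-known-score endpoint can be sketched in a few lines. As an assumption for illustration, each patient's scores are held in assessment order with the baseline first and `None` marking a missed assessment; the function names are hypothetical.

```python
def last_known_score(scores):
    """Most recent non-missing post-baseline score, or None if there is none."""
    for score in reversed(scores[1:]):
        if score is not None:
            return score
    return None


def change_to_last_known(scores):
    """Last known post-baseline score minus baseline, or None if either is missing."""
    baseline, last = scores[0], last_known_score(scores)
    if baseline is None or last is None:
        return None
    return last - baseline


# Hypothetical patient who dropped out after the second follow-up.
dropout = [60, 70, 65, None, None]
last = last_known_score(dropout)        # 65
change = change_to_last_known(dropout)  # 5
```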

TABLE 5. Analysis of means using the last known assessment score

Mean scores of last known assessment

| Study | Assessment instrument | Intervention (n = 49) | Standard care (n = 54) | P value |
|---|---|---|---|---|
| Multi-D | Overall QOL | 72.2 | 71.7 | 0.889 |
| | Spiritual well-being | 81.4 | 78.5 | 0.442 |
| | | Benefin (n = 32) | Placebo (n = 27) | |
| Benefin | Overall QOL | 62.5 | 69.4 | 0.299 |
| | Physical well-being | 59.4 | 66.7 | 0.279 |

Mean change from baseline to last known assessment

| Study | Assessment instrument | Intervention (n = 49) | Standard care (n = 54) | P value |
|---|---|---|---|---|
| Multi-D | Overall QOL | 2.2 | -1.3 | 0.447 |
| | Spiritual well-being | 3.1 | 3.7 | 0.891 |
| | | Benefin (n = 32) | Placebo (n = 26) | |
| Benefin | Overall QOL | -17.2 | -3.8 | 0.037 |
| | Physical well-being | -17.2 | 2.9 | 0.005 |

Goal 5: To Assess Success or Failure of the Study Treatment

Typically, a phase II chemotherapy trial uses this type of endpoint by defining treatment success as a confirmed complete or partial tumor response. An anorexia study may define treatment success as a weight gain of at least 10 pounds or a lack of any weight loss, and treatment failure as weight loss. A QOL endpoint could be used as a surrogate by using a questionnaire specifically targeted to anorexia symptoms. The assessment could ask the patient a specific question about another aspect of anorexia, such as appetite. Patients can then be classified as a success or failure depending on the increase in the QOL appetite score or the lack of a decrease in the score. Research has indicated that a change of at least 10 points on a 0 to 100 point scale would constitute a clinically meaningful change in a QOL questionnaire score [9]. Thus, for the appetite score, a 10-point increase in score (or the lack of a 10-point decrease) would be the success.

To analyze this endpoint, patient scores are converted to a 0 to 100 point scale and patients are categorized as having treatment success or failure. Differences between treatment arms are tested using a chi-square or Fisher’s exact test. At 1 month, the Multi-D study had a 43% success rate in the intervention arm and a 22% success rate in the standard care arm for overall QOL (Table 6). This was statistically significant. These results were not consistent with those obtained for the difference in means at week 4. This method reflects a true intent-to-treat analysis because all of the patients are used. Patients with missing values are assumed to be failures.
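The success/failure classification and the between-arm test can be sketched as below. The helper names are hypothetical, the classifier uses the 10-point-increase definition of success (one of the two options the text mentions), and the test is the Pearson chi-square without continuity correction. That choice of variant is an assumption, although it does give P of about 0.025 for the Multi-D overall QOL counts, in agreement with Table 6.

```python
from math import erfc, sqrt


def is_success(baseline, follow_up, threshold=10):
    """Success = improvement of at least `threshold` points on a 0-100 scale.

    A missing score (None) is counted as a failure, as described in the text.
    """
    if baseline is None or follow_up is None:
        return False
    return follow_up - baseline >= threshold


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p_value). With 1 degree of freedom the upper-tail
    probability equals erfc(sqrt(statistic / 2)).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, erfc(sqrt(stat / 2))


# Multi-D overall QOL counts from Table 6: (successes, failures) per arm.
stat, p_value = chi_square_2x2(21, 28, 12, 42)
```

With small cell counts, as in the Benefin arms, Fisher's exact test would be the safer choice.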

FIG. 3. Mean change of last known patient score from baseline.

TABLE 6. Analysis of treatment success or failure according to QOL scores

| Study | Assessment instrument | Success | Intervention (n = 49) | Standard care (n = 54) | P value |
|---|---|---|---|---|---|
| Multi-D | Overall QOL | Yes | 21 (42.9%) | 12 (22.2%) | 0.025 |
| | | No | 28 (57.1%) | 42 (77.8%) | |
| | Spiritual well-being | Yes | 15 (30.6%) | 16 (29.6%) | 0.914 |
| | | No | 34 (69.4%) | 38 (70.4%) | |
| | | | Benefin (n = 23) | Placebo (n = 23) | |
| Benefin | Overall QOL | Yes | 3 (13%) | 6 (26.1%) | 0.265 |
| | | No | 20 (87%) | 17 (73.9%) | |
| | Physical well-being | Yes | 2 (8.7%) | 6 (26.1%) | 0.120 |
| | | No | 21 (91.3%) | 17 (73.9%) | |

Goal 6: To Assess the Trend for Improvement in Patient QOL

This goal arises from asking whether QOL is increasing or decreasing more rapidly in one set of patients than in another. Examples include trying to determine which treatment most quickly reduces pain or most quickly increases QOL. A simple linear regression line is fit to the data for each patient over time. The slope estimate for each patient is used as the endpoint, and the distributions of these slopes are compared between treatment arms. The analysis verifies the drastic decrease in QOL of patients receiving Benefin compared with placebo (Table 7). Benefin caused many side effects, which clearly affected both overall QOL and PWB. This methodology can be used to confirm the results based on other primary endpoints.
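The per-patient slope can be sketched with an ordinary least-squares fit. The function name, the handling of missing scores (dropped, with `None` returned when fewer than two remain), and the example values are assumptions for illustration.

```python
from statistics import mean


def patient_slope(times, scores):
    """Ordinary least-squares slope of score on time for one patient.

    Missing scores (None) are dropped; fewer than two usable points gives None.
    """
    points = [(t, s) for t, s in zip(times, scores) if s is not None]
    if len(points) < 2:
        return None
    t_bar = mean(t for t, _ in points)
    s_bar = mean(s for _, s in points)
    sxx = sum((t - t_bar) ** 2 for t, _ in points)
    if sxx == 0:
        return None  # every observation falls at the same time point
    sxy = sum((t - t_bar) * (s - s_bar) for t, s in points)
    return sxy / sxx


# Hypothetical patient losing 5 points per week over the first 3 weeks.
weeks = [0, 1, 2, 3]
slope = patient_slope(weeks, [70, 65, 60, 55])
```

One slope per patient is then treated like any other single-value endpoint and compared between arms with a two-sample test.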

TABLE 7. Mean slope of QOL score over time

| Study | Assessment instrument | Intervention (n = 49) | Standard care (n = 54) | P value |
|---|---|---|---|---|
| Multi-D | Overall QOL | 0.3 | -0.1 | 0.176 |
| | Spiritual well-being | 0.3 | 0.1 | 0.487 |
| | | Benefin (n = 41) | Placebo (n = 41) | |
| Benefin | Overall QOL | -4.8 | 0.9 | 0.004 |
| | Physical well-being | -3.8 | 1.3 | 0.012 |

Goal 7: To Assess a Patient’s General QOL State Over Time

The area under the curve (AUC) method is appropriate in this case. This technique calculates one value per patient, and Student’s t-test or the Wilcoxon rank-sum test is conducted to compare treatment arms. The AUC is calculated using a standard mathematical formula that sums the areas of the two-dimensional regions created by plotting QOL over the course of the study (in this case from baseline to week 5):

$$\mathrm{AUC} = \int_0^5 f(x)\,dx,$$

where f(x) is the patient’s QOL score at week x.

Details of the AUC calculation are described in “Presenting Longitudinal Data” [10] in the first part of this monograph. The AUC must be adjusted if there is no baseline information for a patient, if there is only one valid observation for a patient, or if the last evaluation is missing. In these cases, the AUC may be set to missing or set to zero, or a sensitivity analysis can be performed by trying different methods for filling in the missing values (see “Handling Missing Data” [11] in the first part of this monograph for more information on methods of imputing). A normalized statistic may be produced by dividing the AUC for each patient by the number of time periods in which the patient reported data, thus producing an average AUC value per patient.

Comparisons of mean AUC for our examples produce no statistically significant results (Table 8). For the Multi-D study in particular, this analysis accurately reflects that over the course of the entire study there were no overall differences between treatment arms, because the differences seen at week 4 had disappeared at subsequent assessments. The AUC values, therefore, were not expected to differ between the arms.
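A minimal sketch of a per-patient AUC follows. It assumes the trapezoidal rule, which the text does not name, and it assumes strictly increasing assessment times with `None` marking a missed assessment. Skipping a missing interior score amounts to interpolating linearly across the gap. The normalization divides by the elapsed time between the first and last observed assessments; this equals the number of time periods when assessments are complete and equally spaced, and it is one interpretation of the normalization described above.

```python
def observed_points(times, scores):
    """Pair each time with its score, dropping missed assessments (None)."""
    return [(t, s) for t, s in zip(times, scores) if s is not None]


def auc_trapezoid(times, scores):
    """Area under the QOL curve by the trapezoidal rule; None if under two points."""
    points = observed_points(times, scores)
    if len(points) < 2:
        return None
    return sum(
        (t1 - t0) * (s0 + s1) / 2
        for (t0, s0), (t1, s1) in zip(points, points[1:])
    )


def normalized_auc(times, scores):
    """AUC divided by the elapsed time between first and last observed assessments."""
    points = observed_points(times, scores)
    if len(points) < 2:
        return None
    return auc_trapezoid(times, scores) / (points[-1][0] - points[0][0])


# Hypothetical weekly scores from baseline (week 0) to week 5.
weeks = [0, 1, 2, 3, 4, 5]
area = auc_trapezoid(weeks, [60, 70, 70, 80, 80, 80])    # 370.0
average = normalized_auc(weeks, [60, 70, 70, 80, 80, 80])  # 74.0
```

The normalized value is on the same 0 to 100 scale as the scores, which makes patients with different follow-up lengths comparable.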

TABLE 8. Analysis of means for normalized AUC

| Study | Assessment instrument | Intervention (n = 49) | Standard care (n = 54) | P value |
|---|---|---|---|---|
| Multi-D | Overall QOL | 68.1 | 65.0 | 0.468 |
| | Spiritual well-being | 77.3 | 72.5 | 0.244 |
| | | Benefin (n = 31) | Placebo (n = 25) | |
| Benefin | Overall QOL | 73.4 | 77.2 | 0.514 |
| | Physical well-being | 68.6 | 71.5 | 0.357 |

TABLE 9. FACT-Fatigue results from the EPO study

| Endpoint | Placebo (n = 164) | Epoetin alfa (n = 166) | P value |
|---|---|---|---|
| Baseline | 52.8 (n = 159) | 50.1 (n = 163) | 0.221 |
| Score at certain time point (1 month) | 57.7 (n = 96) | 57.3 (n = 91) | 0.984 |
| Change from baseline at 1 month | 1.6 (n = 95) | 4.9 (n = 90) | 0.178 |
| Maximum score | 63.3 (n = 151) | 63.0 (n = 151) | 0.892 |
| Maximum change from baseline | 9.8 (n = 147) | 12.4 (n = 149) | 0.186 |
| Last known score | 54.4 (n = 151) | 53.4 (n = 151) | 0.699 |
| Change from baseline to last known score | 0.8 (n = 147) | 2.9 (n = 149) | 0.231 |
| Treatment success at any time: Yes | 42 (26%) | 51 (31%) | 0.302 |
| Treatment success at any time: No | 122 (74%) | 115 (69%) | |
| Slope of scores over time | -0.2 (n = 163) | 0.5 (n = 165) | 0.101 |
| Area under the curve (AUC) | 183.4 (n = 147) | 182.6 (n = 149) | 0.929 |
| Area under the curve (AUC), adjusting for baseline AUC | 2.6 (n = 138) | 10.6 (n = 146) | 0.199 |

EPO Study Example: Comparisons of All Endpoints in One Study

The FACT-Fatigue scale was used to assess the efficacy of epoetin alfa in the EPO study, in part because the study drug was thought to alleviate fatigue related to anemia. All endpoints and analysis techniques described in Goals 1 to 7 were applied to the FACT-Fatigue scale score for comparison purposes. Table 9 shows the results. None of these endpoints produced a significant difference between arms. Figure 4, which compares mean scores (with corresponding 95% confidence intervals) over time by arm, and Figure 5, which shows percent change from baseline by patient at each time point, demonstrate how little variation existed between arms. As Figure 6 shows, success rates were consistent within arms and between arms across the first few weeks of treatment.

FIG. 4. Mean FACT-Fatigue over time with 95% confidence intervals.

FIG. 5. Percent of baseline FACT-Fatigue scores over time (one line per patient).

FIG. 6. Percent of patients with 10-point increases from baseline. (Color version of figure is available online.)

These endpoint results can be verified by modeling and plotting values over time. Modeling can adjust for baseline values and validate the results of the original analysis. However, these models rarely provide additional information on the study results. If there is an important difference, it can usually be seen in the univariate analyses. The results from the regression model are shown in Table 10. The epoetin alfa variable is not significant after adjusting for the other variables in the model. This is consistent with the univariate finding that changes in FACT-Fatigue were not related to treatment.

TABLE 10. Regression parameter estimates (EPO study)

| Effect | FACT-Fatigue | FACT-Fatigue AUC |
|---|---|---|
| Intercept | -7.47 | -17.4 |
| Epoetin alfa | 0.06 | 12.7 |
| Large site | 2.82 | 11.8 |
| Lung cancer | -0.73 | 5.6 |
| Breast cancer | 5.32 | 12.4 |
| Planned concurrent RT | -4.38 | -38.8* |
| Mild baseline anemia | 0.12 | -1.8 |
| Male | -4.05 | -29.1* |
| Prestudy transfusion | -5.19 | -12.5 |
| Cisplatinum chemotherapy | -1.70 | -3.8 |
| Age | 0.19 | 0.3 |
| Weight | 0.01 | 0.4 |
| Any transfusion | -8.70 | -37.7 |

*Significant at 0.05 significance level.

Applying Endpoints and Analysis to the NEWMED Study

As mentioned previously, the aim of this example is to assess whether NEWMED has an ameliorating effect on patients with advanced cancer who are experiencing pain. There are three specific objectives to measure: 1) impact on pain; 2) impact on other side effects such as fatigue and mood/depression; and 3) impact on survival or disease progression. Formally, the hypotheses are:

1. Patients receiving NEWMED will report a clinically significant change in pain.
2. Patients receiving NEWMED will report clinical improvements in fatigue and mood/depression.
3. Patients receiving NEWMED will have improved survival.

A clinically significant change in a QOL score is conservatively a 10-point improvement from baseline when the assessment scale is from 0 to 100. Pain scores are converted so that 0 indicates the worst pain and 100 indicates no pain. The change from baseline is calculated, and each patient is classified as a success or failure. The time point for this change from baseline is determined by prior studies or information. Given that the treatment cycle is 4 weeks, the 4-week time point might be optimal if patients are expected to experience an immediate response to NEWMED or to drop out after this time point. The 2-month time point may be chosen if treatment efficacy is not expected for 2 months.

The second hypothesis calls for a comparison between arms to determine whether the fatigue scores and mood/depression scores are significantly higher in the NEWMED arm than in the placebo arm. Powering the study to detect a 10-point difference between arms would be appropriate. The endpoint is assessed by a comparison of mean scores between arms at the end of treatment or at the specified time point.

The final hypothesis calls for a Kaplan-Meier survival analysis to compare survival between arms. This endpoint is not driven by QOL assessment scores, but rather by the treatment itself. If the treatment arm achieves longer survival and time to progression, then NEWMED improves longevity.
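To make the powering step for the second hypothesis concrete, the sketch below applies the usual normal-approximation sample size formula for a two-sided, two-sample comparison of means, n = 2((z_alpha + z_power) * sd / delta)^2 per arm. The 10-point difference comes from the text. The standard deviation of 20 points, the 5% significance level, and the 80% power are illustrative assumptions, not values from the study.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(delta, sd, alpha=0.05, power=0.80):
    """Patients per arm for a two-sided, two-sample comparison of means.

    Uses the normal approximation n = 2 * ((z_alpha + z_power) * sd / delta) ** 2,
    rounded up to a whole patient.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


# delta = 10 points is from the text; sd = 20 is a made-up planning value.
patients_per_arm = n_per_arm(delta=10, sd=20)  # 63
```

The required size scales with the square of sd / delta, so a realistic standard deviation from prior studies matters more than any other input. Expected dropout would also inflate the number to enroll.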

How to Adjust for Baseline Values

The results of any analysis of QOL over time can be masked by differences in baseline QOL. The best way to deal with such an imbalance is to prevent it at the design phase: if randomization can be stratified by baseline values, imbalances can be avoided. Baseline values should also be included as covariates in linear or logistic models to see whether they affect the outcome of a study. Other methods of adjusting for baseline values are to subtract the baseline value from the endpoint or to subtract the baseline area in an AUC analysis. Subtracting the baseline area in an AUC analysis should be used with caution because it can result in spurious conclusions that cannot be validated with other analyses.

Summary

The choice of QOL endpoints for a study should be based on which score is most likely to change if the treatment is favorable. How the QOL change is calculated should be based on the expected amount of missing data, the number of time points at which data will be collected, and whether extreme outliers in the scores affect the results. The study should have sufficient power to detect a meaningful difference between arms (typically 10 points on a 0 to 100 point scale) in the chosen QOL endpoint. At the conclusion of a study, several secondary endpoints can be analyzed to provide additional information and confirm the primary endpoint results.

REFERENCES

1. Rummans TA, Clark MM, Sloan JA, Frost MH, Bostwick JM, Atherton PJ, et al. Impacting quality of life for patients with advanced cancer with a structured multidisciplinary intervention: a randomized controlled trial. J Clin Oncol 2006;24:635-42.
2. Loprinzi CL, Levitt R, Barton DL, Sloan JA, Atherton PJ, Smith DJ, et al. Evaluation of shark cartilage in patients with advanced cancer. Cancer 2005;104:176-82.
3. Witzig TE, Silberstein PT, Loprinzi CL, Sloan JA, Novotny PJ, Mailliard JA, et al. Phase III, randomized, double-blind study of epoetin alfa compared with placebo in anemic patients receiving chemotherapy. J Clin Oncol 2005;23:2606-17.
4. Therasse P, Arbuck SG, Eisenhauer EA, Wanders J, Kaplan RS, Rubinstein L, et al. New guidelines to evaluate the response to treatment in solid tumors. J Natl Cancer Inst 2000;92:205-16.
5. Sloan JA, Dueck A. Issues for statisticians in conducting analyses and translating results for quality of life end points in clinical trials. J Biopharm Stat 2004;14:73-96.
6. Sloan JA, Novotny PJ, Loprinzi CL. Analyzing quality of life (QOL) endpoints in clinical trials (via the SAS System). Proceedings of the 23rd Annual SAS Users Group International Conference 1998;1213-22.
7. Sloan JA, Novotny PJ, Loprinzi CL. Design and analysis of cancer control studies using the SAS system. SUGI 25. Indianapolis, IN: Proceedings of the 25th Annual SAS Users Group International Conference; 2000 (paper 259-25):1356-65.
8. Burger K, Mandrekar S. Optimal timing for QOL assessments. Curr Probl Cancer 2005;29:278-84.
9. Sloan JA, Cella D, Hays RD. Clinical significance of patient-reported questionnaire data: another step toward consensus. J Clin Epidemiol 2005;58:1217-9.
10. Mandrekar S, Kamath C. Presenting longitudinal data. Curr Probl Cancer 2005;29:296-305.
11. Huntington JL, Dueck A. Handling missing data. Curr Probl Cancer 2005;29:317-25.