Psychiatry Research, 30: 191-199. Elsevier
The Introductory Placebo Washout: A Retrospective Evaluation
Fred W. Reimherr, Mark F. Ward, and William F. Byerley
Received January 11, 1988; revised version received January 3, 1989; accepted March 9, 1989.
Abstract. We examined the effects of the “introductory placebo washout” technique by reanalyzing the results of a recent trial of an experimental antidepressant. At the beginning of the trial, all patients were placed on placebo in a single-blind design. Patients who were rated as placebo responders on the physician-administered Hamilton Rating Scale for Depression (HRSD) were excluded from the trial. In spite of this technique, an alternative measure of depression indicated that many patients with a positive response to placebo had been entered in the trial. In the reanalysis, elimination of these “hidden placebo responders” did not lower the final placebo response rate and actually diminished the differences observed at the end of the study between the active treatment and placebo groups. These data suggest that the introductory placebo washout may have unpredictable, possibly confounding effects on patient samples in trials of antidepressant agents.
Key Words. Antidepressants, affective disorder, fluoxetine, imipramine, placebo response.
Fred W. Reimherr, M.D., is Clinical Assistant Professor of Psychiatry and Director of the Mood Disorder Clinic, Department of Psychiatry, University of Utah Medical Center. Mark F. Ward, Ph.D., is Assistant Professor of Psychiatry, Oregon Health Science University. William F. Byerley, M.D., is Assistant Professor of Psychiatry, University of Utah Medical Center. (Reprint requests to Dr. F. Reimherr, Mood Disorder Clinic, Dept. of Psychiatry, University of Utah Medical Center, Salt Lake City, UT 84132, USA.)
0165-1781/89/$03.50 © 1989 Elsevier Scientific Publishers Ireland Ltd.
The introductory placebo washout has become an accepted procedure in most trials designed to test psychotropic medications. It usually lasts 7-14 days following acceptance of the patient into the study. All patients have signed an informed consent agreement and believe that they have formally entered the study. The investigator, who assesses the patient’s condition before and after administration of placebo, is aware that the patient is on placebo at this stage of the study. If the subject shows significant improvement (generally defined as 15-25%) on a measure of symptom intensity, he is dropped from the study. The rationale for the technique is both appealing and straightforward: eliminating placebo responders at the outset of a study, it is believed, will lower the placebo response rate observed at the end of a multiweek study and will thereby enhance the differences between the placebo and active treatment groups.
The origin of the introductory placebo washout technique is obscure. The 1977 Guidelines for the Clinical Evaluation of Antidepressant Drugs (Crout and Finkel, 1977) recommended that a drug-free period precede the start of an antidepressant drug trial but did not mention combining this drug-free period with administration of placebo. The International Guidelines for Clinical Trials of Psychotropic Drugs (Psychopharmacology Bulletin, 1978) did not mention the introductory placebo washout. In a review article on experimental methods, Prien and Levine (1984) stated that an introductory placebo washout may be an alternative to concomitant placebo control and that this procedure may eliminate rapid remitters or carryover effects of previously administered medications. However, they provided no data or references to support these recommendations.
We decided to evaluate the introductory placebo washout retrospectively to help define the effects of this procedure. In the trial we reanalyzed, the Hamilton Rating Scale for Depression (HRSD; Hamilton, 1960) was administered at the start of placebo administration and again after 1 week on placebo. Subjects whose HRSD score dropped below 20 or who showed a 20% or greater improvement on the HRSD were considered “placebo responders” and were excluded from the trial. The remaining subjects were then randomly assigned to one of three treatment groups: placebo, imipramine, or an experimental antidepressant (fluoxetine). We hypothesized that, because the HRSD was administered by a physician who knew that the patient was on placebo, bias could have altered this assessment. Pressure to produce a study population, for example, could have led the physician to underestimate improvement during placebo administration. As a result, a number of patients who were actually “hidden placebo responders” could have been entered in the study. Fortuitously, we had available another measure of depression, the Hopkins Symptom Checklist (SCL; Derogatis et al., 1974), which was not used as a criterion to decide whether patients were placebo responders. This scale is completed by the patient.
Because the patients were not aware that they had been taking placebo tablets during the initial week of the study, the depression scale of the SCL provided an alternative and perhaps less biased measure of improvement on placebo. Results on the SCL depression scale indicated that a substantial number of patients with very positive responses to placebo had been entered in the study.
Next, we determined whether the amount of improvement produced by placebo in these study patients was similar to the amount produced by placebo in patients actually rejected as placebo responders. We collected data from 18 patients who had been identified as placebo responders during introductory placebo washouts and rejected from this study and from several very similar studies of depression. Each of these patients had been rejected because of improvement measured by the HRSD but had also completed the SCL. These patients had a median improvement of 18.9% on the SCL depression scale, dropping from a mean score of 2.85 (± 0.480) to 2.23 (± 0.582). Sixteen of the 97 patients retained in the current drug study improved by more than 18.9% on the SCL depression scale during the introductory placebo washout, and 22 of the 97 dropped below a mean score of 2.23 on the SCL depression scale while on placebo. Thus, the improvement these patients showed on placebo was at least as large as that shown by the patients who actually were excluded from drug trials because of improvement during a placebo washout phase.
An examination of how these “hidden placebo responders” influenced the results of the study would provide preliminary answers to several questions:
(1) In the analysis of a clinical trial, does exclusion of patients with a positive initial response to placebo diminish the response rate seen in placebo groups at end point?
(2) In the analysis of a clinical trial, does exclusion of patients with a positive initial response enhance the differences between active and placebo groups observed at the end of the study?
(3) If a placebo washout condition is to be used, what “elimination point” based on initial response to placebo will produce the maximum difference in rate of improvement between drug and placebo groups during the ensuing study?
Methods
The trial we reexamined was a three-cell, double-blind trial comparing an experimental antidepressant, fluoxetine, with imipramine and placebo (Byerley et al., 1988). Patients were recruited from local mental health centers, from private practitioners, and through limited advertising. They were required to meet DSM-III (American Psychiatric Association, 1980) criteria for major depression with a duration of at least 1 month and to have an HRSD score of at least 20. All patients were initially given capsules containing an inert placebo for 7-10 days. Those whose HRSD score dropped below 20 or who had a 20% or greater improvement on the HRSD were excluded from the study. The remaining patients entered the actual double-blind, three-cell study. The SCL was administered both at baseline and after 7 days of placebo but was not used as a criterion for retention in the study; in fact, it was not even scored until the entire study was completed. Starting doses of fluoxetine and imipramine were 20 and 75 mg, respectively. In the final 4 weeks of the study, daily dosages were 40-80 mg fluoxetine and 100-300 mg imipramine. All patients who finished at least 2 weeks of treatment were included in the data analysis. Progress was monitored weekly using the HRSD, the SCL, and the Physician Clinical Global Rating of Improvement (CGI), a 7-point scale ranging from marked improvement through no change to markedly worse.
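The HRSD exclusion rule applied at the end of the washout can be stated compactly. The following Python sketch is an illustration only: it reads “dropped below 20” as an end-of-washout score under 20 and treats “20% or greater improvement” as an inclusive threshold, and the function names and example scores are hypothetical, not study data.

```python
def percent_improvement(baseline: float, follow_up: float) -> float:
    """Percent reduction from the baseline score (positive = improvement)."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * (baseline - follow_up) / baseline


def excluded_after_washout(hrsd_start: float, hrsd_end: float) -> bool:
    """Hypothetical encoding of the washout rule: exclude if the HRSD score
    falls below 20 or improves by 20% or more during the placebo washout."""
    return hrsd_end < 20 or percent_improvement(hrsd_start, hrsd_end) >= 20.0


# Hypothetical (start, end-of-washout) HRSD scores, not study data.
for start, end in [(26, 22), (25, 20), (30, 23), (24, 19)]:
    pct = percent_improvement(start, end)
    print(f"HRSD {start} -> {end}: {pct:.1f}% improvement, "
          f"excluded = {excluded_after_washout(start, end)}")
```

Under this reading, a patient moving from 25 to 20 would be excluded because the improvement is exactly 20%, even though the end score is not below 20.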
In this reexamination, patients treated with fluoxetine and imipramine were combined because there was no significant difference in outcome between these two groups. On a post hoc basis, a patient was considered to be a “hidden placebo responder” if he had been included in the study and had shown a > 20% improvement on the depression scale of the SCL during the placebo washout phase. We then analyzed the results of the study with the “hidden placebo responders” both included in and excluded from the analysis.
Next, we searched for any “elimination point,” based on initial response to placebo, that enhanced the differences in outcome between the drug and placebo groups. All patients included in the study were used in this analysis. Starting with the patient who had the greatest improvement on placebo during the placebo washout phase, we dropped patients from the analysis one by one. After each patient was dropped, the study results were recalculated to obtain the percentage of patients who responded to medication or placebo. Patients rated moderately or markedly improved on the CGI were considered treatment responders.
Finally, patients were divided into three groups based on their initial response to placebo: (1) positive initial placebo responders (mean initial improvement on the SCL depression scale = 21%, n = 33); (2) neutral initial placebo responders (mean initial improvement on the SCL depression scale = 2%, n = 32); and (3) negative initial placebo responders (mean initial deterioration on the SCL depression scale = -10%, n = 31). The drug study results for each of these groups were then compared.
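The post hoc flag and the one-by-one elimination sweep described above can be sketched as follows. This is a minimal Python illustration, not the authors’ analysis code. The `Patient` record, its field names, and the toy sample are hypothetical. How the plotted “elimination point” was labeled in Fig. 1 is not stated, so here each row is labeled, as an assumption, by the washout improvement of the last patient dropped.

```python
from dataclasses import dataclass
import math


@dataclass
class Patient:
    scl_washout_improvement: float  # % improvement on SCL depression scale during washout
    on_active_drug: bool            # True = imipramine or fluoxetine, False = placebo
    cgi_responder: bool             # moderately or markedly improved on the CGI at end point


def is_hidden_placebo_responder(p: Patient, cutoff: float = 20.0) -> bool:
    """Post hoc flag: retained patient whose SCL improvement exceeded the cutoff."""
    return p.scl_washout_improvement > cutoff


def percent_responders(group: list[Patient]) -> float:
    if not group:
        return math.nan  # undefined once every patient in a group has been dropped
    return 100.0 * sum(p.cgi_responder for p in group) / len(group)


def response_rates(patients: list[Patient]) -> tuple[float, float]:
    """Return (% responders on active drug, % responders on placebo)."""
    active = [p for p in patients if p.on_active_drug]
    placebo = [p for p in patients if not p.on_active_drug]
    return percent_responders(active), percent_responders(placebo)


def elimination_sweep(patients: list[Patient]):
    """Drop patients one by one, largest washout improvement first, and
    recompute the response rates after each deletion. Each row is
    (improvement of the last patient dropped, % active, % placebo)."""
    ordered = sorted(patients, key=lambda p: p.scl_washout_improvement, reverse=True)
    rows = [(None, *response_rates(ordered))]  # None: full sample, nobody dropped
    for i in range(1, len(ordered)):
        last_dropped = ordered[i - 1].scl_washout_improvement
        rows.append((last_dropped, *response_rates(ordered[i:])))
    return rows


# Arbitrary toy sample, for illustration only (not study data).
toy = [
    Patient(40.0, False, False),
    Patient(25.0, True, True),
    Patient(10.0, False, True),
    Patient(5.0, True, True),
    Patient(0.0, False, False),
    Patient(-8.0, True, False),
]
print("hidden placebo responders:", sum(is_hidden_placebo_responder(p) for p in toy))
for point, active_pct, placebo_pct in elimination_sweep(toy):
    print(point, round(active_pct, 1), round(placebo_pct, 1))
```

Sorting once and slicing keeps the sweep a single pass over the ordered sample. Returning NaN makes an emptied treatment group visible instead of silently reporting 0%.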
Results
The results of the study were computed with the total population, including and excluding the “hidden placebo responders,” on two measures of improvement: the CGI and the HRSD (baseline to end point improvement score). These results are presented in Table 1.
Table 1. Study results as measured by the CGI and HRSD with hidden placebo responders included and excluded
Measurement instrument & study group | Hidden responders included | Hidden responders excluded
CGI improvers
  Imipramine or fluoxetine | 43% (n = 65) | 41% (n = 56)
  Placebo | 13% (n = 31) | 16% (n = 25)
  χ² | 8.61, p < 0.005 | 4.88, p < 0.05
HRSD improvement
  Imipramine or fluoxetine | 10.7, SD = 9.21 | 9.9, SD = 8.95
  Placebo | 5.58, SD = 9.11 | 5.24, SD = 10.02
  t test (2-tailed) | t = 2.50, df = 94, p < 0.02 | t = 2.06, df = 80, p < 0.05
Note. CGI improvers = % of patients rated at end point as much or very much improved on the Physician Clinical Global Rating of Improvement. HRSD (Hamilton Rating Scale for Depression) improvement = HRSD baseline score minus HRSD end point score.
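The χ² values in Table 1 can be checked from the response rates. The sketch below assumes that the responder counts can be back-calculated from the published percentages (28/65, 4/31, 23/56, and 4/25; these counts are an inference, not reported values). It also assumes an uncorrected Pearson χ² with 1 df, since the table does not state the method. Under those assumptions it gives about 8.60 and 4.89, which agree with the published 8.61 and 4.88 to within rounding. It also reproduces the drop in the drug-placebo gap from about 30 to about 25 percentage points.

```python
def pearson_chi_square_2x2(resp1: int, n1: int, resp2: int, n2: int) -> float:
    """Uncorrected Pearson chi-square (1 df) for responders vs. non-responders
    in two groups, using the shortcut N(ad - bc)^2 / (product of margins)."""
    non1, non2 = n1 - resp1, n2 - resp2
    n = n1 + n2
    numerator = n * (resp1 * non2 - non1 * resp2) ** 2
    denominator = n1 * n2 * (resp1 + resp2) * (non1 + non2)
    return numerator / denominator


# Critical values of chi-square with 1 df.
CRIT_P05, CRIT_P005 = 3.841, 7.879

# Responder counts back-calculated from the Table 1 percentages (assumption):
# 43% of 65 ~ 28, 13% of 31 ~ 4, 41% of 56 ~ 23, 16% of 25 = 4.
included = pearson_chi_square_2x2(28, 65, 4, 31)
excluded = pearson_chi_square_2x2(23, 56, 4, 25)
print(f"hidden responders included: chi2 = {included:.2f}, p < 0.005: {included > CRIT_P005}")
print(f"hidden responders excluded: chi2 = {excluded:.2f}, p < 0.05: {excluded > CRIT_P05}")
print(f"drug-placebo gap: {100 * 28 / 65 - 100 * 4 / 31:.1f} "
      f"vs {100 * 23 / 56 - 100 * 4 / 25:.1f} percentage points")
```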
Excluding patients with a 20% improvement on placebo, as measured by the SCL depression scale, did not improve the results obtained in the drug study. In fact, it diminished the differences in improvement between the active and placebo groups. On the CGI, the difference in response rates between the active and placebo groups dropped from 30 to 25 percentage points. Surprisingly, when the “hidden placebo responders” who showed an immediate positive response to placebo were excluded, the placebo response rate observed at the end of the study actually increased from 13% to 16%. Similarly, the differences between the treatment groups measured by the HRSD decreased when the “hidden placebo responders” were excluded from the analysis of the drug study. On the HRSD, the improvement found in the placebo group did drop by 0.4 HRSD points when the hidden placebo responders were removed, but the improvement in the active treatment group dropped even more: 0.8 HRSD points. This change produced a sizable drop in the t value, from 2.50 (p < 0.02) to 2.06 (p < 0.05). Thus, our initial examination showed that rejecting patients with a > 20% improvement during the initial period on placebo was detrimental to the study results.
The SCL depression scale is generally considered a secondary outcome measure. In this study, however, the relationship between the SCL depression scale and the HRSD is of special significance. Table 2 compares the correlation coefficients between the main study outcome measures and improvement measured by the HRSD and by the SCL depression scale. All the correlation coefficients are statistically significant (p < 0.0001) and are of very similar magnitude and direction. The correlation between change on the HRSD and change on the SCL depression scale over the course of the study is 0.80.
Within the actual 6-week study, patients treated with placebo had an average improvement of 20% on the HRSD (27.3 ± 0.58 to 21.7 ± 8.0) and 13% on the SCL depression scale (2.81 ± 4.5 to 2.45 ± 0.72). Thus, in this population, one can conclude that the HRSD and the SCL depression scale measure very closely related symptoms and that the changes are similar in magnitude.
Table 2. Comparison of measures of improvement during the study and change measured by the SCL dep and the HRSD
Measure | Correlation with improvement on HRSD during study | Correlation with improvement on SCL dep during study
HRSD end point | -0.89 | -0.76
Physician Global Rating of Improvement | -0.89 | -0.78
Patient Global Rating of Improvement | -0.75 | -0.71
SCL dep end point | -0.69 | -0.75
Improvement on SCL dep | 0.80 | -
Note. HRSD = Hamilton Rating Scale for Depression. SCL dep = Hopkins Symptom Checklist depression scale.
Next, we tried to determine whether any other exclusion point based on initial response to placebo could be supported. For instance, if a 25% or even 30% improvement occurred with placebo, should the patient be eliminated? Does a higher “elimination point,” based on initial response to placebo, produce a lower rate of response to placebo at the end point of the ensuing study and consequently greater differences in rates of response between the drug and placebo groups? This analysis began with the complete sample, including all “hidden placebo responders.” Starting with the patient who had the greatest improvement during the placebo washout (53%), we dropped patients from the analysis of the drug study one by one and recalculated the results after each deletion. Fig. 1 presents the results. At the far left of Fig. 1, the entire sample is used. Moving from left to right, the patient with the largest placebo response was eliminated first, and at each point all patients with a placebo response greater than the indicated elimination point were eliminated from the analysis.
Surprisingly, the minimum response rate to placebo during the ensuing drug study (13%) occurred when no patients with a positive initial response during the placebo washout were dropped from the analysis. As positive placebo washout responders were dropped and the results recalculated, the response rate to placebo actually increased, reaching a maximum of 20% at an elimination point of 7% improvement during the placebo washout phase. The maximum response rate to active treatment (43%) also occurred when no patients were dropped from the analysis. As patients with a positive response to placebo were dropped and the results recalculated, the response rate to the active treatments steadily deteriorated, reaching a minimum of 32% at an elimination point of 7% improvement during the placebo washout phase. Once patients had formally entered the study, rejecting them on the basis of an initial positive response to placebo, even a very large one, diminished the magnitude of the positive results obtained in the drug study. Fig. 1 does not reveal any elimination point based on initial response to placebo above which patients should appropriately be rejected as “placebo responders.”
Next, patients were divided into three groups based on their initial (washout) response to placebo: positive initial placebo responders (mean initial improvement on the SCL depression scale = 21%), neutral initial placebo responders (mean initial improvement on the SCL depression scale = 2%), and negative initial placebo responders (mean initial deterioration on the SCL depression scale = -10%). The end-point study results for each of these groups are depicted in Fig. 2 and listed in Table 3.
Fig. 1. Response to placebo & drug treatment with alternative elimination points
[Fig. 1 plots the percentage of treatment responders in the antidepressant and placebo groups against the elimination point based on % improvement during the introductory placebo washout, from 53% at the left down to 5% at the right.]
Fig. 2. Placebo washout response vs. drug study response (%). The graph shows the % of positive, neutral, and negative introductory placebo washout responders judged to be treatment responders at the end of the ensuing drug study, for antidepressant and placebo treatment (x-axis: initial placebo response).
Table 3. Positive responders to placebo or drug during the study, divided according to response to placebo during the washout period
Study treatment group | Positive washout response | Neutral washout response | Negative washout response | Total
Continued placebo treatment, % responders | 9% (n = 11) | 30% (n = 10) | 0% (n = 10) | 13% (n = 31)
Imipramine or fluoxetine, % of positive responders | 50% (n = 22) | 55% (n = 22) | 24% (n = 21) | 43% (n = 65)
Positive and neutral initial responders to placebo showed similar responses to the active treatments in the study, but neutral initial placebo responders were more likely to respond to placebo in the drug study (30% vs. 9%). In contrast, patients with a negative response during the washout had a much lower response rate in the drug study to both placebo and the active treatments. The largest difference in response rates between the active and placebo conditions occurred among the positive initial placebo responders, who actually demonstrated a very low rate of response when continued on placebo for an additional 6 weeks.
These data were examined with a two-way analysis of variance (ANOVA). The first independent factor was type of treatment (placebo, imipramine, or fluoxetine). The second independent factor was type of initial response to placebo (positive, neutral, or negative). The dependent variable was classification as a responder or nonresponder on the physician’s global assessment; patients rated as moderately or markedly improved were considered responders. Both independent factors produced significant effects, and there was a significant interaction effect (type of treatment: F = 5.52; df = 2, 87; p = 0.0058; type of initial response to placebo: F = 3.59; df = 2, 87; p = 0.0309; interaction: F = 2.56; df = 4, 87; p = 0.0434).
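The claim that the largest drug-placebo gap falls among the positive washout responders can be checked against Table 3. The sketch below assumes responder counts back-calculated from the published percentages and cell sizes (an inference, not reported counts). It gives gaps of roughly 41, 25, and 24 percentage points for the positive, neutral, and negative groups. The ANOVA itself cannot be reproduced this way because Table 3 pools imipramine and fluoxetine.

```python
# Responder counts back-calculated from the Table 3 percentages (assumption).
# washout group -> (placebo responders, placebo n, active responders, active n)
table3 = {
    "positive": (1, 11, 11, 22),
    "neutral": (3, 10, 12, 22),
    "negative": (0, 10, 5, 21),
}

differences = {}
for group, (plc_resp, plc_n, act_resp, act_n) in table3.items():
    plc_pct = 100 * plc_resp / plc_n
    act_pct = 100 * act_resp / act_n
    differences[group] = act_pct - plc_pct
    print(f"{group:8s} placebo {plc_pct:5.1f}%  active {act_pct:5.1f}%  "
          f"difference {differences[group]:5.1f} points")

# Column totals should match the overall rates in Table 3 (4 of 31, 28 of 65).
totals = [sum(row[i] for row in table3.values()) for i in range(4)]
print("totals:", totals)
print("largest drug-placebo difference:", max(differences, key=differences.get))
```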
Discussion
Most researchers believe that the introductory placebo washout technique is a useful procedure. We initially attempted to use the SCL depression scale as a more accurate measure of placebo response, expecting that excluding the “hidden placebo responders” identified by this instrument might increase the differences observed in the analysis of this drug study. In fact, the results deteriorated. This initial observation led to the more extensive investigation presented here. Our results indicate that the placebo washout may have unpredictable or confounding effects. Eliminating “hidden placebo responders” from the study analysis diminished the differences between the active and placebo treatment groups in the drug study that followed the placebo washout.
Many argue that placebo responsiveness is a stable characteristic of an individual patient that can be accurately measured with a placebo washout phase. This view may be inaccurate. Our data suggest that placebo responsiveness is not a stable characteristic but a more complex artifact of clinical trials. In addition, we could not find any other exclusion point for rejecting patients as placebo responders that improved the study results. Patients with a positive initial response to placebo demonstrated both favorable and relatively selective responses to the active treatment. That is, during the actual study, they responded to the active medication or, if kept on placebo for an additional 6 weeks, did not show a continued response to placebo. They appeared to lose their placebo responsiveness, as indicated by the final placebo response rate of 9% in this group, as opposed to 30% in the patients with a neutral response to placebo during the washout period. A very interesting study by Molcan et al. (1981) might offer an explanation for these phenomena. Molcan et al. measured the antianxiety effect of placebo on two successive administrations (2 weeks apart) in the same subjects.
On the second administration, the effect of the placebo was diminished. Thus, the placebo washout might artificially suppress the level of placebo-induced improvement during the study, which is not the intended purpose of the placebo washout.
One objection that may be raised about our conclusions concerns the relatively low number of placebo responders in this trial compared with other trials of antidepressant medications. Perhaps, in a population with a higher placebo response rate, the introductory placebo washout would be of value. That may be true, but it requires empirical validation. A second possible objection concerns the comparability of the two measures of depression, the HRSD and the SCL depression scale. Each measure assesses somewhat different aspects of depressive symptomatology: the HRSD is thought to evaluate more severe, vegetative signs, whereas the SCL depression scale measures more subjective experiences of depressed mood based on the patient’s self-report. In our patients, however, the correlation between these measures is very high. In addition, percentage changes on the HRSD during the study were larger than those on the SCL depression scale (20% vs. 13% in the placebo group), so a 20% improvement on the SCL depression scale plausibly corresponds to at least as large a percentage improvement on the HRSD. Thus, the 20% change on the SCL depression scale that we used to define hidden placebo responders is well supported by our data.
It is particularly surprising that the placebo washout technique has been almost universally accepted despite the absence of supporting data. While the current article was being prepared, Rabkin et al. (1986) published one of the few articles dealing with the placebo washout. Forty-five patients rejected from several antidepressant studies as placebo responders during a placebo washout phase were examined periodically for 6 months after being rejected. Forty-four percent apparently remained well, while 56% again developed depressive symptoms. Patients who relapsed more frequently met Research Diagnostic Criteria for intermittent depressive disorder and for other nonaffective psychiatric disorders. Although this was not a controlled study, it does suggest that the placebo washout identified patients whom one might wish to exclude from a study population. While this preliminary work of Rabkin et al. (1986) does support the placebo washout, additional research is warranted.
One might argue that use of a placebo washout procedure violates Food and Drug Administration guidelines for the protection of human subjects. One very important element in protecting the rights of human volunteers is complete disclosure of the nature of the study. According to the Federal Register (Food and Drug Administration, 1981), informed consent agreements must contain “a description of the procedures to be followed.” The number of treatment phases and the possibility of being switched from one treatment to another would appear to be fundamental parts of the “procedures to be followed.” In addition, when appropriate, the informed consent agreement must contain “Anticipated circumstances under which the subject’s participation may be terminated by the investigator without regard to the patient’s consent.” When a placebo washout is included in a study, such information is often either withheld or presented in a somewhat misleading manner.
Although the placebo washout did not prove helpful in reducing the number of patients who responded to placebo in this study, there are some practical reasons why it might be useful. A patient’s ability to comply with study procedures can be assessed; if a patient cannot follow such procedures, it is doubtful whether he should be exposed to the risks of an experimental treatment. The placebo washout also allows a more careful medical evaluation before exposure to an experimental treatment, and, for those discontinuing other medications, it produces a longer drug-free interval before entering a study. However, these purposes could be accomplished without the use of a placebo by implementing a longer baseline evaluation.
Finally, the decision whether to enter a patient in a study is both critical and difficult. Certainly, one may postpone entering a patient in a study for a variety of reasons, such as prior use of medication, diagnostic uncertainty, or questions about the severity or consistency of a patient’s symptoms. However, once the decision to enter a patient has been made and the patient has started on medication, our data indicate that tampering with the study population, as occurs with a placebo washout procedure, may be detrimental to the study.
References
American Psychiatric Association. DSM-III: Diagnostic and Statistical Manual of Mental Disorders. 3rd ed. Washington, DC: APA, 1980.
Byerley, W.F.; Reimherr, F.W.; Wood, D.R.; and Grosser, B.I. Fluoxetine, a selective serotonin uptake inhibitor, for the treatment of outpatients with major depression. Journal of Clinical Psychopharmacology, 8: 112-115, 1988.
Crout, J.R., and Finkel, M.J. Guidelines for the Clinical Evaluation of Antidepressant Drugs. (DHEW Publication #77-3042) Rockville, MD: U.S. Food and Drug Administration, 1977.
Derogatis, L.R.; Lipman, R.S.; Rickels, K.; Uhlenhuth, E.H.; and Covi, L. The Hopkins Symptom Checklist (HSCL): A self-report symptom inventory. Behavioral Science, 19: 1-15, 1974.
Food and Drug Administration, Health and Human Services Department. Federal Register, 46(17): 88, January 27, 1981.
Hamilton, M. A rating scale for depression. Journal of Neurology, Neurosurgery and Psychiatry, 23: 56-62, 1960.
International Guidelines for Clinical Trials of Psychotropic Drugs. Psychopharmacology Bulletin, 14(1): 47-65, 1978.
Molcan, J.; Heretik, V.; Novotny, K.; Vajdickova, K.; and Zucha, I. The influence of experience with placebo on the placebo effect. Activitas Nervosa Superior, 23: 184-185, 1981.
Prien, R.F., and Levine, J. Research and methodological issues for evaluating the therapeutic effectiveness of antidepressant drugs. Psychopharmacology Bulletin, 20: 250-257, 1984.
Rabkin, J.G.; McGrath, P.; Stewart, J.W.; Harrison, W.; Markowitz, J.S.; and Quitkin, F. Follow-up of patients who improved during placebo washout. Journal of Clinical Psychopharmacology, 6: 274-278, 1986.