Journal of Affective Disorders 141 (2012) 160–167
Review
Study design features affecting outcome in antidepressant trials
Verena Henkel a,b,⁎, Flurina Casaulta a,⁎, Florian Seemüller b, Stephan Krähenbühl c, Michael Obermeier b, Jürg Hüsler d, Hans-Jürgen Möller b
a Swissmedic, Swiss Agency for Therapeutic Products, Division Clinical Review, Hallerstr. 7, CH-3000 Berne 9, Switzerland
b Department of Psychiatry, Ludwig-Maximilians-University Munich, Nussbaumstr. 7, D-80336 Munich, Germany
c Division of Clinical Pharmacology and Toxicology, University Hospital Basel, Petersgraben 4, CH-4031 Basel, Switzerland
d Division of Statistics, University Berne, Sidlerstr. 5, CH-3012 Berne, Switzerland
Article info
Article history: Received 31 January 2011; received in revised form 17 February 2012; accepted 12 March 2012; available online 1 June 2012
Keywords: Antidepressant trial; Major depression; Study design

Abstract
Background: A key issue in the approval process of antidepressants is the inconsistency of results between antidepressant clinical phase III trials. Identifying factors influencing efficacy data is needed to facilitate interpretation of the results.
Methods: We reviewed data packages submitted as new drug applications to Swissmedic focusing on pivotal, short-term antidepressant trials. Included studies used HAMD-17 or HAMD-21 as primary measures and enrolled patients aged 18–65 years with a diagnosis of major depression. Due to the hierarchical structure of the data, a mixed-effects regression model was applied with responder rates as primary outcome criterion. Random intercepts were estimated for the different trials, while study design factors were assigned as explanatory fixed effects.
Results: The final dataset was based upon 35 study reports with a total of N = 10,835 patients. Significant results were found for study arm (placebo vs. active compound, p < 0.001), sample size (p = 0.002), duration of treatment (p = 0.024), two or more active treatment arms (p = 0.022) and the individual drug (p = 0.029). Furthermore, a tendency to an association with the outcome was observed for baseline disease severity (p = 0.077) and possibility of dosing adaptation (p = 0.076).
Limitations: Due to strict confidentiality agreements, individual drugs are not reported here. Further research should consider additional variables that might have an impact on the results of antidepressant trials.
Conclusions: Efficacy data in antidepressant trials are significantly affected by various factors. These factors and their potentially confounding role have to be considered in the interpretation of the results.
© 2012 Elsevier B.V. All rights reserved.
Contents
1. Introduction
2. Methods
2.1. Identification of studies
2.2. Criteria for including studies and final database
2.3. Confidentiality
⁎ Corresponding authors at: Swissmedic, Swiss Agency for Therapeutic Products, Division Clinical Review, Hallerstr. 7, CH-3000 Berne 9, Switzerland. Tel.: +41 31 322 07 40; fax: +41 31 322 04 32. E-mail address: [email protected] (V. Henkel).
0165-0327/$ – see front matter © 2012 Elsevier B.V. All rights reserved. doi:10.1016/j.jad.2012.03.021
V. Henkel et al. / Journal of Affective Disorders 141 (2012) 160–167
2.4. Selection of the study design factors potentially related to outcome
2.5. Statistical analyses
3. Results
4. Discussion
4.1. Selection of studies
4.2. Statistical analyses in the present report
4.3. Factor “investigational product”
4.4. Factor “geographical region”
4.5. Factor “sample size”
4.6. Factor “placebo vs. verum”
4.7. Factor “dosing schedule”
4.8. Factor “study duration”
4.9. Factor “number of active treatment arms”
4.10. Factor “baseline severity of the underlying disease”
4.11. Possible implications for future research and clinical practice
Role of the funding source
Conflict of interest
Acknowledgments
References
1. Introduction
Unipolar depression is a serious medical disease associated with an approximately 15% lifetime risk of suicide attempt (Chen and Dilsaver, 1996). As this illness is among the top five contributors to the world's total burden of disease (Murray and Lopez, 1997), there is a high medical need for effective treatment strategies (Nierenberg, 2010). At the same time, an intense debate about the efficacy of antidepressants is ongoing (Kirsch and Moncrieff, 2007; Kirsch et al., 2008; Turner et al., 2008). In this context, inconsistency of results between antidepressant clinical trials is a key issue (e.g., Parker et al., 2003). Moreover, the interpretation of the data of antidepressant trials is complex. Some factors that have to be considered in the evaluation of the data are already known, e.g. diversity of study populations and ethnic factors (CHMP, 2010; ICH, 1998). Baseline severity of depression, a factor related to the study population, has been shown to be a correlate of clinical outcome (Kirsch et al., 2008). However, study design factors have so far rarely been investigated regarding their impact on outcome. Recently, higher medication response rates have been reported for comparator trials than for placebo-controlled antidepressant phase III studies (Rutherford et al., 2009). In an earlier phase of drug development, determination of the dose–response relationship is certainly important (Bollini et al., 1999), as is the subsequent use of the optimal dosing schedule in the pivotal trials. In summary, there are many factors that may influence outcome in antidepressant trials. However, available studies on some selected factors report inconsistent results (Khan et al., 2003). Insufficient transparency regarding available data has been raised as a major issue in the current debate about the efficacy of antidepressants (Turner et al., 2008).
In this context, our main objective was to identify relevant study design factors influencing outcome in terms of efficacy. We reviewed complete data packages submitted as new drug applications to Swissmedic from 1995 to 2008 focusing on pivotal, short-term antidepressant trials. Since pharmaceutical
sponsors are required by legal regulation to submit results of all clinical trials conducted with an investigational product, selective reporting for favorable results should have been avoided in this analysis.

2. Methods

2.1. Identification of studies
Assessment reports on each new drug application in the indication major depressive disorder submitted to the national regulatory agency of Switzerland (Swiss Agency for Therapeutic Products, called “Swissmedic”) were systematically checked and relevant data were extracted. Due to methodological and regulatory changes in requirements for clinical studies over the past decades, we decided to focus on a limited time interval from 1995 to 2008. Swissmedic's clinical reviewers generate an assessment report on each new drug application. Clinical reviewers evaluate the complete data sets from the original documentation and summarize relevant data in the assessment reports in accordance with current, updated international regulatory guidelines. Finally, the summary of the data and its interpretation (based on a multi-level peer review process and the involvement of an expert advisory board) form the basis for the final approval decision. All assessment reports have to follow a standardized structure, which is in principle comparable to that of the FDA or EMA. Each report summarizes safety and efficacy data collected in the complete clinical development program. Efficacy data from pivotal, short-term studies were assessed for this analysis. This first step resulted in a dataset including N = 12,710 randomized patients, all with a diagnosis of major depression (MDD) of at least moderate severity at baseline (according to DSM-III or DSM-IV criteria, respectively).

2.2. Criteria for including studies and final database
In a second step, only studies that were in accordance with specific inclusion/exclusion criteria were selected. Only data from double-blind, randomized, controlled, pivotal, phase III, parallel group studies with antidepressant compounds tested against placebo or against placebo and active comparator in patients with depressive disorder of at least moderate baseline severity (mean HAMD-17 baseline: 22–28) were selected (Table 1). All included studies were conducted in adults aged 18–65 years, i.e. studies in special populations (e.g. children and adolescents or elderly patients) were excluded. The diagnosis “major depressive disorder” had to be classified according to an internationally acknowledged classification system, preferably DSM-IV, using the diagnostic criteria therein. Trials were required to last 6, 8 or 9 weeks (short-term trials). The very few studies lasting 5 weeks, as well as studies testing prevention of relapse or recurrence (long-term trials), were excluded. For all included studies, original data were available for responder analyses using standard definitions of response criteria specified by the Hamilton Rating Scale for Depression (HAMD, response defined as ≥50% reduction in symptoms) (Hamilton, 1960, 1967). Regulatory guidelines (CPMP, 1997) recommend the use of the Hamilton Rating Scale of Depression (HAMD-17 or other versions) or the Montgomery Asberg Depression Scale (MADRS) (Montgomery and Asberg, 1979) as primary endpoints to determine symptomatic improvement in major depression. The majority of studies considered in the present analyses used the HAMD-17. In our preliminary database, only four studies used the MADRS as primary endpoint in the pivotal studies, and one single study used the IDS-CR (Rush et al., 1996). Due to this imbalance in the pattern of distribution, trials using MADRS or IDS-CR as primary measure were excluded from the database. All included studies were considered to be pivotal and had a two-arm (comparison with placebo), three-arm (additional comparison to an active comparator) or multi-arm design (using several dosages to establish the clinically effective dose range). The final dataset included N = 10,835 patients and 35 trials on antidepressant compounds. The process is illustrated in Fig. 1 in accordance with PRISMA guidelines (Moher et al., 2009) (Table 2).

Table 1 Main inclusion and exclusion criteria.

| Inclusion criteria | Exclusion criteria |
|---|---|
| Pivotal short-term studies in patients with a diagnosis of MDD without co-morbidity | Long-term studies; other indications than MDD |
| Treatment duration: 6, 8 or 9 weeks | Any other treatment duration |
| Phase IIb and III studies | Phase I, IIa and IV studies |
| Study design: randomized, controlled, double-blind, parallel group, multicenter studies | Study design: open label studies; studies using a placebo run-in period |
| Studies in adults aged ≥ 18 and ≤ 65 years | Studies in special populations, e.g. in children/adolescents, in elderly, or in patients with hepatic or renal impairment |
| Single treatment regimen | Add-on regimen |
| Scales for use as primary endpoint: HAMD (-17 or -21 item versions) (Hamilton, 1960) | Other scales for use as primary endpoint, e.g. MADRS (Montgomery and Asberg, 1979) |
| At least moderate baseline severity: mean baseline HAMD-17 > 22 and < 28 | Milder baseline severity (HAMD-17 < 22) and higher baseline severity (> 28) |

Fig. 1. Flow chart of included studies.
52 pivotal trials (double-blind, short-term, randomised, placebo and/or active controlled, parallel group, multicentre clinical studies) (N = 12,710 patients) retrieved from Swissmedic assessment reports
• Four clinical studies in special populations were excluded (patients were > 65 years)
• Two trials were excluded because inclusion criteria were not restricted to major depression
• Two trials were excluded because a placebo run-in period was part of the study design (dose-response studies)
• Two trials were excluded because they were not placebo controlled
• One trial was excluded because the primary efficacy criterion was IDS-CR total score
• Four trials were excluded because the primary efficacy variable was MADRS total score
• One trial was excluded because of a study duration < 6 weeks
35 trials (N = 10,835 patients) used for statistical analysis: 20 placebo and comparator controlled; 15 only placebo-controlled (including multiple-arm dose-finding studies)

Table 2 Selected characteristics of included studies.

| Characteristic | Placebo-controlled trials | Number of patients | Comparator and placebo-controlled trials | Number of patients |
|---|---|---|---|---|
| Number of studies | 15 | 4031 | 20 | 6804 |
| Treatment groups | 40 | | 65 | |
| Mean pre-treatment HAMD score | 23.9 | | 24.1 | |
| Responder (%) | 42.9% | 1729 | 50.9% | 3464 |
| Dosing schedule: fixed | 8 | 2793 | 7 | 2909 |
| Dosing schedule: flexible | 7 | 1238 | 13 | 3895 |
| Study duration: 6 weeks | 7 | 1214 | 11 | 3052 |
| Study duration: 8 weeks | 6 | 2305 | 9 | 3752 |
| Study duration: 9 weeks | 2 | 512 | – | – |
| Primary outcome criterion: HAMD-17 | 13 | 3597 | 15 | 5394 |
| Primary outcome criterion: HAMD-21 | 2 | 434 | 5 | 1410 |
| Sample size > 100 | 13 | | 20 (all) | |

2.3. Confidentiality
Due to confidentiality agreements between Swissmedic and the pharmaceutical industry, we do not refer to any individual drug in this report. Data extraction and direct access to the database were strictly restricted to the employees of Swissmedic, members of the human medical expert advisory committee of Swissmedic (HMEC), and the statistician. In a second step, an anonymized database was used so that other authors (including a second independent statistician) were only involved in the interpretation and discussion of anonymized data.

2.4. Selection of the study design factors potentially related to outcome
The selection of explanatory variables proceeded as follows. In a first step, we focused on the variables considered most relevant on the basis of daily routine and experience in reviewing clinical data during an approval process. In a second step, a critical review of the literature was performed. Subsequently, explanatory variables were selected and data were extracted from each included pivotal trial (see Methods). The following explanatory variables were selected:
• Duration of the treatment (in weeks) (Rutherford et al., 2009)
• Dosing regimen (fixed or flexible) (Khan et al., 2003)
• Depression scale used as primary outcome criterion (HAMD-17 or HAMD-21) (Bech et al., 1992; Williams, 2001)
• Sample size (ICH, 1998)
• Region (three categories, coded as single regions: EU, USA, or “others”; these were all other nations or geographical areas, respectively) (Chen et al., 2010; CHMP, 2010)
• Number of treatment arms (i.e., two-arm vs. three-or-more-arm trials) (Sinyor et al., 2010)
• Investigational product (Angst and Stabl, 1992; Angst et al., 1995; Cipriani et al., 2009)
• Study arm (verum vs. placebo)
• Setting (inpatients, outpatients, mixed)
• Patients' variables (age, baseline severity of the underlying disease, ethnicity, gender, height, weight) to avoid a potential confounding of the results
Some of these selected explanatory variables could not be included in the final statistical model. Due to a strong link between the individual investigational products and the depression scale used, depression scale (HAMD-17 vs. HAMD-21) could not be included in the statistical model. Furthermore, ethnicity, height and weight were excluded due to missing data for a substantial number of patients.
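As an illustration only, the response criterion from Section 2.2 (at least a 50% reduction in HAMD total score) and the explanatory variables listed above could be encoded per study arm roughly as follows. This is a minimal sketch: the field names, the `StudyArm` layout and the example scores are hypothetical and are not taken from the Swissmedic database.

```python
from dataclasses import dataclass
from typing import Sequence


def is_responder(baseline: float, endpoint: float) -> bool:
    """Response as defined in the text: >= 50% reduction from the baseline HAMD score."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return endpoint <= 0.5 * baseline


def responder_rate(baselines: Sequence[float], endpoints: Sequence[float]) -> float:
    """Percentage of patients in one arm who meet the response criterion."""
    if len(baselines) == 0 or len(baselines) != len(endpoints):
        raise ValueError("need two non-empty sequences of equal length")
    responders = sum(is_responder(b, e) for b, e in zip(baselines, endpoints))
    return 100.0 * responders / len(baselines)


@dataclass(frozen=True)
class StudyArm:
    """One row of a hypothetical arm-level analysis table (names are illustrative)."""
    trial_id: str              # grouping factor (one random intercept per trial)
    is_placebo: bool           # study arm: placebo vs. verum
    duration_weeks: int        # 6, 8 or 9
    flexible_dosing: bool      # dosing regimen: flexible vs. fixed
    n_arms: int                # number of treatment arms in the trial
    trial_sample_size: int
    mean_baseline_hamd: float
    responder_rate_pct: float  # primary outcome


# Invented scores for four patients: two of them reach a >= 50% reduction.
example_rate = responder_rate([24, 26, 22, 25], [10, 14, 11, 20])
example_arm = StudyArm("T01", False, 8, True, 3, 300, 24.3, example_rate)
```

Keeping `trial_id` on every arm preserves the hierarchical structure (arms nested within trials) that the statistical model relies on.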
2.5. Statistical analyses
Response rate was chosen as primary outcome mainly for clinical reasons (Cipriani et al., 2009). Furthermore, response rate was used as outcome criterion because it was provided in all assessment reports. Remission is more difficult to achieve in short-term trials and was therefore not considered. Due to the hierarchical structure of the data, with 35 different studies providing data on altogether 85 study arms, the linear mixed-effects model in the formulation described by Laird and Ware (1982) was the model of choice. While the different trials served as random intercepts in the models, the variables described above were chosen as explanatory fixed effects. No interactions between variables were considered, due to the generally small number of studies and the relatively high number of possible influencing factors. The final model was found by stepwise backward exclusion of fixed-effect variables from the starting model, based on likelihood ratio tests comparing the models. Scatter plots of standardized residuals were used for validation, as well as the results of a 10-fold cross validation, with the root-mean-square error (RMSE) serving as a summary of the variation of the prediction error. A result was considered significant if the p-value was less than 0.05. Due to the exploratory nature of the analysis, corrections for multiple testing were not applied.
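For orientation, the random-intercept model described above can be written out as follows. This is a sketch in generic notation, not the authors' own specification: the symbols and the normality assumptions are the standard ones for a Laird and Ware (1982) type model with a single random intercept per trial and are introduced here for illustration.

```latex
% y_{ij}: responder rate (%) in study arm j of trial i, i = 1, ..., 35
% x_{ij}: vector of fixed-effect design factors (study arm, sample size, duration, ...)
% b_i:    random intercept of trial i
\begin{align}
  y_{ij} &= \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + b_i + \varepsilon_{ij},
  & b_i &\sim \mathcal{N}(0, \sigma_b^{2}),
  & \varepsilon_{ij} &\sim \mathcal{N}(0, \sigma^{2}), \\
  \mathrm{RMSE} &= \sqrt{\frac{1}{n} \sum_{i,j} \bigl( y_{ij} - \hat{y}_{ij} \bigr)^{2}}
\end{align}
```

Here $\hat{y}_{ij}$ denotes the prediction for an arm obtained while that arm was held out during cross validation, and $n$ is the number of study arms.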
Statistical analyses were performed using the statistical software environment R 2.11.1 (R Development Core Team, 2011).
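The cross-validation step can be sketched as follows. This is not the authors' R code: it is a minimal Python illustration in which a toy predictor (the mean responder rate per arm type) stands in for the mixed-effects model, and the row layout and function names are hypothetical. The text does not state whether folds were formed at the level of arms or trials; the sketch splits at arm level as an assumption.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple[bool, float]  # (is_placebo, responder_rate_percent); hypothetical layout


def fit_group_means(train: Sequence[Row]) -> Callable[[bool], float]:
    """Toy stand-in for the regression model: mean responder rate per arm type."""
    overall = sum(y for _, y in train) / len(train)
    means: Dict[bool, float] = {}
    for flag in (True, False):
        ys = [y for is_placebo, y in train if is_placebo == flag]
        # Fall back to the overall mean if an arm type is absent from the training data.
        means[flag] = sum(ys) / len(ys) if ys else overall
    return lambda is_placebo: means[is_placebo]


def kfold_rmse(rows: Sequence[Row], k: int = 10, seed: int = 0) -> float:
    """Shuffle rows, split them into k folds, and pool squared held-out errors."""
    if not 2 <= k <= len(rows):
        raise ValueError("k must be between 2 and the number of rows")
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    squared_errors: List[float] = []
    for held_out in folds:
        held = set(held_out)
        train = [rows[i] for i in idx if i not in held]
        predict = fit_group_means(train)
        squared_errors.extend(
            (rows[i][1] - predict(rows[i][0])) ** 2 for i in held_out
        )
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Because the RMSE is expressed in the units of the outcome, a value computed this way reads as a typical prediction error in percentage points of responder rate.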
3. Results
The finally selected regression model included seven explanatory fixed factors, of which five were significant: the investigational product itself showed a significant effect (p = 0.029), as did the study arm (placebo compared to the active compound, p < 0.001), with lower response rates in placebo arms. Greater sample size (p = 0.002), shorter duration of treatment (p = 0.024) and more than one active treatment arm (p = 0.022) were also significantly associated with higher response rates. A higher HAMD score at baseline showed a tendency (p = 0.077) toward a positive association with higher response rates, and there was also a hint of a relation between high response rates and the possibility of dose adaptation (p = 0.076). No significant associations or trends were found for other fixed effects (i.e. gender, age, hospitalization status, or geographical region), and therefore these were excluded from the model. Table 3 lists estimators and p-values based on t-distributions for each explanatory variable of the mixed-effects regression analysis. Cross validation of the mixed-effects model resulted in an RMSE of 7.42, with residual analyses showing a satisfactory goodness-of-fit. For the interpretation, the predicted responder rates (fitted values) for different subgroups were plotted against the prediction error (standardized residuals) in Fig. 2. As there was no obvious association between predicted values and residuals, a successful modeling of the data was assumed. Fig. 2 demonstrates the range of predicted responder rates. We have chosen the number of treatment arms and the medication in each study arm as relevant examples for the results of the analysis. Verum arms clearly showed greater predicted response rates, and three-or-more-arm studies were also visibly superior to two-arm studies.

Table 3 p-Values for each variable included in the mixed-effects regression analysis.

| Variable | p-Value |
|---|---|
| Placebo vs. active compound | < 0.001 |
| Baseline disease severity | 0.077 |
| Sample size | 0.0022 |
| Dosing schedule (fixed vs. flexible) | 0.0757 |
| Duration of treatment | 0.0236 |
| Two or more active treatment arms | 0.0224 |
| Individual drug | 0.0294 |

[Figure 2: scatter plot of standardized residuals (y-axis, −3 to 3) against fitted values (x-axis, 20 to 60). Groups: Verum and 2 arms; Verum and >2 arms; Placebo and 2 arms; Placebo and >2 arms.]
Fig. 2. Predicted responder rates vs. standardized prediction (verum vs. placebo and two-arm vs. three-or-more-arm studies).

4. Discussion
In the following sections, the rationale for the methods of our analyses is provided and the potential meaning of the various factors interacting with efficacy results of antidepressant trials is discussed.
4.1. Selection of studies
Publication bias has been raised as a major concern in the ongoing discussion about the efficacy of antidepressants (Kirsch et al., 2008; Moller, 2008; Moreno et al., 2009; Turner et al., 2008). Unlike the situation for published trials, when submitting an investigational compound to regulatory authorities for approval, pharmaceutical sponsors are legally required to submit all results of all clinical trials conducted with the investigational compound. Therefore, selective reporting for favorable results should not have occurred in this report. Previous reports with the same objective, i.e. to detect variables significantly interacting with outcome, were limited to the use of public databases like Medline or the Cochrane Database to identify relevant RCTs (Rutherford et al., 2009). These authors have admitted “… publication bias may have affected which studies were included in these analyses…” (Rutherford et al., 2009). The ability to avoid this important methodological limitation is one of the key strengths of this report.
4.2. Statistical analyses in the present report
So far, only very few studies have systematically investigated study design features affecting outcome in antidepressant trials, and the few studies that have been published reported inconsistent results (Khan et al., 2003, 2007). Thus an exploratory approach was used in this analysis without any particular hypotheses. It was considered best to concentrate on a small number of variables and to focus on those study design factors that are currently under debate. Initially, a multiple regression model was considered for the analysis. However, such a model does not account for the nesting effects within studies. Therefore the data were analyzed using a linear mixed-effects regression model as described by Laird and Ware (1982). The final model exhibited a robust goodness-of-fit (please compare Figs. 2–4).
4.3. Factor “investigational product”
A significant association between the investigational product and outcome was observed. In contrast to the analyses by Turner et al. (2008), specific references to individual compounds are not given due to strict confidentiality agreements between Swissmedic and the pharmaceutical industry. Turner's group was able to obtain access to FDA's data as a result of the US Freedom of Information Act (FOIA), which allowed them to download several of the drug reviews from the FDA website. This means that these data, including the drug names, were in the public domain. In contrast, the analyses presented here are based on data extracted directly from our own clinical assessment reports at the Swiss regulatory agency, i.e. these data are not in the public domain. Interestingly, Turner (2010) elsewhere notes that the FDA's data on reboxetine have never been made public since this antidepressant was not approved by the FDA. Thus, regulatory bodies may have more data on drug efficacy and safety than are found in the public domain. Some measures have been taken so far to improve transparency, such as clinical trial registries. However, disclosure of protocols, raw data, results, etc. is still a matter of debate, and the development of public clinical trial registries is an extremely complex activity (Zarin et al., 2011). In general, the extent to which antidepressant agents may vary in terms of efficacy is still rather unclear, and there are only very few reports on the comparative efficacy among antidepressants (Cipriani et al., 2009). We used data from compounds that are on the market either in Europe, Switzerland and the United States, or only in Europe or the United States, respectively. Though regulatory agencies around the world in general receive more or less identical documentation, interpretation of the data as well as final approval decisions may vary.

4.4. Factor “geographical region”
The number of countries serving as trial sites outside the U.S. more than doubled in the last 10 years, whereas the number of studies conducted in Western Europe and the U.S. simultaneously declined (Glickman et al., 2009). Indeed, in an analysis of the globalization of biopharmaceutical clinical trials (Thiers et al., 2008), the highest growth rates were observed in countries outside of Western Europe and the U.S. The increasing globalization of clinical trials necessarily raises the question of whether trial results can be extrapolated between different settings. For our analyses, we investigated treatment-by-region interactions for the following three regions: United States of America (here abbreviated as U.S. or USA), Europe and other geographical regions (here called “Others”). No significant effect of geographical region was observed with the mixed-effects model. Therefore this variable was excluded from the model during the backward selection. Khin et al. (2011) also mention that the overall 8-week trial success rate was about the same in US and non-US trials, but that the numbers were too small to form a basis for comparison. The statistical model used included a random effect that adjusts for the specific center. These adjustments might explain the result here. Further research is needed to clarify the meaning of the variable.

4.5. Factor “sample size”
The link between sample size and demonstration of effects of a compound is very well known: if the sample is too small, the study may fail to demonstrate effects that are truly present in the population (false negative results), or obtain a false positive result by chance. At first glance, the observation of a significant influence of this variable in our analysis may be unsurprising (Table 3).
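The well-known link between sample size and the ability to demonstrate a true effect, as discussed for the factor “sample size”, can be made concrete with a standard normal-approximation formula for comparing two response rates. The sketch below is illustrative only: the function name, the default error rates and the example response rates are assumptions and are not taken from the trials analysed here.

```python
import math
from statistics import NormalDist


def n_per_arm(p_active: float, p_placebo: float,
              alpha: float = 0.05, power: float = 0.80) -> int:
    """Patients per arm for a two-sided comparison of two proportions
    (normal approximation, unpooled variance)."""
    if not (0 < p_active < 1 and 0 < p_placebo < 1):
        raise ValueError("response rates must lie strictly between 0 and 1")
    if p_active == p_placebo:
        raise ValueError("response rates must differ")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the two-sided test
    z_beta = z.inv_cdf(power)            # quantile matching the desired power
    variance = p_active * (1 - p_active) + p_placebo * (1 - p_placebo)
    effect = p_active - p_placebo
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical response rates of 50% (active) vs. 40% (placebo).
patients_needed = n_per_arm(0.50, 0.40)
```

Under these assumed rates the formula gives 385 patients per arm for 80% power, and halving the difference between the arms roughly quadruples the requirement, which is why small trials are prone to false negative results.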
4.6. Factor “placebo vs. verum”
Although the results reported here may indicate that antidepressants are effective, there are increasing questions about the efficacy of these compounds (Kirsch et al., 2008; Moller, 2008; Turner et al., 2008). At the same time, the individual compounds seem to differ in terms of efficacy (see Factor “investigational product”). The current controversy on the effectiveness of antidepressants may sometimes underestimate the heterogeneity of the disease. Potential subpopulations are often not very well defined in advance and are not fully investigated in most clinical development programs for antidepressants. However, it has to be kept in mind that any trial sample includes subgroups of patients that can differ in terms of outcome.
4.7. Factor “dosing schedule”
Our analysis included a few studies with multiple groups comparing the same product at different doses (i.e., dose-finding studies that were simultaneously pivotal studies). Although this could have led to an underestimation of the antidepressant effect, these studies were included since we focused on all pivotal studies declared as such by the sponsor at the time of application. In general, lack of knowledge about active doses and dose–response relationships can be a key issue in an antidepressant development program. Gram has described earlier that inadequate dosing and pharmacokinetic variability are confounding factors in assessment of antidepressant efficacy (Gram, 1990). This might be one reason why the dosing schedule may be a clinical trial design factor that potentially affects antidepressant trial outcome (Bollini et al., 1999; Khan et al., 2003, 2007; Schatzberg, 1991). Our results also suggest an effect of the dosing schedule on treatment outcome (Table 3).
4.8. Factor “study duration”
In our analyses, study duration accounted for variability. Unexpectedly, shorter duration was associated with higher response rates. This finding is difficult to explain, since Rutherford et al. described that “…The odds of treatment response were higher in 8- and 12-week duration trials compared to those lasting 6 weeks…” (Rutherford et al., 2009). In addition, it has to be mentioned that this analysis only considered short-term trials. Depression is a chronic disorder, and further analyses addressing long-term studies are of utmost importance.
4.9. Factor “number of active treatment arms”
Our results suggest that the number of treatment arms (two vs. three or more) has an influence on response in antidepressant trials. Specifically, an increase in the patients' chance of receiving active drug in a trial may have a positive effect on outcome. A similar effect was observed by Sinyor et al. (2010) in a pooled analysis using data from published trials and pharmaceutical company trial registries. Sinyor et al. reported that a lower chance of receiving placebo had a positive influence on response rates in both placebo and active treatment arms.

4.10. Factor “baseline severity of the underlying disease”
The ongoing debate about the overall efficacy of antidepressants has also raised questions about the role of baseline disease severity in response to both antidepressants and placebo. The results of this analysis suggest a possible interaction between a higher baseline severity of disease and positive outcome in antidepressant trials. Whether this is due to differential sensitivity of more severely ill patients to treatment (active drug and/or placebo), or only to regression to the mean, cannot be determined with the statistical model applied in these analyses. Further analyses should be designed to investigate this issue.

4.11. Possible implications for future research and clinical practice
At present, further analyses should be conducted to check the robustness of the evidence presented here. Furthermore, there may be a need for further research in the area of predictors of response/remission in subgroups of depressed patients, since this is a very heterogeneous group of patients. We are planning another analysis on more variables predicting drug versus placebo differences, including both the primary test drug and comparator agents. Furthermore, we plan to use the findings of the present analysis and those of other authors (Fournier et al., 2010; Freeman et al., 2010) to form a priori hypotheses.
Role of the funding source

No drug manufacturing company was involved in the study design, data collection, data analysis, data interpretation, writing of the report, or in the decision to submit the paper for publication.

Conflict of interest

Stephan Krähenbühl and Jürg Hüsler act as advisors for Swissmedic and receive consultancy fees from the agency. There are no further financial relationships to be disclosed. H.-J. Möller has received/is receiving research grants/support from, serves as a consultant or is on the advisory board for, or is a member of the speakers' bureau for AstraZeneca, Bristol-Myers Squibb, Eli Lilly, Eisai, GlaxoSmithKline, Janssen Cilag, Lundbeck, Merck, Novartis, Organon, Pfizer, Sanofi Aventis, Sepracor, Servier, and Wyeth. All other authors declare that they have no conflicts of interest.

Acknowledgments

The authors would like to thank Erick H. Turner, MD (Assistant Professor, Department of Psychiatry, Joint Assistant Professor, Department of Physiology & Pharmacology, Senior Scholar, Center for Ethics in Health Care, Oregon Health & Science University, Staff Psychiatrist, Portland VA Medical Center, all in Portland OR), for his valuable and very helpful comments during the final revision of the paper.
The authors thank Samuel Hurni, MD, for his support in updating the database in preparation for the additional sensitivity analyses; he conducted the update during his Swiss civilian service at Swissmedic. Amy Brinson carefully edited the final version of the manuscript, and we highly appreciate her valuable contribution.
References

Angst, J., Stabl, M., 1992. Efficacy of moclobemide in different patient groups: a meta-analysis of studies. Psychopharmacology 106 (Suppl.), S109–S113.
Angst, J., Amrein, R., Stabl, M., 1995. Moclobemide and tricyclic antidepressants in severe depression: meta-analysis and prospective studies. Journal of Clinical Psychopharmacology 15, 16S–23S.
Bech, P., Allerup, P., Maier, W., Albus, M., Lavori, P., Ayuso, J.L., 1992. The Hamilton scales and the Hopkins Symptom Checklist (SCL-90). A cross-national validity study in patients with panic disorders. The British Journal of Psychiatry 160, 206–211.
Bollini, P., Pampallona, S., Tibaldi, G., Kupelnick, B., Munizza, C., 1999. Effectiveness of antidepressants. Meta-analysis of dose–effect relationships in randomised clinical trials. The British Journal of Psychiatry 174, 297–303.
Chen, Y.W., Dilsaver, S.C., 1996. Lifetime rates of suicide attempts among subjects with bipolar and unipolar disorders relative to subjects with other Axis I disorders. Biological Psychiatry 39, 896–899.
Chen, Y.F., Wang, S.J., Khin, N.A., Hung, H.M., Laughren, T.P., 2010. Trial design issues and treatment effect modeling in multi-regional schizophrenia trials. Pharmaceutical Statistics 9, 217–229.
CHMP, 2010. Reflection Paper on the Extrapolation of Results from Clinical Studies Conducted Outside the EU to the EU-population. EMEA/CHMP/EWP/692702/2008. Committee for Medicinal Products for Human Use.
Cipriani, A., Furukawa, T.A., Salanti, G., Geddes, J.R., Higgins, J.P., Churchill, R., Watanabe, N., Nakagawa, A., Omori, I.M., McGuire, H., Tansella, M., Barbui, C., 2009. Comparative efficacy and acceptability of 12 new-generation antidepressants: a multiple-treatments meta-analysis. Lancet 373, 746–758.
CPMP, 1997. Committee for Proprietary Medicinal Products (CPMP): Note for Guidance on Clinical Investigation of Medicinal Products in the Treatment of Depression. CPMP/EWP/518/97 Rev.1.
Fournier, J.C., DeRubeis, R.J., Hollon, S.D., Dimidjian, S., Amsterdam, J.D., Shelton, R.C., Fawcett, J., 2010. Antidepressant drug effects and depression severity: a patient-level meta-analysis. JAMA: The Journal of the American Medical Association 303, 47–53.
Freeman, M.P., Mischoulon, D., Tedeschini, E., Goodness, T., Cohen, L.S., Fava, M., Papakostas, G.I., 2010. Complementary and alternative medicine for major depressive disorder: a meta-analysis of patient characteristics, placebo-response rates, and treatment outcomes relative to standard antidepressants. The Journal of Clinical Psychiatry 71, 682–688.
Glickman, S.W., McHutchison, J.G., Peterson, E.D., Cairns, C.B., Harrington, R.A., Califf, R.M., Schulman, K.A., 2009. Ethical and scientific implications of the globalization of clinical research. The New England Journal of Medicine 360, 816–823.
Gram, L.F., 1990. Inadequate dosing and pharmacokinetic variability as confounding factors in assessment of efficacy of antidepressants. Clinical Neuropharmacology 13 (Suppl. 1), S35–S44.
Hamilton, M., 1960. A rating scale for depression. Journal of Neurology, Neurosurgery, and Psychiatry 23, 56–62.
Hamilton, M., 1967. Development of a rating scale for primary depressive illness. The British Journal of Social and Clinical Psychology 6, 278–296.
ICH, 1998. Ethnic Factors in the Acceptability of Foreign Clinical Data E5(R1). http://www.ich.org/LOB/media/MEDIA481.pdf
Khan, A., Khan, S.R., Walens, G., Kolts, R., Giller, E.L., 2003. Frequency of positive studies among fixed and flexible dose antidepressant clinical trials: an analysis of the Food and Drug Administration summary basis of approval reports. Neuropsychopharmacology 28, 552–557.
Khan, A., Schwartz, K., Kolts, R.L., Ridgway, D., Lineberry, C., 2007. Relationship between depression severity entry criteria and antidepressant clinical trial outcomes. Biological Psychiatry 62, 65–71.
Khin, N.A., Chen, Y.F., Yang, Y., Yang, P., Laughren, T.P., 2011. Exploratory analyses of efficacy data from major depressive disorder trials submitted to the US Food and Drug Administration in support of new drug applications. The Journal of Clinical Psychiatry 72 (4), 464–472.
Kirsch, I., Moncrieff, J., 2007. Clinical trials and the response rate illusion. Contemporary Clinical Trials 28, 348–351.
Kirsch, I., Deacon, B.J., Huedo-Medina, T.B., Scoboria, A., Moore, T.J., Johnson, B.T., 2008. Initial severity and antidepressant benefits: a meta-analysis of data submitted to the Food and Drug Administration. PLoS Medicine 5, e45.
Laird, N., Ware, J., 1982. Random-effects models for longitudinal data. Biometrics 38, 963–974.
Moher, D., Liberati, A., Tetzlaff, J., Altman, D.G., 2009. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Medicine 6, e1000097.
Moller, H.J., 2008. Isn't the efficacy of antidepressants clinically relevant? A critical comment on the results of the metaanalysis by Kirsch et al. 2008. European Archives of Psychiatry and Clinical Neuroscience 258, 451–455.
Montgomery, S.A., Asberg, M., 1979. A new depression scale designed to be sensitive to change. The British Journal of Psychiatry 134, 382–389.
Moreno, S.G., Sutton, A.J., Turner, E.H., Abrams, K.R., Cooper, N.J., Palmer, T.M., Ades, A.E., 2009. Novel methods to deal with publication biases: secondary analysis of antidepressant trials in the FDA trial registry database and related journal publications. BMJ 339, b2981. doi:10.1136/bmj.b2981.
Murray, C.J., Lopez, A.D., 1997. Global mortality, disability, and the contribution of risk factors: Global Burden of Disease Study. Lancet 349, 1436–1442.
Nierenberg, A.A., 2010. The perfect storm: CNS drug development in trouble. CNS Spectrums 15, 282–283.
Parker, G., Anderson, I.M., Haddad, P., 2003. Clinical trials of antidepressant medications are producing meaningless results. The British Journal of Psychiatry 183, 102–104.
R Development Core Team, 2011. R: a language and environment for statistical computing. ISBN 3-900051-07-0. http://www.R-project.org
Rush, A.J., Gullion, C.M., Basco, M.R., Jarrett, R.B., Trivedi, M.H., 1996. The Inventory of Depressive Symptomatology (IDS): psychometric properties. Psychological Medicine 26, 477–486.
Rutherford, B.R., Sneed, J.R., Roose, S.P., 2009. Does study design influence outcome? The effects of placebo control and treatment duration in antidepressant trials. Psychotherapy and Psychosomatics 78, 172–181.
Schatzberg, A.F., 1991. Dosing strategies for antidepressant agents. The Journal of Clinical Psychiatry 52 (Suppl.), 14–20.
Sinyor, M., Levitt, A.J., Cheung, A.H., Schaffer, A., Kiss, A., Dowlati, Y., Lanctôt, K.L., 2010. Does inclusion of a placebo arm influence response to active antidepressant treatment in randomized controlled trials? Results from a pooled and meta-analysis. The Journal of Clinical Psychiatry 71, 270–279.
Thiers, F.A., Sinskey, A.J., Berndt, E.R., 2008. Trends in the globalization of clinical trials. Nature Reviews. Drug Discovery 7, 13–14.
Turner, E.H., 2010. Reboxetine in depression. All the relevant data? BMJ 341, c6487.
Turner, E.H., Matthews, A.M., Linardatos, E., Tell, R.A., Rosenthal, R., 2008. Selective publication of antidepressant trials and its influence on apparent efficacy. The New England Journal of Medicine 358, 252–260.
Williams, J.B., 2001. Standardizing the Hamilton Depression Rating Scale: past, present, and future. European Archives of Psychiatry and Clinical Neuroscience 251 (Suppl. 2), II6–II12.
Zarin, D.A., Tse, T., Williams, R.J., Califf, R.M., Ide, N.C., 2011. The ClinicalTrials.gov results database—update and key issues. The New England Journal of Medicine 364, 852–860.