Psychiatry Research 152 (2007) 223 – 231 www.elsevier.com/locate/psychres
Clinical prediction of antidepressant response in mood disorders: Linear multivariate vs. neural network models
Alessandro Serretti a,b,⁎, Paolo Olgiati b, Michael N. Liebman c, Hai Hu c, Yonghong Zhang c, Raffaella Zanardi a, Cristina Colombo a, Enrico Smeraldi a
a Psychiatry Department, San Raffaele Scientific Institute, Milan, Italy
b Institute of Psychiatry, University of Bologna, Bologna, Italy
c Windber Research Institute, Windber, PA, USA
Received 9 February 2006; received in revised form 20 May 2006; accepted 26 July 2006
Abstract
Predicting the outcome of antidepressant treatment by pre-treatment features would be of great usefulness for clinicians as up to 50% of major depressives may not have a satisfactory response in spite of adequate trials of antidepressant drugs. In the present article we compared a linear multivariate model of predictors with a few artificial neural network (ANN) models differing from one another by outcome definition and validation procedure. The sample consisted of a reanalysis of 116 inpatients with a major depressive episode included in a 6-week open-label trial with fluvoxamine. With the original outcome definition (responders/non-responders), ANN performed better than logistic regression (90% of correct classifications in the training sample vs. 77%). However only 62% of new patients were correctly predicted by ANN for their outcome class. Length of the index episode, psychotic features and suicidal behavior emerged as outcome predictors in both models, while demographic characteristics, personality disorders and concomitant somatic morbidity were pointed to only by ANN analysis. Increase of classes in the outcome field resulted in a more elevated error: 46.4% for three classes, 60.4% for four classes and 70.3% for five classes. Overall, our findings suggest that antidepressant outcome prediction based on clinical variables is poor. The ANN approach is as valid as traditional multivariate techniques for the analysis of psychopharmacology studies. The complex interactions modelled through ANN may eventually be applied at the clinical level for individualized therapy. However, the accuracy of prediction is still far from satisfactory from a clinical point of view.
© 2006 Elsevier Ireland Ltd. All rights reserved.
Keywords: Bipolar disorder; Major depressive disorder; Neural network; Outcome predictors
1. Introduction
Major depression is traditionally considered a treatable mental disorder. Nevertheless up to 50% of such
⁎ Corresponding author. Institute of Psychiatry, University of Bologna, Viale Carlo Pepoli 5, 40123 Bologna, Italy. Tel.: +39 051 6584233; fax: +39 051 521030. E-mail address:
[email protected] (A. Serretti).
patients may not have a satisfactory response in spite of adequate trials of antidepressant drugs (Fava, 2003). This emphasizes the need for pre-treatment outcome predictors to guide drug choice. Moderators of antidepressant treatment have typically fallen into three categories (Kraemer et al., 2002): (1) demographic characteristics (age; gender); (2) illness features (severity at baseline; age of onset; length of illness; length of current episode; number of episodes; longitudinal subtype; endogenicity; melancholia; family history of
0165-1781/$ - see front matter © 2006 Elsevier Ireland Ltd. All rights reserved. doi:10.1016/j.psychres.2006.07.009
A. Serretti et al. / Psychiatry Research 152 (2007) 223–231
mood disorders); and (3) social factors (living status; social support). Unfortunately, even if the largest body of evidence pertains to clinical features (e.g. number of previous episodes, length of illness, and symptom severity) (Nierenberg, 2003), none of the putative predictors could unequivocally be linked to antidepressant treatment outcome, and clinical practice relies mainly on a trial and error procedure (Akiskal, 1982; Fahndrich, 1987; Croughan et al., 1988; Joyce and Paykel, 1989; Brugha et al., 1990; Vallejo et al., 1991; Goodwin, 1993; Cohn et al., 1996; Goodnick, 1996; Spillmann et al., 1997; Esposito and Goodnick, 2003). Moreover clinicians encounter difficulties in translating results obtained from large randomized clinical trials to the treatment of individual patients, both because of the low representativeness of such samples and the lack of a comprehensive analysis of all the patients' features (Braunholtz et al., 2001; Seeman, 2001). A model of response prediction including a large number of factors would therefore be of great usefulness. Recently, genetic factors have been suggested as possible outcome predictors (Serretti and Artioli, 2004). The combination of clinical and genetic predictors could increase the likelihood of forecasting subjects' treatment outcome prior to treatment so that it could be of practical use for clinicians. Unfortunately, the genetics of antidepressant response is not simply Mendelian. As a psychiatric disorder, it is most probably characterized by a multifactorial genetic contribution with a number of susceptibility genes interacting with each other (Botstein and Risch, 2003; Merikangas and Risch, 2003). A feasible strategy would be to identify non-genetic predictors and then build a complex model including genetic and non-genetic factors. 
Traditional statistical techniques may not be appropriate for such tasks because they rely on the basic assumption of linear combinations only (Moore and Williams, 2002; Schaid et al., 2003). By contrast, artificial neural networks (ANN) avoid oversimplification by incorporating high-order interactions between predictive variables (Lucek et al., 1998). The superiority of the ANN approach over linear data mining methods is well established in a number of areas of medical research (Ottenbacher et al., 2001; Delen et al., 2005; Jaimes et al., 2005), though misuses are possible (Schwarzer et al., 2000). A large study on antidepressant response failed to identify a good prediction using ANN (Winterer et al., 1998). However, in that study, a limited number of clinical variables were considered, and this may explain the poor prediction. Our group employed ANNs in psychopharmacology to analyse response to moclobemide (Politi et al., 1999), sertraline (Franchini et al., 2001) and
fluvoxamine (Serretti and Smeraldi, 2004). In all these studies, response was defined as a binary or continuous variable, yet there is increasing evidence that patients may be subgrouped on the basis of their response pattern (Quitkin et al., 1984; Stewart et al., 1998). Moreover, only in the last study did we divide the sample into training, validation and test groups. Both the lack of any validation and the subdivision of the sample are open to criticism. The sample split, in fact, causes a loss of power, given that subjects in the validating part may have different patterns of association between genotypes and drug response. An alternative strategy is to train the ANN with the samples of one study and to test it with the samples of another study. However, another study with the same design and with the same measured variables is hard to perform. The use of a leave-one-out cross-validation (LOOC) strategy may overcome this bias. (See Supplementary Material in the Appendix for further analyses of this issue.) In order to reliably predict the outcome of fluvoxamine treatment, we compared a linear multivariate model with a few neural network models differing from one another by output definition and validation procedure.
2. Methods
2.1. Sample and data collection
The present sample is a part of a larger pool reported separately. More details on recruitment procedure are available elsewhere (Smeraldi et al., 1998; Serretti et al., 2001; Serretti and Smeraldi, 2004). Briefly, we analyzed mood disorder patients with a major depressive episode (age = 50.97 ± 13.4 years; female/male: 89/27; unipolar/bipolar: 88/28) who were included in a 6-week open-label trial with fluvoxamine. Age at illness onset was 37.46 ± 13.6 years. Length of the index episode before study entry was 24.6 ± 31 months. The baseline HAM-D score was 27.8 ± 5.3. Lithium treatment was maintained in nine subjects (7.7%).
Concomitant psychotropic drugs were not allowed except for flurazepam at bedtime (up to 45 mg) and lithium maintenance. The subjects were originally collected to investigate the effects of gene polymorphisms on antidepressant response; this is a complementary analysis of non-genetic outcome predictors. Given that subjects were collected in the context of different studies, a variable number of features are available for subjects. For this reason, we selected out of the global sample the subjects with the most extensive clinical assessment as detailed in the next paragraph. This subsample was homogeneous for recruitment methods and did not differ
from the global sample as to age, sex and diagnosis distribution. Depressive symptomatology was assessed at baseline and weekly thereafter until study end-point using the 21-item Hamilton Rating Scale for Depression (HAM-D-21) (Hamilton, 1967) administered by trained senior psychiatrists blind to genetic data and to treatment. Fluvoxamine was gradually increased to 300 mg daily from day 8; maximum doses were reached when not contra-indicated by medical status or marked side effects. Clinical information was collected through structured interviews – SCID-I/P (First et al., 1995) for psychiatric diagnosis and symptoms, SCID-II (First et al., 1990) for personality disorder assessment – and by reviewing medical records (Leckman et al., 1982) when present. A detailed retrospective life chart was completed for each patient using the methodology reported in the NIMH Life Chart Method (Roy-Byrne et al., 1985). The trial was conducted according to the Declaration of Helsinki (Hong Kong Amendment) and the European Guidelines for Good Clinical Practices, based on a
protocol that was submitted to and approved by our local ethical committee. Written informed consent was obtained from all probands who were free to withdraw from the study at any time for any reason, without an effect on their medical care.
2.2. Statistical analyses
An “intent-to-treat” analysis was carried out for all patients who had a baseline assessment and at least one post-baseline evaluation, with the last observation carried forward on the HAM-D. The following 15 variables were tested as outcome predictors: age, gender, education, polarity (UP/BP), personality disorders, age of onset, length of current episode, presence of delusions within the index episode, lifetime occurrence of delusions, number of episodes with psychotic manifestations, lifetime history of suicidal behavior, lithium maintenance, side effects, headache and comorbid somatic disorders.
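The last-observation-carried-forward rule used in the intent-to-treat analysis can be illustrated with a short Python sketch. It is not the original analysis code: the function names and the example HAM-D series are hypothetical, and missed visits are assumed to be coded as None.

```python
from typing import List, Optional


def locf(scores: List[Optional[float]]) -> List[Optional[float]]:
    """Fill each missing visit with the most recent observed score."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for value in scores:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def endpoint_score(scores: List[Optional[float]]) -> Optional[float]:
    """End-point score after LOCF, or None when the patient lacks a
    baseline or has no post-baseline assessment (not eligible)."""
    if not scores or scores[0] is None:
        return None
    if all(value is None for value in scores[1:]):
        return None
    return locf(scores)[-1]


# Hypothetical weekly HAM-D series: baseline plus weeks 1-6, dropout after week 2.
patient = [28, 24, 19, None, None, None, None]
print(endpoint_score(patient))  # 19
```

A patient who drops out after week 2 therefore enters the outcome classification with the week 2 score, which is one reason LOCF end-points can understate later improvement or worsening.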
Table 1
Univariate associations with outcome (responders/non-responders to treatment)

Variable                            Non-responders (N = 29)  Responders (N = 87)  Totals (N = 116)  χ2/t      P
Gender: female                      23 (25.8%)               66 (74.2%)           89                0.015 a   0.701
Gender: male                        6 (22.2%)                21 (77.8%)           27
Polarity: unipolar                  20 (22.7%)               68 (77.3%)           88                0.968 a   0.325
Polarity: bipolar                   9 (32.1%)                19 (67.9%)           28
Delusional features: no             20 (26.9%)               73 (73.1%)           93                3.050 a   0.080
Delusional features: yes            9 (31.6%)                14 (68.4%)           23
Personality disorders: no           24 (24.5%)               74 (75.5%)           98                0.087 a   0.767
Personality disorders: yes          5 (27.7%)                13 (72.3%)           18
Suicidal behavior: no               25 (23.8%)               80 (76.2%)           105               0.837 a   0.360
Suicidal behavior: yes              4 (36.3%)                7 (65.7%)            11
Side effects: no                    25 (23.6%)               81 (76.4%)           106               1.313 a   0.252
Side effects: yes                   4 (40.0%)                6 (60.0%)            10
Current medical condition: no       20 (21.9%)               71 (78.1%)           91                2.056 a   0.151
Current medical condition: yes      9 (36.0%)                16 (64.0%)           25
Lithium maintenance: no             26 (24.3%)               81 (75.7%)           107               0.371 a   0.548
Lithium maintenance: yes            3 (33.3%)                6 (66.7%)            9
Age                                 55.9 ± 12.1              49.3 ± 13.5          116               2.354 b   0.020
Education years                     9.27 ± 4.09              9.99 ± 3.94          116               0.835 b   0.405
Age of onset                        40.8 ± 14.3              36.3 ± 13.3          116               1.555 b   0.122
Length of current episode (days)    36.8 ± 48.8              20.5 ± 21.1          116               2.505 b   0.014
Baseline HAM-D score                29.8 ± 5.09              21.2 ± 5.28          116               7.663 c   0.001

a df = 1.
b df = 114.
c not included in the analysis.
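As an illustration of how a χ2 entry for a dichotomous variable in Table 1 can be computed, the following Python sketch evaluates the Pearson statistic for a 2 × 2 table using the gender counts (23 and 66 for females, 6 and 21 for males). It is only a sketch under stated assumptions: the function name is ours, no continuity correction is applied, and the options used by the original statistical package are not reported, so the result need not coincide with the tabulated value.

```python
def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (df = 1, no continuity correction) for the table
    [[a, b],
     [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Rows: female, male; columns: non-responders, responders (counts from Table 1).
chi2_gender = pearson_chi2_2x2(23, 66, 6, 21)
print(round(chi2_gender, 3))
```

The shortcut formula N(ad − bc)² / (row and column totals multiplied together) is algebraically the same as summing (observed − expected)² / expected over the four cells.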
The aim of the study was to compare three models of outcome prediction. Such models differed from one another with respect to type of analysis (linear or non-linear), outcome definition and validation. The first model was linear multivariate. A multiple logistic regression analysis was performed to identify demographic and clinical predictors of antidepressant treatment outcome. This was expressed as a dichotomous variable, and patients were classified as either responders or non-responders. Response was defined as follows: end-point HAM-D score < 8 with no delusion. The other models were built using artificial neural networks (ANN). In the present article, we used the multilayer perceptron (MLP) architecture (Rumelhart and McClelland, 1986) according to our previous analysis demonstrating its superiority versus other networks (Serretti and Smeraldi, 2004). Back propagation (10,000 epochs) (Bishop, 1995) was then used to minimize the prediction error made by the network. A first ANN model was produced by randomly splitting the sample into two equal-sized training and test groups (split-sample validation: SSV). The number of neurons was automatically defined, and the number of layers was set to three. Variable inclusion was left to the program in both logistic regression and ANN. The performance of the output classifier (responder/non-responder) was evaluated by estimating the area under the Receiver Operating Characteristic (ROC) curve (Zweigh and Campbell, 1993) based on the threshold at the output node. Confidence thresholds were automatically determined. With the aim of further evaluating the prediction, we also used an independent method. The analysis was repeated using leave-one-out cross-validation (LOOC) (Bishop, 1995). One of the 116 records was taken out and
Fig. 1. Diagram of the best fitting network (MLP, dichotomous output and split sample validation). The network has 1 hidden layer with 7 nodes and 15 inputs. Each variable influences each of the 7 nodes that in turn contribute to a cumulative score predicting the final outcome. Specific weights of connections are not reported.
the rest were used to build the ANN model. The resulting model was then used to make a prediction for the record set aside. This was done for each of the 116 cases. Treatment outcome was alternatively regarded as a two-group classification (responder/non-responder) as in prior models or as a variable with multiple classes defined by the HAM-D profile. For a further analysis based on the different response patterns, a K-means clustering was performed on HAM-D score series (from baseline to week 6) to give three, four and five clusters, each corresponding to a class of the output variable. Different response time courses were therefore obtained, and patients were subgrouped depending on their response pattern. This is an improvement over the dichotomous responder/non-responder grouping, but its validity has not been fully established. As part of evaluation and comparison, the five-cluster model was also analyzed using the leave-many-out cross-validation (LMOC) approach (Bishop, 1995). Input data were partitioned into four equal-sized subsets. One subset was set aside for validation and the other three were combined to build an ANN model to predict the class identity. This process was repeated for each of the four subsets. Further details are reported in the Supplementary Material in the Appendix. All analyses were performed with commercial software: STATISTICA Kernel release 5.5 for multiple logistic regression analysis, STATISTICA ANNs release 4.0 F for SSV–ANN, SPSS Clementine 8.5 for LOOC and NeuralWorks Predict™ from NeuralWare for LMOC.
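The two cross-validation schemes described above differ only in how many records are held out at a time. The Python sketch below generates the index partitions for both: with k equal to the number of records it yields leave-one-out splits, and with k = 4 it yields the four-subset leave-many-out scheme. The function name, the shuffling and the seed are our assumptions; the commercial packages used in the study may partition the data differently.

```python
import random
from typing import List, Tuple


def k_fold_splits(n: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs covering n records.

    Every record appears in exactly one test fold; k == n gives
    leave-one-out, smaller k gives leave-many-out.
    """
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


loo = k_fold_splits(116, 116)  # leave-one-out over 116 patients
lmo = k_fold_splits(116, 4)    # four equal-sized subsets
print(len(loo), len(loo[0][0]), len(loo[0][1]))  # 116 115 1
print([len(test) for _, test in lmo])            # [29, 29, 29, 29]
```

In either scheme a model is fitted on each training part and scored on the matching held-out part, and the error rate is the share of held-out records that were misclassified across all folds.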
3. Results
3.1. Classification of patients according to antidepressant response
In the study sample, 29 patients out of 116 (25%) were classified as non-responders and 87 (75%) as responders. Non-responders were older (P = 0.020), with a more severe depression (P = 0.001), a longer index episode (P = 0.014) and (marginally) more delusional features (P = 0.08) (see Table 1). By using logistic regression, a correct outcome prediction was possible for 89 subjects (77%): 80 responders (92%) and 9 non-responders (31%). The overall model fit was: χ2 = 26.54, df = 15, P = 0.032. Then we performed the ANN analysis. First, we used half of the sample to train the ANN model and the rest for validation. Three hundred and fifty networks were trained and five retained. They had good performance (percent of correctly classified cases, >80%) and a relatively simple architecture with one or two layers and up to eight nodes in each layer. The best network had one layer with seven nodes (Fig. 1). To improve the model, we trained this network with the back-propagation algorithm: after 10,000 epochs, no further improvement was observed; therefore we stopped training and retained the network. Error rate (percentage of patients with incorrect classification) was 10.4% in the training sample and 38% in the test group. This means that the model was able to predict response to treatment in 62%
Table 2
Importance of demographic and clinical factors in logistic regression and ANN analysis

                                    Logistic regression        SSV–ANN              LOOC–ANN
Factor                              P       OR     95% CI      Rank      Error      Rank       Error
Age a                               0.20    0.96   0.91–1.02   6         0.23       6          0.50
Age of onset                        0.26    0.97   0.93–1.02   11        0.19       8          0.46
Delusion at present a               0.45    2.60   0.21–31.8   8         0.21       2          0.58
Number of delusional episodes       0.94    0.96   0.36–2.59   13        0.13       15         0.09
Lifetime delusion a, b              0.039   0.16   0.03–0.96   3         0.38       13         0.25
Education a                         0.94    1.00   0.88–1.15   4         0.30       5          0.50
Gender                              0.65    0.75   0.21–2.68   9         0.20       7          0.47
Headache                            0.54    0.61   0.13–2.95   10        0.20       13         0.25
Length of the index episode a, b    0.003   0.97   0.96–0.99   1         0.42       3          0.54
Lithium maintenance                 0.84    1.25   0.13–11.8   15        0.02       11         0.33
Current medical condition a         0.84    0.87   0.23–3.28   2         0.41       9          0.41
Personality disorder a              0.55    0.67   0.17–2.55   5         0.27       4          0.52
Polarity (UP/BP)                    0.34    0.54   0.15–1.95   7         0.22       10         0.39
Side effects b                      0.027   0.15   0.03–0.81   14        0.09       12         0.27
Suicidal behavior a, b              0.042   0.19   0.04–0.93   12        0.15       1          0.59

The table compares the relative importance (rank) of demographic and clinical predictors of antidepressant response as revealed by logistic regression analysis and by the two best ANN models with split sample validation (SSV) and leave-one-out cross-validation (LOOC).
a Factor ranking among the first six in ANN analysis.
b Factor significantly (P < 0.05) associated with outcome in logistic regression analysis.
of a new patient group used for validation, a value close to the 77% obtained with logistic regression but on an independent sample. By considering the selected classification threshold of the output neuron, the area under the ROC curve was 0.769. To increase model accuracy we tested four different definitions of outcome and the LOOC approach. Cluster analysis yielded the following four groups: responders within 2 weeks, responders, and improvers in the first 2 weeks and then non-responders, improvers in the first 2 weeks and then partial responders (HAM-D approximately between 8 and 18), non-responders. When the original definition (responders/non-responders) was used, LOOC gave a 38.8% error rate. Increases of classes in the outcome field were associated with elevated error as follows: 46.5% for three classes, 60.3% for four classes and 73.3% for five classes. The LMOC approach using NeuralWare with five-class output gave a similar error rate, ∼72%.
3.2. Identification of outcome predictors
We compared outcome predictors identified by logistic regression analysis and by the two best ANN models with SSV and LOOC (see Table 2). Length of the index episode was significantly associated with antidepressant response in the logistic regression model (P = 0.0034). Furthermore it was the most important predictor in the SSV–ANN model and the third most important predictor in the LOOC–ANN model. Lifetime delusion (P = 0.039) and suicidal behavior (P = 0.042) emerged as significant predictors in logistic regression and one ANN model. Level of education, personality disorders and age were pointed to by ANN models while the role of side effects was exclusively supported by logistic regression (P = 0.027). Five out of the first six factors – length of the index episode, delusional features (delusion at present or lifetime delusion), personality disorders, level of education and age – were the same in both ANN models.
4. Discussion
The present article addressed a question that is of great importance both for researchers and clinicians: is it possible to forecast the cases who will respond to an antidepressant drug by examining demographic features and pre-treatment clinical profile? From a statistical point of view, the problem has almost exclusively been dealt with in linear multivariate analyses. However, such methods may not effectively approach this problem, which is probably non-linear. Following this
premise, we performed a comparative analysis of the same data set using multilogistic regression and artificial neural networks. We reasoned that multilogistic regression is equivalent to a single layer ANN (Bishop, 1995); therefore, ANN with more than one hidden node would be a generalization of multilogistic regression. As hypothesized, ANN performed better than logistic regression: the latter got more than twice as many wrong classifications as ANN (23% vs. 10%). Some factors emerged as outcome predictors with both approaches: length of the index episode (a longer episode may suggest a chronic course or partial remission with previous treatments; however, this information was not available for all patients), psychotic features and suicidal behavior. Other possible predictors were pointed to only by ANN analysis. Some of them (level of education, personality disorders and age) are reported in several studies (Esposito and Goodnick, 2003; Nierenberg, 2003) while others (e.g. somatic comorbidity) have seldom been investigated. This is in line with our previous studies (Politi et al., 1999; Franchini et al., 2001), which involved independent outpatient samples. Another problem with linear multivariate methods is that they only act retrospectively, showing a statistically significant association between some factors and a past event in a given sample, which does not guarantee that the same association is present in other samples. Indeed, a variety of studies on clinical prediction of antidepressant response have yielded inconsistent results (Esposito and Goodnick, 2003; Nierenberg, 2003). This makes generalization a critical issue. The basic characteristic of neural networks is precisely that, once they have learnt to model the function that relates input variables to a target output, they can be used to make predictions where the output is not known (Haykin, 1994).
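The relation between logistic regression and the multilayer perceptron invoked above can be made concrete with a minimal Python sketch of the forward pass. It assumes sigmoid units throughout and uses random, untrained weights purely as placeholders; the 15-input, 7-hidden-node layout follows Fig. 1, while the function names and the activation choice are our assumptions and not details reported for the original software.

```python
import math
import random
from typing import List


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_unit(x: List[float], weights: List[float], bias: float) -> float:
    """Logistic regression: sigmoid of a linear combination of the inputs."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def mlp_forward(x, hidden_weights, hidden_biases, out_weights, out_bias) -> float:
    """One hidden layer: every hidden node sees all inputs, and the output
    node is itself a logistic unit applied to the hidden activations."""
    hidden = [logistic_unit(x, w, b) for w, b in zip(hidden_weights, hidden_biases)]
    return logistic_unit(hidden, out_weights, out_bias)


rng = random.Random(0)
n_inputs, n_hidden = 15, 7  # layout of the network in Fig. 1

# Placeholder (untrained) parameters and one hypothetical scaled patient record.
x = [rng.uniform(-1, 1) for _ in range(n_inputs)]
hidden_weights = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)]
hidden_biases = [rng.uniform(-1, 1) for _ in range(n_hidden)]
out_weights = [rng.uniform(-1, 1) for _ in range(n_hidden)]
out_bias = rng.uniform(-1, 1)

p_response = mlp_forward(x, hidden_weights, hidden_biases, out_weights, out_bias)
print(0.0 < p_response < 1.0)  # True
```

Removing the hidden layer leaves a single logistic unit acting on the 15 inputs, which is the logistic regression model; the hidden nodes are what allow interactions between predictors to be represented.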
We performed ANN on an independent sample: with the best network, 62% of new cases were correctly predicted for their responsiveness to antidepressant treatment. This rate is approximately the same as reported in a previous study that compared fluvoxamine treatment predictors with either ANN or discriminant analysis in almost 20,000 depressed patients (Winterer et al., 1998). Given that the a priori chance of correct classification is 50% in a two-class problem, the percentage we obtained is not very encouraging. However, not all putative outcome predictors were considered in this study. Baseline HAM-D score, which reflects pre-treatment depression severity, could not be analyzed as it had been used to define response. Family history of psychiatric disorders, number of episodes (depressive/manic) and social support were also excluded because this information was not available for all
patients. Adding such factors would most likely increase accuracy in tested models. Furthermore, prediction might be improved by combining clinical variables with genes. Indeed the role of serotonin transporter promoter region (HTTLPR), tryptophan hydroxylase (TPH) and G protein Beta 3 subunit polymorphisms in modulating response to selective serotonin reuptake inhibitors has been documented by a number of studies by our group (Smeraldi et al., 1998; Zanardi et al., 2000; Serretti et al., 2001) and others (see our recent review, Serretti et al., 2005). In this perspective, dividing the sample into training and test groups would not be appropriate as genetic composition might vary in each group. Accordingly, we tested an alternative procedure based on iterative comparison of each patient with the rest of the sample. Such a method gave the same overall classification results as split sample validation with the strong advantage of analyzing the whole (unbroken) sample. However, the ranking of variables was somewhat different between the two methods, underscoring the instability of such systems in small samples. Finally, we examined outcome definition. Most studies distinguished between responders and remitters using specific criteria for clinical response (e.g. >50% decrease in HAM-D score or HAM-D-17 < 7) (Smeraldi et al., 1998; Fava, 2003). However, recent literature is focusing on residual symptomatology, associated with a higher risk of relapse (Fava et al., 2002; Tranter et al., 2002). Moreover, the analysis of response patterns during antidepressant treatment has made it possible to separate a delayed, persistent response, due to a true drug effect, from an early, nonpersistent improvement, probably attributable to placebo effect (Quitkin et al., 1984; Stewart et al., 1998). This would lead to identifying at least three classes of response: true responders, false nonpersistent responders and non-responders.
We performed a cluster analysis on HAM-D weekly scores and no resulting cluster showed a placebo-response pattern (data not shown). In addition, we found that increasing the number of classes over the traditional responder/non-responder dichotomy considerably decreased the accuracy of outcome prediction. Thus, the usefulness of multiple classes of response is not supported by the present data. Although our results suggest that the observed clinical variables are sufficient to predict antidepressant response in almost two thirds of an independent sample, we cannot guarantee that our sample is fully representative of all depressed subjects. Firstly, the small sample size, particularly in the case of splitting, may offer unstable results. Secondly, our center is a tertiary care
setting and, on average, subjects are affected by more severe forms of mood disorders. Females were overrepresented in our sample, but this factor did not influence the observed analyses. Also, our sample included bipolar patients, who are usually not included in antidepressant studies; this factor did not influence results either. Further, patients with drug abuse or dependence were not included; even if this allows us to consider a more homogeneous group of patients, they may represent only a part of subjects in general settings (Solomon et al., 2000). In conclusion, it has been argued that the complex interactions modelled through ANN may eventually be applied at the clinical level for individualized therapy (Pickar and Rubinow, 2001; Serretti and Smeraldi, 2004). We demonstrated that the non-linear ANN approach is more efficient in forecasting antidepressant treatment outcome than linear multilogistic regression. However, for a satisfactory clinical prediction, the classification accuracy should be considerably higher than the one we obtained. This highlights the need for further development of the method before application in clinical practice. Substantial improvements should most likely occur by combining clinical and genetic features.
Appendix A. Supplementary data
Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.psychres.2006.07.009.
References
Akiskal, H.S., 1982. Factors associated with incomplete recovery in primary depressive illness. Journal of Clinical Psychiatry 43, 266–271.
Bishop, C., 1995. Neural Networks for Pattern Recognition. University Press, Oxford.
Botstein, D., Risch, N., 2003. Discovering genotypes underlying human phenotypes: past successes for Mendelian disease, future approaches for complex disease. Nature Genetics 33, 228–237.
Braunholtz, D.A., Edwards, S.J., Lilford, R.J., 2001. Are randomized clinical trials good for us (in the short term)? Evidence for a “trial effect”. Journal of Clinical Epidemiology 54, 217–224.
Brugha, T.S., Bebbington, P.E., MacCarthy, B., Sturt, E., Wykes, T., Potter, J., 1990. Gender, social support and recovery from depressive disorders: a prospective clinical study. Psychological Medicine 20, 147–156.
Cohn, C.K., Robinson, D.S., Roberts, D.L., Schwiderski, U.E., O'Brien, K., Ieni, J.R., 1996. Responders to antidepressant drug treatment: a study comparing nefazodone, imipramine, and placebo in patients with major depression. Journal of Clinical Psychiatry 57, 15–18.
Croughan, J.L., Secunda, S.K., Katz, M.M., Robins, E., Mendels, J., Swann, A., Harris-Larkin, B., 1988. Sociodemographic and prior clinical course characteristics associated with treatment response in depressed patients. Journal of Psychiatric Research 22, 227–237.
Delen, D., Walker, G., Kadam, A., 2005. Predicting breast cancer survivability: a comparison of three data mining methods. Artificial Intelligence in Medicine 34, 113–127. Esposito, K., Goodnick, P., 2003. Predictors of response in depression. Psychiatric Clinics of North America 26, 353–365. Fahndrich, E., 1987. Biological predictors of success of antidepressant drug therapy. Psychiatric Developments 5, 151–171. Fava, M., 2003. Diagnosis and definition of treatment-resistant depression. Biological Psychiatry 53, 649–659. Fava, G.A., Fabbri, S., Sonino, N., 2002. Residual symptoms in depression: an emerging therapeutic target. Progress in NeuroPsychopharmacology and Biological Psychiatry 26, 1019–1027. First, M.B., Spitzer, R.L., Gibbon, M., Williams, B.W., Benjamin, L., 1990. Structured Clinical Interview for DSM-IV Axis II Personality Disorders (SCID-II). Biometrics Research Department, New York State Psychiatric Institute, New York. First, M.B., Spitzer, R.L., Gibbon, M., Williams, J.B., 1995. Structured Clinical Interview for DSM-IV Axis I Disorders – Patient Edition (SCID-I/P, Version 2.0). Biometrics Research Department, New York State Psychiatric Institute, New York. Franchini, L., Spagnolo, C., Rossini, D., Smeraldi, E., Bellodi, L., Politi, E., 2001. A neural network approach to the outcome definition on first treatment with sertraline in a psychiatric population. Artificial Intelligence in Medicine 23, 239–248. Goodnick, P.J., 1996. Predictors of treatment response in mood disorders. Clinical Practice. American Psychiatric Press, Washington, DC. Goodwin, F.K., 1993. Predictors of antidepressant response. Bulletin of the Menninger Clinic 57, 146–160. Hamilton, M., 1967. Development of a rating scale for primary depressive illness. British Journal of Social and Clinical Psychology 6, 278–296. Haykin, S., 1994. Neural Networks: A Comprehensive Foundation. Macmillan Publishing, New York. Jaimes, F., Farbiarz, J., Alvarez, D., Martinez, C., 2005. 
Comparison between logistic regression and neural networks to predict death in patients with suspected sepsis in the emergency room. Critical Care 9, 150–156. Joyce, P.R., Paykel, E.S., 1989. Predictors of drug response in depression. Archives of General Psychiatry 46, 89–99. Kraemer, H.C., Wilson, G.T., Fairburn, C.G., Agras, W.S., 2002. Mediators and moderators of treatment effects in randomized clinical trials. Archives of General Psychiatry 59, 877–883. Leckman, J.F., Sholomskas, D., Thompson, W.D., Belanger, A., Weissman, M.M., 1982. Best estimate of lifetime psychiatric diagnosis: a methodological study. Archives of General Psychiatry 39, 879–883. Lucek, P., Hanke, J., Reich, J., Solla, S.A., Ott, J., 1998. Multi-locus nonparametric linkage analysis of complex trait loci with neural networks. Human Heredity 48, 275–284. Merikangas, K.R., Risch, N., 2003. Will the genomics revolution revolutionize psychiatry? American Journal of Psychiatry 160, 625–635. Moore, J.H., Williams, S.M., 2002. New strategies for identifying gene–gene interactions in hypertension. Annals of Medicine 34, 88–95. Nierenberg, A.A., 2003. Predictors of response to antidepressants general principles and clinical implications. Psychiatric Clinics of North America 26, 345–352. Ottenbacher, K.J., Smith, P.M., Illig, S.B., Linn, R.T., Fiedler, R.C., Granger, C.V., 2001. Comparison of logistic regression and neural networks to predict rehospitalization in patients with stroke. Journal of Clinical Epidemiology 54, 1159–1165.
Pickar, D., Rubinow, K., 2001. Pharmacogenomics of psychiatric disorders. Trends in Pharmacological Sciences 22, 75–83. Politi, E., Balduzzi, C., Bussi, R., Bellodi, L., 1999. Artificial neural networks: a study in clinical psychopharmacology. Psychiatry Research 87, 203–215. Quitkin, F.M., Rabkin, J.G., Ross, D., Stewart, J.W., 1984. Identification of true drug response to antidepressants. Use of pattern analysis. Archives of General Psychiatry 41, 782–786. Roy-Byrne, P., Post, R.M., Uhde, T.W., Porcu, T., Davis, D., 1985. The longitudinal course of recurrent affective illness: life chart data from research patients at the NIMH. Acta Psychiatrica Scandinavica, Supplementum 317, 1–34. Rumelhart, D.E., McClelland, J.L., 1986. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1 and 2. The MIT Press, Cambridge, MA. Schaid, D.J., Olson, J.M., Gauderman, W.J., Elston, R.C., 2003. Regression models for linkage: issues of traits, covariates, heterogeneity, and interaction. Human Heredity 55, 86–96. Schwarzer, G., Vach, W., Schumacher, M., 2000. On the misuses of artificial neural networks for prognostic and diagnostic classification in oncology. Statistics in Medicine 19, 541–561. Seeman, M.V., 2001. Clinical trials in psychiatry: do results apply to practice? Canadian Journal of Psychiatry 46, 352–355. Serretti, A., Artioli, P., 2004. From molecular biology to pharmacogenetics: a review of the literature on antidepressant treatment and suggestions of possible candidate genes. Psychopharmacology 174, 490–503. Serretti, A., Smeraldi, E., 2004. Neural network analysis in pharmacogenetics of mood disorders. BMC Medical Genetics 5, 27. Serretti, A., Zanardi, R., Rossini, D., Cusin, C., Lilli, R., Smeraldi, E., 2001. Influence of tryptophan hydroxylase and serotonin transporter genes on fluvoxamine antidepressant activity. Molecular Psychiatry 6, 586–592. Serretti, A., Artioli, P., Quartesan, R., 2005. 
Pharmacogenetics in the treatment of depression: pharmacodynamic studies. Pharmacogenetics and Genomics 15, 61–67. Smeraldi, E., Zanardi, R., Benedetti, F., Dibella, D., Perez, J., Catalano, M., 1998. Polymorphism within the promoter of the serotonin transporter gene and antidepressant efficacy of fluvoxamine. Molecular Psychiatry 3, 508–511. Solomon, D.A., Keller, M.B., Leon, A.C., Mueller, T.I., Lavori, P.W., Shea, T., Coryell, W., Warshaw, M., Turvey, C., Maser, J.D., Endicott, J., 2000. Multiple recurrences of major depressive disorder. American Journal of Psychiatry 157, 229–233. Spillmann, M., Borus, J.S., Davidson, K.G., Worthington 3rd, J.J., Tedlow, J.R., Fava, M., 1997. Sociodemographic predictors of response to antidepressant treatment. International Journal of Psychiatry in Medicine 27, 129–136. Stewart, J.W., Quitkin, F.M., McGrath, P.J., Amsterdam, J., Fava, M., Fawcett, J., Reimherr, F., Rosenbaum, J., Beasley, C., Roback, P., 1998. Use of pattern analysis to predict differential relapse of remitted patients with major depression during 1 year of treatment with fluoxetine or placebo. Archives of General Psychiatry 55, 334–343. Tranter, R., O'Donovan, C., Chandarana, P., Kennedy, S., 2002. Prevalence and outcome of partial remission in depression. Journal of Psychiatry and Neuroscience 27, 241–247. Vallejo, J., Gasto, C., Catalan, R., Bulbena, A., Menchon, J.M., 1991. Predictors of antidepressant treatment outcome in melancholia: psychosocial, clinical and biological indicators. Journal of Affective Disorders 21, 151–162. Winterer, G., Ziller, M., Linden, M., 1998. Classification of observational data with artificial neural networks versus discriminant analysis in
pharmacoepidemiological studies – can outcome of fluoxetine treatment be predicted? Pharmacopsychiatry 31, 225–231. Zanardi, R., Benedetti, F., DiBella, D., Catalano, M., Smeraldi, E., 2000. Efficacy of paroxetine in depression is influenced by a functional polymorphism within the promoter of serotonin
transporter gene. Journal of Clinical Psychopharmacology 20, 105–107. Zweig, M.H., Campbell, G., 1993. Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine. Clinical Chemistry 39, 561–577.