Journal of Affective Disorders 268 (2020) 118–126
Research paper
Personalized prediction of depression in patients with newly diagnosed Parkinson's disease: A prospective cohort study
Si-Chun Gu a,1, Jie Zhou a,1, Can-Xing Yuan a,⁎, Qing Ye a,b,⁎
a Department of Neurology, Longhua Hospital, Shanghai University of Traditional Chinese Medicine, 725 South Wanping Road, Shanghai, 200032, China
b MassGeneral Institute for Neurodegenerative Disease, Department of Neurology, Massachusetts General Hospital, Harvard Medical School, 114 16th Street, Charlestown, MA, 02129, United States
ARTICLE INFO
Keywords: Depression; Machine learning; Parkinson's disease; Prediction model

ABSTRACT
Background: Depressive disturbances in Parkinson's disease (dPD) have been identified as the most important determinant of quality of life in patients with Parkinson's disease (PD). Prediction models to triage patients at risk of depression early in the disease course are needed for prognosis and for stratification of participants in clinical trials.
Methods: A machine learning algorithm, extreme gradient boosting (XGBoost), and logistic regression were applied to predict clinically significant depression (defined as a 15-item Geriatric Depression Scale [GDS-15] score ≥ 5) in a prospective cohort of 312 drug-naïve patients with newly diagnosed PD followed for 2 years in the Parkinson's Progression Markers Initiative (PPMI) database. The models were assessed with out-of-sample validation; the whole sample was divided into training and testing samples at a ratio of 7:3.
Results: Both the XGBoost model and the logistic regression model achieved good discrimination and calibration. Two PD-specific factors (age at onset, duration) and four nonspecific factors (baseline GDS-15 score, State Trait Anxiety Inventory [STAI] score, Rapid Eye Movement Sleep Behavior Disorder Screening Questionnaire [RBDSQ] score, and history of depression) were identified as important predictors by both models.
Limitations: Access to several variables was limited by the database.
Conclusions: In this longitudinal study, we developed promising tools to provide personalized estimates of depression in early PD and studied the relative contribution of PD-specific and nonspecific predictors, constituting a substantial addition to the current understanding of dPD.
1. Introduction

Depressive disturbances in Parkinson's disease (dPD) are associated with reduced quality of life and functional impairment in patients with Parkinson's disease (PD), existing across a broad spectrum of severity with a variety of combinations of major and non-major depressive disturbances as important clinical entities (Marsh et al., 2006). These symptoms are not usually chronic, but tend to occur at two temporal points in PD, early in the disease near the time of diagnosis and late in the disease as disability and impairment increase (Goetz, 2010). Thus, the ability to identify individuals at risk of developing depression at the earliest stage of PD is vital, particularly for recruitment into and stratification of clinical trials designed to prevent and treat dPD. Current studies have reported several PD-specific markers for dPD including right hemibody onset (Leentjens et al., 2002), more severe
motor symptoms (Ou et al., 2018), longer disease duration (Sagna et al., 2014), higher levodopa equivalent doses (Ou et al., 2018), age at onset (Leentjens et al., 2002; Sagna et al., 2014; Dissanayaka et al., 2011a), and the presence of nonmotor symptoms such as anxiety (Dissanayaka et al., 2011a; Leentjens et al., 2013), autonomic impairments (Matsubara et al., 2018), limitations in disease-related activities of daily living (Leentjens et al., 2013), cognitive decline (Ou et al., 2018), and rapid eye movement (REM) sleep behavior disorder (RBD) (Neikrug et al., 2014). Measures of baseline depression scores and general factors including age (Leentjens et al., 2002; Schrag et al., 2007), gender (Ou et al., 2018), history of depression (Leentjens et al., 2002, 2013), and history of malignancy have also been suggested as potential markers (Cui et al., 2017). In addition, dopamine deficit on dopamine transporter (DAT) imaging and levels of cerebrospinal fluid (CSF) biomarkers are also linked to dPD (Kang et al., 2013;
⁎ Corresponding authors.
E-mail addresses: [email protected] (S.-C. Gu), [email protected] (J. Zhou), [email protected] (C.-X. Yuan), [email protected] (Q. Ye).
1 SCG and JZ are co-first authors.
https://doi.org/10.1016/j.jad.2020.02.046
Received 18 December 2019; Received in revised form 13 February 2020; Accepted 27 February 2020; Available online 28 February 2020
0165-0327/ © 2020 Elsevier B.V. All rights reserved.
Table 1
Baseline characteristics of the patients with Parkinson's disease with or without depression at 2-year follow-up.

Characteristics | Without depression (n = 276) | With depression (n = 36) | p value
Demographic and clinical characteristics | | |
Age (median [IQR]) | 61.74 [54.93, 68.12] | 67.88 [53.56, 72.16] | 0.053
Age at onset (median [IQR]) | 59.89 [52.95, 66.48] | 65.00 [51.90, 70.42] | 0.13
Gender male, n (%) | 181 (65.58) | 24 (66.67) | 1.00
Education (years, median [IQR]) | 16.00 [14.00, 18.00] | 16.00 [13.50, 18.00] | 0.68
Symptom onset side, n (%) | | | 0.14
  Right | 109 (39.49) | 18 (50.00) |
  Left | 162 (58.70) | 16 (44.44) |
  Bilateral | 5 (1.81) | 2 (5.56) |
Duration (months, median [IQR]) | 4.23 [2.52, 7.37] | 4.97 [2.51, 11.13] | 0.49
Hoehn & Yahr stage, n (%) | | | 0.31
  Stage 1 | 134 (48.55) | 13 (36.11) |
  Stage 2 | 140 (50.72) | 23 (63.89) |
  Stage 3 | 2 (0.72) | 0 (0.00) |
Motor subtype, n (%) | | | 0.11
  Tremor-dominant | 205 (74.28) | 21 (58.33) |
  Postural instability and gait difficulty | 42 (15.22) | 10 (27.78) |
  Indeterminate | 29 (10.51) | 5 (13.89) |
MDS-UPDRS II score (median [IQR]) | 4.00 [2.00, 7.00] | 7.00 [3.00, 10.25] | 0.01
MDS-UPDRS III score (median [IQR]) | 18.00 [14.00, 25.00] | 21.50 [16.50, 26.50] | 0.21
Baseline GDS-15 score (median [IQR]) | 1.00 [0.00, 2.00] | 3.00 [1.75, 3.00] | <0.001
MSEADL score (median [IQR]) | 95.00 [90.00, 100.00] | 95.00 [90.00, 100.00] | 0.32
STAI score (median [IQR]) | 57.00 [49.00, 68.00] | 70.00 [61.75, 77.50] | <0.001
SCOPA-AUT score (median [IQR]) | 8.00 [5.00, 11.00] | 9.00 [6.75, 15.00] | 0.02
RBDSQ score (median [IQR]) | 3.00 [2.00, 5.00] | 5.00 [3.00, 7.00] | 0.02
MOCA score (median [IQR]) | 28.00 [26.00, 29.00] | 27.00 [26.00, 29.00] | 0.90
UPSIT score (median [IQR]) | 23.00 [16.00, 29.00] | 20.00 [14.00, 27.00] | 0.09
Without the history of depression, n (%) | 236 (85.51) | 21 (58.33) | <0.001
Without the history of malignancy, n (%) | 255 (92.39) | 33 (91.67) | 0.75
CSF and DAT imaging markers | | |
CSF markers (pg/ml, median [IQR]) | | |
  Total tau | 158.70 [130.67, 202.07] | 169.10 [139.75, 227.83] | 0.28
  Phosphorylated tau | 13.48 [11.27, 16.87] | 13.54 [12.09, 18.77] | 0.19
  Aβ1-42 | 871.90 [654.88, 1129.50] | 859.85 [609.05, 1014.75] | 0.62
  α-synuclein | 1422.50 [1103.00, 1808.20] | 1398.95 [1162.00, 1797.63] | 0.89
  Total tau: Aβ1-42 ratio | 0.18 [0.15, 0.21] | 0.19 [0.15, 0.23] | 0.28
DAT imaging (striatal binding ratio) | | |
  Mean caudate uptake (mean (SD)) | 2.00 (0.53) | 2.02 (0.58) | 0.81
  Mean putaminal uptake (median [IQR]) | 0.79 [0.64, 0.95] | 0.72 [0.61, 0.99] | 0.54
  Putaminal asymmetry (median [IQR]) | 17.51 [9.77, 26.58] | 16.33 [6.92, 25.27] | 0.76
  Caudate asymmetry (median [IQR]) | 35.15 [15.81, 53.08] | 42.29 [20.03, 55.36] | 0.45

Abbreviations: CSF, cerebrospinal fluid; DAT, dopamine transporter; MDS-UPDRS, Movement Disorder Society Revision of the Unified Parkinson Disease Rating Scale; GDS-15, 15-item Geriatric Depression Scale; MSEADL, Modified Schwab and England Activities of Daily Living; STAI, State Trait Anxiety Total Score; SCOPA-AUT, Scales for Outcomes in Parkinson's Disease-Autonomic questionnaire; RBDSQ, Rapid Eye Movement Sleep Behavior Disorder Screening Questionnaire; MOCA, Montreal Cognitive Assessment; UPSIT, University of Pennsylvania Smell Inventory Test.
Weintraub et al., 2005), although controversy still exists about most of these factors. To date, no study has sought to predict each patient's likelihood of developing dPD by combining demographic, clinical, CSF, and DAT imaging parameters. Therefore, we aimed to use a machine learning algorithm, extreme gradient boosting (XGBoost), to assess the importance of candidate variables in terms of their ability to predict clinically significant depression in a cohort of untreated patients with PD at the earliest stage of the disease studied so far, and to compare the performance of XGBoost with that of a conventional logistic regression model.

2. Methods

2.1. Study design

We used data from the Parkinson's Progression Marker Initiative (PPMI) database (www.ppmi-info.org/data). PPMI is a large, multicenter, prospective cohort study that follows drug-naïve PD patients and matched healthy controls at 33 sites to identify PD progression biomarkers (Parkinson Progression Marker Initiative, 2011). Assessments mainly include clinical evaluation of motor and non-motor features, CSF examination, and iodine-123-labelled ioflupane dopamine transporter single photon emission computed tomography (SPECT; DATSCAN) imaging at the baseline visit. Follow-up assessments take place one (T1) and two years (T2) after the baseline assessment (T0). Each participating site received approval from an ethical standards committee on human experimentation and obtained written informed consent. For up-to-date information on the study, visit www.ppmi-info.org.

2.2. Participants and outcome

Only participants over the age of thirty with a diagnosis of idiopathic PD within the previous two years were assessed for eligibility. Participants were required to have an asymmetric resting tremor, asymmetric bradykinesia, or two of bradykinesia, resting tremor and rigidity. Participants were also required to have 2-year follow-up, not to have been treated with any PD medications within 60 days of the baseline visit, and not to be expected to require PD medications within at least 6 months from baseline. Exclusion criteria were lack of sufficient data for assessing the outcome measure and covariates of this sub-study at T0 and T2, and not having a DAT deficit on imaging.

The 15-item Geriatric Depression Scale (GDS-15) has been validated in nonelderly and elderly PD patients with good accuracy for identifying major and non-major depression in PD (Weintraub et al., 2007). Consistent with the International Parkinson and Movement Disorders Society (MDS) Task Force, we defined clinically significant dPD as a GDS-15 score of 5 or more assessed at T2, which was used as the outcome (Schrag et al., 2007). Thus, patients with a GDS-15 score of 5 or more at baseline were excluded from the analysis.

2.3. Candidate predictors

Candidate predictors were considered on the basis of previous evidence and availability in the PPMI database: age (Leentjens et al., 2002), age at onset (Leentjens et al., 2002; Sagna et al., 2014; Dissanayaka et al., 2011a), gender (Ou et al., 2018), side of symptom onset (Leentjens et al., 2002), duration (Sagna et al., 2014), history of depression (Leentjens et al., 2002, 2013), history of malignancy (Cui et al., 2017), baseline GDS-15 (Schrag et al., 2007), Movement Disorder Society Revision of the Unified Parkinson Disease Rating Scale (MDS-UPDRS) II (Cui et al., 2017), MDS-UPDRS III (Ou et al., 2018), Montreal Cognitive Assessment (MOCA) (Ou et al., 2018), Modified Schwab and England Activities of Daily Living Scale (MSEADL) (Leentjens et al., 2013), Scales for Outcomes in Parkinson's Disease-Autonomic Questionnaire (SCOPA-AUT) (Matsubara et al., 2018), Rapid Eye Movement Sleep Behavior Disorder Screening Questionnaire (RBDSQ) and State Trait Anxiety Inventory (STAI) scores (Dissanayaka et al., 2011a; Leentjens et al., 2013; Neikrug et al., 2014). For biomarker studies, we assessed CSF for total tau (t-tau), phosphorylated tau 181p (p-tau181), β-amyloid 1–42 (Aβ1-42), α-synuclein (α-syn), and the calculated ratio of t-tau to Aβ1-42 (Kang et al., 2013). We also included DAT imaging data for mean caudate and putaminal uptake relative to uptake in the occipital area, and asymmetry of caudate and putaminal uptake (side with highest divided by side with lowest uptake) (Weintraub et al., 2005).

2.4. Statistical analysis

The study sample was randomly split into training (70% of sample) and testing (30% of sample) cohorts. This was a hypothesis-generating study, thus no attempt was made to estimate the sample size of the study. We used multiple imputation with chained equations to impute missing values. Descriptive statistics were calculated for the sample. Continuous variables were expressed as median (interquartile range) or mean (standard deviation), and categorical variables were expressed as number and proportion as appropriate. Inter-group comparisons were performed by using Student's t tests or Wilcoxon's rank-sum tests for continuous variables and Chi-square tests or Fisher's exact tests for categorical variables. Comparisons used a two-sided significance level of 0.05.

An efficient algorithm, XGBoost, was employed to rank the importance of candidate predictors with respect to their ability to predict dPD. XGBoost is derived from the gradient boosting decision tree and was proposed by Chen and Guestrin (2016). At each iteration of the training process, the residual of a base classifier (e.g., a decision tree) is used by the next classifier to optimize the objective function, combining weak base classifiers into a stronger classifier. Each classifier focuses more on the observations misclassified during the previous iterations (Chen and Guestrin, 2016). In addition, XGBoost introduces a regularization term that controls the complexity of the model to prevent overfitting. Capturing the complicated relationships in the data requires tuning of the hyperparameters for the number of trees (nrounds), the learning rate (eta), the minimum loss reduction required to make a further partition on a leaf node of the tree (gamma), the subsampling proportion (subsample), the minimum sum of instance weight needed in a child node (min_child_weight), and the maximum tree depth (max_depth). Here, hyperparameters were selected by grid search using ten-fold cross validation (CV) for the best accuracy. With the xgboost package, we used individual classification trees as the weak learner, binary logistic as the learning objective function, and log-loss as the metric. In CV, the training cohort was randomly partitioned into ten equal-sized subsamples. Nine subsamples were used for training, with the remaining one serving as validation data, and the classification results were obtained by applying the final XGBoost model to the testing cohort. Hyperparameters were considered optimally tuned if, as the number of iterations increased, the CV training log-loss decreased while the CV test log-loss remained below 0.693 and only slightly above the training log-loss. (A log-loss of 0.693 is the performance of a binary classifier that performs no better than a coin flip: −log(0.5) ≈ 0.693.)

Fig. 1. The XGBoost machine learning curve. Log-loss value for the training and test cohorts is shown in the vertical axis. The learning curve provides a report on how well the model is performing on both training and testing cohorts during training. Abbreviation: XGBoost, extreme gradient boosting.

Fig. 2. Variable importance derived from XGBoost model. Abbreviations: XGBoost, extreme gradient boosting; GDS-15, 15-item Geriatric Depression Scale; STAI, State Trait Anxiety Total Score; RBDSQ, Rapid Eye Movement Sleep Behavior Disorder Screening Questionnaire; MDS-UPDRS, Movement Disorder Society Revision of the Unified Parkinson Disease Rating Scale.

The smbinning package, which provides automated binning based on conditional inference trees, was applied to boost predictive accuracy and to allow simpler models to be trained. Accordingly, age, age at onset of PD, duration, baseline GDS-15, MDS-UPDRS II, STAI, SCOPA-AUT and RBDSQ scores were converted into discrete variables in the logistic regression models. Associations between each predictor and the outcome were assessed with simple logistic regression models. Multivariable stepwise logistic regression models with predictors that had a p-value ≤ 0.1 in simple logistic analyses were used to select potential factors that were predictive of depression in PD. Both forward selection and backward elimination were applied. Akaike's information criterion (AIC) was used as the selection criterion in our final model. The contribution of each predictor was presented as beta coefficients and odds ratios (ORs) with 95% CIs. Recent studies show that the convention of ten events per variable is generally too conservative and can be relaxed for logistic and Cox regression (Vittinghoff and McCulloch, 2007). The number of outcome events in this study was 36, and the number of events per variable was set to ≈6, leaving room for a maximum of six variables in the final model. Multicollinearity of the final model was assessed.

The performance of the final models was assessed with measures of discrimination and calibration. Discrimination refers to the ability of a model to correctly distinguish non-events and events, and it can be quantified by calculating the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. We computed the AUC with a 95% confidence interval (CI) by using 1000 bootstrap resamples (Steyerberg et al., 2003), and reported the sensitivity and specificity associated with the optimal cutpoint. Calibration measures how closely the
Table 2
Associations between each candidate predictor and depression in early Parkinson's disease.

Variables | Univariable OR (95% CI) | p value | Multivariable OR (95% CI) | p value
Age | | | |
  ≤ 78 | Ref | | Ref |
  > 78 | 6.13 (1.15, 29.52) | 0.02 | 2.76 (0.22, 37.87) | 0.43
Age at onset | | | |
  ≤ 70 | Ref | | Ref |
  > 70 | 4.30 (1.58, 11.09) | 0.003 | 7.30 (1.38, 38.69) | 0.02
Gender | | | |
  Female | Ref | | |
  Male | 0.97 (0.42, 2.38) | 0.94 | |
Symptom onset side | | | |
  Left | Ref | | |
  Right | 0.75 (0.31, 1.77) | 0.51 | |
  Bilateral | 4.50 (0.55, 29.98) | 0.12 | |
Duration | | | |
  ≤ 6.6 | Ref | | Ref |
  > 6.6 | 2.08 (0.89, 4.79) | 0.08 | 3.47 (1.13, 11.21) | 0.03
History of depression | | | |
  Yes | Ref | | Ref |
  No | 0.34 (0.15, 0.85) | 0.02 | 0.60 (0.18, 2.10) | 0.41
History of malignancy | | | |
  Yes | Ref | | |
  No | 6.32 (0.01, 68.92) | 0.99 | |
Baseline GDS-15 score | | | |
  ≤ 2 | Ref | | Ref |
  > 2 | 6.34 (2.70, 15.35) | <0.001 | 6.53 (2.09, 23.09) | 0.002
MDS-UPDRS II score | | | |
  ≤ 6 | Ref | | Ref |
  > 6 | 2.98 (1.30, 6.96) | 0.01 | 2.20 (0.56, 8.77) | 0.26
MDS-UPDRS III score | 1.01 (0.97, 1.06) | 0.59 | |
MOCA score | 1.03 (0.86, 1.24) | 0.79 | |
MSEADL score | 0.94 (0.88, 1.01) | 0.08 | 1.01 (0.92, 1.12) | 0.81
STAI score | | | |
  ≤ 56 | Ref | | Ref |
  (56, 68] | 6.83 (1.69, 45.85) | 0.02 | 7.05 (1.30, 58.65) | 0.04
  > 68 | 14.35 (3.83, 96.57) | <0.001 | 11.23 (2.27, 88.39) | 0.007
SCOPA-AUT score | | | |
  ≤ 14 | Ref | | Ref |
  > 14 | 3.17 (1.12, 8.24) | 0.02 | 0.48 (0.08, 2.29) | 0.38
RBDSQ score | | | |
  ≤ 6 | Ref | | Ref |
  > 6 | 3.99 (1.60, 9.67) | 0.002 | 5.63 (1.66, 20.19) | 0.006
Phosphorylated tau | 1.07 (0.99, 1.16) | 0.08 | 1.01 (0.88, 1.15) | 0.88
Aβ1-42 | 0.999 (0.998, 1.00) | 0.2 | |
α-synuclein | 1.00 (0.999, 1.00) | 0.52 | |
Total tau: Aβ1-42 ratio | | | |
  ≤ 0.33 | Ref | | |
  > 0.33 | 6.29 (1.73, 21.53) | 0.003 | 1.73 (0.22, 12.47) | 0.59
Mean caudate uptake | 1.10 (0.50, 2.28) | 0.82 | |
Mean putaminal uptake | 1.55 (0.34, 6.39) | 0.55 | |
Caudate asymmetry | 1.00 (0.97, 1.03) | 0.95 | |
Putaminal asymmetry | 1.00 (0.98, 1.02) | 0.96 | |

Abbreviations: Ref, reference; OR, odds ratio; CI, confidence interval; MDS-UPDRS, Movement Disorder Society Revision of the Unified Parkinson Disease Rating Scale; GDS-15, 15-item Geriatric Depression Scale; MSEADL, Modified Schwab and England Activities of Daily Living; STAI, State Trait Anxiety Total Score; SCOPA-AUT, Scales for Outcomes in Parkinson's Disease-Autonomic questionnaire; RBDSQ, Rapid Eye Movement Sleep Behaviour Disorder Screening Questionnaire; MOCA, Montreal Cognitive Assessment.
predicted probabilities agree numerically with the actual outcomes. We tested calibration by using the Hosmer–Lemeshow test and by plotting observed probabilities against predicted probabilities. Statistical analyses were conducted by using R software version 3.3.3 (R Foundation for Statistical Computing, Vienna, Austria).

3. Results

3.1. Participants

A total of 423 drug-naïve PD patients were enrolled into the PPMI study between June 2010 and April 2013. Fifty-seven patients lacked GDS-15 data at T2 and were excluded. Four patients were excluded because they did not have a DAT deficit on imaging. Fifty patients were excluded because of a GDS-15 score of 5 or more at baseline. A total of 312 patients were included in our analysis, and 36 (12%) of them were classified as having depression at T2. Of these 36 patients, 30 were classified as having mild depression (GDS-15 between 5 and 9), and 6 as having moderate to severe depression (GDS-15 > 9). At baseline, patients later classified as dPD had higher MDS-UPDRS II scores, higher GDS-15 scores, higher STAI scores, higher SCOPA-AUT scores, higher RBDSQ scores, and a higher rate of history of depression (Table 1).
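The categorical comparisons summarized in Table 1 can be spot-checked from the reported counts. The sketch below assumes an uncorrected Pearson chi-square test; the paper lists chi-square or Fisher's exact tests without saying which was used for each row, so this choice is an assumption. It relies on the closed-form tail probabilities for 1 and 2 degrees of freedom so that only the standard library is needed. Function and variable names are illustrative.

```python
import math

def chi_square_stat(table):
    """Pearson chi-square statistic for an r x c table of counts (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

def chi_square_p(stat, df):
    """Upper-tail p value; this sketch only covers df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("only df = 1 or df = 2 are handled in this sketch")

# Counts from Table 1; rows are without / with depression at T2.
motor_subtype = [[205, 42, 29], [21, 10, 5]]    # tremor-dominant, PIGD, indeterminate
history_of_depression = [[236, 40], [21, 15]]   # without history, with history

p_motor = chi_square_p(chi_square_stat(motor_subtype), df=2)
p_history = chi_square_p(chi_square_stat(history_of_depression), df=1)
print(f"motor subtype: p = {p_motor:.2f}")
print(f"history of depression: p = {p_history:.5f}")
```

Under these assumptions the motor subtype comparison rounds to the tabulated p value of 0.11, and the history of depression comparison falls below 0.001.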
3.2. The XGBoost model

Hyperparameter values were identified with grid search, and the hyperparameters used in the final XGBoost model were as follows: eta = 0.05, gamma = 0, max_depth = 2, subsample = 1, min_child_weight = 1, and nrounds = 30. The machine learning curve is presented in Fig. 1. With these hyperparameters, the CV training log-loss decreased while the CV testing log-loss stayed below 0.693 and only slightly above the training log-loss as the number of iterations increased. Variable importance was calculated as the sum of the decrease in error when split by a variable, reflecting the contribution made by each variable in classifying dPD (Fig. 2).

3.3. The logistic regression model

Unadjusted associations between each candidate predictor and the outcome are shown in Table 2. After the predictors with a p-value ≤ 0.1 in simple logistic analyses were entered into the same multivariable model, age at onset, duration, baseline RBDSQ score, STAI score and GDS-15 score were significantly associated with the outcome (Table 2). Results of the final model are given in Table 3. No collinearity existed among the variables. Because the primary goal was to achieve the best performance, predictors that were not significantly associated with the outcome could be included if they increased the discriminatory ability of the final model. Although history of depression did not display a significant association with the outcome, it was included for holistic utility of prediction. The depression-age of onset-RBDSQ-GDS-duration-STAI (DARGDS) score was derived to help rapidly predict dPD risk, using the coefficients of the predictors in the final logistic model. The components of the DARGDS score are described in Table 4. The risk of dPD was increased when the score was equal to or greater than 6 in the training and testing cohorts. A nomogram was also generated for clinical utility (supplementary e-Fig. 1).

Table 3
Model coefficients and adjusted odds ratios.

Variables | Coefficient | OR (95% CI) | p value
Intercept | −5.321 | | <0.001
Age at onset > 70 | 2.279 | 9.77 (2.91, 35.25) | <0.001
Duration > 6.6 months | 1.299 | 3.67 (1.26, 11.35) | 0.02
RBDSQ score > 6 | 1.726 | 5.62 (1.77, 18.66) | 0.004
STAI score (56, 68] | 1.942 | 6.97 (1.33, 56.85) | 0.04
STAI score > 68 | 2.493 | 12.09 (2.52, 94.55) | 0.005
Baseline GDS-15 score > 2 | 1.776 | 5.91 (2.04, 18.84) | 0.002
Without the history of depression | −0.620 | 0.54 (0.17, 1.69) | 0.28

Abbreviations: OR, odds ratio; CI, confidence interval; RBDSQ, Rapid Eye Movement Sleep Behaviour Disorder Screening Questionnaire; GDS-15, 15-item Geriatric Depression Scale; STAI, State Trait Anxiety Total Score.

Table 4
Components of the DARGDS score.

Variables | Range | Points
History of depression | No | −1
History of depression | Yes | 0
Age at onset | ≤70 | 0
Age at onset | >70 | 4
RBDSQ score | ≤6 | 0
RBDSQ score | >6 | 3
GDS-15 score | ≤2 | 0
GDS-15 score | >2 | 3
Duration | ≤6.6 months | 0
Duration | >6.6 months | 2
STAI score | ≤56 | 0
STAI score | 56–68 | 3
STAI score | >68 | 4

Abbreviations: RBDSQ, Rapid Eye Movement Sleep Behaviour Disorder Screening Questionnaire; GDS-15, 15-item Geriatric Depression Scale; STAI, State Trait Anxiety Total Score.

3.4. Model performance

Both models produced good discrimination and calibration. The XGBoost model had an AUC of 0.94 (95% CI 0.89, 0.99) with the cutpoint at 0.237 (Fig. 3A), which corresponded to a sensitivity of 0.92 (95% CI 0.81, 1.00) and a specificity of 0.86 (95% CI 0.82, 0.91). The logistic regression model had an AUC of 0.89 (95% CI 0.82, 0.95) with the cutpoint at 0.155 (Fig. 3B), which corresponded to a sensitivity of 0.81 (95% CI 0.65, 0.96) and a specificity of 0.83 (95% CI 0.78, 0.89). The corresponding confusion matrix for each model is shown in Fig. 3C and Fig. 3D. Neither the Hosmer–Lemeshow test nor the calibration plot (supplementary e-Fig. 2) found strong evidence of poor calibration for the XGBoost (χ2 = 7.31, p = 0.60) or the logistic regression (χ2 = 3.16, p = 0.96) model. The DARGDS score was associated with dPD (OR per unit increase 1.73, 95% CI 1.47, 2.16, p < 0.001), with a sensitivity of 0.73 and a specificity of 0.88.

Fig. 3. Receiver operating characteristic curves for prediction of depression at 2-year follow-up in patients newly diagnosed with Parkinson's disease based on XGBoost model (A) and logistic regression model (B) and confusion matrix for the XGBoost model (C) and logistic regression model (D) in the testing cohort. Abbreviation: XGBoost, extreme gradient boosting.

4. Discussion

4.1. Prevalence of depression in early PD

In our study, the prevalence of clinically significant depression among patients with newly diagnosed PD over a 2-year period was 12%, mainly of mild and moderate levels, indicating that depression was common in early PD, in keeping with reported estimates. The reported prevalence rates for dPD ranged between 3% and 90% (Reijnders et al., 2008). This substantial variability could be accounted for by major differences in assessment techniques and study design; a definitive estimate therefore remains to be established.

4.2. Prediction of depression in early PD

Following a prospective cohort of patients with newly diagnosed PD to investigate markers for dPD has been an elusive goal. Our study filled this gap by integrating demographic, clinical, CSF, and DAT imaging variables collected within two years of diagnosis, via machine learning and logistic regression, to predict dPD at the 2-year time point. Both the XGBoost and logistic regression models successfully predicted the occurrence of dPD, which could be helpful in identifying patients at risk of dPD so that possible pharmacological and non-pharmacological interventions can be initiated in clinical trials. We found XGBoost to have the higher predictive value, with 92% sensitivity and 86% specificity, compared with logistic regression, representing a good attempt at combining a machine learning algorithm with clinical data. Meanwhile, it should also be noted that all the variables included in the logistic regression were accessible and routinely collected, so that the model could be applied as a simpler tool by clinicians in clinical practice.

Both models identified baseline GDS-15 score, age at onset, STAI score, RBDSQ score, duration and history of depression as predictors of dPD. Methodologically, these two models are fundamentally different, thus the similarity of results points to the robustness of our study. Patients with higher GDS-15 scores and those with longer duration were more likely to develop dPD, in line with reports of depression scores and duration being predictors of dPD (Sagna et al., 2014; Schrag et al., 2007). Similarly, history of depression and the presence of RBD and anxiety were associated with dPD in previous studies, which supported our results (Leentjens et al., 2002; Dissanayaka et al., 2011a, 2013; Neikrug et al., 2014). Published research on the contribution of age at onset to dPD is still scant. Older onset age was a strong predictor in our study. However, in one case-control study involving 90 PD patients and 90 controls, no correlation was found between onset age and mood disorders (Nuti et al., 2004). In another cross-sectional study involving 639 PD patients, younger onset age was associated with dPD (Dissanayaka et al., 2011b). Reasons for such discrepancy might include, but are not limited to, the diversity in design, sample size, and disease stage across studies, which highlights the need for further study.

Here, patients classified as dPD had slightly higher values of t-tau together with lower values of Aβ1-42. As a consequence, the ratio of CSF t-tau to Aβ1-42, reported to be predictive of increasing cerebral Alzheimer's disease (AD) and synuclein pathology (Irwin et al., 2018), was likely to be a good feature for dPD in XGBoost and simple logistic analyses. It has been reported that lower levels of CSF Aβ1-42 and p-tau181 could differentiate PD patients from healthy controls (Kang et al., 2013). Furthermore, many PD patients have histopathological features of AD at autopsy, including Aβ plaques and neurofibrillary tangles composed of tau (Braak and Braak, 1990). These findings are consistent with the hypothesis that AD and synuclein pathology might coexist in PD (Irwin et al., 2012). Intriguingly, AD and major depressive disorder (MDD) are highly prevalent neuropsychiatric conditions with epidemiological overlaps. Recent reports have linked Aβ accumulation to increased brain cytokine production and depressive-like behavior in mice, with microglia acting as a link between AD and depression (Santos et al., 2016). Based on these findings, the involvement of AD copathology in PD and MDD might provide insights into the pathogenesis of dPD, and the ratio of CSF t-tau to Aβ1-42 might have the potential to predict dPD. Thus, this factor should receive more consideration.
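The discrimination figures quoted in this section rest on the AUC, its bootstrap confidence interval, and the sensitivity and specificity at a cutpoint (Sections 2.4 and 3.4). As a rough illustration only, the sketch below computes the AUC as a rank statistic, a percentile bootstrap interval with 1000 resamples, and sensitivity and specificity at a chosen cutpoint. The toy labels and scores are made up and are not PPMI data; the percentile interval and the function names are assumptions, since the paper does not state which bootstrap interval was used.

```python
import random

def auc(labels, scores):
    """AUC as the chance that a random event scores above a random non-event (ties count half)."""
    events = [s for y, s in zip(labels, scores) if y == 1]
    non_events = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((e > n) + 0.5 * (e == n) for e in events for n in non_events)
    return wins / (len(events) * len(non_events))

def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample patients with replacement n_boot times."""
    rng = random.Random(seed)
    n = len(labels)
    replicates = []
    while len(replicates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            replicates.append(auc(ys, [scores[i] for i in idx]))
    replicates.sort()
    lower = replicates[int(alpha / 2 * n_boot)]
    upper = replicates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

def sensitivity_specificity(labels, scores, cutpoint):
    """Classify as an event when score >= cutpoint, then compare with the true labels."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutpoint)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < cutpoint)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cutpoint)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutpoint)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical toy data (1 = depression at T2); not taken from PPMI.
toy_labels = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
toy_scores = [0.05, 0.10, 0.12, 0.20, 0.22, 0.30, 0.28, 0.40, 0.55, 0.80]

point = auc(toy_labels, toy_scores)
ci = bootstrap_auc_ci(toy_labels, toy_scores)
sens, spec = sensitivity_specificity(toy_labels, toy_scores, cutpoint=0.237)
print(point, ci, sens, spec)
```

The pairwise AUC is quadratic in sample size, which is acceptable for a cohort of a few hundred patients; a rank-based formula would be preferable for much larger data.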
4.3. Clinical implications Present predictors provided by the developed models were of importance with respect to understanding the nature of depression in patients with PD. In our study, 2 PD-specific factors (age at onset, duration) were identified as relevant markers by XGBoost and logistic regression models. Other important factors including GDS-15 score, 124
Journal of Affective Disorders 268 (2020) 118–126
S.-C. Gu, et al.
STAI score, RBDSQ score, and history of depression were not only related to PD, but also known as risk factors for depression, and hence not specific for PD (Schoevers et al., 2000; Lam et al., 2016). Taken together, they showed that both PD-specific and nonspecific factors could contribute to dPD, providing evidence for the theory that there might be subtypes in dPD: non specifically and specifically associated with PD (Even and Weintraub, 2012). For further assessing the effects of PD-specific and nonspecific predictors, we assumed that Mr. Y was a 78-year-old patient diagnosed as PD two years ago who had the baseline GDS-15 score of 1, STAI score of 50, RBDSQ score of 2 without the history of depression, the predicted risk of dPD in the logistic regression model with these results would be 9%. If Mrs. X was a patient without PD who had the baseline GDS-15 score of 4, STAI score of 70, RBDSQ score of 7 with the history of depression, she would get a corresponding 66% risk of depression, indicating that nonspecific factors might have more effects on depression in early PD. It was noteworthy that the previously postulated cardinal PD-specific markers for dPD such as MDS-UPDRS III score, levodopa equivalent doses and side of symptom onset did not contribute to prediction. One important caveat was that the untreated subjects with early PD were underlined in PPMI study with median duration since diagnosis of 4.2 months at baseline, while in studies describing associations between dPD and abovementioned factors, the duration was varying from 3.8 years to 8.3 years (Ou et al., 2018; Leentjens et al., 2013), highlighting that PD-specific factors tended to play more contributions to depression in the late stage. 
Accordingly, one could hypothesize that the nature of depression in PD patients might be heterogeneous and that disease stage might interact with other PD-specific or nonspecific predictors of dPD, which raised the possibility that different pathophysiological processes might underlie depression at different stages of PD. This finding was in line with the theory that depression occurring early in PD might relate largely to emotional adjustment to the disease (nonspecifically associated with PD), and depression occurring later to a pathological component of the neurodegenerative process intrinsic to PD itself (specifically associated with PD) (Goetz, 2010). As many clinicians have the impression that dPD is not well treated by current antidepressants or dopamine agonists such as pramipexole, future trials should, following the hypothesis above, assess whether treatment response to specific agents differs by subtype and/or disease stage. The highly complex relationship between depression and PD has been studied for decades. We found that patients who developed depression in early PD were likely to have a pre-existing vulnerability owing to exposure to common risk factors for depression that were not specific to PD. Moreover, in light of the case of Mrs. X, it seemed that a series of nonspecific risk factors could predispose her to PD, supporting the hypothesis that depression could be either a nonmotor symptom of PD or a causal risk factor for it (Leentjens, 2015).
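The cases of Mr. Y and Mrs. X above rest on how a logistic regression model converts predictor values into a probability. The following is a minimal sketch of that calculation, not the fitted model: the function name `predicted_risk`, the intercept, and all coefficient values are hypothetical placeholders chosen for illustration, and only the four nonspecific predictors are included for brevity. It is not expected to reproduce the 9% and 66% figures reported in the text.

```python
import math
from typing import Mapping


def predicted_risk(intercept: float,
                   coefs: Mapping[str, float],
                   patient: Mapping[str, float]) -> float:
    """Return the logistic-regression probability for one patient."""
    # Linear predictor: intercept plus the weighted sum of predictor values.
    linear = intercept + sum(coefs[name] * patient[name] for name in coefs)
    # The logistic (sigmoid) function maps the linear predictor into (0, 1).
    return 1.0 / (1.0 + math.exp(-linear))


# ASSUMPTION: placeholder intercept and coefficients for illustration only.
# They are NOT the fitted values of the published model.
EXAMPLE_INTERCEPT = -3.0
EXAMPLE_COEFS = {
    "gds15": 0.40,
    "stai": 0.02,
    "rbdsq": 0.10,
    "history_of_depression": 0.80,
}

# Predictor profiles taken from the two cases described in the text.
mr_y = {"gds15": 1, "stai": 50, "rbdsq": 2, "history_of_depression": 0}
mrs_x = {"gds15": 4, "stai": 70, "rbdsq": 7, "history_of_depression": 1}

risk_mr_y = predicted_risk(EXAMPLE_INTERCEPT, EXAMPLE_COEFS, mr_y)
risk_mrs_x = predicted_risk(EXAMPLE_INTERCEPT, EXAMPLE_COEFS, mrs_x)
print(f"Mr. Y: {risk_mr_y:.2f}, Mrs. X: {risk_mrs_x:.2f}")
```

With any set of positive coefficients, the profile with higher GDS-15, STAI, and RBDSQ scores and a history of depression receives the higher predicted risk; the size of the gap depends entirely on the fitted coefficients, which are not reported in this section.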
4.4. Strengths and limitations
One of the strengths of the current study was the application of a machine learning technique with strong data processing ability. Furthermore, a prospective observational design was adopted and the early stage of disease was emphasized in enrollment, enhancing the ability to derive causal inferences, clarify etiological relationships, and address potential biases and confounding. However, several limitations should be noted. First, important factors related to dPD, such as gene status (including the SLC6A15, TPH2 and LRRK2 genes) and the CSF levels of brain-derived neurotrophic factor (BDNF), calcitonin gene-related peptide (CGRP), and 3-methoxy-4-hydroxyphenylethylenglycol (MHPG), were not included in PPMI (Zheng et al., 2017; Palhagen et al., 2010). Second, we lacked social and psychological parameters such as marital status, social support, and personality traits, which may also have been contributing factors in the development of dPD (Rana et al., 2016; Garlovsky et al., 2016). Third, we did not have access to information about the use of antidepressants, although it has been reported that the use of antidepressants was not an independent predictor of the development of depression in PD patients who were non-depressed at baseline (Zhu et al., 2016). Fourth, selection bias might have arisen from subjects lost during follow-up. Lastly, the generalizability of our models was limited by sample size, and external validation is essential.
5. Conclusion
In conclusion, we suggest that clinically significant depression in early PD can be predicted to a significant degree with XGBoost and logistic regression using data from the prospective PPMI study. The most important markers revealed in the present study, comprising 2 PD-specific and 4 nonspecific factors, help in understanding the natural course of depression in PD. Future validation of these results is warranted, and clinical data will continue to be collected to further improve the performance of the established models.
Funding
This work was supported by Key Technologies Research and Development Program [2017YFC1310300]; National Natural Science Foundation of China [81673726]; and New Frontier Technology Project by Shanghai Shen Kang Hospital Development center [SHDC12018131]. Funders had no role in study design, data collection, analysis, or decision to publish the manuscript. PPMI (a public–private partnership) is funded by the Michael J. Fox Foundation for Parkinson's Research and funding partners, including Abbvie, Allergan, Avid Radiopharmaceuticals, Biogen, BioLegend, Bristol-Myers Squibb, Celgene, Denali, GE Healthcare, Genentech, GlaxoSmithKline, Lilly, Lundbeck, Merck, Meso Scale Discovery, Pfizer, Piramal, Prevail Therapeutics, Roche, Sanofi Genzyme, Servier, Takeda, Teva, UCB, Verily, Voyager Therapeutics and Golub Capital.
CRediT authorship contribution statement
Si-Chun Gu: Conceptualization, Methodology, Formal analysis, Writing - original draft, Validation. Jie Zhou: Conceptualization, Methodology, Formal analysis, Writing - original draft, Validation. Can-Xing Yuan: Writing - review & editing, Funding acquisition, Validation. Qing Ye: Data curation, Validation.
Declaration of Competing Interest
The authors declare no conflict of interest.
Acknowledgments
We acknowledge all the authors for their helpful comments on this article. All authors have approved the final version of the article.
Supplementary materials
Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.jad.2020.02.046.
References
Braak, H., Braak, E., 1990. Cognitive impairment in Parkinson's disease: amyloid plaques, neurofibrillary tangles, and neuropil threads in the cerebral cortex. J. Neural Transm. Park. Dis. Dement. Sect. 2, 45–57. https://doi.org/10.1007/bf02251245.
Chen, T., Guestrin, C., 2016. XGBoost: a Scalable Tree Boosting System. ACM Press, New York, pp. 785–794. https://doi.org/10.1145/2939672.2939785. arXiv.
Cui, S.S., Du, J.J., Fu, R., Lin, Y.Q., Huang, P., He, Y.C., et al., 2017. Prevalence and risk factors for depression and anxiety in Chinese patients with Parkinson disease. BMC Geriatr. 17, 270. https://doi.org/10.1186/s12877-017-0666-2.
Dissanayaka, N.N., O'Sullivan, J.D., Silburn, P.A., Mellick, G.D., 2011a. Assessment methods and factors associated with depression in Parkinson's disease. J. Neurol. Sci. 310, 208–210. https://doi.org/10.1016/j.jns.2011.06.031.
Dissanayaka, N.N., Sellbach, A., Silburn, P.A., O'Sullivan, J.D., Marsh, R., Mellick, G.D., 2011b. Factors associated with depression in Parkinson's disease. J. Affect. Disord. 132, 82–88. https://doi.org/10.1016/j.jad.2011.01.021.
Even, C., Weintraub, D., 2012. Is depression in Parkinson's disease (PD) a specific entity? J. Affect. Disord. 139, 103–112. https://doi.org/10.1016/j.jad.2011.07.002.
Garlovsky, J.K., Overton, P.G., Simpson, J., 2016. Psychological predictors of anxiety and depression in Parkinson's disease: a systematic review. J. Clin. Psychol. 72, 979–998. https://doi.org/10.1002/jclp.22308.
Goetz, C.G., 2010. New developments in depression, anxiety, compulsiveness, and hallucinations in Parkinson's disease. Mov. Disord. 25, S104–S109. https://doi.org/10.1002/mds.22636.
Irwin, D.J., White, M.T., Toledo, J.B., Xie, S.X., Robinson, J.L., Van Deerlin, V., et al., 2012. Neuropathologic substrates of Parkinson disease dementia. Ann. Neurol. 72, 587–598. https://doi.org/10.1002/ana.23659.
Irwin, D.J., Xie, S.X., Coughlin, D., Nevler, N., Akhtar, R.S., McMillan, C.T., et al., 2018. CSF tau and beta-amyloid predict cerebral synucleinopathy in autopsied Lewy body disorders. Neurology 90, e1038–e1046. https://doi.org/10.1212/WNL.0000000000005166.
Kang, J.H., Irwin, D.J., Chen-Plotkin, A.S., Siderowf, A., Caspell, C., Coffey, C.S., et al., 2013. Association of cerebrospinal fluid beta-amyloid 1-42, T-tau, P-tau181, and alpha-synuclein levels with clinical features of drug-naive patients with early Parkinson disease. JAMA Neurol. 70, 1277–1287. https://doi.org/10.1001/jamaneurol.2013.3861.
Lam, S.P., Wong, C.C., Li, S.X., Zhang, J.H., Chan, J.W., Zhou, J.Y., et al., 2016. Caring burden of REM sleep behavior disorder – spouses’ health and marital relationship. Sleep Med. 24, 40–43. https://doi.org/10.1016/j.sleep.2016.08.004.
Leentjens, A.F., 2015. Parkinson disease: depression-risk factor or early symptom in Parkinson disease. Nat. Rev. Neurol. 11, 432–433. https://doi.org/10.1038/nrneurol.2015.126.
Leentjens, A.F., Lousberg, R., Verhey, F.R., 2002. Markers for depression in Parkinson's disease. Acta Psychiatr. Scand. 106, 196–201. https://doi.org/10.1034/j.1600-0447.2002.02045.x.
Leentjens, A.F., Moonen, A.J., Dujardin, K., Marsh, L., Martinez-Martin, P., Richard, I.H., et al., 2013. Modeling depression in Parkinson disease: disease-specific and nonspecific risk factors. Neurology 81, 1036–1043. https://doi.org/10.1212/WNL.0b013e3182a4a503.
Marsh, L., McDonald, W.M., Cummings, J., Ravina, B., NINDS/NIMH Work Group on Depression and Parkinson's Disease, 2006. Provisional diagnostic criteria for depression in Parkinson's disease: report of an NINDS/NIMH work group. Mov. Disord. 21, 148–158. https://doi.org/10.1002/mds.20723.
Matsubara, T., Suzuki, K., Fujita, H., Watanabe, Y., Sakuramoto, H., Matsubara, M., et al., 2018. Autonomic symptoms correlate with non-autonomic non-motor symptoms and sleep problems in patients with Parkinson's disease. Eur. Neurol. 80, 193–199. https://doi.org/10.1159/000495797.
Neikrug, A.B., Avanzino, J.A., Liu, L., Maglione, J.E., Natarajan, L., Corey-Bloom, J., et al., 2014. Parkinson's disease and REM sleep behavior disorder result in increased non-motor symptoms. Sleep Med. 15, 959–966. https://doi.org/10.1016/j.sleep.2014.04.009.
Nuti, A., Ceravolo, R., Piccinni, A., Dell'Agnello, G., Bellini, G., Gambaccini, G., et al., 2004. Psychiatric comorbidity in a population of Parkinson's disease patients. Eur. J. Neurol. 11, 315–320. https://doi.org/10.1111/j.1468-1331.2004.00781.x.
Ou, R., Wei, Q., Hou, Y., Yuan, X., Song, W., Cao, B., et al., 2018. Vascular risk factors and depression in Parkinson's disease. Eur. J. Neurol. 25, 637–643. https://doi.org/10.1111/ene.13551.
Palhagen, S., Qi, H., Martensson, B., Walinder, J., Granerus, A.K., Svenningsson, P., 2010. Monoamines, BDNF, IL-6 and corticosterone in CSF in patients with Parkinson's disease and major depression. J. Neurol. 257, 524–532. https://doi.org/10.1007/s00415-009-5353-6.
Parkinson Progression Marker Initiative, 2011. The parkinson progression marker initiative (PPMI). Prog. Neurobiol. 95, 629–635. https://doi.org/10.1016/j.pneurobio.2011.09.005.
Rana, A.Q., Qureshi, A.R., Mumtaz, A., Abdullah, I., Jesudasan, A., Hafez, K.K., Rana, M.A., 2016. Associations of pain and depression with marital status in patients diagnosed with Parkinson's disease. Acta Neurol. Scand. 133, 276–280. https://doi.org/10.1111/ane.12454.
Reijnders, J.S., Ehrt, U., Weber, W.E., Aarsland, D., Leentjens, A.F., 2008. A systematic review of prevalence studies of depression in Parkinson's disease. Mov. Disord. 23, 183–189. https://doi.org/10.1002/mds.21803.
Sagna, A., Gallo, J.J., Pontone, G.M., 2014. Systematic review of factors associated with depression and anxiety disorders among older adults with Parkinson's disease. Parkinsonism Relat. Disord. 20, 708–715. https://doi.org/10.1016/j.parkreldis.2014.03.020.
Santos, L.E., Beckman, D., Ferreira, S.T., 2016. Microglial dysfunction connects depression and Alzheimer's disease. Brain Behav. Immun. 55, 151–165. https://doi.org/10.1016/j.bbi.2015.11.011.
Schoevers, R.A., Beekman, A.T., Deeg, D.J., Geerlings, M.I., Jonker, C., Van Tilburg, W., 2000. Risk factors for depression in later life; results of a prospective community based study (AMSTEL). J. Affect. Disord. 59, 127–137. https://doi.org/10.1016/s0165-0327(99)00124-x.
Schrag, A., Barone, P., Brown, R.G., Leentjens, A.F., McDonald, W.M., Starkstein, S., et al., 2007. Depression rating scales in Parkinson's disease: critique and recommendations. Mov. Disord. 22, 1077–1092. https://doi.org/10.1002/mds.21333.
Steyerberg, E.W., Bleeker, S.E., Moll, H.A., Grobbee, D.E., Moons, K.G., 2003. Internal and external validation of predictive models: a simulation study of bias and precision in small samples. J. Clin. Epidemiol. 56, 441–447. https://doi.org/10.1016/s0895-4356(03)00047-7.
Vittinghoff, E., McCulloch, C.E., 2007. Relaxing the rule of ten events per variable in logistic and Cox regression. Am. J. Epidemiol. 165, 710–718. https://doi.org/10.1093/aje/kwk052.
Weintraub, D., Newberg, A.B., Cary, M.S., Siderowf, A.D., Moberg, P.J., Kleiner-Fisman, G., et al., 2005. Striatal dopamine transporter imaging correlates with anxiety and depression symptoms in Parkinson's disease. J. Nucl. Med. 46, 227–232.
Weintraub, D., Saboe, K., Stern, M.B., 2007. Effect of age on geriatric depression scale performance in Parkinson's disease. Mov. Disord. 22, 1331–1335. https://doi.org/10.1002/mds.21369.
Zheng, J., Yang, X., Zhao, Q., Tian, S., Huang, H., Chen, Y., et al., 2017. Association between gene polymorphism and depression in Parkinson's disease: a case-control study. J. Neurol. Sci. 375, 231–234. https://doi.org/10.1016/j.jns.2017.02.001.
Zhu, K., van Hilten, J.J., Marinus, J., 2016. Associated and predictive factors of depressive symptoms in patients with Parkinson's disease. J. Neurol. 263, 1215–1225. https://doi.org/10.1007/s00415-016-8130-3.