A deep learning framework for automatic diagnosis of unipolar depression



International Journal of Medical Informatics 132 (2019) 103983

Contents lists available at ScienceDirect

International Journal of Medical Informatics journal homepage: www.elsevier.com/locate/ijmedinf

Wajid Mumtaz a,⁎, Abdul Qayyum b

a Department of Electrical Engineering, School of Electrical Engineering and Computer Science, National University of Sciences and Technology, Islamabad, Pakistan
b Le2I, Electronics, Computer Science and Image Laboratory, CNRS 6306, Université de Bourgogne, Dijon Campus, France

ARTICLE INFO

Keywords: EEG-based deep learning for depression; EEG-based diagnosis of unipolar depression; Convolutional neural network for depression; Long short-term memory classifiers for depression; EEG-based machine learning methods for depression

ABSTRACT

Background and purpose: In recent years, machine learning (ML) frameworks for the automatic diagnosis of unipolar depression have progressed to deep learning frameworks. However, this approach needs further validation. Therefore, this paper proposes an electroencephalographic (EEG)-based deep learning framework that automatically discriminates depressed patients from healthy controls and provides a diagnosis. Basic procedures: Two deep learning architectures were proposed: a one-dimensional convolutional neural network (1DCNN) and a 1DCNN combined with long short-term memory (LSTM). The proposed architectures automatically learn patterns in the EEG data that are useful for classifying depressed patients and healthy controls. The models were validated with resting-state EEG data obtained from 33 depressed patients and 30 healthy controls. Main findings: Significant differences were observed between the two groups. The 1DCNN model achieved accuracy = 98.32%, precision = 99.78%, recall = 98.34%, and f-score = 97.65%. The 1DCNN + LSTM model achieved accuracy = 95.97%, precision = 99.23%, recall = 93.67%, and f-score = 95.14%. Conclusions: Deep learning frameworks could revolutionize clinical applications of EEG-based diagnosis of depression. Based on the results, the proposed deep learning framework may be used as an automatic method for diagnosing depression.

1. Introduction

Unipolar major depressive disorder (MDD), also termed depression, is a recurrent and debilitating mental illness. According to the World Health Organization (WHO), depression will become the leading cause of disease burden and functional disability at the workplace by 2020. Depression is heterogeneous in nature; therefore, an accurate and early diagnosis can be challenging. In addition, a depressive episode may later turn into a manic (hyper) state, which changes the diagnosis to bipolar depression [1]. In the literature, many studies that used conventional machine learning (ML) schemes for the automatic diagnosis of depression share a common limitation: the poor generalization of the trained classifiers. Because applying ML methods in clinical settings is an interdisciplinary research area, a lack of specific domain knowledge may also be a limitation. For example, each study has selected specific features on the assumption that these features could be useful for diagnosing depression. However, such studies could not achieve maximum classification accuracies and resulted in application-specific classifiers. As a result, the classifiers failed to perform well under new conditions.

This limitation mainly arises from the handcrafted features that are an integral part of conventional ML schemes. Handcrafted features may introduce bias, limit the full capacity of a classifier, or lead to overfitting. In contrast, deep learning architectures provide a paradigm shift over conventional ML schemes: they extract information from the EEG data automatically, which partially removes the constraint imposed by handcrafted features and could make them more reliable. Deep learning architectures may therefore improve the generalizability of classifiers so that they can be used across different clinical environments. In the context of EEG-based diagnosis of depression, only a few studies have used deep learning models. The first was proposed by Acharya et al. [2], who employed a convolutional neural network (CNN) with 13 layers and reported classification accuracies of 93.5% and 96% for the left and right hemispheres, respectively.

⁎ Corresponding author: W. Mumtaz.

https://doi.org/10.1016/j.ijmedinf.2019.103983 Received 22 June 2019; Received in revised form 20 August 2019; Accepted 27 September 2019 1386-5056/ © 2019 Elsevier B.V. All rights reserved.


W. Mumtaz and A. Qayyum

However, that study employed only 30 participants (15 depressed patients and 15 healthy controls). More recently, a study involving the same participants (15 depressed patients and 15 healthy controls) used a CNN-long short-term memory (CNN-LSTM) deep learning architecture [3] and reported classification accuracies of 99.12% and 97.66% for the right and left hemispheres, respectively. However, given the complexity of deep learning architectures, a sample size of 30 is modest for training and testing such a model. In addition, Li et al. proposed a CNN-based deep learning architecture for the automatic diagnosis of mild depression and reported 85% accuracy [4]. However, their method is specific to mild depression, whereas depression spans several severity levels, such as mild, moderate, and severe. Many papers have addressed the automatic diagnosis of depression with conventional classifiers (a summary is provided in the Discussion section); however, there are only three papers on deep learning methods for depression. All three are cited in this manuscript and used as benchmarks for comparison.

The novelty of this paper lies in the performance of the proposed methods, which is comparable to that of state-of-the-art methods, and in their evaluation with a larger study sample than in previous studies. More specifically, this manuscript evaluates two distinct deep learning architectures on a larger set of study participants (63 participants), and the Results section provides a proof of concept for these claims. Moreover, the proposed architectures use short segments of the EEG signal (1-second time steps) to capture temporal information better than existing deep learning models. The existing deep learning models for depression classification use long EEG segments, which may not capture the temporal dependences between samples optimally.

2. Methodology

2.1. Study participants
This study was conducted in the outpatient clinic of Hospital Universiti Sains Malaysia (HUSM). The study participants were volunteers and could leave the study at any time without providing a reason. Both the patients and the healthy participants signed consent forms for participation. The ethics committee of the hospital evaluated and approved the study design, and the experimental procedure was thoroughly explained to the participants. In particular, 33 MDD patients (females = 18; age mean = 40.33, SD = ±12.861) and an age-matched group of 30 healthy subjects (females = 9; age mean = 38.227, SD = ±15.64) were recruited. Clinical assessments for depression and resting-state EEG data were recorded. Only MDD patients who met the diagnostic criteria of the Diagnostic and Statistical Manual-IV (DSM-IV) [5], without any psychotic symptoms, were recruited.

2.2. Experimental data acquisition
In this study, both EEG data and clinical questionnaire data were recorded. Disease severity was assessed with the Beck Depression Inventory-II (BDI-II) and the Hospital Anxiety and Depression Scale (HADS) [6,7]. Resting-state EEG data were recorded for 5 min with eyes closed (EC) and for 5 min with eyes open (EO). EEG acquisition used a 19-channel EEG cap with sensors placed according to the 10–20 electrode placement standard [8]. The EEG data were originally recorded with a linked-ear (LE) reference and subsequently re-referenced to the infinity reference (IR) [9].

2.3. EEG noise reduction
EEG data are often confounded by artefacts, such as eye blinks and muscular activity, during the recording session. Because of these artefacts, the underlying EEG cannot be analysed directly without prior noise reduction; hence, artefact reduction is of fundamental importance before any further EEG analysis. Therefore, this study employed the multiple source eye correction (MSEC) method [10], as implemented in the Brain Electrical Source Analysis (BESA) software [11], to correct these artefacts. Briefly, noise topographies were estimated from the recorded EEG data and subtracted from the raw EEG recording. The construction of the noise topographies depends on the type of artefact to be cleaned, such as eye blinks, muscular activity, and cardiac activity. The procedure was semi-automatic: the experimenter had to define an artefact template, based on the recorded EEG data, before the process could start. The MSEC method then used a head model and the estimated noise topographies to clean the noise from the EEG data. The noise topographies were estimated with principal component analysis (PCA) of the recorded data, followed by a regression procedure. Further details of the method can be found elsewhere [9].

2.4. Proposed deep learning scheme
In this study, two deep learning models are presented. The first model is based on a one-dimensional convolutional neural network (1DCNN), while the second combines a 1DCNN with a long short-term memory (LSTM) network. The models are intended to capture both the spatial and the temporal characteristics of the EEG signal, and the 1DCNN can extract temporal features efficiently. The proposed architectures are based on distinct combinations of layers; they were evaluated on the EEG datasets acquired in this study and performed well compared with existing deep architectures. Fig. 1 shows an overview of the proposed ML framework, including the computation of the classifier performance metrics. The EEG data were segmented with a window length of one second: because the sampling rate was 256 samples per second, each EEG segment contains 19 channels and 256 data points. The one-second window was chosen empirically, as it gave the best results for the proposed models. Thus, the classifier input for each instance was 256 × 19 for both EEG datasets (EO and EC). The data were divided into training and testing sets in an 80:20 ratio. The training set was used to train the deep learning models to classify depressed and healthy states, and the testing set was used to evaluate classifier performance with several metrics (accuracy, precision, recall, and f1-score).

Fig. 1. Proposed machine learning model for automatic diagnosis of depression.

2.4.1. Deep learning architecture: model 1
Fig. 2 shows the block-level representation of the first proposed model, which is based on a one-dimensional convolutional neural network (1DCNN). The model uses convolutional and pooling layers with different filter sizes; Table 1 lists the model parameters. For training, both the batch size and the total number of epochs were 50. Early stopping was used, only the best-performing model was saved, and the output layer used a sigmoid activation function. The network has 11 layers, including the input layer, and comprises 1D convolutional, max-pooling, global average pooling, dropout, and fully connected layers. A 1DCNN is well suited to capturing temporal information, whereas 2D convolution is typically preferred for images or other spatial information. The input layer has dimension 256 × 19 (time steps × features). The first 1D convolutional layer used a 10 × 1 filter and 50 feature maps. Max-pooling layers reduce the temporal size of the feature maps; the first max-pooling layer used a 3 × 3 filter. The second and fourth convolutional layers used 50 feature maps with the same filter size as the first convolutional layer, while the third convolutional layer alone used 100 feature maps with a 10 × 1 filter. The network thus begins with two consecutive convolutional layers, followed by alternating convolutional and pooling layers. Global average pooling flattens the feature maps, and a dropout layer reduces overfitting and improves the adaptation of the network at test time. This model produced excellent classification results on the EEG dataset.

Fig. 2. Proposed model (1DCNN) based on 1D convolutional (C1, C2, C3) and pooling layers (P1, P2), global average pooling (GAP), dropout (DP) and fully connected layer (FC).

Table 1 Layer configuration of the proposed 1DCNN model for classification.

Layer                              Output size   Filter size   Feature maps
Input                              256 × 19      1 × 19        –
1D convolutional (C1)              247 × 50      10 × 1        50
1D convolutional (C2)              238 × 50      10 × 1        50
1D MaxPooling (P1)                 79 × 50       3 × 3         50
1D convolutional (C3)              70 × 50       10 × 1        100
1D MaxPooling (P2)                 23 × 100      3 × 3         100
1D convolutional (C4)              4 × 50        21 × 1        50
GlobalAveragePooling (GAP)         1 × 50        –             50
Dropout (DP)                       1 × 50        –             50
Fully connected (FC)               2             –             50 × 2
Classifier (sigmoid activation)    2             –             50 × 2

Table 2 Layer configuration of the proposed 1DCNN + LSTM model for classification.

Layer                              Output size   Filter size   Feature maps
Input                              256 × 19      –             –
1D convolutional (C1)              252 × 64      5 × 1         64
1D MaxPooling (P1)                 250 × 64      3 × 1         64
Dropout (DP1)                      250 × 64      3 × 3         64
1D convolutional (C2)              246 × 48      5 × 1         48
1D MaxPooling (P2)                 244 × 48      3 × 1         48
Dropout (DP2)                      244 × 48      –             48
1D convolutional (C3)              242 × 24      5 × 1         24
1D MaxPooling (P3)                 240 × 24      3 × 1         24
Dropout (DP3)                      240 × 24      –             24
LSTM1                              240 × 128     –             128
LSTM2                              240 × 64      –             64
Fully connected (FC)               2             –             64
Classifier (sigmoid activation)    2             –             64 × 2
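For model 1, the windowing described in Section 2.4 and the temporal output sizes listed in Table 1 can be reproduced with a few lines of arithmetic. The Python sketch below is not the authors' implementation, and the helper names (segment, conv1d_len, pool1d_len, model1_lengths, global_average_pool) are hypothetical. It assumes, beyond what the text states, that windows do not overlap, that convolutions are unpadded ("valid") with stride 1, and that the "3 × 3" pooling filter acts as a 1D pool of size 3 with stride 3. Under these assumptions the computed lengths match the C1 to P2 rows of Table 1.

```python
# Minimal sketch, not the authors' code. Assumptions (not stated in the paper):
# non-overlapping windows, 'valid' (unpadded) convolutions with stride 1, and
# 1D max pooling with pool size 3 and stride 3.

FS = 256          # sampling rate in Hz (from the text)
N_CHANNELS = 19   # electrodes in the 10-20 montage (from the text)


def segment(recording, win=FS):
    """Split a recording (list of time samples, each a list of channel values)
    into non-overlapping windows of `win` samples; a trailing partial window
    is dropped."""
    n_windows = len(recording) // win
    return [recording[i * win:(i + 1) * win] for i in range(n_windows)]


def conv1d_len(length, kernel, stride=1):
    """Output length of an unpadded ('valid') 1D convolution."""
    return (length - kernel) // stride + 1


def pool1d_len(length, pool, stride=None):
    """Output length of 1D max pooling; the stride defaults to the pool size."""
    stride = pool if stride is None else stride
    return (length - pool) // stride + 1


def model1_lengths(t=FS):
    """Temporal lengths after C1, C2, P1, C3 and P2 of model 1."""
    c1 = conv1d_len(t, 10)
    c2 = conv1d_len(c1, 10)
    p1 = pool1d_len(c2, 3)
    c3 = conv1d_len(p1, 10)
    p2 = pool1d_len(c3, 3)
    return [c1, c2, p1, c3, p2]


def global_average_pool(feature_map):
    """Average a (time x channels) feature map over time: one value per channel."""
    n_steps = len(feature_map)
    return [sum(col) / n_steps for col in zip(*feature_map)]


if __name__ == "__main__":
    # Five minutes of synthetic 19-channel data: 5 * 60 * 256 = 76,800 samples.
    recording = [[0.0] * N_CHANNELS for _ in range(5 * 60 * FS)]
    windows = segment(recording)
    print(len(windows), len(windows[0]), len(windows[0][0]))  # 300 256 19
    print(model1_lengths())  # [247, 238, 79, 70, 23]
    print(global_average_pool([[1.0, 2.0], [3.0, 4.0]]))  # [2.0, 3.0]
```

A 5-min EC or EO recording therefore yields 300 one-second instances of size 256 × 19 under the non-overlap assumption; overlapping windows would produce more instances from the same recording.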

2.4.2. Deep learning architecture: model 2
Fig. 3 shows the block-level representation of the second proposed model, called 1DCNN-LSTM because it cascades 1DCNN and LSTM layers. The model uses several 1D convolutional and pooling layers followed by two LSTM layers at the end of the network; Table 2 lists the model parameters. For training, the batch size was 50 and the total number of epochs was 50; early stopping was used, and the best results were saved. Different filter sizes were used for the convolutional and pooling layers. The first convolutional layer had 64 feature maps; the second had 48, a reduction chosen because it gave better experimental results; and the third had 24. The dropout layers used a dropout probability of 0.5 to avoid overfitting: dropout blocks the flow of information from randomly selected neurons during training, which makes the model more robust at test time. The LSTM layers capture temporal information and could improve accuracy. The model incorporates two LSTM layers with different numbers of neurons: the first LSTM used 64 neurons and the second used 128 neurons. A fully connected dense layer then combines the learned features at a more abstract level. In addition, the model uses a global average pooling layer to reduce the number of neurons in the last layers, which could also help avoid overfitting by reducing the number of trainable parameters. The model uses a sigmoid activation function because the task is a two-class classification problem; the sigmoid output ranges from zero to one and separates the positive and negative samples of the two classes. The 14-layer 1DCNN + LSTM model could therefore be a good design choice for EEG signals.

2.4.3. Performance metrics
In this study, the classification accuracy, precision, recall, and f1-score were computed from the true and predicted labels according to Eqs. (1) to (4), respectively:

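$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FN + FP + TN} \tag{1}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{2}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{3}$$

$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{4}$$

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.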
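Eqs. (1) to (4), together with the ROC analysis reported in the Results, can be computed directly from the true labels, the predicted labels, and the sigmoid output scores. The Python sketch below is illustrative only: it assumes that label 1 denotes the depressed (positive) class, and the function names are hypothetical. Sensitivity is the recall (true positive rate, TPR), and specificity equals 1 − FPR, so each ROC point pairs 1 − specificity with sensitivity at one decision threshold.

```python
# Illustrative sketch; label 1 = depressed (positive class) is an assumption.


def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 as in Eqs. (1)-(4)."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


def roc_curve_auc(y_true, scores):
    """ROC points (FPR, TPR) obtained by sweeping the decision threshold over the
    distinct scores (highest first), plus the trapezoidal area under the curve."""
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= thr)
        fp = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= thr)
        points.append((fp / n_neg, tp / n_pos))
    auc = sum((x1 - x0) * (y0 + y1) / 2
              for (x0, y0), (x1, y1) in zip(points, points[1:]))
    return points, auc


if __name__ == "__main__":
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
    counts = confusion_counts(y_true, y_pred)
    print(counts)                           # (3, 3, 1, 1)
    print(classification_metrics(*counts))  # (0.75, 0.75, 0.75, 0.75)
    _, auc = roc_curve_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
    print(auc)                              # 0.75
```

Returning zero when a denominator is empty is a design choice of this sketch; some libraries instead warn or return NaN in that case.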

2.4.4. Training the deep learning architectures
Several training hyperparameters were set to train the deep learning models, including the learning rate, the optimizer, and the loss function. The Adam optimizer was used for optimization, binary cross-entropy as the loss function, and sigmoid as the output activation function. The models were implemented with the Keras library and a TensorFlow backend, and the hyperparameters were chosen according to the best experimental results: the learning rate was 0.0003 for model 1 and 0.00004 for model 2.
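The combination of a sigmoid output, a binary cross-entropy loss, and the Adam optimizer can be illustrated on a toy problem. The Python sketch below is not the authors' Keras model: it trains a hypothetical one-feature logistic unit, standing in for the network's final sigmoid layer, with the textbook Adam update (Kingma and Ba). The learning rate of 0.0003 is the model 1 value from the text; β1 = 0.9, β2 = 0.999, and ε = 1e-7 are assumed values, and all names are hypothetical.

```python
import math

# Toy sketch, not the authors' model: a single logistic unit trained with
# binary cross-entropy and a hand-written Adam optimizer.


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def binary_cross_entropy(y, p, eps=1e-7):
    """BCE for one sample; p is clipped to avoid log(0)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


class Adam:
    """Textbook Adam update with bias correction (beta/epsilon values assumed)."""

    def __init__(self, n_params, lr=3e-4, beta1=0.9, beta2=0.999, eps=1e-7):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = [0.0] * n_params  # first-moment (mean) estimates
        self.v = [0.0] * n_params  # second-moment (uncentred variance) estimates
        self.t = 0

    def step(self, params, grads):
        self.t += 1
        new_params = []
        for i, (w, g) in enumerate(zip(params, grads)):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            new_params.append(w - self.lr * m_hat / (math.sqrt(v_hat) + self.eps))
        return new_params


def mean_bce(params, xs, ys):
    w, b = params
    return sum(binary_cross_entropy(y, sigmoid(w * x + b)) for x, y in zip(xs, ys)) / len(xs)


def train(xs, ys, lr=3e-4, steps=2000):
    """Fit weight and bias; returns (initial loss, final loss)."""
    params = [0.0, 0.0]
    opt = Adam(len(params), lr=lr)
    initial = mean_bce(params, xs, ys)
    for _ in range(steps):
        w, b = params
        # Gradient of BCE with respect to the logit is sigmoid(logit) - y.
        errs = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errs, xs)) / len(xs)
        grad_b = sum(errs) / len(xs)
        params = opt.step(params, [grad_w, grad_b])
    return initial, mean_bce(params, xs, ys)


if __name__ == "__main__":
    before, after = train([-2.0, -1.0, 1.0, 2.0], [0, 0, 1, 1])
    print(round(before, 4), after < before)  # 0.6931 True
```

With such a small learning rate each Adam step moves a parameter by roughly the learning rate, so the loss falls slowly from ln 2; this is one reason the number of epochs and early stopping matter in practice.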

Fig. 3. Proposed model (1DCNN-LSTM), consisting of a block of a 1D convolutional layer (C1), a pooling layer (P1), and a dropout layer (DP1), followed by the repeated blocks C2-P2-DP2 and C3-P3-DP3, LSTM (long short-term memory) blocks, and a fully connected layer (FC).
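Model 2 passes the convolutional feature maps through two stacked LSTM layers, with dropout (probability 0.5) in the convolutional blocks. The Python sketch below shows the standard LSTM recurrence (input, forget, cell, and output gates) and inverted dropout. It is not the authors' implementation: the weights are random placeholders, the forget-gate bias of 1 is an assumed initialization, and the names LSTMLayer and inverted_dropout are hypothetical. Each layer returns the hidden state at every time step, which is consistent with the time dimension being kept in the 240 × 128 and 240 × 64 output sizes listed for LSTM1 and LSTM2 in Table 2.

```python
import math
import random

# Minimal sketch of the recurrent part of model 2, not the authors' code.
# Weights are random placeholders; the demo uses deliberately small sizes.


def _sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class LSTMLayer:
    """LSTM layer that returns the hidden state at every time step."""

    GATES = ("input", "forget", "cell", "output")

    def __init__(self, input_dim, hidden, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden = hidden
        self.W = {g: mat(hidden, input_dim) for g in self.GATES}  # input weights
        self.U = {g: mat(hidden, hidden) for g in self.GATES}     # recurrent weights
        # Forget-gate bias of 1 (an assumed, commonly used initialization).
        self.b = {g: [1.0 if g == "forget" else 0.0] * hidden for g in self.GATES}

    def forward(self, sequence):
        h = [0.0] * self.hidden  # hidden state
        c = [0.0] * self.hidden  # cell state
        outputs = []
        for x in sequence:
            pre = {g: [wx + uh + bias for wx, uh, bias in
                       zip(_matvec(self.W[g], x), _matvec(self.U[g], h), self.b[g])]
                   for g in self.GATES}
            i = [_sigmoid(z) for z in pre["input"]]
            f = [_sigmoid(z) for z in pre["forget"]]
            cand = [math.tanh(z) for z in pre["cell"]]
            o = [_sigmoid(z) for z in pre["output"]]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, cand)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            outputs.append(h)
        return outputs


def inverted_dropout(values, rate=0.5, training=True, rng=None):
    """Zero each value with probability `rate` during training and rescale the
    survivors by 1 / (1 - rate); identity at test time."""
    if not training or rate == 0.0:
        return list(values)
    rng = rng or random.Random(0)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


if __name__ == "__main__":
    # 10 time steps of 24 features (the channel count after C3 in Table 2).
    seq = [[0.1] * 24 for _ in range(10)]
    out1 = LSTMLayer(24, 8, seed=1).forward(seq)
    out2 = LSTMLayer(8, 4, seed=2).forward(out1)
    print(len(out2), len(out2[0]))  # 10 4
    print(inverted_dropout([1.0, 1.0, 1.0, 1.0], training=False))  # [1.0, 1.0, 1.0, 1.0]
```

With the Table 2 sizes, the same class would be instantiated as LSTMLayer(24, 128) followed by LSTMLayer(128, 64); the demo uses smaller hidden sizes so that the pure-Python loop runs quickly.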

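The 80:20 hold-out split of Section 2.4 and the 10-fold cross-validation of Section 2.4.5 can be sketched as operations on instance indices. In this hypothetical sketch the unit being split is the 1-s window and shuffling uses a fixed seed; the text does not state whether windows from the same participant were kept together, so a participant-wise split (grouping indices by subject before splitting) is an equally possible design. The function names train_test_split and k_fold are illustrative and are not the scikit-learn functions of similar name.

```python
import random

# Hypothetical sketch: splits window indices. Splitting by participant instead
# would require grouping indices by subject first.


def train_test_split(indices, test_fraction=0.2, seed=0):
    """Shuffle indices and hold out `test_fraction` of them for testing."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    n_test = round(len(idx) * test_fraction)
    return idx[n_test:], idx[:n_test]


def k_fold(indices, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation; every index
    appears in exactly one test fold."""
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


if __name__ == "__main__":
    train, test = train_test_split(range(300))
    print(len(train), len(test))  # 240 60
    sizes = [len(fold_test) for _, fold_test in k_fold(range(300))]
    print(sizes)  # [30, 30, 30, 30, 30, 30, 30, 30, 30, 30]
```

Taking the best of the ten folds, rather than their mean, gives an optimistic estimate; reporting the mean and spread across folds is a common alternative.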

2.4.5. Testing the deep learning models
The proposed models were tested using 20% of the available EEG dataset. For further analysis, 10-fold cross-validation was used, and the Results section reports the best scores for both models. Testing was repeated 10 times, and the overall performance was taken as the highest result across the 10 iterations.

3. Results
Fig. 4 shows the performance metrics of the 1DCNN model (model 1) for the eyes-closed data (Fig. 4(a)) and the eyes-open data (Fig. 4(b)). The 1DCNN achieved its highest classification accuracy, 98.32%, for the eyes-open (EO) condition and more than 96% for the eyes-closed (EC) condition. For the EO dataset, its other performance metrics were precision = 99.78, recall = 98.34, and f-score = 97.65 (Fig. 4(b)). Fig. 5 shows the performance metrics of the 1DCNN + LSTM model (model 2) for the EC and EO datasets. The feature counts shown (5, 10, 14, and 19) were selected arbitrarily for illustration. The 1DCNN + LSTM model achieved its highest performance, accuracy = 95.97, precision = 99.23, recall = 93.67, and f-score = 95.14, with 19 features on the EO dataset (Fig. 5(b)). Fig. 6 shows the training, testing, and validation accuracies of the proposed models. The validation and testing accuracies were consistent across all feature counts, which indicates that the proposed models performed well. The training and validation accuracies were very close, which indicates that no overfitting occurred during training. The 1DCNN model achieved the highest accuracy for the EC dataset and comparatively better performance metrics overall on the depression dataset, while the combination of 1DCNN and LSTM produced better accuracy for the EO dataset during training. This model also performed better on the testing and validation data than state-of-the-art deep learning models for depression classification.

Figs. 7 and 8 compare classifier performance for the EC and EO datasets. Fig. 7 shows the accuracy of each fold k (k ranges from 1 to 10) of the 10-fold cross-validation for the 1DCNN, and Fig. 8 shows the corresponding cross-validation accuracy for the 1DCNN + LSTM model. In both cases, the EO data gave better classification results than the EC data. This could be due to drowsiness: during EO recordings, participants fixate on a point in front of them, feel more alert, and resist drowsiness better, whereas drowsiness could dominate during EC recordings.

Training and validation accuracy and losses: Fig. 9 shows the training and validation accuracy and loss. Fig. 9(a,b) shows the loss and training accuracy over 50 epochs for the 1DCNN model, and Fig. 9(c,d) shows the training and validation loss and accuracy for the 1DCNN + LSTM model. To avoid overfitting, the validation accuracy and loss should remain close to the training accuracy and loss. The 1DCNN model produced smooth training and validation loss curves (Fig. 9(b)), which suggests that it generalizes well and did not overfit the testing and validation data. For the 1DCNN + LSTM network, the validation and training losses did not improve after 20 epochs, and the validation accuracy degraded. This means that this network did not generalize beyond 20 epochs and needs further optimization of other parameters.

The ROC curve is another performance measure widely used for binary classification. It plots sensitivity against 1 − specificity over a range of decision thresholds, which supports robust model selection; for a robust model, both sensitivity and specificity should be high. Fig. 10(a) shows the ROC curves of the 1DCNN model, and Fig. 10(b) shows those of the 1DCNN + LSTM model. The 1DCNN achieved the highest performance, as described above, whereas the 1DCNN + LSTM did not achieve a good ROC curve for the EC case, as shown in Fig. 10(b).

Fig. 4. (a) Performance metrics (accuracy, precision, recall, and f1 score) for various numbers of features based on the 1DCNN model for eyes closed (EC); (b) performance metrics (accuracy, precision, recall, and f1 score) for various numbers of features based on the 1DCNN model for eyes open (EO).

Fig. 5. (a) Performance metrics (accuracy, precision, recall, and f1 score) for various numbers of features based on the 1DCNN + LSTM model for eyes closed (EC); (b) performance metrics (accuracy, precision, recall, and f1 score) for various numbers of features based on the 1DCNN + LSTM model for eyes open (EO).

Fig. 6. Accuracy for various numbers of features: (a) 1DCNN model, eyes closed (EC); (b) 1DCNN model, eyes open (EO); (c) 1DCNN + LSTM model, eyes closed (EC); (d) 1DCNN + LSTM model, eyes open (EO).

Fig. 7. Accuracy for 10-fold cross-validation of the 1DCNN model using 19 features for the eyes-closed and eyes-open datasets.

Fig. 8. Accuracy for 10-fold cross-validation of the 1DCNN + LSTM model using 19 features for the eyes-closed and eyes-open datasets.

Fig. 9. (a,b) Training accuracy and loss for the 1DCNN model; (c,d) training accuracy and loss for the 1DCNN + LSTM model.

Fig. 10. (a) ROC curves of the proposed 1DCNN model for the eyes-open and eyes-closed datasets; (b) ROC curves of the proposed 1DCNN + LSTM model for the eyes-open and eyes-closed datasets.

4. Discussion
The classification accuracy reported in this study was 98.32%. A previous study [2] reported classification accuracies of 93.5% and 96% for the left and right hemispheres, respectively. The present model was evaluated with more study participants, which provides a fairer assessment of the deep learning architecture, whereas Acharya et al. employed only 30 participants (15 depressed patients and 15 healthy controls). Moreover, the present study included depressed patients from all depression categories, i.e., mild, moderate, and severe. In contrast, Li et al. proposed a deep learning architecture for mild depression only [4] and reported 85% accuracy. Hence, the methods presented in this manuscript achieved considerably better classification results and add to the knowledge base on depression diagnosis. The literature on EEG-based machine learning (ML) methods for the automatic diagnosis of depression from 2001 to 2019 shows that the frequency of publication has risen since 2014. As shown in Table 3, more than 90% of the studies have emphasized the use of conventional ML methods and


2018 2019 2019 2019

2019

[2] [4] [30] [31]

Present Study

2015

[20]

2018

2015 2015

[18] [19]

[29]

2015

[17]

2017 2017 2018 2018

2012

[16]

[25] [26] [27] [28]

2010

[15]

2016 2016 2017

2010

[14]

[22] [23] [24]

2004

[13]

2016

2001

[12]

[21]

Year

Study

Deep Learning 13 layers CNN Deep Learning KNN multi layered perceptron neural network (MLPNN), radial basis function network (RBFN), linear discriminant analysis (LDA) Deep Learning

Least Square SVM

Linear Discriminant Analysis (LDA) Support Vector Machine (SVM) Support Vector Machine, K-Nearest Neighbor, Classification Trees, and Artificial Neural Network LR, SVM, and NB

BayesNet (BN), Support Vector Machine (SVM), Logistic Regression (LR), k-nearest neighbor (KNN) and RandomForest (RF) classifiers were used. And BestFirst (BF), GreedyStepwise (GSW), GeneticSearch (GS), LinearForwordSelection (LFS) and RankSearch (RS) SVM with RBF kernel neuro-fuzzy and artificial neural network algorithms LR, SVM, and NB

Decision Tree Classification

Support Vector Machine (SVM) Classifier K-Nearest Neighbors (K-NN), Linear Discriminant Analysis (LDA), Logistic regression

KNN and BPNN

Mixture of factor analysis (MFA)

None

Support Vector Machine (SVM)

Discriminant Analysis

Classifier

Table 3 Methods for EEG-based machine learning classification for diagnosing depression.

depressive and 15 control students with mild depression depressive and 10 control MDD patients and 30 healthy 33 MDD 30 healthy Controls

15 51 10 34

NA MDD = 65 patients 33 MDD patients and 30 healthy controls 17 depressive and 17 control 12 depressive and 12 control 92 depressive and 121 control 33 MDD patients and 30 healthy controls 15 depressive and 15 control

70 MDD 23 Controls 25 MDD 25 Controls 17 MDD 17 Controls 64 MDD 207 Controls 13 depressed females and 12 age matched controls 25 MDD 25 Controls NA 45 MDD 45 Controls 53 MDD 43 Controls 37 students with mild depression

Sample Size

Accuracy = 98.32%

Accuracies = 93% and 95% Accuracy = 85.62% Accuracies = 96% and 98.4% Accuracy = 93.33%

Accuracy = 88.92% Accuracy = 76.88% LR classifier (accuracy = 97.6%) NB classification (accuracy = 96.8%) and SVM (accuracy = 98.4%) Accuracy = 91% Accuracy = 81.23% Accuracy = 79.27% SVM classification accuracy = 98%; LR classification accuracy = 91.7%; NB classification Accuracy = 99.58%

Accuracies = 92.00% and 98.00%

Accuracy = 80%

Accuracy = 98% Accuracy = 90%

Accuracy = 91%

Accuracy = 92.9% and 94.2%

Accuracy = 85%

True detection rate = 88% for patients and 82% for controls

Accuracy = 94%

Accuracy = 91.3%

Classification Results


have reported promising results. In contrast, only two studies have advocated the use of deep learning, and they also reported promising classification results. However, the clinical applications of these methods are less clear, and more evidence on the effectiveness of ML models is required; the models proposed in this manuscript add to this evidence. The study has a few limitations. For example, the effects of antidepressants cannot be neglected, although the participants were advised to observe a washout period of at least 2 weeks before any data recording. Because the patients were outpatients, they were advised to refrain from caffeine intake and smoking; however, the possible negative effects of these factors could not be ruled out. The generalization of the proposed classification models needs further evaluation with larger and different study samples, so caution is needed before generalizing the results of this study. The classification models could be employed in a clinical environment; however, this requires a well-defined graphical user interface (GUI) that clinicians can easily learn. In particular, the GUI could provide a one-click solution that initiates EEG data recording and classifies the sample under test as either healthy or depressed.

5. Conclusion

This manuscript has shown the promise of deep learning architectures for the automatic diagnosis of unipolar depression. The advantage of the proposed deep learning framework over conventional ML frameworks is that it eliminates the explicit feature extraction stage: the network learns discriminative patterns directly from the EEG data. In addition, electroencephalography (EEG) offers a cheaper solution than expensive, non-portable modalities such as functional magnetic resonance imaging (fMRI) and magnetoencephalography (MEG). Hence, combining EEG with deep learning algorithms could make it feasible to treat patients at their doorsteps, and deep learning frameworks deployed in clinical settings could revolutionize treatment management for depression. Despite the limitations mentioned in the Discussion section, the classifier models have shown promising discrimination abilities. In short, EEG-based deep learning for unipolar depression could be utilized in clinical applications.

Author statement

We declare that there are no competing interests and that this work is an original submission. The study design and the sample size calculation received ethics approval.

Declaration of Competing Interest

This work is an original contribution, and the authors declare no conflict of interest.

Acknowledgements

The authors are thankful to their parents and teachers for their unforgettable love and support, which made the learning process easy and comfortable.

References

[1] C.L. Bowden, A different depression: clinical distinctions between bipolar and unipolar depression, J. Affect. Disord. 84 (2005) 117–125.
[2] U.R. Acharya, S.L. Oh, Y. Hagiwara, J.H. Tan, H. Adeli, D.P. Subha, Automated EEG-based screening of depression using deep convolutional neural network, Comput. Methods Programs Biomed. 161 (2018) 103–113.
[3] B. Ay, O. Yildirim, M. Talo, U.B. Baloglu, G. Aydin, S.D. Puthankattil, et al., Automated depression detection using deep representation and sequence learning with EEG signals, J. Med. Syst. 43 (2019) 205.
[4] X. Li, R. La, Y. Wang, J. Niu, S. Zeng, S. Sun, et al., EEG-based mild depression recognition using convolutional neural network, Med. Biol. Eng. Comput. (2019) 1–12.
[5] American Psychiatric Association, Diagnostic and Statistical Manual of Mental Disorders, (1994).
[6] U.R. Acharya, V. Sudarshan, H. Adeli, J. Santhosh, J. Koh, S. Puthankatti, et al., A novel depression diagnosis index using nonlinear features in EEG signals, Eur. Neurol. 74 (2015) 79–83.
[7] W.M.R.W. Mahmud, A. Awang, I. Herman, M.N. Mohamed, Analysis of the psychometric properties of the Malay version of Beck Depression Inventory II (BDI-II) among postpartum women in Kedah, north west of peninsular Malaysia, MJMS 11 (2004) 19.
[8] H.H. Jasper, The ten twenty electrode system of the international federation, Electroencephalogr. Clin. Neurophysiol. 10 (1958) 371–375.
[9] Y. Qin, P. Xu, D. Yao, A comparative study of different references for EEG default mode network: the use of the infinity reference, Clin. Neurophysiol. 121 (2010) 1981–1991.
[10] P. Berg, M. Scherg, A multiple source approach to the correction of eye artifacts, Electroencephalogr. Clin. Neurophysiol. 90 (1994) 229–241.
[11] K. Hoechstetter, P. Berg, M. Scherg, BESA research tutorial 4: distributed source imaging, BESA Res. Tut. (2010) 1–29.
[12] V. Knott, C. Mahoney, S. Kennedy, K. Evans, EEG power, frequency, asymmetry and coherence in male depression, Psychiatry Res. Neuroimaging 106 (2001) 123–140.
[13] I. Kalatzis, N. Piliouras, E. Ventouras, C.C. Papageorgiou, A.D. Rabavilas, D. Cavouras, Design and implementation of an SVM-based computer classification system for discriminating depressive patients from healthy controls using the P600 component of ERP signals, Comput. Methods Programs Biomed. 75 (2004) 11–22.
[14] M. Bachmann, J. Lass, A. Suhhova, H. Hinrikus, Spectral asymmetry and Higuchi’s fractal dimension measures of depression electroencephalogram, Comput. Math. Methods Med. 2013 (2013).
[15] A. Khodayari-Rostamabad, J.P. Reilly, G. Hasey, D. MacCrimmon, Diagnosis of psychiatric disorders using EEG data and employing a statistical decision model, Engineering in Medicine and Biology Society (EMBC), 2010 Annual International Conference of the IEEE, (2010), pp. 4006–4009.
[16] X. Zhang, B. Hu, L. Zhou, P. Moore, J. Chen, An EEG based pervasive depression detection for females, Joint International Conference on Pervasive Computing and the Networked World, (2012), pp. 848–861.
[17] U.R. Acharya, V.K. Sudarshan, H. Adeli, J. Santhosh, J.E. Koh, A. Adeli, Computer-aided diagnosis of depression using EEG signals, Eur. Neurol. 73 (2015) 329–336.
[18] U.R. Acharya, V.K. Sudarshan, H. Adeli, J. Santhosh, J.E. Koh, S.D. Puthankatti, et al., A novel depression diagnosis index using nonlinear features in EEG signals, Eur. Neurol. 74 (2015) 79–83.
[19] B. Hosseinifard, M.H. Moradi, R. Rostami, Classifying depression patients and normal subjects using machine learning techniques and nonlinear features from EEG signal, Comput. Methods Programs Biomed. 109 (2013) 339–345.
[20] M. Mohammadi, F. Al-Azab, B. Raahemi, G. Richards, N. Jaworska, D. Smith, et al., Data mining EEG signals in depression for their diagnostic value, BMC Med. Inform. Decis. Mak. 15 (2015) 108.
[21] X. Li, B. Hu, S. Sun, H. Cai, EEG-based mild depressive detection using feature selection methods and classifiers, Comput. Methods Programs Biomed. 136 (2016) 151–161.
[22] G.M. Bairy, U. Niranjan, S.D. Puthankattil, Automated classification of depression EEG signals using wavelet entropies and energies, J. Mech. Med. Biol. 16 (2016) 1650035.
[23] B. Mohammadzadeh, M. Khodabandelu, M. Lotfizadeh, Comparing diagnosis of depression in depressed patients by EEG, based on two algorithms: artificial Nerve Networks and Neuro-Fuzy Networks, Int. J. Epidemiol. Res. 3 (2016) 246–258.
[24] W. Mumtaz, L. Xia, S.S.A. Ali, M.A.M. Yasin, M. Hussain, A.S. Malik, Electroencephalogram (EEG)-based computer-aided technique to diagnose major depressive disorder (MDD), Biomed. Signal Process. Control 31 (2017) 108–115.
[25] M. Bachmann, J. Lass, H. Hinrikus, Single channel EEG analysis for detection of depression, Biomed. Signal Process. Control 31 (2017) 391–397.
[26] S.-C. Liao, C.-T. Wu, H.-C. Huang, W.-T. Cheng, Y.-H. Liu, Major depression detection from EEG signals using kernel eigen-filter-bank common spatial patterns, Sensors 17 (2017) 1385.
[27] H. Cai, J. Han, Y. Chen, X. Sha, Z. Wang, B. Hu, et al., A pervasive approach to EEG-based depression detection, Complexity 2018 (2018).
[28] W. Mumtaz, S.S.A. Ali, M.A.M. Yasin, A.S. Malik, A machine learning framework involving EEG-based functional connectivity to diagnose major depressive disorder (MDD), Med. Biol. Eng. Comput. 56 (2018) 233–246.
[29] M. Sharma, P. Achuth, D. Deb, S.D. Puthankattil, U.R. Acharya, An automated diagnosis of depression using three-channel bandwidth-duration localized wavelet filter bank with EEG signals, Cogn. Syst. Res. 52 (2018) 508–520.
[30] Y. Li, B. Hu, X. Zheng, X. Li, EEG-based mild depressive detection using differential evolution, IEEE Access 7 (2019) 7814–7822.
[31] S. Mahato, S. Paul, Detection of major depressive disorder using linear and nonlinear features from EEG signals, Microsyst. Technol. 25 (2019) 1065–1076.
