Biomedical Signal Processing and Control 49 (2019) 427–433
https://doi.org/10.1016/j.bspc.2018.08.029
Detection of Parkinson’s disease based on voice patterns ranking and optimized support vector machine

Salim Lahmiri a,∗, Amir Shmuel a,b

a The Montreal Neurological Institute, Department of Neurology and Neurosurgery, McGill University, Montreal, Canada
b Department of Physiology, Department of Biomedical Engineering, McGill University, Montreal, Canada

∗ Corresponding author: S. Lahmiri.

Article history: Received 28 August 2017; Received in revised form 6 August 2018; Accepted 27 August 2018

Keywords: Parkinson’s disease; Voice disorder; Features ranking; Support vector machine; Radial basis function; Bayesian optimization; Classification

Abstract

Parkinson’s disease (PD) is a neurodegenerative disorder that causes severe motor and cognitive dysfunctions. Several types of physiological signals can be analyzed to accurately detect PD by using machine learning methods. This work considers the diagnosis of PD based on voice patterns. In particular, we focus on assessing the performance of eight different pattern ranking techniques (also termed feature selection methods) when coupled with a nonlinear support vector machine (SVM) to distinguish between PD patients and healthy control subjects. The parameters of the radial basis function kernel of the SVM classifier were optimized by using a Bayesian optimization technique. Our results show that the receiver operating characteristic and Wilcoxon-based ranking techniques provide the highest sensitivity and specificity.

1. Introduction

Parkinson’s disease (PD) is a neurodegenerative disorder linked to the loss of dopamine-producing neurons in the basal ganglia [1]. The major symptoms of PD can be classified into motor and non-motor symptoms [1]. The first category includes tremor, muscular rigidity, bradykinesia, and postural instability, whereas the second category includes depression, executive dysfunctions, sleep disturbances, and autonomic impairments. Over recent years, efforts to understand and characterize PD have intensified. Recent studies have focused on detecting PD from several types of measurements, including the dynamics of electromyographic (EMG) signals [2], gait analysis [3–7], spontaneous cardiovascular oscillations [8], compound force signals [9], and the steadiness of syllable repetition [10].

To assist clinicians in distinguishing between PD patients and normal subjects, several computer-aided diagnosis (CAD) systems have been proposed. For instance, the authors of [11] used the fractional amplitude of low-frequency fluctuations in resting-state functional magnetic resonance imaging (RS-fMRI) together with a support vector machine (SVM) for classification. Results from 51 PD patients and 50 healthy controls, obtained with the leave-one-out cross-validation method (LOOM), showed that the proposed system distinguished PD patients from healthy control subjects with 92% sensitivity and 87% specificity. A system based on principal component analysis (PCA) for feature extraction from whole-brain structural magnetic resonance images (MRI) and SVM was proposed in [12]. Experimental results from 28 PD and 28 healthy control subjects indicated a mean accuracy above 92.7% under LOOM. A system that uses multilevel region-of-interest (ROI) features from structural brain MRI, filtering, wrapper feature selection methods, and a multi-kernel SVM was proposed in [13]. Experimental results from 69 PD patients and 103 normal controls showed that the proposed PD detection system achieved an accuracy of 85.78%, specificity of 87.79%, and sensitivity of 87.64% under a 10-fold cross-validation scheme. In another study, a PD detection system relied on radial basis function (RBF) neural networks trained with gait characteristics represented by gait dynamics [14]. That system was tested on gait patterns from 93 PD patients and 73 healthy controls using 5-fold cross-validation; the RBF neural networks achieved 96.39% accuracy, 96.77% sensitivity, and 95.89% specificity. Other studies used patterns from emotional information [15], handwriting [16], articulation disorders [17], and dysphonia measurements [18].

The main objective of this study is to distinguish PD patients from healthy control subjects by using an SVM trained with voice disorder patterns. Compared to alternative physiological measurements, speech-based patterns obtained noninvasively are informative characteristics that can discriminate PD patients from healthy control subjects [17,18].


Vocal impairment is a PD symptom that can be revealed up to five years prior to clinical diagnosis [19], and a clear majority of PD patients classically show some form of vocal disorder [20]. We present a detailed comparison of feature ranking techniques coupled with SVM for the task of PD detection. Voice patterns influence the performance of the classifier, since some of them may be redundant or irrelevant. Pattern ranking makes it possible to assess the relevance of the features to the class variable and to select the most distinctive ones. Therefore, pattern ranking methods can identify the most informative features to be used along with a classifier in the design of a CAD system. In this study, the performance of the SVM classifier under eight feature ranking techniques is examined in terms of accuracy, sensitivity, and specificity. The eight feature ranking techniques are the t-test, entropy, the Bhattacharyya statistic, the receiver operating characteristic (ROC), the Wilcoxon statistic, fuzzy mutual information (FMI), genetic algorithms (GA), and SVM recursive feature elimination with correlation bias reduction (RFE-CBR). We chose these statistical feature selection techniques because they are fast and effective [21]. In addition, genetic algorithms are inductive, adaptive random search techniques capable of exploiting accumulating information about an unknown search space to direct subsequent search into promising subspaces. They are also fundamentally domain-independent search techniques, suitable when domain knowledge and theory are difficult or impossible to provide [22]. Finally, the SVM-based RFE-CBR is an embedded feature selection algorithm (or a wrapper-based technique) that uses criteria derived from the coefficients of SVM models to assess features; it recursively removes features that are not informative (a minimal illustration of plain recursive feature elimination is given at the end of this section). Compared to other wrapper-based techniques, the SVM-based RFE-CBR does not use the cross-validation accuracy on the training data as the selection criterion. Therefore, it is less prone to overfitting and remains fast even when the original feature set is large [23]. In summary, in the current work we focus only on fast feature selection techniques chosen from statistical filters, evolutionary algorithms, and SVM-based embedded feature selection.

We rely on the SVM [24] as the main classifier for the following reasons. Based on structural risk minimization, the SVM classifier can evade local minima and has excellent generalization ability [17]. It is robust to limited data [24], and it performs better than linear discriminant analysis, k-nearest neighbors, naïve Bayes, regression trees, and radial basis function networks in identifying PD patients based on dysphonia measurements [25]. In general, it is effective in biomedical data classification problems [11,14,26–28].
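For readers who want to experiment with this type of embedded feature selection, the following is a minimal Python sketch of plain SVM-based recursive feature elimination using scikit-learn. It omits the correlation bias reduction step of RFE-CBR [23,36], and the arrays X and y are illustrative placeholders rather than the dataset analyzed in this study.

```python
# Minimal sketch of SVM-based recursive feature elimination (plain RFE,
# without the correlation bias reduction refinement described in [23,36]).
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

X = np.random.rand(195, 22)        # placeholder for the 22 voice patterns
y = np.random.randint(0, 2, 195)   # placeholder labels (0 = HC, 1 = PD)

# A linear SVM exposes coefficients that RFE uses to score features;
# the least informative feature is removed at each iteration.
rfe = RFE(estimator=SVC(kernel="linear", C=1.0), n_features_to_select=1, step=1)
rfe.fit(X, y)

# ranking_[j] == 1 marks the most informative feature; larger values were
# eliminated earlier.
print(rfe.ranking_)
```

Setting n_features_to_select=1 forces the elimination to continue until a complete ordering of the features is produced, which is the form of output compared against the statistical ranking techniques in this work.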

2. Data and methods

In this work, a reduced dataset from [18] is used to examine the performance of the SVM classifier under each feature ranking technique. The dataset in [18] contains 132 voice patterns, whereas the current dataset contains 22. The current dataset comprises 195 vowel phonations, of which 147 were recorded from PD patients and 48 from healthy control (HC) subjects. For each vowel phonation, a set of 22 voice patterns is measured, including the average vocal fundamental frequency (Fo), maximum vocal fundamental frequency (Fhi), minimum vocal fundamental frequency (Flo), jitter (%), jitter absolute value (Abs), relative amplitude perturbation (RAP), period perturbation quotient (PPQ), difference of differences between cycles divided by the average period (DDP), local shimmer, shimmer in decibels (dB), three-point amplitude perturbation quotient (APQ3), five-point amplitude perturbation quotient (APQ5), amplitude perturbation quotient (APQ), average absolute difference between consecutive differences between amplitudes of consecutive periods (DDA), noise-to-harmonics ratio (NHR), harmonics-to-noise ratio (HNR), recurrence period density entropy (RPDE), detrended fluctuation analysis (DFA), two measures of spread (Spread 1, Spread 2), correlation dimension (D2), and pitch period entropy (PPE). A full description of these patterns can be found in [18]. Fig. 1 presents the cumulative distribution function (CDF) of each vocal pattern, separately for HC subjects and PD patients.

Fig. 1. Plots of the cumulative distribution function (CDF). Blue and red lines represent data from healthy control subjects and Parkinson’s patients, respectively. For each panel, the value of the voice pattern runs along the horizontal axis and its corresponding CDF value along the vertical axis. See the methods section for the various measures extracted from the voice.

As indicated previously, eight feature ranking techniques were employed, namely the t-test [29], entropy [30], the Bhattacharyya statistic [31], ROC [32], the Wilcoxon statistic [33], FMI [34], GA [35], and RFE-CBR [36]. Voice patterns are therefore ranked from the most to the least informative according to each ranking technique. We then investigate SVM classification accuracy as a function of the k best ranked voice patterns for each ranking technique: the SVM is trained with the first ranked voice pattern, then with the first and second ranked voice patterns, and so on until all patterns are used. Thus, training and testing of the SVM is carried out for the k highest ranked voice patterns, with k ranging from one (the first ranked pattern) to twenty-two, the total number of voice patterns. In this framework, each set of the k highest ranked voice patterns represents a subset of features, as illustrated in the sketch below.
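As an illustration of this protocol, the following Python sketch ranks each voice pattern by its individual area under the ROC curve and then evaluates an RBF-kernel SVM on the top-k ranked patterns for k = 1, ..., 22 under ten-fold cross-validation. It relies on scikit-learn rather than the authors' original implementation, and X and y are placeholders for the 195 × 22 feature matrix and the class labels.

```python
# Sketch of ROC-based feature ranking followed by incremental evaluation:
# train an RBF-SVM on the top-k ranked voice patterns for k = 1..22.
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X = np.random.rand(195, 22)        # placeholder feature matrix
y = np.random.randint(0, 2, 195)   # placeholder labels (0 = HC, 1 = PD)

# Rank each feature by how well it alone separates the two classes.
auc = np.array([roc_auc_score(y, X[:, j]) for j in range(X.shape[1])])
order = np.argsort(-np.abs(auc - 0.5))   # most discriminative first

accuracies = []
for k in range(1, X.shape[1] + 1):
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    scores = cross_val_score(clf, X[:, order[:k]], y, cv=10, scoring="accuracy")
    accuracies.append(scores.mean())

print(max(accuracies), int(np.argmax(accuracies)) + 1)  # best accuracy and its k
```

Repeating the same loop with the ordering produced by any of the other seven ranking techniques yields curves analogous to those in Figs. 2–4.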


Fig. 2. SVM accuracy as a function of the number of patterns.

SVM [24] is a powerful nonlinear classifier that separates classes by a hyper-plane constructed under structural risk minimization. The margin between the classes and the constructed hyper-plane is maximized. In particular, the linear SVM is given by

$$ y = f(x) = w^{T}x - b \tag{1} $$

where x is the data, y is the class label, w is the weight vector orthogonal to the decision hyper-plane, b is the offset of the hyper-plane, and T denotes the transpose. The solution of the linear SVM is found by maximizing the margin that separates the classes, which is equivalent to solving the following minimization problem:

$$ \min_{w,b,\xi}\; \frac{1}{2}w^{T}w + C\sum_{i=1}^{n}\xi_{i} \tag{2} $$

subject to

$$ y_{i}\left(w^{T}x_{i} - b\right) \ge 1 - \xi_{i}, \qquad \xi_{i} \ge 0,\ i = 1, 2, \ldots, n \tag{3} $$

where ξi are slack variables used to indicate the allowed degree of classification error, C > 0 is a penalty parameter that sets an upper bound on the error, and n is the number of instances. The nonlinear SVM classifier employs a kernel function K to separate the data nonlinearly. It is expressed as follows:

$$ f(x) = \operatorname{sign}\left(\sum_{i=1}^{n} y_{i}\alpha_{i}K(x, x_{i}) + b\right) \tag{4} $$

where αi are the Lagrange multipliers, K is a kernel function, and b is a constant coefficient. In our work, we adopt the radial basis function (RBF) as the nonlinear kernel. It is given by

$$ K(x, x_{i}) = \exp\left(-\delta\,\|x - x_{i}\|^{2}\right) \tag{5} $$

where δ > 0 is a scale parameter. In our study, the value of the slack variable ξ is set to 0.001. For each ranking technique and each run, we optimize the penalty parameter C and the scale parameter δ of the cross-validated SVM classifier by using Bayesian optimization [37]. For robust validation of the SVM classifier, the training and testing stages apply ten-fold cross-validation. Then, the average values of common performance measures, including accuracy, specificity, and sensitivity, are computed to evaluate the performance of the SVM under each feature ranking technique. Recall that for each feature ranking approach, the features are sorted so that those with the highest discriminative power are considered first. Afterward, the SVM classifier is trained with the first ranked voice pattern, the first and second ranked voice patterns, and so on until all twenty-two voice patterns of the dataset are used. Accordingly, the performance of the SVM classifier is calculated for each subset of the k highest ranked voice patterns, where k ranges from one to twenty-two.
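The hyper-parameter search can be sketched as follows in Python, using scikit-optimize's BayesSearchCV as one possible stand-in for the Bayesian optimization routine used in this work; the search ranges, iteration budget, and tolerance setting are illustrative assumptions rather than values reported here.

```python
# Sketch of tuning the RBF-SVM penalty C and kernel scale (gamma, playing
# the role of the scale parameter delta) by Bayesian optimization with
# ten-fold cross-validation. Search ranges and n_iter are assumptions.
import numpy as np
from sklearn.svm import SVC
from skopt import BayesSearchCV
from skopt.space import Real

X = np.random.rand(195, 22)        # placeholder feature matrix
y = np.random.randint(0, 2, 195)   # placeholder labels (0 = HC, 1 = PD)

opt = BayesSearchCV(
    SVC(kernel="rbf", tol=1e-3),   # 1e-3 tolerance, used here as an analogue of the 0.001 setting above
    {"C": Real(1e-3, 1e3, prior="log-uniform"),
     "gamma": Real(1e-4, 1e1, prior="log-uniform")},
    n_iter=30,
    cv=10,
    scoring="accuracy",
    random_state=0,
)
opt.fit(X, y)
print(opt.best_params_, opt.best_score_)
```

In the study's protocol, such a search is repeated for every subset of k top-ranked patterns and for every ranking technique.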

3. Results

For each feature ranking technique, voice patterns were ranked from the most to the least informative. Then, the ten-fold cross-validation method was used to train and test the SVM classifier with each subset of ranked voice patterns. Following the ten-fold cross-validation, the average and standard deviation of each performance measure were calculated (a brief computational sketch is given below). Figs. 2–4 present the average accuracy, specificity, and sensitivity of the SVM classifier as a function of the number of highest ranked patterns used for training and classification.

According to Fig. 2, the highest SVM classification accuracy (92.21%) in distinguishing between PD and HC was achieved when using 14 patterns selected by the Wilcoxon-based pattern ranking technique. Conversely, the lowest SVM classification accuracy (71.20%) was obtained with a single pattern selected by the entropy-based pattern ranking technique. According to Fig. 3, the highest SVM specificity (82.79%) was obtained with 13 patterns selected by the ROC-based pattern ranking technique. The lowest SVM specificity (0%) was obtained with a single pattern selected by the Bhattacharyya, genetic algorithm, and t-test based pattern ranking techniques. When using up to seven patterns, the RFE-CBR based ranking method outperformed all other ranking techniques in terms of specificity. Fig. 4 shows that the highest SVM sensitivity (99.63%) was obtained with only a single pattern selected by the ROC-based pattern ranking technique. The lowest SVM sensitivity (85.58%) was obtained with 17 patterns ranked by the entropy-based pattern ranking technique.

Figs. 5–7 present boxplots of the accuracy, specificity, and sensitivity distributions for all pattern ranking techniques under study. According to the distributions of accuracy shown in Fig. 5, the ROC and Wilcoxon ranking techniques are better suited for coupling with the SVM than the other ranking techniques. In addition, the Bhattacharyya-based and entropy-based ranking techniques yield lower accuracy than the other techniques. Inspecting the distributions of specificity in Fig. 6, the ROC, RFE-CBR, and Wilcoxon based pattern ranking techniques perform better than the other techniques.
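As a concrete illustration of how these averages can be computed, the following Python sketch derives accuracy, sensitivity, and specificity from the confusion matrix of each of the ten folds and then averages them, treating PD as the positive class. The classifier settings and the placeholder data are illustrative and are not those of the study.

```python
# Sketch of averaging accuracy, sensitivity, and specificity over the ten
# cross-validation folds, with PD (label 1) treated as the positive class.
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

X = np.random.rand(195, 22)        # placeholder feature matrix
y = np.random.randint(0, 2, 195)   # placeholder labels (0 = HC, 1 = PD)

acc, sens, spec = [], [], []
for train_idx, test_idx in StratifiedKFold(n_splits=10, shuffle=True,
                                           random_state=0).split(X, y):
    clf = SVC(kernel="rbf").fit(X[train_idx], y[train_idx])
    tn, fp, fn, tp = confusion_matrix(y[test_idx], clf.predict(X[test_idx]),
                                      labels=[0, 1]).ravel()
    acc.append((tp + tn) / (tp + tn + fp + fn))
    sens.append(tp / (tp + fn))    # proportion of PD recordings detected
    spec.append(tn / (tn + fp))    # proportion of HC recordings correctly rejected

print(np.mean(acc), np.std(acc), np.mean(sens), np.mean(spec))
```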

Fig. 3. SVM specificity as a function of the number of patterns.

Fig. 4. SVM sensitivity as a function of the number of patterns.

Fig. 5. Boxplot of the SVM accuracy of each of the pattern ranking techniques. The symbol ‘+’ indicates an outlier.


Fig. 6. Boxplot of the SVM specificity of each of the pattern ranking techniques. The symbol ‘+’ indicates an outlier.

Fig. 7. Boxplot of the SVM sensitivity of each of the pattern ranking techniques. The symbol ‘+’ indicates an outlier.

The entropy-based and the fuzzy mutual information ranking techniques yield lower specificity than the other ranking methods. Finally, examining the distributions of sensitivity in Fig. 7, one can observe that the ROC-based and the fuzzy mutual information pattern ranking techniques yield high sensitivity. In contrast, the RFE-CBR method provides lower sensitivity values.

4. Discussion and conclusion

PD is a common neurodegenerative disorder causing motor and cognitive dysfunction. Several studies have been conducted to understand its physiological aspects [2–10] and to design CAD systems that accurately detect PD [15–18].

CAD systems are commonly based on machine learning, and in many cases they incorporate a feature selection scheme. We examined the effectiveness of the SVM coupled with several pattern ranking techniques in distinguishing PD patients from healthy control subjects using voice characteristics. Pattern ranking evaluates the relevance of the patterns to the classes, so that the most informative features can be fed to the SVM classifier. We selected the SVM for classification because of its ability to map the original patterns into a high-dimensional space and to construct an optimal boundary hyper-plane in that space by using nonlinear kernel functions. In addition, it attains the global optimum and is robust even when the original data sample is small. Moreover, the SVM classifier was also selected based on its success in various biomedical science and engineering applications [11,14,26–28,38–40].


Table 1
Comparison with other studies.

Work | Basis | Features | Classifier | Accuracy/Sensitivity/Specificity
[11] | RS-fMRI | Fractional amplitude of low-frequency fluctuations | SVM | NA / 92% / 87%
[12] | MRI | Principal component analysis | SVM | 92.7% / NA / NA
[13] | MRI | Multilevel regions of interest | SVM | 85.78% / 87.79% / 87.64%
[14] | Gait | Dynamics of vertical ground reaction force | RBFNN | 96.39% / 96.77% / 95.89%
[15] | Emotions | Higher order spectral features | SVM | 85.85% ± 4.88 / 93.16% ± 3.19 / NA
[16] | Handwriting | Statistics of horizontal and vertical directions | NB | 91% / 88% / 95%
[17] | Articulatory | Standard articulatory features | SVM | 88% / NA / NA
[18] | Speech | 132 phonation features | SVM | 97.7% ± 2.8 / NA / NA
[18] | Speech | Relief applied to 132 features | SVM | 98.6% ± 2.1 / 99.2% ± 1.8 / 95.1% ± 8.4
Current | Speech | Ranking of 22 features using ROC | SVM + BO | 92.13% / 82.79% / 95.27%
Current | Speech | All 22 phonation features | SVM + BO | 91.82% / 80.72% / 95.02%

In the current work, the performance obtained by ranking the 22 phonation features using the ROC is based on 13 features, with the SVM classifier optimized by Bayesian optimization. RS-fMRI: resting-state functional magnetic resonance imaging. RBFNN: radial basis function neural network. NB: naïve Bayes classifier. BO: Bayesian optimization. NA: information not available.

There are several kernel functions that could be used with the SVM, including the linear, quadratic, polynomial, and multilayer perceptron kernels. However, we focused on the radial basis function, which is a common, local, and flexible kernel. We did not consider a linear kernel because it does not perform well in separating data with a nonlinear boundary. The SVM parameters were optimized by using Bayesian optimization [37], which is a fast and effective optimization technique.

In this work, we focused on preprocessing techniques that rank patterns based on their capacity to discriminate between instances of the classes before induction takes place. They are simple to implement and interpret, and they form an appealing alternative to wrapper techniques, which are more demanding computationally. Indeed, wrapper techniques are based on an induction algorithm that is assessed over each considered pattern set. For each pattern ranking technique considered in this study, the computations of the ten-fold cross-validation protocol were completed in no more than a few seconds. Only one wrapper-based technique was considered in this study for comparison purposes, namely the SVM-RFE-CBR model, in which a recursive feature elimination (RFE) procedure is employed to shrink the effect of correlation bias among patterns. Consequently, the computational cost of the SVM-RFE-CBR was significantly higher than that of the other techniques.

The obtained results show that the SVM classifier achieved the highest classification accuracy (92.21%) with the first fourteen voice patterns identified by the Wilcoxon-based pattern ranking technique. The SVM achieved the highest specificity (82.79%) when trained with the first thirteen voice patterns identified by the ROC-based pattern ranking technique. The SVM yielded the highest sensitivity (99.63%) with only one voice pattern under the ROC-based pattern ranking technique. These results are of interest, since they can guide the design of CAD systems for PD with promising applications. Overall, the SVM classifier achieved 91.82% accuracy, 80.72% sensitivity, and 95.02% specificity when trained with all 22 phonation-based features. Using the 13 top-ranked features from the ROC-based ranking yielded 92.13% accuracy, 82.79% sensitivity, and 95.27% specificity. Therefore, decreasing the number of phonation features leads to small improvements in accuracy, sensitivity, and specificity.

In summary, our study shows that the ROC-based and Wilcoxon-based pattern ranking techniques combined with the SVM classifier perform well relative to the other techniques considered in this work. ROC achieved the highest sensitivity (99.63%) with only one voice pattern, the highest specificity (82.79%) with thirteen voice patterns, and the second best accuracy with 13 patterns (92.13%, against 92.21% with 14 patterns obtained by the Wilcoxon-based ranking). The strong performance of the ROC ranking can be explained by the fact that the ROC approach seeks an effective compromise between sensitivity and specificity.

Finally, for illustration purposes, Table 1 compares the results of previous studies that used different modalities and methods for PD detection. The work of [18] yielded higher performance than that obtained in the current study. This is mainly because the dataset in [18] contained a larger number of phonation features than the one used in the current study: 132 compared to 22. Therefore, informative and discriminative phonation features that are available in the dataset used in [18] could not be used in our study. Unfortunately, we do not have access to the comprehensive dataset of [18]. According to the results presented in Table 1, PD detection systems based on speech yield better results than those based on MRI, emotions, or handwriting characteristics. In addition, PD detection systems based on gait patterns yield high performance measures. Lastly, there is no specific theory on the choice of features used to identify PD subjects. Therefore, multimodal feature-based systems for PD diagnosis need to be explored; such systems may further improve accuracy.

Acknowledgement

Supported by the Natural Sciences and Engineering Research Council of Canada (RGPIN 2015-05103).

References

[1] R. Yuvaraj, M. Murugappan, N. Mohamed Ibrahim, K. Sundaraj, M.I. Omar, K. Mohamad, R. Palaniappan, Optimal set of EEG features for emotional state classification and trajectory visualization in Parkinson’s disease, Int. J. Psychophysiol. 94 (2014) 482–495.
[2] G. De Michele, S. Sello, M. Chiara Carboncini, B. Rossi, S.-K. Strambi, Cross-correlation time-frequency analysis for multiple EMG signals in Parkinson’s disease: a wavelet approach, Med. Eng. Phys. 25 (2003) 361–369.
[3] M.R. Daliri, Chi-square distance kernel of the gaits for the diagnosis of Parkinson’s disease, Biomed. Signal Process. Control 8 (2013) 66–70.
[4] Y. Xia, Q. Gao, Q. Ye, Classification of gait rhythm signals between patients with neuro-degenerative diseases and normal subjects: experiments with statistical features and different classification models, Biomed. Signal Process. Control 18 (2015) 254–262.
[5] B.L. Su, R. Song, L.Y. Guo, C.W. Yen, Characterizing gait asymmetry via frequency sub-band components of the ground reaction force, Biomed. Signal Process. Control 18 (2015) 56–60.
[6] Y. Wu, P. Chen, X. Luo, M. Wu, L. Liao, S. Yang, R.M. Rangayyan, Measuring signal fluctuations in gait rhythm time series of patients with Parkinson’s disease using entropy parameters, Biomed. Signal Process. Control 31 (2017) 265–271.
[7] M. Eltoukhy, C. Kuenze, J. Oh, M. Jacopetti, S. Wooten, J. Signorile, Microsoft Kinect can distinguish differences in over-ground gait between older persons with and without Parkinson’s disease, Med. Eng. Phys. 44 (2017) 1–7.
[8] G. Valenza, S. Orsolini, S. Diciotti, L. Citi, E.P. Scilingo, M. Guerrisi, S. Danti, C. Lucetti, C. Tessa, R. Barbieri, N. Toschi, Assessment of spontaneous cardiovascular oscillations in Parkinson’s disease, Biomed. Signal Process. Control 26 (2016) 80–89.
[9] S. Bilgin, The impact of feature extraction for the classification of amyotrophic lateral sclerosis among neurodegenerative diseases and healthy subjects, Biomed. Signal Process. Control 31 (2017) 288–294.

[10] S. Skodda, Steadiness of syllable repetition in early motor stages of Parkinson’s disease, Biomed. Signal Process. Control 17 (2015) 55–59.
[11] Y. Tang, L. Meng, C.-M. Wan, Z.-H. Liu, W.-H. Liao, X.-X. Yan, X.-Y. Wang, B.-S. Tang, J.-F. Guo, Identifying the presence of Parkinson’s disease using low-frequency fluctuations in BOLD signals, Neurosci. Lett. 645 (2017) 1–6.
[12] C. Salvatore, A. Cerasa, I. Castiglioni, F. Gallivanone, A. Augimeri, M. Lopez, G. Arabia, M. Morelli, M.C. Gilardi, A. Quattrone, Machine learning on brain MRI data for differential diagnosis of Parkinson’s disease and Progressive Supranuclear Palsy, J. Neurosci. Methods 222 (2014) 230–237.
[13] B. Peng, S. Wang, Z. Zhou, Y. Liu, B. Tong, T. Zhang, Y. Dai, A multilevel-ROI-features-based machine learning method for detection of morphometric biomarkers in Parkinson’s disease, Neurosci. Lett. 651 (2017) 88–94.
[14] W. Zeng, F. Liu, Q. Wang, Y. Wang, L. Ma, Y. Zhang, Parkinson’s disease classification using gait analysis via deterministic learning, Neurosci. Lett. 633 (2016) 268–278.
[15] R. Yuvaraj, M. Murugappan, N.M. Ibrahim, K. Sundaraj, M.I. Omar, K. Mohamad, R. Palaniappan, Detection of emotions in Parkinson’s disease using higher order spectral features from brain’s electrical activity, Biomed. Signal Process. Control 14 (2014) 108–116.
[16] C. Kotsavasiloglou, N. Kostikis, D. Hristu-Varsakelis, M. Arnaoutoglou, Machine learning-based classification of simple drawing movements in Parkinson’s disease, Biomed. Signal Process. Control 31 (2017) 174–180.
[17] M. Novotný, J. Rusz, R. Čmejla, E. Růžička, Automatic evaluation of articulatory disorders in Parkinson’s disease, IEEE/ACM Trans. Audio Speech Lang. Process. 22 (2014) 1366–1378.
[18] A. Tsanas, M.A. Little, P.E. McSharry, J. Spielman, L.O. Ramig, Novel speech signal processing algorithms for high-accuracy classification of Parkinson’s disease, IEEE Trans. Biomed. Eng. 59 (2012) 1264–1271.
[19] B. Harel, M. Cannizzaro, P.J. Snyder, Variability in fundamental frequency during speech in prodromal and incipient Parkinson’s disease: a longitudinal case study, Brain Cogn. 56 (2004) 24–29.
[20] A. Ho, R. Iansek, C. Marigliani, J. Bradshaw, S. Gates, Speech impairment in a large sample of patients with Parkinson’s disease, Behav. Neurol. 11 (1998) 131–137.
[21] S. Lahmiri, C. Gargour, M. Gabrea, Statistical features selection and pathologies detection in retina digital images, in: Proceedings of the 38th Annual Conference of the IEEE Industrial Electronics Society, 2012, pp. 1585–1590.
[22] K. De Jong, Learning with genetic algorithms: an overview, Machine Learning, Vol. 3, Kluwer Academic Publishers, 1988.
[23] K. Yan, D. Zhang, Feature selection and analysis on correlated gas sensor data with recursive feature elimination, Sens. Actuators B: Chem. 212 (2015) 353–363.
[24] V.N. Vapnik, The Nature of Statistical Learning Theory, Springer, New York, 1995.
[25] S. Lahmiri, D.A. Dawson, A. Shmuel, Performance of machine learning methods in diagnosing Parkinson’s disease based on dysphonia measures, Biomed. Eng. Lett. 8 (2018) 29–39.
[26] R. Khelifi, M. Adel, S. Bourennane, Segmentation of multispectral images based on band selection by including texture and mutual information, Biomed. Signal Process. Control 20 (2015) 16–23.
[27] S. Lahmiri, Image characterization by fractal descriptors in variational mode decomposition domain: application to brain magnetic resonance, Physica A 456 (2016) 235–243.
[28] S. Lahmiri, Glioma detection based on multi-fractal features of segmented brain MRI by particle swarm optimization techniques, Biomed. Signal Process. Control 31 (2017) 148–155.
[29] U. Fayyad, K. Irani, Multi-interval discretization of continuous valued attributes for classification learning, in: Proceedings of the International Joint Conference on Artificial Intelligence, 1993, pp. 1022–1029.
[30] S. Kullback, R.A. Leibler, On information and sufficiency, Ann. Math. Stat. 22 (1951) 79–86.
[31] Y. Chen, L. Zhang, J. Li, Y. Shi, Domain driven two-phase feature selection method based on Bhattacharyya distance and kernel distance measurements, in: Proceedings of the IEEE International Conferences on Web Intelligence and Intelligent Agent Technology, 2011, pp. 217–220.
[32] H. Mamitsuka, Selecting features in microarray classification using ROC curves, Pattern Recognit. 39 (2006) 2393–2404.
[33] J.L. Myers, A. Well, Research Design and Statistical Analysis, Lawrence Erlbaum Associates, Mahwah, New Jersey, USA, 2003.
[34] N. Hoque, H.A. Ahmed, D.K. Bhattacharyya, J.K. Kalita, A fuzzy mutual information-based feature selection method for classification, Fuzzy Inf. Eng. 8 (2016) 355–384.
[35] I. Beheshti, H. Demirel, H. Matsuda, for the Alzheimer’s Disease Neuroimaging Initiative, Classification of Alzheimer’s disease and prediction of mild cognitive impairment-to-Alzheimer’s conversion from structural magnetic resonance imaging using feature ranking and a genetic algorithm, Comput. Biol. Med. 83 (2017) 109–119.
[36] I. Guyon, J. Weston, S. Barnhill, V. Vapnik, Gene selection for cancer classification using support vector machines, Mach. Learn. 46 (2002) 389–422.
[37] T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, second edition, Springer-Verlag, New York, 2009.
[38] S. Lahmiri, M. Boukadoum, New approach for automatic classification of Alzheimer’s disease, mild cognitive impairment and healthy brain magnetic resonance images, IET Healthcare Technol. Lett. 1 (2014) 32–36.
[39] S. Lahmiri, M. Boukadoum, Hybrid discrete wavelet transform and Gabor filter banks processing for features extraction from biomedical images, J. Med. Eng. (2013) 104684, http://dx.doi.org/10.1155/2013/104684.
[40] S. Lahmiri, An accurate system to distinguish between normal and abnormal electroencephalogram records with epileptic seizure free intervals, Biomed. Signal Process. Control 40 (2018) 312–317.