Computers and Electrical Engineering 68 (2018) 366–380
Combining extreme learning machine with modified sine cosine algorithm for detection of pathological brain

Deepak Ranjan Nayak^a,*, Ratnakar Dash^a, Banshidhar Majhi^a, Shuihua Wang^b

^a Pattern Recognition Lab, Department of Computer Science and Engineering, NIT Rourkela, 769 008, India
^b School of Electronic Science and Engineering, Nanjing University, Nanjing, Jiangsu 210 046, China
ARTICLE INFO

Keywords: Pathological brain detection; Magnetic resonance imaging; Fast discrete curvelet transform; Extreme learning machine; Modified sine cosine algorithm

ABSTRACT
The development of automated diagnosis systems has become a major focus of current research, with the aim of assisting medical experts in decision-making. This paper presents a new automatic system for the detection of pathological brains through magnetic resonance imaging (MRI). The proposed system first enhances the contrast of input MR images using contrast limited adaptive histogram equalization (CLAHE). Then, curve-like features are computed from the preprocessed MR brain images using the fast discrete curvelet transform via unequally-spaced FFT (FDCT-USFFT). Subsequently, a combined technique known as PCA+LDA is employed to derive more discriminative and reduced feature sets. Finally, a novel learning approach dubbed extreme learning machine with modified sine cosine algorithm (MSCA-ELM) is proposed by combining ELM and MSCA for the classification of MR images into two categories: pathological and healthy. MSCA is obtained by introducing a mutation operator into the basic sine cosine algorithm (SCA). In MSCA-ELM, MSCA is used to optimize the input weights and hidden biases of a single-hidden-layer feed-forward neural network (SLFN), while an analytical procedure computes the output weights. The proposed scheme is rigorously evaluated on three standard datasets and the results are compared against other competent schemes. The experimental results demonstrate that the proposed scheme outperforms its counterparts in terms of classification accuracy and the number of features required. It has also been observed that MSCA-ELM yields performance superior to conventional learning methods. Hence, the proposed system can effectively recognize pathological brains in real time and could potentially be installed on medical robots.
1. Introduction

Across the globe, the death rate of individuals across all age groups is rising sharply due to several brain diseases [1]. Pathological brain detection (PBD) plays a vital role in the early diagnosis of diseases such as Alzheimer's disease, mild cognitive impairment, autism spectrum disorder, multiple sclerosis, hearing loss, and microbleeding. The main objective of PBD is to help radiologists make correct and quick clinical decisions. Magnetic resonance imaging (MRI), an advanced neuroimaging technique, is frequently used in PBD because of its ability to provide better resolution of brain tissues and its radiation-free nature [2]. However, manual interpretation is tedious and may be error-prone because of the large volume of image content [3,4]. Thus there is a strong demand for identification, evaluation, and classification support tools in the diagnostic procedure. Pathological brain detection system (PBDS) development is a growing research area that aims at meeting these demands. Through PBDS we can speed up
Reviews processed and recommended for publication to the Editor-in-Chief by Guest Editor Dr. Y. Zhang.
* Corresponding author. E-mail addresses: [email protected] (D.R. Nayak), [email protected] (R. Dash), [email protected] (B. Majhi), [email protected] (S. Wang).
https://doi.org/10.1016/j.compeleceng.2018.04.009 Received 1 November 2017; Received in revised form 13 April 2018; Accepted 16 April 2018 0045-7906/ © 2018 Elsevier Ltd. All rights reserved.
the clinical decisions and reduce diagnostic errors. Work on PBD started in the early 2000s [1,5] with the efforts of Chaplot et al. [6], in which the 2D discrete wavelet transform (2D DWT) and support vector machine (SVM) were used for feature extraction and classification. El-Dahshan et al. [7] employed 2D DWT and two classifiers, the k-nearest neighbor (KNN) and the feed-forward back-propagation artificial neural network (FP-ANN); to reduce the feature dimensionality, they applied principal component analysis (PCA). The authors in [2,8,9] used scaled conjugate gradient (SCG), particle swarm optimization (PSO), adaptive chaotic PSO (ACPSO), and scaled chaotic artificial bee colony (SCABC) to train a feed-forward neural network (FNN) classifier. Zhang et al. [10] combined DWT, PCA, and kernel SVM (KSVM). In [3], a system based on the ripplet transform (RT) and least squares SVM (LS-SVM) is suggested. In [11], the authors harnessed wavelet entropy (WE) to extract features, with a probabilistic neural network (PNN) for classification. Later, in [1], the authors combined a feedback pulse coupled neural network (FPCNN), DWT, PCA, and FNN to detect pathological brains. Dong et al. [12] utilized wavelet packet Shannon entropy (WPSE) and wavelet packet Tsallis entropy (WPTE) separately as features, with GEPSVM as the classifier. Nayak et al. [4] utilized 2D DWT, probabilistic PCA (PPCA), and AdaBoost with random forests (ADBRF) to identify pathological brains. Zhang et al. [13] offered a PBDS that combines the stationary wavelet transform (SWT), PCA, and GEPSVM. In [14], a PCA+LDA technique is applied to the 2D DWT features. In [15], a Naive Bayes classifier (NBC) based PBDS is proposed that makes use of WE features. Sun et al. [16] utilized a GEPSVM+RBF classifier on WE and Hu moment invariant (HMI) features. Wang et al.
[17] proposed a novel feature called fractional Fourier entropy (FRFE) and performed Welch's t-test (WTT) to select the relevant features, with a twin SVM (TSVM) classifier for classification. Later, in [18], a PBDS based on FRFE features and a multilayer perceptron (MLP) was proposed; an adaptive real-coded BBO (ARCBBO) approach was employed to train the MLP, and the number of hidden neurons was found using three separate pruning methods, namely Bayesian detection boundaries (BDB), dynamic pruning (DP), and the Kappa coefficient (KC). Chen et al. [19] utilized Minkowski-Bouligand dimension (MBD) features and proposed an improved PSO (IPSO) to train a single-hidden-layer feed-forward neural network. Later on, Wang et al. [20] combined the variance and entropy (VE) values of the dual-tree complex wavelet transform (DTCWT) with TSVM to detect pathological brains. Li et al. [21] employed wavelet packet Tsallis entropy (WPTE) and an FNN with real-coded biogeography-based optimization (RCBBO) for pathological brain detection. The literature reveals that 2D DWT and its variants (SWT, DTCWT, DWPT, etc.) are the most common feature extractors. However, these transforms have limited capability for representing 2D singularities (edges and textures of an image); in other words, they cannot efficiently capture the curve-like features inherent in MR images. Further, most PBDSs employ classifiers such as FNN and SVM, yet traditional FNN training algorithms such as Levenberg-Marquardt (LM) and back-propagation (BP) are slow and prone to becoming trapped in local minima, and the computational complexity of the standard SVM is very high. Furthermore, several PBDSs demand a large number of features. To resolve these issues, a novel framework for pathological brain detection is proposed.
The main contributions of this study are summarized as follows:
(a) The fast discrete curvelet transform via unequally-spaced FFT (FDCT-USFFT) is harnessed as the feature extractor since it efficiently captures 2D singularities along a group of curves.
(b) To combat the issues faced by conventional learning algorithms, a simple and non-iterative learning technique known as the extreme learning machine (ELM) is employed.
(c) The concept of mutation is introduced into the basic sine cosine algorithm (SCA) to enhance its global search capability; the result is referred to as the modified sine cosine algorithm (MSCA).
(d) A novel learning algorithm known as MSCA-ELM is proposed based on MSCA and ELM to further enhance the performance of basic ELM.
(e) To evaluate the performance of the proposed scheme, extensive experiments are carried out on three well-known datasets, and its performance is compared against its counterparts.
The remainder of this article is organized as follows. Section 2 describes the datasets used in this study. Section 3 discusses the details of the proposed methodology. In Section 4, the evaluation results on standard datasets and comparisons with existing schemes are presented. Finally, Section 5 concludes the work and suggests some possible future research directions.

2. Datasets used

The proposed PBDS has been evaluated on three benchmark datasets, namely DS-I, DS-II, and DS-III, which contain 66, 160, and 255 brain MR images, respectively [3,4,12]. The datasets comprise T2-weighted brain MR images of size 256 × 256 in the axial view plane, downloaded from the Medical School of Harvard University website [22]. Both DS-I and DS-II hold samples of seven disease categories, namely sarcoma, glioma, meningioma, AD plus visual agnosia (VA), Pick's disease (PD), AD, and Huntington's disease (HD), plus healthy brain samples.
DS-III additionally includes four more diseases: cerebral toxoplasmosis (CTP), multiple sclerosis (MS), herpes encephalitis (HE), and chronic subdural hematoma (CSH).

3. Proposed methodology

The proposed framework includes four vital components: contrast limited adaptive histogram equalization (CLAHE) based preprocessing, FDCT-USFFT based feature extraction, PCA+LDA based feature dimensionality reduction, and MSCA-ELM based
Fig. 1. Overview of the proposed framework.
classification. The input of the system is an MR image and the output is a class label (healthy or pathological). An overview of the proposed framework is depicted in Fig. 1.

3.1. Preprocessing based on CLAHE

Most of the images in the datasets considered in this study are of low contrast. Therefore, a standard technique named CLAHE is employed for contrast enhancement. CLAHE first evaluates a histogram of gray values over the contextual region surrounding each pixel and thereafter allocates a value to each pixel intensity within the display range [5]. Additionally, it uses a fixed value dubbed the clip limit, which clips the histogram prior to the computation of the cumulative distribution function (CDF); CLAHE redistributes the parts of the histogram that surpass the clip limit equally among all histogram bins.

3.2. Feature extraction based on FDCT via USFFT

The wavelet transform has received much attention from researchers due to properties such as time-frequency localization and multiresolution. Wavelets perform well in representing 1D singularities; however, they are unable to capture 2D singularities (lines, curves, etc.) in images. The ridgelet transform was later proposed to handle line singularities, but it cannot effectively deal with curve singularities. In contrast, the first-generation curvelet transform handles 2D singularities efficiently and additionally offers multiresolution, stronger directional selectivity, anisotropy, and localization [5]. More recently, the second-generation curvelet transform was introduced, which resolves the problems faced by first-generation curvelets, such as the unclear geometry of ridgelets and their high computational cost [23]. Let g be a signal; the curvelet transform can then be defined via the inner product as
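To make the clipping-and-redistribution step concrete, the following sketch applies clipped histogram equalization to a single contextual region with NumPy. This is a single-tile simplification (full CLAHE computes one such mapping per region and bilinearly interpolates between neighboring mappings), and the fractional clip_limit parameterization used here is an assumption for illustration, not the paper's exact implementation.

```python
import numpy as np

def clahe_tile(img, n_bins=256, clip_limit=0.01):
    """Clipped histogram equalization for one contextual region (tile)."""
    hist, _ = np.histogram(img, bins=n_bins, range=(0, n_bins))
    clip = max(1, int(clip_limit * img.size))         # histogram ceiling
    excess = int(np.sum(np.maximum(hist - clip, 0)))  # mass above the ceiling
    hist = np.minimum(hist, clip) + excess // n_bins  # redistribute equally
    cdf = np.cumsum(hist).astype(np.float64)          # cumulative distribution
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min() + 1e-12)
    lut = np.round(cdf * (n_bins - 1)).astype(np.uint8)
    return lut[img]                                   # map pixels through the CDF

rng = np.random.default_rng(0)
tile = rng.integers(40, 80, size=(64, 64), dtype=np.uint8)  # low-contrast region
out = clahe_tile(tile)                                # contrast is stretched
```

A larger clip limit lets histogram peaks through and approaches plain histogram equalization; a smaller one suppresses noise amplification, which is the motivation for clipping.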
C(α, β, γ) = ⟨g, ϕ_{α,β,γ}⟩    (1)

Here, ϕ_{α,β,γ} indicates the curvelet basis function, and α, γ, and β denote the scale, position, and direction (orientation) parameters, respectively. The curvelet transform decomposes the image into numerous windows at various scales and orientations. The discrete form of the curvelet transform for an input Cartesian array g[x1, y1] with 0 ≤ x1, y1 < n is defined as [23]

C^D(α, β, γ) = Σ_{0 ≤ x1, y1 < n} g[x1, y1] ϕ^D_{α,β,γ}[x1, y1]    (2)
where ϕ^D_{α,β,γ} denotes a digital curvelet waveform. The proposed PBDS uses a second-generation curvelet transform, also called the fast discrete curvelet transform (FDCT), for feature extraction. There are two implementations of the FDCT: FDCT via wrapping (FDCT-WR) and FDCT via unequally spaced fast Fourier transform (FDCT-USFFT). In contrast to first-generation curvelets, both are fast, simple, and less redundant. We choose FDCT-USFFT as the feature extractor in this study as it provides a proper discretization of the continuous definition. The steps to obtain the curvelet coefficients using FDCT-USFFT are listed in Algorithm 1.
Feature generation. To generate a feature vector, the coefficients of FDCT-USFFT at each scale α and orientation β are collected. The number of scales (s) for an image of size nr × nc is decided as
Algorithm 1. FDCT via USFFT.
Require: Input image g[x1, y1]; 0 ≤ x1, y1 < n
Ensure: Curvelet coefficients C^D(α, β, γ)
1: Apply the 2D FFT to g[x1, y1] to generate the Fourier coefficients
   ĝ[n1, n2] = Σ_{x1,y1=0}^{n−1} g[x1, y1] e^{−i2π(n1 x1 + n2 y1)/n};  −n/2 ≤ n1, n2 < n/2
2: For each scale/angle pair (α, β), interpolate ĝ[n1, n2] to generate the sampled values ĝ[n1, n2 − n1 tan θ_β]
3: Multiply the interpolated object ĝ by the parabolic window Ũ_α: g̃_{α,β}[n1, n2] = ĝ[n1, n2 − n1 tan θ_β] Ũ_α[n1, n2]
4: Obtain the discrete curvelet coefficients C^D(α, β, γ) by applying the inverse 2D FFT to each g̃_{α,β}.
s = ⌈log2(min(nr, nc)) − 3⌉    (3)
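A quick numerical check of Eq. (3) and of the sub-band counts given in the text (the per-scale angle counts are those reported for the 256 × 256 setting):

```python
import math

def curvelet_scales(nr, nc):
    # Eq. (3): number of FDCT decomposition scales for an nr x nc image
    return math.ceil(math.log2(min(nr, nc)) - 3)

s = curvelet_scales(256, 256)  # 5 scales for the 256 x 256 MR images

# Sub-bands per scale: the first and last scales are isotropic (one sub-band
# each), while scales 2, 3 and 4 carry 32, 32 and 64 orientations.
angles = [1, 32, 32, 64, 1]
total_subbands = sum(angles)   # 130 sub-bands in total
# Curvelets at angles theta and theta + pi yield the same coefficients, so the
# symmetric half of each directional scale can be discarded:
kept_subbands = angles[0] + sum(a // 2 for a in angles[1:-1]) + angles[-1]  # 66
```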
Since the MR images are of size 256 × 256, s is 5, where each scale except the first and last contains information along different orientations (sub-bands). Scales 2, 3, and 4 possess 32, 32, and 64 angles, respectively. It is worth pointing out that curvelets at angles θ and θ + π generate the same coefficients. Therefore, the coefficients of the symmetric sub-bands at scales 2, 3, and 4 are discarded to remove redundancy from the original feature vector. However, the resultant feature vector is still of large dimension even after discarding the symmetric bands, which necessitates the employment of feature reduction techniques.

3.3. Feature reduction based on PCA+LDA

Feature reduction methods play a vital role in reducing the computational burden, understanding the data, and improving classification performance. Both PCA and linear discriminant analysis (LDA) have received considerable attention from researchers in the past decades. PCA transforms high-dimensional input data to a lower-dimensional space while keeping the maximum variation of the data. In contrast, LDA attempts to find a feature subspace that best discriminates between the classes. However, conventional LDA performs poorly on high-dimensional, small-sample-size problems, where the within-class scatter matrix (Sw) is always singular [24]. To address this issue, a popular approach called PCA+LDA is applied in this study, where D-dimensional data are first reduced to M dimensions using PCA and then to L dimensions using LDA, with L << M < D. The optimal number of features (L) required in our system is selected using the normalized cumulative sum of variances (NCSV) measure. The NCSV value for the a-th feature is calculated as
NCSV(a) = (Σ_{u=1}^{a} λ(u)) / (Σ_{u=1}^{D} λ(u));    1 ≤ a ≤ D    (4)
where λ(u) represents the eigenvalue of the u-th feature and D denotes the total number of eigenvectors (features), sorted in descending order of eigenvalue. A threshold is set manually, and the smallest number of features (L) for which the NCSV value surpasses the threshold is selected. The L best eigenvectors are retained for extracting features from unknown test MR images. The overall steps involved in the feature reduction stage are listed in Algorithm 2.
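Given these definitions, the NCSV-based selection of L reduces to a cumulative-sum threshold on the sorted eigenvalues. A minimal NumPy sketch (the eigenvalue spectrum below is illustrative):

```python
import numpy as np

def select_by_ncsv(eigvals, threshold=0.95):
    """Smallest L whose NCSV (Eq. (4)) reaches `threshold`.

    The eigenvalues are sorted in descending order before the cumulative
    sum, matching the text.
    """
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    ncsv = np.cumsum(lam) / lam.sum()                 # Eq. (4) for a = 1..D
    return int(np.searchsorted(ncsv, threshold) + 1)  # first a past threshold

# Toy spectrum where the first two components dominate:
L = select_by_ncsv([9.0, 0.6, 0.2, 0.1, 0.1])  # NCSV = 0.90, 0.96, ... -> L = 2
```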
3.4. Classification based on MSCA-ELM

In this section, we first discuss the extreme learning machine (ELM) and the modified sine cosine algorithm (MSCA), and thereafter present the proposed MSCA-ELM algorithm in detail.

3.4.1. Extreme learning machine (ELM)

The extreme learning machine (ELM) is one of the simplest and most effective approaches for training single-hidden-layer feed-forward neural networks (SLFNs), and it avoids the limitations of gradient-based learning schemes [25,26]. It has achieved notable success in problems such as multi-label classification and regression. In contrast to conventional learning approaches such as BP, SVM, and LS-SVM, ELM learns faster with better generalization performance. In ELM, the hidden node parameters (the input weights and hidden biases) are randomly generated, while the output weights of the SLFN are determined analytically by a simple inverse operation on the hidden layer output matrix. Given N distinct training samples (x_j, t_j), where x_j = [x_j1, x_j2, …, x_jL]^T ∈ R^L and t_j = [t_j1, t_j2, …, t_jC]^T ∈ R^C, a hidden node number n_h, and an activation function ϕ(·), the steps of the basic ELM algorithm are as follows.
Algorithm 2. Feature reduction using PCA+LDA.
Require: Feature matrix F_M of size N × D
Ensure: Reduced feature matrix F_r of size N × L
(Functions pca() and lda() reduce the dimension using PCA and LDA, respectively.)
1: Choose a dimension M
2: F (N × M) ← pca(F_M, M)        ▹ reduced dimension using PCA
3: Select a dimension L using the NCSV measure
4: F_r (N × L) ← lda(F, L)        ▹ reduced dimension using LDA
5: Output the reduced matrix F_r
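Algorithm 2 can be sketched from scratch for the two-class case. This is a toy NumPy sketch, not the paper's implementation: pca() projects onto the top singular vectors of the centered data, lda_two_class() computes the single Fisher direction (LDA yields at most C − 1 = 1 dimension for two classes), and the synthetic data and the small ridge added to Sw are our assumptions.

```python
import numpy as np

def pca(X, m):
    """Project X (N x D) onto its top-m principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:m].T

def lda_two_class(F, y):
    """Project onto the Fisher direction w = Sw^{-1}(m1 - m0)."""
    m0, m1 = F[y == 0].mean(axis=0), F[y == 1].mean(axis=0)
    Sw = np.cov(F[y == 0], rowvar=False) + np.cov(F[y == 1], rowvar=False)
    w = np.linalg.solve(Sw + 1e-8 * np.eye(F.shape[1]), m1 - m0)
    return F @ w

rng = np.random.default_rng(1)
N, D, M = 40, 100, 10                  # more features than samples, as in PBD
y = np.repeat([0, 1], N // 2)
X = rng.normal(size=(N, D))
X[y == 1, 0] += 4.0                    # one class-separating direction
F = pca(X, M)                          # D -> M; sidesteps the singular-Sw issue
z = lda_two_class(F, y)                # M -> L = 1 discriminative feature
```

Running PCA first makes Sw in the reduced space well conditioned, which is exactly why PCA+LDA is preferred over plain LDA in the small-sample-size regime described above.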
1. Generate the hidden node parameters (w_i^h, b_i), i = 1, 2, …, n_h, randomly.
2. Compute the hidden layer output matrix H.
3. Compute the output weight matrix using the minimal-norm least-squares solution w^o = H† T.
Here, w_i^h = [w_{i1}^h, w_{i2}^h, …, w_{iL}^h]^T represents the weight vector linking the i-th hidden neuron and the input neurons, w_i^o = [w_{i1}^o, w_{i2}^o, …, w_{iC}^o]^T the weight vector connecting the i-th hidden neuron and the output neurons, and b_i the bias of the i-th hidden neuron. H† indicates the Moore-Penrose (MP) generalized inverse of the matrix H. The sizes of H, w^o, and T are N × n_h, n_h × C, and N × C, respectively. As the ELM solution is obtained analytically, without iterative parameter tuning, it converges faster than traditional learning algorithms.

3.4.2. Proposed modified sine cosine algorithm (MSCA)

The sine cosine algorithm (SCA) is a recently proposed population-based optimization technique that uses two trigonometric functions to search for the global optimum; in particular, SCA uses the sine and cosine functions to update a set of candidate solutions [27]. The solutions in SCA are updated as follows.
s_i(t+1) = s_i(t) + r1 × sin(r2) × |r3 s_best − s_i(t)|,   if r4 < 0.5
s_i(t+1) = s_i(t) + r1 × cos(r2) × |r3 s_best − s_i(t)|,   if r4 ≥ 0.5    (5)
where t denotes the current generation, s_i(t) the current solution, s_best the best solution (with the best fitness) achieved so far, and |·| the absolute value. r1, r2, r3, and r4 are random variables. The parameter r1 determines the position of the next solution, which may lie either in the space between s_i(t) and s_best or outside it. To balance exploration and exploitation, r1 is changed adaptively as follows
r1 = q − t (q / MaxItr)    (6)
where MaxItr is the maximum number of generations and q is a constant. The parameter r2 defines the direction of movement of the next solution, towards or away from s_best. The parameter r3 gives a random weight to s_best in order to stochastically emphasize (r3 > 1) or deemphasize (r3 < 1) the effect of the destination in defining the distance. The parameter r4, a random number between 0 and 1, switches between the sine and cosine functions. It has been observed that conventional SCA has several disadvantages, such as getting trapped at local optima, slow convergence, and high computational cost [28]. To enhance its performance, we introduce a mutation operator into traditional SCA and name the result modified SCA (MSCA). In general, a mutation operator provides additional diversity and hence improves the search toward the global best solution. In MSCA, we select a random candidate solution and, with a mutation probability, add a random perturbation (mutation step size) to it. The general steps of the proposed MSCA method are described in Algorithm 3. In the algorithm, r11, r12, rand1(.), and rand2(.) are four separate random numbers in the range [0,1]; rnd denotes a random index between 1 and the number of candidate solutions; Pm indicates the mutation probability; MAXlimit indicates the maximum limit of a variable in the solution; and step size dictates the mutation step size.

3.4.3. Proposed MSCA-ELM method

Due to the random choice of the input weights and hidden biases, standard ELM poses two critical problems. First, it needs more hidden neurons, so ELM responds slowly to unseen testing data [29]. Second, with many hidden neurons it can produce an ill-conditioned hidden layer output matrix H, which leads to poor generalization performance. The condition number has been shown to be an effective qualitative measure of the conditioning of a matrix [30].
An ill-conditioned system has a large condition number, while a well-conditioned system has a small one. To overcome the issues of basic ELM, a few research efforts have been reported in past years in which population-based optimization schemes such as genetic algorithms (GA), differential evolution (DE), and PSO are used to optimize the hidden node parameters of ELM. In this paper, however, a modified SCA algorithm is introduced to train ELM (MSCA-ELM), which markedly improves on these recent results. In MSCA-ELM, MSCA is used to optimize the hidden node parameters, whereas the MP generalized inverse is utilized to find the solution analytically. In the current study, we first verify the effectiveness of ELM trained by conventional SCA (referred to as SCA-ELM) and thereafter verify our proposed MSCA-ELM. It is worth mentioning that, unlike SCA-ELM, the MSCA-ELM approach searches for the global optimum by considering both the root-mean-squared error (RMSE) and the norm of the output weights of the SLFN, which leads to potential improvements in generalization performance and conditioning. The main goal of MSCA-ELM is to minimize the norm of the output weights and to bound the hidden node parameters within a specific range, with an aim to enhance the convergence performance of ELM. It is known from Bartlett's theory that, for neural networks reaching a smaller training error, the smaller the norm of the weights, the better the generalization performance the network tends to acquire. The steps of the proposed MSCA-ELM are as follows:
(a) Randomly initialize all the candidate solutions (z = 1, …, Ps) such that each solution consists of a set of input weights and hidden biases within the range [−1, 1]:

s_z = [w_{11}^h, w_{12}^h, …, w_{1L}^h, w_{21}^h, w_{22}^h, …, w_{2L}^h, …, w_{n_h1}^h, w_{n_h2}^h, …, w_{n_hL}^h, b_1, b_2, …, b_{n_h}]    (7)
Algorithm 3. General steps of the proposed MSCA algorithm.
1: Initialize a set of random candidate solutions (s)
2: Calculate the fitness of each solution
3: Find the best candidate solution (s_best)
4: while (t < maximum number of iterations) do
5:    for each candidate solution do
6:        Update r1, r2, r3, and r4
7:        Update the solution using Eq. (5)
8:    end for
9:    Compute the fitness of each updated solution
10:   Update the best solution achieved so far (s_best)
11:   Select a random candidate solution (s_rnd)            ▹ Mutation
12:   if (r11 < Pm) then
13:       if (r12 < 0.5) then
14:           s′_rnd = s_rnd + rand1(.) × MAXlimit / step size
15:       else
16:           s′_rnd = s_rnd + rand2(.) × MAXlimit / step size
17:       end if
18:       if (fitness(s′_rnd) is better than fitness(s_rnd)) then
19:           s_rnd = s′_rnd
20:           if (fitness(s′_rnd) is better than fitness(s_best)) then
21:               s_best = s′_rnd
22:           end if
23:       else
24:           s_rnd = s_rnd
25:       end if
26:   end if
27: end while
28: Return the best solution achieved so far as the global optimum
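Algorithm 3 can be sketched in NumPy on a toy objective. This is a sketch under stated assumptions: the sphere function, the [−1, 1] initialization, and the fixed mutation perturbation of 0.1 stand in for the problem-specific choices (MAXlimit / step size) in the pseudocode, and the greedy acceptance mirrors the fitness tests there.

```python
import numpy as np

def msca_minimize(f, dim, n_pop=20, max_iter=30, q=2.0, pm=0.8, seed=0):
    """Minimize f with the SCA update (Eq. (5)) plus a greedy mutation step."""
    rng = np.random.default_rng(seed)
    s = rng.uniform(-1, 1, size=(n_pop, dim))        # candidate solutions
    fit = np.apply_along_axis(f, 1, s)
    best = s[fit.argmin()].copy()
    for t in range(max_iter):
        r1 = q - t * q / max_iter                    # Eq. (6): linear decay
        r2 = rng.uniform(0, 2 * np.pi, size=(n_pop, dim))
        r3 = rng.uniform(0, 2, size=(n_pop, dim))
        r4 = rng.uniform(size=(n_pop, dim))
        trig = np.where(r4 < 0.5, np.sin(r2), np.cos(r2))
        s = s + r1 * trig * np.abs(r3 * best - s)    # Eq. (5)
        fit = np.apply_along_axis(f, 1, s)
        if fit.min() < f(best):                      # keep the best so far
            best = s[fit.argmin()].copy()
        if rng.uniform() < pm:                       # mutation (Algorithm 3)
            i = rng.integers(n_pop)
            cand = s[i] + rng.uniform(size=dim) * 0.1
            if f(cand) < fit[i]:                     # greedy acceptance
                s[i] = cand
                if f(cand) < f(best):
                    best = cand.copy()
    return best, f(best)

sphere = lambda x: float(np.sum(x ** 2))
best, val = msca_minimize(sphere, dim=5)             # best fitness found
```

Since s_best is replaced only when a strictly better candidate appears, the returned fitness never exceeds that of the initial population's best member.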
(b) Evaluate the output weights and the fitness of each candidate solution, and find s_best in the population. For fitness evaluation, we calculate the RMSE over the validation set. The fitness is stated as

f(s) = √( (Σ_{j=1}^{N_v} ‖ Σ_{i=1}^{n_h} w_i^o ϕ(w_i^h · x_j + b_i) − t_j ‖₂²) / N_v )    (8)
where N_v indicates the number of validation samples.
(c) For each candidate solution, update r1, r2, r3, and r4, and update the solutions using Eq. (5).
(d) Bound the new solutions using the following expression

s_z(t+1) = −1 if s_z(t+1) < −1;   s_z(t+1) = +1 if s_z(t+1) > +1    (9)
and find the new best solution s_bestnew.
(e) Update s_best using the fitness value and the norm of the output weights as follows

s_best = s_bestnew   if ( f(s_best) − f(s_bestnew) < ϵ f(s_best) and ‖w^o_{s_bestnew}‖ < ‖w^o_{s_best}‖ );
s_best = s_best      otherwise    (10)
where f(s_best) and f(s_bestnew) denote the fitness of the best solution so far and of the new best solution, respectively; ‖w^o_{s_best}‖ and ‖w^o_{s_bestnew}‖ represent the corresponding output weights; and ϵ > 0 is a user-defined tolerance rate.
(f) Randomly select an updated solution in the population, apply mutation to it (using the equations in Algorithm 3), and update s_best if a better solution is found.
(g) Repeat (c)-(f) until the maximum number of iterations is reached, eventually obtaining the optimal hidden node parameters.
MSCA uses Eq. (10) to find the optimal input weights and hidden biases, and therefore tends to yield a lower norm of the output weights of the SLFN; a lower norm in turn leads to a lower condition number of the hidden layer output matrix. In summary, the key advantages of the proposed MSCA-ELM algorithm are that (i) it improves the conditioning and (ii) it produces better generalization performance with a much more compact network. Unlike gradient-based methods and classical ELM, the MSCA-ELM algorithm does not require the activation function to be infinitely differentiable. Because the proposed framework combines FDCT-USFFT, PCA+LDA, and MSCA-ELM, it is hereafter referred to as FDCT-USFFT + PCA+LDA + MSCA-ELM.

4. Experimental results and analysis

The proposed method was implemented in MATLAB on a PC with 16 GB RAM, a 3.5 GHz processor, and the Windows 10 OS. The parameters and the statistical setup were kept similar to those of other competent schemes to allow fair comparisons.

4.1. Experimental design

To validate the proposed scheme FDCT-USFFT + PCA+LDA + MSCA-ELM, simulations have been carried out on three benchmark datasets, namely DS-I, DS-II, and DS-III. For statistical analysis, cross-validation (CV) has been employed, which avoids over-fitting.
In this work, we have incorporated stratification into CV (SCV), which splits the folds in such a way that each fold has a similar class distribution. Fig. 2 depicts the setting of a 5-fold CV for a single run. In each trial, one fold is used for testing, one for validation, and the rest for training. The validation set is used to find the parameters of MSCA-ELM, i.e., it tells us when to stop training. The test set is used to evaluate the performance over a run of five trials. The statistical setting for all three datasets is kept similar to the literature [3,4,18], as shown in Table 1. For DS-I we employ a 6-fold SCV strategy, while for the other two datasets we employ a 5-fold SCV strategy. Additionally, we run the SCV procedure ten times to avoid
Fig. 2. Illustration of the 5-fold cross-validation setting for a single run.
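The stratified split used here is straightforward to reproduce; the following from-scratch sketch deals samples of each class round-robin across folds (scikit-learn's StratifiedKFold offers the same behavior). The DS-II-style label counts are taken from Table 1.

```python
import random

def stratified_kfold(labels, k, seed=0):
    """Split indices into k folds with near-identical class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):        # deal each class round-robin
            folds[j % k].append(i)
    return folds

# DS-II: 20 healthy (H) and 140 pathological (P) samples, 5-fold SCV
labels = ["H"] * 20 + ["P"] * 140
folds = stratified_kfold(labels, k=5)
# every fold ends up with 4 H and 28 P samples, matching Table 1
```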
Table 1. Specification of the three benchmark datasets [3,4,18]. H = healthy, P = pathological.

Dataset | k-fold SCV | Total samples (H / P) | Training (H / P) | Validation (H / P) | Testing (H / P)
DS-I    | 6          | 18 / 48               | 12 / 32          | 3 / 8              | 3 / 8
DS-II   | 5          | 20 / 140              | 12 / 84          | 4 / 28             | 4 / 28
DS-III  | 5          | 35 / 220              | 21 / 132         | 7 / 44             | 7 / 44
randomness.

4.2. Performance metrics

The performance of the proposed framework is evaluated using four benchmark metrics: sensitivity (Se), specificity (Sp), precision (Pr), and accuracy (Acc). Se is the fraction of pathological MR samples successfully predicted, while Sp is the fraction of healthy MR samples successfully predicted. Acc is the fraction of correctly predicted samples (both pathological and healthy) among all testing samples. Moreover, to compare the proposed MSCA-ELM against other methods such as DE-ELM, PSO-ELM, basic ELM, and BPNN, two additional measures are used: the condition number (K₂) and the norm of the output weights. The 2-norm condition number of the matrix H is calculated as
K₂(H) = √( λ_max(HᵀH) / λ_min(HᵀH) )    (11)

where λ_max(HᵀH) and λ_min(HᵀH) denote the largest and smallest eigenvalues of HᵀH, respectively.
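Eq. (11) is easy to compute directly (and to cross-check against NumPy's built-in condition number, which works from the singular values of H):

```python
import numpy as np

def cond2(H):
    """2-norm condition number of H from the eigenvalues of H^T H (Eq. (11))."""
    eig = np.linalg.eigvalsh(H.T @ H)      # real eigenvalues, ascending order
    return float(np.sqrt(eig[-1] / eig[0]))

H = np.array([[3.0, 0.0],
              [0.0, 1.0]])                 # singular values 3 and 1
k2 = cond2(H)                              # -> 3.0
```

A K₂ near 1 indicates a well-conditioned hidden layer output matrix; the large value reported for basic ELM in Table 2 is exactly the ill-conditioning that the norm minimization in MSCA-ELM suppresses.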
4.3. Results analyses

In the following, we discuss the results obtained at the various stages of the proposed scheme.

4.3.1. Preprocessing and feature extraction results

In the preprocessing stage, CLAHE is utilized, which relies on proper settings of its parameters. Here, the original MR image is divided into 64 contextual regions, and the number of bins and the clip limit (β) are set to 256 and 0.01, respectively. Representative enhanced images corresponding to four original MR images are depicted in Fig. 3; the affected lesions are clearer in the enhanced images than in the originals. Subsequently, a 5-level FDCT-USFFT is employed to extract features from the preprocessed images. The 5-level FDCT-USFFT decomposition of a healthy image is depicted in Fig. 4. We consider only the coefficients of 66 sub-bands (excluding symmetric sub-bands) out of a total of 130 sub-bands for feature extraction. The feature vector for a single MR image is constructed by collecting these coefficients, giving 125,952 features, which is very large.
Fig. 3. Preprocessing using CLAHE. Row 1 lists the original MR samples. Row 2 lists the corresponding contrast-enhanced images using CLAHE.
Fig. 4. Coefficients at level 5 decomposition of FDCT-USFFT.
4.4. Feature reduction results

In this study, PCA+LDA has been harnessed to reduce the dimension of the derived feature vectors. The value of M in Algorithm 2 is set to N − 1, where N is the number of training samples. The number of significant features is obtained from the NCSV values of the different features; the NCSV threshold was set to 0.95. The simulation results show that PCA requires more features than PCA+LDA to preserve the same amount of information. The classification accuracies with respect to an increasing number of features for PCA and PCA+LDA over the three datasets are shown in Fig. 5. From the figure, it is clear that the PCA-based scheme achieves
Fig. 5. Classification accuracy with respect to number of features for three datasets. 375
Computers and Electrical Engineering 68 (2018) 366–380
D.R. Nayak et al.
Table 2 Performance comparison of different algorithms on DS-I. Classifiers
Acc (%)
Hidden neurons (nh)
Norm
Condition number (K2 )
BPNN ELM PSO-ELM DE-ELM SCA-ELM MSCA-ELM
100.00 100.00 99.85 100.00 100.00 100.00
4 5 3 3 3 3
– 30.4136 20.4912 18.4813 12.7985 9.2384
– 4.1260e+03 60.0386 51.3119 42.1588 38.6785
higher accuracy with 13 features over all the three datasets, while PCA+LDA based scheme yields higher accuracy with only two features. 4.5. Classification results The proposed system employs MSCA-ELM for classification of MR images as healthy or pathological. In this study, the performance of the proposed MSCA-ELM is compared against other learning algorithms such as SCA-ELM, DE-ELM, PSO-ELM, ELM, and BPNN. The activation function used in all the algorithms was kept same for all the algorithms i.e., sigmoidal function and the inputs to the networks were normalized into the range [−1,1]. Further, we set the population size to 20 and the maximum number of iterations to 30 for MSCA-ELM, SCA-ELM, DE-ELM, and PSO-ELM algorithm. The ϵ value in the proposed MSCA-ELM was tested between a range [0.01,0.2] at equally spaced intervals. However, it has been found that the proposed scheme achieves highest performance with ϵ value as 0.02. The parameters r1, r2, r3, and r4 were initialized as follows. r1 is selected using Eq. (6) and q in the equation is set to 2, r2 is a random number in the range [0, 2π], r3 is a random number in the range [0,2], and r4 is a random number in the range [0,1]. The mutation probability (Pm) was set to 0.8 and the step size was set to MAXlimit to 0.1MAXlimit. In case of PSOELM, the value of c1 and c2 were set to 2, while in DE-ELM, the crossover rate (Cr) and scaling factor (fs) were set to 0.7 and 0.8 respectively. Tables 2– 4 show the results obtained by MSCA-ELM, SCA-ELM, DE-ELM, PSO-ELM, ELM and BPNN on three benchmark datasets. From the tables, it is clear that MSCA-ELM outperforms others with less hidden neurons over all the datasets. It can also be noticed that SCA-ELM earns perfect classification on DS-I and DS-II, however, it earns lower accuracy over DS-III. As compared to other algorithms, standard ELM demands more hidden neurons. 
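A single MSCA iteration, with the parameter settings listed above, might look as follows. This is a hedged sketch based on the standard SCA position update of Mirjalili [27]: the Gaussian mutation with a fixed step and the per-candidate mutation test are plausible readings of the paper's modification, not a verbatim reproduction of its algorithm.

```python
import numpy as np

def msca_iteration(pop, best, t, T, q=2.0, pm=0.8, step=0.1, rng=None):
    """One modified sine cosine algorithm (MSCA) step (illustrative sketch).
    pop: (n, d) candidate solutions; best: (d,) best solution found so far.
    r1 decreases linearly (Eq. (6) with q = 2); r2, r3, r4 are drawn from the
    ranges given in the text; a mutation operator perturbs an updated
    candidate with probability pm."""
    rng = rng or np.random.default_rng()
    r1 = q - t * (q / T)                         # linearly decreasing amplitude
    new_pop = np.empty_like(pop)
    for i, x in enumerate(pop):
        r2 = rng.uniform(0.0, 2.0 * np.pi, x.shape)
        r3 = rng.uniform(0.0, 2.0, x.shape)
        r4 = rng.random(x.shape)
        sine = x + r1 * np.sin(r2) * np.abs(r3 * best - x)
        cosine = x + r1 * np.cos(r2) * np.abs(r3 * best - x)
        cand = np.where(r4 < 0.5, sine, cosine)  # SCA position update
        if rng.random() < pm:                    # mutation operator (the "M" in MSCA)
            cand = cand + step * rng.standard_normal(x.shape)
        new_pop[i] = cand
    return new_pop
```

In MSCA-ELM, each candidate encodes the input weights and hidden biases of the SLFN, and fitness is the validation error after the output weights are computed analytically.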
The comparative analyses also indicate that the condition number of the matrix H obtained by the MSCA-ELM, SCA-ELM, DE-ELM, and PSO-ELM algorithms is much smaller than that of the conventional ELM. Therefore, the networks trained by all these algorithms are far better conditioned than the basic ELM. Further, their corresponding norm values are much smaller than those of the basic ELM; hence, these algorithms provide better generalization performance than traditional ELM. Moreover, it can be seen that a smaller norm of wo leads to a smaller condition number. Compared to PSO-ELM, DE-ELM, and SCA-ELM, MSCA-ELM obtains smaller condition and norm values. Therefore, it can be concluded that the proposed MSCA-ELM provides better generalization performance with a compact network structure. It is worth mentioning that the results reported in the tables are the average values of 50 trials, and that the parameters of all the schemes were determined through experimental evaluation.

To further demonstrate the efficacy of the suggested MSCA-ELM classifier, an additional experiment was performed in which its accuracy was compared with other standard classifiers, namely BPNN, KNN, random forest (RF), and SVM, over the three datasets; the results are depicted in Fig. 6. For DS-I, KNN, BPNN, SVM, RF, ELM, and SCA-ELM yield accuracies of 99.39%, 100.00%, 100.00%, 99.85%, 100.00%, and 100.00%, respectively; for DS-II, these classifiers obtain accuracies of 99.44%, 99.94%, 99.88%, 99.81%, 100.00%, and 100.00%, respectively. The accuracies yielded by KNN, BPNN, SVM, RF, ELM, and SCA-ELM on DS-III are 99.25%, 99.37%, 99.49%, 99.41%, 99.49%, and 99.61%, respectively. The proposed algorithm (MSCA-ELM), in contrast, earns ideal classification on DS-I and DS-II, and an accuracy of 99.73% on DS-III. This shows that the proposed algorithm outperforms all other classifiers on DS-III and provides ideal results on the other two datasets.
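The quantities compared in Tables 2–4, the norm of the output weights wo and the condition number κ2(H) of the hidden-layer output matrix, fall out directly from basic ELM training as described by Huang et al. [25]. The following is a minimal sketch (random sigmoidal hidden layer, analytic output weights via the Moore–Penrose pseudo-inverse), not the authors' implementation:

```python
import numpy as np

def elm_fit(X, T, n_hidden, rng=None):
    """Basic ELM sketch: random input weights W and biases b, sigmoidal
    hidden-layer output matrix H, and output weights wo computed
    analytically as wo = pinv(H) @ T. Returns wo together with the norm
    of wo and the condition number kappa_2(H) reported in Tables 2-4."""
    rng = rng or np.random.default_rng()
    W = rng.uniform(-1.0, 1.0, (X.shape[1], n_hidden))   # input weights
    b = rng.uniform(-1.0, 1.0, n_hidden)                 # hidden biases
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))               # sigmoidal activations
    wo = np.linalg.pinv(H) @ T                           # analytic output weights
    return wo, np.linalg.norm(wo), np.linalg.cond(H)
```

In MSCA-ELM, only W and b are produced by the optimizer instead of being drawn at random; the analytic solve for wo is unchanged, which is why a smaller ||wo|| and a smaller κ2(H) indicate a better-conditioned network.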
Table 3
Performance comparison of different algorithms on DS-II.

Classifier   Acc (%)   Hidden neurons (nh)   Norm      Condition number (κ2)
BPNN         99.88     4                     –         –
ELM          100.00    5                     33.5066   7.8646e+03
PSO-ELM      100.00    3                     17.3990   66.8353
DE-ELM       99.94     3                     20.4078   70.1131
SCA-ELM      100.00    3                     13.6705   51.6276
MSCA-ELM     100.00    3                     10.6514   33.8798

Table 4
Performance comparison of different algorithms on DS-III.

Classifier   Acc (%)   Hidden neurons (nh)   Norm       Condition number (κ2)
BPNN         99.37     4                     –          –
ELM          99.49     5                     102.6282   7.9502e+03
PSO-ELM      99.61     3                     22.0464    103.9707
DE-ELM       99.57     3                     33.7713    121.7516
SCA-ELM      99.61     3                     16.7121    94.0292
MSCA-ELM     99.73     3                     13.5815    72.4073

Fig. 6. Classification accuracy achieved by different classifiers for three datasets.

Table 5 indicates the number of correctly classified MR images obtained by the proposed scheme (FDCT-USFFT + PCA+LDA + MSCA-ELM) over DS-III in each trial of a 10 × 5-fold SCV. It is found that the proposed scheme correctly classifies 2543 MR images out of 2550 samples (2200 pathological and 350 healthy MR images). In particular, 2196 pathological samples are successfully classified by our scheme and the remaining four samples are misclassified as healthy, while 347 healthy MR images are correctly predicted and the remaining three samples are misclassified as pathological. From these results, the sensitivity (Se), specificity (Sp), and precision (Pr) of the proposed scheme are computed as 99.82%, 99.14%, and 99.86%, respectively.
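These three metrics follow directly from the confusion counts stated above (TP = 2196, FN = 4, TN = 347, FP = 3, with pathological taken as the positive class):

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and precision (%) from confusion counts."""
    se = 100.0 * tp / (tp + fn)   # sensitivity: recall on the pathological class
    sp = 100.0 * tn / (tn + fp)   # specificity: recall on the healthy class
    pr = 100.0 * tp / (tp + fp)   # precision: correctness of pathological calls
    return round(se, 2), round(sp, 2), round(pr, 2)

# Counts for DS-III from the 10 x 5-fold SCV described above
print(diagnostic_metrics(2196, 4, 347, 3))  # (99.82, 99.14, 99.86)
```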
Table 5
Correctly classified samples of the proposed scheme on DS-III.

Run     F-1       F-2       F-3       F-4       F-5       Total         Acc (%)
1       51 (51)   51 (51)   51 (51)   51 (51)   51 (51)   255 (255)     100.00
2       51 (51)   51 (51)   51 (51)   51 (51)   50 (51)   254 (255)     99.61
3       50 (51)   51 (51)   51 (51)   51 (51)   51 (51)   254 (255)     99.61
4       51 (51)   51 (51)   51 (51)   51 (51)   51 (51)   255 (255)     100.00
5       51 (51)   50 (51)   51 (51)   51 (51)   51 (51)   254 (255)     99.61
6       51 (51)   51 (51)   51 (51)   51 (51)   51 (51)   255 (255)     100.00
7       51 (51)   51 (51)   50 (51)   51 (51)   51 (51)   254 (255)     99.61
8       51 (51)   51 (51)   51 (51)   51 (51)   51 (51)   255 (255)     100.00
9       50 (51)   51 (51)   51 (51)   51 (51)   51 (51)   254 (255)     99.61
10      51 (51)   51 (51)   51 (51)   49 (51)   51 (51)   253 (255)     99.22
Total                                                     2543 (2550)   99.73

x (y) indicates that x brain images are correctly classified out of y brain images.
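The 10 × 5-fold stratified cross-validation protocol behind Table 5 can be sketched as follows. This is an illustrative harness, not the authors' code; `evaluate_fold` is a hypothetical callback standing in for training and testing the full FDCT-USFFT + PCA+LDA + MSCA-ELM pipeline on one train/test split.

```python
import numpy as np

def stratified_folds(y, k, rng):
    """Partition sample indices into k folds, preserving class proportions."""
    folds = [[] for _ in range(k)]
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        for f, part in enumerate(np.array_split(idx, k)):
            folds[f].extend(part.tolist())
    return [np.array(f) for f in folds]

def run_scv(X, y, evaluate_fold, n_runs=10, k=5, seed=0):
    """n_runs x k-fold stratified cross-validation (SCV) sketch.
    `evaluate_fold(Xtr, ytr, Xte, yte)` is assumed to return the number of
    correctly classified test samples; the grand total over all runs and
    folds corresponds to the bottom-right cell of Table 5."""
    correct = 0
    for r in range(n_runs):
        rng = np.random.default_rng(seed + r)   # a fresh shuffle per run
        folds = stratified_folds(y, k, rng)
        for f in range(k):
            test = folds[f]
            train = np.concatenate([folds[g] for g in range(k) if g != f])
            correct += evaluate_fold(X[train], y[train], X[test], y[test])
    return correct
```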
4.6. Comparison with PCA based scheme

An additional experiment has been performed over the three datasets in order to test the effectiveness of the PCA+LDA approach over PCA alone. The performances of the two schemes, namely FDCT-USFFT+PCA+MSCA-ELM and FDCT-USFFT+PCA+LDA+MSCA-ELM, are listed in Table 6. It may be noticed that the proposed FDCT-USFFT+PCA+LDA+MSCA-ELM scheme achieves better sensitivity, precision, and accuracy than FDCT-USFFT+PCA+MSCA-ELM over all the datasets with relatively fewer features. It can also be observed that FDCT-USFFT+PCA+LDA+MSCA-ELM obtains slightly lower specificity than FDCT-USFFT+PCA+MSCA-ELM on DS-III; however, a CAD system with higher sensitivity offers better clinical performance. Therefore, it can be concluded that the proposed FDCT-USFFT+PCA+LDA+MSCA-ELM scheme holds greater potential for accurate clinical decision-making.
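The PCA → LDA chain compared above can be sketched with scikit-learn. Note this is an illustrative analogue, not the paper's Algorithm 2: scikit-learn's LDA yields at most C − 1 components (a single discriminant direction for this two-class problem), whereas the paper reports two features for its PCA+LDA variant, so the exact reduction differs. The NCSV threshold of 0.95 maps naturally onto PCA's fractional `n_components` cutoff.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def pca_lda_reduce(X_train, y_train, X_test, ncsv=0.95):
    """PCA retains components up to the NCSV (normalized cumulative sum of
    variance) threshold used in the text; LDA then projects the retained
    components onto its class-discriminant direction(s)."""
    pca = PCA(n_components=ncsv, svd_solver="full")   # variance-based cutoff
    Xtr = pca.fit_transform(X_train)
    Xte = pca.transform(X_test)
    lda = LinearDiscriminantAnalysis()
    return lda.fit_transform(Xtr, y_train), lda.transform(Xte)
```

Fitting PCA and LDA on the training portion only, then applying the learned projections to the test portion, avoids leaking test information into the reduced features.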
4.7. Comparison with previous works

To benchmark the performance of the suggested scheme in terms of the number of features required and classification accuracy, an extensive comparison with twenty existing schemes has been performed over the three datasets; the results are shown in Table 7. It is found that most of the earlier PBDSs yield ideal classification on DS-I; however, only three PBDSs, namely RT + PCA + LS-SVM [3], WPTE + FNN + RCBBO [21], and WPTE + GEPSVM [12], offer ideal classification on DS-II. Further, no existing PBDS yields perfect classification over DS-III, whereas our proposed PBDS obtains the highest accuracy of 99.73% compared to the state of the art while requiring the fewest features.

Based on the computational results, we can summarize the key advantages of our scheme: (i) it efficiently captures the texture features from the MR images; (ii) the proposed MSCA method enhances the global search capability via the introduction of a mutation operator; (iii) MSCA-ELM earns better generalization performance and responds faster to unknown testing data; and (iv) it obtains better classification accuracy with the fewest features. The proposed framework has the following limitations, which can be addressed in future work: (i) it was tested on three openly accessible datasets containing images from patients in the late and middle stages of disease, so a larger dataset with images from all stages should be used to validate its generalization performance; (ii) the present study solves a two-class classification problem, whereas multi-class brain disease classification is more challenging; and (iii) MSCA demands more parameters to tune, so investigating an optimization scheme that requires fewer parameters is another possible direction.

Table 6
Classification performance (%) of the proposed scheme based on PCA and PCA+LDA.

Dataset   Metric   FDCT-USFFT+PCA+MSCA-ELM   FDCT-USFFT+PCA+LDA+MSCA-ELM
                   (13 features)             (2 features)
DS-I      Se       100.00                    100.00
          Sp       100.00                    100.00
          Pr       100.00                    100.00
          Acc      100.00                    100.00
DS-II     Se       99.86                     100.00
          Sp       100.00                    100.00
          Pr       100.00                    100.00
          Acc      99.88                     100.00
DS-III    Se       99.64                     99.82
          Sp       99.43                     99.14
          Pr       99.91                     99.86
          Acc      99.61                     99.73
Table 7
Performance comparison with previous works on three standard datasets.

Existing PBDSs                               Feature size   Run   Acc (%)
                                                                  DS-I     DS-II    DS-III
DWT + SVM + POLY [6]                         4761           5     98.00    97.15    96.37
DWT + PCA + BPNN + SCG [2]                   19             5     100.00   98.29    97.14
DWT + PCA + FNN + SCABC [9]                  19             5     100.00   98.93    97.81
DWT + PCA + FNN + ACPSO [8]                  19             5     100.00   98.75    97.38
DWT + PCA + KSVM [10]                        19             5     100.00   99.38    98.82
WPSE + GEPSVM [12]                           16             10    99.85    99.62    98.78
WPTE + GEPSVM [12]                           16             10    100.00   100.00   99.33
WPTE + FNN + RCBBO [21]                      16             10    100.00   100.00   99.49
WE + HMI + GEPSVM [16]                       14             10    100.00   99.56    98.63
DWT + PCA + ADBRF [4]                        13             10    100.00   99.18    98.35
FRFE + WTT + SVM [17]                        12             10    100.00   99.69    98.98
DTCWT + VE + GEPSVM [20]                     12             10    100.00   99.75    99.25
FRFE + WTT + DP-MLP + ARCBBO [18]            12             10    100.00   99.19    98.24
RT + PCA + LS-SVM [3]                        9              5     100.00   100.00   99.39
DWT + PCA + k-NN [7]                         7              5     98.00    97.54    96.79
FPCNN + DWT + PCA + FNN [1]                  7              10    100.00   98.88    98.43
SWT + PCA + GEPSVM [13]                      7              10    100.00   99.62    99.02
WE + NBC [15]                                7              10    92.58    91.87    90.51
DWT + PCA + LDA + RF [14]                    7              10    100.00   99.75    99.14
MBD + SLFN + IPSO [19]                       5              10    100.00   98.19    98.08
FDCT-USFFT + PCA + MSCA-ELM                  13             10    100.00   99.88    99.61
FDCT-USFFT + PCA+LDA + MSCA-ELM (Proposed)   2              10    100.00   100.00   99.73
5. Conclusion

In this paper, we have proposed an efficient system for detection of pathological brain. The system derives features using the fast discrete curvelet transform via unequally-spaced fast Fourier transform. For classification, we have proposed a hybrid algorithm, the modified sine cosine algorithm - extreme learning machine (MSCA-ELM). The modified sine cosine algorithm introduces a mutation operator that helps in optimizing the hidden node parameters of the extreme learning machine. The simulation results over three standard datasets demonstrate that the proposed system obtains higher accuracy than other competent schemes with the fewest features. Moreover, the proposed classifier obtains good generalization performance, and the network trained by it is well conditioned. The proposed learning algorithm can be applied to multi-class classification and regression problems. The proposed system has been validated over various accessible datasets of smaller size; a larger dataset collected online would further prove its effectiveness. In future work, we plan to hybridize other promising meta-heuristic algorithms with the extreme learning machine to improve the performance, to apply deep learning algorithms on a larger dataset, and to investigate the images of the BrainWeb database.

References

[1] El-Dahshan EA, Mohsen HM, Revett K, Salem ABM. Computer-aided diagnosis of human brain tumor through MRI: a survey and a new algorithm. Expert Syst Appl 2014;41(11):5526–45.
[2] Zhang Y, Dong Z, Wu L, Wang S. A hybrid method for MRI brain image classification. Expert Syst Appl 2011;38(8):10049–53.
[3] Das S, Chowdhury M, Kundu K. Brain MR image classification using multiscale geometric analysis of ripplet. Progr Electromagn Res 2013;137:1–17.
[4] Nayak DR, Dash R, Majhi B. Brain MR image classification using two-dimensional discrete wavelet transform and AdaBoost with random forests. Neurocomputing 2016;177:188–97.
[5] Nayak DR, Dash R, Majhi B, Prasad V. Automated pathological brain detection system: a fast discrete curvelet transform and probabilistic neural network based approach. Expert Syst Appl 2017;88:152–64.
[6] Chaplot S, Patnaik LM, Jagannathan NR. Classification of magnetic resonance brain images using wavelets as input to support vector machine and neural network. Biomed Signal Process Control 2006;1(1):86–92.
[7] El-Dahshan ESA, Honsy T, Salem ABM. Hybrid intelligent techniques for MRI brain images classification. Digit Signal Process 2010;20(2):433–41.
[8] Zhang Y, Wang S, Wu L. A novel method for magnetic resonance brain image classification based on adaptive chaotic PSO. Progr Electromagn Res 2010;109:325–43.
[9] Zhang Y, Wu L, Wang S. Magnetic resonance brain image classification by an improved artificial bee colony algorithm. Progr Electromagn Res 2011;116:65–79.
[10] Zhang Y, Wu L. An MR brain images classifier via principal component analysis and kernel support vector machine. Progr Electromagn Res 2012;130:369–88.
[11] Saritha M, Joseph KP, Mathew AT. Classification of MRI brain images using combined wavelet entropy based spider web plots and probabilistic neural network. Pattern Recognit Lett 2013;34(16):2151–6.
[12] Zhang Y, Dong Z, Wang S, Ji G, Yang J. Preclinical diagnosis of magnetic resonance (MR) brain images via discrete wavelet packet transform with Tsallis entropy and generalized eigenvalue proximal support vector machine (GEPSVM). Entropy 2015;17(4):1795–813.
[13] Zhang Y, Dong Z, Liu A, Wang S, Ji G, Zhang Z, Yang J. Magnetic resonance brain image classification via stationary wavelet transform and generalized eigenvalue proximal support vector machine. J Med Imaging Health Inform 2015;5(7):1395–403.
[14] Nayak DR, Dash R, Majhi B. Classification of brain MR images using discrete wavelet transform and random forests. Fifth national conference on computer vision, pattern recognition, image processing and graphics (NCVPRIPG). IEEE; 2015. p. 1–4.
[15] Zhou X, Wang S, Xu W, Ji G, Phillips P, Sun P, Zhang Y. Detection of pathological brain in MRI scanning based on wavelet-entropy and naive Bayes classifier. Bioinformatics and biomedical engineering. 2015. p. 201–9.
[16] Zhang Y, Wang S, Sun P, Phillips P. Pathological brain detection based on wavelet entropy and Hu moment invariants. Biomed Mater Eng 2015;26(s1):S1283–S1290.
[17] Wang S, Zhang Y, Yang X, Sun P, Dong Z, Liu A, Yuan TF. Pathological brain detection by a novel image feature fractional Fourier entropy. Entropy 2015;17(12):8278–96.
[18] Zhang Y, Sun Y, Phillips P, Liu G, Zhou X, Wang S. A multilayer perceptron based smart pathological brain detection system by fractional Fourier entropy. J Med Syst 2016;40(7):1–11.
[19] Zhang YD, Chen XQ, Zhan TM, Jiao ZQ, Sun Y, Chen ZM, Yao Y, Fang LT, Lv YD, Wang SH. Fractal dimension estimation for developing pathological brain detection system based on Minkowski–Bouligand method. IEEE Access 2016;4:5937–47.
[20] Wang S, Lu S, Dong Z, Yang M, Zhang Y. Dual-tree complex wavelet transform and twin support vector machine for pathological brain detection. Appl Sci 2016;6(6):169.
[21] Wang S, Li P, Chen P, Phillips P, Liu G, Du S, Zhang Y. Pathological brain detection via wavelet packet Tsallis entropy and real-coded biogeography-based optimization. Fundam Inform 2017;151(1–4):275–91.
[22] Johnson KA, Becker JA. The whole brain atlas. http://www.med.harvard.edu/AANLIB/.
[23] Candes E, Demanet L, Donoho D, Ying L. Fast discrete curvelet transforms. Multiscale Model Simul 2006;5(3):861–99.
[24] Martínez AM, Kak AC. PCA versus LDA. IEEE Trans Pattern Anal Mach Intell 2001;23(2):228–33.
[25] Huang GB, Zhu QY, Siew CK. Extreme learning machine: theory and applications. Neurocomputing 2006;70(1):489–501.
[26] Huang GB, Zhou H, Ding X, Zhang R. Extreme learning machine for regression and multiclass classification. IEEE Trans Syst, Man, Cybern, Part B (Cybern) 2012;42(2):513–29.
[27] Mirjalili S. SCA: a sine cosine algorithm for solving optimization problems. Knowl Based Syst 2016;96:120–33.
[28] Elaziz MA, Oliva D, Xiong S. An improved opposition-based sine cosine algorithm for global optimization. Expert Syst Appl 2017;90:484–500.
[29] Zhu QY, Qin AK, Suganthan PN, Huang GB. Evolutionary extreme learning machine. Pattern Recognit 2005;38(10):1759–63.
[30] Zhao G, Shen Z, Miao C, Man Z. On improving the conditioning of extreme learning machine: a linear case. 7th international conference on information, communications and signal processing. ICICS, IEEE; 2009. p. 1–5.

Deepak Ranjan Nayak is currently pursuing a PhD in Computer Science and Engineering at the National Institute of Technology, Rourkela, India. His current research interests include medical image analysis, pattern recognition, and cellular automata. He serves as a reviewer for several international journals and conferences. He is a student member of the IEEE.

Ratnakar Dash is currently working as an Assistant Professor in the Department of Computer Science and Engineering at the National Institute of Technology, Rourkela, India. His fields of interest include signal processing, image processing, and steganography. He is a professional member of the IEEE, IE, and CSI. He has authored more than 50 research papers.

Banshidhar Majhi is a full Professor at the Department of Computer Science and Engineering of the National Institute of Technology, Rourkela, India. His research interests include image processing, data compression, cryptography and security, soft computing, and biometrics. He has co-authored more than 150 journal and conference papers. He has served as a reviewer for many international journals and conferences.

Shuihua Wang is currently an Assistant Professor at the School of Electronic Science and Engineering, Nanjing University, China. Her research interests focus on machine learning, deep learning, and biomedical image processing. She has published over 30 papers in peer-reviewed international journals and conferences. She serves as an editor and reviewer for many well-reputed journals and conferences. She is a member of the IEEE.