biocybernetics and biomedical engineering 39 (2019) 880–892
Original Research Article
A hybrid regularized extreme learning machine for automated detection of pathological brain

Deepak Ranjan Nayak a,*, Ratnakar Dash b, Banshidhar Majhi a, Yudong Zhang c

a Department of Computer Science and Engineering, Indian Institute of Information Technology, Design and Manufacturing, Kancheepuram 600127, India
b Department of Computer Science and Engineering, National Institute of Technology, Rourkela 769008, India
c Department of Informatics, University of Leicester, Leicester LE1 7RH, United Kingdom
Article history: Received 29 January 2019; Received in revised form 12 July 2019; Accepted 24 August 2019; Available online

Keywords: Magnetic resonance imaging; Pathological brain; Regularized extreme learning machine; Sine cosine algorithm; Fast curvelet transform

Abstract

This paper presents an automated method for detection of pathological brain using magnetic resonance (MR) images. The proposed method suggests to derive features using fast discrete curvelet transform. A combined feature reduction algorithm, principal component analysis + linear discriminant analysis (PCA + LDA), is then applied to generate a low-dimensional and discriminant feature vector. Finally, the classification is carried out using a hybrid regularized extreme learning machine (RELM). The proposed hybrid classifier combines RELM and sine cosine algorithm to not only overcome the drawbacks of conventional learning algorithms but also provide good generalization performance with a compact and well-conditioned network. Besides root mean square error, the norm of the output weights and the condition value of the hidden layer output matrix have been separately taken into consideration to optimize the parameters of RELM. Extensive simulations on three benchmark datasets demonstrate that the proposed scheme obtains promising results as compared to state-of-the-art approaches. The effectiveness of the proposed hybrid classifier is compared with its counterparts. Moreover, the impact of noise in the training data is analyzed using the suggested scheme. The proposed approach can aid the radiologists for screening pathological brain at a larger scale.

© 2019 Nalecz Institute of Biocybernetics and Biomedical Engineering of the Polish Academy of Sciences. Published by Elsevier B.V. All rights reserved.
1. Introduction
Across the globe, the prevalence of brain disease has increased significantly over the past years. Early detection and treatment are hence important to prevent the severity of the diseases as well as enhance the individual's quality of life. Several medical imaging techniques such as magnetic resonance imaging (MRI), computed tomography (CT), and positron emission tomography (PET) are in general used in neurological research
[1]. MRI has been predominantly used in identifying pathological brain because of its high soft tissue contrast and radiation-free property [2–4]. Manual detection of pathological brain is tedious, time-consuming and needs proficient supervision. An automated computer-aided diagnosis (CAD) system can help to mitigate these issues and can be utilized as a supportive tool by the radiologists to validate their diagnosis. Machine learning based CAD systems have gained much attention in the past years to provide fast, reliable and correct decisions. Over the last two decades, quite a significant amount of research has been reported on pathological brain detection using MR images. The detailed reviews of the existing work on CAD for pathological brain detection are summarized in Table 1. In the past few years, the texture features computed from different multiresolution techniques have been extensively studied for detection of brain abnormality, and the CAD systems developed using these features are summarized in Table 2. Many automated brain tumor classification systems have also been developed in the past decade. Zacharaki et al. [28] designed an automated scheme for classification of different types of brain tumors with the help of shape, intensity and rotation invariant features, and an SVM classifier. Hemanth et al. [29] introduced a PSO based modified counter propagation neural network (CPN) for distinguishing abnormal brain MR images. Features based on wavelet, histogram, and gray level co-occurrence matrices (GLCM) are taken into account in their study. Hu et al. [30] utilized texture features and three independent classifiers, namely SVM, diagonal linear discriminant analysis, and diagonal quadratic discriminant analysis, to identify the tumor content in Glioblastoma (GBM) patients. Kaur et al. [31] suggested a novel feature selection technique based on a parameter-free bat optimization algorithm and the Fisher criterion, and used a least square SVM (LS-SVM) for brain tumor image classification. A set of texture features comprising first-order statistics, GLCM, Gabor wavelet, etc., is considered in that study. In another contribution, Kaur et al. [32] proposed an approach that uses traditional metabolite ratios as features for brain MR tumor classification. A parameter-free bat algorithm in conjunction with the Fisher criterion is suggested to
determine the optimal set of weights to formulate a fused metabolite ratio. An overview of the current CAD systems for brain tumor grade classification is outlined in [33]. In the past decade, automated brain tumor segmentation using MR images has made significant strides and several algorithms have been devised. Liao et al. [34] proposed a kernelized fuzzy clustering based model for MR brain image segmentation. Later, a multiscale and multiblock fuzzy C-means algorithm was proposed in [35] for automatic brain image segmentation. Li et al. [36] presented an automated brain tumor segmentation method using maximum a posteriori probabilistic (MAP) estimation. A framework based on a localized active contour model and background intensity compensation is proposed in [37] for brain tumor image segmentation in multimodal MRI data. Recently, Narayanan et al. [38] proposed a combined algorithm to segment multi-channeled MR brain images using a modified fuzzy c-means algorithm, PSO, and bacterial foraging optimization. It is worth mentioning that the proposed work mainly focuses on the classification of pathological brain MR images into the normal or pathological category, where the pathological class includes samples from different categories such as brain stroke, degenerative disease, infectious disease, and brain tumor. The key observations from the aforementioned literature are summarized as follows. Most of the existing approaches employ DWT or its variants to extract salient features from the brain MR images. However, DWT can capture features in only a few directions and hence, it cannot derive important texture information from brain MR images in which the edges are usually curved. Furthermore, classifiers such as FNN and SVM are commonly utilized despite the fact that they suffer from issues such as poor computational scalability, slow training speed, and the need for non-trivial human intervention. Thus, it is imperative to exploit a classifier with fast learning speed and good generalization performance. The literature also demonstrates that some existing approaches demand a large number of features to obtain satisfactory results. Thus, there is still ample scope for improving the detection accuracy with low-dimensional feature requirements. In this paper, attempts have been made to overcome the above-mentioned issues. This paper is an extension of a recent
Table 1 – A summary of the existing works on pathological brain detection.

Author | Feature/method | Classifier | No. of images | No. of features
Chaplot et al. [5] | Discrete wavelet transform (DWT) | SVM | 52 | 4761
El-Dahshan et al. [6] | DWT and PCA | K-NN | 70 | 7
Zhang et al. [7,8] | DWT and PCA | FNN + ACPSO, Kernel SVM | 160 | 19
Zhang et al. [9,10] | DWT and PCA | FNN + SCG, FNN + SCABC | 66 | 19
Das et al. [11] | Ripplet transform and PCA | Least-square SVM | 66, 160, 255 | 9
El-Dahshan et al. [2] | FPCNN, DWT and PCA | FNN | 101 | 7
Dong et al. [12] | Stationary wavelet transform (SWT) and PCA | Generalized eigenvalue proximal SVM | 66, 160, 255 | 7
Nayak et al. [13] | DWT and probabilistic PCA | AdaBoost with random forest (RF) | 66, 160, 255 | 7
Chen et al. [14] | Minkowski-Bouligand dimension features | FNN + improved PSO | 66, 160, 255 | 5
Nayak et al. [15] | Discrete ripplet-II transform | ELM + modified PSO | 66, 160, 255 | 13
Zhang et al. [16] | Pseudo Zernike moment | Kernel SVM | 66, 160, 255 | 200

PCA: principal component analysis, PSO: particle swarm optimization, ACPSO: adaptive chaotic PSO, SCG: scaled conjugate gradient, SCABC: scaled chaotic artificial bee colony, FNN: feed-forward neural network, ELM: extreme learning machine.
Table 2 – A summary of the existing CAD systems developed using multiresolution technique based texture features.

Author | Feature/method | Classifier | No. of images | No. of features
Dong et al. [17] | Wavelet packet Shannon entropy (WPSE) and Tsallis entropy (WPTE) | Generalized eigenvalue proximal SVM | 66, 160, 255 | 16
Wang et al. [18] | Fractional Fourier entropy (FRFE) and Welch's t-test | Twin SVM | 66, 160, 255 | 12
Zhou et al. [19] | Wavelet entropy (WE) | Naive Bayes | 64 | 7
Zhang et al. [20] | Wavelet entropy and Hu moment invariants (HMI) | Generalized eigenvalue proximal SVM | 66, 160, 255 | 14
Yang et al. [21] | Wavelet energy | Kernel SVM + BBO | 90 | 10
Wang et al. [22] | Dual-tree complex wavelet transform (DTCWT) variance and entropy features | Generalized eigenvalue proximal SVM | 66, 160, 255 | 12
Lu et al. [23] | Wavelet entropy features | ELM with bat algorithm (BA-ELM) | 132 | 7
Nayak et al. [24] | Stationary wavelet energy and entropy features | AdaBoost with SVM | 66, 160, 255 | 7
Zhang et al. [25] | Wavelet packet Tsallis entropy | ELM + Jaya | 66, 160, 255 | 64
Lu et al. [26] | Wavelet entropy | Kernel ELM | 125 | 7
Wang et al. [27] | SWT entropy (SWE) features | Kernel SVM | 66, 160, 255 | 7

BBO: biogeography-based optimization.
work presented at IEEE RO-MAN 2018 [39]. The contributions of this paper are outlined below.

(a) An unequally-spaced FFT based fast discrete curvelet transform is employed to extract multi-directional features.
(b) A randomized learning technique known as regularized extreme learning machine (RELM) is employed to perform classification, which overcomes the problems of conventional learning algorithms.
(c) The randomly assigned hidden node parameters as well as the regularization parameter of RELM strongly influence the classification performance. A new method combining a contemporary meta-heuristic technique, namely the sine cosine algorithm, and RELM is proposed to optimize the parameters of RELM.
(d) Along with the root mean squared error, the norm of the output weights and the condition value of the hidden layer output matrix have been separately taken into consideration in the optimization process. The impetus behind this is to achieve good generalization performance with a compact and well-conditioned network.
(e) The effectiveness of the proposed method is evaluated on three standard datasets and a comparative analysis has been made with the existing methods.
(f) The strength of the proposed scheme is also verified in the presence of noisy and mislabeled data.

The remainder of the paper is organized as follows. Section 2 includes a description of the datasets considered in our work. The proposed methodology is detailed in Section 3. Section 4 presents the simulation results and comparisons with state-of-the-art methods. Finally, the concluding remarks are given in Section 5.
2. Datasets used
To gauge the effectiveness of the proposed scheme, three benchmark datasets, namely DS-I [9], DS-II [11], and DS-III [11], are taken into consideration, which have been sourced from the Harvard Medical School website [40]. The datasets DS-I and DS-II contain 66 (18 normal and 48 pathological) and 160 (20 normal and 140 pathological) axial MR images respectively, whereas DS-III accommodates 255 (35 normal and 220 pathological) axial MR samples. Each image in these datasets is of the T2-weighted type and holds a resolution of 256 × 256 pixels. Fig. 1 shows sample MR images (both normal and pathological) taken from the datasets. It may be noted here that these datasets hold pathological MR samples from four different categories of brain diseases: brain stroke, degenerative disease, infectious disease and brain tumor.
3. Proposed methodology
The proposed scheme comprises four major phases, preprocessing, feature extraction, feature dimensionality reduction and classification, to automatically identify the pathological brain. Fig. 2 depicts the general framework of our proposed automated model.
3.1. Preprocessing
For biomedical image analysis tasks, it has become a common practice to perform preprocessing of the input images prior to feature extraction, which helps to obtain prominent features. In this work, the contrast limited adaptive histogram equalization (CLAHE) method has been taken into consideration to improve the image contrast. CLAHE first computes a histogram of gray values in the region that surrounds each pixel and then assigns to each pixel a value that lies in the display range [41,24]. It uses a clip limit to clip the histogram before the estimation of the cumulative distribution function. The portions which surpass the clip limit are redistributed equally among all histogram bins.
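A minimal sketch of this preprocessing step, written in Python with scikit-image, is given below. The clip limit of 0.01 and 256 bins follow the settings reported in Section 4.1; the file path and helper name are illustrative assumptions, not part of the original pipeline.

```python
import numpy as np
from skimage import exposure, img_as_float, io


def preprocess_mr_slice(path):
    """Enhance a T2-weighted MR slice with CLAHE (clip limit 0.01, 256 bins)."""
    image = img_as_float(io.imread(path, as_gray=True))  # 256 x 256 axial slice
    # CLAHE: local histograms are clipped at the clip limit and the excess is
    # redistributed equally over the bins before equalization.
    enhanced = exposure.equalize_adapthist(image, clip_limit=0.01, nbins=256)
    return enhanced
```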
Fig. 1 – Example of sample MR images (a) normal brains and (b) pathological brains.
Fig. 2 – Block diagram of the proposed automated framework for pathological brain diagnosis.

3.2. Feature extraction

The classifier's performance is primarily controlled by the features of the dataset and hence, several feature representation algorithms have been proposed in the past decades. The wavelet transform has been utilized in a variety of image classification applications due to its multiresolution analysis and time-frequency localization properties. One of the major concerns of the wavelet transform is that it captures texture details in limited orientations, thereby undermining its usage while dealing with biomedical images where edges are curved in general. On the other hand, the curvelet transform facilitates capturing texture information along multiple scales and orientations, and therefore, has gained remarkable attention from researchers for the extraction of relevant features. In addition, the curvelet transform holds several important characteristics such as multiresolution, anisotropy, and localization [42]. The extraction of salient features from brain MR images has remained challenging due to their complex tissue structure. In order to overcome this challenge, we employ the fast discrete curvelet transform (FDCT) for deriving prominent features from the brain images. The mathematical representation of the curvelet transform and the suggested feature extraction process are described in the following subsections.

3.2.1. Fast discrete curvelet transform

Suppose g is a 2D function; the curvelet transform is expressed as
$$C(a, b, t) = \langle g, \varphi_{a,b,t} \rangle \quad (1)$$

where $\varphi_{a,b,t}$ denotes the curvelet basis function, and b, a, and t indicate the direction, scale, and position parameters respectively. For a Cartesian array $g[q_1, q_2]$ with $0 \le q_1, q_2 < n$, the discrete representation of the curvelet transform is stated as follows
$$C^D(a, b, t) = \sum_{0 \le q_1, q_2 < n} g[q_1, q_2]\, \varphi^D_{a,b,t}[q_1, q_2] \quad (2)$$

where $\varphi^D_{a,b,t}$ represents a digital curvelet waveform. Later, Candes et al. [43] proposed the fast discrete curvelet transform (FDCT) to resolve the problems of the traditional curvelet transform. The notable characteristics of FDCT are that it is less redundant, faster, and simpler compared to its standard version. In [43], two separate algorithms are reported to implement FDCT: (i) wrapping based FDCT (FDCT-WR), and (ii) unequally spaced fast Fourier transform based FDCT (FDCT-USFFT). The FDCT-USFFT algorithm is taken into consideration in this work since its efficacy has not yet been fully investigated in the literature. In
contrast, FDCT-WR has been fairly exploited in the literature to design features [41,44,45]. Moreover, FDCT-USFFT generates a real discretization of the continuous representation. The subtleties of this FDCT technique are well described in [43].
3.2.2. Image decomposition and feature vector computation

The images are first decomposed based on the FDCT-USFFT technique and the features are then derived from the subbands at distinct scales (levels) and orientations. The number of decomposition levels (s) is decided using the following expression [43]

$$s = \left\lceil \log_2(\min(n_r, n_c)) - 3 \right\rceil \quad (3)$$

where $n_r$ and $n_c$ denote the number of rows and columns of the image; for the 256 × 256 images used here this gives $s = 8 - 3 = 5$. Fig. 3 illustrates the decomposed FDCT-USFFT subbands at various scales (s = 5) for a sample brain image. From the figure, it can be seen that the low-frequency subband is placed at the center, while the high-frequency subbands are placed on the external strips. Each strip comprises subbands in different directions. In this investigation, the low-frequency as well as high-frequency subbands are taken into account for feature extraction. A total of 130 subbands (1 in the coarse level, 32 in the 2nd level, 32 in the 3rd level, 64 in the 4th level, 1 in the fine level) are acquired using FDCT-USFFT decomposition of the MR images (refer to Fig. 3). It is known that the information obtained by the curvelet at angles b and b + π is similar. Thus, the symmetric subbands are discarded in our current work, thereby removing the redundancy in the feature representation. Finally, a set of 66 subbands is selected and the coefficients of these subbands are amassed to generate the feature vector.

Fig. 3 – FDCT-USFFT decomposition for an example MR image (a) brain image and (b) decomposed components.
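The following sketch illustrates only the subband selection and coefficient concatenation described above. The actual FDCT-USFFT implementation comes from CurveLab [43]; here `fdct_usfft` is a hypothetical transform handle that returns, per scale, a list of directional subband coefficient arrays, so this is not the authors' code.

```python
import numpy as np


def curvelet_feature_vector(image, fdct_usfft, scales=5):
    """Build the curvelet feature vector by keeping one wedge of each symmetric pair.

    `fdct_usfft` is assumed to return a list with one entry per scale, each entry
    being a list of 2D coefficient arrays (one per direction), e.g.
    1 + 32 + 32 + 64 + 1 = 130 subbands for a 5-scale decomposition.
    """
    bands = fdct_usfft(image, scales)
    selected = []
    for wedges in bands:
        if len(wedges) == 1:
            # Coarsest / finest scale: a single low- or high-frequency subband.
            selected.extend(wedges)
        else:
            # Wedges at angle b and b + pi carry the same information,
            # so only the first half of each directional scale is kept.
            selected.extend(wedges[: len(wedges) // 2])
    # 1 + 16 + 16 + 32 + 1 = 66 subbands survive; their (real) coefficients
    # are concatenated into one long feature vector.
    return np.concatenate([np.asarray(w).ravel() for w in selected])
```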
3.3. Feature dimensionality reduction
The dimensionality of the feature vector derived from the suggested feature extraction scheme is observed to be significantly high. Feature reduction offers advantages like efficient data storage, classification, and data visualization. Both PCA and linear discriminant analysis (LDA) have been extensively applied for this purpose. However, it is well known that LDA fails to yield better results while dealing with a small sample size problem, where the within-class scatter matrix very often becomes singular [46]. To mitigate this problem, a combined PCA + LDA method has been employed, where PCA is first applied to reduce the D-dimensional feature vector to an M-dimensional vector and subsequently, LDA is harnessed to generate the l-dimensional vector, l << M < D.
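A minimal sketch of this two-stage reduction with scikit-learn is shown below, assuming a training matrix X with one feature row per image and binary labels y. Note that scikit-learn's LDA projects to at most C − 1 dimensions (one component for a two-class problem), so this is only an illustrative stand-in for the PCA + LDA pipeline, not the exact implementation used by the authors.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


def reduce_features(X_train, y_train, X_test, n_lda=1):
    """PCA to M = N - 1 dimensions, then LDA to an l-dimensional discriminant vector."""
    n_train = X_train.shape[0]
    pca = PCA(n_components=n_train - 1)            # keep the N - 1 leading components
    Z_train = pca.fit_transform(X_train)
    Z_test = pca.transform(X_test)

    lda = LinearDiscriminantAnalysis(n_components=n_lda)
    F_train = lda.fit_transform(Z_train, y_train)  # low-dimensional, discriminant features
    F_test = lda.transform(Z_test)
    return F_train, F_test
```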
3.4. Classification

The classification of the MR brain images is finally performed using a new hybridized learning method that combines regularized ELM and the sine cosine algorithm.

3.4.1. Extreme learning machine (ELM)
ELM is a simple but effective learning technique for single-hidden layer feed-forward neural networks (SLFNs) that provides better generalization performance at faster training speed, and therefore, it has been used in a wide variety of classification applications [47,48]. ELM overcomes the bottleneck caused by standard gradient-based learning schemes [49]. ELM follows the working principle of randomized learning algorithms, where the input weights and hidden biases are randomly assigned and kept fixed during the learning process, and the output weights are computed using the least squares method. Given a set of training samples $(x_j, y_j)$, where $x_j = [x_{j1}, x_{j2}, \ldots, x_{jl}]^T \in R^l$ and $y_j = [y_{j1}, y_{j2}, \ldots, y_{jC}]^T \in R^C$, the ELM algorithm is outlined as follows.

1. Randomly generate the input weights and biases $(w^h_i, b_i)$, $i = 1, 2, \ldots, Q$.
2. Evaluate the hidden layer output matrix $H$.
3. Determine the output weights $w^o = H^{\dagger} Y$

where Q indicates the number of hidden neurons and $H^{\dagger}$ represents the Moore-Penrose generalized inverse of H. An additional and effective method for evaluating $w^o$ is proposed in [50] and is expressed as

$$w^o = \left( \frac{I}{\lambda} + H^T H \right)^{-1} H^T Y \quad (4)$$

where a regularization parameter λ is introduced to achieve a substantially more stable solution and greater generalization performance. This variation of ELM is dubbed regularized ELM (RELM). Despite many advantages, RELM poses many critical issues due to the random initialization of the hidden node parameters (hidden biases and input weights). The first issue is that it demands a greater number of hidden neurons. Second, it produces an ill-conditioned network. Third, it yields different classification results in multiple trials. The conditioning of a network is estimated by a well-known measure known as the 2-norm condition number ($K_2$), which is defined as follows [51]

$$K_2(H) = \sqrt{\frac{\lambda_{max}(H^T H)}{\lambda_{min}(H^T H)}} \quad (5)$$
where $\lambda_{min}(H^T H)$ and $\lambda_{max}(H^T H)$ denote the smallest and largest eigenvalues of $H^T H$. It should be noted that a higher value of $K_2$ causes an ill-conditioned network, while its lower
value leads to a well-conditioned network. The aforementioned issues cause ELM to exhibit poor generalization performance and to respond slowly to unknown testing samples. A series of attempts have been made in the last few years to address these problems with the assistance of meta-heuristic methods such as genetic algorithms (GA) [52], PSO [53], and differential evolution (DE) [54].
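The following numpy sketch shows RELM training via Eq. (4) with a sigmoid hidden layer, together with the output-weight norm and the condition number $K_2$ of Eq. (5) that the proposed hybrid methods use as selection criteria. Variable names and the small eigenvalue floor are assumptions made for illustration.

```python
import numpy as np


def relm_train(X, Y, W_h, b, lam):
    """Solve Eq. (4): w_o = (I/lam + H^T H)^{-1} H^T Y for a sigmoid SLFN."""
    H = 1.0 / (1.0 + np.exp(-(X @ W_h.T + b)))        # hidden layer output matrix (N x Q)
    Q = H.shape[1]
    A = np.eye(Q) / lam + H.T @ H
    w_o = np.linalg.solve(A, H.T @ Y)                 # output weights (Q x C)

    norm_w = np.linalg.norm(w_o)                      # norm of the output weights
    eigvals = np.linalg.eigvalsh(H.T @ H)             # eigenvalues of H^T H
    k2 = np.sqrt(eigvals.max() / max(eigvals.min(), np.finfo(float).eps))  # Eq. (5)
    return w_o, norm_w, k2


def relm_predict(X, W_h, b, w_o):
    """Network response for new samples."""
    H = 1.0 / (1.0 + np.exp(-(X @ W_h.T + b)))
    return H @ w_o
```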
3.4.2. Sine cosine algorithm (SCA)

SCA is a trigonometric function based optimization technique proposed by Mirjalili [55]. It utilizes two trigonometric functions, namely cosine and sine, to search for the optimum solution. SCA has been found to be effective for solving optimization problems in comparison with its counterparts. The solution update equation of SCA is defined as

$$u(k+1) = \begin{cases} u(k) + r_1 \cdot \sin(r_2) \cdot |r_3 u^* - u(k)| & \text{if } r_4 < 0.5 \\ u(k) + r_1 \cdot \cos(r_2) \cdot |r_3 u^* - u(k)| & \text{if } r_4 \ge 0.5 \end{cases} \quad (6)$$

where |·| indicates the absolute value, k denotes the current iteration, and u(k) and $u^*$ represent the current solution and the best solution achieved so far. The parameters $r_1$, $r_2$, $r_3$, and $r_4$ indicate four random variables; $r_1$ determines the position of the subsequent solution and is computed as

$$r_1 = p - k \frac{p}{MAX_{itr}} \quad (7)$$

where p is a constant and $MAX_{itr}$ denotes the maximum number of generations. The settings of $r_2$ and $r_3$ are discussed in Section 4. The parameter $r_4$ holds a value between 0 and 1, and switches between the sine and cosine equations.
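A small sketch of one SCA generation implementing Eqs. (6) and (7) is given below. The uniform sampling ranges for $r_2$, $r_3$ and $r_4$ follow Table 4; the function and variable names are assumptions.

```python
import numpy as np


def sca_step(population, best, k, max_itr, p=2.0, rng=None):
    """Update every candidate solution for iteration k using Eqs. (6) and (7)."""
    rng = np.random.default_rng() if rng is None else rng
    r1 = p - k * (p / max_itr)                     # Eq. (7): decreases linearly to 0
    new_pop = np.empty_like(population)
    for z, u in enumerate(population):
        r2 = rng.uniform(0.0, 2.0 * np.pi, size=u.shape)
        r3 = rng.uniform(0.0, 2.0, size=u.shape)
        r4 = rng.uniform(0.0, 1.0)
        step = r1 * np.abs(r3 * best - u)
        # Eq. (6): r4 switches between the sine and the cosine update.
        new_pop[z] = u + step * (np.sin(r2) if r4 < 0.5 else np.cos(r2))
    return new_pop
```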
3.4.3. Proposed hybrid regularized ELM method

The generalization capability of RELM relies on various factors such as the regularization parameter (λ), random hidden biases, and random input weights. Therefore, these parameters need to be chosen appropriately to obtain good generalization performance. To the best of our knowledge, the effectiveness of SCA has not yet been investigated for optimizing the parameters of RELM. In this study, SCA has been explored to find the optimal parameters of RELM and the resulting model, termed SCA-RELM, has been employed to classify the pathological brains. In [56], Bartlett shows that for feedforward neural networks with small learning error, a smaller norm of the weights implies higher generalization performance. Additionally, it is mentioned in [57,51] that the random assignment of input weights and hidden biases in ELM leads to an ill-conditioned problem, thereby causing numerically unstable solutions. Most of the earlier studies on ELM focused on the convergence error while ignoring the condition of the SLFN. Therefore, along with the root-mean-square error (RMSE), the norm of the output weights and the condition value of the hidden layer output matrix are separately considered in the optimization process to establish a well-conditioned and compact SLFN, and the suggested methods are referred to as 'SCA-RELM + Norm' and 'SCA-RELM + Condition' respectively. The proposed methods are described stepwise in the following.

SCA-RELM + Norm: The suggested SCA-RELM + Norm algorithm consists of the following steps.

(a) Initialize the candidate solutions $u_z$, $z = 1, \ldots, P_s$, randomly; each candidate solution comprises a set of input weights, hidden biases and a parameter λ as

$$u_z = \left[ w^h_{11}, \ldots, w^h_{1l}, \ldots, w^h_{Q1}, \ldots, w^h_{Ql}, b_1, \ldots, b_Q, \lambda \right] \quad (8)$$

(b) For each solution, compute $w^o$ using Eq. (4) and the fitness on the validation set as

$$f(\cdot) = \sqrt{\frac{\sum_{j=1}^{N} \left\| \sum_{i=1}^{Q} w^o_i \varphi(w^h_i \cdot x_j + b_i) - y_j \right\|_2^2}{N}} \quad (9)$$

where N indicates the size of the validation set. Determine the best solution $u^*$ in the population.

(c) Evaluate $r_1$, $r_2$, $r_3$ and $r_4$, and update each candidate solution based on Eq. (6).

(d) Test if any solution goes beyond the search space and modify it as

$$u_z(k+1) = \begin{cases} low & \text{if } u_z(k+1) < low \\ high & \text{if } u_z(k+1) > high \end{cases} \quad (10)$$

(e) Compute the fitness of each new solution and obtain the new best solution $u^*_{new}$.

(f) Update the best solution $u^*$ as follows

$$u^* = \begin{cases} u^*_{new} & \text{if } |f(u^*) - f(u^*_{new})| < \epsilon f(u^*) \text{ and } \|w^o_{u^*_{new}}\| < \|w^o_{u^*}\| \\ u^* & \text{otherwise} \end{cases} \quad (11)$$

where $f(u^*)$ and $f(u^*_{new})$ represent the fitness of the best solution obtained so far and the new best solution respectively, $w^o_{u^*}$ and $w^o_{u^*_{new}}$ are the output weights of the optimal solution achieved so far and the new optimal solution, and ε > 0 denotes a user-specified tolerance rate.
(g) Repeat (c)–(f) till the maximum number of generations is reached and finally, acquire the optimal solution.

SCA-RELM + Condition: The proposed SCA-RELM + Condition method follows similar steps as SCA-RELM + Norm except step (f), which is stated as follows
$$u^* = \begin{cases} u^*_{new} & \text{if } |f(u^*) - f(u^*_{new})| < \epsilon f(u^*) \text{ and } K_{u^*_{new}} < K_{u^*} \\ u^* & \text{otherwise} \end{cases} \quad (12)$$

where $K_{u^*}$ and $K_{u^*_{new}}$ are the condition values of the hidden layer output matrix corresponding to the best solution so far and the new best solution respectively. The prime advantages of the suggested algorithms are summarized as follows: (i) they ensure a well-conditioned SLFN, (ii) they tend to yield a smaller norm value, and (iii) they offer greater generalization performance with a compact SLFN architecture.
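The sketch below illustrates the fitness of Eq. (9) and the acceptance rules of Eqs. (11) and (12). It reuses the `relm_train` / `relm_predict` helpers sketched in Section 3.4.1; the decoding of a candidate u into (W_h, b, λ) and the tolerance value are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np


def fitness(u, X_tr, Y_tr, X_val, Y_val, Q, l):
    """Eq. (9): validation RMSE of the RELM decoded from candidate u."""
    W_h = u[: Q * l].reshape(Q, l)            # input weights
    b = u[Q * l : Q * l + Q]                  # hidden biases
    lam = u[-1]                               # regularization parameter
    w_o, norm_w, k2 = relm_train(X_tr, Y_tr, W_h, b, lam)
    err = relm_predict(X_val, W_h, b, w_o) - Y_val
    rmse = np.sqrt(np.mean(np.sum(err ** 2, axis=1)))
    return rmse, norm_w, k2


def accept(best, candidate, eps=0.05, criterion="norm"):
    """Eqs. (11)/(12): each argument is (u, (rmse, norm, k2)); keep the new best
    only if its fitness is comparable and its norm (or K2) is smaller."""
    f_b, norm_b, k2_b = best[1]
    f_c, norm_c, k2_c = candidate[1]
    smaller = (norm_c < norm_b) if criterion == "norm" else (k2_c < k2_b)
    return candidate if (abs(f_b - f_c) < eps * f_b and smaller) else best
```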
4. Experimental results and discussion
The effectiveness of the proposed framework is validated through exhaustive simulations on three standard datasets:
Table 3 – Specification of the three standard binary MR datasets used.

Dataset | Training (Normal / Pathological) | Validation (Normal / Pathological) | Testing (Normal / Pathological) | k-fold
DS-I | 12 / 32 | 3 / 8 | 3 / 8 | 6
DS-II | 12 / 84 | 4 / 28 | 4 / 28 | 5
DS-III | 21 / 132 | 7 / 44 | 7 / 44 | 5
DS-I, DS-II, and DS-III. The performance results of the proposed framework are compared with the state-of-the-art schemes. The efficacy of the suggested learning algorithms, SCA-RELM + Norm and SCA-RELM + Condition, is compared with relevant learning algorithms such as the back-propagation neural network (BPNN), traditional ELM, PSO-ELM, and DE-ELM. Finally, the impact of noise in the training data on our proposed scheme is analyzed.
4.1. Experimental setting
For a fair comparison, the k-fold cross-validation setting in [11,22,41] is adopted for our experiments. However, in this work, the k − 1 folds are partitioned into two sets, a validation set and a training set, while the testing set remains the same. It may be noted that the samples for the training, validation and testing sets are chosen randomly. The validation set has been utilized to determine the parameters of the SCA-RELM + Norm, SCA-RELM + Condition, PSO-ELM and DE-ELM algorithms. Table 3 lists the cross-validation settings of the three datasets utilized for simulation. The clip limit for CLAHE is set as 0.01 and the number of bins is set as 256. We set the parameter s of FDCT-USFFT to 5 based on Eq. (3) and the results are found ideal in this setting. The parameter M in the PCA + LDA method is set as N − 1 (N represents the number of training samples), while l is determined experimentally. The sigmoid activation function has been used in all the learning algorithms and the inputs to all the algorithms have been normalized into the range [−1, 1]. The input weights and hidden biases are initialized in the range [−1, 1]. The regularization parameter is chosen from $\lambda \in \{2^{-10}, 2^{-9}, 2^{-8}, \ldots, 2^{8}, 2^{9}, 2^{10}\}$. The population size and the number of iterations are set to 20 and 30 respectively and have been kept the same over all the methods to derive a relative comparative analysis. The ε value in the SCA-RELM + Norm and SCA-RELM + Condition methods is varied between 0.01 and 0.2 with a step size of 0.01, and the value that yields better performance is chosen. The algorithm-specific parameters of the different classifiers are listed in Table 4. These parameters are either chosen through a trial and error method or are the default values used in the literature.
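A sketch of how one fold could be assembled under this setting (stratified random train/validation/test split with the per-class counts of Table 3, plus the λ grid) is shown below; the helper name, label encoding and random seed are assumptions for illustration only.

```python
import numpy as np

# RELM regularization grid: lambda in {2^-10, ..., 2^10}
LAMBDA_GRID = [2.0 ** e for e in range(-10, 11)]


def make_fold(X, y, counts, rng=None):
    """Randomly split each class into train/validation/test subsets.

    `counts` maps class label -> (n_train, n_val, n_test), e.g. for DS-III
    {0: (21, 7, 7), 1: (132, 44, 44)} with 0 = normal, 1 = pathological.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    idx_tr, idx_va, idx_te = [], [], []
    for label, (n_tr, n_va, n_te) in counts.items():
        idx = rng.permutation(np.flatnonzero(y == label))
        idx_tr.extend(idx[:n_tr])
        idx_va.extend(idx[n_tr : n_tr + n_va])
        idx_te.extend(idx[n_tr + n_va : n_tr + n_va + n_te])
    return (X[idx_tr], y[idx_tr]), (X[idx_va], y[idx_va]), (X[idx_te], y[idx_te])
```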
4.2. Results
In this section, we present the results obtained from various stages of the proposed scheme. Fig. 4 depicts the preprocessing output of a sample brain image. It can be observed that the affected regions are clearly visible in the preprocessed image as compared to the original image. With the proposed feature extraction method, we obtained 125952 features from the preprocessed MR images. Thereafter, the PCA + LDA algorithm has been harnessed to reduce such a high-dimensional feature vector. First, PCA has been employed to reduce the original dimension to N − 1 dimensions and then, LDA is applied over the resultant feature vector. In order to determine the value of l, an additional test has been carried out, where we recorded the classification accuracies with respect to different numbers of reduced features, as shown in Fig. 5. The results of PCA are also computed and compared with PCA + LDA. It can be observed that PCA + LDA demands only two features to attain the optimal performance, whereas PCA obtains a comparatively lower performance with 14 features. This is due to the fact that LDA requires class information to decide the number of leading eigenvectors. It is worth mentioning here that the classification accuracies shown in the figure have been obtained using the SCA-RELM + Norm classifier over the DS-III dataset. Finally, classification is individually performed using SCA-RELM + Norm [39] and SCA-RELM + Condition. The performance of these proposed methods has been compared with PSO-ELM [53], DE-ELM [54], traditional ELM [49], and BPNN in terms of classification accuracy (%), number of hidden neurons (Q), norm of the output weights and condition number ($K_2$) over the three datasets, and the results are tabulated in Tables 5–7.
Table 4 – Parameter settings of different classification methods.

Method | Parameter | Value(s)
Proposed SCA-RELM | Constant p | 2
Proposed SCA-RELM | r2 in solution updating equation | [0, 2π]
Proposed SCA-RELM | r3 in solution updating equation | [0, 2]
Proposed SCA-RELM | r4 in solution updating equation | [0, 1]
PSO-ELM | Acceleration coefficients c1 and c2 | 2
PSO-ELM | Inertia factor | [0.4, 0.9]
DE-ELM | Crossover rate Cr | 0.7
DE-ELM | Scaling factor fs | 0.8
Fig. 4 – Results of preprocessing using CLAHE (a) original MR image and (b) enhanced image.
Fig. 5 – Classification accuracy versus the number of reduced features for the DS-III dataset.

Table 5 – Performance evaluation of different classification methods on the DS-I dataset.

Classifier | Acc (%) | Hidden neurons | Norm | K2
BPNN | 100.00 | 4 | – | –
ELM | 100.00 | 5 | 30.4136 | 4.1260e+03
DE-ELM | 100.00 | 3 | 18.4813 | 51.3119
PSO-ELM | 99.85 | 3 | 20.4912 | 60.0386
SCA-RELM + Norm | 100.00 | 2 | 12.0096 | 25.4302
SCA-RELM + Condition | 100.00 | 2 | 5.9238 | 12.3559

Table 6 – Performance evaluation of different classification methods on the DS-II dataset.

Classifier | Acc (%) | Hidden neurons | Norm | K2
BPNN | 99.88 | 4 | – | –
ELM | 100.00 | 5 | 33.5066 | 7.8646e+03
DE-ELM | 99.94 | 3 | 20.4078 | 70.1131
PSO-ELM | 100.00 | 3 | 17.3990 | 66.8353
SCA-RELM + Norm | 100.00 | 2 | 10.1493 | 17.7510
SCA-RELM + Condition | 100.00 | 2 | 5.6665 | 10.2759

It is observed that the proposed classifiers achieve higher classification accuracy, smaller norm values, and smaller $K_2$ values in comparison with their counterparts across all the datasets. An improved accuracy is observed with SCA-RELM + Norm when compared with SCA-RELM + Condition on DS-III. However, the SCA-RELM + Condition method obtains the smallest norm and condition values compared to SCA-RELM + Norm. It can also be noticed here that the traditional ELM requires the highest number of hidden neurons, whereas the proposed methods require fewer hidden neurons to achieve better performance. The results from these tables infer that the suggested classification methods can produce good generalization performance with a compact and well-conditioned SLFN. Examples of misclassified MR samples predicted by our proposed SCA-RELM + Norm algorithm over the DS-III dataset are shown in Fig. 6. These misclassifications occur due to the strong similarities in the pathological regions in these images.

4.3. Comparison with state-of-the-art schemes

The performance of the proposed schemes 'FDCT + PCA + LDA + SCA-RELM + Condition' and 'FDCT + PCA + LDA + SCA-RELM + Norm' has been compared with different sets of existing approaches over the three standard binary datasets in terms of classification accuracy and feature dimension, and the results are shown in Table 8.
It can be seen that the proposed schemes achieve improved classification accuracy compared to the competing schemes on the DS-II and DS-III datasets. However, we have observed a comparable performance with the state-of-the-art methods over the DS-I dataset. Further, compared to other schemes, considerably fewer features are required by the proposed approaches.
4.4. Impact of noise in training data
The training MR data is likely to be noisy because of several factors such as mislabeling and problems during image acquisition and transmission, which can reduce the model performance. An ideal model is expected to provide stable results in the presence of noise in the training data. In this study, we consider two types of noise that may exist in the training data, random Gaussian noise and mislabeling of training data, and investigate their impacts on the system performance separately. The efficacy of the two proposed schemes ('FDCT + PCA + LDA + SCA-RELM + Norm' and 'FDCT + PCA + LDA + SCA-RELM + Condition', also referred to as Scheme-I and Scheme-II respectively) has been verified. In the first experiment, we have injected random Gaussian noise with a variance in the range [0.01, 0.1] into the training data and the results obtained by the two schemes over the three considered datasets are listed in Table 9. A slight reduction in classification accuracy is observed with both the proposed schemes when trained on the corrupted data. This shows the robustness of the proposed schemes towards random noise. In another experiment, to simulate mislabeled data, we randomly label some percentage of the total MR samples with wrong classes. This type of noise is most likely to occur during the manual labeling of MR samples. To the best of our knowledge, this is the first study to explore the strength of an automated pathological brain detection model in the presence of such noise. In order to analyze the impact of this noise, experiments are conducted on the three considered datasets with a varying mislabeled percentage of the training data. Fig. 7 illustrates the performances obtained by our proposed methods with the inclusion of mislabeled data in the training set. The percentage of mislabeled data is varied from 0% to 20% with a step size of 2%. It can be seen that the schemes trained using the mislabeled/noisy data experience a marginally decreasing testing accuracy on all three datasets when the percentage of noise increases. It is observed that in most cases Scheme-II is more tolerant to mislabeled data than Scheme-I.
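A minimal sketch of the two corruption protocols described above (additive Gaussian noise on the training data and random label flipping for a binary problem) is given below. The variance range and mislabeling percentages follow the text; the function names, binary label encoding and random generators are assumptions.

```python
import numpy as np


def add_gaussian_noise(X_train, variance=0.05, rng=None):
    """Inject zero-mean Gaussian noise; the variance lies in [0.01, 0.1] in the study."""
    rng = np.random.default_rng() if rng is None else rng
    return X_train + rng.normal(0.0, np.sqrt(variance), size=X_train.shape)


def flip_labels(y_train, fraction=0.10, rng=None):
    """Mislabel a fraction of training samples (0-20% in steps of 2% in the study)."""
    rng = np.random.default_rng() if rng is None else rng
    y_noisy = y_train.copy()
    n_flip = int(round(fraction * len(y_train)))
    idx = rng.choice(len(y_train), size=n_flip, replace=False)
    y_noisy[idx] = 1 - y_noisy[idx]   # binary labels: normal (0) vs pathological (1)
    return y_noisy
```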
Table 7 – Performance evaluation of different classification methods on the DS-III dataset.

Classifier | Acc (%) | Hidden neurons | Norm | K2
BPNN | 99.37 | 4 | – | –
ELM | 99.49 | 5 | 102.6282 | 7.9502e+03
DE-ELM | 99.57 | 3 | 33.7713 | 121.7516
PSO-ELM | 99.61 | 3 | 22.0464 | 103.9707
SCA-RELM + Norm | 99.73 | 2 | 9.2361 | 17.0421
SCA-RELM + Condition | 99.65 | 2 | 7.9609 | 14.9542
Fig. 6 – Example of misclassified images in DS-III dataset.
Table 8 – Performance comparison of the proposed schemes with state-of-the-art methods.

Scheme | #Feature | Acc (%) DS-I | Acc (%) DS-II | Acc (%) DS-III
DWT + SVM + POLY [5] | 4761 | 98.00 | 97.15 | 96.37
Texture features + LS-SVM [31] | 52 | 99.70 | 99.19 | 98.94
DWT + PCA + BPNN + SCG [9] | 19 | 100.00 | 98.29 | 97.14
DWT + PCA + FNN + SCABC [10] | 19 | 100.00 | 98.93 | 97.81
DWT + PCA + FNN + ACPSO [7] | 19 | 100.00 | 98.75 | 97.38
DWT + PCA + Kernel SVM [8] | 19 | 100.00 | 99.38 | 98.82
WPSE + Generalized eigenvalue proximal SVM [17] | 16 | 99.85 | 99.62 | 98.78
WPTE + Generalized eigenvalue proximal SVM [17] | 16 | 100.00 | 100.00 | 99.33
WE + HMI + Generalized eigenvalue proximal SVM [20] | 14 | 100.00 | 99.56 | 98.63
DWT + PCA + AdaBoost with RF [13] | 13 | 100.00 | 99.18 | 98.35
DR2T + PCA + MPSO-ELM [15] | 13 | 100.00 | 99.62 | 99.18
FRFE + WTT + Twin SVM [18] | 12 | 100.00 | 99.69 | 98.98
DTCWT + VE + Generalized eigenvalue proximal SVM [22] | 12 | 100.00 | 99.75 | 99.25
RT + PCA + Least-square SVM [11] | 9 | 100.00 | 100.00 | 99.39
Wavelet + Histogram + GLCM + PSO-CPN [29] | 9 | 99.85 | 98.94 | 98.71
DWT + PCA + K-NN [6] | 7 | 98.00 | 97.54 | 96.79
FPCNN + DWT + PCA + FNN [2] | 7 | 100.00 | 98.88 | 98.43
SWT + PCA + Generalized eigenvalue proximal SVM [12] | 7 | 100.00 | 99.62 | 99.02
WE + Naive Bayes [19] | 7 | 92.58 | 91.87 | 90.51
WE + Kernel ELM [26] | 7 | 99.24 | 98.38 | 98.16
SWE + Kernel SVM [27] | 7 | 100.00 | 99.18 | 99.02
WE + BA-ELM [23] | 7 | 99.39 | 98.75 | 98.62
MBD + FNN + IPSO [14] | 5 | 100.00 | 98.19 | 98.08
FDCT + PCA + LDA + SCA-RELM + Condition | 2 | 100.00 | 100.00 | 99.65
FDCT + PCA + LDA + SCA-RELM + Norm | 2 | 100.00 | 100.00 | 99.73
Table 9 – Classification accuracy of the proposed schemes with injected Gaussian noise in the training data.

Model | DS-I | DS-II | DS-III
Scheme-I w/o Gaussian noise | 100.00 | 100.00 | 99.73
Scheme-I w/ Gaussian noise | 99.24 | 99.81 | 99.49
Scheme-II w/o Gaussian noise | 100.00 | 100.00 | 99.65
Scheme-II w/ Gaussian noise | 99.70 | 99.66 | 99.37

w/o: without and w/: with.
4.5. Computation time analysis
Table 10 shows the average computation time of the proposed schemes during training, validation and testing over the DS-III dataset. For Scheme-I, the training, validation and testing times per image are computed as 0.4859 s, 0.0021 s, and 0.0018 s, respectively; whereas for Scheme-II, these costs are 0.5984 s, 0.0019 s, and 0.0016 s, respectively. The average testing time to predict a query image shows that the proposed model is rather fast and can hence meet real-time requirements.
Fig. 7 – Classification accuracy obtained by the proposed schemes when trained on mislabeled data.
Table 10 – Average computation time (in seconds) for the DS-III dataset.

Proposed scheme | Training time (per image) | Validation time (per image) | Testing time (per image)
Scheme-I | 0.4859 | 0.0021 | 0.0018
Scheme-II | 0.5984 | 0.0019 | 0.0016
5. Conclusion
This paper presented an automated system for detection of the pathological brain in MR images. The FDCT-USFFT algorithm is applied to decompose the MR images into a set of subbands at different scales and directions, and the features are then extracted from the chosen subbands. These features are more effective for predicting brain abnormality since they are captured along multiple scales and orientations. The PCA + LDA algorithm is then harnessed to not only generate a reduced and discriminant feature vector but also eliminate the small sample size problem. The reduced feature sets are finally subjected to the proposed hybrid classifier to distinguish the MR images as normal or pathological. The proposed hybrid classifier is designed by incorporating RELM and SCA to achieve improved results. SCA has been employed to optimize the parameters of RELM. Different from the existing hybrid ELM classifiers, the proposed hybrid classifier optimizes the parameters by considering the norm of the output weights and the condition of the SLFN in isolation along with the root mean square error. The goal was to design a compact and well-conditioned network with better generalization performance. A series of experiments have been conducted on three well-studied datasets to validate the proposed scheme. The performance comparison with state-of-the-art methods demonstrates that our proposed scheme obtains higher performance on the DS-III dataset with relatively fewer features, while a competitive performance is achieved on the DS-I and DS-II datasets. It has been observed that the proposed SCA-RELM + Norm and SCA-RELM + Condition algorithms provide greater generalization performance with a compact network architecture compared to other learning methods. Further, the results with different noises introduced in the training data show the robustness of our proposed scheme. The proposed system can hence be used as a complementary tool by the radiologists to validate their diagnosis. In this study, the proposed scheme has been evaluated on three standard datasets that comprise relatively few MR samples; however, its effectiveness needs to be examined over a large and diverse dataset. The potential of the proposed classifiers could be tested on real-world binary/multiclass classification problems as well as regression problems. Furthermore, the proposed work could be extended to the detection of other diseases or the classification of diseases, e.g., the classification of different kinds of brain tumors or brain strokes. For this, we only need to replace the number of neurons in the output layer of the proposed 'SCA-RELM + Norm' network by the number of classes required. In future, various data augmentation techniques could be investigated to balance the datasets.
Author Statement

Deepak Ranjan Nayak: Conceptualization, Methodology, Software, Writing - Original draft preparation.
Ratnakar Dash: Formal Analysis, Resources, Supervision, Validation.
Banshidhar Majhi: Investigation, Writing - Reviewing and Editing, Visualization, Supervision.
Yudong Zhang: Data curation, Writing - Reviewing and Editing, Software, Validation.
Conflict of interest We have no conflicts of interest.
references
[1] Caponetti L, Castellano G, Corsini V. MR brain image segmentation: a framework to compare different clustering techniques. Information 2017;8(4):138. [2] El-Dahshan EA, Mohsen HM, Revett K, Salem ABM. Computer-aided diagnosis of human brain tumor through MRI: a survey and a new algorithm. Expert Syst Appl 2014;41 (11):5526–45. [3] Zhang Y, Nayak DR, Yang M, Yuan T-F, Liu B, Lu H, Wang S. Detection of unilateral hearing loss by stationary wavelet entropy. CNS Neurol Disord Drug Targets 2017;16(2):122–8. [4] Nayak DR, Dash R, Majhi B. Classification of brain MR images using discrete wavelet transform and random forests. Fifth National Conference on Computer Vision Pattern Recognition Image Processing and Graphics (NCVPRIPG) 2015;1–4. IEEE. [5] Chaplot S, Patnaik LM, Jagannathan NR. Classification of magnetic resonance brain images using wavelets as input to support vector machine and neural network. Biomed Signal Process Control 2006;1(1):86–92. [6] El-Dahshan ESA, Honsy T, Salem ABM. Hybrid intelligent techniques for MRI brain images classification. Digit Signal Process 2010;20(2):433–41. [7] Zhang Y, Wang S, Wu L. A novel method for magnetic resonance brain image classification based on adaptive chaotic PSO. Prog Electromagn Res 2010;109:325–43. [8] Zhang Y, Wu L. An MR brain images classifier via principal component analysis and kernel support vector machine. Prog Electromagn Res 2012;130:369–88. [9] Zhang Y, Dong Z, Wu L, Wang S. A hybrid method for MRI brain image classification. Expert Syst Appl 2011;38 (8):10049–53. [10] Zhang Y, Wu L, Wang S. Magnetic resonance brain image classification by an improved artificial bee colony algorithm. Prog Electromagn Res 2011;116:65–79. [11] Das S, Chowdhury M, Kundu K. Brain MR image classification using multiscale geometric analysis of ripplet. Prog Electromagn Res 2013;137:1–17. [12] Dong Z, Liu A, Wang S, Ji G, Zhang Z, Yang J. Magnetic resonance brain image classification via stationary wavelet transform and generalized eigenvalue proximal support vector machine. J Med Imaging Health Inform 2015;5 (7):1395–403. [13] Nayak DR, Dash R, Majhi B. Brain MR image classification using two-dimensional discrete wavelet transform and AdaBoost with random forests. Neurocomputing 2016;177:188–97. [14] Chen X-Q, Zhan T-M, Jiao Z-Q, Sun Y, Yao Y, Fang L-T, Lv YD, Wang S-H. Fractal dimension estimation for developing pathological brain detection system based on Minkowski– Bouligand method. IEEE Access 2016;4:5937–47. [15] Nayak DR, Dash R, Majhi B. Discrete ripplet-II transform and modified PSO based improved evolutionary extreme learning machine for pathological brain detection. Neurocomputing 2018;282:232–47.
[16] Zhang Y-D, Jiang Y, Zhu W, Lu S, Zhao G. Exploring a smart pathological brain detection method on pseudo Zernike moment. Multimedia Tools Appl 2018;77(17):22589–604. [17] Dong Z, Wang S, Ji G, Yang J. Preclinical diagnosis of magnetic resonance (MR) brain images via discrete wavelet packet transform with Tsallis entropy and generalized eigenvalue proximal support vector machine. Entropy 2015;17(4):1795–813. [18] Wang S, Zhang Y, Yang X, Sun P, Dong Z, Liu A, Yuan T-F. Pathological brain detection by a novel image featurefractional Fourier entropy. Entropy 2015;17(12):8278–96. [19] Zhou X, Wang S, Xu W, Ji G, Phillips P, Sun P, Zhang Y. Detection of pathological brain in MRI scanning based on wavelet-entropy and naive Bayes classifier. in: Bioinformatics and Biomedical Engineering 2015;201–9. [20] Zhang Y, Wang S, Sun P, Phillips P. Pathological brain detection based on wavelet entropy and Hu moment invariants. Bio-med Mater Eng 2015;26(s1):S1283–90. [21] Yang G, Zhang Y, Yang J, Ji G, Dong Z, Wang S, Feng C, Wang Q. Automated classification of brain images using waveletenergy and biogeography-based optimization. Multimedia Tools Appl 2016;75(23):15601–17. [22] Wang S, Lu S, Dong Z, Yang J, Yang M, Zhang Y. Dual-tree complex wavelet transform and twin SVM for pathological brain detection. Appl Sci 2016;6(6):169. [23] Lu S, Qiu X, Shi J, Li N, Lu Z-H, Chen P, Yang M-M, Liu F-Y, Jia W-J, Zhang Y. A pathological brain detection system based on extreme learning machine optimized by bat algorithm. CNS Neurol Disord Drug Targets 2017;16(1):23–9. [24] Nayak DR, Dash R, Majhi B. Stationary wavelet transform and adaboost with SVM based pathological brain detection in MRI scanning. CNS Neurol Disord Drug Targets 2017;16 (2):137–49. [25] Zhang Y-D, Zhao G, Sun J, Wu X, Wang Z-H, Liu H-M, Govindaraj VV, Zhan T, Li J. Smart pathological brain detection by synthetic minority oversampling technique, extreme learning machine, and Jaya algorithm. Multimedia Tools Appl 2017;1–20. [26] Lu S, Lu Z, Yang J, Yang M, Wang S. A pathological brain detection system based on kernel based ELM. Multimedia Tools Appl 2018;77(3):3715–28. [27] Wang S, Du S, Atangana A, Liu A, Lu Z. Application of stationary wavelet entropy in pathological brain detection. Multimedia Tools Appl 2018;77(3):3701–14. [28] Zacharaki EI, Wang S, Chawla S, Soo Yoo D, Wolf R, Melhem ER, Davatzikos C. Classification of brain tumor type and grade using MRI texture and shape in a machine learning scheme. Magn Reson Med 2009;62(6):1609–18. [29] Hemanth DJ, Vijila CKS, Anitha J. Performance improved PSO based modified counter propagation neural network for abnormal MR brain image classification. Int J Adv Soft Comput Appl 2010;2(1):65–84. [30] Hu LS, Ning S, Eschbacher JM, Gaw N, Dueck AC, Smith KA, Nakaji P, Plasencia J, Ranjbar S, Price SJ, et al. Multiparametric MRI and texture analysis to visualize spatial histologic heterogeneity and tumor extent in glioblastoma. PLoS ONE 2015;10(11):e0141506. [31] Kaur T, Saini BS, Gupta S. A novel feature selection method for brain tumor MR image classification based on the fisher criterion and parameter-free bat optimization. Neural Comput Appl 2018;29(8):193–206. [32] Kaur T, Saini BS, Gupta S. An optimal spectroscopic feature fusion strategy for MR brain tumor classification using fisher criteria and parameter-free BAT optimization algorithm. Biocybern Biomed Eng 2018;38(2):409–24. [33] Mohan G, Subashini MM. MRI based medical image analysis: survey on brain tumor grade classification. 
Biomed Signal Process Control 2018;39:139–61.
[34] Liao L, Lin T. MR brain image segmentation based on kernelized fuzzy clustering using fuzzy Gibbs random field model. in: 2007 IEEE/ICME International Conference on Complex Medical Engineering 2007;529–35. IEEE. [35] Yang X, Fei B. A multiscale and multiblock fuzzy C-means classification method for brain MR images. Med Phys 2011;38(6):2879–91. [36] Li Y, Jia F, Qin J. Brain tumor segmentation from multimodal magnetic resonance images via sparse representation. Artif Intell Med 2016;73:1–13. [37] Ilunga-Mbuyamba E, Avina-Cervantes JG, Garcia-Perez A, de Jesus Romero-Troncoso R, Aguirre-Ramos H, CruzAceves I, Chalopin C. Localized active contour model with background intensity compensation applied on automatic MR brain tumor segmentation. Neurocomputing 2017;220:84–97. [38] Narayanan A, Rajasekaran MP, Zhang Y, Govindaraj V, Thiyagarajan A. Multi-channeled MR brain image segmentation: a novel double optimization approach combined with clustering technique for tumor identification and tissue segmentation. Biocybern Biomed Eng 2018. [39] Nayak DR, Dash R, Lu Z, Lu S, Majhi B. SCA-RELM: a new regularized extreme learning machine based on sine cosine algorithm for automated detection of pathological brain. in: 27th IEEE International Symposium on Robot and Human Interactive Communication (RO-MAN) 2018;764–9. IEEE. [40] Johnson KA, Becker JA. The whole brain atlas; 1999, http:// www.med.harvard.edu/AANLIB/. [41] Nayak DR, Dash R, Majhi B, Prasad V. Automated pathological brain detection system: a fast discrete curvelet transform and probabilistic neural network based approach. Expert Syst Appl 2017;88:152–64. [42] Starck J-L, Candès EJ, Donoho DL. The curvelet transform for image denoising. IEEE Trans Image Process 2002;11 (6):670–84. [43] Candes E, Demanet L, Donoho D, Ying L. Fast discrete curvelet transforms. Multiscale Model Simul 2006;5(3):861– 99. [44] Uçar A, Demir Y, Güzelis C. A new facial expression recognition based on curvelet transform and online sequential extreme learning machine initialized with spherical clustering. Neural Comput Appl 2016;27 (1):131–42. [45] Sumana IJ, Islam MM, Zhang D, Lu G. Content based image retrieval using curvelet transform. Multimedia Signal Processing 2008 IEEE 10th Workshop on 2008;11–6. IEEE. [46] Yang J, Yang J-y. Why can LDA be performed in PCA transformed space? Pattern Recogn 2003;36(2):563–6. [47] Huang G-B, Wang DH, Lan Y. Extreme learning machines: a survey. Int J Mach Learn Cybern 2011;2(2):107–22. [48] Huang G-B, Zhou H, Ding X, Zhang R. Extreme learning machine for regression and multiclass classification. IEEE Trans Syst Man Cybern Part B (Cybern) 2012;42(2):513–29. [49] Huang G-B, Zhu Q-Y, Siew C-K. Extreme learning machine: theory and applications. Neurocomputing 2006;70 (1):489–501. [50] Deng W, Zheng Q, Chen L. Regularized extreme learning machine. IEEE Symposium on Computational Intelligence and Data Mining (CIDM'09) 2009;389–95. IEEE. [51] Zhao G, Shen Z, Miao C, Man Z. On improving the conditioning of extreme learning machine: a linear case. 7th International Conference on Information Communications and Signal Processing 2009;1–5. ICICS, IEEE. [52] Suresh S, Babu RV, Kim H. No-reference image quality assessment using modified extreme learning machine classifier. Appl Soft Comput 2009;9(2):541–52.
[53] Xu Y, Shu Y. Evolutionary extreme learning machine based on particle swarm optimization. International Symposium on Neural Networks 2006;644–52. Springer. [54] Zhu Q-Y, Qin AK, Suganthan PN, Huang G-B. Evolutionary extreme learning machine. Pattern Recognit 2005;38 (10):1759–63. [55] Mirjalili S. SCA: a sine cosine algorithm for solving optimization problems. Knowl Based Syst 2016;96:120–33.
[56] Bartlett PL. The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. IEEE Trans Inf Theory 1998;44(2):525–36. [57] Han F, Yao H-F, Ling Q-H. An improved evolutionary extreme learning machine based on particle swarm optimization. Neurocomputing 2013;116:87–93.