A deep stacked random vector functional link network autoencoder for diagnosis of brain abnormalities and breast cancer


Biomedical Signal Processing and Control 58 (2020) 101860


Deepak Ranjan Nayak a,∗, Ratnakar Dash b, Banshidhar Majhi a, Ram Bilas Pachori c, Yudong Zhang d

a Department of Computer Science and Engineering, Indian Institute of Information Technology, Design and Manufacturing, Kancheepuram 600127, India
b Department of Computer Science and Engineering, National Institute of Technology, Rourkela 769008, India
c Discipline of Electrical Engineering, Indian Institute of Technology Indore, Indore 453552, India
d Department of Informatics, University of Leicester, Leicester LE1 7RH, UK

Article info

Article history:
Received 22 October 2019
Received in revised form 20 December 2019
Accepted 11 January 2020

Keywords: Magnetic resonance imaging; Autoencoder; Random vector functional link network; Deep neural network; ReLU

Abstract

Automated diagnosis of two-class brain abnormalities through magnetic resonance imaging (MRI) has progressed significantly in the past few years. In contrast, only a limited number of methods have been proposed to date for multiclass brain abnormality detection. Such detection has shown its importance in biomedical research and has remained a challenging task. Almost all existing methods are designed using conventional machine learning approaches; however, deep learning methods, owing to their advantages over machine learning, have recently achieved great success in various computer vision and medical imaging applications. In this paper, a deep neural network termed the stacked random vector functional link (RVFL) based autoencoder (SRVFL-AE) is proposed to detect multiclass brain abnormalities. RVFL autoencoders are the basic building blocks of the proposed SRVFL-AE. The main purpose of choosing RVFL as the core component of the proposed SRVFL-AE is to improve the generalization capability and learning speed compared to traditional autoencoder-based deep learning methods. Further, the rectified linear unit (ReLU) activation function is incorporated in the proposed deep network to provide fast and better hidden representations of input features. To evaluate the effectiveness of the suggested method, two benchmark multiclass MR brain datasets, MD-1 and MD-2, are considered. The scheme achieved accuracies of 96.67% and 95.00% on the MD-1 and MD-2 datasets, respectively. The efficacy of the model is also tested on a standard breast cancer dataset. The results demonstrate that our deep network obtains better performance with the least training time and a compact network architecture compared to its counterparts.

© 2020 Elsevier Ltd. All rights reserved.

1. Introduction

The brain is the control center of the body. A wide variety of brain disorders have been diagnosed in people of various age groups across all corners of the world. Brain disorders are generally classified into four categories: degenerative disease, cerebrovascular disease (stroke), neoplastic disease (brain tumor), and infectious disease [1]. These disorders cause serious problems, and the prevalence of some of them increases with age. Therefore, early screening for such disorders is indispensable to facilitate timely treatment and healthy living with the help of contemporary medical imaging

∗ Corresponding author.
E-mail addresses: [email protected] (D.R. Nayak), [email protected] (R. Dash), [email protected] (B. Majhi), [email protected] (R.B. Pachori), [email protected] (Y. Zhang).
https://doi.org/10.1016/j.bspc.2020.101860
1746-8094/© 2020 Elsevier Ltd. All rights reserved.

modalities. The most common modalities used to diagnose abnormalities of the brain are positron emission tomography (PET), magnetic resonance imaging (MRI), and computed tomography (CT) [2]. However, MRI has been used extensively for detecting brain-related abnormalities because of its dominant features, such as excellent spatial resolution, non-invasive character, and superlative soft-tissue contrast [2–5].

Detection of brain abnormalities is in general performed manually on MRIs by medical experts, and manual examination at a large scale may often cause misinterpretation because of factors such as fatigue and the overabundance of magnetic resonance (MR) slices per subject. It is also non-repeatable and leads to intra- and inter-reader variability. To mitigate these concerns, it is imperative to develop an automated detection method for the diagnosis of various brain abnormalities, one that promotes fast, reliable, and precise analysis and supports the


clinicians in their final screening process. Machine learning techniques are widely utilized to design automated systems, and such systems have achieved dramatic success in the last few decades. A large body of methods (also known as automated pathological brain detection systems) has been formulated for classifying brain MRIs, and these schemes mainly address two types of MRI-based brain disorder classification: binary and multiclass.

In binary classification, brain MRIs are classified into either the pathological (abnormal) or the normal category. Chaplot et al. [6] proposed an approach using discrete wavelet transform (DWT) features and a support vector machine (SVM) classifier. El-Dahshan et al. [7] suggested the use of the 2D DWT and principal component analysis (PCA) to derive salient features. For classification, they employed a feed-forward neural network (FNN) and a k-nearest neighbor (KNN) classifier individually. Zhang et al. [8] developed a combined approach utilizing a back-propagation neural network (BPNN) and the DWT. Das et al. [9] designed a model in which ripplet transform (RT) features are fed to a least squares SVM (LS-SVM) classifier. In [10], a spider web plot, wavelet entropy, and a probabilistic neural network (PNN) were employed. The potential of the spider web plot is discussed in [11]. El-Dahshan et al. [12] later modeled an approach in which a segmentation algorithm is used for region-of-interest selection before the traditional machine learning stages. Later, in [13], Shannon and Tsallis entropy features computed from the wavelet packet transform (DWPT-SE and DWPT-TE) were supplied to a generalized eigenvalue proximal SVM. Wang et al. [14] designed a paradigm using wavelet entropy features and a PNN classifier. Similarly, Zhou et al. [15] tested wavelet entropy features with a naive Bayes classifier. Nayak et al. [16] developed a model using the DWT and probabilistic PCA (PPCA) to generate important features.
In their work, AdaBoost with random forests (ADBRF) was employed as the classifier. Yang et al. [17] proposed calculating energy features from various DWT subbands to classify brain abnormalities. In another contribution, both energy and entropy features of different stationary wavelet transform (SWT) subbands were taken into consideration to achieve improved performance [18]. A model based on wavelet packet Tsallis entropy features was developed in [19]; for classification, a combined technique comprising an extreme learning machine (ELM) and Jaya optimization is used. Wang et al. [20] designed a framework using stationary wavelet entropy (SWE) features and a kernel SVM classifier. Recently, the authors in [21] suggested deriving multidirectional features using the discrete ripplet-II transform, with a modified particle swarm optimization (MPSO) optimized ELM classifier for efficient identification of pathological brains. Gudigar et al. [22], in their recent study, compared the impact of curvelet, wavelet, and shearlet features for binary MRI classification.

In multiclass classification, the MRIs are classified into the normal category or one of several distinct categories of brain disease, and only a few methods have been proposed to date to solve this problem. Kalbkhani et al. [23] used the wavelet transform and generalized autoregressive conditional heteroscedasticity for feature extraction. They employed KNN and SVM classifiers separately for the categorization of MRIs into eight classes, i.e., normal brain and seven distinct classes of brain disease. Jia et al. [24] designed an automated classification scheme using a deep stacked sparse autoencoder (SSAE) to distinguish MR images into five classes. Later, Nayak et al. [25] proposed an automatic screening method using entropy features extracted from curvelet subbands and a kernel ELM (K-ELM) classifier. Recently, Gudigar et al. [26] analyzed the effect of variational mode decomposition and bidimensional empirical mode decomposition for multiclass classification of MRIs using an SVM classifier.

It can be seen that numerous approaches have applied multiresolution techniques such as the wavelet transform or its improved versions for deriving significant features from brain images, even though these techniques fail to capture directional features. Classifiers such as the SVM, the FNN,

and their variants have been extensively used. Almost all existing automated systems rely heavily on the traditional machine learning architecture, in which the choice of a suitable feature descriptor and classifier is a major concern. Furthermore, it has been observed that the overall detection performance is limited.

Deep learning algorithms have achieved great success in a vast number of computer vision applications and have therefore quickly become a method of choice for examining biomedical images [27–30]. The remarkable characteristics of deep learning include the elimination of manual feature extraction, automatic high-level feature learning, and easy adaptation of the architecture to other tasks [31]. The deep stacked sparse autoencoder (SSAE) method used in [24] optimizes its parameters through an iterative procedure, which causes slow learning. This method trains independent autoencoders (AEs) in an unsupervised fashion to initialize the weights of the hidden layers and then fine-tunes the entire network using back-propagation.

In this paper, we address the aforesaid issues by proposing an efficient and fast deep learning framework based on the concept of the AE. Our scheme is referred to as the deep stacked random vector functional link autoencoder (SRVFL-AE), which can be considered a joint representation of the random vector functional link (RVFL) network and the AE. The proposed SRVFL-AE is simple and yields greater generalization performance with the least training time. The major contributions of this paper are outlined as follows:

• An RVFL-based AE (RVFL-AE) is designed for efficient feature encoding. The RVFL is the most popular randomized training algorithm for single-layer feedforward neural networks and provides an effective representation of input data using random weights and a non-linear mapping function [32,33]. Unlike standard AEs, the proposed RVFL-AE initializes the hidden parameters randomly and computes the output weights using an analytical approach.
• To establish non-linearity in the network, both the rectified linear unit (ReLU) and the leaky ReLU (LReLU) have been explored individually. The efficacy of ReLU and LReLU is compared against the widely used sigmoid and tanh activation functions.
• A novel deep learning architecture termed 'SRVFL-AE' is proposed by stacking multiple RVFL-AEs layer-wise, which provides fast and effective feature learning. The proposed SRVFL-AE comprises two major phases: (i) unsupervised feature learning, and (ii) supervised classification. The RVFL-AEs are trained in an unsupervised manner, and their output weights initialize the hidden layers of the proposed SRVFL-AE model. Finally, the weights connecting the last hidden layer to the output neurons are learned in a supervised manner using the same method as in the standard RVFL. It is worth noting that the proposed SRVFL-AE is free from the additional fine-tuning step common in traditional deep learning methods, thereby facilitating faster training. A comparative analysis of SRVFL-AE and its counterparts is performed.
• To analyze the effectiveness of SRVFL-AE, we considered two standard multiclass MRI brain datasets as well as a benchmark breast cancer dataset. Furthermore, we compared the performance of our scheme with recent existing approaches, and the results reveal the potential of the suggested network.

The remainder of this article is organized as follows. Section 2 contains a comprehensive description of the datasets used in our simulations. An overview of the RVFL network and ongoing advancements in deep learning algorithms are presented in Section 3. Section 4 discusses the proposed deep SRVFL-AE model. The experimental results and analyses on the two multiclass MRI brain datasets are provided in Section 5. Finally, Section 6 concludes the paper with a succinct overview of future work.


Fig. 1. Example brain MRIs from five different categories (a) normal, (b) degenerative, (c) stroke, (d) tumor, and (e) infectious.



2. Brain MR datasets used

To validate our suggested model as well as the existing methods, we considered two multiclass brain MRI datasets: MD-1 and MD-2. MD-1 is a small multiclass brain MRI dataset that comprises 75 images in total from five different groups: degenerative disease, stroke, tumor, infectious disease, and normal [10]. A total of 15 images is considered for each of the five categories. MD-2 [25], a reasonably larger multiclass dataset, consists of 200 brain MRIs in which each of the five classes carries the same number of MRIs. Fig. 1 depicts example brain MRIs from each of the five categories. The images in the considered datasets are available from the Harvard Medical School dataset [34] and have dimensions of 256 × 256 pixels. The MRIs are T2-weighted, and the data were acquired in the axial view plane.

3. Background

In this section, we present a brief overview of the RVFL network, which is used as the fundamental component of the proposed deep learning model, and of the recent progress on AE-based deep learning algorithms.

The RVFL network is the most popular randomized learning algorithm for single-layer feedforward neural networks (SLFNs). It provides good generalization performance at a fast training speed and has been widely used in applications such as regression and classification owing to its universal approximation ability [32,35–37]. The key principle of RVFL is to randomly initialize the parameters of the hidden layer and thereafter analytically determine the parameters of the output layer using the least squares approach. Unlike other SLFNs, the RVFL network allows direct links from the input to the output layer, which facilitates improved generalization capability [33]. Moreover, it mitigates common problems faced by iterative learning algorithms, such as getting stuck in a local minimum, overfitting, and slow convergence.

Given N labeled samples X = \{(x_i, y_i) \mid x_i \in \mathbb{R}^d, y_i \in \mathbb{R}^c, i = 1, \ldots, N\}, where x_i = [x_i^1, x_i^2, \ldots, x_i^d]^T and y_i = [y_{i1}, y_{i2}, \ldots, y_{ic}]^T, the conventional RVFL network consisting of Q enhancement nodes is stated as follows:

  H w^o = Y    (1)

where H is the matrix formed by concatenating the randomly mapped features (produced by the hidden nodes) with the input features, i.e., H = [\tilde{H} \; X], Y is the matrix of the N target vectors, and w^o is the output weight matrix. These matrices are expressed as:

  \tilde{H} = \begin{bmatrix} h(x_1) \\ \vdots \\ h(x_N) \end{bmatrix}
            = \begin{bmatrix} h_1(x_1) & \cdots & h_Q(x_1) \\ \vdots & \ddots & \vdots \\ h_1(x_N) & \cdots & h_Q(x_N) \end{bmatrix}_{N \times Q}    (2)

  X = \begin{bmatrix} x_1^1 & \cdots & x_1^d \\ \vdots & \ddots & \vdots \\ x_N^1 & \cdots & x_N^d \end{bmatrix}_{N \times d}    (3)

  H = [\tilde{H} \;\; X]_{N \times (Q+d)}    (4)

  w^o = \begin{bmatrix} {w_1^o}^T \\ \vdots \\ {w_{Q+d}^o}^T \end{bmatrix}_{(Q+d) \times c}    (5)

  Y = \begin{bmatrix} y_1^T \\ \vdots \\ y_N^T \end{bmatrix}_{N \times c}    (6)

Here, h_1(x_1) = \varphi(w_1^h \cdot x_1 + b_1), \ldots, h_Q(x_1) = \varphi(w_Q^h \cdot x_1 + b_Q), and so on; w_l^h and b_l for l = 1, \ldots, Q are the weights and biases from the input to the hidden layer, set to random values within [-u, u] and [0, u] respectively, where u indicates a positive scaling factor [33], and \varphi(\cdot) denotes any nonlinear transfer function such as radial basis, sigmoid, cosine, sine, or tanh. From (1), the output weight matrix w^o can be evaluated by the following Moore–Penrose pseudoinverse operation:

  w^o = H^{\dagger} Y    (7)

The other approach to calculating the output weights, ridge regression, solves the following problem:

  \min_{w^o} \sum_{i=1}^{N} \| y_i - h_i w^o \|^2 + \lambda \| w^o \|^2    (8)

where h_i is a row vector containing both the random hidden features and the input features of the i-th sample (a row of H in (4)). The solution (the output weights w^o) of the above problem is computed using a regularization parameter C as:

  w^o = H^T \left( H H^T + \frac{I}{C} \right)^{-1} Y    (9)
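As a concrete sketch of Eqs. (1)–(9), the closed-form ridge-regression training of an RVFL network can be written in a few lines of NumPy. The tanh transfer function, node count, and C value below are illustrative choices, not the paper's tuned settings:

```python
import numpy as np

def train_rvfl(X, Y, Q=100, C=2**5, u=1.0, seed=0):
    """Closed-form RVFL training via the ridge solution of Eq. (9).

    X : (N, d) input samples, Y : (N, c) one-hot targets.
    Q : number of enhancement nodes, C : regularization parameter.
    """
    rng = np.random.default_rng(seed)
    N, d = X.shape
    # Random input-to-hidden weights in [-u, u] and biases in [0, u]; never trained.
    W = rng.uniform(-u, u, size=(d, Q))
    b = rng.uniform(0.0, u, size=Q)
    H_enh = np.tanh(X @ W + b)      # randomly mapped enhancement features, (N, Q)
    H = np.hstack([H_enh, X])       # direct links: H = [H~  X], shape (N, Q + d)
    # Eq. (9): w_o = H^T (H H^T + I/C)^(-1) Y
    w_o = H.T @ np.linalg.solve(H @ H.T + np.eye(N) / C, Y)
    return W, b, w_o

def predict_rvfl(X, W, b, w_o):
    H = np.hstack([np.tanh(X @ W + b), X])
    return H @ w_o
```

Note how the direct input-output links are realized simply by appending the raw inputs X to the enhancement features before solving for the output weights.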


It is worth mentioning that this ridge-regression method yields a more robust solution and promotes improved generalization performance [38,39].

In any machine learning framework, the selection of appropriate features has remained a major concern, as features strongly influence generalization performance. Meticulous feature engineering is therefore of utmost importance to provide a better characterization of the input data. However, designing such engineered features is the most time-consuming task in the traditional machine learning method, and it requires domain knowledge and expertise. Neural networks (NNs) with multiple hidden layers can be useful for characterizing complex data such as images and video, where each hidden layer learns to detect progressively more complex features. However, it is in general hard to train NNs with multiple hidden layers. In recent years, methods such as the restricted Boltzmann machine (RBM) and the AE [40–42] have been extensively used to extract complex features. These methods facilitate the training of multilayer NNs one layer at a time and have been utilized as the core units of various deep learning models such as the deep belief network (DBN) [41], stacked AEs (SAE) [43], and the deep Boltzmann machine (DBM) [44,45]. Both the DBM and the DBN are formed by stacking multiple RBMs layer-wise, whereas an SAE is formed by stacking AEs one after another. In these deep NNs, the RBMs/AEs are used to learn the hidden layers separately in an unsupervised fashion, which serves to initialize the hidden-layer weights of the deep network. Finally, the full network is fine-tuned with an iterative learning method (mainly back-propagation) in a supervised manner, and therefore the learning speed of these methods is very slow.

Recently, deep learning methods have become increasingly popular in analyzing medical images.
However, to the best of our knowledge, only a limited amount of investigation has been made to date into the detection of multiclass brain abnormalities using deep learning algorithms. A deep SSAE is explored in [24], which suffers from serious issues such as slow learning and getting stuck in a local minimum. Further, its performance could possibly be improved to meet real-time requirements.

As seen from the above analysis, the RVFL achieves good generalization performance at a high training speed and can provide an effective representation of data using random weights and a non-linear mapping function. However, feature representation with the RVFL network may not be efficacious when dealing with complex data (e.g., images and videos) because of its shallow structure. A multi-layer architecture of RVFL can learn features at multiple levels of abstraction, similar to other deep networks.

4. Proposed deep stacked RVFL-AE

In this section, we propose a deep NN architecture based on the concepts of the autoencoder and the RVFL for multiclass classification of brain abnormalities. The impetus behind such a model is to achieve improved generalization capability and faster learning compared to traditional back-propagation based deep learning methods.

4.1. RVFL-AE

A traditional AE is an NN trained to reconstruct its own inputs at the outputs; hence, the dimensionalities of its output and input vectors are the same [43,46]. It has been employed as the principal component of many deep NNs. An AE is trained in an unsupervised way, and its parameters are optimized using a cost function that estimates the error between the input x and its reconstruction x̂ at the output. An AE is made up of an encoder and a decoder. For a given input x ∈ R^d, the encoder maps it to a hidden

Fig. 2. Network structure of an RVFL-AE. The ReLU/LReLU activation function is used in the hidden layer to obtain a non-linear feature mapping. The direct links are omitted here because the input and output are the same.

representation h using a transfer function, defined as follows:

  h = \varphi(W^h \cdot x + b)    (10)

where W^h is the weight matrix from the input to the hidden layer and b is a bias vector. Then, the decoder maps the latent representation back into a reconstruction vector in the input space:

  \hat{x} = \varphi(W^{h'} \cdot h + b')    (11)
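The encoder–decoder mapping of Eqs. (10) and (11) can be sketched in a few lines of NumPy. The toy dimensions and the tanh transfer function here are illustrative assumptions, not trained values:

```python
import numpy as np

def ae_forward(x, W_h, b, W_out, b_out, phi=np.tanh):
    """Forward pass of a conventional autoencoder.

    Encoder (Eq. (10)): h = phi(W_h @ x + b)
    Decoder (Eq. (11)): x_hat = phi(W_out @ h + b_out)
    """
    h = phi(W_h @ x + b)            # hidden representation
    x_hat = phi(W_out @ h + b_out)  # reconstruction of the input
    return h, x_hat

# Toy dimensions: 6-dimensional input, 3 hidden units.
rng = np.random.default_rng(0)
x = rng.standard_normal(6)
W_h, b = rng.standard_normal((3, 6)), np.zeros(3)
W_out, b_out = rng.standard_normal((6, 3)), np.zeros(6)
h, x_hat = ae_forward(x, W_h, b, W_out, b_out)
```

In an actual AE these weights would be optimized to minimize the reconstruction error between x and x_hat; the RVFL-AE described below instead fixes the encoder weights randomly and solves for the decoder weights analytically.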

Recently, a network very similar to the RVFL, called the ELM, has been used to build a deep network using an ELM-based AE (ELM-AE) [47]. The key difference between the ELM and the RVFL is that the ELM does not have direct connections from the input to the output neurons. In the ELM-based deep learning architecture (ML-ELM), the ELM-AEs are stacked layer-wise, and it does not need fine-tuning [48,49]. Zhang et al. [33] have recently evaluated the performance of RVFL over a large set of datasets and have shown the significance of the direct connections in the RVFL network: enhanced performance was achieved with RVFL compared to networks without direct links (i.e., ELM) over several datasets. This motivates us to design a deep network based on the RVFL, which we refer to as the deep stacked RVFL-AE (SRVFL-AE). The RVFL-based AEs (RVFL-AEs) are the basic building blocks of the deep SRVFL-AE.

Similar to the traditional AE, the RVFL-AE has two components, an encoder and a decoder, as shown in Fig. 2. The encoder projects the input x = [x_1, x_2, \ldots, x_d] into an RVFL feature space h(x) = [h_1(x), h_2(x), \ldots, h_Q(x)] with the help of a set of orthogonal random weights and biases (w^h, b). The weights and biases in this case are initialized according to the random projection theories in [50,51], which in turn preserve the Euclidean information of the input features. Unlike the traditional AE and the ELM-AE, the hidden-layer mapping function in the proposed RVFL-AE is chosen to be either ReLU or LReLU because of its advantages over the sigmoid, tanh, and other non-linear functions [52–55]. The ReLU and LReLU functions are mathematically stated as follows:

  \varphi_{\mathrm{ReLU}}(v) = \begin{cases} v, & v \ge 0 \\ 0, & v < 0 \end{cases}    (12)

  \varphi_{\mathrm{LReLU}}(v) = \begin{cases} v, & v \ge 0 \\ \alpha v, & v < 0 \end{cases}    (13)
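In NumPy, these two activations reduce to elementwise operations. The leaky slope α = 0.01 below is a common default; the paper does not report the value it used:

```python
import numpy as np

def relu(v):
    """Eq. (12): identity for non-negative inputs, zero otherwise."""
    return np.maximum(v, 0.0)

def leaky_relu(v, alpha=0.01):
    """Eq. (13): identity for non-negative inputs, slope alpha otherwise."""
    return np.where(v >= 0, v, alpha * v)
```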


Fig. 3. Architecture of the proposed deep SRVFL-AE. The first row represents the layer-by-layer unsupervised learning using independent RVFL-AEs. The second row indicates the deep stacked RVFL-AE, where the hidden layers are initialized with the pre-trained weights of the RVFL-AEs and the output weights are computed using the regularized least squares approach. Gray lines in the lower part of the figure indicate the output weights.

where \alpha is a fixed parameter. The decoder projects h(x) back onto the input x using a set of output weights w^o that are computed as:

  w^o = H^T \left( H H^T + \frac{I}{C} \right)^{-1} X    (14)

or

  w^o = \left( H^T H + \frac{I}{C} \right)^{-1} H^T X    (15)

where H = [h_1, h_2, \ldots, h_N]^T denotes the outputs of the hidden neurons for the N input samples and X = [x_1, x_2, \ldots, x_N]^T represents both the output and the input data.

4.2. Deep SRVFL-AE

The proposed deep SRVFL-AE is a multi-layer architecture stacked with RVFL-AEs (as shown in Fig. 3) which are trained in an unsupervised manner without fine-tuning. The RVFL-AE provides an orthogonal feature representation of the input data in its encoder part, while it analytically calculates the output weights in its decoder part, which lowers the computational cost compared to conventional deep learning methods. The upper part of Fig. 3 (shown in the dotted box) depicts the \ell-layer unsupervised feature learning with \ell independent RVFL-AEs; the output weights of each unit are used to initialize the corresponding hidden layer of the SRVFL-AE network. The output of every hidden layer in our proposed deep network is calculated as:

  H_j = \varphi\!\left( H_{j-1} \cdot (w_j^o)^T \right), \quad 1 \le j \le \ell    (16)

where \varphi(\cdot) indicates the activation function (ReLU/LReLU in this case), H_j represents the hidden output matrix at layer j, and w_j^o denotes the trained output weights of the j-th AE. Note that H_0 is the input data X. It can be seen that the proposed deep SRVFL-AE allows direct connections between the input and output nodes. The final output weights of the deep SRVFL-AE network are computed using the target labels (supervised classification), similar to the traditional RVFL, as follows:

  O^w = \bar{H}^T \left( \bar{H} \bar{H}^T + \frac{I}{C} \right)^{-1} Y    (17)

or

  O^w = \left( \bar{H}^T \bar{H} + \frac{I}{C} \right)^{-1} \bar{H}^T Y    (18)
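To make the two training phases concrete, the following NumPy sketch trains a small SRVFL-AE along the lines of Eqs. (15), (16), and (18). It is a simplification: the orthogonalization of the random encoder weights and the extra L-dimensional direct-link mapping described in the text are omitted, and all sizes are illustrative rather than the paper's tuned configuration:

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def train_srvfl_ae(X, Y, hidden_sizes=(250, 750), C_ae=(2**-1, 2**5), C_out=2**10, seed=0):
    """Sketch of deep SRVFL-AE training: unsupervised RVFL-AEs (Eqs. (15)-(16)),
    then a supervised ridge solution for the output layer (Eq. (18)).
    Direct input-output links and orthogonal weight initialization are omitted."""
    rng = np.random.default_rng(seed)
    H, layer_weights = X, []
    for Q, C in zip(hidden_sizes, C_ae):
        d_in = H.shape[1]
        W = rng.uniform(-1.0, 1.0, (d_in, Q))   # random encoder weights, never trained
        b = rng.uniform(0.0, 1.0, Q)
        A = relu(H @ W + b)                     # random hidden mapping of this AE
        # Eq. (15): closed-form decoder weights, w_o = (A^T A + I/C)^(-1) A^T H
        w_o = np.linalg.solve(A.T @ A + np.eye(Q) / C, A.T @ H)
        layer_weights.append(w_o)
        H = relu(H @ w_o.T)                     # Eq. (16): next hidden representation
    # Eq. (18): supervised ridge regression for the final output weights
    O = np.linalg.solve(H.T @ H + np.eye(H.shape[1]) / C_out, H.T @ Y)
    return layer_weights, O

def predict_srvfl_ae(X, layer_weights, O):
    H = X
    for w_o in layer_weights:
        H = relu(H @ w_o.T)
    return H @ O
```

The absence of any iterative fine-tuning loop is the point of the design: each layer's weights come from one linear solve, so the whole network is trained with a handful of matrix operations.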

where \bar{H} = [H_\ell \;\; \tilde{X}], H_\ell indicates the output of the last hidden layer, \tilde{X} = \varphi(X \cdot W) denotes a random feature mapping of X with a dimension of L (L \ll d), W is generated randomly and kept fixed, and O^w is a matrix of size (Q + L) \times c. Once the final-layer weights are trained, the proposed SRVFL-AE does not need fine-tuning, unlike traditional deep NNs. It is worth mentioning that the additional feature mapping of the input data (used to establish direct links between the input and output nodes in the deep SRVFL-AE) is only required if the dimension of the input data is reasonably high. Since in this work the proposed SRVFL-AE is evaluated on datasets comprising high-resolution MRIs, this L-dimensional mapping is adopted, which in turn reduces the computational overhead.

To sum up, the salient features of the proposed deep SRVFL-AE are as follows:

1. The proposed RVFL-AE calculates only the output weights, using an analytical approach in a single step, as opposed to conventional AEs, which learn both input and output weights through iterative approaches.
2. The ReLU/LReLU function is used in the network to obtain an effective non-linear feature mapping in place of the widely applied tanh and sigmoid functions, which in turn facilitates faster learning.
3. The output weights of each component (i.e., RVFL-AE) of the proposed deep SRVFL-AE network are utilized to initialize the weights of the corresponding hidden layer, and the training of these components is accomplished in an unsupervised fashion.
4. Only the last layer of the deep network is learned in a supervised fashion, and the most important feature of the proposed SRVFL-AE is that it eliminates the need for fine-tuning.

These properties make the deep SRVFL-AE method more computationally efficient than its competing methods.

5. Experimental evaluation

In this section, we present the simulation results that showcase the effectiveness of the proposed method. A series of experiments is carried out on two benchmark datasets (i.e., MD-1 and MD-2) to evaluate our scheme.
The proposed deep network is compared with basic RVFL and ELM along with its counterparts such as SAE [41], stacked denoising AE (SDAE) [46], and ML-ELM [47] over two datasets. A comparative analysis with state-of-the-art approaches is also carried out over the two multiclass datasets. Finally, we discuss the strengths of the proposed scheme.

Table 1 Parameter settings of different deep networks. Method

Parameter

Value (s)

SRVFL-AE

Hidden nodes in first RVFL-AE (Q1 ) Hidden nodes in second RVFL-AE (Q2 ) Random hidden nodes (L) C values used in two RVFL-AEs C value used to computer final solution Hidden nodes in first ELM-AE (Q1 ) Hidden nodes in second ELM-AE (Q2 ) C values used in two ELM-AEs C value used to computer final solution Hidden nodes in first AE (Q1 ) Hidden nodes in second AE (Q2 ) L2 regularization parameters for two AEs Number of epochs during pre-training Number of epochs during fine-training Noise corruption rate Hidden nodes in first AE (Q1 ) Hidden nodes in second AE (Q2 ) Number of epochs during pre-training Number of epochs during fine-training

250 750 100 2−1 and 25 210 750 1000 22 and 2−3 29 100 50 0.004 and 0.002 100 200 0.2 100 50 100 200

ML-ELM

SAE

SDAE

and used grid-search approach to find the best values. To derive a fair comparison, the same range of Q and C is taken into account while evaluating ML-ELM, basic ELM and RVFL networks. Note that the value of L in case of SRVFL-AE is experimentally chosen as 100. Similar to SRVFL-AE, we consider two-layer architecture for SAE, ML-ELM and SDAE. We empirically selected the best network configuration for ML-ELM and SRVFL-AE as (256 × 256)-750-1000-5 and (256 × 256)-250-750-5 respectively. To find the solutions in case of two different RVFL-AEs, two C values are set as 2−1 and 25 , whereas we set the C value to 210 for computing the final output weights. Similarly, for ML-ELM, the three regularization parameters are set to 22 , 2−3 , and 29 . We set a network configuration of (256 × 256)-100-50-5 for SAE that demands a lot of hyperparameters in comparison to SRVFL-AE and ML-ELM. In SAE, we set the number of epochs as 100 for pre-training and 200 for fine-tuning. The two AEs in this network need two L2 regularization parameters which are initialized as 0.004 and 0.002. It is worth noting that the SDAE has a similar network configuration as in SAE. The input noise corruption rate for this network is set to 0.2. The value of u in case of RVFL has been set to 1 i.e., the values of the weights and biases at the hidden layer of RVFL are initialized within a range [−1, 1] and [0, 1] respectively. A summary of list of parameters used for different deep networks is tabulated in Table 1. 5.2. Data augmentation

5.1. Experimental setup We implemented our proposed model using a system with configuration Intel Xeon 2.4 GHz processor, 64 GB RAM and single NVIDIA Quadro K2200 4 GB GPU, and developed the complete algorithm using MATLAB. We partitioned the datasets into training and testing sets. For training, we randomly selected 60% of the total MRIs (120 samples for MD-2 dataset and 45 samples for MD-1 dataset), whereas the remaining 40% (80 samples for MD-2 dataset and 30 samples for MD-1 dataset) were utilized for testing. The parameter settings of the suggested model along with its counterparts are described as follows: The SRVFL-AE used herein includes two hidden layers and therefore, we need two separate RVFL-AEs for their weight initialization. Unlike traditional NN that demands several hyperparameters to tune, SRVFL-AE requires just two hyperparameters: the regularization parameter (C) and the number of nodes (Q). Note that these hyperparameters are to be selected carefully to obtain a better generalization perfor mance.  In our simulation, we set  the value  of Q for layer  = 1, 2 as 50, 100, 150, 200, . . ., 3000 and C as 2−10 , 2−9 , . . ., 29 , 210

It is well-known that deep learning models generalize well only when trained on a very large amount of data. Data augmentation is a widely adopted technique in deep learning that generates additional data through label-preserving transformations, addressing the scarcity of training data, especially for biomedical images [56]. Commonly used augmentation operations include scaling, rotation, cropping, random translations, flipping, etc. With more training data, deep learning schemes can learn more robust features. Moreover, augmentation helps prevent overfitting and is hence considered one of the easiest and most effective ways to enhance the performance of a deep learning method [52]. As seen earlier, the training samples in the considered datasets are too few in number to support effective learning for the proposed deep learning model. Therefore, we performed data augmentation on the training samples prior to learning. The original images were augmented 63 times using the following common operations: (i) images are rotated by angles from −45° to 45° with a step size of 5°, (ii) images are flipped in different directions (vertical and horizontal), (iii) gamma correction is applied to the images with a random value ranging from 0.7 to 1.3, and (iv) images are corrupted with Gaussian noise of variance 0.01. Note that we performed Gaussian noise injection and gamma correction on the flipped and rotated images as well as on the original images. Using the above augmentation procedure, we obtained 2835 and 7560 training images for the MD-1 and MD-2 datasets respectively.

D.R. Nayak, R. Dash, B. Majhi et al. / Biomedical Signal Processing and Control 58 (2020) 101860

5.3. Results and analysis

Fig. 4. Visualization of learned weights of a RVFL-AE.
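Concretely, the 63× scheme of Section 5.2 (21 base images: the original, 18 rotations, and 2 flips, each of which is then also gamma-corrected and noise-corrupted) can be sketched in NumPy. The nearest-neighbour rotation helper is an illustrative stand-in, since the paper does not state its interpolation method:

```python
import numpy as np

def rotate_nn(img, deg):
    """Nearest-neighbour rotation about the image centre, output same size."""
    t = np.deg2rad(deg)
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    # inverse-map each output pixel to its source location
    sy = cy + (ys - cy) * np.cos(t) - (xs - cx) * np.sin(t)
    sx = cx + (ys - cy) * np.sin(t) + (xs - cx) * np.cos(t)
    sy = np.clip(np.rint(sy), 0, h - 1).astype(int)
    sx = np.clip(np.rint(sx), 0, w - 1).astype(int)
    return img[sy, sx]

def augment_63x(img, rng=None):
    """One image -> 63: original + 18 rotations + 2 flips,
    then gamma correction and Gaussian noise applied to all 21."""
    rng = rng or np.random.default_rng(0)
    base = [img]
    base += [rotate_nn(img, a) for a in range(-45, 50, 5) if a != 0]
    base += [np.flipud(img), np.fliplr(img)]                    # 21 base images
    out = list(base)
    for b in base:
        out.append(np.clip(b, 0, 1) ** rng.uniform(0.7, 1.3))  # random gamma
        out.append(b + rng.normal(0, np.sqrt(0.01), b.shape))  # noise, var 0.01
    return out
```

Applied to the 45 MD-1 and 120 MD-2 training images, a pipeline of this shape yields the reported 2835 and 7560 augmented samples.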

Table 2
Performance evaluation of the proposed SRVFL-AE network with different activation functions.

Activation function       Accuracy (%)
                          MD-1     MD-2
SRVFL-AE w/ sigmoid       93.33    90.00
SRVFL-AE w/ tanh          93.33    88.75
SRVFL-AE w/ ReLU          96.67    93.75
SRVFL-AE w/ LReLU         96.67    95.00

w/: with.
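The two rectifier variants compared in Table 2 reduce to a few lines of NumPy; the leak factor 0.01 below is an assumed default, as the paper does not report the slope it used for LReLU:

```python
import numpy as np

def relu(x):
    """ReLU: max(0, x); zeroes all negative pre-activations."""
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    """LReLU: keeps a small slope alpha for x < 0, so units never go fully 'dead'."""
    return np.where(x >= 0, x, alpha * x)
```

The non-zero gradient of LReLU on negative inputs is one plausible reason for its slight edge over plain ReLU on the MD-2 dataset.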

In this subsection, the experimental results of the proposed model and its comparison with relevant existing methods on the two multiclass MRI brain datasets are provided. Note that the original image pixels are given as input to the network, and it directly classifies an MRI brain image into one of five categories: normal, degenerative, stroke, tumor, and infectious. The impact of different activation functions on the classification accuracy of the proposed model is investigated first. That is, the efficacy of the suggested ReLU and LReLU functions in the proposed deep network is compared with the two most widely used non-linear functions (i.e., sigmoid and tanh), and the results are shown in Table 2. It is evident that our deep network with (w/) the LReLU function achieved better performance than both the sigmoid and tanh functions, while ReLU and LReLU performed comparably. The representation learned by the encoder of an RVFL-AE helps derive prominent attributes from different brain samples. The neurons in the encoder part connect to several weights that are learned to characterize a particular visual feature. Fig. 4 depicts the abstraction of features learned through the first RVFL-AE of the proposed SRVFL-AE. The weights connected to only 100 neurons (among 250) of the encoder part are shown in the figure to provide a better visualization of the learned features. It can be seen that the learned weights exhibit brain-like structures. The classification performances of the proposed SRVFL-AE network and its counterparts, namely RVFL [35], ELM [38], SAE [41], SDAE [46], and ML-ELM [47], on the testing samples of the two datasets are summarized in Tables 3 and 4. It is worth noting here that all the methods were implemented on the same machine. It is observed that the proposed SRVFL-AE obtains higher accuracies on both datasets than the other methods. Its training time is also the least among all, primarily due to the compact architecture of the proposed model. Further, an improved accuracy is revealed with


Table 3
Performance comparison of SRVFL-AE with its competing methods over the MD-1 dataset.

Method                                   Accuracy (%)   Training time (s)
SAE [41]                                 90.00          2519.70
SDAE [46]                                93.33          2933.39
ML-ELM [47]                              93.33          54.52
ELM [38]                                 80.00          80.75
RVFL [35]                                86.67          84.83
SRVFL-AE w/o direct links (Proposed)     93.33          19.93
SRVFL-AE w/ direct links (Proposed)      96.67          28.69

w/o: without, w/: with.

Table 4
Performance comparison of SRVFL-AE with its competing methods over the MD-2 dataset.

Method                                   Accuracy (%)   Training time (s)
SAE [41]                                 88.75          5953.50
SDAE [46]                                90.00          6459.27
ML-ELM [47]                              91.25          101.06
ELM [38]                                 82.50          162.50
RVFL [35]                                85.00          198.34
SRVFL-AE w/o direct links (Proposed)     91.25          39.58
SRVFL-AE w/ direct links (Proposed)      95.00          44.42

w/o: without, w/: with.
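The low training times in Tables 3 and 4 stem from the non-iterative training of SRVFL-AE: input weights and biases stay random, and only the output weights are solved in closed form by regularized least squares. A minimal single-layer RVFL sketch with direct input-output links, where the dimensions, seed, and activation choice are illustrative rather than the paper's actual settings:

```python
import numpy as np

def rvfl_train_predict(X_tr, T_tr, X_te, Q=100, C=2.0 ** 10, seed=0):
    """X_*: samples x features; T_tr: one-hot targets. Returns test class scores."""
    rng = np.random.default_rng(seed)
    d = X_tr.shape[1]
    W = rng.uniform(-1.0, 1.0, (d, Q))   # random input weights, never updated
    b = rng.uniform(0.0, 1.0, Q)         # random biases, never updated
    relu = lambda z: np.maximum(0.0, z)
    # direct links: concatenate hidden activations with the raw inputs
    H_tr = np.hstack([relu(X_tr @ W + b), X_tr])
    # regularized least squares: beta = (H'H + I/C)^-1 H'T
    beta = np.linalg.solve(H_tr.T @ H_tr + np.eye(Q + d) / C, H_tr.T @ T_tr)
    H_te = np.hstack([relu(X_te @ W + b), X_te])
    return H_te @ beta                   # argmax over columns gives the label
```

Dropping the `X_tr` / `X_te` terms from the `hstack` calls gives the "w/o direct links" variant whose consistently lower accuracy the tables report.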

SRVFL-AE containing direct links compared to SRVFL-AE without direct links. It can also be noticed that SRVFL-AE yields superior performance to the single-layer RVFL. This is because SRVFL-AE performs hierarchical learning to derive multilevel abstractions from the input images, whereas the traditional RVFL takes the raw images directly as input to carry out classification. The confusion matrices obtained by SRVFL-AE on the testing samples of the MD-1 and MD-2 datasets are shown in Fig. 5.

5.4. Comparison with state-of-the-art schemes

Table 5 lists the comparison between the proposed deep learning algorithm and state-of-the-art brain abnormality detection schemes over the two multiclass MR brain datasets. It can be observed that the proposed model achieves higher accuracies than the other schemes on both MD-1 and MD-2.

5.5. Performance evaluation on breast cancer dataset

To further test the generalization performance of the suggested deep network along with some relevant deep NN models, we considered a set of images from the Mammographic Image Analysis Society (MIAS) dataset [57], which has been extensively used for breast cancer diagnosis. The MIAS dataset comprises 322 mammogram images (207 normal and 115 abnormal), each with a resolution of 1024 × 1024 pixels. However, the images contain background with some noise; therefore, regions of interest (ROIs) of all mammograms were cropped to 128 × 128 pixels. Some example ROIs

Fig. 5. Confusion matrix obtained by SRVFL-AE with datasets (a) MD-1, and (b) MD-2.
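The per-class tallies in Fig. 5 can be reproduced from test-set predictions with a small helper; class indices 0-4 for normal, degenerative, stroke, tumor, and infectious are an assumed encoding, not one stated by the paper:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=5):
    """Rows index the true class, columns the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm
```

The diagonal sum divided by the total then gives the overall accuracies quoted in the text (96.67% on MD-1, 95.00% on MD-2).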

Table 5
Performance comparison of SRVFL-AE with existing automated schemes for brain abnormalities detection.

Reference                                  Method                                Classifier     Accuracy (%)
                                                                                                MD-1    MD-2
El-Dahshan et al. [7]                      DWT and PCA                           KNN            83.33   82.50
Zhang et al. [11] and Saritha et al. [10]  DWT and entropy                       PNN            86.67   85.00
Nayak et al. [16]                          DWT and PPCA                          ADBRF          86.67   87.50
Zhang et al. [19]                          DWPT-TE                               ELM and Jaya   83.33   86.25
Nayak et al. [21]                          Ripplet-II and PCA + LDA              MPSO-ELM       90.00   88.75
Jia et al. [24]                            SSAE                                  —              93.33   90.00
Nayak et al. [25]                          Fast curvelet transform and entropy   K-ELM          90.00   91.25
Wang et al. [20]                           SWT-DE                                RBF-KSVM       86.67   88.75
Proposed                                   Deep SRVFL-AE                         —              96.67   95.00


from both the normal and abnormal categories are depicted in Fig. 6. These ROIs were given as input to all the considered deep networks. It is worth noting that we kept the parameter settings of all methods the same as in the earlier experiments. In the simulation, we randomly selected 70% of the mammogram images (145 normal and 81 abnormal) for training and the remaining 30% (62 normal and 34 abnormal) for testing. Table 6 lists the classification performance comparison on the MIAS dataset in terms of accuracy and learning speed. It can be observed that the proposed deep network obtains 94.79% accuracy with a better training speed.

Fig. 6. Example mammographic ROIs (a) normal, and (b) abnormal.

Table 6
Comparison of classification performance on MIAS dataset.

Method                                   Accuracy (%)   Training time (s)
SAE [41]                                 89.58          1406.40
SDAE [46]                                90.63          1782.66
ML-ELM [47]                              91.67          31.56
ELM [38]                                 84.38          41.20
RVFL [35]                                87.50          48.55
SRVFL-AE w/o direct links (Proposed)     92.71          9.86
SRVFL-AE w/ direct links (Proposed)      94.79          11.38

w/o: without, w/: with.

5.6. Discussion

An extensive set of simulations was conducted to demonstrate the effectiveness of the proposed method. The proposed SRVFL-AE learns hierarchical features from the input image and serves for both feature extraction and classification. The experimental results on the two multiclass MRI datasets reveal that the proposed SRVFL-AE is superior to the traditional single-layer RVFL and ELM in terms of training speed and classification accuracy. In addition, it achieves promising performance with a faster learning speed than relevant deep NNs such as SAE and SDAE, which were shown to need more than an hour for training even on a high-performance machine. The comparison among activation functions suggested that LReLU achieves the best performance. The suggested network yielded 96.67% classification accuracy on the MD-1 dataset and 95.00% on the MD-2 dataset. The comparative analysis with existing schemes confirmed the potential of the proposed network. Further, a standard breast cancer dataset was considered to verify the strength of the proposed method, and the simulations showed promising results compared to the other relevant deep learning models as well as the single-layer ELM and RVFL.

In summary, the advantages of our method are as follows:

• Unlike other deep learning methods, the proposed method does not require learning both input and output weights through an iterative approach; instead, it learns only the output weights using a regularized least squares approach. In contrast to others, it involves no fine-tuning of parameters after training.
• The proposed system directly learns hierarchical features from the brain MRIs, which eliminates the need for hand-engineered features.
• The proposed model requires far fewer parameters to be selected during learning than iterative deep learning approaches, and it offers better generalization capability with a compact network structure.
• The proposed method obtains higher classification accuracy in detecting brain abnormalities and hence can be used as an adjunct tool by physicians to verify their screening.
• The model is well-suited to a wide range of image classification and medical image analysis problems.

The limitation of our work is that the proposed model has been validated on datasets with small sample sizes; its performance therefore needs to be tested on a larger and more diverse database. An increase in the number of brain MRIs may also demand more memory and computation time. Furthermore, the choice of the number of layers and neurons remains a concern with the proposed model.

6. Conclusion

In this paper, a novel deep learning framework referred to as the deep stacked RVFL autoencoder (SRVFL-AE) is proposed to detect multiclass abnormalities in MRI brain images. The proposed method performs layer-wise unsupervised learning using RVFL-AEs to obtain multilevel abstractions from the input image without the need for fine-tuning. Activation functions such as ReLU and leaky ReLU were explored to provide non-linear feature mapping. The prime objective of designing such a deep network was to achieve better generalization performance with a very fast training speed compared to iterative deep learning models.
The proposed model was evaluated on two benchmark multiclass MRI brain datasets and one standard breast cancer dataset. It obtained classification accuracies of 96.67% and 95.00% for the MD-1 and MD-2 datasets respectively, and 94.79% for the MIAS dataset. Comparison results with state-of-the-art schemes demonstrated that our proposed method yields superior performance. The model is simple and effective, dispenses with the traditional pipeline architecture for classification of MRI brain images, and can assist physicians in cross-checking their diagnoses. In spite of the above benefits of the proposed framework, there remains ample scope for further research. The proposed method needs to be validated on a larger dataset to verify its generalization capability. Further, the potency of SRVFL-AE can be tested on several other computer vision applications.

CRediT authorship contribution statement

Deepak Ranjan Nayak: Conceptualization, Methodology, Software, Writing - original draft. Ratnakar Dash: Formal analysis, Resources, Supervision, Validation. Banshidhar Majhi: Investigation, Writing - review & editing, Visualization, Supervision. Ram Bilas Pachori: Investigation, Writing - review & editing, Supervision, Validation. Yudong Zhang: Data curation, Writing - review & editing, Software, Validation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

[1] S. Wang, Y. Zhang, T. Zhan, P. Phillips, Y.-D. Zhang, G. Liu, S. Lu, X. Wu, Pathological brain detection by artificial intelligence in magnetic resonance imaging scanning, Prog. Electromagn. Res. 156 (2016) 105–133. [2] J. Shi, X. Zheng, Y. Li, Q. Zhang, S. Ying, Multimodal neuroimaging feature learning with multimodal stacked deep polynomial networks for diagnosis of Alzheimer’s disease, IEEE J. Biomed. Health Inform. 22 (1) (2018) 173–183. [3] J.A. Turner, S.G. Potkin, G.G. Brown, D.B. Keator, G. McCarthy, G.H. Glover, Neuroimaging for the diagnosis and study of psychiatric disorders, IEEE Signal Process. Mag. 24 (4) (2007) 112–117. [4] O. Sahu, V. Anand, V. Kanhangad, R.B.
Pachori, Classification of magnetic resonance brain images using bi-dimensional empirical mode decomposition and autoregressive model, Biomed. Eng. Lett. 5 (4) (2015) 311–320. [5] V. Bhateja, H. Patel, A. Krishn, A. Sahu, A. Lay-Ekuakille, Multimodal medical image sensor fusion framework using cascade of wavelet and contourlet transform domains, IEEE Sens. J. 15 (12) (2015) 6783–6790. [6] S. Chaplot, L.M. Patnaik, N.R. Jagannathan, Classification of magnetic resonance brain images using wavelets as input to support vector machine and neural network, Biomed. Signal Process. Control 1 (1) (2006) 86–92. [7] E.S.A. El-Dahshan, T. Honsy, A.B.M. Salem, Hybrid intelligent techniques for MRI brain images classification, Digit. Signal Process. 20 (2) (2010) 433–441. [8] Y. Zhang, Z. Dong, L. Wu, S. Wang, A hybrid method for MRI brain image classification, Expert Syst. Appl. 38 (8) (2011) 10049–10053. [9] S. Das, M. Chowdhury, K. Kundu, Brain MR image classification using multiscale geometric analysis of ripplet, Prog. Electromagn. Res. 137 (2013) 1–17. [10] M. Saritha, K.P. Joseph, A.T. Mathew, Classification of MRI brain images using combined wavelet entropy based spider web plots and probabilistic neural network, Pattern Recognit. Lett. 34 (16) (2013) 2151–2156. [11] Y. Zhang, Z. Dong, G. Ji, S. Wang, Effect of spider-web-plot in MR brain image classification, Pattern Recognit. Lett. 62 (2015) 14–16. [12] E.A. El-Dahshan, H.M. Mohsen, K. Revett, A.B.M. Salem, Computer-aided diagnosis of human brain tumor through MRI: a survey and a new algorithm, Expert Syst. Appl. 41 (11) (2014) 5526–5545. [13] Y. Zhang, Z. Dong, S. Wang, G. Ji, J. Yang, Preclinical diagnosis of magnetic resonance (MR) brain images via discrete wavelet packet transform with Tsallis entropy and generalized eigenvalue proximal support vector machine, Entropy 17 (4) (2015) 1795–1813. [14] S. Wang, P. Phillips, J. Yang, P. Sun, Y. 
Zhang, Magnetic resonance brain classification by a novel binary particle swarm optimization with mutation and time-varying acceleration coefficients, Biomed. Eng./Biomed. Tech. (2016) 1–10. [15] X. Zhou, S. Wang, W. Xu, G. Ji, P. Phillips, P. Sun, Y. Zhang, Detection of pathological brain in MRI scanning based on wavelet-entropy and naive Bayes classifier, Bioinform. Biomed. Eng. (2015) 201–209.

[16] D.R. Nayak, R. Dash, B. Majhi, Brain MR image classification using two-dimensional discrete wavelet transform and AdaBoost with random forests, Neurocomputing 177 (2016) 188–197. [17] G. Yang, Y. Zhang, J. Yang, G. Ji, Z. Dong, S. Wang, C. Feng, Q. Wang, Automated classification of brain images using wavelet-energy and biogeography-based optimization, Multimedia Tools Appl. 75 (23) (2016) 15601–15617. [18] D.R. Nayak, R. Dash, B. Majhi, Stationary wavelet transform and adaboost with SVM based pathological brain detection in MRI scanning, CNS Neurol. Disord. Drug Targets 16 (2) (2017) 137–149. [19] Y.-D. Zhang, G. Zhao, J. Sun, X. Wu, Z.-H. Wang, H.-M. Liu, V.V. Govindaraj, T. Zhan, J. Li, Smart pathological brain detection by synthetic minority oversampling technique, extreme learning machine, and jaya algorithm, Multimedia Tools Appl. (2017) 1–20. [20] S. Wang, S. Du, A. Atangana, A. Liu, Z. Lu, Application of stationary wavelet entropy in pathological brain detection, Multimedia Tools Appl. 77 (3) (2018) 3701–3714. [21] D.R. Nayak, R. Dash, B. Majhi, Discrete ripplet-ii transform and modified PSO based improved evolutionary extreme learning machine for pathological brain detection, Neurocomputing 282 (2018) 232–247. [22] A. Gudigar, U. Raghavendra, T.R. San, E.J. Ciaccio, U.R. Acharya, Application of multiresolution analysis for automated detection of brain abnormality using MR images: a comparative study, Future Gener. Comput. Syst. 90 (2019) 359–367. [23] H. Kalbkhani, M.G. Shayesteh, B. Zali-Vargahan, Robust algorithm for brain magnetic resonance image (MRI) classification based on GARCH variances series, Biomed. Signal Process. Control 8 (6) (2013) 909–919. [24] W. Jia, K. Muhammad, S.-H. Wang, Y.-D. Zhang, Five-category classification of pathological brain images based on deep stacked sparse autoencoder, Multimedia Tools Appl. 78 (4) (2019) 4045–4064. [25] D.R. Nayak, R. Dash, X. Chang, B. Majhi, S. 
Bakshi, Automated diagnosis of pathological brain using fast curvelet entropy features, IEEE Trans. Sustain. Comput. (2018) 1–12. [26] A. Gudigar, U. Raghavendra, E.J. Ciaccio, N. Arunkumar, E. Abdulhay, U.R. Acharya, Automated categorization of multi-class brain abnormalities using decomposition techniques with MRI images: a comparative study, IEEE Access (2019) 28498–28509. [27] G. Litjens, T. Kooi, B.E. Bejnordi, A.A.A. Setio, F. Ciompi, M. Ghafoorian, J.A. van der Laak, B. Van Ginneken, C.I. Sánchez, A survey on deep learning in medical image analysis, Med. Image Anal. 42 (2017) 60–88. [28] J. Schmidhuber, Deep learning in neural networks: an overview, Neural Netw. 61 (2015) 85–117. [29] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, Computer Vision and Pattern Recognition (2015) 1–9. [30] R.K. Tripathy, A. Bhattacharyya, R.B. Pachori, Localization of myocardial infarction from multi-lead ECG signals using multiscale analysis and convolutional neural network, IEEE Sens. J. 19 (23) (2019) 11437–11448. [31] X. Cao, R. Togneri, X. Zhang, Y. Yu, Convolutional neural network with second-order pooling for underwater target classification, IEEE Sens. J. 19 (8) (2018) 3058–3066. [32] Y.-H. Pao, G.-H. Park, D.J. Sobajic, Learning and generalization characteristics of the random vector functional-link net, Neurocomputing 6 (2) (1994) 163–180. [33] L. Zhang, P.N. Suganthan, A comprehensive evaluation of random vector functional link networks, Inf. Sci. 367 (2016) 1094–1105. [34] K.A. Johnson, J.A. Becker, The Whole Brain Atlas, http://www.med.harvard. edu/AANLIB/. [35] Y.-H. Pao, S.M. Phillips, D.J. Sobajic, Neural-net computing and the intelligent control of systems, Int. J. Control 56 (2) (1992) 263–289. [36] L. Zhang, P.N. Suganthan, Visual tracking with convolutional random vector functional link network, IEEE Trans. Cybern. 47 (10) (2017) 3243–3253. [37] D.R. Nayak, R. Dash, B. 
Majhi, U.R. Acharya, Application of fast curvelet tsallis entropy and kernel random vector functional link network for automated detection of multiclass brain abnormalities, Comput. Med. Imaging Graph. 77 (2019) 101656. [38] G.-B. Huang, H. Zhou, X. Ding, R. Zhang, Extreme learning machine for regression and multiclass classification, IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 42 (2) (2012) 513–529. [39] C.M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006. [40] G.E. Hinton, A practical guide to training restricted Boltzmann machines, in: Neural Networks: Tricks of the Trade, Springer, 2012, pp. 599–619. [41] G.E. Hinton, R.R. Salakhutdinov, Reducing the dimensionality of data with neural networks, Science 313 (5786) (2006) 504–507. [42] Y. Bengio, et al., Learning deep architectures for AI, Found. Trends Mach. Learn. 2 (1) (2009) 1–127. [43] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, P.-A. Manzagol, Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion, J. Mach. Learn. Res. 11 (2010) 3371–3408. [44] R. Salakhutdinov, H. Larochelle, Efficient learning of deep Boltzmann machines, International Conference on Artificial Intelligence and Statistics (2010) 693–700. [45] N. Srivastava, R.R. Salakhutdinov, Multimodal learning with deep Boltzmann machines, Advances in Neural Information Processing Systems (2012) 2222–2230.

[46] P. Vincent, H. Larochelle, Y. Bengio, P.-A. Manzagol, Extracting and composing robust features with denoising autoencoders, in: International Conference on Machine Learning, ACM, 2008, pp. 1096–1103. [47] L.L.C. Kasun, H. Zhou, G.-B. Huang, C.M. Vong, Representational learning with extreme learning machine for big data, IEEE Intell. Syst. 28 (6) (2013) 31–34. [48] J. Tang, C. Deng, G.-B. Huang, Extreme learning machine for multilayer perceptron, IEEE Trans. Neural Netw. Learn. Syst. 27 (4) (2016) 809–821. [49] D.R. Nayak, D. Das, R. Dash, S. Majhi, B. Majhi, Deep extreme learning machine with leaky rectified linear unit for multiclass classification of pathological brain images, Multimedia Tools Appl. (2019) 1–16. [50] W.B. Johnson, J. Lindenstrauss, Extensions of Lipschitz mappings into a Hilbert space, Contemp. Math. 26 (189-206) (1984) 1. [51] D. Achlioptas, Database-friendly random projections, in: Proceedings of 20th Symposium on Principles of Database Systems, ACM, 2001, pp. 274–281.


[52] A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems (2012) 1097–1105. [53] K. He, X. Zhang, S. Ren, J. Sun, Delving deep into rectifiers: surpassing human-level performance on imagenet classification, in: International Conference on Computer Vision, IEEE, 2015, pp. 1026–1034. [54] A.L. Maas, A.Y. Hannun, A.Y. Ng, Rectifier nonlinearities improve neural network acoustic models, International Conference on Machine Learning (2013). [55] B. Xu, N. Wang, T. Chen, M. Li, Empirical evaluation of rectified activations in convolutional network, arXiv preprint arXiv:1505.00853. [56] Y. Guo, Y. Liu, A. Oerlemans, S. Lao, S. Wu, M.S. Lew, Deep learning for visual understanding: a review, Neurocomputing 187 (2016) 27–48. [57] J. Suckling, et al., The mammographic image analysis society digital mammogram database, Exerpta Medica. International Congress Series (1994) 375–386.