Automated pathological brain detection system: A fast discrete curvelet transform and probabilistic neural network based approach


Accepted Manuscript

Deepak Ranjan Nayak, Ratnakar Dash, Banshidhar Majhi, Vijendra Prasad

PII: S0957-4174(17)30462-1
DOI: 10.1016/j.eswa.2017.06.038
Reference: ESWA 11409

To appear in: Expert Systems With Applications

Received date: 17 December 2016
Revised date: 22 June 2017
Accepted date: 23 June 2017

Please cite this article as: Deepak Ranjan Nayak, Ratnakar Dash, Banshidhar Majhi, Vijendra Prasad, Automated pathological brain detection system: A fast discrete curvelet transform and probabilistic neural network based approach, Expert Systems With Applications (2017), doi: 10.1016/j.eswa.2017.06.038

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Highlights

• The proposed scheme can efficiently detect pathological brain in real-time.
• FDCT via wrapping scheme is employed to capture curve features from MR images.
• The proposed PBDS is validated on several benchmark datasets.
• It has a potential to be installed on medical robots.
• The proposed scheme outperforms 21 existing competent schemes.

Automated pathological brain detection system: A fast discrete curvelet transform and probabilistic neural network based approach

Deepak Ranjan Nayak^{a,∗}, Ratnakar Dash^a, Banshidhar Majhi^a, Vijendra Prasad^a

^a Pattern Recognition Lab, Department of Computer Science and Engineering, National Institute of Technology, Rourkela, India 769 008

Abstract

Computer-aided diagnosis (CAD) systems have drawn the attention of researchers for arriving at high-quality and faster clinical decisions, and have hence become one of the most important directions of research. In this paper, we propose an efficient CAD system to classify pathological and healthy brains using brain MR images. The suggested pathological brain detection system (PBDS) can help radiologists initiate corrective measures for treating ailing patients at an early stage. The proposed scheme uses a simplified pulse-coupled neural network (SPCNN) for region of interest (ROI) segmentation and the fast discrete curvelet transform (FDCT) for feature extraction. Subsequently, a PCA+LDA approach is harnessed for feature dimensionality reduction, and finally a probabilistic neural network (PNN) is applied for classification. The scheme is validated on various standard datasets and compared with existing competent schemes with respect to classification accuracy and number of features. The statistical setup is kept similar to that reported in the recent literature to derive an unbiased analysis. Experimental results demonstrate that the suggested scheme yields higher accuracy than the others with considerably fewer features. The number of parameters that need to be tuned at different stages is significantly smaller than in existing schemes. Further, the PNN used has a simple network structure and offers faster learning speed. Therefore, the proposed scheme can effectively detect pathological brain in real-time and hence has the potential to be installed on medical robots.

Keywords: Computer-aided diagnosis (CAD), Magnetic resonance imaging (MRI), Pulse-coupled neural network (PCNN), Fast discrete curvelet transform (FDCT), PCA+LDA, Probabilistic neural network (PNN)



∗ Corresponding author. Email addresses: [email protected] (Deepak Ranjan Nayak), [email protected] (Ratnakar Dash), [email protected] (Banshidhar Majhi), [email protected] (Vijendra Prasad)

Preprint submitted to Journal of Expert System with Applications

July 3, 2017


1. Introduction

During the past few decades, brain diseases have become common among people of different age groups across the globe. These diseases can be categorized into various groups such as cerebrovascular diseases (stroke), neoplastic diseases (brain tumor), infectious diseases and degenerative diseases. Sometimes these diseases cause serious damage to the human brain and lead to death. An early diagnosis system that arrives at correct and quick clinical decisions is a basic necessity to mitigate the calamity through pathological brain detection (PBD) (Wang et al., 2016a). Magnetic resonance imaging (MRI) is a promising medical imaging technique for solving many PBD problems as it provides rich information about the soft tissues of the human brain (Westbrook, 2014; Zhang, Dong, Wu, & Wang, 2011a). In addition, MRI is a non-invasive and faster imaging modality in contrast to other modalities. However, interpreting MR images manually is a costly, troublesome and time-consuming task because of the enormous amount of data involved (Nayak et al., 2016a). This has paved the way for modeling computer-aided diagnosis (CAD) systems which can aid radiologists in taking accurate and fast decisions. The CAD systems in this regard are generally dubbed pathological brain detection systems (PBDSs), which are particularly designed for brain disease identification.

Many attempts have been made to propose various PBDSs in the recent past (El-Dahshan et al., 2014). However, the choice of proper features and relevant classifiers remains a challenge during the development of these systems. Discrete wavelet transform (DWT) has been found worthy in extracting features from brain MR images as it can analyze images at several scales. DWT can effectively handle point singularities, but it performs poorly in representing line and curve singularities. In other words, DWT cannot capture curve-like features effectively from MR images. Further, two classifiers, namely the feed-forward neural network (FNN) and the support vector machine (SVM), have been frequently employed in earlier PBDSs due to their abilities to separate nonlinear input patterns and predict continuous functions. Nevertheless, traditional training algorithms of FNN suffer from parameter tuning and local minima issues. To overcome such issues in FNN, a few hybrid algorithms have been suggested in recent PBDSs in which different population-based optimization algorithms such as simulated annealing (SA), genetic algorithm (GA), particle swarm optimization (PSO), and artificial bee colony (ABC) are embedded with FNN. However, they lead to higher computational overhead. Additionally, the conventional SVM classifier suffers from higher computational complexity as it constantly solves a quadratic programming problem, and it performs poorly on large datasets. The number of features in most PBDSs is also large, which makes the classifier's task more difficult and time-consuming. Recently, El-Dahshan et al. (2014) have suggested a scheme, FPCNN + DWT + PCA + FNN, which utilizes a feedback pulse-coupled neural network (FPCNN) for ROI extraction, DWT for feature extraction, PCA for feature dimensionality reduction and FNN for classification; the scheme is validated on a dataset of 101 images. However, their scheme suffers from the following limitations:

(a) FPCNN contains one additional feedback input and requires ten parameters to tune.
(b) DWT cannot capture the curve-like features that are inherent in MR images.
(c) Validation has been performed on a dataset with a small number of images.
(d) The back-propagation algorithm has been used to train the FNN, which is apt to be trapped at local minima and requires parameter tuning.

To alleviate these limitations, here we propose a PBDS with the following characteristics:

(a) A simplified pulse-coupled neural network (SPCNN) is employed for ROI extraction, which has a simple network structure and needs only four parameters to tune.
(b) Fast discrete curvelet transform (FDCT) is used in place of DWT since FDCT is efficient in capturing curve-like features.
(c) Simulations are carried out on various benchmark datasets in order to validate the proposed scheme. In this context, the proposed PBDS is compared with El-Dahshan's scheme as well as other competent schemes.
(d) A probabilistic neural network (PNN) classifier is exploited, which offers a faster training process and does not suffer from the local minima issue.

The rest of the article is structured as follows. Section 2 highlights the related works. The materials used in the experiments are given in Section 3. The proposed methodology is delineated in Section 4. Section 5 outlines the experimental results and comparisons with other existing schemes. In Section 6, concluding remarks are presented along with an outline of the future research directions.

2. Related Work

In the last decade, a number of schemes have been reported in the literature for detecting brain abnormality using MR images. In light of the type of features used, these schemes fall into two types: direct feature based schemes and indirect feature based schemes. In the former, the direct coefficients of various image transforms are used as the key features; whereas in the latter, statistical descriptors like energy, entropy, standard deviation, etc., are first calculated from the direct coefficients to serve as features.

Chaplot et al. (2006) first developed a PBDS using two-dimensional DWT (2D DWT) features and two separate classifiers, self-organizing map (SOM) and SVM. In another work, Maitra & Chatterjee (2006) have proposed a scheme which utilizes back-propagation neural network (BPNN) for classification and Slantlet transform (ST) for feature extraction. In El-Dahshan et al. (2010), the authors have offered a hybrid approach with the help of 2D DWT features and two different classifiers, k-nearest neighbor (k-NN) and feed forward back-propagation artificial neural network (FP-ANN). To reduce the feature dimensionality, they have applied principal component analysis (PCA). Later on, Zhang et al. (2010, 2011a,b, 2013) have offered new PBDSs with the same features (2D DWT+PCA). For classification, however, they have applied several optimization methods such as scaled conjugate gradient (SCG), particle swarm optimization (PSO), adaptive chaotic PSO (ACPSO), and scaled chaotic artificial bee colony (SCABC) with classifiers like FNN, BPNN and kernel SVM (KSVM). Furthermore, Zhang & Wu (2012) have suggested a PBDS where DWT plus PCA based features are given to a KSVM classifier. In Das et al. (2013), a new PBDS is proposed where the features are derived from Ripplet transform (RT) and then subjected to PCA for dimensionality reduction. Subsequently, they have applied a least squares SVM (LS-SVM) classifier in order to get significant results. Later on, El-Dahshan et al. (2014) have proposed a scheme where DWT plus PCA based features are used after segmentation using a feedback pulse-coupled neural network (FPCNN), followed by an FP-ANN classifier. In Wang et al. (2015a), a new automatic classification system is offered based on stationary wavelet transform (SWT), which is translation invariant in nature. Here, PCA is used to get a low dimensional feature vector, and hybridizations of two swarm intelligence based algorithms, ABC and PSO, are used to update the parameters of the FNN classifier. Hence, the schemes are termed IABAP-FNN, ABC-SPSO-FNN, and HPA-FNN. On the other hand, for feature extraction and reduction, Zhang et al. (2015f) have used weighted-type fractional Fourier transform (WFRFT) and PCA respectively. Finally, two variants of SVM, namely, generalized eigenvalue proximal SVM (GEPSVM) and twin SVM (TSVM), have been employed for classification. Thereafter, to achieve better classification accuracy, Nayak et al. (2016a) have offered a PBDS based on 2D DWT, probabilistic PCA (PPCA) and AdaBoost with random forests (ADBRF). In Zhang et al. (2015b), the authors have suggested a PBDS where SWT plus PCA based features are considered prior to the application of a GEPSVM classifier. Furthermore, Nayak et al. (2015a) have proposed a PBDS in which the features are extracted from the HL3 sub-band of 2D DWT and subsequently the PCA+LDA technique is applied to get more relevant features.

In order to reduce the feature vector size and make the classification task easier, many recent PBDSs use feature descriptors like energy and entropy as features instead of direct coefficients. In Saritha et al. (2013), the authors have suggested a PBDS which uses the entropy values of the approximation components of 2D DWT as features, and subsequently a spider web plot is drawn based on these values. In order to select the most relevant features, they have employed a t-test scheme, and for classification a probabilistic neural network (PNN) has been applied. In Yang et al. (2015), the authors have presented a PBDS where the energy values of the level-3 sub-bands of DWT are utilized for feature extraction and, for classification, a biogeography-based optimization (BBO) method is incorporated with SVM. On the other hand, two types of entropies, Shannon entropy (SE) and Tsallis entropy (TE), are used in Zhang et al. (2015c) to generate the feature vector from discrete wavelet packet transform (DWPT). Finally, GEPSVM has been utilized as the classifier. Furthermore, Zhang et al. (2015d) have offered a PBDS which makes use of wavelet entropy as features and FNN as a classifier. In order to find the optimal parameters of FNN, a hybrid BBO and PSO based method named HBP is proposed. Zhou et al. (2015) have used wavelet entropy features and a Naive Bayes classifier (NBC) in their proposed PBDS. In Zhang et al. (2015a), the authors have used wavelet energy for feature extraction and SVM for classification. Thereafter, to achieve better performance, Zhang et al. (2015g) have suggested a PBDS where the Tsallis entropy values calculated from DWPT are used as features and a fuzzy support vector machine (FSVM) is employed as a classifier. Zhang et al. (2015e) have suggested a scheme which utilizes wavelet entropy (WE) and Hu moment invariants (HMI) as features and GEPSVM+RBF as classifier. Later on, Wang et al. (2015b) have introduced a PBDS based on a new feature called fractional Fourier entropy (FRFE), which combines both FRFT and Shannon entropy. Subsequently, both Welch's t-test (WTT) and Mahalanobis distance (MD) are performed to select the relevant features, followed by a twin SVM (TSVM) for classification. Later, Zhang et al. (2016) have offered a PBDS using the same features, but for classification a multilayer perceptron (MLP) has been used. To find the optimal hidden neurons in MLP, three pruning methods, namely, dynamic pruning (DP), Bayesian detection boundaries (BDB) and Kappa coefficient (KC), have been used, whereas to update the weights of MLP an adaptive real coded BBO (ARCBBO) approach has been utilized. Wang et al. (2016b) have employed three variants of binary PSO (BPSO) to select relevant features from the 8-level DWT entropy features and used PNN for classification. In Wang et al. (2016a), the variance and entropy (VE) values of a level-1 dual-tree complex wavelet transform (DTCWT) are used as features. For classification, both GEPSVM and TSVM are used. Later on, Nayak et al. (2016b) have computed both energy and entropy feature values from the sub-bands of 2D-SWT. Here, a symmetric uncertainty ranking (SUR) filter and AdaBoost with support vector machine (ADBSVM) are used for feature selection and classification respectively.

It is observed from the literature that wavelet and its variants such as SWT and DTCWT are often used for feature extraction in recent PBDSs. However, conventional DWT suffers from limitations like limited directional selectivity and translation variance. SWT is sufficient to resolve the translation variance issue; however, SWT introduces more redundancy and cannot handle higher dimensional singularities. In contrast to DWT and SWT, DTCWT offers six directional selectivities; it is also efficient and less redundant. However, if more directions can be selected, the features will be more relevant. Further, for classification, FNN and SVM have been mostly used in many PBDSs; however, both require more parameters to tune. Moreover, there exists scope to reduce the number of features without compromising the accuracy. Even though a few schemes have been shown to yield high accuracy, they perform poorly on larger datasets.

In this work, we have suggested a PBDS for detecting pathological brain through MR images of patient brains which avoids the limitations of existing PBDSs. The contributions of this work are as follows. The suggested PBDS makes use of the simplified pulse-coupled neural network (SPCNN), a reduced parameter based approach, for ROI segmentation and fast discrete curvelet transform (FDCT) for feature extraction. FDCT shows better performance in representing 2D singularities. Thereafter, a PCA+LDA approach is employed in order to determine the most significant feature set. Subsequently, PNN is applied to classify the MR images as it possesses interesting properties which enable our system to provide faster response to probe MR images. This allows the system to achieve significantly improved performance as compared to its counterparts. Hence, the scheme has the potential to be installed on medical robots for real-time pathological brain detection.

3. Materials

To test the proposed PBDS, a standard dataset of 101 brain MR images, reported in El-Dahshan et al. (2014), is used in our experiment. The dataset contains T2-weighted brain MR images of size 256×256 in the axial view plane. It has 14 healthy and 87 pathological samples covering different types of brain disease; a few sample images are shown in Figure 1.

Figure 1: Samples from T2-weighted brain MR images indicating different types of brain tissue: (a) healthy, (b) meningioma, (c) metastatic bronchogenic carcinoma, (d) glioblastoma, (e) AD, (f) AD plus VA, (g) sarcoma, (h) glioma, (i) Pick's disease (El-Dahshan et al., 2014).

In addition, the proposed PBDS has been tested on three popular datasets, viz., DS-66, DS-160, and DS-255, containing 66, 160 and 255 images respectively. The type and size of these images are the same as those of the dataset with 101 images, which are available from the Harvard Medical School website (Johnson & Becker, 1999). In addition to healthy brain samples, the datasets DS-66 and DS-160 have samples from seven classes of diseases, namely, sarcoma, AD, AD plus visual agnosia (VA), glioma, meningioma, Huntington's disease (HD), and Pick's disease (PD). These categories, along with samples from four more diseases, namely cerebral toxoplasmosis (CTP), herpes encephalitis (HE), multiple sclerosis (MS), and chronic subdural hematoma (CSH), are included in DS-255. Among all the diseases, glioma, meningioma, and sarcoma belong to neoplastic disease, while CTP, MS, and HE belong to infectious disease. The degenerative disease includes AD, AD plus VA, PD, and HD, whereas cerebrovascular disease contains only CSH. In this work, we treat all the diseased samples as pathological brain samples. Therefore, it is a two-class classification problem (healthy or pathological). It is worth mentioning that the cost sensitivity problem (predicting pathological samples as healthy) may have a severe impact on patients if they defer their treatment. Hence, to address this problem, all the datasets considered in this work contain more pathological samples than healthy samples, with the aim of making the system biased towards the pathological brain.

4. Proposed methodology

The proposed PBDS has four different stages, namely, preprocessing, feature extraction, feature reduction, and classification. To enhance the quality of the MR images, contrast limited adaptive histogram equalization (CLAHE) is employed, followed by ROI extraction using SPCNN. FDCT is employed to extract features, and the dimension of the features is reduced using the PCA+LDA approach. Thereafter, the reduced features are subjected to a PNN classifier to classify the image as pathological or healthy. The proposed PBDS works in two phases, namely, offline learning and online prediction. The first phase includes the training of the system with the reduced feature sets, whereas the second phase predicts a class label for a query MR image. The detailed block diagram is depicted in Figure 2 and the steps followed in each stage are described below in brief.

Figure 2: Block diagram of the proposed PBDS

4.1. Preprocessing based on CLAHE and SPCNN

Image pre-processing is one of the most vital stages in image analysis and is typically used to improve the quality of the images (Nayak et al., 2016c). It has been observed that many images in the selected datasets are low-contrast in nature. Therefore, to enhance such images, a well-known methodology called contrast limited adaptive histogram equalization (CLAHE) is employed in this work. CLAHE is a variation of adaptive histogram equalization (AHE), which first computes a histogram of gray values in a contextual region centered around each pixel and thereafter assigns a value to each pixel intensity within the display range according to the pixel intensity rank in its local histogram (Pizer, Johnston, Ericksen, Yankaskas, & Muller, 1990). Unlike AHE, CLAHE has the benefits of preventing over-enhancement of noise and diminishing the edge-shadowing effect by limiting the contrast enhancement of AHE (Pisano et al., 1998). CLAHE uses a predefined value called the clip limit, which is used to clip the histogram prior to the computation of the cumulative distribution function (CDF). The parts of the histogram that surpass the clip limit are then redistributed equally among all histogram bins.
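The clip-and-redistribute step described above can be illustrated with a minimal NumPy sketch. It handles a single contextual region only and performs one redistribution pass; the function names, the toy histogram and the clip limit of 64 are assumptions chosen for illustration, and practical CLAHE implementations additionally interpolate between neighboring regions and may repeat the redistribution.

```python
import numpy as np

def clip_and_redistribute(hist, clip_limit):
    """Clip a histogram at clip_limit and spread the excess evenly over all bins."""
    hist = np.asarray(hist, dtype=float)
    excess = np.clip(hist - clip_limit, 0, None).sum()  # mass above the limit
    clipped = np.minimum(hist, clip_limit)
    return clipped + excess / hist.size                 # equal share per bin

def equalization_map(hist, clip_limit, levels):
    """Gray-level mapping obtained from the CDF of the clipped histogram."""
    h = clip_and_redistribute(hist, clip_limit)
    cdf = np.cumsum(h) / h.sum()
    return np.round(cdf * (levels - 1)).astype(int)

# Toy contextual region (hypothetical data): 256 pixels over 8 gray-level bins
region_hist = np.array([200, 20, 10, 8, 6, 5, 4, 3])
clipped = clip_and_redistribute(region_hist, clip_limit=64)
mapping = equalization_map(region_hist, clip_limit=64, levels=8)
print(clipped)
print(mapping)
```

Because the excess is only moved between bins, the total pixel count is preserved, while the dominant bin no longer monopolizes the output range.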

Region of interest (ROI) extraction or segmentation is another essential preprocessing stage in medical image analysis. It preserves the important regions of the image containing abnormalities in order to extract features and excludes the unwanted information from the image. Pulse-coupled neural network (PCNN) has been found to be a powerful technique for image segmentation. PCNN is a single-layered, 2-D, laterally connected neural network consisting of pulse-coupled neurons. In this network, there is a one-to-one mapping between the image pixels and network neurons. Unlike other neural networks, no training is required in the case of PCNN. The output images obtained during different iterations of PCNN show some segments or edge information of the input image (Wang, Ma, Cheng, & Yang, 2010). In El-Dahshan et al. (2014), the authors have used a feedback pulse-coupled neural network (FPCNN) for ROI segmentation. However, both conventional PCNN and FPCNN demand too many parameters to be tuned manually. Therefore, a simplified version of PCNN called SPCNN, which involves fewer parameters, is harnessed in this paper (Wei, Hong, & Hou, 2011).
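As a rough illustration of how such a network produces segments without training, the sketch below iterates the SPCNN update rules that this section states as Eqs. (1)-(6). The 3×3 weight matrix, the initial threshold of 1, the parameter values and the toy image are all assumptions made for the example; they are not the settings used by the authors.

```python
import numpy as np

def spcnn_iterate(S, w, beta, V_T, C, n_iter=5):
    """Run n_iter SPCNN steps on image S and return the binary pulse maps Y[z]."""
    S = np.asarray(S, dtype=float)
    rows, cols = S.shape
    lam_T = C / S.mean()                  # Eq. (6): decay from the mean gray level
    Y = np.zeros_like(S)                  # no neuron has fired yet
    T = np.ones_like(S)                   # assumed initial threshold (not from the paper)
    pulses = []
    for _ in range(n_iter):
        Yp = np.pad(Y, 1)                 # zero-pad so border neurons have 8 neighbors
        LC = np.zeros_like(S)
        for dp in range(3):
            for dq in range(3):
                LC += w[dp, dq] * Yp[dp:dp + rows, dq:dq + cols]  # Eq. (2)
        U = S * (1.0 + beta * LC)         # Eqs. (1) and (3), with FC = S
        Y = (U > T).astype(float)         # Eq. (4), compared against T[z-1]
        T = np.exp(-lam_T) * T + V_T * Y  # Eq. (5)
        pulses.append(Y.copy())
    return pulses

# Hypothetical synaptic weights and a toy image with one bright square
w = np.array([[0.5, 1.0, 0.5],
              [1.0, 0.0, 1.0],
              [0.5, 1.0, 0.5]])
toy = np.full((8, 8), 0.1)
toy[2:6, 2:6] = 0.9
pulses = spcnn_iterate(toy, w, beta=0.3, V_T=20.0, C=0.5)
roi_mask = pulses[1]                      # bright region fires first
print(roi_mask.astype(int))
```

In this toy run the threshold decays until the bright square fires as a group, after which its threshold is raised by V_T so that darker pixels fire in later iterations; which iteration yields the useful ROI depends on the parameter choices.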

The general structure of an SPCNN neuron is shown in Figure 3. Like PCNN, each neuron in SPCNN comprises three fields, namely, input field, modulation field, and pulse generator. The neuron gets input signals from both the external source $S_{pq}$ and other neurons through two distinct channels, namely, the feeding channel ($FC_{pq}$) and the linking channel ($LC_{pq}$). The modulation field combines signals from both channels to calculate the internal activity of the neuron, $U_{pq}$, and the last field controls the pulse generation (firing) with the help of an adaptive threshold $T_{pq}$. The neuron fires only when its internal activity surpasses the threshold value.

Figure 3: SPCNN neuron model

The SPCNN model is expressed by the following series of equations:

$$FC_{pq}[z] = S_{pq} \tag{1}$$

$$LC_{pq}[z] = \sum_{(g,h) \in N(p,q)} w_{pqgh}\, Y_{gh}[z-1] \tag{2}$$

$$U_{pq}[z] = FC_{pq}[z]\,\bigl(1 + \beta\, LC_{pq}[z]\bigr) \tag{3}$$

$$Y_{pq}[z] = \begin{cases} 1 & \text{if } U_{pq}[z] > T_{pq}[z-1] \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

$$T_{pq}[z] = e^{-\lambda_T}\, T_{pq}[z-1] + V_T\, Y_{pq}[z] \tag{5}$$

Here, $pq$ denotes the position of the neuron in the network, $N(p,q)$ defines the neighborhood of neuron $(p,q)$, and $gh$ refers to the position of any neuron in $N(p,q)$. $S_{pq}$ is the gray level of the image pixel at position $(p,q)$ and $Y_{pq}[z]$ is the pulse output of the neuron, with value either 0 or 1. Similarly, $\beta$ and $w_{pqgh}$ represent the linking coefficient and the constant synaptic weights from neuron $(g,h)$ to neuron $(p,q)$ respectively. The symbols $\lambda_T$ and $V_T$ define the threshold decay time constant and the normalization constant of $T_{pq}$, respectively. In this model, we need to tune only four parameters, namely $w$, $\beta$, $\lambda_T$, and $V_T$. The values of the parameters may vary from image to image. To find the value of $\lambda_T$ adaptively, the following formula has been utilized (Wei et al., 2011):

$$\lambda_T = \frac{C}{\mu} \tag{6}$$

where $C$ is a constant which is the same for all images and $\mu$ denotes the average gray level of the image. Other parameters have been determined experimentally.

4.2. Feature extraction based on FDCT

The usefulness of the Fourier transform is limited in numerous applications because it loses the time information of a signal. Moreover, it demands many terms to reconstruct a discontinuity with good accuracy (Candès, 2003; Candes, Demanet, Donoho, & Ying, 2006). Later on, wavelet transform attracted much attention from researchers as it possesses properties like time-frequency localization and multiresolution. Wavelet transform performs effectively in representing 1D singularities (i.e., point singularities), while it cannot handle 2D singularities (lines, curves, etc.) because of its lower directional selectivity and isotropic scaling. In order to address these problems that the conventional wavelet suffers from, another powerful multiscale geometric analysis tool called curvelet transform was introduced in Candès & Donoho (2000). Curvelet transform can effectively deal with curve singularities of an image and has gained more popularity than other transforms due to its inherent properties such as multiresolution, more directional selectivity, anisotropy, and localization. Generally, it is an extension of ridgelet transform, which can only handle line singularities effectively and is therefore less applicable to many real-life problems (Candès & Donoho, 1999; Do & Vetterli, 2003). However, ridgelet transform can represent curve singularities by applying it on the sub-images obtained from an original image. This type of block-based transform is called the first generation curvelet transform, which has gained less popularity because of the unclear geometry of ridgelets. In order to resolve the issues with first generation curvelets, second generation curvelet transform was introduced in Candes et al. (2006). Because the segmented ROI possesses various curves and lines and the curvelet transform is efficient in representing edges and textures, a second generation curvelet transform is used as a feature extraction tool in this work. For a given signal $f$, the curvelet transform can be defined by an inner product as

$$C(j,l,k) = \langle f, \phi_{j,l,k} \rangle \tag{7}$$

where $\phi_{j,l,k}$ is the curvelet basis function and $j$, $l$, and $k$ denote the scale, direction (orientation), and position parameters respectively. In the frequency domain, the continuous curvelet transform is represented by the polar coordinates $r$ and $\theta$ along with $x$ and $\omega$, which are the spatial variable and the frequency domain variable respectively. In the curvelet transform, the image is decomposed into various windows, each at a different scale and orientation.

For each $j \geq j_0$, we can define the frequency window $U_j$ in the Fourier domain as

$$U_j(r,\theta) = 2^{-3j/4}\, W(2^{-j} r)\, V\!\left(\frac{2^{\lfloor j/2 \rfloor}\,\theta}{2\pi}\right) \tag{8}$$

where $\lfloor j/2 \rfloor$ indicates the integer part of $j/2$, and $W(r)$ and $V(t)$ are called the radial and angular windows respectively, which satisfy the following basic conditions:

$$\sum_{j=-\infty}^{\infty} W^2(2^j r) = 1, \quad r \in (3/4, 3/2) \tag{9}$$

$$\sum_{l=-\infty}^{\infty} V^2(t-l) = 1, \quad t \in (-1/2, 1/2) \tag{10}$$

Both windows are smooth, non-negative and real valued on $r \in (1/2, 2)$ and $t \in [-1, 1]$.

The curvelets can be stated as a function of $p = (p_1, p_2)$ with a specific orientation and shift at the scale $2^{-j}$:

$$\phi_{j,l,k}(p) = \phi_j\!\left(R_{\theta_l}\bigl(p - p_k^{(j,l)}\bigr)\right) \tag{11}$$

where the orientation is $\theta_l = 2\pi \cdot 2^{-\lfloor j/2 \rfloor} \cdot l$, $l = 0, 1, \ldots$ with $0 \leq \theta_l < 2\pi$, the parameter $k = (k_1, k_2) \in \mathbb{Z}^2$ indicates the sequence of shift parameters, and the position is $p_k^{(j,l)} = R_{\theta_l}^{-1}(k_1 \cdot 2^{-j},\, k_2 \cdot 2^{-j/2})$. The parameter $R_\theta$ is the orthogonal rotation by $\theta$ radians and $R_\theta^{-1}$ is its inverse.
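To make the orientation rule concrete, the short sketch below enumerates the angles $\theta_l = 2\pi \cdot 2^{-\lfloor j/2 \rfloor} \cdot l$ that satisfy $0 \leq \theta_l < 2\pi$ for a few scales. The function name is hypothetical, and the sketch follows the continuous-domain formula only; the number of angles used per scale by a discrete implementation is an implementation parameter and need not match these counts.

```python
import math

def curvelet_orientations(j):
    """Angles theta_l = 2*pi * 2^(-floor(j/2)) * l lying in [0, 2*pi)."""
    n_angles = 2 ** (j // 2)           # l = 0, ..., 2^floor(j/2) - 1
    return [2.0 * math.pi * l / n_angles for l in range(n_angles)]

for j in (2, 3, 4, 5):
    angles = curvelet_orientations(j)
    print(j, len(angles), [round(a, 3) for a in angles])
```

The count doubles at every second scale, which reflects the parabolic scaling of curvelets: finer scales are split into more, and hence narrower, angular sectors.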

CE

Now, the continuous curvelet transform takes the form Z C(j, l, k) = hf, φj,l,k i = f (x)φj,l,k dx R2

Expressing the curvelet coefficients as integral over the frequency plane, Eq. (12) becomes Z 1 Cj,l,k = fˆ(ω)φˆj,l,k (ω)dω (2π)2 D E Z (j,l) 1 i pk ,ω ˆ f (ω)Uj (Rθl ω)e = dω (2π)2

AC

240

(12)

(13)

For an input Cartesian array f [x1 , y1 ] with 0 ≤ x1 , y1 < n, the discrete curvelet transform is stated as C D (j, l, k) =

X

f [x1 , y1 ]φD j,l,k [x1 , y1 ]

0≤x1 ,y1
13

(14)

ACCEPTED MANUSCRIPT

where, φD j,l,k indicates a digital curvelet waveform. Figure 4 shows the basic curvelet decomposition of the image with multiple scales and orientations in the frequency plane. At a specific scale and orientation, the frequency response of a curvelet is represented by wedges and one such wedge is

AN US

CR IP T

shown in a shaded area of the figure.

M

Figure 4: Digital tiling of curvelet in frequency plane
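The passage from the spatial-domain inner product to the frequency-plane integral is an application of Parseval's relation. A discrete analogue can be checked numerically, as in the sketch below, which assumes the usual convention that the second argument of the inner product is conjugated. The random array `phi` is only a stand-in for a digital waveform, not an actual curvelet, and the factor $1/n^2$ plays the role of $1/(2\pi)^2$ for NumPy's unnormalized FFT.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 16
f = rng.standard_normal((n, n))                       # stand-in image
phi = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))  # stand-in waveform

# Spatial-domain inner product: sum of f times the conjugate of phi
spatial = np.sum(f * np.conj(phi))

# Same coefficient computed in the frequency domain (discrete Parseval relation)
F = np.fft.fft2(f)
Phi = np.fft.fft2(phi)
frequency = np.sum(F * np.conj(Phi)) / n**2

print(np.allclose(spatial, frequency))
```

This equivalence is what allows a fast transform to compute all coefficients of a given scale and angle with windowed FFTs instead of explicit spatial sums.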

The proposed system employs a second-generation curvelet transform called the fast discrete curvelet transform (FDCT) for feature extraction, which can be implemented using two different procedures: via wrapping and via unequally spaced fast Fourier transform (USFFT). Compared with first-generation curvelets, both procedures are faster, simpler, and less redundant (Candes et al., 2006). However, the first procedure (FDCT via wrapping) is faster and easier to implement than the second one. Hence, we use FDCT via wrapping to extract the features from the segmented MR images. For feature vector construction, the approximation coefficients of the FDCT via wrapping procedure at a given scale are computed and thereafter arranged in a vector using row-major order. This feature extraction process was inspired by Zhang et al. (2011a) and El-Dahshan et al. (2014, 2010), in which the approximation coefficients of the 2D DWT are used as features. The number of scales ($s$) for an image with $n_r$ rows and $n_c$ columns in the FDCT via wrapping procedure can be found as given in Candes et al. (2006):

$$s = \lceil \log_2(\min(n_r, n_c)) - 3 \rceil \qquad (15)$$

For instance, if the image is of size 256 × 256, then the number of scales is computed as 5, where each scale contains information at various orientations except the first and last scales. Scales 2, 3 and 4 have 16, 32 and 32 angles, respectively. The sub-band at scale 1 denotes the approximation component and all other sub-bands refer to detail components. The feature vectors of all input images are generated likewise and finally stored in a feature matrix. Algorithm 1 lists the feature extraction process using the curvelet via wrapping procedure. The dimension of the feature vectors can be reduced further to obtain a more efficient representation of the features.
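As a quick check of Eq. (15), the following minimal Python sketch computes the number of scales for a 256 × 256 image and tallies the sub-bands of the resulting decomposition. The function name `num_scales` and the dictionary `angles_per_scale` are illustrative names introduced here; treating the first and last scales as a single sub-band each is an assumption, chosen because it agrees with the 82 sub-bands reported for the 5-level decomposition in the experiments.

```python
import math

def num_scales(n_rows, n_cols):
    """Number of FDCT scales per Eq. (15): ceil(log2(min(n_r, n_c)) - 3)."""
    return math.ceil(math.log2(min(n_rows, n_cols)) - 3)

# Angles per scale for the 5-scale example in the text. The coarsest and
# finest scales are counted as one sub-band each (an assumption).
angles_per_scale = {1: 1, 2: 16, 3: 32, 4: 32, 5: 1}

scales = num_scales(256, 256)
total_subbands = sum(angles_per_scale.values())
print(scales, total_subbands)
```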

Algorithm 1 Feature extraction using wrapping based FDCT scheme
Input: N input images: $f[x_1, y_1]$; $0 \le x_1, y_1 < n$
Output: Feature matrix: FM of size $N \times D$
1: for each image $f[x_1, y_1] \in N$ do
2:   Compute 2D FFT as $F[u_1, v_1] = \sum_{x_1, y_1 = 0}^{n-1} f[x_1, y_1]\, e^{-i 2\pi (x_1 u_1 + y_1 v_1)/n}$; $-n/2 \le u_1, v_1 < n/2$
3:   for each scale $j$ and angle $l$ do
4:     Generate the discrete localizing window in Fourier domain, $\tilde{U}_{j,l}[u_1, v_1]$
5:     Find the product $\tilde{U}_{j,l}[u_1, v_1]\, F[u_1, v_1]$
6:     Re-index the data by wrapping the product around the origin as $\tilde{F}_{j,l}[u_1, v_1] = W(\tilde{U}_{j,l} F)[u_1, v_1]$, where $0 \le u_1 < L_{1,j}$ and $0 \le v_1 < L_{2,j}$ with $\theta \in (-\pi/4, \pi/4)$, and $L_{1,j} \sim 2^{j}$ and $L_{2,j} \sim 2^{j/2}$ indicate the length and width of the rectangle at scale $j$
7:     Apply inverse 2D FFT on $\tilde{F}_{j,l}$ and generate the discrete curvelet coefficients $C^D(j, l, k)$ in spatial domain.
8:   end for
9:   Obtain the approximation coefficients and arrange in a vector of size $1 \times D$, where $D$ is the number of features
10:  Store the vector in the feature matrix FM
11: end for
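Steps 9 and 10 of Algorithm 1 amount to flattening one approximation sub-band per image in row-major order and stacking the results. The Python sketch below illustrates only that bookkeeping. No curvelet library is assumed: the helper `coarse_subband` is a hypothetical stand-in that keeps the central low-frequency block of the centered 2D FFT, so it is not the wrapping-based FDCT itself. The 21 × 21 sub-band size is the one reported in the experiments, and the random arrays merely stand in for segmented MR slices.

```python
import numpy as np

def coarse_subband(image, size=21):
    """Hypothetical stand-in for the FDCT approximation sub-band.

    Keeps the central size x size low-frequency block of the centered
    2D FFT and inverts it. This is NOT the wrapping-based FDCT.
    """
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    r0 = image.shape[0] // 2 - size // 2
    c0 = image.shape[1] // 2 - size // 2
    block = spectrum[r0:r0 + size, c0:c0 + size]
    return np.real(np.fft.ifft2(np.fft.ifftshift(block)))

def build_feature_matrix(images, size=21):
    """Flatten each sub-band in row-major order and stack into N x D."""
    rows = [coarse_subband(img, size).ravel(order="C") for img in images]
    return np.vstack(rows)

rng = np.random.default_rng(0)
images = [rng.random((256, 256)) for _ in range(4)]  # synthetic stand-ins
FM = build_feature_matrix(images)
print(FM.shape)  # (N, D) with D = 21 * 21
```

With a real FDCT implementation, only `coarse_subband` would need to be swapped out; the flattening and stacking stay the same.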

4.3. Feature reduction based on PCA+LDA

The high-dimensional feature vector generated by the feature extractor usually makes the classification task more difficult. Further, it increases the computational overhead and storage space requirement. This has paved the way to exploit dimensionality reduction techniques over the high-dimensional feature space in order to obtain the most relevant candidate features. Principal component analysis (PCA) is the most frequently used tool for feature reduction. PCA transforms high-dimensional input data to a lower-dimensional space, termed the principal subspace, while keeping the maximum variation of the data. That is, it seeks the directions that best represent the data without using the class labels and is hence unsupervised in nature. In contrast to PCA, linear discriminant analysis (LDA), a supervised approach, attempts to find a feature subspace that best discriminates between the classes and has therefore drawn the attention of researchers in past decades. More formally, LDA searches for those vectors over which the samples of dissimilar classes are far from each other, whereas the samples of the same class are close to each other. However, traditional LDA suffers degraded performance on high-dimensional, small-sample-size problems, since in these cases the within-class scatter matrix ($S_w$) is always singular (Yang & Yang, 2003). Further, to make sure that $S_w$ does not become singular, we need at least $D + C$ samples (where $D$ is the dimension of the feature vector and $C$ is the number of classes), which is practically impossible (Martínez & Kak, 2001). To tackle this problem, a popular approach called PCA+LDA has been applied in this work. In PCA+LDA, $D$-dimensional data is first reduced to $M$-dimensional data using PCA and thereafter to $d$-dimensional data using LDA, where $d \ll M < D$.
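The two-stage reduction can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the function names `pca_lda_fit` and `pca_lda_transform` are introduced here, the data are synthetic, the features are only mean-centered, and the intermediate dimension `m = 10` is an arbitrary choice that keeps the within-class scatter matrix invertible.

```python
import numpy as np

def pca_lda_fit(X, y, m):
    """Fit PCA (D -> m) followed by Fisher LDA (m -> number of classes - 1)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # PCA: right singular vectors of the centered data are the principal axes.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    W_pca = vt[:m].T                      # D x m
    Z = Xc @ W_pca                        # N x m
    classes = np.unique(y)
    overall = Z.mean(axis=0)
    Sw = np.zeros((m, m))
    Sb = np.zeros((m, m))
    for c in classes:
        Zc = Z[y == c]
        mu = Zc.mean(axis=0)
        Sw += (Zc - mu).T @ (Zc - mu)     # within-class scatter
        diff = (mu - overall)[:, None]
        Sb += len(Zc) * (diff @ diff.T)   # between-class scatter
    # LDA directions: leading eigenvectors of Sw^{-1} Sb.
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(evals.real)[::-1][: len(classes) - 1]
    W_lda = evecs[:, order].real          # m x d
    return mean, W_pca @ W_lda            # combined D x d basis

def pca_lda_transform(X, mean, basis):
    """Project (possibly unseen) samples onto the retained basis vectors."""
    return (X - mean) @ basis

# Synthetic two-class data: 40 samples, D = 50 (hypothetical, not MR features).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 50)), rng.normal(1.0, 1.0, (20, 50))])
y = np.array([0] * 20 + [1] * 20)
mean, basis = pca_lda_fit(X, y, m=10)
scores = pca_lda_transform(X, mean, basis)
print(scores.shape)
```

Because PCA first brings the data down to `m` dimensions with `m` well below the number of samples, `Sw` is non-singular and the LDA step is well posed, which is the motivation for the combined scheme.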

It should be noted that prior to the application of PCA+LDA, the feature vectors have been normalized with zero mean and unit variance. The process of finding an optimal or relevant feature set is described as follows. Initially, the eigenvalues of different features are arranged in decreasing order. Subsequently, a measure called the normalized cumulative sum of variances (NCSV) corresponding to each feature is calculated, and the NCSV value for the $j$th feature is defined as

$$NCSV(j) = \frac{\sum_{u=1}^{j} \alpha(u)}{\sum_{u=1}^{D} \alpha(u)}; \quad 1 \le j \le D \qquad (16)$$

where $\alpha(u)$ represents the eigenvalue of the $u$th feature and $D$ denotes the dimensionality of the feature vector. Finally, a threshold value is set manually and the number of features for which the NCSV value surpasses the threshold is selected. The number of relevant features is determined experimentally to yield maximal accuracy. It may be noted that the coefficients of the $d$ eigenvectors, termed basis vectors (BV), corresponding to the $d$ largest eigenvalues are retained for reducing the dimensionality of unknown test MR images.

4.4. Classification based on PNN

Probabilistic neural network (PNN) is a supervised learning algorithm proposed by Specht in 1990 and is implemented with the help of a probabilistic model like Bayesian classifiers (Specht, 1990). Compared to other neural network models such as the multilayer perceptron (MLP), radial basis function (RBF) network and self-organizing map (SOM), PNN offers a faster training process, as it requires only a one-pass training phase without any iteration for adjusting weights and does not suffer from local minima issues. In addition, it is adaptive to structural changes. Because of its faster learning speed and simple structure, PNN has been found suitable for pattern classification tasks (Mishra, Bhende, & Panigrahi, 2008) and is therefore considered in the proposed work.

It may be noted that PNN is quite similar to the Parzen window PDF estimator. The architecture of PNN comprises many neurons arranged in four layers, namely, input layer, pattern layer, summation layer and output layer, as shown in Figure 5. The input layer passes the input to the neurons in the pattern layer without performing any computation. The number of neurons required in the input layer depends on the dimension of the feature vector (here it is $d$). The pattern layer contains neurons representing samples from the training set.

Figure 5: Architecture of probabilistic neural network

For a test pattern $X$ in the input layer, the output of neuron $X_{ik}$ at the pattern layer is defined as

$$y_{ik}(X) = \exp\left(-\frac{\| X - X_{ik} \|^2}{2\sigma^2}\right) \qquad (17)$$

where $X_{ik}$ represents the $k$th sample of class $C_i$ in the training set and $\sigma$ represents the smoothing parameter. The summation layer averages the outputs from the neurons of the pattern layer belonging to the same class:

$$g_i(X) = \frac{1}{n_i}\sum_{k=1}^{n_i} \exp\left(-\frac{\| X - X_{ik} \|^2}{2\sigma^2}\right) \qquad (18)$$

where $n_i$ is the number of samples in class $C_i$. The output layer contains a single neuron and it classifies the test pattern $X$ to a class as

$$\hat{C}(X) = \arg\max_{i} \{g_i(X)\}, \quad i = 1, 2, \ldots, c \qquad (19)$$

where $c$ is the number of classes in the training set and $\hat{C}(X)$ is the predicted class label of test pattern $X$. In our case, we have two class labels and hence $c = 2$. The optimal value of $\sigma$ is experimentally determined.

5. Experiments and results

To validate the efficacy of the proposed scheme, simulation was carried out with different datasets using the MATLAB image processing toolbox on a machine having a 3.7 GHz processor, 16 GB RAM, and Windows 8 OS. The proposed scheme along with the existing schemes is subjected to evaluation in a similar statistical setup to derive an equivalent comparative analysis. Four different measures, namely, sensitivity ($S_e$), specificity ($S_p$), precision ($P_r$) and accuracy, are used to evaluate the proposed system. $S_e$ is the fraction of pathological MR samples correctly predicted by the classifier, while $S_p$ is the fraction of healthy MR samples correctly predicted by the model. $P_r$ is the fraction of samples predicted as pathological that are indeed pathological. Accuracy determines the fraction of the correctly predicted samples (both pathological and healthy) in the total number of testing samples. The summary of all steps involved in the proposed PBDS is listed in Algorithm 2. As the proposed scheme involves algorithms like SPCNN, FDCT, PCA+LDA and PNN, it is referred to as SPCNN + FDCT + PCA+LDA + PNN for subsequent use.

The results at the preprocessing and feature extraction stages are presented below. In order to enhance and segment the MR images, we have employed CLAHE and SPCNN, respectively. The successful application of CLAHE and SPCNN relies on the proper setting of different parameters.

To achieve better enhancement, we divide the image equally by 8 in each direction, which results in 64 contextual regions. Further, the number of bins is set as 256 and the parameter $\beta$ is initialized as 0.01. Additionally, in order to get a flat histogram shape, a uniform distribution scheme is selected for each region. For ROI extraction, SPCNN is employed on the enhanced images. The correct setting of the various parameters involved in SPCNN has a significant impact on the segmentation results. Table 1 shows the list of parameters used in SPCNN, where $C$, $\beta$, and $V_T$ are experimentally determined. The parameter $w$ is the synaptic weight between two adjacent neurons and is initialized to a value as reported in Wei et al. (2011). The simulation results for preprocessing are shown in Figure 6, where each column represents the contrast enhancement and segmentation result of a brain MR image. From the figure, it may be observed that the affected regions are more distinguishable in the preprocessed images as compared to the original images.

Algorithm 2 Implementation steps of the proposed PBDS
Offline learning:
1: for each ground truth MR image do
2:   Enhance the contrast of the image using CLAHE.
3:   Segment the enhanced image using SPCNN.
4:   Apply 5-level FDCT on the segmented image.
5:   Obtain the approximation coefficients and form a feature vector set of dimension D.
6: end for
7: Apply PCA+LDA approach to reduce the dimension of feature vector from D to d, where d is computed from NCSV measure. Retain d basis vector (BV) coefficients.
8: Feed the train and test patterns (with reduced feature set) to PNN classifier and record the classification performance.
Online evaluation:
1: Load the unknown MR image into the proposed system.
2: Apply CLAHE on the image.
3: Employ SPCNN on the enhanced image.
4: Apply 5-level FDCT on the segmented image and obtain the feature vector from the approximation coefficients.
5: Compute BV score by multiplying the feature vector with the retained BV coefficients.
6: Feed the results of the BV score into the PNN classifier and predict the class label as pathological or healthy.

Table 1: Parameters used in SPCNN

Parameters   Values                                            Descriptions
C            2.64                                              Invariant constant
β            0.40                                              Linking coefficient
VT           100.00                                            Normalization constant of Tij
w            0.1036 along the diagonals;                       Linking weight
             0.1464 along vertical and horizontal directions

Figure 6: Preprocessing results using CLAHE and SPCNN (panels a to l). Row 1 lists the original images. Row 2 lists preprocessing results using CLAHE. Row 3 lists segmentation results using SPCNN

Subsequently, FDCT is applied to extract features from the preprocessed images. Here, the number of decomposition levels of FDCT is set to 5 using Eq. (15). Figure 7 shows the coefficients in a 5-level curvelet decomposition of a pathological MR sample. There exist 82 sub-bands in a 5-level curvelet decomposition according to the sub-band division in Figure 4. However, in this work, only the approximation coefficients are chosen as features, primarily to reduce the feature dimensionality. The size of the approximation sub-band is found to be 21×21, and therefore the total number of features representing one MR image is 441, which may be reduced further. The overall simulation is divided into two different experiments based on the type of dataset used and is discussed below in detail.

Figure 7: 5-level decomposition of curvelet transform for a pathological MR image

5.1. Experiment-I: Performance analysis on a dataset of 101 MR images

In the first experiment, the proposed PBDS has been validated using a dataset which is reported in El-Dahshan et al. (2014). The dataset consists of 101 samples of two classes with 14 healthy and 87 pathological images. The training and testing sets are chosen based on El-Dahshan’s scheme and are given in Table 2.

Table 2: Training and testing set division (El-Dahshan et al., 2014)

Total images   Training set (65)                  Testing set (36)
               Healthy (H)   Pathological (P)     Healthy (H)   Pathological (P)
101            10            55                   4             32

In order to achieve better performance and make the classifier’s job easier, the high-dimensional FDCT features (441 features) are reduced using PCA+LDA. The number of significant features is obtained based on the NCSV values of different features. The NCSV values for various numbers of features obtained by PCA+LDA and standard PCA are shown in Figure 8. It is observed that the PCA+LDA scheme preserves maximum information with only one feature, whereas standard PCA requires 14 features. However, setting the threshold value for NCSV as 0.9, we select one feature from PCA+LDA and five features from PCA separately.

Figure 8: NCSV values vs number of features for the dataset with 101 samples (x-axis: number of features, 1 to 15; y-axis: NCSV, 0.5 to 1; series: PCA and PCA+LDA)

In the proposed system, probabilistic neural network (PNN) is employed to predict the test image as healthy or pathological. The network contains only one neuron at the input layer, 65 neurons in the pattern layer and two neurons in the output layer. The optimal value of $\sigma$ is determined experimentally as 0.25. In order to assess the effectiveness of PNN, its results are compared with other classifiers such as back-propagation neural network (BPNN), random forest (RF), and k-NN (Table 3). The diagonal values (shown in bold) indicate the correctly classified MR images and the off-diagonal values indicate incorrect classification. The classification accuracies obtained by k-NN, BPNN, RF and PNN on both training and testing images are 97.03%, 99.00%, 98.02% and 100.00%, respectively. Hence, it is clear that PNN achieves the best classification performance on this dataset.
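The PNN decision rule of Eqs. (17)-(19) is compact enough to sketch directly. The following NumPy example is a minimal illustration, not the authors' implementation: the function name `pnn_predict` and the toy one-dimensional training values are hypothetical, while the smoothing parameter `sigma=0.25` is the value reported above.

```python
import numpy as np

def pnn_predict(X_train, y_train, X_test, sigma=0.25):
    """Classify each test pattern with the PNN rule of Eqs. (17)-(19)."""
    classes = np.unique(y_train)
    # Squared Euclidean distances between every test and training pattern.
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    pattern = np.exp(-d2 / (2.0 * sigma ** 2))          # pattern layer, Eq. (17)
    # Summation layer, Eq. (18): class-wise mean of the pattern outputs.
    g = np.stack([pattern[:, y_train == c].mean(axis=1) for c in classes], axis=1)
    return classes[np.argmax(g, axis=1)]                # output layer, Eq. (19)

# Hypothetical one-dimensional reduced features (d = 1); not the paper's data.
X_train = np.array([[-1.2], [-1.0], [-0.8], [0.9], [1.0], [1.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])   # 0 = healthy, 1 = pathological
X_test = np.array([[-0.9], [1.05]])
print(pnn_predict(X_train, y_train, X_test))
```

Note that there is no iterative training step: the training patterns themselves act as the pattern-layer neurons, which is why the pattern layer holds one neuron per training image.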

Table 3: Classification results of different classifiers on both training and testing images

            k-NN          BPNN          RF            PNN
            P      H      P      H      P      H      P      H
P→          86     1      87     0      86     1      87     0
H→          2      12     1      13     1      13     0      14
Accuracy    97.03%        99.00%        98.02%        100%

Further, the classification accuracies for both PCA and PCA+LDA with respect to different numbers of features are shown in Figure 9. It may be observed that with only one feature (obtained from PCA+LDA) PNN can effectively classify MR images, whereas PNN requires five features (obtained from PCA) to produce optimal results. The classification performance of both methods with only one feature is recorded in Table 4. It is seen that PCA+LDA is more suitable for feature reduction, as it offers better accuracy with far fewer features.

Figure 9: Classification accuracy (%) with varying number of features for both PCA and PCA+LDA (x-axis: number of features, 1 to 15; y-axis: classification accuracy (%), 88 to 100; series: PCA and PCA+LDA)

Table 4: Classification results for PCA and PCA+LDA method with only one feature

            PCA           PCA+LDA
            P      H      P      H
P→          78     9      87     0
H→          3      11     0      14
Accuracy    88.12%        100%

5.1.1. Classification comparison with El-Dahshan’s scheme

The proposed schemes (considering PCA in place of PCA+LDA as another scheme) have been compared with the most recently reported competent scheme proposed by El-Dahshan et al. (2014) under a similar experimental setup, and the results are listed in Table 5. It is observed that the proposed methods, namely, SPCNN + FDCT + PCA + PNN and SPCNN + FDCT + PCA+LDA + PNN, achieve higher classification accuracy and specificity than El-Dahshan’s scheme (FPCNN + DWT + PCA + FNN) while matching its sensitivity. Moreover, the proposed schemes SPCNN + FDCT + PCA + PNN and SPCNN + FDCT + PCA+LDA + PNN require five features and one feature, respectively, which is fewer than the seven features required in El-Dahshan’s scheme. Further, the algorithms used in our proposed schemes are simple and require fewer parameters to tune.

Table 5: Performance comparison of proposed methods with El-Dahshan’s method on 101 images

Approaches                                              No. of features   Accuracy (%)   Se (%)    Sp (%)
FPCNN + DWT + PCA + FNN (El-Dahshan et al., 2014)       7                 99.00          100.00    92.80
SPCNN + FDCT + PCA + PNN (Proposed)                     5                 100.00         100.00    100.00
SPCNN + FDCT + PCA+LDA + PNN (Proposed)                 1                 100.00         100.00    100.00

5.1.2. Comparison with other existing schemes

In order to evaluate the effectiveness of the proposed methods, other competent schemes have been simulated and the results are compared in terms of classification accuracy, sensitivity, specificity, and the number of features (Table 6). It may be seen that the proposed methods perform at least as well as the nine other existing methods while requiring significantly fewer features.

Table 6: Comparison with other existing schemes on a dataset of 101 MR images

Approaches                                              No. of features   Accuracy (%)   Se (%)    Sp (%)
DWT + SVM + POLY (Chaplot et al., 2006)                 4761              96.04          96.55     92.86
DWT + PCA + k-NN (El-Dahshan et al., 2010)              7                 98.02          98.85     92.86
DWT + PCA + ANN (El-Dahshan et al., 2010)               7                 97.03          98.85     85.71
DWT + PCA + BPNN + SCG (Zhang et al., 2011a)            19                98.02          100.00    85.71
ICA + SVM (Zöllner et al., 2012)                        9                 79.00          87.00     75.00
PCA + SVM (Zöllner et al., 2012)                        3                 85.00          87.00     84.00
FPCNN + DWT + PCA + FNN (El-Dahshan et al., 2014)       7                 99.00          100.00    92.80
DWT + KPCA + LS-SVM + RBF (Nayak et al., 2015b)         7                 99.00          98.85     100.00
DWT + PPCA + ADBRF (Nayak et al., 2016a)                13                100.00         100.00    100.00
SPCNN + FDCT + PCA + PNN (Proposed)                     5                 100.00         100.00    100.00
SPCNN + FDCT + PCA+LDA + PNN (Proposed)                 1                 100.00         100.00    100.00

5.2. Experiment-II: Performance analysis on other standard datasets

In the second experiment, three popular datasets, namely, DS-66, DS-160, and DS-255, were used to validate the proposed schemes. To obtain training and testing samples, the cross-validation (CV) approach has been employed, and its setting was kept the same as reported in the literature (Das et al., 2013; Nayak et al., 2016a; Zhang et al., 2016), which is given in Table 7. CV helps in avoiding over-fitting problems. Also, it helps the classifier generalize to independent datasets. In this work, a stratification scheme is included with CV in order to preserve nearly the same class distribution in each fold. For DS-66, we employ 6-fold stratified cross-validation (SCV), while for the other two datasets, we employ a 5-fold SCV procedure. It may be noted that we run the SCV procedure 10 times on the three datasets in order to reduce the effect of randomness.

Table 7: k-fold SCV setting of three benchmark datasets (Das et al., 2013; Nayak et al., 2016a; Zhang et al., 2016)

Dataset   Total samples   k-fold SCV   Training      Testing
          H      P                     H      P      H      P
DS-66     18     48       6            15     40     3      8
DS-160    20     140      5            16     112    4      28
DS-255    35     220      5            28     176    7      44
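The stratified split behind Table 7 can be illustrated with a short pure-Python sketch. The function `stratified_folds` is a hypothetical helper written for this illustration (it is not taken from the paper or from any library); it deals the shuffled indices of each class round-robin across the folds, which reproduces the DS-255 training and testing counts of Table 7.

```python
import random
from collections import Counter

def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds with near-equal class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        # Deal the shuffled indices of this class round-robin across folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

# DS-255 class sizes from Table 7: 35 healthy (H) and 220 pathological (P).
labels = ["H"] * 35 + ["P"] * 220
folds = stratified_folds(labels, k=5)
for test_idx in folds:
    test = Counter(labels[i] for i in test_idx)
    train = Counter(labels) - test
    print(dict(train), dict(test))
```

Repeating the procedure with ten different values of `seed` would correspond to the 10 runs of the SCV procedure described above.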

By using the same number of features as derived in Experiment-I, the accuracies obtained by the proposed schemes on the three benchmark datasets are reported in Table 8. It may be seen that the proposed methods yield ideal classification on the DS-66 dataset. Further, it is observed that for the datasets with more samples, the classification accuracy is relatively lower. The classification results can be further improved by feeding more features to the PNN classifier. Therefore, another test has been carried out between the NCSV values and the number of features in order to find the significant features. The variation of NCSV with the number of features reveals that three features using PCA+LDA and seven features using PCA are selected with a threshold of 0.9 (Figure 10). Therefore, the PCA+LDA approach is found more suitable for finding the significant features. Here, both schemes, PCA and PCA+LDA, have been employed on the union of the three datasets.
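The NCSV-based selection of Eq. (16) reduces to a cumulative sum over sorted eigenvalues. The sketch below is a minimal illustration: the function names and the eigenvalue spectrum are hypothetical, and the rule "first count at which NCSV reaches the threshold" is one reading of the selection step (the text says the NCSV value "surpasses" the threshold).

```python
def ncsv(eigenvalues):
    """Normalized cumulative sum of variances for eigenvalues sorted descending."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running, values = 0.0, []
    for alpha in ordered:
        running += alpha
        values.append(running / total)
    return values

def features_for_threshold(eigenvalues, threshold=0.9):
    """Smallest number of leading features whose NCSV reaches the threshold."""
    for count, value in enumerate(ncsv(eigenvalues), start=1):
        if value >= threshold:
            return count
    return len(eigenvalues)

# Hypothetical eigenvalue spectrum, chosen only to illustrate the rule.
eigs = [5.0, 3.0, 1.5, 0.3, 0.2]
print([round(v, 3) for v in ncsv(eigs)])
print(features_for_threshold(eigs))
```

For this made-up spectrum the cumulative ratios are 0.5, 0.8, 0.95, 0.98 and 1.0, so a threshold of 0.9 selects three features.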

Table 8: Classification accuracy (%) of the proposed methods on three datasets with same number of features as used in Experiment-I

Approaches                          No. of features   DS-66     DS-160    DS-255
SPCNN + FDCT + PCA + PNN            5                 100.00    98.75     97.45
SPCNN + FDCT + PCA+LDA + PNN        1                 100.00    99.12     98.04

To demonstrate the effectiveness of the PNN classifier with three features, an accuracy comparison has been made with the BPNN, random forest (RF), and k-NN classifiers on all three datasets (Figure 11). The overall accuracies achieved by k-NN on the DS-66, DS-160 and DS-255 datasets are 98.48%, 98.75% and 98.55%, respectively. BPNN achieves accuracies of 100.00%, 99.82% and 99.22% on the DS-66, DS-160 and DS-255 datasets, respectively, whereas RF achieves accuracies of 99.39%, 99.25% and 98.94%. However, PNN achieves ideal classification on the DS-66 and DS-160 datasets and an accuracy of 99.57% on the DS-255 dataset. In general, PNN yields superior performance compared to the other classifiers.

Figure 10: NCSV values with respect to different number of features on the combination of three datasets (x-axis: number of features, 1 to 20; y-axis: NCSV, 0.4 to 1; series: PCA and PCA+LDA)

Figure 11: Classification accuracy obtained by different classifiers on three standard datasets (x-axis: dataset, DS-66, DS-160, DS-255; y-axis: classification accuracy (%), 95 to 100; series: k-NN, BPNN, RF, PNN)

5.2.1. Results of 10 × 5-fold SCV for DS-255

Table 9 lists the number of correctly classified samples and the corresponding accuracies at

different folds and runs of a 10 × 5-fold SCV approach on DS-255. The results in the table

demonstrate that the suggested SPCNN + FDCT + PCA+LDA + PNN method correctly classifies 2539 samples and misclassifies 11 samples. It is worth mentioning that for a 10 × 5-fold SCV, there exist 2200 and 350 samples from the pathological and healthy classes, respectively. Of these, 2194 pathological samples are correctly classified by the proposed scheme and the remaining six samples are misclassified as healthy. Further, out of 350 healthy samples, 345 samples are correctly classified and the remaining five samples are misclassified as pathological. Based on these results, the sensitivity, specificity, and precision values of the proposed scheme are computed as 99.73%, 98.57%, and 99.77%, respectively, which are listed in Table 10. Similarly, the results for the SPCNN + FDCT + PCA + PNN scheme with seven features are computed. It is seen that the SPCNN + FDCT + PCA+LDA + PNN scheme has higher values for sensitivity and accuracy, and comparable values for specificity and precision, as compared to SPCNN + FDCT + PCA + PNN. Since high sensitivity is particularly important for a CAD system, the proposed SPCNN + FDCT + PCA+LDA + PNN scheme holds great potential for supporting correct clinical decision making.

Table 9: 10 × 5-fold SCV result of proposed method SPCNN + FDCT + PCA+LDA + PNN on DS-255

Run              Fold-1        Fold-2        Fold-3        Fold-4        Fold-5        Total
1                51 (100.00)   50 (98.04)    51 (100.00)   51 (100.00)   51 (100.00)   254 (99.61)
2                51 (100.00)   51 (100.00)   51 (100.00)   49 (96.08)    51 (100.00)   253 (99.22)
3                51 (100.00)   51 (100.00)   51 (100.00)   51 (100.00)   51 (100.00)   255 (100.00)
4                50 (98.04)    51 (100.00)   51 (100.00)   50 (98.04)    51 (100.00)   253 (99.22)
5                51 (100.00)   51 (100.00)   51 (100.00)   51 (100.00)   51 (100.00)   255 (100.00)
6                51 (100.00)   50 (98.04)    51 (100.00)   50 (98.04)    51 (100.00)   253 (99.22)
7                51 (100.00)   51 (100.00)   51 (100.00)   51 (100.00)   50 (98.04)    254 (99.61)
8                51 (100.00)   51 (100.00)   51 (100.00)   51 (100.00)   51 (100.00)   255 (100.00)
9                50 (98.04)    51 (100.00)   51 (100.00)   51 (100.00)   51 (100.00)   254 (99.61)
10               51 (100.00)   51 (100.00)   49 (96.08)    51 (100.00)   51 (100.00)   253 (99.22)
Sum                                                                                    2539
Average result                                                                         253.9 (99.57)

Table 10: Performance assessment of the proposed schemes on DS-255

Proposed scheme                     Se (%)    Sp (%)    Pr (%)    Accuracy (%)
SPCNN + FDCT + PCA + PNN            99.18     100.00    100.00    99.29
SPCNN + FDCT + PCA+LDA + PNN        99.73     98.57     99.77     99.57
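The Table 10 entries for the SPCNN + FDCT + PCA+LDA + PNN scheme can be reproduced from the pooled counts quoted in the text, treating the pathological class as positive. The helper `binary_metrics` below is an illustrative function written for this check, using the standard definitions of the four measures.

```python
def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, precision and accuracy, each in percent."""
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    precision = 100.0 * tp / (tp + fp)
    accuracy = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, precision, accuracy

# Pooled counts over the 10 x 5-fold SCV on DS-255, as stated in the text:
# 2194 of 2200 pathological and 345 of 350 healthy samples correctly classified.
se, sp, pr, acc = binary_metrics(tp=2194, fn=6, tn=345, fp=5)
print(f"{se:.2f} {sp:.2f} {pr:.2f} {acc:.2f}")  # 99.73 98.57 99.77 99.57
```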

5.3. Comparison with other PBDSs

An extensive comparison of 21 existing competent PBDSs with the proposed scheme has been made on three datasets with respect to the number of features, the number of runs, and the classification accuracy, and is given in Table 11. It may be seen that a large number of the PBDSs yield perfect classification on DS-66; however, only two schemes, namely, RT + PCA + LS-SVM (Das et al., 2013) and DWPT + TE + GEPSVM (Zhang et al., 2015c), along with the proposed SPCNN + FDCT + PCA+LDA + PNN scheme, offer 100% accuracy on DS-160. On DS-255, no existing PBDS achieves perfect classification, but the proposed SPCNN + FDCT + PCA+LDA + PNN scheme achieves an accuracy of 99.57%, which is higher than the others. Further, it demands the least number of features. Though the improvement in accuracy is marginal and comparable with some of the existing schemes, the result is obtained over a number of runs of a k-fold SCV procedure. This indicates that the proposed scheme is robust and reliable.

From the experiments, it is evident that the proposed scheme achieves better performance in terms of classification accuracy and number of features used as compared to other competent schemes on all four datasets. The proposed scheme involves algorithms such as SPCNN, FDCT, and PNN, which possess several advantages, e.g., they are simple and require fewer parameter adjustments than other algorithms. FDCT is effective in capturing edge and texture features from MR images. PNN has a simple network structure and faster learning speed than classifiers such as BPNN, SVM, and RF that are employed in existing PBDSs. These methods collectively enhance the strength of the system, which can therefore be installed on medical robots for detecting patients

with a pathological brain to arrive at a final clinical decision. However, the suggested scheme suffers from the following limitations. SPCNN still requires four parameters. Hence, a non-parametric segmentation algorithm can be investigated as an alternative to SPCNN. We validate our scheme on four available datasets containing images from patients during the late and middle stages of diseases, but a larger dataset with images from all stages of diseases can be explored for better generalization. The present work deals with a two-class problem; however, multi-class classification is highly in demand.

Table 11: Comparative analysis with other competent PBDSs on three standard datasets

Existing PBDSs                                          #Feature   #Run   Accuracy (%)
                                                                          DS-66     DS-160    DS-255
DWT + SVM + POLY (Chaplot et al., 2006)                 4761       5      98.00     97.15     96.37
DWT + SVM + RBF (Chaplot et al., 2006)                  4761       5      98.00     97.33     96.18
DWT + PCA + k-NN (El-Dahshan et al., 2010)              7          5      98.00     97.54     96.79
DWT + PCA + FNN + ACPSO (Zhang et al., 2010)            19         5      100.00    98.75     97.38
DWT + PCA + BPNN + SCG (Zhang et al., 2011a)            19         5      100.00    98.29     97.14
DWT + PCA + FNN + SCABC (Zhang et al., 2011b)           19         5      100.00    98.93     97.81
DWT + PCA + KSVM (Zhang & Wu, 2012)                     19         5      100.00    99.38     98.82
RT + PCA + LS-SVM (Das et al., 2013)                    9          5      100.00    100.00    99.39
FPCNN + DWT + PCA + FNN (El-Dahshan et al., 2014)       7          10     100.00    98.88     98.43
SWT + PCA + IABAP-FNN (Wang et al., 2015a)              7          10     100.00    99.44     99.18
SWT + PCA + ABC-SPSO-FNN (Wang et al., 2015a)           7          10     100.00    99.75     99.02
DWPT + SE + GEPSVM (Zhang et al., 2015c)                16         10     99.85     99.62     98.78
DWPT + TE + GEPSVM (Zhang et al., 2015c)                16         10     100.00    100.00    99.33
SWT + PCA + GEPSVM (Zhang et al., 2015b)                7          10     100.00    99.62     99.02
WE + NBC (Zhou et al., 2015)                            7          10     92.58     91.87     90.51
DWT + PCA + LDA + RF (Nayak et al., 2015a)              7          10     100.00    99.75     99.14
WE + HMI + GEPSVM (Zhang et al., 2015e)                 14         10     100.00    99.56     98.63
FRFE + WTT + SVM (Wang et al., 2015b)                   12         10     100.00    99.69     98.98
DWT + PCA + ADBRF (Nayak et al., 2016a)                 13         10     100.00    99.18     98.35
DTCWT + VE + GEPSVM (Wang et al., 2016a)                12         10     100.00    99.75     99.25
FRFE + WTT + DP-MLP + ARCBBO (Zhang et al., 2016)       12         10     100.00    99.19     98.24
SPCNN + FDCT + PCA + PNN (Proposed)                     7          10     100.00    99.81     99.29
SPCNN + FDCT + PCA+LDA + PNN (Proposed)                 3          10     100.00    100.00    99.57

#Feature: Number of features, #Run: Number of runs

6. Conclusion and future work

In this paper, an efficient pathological brain detection system based on FDCT features is proposed, which can aid radiologists in correctly predicting pathological brains. In contrast to El-Dahshan’s method, the proposed system uses SPCNN for ROI segmentation, FDCT for feature extraction, PCA+LDA for feature reduction and, finally, PNN for classification. This system has advantages over other schemes in which DWT and FNN are used for feature extraction and classification. The proposed system is first compared with El-Dahshan’s scheme on a dataset of 101 MR images, and the results indicate that the proposed system offers ideal classification results. In order to make our system more generalized and robust, it has also been validated with three standard datasets. An extensive comparison of existing schemes with the proposed scheme has been made, and the results show that the proposed method achieves better performance than other existing schemes in terms of accuracy and number of features.

The main contributions of the proposed PBD system lie in the use of the simplified pulse-coupled neural network (SPCNN) for segmentation, the fast discrete curvelet transform (FDCT) to extract curve-like features and, finally, the probabilistic neural network (PNN) for classification. The suggested scheme is found to have several advantages compared to the existing schemes in terms of the number of features used, classification accuracy and computational overhead. SPCNN demands very few parameters compared to PCNN and FPCNN. FDCT extracts better edge and texture features from MR images compared to DWT, SWT, and DTCWT, which are often used in other competent schemes. Compared to other classifiers like FNN, RBFN and SVM, PNN offers faster learning speed and has no local minima issues. Therefore, the proposed scheme responds faster to unknown testing data.

Even though the proposed PBDS has been validated on different available datasets, a larger dataset collected online will further prove its effectiveness. Further, the images in the chosen datasets are collected from the late and middle stages of the diseases; images collected at an early stage still need to be validated. In future, the proposed system may be tested on images from other modalities like MRSI, CT, and PET. Further, the proposed system may incorporate advanced machine learning techniques such as dictionary learning, deep learning, and extreme learning in order to improve the performance.

References

Candes, E., Demanet, L., Donoho, D., & Ying, L. (2006). Fast discrete curvelet transforms. Multiscale Modeling & Simulation, 5, 861–899.

Candès, E. J. (2003). What is... a curvelet? Notices of the American Mathematical Society, 50, 1402–1403.

Candès, E. J., & Donoho, D. L. (1999). Ridgelets: A key to higher-dimensional intermittency? Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 357, 2495–2509.

Candès, E. J., & Donoho, D. L. (2000). Curvelets - a surprisingly effective nonadaptive representation for objects with edges. Vanderbilt University Press, Nashville, TN, 105–120.

Chaplot, S., Patnaik, L. M., & Jagannathan, N. R. (2006). Classification of magnetic resonance brain images using wavelets as input to support vector machine and neural network. Biomedical Signal Processing and Control, 1, 86–92.

Das, S., Chowdhury, M., & Kundu, K. (2013). Brain MR image classification using multiscale geometric analysis of ripplet. Progress In Electromagnetics Research, 137, 1–17.

Do, M. N., & Vetterli, M. (2003). The finite ridgelet transform for image representation. IEEE Transactions on Image Processing, 12, 16–28.

El-Dahshan, E. A., Mohsen, H. M., Revett, K., & Salem, A. B. M. (2014). Computer-aided diagnosis of human brain tumor through MRI: A survey and a new algorithm. Expert Systems with Applications, 41, 5526–5545.

El-Dahshan, E. S. A., Honsy, T., & Salem, A. B. M. (2010). Hybrid intelligent techniques for MRI brain images classification. Digital Signal Processing, 20, 433–441.


Johnson, K. A., & Becker, J. A. (1999). The Whole Brain Atlas. http://www.med.harvard.edu/AANLIB/.

Maitra, M., & Chatterjee, A. (2006). A Slantlet transform based intelligent system for magnetic resonance brain image classification. Biomedical Signal Processing and Control, 1, 299–306.

Martínez, A. M., & Kak, A. C. (2001). PCA versus LDA. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23, 228–233.

Mishra, S., Bhende, C. N., & Panigrahi, B. K. (2008). Detection and classification of power quality disturbances using S-transform and probabilistic neural network. IEEE Transactions on Power Delivery, 23, 280–287.

Nayak, D. R., Dash, R., & Majhi, B. (2015a). Classification of brain MR images using discrete wavelet transform and random forests. In Fifth National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG) (pp. 1–4). IEEE.

Nayak, D. R., Dash, R., & Majhi, B. (2015b). Least squares SVM approach for abnormal brain detection in MRI using multiresolution analysis. In International Conference on Computing, Communication and Security (ICCCS) (pp. 1–6). IEEE.

Nayak, D. R., Dash, R., & Majhi, B. (2016a). Brain MR image classification using two-dimensional discrete wavelet transform and AdaBoost with random forests. Neurocomputing, 177, 188–197.

Nayak, D. R., Dash, R., & Majhi, B. (2016b). Stationary wavelet transform and adaboost with SVM based pathological brain detection in MRI scanning. CNS & Neurological Disorders Drug Targets.

Nayak, D. R., Dash, R., Majhi, B., & Mohammed, J. (2016c). Non-linear cellular automata based edge detector for optical character images. Simulation, (pp. 1–11).

Pisano, E. D., Zong, S., Hemminger, B. M., DeLuca, M., Johnston, R. E., Muller, K., Braeuning, M. P., & Pizer, S. M. (1998). Contrast limited adaptive histogram equalization image processing to improve the detection of simulated spiculations in dense mammograms. Journal of Digital Imaging, 11, 193–200.

Pizer, S. M., Johnston, R. E., Ericksen, J. P., Yankaskas, B. C., & Muller, K. E. (1990). Contrast-limited adaptive histogram equalization: speed and effectiveness. In Proceedings of the First Conference on Visualization in Biomedical Computing (pp. 337–345). IEEE.

Saritha, M., Joseph, K. P., & Mathew, A. T. (2013). Classification of MRI brain images using combined wavelet entropy based spider web plots and probabilistic neural network. Pattern Recognition Letters, 34, 2151–2156.

Specht, D. F. (1990). Probabilistic neural networks. Neural Networks, 3, 109–118.

Wang, S., Lu, S., Dong, Z., Yang, J., Yang, M., & Zhang, Y. (2016a). Dual-tree complex wavelet transform and twin support vector machine for pathological brain detection. Applied Sciences, 6, 169.

Wang, S., Phillips, P., Yang, J., Sun, P., & Zhang, Y. (2016b). Magnetic resonance brain classification by a novel binary particle swarm optimization with mutation and time-varying acceleration coefficients. Biomedical Engineering/Biomedizinische Technik, (pp. 1–10).

Wang, S., Zhang, Y., Dong, Z., Du, S., Ji, G., Yan, J., Yang, J., Wang, Q., Feng, C., & Phillips, P. (2015a). Feed-forward neural network optimized by hybridization of PSO and ABC for abnormal brain detection. International Journal of Imaging Systems and Technology, 25, 153–164.

Wang, S., Zhang, Y., Yang, X., Sun, P., Dong, Z., Liu, A., & Yuan, T.-F. (2015b). Pathological brain detection by a novel image feature - fractional Fourier entropy. Entropy, 17, 8278–8296.

Wang, Z., Ma, Y., Cheng, F., & Yang, L. (2010). Review of pulse-coupled neural networks. Image and Vision Computing, 28, 5–13.

Wei, S., Hong, Q., & Hou, M. (2011). Automatic image segmentation based on PCNN with adaptive threshold time constant. Neurocomputing, 74, 1485–1491.

Westbrook, C. (2014). Handbook of MRI technique. Oxford: John Wiley & Sons.

Yang, G., Zhang, Y., Yang, J., Ji, G., Dong, Z., Wang, S., Feng, C., & Wang, Q. (2015). Automated classification of brain images using wavelet-energy and biogeography-based optimization. Multimedia Tools and Applications, (pp. 1–17).

Yang, J., & Yang, J.-y. (2003). Why can LDA be performed in PCA transformed space? Pattern Recognition, 36, 563–566.

Zhang, G., Wang, Q., Feng, C., Lee, E., Ji, G., Wang, S., Zhang, Y., & Yan, J. (2015a). Automated classification of brain MR images using wavelet-energy and support vector machines. In 2015 International Conference on Mechatronics, Electronic, Industrial and Control Engineering (MEIC-15) (pp. 683–686).

Zhang, Y., Dong, Z., Liu, A., Wang, S., Ji, G., Zhang, Z., & Yang, J. (2015b). Magnetic resonance brain image classification via stationary wavelet transform and generalized eigenvalue proximal support vector machine. Journal of Medical Imaging and Health Informatics, 5, 1395–1403.

Zhang, Y., Dong, Z., Wang, S., Ji, G., & Yang, J. (2015c). Preclinical diagnosis of magnetic resonance (MR) brain images via discrete wavelet packet transform with Tsallis entropy and generalized eigenvalue proximal support vector machine (GEPSVM). Entropy, 17, 1795–1813.

Zhang, Y., Dong, Z., Wu, L., & Wang, S. (2011a). A hybrid method for MRI brain image classification. Expert Systems with Applications, 38, 10049–10053.

Zhang, Y., Sun, Y., Phillips, P., Liu, G., Zhou, X., & Wang, S. (2016). A multilayer perceptron based smart pathological brain detection system by fractional Fourier entropy. Journal of Medical Systems, 40, 1–11.

Zhang, Y., Wang, S., Dong, Z., Phillip, P., Ji, G., & Yang, J. (2015d). Pathological brain detection in magnetic resonance imaging scanning by wavelet entropy and hybridization of biogeography-based optimization and particle swarm optimization. Progress in Electromagnetics Research, 152, 41–58.

Zhang, Y., Wang, S., Ji, G., & Dong, Z. (2013). An MR brain images classifier system via particle swarm optimization and kernel support vector machine. The Scientific World Journal, 2013, 1–9.

Zhang, Y., Wang, S., Sun, P., & Phillips, P. (2015e). Pathological brain detection based on wavelet entropy and Hu moment invariants. Bio-medical Materials and Engineering, 26, S1283–S1290.

Zhang, Y., Wang, S., & Wu, L. (2010). A novel method for magnetic resonance brain image classification based on adaptive chaotic PSO. Progress In Electromagnetics Research, 109, 325–343.

Zhang, Y., & Wu, L. (2012). An MR brain images classifier via principal component analysis and kernel support vector machine. Progress In Electromagnetics Research, 130, 369–388.


Zhang, Y., Wu, L., & Wang, S. (2011b). Magnetic resonance brain image classification by an improved artificial bee colony algorithm. Progress In Electromagnetics Research, 116, 65–79.

Zhang, Y.-D., Chen, S., Wang, S.-H., Yang, J.-F., & Phillips, P. (2015f). Magnetic resonance brain image classification based on weighted-type fractional Fourier transform and nonparallel support vector machine. International Journal of Imaging Systems and Technology, 25, 317–327.

Zhang, Y.-D., Wang, S.-H., Yang, X.-J., Dong, Z.-C., Liu, G., Phillips, P., & Yuan, T.-F. (2015g). Pathological brain detection in MRI scanning by wavelet packet Tsallis entropy and fuzzy support vector machine. SpringerPlus, 4, 1–16.

Zhou, X., Wang, S., Xu, W., Ji, G., Phillips, P., Sun, P., & Zhang, Y. (2015). Detection of pathological brain in MRI scanning based on wavelet-entropy and naive Bayes classifier. In Bioinformatics and Biomedical Engineering (pp. 201–209).

Zöllner, F. G., Emblem, K. E., & Schad, L. R. (2012). SVM-based glioma grading: optimization by feature reduction analysis. Zeitschrift für medizinische Physik, 22, 205–214.