PCA based clustering for brain tumor segmentation of T1w MRI images


Accepted Manuscript

PCA Based Clustering for Brain Tumor Segmentation of T1w MRI Images
Irem Ersöz Kaya, Ayça Çakmak Pehlivanli, Emine Gezmez Sekizkardeş, Turgay Ibrikçi

PII: S0169-2607(16)30419-9
DOI: 10.1016/j.cmpb.2016.11.011
Reference: COMM 4307

To appear in: Computer Methods and Programs in Biomedicine

Received date: 1 May 2016
Revised date: 10 October 2016
Accepted date: 23 November 2016

Please cite this article as: Irem Ersöz Kaya, Ayça Çakmak Pehlivanli, Emine Gezmez Sekizkardeş, Turgay Ibrikçi, PCA Based Clustering for Brain Tumor Segmentation of T1w MRI Images, Computer Methods and Programs in Biomedicine (2016), doi: 10.1016/j.cmpb.2016.11.011

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Highlights

• The performance of PCA-based clustering on MR images is investigated.
• To achieve this goal, the PCA methods were first applied to the MRI images at their original size and at three reduced sizes; two common methods, K-Means and FCM, were chosen for clustering.
• Five PCA algorithms (PCA, PPCA, EM-PCA, GHA and APEX) are evaluated for their success in dimensionality reduction for clustering and for their propensity to cause information loss.
• EM-PCA and PPCA achieve successful results with both clustering algorithms on the resized MRI images.
• Therefore, it can be concluded that PPCA and EM-PCA work effectively with the two clustering algorithms.

PCA Based Clustering for Brain Tumor Segmentation of T1w MRI Images

Irem ERSÖZ KAYA¹, Ayça ÇAKMAK PEHLİVANLI², Emine GEZMEZ SEKİZKARDEŞ³, Turgay IBRIKÇI⁴*

¹ Mersin University, Software Eng. Dept., 33440 Tarsus, Mersin, Turkey (e-mail: [email protected]);
² Mimar Sinan Fine Arts University, Department of Statistics, Istanbul, Turkey (e-mail: [email protected]);
³ Cukurova University, Electrical-Electronics Eng. Dept., 01330 Adana, Turkey (e-mail: [email protected]);
⁴ Cukurova University, Electrical-Electronics Eng. Dept., 01330 Adana, Turkey (e-mail: [email protected]);
* Corresponding Author: [email protected]

ABSTRACT

Background and Objective: Medical images are huge collections of information that are difficult to store and whose processing consumes extensive computing time. Therefore, reduction techniques are commonly used as a data pre-processing step to make the image data less complex, so that high-dimensional data can be identified by an appropriate low-dimensional representation. PCA is one of the most popular multivariate methods for data reduction. This paper focuses on clustering T1-weighted MRI images for brain tumor segmentation, with dimension reduction by different common Principal Component Analysis (PCA) algorithms. Our primary aim is to present a comparison between different variations of PCA algorithms on MRIs for two clustering methods.

Methods: The five most common PCA algorithms, namely the conventional PCA, Probabilistic Principal Component Analysis (PPCA), Expectation Maximization Based Principal Component Analysis (EM-PCA), Generalized Hebbian Algorithm (GHA), and Adaptive Principal Component Extraction (APEX), were applied to reduce dimensionality in advance of two clustering algorithms, K-Means and Fuzzy C-Means. In the study, T1-weighted MRI images of the human brain with a brain tumor were used for clustering. In addition to the original size of 512 lines and 512 pixels per line, three more sizes, 256x256, 128x128 and 64x64, were included in the study to examine their effect on the methods.

Results: The obtained results were compared in terms of both the reconstruction errors and the Euclidean distance errors among the clustered images containing the same number of principal components.

Conclusion: According to the findings, PPCA obtained the best results among all the methods. Furthermore, EM-PCA and PPCA helped the K-Means algorithm accomplish the best clustering performance in most cases, and both achieved significant results with both clustering algorithms for all sizes of T1w MRI images.

Keywords: Dimension reduction; PCA algorithms; clustering; K-Means; Fuzzy C-Means.

1. INTRODUCTION

A brain tumor is a mass or growth of abnormal cells in brain tissue. There are two types of tumors: cancerous and non-cancerous. Studies show that over a third of brain tumors are cancerous, and brain tumors rank among the leading causes of cancer-related deaths. Therefore, early diagnosis and treatment are crucial for survival [1]. The detailed aspects of brain tumors can be displayed by imaging modalities such as X-Ray, Ultrasonography, Computed Tomography (CT), and Magnetic Resonance Imaging (MRI), enabling clinicians to understand the texture of brain tumors and determine the type of therapy. The location and extent of the tumor, together with its position relative to the surrounding tissue, are the determining factors in the therapy decision. Since the capacity to acquire brain tumor images has outstripped the ability to analyze and segment these images manually, computational segmentation of brain tumors has become an active research area. In recent years, brain tumor segmentation has become one of the most challenging tasks in medical image analysis.

Clustering is well suited to image segmentation because of its ability to discover the complex relationships hidden in large unlabeled data sets, based on a heuristic search for interesting and reasonable features [2-4]. Image clustering simply maps the image into clusters such that the set of clusters presents the same information about the image as the entire image-set collection. The generated clusters provide a concise summarization and visualization of the image content [5].

In image analysis applications, the curse of dimensionality is a common problem that can degrade the performance of a given algorithm as the number of features increases. Consequently, it may be necessary to use models that perform dimensionality reduction for high-dimensional data sets. Reduction techniques are frequently used as a data pre-processing step to reduce the complexity of the image data, so that high-dimensional data can be identified by an appropriate low-dimensional representation [6]. Principal Component Analysis (PCA) is one of the most popular multivariate techniques for data reduction in image analysis, pattern recognition and machine learning [7-9]. The method focuses on the linear projection of multivariate high-dimensional data onto a low-dimensional subspace while retaining as much information as possible [10].

A number of studies demonstrate the success of PCA algorithms in image analysis. In a study conducted in 2000, Fiori proposed a new PCA neural network algorithm based on the Adaptive Principal Component Extractor (APEX) and compared its simulation results with two other PCA algorithms, the Generalized Hebbian Algorithm (GHA) and the standard APEX. Numerical simulations and structural complexity evaluations showed that the new approach improved on the numerical performance of both GHA and APEX [11]. Tu et al. proposed a statistical hybrid discriminative-generative method to segment 3D brain images. In the discriminative model, thousands of image features were learned to capture the complex appearances of different anatomical structures. In the generative model, PCA was used to capture the shape information of each anatomical structure. Finally, the two models were combined to find a final segmentation by minimizing an energy function [12]. A combination of probabilistic PCA (PPCA) and the Self-Organizing Map (SOM) was used to create a self-organizing model that performed online learning of the local subspaces of the input data. The authors explored the model's success in data visualization, image compression and video compression by comparing it with three existing models: PCA, Kernel-Based Topographic Maps and Self-Organizing Mixture Models (SOMM). The proposed model outperformed the other models with respect to mean squared error (MSE) [13].

In 2013, Sachdeva et al. proposed a computer-aided diagnosis (CAD) system for the segmentation, feature extraction and classification of brain tumors. In the study, 428 T1w MR images from 55 patients were used for multiclass brain tumor classification. The results demonstrated that dimension reduction with PCA provided a 14% improvement in classification success [14]. Parsi et al. used the PCA algorithm to improve the image segmentation performance of the unsupervised Linde-Buzo-Gray (LBG) clustering algorithm. In their method, the eigenvalues of PCA were used to partition the centers in each step of the LBG algorithm. The proposed method was compared with the FCM and Gustafson-Kessel (G-K) methods in terms of time and accuracy; it clustered faster and achieved better accuracy [15]. Ju et al. introduced a new PPCA model, called L1-PPCA, for dimensionality reduction on 2D image data. A Laplacian density model was used for the noise instead of a Gaussian assumption. The new method achieved more robust results for data outliers in terms of average reconstruction error. PPCA was used to develop a self-organizing model [16]. Another study, conducted in 2015, combined PCA and K-Means clustering for the segmentation of medical images to improve performance and image quality. After the meaningful part of the image was extracted, PCA was applied for feature extraction and for choosing a proper number of clusters to increase the success rate [17].

The primary aim of this paper is to present a comparison between different variations of PCA algorithms regarding their success in the dimensionality reduction of MRIs, as well as their effects on the success of the clustering methods used for brain tumor segmentation. The PCA algorithms considered in this paper are the conventional Principal Component Analysis (PCA), Probabilistic Principal Component Analysis (PPCA), the EM Algorithm for PCA (EM-PCA), the Generalized Hebbian Algorithm (GHA) and the Adaptive Principal Component Extractor (APEX). These data reduction algorithms are compared by using Fuzzy C-Means (FCM) and K-Means for clustering. The two clustering algorithms were preferred because of their different notions of constituting clusters. Unlike the K-Means algorithm, the Fuzzy C-Means algorithm allows a pixel to belong to multiple classes with varying degrees of membership. In hard clustering, an input pixel belongs exclusively to one cluster.

The two methods have been widely used for the segmentation of MRI images and the identification of tumors because of their success on the subject. In 2007, Xiao et al. used FCM clustering for the segmentation of the lateral ventricles in one pair of T1-weighted and T2-weighted MRIs. The method was applied to the combination of the original image and the Gaussian-filtered image, leading to more sensitive and more homogeneous results [18]. Juang and Wu proposed an image tracking method using color-converted segmentation with K-Means clustering to detect the lesion size and region in brain MRI images. The results showed that the brain regions related to a tumor or lesion could be separated from the colored image [19]. Similarly, Kalaiselvi et al. used K-Means segmentation to detect brain abnormalities in MRI images of the human brain. They achieved faster and better results than the existing fuzzy segmentation method [20]. In 2015, Adhikari et al. introduced an efficient way to segment MRI brain images in the presence of noise and intensity inhomogeneity. They proposed an algorithm that incorporates spatial information and the conditioning effects imposed by some conditional variables into the fuzzy membership function of conventional FCM [21]. Ali et al. developed an MRI brain tumor segmentation system that first created a new multiresolution wavelet image fused by a morphological pyramid and then segmented the images by FCM. They compared the system with three widely used clustering algorithms: K-Means, Expectation-Maximization (EM) and Kernel Fuzzy C-Means. The results revealed that the proposed system outperformed the others [22].

The paper is organized as follows: the algorithms are presented in detail in Section 2; Section 3 shows the implementations of the PCA algorithms for clustering; and discussions and conclusions with final comments are given in Section 4.

2. MATERIAL AND METHODS

In this study, the methods were implemented in Matlab [23] to cluster a real T1-weighted (T1w) MRI of a human brain with a brain tumor. The image has a size of 512 lines and 512 pixels per line. The raw image was first preprocessed with the PCA algorithms, and each PCA-transformed image was then clustered using the FCM and K-Means algorithms. In this section, the components of the PCA-based clustering methods for MRIs are discussed.

2.1. PCA Algorithms

2.1.1 Principal Component Analysis (PCA)

The conventional PCA is a statistical method that focuses on a linear projection of multivariate high-dimensional data onto a low-dimensional subspace by using least-squares decomposition while maintaining the maximum variance [24, 25]. PCA aims to find the orthogonal directions of strong variability in the data. Given a set of observed d-dimensional independent data vectors $x_i$, where $i = 1, \ldots, n$, the orthogonal projection is executed by

$$ y_i = A^{T}(x_i - \mu) \tag{1} $$

where $y$ is the transformed data and $\mu$ is the sample mean of the observed data. The $d \times d$ orthogonal transformation matrix $A$ comprises the orthonormal eigenvectors of the sample covariance matrix, which is given as

$$ S = \frac{1}{n}\sum_{i=1}^{n}(x_i - \mu)(x_i - \mu)^{T} \tag{2} $$

The eigenvectors of $S$ can be determined through the eigenvalue decomposition $S = A V A^{T}$, where the eigenvalues in the diagonal matrix $V$ describe the variance of the observations along the related eigenvectors. The eigenvectors with the greatest eigenvalues are known as principal components and are sometimes called 'loadings' [26].

The $q$ principal components (where $q < d$) that minimize the squared reconstruction error are retained for the optimal linear dimensionality reduction. The reconstruction error $E_r$ is computed by

$$ E_r = \left\| (x - \mu) - A_q\, y \right\|^{2} \tag{3} $$

where $y = A_q^{T}(x - \mu)$ is the $q$-dimensional projection and the matrix of the $q$ principal eigenvectors $A_q$ represents a mapping from the observed data into its principal subspace.

2.1.2 Probabilistic Principal Components Analysis (PPCA)

PPCA is one of the PCA extension algorithms; it was proposed by Tipping and Bishop [27]. PPCA defines a probability density model in which the principal subspace of the observed data is determined by maximum-likelihood estimation of the parameters [28, 29].

Let $y$ be a $q$-dimensional latent variable. In the context of PPCA, the observed variables $x$ are modeled as a linear transformation of some independently distributed latent variables supplemented by Gaussian noise $\varepsilon$,

$$ x = W y + \mu + \varepsilon \tag{4} $$

where $W$ is a $d \times q$ parameter matrix representing the projection from the lower-dimensional latent space to the data space, while the parameter $\mu$ allows the model to have a non-zero mean. In the model, the latent variables are assumed to have a Gaussian distribution with zero mean and unit variance, $p(y) \sim N(0, I)$, and the noise is defined to be isotropic Gaussian, $p(\varepsilon) \sim N(0, \sigma^{2} I)$, where $\sigma^{2}$ represents the variance lost in the projection and specifies the noise level. Hence, the marginal distribution of the observed data can be determined by integrating the conditional distribution $p(x \mid y) \sim N(Wy + \mu, \sigma^{2} I)$, weighted by the prior probability of the latent vectors $p(y)$; the resulting $d \times d$ model covariance is $C_v = W W^{T} + \sigma^{2} I$. This yields a constrained Gaussian model $p(x) \sim N(\mu, C_v)$ governed by the parameters $\mu$, $\sigma^{2}$ and $W$. Owing to the independence of the data vectors, the log-likelihood of the whole observed data set under the model is

$$ L(\mu, W, \sigma^{2}) = \sum_{i=1}^{n} \ln p(x_i) = -\frac{n}{2}\left\{ d \ln(2\pi) + \ln\lvert C_v \rvert + \operatorname{tr}\left(C_v^{-1} S\right) \right\} \tag{5} $$

where the sample covariance matrix $S$ is given by (2). The model parameters can then be found by maximizing the likelihood.

For a Gaussian, optimal values of the model parameters can be found analytically by maximizing the likelihood function, without the need for iteration [30]. The maximum likelihood estimator for the parameter $\mu$ is simply the sample mean, while the maximum likelihood solutions for the noise variance $\sigma^{2}$ and the weight matrix $W$ take the form

$$ \sigma^{2}_{ML} = \frac{1}{d - q}\sum_{i=q+1}^{d} \lambda_i \tag{6} $$

$$ W_{ML} = A_q \left(V_q - \sigma^{2} I\right)^{1/2} O \tag{7} $$

where $\lambda_1 \ge \cdots \ge \lambda_d$ are the eigenvalues of $S$, $A_q$ is the matrix of the $q$ principal eigenvectors of the sample covariance matrix, $V_q$ is a $q \times q$ diagonal matrix of the related eigenvalues $\lambda_1, \ldots, \lambda_q$, and $O$ is a $q \times q$ orthogonal rotation matrix. The factor loadings can be recovered from the weight matrix $W$ by normalizing the columns of $W O^{T}$, in which case the rotation matrix $O$ is calculated as the eigenvector matrix of $W^{T} W$.

Conventional PCA is a limiting case of the probabilistic model that arises when the noise level in the model becomes infinitesimal, i.e., in the limit $\sigma^{2} \to 0$ [31]. In this case, the projection into the latent space becomes orthogonal [32].
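The following NumPy sketch illustrates Eqs. (1)-(3) and the closed-form PPCA solution of Eqs. (6)-(7). It assumes that the rows of `X` are the observations $x_i$; for an image, one possible choice is to treat each image row as an observation, but the text does not specify this. The arbitrary rotation $O$ is fixed to the identity, and the function names (`pca_fit`, `reconstruction_error`, `ppca_ml`, `ppca_posterior_mean`) are hypothetical. It is an illustration, not the authors' Matlab code.

```python
import numpy as np


def pca_fit(X, q):
    """Conventional PCA, Eqs. (1)-(2); the rows of X are the observations x_i."""
    mu = X.mean(axis=0)                        # sample mean
    Xc = X - mu
    S = Xc.T @ Xc / X.shape[0]                 # sample covariance, Eq. (2)
    eigvals, eigvecs = np.linalg.eigh(S)       # S = A V A^T, ascending order
    order = np.argsort(eigvals)[::-1]          # sort by decreasing variance
    return mu, eigvals[order], eigvecs[:, order[:q]]


def reconstruction_error(X, mu, A_q):
    """Average over the x_i of the squared reconstruction error of Eq. (3)."""
    Xc = X - mu
    Y = Xc @ A_q                               # row i is y_i^T, y_i = A_q^T (x_i - mu)
    return np.mean(np.sum((Xc - Y @ A_q.T) ** 2, axis=1))


def ppca_ml(X, q):
    """Closed-form PPCA maximum-likelihood fit, Eqs. (6)-(7), taking O = I."""
    d = X.shape[1]
    mu, eigvals, A_q = pca_fit(X, q)
    sigma2 = eigvals[q:].sum() / (d - q)       # Eq. (6): mean discarded variance
    scale = np.sqrt(np.maximum(eigvals[:q] - sigma2, 0.0))
    W = A_q * scale                            # A_q (V_q - sigma^2 I)^{1/2}, Eq. (7)
    return mu, W, sigma2


def ppca_posterior_mean(X, mu, W, sigma2):
    """Latent coordinates <y_i> = (W^T W + sigma^2 I)^{-1} W^T (x_i - mu)."""
    M = W.T @ W + sigma2 * np.eye(W.shape[1])
    return np.linalg.solve(M, W.T @ (X - mu).T).T
```

With $S$ normalized by $n$, the average error of Eq. (3) equals the sum of the discarded eigenvalues. As $\sigma^{2} \to 0$, the PPCA posterior mean reduces to a rescaled orthogonal PCA projection onto the span of $A_q$.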

M

2.1.3 Expectation Maximization Based Principal Component Analysis (EM-PCA)

EM-PCA is another density estimation technique with a latent variable model representation similar to PPCA. The probabilistic model uses the Expectation-Maximization (EM) algorithm, which provides an iterative approach to maximum likelihood estimation of the parameters in statistical models with unobserved latent variables, i.e., variables considered as missing or incomplete data [33, 34]. The EM algorithm was first outlined by Dempster et al. [35], with an application to random coefficient models, and was developed for factor analysis by Rubin and Thayer [36]. Within this framework, the model parameters are iteratively updated, based on their current estimates, to maximize the expected log-likelihood of the complete data (composed of both the observed and the missing data) until convergence occurs.

The EM algorithm comprises two stages: Expectation (E-step) and Maximization (M-step). First, the expected value of the complete-data log-likelihood is calculated with respect to the conditional distribution of the unobserved latent variables given the observed data, under the current settings of the parameters. Afterwards, the M-step finds new values for the parameters that maximize the expectation of the log-likelihood found in the E-step.
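Applied to the PPCA model, the two-step loop just described can be sketched in NumPy as follows. The E-step uses the posterior moments of Eq. (9), and the M-step uses the updates of Eqs. (10)-(11), all of which are given below. This is a minimal illustration under stated assumptions: the mean is fixed at the sample mean (its ML estimate), `W` is initialized randomly, and iteration stops when the relative change in the log-likelihood of Eq. (5) falls below a tolerance. The function names and defaults are hypothetical.

```python
import numpy as np


def ppca_log_likelihood(Xc, W, sigma2):
    """Eq. (5) for centered data Xc, with C_v = W W^T + sigma^2 I."""
    n, d = Xc.shape
    C = W @ W.T + sigma2 * np.eye(d)
    S = Xc.T @ Xc / n
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * n * (d * np.log(2 * np.pi) + logdet + np.trace(np.linalg.solve(C, S)))


def em_ppca(X, q, max_iter=1000, tol=1e-12, seed=0):
    """EM for PPCA; returns mu, W, sigma^2 and the log-likelihood history."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mu = X.mean(axis=0)                    # ML estimate of mu (kept fixed)
    Xc = X - mu
    W = rng.normal(size=(d, q))            # hypothetical random initialization
    sigma2 = 1.0
    history = [ppca_log_likelihood(Xc, W, sigma2)]
    for _ in range(max_iter):
        # E-step, Eq. (9): posterior moments of the latent variables
        M_inv = np.linalg.inv(W.T @ W + sigma2 * np.eye(q))
        Ey = Xc @ W @ M_inv                # row i is <y_i>^T
        sum_Eyy = n * sigma2 * M_inv + Ey.T @ Ey     # sum_i <y_i y_i^T>
        # M-step, Eq. (10)
        W = (Xc.T @ Ey) @ np.linalg.inv(sum_Eyy)
        # M-step, Eq. (11), using the updated W
        sigma2 = (np.sum(Xc ** 2)
                  - 2.0 * np.sum(Ey * (Xc @ W))
                  + np.trace(sum_Eyy @ W.T @ W)) / (n * d)
        history.append(ppca_log_likelihood(Xc, W, sigma2))
        if abs(history[-1] - history[-2]) < tol * abs(history[-2]):
            break
    return mu, W, sigma2, history
```

At convergence, $\sigma^{2}$ should agree with the closed-form value of Eq. (6), and the log-likelihood history should never decrease, as EM guarantees.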

From the definitions given in the PPCA section, the corresponding log-likelihood of the complete data can be defined from the joint probability distribution of the latent variables $y$ and the observed variables $x$, $p(x, y) = p(x \mid y)\,p(y)$, with $p(x \mid y) \sim N(Wy + \mu, \sigma^{2} I)$ and $p(y) \sim N(0, I)$. In the E-step, the expectation of the complete-data log-likelihood is calculated with respect to the conditional distribution of the latent data $y$ given the observed data $x$:

$$ L_{comp} = -\sum_{i=1}^{n}\left\{ \frac{d}{2}\ln\sigma^{2} + \frac{1}{2}\operatorname{tr}\left(\langle y_i y_i^{T}\rangle\right) + \frac{1}{2\sigma^{2}}(x_i - \mu)^{T}(x_i - \mu) - \frac{1}{\sigma^{2}}\langle y_i\rangle^{T} W^{T}(x_i - \mu) + \frac{1}{2\sigma^{2}}\operatorname{tr}\left(W^{T} W \langle y_i y_i^{T}\rangle\right) \right\} \tag{8} $$

where $\langle\cdot\rangle$ denotes the expectation under the posterior distribution of $y_i$. The sufficient statistics of this distribution, computed with the current values of the parameters, are given by the following equations:

$$ \langle y_i \rangle = \left(W^{T} W + \sigma^{2} I\right)^{-1} W^{T}(x_i - \mu), \qquad \langle y_i y_i^{T} \rangle = \sigma^{2}\left(W^{T} W + \sigma^{2} I\right)^{-1} + \langle y_i \rangle \langle y_i \rangle^{T} \tag{9} $$

Then, the new values of the parameters $W$ and $\sigma^{2}$ are estimated through the maximization of the conditional expected log-likelihood $L_{comp}$:

$$ W^{(t)} = \left[\sum_{i=1}^{n}(x_i - \mu)\langle y_i \rangle^{T}\right]\left[\sum_{i=1}^{n}\langle y_i y_i^{T} \rangle\right]^{-1} \tag{10} $$

$$ \sigma^{2\,(t)} = \frac{1}{nd}\sum_{i=1}^{n}\left\{ \left\| x_i - \mu \right\|^{2} - 2\langle y_i \rangle^{T} W^{(t)T}(x_i - \mu) + \operatorname{tr}\left(\langle y_i y_i^{T} \rangle W^{(t)T} W^{(t)}\right) \right\} \tag{11} $$

where the superscript $(t)$ denotes the iteration number. The expectation and maximization steps are repeated until the change in the estimates between consecutive iterations ($t \to t+1$) is negligible.

2.1.4 Generalized Hebbian Algorithm (GHA)

Sanger proposed the Generalized Hebbian Algorithm as an extension of the well-known Oja's learning rule to perform principal component analysis [37]. The GHA, also known as Sanger's rule, is a Hebbian-type learning algorithm that allows a linear feed-forward neural network to find a certain number of principal eigenvectors from the input data.
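The following NumPy sketch implements Sanger's rule in the matrix form of Eq. (12) below. The columns of `W` are the weight vectors $W_j$. The rule is applied to the transposed matrix so that `np.tril` implements $\mathrm{LT}[\cdot]$, and the input is centered so that the correlation matrix equals the covariance matrix. The learning rate, number of epochs, initialization and function name are illustrative assumptions, not values from the study.

```python
import numpy as np


def gha(X, q, eta=0.001, epochs=10, seed=0):
    """Generalized Hebbian Algorithm (Sanger's rule) for the first q components."""
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)                     # centered inputs
    n, d = X.shape
    Wt = rng.normal(scale=0.1, size=(q, d))    # row j of W^T is the weight vector W_j
    for _ in range(epochs):
        for x in X[rng.permutation(n)]:        # present the patterns in random order
            y = Wt @ x                         # y_j = W_j^T x
            # Eq. (12): W^T <- W^T + eta (y x^T - LT[y y^T] W^T)
            Wt += eta * (np.outer(y, x) - np.tril(np.outer(y, y)) @ Wt)
    return Wt.T                                # columns approximate the eigenvectors
```

Row $j$ of the update equals $\eta\, y_j \left(x - \sum_{k \le j} y_k W_k\right)$. Each neuron therefore removes the components already captured by the neurons before it, which yields the descending eigenvalue order.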

Let $x$ be a $d$-dimensional input vector. In GHA, the feed-forward network is composed of a single layer of linear neurons, each of which gathers the weighted input signals to form an output. The input patterns are iteratively fed into the network, and the $j$th output of the network is calculated by $y_j = W_j^{T} x$, where $j = 1, \ldots, q$ and $W_j$ is the weight vector of the $j$th neuron (the $j$th column of $W$). At each iteration step, the weight vectors are adapted to determine the principal eigenvectors of the correlation matrix according to the GHA learning rule:

$$ W^{T}(t+1) = W^{T}(t) + \eta\left( y(t)\,x^{T}(t) - \mathrm{LT}\left[ y(t)\,y^{T}(t) \right] W^{T}(t) \right) \tag{12} $$

where $\mathrm{LT}[\cdot]$ keeps the lower triangular part of its matrix argument (setting the elements above the diagonal to zero) and $\eta$ is a positive learning rate. After a large number of iterative computations and adaptations, the weight vector of the $j$th neuron converges to the $j$th principal component, i.e., the network extracts the first $q$ eigenvectors of the correlation matrix in descending eigenvalue order.

2.1.5 Adaptive Principal Component Extractor (APEX)

In the APEX method, a laterally connected neural network is trained by using Oja's Hebbian rule to extract multiple principal components [38, 39]. The network consists of $d$ inputs $x_1, \ldots, x_d$, which are fully connected to $q$ outputs $y_1, \ldots, y_q$ through feed-forward weights $W = [w_{ij}]$. In addition, lateral weights $H = [h_j]$ connect the first $q-1$ output neurons with the $q$th output unit.
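The sequential training scheme of APEX can be sketched in NumPy as follows, using the output of Eq. (13) and the updates of Eq. (14), which are given next. The sketch assumes that the neurons are trained one after another, so that the first $q-1$ feed-forward weight vectors are already fixed when the $q$th neuron is trained. The learning rate, epochs, initialization and function name are illustrative choices.

```python
import numpy as np


def apex(X, q, eta=0.001, epochs=10, seed=0):
    """APEX: extract q principal components one neuron at a time."""
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)                     # centered inputs
    n, d = X.shape
    W = np.zeros((q, d))                       # row j: feed-forward weights of neuron j
    for j in range(q):
        w = rng.normal(scale=0.1, size=d)      # W_q for the neuron being trained
        h = np.zeros(j)                        # lateral weights H from neurons 0..j-1
        for _ in range(epochs):
            for x in X[rng.permutation(n)]:
                y_prev = W[:j] @ x             # outputs of the already-trained neurons
                y_q = w @ x - h @ y_prev       # Eq. (13)
                w += eta * (y_q * x - y_q ** 2 * w)        # Eq. (14), feed-forward
                h += eta * (y_q * y_prev - y_q ** 2 * h)   # Eq. (14), lateral
        W[j] = w
    return W.T                                 # columns approximate the eigenvectors
```

The lateral term subtracts whatever part of the new neuron's output is already explained by the earlier neurons. Once $W_q$ is orthogonal to them, the lateral weights decay to zero, as stated in the text.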

According to the rule, the outputs of the first $q-1$ neurons are described with the usual notation $y = W^{T} x$, while the activation of the $q$th neuron is given by

$$ y_q = W_q^{T} x - H^{T} y \tag{13} $$

where $y = [y_1, \ldots, y_{q-1}]^{T}$ and $W_q$ is the weight vector of the $q$th neuron. The learning rule of APEX for the $q$th neuron is

$$ \begin{aligned} W_q(t+1) &= W_q(t) + \eta\left( y_q(t)\,x(t) - y_q^{2}(t)\,W_q(t) \right) \\ H(t+1) &= H(t) + \eta\left( y_q(t)\,y(t) - y_q^{2}(t)\,H(t) \right) \end{aligned} \tag{14} $$

where $\eta$ is a positive learning rate parameter. When the network is trained, the feed-forward weight vector $W_q$ approaches the $q$th principal component and the lateral weight vector $H$ goes to zero.

2.2 Image Clustering Methods

2.2.1 K-Means Clustering Algorithm

K-Means is a well-known clustering method first proposed by MacQueen and later developed by Hartigan and Wong [40, 41]. It is an unsupervised learning algorithm in which the data are partitioned into a user-predetermined number of clusters. Each cluster is represented by the mean of the patterns that constitute it, referred to as the centroid.

For a set of $n$ data patterns $\{x_1, \ldots, x_n\}$, the cluster centers (the so-called centroids) are initially determined at random. The patterns are then partitioned into $k$ clusters such that each pattern is assigned to the cluster with the closest center. Afterwards, the position of each center is recalculated from all patterns belonging to its cluster, and the algorithm continues until convergence, i.e., until the positions of the centroids no longer change.
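The iteration described above can be sketched in NumPy as follows: random initial centroids, nearest-center assignment, and centroid updates until the centroids stop moving. The sketch also returns the final value of the objective in Eq. (15). Initializing the centroids by sampling $k$ distinct data points is one common choice and is an assumption here, as are the function name and the iteration cap. For intensity-based image segmentation, `X` would hold one row per pixel.

```python
import numpy as np


def kmeans(X, k, max_iter=100, seed=0):
    """Hard K-Means clustering; returns labels, centroids and the Eq. (15) objective."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # random initial centroids

    def assign(centers):
        # squared distances from every pattern to every centroid, shape (n, k)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return d2, d2.argmin(axis=1)

    for _ in range(max_iter):
        _, labels = assign(centers)
        # recompute each centroid as the mean of its patterns (keep it if empty)
        new_centers = np.array([X[labels == i].mean(axis=0) if np.any(labels == i)
                                else centers[i] for i in range(k)])
        if np.allclose(new_centers, centers):    # converged: centroids no longer move
            break
        centers = new_centers
    d2, labels = assign(centers)
    sse = d2[np.arange(len(X)), labels].sum()    # Eq. (15)
    return labels, centers, sse
```

Because the result depends on the random initial centroids, running the function with several seeds and keeping the lowest objective is a common safeguard.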

The objective function, in this case a squared error function that the algorithm attempts to minimize, is given by

$$ F = \sum_{i=1}^{k}\sum_{x_j \in C_i} \left\| x_j - \mu_i \right\|^{2} \tag{15} $$

Here, $\mu_i$ is the mean, i.e., the center, of the $i$th cluster $C_i$. Therefore, $\| x_j - \mu_i \|$ gives a measure of the distance from the $j$th data point to this center, where $i = 1, \ldots, k$ and $j = 1, \ldots, n$, and the inner sum runs over the points assigned to cluster $C_i$.

2.2.2 Fuzzy C-Means (FCM) Clustering Algorithm

FCM is a widely used clustering algorithm that iteratively moves cluster centers gradually closer to the input values. The method, originally introduced by Bezdek, differs from the K-Means algorithm in its partitioning process [42]. In FCM, a data point can belong to all groups simultaneously, in proportion to its degree of membership, which ranges from 0 to 1. K-Means, by contrast, provides a "hard" clustering in which each data point is assigned to only one cluster.

The FCM algorithm finds the centroids that minimize the least-squares error function:

$$ O(U, c_1, \ldots, c_k) = \sum_{i=1}^{k} O_i = \sum_{i=1}^{k}\sum_{j=1}^{n} u_{ij}^{m}\, d_{ij}^{2} \tag{16} $$

where $d_{ij}$ is the distance between the $j$th data point and the $i$th cluster center $c_i$; $k$ is the number of clusters; and $n$ is the number of data patterns, following the notation used for K-Means. The only difference is the membership value $u_{ij}$, which specifies the closeness of the data point $x_j$ to the cluster center $c_i$, with $0 \le u_{ij} \le 1$; here $m > 1$ denotes the degree of fuzziness.

To minimize the objective function $O$, which involves the measure of similarity, the membership degrees and the cluster centers are updated iteratively until convergence. The two update conditions are as follows:

$$ c_i = \frac{\sum_{j=1}^{n} u_{ij}^{m}\, x_j}{\sum_{j=1}^{n} u_{ij}^{m}} \tag{17} $$

$$ u_{ij} = \frac{1}{\sum_{p=1}^{k}\left( d_{ij} / d_{pj} \right)^{2/(m-1)}} \tag{18} $$

subject to $\sum_{i=1}^{k} u_{ij} = 1$ and $0 < \sum_{j=1}^{n} u_{ij} < n$.
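A compact NumPy sketch of the alternating updates in Eqs. (17)-(18) follows. It stops when the memberships change by less than a tolerance and also reports the objective of Eq. (16). The random initial membership matrix, the small floor on distances (which avoids division by zero when a point coincides with a center), the default $m = 2$ and the function name are assumptions for illustration.

```python
import numpy as np


def fcm(X, k, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Fuzzy C-Means; returns memberships U (k x n), centers C and the objective."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    U = rng.random((k, len(X)))
    U /= U.sum(axis=0)                         # each column sums to 1, as required by (18)
    for _ in range(max_iter):
        Um = U ** m
        C = (Um @ X) / Um.sum(axis=1, keepdims=True)               # Eq. (17)
        D = np.linalg.norm(X[None, :, :] - C[:, None, :], axis=2)  # d_ij, shape (k, n)
        D = np.fmax(D, 1e-12)                                      # avoid 0/0
        ratio = D[:, None, :] / D[None, :, :]                      # d_ij / d_pj
        U_new = 1.0 / (ratio ** (2.0 / (m - 1.0))).sum(axis=1)     # Eq. (18)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    Um = U ** m
    C = (Um @ X) / Um.sum(axis=1, keepdims=True)
    D = np.linalg.norm(X[None, :, :] - C[:, None, :], axis=2)
    objective = np.sum(Um * D ** 2)                                # Eq. (16)
    return U, C, objective
```

A hard segmentation can be read off with `U.argmax(axis=0)`, which assigns each pixel to the cluster with the largest membership.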

3. THE RESULTS AND DISCUSSION

In this study, the experiments were conducted on T1w MRI images of a patient with a brain tumor, and dimensionality reduction was used to reduce the extensive computing time. The experiments were designed in two stages. In the first stage, dimension reduction was performed by five different PCA algorithms: PCA, PPCA, EM-PCA, GHA and APEX. In the second stage, two clustering algorithms were applied: K-Means and FCM [43]. In addition to the original image size of 512 rows by 512 columns, three more sizes were included in the study to examine their effect on the methods. The image dimensions were proportionally resized to 256x256, 128x128 and 64x64. Samples of the original image in all the sizes used are given in Fig. 1.

Figure 1: The images with the original size and the resized dimensions (a. 64x64, b. 128x128, c. 256x256, d. 512x512).

Before the methods were applied, all images were preprocessed with a box filter to remove possible noise. Serial trials on the images with different window sizes demonstrated that the reconstruction performance tended to improve as the window size decreased. Based on these experiments, the final window size was set to 8 for this study.

After preprocessing, the variants of PCA were applied to the resized MRI images, and their average reconstruction errors were calculated using Eq. (3) to examine how the reconstruction accuracy changes with the number of principal components. These calculations were executed for each size of the resized MRI images. The results are shown in Figures 2-5.

Figure 2. The approximate average reconstruction errors (64X64)

Figure 3. The approximate average reconstruction errors (128X128)

Figure 4. The approximate average reconstruction errors (256X256)

Figure 5. The approximate average reconstruction errors (512X512)
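The preprocessing step and the reconstruction-error sweep described above can be sketched as follows. The sketch assumes that the box filter is a plain moving average over a square window with edge padding, and that each image row is treated as one observation when PCA is applied; the text specifies neither detail. The function names are hypothetical, and only conventional PCA is swept here.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def box_filter(img, size=8):
    """Moving-average (box) filter with edge padding; output has the input's shape."""
    pad = (size // 2, size - 1 - size // 2)      # asymmetric padding for even sizes
    padded = np.pad(np.asarray(img, dtype=float), (pad, pad), mode="edge")
    windows = sliding_window_view(padded, (size, size))
    return windows.mean(axis=(-2, -1))


def reconstruction_error_curve(img, max_q):
    """Average Eq. (3) error for q = 1..max_q, treating image rows as observations."""
    X = np.asarray(img, dtype=float)
    Xc = X - X.mean(axis=0)
    S = Xc.T @ Xc / X.shape[0]                   # Eq. (2)
    _, eigvecs = np.linalg.eigh(S)
    A = eigvecs[:, ::-1]                         # principal directions, largest first
    errors = []
    for q in range(1, max_q + 1):
        A_q = A[:, :q]
        residual = Xc - (Xc @ A_q) @ A_q.T       # (x - mu) - A_q y
        errors.append(np.mean(np.sum(residual ** 2, axis=1)))
    return np.array(errors)
```

The resulting curve is non-increasing in $q$ and reaches zero when all $d$ components are kept, which matches the general shape expected of plots like Figures 2-5.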

The calculated average reconstruction error rates of the PCA methods for all image sizes (512x512, 256x256, 128x128 and 64x64) are given in Table 1. The values were simplified with the factors 1.0e+027, 1.0e+026, 1.0e+024 and 1.0e+023, respectively. As can easily be observed from Figures 2-5 and Table 1, higher image dimensions lead to higher average reconstruction error rates. According to the average reconstruction error rates, EM-PCA and PPCA mostly give better results than the other methods. The error rates of the two methods are considerably close to each other. Both methods are found to be successful, yielding the two best results for every image size except 256x256, where they are still quite close to the minimum error rate.

Table 1. The average reconstruction error rates for all sized images in the PCA methods

Method    512X512   256X256   128X128   64X64
PCA       3.7993    1.1597    3.7795    1.0583
EM-PCA    3.7430    1.1916    3.4214    1.0261
PPCA      3.7991    1.1918    3.6829    1.0151
GHA       4.7339    1.2767    3.8947    1.1686
APEX      4.5778    1.2878    3.6832    1.1943

As noted earlier, the study was performed in two stages: principal component analysis and clustering. After the PCA applications were completed, the K-Means and FCM image clustering methods were applied to all sizes of the MRI images obtained with the PCA methods. At this stage, the number of clusters first had to be determined. The histogram of the original image given in Figure 6 can be used to decide the number of clusters. The frequency (i.e., the number of pixels) and the intensity of the image are indicated on the y-axis and the x-axis of the histogram, respectively. There are roughly five peaks, which are assumed to be possible clusters. Therefore, based on the histogram of the original image, the number of clusters was set to five (k = 5) for the image clustering methods.

Figure 6: The histogram of the original image
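In the study, the number of clusters was read off the histogram by eye. A rough programmatic counterpart is sketched below as a hypothetical heuristic that the study does not use. It smooths the 256-bin intensity histogram with a triangular window and counts the local maxima that rise above a small fraction of the highest peak. The window length and threshold are arbitrary and would need tuning on real MRI data.

```python
import numpy as np


def histogram_peaks(img, n_levels=256, smooth=11, min_frac=0.05):
    """Return the intensity levels of the local maxima of a smoothed histogram."""
    hist, _ = np.histogram(np.ravel(img), bins=n_levels, range=(0, n_levels))
    kernel = np.bartlett(smooth)                 # triangular smoothing window
    kernel /= kernel.sum()
    h = np.convolve(hist, kernel, mode="same")
    floor = min_frac * h.max()                   # ignore tiny ripples
    return [i for i in range(1, n_levels - 1)
            if h[i] > h[i - 1] and h[i] >= h[i + 1] and h[i] >= floor]
```

The number of returned levels, `len(histogram_peaks(img))`, would then serve as a candidate value of k.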

Based on the reconstruction results and the reconstructed images given in Figures 7-10, it can be claimed that the FCM algorithm mostly gives more successful results than the K-Means algorithm. Since the performance of K-Means depends on the initial positions of the centers, there is no guarantee that it always finds the optimal solution. FCM, on the other hand, updates the cluster centers iteratively to minimize the sum of squared errors function given in (16).

The principal difference between the two clustering algorithms lies in the type of partitioning. Fuzzy C-Means applies a fuzzy partition in which an object can belong to all groups with a degree of membership ranging between 0 and 1, while K-Means is a hard clustering method that assigns each data point to exactly one cluster.

Figures 7-10 show the differences between the original images and the images clustered by K-Means and FCM for all sizes. Each figure shows the original image together with the K-Means and FCM clustering results obtained after PCA, EM-PCA, PPCA, APEX and GHA.

Figure 7: The results for the 64x64 size
Figure 8: The results for the 128x128 size
Figure 9: The results for the 256x256 size
Figure 10: The results for the 512x512 size

In addition to the average reconstruction error, a Euclidean distance matrix was also calculated to evaluate the distance between the original images and the processed images. Euclidean space is a finite-dimensional real vector space, and a Euclidean distance matrix tabulates the squared distances between all pairs of points taken from a list of points. Here, the Euclidean distances between the original image and each processed image were calculated. The best projected image among the PCA methods is expected to have the minimum distance from the original image. The averages of the squared distances between the original images and the images processed with the PCA algorithms are given in Table 2. For readability, the values in the following tables are expressed in units of 1.0e+018, and the best results are shown in bold.
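The distance computation can be sketched as follows. Each image is flattened into a vector, and the squared Euclidean distance matrix is built from the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b. This sketch assumes a pixel-by-pixel comparison. The toy 2x2 images are hypothetical, and the averaging over images behind Table 2 is not reproduced.

```python
import numpy as np

def squared_distance_matrix(points):
    """D[i, j] = ||p_i - p_j||^2 for the rows p_i of `points`."""
    p = np.asarray(points, dtype=float)
    sq = (p ** 2).sum(axis=1)
    # Expand ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b; clip round-off negatives.
    d = sq[:, None] + sq[None, :] - 2.0 * (p @ p.T)
    return np.maximum(d, 0.0)


# Toy 2x2 "images": an original and two hypothetical processed versions.
original = np.array([[0, 1], [2, 3]])
processed = [original + 1, original * 2]
rows = np.stack([original.ravel()] + [img.ravel() for img in processed])
D = squared_distance_matrix(rows)
dist_to_original = D[0, 1:]  # squared distance of each processed image
```

The first row of `D` holds the quantities compared in this section. The processed image with the smallest entry is the one closest to the original.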

Table 2. The Euclidean distances between the original and the processed images with the PCA algorithms

Size       PCA      EM-PCA   PPCA     APEX     GHA
64x64      0.0098   0.0006   0.0021   0.0230   0.0482
128x128    0.0055   0.0075   0.0031   0.0177   0.0085
256x256    1.8656   0.0064   0.0036   0.0012   0.2190
512x512    1.1580   0.0023   0.0006   0.2371   0.3010
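The per-size comparison discussed next can be checked directly from the Table 2 values, with columns in the order PCA, EM-PCA, PPCA, APEX, GHA. The mean rank computed here is an illustrative summary added for this sketch, not a measure used in the study.

```python
methods = ["PCA", "EM-PCA", "PPCA", "APEX", "GHA"]
# Table 2 values (in units of 1.0e+018), one row per image size.
table2 = {
    "64x64":   [0.0098, 0.0006, 0.0021, 0.0230, 0.0482],
    "128x128": [0.0055, 0.0075, 0.0031, 0.0177, 0.0085],
    "256x256": [1.8656, 0.0064, 0.0036, 0.0012, 0.2190],
    "512x512": [1.1580, 0.0023, 0.0006, 0.2371, 0.3010],
}

def best_per_size(table):
    """Method with the smallest distance to the original image for each size."""
    return {size: methods[row.index(min(row))] for size, row in table.items()}

def mean_rank(table):
    """Average rank of each method over all sizes (rank 1 = smallest distance)."""
    totals = dict.fromkeys(methods, 0)
    for row in table.values():
        order = sorted(range(len(methods)), key=lambda i: row[i])
        for rank, i in enumerate(order, start=1):
            totals[methods[i]] += rank
    return {m: totals[m] / len(table) for m in methods}


best = best_per_size(table2)
ranks = mean_rank(table2)
ranking = sorted(methods, key=ranks.get)
```

The best method per size is EM-PCA, PPCA, APEX and PPCA for 64x64 through 512x512. The mean ranks order the methods PPCA (1.5), EM-PCA (2.25), APEX, PCA, GHA, which agrees with the reading of the table given below.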

Table 2 shows that PPCA achieves the best results for two of the image sizes, the original 512x512 and 128x128. Although EM-PCA and APEX produce the minimum squared distance between the original and the reconstructed images for the 64x64 and 256x256 sizes, respectively, PPCA gives the second-best results for these two sizes. In short, the results suggest that PPCA is the most effective of the five methods for dimensionality reduction with minimal loss of information. EM-PCA follows, keeping the distance at or below 0.0075 for all sizes.

Table 3. The Euclidean distances between the original and the clustered images after the PCA algorithms

            64x64               128x128
            K-Means   FCM       K-Means   FCM
PCA         1.3412    3.4973    8.8749    6.6951
EM-PCA      0.1493    2.1150    7.0927    1.0201
PPCA        0.3110    2.0361    0.1972    1.1407
APEX        3.9077    3.4346    3.0826    1.9772
GHA         1.1223    0.4174    3.1258    0.9860
Total       6.8315    11.5004   22.3732   11.8191

a. The results for the images with the resized dimensions of 64x64 and 128x128

            256x256             512x512
            K-Means   FCM       K-Means   FCM
PCA         2.7823    1.0912    2.5594    1.7244
EM-PCA      0.2019    0.2182    0.4053    1.3033
PPCA        0.2552    0.6270    0.2459    0.2659
APEX        0.1828    0.5062    9.7202    3.1959
GHA         11.390    6.6540    10.596    5.5350
Total       14.8122   9.0966    23.5268   12.0245

b. The results for the images with the resized dimensions of 256x256 and the original size of 512x512

The Total rows of Tables 3a-3b indicate that FCM produces the lower total distance for most image sizes (128x128, 256x256 and 512x512). On the other hand, the single lowest distance for every image size was obtained with K-Means. Regarding the PCA algorithms, PPCA and EM-PCA helped K-Means achieve its best clustering performance for every size except 256x256, and both work effectively with the two clustering algorithms. Examined in more detail, FCM was most compatible with GHA for the small image sizes, while K-Means achieved its best result with APEX for the 256x256 size. Nevertheless, both clustering algorithms also yielded good results with EM-PCA and PPCA for these sizes.
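The Total rows and the coefficient of variation (CV) discussed in the next paragraph can be recomputed from the per-method values of Tables 3a-3b. The sketch assumes the CV is the sample standard deviation (n - 1 denominator) divided by the mean. With that convention, it reproduces the reported 0.4582 for K-Means and 0.1224 for FCM.

```python
import statistics

# Per-method distances from Tables 3a-3b, in the row order PCA, EM-PCA, PPCA, APEX, GHA.
table3 = {
    ("64x64", "K-Means"):   [1.3412, 0.1493, 0.3110, 3.9077, 1.1223],
    ("64x64", "FCM"):       [3.4973, 2.1150, 2.0361, 3.4346, 0.4174],
    ("128x128", "K-Means"): [8.8749, 7.0927, 0.1972, 3.0826, 3.1258],
    ("128x128", "FCM"):     [6.6951, 1.0201, 1.1407, 1.9772, 0.9860],
    ("256x256", "K-Means"): [2.7823, 0.2019, 0.2552, 0.1828, 11.390],
    ("256x256", "FCM"):     [1.0912, 0.2182, 0.6270, 0.5062, 6.6540],
    ("512x512", "K-Means"): [2.5594, 0.4053, 0.2459, 9.7202, 10.596],
    ("512x512", "FCM"):     [1.7244, 1.3033, 0.2659, 3.1959, 5.5350],
}

def totals(method):
    """The 'Total' row for one clustering method, one value per image size."""
    return [sum(v) for (size, m), v in table3.items() if m == method]

def coefficient_of_variation(values):
    """CV = sample standard deviation (n - 1) / mean."""
    return statistics.stdev(values) / statistics.mean(values)


for method in ("K-Means", "FCM"):
    t = totals(method)
    print(method, [round(v, 4) for v in t], round(coefficient_of_variation(t), 4))
```

Using `statistics.pstdev` (population standard deviation) instead would give a smaller CV for K-Means (about 0.40), so the choice of denominator matters when comparing with the reported figures.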

To measure the dispersion of the results obtained by K-Means and FCM in Tables 3a-3b, the coefficient of variation (CV) was calculated from the Total results. The CV values, 0.4582 for K-Means and 0.1224 for FCM, show that FCM yielded more consistent results than K-Means. Looking at the individual PCA methods, K-Means was most consistent with PPCA across all image sizes, whereas FCM showed the least dispersion with APEX compared with the other PCA methods.

4. CONCLUSION

Image clustering is used for high-level description of image content. It plays an important role in solving pattern recognition and image processing problems in computational neuroscience. This paper used T1w MRI brain tumor images with an original size of 512 rows by 512 columns. The study aims to benchmark the success of five PCA algorithms (PCA, PPCA, EM-PCA, GHA and APEX) in dimensionality reduction for clustering, and to evaluate the methods according to their propensity to cause information loss. Two common methods, K-Means and FCM, were chosen for clustering.


To achieve this goal, the PCA methods were first applied to the MRI images, which were resized to three different sizes in addition to the original size. According to the reconstruction errors, PPCA and EM-PCA performed clearly better than the other methods. This may be due to the way the eigenvectors are obtained. Both EM-PCA and PPCA use a probabilistic approach to find a principal subspace without computing the sample covariance matrix directly. This is efficient in cases where computing that matrix is difficult, especially for large-scale, high-variance data, and such cases can also increase the risk of overfitting. In addition, the probabilistic formulation allows these methods to handle missing data.
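To illustrate how an EM approach avoids forming the sample covariance matrix, the following sketch implements Roweis's EM algorithm for PCA in its zero-noise form. The E-step solves for latent coordinates X = (C^T C)^-1 C^T Y, and the M-step updates the loadings C = Y X^T (X X^T)^-1. Only q x q systems are solved, and the d x d covariance is never built. This sketch is not the implementation used in the study. It omits the noise-variance update of PPCA and the missing-data handling mentioned above, and the synthetic data are hypothetical.

```python
import numpy as np

def em_pca(Y, q, n_iter=200, seed=0):
    """EM algorithm for PCA in the zero-noise limit (after Roweis).

    Y : array of shape (d, n), one sample per column.
    q : number of principal components to keep.
    Returns a (d, q) orthonormal basis of the estimated principal subspace.
    """
    Y = Y - Y.mean(axis=1, keepdims=True)     # center each variable
    rng = np.random.default_rng(seed)
    C = rng.standard_normal((Y.shape[0], q))  # random initial loadings
    for _ in range(n_iter):
        # E-step: latent coordinates X = (C^T C)^-1 C^T Y (a q x q solve).
        X = np.linalg.solve(C.T @ C, C.T @ Y)
        # M-step: loadings C = Y X^T (X X^T)^-1 (another q x q solve).
        C = np.linalg.solve(X @ X.T, X @ Y.T).T
    # EM recovers the subspace, not ordered eigenvectors; orthonormalize it.
    Q, _ = np.linalg.qr(C)
    return Q


# Hypothetical data: 6-D samples lying close to a 2-D subspace.
rng = np.random.default_rng(1)
W = rng.standard_normal((6, 2))
Y = W @ rng.standard_normal((2, 500)) + 0.01 * rng.standard_normal((6, 500))
Q = em_pca(Y, q=2)
```

The projection Q Q^T can be checked against the one obtained from an SVD of the centered data. After convergence, the two coincide up to round-off.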

In the study, the number of centroids was set to 5 based on the histogram of the original image. In addition, a series of runs was performed to determine the optimum value of the parameter C of the FCM algorithm. The algorithm was executed for values of C between 0 and 30 in increments of 5, and the value 5 was finally assigned to the parameter. According to the clustering results of FCM and K-Means combined with the PCA algorithms on the resized T1w MRI images, FCM outperformed K-Means clustering. This result can be explained by the fact that the centroid positions found by K-Means are easily affected by the initial conditions of the clusters, whereas FCM iteratively seeks cluster centers that minimize the sum-of-squared-errors objective function. Furthermore, in accordance with the results above, EM-PCA and PPCA achieve good results with both clustering algorithms on the resized MRI images. Therefore, it can be concluded that both PPCA and EM-PCA work effectively with the two clustering algorithms, FCM and K-Means.

REFERENCES

1. Liu J, Li M, Wang J, Wu F, Liu T, Pan Y (2014) A Survey of MRI-Based Brain Tumor Segmentation Methods, 19(6):578-595
2. Cardenes R, Luis-Garcia R, Bach-Cuadra M (2009) A Multidimensional Segmentation Evaluation for Medical Image Data. Computer Methods and Programs in Biomedicine 96(2):108-124
3. Zhang Y, Wu L (2012) An MR Brain Images Classifier via PCA and Kernel Support Vector Machine. Progress in Electromagnetics Research 130:369-388
4. Harchaoui NE, Kerroum MA, Hammouch A, Ouadou M, Aboutajdine D (2013) Unsupervised Approach Data Analysis Based on Fuzzy Possibilistic Clustering: Application to Medical Image MRI. Computational Intelligence and Neuroscience 2013(3):435497, doi:10.1155/2013/435497
5. Goldberger J, Greenspan H, Gordon S (2002) Unsupervised Image Clustering Using the Information Bottleneck Method. 24th DAGM Symposium for Pattern Recognition, Zurich, Switzerland, September 16-18, pp. 158-165, doi:10.1007/3-540-45783-6_20
6. Bishop CM (1995) Neural Networks for Pattern Recognition. Oxford University Press, New York, USA
7. Hoyer PO, Hyvärinen A (2000) Independent Component Analysis Applied to Feature Extraction from Colour and Stereo Images. Network: Computation in Neural Systems 11(3):191-210
8. Lu J, Plataniotis KN, Venetsanopoulos AN, Li SZ (2006) Ensemble Based Discriminant Learning with Boosting for Face Recognition. IEEE Transactions on Neural Networks 17(1):166-178
9. Yousefi S, Goldbaum MH, Zangwill LM, Medeiros FA, Bowd C (2014) Recognizing Patterns of Visual Field Loss Using Unsupervised Machine Learning. Proceedings of SPIE 2014, 90342M, doi:10.1117/12.2043145
10. Diamantaras KI, Kung SY (1996) Principal Component Neural Networks: Theory and Applications. Wiley, New York, USA
11. Fiori S (2000) An Experimental Comparison of Three PCA Neural Networks. Neural Processing Letters 11:209-218
12. Tu Z, Narr KL, Dollar P, Dinov I, Thompson PM, Toga AW (2008) Brain Anatomical Structure Segmentation by Hybrid Discriminative/Generative Models. IEEE Transactions on Medical Imaging 27(4):495-508
13. Lopez-Rubio E, Ortiz-de-Lazcano-Lobato JM, Lopez-Rodriguez D (2009) Probabilistic PCA Self-Organizing Maps. IEEE Transactions on Neural Networks 20(9):1474-1489
14. Sachdeva J, Kumar V, Gupta I, Khandelwal N, Ahuja CK (2013) Segmentation, Feature Extraction, and Multiclass Brain Tumor Classification. J Digit Imaging 26:1141-1150
15. Parsi A, Sorkhi AG, Zahedi M (2014) Improving the Unsupervised LBG Clustering Algorithm Performance in Image Segmentation Using Principal Component Analysis. Signal, Image and Video Processing 10(2):301-309
16. Ju F, Sun Y, Gao J, Hu Y, Yin B (2015) Image Outlier Detection and Feature Extraction via L1-Norm-Based 2D Probabilistic PCA. IEEE Transactions on Neural Networks 24(12):4834-4846
17. Katkar J, Baraskar T, Mankar VR (2015) A Novel Approach for Medical Image Segmentation Using PCA and K-Means Clustering. 2015 International Conference on Applied and Theoretical Computing and Communication Technology (iCATccT), pp. 430-435, doi:10.1109/ICATCCT.2015.7456922
18. Xiao K, Ho SH, Salih Q (2007) A Study: Segmentation of Lateral Ventricles in Brain MRI Using Fuzzy C-Means Clustering with Gaussian Smoothing. Rough Sets, Fuzzy Sets, Data Mining and Granular Computing 4482:161-170
19. Juang LH, Wu MN (2010) MRI Brain Lesion Image Detection Based on Color-Converted K-Means Clustering Segmentation. Measurement 43:941-949
20. Kalaiselvi T, Somasundaram K, Rajeswari M (2012) Fast Brain Abnormality Detection Method for Magnetic Resonance Images (MRI) of Human Head Scans Using K-Means Clustering Technique. Proceedings of the Fourth International Conference on Signal and Image Processing (ICSIP 2012) 221:225-234
21. Adhikari SK, Sing JK, Basu DK, Nasipuri M (2015) Conditional Spatial Fuzzy C-Means Clustering Algorithm for Segmentation of MRI Images. Applied Soft Computing 34:758-769
22. Ali H, Elmogy M, El-Daydamony E, Atwan A (2015) Multi-Resolution MRI Brain Image Segmentation Based on Morphological Pyramid and Fuzzy C-mean Clustering. Arabian Journal for Science and Engineering 40(11):3173-3185
23. MATLAB (2008) www.mathworks.com
24. Hotelling H (1933) Analysis of a Complex of Statistical Variables into Principal Components. Journal of Educational Psychology 24:417-441
25. Jolliffe IT (2002) Principal Component Analysis. Springer-Verlag, New York
26. Ku W, Storer RH, Georgakis C (1995) Disturbance Detection and Isolation by Dynamic Principal Component Analysis. Chemometrics and Intelligent Laboratory Systems 30:179-196
27. Tipping M, Bishop C (1999) Probabilistic Principal Components Analysis. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 61(3):611-622
28. Roweis S, Ghahramani Z (1999) A Unifying Review of Linear Gaussian Models. Neural Computation 11:305-345
29. López-Rubio E, Ortiz-De-Lazcano-Lobato JM, López-Rodríguez D (2009) Probabilistic PCA Self-Organizing Maps. IEEE Transactions on Neural Networks 20(9):1474-1489
30. Commenges D, Jacqmin-Gadda H (2015) Dynamical Biostatistical Models. CRC Press, New York, p. 23
31. Roweis S (1997) EM Algorithms for PCA and SPCA. Advances in Neural Information Processing Systems 10:626-632
32. Tipping ME, Bishop CM (1999) Mixtures of Probabilistic Principal Component Analyzers. Neural Computation 11(2):443-482
33. Yu L, Snapp RR, Ruiz T, Radermacher M (2010) Probabilistic Principal Component Analysis with Expectation Maximization (PPCA-EM) Facilitates Volume Classification and Estimates the Missing Data. Journal of Structural Biology 171(1):18-30
34. Roweis S (1998) EM Algorithm for PCA and SPCA. Neural Information Processing Systems (NIPS'97) 10:626-632
35. Dempster AP, Laird NM, Rubin DB (1977) Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B, 1-38
36. Rubin D, Thayer D (1982) EM Algorithms for ML Factor Analysis. Psychometrika 47(1):69-76
37. Sanger TD (1989) Optimal Unsupervised Learning in Single-Layer Neural Network. Neural Networks 2:459-473
38. Kung SY, Diamantaras KI (1994) Adaptive Principal Components Extraction (APEX) and Applications. IEEE Transactions on Signal Processing 42(5):1202-1217
39. Oja E (1982) A Simplified Neuron Model as a Principal Component Analyzer. Journal of Mathematical Biology 16:267-273
40. MacQueen JB (1967) Some Methods for Classification and Analysis of Multivariate Observations. Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, University of California Press, 1:281-297
41. Hartigan JA, Wong MA (1979) A K-Means Clustering Algorithm. Applied Statistics 28:100-108
42. Bezdek JC (1981) Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York, USA
43. Gezmez E (2007) Clustering MRI Images with Principal Component Analysis Methods. Master Thesis, Cukurova University, Adana, Turkey