Accepted Manuscript
PCA Based Clustering for Brain Tumor Segmentation of T1w MRI Images
Irem Ersöz Kaya, Ayça Çakmak Pehlivanlı, Emine Gezmez Sekizkardeş, Turgay Ibrikçi
PII: S0169-2607(16)30419-9
DOI: 10.1016/j.cmpb.2016.11.011
Reference: COMM 4307
To appear in: Computer Methods and Programs in Biomedicine
Received date: 1 May 2016
Revised date: 10 October 2016
Accepted date: 23 November 2016
Please cite this article as: Irem Ersöz Kaya, Ayça Çakmak Pehlivanlı, Emine Gezmez Sekizkardeş, Turgay Ibrikçi, PCA Based Clustering for Brain Tumor Segmentation of T1w MRI Images, Computer Methods and Programs in Biomedicine (2016), doi: 10.1016/j.cmpb.2016.11.011
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Highlights
The performance of PCA based clustering on MR images is investigated. To this end, the PCA methods were first applied to the MRI images at the original size and at three reduced sizes. The two common methods, K-Means and FCM, were used for clustering.
Five PCA algorithms (PCA, PPCA, EM-PCA, GHA and APEX) are compared for their success in dimensionality reduction for clustering and evaluated according to their propensity to cause information loss.
The EM-PCA and the PPCA achieve successful results with the two clustering algorithms applied to the resized MRI images.
Therefore, it can be concluded that the PPCA and the EM-PCA work effectively with the two clustering algorithms.
PCA Based Clustering for Brain Tumor Segmentation of T1w MRI Images
Irem ERSÖZ KAYA1, Ayça ÇAKMAK PEHLİVANLI2, Emine GEZMEZ SEKİZKARDEŞ3, Turgay IBRIKÇI4*
1 Mersin University, Software Eng. Dept. 33440 Tarsus, Mersin, Turkey; (e-mail: [email protected]);
2 Mimar Sinan Fine Arts University, Department of Statistics, Istanbul, Turkey (e-mail: [email protected]).
3 Cukurova University, Electrical-Electronics Eng. Dept. 01330, Adana, Turkey (e-mail: [email protected]);
4 Cukurova University, Electrical-Electronics Eng. Dept. 01330, Adana, Turkey (e-mail: [email protected]);
* Corresponding Author: [email protected]
ABSTRACT
Background and Objective: Medical images are huge collections of information that are difficult to store and process, consuming extensive computing time. Therefore, reduction techniques are commonly used as a data pre-processing step to make the image data less complex so that high-dimensional data can be identified by an appropriate low-dimensional representation. PCA is one of the most popular multivariate methods for data reduction. This paper focuses on clustering T1-weighted MRI images for brain tumor segmentation, with dimension reduction by different common Principal Component Analysis (PCA) algorithms. Our primary aim is to present a comparison between different variations of PCA algorithms on MRIs for two clustering methods.
Methods: The five most common PCA algorithms, namely the conventional PCA, Probabilistic Principal Component Analysis (PPCA), Expectation Maximization Based Principal Component Analysis (EM-PCA), the Generalized Hebbian Algorithm (GHA), and Adaptive Principal Component Extraction (APEX), were applied to reduce dimensionality in advance of two clustering algorithms, K-Means and Fuzzy C-Means. In the study, T1-weighted MRI images of a human brain with a brain tumor were used for clustering. In addition to the original size of 512 lines and 512 pixels per line, three more sizes, 256x256, 128x128 and 64x64, were included in the study to examine their effect on the methods.
Results: The obtained results were compared in terms of both the reconstruction errors and the Euclidean distance errors among the clustered images containing the same number of principal components.
Conclusion: According to the findings, the PPCA obtained the best results among all the methods. Furthermore, the EM-PCA and the PPCA helped the K-Means algorithm to accomplish the best clustering performance in the majority of cases, as well as achieving significant results with both clustering algorithms for all sizes of T1w MRI images.
Keywords: Dimension reduction; PCA algorithms; clustering; K-Means; Fuzzy C-Means.
1. INTRODUCTION
A brain tumor is a mass or a growth of abnormal cells in brain tissue. There are two types of tumors, cancerous and non-cancerous. Research shows that over a third of brain tumors are cancerous, and brain tumors rank among the leading causes of cancer-related deaths. Therefore, early diagnosis and treatment are crucial for survival [1].
The detailed aspects of brain tumors can be displayed by imaging modalities such as X-Ray, Ultrasonography, Computed Tomography (CT), and Magnetic Resonance Imaging (MRI), enabling clinicians to understand the texture of brain tumors and determine the type of therapy. The location and the extent of the tumor, considering its position relative to its surroundings, are the determining factors in the therapy decision. Since the capacity to obtain brain tumor images has outstripped the ability to analyze and segment these images manually, studies on computational segmentation of brain tumors have inevitably been motivated. In recent years, brain tumor segmentation has become one of the most challenging tasks in medical image analysis.
Clustering is well suited to image segmentation because of its ability to uncover the complex relationships hidden in large unlabeled data sets based on a heuristic search for interesting and reasonable features [2-4]. Image clustering simply maps the image into clusters such that the set of clusters presents the same information about the image as the entire image-set collection. The generated clusters provide a concise summarization and visualization of the image content [5].
In image analysis applications, the curse of dimensionality is a common problem that may degrade the performance of a given algorithm as the number of features increases. Consequently, it might be necessary to use models that perform dimensionality reduction for high-dimensional data sets. Reduction techniques are frequently used as a data pre-processing step to reduce the complexity of the image data so that high-dimensional data can be identified by an appropriate low-dimensional representation [6]. Principal Component Analysis (PCA) is one of the most popular multivariate techniques for data reduction in image analysis, pattern recognition and machine learning [7-9]. The method focuses on the linear projection of multivariate high-dimensional data onto a low-dimensional subspace while retaining as much information as possible [10].
There are a number of studies that exhibit the success of the PCA algorithms on image analysis. In a study conducted in 2000, Fiori proposed a new PCA neural network
algorithm based on the Adaptive Principal Component Extractor (APEX) and compared the simulation results of the new method with two other PCA algorithms, the Generalized Hebbian Algorithm (GHA) and the standard APEX. Numerical simulations and structural complexity evaluations showed that the new approach improved on the numerical performance of both GHA and APEX [11]. The method described by Tu et al. provided a statistical hybrid discriminative-generative method to segment 3D brain images. In the discriminative model, thousands of image features were learned to capture the complex appearances of different anatomical structures. In the generative model, PCA was used to capture the shape information about each anatomical structure. Finally, the two models were combined to find a final segmentation by minimizing the energy function [12]. The combination of the probabilistic PCA (PPCA) and the Self-Organizing Map (SOM) model was used to create a self-organizing model that executed online learning of the local subspaces of input data. The authors explored the model's success in data visualization, image compression and video compression by comparing its results with those of three existing models: PCA, Kernel Based Topographic Maps and Self-Organizing Mixture Models (SOMM). The proposed model outperformed the other models with respect to mean squared error (MSE) [13]. In 2013, Sachdeva et al. proposed a computer aided diagnosis (CAD) system for segmentation, feature extraction and classification of brain tumors. In the study, 428 T1w MR images from 55 patients were used for multiclass brain tumor classification. The results demonstrated that the dimension reduction with PCA provided a 14% improvement in classification success [14]. The PCA algorithm was used by Parsi et al. to improve the image segmentation performance of the unsupervised Linde-Buzo-Gray (LBG) clustering algorithm. In the method, the eigenvalues of PCA were used to partition centers in each step of the LBG algorithm. The proposed method was compared with the FCM and the Gustafson-Kessel (G-K) methods in terms of time and accuracy. The proposed method clustered faster while achieving better accuracy [15]. Ju et al. introduced a new PPCA model, called L1-PPCA, for dimensionality reduction on 2D image data. The Laplacian density model was used for noise instead of a Gaussian assumption. The new method achieved more robust results for data outliers in terms of average reconstruction error. PPCA was used to develop a self-organizing model [16]. Another study conducted in 2015 combined PCA and K-Means clustering for segmentation of medical images to improve performance analysis and image quality. After the meaningful part was extracted from the image, PCA was applied for feature extraction and for selecting a proper number of clusters to increase the success rate [17].
The primary aim of this paper is to present a comparison between different variations of PCA algorithms regarding their success in dimensionality reduction of MRIs as well as their effects on the success of the clustering methods used for brain tumor segmentation. The PCA algorithms considered within this paper are the conventional Principal Component Analysis (PCA), the Probabilistic Principal Component Analysis (PPCA), the EM Algorithm for PCA (EM-PCA), the Generalized Hebbian Algorithm (GHA) and the Adaptive Principal Component Extractor (APEX). These data reduction algorithms are compared by using Fuzzy C-Means (FCM) and K-Means for clustering. The two clustering algorithms were preferred because of their different notions of how clusters are constituted. Unlike the K-Means algorithm, the Fuzzy C-Means algorithm allows pixels to be assigned to multiple classes with varying degrees of membership. In hard clustering, an input pixel belongs exclusively to one cluster.
The two methods have been widely used for the segmentation of MRI images and the identification of tumors because of their success on the subject. In 2007, Xiao et al. used FCM clustering for segmentation of lateral ventricles in one pair of T1-weighted and T2-weighted MRIs. The method was applied to the combination of the original image and the Gaussian filtered image, leading to more sensitive and more homogeneous results [18]. Juang and Wu proposed an image tracking method using color-converted segmentation with K-Means clustering in order to detect lesion size and region on brain MRI images. The results showed that the brain regions related to a tumor or lesion could be separated from the colored image [19]. Similarly, Kalaiselvi et al. used K-Means segmentation to detect brain abnormalities in MRI images of the human brain. They achieved faster and better results than the existing fuzzy segmentation method [20]. In 2015, Adhikari et al. introduced an efficient way to segment MRI brain images in the presence of noise and intensity inhomogeneity. They proposed an algorithm that incorporates spatial information and conditioning effects imposed by some conditional variables into the fuzzy membership function of conventional FCM [21]. Ali et al. developed an MRI brain tumor segmentation system which first created a new multiresolution wavelet image fused by a morphological pyramid and then segmented the images by FCM. They compared the system with three widely used clustering algorithms: K-Means, Expectation-Maximization (EM) and Kernel Fuzzy C-Means. The results revealed that the proposed system outperformed the others [22].
The paper is organized as follows: the algorithms are presented in detail in Section 2; Section 3 shows implementations of the PCA algorithms for clustering; and discussions and conclusions with final comments are given in Section 4.
2. MATERIAL AND METHODS
In the study, the methods were implemented to cluster a real T1-weighted (T1w) MRI of a human brain with a brain tumor by using Matlab [23]. The images have the size of 512 lines and 512 pixels per line. The raw image was first preprocessed with the PCA algorithms, and then each PCA-transformed image was clustered using the FCM and K-Means algorithms. In this section, the components of the PCA-based clustering methods for MRIs are discussed.
2.1. PCA Algorithms
2.1.1 Principal Component Analysis (PCA)
The conventional PCA is a statistical method that focuses on a linear projection of multivariate high-dimensional data onto a low-dimensional subspace by using least-squares decomposition while maintaining the maximum variance [24, 25]. PCA aims to find the orthogonal directions of strong variability in data.
Given a set of observed d-dimensional independent data vectors $x_i$ where $i = 1, \dots, n$, the orthogonal projection is executed by
$$y_i = A^T (x_i - \mu) \tag{1}$$
where $y$ is the transformed data and $\mu$ is the sample mean of the observed data. The $d \times d$ orthogonal transformation matrix $A$ comprises the orthonormal eigenvectors of the sample covariance matrix that is given as
$$S = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^T \tag{2}$$
The eigenvectors of $S$ can be determined through the eigenvalue decomposition $S = A V A^T$, where the eigenvalues in the diagonal matrix $V$ describe the variance of the observations along the related eigenvectors. The eigenvectors with the greatest eigenvalues are known as principal components and are sometimes called 'loadings' [26].
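As a minimal sketch of Eqs. (1) and (2), the following Python code computes the sample covariance of a handful of two-dimensional points and extracts its leading eigenvector in closed form. The function names `pca_2d` and `project`, the restriction to d = 2 and the toy data are illustrative assumptions; they are not taken from the study, which was implemented in Matlab.

```python
import math

def pca_2d(points):
    """Return (mean, eigenvalues, leading unit eigenvector) for 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance S = (1/n) * sum (x - mu)(x - mu)^T, as in Eq. (2).
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Closed-form eigenvalues of the symmetric matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2.0
    radius = math.hypot((sxx - syy) / 2.0, sxy)
    lam1, lam2 = half_trace + radius, half_trace - radius
    # Eigenvector of the largest eigenvalue (the first principal direction).
    if abs(sxy) > 1e-12:
        vx, vy = sxy, lam1 - sxx
    else:
        vx, vy = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    return (mx, my), (lam1, lam2), (vx / norm, vy / norm)

def project(point, mean, direction):
    """Eq. (1) for q = 1: score of one centered point along one eigenvector."""
    return (point[0] - mean[0]) * direction[0] + (point[1] - mean[1]) * direction[1]

# Toy data lying on the line y = x, so all variance sits in one direction.
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]
mean, eigenvalues, direction = pca_2d(points)
scores = [project(p, mean, direction) for p in points]
```

For image data the same steps would be carried out with a general eigen-solver on a much larger covariance matrix; the closed form is used here only to keep the sketch dependency-free.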
The $q$ principal components (where $q < d$) which minimize the squared reconstruction error are retained for the optimal linear dimensionality reduction. The reconstruction error $E_r$ is computed by
$$E_r = \lVert x - y A_q \rVert^2 \tag{3}$$
where the matrix of the $q$ principal eigenvectors $A_q$ represents a mapping from the observed data into its ideal principal subspace.
2.1.2 Probabilistic Principal Component Analysis (PPCA)
PPCA is one of the PCA extension algorithms proposed by Tipping and Bishop [27]. PPCA defines a probability density model in which the principal subspace of the observed data is defined by means of maximum-likelihood estimation of the parameters [28, 29].
Let $y$ be a $q$-dimensional latent variable. In the context of PPCA, the observed variables $x$ are modeled as a linear transformation of some independently distributed latent variables supplemented by Gaussian noise,
$$x = W y + \mu + \varepsilon \tag{4}$$
where $W$ is a $d \times q$ parameter matrix representing the projection from the lower-dimensional latent space to the data space, while the parameter $\mu$ allows the model to have a non-zero mean. In the model, the latent variables are assumed to have a Gaussian distribution with zero mean and unit variance, $p(y) \sim N(0, I)$, and the noise is defined to be an isotropic Gaussian, $p(\varepsilon) \sim N(0, \sigma^2 I)$, where $\sigma^2$ represents the variance lost in the projection and specifies the noise level. Hence, the marginal distribution of the observed data can be determined by integrating the conditional distribution $p(x \mid y) \sim N(W y + \mu, \sigma^2 I)$ weighted by the prior probability of the latent vectors $p(y)$, in which case the $d \times d$ model covariance is specified by $C_v = W W^T + \sigma^2 I$. This yields a constrained Gaussian model $p(x) \sim N(\mu, C_v)$ governed by the parameters $\mu$, $\sigma^2$ and $W$.
Owing to the independence of the data vectors, the log-likelihood of the observed data set under the model is then defined as
$$L(\mu, W, \sigma^2) = \sum_{i=1}^{n} \ln p(x_i) = -\frac{n}{2} \left\{ d \ln(2\pi) + \ln |C_v| + \mathrm{tr}\left(C_v^{-1} S\right) \right\} \tag{5}$$
where the sample covariance matrix $S$ is given by (2). The model parameters can then be found by maximizing the likelihood.
For a Gaussian, optimal values of the model parameters can easily be found analytically by maximizing the likelihood function without need for iteration [30]. The maximum likelihood estimator for the parameter $\mu$ is simply given by the sample mean, while the maximum likelihood solutions for the noise variance $\sigma^2$ and the weight matrix $W$ take the form of
$$\sigma^2_{ML} = \frac{1}{d - q} \sum_{i=q+1}^{d} \lambda_i \tag{6}$$
$$W = A_q (V_q - \sigma^2 I)^{1/2} O \tag{7}$$
where $A_q$ defines the matrix of the $q$ principal eigenvectors of the sample covariance matrix, $V_q$ is a $q \times q$ diagonal matrix of the related eigenvalues $\lambda_q$, and $O$ is a $q \times q$ orthogonal rotation matrix. The weight matrix $W$, which includes the factor loadings, can be decomposed by normalizing the columns of the matrix $W O^T$, in which case the rotation matrix $O$ is calculated as the eigenvector matrix of $W^T W$.
Conventional PCA is a limiting case of the probabilistic model which arises when the noise level in the model becomes infinitesimal by considering the limit as $\sigma^2 \to 0$ [31]. In this case, the projection into latent space becomes orthogonal [32].
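The closed-form estimates in Eqs. (6) and (7) can be illustrated with a few lines of Python. This is only a sketch under stated assumptions: the helper name `ppca_ml` is hypothetical, the eigenvalues of $S$ are assumed to be already computed and sorted in descending order, and the rotation $O$ is taken to be the identity, so only the per-column scale factors of $W$ are returned.

```python
import math

def ppca_ml(eigenvalues, q):
    """Closed-form PPCA estimates from the eigenvalues of S (descending order).

    Returns the noise variance of Eq. (6) and, for O = I (an assumption),
    the factors sqrt(lambda_j - sigma^2) that scale the q principal
    eigenvectors in Eq. (7).
    """
    d = len(eigenvalues)
    if not 0 < q < d:
        raise ValueError("q must satisfy 0 < q < d")
    # Eq. (6): average of the d - q discarded eigenvalues.
    sigma2 = sum(eigenvalues[q:]) / (d - q)
    # Eq. (7) with O = I: column j of W is eigenvector j times this factor.
    scales = [math.sqrt(max(lam - sigma2, 0.0)) for lam in eigenvalues[:q]]
    return sigma2, scales

# Toy spectrum with d = 4 and q = 2 retained components.
sigma2, scales = ppca_ml([4.0, 2.0, 1.0, 1.0], q=2)
```

In this toy spectrum the two discarded eigenvalues average to a noise variance of 1, and the retained directions are scaled by the square roots of 3 and 1. As the discarded eigenvalues shrink towards zero, the scale factors approach the square roots of the retained eigenvalues, which is the conventional PCA limit mentioned above.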
2.1.3 Expectation Maximization Based Principal Component Analysis (EM-PCA)
The EM-PCA is another density estimation technique which has a latent variable model representation similar to PPCA. The probabilistic model uses the Expectation-Maximization (EM) algorithm, which provides an iterative approach for maximum likelihood estimation of the parameters in statistical models with unobserved latent variables, i.e. variables considered missing or incomplete data [33, 34]. The EM algorithm was first outlined with an application to random coefficient models by Dempster et al. [35] and developed for factor analysis by Rubin and Thayer [36]. Within the framework, the model parameters are iteratively updated based on their current estimates to maximize the expected log-likelihood of the complete data, composed of both the observed and the missing data, until convergence occurs.
The EM algorithm comprises two stages: Expectation (E-step) and Maximization (M-step). First, the expected value of the complete data log-likelihood is calculated with respect to the conditional distribution of the unobserved latent variables given the observed data under the current settings of the parameters. Afterwards, the M-step finds new values for the parameters that maximize the expectation of the log-likelihood found in the E-step.
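The alternation of the two steps can be sketched in plain Python for the simplest case of a single latent dimension (q = 1), using the sufficient statistics and updates of Eqs. (9)-(11). The function name `em_ppca_q1`, the starting values, the fixed iteration count used instead of a convergence test, and the toy data are all assumptions made for illustration.

```python
def em_ppca_q1(data, iters=200):
    """EM for the latent variable model x = w*y + mu + noise with scalar y."""
    n, d = len(data), len(data[0])
    mu = [sum(x[k] for x in data) / n for k in range(d)]
    xc = [[x[k] - mu[k] for k in range(d)] for x in data]  # centered data
    w = [1.0] + [0.0] * (d - 1)   # arbitrary starting direction (assumption)
    sigma2 = 1.0                  # arbitrary starting noise variance (assumption)
    for _ in range(iters):
        # E-step: expected latent values under the current w and sigma2.
        m = sum(wk * wk for wk in w) + sigma2
        ey = [sum(wk * xk for wk, xk in zip(w, x)) / m for x in xc]
        eyy = [sigma2 / m + e * e for e in ey]
        # M-step: new w, then new sigma2 computed with the new w.
        total_eyy = sum(eyy)
        w = [sum(x[k] * e for x, e in zip(xc, ey)) / total_eyy for k in range(d)]
        ww = sum(wk * wk for wk in w)
        sigma2 = sum(
            sum(xk * xk for xk in x)
            - 2.0 * e * sum(wk * xk for wk, xk in zip(w, x))
            + e2 * ww
            for x, e, e2 in zip(xc, ey, eyy)
        ) / (n * d)
    return mu, w, sigma2

# Toy data whose covariance has eigenvalues 2 (first axis) and 0.5 (second axis).
data = [(2.0, 0.0), (-2.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
mu, w, sigma2 = em_ppca_q1(data)
```

For this toy data the iteration settles on the first axis with a squared weight norm of 1.5 and a noise variance of 0.5, which agrees with the closed-form PPCA solution for eigenvalues 2 and 0.5.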
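From the definitions given in the PPCA section, the corresponding log-likelihood of the complete data can be defined regarding the joint probability distribution for the latent variables $y$ and the observed variables $x$, $p(x, y) \sim N(W y + \mu, \sigma^2 I)$. In the E-step, the expectation of the complete data log-likelihood with respect to the conditional distribution of the latent data $y$, given the observed data $x$, is calculated:
$$\langle L_{comp} \rangle = -\sum_{i=1}^{n} \left\{ \frac{d}{2} \ln \sigma^2 + \frac{1}{2} \mathrm{tr}\left(\langle y_i y_i^T \rangle\right) + \frac{1}{2\sigma^2} (x_i - \mu)^T (x_i - \mu) - \frac{1}{\sigma^2} \langle y_i \rangle^T W^T (x_i - \mu) + \frac{1}{2\sigma^2} \mathrm{tr}\left(W^T W \langle y_i y_i^T \rangle\right) \right\} \tag{8}$$
The sufficient statistics with the current values of the parameters for the distribution are given by the following equations:
$$\langle y_i \rangle = (W^T W + \sigma^2 I)^{-1} W^T (x_i - \mu), \qquad \langle y_i y_i^T \rangle = \sigma^2 (W^T W + \sigma^2 I)^{-1} + \langle y_i \rangle \langle y_i \rangle^T \tag{9}$$
Then, the new values for the parameters, $W$ and $\sigma^2$, are estimated through the maximization of the conditional expectation log-likelihood $\langle L_{comp} \rangle$:
$$W^{(t)} = \left[ \sum_{i=1}^{n} (x_i - \mu) \langle y_i \rangle^T \right] \left[ \sum_{i=1}^{n} \langle y_i y_i^T \rangle \right]^{-1} \tag{10}$$
$$\sigma^{2\,(t)} = \frac{1}{nd} \sum_{i=1}^{n} \left\{ \lVert x_i - \mu \rVert^2 - 2 \langle y_i \rangle^T W^{(t)T} (x_i - \mu) + \mathrm{tr}\left( \langle y_i y_i^T \rangle W^{(t)T} W^{(t)} \right) \right\} \tag{11}$$
where the superscript $t$ is the iteration number. Both the expectation and maximization steps are repeated until the change in the estimated parameters between consecutive iterations $t$ and $t+1$ is negligible.
2.1.4 Generalized Hebbian Algorithm (GHA)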
Sanger proposed the Generalized Hebbian Algorithm as an extension of the well-known Oja's learning rule to perform principal component analysis [37]. The GHA, also known as Sanger's rule, is a Hebbian-type learning algorithm that allows a linear feed-forward neural network to find a certain number of principal eigenvectors based on input data.
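A minimal Python sketch of Sanger's rule is given here for orientation. It stores the weights row-wise (one weight vector per output neuron), so the update for neuron j reads: add the learning rate times the output of neuron j times the difference between the input and its reconstruction from neurons 1 to j. The function names, the learning rate of 0.05, the number of epochs, the starting weights and the toy samples are assumptions chosen for illustration only.

```python
def gha_step(W, x, eta):
    """One update of Sanger's rule; W is a list of q weight vectors (rows)."""
    y = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in W]
    updated = []
    for j, w in enumerate(W):
        # Input reconstructed from neurons 0..j only: this is the effect of
        # the lower-triangular operator in the GHA learning rule.
        recon = [sum(y[k] * W[k][i] for k in range(j + 1)) for i in range(len(x))]
        updated.append([w_i + eta * y[j] * (x_i - r_i)
                        for w_i, x_i, r_i in zip(w, x, recon)])
    return updated

def train_gha(samples, W, eta=0.05, epochs=500):
    """Feed the samples repeatedly through the network, updating W each time."""
    for _ in range(epochs):
        for x in samples:
            W = gha_step(W, x, eta)
    return W

# Toy samples whose correlation matrix has eigenvectors along the two axes,
# with the larger eigenvalue on the first axis.
samples = [(2.0, 0.0), (-2.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
W = train_gha(samples, [[0.6, 0.8], [-0.8, 0.6]])
```

With these toy samples the first weight vector aligns with the first axis and the second with the second axis, up to sign, matching the descending eigenvalue order. For j = 1 the update reduces to Oja's rule.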
Let $x$ be a d-dimensional input vector. In GHA, the feed-forward network is composed of a single layer of linear neurons, each of which gathers the weighted input signals to form an output. The input patterns are iteratively fed into the network and the $j$th output of the network is calculated by $y_j = W_j^T x$, where $j = 1, \dots, q$ and $W_j$ is the weight vector for the $j$th neuron. For each iteration step, the weight vectors are adapted to determine the principal eigenvectors of the correlation matrix according to the GHA learning rule:
$$W(t+1) = W(t) + \eta \left( x(t)\, y^T(t) - W(t)\, \mathrm{LT}\left[ y(t)\, y^T(t) \right] \right) \tag{12}$$
where $\eta$ is the learning rate and $\mathrm{LT}[\cdot]$ denotes a lower triangular matrix. After a large number of iterative computations and adaptations, the weight vector of the $j$th neuron converges to the $j$th principal component, i.e. the network extracts the first $q$ eigenvectors of the correlation matrix in descending eigenvalue order.
2.1.5 Adaptive Principal Component Extractor (APEX)
In the APEX method, a laterally connected neural network is trained by using the
Oja's Hebbian rule to extract multiple principal components [38, 39]. The network consists of $d$ inputs $x_1, \dots, x_d$ which are fully connected to $q$ outputs $y_1, \dots, y_q$ through feed-forward weights $W = \{w_{ij}\}$ and, additionally, there are lateral weights $H = \{h_j\}$ that connect the first $q-1$ output neurons with the $q$th output unit.
According to the rule, while the output for the first $q-1$ neurons is described with the usual notation $y = W^T x$, the activation of the $q$th neuron is given by
$$y_q = W_q^T x - H^T y \tag{13}$$
where $y = [y_1, \dots, y_{q-1}]^T$ and $W_q$ is the weight vector for the neuron. The learning rule of APEX for the $q$th neuron is
$$W_q(t+1) = W_q(t) + \eta \left( y_q(t)\, x^T(t) - y_q^2(t)\, W_q(t) \right)$$
$$H(t+1) = H(t) + \eta \left( y_q(t)\, y^T(t) - y_q^2(t)\, H(t) \right) \tag{14}$$
where $\eta$ defines a positive learning rate parameter. When the network is trained, the feed-forward weight vector $W_q$ approaches the principal components and the lateral weight vector $H$ goes to zero.
2.2 Image Clustering Methods
2.2.1 K-Means Clustering Algorithm
The K-Means is a well-known clustering method first proposed by MacQueen and later developed by Hartigan and Wong [40, 41]. It is an unsupervised learning algorithm in which the data is partitioned into a user-predetermined number of clusters. Each cluster is defined by the mean of the patterns which constitute it, referred to as the centroid.
For a set of $n$ data patterns $x_1, \dots, x_n$, the cluster centers, so-called centroids, are initially determined at random; then the patterns are partitioned into $k$ clusters in such a way that each pattern is assigned to the cluster which has the closest center. Afterwards, the positions of the centers are recalculated according to all patterns belonging to the cluster, and the algorithm continues until convergence, which means that there is no more change in the positions of the centroids.
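The assignment and update steps described above can be sketched in Python for one-dimensional data such as pixel intensities. The function name `kmeans_1d` is hypothetical, and the initial centers are passed in explicitly rather than drawn at random, an assumption made here so that the example is reproducible.

```python
def kmeans_1d(values, centers, max_iter=100):
    """Hard clustering of scalar values; returns (centers, labels)."""
    centers = list(centers)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each value goes to the cluster with the closest center.
        labels = [min(range(len(centers)), key=lambda i: (v - centers[i]) ** 2)
                  for v in values]
        # Update step: each center moves to the mean of its assigned values.
        new_centers = []
        for i, c in enumerate(centers):
            members = [v for v, lab in zip(values, labels) if lab == i]
            new_centers.append(sum(members) / len(members) if members else c)
        # Convergence: stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

# Toy intensities forming a dark group and a bright group.
intensities = [10, 12, 11, 200, 205, 198]
centers, labels = kmeans_1d(intensities, centers=[0, 255])
```

Because the result depends on the starting centers, practical use typically repeats the procedure from several initializations and keeps the partition with the smallest squared error.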
The objective function, in this case a squared error function which the algorithm attempts to minimize, is given by
$$F = \sum_{i=1}^{k} \sum_{j=1}^{n} \lVert x_j - \mu_i \rVert^2 \tag{15}$$
Here, $\mu_i$ symbolizes the mean, i.e. the center of the $i$th cluster; therefore $\lVert x_j - \mu_i \rVert$ gives a measure of distance from the $j$th data point to this center, where $i = 1, \dots, k$ and $j = 1, \dots, n$.
2.2.2 Fuzzy C-Means (FCM) Clustering Algorithm
As a widely used clustering algorithm, FCM is an iterative operation of moving cluster centers gradually closer to input values. The method, originally introduced by Bezdek, differs from the K-Means algorithm in its partitioning process [42]. In FCM, a data point might be included simultaneously in all groups in proportion to its degree of membership, ranging from 0 to 1, while K-Means provides a "hard" clustering where each data point is assigned to only one cluster.
The FCM algorithm finds the centroids which minimize the sum of the least squared errors function:
$$O(U, c_i) = \sum_{i=1}^{k} O_i = \sum_{i=1}^{k} \sum_{j=1}^{n} u_{ij}^m d_{ij}^2 \tag{16}$$
where $d_{ij}$ is the distance between the $j$th data point and the $i$th cluster center $c_i$; $k$ is the number of clusters; and $n$ is the number of data patterns, following the notation used for K-Means. The only difference is the membership value $u_{ij}$, which specifies the closeness of the data point $x_j$ to the cluster center $c_i$, $0 \le u_{ij} \le 1$; here $m$ denotes the degree of fuzziness, $m > 1$.
In order to minimize the objective function $O$, which involves the measure of similarity, the membership degrees and the cluster centers for each data point are updated iteratively until convergence to the optimum occurs. The two update conditions are given as follows:
$$c_i = \frac{\sum_{j=1}^{n} u_{ij}^m x_j}{\sum_{j=1}^{n} u_{ij}^m} \tag{17}$$
$$u_{ij} = \frac{1}{\sum_{p=1}^{k} \left( d_{ij} / d_{pj} \right)^{2/(m-1)}} \tag{18}$$
subject to $\sum_{i=1}^{k} u_{ij} = 1$ and $n > \sum_{j=1}^{n} u_{ij} > 0$.
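The two update conditions can be sketched in Python for one-dimensional data. The function name `fcm_1d`, the fixed number of iterations used instead of a convergence test, the small constant `eps` that guards against a zero distance, and the explicit starting centers are assumptions made for illustration.

```python
def fcm_1d(values, centers, m=2.0, iters=100, eps=1e-12):
    """Fuzzy C-Means on scalar values; returns (centers, memberships).

    memberships[j][i] is the degree to which value j belongs to cluster i.
    """
    centers = list(centers)
    k = len(centers)
    power = 2.0 / (m - 1.0)
    u = []
    for _ in range(iters):
        # Membership update, Eq. (18).
        u = []
        for x in values:
            dist = [abs(x - c) + eps for c in centers]
            u.append([1.0 / sum((dist[i] / dist[p]) ** power for p in range(k))
                      for i in range(k)])
        # Center update, Eq. (17): membership-weighted mean of the values.
        centers = [
            sum(row[i] ** m * x for row, x in zip(u, values))
            / sum(row[i] ** m for row in u)
            for i in range(k)
        ]
    return centers, u

# Toy intensities forming two well separated groups.
centers, memberships = fcm_1d([0.0, 1.0, 9.0, 10.0], centers=[2.0, 8.0])
```

Each row of `memberships` sums to one, reflecting the constraint on the membership values, and a hard partition can be obtained afterwards by assigning every value to its cluster of highest membership.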
3. THE RESULTS AND DISCUSSION
In this study, the experiments were conducted on a dataset of T1w MRI images of a patient who suffered from a brain tumor, with the aim of reducing the extensive computing time. The experiments were designed in two stages: the first stage consisted of dimension reduction performed by five different PCA algorithms (PCA, PPCA, EM-PCA, GHA and APEX), and then two clustering algorithms, K-Means and FCM, were applied in the second stage [43]. In addition to the original image size of 512 rows by 512 columns, three more sizes were included in the study to examine their effect on the methods. The image dimensions were proportionally resized to 256x256, 128x128 and 64x64. Samples of the original image in all sizes used are given in Fig. 1.
Figure 1: The images with the original size and the resized dimensions (a. 64x64, b. 128x128, c. 256x256, d. 512x512).
Before applying the methods, all images were preprocessed with a box filtering method to remove possible noise. Serial trials on the images with different window sizes demonstrated that reconstruction performance tended to improve as the window size decreased. As a result of the experiments, the final window size was set to 8 for this study.
After preprocessing the images, the several variations of PCA were applied to the resized MRI images, and their average reconstruction errors were then calculated using the formula given by (3) in order to observe the change in reconstruction accuracy with respect to the number of principal components. These calculations were executed for each dimension of the resized MRI images. The results are visualized in Figures 2-5.
Figure 2. The approximate average reconstruction errors (64X64)
Figure 3. The approximate average reconstruction errors (128X128)
Figure 4. The approximate average reconstruction errors (256X256)
Figure 5. The approximate average reconstruction errors (512X512)
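The way the error depends on the number of retained components can be made explicit for conventional PCA with a short worked identity. It assumes the centered reconstruction $\hat{x}_i = \mu + A_q A_q^T (x_i - \mu)$ and eigenvalues $\lambda_1 \ge \dots \ge \lambda_d$ of $S$; the hat notation is introduced here only for illustration.

$$\frac{1}{n} \sum_{i=1}^{n} \lVert x_i - \hat{x}_i \rVert^2 = \mathrm{tr}(S) - \mathrm{tr}\left(A_q^T S A_q\right) = \sum_{j=q+1}^{d} \lambda_j$$

Under these assumptions the average reconstruction error equals the sum of the discarded eigenvalues, so it can only decrease or stay constant as $q$ grows, and it vanishes at $q = d$.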
The calculated average reconstruction error rates for all images sized 512x512, 256x256, 128x128 and 64x64 with the PCA methods are given in Table 1 as values scaled by the factors 1.0e+027, 1.0e+026, 1.0e+024 and 1.0e+023, respectively. As can be observed from Figures 2-5 and Table 1, higher image dimensions lead to higher average reconstruction error rates. According to the average reconstruction error rates for the PCA methods, EM-PCA and PPCA mostly give better results than the others. The error rates for the two methods are considerably close to each other. Both methods rank as the best two except for the image size of 256x256, for which they are still quite close to the minimum error rate.
Table 1. The average reconstruction error rates for all sized images in the PCA methods
          512X512   256X256   128X128   64X64
PCA       3.7993    1.1597    3.7795    1.0583
EM-PCA    3.7430    1.1916    3.4214    1.0261
PPCA      3.7991    1.1918    3.6829    1.0151
GHA       4.7339    1.2767    3.8947    1.1686
APEX      4.5778    1.2878    3.6832    1.1943
As noted earlier, the study was performed in two stages: principal component analysis and clustering. Following the completion of the PCA applications, the K-Means and FCM methods for image clustering were applied to all sizes of the MRI images obtained with the PCA methods. At this stage, the number of clusters first had to be determined. The histogram of the original image given in Figure 6 can be used to decide the number of clusters. The frequency, i.e. the number of pixels, and the intensity of the image are indicated on the y-axis and the x-axis of the histogram, respectively. There are roughly five peaks, which are assumed to be possible clusters. Therefore, in view of the histogram of the original image, the number of clusters was set to five (k = 5) for the image clustering methods.
Figure 6: The histogram of the original image
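In the study the peaks were read off the histogram by eye. One hypothetical way to make such a count repeatable is sketched below in Python: build the intensity histogram, smooth it with a moving average, and list the local maxima. The function names, the 8-bit intensity range, the smoothing window and the simple peak rule are all assumptions and are not part of the original method.

```python
def intensity_histogram(pixels, levels=256):
    """Count how many pixels take each integer intensity in [0, levels)."""
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1
    return hist

def smooth(hist, window=5):
    """Moving average that shrinks the window at both ends of the histogram."""
    half = window // 2
    out = []
    for i in range(len(hist)):
        lo, hi = max(0, i - half), min(len(hist), i + half + 1)
        out.append(sum(hist[lo:hi]) / (hi - lo))
    return out

def find_peaks(hist, min_count=1):
    """Indices of bins that rise above the left neighbor and are not below
    the right neighbor, ignoring bins smaller than min_count."""
    peaks = []
    for i in range(1, len(hist) - 1):
        if hist[i] >= min_count and hist[i] > hist[i - 1] and hist[i] >= hist[i + 1]:
            peaks.append(i)
    return peaks

# Toy histogram with three visible peaks.
peaks = find_peaks([0, 5, 1, 0, 7, 7, 2, 9, 0])
```

The number of entries in `peaks` could then serve as a candidate for k, although on real images the outcome is sensitive to the smoothing window and the minimum count, so a visual check of the histogram remains advisable.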
According to the reconstruction results and the reconstructed images given in Figures 7-10, it can be claimed that the FCM algorithm mostly gives more successful results than the K-Means algorithm. Since its performance depends on the initial positions of the centers, there is no assurance that the K-Means algorithm always finds the optimal solution. On the other hand, FCM is an algorithm which updates the cluster centers iteratively to minimize the sum of squared errors function given in (16). The principal difference between the two clustering algorithms is the type of partitioning. Fuzzy C-Means applies a fuzzy partition in such a way that an object in the data can belong to all groups with a degree of membership ranging between 0 and 1, while K-Means is a hard clustering method assigning each data point to exactly one cluster.
Figures 7-10 are given to exhibit the differences between the original images and the images clustered by K-Means and FCM for all sizes.
Figure 7: The results of 64x64 size (panels: Original, PCA, EM-PCA, PPCA, APEX, GHA; rows: K-Means, FCM)
Figure 8: The results of 128x128 size (panels: Original, PCA, EM-PCA, PPCA, APEX, GHA; rows: K-Means, FCM)
Figure 9: The results of 256x256 size (panels: Original, PCA, EM-PCA, PPCA, APEX, GHA; rows: K-Means, FCM)
Figure 10: The results of 512x512 size (panels: Original, PCA, EM-PCA, PPCA, APEX, GHA; rows: K-Means, FCM)
In addition to the average reconstruction error, the Euclidean distance matrix was also calculated to evaluate the distance between the original images and the processed images. Euclidean space is a real vector space which has a finite dimensional structure. The Euclidean distance matrix provides detailed table of the squared distances between
CR IP T
the point pairs taken from a list of points.
To obtain the Euclidean distance matrix, the Euclidean distances between the original image and the processed images are calculated. The best projected image among the PCA methods is supposed to have the minimum distance from the original image. The averages of the squared distance differences between the original images and the images processed with the PCA algorithms are given in Table 2. The simplified values of the results with the factor 1.0e+018 are used in the following tables, and the best results are represented in bold.
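As an illustration of this kind of comparison, the sketch below measures the squared Euclidean distance between an image and its rank-k PCA reconstruction. It is a minimal example under stated assumptions: the random array stands in for an MRI slice, plain SVD-based PCA stands in for the five PCA variants, and the function names are hypothetical. It does not reproduce the exact averaging or scaling behind Table 2.

```python
import numpy as np


def pca_reconstruct(image, k):
    """Project the image rows onto the top-k principal directions and map back."""
    mean = image.mean(axis=0, keepdims=True)
    centered = image - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    w = vt[:k].T                       # (columns, k) basis of the principal subspace
    return centered @ w @ w.T + mean


def squared_euclidean_distance(a, b):
    """Sum of squared pixel differences between two images of equal shape."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float((diff ** 2).sum())


rng = np.random.default_rng(0)
image = rng.random((64, 64))           # synthetic stand-in for a 64x64 slice
distances = {
    k: squared_euclidean_distance(image, pca_reconstruct(image, k))
    for k in (4, 16, 64)
}
for k, d in distances.items():
    print(k, d)
```

The distance shrinks as more components are kept and is numerically zero when all 64 are retained, which is why a smaller distance is read as less information loss.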
Table 2. The Euclidean distances between the original and the processed images with the PCA algorithms

Size       PCA      EM-PCA   PPCA     APEX     GHA
64X64      0.0098   0.0006   0.0021   0.0230   0.0482
128X128    0.0055   0.0075   0.0031   0.0177   0.0085
256X256    1.8656   0.0064   0.0036   0.0012   0.2190
512X512    1.1580   0.0023   0.0006   0.2371   0.3010
In the light of the results given in Table 2, it can be inferred that the PPCA method achieves the best results for two of the image sizes, namely the original size and 128x128. Although EM-PCA and APEX produce the minimum squared distance between the original and the reconstructed images for the sizes of 64x64 and 256x256, respectively, PPCA gives the second best results for these two sizes. In short, the results indicate that PPCA is the most powerful method for dimension reduction without loss of information. Secondarily, the EM-PCA algorithm achieves strong results for all sizes as well.

Table 3. The Euclidean distances between the original and the clustered images after the PCA algorithms

           64 X 64               128 X 128
           K-Means    FCM        K-Means    FCM
PCA        1.3412     3.4973     8.8749     6.6951
EM-PCA     0.1493     2.1150     7.0927     1.0201
PPCA       0.3110     2.0361     0.1972     1.1407
APEX       3.9077     3.4346     3.0826     1.9772
GHA        1.1223     0.4174     3.1258     0.9860
Total      6.8315     11.5004    22.3732    11.8191

a. The results for the images with the resized dimensions of 64x64 and 128x128

           256 X 256             512 X 512
           K-Means    FCM        K-Means    FCM
PCA        2.7823     1.0912     2.5594     1.7244
EM-PCA     0.2019     0.2182     0.4053     1.3033
PPCA       0.2552     0.6270     0.2459     0.2659
APEX       0.1828     0.5062     9.7202     3.1959
GHA        11.390     6.6540     10.596     5.5350
Total      14.8122    9.0966     23.5268    12.0245

b. The results for the images with the resized dimensions of 256x256 and the original size of 512x512
The total results reported in Tables 3a-3b indicate that the FCM clustering algorithm produces the minimum total distance for the majority of the image sizes. On the other hand, K-Means gave the best individual result (the lowest distance) for every image size. With regard to the success of the PCA algorithms, PPCA and EM-PCA helped the K-Means algorithm to accomplish the best clustering performance, except for the size of 256x256, and both worked effectively with the two clustering algorithms. When the results were examined in more detail, for the small image sizes FCM was more compatible with GHA, and K-Means achieved its best result with APEX for the size of 256x256. Nevertheless, both clustering algorithms also yielded appreciable results with EM-PCA and PPCA for these sizes.
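How much the Total rows of Tables 3a-3b vary across image sizes can be summarized with a coefficient of variation (standard deviation divided by mean). The sketch below uses the totals as read from the tables (64x64, 128x128, 256x256, 512x512 in that order). The use of the sample standard deviation (n - 1 denominator) is an assumption; with it, the computation reproduces the CV values of 0.4582 and 0.1224 reported in this section.

```python
from statistics import mean, stdev

# Total rows of Tables 3a-3b, ordered 64x64, 128x128, 256x256, 512x512.
kmeans_totals = [6.8315, 22.3732, 14.8122, 23.5268]
fcm_totals = [11.5004, 11.8191, 9.0966, 12.0245]


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumed definition)."""
    return stdev(values) / mean(values)


cv_kmeans = coefficient_of_variation(kmeans_totals)
cv_fcm = coefficient_of_variation(fcm_totals)
print(round(cv_kmeans, 4), round(cv_fcm, 4))  # 0.4582 0.1224
```

A lower value means the totals change less from one image size to another, so the smaller CV for FCM corresponds to more consistent behavior across sizes.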
In order to measure the dispersion of the results obtained by K-Means and FCM given in Tables 3a-3b, the coefficient of variation (CV) was calculated based on the Total results. According to the CV values, which are 0.4582 and 0.1224 for K-Means and FCM, respectively, it is easily observed that FCM yielded more consistent results than K-Means did. In the light of the CV values, while K-Means was more consistent and compatible with PPCA for all image sizes, FCM yielded less dispersion with APEX for all image sizes when compared with the other PCA methods.
4. CONCLUSION
Image clustering is used for high-level description of image content. It plays an important role as a problem-solving factor in pattern recognition and image processing in computational neuroscience. T1w MRI brain tumor images with the original size of 512 rows by 512 columns have been used in the paper. The study aims to benchmark the success of the five PCA algorithms, PCA, PPCA, EM-PCA, GHA and APEX, in dimensionality reduction for clustering and to evaluate the methods according to their propensity to cause information loss. The two common methods, K-Means and FCM, were preferred for clustering.
In order to achieve the goal, the PCA methods were first implemented on the MRI images, which were resized into three different sizes in addition to the original size. According to the reconstruction error rates of the application results, PPCA and EM-PCA performed significantly better than the others. This may be due to the way the eigenvectors were obtained. Both the EM-PCA and the PPCA algorithms use a probabilistic approach to find a principal subspace without calculating the sample covariance matrix directly, which provides an efficient alternative in cases where the matrix is difficult to calculate, especially for large-scale and large-variance data. These cases can easily lead to an increase in overfitting problems. Besides, the methods can deal with missing data thanks to probabilistic estimation.
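The idea of finding a principal subspace without forming the sample covariance matrix can be illustrated with the EM iteration for PCA in its zero-noise form (an E-step that estimates the latent coordinates and an M-step that re-estimates the basis). The following is a minimal NumPy sketch under stated assumptions: the function name `em_pca`, the iteration count, the random initialization and the synthetic data are illustrative and are not taken from the implementation used in the study.

```python
import numpy as np


def em_pca(Y, k, n_iter=50, seed=0):
    """EM iteration for PCA on centered data Y of shape (d, n).

    Returns an orthonormal (d, k) basis of the estimated principal subspace.
    Only products with Y are used; the d x d covariance matrix is never built.
    """
    rng = np.random.default_rng(seed)
    d, _ = Y.shape
    C = rng.standard_normal((d, k))            # random initial basis
    for _ in range(n_iter):
        # E-step: latent coordinates X = (C^T C)^{-1} C^T Y
        X = np.linalg.solve(C.T @ C, C.T @ Y)
        # M-step: new basis C = Y X^T (X X^T)^{-1}
        C = np.linalg.solve(X @ X.T, X @ Y.T).T
    Q, _ = np.linalg.qr(C)                     # orthonormalize the final basis
    return Q


# Synthetic data: 6 features with clearly separated variances, 500 samples.
rng = np.random.default_rng(1)
scales = np.array([10.0, 5.0, 1.0, 0.5, 0.2, 0.1])[:, None]
Y = rng.standard_normal((6, 500)) * scales
Y = Y - Y.mean(axis=1, keepdims=True)

Q = em_pca(Y, k=2)
U, _, _ = np.linalg.svd(Y, full_matrices=False)
P_em = Q @ Q.T                                  # projector from EM
P_svd = U[:, :2] @ U[:, :2].T                   # projector from direct SVD
print(np.abs(P_em - P_svd).max())
```

Comparing the two projection matrices checks that the EM iteration and a direct decomposition span the same two-dimensional subspace, even though the individual basis vectors may differ by a rotation.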
In the study, the number of centroids was set to 5 considering the histogram of the original image. Besides, a series of runs was performed to determine the optimum value for the parameter C of the FCM algorithm. The algorithm was executed for values of C between 0 and 30 in increments of 5, and finally the value of 5 was assigned to the parameter. According to the clustering results of the FCM and K-Means algorithms with the PCA algorithms on the resized T1w MRI images, FCM outperformed K-Means clustering. This result can be explained by the fact that, in K-Means, finding the optimum positions of the centroids can easily be affected by the initial conditions of the clusters, whereas FCM iteratively seeks cluster centers which minimize the sum of squared errors function. Furthermore, in accordance with the aforementioned results, EM-PCA and PPCA achieve successful results with the two clustering algorithms implemented on the resized MRI images. Therefore, it can be concluded that both the PPCA and the EM-PCA algorithms work effectively with the two clustering algorithms, FCM and K-Means.
REFERENCES Liu J, Li M, Wang J , Wu F, Liu T, Pan Y (2014) A Survey of MRI-Based Brain Tumor Segmentation Methods, 19(6):578-595 2.
CR IP T
1.
Cardenesa R, Luis-Garciaa R, Bach-Cuadrab M (2009) A Multidimensional
in Biomedicine, 96(2), 108–124 3.
AN US
Segmentation Evaluation for Medical Image Data. Computer Methods and Programs
Zhang Y, Wu L (2012) An MR Brain Images Classifier via PCA and Kernel Support Vector Machine. Progress in Electromagnetics Research 130:369-388 Harchaoui NE, Kerroum MA, Hammouch A, Ouadou M, Aboutajdine D (2013)
M
4.
Unsupervised Approach Data Analysis Based on Fuzzy Possibilistic Clustering:
ED
Application to Medical Image MRI, Computational Intelligence and Neuroscience 2013(3):435497, doi:10.1155/2013/435497 Goldberger J, Greenspan H, Gordon S (2002) Unsupervised Image Clustering Using
PT
5.
CE
the Information Bottleneck Method, 24th DAGM Symposium for Pattern Recognition, Zurich, Switzerland, September 16-18, pp.158-165, doi:10.1007/3-540-
AC
45783-6_20 6.
Bishop CM (1995) Neural Networks for Pattern Recognition. Oxford University Press, New York, USA
7.
Hoyer PO, Hyvärinen A (2000) Independent Component Analysis Applied to Feature Extraction from Colour and Stereo Images. Network: Computation in Neural Systems, 11(3):191-210 27
ACCEPTED MANUSCRIPT
8.
Lu J, Plataniotis KN, Venetsanopoulos AN, Li SZ (2006) Ensemble Based Discriminant Learning with Boosting for Face Recognition. IEEE Transactions on Neural Networks, 17(1):166-178
9.
Yousefi S, Goldbaum MH, Zangwill LM, Medeiros FA, Bowd C (2014) Recognizing
CR IP T
Patterns of Visual Field Loss Using Unsupervised Machine Learning. Proceedings of SPIE 2014, 90342M, doi:10.1117/12.2043145.
10. Diamantaras KI, Kung SY (1996) Principal Component Neural Networks: Theory and Applications. Wiley, New York, USA
Processing Letters, 11:209-218
AN US
11. Fiori S (2000) An experimental Comparison of Three PCA Neural Networks. Neural
12. Tu Z, Narr KL, Dollar P, Dinov I, Thompson PM, Toga AW (2008) Brain
M
Anatomical Structure Segmentation by Hybrid Discriminative/Generative Models. IEEE Transactions on Medical Imaging, 27(4):495-508 E, Ortiz-de-Lazcano-Lobato
ED
13. Lopez-Rubio
JM,
Lopez-Rodriquez
D (2009)
Probabilistic PCA Self-Organizing Maps. IEEE Transactions on Neural Networks,
PT
20(9):1474-1489
CE
14. Sachdeva J, Kumar V, Gupta I, Khandelwal N, Ahuja CK (2013) Segmentation, Feature Extraction, and Multiclass Brain Tumor Classification, J Digit Imaging,
AC
26:1141–1150
15. Parsi A, Sorkhi AG, Zahedi M (2014) Improving The Unsupervised LBG Clustering Algorithm Performance in Image Segmentation Using Principal Component Analysis. Signal, Image and Video Processing, 10(2):301:309
28
ACCEPTED MANUSCRIPT
16. Ju F, Sun Y, Gao J, Hu Y, Yin B (2015) Image Outlier Detection and Feature Extraction via L1-Norm-Based 2D Probabilistic PCA. IEEE Transactions on Neural Networks, 24(12):4834-4846 17. Katkar J, Baraskar T, Mankar VR (2015) A Novel Approach for Medical Image
CR IP T
Segmentation Using PCA and K-Means Clustering. 2015 International Conference on Applied and Theoretical Computing and Communication Technology (iCATccT), pp. 430-435. doi: 10.1109/ICATCCT.2015.7456922
18. Xiao K, Ho SH, Salih Q (2007) A Study: Segmentation of Lateral Ventricles in Brain
AN US
MRI Using Fuzzy C-Means Clustering with Gaussian Smoothing. Rough Sets, Fuzzy Sets, Data Mining and Granular Computing 4482:161-170
19. Juang LH, Wu MN (2010) MRI Brain Lesion Image Detection Based on Color-
M
Converted K-Means Clustering Segmentation, Measurement 43:941-949 20. Kalaiselvi T, Somasundaram K, Rajeswari M (2012) Fast Brain Abnormality
ED
Detection Method for Magnetic Resonance Images (MRI) of Human Head Scans Using K-Means Clustering Technique. Proceedings of the Fourth International
PT
Conference on Signal and Image Processing (ICSIP 2012) 221:225-234
CE
21. Adhikaria SK, Singb JK, Basub DK, Nasipurib M (2015) Conditional Spatial Fuzzy C-Means Clustering Algorithm for Segmentation of MRI Images. Applied Soft
AC
Computing, 34:758-769 22. Ali H, Elmogy M, El-Daydamony E, Atwan A (2015) Multi-Resolution MRI Brain Image Segmentation Based on Morphological Pyramid and Fuzzy C-mean Clustering, Arabian Journal for Science and Engineering 40(11):3173-3185 23. MATLAB, 2008, www.mathworks.com
29
ACCEPTED MANUSCRIPT
24. Hotelling H (1933) Analysis of a Complex of Statistical Variables into Principal Components. Journal of Educational Psychology 24:417-441 25. Jolliffe, IT (2002) Principal Component Analysis, Springer-Verlag, New York. 26. Ku W, Storer RH, Georgakis C (1995) Disturbance Detection and Isolation by
CR IP T
Dynamic Principal Component Analysis. Chemometrics and Intelligent Laboratory Systems 30:179-196
27. Tipping M, Bishop C (1999) Probabilistic Principal Components Analysis. Journal of the Royal statistical Society: Series B (Statistical Methodology) 61(3):611-622.
AN US
28. Roweis S, Ghahramani Z (1999) A Unifying Review of Linear Gaussian Models. Neural Computation 11:305-345
29. López-Rubio E, Ortiz-De-Lazcano-Lobato JM, López-Rodríguez D (2009)
M
Probabilistic PCA Self-Organizing Maps. IEEE Transactions on Neural Networks 20(9):1474-1489
ED
30. Commenges D, Jacqmin-Gadda H (2015) Dynamical Biostatistical Models CRC Press, New York, p.23
PT
31. Roweis S (1997) EM Algorithms for PCA and SPCA. Advances in Neural
CE
Information Processing Systems 10:626-632 32. Tipping ME, Bishop CM (1999) Mixtures of Probabilistic Principal Component
AC
Analyzers. Neural Computation 11(2):443-482 33. Yu L, Snapp RR, Ruiz T, Radermacher M (2010) Probabilistic Principal Component Analysis
with
Expectation
Maximization
(PPCA-EM)
Facilitates
Volume
Classification and Estimates the Missing Data. Journal of Structural Biology 171(1):18-30
30
ACCEPTED MANUSCRIPT
34. Roweis S (1998) EM Algorithm for PCA and SPCA. Neural Information Processing Systems (NIPS’97) 10:626-632 35. Dempster AP, Laird NM, Rubin DB (1977) Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society, Series B, 1-38
CR IP T
36. Rubin D, Thayer D (1982) EM Algorithms for ML Factor Analysis. Psychometrika 47(1):69-76
37. Sanger, TD (1989) Optimal Unsupervised Learning in Single-Layer Neural Network. Neural Networks 2:459-473
AN US
38. Kung SY, Diamantaras KI (1994) Adaptive Principal Components Extraction (APEX) and Applications. IEEE Transactions on signal Processing 42(5):1202-1217 39. Oja E (1982) A Simplified Neuron Model as a Principal Component Analyzer.
M
Journal of Mathematical Biology 16:267-273
40. MacQueen JB (1967) Some Methods For Classification and Analysis of Multivariate
ED
Observations, Proceedings of 5-th Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, University of California Press, 1:281-297
PT
41. Hartigan JA, Wong MA (1979) A K-Means Clustering Algorithm. Applied Statistics
CE
28:100-108
42. Bezdek JC (1981) Pattern Recognition with Fuzzy Objective Function Algorithms.
AC
Plenum Press, New York, USA 43. Gezmez E (2007) Clustering MRI Images with Principal Component Analysis Methods, Master Thesis, Cukurova University, Adana, TURKEY
31