CHAPTER
MULTILEVEL CLASSIFICATION FRAMEWORK OF fMRI DATA: A BIG DATA APPROACH
6
Luina Pani*, Somnath Karmakar†, Chinmaya Misra‡, Satya Ranjan Dash‡ School of Computer Engineering, Kalinga Institute of Industrial Technology, Bhubaneswar, India* Government College of Engineering and Leather Technology, Kolkata, India† School of Computer Applications, Kalinga Institute of Industrial Technology, Bhubaneswar, India‡
6.1 INTRODUCTION
The brain is the master organ of the body and performs a multitude of vital functions. It employs nerve cells, or neurons, to perform these tasks, and the activity of neurons fluctuates depending on the pattern of the task. Functional magnetic resonance imaging, or functional MRI (fMRI), is a method to assess neuronal activity, which is correlated with brain activity [1, 2]. fMRI is noninvasive and harmless, as it involves neither a surgical procedure nor exposure to detrimental electromagnetic radiation. fMRI measures the activity of the brain by identifying changes linked with neuronal activation and blood flow to the cerebral region. When the brain utilizes an area, blood flow to that region increases considerably. We can use fMRI to build activation maps that show which parts of the brain are involved in a certain cerebral process. These experiments are widely used to understand various neurobehavioral disorders, such as Alzheimer’s disease. Functional MRI has a comparatively high spatial and temporal resolution and is a very powerful method to record brain functions. The subject is placed in the magnet of an MRI machine, where different kinds of stimuli, such as sounds or visual scenes, can be administered in a controlled fashion. There is also a facility to record small motor movements or responses. fMRI is based on the idea that oxygen-rich blood and oxygen-deficient blood have different magnetic resonance properties. The more active areas of the brain receive extra oxygenated blood [3, 4], and fMRI records this augmented blood flow to find out which areas are more active. The extent of flow, quantity of blood, and consumption of oxygen constitute the blood-oxygen-level-dependent (BOLD) signal. The protons in the vicinity of oxygenated blood produce the most powerful signals. These signals are processed by a computer to generate a three-dimensional image of the brain, which is stored in the form of voxels.
Big Data Analytics for Intelligent Healthcare Management. https://doi.org/10.1016/B978-0-12-818146-1.00006-4 © 2019 Elsevier Inc. All rights reserved.
Each voxel corresponds to thousands of nerve cells or neurons. A huge amount of data is produced as the number of networked home appliances, vehicles, and other devices grows, together with the information captured by companies and social media. These data can be referred to as big data, which is not only very large in volume, but also high in variety
and velocity. Volume refers to the amount of data, velocity to the rate at which the data changes, and variety to the types of data and the different ways the data can be used. Big data is a dataset so large and complex that conventional data processing software cannot handle it. These data are both structured and unstructured and are used in various fields on a day-to-day basis. Big data analytics enables the rapid examination of data, significantly reducing cost and time, and thus helps in uncovering hidden patterns, unknown correlations, market trends, and other useful information that can aid smart decision-making [5]. fMRI is an essential neuroimaging technique. It compares the activities of the brain by recording the flow of oxygenated blood. It produces an enormous amount of data that has to be evaluated, and thus also generates a vast network of outcomes. The existing medical image processing tools cannot integrate resources efficiently, so big data analytics platforms have been developed for huge datasets such as fMRI. Big data analytics helps in knowledge extraction and interpretation of the fMRI dataset. Traditional statistical data analytic solutions generally focus on static analytics restricted to samples that are stationary in time, which often leads to untrustworthy conclusions. Machine learning is a good option for addressing these problems. It focuses on the development of fast and efficient algorithms for real-time processing of data, and its main goal is to make accurate predictions of various types. Machine learning provides the ability to learn and improve from experience without human intervention and to adjust actions accordingly [6].
Learning starts with observations or data, such as training examples, in order to look for patterns in the data and make better decisions in the future. Machine learning methods are broadly grouped into two types: supervised and unsupervised. In supervised learning, labeled training examples are used to make predictions. The training dataset includes the input data and their desired output values, and the learned model predicts the output values for new examples. Supervised learning comprises two types of algorithms, namely classification and regression. Unsupervised learning employs a dataset without labels. Cluster analysis is the most familiar unsupervised learning method: the data are grouped into clusters on the basis of similarity, evaluated using metrics such as Euclidean or probabilistic distance. fMRI has provided researchers with countless new insights into the inner workings of the human brain. The statistical analysis of fMRI data, however, is very difficult. The data is enormous in volume, comprising a sequence of MRI images. These data are heterogeneous and noisy due to possible head movement or breathing of subjects and limitations of the acquisition technology. Traditional statistical data analytic solutions are not efficient enough to draw effective conclusions from such high-resolution measurements. In the machine learning approach, the dimensionality of the data is reduced by removing redundancy through voxel elimination. The voxels from seven selected regions of interest (ROIs) are chosen as features for classification. By using machine learning, we can predict or classify the fMRI dataset efficiently. We explore here two classifiers: logistic regression and the support vector machine (SVM) [5]. Logistic regression is a regression model used when the prediction takes a small number of distinct values. In binary classification, y can take only two values, 0 and 1.
We could approach the classification problem ignoring the fact that $y$ is discrete-valued, and apply linear regression to try to predict $y$ given $x$. However, we know that $y \in \{0, 1\}$, and thus $h_\theta(x)$ should take values between 0 and 1.
For this purpose, we choose:
$$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}} \tag{6.1}$$
where:
$$g(z) = \frac{1}{1 + e^{-z}} \tag{6.2}$$
is referred to as the logistic or sigmoid function. $g(z)$ tends towards 1 as $z \to \infty$, and $g(z)$ tends towards 0 as $z \to -\infty$. Moreover, $g(z)$, and hence also $h_\theta(x)$, is always bounded between 0 and 1. Here we maintain the convention of letting $x_0 = 1$, so that:
$$\theta^T x = \theta_0 + \sum_{j=1}^{n} \theta_j x_j \tag{6.3}$$
To fit $\theta$, we endow the classification model with a set of probabilistic assumptions and then fit the parameters by maximum likelihood. Let us assume that:
$$P(y = 1 \mid x; \theta) = h_\theta(x) \tag{6.4}$$
$$P(y = 0 \mid x; \theta) = 1 - h_\theta(x) \tag{6.5}$$
This can be written more compactly as:
$$p(y \mid x; \theta) = (h_\theta(x))^{y} \, (1 - h_\theta(x))^{1 - y} \tag{6.6}$$
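The logistic model of Eqs. (6.1) to (6.6) can be illustrated with a minimal Python sketch. It is not the implementation used in the experiments: the function names, the toy parameters, and the toy data are hypothetical, and the log-likelihood is included only to show the quantity that maximum likelihood fitting would maximize.

```python
import math


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z)) from Eq. (6.2)."""
    # Branch on the sign of z so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x), Eq. (6.1); theta[0] is the intercept (x0 = 1)."""
    z = theta[0] + sum(t * xj for t, xj in zip(theta[1:], x))
    return sigmoid(z)


def likelihood(theta, x, y):
    """p(y | x; theta) for a label y in {0, 1}, Eq. (6.6)."""
    h = hypothesis(theta, x)
    return (h ** y) * ((1.0 - h) ** (1 - y))


def log_likelihood(theta, X, Y):
    """Sum of log p(y | x; theta) over a dataset."""
    return sum(math.log(likelihood(theta, x, y)) for x, y in zip(X, Y))


if __name__ == "__main__":
    theta = [0.5, -1.0, 2.0]        # hypothetical parameters
    X = [[1.0, 0.5], [2.0, -1.0]]   # two toy examples, two features each
    Y = [1, 0]
    print(hypothesis(theta, X[0]))
    print(log_likelihood(theta, X, Y))
```

For any input, `likelihood(theta, x, 1)` and `likelihood(theta, x, 0)` sum to 1, which is the content of Eqs. (6.4) and (6.5).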
The SVM is considered one of the most important methods for classification; here a linear kernel [5, 7] is used for training the SVM. The SVM is a generalization of the maximal margin classifier and uses a hyperplane to separate two classes of data. The distance from each training example to a given separating hyperplane can be calculated, and the margin is the minimal distance from the examples to the hyperplane. The maximal margin hyperplane, the one farthest from the training examples, is selected. The margin is bounded by two parallel hyperplanes which, with $w$ the normal vector of the hyperplane and $b$ its offset, can be represented as:
$$w \cdot x - b = 1 \quad \text{and} \quad w \cdot x - b = -1 \tag{6.7}$$
Obviously, in order for the distance between these hyperplanes, $2 / \lVert w \rVert$, to be maximized, $\lVert w \rVert$ should be minimized.
The next important step is feature selection. The feature selection methods described below are employed independently and collectively in order to compare performance. In the baseline, all voxels (features) present in the data are used to create the sample data for training and testing the learning models. A training example may contain a number of images, and each image comprises thousands of voxels. As a result, the dimension of the feature vector can be very high. Selecting ROIs is a popular method for feature selection in fMRI image classification [1]. Within every ROI, the mean activation value of the voxels, referred to as the super voxel, is calculated. The feature vectors are the collections of these average values [2, 8]. This approach is very useful for data assembled from diverse subjects. Voxels can also be chosen based on their capability to differentiate the target classes. Here we have used an F-ratio based method to find the contribution of each voxel in the feature set.
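The ROI averaging described above (one super voxel per ROI) can be sketched as follows. This is only an illustration under stated assumptions: the ROI labels "A", "B", and "C", the function names, and the toy activations are hypothetical, and each ROI that is kept is assumed to contain at least one voxel.

```python
from statistics import mean


def roi_supervoxels(image, roi_of_voxel, roi_names):
    """Return one mean activation (super voxel) per selected ROI for one image.

    image        -- list of voxel activations
    roi_of_voxel -- ROI label of each voxel, same length as image
    roi_names    -- ordered list of the ROIs to keep
    """
    voxels_by_roi = {name: [] for name in roi_names}
    for value, roi in zip(image, roi_of_voxel):
        if roi in voxels_by_roi:          # voxels outside the selected ROIs are dropped
            voxels_by_roi[roi].append(value)
    return [mean(voxels_by_roi[name]) for name in roi_names]


def trial_features(images, roi_of_voxel, roi_names):
    """Concatenate the super voxels of every image (snapshot) of a trial."""
    features = []
    for image in images:
        features.extend(roi_supervoxels(image, roi_of_voxel, roi_names))
    return features


if __name__ == "__main__":
    rois = ["A", "A", "B", "B", "C"]      # hypothetical ROI labels
    image = [1.0, 3.0, 2.0, 4.0, 10.0]
    print(roi_supervoxels(image, rois, ["A", "B"]))
```

With seven ROIs and 16 snapshots, such a feature vector would have 7 × 16 entries instead of one entry per voxel and snapshot.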
The N-most active ROI method is similar to the N-most active method, but here the voxels are chosen uniformly from the seven ROIs. These five feature selection approaches are mainly used for subject-dependent experiments. In subject-independent cases, we use two feature selection methods: Average ROI, and Active Average ROI, where we take the N most active voxels in each ROI and calculate their mean to obtain the superfeature or supervoxel. Feature standardization is commonly done by finding the mean and standard deviation of each feature. The mean is then subtracted from every feature value, and the result is divided by the standard deviation:
$$x' = \frac{x - \bar{x}}{\sigma} \tag{6.8}$$
where $x$ represents the original feature vector, $\bar{x}$ represents the mean of that feature vector, and $\sigma$ represents its standard deviation. We use the data in the learning models both without standardization, that is, as collected from the web [9], and with standardization, that is, scaled using the method above. We present the results of both cases in different experiments.
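A minimal sketch of the standardization in Eq. (6.8) is given below. The function names are hypothetical, and two choices are assumptions not stated in the chapter: the population standard deviation is used, and a constant feature is mapped to zeros to avoid division by zero.

```python
from statistics import mean, pstdev


def standardize(column):
    """Apply x' = (x - mean) / std to one feature (one column of the data)."""
    m = mean(column)
    s = pstdev(column)                    # population standard deviation (assumption)
    if s == 0:
        return [0.0 for _ in column]      # constant feature: nothing to scale
    return [(v - m) / s for v in column]


def standardize_features(X):
    """Standardize every column of a row-major data matrix X."""
    columns = [list(col) for col in zip(*X)]
    scaled = [standardize(col) for col in columns]
    return [list(row) for row in zip(*scaled)]


if __name__ == "__main__":
    X = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]   # toy data: 3 samples, 2 features
    for row in standardize_features(X):
        print(row)
```

When standardization is combined with cross validation, the mean and standard deviation are usually estimated on the training folds only and then applied to the test fold; the sketch above does not make that distinction.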
6.2 RELATED WORK
In present healthcare systems, big data analytics and machine learning have become invaluable tools for efficient medical services. fMRI is helpful for assessing the latent risks of treatments of the brain and for understanding how a normal, ailing, or injured brain works. This section outlines some of the significant contributions of different researchers in the fields of machine learning, big data analytics, and fMRI in healthcare. David M. Vock et al. [9] predicted the likelihood of occurrence of different health issues, using Bayesian networks to build the prediction models. Logistic regression (LR) models were included as supplementary factors, and classification was done using k-nearest neighbors. N.S. Hari Narayana Moorthy [9] proposed using machine learning techniques to classify the properties of molecules that can cause cancer or alter genetic material. Carcinogenicity and mutagenicity data of 1481 chemically diverse molecules were analyzed using the sum of ranking differences (SRD). MACCS fingerprinting methods were also applied to chemical carcinogenicity. The result was used to determine whether a chemical can induce cancer or change genetic properties, for chemical regulatory purposes. The SRD was computed for every predictive model and used for comparison of performance. Daisuke Ichikawa et al. [10] used substantial health check-up data to predict whether a candidate needed health guidance. A machine learning method was employed to identify candidates. Five different prediction models were developed with the help of machine learning methods, and a gradient-boosting decision tree (GBDT) was also incorporated. J. Shotton et al. [11] presented a framework in which a double-layer kernel extreme learning machine (DKELM) was used to detect actions in videos.
Earlier, different features were simply fused together to improve recognition performance, but the suggested model involves double-layer classification with the extreme learning machine. In the late fusion mechanism, the output of the early fusion layer is used as input to the second layer. Daisuke Ichikawa et al. [12] discussed the role of machine learning methods in the effective identification of subjects with a possibility of hyperuricemia. They proposed a new machine learning approach to identify candidates at high risk of hyperuricemia. The suggested system can be applied
to common health check-ups and will minimize medical expenses. Training data was used to fit random forest (RF), gradient-boosting decision tree (GBDT), and logistic regression models, which were then applied to calculate the chances of hyperuricemia in the test dataset. Undersampling was applied when building the prediction models to manage the imbalanced class dataset. The outcome showed that the RF had the best performance in terms of sensitivity, while the GBDT approach had the best performance in specificity. Diabetes mellitus is a disorder of glucose metabolism affecting the well-being of people all around the world. Researchers are putting considerable effort into the various facets of diabetes, such as its identification, complications, responsible genetic conditions, health impacts, and management. Tao Zheng et al. [13] proposed a machine learning framework that analyzes computerized health records to identify type 2 diabetes. Expert algorithms were employed for this purpose. Classification models such as Naïve Bayes (NB), RF, SVM, and logistic regression were used to model samples of cases and controls based on the features obtained. The research work performed by Md. Maniruzzaman et al. [14] utilizes machine learning methods for the classification of diabetes mellitus data. The process is based on a Gaussian process (GP) classification technique with three kernels. They also compared the performance of the GP-based classification method to existing techniques in terms of accuracy, sensitivity, specificity, and positive and negative predictive value. M. Alssema et al. [15] applied machine learning methods for the identification and classification of diabetes mellitus patients, shedding light on the discrepancy observed in the distributions of maximum plantar pressure in the diabetic population of an area, which helps to avoid foot ulcers caused by diabetes.
The SVM and dK-means clustering techniques were adopted for the diagnosis of diabetes. Ioannis Kavakiotis et al. [16] reviewed the use of machine learning and data mining methods in diabetes research. A number of machine learning techniques were used, most of which were supervised learning methods. Different machine learning algorithms were compared on several biological and clinical datasets, and SVM showed the best classification accuracy. Takemori Watanabe et al. [17] incorporated features obtained from the functional connections, or network map, of the entire brain at rest using a multivariate technique. A regularization framework that takes the 6D structure of the functional connections into account, using the fused Lasso with the hinge loss, yielded an SVM capable of feature selection. Omar Y. Al-Jarrah et al. [18] provided an assessment of the literature on energy-efficient machine learning. They introduced a new perspective for technologists, analysts, and researchers in computer science and provided a layout for potential research activities. A distributed learning model for the restricted Boltzmann machine (RBM) and the back-propagation algorithm using MapReduce was employed. Theoretical and experimental aspects of large-scale data-intensive fields were discussed, relating to model energy efficiency, including computational requirements in learning and possible approaches, and to the structure and design of data-intensive areas, including the relationship between data models and characteristics. Emmanuel Bibault, Philippe Giraud, and Anita Burgun [19] presented methods that could be used to construct analytical models for treating cancer with radiation. Various machine learning methods, such as SVM and artificial neural networks (NNs), were also integrated. Progress in radiation oncology has produced huge amounts of data that need to be incorporated.
Electronic health reports also give large volumes of information. With the advancement of big data analytics and machine learning, this
formed excellent support in radiation oncology. The enhancement of predictive model performance will contribute to their increased usage in personalizing treatment with ionizing radiation safely and efficiently. The use of Spark-based machine learning techniques on streaming big data was discussed by Lekha R. et al. [20]. The decision tree, one of the most common machine learning methods for classification, was selected for prediction. Spark’s machine learning library was employed to build a flexible machine learning model for prediction that could handle massive datasets efficiently. Lina Zhou et al. [21] identified the prospects and challenges of using machine learning techniques on big data. A framework for machine learning on big data was introduced. The machine learning component is the key component for identifying hidden patterns in big data. The other components are the user, domain, system, and big data. The machine learning part handles the challenges as well as the extraction of knowledge for decision-making. A method based on machine learning was discussed by Álvaro Brandón Hernández et al. [22] for the optimization of parallelism within big data applications. By observing various metrics of the system and its applications, it is possible to achieve an optimal configuration that avoids failures and performance degradation and boosts resource utilization. Various regression algorithms, such as LR, gradient boosting regression, and support vector regression, were used. Along with these algorithms, k-neighbors was also used to predict the optimal set-up. Lei Zhang [4] explored machine learning methods to differentiate drug users from healthy persons. 3D brain images of the candidates were acquired with fMRI BOLD.
With the help of machine learning methods, the hidden patterns differentiating subjects addicted to drugs from nonaddicted controls were found and used to carry out classification for diagnosis. Mehdi Behroozi et al. [23] processed fMRI data in order to trace the changes in brain activity due to brain injury. The mechanisms employed for the analysis of fMRI data are discussed, and the preprocessing stages, including univariate and multivariate techniques as employed in functional MRI data evaluation, are described. Guo-Rong et al. [24] proposed that BOLD signal peaks can provide significant information in resting-state fMRI. In the framework of information theory, partial conditioning was applied to a restricted subset of variables. The differences between BOLD and combined BOLD level effective networks were compared. Anetta Lasek-Bal et al. [25–27] assessed the brain processes in fMRI in patients with strokes due to interruptions of blood supply to the brain and evaluated the possible relationship between the order of activity and the neurological status. fMRI was carried out and patients were observed on the first day as well as on the 14th day after the stroke. Disparity was observed between the stroke and nonstroke hemispheres. More than half of the patients with stroke showed cerebellar activation. Paul M. Matthews et al. [26] discussed the clinical concepts emerging from fMRI functional connectomics. Their work includes the exploration of different challenges and possible opportunities for clinically pertinent applications of fMRI-based functional connections. fMRI has had important influences on the clinical concepts guiding analysis and management of patients. A model for stable multiple-subject independent component analysis (ICA) on fMRI datasets was discussed by G. Varoquaux et al. [28]. Vincent Michel et al.
[29] proposed a method that merges signals from several brain regions found in fMRI to predict the behavior of the subject during a scanning session. The set of possible spatial configurations was restricted to a single tree adapted to the signal. The tree was then pruned in a supervised setting. Dimensionality reduction was accomplished with the help of feature agglomeration, and the constructed features offer a
multiscale depiction of the signal. Alexandre Abraham et al. [30] illustrated how scikit-learn can be used to carry out some important analysis steps. Scikit-learn provides several supervised and unsupervised learning algorithms, and its application to neuroimaging data offers a versatile tool to study the brain. Masaya Misaki et al. [6] compared six multivariate classifiers and examined response normalizations for pattern information. These schemes were compared on response patterns in human early visual and inferior temporal cortex, and the accuracy at decoding the category of visual objects was evaluated. Niels Væver Hartvig et al. [31] proposed spatial mixture modeling of fMRI data. The fMRI data was divided into two components on the basis of voxel activation. Using this model, the posterior probability of a voxel being activated can be estimated easily, which provides a better basis for thresholding than the statistic image. The method of Everitt and Bullmore [32] was extended to account for the spatial coherency of activated regions. This was achieved by calculating the posterior probability of an activated voxel using the spatial structure obtained from modeling the activation in a small region. The model was applied to synthetic data from statistical image analysis, a synthetic fMRI dataset, and visual stimulation data. Big data has given us a new way to analyze the huge collections of data stored as health data. A large amount of research work has been undertaken, and its insights help us to make the relevant changes in the healthcare system. It has decreased the cost of treatment drastically and helps us to predict upcoming problems in advance using the latest technology [33–36]. To balance the load of huge health data, some research works [37–40] use edge computing nodes to build a faster and more reliable framework. In these works, the comparative performances of different smart devices are shown [41–45].
6.3 OUR APPROACH
fMRI is widely used to predict the current activity in the human brain. With the assistance of machine learning tools, the functional state of the brain can be classified into two or more states by analyzing the fMRI data observed over a single time interval [28]. Therefore, in this chapter we have trained different machine learning models on our fMRI dataset to enhance the precision of classification.
6.3.1 DATASET
We used preprocessed fMRI data as the input [1], in which all the voxel activities were recorded over various experiments consisting of sets of trials.
6.3.2 METHODOLOGY
A series of trials was carried out on six subjects. In half of the trials, subjects were shown a sentence followed by a picture, while in the rest of the trials the picture was shown first, followed by the sentence. The former is named the SP dataset and the latter the PS dataset. The flow diagram of the proposed method is depicted in Fig. 6.1A, and a sample fMRI image is shown in Fig. 6.1B. In both cases, the first stimulus (either the picture or the sentence) was shown for 4 s, followed by a blank screen for 4 s and then the second stimulus for 4 s. At the end, the subject had to press the mouse button to specify whether the sentence correctly matched the picture. Images of the brain were recorded every 0.5 s. Each trial had about 270,000 voxels. The data was divided into 25 ROA, or Regions of Activation. These data were used to assess the feasibility of our SVM and LR classifiers.
FIG. 6.1 (A) Block diagram of our proposed method. (B) fMRI image.
6.3.3 RESULT EVALUATION
A simple evaluation method is the train/test split, where the dataset is divided into a training set and a test set; the learning model is trained on the training data and its performance is measured on the test data. In a more sophisticated approach, the entire dataset is used to both train and test a given model.
For this study, we used k-fold cross validation to evaluate our learning models. In this method, the dataset is divided into k folds and training is repeated k times; in each repetition, one fold is used for testing and the rest are used for training the model. We calculate the mean of these k performance values and present it in the results.
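The k-fold procedure described above can be sketched in a few lines of Python. This is an illustrative sketch, not the evaluation code of the experiments: the function names and the `evaluate` callback are hypothetical, and the folds are contiguous, so real data whose samples are ordered by class would need to be shuffled or stratified first.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)    # the first `extra` folds get one more sample
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def cross_validate(n_samples, k, evaluate):
    """Average the score returned by evaluate(train_idx, test_idx) over k folds."""
    scores = [evaluate(train, test) for train, test in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # With 80 samples and k = 10, every repetition trains on 72 and tests on 8 samples.
    for train, test in k_fold_indices(80, 10):
        print(len(train), len(test))
```

Here `evaluate` stands for any routine that trains a classifier on the training indices and returns its accuracy on the test indices.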
6.3.4 EXPERIMENTAL RESULTS
The adopted classifiers assigned the data to two cognitive states, and the experiments rested on two assumptions: that there is sufficient information to classify the cognitive states, and that the machine learning approach can effectively learn the spatial-temporal patterns needed to classify the states. The sequence of images belonging to each trial in PS and SP, as depicted in Fig. 6.2, can be written as follows:
$$PS: \; I^{PS}_{p1}, I^{PS}_{p2}, \ldots, I^{PS}_{p16}, I^{PS}_{s1}, I^{PS}_{s2}, \ldots, I^{PS}_{s16} \tag{6.9}$$
$$SP: \; I^{SP}_{s1}, I^{SP}_{s2}, \ldots, I^{SP}_{s16}, I^{SP}_{p1}, I^{SP}_{p2}, \ldots, I^{SP}_{p16} \tag{6.10}$$
To generate the data for class P, the picture images of both the PS and SP trials had to be grouped; class S was formed likewise from the sentence images. The entire sample for each class was 45. The resulting 2D matrix comprised 80 rows and 16 columns, where each column represented a different snapshot. The subjects differed mainly in their number of voxels. In Table 6.1, the first row lists the subjects and the second row lists the corresponding number of voxels for each subject.
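The grouping of PS and SP trials into class P and class S can be sketched as follows. The sketch assumes, following Eqs. (6.9) and (6.10), that each trial supplies 32 images of which the first 16 belong to the first stimulus; the function names, the 20 + 20 trial counts, and the two-voxel toy images are hypothetical.

```python
def split_trial(images, order):
    """Split the 32 images of one trial into (picture_images, sentence_images)."""
    first, second = images[:16], images[16:32]
    if order == "PS":                     # picture first, then sentence (Eq. 6.9)
        return first, second
    if order == "SP":                     # sentence first, then picture (Eq. 6.10)
        return second, first
    raise ValueError("order must be 'PS' or 'SP'")


def flatten(images):
    """Concatenate the voxels of 16 snapshots into one feature vector."""
    return [value for image in images for value in image]


def build_dataset(trials):
    """Turn (order, images) trials into rows X and labels y ('P' or 'S')."""
    X, y = [], []
    for order, images in trials:
        picture, sentence = split_trial(images, order)
        X.append(flatten(picture))
        y.append("P")
        X.append(flatten(sentence))
        y.append("S")
    return X, y


if __name__ == "__main__":
    # Toy data: 20 PS and 20 SP trials, 32 images per trial, 2 voxels per image.
    toy_image = [0.0, 1.0]
    trials = [(order, [toy_image] * 32) for order in ["PS"] * 20 + ["SP"] * 20]
    X, y = build_dataset(trials)
    print(len(X), y.count("P"), y.count("S"), len(X[0]))
```

Each row then holds voxels × 16 features, which matches the feature counts reported per subject.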
[Figure: for Trials 1 to 40, Class S groups the sentence images $I^{SP}_{s1}, \ldots, I^{SP}_{s16}$ and $I^{PS}_{s1}, \ldots, I^{PS}_{s16}$, and Class P groups the picture images $I^{SP}_{p1}, \ldots, I^{SP}_{p16}$ and $I^{PS}_{p1}, \ldots, I^{PS}_{p16}$.]
FIG. 6.2 Overall statistics for a specified subject matter.
Table 6.1 The Number of Voxels in Each Subject
Subject          04799   04820   04847   05675   05680   05710
No. of voxels    4949    5015    4698    5135    5062    4634
In the first experiment, the images from PS and SP were pooled into a single PS + SP dataset. In the second experiment, PS and SP were analyzed separately and the classification error was lower (0.50).
6.3.5 SUBJECT-DEPENDENT EXPERIMENTS ON PS + SP
We had 80 samples (40 samples per class), of which 72 samples were used for training and 8 samples were used for testing in each repetition. In this chapter, we applied four feature selection methods.
6.3.5.1 All features
Table 6.2 shows the number of Class P and Class S samples, the number of voxels, and the resulting number of features for each of the six subjects. In this case, no feature selection was applied and all features (voxels) were used to construct the feature vector. Each subject had a different number of voxels. Table 6.3 shows the classification accuracies of the machine learning approaches, with and without standardization of the data. On average, accuracy was higher for the support vector machine than for logistic regression on both standardized and nonstandardized data. Accuracy increased when standardization was applied to the data, except for some subjects.
Table 6.2 Information About Number of Samples and Features
Subject   Samples, Class P   Samples, Class S   No. of Voxels   No. of Snapshots   No. of Features (Voxels × Snapshots)
04799     45                 40                 4949            18                 79,184
04820     45                 40                 5015            18                 80,240
04847     45                 40                 4698            18                 75,168
05675     45                 40                 5135            18                 82,160
05680     45                 40                 5062            18                 80,992
05710     45                 40                 4634            18                 74,144
Table 6.3 Classification Accuracies in Percentage (LR, logistic regression; SVM, support vector machine)
Subject   LR, Without Standardization   LR, With Standardization   SVM, Without Standardization   SVM, With Standardization
04799     62                            64                         60                             61
04820     62                            65                         68                             65
04847     80                            88                         91                             86
05675     68                            69                         69                             71
05680     69                            75                         76                             74
05710     78                            76                         76                             76
Average   70                            72                         73                             74
Table 6.4 Information About Number of Samples and Features
Subject   Samples, Class P   Samples, Class S   No. of Voxels   No. of Snapshots   No. of Features (Voxels × Snapshots)
04799     45                 45                 1874            18                 29,985
04820     45                 45                 1888            18                 30,208
04847     45                 45                 1713            18                 27,408
05675     45                 45                 2239            18                 35,824
05680     45                 45                 2230            18                 35,680
05710     45                 45                 1883            18                 30,128
6.3.5.2 ROI-based feature
Only the voxels belonging to the seven selected ROIs were used; Table 6.4 shows the Class P and Class S samples and the number of features extracted for each subject. Table 6.5 presents the accuracies achieved when ROI-based features were used. ROI-based features gave improved results compared to the reference experiment in which all the voxels were used. Standardization increased performance in LR, but in SVM performance decreased slightly.
Table 6.5 Classification Accuracies in Percentage (LR, logistic regression; SVM, support vector machine)
Subject   LR, Without Standardization   LR, With Standardization   SVM, Without Standardization   SVM, With Standardization
04799     64                            65                         68                             66
04820     68                            68                         68                             71
04847     85                            95                         96                             95
05675     71                            76                         78                             76
05680     71                            79                         85                             80
05710     75                            81                         84                             82
Average   72                            77                         80                             78
6.3.5.3 Average ROI-based feature
In this experiment, the average of each of the seven ROIs was considered as a super voxel feature. Table 6.6 presents the performance of the Average ROI-based feature for logistic regression (LR) and support vector machine (SVM). In most settings, the average accuracies were lower than those of the ROI-based features (Table 6.5); we therefore conclude that averaging-based feature selection discarded valuable information. With standardization of the data, the average accuracy was the same for LR but decreased for SVM.
Table 6.6 Classification Accuracies in Percentage (LR, logistic regression; SVM, support vector machine)
Subject   LR, Without Standardization   LR, With Standardization   SVM, Without Standardization   SVM, With Standardization
04799     70                            65                         69                             64
04820     61                            66                         65                             60
04847     91                            94                         91                             89
05675     76                            76                         76                             74
05680     64                            59                         64                             57
05710     82                            81                         85                             82
Average   74                            74                         75                             71
Table 6.7 Classification Accuracies in Percentage (LR, logistic regression; SVM, support vector machine)
Subject   LR, Without Standardization   LR, With Standardization   SVM, Without Standardization   SVM, With Standardization
04799     88                            94                         94                             94
04820     94                            97                         97                             97
04847     93                            94                         94                             94
05675     90                            99                         99                             99
05680     84                            99                         96                             97
05710     91                            96                         94                             94
Average   90                            97                         96                             96
6.3.5.4 N-most active-based feature
In this experiment, the N most active voxels were used in order to reduce the number of features in the feature vector. Table 6.7 presents the resulting performance. It can be seen that the accuracy increased compared to the previous experiments. Data standardization increased the average accuracy for LR, while for SVM the average remained the same.
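A minimal sketch of N-most active voxel selection is given below. The chapter does not state how voxel activity is measured, so the mean absolute activation over all samples is an assumption, as are the function names and the toy data.

```python
def n_most_active(X, n):
    """Return the column indices of the n most active voxels.

    Activity is taken to be the mean absolute activation over all samples
    (an assumption; the chapter does not define the activity measure).
    """
    n_voxels = len(X[0])
    activity = [sum(abs(row[j]) for row in X) / len(X) for j in range(n_voxels)]
    ranked = sorted(range(n_voxels), key=lambda j: activity[j], reverse=True)
    return sorted(ranked[:n])             # keep the original voxel order


def select_columns(X, keep):
    """Reduce every sample to the selected voxels."""
    return [[row[j] for j in keep] for row in X]


if __name__ == "__main__":
    X = [[0.1, -2.0, 0.5], [0.2, 3.0, -0.4]]   # toy data: 2 samples, 3 voxels
    keep = n_most_active(X, 2)
    print(keep)
    print(select_columns(X, keep))
```

To avoid an optimistic bias, the selection would normally be computed on the training folds only and then applied unchanged to the test fold.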
6.3.5.5 N-most active ROI-based feature
This method is similar to the N-most active method above, but here the N most active voxels were taken uniformly from the seven ROIs. The performances are presented in Table 6.8. The accuracies were lower than with the N-most active feature but higher than with the other methods. In LR, the performance increased with standardization.
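Taking the most active voxels uniformly from each ROI can be sketched as below. The per-voxel activity scores are assumed to be given (for example, a mean absolute activation), and the ROI labels, function name, and toy values are hypothetical.

```python
from collections import defaultdict


def n_most_active_per_roi(activity, roi_of_voxel, n_per_roi):
    """Return the indices of the n_per_roi most active voxels of every ROI.

    activity     -- one activity score per voxel (how it is measured is an assumption)
    roi_of_voxel -- ROI label of each voxel, same length as activity
    """
    voxels_by_roi = defaultdict(list)
    for index, roi in enumerate(roi_of_voxel):
        voxels_by_roi[roi].append(index)
    keep = []
    for indices in voxels_by_roi.values():
        ranked = sorted(indices, key=lambda j: activity[j], reverse=True)
        keep.extend(ranked[:n_per_roi])   # the same number of voxels from each ROI
    return sorted(keep)


if __name__ == "__main__":
    activity = [0.9, 0.1, 0.5, 0.4, 0.8, 0.2]   # toy activity scores
    rois = ["A", "A", "A", "B", "B", "B"]       # hypothetical ROI labels
    print(n_most_active_per_roi(activity, rois, 2))
```

Compared with a global N-most active selection, this guarantees that every ROI contributes features, at the cost of possibly discarding highly active voxels from a dominant ROI.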
Table 6.8 Classification Accuracies in Percentage

| Subject | LR, Without Standardization | LR, With Standardization | SVM, Without Standardization | SVM, With Standardization |
|---|---|---|---|---|
| 04799 | 81 | 93 | 93 | 90 |
| 04820 | 82 | 89 | 85 | 84 |
| 04847 | 89 | 94 | 95 | 94 |
| 05675 | 86 | 96 | 97 | 96 |
| 05680 | 78 | 94 | 89 | 90 |
| 05710 | 81 | 88 | 88 | 85 |
| Average | 83 | 92 | 91 | 90 |
6.3.6 SUBJECT-DEPENDENT EXPERIMENT ON PS/SP
The PS and SP data were separated and the experiment was run on each set independently. The organization of the data is illustrated in Figs. 6.3 and 6.4 for the PS and SP datasets, respectively. The average accuracy for every subject was obtained with 10-fold cross-validation. Table 6.9 lists the number of samples and features for each subject, which were used to evaluate the feature selection schemes. Table 6.10 shows the classification accuracies for every subject, before and after standardization of the data, for both LR and SVM. Standardization improved the LR accuracy markedly (on average from 71% to 89% for PS and from 93% to 98% for SP, see Table 6.16), whereas the SVM accuracy was essentially unchanged. Without standardization, SVM performed better than LR.
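The evaluation protocol of standardization followed by 10-fold cross-validation can be sketched as below. Several points are assumptions rather than details given in the chapter: the z-score statistics are computed on the training fold only, the folds are formed by a random permutation, and the classifier is passed in as a callable. The `nearest_mean` rule is only a stand-in so that the sketch runs without extra libraries; it is not the LR or SVM used in the experiments.

```python
import numpy as np

def standardize(train, test):
    """Z-score each feature using the mean and std of the training fold only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled
    return (train - mean) / std, (test - mean) / std

def cross_validate(X, y, fit_predict, k=10, seed=0):
    """Mean accuracy over k folds; fit_predict(X_tr, y_tr, X_te) returns labels."""
    order = np.random.default_rng(seed).permutation(len(y))
    scores = []
    for fold in np.array_split(order, k):
        train = np.setdiff1d(order, fold)          # all samples not in the test fold
        X_tr, X_te = standardize(X[train], X[fold])
        predicted = fit_predict(X_tr, y[train], X_te)
        scores.append(np.mean(predicted == y[fold]))
    return float(np.mean(scores))

def nearest_mean(X_tr, y_tr, X_te):
    """Stand-in classifier: assign each test sample to the closest class mean."""
    classes = np.unique(y_tr)
    centers = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    dists = ((X_te[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]

rng = np.random.default_rng(1)
y = np.repeat([0, 1], 20)                          # toy data: 20 trials per class
X = rng.normal(size=(40, 30)) + y[:, None] * 3.0   # two well-separated classes
accuracy = cross_validate(X, y, nearest_mean)
print(accuracy)
```

Fitting the scaling on the training fold alone keeps information from the held-out trials out of the model, which matters when the number of trials per subject is as small as it is here.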
FIG. 6.3 PS dataset used in this experiment. [Figure: Class P comprises Trials 1 to 20, each with images I^PS_p1, I^PS_p2, ..., I^PS_p16; Class S comprises Trials 1 to 20, each with images I^PS_s1, I^PS_s2, ..., I^PS_s16.]
FIG. 6.4 SP dataset used in this experiment. [Figure: Class S comprises Trials 1 to 20, each with images I^SP_s1, I^SP_s2, ..., I^SP_s16; Class P comprises Trials 1 to 20, each with images I^SP_p1, I^SP_p2, ..., I^SP_p16.]
Table 6.9 Information for Number of Samples and Features

| Subject | No. of Samples, Class P (in both SP and PS) | No. of Samples, Class S (in both SP and PS) | No. of Voxels | Number of Snapshots | Voxels*Snapshots |
|---|---|---|---|---|---|
| 04799 | 25 | 25 | 4949 | 18 | 78,985 |
| 04820 | 25 | 25 | 5015 | 18 | 80,240 |
| 04847 | 25 | 25 | 4698 | 18 | 75,168 |
| 05675 | 25 | 25 | 5135 | 18 | 82,160 |
| 05680 | 25 | 25 | 5062 | 18 | 80,992 |
| 05710 | 25 | 25 | 4634 | 18 | 74,144 |
6.3.6.1 ROI-based feature
In this case, only the voxels of the seven ROIs were selected to reduce the number of features, as given in Table 6.11. Table 6.12 shows the average classification accuracies. The SVM was better than NN, and the accuracies were higher than in the PS + SP experiment. Data standardization improved performance in LR, but there was almost no improvement in SVM. ROI features do not produce a steady improvement in the PS/SP experiments, but feature scaling showed slight improvements (as shown in Table 6.12).
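Restricting the feature vector to the voxels of selected ROIs amounts to a boolean mask over the voxel axis. The sketch below assumes a per-voxel ROI id array (`voxel_roi`) and uses integer ids as placeholders for the seven expert-specified ROIs; the function name and toy sizes are hypothetical.

```python
import numpy as np

def select_roi_voxels(X, voxel_roi, rois_of_interest):
    """Keep only voxels whose ROI id is in rois_of_interest, then flatten.

    X         : array of shape (trials, voxels, snapshots)
    voxel_roi : array of shape (voxels,), ROI id of every voxel
    Returns features of shape (trials, kept_voxels * snapshots) and the mask.
    """
    mask = np.isin(voxel_roi, rois_of_interest)
    return X[:, mask, :].reshape(X.shape[0], -1), mask

X = np.arange(4 * 6 * 3, dtype=float).reshape(4, 6, 3)  # toy data: 4 trials, 6 voxels, 3 snapshots
voxel_roi = np.array([0, 1, 2, 1, 3, 0])                # placeholder ROI ids
features, mask = select_roi_voxels(X, voxel_roi, rois_of_interest=[1, 3])
print(features.shape)  # (4, 9)
```

The number of features is the number of retained voxels multiplied by the number of snapshots, which is how the last column of a table such as Table 6.11 is formed.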
6.3.6.2 Average ROI-based feature
The average of each ROI was used here to define the supervoxel features, so the total number of features is 7 × 17 = 119. The results are given in Table 6.13.
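The supervoxel construction can be sketched as follows: the voxels of each ROI are averaged at every snapshot, giving one time course per ROI. This is a minimal sketch under assumptions, with a hypothetical function name and toy data; only the ROI count (7) and the snapshot count (17) are taken from the text, so that the result has 7 × 17 = 119 features.

```python
import numpy as np

def average_roi_features(X, roi_labels):
    """Average the voxels within each ROI to get one supervoxel per ROI.

    X          : array of shape (trials, voxels, snapshots)
    roi_labels : array of shape (voxels,), ROI id of every voxel
    Returns an array of shape (trials, n_rois * snapshots).
    """
    rois = np.unique(roi_labels)
    # one (trials, snapshots) time course per ROI, stacked along a new ROI axis
    supervoxels = np.stack(
        [X[:, roi_labels == r, :].mean(axis=1) for r in rois], axis=1
    )
    return supervoxels.reshape(X.shape[0], -1)

rng = np.random.default_rng(0)
n_trials, n_voxels, n_snapshots, n_rois = 40, 70, 17, 7
X = rng.normal(size=(n_trials, n_voxels, n_snapshots))
roi_labels = np.repeat(np.arange(n_rois), n_voxels // n_rois)  # toy layout: 10 voxels per ROI
features = average_roi_features(X, roi_labels)
print(features.shape)  # (40, 119)
```

Averaging shrinks thousands of voxel features to a few hundred at most, at the cost of discarding the spatial pattern inside each ROI.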
Table 6.10 Classification Accuracies in Percentage

| Subject | PS, LR, Without Standardization | PS, LR, With Standardization | PS, SVM, Without Standardization | PS, SVM, With Standardization | SP, LR, Without Standardization | SP, LR, With Standardization | SP, SVM, Without Standardization | SP, SVM, With Standardization |
|---|---|---|---|---|---|---|---|---|
| 04799 | 50 | 72 | 72 | 68 | 80 | 93 | 80 | 82 |
| 04820 | 70 | 93 | 93 | 93 | 88 | 100 | 97 | 95 |
| 04847 | 82 | 93 | 90 | 88 | 97 | 100 | 100 | 97 |
| 05675 | 78 | 95 | 93 | 93 | 95 | 95 | 93 | 95 |
| 05680 | 75 | 82 | 80 | 80 | 100 | 100 | 100 | 100 |
| 05710 | 70 | 100 | 100 | 100 | 97 | 100 | 100 | 100 |
| Average | 71 | 89 | 88 | 87 | 93 | 98 | 95 | 95 |
Table 6.11 Number of Samples and Features

| Subject | No. of Samples, Class P (in both SP and PS) | No. of Samples, Class S (in both SP and PS) | No. of Voxels | No. of Snapshots | No. of Features (Voxels*Snapshots) |
|---|---|---|---|---|---|
| 04799 | 25 | 25 | 1874 | 18 | 29,984 |
| 04820 | 25 | 25 | 1888 | 18 | 30,208 |
| 04847 | 25 | 25 | 1713 | 18 | 27,408 |
| 05675 | 25 | 25 | 2239 | 18 | 35,824 |
| 05680 | 25 | 25 | 2230 | 18 | 35,680 |
| 05710 | 25 | 25 | 1883 | 18 | 30,128 |
6.3.6.3 N-most active-based feature
In this experiment, the N most active voxels were chosen as in the PS + SP experiment, giving a feature size of 700 × 18 = 12,600. Table 6.14 presents the performance of this experiment. The accuracy was the best among the experiments so far. Feature standardization improved accuracy in LR, but it remained the same in SVM.
6.3.6.4 Most active ROI-based feature
Here, the most active voxels inside each ROI were used to reduce the dimensionality of the feature vector (Table 6.15). For the PS dataset, LR was more accurate than the SVM once the data were standardized (96% versus 95% on average). In the SP experiment, the SVM was more accurate than LR without standardization (94% versus 93% on average).
6.4 RESULT ANALYSIS
6.4.1 SUMMARY OF THE SUBJECT-DEPENDENT RESULTS
The best results in the above experiments were obtained with the N-most active voxels, and the results on the SP data were better than those on the PS data. Table 6.16 shows that data standardization improved accuracy in most cases, although in some cases it slightly decreased the performance. For SVM, performance after standardization was almost the same or slightly lower.
6.4.2 SUBJECT-INDEPENDENT EXPERIMENT
In this experiment (Table 6.17), we considered only the seven ROIs that were specified by experts. For the ROI average, we averaged all the voxels within each of the seven ROIs, so the feature size was 7 × 16 = 112. For the active ROI average, we averaged the 100 most active voxels within each of the seven ROIs, so the feature size was again 7 × 16 = 112. Table 6.18 shows that the subject-independent performance was lower than the subject-dependent performance, as expected. Because feature values vary across subjects, data standardization is very important, and accuracy improved in most cases after it was applied.
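One way to handle feature values that vary across subjects is to z-score each subject's samples separately before pooling them. Whether the standardization in this chapter was applied per subject or over the pooled data is not stated, so the sketch below is an assumption, and the function name and synthetic data are hypothetical. The sizes follow Table 6.17 (480 samples, 112 features).

```python
import numpy as np

def standardize_per_subject(X, subject_ids):
    """Z-score every feature within each subject so subjects share a common scale."""
    X_std = np.empty_like(X, dtype=float)
    for s in np.unique(subject_ids):
        rows = subject_ids == s
        mean = X[rows].mean(axis=0)
        std = X[rows].std(axis=0)
        std[std == 0] = 1.0  # leave constant features unscaled
        X_std[rows] = (X[rows] - mean) / std
    return X_std

rng = np.random.default_rng(0)
subject_ids = np.repeat(np.arange(6), 80)            # six subjects, 80 samples each
offsets = rng.normal(scale=5.0, size=(6, 112))       # synthetic subject-specific baselines
X = rng.normal(size=(480, 112)) + offsets[subject_ids]
X_std = standardize_per_subject(X, subject_ids)
print(X_std.shape)  # (480, 112)
```

After this step every subject's features have zero mean and unit variance, so a classifier trained on some subjects is not thrown off by a different baseline in another subject.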
Table 6.12 Classification Accuracies in Percentage

| Subject | PS, LR, Without Standardization | PS, LR, With Standardization | PS, SVM, Without Standardization | PS, SVM, With Standardization | SP, LR, Without Standardization | SP, LR, With Standardization | SP, SVM, Without Standardization | SP, SVM, With Standardization |
|---|---|---|---|---|---|---|---|---|
| 04799 | 55 | 65 | 65 | 60 | 68 | 80 | 78 | 80 |
| 04820 | 72 | 88 | 90 | 90 | 93 | 95 | 95 | 95 |
| 04847 | 88 | 90 | 95 | 90 | 100 | 100 | 100 | 100 |
| 05675 | 75 | 93 | 93 | 93 | 93 | 95 | 95 | 95 |
| 05680 | 75 | 85 | 78 | 82 | 100 | 100 | 100 | 100 |
| 05710 | 75 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| Average | 73 | 87 | 87 | 86 | 92 | 95 | 95 | 95 |
Table 6.13 Classification Accuracies in Percentage

| Subject | PS, LR, Without Standardization | PS, LR, With Standardization | PS, SVM, Without Standardization | PS, SVM, With Standardization | SP, LR, Without Standardization | SP, LR, With Standardization | SP, SVM, Without Standardization | SP, SVM, With Standardization |
|---|---|---|---|---|---|---|---|---|
| 04799 | 62 | 68 | 68 | 70 | 90 | 90 | 85 | 82 |
| 04820 | 72 | 85 | 78 | 80 | 90 | 90 | 88 | 85 |
| 04847 | 82 | 82 | 88 | 82 | 100 | 100 | 100 | 100 |
| 05675 | 82 | 90 | 90 | 93 | 90 | 93 | 90 | 90 |
| 05680 | 72 | 68 | 62 | 65 | 97 | 100 | 100 | 100 |
| 05710 | 82 | 95 | 93 | 93 | 100 | 100 | 100 | 100 |
| Average | 75 | 81 | 80 | 81 | 95 | 96 | 94 | 93 |
Table 6.14 Classification Accuracies in Percentage

| Subject | PS, LR, Without Standardization | PS, LR, With Standardization | PS, SVM, Without Standardization | PS, SVM, With Standardization | SP, LR, Without Standardization | SP, LR, With Standardization | SP, SVM, Without Standardization | SP, SVM, With Standardization |
|---|---|---|---|---|---|---|---|---|
| 04799 | 80 | 93 | 90 | 93 | 93 | 93 | 95 | 95 |
| 04820 | 95 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| 04847 | 95 | 100 | 97 | 97 | 97 | 100 | 97 | 97 |
| 05675 | 95 | 97 | 97 | 97 | 100 | 100 | 100 | 100 |
| 05680 | 93 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| 05710 | 80 | 100 | 97 | 97 | 97 | 100 | 97 | 97 |
| Average | 90 | 98 | 97 | 97 | 98 | 99 | 98 | 98 |
Table 6.15 Classification Accuracies in Percentage

| Subject | PS, LR, Without Standardization | PS, LR, With Standardization | PS, SVM, Without Standardization | PS, SVM, With Standardization | SP, LR, Without Standardization | SP, LR, With Standardization | SP, SVM, Without Standardization | SP, SVM, With Standardization |
|---|---|---|---|---|---|---|---|---|
| 04799 | 75 | 93 | 93 | 93 | 90 | 93 | 93 | 90 |
| 04820 | 93 | 97 | 97 | 97 | 85 | 82 | 90 | 78 |
| 04847 | 95 | 97 | 97 | 97 | 95 | 95 | 93 | 95 |
| 05675 | 93 | 97 | 95 | 93 | 93 | 97 | 95 | 97 |
| 05680 | 88 | 95 | 93 | 95 | 97 | 100 | 97 | 97 |
| 05710 | 80 | 97 | 95 | 95 | 95 | 97 | 95 | 95 |
| Average | 87 | 96 | 95 | 95 | 93 | 94 | 94 | 92 |
Table 6.16 Summary of the Average Accuracies in Percentage

| Experiment | Standardization | PS + SP, LR | PS + SP, SVM | PS, LR | PS, SVM | SP, LR | SP, SVM |
|---|---|---|---|---|---|---|---|
| All voxels | Without | 70 | 73 | 71 | 88 | 93 | 95 |
| All voxels | With | 73 | 72 | 89 | 87 | 98 | 95 |
| ROI-based features | Without | 72 | 80 | 73 | 87 | 92 | 95 |
| ROI-based features | With | 77 | 78 | 87 | 86 | 95 | 95 |
| Average ROI | Without | 74 | 75 | 75 | 80 | 95 | 94 |
| Average ROI | With | 74 | 71 | 81 | 81 | 96 | 93 |
| N-most active | Without | 90 | 96 | 90 | 97 | 98 | 98 |
| N-most active | With | 97 | 96 | 98 | 97 | 99 | 98 |
| N-most active ROI | Without | 83 | 91 | 87 | 95 | 93 | 94 |
| N-most active ROI | With | 92 | 90 | 96 | 95 | 94 | 92 |
Table 6.17 Information About Number of Samples and Features

| Subject | No. of Samples, Class S | No. of Samples, Class P | No. of Voxels | No. of Snapshots | No. of Features (Voxels × Snapshots) |
|---|---|---|---|---|---|
| All | 240 | 240 | 7 | 16 | 112 |
Table 6.18 Classification Accuracies in Percentage

| Feature | Standardization | PS + SP, LR | PS + SP, SVM | PS, LR | PS, SVM | SP, LR | SP, SVM |
|---|---|---|---|---|---|---|---|
| Average ROI | Without | 71 | 68 | 78 | 73 | 90 | 88 |
| Average ROI | With | 77 | 76 | 79 | 74 | 92 | 92 |
| Active average ROI | Without | 74 | 73 | 85 | 81 | 77 | 73 |
| Active average ROI | With | 73 | 70 | 87 | 86 | 88 | 88 |
6.5 CONCLUSION AND FUTURE WORK
In this work, we compared the performance of different classifiers on human brain fMRI data. The experiments showed that LR generally achieved better scores than the SVM once the data were standardized. The prospects of other machine learning schemes need to be studied for a more accurate analysis. In the future, we will implement Fuzzy k-SVM to achieve better performance on the StarPlus fMRI dataset.
REFERENCES
[1] M.D. Fox, M.E. Raichle, Spontaneous fluctuations in brain activity observed with functional magnetic resonance imaging, Nat. Rev. Neurosci. 8 (9) (2007) 700.
[2] J.D. Van Horn, et al., The functional magnetic resonance imaging data center (fMRIDC): the challenges and rewards of large-scale databasing of neuroimaging studies, Philos. Trans. R. Soc. Lond. B Biol. Sci. 356 (1412) (2001) 1323–1339.
[3] S. Parida, S. Dehuri, S.-B. Cho, Machine learning approaches for cognitive state classification and brain activity prediction: a survey, Curr. Bioinforma. 10 (4) (2015) 344–359.
[4] L. Zhang, et al., Machine learning for clinical diagnosis from functional magnetic resonance imaging, in: Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference, Vol. 1, IEEE, 2005.
[5] A. L'heureux, et al., Machine learning with big data: challenges and approaches, IEEE Access 5 (2017) 7776–7797.
[6] M. Misaki, et al., Comparison of multivariate classifiers and response normalizations for pattern-information fMRI, NeuroImage 53 (1) (2010) 103–118.
[7] D. Saidulu, R. Sasikala, Machine learning and statistical approaches for big data: issues, challenges and research directions, Int. J. Appl. Eng. Res. 12 (21) (2017) 11691–11699.
[8] A. Belle, et al., Big data analytics in healthcare, Biomed. Res. Int. 2015 (2015) 1–16.
[9] D.M. Vock, et al., Adapting machine learning techniques to censored time-to-event health record data: a general-purpose approach using inverse probability of censoring weighting, J. Biomed. Inform. 61 (2016) 119–131.
[10] D. Ichikawa, T. Saito, H. Oyama, Impact of predicting health-guidance candidates using massive health check-up data: a data-driven analysis, Int. J. Med. Inform. 106 (2017) 32–36.
[11] T.V. Nguyen, B. Mirza, Dual-layer kernel extreme learning machine for action recognition, Neurocomputing 260 (2017) 123–130.
[12] D. Ichikawa, et al., How can machine-learning methods assist in virtual screening for hyperuricemia? A healthcare machine-learning approach, J. Biomed. Inform. 64 (2016) 20–24.
[13] T. Zheng, et al., A machine learning-based framework to identify type 2 diabetes through electronic health records, Int. J. Med. Inform. 97 (2017) 120–127.
[14] M. Maniruzzaman, et al., Comparative approaches for classification of diabetes mellitus data: machine learning paradigm, Comput. Methods Prog. Biomed. 152 (2017) 23–34.
[15] F. Mercaldo, V. Nardone, A. Santone, Diabetes mellitus affected patients classification and diagnosis through machine learning techniques, Procedia Comput. Sci. 112 (2017) 2519–2528.
[16] I. Kavakiotis, et al., Machine learning and data mining methods in diabetes research, Comput. Struct. Biotechnol. J. 15 (2017) 104–116.
[17] T. Watanabe, et al., Disease prediction based on functional connectomes using a scalable and spatially-informed support vector machine, NeuroImage 96 (2014) 183–202.
[18] O.Y. Al-Jarrah, et al., Efficient machine learning for big data: a review, Big Data Res. 2 (3) (2015) 87–93.
[19] J.-E. Bibault, P. Giraud, A. Burgun, Big data and machine learning in radiation oncology: state of the art and future prospects, Cancer Lett. 382 (1) (2016) 110–117.
[20] L.R. Nair, S.D. Shetty, S.D. Shetty, Applying spark based machine learning model on streaming big data for health status prediction, Comput. Electr. Eng. 65 (2018) 393–399.
[21] L. Zhou, et al., Machine learning on big data: opportunities and challenges, Neurocomputing 237 (2017) 350–361.
[22] Á.B. Hernández, et al., Using machine learning to optimize parallelism in big data applications, Futur. Gener. Comput. Syst. 86 (2018) 1076–1092.
[23] M. Behroozi, M.R. Daliri, H. Boyaci, Statistical analysis methods for the fMRI data, Basic Clin. Neurosci. 2 (4) (2011) 67–74.
[24] G.-R. Wu, et al., A blind deconvolution approach to recover effective connectivity brain networks from resting state fMRI data, Med. Image Anal. 17 (3) (2013) 365–374.
[25] A. Lasek-Bal, J. Kidon, M. Blaszczyszyn, B. Stasiów, A. Zak, BOLD fMRI signal in stroke patients and its importance for prognosis in the subacute disease period - preliminary report, Neurol. Neurochir. Pol. 52 (3) (2018) 341–346.
[26] P.M. Matthews, A. Hampshire, Clinical concepts emerging from fMRI functional connectomics, Neuron 91 (3) (2016) 511–528.
[27] S.M. Kazan, et al., Vascular autorescaling of fMRI (VasA fMRI) improves sensitivity of population studies: a pilot study, NeuroImage 124 (2016) 794–805.
[28] G. Varoquaux, et al., A group model for stable multi-subject ICA on fMRI datasets, NeuroImage 51 (1) (2010) 288–299.
[29] V. Michel, et al., A supervised clustering approach for fMRI-based inference of brain states, Pattern Recogn. 45 (6) (2012) 2041–2049.
[30] A. Abraham, et al., Machine learning for neuroimaging with scikit-learn, Front. Neuroinform. 8 (2014) 14.
[31] N.V. Hartvig, J.L. Jensen, Spatial mixture modeling of fMRI data, Hum. Brain Mapp. 11 (4) (2000) 233–248.
[32] E.M.R. Lake, P. Bazzigaluppi, B. Stefanovic, Functional magnetic resonance imaging in chronic ischaemic stroke, Philos. Trans. R. Soc. B 371 (1705) (2016) 20150353.
[33] A.E. Hassanien, N. Dey, S. Borra (Eds.), Medical Big Data and Internet of Medical Things: Advances, Challenges and Applications, Taylor & Francis, 2019.
[34] N. Dey, C. Bhatt, A.S. Ashour, Big Data for Remote Sensing: Visualization, Analysis and Interpretation, Springer, 2018.
[35] N. Dey, et al. (Eds.), Internet of Things and Big Data Analytics Toward Next-Generation Intelligence, Springer International Publishing, 2018.
[36] Y. Bhatt, C. Bhatt, Internet of things in healthcare, in: Internet of Things and Big Data Technologies for Next Generation Health Care, Springer, Cham, 2017, pp. 13–33.
[37] M.S. Kamal, et al., De-Bruijn graph with map reduce framework towards metagenomic data classification, Int. J. Inf. Technol. 9 (1) (2017) 59–75.
[38] R.K. Barik, et al., GeoFog4Health: a fog-based SDI framework for geospatial health big data analysis, J. Ambient. Intell. Humaniz. Comput. 9 (48) (2018) 1–17.
[39] R. Barik, et al., Fog2fog: Augmenting scalability in fog computing for health GIS systems, in: Proceedings of the Second IEEE/ACM International Conference on Connected Health: Applications, Systems and Engineering Technologies, IEEE Press, 2017.
[40] R.K. Barik, H. Dubey, K. Mankodiya, Soa-fog: Secure service-oriented edge computing architecture for smart health big data analytics, in: Signal and Information Processing (GlobalSIP), 2017 IEEE Global Conference, IEEE, 2017.
[41] H. Das, B. Naik, H.S. Behera, Classification of diabetes mellitus disease (DMD): A data mining (DM) approach, in: Progress in Computing, Analytics and Networking, Springer, Singapore, 2018, pp. 539–549.
[42] R. Sahani, et al., Classification of intrusion detection using data mining techniques, in: Progress in Computing, Analytics and Networking, Springer, Singapore, 2018, pp. 753–764.
[43] H. Das, et al., A novel PSO based back propagation learning-MLP (PSO-BP-MLP) for classification, in: Computational Intelligence in Data Mining-Volume 2, Springer, New Delhi, 2015, pp. 461–471.
[44] C. Pradhan, et al., Handbook of Research on Information Security in Biomedical Signal Processing, IGI Publishing, 2018.
[45] K.H.K. Reddy, H. Das, D.S. Roy, 18 a data aware scheme, in: Networks of the Future: Architectures, Technologies, and Implementations, Taylor & Francis Group, 2017, p. 377.
FURTHER READING
M. Gheisari, G. Wang, M.Z. Alam Bhuiyan, A survey on deep learning in big data, in: Computational Science and Engineering (CSE) and Embedded and Ubiquitous Computing (EUC), 2017 IEEE International Conference, Vol. 2, IEEE, 2017.
A. Hadioui, Y. Benjelloun Touimi, S. Bennani, Machine learning based on big data extraction of massive educational knowledge, Int. J. Emerg. Technol. Learn. 12 (11) (2017) 151–167.