An MVPA method based on sparse representation for pattern localization in fMRI data analysis


Accepted Manuscript

An MVPA Method Based on Sparse Representation for Pattern Localization in fMRI Data Analysis
Fangyi Wang, Yuanqing Li, Zhenghui Gu

PII: S0925-2312(17)30997-9
DOI: 10.1016/j.neucom.2016.12.099
Reference: NEUCOM 18527

To appear in: Neurocomputing

Received date: 13 September 2016
Revised date: 15 December 2016
Accepted date: 17 December 2016

Please cite this article as: Fangyi Wang, Yuanqing Li, Zhenghui Gu, An MVPA Method Based on Sparse Representation for Pattern Localization in fMRI Data Analysis, Neurocomputing (2017), doi: 10.1016/j.neucom.2016.12.099

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT

Highlights

• A new MVPA method for fMRI data analysis.
• The ability to detect subtle differences between experimental conditions.
• We localized two category-specific brain activation patterns corresponding to two experimental conditions.
• The two sets of selected features contained as many informative features as possible.
• Wrongly selected features (noise) were removed by permutation tests.

An MVPA Method Based on Sparse Representation for Pattern Localization in fMRI Data Analysis

Fangyi Wang (a,b), Yuanqing Li (a,b,∗), Zhenghui Gu (a,b)

(a) Center for Brain Computer Interfaces and Brain Information Processing, South China University of Technology, Guangzhou, 510640, China
(b) Guangzhou Key Laboratory of Brain Computer Interaction and Applications, Guangzhou 510640, China

Abstract

The multivariate pattern analysis (MVPA) approach applied to neuroimaging data, such as functional magnetic resonance imaging (fMRI) data, has received a great deal of attention because of its sensitivity in distinguishing patterns of neural activity associated with different stimuli or cognitive states. Generally, when an MVPA approach is used to decode mental states or stimuli, a set of discriminative variables (e.g., voxels) is selected first. However, in most existing MVPA methods, the selected variables do not include all informative variables, because a subset of them is already sufficient for decoding. In this paper, we propose a multivariate pattern analysis method based on sparse representation that simultaneously decodes brain states and localizes category-specific brain activation areas corresponding to two experimental conditions/tasks. Unlike traditional MVPA approaches, this method is designed to find as many informative variables as possible. We applied the proposed method to two judgement experiments, a gender discrimination task and an emotion discrimination task; the data analysis results demonstrate its effectiveness and potential applications.

Keywords: Sparse representation, localizing, decoding, fMRI

∗ Corresponding author.
Email addresses: [email protected] (Fangyi Wang), [email protected] (Yuanqing Li), [email protected] (Zhenghui Gu)

Preprint submitted to Neurocomputing

June 1, 2017

1. Introduction

At any given moment, our brains are accessing a vast amount of information about the surrounding environment. How the brain processes this flood of information within local and global networks is a fundamental question in neuroscience. Functional magnetic resonance imaging (fMRI) has become one of the most popular tools for imaging brain function [1]. However, fMRI yields very complex, high-dimensional data sets containing up to hundreds of thousands of voxels. Traditionally, these data have been analyzed with a mass-univariate general linear model (GLM) approach that reveals task-related brain areas by treating each voxel separately [2]. One limitation of the GLM approach is that it does not consider the interrelationships among voxels in spatially distributed brain areas, because it works on isolated voxels and ignores the joint information among them.

In recent years, multivariate pattern analysis (MVPA) approaches have shown promise for the analysis of fMRI data, owing to their ability to localize spatial patterns of activity that differ across experimental conditions/tasks [3]. Such spatial patterns are generally too weak to be detected by the GLM [4, 5, 6, 7, 8]. Applications of MVPA in fMRI data analysis have developed rapidly, including stimulus reconstruction [9, 10], attention [11, 12, 13], decision making [14] and concept representation [15, 16, 17]. Current MVPA approaches take three common forms: region of interest (ROI) based MVPA [8, 18, 14], whole-brain MVPA [19, 20], and local multivariate search approaches (e.g., searchlight) [14, 21]. In MVPA, the last few years have witnessed a flurry of research on algorithms and theory for feature selection and estimation involving sparse representation, because of its ability to handle high-dimensional data with few samples and to discover sparse spatial activity patterns, thus enhancing interpretability. Several methods, including the Lasso [22], sparse logistic regression [23], the elastic net [24] and sparse NMF [25], have been used for this purpose. An alternative approach to feature selection is to use linear or nonlinear dimensionality reduction methods, such as PCA [26] and LLE [27].

Although MVPA approaches have yielded remarkable insights into the types of stimulus attributes that might be represented in distributed spatial activity patterns, they are inherently limited in their ability to characterize the underlying feature space. Because MVPA approaches select informative voxels/features according to their predictive power, a subset of the informative voxels may already be sufficient for decoding, and the remaining, redundant information is then useless for decoding; this redundant information is nevertheless important for localizing category-specific brain activation areas [23]. In this paper, we propose a new MVPA method for fMRI data analysis. The proposed method combines a forward feature selection scheme with sparse regularization and permutation testing for feature selection in multivariate pattern classification settings, and we illustrate its application using an fMRI data set. The remainder of this paper is organized as follows: Section 2 describes the experimental setting and the details of the proposed method, Section 3 reports and analyzes the experimental results, and Section 4 concludes the paper.

2. Materials and methods

2.1. Participants

Twelve healthy native Chinese males (aged 21 to 48 years, with normal or corrected-to-normal vision and normal hearing) participated in this study. All subjects provided written informed consent prior to the experiment. The experimental protocol was approved by the Ethics Committee of Guangdong General Hospital, China.

2.2. Experimental stimuli and procedure

We selected 80 movie clips of faces with audio from internet sources. After image processing (Windows Movie Maker), each edited movie clip was in gray scale, lasted 1400 ms and subtended 10.7° × 8.7°. Semantically, these 80 movie clips could be partitioned orthogonally into two groups based on either gender (40 male vs. 40 female Chinese faces) or facial emotion (40 crying vs. 40 laughing faces). The luminance levels of the videos were matched by adjusting the total power value of each video. Similarly, the audio power levels were matched by adjusting the total power value of each audio clip. During the experiment, stimulus presentation and response recording were controlled with E-Prime software. The visual stimuli were projected onto a screen using an LCD projector (SA-9900 fMRI Stimulation System, Shenzhen Sinorad Medical Electronics, Inc.), and the subjects viewed the visual stimuli through a mirror mounted on the head coil. The auditory stimuli were delivered through a pneumatic headset (SA-9900 fMRI Stimulation System, Shenzhen Sinorad Medical Electronics, Inc.).

Each subject completed two runs, one for gender discrimination and the other for emotion discrimination. Each run contained 10 blocks, and each block contained 8 trials. For 10-fold cross-validation, the 80 trials of each run were partitioned into 10 non-overlapping subsets, each corresponding to one of the 10 blocks. In the kth fold, the test data consisted of the kth block of each run, and the remaining blocks formed the training data. During each trial, which lasted 10 seconds or 5 volumes (TR = 2 s), the subjects were asked to focus their attention on either the gender or the emotion of the faces in the movie clips and to recognize whether each face was male/female or crying/laughing. At the beginning of each trial, a stimulus was presented to the subject for 1400 ms, followed by a 600-ms blank period. This 2-s (one TR) cycle with the same stimulus was repeated 4 times to effectively elicit a brain activity pattern and was followed by a 6-s blank period. The mean responses of the third, fourth and fifth volumes of each trial were used, whereas the other volumes were discarded because of the delay of the BOLD response. More details are described in a previous study [21].
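The sample construction and fold assignment just described can be sketched in pure Python. This is a minimal illustration, not the authors' code: it assumes each run is stored as a time-ordered list of preprocessed volumes (each a list of voxel values), that the 80 trials are contiguous with 5 volumes each, and that consecutive groups of 8 trials form a block; `trial_samples` and `block_folds` are hypothetical names.

```python
def trial_samples(volumes, n_trials=80, vols_per_trial=5, keep=(2, 3, 4)):
    """One sample per trial: voxelwise mean of the 3rd-5th volumes (0-based 2-4)."""
    samples = []
    for t in range(n_trials):
        start = t * vols_per_trial
        kept = [volumes[start + i] for i in keep]
        # zip(*kept) groups the kept values of each voxel together.
        samples.append([sum(v) / len(kept) for v in zip(*kept)])
    return samples


def block_folds(n_trials=80, trials_per_block=8):
    """Yield (k, train_idx, test_idx): fold k tests on block k and trains on the rest."""
    n_blocks = n_trials // trials_per_block
    for k in range(n_blocks):
        test = list(range(k * trials_per_block, (k + 1) * trials_per_block))
        train = [i for i in range(n_trials) if i // trials_per_block != k]
        yield k, train, test
```

Applying `block_folds` to each of the two runs and concatenating the index lists gives, for each fold, a test set of 16 trials (one block from each run) and a training set of the remaining 144 trials.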

2.3. fMRI data collection and preprocessing

Functional images were collected using a 3 Tesla GE Signa Excite HD MR scanner at Guangdong General Hospital, China. A 3D anatomical T1-weighted scan (FOV: 280 mm; matrix: 256 × 256; 128 slices; slice thickness: 1.8 mm) was acquired before the functional scan for each subject. During the experiment, gradient-echo echo-planar (EPI) T2*-weighted images (25 slices acquired in ascending non-interleaved order; TR = 2000 ms, TE = 35 ms, flip angle = 70°; FOV: 280 mm, matrix: 64 × 64, slice thickness: 5.5 mm, no gap) were acquired, covering the entire brain. Preprocessing was performed with the SPM8 software package¹. The first five volumes were discarded because the MRI signals were unsteady. The preprocessing procedure included head motion correction, slice timing correction, co-registration between the functional scans and the structural scan, normalization to an MNI standard brain, data masking to exclude non-brain voxels, time series detrending, and normalization of the time series of each run to zero mean and unit variance, using custom functions in Matlab 2012a (The MathWorks, Inc., Natick, MA). To reduce the computational burden and remove noise, the data were filtered by the voxelwise correlation between each voxel's time series and the stimulus function: the 6000 voxels with the highest absolute correlation were selected for subsequent processing.

2.4. Feature selection and decoding

Feature selection is an important problem in machine learning, pattern recognition and statistics. fMRI studies suffer from the curse of dimensionality: the features are extremely high-dimensional, while the number of samples is small. Ideally, therefore, a small subset of features should be chosen to maximize the model's prediction accuracy. The feature selection ability of sparse representation can be used to select a subset of relevant features in the signal and, at the same time, to separate this subset into two sets corresponding to the two class labels according to the signs of the sparse representation weights [19, 23, 28]. Sparse regularization is very important for MVPA because feature selection allows for functional localization of cognitive processes, with sparser feature selection providing more concise localization [29]. The sparse representation of the signal can be described by the following equation:

$$y = Aw, \qquad (1)$$

where $y \in \mathbb{R}^{M}$ is the label vector, whose entries are 1 (indicating one class) or −1 (indicating the other class). The data matrix is $A \in \mathbb{R}^{M \times N}$, where $M$ and $N$ are the numbers of samples and features, respectively, and $w \in \mathbb{R}^{N}$ is the weight vector. The solution of the optimization problem

$$\min_{w} \|w\|_0 \quad \text{s.t.} \quad y = Aw, \qquad (2)$$

i.e., the solution with the minimum 0-norm of $w$, is the sparsest solution of equation (1). Here, we use a greedy algorithm, Orthogonal Matching Pursuit (OMP) [30], to solve this problem; it has the advantages of being computationally efficient and easy to implement [31].

¹ http://www.fil.ion.ucl.ac.uk/spm/

The overall scheme for sparse-representation-based feature selection and decoding is shown in Fig. 1 and consists of the following steps.

Step 1: After data partitioning, K-fold cross-validation is performed (K = 10). In each fold k, we obtain two sets of informative features, $\mathrm{IND}_k^{+}$ and $\mathrm{IND}_k^{-}$, corresponding to the gender and emotion recognition tasks, respectively. By taking the union across folds, we obtain two sets of informative features, $\mathrm{IND}^{+}$ and $\mathrm{IND}^{-}$ (see Fig. 1(A)), at the individual subject level.

Step 2: Each fold of the cross-validation contains $n_0$ iterations. As an example, Fig. 1(B) illustrates the kth fold. In the nth iteration ($n = 1, \ldots, n_0$), two sets of informative features, $\mathrm{Ind}_{+}^{(n)}$ and $\mathrm{Ind}_{-}^{(n)}$, are obtained. The selected sets of this fold, $\mathrm{IND}_k^{+}$ and $\mathrm{IND}_k^{-}$, collect the features selected over the $n_0$ iterations.
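The fold-level bookkeeping of Steps 1 and 2, together with the sign-based selection and removal rule detailed in Step 3, can be sketched in pure Python as follows. This is an illustrative sketch under stated assumptions: `fit_weights` is a hypothetical placeholder for any sparse solver that returns one weight per remaining feature (for instance an OMP routine), the SVM accuracies $r^{(n+1)}$ and $G_n$ are omitted, and the default of 15 features per sign follows the Remark in Step 3, while the number of iterations `n_iter` ($n_0$) must be supplied.

```python
def select_by_sign(w, active, n_per_sign):
    """Return (ind_plus, ind_minus): original indices of the largest / smallest weights."""
    n_per_sign = min(n_per_sign, len(w) // 2)          # keep the two sets disjoint
    order = sorted(range(len(w)), key=lambda j: w[j])  # ascending weights
    ind_minus = [active[j] for j in order[:n_per_sign]]        # most negative
    ind_plus = [active[j] for j in order[::-1][:n_per_sign]]   # most positive
    return ind_plus, ind_minus


def fold_selection(A, y, fit_weights, n_iter, n_per_sign=15):
    """Steps 2-3 for one fold: fit, select by sign, remove, repeat n_iter (n_0) times."""
    active = list(range(len(A[0])))  # original indices of the remaining features
    ind_plus_k, ind_minus_k = [], []
    for _ in range(n_iter):
        if len(active) < 2:
            break
        sub = [[row[j] for j in active] for row in A]  # data restricted to remaining features
        w = fit_weights(sub, y)
        plus, minus = select_by_sign(w, active, n_per_sign)
        ind_plus_k += plus
        ind_minus_k += minus
        removed = set(plus) | set(minus)
        active = [j for j in active if j not in removed]
    return ind_plus_k, ind_minus_k


def union_across_folds(per_fold_sets):
    """Step 1: subject-level IND^+ (or IND^-) as the union of the per-fold sets."""
    out = set()
    for s in per_fold_sets:
        out.update(s)
    return out
```

Removing the selected features before refitting is what lets later iterations pick up informative features that a single sparse fit would leave out as redundant.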

Step 3: In the nth iteration of this fold (see Fig. 1(B)), we first perform sparse representation on the training data to obtain a weight vector $w^{(n)}$. Second, we determine the two sets of informative features $\mathrm{Ind}_{+}^{(n)}$ and $\mathrm{Ind}_{-}^{(n)}$ using this weight vector. Specifically, $\mathrm{Ind}_{+}^{(n)}$ contains the $N_0$ features corresponding to the largest elements (generally positive; gender recognition task in this paper) of $w^{(n)}$, while $\mathrm{Ind}_{-}^{(n)}$ contains the $N_0$ features corresponding to the smallest elements (generally negative; emotion recognition task in this paper) of $w^{(n)}$ [19, 23]. Third, we remove the features in $\mathrm{Ind}_{+}^{(n)}$ and $\mathrm{Ind}_{-}^{(n)}$ from the data set, which yields an updated data set with the remaining features for the next iteration. Finally, we perform another 9-fold cross-validation on the updated training data set using a support vector machine (SVM); the resulting prediction accuracy is denoted $r^{(n+1)}$, and $r^{(1)}$ is obtained from the initial training data set. Meanwhile, we also decode the test data of this fold using all the features selected so far: the prediction model is trained on the training data of this fold restricted to the same features, and the prediction accuracy is denoted $G_n$. To assess the statistical significance of the decoding accuracy, we also employed a nonparametric permutation test [32]. The null hypothesis is that the relationship between the data and the labels cannot be learned reliably by the family of classifiers used in the training step. In the permutation test, we randomly permuted the class labels of the training data (using all the selected features) 10000 times and calculated the corresponding pseudo decoding accuracies.

Figure 1: Scheme of feature selection by the sparse representation method and decoding of individual subject data. This algorithm contains K folds of cross-validation (A), with the iteration steps (including $n_0$ iterations) of the kth fold listed in (B) as an example.

Remark: In this feature selection scheme, the number $N_0$ of features with the largest positive/smallest negative weights selected in each iteration was set to 15 according to previous studies [14, 28].

2.5. Localization

After feature selection, the 10-fold cross-validation yields two sets of selected features for each subject, $\mathrm{IND}^{+}$ and $\mathrm{IND}^{-}$, corresponding to the gender and emotion discrimination tasks, respectively. However, some of the selected features may represent noise. To remove these features, we perform a permutation test on the sets $\mathrm{IND}^{+}$ and $\mathrm{IND}^{-}$ as follows.

Step 1 (probability maps): For each subject, two probability maps were constructed from the two sets of features selected across all K folds of cross-validation. For example, for $\mathrm{IND}^{+}$ we score each feature by its selection frequency, i.e., by counting how many times it was selected across the folds: features that are repeatedly selected across the training sets of different folds are likely to be important and should therefore receive high values. A feature that does not appear in $\mathrm{IND}^{+}$ receives a frequency of zero. This yields a probability map for the gender recognition task; similarly, a probability map for the emotion recognition task is obtained from $\mathrm{IND}^{-}$. Finally, we averaged these probability maps across all subjects to obtain two probability maps at the group level.

Step 2 (permutation): By randomly permuting the class labels 300 times and repeating the above feature selection procedure for each permutation, we obtained 300 pairs of probability maps. Based on these probability distributions, the null hypothesis can be tested at the voxel level.

Step 3 (multiple comparison correction): For multiple comparison correction, a null distribution for each class was constructed by pooling all probability values of the 300 average probability maps of that class obtained through the 300 permutations. The p value of each voxel is calculated as the proportion of values in the null distribution that are greater than or equal to the value obtained with the real (i.e., non-permuted) labels. A critical threshold was determined by a false discovery rate (FDR) < 0.05 for each class, and we removed the features whose p values were greater than the critical threshold [19, 32]. The remaining features are the informative ones that localize the spatial activity pattern associated with the corresponding label (i.e., class).

[Figure 2: decoding accuracy (%) versus iterations; curves MACG and MACr.]
Figure 2: Iterative decoding accuracy curves. MACr and MACG are abbreviations for the mean accuracy curves of $r^{(n+1)}$ and $G_n$ at the group level, respectively.
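The three localization steps above can be sketched in pure Python. This is a minimal sketch under stated assumptions: the per-fold counts are divided by K to obtain a "probability" (the text only says frequencies are counted), the p value follows the pooled-null definition given in Step 3, and the FDR threshold is computed with the Benjamini–Hochberg step-up procedure, which is our assumption because the text states only FDR < 0.05. All function names are hypothetical.

```python
import bisect


def probability_map(fold_sets, n_features):
    """Step 1: fraction of the K folds in which each feature was selected (0 if never)."""
    counts = [0] * n_features
    for selected in fold_sets:
        for j in set(selected):
            counts[j] += 1
    return [c / len(fold_sets) for c in counts]


def average_maps(maps):
    """Group-level map: voxelwise mean of the subject-level maps."""
    return [sum(values) / len(maps) for values in zip(*maps)]


def pooled_p_values(real_map, null_maps):
    """Step 3: p_j = share of all pooled null values that are >= the real value of feature j."""
    pooled = sorted(v for m in null_maps for v in m)
    total = len(pooled)
    # bisect_left counts the pooled values strictly below x.
    return [(total - bisect.bisect_left(pooled, x)) / total for x in real_map]


def fdr_keep(p_values, q=0.05):
    """Benjamini-Hochberg step-up: indices of the features that survive FDR < q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda j: p_values[j])
    k = 0
    for rank, j in enumerate(order, start=1):
        if p_values[j] <= rank * q / m:
            k = rank  # largest rank meeting the BH condition
    return sorted(order[:k])
```

In use, `null_maps` would hold the 300 group-averaged maps of one class from the label permutations, and `fdr_keep(pooled_p_values(real_map, null_maps))` returns the voxels that are retained for that class.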

3. Results

3.1. Decoding accuracy

Each discrimination task (gender or emotion) contained 10 blocks, and each block contained 8 trials. For each subject, we applied a 10-fold cross-validation scheme in which the data set of each discrimination task was divided into 10 equal subsets by blocks. The test data of each fold included one block of the gender discrimination task and one block of the emotion discrimination task; the remaining data of the two tasks were used as training data for this fold. Each fold yielded two decoding results, $r^{(n+1)}$ and $G_n$, corresponding to classification based on the training data and on the test data, respectively (see Materials and Methods). Increasing n (the number of iterations) increases the number of features used to calculate $G_n$. By contrast, the number of features used to calculate $r^{(n+1)}$ decreases, because more and more informative features are removed as the iterations proceed. The two decoding accuracy curves, averaged across folds and subjects, are shown in Fig. 2, where MACr and MACG denote the group-level mean accuracy curves of $r^{(n+1)}$ and $G_n$, respectively. After 15 iterations, the mean accuracy curve of $G_n$ reaches 90% and remains stable thereafter, because most of the later-selected features are informative but highly correlated with earlier ones, which matches our expectation. Meanwhile, the removal of informative features from the training data leads to a decline in the mean accuracy curve of $r^{(n+1)}$.

[Figure 3: permutation count versus accuracy (%).]
Figure 3: The distribution of the permutation test (10000 repetitions). The vertical red line indicates the real decoding accuracy without permutation.
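The label-permutation test of Section 2.4, whose null distribution is plotted in Fig. 3, can be sketched as follows. Because the SVM pipeline is not reproduced here, a nearest-centroid classifier stands in for it; this substitution, the add-one p value convention and the function names are our assumptions, made only to keep the sketch dependency-free.

```python
import random


def nearest_centroid_accuracy(train_X, train_y, test_X, test_y):
    """Stand-in classifier (NOT the paper's SVM): predict the class with the closest mean."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, lab in zip(train_X, train_y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x):
        return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))

    hits = sum(predict(x) == lab for x, lab in zip(test_X, test_y))
    return hits / len(test_y)


def permutation_test(train_X, train_y, test_X, test_y, n_perm=10000, seed=0):
    """Return (real accuracy, p value) obtained by randomly permuting the training labels."""
    rng = random.Random(seed)
    real = nearest_centroid_accuracy(train_X, train_y, test_X, test_y)
    labels = list(train_y)
    n_ge = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break the data-label relationship
        if nearest_centroid_accuracy(train_X, labels, test_X, test_y) >= real:
            n_ge += 1
    # Counting the real labelling as one permutation keeps p strictly above zero.
    return real, (n_ge + 1) / (n_perm + 1)
```

With 10000 permutations this convention cannot give a p value below 1/10001, so a result reported as < 0.0001 corresponds to no permuted accuracy reaching the real one.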

With the decoding accuracy as the test statistic, the permutation distribution is shown in Fig. 3. As Fig. 3 demonstrates, the classifier learned the relationship between the data and the labels with a probability of being wrong of less than 0.0001.

3.2. Localization of informative features

Using the two sets of selected features, we performed the permutation test with p < 0.05 (FDR-corrected) and a cluster size of 15 voxels to construct the corresponding activation maps (see Materials and Methods). The distribution of the informative patterns (clusters) is shown in Fig. 4. The two informative patterns share some common brain areas, such as the left cuneus, the left lingual gyrus and the bilateral inferior occipital gyrus. Meanwhile, the non-overlapping parts of the patterns are task-specific, which means that the task-specific patterns were separated successfully. For instance, these include the left precentral gyrus, left middle frontal gyrus and bilateral postcentral gyrus for the emotion discrimination task, and the left insula, right hippocampus and right thalamus for the emotion discrimination task.

[Figure 4: axial slices from z = −60 to z = 60 in steps of 5.]
Figure 4: Voxels selected by our method with p < 0.05 (FDR-corrected) and a cluster size of 15 voxels. The blue clusters correspond to the gender discrimination task, and the red clusters correspond to the emotion discrimination task.
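The cluster-size criterion applied to the FDR-surviving voxels in Fig. 4 can be sketched as a connected-component filter. We assume voxels are given as integer (x, y, z) grid coordinates and that two voxels are neighbours when they touch by a face, edge or corner (26-connectivity); the text gives only the 15-voxel minimum, so the connectivity rule and the function names are assumptions.

```python
from collections import deque
from itertools import product

# 26-connectivity (assumed): every offset in {-1, 0, 1}^3 except the voxel itself.
NEIGHBOURS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def clusters(voxels):
    """Group a set of (x, y, z) voxel coordinates into connected clusters (BFS)."""
    remaining = set(voxels)
    out = []
    while remaining:
        seed = remaining.pop()
        cluster, queue = [seed], deque([seed])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in NEIGHBOURS:
                nb = (x + dx, y + dy, z + dz)
                if nb in remaining:
                    remaining.remove(nb)
                    cluster.append(nb)
                    queue.append(nb)
        out.append(cluster)
    return out


def cluster_extent_filter(voxels, min_size=15):
    """Keep only the voxels that belong to clusters of at least min_size voxels."""
    return {v for c in clusters(voxels) if len(c) >= min_size for v in c}
```

Applied separately to the surviving voxels of each class, this discards isolated small clusters while leaving larger contiguous regions intact.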

4. Conclusions

In fMRI data analysis, the number of voxels (hundreds of thousands) is much larger than the number of samples, which can result in overfitting. To address this issue, the number of features needs to be substantially reduced, and informative features have to be selected carefully so that the classification task can be performed efficiently. In this paper, we proposed an MVPA method based on sparse representation that simultaneously decodes brain states and localizes task-specific brain activation areas. Experimental results on two discrimination tasks confirmed that the method can find two sets of informative features corresponding to the two semantic categories (gender and emotion) and decode the two tasks with significantly high accuracy.

Acknowledgements

This work was supported by the National Key Basic Research Program of China (973 Program) under Grant 2015CB351703, the National Natural Science Foundation of China under Grants 61633010, 91420302 and 61573150, and the Guangdong Natural Science Foundation under Grant 2014A030312005.

References

[1] R. A. Poldrack, J. A. Mumford, T. E. Nichols, Handbook of Functional MRI Data Analysis, Cambridge University Press, 2011.
[2] K. J. Friston, A. P. Holmes, J. Poline, P. Grasby, S. Williams, R. S. Frackowiak, R. Turner, Analysis of fMRI time-series revisited, Neuroimage 2 (1) (1995) 45–53.
[3] N. Kriegeskorte, Pattern-information analysis: from stimulus decoding to computational-model testing, Neuroimage 56 (2) (2011) 411–421.
[4] L. Reddy, N. Tsuchiya, T. Serre, Reading the mind's eye: decoding category information during mental imagery, Neuroimage 50 (2) (2010) 818–825.
[5] F. Pereira, T. Mitchell, M. Botvinick, Machine learning classifiers and fMRI: a tutorial overview, Neuroimage 45 (2009) S199–S209.
[6] A. J. O'Toole, F. Jiang, H. Abdi, N. Penard, J. P. Dunlop, M. A. Parent, Theoretical, statistical, and practical perspectives on pattern-based classification approaches to the analysis of functional neuroimaging data, J Cogn Neurosci 19 (11) (2007) 1735–1752.
[7] Y. Kamitani, F. Tong, Decoding the visual and subjective contents of the human brain, Nat Neurosci 8 (5) (2005) 679–685.
[8] J. V. Haxby, M. I. Gobbini, M. L. Furey, A. Ishai, J. L. Schouten, P. Pietrini, Distributed and overlapping representations of faces and objects in ventral temporal cortex, Science 293 (5539) (2001) 2425–2430.
[9] K. N. Kay, T. Naselaris, R. J. Prenger, J. L. Gallant, Identifying natural images from human brain activity, Nature 452 (7185) (2008) 352–355.
[10] Y. Miyawaki, H. Uchida, O. Yamashita, M. A. Sato, Y. Morito, H. C. Tanabe, N. Sadato, Y. Kamitani, Visual image reconstruction from human brain activity using a combination of multiscale local image decoders, Neuron 60 (5) (2008) 915–929.
[11] J. A. Lewis-Peacock, B. R. Postle, Decoding the internal focus of attention, Neuropsychologia 50 (4) (2012) 470–478.
[12] Y. Erez, J. Duncan, Discrimination of visual categories based on behavioral relevance in widespread regions of frontoparietal cortex, The Journal of Neuroscience 35 (36) (2015) 12383–12393.
[13] J. C. Francken, E. L. Meijs, P. Hagoort, S. van Gaal, F. P. de Lange, Exploring the automaticity of language-perception interactions: Effects of attention and awareness, Scientific Reports 5.
[14] K. Jimura, R. A. Poldrack, Analyses of regional-average activation and multivoxel pattern information tell complementary stories, Neuropsychologia 50 (4) (2012) 544–552.
[15] S. M. Polyn, V. S. Natu, J. D. Cohen, K. A. Norman, Category-specific cortical activity precedes retrieval during memory search, Science 310 (5756) (2005) 1963–1966.
[16] M. A. Just, V. L. Cherkassky, S. Aryal, T. M. Mitchell, A neurosemantic theory of concrete noun representation based on the underlying brain codes, PLoS ONE 5 (1) (2010) e8622.
[17] S. V. Shinkareva, V. L. Malave, R. A. Mason, T. M. Mitchell, M. A. Just, Commonality of neural representations of words and pictures, Neuroimage 54 (3) (2011) 2418–2425.
[18] E. Formisano, F. De Martino, M. Bonte, R. Goebel, "Who" is saying "what"? Brain-based decoding of human voice and speech, Science 322 (5903) (2008) 970–973.
[19] J. Mourao-Miranda, A. L. Bokde, C. Born, H. Hampel, M. Stetter, Classifying brain states and determining the discriminating activation patterns: Support vector machine on functional MRI data, Neuroimage 28 (4) (2005) 980–995.
[20] Z. Wang, A. R. Childress, J. Wang, J. A. Detre, Support vector machine learning-based fMRI data group analysis, Neuroimage 36 (4) (2007) 1139–1151.
[21] Y. Q. Li, J. Y. Long, B. Huang, T. Y. Yu, W. Wu, Y. J. Liu, C. H. Liang, P. Sun, Crossmodal integration enhances neural representation of task-relevant features in audiovisual face perception, Cerebral Cortex 25 (2) (2015) 384–395.
[22] R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society. Series B (Methodological) (1996) 267–288.
[23] O. Yamashita, M. Sato, T. Yoshioka, F. Tong, Y. Kamitani, Sparse estimation automatically selects voxels relevant for the decoding of fMRI activity patterns, Neuroimage 42 (4) (2008) 1414–1429.
[24] H. Zou, T. Hastie, Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society Series B-Statistical Methodology 67 (2005) 301–320.
[25] H. Kim, H. Park, Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis, Bioinformatics 23 (12) (2007) 1495–1502.
[26] L. L. Zeng, H. Shen, L. Liu, D. Hu, Unsupervised classification of major depression using functional connectivity MRI, Human Brain Mapping 35 (4) (2014) 1630–1641.
[27] H. Shen, L. Wang, Y. Liu, D. Hu, Discriminative analysis of resting-state functional connectivity patterns of schizophrenia using low dimensional embedding of fMRI, Neuroimage 49 (4) (2010) 3110–3121.
[28] Y. Q. Li, J. Y. Long, L. He, H. D. Lu, Z. H. Gu, P. Sun, A sparse representation-based algorithm for pattern localization in brain imaging data analysis, PLoS ONE 7 (12) (2012) e50332.
[29] K. Kampa, S. Mehta, C. A. Chou, W. A. Chaovalitwongse, T. J. Grabowski, Sparse optimization in feature selection: application in neuroimaging, Journal of Global Optimization 59 (2-3) (2014) 439–457.
[30] T. Zhang, On the consistency of feature selection using greedy least squares regression, Journal of Machine Learning Research 10 (2009) 555–568.
[31] Y. Q. Li, Z. L. Yu, N. Bi, Y. Xu, Z. H. Gu, S. Amari, Sparse representation for brain signal processing [a tutorial on methods and applications], IEEE Signal Processing Magazine 31 (3) (2014) 96–106.
[32] T. E. Nichols, A. P. Holmes, Nonparametric permutation tests for functional neuroimaging: A primer with examples, Human Brain Mapping 15 (1) (2002) 1–25.

Biography

Fangyi Wang received the M.S. degree in signal and information processing from Jiangxi Science and Technology Normal University, Nanchang, China, in 2012. He is currently working toward the Ph.D. degree in pattern recognition and intelligent systems at the South China University of Technology, Guangzhou, China. His current research interests include sparse representation, fMRI data analysis, pattern recognition and brain-computer interfaces.

Yuanqing Li was born in Hunan Province, China, in 1966. He received the B.S. degree in applied mathematics from Wuhan University, Wuhan, China, in 1988, the M.S. degree in applied mathematics from South China Normal University, Guangzhou, China, in 1994, and the Ph.D. degree in control theory and applications from South China University of Technology, Guangzhou, China, in 1997. Since 1997, he has been with South China University of Technology, where he became a full professor in 2004. From 2002 to 2004, he worked at the Laboratory for Advanced Brain Signal Processing, RIKEN Brain Science Institute, Saitama, Japan, as a researcher. From 2004 to 2008, he worked at the Laboratory for Neural Signal Processing, Institute for Infocomm Research, Singapore, as a research scientist. His research interests include blind signal processing, sparse representation, machine learning, brain-computer interfaces, and EEG and fMRI data analysis. He is the author or coauthor of more than 60 scientific papers in journals and conference proceedings.

Zhenghui Gu received the Ph.D. degree from Nanyang Technological University, Singapore, in 2003. From 2002 to 2008, she was with the Institute for Infocomm Research, Singapore. In 2008, she joined the College of Automation Science and Engineering, South China University of Technology, Guangzhou, as an associate professor. Her current research interests include brain signal processing and pattern recognition.