A multi-variate blind source separation algorithm

Computer Methods and Programs in Biomedicine 151 (2017) 91–99

M. Goldhacker c,b,∗, P. Keck c, A. Igel c, E.W. Lang c, A.M. Tomé a

a DETI-IEETA, Universidade de Aveiro, 3810-193 Aveiro, Portugal
b Experimental Psychology, University of Regensburg, 93040 Regensburg, Germany
c CIML, Biophysics, University of Regensburg, 93040 Regensburg, Germany

∗ Corresponding author. E-mail address: [email protected] (M. Goldhacker).

http://dx.doi.org/10.1016/j.cmpb.2017.08.019

Article history: Received 13 June 2016; Revised 6 July 2017; Accepted 21 August 2017

Keywords: Blind source separation; Independent component analysis; fMRI; Resting state; Retinotopy; Spatio-temporal

Abstract

Background and objective: The study follows the proposal of decomposing a given data matrix into a product of independent spatial and temporal component matrices. A multi-variate decomposition approach is presented, based on an approximate diagonalization of a set of matrices computed using a latent space representation.

Methods: The proposed methodology follows an algebraic approach, which is common to spatial, temporal or spatiotemporal blind source separation algorithms. More specifically, the algebraic approach relies on singular value decomposition techniques, which avoid computationally costly and numerically unstable matrix inversion. The method is equally applicable to correlation matrices determined from second-order correlations or by considering fourth-order correlations.

Results: The resulting algorithms are applied to fMRI data sets, either to extract the underlying fMRI components or to extract connectivity maps from resting-state fMRI data collected for a dynamic functional connectivity analysis. Intriguingly, our algorithm shows increased spatial specificity compared to common approaches, while temporal precision stays similar.

Conclusion: The study presents a novel spatiotemporal blind source separation algorithm, which is robust and avoids parameters that are difficult to fine-tune. Applied to experimental data sets, the new method yields highly confined and focused areas with the least spatial extent in the retinotopy case, and results similar to other blind source separation algorithms in the dynamic functional connectivity analyses. We therefore conclude that our novel algorithm is highly competitive and yields results which are superior, or at least similar, to those of existing approaches.

© 2017 Elsevier B.V. All rights reserved.

1. Introduction

Exploratory matrix factorization (EMF) techniques are unsupervised, data-driven approaches used to discover latent factors, also called sources or features, in the data [1–4]. Such techniques often provide alternative representations of large data sets such as, for example, functional imaging data. The transformed data often reveals underlying characteristics of the data sets under study. Hence these features are generally considered indicative of underlying processes or networks, and often serve classification purposes for discriminating different types of data. Functional magnetic resonance imaging (fMRI) experiments, in particular, produce sequences of images/volumes. Such data is organized into a data matrix X in which each row contains a volume, i.e., a complete 3D scan of the brain. The intensities I(x, y, z) of the voxels within the volume are concatenated into a row vector by associating the spatial voxel coordinates (x, y, z) with an index n. Each row of X, with a total of N = s1 × s2 × s3 voxels, thus represents one volume scan, or a region of interest (ROI) in the total volume, and is associated with a time index m. A session involves a total of M scans, so a session is represented by an M × N data matrix, where M is related to the time domain and N to the spatial domain.

If, instead of analyzing intensity distributions, we are interested in dynamic functional connectivity, the data matrix X is formed from Pearson correlation coefficients, which represent two-point correlations between time series resulting from preprocessing the fMRI data. To this end, the raw data is first decomposed into functional networks using a group-level spatial ICA. This pre-processing step yields L independent spatial maps and the related time courses. These time courses are then subdivided into short, partially overlapping segments, and L(L − 1)/2 correlation coefficients are computed within each time window. These correlations vary over time and thus represent what is called dynamic functional connectivity networks (dFCN). The rows of the data matrix are then formed by concatenating the correlation coefficients $\rho_{mm'}[n]$, $n = 1, \dots, N$, over $N = SP$ columns, i.e., S sessions with P time points each.

The goal of exploratory matrix factorization techniques is to approximate the original data matrix by the product of two matrices, which can be expressed as a sum of outer products of K column vectors $W_{*k}$ and row vectors $H_{k*}$ according to

$\hat{X} = WH = W_{*1}H_{1*} + W_{*2}H_{2*} + \dots + W_{*K}H_{K*}$   (1)

with

$W_{*k}H_{k*} = \begin{pmatrix} w_{1k} \\ \vdots \\ w_{Mk} \end{pmatrix} \begin{pmatrix} h_{k1} & \cdots & h_{kN} \end{pmatrix}$   (2)

The original matrix is thus approximated by K outer products of factors related to the information contained in the space spanned by the column vectors $W_{*k}$ and the space spanned by the row vectors $H_{k*}$. Moreover, each row of W and each column of H, respectively, form the coordinates of a new representation by which the information is encoded in a latent (hidden) space of dimension K. Note that in general no a priori knowledge about the factor matrices is available.
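To make Eqs. (1) and (2) concrete, the following minimal sketch builds $\hat{X}$ both as the matrix product WH and as the equivalent sum of K rank-one outer products. It is illustrative only: the factors are random, the variable names are ours, and NumPy is used here although the authors' implementation was in MATLAB.

    import numpy as np

    rng = np.random.default_rng(0)
    M, N, K = 140, 21515, 6            # retinotopy-sized example dimensions
    W = rng.standard_normal((M, K))    # columns W[:, k]: temporal factors
    H = rng.standard_normal((K, N))    # rows H[k, :]: spatial factors

    # Eq. (1): X_hat as a sum of K rank-one terms; Eq. (2): each outer product
    X_hat = sum(np.outer(W[:, k], H[k, :]) for k in range(K))
    assert np.allclose(X_hat, W @ H)   # identical to the plain matrix product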

The mathematical model described by Eq. (1) is also followed by the general linear model (GLM) widely used in fMRI data analysis. The main difference is that the matrix W, now called the design matrix, represents the experimental manipulations and conditions, which are assumed to be known. GLM therefore represents a semi-knowledge-based approach [5], where the unknown matrix H can be estimated by minimizing the Frobenius norm of the error matrix $(X - \hat{X})$, with $\hat{X}$ the estimate WH. In data-driven methods applied to fMRI data sets X, both factor matrices (W, H) are estimated given only the measured data [6]. To achieve this, assumptions like orthogonality (singular value decomposition (SVD) or principal component analysis (PCA)), statistical independence (independent component analysis (ICA) or blind source separation (BSS)), nonnegativity (non-negative matrix factorization (NMF)) and so on need to be applied.

Independent component analysis (ICA) has been widely applied to fMRI data by imposing the independence constraint either on the spatial domain (rows of H; sICA) or on the temporal domain (columns of W; tICA). Note that with sICA the matrix W is considered the mixing matrix, while with tICA this role falls to the matrix H. sICA is more common due to the large dimensionality of the spatial domain compared with the temporal domain. Moreover, when it comes to comparing ICs across a group of subjects, group ICA [7–9] has to be applied to get around the inherent permutation and scaling indeterminacies of ICA. The latter leads to a set of independent spatial maps, which are visually analyzed to identify artifacts like head movement or known structures in the brain (major blood vessels or the ventricles). Finally, the time courses corresponding to valid independent spatial components are selected to compute the dFCN data matrix.

2. Spatiotemporal decomposition

Early attempts to combine sICA and tICA can be found in [10], where sICA is used to select the regions of interest (ROIs), and tICA is then applied to the time series of those voxels to find temporal components. More general spatiotemporal approaches were suggested in [11] and [12]. In both works, a singular value decomposition (SVD) pre-processing step determines the dimension of the latent space. Then a new transformation matrix is computed using both time and space components of the SVD decomposition. In [11], the independent components in the new latent space are computed by optimizing a contrast function via gradient descent. In [12], an algebraic joint approximative diagonalization (JAD) approach is considered after computing a set of matrices derived from time and space components. More recently [13], a cascaded approach was proposed to compute independent components: after applying sICA, a tICA is applied to the mixing matrix W of the spatial mixing model. In this proposal, there is an additional stage to visually identify and remove artifact-related independent spatial components. Then, the corresponding time components are regressed out of the non-artifact-related time components of the mixing model. Notice that the final model approximating the original data is presented as a product of three matrices, originating from the two-step procedure.

2.1. Motivation

The motivation of this study is to combine the work of [11] with an algebraic approach as proposed in [12]. The transformation matrix estimated in the latent space is therefore an orthogonal matrix, and both temporal and spatial components undergo a similar rotation. Both works have shortcomings, which are corrected by our proposed combination. The algorithm in [11] employs many additional parameters, resulting in less robust behavior, whereas the approach in [12] is an exact analytical method. However, the work of [12] uses pseudoinverses, which introduce a computationally costly and random component into the algorithm. Furthermore, the pseudoinverses render the algorithm numerically unstable because of possible degeneracy of the matrix columns. These shortcomings are accounted for in our proposed method, which is based on the well-known and robust eigenvalue decomposition. The performance of the method is illustrated with fMRI data sets to extract relevant information at different stages of the processing chain: a processing and a post-processing step. The algorithm is nevertheless generally suitable for spatiotemporal data sets irrespective of their origin and may have applications in many fields of science.

2.2. Singular value decomposition

Singular value decomposition (SVD) of the original data matrix X is the most widely used factorization technique and serves as a pre-processing step in most decomposition techniques. The matrix decomposition reads

$X = U \Sigma V^T$   (3)

If N ≫ M, the maximal number of non-zero singular values is M, and those form the entries of the M × M diagonal matrix $\Sigma$. The related eigenvector matrices, i.e. the M × M matrix U and the N × M matrix V, have M orthogonal columns. The SVD may be re-written using the factorization model described in Eq. (1) by using only the K < M largest singular values and the corresponding eigenvectors. This leads to a low-rank approximation of the original data. The Frobenius norm of the approximation error is related to the discarded singular values

$\|X - \hat{X}\|_F = \sqrt{\sum_{i=K+1}^{M} \Sigma_{ii}^2}$   (4)

when the diagonal entries of $\Sigma$ are assumed to be arranged in decreasing order of magnitude. Now consider the decomposition given in Eq. (1):

• If a spatial decomposition (sBSS) is intended, then we identify $W = U_K \Sigma_K$ and $H = V_K^T$. Note that here H is related to the underlying sources while W forms part of the mixing matrix of the model.

• If, however, one is interested in a temporal decomposition (tBSS), then one has instead $W = U_K$ and $H = \Sigma_K V_K^T$. Now W corresponds to the underlying sources and H forms part of the mixing matrix.

• If, finally, we consider a spatiotemporal decomposition (stBSS), then it is most convenient to identify $W = U_K \Sigma_K^{1/2}$ and $H = \Sigma_K^{1/2} V_K^T$. Both components then have the same energy, which equals the singular values of the original data matrix.
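These identifications are easy to verify numerically. The sketch below (NumPy, illustrative only; all names are ours) computes an SVD, checks the truncation error of Eq. (4), and forms the equal-energy stBSS components:

    import numpy as np

    rng = np.random.default_rng(1)
    M, N, K = 100, 5000, 6
    X = rng.standard_normal((M, N))

    U, s, Vt = np.linalg.svd(X, full_matrices=False)   # X = U diag(s) V^T

    # rank-K truncation and the approximation error of Eq. (4)
    X_hat = U[:, :K] @ np.diag(s[:K]) @ Vt[:K, :]
    assert np.isclose(np.linalg.norm(X - X_hat, 'fro'),
                      np.sqrt(np.sum(s[K:] ** 2)))

    # stBSS identification: W = U_K Sigma_K^{1/2}, H = Sigma_K^{1/2} V_K^T
    S_t = np.sqrt(s[:K])[:, None] * U[:, :K].T    # K x M temporal components
    S_s = np.sqrt(s[:K])[:, None] * Vt[:K, :]     # K x N spatial components

    # per-component energies agree and equal the singular values
    assert np.allclose((S_t ** 2).sum(axis=1), s[:K])
    assert np.allclose((S_s ** 2).sum(axis=1), s[:K])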

Note that the transformation described in the first two cases is called whitening in most independent component analysis (ICA) algorithms and yields source-related components with an energy normalized to unity. So far, only decorrelated components have been obtained; the next step provides the independent components by applying a further transformation. In summary, the spatial and temporal components in the latent space of dimension K are written as $S_s = H$ and $S_t = W^T$, respectively, where $S_s$ is a K × N matrix and $S_t$ is a K × M matrix.

2.3. Independent components

As the goal is to have all components of the decomposition statistically independent, an additional rotation matrix needs to be determined. Several options are available to achieve this goal. One of the most widely used is an algebraic method called joint approximative diagonalization (JAD). After computing a set of real-valued K × K matrices ($C_l$, $l = 1, \dots, D$), JAD aims at estimating a single K × K matrix A, called the joint diagonalizer, which satisfies the condition

$\sum_l \mathrm{off}(A^T C_l A) \approx 0$

where the off-operator selects the off-diagonal elements of its argument. The most widely used algorithm to implement this condition is based on Givens rotations and estimates an orthogonal matrix, $AA^T = A^T A = I$. The application of this matrix to the SVD outputs results in

$\hat{S}_t = A^T W^T, \qquad \hat{S}_s = A^T H$   (5)

Note that the original data matrix X can thereby be approximated by

$\hat{X} = \hat{S}_t^T \hat{S}_s = U_K \Sigma_K V_K^T$   (6)

as the matrix A is orthogonal. Therefore, the reconstruction error is not altered by the rotation matrix computed in this step. Notice that a similar conclusion can be drawn for most sBSS or tBSS algorithms, which apply the whitening procedure as described before and compute an orthogonal matrix in a second step.

2.3.1. Estimation of the matrices $C_l$

Two common strategies can be found to estimate the set of matrices $C_l$, $l = 1, 2, \dots, D$:

2.3.1.1. Second-order statistics. One way is to use second-order statistics (SOS) only. This is possible if further knowledge about correlations, usually time correlations in time series data, is available and can be exploited. Denoting the latent signals as $s_*[n]$, where $* = s, t$ indicates either the spatial or the temporal components, second-order statistics are based on time-delayed correlation matrices

$\hat{R}_l = E\{s_*[n]\, s_*^T[n+l]\}$

Note that the expectation operator E usually deals with a different number of samples when considering spatial, n ∈ [1, N], or temporal, m ∈ [1, M], components, whereby N ≫ M. The index l = {1, 2, . . . , D} counts the number of delays, which determines the number of matrices to be diagonalized. The value D is a user-assigned parameter. To assure symmetry, the matrices to be diagonalized are defined as $C_l = 0.5\hat{R}_l + 0.5\hat{R}_l^T$.
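A minimal sketch of this SOS construction (function name and conventions are ours), with S a K × n array holding the latent components in its rows:

    import numpy as np

    def sos_matrices(S, D):
        # D symmetrized, time-delayed correlation matrices C_l = (R_l + R_l^T)/2
        K, n = S.shape
        C = []
        for l in range(1, D + 1):
            # empirical E{ s[t] s[t + l]^T } over the overlapping samples
            R = S[:, :n - l] @ S[:, l:].T / (n - l)
            C.append(0.5 * (R + R.T))
        return C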

2.3.1.2. Higher-order statistics. An alternative is to consider the data as samples from a stochastic process. Then higher-order statistics (HOS) of the distribution of random variables need to be exploited. Often only fourth-order statistics are considered. They rely on the estimation of cumulant matrices. The number of cumulant matrices is $D = K(K+1)/2$, and the entries of each cumulant matrix are fourth-order cumulants computed with the latent space variables. Considering either a spatial (∗ ≡ s) or temporal (∗ ≡ t) component in latent space, i.e. $s_*[n] = [s_1[n]\; s_2[n]\; \dots\; s_K[n]]^T$, the (p, q)th element of the lth cumulant matrix is $E\{s_i[n] s_j[n] s_p[n] s_q[n]\}$. Note that the different cumulant matrices result from the indexing $(i, j)$, $i \geq j$, $(i, j) \in \{1, 2, \dots, K\}$.

From now on, algorithms employing the second-order time-delayed correlation matrices are denoted stBSS-SO, while algorithms using fourth-order cumulant matrices are called stBSS-FO.

3. Algorithm

In this section we introduce a step-by-step formulation of our algorithm (see Algorithm 1). The information of Section 2 is condensed in this algorithmic description.

Algorithm 1: stBSS.
Input: the M × N data matrix X; M: temporal dimension, N: spatial dimension.
1: Compute the SVD of the data matrix X → the M × M left eigenvector matrix U, the M × M diagonal singular value matrix $\Sigma$ (diagonal entries ordered by decreasing magnitude), and the N × M right eigenvector matrix V.
2: Estimate K < M latent components representing temporal and spatial information → the K × N spatial component matrix $S_s = \Sigma_K^{1/2} V_K^T$ and the K × M temporal component matrix $S_t = \Sigma_K^{1/2} U_K^T$.
3: Perform ICA in the latent space: compute the D time-delayed correlation matrices or the D cumulant matrices.
4: Estimate the matrix A which jointly diagonalizes all matrices.
Output: the independent components in latent space → independent spatial components $\hat{S}_s = A^T \Sigma_K^{1/2} V_K^T$ and independent temporal components $\hat{S}_t = A^T \Sigma_K^{1/2} U_K^T$.
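As an illustration only, the following self-contained NumPy sketch follows Algorithm 1 in its stBSS-FO variant; the authors' implementation was in MATLAB, and all names, tolerances and the simulated input below are ours. The joint diagonalizer uses the classical Jacobi/Givens sweep (after Cardoso and Souloumiac), and the matrix set uses the fourth-order moments stated in Section 2.3.1.2, estimated here from the spatial components, which offer the larger sample size:

    import numpy as np

    def cumulant_matrices(S):
        # fourth-order matrix set of Section 2.3.1.2: for each pair (i, j),
        # i >= j, the (p, q) entry is the sample mean of s_i s_j s_p s_q
        K, n = S.shape
        Cs = []
        for i in range(K):
            for j in range(i + 1):
                w = S[i] * S[j]               # pointwise weights s_i[t] s_j[t]
                Cs.append((S * w) @ S.T / n)  # (p, q) = mean of s_p s_q s_i s_j
        return Cs                             # K (K + 1) / 2 symmetric matrices

    def joint_diag(Cs, eps=1e-12, max_sweeps=100):
        # orthogonal joint approximate diagonalization via Jacobi/Givens sweeps
        Cs = [C.copy() for C in Cs]
        K = Cs[0].shape[0]
        A = np.eye(K)
        for _ in range(max_sweeps):
            rotated = False
            for p in range(K - 1):
                for q in range(p + 1, K):
                    # closed-form Givens angle for the (p, q) sub-problem
                    g1 = np.array([C[p, p] - C[q, q] for C in Cs])
                    g2 = np.array([C[p, q] + C[q, p] for C in Cs])
                    ton, toff = g1 @ g1 - g2 @ g2, 2.0 * (g1 @ g2)
                    theta = 0.5 * np.arctan2(toff, ton + np.hypot(ton, toff))
                    c, s = np.cos(theta), np.sin(theta)
                    if abs(s) > eps:
                        rotated = True
                        R = np.array([[c, -s], [s, c]])
                        for C in Cs:          # C <- G^T C G on rows/cols p, q
                            C[[p, q], :] = R.T @ C[[p, q], :]
                            C[:, [p, q]] = C[:, [p, q]] @ R
                        A[:, [p, q]] = A[:, [p, q]] @ R
            if not rotated:
                break
        return A                              # off(A^T C_l A) is minimized

    def stbss_fo(X, K):
        # steps 1-2: SVD and equal-energy latent components
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        S_s = np.sqrt(s[:K])[:, None] * Vt[:K, :]    # Sigma_K^{1/2} V_K^T
        S_t = np.sqrt(s[:K])[:, None] * U[:, :K].T   # Sigma_K^{1/2} U_K^T
        # steps 3-4: matrix set from latent components, joint diagonalizer
        A = joint_diag(cumulant_matrices(S_s))
        return A.T @ S_s, A.T @ S_t                  # independent components

    X = np.random.default_rng(0).standard_normal((120, 4000))
    S_s_hat, S_t_hat = stbss_fo(X, K=5)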

The algorithm was implemented on an Intel i7 quad-core 3.2 GHz machine with 16 GB RAM; the programming language used was MATLAB (v2016a).

4. Results

The spatiotemporal methodology was applied to two fMRI data sets: a retinotopy data set and a resting-state fMRI data set. We chose these data sets because both types of data are inherently spatiotemporal, which makes them well suited for our stBSS method. Furthermore, they represent recent fields of interest in the neuroscientific community, and their dimensionality is relatively high, which makes them a good benchmark for the proposed algorithm. The evaluation methods were chosen specifically for each data set, since we dealt with complex data and experiments. For the retinotopy data set, the spatiotemporal approach was used to extract the underlying components of the fMRI data and to select the component related to the design vector.

In the case of resting-state data, the spatiotemporal methodology was applied to interpret the correlation coefficient time series after the group-ICA analysis. Hence, we apply the stBSS algorithm to find a statistically independent representation for both domains, which in this application are the correlation domain and the temporal domain.

4.1. Retinotopy fMRI

A total of 8 participants were scanned during a visual retinotopic mapping experiment. The visual stimulus was a flickering black-and-white, bow-tie shaped checkerboard at either a horizontal (stimulus 1) or vertical (stimulus 2) orientation, flickering at 8 Hz. The task of every subject was to fixate a white cross shown in the center of the bow-tie. Each stimulus was displayed for 15 seconds, and the stimuli alternated during data acquisition, amounting to a total of M = 140 scans. The time courses of both stimuli were convolved with a typical hemodynamic response function (HRF) as provided in the Statistical Parametric Mapping (SPM) package [14]. In this experiment, the difference between the two design vectors, reflecting the phase-shifted stimulus designs, was considered the proper reference time course for correlating temporal components. This was done because the stimuli are highly correlated, thus violating the assumption of independent underlying source signals when an ICA analysis is performed. The data set was pre-processed with the SPM toolbox: spatial and temporal realignment, normalization to a standard brain, and Gaussian smoothing (FWHM = 4 mm). The occipital lobe region was extracted using an appropriate mask, which selects N = 21515 voxels of each 79 × 95 × 69 complete brain scan. Illustrations showing activity distributions in the brain were generated using the software tool CARET (see Fig. 2).

4.2. Dynamic functional network connectivity

We based our analysis on volumetric data from the preselected bundle of 100 unrelated human subjects from the S500 release of the Human Connectome Project [15], in which each subject went through four rs-fMRI sessions lasting 873 s, resulting in 1200 volumes per session and S = 400 sessions in total. Data was acquired on customized 3T MRI scanners at Washington University using multi-band (factor 8) acquisition techniques [16–19]. From the different versions of the data we chose the most preprocessed data set, with motion correction, structural preprocessing, and ICA-FIX denoising [20–26]. The rs-fMRI data has TR = 720 ms and TE = 33.1 ms, acquired with a gradient-echo EPI sequence. The flip angle was 52° and the FOV 208 mm × 180 mm, with a slice thickness of 2 mm, 72 slices, and an isotropic 2 mm voxel size. As additional preprocessing, we applied a Gaussian smoothing kernel with a FWHM of 5 mm using the SPM8 software package (http://www.fil.ion.ucl.ac.uk/spm/) and discarded the first five scans of each session from our analysis. Therefore, each individual session comprises M = 1195 scans with 193965 voxels each.

The principle of a dFCN is shown in Fig. 1. Assuming that highly correlated time series share highly similar information content, such correlation measures can help identify networks of neural information processing. The temporal dynamics of such connectivity patterns can be estimated by computing correlations between different time series within specified time segments which are shifted along the time axis. Hence, time windows of Δt = 57.6 s were chosen and shifted in steps of δt = TR = 720 ms along the time series. This finally resulted in a total of P = 1115 correlation matrices per session. Each 24 × 24-dimensional correlation matrix is symmetric and contains 276 different Pearson coefficients. These correlation values are concatenated into a column of the data matrix X, which encompasses a total of S = 400 sessions.
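A minimal sketch of this sliding-window construction (function and variable names are ours; the Gaussian taper applied by the authors before correlating, see Section 4.2.1, is omitted for brevity):

    import numpy as np

    def dfcn_matrix(T, width, step=1):
        # T: (time x L) component time courses; returns an (L(L-1)/2 x P)
        # matrix whose columns hold the unique windowed Pearson correlations
        n, L = T.shape
        iu = np.triu_indices(L, k=1)            # the L(L-1)/2 unique pairs
        cols = []
        for start in range(0, n - width + 1, step):
            R = np.corrcoef(T[start:start + width].T)
            cols.append(R[iu])
        return np.column_stack(cols)

    # e.g. 80-sample windows (57.6 s at TR = 720 ms), shifted one TR at a time
    rng = np.random.default_rng(2)
    X_i = dfcn_matrix(rng.standard_normal((1195, 24)), width=80)
    # X_i.shape == (276, 1116) with this simple convention; the paper reports
    # P = 1115 windows, i.e. a slightly different handling of the window ends.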

Fig. 1. Illustration of the dFCN approach applied to the correlation time series.

Table 1. Assignments of the ICs to the RSNs for the data set with 24 extracted ICs.

network name        abbreviation   number of components
auditory            AUD            2
sensorimotor        SM             5
visual              VIS            6
cognitive control   CC             6
default mode        DM             5

4.2.1. Group-ICA

To apply gICA [27] to the raw data sets, we used the GIFT toolbox (http://mialab.mrn.org/software/gift/). The main processing steps comprise a principal component analysis (PCA) at the level of individual sessions, followed by a spatial ICA at group level. The PCA step reduces the time dimension of the data to T = 45 principal components related to the largest eigenvalues. These T components of all S individual sessions are then concatenated in the time domain, and a subsequent PCA reduces the dimension from T · S to L1 = 30 components. Finally, group spatial independent components are computed using the INFOMAX algorithm. These group spatial components are analyzed to eliminate possible artifacts. The components are selected according to their spatial localization as well as the power spectral information of their corresponding time courses. This selection process finally yields L = 24 ICs, which are back-projected to the individual session space using the matrices computed in the three steps described before. The selected components, corresponding to different brain regions, can be assigned to five networks as described in Table 1. In particular, the time courses related to the ICs of individual sessions [28] are described by

$\hat{T}_i = F_i G_i \hat{A}$   (7)

where $F_i$ is the 1195 × T PCA transformation matrix computed at the level of individuals, $G_i$ is the T × L1-dimensional i-th block of the subsequent PCA transformation matrix computed at group level, and $\hat{A}$ is the L1 × L-dimensional matrix resulting from the INFOMAX algorithm after the elimination of artifact-related components. Each column of $\hat{T}_i$ is therefore a time series, and the correlation coefficients are computed using sliding windows of 80 time points, with a shift of one time step between consecutive windows. Furthermore, the correlation coefficients are computed after point-wise multiplication of the time series with a Gaussian window with variance equal to 5. The pairwise correlation matrix resulting from each window has L(L − 1)/2 unique entries, which are concatenated to form a column of the new data set. Thus, at session level, the correlation data matrix $X_i$ has dimensions 276 × 1195.



Fig. 2. Comparison of best fitting temporal and spatial components obtained with stBSS-FO or stBSS-SO (green) vs SPM (blue/yellow). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Table 2. Average correlation coefficients between the best-fitting temporal component and the design vector for different dimensions K of the latent space. Averages are taken over all 8 subjects.

K    stBSS-SO      stBSS-FO
4    0.81 ± 0.26   0.85 ± 0.24
6    0.87 ± 0.10   0.93 ± 0.04
8    0.83 ± 0.10   0.91 ± 0.08
10   0.82 ± 0.13   0.93 ± 0.04
15   0.78 ± 0.17   0.92 ± 0.05

In this study, the matrix to be analyzed by the "spatiotemporal" algorithm is the result of the concatenation of all individual matrices.

5. Discussion

5.1. Retinotopy fMRI

5.1.1. Dimension of the latent space

The SVD determines the approximation error, i.e., how well the decomposition approximates the original data. A scree plot of the singular values offers a good estimate of the intrinsic dimension of the observations, i.e. the number K of informative components. Applying the knee rule, K = 6 turned out to be appropriate for all subjects. This is corroborated by Pearson correlation coefficients between the design vector and the column vectors carrying temporal information, which were computed for different numbers of extracted components (see Table 2). The largest absolute value was used as an indicator of the most relevant vector of latent components.

5.1.2. Spatial and temporal components

After the dimension reduction step, the spatial and temporal components are estimated according to Eq. (5). The temporal components are correlated with the reference time course estimated using the SPM tool. The corresponding spatial component was normalized to zero mean and unit standard deviation. Activation maps were then generated by including only activation values lying outside the range [−σ, σ], where σ denotes the standard deviation of the corresponding activity distribution (a short code sketch follows the list below). Two groups of results could be found (see Table 2):

• For one group, both algorithms yield correlation coefficients between the temporal components and the design vector always larger than ρ = 0.80. In this case, the spatial activation maps point to the same areas and have a large overlap.

• For a second group, the activation maps computed with fourth-order cumulant matrices were always broader than those computed with second-order correlation matrices. In that case, the correlation coefficients of the temporal components with the design vector dropped below ρ = 0.80, whatever the dimension of the latent space. However, in all cases the correlation coefficients remained larger than ρ = 0.7.
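The map construction described in Section 5.1.2 amounts to a few lines (illustrative sketch, names ours): the spatial component is z-scored, and only values outside [−σ, σ], i.e. |z| > 1 after normalization, are retained:

    import numpy as np

    def activation_map(spatial_component):
        # z-score, then suppress sub-threshold voxels (|z| <= 1)
        z = (spatial_component - spatial_component.mean()) / spatial_component.std()
        return np.where(np.abs(z) > 1.0, z, 0.0)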

Fig. 2 shows, for the former group, overlay graphs of the design vector and the "best-fitting" temporal component, and illustrates the spatial extent of the activations resulting from either the stBSS-SO or the stBSS-FO algorithm (green) in relation to a corresponding evaluation using the SPM toolbox (blue: positive, yellow: negative activity). The best-fitting component is indeed highly correlated with the design vector, hence follows the experimental paradigm almost perfectly. Concerning spatial maps, areas colored green indicate congruent activated regions obtained with one of the newly proposed algorithms as well as with the SPM tool. However, the new algorithms clearly result in much more focused and spatially localized activity distributions. Similar results have been obtained with the algorithms stJADE [12] and stSOBI [29] (not shown here). Results of the second group show that the correlation of the temporal component with the design vector is still high in the case of stBSS-FO but drops in the case of stBSS-SO. Spatial maps again turn out to be much more localized. By and large, across all subjects studied, activity distributions are congruent between subjects, showing prominent activations in areas V1 and V2, and to a lesser extent also in higher visual areas, as can be seen from Fig. 2.

5.2. Dynamic functional connectivity

Functional connectivity is a key aspect in the analysis of resting-state functional magnetic resonance imaging (rs-fMRI). It is based on calculating association measures – mostly Pearson correlation – between distinct regions in the brain. First attempts focused on the static case, for which whole time courses of resting-state-related brain regions were used for evaluating correlation coefficients representing the strength of their functional connections [30,31]. This approach resulted in many insights, ranging from the small-world organization of brain graphs constructed from this so-called connectome [32], through deviations in functional connectivity between pathological and healthy brains [33,34], to developmental changes of functional connectivity [35]. It was also possible to identify similarities between physical systems like the Ising model of a spin glass and functional connectivity brain networks [36]. However, using whole time courses integrates out all temporal dependencies within the connectome, resulting in static average connectivities.


Fig. 3. Normalized eigenvalues of the data X.
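The knee (elbow) rule applied to scree curves such as Fig. 3 is used in Sections 5.1.1 and 5.2.1; the paper applies it visually. A simple automated stand-in, entirely our own choice, picks the point of the normalized eigenvalue curve lying farthest below the chord joining its endpoints:

    import numpy as np

    def knee(eigenvalues):
        # number of components at the scree plot's knee (heuristic stand-in)
        v = np.asarray(eigenvalues, dtype=float)
        v = v / v[0]                            # normalize to the largest value
        x = np.linspace(0.0, 1.0, v.size)
        chord = v[0] + (v[-1] - v[0]) * x       # line through the endpoints
        return int(np.argmax(chord - v)) + 1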

Recently a paradigm shift occurred towards the so-called dFCN, which takes into account the temporal variability of functional connections in the brain. Investigating temporal fluctuations of functional connectivity has thus received considerable attention in the last few years [28,37–41]. dFCN can be investigated with a sliding-window technique [28] applied to the time courses of independent components (ICs) resulting from a group independent component analysis (gICA) [7] on a very large set of subjects undergoing rs-fMRI. This common technique is also employed in the present study. Note that recent developments in the field offer promising new methods for doing gICA efficiently in a parallelized manner [42,43]. The idea is to track the variability of correlation matrices formed from segments of the time courses of all ICs. Shifting the window one time step further results in a new correlation matrix for the next time step, with slight changes in correlation coefficients, and so on. The vast set of correlation matrices resulting from such a sliding-window approach can be condensed into several representative correlation patterns by applying k-means clustering [28]. The stable patterns, which robustly show up as cluster centroids, can be considered connectivity states, given that these centroids represent very robust and almost discrete correlation patterns, reflecting characteristic connectivities that the brain goes through over time while simultaneously remaining similar between subjects. When using the k-means algorithm, the number of extracted clusters has to be predefined. Usually it is deduced from a silhouette score, which compares the variance within the extracted clusters to the variance between them while varying the number of clusters.

5.2.1. Correlation structure of the resting states

Considering all correlation data collected in the matrix X, only the stBSS-FO variant of the stBSS algorithm is explored. As a first analysis step, an SVD of the data matrix is considered, where the number of significant components, i.e. resting states, can be chosen by a knee criterion applied to a scree plot of the normalized eigenvalues (see Fig. 3). As a result, we concluded that 5 resting states are most appropriate for further analysis. As described in Section 2.3, the K = 5 latent components of both spaces are used to estimate cumulant matrices. The latter serve to find a joint diagonalizer, which finally helps to estimate the rotation matrix. Then the components in both spaces are estimated as described by Eq. (5). The result is compared in Fig. 4 with the mixing matrix resulting from an application of the INFOMAX or the JADE algorithm, respectively. Note that in these algorithms a similar rotation matrix is estimated with pre-whitened data, i.e. using the eigenvector matrix $V_K^T$ resulting from an SVD. As can be seen, all algorithms produce rather similar states, except the fifth state of stBSS-FO, which looks slightly different compared to the results of the other algorithms. Table 3 collects the values of a pairwise cosine similarity score.

Fig. 4. Five brain states estimated using 3 algorithms: INFOMAX, JADE and stBSS-FO.

Table 3. Cosine similarity score of the extracted connectivity states resulting from the algorithms: (1) INFOMAX, (2) JADE and (3) stBSS-FO.

states   (1) vs (2)   (1) vs (3)   (2) vs (3)
1        0.99         0.98         0.97
2        0.98         0.94         0.97
3        0.98         0.92         0.95
4        0.99         0.98         0.97
5        0.99         0.68         0.72

5.2.2. Temporal evolution of the correlation structure

Finally, the quality of the extracted resting states has to be assessed. This is achieved by measuring how well a weighted linear combination of the resting states reconstructs the original centered dFCN matrices.
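The quality measure of this section, detailed in the text below, can be sketched as follows (array names and shapes are ours):

    import numpy as np

    def cosine_similarity_course(X_centered, states, weights):
        # X_centered: (276 x P) dFCN matrix; states: (276 x K) brain states;
        # weights: (K x P) time-course weights; returns P similarity values
        # between each original column and its weighted reconstruction
        X_rec = states @ weights
        num = np.sum(X_centered * X_rec, axis=0)
        den = np.linalg.norm(X_centered, axis=0) * np.linalg.norm(X_rec, axis=0)
        return num / den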

Fig. 5. Cosine similarity between the original dFCN matrices and the linear combination of the five extracted states.

Fig. 6. Averaged cosine similarity, over all 400 sessions, between the original dFCN matrices and the linear combination of the five extracted states. The gray shading displays the error bars.

Fig. 7. Normalized eigenvalues of the dFCN time series for different numbers of components in the gICA step.

For this purpose, the five resting states were multiplied by their time-course weights and then added up to reconstruct the dFCN matrix of a certain time point. Similarity was measured with a cosine criterion. Fig. 5 illustrates the similarity measures for a whole session. At the single-subject level this similarity score fluctuates considerably, which makes it difficult to assess the quality of the states. Therefore an average similarity across all 400 sessions was calculated separately for the three different ICA algorithms. The result of these computations is shown in Fig. 6. All three ICA algorithms yield similar temporal evolutions for each session, though the states are not completely identical. As mentioned before, all algorithms share a common pre-processing step, which explains the reconstruction error in terms of the discarded singular values of the SVD (see Eq. (4)). The second step does not change the value of this error, as follows from Eq. (6). Therefore, the components estimated by all three algorithms show a similar behavior when compared with the original data.

Finally we need to test the robustness of the resulting brain states against variations of the number of states to be extracted as well as the number of dFCN correlation matrices used in the analysis. Obviously a larger number of ICs used in the gICA analysis would lead to a higher resolution of the human connectome, thus resulting in an improved data representation via the low-rank approximation of the data matrix.

Table 4. Explained variance with five PCs of dFCN correlation matrices computed with different numbers of gICA components.

Nr components   variance
11              0.4440
24              0.3868
43              0.2735
59              0.2236

However, the computation time and working memory requirements for the calculation of independent spatial maps via gICA increase strongly with a rising number of brain components. To test the robustness of the extracted resting states, next to the 24-brain-component correlation matrices, a smaller set with 11 components and two larger sets with 43 and 59 components were also used. First, the number of underlying brain states was estimated for the new data with the elbow criterion applied to the eigenvalue spectrum. As can be seen in Fig. 7, the resulting curves of the normalized and ordered eigenvalues of the correlation matrix are quite similar. Applying the elbow criterion, five brain states appear a good choice for all the brain component numbers tested. The amount of variance that can be explained with five principal components is given in Table 4 for the different sizes of the correlation matrices. It appears that 24 components result in clearly structured brain states, which yield a homogeneous activity distribution across the related brain areas involved.

Considering a proper number of brain states, the cosine similarity score was employed to estimate the closeness between the original dFCNs and the reconstructed correlation maps while varying the number of extracted brain states. Not surprisingly, the similarity score increases with a decreasing number of ICs and an increasing number of brain states (see Fig. 8). No optimum could be observed, however, which would have helped to identify the best number of ICs for the gICA preprocessing.

Fig. 8. Mean of the average cosine similarity for various numbers of brain states computed with different numbers of gICA components.

6. Conclusion

The present study treats spatiotemporal ICA based on second-order or higher-order correlations on a similar footing. It starts from an SVD of a given data array and discusses the form of the factor matrices of a generic ICA model, i.e. $X = WH$, in relation to the eigenvector and singular value matrices of the SVD.


A subsequent ICA only needs to find an orthogonal rotation matrix A, which multiplies the factor matrices of the SVD to simultaneously yield independent spatial and temporal factor matrices of an ICA decomposition. Furthermore, the approximation $\hat{X}$ to the original data set X does not depend on the rotation matrix. This is a common characteristic of all methods that apply an SVD to centered data, followed by the estimation of an orthogonal matrix. Two applications are discussed to illustrate the new spatiotemporal BSS algorithm. A rather simple retinotopic fMRI experiment is discussed first, which compares stBSS-SO with stBSS-FO, and both with an SPM-GLM analysis. It is shown that the temporal components agree surprisingly well, while the spatial maps differ somewhat, though showing substantial overlap. Importantly, the new method yields the most confined and focused areas, with the least spatial extent. As a second application, dynamic functional connectivity networks are studied with respect to resting-state dynamics. The identification of a proper number of resting states is discussed, and the dynamical evolution of the correlations between their voxel time courses is compared across different ICA algorithms, including the new stBSS. INFOMAX, JADE and the new stBSS resulted in rather similar resting states and temporal evolutions.

7. Conflict of interest

The authors declare no conflict of interest.

References

[1] I. Jolliffe, Principal Component Analysis, Springer Series in Statistics, 2006.
[2] A. Hyvärinen, J. Karhunen, E. Oja, Independent Component Analysis, John Wiley & Sons, 2001.
[3] A. Cichocki, R. Zdunek, A.H. Pham, S. Amari, Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation, Wiley and Sons, 2009.
[4] P. Comon, C. Jutten, Handbook of Blind Source Separation: Independent Component Analysis and its Applications, Academic Press, 2010.
[5] J.-B. Poline, M. Brett, The general linear model and fMRI: does love last forever? Neuroimage 62 (2) (2012) 871–880.
[6] V.D. Calhoun, T. Adali, Multi-subject independent component analysis of fMRI: a decade of intrinsic networks, default mode, and neurodiagnostic discovery, IEEE Rev. Biomed. Eng. 5 (2012) 60–73, doi:10.1109/RBME.2012.2211076.
[7] V.D. Calhoun, T. Adali, G.D. Pearlson, J.J. Pekar, Spatial and temporal independent component analysis of functional MRI data containing a pair of task-related waveforms, Hum. Brain Mapp. 13 (1) (2001) 43–53.

[8] Y. Shi, W. Zeng, N. Wang, D. Chen, A novel fMRI group data analysis method based on data-driven reference extracting from group subjects, Comput. Methods Programs Biomed. 122 (3) (2015) 362–371, doi:10.1016/j.cmpb.2015.09.002.
[9] V.D. Calhoun, J. Liu, T. Adali, A review of group ICA for fMRI data and ICA for joint inference of imaging, genetic, and ERP data, Neuroimage 45 (1, Supplement) (2009) S163–S172, doi:10.1016/j.neuroimage.2008.10.057.
[10] E. Seifritz, F. Esposito, F. Hennel, H. Mustovic, J.G. Neuhoff, D. Bilecen, G. Tedeschi, K. Scheffler, F.D. Salle, Spatiotemporal pattern of neural processing in the human auditory cortex, Science 297 (5587) (2002) 1706–1708.
[11] J.V. Stone, J. Porrill, N.R. Porter, I.D. Wilkinson, Spatiotemporal independent component analysis of event-related fMRI data using skewed probability density functions, Neuroimage 15 (2) (2002) 407–421.
[12] F.J. Theis, P. Gruber, I.R. Keck, E.W. Lang, A robust model for spatiotemporal dependencies, Neurocomputing 71 (10) (2008) 2209–2216.
[13] S.M. Smith, K.L. Miller, S. Moeller, J. Xu, E.J. Auerbach, M.W. Woolrich, C.F. Beckmann, M. Jenkinson, J. Andersson, M.F. Glasser, D.C.V. Essen, D.A. Feinberg, E.S. Yacoub, K. Ugurbil, Temporally-independent functional modes of spontaneous brain activity, Proc. Natl. Acad. Sci. 109 (8) (2012) 3131–3136.
[14] Wellcome Trust Centre for Neuroimaging, SPM8, http://www.fil.ion.ucl.ac.uk/spm/software/spm8/, 2009.
[15] D.C. Van Essen, K. Ugurbil, E. Auerbach, D. Barch, T.E.J. Behrens, R. Bucholz, A. Chang, L. Chen, M. Corbetta, S.W. Curtiss, S. Della Penna, D. Feinberg, M.F. Glasser, N. Harel, A.C. Heath, L. Larson-Prior, D. Marcus, G. Michalareas, S. Moeller, R. Oostenveld, S.E. Petersen, F. Prior, B.L. Schlaggar, S.M. Smith, A.Z. Snyder, J. Xu, E. Yacoub, The human connectome project: a data acquisition perspective, Neuroimage 62 (4) (2012) 2222–2231, doi:10.1016/j.neuroimage.2012.02.018.
[16] S. Moeller, E. Yacoub, C.A. Olman, E. Auerbach, J. Strupp, N. Harel, K. Uğurbil, Multiband multislice GE-EPI at 7 tesla, with 16-fold acceleration using partial parallel imaging with application to high spatial and temporal whole-brain fMRI, Magn. Reson. Med. 63 (5) (2010) 1144–1153, doi:10.1002/mrm.22361.
[17] D.A. Feinberg, S. Moeller, S.M. Smith, E. Auerbach, S. Ramanna, M. Gunther, M.F. Glasser, K.L. Miller, K. Ugurbil, E. Yacoub, Multiplexed echo planar imaging for sub-second whole brain FMRI and fast diffusion imaging, PLoS ONE 5 (12) (2010) e15710, doi:10.1371/journal.pone.0015710.
[18] K. Setsompop, B.A. Gagoski, J.R. Polimeni, T. Witzel, V.J. Wedeen, L.L. Wald, Blipped-controlled aliasing in parallel imaging for simultaneous multislice echo planar imaging with reduced g-factor penalty, Magn. Reson. Med. 67 (5) (2012) 1210–1224, doi:10.1002/mrm.23097.
[19] J. Xu, S. Moeller, J. Strupp, E.J. Auerbach, L. Chen, D.A. Feinberg, K. Ugurbil, E. Yacoub, Highly accelerated whole brain imaging using aligned-blipped-controlled-aliasing multiband EPI, in: Proceedings of the 20th Annual Meeting of ISMRM, 2012, p. 2306.
[20] M.F. Glasser, S.N. Sotiropoulos, J.A. Wilson, T.S. Coalson, B. Fischl, J.L. Andersson, J. Xu, S. Jbabdi, M. Webster, J.R. Polimeni, D.C. Van Essen, M. Jenkinson, The minimal preprocessing pipelines for the human connectome project, Neuroimage 80 (2013) 105–124, doi:10.1016/j.neuroimage.2013.04.127.
[21] M. Jenkinson, P. Bannister, M. Brady, S. Smith, Improved optimization for the robust and accurate linear registration and motion correction of brain images, Neuroimage 17 (2) (2002) 825–841, doi:10.1006/nimg.2002.1132.
[22] M. Jenkinson, C.F. Beckmann, T.E.J. Behrens, M.W. Woolrich, S.M. Smith, FSL, Neuroimage 62 (2) (2012) 782–790, doi:10.1016/j.neuroimage.2011.09.015.
[23] B. Fischl, FreeSurfer, Neuroimage 62 (2) (2012) 774–781, doi:10.1016/j.neuroimage.2012.01.021.
[24] S.M. Smith, C.F. Beckmann, J. Andersson, E.J. Auerbach, J. Bijsterbosch, G. Douaud, E. Duff, D.A. Feinberg, L. Griffanti, M.P. Harms, M. Kelly, T. Laumann, K.L. Miller, S. Moeller, S. Petersen, J. Power, G. Salimi-Khorshidi, A.Z. Snyder, A.T. Vu, M.W. Woolrich, J. Xu, E. Yacoub, K. Uğurbil, D.C. Van Essen, M.F. Glasser, Resting-state fMRI in the human connectome project, Neuroimage 80 (2013) 144–168, doi:10.1016/j.neuroimage.2013.05.039.
[25] G. Salimi-Khorshidi, G. Douaud, C.F. Beckmann, M.F. Glasser, L. Griffanti, S.M. Smith, Automatic denoising of functional MRI data: combining independent component analysis and hierarchical fusion of classifiers, Neuroimage 90 (2014) 449–468, doi:10.1016/j.neuroimage.2013.11.046.
[26] L. Griffanti, G. Salimi-Khorshidi, C.F. Beckmann, E.J. Auerbach, G. Douaud, C.E. Sexton, E. Zsoldos, K.P. Ebmeier, N. Filippini, C.E. Mackay, S. Moeller, J. Xu, E. Yacoub, G. Baselli, K. Ugurbil, K.L. Miller, S.M. Smith, ICA-based artefact removal and accelerated fMRI acquisition for improved resting state network imaging, Neuroimage 95 (2014) 232–247, doi:10.1016/j.neuroimage.2014.03.034.
[27] V.D. Calhoun, T. Eichele, T. Adali, E.A. Allen, Decomposing the brain: components and modes, networks and nodes, Trends Cogn. Sci. 16 (5) (2012) 255–256, doi:10.1016/j.tics.2012.03.008.
[28] E.A. Allen, E. Damaraju, S.M. Plis, E.B. Erhardt, T. Eichele, V.D. Calhoun, Tracking whole-brain connectivity dynamics in the resting state, Cereb. Cortex (2012), doi:10.1093/cercor/bhs352.
[29] F. Theis, A. Meyer-Bäse, E.W. Lang, Second-order blind source separation based on multi-dimensional autocovariances, in: Proc. ICA 2004, Granada, Spain, Springer, Berlin, 2004, pp. 726–733.
[30] V.M. Eguíluz, D.R. Chialvo, G.A. Cecchi, M. Baliki, A.V. Apkarian, Scale-free brain functional networks, Phys. Rev. Lett. 94 (1) (2005) 018102, doi:10.1103/PhysRevLett.94.018102.
[31] E.W. Lang, A.M. Tomé, I.R. Keck, J.M. Górriz-Sáez, C.G. Puntonet, Brain connectivity analysis: a short survey, Comput. Intell. Neurosci. 2012 (2012) 21, doi:10.1155/2012/412512.

[32] E. Bullmore, O. Sporns, Complex brain networks: graph theoretical analysis of structural and functional systems, Nat. Rev. Neurosci. 10 (3) (2009) 186–198, doi:10.1038/nrn2575.
[33] C.J. Stam, B.F. Jones, G. Nolte, M. Breakspear, P. Scheltens, Small-world networks and functional connectivity in Alzheimer's disease, Cereb. Cortex 17 (1) (2007) 92–99, doi:10.1093/cercor/bhj127.
[34] S. Ma, V.D. Calhoun, R. Phlypo, T. Adali, Dynamic changes of spatial functional network connectivity in healthy individuals and schizophrenia patients using independent vector analysis, Neuroimage 90 (2014) 196–206, doi:10.1016/j.neuroimage.2013.12.063.
[35] L. Geerligs, R.J. Renken, E. Saliasi, N.M. Maurits, M.M. Lorist, A brain-wide study of age-related changes in functional connectivity, Cereb. Cortex 2 (2014) 1–13, doi:10.1093/cercor/bhu012.
[36] D. Fraiman, P. Balenzuela, J. Foss, D. Chialvo, Ising-like dynamics in large-scale functional brain networks, Phys. Rev. E 79 (6) (2009) 061922, doi:10.1103/PhysRevE.79.061922.
[37] C. Chang, G.H. Glover, Time-frequency dynamics of resting-state brain connectivity measured with fMRI, Neuroimage 50 (1) (2010) 81–98, doi:10.1016/j.neuroimage.2009.12.011.
[38] I. Cribben, R. Haraldsdottir, L.Y. Atlas, T.D. Wager, M.A. Lindquist, Dynamic connectivity regression: determining state-related changes in brain connectivity, Neuroimage 61 (4) (2012) 907–920, doi:10.1016/j.neuroimage.2012.03.070.
[39] V.D. Calhoun, R. Miller, G. Pearlson, T. Adali, The chronnectome: time-varying connectivity networks as the next frontier in fMRI data discovery, Neuron 84 (2) (2014) 262–274, doi:10.1016/j.neuron.2014.10.015.
[40] G. Deco, V.K. Jirsa, A.R. McIntosh, Emerging concepts for the dynamical organization of resting-state activity in the brain, Nat. Rev. Neurosci. 12 (1) (2011) 43–56, doi:10.1038/nrn2961.
[41] N.J. Kopell, H.J. Gritton, M.A. Whittington, M.A. Kramer, Beyond the connectome: the dynome, Neuron 83 (6) (2014) 1319–1328, doi:10.1016/j.neuron.2014.08.016.
[42] Y. Jing, W. Zeng, N. Wang, T. Ren, Y. Shi, J. Yin, Q. Xu, GPU-based parallel group ICA for functional magnetic resonance data, Comput. Methods Programs Biomed. 119 (1) (2015) 9–16, doi:10.1016/j.cmpb.2015.02.002.
[43] A. Eklund, M. Andersson, H. Knutsson, fMRI analysis on the GPU – possibilities and challenges, Comput. Methods Programs Biomed. 105 (2) (2012) 145–161, doi:10.1016/j.cmpb.2011.07.007.