A Bayesian multilevel model for fMRI data analysis

Adelino R. Ferreira da Silva
Dep. de Eng. Electrotécnica, Faculdade de Ciências e Tecnologia, FCT, Universidade Nova de Lisboa, 2829-516 Caparica, Portugal

Computer Methods and Programs in Biomedicine 102 (2011) 238–252

Article history: Received 21 April 2009; Received in revised form 6 May 2010; Accepted 10 May 2010

Keywords: Bayesian multilevel/hierarchical model; fMRI; MCMC

Abstract

Bayesian approaches have been proposed by several functional magnetic resonance imaging (fMRI) researchers in order to overcome the fundamental limitations of the popular statistical parametric mapping method. However, the difficulties associated with subjective prior elicitation have prevented the widespread adoption of the Bayesian methodology by the neuroimaging community. In this paper, we present a Bayesian multilevel model for the analysis of brain fMRI data. The main idea is to consider that all the estimated group effects (fMRI activation patterns) are exchangeable. This means that all the collected voxel time series are considered manifestations of a few common underlying phenomena. In contradistinction to other Bayesian approaches, we think of the estimated activations as multivariate random draws from the same distribution without imposing specific prior spatial and/or temporal information for the interaction between voxels. Instead, a two-stage empirical Bayes prior approach is used to relate voxel regression equations through correlations between the regression coefficient vectors. The adaptive shrinkage properties of the Bayesian multilevel methodology are exploited to deal with spatial variations, and noise outliers. The characteristics of the proposed model are evaluated by considering its application to two real data sets.

© 2010 Elsevier Ireland Ltd. All rights reserved.

1. Introduction

Functional magnetic resonance imaging (fMRI) is a neuroimaging technique that exploits the biological demands of neuronal activity to provide spatial and temporal information on brain function [1]. An fMRI data set consists of time series of volume data sets. The most commonly used functional imaging technique relies on the blood oxygenation level dependent (BOLD) phenomenon [2]. By analyzing the information provided by the BOLD signals in 4D space, it is possible to make inferences about activation patterns in the human brain. The statistical analysis of fMRI experiments usually involves the formation and assessment of a statistic image, commonly referred to as a statistical parametric map (SPM). The SPM summarizes a statistic indicating evidence of the underlying



neuronal activations for a particular task. Currently, the statistical method used by the vast majority of fMRI data researchers and neuroscientists is the general linear model (GLM). The GLM method is an analogue of the classical linear regression model which has been routinely used in statistics, engineering, and econometrics [3]. The GLM procedure is often said to be ‘massively univariate’ [4], since data for each voxel are independently fit with the same model. The fMRI methodology is carried out using a three-stage approach. In the first stage, functional data sets are pre-processed to facilitate or improve the analysis in the subsequent two stages. For instance, geometric registration and motion correction of brain images are two important first-stage pre-processing steps. Other pre-processing steps may include spatial and/or temporal smoothing, and voxel time series prewhitening [5]. In the second stage, the GLM



is used to fit a linear model to each voxel time series, using a linear combination of several regressors (explanatory variables), corresponding to some experimental conditions. These regressors are specified assuming that the BOLD signal is the output of a linear time-invariant system. The form of the response can be expressed as the convolution of the neural activity with a hemodynamic response function (HRF). Once this model has been fitted at each voxel, statistics associated with the BOLD response are calculated and an SPM is generated. The third stage deals with the identification of those areas of the brain that were activated by the stimuli. Statistical inference is conducted to identify activation maps linked to particular contrasts. Classical hypothesis tests and p-values have been the main statistical tools used to make statistical inferences in neuroimaging. A test statistic (typically a t or F statistic) is computed and compared to its hypothesized null distribution to assess the consistency of the data with the null hypothesis. The probability distribution of the test statistic under the null hypothesis is used to compute a p-value. If sufficiently small, the null hypothesis can be rejected and an inference is made [6]. Unfortunately, there is a myriad of problems associated with the indiscriminate use of p-values and null hypothesis tests. The difficulty in interpreting p-values has been the subject of many statistical analyses [7–10]. Recently, the standard SPM analysis has been extended with procedures to circumvent the multiple comparison problem, and to delineate activated regions in a more reliable way (see [11] for a review). The major concerns faced by the SPM statistical methodology are significantly reduced when viewed from a Bayesian multilevel (or hierarchical) perspective. Three of the main characteristics of modern Bayesian multilevel models are particularly relevant for fMRI analysis. First, standard Bayesian inference applied to fMRI analysis directly states the probability that an activation is true. The probability of activation, i.e., the probability that the activation exceeds some specified threshold, can be computed from the posterior density. In contrast to p-values, the posterior probabilities do not need to be adjusted. Second, multilevel models are flexible tools for combining information and partial pooling of inferences, i.e., shifting estimates toward each other. There are many sources of noise in the measured fMRI signal. In fMRI analysis, noise, whatever its source, reduces the significance of detected activations. The shrinkage properties of the Bayesian multilevel methodology are important to deal with noise outliers, and data spatial variations. Shrinkage of the coefficients in regression models improves the predictive performance of these models. Furthermore, since the amount of adjustment is learned from the data, the shrinkage induced by the Bayesian multilevel analysis automatically performs a multiple comparison correction. As a consequence, in most cases the multiple comparison problem disappears when viewed from a Bayesian perspective [12,13]. Third, in Bayesian multilevel models, the parameters of prior distributions are estimated from data, rather than requiring them to be specified using subjective information. Fully Bayes approaches have been proposed by several fMRI researchers. However, the complexity of the calculations involved in the implementation of Bayes’ theorem has precluded the widespread adoption of Bayesian analysis by the


neuroimaging community. For example, the difficulties associated with subjective prior elicitation in fully Bayes models such as those proposed in [14] are remarkable. Prior specification in fully Bayes approaches is made more challenging when the data are voluminous, and complex relationships among voxel time series exist. In contrast, Bayesian multilevel models rely on empirical Bayes approaches to estimate prior hyperparameters. Multilevel modeling may be regarded as a generalization of regression methods in which regression coefficients are themselves given a model with parameters estimated from data [15]. On the other hand, multilevel empirical Bayes approaches are an approximation to a fully Bayes approach in which a second-stage prior is specified on the hyperparameters of the first-stage prior [16]. The prior specification process may be further extended by specifying more than two prior stages. The practical importance of this process is twofold. First, the number of prior hyperparameters to be specified is substantially reduced at each stage. Second, the amount of shrinkage induced by the priors is driven by information in the data. In this paper, we present a Bayesian multilevel model for the analysis of brain fMRI data. The main idea behind the proposed model is to consider that all the estimated group effects (fMRI activation patterns) are exchangeable. In fMRI data analysis terms, this means that all the collected voxel time series are manifestations of a few common underlying phenomena. Hence, all the estimated groups of activation (and non-activation) patterns in the human brain are group-effect estimates. In contrast to other similar Bayesian approaches for fMRI data analysis, e.g., [17–20], the proposed approach has the following key distinctive features. We think of the estimated activations as multivariate random draws from the same distribution to take into account variation over voxels. We do not impose specific prior spatial and/or temporal information for the interaction between voxels. Instead, we use a two-stage empirical Bayes prior approach to relate voxel regression equations through correlations between the regression coefficient vectors. The model exploits the adaptive shrinkage properties of the multilevel empirical Bayes methodology to deal with data spatial variations, and noise outliers. In line with modern neurophysiological studies, the model tries to capture functional similarity between voxel time series, rather than modeling short-range spatial correlations. Based on experimental tests, we show that the proposed method controls the dispersion of the regression coefficient estimates, thus providing a better fitting approach compared to SPM-like massively univariate approaches. The rest of the paper is organized as follows. In Section 2, we describe the proposed Bayesian multilevel model. We define the model’s conditional distributions, and provide implementation details of the MCMC computation. In Section 3, we analyze the characteristics of the proposed model by considering its application to two real data sets. We compare the multilevel method results with those obtained from the application of two other popular fMRI analysis packages to the same data sets. In Section 4, we emphasize the main features of our model relevant to improving current fMRI data analysis practices. Finally, we suggest research directions for further work.


2. Bayesian multilevel modeling

2.1. Multivariate regression

Consider a multivariate regression model for the fMRI time series in which the regression equations are related through common predictor variables, and the errors are correlated across equations. For a general linear model fit at a set of m voxels we have:

$$y_i = X\beta_i + \epsilon_i, \qquad \epsilon_i \overset{iid}{\sim} N(0, \sigma^2 I_n), \qquad i = 1, \ldots, m, \tag{1}$$

where yi is a vector of n time series observations for voxel i, X is a matrix of predictor variables, βi is a vector of unknown parameters, and εi are unknown error vectors with the Normal distribution N(0, σ²In). In matrix notation, model (1) assumes the equivalent standard multivariate regression form:

$$Y = XB + U, \qquad u_j \overset{iid}{\sim} N(0, \Sigma), \qquad j = 1, \ldots, n, \tag{2}$$

where Y is an n × m matrix of observations, X is an n × k matrix on k independent variables, B is k × m, with each column of B containing the regression coefficients for one of the m equations, and U is an n × m matrix of random disturbances whose rows uj for given X are not autocorrelated. There is, however, contemporaneous correlation between corresponding errors in different equations, with mean 0 and common variance–covariance matrix Σ. Model (2) involves the study of several regressions taken simultaneously, because the disturbance terms in the several regressions are mutually correlated. Thus, information in one regression can be used to estimate coefficients in other regressions. The standard multivariate regression model assumes that the regression equations are related through common X variables, and that the errors are correlated across equations (contemporaneous correlation). In statistics, when X is a design matrix, model (2) is called the general linear model [21]. In Bayesian inference for the linear model [22,3,15,16], the standard priors for model (2) are the natural conjugate priors (multivariate Normal–Wishart prior):

$$p(B, \Sigma) = p(\Sigma)\,p(B \mid \Sigma), \qquad \Sigma \sim IW(\nu, V), \qquad \beta = \mathrm{vec}(B) \mid \Sigma \sim N(\mathrm{vec}(\bar{B}),\; \Sigma \otimes A^{-1}), \tag{3}$$

where IW(ν, V) is the expression for an inverted Wishart density with degrees of freedom ν and scale matrix V, which is used for the prior on Σ, and A is the k × k precision matrix specified for the prior on β. In (3), β̄ = vec(B̄) is the unknown prior mean vector of the regression parameters β, which is then used to draw the posterior estimates specified in (4) and (5). In (3), vec(·) is the operator which transforms a matrix into a vector by stacking the columns of the matrix one underneath the other, and ⊗ means the Kronecker product of two matrices [3]. Simulations from the multivariate regression model (2) can be obtained by considering the posterior as the product of the conjugate priors and the conditional Normal likelihood, p(B, Σ)p(Y | X, B, Σ). Expressions for the posterior density of the multivariate regression model have been derived by several authors, e.g. [23,22,24,16]. In this work, we follow the notation and derivations used in the last of these references. To draw from the posterior, we first draw Σ and then draw B given Σ:

$$\Sigma \mid Y, X \sim IW(\nu + n,\; V + S), \qquad \beta \mid Y, X, \Sigma \sim N\!\left(\tilde{\beta},\; \Sigma \otimes (X'X + A)^{-1}\right), \tag{4}$$

where

$$\tilde{\beta} = \mathrm{vec}(\tilde{B}), \qquad \tilde{B} = (X'X + A)^{-1}(X'Y + A\bar{B}), \qquad S = (Y - X\tilde{B})'(Y - X\tilde{B}) + (\tilde{B} - \bar{B})'A(\tilde{B} - \bar{B}), \tag{5}$$

and X′ denotes the transpose of X.
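To make the conjugate draws in (4) and (5) concrete, the sketch below performs one joint draw of (Σ, B) with NumPy/SciPy. It is a minimal illustration under the priors stated above, not code from the paper; the function name and the way hyperparameters are passed are our own choices, and for whole-brain data the m × m matrix Σ would be far too large to handle this naïvely.

```python
import numpy as np
from scipy.stats import invwishart

def draw_posterior(Y, X, Bbar, A, nu, V, rng):
    """One joint draw of (Sigma, B) from the conjugate posterior (4)-(5).

    Y: n x m observations, X: n x k design, Bbar: k x m prior mean of B,
    A: k x k prior precision, (nu, V): inverted-Wishart hyperparameters.
    """
    n, k = X.shape
    m = Y.shape[1]
    XtX_A = X.T @ X + A
    Btilde = np.linalg.solve(XtX_A, X.T @ Y + A @ Bbar)               # eq. (5)
    E = Y - X @ Btilde
    D = Btilde - Bbar
    S = E.T @ E + D.T @ A @ D                                         # eq. (5)
    Sigma = invwishart.rvs(df=nu + n, scale=V + S, random_state=rng)  # eq. (4)
    # beta | Sigma ~ N(vec(Btilde), Sigma kron (X'X + A)^{-1}), drawn in matrix form
    L_row = np.linalg.cholesky(np.linalg.inv(XtX_A))
    L_col = np.linalg.cholesky(Sigma)
    B = Btilde + L_row @ rng.standard_normal((k, m)) @ L_col.T
    return Sigma, B
```

In the multilevel model of Section 2.2, the same type of draw is reused at the second stage, with the matrix of voxel coefficients playing the role of Y and the prior design matrix Z playing the role of X.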

2.2. Multilevel modeling

Now, we generalize the standard linear model presented in Section 2.1 in two directions. Firstly, we lift the restriction on the common structure of the regression equations to improve estimation efficiency. We introduce a multivariate regression prior to estimate correlations between the regression coefficient vectors. Secondly, we present an empirical Bayes approach for second-stage priors to obviate the difficulties with prior elicitation. In fMRI data analysis, the disturbances in the voxel regression equations at a given time are likely to reflect some common unmeasurable factors or structure, and hence are correlated. By taking into account the correlation structure of the disturbances across voxel equations, and by jointly estimating the regression equations, it is generally possible to obtain more precise estimates. We may improve estimation by pooling time series and cross-sectional fMRI data, assuming cross-sectional voxels with different coefficient vectors. In econometrics, a widely used method for joint regression estimation is the seemingly unrelated regression (SUR) method proposed in [25], which can be regarded as a generalization of the standard linear model [26]. An alternative approach has been proposed in [16]. Consider the general linear model (1) with different regressors in each equation, and a different error variance for each voxel:

$$y_i = X_i\beta_i + \epsilon_i, \qquad \epsilon_i \overset{iid}{\sim} N(0, \sigma_i^2 I_{n_i}), \qquad i = 1, \ldots, m. \tag{6}$$

In order to tie together the voxels’ regression equations, we assume that the {βi} have a common prior distribution. To build the Bayesian regression model we need to specify a prior on the {βi} coefficients, and a prior on the regression error variances {σi²}. Following [16], we specify a Normal regression prior with mean Δ′zi for each β:

$$\beta_i = \Delta' z_i + v_i, \qquad v_i \overset{iid}{\sim} N(0, V_\beta), \tag{7}$$

where zi is a vector of nz elements, representing characteristics of each of the m regression equations. A special case of (7) is to consider a common mean vector for all betas by setting zi = 1 and centering the Δ matrix. The prior (7) can be written using the matrix form of the multivariate regression model for the k regression coefficients:

$$B = Z\Delta + V, \qquad B = \begin{bmatrix} \beta_1' \\ \vdots \\ \beta_m' \end{bmatrix}, \quad Z = \begin{bmatrix} z_1' \\ \vdots \\ z_m' \end{bmatrix}, \quad \Delta = [\,\delta_1 \; \cdots \; \delta_k\,], \quad V = \begin{bmatrix} v_1' \\ \vdots \\ v_m' \end{bmatrix}, \tag{8}$$

where B and V are m × k matrices, Z is an m × nz matrix, and Δ is an nz × k matrix. Interestingly, the prior (8) assumes the form of a second-stage regression, where each column of Δ has coefficients which describe how the mean of the k regression coefficients varies as a function of the variables in z. In (8), Z assumes the role of a prior design matrix. Assuming that each of the error variances is independent, a commonly used prior for the regression error variances {σi²} is the standard inverse gamma with parameters a = νi/2 and b = (νi si²)/2, where νi si² = (yi − Xi β̂i)′(yi − Xi β̂i), νi = n − k, and β̂i = (Xi′Xi)⁻¹Xi′yi [22]. In terms of the relationship between the inverse gamma form and the inverse of a chi-square random variable:

$$\sigma_i^2 \sim \frac{\nu_i s_i^2}{\chi^2_{\nu_i}}. \tag{9}$$

In order to alleviate the difficulties with the assessment of the priors in (8), we specify a second stage of priors on Δ and V_β. As in (3), we specify natural conjugate priors for the multivariate regression model (8):

$$V_\beta \sim IW(\nu, V), \qquad \mathrm{vec}(\Delta) \mid V_\beta \sim N(\mathrm{vec}(\bar{\Delta}),\; V_\beta \otimes A^{-1}). \tag{10}$$

In summary, the proposed model can be written down as a sequence of conditional distributions:

$$y_i \mid X_i, \beta_i, \sigma_i^2, \qquad \beta_i \mid z_i, \Delta, V_\beta, \qquad \sigma_i^2 \mid \nu_i, s_i^2, \qquad V_\beta \mid \nu, V, \qquad \Delta \mid V_\beta, \bar{\Delta}, A. \tag{11}$$

The conditional distributions (11) can be depicted as a graphical model of probabilistic dependencies for the generative model. The prior on the set of regression coefficients β is specified in two stages. First, we specify a Normal prior on β, and then we specify a second-stage prior on the parameters of this distribution. From a practical point of view, the key feature of model (11) is that it converts the problem of assessing a prior on the (m × k)-dimensional joint distribution of the βi into the problem of assessing the hyperparameters Δ̄, A, ν, and V. At each stage, the prior parameters are being projected onto lower dimensional subspaces.
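To illustrate the generative structure summarized in (11), the sketch below simulates synthetic voxel time series top-down, from the hyperpriors to the data. It is only a toy illustration: the dimensions and hyperparameter values are arbitrary choices, and the design matrix is random rather than an HRF-convolved stimulus.

```python
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(0)
n, k, m, nz = 84, 2, 50, 1              # time points, regressors, voxels, z-covariates

nu, V = k + 3, np.eye(k)                # hyperprior V_beta ~ IW(nu, V)
Delta_bar, A = np.zeros((nz, k)), np.eye(nz)

V_beta = invwishart.rvs(df=nu, scale=V, random_state=rng)
# Delta | V_beta ~ N(vec(Delta_bar), V_beta kron A^{-1}), drawn in matrix form
Delta = Delta_bar + np.linalg.solve(np.linalg.cholesky(A).T,
                                    rng.standard_normal((nz, k))) @ np.linalg.cholesky(V_beta).T

Z = np.ones((m, nz))                    # common-mean special case: z_i = 1
X = np.column_stack([np.ones(n), rng.standard_normal(n)])   # intercept + one regressor

Y = np.empty((n, m))
for i in range(m):
    beta_i = Z[i] @ Delta + rng.multivariate_normal(np.zeros(k), V_beta)   # eq. (7)
    sigma2_i = (5 * 0.5**2) / rng.chisquare(5)                             # eq. (9) form
    Y[:, i] = X @ beta_i + np.sqrt(sigma2_i) * rng.standard_normal(n)      # eq. (6)
```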

2.3. MCMC computation

Efficient implementation of Bayesian multilevel models has become feasible with the development of Markov Chain Monte Carlo (MCMC) methods for sampling from the posterior distribution of the parameters. In particular, Gibbs sampling can easily be implemented for the model presented in Section 2.2 [16]. The set of full conditional posterior distributions in (11) is used to determine the full posterior for all the parameters. The procedure can be summarized as follows:

1. Start with initial values Δ⁰, V_β⁰, and {σi²}⁰ ≡ σi⁰.
2. Repeat the following MCMC steps for t = 1, . . . , T iterations:
• For all voxel time series, i = 1, . . . , m, perform Gibbs sampling using a univariate regression model by drawing βi^t and σi^t,

$$\beta_i^t \mid y_i, X_i, z_i, (\Delta^{t-1})', V_\beta^{t-1}, \sigma_i^{t-1} \sim N(\tilde{\beta}, \tilde{V}_\beta), \qquad \sigma_i^t \mid y_i, X_i, \beta_i^t, \nu_i, s_i^2 \sim \nu_i s_i^2/\chi^2_{\nu_i}.$$

• Draw, using the multivariate regression model (4),

$$V_\beta^t \mid \{\beta_i^t\}, \nu, V, Z, \bar{\Delta}, A, \qquad \Delta^t \mid \{\beta_i^t\}, V_\beta^t, Z, \bar{\Delta}, A.$$

The draws of βi^t and {σi²} ≡ σi^t are carried out using

$$\tilde{\beta} = \left(\tilde{X}_i'\tilde{X}_i + (V_\beta^{t-1})^{-1}\right)^{-1}\left(\tilde{X}_i'\tilde{y}_i + (V_\beta^{t-1})^{-1}(\Delta^{t-1})' z_i\right), \qquad \tilde{V}_\beta = \left(\tilde{X}_i'\tilde{X}_i + (V_\beta^{t-1})^{-1}\right)^{-1},$$

with ỹi = yi/σi and X̃i = Xi/σi.

2.4. Autocorrelation

It is often stated that fMRI time series typically demonstrate complicated autocorrelation structure [27]. If present in the fMRI data, temporal autocorrelations influence the amount of bias in the estimated results. Several different strategies have been proposed in the fMRI literature for dealing with autocorrelations (see, e.g. [5,28–30]). Prewhitening is generally considered the most efficient method of handling autocorrelation in fMRI data. The regression model for the fMRI time series with autocorrelated errors may be obtained from (1), by considering that the errors have a multivariate Normal distribution with mean 0 and covariance matrix σ²Ω, where Ω is a positive definite matrix:

$$y_i = X\beta_i + \epsilon_i, \qquad \epsilon_i \overset{iid}{\sim} N(0, \sigma^2\Omega), \qquad i = 1, \ldots, m. \tag{12}$$


Fig. 1 – PPMs representing areas of auditory activation, estimated by the multilevel method for axial slices {13–27} of the auditory data set based on HPD 95% intervals. (a) PPMs with ν = 3 and (b) PPMs with ν = 168.

The prewhitening approach tries to remove the correlation structure in (12) by pre-multiplying both sides by a suitable matrix. Since Ω is a positive definite matrix, there exists a matrix P with the property that PΩP′ = I_n [21]. If we multiply both sides of (12) by P, we obtain the transformed model:

$$y_i^* = X^*\beta_i + \epsilon_i^*, \qquad \epsilon_i^* \sim N(0, \sigma^2 I_n), \qquad i = 1, \ldots, m, \tag{13}$$

where y_i* = Py_i, X* = PX, and ε_i* = Pε_i. Therefore, the transformed model (13) is identical to the Normal regression model (1). Most of the approaches in the fMRI literature use the residuals of an initial ordinary least squares (OLS) model fit to estimate the autocorrelation structure [31]. This estimate is then used to ‘prewhiten’ the model before OLS is used again

to obtain final parameter estimates. However, it has been suggested that prewhitening requires a more robust estimator of the autocorrelation to maintain low bias. In [5], estimation is further improved by using nonlinear spatial filtering to smooth the estimated autocorrelation. This approach has been incorporated in the FMRIB Software Library (FSL) (http://www.fmrib.ox.ac.uk/fsl). In this work, we have used the model proposed in [5], and the FSL implementation, to ‘prewhiten’ the data before applying the model described in Sections 2.1–2.3. However, it is worth noting that an alternative approach would be to apply Bayesian computation directly to the linear model with autocorrelated errors. In this case, model (12) is replaced by another one which assumes that the disturbances obey a stationary autoregressive process of order p.


Fig. 2 – Statistical parametric maps representing areas of auditory activation for axial slices {13–27} of the auditory data set: (a) SPM package using the classical method and (b) FSL package with voxel thresholding.

Hence, this modification specifies:

$$y_t = X\beta_t + \epsilon_t, \qquad \epsilon_t = \sum_{s=1}^{p} \phi_s\,\epsilon_{t-s} + u_t, \qquad u_t \mid (\epsilon_{t-1}, \epsilon_{t-2}, \ldots) \overset{iid}{\sim} N(0, \sigma^2 I_n), \tag{14}$$

for all periods t = 1, . . . , T, and for each regression equation. A Gibbs sampling algorithm for simulating from the posterior of this model was first proposed in [32]. The latter approach has not been implemented in the present work.
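As a small illustration of the prewhitening transform in (13), the sketch below whitens the data and design with P = L⁻¹, where Ω = LL′ is a Cholesky factorization. The AR(1) form of Ω and the coefficient rho are stand-ins used only for the example; in the paper the autocorrelation is estimated with the FSL implementation of [5].

```python
import numpy as np

def ar1_covariance(n, rho):
    """Toeplitz AR(1) correlation matrix, used only as a stand-in for an estimated Omega."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def prewhiten(Y, X, Omega):
    """Transform (12) into the whitened model (13): multiply by P with P Omega P' = I_n."""
    L = np.linalg.cholesky(Omega)     # Omega = L L'
    P = np.linalg.inv(L)              # one valid choice of P
    return P @ Y, P @ X               # y* = P y (column-wise), X* = P X

# usage: Y is the n x m voxel data, X the n x k design
# Ystar, Xstar = prewhiten(Y, X, ar1_covariance(Y.shape[0], rho=0.3))
```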

3. Experiments

In this section we apply the Bayesian multilevel model presented in Section 2 to the analysis of two real fMRI data sets. We compare our results with those obtained from the application of two other popular fMRI analysis tools, the SPM and the FSL packages, to the same data sets. SPM is a software package for the analysis of fMRI experiments [33] (http://www.fil.ion.ucl.ac.uk/spm). FSL is another software package for image analysis and statistical processing of fMRI data, available from the FMRIB group at Oxford University [34] (http://www.fmrib.ox.ac.uk/fsl). The basic methodology used both by the SPM and the FSL packages is a massively univariate approach based on general linear models (GLMs).


Fig. 3 – Statistical parametric maps representing areas of auditory activation for axial slices {13–27} of the auditory data set: (a) SPM package using the VB method and (b) FSL package using the cluster-threshold approach, with a Z-threshold value of 2.3.

It consists of the following three main stages: (1) pre-processing of the fMRI data and specification of the GLM design matrix; (2) estimation of GLM parameters; (3) threshold estimation to produce statistical parametric maps (SPMs). The SPM package also has the possibility of applying an approximate Bayesian analysis at stage (2), commonly referred to as Variational Bayes (VB) [30]. In this case, the model still follows the massively univariate approach, but the GLM parameters are estimated by the VB method. The FSL package offers two choices for threshold estimation: voxel- and cluster-threshold estimation. If voxel thresholding is selected, maximum height thresholding based on Gaussian random field (GRF) theory is carried out [34]. If cluster thresholding is selected, a Z statistic threshold is used to define contiguous clusters. Then, each cluster’s estimated significance level (from GRF theory) is compared with the cluster probability threshold. In Bayesian analysis, posterior probability maps (PPMs) are used for representing the ‘activated’ cortex areas in response to some experimental stimulation. Similarly to classical SPMs, PPMs can be obtained by selecting appropriate thresholds. In MCMC simulations for Bayesian analysis, the thresholds are selected using the posterior distributions of the regressor coefficients. The simulations for the Bayesian multilevel model reported in this section were performed on a slice-by-slice basis.


Fig. 4 – Variability of the posterior predictive y values as a function of the estimated β̂ values in voxels of slice 21 of the auditory data set, for MCMC simulations with three different values of the hyperparameter ν: left panel: ν = 3; middle panel: ν = 45; right panel: ν = 168.

Until recently, MCMC simulations in Bayesian computations for large data sets imposed computational requirements too large for them to be used effectively in practice. However, modern graphics processing units (GPUs) are rapidly gaining maturity as powerful general parallel computing devices. The simulations reported in this section were performed both with sequential C code and parallel GPU code, on a 2.2 GHz notebook equipped with a GeForce 8400M GS NVIDIA device. For processing the MCMC simulations we used both sequential and parallel GPU implementations. The following figures are indicative of the run-times involved in performing the MCMC simulations for the proposed Bayesian multilevel model. Processing slice 21 of the auditory data set described in Section 3.1 with 5000 iterations in sequential C code took 452 s of CPU time. The same simulation took 65 s using the GPU implementation. The GPU software used in this work is freely available as an R package, named cudaBayesreg, at the CRAN repository http://www.r-project.org/. A description of the GPU computational model supporting the parallel MCMC simulation is beyond the scope of the present paper.

3.1. Experiments using the SPM auditory data set

The auditory fMRI data set is available from the SPM site; it has been analyzed by several researchers and is often used as a reference. This data set comprises whole brain BOLD/EPI images, acquired as successive blocks alternating between rest and auditory stimulation, starting with rest. Auditory stimulation consisted of bi-syllabic words presented binaurally at a rate of 60 per minute. Each acquisition consisted of 64 contiguous slices for each volume. Acquisition took 6.05 s, with the scan-to-scan repeat time (TR) set arbitrarily to 7 s. 96 acquisitions were made (TR = 7 s) from a single subject, in blocks of 6, giving 16 blocks of 42 s. As suggested in [33], the first 12 volumes were discarded as ‘dummy scans’, leaving nvol = 84 fMRI volumes. The auditory data set was pre-processed by the SPM software for realignment, coregistration and brain extraction, following the procedure outlined in [33]. Typically, the SPM software uses a Gaussian blurring kernel with a full-width at half maximum (FWHM) of 5 mm, which is applied to the images before estimating the realignment parameters.


Fig. 5 – Histogram of the posterior distribution of the regression coefficient β2, simulated by the multilevel method for slice 21 of the auditory data set. The dotted lines reference the HPD 95% interval: (a) simulation with ν = 3 and (b) simulation with ν = 168.

Fig. 6 – Boxplots of posterior distributions for the regressor coefficient β2: (a) estimates for 30 time series of random voxels in ‘active’ cortex areas and (b) estimates for 30 time series of random voxels in ‘non-active’ cortex areas.

For the purpose of data preparation, we have used the default procedures and parameterizations for the auditory fMRI example reported in [33, Ch. 28]. Following the SPM manual [33], we used a design matrix with just two regressors, without including a temporal derivative regressor. Two regression coefficients (β1 and β2) were estimated. The regression coefficient β1 represents the intercept term. For each slice we selected the HPD 95% interval of the β2 distribution, to define the thresholds of voxel activations associated with the auditory cortex areas. Fig. 1 shows PPMs for axial slices {13–27}, representing cortex areas of auditory activation as estimated by the multilevel method with thresholds selected by highest probability density (HPD) 95% intervals. The slices shown in Fig. 1 were selected among the 46 slices of the auditory data set because they are representative of the main region of interest for auditory activation visualization. Two situations are depicted in Fig. 1 for two values of the hyperparameter ν used in the Bayesian multilevel modeling approach presented in Section 2.2: ν = 3 and ν = 168. The value ν = 168 corresponds to twice the


Fig. 7 – Evolution of the simulated values and autocorrelation functions generated by the multilevel simulation of the Δ values, as used in (11) to draw the voxel regression coefficients {βi}, i = 1, 2, using a subsampling approach with keep factor 10.

number of analyzed volumes. Fig. 2(a) shows SPM images obtained using the SPM package with corrected p-value estimates. A family-wise error (FWE) threshold estimation with default p-value of 0.05 was used to produce the SPM images in Fig. 2(a). Fig. 2(b) presents equivalent statistical parametric images obtained using the FSL package. A ‘voxel-type’ thresholding with p-value of 0.05 was used to produce the images in Fig. 2(b). In Fig. 3(a), SPM images obtained using the SPM package with the VB approach are depicted. Fig. 3(b) shows SPM images obtained using the FSL package with the cluster-threshold approach, for a Z-threshold value of 2.3. In Bayesian methods, the prior induces a phenomenon of shrinkage in which the Bayes estimates are clustered more closely together. In addition, Bayesian multilevel models induce adaptive shrinkage at the group level [12], enabling us to control spatial data variations and noise outliers. In the multilevel method specified in Section 2.2, the amount of shrinkage is controlled both by the prior hyperparameters {ν, V} and the data. Therefore, parameter estimates and noise outliers are shrunk toward the rest of the parameter estimates in proportion to the specified shrinkage factors, yielding reduced variance estimates. Hence, with bias under control, reduced variance implies better estimates. To illustrate the influence of the hyperparameter ν on the shrinkage properties of the multilevel approach, Fig. 4 compares the variability of the posterior predictive values of yi = Xβ̂i in voxels of slice 21 of the auditory data set, for MCMC simulations with three different values of the hyperparameter ν (ν = {3, 45, 168}). The predictive yi values were obtained using the estimated β̂i values at each voxel. Fig. 4 clearly illustrates the phenomenon of ‘shrinkage’ in the Bayesian multilevel approach, and the influence of the hyperparameter ν on the variability of the fitted values. The hyperparameter ν is a shrinkage parameter which works as a regularization parameter in a data-adaptive way. This simple regularization approach contrasts with the elaborate data-adaptive mechanisms proposed by Flandin and Penny [19]. Fig. 5(a) and (b) depicts histograms of the posterior distribution of the regression coefficient β2 simulated by the multilevel method for slice 21, for two values of the hyperparameter ν, ν = {3, 168}. The dotted lines reference the HPD 95% interval for the β2 distribution. The HPD intervals are {−0.55, 0.75} and {−0.6, 0.81}, for ν = 3 and ν = 168, respectively. The upper values of the HPD intervals are used as threshold estimates to produce posterior probability maps.
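The sketch below shows one way such HPD-based thresholds can be computed from MCMC output: it finds the shortest interval containing 95% of the draws and flags voxels whose posterior estimate of β2 exceeds the upper limit. This is a simplified reading of the thresholding rule described above, not the exact procedure implemented in cudaBayesreg.

```python
import numpy as np

def hpd_interval(draws, prob=0.95):
    """Shortest interval containing a fraction `prob` of the draws (HPD for a unimodal sample)."""
    x = np.sort(np.asarray(draws))
    n = len(x)
    width = max(1, min(int(np.ceil(prob * n)), n - 1))
    gaps = x[width:] - x[: n - width]
    j = int(np.argmin(gaps))                  # start index of the shortest interval
    return x[j], x[j + width]

# usage (assumed variable names):
#   beta2_draws - MCMC draws of beta_2 pooled over a slice
#   beta2_hat   - per-voxel posterior means of beta_2
# low, high = hpd_interval(beta2_draws, 0.95)
# ppm_mask = beta2_hat > high                 # voxels declared 'active' in the PPM
```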


Fig. 8 – Draws of the mean of the random effects distribution with three regressors.

To illustrate the influence of the voxel location on the fitted coefficient precision, we show in Fig. 6 boxplots of the posterior distributions of β2 for voxels in ‘activated’ and ‘non-activated’ cortex areas. Fig. 6(a) shows 30 boxplots of β2 distributions, when the multilevel model is used to fit time series of voxels in estimated ‘active’ auditory cortex areas. A similar picture for time series of voxels in estimated ‘non-active’ cortex areas is depicted in Fig. 6(b). The boxes’ lower and upper hinges summarize the limits of the HPD interval. To assess convergence of the simulation, we show in Fig. 7 the evolution of the simulated values and autocorrelation functions (ACFs) generated by the multilevel simulation of the Δ values in (11). Following (11), the Δ values are then used to draw the β estimates for each voxel. Fig. 7 shows good mixing properties, and no significant autocorrelations for a simulation of 5000 iterations. The mixing properties of the sampler may be appreciated in Figs. 8–11 as well. All MCMC simulations were performed using a subsampling approach which keeps

Fig. 9 – Draws of the random effects distribution for class CSF with three regressors.

Fig. 10 – Draws of the random effects distribution for class GRY with three regressors.

Fig. 11 – Draws of the random effects distribution for class WHT with three regressors.

every 10th iteration of the MCMC chain [35]. Additional MCMC simulations were performed with a number of iterations up to 15,000. However, since we observed no significant alterations in the PPM images we used simulations with 5000 iterations for run-time convenience. In general, the number of MCMC iterations required for convergence increases as the number of parameters to be estimated increases. In our case, since MCMC simulations were performed one slice at a time, the number of parameters is related to the number of regressions to process for each slice. For instance, after appropriate masking, slice 21 of the auditory data set requires 2872 regressions. Given this figure, the number of MCMC iterations used in the simulations seems rather low. However, it is worth remembering that convergence of the MCMC simulation is attained when sampling from the stationary (invariant) distribution of the parameters is guaranteed. In fMRI analyses for activation detection, as reflected in PPM images, the posterior distribution of the


Fig. 13 – PPMs representing areas of activation, estimated by the multilevel method for slices {4–9} of the SUBJ1 data set: (a) visual activated areas and (b) auditory activated areas.

Fig. 12 – Design matrix for experiment SUBJ1.

regression coefficients may be well-approximated by a mixture of just two Gaussian distributions, one modeling the null distribution and another modeling the tail of the posterior distribution (see Fig. 5). These two distributions are basically associated with just two data clusters: ‘non-activated’ voxels and ‘activated’ voxels. In broad terms, the dimensionality problem of parameter estimation is reduced to evolving a stationary posterior distribution which may be used to discriminate between two components in a Gaussian mixture. Another factor which is known to influence convergence is the scale of the data. In this work, all MCMC simulations were performed with standardized data.
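The two-cluster reading described above can be illustrated with a simple two-component Gaussian mixture fitted to the per-voxel posterior estimates; the paper does not fit such a mixture explicitly, so the snippet below is only an interpretation of that argument, with scikit-learn used for convenience.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def two_component_split(beta2_hat, seed=0):
    """Split per-voxel posterior estimates into 'activated' / 'non-activated' clusters."""
    x = np.asarray(beta2_hat, dtype=float).reshape(-1, 1)
    gm = GaussianMixture(n_components=2, random_state=seed).fit(x)
    active = int(np.argmax(gm.means_.ravel()))   # component with the larger mean = activated tail
    return gm.predict(x) == active               # boolean mask of 'activated' voxels
```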

ciated with the partition of the SPM auditory data set in three classes: cerebrospinal fluid (CSF), gray matter (GRY), and white matter (WHT). The segmented masks were obtained by applying ‘FAST’ to the structural high-resolution fMRI image, followed by ‘FLIRT’ for low-resolution registration. The segmented images (CSF/GRY/WHT) were then used to build the Z matrix in (8). In addition, to account for variability in the shape of the voxels’ time series response, we included the derivative of the HRF as a third regressor in the specification of the design matrix. Thus, in this case the MCMC simulation used three regression variables, in which the first represents the intercept and the second is the main regressor. Fig. 8 depicts draws of the mean of the random effects distribution for each of the three regression variables used in the design matrix X. Figs. 9–11 show, for each of the three segmentation classes (CSF/GRY/WHT), the random effects associated with the regression variables. To ease interpretation, the plots for classes CSF/GRY/WHT were drawn at the same scale. These plots clearly show the influence of different tissue types on the

3.2. Analysis of random effects for the SPM auditory data set In this section, we exemplify the analysis of the random effects distribution  (see (8)), following the specification of cross-sectional units (group information) in the Z matrix of the statistical model. The Bayesian multilevel statistical model allows for the analysis of random effects through the specification of the Z matrix for the prior in (8). The FSL tools [34], were used to obtain the segmented masks asso-

Fig. 14 – SPMs representing areas of activation, estimated by the SPM package for slices {4–9} of the SUBJ1 data set: (a) visual activated areas and (b) auditory activated areas.

250

c o m p u t e r m e t h o d s a n d p r o g r a m s i n b i o m e d i c i n e 1 0 2 ( 2 0 1 1 ) 238–252

Fig. 15 – SPMs representing areas of activation, estimated by the FSL package for slices {4–9} of the SUBJ1 data set: (a) visual activated areas and (b) auditory activated areas.

mean values of the regression estimates. The plots also show that the draws associated with the temporal derivative variable have higher variability than the other draws. In general, the information provided by these analyzes might be useful in identifying factors influencing estimation, and suggest ways for model improvement. A natural extension of these techniques is the estimation of multi-subject random effects.

3.3.

Experiments using the SUBJ1 data set

The fMRI data set used in this section is a complete test data set of a visual/auditory experiment. The data set is available from the University of Cambridge, Department of Psychiatry, Brain Mapping Unit (http://wwwbmu.psychiatry.cam.ac.uk/Data, Brain Mapping Unit). This data set, referred to as SUBJ1, has width = 128(w) × 64(h) × 100(t), and includes 14 slices per volume. The data set includes the specification of the double column design matrix used in the experiment. The first column contains the visual cue, while the next column contains the auditory cue. Full details about the characteristics of this data set are available from the Cambridge BMU data depository. In this experiment, we followed the practice of including the HRF waveforms (regressor coefficients ˇ2 and ˇ4 ) as well as their derivatives (regressor coefficients ˇ3 and ˇ5 ) to account for variability in the shape of the response. The regressor coefficient ˇ1 represents the intercept term. Fig. 12 shows the design matrix, as prepared by the FSL software. As in Section 3.1, PPMs representing the ‘activated’ cortex areas in response to the experimental stimulation can be obtained by selecting appropriate thresholds. For each slice, we selected the HPD 95% intervals of the ˇ2 and ˇ4 distributions to define the thresholds of voxel activations associated with the visual and the auditory cortex areas, respectively. Fig. 13(a) shows PPMs for axial slices {4–9}, representing cortex areas of visual activation as estimated by the multilevel method. In a similar vein, areas of auditory activation are shown in the PPMs of Fig. 13(b). For comparison purposes, the SPM and FSL packages were applied to the same data set.

Fig. 16 – Examples of voxel time series (dotted lines) and fitted time series using the multilevel method (solid lines) for two classes of voxels of the SUBJ1 data set: (a) estimated ‘active’ voxel in the visual cortex area (top panel) and (b) estimated ‘active’ voxel in the auditory cortex area (bottom panel).

Fig. 14 shows images representing auditory and visual areas of activation produced by the SPM package using the classical method. Fig. 15 shows equivalent images, as produced by the FSL package using the voxel approach. Fig. 16 shows two examples of voxels’ time series, and the corresponding fitted responses using the multilevel method. The top panel depicts the fitted response for a voxel estimated ‘active’ in the visual cortex. The bottom panel depicts the fitted response for a voxel estimated ‘active’ in the auditory cortex.

4. Discussion and conclusion

As referred to in Section 1, the third stage of the classical fMRI methodology focuses on the identification of those areas of the brain that were activated by the stimuli, by thresholding the statistic map at a specified level. In classical inference, the threshold is determined using a classical null hypothesis based on p-values. However, since inferences are carried out over many voxels and large volumes of the brain, the problem is to choose a significance level for each test such that the family-wise error rate (FWE) is controlled at some prespecified level [11,13]. This so-called multiple comparison problem requires an adjustment or correction to the p-values using some correction procedure. Many multiple comparison correction approaches have been proposed in the literature (see [11] for a review). In contrast to standard SPM analyses, the Bayesian multilevel method presented in this work does not employ multiple comparison correction mechanisms. As shown by Gelman in [12], Bayesian multilevel methods, if properly designed, automatically introduce corrections for multiple comparisons. A similar point of view is expressed by Friston and Penny in [13]: “From the point of view of neuroimaging, posterior


inference is especially useful because it eschews the multiple-comparison problem”. This is the perspective that has been espoused in this work as well. The experimental analyses and the PPM images presented in Section 3 show that the Bayesian approach proposed in this work achieves good results in comparison with the ‘state-of-the-art’ approaches incorporated in the SPM and FSL packages. We consider that the techniques reflected in these packages are representative of the field, since they are used by the vast majority of researchers in neuroimaging and cognitive neuroscience. Another problem with classical analysis arises from the non-robustness of the estimates as regards noise and outliers. In standard fMRI analysis, spatial smoothing is often used as a pre-processing step to improve signal-to-noise ratio, and keep the potentially harmful effects of outliers under control. However, these pre-processing steps also have the side effect of blurring the measured signal and reducing sensitivity, thus compromising the quality of the estimates. As seen in Section 3.1, the multilevel approach automatically shrinks noise outliers toward the rest of the parameter estimates in a data-adaptive way. Therefore, the importance of spatial smoothing techniques in Bayesian analysis of fMRI data is greatly reduced. A key assumption of analysis of variance models is that we can make inferences about differences between means by looking at differences between variances. In fMRI data analysis, voxel data provide covariate information at different levels of variation. Besides random noise and correlated measurements, activation patterns associated with different cortex structures contribute to this variability. Bayesian multilevel models are specially designed to measure variability based on a particular prior structure. Gains in estimating regression coefficients are expected when covariance is structured hierarchically. In Section 3, we produced experimental evidence to show that the multilevel model proposed in Section 2 is able to ‘adapt’ the level of shrinkage to the information in the data. The multilevel method used the similarity between voxel time series to improve inference. The level of similarity is determined by the data and conditioned by the prior. A high level of similarity means a high level of information shared and better precision. The gains in parameter precision have a direct impact on the posterior probability estimates, which serve to specify the activation thresholds and outline posterior probability maps. Hence, the estimated threshold values impact the classification of cortex areas as activated or non-activated, and the ensuing interpretation of the results. Bayesian threshold calibration relies on probabilities of activation, obtained directly from the posterior distribution of the parameters. A further advantage of the Bayesian estimates over standard model practices is that we obtain classification probabilities, instead of crude binary classifications derived from tests of hypothesis. In this work we have used a diffuse Wishart conjugate prior in the Bayesian estimation of the covariance matrix of the distribution of the random effects. The main reason for using the Wishart distribution W(ν, V) is that it is the conjugate prior distribution for the inverse covariance matrix in a multivariate Normal distribution. To guarantee that the distribution is proper, ν ≥ k + 1, where k × k is the dimension of V. Wishart distributions are


computationally convenient for sampling purposes; multivariate normal simulations can be used to simulate a draw from the Wishart distribution. For this reason, the Wishart distribution is commonly used in Monte Carlo statistical simulations. A non-informative prior is obtained as ν → 0. However, as pointed out by one anonymous reviewer, there is a limit on the degree of diffuseness that can be obtained. Alternative formulations for the problem of Bayesian estimation of covariance matrices in hierarchical models have been analyzed by several authors. In particular, Daniels and Kass [36] have proposed a set of hierarchical priors for the covariance matrix that produces posterior shrinkage toward a specific structure, namely diagonality. In general, diffuse Wishart conjugate priors are found effective for large samples, as is the case of the applications considered in this work. Nevertheless, given the impact of the covariance matrix on the estimation of the standard errors of the quantities of interest, namely random effects, further research on the choice of the prior of the covariance matrix and its computational implications is necessary. The multilevel model presented in Section 2 used a Normal distribution as its first-stage prior. In some situations it may be desirable to investigate structural heterogeneity across voxel time series and activation patterns. A more flexible variant of the Normal model would be to consider a mixture of Normals approach for the first-stage prior. The mixture model would be able to identify more detailed structure than that afforded by the Normal model. For instance, the prior model proposed in [37,38] uses a Dirichlet process mixture (DPM) approach. The DPM model is very flexible for approximating multimodal, heavy-tailed or skewed distributions. Besides, it is a valuable tool for cluster classification. As referred to in Section 2.4, another extension of the proposed model would be to include autocorrelation error correction directly in the Bayesian computational model. Finally, we mention that Bayesian multilevel methods are expected to excel in modeling multi-subject and/or multi-session fMRI data. Testing these modifications in neuroimaging is an area of future research.

Acknowledgments

The author thanks the three anonymous reviewers for their constructive comments and valuable recommendations. The references on hierarchical priors for covariance matrices presented by one anonymous reviewer are especially appreciated.

References

[1] R. Frackowiak, K. Friston, C. Frith, R. Dolan, C. Price, S. Zeki, J. Ashburner, W. Penny, Human Brain Function, 2nd ed., Academic Press, 2003, URL http://www.fil.ion.ucl.ac.uk/spm/doc/books/hbf2/. [2] G.E. Sardy, Computing Brain Activity Maps from fMRI Time-Series Images, Cambridge University Press, 2007. [3] G. Judge, R. Hill, W. Griffiths, H. Lütkepohl, T. Lee, Introduction to the Theory and Practice of Econometrics, 2nd ed., John Wiley and Sons, 1988.


[4] A. Holmes, J. Poline, K. Friston, Characterizing brain images with the general linear model, in: R. Frackowiak, K. Friston, C. Frith, R. Dolan, J. Mazziotta (Eds.), Human Brain Function, Academic Press, 1997, pp. 59–84. [5] M.W. Woolrich, B. Ripley, M. Brady, S.M. Smith, Temporal autocorrelation in univariate linear modeling of FMRI data, NeuroImage 14 (2001) 1370–1386. [6] K. Friston, W. Penny, Posterior probability maps and SPMs, NeuroImage 19 (3) (2003) 1240–1249. [7] J. Berger, T. Sellke, Testing a point null hypothesis: the irreconcilability of p-values and evidence, Journal of the American Statistical Association 82 (1987) 112–122. [8] B. Thompson, Statistical significance tests, effect size reporting, and the vain pursuit of pseudo-objectivity, Theory and Psychology 9 (1999) 191–196. [9] J. Berger, D. Berry, Statistical analysis and the illusion of objectivity, American Scientist 76 (1988) 159–165. [10] K.J. Friston, W. Penny, C. Phillips, S. Kiebel, G. Hinton, J. Ashburner, Classical and Bayesian inference in neuroimaging: theory, NeuroImage 16 (2002) 465–483. [11] J.L. Marchini, A. Presanis, Comparing methods of analyzing fMRI statistical parametric maps, NeuroImage 22 (2004) 1203–1213. [12] A. Gelman, J. Hill, M. Yajima, Why we (usually) don’t have to worry about multiple comparisons, unpublished paper, Dept. of Statistics, Columbia University, Available from http://www.stat.columbia.edu/gelman/research/unpublished/multiple2.pdf (April 2008). [13] K. Friston, W. Penny, Posterior probability maps and SPMs, NeuroImage 19 (2003) 1240–1249. [14] M. Woolrich, M. Jenkinson, J. Brady, S. Smith, Fully Bayesian spatiotemporal modeling of fMRI data, IEEE Transactions on Medical Imaging 23 (2) (2004) 213–231. [15] A. Gelman, Multilevel (hierarchical) modeling: what it can and cannot do, Technometrics 48 (3) (2006) 432–435. [16] P.E. Rossi, G. Allenby, R. McCulloch, Bayesian Statistics and Marketing, John Wiley and Sons, 2005. [17] M.W. Woolrich, T. Behrens, C.F. Beckmann, M. Jenkinson, S.M. Smith, Multilevel linear modelling for FMRI group analysis using Bayesian inference, NeuroImage 21 (2004) 1732–1747. [18] W. Penny, N. Trujillo-Bareto, K. Friston, Bayesian fMRI time series analysis with spatial priors, NeuroImage 24 (2) (2005) 350–362, doi:10.1016/j.neuroimage.2004.08.034. [19] G. Flandin, W. Penny, Bayesian fMRI data analysis with sparse spatial basis function priors, NeuroImage 34 (3) (2007) 1108–1125, doi:10.1016/j.neuroimage.2006.10.005. [20] F. de Pasquale, C.D. Gratta, G. Romani, Empirical Markov Chain Monte Carlo Bayesian analysis of fMRI data, NeuroImage 42 (2008) 99–111. [21] K. Mardia, J. Kent, J. Bibby, Multivariate Analysis, Academic Press, 1979. [22] G.E. Box, G. Tiao, Bayesian Inference in Statistical Analysis, John Wiley and Sons, 1973.

[23] D.V. Lindley, A. Smith, Bayes estimates for the linear model, Journal of the Royal Statistical Society B 34 (1) (1972) 1–41. [24] J. Press, Subjective and Objective Bayesian Statistics, 2nd ed., John Wiley, 2003. [25] A. Zellner, An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias, Journal of the American Statistical Association 57 (1962) 348–368. [26] J. Geweke, Contemporary Bayesian Econometrics and Statistics, John Wiley and Sons, 2005. [27] N. Lazar, The Statistical Analysis of Functional MRI Data, Springer, 2008. [28] K.J. Worsley, C.H. Liao, J. Aston, V. Petre, G.H. Duncan, F. Morales, A.C. Evans, A general statistical analysis for fMRI data, NeuroImage 15 (2002) 1–15. [29] W. Penny, G. Flandin, N. Trujillo-Bareto, Bayesian Comparison of Spatially Regularised General Linear Models, Human Brain Mapping 28 (2007) 275–293. [30] W. Penny, S. Kiebel, K. Friston, Variational Bayesian inference for fMRI time series, NeuroImage 19 (3) (2003) 727–741. [31] J.L. Marchini, S.M. Smith, On bias in the estimation of autocorrelations for fMRI voxel time-series analysis, NeuroImage 18 (2003) 83–90. [32] S. Chib, Bayes regression with autoregressive errors: a Gibbs sampling approach, Journal of Econometrics 58 (1993) 275–294. [33] J. Ashburner, C. Chen, G. Flandin, R. Henson, S. Kiebel, J. Kilner, V. Litvak, R. Moran, W. Penny, K. Stephan, C. Hutton, V. Glauche, J. Mattout, C. Phillips, The SPM8 Manual, Functional Imaging Laboratory, Wellcome Trust Centre for Neuroimaging, Institute of Neurology, UCL, London, October 2008, URL http://www.fil.ion.ucl.ac.uk/spm/. [34] S.M. Smith, M. Jenkinson, M.W. Woolrich, C.F. Beckmann, T.E. Behrens, H. Johansen-Berg, P.R. Bannister, M.D. Luca, I. Drobnjak, D.E. Flitney, R.K. Niazy, J. Saunders, J. Vickers, Y. Zhang, N.D. Stefano, J.M. Brad, P.M. Matthews, Advances in Functional and Structural MR Image Analysis and Implementation as FSL, Tech. Rep. TR04SS2, FMRIB (Oxford Centre for Functional Magnetic Resonance Imaging of the Brain), 2004. [35] C.P. Robert, G. Casella, Monte Carlo Statistical Methods, 2nd ed., Springer Texts in Statistics, Springer-Verlag, New York, 2004. [36] M. Daniels, R. Kass, Nonconjugate Bayesian estimation of covariance matrices and its use in hierarchical models, Journal of the American Statistical Association 94 (448) (1999) 1254–1263. [37] A.R. Ferreira da Silva, A Dirichlet process mixture model for brain MRI tissue classification, Medical Image Analysis 11 (2007) 169–182. [38] A.R. Ferreira da Silva, Bayesian mixture models of variable dimension for image segmentation, Computer Methods and Programs in Biomedicine 94 (2009) 1–14.