CHAPTER 64

Imaging Brain Networks for Language: Methodology and Examples from the Neurobiology of Reading

Anjali Raja Beharelle1 and Steven L. Small2

1 Laboratory for Social and Neural Systems Research, University of Zurich, Zurich, Switzerland; 2 Department of Neurology, University of California, Irvine, CA, USA
64.1 INTRODUCTION

Functional neuroimaging is a powerful tool for answering questions regarding how the brain implements language. Standard functional neuroimaging methods tend to treat the brain from a modular perspective, identifying sets of individual regions that are active during particular language tasks, and the majority of studies examining the neurobiology of language have used these methods to map areas involved in reading and spoken language. Anatomically, however, these areas are linked by dense and complex connections, suggesting that interregional communication plays a key role in cognitive function. More recent neuroimaging methods have been developed to examine the interacting and overlapping networks of brain regions that support reading. These methods permit the exploration of the functional properties of any single region with respect to the activity of the other regions within the network (neural context; McIntosh, 1998, 2000), which is essential for characterizing the integration of function among regions. In this chapter, we summarize some of the principal methods of assessing functional and effective brain connectivity and give examples of current studies using these approaches to investigate reading-related networks. We also discuss the advantage each method confers relative to the other methods. Functional and effective connectivity can be assessed in the context of a particular behavior (task-dependent) or during rest (task-independent). Task-dependent
approaches detect synchronization of activity in neural regions in response to extrinsic stimulation. Approaches involving the absence of a task examine intrinsic functional connections that are formed via spontaneous activity arising during the “resting state” (Biswal, Yetkin, Haughton, & Hyde, 1995). Resting state functional connectivity is thought to reflect a history of coactivation among regions across a wide range of tasks and time (Fox & Raichle, 2007) and can be used to group regions across the whole brain into sets that show high correlations of spontaneous activity (these groups have been referred to as communities, modules, subnetworks, or clusters in network analyses; Power et al., 2011). Connectivity analyses require the selection of a set of relevant nodes within the network to focus the investigation. This is a nontrivial undertaking because there is a lack of consensus regarding how best to define fundamental neural elements (Craddock et al., 2013), and connectivity results can vary based on choices that are made regarding node definition. The specific brain subunits encompassed by the nodes can range from patches of cortex as small as one voxel in size to macroscopic brain regions (such as the pars opercularis of the inferior frontal gyrus). When macroscopic brain regions are used, parcellation schemes have varied widely and can be anatomically based (e.g., automated anatomic labeling (AAL), Tzourio-Mazoyer et al., 2002; the Talairach Daemon, Lancaster et al., 2000; or the “Destrieux” atlas, Fischl et al., 2004) or functionally based (e.g., the CC200 and CC400 atlases that are derived from 200- and 400-unit
functional parcellations, respectively; Craddock, James, Holtzheimer, Hu, & Mayberg, 2012). Although these atlases share some similarities macroscopically, the specific details of the parcellations can vary considerably. Meta-analyses can also be conducted to define nodes based on previous task-based neuroimaging studies (e.g., Vogel et al., 2013).
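To make the node-definition step concrete, the following sketch (Python, entirely synthetic data, and a made-up three-parcel labeling rather than any published atlas) averages voxel time series within each parcel to yield one time series per node, which is the typical starting point for the connectivity analyses described below.

```python
import numpy as np

# Synthetic example: a tiny 4-D "fMRI" array (x, y, z, time) and a 3-D label
# volume assigning each voxel to one of three hypothetical parcels (0 = background).
rng = np.random.default_rng(0)
data = rng.standard_normal((10, 10, 10, 120))    # 120 time points
labels = rng.integers(0, 4, size=(10, 10, 10))   # parcels 1-3, 0 = outside the brain

def node_time_series(data, labels):
    """Average the time series of all voxels sharing a label (one node per parcel)."""
    parcels = [p for p in np.unique(labels) if p != 0]
    ts = np.empty((data.shape[-1], len(parcels)))
    for i, p in enumerate(parcels):
        ts[:, i] = data[labels == p].mean(axis=0)  # mean over voxels in parcel p
    return ts, parcels

ts, parcels = node_time_series(data, labels)
print(ts.shape)   # (time points, nodes)
```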
64.2 FUNCTIONAL CONNECTIVITY ANALYSES: A SET OF EXPLORATORY TECHNIQUES

64.2.1 Overview

Functional connectivity can be defined as the synchronization between spatially remote neurophysiological events (Friston, Frith, Liddle, & Frackowiak, 1993). First introduced via early electroencephalography (EEG) and multiunit recording studies, functional connectivity analyses were applied to positron emission tomography (PET) and functional magnetic resonance imaging (fMRI) in 1993. Analyses of functional connectivity identify reliable patterns of covarying brain signals that index neural activity. These techniques are exploratory in nature because they are mostly data-driven and are not based on an explicit hypothesis or model about the relationships among neural regions or the effects of task conditions or subject groups. Importantly, a functional connection does not necessarily arise from direct communication between the two regions, because their covariance (or correlation) could be due to input from a third region (or a variety of other inputs), and thus causal inferences cannot be made about the association.
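This caveat about indirect influences can be illustrated with a toy simulation (synthetic data only): two "regions" A and B that never interact directly but share input from a third region C show a strong correlation, most of which disappears once C is partialled out.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
C = rng.standard_normal(n)                  # common driving input
A = 0.8 * C + 0.3 * rng.standard_normal(n)  # A receives input from C only
B = 0.8 * C + 0.3 * rng.standard_normal(n)  # B receives input from C only

r_ab = np.corrcoef(A, B)[0, 1]              # apparent "functional connectivity" of A and B

# Partial correlation of A and B controlling for C: correlate the residuals
# after regressing C out of each signal.
def residualize(x, z):
    beta = np.dot(z, x) / np.dot(z, z)
    return x - beta * z

r_partial = np.corrcoef(residualize(A, C), residualize(B, C))[0, 1]
print(f"corr(A, B) = {r_ab:.2f}, partial corr given C = {r_partial:.2f}")
```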
64.2.2 Independent Components Analysis

64.2.2.1 ICA Method

Independent components analysis (ICA) is used to take a large data set consisting of many variables and reduce it to a smaller number of dimensions that can be understood as self-organized functional networks (Beckmann & Smith, 2004). Unlike principal components analysis (PCA), which assumes that the components are uncorrelated in both spatial and temporal domains, ICA components are maximally statistically independent in only one domain. The rationale for ICA is that the blood-oxygen-level-dependent (BOLD) signal measured within the voxels can be regarded as a linear combination of a smaller number of independent component sources. The independent components are identified to be maximally statistically independent, but they are not necessarily uncorrelated as principal components are (McIntosh & Mišić, 2013).
For neuroimaging analyses, independence among components can be imposed in either the spatial (spatial ICA) or the temporal (temporal ICA) domain. Spatial ICA is used more often for fMRI analyses because neural activity is assumed to be sparse among a large number of voxels. Therefore, the independent components isolate coherent networks that overlap as little as possible. However, this assumption of sparseness in the brain can be problematic because spatial ICA will push each noncontiguous activity cluster into separate components. Temporal ICA is more often used for event-related potential (ERP) data because scalp recordings have distinct time courses; therefore, the underlying components are assumed to be temporally independent but may have overlapping spatial topographies. To compare components across participants, the ICA can be performed on all participants as a group. Here, the data from all subjects are concatenated so that each subject is treated as an observation of the same underlying system (Calhoun, Adali, Pearlson, & Pekar, 2001; Kovacevic & McIntosh, 2007). If data are concatenated along the spatial dimension, subjects will have unique spatial maps but common time courses. The converse is true if data are concatenated along the temporal dimension. The concatenated group data are then decomposed into independent components. This puts all subjects in the same space and allows them to be directly compared. Statistical inference on the independent components is then possible. This method is useful for extracting functionally segregated networks supporting a cognitive function; unlike the other connectivity methods, it establishes that the regions identified as active during task performance are independent of the rest of the brain.

64.2.2.2 ICA: Reading Network Example

In an fMRI study, Ye, Doñamayor, and Münte (2014) used ICA to examine connectivity across the whole brain underlying semantic integration during a sentence reading task with either semantically congruent or incongruent endings. The authors extracted a functional network consisting of the supplementary motor area, left basal ganglia, left inferior frontal gyrus, left middle temporal gyrus, and left angular gyrus that was modulated by the semantic manipulation in the semantic reading task. The time courses of these regions were highly correlated, and their activity was greater for incongruent versus congruent sentence endings.
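As a schematic illustration of group spatial ICA, the following sketch applies scikit-learn's FastICA to synthetic subject data concatenated along the temporal dimension; it is a simplified stand-in for the group ICA pipelines cited above (e.g., Calhoun et al., 2001), not a reproduction of them.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(2)
n_subjects, n_timepoints, n_voxels, n_components = 5, 100, 400, 3

# Ground-truth spatial maps shared across subjects; each subject gets its own time courses.
true_maps = rng.standard_normal((n_components, n_voxels))
subject_data = [rng.standard_normal((n_timepoints, n_components)) @ true_maps
                + 0.5 * rng.standard_normal((n_timepoints, n_voxels))
                for _ in range(n_subjects)]

# Temporal concatenation: subjects share spatial maps but keep unique time courses.
group = np.vstack(subject_data)                  # (subjects * time) x voxels

# Spatial ICA: treat voxels as samples, so the recovered sources are spatial maps
# that are maximally statistically independent over space.
ica = FastICA(n_components=n_components, random_state=0)
maps = ica.fit_transform(group.T).T              # components x voxels (spatial maps)
time_courses = ica.mixing_                       # (subjects * time) x components
print(maps.shape, time_courses.shape)
```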
64.2.3 Seed Partial Least Squares (PLS)

64.2.3.1 PLS Method

In general, seed-based functional connectivity techniques examine the correlation of activity across the brain with an a priori region
of interest (ROI) or “seed” region. In its most basic form, an averaged ROI time series is correlated with the time series of all other voxels in the brain or with the average time series of a set of ROIs. The seed region can be defined based on functional activity or anatomical parcellation. A specific type of seed-based functional connectivity can be performed with partial least squares (PLS) (McIntosh, Bookstein, Haxby, & Grady, 1996). PLS is a multivariate analysis technique that can be used to identify a set of variables (called latent variables (LVs)) that optimally link spatiotemporal brain data to the task design or to behavioral measures, or, in the case of seed PLS, that link functional connectivity to other neural seed regions by extracting commonalities between them. PLS is similar to PCA, with several important differences: (i) PLS analysis is constrained to the part of the covariance matrix that is related to the time series of the given neural seed region, which allows brain connectivity results to be interpreted with respect to each experimental condition; (ii) statistical inferences regarding the significance of the experimental manipulations are made using nonparametric permutation methods that allow one to select the LVs that significantly express task or behavioral effects on connectivity; and (iii) bootstrap resampling is used to retain only voxels that robustly express the task or the behavioral effects. Finally, PLS is well-suited to handle large data sets in which the dependent measures are highly correlated; it is therefore well-matched to the analysis of neuroimaging data (McIntosh & Lobaugh, 2004). Brain activity data for PLS are organized into a 2D matrix where the rows contain scan data for each participant within each task condition and the columns consist of voxels × time. A second matrix consisting of time series of the seed neural region is similarly stacked by participant within condition within participant group. The data from the seed region are then correlated with the overall brain data of the participants and subjected to singular value decomposition (SVD). From the input matrix, SVD creates a set of orthogonal singular vectors (the LVs), which represent the entire covariance of the mean-centered matrix in decreasing order of magnitude and whose number is equal to the total number of task conditions times groups. Thus, the LVs can be thought of as being similar to the eigenvectors generated by PCA. Each LV consists of a pair of left- and right-singular vectors that relate brain connectivity to the experimental design. The weights within the LV at each voxel–time point combination are referred to as voxel saliences. The voxel saliences identify a collection of voxels that, as a group, have connectivity to the seed region most related to the task design. The task saliences indicate
the degree to which each task is related to the pattern of BOLD connectivity differences. The saliences are similar to PCA eigenvector weights. To determine how often the singular value matrix for an LV generated from the original analysis is larger than singular value matrices generated from random data, permutation testing is used. Permutation testing involves resampling without replacement, where data are shuffled to reassign the order of task conditions for each participant (Good, 2004). PLS is then re-run on each set of reordered data, and the number of times the permuted singular values exceed the original values is calculated and given a probability. The 95th percentile of the resulting probability distribution of singular values is used as the significance threshold. In this scenario, the assumption of a normal distribution is not required (McIntosh et al., 1996). In a second step, one assesses the reliability of each voxel’s contribution to the LV by estimating the standard error of the voxel saliences using bootstrapping. Bootstrapping consists of resampling with replacement, where participant data are shuffled while the experiment conditions remain fixed. SVD is then performed on the resampled matrix consisting of participant subsets of the original data set, and the standard errors of the voxels contributing to the task effects are calculated (Efron & Tibshirani, 1993). The ratio of the salience to the standard error of the voxels is used to threshold the data and can be thought of as similar to a z-score if the data are normally distributed. Although this method is quite data-driven, the advantage of seed PLS relative to the other methods is that it permits the testing of a hypothesis focused on a particular ROI.

64.2.3.2 PLS: Reading Network Example

Reinke, Fernandes, Schwindt, O’Craven, and Grady (2008) examined how whole brain functional connectivity with the visual word form area (VWFA) changed based on whether participants viewed English words, meaningful symbols such as $ and %, digits, words in an unfamiliar language (Hebrew), or a control set of stimuli consisting of geometric shapes. They were able to show that while neural activity in the VWFA did not differ significantly for words and meaningful symbols, a functional network of regions including the left hippocampus, left lateral temporal, and left prefrontal cortices was specific to words. This study underscored that it is the neural context of the VWFA, that is, the broader distributed brain activity correlated with the VWFA, rather than activity in the focal region itself, that is specific to visual word processing.
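The following sketch illustrates the core computations of seed PLS on synthetic data: seed-to-voxel correlation maps are stacked by condition and subject, mean-centered, and decomposed with SVD, and the first latent variable is tested with a simple permutation scheme. The data, dimensions, and the simplified mean-centering are assumptions for illustration and omit many details of the full PLS procedure.

```python
import numpy as np

rng = np.random.default_rng(3)
n_subjects, n_conditions, n_time, n_voxels = 10, 2, 80, 200
seed = rng.standard_normal((n_subjects, n_conditions, n_time))
brain = rng.standard_normal((n_subjects, n_conditions, n_time, n_voxels))

def seed_voxel_corr(seed, brain):
    """Stack seed-to-voxel correlation maps: rows = condition x subject, cols = voxels."""
    rows = []
    for c in range(n_conditions):
        for s in range(n_subjects):
            x = seed[s, c] - seed[s, c].mean()
            y = brain[s, c] - brain[s, c].mean(axis=0)
            rows.append((x @ y) / (n_time * x.std() * y.std(axis=0)))
    return np.array(rows)

corr = seed_voxel_corr(seed, brain)
U, svals, Vt = np.linalg.svd(corr - corr.mean(axis=0), full_matrices=False)
# Rows of Vt are voxel saliences; columns of U relate conditions/subjects to each LV.

# Permutation test on the first singular value: shuffle condition labels within subjects.
n_perm, count = 500, 0
for _ in range(n_perm):
    perm = seed.copy()
    for s in range(n_subjects):
        perm[s] = perm[s, rng.permutation(n_conditions)]
    cp = seed_voxel_corr(perm, brain)
    sp = np.linalg.svd(cp - cp.mean(axis=0), compute_uv=False)
    count += sp[0] >= svals[0]
print(f"p = {(count + 1) / (n_perm + 1):.3f} for the first latent variable")
```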
64.2.4 Synchronization of Neuronal Oscillations
64.2.4.1 SNO Method

Scalp recording techniques such as magnetoencephalography (MEG) and EEG, which afford greater temporal resolution than fMRI, allow for real-time investigation of brain network dynamics during reading. Constituent neuronal populations involved in the same functional network are identifiable because they fire in synchrony at a given frequency. The specificity of this frequency allows the neuronal population to participate in a variety of representations at different points in time. In addition, oscillatory synchrony can also serve to bind together information that is represented in the different neuronal populations (Gray, König, Engel, & Singer, 1989). In its simplest form, one can investigate the linear interdependencies or correlations between the amplitudes of various EEG or MEG signals. This avoids the necessity to band-pass filter or extract instantaneous phase. One can also examine the cross-correlation, which includes further information about the systematic time shifts between the amplitudes of the two signals. However, studies often examine phase synchronization in frequency space. Cross-spectral density can be computed by multiplying the Fourier transform of one time series by the complex conjugate of the Fourier transform of the other. Coherence is assessed by normalizing the cross-spectral density by the power spectral densities of both time series. This value ranges from 0, if the signals have no similarity, to 1, if the signals are identical. It is critical for such a synchronization analysis to select the frequency bands of interest. Often, one can assess reliability via confidence levels by comparing to synthetic data, which are created by shuffling the time points of the original data while preserving spatial relationships (Gross et al., 2001). The advantage of this method is that it allows the researcher to characterize dynamics among regions involved in reading on a faster temporal scale than the other methods.

64.2.4.2 SNO: Reading Network Example

Kujala et al. (2007) found the strongest synchronization among regions often implicated in reading (occipitotemporal, medial, superior, and inferior temporal, prefrontal, and orbital cortices, face motor areas, insula, and cerebellum) at 8–13 Hz during a rapid reading task. Notably, regions such as the supramarginal gyrus or the posterior superior temporal cortex, which are thought to be involved in grapheme-to-phoneme conversion, were not a part of the network, potentially because the nature of the task required a more lexical-semantic than phonological reading strategy.
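A minimal sketch of a coherence calculation is shown below, using scipy.signal.coherence on two synthetic signals that share a 10 Hz rhythm; the sampling rate, band limits, and signal construction are arbitrary choices for illustration.

```python
import numpy as np
from scipy.signal import coherence

rng = np.random.default_rng(4)
fs = 250.0                                     # sampling rate in Hz
t = np.arange(0, 20, 1 / fs)

# Two synthetic "sensor" signals sharing a 10 Hz oscillation plus independent noise.
shared = np.sin(2 * np.pi * 10 * t)
sig_a = shared + rng.standard_normal(t.size)
sig_b = np.roll(shared, 10) + rng.standard_normal(t.size)   # same rhythm, shifted in time

# Coherence = |cross-spectral density|^2 normalized by the two power spectral densities;
# it ranges from 0 (no linear relationship at that frequency) to 1 (identical signals).
f, coh = coherence(sig_a, sig_b, fs=fs, nperseg=512)

band = (f >= 8) & (f <= 13)                    # frequency band of interest
print(f"mean coherence in the 8-13 Hz band: {coh[band].mean():.2f}")
```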
64.3 EFFECTIVE CONNECTIVITY ANALYSES: A SET OF CONFIRMATORY TECHNIQUES

64.3.1 Overview

Effective connectivity is defined as a directed causal influence of one region on another (Aertsen, Gerstein, Habib, & Palm, 1989). Analyses of effective connectivity involve confirmation of hypotheses. Unlike exploratory analyses, a confirmatory approach begins with the construction of an explicit model of interregional neural relationships. The model is then tested for goodness of fit with the observed data and/or whether it can fit the observed data better than an alternative model. Therefore, effective connectivity analyses test precise hypotheses that take into account external inputs and the neuroanatomical architecture rather than being data-driven like functional connectivity analyses (McIntosh & Mišić, 2013).
64.3.2 Psychophysiological Interactions (PPI)

64.3.2.1 PPI Method

If an experimental manipulation relates to significant changes in the correlation between a pair of brain regions, then this suggests an interaction between the psychological variable and the neural or physiological connectivity, termed a psychophysiological interaction (PPI) (Friston et al., 1997). In a typical functional connectivity analysis, regions may have some baseline correlations due to anatomical connections as in the case of resting state networks (Biswal et al., 1995), common sensory inputs, or neuromodulatory influences. Furthermore, if a change in correlation of activity in the seed is observed with another region, it could be caused by a change in another functional connection, a change in the level of observation noise, or a change in the amplitude of endogenous neuronal fluctuations (Friston, 2011). Therefore, a significant correlation or even a significant change in correlation cannot always be interpreted as a change in the underlying coupling between the two regions. PPI seeks to address this issue by moving beyond task-independent correlations and examining changes in correlations that occur as a result of an imposed task manipulation. In PPI, the activity of the seed region is regressed onto the activity of another brain region across different experimental conditions and the change in slope is calculated. Just like seed PLS, the first step of PPI involves selecting a seed region and extracting its time course. The goal is to find regions with which the seed has a stronger relationship during a particular experimental condition than during the others (i.e., a task by seed region interaction). For this purpose, an
interaction (PPI) regressor is created by taking the element-by-element product of the mean-centered task time course with the mean-centered seed region time course. Voxels whose activity correlates only with the seed region or that show an effect of task will have some correlation to the predictor. Therefore, it is necessary to include the experimental task design and the physiological time courses from which the interaction term was created as covariates of no interest in the model. This will ensure that the variance explained by the interaction term goes significantly beyond what can already be explained by the main effects of task and correlation to the seed. The advantage of this method is that, much like seed PLS, it allows the researcher to test a hypothesis focused on a particular ROI or voxel. However, it specifically examines how the task modulates the network based on that ROI.

64.3.2.2 PPI: Reading Network Example

Callan, Callan, and Masaki (2005) trained native Japanese speakers to learn the character-to-sound associations of an unknown orthography (either Thai or Korean phonograms, i.e., graphemes that represent a phoneme or a combination of phonemes) and examined changes in brain connectivity pre- and post-training with fMRI. They found significant changes in activation post-training relative to pre-training in the left angular gyrus and then used this region as a seed for a PPI analysis. The authors identified a network showing greater integration of left angular gyrus activity with activity in the primary visual cortex and superior temporal gyrus for the trained phonograms after training. This finding underscored the importance of the left angular gyrus in grapheme-to-phoneme conversion.
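The regressor construction described above can be sketched as follows on synthetic data; the deconvolution to the neuronal level and the hemodynamic convolution used in practice are omitted, so this is an illustration of the logic rather than a full PPI implementation.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 300
task = np.repeat([0.0, 1.0], n // 2)          # block design: condition A then condition B
task_c = task - task.mean()                    # mean-centered psychological regressor

seed = rng.standard_normal(n)                  # seed region time course
seed_c = seed - seed.mean()

# Target region: correlated with the seed mainly during condition B (plus noise),
# so the seed-target coupling changes with the task.
target = 0.1 * seed + 0.6 * task * seed + rng.standard_normal(n)

ppi = task_c * seed_c                          # element-by-element interaction term

# Design matrix: intercept, task main effect, seed main effect, and the PPI regressor;
# the main effects are included so the interaction explains variance beyond them.
X = np.column_stack([np.ones(n), task_c, seed_c, ppi])
beta, *_ = np.linalg.lstsq(X, target, rcond=None)
print(f"PPI coefficient (task x seed interaction): {beta[3]:.2f}")
```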
64.3.3 Structural Equation Modeling (SEM)

64.3.3.1 SEM Method

The purpose of structural equation modeling (SEM) is to define a theoretical causal model consisting of a set of predicted covariances between variables and then test whether it is plausible when compared to the observed data (Jöreskog, 1970; Wright, 1934). In neuroimaging, these causal models consist of the brain activity signal of interest in a subset of ROIs and the pattern of directional influences among them (McIntosh & Gonzalez-Lima, 1991, 1994). The influences are constrained anatomically so that a direct connection between two regions is only possible if there is a known white matter pathway between them. The first step in defining an SEM is to specify the brain regions, which are treated as variables, and the causal influences between them in terms of linear
regression equations. There is always one equation for each dependent variable (activity in the ROI), and some variables can be included in more than one equation. This system of equations can be expressed in matrix notation as Y = βY + ψ, where Y contains the variances of the regional activity for the ROIs, β is a matrix of connection strengths that defines the anatomical network model, and ψ contains residual effects, which can be thought of as either the external influences from other brain regions that cannot be stipulated in the model or the influence of the brain region on itself. Because the model is underspecified, having more unknown than known parameters, it is not possible to construct the model in a completely data-driven manner, and thus some constraints are needed on the model parameters. The most common approach is to arbitrarily restrict some elements of the residual matrix ψ to a constant, usually 35–80% of the variance for a given brain region, and to set the covariances between residuals to zero (McIntosh & Gonzalez-Lima, 1994). It is also common in neuroimaging to keep the path coefficients in both directions equal for regions that have mutually coupled paths. The main idea of SEM is that the system of equations takes on a specific causal order, which can be used to generate an implied covariance matrix (McArdle & McDonald, 1984). Unlike in multiple regression models, where the regression coefficients are derived from the minimization of the sum of squared differences between the observed and predicted dependent variables, SEM minimizes the difference between the observed covariance structure and the one implied by the structural or path model. This is done by modifying the path coefficients and residual variances iteratively until there is no further improvement in fit. In most cases, a method such as maximum likelihood estimation or weighted least-squares is used to establish the fit criterion that is optimized. The identified best-fitting path coefficient has a meaning similar to a semipartial correlation in that it reflects the influence of one region onto a second region with the influences from all other regions to the second region held constant. SEM can be conceptualized as a method that uses patterns of functional connectivity (covariances) to derive information about effective connectivity (path coefficients) (McIntosh & Mišić, 2013). Model inference is done in SEM by comparing the goodness-of-fit between the model-implied covariance matrix and the empirical covariance matrix using a χ2 test. It is also possible to compare model fits using a χ2 difference test, and this can be done to examine whether one or more causal influences change as the result of a task or group effect. To this end, models are combined in a single multigroup or stacked run.
The null hypothesis is that the effective connections do not differ between groups or task conditions, and the null model is constructed so that path coefficients are set to be equal across groups or task conditions. The alternative hypothesis is that the effective connections are significantly different between groups or task conditions. Implied covariance matrices are generated for each group- or task-specific model. An alternative χ2 that is significantly lower (better fitting) than the null χ2 implies a significant group or task effect on the effective connections that were specified differently in the models. It is possible for the omnibus test to indicate a poor overall fit while the difference test shows a significant change from one task to another. SEM has been shown to be robust in these cases and is able to detect changes in effective connectivity even if the absolute fit of the model is insufficient (Protzner & McIntosh, 2006). Finally, it is possible to use an alternative approach to model selection, where nodes of the network are selected a priori but the paths are connected in a data-driven manner (see Bullmore et al., 2000). The advantage of SEM is that one can identify directionality in the influence of activity from one region to that of another. In addition, SEM allows the researcher to test the validity of a theoretical model regarding network interactions among regions supporting the task under investigation.

64.3.3.2 SEM: Reading Network Example

Levy et al. (2009) used SEM to test neuroanatomical predictions made by the dual-route cascade (DRC) reading model (Coltheart, Rastle, Perry, Langdon, & Ziegler, 2001) and their relation to reading skill. Their effective connectivity models consisted of four left hemisphere ROIs: middle occipital gyrus (MOG); occipito-temporal junction (LOT); parietal cortex (LP); and inferior frontal gyrus (IFG). For reading words, the MOG-LP, MOG-LOT, and LP-IFG pathways were significantly more involved than the LOT-LP path, suggesting that information traffics along both ventral and dorsal pathways during word reading. For pseudoword reading, MOG-LOT and LOT-LP were significantly more involved than MOG-LP, suggesting that information first flows to LOT before being transferred to the dorsal pathway. In addition, increased reliance on the “word pathway” (MOG-LP) correlated positively with word reading skill, and increased reliance on the “pseudoword pathway” (MOG-LOT) correlated with pseudoword reading ability. Their findings are in agreement with the DRC model, suggesting that regular words can be read in two ways (via parallel dorsal and ventral stream processing); however, the dorsal pathway is selective for word stimuli, and increased connectivity in this pathway is related to better word
reading. Pseudowords, however, undergo letter/sublexical analysis in the posterior ventral pathway before being fed to the dorsal path.
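The core SEM fitting idea, adjusting path coefficients so that the implied covariance approaches the observed covariance, can be sketched as follows; the observed covariance, the two-path anatomical model, and the fixed residual variance are invented for illustration, and a simple least-squares discrepancy stands in for the maximum likelihood fit function used in practice.

```python
import numpy as np
from scipy.optimize import minimize

# Observed covariance among three ROIs (synthetic numbers, for illustration only).
S = np.array([[1.0, 0.6, 0.4],
              [0.6, 1.0, 0.5],
              [0.4, 0.5, 1.0]])

# Hypothesized anatomy: region 0 -> region 1 -> region 2 (two free path coefficients).
paths = [(0, 1), (1, 2)]
resid = 0.5                                    # residual variance fixed for each region

def implied_cov(coeffs):
    """Covariance implied by Y = B Y + psi, i.e., (I - B)^-1 Psi (I - B)^-T."""
    B = np.zeros((3, 3))
    for (src, dst), b in zip(paths, coeffs):
        B[dst, src] = b
    inv = np.linalg.inv(np.eye(3) - B)
    return inv @ (resid * np.eye(3)) @ inv.T

def discrepancy(coeffs):
    # Simple least-squares discrepancy between implied and observed covariances.
    return np.sum((implied_cov(coeffs) - S) ** 2)

fit = minimize(discrepancy, x0=np.zeros(len(paths)))
print("estimated path coefficients:", np.round(fit.x, 2))
```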
64.3.4 Dynamic Causal Modeling (DCM)

64.3.4.1 DCM Method

The key concept behind Dynamic Causal Modeling (DCM) is that brain networks comprise an input-state-output system, where causal interactions are mediated by unobservable neuronal dynamics (Friston, Harrison, & Penny, 2003). In the causal model, these “hidden” interactions are specified by coupling parameters, which denote the degree of synaptic coupling and model effective connectivity. The local neuronal dynamics underlying these interactions are defined by a set of differential equations. This causal model is then combined with a forward, or observation, model that specifies the mapping from the neuronal activity to the observed responses (Friston, Moran, & Seth, 2013). It is important to note that causality is inferred at the neuronal population level and not at the level of the observed responses. In the causal model, each ROI consists of neuronal populations that are intrinsically coupled to each other and extrinsically coupled to neuronal populations of the other regions in the network. Stochastic or ordinary differential equations relate the present state of a neuronal population to the future state of the same neuronal population and to the states of the other populations. The coupling parameters can be thought of as rate constants, which determine how rapidly one population affects another (McIntosh & Mišić, 2013). Experimental effects are modeled as external perturbations to the system, which can cause either a change in coupling or a change in neuronal activity of a population. The underlying causal model is represented as a system of coupled differential equations, where the rate of change of the neuronal states x is a function of the current states, the external inputs, and the coupling parameters, which are unknown and need to be inferred, similarly to the path coefficients in SEM. DCM does not stipulate any particular biophysical model of neuronal activity and only requires the model to be biologically plausible and sufficiently able to explain the external perturbations and the interactions between neuronal populations. In the second step of DCM, the forward model is used to map neuronal states into the observed signal measurements while incorporating unknown parameters. The form of the forward model depends on the imaging modality. For ERPs, for example, the mapping function is the lead field matrix that models the propagation and volume conduction of electromagnetic fields
through neural tissue, cerebrospinal fluid, skull, and skin. The unknown parameters that are introduced are the location and orientation of the source dipole (Kiebel, David, & Friston, 2006). If BOLD signal contrast is the observed measure, then the mapping function models how changes in neuronal activity engender changes in local blood flow, which result in an influx of oxygenated blood and a reduction in deoxygenated hemoglobin (Buxton, Wong, & Frank, 1998). Here, the unknown parameters can specify factors such as the rate constants of vasodilatory signal decay and capillary resting net oxygen extraction (Stephan, Weiskopf, Drysdale, Robinson, & Friston, 2007). DCMs use a Bayesian approach (see Friston et al., 2002) for estimating the unknown parameters in the models. Essentially, a posterior distribution is estimated for each parameter using an optimization algorithm and taking into account prior beliefs about the value the parameter can realistically take on and the observed data. The observed data are used to update the model (i.e., estimate the parameters) to maximize model evidence in a procedure known as Bayesian model inversion. The model evidence accounts for the ability of the model to explain the data as accurately as possible and to have the fewest parameters (i.e., parsimonious models are rewarded). Models can be compared by taking the ratio of their respective model evidences or the difference in their respective log evidences, and model comparisons can be made to determine group or task effects on anatomical connections in a similar manner to SEMs; however, DCMs can also vary in the specification of their priors for different task or group treatments. Model inversion is performed on each subject individually. Therefore, an experimenter must decide whether to keep the same model for all subjects in a between-subjects analysis. If experimenters choose this, then they can multiply the model evidences or add the log evidences across subjects to get the group model evidence, and this essentially represents a fixed-effects analysis. In this case, one approach to obtaining group-level estimates of the parameters is to compute a joint density for the subject-specific posterior distribution estimates. If the experimenters want to treat the subjects as heterogeneous, then they can take the ratio of the number of subjects who show positive model evidence for a given model to the number of subjects who show greater model evidence for another model (Stephan et al., 2007), and this is essentially a random-effects analysis. In this case, it is common to take a summary statistic of the subject-specific posterior distribution estimates (such as the median or mode of the distribution) as the parameter estimate and conduct traditional random-effects analyses comparing the group means of these summary statistics (e.g., t-test).
Like SEM, DCM can show the directional influences of regions on one another. Whereas SEM is used to test theoretical models of network interactions based on what is known about anatomical connections, DCM is meant to provide a more explanatory understanding of the relationships among regions identified to be active during task-based fMRI analyses.

64.3.4.2 DCM: Reading Network Example

The triangular part of the inferior frontal gyrus is thought to be particularly involved in semantic processing, and the opercular part is thought to have more of a role in phonological tasks (e.g., Poldrack et al., 1999). Mechelli et al. (2005) tested this theoretical framework using DCM. Effective connectivity from the anterior fusiform gyrus to the pars triangularis increased for exception words (e.g., PINT or STEAK), which require more lexical-semantic processing than pseudowords, whereas effective connectivity between the dorsal premotor cortex and the posterior fusiform gyrus increased for pseudowords (e.g., RINT or MAVE), which require more phonological mapping than exception words. This finding demonstrated distinct neuronal mechanisms for semantic and phonological processing and confirmed previous theories of dissociation of function in the inferior frontal gyrus.
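To make the notions of coupling parameters and external perturbations concrete, the following sketch numerically integrates the bilinear neuronal state equation commonly used in fMRI DCM, dx/dt = (A + u·B)x + C·u, for a hypothetical two-region network; the parameter values are invented, and the hemodynamic forward model and Bayesian inversion described above are not shown.

```python
import numpy as np

# Two-region network. A: fixed (endogenous) coupling; B: modulation of coupling by the
# task input; C: direct driving input. Values are arbitrary, chosen only to illustrate.
A = np.array([[-0.5, 0.0],
              [ 0.3, -0.5]])     # region 0 -> region 1 coupling of 0.3 at baseline
B = np.array([[0.0, 0.0],
              [0.4, 0.0]])       # the task strengthens the 0 -> 1 connection
C = np.array([1.0, 0.0])         # the stimulus drives region 0 directly

dt, n_steps = 0.05, 400
x = np.zeros(2)                   # hidden neuronal states
states = np.empty((n_steps, 2))
for step in range(n_steps):
    u = 1.0 if 100 <= step < 300 else 0.0       # boxcar task input
    dxdt = (A + u * B) @ x + C * u              # bilinear state equation
    x = x + dt * dxdt                           # Euler integration step
    states[step] = x
print("peak activity per region:", np.round(states.max(axis=0), 2))
```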
64.4 TECHNIQUES SPANNING BOTH FUNCTIONAL AND EFFECTIVE DOMAINS

64.4.1 Granger Causality (GC)

64.4.1.1 GC Method

Granger causality (GC) does not fit easily into either the functional or the effective connectivity category because it has both exploratory and confirmatory characteristics. The main idea behind GC is that B “Granger causes” A if the past of B contains information that helps predict the future of A beyond what is predicted by the past of A itself or by the past of other conditioning variables, C (Friston et al., 2013). The GC measure is based on the relative change in the model error when new time series are added to improve the prediction of the dependent signal (Granger, 1969). Essentially, GC is the log ratio of the prediction-error variance of the model before and after the addition of the new time series (time series “B”) in this case:

$$F_{B \rightarrow A} = \ln \frac{\mathrm{Var}(e_{A|A})}{\mathrm{Var}(e_{A|A,B})}$$

where e_{A|A} is the error when A is predicted from its own past alone and e_{A|A,B} is the error when the past of B is also included.
In the context of brain networks, multiple predictors comprising the past time series of all ROIs can be specified to account for the present time series of all ROIs. If one assumes the effects to be linear, then the
relationships can be specified using multivariate linear regression (through what is termed multivariate vector autoregressive (MVAR) modeling; Goebel, Roebroeck, Kim, & Formisano, 2003). An MVAR model contains every possible connection in the network, and each connection is tested to determine which ones are nonzero. This allows subnetworks to be extracted without having to specify connectivity patterns a priori. For any given connection, the influence of all other nodes is partialled out, allowing one to obtain an estimate of whether the past time series of B helps predict the time series of A more than what is accounted for by all other variables combined, C. The coefficients in the MVAR can be estimated using ordinary least-squares (by minimizing the sum of squared errors between the predicted and observed values of the present time series). The advantage of Granger causality is that it allows a researcher to pinpoint directional influences of one region on another without any a priori hypothesis regarding which regions are involved in particular subnetworks, because subnetworks are identified in a data-driven manner.

64.4.1.2 GC: Reading Network Example

In an MEG study, Frye, Liederman, McGraw Fisher, and Wu (2012) used GC to examine the connectivity of bilateral temporoparietal areas (TPAs) in dyslexic and typical readers during a nonword reading task. The important feature of this study is that GC allowed them to examine hierarchical network structure (i.e., which nodes exert dominant influences on the other nodes) rather than connectivity alone. In the beta frequency band, participants with greater connectivity from the left TPA to other regions (left TPA dominant) were more likely to show improved phonological decoding performance, and those with greater connectivity from other regions into the TPA were more likely to show poorer phonological decoding performance, across both groups of participants. Participants in whom a hierarchical network topography is manifested (i.e., those with greater outward connectivity of the TPA) may also show more stable network processing, allowing them to process stimuli optimally, because neural networks with hierarchical structures have been shown to be more stable than networks with nonhierarchical topography. Second, greater relative outward connectivity of the right TPA to other brain areas was associated with worse performance in dyslexic readers. This finding extends previous studies of dyslexic readers that reported greater TPA activity by indicating that the direction of influence of the right TPA onto other brain regions may play a key role in dyslexia.
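A bivariate sketch of the log variance-ratio definition given above is shown below: autoregressive models with and without the other region's past are fit by ordinary least squares on simulated data in which B drives A. Real applications use multivariate (MVAR) models with conditioning variables and principled model-order selection; the model order and simulation parameters here are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 2000, 2                       # number of samples and autoregressive model order

# Simulate B driving A with a one-sample lag.
b = rng.standard_normal(n)
a = np.zeros(n)
for t in range(1, n):
    a[t] = 0.5 * a[t - 1] + 0.4 * b[t - 1] + 0.5 * rng.standard_normal()

def ar_residual_var(y, predictors):
    """Residual variance of y[t] predicted from p past values of each predictor series."""
    rows = []
    for t in range(p, len(y)):
        rows.append(np.concatenate([series[t - p:t] for series in predictors]))
    X = np.column_stack([np.ones(len(rows)), np.array(rows)])
    resid = y[p:] - X @ np.linalg.lstsq(X, y[p:], rcond=None)[0]
    return resid.var()

# GC from B to A: log ratio of A's prediction-error variance without vs. with B's past.
gc_b_to_a = np.log(ar_residual_var(a, [a]) / ar_residual_var(a, [a, b]))
gc_a_to_b = np.log(ar_residual_var(b, [b]) / ar_residual_var(b, [b, a]))
print(f"GC B->A = {gc_b_to_a:.3f}, GC A->B = {gc_a_to_b:.3f}")
```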
64.4.2 Graph Theory

64.4.2.1 Graph Theory Method

Graph theoretic measures can be applied to structural, functional, and effective connectivity matrices (Bullmore & Sporns, 2009). Unlike the metrics previously discussed, which are often limited by which seeds or subset of nodes are chosen a priori, graph theoretic measures provide information regarding changes in the topology of the whole brain as well as the role that individual nodes play within that topology, thereby providing a comprehensive assessment of patterns of information flow. This approach allows for the investigation of functional integration and functional segregation within the global brain network architecture. The first step of such an analysis is to construct a graph that consists of nodes (i.e., the neural elements of interest) and edges, representing the statistical dependencies or connections between the nodes. For structural connectivity graphs, the edges correspond to anatomical white matter pathways; for functional connectivity graphs, the edges are pairwise associations between the brain signals of the nodes; and for effective connectivity graphs, the edges consist of pairwise measures of the causal influence of one node onto another. Structural brain connectivity yields a sparse and directed (or asymmetric) graph. Functional and effective brain connectivity give rise to full undirected (or symmetric) or directed graphs, respectively, which can be further reduced by setting a threshold to control the degree of sparsity. All graphs may be weighted, with the weights representing connection densities or efficacies, or binary, indicating the presence or absence of a connection. Once the graph is constructed, various graph theoretic measures can be computed (see Rubinov & Sporns, 2010). One can simply measure the connectedness of each node by counting its total number of connections (degree). To investigate integration, one can examine the least number of steps necessary to get from one node to another (shortest path length) as well as the characteristic path length for the entire network. The network’s global efficiency is a related measure, computed as the average inverse shortest path length. To examine segregation, one can compute the clustering coefficient, which is the fraction of a node’s neighbors that are also neighbors of each other. One can also partition the brain into subnetworks or communities within the whole brain and then examine their interactions by computing modularity, which quantifies the degree to which connections are denser within subnetworks than between them. Once the brain has been partitioned into subnetworks, it is possible to identify
which nodes act as connector hubs (the terminology for these nodes varies, but here we mean nodes that play a strong role in linking subnetworks to each other) by computing their participation coefficient, and which nodes act as modular hubs (nodes that have greater connections within their own subnetwork) by computing their within-module degree z-score. Finally, it is possible to assess the frequency with which certain combinations of nodes or edges occur by looking at motifs. Some metrics, such as modularity or characteristic path length, produce one value per subject and can be submitted to standard between-subject univariate tests. The node-specific measures can be used to generate topological maps for each subject, and some correction for multiple comparisons must be performed for inferential analyses about robust effects. The advantage of graph theory is that, unlike most of the other methods, which require confining analyses to a priori regions thought to be involved in a functional network, it permits examining and quantifying interactions among regions at the whole-brain level, thus giving a comprehensive view of functional network architecture while still being able to assign specific connectivity roles within the network to certain nodes.

64.4.2.2 Graph Theory: Reading Network Example

Vogel et al. (2013) examined the functional subnetwork structure during resting state in an attempt to identify a reading-dedicated community. Unlike previous studies that have examined this question, the authors only included prominent reading-related regions derived from a meta-analysis (including left supramarginal gyrus, angular gyrus, middle temporal gyrus, and inferior frontal gyrus) as nodes in the analysis. Rather than clustering into their own community, reading-related regions were assigned to other communities whose primary function is more general than reading (e.g., the VWFA was assigned to a visual community, and the angular gyrus was assigned to a default mode community), suggesting the lack of an intrinsic reading community in the brain. This was the case in adult as well as in developing readers, suggesting that the regions implicated in reading are broadly used across many tasks, including reading. Because this study was undertaken during resting state, it does not rule out that there may be special relationships among some of these regions during task-based reading.
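The following sketch computes several of these measures with the networkx package on a thresholded correlation matrix derived from synthetic time series; the threshold, network size, and binarization are arbitrary illustrative choices.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

rng = np.random.default_rng(7)
ts = rng.standard_normal((200, 20))             # synthetic time series for 20 nodes
corr = np.corrcoef(ts.T)

# Binarize: keep an edge wherever the absolute correlation exceeds an arbitrary threshold.
adj = (np.abs(corr) > 0.1) & ~np.eye(20, dtype=bool)
G = nx.from_numpy_array(adj.astype(int))

degree = dict(G.degree())                                 # connectedness of each node
clustering = nx.average_clustering(G)                     # segregation
if nx.is_connected(G):
    path_length = nx.average_shortest_path_length(G)      # integration
else:
    path_length = float("nan")                            # undefined for disconnected graphs

communities = greedy_modularity_communities(G)            # partition into subnetworks
Q = modularity(G, communities)
print(f"mean degree {np.mean(list(degree.values())):.1f}, "
      f"clustering {clustering:.2f}, path length {path_length:.2f}, modularity {Q:.2f}")
```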
64.5 CONCLUSIONS

Connectivity analyses identifying neural networks move beyond isolating a collection of individual regions that are associated with reading and provide
an account of how neural regions interact with one another. Exploratory methods such as ICA and PLS aim to reduce the original data into interpretable functional networks. Effective connectivity analyses like SEM and DCM are useful particularly when trying to test a priori anatomical or theoretical hypotheses about the neurobiology underlying reading (e.g., the DRC model). Graph theoretic measures take into account the global topology of neural networks while also specifying the role of individual regions within a network or subnetwork. These measures can identify the patterns of information flow among regions across the whole brain rather than examining interactions with a seed or within a subset of regions, and they have been used to test whether reading-specific subnetworks exist in the brain.
References

Aertsen, A., Gerstein, G., Habib, M., & Palm, G. (1989). Dynamics of neuronal firing correlation: Modulation of “effective connectivity.” Journal of Neurophysiology, 61, 900–917.
Beckmann, C., & Smith, S. (2004). Probabilistic independent component analysis for functional magnetic resonance imaging. IEEE Transactions on Medical Imaging, 23, 137–152.
Biswal, B., Yetkin, F. Z., Haughton, V. M., & Hyde, J. S. (1995). Functional connectivity in the motor cortex of resting human brain using echo-planar MRI. Magnetic Resonance in Medicine, 34, 537–541.
Bullmore, E., Horwitz, B., Honey, G., Brammer, M., Williams, S., & Sharma, T. (2000). How good is good enough in path analysis of fMRI data? NeuroImage, 11, 289–301.
Bullmore, E., & Sporns, O. (2009). Complex brain networks: Graph theoretical analysis of structural and functional systems. Nature Reviews Neuroscience, 10, 186–198.
Buxton, R. B., Wong, E. C., & Frank, L. R. (1998). Dynamics of blood flow and oxygen metabolism during brain activation: The balloon model. Magnetic Resonance in Medicine, 39, 855–864.
Calhoun, V., Adali, T., Pearlson, G., & Pekar, J. (2001). A method for making group inferences from functional MRI data using independent component analysis. Human Brain Mapping, 14, 140–151.
Callan, A. M., Callan, D. E., & Masaki, S. (2005). When meaningless symbols become letters: Neural activity change in learning new phonograms. NeuroImage, 28(3), 553–562.
Coltheart, M., Rastle, K., Perry, C., Langdon, R., & Ziegler, J. (2001). DRC: A dual route cascaded model of visual word recognition and reading aloud. Psychological Review, 108, 204–256.
Craddock, R. C., James, G. A., Holtzheimer, P. E., Hu, X. P., & Mayberg, H. S. (2012). A whole brain fMRI atlas generated via spatially constrained spectral clustering. Human Brain Mapping, 33, 1914–1928.
Craddock, R. C., Jbabdi, S., Yan, C. G., Vogelstein, J. T., Castellanos, F. X., Di Martino, A., et al. (2013). Imaging human connectomes at the macroscale. Nature Methods, 10(6), 524–539.
Efron, B., & Tibshirani, R. (1993). An introduction to the bootstrap. New York, NY: Chapman & Hall.
Fischl, B., van der Kouwe, A., Destrieux, C., Halgren, E., Segonne, F., Salat, D. H., et al. (2004). Automatically parcellating the human cerebral cortex. Cerebral Cortex, 14, 11–22.
Fox, M. D., & Raichle, M. E. (2007). Spontaneous fluctuations in brain activity observed with functional magnetic resonance imaging. Nature Reviews Neuroscience, 8, 700–711.
Friston, K., Buechel, C., Fink, G., Morris, J., Rolls, E., & Dolan, R. (1997). Psychophysiological and modulatory interactions in neuroimaging. NeuroImage, 6, 218–229.
Friston, K., Harrison, L., & Penny, W. (2003). Dynamic causal modelling. NeuroImage, 19, 1273–1302.
Friston, K., Moran, R., & Seth, A. K. (2013). Analysing connectivity with Granger causality and dynamic causal modeling. Current Opinion in Neurobiology, 23(2), 172–178.
Friston, K. J. (2011). Functional and effective connectivity: A review. Brain Connectivity, 1(1), 13–36.
Friston, K. J., Frith, C. D., Liddle, P. F., & Frackowiak, R. S. (1993). Functional connectivity: The principal-component analysis of large (PET) data sets. Journal of Cerebral Blood Flow and Metabolism, 13, 5–14.
Friston, K. J., Penny, W., Phillips, C., Kiebel, S., Hinton, G., & Ashburner, J. (2002). Classical and Bayesian inference in neuroimaging: Theory. NeuroImage, 16(2), 465–483.
Frye, R. E., Liederman, J., McGraw Fisher, J., & Wu, M. H. (2012). Laterality of temporoparietal causal connectivity during the prestimulus period correlates with phonological decoding task performance in dyslexic and typical readers. Cerebral Cortex, 22(8), 1923–1934.
Goebel, R., Roebroeck, A., Kim, D., & Formisano, E. (2003). Investigating directed cortical interactions in time-resolved fMRI data using vector autoregressive modeling and Granger causality mapping. Magnetic Resonance Imaging, 21, 1251–1261.
Good, P. I. (2004). Permutation, parametric, and bootstrap tests of hypotheses (3rd ed.). New York, NY: Springer-Verlag.
Granger, C. W. J. (1969). Investigating causal relations by econometric models and cross-spectral methods. Econometrica, 37, 424–438.
Gray, C. M., König, P., Engel, A. K., & Singer, W. (1989). Oscillatory responses in cat visual cortex exhibit inter-columnar synchronization which reflects global stimulus properties. Nature, 338, 334–337.
Gross, J., Kujala, J., Hämäläinen, M., Timmermann, L., Schnitzler, A., & Salmelin, R. (2001). Dynamic imaging of coherent sources: Studying neural interactions in the human brain. Proceedings of the National Academy of Sciences, 98, 694–699.
Jöreskog, K. G. (1970). A general method for analysis of covariance structures. Biometrika, 57, 239–251.
Kiebel, S., David, O., & Friston, K. (2006). Dynamic causal modelling of evoked responses in EEG/MEG with lead field parameterization. NeuroImage, 30, 1273–1284.
Kovacevic, N., & McIntosh, A. (2007). Groupwise independent component decomposition of EEG data and partial least square analysis. NeuroImage, 35, 1103–1112.
Kujala, J., Pammer, K., Cornelissen, P., Roebroeck, A., Formisano, E., & Salmelin, R. (2007). Phase coupling in a cerebro-cerebellar network at 8–13 Hz during reading. Cerebral Cortex, 17(6), 1476–1485.
Lancaster, J. L., Woldorff, M. G., Parsons, L. M., Liotti, M., Freitas, C. S., Rainey, L., et al. (2000). Automated Talairach atlas labels for functional brain mapping. Human Brain Mapping, 10, 120–131.
Levy, J., Pernet, C., Treserras, S., Boulanouar, K., Aubry, F., Démonet, J. F., et al. (2009). Testing for the dual-route cascade reading model in the brain: An fMRI effective connectivity account of an efficient reading style. PLoS One, 18(4), e6675.
McArdle, J., & McDonald, R. (1984). Some algebraic properties of the reticular action model for moment structures. The British Journal of Mathematical and Statistical Psychology, 37, 234–251.
McIntosh, A. (1998). Understanding neural interactions in learning and memory using functional neuroimaging. Annals of the New York Academy of Sciences, 855, 556–571.
McIntosh, A. (2000). Towards a network theory of cognition. Neural Networks, 13, 861–870.
McIntosh, A., & Gonzalez-Lima, F. (1991). Structural modeling of functional neural pathways mapped with 2-deoxyglucose: Effects of acoustic startle habituation on the auditory system. Brain Research, 547, 295–302.
McIntosh, A., & Gonzalez-Lima, F. (1994). Structural equation modeling and its application to network analysis in functional brain imaging. Human Brain Mapping, 2, 2–22.
McIntosh, A. R., Bookstein, F. L., Haxby, J. V., & Grady, C. L. (1996). Spatial pattern analysis of functional brain images using Partial Least Squares. NeuroImage, 3, 143–157.
McIntosh, A. R., & Lobaugh, N. J. (2004). Partial least squares analysis of neuroimaging data: Applications and advances. NeuroImage, 23, S250–S263.
McIntosh, A. R., & Mišić, B. (2013). Multivariate statistical analyses for neuroimaging data. Annual Review of Psychology, 64, 499–525.
Mechelli, A., Crinion, J. T., Long, S., Friston, K. J., Lambon Ralph, M. A., Patterson, K., et al. (2005). Dissociating reading processes on the basis of neuronal interactions. Journal of Cognitive Neuroscience, 17, 1753–1765.
Poldrack, R. A., Wagner, A. D., Prull, M. W., Desmond, J. E., Glover, G. H., & Gabrieli, J. D. (1999). Functional specialization for semantic knowledge and phonological processing in the left inferior prefrontal cortex. NeuroImage, 10, 15–35.
Power, J. D., Cohen, A. L., Nelson, S. M., Wig, G. S., Barnes, K. A. B., Church, J. A., et al. (2011). Functional network organization of the human brain. Neuron, 72(4), 665–678.
Protzner, A., & McIntosh, A. (2006). Testing effective connectivity changes with structural equation modeling: What does a bad model tell us? Human Brain Mapping, 27, 935–947.
Reinke, K., Fernandes, M., Schwindt, G., O’Craven, K., & Grady, C. L. (2008). Functional specificity of the visual word form area: General activation for words and symbols but specific network activation for words. Brain and Language, 104, 180–189.
Rubinov, M., & Sporns, O. (2010). Complex network measures of brain connectivity: Uses and interpretations. NeuroImage, 52, 1059–1069.
Stephan, K., Weiskopf, N., Drysdale, P., Robinson, P., & Friston, K. (2007). Comparing hemodynamic models with DCM. NeuroImage, 38, 387–401.
Tzourio-Mazoyer, N., Landeau, B., Papathanassiou, D., Crivello, F., Etard, O., Delcroix, N., et al. (2002). Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. NeuroImage, 15, 273–289.
Vogel, A. C., Church, J. A., Power, J. D., Miezin, F. M., Petersen, S. E., & Schlaggar, B. L. (2013). Functional network architecture of reading-related regions across development. Brain and Language, 125(2), 231–243.
Wright, S. (1934). The method of path coefficients. The Annals of Mathematical Statistics, 5, 161–215.
Ye, Z., Doñamayor, N., & Münte, T. F. (2014). Brain network of semantic integration in sentence reading: Insights from independent component analysis and graph theoretical analysis. Human Brain Mapping.