NeuroImage 29 (2006) 853–858
Cross-modal processing in auditory and visual working memory

Boris Suchan a,*, Britta Linnewerth a, Odo Köster b, Irene Daum a, and Gebhard Schmid b

a Institute of Cognitive Neuroscience, Department of Neuropsychology, Ruhr-University of Bochum, Universitätsstraße 150, D-44780 Bochum, Germany
b Institute for Diagnostic and Interventional Radiology and Nuclear Medicine, St. Josef Hospital Bochum, Germany
* Corresponding author. Fax: +49 234 32 14622.
Received 8 March 2005; revised 5 August 2005; accepted 15 August 2005. Available online 9 September 2005.
doi:10.1016/j.neuroimage.2005.08.014
This study aimed to further explore processing of auditory and visual stimuli in working memory. Smith and Jonides (1997) [Smith, E.E., Jonides, J., 1997. Working memory: a view from neuroimaging. Cogn. Psychol. 33, 5–42] described a modified working memory model in which visual input is automatically transformed into a phonological code. To study this process, auditory and the corresponding visual stimuli were presented in a variant of the 2-back task which involved changes from the auditory to the visual modality and vice versa. Brain activation patterns underlying visual and auditory processing as well as transformation mechanisms were analyzed. Results yielded a significant activation in the left primary auditory cortex associated with the transformation of visual into auditory information, which reflects the matching and recoding of a stored item and its modality. This finding provides empirical evidence for a transformation of visual input into a phonological code, with the auditory cortex as the neural correlate of the recoding process in working memory.
© 2005 Elsevier Inc. All rights reserved.
Introduction

According to Baddeley (1986, 2003), working memory involves two subsystems, the visuospatial sketch pad and the phonological loop, which are controlled by the central executive. The functional subdivision of the subsystems has been made in terms of processing spatial vs. nonspatial contents. According to Smith and Jonides (1999), spatial rehearsal is mediated by the right premotor cortex, in contrast to nonspatial, object-related rehearsal, which is mediated by the right dorsolateral prefrontal cortex (PFC). Storage of spatial components is mediated by the right posterior and inferior parietal cortices (Smith and Jonides, 1998). The phonological store component of the phonological loop is associated with activity in the left supramarginal gyrus, whereas subvocal rehearsal is associated with activity in Broca's area (Paulesu et al., 1993). The distinction between "frontal
rehearsal" and "posterior storage" systems in verbal working memory has been supported by a range of further studies (e.g. Awh et al., 1996). It is believed that verbal working memory is amodal, with the representation of its contents being independent of input modality (Smith and Jonides, 1997). Auditory and visual stimuli are translated into an amodal code which is further processed in verbal working memory. This hypothesis has generally been confirmed by neuroimaging studies. Comparing visual and auditory presentation of letters in a 3-back task, a nearly complete overlap of activation patterns was observed, with the networks involving the dorsolateral PFC, Broca's area, the SMA and left premotor cortex as well as parietal and cerebellar regions and the cingulate gyrus (Schumacher et al., 1996). According to Smith and Jonides (1997), visual input is automatically transformed into a corresponding phonological code, with the conversion supporting rehearsal of visual input. In contrast to neuroimaging, ERP studies yielded evidence for distinct visual and auditory working memory mechanisms, with earlier and longer-lasting left frontal effects for auditory working memory and larger posterior effects for visual working memory (Ruchkin et al., 1997). The hypothesis of processing differences was further supported by a recent fMRI study which yielded higher activation in the posterior parietal cortex during visual working memory, in contrast to higher activation in the left dorsolateral prefrontal cortex during working memory with auditory stimuli (Crottaz-Herbette et al., 2004). Another approach focused on differences in processing strategies (see Courtney et al., 1998a). Visual input may be processed by a verbal-analytical or a more holistic, image-based strategy, and the choice of strategy depends upon working memory load (i.e. number of items and duration of maintenance), with verbal-analytical processing leading to superior performance. Short maintenance (4.5 s) is associated with image-based coding and right PFC activation, whereas longer maintenance recruits the PFC bilaterally or the left PFC, respectively (Courtney et al., 1996, 1998b). These results have been interpreted in terms of a shift from visual to verbal coding with increasing working memory delays. This interpretation, however, is based on the results from three different studies; a
manipulation of the relevant parameters within one study has not yet been performed. A study by Nystrom et al. (2000) did not replicate the finding of lateralization of activation depending upon stimulus material (visual letters, geometrical figures and spatial locations), which may, however, be attributed to the potential use of a verbal strategy for all materials. Even suppression techniques do not entirely remove the effect of verbal coding, e.g. of fractals (see Ragland et al., 2002). The studies available so far have focused on the direct comparison of activations associated with visual and auditory stimuli in working memory. The potential transformation of visual input into phonological codes as suggested by Smith and Jonides (1997) is inferred from contrasts of visual and auditory working memory activation patterns. Activation patterns that are more directly linked to the transformation process per se have so far not been described. The mechanisms underlying transformation are explored in the present study, using a newly developed n-back task. By changing item modality within an n-back task from auditory to visual and vice versa and by comparing these conditions with trials where modality is kept constant, brain activation patterns associated with cross-modality processing can be assessed. The findings should provide further insights into the mechanisms involved in auditory and visual working memory and their potential interaction.
Methods

Subjects

Thirteen right-handed healthy subjects (Oldfield, 1971) participated in this study (8 female, 5 male); mean age was 26 years (range 21–36 years). They gave written informed consent prior to the experiment. The study was approved by the local ethics committee.
Task

A new variant of a 2-back task was developed in our lab (Peters et al., in press). In 2-back tasks, stimuli are presented sequentially, and subjects have to decide for every item whether it matches the item before the last ("2-back"). To measure cerebral activation patterns during modality changes, visual and auditory stimuli served as stimulus material. Pictures were taken from the set of standardized pictures published by Snodgrass and Vanderwart (1980) (Fig. 1). The picture names had the same number of vowels when spoken aloud and were recorded in stereo at a sampling rate of 44.1 kHz. The duration of all auditory stimuli was set to 700 ms. Sound files were presented binaurally using MR-compatible headphones. The corresponding pictures were projected for the same duration (700 ms) by a video beamer onto a screen, which the subjects could see via a mirror placed on top of the MR head coil. Stimuli were presented using Experimental Runtime Software (www.erts.de). Two MR-compatible response buttons were used to record subjects' reaction times (RT). The 2-back task was designed to include at least 30 trials of each of the following conditions: visual–visual (VV), auditory–auditory (AA), as well as the transformation from visual to auditory (VA) and vice versa (auditory–visual; AV). Thirty new visual and 30 new auditory trials were randomly intermixed, leading to 180 trials in total. The context of the 2-back task, namely the 1-back modality of all trials, was balanced over all experimental conditions to reduce possible confounding effects of the 1-back modality. Conditions were presented in random order. Subjects were asked to respond to every single item by pressing one of two response buttons, indicating whether the current item matched the item before last, independent of its modality. The same number of new stimuli of both modalities was randomly intermixed. A 0-back condition served as a control task; subjects were asked to respond with the left key to a specific target item presented either in the visual or the auditory modality, and distractors required a right button press.
Fig. 1. Illustration of the 2-back task with modality changes used in this experiment.
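For illustration, the trial logic of this modality-switching 2-back task can be sketched in a few lines of Python. The sketch is not part of the original experiment: the item names, the example sequence and the function names are made up, and conditions are labeled by the modality of the item two positions back followed by the modality of the current probe (VV, AA, VA, AV), as in the text above.

# Minimal sketch of the 2-back trial logic with modality changes.
# Items and the example sequence are illustrative, not the original stimulus set.
from dataclasses import dataclass

@dataclass
class Trial:
    item: str       # picture name, e.g. "dog"
    modality: str   # "V" (picture) or "A" (spoken word)

def classify_trials(trials):
    """From the third trial on, decide whether the current item matches the item
    two positions back (irrespective of modality) and label the condition by the
    modalities of the stored 2-back item and the current probe."""
    results = []
    for i in range(2, len(trials)):
        probe, stored = trials[i], trials[i - 2]
        is_target = probe.item == stored.item           # modality-independent match
        condition = stored.modality + probe.modality    # "VV", "AA", "VA" or "AV"
        results.append((i, is_target, condition))
    return results

sequence = [Trial("dog", "V"), Trial("bell", "A"), Trial("dog", "A"),  # trial 2: VA target
            Trial("car", "V"), Trial("dog", "V"), Trial("car", "V")]   # trials 4 and 5: AV and VV targets
for index, is_target, condition in classify_trials(sequence):
    print(f"trial {index}: target={is_target}, condition={condition}")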
Procedure

Subjects were scanned during a series of two experimental blocks. In the first block, they had to perform the 2-back task as illustrated in Fig. 1. The modality between encoding and retrieval changed randomly across the whole block. Subjects were asked to perform the 2-back task independent of the items' modality, i.e. seeing an item and then hearing its name two items later required a target response. In addition to transformations of modality in both directions (visual into auditory; auditory into visual), nontransformation trials were presented. The control block without working memory demands (0-back; see above) was presented separately.

Scanning procedure and analysis

Subjects were scanned using a Siemens (www.siemens.de) Symphony 1.5 T scanner. 210 scans with a TR of 2000 ms, a TE of 40 ms and a flip angle of 90° were acquired. The voxel size was 3 mm × 3 mm × 4.6 mm with a 1 mm gap between slices. One trial lasted 2200 ms, leading to a total time of 7 min per experimental condition. Stimulus presentation and the fMRI scanner were not started synchronously. This was done to sample the hemodynamic response function (HRF) at different time points; as no jitter was included in the stimulus presentation, this asynchrony was used to reduce carry-over effects and confounds due to the different stimulus modalities. A high-resolution T1 image was additionally acquired for anatomical labeling. Images were analyzed using SPM99 (http://www.fil.ion.ucl.ac.uk/spm/). The first five images were discarded to allow for T1 equilibration. Images were slice-time corrected before being realigned to the first image of the series. Afterwards, they were normalized to the stereotactic space of the MNI brain provided by SPM99 and smoothed with a Gaussian kernel of 8 mm. A general linear model (GLM) was applied to the data, separating them according
to their modality and also into transformed and nontransformed trials. Clusters of at least eight contiguous voxels surviving a threshold of P < 0.05 (corrected) were considered significant. Foci of significant differences were transformed into Talairach space (Talairach and Tournoux, 1988) using the algorithm suggested by Brett (http://www.mrc-cbu.cam.ac.uk/Imaging/mnispace.html). Anatomical labeling was performed using the Talairach Daemon database (http://ric.uthscsa.edu/projects/talairachdaemon.html). Anatomical results were further explored using the mean high-resolution T1 image of all subjects to account for anatomical differences.
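As an aside, the MNI-to-Talairach conversion referred to above is, in the widely used approximation attributed to Brett, a piecewise affine transform with different coefficients above and below the AC-PC plane. A minimal Python sketch is given below; the coefficients are the commonly cited ones, and the example coordinate is purely illustrative, not a peak from the present data.

# Sketch of the piecewise affine MNI-to-Talairach approximation commonly
# attributed to Matthew Brett (mni2tal); the example coordinate is illustrative.
def mni2tal(x, y, z):
    """Approximate conversion of an MNI coordinate (mm) to Talairach space."""
    x_t = 0.9900 * x
    if z >= 0:  # at or above the AC-PC plane
        y_t = 0.9688 * y + 0.0460 * z
        z_t = -0.0485 * y + 0.9189 * z
    else:       # below the AC-PC plane
        y_t = 0.9688 * y + 0.0420 * z
        z_t = -0.0485 * y + 0.8390 * z
    return round(x_t, 1), round(y_t, 1), round(z_t, 1)

# Example: a hypothetical MNI peak near left Heschl's gyrus
print(mni2tal(-50, -26, 12))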
Results

Behavioral results

0-back task
Data of the 0-back task were analyzed using a repeated-measures ANOVA with the factors Modality (visual vs. auditory) and Presentation (target vs. nontarget). Means of individual RT medians entered the analysis. Results yielded a significant main effect of Modality (F(1,13) = 24.2; P < 0.001), with shorter RTs for auditory compared to visual stimuli, and a significant main effect of Presentation (F(1,13) = 16.5; P < 0.01), with shorter RTs for target compared to nontarget items. There were no errors on the 0-back task.

2-back task
RT medians from correct trials entered a repeated-measures ANOVA with the factors Modality (visual vs. auditory) and Transformation (transformation vs. nontransformation). Results yielded a significant interaction (F(1,12) = 8.7; P < 0.05); none of the main effects reached significance. Post hoc t tests yielded significantly shorter RTs for the AA condition compared to the VA condition, and the RTs in the AA condition were also significantly shorter than the RTs in the VV condition. A repeated-measures ANOVA of correct response rates did not yield any significant effects. The behavioral data are presented in Table 1.
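For illustration, an analysis of this kind can be sketched in Python with pandas and statsmodels; the file name, column names and factor labels below are assumptions made for the sketch, not part of the original analysis pipeline.

# Sketch of a 2 x 2 repeated-measures ANOVA (Modality x Transformation) on
# per-subject RT medians, analogous to the 2-back analysis reported above.
# File name and column names are illustrative assumptions.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Expected long format: one row per correct trial with columns
# "subject", "modality" ("visual"/"auditory"), "transformation" ("yes"/"no"), "rt" (ms).
trials = pd.read_csv("rt_trials.csv")

# One RT median per subject and condition cell, as in the reported analysis.
cell_medians = (trials
                .groupby(["subject", "modality", "transformation"], as_index=False)["rt"]
                .median())

anova = AnovaRM(cell_medians, depvar="rt", subject="subject",
                within=["modality", "transformation"]).fit()
print(anova)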
Table 1
Mean of reaction time medians (ms) and standard deviations

                          Visual x̄ (±SD)     Auditory x̄ (±SD)
New 0-back-item           821.8 (225.3)      611.4 (223.15)
0-back-item               696.4 (143.6)      489.3 (78.6)
2-back-item               967.9 (249.8)      724.2 (226.0)
2-back-transformation     867.6 (308.4)      1032.1 (344.5)

fMRI results

The fMRI data entered analysis in a factorial design (factors Modality and Transformation).

Factorial design
The main effect of Modality was associated with significant activations bilaterally in the superior and middle temporal gyrus. Additional activations were seen in the left middle and inferior occipital gyrus and the right fusiform and inferior occipital gyrus. A small activation was found in the right superior parietal lobe, covering probable Brodmann area (BA) 7. The main effect of Transformation did not yield significant clusters. The interaction term yielded significant activation in the left transverse and superior temporal gyrus and in the right middle occipital gyrus (see Fig. 2). The interaction was further explored by the following contrasts (for results, see Table 2).

VA > AA
This contrast was analyzed to demonstrate activation reflecting the process of comparing items which had been presented as visual images with the auditory 2-back probe. The contrast yielded activation in the superior temporal gyrus. The coordinates of this activation are consistent with those found for the interaction (see Fig. 2).

AV > VV
This contrast was analyzed in analogy to the previous contrast to demonstrate activation reflecting the comparison of auditory stimuli with the visual probe. No significant clusters were found.

Fig. 2. Activation pattern for the interaction (left images) and the VA > AA contrast (right images).

Table 2
Statistical results, Talairach coordinates (Talairach and Tournoux, 1988) and probable Brodmann areas (BA) of the factorial design analysis

Condition                 Cluster size   P corrected   F value   x    y    z    Structure                   BA
Main effect of Modality   2461           0.000         208.73    61   27   11   Superior temporal gyrus     42
                                         0.000         152.76    59   10    0   Superior temporal gyrus     22
                                         0.000         126.09    53   17    5   Superior temporal gyrus     41
                          2329           0.000         201.51    61   10    1   Superior temporal gyrus     21
                                         0.000         181.68    63   19    1   Superior temporal gyrus     22
                                         0.000         105.32    59   31    2   Middle temporal gyrus       21
                          1498           0.000         161.25    34   89    4   Middle occipital gyrus      18
                                         0.000         152.06    46   76    6   Middle occipital gyrus      19
                                         0.000         143.00    42   84    3   Inferior occipital gyrus    18
                          1965           0.000          88.26    34   44   20   Fusiform gyrus              20
                                         0.000          78.17    40   87    1   Inferior occipital gyrus    18
                                         0.000          76.90    44   63   15   Fusiform gyrus              37
                            22           0.001          33.29    48   77   13   Middle temporal gyrus       19
                             3           0.002          32.16    32   75   15   Fusiform gyrus              19
                            10           0.004          30.60    51   12   22   Superior temporal gyrus     38
                             5           0.006          29.45    32   65   51   Superior parietal lobule     7
Interaction                              0.000          40.72    49   25   12   Transverse temporal gyrus   41
                                         0.001          34.26    48   23    5   Superior temporal gyrus     41
                                         0.013          27.75    40   83    8   Middle occipital gyrus      19

Condition                 Cluster size   P corrected   T value   x    y    z    Structure                   BA
VA > AA                     17           0.008          5.23     49   23    3   Superior temporal gyrus     41
                                         0.026          4.98     46   25   10   Superior temporal gyrus     41
                             3           0.000         10.64     63   19    1   Superior temporal gyrus     22

No significant clusters were found in the contrast AV vs. VV.

Discussion

The present study aimed to explore the processing of visual and auditory stimuli in working memory. According to Smith and Jonides (1997), visual input into working memory is automatically transformed into a phonological code to support rehearsal. The transformation process was assessed in a variant of a 2-back task which involved visual and the corresponding auditory stimuli (i.e. the picture names) and transformation as well as nontransformation trials.

Behavioral data

RT analysis indicated that auditory stimuli (i.e. picture names) were generally processed faster than visual stimuli, both when no
working memory load was involved and in the 2-back task. RTs to auditory targets were faster on trials without a modality switch (AA compared to VA), reflecting the increased processing time associated with the need for transformation. For transformation trials, RTs were shorter on AV relative to VA trials.

Imaging data

Significant activation of the left primary auditory cortex was associated with the statistical interaction effect and, more specifically, with the contrast VA > AA. This activation in BA 41 is related to transmodal auditory recoding of visual input. As there was no main effect of Transformation, the mechanisms underlying modality switches depend upon the direction of the recoding; the process is achieved by recoding the stored item into the modality of the target stimulus. The VA and AA conditions both involve auditory probe stimuli. Activation in the auditory cortex therefore cannot be attributed to auditory input, as this is comparable for both conditions. The activation in the left auditory cortex is more likely associated with the processing of the current stimulus and the matching of the retrieved item and its modality. As activation in the auditory cortex was observed during the retrieval and matching of information, it represents a recoding of the stored item and its modality during the matching procedure. This process is associated with longer RTs compared to matching without modality changes. Activation of auditory cortex without auditory input has been reported during silent lipreading (Calvert et al., 1997) and the observation of articulatory gestures (Pekkola et al., 2005). Similar mechanisms may underlie the activation of the primary auditory cortex in the present study. All pictures used in this experiment could be easily verbalized and were additionally presented as spoken words. The primary auditory cortex may therefore be activated by verbalizable visual cues in addition to linguistic cues, as described by Calvert et al. (1997). The activation may thus reflect the recoding of visual information into the target modality. An automatic visual-to-auditory transformation as suggested by Smith and Jonides (1997) should not be accompanied by additional activation and prolonged RTs. The recoding of visual information into a phonological code therefore represents an active, nonautomatic process which might reflect, in the context of working memory, a memory-supporting process. These findings do not rule out the existence of an independent nonverbal working memory system with separate neuroanatomical correlates. The critical issue is whether or not stimuli can be coded verbally. Spatial working memory appears to be one candidate for a nonverbal working memory system. The design of nonverbalizable stimuli is a challenging task, as even fractals (Ragland et al., 2002) did not completely suppress verbal coding. The present findings are not directly comparable to previous imaging studies which reported modality-invariant representations during working memory in the parietal and/or prefrontal cortex (Wager and Smith, 2003; Owen et al., 2005). The auditory cortex activation in the present contrasts most likely reflects a recoding mechanism, as outlined above, rather than the storage/rehearsal processes related to parietal and prefrontal areas. Activation of the visual cortex or other brain areas after presentation of auditory targets was not observed in association with transformation.
This finding should, however, be considered with caution as a null result (lack of activation) is difficult to
interpret. It may be speculated that matching of auditory to visual stimuli involves an automatic transformation process, a hypothesis which is also supported by the shorter RTs. Thus, verbal working memory does not appear to be amodal as suggested by Schumacher et al. (1996). The results suggest that auditory input can be processed without further modification, perhaps due to its phonological features. Visual input, on the other hand, has to be recoded phonologically, which is reflected by the left primary auditory cortex activation. Subvocal rehearsal associated with Broca's area in verbal memory (Paulesu et al., 1993) does not appear to have been relevant in the present task, which may be related to the use of a factorial design: the contrasts may have eliminated such activations, as they were present in all experimental conditions to a similar degree. The lack of significant dorsolateral PFC activation in the VA > AA contrast of the present study may likewise be due to the factorial design, which contrasts conditions that all involve PFC activation, as seen in the individual contrasts. In a meta-analysis of n-back working memory paradigms, Owen et al. (2005) concluded that the parietal cortex is also consistently activated in n-back tasks, although Ravizza et al. (2004) recently pointed out that parietal activation in verbal working memory might relate to speech processing and executive processes rather than phonological short-term storage. There is no clear explanation for the lack of parietal activation in the present task. It may relate to design features, i.e. the use of an n-back task in an event-related design, which, to our knowledge, has not been applied before. The present findings focus directly on the modality matching process in working memory when visual and auditory stimuli are used interchangeably. As the matching and recoding process from the visual to the auditory modality is mediated by the left auditory cortex, the neural correlate of the opposite matching procedure needs to be investigated further, as no clear activation was found in the present study. The results should also be considered in the context of the 2-back task. The question arises whether the modality of the 1-back stimulus has an effect on processing of the current probe and its modality. In the present study, all trials, irrespective of their 1-back item (auditory or visual), were collapsed into one condition. As the ratio of 1-back items and their modalities was balanced over the experimental trials, no confounding effect should be expected. Possible interference effects of the 1-back stimulus and related activation patterns are of general interest and should be addressed in further fMRI studies.
Conclusion

The present study aimed to investigate the neural correlates of transmodal processing in verbal working memory using visual and auditory stimuli. The hypothesized automatic transformation of visual input into a phonological code, as suggested by Smith and Jonides (1997), was not supported by the present data. The results suggest that the left auditory cortex is involved in recoding previously stored images into the auditory modality of the probe. Activation patterns do not suggest a comparable recoding process for auditory stimuli. The results are consistent with activation patterns from silent lipreading, which indicated recoding of visual contents into a phonological code without actual auditory input, and embed this
mechanism into the working memory framework. A general amodal working memory model is not supported by the present findings, as verbalizable visual input appears to be recoded to the auditory modality.
References

Awh, E., Jonides, J., Smith, E.E., Schumacher, E.H., Koeppe, R.A., Katz, S., 1996. Dissociation of storage and rehearsal in verbal working memory. Psychol. Sci. 7, 25–31.
Baddeley, A.D., 1986. Working Memory. Oxford Univ. Press, New York.
Baddeley, A., 2003. Working memory: looking back and looking forward. Nat. Rev. Neurosci. 4, 829–839.
Calvert, G.A., Bullmore, E.T., Brammer, M.J., Campbell, R., Williams, S.C., McGuire, P.K., Woodruff, P.W., Iversen, S.D., David, A.S., 1997. Activation of auditory cortex during silent lipreading. Science 276, 593–596.
Courtney, S.M., Ungerleider, L.G., Keil, K., Haxby, J.V., 1996. Object and spatial visual working memory activate separate neural systems in human cortex. Cereb. Cortex 6, 39–49.
Courtney, S.M., Petit, L., Haxby, J.V., Ungerleider, L.G., 1998a. The role of prefrontal cortex in working memory: examining the contents of consciousness. Philos. Trans. R. Soc. Lond., B Biol. Sci. 353, 1819–1828.
Courtney, S.M., Petit, L., Maisog, J.M., Ungerleider, L.G., Haxby, J.V., 1998b. An area specialized for spatial working memory in human frontal cortex. Science 279, 1347–1351.
Crottaz-Herbette, S., Anagnoson, R.T., Menon, V., 2004. Modality effects in verbal working memory: differential prefrontal and parietal responses to auditory and visual stimuli. NeuroImage 21, 340–351.
Nystrom, L.E., Braver, T.S., Sabb, F.W., Delgado, M.R., Noll, D.C., Cohen, J.D., 2000. Working memory for letters, shapes, and locations: fMRI evidence against stimulus-based regional organization in human prefrontal cortex. NeuroImage 11, 424–446.
Oldfield, R.C., 1971. The assessment and analysis of handedness: the Edinburgh inventory. Neuropsychologia 9, 97–113.
Owen, A.M., McMillan, K.M., Laird, A.R., Bullmore, E., 2005. N-back working memory paradigm: a meta-analysis of normative functional neuroimaging studies. Hum. Brain Mapp. 25, 46–59.
Paulesu, E., Frith, C.D., Frackowiak, R.S., 1993. The neural correlates of the verbal component of working memory. Nature 362, 342–345.
Pekkola, J., Ojanen, V., Autti, T., Jääskeläinen, I.P., Möttönen, R., Tarkiainen, A., Sams, M., 2005. Primary auditory cortex activation by visual speech: an fMRI study at 3 T. NeuroReport 16, 125–128.
Peters, J., Suchan, B., Zhang, Y., Daum, I., in press. Visuo-verbal interactions in working memory: evidence from event-related potentials. Cogn. Brain Res.
Ragland, J.D., Turetsky, B.I., Gur, R.C., Gunning-Dixon, F., Turner, T., Schroeder, L., Chan, R., Gur, R.E., 2002. Working memory for complex figures: an fMRI comparison of letter and fractal n-back tasks. Neuropsychology 16, 370–379.
Ravizza, S.M., Delgado, M.R., Chein, J.M., Becker, J.T., Fiez, J.A., 2004. Functional dissociations within the inferior parietal cortex in verbal working memory. NeuroImage 22, 562–573.
Ruchkin, D.S., Berndt, R.S., Johnson Jr., R., Ritter, W., Grafman, J., Canoune, H.L., 1997. Modality-specific processing streams in verbal working memory: evidence from spatio-temporal patterns of brain activity. Brain Res. Cogn. Brain Res. 6, 95–113.
Schumacher, E.H., Lauber, E., Awh, E., Jonides, J., Smith, E.E., Koeppe, R.A., 1996. PET evidence for an amodal verbal working memory system. NeuroImage 2, 79–88.
Smith, E.E., Jonides, J., 1997. Working memory: a view from neuroimaging. Cogn. Psychol. 33, 5–42.
Smith, E.E., Jonides, J., 1998. Neuroimaging analyses of human working memory. Proc. Natl. Acad. Sci. 95, 12061–12068.
Smith, E.E., Jonides, J., 1999. Storage and executive processes in the frontal lobes. Science 283, 1657–1661.
Snodgrass, J.G., Vanderwart, M., 1980. A standardized set of 260 pictures: norms for name agreement, image agreement, familiarity, and visual complexity. J. Exp. Psychol. Hum. Learn. Mem. 6, 174–215.
Talairach, J., Tournoux, P., 1988. Co-Planar Stereotaxic Atlas of the Human Brain. Thieme, Stuttgart.
Wager, T.D., Smith, E.E., 2003. Neuroimaging studies of working memory: a meta-analysis. Cogn. Affect. Behav. Neurosci. 3, 255–274.