Behavioural Brain Research 203 (2009) 200–206
Research report
Deictic word and gesture production: Their interaction

Sergio Chieffi a,b,∗, Claudio Secchi a, Maurizio Gentilucci a

a Department of Neuroscience, Section of Physiology, University of Parma, Via Volturno 39, 43100 Parma, Italy
b Department of Experimental Medicine, Section of Physiology, Second University of Naples, Via Costantinopoli 16, 80138 Napoli, Italy
∗ Corresponding author: S. Chieffi, sergio.chieffi@unina2.it. doi:10.1016/j.bbr.2009.05.003
Article history: Received 22 December 2008; received in revised form 27 April 2009; accepted 3 May 2009; available online 9 May 2009.

Keywords: Deictic gesture; Deictic word; Gesture production; Word production; Gesture kinematics; Voice spectra
Abstract

We examined whether and how deictic gestures and words influence each other when the content of the gesture was congruent or incongruent with that of the simultaneously produced word. Two experiments were carried out. In Experiment 1, the participants read aloud the deictic word ‘QUA’ (‘here’) or ‘LÀ’ (‘there’), printed on a token placed near to or far from their body. Simultaneously, they pointed towards their own body, when the token was placed near, or at a remote position, when the token was placed far. In this way, participants read ‘QUA’ (‘here’) and pointed towards themselves (congruent condition) or a remote position (incongruent condition); or they read ‘LÀ’ (‘there’) and pointed towards a remote position (congruent condition) or themselves (incongruent condition). In a control condition, in which a string of ‘X’ letters was printed on the token, the participants were silent and only pointed towards themselves (token placed near) or a remote position (token placed far). In Experiment 2, the participants read aloud the deictic word placed in the near or far position without gesturing. The results showed that the congruence/incongruence between the content of the deictic word and that of the gesture affected gesture kinematics and voice spectra. Indeed, the movement was faster in the congruent than in the control and incongruent conditions, and slower in the incongruent than in the control condition. As concerns voice spectra, formant 2 (F2) decreased in the incongruent conditions. The results suggest the existence of a bidirectional interaction between the speech and gesture production systems.
1. Experiment 1

1.1. Introduction

Discourse production, in many cases, involves not only the production of speech sounds but also the performance of hand/arm gestures. Traditionally, gestures have been assumed to share a computational stage with speech and to have mainly communicative and informative functions [34–38,49–51]. Several lines of evidence support this view. First, gestures and speech show parallel semantic and pragmatic functions [11,49]. This is the case of referential gestures, which bear a formal relation to the semantic content of the concurrent linguistic item; this content may be concrete objects and events (iconic gestures) or abstract concepts (metaphoric gestures) [53]. Other gestures, termed either beats [53] or batons [17], show parallels of pragmatic function. They emphasize discourse-oriented functions where the importance of a linguistic item arises not from its own propositional content but from its relation to other linguistic items [48].
Further, in children’s development, gestures and speech seem to develop together through the same stages of increasing symbolization [3,25,49,64]; and gestures and speech may be simultaneously affected by neurological damage [7,8,12,15]. Along similar lines, McNeill and Duncan [52] proposed that gestures, together with speech, express the same underlying idea unit but do not necessarily express identical aspects of it. The confluence of speech and gesture suggests that the speaker is thinking in terms of a combination of imagery and linguistic categorial content [52]. The growth point is the name that McNeill and Duncan [52] give to the minimal psychological unit combining imagery and linguistic categorial content. However, other authors have suggested that gestures also play a role in the speech production process. The majority of researchers who support this view have followed the model of speech production proposed by Levelt [46,47], which divides this process into three broad stages: conceptualization, formulation and articulation. Some authors placed the influence of gesture on speech at the conceptualization stage, i.e. gestures would serve internal functions by helping speakers to organize their own thinking [1,28,39]. Such a view of gesture is referred to as the Information Packaging Hypothesis [39]. According to this hypothesis, gestures help speakers to organize and translate spatio-motoric knowledge into linguistic output. Evidence for the Information Packaging Hypothesis comes from studies showing a greater production of
gestures, both in children [1] and in adults [28,29], when the task requires more complex conceptualization. Other authors placed the influence of gesture on speech at the formulation stage, i.e. gestures help speakers to access specific items in their mental lexicon [6,9,27,43,55,59]. Such a view of gesture is referred to as the Lexical Access Hypothesis. Two types of evidence support this view. First, gesture rates increase when lexical access is difficult [56] or when names have not been rehearsed [9]. Second, prohibiting gestures makes speech less fluent [59]. In investigating the processing involved in gesture and speech production, several models have been proposed that follow the account of Levelt [46,47]. The main difference between the proposed models lies in the level of computation at which the production of gestures and speech occurs. Krauss’s model [43,44] assumes that gestures are generated from non-propositional (spatio-dynamic) representations in working memory. A spatial/dynamic feature selector transforms these representations into abstract specifications that are translated into a motor program for gesture production. Simultaneously, the conceptualizer retrieves propositional representations and elaborates a preverbal message that is transformed by the formulator into overt speech. De Ruiter’s model [16] assumes that the conceptualizer has access, in working memory, both to imagistic (or spatio-temporal) information, for the generation of gestures, and to propositional information, for the generation of preverbal messages. The output of the conceptualizer is then, besides a preverbal message, a representation called a sketch, which contains information that is sent to the gesture planner. Finally, Kita and Özyürek [40] proposed the Interface Model for speech and gesture production. They suggested [40,41,58] that the message generation process for speech (the Conceptualizer in Levelt [46]) interacts online with the process that determines the content of gestures (the ‘Action Generator’). The Action Generator takes into account both the information in spatio-motoric working memory and the message representation for speech in the Conceptualizer. Unlike the preceding models, in which gestures are generated before and without access to linguistic formulation processes, in Kita and Özyürek’s model [40] speech and gesture production processes interact online at the conceptual level. Traditionally, the relationship between gesture and word production has been studied through the observation of spontaneous activity during conversations or narratives. Levelt et al. [48] were among the first to propose an experimental approach to this question. They asked participants to indicate which of an array of referent lights was momentarily illuminated. There were four LEDs, two in each field: one LED was near to and the other far from the midline. The participants pointed to the light (deictic gesture) and/or used a deictic expression, “this light” to indicate the near LED or “that light” to indicate the far LED. By analyzing the timing of gesture and speech onset, Levelt et al. [48] found that their synchronization was largely established in the planning phase and that, once the pointing movement was initiated, gesture and speech operated in an almost modular fashion. Subsequently, Gentilucci and co-workers [2,5] required participants to produce simultaneously communicative words and symbolic gestures of the same [5] or different [2] meaning, e.g.
they pronounced “ciao” while performing the “ciao gesture”, or pronounced “no” while performing the “ciao gesture”. The authors found that voice parameters were amplified, whereas arm kinematics was slowed down, as compared with the sole production of either the words or the gestures, but only in the condition of congruence between gesture and word [2,5]. They proposed that spoken words and symbolic gestures are coded as a single signal by a unique communication system [5,19,22–24]. The system governing the interactions between gesture and word is probably located in Broca’s area, as shown by a repetitive Transcranial
Magnetic Stimulation (rTMS) study [20]. Krahmer and Swerts [42] examined whether the occurrence of a beat on a particular word had a noticeable impact on speech itself. In their study, speakers were instructed to produce a target sentence containing two proper names that might be marked for prominence with a pitch accent and/or with a beat gesture. Krahmer and Swerts [42] found that beat gestures have a significant effect on the spoken realization of the target words. When a speaker produced a beat, the word uttered while making the beat was produced with relatively more spoken emphasis, irrespective of the position of the acoustic accent. Krahmer and Swerts [42] suggested that, at least for manual beat gestures, there is a very close connection between speech and gesture. In the present experiment, we examined whether and how gesture and speech influence each other when the content of the deictic gesture was congruent/incongruent with that of the simultaneously produced deictic word. Typically, deictic gestures are pointing movements with the index finger extended and the remaining fingers closed. They are used to indicate an object or a person, a direction, a location, or more abstract referents such as “past time”. The “meaning” of a deictic gesture is the act of indicating the things pointed to [43]. Declarative and request pointing appear in infants at the age of approximately 10 months and are frequently accompanied by vocalizations [65]. Bernardis and co-workers [4] found that, in infants, the voice spectra of vocalizations produced during request pointing are influenced by the dimensions of the object target of the pointing. In other words, gesture and vocalization specify the location and properties of the object, showing a strict interaction between gesture and the emergent lexicon. On the basis of these data [4], we were interested in verifying the existence of an interaction between “simpler” signals (i.e. deictic words and gestures), which we hypothesized to occur also at the level of signal parameterisation, in addition to that of temporal coordination [48]. In fact, previous studies (see for example [2,5,42]) analysed gestures that are involved in more complex functions, and their interaction is necessary because they usually add information to spoken language. In the present experiment, participants read aloud a deictic word, ‘QUA’ (‘here’) or ‘LÀ’ (‘there’), printed on a token that could be placed in two positions, near to or far from the participant’s body. Simultaneously, they performed a deictic (or pointing) gesture directed at their own body, when the token was placed near, or at a remote position, when the token was placed far. In this way, the participants read aloud ‘QUA’ (‘here’) and pointed towards themselves (congruent condition) or a remote position (incongruent condition). Similarly, they read ‘LÀ’ (‘there’) and pointed towards a remote position (congruent condition) or themselves (incongruent condition). There was also a further condition in which the string ‘XXX’ or ‘XX’ was printed on the token. In this case, the participants were silent and only pointed towards themselves (token placed near) or a remote position (token placed far). This control condition was used to examine the performance of the participants when the sole gesture was produced, in comparison with when both gesture and word were produced. We examined whether the congruence/incongruence between the content of the word and that (i.e.
the direction) of the to-be-performed gesture influenced verbal production, gestural production, or both. Our predictions were as follows: (a) an effect on both verbal and gestural production would support the hypothesis of a bidirectional interaction between the two production systems; (b) an effect on verbal or gestural production alone would support the hypothesis of a unidirectional interaction between the two systems; (c) the absence of any effect on both verbal and gestural production would support the hypothesis of independence between the two systems.
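To make the factorial logic of this design concrete, the following sketch (Python; the function name and labels are ours, purely illustrative and not part of the original study materials) maps each word/token-position pairing onto the condition labels used throughout the paper.

```python
# Illustrative sketch of the Experiment 1 design (hypothetical labels).
# The word 'X' stands for the silent control strings ('XXX' / 'XX').

def classify_trial(word: str, token_position: str) -> str:
    """Return the condition label for one trial of Experiment 1."""
    if word == "X":
        return "control"  # silent pointing only
    congruent = (word == "QUA" and token_position == "near") or \
                (word == "LÀ" and token_position == "far")
    return "congruent" if congruent else "incongruent"

# Enumerate the six cells of the 3 (word) x 2 (token position) design:
for word in ("QUA", "LÀ", "X"):
    for position in ("near", "far"):
        print(f"{word:>3} / {position}: {classify_trial(word, position)}")
```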
1.2. Materials and methods

1.2.1. Participants
Twelve right-handed (according to the Edinburgh Inventory [57]) women participated in the study (ages 19–32 years). All participants were naïve as to the purpose of the study. The study was approved by the Ethics Committee of the Medical Faculty of the University of Parma.

1.2.2. Apparatus
Participants sat in front of a black table, in a dark and soundproof room. They fixated a LED (fixation point, FP) placed 38 cm from the table edge. Each participant placed her index finger on a switch located on the table plane (starting position, SP). SP was 20 cm distant from the table edge. A circular white token, 5.5 cm in diameter, was placed either in a ‘near’ (between the participant’s trunk and SP) or a ‘far’ (beyond SP) position. The near position was 8 cm distant from the table edge (and 12 cm from SP); the far position was 68 cm distant from the table edge (and 48 cm from SP). FP, SP and the near and far token positions all lay along the participant’s midsagittal axis. A microphone (Studio Electret Microphone, 20–20,000 Hz, 500 Ω, 5 mV/Pa/1 kHz) was placed on the table by means of a support. The centre of the support was 20 cm distant from the table edge and 18 cm to the left of the participant’s sagittal axis. Either a deictic word (‘QUA’, i.e. ‘here’, or ‘LÀ’, i.e. ‘there’) or an X-string of letters (‘XXX’ or ‘XX’) was printed in black on the token. The height of both the words and the X-strings was 1.5 cm. Further, ‘QUA’ and ‘XXX’ were 4.5 cm wide, ‘LÀ’ and ‘XX’ 3.5 cm wide.

1.2.3. Procedure
The trial started with the illumination of the room, commanded by a PC. As soon as the room was illuminated, the participants were required to read aloud the deictic word and simultaneously to perform a pointing movement directed either towards themselves (the forearm was flexed), if the token was near, or at a remote position (both the arm and forearm were extended), if the token was placed far. If an X-string of letters was printed on the token, the participants performed only a pointing movement directed either towards themselves (token placed near) or at a remote position (token placed far). The participants were required to move at maximal velocity. There were six experimental conditions: token with the word ‘QUA’ (‘here’) placed in the (1) near or (2) far position; token with the word ‘LÀ’ (‘there’) placed in the (3) near or (4) far position; token with an X-string (‘XXX’ or ‘XX’) placed in the (5) near or (6) far position. For each experimental condition there were eight trials. In total, 48 trials were pseudo-randomly run.

1.2.4. Movement and voice recording
Pointing movements of the index finger were recorded using the three-dimensional (3D) optoelectronic ELITE system (B.T.S., Milan, Italy). It consists of two TV cameras detecting infrared reflecting markers at a sampling rate of 50 Hz. Movement reconstruction in 3D coordinates and computation of the kinematic parameters are described in a previous work [21]. One marker was placed on the participant’s index finger and was used to analyze the kinematics (index displacement, peak acceleration, peak velocity) of the deictic gesture. Index displacement was calculated as the distance in 3D space between the final and initial positions of the index finger. The voice emitted by the participants during word pronunciation was recorded by a microphone connected to a PC for sound recording by means of a card device (16 PCI Sound Blaster, CREATIVE Technology Ltd., Singapore).
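To illustrate how the kinematic parameters just described could be derived from recorded marker positions, here is a minimal sketch in Python/NumPy. The finite-difference scheme and the synthetic input are our assumptions; the actual ELITE processing pipeline is described in [21] and may differ (for instance, it would filter the trajectory before differentiating).

```python
import numpy as np

def kinematic_parameters(positions: np.ndarray, fs: float = 50.0):
    """positions: (n_samples, 3) index-marker coordinates in cm, sampled at fs Hz.

    Returns index displacement (cm), peak velocity (cm/s), peak acceleration (cm/s^2).
    A real pipeline would low-pass filter the trajectory before differentiating.
    """
    dt = 1.0 / fs
    # Index displacement: 3D distance between final and initial marker position.
    displacement = np.linalg.norm(positions[-1] - positions[0])
    # Tangential velocity profile from first-order finite differences.
    speed = np.linalg.norm(np.diff(positions, axis=0), axis=1) / dt
    # Acceleration profile from differences of the speed profile.
    accel = np.diff(speed) / dt
    return displacement, speed.max(), np.abs(accel).max()

# Synthetic check: a smooth, bell-shaped 47 cm forward movement lasting 1 s.
t = np.linspace(0.0, 1.0, 51)
y = 47.0 * (t - np.sin(2 * np.pi * t) / (2 * np.pi))  # bell-shaped speed profile
demo = np.column_stack([np.zeros_like(t), y, np.zeros_like(t)])
disp, pv, pa = kinematic_parameters(demo)
print(f"displacement = {disp:.1f} cm, peak velocity = {pv:.1f} cm/s")
```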
The spectrogram of each pronounced deictic word was computed for each participant using the PRAAT software (University of Amsterdam, the Netherlands). The time courses of formant 1 (F1) and formant 2 (F2) were analysed. The central part of the formant time course was analysed by excluding both the formant transition (consonant/vowel) and the final part of the vowel during which echo could add to the emitted sound. The mean values of F1 and F2 of the vowel /a/ were analysed.
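The formant analysis above was done in PRAAT. For readers wishing to script a comparable measurement, the sketch below uses parselmouth, a Python interface to Praat; the file name, analysis settings and vowel interval are illustrative assumptions (in the actual analysis the vowel boundaries were set by excluding the consonant/vowel transition and the echo-contaminated final part).

```python
import numpy as np
import parselmouth  # Python interface to Praat (pip install praat-parselmouth)

snd = parselmouth.Sound("qua_trial01.wav")   # hypothetical recording of one utterance
vowel_start, vowel_end = 0.12, 0.30          # illustrative hand-labelled /a/ interval (s)

# Burg-method formant tracking, comparable to Praat's standard formant analysis.
formants = snd.to_formant_burg(time_step=0.005, max_number_of_formants=5,
                               maximum_formant=5500.0)

times = np.arange(vowel_start, vowel_end, 0.005)
f1 = np.array([formants.get_value_at_time(1, t) for t in times])
f2 = np.array([formants.get_value_at_time(2, t) for t in times])

# Mean F1 and F2 of the central part of /a/, as analysed in the paper.
print(f"mean F1 = {np.nanmean(f1):.0f} Hz, mean F2 = {np.nanmean(f2):.0f} Hz")
```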
1.2.5. Data analysis
Deictic gesture: separate ANOVAs were conducted on the mean values of the analyzed index finger kinematic parameters, with deictic word (‘QUA’ (‘here’) vs. ‘LÀ’ (‘there’) vs. ‘X’) and token position (near vs. far) as within-participant factors. Voice: separate ANOVAs were also conducted on the mean F1 and F2 values of the vowel /a/ of both ‘QUA’ (‘here’) and ‘LÀ’ (‘there’), with deictic word (‘QUA’ (‘here’) vs. ‘LÀ’ (‘there’)) and token position (near vs. far) as within-participant factors. In all analyses, paired comparisons were performed using the Newman–Keuls procedure. The significance level was fixed at p < 0.05.
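A repeated-measures ANOVA of this kind can be reproduced with standard statistical software; below is a minimal sketch using statsmodels in Python. The CSV file and column names are hypothetical, and the Newman–Keuls post-hoc procedure used in the paper is not provided by statsmodels and would have to be run separately.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format table of per-participant cell means, mirroring the
# 3 (word: QUA / LÀ / X) x 2 (token position: near / far) within-participant design.
# Expected columns: participant, word, position, peak_velocity
df = pd.read_csv("peak_velocity_cell_means.csv")

res = AnovaRM(df, depvar="peak_velocity", subject="participant",
              within=["word", "position"]).fit()
print(res.anova_table)  # F and p values for the two main effects and their interaction
```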
1.3. Results

1.3.1. Deictic gesture
No significant main effects were found on index peak acceleration (deictic word: F(2,22) = 0.21, n.s.; token position: F(1,11) = 4.14, n.s.). There was a significant interaction between the two factors (F(2,22) = 5.80, p < 0.01). Post-hoc comparisons showed that when the participants pointed towards a remote position (i.e. token placed far), index peak acceleration was lower when they simultaneously read ‘QUA’ (‘here’) than when they read ‘LÀ’ (‘there’) or were silent. Further, when the participants read ‘QUA’ (‘here’), peak acceleration was lower when they pointed towards remote positions than when they pointed towards themselves (i.e. token placed near) (Fig. 1).

Fig. 1. Mean values of index peak acceleration measured in Experiment 1. Bars are S.E.

No significant main effects were found on index peak velocity (deictic word: F(2,22) = 0.02, n.s.; token position: F(1,11) = 3.13, n.s.). Again, there was a significant interaction (F(2,22) = 8.27, p < 0.005) between the two factors. Post-hoc comparisons showed that when the participants pointed towards a remote position, index peak velocity was lower when they simultaneously read ‘QUA’ (‘here’) than when they read ‘LÀ’ (‘there’) or were silent. Further, when the participants pointed towards themselves, index peak velocity was greater when they simultaneously read ‘QUA’ (‘here’) than when they read ‘LÀ’ (‘there’) or were silent (Fig. 2). Post-hoc comparisons also showed that, whether the participants read ‘QUA’ (‘here’) or ‘LÀ’ (‘there’) or were silent, index peak velocity was greater when they simultaneously pointed towards themselves than when they pointed towards a remote position (Fig. 2). These differences in peak velocity might depend on differences in index displacement, considering that movement velocity increases with increasing movement amplitude [20]. However, the statistical analysis did not support this hypothesis. Indeed, the index displacement of gestures directed towards the body did not differ from that of gestures directed towards a remote position (F(1,11) = 3.79, n.s.; near position = 38.9 cm; far position = 47.0 cm).
Fig. 2. Mean values of index peak velocity measured in Experiment 1. Bars are S.E.
Fig. 3. Mean formant 2 (F2) values of vowel /a/ for ‘QUA’ (‘here’) and ‘LÀ’ (‘there’) measured in Experiment 1. Bars are S.E.
1.3.2. Voice
The analysis of the voice spectrograms showed, as regards mean F1 values, a significant effect of deictic word (F(1,11) = 66.07; p < 0.00001; ‘QUA’ (‘here’) = 859.0 Hz; ‘LÀ’ (‘there’) = 943.6 Hz). There was no significant effect of token position (F(1,11) = 1.06, n.s.) and no interaction (F(1,11) = 0.70; n.s.). Regarding mean F2 values, there was a significant effect of deictic word (F(1,11) = 16.30; p < 0.002; ‘QUA’ (‘here’) = 1351.2 Hz; ‘LÀ’ (‘there’) = 1540.5 Hz), but not of token position (F(1,11) = 4.26, n.s.). A significant interaction between the two factors was present (F(1,11) = 5.01, p < 0.05). For ‘QUA’ (‘here’) pronunciation, post-hoc comparisons showed that F2 was lower when the participants pointed towards a remote position (i.e. token placed far) than when they pointed towards themselves (i.e. token placed near). For ‘LÀ’ (‘there’) no significant effect due to token position was found (Fig. 3).

1.4. Discussion
The main finding of the present experiment was that the congruence/incongruence between the content of the deictic word and that of the gesture (i.e. its direction) influenced both gesture kinematics and voice spectra. Indeed, when the token was placed far and the participants pointed towards a remote position, both index peak acceleration and peak velocity were lower when they read the word ‘QUA’ (‘here’) than when they read the word ‘LÀ’ (‘there’) or were silent. Conversely, when the token was placed near and the participants pointed towards themselves, index peak velocity was greater when they read the word ‘QUA’ (‘here’) than when they read the word ‘LÀ’ (‘there’) or were silent. Further, as regards voice spectra, when the participants read the word ‘QUA’ (‘here’), F2 was lower when they simultaneously pointed towards a remote position than when they pointed towards themselves. The data also showed that, overall, peak velocity was greater when the participants pointed towards themselves than when they pointed towards a remote position. These differences cannot be ascribed to differences in index displacement between the two spatial conditions. A possible explanation is that movements in the far conditions were slowed down by increased arm inertia. Indeed, when the participants pointed towards themselves they moved only the forearm (elbow flexion), whereas when they pointed towards a remote position they moved both arm and forearm (elbow and shoulder extension). However, in our experimental design another factor might have influenced both the kinematics of the deictic gesture and the voice spectra, namely the presence and the physical location of the token on which the deictic word was printed.
The token was not the target of the pointing gesture; it signalled the direction of the deictic gesture. Previous studies showed that contextual stimuli may influence movement kinematics. Contextual stimuli may produce an increase in movement duration [60,61], a decrease in peak wrist velocity and an increase in the deceleration phase [31], and a deviation in movement trajectory [10,18,66]. However, if we hypothesize that, in our experiment, the presence of the token influenced movement kinematics, such an influence should have been similar in both token positions, since the direction of the gesture was always congruent with the token position. Indeed, the participants pointed towards themselves when the token was near, or at a remote position when the token was far. Nevertheless, we observed different effects on pointing kinematics between the two token positions. When the participants pointed towards themselves, peak velocity was greater when they simultaneously read the word ‘QUA’ (‘here’) than when they read the word ‘LÀ’ (‘there’) or were silent. When the participants pointed towards a remote position, both peak acceleration and velocity were lower when they read the word ‘QUA’ (‘here’) than when they read the word ‘LÀ’ (‘there’) or were silent. Therefore, the specific effects we observed on pointing kinematics in the two token position conditions cannot be ascribed to the presence and the physical location of the token, but rather to the congruence vs. incongruence between the content of the gesture and that of the word. The same cannot be said for the effects observed on voice spectra. In this case, one needs to consider not only the congruence vs. incongruence between the content of the word and that of the to-be-performed gesture, but also that between the content of the word and the physical location of the token. In other words, the decrease in F2 observed when the participants read the word ‘QUA’ (‘here’) and simultaneously pointed towards a remote position, in comparison with when they pointed towards themselves, might depend not on the incongruence between the content of the word and that of the gesture but on the incongruence between the content of the word and the spatial position of the token. In order to assess the possible influence of the spatial position of the token on voice spectra we carried out Experiment 2.

2. Experiment 2

2.1. Introduction

Experiment 2 was performed to examine whether the congruence/incongruence between the content of the deictic term and the position of the token influenced voice spectra. The participants read the deictic term, ‘QUA’ (‘here’) or ‘LÀ’ (‘there’), printed on the token, without gesturing. The token was placed near or far. Consequently, they read the word ‘QUA’ (‘here’) printed on the token placed in the near (congruent condition) or far (incongruent condition) position; or they read the word ‘LÀ’ (‘there’) printed on the token placed in the far (congruent condition) or near (incongruent condition) position. In this way we could examine whether the position of the token influenced voice spectra.

2.2. Materials and methods

2.2.1. Participants
Twelve right-handed (according to the Edinburgh Inventory [57]) women participated in the experiment (ages 21–30 years). All participants were naïve as to the purpose of the study. They were different individuals from those who participated in Experiment 1, in order to avoid covert activation of pointing gestures.

2.2.2. Apparatus
The apparatus was the same as in Experiment 1. The tokens with the deictic words were used.
The token was placed either in the near or the far position, as in Experiment 1.

2.2.3. Procedure
The participants were required to read aloud the deictic word printed on the token, which was placed in the near or far position. There were four experimental
conditions: token with the word ‘QUA’ (‘here’) placed in the (1) near or (2) far position; token with the word ‘LÀ’ (‘there’) placed in the (3) near or (4) far position. For each experimental condition there were eight trials. In total, 32 trials were pseudo-randomly run.

2.2.4. Voice recordings and data analyses
Voice recording and the analyses performed on F1 and F2 were the same as in Experiment 1.

2.3. Results
The analysis of the voice spectrograms showed, as regards mean F1 values, a significant effect of deictic word (F(1,11) = 27.42; p < 0.0005; ‘QUA’ (‘here’) = 875.7 Hz; ‘LÀ’ (‘there’) = 981.0 Hz). There was no significant effect of token position (F(1,11) = 1.71, n.s.) and no interaction (F(1,11) = 0.02; n.s.). Regarding mean F2 values, there was a significant effect of both deictic word (F(1,11) = 32.69, p < 0.0002; ‘QUA’ (‘here’) = 1364.8 Hz; ‘LÀ’ (‘there’) = 1567.4 Hz) and token position (F(1,11) = 6.41, p < 0.05; near position = 1478.4 Hz; far position = 1453.7 Hz). Further, there was a significant interaction between the two factors (F(1,11) = 8.21, p < 0.02). For ‘LÀ’ (‘there’) pronunciation, post-hoc comparisons showed that F2 was greater when the token was placed in the near than in the far position. For ‘QUA’ (‘here’) no significant effect due to token position was found (see Fig. 4).

Fig. 4. Mean formant 2 (F2) values of vowel /a/ for ‘QUA’ (‘here’) and ‘LÀ’ (‘there’) measured in Experiment 2. Bars are S.E.

2.4. Discussion
The results of the present experiment, compared with those of Experiment 1, suggest that the position of the token on which the deictic word was printed did not influence voice spectra. In Experiment 2, as regards ‘QUA’ (‘here’), the value of F2 measured when the token was placed far was not significantly different from that measured when the token was placed near. Conversely, in Experiment 1, the value of F2 measured when the token was placed far was lower than that measured when the token was placed near. It should be remembered that in Experiment 1 the participants, besides reading the word, simultaneously performed a pointing gesture, and when the token was placed far they pointed towards a remote position. Thus, the reduction of F2 observed in Experiment 1 was likely due to the production of a gesture whose direction was incongruent with the content of the word simultaneously read. As regards ‘LÀ’ (‘there’), the value of F2 measured in Experiment 2 when the token was placed near was greater than that measured when the token was placed far. Conversely, in Experiment 1, the value of F2 measured when the token was placed near was not significantly different from that measured when the token was placed far. Thus, it is possible that in Experiment 1 there was a reduction of F2 when the token was placed near and the participants, besides reading the word, simultaneously pointed towards themselves. Thus, also in this case, the reduction of F2 was likely due to the production of a gesture whose direction was incongruent with the content of the word simultaneously read.

3. General discussion
In the present study, the participants read aloud a deictic word and simultaneously performed a deictic gesture. The main finding was that the congruence/incongruence between the contents of the two signals influenced their production. This favours the hypothesis that the speech and gesture production systems interact with each other. According to the dual-route model, two types of mechanism support reading aloud [13,14]. The non-lexical route allows readers to derive the sounds of written words by means of mechanisms that convert letters or letter clusters into their corresponding sounds. This route is functionally limited in that it does not provide information about word meaning. Conversely, the lexical route is implicated in the retrieval of stored information about the orthography, semantics, and phonology of familiar words [13,14]. Access to the meaning of a lexical item should activate its conceptual representation, which incorporates a set of both propositional and non-propositional (e.g. visual, spatial and dynamic) properties [43,44]. Non-propositional specifications, e.g. visual [51], visuo-spatial [26], spatio-dynamic [43], spatio-temporal [16] or spatio-motoric [40], should be translated by a motor planner into a motor program that provides the motor system with a set of instructions for executing the gesture. If we consider the deictic words used in our study, one may expect that reading ‘QUA’ (‘here’), which means “in, at or to this place or position”, activates spatio-dynamic (or spatio-motoric) specifications that, in turn, trigger a motor plan for a pointing movement directed toward a near position; and that reading ‘LÀ’ (‘there’), which means “in, at or to that place or position”, activates a motor plan for a pointing movement directed toward a far position. Besides reading and accessing the meaning of the printed word, the participants simultaneously processed another kind of information, namely the token position, which indicated the direction (content) of the to-be-performed deictic gesture. When the token was placed near, the participants had to plan a pointing movement directed towards themselves; when the token was placed far, they had to plan a pointing movement directed towards a remote position. Thus, it is possible that the congruence/incongruence between the content of the deictic word and that of the to-be-performed gesture could affect both the kinematics of the pointing movement and the voice spectra. The results of the present study suggest that this actually occurred. Indeed, for gesture production, when the participants pointed towards themselves, index peak velocity was greater when they simultaneously read the word ‘QUA’ (‘here’) (congruent condition) than when they read the word ‘LÀ’ (‘there’) (incongruent condition) or were silent (control condition). This might depend on an amplification of gesture parameterization due to a synergic (or resonance) effect between: (a) the spatio-dynamic specifications related to the content of the deictic word and those related to the content of the to-be-performed gesture, or (b) the motor program triggered by the spatio-dynamic specifications related to the content of the deictic word and that triggered by the spatio-dynamic specifications related to the content of the gesture.
Further, when the participants pointed towards a remote position, both index peak acceleration and peak velocity were lower when they simultaneously read the word ‘QUA’ (‘here’) (incongruent condition) than when they read the word ‘LÀ’ (‘there’) (congruent condition) or were silent (control condition). This might depend on a partial inhibition of gesture parameterisation due to a conflict between: (a) the spatio-dynamic specifications related to the
content of the deictic word and those related to the content of the to-be-performed gesture, or (b) the motor program triggered by the spatio-dynamic specifications related to the content of the deictic word and that triggered by the spatio-dynamic specifications related to the content of the gesture. As concerns voice spectra, the examination and comparison of the results of Experiments 1 and 2 show that there was a reduction of F2 when the content of the deictic word was incongruent with that of the to-be-performed gesture. This occurred both when the participants read the word ‘QUA’ (‘here’) and when they read the word ‘LÀ’ (‘there’). Thus, it is possible to hypothesize that the conflict between the content of the deictic word and that of the to-be-performed gesture interfered with the phonetic planning that serves as input to the articulatory system. It is interesting to note that the effects on gesture kinematics were evident only when the participants read the word ‘QUA’ (‘here’) and simultaneously performed a pointing gesture. For the generation of pointing gestures, de Ruiter [16] proposed that some parameters are fixed and stored in memory, e.g. the shape of the hand, whereas others are free and constitute the degrees of freedom of the gesture, e.g. arm orientation. However, it is possible that arm orientation is also stored in memory if a particular pointing gesture is usually oriented towards a narrow region of space. This might be the case for the deictic gestures performed in association with the word ‘here’, considering that the region of space indicated by this kind of gesture is narrower than that indicated by gestures performed in association with the word ‘there’. Indeed, ‘here’ refers especially to the speaker’s peripersonal space, whereas ‘there’ refers to all the space beyond the peripersonal space. Thus, when the participants read the word ‘here’ and pointed towards themselves, the gesture might have been facilitated by the use of spatio-dynamic parameters already stored in memory. Conversely, when the participants read the word ‘here’ and pointed towards a remote position, the gesture might have been partially inhibited because the spatio-dynamic parameters retrieved from memory, related to the word content, conflicted with those related to the to-be-performed gesture. The results of the present study differ from those found by Gentilucci and co-workers [2,5] in that an increase rather than a decrease [5] in the arm kinematic parameters was observed when congruent gesture and word were simultaneously produced, and a decrease rather than no effect [2] was observed when gesture and word were incongruent. These contrasting results can be explained by considering that deictic gesture and word code the same information about a spatial location. The localization is more precise for the gesture than for the corresponding deictic word. Consequently, their simultaneous execution could induce resonance and, in turn, amplification of arm movement parameters. In contrast, communicative words and symbolic gestures can code different aspects of the same meaning. For example, the gestures studied by Gentilucci and co-workers [2,5] (i.e. CIAO, NO and STOP) can code the intention to interact directly with the interlocutor. This aspect may be absent in the corresponding word. Consequently, the gesture can transfer this aspect to the word, which in turn, when it has the same meaning, partially inhibits gesture execution. Indeed, in this case the gesture becomes somewhat redundant [2,19].
In the studies by Gentilucci and co-workers [2,5], an increase in F2 was found in both the congruent and the incongruent gesture conditions, whereas in the present study a decrease in F2 was found in the incongruent condition. Placing the tongue forward/backward induces an increase/decrease in F2 [45]. Previously, Gentilucci and co-workers [2,5] suggested that the increase in F2 induced by the gesture was due to the intention to interact directly with the interlocutor, because in non-humans both mouth aperture and tongue protrusion accompany gestures typical of approaching relationships (for example, the tongue is protruded during lip-smacking
and protruded face that precede grooming actions among monkeys [62,63]). A similar explanation can be offered for the results of the present study. However, the gesture affected the word differently. In fact, the incongruent deictic gesture reduced the communicative potential of the word: consequently, the tongue was retracted and F2 decreased. No increase in F2 was observed in the case of congruence of the gesture with the word, probably because the symbolic gestures studied by Gentilucci and co-workers [2,5] always contain a communicative intention, which is automatically transferred to the word. In contrast, context can make the deictic gesture communicative [33,54], and only in this case can the simultaneous production of the two signals be associated with an increase in F2. In conclusion, the results of the present study suggest the existence of a tight interaction between the systems involved in producing deictic words and gestures, as suggested in previous studies for emblems [2,5], iconic gestures [40,41,58] and beat gestures [42]. A tight interaction between the two systems has also been reported in the comprehension domain. A number of experimental studies have investigated how the brain integrates the comprehension of hand gestures with co-occurring speech, and have provided evidence that the semantic processing evoked by gestures is qualitatively similar to that of words [67]. In ERP studies, when subjects were presented with co-speech gestures, a semantically anomalous gesture, like an anomalous word, elicited a stronger negative deflection in the signal around 400 ms after stimulus onset (N400 effect) [32,33,58], and in fMRI studies an increased activation in an overlapping region of the left frontal cortex [68]. Recently, Hubbard et al. [30] scanned subjects with fMRI while they listened to spontaneously produced speech accompanied by rhythmic (beat) gestures and found greater activity in the left superior temporal gyrus and sulcus (STG/S), areas well known for their role in speech processing, suggesting the existence of a common neural substrate for processing speech and gesture.

References

[1] Alibali MW, Kita S, Young A. Gesture and the process of speech production: we think, therefore we gesture. Lang Cogn Process 2000;15:593–613.
[2] Barbieri F, Buonocore A, Dalla Volta R, Gentilucci M. How symbolic gestures and words interact with each other. Brain Lang 2009 [online].
[3] Bates E, Dick F. Language, gesture, and the developing brain. Dev Psychobiol 2002;40:293–310.
[4] Bernardis P, Bello A, Pettenati P, Stefanini S, Gentilucci M. Manual actions affect vocalizations of infants. Exp Brain Res 2008;184:599–603.
[5] Bernardis P, Gentilucci M. Speech and gesture share the same communication system. Neuropsychologia 2006;44:178–90.
[6] Butterworth B, Hadar U. Gesture, speech and computational stages: a reply to McNeill. Psychol Rev 1989;96:168–74.
[7] Carlomagno S, Pandolfi M, Marini A, Di Iasi G, Cristilli C. Coverbal gestures in Alzheimer’s type dementia. Cortex 2005;41:535–46.
[8] Carlomagno S, Santoro A, Menditti A, Pandolfi M, Marini A. Referential communication in Alzheimer’s type dementia. Cortex 2005;41:520–34.
[9] Chawla P, Krauss RM. Gesture and speech in spontaneous and rehearsed narratives. J Exp Soc Psychol 1994;30:580–601.
[10] Chieffi S, Ricci M, Carlomagno S. Influence of visual distractors on movement trajectory. Cortex 2001;37:389–405.
[11] Chieffi S, Ricci M. Gesture production and text structure. Percept Mot Skills 2005;101:435–9.
[12] Cicone M, Wapner W, Foldi N, Zurif E, Gardner H. The relation between gesture and language in aphasic communication. Brain Lang 1979;8:324–49.
[13] Coltheart M, Curtis B, Atkins P, Haller M. Models of reading aloud: dual-route and parallel-distributed-processing approaches. Psychol Rev 1993;100:589–608.
[14] Coltheart M, Rastle K, Perry C, Langdon R, Ziegler J. DRC: a dual route cascaded model of visual word recognition and reading aloud. Psychol Rev 2001;108:204–56.
[15] Delis D, Foldi NS, Hamby S, Gardner H, Zurif E. A note on temporal relations between language and gestures. Brain Lang 1979;8:350–4.
[16] de Ruiter JP. The production of gesture and speech. In: McNeill D, editor. Language and gesture. Cambridge: Cambridge University Press; 2000. p. 284–311.
[17] Ekman P, Friesen W. The repertoire of nonverbal behaviour: categories, origins, usage and coding. Semiotica 1969;11:49–98.
[18] Gangitano M, Daprati E, Gentilucci M. Visual distractors differentially interfere with the reaching and grasping components of prehension movements. Exp Brain Res 1998;122:441–52.
[19] Gentilucci M, Benuzzi F, Gangitano M, Grimaldi S. Grasp with hand and mouth: a kinematic study on healthy subjects. J Neurophysiol 2001;86:1685–99.
[20] Gentilucci M, Bernardis P, Crisi G, Dalla Volta R. Repetitive transcranial magnetic stimulation of Broca’s area affects verbal responses to gesture observation. J Cogn Neurosci 2006;18:1059–74.
[21] Gentilucci M, Chieffi S, Scarpa M, Castiello U. Temporal coupling between transport and grasp components during prehension movements: effects of visual perturbation. Behav Brain Res 1992;15:71–82.
[22] Gentilucci M, Corballis MC. From manual gesture to speech: a gradual transition. Neurosci Biobehav Rev 2006;30:949–60.
[23] Gentilucci M, Dalla Volta R, Gianelli C. When the hands speak. J Physiol Paris 2008;102:21–30.
[24] Gentilucci M, Stefanini S, Roy AC, Santunione P. Action observation and speech production: study on children and adults. Neuropsychologia 2004;42:1554–67.
[25] Goldin-Meadow S, Butcher C. Pointing toward two-word speech in young children. In: Kita S, editor. Pointing: where language, culture, and cognition meet. Mahwah, NJ: Erlbaum; 2003. p. 85–107.
[26] Hadar U, Burstein A, Krauss R, Soroker N. Ideational gestures and speech in brain-damaged subjects. Lang Cognit Process 1998;13:59–76.
[27] Hadar U, Yadlin-Gedassy S. Conceptual and lexical aspects of gesture: evidence from aphasia. J Neurolinguistics 1994;8:57–65.
[28] Hostetter AB, Alibali MW. On the tip of the mind: gesture as key to conceptualization. In: Forbus K, Gentner D, Regier T, editors. Proceedings of the 26th Annual Meeting of the Cognitive Science Society. Mahwah, NJ: Erlbaum; 2004. p. 589–94.
[29] Hostetter AB, Alibali MW, Kita S. I see it in my hand’s eye: representational gestures are sensitive to conceptual demands. Lang Cogn Process 2007;22:313–36.
[30] Hubbard AL, Wilson SM, Callan DE, Dapretto M. Giving speech a hand: gesture modulates activity in auditory cortex during speech perception. Hum Brain Mapp 2009;30:1028–37.
[31] Jackson SR, Jackson GM, Rosicky J. Are non-relevant objects represented in working memory? The effect of non-target objects on reach and grasp kinematics. Exp Brain Res 1995;102:519–30.
[32] Kelly SD, Kravitz C, Hopkins M. Neural correlates of bimodal speech and gesture comprehension. Brain Lang 2004;89:253–60.
[33] Kelly SD, Ward S, Creigh P, Bartolotti J. An intentional stance modulates the integration of gesture and speech during comprehension. Brain Lang 2007;101:222–33.
[34] Kendon A. Some relationships between body motion and speech. An analysis of an example. In: Siegman A, Pope B, editors. Studies in dyadic communication. Elmsford, NY: Pergamon; 1972. p. 177–210.
[35] Kendon A. Gesticulation and speech: two aspects of the process of utterance. In: Key MR, editor. The relationship of verbal and nonverbal communication. The Hague: Mouton; 1980. p. 207–27.
[36] Kendon A. Gesture and speech: how they interact. In: Wiemann JM, Harrison RP, editors. Nonverbal interaction. Beverly Hills, CA: Sage Publications; 1983. p. 13–45.
[37] Kendon A. Do gestures communicate? A review. Res Lang Soc Interact 1994;27:175–200.
[38] Kendon A. Gesture: visible action as utterance. Cambridge: Cambridge University Press; 2004. p. 412.
[39] Kita S. How representational gestures help speaking. In: McNeill D, editor. Language and gesture. Cambridge, UK: Cambridge University Press; 2000. p. 162–85.
[40] Kita S, Özyürek A. What does cross-linguistic variation in semantic coordination of speech and gesture reveal? Evidence for an interface representation of spatial thinking and speaking. J Mem Lang 2003;48:16–32.
[41] Kita S, Özyürek A, Allen S, Brown A, Furman R, Ishizuka T. Relations between syntactic encoding and co-speech gestures: implications for a model of speech and gesture production. Lang Cognit Process 2007;22:1212–36.
[42] Krahmer E, Swerts M. The effects of visual beats on prosodic prominence: acoustic analyses, auditory perception and visual perception. J Mem Lang 2007;57:396–414.
[43] Krauss RM, Chen Y, Gottesman RF. Lexical gestures and lexical access: a process model. In: McNeill D, editor. Language and gesture. New York: Cambridge University Press; 2000. p. 261–83.
[44] Krauss RM, Hadar U. The role of speech-related arm/hand gestures in word retrieval. In: Campbell R, Messing L, editors. Gesture, speech, and sign. Oxford: Oxford University Press; 1999. p. 93–116.
[45] Leoni FA, Maturi P. Manuale di Fonetica. Roma: Carocci; 2002. p. 172.
[46] Levelt WJM. Speaking: from intention to articulation. Cambridge, MA: MIT Press; 1989. p. 566.
[47] Levelt WJM. The skill of speaking. In: Bertelson P, Eelen P, d’Ydewalle G, editors. International perspectives on psychological science (Vol. I: leading themes). Hillsdale: Lawrence Erlbaum Associates; 1994. p. 89–104.
[48] Levelt WJM, Richardson G, La Heij W. Pointing and voicing in deictic expressions. J Mem Lang 1985;24:133–64.
[49] McNeill D. So you think gestures are nonverbal? Psychol Rev 1985;92:350–71.
[50] McNeill D. Psycholinguistics: a new approach. New York: Harper & Row; 1987. p. 290.
[51] McNeill D. Hand and mind: what gestures reveal about thought. Chicago: University of Chicago Press; 1992. p. 416.
[52] McNeill D, Duncan SD. Growth points in thinking-for-speaking. In: McNeill D, editor. Language and gesture. New York: Cambridge University Press; 2000. p. 141–61.
[53] McNeill D, Levy E. Conceptual representations in language activity and gesture. In: Jarvella R, Klein W, editors. Speech, place, and action: studies in deixis and related topics. Chichester, England: Wiley; 1982. p. 271–95.
[54] Melinger A, Levelt WJM. Gesture and the communicative intention of the speaker. Gesture 2004;4:119–41.
[55] Morrel-Samuels P, Krauss RM. Word familiarity predicts temporal asynchrony of hand gestures and speech. J Exp Psychol Learn Mem Cogn 1992;18:615–22.
[56] Morsella E, Krauss RM. The role of gestures in spatial working memory and speech. Am J Psychol 2004;117:411–24.
[57] Oldfield RC. The assessment and analysis of handedness: the Edinburgh inventory. Neuropsychologia 1971;9:97–113.
[58] Özyürek A, Kita S, Allen S, Furman R, Brown A. How does linguistic framing of events influence co-speech gestures? Insights from cross-linguistic variations and similarities. Gesture 2005;5:215–37.
[59] Rauscher FB, Krauss RM, Chen Y. Gesture, speech and lexical access: the role of lexical movements in speech production. Psychol Sci 1996;7:226–31.
[60] Tipper SP, Howard LA, Jackson SR. Selective reaching to grasp: evidence for distractor interference effects. Vis Cogn 1997;4:1–38.
[61] Tipper SP, Lortie C, Baylis GC. Selective reaching: evidence for action-centered attention. J Exp Psychol Hum Percept Perform 1992;18:891–905.
[62] van Hooff JARAM. Facial expressions in higher primates. Symp Zool Soc Lond 1962;8:97–125.
[63] van Hooff JARAM. The facial displays of the catarrhine monkeys and apes. In: Morris D, editor. Primate ethology. London: Weidenfield and Nicolson; 1967. p. 7–68.
[64] Volterra V, Bates E, Benigni L, Bretherton I, Camaioni L. First words in language and action: a qualitative look. In: Bates E, Benigni L, Bretherton I, Camaioni L, Volterra V, editors. The emergence of symbols: cognition and communication in infancy. New York: Academic Press; 1979. p. 141–222.
[65] Volterra V, Caselli MC, Capirci O, Pizzuto E. Gesture and the emergence and development of language. In: Tomasello M, Slobin D, editors. Beyond nature-nurture. Essays in honor of Elizabeth Bates. NJ: Lawrence Erlbaum Associates; 2005. p. 3–40.
[66] Welsh TN, Elliott D, Weeks DJ. Hand deviations toward distractors. Evidence for response competition. Exp Brain Res 1999;127:207–12.
[67] Willems RM, Hagoort P. Neural evidence for the interplay between language, gesture, and action: a review. Brain Lang 2007;101:278–89.
[68] Willems RM, Özyürek A, Hagoort P. When language meets action: the neural integration of gesture and speech. Cereb Cortex 2007;17:2322–33.