The impact of task automaticity on speech in noise




Available online at www.sciencedirect.com

ScienceDirect Speech Communication 65 (2014) 1–8 www.elsevier.com/locate/specom

Adam P. Vogel a,⇑, Janet Fletcher b, Paul Maruff c

a Speech Neuroscience Unit, The University of Melbourne, Melbourne, Australia
b School of Languages and Linguistics, The University of Melbourne, Victoria, Australia
c Howard Florey Institute for Neuroscience and Mental Health, The University of Melbourne, Victoria, Australia

Received 5 September 2013; received in revised form 30 April 2014; accepted 7 May 2014 Available online 20 May 2014

Abstract

In the control of skeleto-motor movement, it is well established that the less complex, or more automatic a motor task is, the less variability and uncertainty there is in its performance. It was hypothesized that a similar relationship exists for integrated cognitive-motor tasks such as speech, where the uncertainty with which actions are initiated may increase when the feedback loop is interrupted or dampened. To investigate this, the Lombard effect was exploited to explore the acoustic impact of background noise on speech during tasks increasing in automaticity. Fifteen healthy adults produced five speech tasks bearing different levels of automaticity (e.g., counting, reading, unprepared monologue) during habitual and altered auditory feedback conditions (Lombard effect). Data suggest that speech tasks relatively free of meaning or phonetic complexity are influenced to a lesser degree by compromised auditory feedback than more complex paradigms (e.g., contemporaneous speech) on measures of timing. These findings inform understanding of the relative contribution speech task selection plays in measures of speech. Data also aid in understanding the relationship between task automaticity and altered speech production in neurological conditions where dual impairments of movement and cognition are observed (e.g., Huntington’s disease, progressive aphasia). © 2014 Elsevier B.V. All rights reserved.

Keywords: Lombard effect; Automaticity; Task selection; Speech; Cognitive load; Timing

⇑ Corresponding author. Address: 550 Swanston Street, Melbourne, Victoria 3010, Australia. E-mail address: [email protected] (A.P. Vogel).

http://dx.doi.org/10.1016/j.specom.2014.05.002
0167-6393/© 2014 Elsevier B.V. All rights reserved.

1. Introduction

The cognitive control of speech production is complex and involves adaptive and compensatory mechanisms that are proposed to be based on stored representations of intended speech output (Houde and Jordan, 1998). When there is a mismatch between speech output and the stored representation of intended speech, adjustments are made to speech output in real time (Tremblay et al., 2003). One example of this real-time adjustment is the change in loudness, pitch and segment duration in response to changes in background noise during speech (Brown et al., 1972; Letowski et al., 1993; Summers et al., 1988). This phenomenon is termed the Lombard effect (Lombard, 1911). Cognitive neuroscientific models of the Lombard effect propose that background noise introduces error into the signal (where signal is defined as the afferent or efferent message conducted within the nervous system and error is defined as a disruption to the message conducted within the nervous system), distorting auditory feedback and subsequent output. Altered output in response to noise often manifests as increases in fundamental frequency, intensity and duration (Junqua, 1996; Summers et al., 1988; Wassink et al., 2007). Experimental studies indicate that these adjustments are made instantly and seemingly involuntarily, and thus they are conceptualized as the result of the altered auditory feedback disrupting sensorimotor feedback loops and forward models of intended speech (Castellanos et al., 1996; Houde and Jordan, 1998; Lee et al., 2007; Lu and Cooke, 2009).

While disruption to forward models of intended speech is known to manifest across multiple parameters of speech, there has, as yet, been no systematic investigation into how this disruption affects different types of speech task. For example, studies of the Lombard effect have operationalized speech production as single words (Patel and Schell, 2008; Summers et al., 1988), repetition of pre-learned sentences (Brown et al., 1972), reading paragraphs (Letowski et al., 1993; Watson and Hughes, 2006; Yates, 1963) or the generation of spontaneous speech (Rivers and Rastatter, 1985; Siegel et al., 1992).

In the control of skeleto-motor movement, it is well established that the less complex, or more automatic a motor task is, the less variability and uncertainty there is in its performance (van Beers et al., 2002). These studies also show that the automaticity of intended movements influences error in those movements (Hayhoe and Ballard, 2005; Wolpert et al., 2011). The control of movement therefore reflects the combined integration of signal and error, where error is defined as random or unpredictable fluctuations that are not part of a signal. Error increases with reduced automaticity of movement and this results in greater trial-to-trial variability in the movement itself (Bays and Wolpert, 2007; Bays et al., 2005; van Beers et al., 2002). As yet, models of speech motor control have not exploited task automaticity as a way of understanding the effect of error in internal models of intended speech. Integrated cognitive-motor tasks such as speech, which encompass both higher level cortical involvement (e.g. in language formulation) and skeleto-motor control (e.g. in oral motor function), may be affected by automaticity in a similar way.

In psychological models, task automaticity has been conceptualized as a continuum (Macleod and Dunbar, 1988). The position of a task along that continuum can be considered a dynamic process depending on factors such as adaption, complexity, interactional requirements, confidence and ability as well as novelty of the task (Blais et al., 2010; Gilabert, 2006; Robinson, 2001, 2005). In the context of the Lombard effect, it is hypothesized that the magnitude of the ‘effect’ will be in part determined by the extent to which the task is automatic, whereby the uncertainty with which speech output is produced may increase when the feedback loop is interrupted or dampened, leading to an increase in the Lombard effect.

Information on the relative influence task automaticity plays in speech could then inform understanding of speech production in populations and experimental conditions where optimal function is compromised. Pathological groups such as people with Huntington’s disease present with comorbid motor speech impairment and cognitive decline (Stout et al., 2011; Vogel et al., 2012). Tasks carrying contrasting levels of automaticity could be employed to delineate the competing effects of cognition and speech production, thus providing evidence for appropriate selection of therapeutic targets as well as information to aid differential diagnosis of patient symptomatology.

To investigate this, the Lombard methodology was exploited to examine the effect of background noise on five different speech tasks, increasing in automaticity, in healthy adults. Measures of timing, frequency and intensity were employed to demonstrate the impact of task determination on speech production. It was hypothesized that tasks possessing comparatively less automaticity (such as contemporaneous speech) would be more affected by auditory interference than tasks where the content and expectations are known to the speaker (e.g., counting).

2. Methods

2.1. Participants

Fifteen healthy adults (8 male, 7 female) aged between 21 and 53 years (mean age, 30.9 ± 7.6 years) participated in this study. Participants were excluded if they were heavy smokers or coffee drinkers, were recreational drug users, or had a history of neurological trauma or disease.

2.2. Stimuli

Stimuli were chosen in accordance with experimental requirements aiming to introduce varying degrees of automaticity into the speech stimuli. Task automaticity was determined using Robinson’s (2005) criteria for task automaticity, conditions and difficulty (see Table 1 for details). Relative levels were arbitrarily defined through consensus ratings made by the authors. The speech battery consisted of (i) a contemporaneous speech task requiring participants to produce a monologue on any topic of their choosing (e.g., work, family, friends) for approximately one minute; (ii) reading a phonetically balanced text, the grandfather passage (169 syllables) (Van Riper, 1963); (iii) counting from one to 20 (32 syllables); (iv) saying the days of the week beginning with Monday (15 syllables); and (v) an alternating motion rate task (AMR) requiring participants to repeat the consonant vowel cluster /pataka/ 10 times as quickly and clearly as possible.
Although not containing any overt semantic associations, the final task was included as a highly repetitive and phonetically consistent stimulus. The intrinsic regularity of the AMR task allows for training, and therefore adaption, even for those participants unfamiliar with its structure. Thus, the AMR task potentially mitigates the role meaning plays in speech. The inclusion of a practice production ensured speakers unfamiliar with its structure were able to perform at a functional level prior to the commencement of the study.

Table 1. Determination of task automaticity based on Robinson’s criteria for task difficulty, condition and complexity (Robinson, 2005).

| Criterion | Monologue | Reading | Counting | Days | AMR |
|---|---|---|---|---|---|
| Task complexity (cognitive factors) | | | | | |
| Resource directing (e.g., ± few elements, ± contemporaneous planning, ± reasoning demands) | 4 | 3 | 2 | 2 | 1 |
| Resource depleting (e.g., ± planning, ± single task, ± prior knowledge) | 4 | 3 | 2 | 2 | 1 |
| Task conditions (interactional factors) | | | | | |
| Participation variables (e.g., one-way/two-way) | 2 | 1 | 1 | 1 | 1 |
| Participant variables (e.g., ± familiarity) | 4 | 3 | 1 | 1 | 3 |
| Task difficulty (learner factors) | | | | | |
| Affective variables (e.g., motivation, anxiety, confidence) | 4 | 3 | 1 | 1 | 2 |
| Ability variables (e.g., working memory, aptitude, intelligence) | 4 | 3 | 2 | 2 | 1 |
| Sequencing criteria | | | | | |
| Prospective decisions about task decisions | 4 | 3 | 2 | 2 | 2 |

Note: Scale of 1–4, where 4 indicates paradigm is required to complete task and 1 indicates concept is not required or minimally required to complete task; AMR = alternating motion rate.

2.3. Procedure

Speech tasks were explained to the participant verbally and via written instruction before the first recording. For the baseline condition (no noise) participants were instructed to speak in a natural manner, using their typical speaking rate, with a loudness level appropriate for speaking to one or two people in quiet surroundings. Participants completed the entire speech battery two times in succession with a one minute break between productions. The first production was then discarded as a practice session and not included in any analyses. Participants completed the baseline battery twice to acquire a stable recording according to the findings derived from a previous stability study (Vogel et al., 2011). Two hours later, participants completed the speech battery under the ‘noise’ condition. Speech was elicited while a looped sample of 24 multi-speaker babble combined with Gaussian white noise was delivered at 85 dB SPL over headphones. The noise condition was produced only once to mitigate the potential for participants to adapt to the stimuli and thus reduce the anticipated change in production.

Multi-speaker babble was selected as it provides less variation in amplitude and frequency modulation over time (Laures and Bunton, 2003) than isolated speakers. It was combined with white noise to provide a flat spectrum of noise. Global signal to noise ratios (SNR) for the white noise and multi-speaker babble were 15.15 dB and 14.6 dB respectively. A level of 85 dB SPL has been shown to adequately induce the Lombard effect (Summers et al., 1988), and the response to noise has been shown to plateau at SPLs beyond this level (Dreher and O’Neill, 1957). Alternative noise paradigms were not included as the study aimed to induce the Lombard effect, not to investigate the relative influence of different noise types/amplitudes.

2.4. Data acquisition

The speech battery was performed by each participant individually in a sound-treated room.
Speech samples were recorded using a laptop PC (Hewlett-Packard, CA) and a Sennheiser PC 135 USB unidirectional head-mounted microphone (Sennheiser Communications, Solrød Strand, Denmark) (minimum sensitivity of −38 dB and a frequency range of 80 Hz–15 kHz), which was positioned at a 45° angle, 8 cm from the mouth. All data were sampled at 44.1 kHz, with quantization at 16 bits. Data were recorded using Audacity (version 1.2.6) (Mazzoni and Dannenberg, 2012).

2.5. Data analysis

2.5.1. Timing

Each speech sample was segmented and analyzed using a freely available acoustic analysis software program, PRAAT (Boersma, 2001). Silences were removed from the start and end of the counting and reading tasks. The monologue samples were truncated at 20 s each side of the temporal midpoint. Segmentation produced samples with uniform signal lengths, allowing automated analysis of each sample and eliminating the influence of task length on the contemporaneous task. Speech samples were segmented and analyzed using automated scripts designed to derive information from large batches of data containing multiple samples.

Timing measures were derived from all tasks using the following methods. Silences were identified from the intensity contour using three thresholds: (a) intensity threshold, (b) minimum silence duration (15 ms), and (c) minimum speech duration (30 ms). Silence segments were defined as the parts of the intensity contour that fell below the intensity threshold. Silence sections that were shorter than 15 ms were classed as speech and concatenated with the adjacent speech sections. Speech sections that were shorter than 30 ms were classed as silences and concatenated with the adjacent silences. The intensity threshold was set to 0.65 of the distance between the reference intensity (equal to 0.95 of the maximum intensity) and floor intensity (minimum). Reference intensity selection of 0.95 of the maximum intensity has been found more robust than use of the maximum, median, or modal intensities due to irregular bursts of energy that often occur with sporadically loud syllables or short phrases in spontaneous speech (e.g., emphatic stress). Visual inspection of the spectrum has shown that 0.95 of the maximum intensity represents the typical intensity of syllable peaks, whereas maximum intensity reflects a single observation interval and is less reliable than use of the reference intensity threshold described. The timing measures derived from this method included total silence time, total speech time, mean silence time, and speech rate (number of syllables/total signal time).

The AMR task was analyzed using the Diadochokinetic Rate Analysis component of the Motor Speech Profile, Model 5141, Version 3.4 (KayPENTAX Computerized Speech Lab, Lincoln Park, N.J., USA). This protocol measures the rate and regularity of consonant-vowel (CV) syllables repeated in a task involving repetition of a consonant vowel cluster (e.g., /pataka/).

2.5.2. Frequency and intensity

Fundamental frequency (f0) and intensity (dB SPL) were employed to demonstrate the existence of the Lombard effect. These metrics were derived from all tasks using automated scripts in PRAAT (Boersma, 2001) designed for batch processing (Vogel et al., 2009). f0 was determined using generic window lengths based on speaker sex (i.e., males: floor 70 Hz, ceiling 250 Hz; females: floor 100 Hz, ceiling 300 Hz).
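The silence-detection rules described in Section 2.5.1 can be sketched in Python as follows. This is an illustrative sketch only, not the PRAAT scripts used in the study: the function names, the 5 ms frame step and the dictionary keys are hypothetical, and the threshold is assumed to be measured upward from the floor intensity (floor + 0.65 × (reference − floor)), which is one reading of the description.

```python
from itertools import groupby


def merge_short(runs, target, min_ms):
    """Relabel runs of type `target` shorter than min_ms, then fuse equal neighbours."""
    merged = []
    for is_silence, dur in runs:
        if is_silence == target and dur < min_ms:
            is_silence = not target  # too short: absorb into the other class
        if merged and merged[-1][0] == is_silence:
            merged[-1][1] += dur
        else:
            merged.append([is_silence, dur])
    return merged


def timing_measures(intensity_db, step_ms=5, min_silence_ms=15,
                    min_speech_ms=30, n_syllables=None):
    """Derive timing measures from an intensity contour (one dB value per frame)."""
    floor = min(intensity_db)
    reference = 0.95 * max(intensity_db)
    # Assumption: threshold sits 0.65 of the way from floor up to reference.
    threshold = floor + 0.65 * (reference - floor)

    frames = [value < threshold for value in intensity_db]  # True = silence
    runs = [[flag, step_ms * len(list(group))] for flag, group in groupby(frames)]
    runs = merge_short(runs, True, min_silence_ms)   # short silences become speech
    runs = merge_short(runs, False, min_speech_ms)   # short speech becomes silence

    silences = [dur for flag, dur in runs if flag]
    total_ms = step_ms * len(intensity_db)
    total_silence = sum(silences)
    return {
        "total_silence_ms": total_silence,
        "total_speech_ms": total_ms - total_silence,
        "mean_silence_ms": total_silence / len(silences) if silences else 0.0,
        "percent_silence": 100.0 * total_silence / total_ms,
        # Speech rate as syllables per second of total signal time.
        "speech_rate": (n_syllables / (total_ms / 1000.0)
                        if n_syllables is not None else None),
    }
```

Durations are kept as integer milliseconds so that the 15 ms and 30 ms comparisons are not affected by floating-point rounding. The two merging passes are applied in the order given in the text (short silences first, then short speech sections).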

2.6. Post hoc examination of sample length

Tasks with lower levels of automaticity had inherently longer signal lengths than the counting, days of the week and AMR tasks [i.e., contemporaneous (trimmed at 40 s for all samples), reading (x̄ = 44 s), counting (x̄ = 14 s), days of the week (x̄ = 6 s) and AMR tasks (10 repetitions for all samples)]. In order to investigate the interaction between sample length and task automaticity, representative sample lengths of the contemporaneous, reading and counting tasks were compared (i.e., the middle 14 s of the contemporaneous task; the first 32 syllables of the reading task) using the same metrics as the primary analysis protocol.

2.7. Statistical analysis

Paired sample t-tests and standardized mean differences using effect size (Dunlap’s d) (Dunlap et al., 1996) were used to compare productions during the noise and baseline conditions. Dunlap’s d is considered more appropriate for dependent group examinations than alternative effect size algorithms (e.g., Cohen’s d) as it allows computation of effect size by accommodating the non-independence of variance that occurs in prospective study designs (Fredrickson et al., 2008). Paired sample t-tests were derived using SPSS (Version 20.0. Armonk, NY: IBM Corp).

3. Results

Tasks with lower levels of automaticity yielded larger effect sizes compared to more automatic tasks. f0 and intensity significantly increased during the noise condition on all tasks (see Table 2). A simultaneous decrease in the amount of speech and an increase in the amount of silence between conditions were observed on the contemporaneous task. Large standardized differences were observed during reading between conditions, with significant increases in mean silence length and percentage of silence during the noise condition. A significant decrease in total speech time was observed on the counting task. No significant differences were observed between baseline and noise conditions on the days and AMR tasks. Timing measures were significantly correlated between conditions on all connected speech tasks. AMR productions were not significantly correlated between conditions.

The contemporaneous and reading tasks were cut to reflect equitable sample lengths with the counting task in order to directly compare the role sample length plays in task automaticity and the Lombard effect. At the shorter length of 14 s (average length of the counting task), the contemporaneous task showed no significant differences between the two conditions on measures of timing (see Table 3). Significant differences were observed between the two conditions on the reading task, despite cutting the sample down to 32 syllables (the same number of syllables as in the counting task).

4. Discussion

These data support the hypothesis that tasks possessing comparatively less automaticity were more affected by auditory interference than tasks where the content and expectations were known to the speaker. For each acoustic measure that changed during the noise condition, the difference was larger in magnitude for tasks with lower levels of automaticity (i.e., reading and contemporaneous tasks). In relation to measures of timing, individuals slowed their speech, produced longer silences between words and exhibited a higher proportion of silences within their samples during the noise condition on these tasks. In contrast, tasks that were relatively free of meaning or phonetic complexity, such as the alternating motion rate task, were influenced to a lesser degree by a compromised auditory feedback loop. Even tasks that carried meaning for the speaker (i.e., days of the week and counting) revealed only modest adaption (relating to timing) during the noise condition.

When quantified in terms of standardized mean differences between baseline and noise conditions, the changes in timing on tasks with low levels of automaticity (see Table 1) were very large (contemporaneous: Dunlap’s d = 4.12–4.58; reading: Dunlap’s d = 3.12–4.84, with the exception of total speech time, d = 0.59). In contrast, high automaticity tasks yielded smaller effect sizes on the majority of timing measures (counting: Dunlap’s d = 0.32–0.88, with the exception of percent of silence and total speech time, d = 2.47 and 3.1 respectively; days of the week: Dunlap’s d = 0.05–1.83; AMR: Dunlap’s d = 0.59–2.6). Compared to high automaticity tasks, the contemporaneous and reading tasks yielded quantitatively larger changes in performance on the dB measures in particular (Dunlap’s d = 8.14 and 9.56 respectively). By convention, the magnitude of changes observed for f0 between conditions was large for all tasks, the most pronounced being reading (Dunlap’s d = 8.53).

Table 2. Mean (SD), significance, effect size and correlation by task and acoustic measure.

| Task | Measure | Baseline, mean (SD) | Lombard, mean (SD) | t (paired comparison) | Dunlap’s d (effect size) | r (correlation) |
|---|---|---|---|---|---|---|
| Contemporaneous speech | Mean silence length | 0.08 (0.04) | 0.12 (0.05) | 3.16 b | 4.38 | 0.53 |
| | Silence length SD | 0.26 (0.12) | 0.39 (0.17) | 3.31 b | 4.58 | 0.57 b |
| | Percent of silence | 20.99 (6.97) | 28.92 (11.05) | 3.18 b | 4.41 | 0.54 a |
| | Total speech time | 30.35 (3.79) | 27.51 (5.09) | 2.99 b | 4.12 | 0.72 b |
| | f0 | 152.92 | 168.44 | 3.55 b | 4.85 | 0.93 c |
| | dB | 58.12 | 65.33 | 5.91 c | 8.14 | 0.73 b |
| Reading | Mean silence length | 0.059 (0.015) | 0.07 (0.02) | 2.97 a | 4.1 | 0.72 a |
| | Silence length SD | 0.18 (0.064) | 0.22 (0.072) | 2.26 a | 3.12 | 0.64 a |
| | Percent of silence | 16.67 (4.68) | 20.98 (6.9) | 2.77 a | 3.82 | 0.59 a |
| | Total speech time | 35.84 (5.53) | 35.02 (5.42) | 0.43 | 0.59 | 0.92 c |
| | Speech rate | 4.07 (0.55) | 3.89 (0.51) | 3.54 b | 4.84 | 0.94 b |
| | f0 | 156.49 | 174.36 | 6.27 c | 8.53 | 0.97 c |
| | dB | 60.48 | 68.31 | 6.95 c | 9.56 | 0.7 b |
| Counting | Mean silence length | 0.08 (0.04) | 0.09 (0.05) | 0.64 | 0.88 | 0.73 b |
| | Silence length SD | 0.18 (0.1) | 0.2 (0.11) | 0.58 | 0.8 | 0.54 b |
| | Percent of silence | 24.15 (13.86) | 29.81 (14.88) | 1.78 | 2.47 | 0.66 b |
| | Total speech time | 10.53 (2.68) | 9.56 (2.35) | 2.24 a | 3.1 | 0.8 c |
| | Speech rate | 2.51 (1.05) | 2.56 (1.08) | 0.24 | 0.32 | 0.8 c |
| | f0 | 152.55 | 163.52 | 3.3 b | 4.54 | 0.97 c |
| | dB | 57.63 | 64.46 | 3.9 b | 5.39 | 0.64 a |
| Days | Mean silence length | 0.097 (0.067) | 0.08 (0.044) | 1.49 | 0.05 | 0.82 c |
| | Silence length SD | 0.21 (0.13) | 0.18 (0.1) | 1.33 | 1.83 | 0.78 b |
| | Percent of silence | 22.33 (13.38) | 23.74 (11.55) | 0.43 | 0.6 | 0.56 a |
| | Total speech time | 4.27 (1.07) | 4.09 (1.01) | 0.96 | 1.33 | 0.77 c |
| | Speech rate | 2.92 (0.94) | 3.04 (1.15) | 0.68 | 0.94 | 0.84 c |
| | f0 | 151.02 | 166.13 | 3.98 b | 5.41 | 0.98 b |
| | dB | 60.04 | 67.73 | 4.56 c | 6.28 | 0.65 c |
| AMR | Average AMR period (ms) | 224.03 (90.74) | 271.08 (89.57) | 1.88 | 2.6 | 0.5 |
| | CoV of AMR period (%) | 25.67 (17.82) | 22.16 (25.56) | 0.42 | 0.59 | 0.09 |
| | f0 | 163.42 | 176.4 | 2.52 a | 3.43 | 0.93 c |
| | dB | 60.37 | 66.78 | 4.34 c | 5.97 | 0.7 b |

CoV = coefficient of variation; AMR = alternating motion rate; SD = standard deviation; ms = milliseconds.
a p < 0.05. b p ≤ 0.01. c p ≤ 0.001.

The absence of substantial differences between tasks in frequency or intensity measures suggests that the traditional acoustic metrics associated with the Lombard effect were not secondary to any differences in task automaticity. The equivalent differences between tasks could suggest that frequency and intensity function largely independently of task automaticity. These data appear contrary to earlier work which found f0 reliably increased upon the induction of increased cognitive load (Scherer et al., 2002) or analogous experimental paradigms such as stress (Mendoza and Carballo, 1998). One hypothesis arising from these contradictory findings is that the magnitude of change in f0 only differs between speech tasks when differences in task automaticity are exacerbated by additional environmental cues such as performance induced stress.

The effect of noise on measures of timing was quantitatively larger on tasks where task automaticity was low. These findings demonstrate, for the first time, that motor speech timing is dependent on task automaticity when error is introduced into the signal. This effect was most evident in the unprepared monologue, where the uncertainty surrounding the content of the task was produced contemporaneously. To explain the changes observed here, several adaption strategies may have been operating. Reading, days of the week and counting tasks all possess set content that does not change between conditions, yet significant decreases in speech time were only observed during counting. The absence of change on the days of the week task could be attributed to the brevity of the sample, in that brief acts of speech are less susceptible to noise. The absence of change in speech time during reading may be the result of the speaker employing other strategies to compensate for noise. For example, the mean silence length and overall percent of silence increased significantly, while speaking rate (syllables per second) decreased, in individuals reading the grandfather passage during background noise (see Table 2). This act may be enough for the speaker to reach a threshold at which they feel they have adequately adapted to the noise condition. Similarly, Scherer et al. (2002) found decreases in rate during speech tasks produced under conditions requiring higher cognitive loading. These results are inconsistent with earlier work demonstrating an absence of change in speech rate when reading the same passage employed in the current study (Letowski et al., 1993).

For measures of total speech time, data from the current study demonstrate an absence of change or a decrease in the connected speech tasks (i.e., days of the week, counting, reading, contemporaneous). However, earlier work on single words produced in noise has shown the opposite pattern, with increases in word and segment length during noise (Patel and Schell, 2008; Pittman and Wiley, 2001; Summers et al., 1988; Tartter et al., 1993). The current study also provides contrasting evidence to research showing exponential increases in syllable length during reading tasks (Hanley and Steer, 1949) and increases in word duration dependent on the syntactic structure in use. That is, in an interactive task designed to elicit spontaneous utterances from a speaker giving directions to a listener during noise, word duration increases in nouns rather than verbs or prepositions (Patel and Schell, 2008).

Despite disagreement between the current data and similar work, the contrasting findings could be explained by differences in the speech tasks elicited and the analysis methods used to determine duration. The current study utilized global measures of timing derived from automatic analysis scripts that sum the total amount of speech within a sample, whereas the duration measures described above were derived manually from specific words or segments often found within carefully selected carrier phrases. Alternatively, contrasting findings could be an artifact resulting from the experimental design, which did not include a communication loop, but rather the speaker performing in isolation. It is possible that speakers were simply hurrying through their protocol to limit exposure to the unpleasant noise. However, this rationale would be applicable to all Lombard based trials. Finally, as the topic of monologues was not specified, the semantic content of each production may have varied, potentially influencing acoustic outcomes.

Strong correlations between conditions (baseline vs. noise) were observed on all tasks with the exception of the AMR and mean silence length on the contemporaneous speech paradigms. AMR production can be variant in nature (Gadesmann and Miller, 2008), and its stability over time is largely unknown (Vogel and Morgan, 2010). Similarly, content of contemporaneously produced samples is unique to each speaker and each time-point, making heterogeneity between participants and productions a key component of the task. The uniform pattern of change observed in the bulk of connected speech tasks suggests that the majority of speakers alter their production in a similar way when confronted with noise during speech. The lack of internal consistency for the AMR tasks may account for the absence of significant changes in timing. Similarly, the uncommon structure of the AMR paradigm and speakers’ unfamiliarity with it may also contribute to its variability between participants. However, to mitigate the potential impact of these factors, a practice production of all samples was elicited prior to recording to ensure participants felt comfortable and competent in producing each task. The use of a practice condition and the simple structure of the AMR, along with the regularity of its formation (i.e., repetitive consonant/vowel combinations) and the absence of semantic content, provided an ideal environment in which to test the hypothesis that automaticity plays a role in speech timing. The results showed a clear relationship between task automaticity and speech timing.

Table 3. Mean (SD), significance, effect size and correlation comparing sample length and syllable count for contemporaneous, reading and counting tasks.

| Task | Measure | Baseline, mean (SD) | Lombard, mean (SD) | t (paired comparison) | Dunlap’s d (effect size) | r (correlation) |
|---|---|---|---|---|---|---|
| Contemporaneous speech (14 seconds; equivalent to average length of the counting task) | Mean silence length | 0.08 (0.04) | 0.11 (0.06) | 1.65 | 2.35 | 0.18 |
| | Silence length SD | 0.26 (0.16) | 0.32 (0.16) | 1.61 | 1.63 | 0.24 |
| | Percent of silence | 22.82 (9.02) | 27.48 (12.66) | 1.5 | 2.07 | 0.46 |
| | Total speech time | 10.81 (1.26) | 10.15 (1.77) | 1.49 | 2.07 | 0.46 |
| | f0 | 158.04 | 174.75 | 4.48 b | 6.1 | 0.94 c |
| | dB | 62.23 | 69.28 | 4.81 c | 6.65 | 0.54 |
| Reading (32 syllables; equivalent to syllables in counting task) | Mean silence length | 0.06 (0.02) | 0.08 (0.04) | 3.01 b | 4.26 | 0.84 c |
| | Silence length SD | 0.18 (0.07) | 0.26 (0.13) | 3.02 b | 4.41 | 0.76 b |
| | Percent of silence | 15.65 (5.28) | 21.55 (8.7) | 3.28 b | 4.53 | 0.67 a |
| | Total speech time | 6.3 (1.11) | 6.38 (0.99) | 0.38 | 1.38 | 0.74 b |
| | Speech rate | 4.4 (0.74) | 4.01 (0.69) | 5.01 c | 6.85 | 0.93 c |
| | f0 | 151.54 | 167.08 | 3.3 c | 4.51 | 0.92 c |
| | dB | 57.73 | 65.95 | 6.11 c | 8.41 | 0.7 b |

CoV = coefficient of variation; AMR = alternating motion rate; SD = standard deviation; ms = milliseconds.
a p < 0.05. b p ≤ 0.01. c p ≤ 0.001.
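The baseline-versus-noise comparisons reported in Tables 2 and 3 rest on a paired-sample t statistic and a between-condition correlation (Section 2.7). The sketch below shows how these two quantities could be computed for one measure. It is illustrative only: the study's analyses were run in SPSS, the function names are hypothetical, the example values in the usage note are invented rather than study data, and Dunlap's d is deliberately not reproduced here.

```python
import math
from statistics import mean, stdev


def paired_t(baseline, noise):
    """Paired-sample t statistic and degrees of freedom for two matched lists."""
    diffs = [n - b for b, n in zip(baseline, noise)]
    count = len(diffs)
    # Mean difference divided by its standard error (sample SD of the differences).
    t_value = mean(diffs) / (stdev(diffs) / math.sqrt(count))
    return t_value, count - 1


def pearson_r(x, y):
    """Pearson correlation between the same measure in two conditions."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

For example, with invented per-participant values `baseline = [1, 2, 3, 4, 5]` and `noise = [2, 4, 5, 4, 5]`, `paired_t(baseline, noise)` gives a t statistic on 4 degrees of freedom and `pearson_r(baseline, noise)` gives the between-condition correlation for the same measure.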


4.1. Sample length and task automaticity

An unmitigated feature of the speech tasks was variation in sample length. Less automated samples were considerably longer than the automated tasks. A post hoc comparison of samples with comparable lengths but differing automaticity was conducted using the contemporaneous and reading tasks, cut in line with the length and number of syllables of the counting task. The shorter contemporaneous samples yielded non-significant differences between conditions, suggesting that sample length affects the impact of noise on timing in unstructured tasks. The smaller effect size and non-significant correlation between conditions could also reflect the variable nature of contemporaneous speech, rather than the role of automaticity per se. Large and significant differences between conditions were observed during the shorter reading task. These findings differ from those for the counting task, which has the same number of syllables as the reduced reading task. These data suggest that simply reducing sample length does not mitigate the effect of noise when task automaticity remains low. They also provide evidence that structured tasks with low levels of automaticity are consistently affected by altered auditory feedback, regardless of sample length.

4.2. Speaking in isolation or with a communication partner

Data from this study were acquired from speakers without feedback from a communication partner. This model of assessment is commonly employed in experiments eliciting the Lombard effect, yet natural speech is rarely produced without a recipient (Boril, 2008). As the primary purpose of speech is to communicate with others, data derived from experimental designs without a listener may be unduly influenced by this unnatural setting. We have provided data in support of the influence of task automaticity on the feed-forward speech loop; however, we have not considered the potential influence of listener feedback on the communication loop.

Work by Boril (2008) has in part demonstrated that speakers’ reaction to noise varies as a function of communicative intent. That is, speech production in noise is dependent on whether a communicative partner is present. Future work on task automaticity and speech should consider these additional influences on production.

4.3. Cognition, speech and pathological populations

It has been argued that speech is an integrated cognitive-motor task requiring higher-level cortical involvement (e.g., motor planning and language formulation) as well as skeleto-motor control (i.e., execution of vocal and articulatory behaviors). Here it was hypothesized and demonstrated that the position of a task along the automaticity continuum determines, to some extent, the magnitude of the Lombard effect. Following this line of reasoning, tasks with
low levels of automaticity will more readily highlight a compromised feed-forward mechanism (e.g., cognitive impairment), and thus allow differentiation of clinical presentations based on task selection. Huntington’s disease is one condition where both motor speech impairment and cognitive decline are key symptoms (Stout et al., 2011; Vogel et al., 2012), albeit at different stages of the disease. An example of this relationship was established in a study comparing the speech production of two groups of individuals carrying the mutant HTT gene responsible for Huntington’s disease (symptomatic and pre-symptomatic) with that of healthy controls (Vogel et al., 2012). Speech tasks possessing distinct levels of automaticity (e.g., contemporaneously produced monologue, reading, saying the days of the week) were elicited to investigate the sensitivity of different speech tasks to disease stage. Similar to the current study, tasks with relatively high levels of automaticity did not discriminate between groups (healthy, pre-symptomatic or early phase HD). However, tasks with unfamiliar content (i.e., reading or contemporaneous speech) revealed significant differences between groups. Evidence that less automatic tasks demonstrate higher sensitivity to disease state in conditions with known cognitive decline supports the hypothesis that the magnitude of the effect of speech task is larger when automaticity is lower. These data further suggest that the greater sensitivity of less automatic tasks to pathology, relative to highly automatic stimuli, is the result of cognitive decline rather than a purely motor manifestation of symptoms (which would be observed on all tasks if present).

5. Conclusions

Results of the current study highlight the influence task automaticity has in determining the magnitude of the Lombard effect. The changes observed in speech timing during the noise condition on tasks requiring more complex cognitive processing (e.g., production of an unprepared monologue), and the relative absence of changes on simple speech tasks (e.g., counting, syllable repetition), demonstrate a link between automaticity, motor function and feedback mechanisms. It appears that altered auditory feedback during simple speech tasks is easier to overcome as the internal model of behavior is more certain. In contrast, novel tasks appear to introduce error into a feed-forward model of goal-directed movements, the impact of which is exacerbated when the sensory feedback component of speech is altered. A failure to separate or acknowledge the influence of task automaticity on speech may result in changes being misattributed to impaired motor performance when they may be more reflective of altered cognitive function. In populations where cognitive function is affected and assessment of the motor control of speech is important (e.g., Huntington’s disease), tasks carrying contrasting levels of automaticity need to be employed in order to determine the competing effects of cognitive decline and speech production.


Acknowledgments

APV was supported by a National Health and Medical Research Council - Australia, Early Career Fellowship (#1012302).

References

Bays, P.M., Wolpert, D.M., 2007. Computational principles of sensorimotor control that minimize uncertainty and variability. J. Physiol. 578, 387–396.
Bays, P.M., Wolpert, D.M., Flanagan, J.R., 2005. Perception of the consequences of self-action is temporally tuned and event driven. Curr. Biol. 15, 1125–1128.
Blais, C., Harris, M.B., Guerrero, J.V., Bunge, S.A., 2010. Rethinking the role of automaticity in cognitive control. Q. J. Exp. Psychol. 65, 268–276.
Boersma, P., 2001. Praat, a system for doing phonetics by computer. Glot Int. 5, 341–345.
Boril, H., 2008. Robust Speech Recognition: Analysis and Equalization of Lombard Effect in Czech Corpora. Ph.D. dissertation, Czech Technical University in Prague, Czech Republic.
Brown Jr., W.S., Brandt, J., John, F., 1972. The effect of masking on vocal intensity during vocal and whispered speech. J. Audit. Res. 12, 157–161.
Castellanos, A., Benedí, J.-M., Casacuberta, F., 1996. An analysis of general acoustic-phonetic features for Spanish speech produced with the Lombard effect. Speech Commun. 20, 23–35.
Dreher, J.J., O’Neill, J.J., 1957. Effects of ambient noise on speaker intelligibility for words and phrases. J. Acoust. Soc. Am. 29, 1320–1323.
Dunlap, W.P., Jose, J.M., Vaslow, J.B., Burke, M.J., 1996. Meta-analysis of experiments with matched groups or repeated measures designs. Psychol. Methods 1, 170–177.
Fredrickson, A., Snyder, P.J., Cromer, J., Thomas, E., Lewis, M., Maruff, P., 2008. The use of effect sizes to characterize the nature of cognitive change in psychopharmacological studies: an example with scopolamine. Hum. Psychopharmacol. Clin. Exp. 23, 425–436.
Gadesmann, M., Miller, N., 2008. Reliability of speech diadochokinetic test measurement. Int. J. Lang. Commun. Disord. 43, 41–54.
Gilabert, R., 2006. The Simultaneous Manipulation of Task Complexity Along Planning Time and [+/–Here-and-Now]: Effects on L2 Oral Production, Investigating Tasks in Formal Language Learning. In: Multilingual Matters. Clevedon, UK, pp. 44–68.
Hanley, T.D., Steer, M.D., 1949. Effect of level of distracting noise upon speaking rate, duration and intensity. J. Speech Hearing Disord. 14, 363–368.
Hayhoe, M., Ballard, D., 2005. Eye movements in natural behavior. Trends Cogn. Sci. 9, 188–194.
Houde, J.F., Jordan, M.I., 1998. Sensorimotor adaptation in speech production. Science 279, 1213–1216.
Junqua, J.-C., 1996. The influence of acoustics on speech production: a noise-induced stress phenomenon known as the Lombard reflex. Speech Commun. 20, 13–22.
Laures, J.S., Bunton, K., 2003. Perceptual effects of a flattened fundamental frequency at the sentence level under different listening conditions. J. Commun. Disord. 36, 449–464.
Lee, G.-S., Hsiao, T.-Y., Yang, C.C., Kuo, T.B., 2007. Effects of speech noise on vocal fundamental frequency using power spectral analysis. Ear Hear. 28, 343–350.
Letowski, T., Frank, T., Caravella, J., 1993. Acoustical properties of speech produced in noise presented through supra-aural earphones. Ear Hear. 14, 332.
Lombard, E., 1911. Le signe de l’elevation de la voix [The sign of the rise of the voice]. Maladies Oreille, Larynx, Nez, Pharynx 27, 101–119.
Lu, Y., Cooke, M., 2009. Speech production modifications produced in the presence of low-pass and high-pass filtered noise. J. Acoust. Soc. Am. 126, 1495–1499.
Macleod, C.M., Dunbar, K., 1988. Training and stroop-like interference: evidence for a continuum of automaticity. J. Exp. Psychol. Learn. Mem. Cogn. 14, 126–135.
Mazzoni, D., Dannenberg, R., 2012. Audacity, 2.0.3 ed.
Mendoza, E., Carballo, G., 1998. Acoustic analysis of induced vocal stress by means of cognitive workload tasks. J. Voice 12, 263–273.
Patel, R.P., Schell, K.W., 2008. The influence of linguistic content on the Lombard effect. J. Speech Lang. Hear. Res. 51, 209–220.
Pittman, A.L., Wiley, T.L., 2001. Recognition of speech produced in noise. J. Speech Lang. Hear. Res. 44, 487–496.
Rivers, C., Rastatter, M.P., 1985. The effects of multitalker and masker noise on fundamental frequency variability during spontaneous speech for children and adults. J. Audit. Res. 25, 37–45.
Robinson, P., 2001. Task complexity, task difficulty, and task production: exploring interactions in a componential framework. Appl. Linguistics 22, 27–57.
Robinson, P., 2005. Cognitive complexity and task sequencing: studies in a componential framework for second language task design. Iral 43, 1–80.
Scherer, K.R., Grandjean, D., Johnstone, T., Klasmeyer, G., Bänziger, T., 2002. Acoustic Correlates of Task Load and Stress, ICSLP-2002. Denver, CO, USA, pp. 2017–2020.
Siegel, G.M., Clay, S.L., Naeve, J.L., 1992. The effects of auditory and visual interference on speech and sign. J. Speech Hear. Res. 35, 1358–1362.
Stout, J.C., Paulsen, J.S., Queller, S., Solomon, A.C., Whitlock, K.B., Campbell, J.C., Carlozzi, N., Duff, K., Beglinger, L.J., Langbehn, D.R., Johnson, S.A., Biglan, K.M., Aylward, E.H., 2011. Neurocognitive signs in prodromal Huntington disease. Neuropsychology 25, 1–14.
Summers, W.V., Pisoni, D.B., Bernacki, R.H., Pedlow, R.I., Stokes, M.A., 1988. Effects of noise on speech production: acoustic and perceptual analyses. J. Acoust. Soc. Am. 84, 917–928.
Tartter, V.C., Gomes, H., Litwin, E., 1993. Some acoustic effects of listening to noise on speech production. J. Acoust. Soc. Am. 94, 2437–2440.
Tremblay, S., Shiller, D.M., Ostry, D.J., 2003. Somatosensory basis of speech production. Nature 423, 866–869.
van Beers, R.J., Baraduc, P., Wolpert, D.M., 2002. Role of uncertainty in sensorimotor control. Phil. Trans. R. Soc. Lond. Series B Biol. Sci. 357, 1137–1145.
Van Riper, C., 1963. Speech Correction, 4th ed. Prentice Hall, Englewood Cliffs, NJ.
Vogel, A.P., Morgan, A.T., 2010. Assessment of impairment or monitoring change in Friedreich ataxia. Mov. Disord. 25, 1753–1754.
Vogel, A.P., Maruff, P., Snyder, P.J., Mundt, J.C., 2009. Standardization of pitch-range settings in voice acoustic analysis. Behav. Res. Methods 41, 318–324.
Vogel, A.P., Fletcher, J., Snyder, P.J., Fredrickson, A., Maruff, P., 2011. Reliability, stability, and sensitivity to change and impairment in acoustic measures of timing and frequency. J. Voice 25, 137–149.
Vogel, A.P., Shirbin, C., Churchyard, A.J., Stout, J.C., 2012. Speech acoustic markers of early stage and prodromal Huntington’s disease: a marker of disease onset? Neuropsychologia 50, 3273–3278.
Wassink, A.B., Wright, R.A., Franklin, A.D., 2007. Intraspeaker variability in vowel production: an investigation of motherese, hyperspeech, and Lombard speech in Jamaican speakers. J. Phonetics 35, 363–379.
Watson, P.J., Hughes, D., 2006. The relationship of vocal loudness manipulation to prosodic F0 and durational variables in healthy adults. J. Speech Lang. Hear. Res. 49, 636–639.
Wolpert, D.M., Diedrichsen, J., Flanagan, J.R., 2011. Principles of sensorimotor learning. Nat. Rev. Neurosci. 12, 739–751.
Yates, A.J., 1963. Delayed auditory feedback. Psychol. Bull. 60, 213–232.