Applied Acoustics 110 (2016) 170–175
The effects of speech intelligibility and temporal–spectral variability on performance and annoyance ratings

Andreas Liebl (a), Alexander Assfalg (b), Sabine J. Schlittmeier (b, c, d)
(a) Fraunhofer Institute for Building Physics, Nobelstraße 12, 70569 Stuttgart, Germany
(b) Catholic University of Eichstaett-Ingolstadt, Ostenstraße 25, 85072 Eichstaett, Germany
(c) University of Kaiserslautern, Erwin-Schrödinger-Straße, 67663 Kaiserslautern, Germany
(d) HSD Hochschule Döpfer, Waidmarkt 3 und 9, 50676 Cologne, Germany
Article history: Received 20 June 2015; Received in revised form 19 February 2016; Accepted 21 March 2016; Available online 31 March 2016.

Keywords: Noise; Office; Performance; Annoyance
Abstract

Ambient sound can impair verbal short-term memory performance. This finding is relevant to the acoustic optimization of open-plan offices. Two algorithmic approaches claim to model the impairment during a given sound condition. One model is based on the Speech Transmission Index (STI); the other relies on the hearing sensation fluctuation strength (F). Within the scope of our consulting activities the approach based on F can hardly be applied, and the model based on the STI is often misinterpreted in terms of semanticity. We therefore put the two models to the test and elucidate the relevance of temporal–spectral variability and semanticity of background sound with regard to impairment of performance. A group of 24 subjects performed a short-term memory task and rated perceived annoyance during eight different speech and speech-like noise conditions, which varied with regard to STI and F. The empirical data are compared to the model predictions, which only partly cover the experimental results. Speech impairs performance more than all other sound conditions, and variable speech-like noise is more impairing than continuous speech-like noise. Sound masking with continuous speech-like noise provides relief from the negative effect of background speech. This positive effect is more pronounced if the signal-to-noise ratio is −3 dB(A) or even lower.

© 2016 Elsevier Ltd. All rights reserved.
1. Introduction

Ambient speech is known to impair individual task performance at silent, concentrated work [1,2]. The detrimental impact of background speech on verbal short-term memory was reported for the first time by Colle and Welsh [3]. Since then the phenomenon has been the object of an almost uncountable number of studies and is termed the Irrelevant Sound Effect (ISE) [4]. It describes a significant impairment of verbal short-term memory during the presentation of certain sound conditions, even though the material to be remembered is presented visually and the sound condition is supposed to be ignored. Most of these studies were motivated by a basic research perspective and aimed at insight into automatic and obligatory aspects of speech and sound processing in verbal short-term memory. Consequently they focused on the characteristics of sound conditions that are necessary and sufficient for eliciting a detrimental impact. By doing so, a prerequisite for an ISE to arise has
[email protected] (A. Liebl), a.assfalg@gmail. com (A. Assfalg),
[email protected] (S.J. Schlittmeier). http://dx.doi.org/10.1016/j.apacoust.2016.03.019 0003-682X/Ó 2016 Elsevier Ltd. All rights reserved.
been revealed: the sound condition must be a so-called changing-state sound [5]. A changing-state sound is characterized by a distinct temporal structure with different auditory-perceptive tokens varying successively (for further information cp. e.g. [6]). Performance of verbal short-term memory is usually operationalized by the serial recall task, in which visual items (digits, letters) are presented sequentially and have to be recalled afterwards in the strict order of presentation. The performance decrement in this task during changing-state sound is explained by the interference-by-process principle [7–9]. A decline of performance occurs if automatic processing of the sound condition and voluntary task processing call for the same cognitive resources. Correspondingly, the ISE is assumed to be due to a conflict between the intentional serial processing of the items of the serial recall task and the automatic and involuntary serial processing of the successively changing auditory-perceptive tokens of the sound condition. Such interference effects, which are highly specific to the correspondence of sound and task characteristics, have recently been verified to be qualitatively different from the so-called deviation effect. The latter is described by the distraction of attention from the task at hand due to unexpected changes in
the acoustic background, for example by a telephone starting to ring or a change in background speaker voice [10]. This differentiation of specific interference processes and auditory distraction effects is the core assumption of the duplex-mechanism account of auditory distraction, a recently published framework of cognitive noise effects [11,12].

Yet, the detrimental impact of sound conditions on cognitive performance is not only a matter of basic research in the field of cognitive psychology but also an aspect of everyday real life and relevant in many contexts, e.g. in open-plan and group offices, in noisy classrooms, at production lines and construction sites. Therefore the question of which sound condition characteristics are decisive for an impairment of performance to occur is also of economic interest. Consequently, the performance impact of sound conditions has been investigated during the last years in a series of applied studies focusing on open-plan and group offices on the one hand and background speech on the other hand [13–15]. In each of these two lines, basic and applied research, an algorithmic model has been developed to estimate the expectable performance decrement during a given sound condition. From an applied perspective, an algorithmic model is helpful for designing and evaluating working environments with the objective of guaranteeing employees optimum performance and well-being under consideration of the cost-value ratio. In basic research, an algorithmic model adds to the understanding of basic cognitive functions and provides quantifiable hypotheses as compared to making only qualitative predictions. The two different approaches will be outlined in the following and applied in the present experiment.
1.1. Theoretical background

The first approach to be considered here has been presented by Hongisto [1] and is driven by an applied perspective. It was motivated by the fact that employees in open-plan and group offices judge irrelevant background speech produced by colleagues talking on the phone or with other colleagues to be one of the severest problems in their working environment, as it is perceived to be annoying and disturbing (e.g. [15]). However, talking colleagues cannot be banned from open-plan offices, so the main focus of most interventions is to alter room acoustics in a way that irrelevant background speech becomes less intelligible on its transmission path from a talker to an involuntarily listening co-worker (cp. [15]). This stems from empirical results verifying that reducing background speech's intelligibility reduces its detrimental impact on task performance and perceived annoyance [1,2,13–19]. Against this background, the approach by Hongisto [1] modeling the detrimental impact of sound conditions has its foundations in room acoustics and focuses on background speech. Hongisto [1] provided a model according to Eq. (1) that predicts how much performance is reduced due to speech of varying Speech Transmission Index (STI). The STI is an instrumental measure of speech intelligibility in rooms [20]. In principle it is based on the estimated degradation of a given speech signal on its transmission path from the source (talker) to the listener's ear (receiver). Its value varies from 0 (completely unintelligible) to 1 (perfect intelligibility). According to the model, increasing speech intelligibility – reflected by increasing STI values – causes a decrease of performance (ΔP).

ΔP(STI) = −7 / (1 + exp[(STI − 0.4)/0.06]) + 7    (1)

Even though the model claims to predict cognitive performance in applied settings, its database relies on two laboratory studies and only one field study, focusing on verbal short-term memory performance, proofreading and self-estimated daily waste of working time due to the presence of disturbing sound conditions. For more detailed information the reader is referred to Hongisto [1].

The second approach modeling the disruption of task performance during irrelevant sound conditions has been proposed by Schlittmeier et al. [6]. It differs in several respects from the aforementioned approach by Hongisto [1]. To start with, the model applies not only to speech but also to non-speech sounds, while explicitly focusing on the decrease of short-term memory performance during irrelevant sound. This phenomenon was described above as the ISE. Accordingly the approach presented by Schlittmeier et al. [6] is driven by a basic research perspective. Its defining characteristic is to estimate the performance decrement during a certain sound based on the hearing sensation fluctuation strength (F) as expressed in Eq. (2). This quantity, which can also be instrumentally measured, is named after the hearing percept induced when listening to sounds which are slowly modulated in frequency or amplitude (fmod < 20 Hz). The database from which Eq. (2) is derived incorporates 70 behavioral outcome measures for 40 different sounds, e.g. background speech, music, traffic noise, tone sequences and office noise. For more detailed information the reader is referred to Schlittmeier et al. [6].

ΔP(F) = ISE = 7.5 · F / (0.68 vacil)    (2)

1.2. Research intent

The two aforesaid models claim to predict the decline of individual task performance due to the presence of sound, but they rely on different parameters, namely the STI [1] and the hearing sensation F [6]. Both models have strengths and shortcomings, which need to be addressed. In our practical consultative work we have learned that the approach based on F can hardly be applied in real offices since the signal-to-noise ratio between disturbing background speech and neutral steady background sound is often too low. The approach based on the STI is often misinterpreted by facility managers and office owners who equate the physical quality measured by the STI with speech intelligibility and especially with the semanticity of background speech. As a consequence it is argued that disturbance is mainly a matter of the employees' curiosity and that they could just ignore the background speech of their colleagues. We conducted a laboratory experiment to put the two models to the test and to elucidate the relevance of temporal–spectral variability and semanticity of background sound. In particular, we wanted to find out about the effects of manipulations of intelligibility and temporal–spectral variability with regard to the decline of individual task performance. Therefore, the effects of several speech and speech-like noise signals on verbal short-term memory were tested and compared to a silent reference condition. In addition to objective performance scores, subjective annoyance ratings were collected for each sound condition.

2. Methods and materials

2.1. Sample
24 students of the Catholic University of Eichstaett-Ingolstadt took part in the experiment (Age: 20–33 years; Median: Md = 25 years; Gender: 4 male, 20 female). All participants were native German speakers and reported normal hearing ability. A small allowance was paid for participation.
2.2. Design The experimental design is a one-way repeated measures design with 8 levels according to the eight sound conditions during which performance and annoyance were tested.
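The outcome measure of the serial recall task described below is the number of digits not recalled in their presented serial position. That strict positional scoring rule can be sketched as follows (a minimal illustration; the function name is ours, not the authors' software):

```python
def serial_recall_errors(presented, recalled):
    # Strict serial scoring: a digit counts as correct only if it is
    # recalled at the exact position in which it was presented.
    # Assumes both lists have equal length.
    return sum(1 for p, r in zip(presented, recalled) if p != r)

# Transposing two adjacent digits yields two errors under strict serial scoring.
print(serial_recall_errors([2, 7, 1, 9, 4], [2, 1, 7, 9, 4]))  # prints 2
```

Note that a transposition is penalized twice, which is what makes the measure sensitive to order information rather than mere item memory.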
2.3. Material

The experiment was run on an Apple MacBook computer. The serial recall task was presented with the experimental software PsyScope X B54 [21]. In the serial recall task, the digits from 1 to 9 were successively presented in the middle of the computer screen in randomized order (700 ms on, 300 ms off; font: Chicago; font size 16). These digits had to be maintained and recalled in exact presentation order after a short retention interval of 10 s. Twelve different repetitions of the serial recall task were conducted in each of the eight sound conditions. For assessing perceived annoyance, the five-point verbal (not at all – slightly – moderately – very – extremely) and 11-point numerical (0–10) rating scales in the style of ISO/TS 15666:2003 were used. Since ISO/TS 15666:2003 addresses long-term noise exposure during a period of months, the wording of the questions was changed to refer to the short period of indoor sound exposure in this investigation (Thinking about the last minutes, how much did the background sound conditions bother, disturb or annoy you?; Thinking about the last minutes, what number from 0 to 10 best shows how much you were bothered, disturbed or annoyed by the background sound conditions?). Unmasked and masked native speech as well as speech-like noise conditions were included in the experiment to test the effects of sound conditions of varying STI and fluctuation strength F on short-term memory. For the speech conditions, unmasked native speech was used as a starting point, labeled Speech. Here, the HSM sentence test [22] material was used. It consists of studio-recorded correct German sentences, which are unrelated to each other but represented semantically meaningful native speech to the German participants. The test is marketed by WESTRA ELEKTROAKUSTIK GmbH and the company was contacted in order to get information about the room acoustical conditions in the recording studio.
The company guarantees early decay times below 0.5 s for the octave bands from 125 to 8000 Hz but did not yet provide more precise information. Following from that, the early decay time was set to 0.5 s in all bands for the calculation of the STI. This speech sound was masked with a continuous speech-like noise at different SNRs. These sound conditions were labeled Masked Speech SNR = 0 for SNR = 0 dB(A), Masked Speech SNR = −3 for SNR = −3 dB(A) and Masked Speech SNR = −6 for SNR = −6 dB(A). The continuous masking sound was the ICRA noise number 1 [23]. This noise is generated from natural speech and has frequency characteristics corresponding to the overall spectral shape of male speech at normal speaking effort in accordance with ANSI S3.5, while lacking a temporal structure. This Continuous Speech-like Noise was also included as a sound condition in the present experiment, i.e. without being superimposed on speech. The unmasked native speech sound was contrasted with Variable Speech-like Noise (ICRA noise number 5, [23]), which has frequency characteristics corresponding to the overall spectral shape of male speech at normal speaking effort in accordance with ANSI S3.5 and the same temporal–spectral characteristics as natural speech, while carrying no semantic meaning. Finally, Pink Noise was included in the experiment since it has been frequently used in basic research experiments investigating performance effects of sound conditions (e.g. [24]). It is important to note that the spectrum of the ICRA noise does not match the spectrum of the HSM sentence test. The spectra of all sound types are presented in Table 1.
Table 1: Octave band levels (dB) from 125 to 8000 Hz.

Sound condition                 125   250   500   1000  2000  4000  8000
Pink Noise                      47.9  47.8  47.9  47.8  47.9  47.9  47.9
Continuous Speech-like Noise    50.5  53.6  55.0  49.9  44.1  38.1  32.2
Variable Speech-like Noise      50.7  53.5  54.9  50.0  44.0  37.4  31.6
Speech                          54.9  55.7  52.5  49.4  43.6  44.5  45.8
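The 55 dB(A) playback level reported below can be checked against the octave-band data in Table 1. A minimal sketch, assuming the standard IEC 61672 A-weighting corrections at the octave-band centre frequencies (the function name is ours):

```python
import math

# IEC 61672 A-weighting corrections (dB) at octave-band centre frequencies
A_WEIGHT = {125: -16.1, 250: -8.6, 500: -3.2, 1000: 0.0,
            2000: 1.2, 4000: 1.0, 8000: -1.1}

def level_a(octave_levels):
    """Overall A-weighted level (dB) from unweighted octave-band levels."""
    total = sum(10 ** ((lvl + A_WEIGHT[f]) / 10)
                for f, lvl in octave_levels.items())
    return 10 * math.log10(total)

# Pink Noise octave-band levels from Table 1
pink = {125: 47.9, 250: 47.8, 500: 47.9, 1000: 47.8,
        2000: 47.9, 4000: 47.9, 8000: 47.9}
print(round(level_a(pink), 1))  # prints 54.9
```

The result of about 54.9 dB(A) is consistent with the stated 55 dB(A) presentation level.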
The aforementioned sound conditions were all presented at a sound pressure level of 55 dB(A). As overall control condition, a Silence condition was introduced in which soft Pink Noise was played back at 25 dB(A). Soft Pink Noise (25 dB(A)) was also superimposed on Speech and Variable Speech-like Noise. All sound conditions were presented binaurally using the Mac OS X system-internal sound playing system and Sennheiser HD 600 headphones (Sennheiser electronics GmbH & Co. KG, Wedemark, Germany). The sound pressure level refers to an energy-equivalent sound pressure level Leq averaged over the presentation duration and measured using a sound level meter (Brüel & Kjær 2231) and an artificial ear (Brüel & Kjær 4153). Table 2 depicts the STI and F values of the different sound conditions. STIs were calculated according to Hongisto et al. [25] and F was derived using the software ArtemiS version 12.0.110 (HEAD acoustics GmbH, Herzogenrath, Germany). STI values were set to STI = 0 for the speech-like noise signals since these signals are unintelligible.

2.4. Procedure

Each participant performed different instances of the serial recall task and annoyance ratings under each sound condition. The annoyance ratings were collected directly following the serial recall task. The presentation order of the sound conditions was balanced over participants by a Latin square design to account for potential seriation and position effects. Each participant began with a practice run of the serial recall task during silence. Each digit not recalled in its previously presented serial position was counted as an error.

3. Results

3.1. Empirical performance data

Average error rates and standard errors are depicted in Fig. 1 for each sound condition. An ANOVA (analysis of variance) verified a significant effect of sound condition on serial recall performance (F(2.63, 60.37) = 12.03, mean square error (MSE) = 0.32, p < .001, η² = .34).
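The pairwise comparisons reported below were corrected with the Benjamini–Hochberg procedure [26]. That step-up correction for false-discovery-rate control can be sketched in a few lines (illustrative p-values only, not the study's data; the function name is ours):

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure: find the largest rank k
    whose sorted p-value satisfies p_(k) <= k/m * alpha, then reject
    all hypotheses with rank <= k."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k_max = rank  # largest rank passing its threshold
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return rejected

print(benjamini_hochberg([0.001, 0.012, 0.036, 0.20]))
# prints [True, True, True, False]
```

Note that p = .036 is rejected here although it exceeds a Bonferroni threshold of .0125, which is why the BH adjustment is less conservative for larger families of tests.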
Since the Mauchly test for sphericity was significant, the Greenhouse–Geisser correction was applied to the degrees of freedom. T-tests for paired samples were calculated to elucidate the corresponding sound effect pattern. Due to the relatively large number of calculated t-tests, the Benjamini–Hochberg α-error adjustment was used [26]. During unmasked Speech significantly more errors were made than during all other sound conditions (.00009 < p < .05, one-tailed, Cohen's d: 0.68 < d < 1.59). Differences between masked speech conditions were only statistically verifiable for Masked Speech SNR = 0 compared to Masked Speech SNR = −3 (t(23) = 2.45, p = .023, one-tailed, Cohen's d = 0.71) and Masked Speech SNR = −6 (t(23) = 2.67, p = .015, one-tailed, Cohen's d = 0.77), respectively. The latter two sound conditions did not differ significantly. Error rates during Variable Speech-like Noise were numerically smaller compared to the speech conditions (unmasked and masked), even though a statistically significant difference was only verifiable compared to unmasked Speech (t(23) = 3.96, p = .001, one-tailed, Cohen's d = 1.14) and Masked Speech SNR = 0 (t(23) = 2.71, p = .016, one-tailed, Cohen's d = 0.78). On the other hand, significantly more errors were made during Variable Speech-like Noise compared to Continuous Speech-like Noise (t(23) = 2.11, p = .036, one-tailed, Cohen's d = 0.61) and to Silence (t(23) = 2.33, p = .025, one-tailed, Cohen's d = 0.67), but not compared to Pink Noise. Error rates did not significantly differ among the latter three sound conditions. Summarizing these behavioral performance results in terms of significance groups reveals the effect pattern depicted in Table 3. The letters A to E are used to group sound conditions which do or do not significantly differ from each other. If the same letter is assigned to different sound conditions, those sound conditions do not significantly differ. If different letters are assigned, the sound conditions are significantly different.

Table 2: Speech Transmission Index (STI) and fluctuation strength F [vacil] of the experimental sound conditions.

Sound condition                  STI    F
Silence (baseline condition)     0      0
Pink Noise                       0      0.002
Continuous Speech-like Noise     0      0.004
Variable Speech-like Noise       0      0.278
Masked Speech SNR = −6 dB(A)     0.37   0.009
Masked Speech SNR = −3 dB(A)     0.45   0.016
Masked Speech SNR = 0 dB(A)      0.53   0.026
Speech                           0.8    0.261

Fig. 1. Performance during the different sound conditions in terms of error rates (n = 24). Means with standard errors are plotted.

Table 3: Summarized behavioral performance results in terms of significance groups (significant differences between test conditions are indicated by different letters).

Sound condition                  Significance groups
Silence (baseline condition)     A
Pink Noise                       A, B
Continuous Speech-like Noise     A
Variable Speech-like Noise       B, C
Masked Speech SNR = −6 dB(A)     C
Masked Speech SNR = −3 dB(A)     C
Masked Speech SNR = 0 dB(A)      D
Speech                           E

3.2. Modeled vs. empirical performance data
The empirical results are now compared to the model predictions. Table 4 depicts the decrement of performance (ΔP) under presentation of each sound condition compared to silence as predicted by the models and calculated according to Eq. (1) (ΔP(STI)) and Eq. (2) (ΔP(F)). Since large differences between individuals and between experimental tasks occur with regard to the decrement of performance during intelligible speech, the minimum average ΔP was set to 7% by Hongisto [1]. This means that a decrease of performance by at least 7% can be expected due to the presence of intelligible speech. However, Hongisto [1] recommended normalizing the STI model by the highest ΔP within a given data set, which is 18.09% (error rate Speech minus error rate Silence) in this investigation. A normalization procedure can also be applied to the model proposed by Schlittmeier et al. [6]. These normalizations are also shown in Table 4 (ΔP(STIn); ΔP(Fn)). The differences between the observed decrement of performance and the normalized predictions based on ΔP(STIn) and ΔP(Fn) were tested for statistical significance by one-sample t-tests. In all t-tests the population variance is estimated from the sample variance (standard deviation), so full population information is not necessary. As can be seen from the data in Table 4, the STI model does well for speech and masked speech sounds if normalization is applied, but faces problems with respect to the difference between observed and predicted (ΔP(STIn)) performance decrement during Variable Speech-like Noise (t(23) = 2.33, p = .014, one-tailed, Cohen's d = 0.48). The F model, in contrast, generally captures the detrimental impact of Variable Speech-like Noise but largely overestimates (ΔP(Fn)) the effect, which is shown by the difference between observed and predicted decrement of performance (t(23) = 7.33, p = 9.29 × 10⁻⁸, one-tailed, Cohen's d = 1.50).
It also underestimates the performance decrement caused by all masked speech sounds (Masked Speech SNR = −6: t(23) = 3.19, p = .002, one-tailed, Cohen's d = 0.65; Masked Speech SNR = −3: t(23) = 2.73, p = .006, one-tailed, Cohen's d = 0.56; Masked Speech SNR = 0: t(23) = 3.83, p = .0005, one-tailed, Cohen's d = 0.78).

3.3. Annoyance ratings

Annoyance ratings were collected for each of the eight sound conditions. Since the correlation between the five-point rating scale and the 11-point rating scale of ISO/TS 15666:2003 is almost perfect (r = 0.99), only the ratings of the 11-point rating scale are reported. Fig. 2 depicts the mean annoyance ratings and standard errors of each sound condition.
Table 4: Predicted decrement of performance (%) based on the Speech Transmission Index (ΔP(STI); ΔP(STIn)) and fluctuation strength (ΔP(F); ΔP(Fn)), as compared to the observed ΔP. Asterisks mark significant differences (one-sample t-test, one-tailed) between the observed decrement of performance and the predictions based on ΔP(STIn) and ΔP(Fn).

Sound condition                  ΔP(STI)  ΔP(STIn)  ΔP(F)  ΔP(Fn)   ΔP (observed)
Silence (baseline condition)     –        –         –      –        –
Pink Noise                       0        0         0.02   0.14     0.65
Continuous Speech-like Noise     0        0         0.05   0.28     0.51
Variable Speech-like Noise       0        0*        3.07   19.27*   4.63
Masked Speech SNR = −6 dB(A)     2.64     6.83      0.10   0.62*    11.07
Masked Speech SNR = −3 dB(A)     4.88     12.61     0.17   1.11*    10.04
Masked Speech SNR = 0 dB(A)      6.28     16.23     0.29   1.80*    14.20
Speech                           6.99     18.06     2.88   18.09    18.09
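For illustration, the unnormalized model predictions in Table 4 can be reproduced directly from Eqs. (1) and (2); a minimal sketch (function names are ours):

```python
import math

def dp_sti(sti):
    # Eq. (1): Hongisto's predicted performance decrement (%) from the STI
    return -7.0 / (1.0 + math.exp((sti - 0.4) / 0.06)) + 7.0

def dp_f(f):
    # Eq. (2): predicted ISE (%) from fluctuation strength F in vacil
    return 7.5 * f / 0.68

# STI and F values of the speech conditions (Table 2)
for name, sti, f in [("Masked Speech SNR = -6 dB(A)", 0.37, 0.009),
                     ("Masked Speech SNR = -3 dB(A)", 0.45, 0.016),
                     ("Masked Speech SNR = 0 dB(A)",  0.53, 0.026),
                     ("Speech",                       0.80, 0.261)]:
    print(f"{name}: dP(STI) = {dp_sti(sti):.2f}%, dP(F) = {dp_f(f):.2f}%")
```

Rounded to two decimals, this reproduces the ΔP(STI) column of Table 4 (2.64, 4.88, 6.28 and 6.99%) and, up to the rounding of the tabulated F values, the ΔP(F) column (e.g. 2.88% for unmasked Speech).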
Fig. 2. Annoyance during the different sound conditions in terms of ratings on an 11-point scale (n = 24). Means with standard errors are plotted.
An ANOVA revealed a significant effect of sound condition on perceived annoyance (F(4.29, 98.66) = 40.40, mean square error (MSE) = 273.78, p < .001, η² = .64). T-tests for paired samples were calculated to elucidate the significant sound effect. Again, the Benjamini–Hochberg α-error adjustment was used [26]. Speech was perceived to be more annoying than all other sound conditions (0.01 × 10⁻¹⁰ < p < .01, one-tailed, Cohen's d: 1.08 < d < 4.60) with one exception: Speech and Masked Speech SNR = 0 did not differ significantly. Differences between masked speech conditions were statistically verifiable for Masked Speech SNR = 0 compared to Masked Speech SNR = −3 (t(23) = 2.00, p = .038, one-tailed, Cohen's d = 0.58) and Masked Speech SNR = −6 (t(23) = 2.44, p = .019, one-tailed, Cohen's d = 0.71), respectively. The latter two sound conditions did not differ significantly. Perceived annoyance during Variable Speech-like Noise was significantly higher compared to Silence (t(23) = 8.73, p < .001, one-tailed, Cohen's d = 2.52) and Continuous Speech-like Noise (t(23) = 4.10, p < .001, one-tailed, Cohen's d = 1.18) as well as to Pink Noise (t(23) = 2.11, p = .033, one-tailed, Cohen's d = 0.61). On the other hand, Variable Speech-like Noise was perceived as significantly less annoying compared to Speech (t(23) = 3.73, p = .002, one-tailed, Cohen's d = 1.08) and Masked Speech SNR = 0 (t(23) = 2.34, p = .022, one-tailed, Cohen's d = 0.68). No significant difference was found in comparison to Masked Speech SNR = −3 and Masked Speech SNR = −6. Silence was perceived to be less annoying than Pink Noise (t(23) = 5.43, p < .001, one-tailed, Cohen's d = 1.57) and Continuous Speech-like Noise (t(23) = 4.61, p < .001, one-tailed, Cohen's d = 1.33). Annoyance ratings did not differ significantly among the latter two sound conditions. Table 5 again summarizes the results on annoyance ratings in terms of significance groups.

Table 5: Summarized results with regard to annoyance ratings in terms of significance groups (significant differences between test conditions are indicated by different letters).

Sound condition                  Significance groups
Silence (baseline condition)     A
Pink Noise                       B
Continuous Speech-like Noise     B
Variable Speech-like Noise       C
Masked Speech SNR = −6 dB(A)     C
Masked Speech SNR = −3 dB(A)     C
Masked Speech SNR = 0 dB(A)      D
Speech                           D

4. Discussion

The experiment explored verbal serial recall performance, which is the standard measure of verbal short-term memory capacity, during eight different sound conditions. The speech and speech-like noise conditions varied with respect to STI (0–0.8) and F (0–0.261). By doing so, it was examined to what extent the effects of speech and non-stationary speech-like noise on cognitive performance can be estimated by STI or F. However, these variations do not fully cover the range the models apply to, and further data needs to be collected. As can be seen from Table 4, the reported behavioral effects are only partly covered by the two models proposed by Hongisto [1] and Schlittmeier et al. [6], with each model having its individual shortcomings. Hongisto's STI-performance curve is, by definition, to be applied to background speech. Consonant with the model, the speech condition with the highest STI, namely unmasked Speech, has the highest detrimental impact. Furthermore, Hongisto's model assumes that the lower the STI value in the range 0.20–0.70, the lower the performance decrement. However, a step-wise reduction of the background speech's disturbance effect could only be statistically verified from STI = 0.80 to STI = 0.53 and from STI = 0.53 to STI = 0.45, but not from STI = 0.45 to STI = 0.37 in the present study. This result might be a matter of effect sizes, and those differences might turn out to be significant if the sample size were increased. Aside from this aspect, the approach by Hongisto [1] logically does not account for the effects of non-speech sounds, e.g. non-speech office noise or background music. However, these non-speech sound conditions have been verified to disturb verbal short-term memory significantly (cp. for an overview [6]). Although the STI – as a measure of speech intelligibility – describes by definition a characteristic which applies only to speech but not to non-speech sounds, the corresponding mathematical calculations could also be applied to non-speech sounds. Variable Speech-like Noise is something in between speech and non-speech and maybe best corresponds to foreign speech. But if Variable Speech-like Noise were treated like speech, the predictions based on the STI would not fit the empirical results.
This can be seen from the fact that Speech reduces serial recall performance significantly more than Variable Speech-like Noise, although these two signals yield comparable F values and the STI value of Variable Speech-like Noise is expected to be about 0.7 or even higher if no background sound is present (the maximum decline of performance is expected if the STI exceeds 0.7). This weakness, however, also applies to the model proposed by Schlittmeier et al. [6], since the fluctuation strengths of the two aforesaid signals hardly differ but Speech reduces serial recall performance significantly more than Variable Speech-like Noise. Furthermore, this approach seems to fail with regard to the tested masked speech conditions. The model does not predict a substantial impairment of performance for the masked speech sounds, but those were observed to be more impairing than Variable Speech-like Noise. This shortcoming may be due to the fact that the sound samples used for the development of the model did not include signals partially masked by continuous background sound. It follows from the definition of fluctuation strength that it declines if a variable sound (signal) is superimposed with a continuous sound. However, it must be stated that the F approach does quite well in laboratory settings with very low continuous background sound, but in the field, where higher-level continuous background noise is quite common, it might not work as well. Please note that speech intelligibility (STI) and variability (F) did not vary independently of each other in the present study. Superimposing speech with continuous noise reduced its intelligibility, but the speech's fluctuation strength was reduced simultaneously. This is due to the fact that continuous noise reduces spectral differences between successive auditory-perceptual tokens and fills up macro and micro pauses in the speech signal, i.e., reducing its prominent temporal characteristic and spectral variability.
Nonetheless, the present study speaks in favor of highly intelligible background speech reducing cognitive performance more than speech-like noise of comparable fluctuation strength. This
difference seems to be due to the fact that speech sounds not only comprise a distinct temporal structure with varying successive auditory-perceptive tokens but potentially also carry semantic meaning. At this stage the relevance of semanticity is not considered in either approach (STI or F). The calculation procedures in both models do not consider whether a signal is native or foreign speech – thus carrying semantic meaning or not. However, this fact might be crucial from a cognitive psychology perspective, as the decrement of performance due to the presence of background sound is assumed to be due to a conflict between cognitive resources applied to the automatic processing of the background sound on the one hand and to voluntary task processing on the other hand (cf. interference-by-process principle [7–9]). However, it must be mentioned that research on the ISE has so far shown that foreign-language speech, nonwords or unintelligible reversed speech produce a decrement of performance comparable to that of native speech [3,27–32], and thus it is assumed that semanticity does not contribute to the ISE. However, the sound conditions were treated and analyzed in a qualitative manner within these studies and were not analyzed for STI or F. Substantial differences aside from semanticity can result from differences in recording procedures, so not only semanticity might have been changed but also STI and F, for example by different signal-to-noise ratios. The assumption that semanticity might also contribute to the ISE is supported by LeCompte, Neely and Wilson [33], who report a stronger decrement of performance for sound conditions carrying semantic meaning. In the reported investigation it is also striking that the results with regard to perceived annoyance correspond to a large extent to the reported effects on performance.
Based on the results of this investigation it may be concluded that sound masking with a masking sound whose frequency characteristics are comparable to those of speech is a suitable way to reduce the negative impact of background speech on both performance and annoyance, even though working in silence or in continuous background sound is still preferred. Indeed, all sound conditions were rated significantly more annoying than working in silence, even when a sound condition did not reduce task performance. Previous studies have reported analogous findings for other tasks and sound conditions, namely a general, task-independent preference for silence over any other sound condition [34–36], even at sound levels as low as 35 dB(A) [2]. Since the sound pressure level was kept constant across all experimental stimuli except the silent control condition in the present study, annoyance ratings might be even worse in corresponding acoustic conditions in real-world settings: sound masking at negative SNRs results in an overall increase of the sound level, which might in turn increase annoyance [37].
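The overall level increase produced by masking at negative SNRs follows from energetic (incoherent) summation of sound pressure levels. The following minimal sketch uses purely hypothetical illustration values, not stimuli from this study: adding a masker 3 dB(A) above the background speech (an SNR of -3 dB) raises the overall level by roughly 4.8 dB relative to the speech alone and by roughly 1.8 dB relative to the masker alone.

```python
import math

def combined_level(levels_db):
    """Energetic summation of incoherent sound sources given in dB."""
    return 10.0 * math.log10(sum(10.0 ** (level / 10.0) for level in levels_db))

# Hypothetical example: background speech at 48 dB(A) and a masker set
# to an SNR of -3 dB, i.e. 51 dB(A). The combined level (about 52.8 dB)
# exceeds the speech alone by about 4.8 dB and the masker alone by
# about 1.8 dB.
speech_db = 48.0
masker_db = speech_db + 3.0
overall_db = combined_level([speech_db, masker_db])
```

This illustrates why, in a real room, any masker strong enough to reach negative SNRs necessarily raises the total sound level, with the possible annoyance penalty noted above.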
References

[1] Hongisto V. A model predicting the effect of speech of varying intelligibility on work performance. Indoor Air 2005;15:458–68.
[2] Schlittmeier SJ, Hellbrueck J, Thaden R, Vorlaender M. The impact of background speech varying in intelligibility: effects on cognitive performance and perceived disturbance. Ergonomics 2008;51(5):719–36.
[3] Colle HA, Welsh A. Acoustic masking in primary memory. J Verb Learn Verb Behav 1976;15(1):17–31.
[4] Beaman PC, Jones DM. Role of serial order in the irrelevant speech effect: tests of the changing-state hypothesis. J Exp Psychol Learn Mem Cogn 1997;23(2):459–71.
[5] Jones D, Madden C, Miles C. Privileged access by irrelevant speech to short-term memory: the role of changing state. Quart J Exp Psychol 1992;44A(4):645–69.
[6] Schlittmeier SJ, Weißgerber T, Kerber S, Fastl H, Hellbrueck J. Algorithmic modeling of the irrelevant sound effect (ISE) by the hearing sensation fluctuation strength. Atten Percept Psychophys 2012;74(1):194–203.
[7] Jones DM, Macken WJ. Irrelevant tones produce an irrelevant speech effect: implications for phonological coding in working memory. J Exp Psychol Learn Mem Cogn 1993;19(2):369–81.
[8] Jones DM, Tremblay S. Interference in memory by process or content? A reply to Neath (2000). Psychon Bull Rev 2000;7(3):550–8.
[9] Marsh J, Jones D. Cross-modal distraction by background speech: what role for meaning? Noise Health 2010;12(49):210–6.
[10] Hughes RW, Hurlstone MJ, Marsh JE, Vachon F, Jones DM. Cognitive control of auditory distraction: impact of task difficulty, foreknowledge, and working memory capacity supports duplex-mechanism account. J Exp Psychol Hum Percept Perform 2013;39(2):539–53.
[11] Hughes RW, Vachon F, Jones DM. Disruption of short-term memory by changing and deviant sounds: support for a duplex-mechanism account of auditory distraction. J Exp Psychol Learn Mem Cogn 2007;33:1050–61.
[12] Hughes RW. Auditory distraction: a duplex-mechanism account. PsyCh J 2014;3(1):30–41.
[13] Haapakangas A, Hongisto V, Hyönä J, Kokko J, Keränen J. Effects of unattended speech on performance and subjective distraction: the role of acoustic design in open-plan offices. Appl Acoust 2014;86:1–16.
[14] Jahncke H, Hongisto V, Virjonen P. Cognitive performance during irrelevant speech: effects of speech intelligibility and office-task characteristics. Appl Acoust 2013;74(3):307–16.
[15] Schlittmeier SJ, Liebl A. The effects of intelligible irrelevant background speech in offices – cognitive disturbance, annoyance, and solutions. Facilities 2015;33(1/2):61–75.
[16] Loewen LJ, Suedfeld P. Cognitive and arousal effects of masking office noise. Environ Behav 1992;24(3):381–95.
[17] Haka M, Haapakangas A, Keränen J, Hakala J, Keskinen E, Hongisto V. Performance effects and subjective disturbance of speech in acoustically different office types – a laboratory experiment. Indoor Air 2009;19(6):454–67.
[18] Venetjoki N, Kaarlela-Tuomaala A, Keskinen E, Hongisto V. The effect of speech and speech intelligibility on task performance. Ergonomics 2006;49(11):1068–91.
[19] Ebissou A, Parizet E, Chevret P. Use of the speech transmission index for the assessment of sound annoyance in open-plan offices. Appl Acoust 2015;88:90–5.
[20] Steeneken HJM, Houtgast T. A physical method for measuring speech-transmission quality. J Acoust Soc Am 1980;67(1):318–26.
[21] Cohen JD, MacWhinney B, Flatt M, Provost J. PsyScope: a new graphic interactive environment for designing psychology experiments. Behav Res Meth Instrum Comput 1993;25(2):257–71.
[22] Hochmair-Desoyer I, Schulz E, Moser L, Schmidt M. The HSM sentence test as a tool for evaluating the speech understanding in noise of cochlear implant users. Am J Otol 1997;18(6):S83.
[23] Dreschler WA, Verschuure H, Ludvigsen C, Westermann S. ICRA noises: artificial noise signals with speech-like spectral and temporal properties for hearing instrument assessment. Audiology 2001;40:148–57.
[24] Ellermeier W, Hellbrueck J. Is level irrelevant in 'irrelevant speech'? Effects of loudness, signal-to-noise ratio, and binaural unmasking. J Exp Psychol Hum Percept Perform 1998;24(5):1406–14.
[25] Hongisto V, Keranen J, Larm P. Simple model for the acoustical design of open-plan offices. Acta Acust United Acust 2004;90(3):481–95.
[26] Benjamini Y, Yekutieli D. The control of the false discovery rate in multiple testing under dependency. Ann Stat 2001;29(4):1165–88.
[27] Jones DM, Macken WJ. Auditory babble and cognitive efficiency: the role of number of voices and their location. J Exp Psychol Appl 1995;1:216–26.
[28] Jones DM, Miles C, Page J. Disruption of proofreading by irrelevant speech: effects of attention, arousal or memory? Appl Cogn Psychol 1990;4(2):89–108.
[29] Klatte M, Kilcher H, Hellbrueck J. Wirkungen der zeitlichen Struktur von Hintergrundschall auf das Arbeitsgedächtnis und ihre theoretischen und praktischen Implikationen [Effects of the temporal structure of background sound on working memory and their theoretical and practical implications]. Zeitschrift für Experimentelle Psychologie 1995;42:517–44.
[30] LeCompte DC, Shaibe DM. On the irrelevance of phonological similarity to the irrelevant speech effect. Quart J Exp Psychol A 1997;50(1):100–18.
[31] Salamé P, Baddeley AD. Noise, unattended speech and short-term memory. Ergonomics 1987;30(8):1185–94.
[32] Salamé P, Baddeley A. Disruption of short-term memory by unattended speech: implications for the structure of working memory. J Verb Learn Verb Behav 1982;21(2):150–64.
[33] LeCompte DC, Neely CB, Wilson JR. Irrelevant speech and irrelevant tones: the relative importance of speech to the irrelevant speech effect. J Exp Psychol Learn Mem Cogn 1997;23(2):472–83.
[34] Haapakangas A, Kankkunen E, Hongisto V, Virjonen P, Oliva D, Keskinen E. Effects of five speech masking sounds on performance and acoustic satisfaction: implications for open-plan offices. Acta Acust United Acust 2011;97:641–55.
[35] Schlittmeier SJ, Hellbrueck J. Background music as noise abatement in open-plan offices: a laboratory study on performance effects and subjective preferences. Appl Cogn Psychol 2009;23(5):684–97.
[36] Schlittmeier S, Feil A, Liebl A, Hellbrück J. The impact of road traffic noise on cognitive performance in attention-based tasks depends on noise level even within moderate-level ranges. Noise Health 2015;17(76):148–57.
[37] Veitch JA, Bradley JS, Legault LM, Norcross S, Svec JM. Masking speech in open-plan offices with simulated ventilation noise: noise level and spectral composition effects on acoustic satisfaction. Ottawa, Ontario, Canada; 2002.