Journal of Phonetics (1987) 15, 15-27
The adaptation of produced voice-onset time
Donald G. Jamieson* and Margaret F. Cheesman†
Speech and Audition Laboratory, Department of Psychology and Department of Linguistics, University of Calgary, Calgary, Alberta T2N 1N4, Canada
Received 28th January 1986, and in revised form 23rd June 1986
Merely listening to a repeated [pha] sound tends to shorten the voice-onset time of [pha] sounds subsequently produced by the listener, by approximately 7 ms on average. This "adaptation" effect seems to be induced readily and to be highly reproducible. In contrast to the results with perceptual testing, listening to [ba] sounds has no effect on [pha] productions. Moreover, the voice-onset time of produced [ba] sounds is not affected by listening to [ba] or to [pha]. The production adaptation effect seems to be short lived, since voice-onset times for [pha] tokens uttered 30 s after adaptation are statistically indistinguishable from those uttered without adaptation. These results replicate and extend those reported by Cooper, Blumstein & Nigro (e.g. J. Phonetics, 1975), and they stand in distinction to the failure of Summerfield, Bailey & Erickson (J. Phonetics, 1980) to induce perceptuo-motor adaptation.
1. Introduction
It is clear that the context in which a speech sound is heard can produce significant changes in the perception of that sound. For example, after listening to a series of sounds having a long voice-onset time (VOT), such as [pha] or [tha], speech sounds with intermediate values on a VOT continuum are more likely to be heard as voiced (e.g. Eimas, Cooper & Corbit, 1973; Ohde, 1982). Such perceptual "selective adaptation" effects are robust and easily induced, and they have now been studied in many laboratories (e.g. Eimas & Corbit, 1973; Ades, 1974; Cooper, 1974; Diehl, 1975; Sawusch, 1977; Sharf & Ohde, 1981; Kat & Samuel, 1984; Samuel, 1986). While the mechanisms underlying selective adaptation effects are as yet unspecified, it seems reasonable to postulate that a form of perceptual "tuning" occurs during adaptation. This effect may be related to the tuning which occurs during the acquisition of a language.

*To whom correspondence should be addressed.
†Present address: Department of Communication Disorders, University of Minnesota, Minneapolis, MN, U.S.
0095-4470/87/010015 + 13 $03.00/0 © 1987 Academic Press Inc. (London) Ltd.

Indeed, Cooper and his colleagues have reported that listening to a repeated speech sound can alter the voice-onset time of produced stop consonants (cf. Cooper, 1974, 1979; Cooper & Lauritsen, 1974; Cooper & Nager, 1975). Cooper (1974) reported that the mean VOT of produced [phi] syllables was reduced following perceptual adaptation with [phi], but that produced VOT was not affected by perceptual adaptation with [i] or with [bi]. On the other hand, the VOT of produced [bi] syllables was not affected by adaptation with any of the three stimuli. Cooper & Lauritsen (1974) reported that changes in produced VOT could be induced by listening to stimuli which differed in place of articulation: listening to repeated presentations of [thi] stimuli decreased the VOT of produced [phi] utterances. This effect could not be attributed to mimicry, since VOT values were shortened for speakers whose produced VOT values, prior to adaptation, were shorter than the adapting stimulus, as well as for speakers whose produced VOT values were initially longer than those of the adapter. Cooper & Nager (1975) demonstrated that the VOTs in produced [rephi] syllables and in produced [rethi] syllables could be shortened by listening to repetitions of [rephi] syllables. These changes could not be attributed to stress or to speech rate, since neither the closure interval nor the duration of the final stressed vowel was changed following adaptation. In distinction to the perceptual selective adaptation results, the reports by Cooper and his colleagues of "perceptuo-motor selective adaptation" have not been replicated by other laboratories. In fact, in the only subsequent, published study of selective adaptation effects in production, Summerfield, Bailey & Erickson (1980) reported that listening had no consistent effect on produced VOT. Summerfield et al. (1980) had their subjects speak the syllables [rephi], [rethi], or [rekhi], after listening to repetitions of [i] or of [rethi]. They reported that there was no systematic perceptuo-motor adaptation, with only three of their six subjects showing shorter VOT values in their productions of [rethi]; VOT values for [rephi] and for [rekhi] tended to increase, overall, following adaptation.
Based on z scores, VOTs following adaptation with [rethi] differed significantly from those following [i] in just four of the 18 conditions, with two comparisons showing a decrease following adaptation, and two comparisons showing an increase. Summerfield et al. (1980) consider four possible explanations for their failure to replicate the perceptuo-motor adaptation result reported by Cooper and his colleagues: (1) insufficient sample size; (2) resistance to adaptation by one's own speech; (3) unusually careful articulation by their subjects; and (4) too few adapter repetitions. The first hypothesis was rejected because Cooper had found that perceptuo-motor adaptation occurred in nearly three-fourths of all subjects tested. However, four of the Summerfield et al. subjects showed non-significant reductions in VOT for [rethi], following adaptation (of 3.6, 4.7, 3.8 and 3.6 ms, respectively, for subjects 1-4). These results suggest that perceptuo-motor adaptation may have occurred for two-thirds of the subjects, when producing [rethi], but that the effect may not have been detected because of insufficient power. The second hypothesis was considered unlikely since Cooper, Blumstein & Nigro (1975) had found that motor-perceptual adaptation was greatest when subjects listened to their own speech. Moreover, using synthetic adapting stimuli, Cooper & Lauritsen (1974) found that adaptation decreased VOT, regardless of whether the subject's preadapted VOT values were larger or smaller than those of the adapting stimulus. This result indicates that perceptuo-motor adaptation cannot be equated to the subject attempting to mimic the adapting stimulus in some way. In support of the third hypothesis, the subjects in the Summerfield et al. (1980) study did appear to articulate carefully and/or slowly, displaying VOT and closure durations which were both longer than those found by Cooper & Nager (1975), and also more consistent, with smaller standard deviation values.
On the other hand, the subject in the Summerfield et al. study who produced the longest VOT and closure durations was also the sole subject to show perceptuo-motor adaptation on a consistent basis. In considering the final hypothesis, Summerfield et al. (1980) noted that while Bailey (1974) reported that perceptual selective adaptation was not increased by additional adapter repetitions beyond ten, Bailey's experiment may have had insufficient power to detect the effects of larger numbers of adapters. Moreover, it could be that perceptuo-motor adaptation requires larger numbers of adapters than does perceptual adaptation. In consideration of these results, we felt it desirable to attempt to replicate Cooper's perceptuo-motor adaptation effect, with a larger number of adapter repetitions than that used by Summerfield et al. (1980). We used synthesized [pha], [ba] and [a] tokens as adapters and control stimuli. We report the results of three experiments. Experiment 1 examined VOT for productions of [pha], after listening to [pha], [ba], or [a]. Experiment 2 examined VOT for productions of [ba], after listening to [pha], [ba], or [a]. Experiment 3 examined the time course of recovery from perceptuo-motor adaptation for [pha] utterances. As indicated below, we found no difficulty in replicating the basic details of Cooper's (1974) report of perceptuo-motor adaptation.
2. Experiment 1
2.1. Method
2.1.1. Stimuli
The three speech sounds were synthesized using a cascade software synthesizer (Kewley-Port, 1978; Klatt, 1980), implemented on a PDP 11/34 computer. All signals were synthesized at a sample rate of 10 kHz, and stored on disk. The first stimulus was a 300 ms consonant-vowel (CV) syllable, which had a voice-onset time of 60 ms, and was heard as [pha]. The second stimulus was a 300 ms CV syllable, which had a 0 ms voice-onset time and was heard as [ba]. The third adapter was a 300 ms segment with parameters identical to those of the vowel portion of the previous syllables, and was heard as [a]. Other parameters of these stimuli are displayed in Table I. Stimuli were output through a 12-bit digital-to-analog converter, amplified (Crown D-75), low-pass filtered at 4800 Hz, and recorded on audio tape using a Revox B710 MkII recorder. Each adaptation sequence contained 40 stimulus presentations with a constant interstimulus interval of 400 ms. Stimuli were presented to listeners at 80 dB SPL, monaurally to the right ear over Telephonics TDH-39 earphones.

Table I. Parameters used to synthesize the [ba]-[pha] continuum of stimuli for Experiment 1*

Stimulus   Transition      VOT    F1 onset   F2 onset   F3 onset   Vowel
           duration (ms)   (ms)   (Hz)       (Hz)       (Hz)       duration (ms)
[ba]       40              0      400        850        2133       255
[pha]      0               60     700        1200       2600       235
[a]        0               0      700        1200       2600       300

*[ba] and [pha] stimuli began with a 5 ms burst appropriate for the bilabial place of articulation. F1, F2 and F3 steady-state values for the [ba] stimulus were 700, 1200, and 2600 Hz, respectively. F0 was fixed at 120 Hz for the initial portion of each signal, then fell linearly to 110 Hz over the final 50 ms.
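As a quick consistency check on Table I, the segment durations can be summed for each stimulus and compared with the 300 ms total given in the text. The sketch below is illustrative only. All names are hypothetical, and it assumes that the 5 ms burst, the transition or VOT interval, and the vowel portion simply follow one another, and that the 400 ms interstimulus interval runs from the offset of one stimulus to the onset of the next.

```python
# Illustrative check of the Table I timing values (all names are hypothetical).
BURST_MS = 5          # burst at the start of [ba] and [pha] (Table I footnote)
STIMULUS_MS = 300     # total duration stated for every stimulus
ISI_MS = 400          # interstimulus interval within an adaptation sequence
N_PRESENTATIONS = 40  # presentations per adaptation sequence

# (burst, transition, VOT, vowel) durations in ms, taken from Table I.
SEGMENTS = {
    "ba":  (BURST_MS, 40, 0, 255),
    "pha": (BURST_MS, 0, 60, 235),
    "a":   (0, 0, 0, 300),
}


def total_duration_ms(name):
    """Sum the segment durations, assuming they are strictly sequential."""
    return sum(SEGMENTS[name])


def f0_hz(t_ms):
    """F0 contour from the footnote: 120 Hz, then a linear fall to 110 Hz
    over the final 50 ms of the 300 ms signal."""
    fall_start = STIMULUS_MS - 50
    if t_ms <= fall_start:
        return 120.0
    return 120.0 - 10.0 * (t_ms - fall_start) / 50.0


def sequence_duration_ms():
    """Length of one adaptation sequence, assuming an offset-to-onset ISI."""
    return N_PRESENTATIONS * STIMULUS_MS + (N_PRESENTATIONS - 1) * ISI_MS


if __name__ == "__main__":
    for name in SEGMENTS:
        print(name, total_duration_ms(name), "ms")
    print(sequence_duration_ms() / 1000.0, "s per adaptation sequence")
```

Under these assumptions each stimulus sums to 300 ms, and one adaptation sequence of 40 presentations lasts 27.6 s.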
2.1.2. Procedure
Subjects were tested individually, while seated in a double-walled, IAC sound-attenuated chamber. Speech tokens were spoken by the subject into an AKG C451 EB condenser microphone with integral preamplifier. This signal was recorded using a second Revox B710 MkII recorder, which remained active throughout the experimental session. With the microphone in test position, approximately 15 cm from the subject's lips, input levels were adjusted to permit optimal recording for each speaker. Each subject served in three experimental sessions, one with each of the three adapters. All three sessions were completed on a single day, with an interval of not less than five minutes separating any two sessions. The sequence of the three types of sessions was determined randomly for each subject. Each session lasted approximately twelve minutes and consisted of a series of ten trials. Each trial contained one presentation of the adapting sequence of 40 repetitions of the adapter for that session ([pha], [ba], or [a], depending on the type of session). The subject was instructed to speak the syllable [pa] into the microphone, as soon as each adapting sequence had concluded.
2.1.3. Subjects
Both authors served as subjects (identified as DJ and MC, respectively). In addition, two males and two females, ranging in age from 20 to 35, were recruited from University staff and students, and paid for their participation. All subjects were native speakers of Canadian English who reported that they had no history of speech or hearing disorder, and all had normal hearing, as confirmed by Bekesy, air-conduction, pure-tone audiometry.
2.1.4. VOT Analysis
Each speech token was analyzed using a Bruel and Kjaer Type 2033 high resolution signal analyzer. The signal was sampled at 12.8 kHz, providing temporal resolution to within ± 0.08 ms.
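The stated resolution follows from the sampling rate: one sample at 12.8 kHz spans about 0.078 ms. A minimal sketch of that arithmetic, and of converting the interval between two marked positions on a sampled waveform into milliseconds, is given here. The function names and the sample positions are hypothetical and are not taken from the study.

```python
SAMPLE_RATE_HZ = 12_800  # analyzer sampling rate given in the text


def sample_period_ms(rate_hz=SAMPLE_RATE_HZ):
    """Duration of one sample in milliseconds."""
    return 1000.0 / rate_hz


def interval_ms(start_sample, end_sample, rate_hz=SAMPLE_RATE_HZ):
    """Time between two marked sample positions, in milliseconds."""
    return (end_sample - start_sample) * sample_period_ms(rate_hz)


if __name__ == "__main__":
    print(sample_period_ms())
    # Hypothetical cursor positions, 768 samples apart (not data from the study).
    print(interval_ms(1000, 1768))
```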
The voice-onset time of each token was measured using the following procedure: (1) the onset of the initial consonant release burst was determined from the time function display of the signal, and this onset point was marked with a visible cursor for use as the relative zero time point; (2) the onset of voicing was identified as the initial zero crossing of the first identifiable pitch period, and this point was marked with a visible cursor; (3) the temporal interval separating these two cursor settings was recorded as the VOT for that token. Each token was analyzed by two independent judges, one of whom was naive with respect to the purposes of the experiment. Any disagreements between the values produced by the two judges were first dealt with by having both judges remeasure the VOT; remaining disagreements were dealt with in conference between the two judges.
2.2. Results and discussion
The mean and standard deviation of the measured VOT values of the [pha] sounds produced by each subject after listening to the sequences of [pha], [ba], and [a] syllables, respectively, are presented in Table II. The mean VOT was 64.0 and 64.1 ms following presentation of the [a] and the [ba] sounds, respectively. However, mean produced VOT decreased to 57.0 ms after listening to [pha] sounds, replicating the result reported by Cooper (1974). Each individual subject showed this effect, although there was considerable variation in both the range of the produced VOTs and in the magnitude of the VOT
shift which accompanied [pha] adaptation. A repeated-measures analysis of variance confirmed the differences in produced VOT as a consequence of adaptation (F = 27.72; df = 2,10; p < 0.0001). The results of Experiment 1 confirm Cooper's (1979) conclusion that merely listening to voiceless stop consonants can induce a measurable change in the produced VOT of voiceless stops. Furthermore, listening to voiced stops does not influence the produced VOT of voiceless stops. Cooper (1979) also concluded that these effects were confined to the production of voiceless stops; perceptual adaptation does not alter produced VOT for voiced sounds. Experiment 2 was undertaken to evaluate this conclusion. We followed the procedure used in Experiment 1, with the exception of requiring subjects to speak the CV [ba], rather than [pha], at the conclusion of each adapting sequence.

Table II. Means and standard deviations of voice-onset time measurements (ms) for [pha] productions in Experiment 1, as a function of adapter condition

           [pha] adapter    [a] adapter      [ba] adapter     Mean shift in VOT
Subject    Mean    SD       Mean    SD       Mean    SD       [a]-[pha]   [a]-[ba]
AB         62.8    10.2     68.8     8.7     69.1    10.0      6.0        -0.3
MP         61.0    22.4     74.1    16.7     74.2    15.0     13.1        -0.1
DJ         70.8    11.8     74.8     7.1     77.6    11.0      4.0        -2.8
MC         62.6     9.7     67.9    10.1     66.2     7.8      5.3         1.7
AD         58.3    14.6     64.3    11.8     63.4     9.5      6.0         0.9
TH         26.7     8.7     34.0    14.7     33.9    14.0      7.3         0.1
Mean       57.0    12.9     64.0    11.5     64.1    11.0      7.0        -0.1

3. Experiment 2
3.1. Method
3.1.1. Procedure
At the conclusion of each adapting set, the subject spoke the syllable [ba] into the microphone. The adapting stimuli, and all other aspects of the experimental procedure, were as described for Experiment 1.
3.1.2. Subjects
Three males and three females, ranging in age from 20 to 35, were recruited from University staff and students, and paid for their participation. All were native speakers of Canadian English who reported that they had no history of speech or hearing disorder, and all had normal hearing, as confirmed by Bekesy, air-conduction, pure-tone audiometry. Four of the subjects had participated in Experiment 1, while two were experimentally naive.
3.2. Results and discussion
The mean and standard deviation of the measured VOT values of the [ba] sounds produced after listening to the sequences of [pha], [ba], and [a] syllables, respectively,
are presented in Table III. VOTs for [ba] were clustered much more tightly than the VOTs for the [pha] utterances from Experiment 1, and there is much less variation within and between conditions. Following presentation of [a], [ba], and [pha] sounds, the mean VOTs were 7.0, 5.3, and 5.9 ms, respectively. Mean VOTs were shorter for utterances following [ba] presentations than for those following [a] presentations for each of the six subjects. However, a repeated-measures analysis of variance indicated that, in spite of the reduced within-set variation, produced VOT was unchanged, either after listening to [pha], relative to [a], or after listening to [ba], relative to [a] (F < 1.0; df = 2,10). These results confirm Cooper's (1979) conclusion that the VOT of voiced stop consonants is resistant to alteration by adaptation.

Table III. Means and standard deviations of voice-onset time measurements (ms) for [ba] productions in Experiment 2, as a function of adapter condition

           [pha] adapter    [a] adapter      [ba] adapter     Mean shift in VOT
Subject    Mean    SD       Mean    SD       Mean    SD       [a]-[pha]   [a]-[ba]
AB          5.9     4.7      6.1     4.1      5.9     5.5      0.2         0.2
MP          6.2     9.1      4.1     4.7      2.2     3.8     -2.1         1.9
DJ          7.0     4.9      8.9     2.8      3.2     6.0      1.9         5.7
MC          1.4    14.8      7.5     2.1      6.4     2.4      6.1         1.1
OK          5.3     5.0      6.1     4.9      5.7     3.8      0.8         0.4
MR          9.3     1.5      9.2     1.2      8.5     1.9     -0.1         0.7
Mean        5.9     6.7      7.0     3.3      5.3     3.9      1.1         1.7

Experiments 1 and 2 confirmed the basic results reported by Cooper (1979) that mere listening to a repeated speech sound can change the produced VOT of voiceless stop consonants, but that VOT is unaffected for voiced consonants. These results indicate that a portion of the selective adaptation effect with voiceless consonants occurs at a level which is common to perception and production. Perceptual and perceptuo-motor adaptation might, therefore, reflect a change in a single mechanism. Alternatively, adaptation might affect a variety of mechanisms, one of which causes the changes in articulation which are here referred to as "perceptuo-motor adaptation", and others which cause the perceptual changes. Such possibilities motivate research which examines the relation between perceptuo-motor adaptation and perceptual adaptation using various measures. One possible way to establish a relation between perceptual adaptation and perceptuo-motor adaptation is to examine the time course of the phenomena.
Similarity of the time course for inducing selective adaptation effects and/or the time course for recovering from these effects, would encourage the notion that a single mechanism was responsible for both perceptual and perceptuo-motor effects. For voiceless adapters, perceptual effects are known to last for at least a full minute following the end of the sequence of adapters (Jamieson & Cheesman, 1986; Sharf & Ohde, 1981). In order to determine the time course of recovery from perceptuo-motor adaptation for VOT, and to relate this measure to the data available for recovery from perceptual adaptation for VOT, Experiment 3 was undertaken.
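The repeated-measures analyses of variance reported for Experiments 1 and 2 can be approximated from the subject means in Tables II and III. The following is a minimal sketch of a one-way repeated-measures ANOVA, not the analysis actually run for the paper: the function name is hypothetical, and it works on the rounded subject means rather than on the individual tokens, so the F ratios are not expected to equal the reported values.

```python
# Subject means (ms) in the order ([a], [ba], [pha]), copied from Tables II and III.
EXP1_PHA_PRODUCTIONS = [
    (68.8, 69.1, 62.8),
    (74.1, 74.2, 61.0),
    (74.8, 77.6, 70.8),
    (67.9, 66.2, 62.6),
    (64.3, 63.4, 58.3),
    (34.0, 33.9, 26.7),
]
EXP2_BA_PRODUCTIONS = [
    (6.1, 5.9, 5.9),
    (4.1, 2.2, 6.2),
    (8.9, 3.2, 7.0),
    (7.5, 6.4, 1.4),
    (6.1, 5.7, 5.3),
    (9.2, 8.5, 9.3),
]


def rm_anova(rows):
    """One-way repeated-measures ANOVA on a subjects x conditions table.

    Returns (F, df_conditions, df_error).
    """
    n, k = len(rows), len(rows[0])
    grand = sum(sum(r) for r in rows) / (n * k)
    cond_means = [sum(r[j] for r in rows) / n for j in range(k)]
    subj_means = [sum(r) / k for r in rows]
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    # What remains after removing condition and subject effects is the error term.
    ss_error = ss_total - ss_cond - ss_subj
    df_cond, df_error = k - 1, (n - 1) * (k - 1)
    f_ratio = (ss_cond / df_cond) / (ss_error / df_error)
    return f_ratio, df_cond, df_error


if __name__ == "__main__":
    print("Experiment 1:", rm_anova(EXP1_PHA_PRODUCTIONS))
    print("Experiment 2:", rm_anova(EXP2_BA_PRODUCTIONS))
```

With these rounded means the sketch gives F ratios of roughly 25 for Experiment 1 and roughly 1.1 for Experiment 2, each on 2 and 10 degrees of freedom, which is the same pattern as the reported analyses: a large adapter effect for [pha] productions and essentially none for [ba] productions.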
4. Experiment 3
4.1. Method
4.1.1. Stimuli
The taped [pha] (VOT = 60 ms) and [a] sounds used in Experiments 1 and 2 were used as adapting stimuli.
4.1.2. Procedure
Each subject served for a total of four experimental sessions. Each session lasted approximately 20 min and contained five adaptation/test trials. In two of these sessions, [pha] adapters were used, while the remaining two sessions used the [a] sounds. Each subject received these sessions on four different days, with order of presentation randomized. Each adaptation/test trial began with an adaptation sequence of 40 repetitions of the adapting stimulus for that session. The adaptation sequence was followed by a three-minute recording interval during which the subject produced a sequence of seven [pha] tokens in response to visual cues; at the cessation of the adaptation sequence, and at intervals of 30 s thereafter, a red, light-emitting diode, positioned 1.5 m directly in front of the subject, was illuminated to cue the subject to speak the CV "[pha]", once, into the microphone. Three minutes after the final adapter of the previous trial, a new trial began with the presentation of 40 adapting stimuli. Other aspects of the experimental procedure were as described in Experiment 1.
4.1.3. Subjects
Five males and five females, ranging in age from 18 to 36 years, were recruited from University staff and students, and paid for their participation. All were native speakers of Canadian English who reported that they had no history of speech or hearing disorder and all had normal hearing, as confirmed by Bekesy, air-conduction, pure-tone audiometry. Four subjects had previously participated in Experiments 1 and 2, while six were experimentally naive.
4.2. Results and discussion
The mean and standard deviation of the measured VOT values of the [pha] sounds produced immediately after listening to the sequences of [pha] and [a] syllables, respectively, are presented in Table IV. The mean VOT was 69.8 ms when measured immediately following presentation of the [a] sounds.
However, the mean VOT decreased to 63.3 ms when measured immediately after listening to [pha] sounds (t(9) = 2.88, p < 0.01). This decrement in produced VOT, following listening to voiceless adapters, replicates the results of Experiment 1, and of Cooper (1974, 1979). Eight of the ten subjects showed this effect, although as in Experiment 1, there was considerable variation in both the range of the produced VOTs and in the magnitude of the VOT shift which accompanied the [pha] adaptation. Figure 1 shows, however, that the decrement in VOT consequent to [pha] adaptation is relatively short lived. VOT values for the second utterance, measured just 30 s after the cessation of adaptation, were 69.9 ms following [a] sounds and 67.9 ms following [pha] sounds (t(9) = 1.02, p > 0.05). Since the utterance at the 30-s measurement point was always preceded by an utterance at the 0-s measurement point, it remains possible that
perceptuo-motor adaptation lasts longer than 30 s, but that a single utterance speeds recovery. However, this possibility seems unlikely since perceptual adaptation outlasts many perceptual judgements (Sharf & Ohde, 1981; Jamieson & Cheesman, 1986). A repeated-measures analysis of variance with the factors adapter type (two) by VOT measurement point (seven) by sessions (two) by trials within sessions (five) confirmed that the difference in produced VOT following [pha] adaptation was short lived (F = 4.25; df = 6,54; p = 0.0014 for the effect of VOT measurement point, and F = 2.79; df = 6,54; p = 0.0194 for the interaction of adapter with VOT measurement point).

Table IV. Means and standard deviations of voice-onset time measurements (ms) for [pha] productions in Experiment 3, as a function of adapter condition and delay. Each cell gives the mean, with the standard deviation in parentheses.

           0 s delay                  30 s delay                 60 s delay                 2 min delay                3 min delay
Subject    [pha]        [a]           [pha]        [a]           [pha]        [a]           [pha]        [a]           [pha]        [a]
AB         52.1 (8.3)   75.7 (15.1)   58.3 (11.0)  69.9 (13.0)   67.9 (10.7)  71.3 (10.2)   62.9 (10.4)  73.9 (13.2)   61.8 (10.3)  74.8 (11.0)
MP         77.1 (7.6)   81.4 (9.4)    83.7 (12.2)  84.7 (14.6)   92.6 (17.9)  86.9 (11.7)   79.5 (14.2)  87.9 (12.0)   85.4 (12.0)  87.5 (14.8)
DJ         66.4 (8.9)   69.9 (6.6)    59.4 (8.2)   65.4 (9.6)    60.1 (9.7)   56.1 (15.3)   69.5 (9.7)   63.7 (8.8)    70.0 (9.0)   67.6 (10.2)
MC         57.8 (16.5)  64.5 (12.3)   60.4 (12.5)  56.8 (17.6)   71.7 (14.7)  61.8 (10.2)   63.4 (11.1)  56.1 (10.6)   61.3 (13.9)  57.0 (16.8)
JW         64.8 (9.5)   76.9 (16.4)   69.0 (11.1)  73.1 (12.9)   76.6 (10.9)  78.6 (9.5)    73.9 (11.0)  89.6 (11.3)   76.6 (13.1)  81.3 (14.8)
TG         76.6 (9.5)   85.8 (12.3)   77.7 (8.5)   89.1 (13.7)   83.2 (14.2)  76.7 (11.5)   79.8 (11.2)  81.5 (11.8)   82.0 (11.6)  84.8 (7.9)
SE         68.9 (8.8)   71.8 (11.4)   76.7 (8.2)   72.3 (13.5)   80.8 (12.6)  77.8 (9.5)    86.5 (11.2)  69.4 (9.8)    79.2 (11.9)  75.6 (11.2)
OL         62.8 (15.5)  62.1 (6.2)    65.8 (8.8)   65.9 (10.6)   59.5 (8.9)   68.4 (10.7)   60.2 (14.1)  67.1 (10.5)   56.7 (9.2)   66.6 (10.6)
OT         56.2 (15.2)  55.3 (13.7)   62.3 (9.8)   63.1 (15.2)   69.0 (6.5)   70.5 (10.1)   67.7 (10.3)  57.3 (10.9)   63.7 (13.3)  65.0 (11.4)
SK         49.9 (16.5)  54.9 (22.1)   65.6 (17.8)  58.9 (21.4)   62.2 (15.4)  62.6 (20.6)   56.1 (17.1)  63.3 (17.9)   69.1 (21.0)  66.7 (21.2)
Mean       63.3 (11.6)  69.8 (12.5)   67.9 (10.8)  69.9 (14.2)   72.4 (12.2)  71.1 (11.9)   70.0 (12.0)  71.0 (11.7)   70.6 (12.5)  72.7 (13.0)

Figure 1. The mean shift in the voice-onset time of the produced [pha] tokens as a function of the delay following adaptation. Data points indicated by squares are based on the mean difference between the VOT produced following [pha] adaptation, and the VOT produced after listening to a comparable number of [a] sounds, for the ten subjects in Experiment 3. Vertical bars indicate one standard error. The circle which is plotted at 0-s post-adaptation shows the mean shift observed in Experiment 1, immediately following adaptation, and is plotted to show the excellent agreement between the two experiments, in the one comparable condition. This point is based on the mean of ten observations per subject, collapsed across the six subjects who served in Experiment 1.

5. General discussion
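The paired comparisons reported for Experiment 3 (Section 4.2) can be checked against the subject means in Table IV. The sketch below computes a paired t statistic for the 0-s and 30-s measurement points. It is illustrative only: the names are hypothetical, and it assumes that the reported t(9) values are paired tests across the ten subject means, which the text does not state explicitly.

```python
import math
import statistics

# Subject means (ms) from Table IV as ([pha] adapter, [a] adapter) pairs,
# one pair per subject, in table row order.
DELAY_0S = [
    (52.1, 75.7), (77.1, 81.4), (66.4, 69.9), (57.8, 64.5), (64.8, 76.9),
    (76.6, 85.8), (68.9, 71.8), (62.8, 62.1), (56.2, 55.3), (49.9, 54.9),
]
DELAY_30S = [
    (58.3, 69.9), (83.7, 84.7), (59.4, 65.4), (60.4, 56.8), (69.0, 73.1),
    (77.7, 89.1), (76.7, 72.3), (65.8, 65.9), (62.3, 63.1), (65.6, 58.9),
]


def paired_t(pairs):
    """Paired t statistic for the [a] minus [pha] differences.

    Returns (t, degrees_of_freedom).
    """
    diffs = [a - pha for pha, a in pairs]
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff / std_err, len(diffs) - 1


def n_shortened(pairs):
    """Number of subjects whose VOT was shorter after [pha] than after [a]."""
    return sum(1 for pha, a in pairs if pha < a)


if __name__ == "__main__":
    print("0 s: ", paired_t(DELAY_0S), n_shortened(DELAY_0S))
    print("30 s:", paired_t(DELAY_30S))
```

Run on the tabled means, this gives t of approximately 2.9 at 0 s and approximately 1.0 at 30 s, each with 9 degrees of freedom, close to the reported 2.88 and 1.02, and it counts eight of the ten subjects with a shorter VOT after [pha] adaptation at 0 s.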
Given the ease with which we were able to induce adaptation in our subjects, and the apparent innocence of the procedural differences, we can only speculate that the failure of Summerfield et al. (1980) to obtain perceptuo-motor adaptation must be attributed either to an unusual sample, or to insufficient power. The first hypothesis is related to the observation that some proportion of subjects do not show perceptuo-motor adaptation. Just two of our subjects failed to show adaptation, while Cooper (1979) found that approximately 25% of his subjects failed to show perceptuo-motor adaptation. No explanations have yet been offered to account for such individual differences. However, the subjects who do not adapt nevertheless show apparently normal speech perception in other ways, and their performance clearly warrants additional detailed study. Summerfield et al. (1980) may have included an unusually high number of these individuals in their sample of subjects. Alternatively, Summerfield et al. (1980) may have failed to detect perceptuo-motor adaptation in their listeners because of insufficient power. As noted above, four of their
six listeners did show a non-significant VOT shift in the predicted direction. Since Summerfield et al. used just 10 adaptation repetitions, delaying their test for three seconds following adaptation, only a small amount of perceptuo-motor adaptation may have been present in that study.
5.1. Relation to language learning
As Cooper (1979) concluded, the VOT values associated with voiceless ([pha]) productions can be altered by merely listening to a repeated voiceless adapter. The most important situation in which speech perception and production change in response to listening to speech sounds is during language acquisition. In this case, both perception and production change to bring performance into closer correspondence with the language environment (e.g. Williams, 1979). Perceptual adaptation also works to bring performance into closer correspondence with the speech context: for example, in the context of a large number of long-VOT sounds, both the category boundary and the peak of the discrimination function shift towards longer VOT values. However, the present work, like that of Cooper and his associates (cf. Cooper, 1979), indicates that perceptuo-motor adaptation is not consistent with the changes which accompany language acquisition: the production of short-VOTs is unaffected by adaptation, while the production of long-VOTs is unaffected by adaptation with short-VOT sounds. Moreover, productions of long-VOT sounds can change in the wrong direction following adaptation with long-VOT adapters: the produced VOT of subsequent utterances is reduced, regardless of whether the talker's unadapted VOT is smaller or larger than the VOT of the adapting stimuli. This result may indicate that the changes in produced VOT which accompany language acquisition are due to mechanisms which are different from those which are tapped by perceptuo-motor adaptation.
5.2. Voiced vs. voiceless stimuli
The variety of ways in which voiced and voiceless stimuli have been found to differ in various selective adaptation experiments may offer useful clues to understanding the processing of voicing information. First, the present work confirms that perceptuo-motor adaptation occurs when voiceless stimuli are both the adapting and the produced stimuli, but not when voiced stimuli are used either as the adapting or as the produced stimulus. Second, several authors have reported that perceptual adaptation with voiceless stimuli produces a larger effect than adaptation with voiced stimuli (e.g. Eimas et al., 1973; Kat & Samuel, 1984; Raz & Wightman, 1984; Sawusch & Mullinex, 1985; although some authors [e.g. Ohde, 1982] have found that voiced and voiceless adapters did not differ in effectiveness). Third, it is now well established that perceptual adaptation with voiced stimuli is highly ear specific, so that when adapting and test stimuli are each presented to just one of the ears, a much larger perceptual change is seen when the same ear is adapted and tested (Eimas et al., 1973; Ohde, 1982; Jamieson & Cheesman, 1986). On the other hand, perceptual adaptation with voiceless stimuli may be largely ear-independent, so that the extent of the perceptual change is the same regardless of whether the adapted ear or the unadapted ear is tested (Ohde, 1982; Jamieson & Cheesman, 1986). Fourth, adaptation by voiceless stimuli lasts much longer than adaptation by voiced stimuli (Jamieson & Cheesman, 1986). The distinction is further heightened by the fact that the produced VOT of voiceless tokens is not altered by adaptation with voiced
stimuli. Additional support is found in the failure to observe any change in produced VOT for voiced tokens following adaptation. These results can be interpreted as evidence that some aspects of the processing of voiced and of voiceless speech stimuli occur at different locations in the auditory system.
5.3. Perceptual vs. perceptuo-motor adaptation
It seems clear that perceptual adaptation and perceptuo-motor adaptation for VOT reflect distinct effects. First, perceptual adaptation seems to last longer than perceptuo-motor adaptation. The present study indicates that perceptuo-motor adaptation for VOT is relatively short-lived. Just 30 s after the end of the adaptation sequence, VOT was statistically indistinguishable from unadapted VOT. On the other hand, Jamieson & Cheesman (1986) found that perceptual adaptation produced by voiceless adapters lasted at least one minute, and Sharf & Ohde (1981) reported that some perceptual adaptation produced by voiceless adapters lasted at least 30 min. Second, the mean perceptuo-motor voice-onset time shift in the present experiments is about 7 ms, approximately twice as large as the perceptual shift observed with a synthetic VOT ([ba]-[pha]) continuum using the same adapter (Jamieson & Cheesman, 1986). To our knowledge, the NAPP model (cf. Nearey and Hogan, 1986) is the only formal model which permits a precise prediction of the expected relation between production and perception in such circumstances. According to this model, the perceptual and production shifts should be approximately equal. While these results leave open the possibility that perceptuo-motor adaptation is caused by changes in a single mechanism which is common to both perception and production, they also indicate that other mechanisms must be operating which are unique to perceptuo-motor adaptation or to perceptual adaptation.
5.4. Bias models of selective adaptation
Two broad classes of models have been proposed to account for speech adaptation.
One group of models postulates neural adaptation or fatigue of speech processing mechanisms (e.g. Eimas & Corbit, 1973; Cooper, 1979). The second group of models proposes that adaptation results from a change in response bias, possibly due to contrast with the repeated sound (e.g. Diehl, Elman & McCusker, 1978; Diehl, Kluender & Parker, 1985). While the present data are not clearly inconsistent with fatigue models, they seem to contradict the notion that selective adaptation effects reflect response biases. Contrast and bias models have a long history in perception (cf. Warren, 1985), and some quite complex bias models have been invoked in order to avoid treating sensitivity changes as "real" perceptual phenomena (cf. Jamieson, 1977; Jamieson & Petrusic, 1978). Such models explain perceptual change following adaptation in terms of a tendency to reduce a particular type of response (e.g. [ba] identification responses), independently of the stimulus presented. Support for these models comes primarily from data which show that an ambiguous speech sound (e.g. [kha]/[ga]) is more likely to be identified as [kha] following the presentation of a good token of [ga] and more likely to be identified as [ga] following the presentation of a good [kha] token (cf. Diehl et al., 1985). Bias and response contrast models seem to be unable to account reasonably for the following findings from the selective adaptation literature: (1) voiceless sounds may be more effective adapters than voiced adapters; (2) adaptation by voiced adapters is
largely ear-specific while the effects of voiceless adapters are largely ear-independent (e.g. Jamieson & Cheesman, 1986); (3) adaptation by voiced adapters lasts only about 15 s, while adaptation by voiceless adapters lasts far longer (Jamieson & Cheesman, 1986); (4) contrast is highly right-ear specific, while adaptation is not (Samuel, 1986); and (5) contrast and selective adaptation produce different patterns of changes in response time (Samuel, 1986). The perceptuo-motor adaptation data presented here and by Cooper (1979) present a further challenge to bias and response contrast explanations of selective adaptation effects in speech.
6. Conclusions
In confirming several basic perceptuo-motor adaptation effects, the present study provides evidence that perceptuo-motor adaptation is robust and quite easily induced in most subjects, but that it is relatively short-lived. While perceptuo-motor adaptation and perceptual adaptation do appear to have some elements in common, they also seem to differ in several details of the effects. Additional research in which the same subjects are tested in both paradigms, using common adapters, and in which a variety of measures are taken, including recovery from adaptation, will be required to establish the extent to which the two effects reflect common mechanisms.
This research was supported by grants to DGJ from the Alberta Heritage Foundation for Medical Research, Health and Welfare Canada's National Health Research Development Program, and the Natural Sciences and Engineering Research Council of Canada, and by Fellowships to MFC from the Social Sciences and Humanities Research Council of Canada and the Alberta Provincial Government. We are grateful to Drs Fredrick Wightman and Terry Dolan for their hospitality at the Waisman Center, University of Wisconsin, Madison, where this work was completed while DGJ was a Visiting Fellow.
References
Ades, A. E. (1974) A bilateral component in speech perception, Journal of the Acoustical Society of America, 56, 610-616.
Bailey, P. J. (1974) Procedural variables in speech adaptation, Speech Perception: Report on research in progress in the Department of Psychology, Belfast: The Queen's University of Belfast.
Cooper, W. E. (1974) Perceptuo-motor adaptation to a speech feature, Perception & Psychophysics, 16, 229-234.
Cooper, W. E. (1979) Speech perception and production: Studies in selective adaptation. Norwood, N.J.: Ablex.
Cooper, W. E., Blumstein, S. E. & Nigro, G. (1975) Articulatory effects of speech perception: a preliminary report, Journal of Phonetics, 3, 87-98.
Cooper, W. E. & Lauritsen, M. R. (1974) Feature processing in the perception and production of speech, Nature, 252, 121-123.
Cooper, W. E. & Nager, R. M. (1975) Perceptuo-motor adaptation to speech: An analysis of bisyllabic utterances and a neural model, Journal of the Acoustical Society of America, 58, 256-365.
Diehl, R. L. (1975) The effect of selective adaptation on the identification of speech sounds, Perception & Psychophysics, 17, 48-52.
Diehl, R. L., Elman, J. L. & McCusker, S. B. (1978) Contrast effects on stop consonant identification, Journal of Experimental Psychology: Human Perception and Performance, 4, 599-609.
Diehl, R. L., Kluender, K. R. & Parker, E. M. (1985) Are selective adaptation and contrast effects really distinct? Journal of Experimental Psychology: Human Perception and Performance, 6, 24-44.
Eimas, P. D., Cooper, W. E. & Corbit, J. D. (1973) Some properties of linguistic feature detectors, Perception & Psychophysics, 13, 247-252.
Eimas, P. D. & Corbit, J. D. (1973) Selective adaptation of linguistic feature detectors, Cognitive Psychology, 4, 99-109.
Jamieson, D. G. (1977) Two presentation order effects, Canadian Journal of Psychology, 31, 184-194.
Jamieson, D. G. & Cheesman, M. F. (1986) Locus of selective adaptation in speech perception, Journal of Experimental Psychology: Human Perception and Performance, 12, 286-294.
Jamieson, D. G. & Petrusic, W. M. (1978) Feedback versus an illusion in time, Perception, 7, 91-96.
Kat, D. & Samuel, A. G. (1984) More adaptation of speech by nonspeech, Journal of Experimental Psychology: Human Perception and Performance, 10, 512-525.
Kewley-Port, D. (1978) KLTEXC: Executive program to implement the Klatt software speech synthesizer, Progress Report 4, Research on speech perception. Bloomington: Indiana University.
Klatt, D. H. (1980) Software for a cascade-parallel formant synthesizer, Journal of the Acoustical Society of America, 67, 971-995.
Nearey, T. M. & Hogan, J. T. (1986) Phonological contrast in experimental phonetics: Relating distributions of production data to perceptual categorization curves. In Experimental Phonology (J. Ohala and J. Jaeger, editors), pp. 141-161. New York: Academic Press.
Ohde, R. N. (1982) Adaptation of voicing: effects of ear of presentation and acoustic energy variables, Journal of Phonetics, 10, 265-278.
Raz, I. & Wightman, F. L. (1984) Adaptive estimation of phoneme boundaries and selective adaptation for speech, Perception & Psychophysics, 36, 21-24.
Samuel, A. G. (1986) Red herring detectors and speech perception: In defense of selective adaptation, Cognitive Psychology, 18, 452-499.
Sawusch, J. R. (1977) Peripheral and central processes in selective adaptation of place of articulation in stop consonants, Journal of the Acoustical Society of America, 62, 738-750.
Sawusch, J. R. & Mullenix, J. W. (1985) When selective adaptation and contrast effects are distinct: A reply to Diehl, Kluender and Parker, Journal of Experimental Psychology: Human Perception and Performance, 11, 242-250.
Sharf, D. J. & Ohde, R. N. (1981) Recovery from adaptation to stimuli varying in voice onset time, Journal of Phonetics, 9, 79-87.
Summerfield, Q., Bailey, P. J. & Erickson, D. (1980) A note on perceptuo-motor adaptation of speech, Journal of Phonetics, 8, 491-499.
Warren, R. M. (1985) Criterion shift rule and perceptual homeostasis, Psychological Review, 92, 574-584.
Williams, L. (1979) The modification of speech perception and production in second-language learning, Perception & Psychophysics, 26, 95-104.