
Journal of Phonetics (1987) 15, 365-381

The effects of voice quality on the perception of lenis/fortis stops

K. J. Kohler and W. A. van Dommelen

Institut für Phonetik und digitale Sprachverarbeitung, Universität Kiel, Olshausenstr. 40, D-2300 Kiel, F.R.G.

Received 7th April 1987, and in revised form 31st August 1987

This paper deals with the influence of different voice quality frames ('tense voice', 'neutral voice', 'breathy voice'), preceding and following stop closure silence, on lenis/fortis perception, cued by a closure duration continuum. It also looks at the importance of the overall energy level and of the closure release cycle for the stop categorization. 'Tense voice' shows a fortis bias, compared with 'breathy voice', but 'neutral voice' clusters with the latter. The voice quality effect has a pre-closure and a post-closure component. Overall energy does not affect lenis/fortis perception if it results from a proportional change of all amplitude values; it is the spectral energy distribution that cues the opposition: a high ratio of the lower (below 1000 Hz) to the upper portion of the spectrum, as in 'breathy' and 'neutral voice', favours lenis. The stop closure release cycle contributes to the lenis/fortis dichotomy as a local feature within a global voice quality frame.

0095-4470/87/040365 + 17 $03.00/0 © 1987 Academic Press Limited

1. Introduction

1.1. The point of departure for this study

In Kohler & van Dommelen (1986) it was shown that the perception of a closure duration continuum from /d/ to /t/ in German monotone "leiden"/"leiten" ['laedn]/['laetn] ("to suffer"/"to lead") is biased towards more /t/ responses, particularly in the middle of the range (where the duration cue is indecisive), when LPC-resynthesized stimuli are compared with natural ones. The effect is strongest when both pre- and post-closure signal stretches are resynthesized, but six resynthesized cycles before the stop closure are sufficient to produce a sizable bias. Figure 1 presents the data for the test stimulus pronounced by the male speaker (KJK).

It was argued that there are features in the LPC-synthesis versions of these stimuli that signal to the hearer the machine-made, "metallic" quality, which in turn provides a different frame of reference in perception, thereby influencing segment identification. It was further suggested that different overall human voice qualities ["tense voice", "neutral" or "modal voice", "breathy voice", etc.; cf. Laver (1980), Nolan (1983)] may have similar prosodic effects on sound perception. This paper pursues this question by looking at the same closure duration continua in three monotone utterances of "leiden"/"leiten" which were produced with different voice qualities, viz. "tense voice", "neutral voice" and "breathy voice", but with very consistent F0 and segment durations, by a trained phonetician, the same male speaker that had pronounced the stimulus for the experiments summarized in Fig. 1.

Figure 1. Percentage /t/ responses as a function of closure duration (C), for flat F0 (original monotone "leiten") under the conditions "natural" (•), "partly synthetic" (0), and "fully synthetic" (e) (of Experiment 1 in Kohler & van Dommelen, 1986); binomial confidence ranges at the 5% level; seven listeners. At each data point N = 70.

1.2. The "tense" to "breathy" voice-quality scale

1.2.1. Production

In this context, voice quality refers, first of all, to long-term scalar differences in the source function, produced by laryngeal tightening towards the "tense voice" end and by laryngeal slackening with increasing air flow through a progressively less complete glottal closure towards the "breathy voice" end. Laryngeal tightening lengthens the closed phase of the glottal cycle and produces narrow pressure waves, strengthening the higher part of the spectrum [cf. Chiba & Kajiyama (1958: 20ff., 35)]. The more open glottis during breathy voice gives rise to damping and thus greater bandwidth of formants.

It has been observed in some languages that have a phonological distinction between "clear" and "breathy" vowels, e.g. Gujarati and the Tibeto-Burman language Jingpho, spoken in Southern China and Burma, that there is no difference in the formant structure between these voice quality pairs, thus showing that the essential acoustic difference does not lie in the vocal-tract filter but rather in the source (Bickley, 1982; Maddieson & Ladefoged, 1985). On the other hand, there are data of phonological phonation type contrasts from other languages where a "tense" vowel has a higher F1 than a "breathy" counterpart, e.g. in Hani, another Tibeto-Burman language spoken in Southern China (Maddieson & Ladefoged, 1985). In these cases, the vocal tract is probably shortened due to a raising of the larynx in "tense voice". Thus the difference in the source spectra of "tense voice" vs. "breathy voice" may be heightened by a general tightening or slackening of the vocal tract structures, resulting in a change of the transfer function over and above the source change.
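The link between a narrow glottal pulse and a stronger upper spectrum can be illustrated with a toy source model. The sketch below is not the authors' analysis: the raised-cosine pulse shape, the 100 Hz fundamental, the 10 kHz sampling rate and all function names are assumptions chosen for illustration. It builds one cycle with a short or a long open phase and compares the energy below and above 1000 Hz.

```python
import cmath
import math

FS = 10_000          # sampling rate in Hz (assumed for this toy example)
F0 = 100             # fundamental in Hz (assumed; the tokens in the paper are near 105-108 Hz)
PERIOD = FS // F0    # samples per glottal cycle


def pulse_cycle(open_samples):
    """One cycle: a raised-cosine pulse of the given width, then a closed phase of zeros."""
    pulse = [0.5 - 0.5 * math.cos(2 * math.pi * n / open_samples)
             for n in range(open_samples)]
    return pulse + [0.0] * (PERIOD - open_samples)


def harmonic_powers(cycle):
    """Power of each harmonic k*F0 (k = 1 .. PERIOD//2) from a one-period DFT."""
    n_samples = len(cycle)
    powers = {}
    for k in range(1, n_samples // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * n / n_samples)
                    for n, x in enumerate(cycle))
        powers[k * F0] = abs(coeff) ** 2
    return powers


def low_high_ratio_db(cycle, split_hz=1000):
    """Energy below split_hz relative to energy at or above it, in dB (DC excluded)."""
    powers = harmonic_powers(cycle)
    low = sum(p for f, p in powers.items() if f < split_hz)
    high = sum(p for f, p in powers.items() if f >= split_hz)
    return 10 * math.log10(low / high)


narrow = low_high_ratio_db(pulse_cycle(10))   # short open phase, long closed phase
wide = low_high_ratio_db(pulse_cycle(60))     # long open phase, short closed phase
print(f"narrow pulse: {narrow:.1f} dB, wide pulse: {wide:.1f} dB")
```

In this toy model the narrow pulse yields the smaller low-to-high energy ratio, which is the direction described above for "tense voice"; real glottal waveforms and the vocal-tract filter are of course not modelled.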


1.2.2. Terminology

Before we look at the acoustic manifestation of the selected tokens along the "tense" to "breathy" voice-quality scale, a few terminological comments are required for the sake of clarity. In south-east Asian linguistic studies, there is an established use of the terms "tense" and "lax" with reference to laryngeal settings to distinguish phonological contrasts of the types referred to in Section 1.2.1. They include creakiness, laryngeal tension and larynx raising versus breathiness, laryngeal slackening and larynx lowering. Matisoff (1973) even grouped clusters of properties into a "tense-larynx syndrome" and a "lax-larynx syndrome", adding further prosodic as well as segmental features, viz. higher pitch/rising contour, association with final [ʔ], voicelessness, retracted tongue root vs. lower pitch/falling contour, association with final [h], voicedness, advanced tongue root.

Maddieson & Ladefoged (1985) and Maddieson & Hess (1986) took over the "tense"/"lax" terminology with regard to these phonological feature oppositions and found that the bundles of the phonatory and non-phonatory manifestations differed quite markedly from language to language in the same linguistic area, although there was a consistent acoustic difference, namely the relationship of the amplitudes of the first and second harmonics, or of the fundamental and the first formant, as was also established by other studies (Bickley, 1982). Although the authors were able to relate the "tense"/"lax" dichotomy in all the four languages they investigated to a phonatory difference, they provided evidence that the phonation contrast should be described differently in Hani and Yi as opposed to Jingpho and Wa. They suggested that in the latter the contrast is between a relatively more breathy phonation (= "lax") and a modal phonation (= "tense"), whereas in the former it is between a relatively more laryngealized phonation (= "tense") and a modal phonation (= "lax").

This points to a phonetic scale from "tense" to "lax" phonation, i.e. from high to low laryngeal tension accompanied by increasing airflow due to progressive reduction of glottal closure. The phonological opposition between "tense" and "lax" may then be located at different points along this scale in different languages that make use of such a contrast. Depending on where the phonological category of "lax" is located it may have more or less breathiness, i.e. be more or less "leaky" in addition to being characterized by laryngeal slackening in comparison with the "tense" opposite. As this paper does not deal with the phonological use of phonatory settings but with their long-term voice quality function, the terms "tense voice" and "breathy voice", as customarily used in the discussion of voice quality, will replace the "tense"/"lax" pair. They are associated with the phonetic scale described in this paragraph and in Section 1.2.1. Somewhere along this scale there is an area where the speech output does not have the clear attributes of either "breathy" or "tense voice"; this voice quality is called "neutral voice" because it is related to a neutral phonatory setting in respect of tension and airflow. It corresponds to 'modal voice' of other studies (e.g. Laver, 1980).

1.2.3. Acoustic manifestation of the selected tokens

The selected tokens of "tense", "neutral" and "breathy voice", as illustrated in the sound spectrograms and power spectra of Figs 2-5, exhibit the following acoustic features, which, by way of generalization, may be taken as a first approximation to the manifestations of the three voice quality types.

Across the entire utterances, the spectra of "tense voice" differ from those of the other two voice qualities by having a less prominent first spectral peak in relation to the higher-order peaks and/or a less steep spectral tilt. This reduces the lower-frequency energy concentration and, therefore, the ratio between the lower portion of the spectrum [below 1000 Hz, cf. Frøkjær-Jensen & Prytz (1976)] and the upper one; it also defines the peaks more sharply. This corresponds to the narrow formant bandwidths in the laryngealized as against the plain vowels described by Ladefoged (1982) for !Xóõ, a Khoisan language of southern Africa. In the diphthong /ae/ [see Fig. 2(a) vs. Figs 2(b) and 2(c)], the second formant frequency reaches a higher maximal value for "tense voice" than for the other two voice qualities (although the formant frequencies immediately surrounding the silent interval are very similar in all three cases); this points to a more extreme forward movement of the tongue in the diphthongal glide, which may be a further aspect of the general increase in the muscular activity of the whole vocal apparatus, characteristic of this voice quality.

In "breathy voice", the lower-frequency energy concentration is even stronger than in "neutral voice"; there is at the same time a weak energy spread across a wide frequency range, reflecting the noise of the high turbulent airflow. These features of "breathy voice" are particularly salient in the final nasal, which is characterized throughout by weak second and third formants in relation to F1 and by fairly strong harmonics between the first and second formants, compared with the other two voice qualities. Furthermore, in "breathy voice" the diphthong has a stronger fundamental in relation to the second harmonic, or to (the harmonic with the highest amplitude in) the first formant, than in the other two voice qualities [see Figs 3(a, b), 4(a, b) and 5(a, b)]. This agrees with the data on phonologically "breathy" ("lax") as opposed to "tense" vowels in Bickley (1982), Ladefoged (1982), Maddieson & Ladefoged (1985), Huffman (1985), Ladefoged & Antoñanzas-Barroso (1985), Maddieson & Hess (1986), Langmeier, Lüders, Schiefer & Modi (1987).

Finally, there is also an increase in the overall sound energy in the three voice quality tokens, rising from "breathy voice" to "neutral voice" to "tense voice" and leading to the auditory impression of increasing loudness.

Looking at the temporal succession of narrow-band power spectra, i.e. at the spectral dynamics, after the silent interval in Figs 3-5 (c-e) we find the following differences between the three voice qualities. In "tense voice", the low-frequency peak of the first spectrum does not increase much, but from the second spectrum on, higher spectral peaks develop to a considerable magnitude. In "breathy voice", however, the first spectral peak continues to rise from a stronger start in the series of spectra across the same stretch of time, resulting in a sequence of energy distributions that favours the lower frequency range. There is thus a long-term spectral difference between "tense voice" and "breathy voice", characterizing the whole utterance. "Neutral voice", at least as realised in the production illustrated in Figs 2(b) and 4, is less clearly defined and occupies an intermediate position between the other two voice qualities, but groups with "breathy voice" in that the upper portion of the spectrum (between 1.5 and 2.5 kHz) develops less rapidly after the closure silence, and that there is thus a stronger low-frequency energy concentration, at least during the first 40 ms or so. As the presence of low-frequency spectral energy, at the expense of the higher part of the spectrum, in a time window of the order of 20 ms following a stop closure is a well-known cue to the lenis feature (Stevens & Blumstein, 1981), the stop release signals in the three voice qualities are also potential factors in the lenis/fortis distinction.

Figure 2. Broad-band sound spectrograms (300 Hz filter bandwidth, 0-8 kHz analysis range, Kay Elemetrics Digital Sona-Graph 7800/7900) of three A/D and D/A converted basic "leiden" ['laedn] utterances used in Test 1 (after temporal equalization as well as excision of closure periodicity and replacement by 80 ms of silence): (a) "tense voice", (b) "neutral voice", (c) "breathy voice". Recording level: peak -3 dB, high pre-emphasis, dynamic range: NOR-5, analysis attenuator 0 dB, mark level 5; 1000 Hz frequency markers.

Figure 3. Power spectra (= sections, 45 Hz filter bandwidth, 0-8 kHz analysis range) at the following points in time: (a) 160 ms after stressed vowel onset, (b) 100 ms before stressed vowel offset, (c-f) 20, 30, 40, 120 ms after closure silence, over 1/45 s (22 ms) of speech signal preceding these time marks. Same recordings and same instrumentation (mark level 7) as in Fig. 2: "tense voice".

Figure 4. Power spectra as in Fig. 3: "neutral voice".

Figure 5. Power spectra as in Fig. 3: "breathy voice".

1.3. Aims of this study

It has been shown that in speech production F0 is higher after and before fortis as opposed to lenis stops (Hombert, Ohala & Ewan, 1979; Kohler, 1982) and that this microprosodic difference cues the segmental categories (Haggard, Ambler & Callow, 1970; Kohler, 1985, 1987), depending on the global utterance intonation (Silverman, 1986; Kohler, 1985, 1987). It has also been established that after the coalescence of the lenis/fortis opposition in all other phonetic properties the F0 difference may persist, be intensified and thus give rise to tonal distinctions (Hombert et al., 1979). The association of F0 differences with the lenis and fortis stop categories has been explained with reference to aerodynamic and/or muscular tension constraints on phonation by the timing and force of supraglottal articulations (Kohler, 1984).

There are strong indications that lenis and fortis stops not only influence the frequency, but also the mode of vocal fold vibrations, viz. that adjacent to fortis stops phonation is more spike-like with sharper discontinuities in airflow and a larger proportion of closure within the total cycle, also with greater irregularities in the cycle period. A tightening of the vocal folds for fortis stops as against a slackening for lenis ones would thus, at the same time, raise F0 and shift phonation more towards the "tense" end of the voice quality scale in the surrounding vowels. Diachronic data on the development of phonological "tense" and "lax" phonations in Sino-Tibetan and Tibeto-Burman languages support this assumption. In one group of these languages, "tense" vowels occur in syllables with originally voiceless fortis initials, "lax" vowels in those with originally voiced lenis initials. In the other group, "tense" vowels are the normal reflex of the original final (voiceless fortis) stops, including [ʔ], i.e. they occur in historically checked syllables, whereas "lax" vowels are tied to historically unchecked syllables including final [h] (Matisoff, 1973; Maddieson & Ladefoged, 1985).

If both F0 and phonation type differences can be related to pre- and post-vocalic stop consonant articulations via the physiological mechanisms of vocal fold tightening or slackening associated with them (and there are good reasons to believe this is the case), then the tonal evolution of Chinese and other Asian languages appears in a new light. It is no longer necessary to assume that phonation type differences in vowels caused the development of tones, e.g. a split into upper and lower registers in northern standard Chinese (cf. Maddieson & Hess, 1986). F0 and phonation changes arise simultaneously, and a particular language may enhance either or both to phonological status when the consonant contrast is otherwise lost.

This assumption presupposes that phonation type differences between "tense voice", "neutral voice" and "breathy voice", as in the experiments reported here, have a similar cue value for the lenis/fortis contrast as was found for F0, i.e. that listeners become aware of these prosodic features when the stop categorization is otherwise made ambiguous through systematic experimental manipulation. If it can be shown that this is what listeners do in languages where a lenis/fortis opposition is generally maintained, then these results will lend support to the explanation of the historical development of tone and phonation systems. In sound change the originally dominant features distinguishing between the stop classes are obliterated by natural development rather than by human laboratory interference, but listeners then do the same: they can hook on to the microprosodic differences which continue to maintain a phonological opposition, and they may exaggerate them as speakers in order to facilitate communication. The above questions can only be elucidated by experimental phonology (Ohala & Jaeger, 1986).

The experiments reported in this paper throw light on the following issues:

(1) the perceptual cue value of the "tense" to "breathy" phonation scale in the differentiation of fortis and lenis stop consonants, as a further element in the feature bundle distinguishing the two phonological categories;
(2) the increase or decrease of the cue effect through various extensions of the voice quality type across the utterance;
(3) the parallel effects of LPC synthesis and certain human voice qualities;
(4) the perceptual parallel of phonation type and F0 as prosodic markers of the lenis/fortis contrast, linked to the same physiological mechanism of laryngeal tightening/slackening in speech production;
(5) the relevance of these data for the explanation of prosodic sound change; and
(6) the possible distortions introduced by LPC synthesis into the investigation of the lenis/fortis dichotomy.

In view of these aims and the various acoustic phonetic aspects associated with the different human voice qualities selected from the "tense" to "breathy" scale, the following hypotheses as to voice quality effects on segmental lenis/fortis perception were set up.

(1) In a comparison of closure duration continua from /d/ to /t/ within naturally produced phonologically identical "breathy voice", "neutral voice" and "tense voice" stimuli there is an increasing bias towards /t/ responses from the first to the third.
(2) Equalizing the loudness differences between the three voice quality tokens by raising all amplitude values of "breathy voice" and "neutral voice" proportionally to the level of "tense voice" should influence lenis/fortis perception if the overall energy, rather than the spectral energy distribution, is the decisive factor.
(3) Replacing the stop release cycles in "neutral voice" and "tense voice" by the one in "breathy voice" decreases /t/ responses as compared with the originals.
(4) Replacing the whole /n/ signals (including the stop releases) in "neutral voice" and "tense voice" by the ones in "breathy voice" causes a greater decrease of /t/ responses than in (3) above. However, as the characteristics of "tense voice" are still present in the pre-closure signal there remains a /t/ bias compared with the other two voice qualities, albeit smaller than in the case of both pre- and post-closure signals having the "tense voice" characteristics.

On the basis of these hypotheses, four experiments were carried out.

2. Procedure

In the whole series of listening tests, three original utterances of "leiden", pronounced by one of the authors (KJK), were used for stimulus construction. They were taken from a recording of several block repetitions, each containing 5-6 monotone "leiden" or "leiten" tokens, respectively, with gradually changing voice quality from "tense voice" to "breathy voice" across the items of each block. The selected utterances represented auditorily acceptable tokens of "tense", "neutral" and "breathy voice", respectively, and were equivalent in their fundamental frequency: in all three cases F0 fluctuated between 105 and 108 Hz. After low-pass filtering at 5 kHz (12 dB/octave) and A/D conversion with a sampling rate of 10 kHz, the speech material was electronically spliced.

In order to prevent possible artifacts in the listening tests, durational differences between the three tokens were equalized as follows. Because the "breathy voice" version showed the shortest duration of the part /lae/ (808 ms), the corresponding segments of the other two tokens were shortened by excising, at zero crossings, six successive vowel periods (ca. 56 ms) in the "tense voice" and one period (ca. 9 ms) in the "neutral voice" token, resulting in /lae/ durations of 810 ms and 812 ms, respectively. For /n/, the "neutral voice" version (611 ms) was taken as a reference. From the nasals in the other utterances two periods each were deleted (about 19 ms), leading to 615 ms and 613 ms, respectively, for the corresponding segments.

In all the tests, the periodicity during the stop closure was excised and replaced by silence, which was varied in four 30-ms steps from 80 ms to 200 ms to yield a continuum from a clear /d/ to a clear /t/. This procedure was applied to the three modified original utterance tokens ("tense voice", "neutral voice" and "breathy voice") and thus yielded 15 basic stimuli, to which further manipulations were applied, resulting in the following four test sets.

Test 1. This test contained the 15 basic stimuli.

Test 2. This is a modified Test 1, obtained by equalizing the differences in subjective loudness between the original utterances, with the loudest ("tense voice") as reference. The amplitude of the "neutral voice" token was increased by a factor of 1.3 (+2.3 dB), and that of the "breathy voice" by 1.5 (+3.5 dB).

Test 3. Test 3 comprises two sub-tests: 3a and 3b. The stimuli of Test 3a were derived from the 15 basic stimuli of Test 1 by substituting the release cycle of "breathy voice" (10 ms) in place of those in "tense voice" (11 ms) and "neutral voice" (7.5 ms). There was no release burst in any of the stimuli; the closure opened into a vibratory cycle. In the case of "neutral voice", the separation of the original closure periodicity from the post-closure vibration proved more difficult, and in excising the former, part of the release cycle was probably cut off, resulting in the low value of 7.5 ms for the first post-closure vibration; the next cycles measure 10 ms. In Test 3b, the stimuli of Test 1 were used, but the "neutral voice" stop release cycle was increased to 10 ms by excising the closure periodicity differently.

Test 4. In order to equalize the differences in loudness between the three utterances, the same amplifications were applied as in Test 2. Subsequently, the stop release cycle plus the following nasal in the "tense voice" and "neutral voice" utterances were replaced by the corresponding signal portions from the "breathy voice" token.
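The construction of the 15 basic stimuli can be sketched in a few lines. This is an illustration only, assuming each token is already split into a pre-closure and a post-closure sample list; the function names and the placeholder samples are hypothetical, while the 10 kHz sampling rate and the 80 to 200 ms range in 30-ms steps are taken from the description above.

```python
FS = 10_000  # sampling rate in Hz, as stated in the procedure


def closure_durations_ms(start=80, stop=200, step=30):
    """Closure silence durations of the continuum, in ms."""
    return list(range(start, stop + 1, step))


def insert_silence(pre, post, closure_ms, fs=FS):
    """Join pre- and post-closure samples with digital silence of the given duration."""
    n_silence = round(closure_ms * fs / 1000)
    return pre + [0.0] * n_silence + post


def build_basic_stimuli(tokens):
    """tokens maps a voice quality label to a (pre, post) pair of sample lists."""
    stimuli = {}
    for voice, (pre, post) in tokens.items():
        for closure_ms in closure_durations_ms():
            stimuli[(voice, closure_ms)] = insert_silence(pre, post, closure_ms)
    return stimuli


# Placeholder samples standing in for the recorded /lae/ and /n/ portions.
tokens = {
    "tense": ([0.1] * 50, [0.2] * 30),
    "neutral": ([0.1] * 50, [0.2] * 30),
    "breathy": ([0.1] * 50, [0.2] * 30),
}
stimuli = build_basic_stimuli(tokens)
print(len(stimuli), closure_durations_ms())
```

Three voice qualities times five closure durations give the 15 basic stimuli of Test 1; the later tests would apply further manipulations to these lists.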

For each test, the 15 stimuli were copied 10 times and randomized, yielding a total of 150 tokens. Each stimulus was followed by a 3-s response pause and a 100-ms pure tone, which served as a warning for the next stimulus, which in turn followed after a 1.5-s pause. Each group of 10 stimuli was preceded by a 500-ms bleep tone for listener orientation. Stimuli were recorded on digital tape and presented to subjects over a loudspeaker in a sound-treated room. Subjects responded by pressing one of two buttons for "leiten" = /t/ ("to lead") or "leiden" = /d/ ("to suffer"), respectively. Several groups of native German listeners (consisting of students of linguistics and phonetics and of staff members of the Phonetics Institute of Kiel University) participated in the experiments: Tests 1 and 2, seven listeners (students of phonetics/staff members = Group A); Test 1, eight listeners (students of Romance languages = Group B); Tests 1, 3a and 4, seven listeners (students of phonetics/staff members = Group C; among them three subjects who also participated in Group A). A further group, Group D (seven listeners, staff members), also repeated Test 1. Group E (eight listeners, students of phonetics/staff members, among them five who also participated in Group C) did Test 3b.
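The test-tape layout described above (15 stimuli, 10 copies each, blocks of 10) can be sketched as follows. The paper does not state how the randomization was constrained, so a plain seeded shuffle is assumed here, and the function name and seed are hypothetical.

```python
import random
from collections import Counter

VOICES = ["tense", "neutral", "breathy"]
CLOSURES_MS = [80, 110, 140, 170, 200]


def presentation_order(stimuli, repetitions=10, block_size=10, seed=0):
    """Repeat each stimulus, shuffle the tokens, and cut the sequence into blocks."""
    tokens = [s for s in stimuli for _ in range(repetitions)]
    rng = random.Random(seed)   # fixed seed so the order is reproducible
    rng.shuffle(tokens)
    return [tokens[i:i + block_size] for i in range(0, len(tokens), block_size)]


stimuli = [(voice, c) for voice in VOICES for c in CLOSURES_MS]
blocks = presentation_order(stimuli)
counts = Counter(token for block in blocks for token in block)
print(len(blocks), sum(len(b) for b in blocks))
```

Each block would be preceded by the orientation tone; with 150 tokens and blocks of 10 this gives 15 blocks per test.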


3. Results and discussion

3.1. Hypothesis 1

Figure 6 provides the combined data of Groups A-D in Test 1. As each group produced the same pattern of results they could all be treated together. In the middle of the duration range, "tense voice" stimuli cause a greater /t/ bias than "neutral voice" stimuli in relation to "breathy voice", the two extremes being significantly different. Hypothesis 1 was thus not rejected. As regards "neutral voice", the data of Test 1 will have to be reassessed in the light of the results of Tests 3a, b.
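The figures report binomial confidence ranges at the 5% level, but the paper does not say how they were computed. As a hedged illustration, the sketch below uses the Wilson score interval (an assumption, as are the function and variable names) to show how pooling the four groups narrows the range around a response proportion.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Two-sided Wilson score interval for a binomial proportion (z for the 5% level)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


LISTENERS = {"A": 7, "B": 8, "C": 7, "D": 7}   # group sizes given in the procedure
REPETITIONS = 10                               # copies of each stimulus per test

n_pooled = sum(LISTENERS.values()) * REPETITIONS   # responses per data point, Groups A-D
n_single = LISTENERS["A"] * REPETITIONS            # responses per data point, one group

# Interval around 50% /t/ responses, where the closure duration cue is least decisive.
lo_pooled, hi_pooled = wilson_interval(n_pooled // 2, n_pooled)
lo_single, hi_single = wilson_interval(n_single // 2, n_single)
print(n_pooled, round(hi_pooled - lo_pooled, 3), round(hi_single - lo_single, 3))
```

The pooled count matches the N = 290 per data point reported for the combined groups, and the pooled interval is narrower than that of a single seven-listener group.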

Figure 6. Percentage /t/ responses in Test 1 as a function of closure duration (C) for the basic "tense voice" (e), "neutral voice" (0), and "breathy voice" (•) stimuli, and binomial confidence ranges at the 5% level. Combined results of Groups A-D, 29 listeners; at each data point N = 290.

Figure 7. Percentage /t/ responses in Test 2 as a function of closure duration (C) for the amplitude-adjusted "tense voice" (e), "neutral voice" (0), and "breathy voice" (•) stimuli, and binomial confidence ranges at the 5% level. Group A, seven listeners; at each data point N = 70.

Figure 8. (a) Percentage /t/ responses in Test 3a as a function of closure duration (C) for the "tense voice" (e), "neutral voice" (0), and "breathy voice" (•) stimuli with equalized stop release cycle (taken from "breathy voice"), and binomial confidence ranges at the 5% level. Group C, seven listeners; at each data point N = 70. (b) Percentage /t/ responses in Test 3b as a function of closure duration (C) for the basic "tense voice" (e), "neutral voice" (0), and "breathy voice" (•) stimuli of Test 1 (but with an increased "neutral voice" release cycle of 10 ms instead of 7.5 ms), and binomial confidence ranges at the 5% level. Group E, eight listeners; at each data point N = 80.

3.2. Hypothesis 2

Figure 7 shows the data of Test 2. Equalizing the loudness differences between the three voice qualities does not change the relative position of the three response functions found under Hypothesis 1. The overall energy is thus not a decisive factor in the voice quality effect on segment perception.
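The arithmetic behind the amplitude adjustment can be checked with a short sketch. The amplitude factors 1.3 and 1.5 come from the description of Test 2; the harmonic powers below are made-up numbers for illustration, not measurements from the paper's tokens, and the function names are hypothetical.

```python
import math


def gain_db(amplitude_factor):
    """Level change in dB produced by multiplying every sample by amplitude_factor."""
    return 20 * math.log10(amplitude_factor)


def scale_power_spectrum(powers, amplitude_factor):
    """Scaling all amplitudes by k scales every power value by k squared."""
    return [p * amplitude_factor ** 2 for p in powers]


def low_high_ratio_db(powers, freqs_hz, split_hz=1000):
    """Energy below split_hz relative to energy at or above it, in dB."""
    low = sum(p for p, f in zip(powers, freqs_hz) if f < split_hz)
    high = sum(p for p, f in zip(powers, freqs_hz) if f >= split_hz)
    return 10 * math.log10(low / high)


# Made-up harmonic powers, for illustration only.
freqs = [100, 300, 500, 700, 1500, 2500, 3500]
powers = [4.0, 9.0, 6.0, 2.0, 1.0, 0.5, 0.1]

neutral_gain = gain_db(1.3)   # factor applied to the "neutral voice" token
breathy_gain = gain_db(1.5)   # factor applied to the "breathy voice" token
before = low_high_ratio_db(powers, freqs)
after = low_high_ratio_db(scale_power_spectrum(powers, 1.5), freqs)
print(round(neutral_gain, 2), round(breathy_gain, 2), round(before - after, 6))
```

A proportional change of all amplitude values raises the overall level but leaves the ratio of lower to upper spectral energy untouched, which is consistent with the finding that the response functions keep their relative positions in Test 2.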

3.3. Hypothesis 3

Figure 8(a) presents the data of Test 3a. The functions for "tense voice" and "breathy voice" approximately stay in the same positions as in Test 1, but that for "neutral voice" is lowered to coincide with the one for "breathy voice".

Figure 9. Percentage /t/ responses in Test 4 as a function of closure duration (C) for the amplitude-adjusted "tense voice" (e), "neutral voice" (0), and "breathy voice" (•) stimuli with equalized stop release cycle + /n/ (taken from "breathy voice"), and binomial confidence ranges at the 5% level. Group C, seven listeners; at each data point N = 70.

It has already been pointed out in the description of the stimulus construction for Test 3a that the first post-closure cycle of the basic "neutral voice" stimuli only has a duration of 7.5 ms as against the 10 ms of the following cycles. There is thus an abrupt change in the periodicity of the release signal after the silent interval, and this introduces a burst character into the signal (with a substantial F0 drop, as found in fortis releases), which is absent from the other basic voice quality stimuli as well as from "neutral voice" with "breathy voice" release. Seen this way, the responses to the "neutral voice" set in Test 1 do not show a /t/ bias due to a different voice quality frame compared with that of "breathy voice", i.e. to a global prosodic feature, but to a local effect of the stop release. When this local feature is adjusted, as it was for the Test 3a stimuli, "neutral voice" does not have a different effect on lenis/fortis perception from "breathy voice". Thus "neutral voice" does not represent an intermediate stage between "tense" and "breathy voice" in its long-term influence as a frame in segment perception, just as it groups together with "breathy voice" in the essential acoustic feature of low-frequency energy concentration. On the other hand, "tense voice" is clearly separated from these two voice qualities in its acoustic manifestation as well as its perceptual effect.

To lend further support to this assessment, Test 3b was run, again with an increased release cycle of 10 ms, but this time taken from "neutral voice". The results are presented in Fig. 8(b) and confirm that "neutral voice" does not occupy an intermediate stage between "tense" and "breathy voice" in its long-term influence as a frame in segment perception. Again it groups with "breathy voice", and both are significantly different from "tense voice". This grouping in perception largely mirrors the grouping in acoustic manifestation.

3.4. Hypothesis 4

The data of Test 4 are given in Fig. 9. A comparison of these results with those of Test 3 shows that the creation of the same post-closure voice quality by taking over the "breathy voice" release + /n/ in all three tokens not only has the same effect on "neutral voice" as in Test 3, because of the local release adjustment, but also reduces the /t/ bias in "tense voice", because of the different long-term frame. The /t/ effect is, however, not eliminated altogether, which proves that the different pre-closure frame also has an influence on lenis/fortis perception, and which suggests that the long-term pre-stop voice quality, manifested in spectral relations, is responsible for it. Hypothesis 4 can thus be accepted.

4. Conclusion

Taking the naturally produced "tense voice" stimulus as the point of departure, the replacement, first of the post-closure, then also of the pre-closure signal, by the corresponding "breathy voice" stimulus sections, brings about a stepwise change in the voice quality frame surrounding the silent interval from "tense voice" to "breathy voice" through long-term spectral changes in the relationship of the higher and lower frequencies and in the sharpness of the spectral peaks. This stepwise voice quality shift decreases the /t/ bias in the middle of the closure duration continuum. The three response functions for Group C are compared in Fig. 10. There is a clear long-term voice quality influence on segmental lenis/fortis perception consisting of a pre-closure and a post-closure component. That a global frame, rather than a local segment effect, is at work here can be gathered from the stability of the identification function for the fully "tense voice" stimuli, irrespective of the type of stop release ["tense" or "breathy", cf. Figs 6 and 8(a)]. A difference only occurs when the rest of the /n/ is also changed. Furthermore, spectral relations are responsible for these effects; this may be deduced from the fact that an increase of all amplitude values by a constant factor, which raises the overall energy level but leaves the spectral energy distribution unaltered, does not change the response functions at all. "Tense voice" and "breathy voice" are two voice qualities taken from opposite ends of a scale ranging from strong larynx tightening to larynx slackening. "Neutral voice" occupies a position between these extremes, but is

Figure 10. Percentage /t/ responses as a function of closure duration (C, in ms) for the "tense voice" stimuli of Test 1 (●) and the "breathy voice" (•) as well as the mixed "tense voice-breathy voice" (○) stimuli of Test 4. Binomial confidence ranges at the 5% level. Group C, seven listeners, in all three cases; at each data point N = 70.
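As a rough illustration of what a binomial confidence range at the 5% level amounts to for N = 70, the following sketch computes a Wilson score interval. The caption does not state which binomial interval was used, so the Wilson method, the function name `wilson_interval` and the response counts below are assumptions made purely for illustration; they are not data or procedures from the paper.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to a two-sided 5% level under the normal
    approximation. Assumption: the paper may have used another method.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    centre = (p_hat + z * z / (2.0 * n)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)) / denom
    )
    return centre - half_width, centre + half_width


# Hypothetical /t/ counts out of N = 70 responses (not taken from Fig. 10).
N = 70
for t_count in (7, 35, 63):
    low, high = wilson_interval(t_count, N)
    print(f"{100 * t_count / N:5.1f}% /t/: {100 * low:5.1f}% to {100 * high:5.1f}%")
```

With only 70 responses per data point the range is widest near 50% /t/, which is the ambiguous middle of the closure duration continuum where the response functions are compared.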

closer to "breathy voice" because of its low-frequency prominence, producing the same identification function. Besides this global effect of a voice quality frame there is also the local effect of the stop release.

Comparing the results of lenis/fortis identification in the different natural voice quality frames ("tense", mixed "tense-breathy", "breathy") of Fig. 10 with those in the different man/machine quality frames ("synthetic", "partly synthetic", "natural") of Fig. 1 raises the question as to whether there is a common denominator between these two sets of experimental data. The "metallic" sound impression of the resynthesized stimuli can certainly not simply be equated with natural "tense voice". Due to the much longer durations of the natural voice quality stimuli in the experiment reported here, longer closure silences were necessary than in the stimuli of Kohler & van Dommelen (1986) in order to bring about a change from /d/ to /t/; furthermore, in the previous experiments, the basic utterance for stimulus generation was a "leiten" token, whereas here "leiden" tokens were used; and, finally, different range effects were possible in the experiments that included the stimulus sets of Figs 1 and 10 because there were always other sets beside them in each test. All these differences preclude a direct comparison of the striking similarities in the relationships between the three identification functions in Figs 1 and 10. However, the LPC-synthesized stimuli contain long-term spectral characteristics that are similar to the ones found in natural "tense voice": the formant bandwidths in LPC synthesis, compared with natural utterances in "neutral voice", are always narrower and there is less energy between the formant peaks so that they are more sharply defined, and in the nasal of the LPC-resynthesized "leiten" utterance, the first formant is weakened in relation to the higher formants, compared with the original.
It may, therefore, be suggested, tentatively, that "metallic" synthetic and "tense voice" natural stimuli have these long-term spectral features in common and thus provide similar voice quality frames for lenis/fortis segment perception. Before this issue can be settled, a great deal of further research is necessary. Among others, the following questions require answers. (1) How long does the voice quality frame need to be in order to cause an effect? In particular, is the /t/ bias strengthened by an increase of the synthetic section of the so-called partly synthetic stimuli in Kohler & van Dommelen (1986) from the six cycles to the whole pre-closure signal, and conversely, would six cycles (instead of the whole pre-closure section) of "tense voice" be sufficient to produce the same /t/ bias? (2) What are the effects of having LPC synthesis or natural "tense voice" in the post-closure signal only, and how long a stretch is necessary to produce an effect? (3) What are the effects (on the /t/ categorization) of an LPC resynthesis of the three natural voice qualities? (4) Are the effects of LPC synthesis and of natural voice qualities on lenis/fortis perception also found in intonation patterns other than monotone pitch? (5) What are the interactions of long-term voice quality frames with the various acoustic continua that are known to cue lenis/fortis perception ("trading relations")?

There is a striking parallel between the effects of phonation type and of F0 on the lenis/fortis categorization (cf. Kohler, 1985): the effects are strongest for the ambivalent duration values and, by their strength, they demonstrate that prosodic cues can take over when segmental features lose their perceptual signalling power. Their common physiological origin in the slackening/tightening of the vocal folds for supraglottal stop articulations is apparently associated with the lenis/fortis categories by the listener as long as they are still part of the phonological system. When the contrast in the stops themselves (voicing, aspiration) disappears through sound change, the prosodic features in the surrounding vowels provide strong enough perceptual cues to maintain the distinctions between word pairs that were originally differentiated by lenis/fortis consonants. The phonetic experiment in the laboratory can thus provide insight into, and a principled account of, diachronic phonological changes. As the synchronic experimental data also show an increased effect with the temporal extension of the voice quality feature or with a greater F0 extent, it may be assumed that a similar intensifying of voice quality and/or F0 occurs at some stage in the historical sound change to make the distinction more robust for auditory communication. In the course of time, the link with the original segmental oppositions will fade from the tacit knowledge of the speech community, and new prosodic systems of phonation and/or tone will have replaced them.

A comparison of the voice quality and F0 effects in this paper and in Kohler (1985) suggests that the F0 effect is stronger. However, given the cumulative strength of the two factors in the data of Kohler & van Dommelen (1986), namely LPC synthetic quality and rising F0, there may be a similar combination of effects in Kohler (1985) because there the vowel portions of the stimuli were LPC synthesized and F0 adjusted, whereas in this paper one factor, F0, was kept constant. This means that further studies will also need to take into account the possible distortions introduced into the investigation of the lenis/fortis dichotomy by LPC synthesis. To factor these out it will eventually be necessary to use a synthesis procedure that can model the modes of vibration in different voice qualities.
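The Conclusion's point that multiplying all amplitude values by a constant factor raises the overall energy level but leaves the spectral energy distribution unaltered can be checked numerically. The sketch below is an illustration only and is not the authors' analysis procedure: the toy harmonic signal, the 10 kHz sampling rate, the 120 Hz fundamental, the spectral tilt values and the function names are all assumptions. It computes the ratio of spectral energy below 1000 Hz to the energy above it, the measure that the paper associates with lenis-favouring "breathy" and "neutral voice" when it is high.

```python
import numpy as np


def band_energy_ratio(signal, sample_rate, split_hz=1000.0):
    """Energy below split_hz divided by energy at or above split_hz."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return power[freqs < split_hz].sum() / power[freqs >= split_hz].sum()


def toy_voiced_signal(tilt, sample_rate=10_000, f0=120.0, n_samples=1000):
    """Sum of 40 harmonics whose amplitudes fall off as harmonic**(-tilt).

    Hypothetical stand-in for a vowel: a steeper tilt concentrates more
    energy in the low frequencies.
    """
    t = np.arange(n_samples) / sample_rate
    harmonics = np.arange(1, 41)  # 120 Hz up to 4800 Hz
    amplitudes = harmonics.astype(float) ** (-tilt)
    phases = 2.0 * np.pi * f0 * harmonics[:, None] * t
    return (amplitudes[:, None] * np.sin(phases)).sum(axis=0)


sample_rate = 10_000
steep_tilt = toy_voiced_signal(tilt=2.0)    # low-frequency prominence (assumed)
shallow_tilt = toy_voiced_signal(tilt=1.0)  # relatively more high-frequency energy

# Proportional amplitude change: overall energy rises, the ratio does not move.
louder = 3.0 * steep_tilt
print("energy gain:", np.sum(louder ** 2) / np.sum(steep_tilt ** 2))
print("ratio, original:", band_energy_ratio(steep_tilt, sample_rate))
print("ratio, scaled:  ", band_energy_ratio(louder, sample_rate))

# A change of spectral tilt, by contrast, does change the ratio.
print("ratio, shallow tilt:", band_energy_ratio(shallow_tilt, sample_rate))
```

Because scaling by a constant multiplies every spectral component by the same amount, the factor cancels in the ratio; only a change in the shape of the spectrum, such as the tilt difference between the two toy signals, alters it.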

References

Bickley, C. (1982) Acoustic analysis and perception of breathy vowels, Working Papers, Speech Communication Group, MIT-RLE, 1, 71-81. Cambridge, Massachusetts.
Chiba, T. & Kajiyama, M. (1958) The vowel: its nature and structure. Tokyo: Phonetic Society of Japan.
Frøkjær-Jensen, B. & Prytz, S. (1976) Registration of voice quality, Bruel and Kjaer Technical Review, 3, 3-17.
Haggard, M., Ambler, S. & Callow, M. (1970) Pitch as a voicing cue, Journal of the Acoustical Society of America, 47, 613-617.
Hombert, J.-M., Ohala, J. J. & Ewan, W. G. (1979) Phonetic explanations for the development of tones, Language, 55, 37-58.
Huffman, M. K. (1985) Measures of phonation type in Hmong, UCLA Working Papers in Phonetics, 61, 1-25. Los Angeles, California.
Kohler, K. J. (1982) F0 in the production of lenis and fortis plosives, Phonetica, 39, 199-218.
Kohler, K. J. (1984) Phonetic explanation in phonology: the feature fortis/lenis, Phonetica, 41, 150-174.
Kohler, K. J. (1985) F0 in the perception of lenis and fortis plosives, Journal of the Acoustical Society of America, 78, 21-32.
Kohler, K. J. (1987) Microprosody in segment perception. In Proceedings of the eleventh international congress of phonetic sciences, Vol. 1, pp. 80-83. Tallinn: Academy of Sciences of the Estonian S.S.R.
Kohler, K. J. & van Dommelen, W. A. (1986) Prosodic effects on lenis/fortis perception: preplosive F0 and LPC synthesis, Phonetica, 43, 70-75.
Ladefoged, P. (1982) The linguistic use of different phonation types, UCLA Working Papers in Phonetics, 54, 28-39. Los Angeles, California.
Ladefoged, P. & Antoñanzas-Barroso, N. (1985) Computer measures of breathy voice quality, UCLA Working Papers in Phonetics, 61, 79-86. Los Angeles, California.
Langmeier, C., Luders, U., Schiefer, L. & Modi, B. (1987) An acoustic study on murmured and "tight" phonation in Gujarati dialects - a preliminary report. In Proceedings of the eleventh international congress of phonetic sciences, Vol. 1, pp. 328-331. Tallinn: Academy of Sciences of the Estonian S.S.R.
Laver, J. (1980) The phonetic description of voice quality. Cambridge/London: Cambridge University Press.
Maddieson, I. & Hess, S. A. (1986) "Tense" and "lax" revisited: more on phonation types and pitch in minority languages of China, UCLA Working Papers in Phonetics, 63, 103-109. Los Angeles, California.
Maddieson, I. & Ladefoged, P. (1985) "Tense" and "lax" in four minority languages of China, UCLA Working Papers in Phonetics, 60, 59-83. Los Angeles, California.
Matisoff, J. A. (1973) Tonogenesis in Southeast Asia. In Consonant Types & Tones. Southern California Occasional Papers in Linguistics, No. 1 (L. M. Hyman, editor), pp. 71-95. Los Angeles: Linguistics Program, University of Southern California.
Nolan, F. (1983) The phonetic bases of speaker recognition. Cambridge/London: Cambridge University Press.
Ohala, J. J. & Jaeger, J. J. (1986) Experimental phonology. Orlando: Academic Press.
Silverman, K. (1986) F0 segmental cues depend on intonation: the case of the rise after voiced stops, Phonetica, 43, 76-91.
Stevens, K. N. & Blumstein, S. E. (1981) The search for invariant acoustic correlates of phonetic features. In Perspectives on the Study of Speech (P. D. Eimas & J. L. Miller, editors), pp. 1-38. Hillsdale: Erlbaum.