
Journal of Phonetics (1997) 25, 471–500

The devoicing of /z/ in American English: effects of local and prosodic context

Caroline L. Smith*
Division of Head & Neck Surgery, University of California, Los Angeles, CA, U.S.A.

Received 21st August 1996, and in revised form 25th June 1997

Voiced fricatives are often taken as an example of a sound that is ‘difficult’ to produce. It might therefore be expected that speakers would choose to simplify them. In English, the most common simplification is devoicing, especially for voiced sibilants. The nature of this process was examined in productions of /z/ and /s/ by four speakers of American English. These were recorded in matched word and phrase positions using acoustic, airflow, and electroglottographic (EGG) data. Although many tokens of /z/ showed little or no vocal fold vibration in the EGG signal, durational and aerodynamic differences maintained the distinction between /z/ and /s/. The speakers varied in overall frequency of devoicing, but showed similar rank orderings for frequency of devoicing in different contexts. Devoicing was most frequent in two kinds of environments: those where it could be viewed as assimilation to an adjacent voiceless context, and those where articulatory and aerodynamic effort tends to be reduced. These contexts (unstressed syllables, and ends of words or phrases) have been shown to favor other kinds of prosodically-structured lenition.

© 1997 Academic Press Limited

*Current address: Eloquent Technology, Inc., 2389 Triphammer Road, Ithaca, NY 14850.

1. Introduction

Although it is well known that speakers of English often produce phonologically voiced stop consonants such as /b/ with little or no vibration of the vocal folds (Lisker & Abramson, 1964, 1967), the extent to which voiced fricatives are also produced without vocal fold vibration has been investigated much less. This article examines the production of a voiced fricative, /z/, using simultaneous acoustic, aerodynamic, and electroglottographic data.

Using the term “voiced” to refer to /b/, /z/, and similar sounds implies that they belong to a category of sounds sharing a common feature [voice], which in English corresponds to a specific vocal tract configuration: an articulatory setting for vocal fold adduction and an aerodynamic setting for a pressure drop across the glottis. Speakers of both American and British English often produce phonologically voiced fricatives as wholly or partially devoiced, so that either there is no vibration of the vocal folds or it does not last as long as the frication noise created at the oral constriction (Haggard, 1978; Veatch, 1989; Docherty, 1992; Stevens, Blumstein, Glicksman, Burton & Kurowski, 1992). If devoicing occurs during the fricative, there must be an adjustment to the state of the glottis, the aerodynamic conditions in the vocal tract, or the degree of constriction in the oral cavity. This article investigates the conditions under which any of these adjustments, alone or in combination, might lead to devoicing.

Since there appear to be at least three potential sources of devoicing, in any given context in which devoicing occurs, more than one of these adjustments may be at work. Identifying which mechanism(s) are involved in a particular context may require more information than is readily available from the acoustic signal alone; even distinguishing voiced portions of the signal from voiceless ones may not be straightforward from the acoustic waveform. Recording a signal that more directly reflects laryngeal activity improves the reliability of detection of voiced portions of the signal and aids in investigating the source of devoicing. The accelerometer and throat microphone have been used in some previous studies of devoicing (Haggard, 1978; Docherty, 1992), but the electroglottograph (EGG) was used in the present study because its signal is less influenced by the supralaryngeal articulators than the signal from an accelerometer (Askenfelt, Gauffin & Sundberg, 1980; Baken, 1987), and thus provides a less ambiguous representation of glottal activity. By combining the EGG signal with measurement of airflow, the data gathered in the present study offer a more complete picture of fricative production than provided by previous studies that used fewer types of data. The multiple data sources should help distinguish whether devoicing can be attributed to adjustments to the glottis, aerodynamic conditions, or supralaryngeal articulators.

In addition to recording simultaneously more types of data than in previous studies of devoicing, this study addressed two principal issues that had not been resolved by previous work. One issue is whether, and to what extent, the position of the vocal folds and the aerodynamic properties of a devoiced /z/ differ from voiced /z/ and voiceless /s/.
The other main issue is the extent to which different phrasal contexts within a sentence favor or disfavor devoicing. This question is more appropriately investigated using recordings of complete sentences, as was done in this experiment, than from target words recorded in carrier phrases, the technique used in most previous studies, which have chiefly investigated the effects of varying segmental context.

1.1. Proposed explanations for the tendency of voiced fricatives to be devoiced

Probably the most widespread explanation for why speakers devoice phonologically voiced fricatives is that simultaneous voicing and frication are difficult to produce (Ohala, 1983). (This hypothesis proposes that voiced fricatives place more stringent requirements on the speech production mechanism than most other sounds.) In common with other voiced sounds, voiced fricatives require that subglottal pressure be higher than oral pressure in order to maintain vibration of the vocal folds. In common with voiceless fricatives, high airflow through the oral constriction is necessary in order to create the characteristic noise of the fricative. These two requirements compete in a voiced fricative. Oral pressure must be kept relatively low (compared to subglottal pressure) to maintain voicing, but the narrow oral constriction will tend to increase pressure in the mouth. Although oral pressure needs to be kept low to maintain voicing, if the oral constriction is widened, the pressure drop across the oral constriction will be too low to generate turbulent airflow, as is necessary for a fricative. The result would be a voiced approximant rather than a fricative.

At least for English, it appears that simplification of voiced fricatives into voiced approximants is limited to non-sibilant fricatives. There do not seem to be any reports in the literature of voiced sibilants being simplified into voiced approximants, but simplification in the form of a loss of voicing is quite common (Haggard, 1978; Docherty, 1992). This asymmetry suggests that speakers are simplifying voiced fricatives by allowing oral pressure to rise and/or subglottal pressure to fall, not by widening the oral constriction and allowing oral pressure to fall. Calculations based on the cross-sectional areas of glottal and oral constrictions and pressures in different parts of the vocal tract show that in the absence of maneuvers explicitly directed at prolonging glottal vibration, it may not last the entire duration of the oral constriction in a voiced fricative (Stevens et al., 1992).

This difficulty hypothesis is a production-based explanation for the rarity of voiced fricatives in the world’s languages and their potential for devoicing in languages where they do occur (Ohala, 1983). An alternative hypothesis (Balise & Diehl, 1994) suggests that voicing is perceptually disfavored for fricatives, particularly sibilants, because it diminishes their characteristic high intensity noise.

From a perceptual point of view, the need for distinctiveness among the sounds in a language might seem likely to discourage devoicing in English, since each of the voiced fricatives has a voiceless counterpart. Viewed as a conflict between ease of production and ease of perception, devoicing may be an example of sounds that are easier to produce (devoiced fricatives) winning out over those that are more easily identified (fully voiced fricatives). However, experiments on the perception of fricatives have shown that listeners may use frequency and duration information in the signal, not just the presence of glottal vibration, in deciding whether to categorize a fricative as voiced or voiceless (Raphael, 1972; Soli, 1982; Baum & Blumstein, 1987; Jongman, 1989; Stevens et al., 1992).

This study will compare the production of voiced and devoiced /z/ and voiceless /s/ in an attempt to determine what adjustments are being made that result in loss of voicing.
Using physiological and acoustic data to make pairwise comparisons between /s/ and /z/ will reveal the differences between the two sounds more fully than was possible in studies that collected fewer types of data.

The /s/ and /z/ notation is used here to indicate two “segments”, or complexes of gestures, that are lexically contrastive in English (and many other languages). As suggested by Byrd (1996a), the notion of segment could be interpreted as a group of articulatory gestures (in the sense of Browman & Goldstein, 1986) occurring within a single word with the temporal phasing of the gestures defined with a narrow tolerance for variation. /s/ includes both a lingual constriction and a glottal opening; /z/ has a lingual constriction gesture that is probably identical to that of /s/ (Scully, 1971), but with glottal adduction rather than glottal opening. One difference between /s/ and /z/ is that while vocal fold activity for /z/ can range from continuous vibration to complete absence of vibration, /s/ is consistently produced with little or no vibration (Docherty, 1992).

1.2. Amount and likelihood of devoicing

It is important to distinguish between the amount of devoicing in a given fricative and the likelihood for devoicing to occur in a given position. Some differences in amount of devoicing in different contexts have been reported by Docherty (1992) and Stevens et al. (1992). Comparing fricatives produced in different contexts also makes it possible to study the likelihood of devoicing in these different contexts. This variable has received more attention in previous studies than the amount of devoicing. One reason for this emphasis may be that likelihood of devoicing in different contexts can be studied by making a binary determination (voicing/devoicing) for each individual token, whereas comparison of the amount of devoicing in different tokens requires that the entire time course of each token be examined.

Previous studies that have investigated the frequency of occurrence of devoicing have found substantial variation among different phonological contexts. For British English, Haggard (1978) found that fricatives following a voiced stop were devoiced more often than intervocalic fricatives, and fricatives following a voiceless stop were even more likely to be devoiced. Docherty (1992) for British English, and Veatch (1989) and Stevens et al. (1992) for American English, observed similar patterns. In Haggard’s and Veatch’s results, fricatives in word-final position were virtually always devoiced, comparable to what was observed for fricatives adjacent to voiceless stops. Stevens et al. (1992) classified post-vocalic fricatives with at least 30 ms of voicing as “phonetically voiced”; by this criterion, 86% of the phonologically voiced word-final singleton fricatives in their data were “phonetically voiced,” a much higher percentage than was found for voiced fricatives preceding (48%) or following (58%) voiceless fricatives.

If at least some of these devoiced fricatives lack any vocal fold vibration at all, the potential might arise for neutralization of devoiced fricatives with voiceless ones. However, work on stop devoicing has shown that devoiced stops do not generally share all of the acoustic properties of voiceless stops (Dinnsen & Charles-Luce, 1984; Port & O’Dell, 1985; Port & Crawford, 1989). Perceptual neutralization, meaning that speakers cannot distinguish between devoiced fricatives and their voiceless counterparts, would be most likely if the devoiced fricatives were produced in exactly the same way as the voiceless ones. In fact, English listeners seem to be very successful at distinguishing /s/ and /z/, whether the /z/ is devoiced or not (Stevens et al., 1992).
The present study is concerned only with the production, not the perception, of fricatives; however, this investigation of the details of production of devoiced fricatives may highlight how listeners are able to distinguish them from voiceless ones.

2. Method

The goal of this experiment was to investigate the devoicing of /z/ in a variety of phonological environments in natural speech. These environments were chosen to allow comparisons involving several of the factors suggested by previous work. In order to achieve the most natural sample of speech, the target fricatives were produced in meaningful sentences. In these sentences, /s/ and /z/ occurred in contexts matched for type of neighboring sounds and position in word or phrase. While it was not always possible to match the contexts for /s/ and /z/ exactly, they were as similar as possible with respect to the phonological factors and usually also syntactic position.

The word or words that constitute the immediate context for a target /s/ or /z/ will be referred to as an utterance, and one repetition of an utterance will be called a token. Note that in this usage, an utterance consists of only one or two words and is much smaller than a sentence. The experiment included 20 utterances containing target /z/ and 20 containing target /s/. Thus, for each context in which a /z/ was measured, an /s/ in matching context was measured as well. In order to reduce the duration of the experiment, two or three target fricatives were included in each sentence. However, the utterances for each pair of contexts matched for /s/ and /z/ occurred in different sentences. The complete set of sentences is given in Table I.


TABLE I. Sentences used in the experiment. The underlining has been added to mark the fricatives that were measured

Ms. Barnes observed him reading this book while he was eating dessert.
Her husband wears a false beard that slides around when he sits down.
John’s boss bemoaned his false pretenses for avoiding work.
The music paused for a long time after these bands finished playing.
Pour that liquid into the red sink, and make sure the zinc closure fits tightly.
We should replace broken glass from the earthquake before any more of it falls.
The red zinc platter in the kitchen belongs to my housebound aunt.
His boss asked him why he falls behind in his work so often.
The jack in the box pops out very quickly.
When Bob’s out, the noise level falls perceptibly.
The statement “Niagara Falls is in Vermont” is totally false.
On a test question choosing true or false is easier than multiple choice.
There was a short pause before she answered her boss.
Mary’s boss laughed for five minutes without a pause.
The long pause outraged impatient listeners in the Roseland concert hall.
The hushed pause lengthened as the Mafia boss passed down Rossland Avenue.
After the protester shouted obscene slogans at the palace guard, he escaped through an ingenious deceit.
The pitcher’s lengthy pause postponed the start of the Dodgers game.
A lunar cycle recurs basically once every 28 days.

Factors that were varied in the experiment can be divided between those affecting the local context of the fricative and those that relate to the phrasal context or prosodic position. Both local and prosodic factors were varied to produce different contexts for the fricatives. Local context includes the identity of the following sound and of the preceding sound (see Table II). Prosodic position was either syllable-final, word-final, or sentence-final. Another prosodic factor that was varied was the presence or absence of stress on the syllable containing the fricative. The experiment also included fricatives produced in additional contexts that did not specifically test the effect of the factors listed here; these fricatives were included in comparisons between /s/ and /z/ but not in the analyses of contextual factors. Table II is arranged such that matched /s/ and /z/ utterances are next to each other.

2.1. Recording technique

Four speakers were recorded in this experiment. They were young adults (in their 20s and 30s) from the Midwest and Western United States. Speakers 1 and 3 were male, Speakers 2 and 4 were female. All speakers had previous experience using the equipment and were capable of speaking fluently in a relaxed manner during the experiment. They were all naive as to the purpose of the experiment.

The sentences were presented to the speakers on individual cards. In cases in which an error or pause was detected, speakers were asked to repeat the sentence. Speakers were encouraged to maintain a consistent rate and loudness throughout the recording session, but no explicit steps were taken to control these. Later examination of the acoustic durations of the target fricative and preceding vowel showed that the durations varied very little during the course of the experiment.


TABLE II. Utterances recorded in the experiment. Listed here are the immediate contexts for the target fricatives, which were spoken as part of the complete sentences listed in Table I. Utterances in the same row share the same value for the factor listed at left. Comparisons between /s/ and /z/ used pairs of utterances that were identical for all factors. In some cases the fricatives in a given pair of utterances were used in more than one comparison

Local context

                       /s/                                      /z/
Following sound        Preceding /l/      Preceding vowel       Preceding /l/       Preceding vowel
voiceless stop         false pretenses    boss passed           falls perceptibly   pause postponed
voiced stop            false beard        boss bemoaned         falls behind        pause before
vowel                  false is           boss asked            falls is            pause outraged

                       /s/                                      /z/
Preceding sound        Syllable-initial   Word-initial          Syllable-initial    Word-initial
voiced stop            obscene            red sink              observed            red zinc
vowel                  deceit             he sits               dessert             the zinc

Prosodic context

Position in utterance  /s/                /z/
syllable-final         housebound         husband
word-final             boss bemoaned      pause before
sentence-final         boss.              pause.

Stress pattern         /s/                /z/
in unstressed syll     palace guard       Dodgers game
in stressed syll       replace broken     recurs basically

Other pairs of words used for overall comparisons

Position               /s/                /z/
Sentence-final         false.             falls.
Word-final             pops out           Bob’s out
Word-final             boss laughed       pause lengthened
Word-final             this book          these bands
Syllable-final         Rossland           Roseland
Syllable coda          test               paused

Three of the four speakers read the set of sentences six times. For Speaker 2, one token of one utterance was discarded before analysis because of speaker error. For Speaker 3, one token each of three different utterances was discarded. No tokens were discarded for Speaker 4. The remaining speaker, Speaker 1, read the set of sentences five times; however, due to recording problems, 14 sentences were unusable in his data. These sentences included one repetition each of 16 different utterances with target /z/; one additional token containing /z/ was discarded due to speaker error. Additional tokens of utterances with /s/ were also missing from the recording, and one other utterance with /s/ was excluded due to speaker error. The total numbers of matched pairs of /z/ and /s/ that were analyzed for each speaker were: 83 for Speaker 1, 119 for Speaker 2, 117 for Speaker 3, and 120 for Speaker 4.

Data were collected using a Glottal Enterprises pneumotachographic mask to measure oral airflow and an electroglottograph (Frøkjaer-Jensen Type EG830). The electroglottograph signal shows changes in the electrical resistance across the neck as a means of measuring vocal fold contact. These signals and the acoustic signal from a head-mounted microphone positioned outside the mask approximately 2 inches from the mouth were recorded directly to disk at an 8 kHz sampling rate. The airflow and EGG signals were low-pass filtered at 1 kHz, the acoustic signal at 3 kHz. The acoustic signal was also recorded simultaneously from the same microphone onto cassette tape, digitized at 20 kHz and filtered at 8 kHz for acoustic analysis. Despite the presence of the mask, the frequency information in this recording was good to at least 7 kHz. The alignment between this signal and the directly digitized signal was checked by comparing duration measurements between acoustic landmarks that could be easily identified by visual inspection.

The EGG data were recorded from the AC coupled channel on the F-J electroglottograph, which includes circuitry to compensate for baseline drift (Baken, 1987). The output signal from this channel displays changes in laryngeal resistance due to the vocal fold vibrations, not the slower changes resulting from laryngeal displacement. (There exist other potential sources of error in the EGG signal, such as movement of the electrodes relative to the thyroid cartilages (see Colton & Conture, 1990).)

The airflow data were measured in two ways. One measurement was the maximum value for any sample during acoustic frication. The other was the mean flow averaged over all samples during the frication. The airflow measurements were calibrated in ml/s using a flow manometer.

The durations of the acoustic segments for the target fricative and the preceding vowel were measured from the waveform of the tape-recorded acoustic signal. In cases where the segmentation was not obvious from the waveform, a spectrogram display on a Kay Elemetrics CSL system was used as an additional aid. The onset of the preceding vowel was defined as the time when formant structure became apparent; the vowel offset was when the formant structure ended.
The onset of the fricative was defined as the time when high-frequency noise became salient in the signal and the offset as the end of this noise. The offset of the vowel was measured at a distinct, earlier time than the onset of the fricative in those cases where there was an interval lacking either obvious vocalic formant structure or frication noise.

In a subset of the utterances, the fricative was immediately preceded by a stop rather than a vowel; in these cases no measurement was made of the duration of a preceding vowel. These utterances were: “obscene” and “observed”, “red sink” and “red zinc”, “pops out” and “Bob’s out”. The pair “palace guard” and “Dodgers game” was also excluded from measurements of preceding vowel duration, since the vowels preceding the fricatives in the two words are completely different. In utterances including the words “false” and “falls”, in which a vowel + /l/ sequence preceded the fricatives, the total duration of the vowel + /l/ sequence was included in vowel duration, since in most cases it was not possible to determine accurately a boundary between the vowel and the /l/. This sequence occurred in matched utterances with /s/ and /z/, so that measures of the vowel (+ [l]) durations were comparable.

2.2. Measurement of amount of voicing

In order to facilitate comparisons among different tokens, it was desirable to identify specific times as the offset or onset of voicing. In measuring voicing, various methods are available that detect vocal fold vibration directly (photoglottography, electroglottography) or indirectly (acoustic measures). Electroglottography was chosen for this experiment because it permits accurate detection of vibration of the vocal folds while being entirely non-invasive, which makes it possible to record speech samples more representative of a speaker’s normal behavior than a more invasive technique would allow. It also permits accurate detection of low-amplitude vocal fold vibration even in the presence of acoustic noise, and makes it possible to measure the amplitude of vibration.

One problem in measuring vibration of the vocal folds is that the changes in the signal are gradual, which makes it difficult to select a specific moment as the beginning or end of vocal fold vibration (Docherty, 1992). To ensure systematicity in the measurements, the following algorithm was used to identify where vocal fold vibration was present in the EGG signal. The amplitude of one EGG cycle (maximum − minimum during one excursion) was measured at the time of maximum acoustic RMS energy in the vowel preceding the fricative, as calculated by the Energy command in CSL. In utterances in which a stop preceded the fricative, maximum RMS energy was measured in the vowel preceding the stop. The fricative was considered to be voiced during the portion of its duration that the amplitude of the EGG cycles exceeded one-tenth of the EGG cycle amplitude at the time of maximum energy in the preceding vowel. Voicing was considered to cease when the amplitude of an EGG cycle fell below this criterion. This procedure is illustrated in Fig. 1. If the amplitude of several successive EGG cycles wavered between just above and just below the criterion, the offset of voicing was marked where the average amplitude of two successive cycles was below the criterion.

For each token of /z/, the percentage of fricative duration with voicing was calculated by dividing the duration of frication with EGG amplitude exceeding the criterion by the total duration of acoustic frication. The tokens of /z/ were divided into three categories according to the percentage of their duration during which there was voicing. The three categories were: 0–25% voicing = devoiced; 25–90% voicing = partially devoiced; 90–100% voicing = voiced.
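The voicing criterion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the study, where the measurements were made in CSL. The function names, the representation of each EGG cycle as a (start, end, peak-to-peak amplitude) tuple, and the example values are all hypothetical. The sketch leaves out the two-cycle averaging rule for wavering amplitudes, and the handling of values that fall exactly on 25% or 90% is a guess, since the text does not say which category such tokens belong to.

```python
def percent_voiced(cycles, fric_on, fric_off, ref_amp, ratio=0.1):
    """Percentage of the frication interval with EGG amplitude above criterion.

    cycles   -- list of (start_s, end_s, peak_to_peak_amplitude), one per EGG cycle
                (hypothetical representation, not from the paper)
    fric_on  -- onset of acoustic frication, in seconds
    fric_off -- offset of acoustic frication, in seconds
    ref_amp  -- EGG cycle amplitude at maximum RMS energy in the preceding vowel
    ratio    -- criterion as a fraction of ref_amp (one-tenth in the text)
    """
    if fric_off <= fric_on:
        raise ValueError("frication offset must follow its onset")
    threshold = ratio * ref_amp
    voiced = 0.0
    for start, end, amp in cycles:
        if amp > threshold:
            # Count only the part of the cycle that lies inside the frication.
            overlap = min(end, fric_off) - max(start, fric_on)
            if overlap > 0:
                voiced += overlap
    return 100.0 * voiced / (fric_off - fric_on)


def voicing_category(pct):
    """Map a voicing percentage onto the three categories in the text.

    Assumption: exactly 25% counts as partially devoiced and exactly 90%
    as voiced; the source does not specify the boundary cases.
    """
    if pct < 25:
        return "devoiced"
    if pct < 90:
        return "partially devoiced"
    return "voiced"


# Hypothetical token: frication from 0.10 s to 0.20 s, 10 ms EGG cycles
# whose amplitude decays after the vowel.
cycles = [
    (0.08, 0.09, 1.00), (0.09, 0.10, 0.90), (0.10, 0.11, 0.60),
    (0.11, 0.12, 0.30), (0.12, 0.13, 0.12), (0.13, 0.14, 0.05),
]
pct = percent_voiced(cycles, fric_on=0.10, fric_off=0.20, ref_amp=1.0)
print(round(pct, 1), voicing_category(pct))
```

In this made-up token, three 10 ms cycles inside the frication exceed the criterion, so roughly 30% of the 100 ms frication counts as voiced and the token would be tallied as partially devoiced.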

Figure 1. A partially devoiced token of “falls perceptibly”, spoken by Speaker 1, illustrating the use of EGG cycle amplitude to determine a criterion for presence/absence of voicing. The vertical lines in the acoustic waveform indicate the onset and offset of frication.


Since each token was categorized individually, it was possible for the several tokens of a given utterance to fall into different voicing categories. These category boundaries were chosen because they seemed to reflect the grouping of the data. In particular, the data for Speakers 1, 2, and 3 were bimodal, with few tokens of /z/ having 50–90% voicing, but a number of tokens having over 90% voicing, that is, almost but not quite all of the duration of the frication. It seemed reasonable to group these tokens with over 90% voicing in the “voiced” category together with those that were voiced throughout the entire duration of frication. The boundary between the categories was less clear for Speaker 4. There was not a very clear boundary between devoiced and partially devoiced categories for any speaker, but the 0–25% division grouped together most of the tokens with less voicing.

2.3. Numerical analyses

Each speaker’s data were analyzed separately. Two kinds of analyses were performed. The first analysis compared the acoustic and airflow measures of /s/ and /z/. Each token containing /z/ was paired with the corresponding token containing /s/, that is, the matched utterance produced in the same repetition of the sentences. These pairs of tokens were grouped into the voicing categories described in Section 2.2, according to the percentage of voicing in the token with /z/. For example, for Speaker 2 the second repetition of sentence-final “falls” was paired with the second repetition of sentence-final “false”. Since there was no vocal fold vibration during the /z/ in this repetition of “falls”, these tokens are both tallied in the “devoiced” category. This method ensures that the comparisons between /s/ and /z/ are being made between tokens that occurred in matching contexts. However, each group of comparisons involves different total numbers of tokens, and very different numbers of tokens from each speaker, that were devoiced, partially devoiced, or voiced.
This is because, although the speakers’ results usually patterned in much the same direction, the individuals varied considerably in their overall propensity for devoicing (see Section 3.1.1). The unequal distribution of data points from the different speakers meant that an analysis grouping together data from all the speakers within a given voicing category might include one data point from one speaker and twenty from another speaker, making it impossible to do a statistical test with speaker identity as a factor.¹

¹ Analyzing these data with the ANOVA combining voicing categories and/or speakers would risk violation of two of the assumptions underlying ANOVA that are described by Rietveld & van Hout (1993, p. 120). These are: (1) “the observations are drawn from normally distributed populations” and (2) homogeneity of variance. The different contexts in which the fricatives were produced impose an uneven, non-normal distribution on the values for duration and airflow. In this experiment, context operates like a categorial factor in its effect on the duration and airflow values, creating groups of tokens with different ranges of values rather than randomly distributed values. Homogeneity of variance would be violated for these same reasons, and also because the skewed distribution of the values for percentage devoicing contributes to skew in the durations and airflow values.

Separate analyses of each speaker’s data were also necessary to test the differences between /s/ and /z/ that were the primary focus of the experiment, because in making these comparisons, it was essential to control for the effects of the different contexts in which the fricatives were produced. The surest control was to make pairwise comparisons such that each member of the /s/-/z/ pair had occurred in exactly the same context. Because an important goal of this study was to determine the extent to which these comparisons with /s/ were valid for devoiced /z/ as well as voiced /z/, the statistical tests had to be carried out separately for each voicing category.

Within each voicing category, two-tailed paired t-tests were used to compare the acoustic and aerodynamic measurements of the /s/ tokens with the measurements of the matching /z/ tokens. Therefore, for this analysis only those tokens where both members of the pair were available could be included. However, a few tokens were included in the comparison of acoustic durations where the airflow signal was unusable but the acoustic signal was available. A consequence of this procedure is that in a few cases the number of tokens analyzed was different for the acoustic measurements than for the airflow measurements. Each t-test compared a different set of /s/ and /z/ tokens; no data were included in more than one t-test. The significance level used was p < 0.01.

The second analysis investigated the likelihood of devoicing in different contexts. The count of the number of tokens in each of the voicing categories provided the data for this analysis. No statistical tests were done because of the small numbers of tokens. When a token containing /z/ was available, but its matching token with /s/ was not, the token with /z/ was included in this analysis even though it had not been used in the first analysis.
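The paired comparisons used in the first analysis can be illustrated with a small standard-library sketch of a two-tailed paired t-test. The article does not say what software performed the tests, so everything here is an assumption made for illustration: the function names, the use of Simpson's rule to integrate the t density for the p-value, and the example durations, which are invented and are not data from the study.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def paired_t_test(s_values, z_values, steps=2000):
    """Two-tailed paired t-test on matched /s/ and /z/ measurements.

    Returns (t, df, p). The p-value is obtained by integrating the t
    density from 0 to |t| with Simpson's rule (steps must be even).
    """
    if len(s_values) != len(z_values) or len(s_values) < 2:
        raise ValueError("need at least two matched pairs")
    diffs = [s - z for s, z in zip(s_values, z_values)]
    n = len(diffs)
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = mean(diffs) / (sd / math.sqrt(n))
    df = n - 1

    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3          # P(0 < T < |t|)
    p = max(0.0, 1.0 - 2.0 * central_area)
    return t, df, p


# Invented fricative durations (ms) for five matched pairs in one
# voicing category; these are NOT measurements from the experiment.
s_durations = [101, 102, 103, 104, 105]
z_durations = [100, 100, 100, 100, 100]
t, df, p = paired_t_test(s_durations, z_durations)
print(f"t({df}) = {t:.3f}, p = {p:.4f}, significant at 0.01: {p < 0.01}")
```

Because each t-test in the study drew on a different set of pairs, a function like this would be called once per speaker, voicing category, and measurement, with the criterion p < 0.01 applied to each result.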

3. Results

All three types of data that were collected (EGG, acoustic durations, and airflow) showed that /z/ is distinct from /s/ occurring in the same context, regardless of whether the /z/ is voiced or devoiced. Moreover, both the voiced and the devoiced /z/’s patterned in the same direction for all three types of measurements. To demonstrate this pattern, first the overall results for the EGG measurements will be given to show how they served to divide the tokens of /z/ among the three voicing categories. Next, the /s/’s are compared to /z/’s belonging to different voicing categories, using the acoustic durations and the airflow measurements. The primary purpose of the acoustic and airflow data was to determine whether /s/ and /z/ are distinct for all voicing categories of /z/.

3.1. Comparisons of /s/ and /z/, organized by voicing category of /z/

It was expected that durational and airflow measurements of devoiced /z/ would be more likely to resemble those for /s/ than would the measurements of voiced /z/. Thus, in order to test carefully whether any category of /z/’s was potentially indistinguishable from /s/’s, separate comparisons were made between devoiced /z/ and /s/, partially devoiced /z/ and /s/, and voiced /z/ and /s/. The tokens of /z/ were first divided into the three voicing categories, then each matched with its corresponding token of /s/. This resulted in three data sets per speaker, one for each voicing category. Each data set included a different number of tokens, because each speaker produced a different number of tokens of /z/ that were devoiced, partially devoiced, or voiced. Also, each data set could contain anywhere from 0 to 6 tokens of a given utterance, depending on how the speaker had produced the /z/ in that utterance. However, each data set always contained the same number of tokens of /s/ and /z/.
Within each data set, the dependent measurements that were compared were acoustic duration of the fricative noise and of the vowel preceding the fricative, and the mean and maximum airflow during the fricative.
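The construction of the three data sets per speaker can be sketched as follows. This is only an illustration under stated assumptions: the record layout, the function name `build_data_sets`, the pairing of tokens by repetition number, and all numeric values are hypothetical, since the text does not say how matched tokens were paired in practice.

```python
from collections import defaultdict


def build_data_sets(z_tokens, s_values):
    """Group matched (/z/, /s/) measurements by the voicing category of /z/.

    z_tokens: list of dicts with keys 'utterance', 'repetition',
              'category', and 'value' (one measurement per /z/ token).
    s_values: dict mapping (utterance, repetition) of the /z/ token to the
              measurement of its matched /s/ token (assumed pairing scheme).
    """
    data_sets = defaultdict(list)
    for token in z_tokens:
        key = (token["utterance"], token["repetition"])
        if key in s_values:  # keep only tokens whose /s/ match is available
            data_sets[token["category"]].append((token["value"], s_values[key]))
    return dict(data_sets)


# Invented example values (frication duration in ms); not data from the study.
z_tokens = [
    {"utterance": "pause before", "repetition": 1,
     "category": "partially devoiced", "value": 62.0},
    {"utterance": "pause before", "repetition": 2,
     "category": "devoiced", "value": 80.0},
    {"utterance": "husband", "repetition": 1,
     "category": "voiced", "value": 65.0},
]
s_values = {
    ("pause before", 1): 85.0,
    ("pause before", 2): 110.0,
    # no /s/ match for ("husband", 1), so that /z/ token is left out
}
data_sets = build_data_sets(z_tokens, s_values)
print(data_sets)
```

Because every entry is a pair, each resulting data set necessarily contains the same number of /s/ and /z/ measurements.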


3.1.1. Grouping of tokens of /z/ into voicing categories, based on EGG data

The first stage in establishing the comparisons was to use the EGG data as described in Section 2.2 to group the tokens into the three voicing categories. These data demonstrate that there were substantial differences among the four speakers as to how often they devoiced /z/. Fig. 2 shows the number of tokens of /z/ that each speaker produced as devoiced, partially devoiced, and voiced during the course of the entire experiment. Speaker 1 often produced /z/ with vibration throughout, whereas Speaker 2 produced the most tokens with no vocal fold vibration. Speakers 3 and 4 tended to produce /z/ with vocal fold vibration during part of the fricative. For Speakers 1 and 2, and to some extent Speaker 3, few tokens were produced with 50–90% of voicing, but a larger number had 90–100% voicing. Speaker 4 showed a slightly different distribution, with fewer and fewer tokens for higher percentages of voicing, so that very few tokens were produced with 90–100% voicing. These overall differences contributed to the differences in likelihood of devoicing in the different contexts that were tested. The differences among speakers did not have an obvious explanation in their rates of speech: Speaker 2, who devoiced the most, spoke the most slowly; Speaker 1, who devoiced the least, spoke next most slowly; and Speakers 3 and 4 spoke the most rapidly.

The EGG data were also used to determine whether there was any voicing during the tokens of /s/. Not surprisingly, the duration of vocal fold vibration in the /s/’s was much less than in the /z/’s. Of a total of 443 tokens of /s/, 436 had vocal fold vibration for less than 25% of their duration, and most of these had no vocal fold vibration at all. Seven tokens had vocal fold vibration during 25–50% of their duration; nonetheless, these were all perceived as /s/ in informal listening tests.
The tokens that did include some amount of voicing were distributed almost equally among the four speakers. These results show that all four speakers produced /s/ as voiceless in all contexts. However, there were

Figure 2. The total number of tokens of /z/ that each speaker produced as devoiced, partially devoiced, or fully voiced.


substantial differences among /s/’s produced in different contexts; for example, they were much longer sentence-finally because of the effect of final lengthening.

3.1.2. Comparison of /s/ and /z/: acoustic durations

The comparisons between /s/ and /z/ were organized by the voicing category of /z/ in order to test whether /s/ and /z/ are distinct regardless of the amount of voicing in /z/. The acoustic durations of the fricatives and preceding vowels agreed in showing differences between /z/’s and /s/’s produced in the same context. The duration of /z/ was significantly shorter than that of /s/ (p < 0.01) for all except the fully voiced /z/’s of Speaker 4, which did not differ significantly from their matched /s/’s. Mean values and results of the t-tests for the measures of acoustic duration are shown in Table III. The variation among

TABLE III. Results of t-tests comparing acoustic durations for utterances with /s/ and utterances with /z/, grouped into voicing categories according to the amount of voicing in the /z/ token. n is the number of pairs compared, t is the statistic obtained from a paired t-test, and p is the significance level of this statistic (ns = not significant)

Frication duration

| Speaker | Voicing category | Mean (ms), utterances with /z/ | Mean (ms), utterances with /s/ | n | t | p < |
|---|---|---|---|---|---|---|
| 1 | Voiced | 70.5 | 115.7 | 36 | −8.328 | 0.001 |
| 1 | Partially devoiced | 67.6 | 87.1 | 25 | −3.460 | 0.01 |
| 1 | Devoiced | 91.0 | 130.1 | 21 | −4.979 | 0.001 |
| 2 | Voiced | 47.4 | 89.2 | 15 | −6.145 | 0.001 |
| 2 | Partially devoiced | 56.1 | 86.0 | 28 | −6.124 | 0.001 |
| 2 | Devoiced | 71.3 | 90.8 | 76 | −6.225 | 0.001 |
| 3 | Voiced | 73.9 | 94.2 | 15 | −3.088 | 0.01 |
| 3 | Partially devoiced | 57.0 | 82.9 | 52 | −6.400 | 0.001 |
| 3 | Devoiced | 79.9 | 148.1 | 50 | −11.322 | 0.001 |
| 4 | Voiced | 64.9 | 88.9 | 8 | −3.321 | ns |
| 4 | Partially devoiced | 62.6 | 85.5 | 56 | −7.636 | 0.001 |
| 4 | Devoiced | 81.8 | 113.6 | 56 | −9.707 | 0.001 |

Duration of preceding vowel

| Speaker | Voicing category | Mean (ms), utterances with /z/ | Mean (ms), utterances with /s/ | n | t | p < |
|---|---|---|---|---|---|---|
| 1 | Voiced | 140.9 | 121.7 | 29 | 3.545 | 0.01 |
| 1 | Partially devoiced | 171.8 | 145.7 | 18 | 3.967 | 0.001 |
| 1 | Devoiced | 195.4 | 150.9 | 20 | 3.459 | 0.01 |
| 2 | Voiced | 125.7 | 120.8 | 9 | 0.428 | ns |
| 2 | Partially devoiced | 182.6 | 150.5 | 23 | 4.183 | 0.001 |
| 2 | Devoiced | 216.2 | 164.0 | 63 | 9.034 | 0.001 |
| 3 | Voiced | 139.3 | 133.0 | 11 | 1.103 | ns |
| 3 | Partially devoiced | 143.0 | 129.4 | 45 | 2.911 | 0.01 |
| 3 | Devoiced | 200.0 | 84.3 | 33 | 15.048 | 0.001 |
| 4 | Voiced | 127.5 | 129.1 | 8 | −0.147 | ns |
| 4 | Partially devoiced | 137.7 | 108.6 | 45 | 5.890 | 0.001 |
| 4 | Devoiced | 178.1 | 134.7 | 43 | 5.289 | 0.001 |


/z/’s in different voicing categories results from their being produced in different contexts; because the /s/’s grouped in each voicing category were also produced in different contexts, their duration also differs.

The vowel durations also conformed to the expected pattern in English, with vowels significantly longer before /z/ than before /s/ (p < 0.01) for all speakers’ devoiced and partially devoiced /z/’s, and for Speaker 1’s voiced /z/’s. For Speakers 2 and 3, vowels preceding voiced /z/’s tended to be longer than vowels preceding the matched /s/’s, but the difference was not significant. For all speakers, the difference in vowel durations was greatest for vowels preceding devoiced /z/’s. The larger durational differences in the vowels before devoiced /z/’s may aid the perception of the distinction between /z/ and /s/ in the absence of vocal fold vibration.

3.1.3. Comparison of /s/ and /z/: airflow

Measurements of mean and maximum airflow were made for the fricatives for the interval during which there was strong aperiodic noise. This interval does not include the time that the articulators are in the process of forming the oral and laryngeal constrictions for the fricative. The mean airflow thus represents an average value during the production of a noisy signal and is not distorted by potentially high flow through an unobstructed vocal tract (before or after the formation of the oral constriction). The results of the t-tests comparing measurements of mean and maximum flow are shown in Table IV. As for the durations, these results show significant differences between the /z/’s and /s/’s, with /z/’s characterized by generally lower airflow than /s/’s. An important point is that devoicing did not neutralize the distinction between /s/ and /z/. For the mean airflow, there were significantly lower values for /z/ than for corresponding /s/ in 9 out of the 12 comparisons. The exceptions were Speakers 2 and 4’s fully voiced /z/’s and Speaker 4’s devoiced /z/’s.
Similarly, maximum flow was significantly lower for /z/ than for /s/ for all voicing categories for all speakers except for the fully voiced /z/’s produced by Speakers 2 and 4. The airflow data thus coincide with the durational results in showing that /z/ and /s/ are different, whether or not the /z/’s are produced with vocal fold vibration. The lower airflow commonly observed for (voiced) /z/ than for /s/ may be due to the vocal folds’ more approximated position during /z/ (Isshiki & Ringel, 1964; Scully, 1971), if the folds occupy a similar position for all /z/’s, whether voiced or devoiced. If during devoiced /z/ the folds are in an intermediate state between their approximated position for consonants with voicing and their open position during voicelessness (as suggested by Catford, 1977 and Laver, 1994), then the resistance that the vocal folds offer to transglottal airflow should be greater than during voicelessness. The airflow measurements reported here show that /z/ almost always had lower airflow than /s/. For these airflow comparisons, the fricatives were grouped by amount of voicing in the /z/’s. Differences among the /z/’s of different voicing categories can most likely be attributed to the different contexts in which they were produced, since contextual factors such as syllable position may play a role in determining the amplitude of flow. Many of the utterances tested in this experiment placed /z/ at the end of a syllable, and in particular, 11 of Speaker 2’s 15 fully voiced /z/’s were syllable- and sometimes word-final. Hardcastle & Clark (1981) found consistently lower airflow for /z/ than for /s/ for syllable-initial fricatives, but not always for syllable-final fricatives. Isshiki & Ringel (1964) found more variable airflow in syllable-final position than syllable-initial for their


TABLE IV. Results of t-tests comparing airflow for utterances with /s/ and utterances with /z/, grouped into voicing categories according to the amount of voicing in the /z/ token. n is the number of pairs compared, t is the statistic obtained from a paired t-test, and p is the significance level of this statistic (ns = not significant)

Mean airflow

| Speaker | Voicing category | Mean (ml/s), utterances with /z/ | Mean (ml/s), utterances with /s/ | n | t | p < |
|---|---|---|---|---|---|---|
| 1 | Voiced | 50.3 | 82.3 | 32 | −5.302 | 0.001 |
| 1 | Partially devoiced | 47.5 | 83.4 | 23 | −4.277 | 0.001 |
| 1 | Devoiced | 40.0 | 73.4 | 21 | −3.266 | 0.01 |
| 2 | Voiced | 100.4 | 99.8 | 15 | 0.057 | ns |
| 2 | Partially devoiced | 89.5 | 125.0 | 28 | −4.523 | 0.001 |
| 2 | Devoiced | 96.4 | 124.4 | 76 | −5.495 | 0.001 |
| 3 | Voiced | 71.5 | 123.8 | 15 | −4.312 | 0.001 |
| 3 | Partially devoiced | 83.2 | 137.4 | 52 | −9.168 | 0.001 |
| 3 | Devoiced | 102.7 | 128.5 | 50 | −3.636 | 0.001 |
| 4 | Voiced | 129.8 | 142.8 | 8 | −1.674 | ns |
| 4 | Partially devoiced | 141.0 | 166.1 | 56 | −4.277 | 0.001 |
| 4 | Devoiced | 150.5 | 159.7 | 56 | −1.523 | ns |

Maximum airflow

| Speaker | Voicing category | Mean (ml/s), utterances with /z/ | Mean (ml/s), utterances with /s/ | n | t | p < |
|---|---|---|---|---|---|---|
| 1 | Voiced | 138.0 | 186.1 | 32 | −3.510 | 0.01 |
| 1 | Partially devoiced | 121.4 | 198.7 | 23 | −4.556 | 0.001 |
| 1 | Devoiced | 106.4 | 201.2 | 21 | −5.571 | 0.001 |
| 2 | Voiced | 151.2 | 162.1 | 15 | −0.674 | ns |
| 2 | Partially devoiced | 131.8 | 204.5 | 28 | −4.754 | 0.001 |
| 2 | Devoiced | 138.2 | 198.4 | 76 | −8.255 | 0.001 |
| 3 | Voiced | 148.4 | 229.6 | 15 | −3.231 | 0.01 |
| 3 | Partially devoiced | 153.8 | 254.7 | 52 | −7.700 | 0.001 |
| 3 | Devoiced | 195.9 | 238.7 | 50 | −3.453 | 0.01 |
| 4 | Voiced | 184.6 | 226.0 | 8 | −2.387 | ns |
| 4 | Partially devoiced | 187.8 | 239.4 | 56 | −5.715 | 0.001 |
| 4 | Devoiced | 198.1 | 246.3 | 56 | −5.379 | 0.001 |

sample of consonants. These findings suggest that the higher than expected airflow for Speakers 2 and 4’s fully voiced /z/’s may be due to their position in the syllable.

3.2. Likelihood of devoicing in different segmental contexts

Another analysis was carried out to identify the factors that determine the likelihood of devoicing in different segmental contexts. The contextual factor examined here that has the largest effect on the likelihood of devoicing is the type of sound that follows the fricative. The effect of following context can be seen by comparing the frequency of devoicing in /z/ across the set of utterances involving the word “pause”, followed by a vowel, a sonorant consonant /l/, a voiced stop /b/, or a voiceless stop /p/. These results are shown in Fig. 3. The only context in which speakers produced fully voiced tokens was


Figure 3. The number of tokens of /z/ in the word “pause” followed by a vowel, a sonorant consonant /l/, a voiced stop /b/, and a voiceless stop /p/ that the speakers produced as devoiced, partially devoiced, or fully voiced.

when a sonorant (vowel or consonant) followed the /z/. Complete devoicing of /z/ occurred more often when the following sound was the voiceless stop /p/ than when any of the voiced sounds followed. The likelihood of devoicing tended to increase when less sonorous sounds followed the /z/. Speakers also produced more devoiced tokens of /z/ before a voiceless consonant in another set of utterances with /z/ at the end of “falls”, shown in Fig. 4. Increased frequency of devoicing in fricatives preceding voiceless sounds has been previously reported by Docherty (1992), Stevens et al. (1992), Veatch (1989), and Haggard (1978).

In order to compare speakers’ likelihood of devoicing in different contexts, a “devoicing index” was calculated for the set of utterances that included the word “pause” with different following contexts. This index was designed to provide a single number that indicated how often a speaker voiced the /z/ in a particular utterance. A value of this


Figure 4. The number of tokens of /z/ in the word “falls” followed by a vowel, a voiced stop /b/, and a voiceless stop /p/ that the speakers produced as devoiced, partially devoiced, or fully voiced.

index was calculated for each speaker for each utterance. The proportion of tokens with full voicing was calculated by dividing the number of tokens of an utterance that a speaker produced with full voicing by the total number of tokens that the speaker produced of that utterance. The proportion with partial devoicing and the proportion with complete devoicing were calculated in the same way. The speaker’s devoicing index for that utterance was equal to one-half of the sum of twice the proportion with full voicing and the proportion with partial devoicing. Completely devoiced productions were given a value of 0. Thus, the index value for a speaker who fully voiced every token of an utterance would be 1, for a speaker who partially devoiced every token would be 0.5, and for a speaker who completely devoiced every token would be 0. For example, for the /z/ in “pause outraged”, Speaker 3 produced 1 voiced token, 2 partially devoiced, and 2 devoiced. His index value for this utterance equals 1/2 × ((2 × 1/5) + (1 × 2/5) + (0 × 2/5)) = 0.4.
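The index calculation described above can be written out as a short function. This is a minimal sketch that follows the verbal definition directly; the function name `devoicing_index` is chosen here for illustration, and the only data used are the token counts quoted in the text for Speaker 3’s “pause outraged”.

```python
def devoicing_index(n_voiced, n_partial, n_devoiced):
    """Devoicing index for one speaker and one utterance.

    Weights: fully voiced = 2, partially devoiced = 1, devoiced = 0;
    the weighted sum of the proportions is then halved, so the index
    ranges from 0 (always devoiced) to 1 (always fully voiced).
    """
    total = n_voiced + n_partial + n_devoiced
    if total == 0:
        raise ValueError("no tokens for this utterance")
    p_voiced = n_voiced / total
    p_partial = n_partial / total
    p_devoiced = n_devoiced / total
    return 0.5 * (2 * p_voiced + 1 * p_partial + 0 * p_devoiced)


# Worked example from the text: 1 voiced, 2 partially devoiced, 2 devoiced.
speaker3_pause_outraged = devoicing_index(1, 2, 2)
print(round(speaker3_pause_outraged, 3))
```

Note that, despite its name, a higher value of the index corresponds to more voicing, so more frequent devoicing shows up as a lower index.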


The different speakers’ values of this index were rank ordered for each of the utterances that included “pause”, and these orderings were compared to test the prediction that decreased sonority of the context following the /z/ made devoicing more likely. If assimilation to neighboring context is an important influence on the likelihood of devoicing, the rank ordering should be that the most devoicing would occur before /p/, with less before /b/, even less before /l/, and the least of all before a vowel. The speakers who conformed to these predictions are listed below.

• all speakers had devoicing more often before /p/ than before /b/
• all but Speaker 4 had devoicing more often before /p/ than before /l/
• all speakers had devoicing more often before /p/ than before a vowel
• Speakers 1 and 3 had devoicing more often before /b/ than before /l/
• Speakers 1 and 4 had devoicing more often before /b/ than before a vowel
• Speakers 2 and 4 had devoicing more often before /l/ than before a vowel
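The six pairwise predictions in this list can be checked mechanically for one speaker, as in the following sketch. It assumes that "devoicing more often" is read as a lower value of the devoicing index. The context labels, the function name `check_predictions`, and the index values are hypothetical and do not reproduce any speaker’s actual results.

```python
from itertools import combinations

# Following contexts ordered from most to least sonorous.
CONTEXTS = ["vowel", "l", "b", "p"]


def check_predictions(index_by_context):
    """Return {(less_sonorous, more_sonorous): bool} for all six pairs.

    A prediction holds when the index is lower (more devoicing) before
    the less sonorous context than before the more sonorous one.
    """
    results = {}
    for more_sonorous, less_sonorous in combinations(CONTEXTS, 2):
        holds = index_by_context[less_sonorous] < index_by_context[more_sonorous]
        results[(less_sonorous, more_sonorous)] = holds
    return results


# Invented index values for one hypothetical speaker.
hypothetical = {"vowel": 0.7, "l": 0.5, "b": 0.55, "p": 0.1}
outcome = check_predictions(hypothetical)
for pair, holds in outcome.items():
    print(pair, holds)
```

For these invented values, five of the six predictions hold; the exception is the /b/ versus /l/ comparison.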

These results show that the most consistent difference among the tested contexts was that speakers devoiced more before voiceless /p/ and less before the voiced consonants and vowel. Although there were more fully devoiced tokens of /z/ before /p/ than before /b/, no speaker produced any fully voiced tokens of /z/ in “pause” when there was a /b/ following. Since /b/ is often produced as a voiceless unaspirated stop by speakers of American English (Lisker & Abramson, 1964), it may lack any vocal fold vibration. No measurements were made in the current study of the voicing of /b/, but inspection of the data suggests that few tokens of /b/ had any vocal fold vibration. Although /b/ probably often lacked voicing, there were nonetheless differences in the likelihood of devoicing between /z/’s preceding /b/ and /z/’s preceding /p/. Devoicing was more likely to occur preceding /p/, which normally has glottal opening rather than adduction as is typical for /b/ (Lisker, Abramson, Cooper & Schvey, 1969; Flege, 1982; Löfqvist & McGowan, 1992).

The context preceding the /z/ also affects the likelihood of devoicing. For comparisons of preceding context, the /z/ was either in a monosyllabic word or at the beginning of a syllable with primary stress. Fig. 5 shows that speakers produced more devoiced /z/ when /z/ was preceded by a voiced stop than by a vowel. This pattern held for both syllable-initial and word-initial /z/, but in general there was more devoicing in syllable-initial, word-medial position than in word-initial position. Like the effect of the sound following the /z/, the effect of the sound preceding /z/ could appear to be an assimilatory process. However, attributing the effect to assimilation would assume that /b/ is produced with less vocal fold vibration than vowels are. The mechanism whereby reduced vocal fold vibration could be assimilated from /b/ to /z/ is not clear, but it does seem that /z/ is influenced by preceding context as well as following context.
The effect of phonologically voiced stops that are adjacent to /z/ seems to be intermediate between the effects of vowels and of /p/, which is appropriate if the voiced stops have neither full vocal fold vibration as in vowels nor a glottal opening gesture as in /p/.

3.3. Likelihood of devoicing in different stress and phrasal contexts

In addition to the local assimilatory effects described above, influences from prosodic structure were also investigated. One set of words compared /z/’s in sentence-final, word-final, and word-internal syllable-final position, with a voiced stop following the word- and syllable-final /z/ (“pause”, “pause before”, and “husband”). The goal of this


Figure 5. The number of tokens of syllable- and word-initial /z/ preceded by a voiced stop or a vowel that the speakers produced as devoiced, partially devoiced, or fully voiced.

comparison was to assess the effect of boundaries ending units of different size (syllable, word, or sentence). A voiced stop (/b/) was chosen as the context following /z/ in these comparisons in order to ensure that speakers produced /z/ and /s/ (in the matched utterances) as syllable-final; if a /p/ or /l/ had followed, the fricatives might have been syllabified as onsets to the following syllable. Devoicing occurred more often at the end of the larger domain (sentence) than at the end of a smaller domain (syllable). All sentence-final /z/’s produced by all speakers were completely devoiced. In addition, Speakers 1, 2, and 3 more often devoiced the /z/ word-finally than at the end of a word-medial syllable; however, there was considerable individual variation for the word-medial syllable, which can be seen in Fig. 6. All speakers produced most of their tokens of word-final /z/ in “pause before” with partial devoicing; no speaker produced any fully voiced tokens of /z/ in this utterance. In contrast, for word-medial /z/ in “husband”, Speakers 1, 2, and 3 produced one or more fully voiced


Figure 6. The number of tokens of sentence-, word-, and syllable-final /z/ that the speakers produced as devoiced, partially devoiced, or fully voiced.

tokens; Speakers 1 and 3 also produced some partially or completely devoiced tokens. However, Speaker 4 produced more tokens of devoiced /z/ in word-medial position than in word-final position. Given that all of this speaker’s word-medial /z/’s in “husband” were completely devoiced, the most likely interpretation is that she has lexicalized a pronunciation of “husband” with an /s/ rather than a /z/. Further evidence for this interpretation is that the mean airflow for the /z/ in “husband” was higher than the airflow for the matching /s/ in “housebound.” This result is discussed further in Section 3.4.

In addition to the effect of position in the utterance, the effect of stress on the likelihood of devoicing can also be considered a reflection of prosodic structure. Two-syllable words with different stress patterns were compared to see whether the word-final /z/ was more likely to be devoiced at the end of stressed or unstressed syllables. It was expected that devoicing would be more frequent at the end of an unstressed syllable than at the end of a stressed syllable, since the lack of stress signals


Figure 7. The number of tokens of /z/ at the end of a stressed or unstressed syllable that the speakers produced as devoiced, partially devoiced, or fully voiced.

a prosodically weaker position. Speakers 1 and 3 fulfilled this prediction, as can be seen in Fig. 7. Speaker 1 produced the stressed /z/ in “recúrs” either fully voiced or partially devoiced, and the unstressed /z/ in “Dódgers” as either partially or completely devoiced. Speaker 3 produced most tokens of stressed /z/ with partial devoicing, but always produced the unstressed /z/ as fully devoiced (all six tokens). Unlike these two speakers, Speaker 2 made no difference in the likelihood of voicing stressed and unstressed /z/, and Speaker 4 showed a pattern somewhat contrary to the prediction. Thus, stress appears to be a less robust predictor of devoicing, since two of the speakers conformed to the predicted pattern and two did not; however, across the four speakers there were 23% fewer tokens with partial or full voicing in the unstressed /z/ than in the stressed /z/.

3.4. Comparison of airflow in different utterances

The t-tests reported in Section 3.1.3 showed that for voiced, partially devoiced, and completely devoiced /z/’s, airflow was lower for the /z/’s than for the comparable /s/’s. In these comparisons, tokens were grouped by percentage duration of voicing, combining tokens of different utterances within each voicing category. Additional comparisons were made grouping tokens of the same utterance regardless of duration of voicing, and comparing the airflow for tokens of different utterances. One set of comparisons looked at the difference between /p/ and /b/ contexts following /z/. If devoicing of /z/ preceding /p/ is the result of (glottal) assimilation during /z/ to the open glottis of /p/, higher airflow might be expected in /z/’s that are completely devoiced before /p/ than in those that were not completely devoiced. Devoicing of /z/ before /b/ is hypothesized to be the consequence of transglottal flow insufficient to maintain voicing.
This could occur if the glottis opened somewhat during the /z/ compared to its position during the preceding vowel, but the rate of transglottal airflow did not increase enough to compensate for the increase in glottal area. Before /p/, there were no tokens of fully voiced /z/, and there was much variability as to whether completely or partially devoiced /z/ had greater flow. This variability suggests that the devoicing before /p/ could have resulted from either increased glottal opening or


from insufficient transglottal pressure difference. The pressure difference might be insufficient if the speaker failed to expand the supralaryngeal part of the vocal tract to keep oral pressure low. Since /p/ requires glottal opening but not tract expansion, anticipation of glottal opening seems the more probable explanation for devoicing of /z/ before /p/. Before /b/, there was lower maximum and mean airflow in /z/’s that were completely devoiced than in /z/’s that were fully voiced or partially devoiced. The lower airflow is consistent with the explanation that devoicing is due to the transglottal flow being insufficient to maintain voicing.

To investigate further whether devoicing can be attributed to lower airflow, comparisons were made of the airflow values for the utterances in different prosodic contexts grouped by phrasal context (syllable-final, word-final, or sentence-final) rather than by voicing category. The patterns for maximum flow and mean flow were similar; the data for maximum flow (see footnote 2) are given in Fig. 8. For Speaker 1, the flow is lower for /z/ than for /s/ in each of the three contexts. Since all the sentence-final /z/’s were completely devoiced, but some of the syllable-final ones were fully voiced, it can be concluded that for Speaker 1, regardless of the amount of voicing in the /z/, the airflow for /z/’s is lower than that of /s/’s produced in the same context. For this speaker, there is no neutralization between /z/ and /s/. On the other hand, Speaker 3 showed lower airflow for /z/ than for /s/ in word- or syllable-final position, as expected, but surprisingly showed relatively high flow for both /z/ and /s/ in sentence-final position. This high airflow suggests that the speaker may be neutralizing the /z/–/s/ contrast in this one position. It may be that this speaker is opening the vocal folds at the end of the sentence in anticipation of the open position of the glottis that is typical of respiration during a pause.
The high airflow in the sentence-final position appears to be a special case of an assimilatory process, different from the examples of assimilation to adjacent voiceless sounds that were discussed earlier. The pattern in the flow data for Speaker 4 is almost the opposite of that for Speaker 3. For Speaker 4, airflow is lower, as expected, in the sentence-final and word-final /z/’s than in the corresponding /s/’s, but the airflow in the syllable-final /z/’s is almost as high as in the syllable-final /s/’s. Speaker 4 always completely devoiced these syllable-final /z/’s in the word “husband”. The high airflow for these devoiced /z/’s suggests that, as noted earlier, this speaker has a different lexical form for this word, so that it contains an /s/ rather than the /z/ that might be expected. Such lexical variation is found in English in other words, such as the two pronunciations “ab[s]urd” and “ab[z]urd”. The airflow data for Speaker 2 were similar to those for Speaker 4, but the high airflow for syllable-final /z/ is difficult to explain for Speaker 2 since her /z/’s were fully voiced in this utterance.

3.5. Summary

Both local and sentence-level contexts have been shown to contribute to devoicing, such that /z/’s in different contexts may devoice for different reasons. It is also possible that more than one factor may encourage devoicing in a single context. Some possible causes for devoicing that have been suggested by the data described above are listed in Table V, and are discussed further in the following section.

2 The data for mean flow showed the same patterns as maximum flow except that in Speaker 4’s syllable-final tokens, /z/ had slightly higher mean airflow than /s/.


Figure 8. Maximum airflow during /z/ and /s/ in different prosodic positions: sentence-final “pause”, word-final “pause before”, and syllable-final “husband”.


TABLE V. Summary of factors that may explain the occurrence of devoicing in different contexts

Cause of devoicing: glottal abduction, transglottal flow insufficient for vocal fold vibration

| Why | Where |
|---|---|
| Anticipation of glottal opening | Pre-pausal (sentence-final), before voiceless sound |
| Reduced glottal adduction gesture | In contexts where reduction is likely |

Cause of devoicing: too little transglottal pressure difference

| Why | Where |
|---|---|
| Lack of expansion of supralaryngeal part of vocal tract, leading to increased oral pressure | Potentially any context |
| Tightening of oral constriction, leading to increased oral pressure | Potentially any context |
| Declination of subglottal pressure over an entire utterance | End of prosodic group, before emphatic stress |

4. Discussion

Previous studies such as those by Docherty (1992), Stevens et al. (1992) and Haggard (1978) have documented the influence of segmental context on the likelihood of devoicing in English fricatives, but did not investigate as wide a range of contextual influences as the present study, which considered stress and position in the utterance as well as segmental context. All studies concur that an adjacent voiceless sound is a strong indicator for devoicing in a phonologically voiced fricative. In the present study, the likelihood of devoicing increased if there was a voiceless context on either side of the /z/. It did not appear that a /z/ caused any adjacent voiceless sounds to become voiced, although no measurements of this were made, nor did /s/’s become voiced when adjacent to voiced stops or vowels. These findings parallel those of Stevens et al. (1992) that in fricative clusters in English, it was more likely that a phonologically voiced fricative would devoice when adjacent to a voiceless fricative than that a phonologically voiceless fricative would become voiced when adjacent to a voiced fricative.

With regard to the effect of position within the utterance, the results indicate a substantially greater likelihood of devoicing for /z/ in final position in the sentence than anywhere in the middle of the sentence. No data were collected in this experiment for utterance-initial /z/; however, since word-initial /z/’s were seldom devoiced, it seems likely that utterance-initial /z/’s would also be unlikely to devoice. This expectation follows the results of Docherty (1992); in his data, utterance-initial /z/’s had voicing during an average of 85% of their medial phase, whereas utterance-final /z/’s were voiced during an average of 14% of their medial phase (in these data, the “utterance” is a single word spoken in isolation). This study adds to the literature on devoicing by directly comparing devoicing in syllable-, word-, and sentence-final fricatives.
Although the size of the data set is modest, it appears that these differences in position have a strong effect on the likelihood of devoicing. This set of comparisons was made with the goal of extending previous findings relating to the effect of prosodic domains on articulatory movements. Several studies have shown evidence of larger movements at the beginning of prosodic domains (for


glottal opening, Pierrehumbert & Talkin, 1992; for linguopalatal contact, Byrd, 1996b; Fougeron & Keating, 1996, 1997), or conversely, that movements are reduced in extent towards the end of a domain (for jaw opening, Vayra & Fowler, 1992; for velum raising, Krakow, Bell-Berti & Wang, 1995; for tongue movement, Browman & Goldstein, 1995). The larger glottal opening for /h/ observed in phrase-initial position (Pierrehumbert & Talkin, 1992) accentuates the gesture that distinguishes /h/ from other sounds; more contact of the tongue to the alveolar region in syllable onsets (Byrd, 1996b) accentuates a gesture that distinguishes alveolar consonants from others. (Gesture is being used here in the sense of coordinated movements of the articulators that are marshaled to the goal of producing a linguistic contrast (Browman & Goldstein, 1986; Saltzman & Munhall, 1989).) All of these studies concur in finding that in initial position, the articulatory movements produce a more fully realized gesture than they do in final position. The tendency for greater gestural magnitude has been shown to increase at the beginning of larger domains (e.g., more glottalization at the beginning of full rather than intermediate intonational phrases (Dilley, Shattuck-Hufnagel & Ostendorf, 1996), more contact of tongue and palate at the onset of large domains (Fougeron & Keating, 1997)). The converse of this pattern of more initial strengthening in larger domains would be the hypothesis that gestural magnitude reduces more at the end of larger prosodic domains. This hypothesis predicts more lenition at the end of larger domains, such as an intonational phrase or sentence. Gestures would be less fully realized at the end of a larger domain (the sentence) than at the end of a smaller domain (the word). For /z/, this hypothesis predicts reduced magnitude at the end of larger domains for the lingual, laryngeal, or aerodynamic gestures.
The term “aerodynamic gesture” is used as by McGowan & Saltzman (1995) to refer to certain aerodynamic components that can be controlled by the speaker. In their model, these components are subglottal pressure and “transglottal pressure” drop; the latter includes all maneuvers directed at maintaining voicing in voiced obstruents (e.g., oral cavity expansion, larynx lowering, incomplete velopharyngeal closure). A reduction in the lingual gesture would mean a looser alveolar constriction: since frication was never absent in the fricatives in this study, it appears that the speakers did not reduce lingual gestures by any substantial amount. For a voiced sound, reduction in the laryngeal gesture would imply less glottal adduction, that is, a more open glottis. This is the mechanism that was suggested as the cause of devoicing in contexts where the /z/ is assimilating to an adjacent voiceless sound. (Note that in Articulatory Phonology (Browman & Goldstein, 1986), there is a glottal abduction gesture but not a separate glottal adduction gesture. In McGowan & Saltzman (1995), the modeled variable is glottal width.) So a less fully realized glottal adduction gesture represents a form of reduction in the case of voiced sounds such as /z/. The remaining types of gesture that could be subject to reduction are the aerodynamic gestures: subglottal pressure and the transglottal pressure difference (McGowan & Saltzman, 1995). Since subglottal pressure declines gradually over the course of a sentence except for small local perturbations (Gelfer, Harris & Baer, 1987), the only context in which reduced subglottal pressure is a likely explanation for loss of voicing is at the end of a sentence. A reduction in the transglottal pressure drop, on the other hand, is almost inevitable in the production of voiced obstruents without active or passive maneuvers directed at prolonging voicing (e.g., Ohala & Riordan, 1979; Westbury, 1983). Thus, it is a likely explanation for devoicing. 
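The reasoning about the transglottal pressure drop can be illustrated with a toy calculation. The sketch below is not the McGowan & Saltzman (1995) model and uses no measurements from this study. It simply assumes that intraoral pressure rises exponentially toward a steady-state value during the obstruent, that voicing ceases once the pressure drop across the glottis falls below a minimum value, and that cavity expansion can be represented as a longer time constant. The function name and all numeric values are hypothetical.

```python
import math


def time_to_devoicing(p_sub, p_oral_steady, dp_min, tau):
    """Time (s) at which the transglottal pressure drop falls below dp_min.

    Toy model (an assumption made for illustration only):
        p_oral(t)  = p_oral_steady * (1 - exp(-t / tau))
        delta_p(t) = p_sub - p_oral(t)
    Returns math.inf if the pressure drop never falls below dp_min,
    and 0.0 if it is already too small at the start of the constriction.
    """
    if p_sub <= dp_min:
        return 0.0
    if p_sub - p_oral_steady >= dp_min:
        return math.inf
    fraction = (p_sub - dp_min) / p_oral_steady
    return -tau * math.log(1.0 - fraction)


# Hypothetical pressures in cm H2O; hypothetical time constants in seconds.
P_SUB, P_ORAL_STEADY, DP_MIN = 8.0, 7.0, 2.0

no_expansion = time_to_devoicing(P_SUB, P_ORAL_STEADY, DP_MIN, tau=0.03)
with_expansion = time_to_devoicing(P_SUB, P_ORAL_STEADY, DP_MIN, tau=0.06)

print(f"voicing lasts {no_expansion * 1000:.0f} ms without expansion")
print(f"voicing lasts {with_expansion * 1000:.0f} ms with expansion")
```

Under these assumptions, slowing the rise of intraoral pressure prolongs voicing in direct proportion to the time constant, which is one way to picture why omitting cavity expansion maneuvers makes devoicing more likely.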
An absence of cavity expansion maneuvers simplifies the articulation of a voiced obstruent. Because it reduces the complexity of articulation,

devoicing qualifies as a form of lenition. Further discussion of the interpretation of a reduction in transglottal pressure follows in Section 4.1. In summary, there appears to be evidence in this study for reduction of laryngeal and aerodynamic gestures, but not the lingual gesture, in the production of the devoiced /z/’s. Devoicing can thus be interpreted as a form of lenition, since it results from gestures being formed less completely in certain contexts than in others. Lenition of consonants is most often considered to involve a change from voiceless to voiced (Lass, 1984), especially intervocalically as assimilation to the voicing of neighboring vowels. However, a reduction in magnitude of certain gestures can result in a change from voiced to not voiced. The comparisons between /s/ and /z/ given in this paper demonstrate crucially that the devoicing phenomenon analyzed here is not a change from voiced with adducted glottis to voiceless with abducted glottis. Rather, the prosodically conditioned devoiced /z/’s represent a third state, neither voiced nor voiceless, that results from the lenition processes discussed above.

4.1. Mechanisms for devoicing

As discussed in the introduction, voiced fricatives present a particularly exigent set of demands on the vocal tract. Because they require precise conditions to produce both voicing and frication simultaneously, a comparatively small divergence from these conditions is more likely to result in a salient difference from the “default” characteristics of a voiced fricative than would be the case for some other sound. In the present data, divergence from the canonical form of /z/ always showed up as devoicing, rather than loss of frication. The tendency to devoice can be explained in part by evidence suggesting that the glottis is always somewhat open during voiced fricatives, or at least more open than for voiced stops. 
Such evidence comes from studies using transillumination to examine glottal opening (Lisker, Abramson, Cooper & Schvey, 1969), as well as EMG data showing more suppression of the adductory interarytenoid and lateral cricoarytenoid muscles in word-medial voiced fricatives than in voiced stops (Hirose & Ushijima, 1978). The glottis may open wider in anticipation of a following voiceless sound; this can be viewed as a simplification of the articulatory demands of the utterance, somewhat as reduction in gestural magnitude represents a relaxation from the “ideal” production. In the vast majority of tokens in this experiment (279 of a total of 322 devoiced /z/’s), the devoicing was at the end of the /z/; that is, at the beginning of frication the vocal folds were vibrating, but the vibration ceased before the end of frication. In a subset of these tokens the /z/ occurred before a voiceless sound, so if the glottis is indeed opening somewhat during the /z/, this opening does not represent a separate additional gesture, merely an anticipation of the opening gesture that would otherwise happen at the end of the fricative. This kind of anticipation can be interpreted as a change in the coordination of the glottal and oral gestures: the glottis is opening before the tongue releases its constriction for /z/, rather than simultaneously with the constriction release. Changing the relative timing of two gestures is a common process in “casual speech” as discussed by Browman & Goldstein (1990): no additional gesture is necessary to arrive at a devoiced /z/, just a change in timing of two of the gestures involved. If the usual state of the glottis for voiced fricatives is more open than for other voiced sounds (cf. Klatt, Stevens & Mead, 1968), maintaining sufficient transglottal flow to permit vocal fold vibration will require greater airflow from the lungs than for sounds

produced with a more closed glottis, so vocal fold vibration may fail more often. In addition, just a small additional opening of the glottis could lead to devoicing. Laver (1994) argues that when sounds such as /z/ are devoiced, the glottis is probably in a state intermediate between voicing and voicelessness, like the state of the glottis that is used in whisper, with the glottis open but the folds very close together. For devoicing to occur part way through the /z/, as was the case in most of the devoiced tokens in this study, the vocal tract must be changing from a state that is conducive to voicing to a state that is not conducive to voicing. If a slightly open glottis is the normal laryngeal position for voiced fricatives (cf. Klatt et al., 1968), a small drop in volume velocity of airflow at the glottis could be sufficient to end voicing, since a more open glottis will require greater volume velocity of flow to maintain vocal fold vibration. Even if the glottis is not more open in voiced fricatives than in other voiced sounds, a reduction in transglottal airflow could cause a fricative to devoice. All other things being equal, a decrease in flow from the lungs would lead to lower transglottal flow, potentially falling below the level necessary to maintain voicing. The volume velocity of airflow required by modal voicing is high enough that frication may be possible even if voicing fails because of inadequate flow. For “chest voice” the volume velocity of flow is at least 50 cm³/s, according to Catford (1977, p. 100), with higher pitch phonation requiring higher volume velocity. The critical volume velocity for turbulence in /s/ (and /z/, assuming they share the same oral articulation) is relatively low, also about 50 cm³/s according to Catford (1977), because the oral constriction is narrow. 
It appears to be possible for transglottal flow to be inadequate for voicing, but frication to be still possible, even with the vocal folds approximated in a position suitable for voicing. This set of circumstances would occur in a context where flow from the lungs decreases, for example at the end of a sentence where subglottal pressure is lower (Gelfer, Harris, Collier & Baer, 1983). At least two sources for devoicing in /z/ have been identified here: one is glottal abduction, the other is a transglottal pressure difference that is insufficient even with the vocal folds adequately adducted. Both of these can arise from effects of the context in which the fricative is produced. Voicing will cease if there is assimilation to a neighboring sound that requires an open glottis. The pressure difference across the glottis may become insufficient if the speaker makes no active maneuvers directed to oral cavity expansion, which is most likely if the fricative is produced in a position that favors lenition. In other cases there may not be such a clear influence of context, and devoicing may just be due to a particular combination of articulatory and aerodynamic conditions. While much of the variability in the likelihood of devoicing can be accounted for by special contextual influences, devoicing in /z/ is nonetheless a process best described in probabilistic terms: it is more or less likely rather than possible or impossible.

4.2. Devoicing at different times during the fricative

With the exception of Laver (1994), few previous discussions of fricative devoicing have considered the possibility of devoicing at the beginning or in the middle of the fricative rather than at the end. However, all four speakers in the present experiment produced at least a few tokens of /z/ with devoiced portions at the beginning or in the middle. Contrary to the suggestion of Laver (1994), there was no discernible trend for

word-initial fricatives to have devoicing at the beginning and word-final fricatives at the end. Devoicing at the beginning was observed in /z/’s in word-initial, syllable-initial word-medial, and word-final positions. Devoicing at the end occurred in the full range of positions tested in the experiment. Although devoicing at the end of /z/ was by far the most frequent pattern, in 43 tokens the devoiced portion of the /z/ was either at the beginning or in the middle of the acoustic frication. Given that assimilation to the voicing of adjacent sounds is an important determiner for presence or absence of voicing in /z/, devoicing at the beginning of the /z/ would be expected when the preceding sound is voiceless; assimilation would also suggest that the vocal folds are more likely to be vibrating at the end of the /z/ if the following sound is voiced. These predictions are largely correct, assuming that voiced stops in English often lack voicing during the closure (Lisker & Abramson, 1964). There were 18 tokens of /z/ with devoicing at the beginning. In these tokens the absence of voicing is presumably due to hysteresis: the threshold of transglottal pressure drop required to initiate voicing is higher than that required to sustain voicing already in progress (Lucero, 1995). The 18 tokens came from four different utterances, all but one of which had a voiced stop preceding the /z/ and a vowel following. Speaker 3 accounted for 14 of these 18 tokens, Speakers 1 and 4 for two each. Speaker 3’s propensity for this pattern of devoicing may be due to a more consistent lack of vocal fold vibration during voiced stops. (Some speakers of English use both prevoicing and short lag VOT in voiced stops; other speakers regularly choose one or the other VOT pattern (Docherty, 1992).) Devoicing at the beginning of the fricative may simply be a consequence of an individual speaker’s choice of VOT for voiced stops. 
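The hysteresis argument can be made concrete with a small simulation. The following is a minimal sketch, assuming only the qualitative property attributed above to Lucero (1995): the pressure drop needed to start voicing exceeds the one needed to keep it going. The function name, the sampled pressure values, and both thresholds are hypothetical and chosen purely for illustration.

```python
def voicing_track(delta_p, onset_threshold, sustain_threshold,
                  initially_voiced=False):
    """Return a list of booleans: is the token voiced at each sample?

    delta_p           -- sequence of transglottal pressure drops
    onset_threshold   -- drop needed to START vocal fold vibration
    sustain_threshold -- drop needed to KEEP vibration going
    Hysteresis means sustain_threshold <= onset_threshold.
    """
    if sustain_threshold > onset_threshold:
        raise ValueError("sustain threshold must not exceed onset threshold")
    voiced = initially_voiced
    track = []
    for dp in delta_p:
        # The threshold that applies depends on the current state.
        threshold = sustain_threshold if voiced else onset_threshold
        voiced = dp >= threshold
        track.append(voiced)
    return track


# Hypothetical pressure drops (arbitrary units) across one fricative.
pressure = [2.5, 2.5, 3.5, 2.5, 2.5, 1.5]

# Preceded by an unvoiced stop closure: voicing has to be initiated.
after_unvoiced_closure = voicing_track(pressure, 3.0, 2.0)

# Preceded by a vowel: voicing is already in progress and is only sustained.
after_vowel = voicing_track(pressure, 3.0, 2.0, initially_voiced=True)

print(after_unvoiced_closure)
print(after_vowel)
```

With identical pressure values, the token that starts from an unvoiced state lacks voicing in its first samples, while the token that starts from a voiced state is voiced from the outset. This mirrors the observation that initial devoicing mostly followed voiced stops produced without vocal fold vibration.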
Devoicing in the middle of the fricative was occasionally produced by all four speakers, and occurred in initial and final syllable and word positions. Speaker 1 produced 2 of these tokens, Speaker 2 produced 7, and Speakers 3 and 4 produced 8 tokens each. There were also a total of 4 tokens in which voicing ceased briefly in the middle of the /z/ for less than 10% of frication duration so that the token was categorized as fully voiced. The utterances in which speakers devoiced the middle of the /z/ occurred with a variety of preceding contexts, and included the four utterances in which devoicing was found at the beginning of the /z/. The medially devoiced /z/’s were all followed by either vowels or in one case [1]. Devoicing in the middle of a /z/ (with voiced frication before and after the devoiced portion) seems incompatible with the proposed assimilatory mechanism, since there is no adjacent glottal opening gesture to instigate assimilation. The devoicing in these tokens must therefore have a different origin. The loss and re-initiation of voicing could result from an increase in the tightness of the oral constriction. If the tongue moved toward the hard surface of the vocal tract during the /z/, the oral constriction could narrow enough and increase intraoral pressure enough to stop the vocal folds from vibrating. As the tongue releases the oral constriction, intraoral pressure decreases and voicing begins anew. Thus, it appears that a change in pressure is likely to be the mechanism involved when there is devoicing in the middle of a /z/.
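The token counts reported in this section can be cross-checked, and the labeling of devoicing position sketched, as follows. The counts and the 10% criterion are taken from the text above. The `classify_token` function, its single-interval input format, and the "completely devoiced" label are assumptions made for illustration and are not a description of the procedure actually used in the study.

```python
def classify_token(fric_dur, devoiced_start, devoiced_end, min_fraction=0.10):
    """Label one /z/ token by where its single devoiced stretch falls.

    Times are measured from the start of frication, in the same units
    as fric_dur. A devoiced stretch shorter than min_fraction of the
    frication duration counts as fully voiced (10% criterion from the text).
    """
    devoiced_dur = devoiced_end - devoiced_start
    if devoiced_dur < min_fraction * fric_dur:
        return "fully voiced"
    at_start = devoiced_start <= 0
    at_end = devoiced_end >= fric_dur
    if at_start and at_end:
        return "completely devoiced"
    if at_start:
        return "devoiced at beginning"
    if at_end:
        return "devoiced at end"
    return "devoiced in middle"


# Counts reported in the text, keyed by speaker number.
beginning = {1: 2, 3: 14, 4: 2}
middle = {1: 2, 2: 7, 3: 8, 4: 8}
end_total = 279
devoiced_total = 322

n_beginning = sum(beginning.values())
n_middle = sum(middle.values())

print("beginning:", n_beginning)
print("middle:", n_middle)
print("beginning or middle:", n_beginning + n_middle)
print("all devoiced:", end_total + n_beginning + n_middle)
print("share devoiced at end:", round(end_total / devoiced_total, 3))
```

The per-speaker figures sum to 18 tokens with initial devoicing and 25 with medial devoicing, which together give the 43 tokens mentioned earlier; adding the 279 tokens devoiced at the end gives the reported total of 322.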

5. Conclusions

In American English, a speaker may devoice /z/ in almost any environment, but the likelihood of devoicing varies greatly and depends on the preceding and following segmental context, as well as the position of the /z/ in the prosodic structure of the

utterance. Individual speakers also varied as to how often they devoiced. The rank ordering of contexts for likelihood of devoicing was fairly similar across the speakers. That is, certain contexts were more favorable for devoicing than others for most or all speakers. Further investigation will be necessary in order to identify other influences on the likelihood of devoicing, and whether any of the effects identified here are lexically idiosyncratic. Comparisons of acoustic durations and airflow measurements showed that even completely devoiced /z/’s did not become identical with /s/’s produced in similar contexts. The /z/’s, whether fully voiced, partially devoiced or completely devoiced, are characterized by lower airflow than /s/. The low airflow suggests that speakers are not opening the glottis wider in order to devoice the /z/; the devoicing processes seem to be characterized more by a reduction in effort than by the employment of an additional glottal opening gesture, even in the case of assimilation to an adjoining voiceless consonant. The pattern of reduction in weak prosodic environments suggests that in these environments there is less need for the speaker to produce maximal distinctions between /z/ and /s/; given the exacting requirements for a voiced /z/, a speaker may opt for a devoiced /z/ that is more economical of effort, since this can be achieved while still keeping a distinction with /s/ and thus satisfying the perceptual needs of the listener. In voiced sounds, the output of the vocal tract is a product of the interaction between aerodynamic conditions (subglottal pressure, volume of air expelled from the lungs) and the positions of the vocal folds and the articulator forming a supralaryngeal constriction (Bickley & Stevens, 1991). In devoiced sounds, the aerodynamics, the articulation, or both are different, so that the output of the vocal tract is altered. 
It has been proposed here that devoicing in /z/ may result from either an assimilatory process, in which the position of the vocal folds is more open (a change in articulation) or from a lenition process, in which the transglottal pressure drop or the volume velocity of airflow across the glottis is insufficient to maintain vocal fold vibration (a change in aerodynamics). A third possibility is that the articulatory movement for the supralaryngeal constriction causes a change in the aerodynamic conditions, resulting in devoicing. The present data do not determine which of these changes may occur together, but clearly there are limits to the extent that speakers diverge from an ‘ideal’ voiced /z/. Speakers of English seem to diverge only by losing voicing, not by losing frication. In addition, the airflow results show that speakers do not diverge so much that the devoiced /z/ acquires all the characteristics of /s/. The devoicing process examined here is a complex example of the kind of constrained variability that is typical of speech production. Despite the variability observed among speakers and in different contexts, devoicing is not a random process. It is a function of both segmental and prosodic structure, as well as the interplay between articulatory and aerodynamic conditions in the vocal tract. Our understanding of intricately-conditioned processes such as devoicing will be advanced by more investigation of speech produced in communicative contexts where all the conditioning factors can come into play.

This work was supported by a postdoctoral fellowship from NIH grant DC00008 to the UCLA Division of Head & Neck Surgery. Thanks to Bruce Gerratt and Jody Kreiman for guidance with data collection and analysis. I would also like to thank Patrice Beddor, Dani Byrd, Gerry Docherty, Ian Maddieson and an anonymous reviewer for helpful comments on earlier drafts of this paper. (They are, of course, not responsible for remaining flaws.) 
Special thanks to the speakers in the experiment for their willing participation, and the UCLA Phonetics Laboratory for the use of their facilities.

References

Askenfelt, A., Gauffin, J. & Sundberg, J. (1980) A comparison of contact microphone and electroglottograph for the measurement of vocal fundamental frequency, Journal of Speech and Hearing Research, 23, 258–273
Baken, R. (1987) Clinical measurement of speech and voice. Boston: Little, Brown & Co.
Balise, R. & Diehl, R. (1994) Some distributional facts about fricatives and a perceptual explanation, Phonetica, 51, 99–110
Baum, S. & Blumstein, S. (1987) Preliminary observations on the use of duration as a cue to syllable-initial fricative consonant voicing in English, Journal of the Acoustical Society of America, 82, 1073–1077
Bickley, C. & Stevens, K. (1991) Effects of a vocal tract constriction on the glottal source: data from voiced consonants. In Laryngeal function in phonation and respiration (T. Baer, C. Sasaki & K. S. Harris, editors), pp. 239–253. San Diego: Singular Publishing
Browman, C. & Goldstein, L. (1986) Toward an articulatory phonology, Phonology Yearbook, 3, 219–252
Browman, C. & Goldstein, L. (1990) Tiers in articulatory phonology, with some implications for casual speech. In Papers in laboratory phonology I: between the grammar and the physics of speech (J. Kingston & M. E. Beckman, editors), pp. 341–376. Cambridge: Cambridge University Press
Browman, C. & Goldstein, L. (1995) Gestural syllable position effects in American English. In Producing speech: contemporary issues: for Katherine Safford Harris (F. Bell-Berti & L. Raphael, editors), pp. 19–33. Woodbury, NY: AIP Press
Byrd, D. (1996a) A phase window framework for articulatory timing, Phonology, 13, 139–169
Byrd, D. (1996b) Influences on articulatory timing in consonant sequences, Journal of Phonetics, 24, 209–244
Catford, J. C. (1977) Fundamental problems in phonetics. Bloomington: Indiana University Press
Colton, R. & Conture, E. (1990) Problems and pitfalls of electroglottography, Journal of Voice, 4, 10–24
Dilley, L., Shattuck-Hufnagel, S. & Ostendorf, M. (1996) Glottalization of word-initial vowels as a function of prosodic structure, Journal of Phonetics, 24, 423–444
Dinnsen, D. & Charles-Luce, J. (1984) Phonological neutralization, phonetic implementation and individual differences, Journal of Phonetics, 12, 49–60
Docherty, G. J. (1992) The timing of voicing in British English obstruents. Netherlands Phonetics Archives, 9, Berlin: Foris
Flege, J. E. (1982) Laryngeal timing and phonation onset in utterance-initial English stops, Journal of Phonetics, 10, 177–192
Fougeron, C. & Keating, P. A. (1996) Variations in velic and lingual articulation depending on prosodic position: results for 2 French speakers, UCLA Working Papers in Phonetics, 92, 88–96
Fougeron, C. & Keating, P. A. (1997) Articulatory strengthening at edges of prosodic domains, Journal of the Acoustical Society of America, 101, 3728–3740
Gelfer, C. E., Harris, K. S., Collier, R. & Baer, T. (1983) Is declination actively controlled? In Vocal Fold Physiology (I. Titze & R. S. Scherer, editors). Denver: The Denver Center for the Performing Arts
Gelfer, C. E., Harris, K. S. & Baer, T. (1987) Controlled variables in sentence intonation. In Laryngeal function in phonation and respiration (T. Baer, C. Sasaki & K. S. Harris, editors). Boston: College-Hill
Haggard, M. (1978) The devoicing of voiced fricatives, Journal of Phonetics, 6, 95–102
Hardcastle, W. J. & Clark, J. E. (1981) Articulatory, aerodynamic and acoustic properties of lingual fricatives in English, Work in Progress, Phonetics Laboratory, University of Reading, 3, 51–79
Hirose, H. & Ushijima, T. (1978) Laryngeal control for voicing distinction in Japanese consonant production, Phonetica, 35, 1–10
Isshiki, N. & Ringel, R. (1964) Airflow during the production of selected consonants, Journal of Speech and Hearing Research, 7, 233–244
Jongman, A. (1989) Duration of frication noise required for identification of English fricatives, Journal of the Acoustical Society of America, 85, 1718–1725
Klatt, D., Stevens, K. N. & Mead, J. (1968) Studies of articulatory activity and airflow during speech. In Sound production in man; annals of the New York Academy of Sciences, 155 (A. Bouhuys, editor), pp. 42–55. New York: New York Academy of Sciences
Krakow, R. A., Bell-Berti, F. & Wang, Q. (1995) Supralaryngeal declination: evidence from the velum. In Producing speech: contemporary issues: for Katherine Safford Harris (F. Bell-Berti & L. Raphael, editors), pp. 333–353. Woodbury, NY: AIP Press
Lass, R. (1984) Phonology. Cambridge: Cambridge University Press
Laver, J. (1994) Principles of Phonetics. Cambridge: Cambridge University Press
Lisker, L. & Abramson, A. (1964) A cross-language study of voicing in initial stops: acoustical measurements, Word, 20, 384–422
Lisker, L. & Abramson, A. (1967) Some effects of context on voice onset time in English stops. Language and Speech, 10, 1–28
Lisker, L., Abramson, A., Cooper, F. & Schvey, M. (1969) Transillumination of the larynx in running speech, Journal of the Acoustical Society of America, 45, 1544–1546

Löfqvist, A. & McGowan, R. S. (1992) Influence of consonantal environment on voice source aerodynamics, Journal of Phonetics, 20, 93–110
Lucero, J. (1995) The minimum lung pressure to sustain vocal fold oscillation, Journal of the Acoustical Society of America, 98, 779–784
McGowan, R. S. & Saltzman, E. L. (1995) Incorporating aerodynamic and laryngeal components into task dynamics, Journal of Phonetics, 23, 255–269
Ohala, J. J. (1983) The origin of sound patterns in vocal tract constraints. In The production of speech (P. MacNeilage, editor), pp. 189–216. New York: Springer-Verlag
Ohala, J. J. & Riordan, C. (1979) Passive vocal tract enlargement during voiced stops. In Speech communication papers (J. J. Wolf & D. H. Klatt, editors), pp. 89–92. New York: Acoustical Society of America
Pierrehumbert, J. & Talkin, D. (1992) Lenition of /h/ and glottal stop. In Papers in laboratory phonology II: gesture, segment, prosody (G. J. Docherty & D. R. Ladd, editors), pp. 90–117. Cambridge: Cambridge University Press
Port, R. & Crawford, P. (1989) Incomplete neutralization and pragmatics in German, Journal of Phonetics, 17, 257–282
Port, R. & O’Dell, M. (1985) Neutralization of syllable-final voicing in German, Journal of Phonetics, 13, 455–471
Raphael, L. (1972) Preceding vowel duration as a cue to the perception of the voicing characteristic of word-final consonants in American English, Journal of the Acoustical Society of America, 51, 1296–1303
Rietveld, T. & van Hout, R. (1993) Statistical Techniques for the Study of Language and Language Behavior. The Hague: Mouton
Saltzman, E. & Munhall, K. (1989) A dynamical approach to gestural patterning in speech production, Ecological Psychology, 1, 333–382
Scully, C. (1971) A comparison of /s/ and /z/ for an English speaker, Language and Speech, 14, 187–200
Soli, S. (1982) Structure and duration of vowels together specify fricative voicing, Journal of the Acoustical Society of America, 72, 366–378
Stevens, K., Blumstein, S., Glicksman, S., Burton, M. & Kurowski, K. (1992) Acoustic and perceptual characteristics of voicing in fricatives and fricative clusters, Journal of the Acoustical Society of America, 91, 2979–3000
Vayra, M. & Fowler, C. (1992) Declination of supralaryngeal gestures in spoken Italian, Phonetica, 49, 48–60
Veatch, T. (1989) Word-final devoicing of fricatives in English. Paper presented at the Linguistic Society of America meeting, Washington, DC
Westbury, J. (1983) Enlargement of the supraglottal cavity and its relation to stop consonant voicing, Journal of the Acoustical Society of America, 91, 2903–2910
