JOURNAL OF MEMORY AND LANGUAGE 25, 369-384 (1986)

Metrical Phonology in Speech Production

WILLIAM E. COOPER AND STEPHEN J. EADY

University of Iowa
The theory of metrical phonology has been devised in an attempt to capture a variety of aspects of rhythmic patterns in speech. To date, however, very few of the theory's claims have been tested empirically. In this study, assertions made by E. O. Selkirk (1984, Phonology and Syntax: The Relation between Sound and Structure, Cambridge, MA: MIT Press) and by B. Hayes (1984, Linguistic Inquiry, Vol. 15, pp. 33-74) are tested in five experiments on English speech production. In each experiment, a different group of speakers produced short phrases or sentences containing key words that may undergo stress retraction, depending on the stress pattern of the following word. The stress patterns of the key words were assessed by perceptual evaluation and by acoustical analysis of fundamental frequency and segmental duration. The tests uniformly fail to support claims about stress clash and retraction in English, whereas other results accord with previous acoustical studies. © 1986 Academic Press, Inc.
This work was supported by NIH Grant NS 20071 and by a Fulbright fellowship. We thank members of the Departamento de Linguistica, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil, for their hospitality and for arranging for the first author to teach a graduate course covering some of this material. We thank Pamela Mueller, Lori Nelson, and Mary McNulty for assistance with data collection, Bryant Julstrom for computer programming, and Professors Robert Crowder, Morris Halle, Neal Johnson, and Ilse Lehiste for valuable suggestions. Portions of this work were presented at the December 1985 annual meeting of the Linguistic Society of America in Seattle, Washington. Requests for reprints should be addressed to Professor William E. Cooper, Department of Psychology, Spence Laboratories of Psychology, University of Iowa, Iowa City, IA 52242.

0749-596X/86 $3.00. Copyright © 1986 by Academic Press, Inc. All rights of reproduction in any form reserved.

The phonological component of linguistic theory has undergone a revolution of sorts since the mid-1970s, when it was proposed that the theory of metrical patterns used in the study of music (Cooper and Meyer, 1960) might serve as a basis for capturing a number of generalizations about speech intonation (Liberman, 1975; Liberman & Prince, 1977). In recent years various versions of metrical theory have been developed (e.g., Hayes, 1982, 1983, 1984; Halle & Vergnaud, 1979, 1980, book in preparation; Kiparsky, 1979, 1982; Prince, 1980, 1983; Selkirk, 1984). The theory has also begun to receive attention in psychological approaches to the study of rhythmic behavior (e.g., Shaffer, 1984). In a psychological vein, metrical theory can be viewed as a candidate theory of the speaker/hearer's mental representation of a variety of intonational phenomena involving word- and phrase-level stress patterns. As in earlier psychological work on such phenomena (e.g., Martin, 1972), the metrical approach embodies the claim that the mental representations of stress patterns are organized hierarchically rather than serially. Unlike earlier treatments, however, metrical theory is designed to account for a much broader range of phenomena and includes a more elaborate and highly abstract framework to accomplish this aim. According to the tenets of metrical phonology, the phonological representation of the grammar needs to be considerably enriched beyond that envisioned by Chomsky and Halle (1968). As a consequence, many phonological effects that were formerly seen as directly under the influence of syntax are now seen as only indirectly so influenced.

Notationally, the new theory makes heavy use of metrical grids, two-dimensional arrays containing horizontal levels on which x's mark abstract beats. The beat is the basic unit of abstract meter. In order for meter to exist, some beats must be accented while others are not (Cooper & Meyer, 1960). In metrical phonology, accented beats are termed "strong" beats and unaccented beats "weak" beats. As for the geometry of the two-dimensional grids on which beats are represented, the horizontal axis is typically defined in terms of successive syllables or words, depending on whether the grid delineates word- or phrase-level stress patterns, respectively. The vertical axis refers to abstract levels of stress assignment, postulated by the theory to account for a variety of phenomena that are not well handled by simpler theoretical apparatus, including the simple observation that there appears to be no upper bound on the number of degrees of stress that can be distinguished in a language (Halle and Vergnaud, book in preparation; Prince, 1983). As shown in the example below, this form of array permits the representation of hierarchically arranged beats, subject to the constraint that an x on a higher row must always coincide with one on a lower row, but not vice versa:
[Figure: example metrical grid, with stacked rows of x's marking beats]
At a given row, the metrical grid notation permits representation of strong and weak beats, with strong beats denoted by x's. The notation also provides a means of representing a variety of relative strengths when the entire array is considered. Thus, for example, the primary, or strongest, stress is represented by the column containing the most x's. A major challenge is to formulate a set of constraints on the form of metrical grids, and on rules of grid alteration, that capture major rhythmic regularities and at the same time prohibit unallowable sequences.

Selkirk (1984), for example, proposes rules that add, move, or delete beats, adopting phonological "transformations" and the appropriate formalisms of Structural Description and Structural Change, as in generative syntax. Generally, these transformations operate to enhance the rhythmic euphony of the metrical pattern. Euphony is optimized, according to Selkirk, when the pattern includes an alternation of strong and weak beats at all levels in the grid. A major function of the beat transformation rules is to avoid adjacent strong beats at a given level and, similarly, to avoid long sequences of weak beats. Hayes (1984) proposes a specific rule of grid euphony along somewhat different lines, specifying four syllables as the optimal spacing between stresses; this rule is assessed in Experiments 3 and 4.

In addition to metrical grids, Hayes (1984) has argued that metrical trees are also required as a notational device to capture important relations of phrasal prominence. Selkirk (1984) and Prince (1983) argue that metrical grids alone are sufficient to account for all the regularities.

The arguments used in support of different versions of metrical theory are usually quite complex and elaborate. Yet these arguments often rest on judgments of prominence whose subtlety calls for systematic, objective testing. For example, in arguing for the cyclical treatment of a rule of beat movement, Selkirk cites a complex example, attributed to Janet Pierrehumbert, involving the following phrase, with the presumed stress pattern indicated:

1        4     2       4      3
Alewife  Brook Parkway subway station

Most relevant to the claim is the displacement of secondary stress from Parkway to Alewife, which simply does not occur in our own rendering of this phrase. More to the point, such claims require perceptual judgments and acoustical data from nonpartisan observers.
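To make the grid notation concrete, here is a minimal Python sketch. It is our own illustration, not a formalism taken from Selkirk or Hayes. It builds a grid from column heights, checks the constraint that every x on a higher row rests on an x directly below it, locates primary stress as the tallest column, and flags a simplified clash (two neighbouring columns marked at the same level). The level assignments for thirteen men (level 0 syllables, 1 stressed syllables, 2 word stress, 3 phrasal stress) and the retracted variant are assumptions chosen for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: grid[level][col] is True when column `col`
# (one syllable) carries a beat mark (an x) at that level; level 0 is the bottom row.
Grid = List[List[bool]]


def from_heights(heights: List[int]) -> Grid:
    """Build a grid from column heights (how many x's are stacked on each syllable)."""
    top = max(heights)
    return [[h > level for h in heights] for level in range(top)]


def is_well_formed(grid: Grid) -> bool:
    """Every x on a higher row must sit on an x in the row directly below it."""
    for level in range(1, len(grid)):
        for col, mark in enumerate(grid[level]):
            if mark and not grid[level - 1][col]:
                return False
    return True


def primary_stress(grid: Grid) -> int:
    """Index of the column holding the most x's (the first such column on a tie)."""
    heights = [sum(row[col] for row in grid) for col in range(len(grid[0]))]
    return heights.index(max(heights))


def clashes(grid: Grid, level: int) -> List[Tuple[int, int]]:
    """Neighbouring columns that both carry an x at `level` (a simplified clash test)."""
    row = grid[level]
    return [(i, i + 1) for i in range(len(row) - 1) if row[i] and row[i + 1]]


# Assumed heights for thir-teen men, before and after stress retraction.
before = from_heights([2, 3, 4])
after = from_heights([3, 2, 4])
print(is_well_formed(before), primary_stress(before), clashes(before, 2))
print(is_well_formed(after), primary_stress(after), clashes(after, 2))
```

In the first grid the sketch reports a level-2 clash between teen and men; once the level-2 beat moves from teen to thir, no level-2 clash remains, and primary stress stays on men in both cases.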
In examining the elaborate theoretical apparatus of metrical phonology, one is struck by the absence of empirical evidence bearing on some of the theory's most basic claims. For example, metrical theorists uniformly assume a stress contrast on the word thirteen when it appears alone versus when it appears in the phrase thirteen men. A rule of stress retraction is assumed to shift stress from the second syllable of thirteen to the first when the word is followed by a stressed syllable, thereby avoiding the stress "clash" that would otherwise be produced by two adjacent stressed syllables. The notion that metrical patterns prefer an alternation between stressed and unstressed beats is central to many versions of the theory. While the notion of stress clash seems intuitively plausible, it has not been subjected to much systematic testing. Earlier studies of isochrony (e.g., Lehiste, 1980) are characteristically cited as support for the notion, yet such studies seldom bear on the specific claims that distinguish this theory from earlier alternatives. In the first two experiments reported here, we examine the possible influence of stress clash on speech timing. In Experiments 3 and 4, Hayes' (1984) rule of quadrisyllabic meter is tested. Finally, Experiment 5 includes a basic test of the stress clash notion, motivated by the results of Experiments 1-4.

EXPERIMENT 1

According to Selkirk (1984), a stress clash can optionally be avoided by segmental lengthening and pausing in instances in which the stress pattern of the words themselves cannot be modified. Thus, for example, when two heavily stressed monosyllabic nouns appear in sequence, the first noun is likely to undergo lengthening, and a pause might appear or be elongated between the two words to alleviate the stress clash. Pause elongation would be most readily manifest at clause boundaries (see Experiment 2), where the mere presence of a pause is likely on syntactic grounds. While Selkirk's notion appears intuitively reasonable, she includes no empirical tests to evaluate it. In this experiment, we test Selkirk's notion by measuring the durations of words within the same phrase, contrasting clashing and nonclashing word pairs in carefully matched materials.

Methods

Sentence materials. Five pairs of sentences were constructed for use in this experiment. Each sentence contains the key word Chris, whose duration is measured. The sentences contain minimal contrasts in stress pattern: the A version of each pair includes a word like Mack after the key site, creating a stress clash, and the B version includes a similar word like Mackenzie, with main stress after the first syllable, presumably relieving the stress clash with the preceding monosyllabic word. The five sentence pairs appear below, with the A and B versions separated by a slash.

1. Tom introduced Chris Burn/Burninski to his sister.
2. Charlie went with Chris Mack/Mackenzie to see the movie.
3. John went to the baseball game with Chris Mullin/Molino and his brother.
4. Mary liked the sweater that Chris Rubin/Rubinski bought for her.
5. George took Chris Bell/Belinski to watch the Hawkeyes play against Indiana.
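The A/B contrast in these materials can be stated mechanically: a clash arises when the stressed monosyllable Chris is immediately followed by a stressed syllable. The short Python sketch below encodes each name's stress pattern as a string and tests the word boundary. The syllabifications and stress placements are our assumptions about how these names are normally pronounced, not measurements from the study.

```python
# Each word's stress pattern, one character per syllable: "1" = stressed, "0" = unstressed.
# These patterns are assumptions about the names, not data from the study.
KEY_WORD = "1"  # Chris is a stressed monosyllable

PAIRS = [  # (A word, A pattern, B word, B pattern)
    ("Burn", "1", "Burninski", "010"),
    ("Mack", "1", "Mackenzie", "010"),
    ("Mullin", "10", "Molino", "010"),
    ("Rubin", "10", "Rubinski", "010"),
    ("Bell", "1", "Belinski", "010"),
]


def clash_at_boundary(left: str, right: str) -> bool:
    """True when the last syllable of `left` and the first syllable of `right` are both stressed."""
    return left[-1] == "1" and right[0] == "1"


results = [
    (clash_at_boundary(KEY_WORD, a_pat), clash_at_boundary(KEY_WORD, b_pat))
    for _, a_pat, _, b_pat in PAIRS
]
for (a_word, _, b_word, _), (a_clash, b_clash) in zip(PAIRS, results):
    print(f"Chris {a_word}: clash={a_clash}   Chris {b_word}: clash={b_clash}")
```

Under these assumed patterns every A version is flagged as clashing and no B version is, which is the contrast the design relies on.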
In order to camouflage the minimal contrasts, these sentences were presented to speakers along with 36 others that were being tested in a different, unrelated experiment.

Speakers. Ten male undergraduates at the University of Iowa participated in this experiment as part of a course requirement in Introductory Psychology. The subjects were native speakers of American English with no speech or hearing impairments. All speakers had a Midwest dialect, and none had prior knowledge of the issues under examination.

Procedures. The speakers were tested individually in a quiet room. Each speaker was given a pseudorandomized list of the 46 sentences (10 target sentences and 36 fillers) and asked to read each one for recording. The speaker was told to first consider the meaning of each sentence and then to utter it aloud. If the speaker departed from his intended utterance in any way, he was asked to say "repeat" and to say the sentence again until it was produced satisfactorily. Aside from rare instances of speech errors, the first token of each sentence was adequate. Half of the speakers read the sentences in normal order; the other half were given the list in reverse order. The utterances were recorded on a reel-to-reel tape recorder at a recording speed of 7.5 in./s.

Acoustical analysis of the test utterances was performed on a digitized oscillographic trace of the speech wave using a PDP-11/23 computer and a high-resolution graphics terminal. The speech was first digitized at 10 kHz, and the waveform was displayed on the graphics screen. A computer-controlled cursor was then manipulated to demarcate the beginning and end of each key word, thereby providing a measure of duration. The onset of the key word Chris was defined as the point of plosive release for the initial /k/ consonant, and its end as the termination of frication for /s/. The duration measurements were estimated to be accurate to within 3 ms.

Results and Discussion
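The F' values reported for Experiments 1 and 2 are quasi-F ratios, which arise when, following Clark (1973), both speakers and sentences are treated as random effects. The minimal Python sketch below assumes the fully crossed design of this experiment (Version fixed; Speakers and Sentences random). Under that model the expected mean square for Version also contains the Version × Speaker and Version × Sentence interaction components, so the error term adds those two mean squares and subtracts the three-way interaction. The denominator degrees of freedom are approximated with the Satterthwaite formula, which we take to be the kind of procedure the Snedecor and Cochran citation refers to. The mean squares are invented placeholders, and Experiment 2, with sentences nested within key word, would need a different error term.

```python
def quasi_f(ms_v, ms_v_sp, ms_v_se, ms_v_sp_se, df_v_sp, df_v_se, df_v_sp_se, df_v=1):
    """Quasi-F for a fixed factor V crossed with random Speakers (sp) and Sentences (se).

    F' = MS_V / (MS_VxSp + MS_VxSe - MS_VxSpxSe). The denominator degrees of freedom
    use a Satterthwaite approximation for a linear combination of mean squares.
    """
    denom = ms_v_sp + ms_v_se - ms_v_sp_se
    if denom <= 0:
        raise ValueError("the combined denominator mean square must be positive")
    f_prime = ms_v / denom
    df_denom = denom ** 2 / (
        ms_v_sp ** 2 / df_v_sp + ms_v_se ** 2 / df_v_se + ms_v_sp_se ** 2 / df_v_sp_se
    )
    return f_prime, df_v, df_denom


# Interaction degrees of freedom for 2 versions x 10 speakers x 5 sentences.
DF_V_SP = (2 - 1) * (10 - 1)               # 9
DF_V_SE = (2 - 1) * (5 - 1)                # 4
DF_V_SP_SE = (2 - 1) * (10 - 1) * (5 - 1)  # 36

# Hypothetical mean squares; the study does not report these values.
f, df1, df2 = quasi_f(100.0, 20.0, 30.0, 10.0, DF_V_SP, DF_V_SE, DF_V_SP_SE)
print(f"F'({df1}, {df2:.1f}) = {f:.2f}")
```

With these placeholder values the sketch prints F'(1, 5.9) = 2.50; the approximate denominator df is generally fractional and is usually rounded when reported.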
The results of this experiment appear in Table 1, which shows the mean duration of Chris in the two versions of each sentence. Each value in the table is the mean across all 10 speakers.

TABLE 1
MEAN DURATION (ms) OF KEY WORD Chris FOR THE SENTENCES OF EXPERIMENT 1

Sentence      Version A      Version B
1             213 (51.0)     251 (54.9)
2             204 (22.3)     193 (36.8)
3             189 (42.2)     204 (44.4)
4             207 (28.5)     245 (88.3)
5             235 (36.2)     254 (53.0)
Grand mean    209 (38.9)     229 (61.5)

Note. Values in parentheses are standard deviations.

To assess the statistical significance of these results, we used a repeated-measures analysis of variance in a factorial design (2 Versions × 5 Sentences × 10 Speakers). The version factor was designated as a fixed effect, and all other main effects and interactions were treated as random (following Clark, 1973). The F ratios and degrees of freedom for this mixed model were calculated using the method described in Snedecor and Cochran (1980, p. 324). According to Selkirk's assertions, the version factor should show a significant main effect on word duration in these sentences. That is, Chris should be significantly longer in version A than in version B, owing to the stress clash in A. The mean durations in Table 1 show that this is not the case. In only one of the five sentence pairs (sentence 2) is the mean duration greater for the A version. Furthermore, the grand means show that Chris is on average some 20 ms shorter in the A version than in the B version. This difference is not significant, F'(1,12) = 2.39. Thus, the results of this experiment provide no support for Selkirk's claim that the duration of a word in a sentence is influenced by the stress pattern of the following word. Indeed, in four of the five sentence pairs, the duration patterns run contrary to the predictions of the Selkirk hypothesis. We conclude, therefore, that the speech patterns predicted by Selkirk are not evident in the productions of the 10 subjects analyzed in this study. While Selkirk does regard lengthening and pausing as optional means of alleviating stress clash, the absence of any systematic trend in the anticipated direction suggests that even the weak version of her notion is inapplicable.

EXPERIMENT 2
In the previous experiment, we examined Selkirk's hypothesis by looking at the duration of the first word of a two-word noun phrase. Since speakers do not normally pause in the middle of a noun phrase, any effect of a stress clash should have been evident in the duration of the key word. In this experiment, we extend testing to clause boundaries, where lengthening and pausing are likely to appear regardless of stress pattern (e.g., Cooper and Paccia-Cooper, 1980). According to Liberman and Prince (1977) and Selkirk (1984), the lengthening and pausing that normally appear at clause boundaries should themselves alleviate a potential stress clash, so these authors do not predict any differential effects of lengthening and pausing as a function of stress pattern at such junctures. However, since pausing does not regularly accompany the juncture tested in Experiment 1, we decided to examine clause boundary junctures here to determine whether the stress pattern in the region of the boundary might exert any influence on speech timing.

Methods

Sentence materials. Six pairs of sentences were constructed for use in this experiment. The sentences contain minimal contrasts in stress pattern: the A version of each pair includes a word like Paul after the key site, creating a stress clash, and the B version includes a similar word like Pauline, with stress after the first syllable, presumably relieving the stress clash with the preceding monosyllabic word. The A version of each sentence appears below; the B version is derived by adding or substituting the material in parentheses.

6. If Nancy goes to the play with Klauss, Robert(a) will be upset with them.
7. If Randy gives trumpet lessons to Chris, Juan(ita) will ask them to conduct their sessions in the den upstairs.
8. If Mary plays basketball in the afternoons with Klauss, Lynn(ette) will give them tickets to the Olympic finals.
9. If Suzanne borrows any more money from Chris, Paul(ine) will give them both a piece of his (her) mind.
10. If Dan walks in the procession beside Chris, George(ette) will offer them invitations to his (her) graduation party.
11. If Alice shows up at the tennis match with Klauss, Victor(ia) will give both of them a big hug.
The key word in each sentence (Chris or Klauss) is the word measured for duration. These key words were chosen for their ease of segmentation and to add a contrast between key words containing short vs long syllable nuclei (i.e., the short vowel /ɪ/ in Chris vs the diphthong /aʊ/ in Klauss). This difference in syllable nuclei has previously been shown to affect duration (Peterson and Lehiste, 1960; Umeda, 1975). To camouflage the minimal contrasts, 12 filler sentences were also included in the experiment.

Speakers. Eight male undergraduates at the University of Iowa participated in this experiment as part of a course requirement in Introductory Psychology. The subjects had the same qualifications as those in Experiment 1, and none of them had participated in the previous study.

Procedures. The speakers were tested individually in a quiet room, using the same procedures described in Experiment 1. Acoustical analysis of the test utterances was performed using the digital method described earlier. As in the previous experiment, durations were determined by manipulating a computer-controlled cursor on a waveform display to demarcate the beginning and end of each key word and of the subsequent pause. The onset of each key word (Chris or Klauss) was defined as the point of plosive release for the initial /k/ consonant. The end of the key word, and the beginning of the following pause, were located at the termination of frication for /s/. The end of the pause was measured at the onset of the first phoneme of the following word. The duration measurements were estimated to be accurate to within 3 ms.

Results and Discussion

The results of this experiment appear in Table 2, which shows the mean word and pause durations for the two sentence versions. Each value in the table is the mean over all eight speakers and the three sentences containing the given key word.

TABLE 2
MEAN DURATION (ms) OF KEY WORD AND PAUSE FOR THE SENTENCES OF EXPERIMENT 2

                        Version A       Version B
Key-word duration
  Chris                 436 (60.2)      430 (67.2)
  Klauss                503 (54.9)      492 (64.7)
Pause duration
  Chris                 362 (118.4)     285 (171.4)
  Klauss                365 (190.5)     354 (205.4)

Note. Values in parentheses are standard deviations.

To assess the significance of these results for word and pause duration, we used a repeated-measures analysis of variance in a factorial design. The independent variables in the model were version (A or B), word (Chris or Klauss), speaker (eight speakers), and sentence nested within word (six sentences, three containing each of the two key words). The version factor was designated as a fixed effect, and all other main effects and interactions were treated as random (following Clark, 1973). The F ratios and degrees of freedom for this mixed model were calculated using the method described in Snedecor and Cochran (1980, p. 324). According to Selkirk's assertions, the version factor should show a significant main effect for both word duration and pause duration in these sentences. That is, the clause-final word and the following pause should both be significantly longer in version A than in version B, owing to the stress clash in A. The mean durations in Table 2 do show a trend in this direction for both dependent variables. However, the statistical tests show no significant effect of version on either variable (for word duration, F'(2,6) = 1.59; for pause duration, F'(2,11) = 2.56). The lack of significance reflects the fact that the observed differences between the two versions, although in the predicted direction, are quite small relative to the variance in the data (as indicated by the relatively large standard deviations in Table 2). For word duration, the difference between the means for the two versions is only 6 ms for the key word Chris and 11 ms for Klauss, which is small compared with the standard deviations for word duration (between 50 and 70 ms). On the other hand, the difference in mean duration between the two key words (i.e., Chris vs Klauss) is significant, F'(1,8) = 7.96, p < .025. The key word Klauss is 67 ms longer than Chris in version A sentences and 62 ms longer in version B sentences. Thus, although we do not find a significant effect on word duration that coincides with Selkirk's claims about English stress patterns, we do see a significant effect of the segmental composition of the key words (i.e., the diphthong vs the short vowel), as has been observed in previous research (e.g., Peterson and Lehiste, 1960; Umeda, 1975).

The difference observed in pause duration between versions A and B is considerably larger than that for word duration. The pause following Chris is 77 ms longer in version A than in version B, while the pause following Klauss differs by 11 ms between the two versions. However, as seen in Table 2, the standard deviations for this variable are quite large (ranging from 118 to 205 ms). Consequently, the observed difference in mean pause duration between the versions is not significant. The only factor that shows a significant effect on pause duration is speaker, F'(9,12) = 4.03, p < .025. This factor is also significant for word duration, F'(1,8) = 7.96, p < .025, indicating that, as expected, speaking rate differs among the eight speakers in this study. None of the other main effects or interactions reached statistical significance for either dependent measure.

Thus, the results of this experiment provide no support for Selkirk's claim that the duration of clause-final words and pauses is influenced by the stress pattern of the following word. Small average trends were observed in the predicted direction, but they were not significant because of large variability. For issues other than the metrical claims, significant durational effects were observed despite such variability, indicating that the lack of significance for the metrical predictions cannot be attributed to an inadequate testing situation. We conclude, therefore, that the speech patterns predicted by Selkirk are not evident in the productions of the eight speakers analyzed in this experiment. Taken in conjunction with the results of Experiment 1, the data provide no support for the notion that lengthening and pausing alleviate stress clash, even in a systematic optional manner.

EXPERIMENT 3
Hayes (1984) proposes that the key rule of metrical grid euphony involves an optimal target of four syllables separating prominences, as follows:

Quadrisyllabic Rule: "A grid is eurythmic when it contains a row whose marks are spaced close to four syllables apart."

Hayes formulated this rule on the basis of numerous examples. Chief among these is a claim about contrasts like the following (the acute accent marks the location of primary stress):

12a. Tennessee abbreviátions
b. Tennessee legislátion
c. Tennessee connéctions
d. Tennessee rélatives

In these and similar examples, Hayes claims that the primary stress on Tennessee is likely to be displaced from the last syllable to the first as a function of the proximity of the primary stress in the following word. Thus, stress retraction is most likely in (12d) and least likely in (12a). To test this rule, we first examined speakers' productions of these phrases in isolation, using perceptual judgments and acoustical measurements to determine whether the productions of American English speakers do in fact conform to the quadrisyllabic rule.

Methods

Linguistic materials. Sixteen phrases were constructed for use in this experiment. Each two-word phrase consisted of a state name (Mississippi, Pennsylvania, New York, or Tennessee) followed by one of the key words used by Hayes (relatives, legislation, connections, or abbreviations), giving 4 × 4 = 16 combinations in all.
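The quadrisyllabic rule can be turned into a rough numerical prediction for the Tennessee examples (12a-d). The Python sketch below is our own simplified reading of the rule, not Hayes's full account: for each following word it computes the syllable distance to that word's primary stress from final stress on Tennessee and from retracted first-syllable stress, and scores how much closer to four syllables retraction brings the spacing. The syllabifications and stress positions are assumptions.

```python
TARGET_SPACING = 4       # Hayes (1984): marks spaced close to four syllables apart
TENNESSEE_SYLLABLES = 3  # Ten-nes-see (assumed syllabification)

# 1-based position of primary stress in each following word (assumed syllabifications).
NEXT_STRESS = {
    "abbreviations": 4,  # ab-bre-vi-A-tions
    "legislation": 3,    # le-gis-LA-tion
    "connections": 2,    # con-NEC-tions
    "relatives": 1,      # RE-la-tives
}


def spacing(stress_pos: int, next_stress: int, syllables: int = TENNESSEE_SYLLABLES) -> int:
    """Syllable distance from a stressed syllable of the state name to the next word's primary stress."""
    return (syllables - stress_pos) + next_stress


def retraction_preference(next_stress: int) -> int:
    """How much closer to four syllables first-syllable stress gets than final stress.
    Positive favours retraction, negative favours final stress, 0 is a tie."""
    final = spacing(TENNESSEE_SYLLABLES, next_stress)  # stress stays on -see
    retracted = spacing(1, next_stress)                 # stress moves to Ten-
    return abs(final - TARGET_SPACING) - abs(retracted - TARGET_SPACING)


scores = {word: retraction_preference(pos) for word, pos in NEXT_STRESS.items()}
print(scores)
```

Under these assumptions the scores are -2 (abbreviations), 0 (legislation), 2 (connections), and 2 (relatives): retraction is disfavoured in (12a), neutral in (12b), and favoured in (12c) and (12d). That is the ordering Hayes predicts, although this crude measure does not separate (12c) from (12d).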
Speakers. Seventeen native speakers of American English (12 females and 5 males) served in this experiment. None of the speakers had prior knowledge of the issues under investigation, nor had any of them participated in the previous experiments. The speakers had no speech or hearing impairments, and all were students at the University of Iowa.

Procedures. The speakers were tested individually as in the previous experiments. Nine of the speakers were first asked to produce the two-word phrases, presented in pseudorandomized order, and then the four state names in isolation. The other eight speakers performed these tasks in the reverse order. The speakers were recorded on magnetic tape, as described in Experiment 1. A trained listener, unfamiliar with the issues under examination, was asked to judge whether the primary stress in the state name occurred on the first or a subsequent syllable and to verify that the primary stress on the following word was produced in standard fashion.

In addition to this perceptual analysis, we performed acoustical analyses on a subset of each speaker's productions. We analyzed fundamental frequency (F0) and duration patterns in the phrases Pennsylvania relatives and Pennsylvania abbreviations. These two phrases represent the endpoints of the continuum of stress patterns, since relatives has primary stress on the first syllable, while abbreviations has it on the fourth syllable. Thus, according to the Hayes hypothesis, the primary stress in Pennsylvania should fall on the third syllable before abbreviations and should shift to the first syllable before relatives. This stress shift should be manifested in the acoustical patterns of these phrases. Previous research has indicated that two of the primary acoustic correlates of word stress are the duration and F0 value of a syllable (Fry, 1955; Lieberman, 1960; Morton and Jassem, 1965). In the present experiment, therefore, we expected that a shift in main stress from the third to the first syllable of Pennsylvania would be manifested by an increase in the duration and F0 of the first syllable and a corresponding decrease in these two variables for the third syllable. Accordingly, our acoustical measurements consisted of the duration of the first syllable of Pennsylvania (from the onset of voicing in the vowel to the onset of frication for /s/) and the peak F0 values of the first and third syllables. We were not able to make reliable duration measurements of the third syllable because of coarticulation with neighboring segments.

The recorded utterances of each speaker were digitized at a sampling rate of 10 kHz, and a fundamental frequency contour for each phrase was obtained using a time-domain pitch-detection algorithm. The peak F0 values of the two measured syllables in Pennsylvania were determined using the criteria of Cooper and Sorensen (1981). The duration of the first syllable was measured on a digitized waveform display using the procedure reported in Experiment 1.

Results and Discussion

The results of the stress judgment analysis are presented in Table 3, which shows the number of speakers (out of 17) who placed primary stress on the first syllable of the state name in each of the four phrase contexts. According to Hayes, the number of early stress placements should increase as the phrase context changes from abbreviations to legislation to connections to relatives. As Table 3 reveals, no consistent effect of this kind appears. Most of the speakers placed primary stress on the first syllable of the state names regardless of the following word. For none of the four state names does the number of speakers producing first-syllable stress show a gradual monotonic increase as the context changes from abbreviations to legislation to connections to relatives.
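The authors attribute the per-speaker acoustical differences for Pennsylvania to chance. One way to make that reasoning concrete, offered as our own supplementary sketch rather than an analysis reported in the study, is an exact two-sided sign test on the speaker counts given in the acoustical results (9, 7, and 9 of 17 speakers in the predicted direction), together with a helper that applies the three-part stress-shift criterion from the Methods to one set of measurements. Ties are assumed not to occur, and the dictionary keys are hypothetical names.

```python
from math import comb


def sign_test_p(successes: int, n: int) -> float:
    """Exact two-sided sign test: binomial(n, 0.5), doubling the smaller tail (capped at 1)."""
    lower = sum(comb(n, k) for k in range(successes + 1)) / 2 ** n     # P(X <= successes)
    upper = sum(comb(n, k) for k in range(successes, n + 1)) / 2 ** n  # P(X >= successes)
    return min(1.0, 2 * min(lower, upper))


def shows_predicted_shift(abbrev: dict, relatives: dict) -> bool:
    """Hypothetical keys: 'dur1' (first-syllable duration, ms), 'f0_1' and 'f0_3' (peak F0, Hz).
    Predicted retraction before 'relatives': longer, higher first syllable and lower third syllable."""
    return (
        relatives["dur1"] > abbrev["dur1"]
        and relatives["f0_1"] > abbrev["f0_1"]
        and relatives["f0_3"] < abbrev["f0_3"]
    )


# Counts of speakers (out of 17) showing the predicted direction, as reported for Pennsylvania.
for label, k in [("first-syllable duration", 9), ("first-syllable F0", 7), ("third-syllable F0", 9)]:
    print(f"{label}: {k}/17, two-sided p = {sign_test_p(k, 17):.3f}")

# Group means from Table 4 (not individual speakers).
table4_abbrev = {"dur1": 121, "f0_1": 193, "f0_3": 183}
table4_relatives = {"dur1": 123, "f0_1": 194, "f0_3": 181}
print(shows_predicted_shift(table4_abbrev, table4_relatives))
```

The two-sided p values come out at 1.0, about 0.63, and 1.0, consistent with chance. The Table 4 group means do satisfy the criterion, by only 1 or 2 units on each measure, which illustrates why the per-speaker counts are the more informative check.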
METRICAL
TABLE NUMBER
377
PHONOLOGY 3
OF SPEAKERS (n = 17) WHO PLACED PRIMARY STRESS ON THE FIRST SYLLABLE IN EXPERIMENT 3
OF STATE NAMES
State name Context abbreviations legislation connections relatives
Mississippi
Pennsylvania
16 16 15
17
We conclude, therefore, that the stress patterns of the 17 speakers examined in this study do not provide support for Hayes’ quadrisyllabic rule of meter. This conclusion is based on the perceptual judgments of one listener. As we shall now see, the results for the acoustical analyses confirm these perceptual judgments. The results of the acoustical analyses for the two phrases measured in this experiment are presented in Table 4. The values in the table are the means for all 17 speakers. The means displayed here show that there is very little difference in the acoustical patterns of the word Pennsyfvania, regardless of which word follows it. Measurements for the first syllable show a difference in mean duration of only 2 ms and an F, difference of only 1 Hz between the two phrases. Similarly, the mean F, values for the third syllable differ by only 2 Hz. We performed two-way analyses of variance (2 phrases x 17 speakers) for these data and found that there is no significant difference between the phrases for any of the three measured variables (in each case, F(1,16) < 1.26, NS). Further analysis shows that for the productions of the first syllable in Pennsylvania only 9 of the 17 speakers had greater duration and only 7 had greater F, values before relatives than before abbreviations. For the third syllable, only 9 speakers showed the expected F,, decrease before relatives. Furthermore, only 1 of the 17 speakers displayed a pat-
New York
Tennessee
11 14 10 14
16 16 17 17
10 12 14 12
tern for all three measured variables that would be characteristic of the stress shift hypothesized by Hayes. Based on these results, we conclude that the small differences observed for the measured variables can be attributed to nothing more than chance variation. Thus, in agreement with the perceptual judgments discussed above, the results of the acoustical analyses provide no evidence to support the Hayes hypothesis concerning the stress patterns of these isolated phrases. EXPERIMENT
4
It is conceivable that the nonsignificant findings of Experiment 3 are attributable to the fact that speakers produced the phrases in isolation, with no sentence context. The unnatural task of uttering isolated two-word phrases might have masked the effect we were examining (although we should note that Hayes and other proponents of metrical phonology typically cite such isolated phrases as evidence for their theory). To provide a further test of Hayes' claim, therefore, we embedded the target phrases used in the previous experiment in sentence carriers. As in the previous experiment, we conducted an acoustical analysis of the utterances to complement listener judgments of stress placement.

TABLE 4
MEAN DURATION AND F0 VALUES FOR THE WORD Pennsylvania FROM THE PHRASES IN EXPERIMENT 3

Phrase                       First syllable   First syllable   Third syllable
                             duration (ms)    F0 (Hz)          F0 (Hz)
Pennsylvania abbreviations   121 (14.8)       193 (55.2)       183 (50.4)
Pennsylvania relatives       123 (18.8)       194 (53.8)       181 (50.8)

Note. Values in parentheses are standard deviations.

Method

Subjects. Subjects included 19 undergraduates at the University of Iowa (16 males and 3 females), who participated as part of a course requirement. The subjects had the same qualifications as those in the previous experiments, and none of them had participated in those studies.

Linguistic materials. Twelve sentences were included as materials in this experiment. Each of the four Pennsylvania phrases from Experiment 3 was embedded in three sentence contexts, so that each key phrase appeared in sentence-initial, medial, and final positions, as shown below:

13. Pennsylvania (relatives, connections, legislation, abbreviations) is/are especially difficult to organize.
14. I think that Pennsylvania (relatives, connections, legislation, abbreviations) is/are especially intriguing.
15. Everyone but Harry speaks for Pennsylvania (relatives, connections, legislation, abbreviations).
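The reason the following words differ in diagnostic value can be made concrete with a toy sketch. It uses a simplified reading of Hayes' (1984) quadrisyllabic rule, namely that strong beats are preferred about four syllables apart; it is not Hayes' formal grid algorithm. The sketch assumes that Pennsylvania has four syllables (Penn-syl-va-nia) with citation stress on the third, and it assigns each following word the syllable that carries its primary stress (RE-la-tives 1, con-NEC-tions 2, le-gis-LA-tion 3, ab-bre-vi-A-tions 4). Stress is retracted to the first syllable only when doing so brings the beat-to-beat distance strictly closer to four. The helper name `predict_stress` is hypothetical.

```python
# Toy reading of a "strong beats about four syllables apart" preference.
# Assumptions (not from the paper): Pennsylvania has 4 syllables, and the
# following words carry primary stress on the syllables listed below.

IDEAL_SPACING = 4   # preferred distance between strong beats, in syllables
KEY_SYLLABLES = 4   # Penn-syl-va-nia
CITATION_BEAT = 3   # citation primary stress on "va"
RETRACTED_BEAT = 1  # candidate retracted stress on "Penn"

# 1-based syllable carrying primary stress in each following word (assumed).
FOLLOWING_STRESS = {
    "relatives": 1,      # RE-la-tives
    "connections": 2,    # con-NEC-tions
    "legislation": 3,    # le-gis-LA-tion
    "abbreviations": 4,  # ab-bre-vi-A-tions
}


def predict_stress(following_word: str) -> int:
    """Return the syllable of Pennsylvania (1 or 3) predicted to carry the beat."""
    # Position of the next strong beat, counted from the start of Pennsylvania.
    next_beat = KEY_SYLLABLES + FOLLOWING_STRESS[following_word]
    citation_gap = abs((next_beat - CITATION_BEAT) - IDEAL_SPACING)
    retracted_gap = abs((next_beat - RETRACTED_BEAT) - IDEAL_SPACING)
    # Retract only if spacing strictly improves; ties keep citation stress.
    return RETRACTED_BEAT if retracted_gap < citation_gap else CITATION_BEAT


if __name__ == "__main__":
    for word in FOLLOWING_STRESS:
        print(f"Pennsylvania {word}: beat on syllable {predict_stress(word)}")
```

Under these assumptions only relatives triggers retraction, while connections produces a tie and therefore keeps citation stress. This is consistent with treating the abbreviations/relatives contrast as the most diagnostic one. A different syllabification of Pennsylvania would change the distances, so the sketch should be read as illustrative only.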
Procedures. The subjects were tested as in the previous experiments. The same listener from Experiment 3 made perceptual judgments of primary stress on Pennsylvania. In addition, an acoustical analysis was conducted to examine acoustic correlates of stress objectively. As before, we measured the peaks in fundamental frequency of the first and third syllables of the target word, as well as the duration of the first syllable, using the procedures described in the previous experiment. As in Experiment 3, we chose to perform the acoustical analyses on a subset of the data. We analyzed all speakers' productions of Pennsylvania abbreviations and Pennsylvania relatives, since this contrast is the one most likely to show evidence of the phenomenon under investigation. We also made acoustical measurements for Pennsylvania connections to provide additional data for testing the Hayes hypothesis.

Results and Discussion
The results of the perceptual judgments for this experiment are presented in Table 5, which displays the number of speakers (out of 19) who placed primary stress on the first or third syllable of the target word, arranged by sentence position. As these results indicate, the listener judged primary stress in Pennsylvania to fall most often on the third syllable. In a number of cases, however, the judgment was difficult to make, and the listener indicated equal stress on the first and third syllables. In only one instance did the listener judge that primary stress fell uniquely on the first syllable. Given this near absence of first-syllable stress judgments, it is difficult to see evidence in these data of the stress shift posited by Hayes. Even if we treat the judgments of equal stress on the first and third syllables as evidence of this shift, we still do not have a case for the Hayes hypothesis. According to Hayes, primary stress in Pennsylvania should move from the third to the first syllable as the following word changes from abbreviations to relatives. This pattern is not evident at any of the three sentence positions examined here. Indeed, the most striking observation about these data is that stress placement in Pennsylvania seems to differ more as a function of sentence position than as a function of the stress pattern of the following word. That is, primary stress occurs predominantly on the third syllable in sentence-medial and sentence-final positions, and much less so in sentence-initial position. These results suggest that the stress pattern of a word is influenced more by sentence position than by the stress pattern of adjacent words.

TABLE 5
NUMBER OF SPEAKERS (n = 19) WHO PLACED PRIMARY STRESS ON THE FIRST AND THIRD SYLLABLES OF Pennsylvania IN THE SENTENCES OF EXPERIMENT 4

Sentence   Word following   Primary stress   Equal stress on   Primary stress
position   Pennsylvania     on first         first and third   on third
                            syllable         syllables         syllable
Initial    abbreviations    0                10                 9
           legislation      0                12                 7
           connections      0                11                 8
           relatives        0                10                 9
Medial     abbreviations    0                 4                15
           legislation      0                 4                15
           connections      0                 3                16
           relatives        0                 2                17
Final      abbreviations    0                 2                17
           legislation      1                 1                17
           connections      0                 1                18
           relatives        0                 0                19

The results of the acoustical analyses for these sentences also fail to provide evidence for the stress shift posited for the target word. These results are displayed in Table 6, which shows the mean duration and F0 values for the measured syllables averaged across all 19 speakers. The means are arranged by sentence position, and the grand means at the bottom of the table are averaged across all three sentence positions. The grand means for the three words following Pennsylvania show that, for each dependent variable, the highest value is obtained when the following word is abbreviations. This finding does not support the Hayes hypothesis. To assess the statistical significance of these results, we calculated repeated-measures analyses of variance for each of the three dependent measures. The independent variables in the model were word (the three words following Pennsylvania), sentence position (initial, medial, and final), and subject (19 speakers). All factors in the model were treated as random. The analysis of the duration of the first syllable in Pennsylvania shows no significant effect of the following context, F'(10,18) = 1.12, or of sentence position, F'(4,23) = 1.99. The only significant main effect in this analysis is for the subject factor, F'(20,71) = 8.16, p < .005, which indicates, as expected, that the subjects did not all speak at the same rate. These results for first-syllable duration do not support the Hayes hypothesis, which predicts a significant effect of the word following Pennsylvania.

In contrast to these nonsignificant findings for first-syllable duration, the statistical tests for peak F0 on the first and third syllables of Pennsylvania do show significant effects. For both syllables, there is a significant effect of the word factor on F0 (for the first syllable, F'(2,10) = 8.16, p < .01; for the third syllable, F'(3,29) = 7.98, p < .005). Subsequent Newman-Keuls tests indicate that this effect arises because the mean F0 values for both measured syllables in Pennsylvania are significantly higher before abbreviations than before relatives or connections. No significant difference is obtained between the relatives and connections environments for either F0 measure.

TABLE 6
MEAN DURATION AND F0 VALUES FOR THE WORD Pennsylvania FROM THE SENTENCES IN EXPERIMENT 4

Sentence   Word following   First syllable   First syllable   Third syllable
position   Pennsylvania     duration (ms)    F0 (Hz)          F0 (Hz)
Initial    abbreviations    120 (20.9)       172 (46.6)       155 (41.2)
           connections      121 (20.6)       162 (43.6)       152 (45.8)
           relatives        123 (19.6)       163 (47.2)       153 (46.7)
Medial     abbreviations    129 (17.3)       155 (42.4)       142 (40.4)
           connections      123 (20.5)       151 (43.8)       139 (41.3)
           relatives        128 (24.7)       150 (42.5)       138 (37.3)
Final      abbreviations    129 (28.6)       140 (36.7)       128 (35.6)
           connections      125 (19.4)       135 (40.5)       123 (36.1)
           relatives        124 (17.7)       133 (42.1)       123 (36.6)
Grand      abbreviations    126 (22.7)       156 (43.4)       142 (40.1)
means      connections      123 (19.9)       149 (43.3)       138 (42.2)
           relatives        125 (20.6)       149 (45.0)       138 (41.6)

Note. Values in parentheses are standard deviations.

As with the duration measure reported above, these results provide no support for the stress-shift hypothesis posited by the proponents of metrical phonology. For the first syllable of Pennsylvania, we see no evidence of an increase in stress level before relatives compared with abbreviations. Indeed, there is a significant F0 change in the direction opposite to that posited by Hayes. For the third syllable of Pennsylvania, the significantly higher F0 value before abbreviations does conform to the notion that this syllable should receive primary stress in this context, as proposed by Hayes. This piece of evidence in support of the metrical hypothesis is vitiated, however, by the lack of a significant difference in first-syllable duration and by the significant effect in the wrong direction for F0 on the first syllable of the key word.

We believe that the significant F0 effects observed in these data are due not to the hypothesized stress shift but to F0 declination and the so-called "P1 effect," both of which have been observed in previous studies of the F0 patterns of isolated sentences. The presence of F0 declination in the present study is evident in the significant effect of sentence position on F0 for both measured syllables of Pennsylvania (for the first syllable, F'(2,21) = 72.2, p < .005; for the third syllable, F'(2,40) = 91.0, p < .005). Newman-Keuls range tests indicate that, for both measured syllables in Pennsylvania, F0 is significantly higher in sentence-initial than in sentence-medial position. In turn, the F0 values obtained for the key word in sentence-medial position are significantly greater than those observed at the end of a sentence. These results are in accord with the declination effect observed in previous studies (e.g., Cooper & Sorensen, 1981; Liberman & Pierrehumbert, 1984).

The significantly higher F0 values observed on both measured syllables preceding abbreviations, compared with connections and relatives, can be explained if we couple the declination effect with the P1 effect. This latter phenomenon, observed by Cooper and Sorensen (1981) and by Cooper, Soares, and Reagan (1985), stipulates that sentence-initial F0 is directly proportional to some measure of sentence length; that is, longer sentences have higher initial F0 values than shorter ones. One consequence of the P1 effect and F0 declination is that the F0 value for any word in a sentence will depend, to some degree, on its distance from the end of the sentence (although F0 will also depend on other factors, such as sentence focus; see Cooper, Eady, & Mueller, 1985). The results of the present experiment agree with this hypothesis. F0 on Pennsylvania is higher before abbreviations than before connections and relatives in all three sentence positions. At the same time, abbreviations has two more syllables than the other two words. Since the sentences are otherwise identical, the two additional syllables in abbreviations increase the distance between Pennsylvania and the end of the sentence in every case. Consequently, the F0 values for the key word are also increased in this context.
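The distance-to-end account can be illustrated with a deliberately simple model. Purely for illustration, suppose that F0 declines linearly to a fixed end-of-sentence value. A syllable's F0 is then that end value plus a constant rate times the number of syllables still to come. In such a model a longer sentence also starts higher, in the spirit of the P1 effect. The end value (100 Hz) and the rate (2 Hz per syllable) are invented parameters, the syllable counts are our own, and `toy_f0` is a hypothetical helper. None of this is a model fitted by the authors.

```python
# Toy model: F0 falls linearly to a fixed end value, so the F0 of a syllable
# depends only on how many syllables follow it in the sentence.
# END_F0_HZ and RATE_HZ_PER_SYLLABLE are invented illustration values.

END_F0_HZ = 100.0
RATE_HZ_PER_SYLLABLE = 2.0


def toy_f0(syllables_after: int) -> float:
    """F0 (Hz) of a syllable followed by `syllables_after` more syllables."""
    return END_F0_HZ + RATE_HZ_PER_SYLLABLE * syllables_after


# Sentence 15 frame: "... speaks for Pennsylvania <word>." Only the length of
# the following word changes how far the key word is from the sentence end.
SYLLABLES = {"relatives": 3, "connections": 3, "abbreviations": 5}  # assumed

for word, n in SYLLABLES.items():
    after_third = 1 + n  # "nia" plus every syllable of the following word
    print(f"{word:>13}: third syllable F0 = {toy_f0(after_third):.1f} Hz")
```

In this toy, the third syllable of Pennsylvania comes out 4 Hz higher before abbreviations only because two extra syllables follow it. Stress plays no part in the model. Real declination need not be linear. The sketch shows only how syllable count alone could yield the direction of the word effect seen in Table 6.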
While we have not tried to confirm this hypothesis by measuring the duration of the sentence fragments that follow Pennsylvania in the three word contexts, we believe that it provides a logical explanation for the results obtained in this study. It certainly gives a much more plausible account of the data than does the Hayes hypothesis. Taken together, the results of this experiment provide further evidence against the quadrisyllabic rule of grid euphony proposed by Hayes (1984). Neither listener judgments of primary stress location nor acoustical analysis of duration and fundamental frequency patterns reveals any support for the relocation of stress predicted by Hayes' rule.

EXPERIMENT 5
The results of Experiments 1-4 prompted us to reconsider one of the most basic claims of metrical phonology, concerning the retraction of syllabic stress to alleviate stress clash. As observed in the introduction, the appearance of stress retraction in classic examples such as "thirteen men" may have been exaggerated because of the assumption that the citation form of "thirteen" bears primary stress on the second syllable. If primary stress instead falls on the first syllable in citation form, then no retraction would be possible in cases such as "thirteen men," contrary to the basic claims of Liberman and Prince (1977) and of virtually all linguists who have written about metrical phonology since. This experiment was designed to test the retraction claim.

Method

Subjects. The subjects for this experiment were nine undergraduate students at the University of Iowa (five males and four females), who had the same qualifications as those in the previous studies. None of these subjects had participated in any of the preceding experiments.

Linguistic materials. Eighteen sentences were used in this experiment. Each sentence contained one occurrence of the key word thirteen followed by one of two words that contrast minimally in the placement of primary stress. One word of each pair (version A) has primary stress on the first syllable and is assumed to cause primary stress on thirteen to shift from the second to the first syllable. The other word of each pair (version B) has primary stress on a later syllable and is thus assumed not to affect the stress pattern of thirteen. The stimulus sentences are listed below, with the minimal word pairs following thirteen separated by a slash (i.e., version A/version B; the diacritic (´) indicates the location of primary stress):

16a. Thirteen bláckboards/black bóards were purchased at the lumberyard this morning.
b. This morning I purchased thirteen bláckboards/black bóards at the lumberyard.
c. At the lumberyard this morning I purchased thirteen bláckboards/black bóards.
17a. Thirteen cólleges/univérsities reported declining student enrollments during the past decade.
b. Declining student enrollments were reported by thirteen cólleges/univérsities during the past decade.
c. Declining student enrollments during the past decade were reported by thirteen cólleges/univérsities.
18a. Thirteen cómpanies/corporátions submitted bids to build the new shopping mall.
b. Bids were submitted by thirteen cómpanies/corporátions to build the new shopping mall.
c. Bids to build the new shopping mall were submitted by thirteen cómpanies/corporátions.
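The clash-and-retraction prediction these stimuli test can be stated as a small rule. The sketch below is our own simplified reading of the stress-clash idea (Liberman & Prince, 1977), not a full metrical-grid implementation. It assumes that thirteen carries citation stress on its second syllable and declares a clash when that stressed syllable is immediately followed by another stressed syllable. The stress positions come from the version A (first syllable) and version B (later syllable) classification above. The helper `predicts_retraction` is hypothetical.

```python
# Simplified stress-clash check for "thirteen + following word".
# Assumption: thirteen has citation stress on syllable 2 (thir-TEEN), and a
# clash means two stressed syllables with no unstressed syllable between.

THIRTEEN_SYLLABLES = 2
THIRTEEN_STRESS = 2

# (version, 1-based syllable of primary stress in the following word)
STIMULI = {
    "blackboards": ("A", 1),
    "black boards": ("B", 2),
    "colleges": ("A", 1),
    "universities": ("B", 3),
    "companies": ("A", 1),
    "corporations": ("B", 3),
}


def predicts_retraction(following_stress: int) -> bool:
    """True if the next primary stress is adjacent to thirteen's stress."""
    next_stress = THIRTEEN_SYLLABLES + following_stress
    return next_stress - THIRTEEN_STRESS == 1


for word, (version, stress) in STIMULI.items():
    print(f"{version}: thirteen {word} -> retraction predicted: "
          f"{predicts_retraction(stress)}")
```

Under this rule every version A item predicts retraction and no version B item does. That A-versus-B contrast is exactly what the duration and F0 measures in this experiment were designed to detect.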
It should be noted that the nine sentences listed here consist of three base sentences that have been transformed so that the key word appears in sentence-initial, sentence-medial, and sentence-final positions. The stimuli were constructed in this way so that we could identify any differential effect of sentence position on stress retraction. We also wanted to verify that the subjects in this experiment would produce sentences showing the same F0 declination effect observed in Experiment 4.

Procedures. The 18 stimulus sentences were presented to subjects in a pseudorandom order and were recorded as in the previous studies. All sentences produced by each speaker were then subjected to acoustical analysis as described earlier. In this experiment, we measured the peak fundamental frequency of each syllable of the word thirteen, as well as the duration of its first syllable. As in Experiments 3 and 4, we expected that any shift in main stress on the key word would be manifested as an increase in the duration and F0 of the first syllable and a corresponding decrease in F0 on the second syllable. Independent of any changes caused by a stress shift, we also expected a significant decrease in F0 on both syllables of the key word as its location changed from sentence-initial to sentence-medial to sentence-final position.

Results and Discussion

The results of the acoustical analyses of the sentences in this experiment are shown in Table 7. The duration and F0 values displayed in the table are averages over all nine speakers. As noted above, the hypothesis under investigation predicts that the first syllable of thirteen will have greater duration and F0 when the following word has main stress on its first syllable (version A) than when main stress on the following word falls on a later syllable (version B). At the same time, second-syllable F0 on thirteen should be lower in version A than in version B. Yet the grand means at the bottom of Table 7 show that these predicted patterns are not present in the sentences analyzed here. For the first syllable of thirteen, mean duration is actually shorter for version A than for version B, and this trend is evident at all three sentence positions. Meanwhile, the grand means for F0 show that the two stress versions produce an average difference of only 1 Hz on both syllables of the measured word. We calculated four-way ANOVAs for these data (2 versions × 3 positions × 3 sentences × 9 subjects) and found no significant differences between the A and B versions for any of the three acoustic variables measured here. We did, however, find a significant main effect of sentence position on the F0 values of both syllables of thirteen (for the first syllable, F'(2,28) = 10.31, p < .005; for the second syllable, F'(2,23) = 15.09, p < .005). Thus, as in Experiment 4, the expected effect of declination is observed in the F0 patterns of these sentences, but the predicted effect of a stress shift is not.

TABLE 7
MEAN DURATION AND F0 VALUES FOR THE WORD thirteen FROM THE SENTENCES IN EXPERIMENT 5

Sentence   Stress    First syllable   First syllable   Second syllable
position   version   duration (ms)    F0 (Hz)          F0 (Hz)
Initial    A         169.4 (23.0)     196.0 (67.4)     214.5 (76.9)
           B         176.9 (39.8)     194.1 (72.0)     209.2 (80.2)
Medial     A         174.3 (31.0)     176.3 (55.7)     185.6 (62.4)
           B         175.3 (32.0)     177.9 (58.5)     187.1 (64.2)
Final      A         174.7 (27.5)     164.5 (52.5)     167.1 (56.3)
           B         183.8 (34.6)     159.6 (52.2)     167.7 (57.6)
Grand      A         172.8 (27.1)     178.9 (59.6)     189.0 (67.8)
means      B         178.7 (35.4)     177.1 (62.3)     188.0 (69.3)

Note. Values in parentheses are standard deviations.

CONCLUSION
Basic claims of metrical phonology have been examined in five empirical tests. In each case, the predictions have not been supported. Yet other results from the same tests show effects consistent with earlier work, suggesting that the null results for metrical claims cannot readily be attributed to some unusual property of the experimental task or of the particular speakers tested. Thus, it appears that at least some of the presumed "facts" of rhythmic patterns presented in metrical phonology do not hold up under empirical testing. Whether the empirical inadequacies of metrical phonology are more widespread than the few instances examined here remains to be determined. In any event, it is possible that the main virtue of metrical theory lies in its characterization of phenomena other than stress clashing. In this regard, the theory still provides a principled account of such phenomena as the following: (a) there appears to be no upper bound on the number of degrees of stress distinguishable in a language, (b) different languages exhibit different patterns of alternating stresses, and (c) the number of stresses in a word typically remains constant despite syllable deletion (e.g., Prince, 1983; Halle & Vergnaud, book in preparation). Thus, the present evidence on stress clashing should not be interpreted as casting doubt on the overall utility of metrical theory.

What does seem clear from the present results is that metrical proposals should routinely be coupled with systematic empirical testing before they are used as a basis for more elaborate theoretical constructs (see also Cooper, 1986). In some of the instances tested here, the original claims were used extensively in elaborate argumentation without justification. Similar blind alleys might be avoided if greater effort were made to determine the empirical adequacy of key proposals at an earlier stage of research. Lieberman (1965) persuasively documented the need for empirical testing of phonological claims 20 years ago. The present results indicate that his conclusions and caveats still apply.

REFERENCES

CLARK, H. H. (1973). The language-as-fixed-effect fallacy: A critique of language statistics in psychological research. Journal of Verbal Learning and Verbal Behavior, 12, 335-359.
COOPER, G., & MEYER, L. (1960). The rhythmic structure of music. Chicago: The Univ. of Chicago Press.
COOPER, W. E. (1986). Review of E. O. Selkirk, Phonology and syntax: The relation between sound and structure. Studies in Language, 10, 255-260.
COOPER, W. E., EADY, S. J., & MUELLER, P. R. (1985). Acoustical aspects of contrastive stress in question-answer contexts. Journal of the Acoustical Society of America, 77, 2142-2156.
COOPER, W. E., SOARES, C., & REAGAN, R. T. (1985). Planning speech: A picture's word's worth. Acta Psychologica, 58, 107-114.
COOPER, W. E., & SORENSEN, J. M. (1981). Fundamental frequency in sentence production. New York: Springer-Verlag.
FRY, D. B. (1955). Duration and intensity as physical correlates of linguistic stress. Journal of the Acoustical Society of America, 27, 765-769.
HALLE, M., & VERGNAUD, J.-R. Three dimensional phonology. Book in preparation.
HALLE, M., & VERGNAUD, J.-R. (1980). Three dimensional phonology. Journal of Linguistic Research, 1, 83-105.
HAYES, B. (1982). Extrametricality and English stress. Linguistic Inquiry, 13, 227-276.
HAYES, B. (1983). A grid-based theory of English meter. Linguistic Inquiry, 14, 357-393.
HAYES, B. (1984). The phonology of rhythm in English. Linguistic Inquiry, 15, 33-74.
KIPARSKY, P. (1979). Metrical structure assignment is cyclic. Linguistic Inquiry, 10, 421-442.
KIPARSKY, P. (1982). From cyclic phonology to lexical phonology. In H. van der Hulst & N. Smith (Eds.), The structure of phonological representations (Part I) (pp. 131-177). Dordrecht: Foris.
LEHISTE, I. (1980). Phonetic manifestation of syntactic structures in English. Annual Bulletin, 14, 1-27 (Research Institute of Logopedics and Phoniatrics, University of Tokyo).
LIBERMAN, M. (1975). The intonational system of English. Unpublished Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, MA.
LIBERMAN, M., & PIERREHUMBERT, J. (1984). Intonational invariance under changes of pitch range and length. In M. Aronoff & R. T. Oehrle (Eds.), Language sound structure. Cambridge, MA: MIT Press.
LIBERMAN, M., & PRINCE, A. (1977). On stress and linguistic rhythm. Linguistic Inquiry, 8, 249-336.
LIEBERMAN, P. H. (1960). Some acoustic correlates of word stress in American English. Journal of the Acoustical Society of America, 32, 451-454.
LIEBERMAN, P. H. (1965). On the acoustic basis of the perception of intonation by linguists. Word, 21, 40-54.
MARTIN, J. G. (1972). Rhythmic (hierarchical) versus serial structure in speech and other behaviors. Psychological Review, 79, 487-509.
MORTON, J., & JASSEM, W. (1965). Acoustic correlates of stress. Language and Speech, 8, 159-181.
PETERSON, G. E., & LEHISTE, I. (1960). Duration of syllable nuclei in English. Journal of the Acoustical Society of America, 32, 693-703.
PRINCE, A. (1980). A metrical theory for Estonian quantity. Linguistic Inquiry, 11, 511-562.
PRINCE, A. (1983). Relating to the grid. Linguistic Inquiry, 14, 19-100.
SELKIRK, E. O. (1984). Phonology and syntax: The relation between sound and structure. Cambridge, MA: MIT Press.
SHAFFER, L. H. (1984). In D. G. Bouwhuis (Ed.), Attention and Performance X. Hillsdale, NJ: L. Erlbaum.
SNEDECOR, G. W., & COCHRAN, W. G. (1980). Statistical methods. Ames, IA: Iowa State Univ. Press.
UMEDA, N. (1975). Vowel duration in American English. Journal of the Acoustical Society of America, 58, 434-445.

(Received October 16, 1985) (Revision received January 31, 1986)