Speech rhythm: its relation to performance universals and articulatory timing

Journal of Phonetics (1975) 3, 75-86

George D. Allen
Institute of Speech and Hearing Sciences and Dental Research Center, University of North Carolina at Chapel Hill, N.C. 27514, U.S.A.
(Received 18th December 1974)

Abstract:

The observed rhythms produced and perceived in spoken language are closely related to rhythms of other human behavior. This relationship is examined in terms of (1) the types of rhythmic structures observed, (2) the rate of succession of rhythmic units, (3) a perceptual tendency toward equalization of physically unequal intervals, and (4) the variability of rhythmic motor action. Universal rhythmic principles underlying this relationship are incorporated into languages in different ways, however, depending upon deeper phonological, syntactic, and lexical constraints, thus giving both competence and performance aspects to rhythm rules.

Introduction

No one doubts that spoken language has rhythm. How one should incorporate the notion of rhythm into a theory of language, however, is much less clear. Some linguists (e.g. Bolinger, 1965; Halliday, 1963; Pike, 1945) have used rhythm explicitly in their descriptions of language structure, but most have only hinted at where rhythm might fit into their theories. There is considerable literature on other prosodic aspects of language, but this work nearly always stops short of generalization to rhythm. This paper will attempt to shed light on this elusive linguistic phenomenon, showing it to be a product of both performance universals and language-specific grammatical rules.

There are two ways of looking at rhythm: the more common in linguistic writing is that rhythm is the pattern of a temporal sequence; less common linguistically, and less specifically, rhythm can be the pattern of any sequence, without regard for time. That is, "rhythmic" and "unrhythmic" are words we can use to describe sequences of like events, whether or not we mark the passage of time exactly while we perceive these sequences. If we mark time exactly, or nearly so, as linguists tend to do, then the sequential rhythms are of the specifically temporal sort; if we mark time less exactly or not at all, then the rhythms are of the more general kind. This distinction between temporal and general rhythms has received some attention in literary criticism (Hrushovsky, 1960; Fowler, 1966), especially in regard to distinguishing between the rhythms of prose and free verse. On the subject of poetic rhythm, for example, Hrushovsky writes (pp. 180-81):

"We can observe many rhythmic factors: metrical sequences and deviations from their ideal norms; word boundaries and their relations to feet boundaries; syntactic groups and pauses and their relation to metrical groups (line, caesura); syntagmatic relations, word order, syntactic tensions; repetitions and juxtapositions of sound, meaning elements, etc. Practically everything in the written poem can contribute to the shaping of the rhythm ...."

The only referent of temporal rhythm in a poem, however, is meter, an idealization which the actual timing of the poem only approximates. Language, like poetry, has many levels of sequential constraint, and here again the more general sense of rhythm applies to their structure. Any grammatical rule, whether syntactic or phonological, which affects the order of closely-related formatives in a systematic way will have an effect on the eventual rhythmic structure of the phrase. How close this relationship between formatives must be, in order to contribute to the rhythm, depends in large amount upon the nature of the formatives: phonological rhythms generally work at close range, within the phrase and not beyond the sentence, whereas lexical and syntactic rhythms often span two or more sentences. For example, English tends to alternate stressed and unstressed syllables, both within polysyllabic words ("Alternating Stress Rule", Chomsky & Halle, 1968) and between usually unstressed function words alternating with stressed content words. These two close range grammatical constraints on stress placement, one phonological and the other syntactic, may be largely responsible for the perceived stress rhythm of English.

Although syntactic rules can and do influence language rhythm, by far the greater amount of rhythmic effect derives from phonological and phonetic constraints that have their roots in universal organizing principles of human perceptual and motor behavior and language- and speaker-specific constraints on speech production. The remainder of this paper is devoted to describing these behavioral principles and their relation to speech rhythm.
Rhythmic performance universals

Perception of rhythm

When we hear a sequence of pulses that is neither too rapid nor too slow we hear it as rhythmic (Woodrow, 1951; Fraisse, 1963). As long as the minimum time between pulses is greater than about 0·1 s, so that successiveness and order are perceivable, and the maximum is less than about 3·0 s, beyond which groupings do not form, we will impose some rhythmic structure on the sequence. With regular sequences of stimuli, such as a sequence of nearly identically spaced nearly identical clicks, the structures usually perceived are simple groupings of from two to six successive stimuli per group, with faster rates of succession giving more stimuli per group (Fraisse, 1963). Each group has one of its pulses perceptually stronger, usually the first or the last, but in groupings by fours a secondary weak beat is often perceived on the pulses next but one to the "strong" pulses, giving a fully alternating rhythm (Woodrow, 1951). This structure which we impose may change through time as our hypothetical rhythm of the moment fails to match the perceived sequence. Our ability to hear a rhythm where one does not in fact exist is very strong, however, and the kinds of rhythms we perceive are generally sequences and alternations.

Part of our ability to impose rhythm on a sequence may be due to our tendency to underestimate the duration of long time intervals and overestimate the duration of short ones. Early perceptual studies of rhythm suggested that each of us has a central time interval, a so-called "indifference interval", which is neither under- nor overestimated, this interval usually being in the range 0·6-0·8 s in duration (Fraisse, 1963; Woodrow, 1951). More recent work (Treisman, 1963; Michon, 1967) has challenged the uniqueness of this central interval, showing that it is different for different listeners and can even vary for a single listener. The over- and underestimation of relatively short and long intervals finds continued support, however, which may explain our ability to hear sequences of approximately equal time intervals as more equal than they really are.

Now suppose we listen to a sequence of equally spaced pulses with every other one louder. We will hear these pulses as a sequence of pairs, naturally enough, and the louder pulse will lead the group; that is, we will hear a sequence of trochees. But suppose we listen to a sequence of equally spaced pulses with every other pulse longer in duration. Again we will hear a sequence of pairs, but now the stronger pulse, i.e. the longer one, will come second in the group; we will hear a sequence of iambs. Differences in pitch act like differences in loudness, causing us to hear a sequence of trochees with the higher pitch leading. If every third pulse is louder or higher in pitch we will hear groups of three with the strong pulse leading the group; if every third pulse is longer, however, it will end the group of three. Finally, combinations of differences in loudness, pitch and duration can lead to complex rhythmic groupings.

Perception of speech rhythm

The metric foot types of English verse give us some idea of what rhythmic structures poets perceive in English speech. The most common are feet comprising two or three syllables, with the exclusion of the rarer amphibrach, a three syllable foot with accent on its center syllable. That is, the huge preponderance of English metrical verse has feet that are either two or three syllables long, with accent either beginning or ending the foot. Most of these meters will give rise to an alternating rhythm, since they have one or more unaccented syllables in each foot, the exceptions being the spondee (long-long) and the pyrrhic (short-short).
Since these last two are considered to be "substitute" foot types, however, and are therefore less often used, English must be considered to have primarily an alternating rhythm with two or three syllables per rhythmic group.

Although the meters of poetry of Germanic languages are based largely on the number and spacing of accented syllables in each line, Romance (and Japanese) poetry requires only a fixed number of syllables per line (Preminger, 1965, p. 497). That is, since accentuation plays a weaker role in Romance phonology, the poetry of these languages makes little use of differences in syllabic accent, grouping the syllables instead into sequences of equals. Natural language rhythms thus appear to be largely either simple alternations or successions. Stankiewicz (1960, pp. 77-81) explores and contrasts these accentual possibilities within the Slavic family and makes the interesting suggestion that a metrical form "fits" a language only if it allows sufficient room for the poet to violate it creatively, thus conforming the poem to the natural rhythm of the language.

The rhythmic intervals of speech are apparently also subject to the same centralizing tendency discussed above. For example, Coleman (1974) asked listeners to judge the time between successive stressed syllables (interstress intervals) in speech that was carefully constructed so as to contain intervals that were quite different in duration. In agreement with our general centralizing tendency, the subjects judged them to be more similar than they actually were, imposing a nonexistent equality. Abe (1967) has proposed that our feeling that English is stress timed (i.e. has equal durations between successive stresses) arises from the preponderance of intervals whose durations lie in the narrow range from 0·4 to 0·7 s (see p. 81). Our tendency to adjust our perceptions of interstress interval durations toward some central, or average, duration, when added to our tendency to impose a rhythm on any sequence of intervals, lends support to Abe's hypothesis.


English has a stress accent, and Fry has shown (1955) that our most important cue for stress is pitch, with duration and loudness being less important, in that order. But all three cues are highly correlated in their pattern of occurrence in English, the syllables that are higher in pitch usually also being longer and louder. We should not be surprised, therefore, that stressed syllables seem to lead the group in which they occur, except when they are extra long, as at the end of the phrase. I cannot say that this characterization of the normal rhythm of English as a sequence of falling stress contours (i.e. strong followed by one or more weak) all grouped into a generally rising pattern ended by the last strong stress (Chomsky & Halle, 1968; Martin, 1972) is a consequence of our perception, but at least there again is a correspondence between our perception of the rhythmic structures of speech and other rhythmic perceptions.

In French the rhythmic pattern is different. Successive syllables within a sense group are very similar in duration, pitch, and loudness, with the exception of the last, which is, on the average, higher in pitch, slightly softer, and much longer in duration (Delattre, 1966). Although a higher pitched syllable would tend to lead a rhythmic group, the great increase in duration and slight decrease in loudness co-operate with the semantic grouping to give a terminal accent, and thus a sequence of "rising" feet, to the rhythm of spoken French.

If these generalizations concerning rhythmic perception are to have any linguistic validity, they must hold for all languages, and the statement by Jakobson, Fant & Halle (1951) that speakers of different languages perceive identical rhythmic sequences differently as a consequence of the rhythmic pattern of their language stands in direct contradiction to this principle (pp. 10-11):

"Knocks produced at even intervals, with every third louder, are perceived as groups of three separated by a pause. The pause is usually claimed by a Czech to fall before the louder knock, by a Frenchman to fall after the louder; while a Pole hears the pause one knock after the louder. The different perceptions correspond exactly to the position of the word stress in the languages involved: in Czech the stress is on the initial syllable, in French, on the final and in Polish, on the penult. When the knocks are produced with equal loudness but with a longer interval after every third, the Czech attributes greater loudness to the first knock, the Pole, to the second, and the Frenchman, to the third."

The authors cite no reference, however, and contrary evidence comes from the mutual agreement of studies by Fraisse (1963) and Woodrow (1951) using French- and English-speaking subjects, respectively. It is possible, as Halle has suggested (personal communication, 1968), that different experimental situations might give rise to different effects of a subject's language on his perceptions, perhaps as a function of whether or not he is listening in the "speech mode" (Liberman, 1970). The question remains open.

The argument so far is that we perceive spoken language as rhythmic because it is fairly regular in its sequential sound patterns often enough that we can impose upon it simple rhythmic structures. The perceived rhythm will be different from language to language, however, depending on the nature of the sound sequences and their interrelations: languages with strong tonic accent (e.g. English and German) will have rhythmic groupings with the strong syllable leading; languages with accent based on duration (e.g. French) will have the strong syllable last; other aspects of the structure of the rhythmic group will depend upon language specific syntactic and phonological rules.

Production of motor rhythms

Our tendency to impose rhythms on partially structured sequences, which can account for much of perceived language rhythm, is reinforced by the natural rhythms of our movements. Miyake (1902), for example, studied the rhythmic structuring of various kinds of motor behavior and found that (1) it is impossible not to act rhythmically and (2) simple successions and alternations are most prevalent in our movements. Most studies of rhythmic behavior have involved movements of the head and jaw as well. All of them, however, have shown simple rhythmic structuring to be the norm, except when there is some external constraint on our behavior. For example, Rose & Pew (1972) examined finger movements of skilled pianists playing a simple five-note sequence and found, not surprisingly, an overall by-fives structure. Interestingly, however, there was often a two-plus-three subgrouping to the five-note sequence, suggesting that our tendency toward simple rhythmic structures in motor action is strong.

When we move rhythmically, the rate of succession of rhythmic "beats" is not arbitrary. Experiments have determined that different people prefer acting at different rates, but personally "preferred" rates (Woodrow, 1951) have been found to range around an average of about two acts per second. If we translate this into a time interval measure, we find that people tend to act at the rate of one beat every 0·5 s, when performing some rhythmic motor task. Wundt (1911) found preferred rates of between 0·3 and 0·5 s between acts, and Fraisse writes that the rate of succession of the "important" notes in a musical composition is between 0·15 and 0·90 s between notes (Fraisse, 1963, p. 89). In a study by Miles (1937), 80% of 200 subjects preferred rates of between 0·2 and 0·7 s between acts, although 11% preferred rates of greater than a second between acts (Michon, 1967, p. 9). There are of course differences in preferred rates of succession that depend on who we are and what we are doing, but on the average we have limits of about 0·2 and 1·0 s between acts when we are doing some motor task at our natural, preferred rate.
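
The figures in the preceding paragraph mix two conventions: acts per second and seconds between acts. As a rough sketch only (the function names, constant names, and the dictionary of reported ranges are illustrative assumptions, not part of any of the cited studies), the conversion and a check against the approximate 0·2-1·0 s limits might be written as follows.

```python
"""Convert between rate and interval for rhythmic acts (illustrative only)."""

# Approximate limits quoted in the text, in seconds between acts.
PREFERRED_MIN_S = 0.2
PREFERRED_MAX_S = 1.0


def interval_from_rate(acts_per_second: float) -> float:
    """Seconds between acts for a rate given in acts per second."""
    if acts_per_second <= 0:
        raise ValueError("rate must be positive")
    return 1.0 / acts_per_second


def in_preferred_range(interval_s: float) -> bool:
    """True if the interval lies within the quoted 0.2-1.0 s limits."""
    return PREFERRED_MIN_S <= interval_s <= PREFERRED_MAX_S


# Interval ranges (seconds between acts) as reported in the text.
REPORTED_RANGES = {
    "Wundt (1911)": (0.3, 0.5),
    "Fraisse (1963), musical notes": (0.15, 0.90),
    "Miles (1937), 80% of subjects": (0.2, 0.7),
}

if __name__ == "__main__":
    # Two acts per second corresponds to one beat every 0.5 s.
    print(interval_from_rate(2.0))
    for source, (shortest, longest) in REPORTED_RANGES.items():
        inside = in_preferred_range(shortest) and in_preferred_range(longest)
        print(f"{source}: {shortest}-{longest} s, within limits: {inside}")
```

In this sketch the lower bound that Fraisse gives for musical notes (0·15 s) falls just below the 0·2 s limit, which fits the description of the limits as approximate averages rather than hard boundaries.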
A third characteristic of rhythmic action is its variability. Various sources (Fraisse, 1963; Michon, 1967; Woodrow, 1951; Treisman, 1963) report various ranges for the error variability with which people produce time intervals, the overall range of standard errors being about 3-11% of the length of the interval they are producing. There is a difference between reproducing a given interval, in which case the error will be at the high end of the range, and producing one's own intervals, when the error will be lower. For example, if a subject is presented with a train of clicks, equally spaced between 0·2 and 1·0 s apart, and if he is asked to tap his finger at the same rate as the clicks, but after they have been turned off, then he will do so with average errors of about 7-11% of the standard interval. If he is allowed to tap freely but steadily at his own rate, however, his variability will be only 3-5% of the average.

Rhythm of speech production

What are the rhythmic beats of speech? In the motor activities discussed above, the beats were simple movements of some muscle or member, such as taps of the finger or bobs of the head. Speech, however, involves complex muscular co-ordination, with many actions proceeding at once. Allen (1972) showed that, at least in English, the onset of the nuclear vowel of a stressed syllable is a rhythmic "focus" in speech; that is, native speakers of English, when acting rhythmically in speaking or listening, place the downbeats of their rhythms in the close vicinity of those stressed vowel onsets. The onset of a vowel often is accompanied by large changes in muscular and acoustic energy (e.g. as the jaw opens), making this point a natural candidate for a rhythmic focus. Although many languages do not have stress accents to the same degree as English, all have some kind of accent. We can therefore generalize from the stress-beat in English to the onset of the nuclear vowel of any accented syllable as a potential rhythmic beat in speech.


The central position of the syllable in rhythmic description has been described by other writers on speech rhythm (see especially Fry, 1964); but because of the wide variability in the degree to which accented syllables differ in their phonetic shape from unaccented ones, different rhythmic patterns can result. Unstressed syllables in English, for example, are "reduced" in both quality and quantity to the extent that the resulting rhythmic pattern consists of the stressed syllables alternating with all of the intervening unstressed syllables, i.e. a sort of massed off-beat. When the unaccented syllables retain their phonetic shape, however, as in French or Japanese, the resulting rhythmic pattern remains tied as much to syllables as to accents. Stress rhythms are thus rhythms of alternation, whereas syllable rhythms are rhythms of succession; they have been referred to by Pike (1945) as "stress-timing" and "syllable-timing", respectively.

There appears to be an additional tendency toward accent alternation in many and perhaps all languages. In English, words with indeterminate lexical stress patterns will be accented one way in one environment and another way elsewhere. Bolinger (1965), from whom my examples are drawn, has documented this phenomenon in detail, and Vanderslice (1968) has discussed it somewhat more theoretically.¹ For example, the word "upstairs" will be stressed on the first syllable in the phrase "upstairs bedroom"; in the phrase "went upstairs" it is stressed on the second syllable. The position of the stress depends on the position of neighboring stresses, in this case the stresses on "bed-" and "went". Some names have indeterminate stress patterns, "Irene" being a good example for many speakers. If Irene's last name is "Dunn", she will stress either the first syllable or both syllables of her name, thus giving either an alternating or a sequential rhythm; if her last name is "McDermott", however, she will alternate the stresses by stressing the second syllable of "Irene".

Kiparsky (1966) writes of a similar phenomenon in German, under the heading of "Rhythmischer Nebenakzent". He points out for example that certain prefixes, such as "un-", "all-" or "in-" receive secondary stress in some words and not in others (unerträglich, allgemein, inkommensurabel vs. unendlich, allmächtig, infam), and writes that "Alles das können wir in der einen Regel zusammenfassen, dass unbetonte Silben am Wortanfang und am Wortende einen Sekundärakzent erhalten, wenn nicht unmittelbar vor oder nach ihnen eine betonte Silbe steht" (emphasis mine) [in English: "All of this can be summed up in the one rule that unstressed syllables at the beginning and at the end of a word receive a secondary accent unless a stressed syllable stands immediately before or after them"]. The resemblance between this rule and the abovementioned alternation rule for English is strong.

Even Spanish, a syllable timed language, appears to alternate stresses under similar conditions.² Although Spanish has a simple and almost completely general stress rule for words in isolation, some words can have stress on different syllables depending, once again, on the positions of the neighboring stresses. For example, the word "hasta" should be stressed on the first syllable, as it is in "Fui hasta Mexico". But the stress shifts, or at least becomes stronger, on the second syllable of "hasta" in the sentence "Fui hasta Monterrey", because of the unstressed syllables of "Monterrey" that directly follow "hasta". Other examples involve both high frequency words (esta, desde, para) and less common words (francamente, dimelo).

Although the existence of an alternation rule of general application to all languages must be confirmed by further examination of linguistic data, the evidence presented above does give a starting point for such a search. In addition, however, this alternation rule must have some origin, or explanation, for which our tendency to act rhythmically is not a very strong candidate, since successions are presumably as rhythmic as alternations. Perhaps accent, in order to be perceived, must alternate with non-accent, this alternation allowing the speaker to keep himself and his listener continually informed as to endpoints of the accentual dimension.

¹ I am not referring here to phenomena included in the stress rules of Chomsky & Halle (1968) but rather to a class of alternations they leave unexplained. They write (p. 117), "Now, by a phenomenon superficially similar to the one we have formalized in terms of rules (108) and (107), the resulting x 21 contour is converted to 231. Similarly, in a sequence such as tired old men, the 221 contour produced by the Nuclear Stress Rule is generally converted to 231, perhaps by the same process. We do not know precisely what the domain of this process is, or how it should be described in detail. We merely note here that our description, which is limited in scope to the word, is insufficiently general." Chomsky & Halle's rules (107) and (108) are "deep" phonological rules, applied long before the phonetic detail of the surface structure is specified. The examples I cite are described by a rule which reorganizes the ordering of surface stresses and which is therefore not a deep rule. The later examples from German and Spanish seem also of this surface variety, although the concurrent existence of deep rhythm rules would require reinterpretation of these surface rules.

The rate of succession of rhythmic speech acts falls into the same 0·2-1·0 s range as for other motor behaviors.
Several studies on interstress intervals in English are in good agreement: Shen & Peterson (1962) measured the intervals between all the stresses of a few minutes' reading by three speakers, and although many of the intervals span terminal junctures, and so are perhaps not directly relevant to the measurement of rhythm, most of their measurements fell between 0·2 and 0·8 s; Allen (1972) found interstress intervals ranging from 0·3 to 0·6 seconds in a small number of conversational utterances from three speakers; and Abe (1967) measured similar intervals in a "fast reading" by one speaker, most of them falling in the range from 0·4 to 0·7 seconds. We may expect a similar range for other languages with stress rhythms, such as German, but no relevant data appear to exist at this time.

For languages with syllable based rhythms, the problem of measuring rate is a little more difficult. On the one hand, the rate of succession of syllables is usually quite rapid in conversational speech, ranging from an average of about 0·1 s per mora for Japanese (Wang, 1968) to about 0·2 s per syllable for French (Malecot, Johnston & Kizziar, 1972). On the other hand, the rate of succession of rhythmic groups is slower, since there are, in almost all cases, two or more syllables per group. French speakers, for example, appear to produce the majority of such rhythmic groups at rates between 0·4 and 1·7 per second (Malecot et al., 1972). The rate of syllable succession thus falls at the low end of the range of preferred rates of motor action, whereas the rate of rhythmic grouping spans the middle and high end of the range. Although the precise nature of the rhythmic groupings in syllable based languages deserves greater attention, the rate of succession of syllables and groups appears to be centered around the range of rates of motor behaviors other than speech.

The temporal variability of speech also shows a range similar to that of other motor behaviors.
Peterson & Lehiste (1960) measured the durational characteristics of words spoken in the test frame "Say the word-- again", as uttered by Peterson and others.

² My examples for Spanish were furnished by J. Cruz-Salvadores and R. Gingras, both native speakers of Spanish, though of different dialects. I am indebted to them for their help in furnishing and checking these examples with other native speakers and for the further observation, by Gingras, that this alternation phenomenon may also exist in Quechua.


In the set of 1263 repetitions by Peterson the average frame duration was 174 cs, with a standard deviation of 6·9 cs, or about 4% of the average. Kozhevnikov & Chistovich (1965), in their research on the time relationships within and between sense groups in an utterance, measured the variations of the durations of different elements (syllables, phonemes, words) in the speech. Their subjects strove to repeat an utterance (usually "Tonya topia banyu.") as precisely as possible, and the relative variability of shorter segments (phonemes and syllables) of the speech was compared with the variability of longer segments (syllables and words). Their conclusions are worth quoting here (p. 101):

"We were not able to find (in the literature) data concerning fluctuations of the duration of large segments of the (sense group); however, in some works data is given on fluctuations of the duration of individual sounds of speech (Kaiser, 1939; Stetson, 1951). The cited data indicates that the relative error of the duration of one and the same sound of speech in the case of a constant rate of pronunciation amounts to 10-20%. This fully agrees with that which we observed in our material. It is significant that with such a large relative error of the duration of the individual sounds of speech the relative error of the duration of the entire (sense group) amounts to only about 3%."

The variability of production of speech segments thus matches nicely the variability of production of more general rhythmic intervals: short speech segments have variabilities at the high end of the range, about 10%; longer segments are at the lower end of the range, about 4%. More recently, the relationship between the variabilities of speaking and finger tapping has been investigated directly (Allen, 1973; Cooper & Allen, 1973; Tingley & Allen, 1975). Measures of speaking and tapping variability are not only similar in magnitude, on the average, but also show a moderate correlation across individuals; that is, individuals with poor temporal control in their speech also tap their fingers more variably. These data support a close relationship between speech timing control and the control of other temporal behavior.

Language is produced by humans and perceived by humans, and it appears to be governed by the same rhythmic constraints as other human motor and perceptual behaviors. These constraints thus set limits on the kinds of rhythms we can expect in languages of the world: they should be simple in structure, confined largely to successions and alternations, depending on the relationship between syllables and stress-accent in the language; the rate of succession of syllables and rhythmic groups should be in or near the range of 0·2-1·0/s; and the variability with which the time program of the rhythm is realized should conform to the variability of other skilled motor acts.

These limitations leave a great deal about the rhythm of a given language or dialect unspecified. For example, English and German both have strong tonic stress yet are presumably not rhythmically identical. Likewise, some speakers of English sound more rhythmic than others, even though they speak the same language. These more "delicate" differentiating characteristics of speech rhythm may be due in large part to aspects of articulatory timing to which the remainder of this paper is devoted. Since these constraints are no longer language universal, they must be learned; therefore, although articulatory timing may appear on the surface to fall within the domain of phonetic performance, the aspects of interest here must be considered part of the speaker-hearer's competence.
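
The variability figures quoted in this section are all standard deviations expressed as a percentage of the mean duration. As a minimal sketch, the calculation can be written out as below; the function names, the constant name, and the tapping intervals are hypothetical and were invented for illustration, while the 174 cs and 6·9 cs values are the frame measurements reported above.

```python
"""Relative variability of timed intervals (illustrative only)."""
from statistics import mean, stdev

# Approximate range of relative variability quoted for rhythmic motor acts.
MOTOR_RANGE_PERCENT = (3.0, 11.0)


def relative_variability(mean_duration: float, sd_duration: float) -> float:
    """Standard deviation as a percentage of the mean duration."""
    if mean_duration <= 0:
        raise ValueError("mean duration must be positive")
    return 100.0 * sd_duration / mean_duration


def relative_variability_of(durations: list[float]) -> float:
    """Same measure computed from raw durations (sample standard deviation)."""
    return relative_variability(mean(durations), stdev(durations))


def within_motor_range(percent: float) -> bool:
    """True if the value lies within the quoted 3-11% range."""
    low, high = MOTOR_RANGE_PERCENT
    return low <= percent <= high


if __name__ == "__main__":
    # Frame durations reported for Peterson: mean 174 cs, SD 6.9 cs.
    frame = relative_variability(174.0, 6.9)
    print(f"frame: {frame:.1f}% (within motor range: {within_motor_range(frame)})")

    # Hypothetical tapping intervals in seconds, invented for this example.
    taps = [0.50, 0.52, 0.48, 0.51, 0.49]
    print(f"taps: {relative_variability_of(taps):.1f}%")
```

The same function applies to speech and to tapping data, which is what makes the two kinds of measurement directly comparable in the studies cited.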
Articulatory timing

The first part of this paper suggests that when we speak, we impose certain simple rhythmic structures on our motor actions; and when we listen, we constrain the rhythmic sequence still further. Even though these forces do in part determine the rhythm of an utterance, however, they act on a sequence of articulations that have a temporal structure of their own, and the degree to which the idealized articulatory program bends to these forces is of equal importance to the eventual phrase rhythm. For example, the syllables of polysyllabic English words are known to differ in their ability to carry stress-accent: some syllables (e.g. "-tion") can never carry stress-accent, except in contrastive citation usage; others may or may not be accentuated, depending upon context. Vanderslice and Ladefoged (1972) refer to these as "light" and "heavy" syllables, respectively, and agree with Chomsky & Halle (1968) that English speakers (of the same dialect) know and generally agree which are which. The phonetic difference between heavy and light syllables, according to Vanderslice and Ladefoged, is that the former have "full articulations", the latter "reduced timing" (p. 820). Thus, the rhythmic forces alluded to can affect the articulatory timing of some syllables more than others, this difference being one of the complicating characteristics of English speech rhythm. For example, consider the following pair of sentences (Bolinger, personal communication, 1969):

(1) Irene's pet chimpanzee Nimrod dotes on fresh horehound drops.
(2) George's little dog Fido is crazy about chocolate candy.

There are no light syllables in (1), so that, regardless of where the accents fall, syllable durations are pretty much fixed at their full value. In (2), however, there are at least four light syllables which may change their articulatory shape as a function of the rhythmic context. The second syllable of the phrase will, for example, be greater or less in duration depending upon whether or not "little" is stressed; in other words, the duration of a light syllable depends upon the duration of the rhythmic group to which it belongs.
Similar effects have been observed for languages other than English. The exact duration of a syllable depends upon both its position and the number of syllables in the word in Czech (Ondráčková, 1962), Swedish (Lindblom, 1968) and Dutch (Nooteboom & Slis, 1972), and the duration of a syllable in French depends in part upon the number of other syllables in the sense group (Malécot et al., 1972). The strength of this rhythmic effect on syllable duration is, however, different in these different languages, being so strong in English as to lead to a feeling of stress timing yet so weak in French as to go virtually unnoticed. Such interlanguage differences suggest that, although these temporal effects may be due to rhythmic performance universals, they have been incorporated into the phonology of each language, sometimes in different ways.
There is other evidence to support the idea that rhythmic universals act from within phonology rather than as external constraints on performance. In a series of experiments devoted to locating the rhythmic "beats" of English speech, Allen (1968, 1972) found a consistent displacement of any given token of an utterance with respect to the rhythmic pattern that underlay it; that is, if a subject were trying to speak a series of syllables in time to a series of heard clicks, for example, and if the first syllable were uttered a little earlier than usual with respect to its click, then the other syllables would likely be earlier than usual with respect to their clicks also. In other words, the speaker appeared to have listened to the clicks, decided when they were going to occur, and uttered the syllable sequence with respect to his predictions, rather than matching his utterance on a click-by-click basis. Timing information for the entire phrase was precomputed and stored before the utterance began.
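The contrast between a precomputed phrase plan and click-by-click matching can be made concrete with a small simulation. The Python sketch below is not a reconstruction of the analysis in Allen (1968, 1972); it assumes, hypothetically, that a precomputed plan shows up as one timing shift shared by all syllables of a token plus a small independent error per syllable, whereas click-by-click matching shows up as independent errors only. The function names, standard deviations, and token counts are all illustrative assumptions.

```python
import random


def simulate_offsets(n_tokens, n_syllables, shared_sd, local_sd, rng):
    """Syllable-to-click offsets (ms) for each token of an utterance."""
    tokens = []
    for _ in range(n_tokens):
        shift = rng.gauss(0.0, shared_sd)  # one displacement for the whole token
        tokens.append([shift + rng.gauss(0.0, local_sd)
                       for _ in range(n_syllables)])
    return tokens


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def first_vs_rest(tokens):
    """Correlate the first syllable's offset with the mean of the others."""
    firsts = [t[0] for t in tokens]
    rests = [sum(t[1:]) / (len(t) - 1) for t in tokens]
    return pearson(firsts, rests)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Precomputed plan (assumed): large shared shift, small per-syllable error.
planned = simulate_offsets(500, 4, shared_sd=20.0, local_sd=5.0, rng=rng)
# Click-by-click matching (assumed): no shared shift, independent errors.
tracked = simulate_offsets(500, 4, shared_sd=0.0, local_sd=20.0, rng=rng)

print(round(first_vs_rest(planned), 2), round(first_vs_rest(tracked), 2))
```

Under these assumptions the first-syllable offset predicts the remaining offsets strongly in the planned case and hardly at all in the tracked case, which is the pattern of consistent within-token displacement described above.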
Other studies have sought to determine the units for neural coding of this articulatory timing within the phrase (Kozhevnikov & Chistovich, 1965; Allen, 1973; Lehiste, 1970; Ohala, 1973), hoping to find temporal structures within such linguistically motivated units as syllables or words. Unfortunately, the search for such a temporal consonant-vowel syllable unit has so far gone unrewarded, both for Russian (Kozhevnikov & Chistovich, 1965) and for English (Allen, 1973; Ohala, 1973). No particular phonetic substring within the phrase seems any more temporally coherent (or incoherent) than any other.
From these studies, then, emerges the following rather simple model of articulatory timing control. Prior to the initiation of speech, the phonetic features (including time values) for the entire phrase are computed and stored as an ordered string in an output buffer.³ As noted above, there is no evidence for anything more complex than a segment-by-segment structure at this level of articulation, higher-order relationships presumably being expressed as particular choices of segmental attributes by the phonological component. The phrase is then translated into neuromotor commands by the speech motor control program, which controls the exact articulatory timing by comparing the stored segmental time values to some neural clock (Allen, 1973).
This work has strong implications for speech rhythm: there appears to be no temporal structure imposed on the phrase after the phonological output, so that any rhythmic structures originate either in the speaker's grammar or in the listener's perceptions. Therefore, although the perceptual universals discussed earlier no doubt partially determine the resulting perceived rhythm, the motor production universals apparently can affect rhythm only through their influence on the grammar of the language.
The value of such shared rhythmic constraints for communicative efficiency is apparent. As Martin (1972, p. 488) states:
"(Temporally) patterned speech sounds could be redundant with respect to linguistic message elements to a far greater extent than sounds that are only concatenated. Furthermore, since rhythmically patterned sounds have a time trajectory that can be tracked without continuous monitoring, perception of initial elements in a pattern allows later elements to be anticipated in real time.... If ... informative elements are nonadjacent and temporally predictable, then certain efficient perceptual strategies (e.g., attention cycling between input and processing) might be facilitated. Perception of concatenated sounds, on the other hand, would seem to require continuous attention."
Seen in this light, then, speech rhythm functions mainly to organize the information-bearing elements of the utterance into a coherent package, thus permitting speech communication to proceed efficiently. Rhythm therefore does not carry much linguistic information, other than helping to signal the language of the speaker; without rhythmic organization, however, the linguistic message would be difficult to transfer.
This investigation was supported in part by NIH research grant number DE 02668 from the National Institute of Dental Research, and by NSF grant number GS-41863.

³ Some possible rules entering into this computation have been described for Swedish by Lindblom & Rapp (1973).

References
Abe, I. (1967). English sentence rhythm and synchronism. Bulletin of the Phonetic Society of Japan 125, 9-11.
Allen, G. D. (1968). Experiments in speech rhythm. Journal of the Acoustical Society of America 44, 377 (Abstract).
Allen, G. D. (1972). The location of rhythmic stress beats in English speech. Parts I & II. Language and Speech 15, 72-100, 179-95.

Allen, G. D. (1973). Segmental timing control in speech production. Journal of Phonetics 1, 219-37.
Bolinger, D. (1965). Forms of English (I. Abe and T. Kanekiyo Eds). Cambridge, Massachusetts: Harvard University Press.
Chomsky, N. & Halle, M. (1968). The Sound Pattern of English. New York: Harper and Row.
Coleman, C. (1974). A study of acoustical and perceptual attributes of isochrony in spoken English. Unpublished dissertation, Department of Speech, University of Washington, Seattle.
Cooper, M. H. & Allen, G. D. (1973). Accuracy of timing control in speech. ASHA 15, 432 (Abstract).
Delattre, P. (1966). A comparison of syllable length conditioning among languages. IRAL 4, 183-98.
Fowler, R. (1966). "Prose rhythm" and metre. Essays on Style and Language (R. Fowler Ed.). Pp. 82-99. London: Routledge and Kegan Paul.
Fraisse, P. (1963). The Psychology of Time. New York: Harper and Row.
Fry, D. B. (1955). Duration and intensity as physical correlates of linguistic stress. Journal of the Acoustical Society of America 27, 765-8.
Fry, D. B. (1964). The function of the syllable. Zeitschrift für Phonetik 17, 215-21.
Halliday, M. A. K. (1963). The tones of English. Archivum Linguisticum 15, 1-28.
Hrushovsky, B. (1960). On free rhythms in modern poetry. Style in Language (T. Sebeok Ed.). Pp. 173-90. Cambridge, Massachusetts: MIT Press.
Jakobson, R., Fant, C. G. M. & Halle, M. (1951). Preliminaries to Speech Analysis. Cambridge, Massachusetts: MIT Press.
Kaiser, L. (1939). Biological and statistical research concerning the speech of 216 Dutch students. Archives Néerlandaises de Phonétique Expérimentale 15, 1-76.
Kiparsky, P. (1966). Über den deutschen Akzent. Studia Grammatica 7, 69-98.
Kozhevnikov, V. A. & Chistovich, L. A. (1965). Speech: Articulation and Perception. Joint Publications Research Service 30,543. U.S. Department of Commerce, Washington, D.C.
Lehiste, I. (1970). Temporal organization of higher-level linguistic units. Journal of the Acoustical Society of America 48, 111 (Abstract).
Liberman, A. (1970). The grammars of speech and language. Cognitive Psychology 1, 301-23.
Lindblom, B. (1968). Temporal organization of syllable production. Speech Transmission Laboratory, Quarterly Progress and Status Report 2-3, 1-5. Stockholm: Royal Institute of Technology (KTH).
Lindblom, B. & Rapp, K. (1973). Some temporal regularities of spoken Swedish. Papers from the Institute of Linguistics 21. University of Stockholm.
Malécot, A., Johnston, R. & Kizziar, P.-A. (1972). Syllabic rate and utterance length in French. Phonetica 26, 235-51.
Martin, J. G. (1972). Rhythmic (hierarchical) versus serial structure in speech and other behavior. Psychological Review 79, 487-509.
Michon, J. A. (1967). Timing in Temporal Tracking. Institute for Perception RVO-TNO. Soesterberg, The Netherlands.
Miles, D. W. (1937). Preferred rates in rhythmic responses. Journal of Genetic Psychology 16, 427-69.
Miyake, I. (1902). Researches on rhythmic action. Studies from the Yale Psychological Laboratory (E. W. Scripture Ed.). 10, 1-48.
Nooteboom, S. & Slis, I. H. (1972). The phonetic feature of vowel length in Dutch. Language and Speech 15, 301-16.
Ohala, J. J. (1973). The temporal regulation of speech. Symposium on auditory analysis and perception of speech, Leningrad, August 21-24.
Ondráčková, J. (1962). Contribution to the question concerning the rhythmical units in Czech. Phonetica 8, 55-72.
Peterson, G. E. & Lehiste, I. (1960). Duration of syllable nuclei in English. Journal of the Acoustical Society of America 32, 693-703.
Pike, K. L. (1945). The Intonation of American English. Ann Arbor: University of Michigan Press.
Preminger, A. (Ed.) (1965). Encyclopedia of Poetry and Poetics. Princeton University Press.
Rose, A. & Pew, R. W. (1972). Motor programs and time sharing in skilled performance. 44th Annual meeting of the Midwestern Psychological Association, Cleveland, Ohio, May 5.
Shen, Y. & Peterson, G. G. (1962). Isochronism in English. Studies in Linguistics, Occasional papers 9. Buffalo, New York.
Stankiewicz, E. (1960). Linguistics and the study of poetic language. Style in Language (T. Sebeok Ed.). Pp. 69-81. Cambridge, Massachusetts: MIT Press.
Stetson, R. H. (1951). Motor Phonetics. Amsterdam: North Holland.
Tingley, B. M. & Allen, G. D. (1975). Development of speech timing control in children. Child Development (in press).
Treisman, M. (1963). Temporal discrimination and the indifference interval. Psychological Monographs 77 (Whole Number 576).
Vanderslice, R. (1968). Synthetic elocution. Working papers in phonetics 8. UCLA Phonetics Laboratory. Los Angeles, California.

Vanderslice, R. & Ladefoged, P. (1972). Binary suprasegmental features and transformational word-accentuation rules. Language 48, 819-38.
Wang, W. S.-Y. (1968). The basis of speech. Project on Linguistic Analysis Reports, second series 4. Phonology Laboratory, Berkeley, California.
Woodrow, H. (1951). Time perception. Handbook of Experimental Psychology (S. S. Stevens Ed.). Pp. 1224-36. New York: Wiley.
Wundt, W. (1911). Grundzüge der physiologischen Psychologie. 6th ed. Leipzig: Engelmann.