Rhythm in language acquisition

Accepted Manuscript

Title: Rhythm in language acquisition
Authors: Alan Langus, Jacques Mehler, Marina Nespor
PII: S0149-7634(16)30120-8
DOI: http://dx.doi.org/10.1016/j.neubiorev.2016.12.012
Reference: NBR 2687
To appear in: Neuroscience and Biobehavioral Reviews
Received date: 1-3-2016
Revised date: 17-11-2016
Accepted date: 12-12-2016

Please cite this article as: Langus, Alan, Mehler, Jacques, Nespor, Marina, Rhythm in language acquisition. Neuroscience and Biobehavioral Reviews, http://dx.doi.org/10.1016/j.neubiorev.2016.12.012

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

Rhythm in language acquisition

Alan Langus1, Jacques Mehler1 and Marina Nespor1

Affiliations: 1. SISSA – International School for Advanced Studies, via Bonomea 265, 34136, Trieste, Italy

Correspondence: Marina Nespor ([email protected])

Highlights

- The rhythm of spoken language is structured hierarchically
- Rhythm is universal at the segmental level, the level of metrical feet and the level of phonological phrases
- Human infants are sensitive to speech rhythm early in development and may use this sensitivity for breaking into the speech code

Abstract

Spoken language is governed by rhythm. Linguistic rhythm is hierarchical and the rhythmic hierarchy partially mimics the prosodic as well as the morpho-syntactic hierarchy of spoken language. It can thus provide learners with cues about the structure of the language they are acquiring. We identify three universal levels of linguistic rhythm – the segmental level, the level of the metrical feet and the phonological phrase level – and discuss why primary lexical stress is not rhythmic. We survey experimental evidence on rhythm perception in young infants and native speakers of various languages to determine the properties of linguistic rhythm that are present at birth, those that mature during the first year of life and those that are shaped by the linguistic environment of language learners. We conclude with a discussion of the major gaps in current knowledge on linguistic rhythm and highlight areas of interest for future research that are most likely to yield significant insights into the nature, the perception, and the usefulness of linguistic rhythm.

Key words: linguistic rhythm, language acquisition, perception, consonants, vowels, metrical feet, lexical stress, phonological phrases.

1. Introduction

Most natural and cultural phenomena are pervaded by rhythm. The waves of the sea move rhythmically, we walk rhythmically, our heart beats rhythmically, and music as well as dance are rhythmic. But what is rhythm? The most general definition of rhythm was given by Plato in The Laws: rhythm is order in movement. The task of scholars working on language is thus that of identifying the elements that establish rhythm in the flow of speech. Rhythm in speech can also be seen as the alternation of stronger and weaker elements at different levels of the prosodic hierarchy (Nespor & Vogel, 1986, 2007; Nespor, 1990). And among the tasks of scholars working on language acquisition is that of identifying the grammatical properties that can be acquired on the basis of rhythm. At the lowest level – the segmental one – rhythm is signalled through the alternation between consonants and vowels. It determines how frequently and how regularly vowels occur (Ramus, Nespor & Mehler, 1999). Above the segmental level, linguistic rhythm is manifested in the alternation of stressed elements as defined in the iambic-trochaic law (ITL). Originally proposed to describe rhythm in music (Bolton, 1894; Woodrow, 1951; Cooper & Meyer, 1960), the ITL states that if tones alternate in duration, they are grouped as iambs (weak-strong / short-long), while if they alternate in intensity they are grouped as trochees (strong-weak / loud-soft). At the level of the foot, syllables bearing secondary stresses alternate with stressless syllables and are grouped as predicted by the ITL (Hayes, 1995). At the level of the phonological phrase (PHPH), words with PHPH stress alternate with words that have lexical stress but are weak at the PHPH level. Mainly duration has been shown to characterize iambic grouping, and mainly pitch, in addition to intensity, to characterize trochaic grouping (Nespor, Shukla, van de Vijver, Avesani, Schraudolf & Donati, 2008).

In the present paper, we will concentrate on the function rhythm can play in first language acquisition in the prelexical period. It is known that when infants start uttering two-word phrases, they put the words in the order of the language of their environment, e.g. verb-object in English or French and object-verb in Turkish or Japanese (Hirsh-Pasek, Kemler Nelson, Jusczyk, Wright Cassiday, Druss & Kennedy, 1987; Hirsh-Pasek & Golinkoff, 1996). It is also known that 6-9 month old infants know some common words (Bergelson & Swingley, 2012). How do they acquire word order and words so early in infancy? We will propose that rhythm – to which infants are very sensitive – plays an important role in these achievements. The two levels we will discuss are the basic rhythmic level, which could contribute to word segmentation, thus helping in the acquisition of words, and the level of the phonological phrase, which has been shown to be crucial for the acquisition of word order. We will not discuss rhythm at the level of the foot, since it is hardly audible in connected speech in most languages; in addition, to the best of our knowledge, it has never been proposed to help infants bootstrap into the language of their environment. We will also not discuss word stress since, as we will briefly show below, it does not alternate: it is thus not rhythmic. Finally, we will address the nature of the learning mechanisms involved in these different acquisitions by discussing which mechanisms are domain-general and which are specific to language, and which are present at birth, thus innate, and which require exposure to language to develop.

2. The basic rhythmic level

According to the classic theory of rhythm, the world's languages belong to one of three rhythmic classes, defined as stress-timed, syllable-timed and mora-timed (Pike, 1945; Abercrombie, 1967). This means that there should be isochrony (the speech signal divided into equal intervals) either at the level of the foot, i.e. between two syllables, one of which receives prominence while the other does not, as e.g. in English or Dutch; at the level of the syllable, as e.g. in Italian or Spanish; or at the level of the mora, as e.g. in Japanese. Isochrony has, however, not been found in the actual speech signal: in English, interstress intervals increase in duration in proportion to the number of syllables they contain (Shen & Peterson, 1962; O'Connor, 1965; Lea, 1974). Interstress intervals also vary depending on the structure of the syllables and on their position in the utterance (Bolinger, 1965). Finally, the duration of interstress intervals is similar in stress-timed English, on the one hand, and syllable-timed Spanish, Italian and Greek, on the other (Dauer, 1983).

Despite the failure to find isochrony in the speech signal, it is clear that languages belonging to the same rhythmic class sound more similar to each other than languages that belong to different rhythmic classes. The most plausible explanation for the different rhythms at the basic level thus lies not in isochrony but rather in the regularity with which consonants and vowels alternate in the utterance. It has been proposed that rhythm at the most basic level – i.e. the segmental level – can be measured through the percentage of vocalic space (%V) in the speech stream and the standard deviation of consonantal intervals (ΔC) (Ramus et al., 1999). Languages with a very small syllabic inventory, like Japanese with just 3 syllable types (syllables consisting of a vowel, consonant-vowel and consonant-vowel-consonant), thus have a high %V and a low ΔC. At the other extreme are languages with a very rich syllabic repertoire, such as English with 16 or Dutch with 19 syllable types. And in the middle are languages like Italian with 8 syllable types and Spanish with 9. Importantly, when the two measures are compared across languages, languages group together according to the three rhythmic classes, as seen in Figure 1. This provides further evidence that the alternation of consonants and vowels, rather than isochrony, signals rhythm at the segmental level. Even though other metrics have been proposed for classifying languages rhythmically at the segmental level (Frota & Vigario, 2001; Grabe & Low, 2002; Wagner & Dellwo, 2004; White & Mattys, 2007), only the account of Ramus et al. (1999) has been tested in young infants through their ability to discriminate languages on the basis of segmental rhythm.
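Both measures can be computed directly from a segmentation of an utterance into vocalic and consonantal intervals. The sketch below is a minimal illustration of the two metrics, assuming interval durations (in seconds) have already been extracted from an annotated recording; the function name and the toy durations are our own:

```python
def rhythm_metrics(intervals):
    """Compute %V and Delta-C from a list of (kind, duration) intervals,
    where kind is 'V' (vocalic) or 'C' (consonantal) and duration is in seconds."""
    vowels = [d for kind, d in intervals if kind == 'V']
    consonants = [d for kind, d in intervals if kind == 'C']
    total = sum(vowels) + sum(consonants)
    percent_v = 100.0 * sum(vowels) / total  # %V: proportion of utterance time spent in vowels
    mean_c = sum(consonants) / len(consonants)
    # Delta-C: (population) standard deviation of consonantal interval durations
    delta_c = (sum((d - mean_c) ** 2 for d in consonants) / len(consonants)) ** 0.5
    return percent_v, delta_c

# Toy example: intervals from a short CV.CV.CVC fragment
percent_v, delta_c = rhythm_metrics(
    [('C', 0.08), ('V', 0.12), ('C', 0.07), ('V', 0.11), ('C', 0.15), ('V', 0.10), ('C', 0.06)]
)
```

On this toy fragment, roughly half the time is vocalic (high %V) and the consonantal intervals vary little (low ΔC), which would place it toward the mora-/syllable-timed region of Figure 1.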

Figure 1. ΔC, the standard deviation of the consonantal intervals, vs. %V, the amount of time per utterance spent in vowels, for 14 languages. The widths of the ellipses along the two axes represent standard errors of the mean along those axes. Dark ellipses represent head-initial languages, and light ellipses head-final languages. Figure from Nespor, Shukla, and Mehler (2011). In the upper left corner are the stress-timed languages, with low %V and high ΔC. In the lower right corner are the mora-timed languages, with high %V and low ΔC. In the middle are the syllable-timed languages, in between the other two groups on both measures.

There is experimental evidence that shows that young infants are sensitive to differences in segmental rhythm between different languages. For example, newborn infants can discriminate between the so-called stress-timed and syllable-timed languages (French-Russian and English-Italian: Mehler et al., 1988; English-Spanish: Moon, Cooper & Fifer, 1993) and between the so-called stress-timed and mora-timed languages (English-Japanese: Nazzi et al., 1998; Dutch-Japanese: Ramus et al., 2000). However, they fail to discriminate between languages belonging to the same rhythmic class (English-Dutch: Nazzi et al., 1998). Newborns can also discriminate between a set of mixed English and Dutch (stress-timed) sentences and a set of mixed Spanish and Italian (syllable-timed) sentences, but fail when the sentences forming a set do not belong to the same rhythmic class (Nazzi et al., 1998). The evidence from language discrimination is therefore consistent with the hypothesis that newborn infants are sensitive to the basic rhythmic differences between languages.

To rule out the possibility that discrimination occurs through phonotactic and prosodic cues, newborns have also been tested with stimuli where the available speech cues are selectively reduced. For example, Mehler et al. (1988) showed that newborns succeed in discriminating rhythmically different languages when presented with low-pass filtered stimuli, thus eliminating most of the phonetic information and preserving only its prosodic properties (see also Nazzi et al., 1998; Ramus et al., 2000). Importantly, because low-pass filtered speech contains both segmental rhythm and intonation, rhythmic discrimination at the segmental level can only be shown by resynthesizing the speech stimuli. Ramus and Mehler (1999) used four different transformations that preserved: (a) broad phonotactics and prosody by replacing fricatives with /s/, vowels with /a/, liquids with /l/, plosives with /t/, nasals with /n/ and glides with /j/ (Saltanaj); (b) prosody by replacing all consonants with /s/, and all vowels with /a/ (Sasasasa); (c) rhythm only by flattening the intonational contours from the sasasa transformation (flat Sasasasa), and (d) intonation only by replacing all phonemes with /a/ (Aaaaaaaa). The results show that while adult listeners can discriminate rhythmically different languages (e.g. English and Japanese) in all conditions except in the intonation (Aaaaaaaa) condition (Ramus & Mehler, 1999), newborn infants succeed in discriminating rhythmically different languages with Saltanaj sentences without intonation but not Sasasasa sentences (Ramus, 2002). While adult listeners can discriminate languages from different rhythmic classes on the basis of segmental rhythm alone, newborn infants additionally require broad phonotactic information (e.g. the information about the syllabic complexity) to succeed in the same task. 
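The four resynthesis conditions can be pictured as phoneme-class substitutions. The sketch below is a toy, purely symbolic rendering of the mappings described above (the class membership sets are simplified assumptions, and the actual resynthesis additionally manipulates the acoustic signal, e.g. flattening pitch contours for the flat Sasasasa condition, which a transcription cannot show):

```python
# Broad phoneme classes (simplified, illustrative membership sets)
CLASSES = {
    'fricative': set('fvszh'), 'vowel': set('aeiou'), 'liquid': set('lr'),
    'plosive': set('ptkbdg'), 'nasal': set('mn'), 'glide': set('jw'),
}

# Replacement phoneme per class in the Saltanaj condition
SALTANAJ = {'fricative': 's', 'vowel': 'a', 'liquid': 'l',
            'plosive': 't', 'nasal': 'n', 'glide': 'j'}

def classify(phoneme):
    for name, members in CLASSES.items():
        if phoneme in members:
            return name
    return None

def saltanaj(transcription):
    """Preserve broad phonotactics: each phoneme becomes its class exemplar."""
    return ''.join(SALTANAJ.get(classify(p), p) for p in transcription)

def sasasa(transcription):
    """Preserve only the C/V alternation: consonants -> /s/, vowels -> /a/."""
    return ''.join('a' if classify(p) == 'vowel' else 's' for p in transcription)

def aaaa(transcription):
    """Preserve only the intonation carrier: every phoneme -> /a/."""
    return 'a' * len(transcription)
```

Note how Sasasasa keeps the consonant/vowel rhythm (%V and interval structure) while discarding syllabic complexity, which Saltanaj retains.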
This suggests that the perception of segmental rhythm emerges from neonates' sensitivity to syllable complexity, which gradually develops into a sensitivity to the variation in consonantal intervals and the amount of vocalic space. Because discrimination between rhythmically different languages can be achieved through segmental cues alone, and is boosted by broad phonotactics in newborns, it may be suggested that segmental rhythm is independent of prosodic factors such as stress and prominence.

The next question we address is what infants can achieve through the perception of the basic rhythmic level. First and foremost, newborns' ability to discriminate rhythmically different languages on the basis of broad phonotactic information (Ramus, 2002) suggests that humans are sensitive from birth to the relative space occupied by vowels and consonants in the speech stream. While the ability to discriminate rhythmically different languages does not necessarily imply that infants must be able to encode all phonemes found in the world's languages, the fact that they require broad phonotactic information for doing so does suggest that they are highly sensitive to the syllabic complexity of the language they are listening to. The fact that the syllabic repertoire is correlated with the mean length of the most common words – mainly monosyllabic in languages such as English and Dutch, which have large syllabic repertoires, and tri-syllabic or longer in Japanese, which has a poorer syllabic repertoire – suggests that segmental rhythm might help young infants predict how often to expect word boundaries (Mehler & Nespor, 2004).

Language discrimination is also very important for infants who are exposed to two languages simultaneously, already in the womb. Newborns of bilingual mothers, exposed to both English (stress-timed) and Tagalog (syllable-timed) during the prenatal period, can discriminate utterances of the two languages (Byers-Heinlein et al., 2010). The ability to discriminate rhythmically different languages may thus help infants in bilingual environments to tease apart their two mother tongues. However, not all infants in bilingual environments are exposed to rhythmically different languages. Research shows that by 4 months of age, Catalan-Spanish bilingual infants can discriminate their rhythmically similar languages (Bosch & Sebastián-Gallés, 1997; 2001). While doing so clearly requires knowledge of the segmental information and the prosodic structure of the rhythmically similar native languages, infants may also at least partially rely on basic rhythm. For example, 3.5-month-old Basque-Spanish bilingual infants can discriminate low-pass filtered utterances from their two native languages even though these belong to the same rhythmic class. Even though both Basque and Spanish fall within the syllable-timed class of languages, they differ in both ΔC and %V values (Molnar, Gervain, & Carreiras, 2011), and show marked differences in phrasal prosody (Mehler, Sebastian-Galles, & Nespor, 2004; Nespor, 1990). Consequently, it is possible that some of the Basque-Spanish bilingual infants' ability to discriminate their two native languages stems from rhythmic differences at the segmental level rather than from differences in phrasal prosody (Molnar, Gervain & Carreiras, 2013). This would suggest that around 4 months of age infants are sensitive to rhythmic differences between languages that go beyond their belonging to the three basic rhythmic classes.

3. Rhythm and the Iambic-Trochaic Law

Above the segmental level, linguistic rhythm also occurs at the level of metrical feet and at the level of phonological phrases. In both constituents, it is characterized by the alternation of stressed and stressless elements (Liberman & Prince, 1977). Just like syntax, prosody is structured hierarchically, and its constituents go from the syllable up to the utterance (e.g. Beckman & Pierrehumbert, 1986; Hayes, 1989; Nespor & Vogel, 1986, 2007; Selkirk, 1984). In the prosodic hierarchy, all constituents are exhaustively contained within higher ones (e.g. Selkirk, 1984). Thus metrical feet never straddle prosodic word boundaries, prosodic words never straddle phonological phrase boundaries, phonological phrases never straddle intonational phrase boundaries, and intonational phrases never straddle utterance boundaries. Word secondary stress at the level of metrical feet and phonological phrase prominence also adhere to the Strict Layer Hypothesis that characterizes the prosodic hierarchy (Selkirk, 1984; Nespor & Vogel, 1986, 2007) – i.e. lexical stress is instantiated over syllables that receive prominence at the level of the foot, and phonological phrase prominence is instantiated over syllables that receive prominence both at the level of the foot and at the level of prosodic words. Prominence (or stress) at a higher level of the prosodic hierarchy thus always coincides with one of the stressed positions at all lower levels of the hierarchy. Importantly, prominence in the prosodic hierarchy can only be instantiated by changes in pitch, duration and intensity (Cutler, Dahan, & van Donselaar, 1997; Lehiste, 1970; Langus, Marchetto, Bion, & Nespor, 2012).

The perception of the rhythmic alternation of weak and strong elements, where the prominence of the strong element is signalled either through intensity/pitch or through duration, is governed by the perceptual mechanism known as the Iambic-Trochaic Law (ITL). An iamb is a group of two elements in which the strong element follows the weak one; a trochee is a group in which the strong element precedes the weak one. The Iambic-Trochaic Law thus describes the perceptual phenomenon according to which humans are biased to group elements alternating in prominence either iambically or trochaically, depending on the physical properties that signal prominence. After the ITL was first formally proposed in the domain of music (Bolton, 1894), subsequent research on the perception of rhythm has suggested that iambic or trochaic grouping of pure tones can be triggered by a number of different stimulus properties. For example, prominence alternating in intensity tends to lead listeners to group pure tones into trochaic (strong-weak) pairs (Fraisse, 1956; Jones, 1981), as does prominence alternating in pitch (Jones, 1981). Conversely, prominence alternating in duration tends to lead listeners to group pure tones into iambic (weak-strong) pairs (Woodrow, 1951). This leads to the following formulation of the ITL:



- Elements alternating in pitch and/or intensity are grouped into trochees, with the strong element (higher in pitch/intensity) preceding the weak one
- Elements alternating in duration are grouped into iambs, with the strong element (longer in duration) following the weak one
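This two-clause formulation amounts to a small decision procedure. The sketch below is a toy model of ITL grouping (the function names and the 'w'/'s' encoding are our own assumptions): given an alternating weak/strong stream and the cue carrying the alternation, it decides whether the prominent element is parsed as group-initial (trochee) or group-final (iamb):

```python
def itl_grouping(cue):
    """Predicted perceptual grouping for a stream whose prominence
    alternates on the given acoustic cue, per the ITL."""
    if cue in ('pitch', 'intensity'):
        return 'trochee'   # strong-weak: prominent element group-initial
    if cue == 'duration':
        return 'iamb'      # weak-strong: prominent element group-final
    raise ValueError('unknown cue: ' + cue)

def group_stream(elements, cue):
    """Parse a weak/strong alternating stream (e.g. ['w','s','w','s',...])
    into pairs according to the ITL. Edge elements that fall outside a
    complete pair are left unparsed, as in a cyclic stream."""
    if itl_grouping(cue) == 'iamb':
        start = elements.index('w')   # groups are (weak, strong)
    else:
        start = elements.index('s')   # groups are (strong, weak)
    return [tuple(elements[i:i + 2]) for i in range(start, len(elements) - 1, 2)]

# A duration-alternating stream is parsed iambically, a pitch- or
# intensity-alternating stream trochaically.
iambic_parse = group_stream(['w', 's', 'w', 's', 'w', 's'], 'duration')
trochaic_parse = group_stream(['w', 's', 'w', 's', 'w', 's'], 'intensity')
```

The same physical sequence thus receives two different parses depending solely on which cue signals prominence, which is the core perceptual claim of the ITL.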

There is linguistic evidence that the ITL universally governs rhythm at two different levels of the prosodic hierarchy: at the level of metrical feet and at the level of phonological phrases (Hayes, 1995; Nespor et al., 2008). Interestingly, prominence at the intermediate lexical level does not appear to be governed by the ITL. Before presenting experimental evidence for the perception of the ITL in diverse perceptual domains and modalities, it is thus important to clarify how the ITL does and does not relate to linguistic rhythm in the speech signal.

4. Rhythm in metrical feet and its absence in stress at the word level

The ITL was first proposed for language at the level of metrical feet, where it universally accounts for the location of word secondary stress. If a language, in forming feet, systematically puts heavy syllables at the right edge, then the rightmost syllable in the foot will be marked mainly by longer duration, whereas if a language ignores syllable weight in forming feet, it will form trochaic feet by stressing the leftmost syllable through intensity (Hayes, 1995). Thus, if listeners use the ITL in perceiving rhythm in feet, they should group syllables into trochaic feet if prominence is signalled through intensity and into iambic feet if it is signalled through duration. To the best of our knowledge, no experimental evidence exists on this matter.

Stress at the word level has caused some confusion in the literature because in some languages, like English, lexical stress is often word-initial (sometimes also called trochaic). Since in English many words are just one metrical foot long, listeners may in principle acquire a segmentation strategy that assigns word boundaries according to lexical stress (Cutler & Norris, 1988; Cutler & Carter, 1987; Cutler, 1990) and yields trochaic feet (Jusczyk, Cutler, & Redanz, 1993). However, because not all feet in English carry primary lexical stress and stress does not always fall on the initial syllable (e.g. delicious), even in English stress cannot be used to group syllables into words according to the ITL (cf. Gerken, 1994). For example, unlike prominence in metrical feet, lexical stress does not alternate. There is one lexical stress in bisyllabic words, such as síster or guitár, and one in words of nine or more syllables, such as anticonstitutionálity or otorhinolaryngólogist. This is so in English as well as in other languages, e.g. Italian méla 'apple' or metà 'half' and precipitevolissimevolménte 'in a very precipitous way', or Spanish pérro 'dog' and anticonstitucionalménte 'in an anticonstitutional way'. In addition, the ITL does not predict the nature of lexical stress. For example, Italian words that form minimal pairs in lexical stress, like pápa and papá, do not exhibit the type of prominence predicted by the ITL, where pitch/intensity would be initial and duration final. Penultimate lexical stress (initial in the example above) is in fact characterized by longer duration than final stress (Marotta, 1985). Because neither the location nor the specific cues signalling primary lexical stress are universal, they have to be acquired through exposure to one's native language. Studies that investigate primary lexical stress in speech perception (e.g. Jusczyk, Cutler, & Redanz, 1993; Thiessen & Saffran, 2003) therefore cannot also be investigating a universal perceptual mechanism such as the ITL. As we will demonstrate below, segmenting continuous speech into words according to primary lexical stress, and grouping syllables into metrical feet or words into phonological phrases according to the ITL, are two different processes.

5. Rhythm at the level of the phonological phrase

Among the tasks infants face in acquiring the syntax of their language of exposure are grouping words into constituents and figuring out the relative order of words within a constituent. We propose that a perceptual mechanism that uses the rhythmic alternation of phonological phrase prominence to group words into phrases could guide pre-lexical infants in these tasks.

It is by now well known that prosody – especially rhythm – has the function of signalling syntactic structure. Although the prosodic hierarchy has come into existence because it is not isomorphic to the syntactic hierarchy, its constituents give cues to some major syntactic boundaries (Selkirk, 1984; Nespor & Vogel, 1986, 2007). Prosody can in fact resolve ambiguities: two sentences with the same sequence of words may have different divisions into prosodic constituents and thus not be ambiguous when uttered. An example is the pair of Italian sentences [la vecchia] [legge la regola] 'the old lady reads the rule' vs. [la vecchia legge] [la regola] 'the old law rules it'. Prosody may thus guide infants in discovering some basic syntactic properties of their language of exposure. In human language, in fact, prosody is unavoidable, and it has been shown to characterize sign languages as well (Nespor & Sandler, 1999; Wilbur, 2000).

The most relevant phrasal constituent for the interpretation of syntax through rhythm is the phonological phrase, which extends from the left edge of a phrase to the right edge of its head in right-recursive languages, and from the left edge of a head to the left edge of its phrase in left-recursive languages (Nespor & Vogel, 1986, 2007). Like all constituents of the prosodic hierarchy, the PHPH is signalled by phonological phenomena that apply within a PHPH but not across two PHPHs. Importantly for language acquisition in infancy, prominence within phonological phrases signals word order: head-initial languages – such as English and Italian, where syntactic heads precede their complements (e.g. verbs in verb phrases) – have PHPH-final prominence, and head-final languages have PHPH-initial prominence. Thus in SVO languages such as English, a verb-object phrase like eat apples is characterized by PHPH stress on the final object apples. In an SOV language such as Turkish, instead, the PHPH stress in an object-verb phrase is on the initial object: elma ye 'apple – eat' (Nespor & Vogel, 1986, 2007). Unlike words, which can contain many syllables but only one lexical stress, PHPHs usually contain no more than two words that can carry lexical stress and thus instantiate PHPH prominence. This means that prominence in phonological phrases alternates rhythmically between words that carry PHPH prominence and words that do not.

It has been proposed that PHPH prominence might be exploited by infants to bootstrap into the syntax of their language of exposure, that is, to fix one of the basic syntactic parameters (Nespor, Guasti & Christophe, 1996). The fact that infants do not make mistakes in word order when they start combining words, and that before then they discriminate well-formed from ill-formed word orders (Hirsh-Pasek et al., 1987; Hirsh-Pasek & Golinkoff, 1996), could be accounted for if they had learned the relative order of heads and complements at a prelexical stage exclusively on the basis of phrasal rhythm, that is, independently of word segmentation and knowledge of the lexicon (Christophe, Nespor, Guasti & van Ooyen, 2003). In this endeavour, the first question to answer is whether infants are sensitive to the different locations of phrasal stress in VO and OV languages. It has indeed been shown that 6- to 12-week-old infants discriminate French from Turkish – two languages similar in syllabic structure and primary word stress – solely on the basis of the location of phonological phrase prominence. In fact, we have shown that it is not a difference in the realization of prominence in the two languages that is responsible for discrimination, but uniquely the difference in the location of relative prominence (Christophe, Nespor, Guasti & van Ooyen, 2003).
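The correspondence between head direction, the location of phonological phrase prominence, and its main acoustic correlates can be summarized in a toy lookup rule. This is a sketch under the simplifying assumption that each phonological phrase contains exactly one head and one complement; the function and field names are our own:

```python
def phph_prominence(head_direction):
    """Location and main acoustic correlates of phonological phrase (PHPH)
    prominence predicted from head direction: head-initial (VO) languages
    have phrase-final prominence, signalled mainly by duration; head-final
    (OV) languages have phrase-initial prominence, signalled mainly by
    pitch and intensity (cf. Nespor et al., 2008)."""
    if head_direction == 'head-initial':
        return {'position': 'final', 'main_cues': ('duration',)}
    if head_direction == 'head-final':
        return {'position': 'initial', 'main_cues': ('pitch', 'intensity')}
    raise ValueError('unknown head direction: ' + head_direction)

# English 'eat apples' (head-initial): prominence on the final word, apples.
# Turkish 'elma ye' (head-final): prominence on the initial word, elma.
english = phph_prominence('head-initial')
turkish = phph_prominence('head-final')
```

Read in reverse, the same table is the bootstrapping hypothesis: an infant who detects where prominence falls, and through which cue, could infer the head direction of the ambient language.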

In addition to hearing the difference between the prosody of two languages with different word orders, it remains to be established whether infants are also able to segment the speech stream into phonological phrases. That is, in a sequence of strong (s) and weak (w) elements such as …swswswsw…, would they know whether the strong element is initial or final in its group? It has been proposed that the iambic-trochaic law is responsible for the different manifestations of PHPH stress in strong-initial vs. strong-final position. It has been shown, on the basis of acoustic analyses of French and Turkish declarative sentences, that duration is the main correlate of PHPH prominence in VO, thus iambic, languages, while pitch, in addition to intensity, is the main correlate of PHPH prominence in OV, thus trochaic, languages (Nespor et al., 2008). Crucially, this is true for branching phonological phrases but not for non-branching ones, showing that the different realization of prominence is not a general property of the two languages, but is determined only when a specific rhythmic alternation (iambic or trochaic) is present (Nespor et al., 2008).

The question then of course arises whether all other OV and VO languages share the same type of phrasal prominence. Needless to say, it is impossible to test all languages; Nespor et al. (2008) did, however, analyse PHPH prominence in German, a language that has both the OV and the VO order in subordinate clauses: depending on the complementizer used to introduce the clause – denn or weil – either the VO or the OV order is required. This is exemplified by sentences such as Der Abend wird gut werden, weil ich Papa sehe, or Der Abend wird gut werden, denn ich sehe Papa (in English: 'It's going to be a nice evening, because I (will) see Papa'). The results go in the right direction: stress on Papa is realized more through pitch and intensity in the first (OV) case and more through lengthening in the second (VO) case. In fact, recent studies show that the predictions of the ITL also hold in OV languages such as Japanese, Korean, Farsi and Hindi (Molnar et al., 2016). This finding allows us to consider a psychologically plausible mechanism for the acquisition of word order, i.e. the setting of the relative order of heads and complements. The question now is whether humans are sensitive to this type of alternation, i.e. whether they indeed group sequences of speech sounds alternating in duration iambically, and sequences of sounds alternating in intensity or pitch trochaically.

6. The perception of the ITL with linguistic stimuli

Because the ITL governs rhythmic patterning of prominence at different levels in the rhythmic hierarchy, numerous studies have investigated the perceptual grouping of linguistic elements alternating either in pitch, intensity or duration. For example, Bion, Benavides and Nespor (2011) had participants listen to repeated sequences of 10 syllables separated by pauses (i.e. pa su tu ke ma vi bu go ne du) that had prominence signalled either through higher pitch or longer duration over every other syllable. Participants were then presented with two prosodically flat syllable pairs. One of the pairs had adhered to the ITL during familiarization and the other did not. Participants were consequently asked whether they remembered the first or the second pair. Italian-speaking participants recalled significantly better pairs of syllables that adhered to the ITL. Similar results have also been reported with adult English- and French-speaking participants who were familiarized with prominence signalled through either intensity or duration (Hay & Diehl,

2007). Because Italian, English and French are all Verb-Object languages, where phrasal prominence is signalled through phonological phrase final lengthening (Nespor et al., 2008; Gervain & Werker 2013), these results suggest that adult listeners whose native language signals phonological phrase prominence through duration adhere to the ITL. However, studies investigating the perceptual grouping of linguistic stimuli alternating in pitch and duration with speakers of Object-Verb languages suggest that native language can influence the perception of rhythm. Langus and colleagues (in press) tested Italian-, Turkish- and Persian-speaking adults on the same stimuli as Bion et al. (2011) and found that while Italian speakers adhered to the ITL, Turkish and Persian speakers, whose native languages have the Object-Verb order and signal phonological phrase prominence primarily through pitch and intensity, failed to group syllables alternating in duration iambically (i.e. as short-long). Instead, participants of OV languages in this study grouped syllables alternating in duration trochaically, just as they did with syllables alternating in pitch. While speakers of VO languages (like Italian, English and French) use the ITL for grouping linguistic stimuli, the absence of duration as cue for phonological phrase prominence in OV languages (such as Turkish and Persian) causes participants to rely on the native language trochaic prominence also when hearing syllables alternating in duration. The finding that the grouping of linguistic stimuli alternating in duration iambically is influenced by the knowledge of participants’ native language is also supported by studies with young infants. For example, Bion, Benavides, and Nespor (2011) showed that 7-month-old Italian infants group syllables alternating in pitch into trochees with high pitch on the first syllable, but fail to group syllables alternating in

duration iambically, showing no preference for either short-long or long-short syllable pairs. Importantly, both the iambic and the trochaic preference – depending on the familiarization stream – were found with 7-month-old bilinguals exposed to English and an OV language: Japanese, Hindi, Punjabi, Korean or Persian (Gervain & Werker, 2013). While age and linguistic environment thus appear to play a crucial role in the development of these grouping preferences (c.f. Yoshida et al., 2010), infants' ability to use prominence signalled through duration for discovering constituents in speech appears to emerge during the first year of life. Perceptual grouping based on pitch and perceptual grouping based on duration thus follow different developmental trajectories: grouping biases based on pitch emerge very early in development, possibly reflecting universal abilities, while grouping based on duration either emerges with language experience or depends on perceptual maturation. Interestingly, there is even some evidence that the rhythmic alternation of prominence in phonological phrases may be directly related to syntactic processing and to the basic word order of the native tongue already in infancy. Two studies have exploited the fact that OV and VO languages differ not only in their realization of phrasal rhythm: while VO languages place highly frequent function words before content words (e.g. the cat), OV languages place them after content words (e.g. in Japanese: ‘Tōkyō ni’, which means ‘Tokyo to’). French 8-month-old infants show a preference for the native frequent-infrequent order only if it is aligned with the native phrasal rhythm, with the frequent initial ‘function’ words being unstressed and the infrequent final ‘content’ words carrying prominence signalled through duration (Bernard & Gervain, 2012). Similarly, 7-month-old bilingual infants who have one native

language with the Object-Verb order and the other with the Verb-Object order can determine the order of frequent and infrequent words through phrasal rhythm, preferring the frequent-infrequent order when prominence is signalled through duration and the infrequent-frequent order when prominence is signalled through pitch and intensity (Gervain & Werker, 2013). This suggests that infants are able to use word frequency as well as prosodic rhythm as cues to word order, integrating them into a coherent representation that may help them figure out the order of heads and complements and consequently the word order of their native language. There is also evidence suggesting that these grouping biases should not be confused with the segmentation of continuous speech into words. When familiarized with streams of syllables segregated by pauses that alternate either in intensity or in duration, French speakers group these syllables into either trochees or iambs, as predicted by the ITL (Hay & Diehl, 2007). However, French speakers familiarized with continuous syllable streams whose syllables alternate in pitch, intensity or duration segment the stream into stress-initial chunks with pitch and intensity, but fail to segment the stream into stress-final chunks with duration (Bhatara et al., 2013), a failure that occurs also with highly variable sequences of connected instrumental sounds that mimic speech (Bhatara et al., 2015). Likewise, Italian-speaking adults group syllables alternating in prominence that are segregated by pauses into trochees when prominence is signalled through pitch and into iambs when prominence is signalled through duration (Bion et al., 2011; Langus et al., in press). But when Italian speakers are familiarized with a continuous stream of syllables in which transitional probabilities between syllables signal trisyllabic words, they could only segment the continuous stream if lengthening occurred on

penultimate syllables and pitch increase on either the first or the final syllable (Ordin & Nespor, 2013). On the basis of lexical stress, neither Italian nor French speakers segment continuous speech into words that correspond to iambs. Importantly, when the units are longer than bisyllables, Italian-speaking participants do not even converge on the trochaic grouping with pitch. The ITL thus does not govern the perception of rhythm at the lexical level.
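The familiarization-and-grouping logic described above can be made concrete with a short sketch. This is a hypothetical illustration, not the authors' stimulus-generation script: the pitch and duration values, and the choice of which positions carry prominence, are invented for the example.

```python
# Hedged sketch of a Bion et al. (2011)-style familiarization stream: ten
# syllables, with prominence on every other syllable signalled through
# higher pitch or longer duration. All numeric values are assumptions.

BASE_PITCH_HZ, HIGH_PITCH_HZ = 200.0, 280.0   # assumed F0 baseline / peak
BASE_DUR_MS, LONG_DUR_MS = 250.0, 375.0       # assumed syllable durations

def make_stream(syllables, cue):
    """Attach a pitch and a duration to each syllable, with prominence on
    every other syllable realized through the requested cue."""
    stream = []
    for i, syl in enumerate(syllables):
        prominent = i % 2 == 0                # assumption: even positions
        pitch = HIGH_PITCH_HZ if prominent and cue == "pitch" else BASE_PITCH_HZ
        dur = LONG_DUR_MS if prominent and cue == "duration" else BASE_DUR_MS
        stream.append((syl, pitch, dur))
    return stream

def itl_grouping(cue):
    """The ITL's prediction: pitch/intensity prominence is group-initial
    (trochaic), durational prominence is group-final (iambic)."""
    return "trochaic" if cue in ("pitch", "intensity") else "iambic"

syllables = "pa su tu ke ma vi bu go ne du".split()
stream = make_stream(syllables, "duration")
```

On this sketch, a prosodically flat test pair ordered short-long "adheres" to the ITL for the duration cue, while long-short does not, mirroring the forced-choice memory test described above.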

7. The perception of the ITL with non-linguistic auditory stimuli

Because the ITL was originally proposed for the perception of music, it has also, and indeed primarily, been studied experimentally with non-linguistic auditory stimuli (Bolton, 1894; Cooper & Meyer, 1960; Woodrow, 1951). For example, adult French and English speakers group square wave sequences alternating in intensity into trochaic pairs and square wave sequences alternating in duration into iambic pairs (Hay & Diehl, 2007). Similar results have been obtained with Italian-, Turkish- and Persian-speaking adults, who respond according to the ITL to pure tones alternating in either fundamental frequency or duration (Marino, 2014). Importantly, in order to directly compare participants’ performance with linguistic and non-linguistic auditory stimuli, Langus et al. (in press) also tested participants on sinewave analogues of speech resynthesized from the linguistic stimuli used in the study of Bion et al. (2011). Sinewave speech is interesting because it is synthesized by replacing three formants in speech with sinusoids, thus eliminating the phonetic information but preserving the

overall complexity of spoken language, including the frequency and amplitude variations that define its prosody (Remez, Rubin, Pisoni, & Carell, 1981). Importantly, although Turkish and Persian speakers violated the ITL with linguistic stimuli by grouping syllables alternating in both pitch and duration trochaically, they chose the prosodically flat sinewave speech chunks according to the ITL, as did Italian-speaking adults. These results suggest that the violations of the ITL observed among speakers of OV languages with linguistic stimuli do not automatically transfer to the non-linguistic auditory domain. Even though there is some cross-linguistic evidence that the ITL guides listeners’ perception of rhythm in the non-linguistic auditory domain, violations of the ITL have also been found with pure tones. For example, while Japanese and English speakers group pure tones alternating in intensity trochaically, Japanese speakers have repeatedly been found to show no bias for grouping tones alternating in duration iambically (Kusumoto & Moreton, 1997; Iversen, Patel, & Ohgushi, 2008). The inability of Japanese-speaking adults to group tones alternating in duration iambically may stem either from their linguistic background or from their culture-specific experience with music. Japanese is an Object-Verb language that marks phonological phrase level prominence primarily through pitch and intensity. However, Turkish- and Persian-speaking adults, speakers of two OV languages, show no deficiency in grouping either tones or sinewave speech alternating in duration iambically (Marino, 2014; Langus et al., in press). This suggests that the violations of the ITL by Japanese speakers do not stem from the knowledge of the trochaic phonological phrase rhythm in their native language. Because we know of no study that has investigated Japanese speakers’ ability to group linguistic

stimuli alternating in duration, it is at present not possible to identify the cause of ITL violations with non-linguistic auditory stimuli.
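To make the sinewave-speech manipulation concrete, here is a minimal, hedged sketch of the core idea (Remez et al., 1981): replace the lowest formants with time-varying sinusoids. The fixed formant tracks, amplitudes and sample rate below are toy assumptions; real sinewave speech follows formant tracks estimated from a natural utterance.

```python
# Illustrative sketch of sinewave resynthesis: one sinusoid per formant,
# each following a per-sample frequency and amplitude track.
import math

SAMPLE_RATE = 16000  # assumed sampling rate (Hz)

def sinewave_analogue(formant_tracks, amp_tracks):
    """Sum one sinusoid per formant, letting each sinusoid's instantaneous
    frequency and amplitude follow the corresponding per-sample track."""
    n = len(formant_tracks[0])
    out = [0.0] * n
    for freqs, amps in zip(formant_tracks, amp_tracks):
        phase = 0.0
        for t in range(n):
            phase += 2.0 * math.pi * freqs[t] / SAMPLE_RATE
            out[t] += amps[t] * math.sin(phase)
    return out

# Toy 100 ms "vowel": three flat formants at plausible frequencies.
n = SAMPLE_RATE // 10
tracks = [[500.0] * n, [1500.0] * n, [2500.0] * n]   # assumed F1-F3 (Hz)
amps = [[0.5] * n, [0.3] * n, [0.2] * n]             # assumed amplitudes
signal = sinewave_analogue(tracks, amps)
```

Because the phonetic detail is gone but the frequency and amplitude contours remain, such analogues preserve exactly the prosodic alternations the ITL operates on while removing the percept of speech.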

8. The perception of the ITL with visual stimuli and transfer across perceptual domains

There is cross-linguistic evidence, with a few exceptions, that the Iambic-Trochaic Law accounts for grouping in audition more generally than just for speech and could thus be exploited in first language acquisition. Do these perceptual biases extend also to non-auditory perceptual domains? Testing the ITL in the visual domain is interesting not only because finding structure in visual information can facilitate visual cognition, but also because it could serve the perception and acquisition of non-auditory aspects of language. Language is not only perceived auditorily: speech is also conveyed visually through the movements of the head, the hands and the body more generally (Bernstein, Eberhardt, & Demorest, 1998; Graf, Cosatto, Strom, & Huang, 2002; Krahmer & Swerts, 2004; Kelly & Barr, 1999; Guellaï, Langus, & Nespor, 2014; Kendon, 1994 for a review). Furthermore, the analysis of the spatio-temporal properties of visual sequences is relevant for sign languages (Corina & Knapp, 2008), since they involve a complex coordination of non-vocal motor gestures (Poizner & Kegl, 1993). While to our knowledge there is no evidence for the ITL in sign languages, there is some evidence suggesting that these perceptual biases may extend to visual perception as well as to the audio-visual perception of spoken language. It has been shown that adults are able to segment continuous visual sequences into distinct events by exploiting information about their duration, intensity and temporal

frequency (Kumar, Abrams, & Mehta, 2009). It has also been shown that subjects who are good at segmenting are good at remembering (Swallow, Zacks & Abrams, 2009). As a first step towards understanding the ITL in vision, a recent study investigated the grouping of non-linguistic visual sequences (Peña, Bion & Nespor, 2011). In three experiments, Peña et al. (2011) asked whether the ITL also organizes the grouping of elements alternating in prominence in the visual modality. The visual stimuli differed in duration, intensity or temporal frequency, the last of which has been proposed to be the visual analogue of pitch. In all experiments, the familiarization phase contained visual events composed of fifteen identical exemplars of a particular shape (e.g. fifteen stars or fifteen squares) randomly changing their location on the computer screen. The events alternated in duration (the shapes changed location for a shorter or longer period of time), in intensity (the shapes were more or less luminous) or in temporal frequency (the shapes changed location faster or slower). As in the study of Bion et al. (2011), adult subjects were tested for their memory of pairs of items. The results show that Italian-speaking adults grouped visual events according to the ITL: events with higher temporal frequency or intensity were initial, while events with longer duration were final. Importantly, comparable results were found with the same visual stimuli with Turkish- and Persian-speaking adults, who grouped visual events alternating in temporal frequency trochaically and visual events alternating in duration iambically (Langus et al., in press). That is, participants were better at remembering sequences of shapes consistent with the visual formulation of the ITL proposed by the authors than sequences that violated it.

Experimental evidence for the ITL in the visual domain does not come only from the grouping of abstract geometrical shapes. Spoken language is perceived in the auditory as well as in the visual modality, where the movements of the articulators (i.e. the mouth) closely mimic the sound of language. In a study with Spanish-speaking adults, Peña, Langus, Gutierrez, Huepe and Nespor (in press) asked participants to discriminate German semi-nonsense sentences that were either iambic or trochaic at the level of the phonological phrase. In sentences with identical meaning, German allows both Object-Verb and Verb-Object constructions, depending on the complementizer chosen – denn requires VO and weil requires OV. It is thus possible to have both trochaic and iambic phonological phrase rhythm. The results show that Spanish-speaking participants can discriminate iambic from trochaic sentences both in the auditory modality, when they could only hear the audio of the sentences, and in the visual modality, when they could only see the mouth of the speaker uttering the very same sentences. However, despite the good performance within perceptual domains (auditory-to-auditory / visual-to-visual), participants showed only a partial ability to discriminate auditory and visual sentences when knowledge in one domain had to be transferred to the other: they could only match iambic patterns from the visual domain to the auditory speech domain, and only trochaic patterns from the auditory domain to the visual speech domain. Because the iambic and trochaic sentences contained the same nonsense words in the same order and differed only in rhythm – i.e. high pitch/intensity initially for trochaic sentences or long duration finally for iambic sentences – participants could only succeed in the discrimination if they could transfer representations of

rhythm from one modality to another. This suggests that while the representations of iambic rhythm acquired through the visual perception of speech appear to be amodal – i.e. they can be transferred to visual as well as to auditory targets – they are modality specific when perceived in the auditory modality – i.e. they can only be transferred to the auditory modality. Exactly the reverse is true for trochees that are amodal when acquired in the auditory modality but modality specific when perceived in the visual modality.

9. Conclusions

Just as prosody only partially mimics the structure of morpho-syntax, linguistic rhythm too is found only at some levels of the prosodic hierarchy, coinciding with the general organization of spoken language. Rhythmic alternation is thus found at the segmental level, where it is signalled through the alternation between consonants and vowels (Ramus et al., 1999), and at the level of metrical feet as well as of phonological phrases, where it is manifested through the alternation of stressed and stressless elements (Hayes, 1995; Nespor et al., 2008). Importantly, there does not seem to be rhythm at the level of the word, where the prominent elements – i.e. the syllables receiving primary lexical stress – do not alternate. Because linguistic rhythm at all the levels of linguistic representation discussed above is universal, linguistic theory assumes that rhythm forms a cornerstone of language acquisition. Newborn infants’ ability to discriminate rhythmically different languages but not languages that belong to the same rhythmic class (Mehler et al., 1988; Moon, Cooper & Fifer, 1993; Nazzi et al., 1998; Ramus et al., 2000)

clearly shows that at least some of the prerequisites to perceive, represent and discriminate rhythm at the segmental level are present at birth. While no studies have investigated rhythmic alternation at higher levels of the rhythmic hierarchy in newborn infants, the ability to discern rhythmically acceptable syllable sequences from those that violate the universal rhythmic principles seems to be in place before the end of the first year of life, and certainly before infants begin to speak. While infants show knowledge of linguistic rhythm very early on, experimental evidence also suggests that linguistic rhythm is, to a certain extent, acquired. The fact that newborns can discriminate between rhythmically different languages only when segmental information is available (the Saltanaj transformation), whereas adults succeed even with information only about the vocalic and consonantal space in the speech stream (the Sasasa transformation), clearly shows that aspects of segmental rhythm either mature or are acquired during development (Ramus & Mehler, 1999; Ramus, 2002). In fact, because discriminating between rhythmically different languages is more necessary for infants born into a bilingual environment, it would be interesting to see whether infants exposed to rhythmically different languages in the womb show advantages over infants born into monolingual environments when presented with stimuli transformed so that only rhythmic information is preserved. This could help clarify whether the perception of segmental rhythm is determined by infants’ specific needs: monolingual infants need to learn the segmental repertoire of their mother tongue, and bilingual infants additionally need to keep their two mother tongues apart.

To a certain extent there already is evidence that the perception of rhythm adapts to the linguistic background, and possibly even to the cultural practices, of the listener. The differences in the ontogenetic emergence of the iambic and trochaic biases in young infants show not only that linguistic rhythm is constrained in development (Bion et al., 2011), but also that the perception of rhythm adapts to the knowledge of our mother tongue. For example, Turkish and Persian participants’ failure to group syllables alternating in duration iambically appears to be caused by the fact that phonological phrase rhythm in these Object-Verb languages is trochaic (Langus et al., in press). In the absence of iambic rhythm, speakers of such languages adapt the universal perceptual biases to their native language and consequently perceive also syllable sequences alternating in duration trochaically. In experiments where it is clear that participants are grouping linguistic stimuli, rather than simply segmenting continuous speech, these violations have only been observed with biases that are not used in the listeners’ native language: OV speakers whose native language has trochaic phonological phrase prominence only fail in the iambic grouping of syllables alternating in duration. This further suggests that duration cues may have to be acquired. The question of acquisition is also shaped by the fact that some aspects of linguistic rhythm seem to be specific to language. For example, because only human language has consonants and vowels, a mechanism that tracks the amount of vocalic and consonantal space would clearly be of little use in determining rhythm in music, especially if newborns can only discriminate languages relying on segmental information. However, the question of language specificity gets murkier higher up in the rhythmic hierarchy. The ITL has been attested, with some success and a few exceptions, not only over

linguistic, but also over non-linguistic auditory as well as visual information (Bion et al., 2011; Langus et al., in press; Hay & Diehl, 2007; Peña et al., 2011). This would suggest that these perceptual biases are domain general. However, violations of the ITL over syllables do not necessarily transfer to non-linguistic information, suggesting that the perceptual biases operating over linguistic and non-linguistic perceptual input are not exactly the same (Langus et al., in press). The differences between the representations of iambic and trochaic rhythm in different perceptual domains can at least partially be seen in the asymmetric transfer of iambs and trochees in audio-visual speech perception: adult participants can only match iambic patterns from the visual to the auditory speech domain, and only trochaic patterns from the auditory to the visual speech domain (Peña et al., in press). As evidence for partial transfer across domains exists only for the audio-visual perception of speech, it is too early to say whether similar effects occur also in linguistic / non-linguistic matching of iambs and trochees. However, the results do suggest that, at least in the case of phonological phrase level rhythm, there is no single mechanism that functions effortlessly across the domains of auditory and visual speech. Because transfer is difficult to observe in experimental settings, especially with young infants, the perception of rhythm, which so clearly plays an important role in infants’ early development, may provide a starting point for investigating how information is represented and retrieved across perceptual domains.

Acknowledgements

The research leading to these results has received funding from the European Research Council under the European Union’s Seventh Framework Programme (FP7/2007-2013)/ERC grant agreement no. 269502 (PASCAL).

References:

Abercrombie, D. (1967). Elements of General Phonetics. Edinburgh: Edinburgh University Press. p. 97.
Beckman, M., & Pierrehumbert, J. (1986). Intonational structure in Japanese and English. Phonology Yearbook, 3, 15-70.
Bergelson, E., & Swingley, D. (2012). At 6 to 9 months, human infants know the meanings of many common nouns. Proceedings of the National Academy of Sciences of the USA, 109, 3253-3258.
Bernard, C. & Gervain, J. (2012). Prosodic cues to word order: what level of representation? Frontiers in Psychology, 3(451).
Bernstein, L.E., Eberhardt, S.P., & Demorest, M.E. (1998). Single-channel vibrotactile supplements to visual perception of intonation and stress. Journal of the Acoustical Society of America, 8, 397-405.
Bhatara, A., Boll-Avetisyan, N., Unger, A., Nazzi, T., & Höhle, B. (2013). Native language affects rhythmic grouping of speech. Journal of the Acoustical Society of America, 134, 3828-3843.
Bhatara, A., Boll-Avetisyan, N., Agus, T., Höhle, B. & Nazzi, T. (in press). Language experience affects grouping of musical instrument sounds. Cognitive Science.
Bion, R.A.H., Benavides, S. & Nespor, M. (2011). Acoustic markers of prominence influence infants’ and adults’ memory for speech sequences. Language and Speech, 54(1), 123-140.
Bolinger, D. (1965). Pitch Accent and Sentence Rhythm. In: Forms of English. Accent,

Morpheme, Order. Cambridge, Mass.: Harvard University Press.
Bolton, T.L. (1894). Rhythm. American Journal of Psychology, 6, 145-238.
Bosch, L., & Sebastian-Galles, N. (1997). Native-language recognition abilities in 4-month-old infants from monolingual and bilingual environments. Cognition, 65, 33-69.
Bosch, L., & Sebastian-Galles, N. (2001). Evidence of early language discrimination abilities in infants from bilingual environments. Infancy, 2, 29-49.
Byers-Heinlein, K., Burns, T.C., & Werker, J.F. (2010). The roots of bilingualism in newborns. Psychological Science, 21, 343-348.
Christophe, A., Nespor, M., Guasti, M.-T. & van Ooyen, B. (2003). Reflections on prosodic bootstrapping: its role for lexical and syntactic acquisition. Developmental Science, 6(2), 213-222.
Cooper, G. & Meyer, L. (1960). The Rhythmic Structure of Music. Chicago: University of Chicago Press.
Corina, D.P. & Knapp, H.P. (2008). Signed language and human action processing. Annals of the New York Academy of Sciences, 1145, 100-112.
Cutler, A. (1990). Exploiting prosodic probabilities in speech segmentation. In G. Altmann (Ed.), Cognitive models of speech processing: Psycholinguistic and computational perspectives (pp. 105-121). Cambridge, MA: MIT Press.
Cutler, A., & Carter, D.M. (1987). The predominance of strong initial syllables in the English vocabulary. Computer Speech and Language, 2, 133-142.
Cutler, A., Dahan, D., & van Donselaar, W. (1997). Prosody in the comprehension of spoken language: A literature review. Language and Speech, 40, 141-201.

Cutler, A., & Norris, D. (1988). The role of strong syllables in segmentation for lexical access. Journal of Experimental Psychology: Human Perception and Performance, 14, 113-121.
Dauer, R. (1983). Stress-timing and syllable-timing reanalyzed. Journal of Phonetics, 11, 51-62.
Fraisse, P. (1956). Les Structures Rythmiques. Louvain: Publications Universitaires de Louvain.
Frota, S. & Vigário, M. (2001). On the correlates of rhythmic distinctions: the European/Brazilian Portuguese case. Probus, 13, 247-275.
Gerken, L.A. (1994). A metrical template account of children’s weak syllable omissions from multisyllabic words. Journal of Child Language, 21, 565-584.
Gervain, J. & Werker, J.F. (2013). Prosody cues word order in 7-month-old bilingual infants. Nature Communications, 4.
Grabe, E. & Low, E.L. (2002). Acoustic correlates of rhythm class. In C. Gussenhoven & N. Warner (Eds.), Laboratory Phonology 7 (pp. 515-546). Berlin: Mouton de Gruyter.
Graf, P.H., Cosatto, E., Strom, V., & Huang, F.J. (2002). Visual prosody: Facial movements accompanying speech. Proceedings of the Fifth IEEE International Conference on Automatic Face and Gesture Recognition, Washington D.C.
Guellaï, B., Langus, A. & Nespor, M. (2014). Prosody in the hands of the speaker. In I. Berent & S. Goldin-Meadow (Eds.), Language by mouth and by hand. Frontiers in Psychology, 5, n. 00700.
Hay, J., & Diehl, R. (2007). Perception of rhythmic grouping: Testing the iambic/trochaic

law. Perception & Psychophysics, 69, 113-122.
Hayes, B. (1989). The prosodic hierarchy in meter. In P. Kiparsky & G. Youmans (Eds.), Phonetics and Phonology, Vol. 1: Rhythm and Meter (pp. 201-260). San Diego: Academic Press.
Hayes, B. (1995). Metrical Stress Theory: Principles and Case Studies. Chicago, IL: University of Chicago Press.
Hirsh-Pasek, K., Kemler Nelson, D.G., Jusczyk, P.K., Wright Cassiday, K., Druss, B. & Kennedy, L.J. (1987). Clauses are perceptual units for prelinguistic infants. Cognition, 26, 269-286.
Hirsh-Pasek, K., & Golinkoff, R.M. (1996). How children learn to talk. In W.R. Dell (Ed.), The World Book Health & Medical Annual (pp. 92-105). Chicago: World Book, Inc.
Iversen, J.R., Patel, A.D., & Ohgushi, K. (2008). Perception of rhythmic grouping depends on auditory experience. Journal of the Acoustical Society of America, 124, 2263-2271.
Jones, M.R. (1981). A tutorial on some issues and methods in serial pattern research. Perception and Psychophysics, 30(5), 492-504.
Jusczyk, P.W., Cutler, A., & Redanz, N.J. (1993). Infants’ preference for the predominant stress patterns of English words. Child Development, 64, 675-687.
Kelly, S. & Barr, D. (1999). Offering a hand to pragmatic understanding: The role of speech and gesture in comprehension and memory. Journal of Memory and Language, 40, 577-592.
Kendon, A. (1994). Do gestures communicate? Research on Language and Social

Interaction, 27(3), 175-200.
Krahmer, E.J. & Swerts, M. (2004). More about brows: A cross-linguistic analysis-by-synthesis study. In C. Pelachaud & Zs. Ruttkay (Eds.), From Brows to Trust: Evaluating Embodied Conversational Agents. Kluwer Academic Publishers.
Kumar, S., Abrams, R.A. & Mehta, R. (2009). Using movement and intentions to understand human activity. Cognition, 112, 201-216.
Kusumoto, K., & Moreton, E. (1997). Native language determines the parsing of nonlinguistic rhythmic stimuli. Journal of the Acoustical Society of America, 102, 3204.
Langus, A., Marchetto, E., Bion, R.A.H., & Nespor, M. (2012). Can prosody be used to discover hierarchical structure in continuous speech? Journal of Memory and Language, 66, 285-306.
Langus, A., Seyed-Allaei, S., Uysal, E., Pirmoradian, S., Marino, C., Asaadi, S., Eren, Ö., Toro, J.M., Peña, M., Bion, R.A.H., & Nespor, M. (in press). Listening natively across perceptual domains? Journal of Experimental Psychology: Learning, Memory and Cognition.
Lea, W.A. (1974). Prosodic Aids to Speech Recognition. IV: A General Strategy for Prosodically-guided Speech Understanding. Univac Report. Sperry Univac, DSD, Saint Paul, Minnesota.
Lehiste, I. (1970). Suprasegmentals. Cambridge: MIT Press.
Liberman, M. & Prince, A. (1977). On stress and linguistic rhythm. Linguistic Inquiry, 8, 249-336.
Marino, C. (2014). Audiovisual perception of rhythm. SISSA Master Thesis.

Marotta, G. (1985). Modelli e misure ritmiche. La durata vocalica in italiano. Bologna: Zanichelli.
Mehler, J., Jusczyk, P., Lambertz, G., Halsted, N., Bertoncini, J., & Amiel-Tison, C. (1988). A precursor of language acquisition in young infants. Cognition, 29, 143-178.
Mehler, J. & Nespor, M. (2004). Linguistic rhythm and the development of language. In A. Belletti & L. Rizzi (Eds.), Structures and Beyond: The Cartography of Syntactic Structures (pp. 213-221). Oxford: Oxford University Press.
Mehler, J., Sebastian Galles, N., & Nespor, M. (2004). Biological foundations of language: language acquisition, cues for parameter setting and the bilingual infant. In M. Gazzaniga (Ed.), The New Cognitive Neurosciences III (pp. 825-836). Cambridge, MA: MIT Press.
Molnar, M., Carreiras, M., & Gervain, J. (2016). Language dominance shapes non-linguistic rhythmic grouping in bilinguals. Cognition, 152, 150-159.
Molnar, M., Gervain, J. & Carreiras, M. (2013). Within rhythm class native language discrimination abilities of Basque-Spanish monolingual and bilingual infants at 3.5 months of age. Infancy, 19(3), 326-337.
Moon, C., Panneton-Cooper, R., & Fifer, W.P. (1993). Two-day-olds prefer their native language. Infant Behavior and Development, 16, 495-500.
Nazzi, T., Bertoncini, J., & Mehler, J. (1998). Language discrimination by newborns: Toward an understanding of the role of rhythm. Journal of Experimental Psychology: Human Perception and Performance, 24, 1-11.
Nespor, M. (1990). On the rhythm parameter in phonology. In I. Roca (Ed.), The Logical

Problem of Language Acquisition (pp. 157-175). Dordrecht: Foris.
Nespor, M. & Sandler, W. (1999). Prosody in Israeli Sign Language. Language and Speech, 143-176.
Nespor, M., Shukla, M. & Mehler, J. (2011). Stress-timed vs. syllable-timed languages. In M. van Oostendorp, C.J. Ewen, E. Hume & K. Rice (Eds.), The Blackwell Companion to Phonology (pp. 1147-1159). Wiley-Blackwell.
Nespor, M., Shukla, M., van de Vijver, R., Avesani, C., Schraudolf, H. & Donati, C. (2008). Different phrasal prominence realization in VO and OV languages. Lingue e Linguaggio, 7(2), 1-28.
Nespor, M. & Vogel, I. (1986). Prosodic Phonology. Dordrecht: Foris. 2007 edition: Berlin: Mouton De Gruyter.
O’Connor, J.D. (1965). The Perception of Time Intervals. Progress Report 2.11. Phonetics Laboratory, University College London.
Ordin, M. & Nespor, M. (2013). Transition probabilities and different levels of prominence in segmentation. Language Learning, 63(4), 800-834.
Peña, M., Bion, R.A.H. & Nespor, M. (2011). How modality specific is the iambic-trochaic law? Evidence from vision. Journal of Experimental Psychology: Learning, Memory, and Cognition, 37(5), 1199-1208.
Peña, M., Langus, A., Gutierrez, C., Huepe, D., & Nespor, M. (in press). Rhythm on your lips.
Pike, K.L. (1945). The Intonation of American English. Ann Arbor: University of Michigan Press.
Plato. The Laws. Loeb Classical Library. Cambridge, Mass.: Harvard University Press

(1926).
Poizner, H. & Kegl, G. (1993). Neural disorders of the linguistic use of space and movement. Annals of the New York Academy of Sciences, 682, 192-213.
Ramus, F. (2002). Language discrimination by newborns: Teasing apart phonotactic, rhythmic, and intonational cues. Annual Review of Language Acquisition, 2, 85-115.
Ramus, F., Hauser, M.D., Miller, C., Morris, D., & Mehler, J. (2000). Language discrimination by human newborns and by cotton-top tamarin monkeys. Science, 288, 340-351.
Ramus, F. & Mehler, J. (1999). Language identification with suprasegmental cues: A study based on speech resynthesis. Journal of the Acoustical Society of America, 105(1), 512-521.
Ramus, F., Nespor, M., & Mehler, J. (1999). Correlates of linguistic rhythm in the speech signal. Cognition, 73(3), 265-292.
Remez, R., Rubin, P., Pisoni, D., & Carell, T. (1981). Speech perception without traditional speech cues. Science, 212, 947-950.
Selkirk, E.O. (1984). Phonology and Syntax. Cambridge, Mass.: MIT Press.
Shen, Y. & Peterson, G.G. (1962). Isochronism in English. University of Buffalo Studies in Linguistics, Occasional Paper 9, 1-36.
Shukla, M., Nespor, M., & Mehler, J. (in prep). Grammar on a language map.
Swallow, K.M., Zacks, J.M., & Abrams, R.A. (2009). Event boundaries in perception affect memory encoding and updating. Journal of Experimental Psychology: General, 138, 236-257.

Thiessen, E.D., & Saffran, J.R. (2003). When cues collide: Use of statistical and stress cues to word boundaries by 7- and 9-month-old infants. Developmental Psychology, 39, 706-716.
Wagner, P.S. & Dellwo, V. (2004). Introducing YARD (Yet Another Rhythm Determination) and re-introducing isochrony to rhythm research. Proc. Speech Prosody, Nara.
White, L. & Mattys, S.L. (2007). Calibrating rhythm: first language and second language studies. Journal of Phonetics, 35, 501-522.
Wilbur, R.B. (2000). Phonological and prosodic layering of non-manuals in American Sign Language. In H. Lane & K. Emmorey (Eds.), The Signs of Language Revisited: Festschrift for Ursula Bellugi and Edward Klima (pp. 213-241). Hillsdale, NJ: Lawrence Erlbaum.
Woodrow, H. (1951). Time perception. In S.S. Stevens (Ed.), Handbook of Experimental Psychology (pp. 1224-1236). New York: Wiley.
Yoshida, K.A., Iversen, J.R., Patel, A.D., Nito, H., Mazuka, R., Gervain, J., & Werker, J.F. (2010). The development of perceptual grouping in infancy: a cross-linguistic study. Cognition, 115, 356-361.