Journal of Memory and Language 42, 571–596 (2000) doi:10.1006/jmla.1999.2696, available online at http://www.idealibrary.com
Implications of Stress-Pattern Differences in Spoken-Word Recognition
Sven Mattys and Arthur G. Samuel
State University of New York at Stony Brook
Existing models of spoken-word recognition positing that stressed syllables tend to be perceived as word onsets have not provided an account of the processing of non-initial-stress words. The present study suggests that such words require additional, time-consuming processing. Two experiments showed that phoneme monitoring is slower in non-initial-stress than initial-stress words, even when the target-carrying syllable is made identical through splicing. In a third experiment, the processing of non-initial-stress words was also found to be more memory-taxing than that of initial-stress words, a result consistent with the need for additional memory storage generated by incorrect lexical activation in non-initial-stress words. Taken together, the results support the view that words bearing different stress patterns are processed differently, with extra processing required for non-initial-stress words. The implementation of such a distinction is discussed in the framework of current models of word recognition, with an emphasis on processing time-course differences.
© 2000 Academic Press
Key Words: speech segmentation; prosody; retroactive processing.
This work was supported by the National Institute of Mental Health, Grant No. R01 MH51663. We thank Donna Kat for her help with computer programming and Richard Gerrig, Marie Huffman, and Nancy Squires for discussing the ideas presented in this article with us. We also thank Gary Dell for his comments on an earlier version of the manuscript. Address correspondence and reprint requests to Sven Mattys, Department of Communication Neuroscience, House Ear Institute, 2100 W. Third Street, Los Angeles, CA 90057. E-mail: [email protected].
Recognizing speech is fast and effortless. Words seem to emerge from spoken utterances in a clear and orderly fashion. However, the experience of speech as a sequence of discrete entities is an illusion, as speech is a surprisingly uninterrupted signal. Puzzled by this paradox, researchers have for years sought to understand how the human speech processor detects word boundaries in running speech. In this article, we examine one approach to lexical access that has recently gained considerable empirical support. The basic idea is that locating word boundaries in the signal can be substantially easier if listeners rely on prosodic cues, such as the occurrence of a strong syllable, to initiate lexical access (Cutler & Norris, 1988; Nakatani & Schaffer, 1978; Nooteboom, Brokx, & de Rooij, 1978; Norris, McQueen, & Cutler, 1995). Cutler and Norris (1988), who provided the first substantiated account of this position, have suggested that listeners apply a segmentation strategy whereby every strong syllable in the signal is assumed to be the onset of a word. Such a strategy, known as the Metrical Segmentation Strategy (MSS), is theoretically fairly successful because most words in everyday speech in English bear word-initial stress (Cutler, 1989; Cutler & Carter, 1987). Data provided by Cutler and Norris (1988) indicate that listeners do indeed tend to segment speech at stressed syllable onsets. In their experiments, Cutler and Norris had participants detect a monosyllabic word (e.g., “mint”) starting a bisyllabic pseudoword. One version of the pseudoword bore two stressed syllables (e.g., “mintayf” /’mIn’teIf/, with an apostrophe before a syllable indicating it is stressed), whereas a second version had an unstressed second syllable (e.g., “mintef” /’mIntəf/). The authors reasoned that, if stressed syllables are used to initiate lexical access, an item like “mintayf” should trigger two lexical searches, one on /’mIn/ and one on /’teIf/. The detection of “mint” in “mintayf” should thus be relatively slow because the /t/ would be perceived as belonging to the second search and not to the first one. Such interference would not occur
0749-596X/00 $35.00 Copyright © 2000 by Academic Press All rights of reproduction in any form reserved.
MATTYS AND SAMUEL
with “mintef” because the unstressed syllable /təf/ would not prompt any parsing. This is exactly what Cutler and Norris found: Detection of “mint” was slower in “mintayf” than in “mintef.” These results, reinforced by more recent evidence in both English (e.g., Cutler & Butterfield, 1992) and Dutch (Vroomen & de Gelder, 1995, 1997; Vroomen, van Zon, & de Gelder, 1996), have been invoked extensively to support the idea that prosody is used to segment speech and guide lexical access.
The major problem with stress-based models of word recognition is that they do not provide any direct account of how non-initial-stress words (e.g., “appear,” “fantastic”) are processed. In real life, such words do not seem to cause tremendous difficulty to listeners. To deal with non-initial-stress words, some authors have suggested that stress-based models need to include a routine that allows the speech system to correct faulty lexical access via some repair operation, such as retroactivity, or via additional processing (Bradley, 1980; Cutler, 1976, 1989; Gow & Gordon, 1993; Grosjean & Gee, 1987). For instance, Cutler (1989) indicated that stress-based segmentation may be “supplemented by an ancillary strategy whereby lexical words beginning with weak syllables can, under appropriate circumstances, be successfully accessed via their strong syllables” (p. 353). To date, no model has provided even a rudimentary idea of the nature of such an “ancillary strategy.” However, whatever cognitive mechanisms are involved, any such strategy should cause recognition to be delayed for the words that depend on it. Thus, all else being equal, processing non-initial-stress words should be more time consuming than processing words with initial stress. The central goal of the current study is to determine whether there are indeed such costs in processing non-initial-stress words when normally occurring acoustic confounds are controlled.
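The segmentation logic discussed above can be illustrated with a minimal sketch. It assumes a simplified reading of the MSS in which every strong syllable opens a new candidate unit and weak syllables attach to the unit already open. The function name `mss_segment` and the hand-coded strong/weak labels are illustrative assumptions, not part of the original studies.

```python
from typing import List, Tuple

# Each syllable is (text, is_strong). The labels are hand-coded assumptions.
Syllable = Tuple[str, bool]


def mss_segment(syllables: List[Syllable]) -> List[List[str]]:
    """Open a new candidate unit at every strong syllable (MSS-style sketch)."""
    units: List[List[str]] = []
    for text, is_strong in syllables:
        if is_strong or not units:
            # A strong syllable (or the utterance onset) starts a new unit.
            units.append([text])
        else:
            # A weak syllable is attached to the unit currently open.
            units[-1].append(text)
    return units


mintayf = [("min", True), ("tayf", True)]
mintef = [("min", True), ("tef", False)]
appear = [("a", False), ("ppear", True)]

print(mss_segment(mintayf))  # [['min'], ['tayf']]
print(mss_segment(mintef))   # [['min', 'tef']]
print(mss_segment(appear))   # [['a'], ['ppear']]
```

Under these assumptions, “mintayf” yields two units and “mintef” one, in line with the reasoning of Cutler and Norris (1988). The non-initial-stress word “appear” is split at its strong syllable, which is the kind of incorrect parse that an ancillary repair strategy would have to undo.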
In addition, this study also tests a corollary prediction, that processing non-initial-stress words should place a greater load on short-term phonetic memory. These tests are intended to provide an initial fleshing
out of the properties of the putative “ancillary strategy.”
Although there is, so far, little direct evidence that the speech system repairs faulty prosody-driven segmentation by backtracking to information presented earlier in time, retroactivity and/or delayed commitment, as more general mechanisms, have been demonstrated repeatedly (e.g., Bard, Shillcock, & Altmann, 1988; Cluff & Luce, 1990; Connine, Blasko, & Hall, 1991; Grosjean, 1985; Luce & Cluff, 1998; Samuel, 1979; Sherman, 1971; Warren & Sherman, 1974). For example, Sherman (1971), in an experiment on phonemic restoration, showed that subsequent context could influence the identity of a restored phoneme (e.g., /w/ being restored more readily than /p/ in “the *eel on the car” and /p/ more readily than /w/ in “the *eel on the orange,” where the asterisk indicates that the original phoneme has been replaced with noise). At the word level, Cluff and Luce (1990) found evidence of retroactive processing in the identification of spondees presented in white noise (spondees are bisyllabic compound words like “chestnut” or “deadlock” that bear two stressed syllables). The authors observed that subjects’ performance was better on spondees whose second constituent was a high-frequency word with few phonetic neighbors than on spondees in which both constituents were low-frequency and shared many phonetic neighbors. However, a high-frequency constituent with few phonetic neighbors in the first position did not improve performance. This result was taken by the authors as an indication of retroactivity in spoken-word recognition. Ambiguity (i.e., low-frequency and high-density neighborhood) in a first syllable can be compensated for by more useful information (i.e., high-frequency and low-density neighborhood) coming later in time. This finding suggests that, under conditions of lexical uncertainty, right-to-left processing can occur in spoken-word recognition.
Finally, Luce and Cluff (1998) have shown that a late strong syllable can generate spurious lexical activation, exerting its influence past a word’s offset. Subjects were presented with auditory monosyllables like “lock” and bisyllabic compound words that contained the monosyllable embedded at word offset, like “hemlock,” followed by a visual target for lexical decision. The target was either semantically related to the shared syllable of the auditory primes (e.g., “key”) or unrelated (e.g., “dance”). Importantly, the related target was not associated with the meaning of the entire bisyllable and the recognition point of the bisyllable, estimated from gating pilot data, always fell between the onset and the offset of the second morpheme. The results showed that lexical decision latencies were shorter for the related target than for the unrelated target, after both the monosyllabic and the bisyllabic primes. That is, regardless of the fact that the recognition point of the entire bisyllable was in theory reached well before the presentation of the target, the second (stressed) syllable produced a priming effect comparable to that produced by the monosyllable alone. This result shows that lexical activation does not proceed in an exclusively left-to-right fashion, with lexical commitments made sequentially as the speech input unfolds. Instead, late stressed syllables initiate lexical searches that can presumably compete with other, ongoing searches (see Shillcock, 1990, for similar results).
In the present study, we examine how lexical stress affects word processing by focusing on two types of words: initial-stress words and non-initial-stress words. Operationally, a stressed syllable is defined here as the most prominent syllable in a word (i.e., the primary stressed syllable). Therefore, what we refer to as an initial-stress word is a word starting with a primary stressed syllable (e.g., “basic,” “generator,” and “capitalism”), whereas a non-initial-stress word is a word starting with any syllable other than a primary stressed one (e.g., “terrain,” “panorama,” and “competition”). This specification is particularly relevant in Experiment 3, which features four-syllable-long stimuli.
Our underlying assumption is that primary stressed syllables are the most likely to initiate lexical access and that, as a consequence, words not starting with a primary stressed syllable will incur a processing delay. Compared to more traditional stress-based models (e.g., Cutler & Norris, 1988; Norris,
McQueen, & Cutler, 1995), in which all syllables bearing a full vowel promote lexical access (Cutler, 1986), our assumption is conservative. A model in which primary stressed syllables were the only initiators of lexical access would be most efficient. When frequency-weighted English content words are considered, access based on full vowels theoretically leads to correct detection of word boundaries about 90% of the time (Cutler, 1990; Cutler & Carter, 1987). This figure includes all words starting with either a primary or a secondary stressed syllable. However, such an algorithm is relatively wasteful since it inevitably postulates an erroneous word boundary in the middle of a majority of polysyllabic words. Therefore, ignoring or downplaying secondary stress as a segmentation marker would improve the model’s accuracy (even though doing so decreases the word boundary detection rate by a few percentage points; see Mattys, 2000, for more details on primary/secondary stress distribution in the lexicon).
Despite indications that different degrees of stress in unreduced syllables can be difficult to discriminate (Fear, Cutler, & Butterfield, 1995; Lieberman, 1965), Mattys and Samuel’s (1997) results using the migration paradigm revealed noticeable processing differences between words starting with a primary stressed syllable (e.g., “dictionary”) and words starting with a secondary stressed syllable (e.g., “demolition”). In addition, recent results obtained in the same laboratory (Mattys, 2000) showed that participants are consistently above chance when they have to choose which one of two words, e.g., “prosecutor” or “prosecution,” a sequence like /prosI/ presented auditorily is spliced out of. This result suggests that listeners have some capacity to discriminate between primary stressed and secondary stressed syllables and thus could exploit this information to guide recognition.
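The trade-off described above, between detecting word onsets and positing erroneous word-internal boundaries, can be made concrete with a toy tally. The sketch below compares a rule that treats both primary and secondary stressed syllables as onsets with a rule that treats only primary stressed syllables as onsets. The four-word lexicon, the hand-coded stress patterns, and the function names are illustrative assumptions, and secondary stress stands in for "full vowel" as a simplification; the counts say nothing about the real English lexicon.

```python
from typing import Dict, List, Tuple

# Stress level per syllable: 1 = primary, 2 = secondary, 0 = unstressed.
# Hand-coded, illustrative patterns (assumptions, not the authors' materials).
LEXICON: Dict[str, List[int]] = {
    "dictionary": [1, 0, 2, 0],
    "demolition": [2, 0, 1, 0],
    "basic": [1, 0],
    "terrain": [0, 1],
}


def posited_onsets(pattern: List[int], primary_only: bool) -> List[int]:
    """Indices of syllables that the rule would treat as word onsets."""
    triggers = {1} if primary_only else {1, 2}
    return [i for i, level in enumerate(pattern) if level in triggers]


def tally(lexicon: Dict[str, List[int]], primary_only: bool) -> Tuple[int, int]:
    """Return (true onsets found, erroneous word-internal boundaries)."""
    found = 0
    false_boundaries = 0
    for pattern in lexicon.values():
        onsets = posited_onsets(pattern, primary_only)
        found += int(0 in onsets)
        false_boundaries += sum(1 for i in onsets if i > 0)
    return found, false_boundaries


print(tally(LEXICON, primary_only=False))  # (3, 3)
print(tally(LEXICON, primary_only=True))   # (2, 2)
```

In this toy set, restricting the rule to primary stress removes one erroneous internal boundary (in “dictionary”) at the price of missing one true onset (in “demolition”), which mirrors the direction of the trade-off described in the text.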
In fact, recent evidence shows that both adult and infant listeners are able to rely on subtle stress differences to segment words from fluent speech (Mattys, Jusczyk, Luce, & Morgan, 1999; Morgan, 1996; Vroomen & de Gelder, 1997). For instance, Vroomen and de Gelder
found that Dutch listeners may use a Stress-Based Segmentation (SBS) strategy, in which only primary stressed syllables signal word boundaries, rather than a Metrical Segmentation Strategy (MSS) (Vroomen & de Gelder, 1995). In one of their experiments, Dutch participants had to detect a bisyllabic trochaic word (e.g., CRAter) embedded in a trisyllabic string. Results showed that, with acoustic differences controlled, the target word was detected faster when its initial syllable was realized as a primary stressed syllable in the string (e.g., /”pɔ’krɑtər/) than when it was secondary stressed (e.g., /”pɔ’krɑtər/). This result suggests that segmentation is guided more by the degree of a syllable’s stress than by whether it is stressed. MSS, which attributes the same segmentation power to any syllable bearing a full vowel, would have predicted that “CRAter” should be segmented from the two strings equally quickly. In Experiment 3, the stimuli are built following a primary versus secondary stress distinction and, hence, are in keeping with the SBS approach.
The three experiments reported here are designed to test a set of predictions that, if validated, would collectively support the notion that non-initial-stress words involve delayed processing, presumably because late information has to be incorporated into the encoding of earlier sounds to compensate for inappropriate lexical access begun on an earlier weak syllable. Experiment 1 tests for the existence of a penalty imposed on non-initial-stress bisyllabic words (a logical consequence of delayed processing) when all unwanted acoustic variations in the signal are controlled for. Experiment 2 further investigates the cost of retroactive processing by considering the detection times to word-initial phonemes in strong–weak and weak–strong words whose target-bearing syllable is made identical through splicing (e.g., /p/ in PERmit vs perMIT or /k/ in CAMpus vs camPAIGN).
Thus, the first two experiments provide a rigorous test of the fundamental prediction that non-initial-stress words incur a processing cost. Critically, in these tests, splicing is used to factor out any artifactual source of any observed differences, an essential control not present in previous work. Experiment 3 tests another prediction logically associated with the retroactive processing hypothesis, namely that processing of late-stress words places a substantial demand on memory because early information has to be maintained until delayed integration is accomplished. Consequently, the processing of late-stress words is expected to take a larger toll on memory resources than that of initial-stress words. To increase the chance of observing differences in memory demands, longer words (quadrisyllables) were used in Experiment 3. This feature also allowed us to examine a contrast between primary versus secondary stress (SBS) rather than strong versus weak syllables (MSS).
EXPERIMENT 1
In this experiment, phoneme-detection RTs were collected in initial-stress and non-initial-stress words matched phonemically and acoustically on their target-carrying syllables. On each trial, participants were presented with a list of seven words played back to back. For each list, they were instructed to detect a prespecified syllable-initial consonant. The target-carrying syllable was either stressed or unstressed and was either the first (early) or second (late) syllable of a bisyllabic word. To neutralize acoustic variations between conditions, the same syllable was used in the early and late positions. For example, responses to the target phonemes /g/ and /z/ in the syllables /gə/ and /’zel/ were both measured as early and as late targets in the list-embedded words “gazelle” and “saga–zealous.” The target-bearing syllables were made acoustically identical in the early and late conditions by splicing “gazelle” out of “saga–zealous” (procedural details are given below).
With this 2 × 2 factorial design (i.e., the carrier-syllable can be stressed-early, stressed-late, unstressed-early, or unstressed-late), the results should reflect the interplay of three factors: the position of the syllable in a word, the salience of the syllable, and, most importantly, the stress pattern of the syllable-carrying word.
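Because all carrier words are bisyllabic, the stress pattern of the carrier word is fully determined by the two manipulated factors. The short sketch below spells out that mapping for the example items mentioned in the text; the function name `word_stress_pattern` and the string labels are illustrative assumptions.

```python
def word_stress_pattern(syllable_stress: str, position: str) -> str:
    """Stress pattern of a bisyllabic carrier word, given one of its syllables."""
    # A stressed early syllable or an unstressed late syllable implies a trochee.
    stressed_first = (syllable_stress == "stressed") == (position == "early")
    return "initial-stress" if stressed_first else "non-initial-stress"


# Example carriers from the text: (syllable, carrier) -> (stress, position).
EXAMPLES = {
    ("/'zel/", "zealous"): ("stressed", "early"),
    ("/'zel/", "gazelle"): ("stressed", "late"),
    ("/gə/", "gazelle"): ("unstressed", "early"),
    ("/gə/", "saga"): ("unstressed", "late"),
}

for (syllable, carrier), (stress, position) in EXAMPLES.items():
    print(syllable, carrier, stress, position, word_stress_pattern(stress, position))
```

Seen this way, an effect of the carrier word's stress pattern surfaces statistically as an interaction between syllable stress and syllable position, since the stressed-late and unstressed-early cells are the two that belong to non-initial-stress words.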
Method
Participants. Two groups of 28 undergraduate students of the State University of New York at Stony Brook received course credit for their participation in the experiment. All were native English speakers with no reported hearing problems.
Materials. Thirty-two bisyllabic test words were selected, 16 of them bearing primary stress on the second syllable (iambic words) and the other 16 bearing primary stress on the first syllable (trochaic words). For each test word, a pair of words (“test twins”) was chosen that embedded the acoustic sequence of the test word. For example, the iambic word “gazelle” is embedded in “saga–zealous” and the trochaic word “token” is embedded in “plateau–conceal.” The 32 test words were generated through computer-assisted excerpting of the sequence in the associated test twins. “Gazelle” and “token” were thus never recorded as such. The advantage of keeping the acoustic signal constant is that the RTs to a target phoneme in a given syllable (/gə/ or /’zel/ and /’to/ or /kən/) can be examined in a bias-free manner when these syllables are word-initial and when they are not. The 16 iamb-based sets can then be considered to be control items for the 16 trochee-based sets, and vice versa, where the origin of the syllable (either real or spliced-out) is balanced between the two series of sets. Indeed, if we kept only, say, the iamb sets (e.g., “gazelle,” “saga–zealous”), the stressed syllables (e.g., /’zel/) would always acoustically originate from the beginning of a word (e.g., “zealous”) in both the early and late conditions, whereas the unstressed syllables (e.g., /gə/) would always originate from the end of a word (e.g., “saga”). In the trochee sets, this link is reversed: Stressed syllables (e.g., /’to/) originate from the end of a word (e.g., “plateau”), whereas unstressed syllables (e.g., /kən/) originate from the beginning of a word (e.g., “conceal”).
Using both sets of stimuli provides us with an entirely balanced set of materials. The words bearing the target in a stressed syllable had an average frequency of 46.6 (Kučera & Francis, 1967) and an average
uniqueness point of 3.7 (computed as the position of the phoneme after which the sequence can be uniquely specified). The corresponding figures for the words bearing the target in an unstressed syllable were 23.0 and 4.0. When these numbers were entered in ANOVAs with the position factor, no significant effects or interactions were found. An additional ANOVA performed on the frequency data also did not reveal any reliable differences among the three critical types of words within either the iamb sets (e.g., “saga,” “zealous,” and “gazelle”) or the trochee sets (e.g., “plateau,” “conceal,” and “token”). The test words were included in lists of seven words played with no pauses between them (see Appendix A). For example, the two sets of stimuli shown above were presented in the following lists:
(1) sauna–gazelle–awkward–profane–depart–dissect–pervade
(2) water–basic–dictate–saga–zealous–furry–beyond
(3) fizzy–token–salad–revenge–surmise–bombing–fusion
(4) repair–grimace–plateau–conceal–gracious–improve–pantry
Before the presentation of each list, a target phoneme was specified by presenting the corresponding letter on a computer monitor. Targets were the initial consonants of the syllables under study. While one participant had to detect /g/ in the first and second lists and /t/ in the third and fourth lists, another participant had to detect /z/ in the first and second lists and /k/ in the third and fourth lists. Thus, the participants in the first group always detected consonants originating from a word-final syllable, whereas the participants in the second group always detected consonants originating from a word-initial syllable (recall that “gazelle” and “token” were constructed, not recorded directly). This between-subject feature avoided presenting the same list twice to the same participant.
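The uniqueness point used above to match the materials can be computed mechanically once a reference lexicon is fixed. The following is a minimal sketch under two loudly stated assumptions: letters stand in for phonemes, and a five-word toy lexicon stands in for a real dictionary. The function name and the lexicon are hypothetical, and the values it returns are not the values reported in the text.

```python
from typing import Optional, Sequence


def uniqueness_point(word: str, lexicon: Sequence[str]) -> Optional[int]:
    """1-based position of the segment after which `word` is the only candidate.

    Returns None if the word never becomes unique (e.g., it is the onset of
    a longer entry). Letters are used as stand-ins for phonemes (assumption).
    """
    others = [entry for entry in lexicon if entry != word]
    for k in range(1, len(word) + 1):
        prefix = word[:k]
        if not any(entry.startswith(prefix) for entry in others):
            return k
    return None


# Toy lexicon (assumption); a real computation would use a phonemic dictionary.
TOY_LEXICON = ["token", "total", "topic", "gazelle", "gazette"]

print(uniqueness_point("token", TOY_LEXICON))    # 3
print(uniqueness_point("gazelle", TOY_LEXICON))  # 5
```

The result depends entirely on the lexicon and transcription chosen, which is why the sketch should be read as a definition made explicit rather than as a reconstruction of the reported averages.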
Given that each test twin (e.g., “saga–zealous” and “plateau– conceal”) had to be recorded as one string to allow the extraction of the embedded words, the resulting test words (e.g.,
“gazelle” and “token”) could conceivably have had some acoustic quality that might constitute a cue to the target position. To avoid this potential bias, two pairs of words chosen randomly in each list were recorded pairwise to mimic the quality of the test twins. In the lists containing a test twin, one extra pair of words chosen randomly was recorded pairwise. Thus, all lists included four words (two pairs) recorded pairwise, regardless of the condition. For example, in the first list above, “profane–depart” and “dissect–pervade” were recorded as two continuous strings. So was “water–basic,” in addition to “saga–zealous,” in the second list. In the third list, “salad–revenge” and “surmise–bombing” were the pairs, as was “improve–pantry,” in addition to “plateau–conceal,” in the fourth list. All the other words of the lists were recorded in isolation and then concatenated with the pairs of words and the spliced-out test words.
There were several considerations regarding the position of the test words and test twins in the lists and the choice of filler words. The position of the carrier-word was randomly chosen, ranging from second to sixth word in the list. The overall position of the carrier-word was balanced between the iamb-based and trochee-based sets. The filler words generally had the same characteristics as the test words and test twins: half of them started with a stressed syllable while the other half started with an unstressed syllable. Most filler words were two syllables long (some were three syllables long) and most started with a consonant.
In the experimental lists, the target consonants were always carried by either a cross-spliced word (e.g., “gazelle” and “token”) or by words recorded pairwise (e.g., “saga–zealous” and “plateau–conceal”). To avoid any potential cuing of the target based on the quality of the carrier-word, 26 filler lists were built that carried a target consonant in words recorded in isolation.
In 13 of them, the target was the initial consonant of the first syllable and, in the other 13, it was the initial consonant of the second syllable. To compensate for the absence of targets in the first and seventh positions in the experimental lists, a majority of carrier-words
in the filler lists occurred in these two positions. Moreover, to match the structure of the experimental lists, each filler list contained four words recorded pairwise and some of the lists contained a word recorded through cross-splicing. This word was spliced out of two pseudowords recorded back to back. In the filler lists, the target never occurred in the pairs of words or in the cross-spliced words. There were also 10 target-absent lists that followed the same features of construction as those used to create the experimental and filler lists. Altogether, there were 100 lists: 64 experimental, 26 filler, and 10 target-absent. The 100 lists were randomized for presentation. Thirty practice trials were built with the same characteristics as the other lists. The target could occur in any position in the list. It could be in the first or second syllable of a word, in a cross-spliced word, or in a pair-recorded word. Five of the practice lists were target-absent.
Experiment 1 was designed to maximize sensitivity to lexical processing. Frauenfelder and Segui (1989) have shown that the “generalized” version of phoneme monitoring, in which listeners detect phonemes that can occur in positions other than word-initial, provides a reliable way to reveal lexical effects (see also Frauenfelder, Segui, & Dijkstra, 1990; Pitt & Samuel, 1990). Given that, in our design, the target phonemes occurred in multiple locations within the carrier-words, the results we obtain are likely to reflect any lexical processing. To further ensure that participants performed the task at a lexical, rather than acoustic, level, a second, simultaneous task was added to the phoneme-detection task (see Eimas, Marcovitz, Hornstein, & Payton, 1990). Participants were asked to push a separate button every time they heard a word belonging to the semantic category “tool” (e.g., “hammer,” “chisel”). Ten percent of the lists contained a word related to the “tool” semantic category.
Tool words were not included in any experimental lists. Apparatus and procedure. All of the words were recorded by a male native English speaker in a soundproof chamber. They were low-pass filtered at 4.8 kHz, digitized (12-bit A/D) at 10 kHz, and stored on the disks of two 486/100
computers. On output, the items were converted to analog form (12-bit D/A, 10 kHz), low-pass filtered at 4.8 kHz, and delivered over high-quality headphones at approximately 70 dB SPL. Each participant was tested individually in a soundproof chamber. The participants were seated in front of a monitor and wore headphones. They were told that, on each trial, they would first see a capital letter appear on the center of the monitor (the sound /ʃ/, as in “machine” or “shallot,” was represented by the two letters SH). They were then told that they would hear a list of several words played without interruption. The participants were instructed to push a button with their right hand as soon as the phoneme represented by the letter(s) on the monitor was heard in the list. They were not to push the button if they did not hear the target phoneme. They were also instructed to simultaneously detect all the words that could be classified as “tools.” Several examples, not used in the actual experiment, were given. The semantic task had to be performed with the left hand, using a button located to the left of the phoneme-monitoring button on the response board. The experimenter emphasized both speed and accuracy in the two tasks. On each trial, the orthographic representation of the target phoneme was presented on the monitor for 1.5 s. Then, there was an interval of 1 s until the onset of the auditory list. One second after list offset, whether or not a response had been given, the program moved on to the next trial. Thirty practice trials preceded the test trials.
Results
The hypothesis tested in this experiment is that non-initial stress is detrimental to phoneme-monitoring RTs. In this conceptualization, we expect that both early and late consonants of non-initial-stress words take longer to be detected because such words require additional processing.
Figure 1 illustrates several possible stress-sensitive response patterns, all consistent with the hypothesized additional processing in non-initial-stress words. Figure 1a shows a pure case of stress-based
processing. Phonemes are processed faster in initial-stress than in non-initial-stress words, regardless of their position in the word and the salience of the syllable in which they occur. In the next two cases, the advantage for targets in initial-stress words combines with an overall position effect (1b) or an overall salience effect (1c). Figure 1b can be obtained by simply tilting the right side of Fig. 1a upward to indicate the position penalty imposed on late segments (see Pitt & Samuel, 1995, for a similar position effect between word-onset and word-medial phonemes). Likewise, Fig. 1c can be conceptualized as a modification of Fig. 1a in which the “unstressed” line is shifted upward to indicate the salience penalty imposed on unstressed syllables (for such salience effects, see, e.g., Bond & Garnes, 1980; Cutler & Foss, 1977; Lieberman, 1965; Mehta & Cutler, 1988; Shields, McHugh, & Martin, 1974). Finally, Fig. 1d depicts the combination of all three factors: stress pattern, position, and salience. In this representation, an early stressed syllable incurs the least cost because it begins a word, enjoys salience, and belongs to an initial-stress word. The early unstressed, late unstressed, and late stressed syllables are each penalized twice for, respectively, being unstressed and belonging to a non-initial-stress word, being late and unstressed, and being late and belonging to a non-initial-stress word. A statistical feature shared by these four implementations is the interaction between the Stress and Position factors. Sensitivity to only the position of the target phoneme, only the salience of the target-carrying syllable, or both position and salience would not yield a pattern of interaction.
In this experiment, RTs were measured from the onset of the target phonemes. Only RTs between 100 and 1500 ms were kept in the analyses; 7% of the data was discarded by this criterion.
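The claim that only a stress-pattern cost produces a Stress × Position interaction can be checked with a simple additive sketch of the four hypothetical patterns. The baseline and penalty values below (in milliseconds) are arbitrary illustrative assumptions, as are the function names; only the logic of adding a penalty per factor follows the description above.

```python
def predicted_rt(stress: str, position: str, pattern_cost: int = 0,
                 position_cost: int = 0, salience_cost: int = 0,
                 base: int = 500) -> int:
    """Additive RT prediction for one cell of the Stress x Position design."""
    # Stressed-late and unstressed-early syllables belong to non-initial-stress words.
    non_initial = (stress == "stressed") != (position == "early")
    rt = base
    rt += pattern_cost if non_initial else 0
    rt += position_cost if position == "late" else 0
    rt += salience_cost if stress == "unstressed" else 0
    return rt


def interaction(**costs: int) -> int:
    """Stress effect at the early position minus stress effect at the late position."""
    early = predicted_rt("unstressed", "early", **costs) - predicted_rt("stressed", "early", **costs)
    late = predicted_rt("unstressed", "late", **costs) - predicted_rt("stressed", "late", **costs)
    return early - late


print(interaction(position_cost=30, salience_cost=40))                   # 0
print(interaction(pattern_cost=50))                                      # 100
print(interaction(pattern_cost=50, position_cost=30, salience_cost=40))  # 100
```

In this additive scheme the interaction contrast equals twice the stress-pattern cost, whatever the position and salience penalties are, so it vanishes exactly when the stress pattern of the word carries no cost.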
FIG. 1. (Left) Hypothetical RTs to the syllable-initial consonant of the first (“Early”) and second (“Late”) syllables of bisyllables when the stress pattern of the word is factored in the model (a), in combination with the position of the syllable (b), the stress of the syllable (c), or both (d). (e) Actual mean RTs obtained in Experiment 1.
Mean RTs and percentages of correct responses were calculated by subjects and by items for each of the eight cells of the design: Stress (first vs second syllable), Position (early vs late), and Origin (acoustic beginning of a word vs acoustic end of a word). Results are shown in Table 1 and collapsed across the procedural Origin factor in Fig. 1e. A three-way ANOVA was performed on the RTs, examining Stress, Position, and Origin. For testing stress-based models of lexical access, the relationship of Stress and Position is
TABLE 1
Mean RTs (in Milliseconds) to Phoneme Detection as a Function of the Stress, Position, and Acoustic Origin of the Target-Carrying Syllable^a (Experiment 1)
                               Acoustic origin
              Beginning of a word        End of a word
Stress        Early        Late          Early        Late
Stressed      534 (14)     580 (22)      529 (13)     600 (17)
Unstressed    624 (14)     596 (27)      579 (19)     612 (31)
^a Percentage of errors is indicated in parentheses.
critical. There was a significant interaction between Stress and Position, F1(1,54) = 13.21, p < .001; F2(1,30) = 8.29, p < .01 (see Fig. 1e). Such an interaction was only predicted when stress-based strategies were factored in the model. This interaction was not affected by the acoustic origin of the syllable, F1(1,54) = 1.34, p > .20; F2 < 1. There were main effects of Stress, F1(1,54) = 40.20, p < .001; F2(1,30) = 9.57, p < .005, and Position, F1(1,54) = 9.63, p < .005; F2(1,30) = 6.69, p < .02. Phonemes were detected faster in stressed than unstressed syllables and faster when they started an early syllable. The Position effect was more robust when the syllable acoustically originated from the end of a word, F1(1,54) = 4.85, p < .05; F2(1,30) = 5.29, p < .03. All of the other main effects and interactions were nonsignificant, both by subjects and by items. Further analyses of the critical interaction
TABLE 2
Percentage of Errors as a Function of the Stress and Position of the Target-Carrying Syllable and the Overall Accuracy of the Participants (Experiment 1)

              High-accuracy        Low-accuracy
              participants         participants         Total
Stress        Early    Late        Early    Late        Early    Late
Stressed      8        11          18       31          13       20
Unstressed    12       21          21       27          16       29
between Stress and Position revealed that the stress effect was reliable only in the early position, F1(1,54) = 41.19, p < .001; F2(1,30) = 16.37, p < .001. The initial consonant of an early syllable was detected faster when the syllable was stressed than when it was unstressed. This pattern was expected by models in which performance is guided by a combination of stress pattern and salience (Fig. 1c) or by a combination of stress pattern, salience, and position (Fig. 1d). Of these two alternatives, Fig. 1d clearly fits the shape of the graph and the statistical results best. Indeed, the Position effect (faster RTs to early than late syllables) appeared with stressed syllables, F1(1,54) = 24.56, p < .001; F2(1,30) = 19.87, p < .001, but not with unstressed syllables, F1 and F2 < 1.
An ANOVA of the error rates revealed a main effect of Stress, F1(1,54) = 26.81, p < .001; F2(1,30) = 6.05, p < .03, a main effect of Position, F1(1,54) = 36.19, p < .001; F2(1,30) = 11.89, p < .003, and an interaction between these two factors significant by subjects, F1(1,54) = 9.38, p < .003, but not by items, F2(1,30) = 1.67, p > .20. Accuracy was higher in stressed than unstressed syllables and in early than late syllables. These two main effects reflect the pattern found in the RT analyses and the traditional trends observed with the phoneme-detection and mispronunciation-detection tasks. Phonemes are better detected in the stressed than the unstressed syllables of words presented in running speech (Mehta & Cutler, 1988; Shields, McHugh, & Martin, 1974) and better early than late in words (Pitt & Samuel, 1995). Likewise, mispronounced phonemes are better detected in stressed than unstressed syllables (Cole & Jakimik, 1980b) and early than late in words (Cole, Jakimik, & Cooper, 1978).
The interaction obtained in the accuracy data appears to result from a ceiling effect for the early stressed syllables. Indeed, when the 28 participants in each group were split into two subgroups of 14 according to their overall level of detection accuracy (see Table 2), only the “high-accuracy” participants (mean error rate = 13%, SD = 9) showed the significant interaction between Stress and Position, F1(1,26) = 6.34, p < .02; F2(1,30) = 1.98, p < .20. The interaction in the “low-accuracy” participants (M = 26%, SD = 13) did not reach significance either by subjects, F1(1,26) = 3.72, p > .06, or by items, F2(1,30) = 1.12, p > .20. Note that both subgroups still showed a main effect of Stress [high-accuracy participants: F1(1,26) = 17.34, p < .001; F2(1,30) = 6.09, p < .03; low-accuracy participants: F1(1,26) = 9.70, p < .005; F2(1,30) = 3.29, p < .09] and a main effect of Position [high-accuracy participants: F1(1,26) = 13.56, p < .005; F2(1,30) = 5.74, p < .03; low-accuracy participants: F1(1,26) = 25.68, p < .001; F2(1,30) = 13.69, p < .001]. Moreover, the RT data resulting from the partitioning revealed the same pattern as in the entire group (although the interaction between Position and Stress only reached significance in the low-accuracy group).
Finally, Stress and Origin significantly interacted by subjects, F1(1,54) = 9.53, p < .003, but not by items, F2(1,30) = 2.15, p > .15. Phonemes were detected more accurately in stressed than unstressed syllables only when the
carrier-syllable acoustically originated from the end of a word, F1(1,27) = 36.01, p < .001; F2(1,30) = 6.24, p < .02 [acoustic beginning of a word: F1(1,54) = 2.07, p > .20; F2(1,30) < 1].
Discussion
The phoneme-monitoring reaction times suggest that the processing of bisyllabic words is contingent on a combination of factors. Most importantly, the robust interaction between Stress and Position shows that phoneme detection is sensitive to the stress pattern of the carrier-word; detection is slower in non-initial-stress words than initial-stress words. This result was predicted by the view that extra time is needed for dealing with non-initial-stress words. Given the lexical nature of any stress pattern differences, such a stress-based cost would clearly have a lexical origin. Note that an early unstressed syllable does not generate longer RTs than a late unstressed syllable (which would be expected from a pure stress-based model). This is so because performance is modulated by the position of the carrier-syllable and by its salience (see Fig. 1d), two factors that are more likely to be acoustically mediated in this monitoring experiment. Reaction times were indeed faster, overall, in early than in late positions and in stressed than in unstressed syllables.
In conclusion, the results of Experiment 1 suggest that speech processing is not a simple linear process driven by any one of the factors traditionally discussed in the literature (the physical salience of the speech constituents, the unfolding of time, and the expectation of a specific stress pattern). The present results are best explained by a combination of the three. Given the substantial body of evidence that has demonstrated the role of target position and salience, the major new finding of Experiment 1 is that words with non-initial stress are processed more slowly than those with initial stress. 
Critically, this result is obtained even when splicing is used to ensure acoustic identity of target phonemes and syllables in early and late positions.
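To make the Stress by Position interaction concrete, the cell means in Table 1 can be collapsed across Origin and reduced to a difference of differences. The sketch below is descriptive arithmetic on the published means only; the variable and function names are illustrative, and averaging the two Origin cells assumes equal numbers of observations per cell. It does not reproduce the ANOVA.

```python
# Mean RTs (ms) from Table 1, keyed by (stress, position, acoustic origin).
rt = {
    ("stressed", "early", "beginning"): 534,
    ("stressed", "late", "beginning"): 580,
    ("unstressed", "early", "beginning"): 624,
    ("unstressed", "late", "beginning"): 596,
    ("stressed", "early", "end"): 529,
    ("stressed", "late", "end"): 600,
    ("unstressed", "early", "end"): 579,
    ("unstressed", "late", "end"): 612,
}


def collapsed(stress, position):
    """Mean RT of a Stress x Position cell, averaged over the two origins."""
    return (rt[(stress, position, "beginning")] + rt[(stress, position, "end")]) / 2


# Stress effect (unstressed minus stressed) at each position.
stress_effect_early = collapsed("unstressed", "early") - collapsed("stressed", "early")
stress_effect_late = collapsed("unstressed", "late") - collapsed("stressed", "late")

# The interaction is the difference between the two stress effects.
interaction_contrast = stress_effect_early - stress_effect_late

print(stress_effect_early, stress_effect_late, interaction_contrast)
```

On these means the stress effect is about 70 ms in the early position and about 14 ms in the late position, which is the asymmetry that the follow-up analyses test.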
EXPERIMENT 2
The results of Experiment 1 demonstrate that non-initial-stress words (or at least bisyllabic iambs) constitute a more difficult challenge for the speech processor than do words starting with a stressed syllable. As just noted, a particularly compelling aspect of these results was that exactly the same syllables were probed as pieces of either a trochaic word (e.g., /gə/ in “saga”) or an iambic one (e.g., /gə/ in “gazelle”). Experiment 2 takes this approach one step further: we controlled not only the acoustic identity of the target-carrying syllable (through splicing) but also its position, so that the same target-bearing syllable occurred in the same position in both iambic and trochaic words. Doing so makes it possible to isolate retroactive effects from the position and salience factors.
In Experiment 2, participants monitored phonemes in isolated words. On the critical test trials, the target was the initial phoneme of strong–weak and weak–strong words like “PERmit” and “perMIT” (target: /p/) or “CAMpus” and “camPAIGN” (target: /k/). To prevent the acoustic realization of the target phoneme from influencing detection latencies, independent of retroactive influences from the second syllable, we used the same initial syllable for both members of a pair. For instance, a neutral “per” (henceforth PER) was recorded and spliced to the second syllable of normally recorded “PERmit” and “perMIT.” The result is a pair of perceptually strong–weak and weak–strong stimuli differing only in their second syllable (e.g., PERmit vs PERMIT), with position and salience of the target-bearing syllable entirely neutralized. If lexical processing proceeds sequentially, independent of perceived stress pattern, monitoring times should not differ in the strong–weak and weak–strong versions of the stimuli. However, if late stress causes the processing of earlier segments to be delayed, monitoring times should be longer in weak–strong than strong–weak words.
Method
Participants. 
Twenty-one students of the Johns Hopkins University received a small honorarium for their participation in the experiment. All were native English speakers with no reported hearing problems.
Materials. Fifty-four pairs of test words were chosen. Eighteen were segmental homophones differing only in their stress pattern (e.g., PERmit–perMIT) while 36 pairs shared only the first syllable and the initial consonant of the second syllable (e.g., CAMpus–camPAIGN). The use of the two categories was intended to create diversity in the test stimuli and increase the number of items. The 54 pairs are shown in Appendix B. To prevent the acoustic properties of the initial syllable from influencing word-initial phoneme monitoring, a common initial syllable was spliced to the second syllable of each pair member. This initial syllable (together with the initial consonant of the second syllable) was recorded separately with a degree of stress as neutral as possible, that is, with emphasis between stressed and unstressed. For example, in order to generate the test pair “PERmit–perMIT,” three stimuli were recorded: PERmit, perMIT, and PERM (small capitalization indicates neutral stress). Then, PER and per were spliced out of PERmit and perMIT, respectively, and replaced with PER from PERM. The result is a pair of words whose target-bearing syllable is acoustically identical but whose stress pattern is perceived as either strong–weak (e.g., PERmit) or weak–strong (e.g., PERMIT).
Three hundred six bisyllabic filler words were chosen to minimize disparity among the stimuli with regard to presence/absence of the target, position of the target, pairwise similarity, and so on. They were all consonant-initial. Half of them were strong–weak and half weak–strong. Stress pattern was balanced in all of the following breakdowns. To compensate for the presentation of segmentally identical (e.g., PERmit–perMIT) or similar (e.g., CAMpus–camPAIGN) test words, 54 fillers were presented twice (see below for procedural details regarding the presentation of the stimuli). 
Eighteen of them contained the target phoneme at word onset in one presentation and did not contain it in the other presentation, 18 fillers contained the target phoneme at second-syllable onset in one presentation and did not contain it in the other presentation, and 18 did not contain the target phoneme in either presentation. Thirty-six fillers originated from 18 pairs similar to the CAMpus–camPAIGN type. The same splicing algorithm was applied to these fillers. All of them were target-absent. Among the remaining 216 fillers (all presented only once), 126 were target-absent and 90 contained the target phoneme at second-syllable onset. Once repetitions are taken into account, there were 108 test trials (54 pairs) and 360 filler trials, for a total of 468 trials presented to each participant. Main breakdowns are as follows: 234 strong–weak, 234 weak–strong; 234 target-present, 234 target-absent; 126 word-initial targets, 108 second-syllable onset targets. The target phonemes to monitor showed the same distribution among target-present and target-absent trials. Moreover, within each category, each target phoneme was monitored in an equal number of strong–weak and weak–strong words.
Apparatus and procedure. Stimuli were recorded by the same speaker as in Experiment 1. They were low-pass filtered at 4.8 kHz, digitized (12-bit A/D) at 10 kHz, and stored on the disk of a Pentium 133-MHz computer. On output, the stimuli were converted to analog form (16-bit D/A, 16 kHz) and delivered over headphones at an average 72 dB SPL. Participants were tested individually in two sessions (234 trials in each) separated by at least 24 h. The two-session design was intended to attenuate any priming effects from one member of a test pair (e.g., PERmit) to the other (e.g., perMIT). The trials were distributed among the two sessions such that the two members of a pair (test or foil) were never played in the same session. Half the test words of a session were strong–weak and the other half were weak–strong. Stimuli were presented in one of three random orders for each session. Session order was balanced across participants. 
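The trial counts given under Materials can be checked with a little bookkeeping. The sketch below simply tallies the categories described in the text; the category labels and the `count` helper are illustrative names, not part of the original design files.

```python
# Each entry: (description, number of trials, target status), where target
# status is "initial" (word onset), "second" (second-syllable onset), or "absent".
trials = [
    ("test words, 54 pairs", 108, "initial"),
    ("repeated fillers, word-onset target presentation", 18, "initial"),
    ("repeated fillers, their target-absent presentation", 18, "absent"),
    ("repeated fillers, second-syllable target presentation", 18, "second"),
    ("repeated fillers, their target-absent presentation", 18, "absent"),
    ("repeated fillers, target-absent in both presentations", 36, "absent"),
    ("paired fillers, 18 pairs", 36, "absent"),
    ("single fillers, target-absent", 126, "absent"),
    ("single fillers, second-syllable target", 90, "second"),
]


def count(status):
    """Total number of trials with the given target status."""
    return sum(n for _, n, s in trials if s == status)


total_trials = sum(n for _, n, _ in trials)
filler_trials = total_trials - 108
target_present = count("initial") + count("second")

print(total_trials, filler_trials, target_present, count("absent"))
```

The tallies match the reported breakdown: 468 trials in all, 360 of them fillers, split evenly between target-present and target-absent trials.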
The first session was preceded by a practice phase consisting of an assortment of 20 trials chosen among the foils (10 target-present, 10 target-absent). Participants faced a monitor and wore a pair of headphones. They were told that, on each trial, they would see a capital letter in the
center of the monitor and then hear a word played over the headphones. They were instructed to push a button labeled “yes” as soon as they heard the sound corresponding to the letter on the monitor. If they did not hear the sound, they were to push a button labeled “no,” located to the left of the “yes” button. They were encouraged to respond as quickly as possible. On each trial, the target phoneme was presented on the monitor for 1.2 s. Then, there was an interval of 750 ms until the onset of the auditory word. Participants had 2 s, measured from word onset, to give their response. An interval of 1.5 s following the participant’s response or the end of the 2-s response window was allowed before the next letter was displayed on the monitor.
Results and Discussion
Reaction times were measured from the onset of the target phonemes. Reaction times shorter than 100 ms or longer than 1500 ms were discarded from the analyses, which accounted for less than 6.2% of the data. Mean RTs and percentages of correct responses were computed by subjects and by items for strong–weak and weak–strong test words. Results were also broken down into “mirror” pairs (e.g., PERmit–perMIT) and “matched” pairs (e.g., CAMpus–camPAIGN) to verify that stress effects were not affected by the type of materials. Reaction times can be seen in Fig. 2. An ANOVA on the RT data, examining Stress Pattern (strong–weak vs weak–strong) and Type of Pair (mirror vs matched), revealed a significant effect of Stress Pattern (26 ms), F1(1,20) = 17.63, p < .001; F2(1,52) = 12.58, p < .001, and no effect of Type of Pair, F1(1,20) = 1.05, p > .30; F2(1,52) = 1.23, p > .20, or interaction between Stress Pattern and Type of Pair, F1(1,20) < 1; F2(1,52) < 1. Participants were slower to detect a word-initial consonant in a word with late stress than in a word with initial stress. The type of pair (mirror or matched) did not have any influence on the stress effect. 
Given that the target-bearing syllables in strong–weak and weak–strong words were acoustically identical, this result
FIG. 2. Mean RTs (ms) to word-initial consonants as a function of word stress pattern and type of experimental pairs. Initial syllables are acoustically identical in the strong–weak and weak–strong words of each type of pair (Experiment 2).
constitutes strong evidence for stress-based retroactive processing: Late stress delays the processing of early segments even when salience and position of the target are neutralized. The percentages of correct responses showed comparable trends even though they were near ceiling level (Mirror: SW = 99%, WS = 97%; Matched: SW = 97%, WS = 96%). ANOVAs suggested that mirror items were responded to more accurately than matched items, F1(1,20) = 5.91, p < .05; F2(1,52) = 1.86, p > .10. The stress difference and the Stress by Type of Pair interaction were not significant [F1(1,20) = 2.83, p = .11; F2(1,52) = 1.40, p = .24, and F1(1,20) = 2.46, p = .13; F2(1,52) = 1.19, p = .28, respectively].
It is conceivable that the monitoring delay introduced by late stress could result from some sort of backward masking based on the higher amplitude, frequency, and/or duration of the late strong syllable, independent of lexical access differences per se. To test this possibility, we first measured the amplitude, F0, and duration of the second syllable in the 54 strong–weak words and their matched 54 weak–strong
counterparts. As expected (e.g., Lehiste, 1970), the three variables differed significantly across strong–weak and weak–strong words, with the second syllable of the former having lower amplitude and F0 and shorter duration than that of the latter [65 dB vs 76 dB, t(53) = -8.83, p < .001; 108 Hz vs 136 Hz, t(53) = -14.02, p < .001; 336 ms vs 490 ms, t(53) = -11.11, p < .001]. If acoustic masking were responsible for the stress effect found on phoneme monitoring, there should be a correlation between the size of the RT stress effect (Stress Dif) for a given test pair and the size of acoustic differences (Amplitude Dif, F0 Dif, Duration Dif) for that pair. Yet, none of the individual correlations between stress effect size and acoustic differences approached significance: r (Stress Dif-Amplitude Dif) = .11, t(52) = .82, p = .42; r (Stress Dif-F0 Dif) = -.04, t(52) = -.26, p = .80; r (Stress Dif-Duration Dif) = -.11, t(52) = -.77, p = .44. We then performed a simultaneous multiple regression with Stress Dif as the dependent variable (DV) and Amplitude Dif, F0 Dif, and Duration Dif as the independent variables (IVs). The multiple correlation coefficient R (.17) was not significant, F(3,50) < 1, and none of the three IVs, considered uniquely, accounted for a significant percentage of variance in the stress effect size [sr² of Amplitude Dif, F0 Dif, and Duration Dif was .02 (ns), .00 (ns), and .01 (ns), respectively]. Altogether, only 3% of the variability in the RT stress effect can be predicted by the amplitude, F0, and duration differences in the second syllable of strong–weak and weak–strong stimuli. This regression model indicates that differential backward masking per se cannot account for the robust stress effect obtained on word-initial phoneme monitoring RTs. Instead, longer latencies in weak–strong words are better explained by a mechanism in which late lexical stress initiates a retroactive/corrective type of processing affecting segments that occur earlier in time. 
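The test statistics quoted for the correlations and the regression follow from standard formulas, which the sketch below applies to the rounded coefficients given in the text. The function names are illustrative; because the inputs are rounded to two decimals, the recomputed values can differ slightly from the reported ones.

```python
import math


def t_from_r(r, n):
    """t statistic (df = n - 2) for testing a Pearson correlation against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def f_from_multiple_r(big_r, n, k):
    """F statistic (df = k, n - k - 1) for a multiple correlation R with k predictors."""
    r2 = big_r * big_r
    return (r2 / k) / ((1 - r2) / (n - k - 1))


n_pairs = 54  # one stress-effect score per test pair

t_amplitude = t_from_r(0.11, n_pairs)   # reported as t(52) = .82
t_f0 = t_from_r(-0.04, n_pairs)         # reported as t(52) = -.26
t_duration = t_from_r(-0.11, n_pairs)   # reported as t(52) = -.77

r_squared = 0.17 ** 2                   # proportion of variance explained
f_regression = f_from_multiple_r(0.17, n_pairs, 3)

print(round(t_amplitude, 2), round(t_f0, 2), round(t_duration, 2))
print(round(r_squared, 3), round(f_regression, 2))
```

With R = .17, the squared coefficient is about .03, which is the 3% of variability mentioned above, and the corresponding F(3,50) falls below 1.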
EXPERIMENT 3
In the first two experiments, we predicted (and found) longer reaction times for non-initial-stress words, due to the extra time necessary to incorporate late information into the encoding of earlier segments. Experiment 3 tests another logical consequence of the retroactive processing hypothesis: processing non-initial-stress words should require greater use of phonetic memory capacity than does processing initial-stress words. A stress-based account of lexical access implies that the information preceding the stressed syllable of a non-initial-stress word (or the lexical candidates activated by this information) must be kept available in a memory store until lexical access based on the late stressed syllable is accomplished (see Mattys, 1997, for more details on the role of memory in speech segmentation). Information could be stored in a very short-lived acoustic buffer (e.g., Efron, 1970; Huggins, 1975; Massaro, 1972; Nooteboom, 1979) or in a more perceptual-auditory memory store (e.g., Baddeley, 1992; Garman, 1990; Miller, 1956; Simon, 1974). If non-initial-stress words have higher memory needs than initial-stress words, then a concurrent task requiring phonetic memory should be performed more poorly during the processing of non-initial-stress than initial-stress words because the mnemonic storage required by non-initial-stress words would consume more of the resources needed to perform the memory task.
To test this prediction, we devised a dual-task experiment in which participants were asked to memorize two visually presented monosyllabic pseudowords while simultaneously performing a syllable detection task on initial-stress words (e.g., /nə/ in “GEnerator”) and non-initial-stress words (e.g., /nə/ in “panoRAma”). We assessed the extent to which memory resources were utilized during the processing of the two types of words through a concurrent rhyme-judgment task. Rhyme judgment was chosen because of its phonetic nature and its documented ability to interact with speech encoding (Samuel & Kat, 1998). Pseudoword retention was measured by presenting a visual probe pseudoword after the syllable detection was made. Participants had to judge if the probe rhymed with either of the to-be-remembered pseudowords. 
If non-initial-stress words require memory storage to handle the demands of retroactive processing, rhyme recognition should fail more often after non-initial-stress words than after initial-stress words. This would be so because the mnemonic resources necessary to
identify a rhyme would have been tapped to a greater extent in decoding non-initial-stress words than initial-stress words.
Unlike the stimuli used in Experiments 1 and 2, the auditory stimuli in Experiment 3 were quadrisyllables. The use of longer words allowed us to increase the amount of phonetic information to retain for retroactive processing and hence to maximize the probability of observing reliable memory effects. Using longer words also allowed us to examine retroactive effects initiated by primary stress (e.g., the third syllable of “panorama”) as opposed to secondary stress (e.g., the third syllable of “generator”), a finer stress difference than strong versus weak (Mattys, 2000; Mattys & Samuel, 1997; Vroomen & de Gelder, 1997).
Method
Participants. Thirty-five undergraduate students of the State University of New York at Stony Brook received payment or course credit for their participation in the experiment. All were native English speakers with no reported hearing problems.
Materials for the syllable-detection task. Seventeen initial-primary-stress words were chosen. Sixteen were four syllables long and one was five syllables long (for the sake of simplicity, they are all referred to as “quadrisyllables” hereafter). Each word was matched with a word bearing primary stress on the third syllable, with the matching based on the identity of the second syllable and the phonetic category of the immediately following consonant. For example, “panorama” was paired with “generator” because the two words share the second syllable /nə/ and the following consonant /r/ while displaying opposed stress patterns. All of the 17 third-syllable-stress words were four syllables long. The 34 test words are listed in Appendix C. The mean frequency of the initial-stress words was 4.4 and the median was 2, whereas, in the set of third-syllable-stress words, these figures were 7.1 and 2, respectively (Kučera & Francis, 1967). 
A t test performed on the means did not reveal any significant frequency difference, t(33) = -.77, p = .45. Similarly, the uniqueness point did not statistically differ in the two sets, UP1 = 4.1, UP2 = 4.3, t(33) = -.43, p = .67. The second syllables, which were copied from each test word to serve as targets, were, on average, 110 and 104 ms long in the two sets, t(33) = 1.19, p = .25. Their mean onset times in the words were 192 and 200 ms, t(33) = -.50, p = .62.
Although initial-stress words are generally more common in English than non-initial-stress words are (Cutler & Carter, 1987), this is not true for longer words. Words with primary stress on the initial syllable and secondary stress on the third syllable (e.g., “generator”) constitute only 21% of four-syllable words (19% when words are frequency-weighted) versus 34% (33% frequency-weighted) for words with primary stress on the third syllable and secondary stress on the first syllable (e.g., “panorama”). Thus, any processing disadvantages associated with third-syllable-stress words, relative to initial-stress words, cannot be accounted for by any peculiarity of the stress pattern of the former; for words of this length, such a stress pattern constitutes the norm rather than the exception.
In order to raise the sensitivity of the dual-task methodology, it is desirable to increase the processing load during the detection task. To accomplish this, all test words contained a 200-ms pause after the second syllable (the target syllable) and had their third and fourth syllables masked with noise. The masking noise used in this experiment was signal-correlated, that is, its amplitude varied as a function of the amplitude of the signal, a feature that permits proportionally equal masking of differently stressed syllables. The signal-to-noise ratio was chosen on the basis of pilot work to be intense enough to provide some interference, but not so intense as to mask the speech entirely. Pilot testing revealed that both masked initial-stress and non-initial-stress stimuli were still intelligible. 
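The text does not say how the signal-correlated noise was generated. One widely used construction, shown here purely as an assumption, keeps the magnitude of every speech sample and randomizes its polarity, so that the noise envelope follows the speech envelope sample by sample. The function names, the toy waveform, and the 0-dB signal-to-noise ratio are all hypothetical choices for illustration.

```python
import math
import random


def signal_correlated_noise(samples, rng):
    """Noise that tracks the input envelope: each sample keeps its
    absolute value but receives a random polarity."""
    return [s if rng.random() < 0.5 else -s for s in samples]


def mix_at_snr(samples, noise, snr_db):
    """Add noise to the signal. Because the sign-flipped noise has the same
    power as the signal, scaling it by 10 ** (-snr_db / 20) yields the
    requested signal-to-noise ratio in dB."""
    gain = 10 ** (-snr_db / 20)
    return [s + gain * n for s, n in zip(samples, noise)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Toy stand-in for a strong syllable followed by a weak one:
# a 200-Hz tone at 10 kHz sampling, loud for 500 samples, then quiet.
speech = [
    amplitude * math.sin(2 * math.pi * 200 * i / 10000)
    for amplitude in (1.0, 0.2)
    for i in range(500)
]

noise = signal_correlated_noise(speech, rng)
masked = mix_at_snr(speech, noise, snr_db=0.0)  # hypothetical SNR value
```

Because the noise inherits the level of each stretch of speech, the loud and quiet portions are masked in the same proportion, which is the property the authors cite for using this kind of masker.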
In addition to the 34 test words, 22 initial-stress, 36 second-syllable-stress, and 22 third-syllable-stress filler words were included in the design. All of the fillers were four syllables long. In the initial- and third-syllable-stress sets of fillers, the target syllable was the first syllable
TABLE 3
Schematic Illustration of an Experimental Sequence in Experiment 3 a

        2 visual pseudowords  ➩  auditory syllable played twice  ➩  auditory word  ➩  visual pseudoword
e.g.:   VONG  TREANE             “nə,” “nə”                          “generator”        FLEEN
                                                                      ↑ (D)              ↑ (M)

a The thick horizontal arrows indicate the unfolding of time and the thin vertical arrows indicate the approximate moment of the participants’ responses in the syllable-detection task (D) and the memory task (M).
of six words, the second syllable of four words, the third syllable of four words, and the fourth syllable of six words. In the second-syllable-stress fillers, the target syllable occurred nine times in each of the four positions of the words. This breakdown was intended to create a fairly even distribution of the stress pattern and the target position across the stimuli of the experiment. The above proportions were also conditioned by the need to rotate the location of the delay among the three possible positions in the words.
The 114 target-present stimuli (test words and fillers) were pooled with 114 four-syllable-long target-absent stimuli. These words followed the same stress pattern and delay/noise distribution as did the target-present words. Each target syllable in these stimuli originated from one of the other 113 target-absent words. The matching between targets and stimuli followed a pseudorandom assignment; each target was chosen to be phonetically very distinct from the syllables of its paired stimulus. Thirty-six practice trials were built that shared the same characteristics and distributions as the stimuli in the main session.
In the 34 test words, the interruption was inserted between the second and third syllables, that is, immediately after the target syllable. The filler stimuli contained the delay in various positions. Among the 22 initial-stress fillers, eight contained the delay after the first syllable, six after the second syllable, and eight after the third syllable. Among the 36 second-syllable-stress fillers, 12 contained the delay after the first syllable, 12 after the second syllable, and 12 after the third syllable. The 22 third-syllable-stress words had the same distribution as the
initial-stress words. The 114 target-absent stimuli included an identical breakdown. In the practice trials, the position of the delay was distributed equally.
Materials for the memory task. The visual stimuli used for the memory task were monosyllabic pseudowords chosen in triplets. Following the structure of the syllable detection task, there were 228 triplets in the main block and 36 triplets in the practice block. The first two pseudowords of a triplet constituted the pair to memorize. They were presented on a computer monitor side by side. The third pseudoword was the test stimulus. In half the triplets, the test stimulus was chosen to rhyme with one of the members of the associated pair (rhyme-present trials). On half of the rhyme-present trials, the rhyming member was the left item on the monitor. On the other half, it was the right item. Most rhyming pseudowords were selected with different orthographic representations (e.g., TREANE vs FLEEN) to force a phonetic coding. In rhyme-absent trials, the test stimulus did not rhyme with either member of the pair.
Design. A schematic representation of a trial can be seen in Table 3. The syllable-detection task (i.e., the presentation of an auditory target syllable, played twice, followed by the presentation of an auditory quadrisyllable) was embedded in the two-step memory task. Each one of the 228 trials contained the following sequence (D indicates a component of the syllable detection task; M indicates a component of the memory task): visual presentation of two monosyllabic pseudowords (M), auditory presentation of a target syllable (D), auditory presentation of a test word (D), and visual
presentation of a monosyllabic test pseudoword (M). A given pair of visual pseudowords was always associated with the same visual test pseudoword (e.g., VONG and TREANE were always presented with FLEEN) and a given auditory target was always followed by its associated auditory test word. However, the 228 visual pseudoword triplets and the 228 sets of auditory stimuli were randomized separately for each participant (or pair of participants; see below). The two tasks were thus totally independent.
Apparatus and procedure. All the auditory stimuli were recorded by the same speaker and with the same apparatus as in Experiment 1. Participants were tested individually or in groups of two in a soundproof chamber. Each was seated in front of a monitor and wore a pair of headphones. They were told that, on each trial, they would have to memorize a pair of pseudowords presented on the monitor (the pseudowords were centered and 1.5 cm apart) and keep them in their memory until the end of the trial. After presentation of the two pseudowords, the participants heard a syllable, played twice to ensure proper encoding, followed by an auditory word. They were instructed to use two keys, one labeled “yes” and the other labeled “no,” located side by side on a response board, to indicate whether the auditory target syllable was in the word. They were told that the syllable could occur anywhere in the word. They were warned that the auditory words would include a short delay and be partially masked with noise but encouraged to try not to be distracted by the distortion and to perform the syllable-detection task as well as they could. Finally, a visual test pseudoword appeared on the monitor. Participants were asked to use the same keys as before to indicate whether the test pseudoword rhymed with either one of the pseudowords presented at the beginning of the trial. Written instructions included a summary table similar to Table 3. Speed and accuracy were encouraged in both tasks. 
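The separate randomization of the visual triplets and the auditory sets described under Design can be sketched as two independent shuffles followed by position-wise pairing. This is only one plausible way to implement it; the function name, the placeholder stimulus labels, and the seed are hypothetical.

```python
import random


def build_trials(visual_triplets, auditory_sets, rng):
    """Shuffle the two stimulus lists independently, then pair them by
    position, so that which triplet meets which auditory set is arbitrary."""
    visual = list(visual_triplets)
    auditory = list(auditory_sets)
    rng.shuffle(visual)
    rng.shuffle(auditory)
    return list(zip(visual, auditory))


# Placeholder stimuli: each label stands for a fixed triplet (two pseudowords
# plus their test pseudoword) or a fixed target-syllable/test-word set.
visual_triplets = [f"triplet_{i}" for i in range(228)]
auditory_sets = [f"auditory_set_{i}" for i in range(228)]

trials = build_trials(visual_triplets, auditory_sets, random.Random(1))
```

Each triplet and each auditory set still appears exactly once per participant; only their pairing changes from one randomization to the next.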
On each trial, the visual pair of pseudowords was displayed on the monitor for 2.5 s. This was followed by a 750-ms pause, after which the
auditory syllable was played twice (the two copies were separated by a 500-ms interval). Between the offset of the second copy of the syllable and the onset of the auditory test word, there was a pause of 1.5 s. The participants then had 2 s (2.5 s in the practice block), measured from word onset, to press a response key. After their response, or at the end of the response window, a 1-s interval was inserted before the presentation of the visual test pseudoword. The pseudoword remained on the monitor for 2 s. The participants had 2.5 s (3.5 s in the practice block), measured from presentation onset, to press a response key. After their response, or at the end of the response window, there was a 1.5-s intertrial interval.
Results and Discussion
Miss rates and false alarm rates were computed for the memory task. In the syllable-detection task, RTs were measured from the onset of the target syllables in the auditory words. Reaction times shorter than 100 ms or longer than 1500 ms were discarded from the analyses, which represented 1.5% of the data. Miss rates and false alarm rates, in the rhyme-judgment task, and mean RTs and percentages of correct responses, in the syllable-detection task, were computed by subjects and by items in the initial-stress and third-syllable-stress conditions.
The central question in Experiment 3 is whether processing non-initial-stress words places an increased strain on short-term phonetic coding of information. The results provide a clear, affirmative answer to this question: listeners failed to identify a rhyme more often after third-syllable-stress words (26%) than after initial-stress words (19%), F1(1,34) = 5.23, p < .03; F2(1,16) = 6.52, p < .03. 
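The dependent measures used here are simple to compute. The following sketch shows the RT trimming window and the two memory-task error rates; the function names and the example values are hypothetical and are not data from the experiment.

```python
def trim_rts(rts_ms, low=100, high=1500):
    """Drop RTs shorter than `low` or longer than `high` (in ms)."""
    return [rt for rt in rts_ms if low <= rt <= high]


def miss_rate(said_yes):
    """Proportion of rhyme-present trials on which the answer was 'no'.
    `said_yes` holds one boolean per rhyme-present trial."""
    return sum(1 for answer in said_yes if not answer) / len(said_yes)


def false_alarm_rate(said_yes):
    """Proportion of rhyme-absent trials on which the answer was 'yes'.
    `said_yes` holds one boolean per rhyme-absent trial."""
    return sum(1 for answer in said_yes if answer) / len(said_yes)


# Hypothetical responses from one participant in one condition.
kept = trim_rts([50, 100, 640, 1500, 1720])
misses = miss_rate([True, True, False, True])
false_alarms = false_alarm_rate([False, False, False, True])
```

Reporting both rates matters because a higher miss rate alone could reflect a more conservative response criterion rather than poorer memory.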
Thus, as predicted by the retroactive hypothesis (i.e., memory demands are higher for non-initial-stress words), participants were significantly impaired in their ability to maintain the phonetic information needed for the rhyme judgment, leading to a loss of memorized information and hence to reduced recognition rates. This difference in recognition memory was not caused by any shift in response criterion: the
false-alarm rate was the same for the initial-stress and the non-initial-stress cases (8% in both conditions), F1 and F2 < 1. Participants did not try to guess that a pseudoword had been presented when they had weaker memory representations.

Although the main goal of Experiment 3 was to compare memory demands in initial-stress and non-initial-stress words, the syllable-detection task, too, can provide information about processing differences between these two types of words. If late-stress words promote retroactive processing, then the detection of a target preceding the primary stress (e.g., detecting /nə/ in “panoRAma”) should be delayed compared to the detection of the same target in an initial-stress word (e.g., /nə/ in “GEnerator”). This prediction was clearly confirmed in Experiment 2, in which a comparison was made between tightly controlled strong–weak versus weak–strong words (e.g., PERmit vs perMIT and CAMpus vs camPAIGN). However, analyses performed on the RTs and error rates in the syllable-detection task of the present experiment did not reveal significant differences between initial-stress and third-syllable-stress words (689 ms vs 688 ms and 10% vs 9%, all Fs < 1).

In considering the results of Experiment 3, it is helpful to regard the syllable-detection task and the memory task as competing for a common memory resource. Such a competition leads to a straightforward prediction: all other things being equal, devoting resources to one task should take them away from the other. In other words, there should be a negative correlation between performance on the two tasks because doing one well should be at the expense of the other. For example, a participant who uses most of the available phonetic memory to maintain the rhyme information should obtain high scores on the rhyme task, but at the cost of a memory shortage in the syllable-detection task.
This shortage should produce a large stress effect on the syllable-detection task because the memory-intensive third-syllable-stress words would not have their memory needs met. Conversely, a participant who devotes fewer resources to the memory task will perform comparatively poorly on this task, but should have more memory available for the syllable-detection task. The availability of memory in the latter should result in a reduced difference between the two types of words in that task.

To test this prediction, we correlated the miss rate on the memory task with the size of the stress-pattern effect on the RTs in the syllable-detection task. As predicted, there was a reliable negative correlation, with low miss rates in the rhyme-judgment task associated with high stress-pattern effects in the syllable-detection task and vice versa, r = −.44, t(33) = −2.79, p < .01. This correlation reflects the trade-off between the memory needs of the two tasks and hence indicates that individuals varied in the priority they gave to each one.

Following the memory-allocation hypothesis, it should also be the case that the listeners who displayed a large stress-pattern effect in one task should have a small stress-pattern effect in the other. Like the association between memory scores and stress-pattern effect in the syllable-detection task, the correlation between the two stress-pattern effects would reflect constraints on memory allocation: allocation of memory resources to one task (no stress-pattern effect) would result in a memory shortage in the other task (stress-pattern effect). The negative correlation between the two stress-pattern effects was indeed significant, r = −.38, t(33) = −2.37, p < .03.

These results suggest that, when faced with two simultaneous, cognitively demanding tasks, participants differ as to which task gets allocated more mnemonic resources. As demonstrated by the correlational results, the outcome of this choice emerges as a disadvantage for non-initial-stress words in the task that has been allocated fewer memory resources. These results constitute a strong indication that non-initial-stress words have greater mnemonic needs than initial-stress words.
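The t values reported with the two correlations follow from the standard conversion t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 = 33 degrees of freedom. The sketch below applies that conversion to the rounded r values. The function name is ours, and n = 35 is inferred from the reported degrees of freedom rather than stated in this section. Because r is rounded to two decimals, the recomputed values agree with the reported t(33) = −2.79 and t(33) = −2.37 only approximately.

```python
import math


def t_from_r(r, n):
    """Convert a Pearson correlation r over n paired observations to a t statistic.

    The statistic has n - 2 degrees of freedom.
    """
    if n <= 2:
        raise ValueError("need more than two observations")
    if abs(r) >= 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# n = 35 is an assumption inferred from df = 33 (not stated explicitly here).
N_PARTICIPANTS = 35

t_memory_vs_stress = t_from_r(-0.44, N_PARTICIPANTS)  # miss rate vs. stress-pattern effect on RTs
t_stress_vs_stress = t_from_r(-0.38, N_PARTICIPANTS)  # stress-pattern effects across the two tasks
```

With the rounded inputs, the two statistics come out near −2.81 and −2.36, within rounding distance of the published values.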
The individual differences may originate from each listener’s understanding of the experiment’s instructions or from the respondent’s inherent memory capacity and its result on memory allocation (for detailed analyses on this point, see Daneman & Carpenter, 1980; Daneman & Green, 1986; Just & Carpenter, 1992). It should also be noted that the asymmetry between the two tasks is not an
isolated finding. Pichora-Fuller (1996) observed that, in case of resource limitations, perception tends to be preserved to the detriment of memory. In her study, impaired audiovisual speech perception affected subsequent word recall but high memory load did not affect perception. Analogously, most of our participants let the processing load hinder memory (rhyme judgment) more than perception (syllable detection). Although the present analyses do not provide an explanation for the choice of one task over the other, they do show that the memory store is tapped more extensively during the processing of non-initial-stress than initial-stress words and that it is large enough to accommodate most of the memory demands of either the syllable-detection task or the rhyme-judgment task, but not large enough to handle both tasks together. The capacity of this store appears to be in the 1- to 2-s range suggested by Baddeley and his colleagues for the “phonological loop” (Baddeley, 1992; Baddeley, Thompson, & Buchanan, 1975).

GENERAL DISCUSSION

Understanding how the human speech processor accesses the representations of spoken words in the lexicon has long been a challenging endeavor for psycholinguists and speech engineers. A growing body of evidence has shown that, in English, the use of a heuristic whereby every stressed syllable is assumed to be the onset of a word may substantially facilitate lexical access. However, this strategy cannot be complete if it theoretically fails to generate correct parsing in a significant number of occurrences (i.e., words beginning with an unstressed syllable). In such cases, additional processing is needed. As recent findings suggest (Luce & Cluff, 1998; Mattys & Samuel, 1997), this additional processing may entail a right-to-left, rather than the traditional left-to-right, time course.
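The stress-based heuristic and its failure case can be illustrated with a toy segmenter. This is a deliberately minimal sketch, not an implementation of any published model: the function names and the (syllable, is_strong) representation are assumptions made for illustration, and graded stress, lexical lookup, and competition between candidates are all ignored.

```python
# Toy illustration of "every strong syllable is a word onset".
# All names and the input representation are hypothetical.


def segment_at_strong(syllables):
    """Group syllables into chunks, opening a new chunk at every strong syllable.

    `syllables` is a list of (text, is_strong) pairs for one stretch of speech.
    """
    chunks = []
    for text, is_strong in syllables:
        if is_strong or not chunks:
            chunks.append([text])      # posit a word onset here
        else:
            chunks[-1].append(text)    # attach weak syllable to the current chunk
    return chunks


def word_internal_onsets(word_syllables):
    """Count the onsets the heuristic posits inside a single word.

    A nonzero count means the heuristic alone mis-segments the word,
    so some form of additional processing would be required.
    """
    return len(segment_at_strong(word_syllables)) - 1


trochee = [("PER", True), ("mit", False)]   # initial stress
iamb = [("per", False), ("MIT", True)]      # non-initial stress
```

For the trochee the sketch returns a single chunk, whereas for the iamb it posits an onset at the strong second syllable and strands the initial weak syllable, which is the situation that calls for the additional processing discussed in the text.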
Starting from a stressed syllable erroneously assumed to be a word’s onset, the system then appears to require further processing, presumably to incorporate information activated earlier, in order to undo the misleading segmentation. The results obtained in this study show that
non-initial-stress words do indeed require additional processing, as suggested by costs in processing time, accuracy, or memory load for these words relative to initial-stress words. In Experiment 1, an analysis of the reaction times to initial phonemes in the first and second syllables of iambs and trochees showed that three factors affect a subject’s performance: the position of the target-bearing syllable, its salience, and the stress pattern of the word. Critically, the detection of target phonemes in non-initial-stress words was found to be delayed compared to phonemes in initial-stress words. This stress-pattern effect is consistent with Gow and Gordon’s (1993) results on syllable detection in stress-contrasted bisyllabic words. In their experiments, participants were instructed to detect a syllable presented visually (e.g., “con” or “flict”) in auditory words like CONflict (trochaic) or conFLICT (iambic). The authors observed that, in addition to a main effect of position (second syllables were detected faster than first ones), the syllables of the trochaic words were both detected faster than the corresponding syllables in the iambic words. Salience had no noticeable effect on the results. Thus, Gow and Gordon’s results and the present ones provide converging evidence that non-initial-stress words require additional processing time.

The possibility that such extra time is a consequence of retroactive processing was confirmed in Experiment 2. Here, participants had to monitor the word-onset phoneme of trochees and iambs that were perfectly matched, segmentally and acoustically, on their first syllable (e.g., PERmit vs perMIT or CAMpus vs camPAIGN). A robust detection latency cost emerged in the words bearing non-initial stress. This retroactive effect was shown to go beyond acoustic phenomena such as backward masking because acoustic differences in the second syllables (amplitude, F0, and duration) accounted for virtually no variance in the latency difference.
Thus, as predicted by stress-based models of lexical access, non-initial-stress words exhibit retroactive processing, a finding consistent with the assumption that stress in noninitial positions activates irrelevant lexical candidates
that will need to be deactivated. Because of the splicing used to construct the stimuli, the retroactive effect found in Experiment 2 can only be attributed to activation differences originating in the second syllable, which implies that retroactivity is observed even when the same cohort of lexical candidates is originally activated by the initial syllable of strong–weak and weak–strong words. Therefore, our results suggest that there is a significant component of retroactive processing which seems to be genuinely driven by late activation rather than by weaker activation on early unstressed syllables.

The literature on word recognition shows that retroactive processing is in fact not uncommon. For example, Connine, Blasko, and Hall (1991) found significant effects of subsequent semantic context on phoneme identification up to at least three syllables after the test phoneme. Similarly, Samuel (1979) observed reliable contextual effects on phoneme restoration within a window of several words after the critical phoneme. Thus, the present results add another type of retroactive processing to this literature, one governed by lexical prosody.

Experiment 3 was an investigation of the difference in memory demands in initial-stress and non-initial-stress words. The rationale was that, if non-initial-stress words involve delayed processing, then they should have higher memory requirements than initial-stress words because of the need to maintain the trace of the prestress information. A direct cost of memory overload on non-initial-stress words appeared clearly in Experiment 3. When participants were asked to memorize two pseudowords to be used in a rhyme-judgment task while performing a separate syllable-detection task, they were more likely to forget the pseudowords after non-initial-stress words than after initial-stress words. This stress-pattern effect on rhyme-judgment performance illustrates the different mnemonic needs of the two word types.
Specifically, the disruptive effect of memory load in the non-initial-stress word condition supports the hypothesis that these words require more temporary storage than initial-stress words, a difference expected if the former require some form of delayed decision. Moreover, this demand for storage competes for the phonetic resources required to make a simultaneous rhyme judgment. Indeed, we found a negative correlation between the stress effect in syllable detection and both the accuracy and the stress-pattern effect in rhyme judgment. Individuals who chose to allocate a high level of memory to the rhyme task and, as a consequence, performed well after both word types, were left with a reduced capacity for the syllable-detection task.

Insight into the mnemonic mechanisms used during word recognition could lead to a significant advance in modeling speech processing. To date, most research on auditory memory has focused on either the phoneme level (e.g., Efron, 1970; Huggins, 1975; Massaro, 1972; Nooteboom, 1979) or the sentence level (e.g., Daneman & Carpenter, 1980; Daneman & Green, 1986; Kintsch & Van Dijk, 1978). Few studies have considered the role of memory in lexical access. The results obtained in the present research constitute evidence that auditory memory plays an important role in word recognition. In particular, they suggest that the high need for memory storage in non-initial-stress words is a by-product of the stress-based heuristic used by English speakers to cope with the preponderance of initial-stress words in their lexicon. The implication that memory allocation during word processing is a language-specific phenomenon offers interesting opportunities for cross-linguistic research. For example, in languages featuring different stress-pattern distributions, the need for delayed processing and hence for temporary storage might be more or less substantial than the one shown for English (Mattys, 1997). Conversely, evidence of different memory use across various languages could provide information about lexical access in those languages.

Experiment 3 also brings some support to a distinction between primary stressed syllables and other syllables.
The memory difference between primary-stress-initial and secondary-stress-initial words is more consistent with a segmentation strategy that takes different degrees of stress into account than with one that gives the same weight to every full syllable
(e.g., MSS). In the latter case, neither stress pattern effects nor correlations should have been found because initial-stress and non-initial-stress words would have been treated identically, with successful lexical access on the initial syllable of both types of words. Thus, Experiment 3 highlights the central role of primary stressed syllables in accessing the lexicon. Nonetheless, these results cannot be taken as a demonstration that lexical access is driven exclusively by primary stressed syllables. They only suggest that primary stressed syllables and secondary stressed syllables generate different needs for delayed processing when they are not word-initial, possibly because they trigger lexical access to different degrees. The delay in processing non-initial-stress words can in theory result from either an independent and discrete corrective process that intervenes after failure of the stress-based strategy or, from a more continuous/interactive viewpoint, additional processing caused by the system’s inability to settle into a stable state because of conflicting parsing interpretations. In the first view, failure to access the lexicon via a stressed syllable would result in a second search using the preceding (unstressed) syllable(s) as a new access code. Thus, in addition to the traditional left-to-right (proactive) processing triggered by stressed syllables, listeners would also use right-to-left (retroactive) processing. This idea is consistent with the architecture of several speech-processing models and with research based on the sequential nature of speech recognition (e.g., Cole & Jakimik, 1980a; Cutler & Norris, 1988; Marslen-Wilson & Welsh, 1978; Marslen-Wilson & Zwitserlood, 1989; Radeau & Morais, 1990). The retroactive “process” hypothesis does not necessarily mean that processing completely stops until the strong syllable is found. 
As suggested above, the probability or strength of lexical activation could be weighted based on the degree of stress and, as a result, the size of retroactive effects would be commensurate with the stress difference between the word-initial syllable and the triggering stressed syllable. For example, graded lexical activation would explain why the size of the stress-pattern effect in Experiment 2 was
smaller in magnitude than the average duration of the initial syllable. Indeed, since the initial syllable was a compromise between strong and weak, it would presumably have generated a certain amount of lexical activation, hence reducing the latency cost induced by retroactive processing.

Within network/activation models, recognition delay originates from the continuous activation of multiple lexical candidates. Here, the bias toward selecting initial-stress interpretations among the activated candidates entails a processing delay in the sense that the resulting mid-word alignment has disruptive repercussions on the candidates generated by the word-initial syllable. However, no distinct retroactive process (or, for that matter, even a “normal” proactive one) would be specified in such models (e.g., McClelland & Elman, 1986; Norris, 1994; Norris, McQueen, & Cutler, 1995). For instance, in Norris et al.’s model, neither the concept of proactive process nor that of retroactive process is theoretically relevant; there only is competition between sequentially activated candidates. However, competition would create an oscillation of the network that would settle after strongly weighted initial-stress and less strongly weighted non-initial-stress candidates yield a solution that best fits the input. Because of the low priority of non-initial-stress candidates in the decision process, words bearing this pattern would involve extra processing and consequently additional processing time.

Thus, although model classes differ with respect to whether a specific retroactive process is needed, or simply some form of additional processing, all stress-based models of lexical access and segmentation predict that non-initial-stress words will incur a penalty due to erroneous lexical activation on the late strong syllable and to the need to reverse the commitment priority in favor of candidates activated earlier in time.
The data obtained in the present study shed some light on this reversal of commitment priority and on the nature of the ancillary strategy postulated by stress-based models to achieve recognition of non-initial-stress words (Cutler, 1989). Each experiment provides evidence that non-initial-stress words present a
more difficult challenge for the speech system than initial-stress words. The results of the experiments collectively would be extremely difficult to account for (let alone predict) without invoking additional, more “expensive” processing that violates the sequential view of lexical access.

There is one theoretical alternative that merits consideration. We have argued that the results reflect the differential processing required for words with noninitial stress. The theoretical alternative is that, rather than reflecting the specific costs we have postulated, the observed differences simply reflect a general disadvantage for relatively uncommon stimuli: because noninitial stress is uncommon, words with this pattern are processed more slowly and less accurately. Although we cannot definitively disprove this alternative, we believe that it is demonstrably inferior to our account on several grounds. For example, for the four-syllable stimuli used in Experiment 3, the “simple frequency” position is at odds with the results because noninitial stress is more common rather than uncommon. This analysis highlights the fundamental problem with the alternative view: because no mechanism is specified, there is no predictive power to the hypothesis; there is no specification of when rarity should hurt performance and when it should not. Consider, for example, another relative rarity: words beginning with a vowel constitute 17% of the English lexicon (based on an analysis of a 65,000-word dictionary), a percentage that is extremely similar to estimates of the frequency of non-initial-stress words. We predict that none of the effects that we have reported here would be found by contrasting vowel-initial and consonant-initial words, whereas the “common-is-better” hypothesis predicts such effects. The latter seems extraordinarily unlikely to be true.

While our theoretical position is not at odds with the alternative view, it attaches mechanisms to the rarity argument. In particular, following Cutler and Norris (1988), Grosjean and Gee (1987), Vroomen and de Gelder (1995, 1997), and others, we base our predictions on an assumption that because of the distributional facts, and because of the importance of metrical structure more generally, particular processing mechanisms have developed. It is these processing mechanisms, rather than frequency per se, that produce the array of data in this study.

Whether the interaction between late information and the processing of earlier segments is related to a discrete retroactive process or to additional processing is unclear at this point. This distinction is primarily a function of the general architecture of the models to which one subscribes. Models devised around a linear notion of time will probably associate nonsequentiality in late-stress words with a specific retroactive process, whereas models devised around parallelism and interactivity will prefer to view retroactivity as additional processing. The results of the current study demonstrate that, whichever model architecture is assumed, non-initial-stress words incur a processing delay due, at least in part, to the interplay between the processing of prestress information and the presence of the stressed syllable later in the word. Our findings concur with Luce and Cluff’s (1998) cross-modal priming data showing that the lexical candidates activated by a late-stressed syllable lead to delayed recognition. In our experiments, we found that degradation of a late-stressed syllable caused the processing of earlier segments to be delayed. This result is perfectly in line with the functional architecture of stress-based models in which, as Luce and Cluff put it, “words with weak initial syllables are not resolved until information from a later occurring strong syllable is taken into account” (p. 485).
The present study shows that such a resolution is not cost-free: It requires time (Experiments 1 and 2) and necessitates temporary storage (Experiment 3).
APPENDIX A

Experimental Lists: Iamb-Based and Trochee-Based Lists (Experiment 1)

Carrier word(s)     Target  List

Iamb-based lists
gazelle             g-z     sauna - gazelle - awkward - profane - depart - dissect - pervade
saga-zealous        g-z     water - basic - dictate - saga - zealous - furry - beyond
consent             k-s     body - enroll - consent - elite - vulgar - propane - weather
falcon-sentiment    k-s     blatant - report - falcon - sentiment - prevent - weapon - protrude
confine             k-f     disease - prevail - guru - confine - illness - precede - vivid
pelican-final       k-f     destiny - basil - despite - bargain - pelican - final - dispute
compete             k-p     debate - insane - barrier - detail - compete - zero - event
talcum-pizza        k-p     cylinder - rifle - salon - talcum - pizza - immune - garden
discrete            d-k     petite - Libra - discrete - insist - repaint - gossip - stagger
goddess-cretin      d-k     tumble - goddess - cretin - reproach - believe - pretense - baguette
domain              d-m     barrack - union - converse - domain - willow - cuisine - police
window-maintenance  d-m     painting - because - pickle - neglect - window - maintenance - triangle
depict              d-p     vulture - depict - shaker - burlesque - embrace - rectangle - function
study-picture       d-p     retain - caution - study - picture - cortex - recline - vacuum
delight             d-l     cashier - cabin - delight - nature - prescription - business - recess
candy-lightning     d-l     tourist - candy - lightning - surpass - repose - trespass - pregnant
debug               d-b     shampoo - reprint - pillow - debug - impress - carcass - raffle
cloudy-buggy        d-b     presume - crucial - sharpen - preempt - cloudy - buggy - recoil
balloon             b-l     credit - surrender - physics - review - balloon - onion - deduct
tuba-lunar          b-l     squeaky - courage - success - tuba - lunar - deport - sarcastic
become              b-k     valley - become - preserve - pretend - mature - menthol - defense
flabby-comfort      b-k     dispose - tension - flabby - comfort - proponent - oppose - master
percent             p-s     ignore - bouquet - percent - affair - journal - curtain - dialect
camper-center       p-s     waiter - camper - center - decline - phantom - awake - blunder
curtail             k-t     gypsy - subsidy - begin - curtail - orange - lapel - succumb
banker-tailor       k-t     grocer - behave - dolphin - syringe - banker - tailor - forbid
pursue              p-s     friendly - follow - acclaim - freedom - pursue - iron - recount
temper-sewer        p-s     invade - direct - kitchen - temper - sewer - gimmick - waiver
confer              k-f     dragon - amount - confer - torment - trendy - gallon - redress
broken-fertile      k-f     metal - debase - broken - fertile - entail - gadget - total
decay               d-k     funny - patient - repulse - decay - verbal - barrier - repeat
bloody-cable        d-k     greasy - bloody - cable - allow - silly - repeal - saloon

Trochee-based lists
bacon               b-k     area - bacon - velvet - provide - forgive - perhaps - platoon
obey-convince       b-k     multiple - fashion - silent - obey - convince - foamy - prepare
garlic              g-l     reason - recruit - garlic - enhance - parish - offend - compass
cigar-liquidity     g-l     passion - create - cigar - liquidity - protect - muffin - crusade
famous              f-m     seduce - lagoon - spatial - famous - carpet - guitar - value
buffet-mosquito     f-m     social - pleasant - cocoon - delta - buffet - mosquito - support
lady                l-d     cocaine - surprise - person - beret - lady - tanker - return
relay-defuse        l-d     fraction - question - tycoon - relay - defuse - retire - bucket
locus               l-k     detach - marry - locus - immense - distress - mustard - perish
below-casino        l-k     buster - below - casino - pronounce - degree - dismiss - mundane
cursor              k-s     mental - painful - tattoo - cursor - vibrant - village - imply
occur-survive       k-s     naval - propel - memory - patrol - occur - survive - party
token               t-k     fizzy - token - salad - revenge - surmise - bombing - fusion
plateau-conceal     t-k     repair - grimace - plateau - conceal - gracious - improve - pantry
pelvic              p-v     esteem - seldom - pelvic - trigger - detest - worry - enough
compel-victorious   p-v     terrace - compel - victorious - distrust - regret - nasty - witchcraft
painter             p-t     reduce - fakir - blossom - painter - relax - garbage - cubic
campaign-terrain    p-t     require - federal - gamble - finesse - campaign - terrain - fondue
basic               b-s     pardon - regard - replica - perfume - basic - Viking - volcano
flambé-sequential   b-s     palate - climate - donate - flambé - sequential - detect - produce
greedy              g-d     culture - greedy - retrieve - explain - request - nectar - prescribe
agree-derail        g-d     sensation - traffic - agree - derail - provoke - promote - furniture
corpus              k-p     bamboo - debris - corpus - today - model - vowel - deviate
liqueur-pastrami    k-p     journey - liqueur - pastrami - between - letter - brigade - mother
table               t-b     cushion - waffle - hurrah - table - omen - perverse - misplace
paté-baloney        t-b     smoking - prolong - nickel - deduce - paté - baloney - perspire
taken               t-k     funnel - wedding - disprove - bureau - taken - finder - embed
sauté-confess       t-k     brochure - disgrace - barrel - sauté - confess - gravel - various
curfew              k-f     gaudy - employ - curfew - trustee - normal - towel - depend
incur-futility      k-f     wallet - robust - incur - futility - depress - treasure - peasant
favor               f-v     local - college - cliché - favor - bonus - gasket - persist
café-verbatim       f-v     dentist - café - verbatim - deplore - closet - grotesque - decrypt
APPENDIX B

Test Words Used in Experiment 2 (Capitalization Denotes Strong Syllables)

Strong–weak words   Weak–strong words
DEfect              deFECT
DEcrease            deCREASE
DEtail              deTAIL
DIgest              diGEST
DIScharge           disCHARGE
DIScount            disCOUNT
PERmit              perMIT
PERvert             perVERT
PROtest             proTEST
REbound             reBOUND
REcall              reCALL
REcess              reCESS
REcoil              reCOIL
REfill              reFILL
REfund              reFUND
REject              reJECT
RElay               reLAY
SUBject             subJECT
BAllot              baLLET
BEAker              beCAUSE
BEAcon              beCOME
BEAgle              beGAN
CAMpus              camPAIGN
CARtridge           carTOON
CURtain             curTAIL
DEcoy               deCAY
DEEper              dePART
DEtour              deTAIN
DIScourse           disCUSS
DISmal              disMAY
DIStant             disTINCT
DIStrict            disTRACT
FRONtal             fronTIER
MISter              misTAKE
MISty               misTREAT
PARking             parQUET
PERson              perCEIVE
PERfect             perFORM
PORter              porTRAY
PREtense            preTEND
PROlog              proLONG
PROton              proTRUDE
RAcket              raCCOON
REcent              reCITE
REtail              reTAIN
REflex              reFINE
RObot               roBUST
ROmance             roMAINE
STAMping            stamPEDE
SUper               suPERB
SURvey              surVIVE
TRAIning            traiNEE
TRANSfer            transFUSE
VERmin              verMOUTH
APPENDIX C

Test Words Used in Experiment 3: The Words of the Two Sets Are Paired on the Identity of the Second Syllable

Initial-stress words   Third-syllable-stress words
cognitively            phonetician
navigator              convocation
prosecutor             consequential
generator              panorama
desperately            desperado
matrimony              detrimental
patronizing            gastronomic
systematize            destination
commentator            segmentation
momentary              lamentation
voluntary              melancholic
constituting           destitution
fascinating            vaccination
dominating             termination
capitalism             competition
calculable             circulation
dormitory              limitation
REFERENCES

Baddeley, A. D. (1992). Working memory. Science, 255, 556–559.
Baddeley, A. D., Thompson, N., & Buchanan, M. (1975). Word length and the structure of short-term memory. Journal of Verbal Learning and Verbal Behavior, 14, 575–589.
Bard, E. G., Shillcock, R. C., & Altmann, G. T. M. (1988). The recognition of words after their acoustic offsets in spontaneous speech: Effects of subsequent context. Perception & Psychophysics, 44, 395–408.
Bond, Z. S., & Garnes, S. (1980). Misperception of fluent speech. In R. Cole (Ed.), Perception and production of fluent speech (pp. 115–132). Hillsdale, NJ: Erlbaum.
Bradley, D. C. (1980). Lexical representation of derivational relation. In M. Aronoff & M.-L. Kean (Eds.), Juncture (pp. 37–55). Saratoga, CA: Anma Libri.
Cluff, M. S., & Luce, P. A. (1990). Similarity neighborhoods of spoken two-syllable words: Retroactive effects on multiple activation. Journal of Experimental Psychology: Human Perception and Performance, 16, 551–563.
Cole, R. A., & Jakimik, J. (1980a). A model of speech perception. In R. Cole (Ed.), Perception and production of fluent speech (pp. 133–163). Hillsdale, NJ: Erlbaum.
Cole, R. A., & Jakimik, J. (1980b). How are syllables used to recognize words? Journal of the Acoustical Society of America, 67, 965–970.
Cole, R. A., Jakimik, J., & Cooper, W. E. (1978). Perceptibility of phonetic features in fluent speech. Journal of the Acoustical Society of America, 64, 44–56.
Connine, C. M., Blasko, D. G., & Hall, M. (1991). Effects of subsequent sentence context in auditory word recognition: Temporal and linguistic constraints. Journal of Memory and Language, 30, 234–251.
Cutler, A. (1976). Phoneme monitoring reaction time as a function of preceding intonation contour. Perception & Psychophysics, 20, 55–60.
Cutler, A. (1986). Forbear is a homophone: Lexical prosody does not constrain lexical access. Language and Speech, 29, 201–220.
Cutler, A. (1989). Auditory lexical access: Where do we start? In W. Marslen-Wilson (Ed.), Lexical representation and process (pp. 342–356). Cambridge, MA: MIT Press.
Cutler, A. (1990). Exploiting prosodic probabilities in speech segmentation. In G. Altmann (Ed.), Cognitive models of speech processing: Psycholinguistic and computational perspectives (pp. 105–121). Cambridge, MA: MIT Press.
Cutler, A., & Butterfield, S. (1992). Rhythmic cues to speech segmentation: Evidence from juncture misperception. Journal of Memory and Language, 31, 218–236.
Cutler, A., & Carter, D. M. (1987). The predominance of stressed initial syllables in the English vocabulary. Computer Speech and Language, 2, 133–142.
Cutler, A., & Foss, D. J. (1977). On the role of sentence stress in sentence processing. Language and Speech, 20, 1–10.
Cutler, A., & Norris, D. G. (1988). The role of stressed syllables in segmentation for lexical access. Journal of Experimental Psychology: Human Perception and Performance, 14, 113–121.
Daneman, M., & Carpenter, P. A. (1980). Individual differences in working memory and reading. Journal of Verbal Learning and Verbal Behavior, 19, 450–466.
Daneman, M., & Green, I. (1986). Individual differences in comprehending and producing words in context. Journal of Memory and Language, 25, 1–18.
Efron, R. (1970). Effects of stimulus duration on perceptual onset and offset latencies. Perception & Psychophysics, 8, 231–234.
Eimas, P. D., Marcovitz Hornstein, S. B., & Payton, P. (1990). Attention and the role of dual codes in phoneme monitoring. Journal of Memory and Language, 29, 160–180.
Fear, B. D., Cutler, A., & Butterfield, S. (1995). The strong/weak syllable distinction in English. Journal of the Acoustical Society of America, 97, 1893–1904.
Frauenfelder, U. H., & Segui, J. (1989). Phoneme monitoring and lexical processing: Evidence of associative context effects. Memory & Cognition, 17, 134–140.
Frauenfelder, U. H., Segui, J., & Dijkstra, T. (1990). Lexical effects in phonemic processing: Facilitatory or inhibitory? Journal of Experimental Psychology: Human Perception and Performance, 16, 77–91.
Garman, M. (1990). Psycholinguistics. Cambridge: Cambridge Univ. Press.
Gow, D. W., & Gordon, P. C. (1993). Coming to terms with stress: Effects of stress location in sentence processing. Journal of Psycholinguistic Research, 22, 545–578.
Grosjean, F. (1985). The recognition of words after their acoustic offset: Evidence and implications. Perception & Psychophysics, 38, 299–310.
Grosjean, F., & Gee, J. P. (1987). Prosodic structure and spoken-word recognition. Cognition, 25, 135–155.
Huggins, A. W. F. (1975). Temporally segmented speech and “echoic” storage. In A. Cohen & S. G. Nooteboom (Eds.), Structure and process in speech perception (pp. 209–225). New York: Springer-Verlag.
Just, M. A., & Carpenter, P. A. (1992). A capacity theory of comprehension: Individual differences in working memory. Psychological Review, 99, 122–149.
Kintsch, W., & Van Dijk, T. A. (1978). Toward a model of text comprehension and production. Psychological Review, 85, 363–394.
Kučera, H., & Francis, W. (1967). Computational analysis of present-day American English. Providence, RI: Brown Univ. Press.
Lehiste, I. (1970). Suprasegmentals. Cambridge, MA: MIT Press.
Lieberman, P. (1965). On the acoustic basis of perception of stress by linguists. Word, 21, 40–54.
Luce, P. A., & Cluff, M. S. (1998). Delayed commitment in spoken-word recognition: Evidence from cross-modal priming. Perception & Psychophysics, 60, 484–490.
Marslen-Wilson, W. D., & Welsh, A. (1978). Processing interactions and lexical access during word recognition in continuous speech. Cognitive Psychology, 10, 29–63.
Marslen-Wilson, W. D., & Zwitserlood, P. (1989). Accessing spoken words: The importance of word onsets. Journal of Experimental Psychology: Human Perception and Performance, 15, 576–585.
Massaro, D. W. (1972). Stimulus information vs processing time in auditory pattern recognition. Perception & Psychophysics, 12, 50–56.
Mattys, S. L. (1997). The use of time during lexical processing and segmentation: A review. Psychonomic Bulletin & Review, 4, 310–329.
Mattys, S. L. (2000). The perception of primary and secondary stress in English. Perception & Psychophysics, 62, 000–000.
Mattys, S. L., & Samuel, A. G. (1997). How lexical stress affects speech segmentation and interactivity: Evidence from the migration paradigm. Journal of Memory and Language, 36, 87–116.
Mattys, S. L., Jusczyk, P. W., Luce, P. A., & Morgan, J. L. (1999). Phonotactic and prosodic effects on word segmentation in infants. Cognitive Psychology, 38, 465–494.
McClelland, J. L., & Elman, J. L. (1986). The TRACE model of speech perception. Cognitive Psychology, 18, 1–86.
Mehta, G., & Cutler, A. (1988). Detection of target phonemes in spontaneous speech and read speech. Language and Speech, 31, 135–156.
Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81–97.
Morgan, J. L. (1996). A rhythmic bias in preverbal speech segmentation. Journal of Memory and Language, 35, 666–688.
Nakatani, L. H., & Schaffer, J. A. (1978). Hearing “words” without words: Prosodic cues for word perception. Journal of the Acoustical Society of America, 63, 234–245.
Nooteboom, S. G. (1979). The time course of speech perception. In W. J. Barry & K. J. Kohler (Eds.), “Time” in the production and perception of speech. Arbeitsberichte, 12, Institut für Phonetik, University of Kiel.
Nooteboom, S. G., Brokx, J. P. L., & de Rooij, J. J. (1978). Contribution of prosody to speech perception. In W. J. M. Levelt & G. B. Flores d’Arcais (Eds.), Studies in the perception of language (pp. 75–107). New York: Wiley.
Norris, D. G. (1994). Shortlist: A connectionist model of continuous speech recognition. Cognition, 52, 189–234.
Norris, D. G., McQueen, J. M., & Cutler, A. (1995). Competition and segmentation in spoken-word recognition. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21, 1209–1228.
Pichora-Fuller, K. M. (1996). Working memory and speechreading. In D. G. Stork & M. E. Hennecke (Eds.), Speechreading by humans and machines (pp. 257–274). NATO ASI series.
Pitt, M. A., & Samuel, A. G. (1990). Attentional allocation during speech perception: How fine is the focus? Journal of Memory and Language, 29, 611–632.
Pitt, M. A., & Samuel, A. G. (1995). Lexical and sublexical feedback in auditory word recognition. Cognitive Psychology, 29, 149–188.
Radeau, M., & Morais, J. (1990). The uniqueness point effect in the shadowing of spoken words. Speech Communication, 9, 155–164.
Samuel, A. G. (1979). Speech is specialized, not special. Doctoral dissertation, University of California, San Diego, 1979, Dissertation Abstracts International, 40–08B.
Samuel, A. G., & Kat, D. (1998). Adaptation is automatic. Perception & Psychophysics, 60, 503–510.
Sherman, B. L. (1971). Phonemic restoration: An insight into the mechanisms of speech perception. Unpublished master’s thesis, University of Wisconsin–Milwaukee.
Shields, J. L., McHugh, A., & Martin, J. G. (1974). Reaction time to phoneme targets as a function of rhythmic cues in continuous speech. Journal of Experimental Psychology, 102, 250–255.
Shillcock, R. (1990). Lexical hypotheses in continuous speech. In G. T. M. Altmann (Ed.), Cognitive models of speech processing: Psycholinguistic and computational perspectives (pp. 24–49). Cambridge, MA: MIT Press.
Simon, H. A. (1974). How big is a chunk? Science, 183, 482–488.
Vroomen, J., & de Gelder, B. (1995). Metrical segmentation and lexical inhibition in spoken-word recognition. Journal of Experimental Psychology: Human Perception and Performance, 21, 98–108.
Vroomen, J., & de Gelder, B. (1997). Trochaic rhythm in speech segmentation. Paper presented at the 38th Meeting of the Psychonomic Society, Philadelphia, PA.
Vroomen, J., van Zon, M., & de Gelder, B. (1996). Cues to speech segmentation: Evidence from juncture misperceptions and word spotting. Memory & Cognition, 24, 744–755.
Warren, R. M., & Sherman, B. L. (1974). Phonemic restoration based on subsequent context. Perception & Psychophysics, 16, 150–156.
(Received July 26, 1999)
(Revision received October 12, 1999)