Journal of Phonetics (1995) 23, 487 – 499
Letter to the Editor
Final accent vs. no accent: utterance-final neutralization in Tokyo Japanese
Timothy J. Vance
Department of Japanese, Connecticut College, New London, CT 06320, U.S.A.
Received 2nd August 1993, and in revised form 25th May 1995
The distinction in standard Tokyo Japanese between final accent and no accent is ordinarily said to be neutralized utterance-finally. Measurements of f0 in tokens produced by four speakers participating in a preliminary experiment indicate that one of the speakers makes a distinction in production. The results of a follow-up experiment show that this speaker is also capable of perceiving a distinction in her own forms with significantly greater than chance accuracy. This speaker’s performance may reflect an individual difference or a dialect difference. This work complements the pioneering research by Sugitô (1982), which remains the standard reference on this topic.
© 1995 Academic Press Limited
1. Introduction
The accent patterns on short phrases in Tokyo Japanese are traditionally described as sequences of high and low pitches, one pitch assigned to each mora (Hirayama, 1960; McCawley, 1977).¹ For nouns with a short final syllable, the difference between final accent and no accent is typically manifested on a following grammatical particle. For example, the pitch pattern on /haná + ga/ “flower + NOMINATIVE” is described as LHL, whereas that on /hana + ga/ “nose + NOMINATIVE” is described as LHH. The distinction is said to be neutralized utterance-finally, both /haná/ and /hana/ being LH in isolation.
¹ A short syllable contains one mora and a long syllable contains two moras. A brief review of older descriptions of the difference between final accent and no accent is found in Sugitô (1982). For a recent and very different view of Japanese accent patterns, see Pierrehumbert & Beckman (1988).
This claim of neutralization, however, has not gone unchallenged. Uwano (1977: 289) says that the pitch patterns on pairs like /haná/ and /hana/ are not identical for all speakers on all occasions, and he suggests that an accented final syllable may differ from an unaccented one by having a higher pitch or a falling contour. Neustupný (1978: 58–73) claims that the distinction is neither clearly maintained nor entirely neutralized, and although he proposes that it is realized acoustically by some inconsistent set of interacting features, he explicitly mentions only pitch and intensity as possibilities. Neustupný’s own experiment was limited to a single subject, and Mathias (1980) interprets the results as showing only that listeners tend to
identify all isolated tokens as accented, but the intriguing suggestion of a partially maintained contrast merits consideration.
Measurements of short sentences involving comparable word pairs have shown that if /haná + ga/ and /hana + ga/ appear in identical contexts, the f0 on /ná/ will generally be higher than the f0 on /na/ (Sugitô, 1982; Poser, 1984; Pierrehumbert & Beckman, 1988; Kubozono, 1993).² In view of the evidence for partially maintained distinctions in other putative cases of neutralization (e.g., Fox & Terbeek, 1977; Dinnsen & Charles-Luce, 1984; Charles-Luce, 1985; Slowiaczek & Dinnsen, 1985; Charles-Luce & Dinnsen, 1987; Port & Crawford, 1989), it is only natural to ask whether a similar difference in f0 might also be found in isolated words between final (or only) syllables that are accented and those that are unaccented. Sugitô (1968) found no such difference in measurements from a single speaker, but in a later experiment with 14 speakers (Sugitô, 1982) she found exactly this kind of difference in tokens of /haná/ and /hana/ produced by three of those speakers.
Keating (1984) suggests that phonological theory must allow for measurable but inaudible differences in positions of putative neutralization. On the other hand, several studies show that native listeners can perceive such partially maintained distinctions with greater than chance accuracy (Port & O’Dell, 1985; Slowiaczek & Szymanska, 1989; Di Paolo & Faber, 1991). Sugitô (1982) carried out an experiment in which stimuli bearing a range of synthetic f0 contours were identified as /haná/ or /hana/, and for some listeners the results indicate this kind of less-than-categorical but greater-than-chance perception for the Japanese final accent vs. no accent distinction.
Two simple experiments are reported here.
The original motivation for this work was simply to provide corroboration for Sugitô’s (1982) findings, but some of the results were unexpected enough to be worth communicating. The first experiment was a crude pilot study designed merely to screen Tokyo speakers in the hope of finding someone capable of distinguishing final accent from no accent in isolation forms, and only a few judgments were elicited from each listener. There is no clear indication in the results that any of the listeners can perceive a difference, but measurements of peak fundamental frequency (f0) in some of the recorded tokens indicate that one of the participating speakers makes a distinction in production. The second experiment was a more careful follow-up designed to test whether that speaker and one of the others could distinguish final accent from no accent when listening to their own and to each other’s isolation forms. All testing was conducted at the University of Hawaii at Manoa, and all f0 measurements were made using a Kay Elemetrics Visi-Pitch.
² Kubozono (1993: 85–93) labels this phenomenon “accentual boost”, and the average f0 increment in his data (from a male subject) is about 10 Hz.
2. Preliminary experiment
2.1. Stimulus materials
The seven minimal pairs listed in Table I were chosen for use in the preliminary experiment. The two words in each pair differ only in that, according to standard accent dictionaries (Hirayama, 1960; Nihon Hôsô Kyôkai, 1985), one has final accent while the other is unaccented. All 14 words are nouns. Two very simple carrier sentences were constructed for each minimal pair. Each
TABLE I. Minimal pairs used in the preliminary experiment
Final-accented word   Gloss                    Unaccented word   Gloss
/kí/                  “tree, wood”             /ki/              “spirit, feeling”
/hí/                  “fire”                   /hi/              “sun, day”
/é/                   “picture”                /e/               “handle”
/ná/                  “greens, vegetables”     /na/              “name”
/haʃí/                “bridge”                 /haʃi/            “edge”
/haná/                “flower”                 /hana/            “nose”
/kakí/                “fence”                  /kaki/            “persimmon”
carrier sentence consisted of a grammatical particle followed by a predicate word and provided a context in which either word of the relevant minimal pair can occur naturally as the first word. For example, the two carrier sentences for /é/ “picture” and /e/ “handle” were – o tsuketa “(I) attached a –” and – mo aru “There is also a –”. As noted above, the difference between final accent and no accent is typically realized on a following grammatical particle.
A list of 28 sentences (14 words × 2 carriers) was then prepared for recording. A second list, consisting of 35 isolated words, was also prepared. For each minimal pair in Table I, one member appeared twice and the other three times on this second list.³ The sequence of items on both lists was random except for the condition that no two items involving the same minimal pair were contiguous.
The purpose of the carrier-sentence list was to serve as a check on a potential confounding factor in interpreting listener responses. Variation in accent patterns among Tokyo speakers is a fact of life. In particular, some of the pairs in Table I are homophonous for some speakers even when a grammatical particle follows. Needless to say, there is no point in investigating whether two such words can be distinguished in isolation if they cannot be distinguished in carrier sentences.
Four college-educated Japanese women who were raised in Tokyo served as speakers. Each speaker first read a brief set of instructions to listeners (in Japanese) followed by a short sentence to be used as a sample item. This sample sentence did not include any of the words in Table I. The speaker then read the two test lists. The lists were printed in ordinary Japanese orthography, which does not indicate accent. For each item, the experimenter prompted the speaker with a hand signal, after which the speaker read the item number and the item itself.
Items were separated by a 10-second pause, with a longer pause separating the two lists.
³ This unbalanced design was motivated by Mathias’s (1980) surmise that listeners expect an equal number of accented and unaccented tokens, and that this expectation biases their responses. In the present case, however, a balanced design would probably have been better. Given the small number of items for each pair and their separation by items for other pairs, it seems highly unlikely that listeners could have kept close enough track for their responses to be affected in this way.
2.2. Acoustic measurements
As a first step toward replicating Sugitô’s (1982) finding that some speakers have a higher f0 on accented final syllables than on unaccented final syllables in isolated words, the peak f0 on each speaker’s isolation tokens of the four monosyllabic words /kí/, /ki/, /hí/, and /hi/ was measured. Each speaker recorded 10 relevant tokens: three each of /kí/ and /hi/ and two each of /ki/ and /hí/. All four words have the
Figure 1. Maximum fundamental frequency (in Hz) on [i] in each speaker’s isolation tokens of unaccented /ki/ and /hi/ and accented /kí/ and /hí/.
same vowel preceded by what is phonetically a voiceless palatal obstruent ([kʲ] for /k/ and [ç] for /h/). This similarity in segmental composition controls for the well-known influences on f0 of vowel height (Peterson & Barney, 1952; Lehiste & Peterson, 1961) and preceding consonants (Lehiste & Peterson, 1961).
Fig. 1 displays peak f0 on the five accented tokens and the five unaccented tokens produced by each speaker. For Speaker 2 the graph shows a neat vertical separation between the two sets, and in spite of the small sample sizes, the difference between the means of Speaker 2’s sets is significant (Wilcoxon w = 6.82, p < 0.01). There is no comparable separation for the other three speakers. It thus appears that, at least in monosyllables, only Speaker 2 maintains a measurable difference between final accent and no accent.
It is important to keep in mind, of course, that reading lists of isolated words is an unnatural activity that draws the attention of speakers to potentially contrasting word pairs. Fourakis & Iverson (1984) argue that the incomplete neutralization of voiced and voiceless final obstruents reported for German is confined to the artificial speech produced in a typical experimental situation. Thus, the measured tokens displayed in Fig. 1 may not be representative of Speaker 2’s productions in more natural settings.
2.3. Perceptual study
Each of the four recordings of 28 sentences followed by 35 isolated words was played to a different group of 10 listeners over headphones in a language laboratory.⁴
⁴ Speaker 4’s recording was actually played to 11 listeners, but one insisted that the task was impossible and gave up in the middle.
The 40 listeners were all native speakers of Japanese who were raised
in the greater Tokyo area (Tokyo, Kanagawa, Saitama, and Chiba prefectures). An answer sheet contained written instructions (in Japanese), a response choice for the sample item, and numbered response choices for the test items. Each response choice consisted of the two Chinese characters used to write the two members of the relevant minimal pair in Table I. The written instructions asked listeners to circle the character corresponding to the word they thought they had heard in each item and to make a choice even when uncertain. After reading the written instructions, each listener heard the recorded instructions at the beginning of the tape and the sample item. The tape was then stopped, and any questions were answered. The test portion of the tape was played without interruption.
As anticipated, many listeners could not distinguish the two members of certain pairs even in carrier sentences. Table II shows for each speaker how many listeners (out of 10) correctly identified all four carrier-sentence items involving each pair of words. For example, Table II shows that not a single listener could distinguish /kakí/ from /kaki/ in the carrier sentences produced by Speakers 1, 2, and 4. A likely explanation for these results is that none of these three speakers maintains a distinction.⁵ In other cases, it seems more likely that the speaker maintains a distinction but that some of the listeners do not. For example, Table II shows that three listeners who heard Speaker 2 and one listener who heard Speaker 3 could not distinguish /haʃí/ and /haʃi/ in carrier sentences. These four listeners probably do not maintain the distinction, but the two speakers probably do.
As explained in Section 2.1., the carrier-sentence list contained four items for each minimal pair (2 words × 2 carriers).
An incorrect response to any of these four items was taken to mean that the listener could not distinguish the members of the pair in carrier sentences, and that listener’s responses to the isolated-word items for that pair were discarded as irrelevant.
The remaining responses to the isolated-word items recorded by each speaker indicate that the listeners, taken as groups, could not distinguish final accent from no accent in isolated words. For example, the list of 35 isolated words contained three tokens of /kí/ and two tokens of /ki/. For Speaker 1, the 50 relevant responses (5 items × 10 listeners) were: /kí/ identified as /kí/: 25; /kí/ as /ki/: 5; /ki/ as /kí/: 17; and /ki/ as /ki/: 3.⁶ Each such set of responses can be treated as a 2 × 2 contingency table, and in each case where the total number of responses was sufficiently large (i.e., ≥ 40), a chi-square test was performed. None of these chi-square values reaches the 3.84 required for significance at the 0.05 level, but the three highest values (including a “near miss” at 3.58 for /hí/ vs. /hi/) are all for Speaker 2.
Recall that the f0 measurements reported in Section 2.2. indicate that Speaker 2 maintains a distinction in production. The results of Sugitô’s (1982) perception experiments indicate that even speakers who maintain a distinction in production are not especially good at perceiving that distinction. An experiment designed to test whether Speaker 2 fits this description is reported below in Section 3.
⁵ Subsequent elicitations from Speaker 1 confirmed that she pronounces both words unaccented.
⁶ The overall listening results generally corroborate Mathias’s (1980) suggestion (see Section 1) that listeners tend to identify isolated tokens as accented. The results of Sugitô’s (1982) perception experiment using natural tokens show the same tendency: most of the errors are tokens of /hana/ identified as /haná/.
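The chi-square test on a 2 × 2 table described in Section 2.3. can be sketched as follows. This is a minimal illustration, not the procedure actually run for the study: it assumes the ordinary Pearson statistic without a continuity correction (an assumption, though one that appears to reproduce the values printed in Tables III and IV), and the function names are hypothetical. The only data used are the four counts quoted above for Speaker 1's /kí/ and /ki/.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Rows are token types (accented, unaccented); columns are responses
    (accented, unaccented).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a row or column total is zero; the statistic is undefined")
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def p_value_df1(chi2):
    """Upper-tail probability for a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Speaker 1's recording, counts quoted in the text:
# /kí/ heard as /kí/: 25, /kí/ as /ki/: 5, /ki/ as /kí/: 17, /ki/ as /ki/: 3.
chi2 = chi_square_2x2(25, 5, 17, 3)
print(f"chi-square = {chi2:.3f}, p = {p_value_df1(chi2):.2f}")
print("reaches the 3.84 criterion:", chi2 >= 3.84)
```

For one degree of freedom the tail probability has the closed form used in `p_value_df1`, so no statistics library is needed; the 3.84 criterion in the text corresponds to a tail probability of about 0.05.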
TABLE II. Number of listeners (out of 10) who correctly distinguished minimal pairs in all carrier sentences by each speaker
Word pair            Speaker 1   Speaker 2   Speaker 3   Speaker 4
/kí/ vs. /ki/        10          10          10          10
/hí/ vs. /hi/        10          10          10          10
/é/ vs. /e/          0           0           10          10
/ná/ vs. /na/        10          2           10          10
/haʃí/ vs. /haʃi/    0           7           9           10
/haná/ vs. /hana/    10          10          10          10
/kakí/ vs. /kaki/    0           0           4           0
3. Follow-up experiment
3.1. Stimulus materials
Two of the speakers from the preliminary experiment reported above in Section 2. participated in a follow-up experiment. Speaker 2 is the one who produced a measurable distinction between final-accented and unaccented monosyllables in the preliminary experiment, and Speaker 1 was included for comparison purposes. Each speaker recorded a set of 100 isolated words containing 25 tokens each of /kí/, /ki/, /haná/, and /hana/. Every odd-numbered item was either /haná/ or /hana/, and every even-numbered item was either /kí/ or /ki/, but the sequence of accented and unaccented items was random.⁷ For each item, the experimenter prompted the speaker with a slip of paper bearing the single Chinese character used to write the word in standard Japanese orthography. Items were separated by pauses of approximately three seconds.
The experiment was limited to a single pair of monosyllables and a single pair of disyllables in order to minimize the tedium of the recording sessions and maximize the number of comparable tokens. It was important to include a disyllabic pair, since there is no guarantee that monosyllables and disyllables behave in parallel fashion. The pair chosen is the same /haná/ and /hana/ that Sugitô (1982) used in her experiments.
⁷ In spite of the relatively large number of items for each pair, their alternation between monosyllables and disyllables should have made it difficult for listeners to keep track of the proportions of accented and unaccented responses. Nonetheless, the balanced design does raise the problem mentioned in Note 3.
3.2. Acoustic measurements
On all the tokens f0 was measured. For each token of /kí/ or /ki/, a single measurement was taken: maximum f0 on [i]. For each token of /haná/ or /hana/, two measurements were taken: minimum f0 on [a] in the first syllable and maximum f0 on [a] in the second syllable. The measurements of /kí/ and /ki/ were expected to corroborate the claim made above in Section 2.2. that Speaker 2 maintains a measurable distinction between accented and unaccented monosyllables. The measurements of /haná/ and /hana/ were expected to reveal whether Speaker 2 maintains a comparable distinction between final accent and no accent in disyllables.
Figure 2. Percentile plots of maximum f0 (in Hz) for [i] in tokens of /kí/ and /ki/ as produced by Speaker 1 (left) and Speaker 2 (right). The solid line in each box marks the median, and the broken lines mark the 25th and 75th percentiles. Each box encloses approximately 90% of the range (except that no data points are excluded if there are fewer than 20 points).
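The statistic reported as a Wilcoxon value for the f0 comparisons is not defined in the text. One reading, offered here only as an assumption, is that it is the chi-square approximation to the rank-sum test, which for two groups is the Kruskal-Wallis H statistic referred to one degree of freedom. The sketch below computes that statistic without a tie correction. The f0 values in it are invented placeholders, not measurements from the study; only their rank order matters, and they are chosen so that five tokens of one category all rank above five tokens of the other, as in the neat separation described for Speaker 2 in Section 2.2.

```python
import math


def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_chi_square(group_a, group_b):
    """Kruskal-Wallis H for two groups (no tie correction applied)."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    n = n_a + n_b
    r_a = sum(ranks[:n_a])
    r_b = sum(ranks[n_a:])
    return 12.0 / (n * (n + 1)) * (r_a ** 2 / n_a + r_b ** 2 / n_b) - 3.0 * (n + 1)


def p_value_df1(h):
    """Upper-tail probability for a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(h / 2.0))


# HYPOTHETICAL peak-f0 values in Hz (placeholders, not the measured tokens).
unaccented = [188, 192, 195, 199, 201]
accented = [214, 219, 223, 228, 236]

h = rank_sum_chi_square(unaccented, accented)
print(f"H = {h:.2f}, p < 0.01: {p_value_df1(h) < 0.01}")
```

With five tokens against five and no overlap, this statistic comes to about 6.82, the value reported in Section 2.2. for Speaker 2; with 25 tokens against 25 and no overlap its ceiling is about 36.76. That agreement is consistent with the assumed reading but does not confirm it.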
Fig. 2 displays the measurements of maximum f0 in /kí/ and /ki/. Just as in Fig. 1 in Section 2.2. above, there is a significant difference between the means of the accented and unaccented tokens for Speaker 2 (Wilcoxon w = 34.91, p < 0.001) but not for Speaker 1 (Wilcoxon w = 0.07, p > 0.79). Fig. 3 displays the measurements of maximum f0 on the second syllable of /haná/ and /hana/, and here again, there is a significant difference between the means of the accented and unaccented tokens for Speaker 2 (Wilcoxon w = 23.34, p < 0.001) but not for Speaker 1 (Wilcoxon w = 3.33, p > 0.06). Neither speaker appears to maintain a distinction
Figure 3. Percentile plots of maximum f0 (in Hz) for second-syllable [a] in tokens of /haná/ and /hana/ as produced by Speaker 1 (left) and Speaker 2 (right). The solid and broken lines are as in Fig. 2.
Figure 4. Percentile plots of difference (in Hz) between maximum f0 on second-syllable [a] and minimum f0 on first-syllable [a] in tokens of /haná/ and /hana/ as produced by Speaker 1 (left) and Speaker 2 (right). The solid and broken lines are as in Fig. 2.
between final accent and no accent in terms of the minimum f0 on the first syllable: the difference between the means of the accented and unaccented tokens is not significant for either Speaker 1 (Wilcoxon w = 3.11, p > 0.07) or Speaker 2 (Wilcoxon w = 1.24, p > 0.26).
If a speaker maintains a distinction in disyllables, it would not be surprising to find that the magnitude of the rise in f0 from the first-syllable minimum to the second-syllable maximum is more relevant than the second-syllable maximum itself.⁸ Fig. 4 displays the difference between the first-syllable minimum and the second-syllable maximum in /haná/ and /hana/. The difference between the means of the accented and unaccented tokens is significant for Speaker 2 (Wilcoxon w = 4.85, p < 0.05) but not for Speaker 1 (Wilcoxon w = 0.011, p > 0.74). Speaker 2 thus appears to maintain a distinction in terms of magnitude of rise, but the accented and unaccented tokens are not as clearly separated by this measure as they are by second-syllable maximum (cf. Fig. 3).
⁸ Sugitô (1982) took magnitude of rise to be the important value in her measurements of /haná/ and /hana/.
3.3. Perceptual study
After enough time had passed to make any recollection of the order of items extremely unlikely (more than a week in both cases), each speaker listened to her own recording on one day and to the other speaker’s recording on the following day. For each listening session, an answer sheet was provided containing 100 response choices, each response choice consisting of the two Chinese characters used to write the two members of the relevant minimal pair. The listener was instructed to circle the character corresponding to the word she thought she had heard in each item.
TABLE III. Responses to the 25 tokens of /kí/ and the 25 tokens of /ki/ recorded by each speaker for the individual listening experiment: A = accented tokens identified as accented; B = unaccented tokens identified as accented; C = accented tokens identified as unaccented; D = unaccented tokens identified as unaccented; χ² = chi-square value when A, B, C, and D are treated as a 2 × 2 contingency table
Speaker     Listener    A    B    C    D    χ²     Significance
Speaker 1   Speaker 1   23   22   2    3    0.22   ns (p > 0.63)
Speaker 1   Speaker 2   18   18   7    7    0.00   ns (p = 1)
Speaker 2   Speaker 1   21   17   4    8    1.75   ns (p > 0.18)
Speaker 2   Speaker 2   23   0    2    25   –      –
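Two descriptive figures can be read off each row of Table III: the proportion of correct identifications, (A + D) / 50, and the proportion of "accented" responses, (A + B) / 50. The second is not a measure used in the study; it is added here as a hedged illustration of the response tendency that Mathias (1980) describes, namely that listeners tend to label isolated tokens as accented. The dictionary layout and function name below are hypothetical.

```python
# Table III cell counts as (A, B, C, D), keyed by (speaker, listener).
TABLE_III = {
    ("Speaker 1", "Speaker 1"): (23, 22, 2, 3),
    ("Speaker 1", "Speaker 2"): (18, 18, 7, 7),
    ("Speaker 2", "Speaker 1"): (21, 17, 4, 8),
    ("Speaker 2", "Speaker 2"): (23, 0, 2, 25),
}


def summarize(a, b, c, d):
    """Return (proportion correct, proportion of 'accented' responses)."""
    total = a + b + c + d
    proportion_correct = (a + d) / total    # accented heard as accented, unaccented as unaccented
    proportion_accented = (a + b) / total   # every response labelled 'accented'
    return proportion_correct, proportion_accented


for (speaker, listener), cells in TABLE_III.items():
    correct, accented = summarize(*cells)
    print(f"{speaker} heard by {listener}: "
          f"{correct:.0%} correct, {accented:.0%} 'accented' responses")
```

Reading the two figures together separates a listener who is near chance because of a strong preference for the "accented" response from one whose responses are roughly balanced between the two labels.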
Table III shows the responses to the tokens of /kí/ and /ki/ recorded by each speaker. Each set of 50 responses (25 tokens of each word) can be treated as a 2 × 2 contingency table, and in each case except the last (Speaker 2 listening to her own recording) a chi-square test was performed. None of these three chi-square values was significant at the 0.05 level. These results indicate that neither speaker could identify Speaker 1’s tokens and that Speaker 1 could not identify Speaker 2’s tokens. No chi-square value can be computed for Speaker 2’s responses to her own tokens because one of the cells in the contingency table would be zero, but a statistical test is superfluous in this case. It is obvious that Speaker 2 was able to identify her own tokens with far greater than chance accuracy, since the responses are 96% correct.
The responses to the tokens of /haná/ and /hana/ recorded by each speaker are shown in Table IV. Here again, each set of 50 responses (25 tokens of each word) can be treated as a 2 × 2 contingency table, and in each case a chi-square test was performed. The results indicate that neither speaker could identify Speaker 1’s tokens but that both could identify Speaker 2’s tokens with greater than chance accuracy.
TABLE IV. Responses to the 25 tokens of /haná/ and the 25 tokens of /hana/ recorded by each speaker for the individual listening experiment: A = accented tokens identified as accented; B = unaccented tokens identified as accented; C = accented tokens identified as unaccented; D = unaccented tokens identified as unaccented; χ² = chi-square value when A, B, C, and D are treated as a 2 × 2 contingency table
Speaker     Listener    A    B    C    D    χ²     Significance
Speaker 1   Speaker 1   17   12   8    13   2.05   ns (p > 0.15)
Speaker 1   Speaker 2   9    7    16   18   0.37   ns (p > 0.54)
Speaker 2   Speaker 1   20   11   5    14   6.87   p < 0.01
Speaker 2   Speaker 2   19   10   6    15   6.65   p < 0.01
3.4. Discussion
It seems clear from the response data in Table III above that Speaker 1 does not distinguish accented and unaccented monosyllables in either production or perception, whereas Speaker 2 both produces and perceives a distinction. Figure 2 in
Section 3.2. above shows a clear distinction in peak f0 on [i] between Speaker 2’s tokens of /kí/ and her tokens of /ki/, but it does not necessarily follow that the peak f0 on the vowel is a relevant perceptual cue for differentiating accented and unaccented monosyllables. It could simply be that the relevant cue or cues happen to correlate with peak f0 on the vowel. There are nonstandard Japanese dialects in which the final syllable of a final-accented word pronounced in isolation has what is described as a high falling pitch, whereas the final syllable of an unaccented word pronounced in isolation has what is described as a mid level pitch. Assuming these impressionistic descriptions are essentially accurate, it is an open question whether the difference in contour or the difference in pitch would be more important perceptually for the speakers of such a dialect. Perception experiments in which potential cues can be manipulated independently would be necessary to establish exactly what is relevant for Speaker 2 in differentiating accented and unaccented monosyllables.
A closer look at Speaker 2’s responses to her own tokens of /kí/ and /ki/ suggests that peak f0 on the vowel is not the sole cue for differentiating the two words. As Fig. 5 shows, Speaker 2 identified all tokens with a measured maximum f0 of 201 Hz or lower as /ki/ and all tokens with a measured maximum f0 of 219 Hz or higher as /kí/. She identified six of the intermediate tokens as /ki/ (measured maximum f0: 205, 206, 208, 216, 216, 218) and four as /kí/ (measured maximum f0: 203, 204, 209, 213), and her identification was mistaken in only one of these ten cases. It is highly implausible to suppose that this 90% success rate was due to chance, and peak f0 could not possibly be serving as the cue in this range of values.
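Both halves of this argument can be checked with a short sketch that uses only the ten f0 values quoted above. The first function gives the probability of nine or more correct answers out of ten under pure guessing; treating each response as an independent 50/50 guess is an assumption made here, and no such test is reported in the study. The second asks how many of the ten responses any single cutoff on peak f0 could reproduce. The function names are hypothetical.

```python
from math import comb


def binomial_upper_tail(successes, trials, p=0.5):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(comb(trials, k) * p ** k * (1 - p) ** (trials - k)
               for k in range(successes, trials + 1))


# Peak f0 (Hz) of the ten intermediate tokens, grouped by Speaker 2's response.
HEARD_AS_UNACCENTED = [205, 206, 208, 216, 216, 218]
HEARD_AS_ACCENTED = [203, 204, 209, 213]


def best_threshold_agreement(accented_f0, unaccented_f0):
    """Largest number of responses matched by a rule 'accented iff f0 >= t'."""
    values = sorted(set(accented_f0) | set(unaccented_f0))
    thresholds = values + [values[-1] + 1]   # the last cutoff labels everything unaccented
    best = 0
    for t in thresholds:
        agree = (sum(f >= t for f in accented_f0)
                 + sum(f < t for f in unaccented_f0))
        best = max(best, agree)
    return best


print("P(9 or more of 10 correct by guessing):", binomial_upper_tail(9, 10))
print("responses matched by the best single f0 cutoff:",
      best_threshold_agreement(HEARD_AS_ACCENTED, HEARD_AS_UNACCENTED), "of 10")
```

Because the two lowest values in this range were heard as accented and the three highest as unaccented, no rising cutoff on peak f0 can match the responses better than the trivial rule that labels every token unaccented.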
It seems fair to conclude from the response data in Table IV that Speaker 2 distinguishes final-accented and unaccented disyllables to some degree in both production and perception, although not as sharply as she distinguishes between accented and unaccented monosyllables (cf. Table III). Speaker 1 apparently does not distinguish final-accented and unaccented disyllables in her own production, but
Figure 5. Speaker 2’s responses to her own tokens of /kí/ and /ki/ displayed according to maximum f0 on [i] (in Hz). The shaded area shows the portion of the range between the lowest value for a token identified as /kí/ and the highest value for a token identified as /ki/.
she appears to be capable of perceiving the distinction when another speaker produces it. Speakers 1 and 2 had exactly the same success rate (68%) in identifying Speaker 2’s tokens of /haná/ and /hana/.
Fig. 3 shows a clear distinction between Speaker 2’s tokens of /haná/ and /hana/ in terms of the maximum f0 on the vowel in the second syllable (i.e., the endpoint of the rise). Fig. 4 shows a less clear distinction in terms of the difference between the maximum f0 on the vowel in the second syllable and the minimum f0 on the vowel in the first syllable (i.e., the magnitude of the rise). To what extent either of these measurable differences serves as a perceptual cue is, of course, unclear. Here again, perception experiments in which potential cues can be manipulated independently would be necessary to establish exactly what is relevant, and it may be that Speakers 1 and 2 do not rely on the same cue(s). The results of Sugitô’s (1982) perception experiment with synthetic f0 contours indicate that at least some listeners do in fact respond to magnitude of rise as a cue.⁹ As noted in Section 1., pitch contour and amplitude have also been proposed as possible cues; vowel quality and voice quality are likely candidates as well.
Speaker 1 was able to identify Speaker 2’s tokens of /haná/ and /hana/ with significantly greater than chance accuracy but was not able to do so with Speaker 2’s tokens of /kí/ and /ki/. This discrepancy suggests that Speaker 2’s monosyllables lack the cue(s) to which Speaker 1 was responding in Speaker 2’s disyllables. Speaker 2’s responses to her own monosyllables were considerably more accurate than her responses to her own disyllables (96% vs. 68%), and this discrepancy, too, suggests that the cue(s) in disyllables may be different (and less robust). On the other hand, the same cue(s) might simply be less robust in disyllables than in monosyllables.
⁹ Sugitô’s results indicate that the cue is relative rather than absolute magnitude, since the range of values in the set of tokens presented seems to affect listener responses.
4. Conclusion
It is clear from the data reported above that Speaker 1 and Speaker 2 performed very differently on the assigned tasks. One possible explanation for the difference is individual variation of the sort that Sugitô’s (1982) results indicate. In other words, it may be that some Tokyo speakers partially maintain an utterance-final distinction between no accent and final accent while other speakers do not. Individual differences of various kinds have been reported in other studies of putative neutralizations (Dinnsen & Charles-Luce, 1984; Slowiaczek & Dinnsen, 1985; Port & Crawford, 1989; Di Paolo & Faber, 1991).
A separate question that must be considered is whether Speaker 2 maintains the final accent vs. no accent distinction in natural speech. As noted in Section 2.2., reading lists of isolated words in an experimental setting is an unnatural activity that draws attention to potential contrasts. Perhaps Speaker 2 is particularly skilled at imposing an artificial contrast under such circumstances. Sugitô (1982) notes that the three speakers in her study who most clearly maintained a distinction between /haná/ and /hana/ were all television announcers, and it is not implausible to suppose that such speakers are prone to unnaturally precise speech. One way to
investigate the distinction in more natural speech would be to have speakers read dialogues containing the words in question in utterance-final position.
Another possible explanation for the data reported above is that Speakers 1 and 2 speak slightly different dialects. Speaker 1 was raised in Suginami Ward, a part of western Tokyo proper, and in Mitaka City, a suburb just to the west of Suginami Ward. Speaker 2 was raised in Katsushika Ward, a part of eastern Tokyo proper. Both Suginami and Katsushika are peripheral wards that were annexed to the city proper only in 1932, and neither is part of the geographical area in which modern standard Tokyo Japanese developed historically. Katô (1970) says that an accent distinction between the isolation forms of /haná/ and /hana/ is maintained in several locations in Chiba Prefecture near the border with Katsushika Ward, although he does not describe how the two forms differ. In any case, it would not be terribly surprising to discover that such a distinction is also found in parts of Katsushika Ward itself.
Another possibility that must at least be considered is parental influence. Neither of Speaker 2’s parents is a Tokyo native, and one is from Chiba Prefecture. The conventional wisdom in linguistics is that children imitate their peers rather than their parents, but in Japanese dialectology there is a strong tradition of regarding the native dialects of parents as an important factor affecting pronunciation. Perhaps Speaker 2 acquired a dialect with an accent distinction that is not shared even by other people who grew up in the same neighborhood. Work with other speakers from Katsushika Ward is clearly necessary to determine the status of the distinction exhibited by Speaker 2.
The results presented here highlight the importance of stringent selection criteria for participants in experiments involving Japanese accent. The listeners for the preliminary experiment reported in Section 2.
were simply required to be native speakers of Japanese raised in the greater Tokyo area, and this requirement is clearly too lax for anything other than a pilot study. Sugitô (1982) tried to restrict participation in her study to speakers who were themselves born and raised in Tokyo and whose mothers were from the area that constituted Tokyo proper before the 1932 annexations of outlying areas, and 12 of her 14 subjects satisfied both conditions. Even this kind of restriction is insufficient to guarantee a group that is homogeneous in the relevant respects.
The work reported here was supported in part by a summer research award from the Japan Studies Endowment at the University of Hawaii. I am grateful to Yaeko Habein, Miwa Nishimura, Nobuko Ochner, and Kishiko Vance for serving so graciously as speakers and to Fumiko Earns and Cammie Hamblin for their help in recruiting listeners. I am also grateful to Nancy Arakawa and Daniel Tom for their expert technical assistance. Some of this material was presented at the annual meeting of the Linguistic Society of America in 1991 and at Haskins Laboratories in 1994. I would like to thank Mary Beckman, Pam Beddor, Alice Faber, Kikuo Maekawa, Bart Mathias, Bill Poser, and Tom Robb for their comments and advice. Finally, I would like to say a special word of thanks to Yumiko Satô for her help with background research and for her incisive criticism.
References
Charles-Luce, J. (1985) Word-final devoicing in German: effects of phonetic and sentential contexts, Journal of Phonetics, 13, 309–324.
Charles-Luce, J. & Dinnsen, D. A. (1987) A reanalysis of Catalan devoicing, Journal of Phonetics, 15, 187–190.
Dinnsen, D. A. & Charles-Luce, J. (1984) Phonological neutralization, phonetic implementation and individual differences, Journal of Phonetics, 12, 49–60.
Di Paolo, M. & Faber, A. (1991) Phonation differences and the phonetic content of the tense-lax contrast in Utah English, Language Variation and Change, 2, 155–204.
Fourakis, M. & Iverson, G. K. (1984) On the “incomplete neutralization” of German final obstruents, Phonetica, 41, 140–149.
Fox, R. A. & Terbeek, D. (1977) Dental flaps, vowel duration, and rule ordering in American English, Journal of Phonetics, 5, 27–34.
Hirayama, T. (1960) Zenkoku akusento jiten [Nationwide accent dictionary]. Tokyo: Tôkyôdô.
Katô, M. (1970) Henka suru kôgai no kotoba: Tôkyô no higashigawa [Suburban language change: east of Tokyo], Gengo seikatsu, 225, 64–72.
Keating, P. A. (1984) Phonetic and phonological representation of stop consonant voicing, Language, 60, 286–319.
Kubozono, H. (1993) The organization of Japanese prosody. Tokyo: Kurosio Publishers.
Lehiste, I. & Peterson, G. E. (1961) Some basic considerations in the analysis of intonation, Journal of the Acoustical Society of America, 33, 419–425.
McCawley, J. D. (1977) Accent in Japanese. In Studies in stress and accent (L. M. Hyman, editor), pp. 261–302. Los Angeles: University of Southern California Department of Linguistics.
Mathias, B. (1980) Review of Neustupný 1978, Journal of Linguistics, 16, 326–328.
Neustupný, J. V. (1978) Post-structural approaches to language: language theory in a Japanese context. Tokyo: University of Tokyo Press.
Nihon Hôsô Kyôkai (1985) Nihongo hatsuon akusento jiten, kaitei shinpan [Japanese pronunciation and accent dictionary, new revised edition]. Tokyo: Nihon Hôsô.
Peterson, G. E. & Barney, H. L. (1952) Control methods used in a study of the vowels, Journal of the Acoustical Society of America, 24, 175–184.
Pierrehumbert, J. B. & Beckman, M. E. (1988) Japanese tone structure. Cambridge: MIT Press.
Port, R. & Crawford, P. (1989) Incomplete neutralization and pragmatics in German, Journal of Phonetics, 17, 257–282.
Port, R. F. & O'Dell, M. L. (1985) Neutralization of syllable-final voicing in German, Journal of Phonetics, 13, 455–471.
Poser, W. J. (1984) The phonetics and phonology of tone and intonation in Japanese. MIT doctoral dissertation.
Slowiaczek, L. M. & Dinnsen, D. A. (1985) On the neutralizing status of Polish word-final devoicing, Journal of Phonetics, 13, 325–341.
Slowiaczek, L. M. & Szymanska, H. J. (1989) Perception of word-final devoicing in Polish, Journal of Phonetics, 17, 205–212.
Sugitô, M. (1968) Dôtai-sokutei ni yoru Tôkyô nihaku-go odaka to heiban akusento-kô [A study of final-accented and unaccented disyllabic words in Tokyo using dynamic measurement], Onseigakkai Kaihô, 129, 1–4.
Sugitô, M. (1982) Tôkyô akusento ni okeru “hana” to “hana” no seisei to chikaku [The production and perception of “hana” and “hana” with Tokyo accent]. In Nihongo akusento no kenkyû [Research on Japanese accent], pp. 182–201. Tokyo: Sanseidô.
Uwano, Z. (1977) Nihongo no akusento [Japanese accent]. In Iwanami kôza Nihongo 5: on'in [Iwanami course on Japanese 5: phonology] (S. Ôno & T. Shibata, editors), pp. 281–321. Tokyo: Iwanami.