Journal of Phonetics (1992) 20, 441-456
Affective exploitation of tone in Taiwanese: an acoustical study of "tone latitude"
Elliott D. Ross* Neuropsychiatric Research Institute; Department of Neuroscience, University of North Dakota School of Medicine; and the Veterans Affairs Medical Center, Fargo, North Dakota, U.S.A.
Jerold A. Edmondson* Department of Foreign Languages and Linguistics, University of Texas at Arlington, Arlington, Texas, U.S.A.
G. Burton Seibert Department of Community Medicine, University of Texas Southwestern Medical Center, Dallas, Texas, U.S.A.
Jin-Lieh Chan Department of Neurology, Chang Gung Memorial Hospital, Linkou Medical Center, Taipei, Taiwan, R.O.C.
Received 28th November 1989, and in revised form 22nd October 1991
Our previous research on the acoustical features associated with affective prosody suggested that speakers of Taiwanese might utilize the allowed imprecision in the phonetic realization of lexical tones ("tone latitude") for affective signalling. In this paper, we explore this hypothesis further, using four measurements of the fundamental frequency contours associated with tones produced by controls and right-brain-damaged patients on an affective repetition task. The measures were average F0 over the tone, initial F0, F0 slope, and duration. The results strongly support the existence of tone latitude in Taiwanese speech and its manipulation for affective purposes; there was a wide range of values for each measure for at least one tone in the sentence across the affectively different repetitions produced by the normals. Moreover, the total range of variation across the renditions was substantially smaller in the productions by the right-brain-damaged patients than in those of the normal controls. The results thus also provide further evidence for the differential lateralization of the affective and linguistic aspects of language to the right and left hemispheres of the brain, respectively.
*Address correspondence to either Dr Ross at Department of Neuroscience, UNO Education Bldg., 1919 N. Elm St., Fargo, ND 58102, U.S.A., or Dr Edmondson at the University of Texas at Arlington, Box 19557, Arlington, TX 76019, U.S.A.
0095-4470/92/040441 + 16 $08.00/0 © 1992 Academic Press Limited
1. Introduction
Ever since the fundamental discoveries of Broca (1865) and Wernicke (1977, originally published in 1874) that focal brain lesions in the left hemisphere give rise
to specific combinations of linguistic processing deficits (aphasias), a basic claim in Neurology and Neurolinguistics has been that language is a dominant and highly lateralized function of the left hemisphere. Over the last two decades, however, considerable evidence has accrued to suggest that the right hemisphere also actively participates in language, in particular, through its modulation of affective prosody and gestures (Gorelick & Ross, 1987; Heilman, Scholes & Watson, 1975; Heilman, Bowers, Speedie & Coslett, 1984; Ross, 1981, 1985, 1989; Ross & Mesulam, 1979; Shapiro & Danly, 1985; Van Lancker, 1980). Focal brain lesions of the right hemisphere result in specific combinations of affective processing deficits (Ross, 1981) that are analogous to the aphasic deficits (Benson, 1979) following focal lesions of the left hemisphere. The various clinical syndromes associated with right hemisphere lesions that differentially impair the production and comprehension of affective prosody are called aprosodias (Ross, 1981). In non-tone languages such as English, affective-prosodic signalling in speech relies to a great extent on manipulation of fundamental frequency in the form of intonation (Bolinger, 1964; Crystal, 1969; Ross, Edmondson, Seibert & Homan, 1988) with other acoustic features also having a contributory role (Williams & Stevens, 1972). In contrast, tone languages use fundamental frequency to produce short pitch contours as the major acoustic carrier for the lexical category of tone (Abramson, 1962, 1975; Gandour, 1978), which has been shown to be a predominant left hemisphere function for both tone perception and production (Gandour & Dardarananda, 1983; Naeser & Chan, 1980; Packard, 1986; Van Lancker & Fromkin, 1973). 
(Interlanguage differences, however, may exist; a recent study by Gandour, Petty & Dardarananda (1988) suggests that in Thai patients with certain types of aphasias tone production is relatively more resistant to distortion when compared to that in Mandarin Chinese patients with similar aphasias.)
Since focal lesions of the right hemisphere also cause aprosodias in speakers of Taiwanese (Southern Min) and Mandarin (Hughes, Chan & Su, 1983; Edmondson, Chan, Seibert & Ross, 1987), the question arises: what are the acoustic correlates of affective prosody in speakers of tone languages? In a previous paper (Ross, Edmondson & Seibert, 1986) we addressed this issue by comparing acoustically the productions of Taiwanese, Mandarin, Thai and English speakers when asked to generate five different affects on a repetition task using a single sentence. We found that tone language speakers did not use certain acoustic parameters associated with fundamental frequency to the same degree as English speakers for generating affective prosody. We then studied loss of affective prosody following focal right-brain infarction in Taiwanese patients with motor-type aprosodias in order to determine which acoustic features are associated with affective prosody (Edmondson et al., 1987). We found that controls, when compared to patients, employed various global aspects of F0, intensity, and timing for imparting affect. Since F0 is the principal acoustic carrier of lexical tones in speech (Abramson, 1972, 1975; Gandour, 1978), we had assumed that if intonation features were manipulated by tone language speakers to communicate affective information, then those features should affect global rather than local aspects of the F0 pattern so as to avoid varying local pitch contours which, in turn, could alter tonal contrasts sufficiently to change word meaning.
Figure 1. Six graphs illustrating how tones were segmented from the acoustical data in a neutral ((a), (c), (e)) and a happy ((b), (d), (f)) utterance produced by J.L.C. for the Stimulus Tape ((a), (b)), by a female Control subject ((c), (d)) and by a male Patient ((e), (f)). [Plots not reproduced; the time axis is labelled in centiseconds, from 0 to 150.]
For example, if a tone language speaker were to raise or lower the overall mean pitch of an utterance for affective purposes, tonal contrasts would remain unaffected. Such
changes would be analogous to altering the key of a musical score, which would still leave the relative melodic relationships intact. (This is illustrated in Fig. 1, which shows tracings for typical utterances. The normal Control productions show an overall increase of approximately eight semitones between the "Neutral" and "Happy" renditions whereas the Patient's productions show only a minimal increase of about two semitones.) However, if the pitch of a single lexical tone were altered sufficiently within an utterance, word meaning could be affected. To our surprise, one of the F0 parameters (Delta F0), which was constructed to be very sensitive to local rather than global variations in pitch contour (Ross, Edmondson & Seibert, 1986), also showed significant variation on the affective tasks for normal controls when compared to the aprosodic patients (Edmondson et al., 1987). This unexpected result strongly suggested, therefore, that normal Taiwanese speakers might acoustically exploit for affective purposes some amount of allowed imprecision in the phonetic realization of lexical tones, i.e. of the "tone latitude".
Since tones contrast lexical items in a tone language, they should be perceived by fluent speakers of the language in a manner similar to the perception of the phoneme segment. What this means is that the acoustic features associated with a particular tone, such as pitch contour shape or pitch height, should have a range of acoustic variation across which linguistic category remains invariant. For example, if two high-level tones occur in a row, the first might be higher than the other or have a slight downward slope. However, they will still be perceived as high-level tones even though there is, at an acoustic but not phonemic level, contrast between the two adjacent tones. It is this phonemic, "categorical" perception of lexical tones which allows for tone latitude. We use the term "categorical" loosely here, since certain experiments have not fully supported the idea that tones are perceived categorically in the sense that consonants are (see Section 5.1 for more details).
In this paper we examine lexical tone production in normal Taiwanese subjects and in patients with motor-type aprosodias from right-brain damage in order, first, to characterize and quantify acoustically the amount of tone latitude that may be present in Taiwanese speech; second, to identify the proportion of this amount utilized specifically for affective signalling; and third, to define the neurology of hemispheric lateralization that underlies tone latitude.
2. Methods
2.1. Patients and controls
The patient and control groups were the same as those used in our previous study of affective prosody in Taiwanese (Edmondson et al., 1987). The patient group consisted of eight right-handed native speakers of Taiwanese who incurred ischemic infarctions of the right inferior frontoparietal region as documented by computerized tomographic brain scans. They were interviewed and examined by the fourth author, who is a native speaker of Taiwanese and a neurologist, some time between the 10th and 47th day post-stroke. None of the patients showed any impairment of tonal or segmental aspects of language that could be construed as aphasic. All the patients, however, evidenced emotional flattening of speech, including a severe loss of affective repetition as deemed by a panel of judges (see Edmondson et al., 1987, Sections 2.5 and 3.3), with variable deficits in affective-prosodic comprehension, findings characteristic of either motor or global aprosodia (Ross, 1981; Hughes et al., 1983; Gorelick & Ross, 1987). The control group consisted of eight non-hospitalized native speakers of Taiwanese. They were also interviewed by J.L.C., who screened them for the presence of neurologic or psychiatric illness; in addition, none was taking medicines, such as neuroleptics, that could alter affective prosody.
2.2. Data collection
The voice data were recorded on a Sony WM-60 professional quality cassette tape recorder using a Unisound EM-84 microphone that was attached to the boom of a headset and positioned 2 cm from the mouth just to the side of the airstream. High-quality recording tape was employed throughout to insure good frequency response in the critical range from 70-8000 Hz. Patient interviews were conducted in the hospital in a clinical setting and control interviews were done at home. The speech samples analyzed for this paper were identical with those utilized for the "first perceptual task" in our previous paper (Edmondson et al., 1987). The subjects listened to a standardized stimulus tape which contained five tokens for each of five affective renditions (neutral, angry, sad, surprised and happy) of the following sentence:

lexical form  = /li53  pe53  k'i21  k'ua21  tian22      ja53/
                 you   want  go     see     movies
phonetic form = [li55  pe55  k'i55  k'ua53  tian21(22)  ja53]
"You are going to the movies"

The underlying lexical tones assigned to the six syllables of the sentence are described in Table I. Due to tone-sandhi rules that change contiguous tones in speech (Cheng & Cheng, 1977), the sentence has the phonetic realization shown in the phonetic-form line. (The phonetic tones we derived from our data for the third and fifth syllables differ somewhat from those in Chen (1987), who would predict the third syllable to be high-falling 53, rather than high-level 55, and the fifth syllable to be low-falling 21, whereas in many of our subjects the tone seemed to be low-level 22. These differences may be dialectal since Chen (1987) does not say which of several regional variants he studied; known differences exist among the closely related form of Southern Min spoken on the Chinese mainland and the Southern Min spoken at different geographical locations on Taiwan.) The subjects were asked to model their productions as closely as possible to the stimulus sentence while being tape-recorded. Retakes as needed were allowed and patients were monitored carefully for any sign of disinterest or inattention. The first perceptual stimulus tape was constructed from the above tape recordings as follows: for every subject, the best affective response (as judged by the panel of judges) for each of the five affects was dubbed onto a second tape to produce an affective set comprising five affective renditions of the sentence.

TABLE I. The seven tones of Taiwanese. The tone values are transcribed using the usual five-point scale, in which 5 represents the top of the pitch range and 1 the bottom. The tone values are from Cheng & Cheng (1977), and differ slightly from those in other sources (Putonghua-Minnan Fangyan Cidian, 1982) for tone categories 1 and 5 as indicated in parentheses; "open" syllables end with either vowels or nasals and closed syllables end with stop consonants

Tone category   Tone value   Description and distribution
Yin set
  1             55 (44)      High-level; open syllables
  2             53           High-falling; open syllables
  3             21           Low-falling; open syllables
  4             23           Low-rising; closed syllables
Yang set
  5             35 (24)      Mid-rising; open syllables
  6             22           Low-level; open syllables
  8             4            High-level; closed syllables
2.3. Acoustical analysis of tones
The voice recordings of patients and controls were analyzed acoustically using a PM Pitch Analyzer (Voice Identification, Inc.) that was connected to a PDP 11/23 computer (Digital Equipment Corporation, Inc.) through a parallel interface. Each subject's utterances were played one at a time into the PM Pitch Analyzer. The Pitch Analyzer extracted the F0 in Hz and relative intensity in dB from the utterance and presented the extracted information as upper and lower traces on the machine's cathode ray tube (CRT) display. Simultaneously, through the parallel interface, a digitized volatile memory file was stored by the PDP 11/23 at a sampling rate of 100 Hz. Through the use of programmable cursors on the CRT of the PM Pitch Analyzer and a computer program, stray data points caused by microphone artifacts, voice break-ups and other sampling errors were removed from the raw data and the pitch trace was divided into six contours representing lexical tones associated with each of the six constituent syllables of the utterance (see Section 2.2). Other than in the affectively neutral sentences uttered by controls (an observation to be explored in a later study), continuous voicing was often present across syllable boundaries even though the third and fourth syllables began with underlying voiceless stops (Fig. 1(a)-(f)). We were forced to develop, therefore, a consistent criterion other than voice onset and offset for approximating the points for tone onset and offset. Since syllable nuclei are roughly associated with amplitude peaks in periodic energy (Abercrombie, 1967; Lehiste & Peterson, 1959; Pike, 1943; Stevens & House, 1961), we decided to use the intensity contour as our main arbiter. By observing the intensity contour generated on the CRT while listening to the taped utterances, we could consistently identify distinct intensity peaks associated with the succession of syllables.
The initial and final inflections of the intensity peak associated with each syllable were used to determine tone onset and offset (Fig. 1(a)-(f)). Two of our six patients had mild to moderate dysarthrias which did not affect speech comprehensibility. Their syllables did not always have sharply delimited intensity peaks because the consonant constrictions were not always made forcibly (Darley, Aronson & Brown, 1975). In these instances we listened to the utterances a number of times while observing the PM Pitch Analyzer trace the pitch and intensity curves on the CRT. From these observations we formed judgments about where to segment the pitch curve. (We do not necessarily claim that the above procedures precisely identify the exact psychoacoustical portion of the pitch contour associated with individual tones; rather they provide a reasonably consistent method to approximate the intended endpoints.) As the pitch contour boundary was established for each syllable, the onset/offset times in cs were entered manually into the PDP 11/23, which then converted the relevant F0 data from Hz to semitones and stored the data on a floppy disk file. The semitone transformation, as we have discussed in the past (Ross et al., 1986), is necessary to convert the Hz-scale into the interval-preserving semitone-scale (Fairbanks & Pronovost, 1939).
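The Hz-to-semitone transformation described in Section 2.3 can be illustrated with a minimal Python sketch. The function name and the reference frequency `ref_hz` are assumptions made for illustration only; the text does not state which reference the PDP 11/23 program used. Because the scale is logarithmic, equal frequency ratios map onto equal semitone intervals whatever reference is chosen, which is the interval-preserving property the text relies on.

```python
import math


def hz_to_semitones(f0_hz, ref_hz=10.0):
    """Convert F0 in Hz to semitones above a reference frequency.

    ref_hz is a hypothetical choice; the source does not specify one.
    """
    return 12.0 * math.log2(f0_hz / ref_hz)


# Two octave steps (100 -> 200 Hz and 200 -> 400 Hz) differ by 100 Hz and
# 200 Hz on the Hz scale, but span the same interval on the semitone scale.
low_octave = hz_to_semitones(200.0) - hz_to_semitones(100.0)
high_octave = hz_to_semitones(400.0) - hz_to_semitones(200.0)
print(round(low_octave, 6), round(high_octave, 6))
```

Changing `ref_hz` shifts every value by the same constant, so differences between tones, slopes and ranges are unaffected.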
2.4. Normalization of tone data
By far the biggest effect on F0 of varying affect was a change in global pitch range or "key". (Compare, for example, the difference in overall F0 level between the "neutral" and "happy" renditions in Fig. 1(a) and (b) or (c) and (d).) In this study, however, we are not interested in this global effect, but rather in the effect on the realization of a given tone within the pitch range of an utterance. For example, is Tone 1 level in all renditions and is it as high relative to Tone 3? To answer such questions, we had to normalize for these global changes in "key", using the following procedure. Since five of the six tones were either high level or high falling (Section 2.2), the upper limit of each utterance's "tone space" (Abramson, 1976; Klatt, 1973; also Sections 5.1 and 5.2 below) could be easily defined as the highest F0 data point contained within the F0 contours representing the six tones. For each utterance a constant was derived by subtracting the highest F0 data point from 50 (semitones). The F0 data were then normalized for each utterance by adding its derived constant to all F0 data points.
2.5. Acoustic measures of tones
To characterize each tone quantitatively so that a statistical comparison between patients and controls could be carried out, four acoustic measurements of the F0 contour at each tone were used. These measures were similar to the acoustic features that have been shown to be distinctive in perceptual studies of tone for speakers of a tone language (Gandour, 1978; Gandour & Harshman, 1978). As with any set of empirically-developed metrics for measuring complex acoustic phenomena, these access certain aspects of the actual phenomena but do not necessarily capture them completely.
(a) Average F0 (semitones) was derived by averaging all the semitone data points of the tone's F0 contour.
(b) Initial F0 (semitones) was calculated by averaging the first three points (initial 30 ms) of the tone's F0 contour.
(c) Slope F0 (semitones/s) was determined using a least squares formula for calculating the linear regression coefficient (Zar, 1984) or slope of the semitone data over the entire F0 contour for the tone.
(d) Length F0 (centiseconds) was derived by subtracting the onset time from the offset time of the tone's F0 contour.
All measurements for every subject were placed in appropriate data files on floppy disk. A summary data file was then transferred to a VAX 11/780 (Digital Equipment Corporation, Inc.) for statistical analysis using SAS (SAS Institute, Inc.).
3. Data reduction and statistical analysis
Since the object of our study was to quantify the degree of tone latitude used for affective signalling, we were interested in the range of variability rather than the raw measures. We chose the following method to estimate this range: we calculated an Emotional Range (ER) statistic for each acoustic measure at each tone position (i.e., for each successive syllable in the sentence) by subtracting the minimum value
across each subject's set of five affective renditions from the maximum value. This is similar to what we did in our previous papers (Ross et al., 1986; Edmondson et al., 1987). A statistical analysis was then carried out on the derived ER data for each of the four acoustic measures using a two-way Analysis of Variance with repeated measures. This analysis took into account two factors: (1) a group factor, patients (n = 8) vs. controls (n = 8), and (2) a repeated-measures factor, the ER statistics at each tone position for all subjects. An interaction between the two factors was judged to be significant at a p-value less than or equal to 0.1 and a main effect was judged to be significant at a p-value less than or equal to 0.01.
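The normalization of Section 2.4, the four measures of Section 2.5 and the ER statistic of Section 3 can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the original PDP 11/23 or SAS code: the function names, the list-of-lists data layout and the toy values are hypothetical, and each contour is assumed to hold one semitone value per centisecond (the 100 Hz sampling rate given in Section 2.3).

```python
def normalize_utterance(tones, ceiling=50.0):
    """Shift all F0 points so the utterance's highest point equals `ceiling`.

    `tones` is a list of tone contours; each contour is a list of F0 values
    in semitones, one per centisecond (assumed layout).
    """
    peak = max(max(contour) for contour in tones)
    constant = ceiling - peak  # the per-utterance constant of Section 2.4
    return [[value + constant for value in contour] for contour in tones]


def tone_measures(contour, onset_cs, offset_cs):
    """Return the four measures of Section 2.5 for one tone contour."""
    n = len(contour)
    average_f0 = sum(contour) / n
    first_points = contour[:3]  # first three samples = initial 30 ms
    initial_f0 = sum(first_points) / len(first_points)
    # Least-squares regression slope of semitones against time in seconds.
    times = [i / 100.0 for i in range(n)]
    mean_t = sum(times) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - average_f0) for t, v in zip(times, contour))
    slope_f0 = sxy / sxx if sxx > 0 else 0.0
    length_cs = offset_cs - onset_cs
    return {"average": average_f0, "initial": initial_f0,
            "slope": slope_f0, "length": length_cs}


def emotional_range(values):
    """ER statistic: maximum minus minimum across the affective renditions."""
    return max(values) - min(values)


# Hypothetical two-tone utterance (semitones, one value per centisecond).
utterance = [[44.0, 44.5, 45.0, 45.5], [46.0, 45.0, 44.0, 43.0]]
normalized = normalize_utterance(utterance)
first = tone_measures(normalized[0], onset_cs=10, offset_cs=14)
print(first)

# Hypothetical Average F0 values for one tone position across five affects.
print(emotional_range([3.0, 4.5, 2.0, 5.0, 3.5]))
```

Because the normalization adds a single constant per utterance, it changes Average F0 and Initial F0 but leaves Slope F0 and Length F0 untouched.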
4. Results
Figure 2 shows the mean Emotional Range (ER) for controls and patients for each acoustic measure at each tone position. All but one of the main effects were significant, but only Length F0 (Fig. 2(d)) showed a significant interaction between factors, at p = 0.004. The significant main effect of Tone Position for Length F0 (p = 0.0001) is most likely accounted for by the mean ER differences between tone positions 1-5 and tone position 6, which probably is related to the expected phenomenon of prepausal lengthening of the final syllable. The significant main effect of Group (p = 0.002) is most likely accounted for by the very large mean ER differences between patients and controls at tone positions 5 and 6. This could also account for much of the significant interaction for this measure (although we note also the reversal of the usual direction of the difference between the two groups at tone position 1). Both Average F0 (Fig. 2(a)) and Initial F0 had highly significant main effects for Groups (p = 0.0006 and p = 0.001, respectively) and for Tone Position (p = 0.0001 for both). Slope F0 (Fig. 2(c)), however, showed only a highly significant Group main effect at p = 0.0003.

Figure 2. Graphs showing the mean ER data for controls (clear bars) and patients (hatched bars) across the six Tone Positions for the four acoustical parameters used to measure tone latitude: (a) Average F0; (b) Initial F0; (c) Slope F0; (d) Length F0. Error flags equal one Standard Error of the Mean (SEM). See Section 4 for a description of the statistical analysis of the data. [Plots not reproduced; the horizontal axis is tone position.]

Some of these statistical effects on individual tones are evident in the tracings of individual utterances in Fig. 1. For example, the Stimulus Tape and Control subjects' tracings both show distinct use of prepausal lengthening in the general shortening of all but the sixth tone for the "Happy" rendition as compared to the "Neutral" (compare Fig. 1(a) to (b) and 1(c) to (d)), which is not seen in the Patient's tracings (compare Fig. 1(e) to (f)). Also note that the first three high-level tones in the "Neutral" tracings have approximately the same pitch height and slope for each subject. However, in the "Happy" tracings for the Stimulus Tape and Control subjects distinct variations in pitch height and slope across the three high-level tones are evident that are not present in the Patient's tracing. Despite the variations across the three high-level tones for the Stimulus Tape and Control subjects, word meaning was not altered; it is this acoustically measurable variation in the production of tone contours and contrasts that we refer to as tone latitude. To estimate the mean amount of tone latitude regardless of tone type or position in an utterance, we averaged the patient ER means and the control ER means across tone positions to obtain the summary group means for Average F0, Initial F0 and Slope F0 shown in Table II. The performance of the control subjects suggests that Average F0 for a tone can vary by approximately 3.4 semitones, Initial F0 by approximately 3.2 semitones and Slope F0 by approximately 53 semitones/s without changing word meaning.
Since our previous research established, using a panel of judges, that the patients were severely compromised in their ability to repeat an utterance with affective variation (Edmondson et al., 1987; see Sections 2.5 and 3.3), we assume that while the control ERs represent the maximum range of tone latitude, consisting of affective contributions from the right hemisphere plus any intrinsic contributions from the left hemisphere (see below and Section 5.3), the patient ERs represent, at most, only a minimal degree of affective tone latitude with most of the range of variation consisting of intrinsic contributions from the left hemisphere. On this assumption, if the patient summary group mean ERs are subtracted from the control summary group mean ERs, then the component of tone latitude used by the right hemisphere for affective purposes can be estimated. The affective component of tone latitude was 1.5 semitones for Average F0, 1.2 semitones for Initial F0 and 24 semitones/s for Slope F0, which represent approximately 44%, 38% and 46% of the total tone latitude, respectively (Table II). Since patients were significantly impaired on the affective repetition task (Edmondson et al., 1987), the patient summary group mean ERs (Table II) provide an estimate of the intrinsic or nonaffective component of tone latitude that is presumably contributed during speech by the left hemisphere. The intrinsic component of tone latitude was 1.9 semitones for Average F0, 2.0 semitones for Initial F0 and 29 semitones/s for Slope F0 (Table II).

TABLE II. Summary group mean ER statistics for Average F0 (semitones), Initial F0 (semitones) and Slope F0 (semitones/s) as measures for calculating the Total, Intrinsic and Affective components of tone latitude in Taiwanese speakers

                   Average F0    Initial F0    Slope F0     Tone latitude
                   Mean (SEM)    Mean (SEM)    Mean (SEM)   component
Controls (C)       3.4 (0.22)    3.2 (0.22)    53 (4.8)     Total
Patients (P)       1.9 (0.15)    2.0 (0.16)    29 (2.3)     Intrinsic
(C-P)              1.5           1.2           24           Affective
[(C-P)/C] x 100    44%           38%           46%          % Affective
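The arithmetic behind Table II can be retraced with a short Python sketch. The dictionary layout and function name are assumptions for illustration, and the inputs are the rounded means printed in the table rather than the unrounded group means, so the recomputed percentages (roughly 44.1, 37.5 and 45.3) agree with the published 44%, 38% and 46% only to within about one percentage point.

```python
# Summary group mean ERs as printed in Table II (rounded values).
table_ii = {
    "Average F0 (semitones)": {"controls": 3.4, "patients": 1.9},
    "Initial F0 (semitones)": {"controls": 3.2, "patients": 2.0},
    "Slope F0 (semitones/s)": {"controls": 53.0, "patients": 29.0},
}


def latitude_components(controls, patients):
    """Split total tone latitude into intrinsic and affective components."""
    total = controls                     # control ER: total latitude
    intrinsic = patients                 # patient ER: intrinsic component
    affective = controls - patients      # (C - P)
    percent_affective = 100.0 * affective / controls  # [(C - P)/C] x 100
    return total, intrinsic, affective, percent_affective


components = {name: latitude_components(**means)
              for name, means in table_ii.items()}

for name, (total, intrinsic, affective, pct) in components.items():
    print(f"{name}: total={total}, intrinsic={intrinsic}, "
          f"affective={affective:.1f}, percent affective={pct:.1f}")
```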
5. Discussion
In order to be able to interpret our results, the most pressing questions to consider are whether tones are perceived categorically and whether the quantitative acoustic measurements of tone latitude have psycholinguistic relevance. Although our findings are very robust statistically, it is possible that the acoustic differences measured for assessing the range of tone latitude may not be either perceivable or used by the Taiwanese listener in making affective judgements since there are other acoustical phenomena that contribute to affect in Taiwanese, such as changes in key, timing and loudness (Edmondson et al., 1987). To answer the question directly, a complicated psychoacoustic study would have to be done whereby various aspects of pitch, intensity and timing, as represented by the acoustic parameters F0 Register, F0 Attack, Delta F0, dB Register, Delta dB and Total Time (Edmondson et al., 1987), were individually varied holding the others constant in order to sort out the relative contribution of each to the psychoacoustic perception of affect in Taiwanese. However, psychoacoustic studies exist that deal with the categorical perception of tones and just-noticeable differences (JND) for F0 contours which might answer indirectly, at least, whether the amount of tone latitude measured in our data could be perceived as acoustic contrasts by Taiwanese speakers.
5.1. Categorical perception of tones
Research in the area of categorical perception has not produced a clear-cut answer in regard to how tones are perceived by tone language speakers. Some evidence supports categorical perception (Chan, Chuang & Wang, 1975; Wang, 1976) whereas other evidence supports more continuous perception (Abramson 1961, 1979).
On reviewing the literature, we think that the differences in findings can be attributed to whether subjects had some conception of the "tone space" for the various synthesized stimuli that were given to them and to whether subjects had previous training or experience with psychoacoustical experiments, enabling them to respond to stimuli on a strictly acoustic rather than a linguistic-phonemic basis. In fact, the discrepant findings in tone perception are very reminiscent of the issues involving categorical perception of vowels and, to a lesser degree, consonants (Repp, 1984). Most likely tones have, at least, a good degree of categoricity; otherwise it would be very difficult to explain why tones are perceived and produced phonologically as part of syllabic units and why they are resistant to degradation under adverse and varying acoustical circumstances, such as presentation of productions by speakers with different natural F0s (male vs. female vs. children) or by patients with dysarthria. Some degree of categoricity would also be consistent with our findings regarding tone latitude.
5.2. JNDs for synthesized F0 contours
A number of psychoacoustical experiments using synthesized vowels with level F0 (Flanagan & Saslow, 1958; Klatt, 1973) have found that the F0 JND varies between 0.3 and 0.5 Hz (0.04 to 0.07 semitones) when tested at either 80 or 120 Hz. This finding is surprisingly close to the JND reported for pure tones (Harris, 1952). However, if F0 JNDs are measured when the synthesized vowel has either a rising or falling F0 slope, the results change considerably (Klatt, 1973). For a 250 ms pitch contour which descends as a linear ramp from 135 to 105 Hz at 40 Hz/s, the F0 JND varies between 2.0 and 2.5 Hz (0.25 and 0.32 semitones) depending upon the vowel used. When Klatt (1973) explored the F0 JND for detecting differences in slope rather than in the absolute pitch of a contour, he found that to be perceived as different from a level tone of 120 Hz delivered over 250 ms, the test stimulus must have a minimum slope of 12 Hz/s (1.7 semitones/s), which corresponds to an initial frequency difference of +1.5 Hz and a final frequency difference of -1.5 Hz. To be perceived as different from a falling tone starting at 135 Hz and falling linearly to 105 Hz over 250 ms, a minimum change in slope of 32 Hz/s (4.8 semitones/s) is needed, which corresponds to an initial frequency difference of +4 Hz and a final frequency difference of -4 Hz. In a complex psychoacoustical study, 't Hart (1981) examined the ability of Dutch speakers to discriminate which of two synthesized multisyllabic words (number names) contained the larger rising or falling pitch movement. The words (and their pitch movements) were embedded in the center of larger frame sentences (and intonation contours).
The embedded pitch movements varied between one and six semitones, and a total of four different frame sentences was used in 90 combinations from a potential of 324 combinations. In some sentence pairs the overall pitch registers differed while in others they did not, and in most pairs the frame sentences also differed, thus making the task quite difficult. As one might expect, 't Hart (1981) reported tremendous variability in the F0 JND, which was dependent on the listeners' acoustic strategy for making the judgment and on whether they had previous experience with psychoacoustical experiments. The JNDs ranged from one semitone to greater than five semitones, with the rising mid-pitch stimuli being easier to differentiate than the falling mid-pitch stimuli. When those sentences with equal overall pitch registers were examined, however, the mean JND was smaller and more consistent between subjects; for the rising situation the JND was 1.6 semitones and for the falling situation the JND was 2.1 semitones. 't Hart (1981) concluded from his study that "only differences of more than three semitones play a part in communication situations". Similar results were also reported by Harris & Umeda (1987) in a study using naturally spoken sentences.
In contrast, however, are two other studies in which listeners were required to make linguistic judgements rather than the traditional psychoacoustic judgements regarding F0 differences. In one, Rietveld & Gussenhoven (1985) tested the ability of 30 untrained Dutch speakers to judge differences in "prominence" corresponding to various manipulations of F0. The stimuli were four naturally spoken sentences, resynthesized with stylized F0 contours. Various versions of each sentence were
created, which differed primarily in the size of the F0 excursion on accented syllables. Subjects were presented with pairs of stimuli and were asked to rate the difference in prominence of the accents in the two sentences on a five-point scale. Rietveld & Gussenhoven (1985) concluded from their results that relatively small changes in pitch excursions (1.5 semitones) were easily perceived as changes in prominence by the test subjects, who were also able to order the degree of prominence appropriately between stimulus pairs. Unfortunately, the JND limit for detecting prominence produced by F0 contours was not determined.
The second study was an experiment by Zue (reported by Klatt, 1973) in which the F0 contours of the four lexical tones of Mandarin Chinese were recorded from isolated real-speech syllables and superimposed on the synthetic syllable /Pa/. Although all four tones had different F0 contours and mean F0 pitch heights, as a group they were centered at 140 Hz with a range of 100 to 180 Hz. The group range was then synthetically and incrementally reduced, keeping all relative F0 contour relations constant. Zue found that only when the group range was reduced to less than the 4 Hz span from 138 to 142 Hz (0.49 semitones) did the accuracy of native speakers (n = 3) in correctly identifying the four lexical tones become less than 90%. (This experiment also reinforces the idea briefly alluded to in Section 5.1 that a subject's knowledge of the tone space of psychoacoustic stimuli may be a crucial variable to control when attempting to determine if lexical tones are perceived categorically.)
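The semitone equivalents quoted in this section can be checked with the standard equal-tempered conversion, 12 × log2(f_test / f_ref). The sketch below is illustrative only. The function names are hypothetical, and it assumes that each JND is expressed relative to the stated reference frequency (120 Hz for the level stimuli, the 135 Hz onset for the falling ramp); the cited studies are not quoted here as specifying this, but the assumption reproduces the figures given above.

```python
import math


def semitones(f_ref_hz: float, f_test_hz: float) -> float:
    """Signed interval from f_ref_hz to f_test_hz in equal-tempered semitones."""
    return 12.0 * math.log2(f_test_hz / f_ref_hz)


def slope_semitones_per_s(f_start_hz: float, f_end_hz: float, duration_s: float) -> float:
    """Average slope of a contour in semitones per second (signed)."""
    return semitones(f_start_hz, f_end_hz) / duration_s


# Level-F0 JNDs of 0.3 and 0.5 Hz, assumed to be measured against 120 Hz.
level_jnds = [semitones(120.0, 120.0 + delta) for delta in (0.3, 0.5)]

# Falling-ramp JNDs of 2.0 and 2.5 Hz, assumed relative to the 135 Hz onset.
ramp_jnds = [semitones(135.0, 135.0 + delta) for delta in (2.0, 2.5)]

# Minimum detectable slope against a level 120 Hz tone: onset +1.5 Hz and
# offset -1.5 Hz over 250 ms, i.e. 3 Hz / 0.25 s = 12 Hz/s.
min_slope = abs(slope_semitones_per_s(121.5, 118.5, 0.250))

# Zue's compressed Mandarin tone range, 138 to 142 Hz.
zue_range = semitones(138.0, 142.0)

for label, value in [
    ("level JND, 0.3 Hz (semitones)", level_jnds[0]),
    ("level JND, 0.5 Hz (semitones)", level_jnds[1]),
    ("ramp JND, 2.0 Hz (semitones)", ramp_jnds[0]),
    ("ramp JND, 2.5 Hz (semitones)", ramp_jnds[1]),
    ("minimum slope (semitones/s)", min_slope),
    ("Zue range (semitones)", zue_range),
]:
    print(f"{label}: {value:.2f}")
```

Under these assumptions the computed values round to the 0.04, 0.07, 0.25, 0.32, 1.7 and 0.49 quoted in the text. Because the semitone scale is logarithmic, the same difference in Hz corresponds to a larger interval at a lower reference frequency, so the choice of reference matters for small JNDs.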
5.3. Application of psychoacoustical data to measures of tone latitude
The above psychoacoustic and linguistic studies of F0 JNDs strongly suggest that the F0 differences associated with tone latitude in Taiwanese speech that we have measured should be readily perceivable by native speakers, regardless of whether the phonological tone contour is classified as level, rising or falling (Table I). For Average F0, Initial F0 and Slope F0 (Table II), tone latitude for normal controls was 3.4 semitones, 3.2 semitones and 52 semitones/s, respectively, values that are between one and two orders of magnitude greater than the various psychoacoustic F0 JNDs described in Section 5.2. Even for patients one might expect that listeners could perceive psychoacoustical differences in their speech at a non-affective level of processing, since their tone latitudes for Average F0, Initial F0 and Slope F0 were 1.9 semitones, 2.0 semitones and 29 semitones/s, respectively (Table II). Exactly what kind of information, if any, is carried in this margin of tone latitude is not known, but it could conceivably contribute, for example, to the idiolectal or dialectal aspects of voice quality. If we focus on our calculated estimates of the component of tone latitude attributable to affect (Table II), all of the estimates are still far greater than their associated psychoacoustic F0 JNDs, with difference values for Average F0, Initial F0 and Slope F0 being equal to 1.5 semitones, 1.2 semitones and 23 semitones/s, respectively. Therefore, we conclude that the phonological manipulation of the pitch height and slope of tones can serve as one of the principal acoustic carriers of affect in Taiwanese, and that it should be readily perceivable by native speakers even though the lexical categories of tones remain invariant.
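The comparison made in Section 5.3 is simple arithmetic and can be laid out explicitly. The sketch below is illustrative and is not the analysis used in the study. It takes the Table II values quoted in the text, assumes that the affective component is the control value minus the patient value (an assumption that matches the difference values quoted in the text), and divides each value by the smallest and largest JNDs quoted in Section 5.2. The variable names, and the choice to apply the pitch-height JNDs to both Average F0 and Initial F0, are hypothetical.

```python
# Tone latitude values quoted in Section 5.3 (Table II).
# Units: semitones for the two pitch-height measures, semitones/s for slope.
controls = {"average_f0": 3.4, "initial_f0": 3.2, "slope_f0": 52.0}
patients = {"average_f0": 1.9, "initial_f0": 2.0, "slope_f0": 29.0}

# (smallest, largest) JNDs quoted in Section 5.2, in the same units.
# Assumption: the pitch-height JNDs apply to Average F0 and Initial F0 alike.
jnd_range = {
    "average_f0": (0.04, 0.32),
    "initial_f0": (0.04, 0.32),
    "slope_f0": (1.7, 4.8),
}

# Assumed definition: affective component = controls minus patients.
affective = {k: round(controls[k] - patients[k], 1) for k in controls}


def ratio_to_jnd(values: dict) -> dict:
    """For each measure, return (ratio to largest JND, ratio to smallest JND)."""
    ratios = {}
    for measure, value in values.items():
        smallest, largest = jnd_range[measure]
        ratios[measure] = (value / largest, value / smallest)
    return ratios


for group, values in [("controls", controls), ("patients", patients), ("affective", affective)]:
    for measure, (low, high) in ratio_to_jnd(values).items():
        print(f"{group:9s} {measure:10s} {values[measure]:5.1f} = {low:5.1f}x to {high:5.1f}x JND")
```

Under these assumptions, even the most conservative comparison (against the largest quoted JND) leaves every value, including the affective component, several times above threshold.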
5.4. Processing of tonal contrasts by the brain
One of the major roles of the brain in processing auditory information is to extract from the environment the relevant acoustical contrasts and features that carry meaning for a given language. These speech-related sounds will have defining or distinctive characteristics involving such factors as intonation, loudness, formants, voice quality, stops, pauses, voicing, etc., that specify for a language its phonetic and prosodic building blocks. Since language functions appear to be distributed between the two hemispheres (the left involved predominantly with linguistic and the right predominantly with affective functions), it is conceivable that the two hemispheres could extract meaning from different, similar, or even completely identical "overlaying" acoustical features that are relevant to either the linguistic or the affective signals. To this point, a number of dichotic listening tasks have demonstrated right ear/left hemisphere and left ear/right hemisphere differences in processing certain classes of sounds. The left hemisphere appears better at recognizing linguistic stimuli, such as stop consonants (Shankweiler & Studdert-Kennedy, 1967; Springer, 1973), digits (Kimura, 1961), nonsense syllables (Studdert-Kennedy & Shankweiler, 1970) and tones in tone language speakers (Van Lancker & Fromkin, 1973), whereas the right hemisphere appears better at recognizing nonlinguistic sounds such as nonverbal vocalizations (Carmon & Nachshon, 1973), pitch (Sidtis, 1981), melody (Kimura, 1964), chords (Gordon, 1970), and linguistic sounds associated with affective meaning (Bryden, 1982) and voice identity (Kreiman & Van Lancker, 1988).
In fact, some of the findings described above have been corroborated by metabolic studies of the brain, using positron emission tomography (Mazziotta, Phelps, Carson & Kuhl, 1982) and single photon scanning techniques (Maxmillian, 1982; Knopman, Rubens, Klassen & Meyer, 1982), and by dichotic listening tasks in patients with focal brain damage (Basso, Casati & Vignolo, 1977; Blumstein, Baker & Goodglass, 1977; Chobor & Brown, 1987; Shankweiler, 1966; Sidtis & Volpe, 1988; Van Lancker & Kreiman, 1987).
Additional studies have also demonstrated that processing of auditory stimuli by the brain can change dramatically depending on how and to whom the stimuli are presented. Using melodies presented in a monotic rather than a dichotic paradigm, Bever & Chiarello (1974) found a left ear (right hemisphere) advantage in musically naive subjects but a right ear (left hemisphere) advantage in trained musicians. This conclusion is also supported by a study by Mazziotta et al. (1982) using positron emission tomography. A study by Spellacy & Blumstein (1970), using a dichotic listening paradigm, found either a right or left ear advantage depending on whether the (identical) stimuli were embedded in a linguistic or musical context, respectively. When tones are embedded in words and presented dichotically, Van Lancker & Fromkin (1973) reported a right ear advantage for speakers of tone languages but not for speakers of non-tone languages; if the same tones are presented as hums, no ear advantage occurred for either group of subjects. These studies serve to underscore the idea that acquiring certain skills or changing the contextual environment of auditory stimuli may alter how the brain handles the stimuli, and they lend support to some of the issues and findings discussed in Section 5.1 regarding psychoacoustical experiments that attempt to show categorical perception for lexical
tones. Thus, it should not be surprising that for one hemisphere what constitutes a relevant acoustical contrast may not carry meaning for the other hemisphere.
In this context, then, the results of our current study can be interpreted straightforwardly. The utterances examined in this study carried both affective and linguistic information. Informal testing indicated that the linguistic content of all repetitions by all subjects could be easily understood by normal Taiwanese listeners. Thus, the essential acoustical features underlying lexical tone were sufficiently preserved in the utterances of patients despite the loss of affective variation (Edmondson et al., 1987) and, in some cases, the presence of dysarthria. Although our tone sample was small, the statistical results were very robust. As we suspected from our initial inquiry (Edmondson et al., 1987), one of the acoustic domains utilized by Taiwanese speakers for signalling affect occurs as an "overlay" on intrinsic pitch variations associated with tone production (intrinsic tone latitude) in a margin we refer to as the affective component of tone latitude. This overlay is possible because of the categorical perception of tones by speakers of Taiwanese. Focal damage to the right hemisphere impairs the ability to use tone latitude for affective expression but does not disturb the essential acoustical features underlying the phonological production of lexical tones. Whether our findings concerning tone latitude in Taiwanese can be generalized as a relatively universal acoustic feature of tone languages, however, awaits further study.
This research was supported in part by grants from the Research Advisory Group, Department of Veterans Affairs to E. D. Ross and from the Organized Research Fund, University of Texas to J. A. Edmondson. We are indebted to Diana Van Lancker and several anonymous reviewers for their suggestions and insightful critiques of the manuscript, with the usual disclaimers of responsibility applying.
References
Abercrombie, D. (1967) Elements of general phonetics. Chicago: Aldine Publishing Co.
Abramson, A. S. (1961) Identification and discrimination of phonetic tones, Journal of the Acoustical Society of America, 33, 842(A).
Abramson, A. S. (1962) The vowels and tones of standard Thai: acoustical measurements and experiments. Bloomington: Indiana University Research Center in Anthropology, Folklore and Linguistics; Publication 20.
Abramson, A. S. (1972) Tonal experiments with whispered Thai. In Papers in linguistics and phonetics to the memory of Pierre Delattre (A. Valdman, editor), pp. 31-44. The Hague: Mouton.
Abramson, A. S. (1975) The tones of central Thai: Some perceptual experiments. In Studies in Thai linguistics (J. G. Harris & J. Chamberlain, editors), pp. 1-16. Bangkok: Central Institute of English Language.
Abramson, A. S. (1976) Thai tones as a reference system. In Tai linguistics in honor of Fang-Kuei Li (T. Gething, J. Harris & J. Chamberlain, editors), pp. 1-14. Bangkok: Chulalongkorn University Press.
Abramson, A. S. (1979) The noncategorical perception of tone categories in Thai. In Frontiers of speech communication (B. Lindblom & S. Öhman, editors), pp. 127-133. London: Academic Press.
Basso, A., Casati, C. & Vignolo, L. A. (1977) Phonemic identification defects in aphasia, Cortex, 13, 84-95.
Benson, D. F. (1979) Aphasia, alexia and agraphia. Edinburgh: Churchill Livingstone.
Bever, T. G. & Chiarello, R. J. (1974) Cerebral dominance in musicians and nonmusicians, Science, 185, 537-539.
Blumstein, S. E., Baker, E. & Goodglass, H. (1977) Phonological factors in auditory comprehension in aphasia, Neuropsychologia, 15, 19-30.
Bolinger, D. (1964) Around the edge of language: Intonation, Harvard Educational Review, 34, 282-296.
Broca, P. (1865) Du siège de la faculté du langage articulé, Bulletin de la Société d'Anthropologie, 6, 377-393.
Bryden, M. P. (1982) Laterality: functional asymmetry in the brain. New York: Academic Press.
Carmon, A. & Nachshon, I. (1973) Ear asymmetry in perception of emotional nonverbal stimuli, Acta Psychologica, 37, 351-357.
Chan, S. W., Chuang, C.-K. & Wang, W. S.-Y. (1975) Cross-linguistic study of categorical perception for lexical tone, Journal of the Acoustical Society of America, 58 (suppl. 1), 119.
Chen, M. (1987) The syntax of Xiamen tone sandhi. In Phonological yearbook (C. T. Ewen & J. M. Anderson, editors), Vol. 4, pp. 109-149. Cambridge: Cambridge University Press.
Cheng, R. & Cheng, S. (1977) Taiwan Fujian Hua de Yuyin Jiegou ji Biaojinfa (Phonological structure and romanization of Taiwanese Hokkien). Taipei: Student Book Store.
Chobor, K. L. & Brown, J. W. (1987) Phoneme and timbre monitoring in left and right cerebrovascular accident patients, Brain and Language, 30, 278-284.
Crystal, D. (1969) Prosodic systems and intonation in English. Cambridge: University Press.
Darley, F. L., Aronson, A. E. & Brown, J. R. (1975) Motor speech disorders. Philadelphia: W. B. Saunders Co.
Edmondson, J. A., Chan, J.-L., Seibert, G. B. & Ross, E. D. (1987) The effect of right-brain damage on acoustical measures of affective prosody in Taiwanese patients, Journal of Phonetics, 15, 219-233.
Fairbanks, G. & Pronovost, W. (1939) An experimental study of the pitch characteristics of the voice during the expression of emotion, Speech Monographs, 6, 87-104.
Flanagan, J. L. & Saslow, M. G. (1958) Pitch discrimination for synthetic vowels, Journal of the Acoustical Society of America, 30, 435-442.
Gandour, J. (1978) The perception of tone. In Tone: a linguistic survey (V. Fromkin, editor), pp. 41-76. New York: Academic Press.
Gandour, J. & Dardarananda, R. (1983) Identification of tonal contrasts in Thai aphasic patients, Brain and Language, 18, 98-114.
Gandour, J. & Harshman, R. (1978) Cross language differences in tone perception: A multidimensional scaling investigation, Language and Speech, 21, 1-33.
Gandour, J., Petty, S. H. & Dardarananda, R. (1988) Perception and production of tone in aphasia, Brain and Language, 35, 201-240.
Gordon, H. W. (1970) Hemispheric asymmetries in the perception of musical chords, Cortex, 6, 387-389.
Gorelick, P. B. & Ross, E. D. (1987) The aprosodias: Further functional-anatomic evidence for organization of affective language in the right hemisphere, Journal of Neurology, Neurosurgery and Psychiatry, 50, 553-560.
Harris, J. D. (1952) Pitch discrimination, Journal of the Acoustical Society of America, 24, 750-755.
Harris, M. S. & Umeda, N. (1987) Difference limens for fundamental frequency contours in sentences, Journal of the Acoustical Society of America, 81, 1139-1145.
't Hart, J. (1981) Differential sensitivity to pitch distance, particularly in speech, Journal of the Acoustical Society of America, 69, 811-821.
Heilman, K. M., Scholes, R. & Watson, R. T. (1975) Auditory affective agnosia: Disturbed comprehension of affective speech, Journal of Neurology, Neurosurgery and Psychiatry, 38, 69-72.
Heilman, K. M., Bowers, D., Speedie, L. & Coslett, H. B. (1984) Comprehension of affective and nonaffective prosody, Neurology, 34, 917-921.
Hughes, C. P., Chan, J. L. & Su, M. S. (1983) Aprosodia in Chinese patients with right cerebral hemisphere lesions, Archives of Neurology, 40, 732-736.
Kimura, D. (1961) Cerebral dominance and the perception of verbal stimuli, Canadian Journal of Psychology, 15, 166-171.
Kimura, D. (1964) Left-right differences in the perception of melodies, Quarterly Journal of Experimental Psychology, 16, 335-358.
Klatt, D. H. (1973) Discrimination of fundamental frequency contours in synthetic speech: implications for models of pitch perception, Journal of the Acoustical Society of America, 53, 8-16.
Knopman, D. S., Rubens, A. B., Klassen, A. C. & Meyer, M. W. (1982) Regional cerebral blood flow correlates of auditory processing, Archives of Neurology, 39, 487-493.
Kreiman, J. & Van Lancker, D. (1988) Hemispheric specialization for voice recognition: Evidence from dichotic listening, Brain and Language, 34, 246-252.
Lehiste, I. & Peterson, A. (1959) Vowel amplitude and phonemic stress in English, Journal of the Acoustical Society of America, 31, 428-435.
Maxmillian, X. A. (1982) Cortical blood flow asymmetries during monaural verbal stimulation, Brain and Language, 15, 1-11.
Mazziotta, J. C., Phelps, M. E., Carson, R. E. & Kuhl, D. E. (1982) Tomographic mapping of human cerebral metabolism: Auditory stimulation, Neurology, 32, 921-937.
Naeser, M. & Chan, S. W. C. (1980) A case study of a Chinese aphasic with the Boston Diagnostic Aphasia Exam, Neuropsychologia, 18, 389-410.
Packard, J. L. (1986) Tone production deficits in nonfluent aphasic Chinese speech, Brain and Language, 29, 212-223.
Pike, K. (1943) Phonetics. Ann Arbor: University of Michigan Press.
Putonghua-Minnan Fangyan Cidian (Mandarin-Min Dialect Dictionary) (1982) Xiamen: Fujing Renmin Chubenshe (Amoy: Fujian People's Publishing House).
Rietveld, A. C. M. & Gussenhoven, C. (1985) On the relation between pitch excursion size and prominence, Journal of Phonetics, 13, 299-308.
Repp, B. H. (1984) Categorical perception: Issues, methods, findings. In Speech and Language: Advances in Basic Research and Practice (N. J. Lass, editor), Vol. 10, pp. 243-335. New York: Academic Press.
Ross, E. D. (1981) The aprosodias: Functional-anatomic organization of the affective components of language in the right hemisphere, Archives of Neurology, 38, 561-569.
Ross, E. D. (1985) Modulation of affect and nonverbal communication by the right hemisphere. In Principles of Behavioral Neurology (M.-M. Mesulam, editor), pp. 239-257. Philadelphia: F. A. Davis.
Ross, E. D. (1989) Prosody and brain lateralization: Fact vs fancy or is it all just semantics? Archives of Neurology, 45, 338-339.
Ross, E. D. & Mesulam, M.-M. (1979) Dominant language functions of the right hemisphere?: Prosody and emotional gesturing, Archives of Neurology, 36, 144-148.
Ross, E. D., Edmondson, J. A. & Seibert, G. B. (1986) The effect of affect on various acoustic measures of prosody in tone and non-tone languages: A comparison based on computer analysis of voice, Journal of Phonetics, 14, 283-302.
Ross, E. D., Edmondson, J. A., Seibert, G. B. & Homan, R. W. (1988) Transient loss of affective prosody following right-sided Wada Test, Brain and Language, 33, 128-145.
Shankweiler, D. P. (1966) Effects of temporal lobe damage on the perception of dichotically presented melodies, Journal of Comparative Physiological Psychology, 62, 115-119.
Shankweiler, D. & Studdert-Kennedy, M. (1967) Identification of consonants and vowels presented to left and right ears, Quarterly Journal of Experimental Psychology, 19, 59-63.
Shapiro, B. & Danly, M. (1985) The role of the right hemisphere in the control of speech prosody in propositional and affective contexts, Brain and Language, 25, 19-36.
Sidtis, J. (1981) The complex tone test: Implications for the assessment of auditory laterality effects, Neuropsychologia, 19, 103-112.
Sidtis, J. & Volpe, B. (1988) Selective loss of complex-pitch or speech discrimination after unilateral lesion, Brain and Language, 34, 235-245.
Spellacy, F. & Blumstein, S. (1970) The influence of language set on ear preference in phoneme recognition, Cortex, 6, 430-439.
Springer, S. (1973) Hemispheric specialization for speech opposed by contralateral noise, Perception and Psychophysics, 14, 391-393.
Stevens, K. N. & House, A. S. (1961) An acoustical theory of vowel production, Journal of Speech and Hearing Research, 4, 303-320.
Studdert-Kennedy, M. & Shankweiler, D. (1970) Hemispheric specialization for speech perception, Journal of the Acoustical Society of America, 48, 579-594.
Van Lancker, D. (1980) Cerebral lateralization of pitch cues in the linguistic signal, Papers in linguistics: International Journal of Human Communication, 13, 201-277.
Van Lancker, D. & Fromkin, V. (1973) Hemispheric specialization for pitch and "tone": Evidence from Thai, Journal of Phonetics, 1, 101-109.
Van Lancker, D. & Kreiman, J. (1987) Voice discrimination and recognition are separate abilities, Neuropsychologia, 25, 829-834.
Wang, W. S.-Y. (1976) Language change, Annals of the New York Academy of Sciences, 280, 61-72.
Wernicke, C. (1977) Der aphasische Symptomencomplex. Eine psychologische Studie auf anatomischer Basis. In Wernicke's works on aphasia. Sourcebook and Review (G. H. Eggert, translator). Paris: Mouton.
Williams, C. E. & Stevens, K. N. (1972) Emotions and speech: Some acoustical correlates, Journal of the Acoustical Society of America, 52, 1238-1250.
Zar, J. (1984) Biostatistical analysis. Englewood Cliffs: Prentice Hall.