Journal of Phonetics (1987) 15, 219-233
The effect of right-brain damage on acoustical measures of affective prosody in Taiwanese patients

Jerold A. Edmondson
Department of Foreign Languages and Linguistics, University of Texas at Arlington, Arlington, Texas, U.S.A.

Jin-Lieh Chan
Department of Neurology, Chang Gung Memorial Hospital, Linkou Medical Center, Taipei, Taiwan

G. Burton Seibert
Department of Community Medicine, Medical Computing Resources Center, University of Texas Health Science Center at Dallas, Dallas, Texas, U.S.A.

and Elliott D. Ross*
Department of Neurology, University of Texas Health Science Center at Dallas, 5323 Harry Hines Blvd, Dallas, TX 75235, U.S.A.

Received 15th June 1986, and in revised form 6th April 1987
Using computer-assisted measurements, loss of affective prosody in Taiwanese-speaking patients following right inferior-frontoparietal brain damage is investigated and the results compared to normals in order to determine the requisite acoustical signallers of affective prosody in a tone language. The results are also contrasted to similar data derived from English-speaking patients and are evaluated in terms of differential lateralization of language substrates in the brain.
1. Introduction
In Ross, Edmondson & Seibert (1986a) a quantitative computer-assisted method was presented for analyzing the affective components of prosody, encompassing emotions and attitudes, utilizing a set of 12 acoustical parameters that measured different aspects of timing, intensity and fundamental frequency (F0). The acoustical parameters were based, in part, on traditional measures for analyzing affective prosody (Fairbanks & Pronovost, 1939; Fairbanks & Hoaglin, 1941). New measures were also developed because simpler acoustical techniques (Ross, Holzapfel & Freeman, 1983; Shapiro & Danly, 1985) that easily distinguished right brain-damaged patients with loss of affective prosody from normals and other brain-damaged controls in English speakers were

*Author to whom correspondence and reprint requests should be addressed.
0095-4470/87/030219 + 15 $03.00/0  © 1987 Academic Press Limited
unable to do so for similar research groups in Taiwanese speakers (Edmondson & Ross, unpubl.). Aside from developing quantitative methods (Ross et al., 1986a), we also studied how well speakers of three tone languages (Mandarin Chinese, Taiwanese and Thai) and one non-tone language (English) were able to manipulate the various acoustical parameters for overall affective signalling across a spectrum of affects represented by neutral, happy, sad, angry and surprise. Such a comparison is of interest in approaching universal brain-language-acoustical relationships because the most salient physical concomitant of affective intonation and of the linguistic category of tone is manipulation of F0, thus providing the possibility for exploring a potential double dissociation and differential interactions among the three (Abramson, 1962; Chao, 1968; Connell, Hogan & Roszypal, 1983). The results clearly indicated that the presence of tone in a language adversely impacts the free use of F0 for affective-prosodic signalling. Specifically, English speakers were found to be superior to their cohort tone-language speakers in three of the five parameters that characterized the ability to manipulate F0, i.e. Delta F0, F0 Variation and F0 Slope [see below and Ross et al. (1986a) for definitions and discussions of these parameters]. English speakers singularly surpassed the tone-language speakers in their F0 range, variation in F0 slope and their ability to vary F0 locally, when contrasted to a baseline "flat" utterance, for affective purposes. On the other hand, we found no language-type superiority for two other F0-related parameters, F0 Register and F0 Attack, which measure the mean pitch register of an utterance and how quickly F0 is varied within its range across an utterance, respectively. In addition, no superiority for time- or intensity-related parameters was found.
It was concluded, therefore, that the reduced ability of tone-language speakers to manipulate certain F0-related features of the speech signal for affective purposes was due to the constraining influence of tonal contrasts on intonation. Expanding on Van Lancker's (1980) proposal concerning pitch cue lateralization in the brain, we also hypothesized that if the modulation of a specific acoustical parameter is lateralized in the brain, then the lateralization would be dependent on the behavioral properties of the parameter rather than on its acoustical properties. In order to continue this line of research, the next logical step would be to investigate right-brain damaged subjects in comparison with normal controls from populations of both tone and non-tone language speakers and cross-compare the populations utilizing the same research paradigms and measuring techniques described in Ross et al. (1986a). The major emphasis of this paper is to investigate loss of affective prosody from focal right-brain damage in patients who speak a tone language (Taiwanese) by determining the acoustical profiles associated with the loss and to verify psychoacoustically that normal speakers of the language can detect the affective flattening of voice rendered by the brain injury. Right-brain damaged subjects constitute a litmus test for the above research goals because of the right hemisphere's putative role in dominantly modulating the affective components of language (Ross, 1984). Knowledge of this role of the right hemisphere in language is based on a series of clinical studies (Ross, 1981; Hughes, Chan & Su, 1983; Gorelick & Ross, 1987) in both English- and Mandarin Chinese-speaking patients with focal right-brain lesions, which have demonstrated that various combinations of affective processing deficits, called aprosodias, occur depending upon lesion location.
The aprosodias appear to correspond in symptom-type to the well-studied combinations of linguistic deficits, called aphasias (Benson, 1979), that follow focal left-hemisphere damage. As the right hemisphere appears to control dominantly the affective components of language, a study of right-brain damaged patients who have loss of affective prosody should define clearly which acoustical properties of the speech signal, as measured by the 12 parameters, underlie affect in speech. Based on previous work (Ross et al., 1986a), we expected that the acoustical concomitants of affect in speakers of a tone language would be different from those of English speakers and would not involve the F0-related acoustical parameters F0 Variation, Delta F0 or F0 Slope. As will be shown below, however, most, but not all, of these expectations were met. The second important issue addressed here is a demonstration that the acoustical methods developed in Ross et al. (1986a) are both useful and powerful tools for studying prosodic phenomena.

2. Methods
2.1. Patients and controls

The patient group consisted of eight right-handed native speakers of Taiwanese, two women and six men, with a mean age of 61 ± 11 years; all had suffered right-hemisphere strokes and were hospitalized at Chang Gung Memorial Hospital, Linkou Medical Center near Taipei, Taiwan. They were interviewed and examined by one of us (J.L.C.), who is not only a native speaker of Taiwanese but also a fully trained neurologist. Each interview began with a few questions concerning the patient's awareness of his condition and a brief assessment of his linguistic capabilities. None of the patients showed any impairment of tones or segmental aspects of language that could be construed as aphasic. All the patients, however, evidenced emotional flattening of speech, including loss of affective repetition, with variable deficits in affective-prosodic comprehension, findings characteristic of either motor or global aprosodia (Ross, 1981; Hughes et al., 1983; Gorelick & Ross, 1987). They also had a left hemiplegia with variable sensory loss, as well as transient left hemi-spatial neglect with denial of illness, an expected behavioral consequence of right-brain damage. The computerized tomographic images (CT scans) of the brain demonstrated that the strokes involved, at the minimum, the right inferior-frontoparietal regions that border the Sylvian fissure and/or deep right-hemisphere structures (basal ganglia); the left hemispheres were free of lesions. Two patients had hemorrhagic and the remainder ischemic strokes. The interviews were conducted between the 10th and 47th days post-stroke.

The control group consisted of eight non-hospitalized native speakers of Taiwanese, three women and five men, with a mean age of 58 ± 8 years, who all resided in Taiwan. They were interviewed by J.L.C., who screened them for the presence of neurological or psychiatric illnesses; in addition, none were taking medicines that could alter affective prosody, such as neuroleptics.
2.2. Data collection

The voice data were recorded on a Sony WM-60 Professional Quality Cassette Tape Recorder using a Unisound EM-84 electret condenser microphone that was attached to the boom of a headset and positioned 2 cm from the mouth, just to the side of the airstream. This procedure maintained a fixed distance between the mouth and microphone irrespective of head movements and thus allowed accurate measurement of intensity variation. High-quality recording tape was used throughout to ensure good frequency response in the critical range of 70-8000 Hz. Patient interviews were conducted in the hospital in a clinical setting and control interviews were done at the subject's home.

2.3. The repetition task

After the subjects had been instructed in the conduct of the test and given an opportunity to respond to "live" verbal trials, a standardized stimulus tape was played to each. This tape contained various affective renditions of the sentence:

[lls3 bes3 k?l k' ua21 dian22yas3]
you want go see movies
"You are going to the movies"

produced at an earlier time by J.L.C. The tokens on the stimulus tape represented the affectively "best" and "most distinctive" productions of the sentence and were ordered as follows: (a) six neutrals, with one used as a "target" neutral for calculation of Delta F0 (see Section 2.4); (b) five angries; (c) five sads; (d) five surprises; and (e) five happies. Each subject was asked to model the stimulus sentence as closely as possible. After each token was played, the tape was paused until the subject responded. Retakes were allowed as needed, and patients were monitored carefully for any sign of disinterest or inattention. If necessary, the session was interrupted to refocus the patient on the task. The aim of this part of the experiment was to gather a spectrum of representative affective utterances in order to perform an analysis of emotional range as described previously (Ross et al., 1986a; see below).

2.4. Acoustical analysis of the repetition task

The voice recordings of patients and normals were analyzed acoustically using a PM Pitch Analyzer (Voice Identification, Inc.) that was interfaced to a PDP 11/23 computer (Digital, Inc.). Each subject's repetition responses were played into the Pitch Analyzer, which extracted the F0 (in Hz) and intensity (in dB) from the utterance and presented the extracted information as upper and lower traces on a cathode ray tube.
The fundamental frequency and intensity data were simultaneously accessed every 10 ms by the PDP 11/23 through a parallel interface. After removal of stray data points resulting from microphone artifacts, voice break-ups or other sampling errors, the digitized F0 and intensity curves for each utterance were analyzed in terms of the same 12 acoustical parameters that were designed previously to quantify affective prosody (Ross et al., 1986a). The full derivation of the acoustical parameters is described in that paper, which should be consulted for details and mathematical arguments. Before the 12 parameters were calculated, the F0 data (measured in Hz) were transformed into semitones, a logarithmically-based pitch scale (Fairbanks & Pronovost, 1939: 94). This transformation is necessary because many of the F0 parameters used rely on comparing intervals around different mean frequencies; absolute physical scales that use Hz are not interval-preserving with increasing pitch and are, therefore, inappropriate for our purposes (Ross et al., 1986a). Reduced to their most important features, these 12 acoustical parameters are:

(1) F0 Slope (semitones cs^-1): this is an overall estimate of whether an utterance's intonation is declining, rising or flat and is calculated using a linear regression analysis of the F0 data (Zar, 1984). Only those data points where voicing occurs are used in the computation.

(2) Total Time [log10(ms)]: this is a logarithmic conversion of the measured time of the utterance determined by the intensity trace on the PM Pitch Analyzer. This transformation is similar to the semitone transformation of the Hz data and allows relative comparisons of durational changes across different utterances (Ross et al., 1986a).

(3) Percent Pause Time: this is the percentage of time within an utterance where intensity drops to the baseline.

(4) Percent Voicelessness: this is the percentage of time within an utterance when voicing is not present.

(5) F0 Register (semitones): this is the mean F0 of an utterance calculated from all semitone data points during voicing. This measure provides no indication of the variation of F0 around its mean or how quickly it changes across the utterance (see below).

(6) F0 Variation (semitones): this is the standard deviation (n - 1 degrees of freedom) about the mean F0. This measure estimates the overall range of the F0 across the utterance but gives no information on how quickly F0 varies within its range.

(7) F0 Attack (semitones s^-1): this is a measurement of how quickly F0 varies within its range across the utterance. F0 Attack is determined by finding the slope between each consecutive semitone data point associated with voicing across the utterance and calculating the mean and standard deviation of the results. F0 Attack is equal to the standard deviation, a digitally-derived statistical estimation of the first derivative of the F0 contour with respect to time (Ross et al., 1986a).

(8) Delta F0 (semitones): this is a measure of how similar or different the affective intonation contour is from a neutral (flat) target utterance. After the F0 curve being analyzed is register and time adjusted to the target curve, the two curves are compared point-for-point and the absolute difference is calculated for each centisecond of voicing, as long as both curves evidence voicing. This measurement is very sensitive to local or short-range differences.
(9) dB Register (dB): this is the mean intensity of an utterance calculated using all intensity data points which are greater than zero. This measure provides no indication of the variation of intensity around its mean or how quickly it changes across the utterance (see below).

(10) dB Variation (dB): this is the standard deviation (n - 1 degrees of freedom) about the mean intensity of the utterance. This measure estimates the overall range of variation of the intensity across the utterance but gives no information about how quickly intensity varies within its range.

(11) dB Attack (dB s^-1): this is a measurement of how quickly intensity varies within its range across the utterance. dB Attack is determined in the same manner as F0 Attack (the standard deviation of the point-to-point slope changes in intensity across the utterance) and is a statistical representation of the first derivative, calculated digitally, of the intensity contour with respect to time (Ross et al., 1986a).

(12) Delta dB (dB): this measurement indicates numerically how similar or different the dB contours are between the utterance and a neutral target utterance. After the intensity curve being analyzed is register and time adjusted to the target curve, the two curves are compared point-for-point and the absolute difference is calculated at each centisecond. This measure is very sensitive to local or short-term differences between the curves but not to mean, long-range differences (Ross et al., 1986a).

2.5. The perceptual tasks

In order to verify that the emotional flattening of speech observed among patients could be discriminated and identified by native speakers, a two-fold perceptual experiment was devised. A second stimulus tape was constructed from the tape recordings of the subjects' affective repetitions as follows: for every subject the best affective response for each of the five affects was dubbed onto a second tape to produce an affective set. In addition, at the beginning of the second tape a similarly constructed affective set was recorded from the original stimulus tape produced by J.L.C. The affective sets of patients and controls were randomly ordered on the second tape. Five native speakers of Taiwanese living in Dallas, Texas were asked to assess the 16 affective sets in comparison to J.L.C.'s set using an ordinal ranking scale of poor, fair, good or excellent. The judges were asked to rate the subjects on their overall affective variation across the five utterances comprising a set in comparison to J.L.C.'s set.

A third stimulus tape was constructed from the second as follows: the 80 subject responses that had been dubbed onto the second stimulus tape as 16 subject sets (five affects per set) were randomly recorded, one at a time, onto a third stimulus tape. Seven native speakers of Taiwanese living in Dallas were then asked to indicate on a scoring sheet the affect portrayed for each token, using the forced choices of neutral, happy, sad, surprise or angry.
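To make the F0-related measures of Section 2.4 concrete, the computations behind parameters (1) and (5)-(7) can be sketched in present-day code. This is our own illustration, not the original PDP 11/23 software; the function names, the 100-Hz semitone reference and the simplifying assumption that voiced frames are contiguous are ours.

```python
import math

def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert Hz to semitones above a reference (12 semitones per octave)."""
    return 12.0 * math.log2(f0_hz / ref_hz)

def f0_parameters(f0_hz, frame_s=0.01):
    """Sketch of F0 Register, F0 Variation, F0 Attack and F0 Slope.

    f0_hz holds one F0 sample per 10-ms frame; 0 marks an unvoiced frame.
    Simplification (ours): voiced frames are treated as contiguous in time.
    """
    st = [hz_to_semitones(f) for f in f0_hz if f > 0]   # voiced frames only
    n = len(st)
    register = sum(st) / n                              # (5) mean F0, semitones
    variation = math.sqrt(sum((x - register) ** 2 for x in st) / (n - 1))  # (6)
    # (7) F0 Attack: SD of the point-to-point slopes of the semitone contour
    slopes = [(b - a) / frame_s for a, b in zip(st, st[1:])]
    mean_sl = sum(slopes) / len(slopes)
    attack = math.sqrt(sum((s - mean_sl) ** 2 for s in slopes) / (len(slopes) - 1))
    # (1) F0 Slope: least-squares regression of semitones on time (voiced points)
    t = [i * frame_s for i in range(n)]
    tbar = sum(t) / n
    slope = (sum((ti - tbar) * (yi - register) for ti, yi in zip(t, st)) /
             sum((ti - tbar) ** 2 for ti in t))
    return register, variation, attack, slope
```

On a contour that rises by a constant 0.1 semitone per 10-ms frame, for example, the regression slope comes out at 10 semitones per second while F0 Attack is 0, since every local slope equals the mean; note that the paper reports F0 Slope per centisecond, so this figure would be divided by 100 to match its units.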
3. Results

3.1. Data reduction and statistical analysis of acoustical data

For every subject, five replicate tokens for neutral, angry, sad, surprise and happy were analyzed in terms of the 12 acoustical parameters described in Section 2.4. After all the subject's utterances were processed, the derived data were reduced by averaging the results across the five replicate tokens for each affect for every subject in order to obtain a mean affective response for every acoustical parameter, a statistic which represents the most precise, least-variable estimate of that response. A summary data file was then transferred to a VAX 11/780 (Digital, Inc.) for detailed statistical analysis using SAS (SAS Institute, Inc.). As the aim of the work was to compare statistically the performance of impaired individuals to normals in regard to their ability to signal affect across the spectrum represented by neutral, happy, sad, angry and surprise, we used the summary statistic called Emotional Range (ER), which was developed in our prior study. As was discussed in Ross et al. (1986a), such data reduction is crucial, because previous studies by Lieberman (1961) and Williams & Stevens (1972) have shown that there is an interaction between speakers and emotions for a given affect. Some speakers, for example, exhibit the maximum mean affective response for Delta F0 during surprise, others during angry. In order to derive an overall measurement of affective signalling for each acoustical parameter, the summary statistic, ER, is defined as the maximum mean affective response minus the minimum mean affective response across the set of five affects. The derived measurement of ER, therefore, estimates how well a particular acoustical parameter is utilized over its greatest range for affective signalling, without regard to which affects represent the minimum or maximum values. (The actual raw ER data for each subject can be found in Table II.)
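The data reduction just described amounts to two small steps per subject and parameter: averaging the replicate tokens, then taking a max-minus-min over the five mean affective responses. A minimal sketch (the Delta F0 numbers below are invented for illustration, not taken from Table II):

```python
def mean_affective_response(replicates):
    """Average a subject's replicate tokens for one affect and one parameter."""
    return sum(replicates) / len(replicates)

def emotional_range(mean_responses):
    """ER: maximum minus minimum of the mean affective responses across the
    five affects, regardless of which affect supplies either extreme."""
    return max(mean_responses.values()) - min(mean_responses.values())

# Hypothetical Delta F0 (semitone) means for one speaker:
delta_f0 = {"neutral": 5.0, "happy": 90.0, "sad": 40.0,
            "angry": 85.0, "surprise": 110.0}
er = emotional_range(delta_f0)  # 110.0 - 5.0 = 105.0
```

Because ER ignores which affect attains the extremes, the speaker-by-emotion interaction noted by Lieberman (1961) and Williams & Stevens (1972) does not distort the comparison: a speaker who peaks during angry rather than surprise receives the same kind of score.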
After all ER values had been calculated, the subjects were grouped into patients and controls and a mean ER value was derived for each acoustical parameter by group. The group-mean ER (gmER) results for each acoustical parameter were then compared employing a t-test for independent groups. Values of p greater than 0.01 were considered
TABLE I. Summary statistics and data analysis of acoustic results

(a) Group-mean emotional ranges for subjects by F0-dependent acoustic parameters

                     F0 Slope      F0 Register   F0 Variation   F0 Attack     Delta F0
                     Mean   SD     Mean   SD     Mean   SD      Mean   SD     Mean   SD
Controls (n = 8)     4.8    1.2    10.4   3.1    1.0    0.3     3.0    1.0    121    26
Patients (n = 8)     3.1    1.8    2.7    1.7    0.7    0.3     0.9    0.5    55     28
t-Test (df = 14)     2.24          6.42          2.02           5.44          4.90
p                    0.02          < 0.001       0.03           < 0.001       < 0.001
Result category*     ns            hs            ns             hs            hs

(b) Group-mean emotional ranges for subjects by intensity-dependent acoustic parameters

                     dB Register   dB Variation   dB Attack     Delta dB
                     Mean   SD     Mean   SD      Mean   SD     Mean   SD
Controls (n = 8)     8.6    1.7    1.2    0.3     5.3    2.0    333    109
Patients (n = 8)     4.6    1.8    1.2    0.5     4.0    0.7    181    83
t-Test (df = 14)     4.51          0.00           1.67          3.13
p                    < 0.001       0.5            0.06          0.004
Result category*     hs            ns             ns            s

(c) Group-mean emotional ranges for subjects by time-dependent acoustic parameters

                     Total Time    Percent Pause Time   Percent Voicelessness
                     Mean   SD     Mean   SD            Mean   SD
Controls (n = 8)     0.24   0.03   6.7    6.6           17.9   4.8
Patients (n = 8)     0.15   0.06   1.9    2.6           11.7   6.4
t-Test               3.97          1.92                 2.19
p                    < 0.001       0.04                 0.02
Result category*     hs            ns                   ns

*hs, Highly significant; s, significant; ns, non-significant.
non-significant (ns). The threshold of significance was chosen as p = 0.01 because we were seeking robust differences between the two groups and because multiple t-tests had been performed on the data (Zar, 1984). The results were divided into three categories: gmERs involving (a) fundamental frequency, (b) intensity and (c) time. Summary statistics for the three categories are presented in Table I together with the results of the t-tests and the accompanying p values for the differences between the groups. In addition, a result category is shown, whereby p values less than or equal to 0.001 are labelled highly significant and those less than or equal to 0.01 are labelled significant.

3.2. Acoustic results
Five acoustic parameters showed gmER differences between controls and patients that were highly significant (p < 0.001). Three of these were parameters involving F0, namely F0 Register, F0 Attack and Delta F0 [Fig. 1; Table I(a)]. In addition, one intensity-dependent parameter, dB Register, and one time-dependent acoustic

Figure 1. Group-mean Emotional Ranges (gmERs) for those acoustical parameters demonstrating highly significant or significant differences between patients and controls: (a) F0 Register, (b) F0 Attack, (c) Delta F0, (d) dB Register, (e) Total Time and (f) Delta dB; p < 0.001 for F0 Register, F0 Attack, Delta F0, dB Register and Total Time, and p < 0.01 for Delta dB. Error bars indicate one standard deviation.
parameter, Total Time, showed highly significant results [Fig. 1; Table I(b, c)]. A sixth acoustical parameter, Delta dB, also showed a significant difference between controls and patients [p < 0.01; Fig. 1; Table I(b)]. It should be pointed out that among the highly significant results, there was no overlap of individual ER values between patients and controls for F0 Register (Table II), with the gmER value for patients being 26% of the gmER value for controls [Table I(a)]. This was the most impaired acoustical parameter. F0 Attack, Delta F0, dB Register and Total Time were the next most compromised parameters, with the patients' gmER values being 30, 45, 53 and 63%, respectively, of the controls' gmER values [Table I(a, b, c)]. Typically, there was only a single overlapping ER value between patients and controls (Table II). Delta dB, which had a significance value of p = 0.004, showed somewhat more variance than the other parameters, with three overlapping ER values between controls and patients (Table II), and had a gmER value of 54% of the controls' gmER value [Table I(b)]. The rest of the acoustical parameters, F0 Slope, F0 Variation, dB Attack, dB Variation, Percent Voicelessness and Percent Pause Time, showed no significant differences between patients and controls.

TABLE II. Summary statistics and correlations between affective voice ratings by judges vs. emotional range (ER) results by acoustical analysis

Emotional ranges by acoustical parameter
Subjects     TAS*  F0    F0    dB    Total  Delta  Delta  %      F0     %      F0    dB    dB
(n = 16)           Reg.  Atk.  Reg.  Time   F0     dB     Pause  Slope  Voic.  Var.  Atk.  Var.
Controls
1            20    10.8  3.3   9.3   0.26   109    258    17.0   5.1    20.9   1.1   5.9   1.3
2            20    17.1  2.0   10.1  0.25   156    283    2.3    4.8    11.3   1.4   3.3   1.9
3            20    10.4  3.8   10.5  0.25   121    473    17.3   2.4    23.9   0.7   3.7   0.8
4            20    9.0   1.3   8.5   0.21   130    393    3.0    5.2    20.7   0.8   6.7   1.2
5            20    7.6   4.1   5.4   0.21   129    283    1.0    5.7    11.2   1.0   5.7   1.1
6            18    12.2  3.0   9.6   0.20   146    170    3.5    5.2    16.5   1.4   3.3   0.9
7            18    8.2   2.8   8.1   0.26   74     322    6.8    5.9    21.7   0.8   9.1   1.2
8            18    7.9   2.9   7.0   0.25   102    482    3.0    3.7    16.8   0.7   4.8
Patients
1            9     6.1   1.2   5.8   0.19   58     209    0.4    4.3    14.2   0.9   4.2   2.2
2            8     2.5   1.9   6.6   0.17   45     141    0.8    2.6    5.2    0.9   3.7   1.1
3            6     1.6   0.4   2.8   0.18   19     137    7.7    1.7    7.5    0.3   3.5   0.6
4            5     1.2   0.9   4.8   0.14   105    78     1.3    5.5    7.8    1.1   4.1   1.1
5            5     1.8   0.6   5.6   0.06   69     151    0.0    5.1    17.0   0.7   5.0   1.8
6            5     1.1   0.6   5.2   0.24   76     157    0.3    3.2    9.4    0.4   4.8   1.3
7            5     3.0   0.6   5.2   0.11   35     221    3.4    1.4    24.2   0.9   4.3   0.7
8            5     0.8   0.8   1.1   0.09   31     357    1.4    0.8    8.4    0.2   2.9   1.2

Spearman Rank correlations of Total Affective Score vs. ER results
rho          0.864  0.800  0.794  0.751  0.727  0.584  0.472  0.386  0.353  0.353  0.234  0.145
(p)†         <0.001 <0.001 <0.001 <0.001 <0.01  <0.05  ns     ns     ns     ns     ns     ns

*The Total Affective Score (TAS) is the sum of the five judges' ratings of affective repetition: E, excellent (score = 4); G, good (score = 3); F, fair (score = 2); P, poor (score = 1).
†ns, Not significant (p > 0.05).
3.3. Data reduction and statistical analysis of perceptual results

As discussed in Section 2.5, we investigated not only the acoustical properties of repeated affective utterances but also sought to verify that the affective flattening of voice we observed among patients was discriminable by native speakers. In the first perceptual experiment, judges were asked to rate the overall affective variation present in each of the 16 affective sets recorded on the second stimulus tape, with each subject's set comprising five affective tokens representing neutral, sad, happy, angry and surprise. The judges' ratings were based on an ordinal scale that was assigned numerical equivalents for data reduction and analysis (poor = 1; fair = 2; good = 3; excellent = 4). A Total Affective Score (TAS) for each subject was calculated by summing the five numerical scores across judges (Table II). Thus, the maximum TAS which a subject could receive was 20 and the minimum 5. As Table II clearly demonstrates, there was no overlap whatsoever between the TASs of controls and patients. The former ranged in value from 18 to 20 (mean = 19.3 ± 1.0), whereas the latter ranged from 5 to 9 (mean = 6.0 ± 1.6). On this basis it appears that native speakers have no trouble discriminating between the two groups in an absolute manner, thus rendering statistical analysis superfluous. This result agreed well with our own impressions of the taped utterances of patients vs. controls.

The second part of the perceptual experiment involved having seven native-speaking judges identify the affect portrayed in 80 randomly-presented tokens (40 patient and 40 control) on a third stimulus tape using a forced-choice response (see Section 2.5). The third tape was derived from the 16 affective sets, 80 tokens in all, on the second stimulus tape.
This experiment was done to address the issue of whether the flattening of affect caused by right-brain damage represents attenuation of affective production with retention of the ability to project affect, or a "true" loss of affective modulation. The judges' scoring results were reduced as follows. For each judge the number of correctly identified affective tokens was tabulated separately for patients and controls, and a percent correct score was derived based on 40 tokens for each subject group (Table III). Then a mean percent correct score with standard deviation for patients and controls was calculated across the judges. For the controls, the judges identified the intended affect correctly 66 ± 10% of the time. In contrast, for the patients they were correct only 29 ± 4% of the time, with no overlap in the individual percent correct scores between patients and controls (Table III). The groups were compared statistically utilizing the non-parametric Mann-Whitney test, which rejected the null hypothesis that the distributions of the judges' scores were equivalent for patients and controls (p < 0.001). In addition, it should be noted that the mean percent correct score for patients (29%) is fairly close to what would be expected if the judges' identifications were at the chance level, that is, 20% based on a forced choice of five possible responses. Thus, the results of the second perceptual experiment confirm robustly the clinical observations (Ross & Mesulam, 1979; Ross, 1981; Hughes et al., 1983; Gorelick & Ross, 1987) that acute right inferior-frontoparietal brain lesions cause a "true" loss of affective modulation in speech.
TABLE III. Summary statistics for judges' ability to identify intended affect in 80 token utterances

                               Percent correct scores across judges (n = 7)
Subject group                  1     2     3     4     5     6     7     Mean   SD
Controls (n = 8; 40 tokens)    55.0  55.0  57.5  67.5  70.0  75.0  80.0  65.7   10.2
Patients (n = 8; 40 tokens)    22.5  30.0  27.5  30.0  35.0  30.0  30.0  29.3   3.7
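Because the percent correct scores in Table III separate the groups completely, the Mann-Whitney result can be checked by hand. The sketch below implements the textbook U statistic on the Table III scores (our own code, not the authors' statistical software):

```python
from math import comb

def mann_whitney_u(xs, ys):
    """U: the number of (x, y) pairs with x > y, counting ties as 1/2."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in xs for y in ys)

controls = [55.0, 55.0, 57.5, 67.5, 70.0, 75.0, 80.0]  # percent correct, Table III
patients = [22.5, 30.0, 27.5, 30.0, 35.0, 30.0, 30.0]

u = mann_whitney_u(controls, patients)  # 49.0: every control beats every patient
# With complete separation U reaches its maximum of 7 * 7 = 49, so under the
# exact permutation null the two-tailed p is 2 / C(14, 7) = 2 / 3432, about
# 0.0006, consistent with the p < 0.001 reported in the text.
p_exact = 2 / comb(14, 7)
```

The design choice worth noting is the non-parametric test itself: with only seven judges per group and bounded percentage scores, a rank-based test avoids any normality assumption that a t-test would need.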
3.4. Correlation of perceptual with acoustical data

Although the judges in both perceptual experiments and the ER data for F0 Register were able absolutely to distinguish patients from controls, we felt that it would be valuable to explore statistically and in detail the correlation of the perceptual and acoustical results. In order to accomplish this, as shown in Table II, a series of Spearman Rank Correlations (Zar, 1984) was computed using the TAS and the ER results for each acoustical parameter across all subjects. As can be seen from Table II, every acoustical parameter found in Section 3.2 to be associated with affective flattening in patients was, for the most part, highly correlated with the perceptual results, thus substantiating that the acoustical techniques are in fact measuring, at some quantitative level, the intended behavior (affective prosody).

4. Discussion

4.1. Previous predictions and current results

Ross et al. (1986a) found that English speakers exhibited significantly greater ER values during affective repetition in comparison to tone-language speakers for the acoustic parameters F0 Slope, Delta F0 and F0 Variation. It was anticipated, therefore, that the "flattening of speech" in motor-type aprosodics who speak tone languages would be reflected primarily in reduced ER values for acoustic parameters other than F0 Slope, Delta F0 and F0 Variation. Ross et al. (1986a) argued that the presence of tones in a language caused these three parameters to be attenuated for affective purposes, because manipulating them for affective signalling would cause distortion in local pitch contrasts that might impact linguistic information.
Of the remaining nine parameters, it was strongly suspected that impaired individuals should show at least reduced F0 Register ER values, because tone-language speakers often seem to change the overall pitch height (F0 Register) of an utterance for affective signalling, a strategy that allows the speaker to maintain the relative values of local pitch differences and thus preserve tonal contrasts. Our assumptions that F0 Slope, F0 Variation and Delta F0 would not be compromised, and that F0 Register would be, in tone-language speakers after right-hemisphere strokes causing motor-type aprosodia were, by and large, confirmed. The results for F0 Slope and F0 Variation, which did not show significant reductions of gmER values, and F0 Register, which did show significant reductions, were as anticipated. Surprisingly, however, Delta F0 showed a highly significant reduction of the gmER value in patients vs. controls. This unpredicted but important result is discussed fully in Section 4.3. As anticipated, other acoustic parameters were found, in particular dB Register and Total Time, that were also impaired significantly by right-brain damage and thus are
important acoustical-prosodic signallers of affect in Taiwanese. It is also noteworthy that patients did not show increases of ER values in any of the other acoustical parameters, which indicates that the effect of right-brain damage is to reduce overall affective signalling rather than merely to rearrange the acoustic profile in a compensatory manner. Such overall attenuation of affective signalling was quite apparent when listening to the tapes, even to those of us who do not speak Taiwanese, a qualitative observation that was amply confirmed by the Taiwanese-speaking judges in the two perceptual experiments (see Section 3.3). Characterized qualitatively relative to that of normal controls, the speech of patients was weak in intensity, with reduced stiffness of the vocal folds and reduced subglottal airstream pressure, which often caused the patients to lapse into breathy and creaky phonation modes. These characteristics, coupled with inappropriate prosodic uniformity during affective repetition, gave the impression that the patients were emotionally flat.

4.2. F0 Attack, Total Time and dB Register
The significant reduction of the F0 Attack gmER is, in all likelihood, a secondary manifestation of the loss of the ability to vary the Total Time of an utterance for affective signalling. As was argued in Ross et al. (1986a), if the relative height and contour relationships among contrastive intonational constituents of an utterance are maintained while the utterance is simultaneously lengthened or shortened for affective purposes, then F0 Attack must decrease or increase, respectively. Because the brain-injured subjects showed a significantly reduced gmER for Total Time, it was to be expected that the F0 Attack gmER would be compromised. The reduction of dB Register and F0 Register gmERs in the patients indicates that these acoustical parameters are important measures of affective signalling in normals. This result is not unexpected because, as argued in Ross et al. (1986a), raising or lowering the overall mean F0, and by analogy the overall mean intensity, of an utterance should, ceteris paribus, not disturb tonal contrasts to any degree. The process is similar to raising or lowering the "key" of a piece of music to match the register of a singer, which alters the overall pitch but not the melody.

4.3. Delta F0 and signalling affect
As stated above, it was predicted that Delta F0 would be immune to right-hemisphere damage, as this parameter was constructed to be very sensitive to local rather than overall changes in pitch contours (Ross et al., 1986a). Because Delta F0 was markedly attenuated in normal tone-language speakers when compared to English speakers, it was concluded that this difference was attributable to the inability of tone-language speakers to alter the pitch contour locally for affective purposes, because making such changes could destroy tonal contrasts (Ross et al., 1986a). Nevertheless, Delta F0 proved to be reduced to a highly significant degree during affective signalling in Taiwanese after right-brain damage. The authors believe the explanation for this finding may lie in the fact that tones in context are allowed some prosodic variation from their citation forms without disrupting lexical information. Although the presence of tones in a language clearly places constraints on the manipulation of F0 (Ross et al., 1986a), enough freedom or play in the required precision of tone contrasts appears to remain for speakers to exploit for affective purposes. In order to measure the exact degree of tonal
freedom allowed during affective signalling, acoustical methods different from those presented here would have to be employed. Specifically, the variation of each tone-bearing unit within an utterance, i.e. the syllable, would have to be analyzed. It is virtually assured, however, that such variation exists, since Chang (1958), Ho (1977) and Connell et al. (1983) found that the fundamental frequency of a tone could be distorted by sentential prosody, e.g. interrogation, exclamation, etc., and yet the semantic value of that lexical item could still be recognized by native speakers. The constrained freedom to distort tones in context for affective or sentential prosody should not be confused with, but rather contrasted with, the phonological, rule-governed alterations in tone that are part of the linguistic system of a language, e.g. tone sandhi (Cheng, 1982). Our analysis would extend this claim by saying that native speakers of Taiwanese actually exploit the allowable imprecision in the F0 templates, or phonetic targets, of their tones to express affect, as stroke victims convincingly demonstrate when they lose this capacity.

4.4. Preliminary comparisons of loss of affective prosody in English and Taiwanese speakers

Although no study of the phonetics of affective loss in English-speaking patients with motor-type aprosodias using the acoustical analyses outlined in this paper is currently available (we have such a study underway), we do have relevant acoustic information bearing on this matter from five English-speaking patients who underwent a right-sided Wada test (Ross, Edmondson & Seibert, 1986b). The Wada procedure involves injecting sodium amytal into the right (or left) internal carotid artery, which supplies blood to most of the ipsilateral hemisphere. The amytal causes a transient neurological deficit, in this instance a left hemiplegia with sensory loss and impairment of affective prosody (see the footnote in Ross et al.
(1986a) for a more complete discussion of the Wada test). This procedure allows a within-subjects acoustical analysis of affective prosody before, during and after the Wada test. We found that for subjects injected on the right side, the ER values for the acoustic parameters F0 Register, F0 Variation and Delta F0 were all significantly reduced (p < 0.001) during the Wada test in comparison to their values before or after it. The ER of F0 Slope was also reduced from its baseline value during the Wada test (p = 0.004). None of the intensity or timing ER values differed significantly during the Wada test from baseline. Thus, the profile of acoustical parameters affected in English-speaking patients with right-brain impairment differs from that in Taiwanese-speaking patients. Furthermore, the average reduction of the Delta F0 ER in English speakers during the Wada test amounted to 209 semitones (pre-Wada Delta F0 gmER = 285 ± 49 semitones vs. Wada Delta F0 gmER = 76 ± 69 semitones), whereas in Taiwanese speakers the difference in Delta F0 gmER between normals and patients amounted to 66 semitones [normals' = 121 ± 26 vs. patients' = 55 ± 28 semitones; Table I(a)]. These differences underscore the fact that, even though Taiwanese speakers may exploit latitude in tone contrasts for affective signalling (see Section 4.3), the amount of local variation allowed for affective signalling as measured by Delta F0 (66 semitones) is far less than that available to English speakers (209 semitones). While these findings must remain tentative at this time, because the types of brain "lesions" employed in the two studies are not entirely commensurate even though the behavioral effects on voice are indistinguishable, they nevertheless appear to confirm
that tone and non-tone languages employ different combinations of acoustical features for affective signalling. English utilizes four of the five parameters involving fundamental frequency, whereas Taiwanese uses three of the five F0 parameters, one of which (F0 Attack) is not used by English speakers, in addition to two non-F0 parameters, dB Register and Total Time. The latter two parameters may represent compensatory use by tone-language speakers of certain non-F0 acoustic parameters for affective signalling, because of the constraints imposed by tone contrasts. It should be pointed out that Ross et al. (1986a) argued that tone languages did not seem to use non-F0 compensatory acoustic parameters for affective signalling, because English speakers performed as well as the tone-language speakers on the non-F0 parameters during the affective repetition task. The data presented here suggest that, in fact, tone-language speakers may use non-F0 compensatory parameters.

4.5. Conclusion
In this paper two categories of results have been presented: (a) verification of the adequacy and sensitivity of the acoustical methods developed in Ross et al. (1986a) for the study of affective prosody, a methodological finding; and (b) determination of the acoustical concomitants of affective prosody in a tone language, as revealed by comparing the performance of right-brain-damaged patients to normal subjects on an affective repetition task, a substantive finding. In regard to the former, this study grew largely out of the inadequacy of our initial attempts to extend the standard measuring techniques for quantifying affective prosody in English to tone languages. In Ross et al. (1986a) it was possible to use 12 acoustical measures of prosody to distinguish normal English speakers from normal tone-language speakers (Mandarin, Taiwanese and Thai). These same 12 measures also proved sufficiently sensitive to distinguish acoustically the affective consequences of right inferior-frontoparietal brain damage on the speech of Taiwanese patients from the speech of normals. Thus, the acoustical techniques have demonstrated their usefulness by raising the threshold of precision to a point at which linguistic and affective prosody can be separated, even in tone languages. Furthermore, when applied to English speakers the technique has yielded very robust data (Ross et al., 1986b), which suggests that these acoustical metrics are both powerful and general tools for analyzing prosody. The second category of results substantiates the division of labor between the hemispheres of the brain in regard to the linguistic vs. affective components of language (Van Lancker, 1980). As in Ross et al. (1986a), it was once again found that the communicative abilities of humans are lateralized according to the behavior itself (affective vs. linguistic) and not according to the physical/acoustical carrier that expresses this behavior.
As is evident from the data presented in Section 4.4, although different acoustic profiles underlie affective prosody in Taiwanese vs. English patients, the behavioral consequences are the same, i.e. affective flattening of the voice. Thus, human languages show the features of a composite that is the product both of the specific neurological organization of brain tissue and of the brain's ability to react to the acoustical properties of a particular language, i.e. tone vs. non-tone, during language acquisition.

This research was supported in part by grants from the Organized Research Fund of the University of Texas (J.A.E.) and The Stuttering Foundation, Baylor College of Medicine in Houston, Texas (E.D.R.). The authors thank David Rosenfield, M.D., for his generosity and encouragement, and Anna Morgan-Fisher for her technical assistance.
References

Abramson, A. S. (1962) The vowels and tones of standard Thai: acoustical measurements and experiments, International Journal of American Linguistics, 28, 2 (part II).
Benson, D. F. (1979) Aphasia, Alexia and Agraphia. Edinburgh: Churchill Livingstone.
Chang, T. N. C. (1958) Tones and intonation in the Chengtu dialect, Phonetica, 2, 60-84.
Chao, Y. R. (1968) A Grammar of Spoken Chinese. Berkeley: University of California Press.
Cheng, R. L. & Susie, S. (1982) Phonological Structure and Romanization of Taiwanese Hokkian. Taipei: Student Book Co., Ltd. (in Chinese).
Connell, B. A., Hogan, J. T. & Rozsypal, A. J. (1983) Experimental evidence of interaction between tone and intonation in Mandarin Chinese, Journal of Phonetics, 11, 337-351.
Fairbanks, G. & Pronovost, W. (1939) An experimental study of the pitch characteristics of the voice during the expression of emotion, Speech Monographs, 6, 87-104.
Fairbanks, G. & Hoaglin, L. W. (1941) An experimental study of the durational characteristics of the voice during the expression of emotion, Speech Monographs, 8, 85-90.
Gorelick, P. B. & Ross, E. D. (1987) The aprosodias: further functional-anatomic evidence for organization of affective language in the right hemisphere, Journal of Neurology, Neurosurgery & Psychiatry, 50.
Ho, A. T. (1977) Intonation variation in a Mandarin sentence for three expressions: interrogative, exclamatory and declarative, Phonetica, 34, 446-457.
Hughes, C. P., Chan, J. L. & Su, M. S. (1983) Aprosodia in Chinese patients with right cerebral hemisphere lesions, Archives of Neurology, 40, 732-736.
Lieberman, P. (1961) Perturbations in vocal pitch, Journal of the Acoustical Society of America, 33, 597-603.
Ross, E. D. (1981) The aprosodias: functional-anatomic organization of the affective components of language in the right hemisphere, Archives of Neurology, 38, 561-569.
Ross, E. D. (1984) Right hemisphere's role in language, affective behavior and emotion, Trends in Neurosciences, 7, 342-346.
Ross, E. D., Edmondson, J. A. & Seibert, G. B. (1986a) The effect of affect on various acoustic measures of prosody in tone and non-tone languages: a comparison based on computer analysis of voice, Journal of Phonetics, 14, 283-302.
Ross, E. D., Edmondson, J. A. & Seibert, G. B. (1986b) Transient loss of affective prosody following right-sided Wada test, Neurology, 36, 319 (abstract).
Ross, E. D., Holzapfel, D. & Freeman, F. (1983) Assessment of affective behavior in brain damaged patients using quantitative acoustical-phonetic and gestural measurements, Neurology, 33, 219-220 (abstract).
Ross, E. D. & Mesulam, M.-M. (1979) Dominant language functions of the right hemisphere?: prosody and emotional gesturing, Archives of Neurology, 36, 144-148.
Shapiro, B. & Danly, M. (1985) The role of the right hemisphere in the control of speech prosody in propositional and affective contexts, Brain and Language, 25, 19-36.
Van Lancker, D. (1980) Cerebral lateralization of pitch cues in the linguistic signal, Papers in Linguistics: International Journal of Human Communication, 13, 201-277.
Williams, C. E. & Stevens, K. N. (1972) Emotions and speech: some acoustical correlates, Journal of the Acoustical Society of America, 52, 1238-1250.
Zar, J. (1984) Biostatistical Analysis. Englewood Cliffs, NJ: Prentice Hall.