Journal of Phonetics (1983) 11, 373-382
Speaker normalization in perception of lexical tone
Jonathan Leather
Engels Seminarium, University of Amsterdam, Spuistraat 210, 1012 VT Amsterdam, The Netherlands
Received 11th May 1983
Abstract:
Listeners' decisions on vowel and consonant identity have been shown to depend upon inferences about the properties of the individual speech source, but little empirical attention has been given to (theoretically necessary) speaker-normalization in the recovery of linguistically significant pitch information from voice fundamental frequency (F0). In the investigation reported here, two sets of synthetic Mandarin tone stimuli, some with lexically ambiguous F0 contours, were embedded in natural speech utterances of two native speakers with different but overlapping voice ranges, and presented in a lexical identification test. While individual listeners differed somewhat in the acoustic criteria by which they apparently decided lexical identity, all listeners significantly assigned "ambiguous" stimuli with identical absolute F0 contours to different lexical categories depending on which of the speakers was heard to "produce" them. These results indicate that in the perceptual processing of F0, phonetic decisions are indeed referenced to an inferred scaling of the source voice range.
0095-4470/83/040373+10 $03.00/0 © 1983 Academic Press Inc. (London) Ltd.
Introduction
In arriving at a linguistic interpretation of an acoustic speech input, a listener has to accommodate not only to intra-speaker variation due to factors such as style and speech rate, but also to inter-speaker variation due to dialect and the diversity of individual vocal tract forms (Stevens, 1972). Although such acoustic-phonetic variability does not seem to cause listeners any difficulties, little is known about the perceptual adjustments that are presumably made to compensate for it (Klatt, 1979). Listeners' ability to "tune in" to different speakers can be explained by postulating a level of speech perception at which the acoustic parameters of the speech signal are "normalized" according to the inferred characteristics of the speech sound source before being interpreted in phonological terms (Fourcin, 1972, 1975). That such normalization may indeed be an additional stage in speech perception is suggested by the longer reaction times characteristic of identification tasks which theoretically require some perceptual modelling of a speaker's vocal tract (Summerfield & Haggard, 1975).
Several studies have provided experimental evidence of speech source inference in the perception of vocalic and consonantal stimuli. Listeners in the experiment of Ladefoged & Broadbent (1957) judged the identities of synthetic vowels differently according to the formant frequencies provided by introductory sentences. Robertson (1971) found that listeners labelled whispered single-formant synthetic vowel stimuli with and without various types of precursors according to values that would have characterized the second formant of the particular precursive source, and which could be inferred from the stimuli themselves. Fourcin (1968) investigated the effect of precursive child's or adult's speech on the listener's categorization of a range of synthesized whispered single-formant stimuli. The second formant transitions were found to be identified as either /b/ or /d/ depending on whether the source vocal tract was inferred to be that of a child or a man. Rand (1971) used synthetic CV syllables with two formants simulating the frequency ranges of an adult and a child, and found, like Fourcin, that in CV syllables stop consonants are perceived with reference to formant loci appropriate to the inferred source vocal tracts.
Voice pitch carries linguistic and paralinguistic information at a number of levels including the pragmatic (Brazil, 1978), textual (Bruce, 1982), interpersonal (Loveday, 1981), affective (van Bezooijen, 1980), syntactic (Cooper & Sorenson, 1981) and lexical (as in the experiment reported below). Pitch-mediated information is encoded in patterns of voice fundamental frequency (F0), and since the absolute range of F0 differs between speakers (and quite predictably so between men, women and children), a listener must theoretically relate F0 to the inferred range of a speaker in order to make any categorical judgements of pitch level: clearly, "high" and "low" can only be so identified in relation to a known range. Methods for the normalization of F0 data in acoustic analyses of speech have been described by Jassem (1975) and Takefuta (1975), but a complementary understanding of the normalization of F0 in speech perception, and of the origination of such a normalization ability, has yet to be reached.
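The requirement that "high" and "low" be judged against a known range can be illustrated with a short sketch. It is only one possible scheme, not the procedure of Jassem (1975) or Takefuta (1975) and not a model proposed in this paper: it assumes that pitch height is expressed as a proportion of the speaker's range on a logarithmic frequency scale. The function name and the 150 Hz probe value are hypothetical; the range limits are those listed for SN and TW in Table I.

```python
import math


def relative_height(f0_hz, range_low_hz, range_high_hz):
    """Position of f0 within a speaker's range on a log scale (0 = bottom, 1 = top)."""
    if f0_hz <= 0 or not (0 < range_low_hz < range_high_hz):
        raise ValueError("frequencies must be positive and the range non-empty")
    return math.log(f0_hz / range_low_hz) / math.log(range_high_hz / range_low_hz)


# Range limits in Hz as listed in Table I of this paper.
SPEAKER_RANGES = {"SN": (112.0, 204.0), "TW": (123.0, 234.0)}

# Hypothetical probe: the same absolute F0 attributed to each speaker in turn.
probe_hz = 150.0
heights = {
    name: relative_height(probe_hz, low, high)
    for name, (low, high) in SPEAKER_RANGES.items()
}

for name, height in heights.items():
    print(f"{name}: {probe_hz:.0f} Hz lies {height:.2f} of the way up the range")
```

Under these assumptions the same 150 Hz value sits higher in SN's range than in TW's, which is the kind of speaker-dependent difference a listener would have to take into account.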
Lexical tone systems offer a good "site" for the experimental study of perceptual normalization of F0, because categories of lexical identity established by tonal contrasts may well be clearer to the naive listener than other pitch-mediated contrasts in, for example, syntactic form or affect. Chan (1971) used synthetic larynx tones to study the influence of precursive male and female speech on the identification of Cantonese lexical tones, and was able to relate listeners' responses to their inferences of the ranges of the (hypothetical) voice sources. Broadly complementary results were reported by Ching (1981) in a cross-sectional study of children's developing ability to identify natural and synthetic stimuli patterned on the Cantonese tone system. The present experiment was designed to explore possible normalization of F0 in the perception of Mandarin tone, using realistic synthetic stimuli embedded in natural speech carriers.
Method
In Mandarin (here understood as standard Peking colloquial) Chinese, lexical contrasts may be realized solely by a phonetic system of tone, which has as its principal acoustic correlate a patterning of F0 (see, among other studies, Dreher & Lee, 1966; Chuang, Hiki, Sone & Nimura, 1971; Ho, 1976). The Mandarin tone system comprises the four contrasting citation-form pitch contours shown in stylized form by Chuang et al. (1971) in Fig. 1. Each pitch contour can be described as having a characteristic rise/fall/steady value on the dimension of pitch movement, and a characteristic related placement on the dimension of pitch height. Under conditions where pitch movement cues are reduced or suppressed, it may be expected that perceptual decisions as to tone category membership will be based also (or instead) on assessments of pitch height. Since individual speakers differ in their F0 ranges, any perceptual judgement of pitch height must be made relative to the inferred range of the speaker concerned. It was therefore hypothesized that stimuli modelled on minimal tonal pairs, but with pitch movement cues suppressed, would receive different lexical identifications as a function of inferred different speaker F0 ranges.
Figure 1. Simplified representation by Chuang et al. (1971) of pitch contours of Mandarin lexical tones. Horizontal axis: time, 0 to 0.8; vertical axis: a numbered pitch scale running up to 5.
Stimuli
The stimuli were syntheses modelled on the speech of two native male Mandarin speakers in their early twenties, TW and SN, who were born and grew up in Peking. Samples of their speech were collected in three tasks: (1) reading a 32-line excerpt from Li Xin-Tian's Shan shan de Hong Xing; (2) reading from cards three tokens each, in random order, of the monosyllabic words /y/, /ti/, /di/, /ky/, [one item illegible] and /da/ in each of the four tones; (3) reading from cards a Chinese sentence [characters illegible] (translatable as "What word is this: X?", X being each in turn of the four tonally-distinguished /y/ monosyllables). Data on the speakers' respective vowel formants were obtained from spectrographic analysis of the recorded speech waveforms. Data on their individual F0 ranges and their respective placement of the tone contours in those ranges were obtained by means of a laryngograph and a voiscope (Fourcin & Abberton, 1971). Figure 2 shows that the absolute frequency range in which SN starts his Tone 1 contours is the same as that in which TW, for his part, begins his Tone 2. TW's Tone 1 begins in an altogether higher frequency range, and SN's Tone 2 in a lower one.
Figure 2. F0 contours of Tone 1 and Tone 2 word tokens produced by (a) SN and (b) TW (three tokens each of a set of syllables whose transcription is illegible here), hand-traced from hard copy output from laryngograph/voiscope. Panels: Tone 1 and Tone 2; vertical axis: frequency (Hz); time scale bar: 250 ms.
These differences are consistent with differences between the general F0 ranges of TW and SN, which are plotted in Fig. 3 as long-term distributions of voice frequencies computed from the recordings of their readings from Shan shan de Hong Xing. It can be seen from Table I that while the two voice ranges largely overlap, they do differ by around two or three semitones on most measures.
Two sets of synthetic tone stimuli based on /y/ were created using the SPIT system (Huckvale, 1977) to control a parallel synthesizer (West, 1976). In each set, duration was constant (350 ms), and all stimuli had the same overall amplitude pattern. Values for both these parameters were based on means of tokens produced in the word list readings by TW and SN, which proved on analysis to be entirely consistent with those produced by four other native speakers who had provided speech samples in the same production tasks. All formant frequencies, amplitudes and bandwidths of the stimuli in each set were the same, but the two sets differed in being modelled on the vowels of TW and SN respectively. We will accordingly refer to the stimulus sets as "TW stimuli" and "SN stimuli". The F0 contours of all the stimuli were multilinear simplifications derived from averaged contours from the word list speech of TW and SN. Stimuli 1 and 9 in each set had "good" Tone 1 and Tone 2 F0 contours (see Fig. 4); stimulus 8 in each set had the same contour as stimulus 9 though a different starting frequency; the F0 contours of the other stimuli were interpolations between stimuli 1 and 9. As can be seen from Fig. 5, the F0 contours of the "SN" stimuli 1, 2, 7, 8, and 9 had starting frequencies approximately 2.3 semitones below the otherwise identical contours of the TW stimuli. The remaining four stimuli in each set had F0 contours which were identical in both sets.
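The semitone figures quoted above follow from the standard relation of 12 semitones per doubling of frequency. The sketch below applies it to the mean and range values listed for SN and TW in Table I, and to a starting frequency lowered by 2.3 semitones. The function names and the 150 Hz starting value are hypothetical and serve only as illustration.

```python
import math


def semitones_between(f_low_hz, f_high_hz):
    """Interval from f_low_hz up to f_high_hz in semitones (12 per octave)."""
    if f_low_hz <= 0 or f_high_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(f_high_hz / f_low_hz)


def shift_by_semitones(f_hz, semitones):
    """Frequency obtained by moving f_hz up (or, if negative, down) by `semitones`."""
    return f_hz * 2.0 ** (semitones / 12.0)


# Values in Hz as listed in Table I: (SN, TW) for each measure.
TABLE_I = {
    "mean": (141.0, 170.0),
    "range minimum": (112.0, 123.0),
    "range maximum": (204.0, 234.0),
}

differences = {
    measure: semitones_between(sn_hz, tw_hz)
    for measure, (sn_hz, tw_hz) in TABLE_I.items()
}
for measure, interval in differences.items():
    print(f"{measure}: TW is {interval:.1f} semitones above SN")

# Hypothetical TW starting frequency, lowered by 2.3 semitones for an SN stimulus.
tw_start_hz = 150.0
sn_start_hz = shift_by_semitones(tw_start_hz, -2.3)
print(f"{tw_start_hz:.0f} Hz lowered by 2.3 semitones is {sn_start_hz:.1f} Hz")
```

On the Table I values the means differ by a little over 3 semitones and the range limits by roughly 1.6 and 2.4 semitones.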
Previous experiments with the categorization of stimuli on a Tone 1-Tone 2 acoustic continuum (Leather, in preparation) had located the toneme boundary in the region where the F0 of stimuli increased at a rate of about 7 semitones/second. In their respective rates of rise the stimuli with these common F0 contours were located approximately on (in the case of stimuli 4 and 6) or clearly to one side of (stimuli 3 and 5) this toneme boundary. Their starting frequencies established them either within the normal starting frequency range of TW's Tone 2 and SN's Tone 1 (as in the case of stimuli 5 and 6), or (as in stimuli 3 and 4) below it, that is, in a range in which a TW Tone 1 would not be expected, but an SN Tone 2 entirely possible.
Each stimulus of each set was embedded in one of the four natural speech carrier sentences previously recorded (meaning "What word is this: X?") by the respective speaker, no systematic differences for either speaker having been found between the F0 patterns of the common parts of these four sentences. Two test tapes were created, each with ten occurrences in three random orders of each resulting test sentence type. To minimize the disparity between the natural speech carriers and the synthetic tone stimuli the tapes were low pass filtered at 2.5 kHz (48 dB/octave). From the recordings by TW and SN of the reading passage from Shan shan de Hong Xing 45-second excerpts were made, to be used to familiarize subjects with the speech of TW and SN. These reading excerpts were low pass filtered at 3.5 kHz (48 dB/octave) to minimize any disparity in subjective auditory quality between these and the test items.
Figure 3. Distribution of second order voice frequencies, 33 Hz-1 kHz, of informants TW (dotted line) and SN (solid line). Horizontal axis: frequency (Hz), 30 to 1000; vertical axis: probability.
Table I. Statistics over second order voice frequencies, 33 Hz-1 kHz, of informants TW and SN (values in Hz)
Informant   Mode   Mean   Range when p >= 0.01
SN          129    141    112-204
TW          204    170    123-234
Subjects
Subjects were four native Pekingese aged between 20 and 40, with no known speech or hearing defects, who were living for a period of 18 months in the Netherlands, and a bilingual Cantonese/Mandarin phonetician (Subject 3) brought up in Hong Kong but currently working in London, whose responses in previous tone categorization experiments had closely resembled those of Pekingese natives.
Figure 4. F0 and overall amplitude values of the "good" TW and SN Tone 1 and Tone 2 stimuli. (The other stimuli in each set had F0 contours interpolated between these extremes.) Horizontal axis: time t (ms), 0 to 350; vertical axes: frequency (Hz), 120 to 167, and amplitude (dB).
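How a set of contours might be interpolated between two extremes can be sketched as follows. This is an illustration under stated assumptions and does not reproduce the actual stimulus design: the breakpoint values are hypothetical, the interpolation is assumed to be linear in Hz at fixed time points, and the special treatment of stimulus 8 described in the text is ignored.

```python
def interpolate_contours(first, last, steps):
    """Return `steps` contours evenly spaced from `first` to `last`, both included.

    Each contour is a list of F0 values (Hz) at the same fixed time points.
    """
    if steps < 2:
        raise ValueError("need at least the two end-point contours")
    if len(first) != len(last):
        raise ValueError("contours must share the same breakpoints")
    contours = []
    for k in range(steps):
        weight = k / (steps - 1)  # 0.0 for the first contour, 1.0 for the last
        contours.append(
            [(1.0 - weight) * a + weight * b for a, b in zip(first, last)]
        )
    return contours


# Hypothetical breakpoints (Hz) at 0, 175 and 350 ms; not values from the paper.
level_contour = [150.0, 150.0, 150.0]   # stands in for a "good" Tone 1
rising_contour = [120.0, 135.0, 167.0]  # stands in for a "good" Tone 2

continuum = interpolate_contours(level_contour, rising_contour, 9)
for number, contour in enumerate(continuum, start=1):
    print(number, [round(value, 1) for value in contour])
```

Interpolating on a semitone scale instead of in Hz would be an equally defensible choice; the text does not say which was used.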
Figure 5. Starting frequencies and rates of rise of F0 contours of all nine stimuli in each set in relation to toneme boundaries. The rates of rise are values based on the difference between the maximum and minimum frequencies of each contour. The dotted lines mark off hypothetical areas of tone space according to (i) the Tone 1-Tone 2 labelling function crossover found in previous experiments to correspond to a 7 semitone/sec rate of rise; (ii) observed frequency ranges in which TW and SN begin their respective citation-form Tone 1 and Tone 2 contours. o = TW stimulus; + = SN stimulus. Horizontal axis: rate of rise (semitones/sec), 3 to 15; vertical axis: frequency (Hz), 120 to 167.
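The rate-of-rise measure described in the caption can be sketched as below. The caption says only that the rates are based on the difference between the maximum and minimum frequencies of each contour; dividing that interval by the full stimulus duration is an assumption made here, as are the function names and the example contours. The 7 semitone/sec crossover is the value given in the text.

```python
import math

TONEME_BOUNDARY_ST_PER_S = 7.0  # Tone 1-Tone 2 crossover rate given in the text


def rate_of_rise(contour_hz, duration_s):
    """Semitone interval between the lowest and highest F0 of a contour, per second.

    Assumes the contour rises; the direction of movement is not checked.
    """
    if duration_s <= 0 or min(contour_hz) <= 0:
        raise ValueError("duration and frequencies must be positive")
    span_semitones = 12.0 * math.log2(max(contour_hz) / min(contour_hz))
    return span_semitones / duration_s


def side_of_boundary(rate_st_per_s):
    """Place a rate on the Tone 1 (flatter) or Tone 2 (steeper) side of the crossover.

    A rate exactly on the boundary is assigned to the Tone 1 side by arbitrary choice.
    """
    if rate_st_per_s > TONEME_BOUNDARY_ST_PER_S:
        return "Tone 2 side"
    return "Tone 1 side"


# Hypothetical contours (Hz) over the 350 ms stimulus duration.
rising_rate = rate_of_rise([130.0, 140.0, 150.0], 0.35)
level_rate = rate_of_rise([150.0, 150.0, 150.0], 0.35)

print(f"rising: {rising_rate:.1f} st/s, {side_of_boundary(rising_rate)}")
print(f"level: {level_rate:.1f} st/s, {side_of_boundary(level_rate)}")
```

This covers only the rate-of-rise dimension of Fig. 5; the starting-frequency dimension, which depends on the speaker, is not modelled here.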
Procedure
Subjects first listened to one of the reading excerpts so as to become attuned to the speech of the reader. They then listened to the test stimuli based on the same speaker and assigned them to Tone 1 or Tone 2 categories on a response sheet which gave them a choice of two degrees of certainty ("probably" or "definitely") for each tone category. Thus, for each stimulus sentence the four possible response choices were (translatable as) "definitely fish", "probably fish", "probably muddy" and "definitely muddy". The tape was stopped after each sequence of 30 trials for 30 seconds' rest. Subjects 1, 2 and 3 heard and judged first the TW excerpt and test sentences, then the SN excerpt and test sentences. Subjects 4 and 5 were tested in the reverse order. All subjects, when questioned after the test, reported believing they had heard natural speech.
Results and discussion
All subjects made some "probably" judgements, notably on stimuli 3-7, which suggests that these stimuli were perceived as less clear tonal exemplars than the "good" stimuli 1 and 9. This is not surprising, since in stimuli 3-7 the rate-of-rise cue which is available in normal tone perception was partly or wholly suppressed. Stimuli 1, 2, 7, 8, and 9 in each set were consistently assigned by all subjects to the tone categories predicted for them in Fig. 5 (though stimulus 7, as mentioned, less confidently so). For statistical analysis all responses were reduced to binary categorizations (that is to say, the distinction between "probably" and "definitely" for each tone category was collapsed). For each stimulus pair TW3/SN3, TW4/SN4, TW5/SN5, and TW6/SN6, and each subject, χ² values were computed from a 2 x 2 contingency table of tone category (Tone 1 or Tone 2) and speaker (TW or SN). These χ² values, which are set out in Table II, indicate that all the subjects categorized stimuli with identical F0 characteristics differently according to which speaker they were "heard" from. For Subjects 4 and 5 the differences are less significant than for the other subjects, and this could conceivably be a consequence of their having judged the TW and SN sets of stimuli in the opposite order: relatively long attention to a single speaker while concentrating on a single phonetic contrast could possibly affect subjects' phonetic criteria in some way uncharacteristic of normal listening but widely observed in speech adaptation studies (Cooper, 1975).
It is noteworthy that while all subjects apparently make systematic speaker-related distinctions in their categorizations, they differ in their criteria for doing so: Subject 4 does not show significant speaker-related differences for the stimuli TW6/SN6, while the other subjects do; Subject 2, unlike the others, shows highly significant differences for the stimuli TW5/SN5; and so on. These are probably normal individual differences in the use of acoustic cues for phonetic decisions and in the inference of source F0 ranges. For instance, it may be that Subject 2 makes more use than the others of end-frequency in his evaluation of the pitch height of an F0 contour: this might explain why he made highly significant tonal distinctions between stimuli TW5 and SN5, which are not near to tone boundaries in either starting frequency or rise-rate alone.
There are indications that the ability apparently shown by subjects in the present experiment to normalize F0 data from different speech sources may be available at a very early age.
In the earliest stages of speech acquisition, the infant must parcel out the language-specific and source-specific properties of the speech to which he is exposed. As Lieberman (1980) has pointed out, infants who have no way of knowing what supraglottal vocal shapes are adopted by the speakers they hear never attempt to mimic the lower formant frequencies of adult vowels, but instead produce equivalent formant frequency patterns scaled to their own shorter vocal tracts. Similarly, although there are uncertainties about the relationship between F0 and perceived pitch in young infants, Kessen, Levine & Wendrich (1979) have found indications that infants of no more than three months can imitate vocal pitch patterns, and such an ability would presuppose the capacity to normalize the F0 data to which they have been environmentally exposed. At the age of 5-6 months, at any rate, infants can recognize melodic patterns under conditions of transposition (Chang & Trehub, 1977), and this ability presumably entails a comparable form of relational processing. In their study of the prosody of mothers' speech to their infants, Stern, Spieker, Barnett & MacKain (1983) found that pitch variation (measured in terms of range, terminal change, change across pauses and highest F0 per utterance) was at a peak around the infant age of four months. It seems reasonable to hypothesize that this peaking is functional. Dixon & Just (1978) found in tasks comparing shapes and hues that increasing disparity on a noncriterial dimension correlated with longer "same" reaction times, hence a presumed greater amount of normalization processing. Summerfield & Haggard (1975) reported broadly parallel results for a speech task.
Fourcin (1978) presents evidence of a mother apparently helping her baby to acquire the ability to normalize by reducing the difficulty of his pattern-matching task: she does this by reproducing her baby's F0 contours in what is for her an atypically high pitch range, thus minimizing the disparity in absolute pitch between the baby's utterance and her own version of it. A specially raised pitch range in a mother's speech may help her baby to learn to separate out language- and source-specific features of F0 contours by reducing, in this way, early pattern-matching demands upon him.
In the present experiment subjects made consistent speaker-related phonetic distinctions for F0 differences of the order of 2.3 semitones; but how finely-grained, in fact, is the adult listener's perceptual frame-of-reference for voice range, and what information does the listener draw upon to set up such a frame-of-reference? Phillips (1971) found that while American mothers' speech to children has a higher mean maximum F0 than their speech to other female adults, their mean minimum F0 remains the same. This finding was confirmed by Boyce & Menn (1979), leading to the suggestion that the F0 value reached at the end of an utterance-final fall, being near-constant for a given speaker, may constitute one reliable reference point for perceptual calibration of voice range (Stevens, Menn & Boyce, 1981). The expanded pitch range of mother-child speech revealed by these studies and those of Stern et al. (1983) would maximize the perceptual salience of pitch movements and pitch height contrasts, providing clear exemplars for the child of the pitch patterns to be learned.
Table II. Values of χ² (with Yates's correction, DF = 1) for speaker-related differences between Tone 1 and Tone 2 judgements
Subject   TW3/SN3   TW4/SN4   TW5/SN5    TW6/SN6
1         0         7.3*      2.8        10.2**
2         0         2.1       12.9***    5.9*
3         0         0         0          12.9***
4         0         4.3*      0          1.6
5         0         1.8       0          4.3*
*p < 0.05; **p < 0.01; ***p < 0.001.
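The χ² statistic with Yates's correction for a 2 x 2 table of speaker by tone category, as used for Table II, can be sketched as follows. The response counts are hypothetical: the paper reports only the resulting χ² values, not the underlying tables. Returning 0 when every response falls in one category is also an assumption of this sketch, and the function names are illustrative.

```python
def tone_counts(responses):
    """Collapse 'probably'/'definitely' and return [Tone 1 count, Tone 2 count]."""
    tones = [tone for _certainty, tone in responses]
    return [tones.count(1), tones.count(2)]


def yates_chi_square(table):
    """Chi-square (1 degree of freedom) with Yates's continuity correction.

    `table` is [[a, b], [c, d]]: rows are speakers, columns are tone categories.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        # A whole row or column is empty (for example, every response was
        # Tone 1), so no association can be measured; this sketch reports 0.
        return 0.0
    corrected = max(abs(a * d - b * c) - n / 2.0, 0.0)
    return n * corrected ** 2 / denominator


CRITICAL_P05 = 3.841  # chi-square critical value for p = 0.05 with 1 degree of freedom

# Hypothetical responses of one listener to one stimulus pair (not data from the paper).
tw_responses = [("definitely", 1)] * 6 + [("probably", 1)] * 4
sn_responses = [("probably", 1)] * 1 + [("probably", 2)] * 5 + [("definitely", 2)] * 4

table = [tone_counts(tw_responses), tone_counts(sn_responses)]
chi_square = yates_chi_square(table)
print(table, round(chi_square, 2), chi_square > CRITICAL_P05)
```

With these made-up counts, ten Tone 1 judgements for one speaker against one Tone 1 and nine Tone 2 judgements for the other, the statistic comes to about 12.9, well above the 0.05 critical value.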
Conclusion
The experiment reported here has provided support for the supposition that the inferred F0 range of a speaker is a determining factor in the categorization of lexical tones. The number of subjects was comparatively small, but all showed significant patterns of speaker-adaptation in their phonetic judgements. In a further study it would be desirable to use a larger number of subjects, and to alternate "speakers" between one trial and the next, or randomly, in order to minimize any possible over-adaptation to one speech source. It would also be preferable to adopt an experimental design which does not suppress an important acoustic cue (as in the present case rise-rate). This might be done using a language which has tones contrasting in pitch height alone, of which there are many (Maddieson, 1979). The learning of lexical tone systems by adult non-tonal speakers would be a suitable site for a study of normalization in one kind of phonological acquisition. An understanding of general capacities (and possible language-specific strategies) for perceptual normalization of F0 would seem to be a prerequisite for linguistic and acoustic studies of significant pitch variation of any kind: grammatical, discoursal and attitudinal as well as lexical.
I am grateful for indispensable advice and help from Adrian Fourcin and many other members of the Department of Phonetics and Linguistics, University College, London, and from Emil Kappner of the Instituut voor Fonetische Wetenschappen, Universiteit van Amsterdam. Shortcomings in this work are my own responsibility.
References
Bezooijen, R. van (1981). Characteristics of vocal expressions of emotion: pitch level and pitch range. Proceedings, Vol. 5, Institute of Phonetics, Catholic University of Nijmegen.
Boyce, S. & Menn, L. (1979). Peaks vary, endpoints don't: implications for intonation theory. Proceedings of the Fifth Annual Meeting of the Berkeley Linguistic Society, pp. 373-384.
Brazil, D. (1978). Discourse Intonation I & II. English Language Research, University of Birmingham, England.
Bruce, G. (1982). Textual aspects of prosody in Swedish. Phonetica, 39, 274-287.
Chan, A. (1971). A Perceptual Study of Tones in Cantonese. PhD dissertation, University College London.
Chang, H. & Trehub, S. (1977). Auditory processing of relational information by young infants. Journal of Experimental Child Psychology, 24, 324-331.
Ching Yuk Ching, T. (1981). Communication of Lexical Tone Patterns in Cantonese. PhD dissertation, University College London.
Chuang, C. K., Hiki, S., Sone, T. & Nimura, T. (1971). The acoustical features and perceptual cues for the four tones of standard Chinese. Proceedings of the Seventh International Congress on Acoustics, III, pp. 297-300. Budapest: Akademiai Kiado.
Cooper, W. E. (1975). Selective adaptation to speech. In: Cognitive Theory (F. Restle et al., eds). Vol. 1, pp. 23-54. Hillsdale, New Jersey: Lawrence Erlbaum.
Cooper, W. E. & Sorenson, J. M. (1981). Fundamental Frequency in Sentence Production. New York: Springer-Verlag.
Dixon, P. & Just, M. A. (1978). Normalization of irrelevant dimensions in stimulus comparisons. Journal of Experimental Psychology: Human Perception and Performance, 4, 36-46.
Dreher, J. J. & Lee, P. C. (1966). Instrumental investigation of single and paired Mandarin tonemes. Research Communication, 13. Douglas Aircraft Company Advanced Research Laboratory.
Fourcin, A. J. (1968). Speech source inference. Institute of Electrical and Electronic Engineers, AU-16, pp. 65-67.
Fourcin, A. J. (1972). Perceptual mechanisms at the first level of speech processing. Proceedings of the Seventh International Congress of Phonetic Science, pp. 48-59. The Hague: Mouton.
Fourcin, A. J. (1975). Speech perception in the absence of speech productive ability. In: Language, Cognitive Deficits and Retardation (N. O'Connor, ed.). London: Butterworths.
Fourcin, A. J. (1978). Acoustic patterns and speech acquisition. In: The Development of Communication (N. Waterson & C. Snow, eds). London: Wiley.
Fourcin, A. J. & Abberton, E. (1971). First applications of a new laryngograph. Medical and Biological Illustration, 21, 172-182.
Ho, A. T. (1976). The acoustic variation of Mandarin tones. Phonetica, 33, 353-367.
Huckvale, M. A. (1977). A guide to the synthesis programs installed on the Texas Instruments 990/10. Mimeo, University College London, Department of Phonetics and Linguistics.
Jassem, W. (1975). Normalization of F0 curves. In: Auditory Analysis and Perception of Speech (G. Fant & M. Tatham, eds). London: Academic Press.
Kessen, W., Levine, J. & Wendrich, K. A. (1979). The imitation of pitch by infants. Infant Behavior and Development, 2, 93-100.
Klatt, D. H. (1979).
Ladefoged, P. & Broadbent, D. E. (1957). Information conveyed by vowels. Journal of the Acoustical Society of America, 29, 98-104.
Lieberman, P. (1980). On the development of vowel production in young children. In: Child Phonology (G. H. Yeni-Komshian, J. F. Kavanagh & C. A. Ferguson, eds). Vol. 1, pp. 113-142. New York: Academic Press.
Loveday, L. (1981). Pitch, politeness and sexual role. Language and Speech, 24, 71-89.
Maddieson, I. (1979). Dimensions of tone systems. Proceedings of the Ninth International Congress of Phonetic Science, Vol. 1, p. 389 (abstract). The Hague: Mouton.
Phillips, J. R. (1971). Formal Characteristics of Speech which Mothers Address to Their Young Children. PhD dissertation, Johns Hopkins University.
Rand, T. C. (1971). Vocal tract size normalization in the perception of stop consonants. Haskins Laboratories Status Report SR 25/26, pp. 141-143.
Robertson, ?. (1971).
Stern, D. N., Spieker, S., Barnett, R. K. & MacKain, K. (1983). The prosody of maternal speech: infant age and context related changes. Journal of Child Language, 10, 1-15.
Stevens, K. N. (1972). Sources of inter- and intra-speaker variability in the acoustic properties of speech sounds. Proceedings of the Seventh International Congress of Phonetic Science, pp. 206-226. The Hague: Mouton.
Stevens, K. N., Menn, L. & Boyce, S. (1981). NSF Report Abstract 4FEB81.
Summerfield, A. Q. & Haggard, M. P. (1975). Vocal tract normalisation as demonstrated by reaction time. In: Auditory Analysis and Perception of Speech (G. Fant & M. A. A. Tatham, eds), pp. 115-141. London: Academic Press.
Takefuta, Y. (1975). Method of acoustic analysis of intonation. In: Measurement Procedures in Speech, Hearing and Language (S. Singh, ed.), pp. 363-378. Baltimore: University Park Press.
West, J. (1976). A new speech synthesizer. Speech and Hearing: Work in Progress, 2, 4-6. University College London, Department of Phonetics and Linguistics.