Journal of Phonetics (1994) 22, 65-85
The discrimination of pitch movement alignment in Dutch
Jo Verhoeven
University of Oxford, Phonetics Laboratory, 41 Wellington Square, Oxford OX1 2JF, U.K.
Received 6th January 1992, and in revised form 14th May 1993
This paper reports three experiments on the discrimination of positional variation of pitch movements in Dutch pitch contours. The results of a pilot experiment showed that positional differences in falling pitch movements are better discriminated than those of rising pitch movements in the Dutch flat hat contour. This differential sensitivity was also found in a follow-up experiment, the results of which do not support the pitch height and boundary hypotheses which were formulated to account for it. The former explains the differential sensitivity by the fact that subjects perceive variation in pitch movement alignment in terms of overall syllable pitch. The latter assumes that there is a temporal boundary in the syllable nucleus which is crucial to the perception of alignment differences. As an alternative to the boundary hypothesis, certain features of the data are found to be consistent with the spectral constraints hypothesis of House (1990). In the third experiment, the discrimination of isolated rises and falls was investigated. The results showed that both types of pitch movements are discriminated equally well. This indicates that the observed differences in discrimination between falls and rises in experiments 1 and 2 do not represent a uniform differential perceptual response to the type of pitch movements, but may be related to linguistic structure.
1. Introduction
Pitch movement alignment refers to the location of a pitch movement with respect to the syllable boundaries of utterances. The importance of pitch movement alignment distinctions to the characterization of intonational phenomena has long been recognized, particularly with regard to so-called pitch accent languages. In Swedish, for instance, some segmentally identical words may be realized prosodically with either accent 1 or accent 2, each accent corresponding to a different meaning. Bruce (1977) has established experimentally that the precise timing of the F0 fall associated with the two accents determines their identification. In accent 1, the fall occurs earlier than in accent 2. A similar phenomenon is found in Serbo-Croatian, which has a distinction between rising and falling accents. Purcell (1976) varied the pitch peak location in the stressed vowel of a two-syllable Serbo-Croatian word and found that accents with early or central peaks were identified as falling, whereas late F0 peaks were recognized as rising accents. Differences in the alignment of pitch transitions associated with tonal distinctions have also been observed in Rhenish dialects (Germany) and Limburg dialects (The Netherlands and Belgium) (Schmidt, 1986; Verhoeven & Connell, 1992).
In some languages which do not use an accentual system like those of Swedish or Serbo-Croatian, there are also pitch movements or pitch complexes "which are categorically different in their location with respect to the syllable boundaries" (Collier, 1983: 244). Evidence for a binary opposition between early and late pitch movements in English was provided by Pierrehumbert & Steele (1989). For German, Gartenberg & Panzlaff-Reuter (1991) indicate that there are three categories of peak alignment in the language. This is confirmed by categorical perception results reported in Kohler (1991). In the intonational system of Dutch also, pitch movements are typically aligned early, late or very late. Apart from this alignment dimension, pitch movements can also be distinguished on the basis of a small number of other perceptual dimensions, i.e., their direction (rise vs. fall), the speed of the pitch transition (gradual vs. abrupt) and their excursion size (full size vs. half-size).
The perceptual basis for this three-way alignment distinction was provided by an experiment reported in Collier (1975). In this experiment, subjects were presented with stimuli drawn from a rise-alignment continuum and were required to classify them into as many categories as they felt appropriate. All the rises were implemented on an utterance-final syllable. The results of a cluster analysis suggest that subjects classified the stimuli in three categories. The first category contained rises that were located relatively early in the experimental syllable. The second category contained two late rises, while the rises in the third category were located very late.
The discrimination of rise-alignment was investigated by Hill & Reid (1977). They had subjects listen to a series of word pairs in which the rise in the second word occurred later than the rise in the first. The physical difference in alignment was varied as an independent variable and ranged between 10 and 70 ms. Subjects were asked to evaluate the discrimination items as "same" or "different". The reported discrimination results suggest that informants distinguish between three alignment categories. The first contains rises which typically begin during the prevocalic consonant and extend across the release of the vowel nucleus into its steady-state portion. The rises in the second category start in the steady-state portion of the vowel nucleus and extend into the steady-state portion of the following consonant. Finally, Hill & Reid found some evidence for the existence of a third category, the rises of which are entirely situated outside the vowel nucleus. As to the relevance of these results, it is important to point out that the stimulus utterance consisted of three open syllables, mamama. Consequently, the obtained discrimination results must be due to whether the rises are situated within the same syllable or not. The category of late movements consists of rises which are only partially located in the same syllable, since they extend across the syllable boundary into the next. The very late category has pitch movements which are entirely situated in the next syllable. In this perspective it has to be pointed out that rises are not discriminated when they are entirely situated in the same syllable, and as such the results are not compatible with those reported by Collier (1975); in his experiment all the rises were located within the same syllable.
Present affiliation: Universitaire Instelling Antwerpen, Initiatief Nederlands, Universiteitsplein 1, B-2610 Wilrijk, Belgium.
0095-4470/94/010065+21 $08.00/0 © 1994 Academic Press Limited
At a global level of analysis, the Dutch intonation system has typical configurations of pitch movements (pitch contours) which can be regarded as the concrete acoustic manifestation of abstract intonation patterns. There are several such patterns, but two of these are important for the present investigation, i.e., the hat pattern and the cap pattern. Both contours have the same general shape, in that they consist of an utterance-final fall which is preceded by a prominence-lending rise. The general shape of this pattern is illustrated in (1).
[Schematic pitch contour not reproduced]
Rəna:t is na:r paRɛis    (1)
In a traditional view ('t Hart, Collier & Cohen, 1990), the relevant difference between the two patterns is one of pitch movement alignment, in that both movements in the hat pattern are located earlier relative to their respective syllables than the movements in the cap pattern. Positional variation in the location of pitch movements in the hat pattern was investigated in Verhoeven (1991a). In one experiment, subjects participated in a variable standard AX discrimination task: they were presented with pairs of utterances in which the placement of the rise in the penultimate stressed syllable was physically different and they were asked to evaluate the stimuli as "same" or "different". The results from this experiment showed that there is no indication of categorical perception of rise-alignment. Similar experimentation with fall-alignment suggests that positional variation of falls, by contrast, is perceived categorically in this pattern. The first conclusion from the experiments in Verhoeven (1991a) is that the essential difference between hat and cap pattern in Dutch is related to the location of the utterance-final fall, which is aligned late in the hat pattern and very late in the cap pattern. The late position of the fall renders the syllable prominent in the hat pattern, whereas the very late location of the fall in the cap pattern does not render the syllable prominent. In addition, it can be argued that the precise location of the rise is linguistically irrelevant since both early and late locations render the associated syllable prominent.
Verhoeven (1991a) argued that any positional variation of rises in the hat pattern that can be observed in speech does not have to be accounted for in a linguistic perspective, but by means of a lower level phonetic component which relates positional variation to the interaction between intonation and other prosodic factors such as speaking rate, the presence of prosodic boundaries and the distance between the accents (Silverman & Pierrehumbert, 1990). The second conclusion that emerged from these experiments is that fall-alignment differences in the hat pattern are generally better discriminated than equivalent positional variation of rises. Such differential sensitivity has been found for other dimensions of pitch movements, such as excursion size ('t Hart, 1981). In his experiment on the discrimination of excursion size differences, subjects were presented with pairs of stimuli, each of which contained a pitch movement in the same direction but of variable size. The subjects were required to indicate the stimulus with the larger movement. The results were such that in the fall condition, 61% of the participating subjects were found to be non-discriminators, i.e., they were unable to discriminate differences smaller than four semitones. The number of non-discriminators in the rise condition was only 16%. In addition, the discriminators needed on average 1.8 semitones to judge rises as having a different excursion, whereas the threshold for a fall was slightly larger at 2.4 semitones. 't Hart suggests that this differential sensitivity can be explained by the fact that native speakers of Dutch are more frequently exposed to rises, since the Dutch intonation system prefers rising movements for realizing pitch accents.
In this paper, the assumed differential sensitivity to pitch movement alignment is investigated. On the basis of the discrimination results in Verhoeven (1991a), it can be expected that in the context of the hat pattern, subjects are more sensitive to positional variation in falls than to variation in the position of rises. The aim of the paper is two-fold. First, further evidence is described regarding the differential sensitivity to pitch movement alignment. Second, several hypotheses regarding the causes of this differential sensitivity are investigated.
2. Differential sensitivity to pitch movement alignment
2.1. Pilot experiment
In this preliminary experiment, two alignment continua were constructed from a canonical flat hat contour. In the first continuum, the location of the rise in the penultimate stressed syllable was progressively delayed, while in the second continuum the placement of the falling pitch movement on the last accented syllable was postponed. The stimuli from each continuum were combined into AX discrimination items and presented for discrimination to native speakers of Dutch. The stimuli for the experiment were prepared by means of resynthesis of modified natural speech. In this method, a naturally produced utterance is recorded, digitized and LPC analysed in order to acquire information about the spectral composition of the utterance. Next, the acoustic parameter of F0 is manipulated and the utterance is resynthesized on the basis of the LPC parameters obtained by the acoustic analysis and the F0 editing process. For the experiment, two Dutch utterances were chosen:
Renaat is ziek    "Renaat is ill"    (2)
Renaat is in Parijs    "Renaat is in Paris"    (3)
Two different utterances were used in this experiment because the stimuli were intended to be comparable to those used in other discrimination experiments on pitch movement alignment in Verhoeven (1991a). Utterance (2) was used as the basis for the stimuli in the rise discrimination experiment. In (2), Renaat is the experimental syllable with which the rise-alignment differences were associated. Utterance (3) provided the basis for the stimuli in the fall discrimination task. In these stimuli, the alignment differences were implemented on Parijs. Both utterances were given a standard flat hat contour, the acoustic specification of which conformed to values suggested in 't Hart & Collier (1975). The rising and falling movements were aligned with their respective experimental syllables in such a way that their onsets coincided with the onset of the prevocalic consonant. Duration and excursion size of the movements were identical, i.e., 100 ms and five semitones. The actual stimuli were then derived from these reference contours. In the stimuli for rise-alignment (henceforth RA), the rise on Renaat was shifted to the right in steps of 10 ms. By this procedure, a total of 22 stimuli were obtained. In the stimuli with falling movements (FA), the falls were shifted to the right through Parijs in steps of 10 ms, which resulted in 19 stimuli. The total variability in RA consequently amounted to 210 ms, whereas it was 180 ms for FA. The acoustic characteristics of the movements and the range of alignment variation are illustrated in Fig. 1.
Figure 1. Schematization of the range of variation in the stimuli for rise-alignment (top) and fall-alignment (bottom). Vertical lines indicate the segment boundaries and durations are specified in milliseconds. The dashed lines represent the location of the pitch movement in the reference stimulus.
The reference stimulus in each continuum, i.e., the stimulus with the earliest rise/fall location, was combined with every other stimulus from the continuum into AX discrimination items. This yielded RA differences of between 10 and 210 ms, while those for FA ranged between 10 and 180 ms. In all the discrimination pairs, the reference stimulus occurred as the first stimulus. The pairs were presented to 14 native speakers of Dutch, who were required to indicate whether they could perceive the alignment differences, by labelling the items in terms of "same" or "different". Each pair occurred five times for a total of 70 discriminations by the 14 listeners. The pooled discrimination responses for the two types of pitch movements are summarized in Fig. 2. The figure shows that the percentage of "different" judgements increases as a function of "gap size", i.e., the difference in alignment between the pitch movements in the reference stimulus and the test stimulus. At the same time, the discrimination curve is markedly steeper in the fall condition than in the rise condition: as gap size increases, the percentage of "different" judgements increases more rapidly in FA than in RA. As a first step in the analysis of these results, the sensitivity of the individual informants to pitch movement alignment was measured in the two experimental conditions. For each informant, the point where the discrimination curve crosses the 50% boundary was taken as a measure of his sensitivity to alignment differences.
Figure 2. Percentage "different" judgements as a function of gap size (ms) for the items in the rise and fall condition. Each data point represents 70 observations.
These measures were determined by the method of average z-scores (Woodworth & Schlosberg, 1954). First the observed proportions of "different" judgements for the individual subjects are converted into z-scores. Subsequently, the mean is taken for the lower half of the gap sizes and the mean of the corresponding z-scores is assigned to it. The same is done for the upper half of the gap sizes. These derived z-scores are marked on a graph in which z-scores are plotted as a function of gap size and a straight line is drawn through them. The result of this procedure for the fall discrimination data from a single listener is illustrated in Fig. 3. The point where the line through the mean z-scores crosses the z = 0 line is taken as a measure of the informant's sensitivity to alignment differences. This method was chosen since it relies on all the discrimination data in the experiment to estimate an informant's sensitivity, and it is defensible since the relationship between the z-scores and gap sizes is linear, whereas the relationship between the observed proportions and gap sizes is of a sigmoidal nature.
The threshold values obtained by this procedure were subsequently submitted to a one-way analysis of variance. The independent variable in the analysis was the type of pitch movement (rise vs. fall). This analysis gives an F-value of 13.254 (df = 1), which is significant at p < 0.001. The mean threshold for rise alignment is 100.77 ms, whereas that for fall alignment is 68.07 ms. The observed difference in threshold for the rise and fall condition is consequently 32.7 ms. Due to the limitations of the experimental design, it is not possible to arrive at firm conclusions regarding the underlying factors that may be responsible for this differential sensitivity. Several factors were considered in Verhoeven (1991b), two of which are discussed in the following sections.
[Figure 3 not reproduced: z-score (vertical axis, -3 to 3) plotted against gap size (horizontal axis), with the mean points A and B marked.]
Figure 3. Scatterplot of z-scores for a single listener as a function of gap size in the fall alignment continuum with 18 gap sizes. "A" represents the mean z-score for the lowest nine gap sizes, while "B" represents the mean z-score for the highest nine gap sizes. The estimate of the threshold corresponds to the gap size where the straight line through A and B crosses the horizontal line z = 0. In this case, the threshold is 88 ms.
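The method of average z-scores described above can be sketched in a few lines of Python. This is an illustration only, not the procedure as originally implemented: the function name, the clipping of proportions of 0 and 1 (whose z-scores are infinite) and the example response counts are all assumptions introduced here.

```python
from statistics import NormalDist, mean

def average_z_threshold(gap_sizes_ms, prop_different, clip=0.01):
    """Estimate the gap size (ms) at which 50% "different" responses are expected.

    Proportions are converted to z-scores, the lower and upper halves of the
    gap sizes are each reduced to one mean point (A and B), and the straight
    line through A and B is intersected with z = 0.
    `clip` is an assumption: it keeps proportions of 0 or 1 finite.
    """
    if len(gap_sizes_ms) != len(prop_different) or len(gap_sizes_ms) < 2:
        raise ValueError("need matching lists with at least two gap sizes")
    pairs = sorted(zip(gap_sizes_ms, prop_different))
    gaps = [g for g, _ in pairs]
    std_normal = NormalDist()
    z = [std_normal.inv_cdf(min(max(p, clip), 1.0 - clip)) for _, p in pairs]
    half = len(gaps) // 2  # with an odd count the middle item is left out
    low_gap, low_z = mean(gaps[:half]), mean(z[:half])        # point A
    high_gap, high_z = mean(gaps[-half:]), mean(z[-half:])    # point B
    if high_z == low_z:
        raise ValueError("flat response curve: no z = 0 crossing")
    slope = (high_z - low_z) / (high_gap - low_gap)
    return low_gap - low_z / slope  # gap size where the line crosses z = 0

# Hypothetical listener: number of "different" responses out of 5 per gap size.
example_gaps = list(range(10, 181, 10))
example_counts = [0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5, 5]
example_threshold = average_z_threshold(example_gaps, [c / 5 for c in example_counts])
print(f"estimated threshold: {example_threshold:.1f} ms")
```

With 18 gap sizes the two halves contain nine items each, which corresponds to the points A and B in Fig. 3.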
2.1.1. Boundary hypothesis
It was mentioned earlier that the pitch movement in the reference stimulus in both alignment continua is associated with the experimental syllable in such a way that the onset of the movement coincides with the onset of the prevocalic consonant. Furthermore, the rises and falls have an equal duration of 100 ms. The duration of the prevocalic consonant in the two experimental syllables is, however, different, with the result that the endpoint of the rise in the reference stimulus (A) is located earlier in the vowel nucleus (Vowel Onset + 12 ms) than that of the reference fall (Vowel Onset + 38 ms). In addition, the durations of the experimental vowel nuclei are also different: 250 ms in the RA stimuli vs. 220 ms in those for FA. This is illustrated in Fig. 4.
[Figure 4 not reproduced; recoverable labels: Vowel Onset, Gap 107, Gap 66, 5 St.]
Figure 4. Durational characteristics of the experimental syllables and alignment of the pitch movements in the stimuli of the items at the discrimination threshold for the rise (top) and fall conditions (bottom).
Figure 4 also shows the threshold items for RA and FA, i.e., the item in each case which is judged different 50% of the time. Note that the endpoint of the movement in the second stimulus (X) is located at proportionally the same place in the vowel nucleus, i.e., 47% into the vowel. If it is assumed that this location represents the temporal dividing line between what is regarded as "early" and what counts as "late" with respect to tonal events in general, it can be argued that this dividing line is critical for discrimination. As a result of the construction of the stimuli, a bigger gap size is needed in the rise condition to reach this temporal boundary than in the fall condition. This results logically from the combined effect of the unequal duration of the vowel nuclei in both conditions and the different locations of the endpoints of the movements in the reference stimulus with respect to vowel onset. This boundary hypothesis has important consequences for the relationship between gap size and discrimination, since it suggests that it is not the distance
between the pitch movements in absolute terms which is the decisive factor in discriminating alignment differences, but rather whether the F0 endpoint in the second stimulus is located beyond a critical boundary or not. This boundary is assumed to be in the middle of the syllable nucleus. This hypothesis consequently predicts that a different discrimination threshold for pitch movement alignment will be found depending on the location of the movement in the reference stimulus. In other words, the threshold of rise and fall-alignment can be expected to equal the distance from the endpoint of the movement in the reference stimulus to the assumed boundary.
2.1.2. Pitch height hypothesis
An alternative interpretation of the differential sensitivity assumes that physical changes in pitch movement alignment correspond perceptually to changes in the overall pitch level of the experimental syllable. This assumption is suggested by Nabelek, Nabelek & Hirsh (1970). They carried out a matching experiment in which informants were asked to match the perceived pitch of a frequency transition to a set of reference stimuli with a steady pitch level at various frequencies. The frequency transition in the experimental stimulus did not take place over the whole duration of the stimulus, so that the rising transitions were preceded by a portion of steady state low frequency and followed by a steady state high frequency. In falls, the reverse applied. Moreover, the transitions were stepped through the stimulus, which in effect created an alignment continuum. Nabelek et al. conclude that
the pitch is evaluated according to the position of the transition. When the transition occurs with some delay, the judgements shift toward the initial frequency, i.e. toward the lower frequencies for rising changes or toward the higher frequencies for falling changes. [p. 540]
This indicates that the delay of a rising pitch movement corresponds to a lowering of perceived pitch, whereas postponing falls causes the pitch of the syllable to rise. If the stimuli of this experiment are viewed in this perspective, the items in the rise and fall conditions are symmetrical in terms of pitch movement alignment (the pitch movement of the second stimulus always occurs later than the pitch movement in the first stimulus of each item), whereas they are asymmetrical in terms of perceived pitch height of the movements in the stimuli. In the rise condition, the early-late (AX) presentation order corresponds to a lowering of the overall pitch level of the experimental syllable; in the fall condition, the perceptual effect of the early-late presentation order is reversed: the overall pitch level is raised. Interpreted in this way, the threshold differences found between the rise and fall conditions suggest that informants are more sensitive to pitch raising than to pitch lowering. This is consistent with results of informal experimentation on the discrimination of pitch excursion size in the pointed hat intonation pattern in Dutch, in which it was found that when the order of presentation involved pitch raising, the F0 peak was discriminated significantly better than when it involved lowering the peak by an equal amount.
The pitch height hypothesis proposed here predicts that the discrimination of pitch movement alignment is dependent on the presentation order of the stimuli in the discrimination items. In the items for RA, an early-late (AX) presentation order in terms of pitch movement alignment corresponds to a lowering of overall syllable pitch, whereas the late-early (XA) presentation order raises the pitch. Since informants have been shown to be more sensitive to pitch raising than to lowering, a large threshold is predicted for the early-late order. In the late-early order, the threshold is expected to be significantly smaller. In the fall condition, the opposite is predicted. The early-late order corresponds to raising the pitch and consequently a small threshold is expected. The late-early order lowers the pitch of the experimental syllable, so that a larger discrimination threshold can be expected.
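The predictions of the two hypotheses in Sections 2.1.1 and 2.1.2 can be summarized in a short Python sketch. It is an illustration under stated assumptions rather than part of the original analysis: the function names are hypothetical, the boundary is placed at the 47% point reported above, and the vowel durations and reference endpoints are those given for the pilot stimuli.

```python
def boundary_threshold(vowel_ms, ref_endpoint_ms, boundary_fraction=0.47):
    """Boundary hypothesis: gap (ms) needed to move the endpoint of the
    movement from its reference position (ms after vowel onset) to the
    assumed boundary, expressed as a fraction of vowel duration."""
    return boundary_fraction * vowel_ms - ref_endpoint_ms

def pitch_height_prediction(movement, order):
    """Pitch height hypothesis: relative threshold size for a movement type
    ("rise" or "fall") and presentation order ("AX" = early-late,
    "XA" = late-early). Delaying a rise lowers the perceived syllable pitch,
    delaying a fall raises it, and raising is assumed easier to hear."""
    if movement not in ("rise", "fall") or order not in ("AX", "XA"):
        raise ValueError("unknown movement type or presentation order")
    later_second = order == "AX"
    raises_pitch = (movement == "fall") == later_second
    return "smaller" if raises_pitch else "larger"

# Pilot stimuli: vowel 250 ms and endpoint at +12 ms (rise),
# vowel 220 ms and endpoint at +38 ms (fall).
rise_gap = boundary_threshold(250, 12)  # about 105.5 ms
fall_gap = boundary_threshold(220, 38)  # about 65.4 ms
print(round(rise_gap, 1), round(fall_gap, 1))

for movement in ("rise", "fall"):
    for order in ("AX", "XA"):
        print(movement, order, pitch_height_prediction(movement, order))
```

The two computed gaps lie close to the mean thresholds of 100.77 ms and 68.07 ms found in the pilot experiment, and to the gaps of 107 and 66 ms labelled in Fig. 4, read here as the rise and fall threshold items respectively.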
2.2. Further exploration
In order to explore the origins of the differential threshold, a new experiment was carried out that investigated whether the differential sensitivity found in the pilot experiment can be reproduced in a more balanced experimental design. Furthermore, the predictions of the pitch height and boundary hypotheses were tested by controlling the presentation order of the stimuli in the discrimination items and the location of the pitch movement in the reference stimulus. In this experiment, subjects were presented with pairs of Dutch utterances with flat hat contours differing only in the location of a pitch movement in the experimental syllable. The independent variables in the experimental design are the type of pitch movement in the alignment continuum (rise vs. fall), the location of the pitch movement in the reference stimulus (early vs. late), the presentation order of the stimuli in the items (early-late vs. late-early), and the alignment difference between the pitch movements in the discrimination items (10-150 ms).
2.2.1. Experimental design
The stimuli of this experiment are based on the Dutch utterances (4) and (5):
Renaat is bij Heleen    "Renaat is with Heleen"    (4)
Heleen is bij Renaat    "Heleen is with Renaat"    (5)
Utterance (4) was used for the stimuli for FA, whereas (5) provided the basis for the RA stimuli. In both instances, Heleen was the experimental syllable with which the alignment variations were associated. The individual constituents required to make up these utterances were each spoken 10 times by a male native speaker of Dutch and recorded in a professional studio. The constituent Heleen was spoken with a monotone pitch level of medium height, Renaat is bij __ was spoken with a prosodic pattern appropriate to the initial part of the hat pattern (i.e., a rise on Renaat, followed by a stretch of high declination) and the predicate __ is bij Renaat was spoken with a suitable final portion of the hat pattern (high declination, followed by a fall on Renaat), as schematized in (6).
[Schematic pitch contours not reproduced]
Rəna:t iz bɛi    hele:n    iz bɛi Rəna:t    (6)
From the realizations of each constituent, one delivery was chosen which sounded fluent in all respects and digitized at Fs = 20 000 Hz with Fc = 8000 Hz. Next, the waveform of Heleen was combined with those of Renaat is bij __ and __ is bij Renaat by means of a waveform recombination programme (Dryden & Mcleod, 1988). This resulted in two complete and syntactically/semantically well-formed Dutch utterances, in which the spectral and durational characteristics of Heleen are identical. These utterances were subsequently analysed acoustically by means of the API subroutine of the ILS signal processing package, with analysis conditions appropriate for autoregressive LPC spectral modelling (Wakita, 1980). Next, they were provided with a standard flat hat intonation pattern. The pitch movements in the experimental syllable for RA and FA were aligned in such a way that their endpoints coincided with vowel onset in Heleen. The duration of the rise and fall was 100 ms and their excursion size amounted to 5 semitones. The stimuli for RA were derived by shifting the rise in Heleen is bij Renaat to the right in 21 steps of 10 ms. The stimuli for FA were obtained by moving the fall in Renaat is bij Heleen through the experimental syllable in 10 ms steps. By this procedure, 42 stimuli were obtained. The alignment of the pitch movements in the experimental syllable of the two continua is illustrated in Fig. 5. In the construction of the discrimination pairs for rise-alignment, two reference stimuli were chosen. In one reference stimulus, the rise was aligned early, i.e., its endpoint coincided with the onset of the vowel nucleus in the experimental syllable. In the second reference stimulus, the endpoint of the rise was located 50 ms after vowel onset. These reference stimuli were combined into AX discrimination items with 14 other stimuli from the continuum, yielding items with gap sizes ranging between 10 and 150 ms.
Each stimulus pair occurred in AX and XA presentation order. In AX order, the stimulus with the earlier rise location was presented first, while in XA order the stimulus with the later rise location came first. The pairs for fall-alignment were constructed along the same principles as those for RA, i.e., two reference stimuli were chosen, which were combined with 14 other stimuli from the FA continuum so as to give gap sizes between 10 and 150 ms. In the early standard, the endpoint of the fall coincided with vowel onset of the syllable nucleus in the experimental syllable. In the late standard, the endpoint of the fall was located 50 ms into the vowel.
[Figure 5 not reproduced; recoverable labels: Vowel Onset; segments n and e:; durations 110, 200 and 150 ms; time scale 100 ms.]
Figure 5. Illustration of the range of variation in the stimuli for rise-alignment (top) and fall-alignment (bottom). Vertical lines indicate the segment boundaries and durations are specified in milliseconds. The dashed lines represent the location of the pitch movements in the reference stimuli with early and late standard.
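The construction of an alignment continuum of this kind can be sketched as follows. The sketch is hypothetical in several respects: the baseline of 100 Hz is an assumed value not given in the text, the movement is taken to be linear on a semitone scale, the 21 positions per continuum are assumed to include the reference position, and the function names are introduced here for illustration only. Times are in ms relative to vowel onset.

```python
def f0_at(t_ms, onset_ms, rising=True, base_hz=100.0,
          excursion_st=5.0, duration_ms=100.0):
    """F0 (Hz) at time t for a single pitch movement starting at onset_ms.

    The movement is linear in semitones (an assumption); base_hz is an
    assumed baseline, not a value taken from the experiment."""
    progress = min(max((t_ms - onset_ms) / duration_ms, 0.0), 1.0)
    height = progress if rising else 1.0 - progress  # falls start high
    return base_hz * 2.0 ** (excursion_st * height / 12.0)

def continuum_onsets(reference_onset_ms, n_stimuli=21, step_ms=10.0):
    """Movement onsets for a continuum shifted rightwards in equal steps."""
    return [reference_onset_ms + k * step_ms for k in range(n_stimuli)]

# Reference stimulus: the endpoint of a 100 ms movement coincides with
# vowel onset (t = 0), so the movement itself starts at -100 ms.
onsets = continuum_onsets(-100.0)
top_hz = f0_at(0.0, onsets[0])  # end of the reference rise, 5 semitones up
print(len(onsets), onsets[0], onsets[-1], round(top_hz, 1))
```

A gap size in a discrimination item then corresponds to the difference between two onsets drawn from the same list.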
Each discrimination pair occurred five times in the test, so that a total of 600 items were obtained. These items were recorded on a test tape, which consisted of two presentation blocks. The first block contained the items for the rise-alignment task in random order. The second block contained those for the fall-alignment task. Each block was preceded by 10 trial items. The items were not preceded by an identifier, but after each set of 10 discrimination items, the informants heard an orientation signal. The inter-stimulus interval was 500 ms, whereas the inter-item interval amounted to 2500 ms. A total of 11 informants took part in the experiment. They were Belgian and Dutch visiting students at the University of Aarhus (Denmark). They were all native speakers of Dutch and participated on a voluntary basis. The informants were seated in a quiet language laboratory, in which the volume level of the headphones was individually adjustable. They were told that they were going to take part in an experiment on Dutch intonation, which aimed to establish the intonational differences that can be perceived by native speakers of Dutch. In both experimental sessions, informants were instructed to concentrate on the melodic characteristics of the word "Heleen" in the stimuli of the discrimination items and were asked whether they could hear a melodic difference between the two realizations or not. Their judgements were given on a scoring sheet in terms of the labels "same" or "different". Before the start of the experiment, the subjects heard 10 trial items to get accustomed to the task and adjust the volume to a comfortable listening level. After presentation of 150 discrimination items, subjects were given a short pause. Total duration of the experiment was approximately 45 minutes.
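The item count can be checked by enumerating the design. The sketch below is an illustration only; it assumes 15 gap sizes (10 to 150 ms in 10 ms steps), which is the number that reproduces the reported total of 600 items, although the text speaks of 14 comparison stimuli per reference stimulus. All variable names are hypothetical.

```python
from itertools import product

movements = ("rise", "fall")
standards = ("early", "late")        # location of the reference movement
orders = ("AX", "XA")                # early-late vs. late-early
gap_sizes_ms = range(10, 151, 10)    # assumption: 15 gap sizes
repetitions = 5
n_informants = 11

# One entry per discrimination item on the test tape.
items = [
    (movement, standard, order, gap)
    for movement, standard, order, gap in product(
        movements, standards, orders, gap_sizes_ms)
    for _ in range(repetitions)
]

items_per_block = len(items) // len(movements)
# Responses pooled over standards, orders, repetitions and informants
# for one movement type at one gap size.
observations_per_point = (
    len(standards) * len(orders) * repetitions * n_informants)

print(len(items), items_per_block, observations_per_point)
```

Under these assumptions each block contains 300 items, and the pooled count per movement type and gap size agrees with the 220 observations per data point stated in the caption of Fig. 6.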
[Plot omitted. x-axis: gap size; y-axis: percentage "different" judgements; series: Rise, Fall.]
Figure 6. Percentage "different" judgements as a function of gap size for rise and fall-alignment. Each datapoint summarizes 220 observations.
2.2.2. Results
The discrimination results averaged over all the experimental conditions are summarized in Fig. 6. The thresholds of pitch movement alignment were established for the individual subjects in the different experimental conditions, using the same technique as in the previous experiment. The values established by this procedure were then submitted to a three-way analysis of variance. The independent variables are the type of pitch movement (rise vs. fall), the location of the movement in the reference stimulus (early vs. late) and the presentation order of the stimuli in the pairs (AX vs. XA). The main effects of type of pitch movement [F(1, 3) = 10.436, p < 0.002] and location of the movement in the standard stimulus [F(1, 3) = 8.572, p < 0.005] are significant, whereas the factor presentation order and the higher order interactions are not.
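As a rough illustration of how a threshold can be read off from proportions of "different" judgements, the sketch below transforms the proportions to z-scores, fits a straight line by least squares, and returns the gap size at which the line crosses z = 0 (50% "different", the criterion used for the thresholds reported in this paper). This is only an assumption about the general shape of such a procedure: the paper refers to the method of average z-scores without spelling it out, so the function `threshold_50`, its clipping parameter `eps` and the synthetic data are hypothetical and do not reproduce the author's computation or data.

```python
from statistics import NormalDist, mean


def threshold_50(gaps_ms, prop_different, eps=0.01):
    """Gap size (ms) at which a fitted line predicts 50% "different" responses.

    Hypothetical helper: least-squares line through z-transformed proportions.
    """
    std_normal = NormalDist()
    # Clip proportions away from 0 and 1 so the inverse normal CDF stays finite.
    z = [std_normal.inv_cdf(min(max(p, eps), 1 - eps)) for p in prop_different]
    g_bar, z_bar = mean(gaps_ms), mean(z)
    sxy = sum((g - g_bar) * (zi - z_bar) for g, zi in zip(gaps_ms, z))
    sxx = sum((g - g_bar) ** 2 for g in gaps_ms)
    slope = sxy / sxx
    intercept = z_bar - slope * g_bar
    # z = 0 corresponds to a proportion of 0.5.
    return -intercept / slope


# Synthetic proportions (NOT the paper's data): a cumulative normal
# centred on 80 ms, sampled at gap sizes of 15 to 150 ms.
gaps = list(range(15, 151, 15))
synthetic = [NormalDist(mu=80, sigma=40).cdf(g) for g in gaps]
estimate = threshold_50(gaps, synthetic)
```

With real response data the proportions would be noisy, and per-subject estimates of this kind would then feed the analysis of variance described above.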
2.2.3. Discussion
The results obtained in this experiment are highly consistent with the findings of the pilot experiment (Fig. 2). As far as the general relationship between discrimination and gap size is concerned, it can be observed that discrimination improves as the alignment difference between the movements in the items increases. Moreover, it can be seen that the fall-alignment differences are consistently better discriminated than equivalent differences in rise-alignment: the threshold is 70 ms for falls and 95 ms for rises. This supports the earlier observation and original hypothesis that informants are more sensitive to variations in fall-alignment than to variations in rise-alignment in the hat pattern. It was already pointed out that a similar differential sensitivity was noted by 't Hart (1981) in an investigation of the discrimination of the
excursion size of pitch movements. 't Hart found that informants are more sensitive to excursion size differences in rises than in falls.
Apart from the significance of the type of pitch movement, a statistically significant effect is observed of the movement's location in the reference stimulus. This finding directly relates to the boundary hypothesis, which predicted that it is not distance in absolute terms which is decisive for discrimination, but rather whether the pitch movements in the two stimuli of a discrimination item are located on different sides of a temporal boundary. In order to investigate this hypothesis, the location of the pitch movement in the reference stimulus was controlled as an experimental variable. In one set of stimuli, the endpoint of the pitch movement in the reference stimulus coincided with the onset of the vowel nucleus of the experimental syllable. In the second set, the endpoint was located 50 ms later. The segmental durations of the experimental syllables were identical in all stimuli. The boundary hypothesis predicted that the threshold for items with an early standard is larger than the threshold for items with a late standard, since in the former it takes a greater distance to cross this hypothetical boundary. Calculation of the threshold in terms of the location of the movement in the reference stimulus indicates that items with an early standard have a threshold of 91 ms, while those with a late standard have a threshold of 69 ms. Although the difference between these is statistically significant, it is not as large as the 50 ms difference in the location of the early and late standards, so that the boundary hypothesis cannot be maintained if it is taken that the boundary is situated in the middle of the syllable nucleus.
As an alternative to the boundary hypothesis, the spectral constraints hypothesis proposed by House (1990) can be considered to account for the difference in discrimination as a function of pitch movement location in the standard. This hypothesis holds that informants' sensitivity to pitch features decreases as the spectral complexity of the signal increases:
We can hypothesize a relationship between signal complexity and pitch sensitivity. As the complexity of the signal increases, pitch sensitivity decreases. We can then speculate that when the perceptual mechanism is maximally loaded with the task of resolving spectral information and rapid changes in intensity [...] its capacity to resolve fundamental frequency movement is decreased. (p. 34)
The structure of the stimuli in this experiment with respect to the location of the standard movement in the reference stimulus is such that the early movement is located entirely in the prevocalic consonant of the experimental syllable (Fig. 5). As a result, all the alignments that are to be discriminated are situated such that the pitch transitions overlap with the spectral transition between this consonant and the vowel nucleus. The spectral constraints hypothesis states that the processing system is maximally loaded with resolving the spectral transition information so that it is less sensitive to the alignment difference. In the items with a late movement in the reference stimulus, the standard movement is located partially in the vowel nucleus. Consequently, the pitch transitions are situated in a spectrally stable portion of the signal and are easier to discriminate.
The second hypothesis stated that informants perceive pitch movement alignment in terms of variations in overall pitch level of the experimental syllable and it was observed that subjects are more sensitive to pitch raising than to pitch lowering. If these observations are related to the presentation order of the stimuli in the
discrimination items, it is predicted that informants are most sensitive to alignment differences in XA presentation order in the rise-alignment task and in AX order in the fall-alignment task. These respective orders of stimulus presentation involve an increase in overall syllable pitch. The reverse order presentations involve a lowering of syllable pitch and are expected to be less well discriminated. The analysis of variance indicates that presentation order is not significant. In both presentation orders, the estimated threshold amounts to 80 ms, so that the pitch height hypothesis cannot be maintained.
So far, the differential sensitivity of informants has only been considered explicitly from a psychoacoustic perspective. A third, alternative explanation of the differential threshold can be suggested from a psycholinguistic perspective. In the instructions for the previous experiments, informants were asked to pay attention only to melodic differences related to either the rises or the falls. It can nevertheless be assumed that informants may have listened to the whole intonation contour and recognized it as an instance of the flat hat pattern. It was mentioned that the precise location of the rise cannot be exploited to make perceptual/linguistic distinctions between this pattern and any other pattern in Dutch, but that fall differences can be perceived linguistically (Verhoeven, 1991a). As a result, informants may have been rather insensitive to fairly large differences in alignment of the rises in this pattern and more sensitive to falls. This hypothesis accounts for the difference in threshold by relating informants' sensitivity to linguistic knowledge and assumes that discrimination is at least in part determined by the relative importance of a pitch feature for signalling linguistic distinctions.
This hypothesis crucially depends on the assumption that the differential sensitivity observed in the previous experiments does not just represent a generalized differential response to the type of pitch movement. In other words: a prerequisite for the psycholinguistic interpretation to be valid is that it has to be shown that informants are essentially equally sensitive to rise and fall-alignment in a context in which rise and fall-alignment differences are exploited similarly from a linguistic point of view.
3. Alignment of isolated rises and falls
In this experiment, the discrimination of pitch movement alignments is investigated for rises and falls in a contour in which variation along the alignment dimension is assumed not to yield linguistically meaningful contrasts with other contours. To this end, two fixed standard AX discrimination experiments were carried out in which informants were asked whether they could discriminate between pairs of stimuli which differed from each other in terms of the position of an F0 transition in an accented syllable. The independent variables in the experimental design are the type of pitch movement (rise vs. fall), presentation order (AX vs. XA), gap size between the locations of the movements in the stimuli (15-150 ms) and the location of the movement in the reference stimulus (early vs. late).
3.1. Experimental design
3.1.1. Stimuli
The stimuli for this experiment were prepared by resynthesizing modified natural speech, using the method described in the method sections of the previous experiments.
For this experiment, the utterance [ɦele:na] was chosen as the basis for the stimuli. In this utterance, all the segments are voiced and [le:] carries the primary accent: it is preceded and followed by a single unstressed syllable. The stressed syllable was used as the experimental syllable with which pitch movement alignment distinctions are associated. The utterance was read 10 times by a male native speaker of Dutch on a monotone pitch of medium height. The utterances were recorded in a sound-treated room and from these recordings, one delivery was chosen which sounded fluent in all respects. The utterance was digitized at 20 000 Hz and analysed acoustically by means of the API subroutine of the ILS signal processing package. On this utterance, four pitch movement alignment continua were implemented. In order to obtain the stimuli for two rise-alignment continua, the utterance was given a pitch contour in which a rising pitch movement was positioned in the accented syllable. The rise was preceded by a stretch of low declination on the first syllable and followed by high declination extending into the third syllable. Duration of the rise was 100 ms with an excursion size of five semitones. In the first continuum, the endpoint of the rise was located 5 ms after vowel onset. This stimulus will be referred to as the reference stimulus with the early standard. From this contour, 11 stimuli were derived by shifting the rise to the right in steps of 15 ms, leaving the remaining part of the contour unchanged. In the second rise-alignment continuum, the endpoint of the rise in the reference stimulus was located 105 ms after vowel onset (late standard) and 11 stimuli were derived by shifting the rise to the right in steps of 15 ms. The alignments of the rises to the segmental string in both continua are illustrated in Fig. 7.
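The timing of the two rise-alignment continua can be sketched as follows. The durations, step size and endpoint locations are taken from the description above; everything else is an assumption made for illustration. In particular, the sketch treats the reference stimulus as one of the 11 stimuli in each continuum (gap sizes 0 to 150 ms), which is consistent with the 15-150 ms range of gap sizes stated for this experiment, and the names `Movement`, `continuum` and `semitones_to_ratio` are hypothetical.

```python
from dataclasses import dataclass

RISE_DURATION_MS = 100    # duration of the movement, from the text
STEP_MS = 15              # shift between neighbouring stimuli, from the text
MAX_GAP_MS = 150          # largest gap size used in the experiment
EXCURSION_SEMITONES = 5   # excursion size, from the text


@dataclass(frozen=True)
class Movement:
    gap_ms: int     # shift relative to the reference stimulus
    start_ms: int   # movement onset, relative to vowel onset
    end_ms: int     # movement endpoint, relative to vowel onset


def continuum(reference_end_ms: int) -> list[Movement]:
    """Reference stimulus (gap 0) plus rightward shifts in 15 ms steps."""
    return [
        Movement(
            gap_ms=gap,
            start_ms=reference_end_ms + gap - RISE_DURATION_MS,
            end_ms=reference_end_ms + gap,
        )
        for gap in range(0, MAX_GAP_MS + 1, STEP_MS)
    ]


def semitones_to_ratio(semitones: float) -> float:
    """Frequency ratio spanned by an interval given in semitones."""
    return 2.0 ** (semitones / 12.0)


early = continuum(reference_end_ms=5)     # early standard
late = continuum(reference_end_ms=105)    # late standard
ratio = semitones_to_ratio(EXCURSION_SEMITONES)
```

Under these assumptions the early standard starts 95 ms before vowel onset, so that most of the movement precedes the vowel, whereas the late standard starts 5 ms after vowel onset. A five-semitone excursion corresponds to a frequency ratio of roughly 1.33 between the end and the start of the rise.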
The stimuli for the fall-alignment continua were derived from the same realization of the utterance as the stimuli in RA, so that the segmental durations and
[Schematic omitted. Labels: Early Standard, Late Standard, Vowel Onset; 100 ms scale bar.]
Figure 7. Location of the rising pitch movements relative to the segmental string in the stimuli for the two rise-alignment continua.
[Schematic omitted. Labels: Early Standard, Late Standard, Vowel Onset; 100 ms scale bar.]
Figure 8. Location of the falling pitch movements relative to the segmental string in the stimuli for the two fall-alignment continua.
spectral characteristics were identical. The utterance was given a pitch contour consisting of a falling pitch movement in the accented syllable, preceded by a stretch of high declination and followed by low declination. Duration, excursion size and location of the endpoint were identical to those of the rises. From this standard contour, 11 stimuli were derived by shifting the fall to the right in steps of 15 ms. The same procedure was applied to obtain 11 stimuli for a second fall-alignment continuum, in which the fall endpoint in the reference stimulus was located 105 ms after vowel onset. The alignment of the falls in both continua is illustrated in Fig. 8.
3.1.2. Items
In order to obtain the discrimination pairs, the reference stimuli from each alignment continuum were combined with every other stimulus in the continuum, as a result of which alignment differences between reference stimulus and comparison stimulus ranging from 15 to 150 ms were obtained. In order to counterbalance for order effects, two presentation orders were used. In AX order, the stimulus with the earlier movement came first. In XA order, the stimulus with the later movement was presented first. The pairs in each experimental condition occurred four times in the test, so that a total of 320 discrimination pairs were obtained, i.e., 160 for RA (10 pairs × 2 standards × 2 orders × 4 repetitions) and 160 for FA.
3.1.3. Test tapes
Two test tapes were recorded, each of which consisted of two presentation blocks. On the first tape, the first block contained the discrimination items for RA, randomized across all the variables in the experimental design, i.e., location of the standard in the reference stimulus (early, late), gap size (15-150 ms) and presentation order (AX, XA). The second block consisted of the items for FA with a
different randomization. On the second tape, the first presentation block contained the items for FA. The randomization of these items was identical to the one for RA on tape 1. The second block consisted of the items for RA, the randomization of which was identical to the items for FA on tape 1. The inter-stimulus interval in the items was 500 ms, while the inter-item interval amounted to 2500 ms. There were no identifiers on the tape, but after presentation of 10 items, the informants heard an orientation signal.
3.1.4. Informants
Two groups of informants were recruited (n = 28) from the student population of the Department of Germanic Philology and the Department of Business Studies of the University of Antwerp (UFSIA), Belgium. They were all native speakers of Dutch and took part on a voluntary basis. Fifteen informants listened to tape 1 and 13 to tape 2. Each informant participated in only one experimental session.
3.1.5. Procedure
The informants were seated in a quiet language laboratory, in which the volume level of the headphones was individually adjustable. They were told that they were going to take part in an experiment on Dutch intonation, which aimed to establish the intonational differences that are perceivable by native speakers of Dutch. In the experimental task, the acoustic dimension was not specified explicitly: informants were required to report on whether the speech melodies on the experimental syllable of each word pair were "same" or "different". Before the start of the experiment, the subjects heard 10 trial items to get used to the task and adjust the volume to a comfortable listening level. Subsequently, they proceeded to the two presentation blocks in each experimental session. Informants were given a 5 minute break between the two presentation blocks. Total duration of each tape was 32 minutes.
3.2. Results and discussion
The discrimination judgements are summarized in Fig. 9.
As in the previous experiments, the thresholds of pitch movement alignment were estimated for the individual subjects in the different experimental conditions by means of the method of average z-scores. These estimates were subsequently submitted to an analysis of variance. The independent variables in the analysis were type of pitch movement (rise vs. fall), stimulus presentation order (AX vs. XA) and location of the movement in the reference stimulus (early vs. late). The results show that location and order have a significant effect on discrimination [location: F(1, 3) = 6.199, p < 0.014; order: F(1, 3) = 4.842, p < 0.029]. The type of pitch movement does not have a significant effect on discrimination [F(1, 3) = 0.149, p < 0.700]. In addition, a significant interaction between type of pitch movement and presentation order is noted [F(1, 3) = 4.905, p < 0.028]. The statistical analysis indicates that there is no significant difference in discrimination of rise and fall alignment. For both types of pitch movements, subjects need an alignment difference of 82 ms in order to judge the items "different" 50% of the time. This is consistent with an interpretation of the
[Plot omitted. x-axis: gap size, 0 to 150 ms in steps of 15 ms; y-axis: percentage "different" judgements; series: Rise, Fall.]
Figure 9. Percentage "different" judgements as a function of gap size. Each datapoint summarizes 448 observations.
differential threshold in the hat pattern in terms of the linguistic relevance of alignment variation in the pattern, in that the differential threshold observed in the previous experiments clearly does not represent a general differential perceptual response to rises and falls. The results show that alignment is essentially perceived similarly in a context in which pitch movements function similarly: positional variation in isolated rises and falls is not exploited linguistically in the Dutch intonational system.
The first of the significant effects relates to the location of the standard in the reference stimulus of the discrimination items. For both types of pitch movements the effect is the same, in that items with an early standard have a larger threshold than the items with a late standard (89 vs. 77 ms). This means that items with an early standard are somewhat more difficult to discriminate than items with a late standard. This observation is consistent with the results of the previous experiment, in which the effect was accounted for by the spectral constraints hypothesis. Also in this experiment, the location of the standard movement in the reference stimulus is such that the early movement is located almost entirely in the prevocalic consonant (Figs 7 and 8). As a result, all the alignment differences that are to be discriminated are situated around the spectral transition between this consonant and the vowel nucleus. The spectral constraints hypothesis states that the processing system is maximally loaded with resolving spectral information so that it is less sensitive to the alignment difference. In the items with a late movement in the reference stimulus, the standard movement is located such that all the alignment differences are situated in the spectrally stable portion of the signal.
Finally, the data reveal a significant effect of the presentation order of the stimuli in the discrimination pairs, such that the AX (early-late) presentation order is more difficult to discriminate than the XA order. The thresholds are 89 and 78 ms,
respectively. This seems to suggest that informants are more sensitive to advancing pitch movements in the syllable than to delaying them. This main effect has to be interpreted with care, since the statistical analysis also indicates a significant interaction between the type of pitch movement and presentation order. The nature of this interaction suggests that there is an order effect only in the items for fall-alignment discrimination. For the falling pitch movements, the order effect is such that best discrimination is obtained in the XA order. This is the order in which the fall in the second stimulus of the discrimination items is advanced with respect to the first stimulus. It is not immediately clear why the order effect is confined to the items with falling pitch movements, but it can be observed that this result is not consistent with the pitch height hypothesis that was formulated earlier: the order which is best discriminated for falls is the one which actually lowers the pitch on the experimental syllable, whereas the pitch height hypothesis postulated that subjects are more sensitive to a presentation order which raises the pitch level of the experimental syllable.
4. General discussion
The results obtained in these experiments have important implications. In the first instance, they have consequences for the methodology of intonation analysis, particularly for those methods which assume that the perception of pitch features is uniform across prosodic contexts. This "uniformity assumption" is implicit in, for instance, the methodology used by researchers at the Institute of Perception Research in their intonation analysis of various languages ('t Hart et al., 1990). This perceptual approach to the analysis of intonation essentially consists of two steps. The input to the analysis consists of F0 curves in utterances, which are regarded as the concrete acoustic manifestation of abstract intonation patterns. It is assumed that these abstract patterns are not directly observable in the speech signal, since F0 is essentially a conglomerate of features, some of which relate to intonation while others result from the interaction between the articulatory and phonatory mechanism. Therefore, it is proposed to eliminate segmental information from the signal by applying the perceptual filter of the auditory mechanism. This has been achieved by the stylization method, in which F0 curves are modelled artificially by a minimal set of straight lines in such a way that they are perceptually indistinguishable from the original contours. The stylizations are assumed to contain F0 information related to intonation only. Application of this method has suggested that global intonation contours consist of a small number of pitch events: they are called "perceptually relevant pitch movements" and cannot be left out of the description without serious perceptual consequences. These are taken to be the discrete, atomistic and invariant units in terms of which the internal structure of global pitch contours can be accounted for.
The second step in the analysis has consisted of the standardization of the pitch movements with respect to a small number of perceptual dimensions. This was achieved by perceptual experimentation into the tolerance levels of native speakers with respect to each of these variables. The knowledge obtained in these experiments is subsequently used to categorize pitch movements in the stylized F0 contours. The assumption that has to be made in order to be able to do this is that
the perception of pitch features is uniform, i.e., what is perceivable in one context is equally perceivable in another context. It was already pointed out in 't Hart (1979) that this assumption may not be justified:
the perceptual tolerances which make stylization possible at all, do not seem to be uniform, that is, to our experience, one has to be very precise on some places, whereas on other places it does not matter very much (p. 377)
The results of our experiments on pitch movement alignment provide further evidence that the uniformity hypothesis cannot be maintained. As a result, the extrapolation of knowledge obtained on the perception of pitch features in a particular prosodic context to a larger variety of contexts does need to be approached with some reservation. More concretely, our results show that the tolerance levels in the perception of pitch movement alignment established in one context cannot be automatically extended to all prosodic contexts: subjects are less sensitive to rise-alignment in the flat hat pattern than when the movement occurs in isolation. Moreover, the discrimination of pitch movement alignment is not uniform within the syllable itself: the perception experiments reveal a significant effect of the location of the standard movement in the reference stimulus, which was accounted for by the spectral constraints hypothesis. It can furthermore be pointed out that the tolerance levels established for a particular type of pitch movement are not necessarily extendable to other types of pitch movements. Indicative of this is 't Hart's experiment (1981) on pitch excursion size, in which the excursion of falls is more difficult to judge than the excursion of rises. Also, rise-alignment differences in the hat pattern are more difficult to discriminate than fall-alignment differences.
5. Conclusions
In this paper, the discrimination of pitch movement alignment was investigated with respect to falling and rising pitch movements in Dutch. The starting point of the investigation was a pilot experiment in which it was observed that subjects are differentially sensitive to pitch movement alignment in the Dutch hat pattern. This finding was confirmed in a follow-up experiment with a more balanced design. In addition, this experiment did not provide support for the pitch height or the boundary hypothesis. In the former, the differential threshold was accounted for by the fact that subjects perceive pitch movement alignment in terms of variations in overall syllable pitch. In the latter, the existence of a temporal boundary crucial to perception was postulated. The results of the third experiment indicate that the differential sensitivity does not represent a general differential response to pitch movement alignment as a function of the type of pitch movement. Therefore, it is likely that the differential threshold is to be accounted for by the fact that positional variation of rises in the Dutch hat pattern is not crucial to the identity of the pattern. In a more general perspective, the results of the discrimination experiments confirm the expectation that the perception of pitch movement alignment is not uniform and this fits well with other data on the non-uniformity of pitch perception. In some instances, the alignment of rises is equally well discriminated as the alignment of falls, whereas in other prosodic environments and given the same experimental conditions, rises are discriminated worse than falls. In addition, discrimination of alignment is not uniform within a given syllable, but seems to
depend on the location of the standard movement with respect to the spectrally steady-state portion of the syllable nucleus. The result of this non-uniformity is that knowledge about the perception of pitch features that is obtained in a particular prosodic context cannot be automatically extended to a large variety of prosodic contexts.
References
Bruce, G. (1977) Swedish word accents in sentence perspective. Lund: Gleerup.
Collier, R. (1975) Perceptual and linguistic tolerance in intonation, IRAL, XIII, 293-307.
Collier, R. (1983) Some physiological and perceptual constraints on tonal systems, Linguistics, 21, 237-247.
Dryden, N. & Mcleod, I. (1988) Speech synthesis by concatenation of segmented elements of natural speech, Work in Progress, 21, 147-151.
Gartenberg, R. & Panzlaff-Reuter, C. (1991) Production and perception of F0 peak patterns in German, Arbeitsberichte (AIPUK), 25, 31-113.
Hill, D. R. & Reid, N. A. (1977) An experiment on the perception of intonational features, International Journal of Man-Machine Studies, 9, 337-349.
House, D. (1990) Tonal perception in speech. Lund: University Press.
Kohler, K. J. (1991) Terminal intonation patterns in single-accent utterances of German: phonetics, phonology and semantics, Arbeitsberichte (AIPUK), 25, 117-185.
Nabelek, I. V., Nabelek, A. K. & Hirsh, I. J. (1970) Pitch of tone bursts of changing frequency, Journal of the Acoustical Society of America, 48, 536-553.
Pierrehumbert, J. & Steele, S. (1989) Categories of tonal alignment in English, Phonetica, 46, 181-196.
Purcell, E. T. (1976) Pitch peak location and the perception of Serbo-Croatian word tone, Journal of Phonetics, 4, 265-270.
Schmidt, J. E. (1986) Die Mittelfränkischen Tonakzente (Rheinische Akzentuierung). Stuttgart: Franz Steiner Verlag Wiesbaden.
Silverman, K. E. A. & Pierrehumbert, J. B. (1990) The timing of prenuclear high accents in English. In Papers in laboratory phonology I. Between the grammar and physics of speech (J. Kingston & M. Beckman, editors), pp. 72-106. Cambridge: Cambridge University Press.
't Hart, J. (1979) Relations between physical and perceptual aspects of intonation, Annali Della Scuola Normale Superiore di Pisa, Classi di lettere e filosofia, III-IX, 367-379.
't Hart, J. (1981) Differential sensitivity to pitch distance, particularly in speech, Journal of the Acoustical Society of America, 69, 811-821.
't Hart, J. & Collier, R. (1975) Integrating different levels of intonation analysis, Journal of Phonetics, 3, 235-255.
't Hart, J., Collier, R. & Cohen, A. (1990) A perceptual study of intonation. An experimental-phonetic approach to speech melody. Cambridge: Cambridge University Press.
Verhoeven, J. (1991a) Perceptual aspects of Dutch intonation. Unpublished Ph.D. dissertation, Edinburgh University.
Verhoeven, J. (1991b) Preliminary observations on the discrimination threshold of pitch movement alignment, Progress Reports from Oxford Phonetics, 4, 53-65.
Verhoeven, J. & Connell, B. (1992) Tonal accents in a Limburg dialect: an acoustic-phonetic investigation, Progress Reports from Oxford Phonetics, 5, 60-72.
Wakita, H. (1980) New methods of analysis in speech acoustics, Phonetica, 37, 87-108.
Woodworth, H. S. & Schlosberg, R. (1954) Experimental psychology.