On the psychoacoustic nature of the P-center phenomenon


Journal of Phonetics (1989) 17, 175-192

Bernd Pompino-Marschall
Institut für Phonetik und Sprachliche Kommunikation, Ludwig-Maximilians-Universität, Munich, F.R.G.

Received 29th September 1988, and in revised form 13th February 1989

It is almost universally accepted that the P-center, defined as the psychological moment of occurrence of a syllable, is determined solely by the durations of the syllable's consonantal onset and of its rhyme, independently of one another and independently of their phonetic categorization. Summarizing the results of a number of our own experiments (already described elsewhere), it will be shown that, contrary to this view, (1) the effect of prevocalic consonant duration is not independent of vowel duration, and vice versa, and both effects are non-linear; (2) the syllable's rhyme does not affect its P-center as a unit, since vowel duration and final consonant duration have different effects, which again interact with one another; and (3) phonetically different syllables behave differently, and P-center shifts can also be induced by non-temporal variations. An elaboration of a purely psychoacoustic model proposed elsewhere, based on the time course of loudness within each critical band [sone/Bark], accounts for all these effects fairly well. Furthermore, this model distinguishes between "points of onset" and "temporal centers of gravity", both of which seem to be psychologically relevant in experimental tasks involving subjectively regular rhythm. Finally, the P-center phenomenon is discussed in connection with the predictions of this perceptual model and with current models of articulatory timing.

1. Introduction

Alternating sequences of syllables (Morton, Marcus & Frankish, 1976; Marcus, 1976) or of tone or noise bursts (Terhardt & Schutte, 1976) presented with equal temporal intervals between successive acoustic onsets are not perceived as having a subjectively uniform rhythm. This effect is generally attributed to the fact that the psychological onset of an acoustic event does not coincide with its acoustic onset, and this so-called P-center is not tied to any obvious acoustic marker in the signal. Different models have been proposed to account for the effect: the acoustic two-factor model of Marcus (1976, 1981), characterized as phonological by Cooper, Whalen & Fowler (1988), with the durations of the initial consonant and of the syllable's rhyme as determining parameters; the center-of-gravity model proposed by Howell (1988), which makes the P-center location depend on the global (spectro-)temporal distribution of energy within the whole syllable; and different versions of a psychoacoustic threshold model (Schutte, 1978; Kohlmann, 1984; Vos & Rasch, 1981). In the following, the predictions of these models are discussed in the light of the results of a series of experiments on subjectively uniform rhythm (cf. Pompino-Marschall, 1987, 1988; Pompino-Marschall, Tillmann & Kuhnert, 1987). Finally, a modified version of a "psychoacoustic threshold model" (cf. Pompino-Marschall, 1988) is proposed, and its implications are discussed in connection with observations from experiments on the production of rhythmically uniform sequences (Pompino-Marschall & Tillmann, 1987; Kuhnert, 1988).

0095-4470/89/030175 + 18 $03.00/0 © 1989 Academic Press Limited

2. Experiment 1

The purpose of the first experiment was to test directly the assumption underlying the model of Marcus (1976, 1981), namely the additivity of two linear factors that depend on segment durations. Therefore, the durations of the initial consonant and of the vowel were varied independently of one another in simple CV syllables.

2.1. Method

2.1.1. Stimuli

The stimuli consisted of five synthetic /ma/ continua. Within each continuum, the duration of the initial consonant was varied in steps of 40 ms from 40 to 200 ms; between continua, the duration of the vowel was varied, also in steps of 40 ms, from 100 to 260 ms, resulting in a total of 25 stimuli. Vowel duration always included a 40 ms transition. Fundamental frequency was set at 100 Hz for the entire syllable, and amplitude was held constant over the steady-state parts (42.5 dB for /m/, 52.7 dB for /a/), with a linear increase over the transitional part. The stimuli were synthesized with a program based on the Klatt software synthesizer (Klatt, 1980).

2.1.2. Procedure

On each trial, one of the stimuli (S) was presented to the subjects via headphones, alternating with a click (C; a 5 ms, 1 kHz tone burst) in sequences of five signals at an overall tempo of 120 signals per minute: C-S-C-S-C or S-C-S-C-S. The subjects had to adjust the temporal alignment of the click sequence relative to the test-syllable sequence to perceived isochrony by turning a potentiometer knob (adjustments were possible with an accuracy of 1 ms). The time instant bisecting the interval between two successive clicks, measured relative to the acoustic syllable onset, was taken as an indicator of the P-center location (cf. Fig. 1). In each session, every stimulus was adjusted six times in each of two types of sequence: those beginning with the test syllable and those beginning with the click. To prevent the inclusion of apparent misadjustments and to reduce overall variability, the two extreme adjustments within each trial of six adjustments were omitted from the analysis for each type of sequence. There were five sessions for every stimulus, resulting in 40 adjustments per item for each subject (5 sessions × 2 types of sequence × (6 − 2) adjustments).

[Figure 1: syllable sequence and click sequence on a common time axis, with the P-center of the syllable marked.]
Figure 1. Schematic display of the test sequences (see text for details).

2.1.3. Subjects

Three subjects experienced in rhythmicity tasks, one female and two male, took part in the experiment.

2.2. Results

The results pooled over the three subjects are shown in Table I. Since this experiment is part of a larger series (Pompino-Marschall, 1989) with analogously constructed materials (cf. Experiment 3 and below), its data, together with those of the other experiments of the series, were subjected to a multifactorial analysis of variance (cf. Appendix and Pompino-Marschall, 1988). Besides an unexpected significant difference between subjects (p < 0.001), which interacted significantly with the other effects (p < 0.001), there was a clear effect of initial consonant duration on P-center location (p < 0.001), as the two-factor model would predict: each 40 ms increment in consonant duration increased the departure from acoustic isochrony for all vowel durations in every subject's adjustments, without exception. Also in accordance with the duration-based model, the weaker effect of vowel duration was significant (p < 0.001) for all subjects, though with isolated exceptions at some consonant durations. (This was especially the case for one of the male subjects, who also accounted for much of the subject effect.) Contrary to the model of Marcus (1976, 1981), however, there was a significant interaction between the effects of consonant and vowel duration for all subjects (p < 0.001). Furthermore, all simple consonant and vowel effects showed strong non-linearities (cf. Pompino-Marschall, 1987, 1988).
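The additivity assumption tested in this experiment can be made concrete with a short sketch. The two-factor formula of Marcus (1976, 1981), given in Section 5 of this paper, implies that every 40 ms step in consonant duration shifts the P-center by 0.65 × 40 = 26 ms whatever the vowel duration, so a 2 × 2 interaction contrast between the extreme cells should be zero. The Python below applies that check to the pooled means of Table I. The function names are hypothetical, and the computation is a descriptive illustration only, not a substitute for the analysis of variance reported above.

```python
"""Descriptive check of the additivity assumption on the pooled means of Table I."""

# Pooled P-center means (ms) from Table I.
# Rows: initial consonant duration; columns: vowel duration.
CONSONANT_MS = [40, 80, 120, 160, 200]
VOWEL_MS = [100, 140, 180, 220, 260]
TABLE_I = [
    [54.8, 64.4, 73.9, 77.6, 79.6],
    [100.1, 113.2, 125.3, 127.5, 137.7],
    [137.7, 145.8, 154.9, 166.3, 174.1],
    [172.3, 172.6, 191.1, 198.6, 206.9],
    [197.5, 212.3, 218.4, 225.6, 238.5],
]


def marcus_prediction(c_ms, vc_ms, const=0.0):
    """Two-factor model: P = 0.65 * C + 0.25 * VC + const (VC = vowel only for CV syllables)."""
    return 0.65 * c_ms + 0.25 * vc_ms + const


def consonant_steps(table):
    """P-center shift produced by each 40 ms consonant increment, for every vowel column."""
    return [
        [round(table[r + 1][v] - table[r][v], 1) for v in range(len(table[r]))]
        for r in range(len(table) - 1)
    ]


def interaction_contrast(table):
    """Consonant effect at the longest vowel minus consonant effect at the shortest vowel.

    Any additive model (such as marcus_prediction) gives 0 for this contrast.
    """
    effect_long_vowel = table[-1][-1] - table[0][-1]
    effect_short_vowel = table[-1][0] - table[0][0]
    return round(effect_long_vowel - effect_short_vowel, 1)


if __name__ == "__main__":
    predicted = [[marcus_prediction(c, v) for v in VOWEL_MS] for c in CONSONANT_MS]
    print("model step per 40 ms of C:", consonant_steps(predicted)[0][0])
    print("measured steps, 100 ms vowel:", [row[0] for row in consonant_steps(TABLE_I)])
    print("interaction contrast, model:", interaction_contrast(predicted))
    print("interaction contrast, Table I:", interaction_contrast(TABLE_I))
```

With the pooled means, the measured steps in the 100 ms vowel column shrink from 45.3 ms to 25.2 ms instead of staying at 26 ms, and the interaction contrast is about 16 ms rather than 0, which mirrors the non-linearity and interaction reported above. Whether such differences are reliable is decided by the analysis of variance, not by this arithmetic.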
TABLE I. P-center locations relative to acoustical syllable onset in ms (means, with standard deviations in brackets). Rows: duration of the initial consonant (ms); columns: vowel duration (ms).

C (ms)   V = 100        V = 140        V = 180        V = 220        V = 260
40       54.8 (23.4)    64.4 (24.7)    73.9 (25.1)    77.6 (26.9)    79.6 (32.6)
80       100.1 (23.6)   113.2 (22.7)   125.3 (22.4)   127.5 (21.2)   137.7 (18.8)
120      137.7 (19.0)   145.8 (18.1)   154.9 (20.8)   166.3 (17.3)   174.1 (15.7)
160      172.3 (22.1)   172.6 (29.7)   191.1 (18.8)   198.6 (21.4)   206.9 (15.2)
200      197.5 (27.9)   212.3 (27.6)   218.4 (21.6)   225.6 (19.8)   238.5 (15.4)

[Figure 2: P-center location (ms) as a function of the duration of the initial segment (40 to 200 ms).]
Figure 2. Variation in P-center location due to the duration of the initial segment in /ma/ syllables (unbroken curve of best fit) and in 100 Hz square wave sounds of identical sound pressure envelope (dashed curve of best fit) with different durations of the final segment (for the non-speech stimuli: ○: 100 ms; ●: 140 ms; △: 180 ms; ▲: 220 ms; □: 260 ms; points represent the vowel-dependent values for the syllables, cf. Table I).

In order to test the predictions of a psychoacoustic threshold model based on the low-pass filtered sound pressure envelope (Schutte, 1978) and of the "center of gravity" model of Howell (as far as it is explicitly stated in Howell, 1988¹), the experiment was replicated using non-speech stimuli, namely 100 Hz square wave sounds with sound pressure envelopes identical to those of the /ma/ syllables, which sounded like a soft hum followed by a louder one. For these stimuli, both models predict the same acoustic anisochronies as for the syllables: the threshold of the filtered sound pressure envelope (16% of the local increase in sound pressure in the model of Schutte, 1978) would be crossed at the same point in time, and the timing of the center of gravity of the amplitude envelope would also remain the same. The overall results of this experiment, together with those for the /ma/ syllables, are depicted in Fig. 2. All effects noted for the speech material are present for these stimuli as well. However, the non-speech material also shows a significantly (p < 0.001) smaller overall degree of acoustic anisochrony, which argues against models based on the amplitude envelope alone.
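The argument above rests on the fact that both envelope-based models see only the amplitude envelope. The Python below is a deliberately simplified sketch of the two predictions for /ma/-like envelopes, using the levels given in Section 2.1.1. Assumptions, none of them taken from Schutte (1978) or Howell (1988): the envelope is sampled every 1 ms, the 40 ms transition is a ramp that is linear in amplitude, no low-pass filtering is applied, and the threshold is applied only to the /m/-to-/a/ rise. All function names are hypothetical.

```python
"""Envelope-only predictions for /ma/-like stimuli (simplified, illustrative)."""


def db_to_amplitude(level_db):
    """Convert a level in dB to a linear amplitude ratio."""
    return 10.0 ** (level_db / 20.0)


def ma_envelope(c_ms, v_ms, m_db=42.5, a_db=52.7, transition_ms=40):
    """Amplitude envelope sampled every 1 ms: /m/ plateau, 40 ms ramp, /a/ plateau.

    Assumption: the ramp is linear in amplitude (the text only says "linear increase").
    The transition is counted as part of the vowel, as in the stimulus description.
    """
    a_m, a_a = db_to_amplitude(m_db), db_to_amplitude(a_db)
    env = [a_m] * c_ms
    for k in range(v_ms):
        if k < transition_ms:
            env.append(a_m + (a_a - a_m) * k / transition_ms)
        else:
            env.append(a_a)
    return env


def center_of_gravity_ms(env):
    """Temporal center of gravity of a non-negative envelope (1 ms per sample)."""
    return sum(t * a for t, a in enumerate(env)) / sum(env)


def threshold_crossing_ms(env, rise_start_ms, fraction=0.16):
    """First sample at which the envelope exceeds `fraction` of the local rise
    starting at `rise_start_ms` (no low-pass filtering: a simplification)."""
    low = env[rise_start_ms]
    high = max(env[rise_start_ms:])
    target = low + fraction * (high - low)
    for t in range(rise_start_ms, len(env)):
        if env[t] >= target:
            return t
    return None


if __name__ == "__main__":
    for c_ms in (40, 200):
        env = ma_envelope(c_ms, 100)
        print(c_ms, threshold_crossing_ms(env, c_ms), round(center_of_gravity_ms(env), 1))
```

Neither function ever sees the carrier waveform, so a 100 Hz square wave with the same envelope necessarily receives the same two predictions. That is why the significantly smaller anisochrony measured for the non-speech stimuli counts against envelope-only accounts. In this simplified version, the threshold time moves 1:1 with consonant duration and not at all with vowel duration, whereas the center of gravity moves with both.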

¹ Although the model explicitly stated by Howell (1988) interprets differences in P-center location as due to differences in the timing of the center of gravity of the rectified sound pressure envelope, he notes that different parts of the sound pressure envelope (i.e. the different phonetic segments) may have to be weighted differently depending on their spectral content.

3. Experiment 2

The second experiment tested whether only the overall duration of the syllable's rhyme affects the P-center location of the syllable (cf. Cooper, Whalen & Fowler, 1988), or whether vowel and final consonant duration have different, interacting effects, parallel to those of initial consonant and vowel duration in Experiment 1.

3.1. Method

Analogously to Experiment 1, the vowel and final consonant durations of synthetic /am/ syllables were varied independently of one another: vowel duration again in steps of 40 ms from 100 to 260 ms, and final consonant duration, also in steps of 40 ms, from 80 to 240 ms. The procedure was identical to that of the first experiment. The same subjects participated again, except for the male subject who had shown the most divergent behavior in Experiment 1.

3.2. Results

The results pooled over the subjects are shown in Table II. As in Experiment 1, the data were subjected to a multifactorial analysis of variance, and again the effect of subjects and its interactions with the other effects were significant (p < 0.001). The effects of both vowel and final consonant duration (an increase in acoustic anisochrony with increasing duration) were significant for both subjects (p < 0.001). In line with the results reported by Cooper, Whalen & Fowler (1988), the P-center shifts induced by vowel and final consonant duration are far more variable than those due to initial consonant duration. As in Experiment 1, there was a significant interaction of the vowel and final consonant effects for both subjects, which argues against both the assumption of additive effects and the assumption of a simple effect of overall rhyme duration (Cooper, Whalen & Fowler, 1988) underlying the two-factor model. Here, non-linearities in the simple vowel and consonant effects are seen only for the male subject. As in Experiment 1, the experiment was replicated with 100 Hz square wave sounds with sound pressure envelopes identical to those of the syllable stimuli. The overall results of both parts of this experiment are shown in Fig. 3. Apart from the absence of non-linearities in the simple effects for the non-speech material, all effects are significant for both stimulus sets (cf. Appendix).
There is also a significant (p < 0.001) difference between the results for the speech and the non-speech material; in contrast to Experiment 1, however, the non-speech material here shows a greater overall degree of acoustic anisochrony.

4. Experiment 3

In this experiment (cf. Janker, 1987) we tested whether the P-center effect depends only on segment durations and is unaffected by the phonetic categorization of those segments.

TABLE II. P-center locations relative to acoustical syllable onset in ms (means, with standard deviations in brackets). Rows: duration of the final consonant (ms); columns: vowel duration (ms).

Final C (ms)   V = 100       V = 140       V = 180       V = 220       V = 260
80             20.5 (13.3)   24.8 (10.8)   31.8 (12.0)   41.4 (12.1)   52.6 (20.8)
120            20.5 (11.4)   29.5 (10.7)   38.0 (14.2)   52.4 (16.9)   46.5 (12.6)
160            27.5 (11.8)   34.7 (11.5)   50.0 (17.0)   50.0 (13.8)   60.5 (16.6)
200            32.8 (14.1)   44.5 (15.9)   46.3 (16.9)   56.5 (17.1)   61.3 (18.0)
240            45.0 (19.1)   44.7 (16.0)   56.4 (14.5)   61.4 (16.6)   72.5 (18.5)

[Figure 3: P-center location (ms) as a function of the duration of the initial segment (100 to 260 ms).]
Figure 3. Variation in P-center location due to the duration of the initial segment in /am/ syllables (unbroken curve of best fit) and in 100 Hz square wave sounds of identical sound pressure envelope (dashed curve of best fit) with different durations of the final segment (for the non-speech stimuli: ○: 80 ms, ●: 120 ms, △: 160 ms, ▲: 200 ms, □: 240 ms; points represent the consonant-dependent values for the syllables, cf. Table II).

4.1. Method

Parallel to Experiment 1, the durations of the initial consonant and of the vowel were varied independently of one another in a series of synthetic /ʃi/ syllables. The fundamental frequency of the vocalic part was set at 100 Hz, and the sound pressure envelope of the /ʃi/ syllables was identical to that of the /ma/ syllables. The procedure and the subjects were identical to those of Experiment 1.

4.2. Results

The results pooled over the three subjects are shown in Table III. Contrary to the predictions of the duration-based model, an analysis of variance revealed a significant difference (p < 0.001) between the results of this experiment and those of Experiment 1. The general trends, however, agree: again there is a highly significant effect of initial consonant duration for all subjects (p < 0.001). Without any exception, longer initial consonants result in larger P-center shifts. The weaker and less clear-cut vowel effect is significant (p < 0.001) for only two subjects. The male subject with the most divergent behavior in Experiment 1 (= PJ) failed to show this effect, but together with the other male subject he shows a significant interaction of initial consonant and vowel duration. Also parallel to the first experiment, there are clear non-linearities in the simple consonant and vowel effects for all subjects.

TABLE III. P-center locations relative to acoustical syllable onset in ms (means, with standard deviations in brackets). Rows: duration of the initial consonant (ms); columns: vowel duration (ms).

C (ms)   V = 100        V = 140        V = 180        V = 220        V = 260
40       35.6 (17.8)    43.7 (15.3)    45.3 (15.0)    50.4 (16.7)    60.7 (16.3)
80       77.4 (30.9)    74.8 (23.0)    84.7 (22.1)    87.7 (20.5)    93.4 (18.8)
120      117.3 (36.4)   126.6 (36.4)   130.9 (33.7)   133.5 (27.8)   141.8 (23.5)
160      162.9 (35.3)   169.1 (30.7)   174.4 (30.1)   179.3 (26.4)   184.5 (21.1)
200      199.7 (25.8)   205.4 (23.2)   211.2 (23.0)   214.6 (19.5)   226.2 (18.2)

To test the relevance of the sound pressure envelope of speech signals for P-center location in the way intended by Tuller & Fowler (1981) and Fowler, Whalen & Cooper (1988; cf. General discussion), this experiment was replicated with /ʃi/ syllables of identical durational composition but with a rectangular sound pressure envelope. This rectangular envelope was achieved by increasing the sound pressure level of /ʃ/ and decreasing that of /i/ to an average value. The overall results for both parts of the experiment are again depicted in Fig. 4. Again, the second stimulus set shows a significantly (p < 0.001) different P-center shift due to the segmental variation, this time on average weaker than for the syllables with natural sound pressure envelopes. As in the first part of the experiment, only the male subjects show a significant interaction (p < 0.001; subject PJ: p < 0.01) between the significant effects of consonant duration (p < 0.001, all subjects) and vowel duration (p < 0.001, except subject PJ).

[Figure 4: P-center location (ms) as a function of the duration of the initial consonant (40 to 200 ms).]
Figure 4. Variation in P-center location due to the duration of the initial consonant in quasi-natural /ʃi/ syllables (unbroken curve of best fit) and in /ʃi/ syllables with rectangular sound pressure envelope (dashed curve of best fit) with different durations of the vowel (for the stimuli with rectangular sound pressure envelope: ○: 100 ms, ●: 140 ms, △: 180 ms, ▲: 220 ms, □: 260 ms; points represent the vowel-dependent values for the quasi-natural syllables, cf. Table III).

5. Discussion of the results

At first sight, the experimental results seem to agree quite well with the predictions of the duration-based model originally proposed by Marcus (1976, 1981): there clearly is a strong effect of the duration of the initial consonant and a weaker one of the duration of the syllable's rhyme, as predicted by his formula

$$P = 0.65 \times C + 0.25 \times \mathit{VC} + \text{const.}$$

where P is the P-center location measured from syllable onset, C the duration of the initial consonant, VC the duration of the vowel plus final consonant, and const. a constant. But this picture is only a consequence of the experimental design and of analyses that take this model as their starting point. On closer inspection, the following results clearly contradict this model of the P-center effect: (1) the influences of segment durations on P-center position are not independent of one another; (2) the syllable's rhyme does not affect its P-center as a unit; (3) in the majority of cases these effects are not linear; and (4) the effects are not peculiar to speech and, furthermore, are not independent of the spectral make-up of the stimuli tested. In the light of further experimental results, the last point deserves some comment. As shown by Cooper, Whalen & Fowler (1986), P-center position is not affected by category boundaries within phonetic continua, but only by the duration of the initial consonants within these continua, in a continuous 1:1 fashion (i.e. each 1 ms increase in the duration of the initial consonant resulted in a 1 ms shift in P-center location). We were able to replicate this result with a /sta/-/spa/ series with variable stop closure duration and variable plosive-vowel transitions (Pompino-Marschall, 1987). A parallel result, however, was also obtained with a /sta/-/spa/ series without durational variation: compared with the first series, the continuous acoustic variation, restricted here to the direction of the plosive-vowel transition, resulted in a continuous P-center shift of half the magnitude (Pompino-Marschall, 1987). Taken together, these results call for an explanation based on psychoacoustic parameters rather than on phonetic/phonological segment durations.

6. The model

A preliminary version of this model was proposed in Pompino-Marschall (1988) for the results of the experiments with CV syllables. Because of the significant differences between the results for the /ma/ syllables and their non-speech analogues with identical sound pressure envelopes (which differ only in the spectral make-up of the stimuli), the model was based on the time course of specific loudness (i.e. the time function of loudness within each critical band [sone/Bark]; cf. Zwicker & Feldtkeller, 1967). The processing stages of the model are shown schematically in Fig. 5. To obtain the critical-band amplitude levels needed for the loudness calculation every 15 ms, the acoustic input signal is first subjected to discrete Fourier analyses (DFT) with three different time windows (Hamming windows) depending on the frequency range: a 60 ms window up to 500 Hz, a 30 ms window from 500 to 1500 Hz, and a 15 ms window above 1500 Hz. This approximates the frequency-dependent frequency resolution of the human ear. From these data, the intensities within each critical band are calculated by pooling over the corresponding spectral lines. The resulting signals are first linearly and then log-linearly low-pass filtered according to Karjalainen (1987; the formulas are given in Fig. 5) to model temporal integration and masking effects. After dB-scaling and smoothing, the signals are subjected to loudness calculation according to Paulus & Zwicker (1972). The stages of the P-center calculation proper are depicted in detail in Fig. 6. In the first stage, the time function of specific loudness in each individual critical band is analysed for steps within its rising flank that exceed 12% of the maximal loudness (L_max). For each such step, a "partial (onset) event" is registered at the time (T_i) at which the loudness function reaches 40% of the local increment [Fig. 6(a)].
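The first stage of the P-center calculation, the detection of partial onset events in one critical band, can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used for the simulations: a "step" is taken to be a maximal run of strictly increasing loudness values, L_max is taken as the maximum of the band's own loudness series, and T_i is found by linear interpolation between 15 ms frames. The function names and the example loudness values are hypothetical.

```python
"""Partial onset events in one critical band (illustrative sketch)."""

FRAME_MS = 15.0  # one specific-loudness value every 15 ms, as in the text


def rising_runs(loudness):
    """Maximal runs of strictly increasing values, as (start_index, end_index) pairs."""
    runs, i, n = [], 0, len(loudness)
    while i < n - 1:
        if loudness[i + 1] > loudness[i]:
            j = i
            while j < n - 1 and loudness[j + 1] > loudness[j]:
                j += 1
            runs.append((i, j))
            i = j
        else:
            i += 1
    return runs


def partial_onset_events(loudness, step_frac=0.12, level_frac=0.40, frame_ms=FRAME_MS):
    """Return (T_i in ms, loudness increment) for every rising step larger than
    step_frac * L_max; T_i is where the curve reaches level_frac of the step,
    found by linear interpolation between frames (an assumption)."""
    l_max = max(loudness)  # assumption: L_max taken per band
    events = []
    for start, end in rising_runs(loudness):
        low, high = loudness[start], loudness[end]
        increment = high - low
        if increment <= step_frac * l_max:
            continue  # step too small to count as a partial event
        target = low + level_frac * increment
        for k in range(start, end):
            if loudness[k] <= target <= loudness[k + 1]:
                frac = (target - loudness[k]) / (loudness[k + 1] - loudness[k])
                events.append(((k + frac) * frame_ms, increment))
                break
    return events


if __name__ == "__main__":
    band = [0, 0, 2, 10, 10, 10, 4, 4, 6, 4, 5]  # made-up specific loudness values (sone)
    print(partial_onset_events(band))
```

In this made-up band, the 10-sone and 2-sone rises produce partial events at about 34 ms and 111 ms, while the final 1-sone rise stays below 12% of L_max and is ignored.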
In the next stage, the "partial events" of all critical bands are weighted according to the following formula [Fig. 6(c)]:

$$W_i = L_i \times \exp\left[-(T_{\max} - T_i)/\tau\right] \qquad (1)$$

[Figure 5: flow diagram. Acoustic speech wave input; Hamming windows of 60, 30 and 15 ms; DFT over the 1st to 30th spectral line (60 ms), the 16th to 45th spectral line (30 ms) and the 23rd to 78th spectral line (15 ms), with the frequency axis marked at 16.7, 500, 1500 and 5266.7 Hz; mean values for the critical bands (up to 19 Bark); low-pass filtering according to:]

$$X_5(n) = 0.15 \times X_4(n) + 0.85 \times X_4(n-1), \qquad X_4(n) \ge X_4(n-1)$$

$$X_5(n) = X_5(n-1) \times \exp\left(0.21 \times \log\left[X_4(n)/X_5(n-1)\right]\right), \qquad X_4(n) < X_4(n-1)$$

Figure 5. Processing stages of the psychoacoustic model of P-center detection (see text for details).

In the last step, the weighted "partial events" are integrated into a single "syllable onset" event [Fig. 6(d)] by the center-of-gravity formula

$$SO = \frac{\sum_i W_i T_i}{\sum_i W_i} \qquad (2)$$
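Equations (1) and (2) can be sketched in a few lines of Python. The definitions of L_i and T_max are not fully preserved in this excerpt, so two assumptions are made here: L_i is taken to be the loudness increment of partial event i (as returned by a detector like the one sketched for the first stage), and T_max is simply passed in by the caller. The function names are hypothetical; τ = 50 ms is the value reported below for Experiment 1.

```python
"""Equations (1) and (2): weighting of partial events and their center of gravity."""
import math

TAU_MS = 50.0  # time constant reported in the text for Experiment 1


def weight_events(events, t_max_ms, tau_ms=TAU_MS):
    """Eq. (1): W_i = L_i * exp(-(T_max - T_i) / tau) for events given as (T_i, L_i)."""
    return [(t_i, l_i * math.exp(-(t_max_ms - t_i) / tau_ms)) for t_i, l_i in events]


def center_of_gravity(weighted_events):
    """Eq. (2): SO = sum(W_i * T_i) / sum(W_i)."""
    total_weight = sum(w_i for _, w_i in weighted_events)
    if total_weight == 0:
        raise ValueError("no weighted events to integrate")
    return sum(t_i * w_i for t_i, w_i in weighted_events) / total_weight


if __name__ == "__main__":
    # Two made-up partial events of equal size from different critical bands.
    events = [(10.0, 1.0), (60.0, 1.0)]
    weighted = weight_events(events, t_max_ms=60.0)
    print(round(center_of_gravity(weighted), 1))
```

With T_max set to the later event (an illustrative choice), the exponential factor gives the earlier event only e^-1 of the weight of the later one, so the integrated "syllable onset" lands at about 46.6 ms rather than at the unweighted mean of 35 ms.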

For the simulation of the results of Experiment 1 and of those of the non-durational variation in a /sta/-/spa/ continuum, a value of 50 ms for the time constant τ yielded fairly good results. Table IV contrasts the predicted and measured P-center shifts due to the durational variations for the /ma/ syllables and their non-speech analogues of Experiment 1. The predicted P-center locations clearly agree well with the experimental results (cf. the standard deviations in Table I). Furthermore, the model correctly predicts the overall smaller degree of acoustic anisochrony for the non-speech material. This latter effect arises because, in this material, the distribution of spectral energy does not shift from lower to higher critical bands, as it does in the natural syllables between the /m/ and the /a/ portion of the acoustic signal; for the non-speech material, therefore, more partial events are registered at stimulus onset, and these are integrated into an earlier "syllable onset".

[Figure 6: panels (a) to (d) on a common time axis t (ms); panel (d) marks SO and SCG.]
Figure 6. Steps in the calculation of the psychological moment of "syllable onset" (SO) and of the "syllabic center of gravity" (SCG): (a) determination of "partial events" from the specific loudness time series, (b) "partial events" shown enlarged (offsets marked by dashed lines), (c) weighting of the "partial events", and (d) the "syllabic events" SO and SCG resulting from the application of the integration procedure to the weighted "partial events" (see text for details).

TABLE IV. Measured and predicted P-center locations (ms) for the /ma/ syllables and their non-speech analogues; each cell gives measured / predicted. Rows: duration of the initial consonant (ms); columns: vowel duration (ms).

(a) /ma/ syllables
C (ms)   V = 100         V = 140         V = 180         V = 220         V = 260
40       54.8 / 54.5     64.4 / 64.6     73.9 / 69.2     77.6 / 72.6     79.6 / 80.3
80       100.1 / 94.4    113.2 / 105.9   125.3 / 110.9   127.5 / 117.0   137.7 / 121.8
120      137.7 / 136.9   145.8 / 148.2   154.9 / 153.7   166.3 / 157.1   174.1 / 165.9
160      172.3 / 179.6   172.6 / 189.8   191.1 / 194.9   198.6 / 198.4   206.9 / 205.8
200      197.5 / 220.8   212.3 / 231.5   218.4 / 236.3   225.6 / 239.3   238.5 / 247.3

(b) Non-speech analogues
C (ms)   V = 100         V = 140         V = 180         V = 220         V = 260
40       44.2 / 46.1     55.1 / 57.5     61.4 / 64.5     67.4 / 69.0     66.4 / 72.0
80       87.2 / 79.3     107.0 / 94.2    112.6 / 102.4   116.9 / 107.5   119.6 / 110.8
120      127.4 / 120.7   144.8 / 137.1   153.9 / 145.5   160.2 / 150.6   161.6 / 154.0
160      170.4 / 165.4   177.6 / 180.8   184.8 / 189.1   199.5 / 194.1   198.5 / 197.2
200      191.5 / 209.7   219.3 / 224.4   223.2 / 232.2   231.2 / 237.1   227.0 / 240.2

To cope with the results for the VC syllables (and their non-speech analogues), the model only has to be extended to take account of the falling flanks within the time functions of specific loudness as well (which may also reduce the remaining unexplained variance in the CV data). Analogously to the analysis of the rising flanks, "partial events" are also registered at the time T_i at which a falling step exceeds 40% of its range. In a test version of the model, these "partial events" are preweighted by a factor of 0.5 [Fig. 6(b)] before entering the weighting process proper [Fig. 6(c)]. Integrated together with the "partial onset events" according to formula (2), they yield the "syllabic center of gravity" [Fig. 6(d)]. Here, again in agreement with the experimental results, the model would correctly predict a greater shift in P-center location for the non-speech analogues of the VC syllables: because their spectral energy distribution is constant, unlike that of the natural material, partial events in the higher critical bands would also be registered at stimulus offset and not only at vowel offset. The integration process would therefore yield a later "center of gravity". At the moment this last point can only be stated qualitatively, because more experiments are needed to determine exactly how the partial events are weighted and integrated. To come closer to answering, for example, whether partial onset and offset events are weighted and integrated independently of one another before their integration into the "syllabic center of gravity", experiments with syllables beginning with /ʃ/ plus stop clusters are in preparation. Although the exact weighting factors and the exact type of integration (integration of all "partial events" vs. integration of "syllable onset" and "syllable offset", etc.) still have to be determined experimentally, it can be shown that the model can cope with all P-center phenomena reported so far.
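The test version described above, in which offset events are preweighted by 0.5 and then integrated together with the onset events, can be sketched as a small extension of equations (1) and (2). As before, this rests on assumptions: the (T_i, L_i) event pairs are made up, L_i is treated as the event's loudness change, and T_max is passed in by the caller (the example uses the time of the latest partial event purely for illustration). The function names are hypothetical, and, as the text stresses, the exact weighting and integration are still open.

```python
"""Syllable onset (SO) vs. syllabic center of gravity (SCG), test-version weighting."""
import math


def weighted(events, t_max_ms, tau_ms=50.0, preweight=1.0):
    """Eq. (1) with an extra preweight factor: preweight * L_i * exp(-(T_max - T_i) / tau)."""
    return [(t, preweight * l * math.exp(-(t_max_ms - t) / tau_ms)) for t, l in events]


def integrate(weighted_events):
    """Eq. (2): center of gravity of the weighted events."""
    total = sum(w for _, w in weighted_events)
    return sum(t * w for t, w in weighted_events) / total


def so_and_scg(onsets, offsets, t_max_ms, tau_ms=50.0, offset_preweight=0.5):
    """SO integrates onset events only; SCG integrates onset and offset events together,
    with offset events preweighted (0.5 in the test version of the model)."""
    w_on = weighted(onsets, t_max_ms, tau_ms)
    w_off = weighted(offsets, t_max_ms, tau_ms, preweight=offset_preweight)
    return integrate(w_on), integrate(w_on + w_off)


if __name__ == "__main__":
    onsets = [(0.0, 1.0), (40.0, 1.0)]  # made-up partial onset events (ms, size)
    offsets = [(120.0, 1.0)]            # made-up partial offset event
    so, scg = so_and_scg(onsets, offsets, t_max_ms=120.0)
    print(round(so, 1), round(scg, 1))
```

In this toy case SO lies at roughly 28 ms and SCG at roughly 86 ms. Raising the offset preweight toward 1 pulls SCG later still, which is the kind of free parameter the planned experiments would have to pin down.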

7. General discussion

7.1. Predictions of the model

The experimental results reported above can be taken as arguments against a model based on segment durations and, together with the evidence of similar phenomena in non-speech material (Howell, 1984; Schutte, 1978; Terhardt & Schutte, 1976; Vos & Rasch, 1981), seem to call for a psychoacoustic explanation of P-center phenomena. A model is proposed that is based on adaptive thresholds within the time course of specific loudness and that integrates "partial events" into the events of "syllable onset" and "syllabic center of gravity". Besides its good approximation of the results of Experiment 1, this last feature of the model, the calculation of two distinct events, "syllable onset" and "syllabic center of gravity", which distinguishes it from all other psychoacoustic models proposed so far, seems very reasonable for several reasons. (1) Given the reported difficulties in determining P-center locations with inexperienced subjects (Marcus, 1976; Cooper, Whalen & Fowler, 1988), one may question the assumption that a single, definite point in time underlies the subjects' behavior. (2) Similarly, results for variation of the syllable rhyme are reported to be much more variable than those for variation of initial consonant duration (Cooper, Whalen & Fowler, 1988); this can be explained by a psychologically weaker "syllabic center of gravity" compared with the "syllable onset". (3) This feature of the model could also explain an effect observed in experiments on the production of rhythmically uniform syllable sequences (Pompino-Marschall & Tillmann, 1987): the P-center location of the complex German monosyllabic words /pakst/ and /bakst/, produced in time with a metronome beat in alternating sequences with /bak/ or /pak/, differed significantly depending on which item the sequence began with, the P-center being later for these items in sequences beginning with the simple syllables /bak/ or /pak/.
According to the model, this could be explained as regular timing of the "syllable onset" of /pakst/ or /bakst/ when the subject starts with these complex syllables, in contrast to regular timing of their "syllabic centers of gravity" when the subject starts with the simple syllables /bak/ or /pak/. For the latter syllables no position-dependent timing is seen, because for them the model predicts rather small time differences between "syllable onset" and "center of gravity". To test the psychological reality of the two events predicted by the model, we have started experiments with inexperienced subjects, contrasting selected items of our stimulus material with different time discrepancies between "syllable onset" and "syllabic center of gravity" (i.e. /ma/ syllables with short segment durations, giving small discrepancies, and /am/ syllables with long segment durations, giving large discrepancies).

7.2. The P-center phenomenon and articulatory timing

In all her publications, Fowler (1979, 1986), together with her colleagues (Cooper et al., 1986, 1988; Fowler et al., 1988; Tuller & Fowler, 1980, 1981), characterizes the P-center effect as an articulatory one, originating in the fact that "listeners extract information from the acoustic signal that specifies articulatory timing" (Fowler et al., 1988, p. 94). In their critique of Howell (1988), Fowler et al. (1988) cite a number of perceptual results in favour of their view. In the following, before articulatory timing proper is discussed in connection with the P-center phenomenon, these experimental results will be compared with the predictions of the psychoacoustic model proposed above. First, they cite Marcus (1976, 1981), who failed to find a P-center shift due to an increase in the amplitude of the word-final stop burst in the stimulus "eight", in contrast to a significant effect of lengthening the closure duration of this stop. Our model would make roughly the same predictions here: because of the spectral characteristics of the stop burst, the amplitude variation would only increase the strength of the partial events in the higher critical bands, which is likely to be negligible for the computation of the "syllabic center of gravity", whereas a later occurrence of the partial events of the unaltered burst is much more likely to change the outcome of the integration process. In other experiments (Tuller & Fowler, 1981; Fowler, Cooper & Whalen, 1988), radical changes in the amplitude contour of syllables, though never really "infinitely peak clipped", as Howell (1988) correctly observed, did not result in changes of perceived timing. Concerning the study of Tuller & Fowler (1981), one has to agree with Howell (1988) that the measure of perceived timing was a very crude one: the subjects were presented with alternating sequences (of natural and manipulated syllables) that were acoustically isochronous and with others that followed the timing pattern of the naturally produced sequences, and they only had to choose the more regular-sounding sequence. For both natural and amplitude-manipulated stimuli, the sequences with the produced anisochronies sounded more regular than the acoustically isochronous sequences. In the follow-up study of Fowler, Cooper & Whalen (1988), specified portions of natural /ba/ and /sa/ syllables were scaled up (again not "peak clipped") to use the full range of the D/A converter.
These stimuli, together with the natural ones, were combined with natural /ba/ in alternating sequences for an adjustment experiment. The only significant effect found here was that between the two syllable pairs (/ba/-/ba/ vs. /ba/-/sa/). Although Fowler et al. (1988) concede that with more data (they used 12 adjustments per stimulus and three subjects) the differences between the differently manipulated /sa/ stimuli might also emerge as significant, they take the outcome of this experiment as an argument in favour of their articulatory-timing hypothesis. Data on the production of their natural stimuli in rhythmically regular sequences are, alas, missing. Setting aside the question of in what respect the manipulated stimuli of Fowler et al. (1988) can be said to map articulatory timing directly, we can compare their results, and those of Tuller & Fowler (1981), directly with those of the two parts of our Experiment 3. There, in contrast, we obtained a significant effect of the manipulation of the sound pressure envelope (using 40 adjustments and three subjects): the stimuli with a rectangular sound pressure envelope yielded less acoustic anisochrony, as the model predicts on account of an enhancement of syllable-initial partial events. Finally, the proposed model would not contradict the production data reported by Fowler (1979) for alternating sequences of prevoiced and voiced stops, which showed near isochrony of stop release. In our model, the prevoicing would result in weak partial events at the acoustic syllable onset; integrated with those near the stop release (which in turn would be weaker than those of the voiced stops), these would yield an earlier but weaker "syllable onset", more strongly affected by the partial offset events than in the case of the voiced stops. So it might easily be the case that both effects cancel each other, yielding equal "syllabic centers of gravity".
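The notion of a "syllabic center of gravity", and the cancellation suggested above for prevoiced stops, can be illustrated with a minimal sketch. It assumes that a center of gravity is simply the loudness-weighted mean time of a band-by-frame loudness pattern; the model discussed in this paper additionally extracts partial events per critical band and integrates them, which is not reproduced here. The function name, the 10 ms frame step, and all loudness values are hypothetical and were chosen so that the two effects cancel exactly.

```python
from typing import Sequence


def temporal_center_of_gravity(loudness: Sequence[Sequence[float]], frame_ms: float) -> float:
    """Loudness-weighted mean time (ms) of a band-by-frame loudness pattern.

    loudness[b][t] is a hypothetical specific-loudness value for critical band b
    in analysis frame t; frame t is taken to lie at t * frame_ms.
    """
    total = 0.0
    weighted_time = 0.0
    for band in loudness:
        for t, value in enumerate(band):
            if value < 0:
                raise ValueError("loudness values must be non-negative")
            total += value
            weighted_time += value * t * frame_ms
    if total == 0:
        raise ValueError("the pattern contains no loudness")
    return weighted_time / total


# Invented patterns (rows: low band, high band; 10 ms frames).
# Voiced stop: strong release at frame 2, vowel from frame 3 onward.
voiced = [[0, 0, 0, 2, 2, 2, 2, 2],
          [0, 0, 5, 0, 0, 0, 0, 0]]
# Prevoiced stop: weak low-band energy at frame 0 and a weaker release at frame 2.
prevoiced = [[1, 0, 0, 2, 2, 2, 2, 2],
             [0, 0, 3, 0, 0, 0, 0, 0]]

print(temporal_center_of_gravity(voiced, 10.0))     # 40.0
print(temporal_center_of_gravity(prevoiced, 10.0))  # 40.0
```

Adding weak early energy pulls the weighted mean earlier, while weakening the release (which lies before the mean) pulls it later; with these invented values the two shifts cancel, so both patterns give 40.0 ms. Other values would not cancel exactly, which is why the argument only claims that such a cancellation might occur.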
Since, to my knowledge, there are no experimental results clearly contradicting the predictions of the psychoacoustic model proposed here, I would, in contrast to Fowler and coworkers, formulate the hypothesis that in the production of rhythmically regular sequences, subjects control the acoustic result of their articulation to achieve perceptual regularity. As already noted above in connection with our own production experiments, a conflict may emerge, in accordance with this perceptual model of the P-center phenomenon, between the control of the "syllable onset" and that of the "center of gravity". As Fowler et al. (1988) correctly remark, psychoacoustic and articulatory accounts of the P-center phenomenon need not conflict in principle. Therefore, we should finally take a closer look at the experimental evidence in favour of the claim that talkers "produce precisely the measured acoustic anisochronies that listeners require to hear the sequences as isochronous" (Fowler et al., 1988, p. 94; italics mine) and that it is articulatory vowel timing that underlies the P-center phenomenon. Since the outcome of two-alternative choices between sequences as produced and sequences artificially altered to be acoustically isochronous, as mentioned above, cannot be taken as a strong argument in favour of an articulatory explanation of the P-center, we are left, concerning this question, with the results of the EMG study of Tuller & Fowler (1980). They measured orbicularis oris activity during the rhythmically regular production of sequences containing the syllables (1) /bak/ and /fak/, (2) /duk/ and /suk/, and (3) /dup/ and /sup/. Although the acoustic measurements revealed the expected anisochronies (longer intervals from the onset of the fricative-initial syllable to the stop-initial syllable than from the stop-initial to the fricative-initial syllable), the onset of the EMG activity corresponding to the initial consonant (in material 1), to the vowel (2), or to the vowel and the final consonant (3) was equally delayed with respect to the final stop release of the preceding syllable for both syllable types.
Since in perceptually isochronous productions of syllables with single consonants and with consonant clusters the articulation of the initial consonant cannot be isochronous (cf. Fowler & Tassinary, 1981), it is concluded that it is the timing of vowel articulation that is kept constant. In view of the results of some recent experiments in our laboratory (Kühnert, 1988), and because of the temporal regularity found for all articulations in the Tuller & Fowler (1980) study, I would be inclined to doubt the correctness of this conclusion. First, the usual technique of EMG averaging may well have canceled out interesting effects in their data. As we have shown elsewhere (Pompino-Marschall & Tillmann, 1987), different effects are sometimes seen when different realizations of one syllable are compared than when phonologically different syllables are compared. Secondly, in her recent study, analysing single EMG recordings of the orbicularis oris and the anterior belly of the digastric, corresponding to the consonant and vowel articulations in alternating sequences containing the syllables /pak/, /plak/, /prak/, /pfak/, and /pflak/, Kühnert (1988) was not able to find any stable timing of either consonant or vowel articulation. But one effect in her data is especially worth mentioning: comparing the articulatory timing of the plosive-initial syllables with that of the syllables beginning with the affricate, one can see that while orbicularis oris activity starts earlier in the case of the affricate, the onset of digastric activity, and even more so the point of its maximal activity, is delayed. So we have to conclude that neither consonant nor vowel articulation seems to be isochronous. This last effect, on the other hand, strongly resembles the observations made by Browman & Goldstein (1989) on the articulatory organization of initial consonant clusters with respect to the timing of the offset of the following vowel.
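The C-center landmark attributed in this section to Browman & Goldstein (1989), namely the arithmetic mean of the times at which the syllable-initial consonant gestures are at their goal positions, held at a fixed interval from the offset of the following vowel, can be made concrete in a few lines. This is only a sketch of the resulting timing landmark, not of the task-dynamic model (phase-locking, stiffness adjustment) that is said to produce it; the function names and all gesture times are invented for illustration, with /pak/ and /plak/ borrowed from the stimulus set of Kühnert (1988).

```python
from statistics import mean
from typing import Sequence

VOWEL_OFFSET_MS = 300.0  # hypothetical vowel offset, identical for both syllables


def c_center(target_times_ms: Sequence[float]) -> float:
    """Arithmetic mean of the times (ms) at which the initial consonant gestures reach their goals."""
    if not target_times_ms:
        raise ValueError("at least one consonant gesture is required")
    return mean(target_times_ms)


def c_center_to_vowel_offset(target_times_ms: Sequence[float], vowel_offset_ms: float) -> float:
    """Interval (ms) from the C-center to the offset of the following vowel."""
    return vowel_offset_ms - c_center(target_times_ms)


# Invented goal-attainment times: /p/ alone vs. /p/ followed by /l/.
for name, targets in (("/pak/", [100.0]), ("/plak/", [70.0, 130.0])):
    print(f"{name}: first consonant at {targets[0]} ms, "
          f"C-center to vowel offset {c_center_to_vowel_offset(targets, VOWEL_OFFSET_MS)} ms")
```

Both syllables give a C-center-to-vowel-offset interval of 200.0 ms, yet the first consonant of /plak/ reaches its target 30 ms earlier than that of /pak/: an invariant C-center implies that the initial consonant itself is not isochronous.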
But the numerical results of the Kühnert (1988) study seem to be interpretable as a readjustment of supraglottal articulation with respect to a relatively invariantly timed glottal devoicing gesture (cf. Hoole, Pompino-Marschall & Dames, 1984) rather than in terms of the task-dynamic description of Browman & Goldstein (1989), which assumes a phase-locking between the moment when the articulatory gesture for the first syllable-initial consonant reaches its target and the beginning of the vowel gesture, with the stiffness parameter of the latter gesture adjusted so as to yield a fixed timing of their so-called C-center (the arithmetic mean of the moments in time of all initial consonant gestures at their goal positions) with respect to the offset of the following vowel.

To conclude, I would entirely agree with Fowler (1986) that speech perception is special and that what is perceived as the "distal event" is the articulating vocal tract, but I would question the "directness" of this perception. In my opinion, the role of psychoacoustics in guiding the recovery of articulation from the acoustic speech signal is underestimated in most accounts of speech perception. As far as the P-center phenomenon is concerned, particularly when consideration is not restricted to speech material, we would prefer an explanation based on psychoacoustic effects, as psychologically simpler (and more general), to the articulatorily/phonologically based model envisaged by Fowler and her coworkers. At any rate, the experimental evidence directly supporting a model of the type proposed by this group seems very weak.

This research was supported by German Research Council Grant Ti 69/22-2. I want to thank the anonymous reviewers for their valuable comments on earlier drafts of the paper.

References

Browman, C. P. & Goldstein, L. (1989) Some notes on syllable structure in articulatory phonology. In Articulatory organization: phonology to speech signals (O. Fujimura, editor), Basel: Karger (in press).
Cooper, A. M., Whalen, D. H. & Fowler, C. A. (1986) P-centers are unaffected by phonetic categorization, Perception and Psychophysics, 39, 187-196.
Cooper, A. M., Whalen, D. H. & Fowler, C. A. (1988) The syllable's rhyme affects its P-center as a unit, Journal of Phonetics, 16, 231-241.
Fowler, C. A. (1979) "Perceptual centers" in speech production and perception, Perception and Psychophysics, 25, 375-388.
Fowler, C. A. (1986) An event approach to the study of speech perception from a direct-realist perspective, Journal of Phonetics, 14, 3-28.
Fowler, C. A. & Tassinary, L. (1981) Natural measurement criteria for speech: the anisochrony illusion. In Attention and performance IX (J. Long & A. Baddeley, editors), pp. 521-535, Hillsdale, NJ: Erlbaum.
Fowler, C. A., Whalen, D. H. & Cooper, A. M. (1988) Perceived timing is produced timing: a reply to Howell, Perception and Psychophysics, 43, 94-98.
Hoole, P., Pompino-Marschall, B. & Dames, M. (1984) Glottal timing in German voiceless occlusives. In Proceedings of the 10th international congress of phonetic sciences (M. P. R. Van den Broecke & A. Cohen, editors), pp. 399-403, Dordrecht-Cinnaminson: Foris.
Howell, P. (1984) An acoustic determinant of perceived and produced anisochrony. In Proceedings of the 10th international congress of phonetic sciences (M. P. R. Van den Broecke & A. Cohen, editors), pp. 429-433, Dordrecht-Cinnaminson: Foris.
Howell, P. (1988) Prediction of P-center location from the distribution of energy in the amplitude envelope: I & II, Perception and Psychophysics, 43, 90-93 & 99.
Janker, P. M. (1987) Experimentelle Untersuchungen zum P-center Effekt. Unpublished M.A. thesis, Universität München.
Karjalainen, M. (1987) Auditory models for speech processing. In Proceedings of the 11th international congress of phonetic sciences, Vol. 2, pp. 11-20. Tallinn: Academy of Science of the Estonian S.S.R.
Klatt, D. H. (1980) Software for a cascade/parallel formant synthesizer, Journal of the Acoustical Society of America, 67, 971-995.
Kohlmann, M. (1984) Rhythmische Segmentierung von Schallsignalen und ihre Anwendung auf die Analyse von Sprache und Musik. Dr.-Ing. thesis, Technische Universität München.
Kühnert, B. (1988) Der Einfluß konsonantischer und vokalischer Komponenten bei der Artikulation isochroner Sequenzen (Eine elektromyographische Untersuchung). Unpublished M.A. thesis, Universität München.
Marcus, S. M. (1976) Perceptual centres. Unpublished doctoral thesis, Cambridge University.
Marcus, S. M. (1981) Acoustic determinants of perceptual center (P-center) location, Perception and Psychophysics, 30, 247-256.
Morton, J., Marcus, S. & Frankish, C. (1976) Perceptual centers (P-centers), Psychological Review, 83, 405-408.
Paulus, E. & Zwicker, E. (1972) Programme zur automatischen Bestimmung der Lautheit aus Terzpegeln oder Frequenzgruppenpegeln, Acustica, 27, 253-266.
Pompino-Marschall, B. (1987) Segments, syllables, and the perception of speech rate and rhythm. In European Conference on Speech Technology (J. Laver & M. A. Jack, editors), Vol. 2, pp. 237-240. Edinburgh: CEP Consultants.
Pompino-Marschall, B. (1988) Acoustic determinants of auditory rhythm and tempo perception. In Proceedings of the 1988 IEEE international conference on systems, man, and cybernetics, Vol. 2, pp. 1184-1187. Beijing: International Academic Publishers.
Pompino-Marschall, B. (1989) Die Silbenprosodie, Forschungsberichte des Instituts für Phonetik und Sprachliche Kommunikation der Universität München, FIPKM 28 (in press).
Pompino-Marschall, B. & Tillmann, H. G. (1987) On the multiplicity of factors affecting P-center location. In Proceedings of the 11th international congress of phonetic sciences, Vol. 5, pp. 370-373. Tallinn: Academy of Science of the Estonian S.S.R.
Pompino-Marschall, B., Tillmann, H. G. & Kühnert, B. (1987) P-centers and the perception of 'momentary tempo'. In Proceedings of the 11th international congress of phonetic sciences, Vol. 4, pp. 94-97. Tallinn: Academy of Science of the Estonian S.S.R.
Schutte, H. (1978) Ein Funktionsschema für die Wahrnehmung eines gleichmäßigen Rhythmus in Schallimpulsfolgen, Biological Cybernetics, 29, 49-55.
Terhardt, E. & Schutte, H. (1976) Akustische Rhythmus-Wahrnehmung: Subjektive Gleichmäßigkeit, Acustica, 35, 122-126.
Tuller, B. & Fowler, C. A. (1980) Some articulatory correlates of perceptual isochrony, Perception and Psychophysics, 30, 247-283.
Tuller, B. & Fowler, C. A. (1981) The contribution of amplitude to the perception of isochrony, Haskins Laboratories Status Report on Speech Research, SR-65, 245-250.
Vos, J. & Rasch, R. (1981) The perceptual onset of musical tones, Perception and Psychophysics, 29, 323-335.
Zwicker, E. & Feldtkeller, R. (1967) Das Ohr als Nachrichtenempfänger. Stuttgart: Hirzel.

Appendix

TABLE AI. Results of the analysis of variance for Experiments 1 and 3: main effects and interactions (df[error] = 11700)

Source of variation                   df         F         p
Con                                    4  36793.33   < 0.001
Vow                                    4    921.09   < 0.001
S                                      2   4704.05   < 0.001
Sti (within Nat1)                      1   2614.49   < 0.001
Nat (within Sti1)                      1    226.95   < 0.001
Nat (within Sti2)                      1    490.89   < 0.001
Con × Vow                             16      4.49   < 0.001
Con × S                                8     71.08   < 0.001
Con × Sti (within Nat1)                4    150.19   < 0.001
Con × Nat (within Sti1)                4     32.39   < 0.001
Con × Nat (within Sti2)                4      5.26   < 0.001
Vow × S                                8     74.08   < 0.001
Vow × Sti (within Nat1)                4     33.14   < 0.001
Vow × Nat (within Sti1)                4     22.73   < 0.001
Vow × Nat (within Sti2)                4     11.59   < 0.001
S × Sti (within Nat1)                  2     73.48   < 0.001
S × Nat (within Sti1)                  2    161.80   < 0.001
S × Nat (within Sti2)                  2     83.31   < 0.001
Con × Vow × S                         32      4.91   < 0.001
Con × Vow × Sti (within Nat1)         16      4.46   < 0.001
Con × Vow × Nat (within Sti1)         16      2.58     0.001
Con × Vow × Nat (within Sti2)         16      1.25     0.223
Con × S × Sti (within Nat1)            8    129.63   < 0.001
Con × S × Nat (within Sti1)            8     19.73   < 0.001
Con × S × Nat (within Sti2)            8     25.52   < 0.001
Vow × S × Sti (within Nat1)            8      4.89   < 0.001
Vow × S × Nat (within Sti1)            8     12.22   < 0.001
Vow × S × Nat (within Sti2)            8      2.10     0.032
Con × Vow × S × Sti (within Nat1)     32      4.21   < 0.001
Con × Vow × S × Nat (within Sti1)     32      5.71   < 0.001
Con × Vow × S × Nat (within Sti2)     32      0.94     0.570

Con: consonant duration; Vow: vowel duration; S: subject. Sti (within Nat1): /ma/ vs. /ʃi/ syllables. Nat: natural (Nat1) vs. derived stimuli (Nat2). Nat2 (within Sti1): 100 Hz square-wave sounds. Nat2 (within Sti2): /ʃi/ syllables with rectangular sound pressure envelope.

TABLE AII. Results of the analysis of variance for Experiment 2: main effects and interactions (df[error] = 3900)

Source of variation     df       F         p
Vow                      4  839.47   < 0.001
Con                      4  346.88   < 0.001
S                        1  177.18   < 0.001
Nat                      1  527.99   < 0.001
Vow × Con               16    5.55   < 0.001
Vow × S                  4   24.44   < 0.001
Vow × Nat                4   17.87   < 0.001
Con × S                  4   60.59   < 0.001
Con × Nat                4    0.56     0.692
S × Nat                  1  226.01   < 0.001
Vow × Con × S           16    5.59   < 0.001
Vow × Con × Nat         16    6.17   < 0.001
Vow × S × Nat            4   10.44   < 0.001
Con × S × Nat            4    5.78   < 0.001
Vow × Con × S × Nat     16    3.11   < 0.001

Vow: vowel duration; Con: consonant duration; S: subject. Nat: /am/ syllables (Nat1) vs. 100 Hz square-wave sounds (Nat2).
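The entries in Tables AI and AII whose p values lie clearly above 0.001 can be cross-checked against their F ratios and degrees of freedom. Because the error degrees of freedom are large (11700 and 3900), d1·F(d1, d2) is close to a chi-square variable with d1 degrees of freedom, whose upper tail has closed forms for the degrees of freedom needed here. The sketch below relies on that large-sample approximation, which is an assumption of this check and not the procedure used to compute the tables, so it should agree with the printed p values only to within about 0.01; the function names are hypothetical.

```python
import math


def chi2_sf(x: float, k: int) -> float:
    """Upper-tail probability P(X > x) for X ~ chi-square with k degrees of freedom.

    Closed forms only: a finite Poisson sum for even k, erfc for k == 1.
    """
    if x < 0:
        raise ValueError("x must be non-negative")
    if k > 0 and k % 2 == 0:
        half = x / 2.0
        term = 1.0   # (x/2)^0 / 0!
        total = 1.0
        for i in range(1, k // 2):
            term *= half / i   # (x/2)^i / i!
            total += term
        return math.exp(-half) * total
    if k == 1:
        return math.erfc(math.sqrt(x / 2.0))
    raise NotImplementedError("odd k > 1 is not needed for these tables")


def approx_f_pvalue(f_ratio: float, df_effect: int) -> float:
    """Large-df_error approximation: df_effect * F is treated as chi-square(df_effect)."""
    return chi2_sf(df_effect * f_ratio, df_effect)


# (label, F, df of the effect, p as printed in the tables)
CHECKS = [
    ("AI: Con x Vow x Nat (within Sti2)", 1.25, 16, 0.223),
    ("AI: Vow x S x Nat (within Sti2)", 2.10, 8, 0.032),
    ("AI: Con x Vow x S x Nat (within Sti2)", 0.94, 32, 0.570),
    ("AII: Con x Nat", 0.56, 4, 0.692),
]

if __name__ == "__main__":
    for label, f_ratio, df_effect, printed in CHECKS:
        approx = approx_f_pvalue(f_ratio, df_effect)
        print(f"{label}: approx p = {approx:.3f}, printed p = {printed:.3f}")
```

An exact F-distribution survival function (for example from a statistics library) would remove the small discrepancy that the finite error degrees of freedom introduce.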