Journal of Phonetics (1981) 9, 225 - 231
Consonant loci: a spectral study of coarticulation. Part III
M. E. H. Schouten, Department of English, University of Utrecht, Oudenoord 6, Utrecht, The Netherlands
and L. C. W. Pols, Institute for Perception TNO, Kampweg 5, Soesterberg, The Netherlands
Received 24th March 1980
Abstract:
In two previous papers locus positions had been determined for six Dutch consonants, by means of extrapolation of traces consisting of voiced samples. In the present paper, locus positions were also determined for the remaining Dutch consonants; by disregarding the distinction between voiced and voiceless samples, which was found to be unnecessary, we could do this without extrapolation. Three speakers were used, who uttered every CVC-word five times, both as isolated words and as stressed syllables in a text.
Introduction
In the first two instalments of this series of articles (Schouten & Pols, 1979a, b) we described the results of our investigations into the mutual spectral effects of coarticulation of four vowels and six consonants in stressed syllables pronounced by five speakers of Dutch. The coarticulatory effects were described as trajectories in a vowel subspace derived from band filter spectra. In part I we only looked at the relatively stationary vowel segments plus those parts of the transitions that unambiguously belonged to the vowels. In part II we extended our coverage to include all the voiced samples in the consonant-vowel and vowel-consonant transitions up to and, wherever applicable, including those of the actual consonant itself. The regularities we found with respect to the consonants were described in terms of consonant loci in the vowel space; in part I a lot of extrapolation of transitions had to be done to obtain such loci, and in part II some extrapolation was still necessary. The obvious next step was to determine the loci of the remaining Dutch consonants, a step we have taken with the publication of this third article.
In this third part of our investigation we decided to discard the very strict separation we had hitherto maintained between voiced and voiceless samples: as we shall see below, the dividing line turned out to be an unnecessary one. This time we wanted to go as far as our method would allow towards establishing the actual positions of the consonants in the subspace that gives the best representation of the spectral data; as will be shown below, that subspace is the one based on all the voiced samples, although both the voiced and voiceless samples are represented in it. What we wanted to obtain was the most exhaustive possible spectral description of the Dutch consonants and vowels.
0095-4470/81/020225+07 $02.00/0 © 1981 Academic Press Inc. (London) Ltd.
Method
Material
The recorded material consisted of a list of 105 CVC-words, in which most Dutch consonants occurred at least once in combination with every Dutch vowel, and a written text in which each of the 105 words occurred as a stressed syllable. The voiceless consonants occurred both as initial consonants before and as final consonants after every vowel, but the situation of the voiced consonants was different. Dutch has no final voiced plosives and fricatives; consequently /b, d, v, z/ could only be used in initial positions. For reasons of economy, the other four voiced consonants /m, n, r, l/ were used only in final positions.
Speakers
Three of the five speakers in our previous papers were used again: speakers 1, 2, and 4. Each of them recorded the list of words and the text five times over a period of two months. The recordings were new, therefore, but there was some overlap with regard to the CV- and VC-combinations recorded.
Analysis and further processing
The analysis proceeded in exactly the same way as that of our earlier material: the outputs of 17 one-third octave filters were sampled every 10 ms, and the resulting filter levels expressed in dB were stored, along with linear level, number of zero crossings, and fundamental frequency. However, since we now used another system with another set of filters with, inevitably, slightly different characteristics, the results might be marginally different from what we had found earlier. Spectral subspaces were determined by means of a principal-components analysis of the covariance matrix based on the filter levels of a different data set. At first we derived subspaces which were based on voiced and voiceless samples separately (for the criteria, see below). The coordinate values along the first three dimensions of the relevant subspace (voiced or voiceless) were calculated for all the voiced or voiceless samples in the isolated words and in the text words.
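The subspace derivation and projection described above can be illustrated with a minimal sketch. It assumes the filter levels are held in a NumPy array with one row per 10 ms sample and one column per filter; the function names `fit_subspace` and `project`, and the use of NumPy itself, are illustrative assumptions and not part of the original analysis system.

```python
import numpy as np

def fit_subspace(levels, n_dims=3):
    """Principal-components analysis of the covariance matrix of filter levels.

    levels: array (n_samples, n_filters) of filter levels in dB, taken from
    the reference data set on which the subspace is based.
    Returns (mean, axes); axes has shape (n_filters, n_dims).
    """
    levels = np.asarray(levels, dtype=float)
    mean = levels.mean(axis=0)
    cov = np.cov(levels - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_dims]   # keep the n_dims largest
    return mean, eigvecs[:, order]

def project(levels, mean, axes):
    """Coordinate values of samples along the dimensions of a fitted subspace."""
    return (np.asarray(levels, dtype=float) - mean) @ axes

# Hypothetical usage: random numbers stand in for real 17-filter spectra.
rng = np.random.default_rng(0)
reference = rng.normal(size=(500, 17))   # reference data set (assumed)
utterance = rng.normal(size=(40, 17))    # one CVC-word, 40 samples of 10 ms
mean, axes = fit_subspace(reference)
trace = project(utterance, mean, axes)   # shape (40, 3): a trace in the subspace
```

Fitting on one data set and projecting another mirrors the statement that the subspaces were based on a different data set from the one being displayed.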
This made it possible to display every utterance as a trace in a plane, using two of the three dimensions at a time. As before (see Schouten & Pols, 1979a), we determined inner and outer boundaries in the voiced subspace for each word: the inner boundaries enclosed the three vowel samples that were closest together spectrally, and the outer boundaries separated the consonants from the transitions to and from the vowel. The (automatic) criteria for placement of the outer boundaries were a mixture of level, voiced-voiceless transition, and spectral distance. Whenever necessary, the automatically determined boundaries were corrected by hand. The samples outside the outer boundaries were considered to belong to the consonant.
Voiced and voiceless samples
Traditionally (see Pols, 1977) we have said that fewer than 10 zero crossings in a 10 ms sample indicates voicing, and that over 15 zero crossings means voicelessness. A second criterion took care of the samples having between 10 and 15 zero crossings: a specific value along the first dimension of a subspace based on all the samples (voiced and voiceless) of the data set, a dimension which weighs low-frequency against high-frequency energy in the spectrum. In the description given up to now we had discarded all the voiceless samples, not knowing how much useful information we might deprive ourselves of.
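The two-stage voicing criterion can be written out as a short sketch. The zero-crossing limits of 10 and 15 come from the text; the cut-off value along the first dimension and the side of it that counts as voiced are not given, so `dim1_threshold` and `voiced_above` below are hypothetical placeholders.

```python
def is_voiced(zero_crossings, dim1_value, dim1_threshold=0.0, voiced_above=True):
    """Classify one 10 ms sample as voiced (True) or voiceless (False).

    zero_crossings: number of zero crossings in the sample.
    dim1_value: coordinate of the sample along the first dimension of the
        subspace based on all (voiced and voiceless) samples.
    dim1_threshold, voiced_above: ASSUMED values; the source states only that
        "a specific value" along the first dimension was used.
    """
    if zero_crossings < 10:      # first criterion: clearly voiced
        return True
    if zero_crossings > 15:      # first criterion: clearly voiceless
        return False
    # Second criterion, for samples with 10 to 15 zero crossings.
    if voiced_above:
        return dim1_value > dim1_threshold
    return dim1_value < dim1_threshold

# Hypothetical usage on three samples: (zero crossings, dimension-1 value).
labels = [is_voiced(zc, d1) for zc, d1 in [(4, -12.0), (22, 8.0), (12, 3.5)]]
```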
In Fig. 1 the average consonant positions for each of the three speakers are displayed along the first two dimensions of the subspace based on all the voiced and voiceless samples. The data points in Fig. 1 were determined as follows: of each CVC-word pronounced by a speaker, the coordinate values of all the samples outside the outer boundaries were averaged; from those averages per word the overall average per consonant was calculated. It should be added here that in Dutch voiceless stops voice onset time is zero. The first (horizontal) dimension of this subspace is the "voicing dimension". Figure 1 shows, however, that there is no clear distinction between voiced and voiceless consonants: particularly in the more natural text words /v/ and /z/ seem at least as voiceless as /k/ and /p/, as far as voicing information can be derived from band filter spectra. We concluded from this that the distinction between voiced and voiceless samples was an unnecessary one for our material, and that if we wanted to squeeze the maximum amount of information from our traces, we had better calculate the coordinate values of all samples in our subspace. This still left open the choice between the total, the voiced, or the voiceless subspace. Inspection of the data showed that a representation of all the data in the voiced subspace was the most informative. The next question is whether we can find an ad hoc criterion for deciding which samples should in the end be regarded as belonging to the "stationary" consonant. This procedure is described in the next subsection.
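The two-stage averaging behind the data points of Fig. 1 (first per word, then per consonant) might be sketched as follows. The input layout, a list of (consonant, samples) pairs, and the function name `consonant_positions` are assumptions made for the example.

```python
from collections import defaultdict
import numpy as np

def consonant_positions(words):
    """Average consonant position per consonant, via per-word averages.

    words: iterable of (consonant, samples) pairs, where samples is an
    (n_samples, n_dims) array holding the coordinates of the samples lying
    outside the outer boundary of one utterance.
    """
    per_word = defaultdict(list)
    for consonant, samples in words:
        # Stage 1: average all consonant samples of this word.
        per_word[consonant].append(np.asarray(samples, dtype=float).mean(axis=0))
    # Stage 2: average the per-word averages for each consonant.
    return {c: np.mean(means, axis=0) for c, means in per_word.items()}

# Hypothetical usage with two utterances containing /p/.
example = [("p", [[0.0, 0.0], [2.0, 2.0]]), ("p", [[4.0, 4.0]])]
positions = consonant_positions(example)
```

Averaging per word first gives every utterance equal weight regardless of how many samples its consonant contains, which is not the same as pooling all samples.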
Obtaining the average traces
All calculations were done in the voiced space and separately for each of the three speakers. For every single initial and final consonant, the average CV- or VC-transition was calculated over all the vowels. For every initial consonant we thus obtained an average trace from the very beginning of the consonant to a position somewhere in the middle of the overall vowel area, and for every final consonant an average trace from the centre of the vowel area to the very end of the final consonant. The traces were based on averages over usually 60 utterances per speaker per condition (text and isolated words), which had first been time-normalised in a very simple, linear way. The resulting average traces of one of the speakers are shown in Fig. 2. In this figure we have also indicated, by means of arrows, the points along the traces beyond which a representation of samples in the voiced space seemed to indicate the position of the consonant.
As can be seen, most of the traces form fairly straight or slightly bent lines between the consonant and the vowel area, but often exhibit a sudden deflection at the consonantal end. We decided to chop off the deflected parts: if the traces can anywhere be said to change over from "stationary" consonant to (vowel) transition or vice versa, it must be at those points of sudden discontinuity. An exception was made for initial /b/ and /d/; as Fig. 1 shows, they, or rather the vocal murmurs preceding the bursts, are very strongly voiced, and analysis of a number of individual utterances reveals that they are almost always completely voiced from the first sample on. There is a sharp discontinuity in the /b/- and /d/-traces, however, caused by the transition from vocal murmur to open vocal tract. We decided to express this in terms of two loci: one for the vocal murmur, and one for the released consonant. For the other consonants, the consonantal ends of the CV- or VC-traces (excluding the deviating portions) were defined as the consonant loci.
It should be mentioned that not much temporal information is to be gained from Fig. 2: most traces are averages over 60 utterances, in which consonant duration could vary greatly, and in which the three spectrally closest samples could occur early or late in the different vowels. As a result, the paths followed by the traces are quite reliable in terms of position, but their temporal structure is a matter of great uncertainty.
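A minimal sketch of linear time-normalisation followed by averaging, as used for the average traces, is given below. The number of points on the normalised time axis (`n_points`) and the function names are assumptions; the text says only that normalisation was linear and very simple.

```python
import numpy as np

def time_normalise(trace, n_points=20):
    """Linearly resample a trace of shape (n_samples, n_dims) to n_points samples."""
    trace = np.asarray(trace, dtype=float)
    old_t = np.linspace(0.0, 1.0, len(trace))   # original samples on a 0..1 axis
    new_t = np.linspace(0.0, 1.0, n_points)     # common normalised time axis
    return np.column_stack(
        [np.interp(new_t, old_t, trace[:, d]) for d in range(trace.shape[1])]
    )

def average_trace(traces, n_points=20):
    """Average several traces of different durations after time-normalisation."""
    return np.mean([time_normalise(t, n_points) for t in traces], axis=0)

# Hypothetical usage: two CV-transitions of different length in a 2-D plane.
short = [[0.0, 0.0], [2.0, 4.0]]
long_ = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
avg = average_trace([short, long_], n_points=5)   # shape (5, 2)
```

Because every utterance is stretched onto the same axis, the averaged path is meaningful as a position in the subspace, while its timing reflects no single utterance; this is consistent with the caution expressed above about the temporal structure of Fig. 2.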
Figure 1. Average positions of all the consonants spoken by our three speakers in a subspace based on all the samples (both voiceless and voiced) of a different data base, so that dimension I represents degree of voicing. Initial and final consonants are presented in separate panels, as are consonants from text words and from isolated words. The subscripts below the phoneme symbols indicate speakers 1, 2, or 4. (a) Initial consonants, text words. (b) Final consonants, text words. (c) Initial consonants, isolated words. (d) Final consonants, isolated words.
Figure 2. Average traces from each of the initial consonants to all the vowels and from all the vowels to each of the final consonants. Speaker 1 is used as an example here. The arrows point to the consonant loci as we defined them; see text for cut-off criteria. The subspace used here is the one based on voiced samples only. (a) Initial consonants, text words. (b) Final consonants, text words. (c) Initial consonants, isolated words. (d) Final consonants, isolated words.
Figure 3. Average consonant loci for speakers 1, 2, and 4 in the vowel subspace, which has a different orientation from the voiced subspace of Fig. 2 (see text for further explanation). The average positions of four vowels are also indicated. Curved lines have, as much as possible, been drawn around loci belonging to one particular phoneme. For the voiced plosives two locus positions are given: one for the vocal murmur and one for the released consonant. (a) Initial consonants, text words. (b) Final consonants, text words. (c) Initial consonants, isolated words. (d) Final consonants, isolated words.
Results and Discussion
The resulting consonant locus positions for each of the three speakers are displayed in Fig. 3, after recalculation in the vowel subspace used in our previous articles. This recalculation was done purely for reasons of comparability. As we explained in Schouten & Pols (1979b), the voiced subspace is, apart from some rotation, to all intents and purposes identical to the vowel subspace, so there are no great principles involved here. As opposed to most of the loci in our previous two papers, the new ones are not based on extrapolation but are really measured average points. Most of the differences between the newly found loci and those displayed in Schouten & Pols (1979b, fig. 3) may be considered to be due to the greater accuracy of measuring as done in this paper compared to extrapolation; moreover, although the speakers were the same, different recordings and analyses were used. There are, however, a few differences that should be mentioned. First, the whole picture has shifted to the right; this must be due to differences in the frequency response of the two spectrum analysers used, since the first dimension separates high from low frequencies. Secondly, the anomalous positions of final /t/ and /p/ in our previous article do not recur; we still do not know what caused them, however.
Figure 3 speaks for itself in most respects. A few things deserve comment, however: (1) It is surprising but heartening to find that in the text words the consonants are easier to separate than in the isolated words; this indicates that in connected speech inter-speaker variation with regard to consonants is actually smaller than in isolated words; (2) /b/ and /d/ are very hard to distinguish; (3) /p/ and /k/ in initial position overlap; (4) /t/ and /v/ cannot be distinguished spectrally, so duration cues will have to be used. On the whole, we have found fairly unique sub-areas for the consonants in the voiced subspace. Straight lines from consonant areas to vowel areas represent the CV- and VC-transitions quite well.
References
Pols, L. C. W. (1977). Spectral analysis and identification of Dutch vowels in monosyllabic words. Doctoral thesis, Free University, Amsterdam.
Schouten, M. E. H. & Pols, L. C. W. (1979a). Vowel segments in consonantal contexts: a spectral study of coarticulation - Part I. Journal of Phonetics, 7, 1-23.
Schouten, M. E. H. & Pols, L. C. W. (1979b). CV- and VC-transitions: a spectral study of coarticulation - Part II. Journal of Phonetics, 7, 205-224.