Journal of Phonetics (1989) 17, 63-70
On the necessity of quantal assumptions. Questions to the quantal theory
Louis F. M. ten Bosch and Louis C. W. Pols
Institute of Phonetic Sciences, University of Amsterdam, Herengracht 338, 1016 CG Amsterdam, The Netherlands
In this paper several questions about Stevens' quantal theory are formulated. The concept of quantal behaviour will certainly prove its applicability in a theory that relates phonetic and phonological units. However, we do not fully agree with Stevens' reasoning. One of our objections to his quantal theory concerns the difficulty of verifying it. We argue that the definition of "quantal" as applied in the quantal theory should be sharpened for the sake of model verification. Moreover, in our view it is not necessary to presuppose quantal-like properties of the articulation-to-perception transformation in order to describe quantal-like speech features.
1. Introduction
In his paper "On the Quantal Nature of Speech", Stevens (1989) proposes that speech features are related to plateaus of the successive transformations
articulatory gesture → acoustic signal → perceptual impression.
According to this quantal theory (QT), speech features can be defined as "islands" or "plateaus" of acoustic stability with respect to articulatory change. We interpret QT as a means to explain the relation between phonological and phonetic units. A study of this relation gives rise to many intriguing questions. In our opinion, Stevens actually combines two principles, one related to speech production and the other to speech perception, thereby treating speech properties as the outcome of a feedback mechanism between speaker and listener. One principle (minimization of articulatory effort, or of the required articulatory accuracy) is a purely physiological one and is assumed to be invoked by the speaker. The other principle deals with perceptual ease (the existence of plateaus) and is assumed to be demanded by the listener. The existence of plateaus is of course also favourable for the speaker, who (according to Stevens) may then reduce the required articulatory precision. Stevens' considerations raise many interesting points for discussion, and they inspired us to review the relationship between perceptual, acoustic and articulatory properties of natural speech. We disagree with some of Stevens' interpretations, mainly with respect to his model design, and we will deal with this point explicitly. Below, we formulate several questions about the fundamentals of QT (Sections 2-6). Section 7 deals with the necessity of quantal assumptions. We would like to defend the position that quantal-like effects within speech can equally well be described without assuming some kind of quantal paradigm in advance. In Sections 8 and 9, we formulate two parallels with mathematical models. In the final Section 10, we summarize our comments.
0095-4470/89/010063 + 08 $03.00/0 © 1989 Academic Press Limited
2. Aims of a quantal theory
What precisely could be the aims of a "quantal theory"? Firstly, one might look for a quantal explanation of the existence or presence of phones or articulatory gestures in a language. Secondly, one might search for the quantal properties of phonological features. Stevens chooses the second option in his QT: he deals with the quantal explanation of [high], [low], [back], [voiced], [sonorant], etc. All these features play a distinctive role across languages. However, it is not quite clear to us how relatively simple patterns in phoneme inventories can be explained by QT. Consider one universal in vowel systems (cf. Crothers, 1978), the so-called vowel-tree hierarchy. Can this hierarchy be explained or predicted by QT? The explanation of such basic universals may be considered a test case for theories such as QT. The dimensions [height] and [back] behave differently from a phonological point of view: the [height] dimension permits much more partitioning. Does this imply that the [back] feature is more quantal than the [height] feature? As Stevens himself recognizes (pp. 5, 38), it will not be easy to account by QT for the large differences across the sound (or feature) structures of languages. Indeed, QT rests solely on rather strict extra-linguistic principles.
3. Methodology
To make our initial position clear, it may be useful to explain our view of the methodology that Stevens uses in his paper. Consider the set of speech features as a subset of the set of all possible signal features. In our opinion, quantal assumptions may be fruitful for selecting optimal or suboptimal sets of preferred speech features out of this larger set of signal features. We briefly outline this statement. To verify QT, Stevens shows that every speech feature ([high], [low], [round], [nasal], [sonorant], [continuant], etc.) has a quantal nature along some appropriate articulatory dimension. We argue that this yields a rather weak verification. To our mind, the verification of QT (and the notion of having a "quantal nature") would be much stronger if one could show that signal features that do not occur in natural speech are indeed less quantal. One can imagine that several signal features that are not directly interpretable as speech features (e.g. certain correlation functions of the signal) are much more quantal than, say, vowel height. Should QT not explain the existence of [height] as a speech feature by showing more than the quantal nature of [height] alone? This reasoning may be compared with the following. Suppose one wants to show that every dog has four paws. Then inspecting all dogs will certainly suffice. However, suppose someone else claims that having four paws is a truly essential property of dogs. In that case, it is not sufficient to show that every dog has four paws and to conclude from this finding that the property is indeed favourable. One should additionally show that any dog having, e.g., three or five paws is indeed in a state of disorder.
4. What does "quantal" mean?
The justification of QT is based upon the behaviour of acoustic patterns along one or more one-dimensional articulatory parameter spaces. Why does QT not deal with acoustic output patterns on a higher-dimensional space, or on a full environment of the articulatory target position? To our mind, the claims of QT would be much stronger in that case: an articulatory gesture would then prove its quantal nature relative to many or all other gestures in its neighbourhood. Stevens applies a one-dimensional definition of "quantal nature", thereby attributing "quantality" along a rather sparse and more or less arbitrary subset of adjacent articulatory positions. What has actually been proved then? Here one might consider a related phenomenon: certain types of vowel shifts (for instance at /i/, a cardinal vowel) can be explained by the fact that stability does not hold on a full environment of the target position (Goldstein, 1983). Goldstein shows that the diachronic shift of several vowels can be explained by articulatory changes outside the one-dimensional articulatory stability region. Does this mean that something is wrong with the quantal nature of the shifted vowels or features? If so, should QT not incorporate such eventualities? Or does this phenomenon firmly prove the relevance of the concept of "quantal nature"? It might be convenient to introduce a "rate of quantality" of articulatory gestures, e.g. by defining this rate as the number of "mutually orthogonal" articulatory subspaces on which the gestures have proved to be of quantal character, divided by the total number of "mutually orthogonal" articulatory subspaces present.
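The "rate of quantality" proposed above can be made concrete with a small numerical sketch. The Python below is only an illustration under stated assumptions: the articulatory space is taken to be R^n with the standard basis as the "mutually orthogonal" one-dimensional subspaces, a direction counts as quantal when the finite-difference derivative of a scalar acoustic output along it falls below a tolerance `tol`, and `toy_acoustic_output` is a hypothetical mapping, not a vocal-tract model. The names `directional_derivative`, `rate_of_quantality` and `STANDARD_BASIS` are likewise introduced here purely for illustration.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def directional_derivative(f: Callable[[Vector], float], x: Vector,
                           d: Vector, h: float = 1e-5) -> float:
    """Central finite-difference estimate of the derivative of f at x along d."""
    plus = [xi + h * di for xi, di in zip(x, d)]
    minus = [xi - h * di for xi, di in zip(x, d)]
    return (f(plus) - f(minus)) / (2.0 * h)


def rate_of_quantality(f: Callable[[Vector], float], x: Vector,
                       basis: List[Vector], tol: float = 1e-3) -> float:
    """Fraction of the given orthogonal directions along which f is flat at x.

    A direction counts as 'quantal' here when the acoustic output changes
    (to first order) by less than tol per unit of articulatory movement.
    """
    flat = sum(1 for d in basis if abs(directional_derivative(f, x, d)) < tol)
    return flat / len(basis)


def toy_acoustic_output(x: Vector) -> float:
    # Hypothetical articulatory-to-acoustic mapping over three normalised
    # articulatory parameters: stationary in x[0] and x[1] at (1, 2, .),
    # but linear (never flat) in x[2].
    return (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2 + 0.5 * x[2]


STANDARD_BASIS: List[Vector] = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

if __name__ == "__main__":
    print(rate_of_quantality(toy_acoustic_output, [1.0, 2.0, 0.0], STANDARD_BASIS))
    print(rate_of_quantality(toy_acoustic_output, [0.0, 0.0, 0.0], STANDARD_BASIS))
```

At the gesture (1, 2, 0), a one-dimensional check along the first parameter alone would report a plateau, while the rate over the full basis is 2/3; at the origin no direction is flat and the rate is 0. The outcome depends on both the chosen basis and the tolerance, which is exactly the kind of choice a sharpened definition of "quantal" would have to fix.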
5. Articulatory precision
We now turn to a corollary of QT that is linked to the preceding question. Stevens states (p. 5) that "[...] the precision with which the target articulatory state is achieved may be rather lax". We think this illustrates only one aspect of QT. If the acoustic output pattern is rather insensitive to changes of an articulatory parameter, the statement is certainly true. Yet its validity is restricted to the particular one-dimensional subset of articulatory positions chosen; outside this subset nothing can be specified. To deal with the peculiarities involved, one may distinguish between achieving an articulatory position and the precision required to remain there. If the articulatory stability region is only one-dimensional, the articulatory precision required to achieve a position will generally depend on the preceding position. However, the precision required to remain in the target position depends on the "rate of quantality" (see Section 4) of that position. To our mind, QT confuses arguments concerning these two aspects.
6. Dynamic patterns
Our next question concerns the description of dynamic patterns in speech, and more specifically in the vowel space. We do not quite understand how (the paths of) diphthongs can be explained by quantal-like relations. It is relatively easy to show (even with a four-tube model) that diphthong paths depend strongly on articulatory fine motions. A great variety of diphthongal paths, all passing from vowel 1 to vowel 2, can be achieved by simple asynchronous movements along the vocal tract during the articulatory interpolation from n-tube model 1 to n-tube model 2. Figure 1 shows several formant tracks from one formant position to another. These different paths arise when the articulatory interpolation is chosen in different ways. As all interpolation movements are (smoothed) linear, the synchrony between them appears to be essential for the exact phonetic realisation. The acoustic output in these dynamic cases is certainly not insensitive to articulatory changes; moreover, it should not be, since the perceptual salience of diphthongs is (often assumed to be) located in the dynamic acoustic change.
Figure 1. Diphthongal paths in the two-dimensional formant space. The figure shows diphthongal paths between three vowels (/i/, an /o/-like vowel and /a/). Between each vowel pair two possible paths are shown, corresponding to different articulatory interpolation processes. The two paths correspond to the location of the articulatory onset (the back and the front of the vocal tract, respectively). The figure shows the dependence of formant tracks on the timing within the articulatory interpolation process.
7. Necessity of quantal assumptions
We now raise the question whether it is necessary to assume quantal relations in order to explain quantal properties in speech. To our mind, this necessity is not evident at all. Let us give an example. Liljencrants & Lindblom (1972) provide a model that interprets the structure of vowel inventories solely by means of continuous (not quantal-like) penalty or cost functions. Their model did indeed turn out to be rather incomplete, for instance with respect to the geometry of the available vowel space and the absence of inherent articulatory constraints. The model has been improved since then, not least by Lindblom himself. However, our main point is that he and Liljencrants did succeed in explaining quantal-like properties of vowel systems, i.e. in locating certain preferred spots in the vowel space (cf. Disner, 1983). Lindblom (1983, 1986) designed a similar method applicable to consonants and vowels together, based on the selection of specific distinct elements out of a larger set of "abstract" speech segments. Once again, these results show quantal-like behaviour without assuming quantal relations a priori. In ten Bosch, Bonder & Pols (1987), we give another example of the interpretation of quantal-like preferences by means of continuous articulatory and perceptually based functions for vowels.
Since all these models rely solely on extralinguistic premises, one can hardly expect them to give a full explanation of actual speech peculiarities.
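To illustrate how preferred positions can emerge from a purely continuous cost, the following Python sketch minimizes a Liljencrants & Lindblom-style dispersion cost, the sum of 1/r^2 over all vowel pairs, by projected steepest descent. It is a toy version under explicit assumptions, not the 1972 model: the vowel space is a hypothetical unit square rather than a perceptually scaled formant space bounded by an articulatory model, and the adaptive step-size rule and all function names are illustrative choices.

```python
import itertools
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def dispersion_energy(points: List[Point]) -> float:
    """Sum of 1/r^2 over all vowel pairs (Liljencrants & Lindblom-style cost)."""
    total = 0.0
    for (ax, ay), (bx, by) in itertools.combinations(points, 2):
        r2 = (ax - bx) ** 2 + (ay - by) ** 2
        if r2 == 0.0:
            return math.inf  # coincident vowels: treated as maximal confusion
        total += 1.0 / r2
    return total


def energy_gradient(points: List[Point]) -> List[List[float]]:
    """Analytic gradient of dispersion_energy with respect to every coordinate."""
    grads = [[0.0, 0.0] for _ in points]
    for i, j in itertools.combinations(range(len(points)), 2):
        dx = points[i][0] - points[j][0]
        dy = points[i][1] - points[j][1]
        r4 = (dx * dx + dy * dy) ** 2
        gx, gy = -2.0 * dx / r4, -2.0 * dy / r4  # d(1/r^2)/d(point i)
        grads[i][0] += gx
        grads[i][1] += gy
        grads[j][0] -= gx  # equal and opposite for point j
        grads[j][1] -= gy
    return grads


def clamp(v: float) -> float:
    """Project a coordinate back into the hypothetical unit-square vowel space."""
    return min(1.0, max(0.0, v))


def disperse(n: int, iterations: int = 500, seed: int = 0) -> List[Point]:
    """Projected steepest descent with a simple accept/reject step-size rule."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]
    current = dispersion_energy(points)
    step = 0.01
    for _ in range(iterations):
        grads = energy_gradient(points)
        candidate = [(clamp(x - step * gx), clamp(y - step * gy))
                     for (x, y), (gx, gy) in zip(points, grads)]
        new = dispersion_energy(candidate)
        if new < current:  # accept and try a larger step next time
            points, current = candidate, new
            step *= 1.2
        else:              # reject and shrink the step
            step *= 0.5
    return points


if __name__ == "__main__":
    vowels = disperse(5)
    print([(round(x, 3), round(y, 3)) for x, y in vowels])
    print(dispersion_energy(vowels))
```

Nothing in the cost function singles out particular locations, yet the mutual repulsion tends to push the vowels towards the boundary of the space, where the "preferred spots" then appear. Because only improving steps are accepted, the cost never increases; different seeds may nevertheless end in different local optima.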
Figure 2. Model solution for a seven-vowel system as given by the search algorithm (formant plane, F2 in kHz). For reference, the grey area indicates the region used by most languages. The straight line denotes the line F1 = F2. The other lines are contour lines of an articulatory effort function; these contours give an idea of the theoretically shaped vowel space when a static effort principle is applied. Since positions outside the grey area are not strictly forbidden, some of the vowels attain their limiting position there.
We would like to give an example of quantal-like behaviour emerging in speech without assuming quantal relations. It is taken from our study of the structure of vowel systems (cf. ten Bosch et al., 1987). We took the Liljencrants & Lindblom (1972) model as a starting point. Our suppositions were twofold: (1) within a vowel system there is "sufficient" perceptual salience, and (2) vowel systems are subject to minimization of articulatory effort. Ohala (1983) notes that the second principle is very close to a triviality, perhaps too close. Nevertheless, we implemented both principles in versions that could be handled numerically. The first principle is translated into a principle of minimizing the confusion probabilities between vowel sounds. The second principle makes use of an articulatory effort function based on the shape of n-tubes. A search algorithm looks for systems of points in the vowel space that optimally fulfil these principles. The model results lead us to conclude that the outer contours of the vowel space are determined by articulatorily motivated constraints, whereas the inner structure of vowel systems is governed by perceptually inspired constraints. Figure 2 shows a predicted model system containing seven vowels. The curves are contour lines of the applied effort function and indicate the boundary of the vowel space. The seven points indicate the optimal positions of the vowels according to the search algorithm. These positions turn out to depend on the precise mathematical translation of the principles mentioned above. Figure 3 shows the predicted number of N-vowel systems versus the number of vowels N, together with actual data (according to Crothers, 1978).
Since the model deals with confusion probabilities between vowels, it can predict the optimal number of vowels in a vowel system, provided that the systems do not contain any long-short or other feature opposition. To obtain these results, we combined aspects of information theory and probability theory. In Figure 3, the grey area and the black points denote the model distribution and the actual data, respectively. With appropriate parameter values as input, this probabilistic model roughly approximates the vowel-system statistics found by Crothers (1978). We refer to ten Bosch et al. (1987) for more details.
Figure 3. Number of languages versus number of vowels, according to the probabilistic vowel dispersion model (ten Bosch, Bonder & Pols, 1987) (grey area) and according to the SPAP database (cf. Crothers, 1978) (black points). The abscissa gives the number of vowels in the vowel system (2 to 11); the ordinate gives the fraction relative to the entire SPAP sample. For example, roughly 30% of all languages have a five-vowel system, whereas 25% is predicted.
8. Definition of quantality
QT introduces a useful way of looking at the relation between articulatory, acoustic and perceptual properties of speech. Quantal behaviour is looked for in the transformations from articulatory states to acoustic states, and in the transformations from acoustic states to perceptual states. One may ask why the intermediate acoustic state may be used in the definition of "quantal nature", since the premises of QT obviously have their basis at the two ends of the chain, the articulatory side and the perceptual side:
articulatory gesture → acoustic signal → perceptual impression.
Actually, the "quantality" of a transformation from left to right is much stronger if it has been shown to hold throughout all the intermediate substeps at once. However, one may defend Stevens' reasoning by pointing to a parallel with a mathematical phenomenon, namely the chain rule for the differentiation of functions (in Section 9 we will examine these parallels more closely). To our mind, a study of these parallels may lead to a more tractable theory from a mathematical or model-theoretic point of view. Consider the above two mappings as real functions of several variables. Denote the first (left) mapping by F and the second (right) one by G, and assume that F and G are sufficiently smooth. The domains of these functions are the set of gestures and the set of acoustic outputs, respectively. Stability, or the existence of plateaus, is evidently related to the flatness of the graphs of these functions. Plateaus of an arbitrary transformation H may be defined as regions where the derivative of H vanishes, in other words, where
$$H'(\text{gesture}) = 0 \qquad (1)$$
Evidently, this definition should not be applied too strictly: taken literally, the "regions" may degenerate into single points. In principle, however, this does not affect the validity of the reasoning here. In fact, we look for the quantal behaviour of the composed transformation G(F). According to the reasoning above, its plateaus are defined by the equation [G(F)]'(gesture) = 0. However, by the chain rule,
$$[G(F)]' = G'(F)\cdot F' \qquad (2)$$
More explicitly,
$$[G(F)]'(\text{gesture}) = [G'(F)\cdot F'](\text{gesture}) = G'(F(\text{gesture}))\cdot F'(\text{gesture}) = G'(\text{acoustic output})\cdot F'(\text{gesture}) \qquad (3)$$
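The factorization in Eq. (3) can be checked numerically in the simplest, scalar case. In the Python sketch below, `F` and `G` are hypothetical one-dimensional stand-ins for the articulation-to-acoustics and acoustics-to-perception mappings, chosen so that F' vanishes at gestures -1 and 1, and G' vanishes at acoustic output 2, which F reaches at gestures -1 and 2. The zeros of the composite derivative, found by a sign-change scan followed by bisection, should therefore be exactly -1, 1 and 2. For mappings of several variables the chain rule multiplies Jacobian matrices, and such a product can vanish even when neither factor does, so this zero-set bookkeeping is specific to the scalar case.

```python
from typing import Callable, List


def F(g: float) -> float:
    """Hypothetical articulation-to-acoustics map (gesture -> acoustic output)."""
    return g ** 3 - 3.0 * g


def dF(g: float) -> float:
    return 3.0 * g ** 2 - 3.0  # zero at g = -1 and g = 1


def G(a: float) -> float:
    """Hypothetical acoustics-to-perception map (acoustic output -> percept)."""
    return (a - 2.0) ** 2


def dG(a: float) -> float:
    return 2.0 * (a - 2.0)  # zero at a = 2, reached by F at g = -1 and g = 2


def d_composite(g: float) -> float:
    """Chain rule, Eq. (3): [G(F)]'(g) = G'(F(g)) * F'(g)."""
    return dG(F(g)) * dF(g)


def numeric_derivative(f: Callable[[float], float], x: float, h: float = 1e-6) -> float:
    """Central finite-difference derivative, used as an independent check."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def sign_change_roots(f: Callable[[float], float], lo: float, hi: float,
                      n: int = 1000, tol: float = 1e-10) -> List[float]:
    """Locate zeros of f on [lo, hi] by scanning a grid for sign changes and bisecting."""
    roots: List[float] = []
    xs = [lo + (hi - lo) * k / n for k in range(n + 1)]
    for a, b in zip(xs, xs[1:]):
        fa, fb = f(a), f(b)
        if fa == 0.0:
            roots.append(a)
        elif fa * fb < 0.0:
            while b - a > tol:
                m = 0.5 * (a + b)
                fm = f(m)
                if fa * fm <= 0.0:
                    b = m
                else:
                    a, fa = m, fm
            roots.append(0.5 * (a + b))
    return roots


if __name__ == "__main__":
    stationary = sign_change_roots(d_composite, -3.0, 3.0)
    print([round(r, 6) for r in stationary])  # [-1.0, 1.0, 2.0]
```

The `numeric_derivative` helper can be used to confirm that the chain-rule product agrees with a direct finite-difference derivative of G(F(g)) at arbitrary gestures.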
In words: the set of zeros of [G(F)]' is precisely the union of the set of zeros of F' and the set of gestures whose acoustic output F(gesture) is a zero of G'. This suggests that it is indeed sufficient to deal with F(gesture) and G(acoustic output) separately, as Stevens does.
9. Accepting and rejecting quantal assumptions a priori: a model parallel from a mathematical point of view
In this section, we return to the question of whether it is necessary to presuppose the existence of quantal properties in speech. From Crothers' and Lindblom's analyses and our model results, we conclude that quantal assumptions are unlikely to be necessary. We adduce one argument in support of this conclusion, based on a mathematical parallel between the two opposing models (assuming or rejecting quantal relations a priori). We observe that the quantal behaviour of speech features can be traced back to the behaviour of the graph of appropriate articulation-to-perception mappings (Section 8). Stevens uses this correspondence in his argumentation for QT. The existence of plateaus corresponds to the flatness of those graphs, which is related to the derivatives of the mappings involved: plateaus of transformations correspond to regions where the derivatives or gradients of the transformations vanish (become zero). However, mathematical models that search for optimal subsets of an inventory show that the subsets that locally or globally optimize a (sufficiently smooth) penalty function possess a plateau character at the same time. Indeed, many search algorithms look for maxima and minima of a mapping by driving the length of its gradient towards zero, thereby giving the solution found plateau-like properties. To see this argument, one might inspect the reasoning behind steepest-descent-like methods (cf. Whittle (1982), chapter 15, sections 15.1 and 15.10). By definition of "locally optimal", this 'quantal' character holds relative to a full environment of the optimizing subset. From this point of view, the definitions applied in QT indeed formulate necessary conditions for optimizing subsets (a plateau character on a one-dimensional articulatory subset), but not sufficient ones (a plateau character on a full environment of the target position).
10. Conclusion
We want to make the following concluding remarks. Firstly, Stevens' concept of QT makes use of ideas that are very useful and readily applicable in speech models. Secondly,
parallels exist between the model structure and reasoning of QT on the one hand and the model structure of other speech segment theories (e.g. Lindblom, 1983) on the other. Arguments such as those used in QT will stimulate further discussion on this subject. We interpret QT as a tool for disclosing relations between phonetic and phonological units, and as a theory aiming at the articulatory/perceptual explanation of phonological features. However, some of its methodological and model-theoretical aspects are not quite clear to us. We would therefore like to raise the following questions for discussion:
Concerning the definition of "quantal": What is the status of 'quantal nature' if it is shown to be present on a subset of adjacent articulatory positions only? What is the theoretical relation of quantality to the vowel shift mechanism described by Goldstein (1983)?
Concerning the justification of QT: What is the relation between actual speech features and those signal features that have proved to be of quantal character?
Concerning its applicability: How should dynamic speech patterns, such as diphthongs, be dealt with? How should language-dependent phenomena be dealt with?
Concerning its verification: How can preference patterns in segment inventories be explained or predicted?
References
Bosch, L. F. M. ten, Bonder, L. J., & Pols, L. C. W. (1987) Static and dynamic structure of vowel systems. Proceedings of the 11th international congress of phonetic sciences, Vol. 1, pp. 235-238.
Crothers, J. (1978) Typology and universals of vowel systems. In: Universals of human language (J. H. Greenberg, editor), Vol. 2, pp. 93-152. Stanford University Press, California.
Disner, S. F. (1983) Vowel quality. The relation between universals and language specific factors. UCLA Working Papers 58, University of California, Los Angeles.
Goldstein, L. (1983) Vowel shifts and articulatory-acoustic relations. Abstracts of the 10th international congress of phonetic sciences (M. P. R. van den Broecke & A. Cohen, editors), pp. 267-273. Dordrecht: Foris Publications.
Liljencrants, J. & Lindblom, B. E. F. (1972) Numerical simulation of vowel quality systems: the role of perceptual contrasts. Language, 48, 839-862.
Lindblom, B. E. F. (1983) Can the models of evolutionary biology be applied to phonetic problems? Proceedings of the 10th international congress of phonetic sciences (M. P. R. van den Broecke & A. Cohen, editors), pp. 67-81. Dordrecht: Foris Publications.
Lindblom, B. E. F. (1986) A typological study of consonant systems: the role of inventory size. Proceedings of the Swedish phonetic conference (O. Engstrand, editor), pp. 1-9. Uppsala University, Uppsala.
Ohala, J. J. (1983) Discussion held at symposium 5: "Phonetic explanation in phonology". Proceedings of the 10th international congress of phonetic sciences (M. P. R. van den Broecke & A. Cohen, editors), pp. 175-182. Dordrecht: Foris Publications.
Stevens, K. N. (1989) On the quantal nature of speech. Journal of Phonetics, 17, 3-45.
Whittle, P. (1982) Non-linear programming. In Handbook of applicable mathematics (W. Ledermann & S. Vajda, editors), Vol. IV: Analysis, pp. 623-680. Chichester-New York: John Wiley & Sons.