Journal of Phonetics (1989) 17, 55-61
Theoretical implications of the quantal nature of speech: a commentary
Sheila E. Blumstein
Department of Cognitive and Linguistic Sciences, Brown University, Providence, RI 02912, U.S.A.
The relation between the quantal theory of speech and a theory of acoustic invariance and distinctive feature theory is considered. Certain aspects of the quantal theory are questioned, including the role of the proximity of two formants in defining quantal regions and the extent to which there is an "abrupt" change in the boundary regions between two quantal regions. Nevertheless, it is argued that the quantal theory provides the basis for a theory of acoustic invariance and distinctive feature theory. Issues that have yet to be resolved include whether the same acoustic properties necessarily generalize across different sound classes, whether all distinctive features have their basis in quantal relations, whether the particular distinctive features proposed are the correct ones, and the exact characterization of the nature of the acoustic properties corresponding to the full set of distinctive features.
0095-4470/89/010055-07 $03.00/0 © 1989 Academic Press Limited
In his paper "On the Quantal Nature of Speech" (QNS), Stevens (1989) has provided a thorough and clear elaboration of the theory which he has developed over the last 17 years. One of the attractions of the quantal theory (QT) is that it is an integrated theory, in the sense that it attempts to account for the properties of human speech with respect to both the production and the perception systems. In fact, it is the intersection of these two systems, and the consequent limitations of the quantal properties for both production and perception, which provide a natural explanation for the finite set of properties defining the sound structure of natural language. The very simple view that there are regions of acoustic stability provided by the architecture of the articulatory system, and that these regions "match" the properties of the auditory system, gives rise to this integrated theory of speech. Like all well-elaborated and well-constructed theories, it is testable and hence potentially disconfirmable. Stevens should be applauded for providing us with such a provocative and elegant theory of speech.
In the following discussion, a number of theoretical issues with respect to the QT will be considered. Some of the theoretical claims of the theory will be reviewed with the hope of bringing into relief some questions and areas for further research. The implications of the QT of speech, particularly for the theory of acoustic invariance and distinctive feature theory, will also be discussed.
Stevens makes a number of assumptions in the elaboration of the QT of speech that impinge directly on two other theories of speech: the acoustic invariance theory and distinctive feature theory. As he states:
We suggest that this tendency for quantal relations between articulatory and acoustic parameters or between acoustic and auditory parameters is a principal factor shaping the inventory of articulatory states or gestures and their acoustic consequences that are used to signal distinctions in language. The articulatory and acoustic attributes that occur within the plateau-like regions of the relations are, in effect, the correlates of the distinctive features ... One consequence is that during the time the articulatory structures are close to the target state specified by a particular feature, some change in this configuration or state can occur without a significant modification in the relevant attribute of the sound pattern (p. 5).
In other words, Stevens assumes that the QT forms a basis for a theory of acoustic invariance as well as for distinctive feature theory. In principle, however, it would seem that one could propose a quantal theory of speech in the absence of either a theory of acoustic invariance or distinctive feature theory. Let us consider these two theories separately.
A theory of acoustic invariance claims that there are acoustic properties which correspond to the phonetic dimensions of speech, and that these properties remain the same across a number of sources of variability, including phonetic context, syllable position, speaker, and language. Finding the "landmarks", or those parts of the acoustic signal where such properties may emerge, has been a major focus of research for those attempting to chart the nature of these properties (Stevens & Blumstein, 1981). Nevertheless, while a theory of acoustic invariance probably requires that the properties of speech be quantal, it does not follow that the QT necessarily gives rise to invariant properties that stay relatively stable across all sources of variability. The QT requires that such properties stay stable across some articulatory variability; that is its essence. It is possible, however, that quantal, stable properties could emerge for a particular place of articulation, but that these properties could change as a function of phonetic context, such as vowel environment. The considerable influence that neighboring sounds may have on each other could easily affect, in a systematic way, the particular property that emerges, even if that property is still quantal in nature. For example, velar stops in the context of front vowels could have a different property from velar stops in the context of back vowels (cf. Searle, Jacobson, & Rayment, 1979; Kewley-Port, 1983). There is nothing intrinsic to the QT that would require them to have the same property (although cf. Stevens & Blumstein, 1978; Blumstein & Stevens, 1979 for such a claim).
Similarly, there is nothing intrinsic in the QT that would necessarily require the properties of speech to relate specifically to distinctive features, or at least to the type of features proposed by Jakobson, Fant & Halle (1963) or Chomsky & Halle (1968). The features as currently proposed are arranged into bundles that together ultimately correspond to static phonetic units or segments, and such units are in theory part of the underlying phonological representation of an utterance. The QT could certainly give rise to a set of stable properties; however, these properties need not correspond to our current notions of distinctive features or of phonological and phonetic units (cf. Fowler, 1986).
While the QT may be independent of a theory of acoustic invariance and of distinctive feature theory, it is less clear that these two theories would be viable without the QT, since both require the emergence of a finite number of invariant properties from a potentially infinite number of articulatory configurations. However, the QT stands on its own merits, quite apart from a theory of acoustic invariance or distinctive feature theory. Nevertheless, because of the crucial role that the QT plays, not only in elaborating the properties of speech but also in the viability of the theories of distinctive features and acoustic invariance, it may be worthwhile to consider first some aspects of the QT that do not impinge directly on either theory.
Stevens discusses the proximity of two formants as one manifestation of how quantal properties emerge. Such proximity may contribute to the stability of the formant structure in one part of the spectrum and, in addition, may acoustically enhance this part of the spectrum. What is at issue is whether proximity itself should receive the special status that Stevens seems to suggest. First, it is uncertain whether a region will still be defined as quantal if two formants are close to each other but the remaining formants are changing rapidly and hence are not stable across the articulatory region in question. Second, quantal regions may emerge in the absence of the proximity of two formants. For example, in both labial and coronal stop consonants the spectral peaks are spread out, or diffuse (Jakobson, Fant, & Halle, 1963; Stevens & Blumstein, 1978). Third, and more importantly, a number of the consonants characterized by the proximity of two formants are linguistically marked; that is, they are less common in the sound inventories of natural languages than consonants which are spectrally diffuse. For example, pharyngeal consonants are characterized by the proximity of the first and second formants, and retroflex consonants by the proximity of the third and fourth formants. Nevertheless, these two types of sounds are far less common in natural language, and occur in more restricted phonetic environments, than the diffuse labial and dental consonants (cf. Maddieson, 1984).
Thus, while Stevens implies that the proximity of two formants may be a "preferred" parameter in defining quantal regions, such proximity does not necessarily correspond to "preferred" regions in establishing the phonetic inventories of natural language.
While stability is the crucial defining characteristic of the quantal regions, for Stevens the acoustic parameters which define these regions, and the region between them, have particular characteristics. The canonical view of the QT is represented in Fig. 1 of QNS. The two regions of stability that he describes are, hypothetically, regions I and III. In theory, there is a large difference between the values of the acoustic parameter characterizing region I and region III, and it is within region II, the intermediate area, that there is theoretically an abrupt change in the acoustic parameter distinguishing regions I and III. As Stevens states, "as we will see in the examples, the differences in the acoustic pattern between regions I and III should not be regarded as simply a matter of identifying two points on a scale of some acoustic parameter. Rather, the acoustic attribute often undergoes a qualitative change as the articulatory parameter moves through region II" (p. 4). Although the acoustic parameters defining regions I and III seem to be qualitatively different (e.g. in the distinction between labial and dental consonants), the region between them, region II, does not show an abrupt change. For example, as Fig. 3 in QNS shows, the changes of the formant frequencies as a function of the length of the back cavity are relatively continuous. As a result, while the quantal regions may have particular acoustic characteristics that are qualitatively different from each other, the regions in between seem to reflect a continuum of change.
The perception data exploring the acoustic parameters of the phonetic dimensions corresponding to these quantal articulatory regions are consistent with the view that the boundary regions themselves, while quantal and abrupt perceptually, are acoustically continuous. Typically, within a particular region, a change along an acoustic continuum of 50 Hz or several decibels will have a categorical effect on perception. For example, as Stevens describes, the distinction between [sa] and [ʃa] emerges within 1 dB of the point at which the spectrum amplitude of the third formant equals the F3 prominence in the adjacent vowel; the cross-over point in identifying nasal and non-nasal consonants occurs within a range of 1-2 dB when the acoustic parameter varied is the relation between the low-frequency amplitude in the consonant region and that in the adjacent vowel; and the distinction between affricates and fricatives rests crucially on a rise time of about 15 ms. Thus, while the quantal regions themselves seem to be qualitatively distinct and perceptually abrupt, the acoustic parameters between these different articulatory regions seem to be characteristically continuous.
As indicated earlier, the QT bears importantly on the theory of acoustic invariance and on distinctive feature theory. It is to these two theories and their relation to the QT that we now turn.
Critical to research investigating the theory of acoustic invariance is determining where in the acoustic signal the acoustic property is likely to reside. The challenge is one of finding particular regions in the acoustic signal which remain stable over a number of sources of variability and which correspond to discrete phonetic attributes. In a seemingly continuously changing signal, these "landmark" areas have largely been elusive (cf. Liberman et al., 1967). However, in his elaboration of the QT, Stevens provides a working hypothesis for finding these landmarks, particularly for consonants. According to him, changes in articulatory states corresponding to feature changes in the production of many consonants will result in rapid changes in the relevant acoustic parameter.
These changes mark the landmark areas where invariant properties are likely to reside. As he states:
In the acoustic signal, therefore, there will be an alternation between temporal regions where the acoustic parameters remain relatively steady, and narrow regions marked by acoustic events where there are rapid changes. These somewhat discontinuous attributes of the acoustic signal occur in spite of rather continuous movements or changes in the articulatory parameters (p. 5).
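As a rough illustration of the quantal relation and of the landmark hypothesis quoted above, the Python sketch below assumes a hypothetical logistic mapping from a single articulatory parameter to a single acoustic parameter, drives it with a constant-velocity articulatory movement, and flags frames where the acoustic parameter changes faster than a threshold. The logistic form, its steepness, the frame step and the threshold are all illustrative assumptions, not values or methods from QNS.

```python
import math


def quantal_relation(x, boundary=0.5, steepness=40.0, low=1.0, high=3.0):
    """Hypothetical articulatory-to-acoustic mapping (not taken from QNS).

    A logistic curve: near-flat plateaus around `low` (region I) and `high`
    (region III), joined by a steep but continuous rise (region II) centred
    on `boundary`.
    """
    return low + (high - low) / (1.0 + math.exp(-steepness * (x - boundary)))


def find_landmarks(track, dt, rate_threshold):
    """Return indices of frames where the acoustic parameter changes faster
    than `rate_threshold` per second, i.e. candidate acoustic landmarks."""
    return [
        i
        for i in range(1, len(track))
        if abs(track[i] - track[i - 1]) / dt > rate_threshold
    ]


# A continuous, constant-velocity articulatory movement from 0 to 1.
N_FRAMES = 101
DT = 0.01  # hypothetical frame step in seconds
articulation = [i / (N_FRAMES - 1) for i in range(N_FRAMES)]
acoustic = [quantal_relation(x) for x in articulation]

landmarks = find_landmarks(acoustic, DT, rate_threshold=5.0)
# How much the acoustic parameter varies over the first 31 frames (region I).
region_i_spread = max(acoustic[:31]) - min(acoustic[:31])

print("landmark frames:", landmarks)
print(f"spread over frames 0-30: {region_i_spread:.4f}")
```

With these assumed values, only a narrow band of frames around the midpoint of the movement is flagged, while the acoustic parameter barely moves in the flanking plateaus, even though the articulatory parameter moves at constant speed throughout. The logistic is steep but continuous, which is in keeping with the observation that the acoustic change through region II is a continuum rather than a true discontinuity.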
It is just these landmark areas where the search for stable acoustic properties has been most successful. For example, stable acoustic properties have been derived for place of articulation in stop consonants by focusing on the spectral changes in the vicinity of the stop release (Blumstein & Stevens, 1979; Searle, Jacobson, & Rayment, 1979; Kewley-Port, 1983; Lahiri, Gewirth, & Blumstein, 1984), for stops vs. glides by focusing on the amplitude properties in the vicinity of the stop or glide release (Mack & Blumstein, 1983; but cf. Nittrouer & Studdert-Kennedy, 1986), and for place of articulation in nasal consonants by focusing on spectral changes from the murmur to the nasal release (Kurowski & Blumstein, 1987). Nevertheless, while these results are promising, they have yet to provide a full delineation of the invariant properties corresponding to the phonetic dimensions of speech. In fact, these studies have raised a number of critical and challenging issues.
First, it is not clear that the same acoustic properties corresponding to place of articulation will generalize across different manners of articulation. A feature-based theory of acoustic invariance requires that the same property correspond to place of articulation across such manner changes as stop, fricative, and nasal. Yet the properties which have been described for place of articulation in nasal and stop consonants, while similar, are not the same. For both labial stops and nasals, there is a greater spectral change in lower frequency regions relative to higher frequency regions, whereas for alveolar stops and nasals, there is a greater spectral change in the higher frequency regions relative to the lower frequency regions. However, research results to date indicate that the frequency regions delineated for these properties differ for stops and nasals (cf. Kurowski & Blumstein, 1987). Similarly, the properties characterizing the English alveolar fricatives [s z] involve energy concentrated in higher frequency regions than those characteristically described for the alveolar stops [t d] (cf. Heinz & Stevens, 1961). It may well be that a common set of properties can be derived across these various manners of articulation, but the problem has yet to be solved.
Other challenges relate to the exact characterization of the acoustic properties corresponding to the full set of distinctive features. Such a characterization requires the quantitative assessment of a proposed property across a set of utterances, vowel environments, phonetic contexts, rates, speakers, and ultimately languages. In addition, perceptual experiments need to be conducted to determine the extent to which the acoustic property has perceptual relevance for the listener. If the listener does not show sensitivity to manipulations of the property, then its relevance for the QT, and ultimately for a theory of acoustic invariance, would be called into question. For example, listeners seem to be more sensitive to relative changes in the spectrum than to the gross shape of the spectrum in determining place of articulation in stop consonants (cf. Blumstein, Isaacs & Mertus, 1982; Walley & Carrell, 1983; Lahiri et al., 1984).
Such results suggest that stable acoustic properties for place of articulation in stop consonants correspond to time-varying rather than static properties of the spectrum (cf. Blumstein & Stevens, 1979; Kewley-Port, 1983). Furthermore, while much attention has been focused on place of articulation (primarily because researchers have had the greatest difficulty finding stable properties for this phonetic dimension; Liberman et al., 1967), quantitative data assessing both the acoustic stability and the perceptual relevance of the properties proposed for the full inventory of phonetic features have yet to be fully collected and evaluated.
With respect to distinctive features, as Stevens himself points out, it has yet to be determined whether all distinctive features have their bases in the quantal relations described in his paper. Presumably, those contrasts for which the articulatory and acoustic parameters are inherently more salient will bear a greater burden in the phonetic/phonological inventories of natural language. In particular, they should appear more frequently in phonetic inventories, and they should have a broader phonetic/phonological distribution in the sound structure of the language. Similarly, if the strength of a property for a particular feature depends on other features that co-occur with that feature, then that property might be expected to be more likely to undergo such phonological processes as assimilation and neutralization.
It has also yet to be determined whether the particular distinctive features proposed are in fact the appropriate ones. Although most feature theories take phonetic structure into account, they were originally derived primarily to characterize the phonological structure of language, and phonetic and phonological structure do not always seem to go hand in hand. The feature [strident] is a good case in point.
The feature [strident] relates to the amount of turbulence noise generated at the constriction (cf. Jakobson et al., 1963; Chomsky & Halle, 1968; Stevens, 1985). Based primarily on phonological grounds, both Jakobson et al. (1963) and Chomsky & Halle (1968) consider the fricatives [f s ʃ] to be [+strident] and [θ] to be [-strident]. However, phonetic analyses based upon the acoustic and perceptual characteristics of fricatives suggest that this analysis is incorrect. In particular, acoustic analyses have suggested that, with respect to the feature [strident], [s] and [ʃ] form a natural class distinct from [f] and [θ]: the overall amplitude of the fricative noise relative to the vowel is greater for [s] and [ʃ] than for [f] and [θ] (McCasland, 1978, 1979a, b; Behrens & Blumstein, 1988a; cf. also Ladefoged, 1971). However, while the acoustic data suggest that [s ʃ] and [f θ] form natural classes based on the amplitude properties of these sounds, the perceptual data raise questions about whether there is, or should be, a separate phonetic feature relating to the amplitude properties of the fricative noise. In a recent study (Behrens & Blumstein, 1988b), the amplitude properties of fricative-vowel stimuli were manipulated such that the amplitude of the fricative noise relative to the vowel for [s] and [ʃ] was lowered to be similar to that of [f] and [θ], and conversely, the amplitude of the fricative noise relative to the vowel for [f] and [θ] was raised to be similar to that of [s] and [ʃ]. Results indicated that when the spectral properties of the fricative noise and the formant transitions were compatible, the perceptual effects of the amplitude manipulations were relatively small. Moreover, although decreasing the amplitude of [s] and [ʃ] resulted in an increase in [f] and [θ] responses, increasing the amplitude of [f] and [θ] did not result in an increase in [s] and [ʃ] responses. These results question whether the feature [strident] correctly characterizes the dichotomy between [f θ] and [s ʃ], and also whether amplitude properties in general serve a distinctive role in characterizing place of articulation in fricative consonants.
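The stimulus manipulation described in the Behrens & Blumstein (1988b) study can be thought of as applying one gain to the fricative noise so that its level relative to the vowel reaches a target value. The Python sketch below assumes that this level is measured as the ratio of RMS amplitudes over the whole noise and vowel segments, expressed in dB; the measure, the synthetic signals (a seeded Gaussian noise and a sinusoid standing in for frication and vowel) and the -5 dB target are illustrative assumptions, not the measurement procedure or values used in that study.

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def relative_amplitude_db(noise, vowel):
    """Level of the fricative noise relative to the vowel in dB
    (negative when the noise is weaker than the vowel)."""
    return 20.0 * math.log10(rms(noise) / rms(vowel))


def rescale_noise(noise, vowel, target_db):
    """Apply one gain to the noise so that its level relative to the vowel
    equals `target_db`; a single gain leaves the spectral shape unchanged."""
    gain_db = target_db - relative_amplitude_db(noise, vowel)
    gain = 10.0 ** (gain_db / 20.0)
    return [gain * s for s in noise]


# Synthetic stand-ins (hypothetical, not speech data): 1 s at 10 kHz.
FS = 10_000
rng = random.Random(0)  # seeded so the example is reproducible
vowel = [math.sin(2 * math.pi * 100 * n / FS) for n in range(FS)]
noise = [rng.gauss(0.0, 0.05) for _ in range(FS)]

before = relative_amplitude_db(noise, vowel)
raised = rescale_noise(noise, vowel, target_db=-5.0)
after = relative_amplitude_db(raised, vowel)
print(f"before: {before:.1f} dB, after: {after:.1f} dB")
```

Because only a scalar gain is applied, amplitude is changed while the spectral shape of the noise is held constant, which is what allows amplitude to be set against spectral cues in an experiment of this kind.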
It is of course possible that the amplitude characteristics play a more significant perceptual role, but that the property has not been appropriately characterized. Whatever the ultimate outcome, however, a feature delineated on phonological grounds may or may not provide the basis for phonetic features relating to the QT and to the theory of acoustic invariance.
In sum, the QT provides a rich theoretical framework for the consideration of the sound structure of human language. While many details need to be further elaborated, and more research will be needed to map out the various properties of speech, the QT provides a working hypothesis for which the data hold much promise. Although in principle the QT is separate from both a theory of acoustic invariance and distinctive feature theory, the structural properties that emerge from it, as currently proposed by Stevens, can form the basis for both. As a result, this single theoretical framework relates the articulatory, auditory, phonetic and phonological properties of speech within one integrated view. For this reason, it has already greatly influenced speech research over the past 17 years, and it should continue to do so, for it is a theory that is exciting, challenging, and provocative.
Many thanks to K. N. Stevens for many helpful discussions over the years and for his continued influence and support. His dedication to speech research provides an example and a set of standards for all of us to follow. This research was supported in part by Grant NS 15123 to Brown University.
References
Behrens, S. & Blumstein, S. E. (1988a) Acoustic characteristics of English voiceless fricatives: A descriptive analysis. Journal of Phonetics, 16, 295-298.
Behrens, S. & Blumstein, S. E. (1988b) On the role of the amplitude of the fricative noise in the perception of place of articulation in voiceless fricative consonants. Journal of the Acoustical Society of America, 84, 861-867.
Blumstein, S. E., Isaacs, E. & Mertus, J. (1982) The role of the gross spectral shape as a perceptual cue to place of articulation in initial stop consonants. Journal of the Acoustical Society of America, 72, 43-50.
Blumstein, S. E. & Stevens, K. N. (1979) Acoustic invariance in speech production: Evidence from measurements of the spectral characteristics of initial stop consonants. Journal of the Acoustical Society of America, 66, 1001-1017.
Chomsky, N. & Halle, M. (1968) The Sound Pattern of English. New York: Harper and Row.
Fowler, C. (1986) An event approach to the study of speech perception from a direct-realist perspective. Journal of Phonetics, 14, 3-28.
Heinz, J. M. & Stevens, K. N. (1961) On the properties of voiceless fricative consonants. Journal of the Acoustical Society of America, 33, 589-596.
Jakobson, R., Fant, G. & Halle, M. (1963) Preliminaries to Speech Analysis. Cambridge, MA: MIT Press.
Kewley-Port, D. (1983) Time-varying features as correlates of place of articulation in stop consonants. Journal of the Acoustical Society of America, 73, 322-335.
Kurowski, K. & Blumstein, S. E. (1987) Acoustic properties for place of articulation in nasal consonants. Journal of the Acoustical Society of America, 81, 1917-1927.
Ladefoged, P. (1971) Preliminaries to Linguistic Phonetics. Chicago: The University of Chicago Press.
Lahiri, A., Gewirth, L. & Blumstein, S. E. (1984) A reconsideration of acoustic invariance for place of articulation in diffuse stop consonants: Evidence from a cross-language study. Journal of the Acoustical Society of America, 76, 391-404.
Liberman, A. M., Cooper, F. S., Shankweiler, D. P. & Studdert-Kennedy, M. (1967) Perception of the speech code. Psychological Review, 74, 431-461.
Mack, M. & Blumstein, S. E. (1983) Further evidence of acoustic invariance in speech production: The stop-glide contrast. Journal of the Acoustical Society of America, 73, 1739-1750.
Maddieson, I. (1984) Patterns of Sounds. Cambridge: Cambridge University Press.
McCasland, G. P. (1978) Stridency as a distinctive feature of American fricatives. Paper presented at the Modern Language Association, New York.
McCasland, G. P. (1979a) Noise intensity and spectrum cues for spoken fricatives. Journal of the Acoustical Society of America, 65, S78-79.
McCasland, G. P. (1979b) Noise intensity of spoken fricatives. Journal of the Acoustical Society of America, 66, S88.
Nittrouer, S. & Studdert-Kennedy, M. (1986) The stop-glide distinction: Acoustic analysis and perceptual effect of variation in syllable amplitude envelope for initial /b/ and /w/. Journal of the Acoustical Society of America, 80, 1026-1029.
Searle, C. L., Jacobson, J. Z. & Rayment, S. G. (1979) Stop consonant discrimination based on human audition. Journal of the Acoustical Society of America, 65, 799-809.
Stevens, K. N. (1985) Evidence for the role of acoustic boundaries in the perception of speech sounds. In V. A. Fromkin (ed.), Phonetic Linguistics: Essays in Honor of Peter Ladefoged. New York: Academic Press, pp. 243-255.
Stevens, K. N. (1989) On the quantal nature of speech. Journal of Phonetics, 17, 3-45.
Stevens, K. N. & Blumstein, S. E. (1978) Invariant cues for place of articulation in stop consonants. Journal of the Acoustical Society of America, 64, 1358-1368.
Stevens, K. N. & Blumstein, S. E. (1981) The search for invariant acoustic correlates of phonetic features. In P. D. Eimas & J. L. Miller (eds), Perspectives on the Study of Speech. Hillsdale, NJ: Erlbaum, pp. 1-38.
Walley, A. C. & Carrell, T. D. (1983) Onset spectra and formant transitions in the adult's and child's perception of place of articulation in stop consonants. Journal of the Acoustical Society of America, 73, 1011-1022.