A PRELIMINARY ASSESSMENT OF THE VALIDITY OF THREE INSTRUMENT-BASED MEASURES FOR SPEECH RATE DETERMINATION
KLAAS BAKKER, Southwest Missouri State University
GENE J. BRUTTEN AND JOHN McQUAIN, Southern Illinois University at Carbondale
This study assessed the validity of three instrument-based measures with regard to their potential for implementation in automated procedures for speech rate determination. Recorded monologues of 17 normal speakers were analyzed through a live counting procedure to determine the number of syllables each of them produced. Subsequent transcription of these monologues led to exact counts of: (1) all syllables, (2) stressed syllables only, (3) words, and (4) phrases. These results were compared with instrument-based counts of stressed syllables, voice initiations, and pauses. A correlational analysis revealed that automated counts of stressed syllables were strongly predictive of live syllable counts as well as of transcription-based counts of syllables and words. Automated voice initiation counts were also predictive of these measures, but to a lesser extent. These findings were confirmed by a subsequent factor analysis, which, in addition, demonstrated that the number of pauses represented a separate unique dimension. It follows that automated stressed-syllable counts hold the most promise for clinical applications that target speech rate modification.
Address correspondence to Dr. Klaas Bakker, Department of Communication Sciences & Disorders, Southwest Missouri State University, 901 South National Avenue, Springfield, MO 65804-0095.
J. FLUENCY DISORD. 20 (1995), 63-75. © 1995 by Elsevier Science Inc., 655 Avenue of the Americas, New York, NY 10010. 0094-730X/95/$9.50. SSDI 0094-730X(94)00009-1
INTRODUCTION
In recent decades, an influx of rate modification therapies for stuttering has made it necessary for clinicians to routinely count syllables while measuring "talking time" (Andrews, Neilson, and Cassar, 1987; Andrews and Feyer, 1985). Moreover, a number of recent approaches concerning the diagnostic evaluation and assessment of fluency disorders require that speech rate-related data be reported in addition to measures of disfluency (e.g., Costello and Ingham, 1984; Peters and Guitar, 1991). Generally, such procedures require the clinician to manually count syllables or words while controlling a hand-held timing device. This creates a burden and limits the attention available for additional clinical targets. At the very least, manual determinations of speech rate limit the spontaneity of therapeutic interactions.
Intuitively, automated procedures for speech rate determination have advantages over manual procedures. For example, speech rate feedback based on manual procedures necessarily follows the completion of an entire speaking task. This creates a delay between the occurrence of targeted behaviors and the evaluation by the clinician. However, in order to maximize the effectiveness of rate-related feedback, it should be contingently applied and immediately follow the behavioral adjustments made by the client (Rimm and Masters, 1979). Automated speech rate measures seem to meet this demand, as they are continuously available to clients and are updated repeatedly throughout speech samples. As a result, the feedback is more directly linked to the behaviors targeted in therapy. Less concentration, furthermore, is needed from clinicians who use automated rather than manual-perceptual procedures for measuring speech rate. This, obviously, frees them for tracking additional clinical targets, and allows them to be more personally involved during the aforementioned therapeutic interactions.
Although no instrument-based procedure for measuring speech rate has as yet been developed, a number of variables may reflect it. Goldman-Eisler (1968), for example, used equipment that identified pauses and intervals of phonation during extemporaneous speech. We hypothesize that these measures have the potential for reflecting speech rate.
After all, in the long run numbers of pauses and intervals of phonation would likely be proportionately distributed throughout speech. Consequently, these measures should correlate with conventional speech rate measures, such as numbers of syllables or words per unit "talking time" (Ingham, 1984). Recently, in our laboratories we have experimented with an instrument that measures speech rate indirectly. It was designed to identify and count syllables characterized by primary or secondary stress. However, automated counts of pauses, instances of phonation, and stressed syllables per unit time have undetermined concurrent validity for representing the conventional procedures that are based on manual tallies of discrete units of speech production. We do not know to what extent automated measures covary with manually conducted tallies of syllables or words. Moreover, we do not know if the suggested automated procedures reflect aspects of speech rate that are similar to those associated with manual methods. These voids led to the present preliminary investigation which was designed to assess both concurrent and construct validity of the aforementioned automated counting procedures for replacing manual methods such as are currently employed clinically.
[Figure 1 block diagram. Blocks shown: Cassette Recorder; Bandpass Filters; Amplifier; three Rectifier/Integrator modules. Left branch: Differentiator, Comparator, Time Gate, Stressed Syllable Counter. Middle branch: Comparator, Voice Initiation Counter. Right branch: Comparator, Time Gate, Pause Counter.]
Figure 1. Acoustic analysis system for counting stressed syllables, voice initiations, and pauses from recorded speech samples.
METHOD
Seventeen normal speakers, eight men and nine women, participated as subjects in this investigation. They ranged in age from 21 to 64 years. Their mean age was 33. None of these subjects reported present or past difficulties with speech, language, or hearing. Moreover, no such problems were evident during the experimental recordings. Each subject was instructed to engage in a monologue of 2 minutes that was occasioned by a list of suggested topics. Each subject was signaled when the allotted time was over and allowed to finish the sentence. Their speech was recorded on an audio cassette system (Sony; model TC-D5M) using a unidirectional lapel microphone (Beyer; model MCE 5.11). The taped monologues were analyzed employing equipment designed to identify and count syllables characterized by prosodic stress, voice initiations, and pauses (in excess of 250 ms duration, consistent with Goldman-Eisler, 1968). Figure 1 shows the overall design of the equipment used in this investigation.
Automated Counts of Stressed Syllables. The left portion of Figure 1 shows that part of the instrumental design that made it possible to identify syllables characterized by prosodic stress. For this purpose, the recorded signal was bandpass-filtered at the fundamental frequency range. Specifically, two analog filters (Coulbourn Instruments, model S75-36; 24 dB/octave each) were cascaded to produce a high pass at 70 Hz and a low pass at 350 Hz. Subsequently, the filtered signal was slightly boosted with a general purpose interface amplifier (Coulbourn Instruments, model S79-05; AC coupling, with the gain set at 10 times) and rectified/integrated (Coulbourn Instruments, model S76-01; full wave rectification; integration time set at 80 ms) to produce the contour-following intensity envelopes that represented vowel nuclei of individual syllables. An integration time of 80 ms was chosen because it allowed all syllabic pulses to be represented in the signal, whereas the effect of fluctuations clearly not related to the production of individual syllables would be sharply reduced. To find out which of the syllables received prosodic stress, the slopes of intensity contours were assessed by means of a differentiator (Coulbourn Instruments, model S76-41; set to produce 1 V output per 1 V/s input range). This instrument produced positive peaks that reflected abrupt changes in acoustic energy such as those associated with initiations of vowels of stressed syllables. Only those positive-going peaks were counted that exceeded ambient background noise as well as the level of residual acoustic energy caused by consonantal productions and syllables unmarked by prosodic stress. A comparator (Coulbourn Instruments, model S21-06; positive polarity at 50 mV) was used to make this determination. Its function was to mark the presence of each instance where the differentiated signal exceeded the preset criterion of +50 mV. As an extra protection against the possibility of false positives, the aforementioned differentiation pulses were fed through a time gate. That is, only pulses ranging in frequency between 1 Hz and 8 Hz were ultimately passed to the digital counter.
These frequency boundaries were chosen because they represent pulse rates that would likely be related to the production of individual syllables. Pulses that were either too fast or too slow to be caused by syllables were effectively blocked from the stressed syllable counts.
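The comparator and time-gate stages described above can be illustrated with a minimal software sketch. The original system was analog hardware, so everything here is an assumption made for illustration: the function names, the sampled representation of the differentiated signal, and in particular the reading of the 1-8 Hz gate as a limit on the interval between successive pulses (0.125 s to 1 s). The text does not specify how the gate treated the first pulse after a long silence; this sketch simply drops it.

```python
def comparator_onsets(samples, sample_rate_hz, threshold_v=0.05):
    """Return the times (in seconds) at which the signal rises above the threshold.

    Hypothetical stand-in for the +50 mV comparator: one pulse per upward crossing.
    """
    onsets = []
    above = False
    for index, value in enumerate(samples):
        now_above = value > threshold_v
        if now_above and not above:
            onsets.append(index / sample_rate_hz)
        above = now_above
    return onsets


def time_gate(onset_times, min_rate_hz=1.0, max_rate_hz=8.0):
    """Keep only pulses whose spacing from the previous pulse implies a 1-8 Hz rate.

    Assumption: the gate is modeled as an inter-pulse interval check; the
    analog time gate in the paper may have behaved differently.
    """
    min_gap_s = 1.0 / max_rate_hz  # pulses closer than this are "too fast"
    max_gap_s = 1.0 / min_rate_hz  # pulses farther apart are "too slow"
    kept = []
    previous = None
    for t in onset_times:
        if previous is not None and min_gap_s <= t - previous <= max_gap_s:
            kept.append(t)
        previous = t  # every pulse, kept or not, restarts the interval
    return kept


# Illustrative pulse times in seconds (made up, not data from the study).
pulses = [0.0, 0.2, 0.25, 0.6, 2.0, 2.3]
stressed_syllable_count = len(time_gate(pulses))
```

In this toy input the pulse at 0.25 s is rejected as too fast and the pulse at 2.0 s as too slow, which mirrors the blocking behavior the paragraph describes.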
Automated Counts of Voice Initiations. The middle part of Figure 1 displays the instrumental configuration that was designed to identify and count the subject's voice initiations during speech production. The filtered signal, that is, the acoustic energy in the fundamental frequency range for phonation (bandpass of 50-350 Hz), was rectified and integrated (full wave rectification; custom adjusted to produce an integration time of 10 ms; Coulbourn Instruments, model S76-01) and assessed by a comparator (Coulbourn Instruments, model S21-06; positive polarity at 50 mV) to identify the subject's phonatory onsets in speech production. An integration time of 10 ms was chosen here to detect the phonatory onsets as discretely as possible without the potential interference created by individual glottal cycles. The bandpass filtering, furthermore, was necessary to suppress all nonphonatory acoustic changes and prevent them from affecting the voice initiation counts. A comparator produced pulses for each instance when phonatory power exceeded the preset criterion level. This level was set at +50 mV so that all voiced segments, even those characterized by brief and soft phonation such as schwas bordered by voiceless consonants, would be included in these counts. The chosen cut-off criterion did, however, effectively remove from consideration the potential effects of background noise, or residual noise of voiceless consonants not suppressed by the bandpass filters. Each instance of voice initiation was counted electronically and displayed by a digital counter (Electronic counter, Coulbourn Instruments, model R11-45).
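The rectify, integrate, and compare sequence used for voice initiations can be approximated digitally as follows. This is a minimal sketch under stated assumptions: the analog rectifier/integrator is modeled as a one-pole leaky integrator whose time constant equals the 10 ms integration time, the input is assumed to be already bandpass-filtered, and the function names are hypothetical.

```python
def rectify_integrate(samples, sample_rate_hz, integration_time_s):
    """Full-wave rectify, then smooth with a one-pole leaky integrator.

    Assumption: the integration time is treated as the integrator's time
    constant; the actual analog module may have been defined differently.
    """
    dt = 1.0 / sample_rate_hz
    alpha = dt / (integration_time_s + dt)
    envelope = []
    level = 0.0
    for value in samples:
        level += alpha * (abs(value) - level)
        envelope.append(level)
    return envelope


def count_voice_initiations(samples, sample_rate_hz,
                            integration_time_s=0.010, threshold_v=0.05):
    """Count upward crossings of the 50 mV criterion in the phonation envelope."""
    envelope = rectify_integrate(samples, sample_rate_hz, integration_time_s)
    count = 0
    above = False
    for level in envelope:
        now_above = level > threshold_v
        if now_above and not above:
            count += 1
        above = now_above
    return count


# Synthetic input (not study data): two 200 ms voiced bursts at +/-0.25 V
# separated by 300 ms of silence, sampled at 1 kHz.
burst = [0.25, -0.25] * 100
signal = [0.0] * 100 + burst + [0.0] * 300 + burst
initiations = count_voice_initiations(signal, 1000.0)
```

A short integration time makes the envelope fall back below the criterion quickly during silence, so each new stretch of phonation registers as a separate onset.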
Automated Counts of Pauses. The right part of Figure 1 reveals all instrumental modules incorporated in automatically counting the pauses that exceeded the preset criterion of 250 ms in duration (Goldman-Eisler, 1968). After rectification and integration (Coulbourn Instruments, model S76-01; full wave rectification; custom adjusted to produce an integration time of 2 ms) the acoustic signal was evaluated by means of a comparator (Coulbourn Instruments, model S21-06; its positive polarity set at a reference level of 50 mV). The latter signaled those instances, during speech, when the preset criterion of acoustic energy was not met and could not be reliably discriminated from ambient background noise. Each identification of a silent interval, subsequently, triggered a timing circuit that involved a predetermining counter (Coulbourn Instruments, model S43-30) and a time base (Coulbourn Instruments, model S51-11). Only those pauses whose duration exceeded the preset durational criterion were included in the final counts (Electronic counter, Coulbourn Instruments, model R11-45).
Prior to analyzing the acoustic signal, the instrumentation was calibrated for each subject so as to compensate for individual differences in modal loudness. That is, the equipment was adjusted so that the signal amplitudes associated with stressed syllables (after filtering and rectification/integration) would result in voltage peaks that centered around 250 mV in amplitude. This adjustment was made manually by adjusting the output level of the cassette recorder while monitoring the integrated signal on an oscilloscope (Hewlett Packard, model HP 54501A).
The chosen reference value formed a safeguard so that: (1) all signal changes related to individual voice initiations (as well as the individual syllable productions) would exceed a minimum level of 50 mV at their respective comparators and (2) the loudest stressed syllable or phonation related pulse would remain within the recommended specifications of all instruments involved in these analyses.
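The pause-counting path described above (comparator followed by a timing circuit) can be sketched in software as a run-length check on a sub-threshold envelope. The function name and the sampled envelope are assumptions for illustration; the thresholds are the 50 mV and 250 ms criteria given in the text.

```python
def count_pauses(envelope, sample_rate_hz, threshold_v=0.05, min_pause_s=0.250):
    """Count silent intervals that last longer than the durational criterion.

    A pause is counted once, at the moment its duration first exceeds
    min_pause_s. Note that leading and trailing silences are counted too,
    which may or may not match how the original timing circuit was used.
    """
    min_samples = min_pause_s * sample_rate_hz
    pauses = 0
    silent_run = 0
    counted = False
    for level in envelope:
        if level < threshold_v:
            silent_run += 1
            if not counted and silent_run > min_samples:
                pauses += 1
                counted = True
        else:
            silent_run = 0
            counted = False
    return pauses


# Synthetic envelope at 100 samples/s (not study data):
# 300 ms silence (counted), 100 ms silence (ignored), 400 ms silence (counted).
envelope = ([0.2] * 50 + [0.0] * 30 + [0.2] * 20 +
            [0.0] * 10 + [0.2] * 10 + [0.0] * 40)
pause_count = count_pauses(envelope, 100.0)
```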
Manual Tallies of Syllables, Stressed Syllables, Words, and Phrases. After the automated measurements were completed, the audiotapes were played to one of the experimenters, who separately determined the number of syllables, stressed syllables only, words, and phrases spoken by the subjects. First, this experimenter tallied all of the syllables produced--live--in a fashion similar to that most frequently utilized clinically. That is, the syllables were tracked with a manual counter. The experimenter was not allowed to stop the cassette recorder during these counts. To obtain optimally accurate counts of the numbers of syllables, stressed syllables only, words, and phrases produced by the subjects, the recordings were transcribed orthographically. The experimenter counted only those syllables that were also perceptually evident from the recordings. Stressed syllables were counted separately, and were considered to be those that perceptually contained pure vowel nuclei. A phrase was defined as a "tone group" (Ladefoged, 1982), or an independent unit of speech that is characterized as such by its prosodic pattern.
Statistical Analysis. The concurrent validity of the automated measures was assessed by determining their Pearson product-moment correlations with the manual syllable (both live and from transcription), stressed syllable, word, and phrase counts. In addition, the construct validity of the automated measures in their intended function for representing aspects of rate such as reflected by manual tallies of syllables, stressed syllables, words, and phrases was examined through a principal component factor analysis. This analysis was followed by a Varimax rotation to optimize the clarity of its statistical solution. The minimum eigenvalue for retaining factors was set at 1.0. Through this analysis it was possible to establish whether or not any or all of the automated measures would group together with the conventional methods that involve manual counts of units of speech.
RESULTS
Table 1 displays the descriptive data of the automated, manual, and transcription-based measures. Among these, the manual syllable counts deserve special attention, because they represent the most commonly chosen procedure in therapeutic applications. As can be seen from the table, the manual syllable counts approximated their transcription-based equivalents. On average these counts differed by 43.9 syllables, or 10.0% of the total number of spoken syllables derived from transcription. The instrument-based counts of the stressed syllables could also be directly compared to a transcription-based equivalent. As can be seen, both
numbers were in close approximation to each other. In fact, the difference on this measure was only 4% of the number of stressed syllables obtained from transcription.

Table 1. Means and Standard Deviations of the Automated Measures (Number of Stressed Syllables, Voice Initiations, and Pauses), the Manual Live Measure (Syllables), and Manual Transcription-based Measures (Syllables, Stressed Syllables, Words, and Phrases)

Measure                               M        SD
Automated:
  Number of stressed syllables      339.6     64.2
  Number of voice initiations       195.7     35.3
  Number of pauses                   55.9     12.4
Manual:
  Number of syllables               479.8     85.4
Transcription-based:
  Number of syllables               435.9     77.0
  Number of stressed syllables      353.8     64.3
  Number of words                   324.2     52.2
  Number of phrases                  82.1     15.1

The nature of the remaining two automated measures--number of voice initiations and pauses--did not permit a direct comparison with any of the manual or transcription-based measures that were obtained in this study. Nevertheless, it was felt that the intercorrelations between these sets of measures would be instructive. Table 2 displays the correlations between the automated counts (i.e., tallies of stressed syllables, voice initiations, and pauses) and the manual counts (i.e., real-time tallies of syllables, or transcription-based tallies of syllables, stressed syllables, words, and phrases). As can be seen in this table, one of the automated counts--that of stressed syllables--stands out because of its strong relationship to a number of the manual counts. Specifically, the automatically derived stressed syllable numbers correlated highly, and to the same extent, with the real-time manual syllable tallies and the transcription-based manual tallies of all syllables, as well as merely those characterized by prosodic stress. Automated stressed-syllable counts, too, though to a slightly lesser extent, related to the remaining transcription-based tallies (i.e., those that involved number of words and phrases). These correlations, then, suggest that the automated stressed-syllable counting procedure provides a valid measure for use in clinical applications. Number of voice initiations, the second automated measure, was also assessed with regard to its potential for serving as an indirect measure of
speech rate. As can be seen in Table 2, moderate correlations were obtained between the voice initiations and the real-time as well as transcription-based tallies of syllables. The correlation between number of voice initiations and the real-time syllable counts was lower than for the transcription-based syllable counts. Perhaps this reflects the subjective component that necessarily exists in manual counts conducted during running speech. Overall, the automated voice initiation counting procedure revealed a less favorable picture than did the automated counts of stressed syllables, at least when studied with reference to the conventional manual procedures. Nevertheless, even the moderate correlations of voice-initiation counts with transcription-based counts of syllables and stressed syllables suggest that this measure could still be an alternative in conditions where automated stressed syllable counts would be compromised. After all, the current data were based on analyses conducted on normal speech and at a rate of the subject's own choosing. The present data, then, may not necessarily reflect the relationships between these measures among individuals with fluency problems who, in addition, may produce rate-modified speech in therapy.
The frequency with which pausing occurred was the remaining automated rate-related speech measure explored in this investigation. This measure failed to show a statistically significant relationship to either the real-time manual tallies or any of the transcription-based tallies, including the number of phrases produced by the subjects. This is somewhat surprising, as one would expect each individual phrase to be bordered by two acoustic pauses. Perhaps the relationship between numbers of acoustic pauses and phrases in reality is either not very strong or possibly confounded by contextual and acoustic factors not considered in this investigation.

Table 2. Correlation Coefficients Between the Automated (Stressed Syllables, Voice Initiations, and Pauses) and Manual Tallies (Real-Time and Transcription-based)

                                         Automated measures
                                    Stressed     Voice
                                    syllables    initiations    Pauses
Real-time manual tallies:
  Number of syllables                 .88a         .58a          -.14
Transcription-based tallies:
  Number of syllables                 .88a         .72a          -.19
  Number of stressed syllables        .89a         .74a          -.04
  Number of words                     .78a         .62a          -.13
  Number of phrases                   .67a         .44a           .33

a Significant at the .01 level or better.
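As a minimal sketch of the correlational step, the code below computes a Pearson product-moment correlation from paired counts and converts a correlation into the t statistic (with n - 2 degrees of freedom) that is commonly used to judge its significance. The function names are hypothetical, and the paper does not state which test or tail was used, so the conversion is shown only as one standard way to check a reported coefficient against a tabulated critical value.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


# Reported values: r = .88 between automated stressed-syllable counts and
# real-time syllable tallies, with n = 17 speakers.
t_stressed = t_from_r(0.88, 17)  # about 7.18
```

The resulting t value would then be compared with the critical value for 15 degrees of freedom at the chosen significance level.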
Whether or not the variables in this investigation belong to one or more unique statistical dimensions, or constructs, was investigated by means of
factor analysis. More specifically, through this analysis it would become clear if hand-scored data, such as those commonly used in determining speech rate, belong to the same class as the automated measures under investigation.

Table 3. Results of Principal Component Factor Analysis, Followed by Varimax Rotation, on Automated, Manual, and Transcription-based Speech Rate-related Measures

                                  Factor 1     Factor 2
                                  (71.6%)a     (16.6%)a
Automated:
  Stressed syllables                .93          .08
  Voice initiations                 .76          .20
  Pauses                           -.07          .98
Manual tallies:
  Syllables                         .95         -.09
Transcription-based tallies:
  Syllables                         .97         -.16
  Stressed syllables                .99          .02
  Words                             .95         -.07
  Phrases                           .74          .47

a Percentage of total variance accounted for by factor.

Table 3 reveals the results that emerged from a principal component factor analysis that involved all automated and manually obtained measures that were taken from the recorded monologues. Two factors surfaced which, together, accounted for 88.2% of the total variance. Factor 1 was the most powerful dimension that emerged from the analysis. It brought together all of the rate-related speech measures except number of pauses. This pattern of results suggests that, in a statistical sense, all measures but pauses represented one unique dimension. The traditional speech rate measures--manual syllable tallies during running speech--strongly loaded on this factor and as such showed a relationship to all of the transcription-based as well as two of the automated procedures. A particularly strong intercorrelational structure was evidenced for the following measures: automated stressed syllable counts, real-time syllable tallies, and transcription-based tallies of all syllables as well as merely those characterized by prosodic stress. This, then, adds strength to the conclusion that under the conditions of this investigation, automated identification of stressed syllables is a valid choice for representing the syllable numbers that are traditionally determined through manual tallies. The fact that the automated procedure, due to technical limitations, is confined to stressed syllables only, did not appear to have an impact on its construct validity as a measure
expressing speech rate. Moreover, the strong predictiveness of the measure with regard to the syllable numbers obtained live suggests that, if the findings can be replicated with stuttering subjects, automated stressed syllable tallies are useful for estimating the manual tallies through a mathematical conversion. Automatic voice-initiation counts were also related to syllable counts that were determined through manual means. Once again, then, the evidence suggests that this measure has potential for use in clinical applications where rate measurement of speech production is needed. Nevertheless, the extent of its correlation to the more traditional manual means of speech rate determination was less favorable than that of the automated stressed-syllable counts. Voice initiations, then, may have the potential to serve as an alternative measure of rate under circumstances in which the use of automated stressed-syllable counts would be compromised or limited by conditions not addressed in this study. After all, the present investigation targeted only the speech of normal speakers, without any of the alterations common to many fluency therapies. The usefulness of pause counts for speech rate determination was not supported by the present factor analytic results. The number of pauses loaded on a separate factor that was unrelated to the automated or manual measures of speech rate. As such, it stood out as an individual characteristic of speech production. Nevertheless, number of pauses appeared to have some moderate relationship to the numbers of phrases produced by the subjects. This is intuitively meaningful, as one would expect all phrases to be at least delineated by pauses.
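The share of variance attributed to each factor can be cross-checked from the loadings in Table 3. For a component solution on standardized variables, that share equals the sum of a factor's squared loadings divided by the number of variables (eight here). The sketch below is only a consistency check under that assumption, using the published two-decimal loadings; the variable and function names are hypothetical.

```python
# Loadings from Table 3, in row order: automated stressed syllables, voice
# initiations, pauses; manual syllables; transcription-based syllables,
# stressed syllables, words, phrases.
FACTOR_1 = [0.93, 0.76, -0.07, 0.95, 0.97, 0.99, 0.95, 0.74]
FACTOR_2 = [0.08, 0.20, 0.98, -0.09, -0.16, 0.02, -0.07, 0.47]


def proportion_of_variance(loadings):
    """Sum of squared loadings divided by the number of variables."""
    return sum(value * value for value in loadings) / len(loadings)


share_1 = proportion_of_variance(FACTOR_1)  # roughly 0.715
share_2 = proportion_of_variance(FACTOR_2)  # roughly 0.158
```

These recomputed shares are close to the 71.6% and 16.6% reported in Table 3; the small differences may reflect rounding of the published loadings or percentages taken from the solution before rotation.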
DISCUSSION AND CONCLUSIONS
The present findings suggested that two automated measures, stressed-syllable and voice-initiation counts per unit time, show promise for use in clinical programs that are designed to modify speech rate. Both of them showed considerable concurrent and construct validity in comparing favorably with manual procedures that have traditionally been used for measuring speech rate in clinical settings. Although the automated stressed-syllable measure most closely mirrored that which resulted from the conventional manual method, voice-initiation rate also evidenced considerable concurrent validity when subjected to factor analysis. Its somewhat reduced predictive power, compared to that of the stressed syllable counts, may have resulted from the fact that this measure is less directly related to syllable counts and, therefore, may be subject to different contextual and phonatory variables. A speaker's need for using voice initiation in speech may vary from one phonetic context to the next. Moreover, the way voicing features are handled by individual speakers may differ as well. Whereas some individuals blend
words together and, when appropriate, maintain phonation across many word boundaries, others may not do so. As a result, individual variations in the use of phonatory onsets during speech production may confound its relationship to other speech-related measures. The fact that voice initiation numbers showed a somewhat less strong relationship to transcription-based syllable counts than did the automated counts of stressed syllables may not reduce the import of this measure. The possibility still exists, for example, that this measure though somewhat weak as a predictor of conventional measures for rate, is still useful for reflecting rate changes such as targeted in many therapies. Obviously, this possibility is in need of further empirical investigation. It should be noted that the voice-initiation rate displayed by the individual may have a different relevance to many therapists concerned with stuttering therapy. That is, some have emphasized the importance of coordinating laryngeal action with respiration and articulation (e.g. Perkins, et al., 1976; Adams, 1971, 1974, 1975). Others have promoted the establishment of gentle phonatory onsets (e.g. Webster, 1978) as a means of reducing the extent of motor demands placed upon the larynx. As a result, measuring voice-initiation rate, or even the proportion of voice initiations per total number of syllables spoken, may well represent attractive clinical measures other than their relationship to speaking rate. Obviously, these suggestions, too, are in need of further empirical verification. Not until after such studies have been completed can we establish whether or not these possibilities represent clinically attractive or feasible options. It should be noted that the aforementioned automated procedures were tested in relation to the speech of a sample of normal speakers. 
Further study is needed to determine the concurrent validity of the stressed syllable and voice initiation measures when the subjects sampled are stutterers. Also, the validity of using automated counts of stressed syllables, or voice initiations, on speech that is substantially slowed or prolonged as a result of therapy should be determined. Therapies that use continued-phonation targets, furthermore, would create obvious confounding for the procedure that depends on voice initiation counts and would, to say the least, make this measure counterproductive as a potential predictor for rate of speech production. Clearly, additional research is needed to assess further the concurrent and construct validity of implementing the presently suggested methods for measuring speech rate in clinical work with those whose fluency is problematic. Such research should address the use of the recommended automated measures with modified speaking styles such as those targeted in many fluency programs. To the extent that these studies show that the automated stressed syllable and voice initiation measures that have been described appropriately mirror speech rate, and free the clinician to make other observations, they would likely replace the manual counting procedure that is presently in use.
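The "mathematical conversion" from automated stressed-syllable tallies to manual syllable tallies mentioned in the Results is not specified in the paper. One simple possibility, offered here purely as an assumption, is a least-squares line derived from the reported summary statistics: the means and standard deviations in Table 1 and the correlation of .88 in Table 2. The function names are hypothetical, and such a line would describe only the 2-minute monologues of the 17 normal speakers studied.

```python
def regression_from_summary(mean_x, sd_x, mean_y, sd_y, r):
    """Least-squares slope and intercept for predicting y from x.

    Uses the identities slope = r * sd_y / sd_x and
    intercept = mean_y - slope * mean_x.
    """
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# x: automated stressed-syllable count (M = 339.6, SD = 64.2)
# y: real-time manual syllable tally    (M = 479.8, SD = 85.4)
SLOPE, INTERCEPT = regression_from_summary(339.6, 64.2, 479.8, 85.4, 0.88)


def estimate_manual_syllables(automated_stressed_count):
    """Estimated manual syllable tally for a given automated count (sketch only)."""
    return INTERCEPT + SLOPE * automated_stressed_count
```

Whether a conversion of this kind holds for stuttered or rate-modified speech is exactly the open question the authors raise, so any such coefficients would need to be re-estimated on the relevant population.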
REFERENCES
Adams, M.R., and Reis, R. (1971). The influence of the onset of phonation on the frequency of stuttering. Journal of Speech and Hearing Research 14, 639-644.
Adams, M.R., and Reis, R. (1974). Influence of the onset of phonation on the frequency of stuttering: A replication and reevaluation. Journal of Speech and Hearing Research 17, 752-754.
Adams, M.R. (1974). A physiologic and aerodynamic interpretation of fluent and stuttered speech. Journal of Fluency Disorders 1, 35-47.
Adams, M.R. (1975). Clinical interpretations and applications. In: Vocal tract dynamics and dysfluency (Webster, L.M., and Furst, L., eds.). New York: Speech Hearing Institute.
Andrews, G., Neilson, M., and Cassar, M. (1987). Informing stutterers about treatment. In: Progress in the treatment of fluency disorders (Rustin, L., Purser, H., and Rowley, D., eds.). London: Taylor and Francis.
Andrews, G., and Feyer, A.M. (1985). Does behavior therapy still work when the experimenters depart: An analysis of a behavioral treatment program for stuttering. Behavior Modification 9, 443-447.
Costello, J.M., and Ingham, R.J. (1984). Assessment strategies for stuttering. In: Nature and treatment of stuttering: New directions (Curlee, R.F., and Perkins, W.H., eds.). San Diego: College Hill Press.
Goldman-Eisler, F. (1968). Psycholinguistics: Experiments in spontaneous speech. London: Academic Press.
Ingham, R.J. (1984). Stuttering and behavior therapy: Current status and experimental foundations. San Diego: College Hill Press.
Ladefoged, P. (1982). A course in phonetics. 2nd ed. San Diego: Harcourt Brace Jovanovich.
Perkins, W.H., Rudas, J., Johnson, L., and Bell, J. (1976). Stuttering: Discoordination of phonation with articulation and respiration. Journal of Speech and Hearing Research 19, 509-522.
Peters, T.J., and Guitar, B. (1991). Stuttering: An integrated approach to its nature and treatment. Baltimore: Williams and Wilkins.
Rimm, D.C., and Masters, J.C. (1979). Behavior therapy: Techniques and empirical findings. New York: Academic Press.
Webster, R.L. (1978). Precision fluency shaping program: Clinicians program guide. Roanoke: Hollins Communications Development Corporation.
Manuscript received September 1993; revised January 1994; accepted April 1994.