ORIGINAL ARTICLE
COMPARING REAL-TIME AND TRANSCRIPT-BASED TECHNIQUES FOR MEASURING STUTTERING
J. SCOTT YARUSS The University of Pittsburgh, Pittsburgh, Pennsylvania, USA
MICHELLE S. MAX Consolidated School District 21, Wheeling, Illinois, USA
ROBYN NEWMAN Northwestern University, Evanston, Illinois, USA
JUNE HAERLE CAMPBELL Private Practice, Carmel, California, USA
The purpose of this study was to compare results obtained from two different techniques for counting speech disfluencies. Fifty audio-videotaped speech samples were analyzed using: (a) a transcript-based technique designed to evaluate speech (dis)fluency in the context of a speaker’s conveyed message and (b) a real-time technique designed to rapidly determine the frequency of various types of speech disfluencies in conversational speech. Results obtained using these two techniques were quite similar, though there were some consistent and predictable differences associated with the presence of brief sound prolongations and more complicated disfluency clusters. Further analysis revealed that the two measurement techniques resulted in substantially different severity ratings for only two of the 50 speech samples. Findings provide support for a comprehensive measurement strategy in which clinicians utilize a transcript-based approach when more detailed information is needed (e.g., during diagnostic evaluations) and a real-time approach for documenting on-going changes in clients’ speech behaviors (e.g., during treatment). © 1998 Elsevier Science Inc.
Address correspondence to J. Scott Yaruss, Ph.D., CCC-SLP, Communication Sciences and Disorders, The University of Pittsburgh, 4033 Forbes Tower, Pittsburgh, PA 15260.
J. FLUENCY DISORD. 23 (1998), 137–151. © 1998 Elsevier Science Inc. All rights reserved. 0094-730X/98/$19.00. PII S0094-730X(98)00003-5.

INTRODUCTION
In recent years, there has been considerable debate regarding the measurement of stuttering behaviors. One of the most prominent topics in the literature has been measurement reliability (Cordes, 1994; Cordes & Ingham, 1994a; Curlee, 1981; Kully & Boberg, 1988; MacDonald & Martin, 1973; Tuthill,
1946; Young, 1975, 1984); however, there are a number of other important issues in need of further study (Yaruss, 1997a). For example, recent research has examined whether it is appropriate to count instances of disfluency or instances of stuttering (e.g., Conture, 1990; Cordes & Ingham, 1996a; Costello & Ingham, 1984; Ham, 1989), whether such counts should be based on the number of words or number of syllables produced (Andrews & Ingham, 1971; Brundage & Bernstein Ratner, 1989; Ham, 1986), or whether counts should be based on the instances of speech disruptions or the time intervals containing disruptions (Cordes & Ingham, 1994b, c, 1995a, 1996b; Cordes et al., 1992; Ingham, Cordes, & Finn, 1993; Ingham, Cordes, & Gow, 1993). It seems likely that clinicians’ varying decisions regarding these variables may contribute to the finding that different clinicians obtain different results when measuring stuttering (Kully & Boberg, 1988; Cordes & Ingham, 1995a).
In addition to these topics, it seems reasonable to assume that there may also be important differences associated with the type of measurement technique a clinician selects. Some methods for measuring stuttering emphasize a more comprehensive analysis of speech (dis)fluency based on a detailed verbatim transcript of a speech sample (e.g., Campbell & Hill, 1987; Rustin, Botterill, & Kelman, 1996), while others emphasize a more rapid but less detailed analysis of the production of speech disfluencies based on real-time (or “online,” after Conture, 1990) counting and categorizing of speech disfluencies (e.g., Conture, 1990; Conture & Yaruss, 1993; Riley, 1994; Yaruss, 1998).
Certainly, each technique has its advantages and disadvantages. Transcript-based techniques provide more information than can readily be obtained through real-time measures, including more detailed analyses of how a client’s speech and language abilities relate to the production of speech disfluencies. Transcript-based techniques also provide greater opportunities to assess qualitative aspects of speech disfluencies, such as audible and visible tension. Unfortunately, transcript-based analyses are also quite time consuming, so it is generally not feasible to conduct such detailed measurements on a regular basis (e.g., for documenting changes in a client’s progress throughout treatment). Real-time techniques, on the other hand, are much faster to complete, so they provide a method for collecting the objective data necessary to document changes in a client’s stuttering behaviors without requiring a large time commitment. Still, the amount of detail that can be assessed with real-time techniques is somewhat limited.
Based on the strengths and weaknesses of these two approaches, then, one seemingly reasonable measurement strategy would be to utilize transcript-based methods when more detailed data are necessary, such as during a diagnostic evaluation, and real-time methods when it is less feasible to devote a large amount of time to data collection, such as during treatment (Yaruss, 1997a). Unfortunately, it is not presently clear how the results of real-time and transcript-based measures relate to one another. Given the concerns regarding the
reliability of stuttering measurements noted above, it seems reasonable to assume that there may be differences in the results obtained using these different techniques. For example, real-time analyses require rapid judgments of speech behaviors, so subtle behaviors may be less likely to be identified. Conversely, there may be a tendency to “overanalyze” subtle behaviors with transcript-based procedures, when a videotaped speech sample can be viewed repeatedly or in slow motion. As a result, it is not clear whether data obtained using these two techniques can be reasonably compared in a strategy such as that proposed above. Accordingly, the purpose of this study was to examine similarities and differences in the frequency and types of speech disfluencies obtained using a transcript-based and a real-time analysis in order to determine whether these two types of measurement approaches can be combined in a comprehensive strategy for measuring the speech disfluency behaviors of individuals who stutter.
METHOD
Speech Samples
Analyses in this study were based on 50 audio/videotaped speech samples, 200 syllables in length, that were drawn from a collection of videotapes at the Northwestern University Speech and Language Clinics. Each speech sample was collected during a comprehensive diagnostic evaluation of the client’s speech and language production designed to determine whether the client was in need of treatment for stuttering, and, if so, what the nature of that treatment should be (for detailed discussions of the diagnostic procedures, see Gregory, 1986; Gregory & Hill, 1993). Because the purpose of this study was to evaluate different measurement techniques, rather than different individuals who stutter, the speech samples analyzed in this study were drawn from clients across a range of ages and severity levels, speaking in a variety of different situations.
Analysis Techniques
For each of the 50 speech samples, the frequency and types of speech disfluencies were determined utilizing both a transcript-based technique and a real-time technique, as follows:
Transcript-Based Technique. The transcript-based technique selected for this study was Systematic Disfluency Analysis (SDA; Campbell & Hill, 1987), a method designed to examine the frequency and severity of speech (dis)fluency within the context of a speaker’s conveyed message. This technique was selected for this study because it has been in use at Northwestern University for over 10 years (in varying stages of development). This study represents one stage in an on-going process of evaluating whether this method
can provide a meaningful or useful measurement of speech fluency in a clinical setting.
For this technique, the clinician audio/videotape-records a 200-syllable sample of meaningful spontaneous speech in any of a number of different speaking situations (Yaruss, 1997b). From the speech sample, the clinician then prepares a detailed orthographic transcription of the client’s speech and notes on the transcript the occurrence of all speech disfluencies, as well as the audible and visible qualitative features of those disfluencies. Each disfluency is categorized based upon the continuum of “more typical” and “less typical” disfluencies described by Gregory and colleagues (e.g., Campbell & Hill, 1987; Gregory, 1986; Gregory & Hill, 1993).1 In general, “more typical” disfluencies are those considered most characteristic of individuals who do not stutter (i.e., repetitions of phrases, revisions, interjections, hesitations), while “less typical” disfluencies are those considered most characteristic of individuals who do stutter (e.g., repetitions of sounds or syllables, prolongations, blocks, or any other disfluency produced with accompanying audible or visible physical tension). Audible features included in the analysis are increased loudness or pitch changes, as well as the number of iterations in a repetition or the length of a prolongation. Visible features include the presence of nonspeech behaviors or physical tension. Finally, all instances of “multi-component” or “clustered” disfluencies (e.g., Hubbard & Yairi, 1988; LaSalle & Conture, 1995) are identified and categorized as more typical or less typical as noted above. (Note that, according to the instructions for the SDA, multi-component or clustered disfluencies are counted as a single instance of disfluency.)
The frequency analysis in the SDA is based on the percentage of more typical and less typical disfluencies and disfluency clusters that occurred during the 200-syllable sample. Thus, after coding the speech disfluencies in a sample as more typical or less typical, the clinician counts the total number of each type of disfluency in the sample and divides by 2 (because there are 200 syllables in the speech sample) to obtain a frequency count representing the percentage (i.e., number per 100 syllables) of more typical and less typical disfluencies occurring in that sample (Yaruss, 1998). It is important to note that the percent occurrence of individual disfluency types (e.g., prolongations or repetitions) is not calculated as part of the SDA procedures, although such information is present on the verbatim transcript (Yaruss, 1997a). The SDA also includes a severity analysis (Campbell, Hill, & Driscoll, 1991), which is based on a scoring system that weights the types of disfluencies and the qualitative features of disfluencies described above to derive an index of stuttering severity.
1 See Yaruss (1997a) for detailed discussions of how the continuum of more typical and less typical disfluencies relates to similar systems proposed by Conture and colleagues (1990a, b) and Yairi and colleagues (Yairi, 1996; Yairi & Ambrose, 1992; Yairi, Ambrose, & Niermann, 1993; Yairi, Ambrose, Paden, & Throneburg, 1996).
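As a rough illustration of the frequency computation described above (this sketch is not part of the published SDA materials, and the function and variable names are hypothetical), the percentage of more typical and less typical disfluencies in a 200-syllable sample can be computed as follows:

```python
# Illustrative sketch of the SDA-style frequency count described above.
# Assumes each disfluency (or disfluency cluster, counted once) has already
# been coded as "MT" (more typical) or "LT" (less typical).

SAMPLE_SYLLABLES = 200  # the SDA specifies a 200-syllable sample

def sda_frequencies(disfluency_codes):
    """Return percent (per 100 syllables) of more and less typical disfluencies."""
    mt_count = disfluency_codes.count("MT")
    lt_count = disfluency_codes.count("LT")
    # Dividing by 2 converts counts from a 200-syllable sample to percent
    # (i.e., number per 100 syllables), as described in the text.
    return {"more_typical_pct": mt_count / (SAMPLE_SYLLABLES / 100),
            "less_typical_pct": lt_count / (SAMPLE_SYLLABLES / 100)}

# Example: 7 more typical and 5 less typical disfluencies in 200 syllables
print(sda_frequencies(["MT"] * 7 + ["LT"] * 5))
# -> {'more_typical_pct': 3.5, 'less_typical_pct': 2.5}
```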
Because the purpose of this study was to compare measures of the frequency and type of disfluency in transcript-based versus real-time analysis techniques, the present study utilized only the frequency analysis portion of the SDA procedures for more typical and less typical disfluency types.
Real-time Technique. The real-time technique selected for this study was based upon procedures designed to obtain a rapid count of the number and types of disfluencies present in a speech sample, as described by Conture and colleagues (e.g., Conture, 1990, pp. 56–60; see also Yaruss, 1998). For this technique, the clinician observes a speech sample (either while the client is speaking or while viewing a videotape2) and determines whether each word or syllable is produced fluently or disfluently. Using a standard disfluency count sheet (e.g., Conture, 1990, p. 58; Yaruss, 1998), the clinician marks fluent words or syllables with a dash or a dot, while disfluent words or syllables are identified with an abbreviation indicating the type of disfluency produced (e.g., “SSR” for sound/syllable repetition). Next, the clinician calculates the overall frequency of disfluencies per 100 words or syllables produced, as well as the relative frequencies of various disfluency types.
Unlike the SDA, which is a formally defined measurement instrument that specifies the nature of the speech sample and the types of disfluencies to be assessed, the real-time technique imposes fewer requirements regarding factors such as the nature of the disfluencies assessed, the size of the sample utilized, or whether the unit of measurement should be words or syllables. In other words, this real-time procedure can be used regardless of the specific decisions made by the clinician regarding the unit of measurement or the disfluencies to be identified (Yaruss, 1997a). Accordingly, the procedures described by Conture (1990) and Yaruss (1998) were adapted in the present study to allow a more direct comparison between real-time and transcript-based analyses. Specifically, the technique was used to count disfluent syllables, rather than words (as suggested by Conture), and the disfluency types identified using the real-time technique were the same as those used in the SDA (i.e., “more typical” vs. “less typical” disfluencies), rather than those utilized by Conture and colleagues (“between-word” vs. “within-word” disfluencies; see footnote 1). Furthermore, the frequency counts were limited to the percentage of more typical and less typical disfluencies as specified by the SDA, rather than individual disfluency types as described by Conture (1990; Yaruss, 1997a, 1998). Finally, although Conture suggests that the length of the speech sample should be at least 300 words, the length of the samples in this study was limited to the 200 syllables prescribed in the instructions for the SDA.
2 Note that although it is possible to view and review the videotape when conducting real-time measures after an evaluation has occurred, the purpose of the technique is to obtain a relatively rapid objective analysis of the frequency and types of disfluencies. Thus, the videotape is not necessarily reexamined to the same extent as it is with transcript-based techniques such as the SDA.
In spite of these relatively minor changes, however, the basic principles of the real-time technique (a rapid count of the frequency and types of disfluencies made while watching a client speak or while viewing a videotape) were preserved for comparison to the SDA procedures.
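The following sketch illustrates the adapted real-time tally described above; it is an assumption-laden example rather than the study's actual scoring procedure, and the disfluency codes and category assignments are hypothetical placeholders mapped onto the more typical/less typical distinction used here.

```python
# Minimal sketch of a syllable-based real-time tally (hypothetical codes).
# Each syllable is marked "-" if fluent, or with a disfluency abbreviation;
# codes are mapped to the SDA's more typical / less typical categories.

MORE_TYPICAL = {"PHR", "REV", "INT", "HES"}   # e.g., phrase repetition, revision, interjection, hesitation
LESS_TYPICAL = {"SSR", "PRO", "BLK"}          # e.g., sound/syllable repetition, prolongation, block

def realtime_frequencies(marks):
    """Compute percent more/less typical disfluencies per 100 syllables marked."""
    total_syllables = len(marks)
    mt = sum(1 for m in marks if m in MORE_TYPICAL)
    lt = sum(1 for m in marks if m in LESS_TYPICAL)
    return {"more_typical_pct": 100 * mt / total_syllables,
            "less_typical_pct": 100 * lt / total_syllables}

# Example count sheet: 200 marks, mostly fluent syllables
marks = ["-"] * 190 + ["INT"] * 4 + ["SSR"] * 4 + ["PRO"] * 2
print(realtime_frequencies(marks))
# -> {'more_typical_pct': 2.0, 'less_typical_pct': 3.0}
```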
Data Analysis and Measurement Reliability
When making a comparison of different measurement techniques, there are a number of methodological concerns that may bias the analyses in favor of obtaining more similar results from the different techniques. For example, if the same clinician analyzes a given speech sample on two occasions, it seems likely that decisions made during the first analysis might affect decisions made during the second analysis (similar to the bias that may be introduced in a test-retest situation; e.g., Cordes, 1994; Schiavetti & Metz, 1997). This would be particularly problematic if the first analysis required many careful viewings of the videotape (as in the SDA), while the second analysis was supposed to be based on a more rapid viewing (as in the real-time measure). Accordingly, it was determined that the best way to conduct a meaningful comparison of these two techniques was to utilize different clinicians for each technique. Thus, in this study, the SDA results were drawn from original measures made by examining clinicians at the time of the clients’ initial diagnostic evaluations, while the real-time analyses were based on measures made later by the second author from videotapes recorded during the evaluations.
To ensure reliability of the transcript-based measures between observers, a number of steps were taken. First, the clinicians who collected the initial SDAs had undergone extensive training regarding the analysis of speech fluency (e.g., Campbell, Hill, Yaruss, & Gregory, 1996). Second, each SDA was thoroughly reviewed by the authors of the SDA technique (June Campbell and Diane Hill) following the diagnostic evaluations. Finally, to assess the reliability of the SDA data, 10 speech samples (20% of the data) were selected at random and re-scored by the third author. Comparisons between the original and re-scored SDAs for both more typical (MT) and less typical (LT) disfluency types revealed small, nonsignificant mean differences (MT: mean = 0.11, SD = 1.84, t(9) = 0.19, p = .85; LT: mean = -0.90, SD = 1.63, t(9) = -1.75, p = .12) and strong, positive Pearson product-moment correlations (MT: r = .86, p < .001; LT: r = .87, p < .001). Similarly, to assess reliability of the real-time measures, 10 different speech samples (20% of the data) were selected at random and re-scored by the second author. As with the SDA measures, comparisons between the original and re-scored real-time measures for both MT and LT revealed small, nonsignificant mean differences (MT: mean = 0.10, SD = 1.27, t(9) = 0.19, p = .81; LT: mean = 0.31, SD = 1.25, t(9) = 0.71, p = .45) and strong, positive Pearson product-moment correlations (MT:
r = .87, p < .001; LT: r = .97, p < .001). Thus, it was determined that the measures made in this study were sufficiently reliable to allow meaningful comparison of the SDA and real-time measurement techniques.
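For readers who wish to reproduce this style of reliability check on their own data, the sketch below computes a mean difference, a paired-samples t-test, and a Pearson correlation for original versus re-scored frequency counts. The data values are hypothetical, not those reported above.

```python
# Sketch of the inter-observer reliability analysis described above, using
# hypothetical original vs. re-scored percent-disfluency values for 10
# randomly selected samples (not the study's actual data).
import numpy as np
from scipy import stats

original = np.array([2.5, 4.0, 1.5, 6.0, 3.0, 5.5, 2.0, 7.5, 4.5, 3.5])
rescored = np.array([2.0, 4.5, 1.5, 6.5, 3.0, 5.0, 2.5, 7.0, 5.0, 3.5])

diff = original - rescored
t_stat, p_t = stats.ttest_rel(original, rescored)   # paired-samples t-test
r, p_r = stats.pearsonr(original, rescored)         # Pearson product-moment correlation

print(f"mean difference = {diff.mean():.2f}, SD = {diff.std(ddof=1):.2f}")
print(f"t(9) = {t_stat:.2f}, p = {p_t:.2f}")
print(f"r = {r:.2f}, p = {p_r:.3f}")
```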
RESULTS
Similarities and Differences Between the Techniques
Similarities and differences between the two measurement techniques were assessed in three ways for both more typical and less typical disfluencies: (a) a direct comparison of the frequency counts obtained with each technique; (b) mean differences and paired-samples t-tests (Figure 1); and (c) Pearson product-moment correlations (Figure 2).
More typical disfluencies. Direct comparison of the frequency of more typical disfluencies obtained using the two measurement techniques revealed that the results were identical (within ±1%) for only 27 (54%) of the 50 speech samples. Nevertheless, there was a high degree of similarity between the two techniques. The mean difference in the frequency of more typical disfluencies obtained using the two techniques was only 0.16% (SD = 1.64%), and a paired t-test revealed no significant difference (t(49) = -0.70, p = .49). Finally, Pearson product-moment correlations revealed a strong, significant, positive correlation between the real-time and transcript-based measurement techniques (r = .88, p < .001).
Figure 1. Mean differences between real-time and transcript-based analyses for more typical and less typical disfluency types. Vertical lines indicate 1 standard deviation.
Figure 2. Scatterplots indicating strong positive correlations between real-time and transcript-based analyses for more typical and less typical disfluency types.
Less typical disfluencies. Direct comparison of the frequency of less typical disfluencies obtained using the two measurement techniques revealed that the results were identical (within ±1%) for only 17 (34%) of the 50 speech samples. Again, however, there was a high degree of similarity between the two techniques. The mean difference between the techniques was 0.50% (SD = 2.75%), and a paired t-test revealed no significant difference (t(49) = 1.29, p = .20). Finally, Pearson product-moment correlations revealed a strong, significant, positive correlation between the real-time and transcript-based measurement techniques (r = .94, p < .001).
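The three comparisons just described can be summarized in a short script. The sketch below uses simulated data (not the study's counts) to show how the within-±1% tally, mean difference, paired t-test, and correlation were obtained for the 50 samples.

```python
# Sketch of the between-technique comparison described above, using simulated
# percent-disfluency values for 50 samples (illustrative only).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
transcript = rng.uniform(0, 10, size=50)              # % less typical, transcript-based (SDA)
realtime = transcript + rng.normal(0, 1.5, size=50)   # % less typical, real-time

difference = realtime - transcript
within_1pct = np.sum(np.abs(difference) <= 1.0)       # "identical (within ±1%)" tally
t_stat, p_val = stats.ttest_rel(realtime, transcript) # paired-samples t-test
r, _ = stats.pearsonr(realtime, transcript)           # Pearson correlation

print(f"identical within ±1%: {within_1pct}/50 samples")
print(f"mean difference = {difference.mean():.2f}% (SD = {difference.std(ddof=1):.2f}%)")
print(f"t(49) = {t_stat:.2f}, p = {p_val:.2f}; r = {r:.2f}")
```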
Explaining the Differences
In order to better understand why differences occurred between the two measurement techniques, the transcripts and real-time measures were directly examined for all samples in which the absolute difference in the results was greater than or equal to 3%.
More Typical Disfluencies. For the more typical disfluency types, four such samples were found (8% of the data). In each of these cases, the real-time measure resulted in a higher frequency than the transcript-based measure (mean difference = 3.96%). Analysis of the transcripts revealed that these samples contained several disfluency clusters that were counted as a single disfluency in the transcript-based analysis, but as two or more disfluencies in the real-time analysis. Thus, the higher frequencies obtained using the real-time technique could be attributed to different ways of coding disfluency clusters.
Less Typical Disfluencies. For the less typical disfluencies, 11 samples (22% of the data) were found for which the difference between the two
techniques was greater than or equal to 3%. The transcript-based analysis resulted in a higher frequency for four of these cases (mean difference = 5.21%), while the real-time measure resulted in the higher frequency for the other seven cases (mean difference = 4.58%). Analysis of the transcripts revealed that those cases in which the transcript-based technique resulted in a higher frequency contained several very brief prolongations (less than 0.25 second) that were not coded in the real-time analyses. Those cases in which the real-time analysis resulted in a higher frequency contained several disfluency clusters. As with the more typical disfluency clusters discussed above, these disfluencies were counted as a single disfluency in the transcript-based technique and as two or more disfluencies in the real-time technique, thus resulting in a higher frequency count for the real-time analysis.
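To make the cluster-coding difference concrete, the worked example below (a hypothetical utterance, not drawn from the study's samples) shows how a single two-component cluster inflates the real-time count relative to the SDA count.

```python
# Worked illustration (hypothetical utterance) of the cluster-coding difference
# described above: the SDA counts a multi-component cluster once, while the
# real-time tally tends to count each component separately.

# One disfluency cluster: an interjection immediately followed by a
# sound/syllable repetition on the next word ("um b-b-ball").
cluster_components = ["INT", "SSR"]

sda_count = 1                               # whole cluster = one disfluency (SDA rule)
realtime_count = len(cluster_components)    # components coded separately = 2

sample_syllables = 200
print(f"SDA: {100 * sda_count / sample_syllables:.1f}% "
      f"vs. real-time: {100 * realtime_count / sample_syllables:.1f}% "
      "from this single cluster")
# -> SDA: 0.5% vs. real-time: 1.0% from this single cluster
```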
Diagnostic Implications of Measurement Differences
In order to determine whether the differences obtained via the two techniques might have resulted in different severity judgments for these 50 speech samples, the frequency counts for the less typical disfluencies (i.e., those disfluencies that were most likely to represent stuttering; see Conture, 1990) obtained from each analysis were assigned point-values based on the guidelines in the Iowa Scale for Rating the Severity of Stuttering (Johnson, Darley, & Spriestersbach, 1963).3 The resulting point-values were then compared to determine whether there were any differences that might have resulted in a client’s stuttering being assigned a substantially different severity rating (e.g., “mild” using one technique and “moderate” or “severe” using the other technique). Analyses revealed only two instances in the 50 speech samples (4%) in which the frequency counts obtained from the different measurement techniques resulted in such different severity ratings. In one case, a client was rated as “mild” using data from the SDA and as “moderate” using data from the real-time analysis. In the other, a client was rated as “moderate” using the SDA and “severe” using the real-time analysis. Again, these were among the cases where clustered disfluencies were tabulated differently in the two techniques, as discussed above.
3 The goal of this comparison was to examine how differences in frequency counts might be related to differences in severity rating. Accordingly, the more commonly used Stuttering Severity Instrument (Riley, 1994) was not selected for these analyses, in order to minimize the effects of differences in the duration of disfluencies or physical concomitants on overall severity ratings.
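The comparison step can be sketched as follows. Note that the cut-off values in this sketch are illustrative placeholders only and are not the point-values of the Iowa Scale (Johnson et al., 1963); the function names are likewise hypothetical.

```python
# Hedged sketch of the severity comparison: each technique's percent of less
# typical disfluencies is mapped to a rating category, and the two ratings are
# compared. The thresholds below are ILLUSTRATIVE PLACEHOLDERS ONLY; actual
# point-values must be taken from the Iowa Scale (Johnson et al., 1963).

def severity_rating(less_typical_pct, thresholds=((2.0, "mild"), (8.0, "moderate"))):
    """Map a percent-disfluency value to a coarse rating (placeholder cut-offs)."""
    for cutoff, label in thresholds:
        if less_typical_pct < cutoff:
            return label
    return "severe"

def ratings_disagree(sda_pct, realtime_pct):
    """True when the two techniques yield substantially different ratings."""
    return severity_rating(sda_pct) != severity_rating(realtime_pct)

# Example: a sample whose clusters were tabulated differently by the two techniques
print(severity_rating(7.5), severity_rating(9.0))   # -> moderate severe
print(ratings_disagree(7.5, 9.0))                   # -> True
```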
DISCUSSION
Results from the present investigation reveal that the frequencies of more typical and less typical disfluencies obtained from a transcript-based analysis
were quite similar, but not identical, to those obtained from a real-time analysis. As such, these findings provide part of the necessary background for evaluating a comprehensive data-collection strategy in which clinicians utilize a transcript-based approach when more data are needed for making diagnostic decisions and a real-time approach for documenting on-going treatment (e.g., Yaruss, 1997a). Because of the strong similarities that were found between the two techniques, it would seem quite appropriate for clinicians to employ such a strategy, knowing that the results they obtain using different measurement techniques will not differ significantly simply because of the nature of the measurement techniques utilized.
Still, there were some important differences in results that clinicians should consider when combining measurement approaches in this way. Specifically, the speech samples that resulted in the greatest discrepancies between real-time and transcript-based measures were those that contained either brief, subtle prolongations or a higher number of multi-component or clustered disfluencies. These findings are not necessarily surprising, for as noted in the Introduction, it seems quite reasonable to assume that events of shorter duration may be more likely to be overlooked in a real-time analysis. It is interesting to note, however, that the instructions for the SDA technique (Campbell & Hill, 1987) specifically state that clinicians should not code disfluencies of less than 0.5 second in duration. Thus, the fact that clinicians were identifying these brief disfluencies would appear to provide an example of the “overanalysis” suggested in the Introduction. Furthermore, if the clinicians had been using the SDA as instructed, it is likely that the present analysis would have revealed even fewer differences between the two measurement techniques than were found.
Similarly, the more complicated nature of disfluency clusters makes it likely that a clinician may not be able to adequately consider whether the components of a disfluency cluster are combined or separate when conducting a real-time analysis. Thus, as appears to have happened in the present study, such multi-component disfluencies may be more likely to be coded as separate disfluencies, rather than combined disfluencies. Accurate coding of disfluency clusters is a potentially important issue, since recent research has indicated that disfluency clusters are a common aspect of the speech of children who stutter and may help distinguish children who do stutter from children who do not (LaSalle & Conture, 1995). Further, such clustering may be indicative of the types of coping strategies used by adults who stutter. Accordingly, clinicians should be particularly cautious when analyzing the results of a speech sample that contains a high number of disfluency clusters, regardless of the specific measurement technique that is selected.
Although these differences between the two techniques were identified, it is important to note that such differences were rarely large enough to result in substantially different severity ratings being assigned using the Iowa Scale
(Johnson et al., 1963). So, even though some minor differences might be introduced into the documentation of a client’s stuttering behaviors due to the difference in measurement technique, it is not yet clear that these are differences that make a difference for the diagnostic process.
It is important to note that the high degree of reliability and consistency found in the present study is likely due, at least in part, to the extensive training clinicians received prior to collecting their data. In general, prior reports have revealed that it is possible to achieve good measurement reliability when judges are carefully trained about the nature of the stuttering measures to be made (Cordes & Ingham, 1994b, 1996b; Costello & Hurst, 1981; Ingham, Cordes, & Gow, 1993). Accordingly, clinicians must make sure that they have received adequate training and that their measures are sufficiently reliable before attempting to measure their clients’ stuttering behaviors (Yaruss, 1997a, 1998). Based on the present findings, then, it seems that a reasonable aspect of this training may involve analyzing a given speech sample using more than one measurement technique in order to identify similarities and differences in the clinician’s judgments (Yaruss, 1997a, 1998). Indeed, the process of analyzing data from this study has highlighted the need for further training in the area of coding brief prolongations and disfluency clusters in order to improve the reliability of both the SDA and real-time measures made by the clinicians trained by the present authors. Further studies of this sort should help identify other aspects of the measurement process that may benefit from improved training.
Related to the issue of high reliability, it is also important to note that the present results are based on overall or average frequencies, rather than a point-by-point comparison of the specific syllables coded during the real-time and SDA analyses. Prior research has demonstrated that overall agreement is generally higher than point-by-point agreement for stuttering measures (see Cordes, 1994; Lewis, 1994). Thus, it would seem that an important extension of this study would be to examine point-by-point agreement between these two techniques. Unfortunately, however, it is not possible to examine point-by-point agreement for real-time measures of conversational speech, since it is not possible to determine exactly which words were coded when the words themselves are not recorded on the scoring sheet (Yaruss, 1998). Thus, examination of the point-by-point agreement of real-time and transcript-based measures must await further study, perhaps using a reading passage or other material for which it can be assured that the same words are coded using each technique. In addition, due to the procedures for coding disfluency types described in the instructions for the SDA (which codes individual disfluency types but does not calculate individual frequencies), the present study examined agreement obtained for more typical disfluencies or less typical disfluencies as a whole. Accordingly, another way in which the present findings can be extended in future studies would be to examine the agreement between the
two measurement techniques for different types of disfluencies (e.g., prolongations or repetitions). For the present, however, it seems clear that the two techniques can provide highly comparable results for the types of data that are collected in many clinical settings.
In sum, results from this study confirm that transcript-based and real-time techniques can provide highly similar results for measuring the frequency of more typical and less typical speech disfluencies. When measuring the speech behavior of their clients who stutter, clinicians can feel comfortable adopting a comprehensive measurement strategy that involves using a detailed transcript-based approach to support the diagnostic decision-making process combined with a less detailed real-time approach to document changes seen in on-going treatment.
This manuscript was prepared while all authors were at Northwestern University. The authors would like to express their appreciation to Gene Brutten, Ken St. Louis, and two anonymous reviewers for their helpful input on an earlier version of this paper. Portions of this paper were presented at the 1995 Convention of the American Speech-Language-Hearing Association, Orlando, Florida.
REFERENCES
Andrews, G., & Ingham, R. (1971). Stuttering: Considerations in the evaluation of treatment. British Journal of Disorders of Communication, 6, 427–429.
Boberg, E., & Kully, D. (1985). Comprehensive Stuttering Program. San Diego: College-Hill Press.
Brundage, S., & Bernstein Ratner, N. (1989). The measurement of stuttering frequency in children's speech. Journal of Fluency Disorders, 14, 351–358.
Campbell, J., & Hill, D. (1987, Nov.). Systematic disfluency analysis: Accountability for differential evaluation and treatment. Miniseminar presented to the Annual Convention of the American Speech-Language-Hearing Association, New Orleans, LA.
Campbell, J., Hill, D., & Driscoll, M. (1991, Nov.). Systematic Disfluency Analysis: Using SDA to determine stuttering severity. Poster paper presented to the Annual Convention of the American Speech-Language-Hearing Association, Anaheim, CA.
Campbell, J.H., Hill, D.G., Yaruss, J.S., & Gregory, H.H. (1996, Nov.). Integrating academic and clinical education in fluency disorders. Invited seminar presented at the Annual Convention of the American Speech-Language-Hearing Association, Seattle, WA.
Conture, E.G. (1990). Stuttering (2nd ed.). Englewood Cliffs, NJ: Prentice-Hall.
Conture, E.G., & Yaruss, J.S. (1993). Handbook for childhood stuttering: A training manual. Tucson, AZ: Bahill Intelligent Computer Systems.
Cordes, A.K. (1994). The reliability of observational data: I. Theories and methods of speech-language pathology. Journal of Speech and Hearing Research, 37, 264–278.
Cordes, A.K., & Ingham, R.J. (1994a). The reliability of observational data: II. Issues in the identification and measurement of stuttering events. Journal of Speech and Hearing Research, 37, 279–294.
Cordes, A.K., & Ingham, R.J. (1994b). Time-interval measurement of stuttering: Effects of interval duration. Journal of Speech and Hearing Research, 37, 779–788.
Cordes, A.K., & Ingham, R.J. (1994c). Time-interval measurement of stuttering: Effects of training with highly agreed or poorly agreed exemplars. Journal of Speech and Hearing Research, 37, 1295–1307.
Cordes, A.K., & Ingham, R.J. (1995a). Judgments of stuttered and nonstuttered intervals by recognized authorities in stuttering research. Journal of Speech and Hearing Research, 38, 33–41.
Cordes, A.K., & Ingham, R.J. (1996a). Disfluency types and stuttering measurement: A necessary connection? Journal of Speech and Hearing Research, 39, 404–405.
Cordes, A.K., & Ingham, R.J. (1996b). Time-interval measurement of stuttering: Establishing and modifying judgment accuracy. Journal of Speech and Hearing Research, 39, 298–310.
Cordes, A.K., Ingham, R.J., Frank, P., & Ingham, J.C. (1992). Time-interval analysis of interjudge and intrajudge agreement for stuttering event judgments. Journal of Speech and Hearing Research, 35, 483–494.
Costello, J.M., & Hurst, M.R. (1981). An analysis of the relationship among stuttering behaviors. Journal of Speech and Hearing Research, 24, 247–256.
Costello, J.M., & Ingham, R.J. (1984). Assessment strategies for stuttering. In R.F. Curlee & W.H. Perkins (Eds.), Nature and treatment of stuttering: New directions. San Diego: College-Hill Press.
Curlee, R.F. (1981). Observer agreement on disfluency and stuttering. Journal of Speech and Hearing Research, 24, 595–699.
Gregory, H. (1986). Stuttering: Differential evaluation and therapy. Austin, TX: Pro-Ed.
Gregory, H.H., & Hill, D. (1993). Differential evaluation—differential therapy for stuttering children. In R.F. Curlee (Ed.), Stuttering and related disorders of fluency. New York: Thieme Medical Publishers.
Ham, R. (1986). Techniques of stuttering therapy. Englewood Cliffs, NJ: Prentice-Hall.
Ham, R. (1989). "What are we measuring?" Journal of Fluency Disorders, 14, 231–243.
Hubbard, C., & Yairi, E. (1988). Clustering of disfluencies in the speech of stuttering and nonstuttering preschool children. Journal of Speech and Hearing Research, 31, 228–233.
Ingham, R.J., Cordes, A.K., & Finn, P. (1993). Time-interval measurement of stuttering: Systematic replication of Ingham, Cordes, & Gow (1993). Journal of Speech and Hearing Research, 36, 1168–1176.
Ingham, R.J., Cordes, A.K., & Gow, M. (1993). Time-interval measurement of stuttering: Modifying interjudge agreement. Journal of Speech and Hearing Research, 36, 503–515.
Johnson, W., Darley, F.L., & Spriestersbach, D.C. (1963). Diagnostic methods in speech pathology. New York: Harper & Row.
Kully, D., & Boberg, E. (1988). An investigation of interclinic agreement in the identification of fluent and stuttered syllables. Journal of Fluency Disorders, 13, 308–318.
LaSalle, L.R., & Conture, E.G. (1995). Clustering of between-word and within-word disfluencies in the speech of children who do and do not stutter. Journal of Speech and Hearing Research, 38, 965–999.
Lewis, K.E. (1994). Reporting observer agreement on stuttering event judgments: A survey and evaluation of current practice. Journal of Fluency Disorders, 19, 269–284.
MacDonald, J.D., & Martin, R.R. (1973). Stuttering and disfluency as two reliable and unambiguous response classes. Journal of Speech and Hearing Research, 16, 691–699.
Riley, G. (1994). Stuttering severity instrument for children and adults (3rd ed.). Austin, TX: Pro-Ed.
Rustin, L., Botterill, W., & Kelman, E. (1996). Assessment and therapy for young dysfluent children: Family interaction. San Diego: Singular Publishing.
Schiavetti, N., & Metz, D.E. (1997). Evaluating research in communicative disorders (3rd ed.). Boston, MA: Allyn & Bacon.
Tuthill, C.E. (1946). A quantitative study of extensional meaning with special reference to stuttering. Speech Monographs, 13, 81–98.
Yairi, E. (1996). Applications of disfluencies in measurements of stuttering. Journal of Speech and Hearing Research, 39, 402–404.
Yairi, E., & Ambrose, N. (1992). A longitudinal study of stuttering in children: A preliminary report. Journal of Speech and Hearing Research, 35, 755–760.
Yairi, E., Ambrose, N., & Niermann, R. (1993). The early months of stuttering: A developmental study. Journal of Speech and Hearing Research, 36, 521–528.
Yairi, E., Ambrose, N., Paden, E.P., & Throneburg, R.N. (1996). Predictive factors of persistence and recovery: Pathways of childhood stuttering. Journal of Communication Disorders, 29, 51–77.
Yaruss, J.S. (1997a). Clinical measurement of stuttering behaviors. Contemporary Issues in Communication Science and Disorders, 24, 33–44.
Yaruss, J.S. (1997b). Clinical implications of situational variability in preschool children who stutter. Journal of Fluency Disorders, 22, 187–203.
Yaruss, J.S. (1998). Real-time analysis of speech fluency: Procedures and reliability training. American Journal of Speech-Language Pathology, 7, 25–37.
Young, M.A. (1975). Observer agreement for marking moments of stuttering. Journal of Speech and Hearing Research, 18, 530–540.
Young, M.A. (1984). Identification of stuttering and stutterers. In R.F. Curlee & W.H. Perkins (Eds.), Nature and treatment of stuttering: New directions. San Diego: College-Hill Press.