Journal of Voice Vol. 12, No. 4, pp. 513-526 © 1998 Singular Publishing Group, Inc.

A Multi-media, Computer-Based Method for Stroboscopy Rating Training

Bruce J. Poburka and *Diane M. Bless

Mankato State University, Mankato, Minnesota; *University of Wisconsin-Madison, Madison, Wisconsin, U.S.A.

Summary: Methods of training individuals to rate stroboscopic examinations vary widely in rating criteria, viewing times, samples, and length of training. Consequently, problems occur in both inter- and intrajudge agreement. Computer-aided instruction (CAI) provides a means to integrate and control key instructional factors that facilitate learning. This study attempted to determine if CAI could train individuals to make accurate and reliable visuo-perceptual judgments of stroboscopy. Experienced and inexperienced subjects rated 45 samples before and after training. Following 4 to 5 hours of CAI training, the subjects with no previous experience demonstrated improved interjudge agreement with a panel of expert raters. The training was not effective for the experienced group. Regardless of the rater's experience, the parameters that required evaluation of movement were more difficult to rate than those requiring only an assessment of structure. Key Words: Videostroboscopy--Visuo-perceptual--Reliability--Computer-aided instruction.

Videostroboscopy has been heralded as the best clinical technique for describing voice disorders. Compared to perceptual or acoustic methods of voice analysis, stroboscopy offers a more direct view of the larynx and vibratory pattern. Bless, Hirano, and Feder (1) have stated that stroboscopy provides an immediate indication of the presence or absence of pathology; that it provides a permanent record; and that, when paired with other assessment techniques, it provides qualitative and quantitative measures of vocal function. Sercarz et al. (2) stated that stroboscopy has gained widespread use and that it complements other assessment measures. Sataloff, Spiegel, and Hawkshaw (3) studied 377 clinical cases and concluded that the videostroboscopic examination modified the diagnosis (i.e., changed, added to, or confirmed uncertain diagnoses) in 47% of the cases. Mahieu and Dikkers (4) wrote that "the value of stroboscopy in vocal fold pathology cannot be overemphasized." They reported that phonosurgery is accomplished only when stroboscopic assessment occurs both pre- and postoperatively.

Accepted for publication March 10, 1997. Address all correspondence and reprint requests to Bruce J. Poburka, Ph.D., Department of Communication Disorders, MSU 77, P.O. Box 8400, Mankato State University, Mankato, MN 56002-8400, U.S.A.


Fundamental to the validity and reliability of stroboscopic evaluation is the experience and training of the rater. Experts have identified a number of factors that can influence the acquisition and interpretation of stroboscopy images, some of which relate specifically to training. They include: 1) the observer's knowledge of how vocal fold vibration relates to sound production; 2) knowledge of normal anatomy and physiology; 3) skill with the stroboscopic technique; and 4) skill in interpretation of the recorded image (1,5). Although stroboscopy is a valued clinical and research tool, its utility is currently limited by training-related problems. Bartell, Bless, and Ford (6) found that the type of training and the length of time necessary for adequate training were issues in need of further study. Research studies reporting interjudge agreement and intrajudge reliability have reported values that clearly suggest a need for improvement: interjudge agreement values ranging from .75 to .98 and intrajudge reliability values ranging from .31 to .97 have been reported (7,8). Recognizing this problem, some authors have sought ways to develop a more reliable method of making and interpreting stroboscopy observations (9,10). Perhaps the training problem was best described by Leonard (11), who stated:

Although specific experiences may differ among professionals, the interpretation and clinical use of laryngeal videostroboscopy information in the assessment and treatment of phonatory function disorders is highly specialized and requires substantial training and knowledge beyond that believed available in most graduate speech-language pathology or laryngology residency programs.

It is clear that a better method of training is needed if stroboscopic evaluations are to be fully exploited.

ADVANTAGES OF COMPUTER-ASSISTED TRAINING (CAI)

Selected educational research has focused on how computer-aided instruction (CAI) can be used to manipulate key instructional variables that have been shown to facilitate learning: 1) methods of presenting material, 2) interactivity, 3) feedback, and 4) practice. Because rating stroboscopic images is a visuo-perceptual task, it requires the rater to have a good mental concept of normal laryngeal movement patterns, and CAI may be well suited to developing these mental concepts. Kozma (12) stated that "computers also have the capability of creating dynamic, symbolic representations of nonconcrete, formal constructs that are frequently missing in the mental models of novices." He also argued that computer-assisted learning facilitates the development of expertise:

Experts in a domain are distinguished from novices, in part, by the nature of their mental models and how they use them to solve problems. The processing capabilities of the computer can help novices build and refine mental models so that they are more like those of the experts.

Kozma (12) formulated a theoretical framework to support the use of CAI, within which "the learner actively collaborates with the medium to construct knowledge" (p. 179). He argued that the medium (the various ways the material is presented) and the method have an integral relationship in which both are part of the educational design. Researchers who examined the use of interactive videodiscs concluded that videodisc presentations include dynamic, visual, and spatial characteristics that facilitate the formation of rich mental models of a given topic (13,14). Interactivity allows learners to adjust the instruction to conform to their individual needs and capabilities (15). A critical aspect of interactivity is branching, the capability to move to various parts of a lesson in a nonlinear manner (16). Control over the learning environment has been shown to be a critical factor in obtaining positive learning results (17,18).

Theory of skill acquisition is an area often ignored in discussions of teaching clinical judgments, yet these theories appear to be at the heart of acquiring clinical skill. Theory of skill acquisition relates practice, interactivity, nonlinearity of learning, and the need to control the learning environment. The theory advanced by Anderson (19) underscores the importance of practice in the development of expertise and further indicates that virtually every study of skill acquisition has related practice to improved performance.

Videostroboscopy would appear to be an ideal skill to acquire through CAI: it is a dynamic, visual phenomenon; persons acquire the skill at different rates and come to the task with different levels of underlying foundational knowledge; and clinicians could benefit from feedback about their ratings. Despite this seemingly ideal fit, no CAI training program had yet been developed. It was therefore the purpose of this study to determine whether a computer-assisted training program could be used to train individuals to make accurate and reliable visuo-perceptual judgments of stroboscopic examinations. This study does not seek to compare computer-based and classroom teaching techniques. It should also be clearly understood that this is a preliminary study, only the first in a possible series of studies examining stroboscopy training issues.

METHODS

Subjects
There were two subject groups in this study: a group of 27 female students who had no experience with stroboscopy and a group of nine female speech-language pathologists who did have experience. The student group consisted of students enrolled in the Assessment and Treatment of Voice Disorders course, a graduate-level voice disorders course at a midwestern university. The experienced group consisted of individuals with a minimum of 1 year of experience in rating stroboscopy; on average, this group had 2.4 years of experience rating stroboscopy examinations (range = 1-5.5 years). Although no males served as subjects in either group, the all-female groups were considered acceptable since the great majority of speech-language pathologists are female.

A panel of three expert judges was recruited to rate the same video samples as the two subject groups. The ratings submitted by the expert judges were used as the standard of correctness against which the subjects' ratings were compared. The individuals selected as judges had considerable stroboscopy rating experience, all had conducted research involving stroboscopy rating, and all met a criterion for rating reliability. The expert judges had an average of 9.2 years of experience with stroboscopy rating (range = 5.5-14 years).


PROCEDURES

Development of the computer-based modules
Information critical to stroboscopy rating was integrated into an interactive, multi-media, computer-based format using HyperCard software (Apple Developers Inc.) and a laser disc player. All of the modules utilized the key learning features discussed earlier (i.e., interactivity, practice, etc.). A typical screen included narrative information, digitized pictures, and video (Fig. 1). The video sequences were stored on a laser disc, and the software was configured to control the laser disc player (Sony LDP-1500).

The information was organized into four modules addressing: 1) background information; 2) interpretation of the stroboscopic images; 3) vocal pathologies and their effects on phonation; and 4) practice and feedback, including a quiz and strobe rating practice. The background module introduced stroboscopy equipment and the principles governing its operation, along with information about patient positioning and practical aspects of obtaining the stroboscopic image. The module covering rating of stroboscopic images defined each stroboscopic parameter according to definitions that have appeared in the existing literature (5). Each parameter was presented with written definitions, graphics to supplement the text, and video clips to demonstrate the phenomenon being discussed. In nearly all cases, 2 to 4 video clips were provided, including examples of normal and disordered phonatory patterns. The module on vocal pathologies and their effects on phonation covered 13 different pathological conditions; for each condition, text, graphics, and a variety of video clips were provided. The review and quiz module included a self-test of selected material covered in the other modules and 10 video clips with which the user could practice rating each stroboscopic parameter. On-screen feedback about the accuracy of the practice ratings was also provided.

To assure the content validity of the materials, all of the text and video material used in the training was reviewed and approved by three individuals with considerable academic and clinical preparation in videostroboscopy and its interpretation. None of these individuals was included in the experienced subject group, but two served as expert judges.
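The practice-and-feedback behavior described above for the review and quiz module can be summarized in a few lines. The following is a minimal sketch of that loop under stated assumptions, not the study's software (the actual implementation was HyperCard driving a laser disc player); the function names, clip identifiers, and data layout are illustrative inventions.

```python
# Sketch of the module 4 practice/feedback loop (illustrative
# reconstruction only; the study used HyperCard, not Python).
PRACTICE_CLIPS = [
    # (hypothetical clip id on the laser disc, expert reference
    #  rating for one stroboscopic parameter)
    ("clip_01", 3),
    ("clip_02", 0),
    # ... the module offered 10 practice clips
]

def play_clip(clip_id):
    # Stand-in for seeking and playing a laser disc segment.
    print(f"[video] playing {clip_id}")

def practice_session(get_user_rating):
    """Show each clip, collect a rating, and give on-screen
    feedback against the reference rating."""
    for clip_id, reference in PRACTICE_CLIPS:
        play_clip(clip_id)
        rating = get_user_rating(clip_id)
        if rating == reference:
            print("Correct.")
        else:
            print(f"Expert rating was {reference}; you rated {rating}.")

# Example run with canned answers standing in for user input.
answers = {"clip_01": 3, "clip_02": 1}
practice_session(lambda cid: answers[cid])
```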


FIG. 1. Sample of a typical screen from the computer-based training modules.

Pretraining and posttraining tests of academic knowledge of the stroboscopy topic

To evaluate how much learning could be attributed to the training program, each student was given a written test (Appendix A) of her knowledge of the stroboscopy topic before and after training. To ensure the validity of the written test, the professor teaching the course reviewed each test question to verify that it addressed a key concept. The test format was multiple choice and fill-in-the-blank; for multiple-choice items, three foils were used for each question. No feedback was given to any subject regarding her performance on the pretest. To facilitate equivalency of experience, the experienced subjects were given the identical pre- and posttests before and after their training, although the test scores from the experienced group were not analyzed.

Pretraining visuo-perceptual ratings of stroboscopy

To isolate the effects of the training program, both subject groups made visuo-perceptual ratings of a 45-sample stroboscopy videotape before any training took place. The subjects were given 2 weeks to complete the ratings.


The training
The subjects were instructed that the training program would be used on a self-paced basis and that there was no minimum or maximum amount of time they should spend using it. Although individual training times were not recorded, a poll of the subjects revealed that most spent about 4.5 hours using the training program. The students were reminded that the academic material covered in the modules would be covered on the final exam in their class and that they should spend as much time as necessary to understand the material to their own satisfaction.

The students used two multi-media workstations featuring identical Macintosh IIci computers and identical Sony LDP-1500 laser disc players. The stations were located in a university library. Because of geographic distance, subjects in the experienced group were not required to use the workstations at the library. Instead, the software was installed on Macintosh Quadra 840AV and Quadra 700 computers that were accessible to them. A Sony LDP-1500 laser disc player was also used by the experienced group.

Posttraining visuo-perceptual ratings of stroboscopy Upon completing the training phase, subjects in both groups were required to view and rate the same videostroboscopic examinations that they rated before the training. The ratings were made with the same equipment and in the same environments previously described for the pretraining ratings.

Consensus rating by the expert judges
The expert judges rated the same video samples on similar equipment as the two subject groups (Mitsubishi HS-V69 S-VHS VCR and JVC C1350 high-resolution monitor). A consensus rating procedure was used to ensure that the ratings used for comparison were agreed on by all judges. Two consensus rating sessions were conducted, with roughly half of the samples rated at each session. Consensus was reached according to the following guidelines: Each judge was given his or her own rating forms. A video sample was played as many times as necessary for all judges to make their individual ratings; no conferring was allowed during this initial round. The individual ratings were then examined by the principal investigator, who was present at the consensus rating sessions but did not serve as a judge. If at least two of the three judges demonstrated exact agreement for a given rating, consensus was considered to be reached. When each judge submitted a different rating, differences were resolved in the following manner: each judge shared his or her original rating, the sample was replayed as many times as necessary, and the judges then openly discussed the reasoning for their respective ratings and came to consensus on the rating. There was only one video sample for which consensus could not be reached, and that sample was omitted from analysis.
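The first pass of this procedure, a two-of-three exact-match rule, is simple enough to express in a few lines. The following is a minimal sketch assuming integer or categorical ratings; it is not software used in the study (the judges worked from paper rating forms), and the function name and data layout are illustrative.

```python
# Sketch of the consensus rule described above (illustrative only).
from collections import Counter

def consensus(ratings):
    """Return the consensus rating for one parameter of one sample,
    or None if all three judges differ and must discuss and re-rate.

    `ratings` is a list of the three judges' independent ratings.
    """
    value, count = Counter(ratings).most_common(1)[0]
    # At least two of the three judges must agree exactly.
    return value if count >= 2 else None

# Example: exact agreement between two judges yields consensus.
print(consensus([3, 3, 4]))   # -> 3
print(consensus([2, 3, 4]))   # -> None (open discussion required)
```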

Analyses
Ratings from the two subject groups were compared to the consensus ratings of the expert judges. Interjudge agreement between the expert judges and each subject group was determined by computing correlations between the mean ratings from each subject group and the consensus ratings from the expert judges. The ANOVA model for estimating reliability was used. To avoid difficulties associated with distribution differences between observers (e.g., a linear relationship but values consistently differing by a scalar point), an intraclass correlation coefficient was used; a computational sketch appears below, after the results overview. A separate correlation was computed for each stroboscopic parameter to allow identification of the parameters associated with the best and worst interjudge agreement. To examine intrajudge reliability, 10 video samples were presented twice among the total of 45 samples, with each repetition placed randomly among the other samples. The repeated measures were also analyzed using the ANOVA model for estimating reliability.

RESULTS

Interjudge agreement and intrajudge reliability are reported for both subject groups and for the expert panel of judges in Tables 1 through 6. There was insufficient variability in the ratings for the parameters of vertical level and nonvibrating portion to compute meaningful correlations for these parameters.
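The ANOVA-based intraclass correlation referenced under Analyses can be sketched as follows. The article does not state which ICC variant was computed, so the two-way, absolute-agreement form shown here (often labeled ICC(2,1)), the function name, and the example data are assumptions for illustration only.

```python
# Minimal sketch of a two-way intraclass correlation, ICC(2,1),
# computed from ANOVA mean squares (an assumed form; the study may
# have used a different ICC variant).
import numpy as np

def icc_2_1(x):
    """x: (n_samples, n_raters) array of ratings."""
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)      # per-sample means
    col_means = x.mean(axis=0)      # per-rater means
    # ANOVA mean squares: samples, raters, residual.
    msr = k * ((row_means - grand) ** 2).sum() / (n - 1)
    msc = n * ((col_means - grand) ** 2).sum() / (k - 1)
    sse = ((x - row_means[:, None] - col_means[None, :] + grand) ** 2).sum()
    mse = sse / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Example: a hypothetical group-mean rating column paired with the
# experts' consensus column for one parameter across six samples.
ratings = np.array([[3, 3], [1, 2], [4, 4], [2, 2], [5, 4], [0, 1]])
print(round(icc_2_1(ratings), 2))
```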

Interjudge agreement for the student group
The data in Table 1 indicate that, before any training, agreement between the students and expert judges was best for the left and right vocal fold edge (.81 and .76, respectively) and left/right amplitude of excursion (.77 and .78, respectively). These correlations were significant (alpha = .001, nondirectional). The parameters with the least agreement were phase closure (.21), regularity (.58), and supraglottic activity (.58); the correlation for phase closure was not significant.


After the students completed their training, agreement with the expert judges improved for all parameters analyzed. The parameters of vocal fold edge-left and -right had the best agreement after training (.90 and .89, respectively); the parameter with the lowest agreement was again phase closure (.57). With the exception of mucosal wave-right, all remaining parameters had correlations of .74 or higher, and all correlations were significant after training. These correlations reveal that, for the parameters with the best agreement (vocal fold edge), 81% of the variance in the ratings can be accounted for; in contrast, for phase closure, only 32% of the variance can be accounted for.

Interjudge agreement for the experienced group
Similar to the student group, the data in Table 2 revealed that, before any training, the experienced group demonstrated the best agreement with the panel of expert judges on the parameters of vocal fold edge (.87) and amplitude of excursion (.87), and the least agreement on phase closure (.55), regularity (.73), and mucosal wave-right (.74). The pretraining and posttraining correlations in Table 2 indicate that the training did not have an appreciable effect on the experienced group's rating ability. Table 3 contains data comparing the posttraining agreement with the judges for the students and the experienced group. The correlations are quite similar: after training, the students generally agreed with the expert judges about as well as, or slightly better than, the experienced group.

Agreement on the glottal closure parameter
Results for glottal closure are reported as percent agreement because this parameter is rated not numerically but with descriptive terms (e.g., complete, posterior gap, etc.). Overall, the students agreed with the judges 51% of the time before training and 58% of the time after training. For the experienced group, percent agreement improved from 58% before training to 67% after training.
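Because glottal closure is rated with categorical labels, agreement here is simply the proportion of exact matches. A minimal sketch follows; the category labels beyond those quoted above are hypothetical examples, not the study's full rating form.

```python
# Sketch of percent agreement for a categorical parameter such as
# glottal closure (labels are illustrative examples only).
def percent_agreement(rater, reference):
    """Percentage of samples on which two label sequences match exactly."""
    assert len(rater) == len(reference)
    matches = sum(a == b for a, b in zip(rater, reference))
    return 100.0 * matches / len(reference)

student = ["complete", "posterior gap", "complete", "hourglass", "complete"]
experts = ["complete", "posterior gap", "spindle", "hourglass", "anterior gap"]
print(percent_agreement(student, experts))  # -> 60.0
```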

TABLE 1. Pre/posttraining intraclass correlations (interjudge agreement) and percent agreement for glottal closure for student ratings and expert judges' consensus ratings

Parameter                              Pre     Post
Supraglottic activity                  0.58    0.76
Fold edge-L                            0.81    0.90
Fold edge-R                            0.76    0.89
Amplitude-L                            0.77    0.84
Amplitude-R                            0.78    0.83
Mucosal wave-L                         0.75    0.82
Mucosal wave-R                         0.60    0.74
Phase closure                          0.21    0.57
Phase symmetry                         0.72    0.78
Regularity                             0.58    0.78
Glottal closure (percent agreement)    51%     58%

TABLE 2. Pre/posttraining intraclass correlations (interjudge agreement) and percent agreement for glottal closure for experienced group's ratings and expert judges' consensus ratings

Parameter                              Pre     Post
Supraglottic activity                  0.82    0.75
Fold edge-L                            0.86    0.90
Fold edge-R                            0.87    0.88
Amplitude-L                            0.87    0.85
Amplitude-R                            0.84    0.83
Mucosal wave-L                         0.86    0.77
Mucosal wave-R                         0.74    0.75
Phase closure                          0.55    0.54
Phase symmetry                         0.83    0.83
Regularity                             0.73    0.71
Glottal closure (percent agreement)    58%     67%

TABLE 3. Comparison of student group or experienced group posttraining agreement with expert judges (interjudge agreement)

Parameter                Students and judge    Exp. grp. and exp.
                         correlations          judges correlations
Supraglottic activity    0.76                  0.75
Fold edge-L              0.90                  0.90
Fold edge-R              0.89                  0.88
Amplitude-L              0.84                  0.85
Amplitude-R              0.83                  0.83
Mucosal wave-L           0.82                  0.77
Mucosal wave-R           0.74                  0.75
Phase closure            0.57                  0.54
Phase symmetry           0.78                  0.83
Regularity               0.78                  0.71
Glottal closure          58%                   67%

Intrajudge reliability
Table 4 shows intrajudge reliability for the student group before and after training. The data revealed that the students were highly reliable on all parameters even before training (.84-.99); and as expected, after training, they were even more consistent (.94-.99). Table 5 contains the intrajudge reliabilities for the experienced group before and after training. They also exhibited high reliability values prior to training (.88-.98) as well as after training (.94-.98). Table 6 contains the reliability coefficients for each of the expert judges. For most parameters, the judges also exhibited a high degree of consistency. However, there were some parameters (e.g., fold edge-left and mucosal wave-right) with somewhat lower values.

TABLE 4. Pre/posttraining intraclass correlations (intrajudge reliability) for student ratings of repeated measures

Parameter                Pre     Post
Supraglottic activity    0.84    0.95
Fold edge-L              0.99    0.99
Fold edge-R              0.98    0.98
Amplitude-L              0.97    0.97
Amplitude-R              0.97    0.97
Mucosal wave-L           0.98    0.97
Mucosal wave-R           0.98    0.94
Phase closure            0.97    0.98
Phase symmetry           0.97    0.97
Regularity               0.97    0.97

TABLE 5. Pre/posttraining intraclass correlations (intrajudge reliability) for experienced group's ratings of repeated measures

Parameter                Pre     Post
Supraglottic activity    0.88    0.94
Fold edge-L              0.98    0.98
Fold edge-R              0.96    0.98
Amplitude-L              0.94    0.94
Amplitude-R              0.92    0.97
Mucosal wave-L           0.93    0.97
Mucosal wave-R           0.94    0.98
Phase closure            0.89    0.98
Phase symmetry           0.98    0.98
Regularity               0.92    0.94

TABLE 6. Intrajudge reliability for the expert judges

Judge  Supraglottic  Edge-L  Edge-R  Amplitude-L  Amplitude-R  Mucosal  Mucosal  Phase    Phase     Regularity
       activity                                                wave-L   wave-R   closure  symmetry
1      -             0.89    0.89    0.77         0.77         0.86     0.86     0.94     0.85      0.90
2      0.99          0.95    0.91    0.92         0.87         0.88     0.87     0.88     0.97      0.96
3      0.64          0.44    0.91    0.82         0.93         *        0.42     0.85     0.85      0.89

*Insufficient variability.

Tests of student knowledge of the stroboscopy topic
The mean score for the students' written tests of academic knowledge of stroboscopy improved from 60% to 80% after training.

Consensus ratings by the expert judges
Table 7 summarizes the results of the consensus ratings by the expert judges. The judges reached initial consensus on 86% of all ratings. Differences of a single scalar point had to be resolved on 11% of all ratings; the remaining 3% involved differences of 2 or more scalar points. Overall, the greatest amount of disagreement (differences of 2+ scalar points) occurred when rating the parameters of phase closure and regularity. These parameters also had some of the lowest levels of interjudge agreement for both subject groups even after training.

TABLE 7. Summary of consensus ratings by the panel of expert judges

                              No. of ratings    Percent of total
Initial agreement             491               86%
Disagreement of 1 point       63                11%
Disagreement of 2+ points     18                3%
Totals                        572               100%

Item analysis of disagreement

Parameter                % of total       % 1-point        % 2+ point
                         disagreements    disagreements    disagreements
Supraglottic activity    5                5                0
Vocal fold edge          12               14               5
Amplitude                26               28               17
Mucosal wave             16               17               11
Phase closure            21               21               23*
Phase symmetry           4                4                5
Regularity               15               8                39*
Totals                   100%             100%             100%

*Parameters comprising 62% of all disagreements of 2+ scalar points.

DISCUSSION

The discussion is organized according to two main issues that emerged from this study: 1) the effect of this training method on interjudge agreement and intrajudge reliability, and 2) the finding that stroboscopy parameters can be divided into two main classifications, geometric parameters and dynamic parameters, with the dynamic parameters being more difficult to rate regardless of the rater's experience. These classifications are a new way to conceptualize stroboscopy parameters, and the new terms are discussed in detail below.

Interjudge agreement
The data contained in Tables 1 and 2 showed that the training resulted in improved accuracy of ratings for the students but not for the experienced subjects. Table 3 compares posttraining interjudge agreement data for both groups of subjects. It is important to note that, after training, the students performed as well as the experienced group. Apparently, training affected the students' strobe rating skill such that they were comparable to those who already had strobe rating experience (the experienced group had an average of 2.1 years of experience with strobe rating). It was somewhat surprising that the experienced group did not achieve better agreement with the expert judges. Two factors may account for this: 1) amount of experience and 2) the methodology of this study. The expert judges had an average of almost 7 more years of experience than subjects in the experienced group. It is possible that this difference in experience could lead to a different rating strategy. For example, highly experienced judges may recognize situations where optical illusions (e.g., apparent reduction of amplitude due to angle of view) should be ignored, whereas the less experienced subjects may not. Additionally, the methods used in this study may account for the differences. The expert judges rated the strobe exams using a consensus rating procedure, whereas the subjects in the experienced group rated the examinations individually and their mean ratings were used for comparison to the expert judges. Thus, the ratings of the expert judges would have had reduced variability due to the consensus rating method they used.

Intrajudge reliability
Tables 4 through 6 contain intrajudge reliability data for the two subject groups and the expert judges. The data indicated that reliability was high for both subject groups. This reliability cannot be attributed to the training program, since it was high in both groups even before training. A possible reason for the high reliability values could be the nature of the rating scale that was used. Most of the parameters were rated on a 6-point scale (values ranging from 0-5). Because this scale has only 6 points, it may be regarded as a somewhat crude scale; the rater is forced to select one of only six possible ratings. It is possible, therefore, that the design of the scale contributed to the high reliability values that were observed. Reliability data for the expert judges appear in Table 6 and are similar to values that have been reported in other stroboscopy literature (7,8).

The geometric and dynamic parameters
A review of the agreement data in Table 3 revealed that, for some parameters, a satisfactory level of agreement was not reached even after training. It is important to determine why these problems exist and whether agreement and reliability can be improved to an acceptable level. The problems observed on certain parameters may be related to their complex nature. It is proposed that the stroboscopy parameters can be classified as geometric or dynamic. These new terms distinguish parameters that require the rater to make a simple rating of the shape or configuration of a structure (geometric) from those that involve rating a continuously changing movement pattern (dynamic). The geometric parameters are vocal fold edge, vertical level, and glottic closure. The dynamic parameters are supraglottic activity, amplitude of excursion, mucosal wave, nonvibrating portion, phase closure, phase symmetry, and regularity. It may be argued that the geometric parameters are easier to rate because the task is simply to describe the physical appearance and/or configuration of the vocal folds. In contrast, rating the dynamic parameters is far more difficult: the task is to evaluate a pattern of continuously changing movement.

An examination of the data supports the argument that there are inherent differences between the geometric and dynamic parameters and that these differences may affect interjudge agreement. The data in Table 1 revealed that, for vocal fold edge, a geometric parameter, the students' ratings had some of the highest correlations (.81 and .76) even prior to any training. The same pattern was observed among the experienced raters (Table 2). In contrast, the more complex, dynamic parameters were associated with some of the lowest correlations before training (e.g., phase closure, .21; regularity, .58), and these remained the lowest even after training. The data in Table 7 showed that, even among the expert judges, who reached high levels of agreement (initial agreement on 86% of all ratings), the majority of disagreements of 2 or more scalar points occurred on the dynamic parameters of phase closure and regularity. It appears that the geometric parameters may be more "intuitive" and easier to rate, while the dynamic parameters are more complicated and difficult to rate. The dynamic parameters should be emphasized in training because they involve concepts that may be more difficult to conceptualize and to evaluate. Since the dynamic parameters involve judgments of movement and timing patterns, the multi-media approach used by CAI may better facilitate conceptualization of the dynamic parameters.

In support of the multi-media, computer-based approach
Rating stroboscopy requires specialized skill. Raters must have a clear, internal conceptualization of normal and disordered laryngeal movement patterns. This is important because the internal conceptualization may be used as a referent against which the person compares all strobe examinations. Theoretical support for this idea of internal referent-matching comes from the template-matching and feature analysis models of visual perception (19). Anderson (19) explained that "the template-matching theory of perception assumes that a retinal image of an object is faithfully transmitted to the brain, and that an attempt is made to compare it directly to various stored patterns" (p. 47). It is important to note that this model of visual perception involves the use of templates that function as internal referents against which images are compared in the perception process. Using this theory, it can be argued that, since strobe interpretation is visuo-perceptual, any strobe training program must aim to develop a learner's internal concept of normal and disordered vibratory patterns. The training should use multiple media for presenting the various concepts involved, particularly for the dynamic parameters.

Proponents of experiential learning lend further support to the idea of pairing didactic material with experiential learning. Winn (20) stated that "the transfer of knowledge and skill to a variety of settings is impeded when the design of the instruction that teaches them is separated from the implementation of that instruction and from the settings in which the skills will be used" (p. 17). Harley (21) indicated:

Students create (tacitly or otherwise) a personalized sense of situation that guides their determination as to what is meaningful and how it is to be understood and incorporated into what is already known. For the classroom teacher the challenge of situated learning becomes one of developing methodologies and course content that support cooperative activity, and reflect the complex interaction between what individuals already know and what they are expected to learn, recognizing that ultimately meaning can only be established by and not for the learner. (p. 47)

The preceding arguments about the need for example-based learning rest on the idea that didactic knowledge alone is inadequate for training at least some stroboscopic parameters. Data from the pretest of academic knowledge in Table 8 support this argument. For example, item 14 dealt with the dynamic parameter of phase closure. The data revealed that before training, 83% of the students knew the correct definition of phase closure; however, when rating phase closure, the pretraining correlation with the judges was only .21. After the training, in which video was paired with formal definitions, the students demonstrated better agreement with the expert judges (.57). Although the correlation remained low, interjudge agreement for phase closure improved considerably. This is interpreted as evidence that didactic knowledge alone is insufficient and that practical experience must be paired with it.

TABLE 8. Analysis of student responses to test items on stroboscopy pretraining and posttraining (% students with correct answer)

Question    Topic                                    Pretraining    Posttraining
1           Equipment-observation light              0              25
2           Equipment-running phase                  76             89
3           Equipment-running phase                  38             50
4           Equipment-light trigger                  90             57
5           Strobe interp.-glottal closure           97             93
6           Question omitted                         -              -
7           Strobe interp.-vertical level            0              79
8           Strobe interp.-fold edge                 83             82
9           Strobe interp.-amplitude                 90             89
10          Strobe interp.-amplitude                 90             89
11          Strobe interp.-mucosal wave              69             93
12          Strobe interp.-mucosal wave              41             89
13          Strobe interp.-nonvibrating portion      97             96
14          Strobe interp.-phase closure             83             82
15          Strobe interp.-phase closure             62             89
16          Strobe interp.-phase symmetry            2              64
17          Strobe interp.-regularity                24             89
18          Disease-nodules                          93             100
19          Disease-papilloma                        93             100
20          Disease-carcinoma                        59             71
21          Disease-laryngitis                       27             67
22          Disease-Reinke's edema                   38             82
23          Disease-paralysis                        76             93
24          Disease-paralysis                        17             39
25          Disease-scarring                         97             100

Student knowledge of the stroboscopy topic
It was hypothesized that the training received by the students would bring the group to a more common level of academic knowledge about stroboscopy and that there would be less variance in test scores after the training period. After the training, average scores were improved, but the variance of the test scores was not significantly changed from the pretraining phase.

Future uses of the training method
Besides clinical skills training, this training method may be useful for research that requires the training of judges to rate stroboscopy examinations. It cannot be assumed that providing raters with only a scale to structure their ratings will result in an acceptable level of agreement; training should be provided specifically for use of a particular scale. The data showed that the training procedure developed as part of this study can be used to effectively train inexperienced subjects to rate stroboscopic images. This is of particular significance considering that the rating scale on which the training program was developed is widely used. This type of training method might also be used to train individuals to make acoustic-perceptual judgments. Perhaps experience-based learning, with emphasis on audio samples paired with descriptive terms and opportunities to practice in an interactive manner, would resolve some of the problems with agreement and reliability that were discussed by Gelfer (22).

CONCLUSIONS

The training was beneficial to individuals with little or no previous exposure to videostroboscopy rating. The training improved rating abilities in the student group to a level beyond what was accomplished with didactic information. Although the student group did not reach a high level of agreement with the experts on all parameters, it is important to note that after training, their performance was comparable to that of subjects with 2 years of experience. The geometric parameters (e.g., vocal fold edge) may not need to be heavily emphasized in training. In contrast, special emphasis on the dynamic parameters (e.g., phase closure, regularity), with numerous opportunities for interactive practice and feedback, may be necessary.

Limitations
The number of subjects in the experienced group was relatively small (n = 9), and the degree to which this group represents working clinicians in general is uncertain. It is possible that experienced clinicians who use stroboscopy on a daily basis or limit their practice to voice disorders would perform differently than those who use their skills on a less regular basis.

The materials used in the training modules were not validated separately from this study. The results pertain only to the combined effects of the material and the computer-based format; conclusions cannot be drawn about the individual contributions made by the materials or the computer-based format.

Finally, this study was designed only to validate a particular computer-based training method. It did not compare the computer-based approach to a more conventional classroom approach (e.g., lecture and discussion). A proper comparison of the two approaches could only be accomplished with an experimental design comparing outcomes when separate groups are taught with either classroom or computer-based methods.

Acknowledgment: This research was supported in part by grant no. P60 DC00976.

REFERENCES

1. Bless DM, Hirano M, Feder R. Videostroboscopic evaluation of the larynx. Ear Nose Throat J 1987;66:290-6.
2. Sercarz J, Berke G, Gerratt B, Kreiman J, Ming Y, Natividad M. Synchronizing videostroboscopic images of human laryngeal vibration with physiological signals. Am J Otolaryngol 1992;13:40-4.
3. Sataloff RT, Spiegel J, Hawkshaw M. Strobovideolaryngoscopy: results and clinical value. Ann Otol Rhinol Laryngol 1991;100:725-7.
4. Mahieu HF, Dikkers FG. Letter to the editor. Arch Otolaryngol Head Neck Surg 1992;118:1004.
5. Hirano M, Bless DM. Videostroboscopic examination of the larynx. San Diego, Calif.: Singular Publishing Group Inc, 1993.
6. Bartell T, Bless DM, Ford CN. Stroboscopic examination of voice: a comparison of two procedures. Proceedings of the International Conference on Voice. Kurume, Japan, 1986.


7. Ramos C, Bless DM, Harmon R, Ford C. The mucosal wave as a prognostic sign in vocal paralysis. Paper presented at the meeting of the American Speech-Language-Hearing Association; November, 1993; Anaheim, Calif.
8. Teitler N. Examiner bias: influence of patient history on perceptual ratings of videostroboscopy [master's thesis]. Madison: University of Wisconsin-Madison, 1992.
9. Peppard R, Bless DM. A method for improving measurement reliability in laryngeal videostroboscopy. J Voice 1990;4:280-5.
10. Sercarz J, Berke G, Arnstein D, Gerratt B, Natividad M. A new technique for quantitative measurement of laryngeal videostroboscopic images. Ear Nose Throat J 1991;117:871-5.
11. Leonard RJ. Use of laryngeal imaging procedures. Asha 1992;34:270.
12. Kozma RB. Learning with media. Rev Educational Res 1991;61:192,196.
13. Sherwood R, Kinzer C, Bransford J, Franks J. Some benefits of creating macro-contexts for science instruction: initial findings. J Res Science Teach 1987;24:417-35.
14. Sherwood R, Kinzer C, Hasselbring T, Bransford J. Macrocontexts for learning: initial findings and issues. Appl Cognitive Psychol 1987;1:93-108.
15. Weller HG. Interactivity in microcomputer-based instruction: its essential components and how it can be enhanced. Education Tech 1988;28:23-7.
16. Kearsley GP, Frost J. Design factors for successful videodisc-based instruction. Education Tech 1985;25:7-13.
17. Hannafin M. Guidelines for using locus of instructional control in the design of computer-assisted instruction. J Instruction Devel 1984;7:6-10.
18. Gay G. Interaction of learner control and prior understanding in computer-assisted video instruction. J Education Psychol 1986;78:225-7.
19. Anderson J. Cognitive psychology and its implications. New York, NY: W.H. Freeman and Company, 1985.
20. Winn W. Instructional design and situated learning: paradox or partnership? Education Tech 1993;33:16-21.
21. Harley S. Situated learning and classroom instruction. Education Tech 1993;33:46-51.
22. Gelfer M. Perceptual attributes of voice: development and use of rating scales. J Voice 1988;2:320-6.


APPENDIX A

Test of Students' Knowledge

1. ________ is a light source that allows examination of the gross structures of the larynx.

2. ________ is a mode of operation where light pulses are timed to be slightly different from the rate of vocal fold vibration.

3. The purpose of using a running phase for examination of the vocal folds is to:
a. Observe the larynx with a constant light source, allowing the best view of the structural condition of the vocal folds.
b. Observe vibration in fast motion.
c. Observe the vibration pattern during vibration.
d. Assess the cycle-to-cycle regularity of vibration.

4. How is the strobe light triggered?
a. Photoglottography (PGG)
b. Microphone/accelerometer
c. A switch on the front panel of the stroboscope
d. None of the above

5. Incomplete glottic closure is a common closure pattern seen in patients with:
a. Vocal nodules
b. Spasmodic dysphonia
c. No pathology (normal)
d. Unilateral vocal fold paralysis

6. What is the best means of evaluating supraglottic activity?
a. Direct laryngoscopy
b. Rigid endoscope
c. Flexible trans-nasal endoscope
d. External palpation of the suprahyoid area of the neck

7. Vocal folds are described as being "off plane" when they are not ________.

8. The vocal folds are maximally abducted during what activity? ________

9. Amplitude of excursion is a vibratory parameter that is directly related to:
a. Pitch
b. Quality
c. Intensity
d. Timbre

10. When the vocal fold cannot fully interact with the air stream, as in vocal fold paralysis with incomplete closure, the amplitude of excursion is likely to be:
a. Excessive
b. Difficult to assess
c. Decreased
d. Of no importance clinically

11. Diseases such as scarring and papilloma are likely to make the mucosa: ________

12. The mucosal wave is reduced at ________ frequencies.

13. What is a possible interpretation of a nonvibrating portion of the vocal fold?
a. Stiffness in that region of the fold
b. An underlying disease process such as carcinoma
c. Localized reaction to trauma
d. All of the above

14. Phase closure is a rating of:
a. How much the two folds act as mirror images of each other during vibration.
b. The amount of time the folds are closed relative to the amount of time they are open.
c. How much one vibratory cycle is like the next.
d. All of the above.

15. A phase closure pattern frequently associated with paralysis is:
a. Closed phase predominates
b. Open phase predominates
c. Normal phase closure
d. Phase closure is not rated in paralysis

16. Phase symmetry assesses ________.

17. When assessing regularity, the vocal folds will appear to stand still if vibration is ________.


18. A disease that usually occurs at the junction of the anterior and middle thirds of the vocal folds is:
a. Nodules
b. Polypoid degeneration
c. Carcinoma
d. Papilloma

19. A disease process that results in sometimes rapid growth of wart-like, benign neoplasms is:
a. Papilloma
b. Squamous cell carcinoma
c. Mucous retention cyst
d. Laryngomalacia

20. A disease which can occur anywhere in the larynx and appears as an irregular structure, usually white, is:
a. Papilloma
b. Carcinoma
c. Nodules
d. Cyst


21. ________ is a swelling of the larynx resulting from infection or abuse. The folds appear reddish and swollen.

22. ________ is a collection of fluid beneath the submucosal layer. This disease is common in smokers and post-menopausal women. The vocal folds appear jelly-like superficially, and have a grayish, translucent appearance.

23. ________ is a neurological condition characterized by an immobile vocal fold which cannot move medially.

24. ________ is a neurological condition in which the cricothyroid muscle is paralyzed.

25. Scarring of the vocal fold usually results in the cover becoming: