JOURNAL OF COMMUNICATION DISORDERS 15 (1982), 353-362
CULATTA AND SELTZER CONTENT AND SEQUENCE ANALYSIS OF THE SUPERVISORY SESSION: A QUESTION OF RELIABILITY AND VALIDITY
SUSANN DOWLING, University of Houston, Houston, Texas

KRISTINE V. SBASCHNIG and CATHY J. WILLIAMS, Wayne State University, Detroit, Michigan
The Culatta and Seltzer (1976) Content and Sequence Analysis of the Supervisory Session was a pioneer effort in the evolution of an observation tool for use in charting the supervisory conference in speech-language pathology. The purpose of this study was to evaluate the reliability, validity, and practicality of data collected with this system. Supervisory conferences of three certified speech-language pathologists, each of whom supervised four students representing four experience levels, were audiotaped three times. Trained individuals coded specified segments of the conferences using the Culatta and Seltzer system. Data collected with the system were found to be unreliable. The observation system meets criteria for face validity but not content or construct validity. Criterion validity was not evaluated. The major strength of the system is practicality.
Address correspondence to Susann Dowling, 132 NOA, Communication Disorders, University of Houston, Houston Central Campus, Houston, TX 77004.

© Elsevier Science Publishing Company, Inc., 1982, 52 Vanderbilt Ave., New York, NY 10017. 0021-9924/82/050353-10$02.75

Introduction

The supervisory conference has been a major focus in supervisory process research. Observation systems have been developed or adapted to objectify the behaviors occurring during the supervisory conference (McCrea, 1979; Oratio, 1977; Smith, 1977; Underwood, 1974). The Content and Sequence Analysis of the Supervisory Session devised by Culatta and Seltzer in 1976 was a pioneer effort in the evolution of an observational tool to record the supervisory conference in speech-language pathology.

The Content and Sequence Analysis of the Supervisory Session (Culatta and Seltzer, 1976) consists of twelve categories, six measuring supervisor behavior and six measuring clinician behavior. The categories are:

Supervisor Behavior

Good Evaluation: Supervisor evaluates observed behavior or verbal report of trainee and gives verbal or nonverbal approval.

Bad Evaluation: Supervisor evaluates observed behavior or verbal report of trainee and gives verbal or nonverbal disapproval.

Question: Any interrogative statement made by the supervisor relevant to the client being discussed.

Strategy: Any statement given by the supervisor to the clinician for future therapeutic intervention.

Observation/Information: Provision by the supervisor of any relevant comment pertinent to the therapeutic interaction that is not evaluating, questioning, or providing strategy.

Irrelevant: Any statement or question made by the supervisor which has no direct relationship to the supervisory process.

Clinician Behavior

Good Self-Evaluation: Clinician provides a positive statement about his own behavior or strategy.

Bad Self-Evaluation: Clinician provides a negative statement about his own behavior or strategy.

Question: Any interrogative statement made by the clinician relevant to the client being discussed.

Strategy: Any statement or suggestion made by the clinician for future therapeutic intervention or justification of past therapeutic intervention.

Observation/Information: Any statement made by the clinician relevant to the therapeutic interaction that is not evaluating, questioning, or suggesting strategy.

Irrelevant: Any statement or question made by the clinician which has no direct relationship to the supervisory process.
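To make the mechanics of the system concrete, the sketch below shows one way a coded conference segment might be represented in software. This is purely illustrative: the Python names and the sample sequence are our own, not part of Culatta and Seltzer's (1976) system, which was designed for paper-and-pencil charting.

    # Hypothetical representation of the coding scheme; names and
    # sample data are illustrative only.
    from collections import Counter
    from enum import Enum

    class Speaker(Enum):
        SUPERVISOR = "supervisor"
        CLINICIAN = "clinician"

    class Category(Enum):
        GOOD_EVALUATION = "good (self-)evaluation"
        BAD_EVALUATION = "bad (self-)evaluation"
        QUESTION = "question"
        STRATEGY = "strategy"
        OBSERVATION_INFORMATION = "observation/information"
        IRRELEVANT = "irrelevant"

    # A coded segment is an ordered list of (speaker, category) pairs;
    # the order preserves sequence, the pairs preserve occurrence.
    segment = [
        (Speaker.SUPERVISOR, Category.QUESTION),
        (Speaker.CLINICIAN, Category.GOOD_EVALUATION),   # good self-evaluation
        (Speaker.SUPERVISOR, Category.GOOD_EVALUATION),
        (Speaker.SUPERVISOR, Category.STRATEGY),
    ]

    tallies = Counter(segment)  # occurrence counts per (speaker, category)

Pairing each category with the speaker recovers the full twelve-category scheme, since the evaluation categories read as self-evaluations when the speaker is the clinician.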
This system has been used as a data collection tool in supervisory research by Culatta and Seltzer (1976, 1977), Dowling and Shank (1981), and Ingrisano et al. (1979). Culatta and Seltzer (1976, 1977) analyzed conferencing behavior throughout an academic term. No significant differences in either study were found as a function of time. Supervisor and clinician behaviors identified at the beginning of the term were similar to those at the end. Dowling and Shank (1981) used the Culatta and Seltzer (1976) system to observe conference behaviors over an 8-wk period under two conditions: conventional one-to-one supervision and the Teaching Clinic, a peer-group method of supervision. Significant differences were not found between the two supervisory methods. The Culatta and Seltzer (1976) system was employed by Ingrisano et al.
(1979) to collect conference data for six supervisor-supervisee pairs. In this study the clinicians were divided into two experience levels. No significant differences were found between the conferences of novice and experienced supervisees.

Measurement of supervisor-supervisee conference behavior with the Culatta and Seltzer (1976) system has thus consistently produced nonsignificant results. The question that emerges is whether there truly were no differences among the groups studied or whether data collected with this observation tool are reliable and/or valid. The issues of reliability and validity have been addressed by numerous authors (Ebel, 1951; Frick and Semmel, 1975; Herbert and Attridge, 1975; Rowley, 1976). Herbert and Attridge (1975) state that validity refers to the degree to which the measures obtained by an instrument actually describe what they purport to describe; reliability refers to the accuracy and consistency with which they do so.
Four subsets of validity are considered: content validity, construct validity, criterion-related validity, and face validity. They are defined by Herbert and Attridge (1975) as follows: Content validity refers to the degree to which the items of the instrument sample the behaviors about which conclusions are to be drawn. Construct validity refers to the degree to which the theoretical claims and hypotheses underlying the instrument are substantiated. Criterion-related validity refers to the relationship between scores on the instrument in question and scores on some other variable used as a criterion. Face validity refers to the degree to which an instrument appears to measure or describe what it purports to measure and describe.
An important issue when selecting an instrument for use, in addition to reliability and validity, is practicality. This involves factors such as ease of use, ease of training coders, and complexity of data-gathering procedures.

The purpose of this study is to determine whether data collected with the Content and Sequence Analysis of the Supervisory Session (Culatta and Seltzer, 1976) observation system are valid, reliable, and practical.

Procedures

The subjects in this study were three certified speech-language pathologists employed as supervisors at Wayne State University and 12 student speech-language pathologists. Each supervisor supervised four students, one at each of four clock-hour levels: 0; 1-25; 26-100; and 101+. The 12 supervisor-supervisee pairs audiotaped their supervisory conferences at three points in time: the third, sixth, and ninth weeks of the ten-week quarter. The tapes were timed and the middle 5 min identified. A total of 36 conference segments, representing the middle 5 min of the supervisory conferences of the three supervisors and each of their four supervisees, were dubbed onto master tapes for data coding.

Two individuals were trained to code the conference segments using the Content and Sequence Analysis of the Supervisory Session observation system (Culatta and Seltzer, 1976). Prior to data coding, a coder agreement of 0.85 with a criterion was required. The day following the establishment of coder agreement with a criterion, the coder recoded the criterion tape data as a measure of intra-observer agreement. A level of 0.85 was required. Intra- and intercoder agreement were sampled frequently. If intracoder agreement dropped below 0.85, training was reinstituted until coder agreement was reachieved. If interagreement between coders during actual data coding dropped below 0.50, both coders were required to recode the segment. In addition to the above procedures, the authors examined the observation system and their experiences using the system to address the issues of content and face validity as well as practicality.
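Neither Culatta and Seltzer (1976) nor this study spells out the agreement formula behind the thresholds above; a simple point-by-point agreement ratio is one conventional choice, and the sketch below assumes it. The function name and sample codes are hypothetical.

    def agreement(codes_a, codes_b):
        """Point-by-point agreement between two coders' category sequences."""
        if len(codes_a) != len(codes_b):
            raise ValueError("both coders must code the same set of behaviors")
        matches = sum(a == b for a, b in zip(codes_a, codes_b))
        return matches / len(codes_a)

    # Thresholds used in this study: 0.85 against the criterion tape,
    # retraining below 0.85 intracoder, recoding below 0.50 intercoder.
    coder_1 = ["question", "strategy", "good evaluation", "strategy"]
    coder_2 = ["question", "strategy", "observation/information", "strategy"]
    assert agreement(coder_1, coder_2) == 0.75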
Results

The results will address the issues in the following order: reliability, validity, and practicality.

Reliability is concerned with the accuracy and consistency of measurement. It concerns coders' abilities to accurately identify and code the appropriate categories of behaviors when they occur, in a consistent manner over time. Thirty-six conference segments, consisting of the middle 5 min of the supervisory conferences of three supervisors and their four supervisees at three data collection points, were coded and were the basis of the analysis. A multivariate 3 x 3 x 4 repeated measures split-plot factorial design was used. Reliability was assessed through multivariate analysis and intraclass correlation coefficients.

A multivariate analysis for the factor coder contrasted coder 1's ratings against coder 2's ratings as each of the coders coded each of the conference segments. There were no significant differences between coders (F = 1.459; df 6,137; NS), indicating that they had coded similarly and apparently reliably. Further analysis contradicted this finding. Intraclass correlation coefficients, as described by Ebel (1951) and Rowley (1976), were computed to assess the reliability of ratings for each of four factors: coder, time, supervisor, and clinician. To be considered minimally reliable, the coefficient should exceed 0.60 (Frankiewicz, 1980). However, a coefficient of 0.70 or above is preferable. The variables found to be reliable are identified in Table 1. By factor, they are: Coder: strategy; Time: question, irrelevant; Supervisor: good evaluation, strategy, observation/information, irrelevant; Clinician: irrelevant.
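Ebel (1951) derives reliability-of-ratings coefficients from an analysis of variance of the ratings. As a minimal sketch, the function below computes one common variant, the single-rating consistency coefficient, from an n-segments x k-raters matrix; the study does not state which variant was used, so this is illustrative rather than a reconstruction of the actual analysis.

    import numpy as np

    def icc_consistency(ratings):
        """Single-rating intraclass correlation (consistency form),
        from a two-way ANOVA of an n-targets x k-raters matrix."""
        x = np.asarray(ratings, dtype=float)
        n, k = x.shape
        grand = x.mean()
        ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()   # targets
        ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()   # raters
        ss_total = ((x - grand) ** 2).sum()
        ms_rows = ss_rows / (n - 1)
        ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
        return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)

    # Hypothetical frequency counts for one category: rows are
    # conference segments, columns are the two coders.
    counts = [[4, 5], [2, 2], [7, 6], [0, 1], [3, 3]]
    print(round(icc_consistency(counts), 2))  # 0.94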
TABLE 1
Intraclass Correlation Coefficients of Reliability

                          Independent Variable
Dependent Variable     Coder    Time     Supervisor   Clinician
Good evaluation        *        0.54     0.73         0.03
Bad evaluation         0.46     -0.001   0.59         0.04
Question               *        0.75     *            0.19
Strategy               0.78     0.47     0.67         0.21
Obser./infor.          0.31     0.55     0.80         *
Irrelevant             -0.98    0.73     0.84         0.85
X̄ coefficient          0.14     0.51     0.73         0.26

* Indeterminate.
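The screening step described in the next paragraph, averaging each factor's coefficients and applying the 0.60 cutoff, can be expressed compactly. The sketch below uses the Table 1 values and skips indeterminate cells; under that reading it reproduces the tabled mean coefficients.

    # Mean coefficient per factor, excluding indeterminate cells (None),
    # then the 0.60 cutoff (Frankiewicz, 1980). Values from Table 1.
    table_1 = {
        "coder":      [None, 0.46, None, 0.78, 0.31, -0.98],
        "time":       [0.54, -0.001, 0.75, 0.47, 0.55, 0.73],
        "supervisor": [0.73, 0.59, None, 0.67, 0.80, 0.84],
        "clinician":  [0.03, 0.04, 0.19, 0.21, None, 0.85],
    }
    for factor, coeffs in table_1.items():
        known = [c for c in coeffs if c is not None]
        mean = sum(known) / len(known)
        verdict = "reliable" if mean > 0.60 else "unreliable"
        print(f"{factor}: {mean:.2f} ({verdict})")  # only supervisor exceeds 0.60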
A mean intraclass correlation coefficient was determined for each factor. All factors, with the exception of supervisor, fell below the cutoff point of 0.60, indicating that, overall, the ratings were unreliable.

Unreliable ratings can arise from (1) inaccurate coders, (2) inconsistency in the behaviors observed, or (3) an unreliable observation tool. The coders in this study achieved agreement with a criterion and intracoder agreement, indicating knowledge of and consistent use of the system. Intra- and intercoder agreement were sampled frequently, and retraining or recoding of data was instituted when needed. As a result, inaccurate coding did not seem a probable cause. Inconsistency in the behaviors observed is a second potential cause of unreliable data. Three supervisors, each paired with four clinicians, were observed three times. Previous research by Culatta and Seltzer (1976) using this system indicated that supervisor behavior is stable over time. Smith (1977) found supervisor behavior to be constant even though clinicians vary in experience level. This research implies that supervisor behavior is constant over time and clinician experience level, decreasing the probability that the low reliability of ratings in this study was due to inconsistency in observed behaviors. The final option to explain the lack of obtained reliability is the nature of the observation tool itself. It may be that (1) there are too many categories in the system, (2) some categories are ambiguous, or (3) the inference level needed to make category judgments is too high.

Validity was also addressed in this study. Four types of validity, previously defined, are generally considered: content, construct, face, and criterion. Face, construct, and content validity were evaluated in this study.
Face validity is the degree to which an instrument measures or describes what it claims to assess. The purpose of the Content and Sequence Analysis of the Supervisory Session, as stated by Culatta and Seltzer (1976), was to develop a system to "isolate interaction variables of the supervisory conference" and to objectify the supervisory conference. The observation system meets its stated claims. Therefore, the system appears to have face validity.

Construct validity is a consideration of the theoretical claims and hypotheses underlying and generated from a system. Culatta and Seltzer (1976) did not report or postulate theory relative to the system. As presented by Culatta and Seltzer (1976), the observation system does not have construct validity.

Content validity is an examination of an instrument to determine if the items sample the world of behaviors for which it was developed. It includes an evaluation of the clarity, inclusiveness, and mutual exclusiveness of categories as well as the appropriateness of guidelines for coding. Content validity can also be measured on a more global level by evaluating a system's ability to identify differences among behaviors when they occur. The concept of content validity was evaluated in two ways. The first was an examination of the instrument and the authors' experiences with the instrument. The second was the analysis of the supervisory conference segments for three supervisors and each of four supervisees at three data collection points.

The title of this system is the Content and Sequence Analysis of the Supervisory Session, which implies that the content and sequence of the supervisory session are described in full. The system does accurately record the sequence of the behaviors recorded. However, behaviors in the supervisory conference include both process and content variables, process being a description of the types of behaviors that occur and content being the gradation of types. This observation system is limited to process variables. It identifies the types of behaviors occurring (e.g., questions or observations) but not their content. For example, in relation to questions, one does not know whether the question is related to the client, clinician, or supervisor, or whether it is about proposed procedures, clinic organization, etc. The system does not describe the universe of the supervisory process because content is not addressed.

Another facet of content validity is the evaluation of the categories in the system as to clarity, inclusiveness, and mutual exclusiveness. The categories are clearly defined and mutually exclusive with the exception of two supervisor and clinician categories: observation/information and strategy. In the training of coders, these category definitions were found to be ambiguous because numerous behaviors could be justifiably coded in either or both of the categories. At these times the categories did not appear to be mutually exclusive. For these categories, insufficient guidelines were provided for the coding of ambiguous situations.

An additional facet of content validity, inclusiveness, is the capability of the system to code all occurring behaviors. An unscorable category was added to the observation system by the authors for both the supervisor and the clinician as a sensitivity index. If a behavior could not be
appropriately coded in the categories defined by the system, it was coded in the unscorable category. The number of behaviors coded in the unscorable category during the 36 segments was minimal, a total of 46. Two types of behaviors appear to be unscorable with this system. First, there was no provision for clinician reinforcement of the supervisor, such as the clinician saying to the supervisor, "You've been very helpful." Second, there was no appropriate category for structuring of the conference by either the supervisor or supervisee, such as the supervisor saying, "First, let's talk about your goals and then your activities. When we've completed that, we need to talk about report writing." Since the number of unscorable behaviors is minimal, the statement that the system is inclusive of all possible behaviors need only be minimally qualified.

A component of content validity is a system's ability to identify differences among groups when they occur. This study attempted to maximize the potential for differences in the supervisory conference by sampling over time and by including clinicians of various experience levels. Rowley (1976) states that the use of multiple, short, independent samples also enhances the reliability of ratings. The data for the content validity analysis were the 36 5-min supervisory conference segments previously described. The statistical design was a multivariate 3 x 3 x 4 repeated measures split-plot factorial. The results for the variables time, supervisor, and clinician and their interactions are as follows: Time (F = 2.926; df 12,272; p ...
TABLE 2
Homogeneity of Variance as Tested by Cochran's Box F-Test

                          Independent Variable
Dependent Variable     Coder    Time    Supervisor   Clinician
Good evaluation        *        *       *            *
Bad evaluation         *        *       *            *
Question               *        *       *            *
Strategy               *        *       *            *
Obser./infor.          -        *       *            -
Irrelevant             -        *       -            *

* Homogeneous variables.
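Table 2's homogeneity test is reported only by name. As an illustration of this kind of check, the sketch below implements Cochran's C, a standard homogeneity-of-variance statistic for k equal-sized groups, with its upper critical value obtained through the usual F-distribution relation; we make no claim that this is the exact procedure used in the study.

    import numpy as np
    from scipy.stats import f

    def cochran_c(groups, alpha=0.05):
        """Cochran's C for k equal-sized groups: the largest group
        variance as a fraction of the summed variances, plus the
        upper critical value at level alpha."""
        k = len(groups)
        n = len(groups[0])
        variances = [np.var(g, ddof=1) for g in groups]
        c = max(variances) / sum(variances)
        f_crit = f.ppf(1 - alpha / k, n - 1, (n - 1) * (k - 1))
        c_crit = 1.0 / (1.0 + (k - 1) / f_crit)
        return c, c_crit

    # Hypothetical frequency counts of one category in three groups:
    groups = [[3, 4, 5, 4], [2, 2, 3, 3], [6, 1, 9, 2]]
    c, c_crit = cochran_c(groups)
    print(f"C = {c:.2f}, critical value = {c_crit:.2f}, homogeneous: {c <= c_crit}")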
The reliable variables that were also homogeneous included the following: Time: question, irrelevant; Supervisor: good evaluation, observation/information, strategy; Clinician: irrelevant. Those factors with two or more homogeneous reliable variables (time and supervisor) were then used as the basis for reanalyzing the 36 supervisory conferences representing each of three supervisors, with four clinicians, at three different times. The statistical design was the multivariate 3 x 3 x 4 repeated measures split-plot factorial. The results were as follows: Time (F = 3.861; df 4,280; p < 0.005); Supervisor (F = 4.081; df 6,278; p < 0.001). Interactions could not be tested, as variables homogeneous and reliable with one factor were not homogeneous and reliable with another. Both variables were found to be highly significant, which may indicate that the previous findings of indiscriminate significance may not have been due to the unreliability of ratings but rather a function of an insensitive observation tool.

The final issue to be addressed, in addition to reliability and validity, is practicality. The strength of this observation system, as determined by author use, is practicality. It is easy to learn and, for supervisory process variables, provides a means of charting occurrence and sequence. A summary of the reliability, validity, and practicality findings is found in Table 3.

Discussion

Serious questions have been raised about the validity and reliability of data collected with the Culatta and Seltzer (1976) Content and Sequence Analysis of the Supervisory Session observation system. As a research tool, it has serious limitations. But these findings should not be generalized to the use of the system on a nonresearch level. The system is easy to learn, and it identifies the occurrence and sequence of process variables in the supervisory conference. For supervisors with limited amounts of available time who are interested in objectifying and observing their own conference behavior, it is a valuable and appropriate tool.
TABLE 3
Summary Table of Reliability, Validity, and Practicality for the Culatta and Seltzer (1976) Observation System, Content and Sequence Analysis of the Supervisory Session

                                                            Yes   No   In some instances
Reliability
  Can inter- and intracoder agreement be achieved?           *
  Are ratings reliable?                                                      *
Face Validity
  Does the system measure or describe what it claims?        *
Content Validity
  Does the system describe the process of supervisory
    conferences?                                             *
  Does the system describe the content of supervisory
    conferences?                                                   *
  Are categories clearly defined?                                            *
  Are categories mutually exclusive?                                         *
  Are the categories all inclusive?                                          *
  Are sufficient guidelines provided for coding?                   *
  Is the system capable of identifying significant
    differences when they occur?                             *
Practicality
  Is the system easy to learn and use?                       *
Appreciation is expressed to Eileen Glein and Gloria Polk for their participation in data collection.

References

Culatta, R., and Seltzer, H. (1976). Content and sequence analysis of the supervisory session. Asha 18:8-12.

Culatta, R., and Seltzer, H. (1977). Content and sequence analysis of the supervisory session: A report of clinical use. Asha 19:523-526.

Dowling, S., and Shank, K. (1981). A comparison to determine the effects of two supervisory styles, conventional and Teaching Clinic, in the training of speech pathologists. J. Commun. Dis. 14:51-58.
Ebel, R. (1951). Estimation of the reliability of ratings. Psychometrika 16:407-423.

Frankiewicz, R. (1980). Personal communication.

Frick, T., and Semmel, M. I. (1975). Observer Agreement and Reliabilities of Classroom Observation Records. Bloomington, IN: Center for Innovation in Teaching the Handicapped.

Herbert, J., and Attridge, C. (1975). A guide for developers and users of observation systems and manuals. Am. Ed. Res. J. 12:1-20.

Ingrisano, D., Guinty, C., Chambers, R., McDonald, J., Clem, R., and Gory, M. (1979). The relationship between supervisory conference performance and supervisee experience. Paper presented at the meeting of the American Speech-Language-Hearing Association, Atlanta, Georgia.

McCrea, E. (1979). Supervisee self-exploration and four facilitative dimensions of supervisor behaviors: Preliminary findings. Paper presented at the meeting of the American Speech-Language-Hearing Association, Atlanta, Georgia.

Oratio, A. R. (1977). Supervision in Speech Pathology: A Handbook for Supervisors and Clinicians. Baltimore: University Park Press.

Rowley, G. L. (1976). The reliability of observational measures. Am. Ed. Res. J. 13:51-59.

Smith, K. (1976). Identification of perceived effectiveness components in the individual supervisory conference. Unpublished doctoral dissertation, Indiana University, Bloomington.

Underwood, J. (1974). Underwood category system for analyzing supervisor-clinician behavior. Paper presented at the meeting of the American Speech-Language-Hearing Association, Las Vegas, Nevada.