Videotaped behavioral observations: Enhancing validity and reliability


Clinical Methods

Jennifer Harrison Elder, PhD, RN
College of Nursing, University of Florida, 100187 JHMHC, Gainesville, FL 32610

Applied Nursing Research, Vol. 12, No. 4 (November), 1999: pp 206-209

Observing and quantifying behaviors in natural or clinical settings has long posed challenges for researchers and clinicians. Although nurses are uniquely positioned to conduct behavioral research, they may not be well acquainted with behavioral observation strategies. The author identifies several problems commonly encountered while studying behavior in natural settings and suggests methods for enhancing validity and reliability through videotaped observations. Results of a newly developed calibration procedure are discussed with implications for rater training. Nurse clinicians and researchers should find this information useful for evaluating interventions in clinics, homes, and other applied settings.

Copyright © 1999 by W.B. Saunders Company

Easy access to videotaping and its relatively low cost have contributed to its increasing use in studying human behavior in clinical and home settings. Yet the systematic development of research procedures for videotaping has not kept pace. Several problems with this type of data collection can adversely affect validity and reliability:

(a) subject reactivity
(b) environmental extraneous variables
(c) ambiguous behavioral definitions
(d) low inter-rater agreement in videotape interpretation

SUBJECT REACTIVITY

Johnson and Bolstad (1975) define "subject reactivity" as the subject's knowledge that he is being observed. Ethical considerations and environmental constraints often prohibit observing subjects without their knowledge. Videotaping may therefore threaten the validity of research findings, because subjects recognize that they are being videotaped. In the most basic sense, "validity" refers to how truly and accurately claims or important concerns are measured (Burns & Grove, 1997). Surprisingly, little has been reported about reactivity effects. Well-controlled clinical trials are needed to evaluate empirically whether videotaping influences subject behavior. The author's experience suggests that subjects may at first be camera-conscious but become habituated with repeated observations (three to four sessions). Reactivity effects are therefore likely to diminish as subjects become accustomed to the observers and/or cameras (Elder, 1995).

"Mock taping sessions" are often useful for familiarizing participants with the videotaping process and reducing reactivity bias. With this approach, videotaping takes place for several sessions, but the data from these sessions are not analyzed. Mock sessions also give raters opportunities to rehearse the procedure, check light and sound quality, and determine the most advantageous camera positions. Another method is to conduct numerous baseline sessions in which behaviors are coded and analyzed before an intervention is introduced. This method is commonly used in single-subject experimental designs (Elder, 1997). Ideally, baseline data should stabilize and show no trends or unusual graphical spikes. Figure 1 shows an example of stable baseline data. It graphically depicts frequencies of a target behavior recorded during four 10-minute baseline sessions (i.e., sessions held before the intervention was introduced). The line is almost flat, indicating that the behavior occurred with similar frequency across the four baseline sessions.

Figure 1. Sample showing stabilization (frequency of a target behavior across baseline Sessions 1-4).

The unstable rates for another target behavior, depicted in Figure 2, contrast sharply with the data in Figure 1. When baseline instability is observed, as in this case, investigators must examine the videotapes closely. They should look for confounding (extraneous) variables and/or signs that participants are overly aware of, or distracted by, the videotaping procedures.

ENVIRONMENTAL EXTRANEOUS VARIABLES

Polit and Hungler (1997, p. 459) define "internal validity" as "the degree to which it can be inferred that the experimental treatment (independent variable), rather than uncontrolled, extraneous factors, is responsible for the desired effects." In behavioral research, environmental extraneous variables can threaten internal validity and confound findings. Examples include noise from surrounding areas and the presence of individuals other than the research subjects. Investigators should therefore consider possible extraneous variables before data collection begins. Once identified, potential confounds can be eliminated, controlled, or even isolated for study within the context of the treatment. However, even with optimal preparation and planned control, natural settings may present unexpected interference.

Videotape methodology usually surpasses direct behavioral observation for detecting extraneous variables. Raters present in the environment may be so focused on the target behavior(s) that they fail to notice environmental influences. Videotapes, by contrast, can be paused as needed and studied carefully. Because videotapes can also be viewed repeatedly, they give a more accurate picture of the actual target behaviors. Such was the case in Elder's (1997) report of an autistic child engaging in self-stimulatory behavior. The data showed wide variability across sessions, which was at first difficult to explain. Repeated viewing of the videotapes, however, revealed environmental noises from an adjacent room that clearly affected the child's behavior. Because the videotaping method afforded numerous data points over time, the graphed data could be analyzed and the environmental confounding variables identified and eliminated.

AMBIGUOUS BEHAVIORAL DEFINITIONS

Ambiguous behavioral definitions threaten "construct validity," which Polit and Hungler (1997) describe as an instrument's ability to measure the constructs of interest. In behavioral research, construct validity depends most directly on how the targeted behaviors are defined. When determining frequencies of behaviors, observers must recognize what constitutes the behavior and when it begins and ends. For example, "tantrum" has a variety of meanings. In earlier work, this author found that the definition of "tantrum" needed clarification and developed the following operational definition: "tantrum is clearly audible crying sounds emitted by the child, associated with kicking and/or flailing arms (and bounded by a 5-second pause)." This definition gives observers a "behavioral picture" of a tantrum. It also supplies the time parameter needed to distinguish one tantrum from the next.
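The 5-second pause in this definition is what turns a continuous stretch of coded video into a count of separate tantrums. The sketch below is a minimal illustration under two assumptions. First, a rater has already marked the time stamps (in seconds from the start of the tape) at which crying with kicking and/or flailing was observed. Second, a gap of 5 s or more closes one episode. The function name `segment_tantrums` is hypothetical and is not part of the author's procedure.

```python
from typing import List, Sequence, Tuple

PAUSE_S = 5.0  # boundary from the operational definition ("bounded by a 5-second pause")


def segment_tantrums(
    event_times: Sequence[float], pause_s: float = PAUSE_S
) -> List[Tuple[float, float]]:
    """Group time stamps (s) of qualifying crying + kicking/flailing into episodes.

    Hypothetical helper: a gap of at least `pause_s` seconds since the previous
    qualifying moment ends the current episode and starts a new one.
    Returns one (start, end) pair per tantrum.
    """
    episodes: List[Tuple[float, float]] = []
    for t in sorted(event_times):
        if episodes and t - episodes[-1][1] < pause_s:
            # Still within the same tantrum: extend its end time.
            episodes[-1] = (episodes[-1][0], t)
        else:
            # First event, or a pause of at least pause_s: a new tantrum begins.
            episodes.append((t, t))
    return episodes


if __name__ == "__main__":
    marks = [0, 1, 2, 3, 10, 11, 14]  # hypothetical rater marks, in seconds
    print(len(segment_tantrums(marks)))  # 2 tantrums: 0-3 s and 10-14 s
```

Making the pause an explicit parameter also makes the definition auditable. Two raters who disagree on the tantrum count can check whether they applied the same boundary rule.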

Figure 2. Sample showing instability (rate of a second target behavior across baseline Sessions 1-4).

LOW INTER-RATER AGREEMENT

A reliable instrument should yield the same results repeatedly; if it does not, the meaning and applicability of the findings are in question (Burns & Grove, 1997). In behavioral research, the instruments are often the observers (raters), so consistency among raters is critical. Table 1 notes two methods for enhancing inter-rater reliability: initial group practice/discussion sessions and ongoing "rater calibration" activities. The term "rater calibration" derives from a process described by Johnston and Pennypacker (1986), in which observational instruments are kept finely tuned to ensure that the targeted behaviors are captured accurately. When the behavioral instruments are human observers, similar care must be taken to promote consistency among raters.

Table 1. Summary of Potential Threats to Validity and Reliability and Their Management

Videotaping problem: Subject reactivity
Potential threat: Internal validity
Management: Subject habituation through repeated videotaping; mock taping sessions; extended baseline periods

Videotaping problem: Environmental extraneous variables
Potential threat: Internal validity
Management: Careful preplanning to minimize chances of environmental extraneous variables; repeated viewing of videotapes with close visual inspection of graphed data

Videotaping problem: Ambiguous behavioral definitions
Potential threat: Construct validity; reliability
Management: Clear behavioral definitions that address duration

Videotaping problem: Low inter-rater agreement
Potential threat: Reliability
Management: Initial group practice and discussion sessions; ongoing calibration activities

EVALUATION OF A NEW CALIBRATION PROCEDURE

The author conducted a pilot project to quantify mother-child interactions. The child subjects were 3 to 6 years old and had been diagnosed with "pervasive developmental delay." This broad diagnostic category includes children who have impairments in social interaction and communication and who exhibit restricted, stereotyped patterns of behavior (American Psychiatric Association, 1994). The mothers were 23 to 35 years old. Each was videotaped playing with her child in a clinic playroom during a standard intake procedure. Because of concerns about validity and reliability, the author developed written behavioral definitions (Table 2).

Table 2. Written Behavioral Definitions Given to Raters Prior to Training

Target behavior: Operational definition

Tantrum: Clearly audible crying sounds emitted by the child, associated with kicking and/or flailing arms (and bounded by a 5-s pause).

Mother initiating: Movement cycle beginning with a clearly observable or audible maternal action or prompt (verbal or physical) that is directed toward the child and indicates an attempt to engage/instruct the child (e.g., pushing a toy, verbal directive).

Child initiating: Movement cycle beginning with a clearly observable or audible child action or prompt (verbal or physical) that is directed toward the mother and indicates that the child wants to engage the parent (e.g., showing an object to the parent, verbal directive).

Child responding: Movement cycle that begins with the occurrence of a parental prompt and ends after the child initiates the requested movement or vocalizations within 5 s.

Mother responding: Movement cycle that begins with the occurrence of a child behavior and ends when, within 2 s, the mother initiates the requested movement or offers a comment indicating that she acknowledges, understands, and/or approves of the child initiation.
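The two "responding" definitions in Table 2 differ only in who initiates and in the latency allowed. A child response to a maternal prompt must begin within 5 s; a maternal response to a child initiation must begin within 2 s. The minimal sketch below checks a coder's time stamps against those windows. It assumes that initiation and response onsets are recorded in seconds from the start of the tape, and that a response exactly at the limit still counts. The `Initiation` type and `is_response` function are illustrative names, not part of the study's materials.

```python
from dataclasses import dataclass
from typing import Optional

# Latency windows taken from the Table 2 definitions.
RESPONSE_WINDOW_S = {
    "mother": 5.0,  # mother initiates -> "child responding" must begin within 5 s
    "child": 2.0,   # child initiates -> "mother responding" must begin within 2 s
}


@dataclass
class Initiation:
    initiator: str   # "mother" or "child" (hypothetical labels)
    onset_s: float   # time of the prompt or action, in seconds from tape start


def is_response(initiation: Initiation, response_onset_s: Optional[float]) -> bool:
    """Return True if the partner's reaction falls inside the allowed window.

    Assumption: the window is inclusive (latency <= limit). A reaction that
    precedes the initiation, or no reaction at all, is not a response.
    """
    if initiation.initiator not in RESPONSE_WINDOW_S:
        raise ValueError(f"unknown initiator: {initiation.initiator!r}")
    if response_onset_s is None:
        return False
    latency = response_onset_s - initiation.onset_s
    return 0.0 <= latency <= RESPONSE_WINDOW_S[initiation.initiator]


if __name__ == "__main__":
    prompt = Initiation("mother", onset_s=12.0)
    print(is_response(prompt, 15.5))  # True: child reacts 3.5 s after the prompt
    print(is_response(Initiation("child", 40.0), 43.0))  # False: 3 s exceeds 2 s
```

Keeping both windows in one lookup table places the numeric thresholds next to each other. Any tailoring of the definitions during calibration then shows up in a single place.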

Procedure

Coding Using Only Written Definitions. After the written behavioral definitions were developed, four raters from an honors nursing research class independently coded preadmission videotapes of 11 children and their mothers, without consulting one another. The raters initially commented that the behavioral definitions "seemed clear" and were "easy to understand." Even so, inter-rater agreement was less than optimal (Table 3), particularly for behaviors that occurred infrequently. For example, one child displayed a brief tantrum on one occasion during a 15-minute segment. Two of the four raters correctly coded the tantrum, and the other two did not record it at all. The resulting inter-rater reliability score of .50 was far below the desired range of .80 to 1.0.

Initial Group Practice and Discussion. Raters needed more training and more opportunities to discuss behavioral occurrences. They therefore coded practice videotapes of children who were not subjects in the current project, "calling out" their responses to the group. Wherever discrepancies occurred, the group discussed them. Raters coded the videotapes repeatedly until they reached 100% (1.0) agreement.

Ongoing Rater Calibration. After viewing a number of videotapes, raters began to describe "drifting effects," that is, a concern that they were losing focus because of fatigue. In addition, inter-rater reliability was still not consistently within the acceptable range. The raters also noted extreme individual differences among the children, which suggested that some behavioral definitions needed tailoring. For example, one child's "initiating" did not include direct eye contact, yet he was clearly attempting to interact with his mother during the session. In response to these concerns, the author implemented calibration sessions. These followed a procedure similar to the training sessions described above, with one difference. Instead of practice tapes of children outside the project, raters used 5-minute segments from each of the mother-child sessions. These segments were later discarded and were not part of the analysis.

Table 3. Mean Inter-rater Reliability Scores* Before and After Training Sessions

Type of observation     Noncalibrated†   Calibrated
Mother initiating       0.79             0.93
Child responding        0.82             0.90
Child initiating        0.70             0.90
Mother responding       0.74             0.86

N = 11 videotapes, 4 raters.
*Calculated by (number of agreements)/(number of agreements + disagreements).
†Using only written behavioral definitions.
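The agreement index in the Table 3 note is straightforward to compute once two raters' codes are lined up: agreements divided by the sum of agreements and disagreements. The sketch below makes two assumptions the article does not state. It assumes each videotape is divided into the same sequence of observation units, with each rater assigning one code per unit (the codes shown are invented). It also assumes that scores for the four raters are averaged over every rater pair. The function names are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def percent_agreement(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Agreements / (agreements + disagreements) for two raters' unit-by-unit codes."""
    if len(rater_a) != len(rater_b):
        raise ValueError("raters must code the same number of units")
    if not rater_a:
        raise ValueError("no coded units to compare")
    agreements = sum(a == b for a, b in zip(rater_a, rater_b))
    disagreements = len(rater_a) - agreements
    return agreements / (agreements + disagreements)


def mean_pairwise_agreement(codes_by_rater: Sequence[Sequence[str]]) -> float:
    """Average percent agreement over every pair of raters (assumed averaging rule)."""
    pairs = list(combinations(codes_by_rater, 2))
    if not pairs:
        raise ValueError("need at least two raters")
    return sum(percent_agreement(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical codes for four units: MI = mother initiating, CR = child responding, "-" = none
    a = ["MI", "CR", "-", "MI"]
    b = ["MI", "CR", "MI", "MI"]
    print(percent_agreement(a, b))  # 0.75 (3 agreements, 1 disagreement)
```

Percent agreement does not correct for agreement expected by chance. Chance-corrected indices such as Cohen's kappa are a common alternative, although the article reports only the simpler ratio.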

RESULTS

Table 3 presents pre- and postcalibration scores for four behaviors: (1) mother initiating, (2) child responding, (3) child initiating, and (4) mother responding. Mean agreement ranged from 0.70 to 0.82 before calibration and from 0.86 to 0.93 after calibration. The sample (11 videotapes and 4 raters) was too small to support claims of statistical significance. Nevertheless, these preliminary findings suggest that the calibration sessions implemented in this study enhanced inter-rater reliability. Because the practice tape segments were excluded from the data analysis, the scientific integrity of the project was maintained. In addition, raters reported a much clearer understanding of the behavioral response categories and an absence of drift effects.

DISCUSSION AND IMPLICATIONS

Videotaping offers new opportunities for nurse researchers and clinicians to evaluate client behavior and intervention efficacy empirically in a variety of natural settings. However, several potential problems associated with this type of inquiry must be addressed to obtain reliable and valid findings, and additional work in this area is clearly needed. This research project illustrates the type of procedural inquiry that can answer methodological questions raised in behavioral research. Informed nurses are uniquely positioned to lead the way in advancing this methodology while ensuring quality research and empirically based clinical practice.

ACKNOWLEDGMENTS

I acknowledge the following honors nursing research students for their participation as raters in this project: Alicia Anderson, Melissa Beauchamp Spurrier, Jonathan Decker, and Mitzi Tucker. I am also grateful for Dr. Kathlean Long's suggestions regarding the preparation and submission of this manuscript.

REFERENCES

American Psychiatric Association (1994). Diagnostic and Statistical Manual of Mental Disorders, 4th Ed. Revised. Washington, DC: American Psychiatric Association.

Burns, N., & Grove, S. (1997). The Practice of Nursing Research: Conduct, Critique and Utilization. Philadelphia, PA: W.B. Saunders.

Elder, J. H. (1995). In-home communication intervention training for parents of multiply handicapped children. Scholarly Inquiry for Nursing Practice, 9, 71-92.

Elder, J. H. (1997). Single subject experimentation for psychiatric nursing. Archives of Psychiatric Nursing, 11, 1-7.

Hartmann, D. P., & Gardner, W. (1979). On the not so recent invention of interobserver reliability: A commentary on two articles by Birkimer and Brown. Journal of Applied Behavior Analysis, 12, 559-560.

Johnson, J. M., & Bolstad, O. D. (1975). Reactivity to home observation: A comparison of audio recorded behavior with observers present or absent. Journal of Applied Behavior Analysis, 8, 181-185.

Johnston, J., & Pennypacker, H. (1993). Strategies and Tactics of Human Behavior Research, 2nd Ed. Hillsdale, NJ: Lawrence Erlbaum Associates.

Polit, D. F., & Hungler, B. P. (1997). Essentials of Nursing Research: Methods, Appraisal, and Utilization. Philadelphia, PA: Lippincott.