Interobserver Test-Retest Reliability of the Randot Preschool Stereoacuity Test


Sherry L. Fawcett, PhD,a and Eileen E. Birch, PhDa,b

Purpose: Random dot stereoacuity can be quantified to between 40 and 800 seconds of arc in preschool children by using the Randot Preschool Stereoacuity test (Stereo Optical Co, Inc, Chicago, Ill). To incorporate this test into clinic and research settings, the reliability of its stereoacuity scores obtained by separate examiners needs to be evaluated. The purpose of this study was to evaluate its interobserver test-retest reliability.

Methods: Participants included 102 consecutive children with binocular sensory function ranging from fine to no measurable stereopsis. Clinical research participants included children with anomalous binocular vision caused by strabismus, cataracts, anisometropia, and ptosis. In a prospective study, random dot stereoacuity was measured twice under masked testing conditions by 2 examiners within a 1-hour period.

Results: Interobserver test-retest reliability of the Randot Preschool Stereoacuity test is high among a population of children with diverse binocular sensory function. The correlation coefficient between individual test scores was highly significant (r = 0.97, P < .001). Mean differences between the 2 scores (0.021 log seconds of arc) were not significantly different from zero (t99 = 1.33, P > .1). The upper and lower limits of agreement were narrow, reflecting both the large sample size and the small variation between the 2 test scores. Interobserver test-retest reliability of the Randot Preschool Stereoacuity test was nearly constant across levels of functional stereoacuity, patient categorization, and age at the time of the test.

Conclusions: The high agreement between the Randot Preschool Stereoacuity test scores by 2 independent observers supports its use in clinical management and research settings for the quantitative assessment of binocular sensory vision, as well as in multicentered research studies. (J AAPOS 2000;4:354-8)

Performance on stereoacuity tests is dependent on equal visual acuity by the 2 eyes, normal eye alignment, and history of binocular visual experience. Consequently, stereoacuity tests are useful to evaluate and to manage patients with binocular vision disorders. Several of the stereoacuity tests used routinely by pediatric ophthalmologists, including the circle tasks of the Titmus and the Randot test (version 2), contain monocular cues that make the tests possible to pass to a certain level without stereopsis. Identification of the correct Titmus circle up to the third or fourth set is possible when the test is viewed under conditions that preclude stereoacuity or when the test is viewed by patients with known abnormalities of binocular vision. A nonstereoscopic misalignment of 1 of

From the Retina Foundation of the Southwest,a and the Department of Ophthalmology, University of Texas Southwestern Medical Center,b Dallas, Texas. This study was conducted at the Retina Foundation of the Southwest. Submitted January 18, 2000. Revision accepted July 3, 2000. Reprint requests: Sherry Fawcett, PhD, Retina Foundation of the Southwest, 9900 N Central Expressway, Suite 400, Dallas, TX 75231 (e-mail: [email protected]). Supported by Fight for Sight PD98011 and PD99007 and National Institutes of Health EY05236. Copyright © 2000 by the American Association for Pediatric Ophthalmology and Strabismus. 1091-8531/2000/$12.00 + 0 75/1/110340 doi:10.1067/mpa.2000.110340


the 4 Titmus circles is readily apparent in at least the first 3 sets of Titmus circles when the test is viewed monocularly through polarized lenses.1-3 Patients with known binocular vision abnormalities have also demonstrated the ability to identify the correct circle in up to the third or fourth set of circles in the Titmus test2,4 and up to the first or second set of circles in the Randot (version 2) test.4 Similarly, under monocular viewing conditions, it is apparent where the 3 figures are located on the Lang 1 test, although it is impossible to identify them.

Random dot stereoacuity tests provide a pure measure of stereoacuity as they do not contain monocular form cues. The shapes task of the Randot test (version 2) is a random dot stereoacuity test. However, its use is limited by its measurable disparities of 500 and 250 seconds of arc. The circle task of the Randot test (version 2) can measure stereopsis to 20 seconds of arc. It is not a true random dot task, however, but rather a hybrid task of line stereograms printed on a random dot background. Its design consequently makes coarse disparity discrimination possible by using monocular form cues, reducing its validity.

The new Randot Preschool Stereoacuity (Stereo Optical Co, Inc, Chicago, Ill) test is a quick and simple-to-administer random dot stereoacuity test with a similar range of measurable disparity as the Titmus and Randot

Journal of AAPOS Volume 4 Number 6 December 2000

circle tasks but is detectable only with the use of binocular form cues. Its unique test design incorporates a set of 11 shapes recognized by more than 95% of 3-year-old children, making this test easy to administer to young children.5 Validation studies have been completed.5 In both clinical and screening settings, the sensitivity of the Randot Preschool Stereoacuity test to identify children with normal versus deficient stereopsis exceeds 90% accuracy.5 In comparison with the Titmus, the Randot (version 2), and the Lang 1 tests, the Randot Preschool Stereoacuity test has substantially higher sensitivity. In the same children, the ability to accurately classify children with normal and deficient stereopsis was 50% correct with use of the Titmus circles, 50% correct with use of the Titmus animals, 71% correct with use of the Randot shapes, and 81% correct with use of the Lang 1.5 Testability or the success of administration of the Randot Preschool Stereoacuity test has also been quantified.5 The success rate of the Randot Preschool Stereoacuity test exceeds the success rate of each of the above tests. By age 3 years, success rate of the Randot Preschool Stereoacuity test exceeds 90%, and by age 4 years, success rate approaches 100%.5 To summarize, in clinical and screening settings with preschool children, the Randot Preschool Stereoacuity test is more sensitive and is more successfully administered than other stereoacuity tests including the Titmus, the Randot (version 2), and the Lang 1, making it a valuable tool for the management of and the screening for pediatric binocular vision disorders.5 The Randot Preschool Stereoacuity test has recently been adopted as a secondary outcome measure in national multicentered treatment trials, including trials by the Pediatric Eye Disease Investigation Group (eg, the Amblyopia Treatment Study). 
As multicenter testing involves measurements by numerous independent examiners, it is important to know the test-retest reliability of the Randot Preschool Stereoacuity test. High interobserver agreement of test scores is clearly of primary importance in reviewing the criterion-related validity of a new test. The purpose of this study was to assess the interobserver test-retest reliability of the Randot Preschool Stereoacuity test among children with normal and deficient stereopsis.

SUBJECTS AND METHODS

Subjects

Subjects included 102 consecutive children aged between 2 and 12 years (median age, 4 years; mean age, 5 years). To ensure diversity in binocular sensory function, subjects evaluated for this study included clinical research patients (n = 75) recruited for sensory vision evaluation by local pediatric ophthalmologists in Dallas, Texas, and healthy volunteer participants (n = 27) recruited from the Dallas metropolitan area. Clinical research patients included patients treated for accommodative esotropia (n = 53), infantile esotropia (n = 6), unilateral cataract (n = 5), anisometropia (n = 4), intermittent strabismus (n = 3), ptosis (n = 2), Duane syndrome (n = 1), and nystagmus (n = 1). At the time of testing, all subjects were orthotropic at distance and at near with their best correction. Study exclusion criteria included neurological abnormalities, developmental delays, gestational age at birth less than 36 weeks, and amblyopia greater than one line. Informed consent was obtained before testing from the parents of all participants. No subjects chose not to participate in the study; however, 2 patients were uncooperative for the second evaluation and were not included in the analysis of 100 patients. The procedures of the experiment were approved by the institutional review board of the University of Texas Southwestern Medical Center, Dallas, Texas, and were performed according to the guidelines of the Declaration of Helsinki.

Methods

The Randot Preschool Stereoacuity test evaluates the perception of binocular disparities ranging from 40 to 800 seconds of arc. The test consists of a set of 3 test booklets. Each test booklet contains 2 sets of 4 random dot patterns for identifying or, for the youngest children, matching with four 2-dimensional black-and-white illustrations presented in a different order displayed on the opposite page of the book. Book 1 contains intermediate disparities (200 and 100 seconds of arc), book 2 contains fine disparities (60 and 40 seconds of arc), and book 3 contains coarse disparities (800 and 400 seconds of arc). Testing begins with book 1 or book 3. To determine the testability of the child, the child is first asked to identify the 2-dimensional pictures that they will be required to identify or to match during the test. After successful identification of the 2-dimensional pictures, testing proceeds by asking the child to identify or to match the corresponding 3-dimensional random dot pictures on the opposite page, which are visible only through the polarized glasses.
At each level of disparity, the child must match or identify at least 2 of the 3 random dot figures correctly to be considered to pass and to proceed to the next level of the test with a smaller binocular disparity. Stereoacuity was defined as the smallest disparity with 2 of 3 random dot pictures correctly identified or matched. Random dot stereoacuity was measured twice in each child by using the Randot Preschool Stereoacuity test. After the evaluation by the first examiner, the second examiner (masked to the stereoacuity score obtained by the first examiner) reevaluated the patient. The examiners included 1 of the authors (S.F.) and a research technician trained to administer the tests. The second evaluation was made within 1 hour of the initial test. Visual acuity was measured concurrently to rule out amblyopia.
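The pass criterion and the definition of stereoacuity described above can be sketched in a few lines of Python. This is only an illustration of the scoring rule, not software used in the study; the function name `score_stereoacuity` and the dictionary input format are hypothetical.

```python
# Disparity levels printed in the three booklets (seconds of arc).
LEVELS_ARCSEC = (800, 400, 200, 100, 60, 40)
PASS_CRITERION = 2  # at least 2 of the 3 random dot figures correct


def score_stereoacuity(correct_by_level):
    """Return the smallest disparity passed, or None if no level was passed.

    correct_by_level maps a disparity (seconds of arc) to the number of
    random dot figures (0 to 3) identified or matched correctly at that
    level. Levels that were never presented are simply left out.
    """
    for level in correct_by_level:
        if level not in LEVELS_ARCSEC:
            raise ValueError(f"unknown disparity level: {level}")
    passed = [lvl for lvl, n in correct_by_level.items() if n >= PASS_CRITERION]
    return min(passed) if passed else None


# Hypothetical child who starts with book 1 and fails at 60 seconds of arc.
print(score_stereoacuity({200: 3, 100: 2, 60: 1}))
```

A return value of `None` stands for no measurable stereopsis, which the analysis later codes as a fixed log value.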

DATA ANALYSIS

Parametric statistics requires an assignment of a value to each data entry in the analysis. For the purpose of statistical analysis and for figure plots, no measurable random dot


FIG 1. Random dot stereoacuity (log sec) of 100 participants measured by 2 observers with regression line. r² indicates the strength of the relation between the 2 test scores. Perfect agreement by all test scores would be indicated by a regression line with an intercept (a) of 0 and a slope (b) of 1. The number of participants represented by each data point is embedded in that data point.

FIG 2. Test-retest random dot stereoacuity differences (log sec) plotted as a function of the mean random dot stereoacuity (log sec) of 100 participants. The mean test-retest difference (0.021 log sec) is represented by the hatched line and the limits of agreement (±2 SD) are represented by the solid lines. The 95% tolerance limits are represented by the bars. The number of participants represented by each data point is embedded in that data point.

stereopsis was coded as 3.2 log seconds of arc. The assignment of this value was based on the approximate 0.3 log sec interval between stimulus disparity levels and the largest available level of disparity (2.9 log sec). Test-retest reliability between the stereoacuity scores was first examined by plotting the results of the 2 tests against one another. Perfect agreement is demonstrated between 2 sets of scores when the regression line is equal to the line of equality. That is, the intercept is equal to zero and the slope is equal to 1. The second step in our analysis was to calculate the correlation coefficient (r) between the individual test scores.
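The coding of scores in log units and the first two analysis steps (regression against the line of equality, then the correlation coefficient) might look as follows in Python. This is a sketch under stated assumptions: the helper names are hypothetical, the example scores are invented for illustration and are not study data, and ordinary least squares is assumed because the text does not name the regression method.

```python
import math

NIL_LOG_ARCSEC = 3.2  # value assigned in the text to no measurable stereopsis


def to_log_arcsec(arcsec):
    """Convert a score in seconds of arc to log10 units; None means nil."""
    if arcsec is None:
        return NIL_LOG_ARCSEC
    return math.log10(arcsec)


def regression_and_r(x, y):
    """Least-squares intercept and slope of y on x, plus Pearson r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r


# Invented scores (seconds of arc) from two examiners; None = nil stereopsis.
exam1 = [40, 60, 100, 200, 400, 800, None]
exam2 = [40, 60, 100, 400, 400, 800, None]
x = [to_log_arcsec(s) for s in exam1]
y = [to_log_arcsec(s) for s in exam2]
intercept, slope, r = regression_and_r(x, y)
print(f"intercept={intercept:.2f} slope={slope:.2f} r={r:.2f}")
```

The largest printed disparity, 800 seconds of arc, is about 2.9 log units, so 3.2 sits one step of roughly 0.3 log units above it, as the text describes.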

FIG 3. Test-retest random dot stereoacuity differences (log sec) plotted as a function of age (years) of 100 participants. The mean test-retest difference (0.021 log sec) is represented by the hatched line and the limits of agreement (±2 SD) are represented by the solid lines. The number of participants represented by each data point is embedded in that data point.

Because it is unlikely that a set of 2 scores will have perfect agreement (that r will be equal to 1 and that all of the data will fall along the line of equality), it is important to assess by how much the 2 scores differ from each other and whether the difference between the scores is large enough to influence the clinical management and treatment of patients. The third step in our analysis was to measure the agreement between the 2 test scores by plotting the difference between the test scores of each participant against the mean of his or her 2 test scores. This type of plot is known as a Bland-Altman plot.6 A Bland-Altman plot also allows assessment of whether the test-retest reliability depends on the level of stereoacuity. The fourth step in our analysis was to determine whether test-retest reliability varied as a function of age.
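The quantities behind a Bland-Altman plot can be computed as in the sketch below. It assumes the difference is taken as first score minus second score (the text does not state the direction) and uses the ±2 SD limits of agreement given in the figure captions; the function name and the example values are hypothetical.

```python
import math


def bland_altman(scores1, scores2):
    """Per-subject means and differences plus agreement statistics."""
    diffs = [a - b for a, b in zip(scores1, scores2)]
    means = [(a + b) / 2 for a, b in zip(scores1, scores2)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    # Sample standard deviation of the differences (n - 1 denominator).
    sd_diff = math.sqrt(sum((d - mean_diff) ** 2 for d in diffs) / (n - 1))
    return {
        "means": means,            # x axis of the plot
        "diffs": diffs,            # y axis of the plot
        "mean_diff": mean_diff,    # bias between the 2 sets of scores
        "sd_diff": sd_diff,
        "upper_limit": mean_diff + 2 * sd_diff,
        "lower_limit": mean_diff - 2 * sd_diff,
    }


# Invented log stereoacuity scores for three children; not study data.
result = bland_altman([2.0, 2.3, 2.6], [2.0, 2.0, 2.6])
print(result["mean_diff"], result["lower_limit"], result["upper_limit"])
```

Plotting `diffs` against `means`, or against age for the fourth step, and regressing one on the other then shows whether agreement changes with the level of stereoacuity or with age.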

RESULTS

Figure 1 shows the log stereoacuity scores for test 1 and test 2 for each subject. Seventy-two of the 100 pairs of test scores fall along the line of equality, having perfect agreement. The intercept and the slope of the regression line shown in Figure 1 are 0.03 and 0.98, respectively, exemplifying the high agreement between the test scores by the 2 examiners. The correlation coefficient between the individual test scores is highly significant (r = 0.97, P < .001).

Figure 2 is a Bland-Altman plot, a plot comparing the test score differences between the 2 examiners with their average test score for each subject. The intercept and the slope of the regression line are 0.03 and 0.005, respectively. These values show that the test score differences are small and invariable as a function of level of functional stereoacuity. The mean difference between the 2 scores is 0.02 log seconds of arc (SD, 0.16 log seconds of arc) and is not significantly different from zero (t99 = 1.33, P > .1),


signifying no influence by the first measurement on the second measurement. The 95% tolerance limits of agreement, defined as 2 SD above and below the mean difference, are 0.34 and –0.30 log seconds of arc. The CI for the mean difference and the 95% tolerance limits of agreement are narrow, reflecting both the large sample size and the small variation between the 2 test scores. The 4 children with test-retest differences larger than 2 SD included 1 participant with normal vision, 2 patients with accommodative strabismus, and 1 patient with anisometropia. The stereoacuity of these children ranged between 60 and 400 seconds of arc.

To determine whether test-retest reliability is dependent on patient categorization, 2 separate regression analyses of the test-retest score differences were conducted for patients with accommodative esotropia (n = 53) and healthy volunteer participants (n = 27). These analyses revealed no difference in test-retest reliability as a function of level of stereopsis by patients with accommodative esotropia (r = 0.62, P > .5) versus healthy participants (r = 0.33, P > .1).

Figure 3 shows the test-retest differences of each subject plotted against age in years. The intercept and the slope of the regression line are 0.01 and 0.001, respectively, consistent with no effect of age on test-retest differences. The 4 scores with test-retest differences larger than 2 SD included children aged 3.5 (n = 1), 4 (n = 2), and 4.5 (n = 1) years. Although the proportion of patients under 5 years of age (n = 60) is larger than the proportion of patients 5 years of age and older (n = 40), any influence of age on test-retest reliability must be ruled out. An analysis of the test-retest score differences by children under 5 years of age compared with children 5 years of age and older shows the mean differences between the test scores to be not significantly different (t98 = 0.33, P = .97). Age of subject has no influence on the test-retest reliability of patients.
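As a worked check, the limits of agreement and the t statistic reported above can be approximately recomputed from the published summary values alone. The sketch assumes a one-sample t test on the 100 paired differences, which is what 99 degrees of freedom suggests but which the text does not state outright. The inputs are the rounded values from the text, so small rounding discrepancies are expected.

```python
import math

n = 100            # pairs of scores analysed
mean_diff = 0.021  # mean test-retest difference (log seconds of arc)
sd_diff = 0.16     # SD of the differences (log seconds of arc)

# Limits of agreement: 2 SD above and below the mean difference.
upper = mean_diff + 2 * sd_diff
lower = mean_diff - 2 * sd_diff

# One-sample t statistic for the mean difference against zero (assumed test).
standard_error = sd_diff / math.sqrt(n)
t_stat = mean_diff / standard_error

print(f"limits of agreement: {lower:.2f} to {upper:.2f} log seconds of arc")
print(f"t({n - 1}) = {t_stat:.2f}")
```

The limits reproduce the reported 0.34 and –0.30, and the t statistic comes out near 1.3, close to the reported 1.33 given that the mean and SD are rounded.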

DISCUSSION

The new book-format Randot Preschool Stereoacuity test is superior to other stereoacuity tests for administration to young children.5 Whereas other stereoacuity tests typically provide a pass/fail outcome (eg, the Lang 1), the Randot Preschool Stereoacuity test provides a quantitative measure of random dot stereoacuity. Its design was based on the shape recognition task of the Randot test (version 2). However, the Randot test (version 2) was not specifically designed for administration to preschool-aged children. The Randot Preschool Stereoacuity test incorporates the naming or matching task but was designed with use of shapes that are recognized by more than 95% of 3-year-olds.5 In addition, unlike the original shape recognition task of the Randot test (version 2), the Randot Preschool Stereoacuity test includes a broader range of disparities. Whereas the disparities of the Randot test (version 2) are limited to coarse and intermediate levels of disparity (500 and 250 seconds of arc), the disparities of the Randot Preschool Stereoacuity test include coarse (800 and 400 seconds of arc), intermediate (200 and 100 seconds of arc) and fine (60 and 40 seconds of arc). Although the overall range of measurable stereoacuity by the Randot Preschool Stereoacuity test is similar to the overall range of testable stereoacuities of other stereoacuity tests (eg, the circles and animal tasks of both the Randot [version 2] and the Titmus fly), the recognition task of the Randot Preschool Stereoacuity test has better comprehensibility5 and a higher success rate in the preschool age range.5,7 Moreover, the Titmus and the Randot circles are designed in such a way that it is possible to pass without stereopsis, by using monocular form cues alone.

The superior design of the Randot Preschool Stereoacuity test for administration to preschool-aged children, its high success rate, and its high accuracy for identifying children with deficient binocular vision have resulted in its recent adoption in national multicentered treatment studies to assess binocular sensory function in preschool- and school-aged children. Because of the dependence of multicentered testing on multiple examiners, it is important to assess the test-retest interobserver reliability of the Randot Preschool Stereoacuity test. In this study, high agreement was found between test scores acquired by 2 independent examiners. Seventy-two percent of the scores by the 2 examiners had perfect agreement, and the overall correlation between the test scores was high. If the difference between 2 test scores that use the same method is significantly different from zero, the data may not be used to assess repeatability simply because the first measurement influences the second (eg, there are practice or fatigue effects). The mean difference between the 2 test scores was not found to be significantly different from zero.
This important finding not only demonstrates the repeatability of the test but also demonstrates that previous experience with the test does not influence the test score. Clinicians can confidently reassess patients by using the Randot Preschool Stereoacuity test the same day or on repeated visits without any concern that the patient has learned how to pass the test due to repeated administration. Children often learn how to perform stereoacuity tests, including the Titmus fly, the Butterfly, and the Lang 1, upon repeated administration during early childhood, reducing the validity of the tests for the purpose of clinical management and assessment. As described earlier, the Randot Preschool Stereoacuity test consists of a set of 3 test booklets, each containing different random dot pictures for identification. This design makes it more difficult for the children to pass by memorizing the pictures and their locations. Furthermore, a flexible protocol for administration allows the examiner to omit the identification of the 2-dimensional pictures at the start of the test and to administer the test booklets in a different order.

The repeatability of a test may be influenced by different factors, including the level of function being measured by a test. Some tests of stereoacuity may have high test-retest agreement when patients with normal binocular sensory function are being assessed but may have poorer test-retest agreement when stereo-deficient patients are being assessed. Variability of repeatability as a function of having normal versus deficient stereopsis intrinsically reduces the validity of a test. The Bland-Altman plot provides information about such a relationship. The slope of the regression line of the data plotted in Figure 2 indicates that the variability of test-retest scores does not vary as a function of functional stereoacuity.

The repeatability of a test may also be influenced by other characteristics of the population being tested, including their patient status or their age at the time of testing. To determine whether the repeatability of the Randot Preschool Stereoacuity test varies as a function of patient status, we compared its repeatability by patients with accommodative strabismus with its repeatability by healthy control participants. Any influence of patient status on repeatability would suggest that the Randot Preschool Stereoacuity test is not an ideal test for screening for binocular vision disorders. We found no influence of patient status on the repeatability of the test. Similarly, in order for a test to be appropriate for a range of ages, age at the time of test should not influence repeatability. We found no influence of age at the time of test on test-retest agreement.

Together with previous reports of high success rate and high clinical efficacy of the Randot Preschool Stereoacuity test, the high inter-tester reliability supports its use in clinical management and research settings for the quantitative assessment of binocular sensory vision and in multicentered research studies.

References

1. Simons K, Reinecke RD. A reconsideration of amblyopia screening and stereopsis. Am J Ophthalmol 1974;78:707-13.
2. Reinecke RD, Simons K. A new stereoscopic test for amblyopia screening. Am J Ophthalmol 1974;78:714-21.
3. Levy NS, Glick EB. Stereoscopic perception and Snellen visual acuity. Am J Ophthalmol 1974;78:722-4.
4. Clarke WN, Noel LP. Stereoacuity testing in the monofixation syndrome. J Pediatr Ophthalmol Strabismus 1990;27:161-3.
5. Birch EE, Williams C, Hunter J, Lapa MC, and the ALSPAC “Children in Focus” Study Team. Random dot stereoacuity of preschool children. J Pediatr Ophthalmol Strabismus 1997;34:217-22.
6. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307-10.
7. Birch EE, Hale L. Operant assessment of stereoacuity. Clin Vis Sci 1988;4:295-300.
