Reliability of Visual Field Results over Repeated Testing
JOANNE KATZ, MS, ALFRED SOMMER, MD, KATHE WITT, COMT
Abstract: Fifty-one normal subjects, 337 with ocular hypertension, and 55 patients with glaucoma underwent C-30-2 testing on the Humphrey Field Analyzer on at least three occasions over a 6-year period. The time between tests was approximately 1 year. Using the manufacturer's standard for a reliable field (false-positive and false-negative rates, <33%; fixation losses, <20%), no trends in the proportion of reliable fields or the component indices were observed over time. Four percent of normal subjects, 9% of those with ocular hypertension, and 8% of patients with glaucoma were unable to meet the reliability standard every time they were tested. This repeated lack of reliability was due almost exclusively to fixation losses. However, patients with glaucoma were more likely to have repeatedly high false-negative responses than those with ocular hypertension or normal subjects, providing further evidence that false-negative responses are more indicative of glaucoma than of patient reliability. Ophthalmology 1991; 98:70-75
Automated perimetry has become the standard method by which patients at risk for glaucoma and those with documented field loss are followed in clinical practice and in research studies. The usefulness of automated perimetry is conditional on the ability of the patient to understand instructions, fixate adequately, and provide timely responses to presented stimuli. The Humphrey Field Analyzer (Allergan Humphrey, San Leandro, CA) provides an estimate of these abilities by testing fixation losses and false-negative and false-positive responses during the test. As part of an ongoing research study, automated visual field testing has been performed at approximately yearly intervals on normal subjects and on patients with ocular hypertension and glaucoma. The extent to which subjects can perform this test reliably over time has important implications for our ability to use automated perimetry for follow-up of patients. It has been previously shown that a large proportion of patients with glaucoma and normal subjects in this study had "unreliable" test results on their first automated perimetric examination.¹ Bickler-Bluth et al² found that 35% of patients with ocular hypertension had unreliable results on initial testing and that the proportion declined to 26% by the second testing. In this article, we examine whether the reliability of automated perimetry improves with yearly testing among normal subjects and patients with ocular hypertension and glaucoma. We also sought to identify whether a subgroup of patients was repeatedly unreliable, thus necessitating manual perimetry to confirm diagnoses or field changes.

Originally received: June 26, 1990. Revision accepted: September 4, 1990. From the Dana Center for Preventive Ophthalmology, Wilmer Institute, Johns Hopkins Medical Institutions, Baltimore. Supported by research grants EY03605, EY05092, and RR04060 from the National Institutes of Health, Bethesda. Reprint requests to Joanne Katz, MS, Wilmer Institute, Room 120, Johns Hopkins Hospital, 600 N Wolfe St, Baltimore, MD 21205.
SUBJECTS AND METHODS
The Nerve Fiber Layer Study is an ongoing study at the Wilmer Institute that follows self-referred subjects at yearly intervals.³ Informed consent is obtained at the time of enrollment into the study. Each subject undergoes a comprehensive ocular examination, optic disk and nerve fiber layer photographs are taken, and a medical and ocular history is completed. Threshold-related suprathreshold kinetic and static manual perimetry is performed on all subjects. A diagnosis of glaucoma is made on the basis of intraocular pressure greater than 21 mmHg and one or more visual field defects detected on at least two different occasions. A visual field defect is defined as a nasal step at least 10 degrees wide and present to at least two isopters, or as a paracentral or full arcuate scotoma at least 0.4 log units deep. Patients with confirmed elevated intraocular
KATZ et al
•
VISUAL FIELD RESULTS
pressure who have normal visual fields on manual perimetry or unconfirmed early defects not meeting the above definition of glaucoma are considered to have ocular hypertension. Subjects are defined as normal only if the visual field is entirely normal on manual perimetry, pressures are consistently 21 mmHg or lower, and they have no family history of glaucoma. Subjects with visual field defects due potentially to other ocular disorders are excluded from the normal group.

Recruitment began in 1981, and patients return each year for follow-up. In 1984, we began using the C-30-2 test of the Humphrey Field Analyzer as the annual screening test, but manual perimetry remained the standard test for diagnosis. By the end of 1989, 51 normal subjects, 337 with ocular hypertension, and 55 patients with glaucoma had been tested at least three times using the C-30-2. For those with more than three visits, the fourth test was also used. When both eyes were tested and met the same diagnostic definition, the right eye was selected for analysis. Visual field testing was done annually or as close to annually as possible. For almost all subjects, the first C-30-2 test performed in our study was their first experience with automated perimetry. However, all had previously undergone manual perimetry. In addition, all glaucoma patients were managed outside of the study and were likely to be undergoing other visual field testing at more frequent intervals during the time of study.

The C-30-2 test measures threshold values within the central 30-degree field using a staircase technique.⁴,⁵ Before the initiation of the C-30-2 test, a foveal threshold test is given to each subject. If the threshold seems too high or too low relative to the patient's acuity, the test is repeated. The results of the foveal threshold test give the technician an indication of which patients may need additional instructions or have trouble fixating.
After the foveal threshold test, four primary points are tested in each quadrant. If fixation is poor, the test is stopped and the patient is reinstructed. The blind spot is then mapped. If fixation is poor during this phase, further instruction is given and the test is restarted. In a small proportion of subjects (<3%) the blind spot is enlarged and, therefore, is unmappable with the size III stimulus. The test is repeated using the size V stimulus. Similarly, if the stimulus is always seen in the blind spot, the test is repeated using the size I stimulus. This was the case for 3% of subjects. At random times during the test, stimuli are presented within the blind spot. If the subject responds to the stimuli, a fixation loss is recorded. If the blind spot cannot be mapped because the peripapillary depression is absolute to the largest stimulus (size V), the blind spot check for fixation loss is turned off. Under these circumstances, the rate of fixation loss is not available for analysis. This occurred in 14 of the total 1465 tests (<1%) used in this analysis. False-positive responses were noted when a subject responded positively to the sound of the motor moving but no stimulus was actually presented. When a patient failed to respond to a stimulus 9 dB brighter than one they saw previously at the same location, a false-negative response was recorded. The Humphrey Field Analyzer's criteria for reliability are fixation losses less than 20% and
false-positive and false-negative rates less than 33%. These are the catch trial cut-offs for tests that comprise the Humphrey's normal database from which global indices and deviation plots are calculated. The fixation loss monitor was turned on for all but 14 tests performed in this study. In one test, no false-positive catch trials were performed. In three tests, there were no false-negative trials performed because there were insufficient locations with measurable sensitivity for such trials to be conducted. In these cases, the false-positive or -negative rates were considered missing data.

The proportion of tests for which any one of the three catch trials exceeded the manufacturer's standard was calculated for each visit. Such tests were considered unreliable. The proportion of fixation losses greater than or equal to 20% and false-positive and -negative rates greater than or equal to 33% also was calculated separately for each visit.

A logistic regression model was used to assess whether the reliability of subjects worsened or improved over time. This was done by regressing the logit of the probability of being reliable against diagnostic group and time using the statistical package GLIM.⁶ This was equivalent to estimating a common trend in the proportion of reliable subjects over time, while allowing the overall proportion of reliable subjects in each diagnostic group to differ. An interaction between diagnostic group and time was added to the model to assess whether or not time trends were different among the three groups. The overall proportion of reliable subjects in each group was compared after adjusting for the fact that the same subjects were measured several times.
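For readers who prefer symbols, the two regression models described above can be written roughly as follows. The notation is introduced here for illustration only and is not taken from the original analysis: p_{gv} denotes the probability that a subject in diagnostic group g produces a reliable field at visit v, \alpha_g is a group-specific intercept, and \beta is the time trend. Coding time as the visit number v is an assumption; the text says only that "time" was a covariate.

```latex
% Common time trend, group-specific overall level of reliability
\[
\operatorname{logit}(p_{gv}) \;=\; \log\frac{p_{gv}}{1 - p_{gv}} \;=\; \alpha_g + \beta\, v
\]

% Same model with a group-by-time interaction (a separate slope per group)
\[
\operatorname{logit}(p_{gv}) \;=\; \alpha_g + \beta_g\, v
\]
```

Under this reading, a value of \beta close to zero corresponds to no change in reliability across visits, and comparing the two models asks whether the slopes \beta_g differ among the three groups.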
OPHTHALMOLOGY • JANUARY 1991 • VOLUME 98 • NUMBER 1

RESULTS
Fifty-one normal subjects, 337 with ocular hypertension, and 55 with glaucoma with three or more visual fields were included in the analysis. Of these, 32 normal subjects, 253 with ocular hypertension, and 22 patients with glaucoma had three visits, whereas the remainder had four visits (Table 1). For the group with three tests, the median time between tests was 14 months. For those with four tests, the median was 16 months. Patients with glaucoma were older than the patients with ocular hypertension, and normal subjects were the youngest (Table 2). Visual acuities were similar in all three groups, with only 2.3% of tests performed with an acuity worse than 20/40.

Table 1. Number and Frequency of Visits

                              Three Visits       Four Visits        Total
Normal                        32                 19                 51
Ocular hypertension           253                84                 337
Glaucoma                      22                 33                 55
Median time between visits    14 mos (13-17)*    16 mos (15-18)*

* Interquartile range (75% of data fall between these limits).

Table 2. Baseline Age Distribution of Subjects

               Normal          Ocular Hypertension    Glaucoma
Age (years)    N     (%)       N      (%)             N     (%)
<40            13    (25)      44     (13)            2     (4)
40-49          16    (31)      62     (18)            6     (11)
50-59          8     (16)      98     (29)            14    (25)
60-69          10    (20)      92     (27)            21    (38)
70+            4     (8)       41     (12)            12    (22)
Total          51    (100)     337    (100)           55    (100)

Using the Humphrey's criteria for reliability, a similar proportion of subjects tested as unreliable at each visit (P = 0.6). No trend was observed among any of the three groups separately or when combined (Table 3). Patients with glaucoma were less reliable than those with ocular hypertension (P = 0.01), who were less reliable than normal subjects (P < 0.001). Table 4 gives the proportion of subjects with high fixation losses, false-positive, and false-negative rates separately. Very few subjects had false-positive rates of 33% or greater. False-negative rates of 33% or more were low among normal subjects and those with ocular hypertension but higher among patients with glaucoma (P = 0.007). Fixation losses of 20% or more accounted for the greatest number of unreliable tests among normal subjects, those with ocular hypertension, and patients with glaucoma. All three groups had rates ranging from 15 to 31%. No trends over time were observed for the proportion with high fixation losses (P = 0.27), false-negative (P = 0.17), or false-positive rates (P = 0.11). There were no significant time trends among any of the three diagnostic groups when estimated separately. We divided subjects into those younger than 60 years of age and those 60 years of age and older. No time trends were observed in either age group.

Table 3. Proportions and Percentages of Unreliable Tests*

         Normal               Ocular Hypertension    Glaucoma
Visit    Proportion   (%)     Proportion   (%)       Proportion   (%)
1st      9/47         (19)    95/337       (28)      20/54        (37)
2nd      10/51        (20)    100/333      (30)      21/53        (40)
3rd      9/51         (18)    94/336       (28)      20/54        (37)
4th      5/19         (26)    27/84        (32)      10/31        (32)

* Fixation loss ≥20% or false-positive or false-negative results ≥33%.

Table 4. Proportions and Percentages of Unreliable Tests

                       Normal               Ocular Hypertension    Glaucoma
                       Proportion   (%)     Proportion   (%)       Proportion   (%)
False-positive ≥33%
  1st visit            0/51         (0)     13/337       (4)       1/54         (2)
  2nd visit            1/51         (2)     17/337       (5)       4/55         (7)
  3rd visit            2/51         (4)     14/337       (4)       2/55         (4)
  4th visit            0/19         (0)     7/84         (8)       3/33         (9)
False-negative ≥33%
  1st visit            3/51         (6)     7/337        (2)       11/55        (20)
  2nd visit            1/51         (2)     12/336       (4)       10/54        (19)
  3rd visit            0/51         (0)     6/337        (2)       9/54         (17)
  4th visit            0/19         (0)     2/84         (2)       3/33         (9)
Fixation loss ≥20%
  1st visit            7/47         (15)    87/336       (26)      11/55        (20)
  2nd visit            10/51        (20)    84/333       (25)      13/54        (24)
  3rd visit            9/51         (18)    89/335       (26)      12/55        (22)
  4th visit            5/19         (26)    26/84        (31)      9/31         (29)

Of those with three visits, 3% of normal subjects, 10% of patients with ocular hypertension, and 5% of patients with glaucoma had at least one catch trial that exceeded the manufacturer's standard on all three occasions they were tested (Table 5). Ten percent of normal subjects, 26% of those with ocular hypertension, and 38% of patients with glaucoma had two or more such unreliable fields. Among those with four visits, 6% of normal subjects, 8% of those with ocular hypertension, and 10% of patients with glaucoma were never able to produce a reliable field. Twenty-three percent, 24%, and 20% of normal subjects, those with ocular hypertension, and patients with glaucoma, respectively, had three or more unreliable fields. Combining all visits, 4% of normal subjects, 9% of those with ocular hypertension, and 8% of patients with glaucoma were never able to produce a reliable test. Although fewer normal subjects were unable to produce a reliable test than either those with ocular hypertension or patients with glaucoma, this difference was not statistically significant. Thirty-six percent (17/47) of normal subjects, 51% (170/332) of those with ocular hypertension, and 68% (34/50) of patients with glaucoma had at least one unreliable field examination.

Table 5. Number of Unreliable Visual Fields*

                 Normal          Ocular Hypertension    Glaucoma
Number           N     (%)       N      (%)             N     (%)
Three Visits
  0              20    (69)      126    (51)            7     (33)
  1              6     (21)      59     (24)            6     (29)
  2              2     (7)       40     (16)            7     (33)
  3              1     (3)       24     (10)            1     (5)
  Total          29    (100)     249    (100)           21    (100)
Four Visits
  0              10    (56)      36     (43)            9     (31)
  1              3     (17)      20     (24)            6     (21)
  2              1     (6)       7      (8)             8     (28)
  3              3     (17)      13     (16)            3     (10)
  4              1     (6)       7      (8)             3     (10)
  Total          18    (100)     83     (100)           29    (100)

* Fixation loss ≥20% or false-positive or false-negative results ≥33%.

Fixation losses were the prime reason for almost all the repeatedly unreliable tests. Repeatedly high false-positive and false-negative rates were not a problem among normal subjects (Tables 6 and 7). Among patients with ocular hypertension, repeatedly high false-positive and false-negative rates accounted for a small fraction of unreliable tests. However, repeatedly high false-negative rates were responsible for approximately one third of repeatedly unreliable tests among patients with glaucoma.

Table 6. Number of Unreliable Visual Fields for Those with Three Visits

                       Normal          Ocular Hypertension    Glaucoma
Number                 N     (%)       N      (%)             N     (%)
False-positive ≥33%
  0                    30    (94)      230    (91)            20    (95)
  1                    1     (3)       15     (6)             1     (5)
  2                    1     (3)       7      (3)             0     (0)
  3                    0     (0)       1      (0)             0     (0)
  Total                32    (100)     253    (100)           21    (100)
False-negative ≥33%
  0                    30    (94)      238    (94)            15    (68)
  1                    2     (6)       9      (4)             4     (18)
  2                    0     (0)       5      (2)             3     (14)
  3                    0     (0)       0      (0)             0     (0)
  Total                32    (100)     252    (100)           22    (100)
Fixation loss ≥20%
  0                    21    (72)      133    (54)            11    (52)
  1                    5     (17)      61     (25)            4     (19)
  2                    2     (7)       36     (15)            5     (24)
  3                    1     (3)       18     (7)             1     (5)
  Total                29    (100)     248    (100)           21    (100)

Table 7. Number of Unreliable Visual Fields for Those with Four Visits

                       Normal          Ocular Hypertension    Glaucoma
Number                 N     (%)       N      (%)             N     (%)
False-positive ≥33%
  0                    19    (100)     70     (83)            27    (82)
  1                    0     (0)       11     (13)            4     (12)
  2                    0     (0)       1      (1)             1     (3)
  3                    0     (0)       2      (2)             1     (3)
  4                    0     (0)       0      (0)             0     (0)
  Total                19    (100)     84     (100)           33    (100)
False-negative ≥33%
  0                    17    (89)      79     (94)            19    (59)
  1                    2     (11)      3      (4)             6     (19)
  2                    0     (0)       1      (1)             5     (16)
  3                    0     (0)       1      (1)             1     (3)
  4                    0     (0)       0      (0)             1     (3)
  Total                19    (100)     84     (100)           32    (100)
Fixation loss ≥20%
  0                    10    (56)      38     (46)            16    (53)
  1                    3     (17)      20     (24)            7     (23)
  2                    2     (11)      6      (7)             3     (10)
  3                    2     (11)      12     (14)            1     (3)
  4                    1     (6)       7      (8)             3     (10)
  Total                18    (100)     83     (100)           30    (100)

Subjects were considered repeatedly unreliable if two or more of three tests had poor reliability or three or more of four tests had poor reliability. The manufacturer's criteria for poor reliability (false-positive or -negative rate ≥33% or fixation loss ≥20%) were used. Repeatedly unreliable subjects were slightly older at baseline than usually reliable subjects, although not significantly (Table 8). The average pupil size over all visits was slightly smaller among repeatedly unreliable patients with ocular hypertension and normal subjects, but not among patients with glaucoma. The distributions of visual acuity across all visits were similar for repeatedly unreliable and usually reliable subjects. Pupil size and visual acuity varied very little across visits.
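The two classification rules just described (a single test is unreliable if any catch-trial rate reaches its cutoff; a subject is repeatedly unreliable if at least two of three, or three of four, tests are unreliable) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: the function names are hypothetical, rates are assumed to be expressed as fractions between 0 and 1, and skipping a missing index (rather than excluding the whole test) is an assumption made here for simplicity.

```python
from typing import Optional, Sequence

FIXATION_LOSS_CUTOFF = 0.20   # fixation losses >= 20% count as unreliable
FALSE_RESPONSE_CUTOFF = 0.33  # false-positive or false-negative rate >= 33%


def is_unreliable(fixation_loss: Optional[float],
                  false_positive: Optional[float],
                  false_negative: Optional[float]) -> bool:
    """Return True if any available catch-trial rate meets or exceeds its cutoff.

    A rate of None means the index was not measured; it is skipped here,
    which is an assumption of this sketch, not a rule stated in the text.
    """
    checks = [
        (fixation_loss, FIXATION_LOSS_CUTOFF),
        (false_positive, FALSE_RESPONSE_CUTOFF),
        (false_negative, FALSE_RESPONSE_CUTOFF),
    ]
    return any(rate is not None and rate >= cutoff for rate, cutoff in checks)


def is_repeatedly_unreliable(unreliable_flags: Sequence[bool]) -> bool:
    """Apply the rule from the text: >=2 of 3 tests, or >=3 of 4 tests, unreliable."""
    n_tests = len(unreliable_flags)
    if n_tests not in (3, 4):
        raise ValueError("rule is defined only for subjects with 3 or 4 tests")
    # Both thresholds (2 of 3, 3 of 4) equal "all tests but at most one".
    return sum(unreliable_flags) >= n_tests - 1


if __name__ == "__main__":
    # Hypothetical subject: (fixation loss, false-positive, false-negative) per visit
    visits = [(0.25, 0.00, 0.10), (0.10, 0.00, 0.40), (0.05, 0.05, 0.00)]
    flags = [is_unreliable(*v) for v in visits]
    print(flags, is_repeatedly_unreliable(flags))
```

The example subject is invented; with two unreliable tests out of three, the rule labels the subject repeatedly unreliable.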
DISCUSSION
Nineteen percent of normal subjects, 28% of those with ocular hypertension, and 37% of patients with glaucoma were unreliable on initial automated testing. Bickler-Bluth et al² found 35% of patients with ocular hypertension to be unreliable initially. Others have reported that only 13% of patients with ocular hypertension were unreliable on their first testing.⁷ In our data, a large proportion of tests were found to be unreliable due to fixation losses, except among patients with glaucoma, for whom false-negative responses were more common than among patients with ocular hypertension or normal subjects. Others also have noted this association between glaucoma and false-negative responses,¹,⁸,⁹ a finding that may be explained by the increased short-term fluctuations exhibited by patients with glaucoma. A stimulus previously seen may no longer be visible and would be called a false-negative response. Patients with glaucoma also take longer to perform the test and may tire more easily than normal subjects.¹⁰ This would also contribute to an increased false-negative response rate.
Table 8. Comparison of "Repeatedly Unreliable" and "Usually Reliable" Subjects

                        Repeatedly Unreliable           Usually Reliable
Baseline Age (Years)    Mean      95% CI                Mean      95% CI
  Normal                52.0      (36.8, 67.2)          48.5      (42.9, 54.1)
  Ocular hypertension   59.1      (51.8, 66.4)          53.6      (51.4, 55.8)
  Glaucoma              62.7      (54.8, 70.7)          59.4      (53.6, 65.2)

Pupil Diameter (mm)     Mean      95% CI                Mean      95% CI
  Normal                3.8       (2.8, 4.8)            4.0       (3.7, 4.3)
  Ocular hypertension   3.6       (3.4, 3.8)            4.0       (3.9, 4.1)
  Glaucoma              3.6       (3.0, 4.2)            3.4       (2.9, 3.9)

Visual Acuity           Median    Range                 Median    Range
  Normal                20/25     (20/20-20/60)         20/20     (20/15-20/60)
  Ocular hypertension   20/20     (20/15-20/100)        20/20     (20/25-20/80)
  Glaucoma              20/30     (20/20-20/60)         20/25     (20/15-20/60)

It has been suggested that different rates of reliability from different studies may be due to population differences or differences in administration of the test.⁷,⁹ In our study, all patients were familiar with manual perimetry but had not previously undergone automated testing. An explanation was given before testing. Mapping of the blind spot was repeated if fixation was poor during this phase of the test. Those patients exhibiting fixation losses were encouraged to fixate as the test progressed. Although the perimetrist's skills are not as critical for automated as for manual perimetry, we recognize that quality control is important and may influence the number of fixation losses and false responses occurring during the test (unpublished data presented at 1990 ARVO meeting). Although the perimetrists follow a standard protocol for test procedures, as described in the Subjects and Methods section, the same perimetrist did not necessarily administer all the tests for an individual patient. Hence, it is possible that some improvement in reliability could be masked by the administration of the test by different perimetrists.

Fifty-one percent of our patients with ocular hypertension had at least one unreliable test. Similarly, Bickler-Bluth et al² found 54% of ocular hypertensives who underwent three visual field tests at 6-month intervals had at least one unreliable test. However, unlike them, we were unable to find any time trends in the reliability of normal subjects or those with ocular hypertension or glaucoma. Bickler-Bluth found that 35% of ocular hypertensives were unreliable initially, a finding that dropped
to 26% at 6 months and remained there at 12 months. One possible explanation for this difference may be that their patients were less familiar with any type of visual field testing at the outset and exhibited an initial learning effect that manifested itself in an increase in reliability from the first to second test but not from the second to the third. Although our subjects had not experienced automated perimetry before the initial C-30-2 test, they were all familiar with manual static and kinetic threshold perimetry. Because glaucoma patients were managed independently of our research study, they most likely also underwent some form of perimetric testing in addition to our annual tests. Our patients were also tested over longer intervals, i.e., 14 to 16 months, whereas Bickler-Bluth's subjects were tested at 6-month intervals.

We found that fewer normal subjects were repeatedly unreliable than were those with ocular hypertension or glaucoma. A higher proportion of glaucoma patients had repeatedly unreliable fields due to false-negative responses than did patients with ocular hypertension or normal subjects. This lends further weight to the idea that false-negative responses are associated with glaucoma, particularly as patients with ocular hypertension appeared to represent an intermediate group with regard to the proportion of patients with repeatedly high false-negative rates. We found that 10% of patients with ocular hypertension were unreliable on all three of their tests, whereas for those with four tests, 8% were unreliable each time. This is similar to the finding of Bickler-Bluth et al,² where 8.5% could not produce one reliable test at each of the three times they were tested.

Poor reliability has not been associated with age cross-sectionally,¹,² but Nelson-Quigg et al⁹ did find slightly higher fixation losses in older age groups. Those of our subjects who were repeatedly unreliable were slightly older, and those with ocular hypertension had somewhat smaller pupil diameters. Repeatedly unreliable subjects did not have poorer visual acuities than those who were usually reliable.

Only 4% of normal subjects were unable to produce even one reliable field. Fixation losses were almost exclusively responsible for the repeated lack of reliability in this group. The proportion of patients with ocular hypertension and glaucoma who were unable to produce one reliable field was double that of normal subjects (9% for ocular hypertensives and 8% for glaucoma patients). Fixation losses were an important source of repeated poor reliability among all groups. However, false-negative responses were a further factor in repeated poor reliability among patients with glaucoma. Fixation losses have been shown to have very little effect on the height and shape of the hill of vision, but high false-negative rates can produce visual fields whose overall sensitivity is depressed by an average of 9 decibels when compared with patients with low false-negative rates.¹¹ The group of subjects with repeatedly unreliable tests was small, but manual testing with more technician interaction might produce a more "reliable" visual field among those glaucoma patients with consistently high false-negative rates.

The current manufacturer's criteria for reliability attempt to capture three types of errors in response to the presentation of stimuli. Fixation losses are the most common errors made by normal and pathologic subjects. The reported rate of fixation loss may reflect incorrect initial mapping of the blind spot or other artifacts, in addition to genuine fixation loss. Genuine fixation loss might be identified by the presence of other catch trial errors within the same test or fixation losses that persist with repeat testing. False-negative responses appear to be closely associated with glaucoma and may not reflect, therefore, reliability per se, especially when there are no other catch trial errors or when patients have false-negative responses on repeat testing. Because of these issues, the most informative way to assess reliability is to examine the various catch trials separately, the association between types of catch trials within the same test session, and the repetition of catch trial errors on multiple testing.
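The closing recommendation (look at each catch trial separately, at errors that co-occur within one session, and at errors that repeat across sessions) can be illustrated with a short Python sketch. Everything in it is hypothetical scaffolding rather than part of the study: the dictionary keys, the function names, and the example rates are invented, and only the 20% and 33% cutoffs come from the text.

```python
from collections import Counter
from typing import Dict, List, Optional

# Manufacturer's cutoffs quoted in the text, expressed as fractions.
CUTOFFS = {"fixation_loss": 0.20, "false_positive": 0.33, "false_negative": 0.33}

Test = Dict[str, Optional[float]]


def exceeded_indices(test: Test) -> List[str]:
    """Names of the catch-trial indices that meet or exceed their cutoff in one test."""
    return [name for name, cutoff in CUTOFFS.items()
            if test.get(name) is not None and test[name] >= cutoff]


def summarize_subject(tests: List[Test]) -> dict:
    """Summarize one subject's catch-trial errors across repeated tests."""
    per_test = [exceeded_indices(t) for t in tests]
    # How many sessions each index was exceeded in (repetition across tests).
    times_exceeded = Counter(name for names in per_test for name in names)
    # Sessions in which two or more different indices were exceeded together.
    multiple = sum(1 for names in per_test if len(names) >= 2)
    return {
        "per_test": per_test,
        "times_exceeded": dict(times_exceeded),
        "tests_with_multiple_errors": multiple,
    }


if __name__ == "__main__":
    # Hypothetical subject with three annual tests; None marks an unmeasured index.
    tests = [
        {"fixation_loss": 0.25, "false_positive": 0.00, "false_negative": 0.40},
        {"fixation_loss": 0.30, "false_positive": 0.05, "false_negative": 0.10},
        {"fixation_loss": None, "false_positive": 0.10, "false_negative": 0.00},
    ]
    print(summarize_subject(tests))
```

Keeping the three indices separate, instead of collapsing them into one reliable/unreliable flag, is what lets a reader distinguish an isolated high fixation-loss rate from a fixation loss that persists or is accompanied by other catch-trial errors.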
REFERENCES
1. Katz J, Sommer A. Reliability indices of automated perimetric tests. Arch Ophthalmol 1988; 106:1252-4.
2. Bickler-Bluth M, Trick GL, Kolker AE, Cooper DG. Assessing the utility of reliability indices for automated visual fields. Testing ocular hypertensives. Ophthalmology 1989; 96:616-9.
3. Sommer A, Quigley HA, Robin AL, et al. Evaluation of nerve fiber layer assessment. Arch Ophthalmol 1984; 102:1766-71.
4. Humphrey Field Analyzer Operator's Manual: Model 620. San Leandro, CA: Allergan Humphrey, 1983. Available from Allergan Humphrey, 2992 Alvarado St, San Leandro, CA 94577.
5. Statpac User's Guide. San Leandro, CA: Allergan Humphrey, 1986. Available from Allergan Humphrey, 2992 Alvarado St, San Leandro, CA 94577.
6. Generalized Linear Interactive Modelling. Oxford, UK: Numerical Algorithms Group Ltd, 1985.
7. Hardage L, Stamper RL. Reliability indices for automated visual fields [Letter]. Ophthalmology 1989; 96:1810.
8. Heijl A, Lindgren G, Olsson J. Reliability parameters in computerized perimetry. In: Greve EL, Heijl A, eds. Seventh International Visual Field Symposium, Amsterdam, Sept 1986. Dordrecht: Martinus Nijhoff/Dr. W. Junk, 1987; 593-600. (Doc Ophthalmol Proc Ser; 49).
9. Nelson-Quigg JM, Twelker JD, Johnson CA. Response properties of normal observers and patients during automated perimetry. Arch Ophthalmol 1989; 107:1612-5.
10. Kosoko O, Sommer A, Auer C. Duration of automated suprathreshold vs quantitative threshold field examination. Impact of age and ocular status. Arch Ophthalmol 1986; 104:398-401.
11. Katz J, Sommer A. Screening for glaucomatous visual field loss: the effect of patient reliability. Ophthalmology 1990; 97:1030-7.