Reliable Change Index scores for persons over the age of 65 tested on alternate forms of the Rey AVLT

Reliable Change Index scores for persons over the age of 65 tested on alternate forms of the Rey AVLT

Archives of Clinical Neuropsychology 22 (2007) 513–518 Reliable Change Index scores for persons over the age of 65 tested on alternate forms of the R...

119KB Sizes 0 Downloads 23 Views

Archives of Clinical Neuropsychology 22 (2007) 513–518

Reliable Change Index scores for persons over the age of 65 tested on alternate forms of the Rey AVLT Robert G. Knight a,∗ , Jennifer McMahon b , C. Murray Skeaff b , Timothy J. Green b a

Department of Psychology, University of Otago, Box 56, Dunedin, New Zealand Department of Human Nutrition, University of Otago, Dunedin, New Zealand

b

Accepted 14 March 2007

Abstract An issue that often confronts the clinician referred an elderly person for neuropsychological assessment is how to interpret the significance of changes in test scores over time. In this report, data useful for estimating the statistical significance of changes on the Rey Auditory Verbal Learning Test (AVLT) are presented. The sample tested comprised 253 healthy persons aged 65 and over taking part in a randomized double-blind trial of the effect on cognitive performance of lowering homocysteine using dietary supplements. Results were based on the full sample because of the absence of any treatment effects. Test–retest data with a 1-year interval were used to estimate reliability coefficients and to calculate reliable change indices. The magnitude of a change necessary for a deterioration or improvement in scores at the two-tailed 90% confidence interval is given for the full sample, and persons above and below the age of 75. © 2007 National Academy of Neuropsychology. Published by Elsevier Ltd. All rights reserved. Keywords: Retest reliability; Rey AVLT; Old age; Reliable Change Index

1. Introduction Clinical neuropsychologists are frequently in the position of interpreting test results from clients who have been tested on more than one occasion to determine whether there has been a change in their cognitive function. A common circumstance where this arises is in the assessment of the elderly person with suspected dementia, such as Alzheimer’s disease, or the evaluation of a patient with a reversible cognitive decline (e.g., medication-induced toxicity) after treatment. The interpretation of changes in an individual patient’s test scores depends on being able to estimate the extent to which a score can be expected to change purely as function of measurement error. The Reliable Change Index (RCI; Jacobson & Truax, 1991) was designed to do this by using the known retest reliability and standard deviation of a specific test to calculate a standard error of measurement (SEm ) for the test and an estimate of the standard error of the difference between scores on the two testing occasions (SEdiff ). It is then possible to determine the probability of a particular change in scores over time occurring by chance and to calculate cut-off scores at conventional levels of probability to aid accurate decision making. The original RCI formula has been improved by including an adjustment to account for practice effects that might arise from having completed the test previously (Chelune, Naugle, L¨uders, Sedlak, & Awad, 1993). In this case, the ∗

Corresponding author. Tel.: +64 3 479 7623; fax: +64 3 479 8335. E-mail address: [email protected] (R.G. Knight).

0887-6177/$ – see front matter © 2007 National Academy of Neuropsychology. Published by Elsevier Ltd. All rights reserved. doi:10.1016/j.acn.2007.03.005

514

R.G. Knight et al. / Archives of Clinical Neuropsychology 22 (2007) 513–518

average difference between the standardisation sample’s test and retest scores is added to the individual’s retest score before determining the expected magnitude of change due to measurement error. The RCI plus practice effects (RCI + P) model has been evaluated against more complex regression-based models and found to give comparable results under most circumstances (e.g., Heaton et al., 2001; Temkin, Heaton, Grant, & Dikmen, 1999). Although analysis techniques such as general estimating equations provide a superior means of evaluating change in clinical trials, the RCI + P model has the distinct advantage of being readily understood and applied to the interpretation of test scores from individual patients (Chelune, 2002; Woods et al., 2006). The Rey Auditory Verbal Learning Test (AVLT) is a 15-item word list test, translated from French to English by Taylor (1959), that is used to assess immediate memory span, new learning, susceptibility to interference, and delayed recall and recognition. The administration of the test is described by Lezak, Howieson, and Loring (2004), and normative data for the measure have been collated by Mitrushina, Boone, and D’Elia (1999) and Spreen and Strauss (1998). The AVLT is commonly used to evaluate memory in the elderly and, because it has alternate forms, to track changes over time (Ivnik et al., 1990). Because practice is a significant factor on most tests of memory, even with lengthy retest intervals, alternate forms of tests such as the AVLT are typically administered on repeat testing. There is evidence that when alternate forms of AVLT are used, gains in performance seen with repeat administrations of the same form are minimized (Crawford, Stewart, & Moore, 1989). Changes in scores at retest on alternate forms administered on more than one occasion to older persons are likely to be the product of factors such as measurement error, increased age, differences in the difficulty and reliability of the parallel forms of the test, and increased familiarity with testing procedure on the second occasion (for a comprehensive discussion of these issues, see McCaffrey, Duff, & Westervelt, 2000). Some of these factors serve to increase retest scores, while others, such as age, may result in a decline; but for the sake of consistency with the literature and ease of exposition, the change in mean scores over time is referred to as a practice effect in what follows. The clinical interpretation of differences in test scores requires that the aggregated effect of these influences be taken into account, regardless of their source. A major obstacle to the widespread use of reliable change indices to interpret changes in cognitive performance is the scarcity of relevant published retest data for neuropsychological tests. This is the case for Rey AVLT; although many studies using the measure have been reported (McCaffrey et al., 2000), there are few published retest reliability estimates that might be used to compute RCIs for the elderly. The data for the present report came from a sample of 253 healthy older persons who were taking part in a trial of the effect of dietary supplements on cognition, and who were administered a battery of neuropsychological tests, including the Rey AVLT, at baseline and 12 months later. Since no evidence was found for the efficacy of the treatment regimen over the 2-year clinical trial period, the findings from the study could be used to determine reliability estimates for the AVLT. 2. Method 2.1. Participants The present data came from the baseline and 1-year follow-up phases of a 2-year clinical trial assessing the effects of lowering plasma homocysteine concentration by the administration of folate (L-5-methyltetrahydrofolate), and vitamins B12 and B6, on the cognitive function of healthy participants, aged over 65. The outcome of the trial provided no evidence that lowering plasma homocysteine concentration improved cognitive performance in the elderly (McMahon et al., 2006) and, accordingly, the data were suitable for the calculation of retest statistics. The trial began with the recruitment of 465 community volunteers over the age of 65 from Dunedin and surrounding communities. This was achieved through a process of newspaper advertisement, distribution of leaflets, and contact with community groups, asking for volunteers in good health to take part in a study on the effectiveness of vitamin supplements. All eligible volunteers were asked to attend a screening session at a morning clinic, following an over night fast, to evaluate their suitability for the trial. At the screening session, participants provided a blood sample, were measured for height and weight, interviewed to determine whether they had any ongoing health problems, and completed a general questionnaire in which they recorded all current medications, tea, coffee, and alcohol consumption (factors known to affect homocysteine metabolism), and activity levels. Finally, they completed the Geriatric Depression Scale (GDS) and the Center for Epidemiologic Studies Depression Scale (CES-D; Radloff, 1977). Participants were eligible for inclusion in the trial if they had a total plasma homocysteine concentration ≥13 ␮mol/L and a normal creatinine clearance (133 ␮mol/L in men and 115 ␮mol/L in women). Following the screening session,

R.G. Knight et al. / Archives of Clinical Neuropsychology 22 (2007) 513–518

515

persons were excluded from further participation if they had any evidence of disease where the disorder, or its treatment, might affect homocysteine levels or cognitive test scores. Participants were excluded if they were taking medications known to interfere with folate metabolism (e.g., oral hypoglycemics, antiepiletics, and antidepressants), any vitamin supplements containing folic acid, or vitamin B12, or B6, had CES-D scores greater than 16, or reported any prior history of renal disease, diabetes, major depression, psychosis, and cancer. Since one aim of the trial was to determine if elevated homocysteine levels were a risk factor for dementia, particular care was taken to exclude persons with suspected or diagnosed dementia, or who had any history of stroke or transient ischemic attacks. Since the trial extended over 2 years, it was possible to validate these exclusion criteria, and three persons who developed clinical symptoms of dementia during the trial were excluded retrospectively from the baseline and follow-up data. A total of 290 persons with elevated concentrations of homocysteine and normal creatinine levels were invited to take part in the randomized double-blind trial; of these 276 agreed to take part. Of the 276 persons who entered the trial, 253 completed the AVLT at both the baseline and the 12-month testing sessions. Reasons for failure to complete both sessions were death (6), onset of dementia or chronic disease (9), unwilling to complete tests of cognition (3), death or illness of spouse (2), and failed to complete AVLT (3). As a further check on a possible diagnosis of dementia, all participants completed the Mini-Mental State examination (MMSE; Folstein, Folstein, & McHugh, 1975), and all had a score greater 24 (M = 29.18, S.D. = 1.01). The sample was composed of a total of 112 males and 141 females with an average age of 73.51 (S.D. = 5.73, range = 65–90). All but 1% of the baseline sample described their ethnic origin as European (approximately 92% of the New Zealand population over age of 65 are European). Participants were asked about their highest educational attainment. In all, 9% had no high school education, 27% had less than 3 years high school education, 12% had more than 3 years high school education, 38% had attained vocational qualifications (e.g., in trades or nursing), and 14% had attended university. Their average National Adult Reading Test (NART; Nelson & Willison, 1991) correct score was 36.22 (S.D. = 7.08), which equates to a predicted Full Scale Wechsler IQ score of 116. 2.2. Procedure All volunteers were randomly assigned to either the supplement treatment or placebo conditions, and an appointment made for them to attend a 90-min baseline testing session at a convenient community hall. They were sent copies of the GDS and CES-D to complete and bring with them to the testing session. On arrival, these forms were collected and the volunteers completed a questionnaire about exercise levels, consumption of tea, coffee, and alcohol, and changes in medication. Each participant was then administered the following sequence of cognitive tests for the clinical trial: The MMSE, Logical Memory paragraphs adapted from the Wechsler for use by New Zealand participants, a Category naming task, the immediate recall trials of the AVLT, the NART, 20 items from Raven’s Progressive Matrices, the delayed trials of the AVLT, a semantic fluency test, and Parts A and B of the Trail Making Test (TMT). All testing was conducted by a trained health professional (JM) individually. Results for the AVLT, MMSE, TMT, and fluency measures for the 272 persons who completed all the cognitive tests at baseline are reported elsewhere (Knight, McMahon, Green, & Skeaff, 2006). The cognitive tests were re-administered 12 months later, in the same order, with each participant being tested as near as possible to his or her anniversary date, and on the same day of the week and time of day as previously. The general procedure for the administration of the Rey AVLT is described by Lezak et al. (2004, pp. 422–426). On the first administration at baseline, Lists A and B, taken from Taylor (1959) and reproduced by Lezak et al. (2004), were used. List A was administered on five occasions, followed by an interference trial using List B, and a seventh trial, using List A again. The instructions used for each of these trials (and for both of the delayed trials) were those given in Lezak et al. (2004). After a 30-min delayed, filled by completion of other tests of cognition, and the measurement of the participant’s blood pressure, the delayed recall and recognition of the words in List A were tested. The words read to the participant in the recognition trial are those listed by Lezak et al. (2004, Table 11-6), for use with Lists A and B. On the second administration 12 months later, the same procedure was used as for the previous occasion, except that two new word lists developed by Crawford et al. (1989) were used. The word lists and the words used in the recognition trial appear in Crawford et al. (1989, A), and are reproduced as lists AC and BC in Lezak et al. (2004, Tables 11-4 and 11-6). As part of the clinical trial, testing was repeated again 24 months after baseline, using the lists labelled JG and C in Lezak et al. (2004, p. 423).

516

R.G. Knight et al. / Archives of Clinical Neuropsychology 22 (2007) 513–518

3. Results and discussion At the conclusion of the trial, 24 months after baseline testing, the treatment condition for each participant was known and it was possible to determine whether the supplements had any effect on cognition. In a preliminary analysis, a series of t-tests comparing the treated (n = 127) and placebo (n = 126) groups on the Rey AVLT variables reported in Table 1 at the 12-month follow-up were conducted. There were no significant differences between groups on any of the variables, suggesting that the dietary supplements had no effect on AVLT scores after 12 months (these findings were confirmed in the final analysis of the data, conducted at the end of the trial; McMahon et al., 2006). As a further check on the validity of combining data from the treated and placebo groups, retest correlations were computed separately for the placebo and treated groups. There were no significant differences between the retest correlations for the two groups using a Fisher r-to-z transformation to compute the standard score difference between the two groups. The largest absolute difference between any pair of retest correlations was .09 for Trial 1 (zdiff = .99, p = .32). For this reason, it was decided to use the retest correlations for the entire sample, and not just for the placebo group, in the computation of RCI values. The means and standard deviations for the AVLT scores for the total sample at Time 1 are given in Table 2 and those for Time 2 in Table 1. Paired t-tests were used to assess mean differences between testing occasions; significant differences were found for Trial 5, Sum 1–5, Trial 7, and Delayed recall (Table 1) justifying the use of the RCI + P statistic (rather than RCI alone) for assessing change scores. Because the study did not counterbalance the administration the AVLT forms over time, it is not possible to calculate the proportion of the general decline in scores attributable to either age-related decrements or differences in the alternate forms. Cross-sectional data (e.g., as presented in Table 3) clearly demonstrate that there is a decline in performance as a consequence of age on any specific version of the test. The rate of decline, however, is less than one word for each 12 months of age on average for the Sum 1–5 scores, which suggests that age-related effects did not explain the total magnitude of change between Time 1 and Time 2 in the present study. It is likely that the versions of the AVLT used are not exactly equivalent in difficulty, indeed it would be surprising if they were, and that this contributed to the change in scores on retesting. In Table 2, retest correlations and RCI values are presented for the total sample and separately for participants above and below the age of 75. Given that the retest interval was 1 year and alternate forms of the AVLT were used, the estimates of retest reliabilities are substantial and comparable to previous findings (e.g., Ryan, Geisser, Randall, & Georgemiller, 1986; Uchiyama et al., 1995). Reliabilities increased from Trial 1 to Trial 5, and that for the total sample were in the range .70 to .80 for several of the indices. The most reliable measure proved to be the sum of scores across the five acquisition trials, which is to be expected, since all other things being equal, reliability is proportional to test length. In the final two columns of Table 2, the magnitude of a change necessary for a deterioration or improvement in scores at the two-tailed 90% confidence interval (z = 1.645) are given. This defines the boundaries that would encompass 90% of the individuals in the sample, with 5% lying beyond the lower interval and 5% above. Other values for determining the statistical significance of a change in performance can be computed by multiplying the SEdiff by the appropriate value of z and adding the practice effect. For example, an RCI value exceeding z = 1.96 + practice defines the boundaries where 2.5% of the sample would either exceed or 2.5% would fall below the scores of 95% of the total sample.

Table 1 Means and standard deviations for the AVLT scores at Time 2 and t-tests for Time 1 and Time 2 differences Time 1 − Time 2

Time 2

Trial 1 Trial 5 Sum 1–5 Interference Trial 7 Delay Recognition

M

S.D.

t

p

4.99 10.30 41.20 4.71 7.55 7.05 12.40

1.66 2.64 9.83 1.74 3.21 3.34 2.73

0.66 3.70 2.86 0.30 3.63 4.67 0.58

.51 .001 .005 .77 .001 .001 .56

R.G. Knight et al. / Archives of Clinical Neuropsychology 22 (2007) 513–518

517

Table 2 Reliable change plus practice for Rey AVLT scores Time 1 M

Practice

rχ␹

SEm

RCI + practice*

SEdiff

S.D.

Deteriorate

Improve

Total sample (n = 253) Trial 1 5.05 Trial 5 10.74 Sum 1–5 42.34 Interference 4.67 Trial 7 8.13 Delay 7.73 Recognition 12.29

1.70 2.64 9.85 1.69 3.31 3.61 2.68

−0.06 −0.44 −1.14 0.04 −0.58 −0.68 0.11

.53 .75 .79 .39 .71 .74 .67

1.17 1.32 4.51 1.32 1.78 1.84 1.54

1.65 1.87 6.38 1.87 2.52 2.60 2.18

−2.77 −3.51 −11.64 −3.03 −4.73 −4.96 −3.47

2.65 2.63 9.36 3.11 3.57 3.60 3.69

Age less than 75 (n = 153) Trial 1 5.42 Trial 5 11.41 Sum 1–5 45.16 Interference 4.97 Trial 7 8.97 Delay 8.64 Recognition 12.94

1.62 2.38 9.07 1.65 3.19 3.33 2.06

−0.06 −0.34 −0.95 0.28 −0.77 −0.73 0.15

.45 .65 .74 .40 .64 .67 .53

1.20 1.41 4.62 1.28 1.91 1.91 1.41

1.69 1.99 6.54 1.81 2.71 2.71 2.00

−2.85 −3.61 −11.71 −2.69 −5.22 −5.18 −3.13

2.73 2.94 9.81 3.25 3.68 3.72 3.43

Age 75+ (n = 100) Trial 1 Trial 5 Sum 1–5 Interference Trial 7 Delay Recognition

1.67 2.69 9.47 1.66 3.11 3.57 3.18

−0.08 −0.58 −1.45 −0.34 −0.30 −0.60 0.03

.56 .80 .79 .23 .74 .77 .73

1.11 1.20 4.34 1.46 1.59 1.71 1.65

1.57 1.70 6.14 2.06 2.24 2.42 2.34

−2.66 −3.38 −11.55 −3.73 −3.99 −4.58 −3.87

2.50 2.22 8.65 3.05 3.39 3.38 3.81

4.50 9.71 38.03 4.21 6.87 6.33 11.30

Note. rχχ = 12-month retest reliability; SEm = standard error of measurement; SEdiff = standard error of the difference between the test–retest scores; RCI = Reliable Change Index. * Given for 90% confidence interval, two-tailed.

As a check on the stability of the results in Table 2, when the AVLT was re-administered at the 24-month follow-up, the retest statistics for the 12- and 24-month period were computed for the Sum 1–5 data. The test–retest correlation was .79, the mean for the total sample at 12 months was 41.4 (S.D. = 9.71), the practice effect was 2.69, the SEm was 4.42, and the SEdiff was 6.25. There was close agreement between the results for the SEdiff values calculated for the baseline to 12-month data (6.38) and the 12–24-month follow-up (6.25), providing convincing evidence of the stability of the reliability estimates. One concern in the interpretation of the results is that the sample of healthy elderly persons may have been atypical because of their elevated homocysteine levels, or differ from English-speaking samples tested in other countries. To assess this, comparisons were made with a study of 394 cognitively intact adults from Minnesota aged over 55 that used a similar testing procedure for the AVLT, together with stringent exclusion criteria for volunteers (Ivnik et al., 1990). The authors provided normative data for age ranges that were the same as those reported for the baseline testing Table 3 Total learning acquisition scores over five trials from Ivnik et al. (1990) and Knight et al. (2006) Age (years)

65–69 70–74 75–79 80–84 85+

Ivnik et al. (1990)

Knight et al. (2006)

N

M

S.D.

N

M

S.D.

64 67 69 49 47

45.8 43.1 39.6 36.2 34.4

9.5 9.1 8.7 7.4 8.6

77 83 68 26 18

46.3 43.6 39.9 36.2 31.8

8.6 9.9 9.2 8.9 8.9

518

R.G. Knight et al. / Archives of Clinical Neuropsychology 22 (2007) 513–518

of the present sample (Knight et al., 2006). The data for the average Sum 1–5 scores (total learning scores in Ivnik et al., 1990) from the two studies are presented in Table 3. Of note is the comparability of the means and standard deviations for the two samples, providing evidence for the credibility of the psychometric data derived from the present study sample. The primary aim of the present report was to provide psychometric information about the AVLT, a memory test widely used in clinical practice with the elderly. The application of the data in Table 2 allows the interpretation of an individual client’s changes in scores on the AVLT over time. It is important to note that this procedure is valid only when the specific alternate forms are used that were employed in this study, and test administration follows the guidelines published by Lezak et al. (2004). It should also be noted that the sample tested were in good health and had no psychiatric or medical conditions that might have affected their memory performance. The participants were predominantly White European and were all native English speakers. The correlations between variables in samples with different ethnic composition or containing persons with frail health may well be different. In interpreting an individual’s test score discrepancy, confounding factors such as chronic ill health, psychiatric disorder, or depression need to be taken into account. References Chelune, G. J. (2002). Making neuropsychological outcomes research consumer friendly: A commentary on Keith et al. (2002). Neuropsychology, 16, 422–425. Chelune, G. J., Naugle, R. I., L¨uders, H., Sedlak, J., & Awad, I. A. (1993). Individual change after epilepsy surgery: Practice effects and base-rate information. Neuropsychology, 7, 41–52. Crawford, J. R., Stewart, L. E., & Moore, J. W. (1989). Demonstration of savings on the AVLT and development of a parallel form. Journal of Clinical and Experimental Neuropsychology, 11, 975–981. Folstein, M., Folstein, S., & McHugh, P. (1975). Mini-Metal State: A practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research, 12, 189–198. Heaton, R. K., Temkin, N., Dikmen, S., Avitable, N., Taylor, M. J., Marcotte, T. D., et al. (2001). Detecting change: A comparison of three neuropsychological methods using normal and clinical samples. Archives of Clinical Neuropsychology, 16, 75–91. Jacobson, N. S., & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59, 12–19. Knight, R. G., McMahon, J., Green, T. J., & Skeaff, C. M. (2006). Regression equations for predicting scores of persons over 65 on the Rey Auditory Verbal Learning Test, Trail Making Test, and Semantic Fluency measures. British Journal of Clinical Psychology, 45, 1–11. Ivnik, R. J., Malec, J. F., Tangalos, E. G., Petersen, R. C., Kokmen, E., & Kurland, L. T. (1990). The Auditory Verbal Learning Test (AVLT): Norms for age 55 and older. Psychological Assessment, 2, 304–312. Lezak, M. D., Howieson, D. B., & Loring, D. W. (2004). Neuropsychological assessment. New York: Oxford University Press. McCaffrey, R. J., Duff, K., & Westervelt, H. J. (Eds.). (2000). Practitioner’s guide to evaluating change with neuropsychological assessment instruments. New York: Kluwer Academic/Plenum Publishers. McMahon, J. A., Green, T. J., Skeaff, C. M., Knight, R. G., Mann, J. I., & Williams, S. M. (2006). A controlled trial of homocysteine lowering and cognitive performance. New England Journal of Medicine, 354, 2764–2772. Mitrushina, M. N., Boone, K. B., & D’Elia, L. F. (1999). Handbook of normative data for neuropsychological assessment. New York: Oxford University Press. Nelson, H. E., & Willison, J. (1991). National Adult Reading Test (NART) Test manual (2nd ed.). Berkshire, England: NFER-NELSON Publishing Company Ltd.. Radloff, L. S. (1977). The CES-D scale: A self-report depression scale for research in the general population. Applied Psychological Measurement, 1, 385–401. Ryan, J. J., Geisser, M. E., Randall, D. M., & Georgemiller, R. J. (1986). Alternate form reliability and equivalency of the Rey Auditory Verbal Learning Test. Journal of Clinical and Experimental Neuropsychology, 8, 611–616. Spreen, O., & Strauss, E. (1998). A compendium of neuropsychological tests: Administration, norms and commentary. New York: Oxford University Press. Taylor, E. M. (1959). Psychological appraisal of children with cerebral deficits. Cambridge, MA: Harvard University Press. Temkin, N. R., Heaton, R. K., Grant, I., & Dikmen, S. S. (1999). Detecting significant change in neuropsychological test performance: A comparison of four models. Journal of the International Neuropsychological Society, 5, 357–369. Uchiyama, C. L., D’Elia, L. F., Dellinger, A. M., Becker, J. T., Selnes, O. A., Wesch, J. E., Chen, B. B., Satz, P., van Gorp, W., & Miller, E. N. (1995). Alternate forms of the Auditory Verbal Learning Test: Issues of test comparability, longitudinal reliability, and moderating demographic variables. Archives of Clinical Neuropsychology, 10, 147–158. Woods, S. P., Childers, M., Ellis, R. J., Guaman, S., Grant, I., & Heaton, R. K. (2006). A battery approach for measuring neuropsychological change. Archives of Clinical Neuropsychology, 21(1), 83–89.