Archives of Clinical Neuropsychology, Vol. 5, pp. 405-410, 1990. Printed in the USA. All rights reserved.
0887-6177/90 $3.00 + .00. Copyright © 1990 National Academy of Neuropsychology
Brief Report
Alternate Forms of the AVLT: A Procedure and Test of Form Equivalency
David M. Shapiro and David W. Harrison
Virginia Polytechnic Institute & State University, Blacksburg, VA
The Rey Auditory Verbal Learning Test is frequently used in neuropsychological evaluation and research. However, its utility in the measurement of progressive change is limited by the availability of alternate and equivalent forms. Criteria were developed for word selection to generate new lists. Two alternate forms generated according to these criteria, as well as the original Rey AVLT and alternate form, were administered to elderly Medical Center patients and undergraduate volunteers. All four AVLT forms yielded comparable mean recall scores, and alternate form reliability coefficients for each trial varied from .67 to .90. Conclusions from this study must be drawn with caution, however, because the sample was small.
Requests for reprints should be addressed to David W. Harrison, PhD, Department of Psychology, Virginia Polytechnic Institute & State University, Blacksburg, VA 24061. The present research was partially supported by a research grant to David Harrison from North American Philips Corporation.
The Rey Auditory Verbal Learning Test (AVLT; Rey, 1964) consists of five presentations of a 15-word list, the presentation of a second interference list, and a recall trial of the original list. The test is used to assess new learning, susceptibility to interference, and immediate memory span and has been shown to be sensitive to dysfunction within the principal verbal mnemonic systems (Mungas, 1983; Ryan & Geisser, 1987; see also Ryan, Geisser, Randall, & Georgemiller, 1986). The utility of the AVLT in the assessment of progressive decline or recovery of function is restricted by the lack of comparable forms for repeated measurements (Lezak, 1982). A second form was offered as a solution to this problem (see Lezak, 1983), but lacked empirical support for its validity until recently (Ryan et al., 1986; Ryan & Geisser, 1987). More than two forms are required in the evaluation of progressive change across time. Criteria for the a priori selection of additional words are needed. This would
be especially useful in longitudinal or sequential research designs in which baseline performance, experimental manipulation, and recovery time might be observed in clinical populations. The purpose of the present study was to develop descriptive criteria for selecting additional words suitable for constructing AVLT forms, and to evaluate two new forms developed with these criteria for reliability and equivalency. To test the suitability of these criteria and to provide alternate forms in addition to the two presently available, each of the four forms was administered and scores on the four forms were compared.
METHOD
Subjects
Forty-two subjects were recruited from the patient population at the Veterans Administration Medical Center (VAMC) at Salem, Virginia (N = 17; mean age = 66 years) and from the departmental undergraduate subject pool (N = 25; mean age = 19 years). Subjects from the VAMC were rehabilitating primarily from stroke or from limb surgery, but they had associated illnesses (e.g., cardiovascular disease, chronic obstructive pulmonary disease, diabetes mellitus, and renal or liver disease). Seven of these subjects carried a clinical diagnosis of dementia. Patients were excluded from the study (a) if they were acutely ill; (b) if a change occurred in their psychotropic medication during the study; and (c) if they had been present on the ward for less than two weeks.
Word Lists
Four alternate forms of the AVLT (Form AB, Form CD, Form EF, and Form GH) were used. Form AB was the standard Rey AVLT (Rey, 1964), whereas Form CD was the alternate list pair provided by Lezak (1983). The word items of these lists (i.e., AB and CD) were evaluated for consistency on the following dimensions to establish a priori criteria for the generation of additional forms.
1. The probability of occurrence of the word in common usage was ascertained using the Thorndike-Lorge tables (1944). Only commonly used words with a frequency rating of A (i.e., 50 to 100 occurrences per million) or AA (over 100 occurrences per million) had been used in the generation of the original lists.
2. The imagery value of the items was compared using the tables of Paivio, Yuille, and Madigan (1968). The tables consist of a large number of words rated on a seven-point scale predictive of the subject's ability to recall the word in verbal learning studies. The original lists were found to be composed of words rated high in imagery value, viz., above 6.3 on a 7-point scale.
3. Only one- or two-syllable nouns were used in the original lists.
4. Obvious semantic or phonetic associations or similarities between words in the same list were controlled.
Two new list pairs (EF and GH) were developed using these criteria (see Table 1).
Procedures
Testing was performed by examiners instructed on the protocol described by Lezak (1983). Each subject was tested at a consistent time at quiet locations by the same examiner using only one list pair per session. The order of administration of the original and three alternate forms was counterbalanced across date and subjects. The interval between sessions varied from 2 to 13 days (mean = 5 days; SD = 3.6).
RESULTS AND DISCUSSION
The four versions of the AVLT yielded comparable mean recall scores for each trial with a difference of less than one point across forms. The means and standard deviations of the recall scores for each trial on each form are shown in Tables 2 and 3.

TABLE 1
The Original AVLT (List AB), the Alternate List (List CD), and Two New Lists (List EF and List GH)

List AB*            List CD†            List EF             List GH
Drum      Desk      Book      Bowl      Street    Baby      Tower     Sky
Curtain   Ranger    Flower    Dawn      Grass     Ocean     Wheat     Dollar
Bell      Bird      Train     Judge     Door      Palace    Queen     Valley
Coffee    Shoe      Rug       Grant     Arm       Lip       Sugar     Butter
School    Stove     Meadow    Insect    Star      Bar       Home      Hall
Parent    Mountain  Harp      Plane     Wife      Dress     Boy       Diamond
Moon      Glasses   Salt      County    Window    Steam     Doctor    Winter
Garden    Towel     Finger    Pool      City      Coin      Camp      Mother
Hat       Cloud     Apple     Seed      Pupil     Rock      Flag      Christmas
Farmer    Boat      Chimney   Sheep     Cabin     Army      Letter    Meat
Nose      Lamb      Button    Meal      Lake      Building  Corn      Forest
Turkey    Gum       Key       Coat      Pipe      Friend    Nail      Gold
Color     Pencil    Dog       Bottle    Skin      Storm     Cattle    Plant
House     Church    Glass     Peach     Fire      Village   Shore     Money
River     Fish      Rattle    Chair     Clock     Cell      Body      Hotel

*Rey, 1964. †Lezak, 1983.
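The first three selection criteria described under Word Lists lend themselves to a mechanical screen. The following is a minimal sketch of such a screen, not the authors' procedure: the `Candidate` record, the `meets_criteria` and `screen` functions, and all ratings in `pool` are hypothetical and invented for illustration; no values are taken from the Thorndike-Lorge or Paivio et al. tables. The fourth criterion (semantic or phonetic associations within a list) is assumed to require human judgment and is not automated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One candidate word with the attributes the criteria refer to."""
    word: str
    frequency_rating: str  # Thorndike-Lorge style rating, e.g. "A" or "AA"
    imagery: float         # imagery rating on a 7-point scale
    syllables: int
    is_noun: bool


def meets_criteria(c: Candidate, min_imagery: float = 6.3) -> bool:
    """Apply criteria 1-3; criterion 4 (associations) is left to a human rater."""
    return (
        c.frequency_rating in {"A", "AA"}  # criterion 1: common usage
        and c.imagery > min_imagery        # criterion 2: imagery above 6.3
        and c.syllables in (1, 2)          # criterion 3: one or two syllables ...
        and c.is_noun                      # ... and a noun
    )


def screen(candidates):
    """Return the words that pass the automatic part of the screen."""
    return [c.word for c in candidates if meets_criteria(c)]


# Placeholder entries with invented ratings, for illustration only.
pool = [
    Candidate("candidate_1", "AA", 6.6, 1, True),  # passes all three checks
    Candidate("candidate_2", "A", 5.9, 2, True),   # imagery too low
    Candidate("candidate_3", "20", 6.8, 2, True),  # not rated A or AA
    Candidate("candidate_4", "AA", 6.5, 3, True),  # too many syllables
]
print(screen(pool))  # ['candidate_1']
```

Words that survive such a screen would still need to be reviewed against one another for obvious associations before being assembled into 15-word lists.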

TABLE 2
Means and Standard Deviations on the Original and Three Alternate Forms of the AVLT for University Undergraduates

Trial   List AB*        List CD†        List EF         List GH
I        7.00 ± 1.63     7.40 ± 1.63     6.84 ± 1.93     7.28 ± 2.39
II      10.20 ± 2.24    10.08 ± 1.87     9.76 ± 1.90    10.00 ± 2.25
III     11.76 ± 2.45    11.40 ± 1.71    11.12 ± 2.26    11.80 ± 2.08
IV      12.40 ± 2.20    12.40 ± 1.68    12.16 ± 1.82    12.52 ± 1.78
V       13.04 ± 2.09    12.80 ± 1.55    12.92 ± 2.14    13.40 ± 1.35
Int.     7.20 ± 1.89     6.76 ± 1.59     7.04 ± 1.62     7.52 ± 1.50
VI      12.00 ± 2.61    11.64 ± 2.12    11.68 ± 2.64    12.24 ± 2.31

*Rey, 1964. †Lezak, 1983.
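For the undergraduate sample, the statement that mean recall differs by less than one point across forms can be checked directly from the means in Table 2. The sketch below simply transcribes those means and computes, for each trial, the largest form mean minus the smallest; the names `TRIALS`, `MEANS`, and `spread_by_trial` are illustrative and not part of the original report.

```python
# Mean recall per trial for the undergraduates, transcribed from Table 2.
TRIALS = ["I", "II", "III", "IV", "V", "Int.", "VI"]
MEANS = {
    "AB": [7.00, 10.20, 11.76, 12.40, 13.04, 7.20, 12.00],
    "CD": [7.40, 10.08, 11.40, 12.40, 12.80, 6.76, 11.64],
    "EF": [6.84, 9.76, 11.12, 12.16, 12.92, 7.04, 11.68],
    "GH": [7.28, 10.00, 11.80, 12.52, 13.40, 7.52, 12.24],
}


def spread_by_trial(means):
    """Largest minus smallest form mean at each trial, rounded to 2 places."""
    spreads = {}
    for i, trial in enumerate(TRIALS):
        values = [scores[i] for scores in means.values()]
        spreads[trial] = round(max(values) - min(values), 2)
    return spreads


spreads = spread_by_trial(MEANS)
for trial, spread in spreads.items():
    print(f"Trial {trial}: spread across forms = {spread:.2f}")
print("all under one point:", all(s < 1.0 for s in spreads.values()))
```

The largest spread in this sample occurs on the interference trial (7.52 versus 6.76), which is still under one point.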

TABLE 3
Means and Standard Deviations on the Original and Three Alternate Forms of the AVLT for the VAMC Population

Trial   List AB*        List CD†        List EF         List GH
I       4.06 ± 1.43     3.29 ± 1.96     3.52 ± 1.55     3.41 ± 1.37
II      5.52 ± 1.66     4.94 ± 2.08     4.76 ± 2.61     4.71 ± 1.57
III     6.12 ± 2.00     5.59 ± 2.09     5.76 ± 2.31     5.76 ± 2.44
IV      6.41 ± 2.12     5.71 ± 2.26     6.47 ± 2.65     5.65 ± 2.18
V       6.47 ± 2.72     6.41 ± 2.83     6.88 ± 3.16     6.47 ± 3.24
Int.    3.35 ± 1.66     3.41 ± 1.66     3.18 ± 1.55     3.41 ± 1.97
VI      4.06 ± 3.65     3.29 ± 2.87     3.71 ± 3.67     4.17 ± 2.96

*Rey, 1964. †Lezak, 1983.

TABLE 4
Correlations of Each Alternate Form (CD, EF, and GH) with the Original AVLT* (N = 42) (90% Confidence Interval in Parentheses)

Trial   CD†                  EF                   GH
I       0.74 (0.60; 0.88)    0.67 (0.50; 0.79)    0.77 (0.64; 0.86)
II      0.83 (0.73; 0.90)    0.74 (0.60; 0.84)    0.79 (0.67; 0.87)
III     0.87 (0.79; 0.92)    0.82 (0.71; 0.89)    0.86 (0.77; 0.91)
IV      0.90 (0.84; 0.94)    0.77 (0.64; 0.86)    0.84 (0.74; 0.90)
V       0.89 (0.82; 0.93)    0.81 (0.70; 0.88)    0.84 (0.74; 0.90)
Int.    0.71 (0.55; 0.82)    0.68 (0.51; 0.80)    0.74 (0.60; 0.84)
VI      0.90 (0.84; 0.94)    0.85 (0.76; 0.91)    0.88 (0.80; 0.93)

*Rey, 1964. †Lezak, 1983. p ≤ .0001 for each comparison.

Correlation coefficients were derived by comparing each trial of each alternate form with the corresponding trial of the original AVLT (see Table 4). This produced correlations ranging from 0.67 to 0.90 (average = 0.80). Test-retest reliability across forms was similar to that reported by Lezak (1982). Generally, the correlations exceeded those reported by Ryan et al. (1986). This may have resulted from Ryan et al.'s (1986) relatively brief test-retest interval. Also, the improved correlations in the present study may be expected with the inclusion of college students and the expanded range of test scores in this study. Normative data on larger samples of both clinical and normal subjects are required to further assess reliability and equivalency. Moreover, any conclusions drawn from these results must remain tentative given the small sample size.
An overall two-factor analysis of variance (ANOVA) with repeated measures on forms (4) and trials (7) revealed no significant main effect of form, F(3, 123) = 1.24, p < .30, and no Form × Trial interaction, F(18, 738) = .76, p < .75. Only the expected main effect of trial was reliable, F(6, 246) = 105.46, p < .0001. Separate ANOVAs on forms (4) were
used to compare performance on the four forms at each trial. No differences were shown among the forms and probability values exceeded .23 (df = 3, 123) in all cases.
Practice effects were assessed with separate two-factor ANOVAs with repeated measures on date (4) and trial (7) for each group. Data from the VAMC location showed no reliable practice effects among this patient population. However, the effect of date was confounded by changes in the VAMC environment (relocation). These data were nonetheless valid for the previous analysis (evaluation of the equivalency of forms), since forms were counterbalanced across dates and subjects, using a Latin Square design. For the undergraduates, the main effect of trial (7) was reliable, F(6, 138) = 186.96, p < .0001, as was the main effect of date, F(3, 69) = 11.86, p < .0001. Pairwise comparisons using Tukey's procedure showed no differences between the first and second administrations, or between the third and fourth testing sessions, but revealed a significant improvement between the second and third testing sessions. One-way ANOVAs with repeated measures on date (4) were subsequently performed at each trial. Reliable improvement was shown across date for learning Trials I through V (p ≤ .01). An exception to these improvements existed at the interference trial, F(3, 23) = 1.78, p = .16. For the postinterference Trial VI, a reliable improvement was again found, F(3, 23) = 3.03, p < .05. This finding suggests that even though the use of alternate forms may eliminate direct practice effects, there remains a general practice effect due
to repeated administrations when the tests are spaced as much as five days apart. The results of this study suggest that this potential confound persists for an extended period of time (i.e., days) in healthy college students but not among the older patient population. Improvements across time in a clinical population, then, should be interpreted with caution. Additional research is needed to identify the change scores that indicate significant recovery effects beyond the general learning effect observed in this study. Moreover, research is needed to establish the minimal intertest period sufficient to minimize the otherwise potent effects of practice on this neuropsychological test.
REFERENCES
Lezak, M. D. (1982, June). The test-retest stability and reliability of some tests commonly used in neuropsychological assessment. Paper presented at the meeting of the International Neuropsychological Society, Deauville, France.
Lezak, M. D. (1983). Neuropsychological assessment (2nd ed.). New York: Oxford University Press.
Mungas, D. (1983). Differential clinical sensitivity of specific parameters of the Rey Auditory-Verbal Learning Test. Journal of Consulting and Clinical Psychology, 51, 848-855.
Paivio, A., Yuille, J. C., & Madigan, S. A. (1968). Concreteness, imagery, and meaningfulness values of 925 nouns. Journal of Experimental Psychology (Monograph Supplement), 76, 1
Rey, A. (1964). L'examen clinique en psychologie. Paris: Presses Universitaires de France.
Ryan, J. J., & Geisser, M. E. (1987). Validity and diagnostic accuracy of an alternate form of the Rey Auditory Verbal Learning Test. Archives of Clinical Neuropsychology, 1, 209-217.
Ryan, J. J., Geisser, M. E., Randall, D. M., & Georgemiller, R. J. (1986). Alternate form reliability and equivalency of the Rey Auditory Verbal Learning Test. Journal of Clinical and Experimental Neuropsychology, 8, 611-616.
Thorndike, E. L., & Lorge, I. (1944). The teacher's word book of 30,000 words. New York: Teacher's College, Columbia University, Bureau of Publications.