Archives of Clinical Neuropsychology, Vol. 14, No. 6, pp. 545–559, 1999 Copyright © 1999 National Academy of Neuropsychology Printed in the USA. All rights reserved 0887-6177/99 $–see front matter
PII S0887-6177(98)00042-0
Performance of 225 Dutch School Children on Rey’s Auditory Verbal Learning Test (AVLT): Parallel Test-Retest Reliabilities with an Interval of 3 Months and Normative Data Willem van den Burg Groningen University
Annette Kingma University Hospital of Groningen
Performance on Dutch adaptations of Rey’s AVLT was examined in a sample of 225 6- to 12year-old Dutch school children, selected to be representative of the general population. With an interval of 3 months, they were tested twice, using two out of three test forms which proved to be parallel. No test-retest practice effects were apparent. Age had a considerable impact on test performance. Measured imperfectly, socioeconomic background and intellectual level had some additional influence, as did the examiner who administered the tests. Boys made more errors than girls. Normative data corresponded well with those collected in a smaller Australian sample, suggesting usefulness outside The Netherlands. With a parallel test-retest reliability of .70, which was reduced to .59 when the influence of age was taken into account, the most reliable AVLT measure was the total number of words correctly recalled over the five learning trials. On the basis of reliability data, implications for the clinical use of the AVLT are discussed. © 1999 National Academy of Neuropsychology. Published by Elsevier Science Ltd
As it is usually applied, Rey’s (1941, 1958) Auditory-Verbal Learning Test (AVLT) consists of at least two parts or stages. First, a list of 15 unrelated, concrete nouns is presented over five learning trials, with immediate recall tested following each presentation.
The authors gratefully acknowledge the cooperation of teachers and pupils of the following schools: “De O.L.S” in Gasteren; “O.L.S. De Badde” in Annerveensche kanaal; “O.L.S. De Veldkei” in Assen; “O.L.S. van Baggelhuizen” in Assen; and “O.L.S. Kindergemeenschap Admiraal de Ruyter” in Groningen. We further thank the following psychological assistants and doctoral students who, together with the second author, took care of the administration of the tests: Christien Brons, Anky Caspers, Eeke van den Berg, August van Dommele, Riet Til, and Els Westerman. Address correspondence to: Willem van den Burg, Groningen University, Department of Neuropsychology, Poortweg 4 AZG, Hanzeplein 1, 9713 GZ Groningen, The Netherlands.
545
546
W. van den Burg and A. Kingma
Next, after a delay interval of some 20–30 minutes and with no further presentations of the list, delayed recall is assessed. In English-speaking countries, following a suggestion of E. M. Taylor (1959), this design is often extended by presenting one time a new interference list after the fifth learning trial; following recall of this second list, recall of the first list is assessed, and this is followed by the delay interval (Lezak, 1995). In The Netherlands, as in a number of other countries, it has been the tradition to stick to Rey’s procedure without an interference trial. In this form, the AVLT was administered in the present study. In both forms, the AVLT is one of the most widely used word learning and memory tests. With this test, deficits in these abilities have been observed in quite a variety of disorders (e.g., Berg, Koning-Haanstra, & Deelman, 1991; Isaacs-Glaberman, Medalia, & Scheinberg, 1989; Powell, Cripe, & Dodrill, 1991; Ryan, Paolo, & Skrade, 1992; Spikman, Berg, & Deelman, 1995; Sutker, Allain, Johnson, & Butters, 1992; A. E. Taylor, Saint-Cyr, & Lang, 1986; Van den Burg, Van Zomeren, Minderhoud, Prange, & Meijer, 1987; cf. Lezak, 1995), including child disorders (Brouwers & Poplack, 1990; Jannoun & Chessels, 1987; Kingma, Mooyaart, Kamps, Nieuwenhuizen, & Wilmink, 1993; Tromp & Van den Burg, 1982). An often stressed advantage of the test is that many aspects of the performance may provide clinically relevant information. Apart from the numbers of words correctly recalled, these include the errors made, the steepness or flatness of the learning curve, (continual) repetition of words during recall, and delayed recall performance relative to what was learned before, among other facets (cf. Crockett, Hadjistavropoulos, & Hurwitz, 1992; Geffen, Moar, O’Hanlon, Clark, & Geffen, 1990; Heubrock, 1995; Lezak, 1995). However, adequate studies on reliability and norms are often insufficient or lacking (Stallings, Boake, & Sherer, 1995). This is especially so considering the desirabilities (cf. Reynolds, 1989) that such studies include: (a) an assessment of test-retest reliability, with not too short a between-test interval; (b) a retest with a parallel test, because words learned at the first test may be partly remembered at the second test (Crawford, Stewart, & Moore, 1989); (c) a considerable sample of subjects, in particular for establishing good normative data; and (d) a sample representative of the population of interest—mostly a demographic segment of the general population. For adult populations a number of studies now have appeared which partly meet these desirabilities (Bleecker, Bolla-Wilson, Agnew, & Meyers, 1988; Crossen & Wiens, 1994; Geffen, Butterworth, & Geffen, 1994; Geffen et al., 1990; Helmstaedter & Durwen, 1990; Heubrock, 1994; Ivnik et al., 1990, 1992; Mitrushina, Satz, Chervinsky, & D’Elia, 1991; Savage & Gouvier, 1992; Selnes et al., 1991; Wiens, McMinn, & Crossen, 1988). However, studies on the performance of children in the general population are nearly completely absent. This is remarkable because the AVLT was originally designed especially for use in children aged 6 years and older (Rey, 1941, 1958; E. M. Taylor, 1959). To our knowledge, to date only Forrester and Geffen (1991), using the standard American adaptation of Rey’s test in a small but careful study, have published some representative AVLT data for Australian children. Test-retest studies have not been performed in this age group. The purpose of the study reported here was to fill this gap for a representative group of Dutch children, using Dutch adaptations of the Rey test. The immediate reason was that the data were needed for ongoing, longitudinal studies, conducted at the Pediatric Oncology Center of the Groningen University Hospital, on the cognitive development of children who are treated for acute lymphoblastic leukemia or brain tumor (e.g., Kingma et al., 1993). The need was keenly felt when it appeared that at a retest quite a number of children spontaneously enumerated words of the AVLT they had learned 1 year before.
AVLT in Children
547
MATERIALS AND METHODS Subjects and Procedure Based on demographic data supplied by The Netherlands Central Bureau of Statistics, five elementary schools in the provinces of Groningen and Drente were asked for their cooperation and all consented to participate. The schools were selected by “broad” and “impressionistic modal” sampling (Neale & Liebert, 1986), that is, in such a mix of higher and lower socioeconomic schools, city schools and country schools, as to provide a reasonable representation of the general population of 6- to 12-year-old children. In advance, parents of all children were asked by letter for approval, which was given by all but one. Next, the children were sampled at random from each class, with the restriction that they had to be raised in The Netherlands and spoke Dutch as primary language, and that only one child of any given family could participate. The children were tested in the morning, in a separate room. They were urged not to speak about the tests on return to their class. The testing was done by six well-trained examiners, generally a fixed pair for each school. The retest took place after 3 months and was administered by nearly always the same person. At the first test, a child randomly received one of three AVLT versions (see below), and at the second test randomly one of the two other forms, but such that the examiners used the three test forms equally often and tested an equal amount of children from each class, and as many girls as boys. After the first test, 16 subjects were lost due to illness or moving away. This resulted in 225 children being tested twice: 112 boys and 113 girls. Apart from the fact that we did not use an interference list, the test instructions and further proceedings were as described in detail by Lezak (1995), with three exceptions. First, to the introduction of the test was added: “It is a difficult task, so don’t mind if you remember only a few words.” Second, to standardize the test as much as possible, the 15 words were not read by the examiner, but presented by a taperecorder. Third, the words were not read at a rate of one word per second, but with an interval of 1 second between the words. Following the fifth trial, two nonverbal tests were administered, the Developmental Test of Visual-Motor Integration (Beery & Buktenica, 1989) and 10 lines of the Bourdon Dot Cancellation Test (cf. Van Zomeren & Brouwer, 1994). This took about 25 minutes (range 20–30 minutes). Early finishers were required to make drawings. Subsequently, without having been informed that this would occur, the child was asked which words it still remembered. All words mentioned at any trial, including repeats and errors, were immediately written down by the examiner.
The Three Word Lists List I (see Table 1) is a standard in The Netherlands since its creation (Kalverboer & Deelman, 1964). It includes five words of the American standard for the first test to be learned, and four words of the American standard for use in the interference trial (Lists A and B in Lezak, 1995, p. 439). Lists II and III were constructed by the second author to be parallel tests of List I. Form equivalence with List I was pursued by selecting nouns that (a) had a comparable frequency in spoken Dutch (Uit den Boogaart, 1975); (b) were familiar to at least 90% of Dutch and Flemish children of 6 years old, as estimated by Kohnstamm (1981); all words of List I satisfied this criterion except, curiously, river; (c) were monosyllabic or disyllabic in the same proportion as in List I; (d) were comparable with regard to semantic and phonetic features; (e) seemed equally concrete; (f) seemed as difficult or easy to cluster by association. Comparability, especially regarding
548
W. van den Burg and A. Kingma
TABLE 1 The Three AVLT Word Lists: English [Dutch] List I
Curtain [Gordijn] Bird [Vogel] Pencil [Potlood] Glasses [Bril] Shop [Winkel] Sponge [Spons] River [Rivier] Colour [Kleur] Flute [Fluit] Plant [Plant] Coffee [Koffie] Chair [Stoel] Drum [Trommel] Shoe [Schoen] Air/sky [Lucht]
List II
List III
Rabbit [Konijn] Shoelace [Veter] Paper [Papier] Kitchen [Keuken] Ship [Schip] Ball [Bal] Fairy-tale [Sprookje] Fruit [Fruit] Post [Post] Face [Gezicht] Rubbish [Rommel] Trousers [Broek] Tent [Tent] Doorstep/sidewalk [Stoep] Lamp [Lamp]
Soldier [Soldaat] Watering can [Gieter] Balloon [Ballon] Pipe [Pijp] Toddler [Kleuter] Wind [Wind] Paint [Verf] Tower [Toren] Chain [Ketting] Foot [Voet] Shed/barn [Schuur] Cat/puss [Poes] Mirror [Spiegel] Rose [Roos] Lip [Lip]
the last two criteria, was checked in some preliminary tests of, and interviews with, both colleagues and thirty children. The words were read in the order of Table 1. Analysis The following AVLT scores were considered: Trial 1 to Trial 5 scores, that is, the numbers of words correctly recalled after each presentation of the 15 words; total score, that is, the sum of the trial scores; delayed recall score, that is, the number of words correctly recalled after the delay interval of about 25 minutes; errors in Trials 1–5 and, equally, errors in delayed recall, that is, with the exception of intrusion errors (see below), all words mentioned which were not on the list, including repeated errors; repeats in Trials 1–5 and, equally, repeats in delayed recall, that is, the total number of repetitions of the correctly recalled words; intrusion errors, that is, the number of times words of the first list were “recalled” at the second administration, when another list had to be remembered. Some additional, derived scores are discussed below. The impact of the following factors on the AVLT scores was examined: age, sex, school class, school, and examiner. School class. This factor is of course strongly related to age, but not completely; some, often less bright and/or less motivated children, not uncommonly with a lower socioeconomic background, were in a lower school class than usual for their age. Thus, taking account of age, this factor may provide an indication of the additional influence of the children’s intellectual level, and, to some degree, of their socioeconomic background. School. This factor refers primarily to socioeconomic background, but indirectly to some extent also to intellectual level. Because any influence of school class and school appeared to go together, and both refer to similar underlying factors (intellect and socioeconomic background, in an inextricable combination), results will be given for one school and class combined factor. The influence of these factors was mainly investigated by means of hierarchical regression analyses, for the purpose of which the categorical factors (all, except age), and their levels, were coded in sets of dummy variables (Cohen & Cohen, 1983). Nearly all
AVLT in Children
549
statistical tests could be performed by using SPSS/PC1 (Norusis, 1986). An exception was a x2 test for the equality of correlations in independent samples (cf. Cohen & Cohen, 1983, p. 55).
RESULTS Equivalence of Test Forms Preliminary checks showed that the test form administered was not associated with any of the mentioned factors which might affect the AVLT scores. The randomization procedure thus had proved successful. Three kinds of analyses indicated that the three test versions could be considered as parallel. The first of these showed no appreciable differences in means and variances of the various AVTL measures. Analyses of variance were conducted for comparing: Trial 1–5 scores; the total score; the linear component of the learning curve with and without the total score as a covariate; and the delayed recall score with and without the total score as a covariate. Variances were compared using the Cochran C statistic. Because the distributions of errors and repeats were highly skewed, the Kruskal-Wallis test was used for comparing these scores. Analyses were performed separately for the data of the first and second test. No comparison yielded a p , .05. Table 2 shows means and standard deviations of the total and delayed recall scores for the three test versions on the two testing occasions. The second kind of analysis indicated similar intercorrelations of Trial 1–5 scores, total scores and delayed recall scores on both testing occasions ( x2 tests); in keeping with chance, three of the 42 tests showed a difference at p 5 .05. Table 2 shows the correlations between total scores and delayed recall scores, and in addition approximations of the internal reliabilities of the total scores by Cronbach’s coefficient alpha. It should be noted that the alphas were computed only to compare the values; because the five trial scores are not experimentally independent (Lord & Novick, 1968), alpha may not form here a lower bound of the internal reliability. The third kind of analysis indicated no different correlations of the AVLT measures with age and sex. Sex was of marginal importance anyhow (see below). Substantial but nonlinear relationships were found between the numbers of correctly recalled words and
TABLE 2 Statistics to Assess Test Form Equivalence of Word Lists I, II, and III and a Possible Test-Retest Practice Effect Administration 1
Total Score I Delayed Recall I Total Score II Delayed Recall II Total Score III Delayed Recall III a
M
SD
aa
41.0 9.1 42.3 9.6 43.3 9.7
9.4 2.6 8.9 2.5 8.9 2.8
.90
rTDb
.77 .91 .79 .89 .76
Administration 2
Ragec
rages d
n
M
SD
aa
.56 .54 .64 .50 .52 .40
.56 .54 .64 .48 .52 .41
79 79 71 71 75 75
41.9 9.3 43.4 9.4 42.7 9.4
8.3 2.5 8.6 2.7 9.5 2.6
.86
rTDb
.74 .87 .76 .90 .84
Ragec
rages d
n
.55 .60 .45 .45 .47 .30
.55 .60 .43 .44 .46 .30
74 74 74 74 77 77
Cronbach’s coefficient alpha. Correlation between Total and Delayed recall scores. c Multiple correlation with age and age squared. d Correlation with “shortened age” (i.e., when 10- to 12-year-old children are given the age of 10). b
550
W. van den Burg and A. Kingma
age, with the scores progressing from 6 to 10 years in an approximately linear way, and hardly any difference between the performance of 10-, 11- and 12-year-old children. In Table 2 multiple correlations (Rage) are included with age and age squared (to adjust for the nonlinearity). In addition, ordinary (linear) correlations with shortened age ( rages) are presented, with ages meaning that 10- to 12-year-old children were all given the age of 10. As shown, Rage is practically equal to rages for all three test forms (x2 tests). The sufficiency of using ages emerged likewise from other analyses, and most results yet to be presented will therefore be based on this simple age variable. Parallel Test-Retest Reliabilities Given the equivalence of the three test versions and the considerable impact of age, test-retest reliabilities were computed across test forms, without and with the influence of age partialled out (Table 3). As shown, the total score is the most reliable measure and, as has been found before (e.g., Geffen et al., 1994), later trial scores are more reliable than earlier ones. We therefore looked for a weighted sum of trial scores which is more reliable than the total score, by giving more weight to later than to earlier trial scores. This search, however, was unrewarding; with peculiar weights, the maximal testretest correlation attained was .715.1 No evidence was found that reliabilities were lower (or higher) for younger than for older children (x2 tests). We further examined to what extent single trial scores and the delayed recall score may give additional and reliable information on test performance, independent of the total score. Results showed very low reliabilities. With the total score partialled out, testretest reliabilities of the single trial scores were all lower than .15, and that of the delayed recall score was .28. Similar results were obtained for some measures of aspects which often receive particular attention in clinical practice. Thus, measures for the retention of the words after 25 minutes, relative to the number of words recalled in the last learning trials, yielded reliabilities smaller than .20, and these were reduced to practically zero when the total score was partialled out. Proportions were considered, such as the delayed recall score divided by the mean of the Trial 4 and Trial 5 scores (which resulted in the highest reliability, .19), as well as differences, such as the Trial 5 score minus the delayed recall score. Likewise unreliable, in the order of .25, were different measures for the steepness or flatness of the learning curve, for example, the Trial 5 minus the Trial 1 score, and the linear component of the learning curve. Controlling for the total score, these reliabilities as well became practically zero. It is worth noting here that the relationship of the steepness/flatness measures with the total score was nonlinear, indicating that a flat learning curve may be associated with a poor total score, but also with a high total score (when many words are already recalled in the first trials). Test-Retest Practice Effects As Table 2 indicates for the total and delayed recall scores, no test-retest practice effects were apparent for any of the AVTL measures. Overall, the average total score was 0.5 word higher at the second test administration, F(1, 224) 5 1.09, p 5 .30, and the delayed recall score was 0.1 word lower.
1We are indebted to Professor J. M. F. ten Berge who, using iterative procedures, performed the necessary computations in a jiffy.
AVLT in Children
551
TABLE 3 Parallel Test-Retest Reliabilities of AVLT Measures, Without and With Age Accounted for [in Brackets] Trial 1 Trial 2 Trial 3 Trial 4 Trial 5 Total
.38* .47* .50* .56* .61* .70*
[.30*] [.35*] [.35*] [.46*] [.51*] [.59*]
Errors in trials 1–5 Repeats in trials 1–5 Delayed recall Errors in delayed recall Repeats in delayed recall
.20* .27* .62* .07 .12
[.20*] [.27*] [.52*] [.07] [.12]
* p , .01.
Factors Affecting Test Performance: Intercorrelations of the AVLT Measures Age had by far the strongest impact on test performance. Controlling for age, hierarchical regression analyses showed that sex, school and class, and the examiner, all had a significant additional influence on one or more AVLT scores, on one or both testing occasions. When that was the case, percentages variance explained on the two testing occasions were pooled. Because test forms could be considered as equivalent and no practice effect was apparent, this pooling could well be done by performing the pertinent regression analyses on a data file, consisting of 450 records, with data of the second administration added under those of the first administration. The main results were as follows (percentages variance explained are corrected for shrinkage). Age explained 27.2% of the variance of the total score, school and class added a further 5.2%, and the examiner subsequently 1.5%. Adjusting for age and school and class, the mean difference between examiners was 2.1 words (SD 5 1.4), with a maximum of 4.4 words. Any influence of sex was deemed negligible. However, a slightly better performance of girls than of boys was suggested; controlling for age, the estimated mean difference was 1.8 words (.10 . p . .05 in analyses for both test administrations). Age explained 20.3% of the variance of the delayed recall score, school and class added a further 4.3%, and the examiner subsequently 1.5%. Adjusting for age and school and class, the mean difference between examiners was 0.6 words (SD 5 0.35), with a maximum of 1.3 words. Sex had no significant influence. Explaining 2.4% of the variance, only sex was clearly related to the errors in Trials 1–5. Errors in delayed recall, repeats in Trials 1–5, repeats in delayed recall, and intrusion errors in the second test administration were all unrelated to any of the factors examined. Table 4 shows the intercorrelations of the several AVLT measures.
TABLE 4 Correlations Between AVLT Measures (N 5 450)
(1) Total score (2) Errors in trials 1–5 (3) Repeats in trials 1–5 (4) Delayed recall score (5) Errors in delayed recall (6) Repeats in delayed recall * p , .01 (assuming N 5 225).
(1)
(2)
(3)
(4)
(5)
1.00 2.24* .25* .78* 2.21* .17*
1.00 .01 2.19* .57* 2.07
1.00 .23* 2.02 .52*
1.00 2.24* .21*
1.00 2.08
552
W. van den Burg and A. Kingma
Normative Data Taking account of age and sex, if relevant, Tables 5 to 7 contain means, standard deviations, percentages and percentile scores, as observed in the present study. In Table 5 the data collected by Forrester and Geffen (1991) are included. These correspond very well with ours, but there are two differences. First, in the Forrester and Geffen study, standard deviations of the scores of the two older groups were generally smaller than those in our corresponding groups; second, the mean delayed recall score of the oldest group was lower than in our groups (allowing for unequal standard deviations, Welch’s test was used; cf. Timm, 1975). Because of very skewed distributions, data on errors and repeats are presented by cumulative percentages (Table 6). Forrester and Geffen reported only means and standard deviations of errors in Trials 1–5. From these data an overall mean of 0.9 can be inferred (SD 5 1.6), which is lower than our mean of 1.9 (SD 5 3.3). It is almost certain that the difference is significant, although the best test we could apply (Welch’s test, yielding significancy at p 5 .001) presupposes a normal distribution of the scores, and therefore is dubious. Because normative data for intrusion errors are of little use, these are not included in Table 6. It is worth noting, however, that intrusion errors were made by 18% of the children (1 by 14%, 2 by 2%, and 3 by 1%). Rather than the data on the total and delayed recall scores given in Table 5, for assessment purposes, we mostly use the distributions of residual scores, computed by subtracting from the total and delayed recall scores the regression-prediction of these scores by ages (Table 7). This is a viable alternative because of the approximately linear relationship with ages (Table 2), and because standard deviations in the age groups hardly differed (Table 5). The advantage is that use can be made of two “robust” reference tables which are based on all 450 records. The prediction of the total score was: 13 1 3.4(ages), and of the delayed recall score: 2.5 1 0.8(ages). Instead of referring to the percentile scores tabulated, one may compute standard scores by means of the standard deviations presented, and convert these to percentile scores using a table of the standard normal distribution: the distribution of the residuals of the total scores did not deviate
TABLE 5 Mean Numbers (SD) of Words Correctly Recalled (N 5 450), According to Age and Trial, and Including the Data of Forrester and Geffen (1991)
6 Years 7 Years 7.4 Yearsa 8 Years 9 Years 9.4 Yearsa 10 Years 11 Years 11.6 Yearsa 12 Years 10–12 Years
Trial 1
Trial 2
Trial 3
Trial 4
Trial 5
Total Score
Delayed Recall
n
4.1 (1.0) 4.5 (1.6) 4.5 (1.3) 4.9 (1.4) 5.6 (1.4) 5.8 (1.2) 5.9 (1.6) 5.7 (1.5) 6.2 (1.0c) 5.5 (1.6) 5.7 (1.6)
6.1 (1.6) 6.7 (1.8) 6.7 (1.8) 7.6 (2.0) 8.2 (1.9) 8.9 (1.6) 8.6 (1.6) 8.7 (1.8) 8.3 (1.5) 8.6 (1.7) 8.6 (1.7)
6.8 (1.7) 7.9 (2.3) 8.1 (2.2) 8.5 (2.0) 9.7 (2.2) 9.9 (1.8) 10.1 (1.9) 10.2 (2.1) 9.7 (1.9) 10.2 (1.9) 10.1 (2.0)
8.1 (2.3) 8.4 (2.1) 9.4 (2.3) 9.6 (2.2) 10.5 (2.0) 10.9 (1.5) 10.7 (2.1) 11.0 (2.2) 11.4 (1.4c) 11.1 (2.0) 10.9 (2.1)
8.0 (2.3) 9.3 (2.3) 10.2 (2.6) 9.9 (2.3) 10.9 (2.0) 11.3 (1.3b) 11.2 (2.0) 11.4 (1.9) 11.5 (1.5) 11.5 (2.0) 11.4 (1.9)
33.1 (7.3) 36.8 (8.3) 38.9 (7.9) 40.4 (8.1) 44.8 (7.3) 46.7 (5.4) 46.5 (7.4) 47.0 (7.8) 46.9 (5.0d) 47.0 (6.6) 46.8 (7.3)
7.5 (2.2) 7.7 (2.3) 8.4 (2.6) 8.8 (2.5) 10.0 (2.1) 9.9 (2.3) 10.3 (2.3) 10.6 (2.5) 9.6e (1.4c) 10.7 (2.0) 10.5 (2.3)
41 68 20 94 66 20 64 72 20 45 181
Note. For all tests, p 5 .05 was used (two-tailed). a Data of Forrester and Geffen (1991). b SD smaller than in 9- and 10-year-old groups. c SD smaller than in 11-, 12-, and 10- to 12-year-old groups. d SD smaller than in 11- and 10- to 12-year-old groups. e Mean smaller than in 11-, 12-, and 10- to 12-year-old groups.
AVLT in Children
553
TABLE 6 Errors and Repeats, Cumulative Percentages in 450 Records In Trials 1–5 Errors
Boys Girls Repeats
0
#2
#4
#6
#8
#10
#12
41.5% 65.6% 79.0% 88.4% 94.7% 96.0% 98.7% 53.1% 77.9% 92.1% 96.1% 97.4% 99.2% 99.6% 0
#3
#6
#9
#12
#15
#18
9.3% 45.5% 73.5% 89.1% 94.7% 97.4% 98.3% In Delayed Recall Errors
0
#1
#2
#3
#4
66.9% 90.2% 97.3% 98.2% 99.3% Repeats
0
#2
#4
#6
#8
50.2% 90.6% 97.1% 99.1% 99.5%
significantly from a normal distribution, and that of the delayed recall scores hardly so (.01 , p , .05, Kolmogorov-Smirnov tests). For a delayed recall score relative to what was learned before (Table 7), we have a slight preference for the delayed recall residual score, computed by subtracting from the delayed recall score the regression-prediction of this score by the total score, which was 1 1 0.2(Total Score). This residual score is by definition completely independent of the total score, was distributed normally (Kolmogorov-Smirnov test), and was slightly more reliable (.28) than the alternative, delayed recall, % (.19). The latter index, however, may be more attractive conceptually. In view of the different meanings a flat learning curve may have (see above), no normative data for an index of this aspect are included in Table 7. The last two entries of Table 7 contain normative data for differences which can be expected between total scores and between delayed recall scores when a retest is
TABLE 7 Normative Data for Some Additional AVLT Measures Percentile Scores Measure
ages b
Total score | Delayed recall |ages c Delayed recall | Total scored Delayed recall, % e Change in Total score f Change in Delayed recall g aA
N
Meana
450 450 450 450 225 225
0 0 0 93% 0 0
SD
P1
P5
P10
7.6 217 213 210 2.3 24.9 23.6 22.9 1.65 24.2 22.8 22.2 18.4% 55% 64% 71% 6.9 214 212 28 2.25 26 24 23
P20
P80
P95
P99
26 7 10 13 16 22.1 2.1 2.9 4.1 5.2 21.4 1.4 2.0 2.6 3.8 78% 105% 113% 120% 139% 25 7 9 12 18 22 2 3 4 6
mean of zero should be read as negligibly different from zero. score minus its prediction by 3.4(ages) 1 13. c Delayed recall score minus its prediction by 0.8(ages) 1 2.5. d Delayed recall score minus its prediction by 0.2 (Total score) 1 1. e Delayed recall score divided by the mean of the Trial 4 and Trial 5 scores, times 100%. f Second Total score minus first. g Second Delayed recall score minus first. b Total
P90
554
W. van den Burg and A. Kingma
administered (with an interval of 3 months). These data may be useful for assessing whether memory ability has changed, due for example to illness or treatment.
DISCUSSION The normative data for Dutch children corresponded well with those established for Australian children by Forrester and Geffen (1991)—this despite the fact that these investigators tested small samples and followed Lezak’s procedure, including different words (the American standard list) and an interference trial, among other differences. As we did, they paid ample attention to the representativeness of their groups. The findings, therefore, together with the fact that we succeeded without too much effort in constructing two parallel versions of the Dutch AVLT, suggest that the test may stand all these differences fairly well, and that both sets of normative data can have applicability outside the countries or language areas where they were established. We shall elaborate on this point in five ways. 1. A worrying difference may have been that Forrester and Geffen, complying with Lezak (1995) and earlier editions, used a presentation rate of one word per second, whereas we, complying with Rey (1958), E. M. Taylor (1959), and Spreen and Strauss (1991), used a 1-second interval between each word. Wiens et al. (1988), applying a 1-second interval, drew attention to this difference. The present and other findings, however, indicate that it is not important. Sarver, Howland, and McManus (1976), who required school-aged children to recall auditorially presented digits, hardly found an effect of presentation rate on immediate and delayed recall in the range of rates which is relevant here. Reitsma and Deelman (1979), in a small, unpublished study at our department using medical students, compared an AVLT version consisting of 15 concrete nouns with a version consisting of 15 abstract nouns, and applied presentation rates of 1 word per second, 1 word per 11⁄2 seconds, and 1 word per 2 seconds. The faster presentations had an adverse effect on learning the abstract words but not the concrete words. 2. It is not very clear which considerations led to the composition of Rey’s original French/Swiss list or its American (cf. Wiens et al., 1988) and Dutch adaptations, but it is perfectly clear that for constructing parallel forms the words should be chosen carefully, by means of such (largely overlapping) criteria as applied by Crawford et al. (1989), Geffen et al. (1994), Shapiro and Harrison (1990), and ourselves. At the same time, there is no doubt that suitable words in one country or language may be less suitable in another country, or when directly translated into another language. For example, turkey (both in the American and as dindon in the French list) would be less suitable in a Dutch list because only an estimated 60% of Dutch 6-year-old children appears to know the animal (Kohnstamm et al., 1981). Conversely, it is obvious that watering can, the translation of our gieter in List III, does not qualify for an English list. However, the present findings indicate that in such cases judiciously chosen alternatives may lead to test forms with comparable difficulty across countries and languages. Concurrently, the findings suggest that the ability to memorize common, concrete nouns, as tapped by the AVLT, develops at a similar rate during childhood in (Western) countries. Thus, the test perhaps may serve well as a paradigm in cross-cultural neuropsychology (Ardila, 1995). Using a fairly literal translation of the American standard list, Heubrock (1994) reported that 50 young and healthy German adults performed like
AVLT in Children
555
their Australian peers. In an interesting study, Maj et al. (1993) tried to minimize any cultural bias of a new AVLT by including only words from five universally familiar categories, such as parts of the body and animals. However, thereby this version no longer consists of unrelated words. In a small German sample, the performance on this version correlated poorly with that on the (German translation of the) American standard. 3. By sampling only children from ordinary schools, most children with serious learning and behavioral disturbances were excluded from our study. (To that extent, the sample was not completely representative for the general population.) In the Australian study a further selection was made by excluding “learning-disabled children receiving special assistance.” We are unable to guess how many of our children might have met this exclusion criterion, but were informed by the teachers that the few children who scored many errors frequently were “problematic.” All four children with more than seven errors in Trials 1–5 on both testing occasions were notoriously known as such. We therefore assume that the difference in sampling at least partly explains the difference in errors scored in the two studies. In all probability the inclusion of such problematic children had only a minor influence on means and standard deviations of the numbers of correctly recalled words. The mean total and delayed recall score of the four “notorious” cases were both only about 2/3 SD lower than those of other children (taking age into account). A similar difference held for all children with errors in Trials 1–5 in the upper 5% range. More generally, the correlations between the two kinds of measures were very low (Table 4). These findings indicate, as is our clinical experience, that scoring many errors is often not clearly related to learning/memory ability per se. 4. Although we included no interference trial as in the Australian study, mean delayed recall scores still corresponded well, except for the oldest children (Table 5). Given the equality of the trial scores, this agreement suggests that the net effect of the interference trial, the subsequent recall of the learning list (which forms a rehearsal trial too), and the delay interval is about the same as the effect of the delay interval only. It should be noted further that in the Australian study delayed recall performance of children with a mean age of 9.4 years was slightly better than in the group of 11.6 years. This adds to the conjecture that this (sole) discrepancy is partly or completely a chance result. 5. The standard deviations of the scores of the two older age groups in the Australian study were generally smaller than those in ours (Table 5). This is important because standard deviations in norm groups determine to a large extent whether a subject’s test performance is considered deficient. The discrepancy may be related to a minor degree to the difference in exclusion criteria (as discussed), and may once again be a chance result, but might also be due to the way the Australian children were selected. Each age group was carefully composed such that occupational status of parents closely approximated the distribution of occupational status in the general population. However, one reason why many statisticians are opposed to this nonprobability method of quota sampling is that the true variance in the population cannot be estimated (e.g., Kish, 1967). The issue has an interesting history (cf. Converse, 1987). Forrester and Geffen may have selected rather average children in the age range of 9 to 12 years, leading to too small standard deviations. In any case, the standard deviations observed in the present study in this age range correspond much better with those found in younger children (Table 5), in older children (Forrester & Geffen, 1991; Spreen & Strauss, 1991), and in normal adults (Bleecker et al., 1988; Geffen et al., 1994; Geffen et al., 1990; Helm-
556
W. van den Burg and A. Kingma
staedter & Durwen, 1990; Heubrock, 1994; Ivnik et al., 1990; Mitrushina et al., 1991; Savage & Gouvier, 1992; Selnes et al., 1991; Wiens et al., 1988). All in all, we believe that these standard deviations are realistic. Turning to other aspects of the present study, we found that boys made more errors than girls. The better performance of girls, however, appeared at best marginally in the words correctly recalled. In that respect our results are similar to those of Forrester and Geffen (1991) and Bishop, Knights, and Stoddart (1990), who both reported no significant sex difference in children. For adults, findings are less consistent. Bleecker et al. (1988) and Geffen et al. (1990) observed that correct recall of adult females was appreciably better than that of adult males, but other data showed a marginal difference at best (Heubrock, 1994; Majdan, Sziklas, & Jones-Gotman, 1996; Savage & Gouvier, 1992). It is remarkable, however, that no study seems to have shown (a trend towards) a better performance of males. In a recognition task, Majdan et al. (1996) observed more errors in male than in female college students. Explaining 5.2% of the variance of the total score, the factor school and class indicated a modest influence of intellectual level and socioeconomic background (combined). This influence was certainly not measured profoundly in this way, but the order of magnitude is compatible with other findings. Bishop et al. (1990) reported correlations ranging from .22 to .36 between AVLT scores and IQ scores in a group of 252 children who were referred for neuropsychological assessment. (In this clinical group a stronger relationship can be expected than in our group.) In normal adults generally a similar, modest influence of educational and intellectual level has been found (Bleecker et al., 1988; Geffen et al., 1990; Petersen, Smith, Kokmen, Ivnik, & Tangalos, 1992; Selnes et al., 1991), except when extreme levels are compared (Wiens et al., 1988). For a discussion, see Savage and Gouvier (1992). With a parallel test-retest reliability of .70 in the group of children as a whole, the most reliable AVLT measure was the total score. Attempts to find a combination of trial scores with a higher reliability than the total score were unrewarding. The delayed recall score and the Trial 5 score were next best, and the reliabilities of the other trial scores decreased with trial number (Table 3). Two parallel test-retest reliability studies (Delaney, Prevey, Cramer, Mattson, & the VA Epilepsy Cooperative Study #264 Research Group, 1992; Geffen et al., 1994), and one test-retest study using the same form (Mitrushina & Satz, 1991), suggest a similar pattern of reliabilities for normal adults, but with somewhat higher values. Geffen et al. (1994) found .77 for the total score (N 5 51, interval 6–14 days), and Delaney et al. (1992) reported .61 to .86 for the trial scores (N 5 42, interval 1 month). The most relevant indices, however, are arguably reliabilities which have been purged of the influence of age, because this influence is also eliminated when comparing AVLT scores with age norms in practical work. Age had a substantial impact in all four studies. In the present study the “purified” reliability was .59 for the total score. In the other three studies, such reliabilities were not computed. The reliability of a number of other measures was very low (Table 3). This indicates that normative data for these measures may mainly (or only) be useful insofar as boundaries are set up for what normally can be expected. Even extreme scores on these measures may have to be interpreted with caution, however. Lezak (1995) points out that many repeats may reflect a problem in self-monitoring and tracking, particularly when the subject can recall relatively few words; but this phenomenon may also indicate a strategy to retrieve the other words. Because repeats correlated positively with total and delayed recall scores (Table 4), this strategy may even be fruitful. The possible significance of errors has been discussed above. Likewise unreliable were measures for the
AVLT in Children
557
flatness of the learning curve and for a delayed recall score relative to what was learned before. Geffen et al. (1994) also reported very unsatisfying parallel test-retest reliabilities of similar and other, derived measures. Thus, although performance on the AVLT undoubtedly may provide much clinically relevant information, reliability data advise limited interpretation of features such as the flatness of the learning curve, the number of repeats, and (the nature of the) errors, if the performance is not clearly deficient. Usually, this deficiency is most reliably assessed by the total score, the delayed recall score and the Trial 5 score (cf. Powell et al., 1991). Corresponding to parallel test-retest reliabilities, the rather wide range of differences observed between total scores and between delayed recall scores on the two testing occasions (Table 7) demands caution in interpreting even a considerable difference as evidence that memory ability has changed. Fortunately, by using parallel test forms, any test-retest practice effect was minimal, as was found in other studies (Crawford et al., 1989; Geffen et al., 1994; Shapiro & Harrison, 1990). Similarly, Crossen and Wiens (1994) found no significant practice effect when the AVLT was preceded or followed by the related California Verbal Learning Test. The necessity of using a parallel form for a retest was clearly manifested in the present study by the fact that so many children made intrusion errors (18%). One obvious reason for the moderate reliabilities observed (accounting for age) may be that the focused attention needed for a good performance fluctuates, and perhaps more in children than in adults. As a new finding, we established that the examiner may play a role also, despite the fact that they were well-trained in our study, and that the presentation of the word list was standardized by using a taperecorder. Although this influence was overall not strong, accounting for 1.5% of the variance of both the total and delayed recall scores, in particular the largest differences observed between the examiners suggest that the influence may not be negligible. The factor may be considered for explaining between-study differences. We think that the persistence of encouraging a child to retrieve as many words as possible determines the effect. In retrospect, a good alternative to our study design might have been to construct at least one more equivalent AVLT form and administer two test forms on both the first and second occasions. This could have thrown light on the questions to what extent momentary fluctuations and longer term changes are responsible for the moderate parallel test-retest reliabilities (controlling for age), and how far these reliabilities might be enhanced by using mean test scores of two AVLTs. In any case, it seems wise for a neuropsychological assessment of a school-aged child to add to one AVLT further tests of verbal memory: different tests (cf. Lezak, 1995) and/or another AVLT. In the latter case the Spearman-Brown prophecy formula would predict the reliability to increase from .59 to .74 for the mean total score, and from .52 to .68 for the mean delayed recall score. Whether this is true, however, is a subject for further research.
REFERENCES Ardila, A. (1995). Directions of research in cross-cultural neuropsychology. Journal of Clinical and Experimental Neuropsychology, 17, 143–150. Beery, K. E., & Buktenica, N. A. (1989). Developmental Test of Visual-Motor Integration. Odessa, FL: Psychological Assessment Resources. Berg, I. J., Koning-Haanstra, M., & Deelman, B. G. (1991). Long term effects of memory rehabilitation: A controlled study. Neuropsychological Rehabilitation, 1, 97–111. Bishop, J., Knights, R. M., & Stoddart, C. (1990). Rey Auditory-Verbal Learning Test: Performance of English and French children aged 5 to 16. The Clinical Neuropsychologist, 4, 133–140.
558
W. van den Burg and A. Kingma
Bleecker, M. L., Bolla-Wilson, K., Agnew, J., & Meyers, D. A. (1988). Age-related sex differences in verbal memory. Journal of Clinical Psychology, 44, 403–411. Brouwers, P., & Poplack, D. (1990). Memory and learning sequelae in long-term survivors of acute lymphoblastic leukemia: Association with attention deficits. American Journal of Pediatric Hematology/Oncology, 12, 174–181. Cohen, J., & Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum. Converse, J. M. (1987). Survey research in the United States; roots and emergence 1890–1960. Berkeley: University of California Press. Crawford, J. R., Stewart, L. E., & Moore, J. W. (1989). Demonstration of savings on the AVLT and development of a parallel form. Journal of Clinical and Experimental Neuropsychology, 6, 975–981. Crockett, D. J., Hadjistavropoulos, T., & Hurwitz, T. (1992). Primacy and recency effects in the assessment of memory using the Rey Auditory Verbal Learning Test. Archives of Clinical Neuropsychology, 7, 97–107. Crossen J. R., & Wiens, A. N. (1994). Comparison of the Auditory-Verbal Learning Test (AVLT) and California Verbal Learning Test (CVLT) in a sample of normal subjects. Journal of Clinical and Experimental Neuropsychology, 16, 190–194. Delaney, R. C., Prevey, M. L., Cramer, J., Mattson, R. H., & the VA Epilepsy Cooperative Study #264 Research Group. (1992). Test-retest comparability and control subject data for the Rey-Auditory Verbal Learning Test and Rey-Osterrieth/Taylor Complex Figures. Archives of Clinical Neuropsychology, 7, 523–528. Forrester, G., & Geffen, G. (1991). Performance measures of 7- to 15-year-old children on the Auditory Verbal Learning Test. The Clinical Neuropsychologist, 5, 345–359. Geffen, G., Butterworth, P., & Geffen, L. B. (1994). Test-retest reliability of a new form of the Auditory Verbal Learning Test (AVLT). Archives of Clinical Neuropsychology, 9, 303–316. Geffen, G., Moar, K. J., O’Hanlon, C. R., Clark, C. R., & Geffen, L. B. (1990). Performance measures of 16- to 86-year-old males and females on the Auditory Verbal Learning Test. The Clinical Neuropsychologist, 4, 45–63. Helmstaedter, C., & Durwen, H. F. (1990). VLMT: Verbaler Lern- und Merkfähigkeitstest. Schweizer Archiv für Neurologie und Psychiatrie, 141, 21–30. Heubrock, D. (1994). Auditiv-verbales Lernen unter standardisierten Bedingungen: Erste deutsche Normen für 18- bis 26-jährige Männer und Frauen zum Auditiv-Verbalen Lerntest (AVLT). Zeitschrift für Differentielle und Diagnostische Psychologie, 15, 65–76. Heubrock, D. (1995). Error analysis in neuropsychological assessment of verbal memory and learning. European Journal of Psychological Assessment, 11, 21–28. Isaacs-Glaberman, K., Medalia, A., & Scheinberg, H. (1989). Verbal recall and recognition abilities in patients with Wilson’s disease. Cortex, 25, 353–361. Ivnik, R. J., Malec, J. F., Tangalos, E. G., Petersen, R. C., Kokmen, E. & Kurland, L. T. (1990). The AuditoryVerbal Learning Test (AVLT): Norms for ages 55 years and older. Psychological Assessment, 2, 304–312. Ivnik, R. J., Malec, J. F., Smith, G. E., Tangalos, E. G., Petersen, R. C., Kokmen, E. & Kurland, L. T. (1992). Mayo’s older American normative studies: Updated AVLT norms for ages 56 to 97. The Clinical Neuropsychologist, 6 (Suppl.), 83–104. Jannoun, L., & Chessels, J. M. (1987). Long-term psychological effects of childhood leukemia and its treatment. Pediatric Hematology and Oncology, 4, 293–308. Kalverboer, A. F., & Deelman, B. G. (1964). Voorlopige selectie van enkele tests voor het onderzoeken van geheugenstoornissen van verschillende aard. Unpublished report, Groningen University, Department of Neuropsychology, Groningen, The Netherlands. Kingma, A., Mooyaart, E. L., Kamps, W. A., Nieuwenhuizen, P., & Wilmink, J. T. (1993). Magnetic resonance imaging of the brain and neuropsychological evaluation in children treated for acute lymphoblastic leukemia at a young age. American Journal of Pediatric Hematology/Oncology, 15, 231–238. Kish, L. (1967). Survey sampling. New York: Wiley. Kohnstamm, G. A. (1981). Nieuwe streeflijst woordenschat voor 6-jarigen: Gebaseerd op onderzoek in Nederland en België. Lisse: Swets & Zeitlinger. Lezak, M. D. (1995). Neuropsychological assessment (3rd ed.). New York: Oxford University Press. Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley. Maj, M., D’Elia, L., Satz, P., Janssen, R., Zaudig, M., Uchiyama, C., Starace, F., Galderisi, S., & Chervinsky, A. (1993). Evaluation of two new neuropsychological tests designed to minimize cultural bias in the assessment of HIV-1 seropositive persons: A WHO study. Archives of Clinical Neuropsychology, 8, 123–135. Majdan, A., Sziklas, V., & Jones-Gotman, M. (1996). Performance of healthy subjects and patients with resection from the anterior temporal lobe on matched tests of verbal and visuospatial learning. Journal of Clinical and Experimental Neuropsychology, 18, 416–430. Mitrushina, M., & Satz, P. (1991). Effect of repeated administration of a neuropsychological battery in the elderly. Journal of Clinical Psychology, 47, 790–801.
AVLT in Children
559
Mitrushina, M., Satz, P., Chervinsky, A., & D’Elia, L. (1991). Performance of four age groups of normal elderly on the Rey Auditory-Verbal Learning Test. Journal of Clinical Psychology, 47, 351–356. Neale, J. M., & Liebert, R. M. (1986). Science and behavior (3rd ed.). Englewood Cliffs, NJ: Prentice Hall. Norusis, M. J. (1986). SPSS/PC1. Chicago: SPSS. Petersen, R. C., Smith, G., Kokmen, E., Ivnik, R. J., & Tangalos, E. G. (1992). Memory function in normal aging. Neurology, 42, 396–401. Powell, J. B., Cripe, L. I., & Dodrill, C. B. (1991). Assessment of brain impairment with the Rey Auditory Verbal Learning Test: A comparison with other neuropsychological measures. Archives of Clinical Neuropsychology, 6, 241–249. Reitsma, B., & Deelman, B. G. (1979). Geheugenstoornissen en imagery. Unpublished report, Groningen University, Department of Neuropsychology, Groningen, The Netherlands. Rey, A. (1941). L’examen psychologique dans les cas d’encéphalopathie traumatique. Archives de Psychologie, 28(112), 286–340. Rey, A. (1958). L’examen clinique en psychologie. Paris: Presses Universitaires de France. Reynolds, C. R. (1989). Measurement and statistical problems in neuropsychological assessment of children. In C. R. Reynolds & E. Fletcher-Janzen (Eds.), Handbook of clinical child neuropsychology (pp. 147– 166). New York: Plenum. Ryan, J. J., Paolo, A. M., & Skrade, M. (1992). Rey Auditory Verbal Learning Test performance of a federal corrections sample with acquired immunodeficiency syndrome. International Journal of Neuroscience, 64, 177–181. Sarver, G. S., Howland, A., & McManus, T. (1976). Effects of age and stimulus presentation rate on immediate and delayed recall in children. Child Development, 47, 452–458. Savage, R. M., & Gouvier, W. D. (1992). Rey Auditory-Verbal Learning Test: The effects of age and gender, and norms for delayed recall and story recognition trials. Archives of Clinical Neuropsychology, 7, 407– 414. Selnes, O. A., Jacobson, L., Machado, A. M., Becker, J. T., Wesch, J., Miller, E. N., Visscher, B. & McArthur, J. C. (1991). Normative data for a brief neuropsychological screening battery. Perceptual and Motor Skills, 73, 539–550. Shapiro, D. M., & Harrison, D. W. (1990). Alternate test forms of the AVLT: A procedure and test of form equivalency. Archives of Clinical Neuropsychology, 5, 405–410. Spikman, J. M., Berg, I., & Deelman, B. G. (1995). Spared recognition capacity in elderly and closed-headinjury subjects with clinical memory deficits. Journal of Clinical and Experimental Neuropsychology, 17, 29–34. Spreen, O., & Strauss, E. (1991). A compendium of neuropsychological tests. New York: Oxford University Press. Stallings, G., Boake, C., & Sherer, M. (1995). Comparison of the California Verbal Learning Test and the Rey Auditory Verbal Learning Test in head-injured patients. Journal of Clinical and Experimental Neuropsychology, 17, 706–712. Sutker, P. B., Allain, A. N., Johnson, J. L., & Butters, N. M. (1992). Memory and learning performances in POW survivors with history of malnutrition and combat veteran controls. Archives of Clinical Neuropsychology, 7, 431–444. Taylor, A. E., Saint-Cyr, J. A., & Lang, A. E. (1986). Frontal lobe dysfunction in Parkinson’s disease. Brain, 109, 845–883. Taylor, E. M. (1959). Psychological appraisal of children with cerebral defects. Cambridge, MA: Harvard University Press. Timm, N. H. (1975). Multivariate analysis with applications in education and psychology. Monterey, CA: Brooks/Cole. Tromp, C. N., & Van den Burg, W. (1982). Verbal memory impairment in treated hydrocephalic children. Zeitschrift für Kinderchirurgie, 37, 175–178. Uit den Boogaart, P. C. (1975). Woordfrequenties in geschreven en gesproken Nederlands. Utrecht: Oosthoek, Scheltema & Holkema. Van den Burg, W., Van Zomeren, A. H., Minderhoud, J. M., Prange, A. J. A., & Meijer, N. S. A. (1987). Cognitive impairment in patients with multiple sclerosis and mild physical disability. Archives of Neurology, 44, 494–501. Van Zomeren, A. H., & Brouwer, W. H. (1994). Clinical neuropsychology of attention. New York: Oxford University Press. Wiens, A. N., McMinn, M. R., & Crossen, J. R. (1988). Rey Auditory-Verbal Learning Test: Development of norms for healthy young adults. The Clinical Neuropsychologist, 2, 67–87.