Verbal Paired Associates tests: limits on validity and reliability


Archives of Clinical Neuropsychology 17 (2002) 567–581

Bob Uttl a,∗, Peter Graf b, Laura K. Richter c

a Department of Psychology, Oregon State University, 204C Moreland Hall, Corvallis, OR 97331-5303, USA
b Department of Psychology, University of British Columbia, Vancouver, BC, Canada
c Gustavus Adolphus College, St. Peter, MN, USA

Accepted 21 May 2001

Abstract

The Verbal Paired Associates (VPA) subtest from the Wechsler Memory Scale-III (WMS-III) is one of the most widely used instruments for assessing explicit episodic memory performance. The normative data for the VPA subtest in the WMS-III manual show clear evidence of performance ceiling effects that limit the usefulness of this instrument. For this reason, we developed a new 15-item VPA test and we report normative data obtained from a partially stratified sample of 351 healthy adults between 18 and 91 years of age. Only a small fraction of participants obtained perfect scores on our new Paired Associates test. The results show the expected large age-related decline in memory acquisition, indexed by performance on the first study test trial, together with a much smaller age effect on learning across trials. The results also show that performance on the Paired Associates test is related to education, verbal IQ, and to a lesser extent, participants’ sex. We provide various equations for precise predictions of Paired Associates test performance.

© 2002 National Academy of Neuropsychology. Published by Elsevier Science Ltd. All rights reserved.

Keywords: Memory assessment; Wechsler Memory Scale; WMS; Ceiling effects; Verbal Paired Associates

Note: A preliminary report of this work was presented at the annual meeting of the National Academy of Neuropsychology in San Antonio, TX (November, 1999).
∗ Corresponding author. Tel.: +1-541-737-1374; fax: +1-541-737-3547. E-mail address: [email protected] (B. Uttl).
0887-6177/02/$ – see front matter © 2002 National Academy of Neuropsychology. PII: S0887-6177(01)00135-4

1. Introduction

The Verbal Paired Associates (VPA) test from the Wechsler Memory Scale (WMS) is among the most widely used instruments for assessing explicit episodic memory. It has benefited from

two major revisions. The original test (Wechsler, 1945) required subjects to learn six pairs of related words and four pairs of unrelated words across three study test trials, and it was used to compute a single index of memory acquisition. In 1987, the to-be-remembered materials were changed to consist of four pairs of related and four pairs of unrelated words, and a 30-min delayed recall test was added (Wechsler, 1987). In 1997, the test was revised again (Wechsler, 1997). For this revision called the VPA-WMS-III, the task is to learn eight unrelated word pairs across four study test trials, followed after 30 min by a delayed recall test and a recognition test. The VPA-WMS-III yields a memory acquisition score, a learning score, as well as delayed recall and recognition scores and, thus, is sensitive to diverse aspects of memory functioning. An additional factor favoring its widespread clinical use is that the VPA-WMS-III manual comes with extensive standardization norms obtained from a partially stratified US sample of 1250 healthy adults between 16 and 89 years of age. Our focus in this article is on performance ceiling effects on the VPA-WMS-III. Performance ceiling effects occur with tests that are relatively easy, resulting in a score distribution that is compressed at the upper end. In general, ceiling effects—when performance is at or near the maximum possible—are undesirable. They limit a test’s ability to ferret out differences among high-scoring individuals. They also reduce the true range of scores, leading to an underestimate of sample variability and biasing any derived scores whose computation uses the sample variability. The normative data for the VPA subtest in the WMS-III manual (Wechsler, 1997) show clear evidence of ceiling effects in performance. 
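The claim that a ceiling compresses the score distribution and shrinks the observed variability can be illustrated with a small simulation. This is only a sketch on synthetic data: the latent mean (7), latent S.D. (2), sample size, and random seed are illustrative assumptions and are not taken from the WMS-III norms; only the 8-point maximum corresponds to the test.

```python
import random
import statistics


def simulate_ceiling(mean, sd, max_score, n=10_000, seed=0):
    """Draw n latent scores and censor them to the range [0, max_score]."""
    rng = random.Random(seed)
    latent = [rng.gauss(mean, sd) for _ in range(n)]
    # A test with a fixed maximum cannot register ability above that maximum.
    observed = [min(max(x, 0.0), max_score) for x in latent]
    return latent, observed


# Hypothetical latent ability distribution (assumed values, not normative data).
latent, observed = simulate_ceiling(mean=7.0, sd=2.0, max_score=8.0)

sd_latent = statistics.stdev(latent)
sd_observed = statistics.stdev(observed)
share_at_ceiling = sum(x == 8.0 for x in observed) / len(observed)

print(f"S.D. of latent scores:   {sd_latent:.2f}")
print(f"S.D. of observed scores: {sd_observed:.2f}")
print(f"Share scoring at the maximum: {share_at_ceiling:.1%}")
```

In this sketch the observed S.D. is smaller than the latent S.D., so any derived score that divides by the sample S.D. would be biased in the way described above.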
The manual includes a table for converting raw scores into scaled scores, and we used this table to compute the raw scores corresponding to the 75th percentile, that is, the upper limit of the middle 50% of the sample. Table 1 shows the obtained raw scores for subjects from different age groups for the following: recall on Trial 1 (max = 8), total recall across four study test trials (max = 32), learning, defined as recall on Trial 4 minus recall on Trial 1 (max = 8), and delayed retention (max = 8). The delayed retention scores show the most obvious evidence of a performance ceiling effect. The data in Table 1 highlight that, according to the norms, the top-performing 25% of subjects who are under 25 years of age are expected to score perfectly on the delayed recall test. In addition, the top-performing 25% of all subjects who are under 55 years of age are expected to score within 1 S.D. of the maximum. All of these scores may be invalidated because of ceiling effects. An examination of the difference between the Trial 1 scores and the total recall scores (based on Trials 1–4) reveals ceiling effects in the latter as well. According to the published norms, a 17-year-old individual who is performing at the 75th percentile is expected to recall five of eight words on Trial 1 (see Table 1). The maximum possible total score across all trials is 32 words (i.e., eight words on each of four study test trials), but for a 17-year-old who fails to recall three words on Trial 1, the maximum is 29 (i.e., 32 − 3 words), and this can be achieved only by obtaining perfect scores on Trials 2–4. According to the norms, a 17-year-old at the 75th percentile is expected to obtain a total score of 28 (see Table 1), that is, he/she is expected to obtain perfect scores on two of the three trials after the first. A similar analysis shows that 18–19-year-olds at the 75th percentile are also expected to score perfectly on two trials following the first.
Table 1
WMS-III VPA test performance by a person scoring at the 75th percentile

Age group   Trial 1      Total        Expected             Maximum possible     Delayed
(years)     recall (a)   recall (b)   learning score (c)   learning score (d)   retention (e)
16–17       5            28           5.67                 3                    8
18–19       4            27           6                    4                    8
20–24       3.5 (f)      27           6                    4.5                  8
25–29       3.5          27           6                    4.5                  7.5
30–34       3            25           6                    5                    7.5
35–44       3            25           6                    5                    7.5
45–54       3            25           6                    5                    7.5
55–64       2.5          23           5.5                  5.5                  6.5
65–69       2            20           5                    6                    6
70–74       2            19           5                    6                    5
75–79       1.5          18           5                    6.5                  5
80–84       1.5          18           5                    6.5                  5
85–89       1            13           5                    7                    4

(a) Recall on Trial 1 computed from Table D.2 in Wechsler (1997, pp. 149–161).
(b) Total recall on Trials 1–4 computed from Table D.1 in Wechsler (1997, pp. 135–147).
(c) Expected learning score, defined as Trial 4 minus Trial 1 recall, computed from Table D.2 in Wechsler (1997, pp. 149–161).
(d) Maximum learning score defined by the maximum score on the delayed test (i.e., 8) minus Trial 1 recall.
(e) Delayed retention = recall on the delayed test from Table D.1 in Wechsler (1997, pp. 135–149).
(f) When no value was shown in Tables D.1 or D.2 (Wechsler, 1997), the value was interpolated from the available data.

The top-performing 25% of individuals from the 20–24 and 25–29

age groups are expected to fail to remember a total of only 1/2 word on Trials 2–4, and the top 25% of individuals in the 30–34, 35–44, and 45–54 age groups are expected to fail to remember a total of only two words on Trials 2–4. These observations emphasize that the total recall scores from most high-performing young and middle-aged individuals are affected by performance ceiling effects. The learning scores from the WMS-III manual give a compelling, direct demonstration of the distorting impact of performance ceiling effects. The raw score equivalents corresponding to the learning scores from the WMS-III manual are also listed in Table 1. The tabled values show that a 17-year-old who is performing at the 75th percentile is expected to achieve a learning score of 5.67, a value that is not achievable by any means. Learning is defined as the difference between Trial 4 minus Trial 1 recall (Wechsler, 1997). For the data in Table 1, the maximum possible score for a high-performing individual from the youngest age group is 3 (i.e., 8 − 5 words). Table 1 lists the maximum possible learning scores for the top-performing 25% of individuals from all age groups. The scores show that because of performance ceiling effects, the top-performing 25% of all individuals who are under 55 years of age have expected learning scores that cannot be achieved. We undertook the present investigation for several reasons. The first is to increase awareness of the ceiling effects problems associated with the VPA subtest of the WMS (for other commentaries on ceiling effects problems with earlier versions of the WMS VPA subtest,


see Theisen, Rapport, Axelrod, & Brines, 1998; Trahan, Larrabee, Quintana, Goethe, & Willingham, 1989). A second goal was to augment the arguments made in the preceding paragraphs with direct empirical evidence of performance ceiling effects that are bound to occur with the use of eight-item tests. The third reason was to provide new normative data with a 15-item Paired Associates test that avoids ceiling effects. The WMS-III manual does not give raw score means and standard deviations for the VPA subtest and an additional goal was to obtain such statistics for different age groups. Finally, even though it is recognized that factors such as sex, education, and intelligence can influence explicit episodic memory performance, these factors were not targeted by the WMS-III normative study (see Mitrushina, Boone, & D’Elia, 1999) and our goal was to furnish the missing, potentially valuable data.
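The ceiling argument built on Table 1 comes down to simple arithmetic, and it can be checked mechanically. The following sketch assumes the Table 1 values as transcribed in this article; the names `TABLE_1` and `ceiling_check` are hypothetical helpers introduced here and are not part of the WMS-III materials.

```python
MAX_PER_TRIAL = 8   # word pairs per WMS-III VPA trial
N_TRIALS = 4        # study test trials in the WMS-III VPA

# Age group: (Trial 1 recall, total recall, expected learning score),
# all at the 75th percentile, as listed in Table 1.
TABLE_1 = {
    "16-17": (5.0, 28, 5.67),
    "18-19": (4.0, 27, 6.0),
    "20-24": (3.5, 27, 6.0),
    "25-29": (3.5, 27, 6.0),
    "30-34": (3.0, 25, 6.0),
    "35-44": (3.0, 25, 6.0),
    "45-54": (3.0, 25, 6.0),
    "55-64": (2.5, 23, 5.5),
    "65-69": (2.0, 20, 5.0),
    "70-74": (2.0, 19, 5.0),
    "75-79": (1.5, 18, 5.0),
    "80-84": (1.5, 18, 5.0),
    "85-89": (1.0, 13, 5.0),
}


def ceiling_check(trial1, total, learning):
    """Compare normative scores with what is arithmetically achievable."""
    # Best possible total: Trial 1 recall plus perfect recall on Trials 2-4.
    max_total = trial1 + MAX_PER_TRIAL * (N_TRIALS - 1)
    # Best possible learning score: perfect Trial 4 minus Trial 1 recall.
    max_learning = MAX_PER_TRIAL - trial1
    return {
        "missed_after_trial1": max_total - total,
        "max_learning": max_learning,
        "learning_unachievable": learning > max_learning,
    }


results = {age: ceiling_check(*row) for age, row in TABLE_1.items()}
unachievable = [age for age, r in results.items() if r["learning_unachievable"]]

for age, r in results.items():
    print(age, r)
print("Expected learning score exceeds the maximum for:", unachievable)
```

Under these assumptions the flagged groups are exactly those under 55 years of age, and the number of items missed after Trial 1 is 1 for the youngest group, 1/2 for the 20–24 and 25–29 groups, and 2 for the 30–54 groups, in line with the text.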

2. Method

2.1. Subjects

The subjects were 351 community-living healthy adults, between 18 and 91 years of age, who participated in a study on cognitive aging. They were recruited by means of advertisements in community newspapers and from a volunteer database at the National Institutes of Health in Bethesda, MD. All participants were tested between May 1997 and October 1998. Participants were paid between US$40 and US$50, depending on the time required for testing, for attending a single session that lasted from 3 to 4 h.

The subjects were from the age groups shown in Table 2. Each group spans 10 years, except for the youngest and the oldest, and each group contains about the same number of men and women. The vast majority (98%) of subjects were native English speakers. The table gives demographic and other descriptive data for the entire sample. The participants were comparable in terms of years of formal education, except for the youngest group who averaged significantly fewer years of education than the other groups (by Newman–Keuls, with α = .05). The remaining groups were not significantly different from each other.

Table 2 also shows participants’ performance on two tests of verbal ability: the North American Adult Reading Test (NAART; Spreen & Strauss, 1991) and the Vocabulary subtest from the Wechsler Adult Intelligence Scale-Revised (WAIS-R) (Wechsler, 1981). Finally, the table includes performance on the WAIS-R Digit Symbol subtest, a scale used in the computation of the WAIS-R Performance IQ index (Wechsler, 1981). Regression analyses showed a significant age-related increase in both NAART and WAIS-R Vocabulary raw scores, consistent with previous findings of a small, positive relation between age and verbal intellectual ability (Graf & Uttl, 1995; Spreen & Strauss, 1991, 1998; Uttl & Graf, 1997).
Regression analysis of the Digit Symbol raw scores showed a large and linear age-related decline, consistent with numerous previous findings (Salthouse, 1985, 1988). Participants also responded to the following question: “How is your overall health at the present time? Excellent, good, fair, or poor?” Ratings were translated into values from 1 (poor) to 4 (excellent), and these values were averaged across subjects. Table 2 shows that subjects over 60 years of age rated their overall health slightly lower than the younger subjects.


2.2. Assessment instruments

Participants completed a 3–4 h battery of tests focusing on sensation, perception, visuomotor coordination and speed, attention and processing resources, memory, knowledge, and language skills. A full description of the entire battery of tests appears in Uttl et al. (2001) and Uttl, Graf, and Cosentino (2000). Included in the battery was a new 15-item VPA test, hereafter called the VPA15.

Table 3
VPA15 word pairs and their study and test order presentation by trial

Trial 1
Study order        Recall order
Frog-neck          Rose (flower)
Metal-iron         Fruit (apple)
Foot-tree          Room (face)
School-grocery     Coal (year)
Fruit-apple        Metal (iron)
Hill-ring          School (grocery)
Baby-cries         Hill (ring)
Obey-inch          Frog (neck)
Crush-dark         Cabbage (pen)
Girl-sign          Bank (milk)
Coal-year          Girl (sign)
Room-face          Obey (inch)
Rose-flower        Foot (tree)
Cabbage-pen        Baby (cries)
Bank-milk          Crush (dark)

Trial 2
Study order        Recall order
Bank-milk          Obey (inch)
Frog-neck          Bank (milk)
Room-face          Hill (ring)
School-grocery     Crush (dark)
Coal-year          Coal (year)
Girl-sign          Room (face)
Rose-flower        Foot (tree)
Obey-inch          Girl (sign)
Baby-cries         Baby (cries)
Fruit-apple        Metal (iron)
Metal-iron         Frog (neck)
Cabbage-pen        Fruit (apple)
Foot-tree          Rose (flower)
Crush-dark         School (grocery)
Hill-ring          Cabbage (pen)

“Bold print” identifies word pairs from the WMS. Copyright 1945, renewed in 1972 by The Psychological Corporation, a Harcourt Assessment Company. Adapted and reproduced with permission. All rights reserved. “Wechsler Memory Scale” and “WMS” are trademarks of The Psychological Corporation, a Harcourt Assessment Company, registered in the United States of America and/or other jurisdictions.

The VPA15 included all eight word pairs (four related/easy and four unrelated/difficult) from the WMS/WMS-R VPA subtest¹ plus seven new unrelated/difficult word pairs (Table 3). The VPA15 was administered according to the published instructions for the WMS-R VPA test (Wechsler, 1987) with two exceptions. First, the VPA15 had only two study test trials because pilot work revealed pronounced ceiling effects on subsequent trials. Second, the word pairs and cue words were recorded and presented via loudspeakers, making use of a SoundBlaster AWE64 audio card driven by a 486-class personal computer. This method achieved consistent stimulus presentation conditions across subjects.

2.3. Procedure

Each participant was tested individually, in a single session lasting about 4 h, in a small office on the main campus of the National Institutes of Health in Bethesda, MD. After giving informed consent, subjects completed a large battery of neuropsychological tests including the VPA15. The battery was administered either by psychometricians with extensive testing experience (40% of participants) or by a trained research assistant (60% of participants). Confirmatory analyses revealed no differences in test scores due to examiners.

¹ We used the word pairs from the VPA subtest of the WMS/WMS-R because the WMS-III had not been published when we began the present investigation.

3. Results and discussion

Figure 1 shows performance on the first (top panel) and second trials (bottom panel) of the VPA15. Each point in the figure represents one participant’s score. The figure highlights the expected age-related reduction of performance, especially on Trial 1, and the increased level of performance on Trial 2. The figure underscores that even on Trial 1, a substantial proportion of subjects (13.4%) had scores of ≥8, the maximum score that can be obtained on the VPA subtest of the WMS and WMS-III. The majority of participants (62.3%) recalled eight or more words on Trial 2. These observations provide empirical support for the claim that the published VPA-WMS normative data are marred by serious ceiling effects in performance.

Fig. 1. Number of word pairs recalled on the first trial (top panel) and on the second trial (bottom panel) of the VPA15 by participants’ age. The horizontal line near the middle of each panel shows ceiling performance on the Paired Associates test of the WMS-III.

One goal of the present investigation was to develop normative data for a Paired Associates test that is not limited by performance ceiling effects. The results in Figure 1 show that ceiling effects are not a problem with the VPA15. Only 1% of the participants obtained perfect scores on Trial 1, and only a small portion (14.5%) of participants scored at the ceiling on Trial 2. Table 4 gives the complete normative data for the VPA15, with separate scores for the first and second trials, arranged by midpoint overlapping age groups (Pauker, 1988; Uttl & Graf, 1997). Arranging the data in this manner has two main advantages: it maximizes the practical usefulness of the available data and it smoothes out group-specific irregularities in the data set. The table also lists values for average recall across the two study test trials and for the difference between Trials 1 and 2. The latter is an index of learning across trials. All tabled values highlight the age-related decline familiar from a wealth of prior investigations (Cullum,

Butters, Troster, & Salmon, 1990; Lezak, 1995; Mitrushina et al., 1999; Spreen & Strauss, 1998; Trahan et al., 1989; Wechsler, 1945, 1987, 1997). Separate hierarchical regression analyses that used age and age² (age² was included to capture nonlinear differences across age groups) as predictors revealed significant age-related declines on Trials 1 and 2 performance, as well as on the average of the two trials and on the index of learning (i.e., Trial 2 minus Trial 1 performance). A correlation analysis revealed a moderate link, r = .74, between Trials 1 and 2 recall performance. Based on this value, the reliability of the VPA15 average scores is .85 (using the Spearman–Brown formula, see Anastasi, 1988), thus, comparable to the reliability achieved by


the VPA subtest from the WMS-III (Wechsler, 1997) and by other standardized instruments, such as the California Verbal Learning Test (Delis, Kramer, Kaplan, & Ober, 1987) or the Rey Auditory Verbal Learning Test (Graf & Uttl, 1995; Spreen & Strauss, 1998).

The relation between Trials 1 and 2 recall bears on the claim that aging is associated with a decline in learning (increase from Trials 1 to 2), over and above a decline in acquisition (defined by performance on Trial 1). We used a hierarchical regression analysis to assess this claim. The goal of this analysis was, in a first step, to predict Trial 2 recall by means of the Trial 1 scores (i.e., by partialling out Trial 1 performance), and in a second step, to predict the residuals in terms of age and age². The results showed that Trial 1 performance accounted for 53.2% [F(1, 350) = 394.91] of the variance in Trial 2 recall, and age explained an additional 3.4% [F(1, 349) = 27.12] (age² was not a significant predictor). This outcome indicates that aging has a much larger impact on memory acquisition (8.2%) than on learning across trials.

Consistent with previous findings that sex, education, and IQ affect explicit episodic memory performance under some study test conditions, we also examined the influence of these factors on VPA15 performance. For this purpose, we used the NAART scores to estimate verbal IQ according to the method in Spreen and Strauss (1991). The estimated mean IQ scores are listed in Table 2. Sex was scored as 1 (female) and 0 (male). Table 5 lists the correlations among various VPA15 performance indicators, demographic characteristics (sex, age, education), and indices of verbal IQ (estimates obtained from the NAART scores and from the WAIS-R Vocabulary subtest). The correlations indicate that sex was not related to any of the VPA15 performance measures, but VPA15 performance correlated weakly/moderately (r = .12–.34) with education and verbal IQ.
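The reliability figure quoted earlier in this section for the VPA15 average score can be reproduced from the Trial 1 and Trial 2 correlation. The sketch below uses the general Spearman–Brown prophecy formula with a lengthening factor k; treating the two trials as parallel halves (k = 2) is the assumption behind the reported value, and the function name is a hypothetical helper.

```python
def spearman_brown(r, k=2.0):
    """Predicted reliability when a test with reliability r is lengthened k-fold.

    With k = 2 this is the usual split-half correction, 2r / (1 + r).
    """
    return k * r / (1.0 + (k - 1.0) * r)


# Correlation between Trial 1 and Trial 2 recall reported in the text.
r_trials = 0.74
reliability_average = spearman_brown(r_trials)

print(f"Estimated reliability of the two-trial average: {reliability_average:.2f}")
```

With r = .74 the formula gives 1.48/1.74, which rounds to the .85 reported in the text.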
Table 5
Correlations among performance on the VPA15, sex, age, education, and measures of verbal IQ (NAART and WAIS-R Vocabulary)

Variable                 1      2      3      4      5      6      7      8
1. VPA15 Trial 1
2. VPA15 Trial 2        .73
3. VPA average          .91    .94
4. VPA learning        −.12    .59    .29
5. Sex (a)              .08    .05    .07   −.03
6. Age                 −.29   −.38   −.37   −.23    .01
7. Education (years)    .18    .12    .16   −.04   −.09    .11
8. WAIS-R VIQ (b)       .23    .25    .25    .09    .05    .20    .42
9. Vocabulary (c)       .29    .33    .34    .13   −.02    .11    .45    .74

Correlations printed in bold are significant at P < .05.
(a) Sex was coded as female = 1 and male = 0.
(b) VIQ estimated from scores on the NAART (Spreen & Strauss, 1991).
(c) Number correct on WAIS-R Vocabulary subtest (Wechsler, 1981).

These findings contribute valuable information about the influence of education and IQ that is missing from the WMS-III manual. In order to optimize the usefulness of our normative data and to facilitate their application to clinical decision-making, we developed three sets of prediction equations for each of the VPA15 performance measures from Table 4. For the first equation set, we used only the demographic variables (age, age², education, and sex) as predictors. For the second set,

the demographic variables plus verbal IQ served as predictors. Finally, for the third set, all of the demographic variables plus the WAIS-R Vocabulary raw scores served as predictors. Even though they are partially overlapping, we give all three sets of prediction equations on the assumption that different types of information will be available on different occasions. The prediction equations were derived by means of a forward stepwise regression method that allows specific predictors to enter the equation in the order of their usefulness (i.e., a predictor causing the greatest improvement in a prediction enters the equation first, see Darlington, 1968). The prediction equations are listed in Table 6 (only significant predictors are listed), together with the standard error of prediction and the proportion of variance explained (in the predicted scores) by each equation. The r² values in Table 6 indicate that the inclusion of verbal IQ or WAIS-R Vocabulary scores in the equation improves the prediction of VPA15 performance and these factors account for as much as 10% of the variance in performance. The prediction equations (Table 6) can be used to calculate an expected score based on the information available about an individual, and the corresponding standard error of estimate (S.E.E.), a value analogous to a standard deviation, can be used to determine how far the individual’s obtained score is from the expected/normative score. For an example, consider a 60-year-old woman who has had 13 years of education and whose verbal IQ equals 105. By using the equations in Table 6, we can predict her Trial 2 score to be −2.838 − .083 × age + .150 × verbal IQ = −2.838 − .083 × 60 + .150 × 105 = 7.93, or about eight items.
And if this woman were to complete the VPA15 and obtain a Trial 2 score of 2, for example, we could use the S.E.E. data from Table 6 and compute a corresponding z score in order to quantify the difference between the expected and obtained score. In this case, the difference would be equal to (2 − 7.93)/3.15, that is, a z value of −1.88. A z table shows that a value of −1.88 corresponds to about the third percentile. The z scores can also be used to generate the familiar WMS-III scaled scores according to the following formula: 10 + z score × 3. For our example, the scaled score would be 10 − (1.88 × 3) or 4.36, in the severely impaired range.

Table 6
Equations for predicting performance on VPA15 from demographic characteristics and measures of verbal IQ (NAART and WAIS-R Vocabulary)

Predictors and prediction equations                                              S.E.E.   r²

Age, age², education, and sex as predictors
  Trial 1 = 2.355 − .047 × age + .278 × education + .639 × sex                   2.77     .141
  Trial 2 = 9.494 − .074 × age + .248 × education                                3.32     .173
  Average = 6.211 − .062 × age + .257 × education                                2.81     .179
  Learning = 6.42 − .029 × age                                                   2.37     .054

Age, age², education, sex, and verbal IQ (a) as predictors
  Trial 1 = −4.711 − .053 × age + .087 × verbal IQ + .149 × education            2.71     .177
  Trial 2 = −2.838 − .083 × age + .150 × verbal IQ                               3.15     .258
  Average = −3.507 − .069 × age + .127 × verbal IQ                               2.69     .249
  Learning = 1.886 − .032 × age + .042 × verbal IQ                               2.35     .074

Age, age², education, sex, and vocabulary (b) as predictors
  Trial 1 = −.458 − .049 × age + .127 × vocabulary + .567 × sex                  2.68     .197
  Trial 2 = 3.309 − .079 × age + .180 × vocabulary                               3.08     .289
  Average = 1.641 − .065 × age + .153 × vocabulary                               2.63     .282
  Learning = 3.621 − .031 × age + .051 × vocabulary                              2.34     .080

In the prediction equations, age and education are specified in years, sex is coded 1 for female and 0 for male, Vocabulary is WAIS-R Vocabulary raw score, and verbal IQ (NAART) is computed using the following equation: estimated VIQ = 128.7 − .89 (NAART errors) (Spreen & Strauss, 1991).
(a) Estimated from NAART (Spreen & Strauss, 1998).
(b) WAIS-R Vocabulary raw score (Wechsler, 1981).
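The worked example for the 60-year-old woman can be reproduced step by step. The sketch below uses the Trial 2 equation with verbal IQ and its S.E.E. of 3.15 from Table 6, the NAART-based VIQ estimate from the table note, and the scaled-score formula from the text. The function names are hypothetical helpers, and the percentile is computed from the standard normal distribution function rather than read from a z table.

```python
import math

# S.E.E. of the Trial 2 equation that uses age and verbal IQ (Table 6).
SEE_TRIAL2_VIQ = 3.15


def estimate_viq(naart_errors):
    """Estimated verbal IQ from NAART errors (equation in the Table 6 note)."""
    return 128.7 - 0.89 * naart_errors


def predict_trial2(age, verbal_iq):
    """Expected VPA15 Trial 2 score from age (years) and verbal IQ (Table 6)."""
    return -2.838 - 0.083 * age + 0.150 * verbal_iq


def z_score(obtained, expected, see):
    """Distance of the obtained score from the expected score, in S.E.E. units."""
    return (obtained - expected) / see


def normal_percentile(z):
    """Percentile rank of z under the standard normal distribution."""
    return 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))


def scaled_score(z):
    """Scaled score on the mean 10, S.D. 3 metric described in the text."""
    return 10.0 + 3.0 * z


expected = predict_trial2(age=60, verbal_iq=105)
z = z_score(obtained=2, expected=expected, see=SEE_TRIAL2_VIQ)
percentile = normal_percentile(z)
scaled = scaled_score(z)

print(f"Expected Trial 2 score: {expected:.2f}")
print(f"z score: {z:.2f}")
print(f"Percentile: {percentile:.1f}")
print(f"Scaled score: {scaled:.2f}")
```

When the verbal IQ is not given directly, `estimate_viq` can supply it from the number of NAART errors before calling `predict_trial2`.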

4. Concluding comments

Our results provide empirical support for the claim that severe ceiling effects limit performance on the VPA tests of the WMS-R and WMS-III. Because of such ceiling effects, the WMS-R and WMS-III norms should not be used for making decisions about the memory abilities of the majority of adult individuals, and the WMS-R and WMS-III VPA tests are not appropriate for research with healthy adults. Ceiling effects artificially restrict the range of scores and this confounds correlations between VPA scores and other measures.

The WAIS-R IQ estimates shown in Table 2 may raise concerns about the generalizability of the VPA15 findings reported in this article. The mean derived IQs in Table 2 indicate that our sample scored about 2/3 S.D. above the 1981 WAIS-R normative sample (Wechsler, 1981), suggesting that our sample is special, higher functioning than the general population. However, this conclusion does not seem warranted. The strongest argument against it is the mounting evidence that performance on various measures of intelligence, including the WAIS-R, is rising at a rate of 3–4 IQ points per decade. This has been highlighted in several prominent studies and reviews (see Flynn, 1984, 1987, 1999; Fuggle, Tokar, Grant, & Smith, 1993; Lynn & Pagliari, 1994; Matarazzo, 1990; Neisser, 1998; Uttl & Van Alstine, 2000). To illustrate, the sample used in 1985/1986 for normalizing the WMS-R averaged 103.9 on the WAIS-R (Wechsler, 1987), compared to the WAIS-R average of 100 established for the normative sample tested between 1976 and 1980 (Wechsler, 1981). On the California Verbal Learning Test, a recent normative sample produced a WAIS-R full-scale IQ estimate of 116.3 for healthy elderly persons (Paolo, Tröster, & Ryan, 1997). Tested prior to 1992, a Mayo Clinic normative sample produced estimates of 105.5 for the WAIS-R Verbal IQ and 107.5 for the WAIS-R Performance IQ (Ivnik et al., 1992).
In line with the score inflation revealed by these investigations, if our sample was similar to existing normative samples, it should show WAIS-R IQ scores between 106 and 108 by a conservative estimate, or IQs of about 111 by a liberal estimate. The derived IQ scores in Table 2 are clearly consistent with these estimates obtained from existing normative samples. It appears that our sample was not advantaged by superior cognitive skills, and therefore, the findings reported in this article can be generalized to the general population.

Acknowledgment

This research was supported by Henry M. Jackson Foundation’s funding of Bob Uttl, by an operating grant from the Natural Sciences and Engineering Research Council of Canada to P. Graf, and by equipment loans from Alfalab Research. Part of this research was conducted while Bob Uttl was at the National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD. We thank Joy Bonerba, Stephanie Cosentino, Elizabeth Daniels, Victoria Pharr, Pilar Santacruz, and Kristin Stover for assisting with the project.

References

Anastasi, A. (1988). Psychological testing. New York: Macmillan.
Cullum, C., Butters, N., Troster, A., & Salmon, D. (1990). Normal aging and forgetting rates on the Wechsler Memory Scale-Revised. Archives of Clinical Neuropsychology, 5, 23–30.
Darlington, R. B. (1968). Multiple regression in psychological research and practice. Psychological Bulletin, 69, 161–182.
Delis, D. C., Kramer, J. H., Kaplan, E., & Ober, B. A. (1987). CVLT: California Verbal Learning Test. New York: Psychological Corporation.
Flynn, J. R. (1984). The mean IQ of Americans: Massive gains 1932 to 1978. Psychological Bulletin, 95, 29–51.
Flynn, J. R. (1987). Massive IQ gains in 14 nations: What IQ tests really measure. Psychological Bulletin, 101, 171–191.
Flynn, J. R. (1999). Searching for justice. The discovery of IQ gains over time. American Psychologist, 54, 5–20.
Fuggle, P. W., Tokar, S., Grant, D. B., & Smith, I. (1993). Rising IQ scores in British children: Recent evidence. Journal of Child Psychology and Psychiatry and Allied Disciplines, 33, 1241–1247.
Graf, P., & Uttl, B. (1995). Component processes of memory: Changes across the adult lifespan. Swiss Journal of Psychology, 54, 113–130.
Ivnik, R. J., Malec, J. F., Smith, G. E., Tangalos, E. G., Petersen, R. C., Kokmen, E., & Kurland, L. T. (1992). Mayo’s older Americans normative studies. WAIS-R norms for ages 56 to 97. Clinical Neuropsychologist, 6(Supplement), 1–30.
Lezak, M. (1995). Neuropsychological assessment (3rd ed.). New York: Oxford University Press.
Lynn, R., & Pagliari, C. (1994). The intelligence of American children is still rising. Journal of Biosocial Science, 26, 65–67.
Matarazzo, J. D. (1990). Wechsler’s measurement and appraisal of adult intelligence (5th ed.). Baltimore, MD: Williams & Wilkins.
Mitrushina, M. N., Boone, K. B., & D’Elia, L. F. (1999). Handbook of normative data for neuropsychological assessment. New York: Oxford University Press.
Neisser, U. (Ed.). (1998). The rising curve: Long-term gains in IQ and related measures. Washington, DC: American Psychological Association.
Paolo, A. M., Tröster, A. I., & Ryan, J. J. (1997). California Verbal Learning Test: Normative data for the elderly. Journal of Clinical and Experimental Neuropsychology, 19, 220–234.
Pauker, J. D. (1988). Constructing overlapping cell tables to maximize the clinical usefulness of normative test data: Rationale and an example from neuropsychology. Journal of Clinical Psychology, 44, 11–17.
Salthouse, T. A. (1985). Theory of cognitive aging. Amsterdam: North-Holland.
Salthouse, T. A. (1988). Resource-reduction interpretation of cognitive aging. Developmental Reviews, 8, 238–272.
Spreen, O., & Strauss, E. (1991). A compendium of neuropsychological tests. New York: Oxford University Press.
Spreen, O., & Strauss, E. (1998). A compendium of neuropsychological tests. New York: Oxford University Press.
Theisen, M. E., Rapport, L. J., Axelrod, B. N., & Brines, D. B. (1998). Effects of practice in repeated administration of the Wechsler Memory Scale-Revised in normal adults. Assessment, 5, 85–92.
Trahan, D. E., Larrabee, G. J., Quintana, J. W., Goethe, K. E., & Willingham, A. C. (1989). Development and clinical validation of an expanded paired associate test with delayed recall. Clinical Neuropsychologist, 3, 169–183.
Uttl, B., & Graf, P. (1997). Color–word Stroop test performance across the adult life-span. Journal of Clinical and Experimental Neuropsychology, 19, 405–420.
Uttl, B., Graf, P., Bonerba, J., Santacruz, P., Stover, K., Cosentino, S., & Pharr, V. (2001). Elementary components of speed and executive functions mediate age-related changes in episodic verbal memory. Manuscript in preparation.
Uttl, B., Graf, P., & Cosentino, S. (2000). Exacting assessments: Do older adults fatigue more quickly? Journal of Clinical and Experimental Neuropsychology, 22, 496–507.
Uttl, B., & Van Alstine, C. (2000, November). In 2050, a typical volunteer in psychology experiments will be a genius. Orlando, FL: National Academy of Neuropsychology.
Wechsler, D. (1945). A standardized memory scale for clinical use. Journal of Psychology, 19, 87–95.
Wechsler, D. (1981). Wechsler Adult Intelligence Scale-Revised. San Antonio, TX: Psychological Corporation.
Wechsler, D. (1987). Wechsler Memory Scale-Revised. San Antonio, TX: Psychological Corporation.
Wechsler, D. (1997). Wechsler Memory Scale-III. San Antonio, TX: Psychological Corporation.