J. psychiat. Res., Vol. 24, No. 3, pp. 213-226, 1990. Printed in Great Britain.
0022-3956/90 $3.00 + .00 Pergamon Press plc

ITALIAN MULTICENTRE STUDY ON DEMENTIA (SMID): A NEUROPSYCHOLOGICAL TEST BATTERY FOR ASSESSING ALZHEIMER’S DISEASE

L. BRACCO, L. AMADUCCI, D. PEDONE*, G. BINO†, M. P. LAZZARO‡, F. CARELLA§, R. D’ANTONA‖, R. GALLATO¶ and G. DENES

Department of Neurological and Psychiatric Sciences, University of Florence; *Department of Neurology, University of Bari; †Department of Neurology, University of Genoa; ‡Department of Neurology, University of L’Aquila; §Neurological Institute “C. Besta”, Milan; ‖Department of Neurology, University of Rome; ¶Department of Biostatistics and Epidemiology, Fidia, Abano Terme, Padua and Department of Neurology, University of Padua, Italy.

(Received 29 January 1990; revised 26 March 1990)
Summary-For the Italian Multicentre Study on Dementia, a longitudinal survey on Alzheimer’s disease (AD) initiated in 1982, we developed a neuropsychological test battery for screening, staging and monitoring cognitive impairment in AD patients and for delineating their pattern of cognitive decline. The tests measured higher cortical functions primarily involved in AD, such as short- and long-term memory, orientation, language, and praxis, and spanned a large enough range of difficulty to minimize ceiling and floor effects. We administered this battery to 143 clinically diagnosed AD patients and 146 hospital controls whose scores were corrected for age and educational level. Interrater and test-retest reliability were substantial, as were content and concurrent validity. Five of the battery’s subtests proved capable of accurately screening early demented from non-demented elderly subjects and of staging mild, moderate, severe and very severe mental impairment. The mean performance of subjects classified into these categories differed significantly on all cognitive functions tested. Follow-up studies are in progress.

Address correspondence to: Dr. Laura Bracco, Department of Neurological and Psychiatric Sciences, University of Florence, Policlinico di Careggi, Viale Morgagni 85, 50134 Florence, Italy.

INTRODUCTION
The Italian Multicentre Study on Dementia, a longitudinal survey launched in 1982, had as its primary aim the establishment of a data bank on Alzheimer’s disease (AD) in Italy. This called for a neuropsychological research tool that would (1) enable us to evaluate comprehensively and over time, in normal elderly as well as in severely demented subjects, a series of functions typically compromised in AD, (2) permit rapid and unambiguous screening and staging of mental deterioration, and (3) not be distorted by variables such as age, gender or education of subjects, subjectivity of examiners or repeat administration of tests.
The literature contains many examples of tests for screening and rating mental decline (see ISRAEL, WEINTRAUB, & FILLENBAUM, 1986, for review), some of which seem particularly well designed (HERSCH, 1979; PFEFFER et al., 1981; HUGHES, BERG, DANZIGER, COBEN, & MARTIN, 1982; ROTH et al., 1986; MORRIS et al., 1989). All the same, neuropsychological evaluation of the elderly, normal or demented, still poses methodological problems (MAPOU, 1988; RITCHIE, 1988) and few tests are really suitable for this purpose (HUPPERT & TYM, 1986). For instance, most are not corrected for the effects of age or schooling, while others fail to assign enough importance to fundamental AD deficits such as aphasia and apraxia. Moreover, while some of the more recent studies have generated short batteries for discriminating normals from demented (ESLINGER, DAMASIO, BENTON, & VAN ALLEN, 1985) and from mild AD cases (STORANDT, BOTWINICK, DANZIGER, BERG, & HUGHES, 1984), none of these batteries, so far as we know, has been used for staging cognitive impairment.
We sought to address some of these shortcomings. This paper describes the criteria we followed in the design, standardization and validation of the neuropsychological battery used in the Italian study, the results of which have been published in part (AMADUCCI et al., 1986; AMADUCCI & SMID GROUP, 1988; BRACCO et al., 1989) and will be more fully detailed in future publications. Our aim was to develop a reliable and valid instrument for delineating cognitive function impairment in AD and for its early detection and staging.

METHODS
Neuropsychological battery
The measures making up our neuropsychological battery (NPB) were chosen on the basis of a literature review of previous research on normal aging and dementia. We selected tasks which are portable, relatively easy to administer and, when possible, already standardized on an Italian population. Care was taken to choose subtests whose scoring ranges and difficulty levels covered a wide enough span to minimize “floor” and “ceiling” effects. The final NPB includes scales which examine daily-living activities and affectivity as well as tasks which explore verbal and spatial memory, orientation, calculation, language, writing and reading capacities, and visuomotor functions. The specific behaviours and cognitive functions sampled by each test and its minimum and maximum scores are described in the Appendix. Administration of the total test battery required about 45 min on average and could, when necessary, be divided into two sessions. An additional 5-10 min were required the following day to measure 24 h delayed recall on two of the tests (Five Items and Paired Words).
Subjects
The complete NPB was administered to 143 consecutive patients clinically diagnosed as suffering from AD and to 146 hospital controls, referred to one of the seven neurology departments (Bari, Florence, Genoa, L’Aquila, Milan, Padua, Rome) involved in the Italian Multicentre Study on Dementia (SMID) between April 1982 and December 1983. The AD patients were identified through a standardized protocol (BRACCO & AMADUCCI, 1983; AMADUCCI et al., 1986), the diagnostic criteria of which included: age between 40 and 80 years; slow and progressive decline of mental function for at least 6 months; at least two of the following signs or symptoms: memory loss, impaired recognition of people, things or places, personality changes, behavioural and mood disorders without clear signs of depression; and a score of 30 or less on the Information-Memory-Concentration Test (BLESSED, TOMLINSON, & ROTH, 1968). Clinical history, neurological examination, Hachinski Ischemic Score (HACHINSKI et al., 1975), laboratory and instrumental investigations (including brain CT) were used to exclude dementias other than AD (i.e. multi-infarct dementia, normal pressure hydrocephalus and dementia secondary to alcoholism, head trauma, brain tumour, depression, psychosis and other conditions). These criteria did not substantially differ from the NINCDS-ADRDA guidelines for the clinical diagnosis of AD published in 1984 (MCKHANN et al., 1984). The hospital control patients had diseases of the spinal cord or peripheral nervous system and presented no signs of brain failure, depression or other psychiatric illness. The patient and control groups were comparable with respect to sex ratio, age and schooling (Table 1).
TABLE 1. DEMOGRAPHIC CHARACTERISTICS OF CONTROLS AND AD PATIENTS

                         Controls (n = 146)    AD patients (n = 143)
Sex
  Males                        70                    56
  Females                      76                    87
Age in years
  ≤ 45                          5                     3
  46-55                        31                    23
  56-65                        65                    78
  66-75                        41                    35
  ≥ 76                          4                     4
  mean (SD)                61.6 (5.5)            61.8 (5.9)
  range                      43-80                 45-80
Education in years
  ≤ 3                          18                    33
  4-5                          66                    61
  6-8                          28                    15
  9-13                         21                    17
  14-17                        10                    17
  mean (SD)                 6.7 (4.4)             7.3 (5.0)
  range                      2-17                  2-17
PROCEDURE
Reliability
For each test of the NPB we evaluated inter-rater agreement and reproducibility over time.
Inter-rater agreement. Inter-rater agreement was assessed in two sessions at the Florence centre. In the first, 5 male and 7 female subjects, aged between 47 and 79 years (mean 58, SD 11), were tested by two examiners. Four of these subjects displayed no signs of mental impairment, 4 had mild and 4 had moderate dementia. For half of the cases, examiner A was the interviewer and B the observer; for the other half, their roles were reversed. Both examiners scored the tests. The results were analyzed by the weighted kappa (wk) method, which takes chance agreement into account (HALL, 1974), with weights set so that scores differing by 2 points or more counted as total disagreement (weight = 0) and a one-point difference corresponded to partial agreement (weight = 0.5). In the second session, an independent examiner administered tests to another 6 patients (2 controls, 2 with mild dementia, 1 with moderate and 1 with severe dementia). Their ages ranged from 56 to 66 years (mean 61.5, SD 4.5); 2 were females and 4 were males. Seven raters, one from each centre involved in SMID, who were unaware of the patients’ diagnosis and their degree of mental impairment, acted as observers and simultaneously scored the test results. These were then analyzed by calculating kappa (COHEN, 1960), since no procedures have been developed for measuring generalized agreement among three or more judges (KRAMER & FEINSTEIN, 1981); scores differing by more than 2 points were considered as disagreement whereas differences of 2 points or less were taken as agreement.
Test-retest reliability. Reproducibility over time of NPB components was determined by different methods depending on the subtest under study: repeat administration on two separate occasions (Blessed Dementia Scale, HRSD, Orientation and Gibson Maze), parallel forms (Five Items, Paired Words, Babcock Story, Digit Forward and Corsi Block Tapping Test) and the split-half technique (ANASTASI, 1982), wherein half of a test’s items are presented during the first session and the other half during the second session (Token Test, Set Test, Mental Capacity and Copying Drawings). Tests were administered at intervals of 48-72 h to 28 of our subjects, 18 males and 10 females, between 48 and 80 years old (mean 63.9, SD 8), with either no disorder (10) or with mild (10) to moderate (8) dementia. Reliability was expressed as a Pearson correlation coefficient, which was adjusted by the Spearman-Brown formula for those tests whose reliability was evaluated by the split-half technique.
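The two reliability statistics described above can be sketched as follows. This is a minimal illustration, not the authors' computation: the function names, the example ratings and the use of NumPy are assumptions. The only elements taken from the text are the weighting scheme of the first session (weight 1 for identical scores, 0.5 for a one-point difference, 0 otherwise) and the use of the Spearman-Brown adjustment for split-half correlations.

```python
import numpy as np

def weighted_kappa(rater_a, rater_b, weight_fn):
    """Chance-corrected agreement between two raters.

    weight_fn maps an absolute score difference to an agreement
    weight between 0 (total disagreement) and 1 (full agreement).
    Undefined (division by zero) if both raters use a single score.
    """
    a = np.asarray(rater_a)
    b = np.asarray(rater_b)
    cats = np.union1d(a, b)                      # all score values used
    index = {c: i for i, c in enumerate(cats)}
    k = len(cats)
    observed = np.zeros((k, k))
    for x, y in zip(a, b):
        observed[index[x], index[y]] += 1
    observed /= observed.sum()                   # joint proportions
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    weights = np.array([[weight_fn(abs(ci - cj)) for cj in cats] for ci in cats])
    p_obs = (weights * observed).sum()
    p_exp = (weights * expected).sum()
    return (p_obs - p_exp) / (1.0 - p_exp)

def one_point_partial_weight(diff):
    """Weights described in the text: 1 if equal, 0.5 if one point apart, else 0."""
    if diff == 0:
        return 1.0
    if diff == 1:
        return 0.5
    return 0.0

def spearman_brown(r_half):
    """Full-length reliability estimated from a split-half correlation."""
    return 2.0 * r_half / (1.0 + r_half)

# Hypothetical scores given by two examiners to six subjects on one subtest
examiner_a = [0, 1, 2, 3, 2, 1]
examiner_b = [0, 1, 2, 3, 2, 2]
wk = weighted_kappa(examiner_a, examiner_b, one_point_partial_weight)
```

In this invented example the two examiners differ by one point on a single subject, so the weighted observed agreement is 5.5/6 and the statistic stays well above what chance alone would give.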
Effects of age and educational level
To assess the influence of age and schooling on NPB test scores, both control and AD patients were divided into the following age and education categories: under 45 years, 46-55, 56-65, 66-75, and over 76 years old; less than 3 years of schooling, 4-5, 6-8, 9-13, and 14-17 years respectively. Analysis of variance (ANOVA) was used to examine statistical differences between groups and analysis of covariance (ANCOVA) to derive the regression coefficient for correcting scores.
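The text does not spell out how the regression coefficients were turned into corrected scores. One common approach, shown here purely as an assumption, is to regress control scores on age and years of schooling and then shift every raw score to what it would be at a reference age and education. The function names, reference values and data below are hypothetical.

```python
import numpy as np

def fit_age_education_slopes(scores, age, education):
    """Least-squares slopes of score on age and education (control group)."""
    design = np.column_stack([np.ones(len(scores)), age, education])
    coef, *_ = np.linalg.lstsq(design, np.asarray(scores, dtype=float), rcond=None)
    return coef[1], coef[2]          # slope for age, slope for education

def correct_score(raw, age, education, b_age, b_edu, ref_age, ref_edu):
    """Adjust a raw score to the reference age and education."""
    return raw - b_age * (age - ref_age) - b_edu * (education - ref_edu)

# Hypothetical control data generated from score = 20 - 0.1*age + 0.5*education
age = np.array([50, 60, 70, 55, 65, 75], dtype=float)
education = np.array([3, 5, 8, 13, 17, 5], dtype=float)
scores = 20.0 - 0.1 * age + 0.5 * education

b_age, b_edu = fit_age_education_slopes(scores, age, education)
corrected = correct_score(scores, age, education, b_age, b_edu, ref_age=62.0, ref_edu=7.0)
```

Because the invented data follow the regression line exactly, every corrected score collapses to the same value; with real data the residual variation between subjects would remain.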
Score distribution and task difficulty
We examined the score distribution on each test of the NPB for both control and AD patients and calculated skewness and kurtosis coefficients. As an additional measure of test difficulty, to determine the ability range covered, we used the ratio between the mean score of the demented and the mean of the controls on a particular test. Thus the lower the value of this ratio, the more difficult the task.
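As a small sketch of the quantities just described, the functions below compute the difficulty ratio together with moment-based skewness and excess kurtosis. The moment definitions are an assumption (the text does not say which estimators were used), and the scores are invented.

```python
import numpy as np

def difficulty_index(ad_scores, control_scores):
    """Mean score of the demented divided by mean score of the controls.

    Lower values indicate a task that is harder for the demented group.
    """
    return float(np.mean(ad_scores)) / float(np.mean(control_scores))

def skewness(x):
    """Third standardized moment (population form)."""
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()
    return float((dev ** 3).mean() / x.std() ** 3)

def excess_kurtosis(x):
    """Fourth standardized moment minus 3 (population form)."""
    x = np.asarray(x, dtype=float)
    dev = x - x.mean()
    return float((dev ** 4).mean() / x.std() ** 4 - 3.0)

# Hypothetical scores on one subtest
ad = [1, 2, 3]
controls = [10, 10, 10]
ratio = difficulty_index(ad, controls)
```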
Validity
The battery’s concurrent validity was determined by matrix correlation of each of its subtests with three validated psychometric tools. For this purpose, we selected 45 of our subjects whose mental status ranged from normal to very severely demented. This group of 16 males and 29 females, aged between 48 and 79 years (mean 62, SD 7.6), took the entire NPB and in addition three widely used tests of mental performance: the Mini-Mental State Examination (MMSE; FOLSTEIN, FOLSTEIN, & MCHUGH, 1975), Raven’s Colored Progressive Matrices (P.M. ‘47; RAVEN, 1949), and the Global Deterioration Scale (GDS; REISBERG, FERRIS, DE LEON, & CROOK, 1982). To obtain an index of the NPB’s content validity, its factorial structure was established by factor analysis of the test results from the 143 AD patients. We followed the principal components procedure, with rotation of the factors by the Varimax method (Statistical Package for Social Sciences, version 6.01). Items with a load of 0.40 or higher on a given factor were included, no item being assigned to more than one factor.
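The item-assignment rule at the end of the paragraph above can be illustrated with a few lines of code. The sketch assumes that an item meeting the 0.40 threshold on more than one factor goes to the factor with the largest absolute loading; the text does not state how such cases were resolved, and the loadings shown are invented.

```python
def assign_items_to_factors(loadings, threshold=0.40):
    """Map each item to the index of its single retained factor.

    loadings: dict of item name -> list of rotated loadings, one per factor.
    An item is kept on the factor with the largest absolute loading,
    provided that loading reaches the threshold; otherwise it gets None.
    """
    assignment = {}
    for item, row in loadings.items():
        best = max(range(len(row)), key=lambda j: abs(row[j]))
        assignment[item] = best if abs(row[best]) >= threshold else None
    return assignment

# Hypothetical rotated loadings on two factors
example_loadings = {
    "item_a": [0.82, 0.21],
    "item_b": [0.45, 0.70],
    "item_c": [0.30, 0.12],
}
factors = assign_items_to_factors(example_loadings)
```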
Accuracy in screening mentally healthy elderly from demented, and in staging mental impairment
Sensitivity and specificity of each NPB test in distinguishing normal from pathological performance were established through a decision matrix (MCNEIL, KEELER, & ADELSTEIN, 1975) applied to our 143 patients and 146 controls.
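A decision matrix of this kind reduces to four counts. The sketch below assumes that a score below a cut-off counts as a positive (pathological) result; the cut-off, the function name and the data are hypothetical and are not taken from the study.

```python
def screening_accuracy(scores, is_patient, cutoff):
    """Sensitivity and specificity of a 'score below cutoff' rule."""
    pairs = list(zip(scores, is_patient))
    true_pos = sum(1 for s, p in pairs if p and s < cutoff)
    false_neg = sum(1 for s, p in pairs if p and s >= cutoff)
    true_neg = sum(1 for s, p in pairs if not p and s >= cutoff)
    false_pos = sum(1 for s, p in pairs if not p and s < cutoff)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity

# Hypothetical scores: three patients followed by three controls
scores = [2, 3, 9, 8, 10, 4]
is_patient = [True, True, True, False, False, False]
sens, spec = screening_accuracy(scores, is_patient, cutoff=5)
```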
To recognize tests maximally useful in identifying several stages of mental impairment, we carried out a study involving two groups of 35 subjects each. Two neurologists with no access to neuropsychological test results or diagnostic labels independently examined these 70 subjects (49 AD patients and 21 controls) and classified them as affected by “no”, “mild”, “moderate”, “severe”, or “very severe” cognitive decline. Their ratings were based on the Global Deterioration Scale (GDS; REISBERG et al., 1982) which describes seven levels of dementia in terms of specific clinical and behavioral criteria. Since the size of our sample did not warrant setting up more than five groups, we combined stages 2 and 3 of the GDS into a single level (“mild cognitive decline”) and likewise treated stages 4 and 5 as “moderate cognitive decline”. To reduce inter-rater variability, the two reviewing neurologists discussed in advance the administration and scoring of the GDS-a procedure which resulted in perfect agreement. The first group of subjects (training group) was composed of 16 men and 19 women whose ages ranged from 51 to 75 years (mean 62.3, SD 6.6) and who had from 3 to 17 years of schooling (mean 7.05, SD 4.5). The two neurologists classified eight of them as mentally healthy aged persons, ten as having mild dementia, seven moderate dementia, seven severe and three very severe dementia. The second group (validation group) consisted of 15 males and 20 females aged between 51 and 79 years (mean 64.2, SD 6.9): their educational level ranged from 3 to 17 years of schooling (mean 7.4, SD 4.6). Thirteen were classified as non-demented, nine as mildly, eight as moderately, three as severely and two as very severely demented. 
The first group was tested with the NPB and their results treated by stepwise discriminant analysis (Statistical Package for Social Sciences, version 6.01) to identify those tests best able to discriminate among our five degrees of mental impairment, along with their relative weights. Subtests in our battery (such as the BDS) asking for information similar to that covered by the GDS were eliminated from the discriminant analysis in order to avoid part-whole correlations with the external criterion. The canonical variables obtained from the discriminant analysis were then applied to the NPB test results of the validation group. The 5-fold classification thus derived for the latter group was then compared with the corresponding neurologists’ ratings. Finally, to obtain more stable coefficients, a second discriminant analysis was carried out on the larger sample (n = 70) made up of the pooled training and validation groups. In classifying subjects into levels of mental impairment, each patient was assigned to the level to which he had the highest probability of belonging, as determined from the discriminant scores and the pooled within-groups covariance matrix for the discriminant functions. The agreement between these results and the neurologists’ staging was evaluated by the weighted kappa (wk), with linear agreement weights ranging from a maximum of 1 to a minimum of 0 (HALL, 1974). The degree of mental impairment of the same 70 patients was re-assessed 1 year later, based again on the discriminant analysis variables and the judgment of two other “blind” neurologists.
Comparison of test scores in different stages of mental impairment
The staging procedure was extended to our entire sample (143 AD patients and 146 controls) and differences in test performance among the resulting five groups were examined by analysis of variance (ANOVA).
RESULTS
Reliability
Inter-rater agreement. For the NPB as a whole, agreement between two examiners, evaluated by the weighted kappa (wk) index (n = 12), was 0.88 with an estimated 95% confidence interval (CI) of 0.79-0.97. Agreement among seven examiners judging six subjects was 0.86 (95% CI: 0.75-0.97) as measured by the kappa (k) index.
Test-retest reliability. Repetition of the battery after a two- to three-day interval on 28 subjects yielded a mean reliability coefficient of 0.84 (95% CI: 0.76-0.91).
Effects of age and educational level
In the demented population, no effect on cognitive performance attributable to age or education could be detected. In the control population there was a significant, inverse relationship between test scores and age for about one-third of the battery (Mental Capacity, Five Items/Acquisition and Recall, Corsi, Copying Drawings and Gibson Maze), while amount of schooling was positively related to scores on more than one-half of the subtests (same tests as for age, except Gibson Maze, plus Babcock Story, Paired Words/Acquisition and Recall, Set Test and Token Test). Consequently, scores on these tests of both AD and normal subjects were routinely corrected for age and schooling. Gender did not affect test scores for either group.
Score distribution and task difficulty
An index of the difficulty that NPB subtests posed for the demented was computed from the ratio of their mean scores on each task to that of the controls. Expressed in these terms, the difficulty measure ranged from a low of 0.10 on the Babcock Story/10’ delayed recall, the most difficult test, to a high of 0.77 on Personal Memory, the easiest of our tasks. The control group performed well on all NPB subtests, so that the results on most of them were skewed toward the right. However, scores on several tasks exploring memory, such as Five Items, Digit, Babcock Story, Corsi test and the Gibson Maze, were distributed more symmetrically. In AD patients, score distributions on information, orientation, mental capacity and concentration tasks were approximately symmetrical, while verbal and spatial memory test results showed an asymmetry toward the left. Personal Memory, the Set Test, the Gibson Maze and the Token Test yielded distributions asymmetrical towards the right in the AD population, too.
Validity
All tests, except HRSD, were highly intercorrelated (Table 2); they showed high correlations as well with MMSE and GDS, and lower ones with P.M. ‘47 (Table 2, lower part). The Varimax rotation applied to the factor analysis of the AD patients’ data produced two factors (Table 3) which accounted for 95% of the common variance of the items. The first factor could be characterized as memory and orientation (13 items) while the second was mainly saturated by tasks measuring concentration, language and visuomotor abilities (8 items).
TABLE 2. INTERMATRIX CORRELATION OF NPB SUBTESTS WITH ONE ANOTHER AND WITH THREE OTHER PSYCHOMETRIC MEASURES (MMSE, P.M. ‘47, GDS)
[Correlation matrix. Rows and columns: BDS, HRSD, Information, Orientation, Concentration, Mental Capacity, Personal Memory, Non Personal Memory, Digit Forward, Babcock Story, Babcock Story 10’ recall, Five Items Acquisition, Five Items 10’ recall, Five Items 24h recall, Paired Words Acquisition, Paired Words 10’ recall, Paired Words 24h recall, Corsi Block Tapping Test, Set Test, Token Test, Copying Drawings, Gibson Maze. Lower part: MMSE, P.M. ‘47, GDS.]
TABLE 3. FACTOR ANALYSIS OF NPB SUBTESTS ON 143 AD PATIENTS

NPB subtest                      Loading (Factor 1 or Factor 2)
Blessed Dementia Scale           0.614
Information                      0.664
Orientation                      0.635
Concentration                    0.736
Mental Capacity                  0.703
Personal Memory                  0.542
Non Personal Memory              0.634
Digit Forward                    0.646
Babcock Story                    0.762
Babcock Story 10’ recall         0.693
Five Items Acquisition           0.824
Five Items 10’ recall            0.x43
Five Items 24h recall            0.870
Paired Words Acquisition         0.798
Paired Words 10’ recall          0.814
Paired Words 24h recall          0.859
Corsi Block Tapping Test         0.668
Set Test                         0.618
Token Test                       0.745
Copying Drawings                 0.742
Gibson Maze                      0.728
Accuracy in screening mentally healthy elderly from demented, and in staging several levels of mental impairment
With the exception of the HRSD, each test of the NPB was able to distinguish between AD and non-demented subjects. The battery on the whole had a sensitivity of 0.90 (estimated 95% CI: 0.86-0.94) and a specificity of 0.87 (estimated 95% CI: 0.84-0.90). All but two patients (94.3%) of the training set (n = 35) were correctly classified into five levels of mental impairment on the basis of the combination of tests which emerged as the most discriminating from the first stepwise analysis. These were: Personal Memory (PM), Paired Words Acquisition (PWA), Five Items/24h delayed recall (FI/24h), Mental Capacity (MC) and Orientation. Applying the weighting obtained from this analysis to the validation group (n = 35) revealed that 30 subjects (86%) had been correctly classified, that is, the degree of their mental impairment corresponded to that expressed by two neurologists. The second discriminant analysis, performed on the combined groups, showed that these variables and their order of entry in the stepwise analysis were the same as in the first discriminant analysis. On the basis of this final discriminant analysis, two canonical variables were obtained with the following unstandardized coefficients and constants:
(1) (-0.069)FI/24h + (-0.188)PM + (-0.027)PWA + (-0.023)MC + (-0.050)Orientation + 2.546
(2) (-0.129)FI/24h + (0.169)PM + (-0.188)PWA + (0.072)MC + (0.032)Orientation - 2.804
Figure 1 graphically depicts the result of applying these equations to each subject’s test performance. The cut-off points along canonical variable 1 turned out to be: -0.6 between normality and mild deficits, 0.2 between mild and moderate, 1.0 between moderate and severe, and 1.9 between severe and very severe impairment. Values greater than 1.0 along canonical variable 2, and falling in the canonical variable 1 range between 0.5 and 1.5, correspond to the moderate mental impairment stage. This procedure served to classify correctly 90% of our 70 patients. Specifically, the mentally healthy were correctly classified in 100% of the cases, the mildly impaired in 84%, the moderately demented in 86%, and the severely and very severely demented in 80% and 100% of the cases respectively. The agreement between observed (NPB) and predicted (neurologists’) results was 0.93, as measured by wk; in no case did predicted and observed outcomes differ by more than one of our stages.
Re-assessed the following year, none of the 21 controls had become demented. Of the 19 patients with mild dementia at entry, 9 had moderate dementia one year later, 5 of the 15 moderately demented had progressed to severe dementia, and of the 10 severely demented cases, 4 had worsened to very severe in the interim. The accuracy of classification into levels of mental impairment in terms of the canonical variables was 100% for the normals and 86% for mild, 92% for moderate, 83% for severe and 100% for very severe dementia. The agreement between observed and predicted results was 0.94.

FIG. 1. Scatter plot depicting group means (*) with associated cases on two derived canonical variables. Symbols distinguish no, mild, moderate, severe and very severe mental impairment. Open symbols indicate the training group, solid symbols the validation group of subjects.
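To make the staging rule concrete, the sketch below applies the two canonical variables and the cut-off points exactly as printed in the text. It is an illustration under stated assumptions rather than the authors' procedure: in the study, subjects were assigned to the most probable level from the discriminant analysis; the dictionary keys, the function names and the treatment of scores that fall exactly on a cut-off are hypothetical. The example inputs are the group means of the five subtests reported in Table 4.

```python
# Unstandardized coefficients and constants as printed in the text
CV1_COEF = {"FI24h": -0.069, "PM": -0.188, "PWA": -0.027, "MC": -0.023, "Orientation": -0.050}
CV1_CONST = 2.546
CV2_COEF = {"FI24h": -0.129, "PM": 0.169, "PWA": -0.188, "MC": 0.072, "Orientation": 0.032}
CV2_CONST = -2.804

# Upper bounds along canonical variable 1 for each stage
# (assumption: a value equal to a cut-off goes to the higher stage)
CV1_CUTOFFS = [(-0.6, "no"), (0.2, "mild"), (1.0, "moderate"), (1.9, "severe")]

def canonical_scores(subtests):
    """Return (canonical variable 1, canonical variable 2) for one subject."""
    cv1 = sum(CV1_COEF[k] * subtests[k] for k in CV1_COEF) + CV1_CONST
    cv2 = sum(CV2_COEF[k] * subtests[k] for k in CV2_COEF) + CV2_CONST
    return cv1, cv2

def stage(subtests):
    """Stage of mental impairment from the five corrected subtest scores."""
    cv1, cv2 = canonical_scores(subtests)
    # Rule stated in the text for canonical variable 2
    if cv2 > 1.0 and 0.5 <= cv1 <= 1.5:
        return "moderate"
    for upper, label in CV1_CUTOFFS:
        if cv1 < upper:
            return label
    return "very severe"

# Group means from Table 4, used only as illustrative inputs
group_means = {
    "no":          {"FI24h": 15.4, "PM": 7.0, "PWA": 13.3, "MC": 10.2, "Orientation": 11.7},
    "mild":        {"FI24h": 9.2,  "PM": 6.5, "PWA": 7.8,  "MC": 8.9,  "Orientation": 9.2},
    "moderate":    {"FI24h": 3.3,  "PM": 6.3, "PWA": 3.9,  "MC": 7.0,  "Orientation": 6.3},
    "severe":      {"FI24h": 2.2,  "PM": 3.9, "PWA": 1.5,  "MC": 5.2,  "Orientation": 4.2},
    "very severe": {"FI24h": 0.0,  "PM": 0.8, "PWA": 0.0,  "MC": 0.3,  "Orientation": 1.0},
}
staged = {label: stage(means) for label, means in group_means.items()}
```

With these inputs, canonical variable 1 alone places each group mean inside the interval for its own stage, which is a useful consistency check on the printed coefficients of equation (1).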
Comparison of test scores in different stages of mental impairment
The staging of the degree of cognitive decline in controls and AD patients confirmed that controls were free from mental impairment while 26.5% of the patients presented mild, 37.7% moderate, 31.5% severe, and 4.3% very severe mental impairment. ANOVA applied to these groups showed significant differences in NPB test scores among patients with different degrees of mental deficit (Table 4). On most tests, score differences between three, or occasionally two, contiguous stages were significant; indeed scores on a few tests differed significantly among all impairment levels. Moreover, all but four of the subtests differentiated mildly impaired from normal subjects, so that the battery clearly is capable of detecting dementia in its incipient phases.
TABLE 4. MEAN NPB TEST SCORES OF CONTROLS AND OF AD PATIENTS WITH DIFFERENT LEVELS OF MENTAL IMPAIRMENT

                                          Degree of mental impairment
NPB subtest                 No           Mild         Moderate     Severe       Very severe   F
                            (n = 146)    (n = 38)     (n = 54)     (n = 45)     (n = 6)
Blessed Dementia Scale       0.4 (1.3)    7.3 (3.1)    9.4 (3.3)   13.1 (2.7)   12.1 (5.1)   25.5
Depression (HRSD)           24.3 (2.5)   29.7 (4.5)   30.1 (4.2)   32.1 (5.6)   28.1 (7.4)    2.7
Information                 11.8 (0.4)    9.4 (1.8)    6.3 (2.1)    4.8 (2.0)    1.0 (0.3)   49.5
Orientation                 11.7 (0.8)    9.2 (1.9)    6.3 (2.2)    4.2 (2.0)    1.0 (1.0)   52.6
Concentration                5.9 (0.3)    4.5 (1.3)    3.4 (1.4)    2.1 (1.0)    1.0 (0.8)   33.3
Mental Capacity             10.2 (1.2)    8.9 (2.2)    7.0 (2.7)    5.2 (2.9)    0.3 (0.6)    2.5
Personal Memory              7.0 (0.0)    6.5 (0.7)    6.3 (0.8)    3.9 (1.2)    0.8 (0.7)   13.3
Non Personal Memory          7.8 (1.8)    4.7 (2.3)    3.1 (2.0)    1.4 (1.3)    0.3 (0.6)   24.2
Digit Forward                4.2 (1.2)    3.5 (1.5)    2.9 (1.4)    1.7 (1.3)    0.0 (0.0)   15.8
Babcock Story                9.1 (4.2)    2.4 (2.0)    1.5 (1.4)    0.6 (1.3)    0.0 (0.0)   10.6
Babcock Story 10’ recall    11.2 (5.0)    2.2 (2.4)    1.4 (2.0)    0.3 (0.8)    0.0 (0.0)    9.4
Five Items Acquisition      11.8 (1.9)    7.0 (3.2)    2.9 (2.7)    2.0 (2.7)    0.0 (0.0)   26.1
Five Items 10’ recall       17.6 (1.9)   12.5 (4.2)    5.7 (4.6)    2.9 (4.5)    0.0 (0.0)   38.9
Five Items 24h recall       15.4 (5.0)    9.2 (2.6)    3.3 (3.1)    2.2 (3.3)    0.0 (0.0)   48.5
Paired Words Acquisition    13.3 (2.5)    7.8 (3.2)    3.9 (3.5)    1.5 (2.4)    0.0 (0.0)   33.3
Paired Words 10’ recall     21.2 (2.8)   12.4 (5.3)    7.2 (5.7)    2.8 (3.9)    0.0 (0.0)   30.7
Paired Words 24h recall     20.3 (3.1)   10.5 (4.9)    4.9 (4.4)    1.6 (2.3)    0.0 (0.0)   37.5
Corsi Block Tapping Test     4.0 (1.2)    1.7 (1.7)    1.2 (1.3)    0.8 (1.1)    0.0 (0.0)    3.6
Set Test                    38.5 (1.6)   34.2 (6.2)   27.7 (8.7)   12.9 (8.8)    1.3 (1.5)   69.0
Token Test                  33.1 (2.2)   25.1 (6.5)   21.2 (7.5)   14.3 (6.7)    3.4 (3.5)   26.2
Copying Drawings            31.2 (1.9)    9.4 (5.7)    6.1 (5.3)    4.5 (5.2)    0.0 (0.0)    7.0
Gibson Maze                 10.9 (0.8)    7.7 (4.2)    6.5 (4.5)    4.4 (4.4)    0.0 (0.0)    6.5

*control vs. mild; †mild vs. moderate; ‡moderate vs. severe; §severe vs. very severe.
Standard deviation values in parentheses.
DISCUSSION
Our design of the NPB sought to take into account some of the major comments and criticisms levelled at neuropsychological assessment of the demented in 1982 when we started our project, many of which still apply today (MAPOU, 1988; RITCHIE, 1988). While we do not claim to have devised the ideal battery, we do feel that the NPB reliably evaluates a large enough number of cognitive functions so as to yield a detailed neuropsychological profile. Test scores were corrected for the subject’s age and education, variables which influence commonly used measures of dementia so that low scores imply greater deterioration in younger than in older people and in more than in less educated persons (KITTNER et al., 1986). We used the coefficients derived from ANCOVA of the control group’s test scores to adjust the data from both demented and control subjects. The latter, it should be recalled, were patients suffering from non-dementing neurological diseases whose inclusion in the study was intended to equalize the effect of poor health on mental performance (LEHR, 1983; FILLENBAUM, HUGHES, HAYMAN, GEORGE, & BLAZER, 19..). Their cognitive functioning appeared to be more influenced by education than by age, although this may be due to the relatively restricted age range of the group. For the AD patients, while we detected no age- or education-related impact on their performance, the disorder itself may mask such effects.
Because we wanted the NPB to reflect the competence range typical of patients referred for neuropsychological assessment, that is, from normal to very poor performance, we decided to include in our battery tests sampling specific cognitive functions varying in difficulty. Although many tests appeared too easy for control subjects (skewed to the right) and some too difficult for mentally impaired people (skewed to the left), non-demented subjects only exceptionally showed a ceiling effect and floor effects were observed only in patients who were profoundly demented. Thus, we feel that the battery spans a broad enough performance spectrum, a conclusion further reinforced by the difficulty index range of its tests.
The validity of neuropsychological measures remains a thorny issue because of the problem of adequate external criteria against which to compare them. However, tests and scales do exist which have come to be well-recognized instruments for mental status assessment. The high correlations of NPB subtests with some of these dementia ratings point to the battery’s concurrent validity. In particular, the high correlations with the GDS, which probes situational aspects of the AD patients’ lives, suggest that the NPB explores cognitive functions critically involved in everyday experience. The NPB tests appear to be less influenced by general intelligence, as measured by P.M. ‘47, than are other psychometric measures such as the MMSE. The NPB’s internal validity is implied by the strong correlation of all its subtests with one another and by the factor analysis results showing two main factors that correspond to functions characteristically involved in AD: Memory-Information and Concentration-Language-Visuoconstructive capacities.
The number of measures needed to depict the cognitive status of AD patients depends on the use one intends to make of the tests.
A comprehensive battery, such as the NPB, yields detailed neuropsychological profiles essential for comparing the pattern of mental decline in AD and in other forms of dementia, for establishing correlations between neuroimaging features and specific cognitive deficits such as aphasia and apraxia, and for assessing the prognostic value of certain neuropsychological symptoms (severity of aphasia, for instance). BERG et al. (1988) feel that, in longitudinal studies, a global measure of dementia from a series of neuropsychological tests provides more information than can be obtained from a brief mental status questionnaire or a single cognitive measure. Yet when the evolution of dementia is measured through changes in all cognitive functions explored by extensive batteries, the results can also be contradictory and their interpretation ambiguous, since some capacities may decline faster than others or be more sensitive to therapeutic intervention or to other variables. More concretely, the question may be asked whether a patient with moderate memory loss and mild language and visuomotor deficits is better or worse off cognitively than one with moderate memory loss, moderate language deficit and normal visuomotor functions; or again, how to assess a patient who, on follow-up, displays a worsening of memory, an improvement in orientation and mood, and no change in language and praxis, as compared to his previous examination. Considerations such as these led us to search for a more restricted number of tests that might suffice to classify mental deterioration into several distinct levels. To that end, we used discriminant function analysis, a robust technique recently employed by several investigators to distinguish the normal elderly from the demented as a whole (ESLINGER et al., 1985) or from those with mild AD (STORANDT et al., 1984), though no one, as far as we know, has applied this procedure to the identification of more than three groups (KATZMAN et al., 1983).
The combined scores from five of the NPB’s subtests permitted us to classify dementia as very severe, severe, moderate, mild or absent with an accuracy of 90%. When each subject was assigned to the most probable one of these categories on the basis of his performance, most NPB tests revealed significant differences between various impairment levels. In our view, these results reflect the adequacy of our staging procedure as well as the battery’s ability to register changes in specific cognitive functions. The NPB lends itself to monitoring mental performance in longitudinal studies on AD, one of the principal aims of the SMID project. In a sense, we have demonstrated as much in this paper, if it is true that “comparing Senile Dementia of the Alzheimer’s Type (SDAT) groups differing in severity provides for the same observations as following up subjects with mild SDAT over time” (BOTWINICK, STORANDT, BERG, & BOLAND, 1988). In any event, we have been following our sample over the past 7 years (BRACCO et al., 1989) and plan to present other relevant results in the near future.
Acknowledgements-We thank all the other participants of the SMID Group: E. Ferrari, P. Livrea, Dept. of Neurology, University of Bari; C. Loeb, M. Tabaton, Dept. of Neurology, University of Genoa; A. Lippi, Dept. of Neurology, University of Florence; M. Prencipe, M. L. Bonatti, Dept. of Neurology, University of L’Aquila; T. Caraceni, Neurological Institute “C. Besta”, Milan; L. Battistin, B. Tavolato, S. Ferla, Dept. of Neurology, University of Padua; C. Fieschi, P. Gentile, Dept. of Neurological Sciences, University of Rome “La Sapienza”. Thanks are due to Dr. Eda Berger Vidale for her critical reading and creative editing of the text. This work was supported in part by CNR grant No. 86.01689.56.
REFERENCES
AMADUCCI, L., FRATIGLIONI, L., ROCCA, W., FIESCHI, C., LIVREA, P., PEDONE, D., BRACCO, L., LIPPI, A., GANDOLFO, C., BINO, C., PRENCIPE, M., BONATTI, M., GIROTTI, F.,
CARELLA, F., TAVOLATO, B., FERLA, S., LENZI, G. L., CAROLEI, A., GAMBI, A., GRIGOLETTO, F., & SCHOENBERG, B. (1986). Risk factors for clinically diagnosed Alzheimer’s disease: A case-control study of an Italian population. Neurology 36, 922-931.
AMADUCCI, L. & SMID Group (1988). Phosphatidylserine in the treatment of Alzheimer’s disease: results of a multicenter study. Psychopharmacology Bulletin 24, 130-135.
ANASTASI, A. (1982). Psychological testing (5th ed.). New York: Macmillan.
ARRIGONI, G. & DE RENZI, E. (1964). Constructional apraxia and hemispheric locus of lesions. Cortex 1, 170-178.
BABCOCK, H. & LEVY, L. (1930). An experiment in the measurement of mental deterioration. Archives of Psychology 117, 105-107.
BERG, L., MILLER, J. P., STORANDT, M., DUCHEK, J., MORRIS, J. C., RUBIN, E. H., BURKE, W. J., & COBEN, L. A. (1988). Mild senile dementia of the Alzheimer type: 2. Longitudinal assessment. Annals of Neurology 23, 477-484.
BLESSED, G., TOMLINSON, B. E., & ROTH, M. (1968). The association between quantitative measures of dementia and of senile change in the cerebral grey matter of elderly subjects. British Journal of Psychiatry 114, 797-811.
BOTWINICK, J., STORANDT, M., BERG, L., & BOLAND, S. (1988). Senile dementia of the Alzheimer type: subject attrition and testability in research. Archives of Neurology 45, 493-496.
BRACCO, L. & AMADUCCI, L. (1983). A clinical protocol for the assessment of senile dementia of the Alzheimer type: a progress report. In W. H. Gispen & G. Traber (Eds), Aging of the brain (pp. 275-281). Amsterdam: Elsevier Science.
BRACCO, L., GALLATO, R., GRIGOLETTO, F., LIPPI, A., AMADUCCI, L., & SMID Group (1989). Survival in presenile and senile Alzheimer’s disease. Journal of Neural Transmission, Parkinson’s Disease and Dementia Section 1, 39.
COHEN, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20, 37-46.
DE RENZI, E. & FAGLIONI, P. (1978).
Normative data and screening power of a shortened version of the token test. Cortex 14, 41-49.
ESLINGER, P. J., DAMASIO, A. R., BENTON, A. L., & VAN ALLEN, M. (1985). Neuropsychological detection of abnormal mental decline in older persons. JAMA 253, 670-674.
FILLENBAUM, G. C., HUGHES, D. C., HEYMAN, A., GEORGE, L. K., & BLAZER, D. G. (1988). Relationship of health and demographic characteristics of mini-mental State examination score among community residents. Psychological Medicine 18, 719-726.
FOLSTEIN, M. F., FOLSTEIN, S. E., & MCHUGH, P. R. (1975). “Mini-mental state”: a practical method for grading the cognitive state of patients for the clinician. Journal of Psychiatric Research 12, 189-198.
HACHINSKI, V. C., ILIFF, L. D., ZILKHA, E., Du BOULAY, G. H., MCALLISTER, V. L., MARSHALL, J., RUSSELL, R. W., & SYMON, L. (1975). Cerebral blood flow in dementia. Archives of Neurology 32, 632-637.
HALL, F. (1974). Inter-rater reliability of rating scales. British Journal of Psychiatry 125, 248-255.
HAMILTON, M. (1967). Development of a rating scale for primary depressive illness. British Journal of Social and Clinical Psychology 6, 278-296.
HERSCH, E. L. (1979). Development and application of the extended scale for dementia. Journal of the American Geriatric Society 32, 348-354.
HUGHES, P. H., BERG, L., DANZIGER, W. L., COBEN, L. A., & MARTIN, R. C. (1982). A new clinical scale for the staging of dementia. British Journal of Psychiatry 140, 566-572.
HUPPERT, F. A. & TYM, E. (1986). Clinical and neuropsychological assessment of dementia. British Medical Bulletin 42, 11-18.
ISAACS, B. & KENNIE, A. T. (1973). The set test as an aid to the detection of dementia in old people. British Journal of Psychiatry 123, 467-470.
ISRAEL, L., WEINTRAUB, L., & FILLENBAUM, G. G. (1986). Assessing the dementias in clinical practice and population surveys: review of the literature since 1965. In A. Bes (Ed.), Senile dementias. Early detection (pp. 592-603). London: John Libbey Eurotext.
KATZMAN, R., BROWN, T., FULD, P., PECK, A., SCHECHTER, R., & SCHIMMEL, H. (1983). Validation of a short orientation-memory-concentration test of cognitive impairment. American Journal of Psychiatry 140, 734-739.
KITTNER, S. J., WHITE, L. R., FARMER, M. E., WOLZ, M., KAPLAN, E., MOES, E., BRODY, J., & FEINLEIB, M. (1986). Methodological issue in screening for dementia: the problem of education adjustment. Journal of Chronic Diseases 39, 163-170.
KRAMER, M. S. & FEINSTEIN, A. R. (1981). Clinical biostatistics. LIV. The biostatistics of concordance. Clinical Pharmacology and Therapy 29, 111-123.
LEHR, U. M. (1983). Objective and subjective health in longitudinal perspective. In A. Agnoli, G. Crepaldi, P.
F. Spano, & M. Trabucchi (Eds), Aging brain and ergot alkaloids (pp. 139-145). New York: Raven Press.
MAPOU, R. L. (1988). Testing to detect brain damage: an alternative to what may no longer be useful. Journal of Clinical and Experimental Neuropsychology 10, 271-278.
MCKHANN, G., DRACHMAN, D., FOLSTEIN, M., KATZMAN, M., PRICE, D., & STADLAN, E. M. (1984). Clinical diagnosis of Alzheimer’s disease: report of the NINCDS-ADRDA work group under the auspices of Department of Health and Human Services Task Force on Alzheimer’s disease. Neurology 34, 939-944.
MCNIEL, B. J., KEELER, E., & ADELSTEIN, S. J. (1975). Primer on certain elements of medical decision making. New England Journal of Medicine 293, 211-215.
MILNER, B. (1971). Interhemispheric differences in the localisation of psychological processes in man. British Medical Bulletin 21, 272-276.
MORRIS, J. C., HEYMAN, A., MOHS, R. C., HUGHES, M. S., VAN BELLE, G., FILLENBAUM, G., MELLITS, E. D., CLARK, C., & THE CERAD INVESTIGATORS (1989). The consortium to establish a registry for Alzheimer’s disease (CERAD). Part I. Clinical and neuropsychological assessment of Alzheimer’s disease. Neurology 39, 1159-1165.
PATTIE, A. H. & GILLEARD, C. J. (1975). A brief psychogeriatric assessment schedule. British Journal of Psychiatry 27, 489-493.
PFEFFER, R. I., KUROSAKI, T. T., HARRAH, C. H. JR., CHANCE, J. M., BATES, D., DETELS, R., FILOS, S., & BUTZKE, C. (1981). A survey diagnostic tool for senile dementia. American Journal of Epidemiology 114, 515-527.
RANDT, C. T., BROWN, E. R., & OSBORNE, D. P. JR. (1980). A memory test for longitudinal measurement of mild to moderate deficits. Clinical Neuropsychology 4, 184-194.
RAVEN, J. C. (1949). Progressive matrices (1947) Sets A, Ab, B. Board and Book Forms. London: Lewis.
REISBERG, B., FERRIS, S. H., DE LEON, M. J., & CROOK, T. (1982). The Global Deterioration scale for assessment of primary degenerative dementia. American Journal of Psychiatry 139, 1136-1139.
RITCHIE, K. (1988).
The screening of cognitive impairment in the elderly: a critical review of current methods. Journal of Clinical Epidemiology 41, 635-643.
ROTH, M., TYM, E., MOUNTJOY, C. Q., HUPPERT, F. A., HENDRIE, H., VERMA, S., & GODDARD, R. (1986). CAMDEX. A standardized instrument for the diagnosis of mental disorder in the elderly, with special reference to the early detection of dementia. British Journal of Psychiatry 149, 698-709.
STORANDT, M., BOTWINICK, J., DANZIGER, W. L., BERG, L., & HUGHES, C. P. (1984). Psychometric differentiation of mild senile dementia of the Alzheimer type. Archives of Neurology 41, 497-499.
WECHSLER, D. (1945). A standardized memory scale for clinical use. Journal of Psychology 19, 87-95.
APPENDIX
Description of NPB subtests and the functions they assess
Blessed Dementia Scale (BDS; BLESSED, TOMLINSON, & ROTH, 1968). Informants well-acquainted with the subject were asked about his daily living abilities (working, mood, eating, dressing, bowel and bladder control). Scores range from zero (“fully preserved capacity”) to 28 (“extreme incapacity”).
Hamilton Rating Scale for Depression (HRSD; HAMILTON, 1967) was administered to the subject and his responses checked against those of next of kin. The score per item ranged from 1 (absence of symptom) to 5 (very severe symptom). The minimum total score was 22, the maximum 85.
Information was explored by means of twelve easy questions about time and place from the Information-Memory-Concentration Test (IMC; BLESSED et al., 1968). Maximum score 12.
Concentration was measured by an IMC subtest which includes tasks such as saying the months of the year backward, counting 1-20 and 20-1. Maximum score 6.
Orientation was evaluated by a brief validated psychometric measure, a subtest of the Clifton Assessment Schedule (CAS; PATTIE & GILLEARD, 1975). Maximum score 12.
Mental capacity was determined by a CAS subtest which consists of counting, reading, writing and saying the alphabet. Maximum score 12.
Memory
Personal Memory was sampled by a subtest of the IMC which inquires about the subject’s date and place of birth, schools attended, name of spouse, name and place of his former occupation. Maximum score 7. The accuracy of the patient’s responses was checked against the information provided by his next of kin.
Non Personal Memory was investigated by another IMC subtest which asks for the dates of the first and second world wars, name of the President and of the Pope, recall of a brief address. Maximum score 9.
Short-Term Memory was evaluated through the following tests:
Digit Forward (WECHSLER, 1945). Numbers of increasing length were given to the subject in serial trials up to a maximum of eight digits. Attempt to recall each series of digits was terminated after the second unsuccessful trial. Maximum score 9.
Babcock Story (BABCOCK & LEVY, 1930). An emotionally charged short story was read to the patient who was to recall as much of it as possible immediately after its presentation.
The story was then repeated and delayed recall tested after 10 min with distraction (Babcock Story/10’ delayed recall). Maximum score 13 for each session.
Corsi Block Tapping Test (MILNER, 1971). The subject had to tap an increasing number of cubes in the same sequence as they had just been touched by the examiner. The maximum number of cubes was eight and the test was terminated after the second unsuccessful trial. Maximum score 9.
Acquisition and Delayed Recall were examined by 2 tests characterized by controlled learning exposure, separation of acquisition from retrieval, fixed intervals of recall after delays with distraction, and multiple trials of relearning which allow estimation of savings (Randt Memory Test; RANDT, BROWN, & OSBORNE, 1980).
Five Items. Five bi- or tri-syllabic nouns with high and concrete imagery were read to the subject who had to recall them in a maximum of three re-acquisition trials after 10 sec of distraction (FI Acquisition, maximum score 15), after 10 min (FI/10’-delayed recall, maximum score 20) and again after 24h (FI/24h-delayed recall, maximum score 20).
Paired Words. Six word pairs with varied conceptual relationships were read to the subject who had to supply the appropriate second item when given the first. Recall was tested immediately after presentation (PW Acquisition, maximum score 18), 10’ later (PW/10’-delayed recall, maximum score 24) and again after 24h (PW/24h-delayed recall, maximum score 24).
Language
Verbal comprehension was assessed by the Token Test (DE RENZI & FAGLIONI, 1978) which requires the subject to execute verbal orders of increasing difficulty involving 20 plastic tokens differing in shape (circles and squares), size (large and small) and color (black, white, red and green). The maximum score is 36.
Verbal fluency was measured by the SET Test (ISAACS & KENNIE, 1973).
The subject was asked to produce, in three minutes, as many words as possible within specific categories (towns, fruits, animals and colors). Maximum score 40.
Praxis
Constructive praxis was evaluated by the Copying Drawings Test (ARRIGONI & DE RENZI, 1964). The subject was presented with eight drawings printed on the upper half of a sheet and asked to reproduce them on the lower half. Each design was scored 0 (unrecognizable reproduction), 1 (partially defective reproduction) or 2 (accurate reproduction).
Visuomotor performance was measured by the Gibson Maze, a subtest of the CAS. The maze is a spiral drawn on a large card with obstacles in the shape of the letter O along its pathway. The subject had to trace the path through the maze with a pencil, avoiding the obstacles, within four minutes. Time and errors were recorded and then converted to a score, with maximum of 12.
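For readers who wish to tabulate results from the cognitive subtests described above, the maximum scores can be collected into a lookup table, as in the following sketch. The key names and the `percent_of_max` helper are hypothetical conveniences, not part of the battery. The maximum of 16 for the Copying Drawings Test is derived here from eight drawings scored 0 to 2, and a minimum of zero for every subtest is an assumption. The BDS and HRSD are omitted because they are rating scales with a different scoring direction or range. The sketch does not reproduce the corrections for age and education applied in the study.

```python
# Maximum scores as listed in the appendix; the keys are hypothetical names.
MAX_SCORES = {
    "information": 12,
    "concentration": 6,
    "orientation": 12,
    "mental_capacity": 12,
    "personal_memory": 7,
    "non_personal_memory": 9,
    "digit_forward": 9,
    "babcock_immediate": 13,
    "babcock_delayed_10min": 13,
    "corsi": 9,
    "fi_acquisition": 15,
    "fi_delayed_10min": 20,
    "fi_delayed_24h": 20,
    "pw_acquisition": 18,
    "pw_delayed_10min": 24,
    "pw_delayed_24h": 24,
    "token_test": 36,
    "set_test": 40,
    "copying_drawings": 8 * 2,  # derived: eight drawings, each scored 0-2
    "gibson_maze": 12,
}


def percent_of_max(raw_scores):
    """Express raw subtest scores as a percentage of each subtest's maximum.

    Assumes a minimum score of zero for every subtest (not stated in the
    appendix) and rejects unknown subtests or out-of-range values.
    """
    result = {}
    for name, raw in raw_scores.items():
        if name not in MAX_SCORES:
            raise KeyError(f"unknown subtest: {name}")
        top = MAX_SCORES[name]
        if not 0 <= raw <= top:
            raise ValueError(f"{name}: score {raw} outside 0-{top}")
        result[name] = 100.0 * raw / top
    return result
```

For example, `percent_of_max({"token_test": 18})` expresses a Token Test score of 18 as half of the available range, which makes profiles across subtests with different maxima easier to compare at a glance.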