Neuropsychological Tests for Monitoring Delirium Severity in Elderly Patients

Kathy J. Christensen, Ph.D., Kris M. Bettin, B.S., Kris M. Jilk, B.A., Derik T. Weldon, B.A., John R. Mach, Jr., M.D.

The authors identified neuropsychological tests appropriate for use in monitoring delirium severity in elderly patients. Ten elderly patients were administered a battery of tests while they were delirious and later during their recovery. All of the measures showed significant improvement across the two occasions (P < 0.05). Examination of components of variance suggests that modified versions of the Forward Digit Span, Similarities, or Oral Sentence Spelling tests or a combination of Forward Digit Span and Similarities or Oral Sentence Spelling are likely to be most effective in monitoring delirium severity. (American Journal of Geriatric Psychiatry 1996; 4:69-76)
Neuropsychological testing may serve two different purposes in the study of delirium: 1) documentation of deficits in specific cognitive domains, and 2) monitoring of the level of delirium severity during its fluctuating course. The former objective is concerned with issues of diagnosis and elaboration of brain-behavior relationships, whereas the latter is important in the search for the neurophysiological correlates of behavioral fluctuations. This study identifies a group of neuropsychological tests that are appropriate for use in monitoring the level of delirium severity in elderly patients.
Hill et al.,1 in a review of the assessment of cognitive deficits in delirium, state that these have not been studied in a systematic way and that further research is needed. The literature documenting neuropsychological changes in delirium is very sparse, and no neuropsychological studies have been concerned with monitoring delirium severity. Chedru and Geschwind2 presented the only large-scale study of neuropsychological functioning in delirium; however, their subjects were under 60 years of age. They found, in the majority of patients, impairment in confrontation naming, generative naming, reading, writing, right-left orientation, imitation of nonrepresentational movements performed by the examiner, finger recognition, calculations, drawing, and remote and recent memory. Abnormalities were less common in oral spelling of single words, comprehension, repetition, similarities, and proverb interpretation. Trzepacz et al.3 have reported deficits in delirium on the Trailmaking Test, a measure of visual sequencing and set-shifting, but in another study, over 50% of the delirious patients were incapable of performing the test at all.4 This finding highlights one of the important considerations in identifying tests for monitoring delirium severity: selecting tests that can be performed, albeit at a low level, by patients in the delirious state and that, similarly, are not too easy for the recovering patient.

In the current study, neuropsychological tests administered to patients with delirium were evaluated for their utility in monitoring delirium severity according to the following criteria: 1) sensitivity to differences between the delirious and recovered states; 2) absence of floor effects; 3) absence of ceiling effects; 4) length of administration time; and 5) acceptability to delirious patients. Also, we were interested in identifying a small group of tests that might be combined to produce a composite cognitive performance score. Toward this end, we planned to determine whether combining tests enhanced the sensitivity to differences between delirious and recovered states.

Received October 19, 1994; revised April 6, 1995; accepted June 2, 1995. From the Geriatric Research, Education, and Clinical Center, VA Medical Center, Minneapolis, MN. Address correspondence to Dr. Christensen, Geriatric Research, Education, and Clinical Center (11G), VA Medical Center, One Veterans Dr., Minneapolis, MN 55417.
THE AMERICAN JOURNAL OF GERIATRIC PSYCHIATRY
METHODS

Subjects

Subjects were patients ages 55 and older, hospitalized at the Minneapolis VA Medical Center, who were participating in a larger investigation of delirium pathophysiology. Potential subjects were identified by nurses and physicians on medical and surgical wards who named patients demonstrating signs of confusion. The Confusion Assessment Method (CAM)5 was used to screen potential subjects. The CAM is a questionnaire that allows the observer to rate a patient on nine features of delirium. Diagnosis of delirium using the CAM requires the presence of 1) acute onset and fluctuating course and 2) inattention, and either 3) disorganized thinking or 4) altered level of consciousness. Patients whose CAM ratings suggested delirium and who met criteria of the Diagnostic and Statistical Manual of Mental Disorders, 3rd Edition, Revised (DSM-III-R)6 for delirium were included in the study, provided they did not meet any of the following exclusion criteria: 1) evidence of previous memory loss, as determined by caregiver interview by means of questions from the Consortium to Establish a Registry for Alzheimer's Disease (CERAD) screen;7 2) history of mental illness; 3) serum hemoglobin less than 9.0 mg/dl or difficulty in drawing blood; 4) imminent death or discharge; 5) aphasia; 6) significant hearing or visual impairment after correction; or 7) potential compromise of medical care by frequent interviewing. Surrogate informed consent was obtained from the next-of-kin for all subjects.

Instruments
The following neuropsychological tests were administered with modifications as described below. Except where noted, descriptions of each of these tests appear in the text by Lezak.8 These were selected from a larger group of neuropsychological tests that were pilot-tested in a small group of delirious patients. Pilot testing was directed at answering four questions: 1) whether patients at varying levels of delirium severity could understand and respond to the test; 2) whether the test showed indications of being reliable in delirious patients, as determined by a general pattern of passing easier and failing more difficult items; 3) the degree to which scores on the test appeared to vary with levels of delirium severity; and 4) the administration time in delirious patients. A total of 13 tests received pilot evaluation. In general, delirious patients understood and responded to the tests in a manner that varied with the difficulty levels of the items. Because of the need to develop a short battery for frequent administration, final selection of the tests rested largely on the range of scores seen with varying levels of delirium severity and the brevity of the test. Because some delirious patients have limited use of their hands and arms, only tests requiring oral responses were included.

VOLUME 4 • NUMBER 1 • WINTER 1996

The Temporal Orientation9 test was administered according to the standard instructions; however, time of day was omitted because patients frequently referred to clocks or watches before responding. To produce scores that correlated positively rather than negatively with other measures, scoring was reversed so that higher scores represented better orientation.

The Digit Span subtest from the Wechsler Memory Scale-Revised (WMS-R)10 was administered with two additions: 1) forward spans of one and two digits if there were any failures on the three-digit spans; and 2) a third trial at each span length subsequent to the first failure. The new trials, which were added to increase the reliability of the test, were as follows:

Forward: 2, 5, 3; 8-7, 4-1, 6-9; 3-8-6; 5-9-3-4; 1-6-3-2-7; 5-2-7-4-8-1; 9-7-1-8-2-4-3; 2-9-4-3-7-5-1-6
Backward: 4-9; 8-2-7; 9-5-6-2; 5-3-8-6-9; 7-3-5-9-1-4; 6-9-4-1-3-7-2

Failure on all three trials at a given span length was required for discontinuation.

The Similarities subtest of the Wechsler Adult Intelligence Scale-Revised (WAIS-R)11 was also administered. This was followed by Items 1, 2, 4, 6, and 8 of the Similarities subtest of the Wechsler Intelligence Scale for Children-Revised (WISC-R).12 The WISC-R items were included to lower the floor of the test, that is, to increase the probability that delirious patients would be able to answer some items correctly. The scores for both WAIS-R and WISC-R items were added together to form a single score.

An Oral Sentence Spelling test designed to tap working memory13 was developed for this project. For delirious patients, the simplicity of the instructions for this task was considered to be an advantage over other tests of working memory, for example, classifying each word in an orally presented sentence as noun or verb. Subjects were asked to spell orally sentences of increasing length. If errors were made on the first sentence, subjects were asked to spell orally two words, "book" and "open," before proceeding with the remaining sentences. The sentences were Items 1, 2, 3, 5, 6, 7, and 9 from a sentence repetition test.14 One point was assigned for correct spelling of each of the following: first letter of first word, first word, each subsequent word. The test was discontinued after three consecutive items with incorrect spelling of all words and failure to give the first letter of the first word.

A portion of the instructions was repeated before each item in order to assist the subject in maintaining the appropriate set. For Forward Digit Span, the subject was told, "Say these after me." For Backward Digit Span, the subject was told, "Say these backward." Each Similarities item was presented by asking, "In what way are a [word] and a [word] alike?" The instructions "Spell the words in the sentence (or the word), ...," preceded each Oral Sentence Spelling item. A period of 15 seconds was allowed for the patient to begin responding to each item. If the subject did not respond or appeared to sleep, he was reprompted once by addressing him as "Sir," followed by a repetition of the instructions, before the examiner moved on to the next item. Items continued to be presented until the subject reached discontinuation criteria. If the subject made no response while awake or appeared to be sleeping on two of three (Forward Digit Span, Backward Digit Span, Oral Sentence Spelling) or three of four (Similarities) consecutive items required for discontinuation, the data for that test were treated as missing.

Procedure
Subjects were tested at the bedside at 9 A.M., 3 P.M., and 9 P.M. for seven successive interviews, starting with the first designated time after consent by next-of-kin. Eighth and ninth testing sessions were conducted on the 7th and 14th days of study enrollment at the same time of day as the first interview. (One subject's final testing session was conducted 29 days after enrollment because of intervening medical procedures.) Tests were administered in the following order for all but one patient: Temporal Orientation, Forward Digit Span, Backward Digit Span, Similarities, and Oral Sentence Spelling. Blood was drawn before or after the cognitive testing, and a brief sample of connected speech was obtained for a separate study after the neuropsychological testing. All sessions were videotaped, but cognitive scores were based exclusively on the examiner's written record of responses for consistency with clinical bedside testing conditions.

Data Analysis
Subjects were selected for analysis who had sufficient data to enable a comparison of their cognitive performance in delirious and recovered states. In the absence of a "gold standard" for delirium severity, passage of time was used to differentiate the delirious from the recovered states, with the expectation that some interviews assigned to the recovered state would reflect only partial recovery. Subjects were chosen based on completion (defined as no more than one missing test score) of one of the first three and one of the last three scheduled testing sessions. The earliest interview with no more than one missing test score was defined as the delirious state for purposes of this study, and the latest interview with no more than one missing test score was defined as the recovered state. Data were also included for one subject who completed only the first four testing sessions but who had a dramatic recovery from delirium by the fourth session. Videotapes of the interviews designated as delirious and recovered states were reviewed by a geropsychiatrist blind to the interview occasion for correspondence with the DSM-III-R behavioral criteria for delirium.
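The selection rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the data layout (one entry per scheduled session giving the count of missing test scores, or `None` when no interview took place) and the function names are assumptions made for the example. The one subject admitted on the basis of the first four sessions alone is a stated exception and is not modeled here.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical layout: one entry per scheduled session; each entry is the
# number of missing test scores in that session, or None if the interview
# was not conducted at all.
MAX_MISSING = 1  # "no more than one missing test score"


def is_complete(missing: Optional[int]) -> bool:
    """A session counts as completed when at most one test score is missing."""
    return missing is not None and missing <= MAX_MISSING


def select_sessions(
    missing_per_session: Sequence[Optional[int]],
) -> Optional[Tuple[int, int]]:
    """Return (delirious_index, recovered_index), or None if the subject is excluded.

    Indices are 0-based positions in the schedule. Inclusion requires one
    completed session among the first three and one among the last three.
    """
    n = len(missing_per_session)
    complete = [i for i, m in enumerate(missing_per_session) if is_complete(m)]
    has_early = any(i < 3 for i in complete)
    has_late = any(i >= n - 3 for i in complete)
    if not (has_early and has_late):
        return None
    # Earliest completed session = delirious state; latest = recovered state.
    return complete[0], complete[-1]


# Example with a nine-session schedule (values are invented for illustration).
print(select_sessions([None, 0, 1, 0, 0, 0, 2, 0, None]))
```

Because the earliest and latest completed sessions are taken, the two states are separated by as much elapsed time as the subject's data allow, which matches the use of passage of time as the stand-in for recovery.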
RESULTS

A total of 10 of 17 enrolled patients met the criteria for selection for this study. A total of 137 interviews were scheduled for the 17 enrolled patients. Twelve percent of these were not conducted because of changes in patients' medical condition, and 15% were not conducted because patients refused the interview. Four percent of the scheduled interviews were begun but discontinued because of patients' medical condition, and 10% were discontinued because of patient refusals. It should be noted that these rates are derived from a fairly rigorous testing schedule (three times per day initially) and may not apply to studies with lighter testing demands. The mean age (and standard deviation [SD]) of the 10 selected subjects was 65.0 ± 7.2 years. The mean education (and SD) of the subjects was 11.0 ± 2.2 years. All subjects were white men.
Geropsychiatric ratings of the videotaped interviews confirmed a change in delirium status between early and late interviews. On early interview, which typically occurred several hours after the screening for subject enrollment, 8 of the 10 subjects met all three DSM-III-R behavioral diagnostic criteria (A, B, and C) for delirium. On late interview, only one of the subjects met all three behavioral criteria. The two remaining subjects on early interview met two behavioral criteria. On late interview, five, one, and two of the remaining subjects met two, one, and no behavioral criteria, respectively. (One subject's late interview was not intelligible on videotape because he wore an oxygen mask; however, the examiner in the room was able to understand and score his responses.)

Three subjects were missing data for one test from the first test session selected in data analysis, either Temporal Orientation, Backward Digit Span, or Oral Sentence Spelling. Missing values were replaced with the mean values of the remaining subjects in data analyses. Table 1 presents the means and standard deviations (SD) of each of the tests for the first and last interviews. In keeping with the goal of evaluating each measure on five criteria (sensitivity to delirium recovery, floor and ceiling effects, administration time, and patient acceptability), univariate procedures were used to evaluate sensitivity to delirium recovery. Repeated-measures analyses of variance (ANOVAs) were used to determine which tests were able to detect differences between delirious and recovered states. Significant differences (P < 0.05) between occasions were seen on all of the tests. The differences remained significant when Holm's procedure was applied to control the generalized Type I error probability at 0.05.15

The relative sensitivity of measures to change in delirium severity can be compared by use of an estimate of effect size. Effect size refers to the proportion of variance attributable to the variable of interest, in this case, passage of time or recovery from delirium. The measure of effect size appropriate for this study is a partial omega-squared,16,17 consisting of a ratio of the variance attributable to change across occasions over the error variance plus the variance attributable to change across occasions. A larger partial omega-squared value indicates a higher proportion of variance attributable to differences between the delirious and recovered states. A partial omega-squared value for each of the tests appears in Table 1. These values range from 0.27 to 0.53, with the highest value seen for Oral Sentence Spelling and the lowest value seen for Temporal Orientation.

Floor effects, represented by scores of 0, were found for the following number of subject occasions on each of these tests: Temporal Orientation (n = 1), Backward Digit Span (n = 7), and Similarities (n = 2). Ceiling effects, represented by maximum possible scores, were found for three subjects on Temporal Orientation.

TABLE 1. Repeated-measures analysis of variance for early and late interview scores in 10 subjects with delirium

Test                     Early Interview Scores   Late Interview Scores   F[1,9]   P<      Partial
                         (Delirium), Mean ± SD    (Recovery), Mean ± SD                    Omega-Squared
Temporal Orientation     66.33 ± 35.20 (a)        99.10 ± 13.31            8.34    0.019   0.27
Forward Digit Span        9.70 ±  4.00            15.10 ±  3.21           18.43    0.003   0.47
Backward Digit Span       0.89 ±  1.36 (a)         4.20 ±  2.04           12.55    0.007   0.37
Similarities              4.50 ±  4.25            13.80 ±  5.73           15.75    0.004   0.42
Oral Sentence Spelling   11.78 ± 12.14 (a)        37.10 ± 16.00           23.15    0.002   0.53

(a) n = 9.

To determine whether a combination of tests might provide a more sensitive measure of changes in delirium severity, scores from pairs of tests were combined in the following manner. To obtain a common metric for combination across tests, z-scores were obtained for each test. The mean and SD for the entire set of scores for both occasions were used to derive the z-scores. These z-scores were summed for each possible pair of tests. Repeated-measures ANOVAs were conducted on the sums of these pairs of tests to determine whether the pairs resulted in more highly significant differences between occasions than the single tests. Only 3 of the 15 combinations resulted in higher F values and lower P values than the single tests that were combined: Forward Digit Span plus Oral Sentence Spelling (F[1,9] = 35.06; P < 0.001), Forward Digit Span plus Similarities (F[1,9] = 25.04; P < 0.002), and Backward Digit Span plus Similarities (F[1,9] = 17.88; P < 0.002). Partial omega-squared values for each of the combinations, respectively, were 0.65, 0.55, and 0.46.

Medians and ranges of times to complete each of the tests were as follows: Temporal Orientation (1 min, 45 sec [23 sec-3 min, 51 sec]); Digit Span Forward (2 min, 48 sec [1 min, 43 sec-7 min, 35 sec]); Digit Span Backward (2 min, 17 sec [56 sec-4 min, 45 sec]); Similarities (6 min, 18 sec [2 min, 10 sec-10 min, 18 sec]); Oral Sentence Spelling (4 min, 14 sec [2 min, 40 sec-12 min, 29 sec]).

The three examiners who administered the tests rated each test for its acceptability by the delirious patient according to the following scale: 1: not objectionable, no complaints by patients; 2: slightly objectionable, occasional complaints by patients; 3: somewhat objectionable, frequent complaints by patients; and 4: quite objectionable, complaints by most patients. The mean rating for each test was as follows: Temporal Orientation, 1.5; Digit Span Forward, 1.67; Digit Span Backward, 2.33; Similarities, 2.17; and Oral Sentence Spelling, 2.83.
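The Holm adjustment and the partial omega-squared values reported in the Results can be checked with a short Python sketch. Two points are assumptions rather than statements from the text: the P values are taken at the upper bounds printed in Table 1, and partial omega-squared is computed from F with one common formulation, df(F - 1) / (df(F - 1) + N), where N is the total number of observations (10 subjects on 2 occasions). The function names are hypothetical.

```python
# F and P values as reported in Table 1 (df = 1, 9).
TABLE_1 = {
    "Temporal Orientation": (8.34, 0.019),
    "Forward Digit Span": (18.43, 0.003),
    "Backward Digit Span": (12.55, 0.007),
    "Similarities": (15.75, 0.004),
    "Oral Sentence Spelling": (23.15, 0.002),
}


def holm_reject(p_values, alpha=0.05):
    """Holm's sequentially rejective procedure; returns one bool per p-value."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # Compare the (rank + 1)-th smallest p-value with alpha / (m - rank).
        if p_values[i] > alpha / (m - rank):
            break  # this hypothesis and all with larger p-values are retained
        reject[i] = True
    return reject


def partial_omega_squared(f_value, n_observations, df_effect=1):
    """ASSUMED formulation: df * (F - 1) / (df * (F - 1) + N)."""
    numerator = df_effect * (f_value - 1.0)
    return numerator / (numerator + n_observations)


N_OBS = 10 * 2  # assumption: 10 subjects x 2 occasions

p_values = [p for _, p in TABLE_1.values()]
print("All significant after Holm:", all(holm_reject(p_values)))
for test, (f_value, _) in TABLE_1.items():
    print(f"{test}: {partial_omega_squared(f_value, N_OBS):.2f}")
```

Under these assumptions all five tests remain significant after the Holm adjustment, and the computed effect sizes round to the Table 1 values (0.27, 0.47, 0.37, 0.42, 0.53). For the first test pair (F = 35.06) the same formula gives about 0.63 rather than the reported 0.65, so it should be read as an approximation of the authors' calculation, not a confirmed reconstruction.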
DISCUSSION

The purpose of this study was to provide information about the suitability of several neuropsychological tests for monitoring fluctuations in delirium severity. The results suggest that Temporal Orientation, Forward Digit Span, Backward Digit Span, Similarities, and Oral Sentence Spelling are all capable of detecting differences between groups of patients early in the course of delirium and patients at some stage of recovery from delirium. A study is currently under way with healthy elderly patients to evaluate the extent to which practice effects may contribute to these differences. A comparison of the magnitude of improvement seen in the current study with that reported for repeated administrations of the Digit Span and Similarities subtests of the WAIS-R11 suggests that practice effects are likely to play a small role in the differences seen in the current study. For subjects ages 45-54, the oldest group for whom data on practice effects are provided, Digit Span (Forward and Backward combined) shows an improvement of less than one raw-score point and Similarities an improvement of less than two raw-score points over a 2-5-week interval. In the current study, Forward Digit Span scores improved by 5.4 raw points and Backward Digit Span by 3.3 points, for a total of 8.7 points. Similarities scores improved by 9.3 points. Although the current study involved shorter time intervals, modifications to test protocols, and an older sample, it appears unlikely that these large improvements seen between early and late interviews are primarily a function of practice effects.

Although the sample size in this study is too small to reach any firm conclusions about relative sensitivity to changes in delirium severity,18 some of the tests showed a higher proportion of variance associated with the change from the delirious to the recovered state. A higher proportion of variance, as measured by partial omega-squared, indicates a greater ability of a given measure to respond to changes in delirium severity. Partial omega-squared values for the Digit Span tests, Similarities, and Oral Sentence Spelling exceeded that for Temporal Orientation. Partial omega-squared values for three pairs of tests (Forward Digit Span plus Spelling, Forward Digit Span plus Similarities, and Backward Digit Span plus Similarities) exceeded those for the contributing single tests. Investigators planning studies that require monitoring of cognitive status in delirium patients may wish to first consider those tests or combinations of tests with higher partial omega-squared values.

Backward Digit Span demonstrated frequent floor effects that would be likely to render it insensitive to changes within the more delirious patients. Similarities and Temporal Orientation showed small floor effects, whereas the rest of the measures were free from floor effects. Ceiling effects were seen only in Temporal Orientation.

Examiners' ratings indicate that Oral Sentence Spelling, with frequent complaints from the delirious patients, was the most objectionable test. There were also occasional complaints about the other tests. Investigators considering frequent, repeated testing of delirious patients may wish to consider the potential impact on subject retention of patients' relative dislike for a given test.

Although the purpose of the current study was to identify measures for monitoring delirium severity, some comparison can be made with that of Chedru and Geschwind,2 where the goal was to characterize cognitive deficits in delirium. All measures in the current study showed improvement from the delirious to the recovered states, a finding that is consistent with initial impairment in all areas. These results agree with those of Chedru and Geschwind in finding deficits in temporal orientation and forward digit span. Although they report relative preservation of oral spelling, their measure was limited to the spelling of four individual words, in contrast to the sentences that showed impairment in the current study. Subjects in their study did demonstrate significant spelling errors in writing to dictation. The major difference between the two studies is in the degree of impairment suggested on items from Similarities. They reported relative preservation, whereas the current study finds Similarities to be one of the stronger indicators of recovery. Although there are several plausible explanations for this difference (e.g., use of a six-item scale and the exclusion of severely delirious and elderly subjects in the previous study), it is beyond the scope of this study to address this difference.

The generalizability of this study is limited by its small sample size and the participation of exclusively white, male subjects with, on average, 11 years of education. A larger study may reveal inconsistencies with the current study with respect to the relative sensitivity of the measures to change in delirium severity. Examination of subjects of both sexes and varying education levels and ethnic backgrounds would be required to ensure that these measures may be appropriately applied to delirious patients in these subgroups.

Taken together, the findings from this study suggest that investigators planning to monitor cognitive status in patients with delirium may wish to consider using versions of the Forward Digit Span, Similarities, or Oral Sentence Spelling, or a combination of Forward Digit Span plus either of the two other tests. Relative to Oral Sentence Spelling, Similarities offers the advantage of infrequent complaints from the delirious patients and the disadvantage of an approximately 2-minute longer median administration time and potentially greater problems with floor effects. Future studies of delirious and recovered states in a larger number of subjects have the potential to provide more definitive information about the relative sensitivity of specific neuropsychological measures to changes in delirium severity.
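For investigators who wish to try a two-test composite of the kind suggested above, the z-score summation described in the Results might be sketched in Python as follows. Several details are assumptions: the sample (n - 1) standard deviation is used because the text does not say which form was applied, the two-occasion repeated-measures F is computed as the squared paired t statistic (the two are equivalent for two occasions), the function names are hypothetical, and the scores are invented for illustration and are not study data.

```python
from statistics import mean, stdev


def pooled_z(early, late):
    """z-scores based on the mean and SD of all scores from both occasions."""
    pooled = list(early) + list(late)
    m, s = mean(pooled), stdev(pooled)  # assumption: sample SD (n - 1)
    return [(x - m) / s for x in early], [(x - m) / s for x in late]


def composite(test_a, test_b):
    """Sum the pooled z-scores of two tests; each test is (early, late)."""
    za_early, za_late = pooled_z(*test_a)
    zb_early, zb_late = pooled_z(*test_b)
    early = [a + b for a, b in zip(za_early, zb_early)]
    late = [a + b for a, b in zip(za_late, zb_late)]
    return early, late


def repeated_measures_f(early, late):
    """F(1, n - 1) for two occasions, computed as the squared paired t."""
    diffs = [l - e for e, l in zip(early, late)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / n ** 0.5)
    return t * t


# Invented scores for three hypothetical subjects (not study data).
digit_span = ([8, 10, 6], [14, 15, 13])
similarities = ([3, 5, 4], [12, 15, 13])

early, late = composite(digit_span, similarities)
print("Composite F(1, 2):", repeated_measures_f(early, late))
```

Standardizing against the pooled early and late scores keeps the change between occasions in the composite while putting tests with very different raw-score ranges on a common scale.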
Portions of this research were presented at the American Geriatrics Society Annual Scientific Meeting, May 1994, in Los Angeles, CA. This research was supported by funds from the Medical Research Service of the Department of Veterans Affairs.
References

1. Hill C, Risby H, Morgan N: Cognitive deficits in delirium: assessment over time. Psychopharmacol Bull 1992; 28:401-407
2. Chedru F, Geschwind N: Disorders of higher cortical functions in acute confusional states. Cortex 1972; 8:395-411
3. Trzepacz P, Brenner R, Thiel DV: A psychiatric study of 247 liver transplantation candidates. Psychosomatics 1989; 30:147-153
4. Trzepacz P, Brenner R, Coffman G, et al: Delirium in liver transplantation candidates: discriminant analysis of multiple test variables. Biol Psychiatry 1988; 24:3-14
5. Inouye S, Dyck CV, Alessi C, et al: Clarifying confusion: the confusion assessment method. Ann Intern Med 1990; 113:941-948
6. American Psychiatric Association: Diagnostic and Statistical Manual of Mental Disorders, 3rd Edition, Revised. Washington, DC, American Psychiatric Association, 1987
7. Morris J, Heyman A, Mohs R, et al: The consortium to establish a registry for Alzheimer's Disease (CERAD), part I: clinical and neuropsychological assessment of Alzheimer's disease. Neurology 1989; 39:1165-1168
8. Lezak M: Neuropsychological Assessment, 2nd Edition. New York, Oxford University Press, 1983
9. Benton A, Hamsher DO, Varney N, et al: Contributions to Neuropsychological Assessment. New York, Oxford University Press, 1983
10. Wechsler D: WMS-R: Wechsler Memory Scale-Revised Manual. New York, Psychological Corporation, 1987
11. Wechsler D: Wechsler Adult Intelligence Scale-Revised Manual. New York, Psychological Corporation, 1981
12. Wechsler D: Manual for the Wechsler Intelligence Scale for Children-Revised. New York, Psychological Corporation, 1974
13. Baddeley A: Working Memory. New York, Oxford University Press, 1989
14. Benton A, Hamsher K: Multilingual Aphasia Examination. Iowa City, IA, University of Iowa, 1976 (revised 1978)
15. Holm S: A simple, sequentially rejective, multiple test procedure. Scandinavian Journal of Statistics 1979; 6:65-70
16. Gaebelein J, Soderquist D: The utility of within-subjects variables: estimates of strength. Educational and Psychological Measurement 1978; 38:351-360
17. Lipsey M: A scheme for assessing measurement sensitivity in evaluation and other applied research. Psychol Bull 1983; 94:152-165
18. Carroll R, Nordholm L: Sample characteristics of Kelley's ε2 and Hays' ω2. Educational and Psychological Measurement 1975; 35:541-554