Pergamon
Archivesof ClinicalNeuropsychology,Vol.1I, No.7, pp. 557-57I, 1996 Copyright© 1996NationalAcademyof Neuropsychology Printedin the USA.All rightsreserved 0887-6177/96$15.00+ .(DO
SSDI 0887-6177(95)00043-7
Relationship Between Reaction Time and Neuropsychological Test Performance Stephanie L. Western and Charles I. Long The University of Memphis
Reaction time (RT) has been used to discriminate individuals with brain damage from controls, but little is known about the relationship between RT and neuropsychological (NP) tests. This relationship was examined 3 ways. First, the mean, median, and SD scores of simple and choice RT for 213 referrals were used as potential predictors of NP impairment in discriminant analyses; cross-validation produced a significant hit rate. Second, the hit rate (NP impairment vs. no) using RT was compared to hit rates using fluency, IQ, and memory measures; the RT rate did not significantly exceed the others. Third, the abilities of RT and of a NP impairment score to classify 28 subjects into brain injury and control groups were compared; the hit rates were not significantly different. Results are interpreted to indicate moderate agreement between RT and NP tests regarding impairment classifications and to suggest roles for RT in NP assessment. Copyright© 1996 NationalAcademyof Neuropsychology
Measures of reaction time to visual stimuli have been used as indicators of speed of processing and have been repeatedly shown to discriminate individuals with supra-tentorial cerebral disease of almost any type from those with no brain damage (Blackburn & Benton, 1955; Dee & Van Allen, 1973; DeRenzi & Faglioni, 1965; Elsass & Hartelius, 1985; MacHynn, Montgomery, Fenton, & Rutherford, 1984; Miller, 1970). Analysis of the relation between reaction time (RT) and location of lesions has revealed that RT measures are sensitive both to general lesions and to focal lesions in various locations in the brain (Benton & Joynt, 1958; MacFlynn et al., 1984). Whereas bilateral lesions often produce the greatest effect on RT (Newcombe & Ratcliff, 1979), patients with unilateral cerebral disease have also been found to be significantly slower than control patients (Benton & Joynt, 1958). Consistent findings of sensitivity of RT performance to brain damage have led to the suggestion that "reaction time performance is a cognitive task requiring continuous information processing and sustained attention and may well be a consequence of a subtle disturbance of cortical function of a global nature" (MacHynn et al., 1984, p. 1330). Elsass (1986) suggested that "the discriminative power of continuous reaction time in brain disease has been equivalent to more sophisticated and complex psychological tests" (p. 237), and it has been proposed that RT tasks be used in the diagnosis and determination Address correspondence to: Charles J. Long, PhD, Department of Psychology, The University of Memphis, Memphis, TN 38152. 557
558
S. L. Westernand C. J. Long
of the severity of brain damage (Dee & Van Allen, 1972; DeRenzi & Faglioni, 1965). Certainly RT measures should be investigated regarding their possible benefits as an addition to more traditional neuropsychological batteries. Such applications of RT may provide additional measures of cognitive functions and increase the accuracy of decisions regarding the presence and extent of brain dysfunction. Despite the existence of extensive research investigating the ability of RT procedures to detect the presence of brain dysfunction, there is a lack of research investigating the relationship between RT tasks and traditional neuropsychological measures such as the Halstead-Reitan Battery. Variations of Halstead's Impairment Index have been calculated from performance on several tests and demonstrated to be "very sensitive to impaired brain functions" (Reitan & Wolfson, 1993, p. 92). Wheeler, Burke, and Reitan (1963) used the Impairment Index to correctly predict the presence or absence of cerebral damage in approximately 87% of 219 subjects. Because RT tests, like the Halstead-Reitan Battery, are sensitive to cerebral dysfunction, it is of interest to examine the relationship between RT and neuropsychological test performance. The examination of RT performance is complicated by the fact that several summary scores can be obtained from a series of RT trials. RT studies are not uniform in the selection of a summary score. The mean RT has often been used (Byme, 1976; DeRenzi & Faglioni, 1965; Miller, 1970), but other researchers have argued that median scores should be preferred because of the skewness of RT distributions (Van Zomeren & Deelman, 1978) and the relationship of mean scores to variability in abnormal groups (Dee & Van Allen, 1973). In addition to measures of central tendency, the SD has been used as a measure of dispersion (Jensen, 1992; Stuss et al., 1989). RT studies vary not only in the summary score chosen to represent performance but also in the task characteristics. At a general level, all RT tasks can be dichotomized into simple and choice designs. Simple reaction time (SRT) procedures require an invariant response each time a stimulus is detected. Choice reaction time (CRT) procedures utilize two or more stimuli and either require different responses for different stimuli or are "go/no-go" tasks that require a response for certain stimuli and inhibition of the response for others. Whereas simple measures have been found to differentiate individuals with brain dysfunction from those without (Elsass & Hartelius, 1985; Tartaglione, Bino, Manzino, Spadavecchia, & Favale, 1986), much research has shown CRT to have higher discriminative power (Benton, 1986; Miller, 1970; Van Zomeren & Deelman, 1978). On the other hand, some studies have found either no difference or an advantage for SRT in discriminative ability (Blackburn & Benton, 1955; DeRenzi & Faglioni, 1965). The issue remains unresolved at least partially because RT studies have employed various types of experimental and control subjects. They have also utilized diverse experimental procedures (Benton, 1986; Elsass, 1986). The present study involved three experiments in which the relationship between RT measures and a neuropsychological impairment score was examined. RT measures that were considered were the mean, median, and SD scores of both a simple and a choice procedure. The impairment score was derived from the Halstead-Reitan Battery and based on Halstead's Impairment Index. The relationship was examined by using RT measures as predictors of neuropsychological impairment, by comparing RT measures as predictors of neuropsychological impairment to other predictors, and by comparing the abilities of RT, the impairment score, and other scores to differentiate subjects with recent history of traumatic brain injury from controls. E X P E R I M E N T 1: CLASSIFICATION OF NEUROPSYCHOLOGICALLY IMPAIRED VERSUS NOT IMPAIRED SUBJECTS BY RT In Experiment 1, demographic variables as well as the mean, median, and SD scores of both an SRT and a CRT procedure were considered for selection as predictors of neuropsy-
Reaction 7~meand TestPerformance
559
chological impairment as defined by an impairment score. It was hypothesized that the combination of selected demographic and RT variables would correctly classify a significant percentage of subjects as impaired or not impaired. The percentage of subjects correctly classified was considered significant if it met two criteria: it was significantly higher than the base rate, and it was significantly higher than the classification rate obtained using only demographic data. The comparison with the base rate was important because, as stated by Willis (1984), "in order to be efficient, an actuarial rule should make possible more correct classifications than could be made from base rate data alone" (p. 567). The comparison with demographic predictors was important to assess the classification power of RT beyond the power of demographic variables.
METHOD
Subjects Subjects were 426 people who had been referred to and assessed in a neuropsychological laboratory between 1988 and 1995. Approximately 50% were traumatic brain injury patients, 13% had significant emotional problems, and 7% had vascular disorders. Other referral problems included developmental disorders, tumors, seizure disorders, and infections of the nervous system. A small number (3%) had been referred but had no documented neurological abnormality, no traumatic brain injury, and no significant psychiatric history. The mean age of the subject group was 35.22 years (SD = 13.87), and mean education was 12.62 years (SD = 2.73). Males composed 69% of the group. The 426 subjects had been chosen from a larger database because they met four criteria: 1. Scores for at least 5 of the 7 measures contributing to an impairment score had been recorded. 2. Age, education, and gender had been recorded. 3. A 50-trial SRT task had been administered and valid responses had been provided for at least 45 of the trials. 4. A 30-trial CRT "go/no-go" task had been administered and valid responses had been provided for at least 16 of the 18 trials that required a response. Valid SRT and CRT responses were defined as scores between 150 and 2600 ms. (Responses faster than 150 ms were considered anticipatory; responses slower than 2600 ms were considered late.) Each subject was labeled according to a neuropsychological impairment score as either impaired or not impaired. (Calculation of the impairment score is explained in the "procedure" section.) Individuals with impairment scores greater than or equal to 0.5 were labeled as impaired; scores less than 0.5 signified no impairment. (The cutoff of 0.5 is based on the Impairment Index cutoff recommended by Reitan and Wolfson, 1993). This procedure resulted in the identification of 174 (40.8%) subjects as impaired and 252 (59.2%) as not impaired. The entire subject pool - - including both impaired and not impaired individuals - - was then divided randomly into two groups for analysis. One-half of the subjects formed the first subject group on which initial discriminant analyses were performed (discriminant group). Cross-validation analyses were run on the other 213 subjects who composed a second group (validation group). Impaired subjects accounted for 39.4% of the discriminant group and 42.3% of the validation group; these percentages were not significantly different, z = -0.61, p > 0.50. Mean age did not differ significantly between the discriminant (M = 35.35, SD = 13.37) and validation (M = 35.09, SD = 14.39) groups, t(424) = 0.19, p > 0.80.
560
S. L Western and C. J. Long
Likewise, years of education (M = 12.62, SD = 2.44 and M = 12.63, SD = 3.00 for discriminant and validation groups, respectively) did not significantly differ, t(407.34) = -0.07, p > 0.90. The percentages of males in each group (68.5% in the discriminant group and 69.5% in the validation group) were not significantly different, z = -0.22, p > 0.80. A reference group of 293 pseudo-neurologic controls was considered for the conversion of raw scores into age-corrected standard scores. This control group was comprised of people referred for testing over a period of 18 years who proved to have no documented neurological abnormality, no traumatic brain injury, and no history of significant psychiatric disturbance. Test Materials
RT programs were written in Turbo Pascal and presented on Zenith laptop computers. Timing for each trial was measured and recorded in milliseconds. Neuropsychological tests administered included the Tactual Performance Test, the Seashore Rhythm Test, the Speech Sounds Perception Test, the Finger Tapping Test, and the Trail Making Test (Reitan & Wolfson, 1993). Standard testing materials were used for each test with the exception of an alternate response form for the Speech Sounds Perception Test (Bolter, Hutcherson, & Long, 1984). Discriminant analyses were run using SPSS for Windows, Release 5 (Norusis, 1992a). Procedure
During the RT tasks, the subject sat facing the computer monitor with a finger or fingers of either hand resting on the space bar of a keyboard. RT was measured by the computer as the time between appearance of the stimulus and depression of the space bar. The SRT task was modeled after the task of Elsass (1986). Five practice trials were followed by 50 test trials. On each trial, a 13 mm diameter, solid blue circle appeared on the screen and the subject was instructed to tap the space bar as quickly as possible after the appearance of the circle. The circle disappeared after depression of the space bar. The interval between the response and the appearance of another circle was of variable duration, with a mean of 3.4 s. Responses within a time limit of approximately 3.5 s were followed by a short, high-pitched beep. Trials with no response were followed by a longer, low-pitched beep. Scores were the mean, median, and SD of all valid responses (responses occurring between 150 and 2600 ms after stimulus presentation) on test trials. On each of the 5 practice and 30 test trials in the CRT task (which was modeled after Miller, 1970), two beeps served as a warning signal. The beeps were followed by a delay of variable duration; the mean delay was 1.3 s. After the delay, a horizontal row of four circles appeared. Each circle was 13 mm in diameter. Some of the circles were rings of blue and some were solid blue. The number of solid blue circles varied from 0 to 4 across trials. A "go/no-go" rule was employed, and the subject was instructed to tap the space bar as quickly as possible only on trials with two solid blue circles. Trials with fewer or more than two solid circles required no response. The probability of two solid circles appearing on each test trial was 0.60; 18 trials therefore required a response. A high-pitched beep sounded after correct responses, a low beep sounded after errors, and no beeps sounded after correct inhibitions of a response. Scores were the mean, median, and SD of valid responses on test trials requiring a response. Each subject obtained at least 5 of the following neuropsychological test scores: Tactual Performance Test - - Total Time, Tactual Performance Test - - Memory, Tactual Performance Test - - Location, Seashore Rhythm Test (errors), Speech Sounds Perception Test (errors), Finger Tapping Test (number of taps; mean over 5 trials), and Trail Making Test-Part B
Reaction Iime and Test Performance
561
(time). Six of these seven (TP'r-Total Time, TPT-Memory, TPT-Location, Seashore Rhythm Test, Speech Sounds Perception Test, and Finger Tapping Test) were used in identical or modified form in the creation of Halstead's Impairment Index (ReRan, 1959). The test performances of normal, older individuals can be mistakenly identified as indicative of brain damage if age is not considered in the interpretation of scores (Long & Klein, 1990). To correct for age, subjects were assigned to one of six age ranges: 15-24, 25-34, 35-44, 45-54, 55-64, and over 64. The pseudo-neurologic reference group was also divided into these ranges. Each neuropsychological test score was then converted into an age-corrected standard score with a mean of 50 and a SD of 10 through use of the mean and SD of the portion of the pseudo-neurologic reference group in the same age range. Impaired ranges for each age-corrected standard score (Table 1) were derived by converting Halstead's cutoffs for brain damage as reported by Reitan (1959) into standard scores with a mean of 50 and a SD of 10. The impairment scores were calculated by dividing the number of standard scores in the impaired range by the number of scores obtained, and each subject was labeled either impaired (score greater than or equal to 0.5) or not impaired. As previously mentioned, 39.4% of the discriminant group and 42.3% of the validation group were labeled as neuropsychologically impaired. Consequently, 60.6% of the discriminant group and 57.7% of the validation group were not impaired. These values (60.6% and 57.7%) were defined as the base rates for the two groups. It was important to define the base rates as 60.6% and 57.7% instead of 39.4% and 42.3% because 60.6% and 57.7% of the discriminant and validation groups, respectively, could have been correctly classified simply by assuming all subjects were not impaired. Several predictive discriminant analyses were run on the discriminant group. The Wilks method with a maximum F probability of 0.04 to enter (PIN) and a maximum F probability of 0.05 to remain (POUT) was used for all analyses. The goal of each analysis was to classify subjects as impaired or not impaired. Each produced two Fisher's linear discriminant functions, one corresponding to impaired individuals and one to not impaired individuals. Cross-validations were performed by using these functions to calculate two scores for each subject in the validation group; the largest score indicated group membership (Norusis, 1992b). Cross-validations were included because internal classification analyses (in which one sample of subjects is classified according to classification statistics derived from that same sample) can result in positively biased hit rates (Huberty, 1984). The first three discriminant analyses were run to determine how well two demographic variables (age and education) could predict group membership. In one analysis, age (in years) was entered directly (force-entered). In another, education (in years) was force-entered. In the third, age and education were force-entered. All three analyses were cross-validated. In the validation group, age correctly classified 46.0% of the subjects, education classified 60.1%,
TABLE 1 Impaired Ranges for Scores Used in the Calculation of the Impairment Score Score Tactual Performance Test: Total Time (minutes) Tactual Performance Test: Memory (blocks) Tactual Performance Test: Location (blocks) Seashore Rhythm Test (errors) Speech Sounds Perception Test (errors) Finger Tapping Test: dominant hand (mean number of taps) Trail Making Test: Part B (seconds)
Lower Limit of Impaired Rangea 55.981 68.794 56.085 52.319 51.733 51.015 61.375
aThe values represent standard scores derived from Halstead's cutoffs for brain damage (Reitan, 1959); scores are inversely related to performance.
562
S. L Western and C. J. Long
and age with education classified 60.6%. Because age and education together had the highest hit rate (rate of correct classification) and exceeded the base rate of 57.7%, they were forceentered with RT variables in all subsequent discriminant analyses to examine the effect of RT on classification over and above the effect of these demographic variables. Six RT variables were available from each subject for consideration as predictors in subsequent discriminant analyses: mean, median, and SD for both SRT and CRT tasks. The method of variable selection was based on the recommendation of Huberty (1984). This method entails adding one variable in each of several phases until the variable set yielding the maximum hit rate is reached. In the first phase, six discriminant analyses were run on the discriminant group. Age, education, and one of the six RT variables were force-entered for each. Cross-validations were run. The set of three variables (age, education, and one RT variable) yielding the highest hit rate in the validation group was identified. This three-variable set was then force-entered along with one of the five remaining RT variables for each of the five discriminant analyses in phase two, and the best four-variable set was identified. The third phase was then run to create five-variable sets. None of the five-variable sets in phase three produced a hit rate that exceeded the maximum in phase two, so the process of variable selection was discontinued.
RESULTS Hit rates for the discriminant analyses used in the first two phases of variable selection are listed in Table 2. In the first phase, the combination of predictors yielding the highest hit rate (69.0%) added median CRT to age and education. The addition of either mean SRT or median SRT to this triplet in phase two resulted in equivalent hit rates of 69.5% (the maximum for this phase); the median score was chosen for retention because its addition provided the highest hit rate in the discriminant group. None of the potential fifth variables in phase three increased the hit rate above 69.5% and variable selection was discontinued. The final classification distribution is shown in Table 3. The maximum hit rate of 69.5% obtained by using age, education, median CRT, and median SRT was compared to the base rate of 57.7% and to the hit rate of 60.6% that resulted from the use of only age and education as predictors. Use of the maximum chance criterion (Huberty, 1984) with an alpha level of 0.05 (one-tailed) led to the conclusion that the 69.5% hit rate significantly exceeded the base rate, z = 3.47, p < 0.0005. Use of a one-tailed test for the significance of the difference between two correlated proportions (Ferguson & Takane, 1989) led to the conclusion that the 69.5% hit rate also significantly exceeded the 60.6% hit rate, z = -3.12, p < 0.001.
DISCUSSION The results indicate that two RT variables (median CRT and median SRT), in combination with two demographic variables (age and education), classified significantly more people according to neuropsychological impairment than could have been classified using only base rate information. In addition, the classification rate significantly exceeded the rate obtained using the demographic variables alone. This can be interpreted to indicate at least moderate agreement between RT and the neuropsychological impairment score on decisions regarding impairment. The selection of both a choice and a simple measure as predictors may indicate that each type of task made a contribution to classification power. In addition, the hit rates obtained
Reaction l~me and Test Performance
563
TABLE 2 Percentages of the Validation Group Correctly Classified by Reaction Time Measures and Demographic Variables In Experiment 1 Predictorsa
Hit Rate
Pre-Phase 1 Base rate Age, ed
57.7 60.6
Phase 1 Age, ed, SRT mean Age, ed, SRT median Age, ed, SRT SD Age, ed, CRT mean Age, ed, CRT median Age, ed, CRT SD
67.6 66.7 63.8 68.1 69.0 64.3
Phase 2 Age, ed, CRT median, SRT mean Age, ed, CRT median, SRT median Age, ed, CRT median, SRT SD Age, ed, CRT median, CRT mean Age, ed, CRT median, CRT SD
69.5 69.5 68.1 66.2 68.1
aMean, median, and SD refer to statistics computed for each subject. SRT = simple reaction time; CRT = choice reaction time; ed = years of education; SD = standard deviation.
using the mean SRT and median SRT scores were only 0.5% and 2.3% lower than the corresponding CRT scores (Table 2). However, the actual increase in correct classifications between the first and second phases was small (0.5%). At this point, it can be tentatively concluded that SRT may be a useful predictor but that its contribution to the classification power provided by CRT may be mostly redundant. Both of the RT scores selected were measures of central tendency, specifically median scores. However, a comparison of the median CRT hit rate (69.0%) to the CRT SD hit rate (64.3%) indicated that the difference was not statistically significant, z = 1.67, p > 0.05. Likewise, the median CRT hit rate (69.0%) did not differ significantly from the mean CRT hit rate (68.1%), z = 0.50, p > 0.60. Thus, conclusions regarding the superiority of one type of summary score should not be made based on the data of Experiment 1. The classification rate of 69.5% indicates that 30.5%, or 65 individuals, were misclassified. The misclassifications were almost evenly split between two types (Table 3): false positives (people not impaired on neuropsychological testing but classified as impaired by RT,
TABLE 3 Classification of Subjects in the Validation Group Using Age, Education, Median Choice Reaction Time, and Median Simple Reaction Time as Predictors in Experiment 1 Predicted Group Actual Group
Impaired
Not Impaired
Total
Impaired Not impaired Total
28.2 16.4 44.6
14.1 41.3 55.4
42.3 57.7 100.0
Note. The values represent percentages of subjects (n = 213).
564
S. L Western and C. J. Long
16.4%) and false negatives (people impaired on neuropsychological testing but classified as not impaired by reaction time, 14.1%). There are several possible explanations for the misclassifications. It is probable that factors other than level of cognitive functioning have an effect on reaction time. For example, people diagnosed with depression (Byrne, 1976; Knott & Lapierre, 1987; Martin & Rees, 1966) or schizophrenia (King, 1965; Rosofsky, Levin, & Holzman, 1982; Schwartz et al., 1989) have been found to have slower RTs than controls. Another explanation for false positive classifications is that these subjects have a life history of very high levels of cognitive functioning and that these levels have been lowered, yet the lowered level does not fall into the impaired range. In this case, the RT procedures would be detecting a decrement in cognitive abilities not identified by the impairment score. Similarly, it is possible that some people classified as false negatives would always have been impaired according to the impairment score, and thus the quicker-than-expected reaction time may reflect a lack of change in level of functioning. False negative classifications may also have resulted from conditions such as limited education, restricted arm movement, or hearing problems that may have adversely affected neuropsychological test performance while having minimal effect on RT. The last three proposed explanations for false negative classifications (previously high level, long-standing low level, and adverse conditions) suggest that RT may actually differentiate those with brain dysfunction from those without more accurately than the impairment score under certain circumstances. The impairment score is indeed an imperfect indicator of evidence of cerebral damage, and it was for this reason that Experiment 3 was run. First, however, Experiment 2 was completed to determine whether other tests could have been used in place of RT as predictors of neuropsychological impairment.
E X P E R I M E N T 2: COMPARISON OF T H E CLASSIFICATION ABILITY OF RT TO FLUENCY, I N T E L L I G E N C E , AND MEMORY MEASURES In this second experiment, the percentage of subjects correctly classified using the combination of RT variables selected in the first experiment was compared to the hit rates obtained using scores from other tests, specifically the Thurstone Word Fluency Test, the Wechsler Adult Intelligence Scale - - Revised (WAIS-R), and the Wechsler Memory Scale. It was important to compare the classification ability of RT to the abilities of other tests to determine whether the hit rate obtained in Experiment 1 indicates a sensitivity and specificity of RT to impairment that exceeds the sensitivity and specificity of other tests. It was hypothesized that the RT hit rate would indeed be significantly greater than the hit rate obtained using the fluency, intelligence, or memory scores.
METHOD Subjects
Subjects considered for inclusion were the same 426 subjects from the first experiment. Labels of "impaired" and "not impaired" were retained from Experiment 1, as were assignments into the discriminant and validation groups. Three discriminant analyses with cross-validations were run, each involving one of three tests not included in the first experiment; subjects missing a score for a particular test were excluded from the relevant analysis only. For the analysis using the Thurstone Word Fluency Test, 189 subjects in the discriminant group and 194 in the validation group were retained. When using the Full-Scale Intelligence Quotient (FSIQ) of the WAIS-R, 206 and 207 in the discriminant and validation groups, respectively,
Reaction l~me and Test Performance
565
were retained. The analysis using the Memory Quotient from the Wechsler Memory Scale involved 212 subjects in the discriminant group and 213 in the validation group. Test Materials
Tests included in Experiment 2 were those of Experiment 1 plus the Thurstone Word Fluency Test (Pendleton, Heaton, Lehman, & Hulihan, 1982), the WAIS-R (Wechsler, 1981), and the Wechsler Memory Scale (Wechsler, 1945). Procedure
Three predictive discriminant analyses were run on the discriminant group and cross-validated on the validation group. Each analysis involved force-entering age, education, and either the sum of words produced on the Thurstone, the FSIQ, or the Memory Quotient. The Wilks method with a PIN value of 0.04 and a POUT value of 0.05 was used.
RESULTS Percentages of correct classifications made using the Thurstone fluency score, the FSIQ, and the Memory Quotient (each along with age and education) are listed in Table 4. The RT entries following each of these tests reflect the rates of classification of the same subjects (e.g., 194 for the entry following the Thurstone results) by the linear discriminant functions incorporating age, education, median CRT, and median SRT from Experiment 1. The difference between the fluency hit rate (65.5%) and the RT hit rate (68.6%) was in the hypothesized direction, and a one-tailed test for the significance of the difference between two correlated proportions (Ferguson & Takane, 1989) was used to determine whether the difference was statistically significant. It was not, z = 0.88, p > 0.15. Testing of the hypotheses regarding the differences between the intelligence and RT hit rates and between the memory and RT hit rates did not require statistical tests because the direction of the difference was opposite to that hypothesized. Not only did the classification ability of
TABLE 4 Comparison of Correct Classification Rates of Reaction Time to Rates of Three Other Scores in Experiment 2 Predictors a
Hit Rate b
n = 194 Thurstone score CRT median and SRT median
65.5 68.6
n = 207 Full-Scale Intelligence Quotient CRT median and SRT median
75.8 68.6
n=213 Memory Quotient CRT median and SRT median
73.2 69.5
aClassifications were made using age and education in addition to each of the test scores listed. SRT = simple reaction time; CRT = choice reaction time, bThe values represent percentages of subjects.
566
S. L Western and C J. Long
RT not significantly exceed the abilities of these other scores, but it did not exceed them at all. If the hypotheses had not specified directionality, two-tailed tests for the significance of the difference between two correlated proportions would have led to the conclusion that the difference between 75.8% (FSIQ) and 68.6% (RT) was statistically significant, z = -1.99, p < 0.05, while the difference between 73.2% (Memory Quotient) and 69.5% (RT) was not, z = -1.05, p > 0.25.
DISCUSSION The hypothesis that the classification ability of RT would exceed the abilities of other test scores (Thurstone sum, FSIQ, Memory Quotient) was not supported. In fact, the RT hit rate was significantly lower than the FSIQ hit rate. Results suggest that although RT can differentiate neuropsychologically impaired from not impaired individuals at a significant rate, this rate is not notably higher - - or is even lower - - than the classification rates of the other scores examined. However, it remains important to realize that the goal of classification in Experiment 2 was to differentiate according to an imperfect indicator (the impairment score). Subjects were grouped according to history instead of the impairment score in Experiment 3 to provide another way to examine the relationship between RT and neuropsychological performance.
EXPERIMENT 3: CLASSIFICATION OF TRAUMATIC BRAIN INJURY VERSUS CONTROL SUBJECTS BY RT, THE NEUROPSYCHOLOGICAL IMPAIRMENT SCORE, AND THREE OTHER SCORES A third step in examining the relationship between RT and the impairment score involved classifying subjects according to history instead of according to neuropsychological performance. In Experiment 3, the ability of selected RT measures to differentiate subjects with recent history of traumatic brain injury from controls was compared to the classification ability of the neuropsychological impairment score. Secondarily, comparisons were made between RT and three other test scores. It was hypothesized that the RT hit rate would exceed all hit rates except that of the impairment score, from which it was expected to not significantly differ.
METHOD Subjects
The recorded histories of the 426 subjects from Experiment 1 were examined to determine the nature of the referral question for each subject, and two subject groups were chosen. The first group was comprised of 14 individuals who had been tested less than 3 months following a closed head injury that was moderate, severe, or very severe. Severity of injury was determined by the reported duration of posttraumatic amnesia; all 14 subjects reported at least 1 day of amnesia, corresponding to a classification of moderate severity (Long, 1991). Only testing within 3 months was considered in order to minimize the possibility that complete recovery had occurred at the time of testing (Long, 1991). The second group was comprised of all subjects classified as pseudo-neurologic controls according to history. Pseudo-neurologic controls are individuals referred for testing with no documented neurological abnormality, no traumatic brain injury, and no significant psychiatric history. Coincidentally, the number of subjects in the second group (14) was equal to the number in the first, and therefore the base rate for this experiment was 50.0%.
Reaction Time and Test Performance
567
Test Materials
Tests included in Experiment 3 were identical to those in Experiment 2. Procedure
Predictive discriminant analyses were run to classify subjects into the traumatic brain injury and control groups. The Wilks method with PIN and POUT values identical to those of Experiments 1 and 2 was used. Because of the small number of subjects, cross-validation analyses were not performed. This limitation was considered when analyzing the results for Experiment 3. Three preliminary discriminant analyses were run; first age was force-entered, second education was force-entered, and third age and education were force-entered. Correct classification rates were as follows: age = 67.9%, education = 57.1%, age and education = 71.4%. Consequently, age and education were force-entered along with other variables in the subsequent discriminant analyses. Six RT variables (mean, median, and SD for the simple and the choice tasks) were used as potential predictors to add to age and education. Variables were selected using Huberty's (1984) method as in Experiment 1, except that hit rates were not cross-validated. As in Experiment 1, the variable combination with the highest hit rate was obtained in the second phase. Another discriminant analysis was run using age, education, and the impairment score to classify the 28 subjects. The impairment score was a continuous variable ranging from 0 to 1, and its derivation was described for Experiment 1. The hit rate obtained with the impairment score was compared to that obtained with the RT variables. Three additional discriminant analyses were run, each using one of these variable combinations: l. Age, education, and Thurstone fluency sum. 2. Age, education, and FSIQ. 3. Age, education, and Memory Quotient. The analyses involving fluency and the intelligence quotient considered 26 and 27 subjects, respectively, because not all subjects had obtained these scores. All 28 had completed the memory test.
RESULTS The results of the first two phases of the RT variable selection procedure are outlined in Table 5. The first phase in selection resulted in the addition of either mean SRT or the CRT SD. Each of these variables yielded a hit rate of 78.6%. Phase two was thus run twice, adding each of the remaining five variables individually to either age, education, and mean SRT or age, education, and CRT SD. (Only the run providing the highest hit rate appears in Table 5). At the end of phase two, the highest hit rate (89.3%) was obtained with age, education, choice SD, and simple SD. No variable combinations in phase three produced a hit rate exceeding 89.3%, so variable selection was discontinued. Rates of correct classification for RT as well as for the other scores are listed in Table 6. The RT hit rate exceeded (at least marginally) all other hit rates. The most interesting difference for purposes of this study was that between RT and the impairment score. A two-tailed test was used to examine the difference for statistical significance. The difference was not statistically significant, z = -1.13, p > 0.25. The only other difference tested statistically was that between RT and the Memory Quotient. A one-tailed test
568
S. L Western and C. J. Long
TABLE 5
Percentages of Subjects Correctly Classified by Reaction Time Measures and Demographic Variables in Experiment 3 Predictors a
Hit Rate
Pre-Phase 1 Base rate Age, ed
50.0 71.4
Phase 1 Age, ed, Age, ed, Age, ed, Age, ed, Age, ed, Age, ed,
SRT mean SRT median SRT SD CRT mean CRT median CRT SD
78.6 75.0 75.0 71.4 71.4 78.6
Phase 2 Age, ed, Age, ed, Age, ed, Age, ed, Age, ed,
CRT SD, CRT SD, CRT SD, CRT SD, CRT SD,
85.7 82.1 89.3 85.7 85.7
SRT mean SRT median SRT SD CRT mean CRT median
aMean, median, and SD refer to statistics computed for each subject. SRT = simple reaction time; CRT = choice reaction time; ed = years of education; SD = standard deviation.
was used, because the direction of the difference had been hypothesized. The RT hit rate was significantly greater, z = -2.12, p < 0.05. RT was not compared statistically to the fluency or intelligence scores because all subjects had not obtained these scores, and thus a condition of the test for the significance of a difference between correlated proportions (equivalent sampies) was violated.
DISCUSSION Results need to be interpreted cautiously. Because the samples were not large enough to permit cross-validation analyses, the absolute percentages should not be considered as indi-
TABLE 6 Percentages of Subjects Correctly Classified into Traumatic Brain Injury and Control Groups by Five Different Test Score and Demographic Combinations in Experiment 3 Predictors a
n
Hit Rate
Base rate Age, ed Age, ed, CRT SD, SRT SD Age, ed, impairment score Age, ed, Thurstone Age, ed, Full-Scale Intelligence Quotient Age, ed, Memory Quotient
28 28 28 28 26 27 28
50.0 71.4 89.3 78.6 80.8 88.9 67.9
aSRT = simple reaction time; CRT = choice reaction time; ed --- years of education; SD = standard deviation.
Reaction :lime and Test Performance
569
cators of classification ability. However, assuming that the degree of shrinkage would not have been extremely different for the various test scores had cross-validation been possible, the hit rates can be compared. It can then be tentatively concluded that the impairment score was not superior to RT measures in predicting recent history of traumatic brain injury. In fact, the RT classification rate was higher, though the difference was not statistically significant. Comparison of the RT hit rate with the classification rates of other tests suggests that RT may in fact be superior to some tests and at least comparable to others. The one statistical comparison that was completed supported the hypothesis that the ability of RT to differentiate recent traumatic brain injury from control subjects is superior to the ability of memory testing. In this experiment, addition of an SRT measure in the second phase to a first-phase CRT measure increased the hit rate by 10.7%. This is notable because SRT added very little power (0.5%) in the selection process of Experiment 1. The selection process in the present experiment also differed from that in Experiment 1 because a measure of dispersion (SD) was chosen instead of a measure of central tendency.
G E N E R A L DISCUSSION The general purpose of this study was to examine the relationship between RT and neuropsychological test performance as measured by an impairment score. The ability of RT measures to classify 69.5% of subjects as they were classified by the impairment score in Experiment 1 indicates at least moderate agreement regarding impairment classifications. However, Experiment 2 showed that classifications made by three other scores (fluency, intelligence, and memory) also agreed with the impairment score at nearly the same level as RT or higher. In sum, RT did show a significant level of agreement with the impairment score, but the rate did not significantly exceed the rates obtained using other measures. It could be argued that the most interesting results are those of Experiment 3. These results suggest that the impairment score is not superior to RT in classifying individuals with recent history of traumatic brain injury versus controls. This is potentially important; it implies that reaction time is comparable in ability to extensive neuropsychological testing when the goal is simply to categorize into these subject groups. However, the ability to draw firm conclusions is limited by the small number of subjects, and this experiment requires replication with larger subject groups. In addition, different groups with various etiological backgrounds should be considered. The main purpose of these experiments was not to determine whether simple or choice RT procedures are to be preferred or to advocate one type of RT summary score over others. Nevertheless, two general recommendations can be made on the basis of the results obtained, and both encourage continued research with a variety of measures. First, the selection of both a simple and a choice measure as predictors in Experiment 3 indicates that each type of task made a unique contribution to classification ability. The argument for unique contributions is less powerful in view of Experiment 1, where the simple task added very little to the correct classification rate obtained using only a choice task; however, because RT tasks require little time, it is recommended that both SRT and CRT be used until the differences between the cognitive requirements of each are more fully understood. Second, it appears that measures of central tendency as well as measures of dispersion are useful predictors, because median times were selected in Experiment 1 and SDs were selected in Experiment 3. The mean times were not selected as predictors in either experiment. It is recommended that at least medians and SDs be used to summarize individual RT performance. In summary, although the results fell short of supporting all of the proposed hypotheses, they warrant continued investigation of the utility of RT in clinical settings. Specifically,
570
S. L Western and C. J. Long
three uses are envisioned. First, RT measures may be incorporated as cost-effective components o f a screening instrument for neuropsychological impairment. Although results o f this study do not show full agreement between results o f neuropsychological testing and RT testing and also do not demonstrate that RT is superior to other tests in predicting neuropsychological impairment, the ability o f RT measures to classify a significant number o f subjects as they were classified by an impairment score derived from seven neuropsychological scores suggests that RT could be used to identify individuals requiring further testing. The agreement between RT and the impairment score also suggests a second potential use for RT measures. They could serve as an adjunct to neuropsychological testing when a complete index o f impairment cannot be obtained. This may be the case, for example, when the time available for testing is brief. Also, it seems very likely that RT tasks can be completed by individuals across a broader range o f cognitive functioning than is possible for some neuropsychological tests. Finally, RT tasks should be considered as worthwhile additions to complete neuropsychological batteries. They add a direct measure o f speed o f processing to the standard assessment, and results o f Experiment 3 suggest that there are instances when decisions regarding the presence o f traumatic brain injury are more accurately made by the short, easily-administered RT tasks than by neuropsychological tests.
REFERENCES Benton, A. (1986). Reaction time in brain disease: Some reflections. Cortex, 22, 129-140. Benton, A. L., & Joynt, R. J. (1958). Reactiontime in unilateralcerebral disease. Confinia Neurologica, 19, 247-256. Blackburn, H. L., & Benton, A. L. (1955). Simple and choice reaction time in cerebral disease. Confinia Neurologica, 15, 327-338. Bolter, J. E, Hutcherson, W. L., & Long, C. J. (1984). Speech Sounds Perception Test: A rational response strategy can invalidate the test results. Journal of Consulting & Clinical Psychology, 52, 132-133. Byme, D. G. (1976). Choice reaction times in depressive states. British Journal of Social and Clinical Psychology, 15, 149-156. Dee, H. L., & Van Allen, M. W. (1972). Psychomotor testing as an aid in the recognition of cerebral lesions. Neurology, 22, 845-848. Dee, H. L., & Van Allen, M. W. (1973). Speed of decision-making processes in patients with unilateral cerebral disease. Archives of Neurology, 28, 163-166. DeRenzi, E., & Faglioni, R (1965). The comparative efficiency of intelligence and vigilance tests in detecting hemispheric cerebral damage. Cortex, 1,410-433. Elsass, E (1986). Continuous reaction times in cerebral dysfunction.Acta Neurologica Scandinavica, 73, 225-246. Elsass, E, & Hartelius, H. (1985). Reaction time and brain disease: Relations to location, etiology, and progression of cerebral dysfunction.Acta Neurologica Scandinavica, 71, I I-I 9. Ferguson, G. A., & Takane, Y. (1989). Statistical analysis in psychology and education (6th ed.). New York: McGraw-Hill. Huberty, C. J. (1984). Issues in the use and interpretation of discriminant analysis. Psychological Bulletin, 95, 156-171. Jensen, A. R. (1992). The importance of intraindividual variation in reaction time. Personality and Individual Differences, 13, 869-881. King, H. E. (1965). Reaction time and speed of voluntary movement by normal and psychotic subjects. The Journal of Psychology, 59, 219-227. Knott, V. J., & Lapierre, Y. D. (1987). Electrophysiologicaland behavioral correlates of psychomotor responsivity in depression. Biological Psychiatry, 22, 313-324. Long, C. J. (1991). A model of recovery to maximize the rehabilitation of individuals with head trauma. The Journal of Head Injury, 2, 18-28. Long, C. J., & Klein, K. (1990). Decision strategies in neuropsychology II: Determination of age effects on neuropsychologicalperformance. Archives of Clinical Neuropsychology, 5, 335-345. MacFlynn, G., Montgomery, E. A., Fenton, G. W., & Rutherford, W. (1984). Measurement of reaction time following minor head injury. Journal of Neurology, Neurosurgery, & Psychiatry, 47, 1326-1331. Martin, I., & Rees, L. (1966). Reaction times and somatic reactivity in depressed patients. Journal of Psychosomatic Research, 9, 375-382.
Reaction llme and Test Performance
571
Miller, E. (1970). Simple and choice reaction time following severe head injury, Cortex, 6, 121-127. Newcombe, F., & Ratcliff, G. (1979). Long-term psychological consequences of cerebral lesions. In M. S. Gazzaniga (ed.), Handbook of behavioral neurobiology: Vol. 2. Neuropsychology (pp. 495-540). New York: Plenum. Norusis, M. J. (1992a). SPSSfor Windows (Release 5.0) ]Computer software]. Chicago: SPSS Inc. Norusis, M. J. (1992b). SPSS for Windows: Professional statistics (Release 5) [Computer software manual]. Chicago: SPSS Inc. Pendleton, M. G., Heaton, R. K., Lehman, R. A. W., & Hulihan, D. (1982). Diagnostic utility of the Thurstone Word Fluency Test in neuropsychological evaluations. Journal of Clinical Neuropsychology, 4, 307-317. Reitan, R. M. (1959). The effects of brain lesions on adaptive abilities in human beings. Unpublished manuscript, Indiana University Medical Center at Indianapolis. Reitan, R. M., & Wolfson, D. (1993). The Halstead-Reitan neuropsychological test battery: Theory and clinical interpretation (2nd ed.). Tucson, AZ: Neuropsychology Press. Rosofsky, I., Levin, S., & Holzman, E S. (1982). Psychomotility in the functional psychoses. Journal of Abnormal Psychology, 91, 71-74. Schwartz, E, Carr, A. C., Munich, R. L., Glauber, S., Lesser, B., & Murray, J. (1989). Reaction time impairment in schizophrenia and affective illness: The role of attention. Biological Psychiatry, 25, 540-548. Stuss, D. T., Stethem, L. L., Hugenholtz, H., Picton, T., Pivik, J., & Richard, M. T. (1989). Reaction time after head injury: Fatigue, divided and focused attention, and consistency of performance. Journal of Neurology, Neurosurgery, and Psychiatry, 52, 742-748. Tartaglione, A., Bino, G., Manzino, M., Spadavecchia, L., & Favale, E. (1986). Simple reaction time changes in patients with unilateral brain damage. Neuropsychologia, 24, 649-658. Van Zomeren, A. H., & Deelman, B. G. (1978). Long-term recovery of visual reaction time after closed head injury. Journal of Neurology, Neurosurgery, & Psychiatry, 41, 452-457. Wechsler, D. (1945). A standardized memory scale for clinical use. The Journal of Psychology, 19, 87-95. Wechsler, D. (1981). WAIS-R manual: Wechsler Adult Intelligence Scale-Revised. New York: The Psychological Corporation. Wheeler, L., Burke, C. J., & Reitan, R. M. (1963). An application of discriminant functions to the problem of predicting brain damage using behavioral variables. Perceptual and Motor Skills, 16, 417-440. Willis, W. G. (1984). Reanalysis of an actuarial approach to neuropsychological diagnosis in consideration of base rates. Journal of Consulting and Clinical Psychology, 52, 567-569.