NOTE
THE RELIABILITY OF REACTION TIME DETERMINATIONS¹
K. de S. Hamsher and A. L. Benton (Departments of Neurology and Psychology, University of Iowa)
Studies of reaction time (RT) in the fields of neuropsychology and gerontology have employed widely varying numbers of trials to determine the RT of individual subjects, as well as different measures of central tendency. Some investigators (e.g., Boller, Howes and Patten, 1970; Dirken, 1972) have given 10 trials or fewer, while others (e.g., Arrigoni and De Renzi, 1964; De Renzi and Faglioni, 1965; Bruhn and Parsons, 1971; Klisz and Parsons, 1975) have presented 50 or more trials to obtain a reliable estimate. The majority of studies have used mean RT as a measure of central tendency. However, some investigators, mindful of the skewed distribution of within-individual RTs and the sensitivity of the mean to the occurrence of extremely long RTs, have employed presumably more stable indexes such as median RT (Costa, 1962; Dee and Van Allen, 1971, 1973), mean of reciprocal RT values (Blackburn and Benton, 1955; Benton and Joynt, 1959) and transformed log RT values (Boller, Howes and Patten, 1970). Another source of variation has been the number of practice trials given before RT is measured. For example, Benton and Joynt (1959) gave 5 practice trials before measuring simple or two-choice RT, while Bruhn and Parsons (1971) gave 20 practice trials before making the same determinations.
The relationship of the reliability of RT determinations to the number of trials presented and the type of measure of central tendency employed is a question of some methodological significance. If the mean of 6 simple or choice RTs, as was utilized by Dirken (1972) in a study of age differences in performance, is a reliable measure, it is scarcely necessary to give 30, 50 or 100 trials, unless the focus of interest is not RT per se but change of performance over trials. The point is of some practical importance. The time required for a procedure often determines whether it will be included in a comprehensive research study or in clinical evaluation.
In this connection, it may be noted that RT measurement is rarely a component of clinical assessment batteries despite the fact that slowing in RT is a salient behavioral characteristic of brain disease. No doubt one reason for this exclusion is the assumption that a substantial number of trials must be given to obtain reliable measures and this would be too costly in time. The purpose of the present study was to obtain and report some empirical data relevant to these questions that might provide indications about the minimal number of trials necessary for reliable determinations and the choice of measure of central tendency. The corrected split-half reliability of estimates of simple and two-choice RT based on 6, 12, 18, 24 and 30 trials was computed, utilizing four different measures of central tendency: mean RT, median RT, log RT and reciprocal RT (1/RT). The determinations were made on separate groups of control and brain-diseased patients.
¹ This investigation was supported by Research Grant NS-00616 from the National Institute of Neurological Diseases and Stroke, U.S. Public Health Service.
Cortex (1977) 13, 306-310.
MATERIAL AND METHOD
Subjects
The control group consisted of 30 patients without history or evidence of brain disease on various services of the University Hospitals, Iowa City. Their mean age was 46 years (S.D. = 10; Range = 29-60) and their mean educational level was 9 years (S.D. = 2). The brain-diseased group consisted of 30 patients with disease involving the cerebral hemispheres on the neurological and neurosurgical services. Their mean age was 39 years (S.D. = 13; Range = 16-60) and their mean educational level was 10 years (S.D. = 3). The lower mean age of the brain-diseased patients was related to the inclusion of 5 patients in that group who were younger than the youngest control patient.
Procedure
Each patient was given 5 practice trials followed by 30 test trials of warned, simple visual RT. After a rest period of 2-3 min., he was given 5 practice trials and 30 test trials of warned, disjunctive (2-choice) visual RT. The warning signal was a buzz of 2 sec. duration. Details concerning the apparatus and instruction may be found in previous reports (Blackburn and Benton, 1955; Blackburn, 1958; Benton and Joynt, 1959). RTs for each test trial were recorded in .01 sec. units and utilized for the computation of the mean and median RT for each task for trials 1-6, 1-12, 1-18, 1-24 and 1-30. The means of transformed RT values (reciprocal and log) were similarly computed for each of these series of trials. In order to estimate the split-half reliability coefficients for each series of trials, the same statistics were computed for the two halves of each series, i.e., for trials 1-3, 4-6, 7-12, 1-9, 10-18, 13-24, 1-15 and 16-30.
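The computation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the original analysis: the function names and the example data are hypothetical, base-10 logarithms are assumed for the log transform, and the correction is taken to be the Spearman-Brown step-up named in the Table I caption, r_full = 2r / (1 + r).

```python
import math
import statistics


def central_tendency(rts, measure):
    """Summarise one subject's RTs (in seconds) with one of the four measures."""
    if measure == "mean":
        return statistics.fmean(rts)
    if measure == "median":
        return statistics.median(rts)
    if measure == "log":
        # Mean of log-transformed RTs; base 10 is an assumption.
        return statistics.fmean([math.log10(rt) for rt in rts])
    if measure == "reciprocal":
        # Mean of the 1/RT values.
        return statistics.fmean([1.0 / rt for rt in rts])
    raise ValueError(f"unknown measure: {measure!r}")


def pearson_r(x, y):
    """Product-moment correlation between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step a half-series correlation up to the full-length series."""
    return 2.0 * r_half / (1.0 + r_half)


def split_half_reliability(rts_by_subject, n_trials, measure):
    """Corrected split-half reliability for the first n_trials test trials."""
    half = n_trials // 2
    first = [central_tendency(rts[:half], measure) for rts in rts_by_subject]
    second = [central_tendency(rts[half:n_trials], measure) for rts in rts_by_subject]
    return spearman_brown(pearson_r(first, second))


# Hypothetical data: three subjects, six test trials each, in seconds.
example = [
    [0.21, 0.22, 0.20, 0.23, 0.21, 0.22],
    [0.30, 0.31, 0.33, 0.32, 0.30, 0.34],
    [0.45, 0.41, 0.43, 0.44, 0.46, 0.42],
]
for measure in ("mean", "median", "log", "reciprocal"):
    print(measure, round(split_half_reliability(example, 6, measure), 3))
```

For a 6-trial series the halves are trials 1-3 and 4-6; passing n_trials = 12, 18, 24 or 30 with 30-trial records yields the other series in the same way.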
RESULTS
The corrected split-half product-moment correlation coefficients of the four measures of central tendency for each series of trials in both groups and for both tasks are shown in Table I. The following impressions are gained from inspection of the table.
(1) There was a clear, although not completely consistent, trend toward a higher reliability coefficient with increasing number of trials. The median of the 16 correlation coefficients over tasks, groups and measures of central tendency was .895 for 6 trials and .900 for 12 trials. The same statistic was .940 for both 24 and 30 trials.
(2) The reliability coefficients for the 6-trials series (preceded by 5 practice trials, to be sure) were remarkably high, the lowest coefficient in the set of 16 being .83.
(3) The reliability of the simple RT determinations was slightly greater than that of the 2-choice RT determinations. The median of the 40 correlation coefficients over trials, groups and measures of central tendency was .930 for simple RT and .895 for 2-choice RT.
(4) The reliability coefficients were slightly lower for the control patients than for the brain-diseased patients, no doubt because of the greater inter-individual variability in the latter group. The median of the 40 correlations over trials, tasks and measures of central tendency was .905 for the controls and .935 for the brain-diseased patients.
(5) Reliability was not related to the choice of measure of central tendency. The median of the 20 correlation coefficients over trials, tasks and groups ranged from .910 (mean) to .930 (median), with that for both transformed measures being .915.
(6) If a corrected split-half correlation coefficient of .90 is adopted as a criterion of satisfactory reliability, then 18 trials (preceded by 5 practice trials) seem to be adequate for the determination of warned simple visual RT in both groups and with all measures of central tendency. In contrast, 30 trials (again preceded by 5 practice trials) are required for the determination of warned two-choice visual RT.
TABLE I
Spearman-Brown Reliability Correlations for Different Numbers of Simple and 2-Choice RT Trials and Types of Measure of Central Tendency

                                 Simple RT                           2-Choice RT
Number of Trials       Mean  Median   Log  Reciprocal      Mean  Median   Log  Reciprocal
Brain-damaged (N = 30)
   6                   .85    .83    .87     .88           .89    .94    .87     .83
  12                   .92    .93    .94     .95           .89    .89    .89     .87
  18                   .97    .96    .98     .98           .94    .90    .94     .93
  24                   .95    .96    .96     .97           .92    .92    .93     .94
  30                   .97    .97    .97     .98           .95    .95    .95     .95
Controls (N = 30)
   6                   .90    .90    .90     .90           .91    .93    .90     .89
  12                   .89    .93    .91     .92           .88    .84    .85     .83
  18                   .91    .94    .93     .93           .84    .79    .83     .81
  24                   .95    .95    .95     .94           .89    .83    .88     .86
  30                   .93    .93    .92     .91           .91    .92    .91     .90
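The summary statistics quoted in the Results can be recomputed from the tabled coefficients. The sketch below is illustrative only: the helper names are hypothetical, and the assignment of each five-value series to a task, group and measure is an assumption inferred from the table layout (as is the placement of the single .96 in the brain-damaged simple log series at 24 trials).

```python
import statistics

TRIALS = (6, 12, 18, 24, 30)

# Coefficients for 6, 12, 18, 24 and 30 trials, keyed by (task, group, measure).
# Assumption: the mapping of each series to its key is inferred from Table I.
TABLE_I = {
    ("simple", "brain", "mean"):         (.85, .92, .97, .95, .97),
    ("simple", "brain", "median"):       (.83, .93, .96, .96, .97),
    ("simple", "brain", "log"):          (.87, .94, .98, .96, .97),
    ("simple", "brain", "reciprocal"):   (.88, .95, .98, .97, .98),
    ("choice", "brain", "mean"):         (.89, .89, .94, .92, .95),
    ("choice", "brain", "median"):       (.94, .89, .90, .92, .95),
    ("choice", "brain", "log"):          (.87, .89, .94, .93, .95),
    ("choice", "brain", "reciprocal"):   (.83, .87, .93, .94, .95),
    ("simple", "control", "mean"):       (.90, .89, .91, .95, .93),
    ("simple", "control", "median"):     (.90, .93, .94, .95, .93),
    ("simple", "control", "log"):        (.90, .91, .93, .95, .92),
    ("simple", "control", "reciprocal"): (.90, .92, .93, .94, .91),
    ("choice", "control", "mean"):       (.91, .88, .84, .89, .91),
    ("choice", "control", "median"):     (.93, .84, .79, .83, .92),
    ("choice", "control", "log"):        (.90, .85, .83, .88, .91),
    ("choice", "control", "reciprocal"): (.89, .83, .81, .86, .90),
}


def median_by(position, label):
    """Median of all coefficients whose key has `label` at index `position`."""
    values = [r for key, series in TABLE_I.items()
              if key[position] == label for r in series]
    return round(statistics.median(values), 3)


def min_trials(task, criterion=0.90):
    """Smallest series length at which every coefficient for `task` meets the criterion."""
    for i, n in enumerate(TRIALS):
        if all(series[i] >= criterion
               for key, series in TABLE_I.items() if key[0] == task):
            return n
    return None


for measure in ("mean", "median", "log", "reciprocal"):
    print(measure, median_by(2, measure))          # 0.91, 0.93, 0.915, 0.915
print(median_by(0, "simple"), median_by(0, "choice"))  # 0.93 0.895
print(min_trials("simple"), min_trials("choice"))      # 18 30
```

Under these assumptions the medians by measure and by task agree with points (3) and (5), and the .90 criterion is first met by every coefficient at 18 trials for simple RT and at 30 trials for 2-choice RT, as stated in point (6).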
Differentiation between control and brain-damaged patients
The degree to which these RT tasks differentiated between the control and brain-diseased groups in relation to number of trials and measure of central tendency employed was also investigated. Table II shows the number of brain-diseased patients whose RTs were longer than those of 28 of the 30 controls, i.e., longer than 93 percent of the controls. Inspection of the table indicates that in the case of simple RT none of the measures showed a trend toward increasing differentiation with increasing number of trials. In contrast, for choice RT, all the measures of central tendency showed increasing differentiation with increasing number of trials.
TABLE II
Number of Brain-Diseased Patients with RTs Longer than Those of 93 Percent of Controls
Number of Trials Simple RT
2-Choice RT
Mean
Median
Log
Reciprocal
8
12 12 12 12 12
14 12
14 12 12 12 12
6 12 18 24 30
13
6 12 18 24 30
6 12 15 17 16
11
10 12
9 13
15 16 15
13
12 12
6
12 14 14 16
7 12 14
13
16
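The counting rule behind Table II (a patient is counted when his RT is longer than those of 28 of the 30 controls) can be sketched as follows. The function name, its parameters and the example scores are hypothetical. One assumption is made explicit in the code: for the reciprocal measure a slower subject has a smaller score, so the comparison is reversed.

```python
def count_slower_patients(control_scores, patient_scores,
                          n_exceeded=28, higher_is_slower=True):
    """Count patients whose score is slower than that of n_exceeded controls.

    Set higher_is_slower=False for reciprocal (1/RT) scores, where a
    smaller value means a longer RT (an assumption about how the
    reciprocal measure would be compared).
    """
    if not higher_is_slower:
        # Negate so that "larger" again means "slower".
        control_scores = [-s for s in control_scores]
        patient_scores = [-s for s in patient_scores]
    # Score of the n_exceeded-th fastest control (28th of 30 by default).
    cutoff = sorted(control_scores)[n_exceeded - 1]
    return sum(1 for s in patient_scores if s > cutoff)


# Hypothetical scores in .01 sec. units: 30 controls and 4 patients.
controls = list(range(20, 50))   # 20, 21, ..., 49
patients = [45, 47, 48, 60]
print(count_slower_patients(controls, patients))  # 2
```

With these example values the cutoff is the 28th control score (47), so only the patients scoring 48 and 60 are counted.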
DISCUSSION
The implications of these findings apply only to the measurement of warned simple and two-choice visual RT in non-psychotic adult patients. It is possible that lower reliability coefficients for comparable numbers of trials would be found for continuous (unwarned) simple and two-choice reactions or for disjunctive reactions involving three or more choices. It is also possible that the reliability of measurement is different in other populations, e.g., children, aged subjects or psychotic patients. These are questions that can be resolved empirically.
Our results indicate clearly that a large number of trials is not required to achieve a reliable estimate of warned simple or two-choice visual RT in the subject populations most frequently investigated in clinical neuropsychological research. Hence, unless change of performance over trials is of specific interest, the presentation of 50-100 trials to arrive at these estimates seems to be a needless expenditure of time. Nor does the choice of measures of central tendency affect reliability per se. However, for theoretical reasons some investigators will still prefer to utilize an index that is less sensitive to "outlying" trial times than is mean RT.
SUMMARY
The reliability of simple and two-choice reaction time (RT) as a function of number of trials and measure of central tendency was examined. It was found that, within the particular conditions of the study, 18 trials were sufficient to obtain a highly reliable determination of simple RT but 30 trials were required for a highly reliable determination of two-choice RT. The selection of measure of central tendency was not related to degree of reliability.
REFERENCES
ARRIGONI, G., and DE RENZI, E. (1964) Constructional apraxia and hemispheric locus of lesion, "Cortex," 1, 170-197.
BENTON, A. L., and JOYNT, R. J. (1959) Reaction time in unilateral cerebral disease, "Confin. Neurol.," 19, 247-256.
BLACKBURN, H. (1958) Effects of motivating instructions on reaction time in cerebral disease, "J. Abn. Psychol.," 56, 359-366.
BLACKBURN, H., and BENTON, A. L. (1955) Simple and choice reaction time in cerebral disease, "Confin. Neurol.," 15, 327-338.
BOLLER, F., HOWES, D., and PATTEN, D. H. (1970) A behavioral evaluation of brain scan estimates of lesion size, "Neurology," 20, 852-859.
BRUHN, P., and PARSONS, O. A. (1971) Continuous reaction time in brain damage, "Cortex," 7, 278-291.
COSTA, L. D. (1962) Visual reaction time of patients with cerebral disease as a function of length and constancy of preparatory interval, "Percept. Mot. Skills," 14, 391-397.
DEE, H. L., and VAN ALLEN, M. W. (1971) Simple and choice reaction time and motor strength in unilateral cerebral disease, "Acta Psychiatr. Scand.," 47, 315-323.
DEE, H. L., and VAN ALLEN, M. W. (1973) Speed of decision making processes in patients with unilateral cerebral disease, "Arch. Neurol.," 28, 163-166.
DE RENZI, E., and FAGLIONI, P. (1965) The comparative efficiency of intelligence and vigilance tests in detecting hemispheric cerebral damage, "Cortex," 1, 410-433.
DIRKEN, J. M. (1972) Functional Age of Industrial Workers, Wolters-Noordhoff Publishing, Groningen.
KLISZ, D. K., and PARSONS, O. A. (1975) Ear asymmetry in reaction time tasks as a function of handedness, "Neuropsychologia," 13, 323-330.
Mr. Kerry de S. Hamsher and Dr. A. L. Benton, Department of Neurology, University Hospitals, Iowa City, Iowa 52242, U.S.A.