On the Reliability, Validity, and Cognitive Structure of the Thurstone Word Fluency Test


Archives of Clinical Neuropsychology, Vol. 15, No. 3, pp. 267–279, 2000 Copyright © 2000 National Academy of Neuropsychology Printed in the USA. All rights reserved 0887-6177/00 $–see front matter

PII S0887-6177(99)00017-7

Melissa J. Cohen
Visalia, California

Daniel E. Stanczak
Psychology Research Service, Wilford Hall Air Force Medical Center

The Thurstone Word Fluency Test (TWFT) is a widely used neuropsychological instrument. However, data regarding its psychometric properties are lacking. The results of the present study suggest that the TWFT possesses excellent test-retest and inter-rater reliability, in addition to good construct validity. However, its criterion validity is limited by its lack of specificity and sensitivity. The present study also suggests that the TWFT is a complex cognitive task, and that successful TWFT performance depends upon a constellation of cognitive abilities, including attention/concentration, psychomotor speed, and memory. Finally, the relationship between verbal IQ and TWFT letter association value was examined. While the TWFT appears to be useful in detecting the presence of cerebral dysfunction, it is of less value in localizing such dysfunction. It is argued that the TWFT should not be used as a neuropsychological screening instrument, but rather, is best used within the context of a thorough neuropsychological examination.

© 2000 National Academy of Neuropsychology. Published by Elsevier Science Ltd

Keywords: assessment, neuropsychological testing, Thurstone Word Fluency Test, cognitive functions, psychometry

Word fluency tasks are commonly used to diagnose aphasia and deficits in neuropsychological functioning (Borkowski, Benton, & Spreen, 1967). Although many clinicians and researchers have reported the utility of word fluency tasks and have chosen to include such tasks in their standardized neuropsychological test batteries (Heaton, Grant, & Matthews, 1991), the literature regarding the psychometric properties of such tasks is quite limited. The Thurstone Word Fluency Test (TWFT; Thurstone, 1938) is frequently used to detect the presence of cerebral dysfunction and to define the nature of such dysfunction.

The views expressed in this paper are solely those of the authors and do not reflect endorsement by the Department of Defense, Department of the Air Force, or any other governmental agency.

Address correspondence to: Daniel E. Stanczak, PhD, Director, Psychology Research Service, 59th Medical Wing/MMCNB, 2200 Bergquist Drive, Suite 1, Lackland Air Force Base, TX, 78236. E-mail: [email protected]


The TWFT is a controlled association task that requires the subject to write as many words beginning with the letter “S” as possible in 5 minutes and then to write as many four-letter words beginning with the letter “C” as possible within 4 minutes. The first condition is considered a high association task; the second is considered a low association task.

Milner (1964) was one of the earliest researchers to employ the TWFT, reporting its usefulness in detecting left frontal damage. More specifically, she found significant deficits in word fluency following left frontal lobectomy, with little or no deficit following left temporal or right frontal lobectomy. Pendleton, Heaton, Lehman, and Hulihan (1982) also found the TWFT to be useful in distinguishing brain-damaged from neurologically normal individuals. These researchers reported that individuals with left frontal lesions demonstrated the most deficits in word fluency. They also found that the TWFT was effective in discriminating left versus right and frontal versus nonfrontal lesions. However, the investigators also found that the TWFT was unable to discriminate focal frontal from diffuse lesions. Perret (1974) found that patients with left nonfrontal lesions demonstrated significantly more deficits in word fluency when compared to patients with right nonfrontal lesions. More recently, Martin, Loring, Meador, and Lee (1990) reported that patients with left temporal lesions displayed more deficits in verbal fluency than patients with right temporal lesions, with both lesion groups demonstrating impaired verbal fluency when compared to a neurologically normal control group. Such findings provide evidence of the criterion (predictive) validity of the TWFT.

However, the potential confounding effects of demographic variables upon TWFT performance remain unresolved, indeed largely ignored, in the literature.
Variables such as age, gender, and education are thought to strongly influence performance on many neuropsychological tests (Heaton et al., 1991) and may be important covariates in the interpretation of the TWFT. Pendleton et al. (1982) reported small correlations between age and word fluency and between education and word fluency, with such correlations being stronger for the neurologically normal than for the brain damaged group. Schaie and Strother (1968) also found that age significantly affected TWFT performance. In contrast, Cauthen (1978) found no significant correlations between age and word fluency in individuals ages 20 to 59. Intelligence is yet another potential confound to the interpretation of TWFT performance. Borkowski et al. (1967) investigated word fluency using a task similar to the TWFT and found that brain-damaged groups performed less well than the control group on both the high and low association conditions. When verbal IQ (VIQ) was included as a factor, the low association conditions were better at discriminating brain-damaged from neurologically normal individuals only in the high VIQ group. In contrast, high association conditions were better at discriminating brain damaged from normals in the low VIQ group. When VIQ was eliminated as a factor, the high association conditions and low association conditions were equally discriminative. Bolter, Long, and Wagner (1983) also found IQ to be associated with verbal fluency. Specifically, they found the TWFT to discriminate better among individuals with low VIQ regardless of lesion location. Subjects with left nonfrontal lesions and low VIQ produced significantly fewer words than high VIQ subjects with left frontal lesions. No significant deficits in verbal fluency were found among brain-damaged groups with high VIQ. Furthermore, it was found that, regardless of VIQ, the high association condition was more sensitive in detecting cerebral dysfunction. Unfortunately, Bolter et al. 
only examined high versus low IQ groups, using a VIQ of 100 as a cutoff. Furthermore, they did not include the total TWFT score in their analysis as a potential discriminator.

It is also possible, if not probable, that cognitive skills other than word fluency are required for successful TWFT performance. For instance, Cauthen (1978) demonstrated that the Digit Symbol subtest of the Wechsler Adult Intelligence Scale (WAIS; Wechsler, 1945) was highly correlated with word fluency scores, suggesting that psychomotor speed may be an important cognitive component of successful TWFT performance. It is also probable that attention factors contribute to successful TWFT performance. Several aspects of attention/concentration are required in order for the subject to perform well. First, a subject must be able to sustain his/her lexical search for periods of 5 and 4 minutes. Second, the subject must have the capacity to selectively produce target words while concomitantly inhibiting the production of irrelevant words. Third, the subject must be able to filter out distracting environmental and/or cognitive stimuli. Finally, the subject must be able to recall words from his/her lexicon.

Although the TWFT is widely used, its psychometric properties and its relation to potential confounding variables remain largely undefined, a fact that limits accurate test interpretation. The goal of the present study is to investigate and document the psychometric properties of the TWFT that have been largely ignored to date. Toward that end, the present study attempts to (a) establish the test-retest and inter-rater reliability of the TWFT, (b) examine the criterion and construct validity of the TWFT, (c) define some of the component skills necessary for successful TWFT performance, (d) examine the relationship between TWFT performance and demographic variables, and (e) more clearly identify the relationship between intelligence and letter association value.

ANALYSIS 1: RELIABILITY

Purpose

The purpose of the first analysis was to examine the test-retest and inter-rater reliability of the TWFT.

Subjects

Subjects were 70 student volunteers, ranging in age from 18 to 60 years (M = 20.62, SD = 4.36), recruited from the student body of California State University, Fresno. Subjects were told only that the purpose of the study was to investigate brain functioning. Informed consent was obtained, and each subject’s raw data were numerically coded to ensure confidentiality. Seven subjects withdrew from the study before completion, and their data were excluded from further analysis.

Procedures

All subjects completed the TWFT twice, with a 6-week intertest interval. Forty-three protocols were blindly scored by both the first author and a neuropsychology doctoral candidate.

Design

Pearson product–moment correlations were calculated between scores obtained at the initial testing session and those obtained 6 weeks later and between the scores obtained by the two independent raters.
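As an illustration of the reliability design described above, the following is a minimal sketch of a Pearson product-moment correlation computed between two testing sessions. The function name `pearson_r` and the five pairs of scores are hypothetical and are not data from the study; the same function would apply to the scores of two raters.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical total-word scores for five subjects (illustration only).
first_session = [42, 55, 38, 61, 50]
# Hypothetical retest scores: every subject gains exactly 8 words.
second_session = [score + 8 for score in first_session]

retest_r = pearson_r(first_session, second_session)
mean_gain = sum(second_session) / 5 - sum(first_session) / 5
```

In this constructed example the correlation is perfect even though every score rose by 8 words. This is why a practice effect has to be tested separately (for example, with a t test on the session means) and cannot be read from the reliability coefficient alone.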


Results

This procedure yielded a test-retest reliability correlation of .79 (p < .001). A significant practice effect was noted between the first administration (M = 50.00; SD = 15.49; range, 18–86) and the second administration (M = 58.09; SD = 15.53; range, 27–81; t = 2.93, df = 126, p = .004) of the TWFT. An inter-rater reliability of .98 (p < .001) was obtained. No significant differences were noted between the scores obtained by the two raters (t = .32, df = 86, p = .74).

ANALYSIS 2: CRITERION VALIDITY

Purpose

The purpose of the second analysis was to examine the criterion validity of the TWFT. In particular, predictive validity was examined.

Subjects

Archival data, collected over the past 11 years, were analyzed. Subjects were 296 brain-damaged patients and 188 normal control subjects, selected according to the methods suggested by Stanczak, Stanczak, and Templer (1999). The brain-damaged subjects had been referred for neuropsychological evaluation at one of nine different inpatient or outpatient settings. Lesion location for all brain-damaged subjects had been confirmed by appropriate neurodiagnostic tests. No case was included if neurodiagnostic tests yielded equivocal results. The brain-damaged sample was heterogeneous in terms of lesion type, lesion severity, and time post-onset, thus more accurately approximating a population that might present for neuropsychological examination and maximizing the external validity of the study (Gemmell & Stanczak, 1996). Normal volunteers had been recruited from university settings in California, Tennessee, or Louisiana, and had no history of neuropathology or significant psychopathology. Sample demographics are presented in Table 1.

TABLE 1
Group Demographics for Analysis 2

Group                            M        SD       n
Normal controls (a)
  Age (years)                    31.06    10.38
  Education (years)              14.83     2.24
Brain-damaged (b)
  Age (years)                    43.34    17.01
  Education (years)              12.66     3.09
  Time post-onset (years)         3.37     5.69
  Lesion type
    Closed-head injury                             88
    Penetrating head injury                         4
    CVA                                            53
    Neoplasia                                       3
    Seizures                                       67
    Degenerative                                   27
    De/Dysmyelinating                               3
    Metabolic                                       1
    Infectious                                      8
    AVM                                             5
    Hydrocephalus                                   4
    Lobectomy                                       1
    Anoxia                                          2
    Undetermined                                    2

CVA = cerebrovascular accident; AVM = arteriovenous malformation.
(a) 29 males, 39 females. (b) 181 males, 110 females.

Design

For purposes of this study, anterior lesions were defined as any lesion anterior to the Sylvian and Rolandic fissures. Posterior lesions were defined as any lesion posterior to these landmarks. The left hemisphere lesion group consisted of those individuals with left anterior and left posterior lesions, as well as those individuals with left hemispheric lesions extending across the Sylvian fissure. The right hemisphere lesion group consisted of individuals with lesions in homologous right hemisphere regions. The anterior lesion group included subjects with right frontal, left frontal, and bifrontal lesions. The posterior lesion group consisted of subjects with right posterior, left posterior, and bilateral posterior lesions. The diffuse lesion group included subjects whose lesions extended bilaterally and were not restricted to either anterior or posterior regions.
Direct discriminant analyses, using the TWFT as the predictor variable, were performed contrasting the following groups: (a) brain-damaged (n = 296) versus normal controls (n = 188), (b) left hemisphere (n = 92) lesions versus normal controls, (c) right hemisphere (n = 78) lesions versus normal controls, (d) right hemisphere versus left hemisphere lesions, (e) anterior (n = 70) versus posterior (n = 75) lesions, (f) right anterior (n = 34) versus right posterior (n = 30) lesions, (g) right anterior versus left anterior (n = 34) lesions, (h) right anterior versus left posterior (n = 38) lesions, (i) right posterior versus left posterior lesions, (j) right posterior versus left anterior lesions, (k) left anterior versus left posterior lesions, (l) right anterior versus diffuse lesions, (m) right posterior versus diffuse lesions, (n) left anterior versus diffuse (n = 46) lesions, and (o) left posterior versus diffuse lesions.

TABLE 2
Criterion (Predictive) Validity of the Thurstone Word Fluency Test

Comparison     Hit Rate (%)   Sensitivity (a) (%)   Specificity (a) (%)
Bd vs. N       88.3            96.6                  37.5
Lf vs. Rt      52.3             0.0                  96.7
Lf vs. N       74.3            68.7                  77.2
Rt vs. N       71.2            64.6                  75.3
A vs. P        61.5             0.0                 100.0
Ra vs. Rp      56.6           100.0                   0.0
La vs. Ra      67.1             0.0                 100.0
Lp vs. Rp      66.7           100.0                   0.0
Ra vs. Lp      60.5             0.0                 100.0
La vs. Rp      61.0             0.0                 100.0
Ra vs. Dif     64.4            90.0                   8.7
Rp vs. Dif     62.3            86.0                  23.3
La vs. Dif     63.9            76.0                  51.0
Lp vs. Dif     76.9           100.0                   0.0

Bd = brain-damaged; N = normal controls; Lf = left hemisphere; Rt = right hemisphere; A = anterior; P = posterior; Ra = right anterior; Rp = right posterior; La = left anterior; Lp = left posterior; Dif = diffuse.
(a) Sensitivity to and specificity for first condition.

FIGURE 1. Group means and standard deviations for the Thurstone Word Fluency Test total score.

Results

The results of the discriminant analyses are summarized in Table 2. As can be seen, the TWFT is effective in discriminating brain-damaged from normal subjects. However, it is relatively ineffective in discriminating left versus right and anterior versus posterior lesions. Moreover, the TWFT is rather ineffective in discriminating focal from diffuse lesions.
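The three indices reported in Table 2 can be derived from a two-by-two classification table. The sketch below assumes, following the table note, that the first-named group in each comparison is treated as the target condition; the function name `classification_rates` and the counts are hypothetical and are not taken from the study.

```python
def classification_rates(true_pos, false_neg, true_neg, false_pos):
    """Hit rate, sensitivity, and specificity from classification counts.

    true_pos / false_neg: members of the first-named (target) group that were
    classified correctly / incorrectly.
    true_neg / false_pos: members of the comparison group that were
    classified correctly / incorrectly.
    """
    total = true_pos + false_neg + true_neg + false_pos
    return {
        "hit_rate": (true_pos + true_neg) / total,
        "sensitivity": true_pos / (true_pos + false_neg),
        "specificity": true_neg / (true_neg + false_pos),
    }

# Hypothetical counts: 100 target cases and only 20 comparison cases.
rates = classification_rates(true_pos=95, false_neg=5, true_neg=8, false_pos=12)
```

With these illustrative counts the hit rate is about 86% even though specificity is only 40%, because the larger group dominates the overall percentage. A high hit rate therefore needs to be read alongside both component rates, as the text does for the brain-damaged versus normal comparison.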


A post-hoc comparison of group means was performed using a one-way analysis of variance (ANOVA) with the TWFT as the dependent variable. The results of the ANOVA were significant (SS = 67906.12, df = 573, F = 15.71, p < .0001). Follow-up Tukey HSD tests revealed significant differences between: (a) normal controls and all lesion groups, (b) normal controls and a group of 85 mixed psychiatric controls, (c) left anterior versus generalized lesioned individuals, (d) psychiatric controls versus generalized lesioned individuals, and (e) psychiatric controls versus left hemisphere lesioned individuals. Group means and standard deviations are graphically presented in Figure 1.

ANALYSIS 3: CONSTRUCT VALIDITY

Purpose

The purpose of Analysis 3 was to examine the construct validity of the TWFT.

Subjects

Subjects were the same as those in Analysis 1.

Procedure

All subjects were administered four tasks of verbal fluency in the following order: the TWFT, the Word Fluency Test-Form A (University of Chicago, 1985), the Controlled Oral Word Association Test (COWA; Benton, Hamsher, & Sivan, 1994), and the F-A-S (Benton, 1968).

Design

Pearson product–moment correlations were derived between all variables.

Results

Significant correlations were obtained between all word fluency tasks. The correlation matrix is presented in Table 3.

TABLE 3
Construct Validity of the Thurstone Word Fluency Test (Expressed in Pearson Product–Moment Correlations)

Variable        Word Fluency   COWA     F-A-S
Thurstone       .66**          .72**    .81**
Word Fluency    –              .52**    .49*
COWA            –              –        .85**

*p = .001. **p < .001.
COWA = Controlled Oral Word Association Test; F-A-S.


ANALYSIS 4: RELATED COGNITIVE FUNCTIONS

Purpose

The purpose of the fourth analysis was to examine the cognitive skills necessary for successful TWFT performance.

Subjects

Subjects were 215 normal control subjects selected from the database described in Analysis 2.

Design

The following tests were entered into a principal components factor analysis with orthogonal rotation of factors: (a) the Trail Making Test, Forms A and B, (b) the TWFT, (c) the Finger Oscillation Test-Dominant Hand score, (d) the “Freedom from Distractibility” factor score from the Wechsler Adult Intelligence Scale-Revised (Wechsler, 1981), (e) the interference score from the Stroop Color and Word Test (Golden, 1978), and (f) the Memory Quotient from the Wechsler Memory Scale, Form 1 (Wechsler, 1945). A priori criteria of an eigenvalue greater than one and the meaningfulness of the factor were established for factor selection. A factor loading of .50 or greater was arbitrarily chosen as the criterion for inclusion.

Results

A three-factor model emerged which explained 74.1% of the variance (Table 4). The TWFT loaded most heavily, along with the Finger Oscillation Test, the Freedom from Distractibility factor, and the Memory Quotient, on Factor 1, which explained 43.8% of the variance. Both forms of the Trail Making Test loaded on Factor 2, while only the Stroop interference score loaded on Factor 3.
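The two selection criteria named in the Design (eigenvalue greater than one; loading of .50 or greater) can be sketched with NumPy as follows. The 4 x 4 correlation matrix is hypothetical and far simpler than the study's data, and the sketch stops at unrotated principal components; the orthogonal rotation used in the study is not reproduced here.

```python
import numpy as np

# Hypothetical correlation matrix: four tests forming two unrelated pairs.
R = np.array([
    [1.0, 0.6, 0.0, 0.0],
    [0.6, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.5, 1.0],
])

# eigh handles symmetric matrices and returns eigenvalues in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]          # re-sort to descending order
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Criterion 1: retain components whose eigenvalue exceeds one.
retained = eigenvalues > 1.0
n_components = int(retained.sum())

# Proportion of total variance explained by the retained components
# (total variance of a correlation matrix equals the number of variables).
explained = float(eigenvalues[retained].sum() / R.shape[0])

# Unrotated loadings: eigenvector scaled by the square root of its eigenvalue.
loadings = eigenvectors[:, retained] * np.sqrt(eigenvalues[retained])

# Criterion 2: a test is assigned to a component if |loading| >= .50.
salient = np.abs(loadings) >= 0.50
```

For this toy matrix the eigenvalues are 1.6, 1.5, 0.5, and 0.4, so two components are retained and together they explain 77.5% of the variance. The sign of each column of `loadings` is arbitrary, which is why the inclusion criterion is applied to absolute values.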

ANALYSIS 5: DECOMPOSING BETWEEN-GROUP VARIANCE

Purpose

The purpose of Analysis 5 was to determine the amount of between-group (brain-damaged versus normals) variance attributable to age, gender, education, and TWFT scores.

TABLE 4
Results of Factor Analysis

Test                     Factor 1   Factor 2   Factor 3
Thurstone WFT              .84       −.17       −.001
WAIS-R Factor 3            .87       −.09        .13
WMS-MQ                     .84       −.18        .04
Tapping Dominant Hand      .51       −.30       −.43
Trails A                  −.34        .79        .08
Trails B                  −.05        .88       −.08
Stroop Interference        .15       −.08        .91

WAIS-R = Wechsler Adult Intelligence Scale - Revised; WMS-MQ = Wechsler Memory Scale - Memory Quotient.


Subjects

Subjects were the same as those employed in Analysis 2.

Design

Using multiple regression techniques, part correlations were calculated for TWFT scores, age, gender, and education in order to determine the unique proportion of variance explained by each of these variables.
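One common way to obtain the unique proportion of variance described above is to take, for each predictor, the drop in R-squared when that predictor is removed from the full regression; this drop equals the squared part (semipartial) correlation. The sketch below illustrates that approach under the assumption that it matches the procedure used; the helper names and the two-predictor data set are hypothetical and constructed so that the answer is known in advance.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on the columns of X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

def squared_part_correlations(X, y):
    """Unique variance per predictor: full-model R^2 minus R^2 without that predictor."""
    full = r_squared(X, y)
    return [full - r_squared(np.delete(X, j, axis=1), y) for j in range(X.shape[1])]

# Hypothetical, mutually orthogonal predictors and error term (illustration only).
x1 = np.array([-1, -1, 1, 1, -1, -1, 1, 1], dtype=float)
x2 = np.array([-1, 1, -1, 1, -1, 1, -1, 1], dtype=float)
error = np.array([1, -1, -1, 1, -1, 1, 1, -1], dtype=float)
y = 2.0 * x1 + 1.0 * x2 + error

X = np.column_stack([x1, x2])
unique_variance = squared_part_correlations(X, y)
```

Because the illustrative predictors are uncorrelated, the unique proportions are simply 32/48 and 8/48 of the total sum of squares. With correlated predictors, as is typical for age, education, and test scores, the unique proportions would generally sum to less than the full-model R-squared.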

Results

The results of this analysis are summarized in Table 5. As can be seen, the TWFT scores explained the largest unique percentage of variance, more in fact than the combined unique effects of age, gender, and education. Age, in and of itself, did not explain a significant amount of unique between-group variance.
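The comparison between the TWFT and the three demographic variables can be checked directly from the squared part correlations reported in Table 5 (the symbol sr^2 is introduced here only as shorthand for a squared part correlation):

```latex
\[
sr^2_{\text{age}} + sr^2_{\text{gender}} + sr^2_{\text{education}}
  = .0006 + .0324 + .0289 = .0619
  \;<\; .1225 = sr^2_{\text{TWFT}}
\]
```

In other words, the TWFT uniquely accounts for roughly 12% of the between-group variance, about twice the 6% accounted for by the demographic variables taken together.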

ANALYSIS 6: IQ AND LETTER ASSOCIATION VALUE

Purpose

The purpose of Analysis 6 was to describe the relationship between intelligence and TWFT letter association value.

Subjects

Subjects were the same as those employed in Analysis 2.

Design

Subjects were classified into three groups according to their VIQ: (a) high VIQ (≥ 110), (b) average VIQ (90–109), and (c) low VIQ (≤ 89). Three separate discriminant analyses (one for each IQ group) were performed with group (brain-damaged vs. normals) as the dependent variable. Discriminating variables entered into the analyses were the total TWFT scores, the high association TWFT score (“S” words), and the low association TWFT score (“C” words).

TABLE 5
Percent of Unique Variance Explained

Variable     Part Correlation Squared    t       p
Age          .0006                       −.57    .57
Gender       .0324                       4.00    ≤ .001
Education    .0289                       3.82    ≤ .001
TWFT         .1225                       7.66    ≤ .001

TWFT = Thurstone Word Fluency Test.

TABLE 6
Relationship Between Verbal IQ (VIQ) and Thurstone Letter Association Condition

VIQ Group   Best Discriminator   Hit Rate   Specificity   Sensitivity
High        Total score          .62        .63           .61
Average     C words              .76        .78           .71
Low         C words              .81        .82           .60

Results

The results of the discriminant analyses are summarized in Table 6. For subjects with both low and average VIQ, the low association TWFT score was the best discriminating variable. However, for subjects with high VIQ, the total TWFT score was best at discriminating brain-damaged from normal subjects.
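The three-way VIQ grouping used in Analysis 6 (high, 110 or above; average, 90 to 109; low, 89 or below) can be written as a small helper. This is only a sketch: the function name `viq_group` is hypothetical, and whole-number VIQ scores are assumed so that the three bands leave no gaps.

```python
def viq_group(viq):
    """Assign a whole-number verbal IQ score to the band used in Analysis 6."""
    if viq >= 110:
        return "high"
    if viq >= 90:
        return "average"
    return "low"

# Hypothetical scores chosen to sit on either side of each cutoff.
example_groups = {score: viq_group(score) for score in (89, 90, 109, 110)}
```

A separate discriminant analysis would then be run within each band, which is what allows the best discriminating variable to differ across the three VIQ groups.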

DISCUSSION

The present study provides additional support for the clinical and research utilization of the TWFT. The TWFT possesses high test-retest and inter-rater reliability and good construct validity. However, the results regarding the criterion validity of the TWFT do not support its interpretation as a test of left frontal lobe function. The present study does confirm the utility of the TWFT in discriminating brain-damaged from normal subjects, with an overall “hit rate” of 88%. Unfortunately, this hit rate is obtained at the expense of low specificity and a high number of false-positive diagnostic errors. Similarly, high hit rates were obtained in the left-lesioned versus normals and right-lesioned versus normals comparisons, indicating that the TWFT is useful in identifying cerebral dysfunction regardless of lesion lateralization. However, these high hit rates were obtained at the expense of low sensitivity and a high number of false-negative diagnostic errors. The TWFT performed rather poorly in differentiating between focal lesions, in discriminating focal from diffuse lesions, in differentiating left- from right-sided lesions, and in discriminating anterior from posterior lesions. Additionally, given its low specificity, the TWFT is probably unsuited for sole use as a screening device for “organicity.” Thus, it appears that the TWFT is most effectively used within the context of a larger neuropsychological test battery to detect brain damage in general, a finding that is consistent with most of the existing literature.

As discussed previously, various forms of verbal fluency tasks exist and are being used in clinical and research applications. The current study examined four of the most widely used versions of word fluency tasks. The TWFT and Word Fluency Test-Form A are written forms, while the COWA and F-A-S both require oral responses.
In contrast, the Word Fluency Test-Form A requires categorical responses, that is, it requires a more convergent type of reasoning as opposed to other word fluency tasks, which require more divergent thinking. In spite of these differences, all word fluency tasks examined in this study were moderately to highly correlated, suggesting that mode of administration (written vs. oral) or cognitive modality (convergent vs. divergent) may be irrelevant. However, further research is obviously necessary before mode of administration or cognitive modality can be ruled out as significant influences upon word fluency task performance.


However, such a hypothesis is consistent with the findings of Cauthen (1978), who also suggested that discrepancies among results are not due to variations in the testing conditions. While some researchers (e.g., Bolter et al., 1983) have suggested that inconsistent results may be due to variation between the particular word fluency tasks used, the results of the present study suggest that other variables, such as sampling procedures, may be responsible for unsuccessful replications of studies investigating verbal fluency.

The results of the present study suggest that word fluency is actually a complex task. Psychomotor speed, attention/concentration, and memory appear to be significant components of successful word fluency performance. The fact that the Stroop interference score did not emerge as a significant cognitive correlate of word fluency performance suggests that the process of recalling specific words from the lexicon relies more heavily upon facilitory, as opposed to inhibitory, processing. This hypothesis will obviously need to be subjected to empirical test. However, if supported, this hypothesis suggests that word fluency tasks might be of value in differentiating dorsolateral from mesiobasal frontal lobe lesions.

Given that word fluency tests tap a broad range of cognitive functions, it follows that deficits in spontaneous word production, as suggested by Milner (1964), are not solely responsible for lowered scores on the TWFT. Additionally, the current findings demonstrate that brain lesions other than those in the left frontal cortices may result in impaired TWFT performance. Martin et al. (1990) reported that bilateral lesions of the frontal and temporal lobes contribute to impaired performance on word fluency tasks. In contrast, the present study suggests that the TWFT is sensitive to lesions, unilateral and bilateral, in almost all cortical areas.
Results of the decomposition of between-group variance indicated that age is not an important factor in word fluency task performance, at least among adults. Such a finding is consistent with that of Cauthen (1978), who reported no significant correlation between age and word fluency. However, the present study did not include a large number of subjects over the age of 70, and further study of word fluency across the lifespan is certainly warranted.

The present study also found that education explained a small, but statistically significant, unique proportion of between-group variance. This finding is consistent with that obtained by Pendleton et al. (1982), who reported a small but significant relationship between education and TWFT performance. However, since the brain-damaged and normal control groups in the present study differed significantly in terms of their mean number of years of education (normals: M = 14.72, SD = 2.48; brain-damaged: M = 12.51, SD = 3.08; t = 10.8, df = 716, p < .001), it was unclear whether the observed effect of education upon TWFT performance was genuine or an artifact of sampling bias. Thus, follow-up analyses of covariance were performed using the three TWFT conditions (high association, low association, and total words) as the dependent variables, group membership as the factor, and education as the covariate. In all three conditions, the covariate was significant (high association SS = 3467.12, df = 1, F = 18.9, p < .001; low association SS = 743.6, df = 1, F = 14.92, p < .001; total words SS = 11502.36, df = 1, F = 31.14, p < .001). Thus, it appears that education does indeed contribute uniquely to the explanation of between-group variance on the TWFT.

Similarly, even though gender explained a small but significant proportion of between-group variance, it is unclear whether this represents a true gender-linked effect or the effects of sampling. Thus, follow-up comparisons were made between male and female control subjects.
The results indicated that females produced significantly more words in the high association (“S” words; t = 2.16, df = 289, p = .03) and total words (t = 3.07, df = 592, p = .003) conditions but not in the low association (“C” words) condition (t = 1.85, df = 278, p = .06). These findings suggest that the unique contribution of gender in explaining between-group variance in TWFT performance is not due to sampling bias. Thus, in clinical applications, gender- and education-based norms should be used while, in research applications, the effects of gender and education should be controlled.

The present study also indicated a relationship between intelligence and letter association value. In particular, the results indicated that, among low and middle VIQ groups, the low association (“C” words) condition was best at discriminating between brain-damaged and normal subjects. However, in the high VIQ group, the total score emerged as the single best discriminating variable. These findings are inconsistent with those obtained by Bolter et al. (1983) and by Borkowski et al. (1967). Bolter et al. found that, regardless of VIQ, the high association (“S” words) condition was more sensitive to the presence of cerebral dysfunction. Furthermore, these researchers suggested that elimination of the low association condition would not reduce the discriminant validity of the test. Our results, however, argue against such a suggestion, especially when one considers that only 4 minutes of assessment time would be saved.

The discrepancy between the current findings and those obtained by Bolter et al. (1983) may be due to several factors. First, Bolter et al. examined only high versus low VIQ groups, using a VIQ of 100 as the cutoff. Additionally, they did not include in their analysis the total TWFT scores, which were found to be of importance in the present study. Borkowski et al. (1967) reported that, in their high VIQ group, the low association condition was better at detecting brain damage. Conversely, in their low VIQ group, the high association task was found to be the best discriminator. Bolter et al.
speculated that the inconsistencies between the two studies may have been due to task administration differences, since Borkowski et al. had used verbal, rather than written, responses. In light of our Analysis 3, this speculation seems specious, since all word fluency tasks, regardless of mode of administration, were significantly correlated. Additionally, Borkowski et al. (1967) chose their letter association conditions based upon letter difficulty (i.e., the number of words listed in the dictionary). In contrast, the TWFT places a restriction on the words in the low association condition. Thus, inconsistent results may be attributable, in part, to the manner in which the association conditions were defined. In conclusion, the results of the present study provide evidence for the reliability and validity of the TWFT. However, because of its lack of specificity or sensitivity, depending upon the diagnostic decision to be made, the TWFT is best employed to detect the presence of cerebral dysfunction, not to localize such dysfunction. For the same reason, the TWFT should not be used as a screening test for brain dysfunction. Rather it should be used only within the context of a thorough neuropsychological examination. The TWFT appears to be a complex task that subsumes several cognitive functions, including memory, psychomotor speed, and attention/concentration. Thus, low performance on the TWFT may reflect deficits in cortical areas not typically associated with speech and language processes. It is suggested that, in both clinical and research applications, education and gender be controlled when using the TWFT. Finally, although further research is necessary, it is hypothesized that the TWFT, because it seems to involve facilitory processing, may be useful in discriminating dorsolateral from mesiobasal frontal lesions.

REFERENCES

Benton, A. (1968). Differential behavioral effects in frontal lobe disease. Neuropsychologia, 6, 53–60.

Benton, A. L., Hamsher, K. deS., & Sivan, A. B. (1994). Multilingual aphasia examination. Iowa City, IA: AJA Associates.

Bolter, J. F., Long, C. J., & Wagner, M. (1983). The utility of the Thurstone Word Fluency Test in identifying cortical damage. Clinical Neuropsychology, 5, 77–82.

Borkowski, J. G., Benton, A. L., & Spreen, O. (1967). Word fluency and brain damage. Neuropsychologia, 5, 135–140.

Cauthen, N. (1978). Verbal fluency: Normative data. Journal of Clinical Psychology, 34, 126–129.

Gemmell, S. B., & Stanczak, D. E. (1996, May 2). Subject selection in neuropsychological research: The use of homogeneous versus heterogeneous samples. Paper presented at the 17th Annual Central California Research Symposium, California State University, Fresno, CA.

Golden, C. J. (1978). Stroop color and word test. Wood Dale, IL: Stoelting Company.

Heaton, R. K., Grant, I., & Matthews, C. G. (1991). Comprehensive norms for an expanded Halstead-Reitan battery: Demographic corrections, research findings, and clinical applications. Gainesville, FL: Psychological Assessment Resources.

Martin, R. C., Loring, D. W., Meador, K. J., & Lee, G. P. (1990). The effects of lateralized temporal lobe dysfunction on formal and semantic word fluency. Neuropsychologia, 28, 823–829.

Milner, B. (1964). Some effects of frontal lobectomy in man. In J. M. Warren & K. Akert (Eds.), The frontal granular cortex and behavior (pp. 313–334). New York: McGraw-Hill.

Pendleton, M. C., Heaton, R. K., Lehman, R. A., & Hulihan, D. (1982). Diagnostic utility of the Thurstone Word Fluency Test in neuropsychological evaluations. Journal of Clinical Neuropsychology, 4, 307–317.

Perret, E. (1974). The left frontal lobe of man and the suppression of habitual responses in verbal categorical behaviour. Neuropsychologia, 12, 323–330.

Schaie, K. W., & Strother, C. R. (1968). A cross-sequential study of age changes in cognitive behavior. Psychological Bulletin, 70, 671–680.

Stanczak, E. M., Stanczak, D. E., & Templer, D. I. (1999). Subject selection procedures in neuropsychological research: A meta-analysis and prospective study. Manuscript submitted for publication.

Thurstone, L. L. (1938). Primary mental abilities. Chicago: University of Chicago Press.

University of Chicago. (1985). Word fluency test. Chicago: London House.

Wechsler, D. (1945). A standardized memory scale for clinical use. Journal of Psychology, 19, 87–95.

Wechsler, D. (1981). Wechsler adult intelligence scale-revised. San Antonio: The Psychological Corporation.